Absolutely blown away by this YouTube video! In just one word: phenomenal. It's like diving into an encyclopedia dedicated to CI/CD pipelines. My quest for a basic explanation led me to countless sources, but this video turned out to be an absolute goldmine.
Well spoken, well prepared, nicely presented. Thank you for helping others. One suggestion (IMHO): I would condense the last 10 minutes to 2-3 minutes. For example, in the dashboard section, instead of showing the removal of each and every dataframe, I would show the removal of just one and tell the audience, "Likewise, you can remove all the other dataframes." The same goes for adding a title (header) to each visualization and arranging the visualizations: I would do it for one and tell the audience, "Likewise, you can add titles to the other visualizations and arrange them per your requirements." Then I would fast-forward (skip) to the final view of the dashboard with a few seconds of commentary.
So is Spark used only for aggregating and viewing data like this? Is it just for data analysts, then? Could you show a real example with data coming from a source (for example, an API) and writing production code to submit a Spark job on batch data?
Thanks for the informative session. Can you please let me know if we can import all the functions together instead of importing them one by one (e.g. from pyspark.sql.functions import month, year, quarter), the way we import libraries like pandas and matplotlib in Python?
Hi Sir, one question on the query "frequency of customers who visited the restaurant". In the Sales.csv file there are 27 records with restaurant entries, but your output gives 21 records. In your video you used .agg(countDistinct("ordered_date")); when I changed it to .agg(count("customer_id")), I got 27 records, matching the input file. Could you please look into it and point out any misunderstanding on my end?
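Not the author's answer, but the 27-vs-21 gap is what you would expect when a customer has more than one record on the same date: count("customer_id") counts every row, while countDistinct("ordered_date") counts each visit date once. A plain-Python analogue with hypothetical data:

```python
# Toy illustration (hypothetical data) of count vs countDistinct
# for one customer's grouped rows: two orders on the same date
# contribute 2 rows but only 1 distinct date.
visits = [
    ("A", "2023-01-01"),
    ("A", "2023-01-01"),  # same customer, same date -> duplicate row
    ("A", "2023-01-02"),
]

total_rows = len(visits)                      # analogue of count("customer_id")
distinct_dates = len({d for _, d in visits})  # analogue of countDistinct("ordered_date")
print(total_rows, distinct_dates)  # 3 2
```

So if the intent is "how many times did the customer visit", counting distinct dates deliberately collapses multiple same-day orders into one visit, which would explain 21 instead of 27.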
All your videos are commendable. Could you please create a video on scheduling the execution of a Databricks notebook using Azure Data Factory (ADF) pipeline?
Hey, was there a need to use the inferSchema option when you are manually defining the schema? Can you please reply? Also, where can we download the dataset for practice?
How can we export this dashboard to PDF, or share it with others? Also, can you please share the PPT that you presented in the video?
Hi, that's a good explanation, I liked it. But my advice is: please don't say "OK" all the time, and don't go too fast. If you can improve these two things in your explanations, you can become a great tutor.
Hi, I'm working on the pay-as-you-go tier of Databricks. When I upload the file, it doesn't give me the path on my computer where the file is stored; it gets stored in the Databricks 'hive' metastore as a table, and sales.csv gets converted to Delta format. Can you tell me how to upload a CSV file and work on it directly? Thank you.
I am preparing for interviews, and watching and practicing your real-time PySpark projects has been very helpful for me. If possible, could you make a video on how to explain a real-time project in interviews, and what kinds of questions I can expect about real-time projects?
Earlier it was running, but now for these commands: sales_df = sales_df.withColumn("order_month", month(sales_df.order_date)); sales_df = sales_df.withColumn("order_quarter", quarter(sales_df.order_date)); display(sales_df) I am getting this error: AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "month(order_date)" due to data type mismatch: parameter 1 requires "DATE" type, however, "order_date" is of "INT" type.
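Not from the video, but the error says order_date was read as an integer (e.g. 20230115) rather than a DATE, so month() refuses it. One common fix, assuming the integers encode yyyyMMdd (verify this against your file), is to cast to string and parse:

```python
from datetime import datetime

# In PySpark the fix would look like this (hypothetical, assuming
# the INT values encode dates as yyyyMMdd):
#
#   from pyspark.sql.functions import to_date, col, month, quarter
#   sales_df = sales_df.withColumn(
#       "order_date",
#       to_date(col("order_date").cast("string"), "yyyyMMdd"))
#   sales_df = sales_df.withColumn("order_month", month("order_date"))
#   sales_df = sales_df.withColumn("order_quarter", quarter("order_date"))

# The same yyyyMMdd parsing shown in plain Python, to illustrate:
raw = 20230115
parsed = datetime.strptime(str(raw), "%Y%m%d").date()
print(parsed.month, (parsed.month - 1) // 3 + 1)  # 1 1
```

If the integers use a different layout (e.g. ddMMyyyy), adjust the format string accordingly; the cast-then-parse pattern stays the same.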