Wow, Manish, you are such an amazing human. Your ability to teach complex things so easily is not common. Please continue the playlist. I have learned so much from your videos. Your lectures helped me secure a job. I am posting this only to let you know that your work isn't going to waste. A lot of people like me are learning and also implementing it in real life.
count = 0
for item in data.get('MAINDATA'):
    # print(item)
    for next_item in item.get('HeaderFields'):
        print(next_item)
        for key in next_item.keys():
            if key == 'FieldTypeName':
                count += 1
print(count)
count = 0
for word in data["MAINDATA"]:
    for headerFile in word["HeaderFields"]:
        if "FieldTypeName" in headerFile:
            count += 1
print(count)

Ans: count = 38
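The nested loops above can also be collapsed into a single sum. A minimal sketch, assuming the JSON has the same MAINDATA/HeaderFields shape as in the video (the sample dict here is made up for illustration, so it counts 2 rather than 38):

```python
# Hypothetical sample in the same shape as the video's JSON:
# MAINDATA is a list of records, each holding a list of HeaderFields dicts.
data = {
    "MAINDATA": [
        {"HeaderFields": [{"FieldTypeName": "Text"}, {"FieldName": "id"}]},
        {"HeaderFields": [{"FieldTypeName": "Number"}]},
    ]
}

# Count every HeaderFields dict that contains a "FieldTypeName" key.
count = sum(
    1
    for record in data["MAINDATA"]
    for field in record["HeaderFields"]
    if "FieldTypeName" in field
)
print(count)  # 2 for this toy sample
```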
The staging layer is not part of the data warehouse. It is intermediate storage used for data processing before the data is loaded into the warehouse. Otherwise, there would be no difference between a data warehouse and a data lake.
Hi Manish, I am using the same code and getting 4 jobs:

from pyspark.sql.functions import col

flight_data = spark.read.format("csv")\
    .option("header", "true")\
    .option("inferSchema", "true")\
    .load("/FileStore/tables/2010_summary.csv")

flight_data_repartition = flight_data.repartition(3)
us_flight_data = flight_data.filter("DEST_COUNTRY_NAME == 'United States'")
us_india_data = us_flight_data.filter((col("ORIGIN_COUNTRY_NAME") == 'India') | (col("ORIGIN_COUNTRY_NAME") == 'Singapore'))
total_flight_ind_sing = us_india_data.groupby("DEST_COUNTRY_NAME").sum("count")
total_flight_ind_sing.show()

Can you explain the reason for this?
count = 0
for i in range(len(data["MAINDATA"])):
    for j in range(len(data["MAINDATA"][i]["HeaderFields"])):
        for k in data["MAINDATA"][i]["HeaderFields"][j]:
            if k == "FieldTypeName":
                count = count + 1
logger.info(count)
Hi, thank you for such informative videos. I am not able to find Lecture 6 in the Spark Fundamentals series. Please guide me on where I can watch Lecture 6.
Hi Manish Sir, can you start a playlist on how to work with streaming data using Apache Kafka? Basically, I am expecting a playlist on analyzing and processing streaming data. Thank you.
l_w_cost = {"mahesh": 500, "Ramesh": 400, "Mithilesh": 400, "Jagmohan": 1000, "Rampyare": 800}
total_cost = 0
for i in range(0, 50):
    for j in l_w_cost:
        total_cost = total_cost + l_w_cost[j]
print(total_cost)

# subtract Jagmohan for 7 days
for i in range(0, 7):
    total_cost = total_cost - l_w_cost["Jagmohan"]
print(total_cost)

# subtract Ramesh for 3 days
for i in range(0, 3):
    total_cost = total_cost - l_w_cost["Ramesh"]
print(total_cost)

Output: 155000, 148000, 146800
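The same answer can be checked without any loops: 50 days of everyone's laundry, minus 7 skipped days for Jagmohan and 3 for Ramesh. A quick sketch using the same cost dict:

```python
l_w_cost = {"mahesh": 500, "Ramesh": 400, "Mithilesh": 400,
            "Jagmohan": 1000, "Rampyare": 800}

daily_total = sum(l_w_cost.values())   # 3100 per day for all five people
total = 50 * daily_total               # 50 full days -> 155000
total -= 7 * l_w_cost["Jagmohan"]      # Jagmohan skips 7 days -> 148000
total -= 3 * l_w_cost["Ramesh"]        # Ramesh skips 3 days -> 146800
print(total)  # 146800
```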
Hello Manish sir, first of all, thank you so much for the knowledgeable playlist; we learned a lot from it. Can you make a video on "How to read data from a database table using PySpark in Databricks?" Thanks.