Wow, Manish, you are such an amazing human. Your ability to teach complex things so easily is not common. Please continue the playlist. I have learned so much from your videos. Your lectures helped me secure a job. I am posting this only to let you know that your work isn't going to waste. A lot of people like me are learning and also implementing it in real life.
count = 0
for item in data.get('MAINDATA'):
    # print(item)
    for next_item in item.get('HeaderFields'):
        print(next_item)
        for key in next_item.keys():
            if key == 'FieldTypeName':
                count += 1
print(count)
count = 0
for word in data["MAINDATA"]:
    for headerFile in word["HeaderFields"]:
        if "FieldTypeName" in headerFile:
            count += 1
print(count)

Ans: count = 38
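The nested loops above can also be collapsed into a single sum. A minimal sketch, assuming the JSON has the same MAINDATA/HeaderFields shape as in the video (the sample dict here is made up for illustration, so it counts 2 rather than 38):

```python
# Hypothetical sample in the same shape as the video's JSON:
# MAINDATA is a list of records, each holding a list of HeaderFields dicts.
data = {
    "MAINDATA": [
        {"HeaderFields": [{"FieldTypeName": "Text"}, {"FieldName": "id"}]},
        {"HeaderFields": [{"FieldTypeName": "Number"}]},
    ]
}

# Count every HeaderFields dict that contains a "FieldTypeName" key.
count = sum(
    1
    for record in data["MAINDATA"]
    for field in record["HeaderFields"]
    if "FieldTypeName" in field
)
print(count)  # 2 for this toy sample
```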
The staging layer is not part of the data warehouse. It is intermediate storage used for data processing before the data is loaded into the warehouse. Otherwise, there would be no difference between a data warehouse and a data lake.
Hi Manish, I am using the same code and getting 4 jobs:

from pyspark.sql.functions import col

flight_data = spark.read.format("csv")\
    .option("header", "true")\
    .option("inferSchema", "true")\
    .load("/FileStore/tables/2010_summary.csv")

flight_data_repartition = flight_data.repartition(3)
us_flight_data = flight_data.filter("DEST_COUNTRY_NAME == 'United States'")
us_india_data = us_flight_data.filter((col("ORIGIN_COUNTRY_NAME") == 'India') | (col("ORIGIN_COUNTRY_NAME") == 'Singapore'))
total_flight_ind_sing = us_india_data.groupby("DEST_COUNTRY_NAME").sum("count")
total_flight_ind_sing.show()

Can you explain the reason for this?
count = 0
for i in range(len(data["MAINDATA"])):
    for j in range(len(data["MAINDATA"][i]["HeaderFields"])):
        for k in data["MAINDATA"][i]["HeaderFields"][j]:
            if k == "FieldTypeName":
                count = count + 1
logger.info(count)
Hi, thank you for such informative videos. I am not able to find Lecture 6 in the Spark Fundamentals series. Please guide me on where I can watch Lecture 6.
Hi Manish Sir, can you start a playlist on how to work with streaming data using Apache Kafka? Basically, I am expecting a playlist on analyzing and processing streaming data. Thank you.
l_w_cost = {"mahesh": 500, "Ramesh": 400, "Mithilesh": 400, "Jagmohan": 1000, "Rampyare": 800}
total_cost = 0
for i in range(0, 50):
    for j in l_w_cost:
        total_cost = total_cost + l_w_cost[j]
print(total_cost)

# subtract Jagmohan for 7 days
for i in range(0, 7):
    total_cost = total_cost - l_w_cost["Jagmohan"]
print(total_cost)

# subtract Ramesh for 3 days
for i in range(0, 3):
    total_cost = total_cost - l_w_cost["Ramesh"]
print(total_cost)

Output: 155000, 148000, 146800
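The same answer can be checked without any loops: 50 days of everyone's laundry, minus 7 skipped days for Jagmohan and 3 for Ramesh. A quick sketch using the same cost dict:

```python
l_w_cost = {"mahesh": 500, "Ramesh": 400, "Mithilesh": 400,
            "Jagmohan": 1000, "Rampyare": 800}

daily_total = sum(l_w_cost.values())   # 3100 per day for all five people
total = 50 * daily_total               # 50 full days -> 155000
total -= 7 * l_w_cost["Jagmohan"]      # Jagmohan skips 7 days -> 148000
total -= 3 * l_w_cost["Ramesh"]        # Ramesh skips 3 days -> 146800
print(total)  # 146800
```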
Hello Manish sir, first of all, thank you so much for the knowledgeable playlist; we learned a lot from it. Can you make a video on "How to read data from a database table using PySpark in Databricks?" Thanks.