Bro, thank you so much! Your videos helped me get a great 160% hike, which completely changed things for me. Please create new videos. Your way of explaining things is awesome. ❤❤
Very well done! I have been working in the big data domain for 12+ years, and I can say this is well explained. Your videos really show the effort you are putting in.
I think Shuffle Sort Merge Join has been the default join in Spark since version 2.3, right? Correct me if I am wrong; you mentioned Shuffle Hash Join as the default join in Spark.
In a shuffle hash join, the first step is partitioning. In the example code we never partitioned anywhere explicitly, so in this case will partitioning still happen internally, as part of the shuffle hash join strategy?
It's a simplified example that assumes, after partitioning, each partition holds only keys that match the hashed dataset. You should have taken, say, 101 and 102 in part-1, 102 and 103 in part-2, etc.
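To make the partitioning step above concrete, here is a minimal plain-Python sketch of what a shuffle hash join does with keys like 101, 102, 103. All names and data here are illustrative, not Spark API: both sides are first partitioned by hash(key) % num_partitions, so the same key from either table always lands in the same partition, and then within each partition the smaller side is hashed and the bigger side probes it.

```python
# Illustrative shuffle-hash-join simulation (not Spark code).
NUM_PARTITIONS = 2

orders = [(101, "o1"), (102, "o2"), (103, "o3")]   # the "big" side
customers = [(101, "alice"), (102, "bob")]          # the "small" side

def partition(rows):
    # Shuffle step: route each row to a partition by hashing its key,
    # so matching keys from both tables end up in the same partition.
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for key, val in rows:
        parts[hash(key) % NUM_PARTITIONS].append((key, val))
    return parts

joined = []
for o_part, c_part in zip(partition(orders), partition(customers)):
    # Hash step: build an in-memory hash table on the smaller side...
    lookup = {k: v for k, v in c_part}
    # ...and probe it with the bigger side's rows in this partition.
    for k, v in o_part:
        if k in lookup:
            joined.append((k, v, lookup[k]))

print(sorted(joined))  # key 103 has no match, so it is dropped
```

Note that key 103 lands in some partition either way; it just finds no match there, which is why an inner shuffle hash join drops it.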
How do I join a small table with a big table when I want to keep all the rows of the small table? The small table has 100k records and the large table has 1 million records. df = smalldf.join(largedf, smalldf.id == largedf.id, how='left_outer') runs out of memory, and I can't broadcast the small df (I don't know why). What is the best approach here? Please help.