Spark performance optimization is one of the most important activities when writing Spark jobs. This video talks in detail about optimizations that can be done at the code level to speed up Spark jobs.
This is excellent and valuable knowledge sharing. One can easily tell that these trainings come from deep personal hands-on experience and not mere theory. Great work.
Earlier I watched some videos on this topic, but no one explained it this way. I am glad I saw this video; now I clearly understand Spark optimization techniques.
Awesome explanation of the optimisation techniques. If possible, please create a video covering the real-world challenges you faced in your project and the solutions you provided. That would be really helpful.
Thanks a lot. I have a case where some other modules write Parquet files, and I need to read and process them in my module. How should I apply bucketing on that data? Is it possible without writing it out again?
@BigDataThoughts Thanks! Can you build an end-to-end project or a mini project where one can see how and where these properties are being applied? Just watching these in silos only gives half the knowledge. Thanks.
How do I join a small table with a big table when I want to fetch all the data in the small table? The small table has 100k records and the large table has 1 million records. df = smalldf.join(largedf, smalldf.id == largedf.id, how='left_outer') This runs out of memory, and I can't broadcast the small df; I don't know why. What is the best approach here? Please help.