Bigdata Thoughts is a channel focused on big data technologies and cloud solutions. I will be sharing my experience of building different use cases on the cloud, the challenges faced, and the different solutions. I will also talk about the technology options, architecture, and design principles for each solution.
This is excellent and valuable knowledge sharing. One can easily make out that these trainings come from deep personal hands-on experience and not mere theory. Great work!
How do I join a small table with a big table while keeping all the rows of the small table? The small table has 100k records and the large table has 1 million records. df = smalldf.join(largedf, smalldf.id == largedf.id, how='left_outer') runs out of memory, and I can't broadcast the small df, I don't know why. What is the best approach here? Please help.
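For context on what is being asked for here: a broadcast hash join avoids shuffling the large table by shipping the small table to every executor as an in-memory hash map and streaming the large table past it. (One possible explanation for the broadcast hint being ignored, offered as an assumption about Spark's behavior rather than a diagnosis of this exact job: in a left outer join Spark can only broadcast the right-hand table, not the preserved left side.) A minimal pure-Python sketch of the idea, with hypothetical `id`/`name`/`val` columns not taken from the comment:

```python
def broadcast_left_outer_join(small_rows, large_rows, key):
    """Left-outer join that keeps every row of the small side.

    small_rows / large_rows are lists of dicts; key is the join column name.
    """
    # Build phase: index the small side by join key.
    # This in-memory map is what a broadcast join ships to every executor.
    index = {}
    for row in small_rows:
        index.setdefault(row[key], []).append(row)

    # Columns of the large side (minus the key), used to pad unmatched rows.
    pad = {col: None for row in large_rows for col in row if col != key}

    out, matched = [], set()
    # Probe phase: stream the large side once against the in-memory index;
    # the large table is never collected or shuffled.
    for big in large_rows:
        for small in index.get(big[key], []):
            matched.add(id(small))
            out.append({**small, **big})
    # Left-outer semantics: keep small rows with no match, padded with None.
    for row in small_rows:
        if id(row) not in matched:
            out.append({**pad, **row})
    return out

# Hypothetical sample data illustrating the shape of the result.
small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
large = [{"id": 1, "val": 10}, {"id": 1, "val": 11}, {"id": 3, "val": 30}]
rows = broadcast_left_outer_join(small, large, "id")
```

All three small-side ids survive in `rows`; id=2 has no match in the large side, so it comes back with `val` padded to None.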
That was very well explained. Thank you for putting this together. One question though: do you really think data modelling should be done on the Gold layer? I don't think so, because Gold datasets are just business-level aggregates suited to particular business consumption needs, whereas the Silver layer is the warehouse in the Lakehouse. That is where modelling should be done, if needed.
Thank you for the detailed explanation. However, the problems I faced with reading dates prior to 1900 are not resolved even after setting all the mentioned properties. Does anyone have a working example that solves the issue of reading dates prior to 1900? Below is the code that I added, but it did not work: conf = sparkContext.getConf(); conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED"); conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED"); conf.set("spark.sql.datetime.java8API.enabled", "true")
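One possible reason code like this has no effect (an assumption about Spark's config handling, not a verified diagnosis of this setup): SparkConf changes made after the SparkContext already exists are ignored, so the properties may need to be in place before the session starts, for example on the spark-submit command line:

```shell
# Sketch: pass the rebase properties at launch time so they are set before
# the SparkSession is created. Property names are taken from the comment
# above; app.py is a placeholder for the actual job script.
spark-submit \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --conf spark.sql.datetime.java8API.enabled=true \
  app.py
```

Alternatively, runtime SQL settings can usually be changed on the live session with spark.conf.set(...) on the SparkSession object itself, rather than on the SparkConf of an already-running context.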
@BigDataThoughts did you explain BigQuery in this style as well? One improvement for this video: the summary could be paced more slowly. Please don't take offense at my comment; you did really well in the video. Excellent explanation.