Are you ready to embark on a thrilling journey into the world of Azure Databricks and Apache Spark? Look no further! Our channel is your go-to destination for all things related to these powerful data processing and analytics tools.
Join us as we delve into the depths of Azure Databricks and Apache Spark, unraveling their capabilities, exploring best practices, and unlocking the secrets to harnessing their true potential. Whether you're a data engineer, data scientist, or a curious learner passionate about big data technologies, our channel offers a wealth of knowledge to fuel your growth.
Here's what you can expect:
- In-depth Tutorials
- Best Practices and Tips
- Use Case Discussions
- Performance Optimization
- Interview Preparation
Get ready to unlock the full potential of Azure Databricks and Apache Spark with our engaging and informative videos. Don't forget to subscribe to our channel and hit the notification bell, so you never miss an update.
Thanks, mate, for the detailed info. I started with the first video and then continued watching the complete series. Could you please attach the lab work you explain in the videos? Much appreciated for your great work!
I have a doubt regarding the update operation. You mentioned that the Delta engine scans for the particular files containing records that need to be updated and then updates them. But if that's the case, how is time travel possible? Updating the existing files would result in the loss of historical data.
Parquet files are immutable in nature. So during an update, the relevant files are scanned and, based on the updated values, new parquet files are created. Existing parquet files are never overwritten.
Sir, your videos are really good and very understandable, in very simple language. I'm stuck as my cluster is not running; it says "Azure Quota Exceeded Exception". I'd be grateful if you could help me solve this.
Awesome explanation, you deserve a big applause for this. Every second of this video plays a key role in understanding the concept of data partitioning. Really loved the content and the way you explained it.
What is lit()? Whenever we want to add a constant literal value as a column across the entire DataFrame, we use lit(). We can also apply such values only to certain records using when() and otherwise(). E.g.: empDF = df.withColumn("Bonus", when(df.sal > 50000, lit(5000)).otherwise(lit(1000))). Thanks for the amazing session, Raj sir.
Wonderful. Isn't Delta Lake schema-on-write? Delta Lake tables are schema-on-write, which means the schema is enforced at the time data is written, rather than inferred when it is read. Delta Lake detects when data with a different schema is being appended and rejects it unless schema evolution is enabled.
If possible, could you also explain whether we can update only a certain range of partitioned data? For example, if the data is partitioned by month and I want to update only the last 3 months of partitions, how can we achieve that?