
34. Databricks - Spark: Data Skew Optimization 

Raja's Data Engineering

#DataSkew, #Bigdata-Dataskew, #BigdataOptimization, #AdaptiveQueryExecution, #AQE, #DatabricksDataskew, #SparkSalting, #Salting, #DatabricksSalting, #SkewHint, #SparkSkewhint, #DatabricksOptimization,#pysparkOptimization, #sparkOptmimization, #SparkPerformanceOptimization, #SparkPerformance, #DatabricksPerformanceImprovement,#Databricks, #DatabricksTutorial, #AzureDatabricks

Science

Published: 9 Dec 2021

Comments: 31
@Prashanth-os5he 11 months ago
This is by far the best Databricks and Spark tutorial series on YouTube... great job, Raja.
@rajasdataengineering7585 11 months ago
Glad you think so! Thanks for your comment.
@joyo2122 2 years ago
You are the best Raja 🙌
@srinubathina7191 11 months ago
Awesome content. Thank you so much, sir.
@rajasdataengineering7585 11 months ago
Glad you liked it.
@abhinavsingh1173 10 months ago
Your course is the best. The one problem is that you are not attaching a GitHub link for your sample data and code. As one of your audience, I request that you please do this. Thanks.
@sumanmondal8836 2 years ago
Thanks, Raja, your explanations are really good. Can you please make a video on salting techniques with an example? It would be very helpful.
@rajasdataengineering7585 2 years ago
Thank you, Suman. Sure, I will make a video on salting.
@skasifali4457 1 year ago
Thanks, Raja. Your video is really useful. Can you please create a video on debugging techniques and on how to use the Spark UI to debug and find bottlenecks, with use cases? Thanks a lot again.
@rajasdataengineering7585 1 year ago
Sure, Asif, I will post a video on debugging.
@sravankumar1767 2 years ago
Superb
@rajasdataengineering7585 2 years ago
Thank you
@SaurabhDestiny18 1 year ago
Hi, thank you for such useful videos. I have one question: I am still confused about the executor boundary versus the cores/tasks boundary. In your first video you mentioned that an executor can have many cores and RAM, and in this video you mention that an executor runs in its own JVM process. Does that mean all the cores/tasks run under one JVM process? Or are there further JVM processes running under that parent JVM process, as many as the number of cores/tasks?
@naveenkumarsingh3829 24 days ago
Why can't we set maxPartitionBytes to get equal-sized partitions and handle data skew?
@VishalSharma-hv6ks 2 years ago
You mainly focus on theory. It would be great if you wrote the code for salting as well.
@rajasdataengineering7585 2 years ago
Sure, I will post another video with a coding example.
@prathapganesh7021 3 months ago
Thank you
@rajasdataengineering7585 3 months ago
Welcome!
@Personalcomments 2 years ago
Your videos are very informative. Can you please post a video on client mode vs. cluster mode vs. local mode?
@rajasdataengineering7585 2 years ago
Sure, Merin, I will post a video on this topic.
@tanushreenagar3116 1 year ago
Nice
@rajasdataengineering7585 1 year ago
Thanks
@rajunaik8803 11 months ago
Hi Raja, quick question: does AQE take care of the salting and skew hint techniques automatically in case of data skew? Or do we have to apply them explicitly?
@rajasdataengineering7585 11 months ago
Yes, AQE handles data skew automatically. In later Spark versions (3.2 onward) it is enabled by default. In Spark 3.0 and 3.1, we just need to enable AQE through Spark config settings.
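The config settings mentioned in the reply above can be sketched roughly as follows. This is a minimal sketch, not code from the video. The four keys are standard Spark SQL settings, and AQE's skew handling applies to skewed joins. The factor and threshold values shown are the documented defaults and are only illustrative. The `apply_settings` helper and the `spark` argument (an already created SparkSession) are assumptions made for illustration.

```python
# Spark SQL settings related to AQE skew-join handling.
# The values are illustrative; tune them for your own workload.
AQE_SKEW_SETTINGS = {
    # Turns Adaptive Query Execution on.
    "spark.sql.adaptive.enabled": "true",
    # Lets AQE split skewed partitions in joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # A partition is a skew candidate if it is this many times the median size...
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    # ...and also larger than this absolute threshold.
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def apply_settings(spark, settings=AQE_SKEW_SETTINGS):
    """Hypothetical helper: apply each setting via spark.conf.set on an existing session."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

In a notebook this would be called as `apply_settings(spark)` before running the join.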
@rajunaik8803 11 months ago
@rajasdataengineering7585 Thanks a lot for your response. Do you have a Telegram channel? And may I know your LinkedIn ID, please?
@balakrishna61 2 months ago
@rajasdataengineering7585 Please explain salting in detail. It's not clear how you partition Germany_1, _2 and so on. Each record will become one partition in this case, correct?
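On the salting question above, here is a pure-Python sketch (no Spark needed) of why a salted key does not turn every record into its own partition. Rows that share the same salted key still land together, so the hot key is spread over as many partitions as there are salt values, not over as many partitions as there are rows. Everything here is an assumption for illustration and not taken from the video: the 1,000-row dataset with 80% Germany, 8 shuffle partitions, 8 salt values, `zlib.crc32` as a stand-in for Spark's hash partitioner, and a round-robin salt in place of the random salt a real job would typically use.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed shuffle partition count
SALT_BUCKETS = 8    # assumed number of distinct salt values


def partition_of(key: str) -> int:
    """Stand-in for a hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys) -> Counter:
    """Count how many rows land in each partition."""
    return Counter(partition_of(key) for key in keys)


# Made-up data: Germany holds 80% of 1,000 rows.
rows = (["Germany"] * 800 + ["France"] * 50 + ["India"] * 50
        + ["Brazil"] * 50 + ["Japan"] * 50)

# Salting: append a suffix so one hot key becomes SALT_BUCKETS distinct keys.
# Round-robin keeps this sketch deterministic; Spark jobs usually use a random salt.
salted_rows = [f"{country}_{i % SALT_BUCKETS}" for i, country in enumerate(rows)]

plain = partition_sizes(rows)
salted = partition_sizes(salted_rows)
germany_keys = {key for key in salted_rows if key.startswith("Germany_")}

print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(salted.values()))
print("distinct salted Germany keys:  ", len(germany_keys))
```

Without the salt, all 800 Germany rows share one key and therefore one partition. With the salt there are only `SALT_BUCKETS` distinct Germany keys, each carrying a slice of the rows.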
@sanskarsuman9340 1 year ago
I have a doubt: when you say the data is partitioned on country and there are five different countries, of which, let's say, Germany has 80% of the data, how can we say that the Germany data is in a single partition only? A partition is determined by the block size, with 1 partition = 128 MB, so depending on its size, couldn't the Germany data be split into multiple partitions automatically?
@ndbweurt34485 1 year ago
I had the same question.
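On the 128 MB question above: size-based splitting applies when Spark reads files, while a wide transformation such as a join or groupBy on country redistributes rows by key hash, and that is where one key can dominate a partition. A rough pure-Python sketch of the difference follows. All of it is made-up illustration: each row counts as one size unit, 128 units per read partition is a scaled-down stand-in for `spark.sql.files.maxPartitionBytes`, and `zlib.crc32` stands in for the hash partitioner.

```python
import zlib
from collections import Counter

MAX_ROWS_PER_READ_PARTITION = 128  # scaled-down stand-in for a byte limit
NUM_SHUFFLE_PARTITIONS = 8         # assumed shuffle partition count

# Made-up data: Germany holds 80% of 1,000 rows.
rows = (["Germany"] * 800 + ["France"] * 50 + ["India"] * 50
        + ["Brazil"] * 50 + ["Japan"] * 50)

# Read stage: rows are chunked by size only, so no chunk exceeds the limit.
read_partitions = [
    rows[i:i + MAX_ROWS_PER_READ_PARTITION]
    for i in range(0, len(rows), MAX_ROWS_PER_READ_PARTITION)
]
read_sizes = [len(partition) for partition in read_partitions]


def shuffle_partition(key: str) -> int:
    """Shuffle stage: a row's partition depends only on the hash of its key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHUFFLE_PARTITIONS


# Every row with the same key ends up in the same shuffle partition.
shuffle_sizes = Counter(shuffle_partition(country) for country in rows)

print("read partition sizes:   ", read_sizes)
print("shuffle partition sizes:", sorted(shuffle_sizes.values(), reverse=True))
```

The read partitions come out evenly sized, but after the key-based shuffle one partition holds every Germany row, which is why a read-side size limit alone does not remove this kind of skew.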
@iamkiri_ 7 months ago
Thanks for the video. I have a question: is the salting technique applied while reading the data from the source or during intermediate processing in the application?
@rajasdataengineering7585 7 months ago
It is applied during the transformation stage, not at data extraction.
@iamkiri_ 7 months ago
Thanks, bro.