Performance Tuning in Spark

If you need any guidance, you can book time here: topmate.io/bha...
Follow me on LinkedIn
/ bhawna-bedi-540398102
Instagram
www.instagram....
You can support my channel at: bhawnabedi15@okicici
Here are the links you might need to recheck!
JOIN STRATEGIES IN SPARK
• 35. Join Strategy in ...
CHOOSE RIGHT CLUSTER CONFIGURATION
• 22. How to select Work...
• Databricks Cluster Cre...
CORRECTLY PARTITION THE DATA
• Partitions in Data bricks
• 8. Delta Optimization...
Z-ORDER/COMPACTING
• 8. Delta Optimization...

Published: Oct 4, 2024

Comments: 11

@oldoctopus393 1 year ago

1) 0:54 - not correct. Datasets and DataFrames have to be serialized and deserialized as well, but since these APIs impose structure on the data, these processes can be faster. Overall, RDDs provide more control to Spark in terms of data manipulations; 2) not all DataFrames can be cached; 3) UDFs can be converted into native JVM bytecode with the help of the Catalyst optimizer. You can use df.explain() to see something like "Generated code: Yes" or "Generated code: No" in the output.

@krishnasai7550 2 months ago

Hi Bhawna, I learned somewhere that we cannot uncache the data but we can unpersist it, so we use persist more in place of cache. But here you mentioned we can uncache. I'm a bit confused; which is correct?
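
On the cache versus unpersist question above, a minimal PySpark sketch may help. As far as the public API goes, a DataFrame has `cache()`, `persist()` and `unpersist()` but no `uncache()` method; `unpersist()` releases data cached by either call, while the "uncache" wording belongs to the table-level API (`spark.catalog.uncacheTable`, or `UNCACHE TABLE` in SQL). The function name, the view name `numbers` and the row count are made up for illustration, and `spark` is assumed to be an existing SparkSession, as in a Databricks notebook.

```python
def cache_then_release(spark, rows=1_000_000):
    """Sketch only. `spark` is assumed to be an existing SparkSession."""
    df = spark.range(rows)

    # DataFrame level: cache() is persist() with the default storage level.
    df.cache()       # lazy, only marks the DataFrame for caching
    df.count()       # an action materializes the cached data
    df.unpersist()   # releases it; there is no DataFrame.uncache()

    # Table level: this is where the "uncache" wording comes from.
    df.createOrReplaceTempView("numbers")   # hypothetical view name
    spark.catalog.cacheTable("numbers")
    cached_before = spark.catalog.isCached("numbers")
    spark.catalog.uncacheTable("numbers")
    cached_after = spark.catalog.isCached("numbers")
    return cached_before, cached_after
```

Calling `cache_then_release(spark)` in a notebook should report the view as cached before the `uncacheTable` call and not cached after it. So both statements can be reconciled: you unpersist a DataFrame, and you uncache a table or view.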

@CoolGuy 11 months ago

Bucketing and salting are also good optimization techniques.
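
For readers unfamiliar with salting, here is a rough illustration of the idea in plain Python rather than Spark: a random suffix is appended to a skewed key so that its rows spread across several partitions instead of piling up in one. Everything here is an assumption made for the sketch: the key names, the 8 partitions, the 16 salt values, and the use of `zlib.crc32` as a stand-in for Spark's own hash partitioner. In a real salted join, the other side of the join would also have to be replicated once per salt value so the keys still match.

```python
import random
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (not Spark's actual hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many rows land in each partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts, seed=0):
    # Append a random salt in [0, num_salts) to every key.
    rng = random.Random(seed)
    return [f"{k}_{rng.randrange(num_salts)}" for k in keys]


# Hypothetical skewed data: one "hot" key holds 90% of the rows.
keys = ["hot"] * 9_000 + [f"k{i}" for i in range(1_000)]

plain = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt_keys(keys, num_salts=16), num_partitions=8)

print("largest partition without salt:", max(plain))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every "hot" row hashes to the same partition, so one task does most of the work; with the salt, the largest partition is noticeably smaller.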

@EDWDB 1 year ago

Thanks Bhawna, can you please make a video on monitoring and troubleshooting Spark jobs via the UI?

@tanushreenagar3116 9 months ago

So nice, it helps a lot.

@AyushSrivastava-gh7tb 1 year ago

Hi Bhawna. Your videos have helped me immensely in my Databricks journey and I have nothing but appreciation for your work. Just a humble request: could you also please make a video on Databricks Unity Catalog?

@cloudfitness 1 year ago

Yes, already done, with a playlist on UC 😀