Apache Spark Executor Tuning | Executor Cores & Memory

Welcome back to our comprehensive series on Apache Spark Performance Tuning & Optimisation! In this guide, we dive deep into the art of executor tuning in Apache Spark to ensure your data engineering tasks run efficiently.
🔹 What is inside:
Learn how to properly allocate CPU and memory to your Spark executors, and how many executors to create, to achieve optimal performance. Whether you're new to Apache Spark or an experienced data engineer looking to refine your Spark jobs, this video provides valuable insights into configuring the number of executors, memory, and cores for peak performance. I’ve covered everything from understanding the basic structure of Spark executors within a cluster, to advanced strategies for sizing executors optimally, including detailed examples and calculations.
📘 Resources:
📄 Complete Code on GitHub: github.com/afaqueahmad7117/sp...
🎥 Full Spark Performance Tuning Playlist: • Apache Spark Performan...
🔗 LinkedIn: / afaque-ahmad-5a5847129
Chapters:
0:00 - Introduction to Executor Tuning in Apache Spark
0:37 - Understanding Executors in a Spark Cluster
3:30 - Example: Sizing Executors in a Cluster
4:58 - Example: Sizing a Fat Executor
9:34 - Example: Sizing a Thin Executor
12:50 - Advantages and Disadvantages of Fat Executor
18:25 - Advantages and Disadvantages of Thin Executor
22:12 - Rules for sizing an Optimal Executor
26:30 - Example 1: Sizing an Optimal Executor
38:15 - Example 2: Sizing an Optimal Executor
43:50 - Key Takeaways
#ApacheSparkTutorial #SparkPerformanceTuning #ApacheSparkPython #LearnApacheSpark #SparkInterviewQuestions #ApacheSparkCourse #PerformanceTuningInPySpark #ApacheSparkPerformanceOptimization #ApacheSpark #DataEngineering #SparkTuning #PythonSpark #ExecutorTuning #SparkOptimization #DataProcessing #pyspark #databricks

Published: 15 Jul 2024

Comments: 80

@bijjigirisupraja8021 7 days ago

Bro, please post Spark videos regularly; it will be very helpful. Thank you

@SandeepPatel-wt7ye 9 days ago

This is awesome stuff. The executor tuning concept is explained at a very granular level.

@afaqueahmad7117 7 days ago

Appreciate it @SandeepPatel-wt7ye, thank you!

@mohitupadhayay1439 5 days ago

Really waiting to see if you can add some real-world use cases to your videos to strengthen our understanding. It will be appreciated a lot, man!

@BabaiChakraborty-ss8pt 2 months ago

Man, your tutorials are the best. I have been following you for Spark tuning-related videos. Thanks

@afaqueahmad7117 2 months ago

Thank you @BabaiChakraborty-ss8pt, really appreciate it, means a lot to me :)

@mayapareek2844 1 month ago

Wow !! Great Content !! I am preparing for interviews and found this super helpful. Thanks a Ton !!

@afaqueahmad7117 1 month ago

Glad you're finding it helpful @mayapareek2844, heartfelt thanks :)

@saineelkiranch9790 2 months ago

Excellent. Very well explained.

@afaqueahmad7117 2 months ago

Thank you @saineelkiranch9790, really appreciate it :)

@seenu0104 3 months ago

Thank you very much for this amazing content with super easy explanation 👏👏

@afaqueahmad7117 2 months ago

Thank you @seenu0104, really appreciate it :)

@ComedyXRoad 2 months ago

Thanks for the content and your efforts

@afaqueahmad7117 2 months ago

Thank you @ComedyXRoad, appreciate the kind words :)

@iamexplorer6052 3 months ago

Thanks for this. I'm currently working on job optimization, and it is very useful to me.

@afaqueahmad7117 3 months ago

Thank you, really appreciate it :)

@adtempgupta 3 months ago

Thank you so much for the wonderful content. Please start PySpark sessions.

@sankarshkadambari2742 2 months ago

Amazing is the word; you never disappoint us. Very grateful and indebted to you for the excellent content you are creating. God bless you!

@afaqueahmad7117 2 months ago

Thank you @sankarshkadambari2742, really appreciate it, means a lot to me :)

@AshishStudyDE 1 month ago

Great work, keep it going. I hope you cover 2 more topics: driver OOM and executor OOM. Why they happen and how we can tackle them.

@leilaturgarayeva105 3 months ago

Thank you for the useful content! IRL an analyst / engineer would have access to a huge cluster which is shared between many people / teams. It would be very interesting to watch a video where you calculate the amount of resources that should be requested based on the task at hand (particular dataset, task and output). And again - thanks for helping to understand these somewhat hard to grasp concepts :-)

@asokanramasamy2087 2 months ago

Great! If possible, please make a video on Spark Streaming as well!

@iamkiri_ 2 months ago

Awesome :)

@afaqueahmad7117 2 months ago

Thank you @iamkiri_, really appreciate it :)

@purnimasharma9734 1 month ago

Hello Afaque, your tutorials are excellent and I learnt so much about optimization techniques. I am wondering if you can add some real-world use cases to your videos to strengthen our understanding. It will be appreciated a lot.

@wreckergta5470 3 months ago

Thanks

@afaqueahmad7117 3 months ago

Appreciate it, @wreckergta5470 :)

@Amarjeet-fb3lk 2 months ago

Thanks for these videos. I have been watching your videos for quite a while. You explain things in a very easy and simple manner. But I think in real-world jobs we would be processing a very large amount of data, so it would be great if you could make a video on processing large amounts of data with all the optimisation techniques we can use. Thanks in advance.

@afaqueahmad7117 1 month ago

Hey @Amarjeet-fb3lk, Thank you so much for the kind words; they truly mean a lot! I'm delighted to hear that you find the explanations easy and simple to understand. While production/large-scale projects are in the future plans, I would like to emphasize that the fundamental concepts and optimization techniques remain the same. My goal is to help you build a rock solid understanding of these concepts so you can confidently apply them in any scenario.

@yashwantdhole7645 17 days ago

Hi Afaque, it was a really nice video. I never got such a detailed understanding anywhere else. Do you also provide 1:1 sessions? If yes, I am highly interested.

@afaqueahmad7117 16 days ago

Hey @yashwantdhole7645, appreciate the kind words, means a lot. At this moment, I do not take 1:1 sessions, but if you have any questions feel free to shoot an email or comment here in this thread :)

@dataterre 2 months ago

Thanks Afaque, this is an excellent video to start my Saturday morning. It has been on my to-do list for the whole week. A couple of questions for you / the community, since this is very relevant to my current work:
1) Considering we are "exhausting" the cluster resources, could you explain where the driver node comes into the picture in this pool of resources (e.g. --driver-memory)? I presume a sizeable amount of driver memory is required, since we tend to collect data in the driver node with a count(), etc.
2) I understand the concept of optimal executor sizing here. Suppose my application abstraction involves running multiple Spark sessions in parallel; then this optimal tuning would mean I can only run 1 spark-submit job in the entire cluster, right?
Excellent video, again.

@afaqueahmad7117 2 months ago

Hi @dataterre, thank you for the kind words, means a lot to me :) On the questions:
1. Indeed, a reasonable amount of cores and memory is required for the driver, because it coordinates the lifecycle of the application, manages communication, and creates and schedules the tasks to be executed on executors. This video focuses specifically on "executor" tuning, so driver resource allocation is skipped, but it's important to note (as you rightly pointed out) that the driver needs resources for its own functioning / executing its responsibilities, plus for collecting data returned by actions (count(), show(), etc.). I would subtract an appropriate number of driver cores and driver memory from the total cluster cores/memory first, and then do the executor sizing discussed in the video.
2. Yes, this example assumes you're taking up the whole cluster for best utilization. However, if you're looking to run multiple Spark sessions in parallel, you could do the following:
a. Enable dynamic allocation (by setting `spark.dynamicAllocation.enabled` to `true`) so that each application scales its executors up and down with its workload instead of holding a fixed allocation.
b. Define a reasonable minimum and maximum number of executors per application (using `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors`).
c. Adjust `spark.executor.cores` and `spark.executor.memory` using the principles/rules discussed in the video, to ensure that each application gets enough resources to perform efficiently, but not so much that it monopolizes cluster resources.
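A minimal sketch of point 2 above, assuming you only want to assemble the spark-submit flags from a plain Python script. The helper name `dynamic_allocation_conf`, the script name `app.py`, and all numeric values are hypothetical; the property names are standard Spark configuration keys. The shuffle-tracking line is an addition not mentioned in the reply: dynamic allocation needs some way to keep shuffle data available when executors are removed, and shuffle tracking (Spark 3.0+) is one option, an external shuffle service (`spark.shuffle.service.enabled`) being another.

```python
def dynamic_allocation_conf(min_executors: int, max_executors: int,
                            executor_cores: int, executor_memory: str) -> list[str]:
    """Hypothetical helper: build --conf flags for one application that
    shares a cluster via dynamic allocation. Values are illustrative only."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Assumption: Spark 3.0+, using shuffle tracking instead of an
        # external shuffle service so executors can be released safely.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Per-executor size, chosen with the sizing rules from the video.
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical application capped at 4 executors of 5 cores / 20g each.
    cmd = ["spark-submit", *dynamic_allocation_conf(1, 4, 5, "20g"), "app.py"]
    print(" ".join(cmd))
```

Capping `maxExecutors` per application is what keeps one job from monopolizing a shared cluster; the per-executor shape still follows the same sizing rules.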

@chitransh847 11 days ago

Sir, can you please bring a Python and SQL series for interview prep, including the basics? The rest of the content is just great!

@afaqueahmad7117 7 days ago

Thank you, appreciate it @chitransh847, Python coming soon :)

@yatinchadha1803 1 month ago

Thanks Afaque for this great tutorial. This will really help while working on Spark optimization. It would be of great help if you could tell us how you deal with this type of question:
- Spark cluster size: 200 cores and 100 GB RAM
- Data to be processed: 100 GB
- Give the calculation for driver memory, driver cores, executor memory, overhead memory, and number of executors.

@afaqueahmad7117 1 month ago

Hey @yatinchadha1803, thanks for the kind words, really appreciate it. Regarding the question - after watching the video, it should be a cakewalk :)

@yatinchadha1803 1 month ago

@afaqueahmad7117 Can you please guide me on how to calculate the driver memory and driver cores?

@remedyiq8034 3 months ago

Hi, can you please make a video on understanding the Spark UI or the Databricks Spark UI? There are a lot of tabs there; it's tough to understand.

@afaqueahmad7117 2 months ago

Hey @remedyiq8034, could you share which tabs are troubling you? I've already discussed the most important ones; links below:
1. Storage tab: Caching video (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-FujwRYkBwM4.html)
2. SQL tab: Master Reading Spark Query Plans video (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-KnUXztKueMU.html)
3. Jobs/Stages/SQL: Unlock Performance With Spark DAG Mastery video (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-O_45zAz1OGk.html)

@Amarjeet-fb3lk 1 month ago

Hi @Afaque, I watched this video previously, and I am still watching many more videos that cover Spark memory management and reading articles on Spark memory and partitions. Here are some points that I have learnt:
1. Memory for each core should be 4 times 128 MB.
2. The total number of partitions should be 4 × the number of cores.
But how should we decide the number of partitions, each partition's size, and the memory for each core? These things will change according to our data. So, can you answer these 3 questions? Thanks.

@atifiu 2 months ago

Thanks Afaque for this video. I have a question regarding task-level and executor-level parallelism. As per my understanding, 1 partition = 1 task = 1 core/thread, so how is task-level parallelism achieved? 1 task will be assigned to only one core, which means that within an executor the remaining 46 cores will not be utilized if the number of tasks is, say, only 5.

@maheshmahadev9918 2 months ago

Great explanation, thanks!! I have a question: can you explain the basis for choosing these numbers? Is it based on the incoming data that needs to be processed? In that case, what data size was considered for the calculations in this video? Thanks again

@afaqueahmad7117 2 months ago

Hey @maheshmahadev9918, the numbers for the cluster (X nodes, Y cores, Z RAM) are for illustration and independent of the incoming data size. As discussed at 34:06, the reason I'm not talking about incoming data sizes is that sizing should be tailored based on the "Memory per core". The most granular unit of data is going to be a "partition", and as long as the core has enough memory to process that partition, things will run fine. I'd suggest re-watching 34:06 if this is unclear :)

@satheeshkumar2149 3 months ago

How much of memory or core should we set aside for the internal stuff if we have got a standalone cluster instead of YARN ?

@ShubhamWakshe-e4c 15 days ago

You talked about the YARN Application Master. Is the driver the one that contains the Application Master container? That means we are assigning 1 GB of driver memory, right?

@rohitdeshmukh7274 1 month ago

Very informative video. I have one question. I have a Databricks cluster with autoscaling enabled. Will the calculations change in that case?

@adusumillisudheer2772 13 days ago

Same question from me too: when autoscaling is enabled, how does it tune the workers and the executors inside them?

@Wonderscope1 1 month ago

I really enjoy your videos. Thanks for sharing your knowledge. I have a question about how you create these videos; it is an amazing way to create tutorials. Would you mind sharing what tools you use to make them? Thanks

@afaqueahmad7117 1 month ago

Thank you @Wonderscope1, really appreciate it. I use Notion and Miro :)

@Wonderscope1 1 month ago

@afaqueahmad7117 I am familiar with Notion as a project management tool; I didn't know it could help with video production. I need to look into that. Thanks 😊

@afaqueahmad7117 1 month ago

Sorry I meant Notion for the code snippets. I use Ecamm Live for video production :)

@Wonderscope1 1 month ago

@afaqueahmad7117 Perfect, that's what I was looking for. Thanks :)

@naveenreddybedadala 29 days ago

Will that final, actual executor memory be split again into user, reserved, unified, and overhead memory?

@maheshh1695 2 months ago

Hi, thanks for sharing the information. In the fat executor case, since we have 5 nodes and each node has only one executor, the number of cores should be 5 × 11, i.e. 55 cores, right?

@afaqueahmad7117 1 month ago

Hey @maheshh1695, total cores will be 55 while cores per node is 11
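To make the fat-vs-thin arithmetic concrete, here is a small sketch. It assumes 5 nodes with 12 cores and 48 GB RAM each (inferred from the 11 usable cores per node and the 47 GB quoted in these comments, not stated in the description), reserves 1 core and 1 GB per node for OS/Hadoop daemons, and treats a thin executor as 1 core. It does not set aside capacity for the YARN Application Master or split out memory overhead; `size_executors` and `ExecutorLayout` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorLayout:
    executors: int               # total executors across the cluster
    cores_per_executor: int
    memory_per_executor_gb: int  # per-executor budget, before overhead is split out


def size_executors(nodes: int, cores_per_node: int, mem_per_node_gb: int,
                   cores_per_executor: int, reserved_cores: int = 1,
                   reserved_mem_gb: int = 1) -> ExecutorLayout:
    """Hypothetical helper: pack equal-sized executors onto each node."""
    usable_cores = cores_per_node - reserved_cores     # minus OS/daemon core
    usable_mem_gb = mem_per_node_gb - reserved_mem_gb  # minus OS/daemon memory
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node == 0:
        raise ValueError("cores_per_executor exceeds usable cores per node")
    return ExecutorLayout(
        executors=nodes * executors_per_node,
        cores_per_executor=cores_per_executor,
        memory_per_executor_gb=usable_mem_gb // executors_per_node,
    )


if __name__ == "__main__":
    for label, cores in [("fat", 11), ("thin", 1), ("5 cores", 5)]:
        layout = size_executors(nodes=5, cores_per_node=12, mem_per_node_gb=48,
                                cores_per_executor=cores)
        total_cores = layout.executors * layout.cores_per_executor
        print(f"{label:>7}: {layout}, total cores = {total_cores}")
```

Under these assumptions the fat layout gives 5 executors × 11 cores (55 cores, 47 GB each), the thin layout 55 executors × 1 core (4 GB each), and 5 cores per executor gives 10 executors × 5 cores with 23 GB each, leaving one core per node idle.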

@roshankumargupta46 2 months ago

Hi Afaque! Can you confirm whether I'm wrong here? Do thin executors promote more parallelism than fat executors? In the case of thin executors, the number of executors will be higher, resulting in more individual cores, which will eventually promote parallelism. Whereas with fat executors, all cores will be consumed by the executors, which may lead to wastage of resources.

@remedyiq8034 2 months ago

At 35:10 @afaqueahmad7117, I want to add one point. You said that executions happen in execution memory, which is 60%, and 40% is user memory. So 60% of 20 GB is 12 GB of memory, out of which 50% is for execution and 50% for storage. Let's assume 50% is given to execution (static allocation). Out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6/5 ≈ 1.2 GB of memory per core. The maximum partition size that can be accommodated is 1.2 GB. Is my thought process correct?

@iamkiri_ 2 months ago

Looks like this is a valid question, bro!

@afaqueahmad7117 2 months ago

Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation!
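For readers following this thread, a small sketch of the breakdown discussed above. It assumes Spark's unified memory manager with its default settings (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5) and the fixed 300 MB of reserved memory, which the back-of-the-envelope numbers above leave out. The storage/execution split is a soft boundary: execution can borrow storage memory that is not in use, so the per-core execution figure is a conservative estimate rather than a hard cap. The function name `heap_breakdown` is hypothetical.

```python
RESERVED_MB = 300  # fixed reserved system memory in Spark's unified memory model


def heap_breakdown(executor_memory_mb: float, executor_cores: int,
                   memory_fraction: float = 0.6,
                   storage_fraction: float = 0.5) -> dict[str, float]:
    """Hypothetical helper: split an executor heap (spark.executor.memory)
    into its regions. Defaults mirror spark.memory.fraction and
    spark.memory.storageFraction."""
    usable = executor_memory_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared execution + storage pool
    storage = unified * storage_fraction  # storage's protected share
    execution = unified - storage         # execution's share at the boundary
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,         # user data structures, internal metadata
        "storage": storage,
        "execution": execution,
        "execution_per_core": execution / executor_cores,
    }


if __name__ == "__main__":
    for region, mb in heap_breakdown(20 * 1024, executor_cores=5).items():
        print(f"{region:>18}: {mb:8.1f} MB")
```

For a 20 GB (20,480 MB) heap and 5 cores this gives about 6,054 MB of execution memory, or roughly 1,211 MB (about 1.18 GB) per core, close to the ~1.2 GB estimated above.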

@remedyiq8034 2 months ago

@afaqueahmad7117 Thanks! I learned a lot from you and watched all your videos. Keep doing great work for the community. Better than paid courses on Udemy!!

@ShubhamWakshe-e4c 15 days ago

If we are already allotting 1 core and 1 GB RAM for YARN/OS daemons, then why do we need to allot a separate 1 core and 1 GB, or one executor, for the YARN resource manager?

@suresh.suthar.24 2 months ago

Wonderful explanation, Ahmad. I have one doubt: in your example, 23 GB of memory will be assigned to each executor, and then 10% will be excluded for overhead memory, so we will be left with 20 GB of memory for the executor. So now this 20 GB is on-heap memory, and it will be divided into reserved memory, storage memory and execution memory. Am I right or wrong? Please reply; I have asked my seniors, but they don't have an answer. Thank you in advance!

@afaqueahmad7117 2 months ago

Hey @SS1251, you're correct! The 20 GB of memory is indeed on-heap memory, and it will be divided into reserved, user, storage, and execution memory. The memory defined through `--executor-memory` or `spark.executor.memory` is what is allocated to the on-heap region. You can refer to this video to get a better understanding: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sXL1qgrPysg.html :)

@swapnilpatil18 1 month ago

Hi, in the case of the fat executor, we assigned all of the remaining 47 GB to the executor (1 GB for Hadoop/YARN ops). In this case, where will the executor overhead memory come from?

@afaqueahmad7117 1 month ago

Hey @swapnilpatil18, good question. In the initial parts of the video (before explaining the 4 rules to size an optimal executor), the goal of explaining fat executors was only to point out that they take up a large portion of the memory on a node, which is why I didn't separate out the respective parts, i.e. overhead memory and AM memory. However, your understanding is absolutely correct. The ideal calculation should subtract max(384 MB, 10% of 47 GB) = max(384 MB, 4.7 GB) = 4.7 GB per executor before calculating the `--executor-memory`.
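The rule in this reply can be written as a tiny sketch. It assumes the default overhead formula, max(384 MB, 10% of executor memory), which Spark applies to `spark.executor.memory` (the heap) unless `spark.executor.memoryOverhead` is set explicitly. Because the 10% is taken of the heap rather than of the whole container, the helper `max_heap_for_budget` (a hypothetical name) solves for the largest heap that still fits a given per-executor budget, instead of simply subtracting 10% of the budget.

```python
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10  # Spark's default overhead factor for JVM executors


def overhead_mb(executor_memory_mb: int) -> int:
    """Default executor memory overhead: max(384 MB, 10% of executor memory)."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))


def max_heap_for_budget(budget_mb: int) -> int:
    """Hypothetical helper: largest --executor-memory (MB) whose heap plus
    default overhead still fits in a per-executor container budget."""
    if budget_mb <= MIN_OVERHEAD_MB:
        raise ValueError("budget too small to hold even the minimum overhead")
    # Closed-form starting guess (always feasible), then nudge to the exact max.
    heap = min(budget_mb - MIN_OVERHEAD_MB,
               int(budget_mb / (1 + OVERHEAD_FACTOR)))
    while heap + overhead_mb(heap) > budget_mb:            # guard against rounding
        heap -= 1
    while heap + 1 + overhead_mb(heap + 1) <= budget_mb:   # use any leftover slack
        heap += 1
    return heap


if __name__ == "__main__":
    print(overhead_mb(47 * 1024))          # overhead for a 47 GB fat-executor heap
    print(max_heap_for_budget(23 * 1024))  # heap that fits a 23 GB budget
```

For the 23 GB per-executor budget discussed in the comments, `max_heap_for_budget(23 * 1024)` returns 21411 MB (about 20.9 GB), in line with the ~20 GB rule-of-thumb figure.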

@vikastangudu712 3 months ago

Great video, thanks for the explanation. But how would a fat executor improve data locality? A node can be broken into 11 executors or 1 executor; the HDFS storage (or some other storage) within the node is still the same for all the executors on that node. Data locality is about storage, not memory. Thus fat/thin has no effect on data locality.

@rambabuposa5082 2 months ago

Because a fat executor has more memory, it can store more partitions of your dataset, so not much shuffling of data is required; it also increases data locality (i.e. most of its required partitions are stored within that fat executor).

@afaqueahmad7117 2 months ago

Hey @vikastangudu712, you're correct in saying that data locality is about "storage". However, what I'm referring to is the interplay with "memory" once data is loaded into memory: "how much" data can be processed without going through the overhead of loading it from disk again. Several operations benefit from this "memory" locality. In Spark, the best form of locality is `PROCESS_LOCAL`, which means that the data required for a task is present in the memory of the same JVM. Fat executors occupying most of a node's memory therefore benefit here, since the chances of data being present in the same JVM increase. Hope this clarifies :)

@rambabuposa5082 2 months ago

Hi @afaqueahmad7117, at 35:30 you were discussing "Memory per core", which is 4 GB per core. If we have partitions of 128 MB or 256 MB with this 4 GB-per-core configuration, does that mean inefficient utilisation of resources (memory)? One core can process up to 4 GB, but the partition size is much smaller. Do we need to reduce the "Memory per core" to get better performance and efficient utilisation of resources? Many thanks

@afaqueahmad7117 2 months ago

Hey @rambabuposa5082, good question! 4 GB per core was just an example. If the partition sizes are 128 MB or 256 MB, this would indeed be underutilising the cluster. You could reduce the memory per core while leaving some room for overhead (maybe 400 MB per core for a 256 MB partition); however, it's important to keep in mind the 4 rules of the game as discussed (e.g. keeping number of cores

@remedyiq8034 2 months ago

@afaqueahmad7117 I want to add one point. You said that executions happen in execution memory, which is 60%, and 40% is user memory. So 60% of 20 GB is 12 GB of memory, out of which 50% is for execution and 50% for storage. Let's assume 50% is given to execution (static allocation). Out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6/5 ≈ 1.2 GB of memory per core. The maximum partition size that can be accommodated is 1.2 GB. Is my thought process correct?

@afaqueahmad7117 2 months ago

Copying the same answer as in the previous comment for the community :) """ Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation! """

@Amarjeet-fb3lk 1 month ago

Hi, I watched this video till the end. Very good explanation, but I have the doubts below. If there are 5 cores per executor and, at shuffle time, Spark creates 200 partitions by default, how will those 200 partitions be created if there are fewer cores, given that 1 partition is stored on 1 core? Suppose my config is 2 executors, each with 5 cores. How will it create 200 partitions if I do a group-by operation? There are 10 cores, and 200 partitions are required to store them, right? How is that possible?

@afaqueahmad7117 1 month ago

Hi @Amarjeet-fb3lk, thanks again for the kind words. Regarding your question, you're right that 1 partition is processed by 1 core. With the configuration you shared there are 2 × 5 = 10 cores in total, but the number of cores does not need to match the number of partitions at any given moment. Spark will create 200 partitions during the shuffle by default, and it will manage their execution by scheduling tasks in chunks based on resource availability: first 10 partitions are assigned to the 10 cores, and once those cores are freed, the next 10, and so on, until all 200 partitions are processed.
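The scheduling described in this reply can be approximated with a simple model. This is a sketch under idealised assumptions: all tasks take equal time, and each task uses one core, as with the default `spark.task.cpus` = 1. In practice the scheduler launches a new task as soon as any slot frees up rather than in lockstep waves, and adaptive query execution may coalesce the 200 shuffle partitions (`spark.sql.shuffle.partitions`) into fewer. The function `task_waves` is hypothetical.

```python
import math


def task_waves(num_partitions: int, executors: int, cores_per_executor: int,
               task_cpus: int = 1) -> tuple[int, int, int]:
    """Hypothetical, idealised model: every task takes the same time, so
    tasks run in 'waves' of at most `slots` concurrent tasks.

    Returns (slots, waves, idle slots in the last wave)."""
    slots = executors * (cores_per_executor // task_cpus)  # concurrent task slots
    if slots == 0:
        raise ValueError("no task slots: task_cpus exceeds cores_per_executor")
    waves = math.ceil(num_partitions / slots)
    idle_slots_last_wave = waves * slots - num_partitions
    return slots, waves, idle_slots_last_wave


if __name__ == "__main__":
    print(task_waves(200, executors=2, cores_per_executor=5))
    print(task_waves(5, executors=1, cores_per_executor=47))
```

With the 2 × 5-core configuration from the question, 200 partitions run as 20 rounds of 10 tasks. The same model also covers the earlier question about a few tasks on a large executor: 5 tasks on a 47-core executor leave 42 cores idle for that stage.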

@Amarjeet-fb3lk 1 month ago

@afaqueahmad7117 Thanks for your response, Afaque. Learning and going deep into these topics brings me lots of doubts and questions. Thanks for the answer; I highly appreciate it.

@tushibhaque863 18 days ago

Thanks, and please provide contact details. Also, do you take classes?

@afaqueahmad7117 16 days ago

Hey @tushibhaque863, appreciate the kind words. At this moment, I do not take classes, but if you have any questions feel free to shoot an email or comment here in this thread :)