Spark Interview Question | How many CPU Cores | How many executors | How much executor memory 

Learning Journal
Published: 4 Oct 2024

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Comments: 47
@vinayak6685 10 months ago
With this formula, executor memory is always 2.5 GB, rounded up to 3 GB. For X GB of data: number of cores = 8 × X, and number of executors = (8/5) × X.
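The rule of thumb in this comment can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the defaults used in the video (128 MB partitions, 5 cores per executor); the function name `size_for_max_parallelism` is made up for illustration and is not a Spark API.

```python
import math

PARTITION_MB = 128        # default partition size assumed in the video
CORES_PER_EXECUTOR = 5    # per-executor core cap assumed in the video


def size_for_max_parallelism(data_gb):
    """Return (partitions, cores, executors) for one-shot processing."""
    partitions = math.ceil(data_gb * 1024 / PARTITION_MB)
    cores = partitions                                  # one core per partition
    executors = math.ceil(cores / CORES_PER_EXECUTOR)   # round up to whole executors
    return partitions, cores, executors


print(size_for_max_parallelism(10))   # (80, 80, 16)
```

With 1 GB = 1024 MB, each GB yields 8 partitions, which is where the 8 × X and (8/5) × X factors come from.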
@sayedsamimahamed5324 6 months ago
Excellent. Thank you, sir.
@ETLMasters 1 year ago
Very well explained. Thank you so much.
@KARTHIKEYATHANNIRU-w8t 2 days ago
For the above example, how many nodes would the cluster have?
@prasannakumar7097 4 months ago
Very well explained
@mnaveenvamshi3651 6 months ago
Awesome explanation.
@sonurohini6764 3 months ago
Great, but the interviewer's follow-up question is how we arrive at 4x memory per executor.
@amlansharma5429 3 months ago
Spark reserved memory is 300 MB, and executor memory should be at least 1.5 times the reserved memory, i.e. 450 MB. That is why we take the memory per core as 4x the partition size, which comes to 512 MB per core.
@navasampath 1 year ago
Hi, is 4x a kind of standard? Please confirm.
@Matrix_Mayhem 8 months ago
Thanks Sir!
@vikaschavan6118 1 year ago
Can you please explain why 4x memory is required for each core?
@TheWqrahd 11 months ago
The basic idea here is that when we read compressed parquet files and load them into memory, they tend to expand. In general, we assume that the data can increase in size by about three to four times once it's uncompressed in memory. That's likely where this number comes from.
@vikaschavan6118 6 months ago
Thanks for your reply
@tridipdas9930 4 months ago
What if the cluster size is fixed? Also, shouldn't we take per-node constraints into account? For example, what if a node has only 4 cores?
@marreddyp3010 1 year ago
16 executors, each with 5 cores and 3 GB of RAM. How much data can be cached in each executor? How much data can be processed? What about shuffling, for narrow and wide transformations? Are there any out-of-memory issues? Do you really think a total of 80 cores and 3 × 16 = 48 GB of RAM is required to process 10 GB of data? Please give a complete answer, sir.
@ScholarNest 1 year ago
That's how the formula works for maximum parallelism and doing everything in one shot. You can run this on a single executor with 5 cores and 3 GB memory. It will work smoothly.
@ranjithrampally7982 1 year ago
@ScholarNest At 4:37, we assign a minimum of 4x memory for each core. How did we arrive at 4x? Why not some other number? Can you please answer, sir?
@vinayak6685 10 months ago
We gave each core four times the partition size in memory. Think of it as splitting one core's memory into 4 portions: the data sits in one portion, and the remaining 3 portions cover everything else the core has to do, so the executor does not hit an OOM. If you find that this is still not sufficient, use a larger multiplier in the memory allocation formula, for example 5 instead of 4.
@arnabghosh21 2 months ago
@vinayak6685 If you look at Spark's memory distribution, you will find that the execution part of an executor gets only about 1/3 of the executor memory. If we also account for off-heap memory, execution generally uses only about 1/4 of the executor memory. The same ratio applies at the core level as well. @ScholarNest sir, correct me if I am wrong.
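The fractions mentioned in this comment can be compared against Spark's unified memory model. The following is a rough sketch, assuming the documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, and 300 MB of reserved memory); the helper `unified_memory_split` is hypothetical, and the heap the JVM actually reports is usually slightly smaller than the configured executor memory.

```python
RESERVED_MB = 300          # memory Spark sets aside before splitting the heap
MEMORY_FRACTION = 0.6      # default spark.memory.fraction
STORAGE_FRACTION = 0.5     # default spark.memory.storageFraction


def unified_memory_split(heap_mb):
    """Approximate on-heap regions (in MB) for a given executor heap size."""
    usable = heap_mb - RESERVED_MB
    unified = usable * MEMORY_FRACTION        # shared by execution and storage
    storage = unified * STORAGE_FRACTION      # storage share protected from eviction
    execution = unified - storage             # soft boundary; can borrow from storage
    user = usable - unified                   # user data structures, metadata
    return {
        "unified_mb": round(unified, 1),
        "storage_mb": round(storage, 1),
        "execution_mb": round(execution, 1),
        "user_mb": round(user, 1),
    }


split = unified_memory_split(3 * 1024)        # the 3 GB executor from the video
print(split)
print(round(split["execution_mb"] / (3 * 1024), 2))   # execution share of the heap
```

Under these assumptions the execution region of a 3 GB executor is roughly 832 MB, a little over a quarter of the heap. The boundary is soft, so execution can use more when storage is idle.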
@ramu7571 16 days ago
If the recommended memory per executor is 3 GB, then a 10 GB file needs only 4 executors. How does the calculation arrive at 16? Kindly answer.
@sangu2227 8 months ago
Can you explain serialization in Spark with an example that shows proper results?
@arnabghosh21 2 months ago
For the same 10 GB file, suppose we have the following resources: 38 GB of worker memory with 10 cores, 8 GB of driver memory with 2 cores, and shuffle partitions manually set to 80. How will it behave?
@Sauravsuman11005 2 months ago
Data nodes = 10, with 16 CPUs per node and 64 GB of memory per node. Please tell me which cluster config we should choose.
@shivamdwivedi771 11 months ago
Sir, what if we are reading a 100 GB file? In that case the number of executors will be 160. Do you think 160 executors is correct here?
@vinothkannaramsingh8224 8 months ago
Good question
@ritikpatil4077 8 months ago
I tried the same with the configuration below.
Question: if you have 100 GB of data, how many cores and executors do you require, considering we have only 50 GB of RAM and 40 cores in total?
- The default partition size is 128 MB, and 100 GB is 102400 MB, so the total is 102400/128 = 800 partitions.
- To achieve the highest parallelism we need as many cores as partitions, but we don't have 800 cores. The recommended number of cores per executor is 5 for better I/O on HDFS.
- So 40/5 = 8, and we can create up to 8 executors.
- We have to distribute memory equally across these 8 executors: 50/8 = 6.25, so about 6 GB per executor.
- In the end we have 8 executors with 5 cores each. It will take some time to run all the data.
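The fixed-cluster reasoning in this comment can be written down as a small calculation. This is only a sketch under the comment's own assumptions (128 MB partitions, 5 cores per executor); `size_for_fixed_cluster` is a hypothetical helper, and it ignores memory for the driver and per-executor overhead, which a real cluster would need to reserve.

```python
import math


def size_for_fixed_cluster(data_gb, total_cores, total_memory_gb,
                           partition_mb=128, cores_per_executor=5):
    """Size executors when cores and memory are capped by the cluster."""
    partitions = math.ceil(data_gb * 1024 / partition_mb)
    executors = total_cores // cores_per_executor          # whole executors only
    memory_per_executor_gb = total_memory_gb // executors  # split memory evenly, round down
    slots = executors * cores_per_executor                 # tasks that can run at once
    waves = math.ceil(partitions / slots)                  # rounds needed to cover all partitions
    return {
        "partitions": partitions,
        "executors": executors,
        "memory_per_executor_gb": memory_per_executor_gb,
        "waves": waves,
    }


print(size_for_fixed_cluster(100, total_cores=40, total_memory_gb=50))
# {'partitions': 800, 'executors': 8, 'memory_per_executor_gb': 6, 'waves': 20}
```

The 800 partitions are processed 40 at a time, so the job needs about 20 rounds of tasks instead of finishing in one shot.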
@amanmishra98apr 7 months ago
Sir, I purchased your course in 2020.
@vaibhavtyagi9885 4 months ago
In the last question, every value you took was a default (128 MB, 4x, 512 MB, 5 cores). So if the question were for 50 GB of data, would the answer still be 3 GB?
@rinkesh_xo 2 months ago
Yes, only the total number of executors will increase. This is for peak performance.
@cherukurid0835 1 year ago
Hi, what if the file is in a different storage location and the cluster manager is not YARN? How do we calculate then?
@ultimo8458 10 months ago
I applied 4x memory for each core for a 5 GB file, but no luck. Can you please help me resolve this?
Road map for cores and executors:
1) Find the number of partitions: 5GB (10240mb) / 128mb = 40
2) Find the CPU cores for maximum parallelism: 40 cores, one per partition
3) Find the maximum allowed CPU cores for each executor: 5 cores per executor for YARN
4) Number of executors = total cores / executor cores: 40 / 5 = 8 executors
Road map for the amount of memory required:
1) Find the partition size: 128mb by default
2) Assign a minimum of 4x memory for each core: what do I apply here?
3) Multiply it by executor cores to get executor memory: ?
@souravarora7741 8 months ago
10240/128 is 80, not 40.
@friendlykolam 6 months ago
2) 4 × 128 MB block = 512 MB needed per core.
3) 512 MB × 5 cores per executor = 2560 MB required per executor.
My understanding from this video is that whether the data size is 10 GB, 5 GB, or anything else, you always set executor-cores=5 and executor-memory=3g (i.e. 2560 MB rounded up).
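The two memory steps in this reply are plain arithmetic. Here is a minimal sketch, assuming the video's values (128 MB partitions, a 4x factor, 5 cores per executor); the variable names are illustrative only.

```python
import math

PARTITION_MB = 128        # default partition size
MEMORY_FACTOR = 4         # rule-of-thumb multiplier from the video
CORES_PER_EXECUTOR = 5    # cores assigned to each executor

memory_per_core_mb = MEMORY_FACTOR * PARTITION_MB               # 4 x 128 = 512
executor_memory_mb = memory_per_core_mb * CORES_PER_EXECUTOR    # 512 x 5 = 2560
executor_memory_gb = math.ceil(executor_memory_mb / 1024)       # rounded up to 3

print(memory_per_core_mb, executor_memory_mb, executor_memory_gb)   # 512 2560 3
```

None of these values depend on the input size, which is why the result maps to `--executor-cores 5 --executor-memory 3g` on `spark-submit` for any data volume; only the executor count changes.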
@vipuljohri8520 7 months ago
How did you assume that each core will require 4x the partition size?
@paulfunigga 4 months ago
He's an Indian, god spoke to him.
@Amarjeet-fb3lk 4 months ago
If there are 5 cores per executor, how are the 200 partitions that a shuffle creates by default handled when there are fewer cores, given that 1 partition is processed on 1 core? Suppose my config is 2 executors with 5 cores each. How will it create 200 partitions if I do a group-by operation? There are only 10 cores, and 200 partitions need to be handled, right? How is that possible?
@navdeepjha2739 4 months ago
You can set the number of partitions equal to the number of cores for maximum parallelism. Of course, you cannot create 200 partitions in this case.
@DUFFERMEHUL 2 months ago
In your case, if 200 partitions are created, your degree of parallelism will be 10: 10 partitions are processed at a time, and once those slots are free the next 10 partitions are processed.
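The batching described in this reply can be illustrated with a toy model. This is a simplified sketch: `schedule_in_waves` is a hypothetical helper, and the real Spark scheduler hands a new task to each slot as soon as it frees up rather than waiting for a whole batch to finish.

```python
def schedule_in_waves(num_partitions, slots):
    """Group partition ids into batches of at most `slots` tasks."""
    ids = list(range(num_partitions))
    return [ids[i:i + slots] for i in range(0, num_partitions, slots)]


slots = 2 * 5                                # 2 executors x 5 cores each
waves = schedule_in_waves(200, slots)        # 200 = default spark.sql.shuffle.partitions

print(len(waves))    # 20
print(waves[0])      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

So 200 shuffle partitions on 10 cores are not a problem: they simply run as roughly 20 consecutive batches of 10 tasks.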
@vinothvk2711 8 months ago
Hi, is the amount of memory always 3 GB in this case, for all data sizes? I think we have to tweak it according to the data size.
@sudarshanmhaisdhune1039 5 days ago
Is it feasible to tweak, considering the huge number of executors we have?
@sandippatel6999 1 year ago
Sir, is there any way to get Databricks certification vouchers?
@sanjaybedwal2385 1 year ago
+1
@raviyadav-dt1tb 9 months ago
Hello sir, how do we process 100 GB of data? How can we calculate the memory, executors, and driver? Please help me.
@davexavier3927 8 months ago
Did you find the answer? I'm interested.
@sudarshanmhaisdhune1039 5 days ago
Look, 100 GB of data will have 800 partitions, but of course it's not always possible to allocate 800 CPU cores for one-shot processing. Suppose we have 100 CPU cores available. Then 100/5 = 20 executors will be needed. These 20 executors will process the full data in 800 partitions / 100 CPU cores = 8 batches. That is, each executor will handle 40 partitions (8 batches of 5) to process the full data.
@shreemanthmamatavbal7468 1 year ago
Why is 4x memory required for each core?
@TheWqrahd 11 months ago
The basic idea here is that when we read compressed parquet files and load them into memory, they tend to expand. In general, we assume that the data can increase in size by about three to four times once it's uncompressed in memory. That's likely where this number comes from.