
Spark [Executor & Driver] Memory Calculation 

Data Engineering
170K subscribers
56K views

#spark #bigdata #apachespark #hadoop #sparkmemoryconfig #executormemory #drivermemory #sparkcores #sparkexecutors #sparkmemory
Video Playlist
-----------------------
Hadoop in Tamil - bit.ly/32k6mBD
Hadoop in English - bit.ly/32jle3t
Spark in Tamil - bit.ly/2ZzWAJN
Spark in English - bit.ly/3mmc0eu
Batch vs Stream processing Tamil - • Data - Batch processi...
Batch vs Stream processing English - • Data - Batch processi...
NOSQL in English - bit.ly/2XtU07B
NOSQL in Tamil - bit.ly/2XVLLjP
Scala in Tamil : goo.gl/VfAp6d
Scala in English: goo.gl/7l2USl
Email : atozknowledge.com@gmail.com
LinkedIn : / sbgowtham
Instagram : / bigdata.in
RU-vid channel link
/ atozknowledgevideos
Website
atozknowledge.com/
Technology in Tamil & English

Published: 21 Aug 2024

Comments: 37
@neelbanerjee7875 · 1 year ago
Sometimes the interviewer asks what the data size of your project is and how you do the memory allocation based on that data size. Could you please make a video explaining those real cases, depending on data size?
@sangramrajpujari3829 · 3 years ago
This is a real A-to-Z calculation for memory. Thanks for the useful video.
@pavan64pavan · 3 years ago
This is the best video I have watched on executor memory calculation. Thank you, brother.
@svJayaram9 · 3 years ago
Thank you so much. It's very clear to me.
@sukanyanarayanan5763 · 1 year ago
Hi, one clarification: in a real-time scenario, we need to decide the resources based on the size of the file we are going to process. Can you please explain how to determine the resources based on the file size?
@gouthamanush · 2 years ago
What if your input size keeps changing? If it's 1 GB one day and 1 TB another day, would you still suggest the same configuration? Can there be a correct configuration in such cases?
@ananthb1600 · 3 years ago
Hi anna, vanakkam. My doubt is: how can this calculation be suitable for all jobs? Taking the same cluster configuration you explained in the video for all jobs, even though the size of the data handled differs from job to job, we still calculate according to the whole cluster configuration. Please explain, I am really confused.
@bhavaniv1721 · 3 years ago
Thanks for the detailed explanations 👍
@sachinchandanshiv7578 · 1 year ago
Hi sir, can you please explain what a practical Hadoop cluster size looks like in company projects?
@AshokKumar66 · 3 years ago
Could you explain when I should increase the number of executors and when I should increase the number of cores for a job?
@umarfarook2815 · 2 years ago
If you want to run more tasks in parallel, you could increase the number of cores.
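As a quick sketch of why cores drive parallelism (the numbers are illustrative, not from the video): each core runs one task at a time, so the maximum number of concurrent tasks is simply executors times cores per executor.

```python
def max_parallel_tasks(num_executors: int, cores_per_executor: int) -> int:
    """Each core runs one task at a time, so the cluster's task
    concurrency is executors multiplied by cores per executor."""
    return num_executors * cores_per_executor

# e.g. 4 executors with 5 cores each can run 20 tasks at once
print(max_parallel_tasks(4, 5))  # 20
```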
@sreelakshmang7275 · 1 year ago
My core node in EMR has 32 GB of memory and 4 cores, but when checking the Spark UI, I can see only 10.8 GB and 1 core being used. Why is that?
@svdfxd · 5 months ago
How do the DataFrame partitions impact the job?
@vikaschavan6118 · 2 years ago
Do we need to consider existing running jobs in the prod environment while setting these parameter values for our Spark application? Thanks in advance.
@vijjukumar100 · 1 year ago
If my Spark job reads data from Event Hub, what is the recommended partition count on the Event Hub side? If the partition count is 10, does only one driver connect to all the partitions and send the data to the worker nodes?
@vivekrajput6782 · 3 years ago
Still a question for which I am not able to get a proper answer: suppose you have 10 GB of data to process. Using that data-volume scenario, please explain the number of executors, the executor memory, and the driver memory.
@dataengineeringvideos · 3 years ago
I need your cluster/node configuration: the RAM and cores of each node, and your cluster size.
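For illustration only (the node counts and sizes below are assumed, not the asker's actual cluster), one common rule-of-thumb recipe for that calculation is: leave 1 core and 1 GB per node for the OS and Hadoop daemons, use about 5 cores per executor, reserve one executor slot for the driver/ApplicationMaster, and set aside roughly 10% of executor memory for overhead.

```python
def size_executors(nodes, cores_per_node, ram_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb Spark executor sizing (a common recipe, not an
    official formula). Reserves 1 core + 1 GB per node for the OS."""
    usable_cores = cores_per_node - 1
    usable_ram_gb = ram_gb_per_node - 1
    executors_per_node = usable_cores // cores_per_executor
    total_executors = executors_per_node * nodes - 1  # minus 1 for driver/AM
    mem_per_executor_gb = usable_ram_gb / executors_per_node
    executor_memory_gb = int(mem_per_executor_gb * (1 - overhead_fraction))
    return total_executors, cores_per_executor, executor_memory_gb

# 10 nodes x 16 cores x 64 GB -> 29 executors, 5 cores, ~18 GB each
print(size_executors(10, 16, 64))  # (29, 5, 18)
```

The input data size then mainly influences how many of those executors you actually request, not the per-executor shape.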
@nithinprasenan · 2 years ago
What if I have standalone mode, with 16 cores and 64 GB of RAM? How do I calculate executor and driver memory?
@WritingWithShreya · 2 years ago
Hi, how is the minimum memory 5 GB? Please explain.
@royjohn465 · 2 years ago
Hi, I have an interview scenario: a Spark job is running on a cluster with 2 executors and 5 cores per executor. If the transformation takes 1 min per partition, how long does the job run for a DataFrame with 20 partitions? Please advise.
@nnishanthh · 1 year ago
2 mins
@snehilverma4012 · 1 year ago
@@nnishanthh Hey buddy, are we considering one partition as one task? If yes, why?
@user-dw3pn6rk9g · 4 months ago
@@snehilverma4012 Yes, by default each partition is processed by one task, and each core runs one task at a time. With 2 executors and 5 cores each, 10 partitions can be processed concurrently. Since each partition takes 1 minute, the 20 partitions run in 20 / 10 = 2 waves, so the job runs for a total of 2 minutes.
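One way to sanity-check this interview question (assuming one task per partition and each core running one task at a time, which is Spark's default behavior) is to compute the number of "waves" of tasks:

```python
import math

def job_runtime_minutes(partitions, executors, cores_per_executor,
                        minutes_per_partition=1):
    """One task per partition; each core runs one task at a time,
    so runtime = number of task waves x time per partition."""
    concurrent_tasks = executors * cores_per_executor
    waves = math.ceil(partitions / concurrent_tasks)
    return waves * minutes_per_partition

# 20 partitions, 2 executors x 5 cores = 10 concurrent tasks -> 2 waves
print(job_runtime_minutes(20, 2, 5))  # 2
```

Note this idealized model ignores scheduling overhead and skew: if one partition is much larger than the rest, the last wave is only as fast as its slowest task.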
@macklonfernandes7902 · 3 years ago
Thanks for the info. I have 1 master and 1 worker with 4 CPUs and 16 GB, and the available memory is 12 GB. When I submit a Spark job on YARN with driver and executor memory of 10 GB and 4 cores, it is not able to assign the passed values; instead, 1 core and 5 or 8 GB is assigned to the executor. Any help would be appreciated.
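A likely cause (an assumption, since the YARN settings aren't shown in the comment) is that the requested container, executor memory plus overhead, exceeds what YARN can allocate (`yarn.scheduler.maximum-allocation-mb` / `yarn.nodemanager.resource.memory-mb`), so YARN caps the request. On YARN, Spark asks for the executor heap plus an overhead of max(10% of executor memory, 384 MB) by default:

```python
def yarn_container_mb(executor_memory_gb, overhead_fraction=0.10,
                      min_overhead_mb=384):
    """A YARN container must hold the executor heap plus overhead
    (by default max(10% of executor memory, 384 MB))."""
    mem_mb = executor_memory_gb * 1024
    overhead_mb = max(int(mem_mb * overhead_fraction), min_overhead_mb)
    return mem_mb + overhead_mb

# A "10 GB" executor actually asks YARN for ~11 GB,
# which is a tight fit on a node with only 12 GB available.
print(yarn_container_mb(10))  # 11264
```

So on a 12 GB node, requesting 10 GB for the executor while the driver also needs a container cannot be satisfied, and YARN falls back to what it can grant.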
@shailendraakshinthala · 1 year ago
Excellent
@guptaashok121 · 3 years ago
Thanks for this nice video; I have a question. Suppose I have 2 worker nodes, each having 4 cores and 14 GB of memory.
Scenario 1: by default, Databricks creates 1 executor per node, which means each executor has 4 cores and 14 GB and can run 4 parallel tasks, for 8 parallel tasks in total.
Scenario 2: if I configure Databricks to have 1 executor per core with 3 GB of memory per executor, I can have 8 executors in total, which also means 8 tasks can run in parallel, each with 3 GB.
Both ways I can run a maximum of 8 tasks in parallel. On what basis should I choose my distribution model to get optimal performance?
@prudhvinadh1622 · 2 years ago
Anyone, please explain this scenario; I have the same confusion!!
@umarfarook2815 · 2 years ago
I think both scenarios give the same performance. In the first scenario, 1 executor has 4 cores and 14 GB of RAM, running 4 tasks in parallel; in the second, we reduce the memory and cores per executor (each executor having 1 core and 3 GB of RAM), so each runs only 1 task, and 8 executors run 8 tasks in parallel. So both give the same parallelism, but I am not sure.
@tashi9154 · 1 year ago
In scenario 2, you are going for the thin-executor approach (minimum resources per executor). Having 1 core per executor won't give you multi-threading, and if you use a broadcast variable you'll need more copies, because each executor needs its own copy. Ideally, the number of cores per executor is 1-5, because more than 5 cores per executor can cause the system to suffer on throughput.
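To put numbers on the two scenarios in this thread (a sketch; the 4-core/14 GB and 1-core/3 GB figures come from the question above): parallelism is the same, but fat executors give each concurrent task more heap to share, while thin executors forgo in-process shuffle sharing and multiply broadcast copies.

```python
def per_task_memory_gb(executor_mem_gb, cores_per_executor):
    """With all cores busy, concurrent tasks share the executor heap,
    so memory available per task shrinks as cores per executor grow."""
    return executor_mem_gb / cores_per_executor

# Scenario 1: 2 fat executors (4 cores, 14 GB) -> 8 tasks, 3.5 GB per task
# Scenario 2: 8 thin executors (1 core, 3 GB)  -> 8 tasks, 3.0 GB per task
fat = per_task_memory_gb(14, 4)
thin = per_task_memory_gb(3, 1)
print(fat, thin)  # 3.5 3.0
```

The fat layout also broadcasts each variable twice (once per executor) instead of eight times, which is one reason the 2-5 cores-per-executor middle ground is usually recommended.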
@user-dh3nu9sh7l · 6 months ago
Do you have the data engineering course notes as a PDF?
@ravikirantuduru1061 · 3 years ago
Why not dynamic allocation?
@dataengineeringvideos · 3 years ago
DRA is not recommended in the PROD environment: since many teams deploy many jobs, jobs will sometimes over-utilize the cluster.
@arvindkumar-ed4gf · 3 years ago
@@dataengineeringvideos But this can be achieved by dividing your root queue into multiple queues and giving allocations to them. This will ensure that the applications running in the high-priority queue get the required resources.
@dataengineeringvideos · 3 years ago
Yes, we do have sub-queues with manual memory config instead of DRA, and I would not recommend DRA in sub-queues either. For example, if sub-queue A is over-utilized, it tries to use the resources of another sub-queue B; that's how YARN queues work. In this case, enabling DRA is not a good idea.
@ravikirantuduru1061 · 3 years ago
@@arvindkumar-ed4gf Sorry, I didn't get you. Can you explain with an example? Thanks.
@gouthamanush · 2 years ago
@@dataengineeringvideos If we set the maximum and minimum number of executors, there is nothing wrong with dynamic allocation. I am not sure about you, but we have at least 60+ jobs processing TBs of data every day with dynamic allocation and have never seen any problem.
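For reference, the bounded dynamic allocation this commenter describes uses standard Spark properties; the property names below are real Spark settings, while the values are purely illustrative:

```python
# Bounded dynamic resource allocation (DRA) settings; values are examples.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",    # floor: job never starves
    "spark.dynamicAllocation.maxExecutors": "20",   # ceiling: cannot hog the queue
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.shuffle.service.enabled": "true",        # needed on classic YARN DRA
}

# Rendered as spark-submit flags:
flags = [f"--conf {k}={v}" for k, v in dynamic_allocation_conf.items()]
print("\n".join(flags))
```

Setting `maxExecutors` is what reconciles the two views in this thread: DRA can coexist with shared queues as long as every job has an explicit ceiling.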
@jaeger809 · 3 years ago
I guess you are in the south part of India.