
16 Understand Spark Execution on Cluster 

Ease With Data

Published: 22 Aug 2024

Comments: 19
@easewithdata
@easewithdata 9 months ago
Note: For standalone clusters, the --num-executors parameter may not always work. So, to control the number of executors:
1. Define the number of cores per executor with the --executor-cores parameter (spark.executor.cores).
2. Cap the total number of cores for the application with the --total-executor-cores parameter (spark.cores.max).
If you need 3 executors with 2 cores each (you don't need to use --num-executors): --executor-cores 2 --total-executor-cores 6. The --num-executors parameter can be used to control the number of executors with the YARN resource manager. No need to worry, we will work more with Spark cluster configuration in future sessions.
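As a minimal sketch of the flags above (the master URL and application script are placeholders, not from the video), a spark-submit against a standalone master that yields 3 executors with 2 cores each could look like:

# 6 total cores / 2 cores per executor = 3 executors
spark-submit \
  --master spark://spark-master:7077 \
  --executor-cores 2 \
  --total-executor-cores 6 \
  my_app.py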
@kunalnandwana4280
@kunalnandwana4280 5 months ago
@easewithdata How are you running cluster mode on a local machine? I mean, where are you getting this many resources from?
@satishkumarparida4797
@satishkumarparida4797 4 months ago
Same question as Kunal: how are you running cluster mode on a local machine? A little bit of context would be good here.
@easewithdata
@easewithdata 4 months ago
Hello Kunal & Satish, I have a 4-core, 8-processor machine. Docker utilizes hyperthreading to enable multi-processing on the same core, which is why you see 16 cores (2 threads per processor) available in the cluster. Also, Docker doesn't allocate the host machine's complete resources to containers, but rather a percentage of them, which can be controlled using parameters. You can learn more about this in the Docker documentation.
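For illustration, the per-container resource limits mentioned here are standard docker run flags (the image name is a placeholder):

# limit a container to 2 CPUs and 2 GB of RAM
docker run --cpus="2" --memory="2g" some-spark-worker-image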
@gyanaranjannayak3333
@gyanaranjannayak3333 3 months ago
Can you please tell me how both the master node and the two worker nodes run on the same machine?
@easewithdata
@easewithdata 3 months ago
Hello, I am using Docker to run both the master and the worker nodes as Docker containers.
@Kevin-nt4eb
@Kevin-nt4eb 1 month ago
So in cluster deployment mode the driver program is submitted inside an executor which is present inside the cluster. Am I right?
@easewithdata
@easewithdata 1 month ago
The spark-submit command launches the driver; the driver does not run inside the executors.
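To make the client/cluster distinction concrete, a sketch (the master URLs and script name are placeholders): with --deploy-mode client the driver runs on the machine that invoked spark-submit, while with --deploy-mode cluster the driver is launched on a node inside the cluster.

# driver runs on the machine that invoked spark-submit
spark-submit --master spark://spark-master:7077 --deploy-mode client my_app.py

# driver is launched inside the cluster
# (note: standalone clusters do not support cluster deploy mode for Python apps,
#  so this form is typically used with YARN or with a packaged jar)
spark-submit --master yarn --deploy-mode cluster my_app.py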
@bhavishyasharma998
@bhavishyasharma998 3 months ago
Hi, can you please tell me how a DataFrame with 10 columns gets partitioned into 11 parts with 2 executors having 8 cores each, i.e. a total of 16 cores processing it?
@easewithdata
@easewithdata 2 months ago
DataFrames/data are not partitioned based on the number of columns. They are partitioned based on rows of data (horizontal partitioning).
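A minimal PySpark sketch of this point (the numbers are illustrative, not from the video): the partition count is independent of the column count and can be changed explicitly.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1000)              # 1000 rows, 1 column
print(df.rdd.getNumPartitions())    # depends on default parallelism, not on columns

df11 = df.repartition(11)           # rows are redistributed across 11 partitions
print(df11.rdd.getNumPartitions())  # 11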
@bhavishyasharma998
@bhavishyasharma998 2 months ago
@@easewithdata ok, thanks
@gyanaranjannayak3333
@gyanaranjannayak3333 3 months ago
How are you running this Spark standalone cluster? Have you installed Spark on your system separately, or what? I am using pip install pyspark right now. What do I have to do to use a standalone cluster like you are doing?
@easewithdata
@easewithdata 3 months ago
Hello, I am using Docker containers to run a standalone cluster.
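For a rough idea of what such a setup can look like, here is a generic docker-compose sketch using the bitnami/spark image. This is an assumption for illustration, not the author's exact files (those are linked further down).

version: "3"
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # cluster manager port
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master

Starting it with docker compose up --scale spark-worker=2 would give one master and two workers on the same machine.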
@gyanaranjannayak3333
@gyanaranjannayak3333 3 months ago
@@easewithdata Are the master and the worker executors all running on the same machine?
@shivakant4698
@shivakant4698 1 month ago
Where is the Spark standalone cluster running, on Docker or somewhere else? Please tell me why my cluster execution code is not running.
@easewithdata
@easewithdata 1 month ago
The standalone cluster used in this tutorial runs on Docker. You can set it up yourself.
For the notebook: hub.docker.com/r/jupyter/pyspark-notebook
You can use the docker files below to set up the cluster: github.com/subhamkharwal/docker-images/tree/master/spark-cluster-new
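Once such a cluster is up, pointing a pip-installed PySpark session at it is a small change; a sketch (the master hostname and port are assumptions for a typical Docker setup, not from the video):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("standalone-demo")
         .master("spark://spark-master:7077")  # hypothetical standalone master URL
         .getOrCreate())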
@adulterrier
@adulterrier 12 days ago
@@easewithdata This link is not valid. I assume you mean "pyspark-cluster-with-jupyter"?