Good explanation, liked it... I have a couple of questions. When you bundle the code (say, into a jar file), isn't the db map (the one being broadcast) also part of the bundle? In that case the db map would be available on every worker node wherever the bundle is. If I don't want to use broadcast, what code change would I need to make? Also, I'm a bit unclear on the concept: if someone is broadcasting something, there should be some receiver, right? Can you explain how the worker node receives it, and how it gets assigned to the broadcast variable on the worker node?
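A minimal sketch of the mechanics, for anyone landing here later (the map contents and the name dbMap are illustrative): the value lives on the driver at runtime, so it is not shipped inside the jar as data; the jar only carries the code that builds it. And there is no receiver you write yourself: the first access of .value inside a task makes the executor fetch and cache the broadcast blocks.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-demo").getOrCreate()
val sc = spark.sparkContext

// Built at runtime on the driver; the jar contains this code, not the data.
val dbMap = Map(1 -> "dept-a", 2 -> "dept-b")

val bc = sc.broadcast(dbMap) // driver registers the value for distribution

val result = sc.parallelize(Seq(1, 2, 2)).map { id =>
  // The first call to bc.value on an executor fetches the broadcast blocks
  // (from the driver and from peer executors) and caches them locally --
  // that fetch-and-cache step is the implicit "receiver".
  bc.value.getOrElse(id, "unknown")
}.collect()
```

Without a broadcast variable you could reference dbMap directly inside the closure; Spark would then serialize a copy of it into every task rather than once per executor.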
Can you please make a video on how to solve the group-by out-of-memory issue? An interviewer asked me how to solve an out-of-memory issue without code changes. Please explain it both with and without code changes.
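In the meantime, a hedged sketch of the usual answer (assuming an existing SparkContext sc; the data is made up): on the code-change side, groupByKey pulls every value for a key into memory, while reduceByKey combines values map-side before the shuffle; without code changes, the common levers are configuration, e.g. more shuffle partitions or more executor memory.

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// Risky on large/skewed data: all values for a key are materialized in memory.
val sumsRisky = pairs.groupByKey().mapValues(_.sum)

// Safer: partial sums are computed map-side, so far less data is shuffled.
val sumsSafe = pairs.reduceByKey(_ + _)

// Config-only options (no code changes), e.g. via spark-submit:
//   --conf spark.sql.shuffle.partitions=400
//   --conf spark.executor.memory=8g
```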
Hi, can we use a broadcast variable for a DB connection string? I'm asking because I suppose the DB connection is only used by the driver and not by the executors, but I'm not sure about this...
@DataSavvy Okay, but one of the review comments I got was: "Broadcast Variables in Spark should only be used to disseminate data to be used by Executors during processing, such as lookup tables, hash maps, etc. In some cases Broadcast Variables were being used incorrectly (to store variables only used by the Driver process, such as configuration properties or connection strings)." So this means all the executors' DB connection requests go to the driver, and hence it is not a good idea to use a broadcast variable for database connection strings.
I did not get your statement that all DB requests go to the driver... I have used this approach several times to broadcast a DB connection to executors, and it works perfectly for me... Can you highlight a scenario I am missing here, and what kind of problem it would cause?
@DataSavvy Even I am not sure; the code review comment was given by an external agency consultant. I have quoted the exact statement above and am investigating how valid it is. Is there any official document that says where DB connection strings should be stored, based on Spark's internal architecture?
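For what it's worth, a hedged sketch of the pattern this thread seems to be circling (assuming an existing SparkContext sc; the JDBC URL, credentials, and table name are made up): the connection string is plain data and can safely be broadcast or simply captured in the closure, but the live Connection object itself must be created on the executor, usually once per partition, so database traffic flows executor-to-database rather than through the driver.

```scala
import java.sql.DriverManager

// Only the string is broadcast -- never a live (non-serializable) connection.
val bcUrl = sc.broadcast("jdbc:postgresql://db-host:5432/appdb")

sc.parallelize(1 to 100).foreachPartition { rows =>
  // The connection is opened ON the executor, one per partition;
  // requests go straight from executor to database, not via the driver.
  val conn = DriverManager.getConnection(bcUrl.value, "app_user", "secret")
  val stmt = conn.createStatement()
  try {
    rows.foreach { r =>
      // illustrative write; the "events" table is hypothetical
      stmt.executeUpdate(s"INSERT INTO events(id) VALUES ($r)")
    }
  } finally {
    stmt.close()
    conn.close()
  }
}
```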
Hi, how can I read the file (the one I am going to broadcast) from HDFS into Spark and create key-value pairs? When you read it using textFile, you get an RDD[String]... how can I convert this to a Map[Key, Value]?
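One hedged way to do it (assuming an existing SparkContext sc; the HDFS path and the comma delimiter are made up, so adjust to your file's format): split each line into a pair, collect the pairs back to the driver as a Map, and broadcast that. This only works if the file is small enough to fit in driver memory, which is also the precondition for broadcasting it.

```scala
// RDD[String], one line per record, e.g. "42,some-value"
val lines = sc.textFile("hdfs:///data/lookup.csv")

val kvMap: Map[String, String] = lines
  .map { line =>
    val Array(k, v) = line.split(",", 2) // split each line into key and value
    k -> v
  }
  .collectAsMap() // gather the pairs on the driver
  .toMap          // convert to an immutable Map

val lookup = sc.broadcast(kvMap) // ship the map to executors once
```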
It depends on experience and role... In Bangalore, someone with 5-6 years of Spark experience is able to get around 25 LPA. If they are from a good college, the number goes up... A few folks whose base package is lower may get around 18 LPA.
@DataSavvy Thank you. I am a Java developer looking to upgrade my skills, especially in the field of big data. The whole big data ecosystem is kind of overwhelming. How long can it take to master if one dedicates a few hours every day?
I am getting an exception due to the increased size of a DataFrame in the PySpark code of my project. The exception is: "org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 35421:6 was 269173286 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values". The DataFrame gets created, but any operation on it, such as show() or saveAsTable(), throws the exception. I solved it by increasing spark.rpc.message.maxSize; however, is it possible to solve the issue by using broadcast variables in some way, as the exception message suggests?
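A hedged sketch of what that hint usually means (shown in Scala, though the same idea applies in PySpark; assuming an existing SparkContext sc, with bigLookup as a stand-in for whatever large driver-side object your tasks reference): if a big local object is captured in a task closure, it is serialized into every task and can exceed spark.rpc.message.maxSize, whereas wrapping it in a broadcast variable ships it once per executor. Note that if the DataFrame itself is built from a large local collection on the driver, broadcasting will not help; writing the data to storage and reading it back is the usual fix in that case.

```scala
// Stand-in for a large object loaded at runtime (kept tiny here so it runs).
val bigLookup: Map[String, String] = Map("1" -> "one", "2" -> "two")

// Without broadcast, using bigLookup inside the closure below would serialize
// it into every single task. With broadcast, each executor fetches it once:
val bc = sc.broadcast(bigLookup)

val result = sc.parallelize(1L to 100L).map { id =>
  bc.value.getOrElse(id.toString, "missing") // read-only access on executors
}
result.take(5)
```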
Thanks for the suggestion, Naveen... Unfortunately, YouTube does not allow me to edit a video to improve its audio quality, so I am stuck with the same video... This is improved in the newer videos.