Spark Interview Question | Scenario Based | Merge DataFrame in Spark | LearntoSpark

In this video, we will learn how to merge two DataFrames in Spark using PySpark. We will discuss all the available approaches to do it. Hope this video will be helpful in your Spark interview preparation.
Blog link to learn more on Spark:
www.learntospark.com
Linkedin profile:
/ azarudeen-s-83652474
FB page:
/ learntospark-104523781...

Published: 11 Sep 2024

Comments: 87

@user-co8oc1rm5w 3 years ago

Being a newbie to Spark, I find it very helpful, boss. Keep it up, brother. Looking forward to seeing more such videos from you.

@shubne 1 year ago

Now you can use the unionByName() function as well: df3 = df.unionByName(df2, allowMissingColumns=True); df3.show()
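
A minimal sketch of the unionByName suggestion above, assuming Spark 3.1 or later. The helper name union_allow_missing, the demo function and the sample rows are hypothetical and only there for illustration.

```python
def union_allow_missing(df1, df2):
    """Union two PySpark DataFrames by column name (Spark 3.1+).

    Columns that exist in only one DataFrame are filled with nulls
    for the rows coming from the other one.
    """
    return df1.unionByName(df2, allowMissingColumns=True)


def demo():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("union-demo").getOrCreate()
    )
    # Hypothetical sample data; the column names are illustrative only.
    df = spark.createDataFrame([("Anna", 30)], ["name", "age"])
    df2 = spark.createDataFrame([("Ben", 25, "M")], ["name", "age", "gender"])

    df3 = union_allow_missing(df, df2)
    df3.show()  # the row that came from df has null in the gender column
    spark.stop()
```

Calling demo() on a machine with PySpark installed should print the merged rows.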

@4brogames 3 years ago

Real and true. Looking forward to seeing more videos.

@ajaykiranchundi9979 2 years ago

The last approach was incredible. I did not know it was possible to subtract the columns to get the delta!!

@davimonteiropaulelli9649 3 years ago

Excellent video, Azarudeen, you helped me a lot! Thanks!

@arvindyadav1504 3 years ago

Thanks, Azar, for making such a nice scenario-based question series with demos.

@nagamohanreddy1602 3 years ago

It's really a nice help, friend.

@nareshvemula2204 1 year ago

Good videos. Thank you. One small note: in the "Automated Approach", if the two DataFrames differ by more than one column and the columns are not in alphabetical order, it won't work. We need to sort the columns while performing the union operation, like below: df_final = df_file1.select(sorted(df_file1.columns)).union(df_file2.select(sorted(df_file2.columns)))
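
The sorted-column union from this comment can be sketched as a small helper. This assumes both DataFrames already carry the same set of column names (for example after the missing ones were added); the function name union_sorted is hypothetical.

```python
def union_sorted(df_file1, df_file2):
    """Positional union after putting both column lists in sorted order.

    union() matches columns by position, so both sides are first
    selected in the same (alphabetical) order.
    """
    if set(df_file1.columns) != set(df_file2.columns):
        raise ValueError("column sets differ; add the missing columns first")
    return df_file1.select(sorted(df_file1.columns)).union(
        df_file2.select(sorted(df_file2.columns))
    )
```

Sorting is just one way to get a shared order; any order works as long as both sides use the same one.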

@Rajgupta-fh3yt 3 years ago

You are doing a great job and it's helping beginners a lot. Thanks.

@monicakannan9731 2 years ago

When merging 5 files in different data formats, how will it work? Your answer will be helpful.

@dattaningole8063 3 years ago

Very good explanation of each scenario. Thanks a lot, @Azarudeen Shahul. Keep it up.

@AzarudeenShahul 3 years ago

Thanks for your support.. 😊

@DiverseDestinationsDiaries 3 years ago

Hi Shaul, superb content. I have never seen such a clear walkthrough of all the possible approaches on RU-vid. Thanks a lot. Your videos are helpful not only for interviews but also for getting our daily jobs done.

@krishnakishorenamburi9761 4 years ago

Great work, Azar. I used the automated technique for a data warehousing project.

@AzarudeenShahul 4 years ago

Thanks for your support. Share with your big data friends.

@ankbala 3 years ago

Very nice approach and clear explanation! Thank you very much.

@sumitkumarsahoo 3 years ago

The tutorial is very lucid and clear

@SurendraKapkoti 2 years ago

Very clear and useful. Thank you very much

@madhavkondapalli785 4 years ago

Thank you so much for these real-time scenario videos, brother. Eagerly waiting for more. All the best.

@AzarudeenShahul 4 years ago

Thanks for your support, please share with your friends as well :)

@smileplease6151 2 years ago

Thank you so much for the videos. They definitely increased my hope towards practical learning!!!

@AzarudeenShahul 2 years ago

Thanks for your support 🙂

@aneksingh4496 3 years ago

Good video. Please keep posting new scenario-based questions.

@AzarudeenShahul 3 years ago

Sure, more videos to come.

@sasmigration1920 2 years ago

Awesome Azharuddin, your videos are very helpful. Do you offer any online coaching?

@sravankumar1767 2 years ago

Superb bro 👌 👏

@sarjfud 4 years ago

Great example and nice explanation

@AzarudeenShahul 4 years ago

Thanks for your support, :-)

@4brogames 3 years ago

Awesome work man. Appreciated

@abhinavsingh9333 1 year ago

Nice video.. informative.. ❤❤

@AzarudeenShahul 1 year ago

Thanks for all your support

@souravsardar 3 years ago

Excellent. Thanks for sharing. Can you make a video on reading data from multiple Parquet files with different schemas using schema evolution?

@AzarudeenShahul 3 years ago

Sure, you can expect the same soon 👍

@adshakin 3 years ago

Great PySpark tutorial, thanks

@ashwinc9867 3 years ago

Can you also make some videos on Spark using Scala? All your videos are brilliant.

@awanishkumar6308 3 years ago

Hi Azarudeen, it's Awanish. Your videos are really helpful. I have installed Spark, but when I check it from the command prompt by entering pyspark, it says the path is not specified, even though I have made many corrections and checked the environment variables many times.

@heenagirdher6443 2 years ago

Hi Azarudeen. Thank you so much for this video. I have implemented the same question in Spark Scala, but I am facing a problem implementing the automated approach there. Could you please help me with this and provide a solution?

@muddy8107 3 years ago

Boss, you are a beauty!!

@priyankas6354 4 years ago

Very nice explanation of the concepts. How can we achieve this in Scala? It would also be great if you explained some scenarios using Scala. Thank you.

@rohitrathod8150 2 years ago

How did the outer join work? We have the same columns in both DataFrames, so which columns will it take?

@DiverseDestinationsDiaries 3 years ago

For the same scenario, I added a monotonically increasing ID column to both DataFrames and then did a left join. Was that approach correct?

@ashwinc9867 3 years ago

How can I achieve the same in Scala? I tried the following code, but it is not working. Consider a and b as two DataFrames: val diffcol = a.columns.diff(b.columns) for(i

@pavithrasri1890 3 years ago

Hi, your videos are really helpful. Could you please post a video on Spark incremental data load and merging that data with SCD Type 2 (using Scala)?

@DataIsBusiness 10 months ago

Thanks a lot, bro.

@AzarudeenShahul 10 months ago

Thanks for all your support 😊

@awanishkumar6308 3 years ago

So can you help me fix it? Can you check? I am ready to share my screen. Please help. I have learnt the theory of Hadoop and Spark, but I don't feel confident because I have had no good hands-on practice for lack of an environment.

@AzarudeenShahul 3 years ago

Please mail me a screenshot of the error message and the steps you followed. If needed, we can check via screen sharing.

@sriharipinapaka1030 3 years ago

Awesome, bro! If you can, please do a video on the same scenario using Scala.

@AzarudeenShahul 3 years ago

Sure 👍

@srinugoriparthi4608 2 years ago

Can you help with merging two DataFrames that have a date column and a bigint column? I am getting an error like "failed to merge".

@DanishAnsari-hw7so 1 year ago

How can we get the code for all the scenarios in this playlist?

@AzarudeenShahul 8 months ago

We have a GitHub link in the description of all recent videos. You can find notebooks for some of the scenario-based questions there.

@viswasp3388 3 years ago

Nice!

@puggyk4220 3 years ago

I'm trying string (JSON style) -> Parquet for merging DataFrames with different columns.

@ritikgupta8478 3 months ago

We can use unionByName in Scala.

@vineethkyatham536 3 years ago

How do we compare two DataFrames to get the matched and unmatched records?

@anuvindkorivi5262 2 years ago

Hi bro, how can we achieve the same using Scala?

@swaroopsuki1322 2 years ago

Can we do this using unionByName?

@realMujeeb 2 years ago

Hi Sir, in the for loop we see df2 = df2.withColumn(i, lit("null")). Here we are able to update the DataFrame, but how is that possible if DataFrames are immutable?

@murari5921 1 year ago

DataFrames are immutable; that is why we assign the result back to the variable. withColumn returns a new DataFrame rather than modifying the existing one.
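
To illustrate the answer above, here is a hedged sketch of the loop. The function name add_missing_columns and its fill parameter are hypothetical. The loop quoted in the question used lit("null"), which is the string "null"; this sketch defaults to lit(None), a real null, on the assumption that this is what was intended.

```python
def add_missing_columns(df, column_names, fill=None):
    """Return a new DataFrame with each named column added.

    fill is the Column expression used for the new columns; it defaults
    to lit(None), which needs PySpark to be installed.
    """
    if fill is None:
        from pyspark.sql.functions import lit

        fill = lit(None)
    for name in column_names:
        # withColumn does not change the DataFrame in place. It returns a
        # new DataFrame, and the local name df is rebound to that object.
        df = df.withColumn(name, fill)
    return df
```

The DataFrame passed in by the caller is left untouched; only the returned object has the extra columns.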

@srinuch9531 4 years ago

Thanks, Azar, for making real-time scenario-based videos. How does the automated process work when the two DataFrames have different column names?

@AzarudeenShahul 4 years ago

Thanks for your support. Are you referring to the same data with different column names? If so, the automated approach is not suitable; try the schema method.

@himanshujain2047 2 years ago

@AzarudeenShahul If the order of columns is not the same between the two DataFrames, this will fail. In that case, we can use unionByName, or first do df2 = df2.select(df1.columns) and then apply union.

@localmartian9047 2 years ago

@himanshujain2047 There is also the allowMissingColumns parameter in unionByName, which does the same as this video.

@0305ram 4 years ago

@Azarudeen Shah - In the example, the missing column is the last one in one of the DataFrames, so withColumn automatically adds it at the end. What if the column is missing from the middle of the table structure? Thank you!

@AzarudeenShahul 4 years ago

Thanks for the question. Before merging, we can select the columns in the same order as the other DataFrame, like df1.select(df2.columns). Hope this helps you :)

@0305ram 4 years ago

@AzarudeenShahul Wow, cool. Thanks, Azar.
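
The reordering step described in this thread could look like the sketch below. The helper name align_columns is hypothetical, and it assumes both DataFrames already have the same column names, just in a different order.

```python
def align_columns(df, reference):
    """Return df with its columns selected in reference's column order."""
    if set(df.columns) != set(reference.columns):
        raise ValueError("both DataFrames must have the same column names")
    # select() accepts a list of column names and keeps the given order.
    return df.select(reference.columns)
```

A positional union is then safe, for example merged = align_columns(df1, df2).union(df2).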

@Real_Nature_shorts222 2 years ago

Bro, please help me install Spark. Please share a doc with the steps; I have Windows 10.

@ashwinc9867 3 years ago

Can you please share the Scala code for the automated approach?

@fortheknowledge145 3 years ago

Just add a scenario: what if we do not have the columns in the same order in both DataFrames after the loop? New columns arrive or some columns may disappear over time, but the merge/union should keep happening daily. - We need to select the columns in the right order before doing the union. - We use foldLeft instead of a loop (a more functional programming way).
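
The comment refers to Scala's foldLeft; in PySpark a rough equivalent is functools.reduce. The sketch below is one possible reading of the two points above, not the commenter's actual code. The function name union_with_fold and the fill parameter are hypothetical.

```python
from functools import reduce


def union_with_fold(df1, df2, fill=None):
    """Pad both DataFrames to a shared column list, then union them."""
    if fill is None:
        from pyspark.sql.functions import lit  # needs PySpark installed

        fill = lit(None)

    # Final column order: df1's columns, then any columns only df2 has.
    target = list(df1.columns) + [c for c in df2.columns if c not in df1.columns]

    def pad(df):
        missing = [c for c in target if c not in df.columns]
        # reduce plays the role of foldLeft: start from df and add one
        # missing column per step, carrying the new DataFrame forward.
        padded = reduce(lambda acc, name: acc.withColumn(name, fill), missing, df)
        # Select in the shared order so the positional union lines up.
        return padded.select(target)

    return pad(df1).union(pad(df2))
```

Because the target order is rebuilt from the inputs on every call, a daily job keeps working when new columns arrive or old ones disappear from one side.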

@SpiritOfIndiaaa 4 years ago

Thank you, but in the automated approach, updating df2 in the for loop won't work in Java.

@SpiritOfIndiaaa 4 years ago

Whatever is changed inside is not accessible outside the loop. Can you help me handle it?

@awanishkumar6308 3 years ago

How do I get your mail ID?

@pshar2931 3 years ago

Your methods will not work if each table has an extra column of its own. For example, TableA: name, age, salary; TableB: name, age, gender.
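
One way to cover the case raised here, where each table has a column the other lacks, is to compute the column delta in both directions. This is a sketch under assumptions, not the video's code: the names column_deltas and merge_both_ways are hypothetical, and it relies on unionByName without allowMissingColumns, which is available from Spark 2.3.

```python
def column_deltas(cols_a, cols_b):
    """Return (columns only in A, columns only in B), keeping their order."""
    only_a = [c for c in cols_a if c not in cols_b]
    only_b = [c for c in cols_b if c not in cols_a]
    return only_a, only_b


def merge_both_ways(df_a, df_b):
    """Add each side's missing columns as typed nulls, then union by name."""
    from pyspark.sql.functions import lit  # needs PySpark installed

    only_a, only_b = column_deltas(df_a.columns, df_b.columns)
    schema_a, schema_b = df_a.schema, df_b.schema
    for name in only_a:
        # Cast the null to the type the column has in the other table.
        df_b = df_b.withColumn(name, lit(None).cast(schema_a[name].dataType))
    for name in only_b:
        df_a = df_a.withColumn(name, lit(None).cast(schema_b[name].dataType))
    # unionByName matches columns by name, so their order does not matter.
    return df_a.unionByName(df_b)
```

For the example in the comment, column_deltas(["name", "age", "salary"], ["name", "age", "gender"]) gives (["salary"], ["gender"]), so TableB gets a null salary column and TableA gets a null gender column before the union.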

@pranayshukla9980 4 years ago

Where is input1.csv fetched from? Have you uploaded a CSV file there?

@sangamrathore7850 3 years ago

Yes Parnay, I have created and uploaded a CSV file in my Databricks account.

@sudippandit9855 2 years ago

Awesome content! Please help me: if we save the output of df1.union(df2).show() to a new DataFrame df and then call df.show(), it doesn't work. Why?

@MyVaibhavraj 1 year ago

We can achieve this by using unionByName: union_df = df1.unionByName(df2, allowMissingColumns=True)

@AzarudeenShahul 1 year ago

Here we discuss Spark below 3.1. unionByName works when both DataFrames have the same columns but in a different order. An optional parameter was added in Spark 3.1 to allow unioning slightly different schemas.