Adding new columns to a Dataframe by comparing another Dataframe in PySpark | Realtime Scenario

Hi friends,
In this video, I explain a real-time scenario in PySpark.
github.com/sra...
Code and dataset are uploaded to GitHub:
raw.githubuser...
github.com/sra...
Please subscribe to my channel for more interesting learnings.

Published: 14 Oct 2024

Comments: 27

@sravankumar1767 2 years ago

Nice explanation sravana 👌 👍 👏

@sravanalakshmipisupati6533 2 years ago

Thank you, Sravan.

@ahamadp6615 2 years ago

Another approach to the above query is unionByName(). Correct me if I am wrong, ma'am.

@sravanalakshmipisupati6533 2 years ago

If the requirement is to have data from both DataFrames, then we can use unionByName. But here the requirement is to check whether the columns match those of the other DataFrame; if any column is missing, we create a column with the same name and null values.
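
The requirement described in this reply can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the function names `missing_columns` and `add_missing_columns` are hypothetical, column names are compared by exact match, and the new null columns are cast to the reference column's type as reported by `DataFrame.dtypes`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.sql import DataFrame


def missing_columns(target_cols: list[str], reference_cols: list[str]) -> list[str]:
    """Return columns present in the reference but absent from the target,
    in reference order (exact, case-sensitive name match is an assumption)."""
    present = set(target_cols)
    return [c for c in reference_cols if c not in present]


def add_missing_columns(target: DataFrame, reference: DataFrame) -> DataFrame:
    """Add every column that `reference` has and `target` lacks, filled with nulls."""
    # Imported here so the pure-Python helper above works without pyspark installed.
    from pyspark.sql import functions as F

    ref_types = dict(reference.dtypes)  # column name -> type string, e.g. "string"
    for name in missing_columns(target.columns, reference.columns):
        # Cast the null literal so the new column gets the reference column's type.
        target = target.withColumn(name, F.lit(None).cast(ref_types[name]))
    return target
```

With two existing DataFrames `df1` and `df2` (placeholder names), `add_missing_columns(df1, df2)` would return `df1` plus a null-filled column for each column found only in `df2`; no rows from `df2` are added. If the goal were instead to combine the rows of both DataFrames, `df1.unionByName(df2, allowMissingColumns=True)` (available from Spark 3.1) fills the missing columns with nulls while unioning.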

@Learn2Share786 2 years ago

Nicely explained. Could you please share the GitHub link for this notebook? It will help in practicing.

@sravanalakshmipisupati6533 2 years ago

Thank you. Please check the description for GitHub links for code.

@Learn2Share786 2 years ago

@sravanalakshmipisupati6533 Thanks. Under the "sampledata" branch, it looks like this specific notebook is not checked in yet. Could you commit it? Or please help me locate the file if it is checked in under some other name.

@sravanalakshmipisupati6533 2 years ago

@Learn2Share786 github.com/sravanapisupati/SampleDataSet/blob/main/weatherHistory.csv Please open this link and click on the Code hyperlink.

@praneethchiluka9602 2 years ago

Could you do a video on an end-to-end project pipeline? For example, how GitHub, Jenkins, etc. are used in a project, and what the process in a project looks like.

@sravanalakshmipisupati6533 2 years ago

I can explain the overall procedure. I may not be able to execute and show it because I don't have the required setup on my personal laptop.

@praneethchiluka9602 2 years ago

@sravanalakshmipisupati6533 Yes, please explain the overall procedure, such as which tools are used in the project (GitHub, Jenkins, Jira, etc.) and the flow. There is actually no proper video that explains the end-to-end project process, so it would be great if you made one.

@praneethchiluka9602 2 years ago

@sravanalakshmipisupati6533 Hi, please add this to your checklist.

@srinivasasameer9615 2 years ago

Could you provide Scala code for this example, if you are familiar with it?

@sravanalakshmipisupati6533 2 years ago

Can you please try with foldLeft? If you face any issues, please let me know.
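
For readers following along in PySpark rather than Scala, the closest standard-library analogue of `foldLeft` is `functools.reduce` with an initial value. The sketch below shows that fold shape only; it is not the Scala code requested above. The names `fold_columns` and `add_null_columns` are hypothetical, and casting the new columns to `"string"` is an assumption made for illustration.

```python
from __future__ import annotations

from functools import reduce
from typing import TYPE_CHECKING, Callable, Iterable, TypeVar

if TYPE_CHECKING:
    from pyspark.sql import DataFrame

A = TypeVar("A")


def fold_columns(start: A, names: Iterable[str], step: Callable[[A, str], A]) -> A:
    """Left fold: step(...step(step(start, n1), n2)..., nk).
    Same shape as Scala's names.foldLeft(start)(step)."""
    return reduce(step, names, start)


def add_null_columns(df: DataFrame, names: list[str]) -> DataFrame:
    """Fold over `names`, adding one null-filled string column per name."""
    # Imported here so fold_columns can be used without pyspark installed.
    from pyspark.sql import functions as F

    return fold_columns(
        df,
        names,
        # The accumulator is the DataFrame built so far.
        lambda acc, name: acc.withColumn(name, F.lit(None).cast("string")),
    )
```

The fold threads the DataFrame through each step, so every `withColumn` call builds on the result of the previous one, which is what a `foldLeft` over the list of missing column names would do in Scala.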

@jeffersonerick4081 1 year ago

I have the same scenario. I have 2 DataFrames with different numbers of columns. The second DataFrame has updated values, so I want to update DataFrame 1 with the values from the second DataFrame, but keep the values of the first DataFrame where there is no change. Could you help with this?

@sravanalakshmipisupati6533 1 year ago

Hi Jefferson, please join the 2 DataFrames and select the updated columns from the 2nd DataFrame.
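
One possible reading of this suggestion is sketched below. The thread does not name a join key, so the `key` parameter is an assumption, as are the function names `shared_value_columns` and `merge_updates`. The sketch also assumes that a null in the second DataFrame means "no change", that column names contain no dots, and that rows found only in the second DataFrame are not needed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.sql import DataFrame


def shared_value_columns(base_cols: list[str], update_cols: list[str], key: str) -> list[str]:
    """Non-key columns that exist in both DataFrames, in base order."""
    update_set = set(update_cols)
    return [c for c in base_cols if c != key and c in update_set]


def merge_updates(base: DataFrame, updates: DataFrame, key: str) -> DataFrame:
    """Left-join `updates` onto `base` and prefer the updated value when present."""
    # Imported here so the pure-Python helper above works without pyspark installed.
    from pyspark.sql import functions as F

    b, u = base.alias("b"), updates.alias("u")
    shared = set(shared_value_columns(base.columns, updates.columns, key))

    selected = []
    for name in base.columns:
        if name in shared:
            # Take the updated value; fall back to the base value when it is null
            # (no matching row in `updates`, or a null update).
            selected.append(F.coalesce(F.col(f"u.{name}"), F.col(f"b.{name}")).alias(name))
        else:
            # The key and base-only columns are carried over unchanged.
            selected.append(F.col(f"b.{name}").alias(name))

    joined = b.join(u, on=F.col(f"b.{key}") == F.col(f"u.{key}"), how="left")
    return joined.select(*selected)
```

The left join keeps every row of the first DataFrame, and the result has the first DataFrame's columns in their original order. Columns that exist only in the second DataFrame are dropped here; they could be appended to the `select` if they are needed.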

@avirubin613 1 year ago

Why can't you just do an inner join?

@sravanalakshmipisupati6533 1 year ago

We need a join key for that, right?

@ankitagawande9028 1 year ago

Hi, please give me a solution. I have a table with name, id, and department columns, and the name column contains 2 names. I want a new column that contains all the names that are in the name column.

@sravanalakshmipisupati6533 1 year ago

Use withColumn with the name column; it will copy the contents of name into the new column: df.withColumn("new_col", f.col("name"))