
How To Compare CSV Files For Differences in Python 

Data Analytics Ireland
1.5K subscribers
11K views

Do you need to compare two CSV files for differences? In this video tutorial, we compare CSV files using Python pandas. There are several ways to compare CSV files for differences, and we show three different approaches.
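The description does not name the three approaches, so the following is only a sketch under assumptions: it shows three common pandas techniques (a whole-file equality check, an outer merge with an indicator column, and a cell-level `compare`), which may or may not match the methods used in the video. The sample data and variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for two CSV files on disk.
csv_a = "id,name,year\n1,Ann,2019\n2,Bob,2020\n3,Cat,2021\n"
csv_b = "id,name,year\n1,Ann,2019\n2,Bob,2022\n3,Cat,2021\n"

df_a = pd.read_csv(io.StringIO(csv_a))
df_b = pd.read_csv(io.StringIO(csv_b))

# Approach 1: a single yes/no answer for the whole file.
identical = df_a.equals(df_b)

# Approach 2: rows that appear in only one of the two files.
merged = df_a.merge(df_b, how="outer", indicator=True)
only_in_one = merged[merged["_merge"] != "both"]

# Approach 3: cell-level differences (needs identical row and column labels).
cell_diff = df_a.compare(df_b)

print(identical)
print(only_in_one)
print(cell_diff)
```

Approach 1 is the cheapest check, approach 2 copes with files of different lengths, and approach 3 pinpoints the changed cells when both files have the same rows in the same order.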
⏲⏲⏲TIMESTAMPS⏲⏲⏲
Beginning 00:00
Problem overview 00:21
Reviewing output 01:23
An important thing to note 03:10
Code review 03:36

################ Let's be Social! ##################
Website - dataanalyticsireland.ie/
Twitter - / dataanalyticsi1
Facebook - / dataanalyticsirl
Linkedin - / data-analytics-ireland
Pinterest - www.pinterest.ie/dataanalytic...
#CSV #comparefiles #dataanalyticsireland #dataanalytics

Published: 13 Aug 2024

Comments: 31
@Ricled100 3 years ago
Great video! I am new to Python and really enjoyed you taking the time to explain your code and how it works! Looking forward to more videos.
@DataAnalyticsIreland 3 years ago
Thank you so much! I'll be posting more soon; I'm just a bit tied up with something at the moment, so it will probably be next week.
@DataAnalyticsIreland 2 years ago
Hi, I would welcome some feedback if you have a moment. Can you go to this link and tell me what you think? It would be really appreciated. Thanks! Joe, DAI ru-vid.comcommunity
@Indrail4k 1 year ago
Method 3 gives a KeyError in get_loc: raise KeyError(key) from err
@DataAnalyticsIreland 1 year ago
Hi, I would need to see the full code, if you can share it, so I can investigate further. Thanks, Data Analytics Ireland
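Without the full code the cause cannot be confirmed, but a KeyError raised from get_loc generally means the requested column label is not present in the DataFrame. One frequent reason is stray whitespace in the CSV header. A minimal sketch, assuming that is the cause here (the sample data is hypothetical):

```python
import io
import pandas as pd

# Hypothetical CSV whose header has stray spaces around "year".
csv_text = "id, year \n1,2019\n2,2020\n"
df = pd.read_csv(io.StringIO(csv_text))

try:
    df["year"]
    lookup_failed = False
except KeyError:
    # The real label is " year ", so the plain name is not found.
    lookup_failed = True

# Printing the labels makes the mismatch visible.
print(list(df.columns))

# Stripping whitespace from the headers lets the lookup succeed.
df.columns = df.columns.str.strip()
years = df["year"].tolist()
print(years)
```

Printing `df.columns` for both files before comparing them is a quick way to spot misspelled or differently cased column names as well.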
@yeturuvenkataarunkumarredd297 2 years ago
How do we configure both CSV files when they are located in two different Unix paths?
@DataAnalyticsIreland 2 years ago
Hi, sorry for the delay. I was away and am only getting a chance to look at it now. I don't personally use any Unix system, but I found this and wonder whether it is useful to you: www.oreilly.com/library/view/python-standard-library/0596000960/ch13s04.html Data Analytics Ireland
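As a general note, pandas `read_csv` accepts a full path for each file, so the two files do not need to share a directory. A minimal sketch, in which two temporary directories stand in for the two Unix locations (the directory and file names are hypothetical):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two temporary directories play the role of two separate Unix paths,
# for example /data/old and /data/new (both hypothetical).
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    path_a = Path(dir_a) / "file1.csv"
    path_b = Path(dir_b) / "file2.csv"
    path_a.write_text("id,year\n1,2019\n2,2020\n")
    path_b.write_text("id,year\n1,2019\n2,2021\n")

    # Each call gets its own path; the locations are independent.
    df_a = pd.read_csv(path_a)
    df_b = pd.read_csv(path_b)

same = df_a.equals(df_b)
print(same)
```

In a real script the two `Path(...)` values would simply be replaced with the actual file locations.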
@sarvansps 2 years ago
Since we know the difference is in the year column, we only checked that column. What if the two CSV files have 100 columns? How do we compare the column values then?
@DataAnalyticsIreland 2 years ago
Hi! I was looking at this and realised that method 2 above might give you your answer. It shows the differences between the two data frames and prints only the rows that differ. You can take those rows from each file and compare them. In my example I compared the first file to the second. What you could do is:
(A) Create the output from the first comparison and save it to a new data frame (say, df_a_diff).
(B) Repeat step A in reverse and call the result df_b_diff.
(C) Compare these two data frames to see where your differences are.
Does this help? Data Analytics Ireland
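Steps (A) to (C) can be sketched as follows. The thread does not show what method 2 is, so a left merge with an indicator column is used here as a stand-in, and the helper `rows_not_in` and the sample data are hypothetical. The final `compare` call assumes the differing rows carry the same `id` values in both files; it raises an error otherwise.

```python
import io
import pandas as pd

df_a = pd.read_csv(io.StringIO("id,name,year\n1,Ann,2019\n2,Bob,2020\n3,Cat,2021\n"))
df_b = pd.read_csv(io.StringIO("id,name,year\n1,Ann,2019\n2,Bob,2022\n3,Kat,2021\n"))


def rows_not_in(left, right):
    """Return the rows of `left` that have no exact match in `right`."""
    merged = left.merge(right, how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


# (A) rows of the first file that differ, (B) the same in reverse.
df_a_diff = rows_not_in(df_a, df_b).set_index("id").sort_index()
df_b_diff = rows_not_in(df_b, df_a).set_index("id").sort_index()

# (C) compare the two result sets; every column is checked,
# so the same code applies to 3 columns or 100.
changes = df_a_diff.compare(df_b_diff)
print(changes)
```

The result has one pair of `self`/`other` columns for each column that changed, so only the columns with differences appear in the output.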
@sarvansps 2 years ago
@DataAnalyticsIreland Thanks! It works for me 👍🏻
@DataAnalyticsIreland 2 years ago
@sarvansps Excellent, good to hear!
@CMondi27 2 years ago
May I know what would be the best approach to find the differences between two Excel or CSV files if they contain duplicate IDs in each file? For instance, Excel A has 123 as an ID, but it is repeated 5 times with different column values, whereas Excel B has 7 rows with ID 123, also with different column values. I'm really searching for a way to find the differences in this scenario. Thanks.
@DataAnalyticsIreland 2 years ago
I'd have to research this for you, but my initial thought would be to run a script to correct the IDs you want. Then, when they are all unique, make that column a primary key so that duplication will not happen going forward.
@CMondi27 2 years ago
@DataAnalyticsIreland Umm, that's right. I was able to create a script which works fine for unique IDs, but the case where IDs are duplicated in large numbers is the one I haven't been able to crack yet. I would be really glad if you are able to get any insight on it. Thanks.
@DataAnalyticsIreland 2 years ago
Are you able to supply your logic, with some made-up sample data, please? I will have a look.
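The thread does not reach a solution for duplicate IDs. One possible approach, offered only as a sketch and not as the channel's method, is to stop treating the ID as a key and instead count how often each complete row occurs in each file, then report the rows whose counts differ. The data below is made up, and rows containing missing values would be dropped by `groupby` in this form.

```python
import io
import pandas as pd

# Made-up data: id 123 appears twice in A and three times in B.
df_a = pd.read_csv(io.StringIO("id,value\n123,a\n123,b\n456,x\n"))
df_b = pd.read_csv(io.StringIO("id,value\n123,a\n123,c\n123,d\n456,x\n"))

cols = list(df_a.columns)

# Count how often each complete row occurs in each file.
count_a = df_a.groupby(cols).size().rename("count_a")
count_b = df_b.groupby(cols).size().rename("count_b")

# Align the two counts; a row missing from one file gets a count of 0.
counts = pd.concat([count_a, count_b], axis=1).fillna(0).astype(int)

# Keep only the rows whose counts differ between the files.
diff = counts[counts["count_a"] != counts["count_b"]]
print(diff)
```

Because whole rows are counted, a row that is repeated a different number of times in each file is reported too, which a plain merge on the ID column would not show clearly.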
@findthetruth3021 2 years ago
Can you please find the percentage of discrepancy/mismatch between the two databases? For example, I could say that 30% of data1 (csv1) is different from data2 (csv2). Is it possible to do that?
@DataAnalyticsIreland 2 years ago
Thanks for your message! To confirm: is this for the files you are comparing, not after you load them into a database?
@findthetruth3021 2 years ago
@DataAnalyticsIreland Yes, let's say two CSV files, each with 10 columns and 300 rows. Once we are done with the comparison, we need to state the percentage difference between them. For example, I would say that the comparison showed the first CSV was 50% different from the second CSV; this needs to be determined from the comparison we did before. Thanks again for your prompt answer. If my message isn't clear, I am happy to get in touch with you on Skype and share my test questions with you. Have a great day.
@DataAnalyticsIreland 2 years ago
Hi, sorry for the delay. I have tweaked the code for this and will hopefully be doing a video on it tomorrow. Essentially, the output will show the percentage match as a number (e.g. 50, 10, 100) in a data frame. This can then be used as you please. Hope this works for you. DAI
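The tweaked code is not shown in the thread, so the following is only a sketch of one way a mismatch percentage could be computed, assuming both files have the same shape, column names and row order. The sample data is hypothetical, and missing values would count as different in this form.

```python
import io
import pandas as pd

df_a = pd.read_csv(io.StringIO(
    "id,name,year\n1,Ann,2019\n2,Bob,2020\n3,Cat,2021\n4,Dan,2022\n"))
df_b = pd.read_csv(io.StringIO(
    "id,name,year\n1,Ann,2019\n2,Bob,2022\n3,Kat,2021\n4,Dan,2022\n"))

# Element-wise inequality: True wherever a cell differs.
mismatch = df_a.ne(df_b)

# Percentage of rows with at least one differing cell.
pct_rows_different = mismatch.any(axis=1).mean() * 100

# Percentage of individual cells that differ.
pct_cells_different = mismatch.to_numpy().mean() * 100

print(pct_rows_different)
print(pct_cells_different)
```

Whether "50% different" should be measured per row or per cell is a design choice; the two figures can differ a lot when only one column out of many changes.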
@finnmccool8671 3 years ago
Great tips.
@DataAnalyticsIreland 3 years ago
Thank you, you're welcome!
@hemant943 2 years ago
Can you please share your code?
@hemant943 2 years ago
Share, please.
@DataAnalyticsIreland 2 years ago
Hi, thanks for visiting the channel. Have a look at this page, and if you have any questions, come back! dataanalyticsireland.ie/2021/08/07/how-to-compare-csv-files-for-differences/
@DataAnalyticsIreland 2 years ago
Just did, can you see it?!
@hemant943 2 years ago
@DataAnalyticsIreland Thank you so much. It was really helpful for me.
@DataAnalyticsIreland 2 years ago
You're welcome, glad I could help you!