Python Pandas Tutorial 6. Handle Missing Data: replace function

codebasics

1.1M subscribers

215K views

Published: 12 Oct 2024

Comments: 158

@codebasics 2 years ago

Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

@darrennewell3845 2 years ago

This helped me tremendously with cleaning up messy serial data that I was logging from a microcontroller into a useful data frame. Thank you for posting these free of charge and helping me finish my senior design project!

@engineerbaaniya4846 4 years ago

It was short and sweet and to the point, the best tutorials I have seen on the whole YouTube platform. If anyone is planning to learn pandas, go through his playlist line by line; it is amazing (the best of all).....

@anushachand2443 4 years ago

You are so great in explaining the concepts, anyone can understand.

@s.sidharttan9241 4 years ago

Yeah, I'm literally learning pandas from his videos

@binodrai3653 3 years ago

Thank you for making it free. One of the best pandas tutorials

@codebasics 3 years ago

Glad it was helpful!

@ijeffking 7 years ago

Your videos always have more to offer. Very useful for data analysis and, in the process, eventually for Machine Learning. Thank you very much

@codebasics 5 years ago

Machine learning tutorials with exercises are available at: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gmvvaobm7eQ.html

@codebasics 4 years ago

Step by step guide on how to learn data science for free: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Vn_mmOuQkSA.html Machine learning tutorials with exercises: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gmvvaobm7eQ.html

@lyricathelyricsworld8945 3 years ago

Sir please make a video on regex

@diplomatic_koboko 6 years ago

You are absolutely fantastic. I am looking forward to whatever you do next

@ZeeshanYounas-m5v 1 year ago

Sir! I am very excited to see this tutorial.... I am starting your data science roadmap. This is very useful

@abhi9029 5 years ago

Thanks for making this wonderful tutorial. It shows how powerful python is.

@codebasics 5 years ago

Oh definitely, Python rules the world!

@prashantkumarvishwakarma8645 3 years ago

Sir, you are teaching in a very good way

@SulemanTheTraveller 3 years ago

Extremely helpful. Thank you sir for making us understand in such an easy way.

@GoldPhoenix99 6 years ago

This is an excellent video series, by the way. You've got my sub!

@bhats230284 3 years ago

Superb explanation. I have started with this series and it's helping me a lot. Many thanks.

@rickrs5289 6 years ago

Thank you very much for this awesome video! I have 2 small questions: 1. How to replace multiple occurrences of a same value in different columns with (a) same single value and (b) with different values ? 2. How to replace n number of values in different columns by (a) single value and (b) with different values ? Could you please add example codes for this on Github? Best regards.

@sihlengena5022 4 years ago

thank you for helping me out.... I struggled the whole week... this helped meee

@codebasics 3 years ago

Glad it helped!

@stormhawk252 4 years ago

Amazing this came in so handy when completing my assignment.

@codebasics 4 years ago

👍😊

@joelu3440 5 years ago

Thanks, this is a fantastic tutorial. Just one question: at 7:14, you modified the temperature column to hold 32 F and 32 C. Then you just removed the letters so that both of them became 32. Should you have done some sort of conversion first? 32 F and 32 C are not equal, so shouldn't you have used the (°F − 32) × 5/9 = °C formula to normalize all of them to C or all of them to F?

@anweshgandham6776 5 years ago

Have you found the solution for how to go ahead in such situations? Kmph and mph in the same column, centigrade and Fahrenheit in the same column.

@kannanv8831 4 years ago

Technically you are correct. These values will be confusing. But he is only teaching how to chop off the letters attached to these values. In the real world, you would make the temperature column either F or C. You cannot hold both without the letters.

@abhinavreddy1083 4 years ago

You are correct. If we replace like that, the values can't give correct predictions in analysis.

@PrasannaChowdharyborn9th 3 years ago

@anweshgandham6776 We can use regex to identify whether C or F is present in the cell and apply the respective conversion to the filtered records.

@eafadeev 3 years ago

@anweshgandham6776 You could use regex with the replacement being a function.
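
A minimal sketch of the regex idea suggested in this thread, assuming every cell looks like a number followed by a `C` or `F` unit letter. The column name `temperature_c` and the sample values are made up for illustration; this is one possible approach, not what the video does.

```python
import pandas as pd

# Hypothetical column mixing Fahrenheit and Celsius readings
df = pd.DataFrame({"temperature": ["32 F", "32 C", "50 F", "10 C"]})

# Split each cell into a numeric part and a unit letter
parts = df["temperature"].str.extract(
    r"^\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[CF])\s*$"
)
value = parts["value"].astype(float)

# Keep Celsius rows as they are; convert Fahrenheit rows with (F - 32) * 5/9
is_f = parts["unit"] == "F"
df["temperature_c"] = value.where(~is_f, (value - 32) * 5 / 9)

print(df)
```

Only after this normalization does it make sense to drop the unit letters and treat the column as a single numeric series.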

@roopagaur8834 5 years ago

Amazing teaching.!!!!! Thank you.

@HawkingMerchant 4 years ago

A 13-minute video felt like a 1-hour class; that's the richness of your content. Excellent 😍😍 And the Anaconda tip was the best; I never knew of it before your video. It's so useful, thank you so much

@codebasics 4 years ago

Vinay I am glad you liked it

@jennythedancer5139 2 years ago

Watched again. Thank you very much, it's very helpful.

@talamuslu 6 years ago

You are absolutely fantastic. more videos pls

@Raja-tt4ll 4 years ago

Awesome tutorial! Thanks

@Lifelicious28 4 years ago

At 10:35, you can also use the code below, I guess: new_pr = df.replace('[A-Za-z]', '', regex=True) instead of using the dictionary. Worked well for me.

@codebasics 4 years ago

Yes regex can also be used. Thanks for the tip bhakti 👍

@1716_anujpradhan-wz7lu 8 months ago

But it will remove the data from event also, which we don't want.
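
To make the difference discussed here concrete, the sketch below compares a frame-wide regex replace with one scoped to specific columns through a dictionary. The sample data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "35 C"],
    "windspeed": ["6 mph", "7 mph"],
    "event": ["Rain", "Sunny"],
})

# Frame-wide regex: letters are stripped everywhere, so "event" is emptied too
stripped_all = df.replace("[A-Za-z]", "", regex=True)

# Column-scoped regex: only the columns named in the dict are touched
stripped_some = df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)

print(stripped_all)
print(stripped_some)
```

Note that the cleaned cells still carry a trailing space (for example `"32 "`), because only the letters are removed.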

@kunjalsahu3504 2 years ago

Thank you sir, I was struggling with one question related to replace and was able to solve it. Thank you...

@nomoreospf 4 years ago

Really great and clever tutorials. Thank you!

@codebasics 4 years ago

Glad it was helpful!

@kishorekumarviswanadhuni5055 6 years ago

Excellent Videos. Thanks for Uploading Videos

@mansijain2250 3 years ago

Sir, there are different values, 32F and 32C. What if we want to convert all F values to C values? How do we handle this type of data?

@Raj_indian10 1 year ago

Good explanation.

@ritikajaiswal3824 2 years ago

Over all these months, your content has helped me a lot. But it's still tough for a newbie like me to understand how SQL, Python (pandas, matplotlib, numpy, seaborn), and Power BI/Tableau help in data analysis. Can you make a sample project PLEASE!!!

@codebasics 2 years ago

I already have a few projects on Power BI/Tableau on my channel. I don't have projects on SQL/pandas etc. yet, and I will definitely add those in the future.

@shaikansari6882 6 years ago

Thanks for the wonderful tutorial. It helped me a lot.

@muhammadazam8422 1 year ago

Hi Brother, thanks for making this wonderful tutorial. You are great at explaining the concepts. How do I extract the year from the string "June 13, 1980 (United States)"? Kind regards, MA
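
One way to pull the year out of strings like this is `Series.str.extract` with a four-digit pattern. This is a sketch that assumes the first run of four digits in each string is the year; the second sample string is made up.

```python
import pandas as pd

released = pd.Series([
    "June 13, 1980 (United States)",
    "May 25, 1977 (United States)",  # hypothetical second row
])

# Capture the first run of exactly four digits and convert it to an integer
year = released.str.extract(r"(\d{4})", expand=False).astype(int)

print(year.tolist())
```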

@pratipkhandelwal1101 6 years ago

Suppose there are multiple special values in a column, so we can't add them manually to the replace list. Is there any way to find the special values without checking the data columns row by row or looking through the whole dataframe?

@nataliaielnykova3173 6 years ago

It is simple and helpful! Thanks!!!

@harshKumar-uk3jx 4 years ago

Amazing sir 👍🙌

@anveshreddy5905 3 years ago

Really awesome👍

@codebasics 3 years ago

Glad it was helpful!

@godwingeorgethekkanath 3 years ago

Great video. But 32F != 32C. We have to convert at least one unit. How do we do that if there are multiple units in a column?

@petiwalas 1 year ago

Excellent (as usual :-)). One comment/question on the DatetimeIndex: I noticed that when you create a date_range, dt = pd.date_range("01-01-2017","01-11-2017"), dt itself is of type DatetimeIndex, so you shouldn't need to create another object instance idx and could instead use df.reindex(dt) directly. Can you please explain the need to create this separate instance idx? Thank you.
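
The observation can be checked with a small sketch: `pd.date_range` already returns a `DatetimeIndex`, so it can be passed to `reindex` as is. The frame below is invented, and ISO date strings are used instead of the format in the comment.

```python
import pandas as pd

# Hypothetical frame with gaps in its daily index
df = pd.DataFrame(
    {"temperature": [32, 28, 35]},
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"]),
)

dt = pd.date_range("2017-01-01", "2017-01-05")
print(type(dt))  # DatetimeIndex

# No extra pd.DatetimeIndex(dt) wrapper is needed before reindexing
filled = df.reindex(dt)
print(filled)
```

The missing days show up as rows with NaN, ready for `fillna` or `interpolate`.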

@su80061 5 years ago

Thank you so much. i learnt a great deal.

@mohammedrashidakhtaransari8267 3 years ago

Hello sir, I have a question. How can I use Excel sheet cell data to modify a text file with .replace? E.g. I have an Excel file with data in cell A1, e.g. A1 = 10 20 30, and I want to use this cell value to .replace text in a text file: .replace('cell A1 data', '202020') (which is available in the text).

@nareshgb1 6 years ago

When you do the regex replace, the number format for the temp and windspeed columns changes from 99.9 to 99. In fact, it's not clear whether the data is considered numeric anymore.

@gurpreettata 4 years ago

explained nicely

@codebasics 4 years ago

Gurpreet , I am happy this was helpful to you

@ak47ava 5 years ago

If replacing values with the mean of that column, could i just do this -> new_df = df.replace({ 'temperature':-99999, 'windspeed':0,}, { df['temperature'].mean(), df['windspeed'].mean()} ) new_df I got an error saying Value argument must be scalar, dict, or series?

@codebasics 5 years ago

df.temperature.replace(-99999, df.temperature.mean())
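
A sketch of per-column mean replacement, with one caveat: if the sentinel is still in the column when the mean is computed, it drags the mean far off. The version below first turns the sentinels into NaN and then fills each column with its own mean. The sentinel values (-99999 for temperature, 0 for windspeed) follow the question; the data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, -99999, 28, 30],
    "windspeed": [6, 0, 8, 10],
})

# Step 1: mark each column's sentinel as missing so it does not distort the mean
cleaned = df.replace({"temperature": -99999, "windspeed": 0}, np.nan)

# Step 2: cleaned.mean() is a Series of column means (NaN skipped);
# fillna matches it to columns by label
cleaned = cleaned.fillna(cleaned.mean())

print(cleaned)
```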

@skkkks2321 5 years ago

Again, a big job done. A great thank you.

@bartdziubek327 3 years ago

Great tutorial :)

@yashchavan1350 3 years ago

But 32 F is not equal to 32 C, so how is the data correct? Is there any way to make this right, or do we need to apply the conversion manually?

@mukeshkumar-kh2fh 2 years ago

Sir, can we replace the NaN values of a column with a mean that depends on another column's range? Example: if the BMI column has a NaN value and the age of that person is 45, we first find the mean BMI of people aged 40 to 50 and replace the NaN with that. Similarly, for any other person with a NaN BMI, we first check that person's age, pick the age interval, find the mean within it, and replace.
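
A minimal sketch of this kind of group-wise imputation, assuming ten-year age bands (`[40, 50)` and so on) and invented column names `age` and `bmi`. It uses `pd.cut` to build the bands and `groupby(...).transform("mean")` to broadcast each band's mean back to its rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [42, 45, 48, 23, 25, 27],
    "bmi": [26.0, np.nan, 30.0, 21.0, np.nan, 23.0],
})

# Bucket ages into decades: [0, 10), [10, 20), ..., [90, 100)
age_band = pd.cut(df["age"], bins=list(range(0, 101, 10)), right=False)

# Mean BMI per band (NaN skipped), aligned row by row with the original frame
band_mean = df.groupby(age_band, observed=True)["bmi"].transform("mean")

# Use the band mean only where BMI is missing
df["bmi"] = df["bmi"].fillna(band_mean)

print(df)
```

If a band contains no known BMI at all, its mean is NaN and those rows stay missing, so a second fallback (for example the overall mean) may still be needed.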

@badamsuraj2327 3 years ago

Sir, I just coded ML with an Excel sheet containing a small dataset, and when I ran the program it showed the error "no such file or directory". Is there any solution for this?

@jpcam4781 3 years ago

I have a CSV and an xlsx file (both with the same data), but I can't use parse_dates or .astype to convert to the datetime64 type. Any suggestions? Thanks for the videos, very informative

@boubacaramaiga4408 4 years ago

Many thanks.

@lakshmanmaddi3763 3 years ago

Excellent presentation sir. I would like to know your name please.

@subee128 1 year ago

Thanks

@suchismitadash2399 3 years ago

I have a situation where I need to exchange the column values between two data frames based on some criteria. I have code ready for that as well using replace(), but it is not working in a few scenarios. Can you please help? I can email you my code and data frame details

@osamashawky622 4 years ago

Amazing man

@codebasics 4 years ago

Osama, I am glad you liked it

@moatlaredikamogelo6126 6 years ago

In the case where ? represents the missing value, how do you still use the replace method? It does not seem to work when the value is a ? sign

@chapidi99 4 years ago

Why do missing values need to be -99999? Is it efficient to do that? Does replacing with 999 or 9999 instead of 99999 make any difference?

@geocarvalhont 7 years ago

I'm really grateful, thanks

@fatinafiqah.y938 6 years ago

Hi, your video is great. But I don't know why, when I import the Excel file and try to handle its missing values with your method, it just doesn't work and the values are still NaN. By the way, is there any way I can communicate with you to discuss my problem? Then, when I try to use the scikit-learn method, I get an error that my 'imputer' is not subscriptable. What does that mean? Please help me solve this error.

@RAZONEbe_sep_aiii_0819 4 years ago

Let's say I have a big dataset where a feature has many negative values and I want to replace all of them with 0. Can you tell me how to do that? You have used a very small dataset here.
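
For this case `replace` is not really needed, since the condition is a range rather than a fixed value. A short sketch with `Series.clip`, using an invented column name `feature`; it scales to large datasets because it is vectorized.

```python
import pandas as pd

df = pd.DataFrame({"feature": [5, -3, 0, -12, 7]})

# Every value below 0 is raised to 0; everything else is left as is
df["feature"] = df["feature"].clip(lower=0)

print(df["feature"].tolist())
```

An equivalent alternative is `df["feature"].mask(df["feature"] < 0, 0)`.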

@gowthamshetty9276 4 years ago

Hey! Thanks for the tutorial. I have one doubt: can we also use the na_values parameter to replace those 99999 values with NaN in the data frame?

@blackisfav7222 4 years ago

Gowtham Shetty, yes we can. We can also replace all NaNs with averages such as the mean or mode of the respective column, as the practical problem demands. This is one of the better ways to deal with them.
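
For reference, `na_values` is a parameter of `pd.read_csv` rather than a DataFrame method, so it applies at load time. The sketch below reads an invented in-memory CSV and marks sentinels as NaN per column; the column names mirror the tutorial's weather data but the rows are made up.

```python
import io
import pandas as pd

csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,-99999,7,Sunny
1/3/2017,28,-99999,Snow
1/4/2017,-99999,7,0
"""

# A dict lets each column have its own list of "missing" markers
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={
        "temperature": [-99999],
        "windspeed": [-99999],
        "event": ["0"],
    },
)

print(df)
```

Once the data is already in a DataFrame, `replace` remains the tool for the same job.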

@rony979 1 year ago

Hi, your tutorials are really helpful. Thanks for these clips. My questions are: 1. How can I keep the changes I am making? Every time I try another option, the data goes back to its original state and the change is made on that. 2. How can I combine multiple steps together? For example, I used this code on the dataset: new_df.replace({ 'temperature': '-99999', 'windspeed': ['-99999', '-88888'], 'event': '0' }, 'NaN') Then when I used the following code, the dataset went back to its original shape, meaning the changes from the previous code were no longer there: new_df.replace({ 'temperature':'[A-Za-z]', 'windspeed': '[A-Za-z]', },'', regex=True) test So, how can I make it stick without creating a DataFrame every time? I really appreciate your suggestion.
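
The behaviour described here follows from `replace` returning a new DataFrame and leaving the original untouched. A sketch of keeping the result by assigning it and chaining both steps; the data is invented and stored as strings to match the question. It also uses `np.nan` rather than the string `'NaN'`, which would only insert ordinary text.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "-99999", "28 C"],
    "windspeed": ["6 mph", "-88888", "7 mph"],
    "event": ["Rain", "0", "Sunny"],
})

# Each replace() returns a new frame, so chain the calls and keep the final result
new_df = (
    df.replace(
        {"temperature": "-99999", "windspeed": ["-99999", "-88888"], "event": "0"},
        np.nan,
    )
    .replace({"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True)
)

print(new_df)
print(df)  # the original is unchanged
```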

@nexusbiswa7895 2 years ago

While using a dictionary for replace, my temperature column is not being replaced

@biswajitmondal7807 3 years ago

Sir, to learn ML, is it mandatory to cover pandas, matplotlib, and seaborn?

@naveenkuppili2889 6 years ago

How to implement the below replace fn only to "Score" column? df.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4])

@pulkitprakash7566 5 years ago

Why did you use "np.NaN", and not simply "NaN" in "replace" at 2:05 ?

@codebasics 5 years ago

There is nothing like NaN in python. You have to use np.nan from numpy module

@StasoMalgaray 5 years ago

Please tell me how I can replace this pattern (#R##N##R##N#). These columns contain text too

@shockey3084 5 years ago

good job dear

@wingochambers2562 7 years ago

Great stuff -- thanks!

@TechieDishant 3 years ago

nice sir

@mostafaalaywan3704 5 years ago

The code (new_df=df.replace(-9999,np.NaN) new_df) doesn't work. What do I have to do?

@Abhishek-jy4ul 5 years ago

df = pd.read_csv('filename',na_values=(-1)) df

@kneelakanta8137 2 years ago

Can we have cheat sheet for all these pandas tutorials

@jayshreedonga2833 1 year ago

thanks sir

@easydatascience2508 1 year ago

Hei, you can watch mine too. The channel has both Python and R playlists, and source files can be downloaded(link is in video description).

@bhavyashreethimmarayappa4945 3 years ago

Thanks for the wonderful explanation! I have a query, Kindly address. When the 'replace' function was used on 'Temperature' and 'Windspeed' columns, the values were converted from 'int' to 'float'. Could you please explain how can we replace few values to NaN and retain the type of that column as 'integer'(not float)?
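
The switch to float happens because NaN is itself a float, so a plain `int64` column cannot hold it. One option is pandas' nullable integer dtype `Int64` (capital I), sketched below with invented data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, -99999, 28],
    "windspeed": [6, 7, -99999],
})

# After the replace, both columns are float64 because they now contain NaN
cleaned = df.replace(-99999, np.nan)
print(cleaned.dtypes)

# Nullable integers keep whole numbers and show missing entries as <NA>
as_int = cleaned.astype("Int64")
print(as_int)
```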

@sameer-verma 7 years ago

when doing the below, I also want to replace 'No Event' with 'Sunny': df_new = df.replace({ 'temprature':'[A-Za-z]', 'windspeed':'[A-Za-z]' },'',regex=True) : Is it possible, I tried doing this way df_new = df.replace({ 'temprature':'[A-Za-z]', 'windspeed':'[A-Za-z]' },'',regex=True, 'No Event', '')

@raunakpatni1403 6 years ago

While dealing with 32 F, you used regex, but the data is now object type not int64 and you can not do mean, mode and other similar stuff on objects type data

@lakshsinghania 1 year ago

Then how do we deal with that, I mean, change the type?
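
A sketch of one way to get numeric columns back after stripping the unit letters: the cleaned cells are still strings, so they can be passed through `pd.to_numeric`. The data is invented; `errors="coerce"` turns anything that still fails to parse into NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "35 C", "28"],
    "windspeed": ["6 mph", "7", "8 mph"],
})

# Strip the letters; the result is still text such as "32 "
cleaned = df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)

# Trim leftover whitespace and convert each column to a numeric dtype
for col in ["temperature", "windspeed"]:
    cleaned[col] = pd.to_numeric(cleaned[col].str.strip(), errors="coerce")

print(cleaned.dtypes)
print(cleaned["temperature"].mean())
```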

@mrakesh00 3 years ago

How do I replace values in a particular column with the column mean where some values are 0? Please help me out.

@brendachirata2283 5 years ago

I love your vids, thank you. But may I please ask how to deal with missing values that come in the form of a hyphen in my data set? Kind regards.

@brendachirata2283 5 years ago

@codebasics OK, thank you

@balsamshallal6805 7 years ago

Hi, thank you for this tutorial, I would like to ask how could we combine monitoring data for each half hour to hourly data?

@balsamshallal6805 7 years ago

it has worked, thank you
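
The answer that worked is not shown in this thread. One common way to combine half-hourly readings into hourly values is `resample` on a datetime index, sketched below with invented data and the mean as the aggregation (sum, max, and others work the same way).

```python
import pandas as pd

# Hypothetical monitoring data recorded every 30 minutes
idx = pd.date_range("2017-01-01 00:00", periods=6, freq="30min")
readings = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=idx)

# Group rows into 60-minute bins and average each bin
hourly = readings.resample("60min").mean()

print(hourly)
```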

@shihabuddinahmed8955 6 years ago

How can I download the CSV file from the GitHub link that is given? Is it possible?

@manojjha6597 6 years ago

Hi, I had a small doubt here. The data set that you are using to explain missing-value treatment is very small, so I can see the entire data set, visually check whether or not it contains any missing values, and treat them accordingly. But if the data set is too big to inspect visually, how would I figure out whether it contains any form of missing values?

@didi098710 8 months ago

df.isna().any(axis=1)
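
Expanding on the one-liner above, the sketch below shows a few checks that work without reading the data by eye: counts of NaN per column, the rows that contain any NaN, and a look at extreme or frequent values to spot sentinels such as -99999 that are not NaN yet. The data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, np.nan, 28, -99999],
    "windspeed": [6, 7, np.nan, 0],
    "event": ["Rain", "Sunny", None, "0"],
})

# How many real NaN values each column holds
missing_per_column = df.isna().sum()

# Only the rows that contain at least one NaN
rows_with_missing = df[df.isna().any(axis=1)]

# Sentinels are ordinary values, so look at extremes and frequencies instead
suspicious_min = df["temperature"].min()
temperature_counts = df["temperature"].value_counts(dropna=False)

print(missing_per_column)
print(rows_with_missing)
print(suspicious_min)
```

`df.describe()` and `df["column"].unique()` are also useful for spotting odd placeholder values.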

@sonalithakker6517 3 years ago

Thank you

@codebasics 3 years ago

You're welcome

@mvcutube 3 years ago

the best

@mansijain2250 3 years ago

How does np.NaN work? Can anybody help me out?

@Arvindraj-os9ep 8 months ago

You will have to import numpy as np to use it
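
A short sketch of what `np.nan` is and how pandas treats it. It uses the lowercase spelling `np.nan`, because the `np.NaN` alias was removed in NumPy 2.0. The data is invented.

```python
import numpy as np
import pandas as pd

temps = pd.Series([32, -99999, 28])

# np.nan is an ordinary float that stands for "not a number"
cleaned = temps.replace(-99999, np.nan)

# NaN never equals anything, not even itself, so test with isna() instead of ==
nan_equals_itself = np.nan == np.nan

# Aggregations such as mean() skip NaN by default
mean_skipping_nan = cleaned.mean()

print(cleaned)
print(nan_equals_itself)
print(mean_skipping_nan)
```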

@sudheerpapasani541 4 years ago

How do we handle data like, suppose, age=200? How do we rectify this?

@PRATIK1900 6 years ago

Great tutorials Sir, really helpful :) One question. In the last section, where you showed how to replace a list of values with another list, that looked like it applied to the entire data frame ( e.g. we had multiple columns with "exceptional, "average", etc). So it would carry out this replacement in all the columns. Suppose we have 5 such columns (5 exams/subjects ) and I want this replacement in only two columns. then do I need to do something like this? df_new = df.replace({ 'exam1' : ["poor", "average", "excellent"] 'exam2' : ["poor", "average", "excellent"] }, [0, 1, 2] ) is this correct code?

@PRATIK1900 6 years ago

so we have to write a line of code for every column we want to do this, right? Also, technically was my code wrong?

@rajatpati8808 6 years ago

df9 = pd.DataFrame({ 'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'], 'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'], 'score1': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'], 'score2': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'], }) df9 df9.score.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4],inplace = True) df9.score1.replace(['poor', 'average', 'good'], [1,2,3],inplace = True) df9
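
As an alternative to one `replace` call per column, a nested dictionary can scope a value mapping to several columns in a single call. This is a sketch with invented exam columns; the outer keys are column names and the inner dict maps old values to new ones. It also avoids `inplace=True` on a selected column, which may not write back to the frame in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "student": ["rob", "maya", "tom"],
    "exam1": ["poor", "average", "excellent"],
    "exam2": ["excellent", "poor", "average"],
    "exam3": ["average", "excellent", "poor"],
})

grades = {"poor": 0, "average": 1, "excellent": 2}

# Only exam1 and exam2 are mapped; exam3 and student stay as text
df_new = df.replace({"exam1": grades, "exam2": grades})

print(df_new)
```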

@md.shafiqulislam5692 5 years ago

Great !!!

@ak47ava 5 years ago

Hello, what if I wanted to replace 0 values with the mean of one column and -9999 with the mean of another column? What I mean is, how would I replace values in a column with the mean of that respective column?

@codebasics 5 years ago

df.column1.mean() should give you the mean value, and then you use that in your replace function.

@jatinfalwaria7087 7 years ago

man hats off.

@jatinfalwaria7087 7 years ago

Buddy can u share your email with me or mail me at jatin97.intruder@gmail.com,coz i have some doubts to be cleared😊

@kishorekumarviswanadhuni5055 6 years ago

Do you have training videos on Machine Learning and the R language also?

@kishorekumarviswanadhuni5055 6 years ago

Thank you so much. This is big help.

@dhananjaykansal8097 5 years ago

For some really really strange reason. My replace function just doesn't seem to work. Like it doesn't show any error. It basically does nothing. I really fail to understand what's wrong. new_df = df.replace({ 'Temperature': -99999, 'Windspeed':[-99999,-88888], 'Event': 0 },np.NaN) new_df

@prickingpringle5187 5 years ago

it should be small letters 'temperature','windspeed','event', also 'event':'0' new_df = df.replace({ 'temperature': -99999, 'windspeed':[-99999,-88888], 'event':'0' },np.NaN) new_df

@dhananjaykansal8097 5 years ago

@prickingpringle5187 In my file I've put it like that, with the first letter of every column name capitalized.

@codebasics 5 years ago

Can you print df just before the replace call and make sure the column names and values you want to replace are the same as what you are passing to the replace call as parameters? Your code looks correct to me, so I'm not sure why it would not work!

@dhananjaykansal8097 5 years ago

@codebasics I tried each and every thing. Lastly I'll try on another PC or reinstall Anaconda and try again with Jupyter, because in PyCharm pandas doesn't work for me.

@ak47ava 5 years ago

@codebasics Hello, what if I wanted to replace 0 values with the mean of one column and -9999 with the mean of another column? What I mean is, how would I replace values in a column with the mean of that respective column?

@panagiotisgoulas8539 5 years ago

How to replace values under a certain condition? For example, I want to replace all values in the temperature column that are above 32 with a word

@panagiotisgoulas8539 5 years ago

@codebasics Thank you teacher, your videos are really helpful

@shivatarun9125 6 years ago

I have one question. How do I replace all values in a particular column that are greater than a particular value? Example: where x['ApplicantIncome'] > 5070, every value greater than 5070 has to be replaced with 5000.

@shivatarun9125 6 years ago

Awesome, thanks for your response. train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean if x >5070 else x). For me this gives a TypeError: '>' not supported between instances of 'method' and 'int', while with any value instead of train.mean it works.

@shivatarun9125 6 years ago

train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean() if x >5070 else x) TypeError: object of type 'int' has no len() please help me out

@rajatpati8808 6 years ago

df10 = pd.DataFrame({ 'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'], 'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'], 'income': [5071,6000, 6500, 7500, 8000, 3000], }) df10.income = df10.income.apply(lambda x: 5000 if x >5070 else x)
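
A vectorized alternative to `apply` with a lambda is `Series.mask`, which replaces values wherever a condition is true. The sketch below reuses the income figures from the reply above. It also shows the mean variant from the earlier question, computing the mean of the one column as a scalar first (`train.mean` without a column and parentheses refers to the DataFrame method, which is what triggered the errors).

```python
import pandas as pd

train = pd.DataFrame({"ApplicantIncome": [5071, 6000, 6500, 7500, 8000, 3000]})
income = train["ApplicantIncome"]

# Every value above 5070 becomes 5000; the rest are kept
capped = income.mask(income > 5070, 5000)

# Same condition, but the replacement is the column mean (a single number)
col_mean = income.mean()
mean_filled = income.mask(income > 5070, col_mean)

print(capped.tolist())
print(mean_filled.tolist())
```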

@yanamadalaharishkumar5041 4 years ago

Do converters and replace both act the same?

@gunjankumar2267 6 years ago

hello, data.replace({'Dependents':['+']},'', regex=True) data.replace({'Dependents':'+'},'', regex=True) data.replace('+',' ', regex=True) i tried all the method.. facing same error all the time error: nothing to repeat at position 0 how to remove that '+' sign from the data set.

@gunjankumar2267 6 years ago

Dependents columns has value ---> 0, 1, 2, 3+, nan total no of row is 614. regex=False it reflects complete dataset---like print(data) and no change in the values of Dependent column

@gunjankumar2267 6 years ago

codebasics Thanks a lot... Finally it worked... Thank you
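
The fix that worked is not shown in this thread, so the following is only a sketch of why the attempts failed and two ways around it. In a regex, `+` means "repeat the previous item", so a pattern that starts with `+` raises "nothing to repeat". With `regex=False`, `replace` only matches whole cell values, so `"3+"` is left alone. The same reasoning applies to other metacharacters such as `?`. The data mirrors the `Dependents` values described above.

```python
import pandas as pd

data = pd.DataFrame({"Dependents": ["0", "1", "2", "3+", None]})

# Non-regex replace compares whole cells, so "3+" does not match "+"
unchanged = data.replace("+", "", regex=False)

# Option 1: escape the metacharacter so the regex matches a literal "+"
escaped = data.replace({"Dependents": r"\+"}, "", regex=True)

# Option 2: plain substring replacement on the string column, no regex involved
literal = data["Dependents"].str.replace("+", "", regex=False)

print(unchanged["Dependents"].tolist())
print(escaped["Dependents"].tolist())
print(literal.tolist())
```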