This helped me tremendously with cleaning up messy serial data that I was logging from a microcontroller into a useful data frame. Thank you for posting these free of charge and helping me finish my senior design project!
It was short, sweet, and to the point. These are the best tutorials I have seen on the whole YouTube platform. If anyone is planning to learn pandas, go through his playlist line by line; it is amazing (the best of all).
Thank you very much for this awesome video! I have two small questions: 1. How do I replace multiple occurrences of the same value in different columns with (a) the same single value and (b) different values? 2. How do I replace n values in different columns with (a) a single value and (b) different values? Could you please add example code for this on GitHub? Best regards.
Thanks, this is a fantastic tutorial. Just one question: at 7:14, you modified the temperature column to hold 32 F and 32 C. Then you just removed the letters so that both of them became 32. Should you have done some sort of conversion first? 32 F and 32 C are not equal, so shouldn't you have used the (°F − 32) × 5/9 = °C formula to normalize all of them to C or all of them to F?
Technically you are correct. These values will be confusing. But he is only teaching how to chop off the letter attached to these values. In the real world, you would make the whole temperature column either F or C. You cannot hold both without the letters.
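A minimal sketch of the conversion discussed above, assuming the readings look like "32 F" or "32 C" (the sample values and the temperature_c column name are made up for illustration): split off the unit first, then convert only the Fahrenheit rows.

```python
import pandas as pd

# Hypothetical sample: readings with a unit letter attached.
df = pd.DataFrame({"temperature": ["32 F", "32 C", "50 F"]})

# Split each reading into a numeric part and a unit letter.
parts = df["temperature"].str.extract(
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[FC])"
)
value = parts["value"].astype(float)

# Convert Fahrenheit rows with (F - 32) * 5 / 9; keep Celsius rows as they are.
is_f = parts["unit"] == "F"
df["temperature_c"] = value.where(~is_f, (value - 32) * 5 / 9)

print(df)
```

Simply stripping the letters, as in the video, is fine for demonstrating regex replacement, but a step like this would be needed before doing any arithmetic on the column.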
A 13-minute video felt like a 1-hour class; that's the richness of your content. Excellent 😍😍 And Anaconda was the best part, I never knew of it before your video. That's so useful, thank you so much.
Over all these months, your content has helped me a lot. But it's still tough for a newbie like me to understand how SQL, Python (pandas, matplotlib, numpy, seaborn), and Power BI/Tableau help in data analysis. Can you make a sample project, PLEASE!!!
Hi Brother, thanks for making this wonderful tutorial. You are great at explaining the concepts. How do I extract the year from the string "June 13, 1980 (United States)"? Kind regards, MA
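One possible approach, sketched under the assumption that the year is the only run of four digits in each string: Series.str.extract with a capture group pulls it out.

```python
import pandas as pd

# Hypothetical sample built from the string in the question.
released = pd.Series(["June 13, 1980 (United States)"])

# Capture the first run of exactly four digits and convert it to an integer.
year = released.str.extract(r"(\d{4})", expand=False).astype(int)

print(year)
```

If some rows contain no four-digit year, extract returns a missing value for them and the astype(int) step would fail, so those rows would need handling first.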
Suppose there are multiple special values in a column, so we are not able to add them manually to the replace list. Is there any way to find the special values without checking the data columns row by row or viewing the whole dataframe?
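One way to surface such values without reading the column by eye, sketched on the assumption that the column is supposed to be numeric (the sample values are made up): coerce it with pd.to_numeric and list whatever failed to parse.

```python
import pandas as pd

# Hypothetical column that should be numeric but holds placeholder strings.
s = pd.Series(["32", "n.a.", "28", "not available", "-"])

# errors="coerce" turns anything that is not a number into NaN.
as_number = pd.to_numeric(s, errors="coerce")

# Values that were present but could not be parsed are the "special" ones.
special = s[as_number.isna() & s.notna()].unique()

print(list(special))
```

The resulting list can then be passed to replace or to the na_values parameter of read_csv. For non-numeric columns, value_counts() is a simpler way to see every distinct entry.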
Excellent (as usual :-) One comment/question on the DatetimeIndex: I noticed that when you create a date_range, dt = pd.date_range("01-01-2017","01-11-2017"), dt itself is of type DatetimeIndex, so you shouldn't need to create another object instance idx and could instead use df.reindex(dt) directly. Can you please explain the need to create this separate instance idx? Thank you.
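A short check of this observation, using a made-up two-row frame: pd.date_range does return a DatetimeIndex, and it can be passed to reindex as is.

```python
import pandas as pd

# Hypothetical frame with a gap on 2017-01-02.
df = pd.DataFrame(
    {"temperature": [32, 28]},
    index=pd.DatetimeIndex(["2017-01-01", "2017-01-03"]),
)

dt = pd.date_range("01-01-2017", "01-03-2017")
print(type(dt))  # a DatetimeIndex, so no extra wrapping is needed

# Reindexing directly on dt inserts the missing day as a NaN row.
filled = df.reindex(dt)
print(filled)
```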
Hello sir, I have a question. How can I use Excel cell data to modify a text file with .replace? E.g. I have an Excel file in which cell A1 contains 10 20 30, and I want to use this cell value to replace text in a text file: .replace('cell A1 data', '202020') (which is available in the text).
When you do the regex replace, the number format for the temperature and windspeed columns changes from 99.9 to 99; in fact, it's not clear whether the data is still considered numeric.
If replacing values with the mean of that column, could I just do this: new_df = df.replace({'temperature': -99999, 'windspeed': 0}, {df['temperature'].mean(), df['windspeed'].mean()})? I got an error saying "Value argument must be scalar, dict, or series".
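The second argument in that call is a Python set (curly braces without keys), which is why pandas rejects it. A sketch of one alternative, using made-up numbers: mark the sentinels as NaN first, then fill each column with its own mean, so the sentinel values do not distort the average.

```python
import numpy as np
import pandas as pd

# Hypothetical data with -99999 and 0 standing in for missing readings.
df = pd.DataFrame({
    "temperature": [32.0, -99999.0, 28.0],
    "windspeed": [6.0, 0.0, 8.0],
})

# Step 1: turn the per-column sentinel values into NaN.
cleaned = df.replace({"temperature": -99999, "windspeed": 0}, np.nan)

# Step 2: mean() skips NaN by default, and fillna aligns the resulting
# Series of column means to the matching column names.
new_df = cleaned.fillna(cleaned.mean())

print(new_df)
```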
Sir, can we replace the NaN values of a column with a mean that depends on the range of another parameter? Example: if the BMI column has a NaN value and that person is 45 years old, we first find the mean BMI of people aged 40 to 50 and replace the NaN with that. Similarly, for every other person with a NaN BMI, we first check the person's age, pick the matching age interval, find the mean for that interval, and replace.
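A minimal sketch of this kind of group-wise imputation, assuming ten-year age bands and made-up age and bmi columns: bin the ages with pd.cut, then fill each NaN with the mean of its band.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names and values are for illustration only.
df = pd.DataFrame({
    "age": [42, 45, 48, 61, 65],
    "bmi": [24.0, np.nan, 26.0, 30.0, np.nan],
})

# Bin ages into half-open decades: [40, 50), [50, 60), [60, 70).
age_band = pd.cut(df["age"], bins=[40, 50, 60, 70], right=False)

# transform("mean") returns each row's band mean, aligned to the original index.
band_mean = df.groupby(age_band, observed=True)["bmi"].transform("mean")

# Fill only the missing BMI values with the mean of their own age band.
df["bmi"] = df["bmi"].fillna(band_mean)

print(df)
```

A band whose BMI values are all missing has no mean, so its rows would stay NaN and need a fallback such as the overall mean.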
Sir, I just coded an ML program that reads an Excel sheet containing a small dataset. When I ran the program, it showed the error "no such file or directory". Is there any solution for this?
I have a CSV and an xlsx file (both with the same data), but I can't use parse_dates or .astype to convert to the datetime64 type. Any suggestions? Thanks for the videos, very informative.
I have a situation where I need to exchange column values between two data frames based on some criteria. I have code ready for that using replace(), but it is not working in a few scenarios. Can you please help? I can email you my code and data frame details.
Hi, your video is great. But I don't know why, when I import the Excel file and try to handle its missing values with your method, it does not work and the values are still NaN. Then, when I try to use the scikit-learn method, I get an error saying that my 'imputer' is not subscriptable. What does that mean? Please help me solve this error. By the way, is there any way I can contact you to discuss my problem?
Let's say I have a big dataset where a feature has many negative values and I want to replace all of them with 0. Can you tell me how to do that? You have used a very small dataset here.
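For this case, one option that scales to any dataset size is Series.clip, sketched here on a made-up column named feature:

```python
import pandas as pd

# Hypothetical column containing some negative values.
df = pd.DataFrame({"feature": [-5, 3, -1, 7]})

# clip(lower=0) raises every value below 0 up to 0 and leaves the rest alone.
df["feature"] = df["feature"].clip(lower=0)

print(df)
```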
Hey! Thanks for the tutorial. I have one doubt: can we also use the na_values parameter to replace those 99999 values with NaN in the data frame?
Gowtham Shetty, yes, we can. We can also replace all NaNs with the mean or mode of the respective columns, as the practical problem demands. This is one of the better ways to deal with them.
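A small sketch of the na_values route, using an in-memory CSV with made-up rows so it runs without a file: read_csv converts the listed sentinel to NaN while loading.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the weather file.
csv_text = (
    "day,temperature,windspeed\n"
    "1/1/2017,32,6\n"
    "1/2/2017,-99999,7\n"
    "1/3/2017,28,-99999\n"
)

# na_values lists extra values that should be treated as missing on load.
df = pd.read_csv(io.StringIO(csv_text), na_values=[-99999])

print(df)
```

This works when the sentinel is known before loading; replace is still useful when the data is already in a DataFrame.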
Hi, your tutorials are really helpful. Thanks for these clips. My questions are: 1. How can I keep the changes I am making? Every time I try another option, the data goes back to its original state and the change is applied to that. 2. How can I combine multiple statements? For example, I used this code on the dataset: new_df.replace({ 'temperature': '-99999', 'windspeed': ['-99999', '-88888'], 'event': '0' }, 'NaN') Then, when I used the following code, the dataset went back to its original shape, meaning the changes from the previous code were no longer there: new_df.replace({ 'temperature':'[A-Za-z]', 'windspeed': '[A-Za-z]', },'', regex=True) test So, how can I make the changes stick without creating a new DataFrame every time? I really appreciate your suggestions.
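replace returns a new DataFrame and leaves the original untouched, so the result has to be assigned for the change to persist. A sketch with made-up string data, where each step is applied to the result of the previous one:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which everything was read as text.
df = pd.DataFrame({
    "temperature": ["32 F", "-99999", "28 C"],
    "windspeed": ["6 mph", "-88888", "7 mph"],
    "event": ["Rain", "0", "Sunny"],
})

# Step 1: assign the result, otherwise the replacement is discarded.
new_df = df.replace(
    {"temperature": "-99999", "windspeed": ["-99999", "-88888"], "event": "0"},
    np.nan,
)

# Step 2: build on new_df rather than on df, so step 1 is kept.
new_df = new_df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)

print(new_df)
```

Note the use of np.nan rather than the string 'NaN': the string is just text, and isna() would not recognise it as missing.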
Thanks for the wonderful explanation! I have a query; kindly address it. When the replace function was used on the temperature and windspeed columns, the values were converted from int to float. Could you please explain how we can replace a few values with NaN and keep the type of that column as integer (not float)?
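The float conversion happens because NumPy's integer type cannot hold NaN. One option, sketched on a made-up column, is pandas' nullable integer dtype "Int64" (capital I), which stores missing values as pd.NA:

```python
import numpy as np
import pandas as pd

# Hypothetical integer column with a sentinel value.
df = pd.DataFrame({"temperature": [32, -99999, 28]})

# replace produces float64 with NaN; casting to "Int64" restores integers
# while keeping the missing entry.
df["temperature"] = df["temperature"].replace(-99999, np.nan).astype("Int64")

print(df.dtypes)
print(df)
```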
When doing the below, I also want to replace 'No Event' with 'Sunny': df_new = df.replace({ 'temprature':'[A-Za-z]', 'windspeed':'[A-Za-z]' },'',regex=True) Is that possible? I tried doing it this way: df_new = df.replace({ 'temprature':'[A-Za-z]', 'windspeed':'[A-Za-z]' },'',regex=True, 'No Event', '')
While dealing with 32 F, you used regex, but the data is now of object type, not int64, and you cannot compute the mean, mode, and similar statistics on object-type data.
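A sketch of the extra step this implies, on made-up data: after the regex replace the columns still hold text, and pd.to_numeric converts them back to numbers.

```python
import pandas as pd

# Hypothetical columns with unit suffixes.
df = pd.DataFrame({
    "temperature": ["32 F", "28 C"],
    "windspeed": ["6 mph", "7 mph"],
})

# Removing the letters leaves text such as "32 " (still not numeric).
cleaned = df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)

# Strip leftover whitespace and convert; errors="coerce" maps bad values to NaN.
for col in ["temperature", "windspeed"]:
    cleaned[col] = pd.to_numeric(cleaned[col].str.strip(), errors="coerce")

print(cleaned.dtypes)
print(cleaned["temperature"].mean())
```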
Hi, I had a small doubt here. The dataset you are using to explain missing-value treatment is very small, so I can see the entire dataset, visually check whether it contains any missing values, and treat them accordingly. But if the dataset is too big to inspect visually, how would I figure out whether it contains any form of missing values?
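For larger data, the usual starting point is to count missing values rather than look for them. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps.
df = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0],
    "event": ["Rain", None, None],
})

# Number of missing entries in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Only the rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

This only finds real NaN/None entries. Placeholder values such as -99999 have to be converted to NaN first before they show up in these counts.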
Great tutorials, Sir, really helpful :) One question. In the last section, where you showed how to replace a list of values with another list, it looked like that applied to the entire data frame (e.g. we had multiple columns with "exceptional", "average", etc.), so it would carry out the replacement in all the columns. Suppose we have 5 such columns (5 exams/subjects) and I want this replacement in only two of them. Do I then need to do something like this? df_new = df.replace({ 'exam1' : ["poor", "average", "excellent"], 'exam2' : ["poor", "average", "excellent"] }, [0, 1, 2] ) Is this correct code?
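One form that is documented to restrict the replacement to chosen columns is a nested dictionary, {column: {old: new}}. A sketch with made-up exam columns (whether the dict-of-lists form in the question behaves the same way is not verified here):

```python
import pandas as pd

# Hypothetical grades; only exam1 and exam2 should be converted.
df = pd.DataFrame({
    "exam1": ["poor", "average", "excellent"],
    "exam2": ["excellent", "poor", "average"],
    "exam3": ["average", "average", "poor"],
})

mapping = {"poor": 0, "average": 1, "excellent": 2}

# Outer keys pick the columns, the inner dict maps old values to new ones.
df_new = df.replace({"exam1": mapping, "exam2": mapping})

print(df_new)
```

Some pandas versions emit a FutureWarning about downcasting here; the replaced values are the same either way.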
Hello, suppose I wanted to replace 0 values with the mean of one column and -9999 with the mean of another column. In other words, how would I replace values in a column with the mean of that respective column?
For some really really strange reason. My replace function just doesn't seem to work. Like it doesn't show any error. It basically does nothing. I really fail to understand what's wrong. new_df = df.replace({ 'Temperature': -99999, 'Windspeed':[-99999,-88888], 'Event': 0 },np.NaN) new_df
The column names should be lowercase: 'temperature', 'windspeed', 'event'. Also use 'event': '0'. new_df = df.replace({ 'temperature': -99999, 'windspeed':[-99999,-88888], 'event':'0' },np.nan) new_df
Can you print df just before the replace call and make sure the column names and the values you want to replace are the same as what you are passing to replace as parameters? Your code looks correct to me, so I'm not sure why it would not work!
@@codebasics I tried each and every thing. Lastly, I'll try on another PC, or reinstall Anaconda and try again with Jupyter, because pandas doesn't work for me in PyCharm.
@@codebasics Hello, suppose I wanted to replace 0 values with the mean of one column and -9999 with the mean of another column. In other words, how would I replace values in a column with the mean of that respective column?
I have one question. How do I replace all values in a particular column that are greater than a particular value? Example: where x['ApplicantIncome'] > 5070, every value greater than 5070 should be replaced with 5000.
Awesome, thanks for your response. With train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean if x >5070 else x) I am getting TypeError: '>' not supported between instances of 'method' and 'int', while giving any plain value instead of train.mean works.
train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean() if x >5070 else x) gives TypeError: object of type 'int' has no len(). Please help me out.
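Part of the trouble in both attempts is that train.mean refers to the whole DataFrame (a method in the first case, a Series of column means in the second) rather than one number. A sketch that avoids apply altogether, using made-up incomes and Series.mask:

```python
import pandas as pd

# Hypothetical incomes; 5070 is the threshold from the question.
train = pd.DataFrame({"ApplicantIncome": [3000, 5070, 6000, 9000]})
income = train["ApplicantIncome"]

# Replace every value above 5070 with the fixed value 5000.
capped = income.mask(income > 5070, 5000)

# Or replace them with the mean of this one column (a single number).
with_mean = income.mask(income > 5070, income.mean())

print(capped.tolist())
print(with_mean.tolist())
```

Assigning either result back to train['ApplicantIncome'] makes the change permanent.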
Hello, I tried all of these: data.replace({'Dependents':['+']},'', regex=True), data.replace({'Dependents':'+'},'', regex=True), and data.replace('+',' ', regex=True). I get the same error every time: "error: nothing to repeat at position 0". How do I remove the '+' sign from the dataset?
The Dependents column has the values 0, 1, 2, 3+, and nan; the total number of rows is 614. With regex=False, it just shows the complete dataset, like print(data), with no change to the values in the Dependents column.
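In a regular expression, + is a quantifier ("one or more of the preceding item"), so a pattern that starts with it has nothing to repeat. With regex=False, DataFrame.replace only matches whole cell values, so '3+' is never equal to '+'. Two possible fixes, sketched on made-up data shaped like the column described:

```python
import pandas as pd

# Hypothetical column with the values described above.
data = pd.DataFrame({"Dependents": ["0", "1", "2", "3+", None]})

# Option 1: literal substring replacement on the column (no regex involved).
literal = data["Dependents"].str.replace("+", "", regex=False)

# Option 2: DataFrame.replace with the plus sign escaped in the pattern.
escaped = data.replace({"Dependents": r"\+"}, "", regex=True)

print(literal.tolist())
print(escaped["Dependents"].tolist())
```

Either result has to be assigned back (for example data["Dependents"] = literal) for the change to stay in data.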