
How do I handle missing values in pandas? 

Data School
244K subscribers
197K views

Most datasets contain "missing values", meaning that the data is incomplete. Deciding how to handle missing values can be challenging! In this video, I'll cover all of the basics: how missing values are represented in pandas, how to locate them, and options for how to drop them or fill them in.
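The sketch below shows the operations the description lists: locating missing values, dropping them, and filling them in. It uses a small made-up DataFrame instead of the video's UFO dataset, and its column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame standing in for the video's dataset
df = pd.DataFrame({
    "City": ["Ithaca", None, "Holyoke"],
    "Shape": ["TRIANGLE", np.nan, None],
})

# Locate missing values: isnull() returns a Boolean DataFrame, sum() counts True per column
missing_per_column = df.isnull().sum()

# Drop rows where any value is missing, or only rows where all values are missing
dropped_any = df.dropna(how="any")
dropped_all = df.dropna(how="all")

# Fill missing values in one column with a placeholder category
filled_shape = df["Shape"].fillna(value="VARIOUS")
```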
SUBSCRIBE to learn data science with Python:
www.youtube.co...
JOIN the "Data School Insiders" community and receive exclusive rewards:
/ dataschool
== RESOURCES ==
GitHub repository for the series: github.com/jus...
"read_csv" documentation: pandas.pydata.o...
"isnull" documentation: pandas.pydata.o...
"notnull" documentation: pandas.pydata.o...
"dropna" documentation: pandas.pydata.o...
"value_counts" documentation: pandas.pydata.o...
"fillna" documentation: pandas.pydata.o...
Working with missing data: pandas.pydata.o...
== LET'S CONNECT! ==
Newsletter: www.dataschool...
Twitter: / justmarkham
Facebook: / datascienceschool
LinkedIn: / justmarkham

Published: Oct 2, 2024

Comments: 376
@dataschool 6 years ago
In pandas version 0.21 (released October 2017), they added 'isna' and 'notna' as aliases for 'isnull' and 'notnull'. Learn more in my latest video, "5 new changes in pandas you need to know about": ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-te5JrSCW-LY.html
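This quick check shows that the aliases return identical results. It assumes pandas 0.21 or later, and the data is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# isna/notna are aliases, so they should match isnull/notnull exactly
same_missing = s.isna().equals(s.isnull())
same_present = s.notna().equals(s.notnull())
```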
@bragattemas 4 years ago
Even at the end of 2019, your material from 2016 is still incredibly helpful. I'm certain Data School will keep being a success and helping people. Excellent job, Kevin Markham. Thanks.
@Taranggpt6 4 years ago
Why is the count different after replacing NA with *various*? The count of VARIOUS should equal the earlier number of NA values, which was 2644.
@EdgardThreat 4 years ago
@@Taranggpt6 Hi, that's because there is already a category named "VARIOUS" in the dataset, so the newly filled-in values get added to the existing count of "VARIOUS".
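This small made-up example reproduces the reply's explanation. When the fill value already exists as a category, the count after `fillna` is the number of previously missing values plus the existing count. The values are hypothetical, not the UFO data.

```python
import numpy as np
import pandas as pd

# Made-up column that already contains a "VARIOUS" category
shape = pd.Series(["VARIOUS", "LIGHT", np.nan, np.nan, "VARIOUS"])

counts_before = shape.value_counts(dropna=False)   # includes a row for NaN
na_before = shape.isnull().sum()                   # missing values before filling
various_before = (shape == "VARIOUS").sum()        # pre-existing VARIOUS values

filled = shape.fillna(value="VARIOUS")
various_after = filled.value_counts()["VARIOUS"]   # existing + newly filled
```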
@vigneshpadmanabhan 3 years ago
Can we get a video on how to handle missing values in date/time-related datasets, such as sensor values or other sensitive values? Covering multiple ways of handling missing values would be very useful.
@nadyamoscow2461 2 years ago
@@bragattemas I must say even in 2021 it is still completely up to date.
@339059331 3 years ago
I like his way of teaching: he doesn't assume the audience already knows things. He breaks the explanation down piece by piece. It's a great learning experience, with concise and clearly stated lectures as always! Thanks!
@dataschool 3 years ago
You're very welcome! Thanks for your kind words!
@depokboy 3 years ago
@@dataschool First time watching, and those positive comments are true. Thanks a lot!
@BAIBHAVPATHYBEE 1 year ago
It's been 6 years since this video was released, I'm watching it now, and it still made me fall in love with the series... every concept is beautifully explained in detail.
@dataschool 1 year ago
Thank you so much! 🙏
@kuldipchauhan524 6 years ago
Awesome, you are gifted... your explanation and content are clean and effective.
@dataschool 6 years ago
Thanks very much for your kind words!
@codesandroads 4 years ago
I never leave this place unsatisfied or without answers. A total treasure.
@dataschool 4 years ago
Thank you so much!
@stevechops3226 3 years ago
I cannot tell you how much you have helped me with all sorts of problems! You have the clearest way of explaining things, thank you so much!
@dataschool 3 years ago
You're so very welcome, thanks for your kind words! 🙏
@FULLCOUNSEL 7 years ago
You are doing an excellent job. You are called to do this for sure. Cheers
@dataschool 7 years ago
Wow, thank you so much for your comment! I really appreciate it.
@firdharamadhani5162 4 years ago
I rarely leave YouTube comments, but thank you!! If it weren't for your video, I wouldn't have understood how to do my assignment at all. You did a great job explaining!
@sapnasinha804 5 years ago
Fantastic explanation. However, at the end it would be good to mention that there are more ways to fill in missing values than merging them into another existing category, e.g. with the mean of all the other values. Cheers!
@dataschool 5 years ago
Thanks!
@elilavi7514 8 years ago
Awesome as usual!
@dataschool 8 years ago
Thanks!
@nasserabachi9625 7 years ago
Now I am in love with pandas just from watching a couple of your videos. Shukran jazeelan (thank you very much)!
@dataschool 7 years ago
That's awesome! Thanks for sharing!
@MrsRimouch 2 years ago
Thank you so much. It is always clearer to listen to you!!
@dataschool 2 years ago
You are so welcome!
@vishwanathg8083 6 years ago
Thank you, you made learning pandas a cakewalk.
@dataschool 6 years ago
Awesome, that's great to hear!
@gouravkushwaha4488 5 years ago
You are good. Your explanation really made it simple.
@dataschool 4 years ago
Thanks!
@rajpaul1501 3 years ago
Truly amazing videos. Can you do a series on Matplotlib and Seaborn?
@dataschool 3 years ago
Thanks for your kind words and suggestion!
@kiranachanta9741 5 years ago
I have been watching Kevin's videos, and needless to say he is an awesome instructor. His explanations in all of his videos are conceptual and in-depth, breaking down any complex topic in the easiest way. Thanks, Kevin, for your great work!!! It would be great if you could make videos on visualization using Matplotlib & Seaborn.
@dataschool 5 years ago
Thanks for your kind words, and for your suggestions! :)
@RahulKumar-bh9hb 4 years ago
Your explanation technique is great... I want to thank you for sharing your knowledge. Great videos.
@bagushari1886 1 year ago
Could you please make a video on how to handle missing values across multiple sheets in pandas? Or recommend a source I can read about it? Thanks in advance
@dataschool 1 year ago
Thanks for your suggestion!
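Here is one way to approach missing values across several sheets. It assumes the workbook is loaded with `pd.read_excel(path, sheet_name=None)`, which returns a dict mapping sheet names to DataFrames. A hand-built dict with hypothetical sheet names stands in for a real file, and the fill value of 0 is only an example.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): {sheet name: DataFrame}
sheets = {
    "Jan": pd.DataFrame({"sales": [1.0, np.nan]}),
    "Feb": pd.DataFrame({"sales": [np.nan, np.nan, 3.0]}),
}

# Count missing values per sheet
missing_per_sheet = {name: int(df.isna().sum().sum()) for name, df in sheets.items()}

# Apply the same fill to every sheet
filled_sheets = {name: df.fillna(0) for name, df in sheets.items()}
```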
@luqikong283 4 years ago
The most amazing Python tutorial I've watched so far. I fell in love with Python.
@nyashagracenhandara7757 2 years ago
Thank you so much, the explanation is very clear.
@dataschool 2 years ago
Glad it was helpful!
@huseyngadirov7658 7 years ago
You look like Elon Musk!
@dataschool 7 years ago
HA! I haven't heard that before :)
@MrJioYoung 4 years ago
Thank you! Great instructions!
@dataschool 4 years ago
You're welcome!
@paula805 6 years ago
What inspires a downvote on any of these videos?? Always great content!
@dataschool 6 years ago
Thanks Paul! :)
@mingqian813 3 years ago
Thanks for all your well-made videos! I got to know you and your classes through DataCamp. As a beginner in the ML field, please allow me to ask a silly question: if we have categorical features with missing values, do we need to handle the missing values first and then transform the categorical features using encoders? Or does the order not matter? Thanks!
@dataschool 3 years ago
Great question! Prior to scikit-learn 0.24, missing values needed to be handled first if you were going to one-hot encode them. Starting in 0.24, OneHotEncoder can handle missing values itself. Hope that helps!
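The reply is about scikit-learn's OneHotEncoder, which this sketch does not use. Instead it swaps in pandas' own `get_dummies`, which can also encode missing values directly. With `dummy_na=True`, the missing entry gets its own indicator column instead of a row of zeros. The data is made up.

```python
import numpy as np
import pandas as pd

colors = pd.Series(["RED", np.nan, "BLUE"], name="color")

# Default: the missing row becomes all zeros (its missingness is not recorded)
encoded_default = pd.get_dummies(colors, dtype=int)

# dummy_na=True adds an extra column that flags the missing value
encoded = pd.get_dummies(colors, dummy_na=True, dtype=int)
```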
@NR_Tutorials 5 years ago
Thanks for the nice lecture, we love you, sir.
@dataschool 5 years ago
You're welcome!
@uttamkumarpatra7616 5 years ago
You are simply awesome :) Thank you for making such wonderful videos.
@dataschool 5 years ago
That's so nice of you to say - thank you!
@tresortshimbombo3133 5 years ago
That's exactly what I was looking for!
@dataschool 5 years ago
Great to hear!
@tyl9680 5 months ago
In the last part of the video, why doesn't the number of "VARIOUS" values created by fillna match the earlier NA count?
@Person_Not_Known 6 years ago
Thanks for your videos. Most of the Python online courses I took... I just couldn't get into. Something about your cadence, datasets, and/or approach just clicks with me. Thanks for the content.
@dataschool 6 years ago
That's awesome! Thanks so much for sharing!
@pegasoos 5 years ago
I watched your first video, and you are a legend!
@dataschool 5 years ago
Ha! Thanks :)
@delmaregals 1 year ago
Hi, let's say I accidentally changed a value, like in line 19 where NaN is changed to VARIOUS. Can I reverse the change?
@dataschool 11 months ago
No, changes made through assignment (or inplace operations) are permanent!
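As the reply says, an overwrite cannot be undone afterwards. This sketch shows one precaution: save a Boolean mask of the missing entries before filling, so the fill can be reversed later. The data is made up. The mask also keeps the pre-existing "VARIOUS" value from being undone by mistake.

```python
import numpy as np
import pandas as pd

shape = pd.Series(["LIGHT", np.nan, "VARIOUS"])

# Record which entries were missing *before* overwriting them
was_missing = shape.isnull()
shape = shape.fillna("VARIOUS")

# mask() sets entries to NaN wherever the condition is True
restored = shape.mask(was_missing)
```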
@salamatburj9502 6 years ago
I think it would be great if you could make a lecture about handling missing values for machine learning.
@dataschool 6 years ago
Thanks for your suggestion!
@yashugarg1815 5 years ago
Question: Sir, if I want to convert a value, say 5, to NA, meaning that wherever 5 is present in a DataFrame it is replaced by NA, how should I proceed? Thanks
@mountainscott5274 5 years ago
df.column_name.replace(5, np.nan, inplace = True) Check that the values were replaced with df.info()
@dataschool 4 years ago
Nice!
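This variant of the suggestion assigns the result back instead of calling `inplace=True` through attribute access. Under Copy-on-Write in newer pandas versions, that chained pattern may not update the original DataFrame. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [5, 3, 5, 1], "other": [5, 2, 2, 2]})

# Replace 5 with NaN in one column and assign the result back
df["score"] = df["score"].replace(5, np.nan)

# Or replace 5 everywhere in the DataFrame
df_all = df.replace(5, np.nan)
```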
@amishbhat3560 4 years ago
You showed how to handle NaN values, but what if there are other values such as "Not Provided"? How do I ignore them?
@dataschool 4 years ago
Excellent answer! 👏
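One way to handle placeholder strings like "Not Provided" is to convert them to real missing values. After that, `isnull`, `dropna`, and `fillna` all treat them as missing. The CSV text is made up.

```python
import io

import numpy as np
import pandas as pd

csv_text = """name,state
Alice,Not Provided
Bob,NY
"""

# na_values tells read_csv to treat extra strings as missing
df = pd.read_csv(io.StringIO(csv_text), na_values=["Not Provided"])

# For data that is already loaded, replace() does the same job
raw = pd.read_csv(io.StringIO(csv_text))
cleaned = raw.replace("Not Provided", np.nan)
```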
@nimesharya909 8 years ago
Amazing, clear, precise, and I got it working as well :)
@dataschool 8 years ago
Great to hear!
@ExcelTutorials1 2 years ago
This is super helpful, thank you!!!!!
@dataschool 2 years ago
Glad it was helpful!
@FE12343 4 years ago
Amazing video, thanks!
@astroinceptor 2 years ago
You saved my life twice today. Your videos are great, and the way you explain is really good. Thank you!
@dataschool 2 years ago
Thank you!
@HJ-uy6ez 3 years ago
Great video. How do I get the code? Does anyone know or can anyone help? Thanks
@dataschool 3 years ago
See here: nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb Hope that helps!
@melvin9993 6 years ago
I learned a lot through your videos!
@dataschool 6 years ago
Great to hear!
@wesleypgurira7142 2 years ago
And by the way, I love the way you teach, it's just perfect.
@dataschool 2 years ago
Thank you!
@rohitsinghal2972 4 years ago
Sir, what you showed in the video applies only to numbers. What should be done for string values?
@dataschool 4 years ago
If you like, you can impute missing string values with the most common value using scikit-learn's SimpleImputer.
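For anyone staying in pandas, this sketch fills with the most common value without scikit-learn. It swaps in `mode()` plus `fillna` as a pandas-only equivalent of the most-frequent strategy, and it is not SimpleImputer itself. The data is made up.

```python
import numpy as np
import pandas as pd

city = pd.Series(["Austin", np.nan, "Boston", "Austin", np.nan])

# mode() ignores NaN by default; [0] picks the first mode if there is a tie
most_common = city.mode()[0]
city_filled = city.fillna(most_common)
```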
@samiagharib3796 4 years ago
Do you handle missing data before splitting the dataset into training and test sets?
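A common practice, and not something shown in the video, is to split first and learn any fill value from the training set only. The same value is then applied to the test set, so no information leaks from it. This is a minimal sketch with made-up data and a hypothetical `age` column.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"age": [20.0, np.nan, 40.0]})
test = pd.DataFrame({"age": [np.nan, 50.0]})

# Learn the fill value from the training set only (mean() skips NaN)
train_mean = train["age"].mean()

# Apply that same value to both sets
train["age"] = train["age"].fillna(train_mean)
test["age"] = test["age"].fillna(train_mean)
```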
@alexhenning7086 4 years ago
Superb video! Thanks a lot, it helps a lot!
@dataschool 4 years ago
Glad it helped!
@jaden_vdb 7 months ago
How can we count the number of rows dropped, rather than the number of NaN values?
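To count dropped rows, compare lengths before and after `dropna`. The result can differ from the number of NaN cells, because one row can hold several NaNs. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3, np.nan], "b": [np.nan, np.nan, 3, 4]})

cleaned = df.dropna(how="any")
rows_dropped = len(df) - len(cleaned)     # rows removed by dropna
nan_cells = int(df.isna().sum().sum())    # missing cells, for comparison
```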
@dishydez 3 years ago
Great video, btw. Just a quick question: I am trying to build a benchmark. Would it be okay to standardize the data before creating it?
@sammy0722 4 years ago
Good video. I learned a lot in a short, crisp way.
@dataschool 4 years ago
Thanks!
@shenhenry4667 4 years ago
Thanks a lot for your video!
@dataschool 4 years ago
You are welcome!
@vinodkumarjodu4062 5 years ago
In some scenarios, instead of NaN we will have zeros. How do you handle those, or how would you count the number of zeros?
@mountainscott5274 5 years ago
df.column_name.replace(0, np.nan, inplace = True)
@dataschool 4 years ago
Thanks for sharing!
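This sketch first counts the zeros and then converts them to NaN by assigning the result back. Only do this when a zero really means "not recorded" in your data. The column name is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"reading": [0, 12, 0, 7]})

zero_count = (df["reading"] == 0).sum()             # how many zeros
df["reading"] = df["reading"].replace(0, np.nan)    # treat zeros as missing
```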
@grumpyae86 3 years ago
Exceptional would be the one word to describe your tutorial. Looking forward to binging your videos, lol. Thank you for such a clear explanation.
@dataschool 3 years ago
Thank you! 🙏
@konradpyrz8559 3 years ago
This young gentleman is simply amazing.
@dataschool 3 years ago
Thank you! I'm actually 40 years old now 😊
@kennethstephani692 1 year ago
Great video!!
@dataschool 1 year ago
Thanks!
@rahuldeepdraws8699 3 years ago
This is actually the most clearly explained video on DataFrames that I have ever come across. Glad I found you. Thank you so much.
@dataschool 3 years ago
Glad it was helpful!
@bennguyen1313 10 months ago
Any suggestions on how to do linear interpolation on a list that has consecutive NaNs? For example, I use read_csv from pandas and convert to a list:
Y3List = df["Y3"].interpolate().tolist()
Each column in the CSV file (100k lines long!) gets updated only after all the other columns are updated:
"Time","Y1","Y2","Y3","Y4","Y5","Y6","Y7","Y8","Y9","Y10","Y11","Y12","Y13"
"2.730","","","27.598","","","","","","","","","",""
"3.116","","","","1.14","","","","","","","","",""
"3.624","","","","","0.025","","","","","","","",""
"4.127","","","","","","13.786","","","","","","",""
"4.631","","","","","","","0.019","","","","","",""
etc.
So there are many NaNs between a column's values, which causes .interpolate() to still generate a list with NaNs. Any suggestions?
@dataschool 9 months ago
I'm not sure, sorry!
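A likely cause, assuming the blank cells are read as NaN in a numeric column: default linear interpolation fills interior gaps of any length, but it leaves NaNs that come before a column's first value. `limit_direction="both"` also fills those leading NaNs, using the first valid value. The series below is made up to mimic the described layout.

```python
import numpy as np
import pandas as pd

# Made-up column shaped like the one described: values separated by runs of NaN
y3 = pd.Series([np.nan, np.nan, 27.0, np.nan, np.nan, 30.0, np.nan])

# Default: interior gaps and trailing NaNs are filled, leading NaNs remain
forward_only = y3.interpolate()

# limit_direction="both" also fills the leading NaNs
both = y3.interpolate(limit_direction="both")
y3_list = both.tolist()
```

With a "Time" column, `df.set_index("Time")["Y3"].interpolate(method="index")` may be a better fit. It weights each fill by the actual time gap rather than by row position.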
@rahulbhusari1478 2 years ago
Really clear and amazing tutorial
@dataschool 2 years ago
Glad it was helpful!
@njoy2075 4 years ago
Can you please post a complete project from scratch, including pandas, Matplotlib, scikit-learn, and Seaborn?
@deepakshisharma2660 3 years ago
I used the drop command to drop a column that has 10,000 identical entries out of 50,000, but all the rows get deleted when I use df.dropna(how='any').shape. What should I do?
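A likely explanation, assuming the problem column is missing in every row: `how="any"` drops each row that has at least one NaN in *any* column, so that one column is enough to remove every row. This sketch shows two ways around it. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "mostly_empty": [np.nan, np.nan, np.nan],   # a column with no values at all
    "value": [10.0, np.nan, 30.0],
})

# Every row has a NaN in "mostly_empty", so how="any" removes every row
all_rows_gone = df.dropna(how="any")

# Option 1: drop the problem column first, then drop rows
rows_kept = df.drop(columns="mostly_empty").dropna(how="any")

# Option 2: only consider specific columns when deciding which rows to drop
rows_kept_subset = df.dropna(subset=["value"])
```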
@adelabdallah3833 1 year ago
I actually have a question. I have a DataFrame grouped by month and country. Some of those countries don't have a value for a certain month, which is causing anomalies in the visualization. I want to generate a record with zero for the month and country if no record is found. How can I achieve that? Thanks in advance
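One way to do this is to build every (month, country) combination and reindex the grouped result, filling the absent combinations with 0. This is a minimal sketch with made-up data and hypothetical column names.

```python
import pandas as pd

# Made-up monthly totals; "FR" has no row for 2024-02
sales = pd.DataFrame({
    "month": ["2024-01", "2024-01", "2024-02"],
    "country": ["DE", "FR", "DE"],
    "total": [5, 3, 7],
})

grouped = sales.groupby(["month", "country"])["total"].sum()

# Every combination of the observed months and countries
full_index = pd.MultiIndex.from_product(grouped.index.levels, names=grouped.index.names)

# Missing combinations get 0 instead of being absent
complete = grouped.reindex(full_index, fill_value=0).reset_index()
```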
@ItsWithinYou 2 years ago
I have one column with 100 rows. After dropping 4 rows with null values, the new column has 96 rows. How do I write code that tells me which 4 rows were dropped?
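Both approaches in this sketch return the index labels of the removed rows. The first finds them before dropping, and the second compares indexes afterward. The data is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Option 1: labels of the missing rows, found before dropping
missing_labels = s[s.isnull()].index.tolist()

# Option 2: compare the index before and after dropna
kept = s.dropna()
dropped_labels = s.index.difference(kept.index).tolist()
```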
@nasser_omar 3 years ago
What about displaying the rows where columns 'A' and 'B' both have missing values?
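This sketch combines two Boolean conditions with `&`, so a row is kept only when both columns are missing. It also shows a form that scales to more columns. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 1.0, np.nan],
    "B": [np.nan, np.nan, 2.0],
})

# & combines the two Boolean Series element-wise (both must be True)
both_missing = df[df["A"].isnull() & df["B"].isnull()]

# Equivalent for any list of columns
both_missing_alt = df[df[["A", "B"]].isnull().all(axis=1)]
```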
@gabrielreilly7010 3 years ago
Great videos covering the basics. I enjoy how the additional parameters of the functions are covered, e.g. axis.
@dataschool 3 years ago
Glad it was helpful!
@eshaal2525 1 year ago
Hi, when I change NA to NaN in my DataFrame, all the integers become floats... At first no null values were shown, but now there are null values.
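NaN is a floating-point value, so an integer column that gains NaN is upcast to float64. If the column should stay integer, pandas' nullable `Int64` dtype (pandas 0.24 or later) stores missing values as `<NA>` instead. This assumes the non-missing values are whole numbers. The data is made up.

```python
import numpy as np
import pandas as pd

counts = pd.Series([3, 5, 7])

# NaN is a float, so the integer column becomes float64
with_nan = counts.replace(5, np.nan)

# The nullable integer dtype keeps integers and marks missing values as <NA>
nullable = with_nan.astype("Int64")
```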
@asutoshnayak1391 3 years ago
Bro, how do I do data cleaning in pandas? What methods are used for it? Please reply
@rayrivera1830 4 years ago
What happens if you have missing values while training a model, e.g. XGBoost?
@محمدالفقى-ي4ب 4 years ago
Thank you so much
@dataschool 4 years ago
You're welcome!
@wesleypgurira7142 2 years ago
Hey, how can we replace a NaN value with the previous value in a dataset, like the UFO shapes? Instead of VARIOUS, you would put, say, the rectangle shape if that came right before the NaN value.
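A forward fill copies the last valid value down into each gap. A NaN at the very start has nothing before it, so it would stay missing, and `bfill()` works in the opposite direction. The data is made up.

```python
import numpy as np
import pandas as pd

shape = pd.Series(["RECTANGLE", np.nan, "DISK", np.nan, np.nan])

# ffill() propagates the last valid value forward into each gap
shape_filled = shape.ffill()
```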
@AiBeast 4 years ago
Nice video, sir
@dataschool 4 years ago
Thanks!
@indreshkumar2002 7 years ago
You are superb. I took a paid course, but they were not able to explain these things to me as easily as you did. Thanks a lot.
@dataschool 7 years ago
You are very welcome! Thanks so much for your kind comment!
@nadyamoscow2461 2 years ago
Thanks a lot, your course is really helpful and very detailed. You are a great teacher!
@dataschool 2 years ago
Thank you!
@oysteijo 8 years ago
Hi Kevin! How can I fill NA based on a condition? Say I want to fill NA for all missing cities, but only if the color is red.
@dataschool 8 years ago
Great question! ufo.loc[(ufo.City.isnull()) & (ufo['Colors Reported']=='RED'), 'City'] = 'New value'
@ashishkhuraishy 6 years ago
Man, thx btw 😁
@prudhviraj3651 4 years ago
Where did you handle the missing values? You just showed where the missing values are, lol
@ashishkumar-fk8rh 4 years ago
Can you make a tutorial on how to handle null values in time-series data?
@dataschool 4 years ago
Thanks for your suggestion!
@ashishkumar-fk8rh 4 years ago
@@dataschool Sir, have a look at my capstone project. I did it up to data preprocessing. Tell me how I am doing. Did I make any mistakes or leave something out? Can you give me suggestions to make it better? Here is the GitHub link: github.com/aishrock006/Capstone-Project.git
@dataschool 4 years ago
Sorry, I won't be able to review your project - good luck!
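For time-series data with uneven gaps, `interpolate(method="time")` fills each value by the actual time elapsed, which assumes the index is a DatetimeIndex. The default instead treats rows as equally spaced. The readings are made up.

```python
import numpy as np
import pandas as pd

# Made-up readings with an irregular gap (there is no row for January 3)
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
readings = pd.Series([0.0, np.nan, 3.0], index=idx)

# Weighted by elapsed time: Jan 2 is one third of the way from Jan 1 to Jan 4
by_time = readings.interpolate(method="time")

# Default linear method: Jan 2 is treated as halfway between its neighbors
by_position = readings.interpolate()
```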
@josekuruvilla1 4 years ago
This is an excellent explanation! I have a question regarding fillna. In my DataFrame, I want to fill all 'nan' values with an empty value. Please see my situation below and help me.
excel_file_path = 'theData.xlsx'
df = pd.read_excel(excel_file_path)
df = df.astype(str)
for column in df.columns:
    df[column] = df[column].str.replace(r'[^\._!a-zA-Z0-9\s-]', '', flags=re.I, regex=True)
    df[column].fillna(value='', inplace=True)
print(df)
Can you please tell me why fillna is not working here?
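A likely cause, assuming pandas 2.x or earlier: `astype(str)` turns NaN into the literal text "nan", so `fillna` no longer finds any missing values. pandas 3.0's new string dtype changes this behavior. Filling before converting avoids the issue in either case. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ann", np.nan], "code": [1.0, np.nan]})

# In pandas 2.x and earlier, this produces the string "nan" instead of a missing value
as_text = df.astype(str)
nan_strings = (as_text == "nan").sum().sum()
real_missing = as_text.isna().sum().sum()

# Fill first, then convert to strings
fixed = df.fillna("").astype(str)
```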
@muhammadusmankhan5645 3 years ago
How do I replace '---' values in a comma-separated file with 0?
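This sketch shows two options, assuming '---' is a placeholder in an otherwise numeric CSV column. The CSV text and column names are made up.

```python
import io

import pandas as pd

csv_text = "item,qty\nA,3\nB,---\nC,5\n"

# Option 1: treat '---' as missing while reading, then fill with 0
df = pd.read_csv(io.StringIO(csv_text), na_values=["---"])
df["qty"] = df["qty"].fillna(0)

# Option 2: replace the string in an already-loaded column, then convert to numbers
raw = pd.read_csv(io.StringIO(csv_text))
raw["qty"] = pd.to_numeric(raw["qty"].replace("---", "0"))
```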
@magelauditore333 4 years ago
Sir, please make a series on NumPy, please. Earnest request.
@TrevorHigbee 4 years ago
Great videos. I love how all the CSVs are available online.
@dataschool 4 years ago
Thanks! 😄
@tommonks2490 4 years ago
Great explanation. This was a huge help. Thanks so much!
@dataschool 4 years ago
You're very welcome!
@nataliyakunderevych1211 6 years ago
Super. I understood everything. Nice explanation
@dataschool 6 years ago
Thanks!
@carolinasantoslages5604 4 years ago
Excellent! I still have one question: how do I create a third column (dummy variable) based on two other columns (dummy variables), considering that they have missing values? I don't want to lose information. In other words, I want to treat the pair (NaN, 1) as 1 and (0, NaN) as 0.
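Assuming the combined dummy should be 1 when either input is 1 (a logical OR), a row-wise `max` fits this case. It skips NaN by default, so (NaN, 1) becomes 1 and (0, NaN) becomes 0. Only a row where both are missing stays NaN. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "d1": [np.nan, 0.0, 1.0, np.nan],
    "d2": [1.0, np.nan, 0.0, np.nan],
})

# Row-wise max ignores NaN unless every value in the row is NaN
df["combined"] = df[["d1", "d2"]].max(axis=1)
```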
@ashishsahu2925 3 years ago
Really helpful. This means that if one needs to find the rows with 1 or more null values, the code should look like dataframe[dataframe.isnull().sum(axis=1) > 0].
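The commenter's expression works. `any(axis=1)` selects the same rows and says "at least one missing" a little more directly. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

via_sum = df[df.isnull().sum(axis=1) > 0]
via_any = df[df.isnull().any(axis=1)]   # True if any value in the row is missing
row_count = len(via_any)
```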
@aravindgoli9407 5 years ago
Super video
@dataschool 5 years ago
Thank you!
@kostasnikoloutsos5172 7 years ago
I think you can use the str method replace to convert all NaN values into whatever you want. What do you think?
@dataschool 7 years ago
Might be possible, I'm not sure!
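A quick check suggests it does not work that way. pandas string methods pass missing values through unchanged, so `str.replace` never touches the NaN entries, while `fillna` targets them directly. The data is made up.

```python
import numpy as np
import pandas as pd

shape = pd.Series(["LIGHT", np.nan])

# String methods skip missing values: the NaN is still NaN afterwards
after_str_replace = shape.str.replace("LIGHT", "DISK")

# fillna is the method that replaces the missing values themselves
after_fillna = shape.fillna("VARIOUS")
```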
@TheBeltranito 3 years ago
Hey, first of all, thanks a lot for your videos! One question regarding the fillna() method you use: at the end of the video, when you check the NAs in Shape Reported, it says there are 2644 NaN values. However, after you use fillna(), there appear to be 2977 VARIOUS. I don't understand why there are more VARIOUS than NaN. Thanks in advance
@TheBeltranito 3 years ago
Okay, never mind: there was already a group called VARIOUS with 333 observations (2644 + 333 = 2977).
@samiaydogan8796 4 years ago
What if I want to replace NaN values with the mean of the last 2 values before the NaN? What should I do? (For example: df[0]=12, df[1]=2, df[3]=NaN, so df[3] should be 7.)
@dataschool 4 years ago
Sorry, I don't quite understand your question... good luck!
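One reading of the question is that each NaN should be filled with the mean of the two values just before it. A 2-row rolling mean shifted down one row gives exactly that. This sketch assumes gaps are single NaNs, because a second consecutive NaN would see a window containing NaN and stay missing. The data is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([12.0, 2.0, np.nan, 4.0])

# Mean of each value and the one before it, shifted so row i sees rows i-2 and i-1
prev_two_mean = s.rolling(window=2).mean().shift(1)

filled = s.fillna(prev_two_mean)
```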
@osmankhaled4565 4 years ago
Hi, I imported an Excel file from a website with five columns. The numbers in one column are preceded by an apostrophe ('4567). How do I remove the apostrophe so that I have only the integer (4567)? Thanks
@dataschool 4 years ago
You'll need to use a string method and then change the data type of the column. See ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-bofaC0IckHo.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-V0AWyzVMf54.html - hope that helps!
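This sketch follows the reply's two steps: remove the leading apostrophe with a string method, then change the dtype. It assumes every value in the column is a whole number once the apostrophe is gone. The data is made up.

```python
import pandas as pd

codes = pd.Series(["'4567", "'12", "'900"])

# Strip the leading apostrophe, then convert to integers
numbers = codes.str.lstrip("'").astype(int)
```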
@keshavkashyap2012 4 years ago
In this tutorial, to find the missing city names we used the syntax ufo[ufo.City.isnull()], but what if I need to find missing "Shape Reported" values? The syntax ufo[ufo.Shape Reported.isnull()] doesn't work. How do I handle the space?
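Attribute access (`ufo.City`) only works for column names that are valid Python identifiers. Bracket notation works for any name, including names with spaces. Renaming the columns is another option. The toy DataFrame below uses the video's column names, but its rows are made up.

```python
import numpy as np
import pandas as pd

ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro"],
    "Shape Reported": ["TRIANGLE", np.nan],
})

# Bracket notation works for any column name
missing_shape = ufo[ufo["Shape Reported"].isnull()]

# Alternatively, replace spaces with underscores so attribute access works
renamed = ufo.rename(columns=lambda c: c.replace(" ", "_"))
missing_shape_attr = renamed[renamed.Shape_Reported.isnull()]
```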
@fatemehbehrad7161 5 years ago
Soooo interesting 😀 thanks a lot 🙏🙏🙏🙏🙏
@dataschool 5 years ago
You're very welcome!
@eniisy 2 years ago
Dude, it's just an awesome video. Forgive me for saying this, but 1.25x playback speed feels more normal, haha. Love ya, and I appreciate your effort in teaching piece by piece!!!!!
@dataschool 2 years ago
Thank you!
@lingobol 7 years ago
How do I deal with data where I can neither drop it nor fill in any appropriate values? For example, with 'Colors Reported' in this example, we can't drop the missing rows because that would drastically reduce the number of observations, down to 2882. And we can't fill in any value because it would affect the model.
@dataschool 7 years ago
You could choose to not use that column as a feature, or you could fill in the missing values with imputed values. However, explaining missing value imputation is not something I can do in a few sentences. Good luck!
@rishusaini7874 6 years ago
Hi Data School, I hope you are doing very well. My question: can we highlight NaN values in the resulting data? Looking forward to your reply. Thanks
@dataschool 6 years ago
There's no simple way that I'm aware of.
@jonathanfriz4410 3 years ago
Hi, how can you handle "ValueError: arrays must all be same length" when df.transpose() is not an option?
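This sketch assumes the error comes from building a DataFrame out of a dict of lists with different lengths. Wrapping each list in a Series aligns them on the index and pads the shorter ones with NaN, which the missing-value tools then handle. The data is made up.

```python
import pandas as pd

data = {"a": [1, 2, 3], "b": [4, 5]}   # lists of different lengths

# pd.DataFrame(data) would raise the ValueError; Series objects are aligned
# on their index instead, and the shorter one is padded with NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})
```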
@BrokenLightPole 5 years ago
Great video and explanation as always!
@dataschool 5 years ago
Thanks!
@easymortgagesolutionsptylt2018 2 years ago
Would you be able to do a video explaining the different methods of importing files into pandas and how to deal with corrupt files? Thanks in advance! By the way, I like the way you explain!!
@dataschool 2 years ago
Thanks for your suggestion!
@msctube45 4 years ago
Excellent video, Data School, very helpful. Your explanations are clear and objective. Thank you!
@causap 5 years ago
Markham, Make America Great Again... You're the boss.
@dataschool 5 years ago
Ha! Thanks very much :)
@sharathnandalike8108 5 years ago
Hello sir, if there is a column in a DataFrame with only numerical values and there is a single missing value in that column, can we take the mean of the column's values to fill the missing value? Is this a correct method, and how would I do it? How would this apply to 2 or 3 missing values? Thanks.
@dataschool 5 years ago
I think you can use fillna for this purpose.
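Following the reply's suggestion, the column mean can be passed to `fillna`. `mean()` skips NaN, so the mean comes from the observed values only, and the same call works for one missing value or many. Whether mean imputation suits the data is a separate judgment. The values are made up.

```python
import numpy as np
import pandas as pd

values = pd.Series([10.0, np.nan, 30.0, np.nan, 20.0])

# Mean of the observed values (NaN skipped), used for every missing entry
filled = values.fillna(values.mean())
```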
@ragurajan7567 5 years ago
Why should I use get_dummies when I can just use the .replace function instead? In cases where one category carries more weight than another (for example, in Titanic survival, the upper class had a better chance of surviving than the lower class), if I replace upper class with 3 and lower class with 2 or 1, it's actually going to improve my model, right?
@dataschool 5 years ago
With ordered classes, you are correct that you can create a single feature column as you described. With unordered classes, that's not the case.
@saachishivhare4836 4 years ago
I am really loving your videos. I discovered your channel just 2 days ago!! Earlier I had no idea about pandas, but after watching your videos I feel that I will be able to work on my assignment. Great work! Thank you!
@adarshpandey6594 5 years ago
I have '?' in my dataset instead of 'NaN'... how do I handle it with the Imputer library? Please help me fast... I am doing my project and I am about to hit the deadline... please
@dataschool 4 years ago
Sorry, but I'm too late!
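Whatever imputer is used afterwards, a useful first step is to turn the '?' placeholder into a real missing value. Numeric columns can then be converted from text. The scikit-learn Imputer mentioned in the question is not shown here. When reading a file, `pd.read_csv(..., na_values="?")` does the same thing at load time. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": ["34", "?", "51"], "city": ["Oslo", "Rome", "?"]})

# Turn the placeholder into a real missing value
df = df.replace("?", np.nan)

# Numeric columns can now be converted; fillna or an imputer will see NaN
df["age"] = pd.to_numeric(df["age"])
missing_counts = df.isna().sum()
```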
@vijaysinghchauhan5118 4 years ago
Have you posted any video on how to replace NaN values in a column by deriving them from other columns, e.g. using KNN or another imputation technique?
@dataschool 4 years ago
No, sorry!
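KNN imputation (for example scikit-learn's KNNImputer) is not shown here. This sketch swaps in a simpler technique that still derives fills from another column: each missing value is replaced with the mean of its group. The column names `pclass` and `age` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pclass": [1, 1, 3, 3, 3],
    "age": [40.0, np.nan, 20.0, 30.0, np.nan],
})

# transform("mean") returns each row's group mean, aligned to the original rows
group_mean = df.groupby("pclass")["age"].transform("mean")
df["age_filled"] = df["age"].fillna(group_mean)
```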