Handling missing data | Numerical Data | Simple Imputer 

CampusX
226K subscribers
52K views

SimpleImputer is a practical way to fill missing numerical values in a dataset. It replaces missing entries with the column mean, the column median, or a specified constant, which gives a straightforward way to limit the impact of missing numerical data.
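As a rough illustration of the three strategies named in the description, here is a minimal sketch using scikit-learn's SimpleImputer. The toy array and the fill value of -1 are assumptions chosen for the example; they are not taken from the video's notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up): two numerical columns with one missing value each.
X = np.array([
    [22.0, 7.25],
    [np.nan, 71.28],
    [35.0, np.nan],
    [54.0, 8.05],
])

# Mean imputation: each NaN is replaced by its column mean.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Median imputation: each NaN is replaced by its column median.
X_median = SimpleImputer(strategy="median").fit_transform(X)

# Constant imputation: each NaN is replaced by a value you choose (-1 is an assumption).
X_const = SimpleImputer(strategy="constant", fill_value=-1).fit_transform(X)

print(X_mean)
print(X_median)
print(X_const)
```

The imputer learns one statistic per column, so the two columns are filled independently.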
Code Used: github.com/cam...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.camp...
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
00:37 - Handling Missing Numerical Data
03:33 - Mean / Median Imputation
07:55 - Code Demo
17:15 - Imputation using scikit-learn
20:15 - Arbitrary Value Imputation
22:40 - Code Demo
25:57 - End of Distribution Imputation
30:09 - Outro
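
The last two techniques in the timestamps, arbitrary value imputation and end-of-distribution imputation, can be sketched as follows. This is only an illustration under stated assumptions: the toy Age values, the constant -1, and the two end-of-distribution formulas (mean + 3 * std for roughly normal data, Q3 + 1.5 * IQR for skewed data) are commonly described variants, and the video may use different choices.

```python
import numpy as np
import pandas as pd

# Toy column (made up) with two missing entries.
age = pd.Series([22.0, 38.0, np.nan, 35.0, 54.0, np.nan, 2.0, 27.0], name="Age")

# Arbitrary value imputation: fill with a constant outside the observed range,
# so imputed rows stay distinguishable. The value -1 is an assumption.
age_arbitrary = age.fillna(-1)

# End-of-distribution imputation, variant for roughly normal data.
eod_normal = age.mean() + 3 * age.std()

# End-of-distribution imputation, variant for skewed data.
q1, q3 = age.quantile(0.25), age.quantile(0.75)
eod_skewed = q3 + 1.5 * (q3 - q1)

age_eod = age.fillna(eod_normal)
print(age_arbitrary.tolist())
print(eod_normal, eod_skewed)
```

Both variants place the fill value in the far tail of the observed distribution instead of at its center.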

Published: 14 Aug 2024

Comments: 69
@PS_nestvlogs 2 years ago
Nitish sir, mark my words: a time will come when people like us will turn only to your channel, CampusX, whenever data science comes up. Such rich content. After watching your videos I felt that, had I known earlier, I would not have enrolled in a course and would have learned data science from your channel alone. You are an inspiration to everybody in this domain. Please never shut down the channel, sir.
@akash.deblanq 3 years ago
Man, I can't stop commenting about how thorough your content is. Every other video on YouTube covers this topic in 5 minutes. Thanks a lot for taking the time to talk about all the nuances.
@rachit_2410 4 months ago
From now on, I will comment on every video of yours that I watch.
@ayesha11261 1 month ago
I'm literally SO happy I found this video. I was trying so hard to find any resources that explained when to impute with what, and you explained it SO well here. Thank you so much.
@ajaykushwaha-je6mw 3 years ago
Sir, no amount of praise for you is enough. You teach very, very well.
@namanjoshi5089 7 months ago
Can't believe the depth and how easily I understood everything. This content is top tier.
@AltafAnsari-tf9nl 1 year ago
This channel has the best content, with a notebook for each topic and in-depth intuition. Keep it up.
@JACKSPARROW-ch7jl 1 year ago
Thanks a lot, bhai. God will fulfill all your wishes 🙌🙌🙌
@ridoychandraray2413 1 year ago
Dada, you are better than Krish Naik.
@charansai235 4 months ago
Nitish bhaiya, I'm commenting on every video, but you still deserve the applause for the detailing. I'm looking to enroll in the 2.0.
@MuhammadJunaid-yr8jd 1 year ago
Sir, salute to you and your teaching.
@shreyasur1534 1 month ago
Sir, please make a video on cross-validation.
@tusharshukla9361 1 year ago
Thank God I found this channel 🙏❣
@ParthivShah 5 months ago
Thank you.
@MonkeyDLuffy4885 1 year ago
I like the video first, then watch it. I am pretty sure the content is awesome 😁
@learningpoint19 3 years ago
I watched your SVM kernel trick video and then came to the channel to look at the brand new content and check how it has improved. I feel you have improved a lot. Best wishes to you.
@devgupta2040 7 months ago
At 10:57, sir mistakenly filled the missing values of Fare with median_age and mean_age. It should be median_fare and mean_fare, respectively. The correct code is in the GitHub repo.
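A minimal sketch of the corrected fill described in this comment, assuming a toy stand-in for the training split. The variable names mean_age, median_age, mean_fare, and median_fare come from the comments; the output column names and the data values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training split (values are made up).
X_train = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, np.nan, 51.86],
})

# Compute each statistic from the column it will be used to fill.
mean_age, median_age = X_train["Age"].mean(), X_train["Age"].median()
mean_fare, median_fare = X_train["Fare"].mean(), X_train["Fare"].median()

# Age is filled with Age statistics.
X_train["Age_mean"] = X_train["Age"].fillna(mean_age)
X_train["Age_median"] = X_train["Age"].fillna(median_age)

# Fare is filled with Fare statistics, not the Age ones.
X_train["Fare_mean"] = X_train["Fare"].fillna(mean_fare)
X_train["Fare_median"] = X_train["Fare"].fillna(median_fare)

print(X_train)
```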
@MeetShingala-un1zl 22 days ago
I wanted to ask whether you are also supposed to fill the null values in X_test with the median.
@jimmysandhu66 2 years ago
Hi. As you always say, fit the imputer/encoder on test data and transform both test and split. But isn't there a chance of data leakage with that? As explained in the Kaggle course, data leakage happens if we do this.
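The usual way to avoid the leakage raised here is to learn the fill value from the training split only and reuse it for the test split. The sketch below assumes hand-made toy splits; it shows the general scikit-learn pattern, not the exact cells from the video.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy splits (made up). The test split contains an extreme value on purpose.
X_train = np.array([[20.0], [np.nan], [30.0], [40.0], [50.0]])
X_test = np.array([[np.nan], [1000.0]])

imputer = SimpleImputer(strategy="median")

# fit() looks only at the training rows, so the test rows cannot influence the median.
imputer.fit(X_train)

# The same training median fills the gaps in both splits.
X_train_imp = imputer.transform(X_train)
X_test_imp = imputer.transform(X_test)

print(imputer.statistics_)
print(X_test_imp)
```

Here the missing test value is filled with the training median, and the extreme test value of 1000 plays no part in computing it.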
@namanjoshi5089 7 months ago
I have a doubt: if a large share of the numerical data is missing, say 10% or more, what should we do? Which method should we go for? By the way, great video, sir!
@mylofiworld9979 5 months ago
Love you, sir 🎉
@tusarmundhra5560 9 months ago
Awesome.
@ali75988 7 months ago
If less than 5% of the data is missing, we use the mean/median for numerical columns. What do we do if more than 5% is missing?
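To apply a rule of thumb like the 5% threshold mentioned in these comments, you first need the share of missing values per column. A minimal sketch, assuming a made-up DataFrame; the threshold comes from the comments, and what to do above it is not answered in this thread.

```python
import numpy as np
import pandas as pd

# Toy data (made up): 40 rows, Age has 1 missing value, Fare has 8.
age = [30.0] * 40
age[3] = np.nan
fare = [10.0] * 40
for i in range(8):
    fare[i * 5] = np.nan
df = pd.DataFrame({"Age": age, "Fare": fare})

# isna().mean() gives the fraction of missing entries in each column.
missing_share = df.isna().mean()

# Columns under the 5% rule of thumb quoted in the comments.
threshold = 0.05
under_threshold = missing_share[missing_share < threshold].index.tolist()

print(missing_share)
print(under_threshold)
```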
@gabriel46567 4 months ago
How do y'all access English subtitles for this?
@sandipansarkar9211 1 year ago
Finished coding and watching.
@sanjaisrao484 10 months ago
Thanks ji
@heetbhatt4511 11 months ago
Thank you, sir.
@sreekanthtalari6582 3 years ago
Hi sir, is it mandatory to split the data into training and testing datasets?
@campusx-official 3 years ago
Yes, some form of cross-validation is required.
@sidindian1982 2 years ago
@campusx-official Sir, you are God :-)
@ShubhamSharma-gs9pt 2 years ago
Thanks, sir! Great video :)
@GauravKawatrakir 7 months ago
What if we don't fill in the missing data when less than 5% is missing?
@pradyumnakumardhar1541 2 years ago
Sir, do we do the variance check on the whole dataset or only on the X_train data?
@vikranttomar8392 1 year ago
You used add_plot and plotted multiple KDEs with matplotlib. Can you drop the code in a reply to do the same using seaborn?
@ajinkyakumbhar7585 1 year ago
Sir, suppose we use the mean or median to fill NA values, and the data distribution changes after filling. The question is: can we apply a transformer function to the data after filling the NA values?
@nitheesh340 11 months ago
Yes, at least per the sequence of steps he explained, imputation is followed by the transformation techniques. Transformations don't work with null values.
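The order described in this reply, imputation first and transformation second, can be sketched with a scikit-learn Pipeline. The log transform, the step names, and the toy values are assumptions picked for illustration; the thread does not say which transformer the video uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy skewed column (made up) with one missing value.
X_train = np.array([[7.25], [np.nan], [71.28], [8.05], [512.33]])

pipe = Pipeline([
    # Step 1: fill missing values with the training median.
    ("impute", SimpleImputer(strategy="median")),
    # Step 2: apply the transformation to the now complete column.
    ("log", FunctionTransformer(np.log1p)),
])

X_train_out = pipe.fit_transform(X_train)
print(X_train_out)
```

Putting both steps in one pipeline keeps the order fixed and lets the same fitted steps be reused on the test split with pipe.transform.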
@kushh7550 2 years ago
Thanks a lot, sir!
@pratikp007 1 year ago
I like every video that I have watched on your YouTube channel. But can you help me with this? I'm unable to download the dataset from your GitHub link. Can you please fix it?
@mohitkushwaha8974 1 year ago
Why can't we fill the missing data before the split? It would be easier for us; otherwise we have to repeat the same process for the test data as well.
@Neo_harris 1 year ago
@CampusX Don't you think that in line 149, for the Fare variable, we should use .fillna(mean_fare/median_fare) instead of .fillna(mean_age/median_age)?
@AbhishekSharma-xr2zu 1 year ago
@campusX I have the same doubt too. Can you please clarify, sir?
@siddhiyadav1537 1 year ago
It's a typo, I guess!
@ArjunYadav-tp6ee 3 years ago
Awesome.
@ajaykushwaha-je6mw 3 years ago
Is it possible to use both mean and median with SimpleImputer on the same feature?
@GamerBoy-ii4jc 2 years ago
Sir, according to Google, the mean is not robust to outliers but the median is. Is that true? If so, why does the number of outliers in the box plot increase after applying the median method?
@krishnakanthmacherla4431 2 years ago
A box plot works by calculating percentile values. Because we imputed the null values with the median, the newly imputed values land in the interquartile range, and points that now fall outside the whiskers of the box plot are flagged as outliers. Sir also said that those are not really outliers but are shown as outliers. Please check how a box plot is drawn and you will get clarity 👍👍
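The effect described in this reply can be reproduced numerically. Adding many copies of the median pulls the quartiles toward the center, the interquartile range shrinks, and the 1.5 * IQR whiskers tighten around values that were previously inside them. The data below is made up, and the helper count_iqr_outliers is a hypothetical name for this sketch.

```python
import numpy as np

def count_iqr_outliers(values):
    """Count points beyond the 1.5 * IQR whiskers used by a standard box plot."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(np.sum((values < lower) | (values > upper)))

# Observed values (made up); assume an equal number of entries were missing.
observed = np.array([2, 10, 18, 22, 25, 28, 30, 35, 40, 50, 60, 70], dtype=float)
n_missing = 12

# Median imputation adds n_missing copies of the observed median.
imputed = np.concatenate([observed, np.full(n_missing, np.median(observed))])

outliers_before = count_iqr_outliers(observed)
outliers_after = count_iqr_outliers(imputed)
print(outliers_before, outliers_after)
```

None of the observed values changed; only the whisker positions moved, which is why the extra points are an artifact of the imputation.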
@itsamankumar403 8 months ago
TYSM :)
@ko_bong_love3214 3 months ago
One thing to note: the Fare median and mean columns were mistakenly filled with the median age and mean age.
@JourneytoWINover_PORN 7 months ago
X_train = trf.transform(X_train) X_test = trf.transform(X_test): in the part above, shouldn't X_train use fit_transform?
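On the question above: a transformer has to be fitted before transform can be called, so either fit_transform is used on X_train, or fit(X_train) must have run in an earlier step. A minimal sketch of the first option, assuming trf is a ColumnTransformer wrapping SimpleImputer; the step names, strategies, and data are assumptions, since the thread does not show how trf is defined.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy splits (made up).
X_train = pd.DataFrame({"Age": [22.0, np.nan, 35.0, 54.0],
                        "Fare": [7.25, 71.28, np.nan, 51.86]})
X_test = pd.DataFrame({"Age": [np.nan, 40.0],
                       "Fare": [8.05, np.nan]})

# Hypothetical definition of trf: a different strategy per column.
trf = ColumnTransformer([
    ("impute_age", SimpleImputer(strategy="median"), ["Age"]),
    ("impute_fare", SimpleImputer(strategy="mean"), ["Fare"]),
], remainder="passthrough")

# fit_transform learns the statistics from X_train and applies them in one call.
X_train_out = trf.fit_transform(X_train)

# The test split only gets transform, reusing the training statistics.
X_test_out = trf.transform(X_test)

print(X_train_out)
print(X_test_out)
```

If trf.fit(X_train) was already called earlier, the two transform calls quoted in the comment give the same result as this version.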
@arshad1781 3 years ago
Thanks
@vinayvakkalagadda7366 1 month ago
How do we know whether data is missing at random or not?
@ursdrex 19 days ago
Plot the data.
@aiforeveryone 1 year ago
Cool
@asharkhokhar1839 2 months ago
The GitHub link is not working.
@yashjain6372 1 year ago
Best
@shishukumarchoudhary5989 2 years ago
Sir, we remove the columns that have less than 5% missing values, but the change is not reflected in the original dataset. How do we deal with this?
@tarunchauhan2339 1 year ago
Use inplace=True as an argument.
@shishukumarchoudhary5989 1 year ago
@tarunchauhan2339 Thanks, bhai, for your response.
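The behavior behind this exchange: pandas methods such as dropna return a new DataFrame by default and leave the original untouched. A minimal sketch with made-up data, assuming the goal is to drop rows with a missing Age; reassigning the result is an alternative to the inplace=True suggestion.

```python
import numpy as np
import pandas as pd

# Toy data (made up): one row has a missing Age.
df = pd.DataFrame({"Age": [22.0, np.nan, 35.0],
                   "Fare": [7.25, 71.28, 8.05]})

# dropna returns a new frame; df itself still has all 3 rows at this point.
dropped = df.dropna(subset=["Age"])
rows_in_original = len(df)

# Option 1: reassign the result back to the same name.
df = df.dropna(subset=["Age"])

# Option 2 (equivalent effect): df.dropna(subset=["Age"], inplace=True)

print(rows_in_original, len(dropped), len(df))
```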
@dileepkr6069 10 days ago
Please re-upload the coding part.
@dilipgyawali1776 2 years ago
Sir, please provide the dataset that you created and used.
@campusx-official 2 years ago
github.com/campusx-official/100-days-of-machine-learning/tree/main/day36-imputing-numerical-data
@dilipgyawali1776 2 years ago
Thank you, sir.
@AIWALABRO 2 years ago
Your code file is not accessible. Please provide the link to the GitHub repo.
@campusx-official 2 years ago
github.com/campusx-official/100-days-of-machine-learning/tree/main/day36-imputing-numerical-data
@Noob31219 2 years ago
All the concepts are just sticking in my mind 😄😇
@JazlaanUr-Rehman 11 months ago
There is a mistake in it.
@amitattafe 1 year ago
Sir, please change the thumbnail and the topic name too. I have been preparing for the Kaggle Titanic dataset competition for 2 days, and Google only showed me this video today.
@vikaskadam9842 1 year ago
Sir, salute to you and your teaching.