Handling Missing Data | Part 1 | Complete Case Analysis

54K views

Handling missing data is an essential step in the data preprocessing pipeline: it ensures that ML models are trained on high-quality, representative datasets, which leads to more accurate and reliable predictions. Depending on the nature and impact of the missing data, you can drop the affected rows, impute values, or use advanced methods such as Multiple Imputation.
Code Used: github.com/campusx-official/1...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.campusx.in/s/store
============================
📱 Grow with us:
CampusX' LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
00:58 - Handling Missing Data
05:50 - Complete Case Analysis [CCA]
07:09 - Assumption for CCA
09:38 - Advantages and Disadvantages of CCA
11:39 - When to use CCA?
13:24 - Code Example

Published: 13 Jul 2024

Comments: 47

@ajaykushwaha-je6mw 3 years ago

You are the first YouTuber on YouTube with zero dislikes. It makes me happy. Sir, your effort is commendable!

@akash.deblanq 2 years ago

I can't comprehend how much I've learned from your videos. Got my first silver medal on Kaggle today. All credit goes to you. Feature engineering is so important, I'm focusing really hard on all these topics and you've done an amazing job at making these thorough tutorials. You're a great teacher. 🙏

@MuhammadJunaid-yr8jd 11 months ago

You are the first YouTuber on YouTube with zero dislikes. It makes me happy. Sir, your effort is commendable.

@ayesha11261 12 days ago

This was extremely helpful and exactly what I was looking for. Thank you

@GamerBoy-ii4jc 2 years ago

Sir, you are the first to explain fully why and when to apply CCA; most people just tell you to apply it without explaining why. Thank you so much again, sir, for providing this knowledge.

@Sanjay_Singh_Bisht 1 year ago

A real guru. I have felt blessed ever since I watched your videos.

@1111Shahad 1 month ago

This is gold for new learners. Thanks, Nitish.

@somilsaxenamusicworld8053 9 months ago

You're the best on the whole of YouTube ❤

@AmbujRai-ft5cx 1 hour ago

Use new_df = df.dropna(subset=cols) to drop the rows and keep the columns as they are, i.e. new_df.shape = (17182, 13)
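
As a rough illustration of the call above, here is a minimal sketch on a toy frame. The data and column names are made up for the example and are not the video's dataset; only `dropna(subset=...)` itself comes from the comment.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical); the video's dataset is not reproduced here.
df = pd.DataFrame({
    "city": ["A", "B", None, "D", "E"],
    "experience": [1.0, np.nan, 3.0, 4.0, 5.0],
    "target": [0, 1, 0, 1, 0],
})

cols = ["city", "experience"]  # columns whose missing rows should be dropped

# Drop every row with a missing value in any of `cols`; all columns are kept.
new_df = df.dropna(subset=cols)

print(df.shape, new_df.shape)
```

Only the row count changes; the column count stays the same, which matches the shape reported in the comment.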

@Aestheticdeeps 9 months ago

Thank you so much, sir, crystal clear 😇

@ShubhamSharma-gs9pt 2 years ago

Thanks, sir! Great video.

@ParthivShah 4 months ago

Thank You.

@user-eg9ff7ww8q 16 days ago

best lecture

@ajaykushwaha4233 3 years ago

Guru ji, you teach amazingly well; it is a joy to learn from you.

@jitendratrivedi7889 2 years ago

My target is to complete the playlist by 12th January 202. You deserve more views. You are doing a great job.

@rohinijadhav744 2 years ago

Hey .. have you completed it?

@rohinijadhav744 2 years ago

Can you please share your experiences?

@hritikroshanmishra3630 11 months ago

@rohinijadhav744 did you do it??

@furry2fun 11 months ago

@rohinijadhav744 This is the best course you can find on YouTube. I completed it once; now I am revising it for the second time.

@GAMEZONEX7912 2 months ago

Hi! Thank you for the wonderful playlist. I have a query: can we remove missing values using XGBoost, or probabilistic methods like Bayesian statistics?

@yashjain6372 1 year ago

best as always

@kislaykrishna8918 3 years ago

clearly explained

@heetbhatt4511 10 months ago

Thank you sir

@tusarmundhra5560 8 months ago

awesome

@SumitKumar-uq3dg 1 year ago

Hello sir. How can we tell whether the missing data is missing at random or not?

@shubhankarsharma4094 10 months ago

After CCA we are left with 17,000 rows in new_df and 19,000 rows in df. How do we concatenate them for modelling?

@gautamdinga8213 7 months ago

What if the percentage of missing values in an attribute is exactly 5%? Should we perform CCA then?

@rachanakotha6059 6 months ago

@CampusX If factual data is missing, like the manufacture year of a vehicle, is it fine to impute it? (Dataset size: 20k)

@niranjania 5 months ago

cols = [var for var in df.columns if ((df[var].isnull().mean() < 0.05) and (df[var].isnull().mean() > 0))]

@niranjania 5 months ago

Hello sir, I tried line 39, but it gave me an error message saying invalid syntax. My code is correct.

@zaidnadeem4918 1 year ago

Targeting to complete it by 18

@nikitha3921 1 year ago

How do I add this CCA data back to the main DataFrame?

@Noob31219 1 year ago

Can you share notes for all these tutorials?

@kanha1733 2 years ago

Question: after CCA, I concatenate the data, but the complete data is unbalanced because the rows do not all have the same index.

@historywithreese 2 years ago

df.dropna(subset=[column names])

@niranjania 5 months ago

TypeError: '>' not supported between instances of 'method' and 'int'
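
This message typically appears when a method object, rather than its return value, is compared with a number. A minimal sketch, assuming the parentheses after `mean` were left out; the commenter's actual line is not shown, so this is only one plausible cause:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])  # toy Series (hypothetical data)

try:
    s.isnull().mean > 0       # no (): compares the method object itself with 0
except TypeError as exc:
    message = str(exc)
    print(message)

fraction = s.isnull().mean()  # with (): the fraction of missing values
print(fraction > 0)
```

If that is the cause, adding `()` after `mean` in the filter condition removes the error.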

@arshad1781 3 years ago

Thanks, but please make a video on outliers in data.

@akash.deblanq 2 years ago

Check all his videos man. He has already made a video on it.

@learnenglish699 2 years ago

@akash.deblanq Can I have your WhatsApp number, please?

@shady_wits 1 year ago

What does "var for var in df.columns if df[var] ..." mean? What is "var" here?

@tusharshukla9361 1 year ago

var is each column name in the dataset, one at a time: [var for var in df.columns if df[var].isnull().mean() < 0.05 and df[var].isnull().mean() > 0]
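
To make the comprehension concrete: iterating over `df.columns` yields column names, so `var` is a column name and `df[var].isnull().mean()` is the fraction of missing values in that column. A small sketch on made-up data (the frame and its column names are hypothetical), with the comprehension next to an equivalent loop:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical): 100 rows; "a" has 2% missing, "b" 50%, "c" none.
df = pd.DataFrame({
    "a": [np.nan] * 2 + [1.0] * 98,
    "b": [np.nan] * 50 + [2.0] * 50,
    "c": [3.0] * 100,
})

# Comprehension form: `var` takes each column name in turn.
# The chained comparison is equivalent to "> 0 and < 0.05".
cols = [var for var in df.columns if 0 < df[var].isnull().mean() < 0.05]

# Equivalent explicit loop.
cols_loop = []
for var in df.columns:
    missing_fraction = df[var].isnull().mean()  # share of NaN in this column
    if 0 < missing_fraction < 0.05:
        cols_loop.append(var)

print(cols)  # ['a']
```

Only `a` is selected: `b` has too many missing values for CCA under the 5% rule of thumb, and `c` has none to drop.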

@shubhamshrivastava5431 1 year ago

What's your name, sir?

@tusharshukla9361 1 year ago

Hey, I got an error while using pd.concat(). My code:

temp = pd.concat([
    # percentage of observations per category, original data
    df['enrolled_university'].value_counts() / len(df),
    # percentage of observations per category, cca data
    new_df['enrolled_university'].value_counts() / len(new_df)
], axis=1)
# add column names
temp.columns = ['original', 'cca']

ERROR: ValueError: No axis named 1 for object type Series

I don't know where I am making a mistake.

@armankhan-pl6sw 1 year ago

To resolve the error, make sure the number of columns in the concatenated DataFrame matches the number of column names you are assigning.
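
For comparison, here is a minimal sketch of the same comparison on toy data. The values are invented; only the column name and the structure come from the comment. With `axis=1` passed to `pd.concat` itself, two Series become a two-column DataFrame. The cause of the reported error cannot be determined from the comment alone, so this only shows a layout that runs.

```python
import pandas as pd

# Toy data (hypothetical values); column name taken from the comment above.
df = pd.DataFrame({
    "enrolled_university": ["no", "no", "full time", None, "part time", "no"],
    "other": [1.0, None, 3.0, 4.0, 5.0, 6.0],
})
new_df = df.dropna()  # complete case analysis on the toy frame

temp = pd.concat(
    [
        # share of each category in the original data
        df["enrolled_university"].value_counts() / len(df),
        # share of each category after CCA
        new_df["enrolled_university"].value_counts() / len(new_df),
    ],
    axis=1,  # argument of pd.concat, so the two Series become two columns
)
temp.columns = ["original", "cca"]
print(temp)
```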

@thegamingflow8933 9 months ago

I have a question, sir: why are we calculating the distribution of the data at the end? What if the data is not missing completely at random? Is the effort wasted?

@DharmendraKumar-DS 5 months ago

We have to check the distributions in order to make sure that missing data is random. After checking the plots and visuals, if we find that missing data is not random then we have to proceed with another approach.
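
A rough sketch of such a check, on synthetic data where values are removed completely at random. All names and numbers are hypothetical. Similar distributions before and after CCA are consistent with random missingness but do not prove it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 1,000 rows, about 3% of "hours" removed at random.
df = pd.DataFrame({
    "hours": rng.normal(50, 10, size=1000),
    "score": rng.normal(70, 5, size=1000),
    "level": rng.choice(["low", "mid", "high"], size=1000),
})
df.loc[rng.random(1000) < 0.03, "hours"] = np.nan

# CCA: drop the rows where "hours" is missing.
new_df = df.dropna(subset=["hours"])

# Numerical column that was fully observed: summary statistics before and after.
stats = pd.concat(
    [df["score"].describe(), new_df["score"].describe()],
    axis=1, keys=["original", "cca"],
)
print(stats)

# Categorical column: category shares before and after.
shares = pd.concat(
    [df["level"].value_counts(normalize=True),
     new_df["level"].value_counts(normalize=True)],
    axis=1, keys=["original", "cca"],
)
shares["abs_diff"] = (shares["original"] - shares["cca"]).abs()
print(shares)
```

Large gaps in `stats` or `shares` would suggest that the dropped rows differ systematically from the rest, in which case another approach such as imputation is worth considering.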