Missing Indicator | Random Sample Imputation | Handling Missing Data Part 4

226K subscribers

40K views

The Missing Indicator method involves creating a binary indicator for missing values in a dataset, providing additional information on missing patterns. Random Sample Imputation, on the other hand, fills missing values with random samples from the observed data. These techniques offer alternative strategies for handling missing data in a dataset.
Code Used: github.com/cam...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.camp...
============================
📱 Grow with us:
CampusX' LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
⌚Time Stamps⌚
00:00 - Intro
00:12 - Revision
02:12 - What is Random Imputation?
08:35 - Code Demo using Titanic Dataset
21:33 - Missing Indicator
30:17 - Automatically selecting value for Imputation
36:36 - Outro
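
The two techniques named in the description can be sketched in a few lines of pandas. This is a minimal illustration on made-up values, not the code from the video; the column names `Age_missing` and `Age_imputed` are chosen here for the example.

```python
import numpy as np
import pandas as pd

# Toy data standing in for a column such as Titanic "Age" (values are made up).
df = pd.DataFrame({"Age": [22.0, np.nan, 35.0, np.nan, 58.0, 41.0]})

# Missing indicator: 1 where the original value was missing, else 0.
missing = df["Age"].isna()
df["Age_missing"] = missing.astype(int)

# Random sample imputation: draw one observed value per missing row.
sampled = df["Age"].dropna().sample(int(missing.sum()), replace=True, random_state=0)
sampled.index = df.index[missing]  # align the draws with the missing rows

df["Age_imputed"] = df["Age"]
df.loc[missing, "Age_imputed"] = sampled
```

In a train/test setting the draws would normally come from the training split only, so that the test rows are filled with values the model was allowed to see.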

Published: 14 Aug 2024

Comments: 42

@simranagichani4943 5 months ago

You give the logic behind every step, and your answers to the why and how are just amazing! Truly phenomenal.

@Keep_Laughfing 1 year ago

Amazing, sir. I have watched a lot of videos as a data scientist, but your explanations are matchless.

@alimuiz5328 18 days ago

Thank you for another great video, sir. At 20:15, shouldn't it be 'data' instead of 'df'?

@manujkumarjoshi9342 11 months ago

The best part is that you always target production. 👍

@murumathi4307 3 years ago

Thank you so much, sir. You explain very well. Keep going and rock.

@Noob31219 2 years ago

Thank you, sir ji 😭

@SreevalliJonnavithula-rm6rs 1 year ago

One question: why did you run the imputation first and then try to find the best option using GridSearchCV? Logically, shouldn't we run GridSearchCV without applying any imputation beforehand and then use the best strategy it finds?
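
A sketch of the approach this question points to: if the imputer is a step inside a Pipeline, GridSearchCV receives the raw data with its missing values and refits the imputer for every candidate setting and fold. The data below is synthetic and the model choice is an assumption for illustration; this is not the notebook from the video.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the Titanic columns (values are made up).
rng = np.random.default_rng(0)
X = pd.DataFrame({"Age": rng.normal(30, 10, 200), "Fare": rng.exponential(30, 200)})
y = (X["Fare"] + rng.normal(0, 10, 200) > 30).astype(int)
X.loc[rng.choice(200, 40, replace=False), "Age"] = np.nan  # inject missing values

# The imputer is part of the pipeline, so no imputation happens beforehand.
pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {
    "imputer__strategy": ["mean", "median"],
    "imputer__add_indicator": [False, True],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)  # X still contains NaNs here
print(search.best_params_)
```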

@mridang2064 2 years ago

Thanks a lot 🙏

@jiteshsingh6030 2 years ago

Amazing 🔥

@zee4654 2 years ago

Sir, please make a video on ffill and bfill and when we should use them.

@namansethi1767 2 years ago

Thanks, sir, for these videos.

@ParthivShah 5 months ago

Thank You.

@kameshyuvraj5693 3 years ago

The way you communicate is ultimate, bhaiya.

@SACHINKUMAR-px8kq 1 year ago

Thank you so much, sir.

@krinabhikadiya9617 2 years ago

In the last example, with SimpleImputer(add_indicator=True), why do you train the model on x_train_trf2 and x_test_trf2? I think you should use x_train and x_test.

@tusharkhatri5795 1 year ago

Same question: he already used fit_transform to produce x_train_trf2 and x_test_trf2 above, so why is he using them again below with add_indicator=True?

@AbdulHannan-dg6dl 1 year ago

There are two ways to do the same task. He mistakenly mentioned X_train_Trf2 and X_test_trf2 at the start; they should be at the end.
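
The point raised in this thread can be checked on a small array. With add_indicator=True, SimpleImputer imputes and appends indicator columns in one step, so it is meant to be fitted on data that still contains the missing values. The arrays below are made up for illustration; the names `X_train_trf` and `X_test_trf` are chosen here and are not the notebook's variables.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Raw (un-imputed) toy data: column 0 plays the role of Age, column 1 of Fare.
X_train = np.array([[22.0, 7.25], [np.nan, 71.28], [35.0, 8.05], [np.nan, 53.10]])
X_test = np.array([[np.nan, 8.46], [54.0, 51.86]])

si = SimpleImputer(strategy="mean", add_indicator=True)
X_train_trf = si.fit_transform(X_train)  # fit on the raw training data
X_test_trf = si.transform(X_test)        # reuse the training mean on the test data

# Two imputed columns plus one indicator column for the feature that had NaNs.
print(X_train_trf.shape)
```

If the imputer were fitted on data whose gaps had already been filled, it would see no missing values during fit and, with the default settings, would add no indicator columns.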

@heetbhatt4511 11 months ago

Thank you, sir.

@ajaykushwaha-je6mw 2 years ago

Sir, how do we use random value imputation in a ColumnTransformer?

@AbdulHannan-dg6dl 1 year ago

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline  # import missing from the original snippet
from sklearn.preprocessing import StandardScaler

# Select the numerical columns
numeric_cols = ['age', 'income']

# Create the preprocessing pipeline for the numerical columns.
# Note: SimpleImputer has no built-in 'random_sample' strategy (passing it raises an error),
# so 'median' stands in here; random sampling needs a custom transformer.
numeric_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Create the ColumnTransformer
preprocessor = ColumnTransformer([
    ('numeric', numeric_pipeline, numeric_cols)
])
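
scikit-learn's SimpleImputer does not ship a random-sample strategy, so one way to get random sample imputation into a ColumnTransformer is a small custom transformer. The class below, `RandomSampleImputer`, is a hypothetical helper written for this sketch (it is not part of scikit-learn or of the video's code), and the toy `age` and `income` values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RandomSampleImputer(TransformerMixin, BaseEstimator):
    """Hypothetical helper: fill NaNs with random draws from the training values."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Remember the observed (non-missing) values of every column.
        self.observed_ = [col[~np.isnan(col)] for col in X.T]
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # work on a copy
        rng = np.random.default_rng(self.random_state)
        for j, observed in enumerate(self.observed_):
            mask = np.isnan(X[:, j])
            if mask.any():
                # Replace each NaN with a value seen in that column during fit.
                X[mask, j] = rng.choice(observed, size=int(mask.sum()))
        return X


numeric_cols = ["age", "income"]
numeric_pipeline = Pipeline([
    ("imputer", RandomSampleImputer(random_state=42)),
    ("scaler", StandardScaler()),
])
preprocessor = ColumnTransformer([("numeric", numeric_pipeline, numeric_cols)])

X_train = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 62.0, np.nan, 48.0],
})
X_out = preprocessor.fit_transform(X_train)
```

Because the observed values are stored in fit, calling transform on test data fills its gaps with values drawn from the training split.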

@arshad1781 3 years ago

Thanks.

@sahibnoorsingh2432 1 year ago

What does random_state=2 mean in the code?
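
In short, random_state is the seed for the random number generator: the same seed reproduces the same "random" draw on every run, and the value 2 itself carries no special meaning. A small pandas sketch on made-up ages:

```python
import pandas as pd

ages = pd.Series([22, 35, 58, 41, 19, 64], name="Age")

# Two draws with the same seed pick exactly the same rows.
first = ages.sample(3, random_state=2)
second = ages.sample(3, random_state=2)
same_draw = first.equals(second)

# Without a seed, the rows picked can change from run to run.
unseeded = ages.sample(3)
```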

@preetisrivastava1624 1 year ago

I am still confused about when to use Pipeline and when to use ColumnTransformer.

@Alive-Ness 1 year ago

Suppose you want to apply an ordinal encoder to one column and a OneHotEncoder to another. You would first have to split those columns from the dataset, apply the two transformers, and then merge the results back into the dataset, which is time-consuming and boring, right? This is where the superhero named ColumnTransformer comes in: it lets you apply both transformers to their respective columns in a single piece of code, without splitting them from the dataset, which saves a lot of effort. So whenever you want to apply more than one transformer to more than one column, simply use ColumnTransformer.

When to use Pipeline: we train our ML model on preprocessed data, but once the model is in production, users will not give us preprocessed data. So we use a Pipeline, in which the user's input passes through all the preprocessing steps we performed while training the model. We put the ColumnTransformers in the pipeline because they are the heroes that took our dirty data and gave us preprocessed data, and here they do the same with input from real-time users. We also put our model in the pipeline. In a pipeline, the output of one step acts as the input to the next step, and after all the steps are complete, the data goes to the model for prediction; that is why we pass the model into the pipeline.

So in production we use the strengths of ColumnTransformer and Pipeline together. I know I explained a bit too much, but I was sitting idle, so I thought I would do a little revision 😮‍💨... I hope this helps you.

@preetisrivastava1624 1 year ago

@@Alive-Ness Hahaha, you did a great job explaining the concept. I got it. Thank you very much, man 🙏🙏
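
The division of labour described in this thread can be sketched as follows: a ColumnTransformer routes each column to its own encoder, and a Pipeline chains that preprocessing with the model so that raw user input can be passed straight to predict. The columns, categories, and labels are invented for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Made-up training data with one ordered and one unordered categorical column.
X = pd.DataFrame({
    "education": ["school", "ug", "pg", "ug", "school", "pg"],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Pune", "Mumbai"],
})
y = [0, 1, 1, 1, 0, 1]

# ColumnTransformer: a different transformer per column, no manual split/merge.
preprocess = ColumnTransformer([
    ("ordinal", OrdinalEncoder(categories=[["school", "ug", "pg"]]), ["education"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Pipeline: preprocessing followed by the model, fitted and applied as one object.
clf = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
clf.fit(X, y)

# In production the raw, unprocessed record goes in directly.
new_user = pd.DataFrame({"education": ["ug"], "city": ["Delhi"]})
prediction = clf.predict(new_user)
```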

@GamerBoy-ii4jc 2 years ago

Sir, why did you divide the imputed FireplaceQu feature by the total length of the DataFrame rather than by len(X_train), while for the GarageQual feature you divided by len(X_train)? Which is correct?

@snrmedia8965 2 years ago

I wanted to ask the same question.

@datascience4487 1 year ago

len(X_train) is correct

@flakky626 1 year ago

The random imputation code part is a little hard for me. I gave it 2-3 hours and still have doubts.

@mitalisingh8405 1 year ago

At the time of concat, I get a float error. Can someone please help me with the conversion?

@Keep_Laughfing 1 year ago

Change float to int.

@aaditstudent 1 year ago

Instead of int(df.col), use df.col.astype(int).
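
The thread does not include the traceback, so the exact cause is unknown; the sketch below only illustrates the difference the last reply points at. Python's int() expects a single value and fails on a whole column, while astype(int) converts element by element. The fare values are made up.

```python
import pandas as pd

fare = pd.Series([7.25, 71.28, 8.05], name="Fare")

# int() cannot convert a multi-element Series to one integer.
error = None
try:
    int(fare)
except TypeError as exc:
    error = str(exc)

# astype(int) converts every element, truncating the fractional part.
fare_int = fare.astype(int)
```

Note that astype(int) itself fails if the column still contains NaN, so missing values have to be handled before the conversion.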

@mdriad4521 1 year ago

name 'observation' is not defined. But why?

@anshulsharma7080 1 year ago

Yup

@sudipgautam7471 2 months ago

observation is not a DataFrame there.

@excalibur2889 2 years ago

When applying missing indicators, do we have to create a separate indicator column for every column that has missing values? Suppose I have 5 columns with missing values and I want to use missing indicators: how many extra columns would I need, 1 or 5?
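
With scikit-learn's MissingIndicator and its default settings, the answer is one indicator column per feature that had missing values during fit, so 5 columns with gaps would give 5 extra columns. A small check on a made-up array in which two of the three columns contain NaN:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

mi = MissingIndicator()  # features="missing-only" is the default
flags = mi.fit_transform(X)

# One boolean column per feature that contained NaN during fit.
print(flags.shape, mi.features_)
```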

@sanjibanmohanty8467 1 year ago

Hi Sir, I am getting the error below. What is observation["Fare"] here?

Code:
sampled_value = X_train["Age"].dropna().sample(1, random_state = int(observation["Fare"]))

Error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
in ()
----> 1 sampled_value = X_train["Age"].dropna().sample(1, random_state = int(observation["Fare"]))

NameError: name 'observation' is not defined
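
The NameError means that no variable called observation exists when the line runs. A likely reading, and it is an assumption here rather than something the thread confirms, is that observation stands for a single incoming record whose Fare is used as the seed, so the same record always receives the same imputed Age. Under that assumption the line runs once such a record is defined; the data below is made up.

```python
import numpy as np
import pandas as pd

X_train = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0, np.nan, 41.0],
    "Fare": [7.25, 71.28, 8.05, 26.55, 13.00, 51.86],
})

# Assumption: `observation` is one incoming record whose Age is missing.
observation = pd.Series({"Age": np.nan, "Fare": 13.0})

# Seeding with the record's Fare makes the draw repeatable for that record.
sampled_value = X_train["Age"].dropna().sample(1, random_state=int(observation["Fare"]))
imputed_age = sampled_value.iloc[0]
```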