
Feature Selection Techniques Easily Explained | Machine Learning 

Krish Naik
1M subscribers
216K views

Feature Selection is the process where you automatically or manually select those features which contribute most to the prediction variable or output in which you are interested. Having irrelevant features in your data can decrease the accuracy of the models and make your model learn based on irrelevant features.
References : towardsdatasci....
#FeatureSelectionTechnique
Github url: github.com/kri...
You can buy my book on Finance with ML and DL from the URL below
www.amazon.in/...

Published: 8 Sep 2024

Comments: 209
@rithishkesav 3 years ago
This video is really the one that I was looking for all over the internet. The video was succinct and to the point.
@adithyaramula16 3 years ago
What if we have categorical features? Should we keep them aside from feature importance or should we encode them and treat them as numerical?
@ramthiagu2330 3 years ago
Problems I faced while working on this. Univariate selection: independent variables should be non-negative, and independent features shouldn't be of object or category type. Feature importance: the dependent variable shouldn't be continuous. Correlation matrix: it will omit category and object features.
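The first constraint in the list above can be sketched as follows (assuming scikit-learn; the toy data and the MinMaxScaler workaround are illustrative, not from the video): sklearn's chi2 scorer rejects negative inputs, so features are first rescaled into a non-negative range.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # continuous features, contain negatives
y = rng.integers(0, 2, size=100)       # binary target

# chi2 raises an error on negative values, so map each feature into [0, 1]
X_pos = MinMaxScaler().fit_transform(X)
selector = SelectKBest(score_func=chi2, k=3).fit(X_pos, y)
print(selector.get_support())          # boolean mask of the 3 kept columns
```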
@SSNU706 5 years ago
Thanks a lot for all your videos and your efforts. I would like to mention something regarding covariance and correlation coefficients @3:25. What you said about the correlation coefficient is correct: it ranges from -1 to 1. However, covariance does not vary between 0 and 1; it varies from -inf to +inf and is unit-dependent (e.g., the covariance of age vs height in cm will be drastically different from age vs height in inches, since the units differ), unlike the correlation coefficient, whose values are standardized and hence range from -1 to 1.
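This point is easy to verify numerically. A minimal sketch on made-up age/height data (variable names illustrative): dividing one variable by 2.54 rescales the covariance by the same factor, while the correlation coefficient is unchanged and stays within [-1, 1].

```python
import numpy as np

rng = np.random.default_rng(42)
age = rng.uniform(20, 60, size=200)
height_cm = 100.0 + 1.5 * age + rng.normal(0.0, 5.0, size=200)
height_in = height_cm / 2.54                 # same measurements, new unit

cov_cm = np.cov(age, height_cm)[0, 1]
cov_in = np.cov(age, height_in)[0, 1]        # shrinks by exactly the unit factor
corr_cm = np.corrcoef(age, height_cm)[0, 1]
corr_in = np.corrcoef(age, height_in)[0, 1]  # unchanged, always in [-1, 1]
```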
@amalsunil4722 4 years ago
yea bro ur right i thought the same
@DennisRiungu 5 months ago
You are the best explainer of Machine Learning concepts, Krish. I have greatly benefited from your intuitive and practical explanations. Thank you!
@07blue71 5 years ago
You definitely deserve more views! Thank you so much!
@MrAyandebnath 3 years ago
Best feature selection video on youtube. Thanks @Krish Naik
@kailashkrea3884 4 years ago
I loved your video on feature selection; it is really informative. I just wanted to know when we can consider a dataset to be huge and when to consider it a smaller one... Could you please make a video on how to handle those datasets, the best possible approaches, and the industrial standards used?
@mansour6629 3 years ago
You cover important topics; no, the most important topics. So thanks!
@joonsenews3954 3 years ago
You, man, deserve more, more, more and more likes. Just because of you I got my results. Thank you for sharing this with us.
@crypto_visions-u7f 4 years ago
sir, really you are a great person.... now I can learn and understand DS in a better way....... THANK YOU SIR.. please help me with an internship for data science....
@bhavindedhia9968 4 years ago
Top-class quality content; it really helped me understand each and every topic easily.
@sm-pz8er 4 months ago
Very well explained. Thank you.
@dominic2446 3 years ago
15:17 I think there is a simpler way to select the columns for the X matrix, i.e. X = data.iloc[:, :-1] instead of X = data.iloc[:, 0:20].
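The two slices are indeed equivalent whenever the target is the last of 21 columns. A quick check on a toy frame (column names are made up):

```python
import numpy as np
import pandas as pd

# toy frame shaped like the video's dataset: 20 features + 1 target column
data = pd.DataFrame(np.arange(3 * 21).reshape(3, 21),
                    columns=[f"f{i}" for i in range(20)] + ["target"])

X_explicit = data.iloc[:, 0:20]    # explicit column range, as in the video
X_shorthand = data.iloc[:, :-1]    # commenter's version: all but the last column
assert X_explicit.equals(X_shorthand)
```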
@somashekar6431 3 years ago
Really super👌
@theotherside5851 2 years ago
Excellent
@yashkhant5874 3 years ago
GREAT.... GREAT.... GREAT...... EXPLANATION KRISH BHAI
@RAVIKUMAR-qg1yp 4 years ago
Bahut badhiya... (Very good...)
@thongtech1984 4 years ago
Really appreciate your knowledge sharing, my friend. Just one thought: it would be a little better if you could slow down a bit; for a foreigner like me it takes some time to understand and absorb.
@AmeerulIslam 4 years ago
you can watch the video at 0.75x
@reaganlopezmusic 1 year ago
Thanks a ton Krish.. u rock..
@DEEPAKSHARMA-kt6qk 2 years ago
thank you for such videos. I really love your work. Please keep uploading videos related to the data science field.
@huangkevin5183 2 years ago
Thank you so much Krish for this tutorial!
@semaakkus6287 2 years ago
You are a hero. Thanks for your explanation.
@Bhavanareddytmp 3 years ago
Good explanation sir 👍👌👏
@jamesang7861 4 years ago
Thank you so much! 1 video settles all the confusion!
@ZahidHasan-cc8tf 3 years ago
Easy to understand. Thanks for the video.
@RAKIBULISLAM-ru1mf 4 years ago
Priceless lectures sir, well explained with code and beautiful examples.
@kushalhu7189 3 years ago
Best explanation...
@prabhukesavan2403 3 years ago
Nicely explained...
@getahunberhanu8859 2 years ago
I like it, but it would be better if you put the training data in the description.
@katadermaro 3 years ago
this vid was my aha moment. ty!
@louerleseigneur4532 3 years ago
Thanks Krish
@soumayanmajumder7799 4 years ago
From the above we can see that the correlation matrix also gives the correlation between independent features, so it is more helpful than univariate selection. So can you say when we should use univariate selection over the correlation matrix, and why?
@ashwinprasad5180 3 years ago
Correlation will be useless if the variables have a non-linear dependence, so you need to be careful.
@ayikkathilkarthik4312 3 years ago
@@ashwinprasad5180 But if we are using Spearman's correlation, then it will capture non linear relation too. Then why to use univariate feature selection?
@ashwinprasad5180 3 years ago
@@ayikkathilkarthik4312 I don't think Spearman's correlation can detect general non-linear dependence. Coming to your second question: we can use univariate correlation to make sure some features are associated with the dependent variable, but you can't remove features based only on a low correlation coefficient. For that we might need to plot the variables against each other and check for ourselves. Correct me if I am wrong.
@maralazizi 2 years ago
Your videos are the best! Thank you!!
@praveensingh-lx4dk 4 years ago
Very helpful. Loved every second. Thank you very much.
@giridharnair01 3 years ago
awesome video sir, superb explanation
@mhmoudkhadija3839 1 year ago
awesome tutorial! Thanks for this amazing work!
@zeeshan3703 3 years ago
Great Insights, and now I have SUBSCRIBED, and can't wait to see more from you. Huge Thank You!!!
@prashantwadkar8443 4 years ago
Your explanation helped a lot
@vivekagrw 4 years ago
Good job, Krish, really appreciate your effort. A practical approach is always the best way of learning.
@shaiksuleman3191 4 years ago
Wow, superb explanation sir.
@mohdazam1404 4 years ago
Just awesome... damn good explanation Krish! Thanks for the video.
@mvikyk 4 years ago
Very informative, thank you. At 20:15: just wondering why you chose to apply correlation again via data[top_corr_features].corr() when it's already available in corrmat. Just the fear of missing out on something important. Code given below:
# get correlations of each feature in the dataset
corrmat = data.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))
# plot heat map
g = sns.heatmap(data[top_corr_features].corr(), annot=True, cmap="RdYlGn")
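The redundancy this commenter points out can be checked on toy data (the frame is illustrative; the plotting calls are omitted since only the matrix matters): top_corr_features is simply every column of data, so data[top_corr_features].corr() recomputes what corrmat already holds.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

corrmat = data.corr()
top_corr_features = corrmat.index          # just all columns of data, in order
recomputed = data[top_corr_features].corr()

# identical frames: passing corrmat straight to sns.heatmap would suffice
assert corrmat.equals(recomputed)
```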
@satriogani3105 4 years ago
If I am not mistaken, covariance and the correlation coefficient measure the linear relationship between two random variables. If their values are zero, it means there is no linear relationship, but that does not mean there is no interaction between the two variables. Instead of looking at the correlation coefficient, should we use another test that can check independence between these variables?
@akashpoudel571 5 years ago
Sir, it is an awesome video.... very elegantly explained.
@SaiPavankumar1 4 years ago
Great video. But I am not sure the chi-square test can be applied between a continuous and a categorical variable. In the univariate analysis, chi-square was performed between all variables, including continuous ones, against the categorical target variable.
@adityabenere6004 3 years ago
I have the same doubt......
@murtazajabalpurwala8124 3 years ago
very nice video
@anuragpareta1893 4 years ago
Excellent explanation Krish.. Thanks
@ultraprogramming4681 4 years ago
Very nice
@theq18 4 years ago
Great tutorial, very helpful.
@aksmalviyan8342 1 year ago
thanks for this...
@SabahMahjabeenSarwar 5 months ago
This was really helpful. As I can see it's a classification problem, I can use the chi-square test as far as I know; this test can assess the strength of the relationship between categorical features and the output variable (which is categorical in this case). But for non-classification problems, if the data are continuous numerical data, we have to use correlation. Am I right? Please help me with this doubt.
@sagataroy5357 4 years ago
Great video Krish.. keep uploading more and more.
@fineescape1257 2 years ago
Hello mister! Could you please also distinguish the difference between univariate selection and multivariate selection? Also, is there a difference between feature ranking, feature selection, and feature extraction?
@vroomvroom4308 4 years ago
Awesome Elaboration
@NeeRaja_Sweet_Home 4 years ago
Nice video... in the correlation technique with the heat map, do we have to select features which are close to 1 or greater than 0.2?
@md.shafaatjamilrokon8587 2 years ago
Watched full video
@phanikumar3136 4 years ago
Krish, in general we apply the chi-square test only to categorical variables, but the data we consider here consists of numeric data. How can we apply chi2 in univariate selection? Can you please help with this query?
@amalsunil4722 4 years ago
Exactly man
@adityabenere6004 3 years ago
brooo i also have the same doubt.....if you find any answer let me know too.
@sandykumar5350 5 years ago
Thank you so much for your fabulous explanation!!!
@pnfei 4 years ago
Thanks for your video. Could you explain briefly how each feature is scored, or the philosophy behind the scoring? Some references would also be appreciated.
@dagma3437 4 years ago
Great explanation. Thank you.
@ayikkathilkarthik4312 3 years ago
Doubt here: how is chi-square working with non-categorical attributes here? In the stats video, you said chi-square is only applicable for finding relations between categorical attributes.
@adityabenere6004 3 years ago
brooo i also have the same doubt.....if you find any answer let me know too.
@143balug 4 years ago
Very helpful. Thank you very much.
@amitjajoo9510 4 years ago
superb explanation sir
@ehtishamraza2623 5 years ago
you are doing a very good job
@MsRAJDIP 5 years ago
You are awesome man... I am going to watch all your videos.
@sushantshekhar8082 4 years ago
Krish, I have a doubt: we apply the chi2 test to categorical features, so why have we applied it to numerical features?
@amalsunil4722 4 years ago
exactly
@adityabenere6004 3 years ago
brooo i also have the same doubt.....if you find any answer let me know too.
@tonyhathuc 3 years ago
The best!!!!!!!!!
@70ME3E 3 years ago
"and this will definitely probably work" is the best thing statisticians can say I guess :P one step better than 'probably approximately correct' )
@shravanshukla5352 1 year ago
Covariance lies between -infinity and +infinity.
@upendra35 2 years ago
Great
@migi7787 3 years ago
Perfect!
@adityabenere6004 3 years ago
Can anyone tell me, at 18:30 when Krish used ExtraTreesClassifier and a score for every explanatory variable was generated, what is the math behind those "scores" (numbers)?
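A hedged answer to the question above: scikit-learn's feature_importances_ is the impurity-based ("mean decrease in impurity") score. For each feature it sums the Gini impurity decrease at the nodes that split on it, weighted by the fraction of samples reaching those nodes, averages over all trees, and normalizes so the scores sum to 1. A small sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# synthetic classification data: 6 features, only 3 truly informative
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

scores = model.feature_importances_    # one non-negative score per feature
print(scores)                          # informative features score highest
```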
@fun-ih5sc 4 years ago
awesome sir.. liked it and learned from it :)
@RenormalizedAdvait 2 years ago
Please correct 3:24: covariance is not bound to vary between 0 and 1; it can take values greater than 1 as well as less than 0.
@tamilmanimadhaiyan4821 5 years ago
awesome explanation, thank you sir
@bhaskarrao2169 4 years ago
Great content.. thanks a lot Krish
@sandalidahanayake4972 1 year ago
pls do a video about automatic feature selection with Featurewiz :)
@azmath4710 5 years ago
It is a perfect explanation.
@explorenations892 4 years ago
Amazing video😍
@banjiaderibigbe1415 3 years ago
how do we apply these methods when the dataset is mixed with categorical variables?
@Ash-bc8vw 3 years ago
Thank you so much!
@atuljain8340 3 years ago
your content is too good, it really helps me a lot. But what if the data is categorical? Will that still work, or do I have to convert the data into numerical form?
@remrem6681 3 years ago
Which one should we use for checking the relationship with the target?
@rudroroy1054 1 year ago
Awesome tutorial, thank you. One question that I have is: will these techniques be applicable if the dataset has categorical or object/char type independent variables? The dataset used here has all numeric variables.
@ankurkaiser 8 months ago
For numerical features: ANOVA test, chi-square test, correlation, and the VIF technique. For categorical: embedded methods (penalising the coefficients to reduce loss), tree-based models, MRMR (minimum redundancy maximum relevance), SHAP, and even visualising the dataset using the melt method.
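One item from the list above, sketched with scikit-learn (the toy data and informative-feature setup are illustrative): f_classif runs the ANOVA F-test and, unlike chi2, accepts negative continuous inputs.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 4))              # continuous, includes negatives
X[:, 0] += 2.0 * y                         # only feature 0 depends on the class

# f_classif (ANOVA F-test) works directly on these unscaled features
sel = SelectKBest(score_func=f_classif, k=1).fit(X, y)
print(sel.get_support())                   # feature 0 is the one selected
```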
@pratiksawant24 1 year ago
Hi Sir, if I have columns (A, B, C, D, E) and E is my target column, then while training the model should I include E in X, or should I use only (A, B, C, D)?
@sz8558 2 years ago
How would you define a small, medium, or large dataset? Small being a minimum of 5K datapoints, etc.?
@pepetisiddhardha9848 3 years ago
should we standardize or normalize the data before doing this feature selection?
@ajaykushwaha-je6mw 2 years ago
Hi Krish, we use chi-square for categorical variables, but here a few features are continuous variables.
@jayakrishnamohapatra628 5 years ago
Hi Krish, all of your videos are simple and excellent. I am not sure whether to ask you or not, but if possible could you please upload the videos in 1080p format as well?
@user-mo6xs6uk4c 1 year ago
Curious to know if we should scale the dataset before applying the feature selection method or after? Thank you!
@JoEl-jx7dm 1 year ago
Depends on your dataset; scaling an unscaled dataset could give you a hell of a difference when it comes to feature selection.
@Raja-tt4ll 4 years ago
Very nice video, thanks :)
@akd9977 5 years ago
Excellent one, thanks for sharing. I have one query: in a dataset of 2 lakh records where the variable 'country' has a high correlation with the target, and the country column contains all 300 country names in character format, what do you suggest? Can I use one-hot encoding?
@algorithmsguide5076 4 years ago
I have a similar question. Have you found any solution, or can you share any suggested articles? Please.
@sarthakbhatt100 4 years ago
@krish naik How can you apply the chi-square test on continuous features?
@sana3358 4 years ago
same doubt, please clarify
@adityabenere6004 3 years ago
@@sana3358 brooo i also have the same doubt.....if you find any answer let me know too.
@tennysonchildofgod9992 4 years ago
You nailed it! Thank you.
@3663johnny 4 years ago
Good explanation
@alokprasad3726 4 years ago
Hi, suppose while selecting the best features using the correlation matrix I find that two independent variables have a very high correlation coefficient, say 0.8 or 0.9. Should I drop one of the features or continue working with both?
@rohitrathod8150 4 years ago
Check accuracy by selecting: 1. the first feature, 2. the second feature, 3. both features. Calculate the accuracy for all 3 cases and select the best. Cheers :)
@lifebetterment2149 4 years ago
When there are multiple x-axis values, such as x0 to x9 and other x values for an activity, as in a gyroscope dataset, which feature engineering technique should we apply?
@DuyHoang-me3rg 5 years ago
Thanks for your video. Can you make a video explaining metrics and how to choose the right metric for our algorithms?
@hamzaarif4964 4 years ago
Thank you so much❤
@panizalinejhad8922 5 years ago
This is great, thanks! May I ask why my feature importance scores differ every time I run the code? I am trying to get the top 30, and every run gives me a different set of features as the top 30 largest importances... am I doing something wrong?