By the end of the video, I feel like I've just returned home after climbing a mental mountain. Putting it all together at the end with an example and walking through each model was truly an awesome and insightful experience. I 'feel' I learned a lot. Great teaching approach. Thanks a lot.
+jnscollier Thanks for the kind words and I'm glad you finished the marathon. A lesser person might not have made it to the peak of Mt. Everest. :0) There is so much theory behind these techniques it can really be overwhelming at first. L1 & L2 penalties, matrix notation, shrinkage estimators, etc... However, these are fantastic tools to have in your repertoire.
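(For anyone who wants the penalties written out: these are the standard textbook objectives for the three techniques mentioned, not transcribed from the video.)

```latex
% Ridge (L2 penalty): shrinks all coefficients toward zero
\hat\beta^{\,ridge} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2

% LASSO (L1 penalty): can shrink coefficients exactly to zero
\hat\beta^{\,lasso} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

% Elastic Net: a blend of the two penalties
\hat\beta^{\,enet} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
```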
Love the way you did this video. I was feeling exhausted and thought I would find another boring video, but you made it easy to follow and I've woken up.
Could you please label the x and y axes of your graphs in your future presentations? (There are also a few inconsistencies with parameters, e.g. when you use 'i' in the summation term but proceed with 'j' instead.)
At 7:28 the squares are not really what they should be. You take the x difference, whereas you should take the y difference and square that. Otherwise, good video.
The error term is the vertical distance between the observed point and the regression line. Your video incorrectly does not draw the side of each box to extend from the observed point to the regression line. +Sjors van Heuvein is correct. Your picture shows sum( (Y^ - f(x + (Y^ - f(x))))^2 ) being minimized.
Derek, thank you for providing very good learning material. Could you please post a video entirely dedicated to Ridge Regression? Is it possible to use Ridge Regression to estimate the coefficients and determine which covariates are important drivers of the model?
+Kaleab Woldemariam Thank you for the kind words, and I will definitely think about adding more content on the Ridge. One limitation of the Ridge Regression technique itself is that it does not lend itself to assessing which covariates are the key drivers. I typically flip to the LASSO and Elastic Net variants to get this assessment (those techniques eliminate the less important variables by shrinking their coefficients to 0). You could also consider running a PCA or employing variable-selection routines to gauge variable importance before leveraging Ridge Regression.
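A minimal sketch of that point using scikit-learn on made-up data (the data and penalty strengths here are illustrative, not from the video): Ridge keeps every coefficient nonzero, while LASSO drives the irrelevant ones exactly to 0, which is why it doubles as a variable-importance screen.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# Only the first two covariates actually drive the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge:", np.round(ridge.coef_, 3))  # all 5 coefficients nonzero
print("lasso:", np.round(lasso.coef_, 3))  # irrelevant coefficients shrunk to 0
```

Reading off which `lasso.coef_` entries survive gives the "key drivers" directly; Ridge offers no such cutoff.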
Thanks Derek! That was helpful! I didn't feel that you explained how elastic nets capture multicollinear groups (schools of fish). It just looks like a blended version of Ridge and Lasso without creating any clusters/schools like you mentioned. Also, any insight into why Ridge outperformed Lasso and Elastic? Is that usually the case? Also, I think someone asked below... can these models be used for logistic regression (for classification/binary output - i.e. yes/no)? And, how do you handle binary inputs (I'm guessing no adjustment needed there)?
This may be true, if you are a professional mathematician. As a layman, I was lost after ca. 10 minutes and the rest of the presentation was a series of pictures and hieroglyphs that I had no clue about.
Hey Derek, thanks for this great video lecture! I just have a couple of questions: 1. What is the R matrix at 23:10? Is it a variance-covariance matrix? 2. Why does the Ridge model lose to OLS beyond the dashed line at 31:41? 3. When you compare the three models at the end by their MSEs, are these MSEs in-sample or out-of-sample? Thank you very much!
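I can't say what R denotes on that slide without seeing it, but the matrix that usually appears at that point in a ridge derivation is (X'X + λI), from the closed-form estimator. A minimal NumPy sketch on synthetic data (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form ridge estimate: beta = (X'X + lambda*I)^{-1} X'y
lam = 1.0
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # close to beta_true, slightly shrunk toward zero
```

Adding λ on the diagonal is what keeps the system invertible even when X'X is ill-conditioned, which is the usual motivation for ridge in the first place.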
This is a fantastic lecture video. One question: in the final comparison, the MSE for Lasso is higher by only 0.0124, but in return we are getting rid of 2 variables. Don't you think that is worth the trade-off? Derek, or anyone who is good at this, please answer. Thanks.
Thank you. This is very valuable. I have one question about the Elastic Net and hope you can help. Since the Elastic Net will include groups of correlated variables, my question is: if I apply the Elastic Net, can I still interpret the coefficients in terms of their effect on the prediction? I remember that when multicollinearity is present, coefficients can flip sign. So I am concerned that I will not be able to interpret my coefficients. (In the end, I want to be able to say that this set of variables has a positive impact on the predicted value while the other set has zero or negative impact.) Thank you!
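A small sketch of the "school of fish" behavior on made-up data (penalty values here are illustrative): when two columns are near-duplicates, the Elastic Net's L2 component tends to split the weight roughly evenly across the correlated pair instead of arbitrarily picking one, which also makes sign flips less likely than under pure multicollinear OLS.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)
# x0 and x1 are near-identical copies of the same signal (a "school of fish");
# x2 is unrelated noise.
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
y = 2.0 * z + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.05).fit(X, y)
enet = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)

print("lasso:", np.round(lasso.coef_, 3))  # tends to favor one of the pair
print("enet: ", np.round(enet.coef_, 3))   # spreads weight across x0 and x1
```

So the group's *combined* coefficient is interpretable, but the split between members of a correlated group is somewhat arbitrary, which is worth keeping in mind when reading individual signs.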
demudu naganaidu I'm glad that you found some value in this demudu. Ridge regression is kind of tricky and I find that it takes a little bit of work to get comfortable with it. Good luck.
It’s really a nice tutorial on different regression techniques. I want to use LASSO/Elastic Net in my Ph.D. research problem (problem: correction of satellite-based rainfall using several independent variables, such as location and topographical variables). May I have your personal email to discuss the problem with you?
It's to correct skewness. It's also common practice in linear models because transforming features makes the residuals more normally distributed. (Look up feature transformation in linear models and the Box-Cox transform for further information.)
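A quick illustration of that skewness correction with SciPy's Box-Cox (synthetic data; `boxcox` picks the transform parameter λ by maximum likelihood, and for lognormal data it lands near 0, i.e. a log transform):

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # strongly right-skewed, positive

x_bc, lam = boxcox(x)  # lam is the fitted Box-Cox lambda

print("skew before:", round(skew(x), 2))    # large positive skew
print("skew after: ", round(skew(x_bc), 2)) # near zero
```

Note Box-Cox requires strictly positive data; for data with zeros or negatives, a shifted variant (e.g. Yeo-Johnson) is the usual fallback.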