
Ridge vs Lasso Regression, Visualized!!! 

StatQuest with Josh Starmer
Subscribe  1.3M
252K views

Published: 29 Sep 2024

Comments: 422
@statquest
@statquest 2 года назад
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@thatguyadarsh
@thatguyadarsh Год назад
Subscriiibed ! DOUBLE BAM!!! 😂😂
@Mabitesurtaglotte
@Mabitesurtaglotte 4 года назад
Still the best stat videos on YouTube. You have no idea how much you've helped me. You'll be in the acknowledgments of my diploma
@statquest
@statquest 4 года назад
Wow, thanks!
@johnnyt5108
@johnnyt5108 3 года назад
He'd probably like better to be in the acknowledgments of your checkbook then
@reflections86
@reflections86 Год назад
@@johnnyt5108 I am sure many people will do that by buying the reports and the book Josh has written.
@LauraMarieChua
@LauraMarieChua Год назад
update after 2 years: did u include him on ur diploma?
@chzpan
@chzpan 7 дней назад
Agree with you, yet "unfortunately, no one asked me!".
@xvmmazy4398
@xvmmazy4398 8 месяцев назад
Dude, you succeeded at helping me and at making this thing funny while I'm struggling with my ML homework, thank you so much
@statquest
@statquest 8 месяцев назад
Glad I could help!
@surendra1764
@surendra1764 Год назад
The smallest sum of squared errors is possible only when the lambda value is zero (which is what I think matters for a low error), but then the slope is high. For lambda = 40 the slope is lower than for lambda = 0, but the sum of squared errors is higher. Why are we trying to shrink the slope when the sum of squared errors is what matters?
@statquest
@statquest Год назад
We're not necessarily trying to minimize the slope. However, reducing the slope a little bit compared to the least squares slope might help the model perform better in the long run. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Q81RR3yKn30.html
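A minimal numeric sketch of the trade-off being discussed (the tiny weight/height numbers and the lambda values below are made up for illustration, not taken from the video): it evaluates SSR + lambda * slope^2 over a grid of slopes and shows how the penalty trades a worse training SSR for a smaller slope.

```python
import numpy as np

# made-up toy data (weight -> height), intercept ignored to keep it one-dimensional
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

slopes = np.linspace(-1, 3, 4001)   # candidate slopes
ssr = ((y[None, :] - slopes[:, None] * x[None, :]) ** 2).sum(axis=1)

for lam in [0, 10, 40]:
    objective = ssr + lam * slopes ** 2          # ridge objective: SSR + lambda * slope^2
    best = objective.argmin()
    print(f"lambda={lam:>2}: best slope={slopes[best]:.3f}, training SSR there={ssr[best]:.3f}")
# The best slope shrinks toward 0 as lambda grows and the training SSR goes up;
# the hope (covered in the Ridge/Lasso StatQuests) is that the smaller slope
# generalizes better to new data.
```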
@Dynamite_mohit
@Dynamite_mohit 4 года назад
Was about to start this topic. Thanks @Statquest Hellowww, I just have a quick question: when should we expect your video on neural networks? And I have a request: could you add your upcoming videos to your website in a separate section, showing the topic and the date or month each will be uploaded? It would be very helpful for students deciding to buy a subscription plan for your channel in order to get early access to your videos.
@statquest
@statquest 4 года назад
You can get early access by becoming a channel member or signing up for my Patreon: www.patreon.com/statquest
@germantempo
@germantempo Год назад
At 6:05, isn't it the L1 penalty? Thanks
@statquest
@statquest Год назад
Most people call it the L1 norm.
@nabeel123ful
@nabeel123ful Год назад
Disagree ... the overall Lasso/Ridge SSE is basically the fixed base SSE plus a very curved quadratic as a function of slope; if the curved quadratic overtakes the fixed SSE curve, then the minimum of the combined curve would surely lie at the minimum of the curved quadratic.
@statquest
@statquest Год назад
Feel free to check out the proof given in The Introduction to Statistical Learning in R to learn more.
@ehg02
@ehg02 4 года назад
Can we start a petition to change the lasso and ridge names to absolute value penalty and squared penalty pwease?
@statquest
@statquest 4 года назад
That would be awesome! :)
@JoaoVitorBRgomes
@JoaoVitorBRgomes 3 года назад
@@statquest I am listening to u on spotify
@statquest
@statquest 3 года назад
@@JoaoVitorBRgomes Bam!
@chyldstudios
@chyldstudios 4 года назад
The visualization really sells it.
@statquest
@statquest 4 года назад
Thanks! :)
@hemaswaroop7970
@hemaswaroop7970 4 года назад
Fantastic, Josh!! Thank you very, very much. We all owe you many thanks. "I" owe you a lot. 😊😊👍👍
@statquest
@statquest 4 года назад
Awesome! Thanks! :)
@praveerparmar8157
@praveerparmar8157 3 года назад
"Unfortunately, no one asked me" 🤣🤣🤣
@statquest
@statquest 3 года назад
:)
@Azureandfabricmastery
@Azureandfabricmastery 4 года назад
Hello Josh, Ridge and Lasso clearly visualized :) I must say that the one thing that makes your videos clearly explained to curious minds like me is the visual illustrations you provide in your stat videos. Glad. Thank you very much for your efforts.
@statquest
@statquest 4 года назад
Thank you very much! :)
@platanus726
@platanus726 3 года назад
You are truly an angel. Your videos on Ridge, Lasso and Elastic Net really help with my understanding. They're way better than the lectures at my university.
@statquest
@statquest 3 года назад
Thanks!
@afaisaladhamshaazi7519
@afaisaladhamshaazi7519 4 года назад
I was wondering why I missed out on this video while going through the ones on Ridge and Lasso Regression from Sept-Oct 2018. Then I noticed this is a video you put out only a few days ago. Awesome. Much gratitude from Malaysia. 🙇
@statquest
@statquest 4 года назад
Thanks! :)
@huzefaghadiyali5886
@huzefaghadiyali5886 2 года назад
I'm just gonna take a minute to appreciate the effort you put into your jokes to make the video more interesting. It's quite underrated.
@statquest
@statquest 2 года назад
Thank you!
@thryce82
@thryce82 4 года назад
this channel is saving my ass when it comes to my applied ML class. so frustrating when a dude who has been researching Lasso for 10 years just breaks out some linear algebra derivation and then acts like you're supposed to instantly understand it...... thanks for taking the time to come up with an exhibition that makes sense.
@statquest
@statquest 4 года назад
Thanks!
@chrissmith1152
@chrissmith1152 4 года назад
incredible videos, been watching all of your videos during quarantine for my future job interview. Still waiting for the time series tho. Thanks sir
@statquest
@statquest 4 года назад
Thanks!
@joxa6119
@joxa6119 2 года назад
So you mean this StatQuest answered the question "why Lasso regression can remove useless variables and Ridge cannot", am I right?
@statquest
@statquest 2 года назад
yes
@chunchen3450
@chunchen3450 4 года назад
Just found this channel today, great illustrations! Thanks for keeping the voice speed down, that makes it easy for me to follow!
@statquest
@statquest 4 года назад
Awesome, thank you!
@drpkmath12345
@drpkmath12345 4 года назад
Ridge regression! Good topic to cover as always!
@statquest
@statquest 4 года назад
Thanks! :)
@lakshitakamboj198
@lakshitakamboj198 3 года назад
Thanks, Josh, for this amazing video. I promise to support this channel once I land a job offer as a data scientist. This is the only video on YouTube that practically shows all the algos.
@statquest
@statquest 3 года назад
Thank you and Good luck!
@adibhatlavivekteja2679
@adibhatlavivekteja2679 4 года назад
Explain stats to a 10-year-old? Me: "You kid, Subscribe and drill through all the content of StatQuest with Josh Starmer"
@statquest
@statquest 4 года назад
:)
@markevans5648
@markevans5648 4 года назад
Great work Josh! Your songs get me every time.
@statquest
@statquest 4 года назад
Bam! :)
@shamshersingh9680
@shamshersingh9680 2 месяца назад
Hi Josh, please accept my heartfelt thanks for such a wonderful video. I guess your videos are an academy in themselves. Just follow along with your videos and BAM!! you are a master of Data Science and Machine Learning. 👏
@statquest
@statquest 2 месяца назад
Wow, thank you!
@mathildereynes8508
@mathildereynes8508 4 года назад
It could be interesting to see the explanation in the case of a multidimensional problem with more than two features, but very nice video!
@neillunavat
@neillunavat 4 года назад
Be grateful we've got such a nice guy.
@tvbala00s27
@tvbala00s27 2 года назад
Thanks a lot for this wonderful lesson...loved it ..seeing how the function behaves with different parameters makes it etched in the memory
@statquest
@statquest 2 года назад
Glad it helped!
@srs.shashank
@srs.shashank 3 года назад
As a result, when the slope becomes 0 for a large lambda in Lasso, we can use Lasso for feature selection. Nice video, Josh!!
@statquest
@statquest 3 года назад
Bam! :)
@tanbui7569
@tanbui7569 3 года назад
Thank you for your work as always. It's AWESOME. I just have some questions. Why is there a kink in the SSR curve for Lasso Regression? Is it because we are adding lambda * |slope|, which is a linear component? And does the curve for Ridge Regression stay a parabola because we are adding lambda * slope^2, which is a parabolic component?
@statquest
@statquest 3 года назад
I believe that is correct.
@sudeshnasen2731
@sudeshnasen2731 2 года назад
Hi. Great video! I had the same query as to why we cannot see a similar kink in the Ridge Regression cost-vs-slope curve.
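A small sketch of the kink being asked about (my own toy numbers, not the video's data): it evaluates both penalized curves on a grid of slopes around zero. The ridge values change smoothly, while the lasso values bend sharply at slope = 0 because of the |slope| term.

```python
import numpy as np

# toy data, intercept ignored to keep it one-dimensional
x = np.array([0.5, 1.0, 1.5, 2.0])
y = np.array([0.8, 1.1, 1.4, 1.7])
lam = 40.0

def ssr(slope):
    return ((y - slope * x) ** 2).sum()

for slope in np.linspace(-0.2, 0.2, 9):
    ridge = ssr(slope) + lam * slope ** 2   # smooth everywhere
    lasso = ssr(slope) + lam * abs(slope)   # kink at slope = 0 from |slope|
    print(f"slope={slope:+.2f}  ridge={ridge:7.3f}  lasso={lasso:7.3f}")
# The ridge column changes smoothly through 0, with its minimum a little above zero;
# the lasso column decreases toward 0 from both sides and bends sharply there,
# so with this lambda its minimum sits exactly at slope = 0.
```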
@myunghee7231
@myunghee7231 4 года назад
Thank you!!!!! I have a question: do you have a time series model or time series forecasting video?? Please please make those videos with your amazing explanations!!!! :):)
@statquest
@statquest 4 года назад
I don't have one yet, but it's on the to-do list. :)
@myunghee7231
@myunghee7231 4 года назад
StatQuest with Josh Starmer ohhh good to hear!!!! Thank you for the response! I will wait for the time series!!
@Physicsope875
@Physicsope875 Месяц назад
Mind Blowing! Thank you for such valuable content
@statquest
@statquest Месяц назад
Thanks!
@adhiyamaanpon4168
@adhiyamaanpon4168 4 года назад
Hey josh!! Can u plz make a video for K-modes algorithm for categorical variables(unsupervised learning) with an example..plz?
@hopelesssuprem1867
@hopelesssuprem1867 10 месяцев назад
Many people on the Internet explain regularization of regression using polynomial features, as if ridge and lasso were used to reduce the curvature of the line, but in that case you really just need to find the right degree of the polynomial. You are one of the few who have shown the real essence of regularization in linear regression: the bottom line is that we simply penalize the model, trading a little bias for lower variance through changes in the slope. By the way, real overfitting in regression can be observed in data with a large number of features, some of which correlate strongly with each other, combined with a relatively small number of samples, and in that case L1/L2/Lasso will be useful. Thank you so much for a very good explanation.
@statquest
@statquest 10 месяцев назад
Thanks!
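A sketch of the situation described in the comment above (hypothetical simulated data, not anything from the video): many strongly correlated features and relatively few samples, where plain least squares tends to overfit while Ridge and Lasso generalize better, and only Lasso zeroes coefficients out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_train, n_test, n_features = 30, 500, 40

# correlated features: every column is a noisy copy of one latent signal
latent = rng.normal(size=(n_train + n_test, 1))
X = latent + 0.1 * rng.normal(size=(n_train + n_test, n_features))
y = 3.0 * latent[:, 0] + rng.normal(scale=0.5, size=n_train + n_test)

X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.05, max_iter=10_000))]:
    model.fit(X_tr, y_tr)
    err = mean_squared_error(y_te, model.predict(X_te))
    zeroed = int(np.sum(np.abs(model.coef_) < 1e-8))
    print(f"{name:5s}  test MSE={err:.3f}  coefficients set to ~0: {zeroed}/{n_features}")
# Typically OLS has the worst test error here, and only Lasso drops features entirely.
```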
@philwebb59
@philwebb59 3 года назад
Best visuals ever! No matter how much I think I know about stats, I always learn something from your videos. Thanks.
@statquest
@statquest 3 года назад
Thanks so much! BAM! :)
@RaviShankar-jm1qw
@RaviShankar-jm1qw 4 года назад
You simply amaze me with each of your videos. The best part is the way you explain stuff is so original and simple. Will really love if you could also pen down a book on AI/ML. Would be a bestseller i reckon for sure. Keep up the good work and enlightening us :)
@statquest
@statquest 4 года назад
Wow, thank you!
@rainymornings
@rainymornings Год назад
This aged very well (he has a book now lol)
@martinflo
@martinflo 11 месяцев назад
Hi thanks for the great videos. I don't understand why we get this "kink" on Lasso regression and not Ridge
@statquest
@statquest 11 месяцев назад
The "kink" comes from the absolute value function.
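For anyone who wants the one-parameter math behind that kink (my own sketch, not shown in the video): near its minimum the sum of squared residuals can be written as a constant plus a*(beta - beta_hat)^2, where beta_hat is the least-squares slope and a is the sum of the squared x values. Minimizing that plus each penalty gives:

```latex
% Ridge: the objective is smooth, so set the derivative to zero
\frac{d}{d\beta}\Big[a(\beta-\hat\beta)^2+\lambda\beta^2\Big]
  = 2a(\beta-\hat\beta)+2\lambda\beta = 0
  \quad\Longrightarrow\quad
  \beta^{*}_{\text{ridge}}=\frac{a}{a+\lambda}\,\hat\beta
% ...which shrinks toward 0 as lambda grows but never equals 0 for finite lambda.

% Lasso: |beta| has a corner at 0, so the minimum can sit exactly there
\frac{d}{d\beta}\Big[a(\beta-\hat\beta)^2+\lambda|\beta|\Big]
  = 2a(\beta-\hat\beta)\pm\lambda
  \quad\Longrightarrow\quad
  \beta^{*}_{\text{lasso}} = 0 \quad\text{whenever}\quad \lambda \ge 2a\,|\hat\beta|
```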
@charan01ai
@charan01ai 10 месяцев назад
unfortunately no one asked me 🤣🤣
@statquest
@statquest 10 месяцев назад
:)
@Spamdrew128
@Spamdrew128 2 года назад
I needed this information for my data science class and didn't expect such a well crafted and humorous video! You are doing great work sir!
@statquest
@statquest 2 года назад
Wow, thank you!
@ainiaini4426
@ainiaini4426 2 года назад
I wish someone had asked you before naming those penalties; then it would be much simpler to remember their names 😐
@statquest
@statquest 2 года назад
BAM! :)
@PerfectPotential
@PerfectPotential 4 года назад
"I got ... calling a young StatQuest phone" 😁 (The Ladys might love your work fam.)
@statquest
@statquest 4 года назад
Bam!
@sane7263
@sane7263 Год назад
That's a great video, Josh! 6:10 they should definitely have asked you 😂
@statquest
@statquest Год назад
BAM! :)
@ll-bc4gn
@ll-bc4gn 10 месяцев назад
"Unfortunately, no one asked me." I almost fart loud in a library.
@statquest
@statquest 10 месяцев назад
:)
@ismailelboujaddaini
@ismailelboujaddaini 8 месяцев назад
Thank you so much Blessing from Spain/Morocco
@statquest
@statquest 8 месяцев назад
Thanks!
@Jjhvh860
@Jjhvh860 3 года назад
The more cringe the 5-second intro is, the better the explanation StatQuest gives
@statquest
@statquest 3 года назад
That's pretty funny.
@omnesomnibus2845
@omnesomnibus2845 4 года назад
Really excellent video Josh. You consistently do a great job, and I appreciate it. Could you make a video showing the use of Ridge regression and especially Lasso regression in parameter selection? I had to do that once, and it is complicated. From your example it seems that using neither penalty gives you the best response. So, in what circumstances do you want to use the regression to improve your result? If you are using lasso regression to find the top 3 predictive parameters, how does this work? What are the dangers? How do you optimally use it? A complicated subject for sure! I'm sorry if this is covered in your videos on Lasso and Ridge regression individually, I am watching them next. I agree with your naming convention btw, squared and absolute-value penalty is MUCH more intuitive!
@statquest
@statquest 4 года назад
Watch the other regularization videos first. I cover some of what you would like to know in about parameter selection in my video on Elastic-Net in R: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ctmNq7FgbvI.html
@omnesomnibus2845
@omnesomnibus2845 4 года назад
@@statquest I will check out those videos, thanks. I actually did use elastic net regularization. The whole issue is complex (for somebody without a decent stats background) because the framework of how everything works isn't covered both well and simply anywhere that I could find, without going down several pretty deep rabbit holes. Some of the parameter-selection approaches I remember being suggested depended on the assumption that the parameters were independent, which was NOT the case in my situation. I'm still not sure what the best approach would have been.
@omnesomnibus2845
@omnesomnibus2845 4 года назад
@@statquest As an additional note, I've always found that examples and exercises are even more important than theory, while theory is essential at times too. In many math classes concepts were laid out in formal and generalized glory, but I couldn't get the concept at all until I put hard numbers or examples to it. It's probably not the subject of your channel or in your interest, but I think some really hand-holding examples of using these concepts in some kaggle projects, or going through what some interesting papers did, would be a great way of bringing the theory and the real world together.
@statquest
@statquest 4 года назад
@@omnesomnibus2845 I do webinars that focus on the applied side of all these concepts. So we can learn the theory, and then practice it with real data.
@omnesomnibus2845
@omnesomnibus2845 4 года назад
@@statquest That's great!
@anshpujara14
@anshpujara14 4 года назад
Can you do a lecture on Kohonen Self Organising Maps?
@baharehbehrooziasl9517
@baharehbehrooziasl9517 Месяц назад
I thought I was familiar with the concept of regularization, but your videos always help me grasp the concept more easily and, of course, deeper!
@statquest
@statquest Месяц назад
Thanks!
@Sello_Hunter
@Sello_Hunter 3 года назад
This explained everything i needed to know in 9 minutes. Absolute genius, thank you!
@statquest
@statquest 3 года назад
Glad it was helpful!
@andyn6053
@andyn6053 2 года назад
Hi, I don't get why we have a kink at zero in the Lasso regression case
@statquest
@statquest 2 года назад
The lasso function includes the absolute value, which doesn't round off as nicely as the square function when parameter values are close to 0.
@Seff2
@Seff2 4 года назад
Another example of why lasso and ridge just make no sense. The best fit was the normal linear regression. Lasso and ridge both made the line fit worse. So I conclude that lasso and ridge are pointless and best avoided.
@statquest
@statquest 4 года назад
Again, the datasets are small in order to show the concepts behind how these methods work. In practice, we have more data, but not so much that we can be super confident that whatever line we fit will accurately make predictions with new data that the original line was not fit to. Ridge and Lasso Regression compensate for the lack of a sufficient amount of data to begin with.
@marcelocoip7275
@marcelocoip7275 2 года назад
I agree with you, this simple example doesn't show any benefits, and for all the people saying "Thank you, you saved me," I want to know what they did with this knowledge, repeating it like a parrot on a test?
@geovannarojas2580
@geovannarojas2580 3 года назад
These videos are so clear and fun, they helped me a lot with modeling and statistics in biology.
@statquest
@statquest 3 года назад
Thank you! :)
@mansoorbaig9232
@mansoorbaig9232 4 года назад
You are awesome Josh. It always bothered me why L1 would push coefficients to 0 and L2 would not, and you explained it so simply.
@statquest
@statquest 4 года назад
Thank you! :)
@usamahussain4461
@usamahussain4461 2 года назад
Excellent. I have just one question. In the case of the L1 penalty, isn't the line with lambda equal to 40 (i.e., slope 0) giving a bad line? I mean, with the blue line we were getting a better fit, since it didn't completely ignore weight in predicting height and its sum of residuals was smallest?
@statquest
@statquest 2 года назад
What time point, minutes and seconds, are you asking about?
@usamahussain4461
@usamahussain4461 2 года назад
@@statquest 7:16
@statquest
@statquest 2 года назад
@@usamahussain4461 Yes. For both L1 and L2 you need to test different values for lambda, including setting it to 0, to find the optimal value.
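A small sketch of what "test different values for lambda" usually looks like in practice (hypothetical data; note that scikit-learn calls the penalty strength alpha rather than lambda):

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=100)  # only 2 useful features

alphas = np.logspace(-3, 3, 25)               # candidate penalty strengths, from tiny to huge
ridge = RidgeCV(alphas=alphas).fit(X, y)      # picks alpha by cross-validation
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)

print("ridge: best alpha =", ridge.alpha_, "coefficients =", np.round(ridge.coef_, 3))
print("lasso: best alpha =", lasso.alpha_, "coefficients =", np.round(lasso.coef_, 3))
# Compare which coefficients lasso pushes to exactly 0 versus how ridge merely shrinks them.
```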
@ganpatinatrajan5890
@ganpatinatrajan5890 Год назад
Excellent Explanations 👍👍👍 Great work 👍👍👍
@statquest
@statquest Год назад
Thank you!
@lizhihuang3312
@lizhihuang3312 4 месяца назад
this one has official korean subtitle!?
@statquest
@statquest 4 месяца назад
bam!
@berkceyhan5031
@berkceyhan5031 2 года назад
I first like your videos then watch them!
@statquest
@statquest 2 года назад
BAM!
@Imran_et_al
@Imran_et_al Год назад
The explanation can't be any better than this....!
@statquest
@statquest Год назад
bam! :)
@arjunpukale3310
@arjunpukale3310 4 года назад
And that's the reason why lasso does a kind of feature selection and sets many weights to 0 compared to ridge regression. And now I know the reason behind it, thanks a lot❤
@statquest
@statquest 4 года назад
BAM! :)
@NRienadire
@NRienadire 3 года назад
Great videos, thank you very much!!!
@statquest
@statquest 3 года назад
Glad you like them!
@haxlwaxl997
@haxlwaxl997 4 года назад
And why is there now a kink in the graph? I understood that Lasso can get to zero and this has its benefits, but why can it?
@statquest
@statquest 4 года назад
Because for parameter values between 0 and 1, the lasso penalty shrinks them much faster than the ridge penalty.
@haxlwaxl997
@haxlwaxl997 4 года назад
@@statquest thank you
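One standard way to see that "shrinks much faster" statement in code. This is the textbook one-feature solution (the lasso case is the soft-thresholding update used by coordinate-descent solvers), not something shown in the video, and it assumes the feature is scaled so its sum of squared values is 1, just to keep the formulas to one line each:

```python
import numpy as np

def ridge_shrink(beta_hat, lam):
    # one-feature ridge solution: divide by (1 + lambda), never exactly 0
    return beta_hat / (1.0 + lam)

def lasso_shrink(beta_hat, lam):
    # one-feature lasso solution (soft-thresholding): subtract lambda/2 and clip at 0
    return np.sign(beta_hat) * max(abs(beta_hat) - lam / 2.0, 0.0)

beta_hat = 0.8   # made-up least-squares slope
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"lambda={lam:4.1f}  ridge -> {ridge_shrink(beta_hat, lam):+.3f}"
          f"   lasso -> {lasso_shrink(beta_hat, lam):+.3f}")
# Ridge keeps a small nonzero slope no matter how big lambda gets;
# lasso hits exactly 0 once lambda is large enough.
```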
@ngochua6679
@ngochua6679 3 года назад
Fortunately, I asked you :) I agree squared and absolute penalty are better word choices for these regularization methods. Thanks again for making my ML at Scale a tad bit easier.
@statquest
@statquest 3 года назад
BAM! Thank you very much! :)
@ZinzinsIA
@ZinzinsIA Год назад
Great, many thanks, very understandable and clear. It gave me a good intuition of how lasso regression shrinks some variables to zero.
@statquest
@statquest Год назад
Glad it was helpful!
@sunritjana4573
@sunritjana4573 3 года назад
Thanks a lot for these awesome videos, you deserve a million followers and a lot of credit :) I just love these and they are KISS: so simple and understandable. I owe you a lot of thanks and credit :D
@statquest
@statquest 3 года назад
Thank you so much 😀!
@siddhu2605
@siddhu2605 3 года назад
You are a super professor and I'll give you an infinity BAM!!!!!!!!!!! I really like the way you repeat the earlier discussed topics to refresh the student's memory; that is really helpful, and you have a lot of patience. Once again you proved that a picture is worth a thousand words.
@statquest
@statquest 3 года назад
Thank you very much! :)
@Jack-mz7ox
@Jack-mz7ox 3 года назад
This is the perfect explanation I was searching for of why L1 can be used for feature importance!!!
@statquest
@statquest 3 года назад
bam! :)
@aliaghayari2294
@aliaghayari2294 8 месяцев назад
dude is creating quality videos and replies to every comment! talk about dedication! thanks a lot
@statquest
@statquest 8 месяцев назад
bam! :)
@ruonanzheng2019
@ruonanzheng2019 2 года назад
Thank you, the regularization series videos from 2018 to 2020 are so helpful.😀
@statquest
@statquest 2 года назад
Thanks!
@MorriganSlayde
@MorriganSlayde 2 года назад
I died laughing when you sang the intro.
@statquest
@statquest 2 года назад
That's a good one! :)
@flavio4923
@flavio4923 2 года назад
I've never been good with this kind of math/statistics because when I encounter the book formulas I tend to forget or not understand the symbols. Your videos make it possible to go beyond the notation and to learn the idea behind these concepts to apply them in machine learning. Thank you !
@statquest
@statquest 2 года назад
Bam! :)
@gireejatmajhiremath6751
@gireejatmajhiremath6751 4 года назад
why does that kink exist in Lasso regression?
@statquest
@statquest 4 года назад
Because of the absolute value function.
@albertomontori2863
@albertomontori2863 3 года назад
this video.....you are my savior ❤️❤️❤️
@statquest
@statquest 3 года назад
bam!
@nightnavin
@nightnavin 3 года назад
Really well done, excellent job, even this dummy can get it! Thanks!
@statquest
@statquest 3 года назад
Glad to help!
@case540
@case540 2 года назад
But why... I can't wrap my head around why the absolute value has this property
@statquest
@statquest 2 года назад
I believe it is because when the parameters start to get smaller, the square penalty becomes much smaller relative to the parameter itself. For example, when the parameter is 0.01, then the penalty is 0.01^2 = 0.0001. In contrast, the absolute value penalty remains as large as the parameter. When the parameter is 0.01, the absolute value penalty is 0.01, which is much larger than the square penalty for the same parameter value.
@case540
@case540 2 года назад
@@statquest Thank you for the response! I just discovered this channel and am binge watching all of the vids!
@statquest
@statquest 2 года назад
@@case540 bam!
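A quick numeric check of the explanation above (nothing video-specific, just arithmetic): near zero the squared penalty becomes tiny relative to the parameter, while the absolute-value penalty stays the same size as the parameter.

```python
for p in [1.0, 0.5, 0.1, 0.01, 0.001]:
    print(f"parameter={p:<6}  squared penalty={p**2:<10.6f}  absolute penalty={abs(p)}")
# As the parameter shrinks, p**2 vanishes much faster than |p|, so the squared (ridge)
# penalty loses its push near 0 while the absolute (lasso) penalty does not.
```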
@suryan5934
@suryan5934 4 года назад
Amazing video as always Josh! Just to be sure I got it correctly: the plot of RSS error vs slope is a parabola in 2D. So when we do the same thing in 3D, i.e. with 2 parameters, does it become the same bowl-shaped cost function that we try to minimise?
@statquest
@statquest 4 года назад
Yes
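A tiny sketch of that idea (my own made-up data, not the video's): evaluate SSR plus the ridge penalty on a grid over two slopes and confirm the surface has a single lowest point, i.e. it is bowl shaped.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))                      # two features -> two slopes
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.2, size=20)
lam = 5.0

s1, s2 = np.meshgrid(np.linspace(-3, 3, 201), np.linspace(-3, 3, 201))
pred = s1[..., None] * X[:, 0] + s2[..., None] * X[:, 1]
cost = ((y - pred) ** 2).sum(axis=-1) + lam * (s1 ** 2 + s2 ** 2)   # SSR + ridge penalty

i, j = np.unravel_index(cost.argmin(), cost.shape)
print("lowest point of the bowl at slopes:", round(float(s1[i, j]), 2), round(float(s2[i, j]), 2))
# With one slope the penalized SSR traced a parabola; with two slopes it is a bowl
# with a single minimum, which is what the optimizer walks down to.
```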
@yidong7706
@yidong7706 2 года назад
thanks to this video I finally understand why lasso and ridge have the so called shrinking effect.
@statquest
@statquest 2 года назад
bam!
@mihailtegovski4028
@mihailtegovski4028 3 года назад
You should receive a Nobel Prize.
@statquest
@statquest 3 года назад
BAM! :)
@baharb5321
@baharb5321 3 года назад
Awesome! And I should mention actually: We are asking YOU!"
@statquest
@statquest 3 года назад
Bam!
@katielui131
@katielui131 7 месяцев назад
This is amazing - thanks for this
@statquest
@statquest 7 месяцев назад
Thanks!
@ammarkhan2611
@ammarkhan2611 4 года назад
Thanks Josh. I have a small doubt. What is the reason for the LASSO plot becoming linear (when the slope values are negative)?
@statquest
@statquest 4 года назад
It looks linear, but it's just that the sum of the squared residuals really starts to dominate the equation in a huge way - just like how squaring residuals makes outliers dominate a normal regression.
@ammarkhan2611
@ammarkhan2611 4 года назад
@@statquest Thanks Josh, this was really helpful
@rishipatel7998
@rishipatel7998 Год назад
This guy is amazing.... BAM!!!
@statquest
@statquest Год назад
Thanks! :)
@eminatabeypeker6305
@eminatabeypeker6305 3 года назад
You are really doing a great, great job. This channel is the best way to learn a lot of correct and important things in a short time.
@statquest
@statquest 3 года назад
Thank you very much! And thank you for your support!! BAM! :)
@richardxue1506
@richardxue1506 7 дней назад
Ridge Regression (L2-norm) never shrinks coefficients to zero, but Lasso Regression (L1-norm) may shrink coefficients to zero, and that's the reason Lasso can perform feature selection while Ridge can't.
@statquest
@statquest 7 дней назад
bam! :)
@vishnuprakash9196
@vishnuprakash9196 11 месяцев назад
The best. Definitely gonna come back and donate once I land a job.
@statquest
@statquest 11 месяцев назад
Wow! Thank you!
@pmsiddu
@pmsiddu 4 года назад
Very well explained; this one video cleared all my doubts, with practical calculations and visualization. Kudos for the great job.
@statquest
@statquest 4 года назад
Thanks! :)
@juanmanuelpedrosa53
@juanmanuelpedrosa53 4 года назад
Hi Josh, would you consider explaining the nuances of the arithmetic, geometric and harmonic means? I couldn't find it in the quests.
@statquest
@statquest 4 года назад
I'll put it on the to-do list.
@juanmanuelpedrosa53
@juanmanuelpedrosa53 4 года назад
@@statquest thank you!
@shamshersingh9680
@shamshersingh9680 2 месяца назад
I have a doubt in this video. At time stamp 3:43 you say "The residuals are smaller than before, so the Sum of Squared Residuals is smaller than before...". This particular line is not clear to me. For the same value of the slope parameter and a given value of lambda, how can [Sum of Squared Residuals (SSR) + penalty] be less than [SSR only]? For example, let's say at slope = 0.45 the SSR is 0 when we have not applied Ridge. After Ridge, the loss function is SSR + 10 * 0.45^2 = 0 + 2.025 = 2.025. So when we apply Ridge the loss is slightly increased, which is also visible with the orange parabola. The second point I have not understood is at time stamp 5:06, where the note says "We can also see that when lambda = 10, the lowest point in the parabola is closer to 0 than when lambda = 0." But the lowest point of the orange parabola is slightly away from zero compared to the blue parabola. As per my understanding, that's what Ridge does: it increases the loss a little bit by moving the lowest point of the parabola away from zero, to make the model less prone to overfitting. And it does so by reducing the slope so that the model is not over-sensitive to any small change in the parameter value. I am sure I am missing some piece of the puzzle here. Thanks and regards.
@statquest
@statquest 2 месяца назад
At 3:43 we are just comparing the residuals for when the slope = 0 to when the slope = 0.2. When the slope is increased from 0 to 0.2, the residuals are smaller. As a result, the sum of the squared residuals is also smaller for slope = 0.2 than for slope = 0. Now, when we add the regularization penalty to the equation things are a little more interesting; however, the decrease in the residuals more than compensates for the increase in the regularization penalty.
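A made-up numeric version of that comparison (my own toy numbers, not the exact values from 3:43): with lambda = 10, moving the slope from 0 to 0.2 adds a little penalty but removes a lot of SSR, so the penalized total still goes down.

```python
x = [1.0, 2.0, 3.0]
y = [0.4, 0.8, 1.2]   # made-up weight/height style data
lam = 10.0

def ssr_and_total(slope):
    ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return ssr, ssr + lam * slope ** 2

for slope in (0.0, 0.2):
    ssr, total = ssr_and_total(slope)
    print(f"slope={slope}: SSR={ssr:.2f}, SSR + penalty={total:.2f}")
# Going from slope 0 to 0.2 adds 10 * 0.2^2 = 0.4 of penalty but removes far more SSR,
# so the penalized total drops, which is the point made in the reply above.
```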
@thedanglingpointer8411
@thedanglingpointer8411 4 года назад
God of explanation !!! 🙏🏻🙏🏻🙏🏻 Awesome stuff 🙂🙂
@statquest
@statquest 4 года назад
Thank you! 🙂
@tymothylim6550
@tymothylim6550 3 года назад
Thank you very much for this video! It helped me visually understand how Lasso regression can remove some predictors from the final model!
@statquest
@statquest 3 года назад
Glad it was helpful!
@kzengineai
@kzengineai 4 года назад
Your videos are very explanatory for studying this field...
@statquest
@statquest 4 года назад
Glad you think so!
@homeycheese1
@homeycheese1 3 месяца назад
I guess I'm still not following why ridge regression can't reduce the slope to zero...for the ridge penalty: 0^2 = 0, and for lasso abs(0) = 0 , they both can equal zero... ?
@statquest
@statquest 3 месяца назад
I'm not sure I understand what you are saying with the math. If the slope is 0, then what you say is correct, 0^2 = 0. But the slope doesn't start out that way. How does it get there if the bottom of the curve is always > 0?
@r0cketRacoon
@r0cketRacoon 2 месяца назад
OMG!!! I've always thought that Ridge is a better method for fighting overfitting because it introduces a squared penalty into the cost function, which reduces weights more heavily and faster close to 0. Now you've changed my mind
@statquest
@statquest 2 месяца назад
bam! They both have strengths and weaknesses.
@gladdema236
@gladdema236 4 года назад
you are crazy and you are the best
@statquest
@statquest 4 года назад
Thank you!!!
@SzehangChoi
@SzehangChoi 2 месяца назад
You saved my degree
@statquest
@statquest 2 месяца назад
bam!
@sebastiencrepel5032
@sebastiencrepel5032 3 года назад
Great videos. Very helpful. Thanks !
@statquest
@statquest 3 года назад
Glad you like them!
@saranyakumaran459
@saranyakumaran459 2 года назад
Thank you very much for the video!!! All your videos are really easy to understand... thanks a lot.. could you please upload a video on the SCAD (Smoothly Clipped Absolute Deviation) regularization method?
@statquest
@statquest 2 года назад
I'll keep that in mind.
@AhmedKhaled-xp7dm
@AhmedKhaled-xp7dm 3 месяца назад
Amazing series on regularization (As usual) I just didn't quite understand why in the ridge regression the weights/parameters never ever reach zero, I didn't give it much thought but it didn't pop right at me like it usually does in your videos lol, but again great series!
@statquest
@statquest 3 месяца назад
Thanks!
@bedirhangundoner9627
@bedirhangundoner9627 3 years ago
"Why can't it be zero?" I struggled with this idea. But I think I finally found it! Could this be the reason? The ridge closed-form solution: slopes = (XᵀX + aI)⁻¹ XᵀY ... Even if we increase a (the regularization term) infinitely, the slopes cannot be exactly zero!
@statquest
@statquest 3 года назад
Maybe! :)
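That closed-form expression is indeed the standard ridge solution, and a quick check (hypothetical data) shows the coefficients shrinking toward zero but never reaching it as lambda grows:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

for lam in [0.0, 1.0, 10.0, 100.0, 10000.0]:
    # ridge closed form: (X^T X + lambda * I)^(-1) X^T y
    coef = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda={lam:>8}: coefficients = {np.round(coef, 4)}")
# The coefficients approach 0 as lambda grows, but (X^T X + lambda * I) is always
# invertible here and the solution is never exactly the zero vector for finite lambda.
```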
@oanphong61
@oanphong61 2 года назад
Thank you very much!
@statquest
@statquest 2 года назад
bam!
@bikramsarkar1484
@bikramsarkar1484 4 года назад
You are a life saver! I have been trying to understand this for years now!! Thanks a ton!!!
@statquest
@statquest 4 года назад
Bam! :)
@samuelhughes804
@samuelhughes804 4 года назад
All your videos are great, but the regularization ones have been a fantastic help. Was wondering if you were planning any on selective inference from lasso models? That would complete the set for me haha
@statquest
@statquest 4 года назад
Not yet!
@designcredible8247
@designcredible8247 Год назад
Hi, thanks for this explanation, it really helped! In my previous workplace almost everyone said that lasso could be used for feature selection, and it was kind of taken as a given, no matter what the lambda value is. But it solely depends on the lambda, right? It may not remove any features at all? And increasing the lambda value to the maximum isn't always the most beneficial?
@statquest
@statquest Год назад
That's correct.