@Emm -- not sure how / if I can reply to your comment. An iso-surface is the set of points where a function f(x) takes a constant value, i.e. all x such that f(x) = c. For a Gaussian distribution, for example, this is an ellipse, shaped according to the eigenvectors and eigenvalues of the covariance matrix. So the iso-surfaces of theta1^2 + theta2^2 are circles, while the iso-surfaces of |theta1| + |theta2| look like diamonds. The iso-surface of the squared error on the data is also ellipsoidal, with a shape that depends on the data. Alpha scales the importance of the regularization term in the loss function, so higher alpha means more regularization. I didn't prove the sparsity assertion in the recording, but effectively, the "sharpness" of the diamond shape on the axes (specifically, the discontinuous derivative at, e.g., theta1 = 0) means that the optimum of the combined loss (data + regularization) can land at a point where some of the parameters are exactly zero. If the penalty is differentiable at those points, this will essentially never happen -- the optimum will almost always be at some (possibly small, but) non-zero value.
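To make the "exact zeros from the kink at theta = 0" point concrete, here is a small NumPy sketch of the lasso solved by iterative soft-thresholding (ISTA). Everything in it -- the function name, the synthetic data, the alpha value -- is made up for illustration, not taken from the video; the soft-threshold step is precisely where the non-differentiability of |.| maps small coefficients to exactly zero.

```python
import numpy as np

def lasso_ista(X, y, alpha, n_iter=2000):
    """Minimize mean((y - X@w)**2) + alpha*sum(|w|) by iterative soft-thresholding."""
    n, d = X.shape
    w = np.zeros(d)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = w - step * X.T @ (X @ w - y) / n  # gradient step on the data term
        # soft-thresholding: the kink of |.| at 0 maps small values to EXACTLY 0
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # only feature 0 matters

w = lasso_ista(X, y, alpha=0.5)
print(w)  # features 1..4 come out exactly zero
```

With a smooth penalty like w**2 in place of |w|, the irrelevant coefficients would instead settle at small non-zero values, matching the point above.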
I have taken courses and put a lot of effort into reading material online, but your explanation is by far the one that will remain indelible in my mind. Thank you
Whoa, I wasn't ready for the superellipse -- that's a nice surprise. It helps me understand the limit case of p -> inf. It's also exciting to think about rational values of p, such as the 0.5 case. Major thanks for the picture at 7 minutes in. I learned about the concept of compressed sensing the other day, but didn't understand how optimization under a regularized L1 norm leads to sparsity. This video made it click for me. :)
Thank you for the great explanation. Some questions: 1. At 2:09 the slide says that the regularization term alpha x theta x thetaTranspose is known as the L2 penalty. However, going by the formula for Lp norm, isn't your term missing the square root? Shouldn't the L2 regularization be: alpha x squareroot(theta x thetaTranspose)? 2. At 3:27 you say "the decrease in the mean squared error would be offset by the increase in the norm of theta". Judging from the tone of your voice, I would guess that statement should be self-apparent from this slide. However, am I correct in understanding that this concept is not explained here; rather, it is explained two slides later?
+RandomUser20130101 1. "L2 regularization" is used loosely in the literature to mean either the Euclidean distance or the squared Euclidean distance. Certainly the L2 norm has a square root, and in some cases it matters (L2,1 regularization, for example; see en.wikipedia.org/wiki/Matrix_norm), but often it does not: it does not change the iso-surface shape, for instance. So there should exist values of alpha (the regularization strength) that make the two penalties equivalent; alternatively, the path of solutions as alpha is changed should be the same. 2. On "offset by the increase": these slides explain regularization generally; the (squared) norm of theta was introduced as a notion of "simplicity" in the previous slides, and I think it is not hard to see (certainly if you actually solve for the values) that the regression curve in the upper right of the slide at 3:27 requires large coefficients, causing a trade-off between the two terms. Two slides later comes the geometric picture in parameter space, which certainly also illustrates this trade-off.
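The "there should exist values of alpha that make them equivalent" claim can be checked numerically: the closed-form ridge solution for penalty alpha * ||w||^2 is also a stationary point of the un-squared penalty alpha' * ||w|| when alpha' = 2 * alpha * ||w||. The data and constants below are invented for the check, not from the video.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
n = len(y)

alpha = 0.3
# ridge (squared-norm penalty) solution, closed form: (X'X/n + alpha*I) w = X'y/n
w = np.linalg.solve(X.T @ X / n + alpha * np.eye(3), X.T @ y / n)

# gradient of MSE(w) = mean((X@w - y)**2) at the ridge solution
grad_mse = 2 * X.T @ (X @ w - y) / n
# stationarity of MSE + alpha' * ||w|| (un-squared) holds for alpha' = 2*alpha*||w||
alpha_prime = 2 * alpha * np.linalg.norm(w)
grad_unsquared = grad_mse + alpha_prime * w / np.linalg.norm(w)
print(np.max(np.abs(grad_unsquared)))  # ~0: same optimum, rescaled alpha
```

This is the one-point version of the path statement: as alpha sweeps, each squared-penalty solution reappears on the un-squared-penalty path at a rescaled regularization strength.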
Very few videos online give some key concepts here, like what we're truly trying to minimize with the penalty expression. Most just give the equation but never explain the intuition behind L1 and L2. Kudos man
Sometimes I wish some profs would present a YouTube playlist of good videos instead of giving their lectures themselves. This is explained so much better. There are so many good resources on the net -- why are there still so many bad lectures?
Just replace the "regularizing" cost term that is the sum of squared parameter values (the L2 penalty) with one that is the sum of the absolute parameter values (the L1 penalty).
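That one-term swap can be written out directly. A minimal sketch (the function names `ridge_loss` / `lasso_loss` and the toy inputs are mine, for illustration):

```python
import numpy as np

def ridge_loss(w, X, y, alpha):
    """Mean squared error + L2 penalty (sum of squared parameters)."""
    return np.mean((y - X @ w) ** 2) + alpha * np.sum(w ** 2)

def lasso_loss(w, X, y, alpha):
    """Same data term; only the penalty changes to the sum of absolute values."""
    return np.mean((y - X @ w) ** 2) + alpha * np.sum(np.abs(w))

w = np.array([1.0, -2.0])
X = np.eye(2)
y = np.zeros(2)
print(ridge_loss(w, X, y, alpha=1.0))  # 2.5 + 5.0 = 7.5
print(lasso_loss(w, X, y, alpha=1.0))  # 2.5 + 3.0 = 5.5
```

The data term is untouched; only the penalty changes, which is what moves the iso-surfaces from circles to diamonds.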