By the end of the video, I feel like I've just returned home after climbing a mental mountain. Putting it all together at the end with an example and walking through each model was truly an awesome and insightful experience. I 'feel' I learned a lot. Great teaching approach. Thanks a lot.
+jnscollier Thanks for the kind words and I'm glad you finished the marathon. A lesser person might not have made it to the peak of Mt. Everest. :0) There is so much theory behind these techniques it can really be overwhelming at first. L1 & L2 penalties, matrix notation, shrinkage estimators, etc... However, these are fantastic tools to have in your repertoire.
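(For anyone who wants the penalties written out: these are the standard textbook objectives for the three techniques mentioned, not transcribed from the video.)

```latex
% Ridge (L2 penalty): shrinks all coefficients toward zero
\hat\beta^{\,ridge} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2

% LASSO (L1 penalty): can shrink coefficients exactly to zero
\hat\beta^{\,lasso} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

% Elastic Net: a blend of the two penalties
\hat\beta^{\,enet} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
```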
Love the way you did this video. I was feeling exhausted and thought I would find another boring video, but you made it easy to follow and I've woken up.
Could you please label the x and y axes of your graphs in your future presentations? (There are also a few inconsistencies with parameters, e.g. when you use 'i' in the summation term but proceed with 'j' instead.)
At 7:28 the squares are not really what they should be. You take the x difference, whereas you should take the y difference and square that. Otherwise, good video.
The error term is the vertical distance between the observed point and the regression line. Your video incorrectly does not draw the side of each box to extend from the observed point to the regression line. +Sjors van Heuvein is correct. Your picture shows sum( (Y^ - f(x + (Y^ - f(x))))^2 ) being minimized.
Derek, thank you for providing very good learning material. Could you please post a video entirely dedicated to Ridge Regression? Is it possible to use Ridge Regression to estimate the coefficients and determine which covariates are important drivers of the model?
+Kaleab Woldemariam Thank you for the kind words, and I will definitely think about adding more content on the Ridge. One limitation of the Ridge Regression technique itself is that it does not lend itself to assessing which covariates are the key drivers. I typically flip to the LASSO and Elastic Net variants to get this assessment (those techniques eliminate the less important variables by shrinking their coefficients to 0). You could also consider running a PCA or employing variable-selection routines to gauge variable importance before leveraging Ridge Regression.
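A minimal sketch of that point using scikit-learn on made-up data (the data and penalty strengths here are illustrative, not from the video): Ridge keeps every coefficient nonzero, while LASSO drives the irrelevant ones exactly to 0, which is why it doubles as a variable-importance screen.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# Only the first two covariates actually drive the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge:", np.round(ridge.coef_, 3))  # all 5 coefficients nonzero
print("lasso:", np.round(lasso.coef_, 3))  # irrelevant coefficients shrunk to 0
```

Reading off which `lasso.coef_` entries survive gives the "key drivers" directly; Ridge offers no such cutoff.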
Thanks Derek! That was helpful! I didn't feel that you explained how elastic nets capture multicollinear groups (schools of fish). It just looks like a blended version of Ridge and Lasso without creating any clusters/schools like you mentioned. Also, any insight into why Ridge outperformed Lasso and Elastic? Is that usually the case? Also, I think someone asked below... can these models be used for logistic regression (for classification/binary output - i.e. yes/no)? And, how do you handle binary inputs (I'm guessing no adjustment needed there)?
This may be true, if you are a professional mathematician. As a layman, I was lost after ca. 10 minutes and the rest of the presentation was a series of pictures and hieroglyphs that I had no clue about.
Hey Derek, thanks for this great video lecture! I just have a couple of questions: 1. What is the R matrix at 23:10? Is it a variance-covariance matrix? 2. Why does the Ridge model lose to OLS beyond the dashed line at 31:41? 3. When you compare the three models at the end by their MSEs, are these MSEs in-sample or out-of-sample? Thank you very much!
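I can't say what R denotes on that slide without seeing it, but the matrix that usually appears at that point in a ridge derivation is (X'X + λI), from the closed-form estimator. A minimal NumPy sketch on synthetic data (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form ridge estimate: beta = (X'X + lambda*I)^{-1} X'y
lam = 1.0
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # close to beta_true, slightly shrunk toward zero
```

Adding λ on the diagonal is what keeps the system invertible even when X'X is ill-conditioned, which is the usual motivation for ridge in the first place.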
This is a fantastic lecture video. One question: in the final comparison, the MSE for Lasso is higher by only 0.0124, but in return we are getting rid of 2 variables. Don't you think that is worth the trade-off? Derek, or anyone who is good at this, please answer. Thanks.
Thank you. This is very valuable. I have one question about the Elastic Net and hope you can help. Since the Elastic Net will include groups of correlated variables, my question is: if I apply the Elastic Net, can I still interpret the coefficients in terms of their effect on the prediction? I remember that when multicollinearity is present, coefficients can flip sign. So I am concerned that I will not be able to interpret my coefficients. (In the end, I want to be able to say that this set of variables has a positive impact on the predicted value while the other set has zero or negative impact.) Thank you!
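A small sketch of the "school of fish" behavior on made-up data (penalty values here are illustrative): when two columns are near-duplicates, the Elastic Net's L2 component tends to split the weight roughly evenly across the correlated pair instead of arbitrarily picking one, which also makes sign flips less likely than under pure multicollinear OLS.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)
# x0 and x1 are near-identical copies of the same signal (a "school of fish");
# x2 is unrelated noise.
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
y = 2.0 * z + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.05).fit(X, y)
enet = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)

print("lasso:", np.round(lasso.coef_, 3))  # tends to favor one of the pair
print("enet: ", np.round(enet.coef_, 3))   # spreads weight across x0 and x1
```

So the group's *combined* coefficient is interpretable, but the split between members of a correlated group is somewhat arbitrary, which is worth keeping in mind when reading individual signs.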
demudu naganaidu I'm glad that you found some value in this demudu. Ridge regression is kind of tricky and I find that it takes a little bit of work to get comfortable with it. Good luck.
It’s really a nice tutorial on different regression techniques. I want to use LASSO/Elastic Net in my Ph.D. research problem (problem: correction of satellite-based rainfall using several independent variables, such as location and topographical variables). May I have your personal email to discuss the problem with you?
It's to correct skewness. It's also common practice in linear models because transforming features makes the residuals more normally distributed. (Look up feature transformation in linear models and the Box-Cox transform for further information.)
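A quick illustration of that skewness correction with SciPy's Box-Cox (synthetic data; `boxcox` picks the transform parameter λ by maximum likelihood, and for lognormal data it lands near 0, i.e. a log transform):

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # strongly right-skewed, positive

x_bc, lam = boxcox(x)  # lam is the fitted Box-Cox lambda

print("skew before:", round(skew(x), 2))    # large positive skew
print("skew after: ", round(skew(x_bc), 2)) # near zero
```

Note Box-Cox requires strictly positive data; for data with zeros or negatives, a shifted variant (e.g. Yeo-Johnson) is the usual fallback.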