Simple Linear Regression: Checking Assumptions with Residual Plots 

jbstatistics
Subscribers: 205K
Views: 323K

An investigation of the normality, constant variance, and linearity assumptions of the simple linear regression model through residual plots.
The pain-empathy data is estimated from a figure given in:
Singer et al. (2004). Empathy for pain involves the affective but not sensory components of pain. Science, 303:1157-1162.
The Janka hardness-density data is found in:
Hand, D.J., Daly, F., Lunn, A.D., McConway, K., and Ostrowski, E., editors (1994). The Handbook of Small Data Sets. Chapman & Hall, London.
Original source: Williams, E.J. (1959). Regression Analysis. John Wiley & Sons, New York. Page 43, Table 3.7.

Published: 4 Dec 2012

Comments: 148
@48956l 7 years ago
I'M GIVIN THIS VIDEO THE BIG CHECK MARK
@jbstatistics 7 years ago
Thanks!
@read89simo 7 years ago
ME TOO + A BIG SUBSCRIBE
@messididit 4 years ago
@read89simo + A BIG LIKE BUTTON
@jbstatistics 11 years ago
"think you guys should get more views..." Thanks! (And I'll take as a compliment that you said "you guys", since this is a one man show.) Getting lots of views isn't very high on my priority list -- I'm just trying to provide the best resources for my students that I can. (I haven't done any promotion, and I don't allow ads on the videos.) There are many students in intro stats in North America and around the world, and I'm glad that some of them find my videos helpful.
@williamlee0 4 years ago
I'd upvote you x10 if I could just for the anti-advert policy.
@user-pl7zr2jm5h 4 months ago
this was posted 11 years ago T-T and has the best explanations and videos on statistics I have ever found, thank you so much for all your hard work and legacy, i hope you know you're my savior.
@jbstatistics 4 months ago
I'm glad to be of help! 11 years, where'd they go? :)
@jbstatistics 11 years ago
I'm glad you find them useful John. Best of luck in your course!
@johncasey722 11 years ago
I'm so fricking glad these videos align well with my UIUC stats class. Much appreciated!
@jbstatistics 10 years ago
You are very welcome Simon!
@snake1625b 8 years ago
Excellent methods used to help students learn in this vid. This is the future of education!
@nkululekoshabane3373 9 years ago
One of the best, if not the best, video on regression analysis I've seen. Thank you very much for creating it. Your service is highly appreciated.
@jbstatistics 9 years ago
Nkululeko Shabane You are very welcome, and thank you very much for the compliment!
@vasili111 10 years ago
Very good videos about simple linear regression. Thank you very much for creating them!
@jbstatistics 10 years ago
You are very welcome!
@MohamedAbdo-xs7bf 5 years ago
You are Awesome! Thank you so much for sharing your valuable knowledge.
@raseshgupta6276 2 years ago
I was struggling to understand the assumptions in simple linear regression through other sources. This video has made them clear.
@GuppyPal 2 years ago
This is exactly what I have needed. My professor goes over these plots but has been doing statistics at a high level so long that I think it's hard for him to relate to someone who is new to it. I really needed someone to just explain it all from start to finish, and you did that. Thank you so much! Your videos are so, so helpful. Sincerely, a first year statistics graduate student.
@jbstatistics 2 years ago
I'm glad to be of help!
@rodrigopaolinelli6448 2 years ago
This is definitely a great video, thank you! You are awesome!
@valeriereid2337 1 year ago
Thank you for this excellent lecture. It certainly helps.
@shayd146 10 years ago
JB thank you so much you have helped me more than you'll ever know! My only suggestion to you would be to create playlists for associated topics. Other than that your teaching methods are incredible! Thanks!
@jbstatistics 10 years ago
Thanks very much for the compliment Shaydoyle! I believe I do have playlists ordered by topic. I've also set up a website (www.jbstatistics.com), which keeps the videos in a more organized fashion. (I'm not plugging anything on the site - it's just organized lists of my videos.) Cheers.
@hritwick1221 3 years ago
You are great, man. Thanks for your content. I am forever grateful to you.
@carnationize 5 years ago
Thanks a lot! All your videos on stats are very clear and have been very helpful!
@jbstatistics 5 years ago
You are very welcome!
@Jelly-cy4vh 1 year ago
This was very useful, thank you for all the information
@dedraryqui5606 8 years ago
very clear, easy understandable video
@jbstatistics 11 years ago
We often simply rely on an appropriate sampling design or experimental design to ensure independence. But if, say, we have recorded the observations in some sort of time order, then plots of the residuals through time can give us some indication of whether the residuals are correlated.
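A minimal sketch of the time-order check described in this reply, using simulated data. Everything here is an assumption made for illustration and is not taken from the video: the AR(1) error process, the coefficients, and the helper names `fit_line` and `lag1_correlation` are hypothetical. It fits a least squares line, keeps the residuals in the order the observations were recorded, and computes the correlation between each residual and the one before it. Plotting `residuals` against their index gives the residuals-through-time plot; plotting `residuals[1:]` against `residuals[:-1]` gives the e(t) versus e(t-1) plot asked about elsewhere in the thread.

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lag1_correlation(e):
    """Sample correlation between e[t] and e[t-1] (residuals in time order)."""
    current, previous = e[1:], e[:-1]
    c_bar = sum(current) / len(current)
    p_bar = sum(previous) / len(previous)
    num = sum((c - c_bar) * (p - p_bar) for c, p in zip(current, previous))
    den = (sum((c - c_bar) ** 2 for c in current)
           * sum((p - p_bar) ** 2 for p in previous)) ** 0.5
    return num / den

random.seed(1)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]

# Hypothetical errors that are correlated through time (an AR(1) process),
# which violates the independence assumption on purpose.
errors = [0.0] * n
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + random.gauss(0, 1)
y = [2 + 0.5 * xi + ei for xi, ei in zip(x, errors)]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # kept in time order

r1 = lag1_correlation(residuals)
print("lag-1 correlation of residuals:", round(r1, 2))
```

A lag-1 correlation well away from zero, or long runs of same-sign residuals in the time plot, suggests the errors are not independent. With independent errors the value would typically sit near zero.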
@angelinelam5862 3 years ago
Thank you for this useful video !
@linneajohansson3796 3 years ago
This was very helpful! Thank you!
@hichamitani6433 2 years ago
Thank you. Need more videos like these on outliers in residuals.
@Stephanbitterwolf 6 years ago
Great video! Thank you!
@dylanburns9381 7 years ago
great video. such a clear explanation. subbed.
@CHIRAGPERLA 5 years ago
This is gold!
@bharathganeshkumar7071 5 years ago
Thanks for this video.. Short and sweet...!!!
@jbstatistics 5 years ago
You are very welcome!
@wenlidi1604 8 years ago
very clear explanation.
@jingwen8133 4 years ago
Very useful video ! Thank you
@hanaizdihar4368 3 years ago
this really helps, thank you
@mostafaali8684 8 years ago
Good video, thank you very much for uploading it.
@jbstatistics 8 years ago
+Mostafa Ali You are very welcome. I'm glad you found it useful!
@rahkshi96 8 years ago
Thank you very much jb statistics. This is incredibly helpful and well explained.
@jbstatistics 8 years ago
+Peter Song You are very welcome. Thanks for the compliment!
@GlorifiedTruth 6 years ago
So helpful! Thanks.
@sivanschwartz3813 8 years ago
thank you for this amazing video!!!!!!
@jbstatistics 8 years ago
You're very welcome!
@Riley8185 6 years ago
These are very good videos
@deniskapliy2642 7 years ago
Small...and then they're big...and then they're small...and then they're big.. Great video, pretty simplistic, but very useful, thank you!
@ananyapamde4514 3 years ago
Great video!
@Pavankumar-zw2fz 3 years ago
Very good explanation, Sir. Thank you.
@davidli6068 4 years ago
thanks a lot, you're a king
@yingdili2219 4 years ago
perfect video
@doodelay 5 years ago
"The residual plot removes that increasing trend and then re-scales the y axis, so it's a little bit easier to see these issues.. sometimes in the residual plot." Now that is some serious insight. Thank you so much and this video was superb with really excellent examples!
@jbstatistics 5 years ago
Thanks for the kind words!
@bibekanandasahoo3497 2 years ago
thanks for this great explanation sir .....
@syedahmedali7417 4 years ago
you are such a great teacher...
@jbstatistics 4 years ago
Thanks!
@Bombingp 6 years ago
Thanks! Helped a lot!
@jbstatistics 6 years ago
You are welcome!
@bhabeshmahanta3408 5 years ago
Very nice teaching. Thanks
@siryohannb3626 3 years ago
thank you very much
@pubgvulcanizer7857 3 years ago
Very nicely explained 👍
@purityrima1366 4 years ago
@jbstatistics, thank you so much for helping me understand these plots! You are the best teacher:) I give you a big check mark for this video too. awesome explanation!
@Maya_s1999 6 years ago
Prof Balka knocks it out of the park every time! We miss your videos. Could you do some videos on multiple linear regression? Hope you come back soon with new vids!
@jbstatistics 6 years ago
Thanks for the compliment! I'm trying to make time for video production, but probably won't get back to it until the new year. It's been a busy few years, but returning to the videos has always been part of the plan (with multiple regression videos up near the top of the list). Cheers.
@Maya_s1999 6 years ago
YAY!! Thanks Prof !! I will look out for them.
@simonschacht1810 10 years ago
Thank you
@muhammadusama1558 4 years ago
The more I watch your video, the more I hate my uni. Much love man
@renshiue 2 years ago
nice and clear
@frederickrosas5248 2 years ago
Hi Sir. May I know which statistical tests are used alongside residual plots to confirm what is acceptable and what is not? Thank you for your help.
@jamiebond8481 7 years ago
good and simple explanation of residual plots and assumptions.
@jbstatistics 7 years ago
Thanks!
@maydin34 7 years ago
Nice video. Thank you. But the plot is only of the random part vs. the independent variable (x). What if we have multiple independent variables (say z, t, w, etc.)? Do we need to check each of them separately, again expecting the same variance regardless of the independent variable (random vs. z, random vs. t, etc.)? Or is it OK to just plot predicted y vs. the random part?
@savageprincess2796 1 year ago
im giving this video A BIG CHECK MARK (2)
@aicnzheng 1 year ago
Prof, Could you do some videos on multiple linear regression? Hope you come back soon with new vids!
@VivianGameCollections 2 years ago
saved my day
@Patriciacx 6 years ago
Thank you!
@jbstatistics 5 years ago
You are very welcome!
@sarita-ey5cw 5 years ago
+jbstatistics ऊघजेऐऊ
@Jemimakl 5 years ago
So helpful! Thank you for this :)
@selinechung1692 5 years ago
LMAO BRO WHY ARE YOU HERE
@Jemimakl 5 years ago
Seline Chung WHY ARE YOU HERE
@selinechung1692 5 years ago
@Jemimakl WHY ARE YOU SO HARDWORKING
@selinechung1692 5 years ago
@Jemimakl BRO YOU STARTED A WEEK AGO
@Jemimakl 5 years ago
Seline Chung I WAS DOING HOMEWORK
@DHDH_DH 5 months ago
Still extremely helpful in 2024
@aayushiagarwal6188 3 months ago
Perfectly explained ✨️ Could you please let me know: if the white centre line (the one around which all the epsilon points are scattered) is itself not straight and shows a pattern, what do we interpret? Does this mean that the mean of the errors is non-zero and hence our assumption is contradicted?
@infoesenn 3 years ago
Question: Why do you assume normally distributed errors? From my understanding, in large samples iid errors from any distribution should be sufficient (Central Limit Theorem).
@SaranathenArun11E214 5 years ago
brilliant
@jbstatistics 5 years ago
Thanks!
@frederikhe707 7 years ago
Nice! The only improvement I would suggest is that you actually name the violated assumptions. I mean people can draw that conclusion on their own but that would make it even more clear.
@TB3hnz 4 years ago
4:12 "I'm giving this the joker variance, because *let's put a SMILE on that FACE!* "
@nightwalkers5579 11 years ago
think you guys should get more views... may be there are not enough stats students in the country
@Kingshuk91 10 years ago
Great video. How is plot of e vs time and plot of e(t) vs e(t-1) different?
@karrisgiani5137 7 years ago
Brill video! If residuals appear to show an inverted U, how can I improve the model?
@omkareshpali8486 2 years ago
Hi, I have a question. Let's say I built a model and the R2 value came out at 70%. How do I make sure, by looking at the residuals, that this is the maximum variance I can explain?
@sanjaypandey6586 2 years ago
Is it OK in linear regression if the dependent and independent variables are not normally distributed? If not, what would be the best remedy for negative skew and negative kurtosis?
@harrygroundwater2590 1 month ago
Very helpful
@purityrima1366 4 years ago
Please can you share a link to your video on how to correct the unequal variance problem shown in the plots? Thanks in advance.
@jt007rai 4 years ago
at 4:21 , can we determine which model will solve this issue based on just looking at this residual plot?
@ogedaykhan9909 5 years ago
E X C E L L E N T Thanks a lot
@Kaa279 8 years ago
At 4:40 you said that there is another feature that we didn't include in our model. But I can also conclude that my model is not good, right?
@pate1495 3 years ago
I have a question regarding the normal Q-Q plot. On the y-axis, does it show the quantiles of the residual distribution, or the residuals themselves? On the x-axis it shows the quantiles of the residual distribution if it were normal, correct? Thank you, great video!
@jbstatistics 3 years ago
There are different ways of formatting these plots, but here I have the ordinary residuals on the y axis. (The y axis value for any point is the ordinary residual of that point.) Any value could be considered a quantile. The x axis represents the corresponding quantile from the standard normal distribution. So if the residuals were normally distributed, we'd expect those values to fall (roughly) in a linear pattern. (There are some technical issues here, as the observed residuals aren't technically iid normal, even if the OLS assumptions are true, but it's a rough approximation.)
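A minimal sketch of how the points of such a normal Q-Q plot can be computed, assuming the plotting positions (i - 0.5)/n. That is one common convention among several and is not necessarily the one used in the video; the function name `normal_qq_points` and the simulated residuals are hypothetical. The sorted residuals go on the y axis and the matching standard normal quantiles on the x axis.

```python
import random
from statistics import NormalDist

def normal_qq_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)              # y axis: the residuals themselves, sorted
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # x axis: standard normal quantiles at the assumed plotting positions (i - 0.5) / n
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

random.seed(0)
# Stand-in residuals drawn from a normal distribution, for illustration only
residuals = [random.gauss(0, 3) for _ in range(100)]
points = normal_qq_points(residuals)

for q, e in points[:3]:
    print(round(q, 3), round(e, 3))
```

If the residuals are roughly normal, the pairs fall close to a straight line whose slope reflects the residual standard deviation. Systematic curvature at the ends points to skewness or heavy tails.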
@JoaoVitorBRgomes 4 years ago
At 1:56 you can't plot against Y because there is dependence between Y and the residuals? You mean the residuals are the difference between the observed and the estimated, so makes no sense to plot against the observed? But why? Could you clarify this?
@KingQuetzal 3 years ago
So I got the 4:12 graph how can I find out what kind of data I have?
@JoaoVitorBRgomes 4 years ago
3:23 what kind of graph indicates non normality?
@ohhrelingo6271 2 years ago
If I can't find out if the variance is constant from the plot what should I do?
@Doh333 8 years ago
Would it be relevant to make residual plots if I want to check a categorical variable in a linear regression?
@jbstatistics 8 years ago
Yes, some types of residual plots are still informative for categorical explanatory variables. With a categorical variable, a check for linearity is not required, but residual plots can still help to check the normality and common variance assumptions.
@lowerterror7993 1 month ago
No one people like data analytics
@abhishekbhatia6092 5 years ago
While interpreting the residual plots, can I first pool the residuals in specific bins of X (say each bin 1 unit long or whatever) , so that it looks more like the previous plot with residuals for a given value of X, enabling me to verify the homoscedasticity (and also normality somewhat) more clearly? Edit: Q) You mentioned that one of the assumptions was that for a given value of X, the error terms are normally distributed with a constant variance sigma-squared (same for each X). Then at 5:50 you took all the residuals disregarding the value of X, and graphically checked it for normality using a Q-Q plot. Didn't you mention that the normality assumption was for errors for a given value of X? I am confused. pls help.
@n9537 5 years ago
Answer to edit: If we assume that sampling was completely random, then data from all treatments/groups/sub-populations/values of X were equally likely to be represented in your sample. In that case all the residuals can be clubbed together and checked for normality. It's the same as checking for each treatment group. Note this applies only to the residuals, not the variables.
@n9537 5 years ago
In regression we usually have continuous predictor variables, so it is impossible to check normality for each value of X. In the case of ANOVA, the predictor is usually categorical and you can venture to check residual normality for each treatment group/category. Both ANOVA and regression come under the Generalized Linear Model (GLM), so the assumptions are the same but they play out differently.
@n9537 5 years ago
Actually all assumptions are on the error terms. But since the residuals are estimates of the errors, we check for "good behavior" in the residuals. We have to make do with what we have (which is the residuals; the errors are unknown).
@n9537 5 years ago
This also follows from the assumption that error ~ i.i.d. N(0, sigma^2). So all residuals (used in place of the errors as a good estimate) are identically distributed (same mean and variance) and are independent of X, which implies you can't look at a set of residuals and figure out which value of X they came from. For all you know, they could all be from the same value of X or from different values of X. Needless to say, they must be sourced from the same population; you can't club residuals from different populations or different predictor(s). So for checking normality of residuals, you can disregard the value of X. This is not the case for Y (the dependent variable).
@davidchau6874 11 years ago
How do we check independence in the residual plot?
@ujasdiyora2804 7 months ago
I have one doubt. In all of the videos in this playlist we are talking about simple linear regression. Are these assumptions also true for multiple regression and polynomial regression? And the theory at the end for finding confidence intervals and testing whether coefficients are statistically significant: are these methods also applied in other kinds of linear regression?
@jbstatistics 7 months ago
The general idea still holds, yes. The specific formulas for the standard errors, degrees of freedom, etc., will change when there is more than one predictor. And there are many subtleties when it comes to multiple regression, so it's best to learn all about MLR rather than think something like "well, it's just like simple linear regression but with more predictors." That said, yes, the general ideas port over from simple linear regression to multiple linear regression in a natural way. Polynomial regression is a type of multiple regression, so same idea there.
@carlosaugusto212 4 years ago
Shouldn't we analyse the standardized residual plot? I mean, the residuals will be naturally bigger as the y value gets bigger, won't it? If the y range goes from 0,1 to 10 thousand, we expect bigger residual absolute values near the 10 thousand mark. Correct me if I'm wrong, please
@aabinamasoodgundroo5971 2 years ago
my graph is blank, what does that mean?
@alexkay7199 5 years ago
Please help: Do the residuals have a unit or are they unitless???
@jbstatistics 5 years ago
The residuals are the differences between the observed values of Y and the predicted values of Y. The units of both the observed and predicted values of Y are just the units of Y, and thus the units of the residuals are the units of Y.
@alexkay7199 5 years ago
@jbstatistics Thanks a lot! Really helpful!!
@MohitSingh-ub9gc 6 years ago
but why are we doing this, please explain ?
@ukrainrussiawarvideos2810 8 years ago
A GOOD GAIED Program
@Nias0404 6 years ago
I'm not sure how to get the Q-Q plot... can anyone explain?
@jbstatistics 6 years ago
It's almost always created using software. My intro to normal QQ plots is found here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-X9_ISJ0YpGw.html
@Nias0404 6 years ago
Thanks a lot!! Much appreciated
@yousifsalam 1 year ago
@4:40 why did you say the residuals are small then big then small.. don't you mean they're negative, positive.. since their magnitude is the same?
@dariopl8664 1 year ago
I think it's because "ε" is a random variable (as he mentioned in previous videos), and should stay so. If a stretch of residuals sits up and then a bunch of them sit down, that randomness breaks down, since when a whole group of them is up you can foresee they'll be down next time (then where's the randomness?). If they are up by the same amount they are down (as you see @4:40), would they still have a normal distribution? No, it would be just a straight-line probability distribution, in which you know that the moment you're up, next will be down, and so on. This model assumes ε follows a normal distribution, which is reasonable, since in real life many events occur this way. If the residuals are jumping up and down in clusters, then we're no longer dealing with this reasonable distribution. But of course, in the end he'd have to deal in some way with this time effect that he didn't know beforehand was causing this, maybe so as to normalize them, as they should be to fit the model🤔. I don't know yet how he tackles this problem. If I find out I'll tell you. Hope this reply was helpful. Best regards.
@MrPreston1056 10 years ago
Is there a way to t test the residual plot?
@jbstatistics 10 years ago
What kind of test are you hoping to do? There isn't going to be an overall increasing or decreasing trend in the residuals in simple linear regression. There may be curvature, and we could test to see whether adding higher order terms (e.g. X^2) results in a significantly improved fit. Cheers.
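A rough sketch of the "does adding X^2 significantly improve the fit" check mentioned in this reply, on simulated data. The data, coefficients and helper names are all hypothetical. To stay dependency-free it does not fit the quadratic model directly: it uses the standard partial-regression identity, in which regressing the linear-fit residuals of Y on the linear-fit residuals of X^2 gives the X^2 coefficient and the drop in the error sum of squares. The resulting statistic would be compared with an F distribution on 1 and n - 3 degrees of freedom under the usual model assumptions.

```python
import random

def residuals_from_line(x, y):
    """Residuals from the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def quadratic_term_f_statistic(x, y):
    """Partial F statistic for adding an X^2 term to a straight-line fit."""
    n = len(x)
    e_y = residuals_from_line(x, y)                       # Y with the line removed
    e_q = residuals_from_line(x, [xi ** 2 for xi in x])   # X^2 with the line removed
    sqq = sum(q * q for q in e_q)
    gamma = sum(q * e for q, e in zip(e_q, e_y)) / sqq    # coefficient on X^2
    sse_linear = sum(e * e for e in e_y)
    sse_quadratic = sse_linear - gamma ** 2 * sqq         # SSE after adding X^2
    return (sse_linear - sse_quadratic) / (sse_quadratic / (n - 3))

random.seed(2)
x = [random.uniform(0, 10) for _ in range(50)]
y_curved = [1 + 2 * xi + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]
y_straight = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

f_curved = quadratic_term_f_statistic(x, y_curved)
f_straight = quadratic_term_f_statistic(x, y_straight)
print("F with true curvature:", round(f_curved, 1))
print("F with a truly linear mean:", round(f_straight, 2))
```

A large F statistic suggests the curvature visible in the residual plot is real; a small one gives no evidence that the X^2 term is needed.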
@MrPreston1056 10 years ago
Can you use the t statistic to test if H: E(e "hat"sub i)=0 vs. H: E(e"hat" sub i) not equal to zero. E being mean and e "hat" being error
@jbstatistics 10 years ago
Preston C No, we can't test that. The (observed) residuals always sum to 0 in simple linear regression. When we say the expectation of epsilon is 0 (at every X), we are in effect saying that E(Y|X) falls on the line beta_0 + beta_1X. Conceptually, we could have a different model where the expectation of epsilon was assumed to be 2 instead of 0. This would change very little, except that beta_0 in this model would be 2 less than beta_0 in our usual model. This would unnecessarily complicate things, so we define epsilon to be a random variable with a mean of 0. Cheers.
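The intercept shift described in this reply can be written out as a short derivation. The starred symbols are notation introduced here for the alternative model and do not appear in the original reply.

```latex
\begin{align*}
\text{Usual model:} \quad & Y = \beta_0 + \beta_1 X + \varepsilon, & E(\varepsilon) &= 0
  \;\Rightarrow\; E(Y \mid X) = \beta_0 + \beta_1 X \\
\text{Alternative model:} \quad & Y = \beta_0^{*} + \beta_1 X + \varepsilon^{*}, & E(\varepsilon^{*}) &= 2
  \;\Rightarrow\; E(Y \mid X) = (\beta_0^{*} + 2) + \beta_1 X
\end{align*}
```

Both models describe the same line for E(Y|X) only if \beta_0^{*} + 2 = \beta_0, that is \beta_0^{*} = \beta_0 - 2. The slope is unchanged, so fixing the mean of the error term at 0 costs nothing and keeps the intercept interpretable.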
@MrPreston1056 10 years ago
But if we tested to see if e_i=0 vs not equal to 0 and rejected the null hypothesis that e_i=0 wouldn't that indicate that the residuals did not sum to zero and our previous assumptions were false?
@jbstatistics 10 years ago
Preston C The observed residuals sum to 0. That is not an assumption, it is a consequence of the least squares fit. If we attempted to test the null hypothesis that the true mean residual is 0 with a t test, we would end up with a test statistic of 0 and a p-value of 1. So that wouldn't really be a test. If you're wondering about testing the null hypothesis that E(epsilon) = 0 at *any given value of X*, that's a bit of a different story. We do something along those lines when we carry out a lack-of-fit test. (This tests the null hypothesis that the means do indeed fall on a line. We can do this sort of thing when we have multiple observations at at least some of the X's.)
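A small numerical illustration of the point that the observed residuals sum to zero as a consequence of the least squares fit, so a one-sample t statistic on them is zero. The simulated data and coefficients are made up for the example.

```python
import random

random.seed(3)
n = 30
x = [random.uniform(0, 10) for _ in range(n)]
y = [4 + 1.5 * xi + random.gauss(0, 2) for xi in x]

# Least squares estimates for simple linear regression
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# One-sample t statistic for "the mean residual is 0"
mean_resid = sum(residuals) / n
s = (sum((e - mean_resid) ** 2 for e in residuals) / (n - 1)) ** 0.5
t_stat = mean_resid / (s / n ** 0.5)

print("sum of residuals:", sum(residuals))  # zero up to floating-point rounding
print("t statistic:", t_stat)               # likewise zero up to rounding
```

Whatever data are used, the sum comes out as zero apart from rounding error, which is why this t test carries no information about the model.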
@unofficiallyofficial2149 5 years ago
Probably a simple model for college students, not high school.
@jbstatistics 5 years ago
The "simple" in simple linear regression refers to there being only one predictor (one x), and not because it's simple or easy. It's just the well-established name of the model. Unlike many others, I don't use any clickbait words like "easy" or "simple".
@unofficiallyofficial2149 5 years ago
@jbstatistics Oh, I understand. Thanks.
@JoshuaDHarvey 3 years ago
Nothing he is explaining makes any sense.