Advanced Regression - Categorical X variables and Interaction terms 

zedstatistics
236K subscribers
124K views

All my stats videos are found here: www.zstatistics.com/videos/
See the whole regression series here:
• Regression series (10 ...
To download the jaybob.csv dataset, head over to the website above, I'll upload the data (and associated model worksheet) to the video page.

Comedy

Published: 6 Aug 2024

Comments: 136
@leosizaret4104 6 years ago
Your videos on regression are amazing! Interesting, clear, very informative; you turn the stats & intuition behind regressions into something fun to lose oneself in :D
@hitm43 4 years ago
This video was exactly what I needed. Clear and thorough. Keep it up!
@arushibhattacharya2143 2 years ago
These videos are a godsend. These are going to save my life for my massive regression analysis research paper. Thanks for the great content!!
@bernardosangir2698 5 years ago
I love your videos; this one especially has given me insight into explaining a regression model with interactions, which I have struggled with a lot. Thank you so much
@jasminepandit9861 2 years ago
Thank you SO much for this series! Best I've seen on RU-vid so far!
@petercrooks3166 3 years ago
One of the best explanations of the Dummy Variable Trap and how to circumvent it!
@niv2419 6 years ago
Quality content and well explained. Thanks a ton!
@the5to9life 4 years ago
Sir, you are amazing. Thank you for making these videos.
@anandparanjape1 5 years ago
Awesome videos man! You are a very good teacher :-)
@ivangarcialaverde2065 4 years ago
You explain 100 times better than my statistics teacher, you've just saved my exam, thanks a lot!!
@PunmasterSTP 2 years ago
Hey, I know it's been a while, but I just came across your comment and was curious. How'd the rest of the class go?
@PunmasterSTP 2 years ago
Categorical X? More like "Certainly the best." These videos rock!
@houlipouli3559 2 years ago
brooooo, how did I not see your videos in 5 years *sigh* I would have gone through uni so much more easily!!
@woodcrestshop5621 3 years ago
Very well explained, thank you so much!! Stay safe, Professor!
@benflis1618 2 years ago
Thanks. My professor threw this into the review of SLR and MLR (which we didn't originally learn), but he didn't explain it very well. This video was a big help. Edit: and by "this," I mean interaction terms
@PunmasterSTP 2 years ago
How'd the rest of your class go?
@jovial129 4 years ago
Love the excitement of the pink slip variable becoming significant lol 9:55
@leehyeah9133 6 months ago
omg you have saved my thesis now. Thank you 3000
@kanikabagree1084 4 years ago
Thank you so much for such an amazing explanation, you're my saviour, thank you :)
@drewtamales5999 3 years ago
These videos are fantastic, thank you!
@amirnashed9701 5 years ago
amazing work with the car example
@danielalonso3664 3 years ago
20:08 you should also take into account that being in cat4 gets you -0.390, so apart from adding the 0.123 of the pink slip, you should subtract 0.390 for being in cat4
@abdulmateen6101 3 years ago
You are absolutely right in pointing out this omission
@azzakamoun2294 3 years ago
Think of it in terms of the marginal effect. Consider a simpler regression with only pink slip (X1), agecat4 (X2) and the interaction term (X1*X2) with coefficient b3. To assess the marginal effect of X1 (pink slip), take the derivative of Y with respect to X1: dY/dX1 = b1 + b3*X2. Because X2 is a 0/1 ("boolean") variable, if X2 = 1 then Y increases by b1 + b3; otherwise it increases only by b1. The same goes for the marginal effect of X2. Conclusion: always think of it as the derivative with respect to the variable and you'll get the answer.
@anthonyabolarin4961 1 year ago
There is no omission: Ln(P) = constant (b0) - 0.181Ac2 - 0.800Ac3 - 0.390Ac4 - 0.209LnD + 0.123PS + 1.371(PS x Ac4). The term [-0.390Ac4] is what you counted as omitted, but it is present.
Model7   Coef     Var (ind)      LN        Anti-Ln
c         9.125   1               9.1250
ac2      -0.181   0               0.0000
ac3      -0.800   0               0.0000
ac4      -0.390   1              -0.3900
lnd      -0.209   5.669880923    -1.1850
ps        0.123   1               0.1230
psac      1.371   1               1.3710
Total                             9.0440    8,467.54
@existentialrap521 1 year ago
I was wondering about this as well. Glad you caught it! Edit: He did include it while calculating the price in model 7 at the very end.
@freeSpiritNonna 6 months ago
That time slot is about the effect of attaining a 'pink slip', and the -0.390 term is only for the age of the car, hence not included, I think.
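The derivative argument in this thread can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming the Model 7 coefficients exactly as quoted in the comments above (0.123 for the pink slip, -0.390 for AgeCat4, 1.371 for the interaction); the function names are hypothetical, not taken from the video's worksheet.

```python
import math

# Model 7 coefficients as quoted in the comments (log-price scale).
B_PS = 0.123      # pink slip main effect
B_AC4 = -0.390    # AgeCat4 main effect (vs. the AgeCat1 baseline)
B_PS_AC4 = 1.371  # pink slip x AgeCat4 interaction

def pink_slip_effect(ac4: int) -> float:
    """d ln(P) / d PS = b_PS + b_PSxAC4 * AgeCat4, in log points."""
    return B_PS + B_PS_AC4 * ac4

def cat4_with_slip_vs_base() -> float:
    """Log-point gap between an AgeCat4 car WITH a pink slip and an
    AgeCat1 car WITHOUT one, all else equal: all three terms switch on."""
    return B_PS + B_AC4 + B_PS_AC4

for label, lp in [
    ("pink slip, AgeCat1-3", pink_slip_effect(0)),
    ("pink slip, AgeCat4", pink_slip_effect(1)),
    ("AgeCat4 + pink slip vs AgeCat1 without", cat4_with_slip_vs_base()),
]:
    # lp is the log-point change; exp(lp) is the exact price multiplier.
    print(f"{label}: {lp:+.3f} log points, x{math.exp(lp):.3f}")
```

The three numbers answer different questions: 0.123 is the pink-slip effect for AgeCat1-3 cars, 1.494 is the pink-slip effect within AgeCat4, and 1.104 compares an AgeCat4 car with a pink slip against an AgeCat1 car without one. Which of these "the effect" refers to is what the thread is arguing about.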
@kanchangupta19 5 years ago
I am totally sold by this video. I mean, your teaching style is totally up to par. Please make some videos on logistic regression, random forests, neural networks and clustering as well.
@akanshabari6394 6 months ago
Amazing content!!!
@heidilinnsandster920 3 years ago
This was so helpful!
@siddhft3001 3 years ago
Great video! Thank you!
@in100seconds5 4 years ago
well done bro!
@cravenhealth1563 4 years ago
"15-35, well, they're the shit boxes aren't they?" lol
@petercrooks3166 3 years ago
Think about the kids learning advanced regression! /s lol
@pomme_paille 4 years ago
You forgot minus 1 at the end 😉 Thanks for the awesome content
@JoaoVitorBRgomes 4 years ago
Keep posting!
@kanewilliams1653 5 months ago
You're a legend!
@live.through.dance. 5 years ago
Firstly, thank you so much, Sir, for your videos filled with detailed explanation! Any query that pops up in my mind mostly gets solved within a few minutes in the videos. I got a clear idea about interaction terms and correlation. So just a small doubt: is correlation the only difference between multicollinearity and interaction terms, given that we do not want multicollinearity?
@KFIR93 3 years ago
You are the best!
@tedofbeverlyhills 2 years ago
Awesome videos, any book you particularly recommend to understand how to do linear regressions?
@user-mk4rm5pv4o 11 months ago
extremely clear!
@rrrprogram8667 4 years ago
Change your channel name to "zstatistics for machine learning", you're gonna soon have a million subs
@serikshamgunov7940 5 years ago
thank you very much for this video
@eddiele644 4 years ago
So when do we actually interact our variables? Is there a way to see if it is necessary, or do we just do it and then see if the coefficient on the interaction term is statistically significant?
@sum1sw 4 years ago
Any idea how one calculates the SE for multi-parameter non-linear regression?
@Mona-xl6mv 3 years ago
I love u, made my life so much easier
@azizsakr5565 5 years ago
Thank you for the interesting videos. I think there is a little confusion in the interpretation of the resulting coefficients. A change in one of the independent variables, holding all the others constant, does not mean the dependent variable increases by the same percentage. Please check an example of this at 11:28
@imglenngarcia 2 years ago
At the timestamp you provided, I think it should be 60.64% higher rather than 47.4%.
@helloworld1537 1 year ago
Yes! I found the same issue in the last video as well. It should be: the median car price will be e^0.474 = 1.6064 times higher, i.e. a 60.64% increase
@helloworld1537 1 year ago
@@imglenngarcia I think the pink slip interaction term case also has the same problem..
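For a log-level model (ln(price) on the left, an untransformed regressor on the right), the exact percentage change for a one-unit change in the regressor is 100 × (exp(b) - 1); reading b directly as a percentage is a first-order approximation that drifts as b grows. A minimal Python sketch, using the 0.156 and 0.474 coefficients mentioned in the comments (the function names are illustrative):

```python
import math

def exact_pct_change(b: float) -> float:
    """% change in y for a one-unit change in x when ln(y) is linear in x."""
    return 100.0 * (math.exp(b) - 1.0)

def approx_pct_change(b: float) -> float:
    """First-order approximation 100 * b, accurate only for small |b|."""
    return 100.0 * b

for b in (0.156, 0.474):  # coefficients quoted in the comments
    print(f"b = {b}: approx {approx_pct_change(b):.1f}%, "
          f"exact {exact_pct_change(b):.2f}%")
```

For 0.156 the two readings are close (15.6% vs about 16.9%); for 0.474 they differ by more than 13 percentage points (47.4% vs about 60.6%).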
@banashreeshiva4506 4 years ago
Changing the format of age into a categorical variable increased the significance of pink slip. Is this by chance, or does it happen for every variable? Would it be okay to just drop the pink slip variable for the next model without changing the format of the age variable?
@garyabrams1020 5 years ago
This is probably a very basic question: you added variables that were not statistically significant to get the final price of your car at the end of IVb. Why? Thanks, Gary
@shashankkhare1023 4 years ago
Hi Justin, hope you are doing great! I love your videos and have been following them on your website as well. I have one doubt about this video. At 3:48, where you add the pink slip variable, you say that having a pink slip increases the price by 15.6%, as the coefficient for pink slip is 0.156. I am confused here, because y is logged, and when y is logged and x is not, the general interpretation is that a 1-unit increase in x means a (exp(coeff) - 1) * 100 percent change in y. Please help me understand where I am going wrong. Thanks :)
@JoaoVitorBRgomes 4 years ago
I think it is because he is not doing a logistic regression but a linear regression.
@crock1255 6 years ago
In doing an actual analysis, would you still add the pink-slip coefficient to the pink-slip x cat4 interaction even when the pink slip variable alone is not statistically significant?
@malikakbar 4 years ago
I have the same question; I would appreciate it if anyone is willing to share the answer to this one
@lekjov6170 3 years ago
@@malikakbar It does, even if the variable on its own is not statistically significant. Think about this scenario: there's a new AgeCat5 variable that is added for all cars that are over 70 years old, and we are also going to add another variable called redCar, which takes the value 1 if the car is red and 0 otherwise. Now, if I were to ask you: "Does the car being red have an impact on the price of the car?" Probably not; cars are customizable, and the color on its own doesn't seem too relevant to determining the price of the car. For the sake of the example, let's stir things up and claim that in the 1940s (80 years ago) almost all cars were either grey or black, and only BMW was producing fancy red cars that were way more expensive than the grey/black cars; but since it was so long ago, it's really hard nowadays to find those special-edition BMW cars. So I believe it's pretty easy to tell that if you find a car that is over 80 years old whose original paint is red, it's almost certainly one of those special-edition red BMWs that are way more expensive than the others. For all the other cars, made less than 70 years ago, the color is not a factor that affects the price, so the variable "redCar" is probably not going to be statistically significant on its own. But if the car is over 70 years old, the color is going to play a huge role in predicting its value, because if it is red, the price is going to be way higher; therefore, it's going to be significant in that scenario. In this case, intuition tells me we should add another predictor to the equation that describes that relationship, which you would do like this: "+ Beta6 * redCar*ageCat5"
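The red-car scenario can be simulated to see why the main effect stays in even when it is insignificant. A minimal sketch on invented data (every number here, including the true coefficients 9.0, -0.3 and 1.0, the noise level and the sample size, is an assumption): colour is given no effect on its own, so `red` should come out near zero, while `red*ageCat5` carries the whole colour effect for old cars. It fits OLS with NumPy's `lstsq` and computes classical standard errors by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: colour only matters for very old cars (ageCat5 = 1).
red = rng.integers(0, 2, n)
age_cat5 = rng.integers(0, 2, n)
noise = rng.normal(0.0, 0.1, n)
log_price = 9.0 + 0.0 * red - 0.3 * age_cat5 + 1.0 * red * age_cat5 + noise

# Design matrix: intercept, both main effects, and their product.
X = np.column_stack([np.ones(n), red, age_cat5, red * age_cat5])
beta, rss, _, _ = np.linalg.lstsq(X, log_price, rcond=None)

# Classical OLS standard errors: sqrt(diag(sigma^2 * (X'X)^-1)).
p = X.shape[1]
sigma2 = rss[0] / (n - p)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

for name, b, t in zip(["const", "red", "ageCat5", "red*ageCat5"], beta, t_stats):
    print(f"{name:12s} b = {b:+.3f}  t = {t:+.1f}")
```

Keeping `red` in the model is what lets the interaction coefficient be read as the *extra* effect of colour for ageCat5 cars; dropping it would force the colour effect for newer cars to be exactly zero by assumption rather than by estimate.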
@UdoLattek02 2 years ago
You saved my thesis
@qinghuafeng1705 1 year ago
Thank you!
@olb47 4 years ago
Hi, can we use the model for the Datsun if pink slip's p-value is not significant and we are being highly restrictive? I'm asking because the p-value is higher than 0.05, and by common methodology it seems insignificant.
@haroldbradford690 3 years ago
brilliant!!!
@Nereknu93 4 years ago
Hey, 1) what if the categorical variable had a lot of levels (nationality, religion...), so that it would effectively mean so many variables that the adjusted R-squared would be very low? And 2) when some variables in your model are insignificant, shouldn't you remove them from the model? But then some of the then-significant variables become insignificant... :(
@sociologie4507 3 years ago
excellent!
@jahongirmuratov1576 4 years ago
Why do all videos on interaction terms discuss only the cases where both variables are expected to have a positive impact on Y and the interaction is also positive? What about having 2 independent variables (x1 and x2), one with a positive impact on Y and the other negative? How do you interpret the interaction term if it is positive? If it is negative?
@yasminfatima5948 4 years ago
How does removing one age-category variable make sense, while including all four doesn't?
@huntermarshall161 4 years ago
Hey Z, you mentioned interaction terms should only be included when the added IV(2) affects the relationship between IV(1) and Y. How do you determine the effect of adding IV(2)? Is it a change in the regression coefficient of IV(1)? I'm conducting orthopaedic research involving canine models, and I'm using multiple regression models to control for the size of the specimen. This would really help me out as I'm working to nail down models for our various parameters.
@JoaoVitorBRgomes 4 years ago
Actually, I think it is when the p-value of the interaction term is statistically significant. Then you see there's effect modification, e.g. age modifying the effect of selling a car with a pink slip.
@neelabhdubey8453 2 years ago
Why do we say there's a 47.4% increase in price in Cat4 compared to Cat1, and why don't we read it as a 47.4% jump from Cat3? I understand that the base category is 1, but I fail to see the reason behind the interpretation.
@Skey1337 1 year ago
The interpretation of the pink slip coefficient in Model 7, 19:44: is that still 149.4% relative to cat 1?
@user-ov1to6cs7i 8 months ago
Thank you, Professor. Why do we have to spare the variable age in model 5?
@jongkargrinang8012 11 days ago
If you add age and ln(age) to the model, it will create multicollinearity. Which coefficient is useful, age or ln(age)?
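Age and ln(age) move together closely, which is why including both tends to cause multicollinearity. A minimal sketch, assuming hypothetical ages of 1 to 50 years (not the jaybob.csv data); with only two regressors, the variance inflation factor is 1 / (1 - r^2).

```python
import numpy as np

# Hypothetical ages, 1 to 50 years.
age = np.arange(1, 51, dtype=float)

# Pearson correlation between the raw and the logged variable.
r = np.corrcoef(age, np.log(age))[0, 1]

# With exactly two regressors, each one's VIF is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r**2)
print(f"corr(age, ln(age)) = {r:.3f}, VIF = {vif:.1f}")
```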
@compilations6358 3 years ago
Here you intuitively decided that the coefficient of a certain feature is not what you would expect. Usually we have thousands of dimensions, so how can we know if the coefficients make sense? Is there any analysis, other than manually checking the coefficients, that we can do here?
@ael3377 2 years ago
Don't you have to exponentiate the coefficient of the dummy variable and then interpret it as a multiplier? The sales are still logged, so I guess you would have to exponentiate both sides of the equation to see the change in sales from non-pink slips to pink slips.
@jakeandersonbell5993 2 years ago
Same here, I thought it was (exp(coef) - 1) * 100 for non-transformed independent variables. Someone please correct me.
@timothyagandahabagre1091 4 years ago
Hi, thanks for your videos. They are very helpful to me. Could we simply code the new AgeCat variable such that: AgeCat: = 0 if age
@gamerchil 2 years ago
Yes, I was thinking this as well. I have always learned that the number of dummies to create = the number of category levels (4 for age in this case) - 1
@anthonyabolarin4961 1 year ago
Restoring Ac1 (age 1 to 5 years) into the Model is okay. However, the value of Ac1 = 0, annulling the entire term and contributing nothing to the Model, just like Ac2 and Ac3. Other interactions are feasible (e.g., odometer vs. age).
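The k - 1 rule can be seen directly in the design matrix. A minimal sketch with eight hypothetical cars, two in each AgeCat: the four dummies always sum to the intercept column, so including all four alongside a constant leaves the matrix rank-deficient and OLS has no unique solution.

```python
import numpy as np

# Hypothetical AgeCat labels (1-4) for eight cars.
age_cat = np.array([1, 2, 3, 4, 1, 2, 3, 4])
n = age_cat.size

# One 0/1 column per category: Ac1..Ac4.
dummies = np.column_stack([(age_cat == k).astype(float) for k in range(1, 5)])
intercept = np.ones((n, 1))

all_four = np.hstack([intercept, dummies])          # intercept + Ac1..Ac4
drop_base = np.hstack([intercept, dummies[:, 1:]])  # intercept + Ac2..Ac4

# Ac1 + Ac2 + Ac3 + Ac4 equals the intercept column, so one column is redundant.
print(np.linalg.matrix_rank(all_four), all_four.shape[1])    # 4 5
print(np.linalg.matrix_rank(drop_base), drop_base.shape[1])  # 4 4
```

Dropping Ac1 makes AgeCat1 the baseline, which is why the remaining dummy coefficients are read relative to it.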
@yourstrulysj2183 1 month ago
How do I understand the price conversion from e^9.044 to real dollar terms, $8468? Can you please help me understand!
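Converting e^9.044 to dollars is just undoing the log: the model predicts ln(price), so exp() of the prediction gives the price. A minimal sketch that reproduces the Model 7 calculation from the table quoted in the comments (an AgeCat4 car with a pink slip and LnD = ln(290)); the dictionary keys are illustrative names, not the worksheet's.

```python
import math

# Model 7 coefficients and the example car's values, as quoted in the thread.
coef = {"const": 9.125, "ac2": -0.181, "ac3": -0.800, "ac4": -0.390,
        "lnd": -0.209, "ps": 0.123, "ps_ac4": 1.371}
car = {"const": 1.0, "ac2": 0.0, "ac3": 0.0, "ac4": 1.0,
       "lnd": math.log(290), "ps": 1.0, "ps_ac4": 1.0}

log_price = sum(coef[k] * car[k] for k in coef)  # predicted ln(price)
price = math.exp(log_price)                      # back-transform to dollars
print(f"ln(price) = {log_price:.3f}, price = ${price:,.2f}")
```

This prints ln(price) = 9.044 and a price of about $8,467.54, matching that table; it also contains the -0.209 × ln(290) ≈ -1.185 term asked about further down. If the errors on the log scale are symmetric, exp() of the predicted log behaves more like a median than a mean price.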
@Zerudite 3 years ago
Why do we add the pinkslip and pinkslip*agecat4 coefficients for the interpretation but not agecat4? Is it because it's not significant? Or because we were only interpreting the coefficient of pinkslip under the condition that agecat4 is true?
@KissingPL 3 years ago
You can also interpret the effect of agecat4, which works the same way. Models that are in agecat4 but have NO pink slip have on average a 39% lower price than the baseline (AgeCat1 without a pink slip), holding all else constant. If they do have a pink slip, the price is on average 98.1% (1.371 - 0.39) higher than an AgeCat1 car that also has a pink slip, holding all else constant.
@zoyaaqib9269 2 years ago
I don't understand why we turned age, a continuous variable, into a categorical variable. Was that just to explain how a multi-level categorical variable works, or was it actually important to our model?
@folumb 5 years ago
Why is this video categorized as comedy?
@arnavtube 3 years ago
statistics is funny
@anthonyabolarin4961 1 year ago
...to them that are statistically lost, it is indeed a comedy! 1CO1!18
@aritradatta448 3 years ago
In model 7, the p-value for pink slip is 0.5 and therefore highly insignificant. Shouldn't it be removed? If yes, then after removing the base variable, can the interaction variable pink slip*agecat4 be retained in the model?
@drnurintankamaruddin3164 2 years ago
I have the exact same question. Have you found the answer?
@anthonyabolarin4961 1 year ago
If an interaction variable is SIGNIFICANT, then its components must be admitted into the model.
@korman9872 2 years ago
Tx sir
@couragee1 2 years ago
thanks
@johnpark1797 2 years ago
15:35 really clarifies it
@_Anonymous_9 3 years ago
brooo, you didn't show how you coded the interaction term with dummy vars
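A minimal sketch of one way to code these columns, assuming AgeCat is already stored as a label 1-4 and the pink slip as 0/1 (the helper `design_rows` is hypothetical, and the LnD column of Model 7 is left out for brevity). The interaction is simply the element-wise product of the two 0/1 columns.

```python
import numpy as np

def design_rows(age_cat: np.ndarray, pink_slip: np.ndarray) -> np.ndarray:
    """Hypothetical helper: intercept, AgeCat dummies (AgeCat1 = baseline),
    pink slip, and the PinkSlip x AgeCat4 interaction, all as 0/1 columns."""
    ac2 = (age_cat == 2).astype(float)
    ac3 = (age_cat == 3).astype(float)
    ac4 = (age_cat == 4).astype(float)
    ps = pink_slip.astype(float)
    ps_ac4 = ps * ac4  # 1 only for AgeCat4 cars that have a pink slip
    return np.column_stack([np.ones_like(ps), ac2, ac3, ac4, ps, ps_ac4])

# Four example cars: (AgeCat1, slip), (AgeCat4, slip), (AgeCat4, none), (AgeCat2, none).
X = design_rows(np.array([1, 4, 4, 2]), np.array([1, 1, 0, 0]))
print(X)
```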
@bk6prod490 3 years ago
Can I remove one control variable from the model without removing the others (the interaction and one control variable)?
@anthonyabolarin4961 1 year ago
Whatever you remove demands that you re-estimate your regression model and obtain new coefficients and significances.
@rogerdoux 3 years ago
"but mate, this thing goes" bloody oath
@lauramollema1817 3 years ago
I noticed you're interpreting insignificant variables multiple times. Could you please leave these out when performing calculations?
@piku9290dgp 4 years ago
How do you identify which two variables can interact? Is this based on business or domain knowledge?
@JoaoVitorBRgomes 4 years ago
I think, according to what he said, it is a bit subjective (domain knowledge)
@anthonyabolarin4961 1 year ago
Any combination can interact! But to what effect? If the interaction is INSIGNIFICANT, we save time by ignoring the crossing, unless we are constrained, say in an exam setting.
@samuelthomaz 2 years ago
Thanks!
@hiteshkamboj 3 years ago
Why do you take (Age)^2?
@petercrooks3166 3 years ago
Why did you not include [-0.390(AgeCat4)] when the car is older than 35 years and has a pink slip? Isn't your answer incorrect? The answer should be, "... for models older than 35 years, attaining a pink slip increases the price by an average of 110.4%, holding all else constant."
@helloworld1537 1 year ago
It should be e^1.104 times higher
@anthonyabolarin4961 1 year ago
@@helloworld1537 We generally think in decimals.
@abdelkaderkaouane1944 1 year ago
👌
@95FH95 5 years ago
haha, I can get everything but cannot solve -0.209Ln(290). Can someone please quickly show me the calculation for this?
@EpicLaith 5 years ago
Why not? Use a calculator and type in -0.209log(290)
@anthonyabolarin4961 1 year ago
@@EpicLaith By now, you have settled this matter. To use Excel, change 'log' to 'LN' and add = ahead of the minus sign: = - 0.209 * LN(290), then press Enter (voilà!)
@thenineteennn 5 years ago
1. In regression models with an intercept, the coefficients cannot be interpreted as a % change, as the coefficient doesn't affect the constant intercept. 2. At 20 mins you forget to include the ageCat4 solo term, i.e. there are 3 terms to sum, not just 2. Otherwise, well done for a perspicuous explanation!
@zedstatistics 5 years ago
Hi! Thanks for watching! Regarding your observations: 1. They certainly can be interpreted as a % change in y. Remember that if you were to exponentiate the log, you'd find that y = exp(B0 + B1X), or in other words y = exp(B0)*exp(B1X), so the constant term just becomes a multiplying constant (i.e. it won't affect the percentage change in y for a given absolute change in X). 2. The ageCat4 term is def there! Just on the second line of the equation. Hope that helps.
@thenineteennn 5 years ago
@@zedstatistics 1. Erm, exactly my point. When B1 increases by 10%, B1X increases by 10%. But exp(B1X) does not. 2. It was in the formula but not summed
@thenineteennn 5 years ago
Sorry, ignore 2.
@zedstatistics 5 years ago
@@thenineteennn Be careful here: B1 does not increase at all. B1 is the coefficient; it is X that increases. Also, if you're talking about a log-linear relationship (i.e. ln(y) = B0 + B1X...), then a 1-unit increase in X has a constant % effect on Y, as per my first response. If you have a log-log relationship (i.e. ln(y) = B0 + B1(ln(X))), only THEN can you interpret it as "a 1% increase in X relates to a B1% change in Y". But this is different to your example above. Hope that helps :)
@jaliu 4 years ago
@@zedstatistics How come at 20:00 you added the main effect of pink slip but didn't subtract the main effect of agecat4?
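The distinction in this exchange can be checked numerically. A minimal sketch with a hypothetical slope b1 = 0.1: in the log-linear model a one-unit increase in x multiplies y by exp(b1) whatever b0 and x are, while in the log-log model a 1% increase in x multiplies y by 1.01^b1, roughly a b1% change.

```python
import math

def log_linear(x: float, b0: float, b1: float) -> float:
    """y from ln(y) = b0 + b1*x (log-level model)."""
    return math.exp(b0 + b1 * x)

def log_log(x: float, b0: float, b1: float) -> float:
    """y from ln(y) = b0 + b1*ln(x) (log-log model)."""
    return math.exp(b0 + b1 * math.log(x))

b1 = 0.1  # hypothetical slope
for b0 in (0.0, 5.0):
    for x in (1.0, 10.0):
        # +1 unit in x: ratio is exp(b1), independent of b0 and x.
        r_lin = log_linear(x + 1, b0, b1) / log_linear(x, b0, b1)
        # +1% in x: ratio is 1.01**b1, i.e. roughly a b1% change in y.
        r_log = log_log(1.01 * x, b0, b1) / log_log(x, b0, b1)
        print(b0, x, round(r_lin, 6), round(r_log, 6))
```

Every row shows the same two ratios, which is the sense in which the intercept drops out of the percentage interpretation.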
@annabrenner5995 11 months ago
Pre-ZedStats I respected my profs and trusted they knew best even though it was impossible to understand the jargon they mumbled. Nowadays I'm furious that those small-minded fools are getting thousands of dollars for smugly explaining nothing while I'm spending hours each week learning from ZedStats' free videos. American universities are the worst.
@pradeep2005s 5 years ago
For log-level regression interpretation: sites.google.com/site/curtiskephart/ta/econ113/interpreting-beta
@lav1093 2 years ago
Big mistake at min 19:15, you should have considered the coefficient of Cat4 honey
@zedstatistics 2 years ago
We're talking about the effect of pinkslip. So not a mistake, sweetie.
@lav1093 2 years ago
@@zedstatistics but without considering Cat4, that term is 0. You should combine the effect of having the pinkSlip in cat4, not over Cat1 (BASE category). Thus, the price increases 110.4%.
@OskarBienko 2 years ago
Could you elaborate? Please
@lav1093 2 years ago
@@OskarBienko he should have summed the coefficients when cat4=1, pinkslip=1 and pinkslipXcat4=1
@lav1093 2 years ago
The extra effect of pinkslip is zero without considering cat4
@dashnaso 2 years ago
18:00 nah bro but thanks.
@TheCsePower 2 years ago
that's a really cheap and cool car
@masoudparpanchi505 5 years ago
speak LOUDER man!!!!!
@zedstatistics 5 years ago
Rock that volume button bruh!