Design Matrices For Linear Models, Clearly Explained!!! 

StatQuest with Josh Starmer
1.2M subscribers
13K views

In order to use general linear models (GLMs), you need to create design matrices. At first these can seem intimidating, but this StatQuest puts together a bunch of examples and illustrates them all so that they are clearly explained.
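Here is a minimal NumPy sketch of the idea. It assumes a two-group comparison (control vs mutant) coded with an intercept column plus a 0/1 "mutant" column. The weights are made-up numbers, not data from the video, and this is only one of several equivalent ways to lay out the matrix.

```python
import numpy as np

# Hypothetical weights for 3 control and 3 mutant mice (illustrative values only).
weights = np.array([2.0, 2.4, 2.2, 3.1, 3.5, 3.3])
is_mutant = np.array([0, 0, 0, 1, 1, 1])

# Design matrix: column 1 is always "on" (estimates the control mean);
# column 2 is "on" only for mutant mice (estimates mutant mean minus control mean).
X = np.column_stack([np.ones_like(weights), is_mutant])

# Least-squares fit of weights ~ X @ beta
beta, *_ = np.linalg.lstsq(X, weights, rcond=None)
control_mean, mutant_offset = beta
print(X)
print(control_mean, mutant_offset)
```

With this coding, the second coefficient is the quantity a t-test asks about: whether the mutant mean differs from the control mean.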
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
RU-vid Membership: / @statquest
...buy my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/...
...or just donating to StatQuest!
www.paypal.me/...
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#StatQuest

Published: 21 Aug 2024

Comments: 36
@statquest 1 year ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@accountname1047 1 year ago
As a pure mathematician who studied a fair bit of design theory, I find the use of designs in statistics fascinating.
@statquest 1 year ago
:)
@user-ju9wx1fv5u 9 months ago
First of all, your videos are the best thing on the internet. I just bought your linear regression study guide. Secondly, if I expected the lines for the control and mutant mice to have different slopes, could I add a fourth column to my design matrix? The third column would have the x values for the control mice in the first four rows and zeros in the last four rows. The fourth column would have zeros in the first four rows and the x values for the mutant mice in the last four rows. My equation would then multiply the slope for the control mice by column three and the slope for the mutant mice by column four. When I calculate the F value, the number of parameters for that equation (p-fancy) would be 4, and I could test whether it fits better than any simpler model. My own data involve a situation like this, so I'd like to know whether all of this is valid.
@statquest 9 months ago
If you are expecting different slopes, then you have something called an "interaction" and you can add an "interaction term" to your equation. For details on how to do this, see: stats.stackexchange.com/questions/19271/different-ways-to-write-interaction-terms-in-lm p.s. Thank you for supporting StatQuest!!! BAM! :)
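Below is a sketch of the four-column layout described in the question, next to the usual interaction-term layout. It uses NumPy and invented data (4 control and 4 mutant mice). Under these assumptions, the two matrices span the same column space, so they give identical fitted values. They just parameterize the slopes differently.

```python
import numpy as np

# Hypothetical data: size (x) and weight (y) for 4 control and 4 mutant mice.
size = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
weight = np.array([1.1, 1.9, 3.2, 3.9, 2.0, 3.6, 5.1, 6.4])
mutant = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
control = 1.0 - mutant
ones = np.ones_like(size)

# Layout from the question: intercept, mutant offset,
# control-only size column, mutant-only size column.
X_split = np.column_stack([ones, mutant, size * control, size * mutant])

# Interaction layout: intercept, mutant offset, size, size * mutant.
X_inter = np.column_stack([ones, mutant, size, size * mutant])

b_split, *_ = np.linalg.lstsq(X_split, weight, rcond=None)
b_inter, *_ = np.linalg.lstsq(X_inter, weight, rcond=None)

# The split layout gives each group's slope directly;
# the interaction layout gives the control slope and the slope *difference*.
slope_control, slope_mutant = b_split[2], b_split[3]
print(slope_control, slope_mutant)
print(b_inter[2], b_inter[3])  # control slope, (mutant slope - control slope)
```

Both layouts use four parameters, so p-fancy = 4 either way. The interaction layout is handy when the question is specifically "do the slopes differ?", because that difference gets its own coefficient.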
@user-ju9wx1fv5u 1 month ago
@@statquest Thank you so much! My graduate studies pulled me in a different direction for the last few months, but now I'm back on this question and have one more problem. Let's say I want to see whether the relationship between weight and size is stronger for control mice than for mutant mice. In other words, looking at the plot at 8:12, does the green line fit the green points better than the red line fits the red points? Edit: I know I could compute a p-value and R squared for each line and compare them, but that does not tell me how confident I can be that one correlation is actually stronger.
@statquest 1 month ago
@@user-ju9wx1fv5u What you're asking is whether or not there is an interaction between the status (mutant vs control) and the things we measured. To test for this, you would add an interaction term to your equation, and that deserves a whole video to explain. In the meantime, check out this link: developer.nvidia.com/blog/a-comprehensive-guide-to-interaction-terms-in-linear-regression/#:~:text=An%20important%2C%20and%20often%20forgotten,value%20of%20another%20independent%20variable.
@user-ju9wx1fv5u 1 month ago
@@statquest Thanks! I was able to use the links you gave me to figure out the stuff from my first comment. I think I have a solution to my second comment too. I will support this channel on Patreon because it has been extremely helpful to me.
@statquest 1 month ago
@@user-ju9wx1fv5u BAM! :)
@ahmednafir2286 1 year ago
Amazing content, thank you so much Josh 🙌
@statquest 1 year ago
Glad you liked it!
@user-ps7pg8pd5n 1 year ago
God this is so useful and just saved my module! Thank you so much. Oh I love StatQuest.
@statquest 1 year ago
Hooray! BAM! :)
@user-ps7pg8pd5n 1 year ago
Hey Josh, what if the two predictors interact? In R that would be written like this: lm(Expression ~ type * weight, data). Is there a video explaining this topic? :)
@statquest 1 year ago
@@user-ps7pg8pd5n Not yet.
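In R's formula shorthand, `type * weight` expands to `type + weight + type:weight`. The sketch below mimics that expansion by hand in NumPy for a two-level factor and a numeric covariate. It assumes treatment coding with "control" as the reference level (R's default treatment contrasts use the first factor level as the baseline). `expand_type_times_weight` is a hypothetical helper, not an R or NumPy function, and the values are illustrative.

```python
import numpy as np

def expand_type_times_weight(types, weights, reference):
    """Hypothetical helper: build columns [intercept, type, weight, type:weight]
    with 0/1 treatment coding for a two-level factor and a numeric covariate."""
    weights = np.asarray(weights, dtype=float)
    # 0/1 dummy: 1 when the sample is NOT the reference level
    dummy = np.array([0.0 if t == reference else 1.0 for t in types])
    # The last column is the elementwise product, i.e. the interaction term
    return np.column_stack([np.ones_like(weights), dummy, weights, dummy * weights])

types = ["control", "control", "mutant", "mutant"]
weights = [2.0, 3.0, 2.5, 3.5]
X = expand_type_times_weight(types, weights, reference="control")
print(X)
```

The interaction column is zero for every control row, so its coefficient only adjusts the slope for the mutant group.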
@andreadelcortona6230 1 year ago
Thanks
@statquest 1 year ago
:)
@MrCracou 1 year ago
I tend not to introduce it that way. When I begin with linear regression, I introduce it with dummy variables, and then I have students notice that ANOVA is just a special case of linear regression with those dummy variables, and that the test is just a partial Fisher test. I hope you will explain contrasts, as they often cause errors among students.
@statquest 1 year ago
This is actually part 3 in the series. Part 1 is linear regression: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nk2CQITm_eo.html Part 1.5 is multiple regression: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zITIFTsivN8.html Part 2 is t-tests and ANOVA: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-NF5_btOaCig.html and then this one.
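The point that ANOVA is linear regression with dummy variables can be illustrated with a minimal NumPy sketch using three made-up groups. Assuming treatment (dummy) coding with group A as the reference, the fitted coefficients are the mean of A and the differences of B and C from that mean.

```python
import numpy as np

# Hypothetical measurements for three groups (illustrative values only).
groups = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0, 9.0])

# Dummy coding: the intercept column is always "on";
# each extra column turns "on" for one non-reference group.
X = np.column_stack([
    np.ones(len(y)),
    (groups == "B").astype(float),
    (groups == "C").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] = mean(A); beta[1] = mean(B) - mean(A); beta[2] = mean(C) - mean(A)
print(beta)
```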
@stevebarratt888 1 year ago
I think something is slightly blurry in the explanation for those of us (me) who come to this with hypothesis testing, rather than models, at the center of our thinking: you're not completely explicit about why you want to compare different/simpler models, and whether/why this forms the basis of a hypothesis test. How does this sound? To test whether the measured weight* differs significantly between mouse genotypes, one starts by constructing a comprehensive GLM (with multiple parameters, if necessary) to predict the measured weight. Then, using the F = ... equation, you compare the fit of this first GLM with another one that leaves out only the parameter of interest, mouse genotype, but includes all the other parameters. This isolates the influence of the genotype parameter in predicting mouse weight. * I guess in this example it's actually weight relative to size.
@statquest 1 year ago
Sounds good to me.
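The comparison described above can be sketched in NumPy. The F-statistic is written in the "fancy vs simple" form used in this series: F = [(SS(simple) - SS(fancy)) / (p_fancy - p_simple)] / [SS(fancy) / (n - p_fancy)]. The data are invented, and `sum_sq_resid` and `f_stat` are hypothetical helper names. Turning F into a p-value with an F distribution is left out.

```python
import numpy as np

def sum_sq_resid(X, y):
    """Sum of squared residuals around the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_stat(X_fancy, X_simple, y):
    """F for comparing a fancy model to a simpler model nested inside it."""
    n = len(y)
    p_fancy, p_simple = X_fancy.shape[1], X_simple.shape[1]
    ss_fancy, ss_simple = sum_sq_resid(X_fancy, y), sum_sq_resid(X_simple, y)
    return ((ss_simple - ss_fancy) / (p_fancy - p_simple)) / (ss_fancy / (n - p_fancy))

# Hypothetical data: size and weight for 4 control and 4 mutant mice.
size = np.array([1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5])
weight = np.array([1.2, 2.1, 2.9, 4.2, 2.6, 3.4, 4.7, 5.3])
mutant = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
ones = np.ones_like(size)

X_fancy = np.column_stack([ones, mutant, size])  # genotype + size
X_simple = np.column_stack([ones, size])         # size only (genotype left out)
f_genotype = f_stat(X_fancy, X_simple, weight)
print(f_genotype)
```

Because the simple model keeps every column except the genotype indicator, any drop in the sum of squared residuals is attributed to genotype, after accounting for size.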
@bjurv 1 year ago
StatQuest, could you please number all your fantastic videos so it's easier to see their order within each series? For example, right now I'm having a hard time finding the first video, "GLM PART 1", in this series on GLMs.
@statquest 1 year ago
You can find all of my videos, organized, here: statquest.org/video-index/
@muditgupta5968 1 year ago
Hey Josh, amazing content. I wanted to request a StatQuest on K-mer counting and NLP. Thanks!
@statquest 1 year ago
I'll keep that in mind.
@marvinbcn2 1 year ago
Brilliant as usual! I'm just wondering, when comparing the two regression lines, how you would deal with the design matrix in case the slope is different for normal and mutant mice. Would it be acceptable to split the third column into two, with 0 and non-zero values to "turn on and off" the slope corresponding to each type?
@statquest 1 year ago
If you have enough data, you can estimate different slopes for each line. If you don't have enough data (to estimate all of the parameters) then you can use something called "mixed models" which just makes assumptions about what is going on instead of using data to make estimates.
@MrCracou 1 year ago
In his case the model is just: y = beta0 + beta1*quantitative + beta2*qualitative + epsilon. To create a model with different slopes, you need something like: y = beta0 + beta1*quantitative + beta2*qualitative + beta3*quantitative*qualitative + epsilon
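Writing the second model separately for each group shows why beta3 lets the slopes differ. This assumes "qualitative" is coded 0 for control and 1 for mutant, with x standing for the quantitative variable and g for the 0/1 indicator:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + \beta_2 g + \beta_3 x g + \varepsilon \\
g = 0 \ \text{(control)}: \quad y &= \beta_0 + \beta_1 x + \varepsilon \\
g = 1 \ \text{(mutant)}: \quad y &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x + \varepsilon
\end{aligned}
```

So beta3 is the difference between the mutant and control slopes. Comparing the models with and without the x·g column tests whether that difference is zero.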
@leedongsik2 1 year ago
I'm a beginner. Your explanation is surprisingly clear, but I'm confused because dummy variable coding and design matrices look very similar. Could you explain the difference, if there is one?
@statquest 1 year ago
I believe they are the same.
@Salvador_Dali 1 year ago
hey josh! thx a bunch for this awesome video! A question about the last example (batch effect): is this basically what is considered the interaction term in a multifactorial ANOVA? PS: I love the Illustrated Guide to Machine Learning! Keep up the good work and please make more of these!
@statquest 1 year ago
Interactions are a little different. An example of an interaction effect would be if, at 7:12, the mutants had a different slope than the controls.
@MrErluz 10 months ago
@@statquest Are you planning to make a video that explains the interaction term? Great videos by the way..
@statquest 10 months ago
@@MrErluz It's on the to-do list, but probably not for a while.