
Saturated Models and Deviance 

StatQuest with Josh Starmer
1.3M subscribers · 112K views

This video follows on from where we left off in Part 3 of the Logistic Regression series, but the ideas are more general, so I decided not to make it just Part 4, since it covers more than that.
NOTE: This StatQuest assumes that you are already familiar with Logistic Regression. If you're not already down with that, you can check out these videos:
• StatQuest: Logistic Re...
• Logistic Regression De...
• Logistic Regression De...
• Logistic Regression De...
For a complete index of all the StatQuest videos, check out:
statquest.org/...
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumr...
Paperback - www.amazon.com...
Kindle eBook - www.amazon.com...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshi...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer....
...or just donating to StatQuest!
www.paypal.me/...
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statquest #statistics

Published: Oct 5, 2024
Comments: 197
@statquest · 4 years ago
NOTE: In statistics, machine learning, and most programming languages, the default base for the log() function is 'e'. In other words, when I write "log()", I mean "natural log()", or "ln()". Thus, the log to the base 'e' of 2.718 = 1. ALSO NOTE: This video is about Saturated Models and Deviance as applied to Logistic Regression (and not ordinary linear regression). Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
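As a quick check of the pinned note (my own illustration, not from the video), Python's `math.log` is the natural log unless you pass an explicit base:

```python
import math

# math.log defaults to base e (natural log); pass a second argument for another base
natural = math.log(math.e)   # log base e of e = 1
base10 = math.log(100, 10)   # log base 10 of 100 = 2
```

R's `log()` and NumPy's `np.log()` behave the same way.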
@falaksingla6242 · 2 years ago
Hi Josh, love your content. It has helped me learn a lot and grow. You are doing awesome work; please continue to do so. I wanted to support you, but unfortunately your PayPal link seems to be broken. Please update it.
@MilaBear · 4 years ago
You know you've been watching too many StatQuest videos when you say "double bam" before he does.
@statquest · 4 years ago
You made me laugh out loud! Funny. :)
@mili3212 · 4 years ago
dude wtf. why have I just discovered your channel? you have a gift to make things as simple as possible
@statquest · 4 years ago
Thanks! :)
@spartan9729 · 1 year ago
bro you are still 2 years earlier than me. I am discovering this right now when almost everyone is a pro in machine learning. 😭😭😭
@amarnathmishra8697 · 3 years ago
I love the weird tunes you make during any sort of calculation. Besides, needless to say, your teaching is amazing!!!
@statquest · 3 years ago
Wow, thank you!
@AndrewCarlson005 · 5 years ago
10:17 BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP HAHAHA had me dying
@statquest · 5 years ago
:) That's the sound of plugging in numbers. :)
@ehg02 · 3 years ago
@@statquest can you make a song about that hahaha
@cfalguiere · 4 years ago
Here for the song :-) and because it was an unknown piece of jargon to me. Jargon clarified... BAM!
@statquest · 4 years ago
Awesome!!!! Thank you very much. :)
@mariaelisaperesoliveira4419 · 5 years ago
OMG HAHAHAHA You make statistics seem so easy and simple!!! I looooove this channel
@statquest · 5 years ago
Awesome! I'm impressed. Not many people watch this video because they don't want the details, but you've gone all the way. You deserve a prize! :)
@inamahdi7959 · 10 months ago
I have been listening to your lectures as a review on my commute and I am mesmerized. I never made the connection between the LRT and these methods. They are one and the same, just re-written differently! I also finally see where the negative sign and the factor of 2 come from.
@statquest · 10 months ago
Glad you are enjoying the videos and making some deep connections.
@taotaotan5671 · 4 years ago
Another exciting Statquest! Thank you Josh
@statquest · 4 years ago
Bam! :)
@mutonchops1 · 5 years ago
Why on Earth did I not have you giving this lecture in my course - I actually understood it! Thanks!
@taotaotan5671 · 4 years ago
It's surprising to see t-test, linear model and anova roughly fit the framework of MLE, and likelihood ratio test. BAMMMMM
@taotaotan5671 · 4 years ago
Wait... I want to confirm if that is correct. I tried to manually calculate the p-value from a t-test (equal variance), and to manually calculate the log-likelihood ratio and use chi-square to resolve the p-value. But it seems there is a small difference (second decimal). Is that because of the biased estimation of variance in MLE? It is really cool to resolve the same question from two very distinct frameworks (t-test and likelihood ratio test). triple BAM!
@statquest · 4 years ago
I'm not really sure what causes the small difference, but I'm glad that it was small! That's a very cool thing to try out.
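A minimal sketch of the comparison taotaotan describes, using only the standard library (the data and function names are made up for illustration). The likelihood-ratio statistic for comparing two group means under a normal model, with the MLE (divide-by-n, biased) variance, reduces to n*log(var_null/var_prop):

```python
import math

def loglik_mle(y):
    # normal log-likelihood evaluated at its MLEs (mean, and variance with /n)
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n  # biased MLE variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def lrt_two_groups(a, b):
    # null model: one shared mean; proposed model: separate means, pooled MLE variance
    pooled = a + b
    n = len(pooled)
    ll_null = loglik_mle(pooled)
    mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
    var = (sum((v - mu_a) ** 2 for v in a)
           + sum((v - mu_b) ** 2 for v in b)) / n
    ll_prop = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    stat = 2 * (ll_prop - ll_null)       # ~ chi-squared with 1 df
    stat = max(stat, 0.0)                # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(stat / 2))   # chi-squared(1) survival function
    return stat, p

stat, p = lrt_two_groups([1.1, 2.0, 1.4], [3.2, 4.1, 3.8])
```

The small discrepancy with the t-test p-value is plausibly the /n variance above versus the t-test's /(n-2) pooled variance, which only agree as n grows.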
@taotaotan5671 · 4 years ago
StatQuest with Josh Starmer BAM
@margotalalicenciatura1376 · 5 years ago
Hey Josh! Again, thanks for your awesome work as always! In this specific case, I ended up with more questions than answers, but after trying to find answers on my own I assume it's a conscious decision not to overly complicate things. Namely, you say that the "-2" in the log-likelihood ratio is what makes it chi-squared distributed. The origin of this seems to be the fairly complicated derivation in Wilks (1937), so I assume there is no quick intuitive way to answer why this is chi-squared distributed (i.e. at 13:42, which is the standard normal variable that's being squared?). In the same line, about why we are working with the log of the likelihoods: is it because the saturated likelihood is obtained by maximization and thus it is easier to work with the logs? If that's the case, I'm not clear on what dictates which densities are used for the likelihood computation. As you say, you used the normal distribution here for illustration purposes, but with regards to the actual computation of the likelihoods, what distributions are used by the glm function, for example? In logistic regression I would first say that it's a binomial, but that's only conditional on the x_is and would demand one parameter per observation. And last but not least, the most simple question: why are deviances and likelihoods required in GLMs in contrast to straight sums of squares and residuals? Again, thanks a lot for your time!
@Gengar99 · 2 years ago
I loved your video, super no-brainer examples, thank you.
@statquest · 2 years ago
Thank you!
@muhammedhadedy4570 · 3 years ago
The best YouTube channel that has ever existed.
@statquest · 3 years ago
Thanks!
@WeLoveChouBJu · 3 years ago
I cannot like this video enough.
@statquest · 3 years ago
Thanks! :)
@dominicj7977 · 5 years ago
How did you calculate the likelihood of univariate data?
@bobo0612 · 4 years ago
I am wondering the same thing. How can the likelihood be bigger than 1...
@mktsp2 · 3 years ago
@@bobo0612 a normal density with a small standard deviation rises above 1.
@PunmasterSTP · 5 months ago
10:17 12:23 Peak number-plugging-in sound effect! 👌
@statquest · 5 months ago
Haha! :)
@eurekam6117 · 3 years ago
wow this opening song is great!
@statquest · 3 years ago
:)
@Maha_s1999 · 6 years ago
That's right Josh- here just for the song 😂 Seriously - your videos are super helpful!
@statquest · 6 years ago
Thanks so much!!! Wow, you're even watching the video on Saturated Models! I like this one, but it's not very popular.
@Maha_s1999 · 6 years ago
You are welcome! I am a graduate stat student and now studying generalised linear models (binomial, ordinal/nominal and count models) so any other videos like this one will be really useful. If I find myself watching more videos from your channel, I would definitely like to support you like I do with other channels I am an avid follower of. I saw that you prefer your subscribers buy your music - do you have a Patreon account for regular donations?
@snehashishpaul2740 · 6 years ago
Love your Bams and oan ouannn as much as your teaching. :)
@statquest · 6 years ago
Thank you!!! :)
@bfod · 1 year ago
Hi Josh, amazing videos. You are a godsend for visual learners and for showing the concepts behind the math. I am a little confused about why we calculate the p-value for the R^2 the way we do. We have:
Null Deviance ~ ChiSq(df = num params sat - null)
Residual Deviance ~ ChiSq(df = num params sat - prop)
and we use these distributions to calculate the p-values for the null and residual deviance. We also have:
LL R^2 = (LL_null - LL_prop) / (LL_null - LL_sat)
but we calculate the p-value via:
null deviance - residual deviance ~ ChiSq(df = num params prop - null)
Can it be shown that (LL_null - LL_prop) / (LL_null - LL_sat) = null deviance - residual deviance, and therefore R^2 ~ ChiSq(df = num params prop - null), or am I missing something?
@statquest · 1 year ago
Off the top of my head, I don't know how to respond to your question. Unfortunately you'll have to find someone else to verify it one way or the other.
@suyashmishra8821 · 2 years ago
sir U r great. U r Ultimate. U r dangerous. lots of love from a Data Scientist 🥰🥰🥰
@statquest · 2 years ago
Thanks!
@craigmauz · 5 years ago
@2:34 How did you calculate the LL of the data?
@dominicj7977 · 5 years ago
Hi Josh. I don't understand how you got the likelihood values. Mathematically likelihood is just a probability.
@wexwexexort · 4 years ago
You deserve a big bam. BAM!!!
@statquest · 4 years ago
Thanks! :)
@TheFacial83 · 2 years ago
Do you have a video about multinomial logistic regression? Your videos are the best in stats, thanks a lot.
@statquest · 2 years ago
Unfortunately I don't
@scottnelson7841 · 4 years ago
I want to earn my PhD in BAM!!! Is there a thesis option?
@statquest · 4 years ago
That would be awesome! :)
@younghoe6849 · 4 years ago
BAM, BAM, what a great teacher
@statquest · 4 years ago
Thank you! :)
@palsshin · 5 years ago
You nailed it, much appreciated
@statquest · 5 years ago
Hooray! I'm glad you like this video! :)
@hgupta32323 · 3 years ago
When you are calculating the likelihood of each of the null (2:39), proposed (3:18), and saturated (3:57) models, where do the raw values you are multiplying together come from? To me, it looks like you have drawn several probability density functions (PDFs) and are reading off the value of a given PDF at each of the points. However, I know my interpretation must be wrong because the values are often greater than 1, so I am confused as to what those values are.
@statquest · 3 years ago
Your interpretation is actually correct. I drew PDF functions and I use the y-axis value above each point on the x-axis. The y-axis values are likelihoods. Likelihoods are not probabilities, and thus can be larger than 1. For example, if you draw a normal curve with mean = 0 and sd = 0.1, you will see many y-axis coordinates on the curve that are > 1. For more details on the difference between likelihoods and probabilities, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
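Josh's sd = 0.1 example can be checked in a few lines (a sketch, not code from the video):

```python
import math

def normal_pdf(x, mean, sd):
    # density (likelihood) of a normal distribution; NOT a probability,
    # only the AREA under the whole curve must equal 1
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

peak = normal_pdf(0, mean=0, sd=0.1)  # about 3.99, well above 1
```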
@karannchew2534 · 2 years ago
For my future revision. To work out R², we need the log-likelihood (LL) of:
- the Saturated model (best possible)
- the Null model (worst)
- the Proposed model (the fit)
To work out the p-value, we need the deviance.
Residual Deviance = deviance of the Saturated (best possible) model vs the Proposed (fitted) model = 2 * (LL_saturated_model - LL_proposed_model) = the chi-squared statistic used to work out the p-value. (Is it from log[(prob saturated model / prob proposed model)²]?)
Null Deviance = deviance of the Saturated (best possible) model vs the Null (worst) model = 2 * (LL_saturated_model - LL_null_model)
Null Deviance - Residual Deviance = yet another deviance = the chi-squared value used to test the significance of the Proposed model against the Null model. BAM!!!
@statquest · 2 years ago
YES! BAM! :)
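Using the example numbers Josh quotes elsewhere in this thread (for logistic regression LL(Saturated) = 0, LL(Proposed) = -3.3, LL(Null) = -4.8), the deviance bookkeeping above can be verified in a few lines (my own sketch, not code from the video):

```python
import math

ll_sat, ll_prop, ll_null = 0.0, -3.3, -4.8   # log-likelihoods (logistic example)

residual_deviance = 2 * (ll_sat - ll_prop)   # 6.6
null_deviance = 2 * (ll_sat - ll_null)       # 9.6
stat = null_deviance - residual_deviance     # 3.0, same as 2*(ll_prop - ll_null)

# chi-squared p-value with 1 degree of freedom (one extra parameter)
p = math.erfc(math.sqrt(stat / 2))

# McFadden-style pseudo R^2
r_squared = (ll_null - ll_prop) / (ll_null - ll_sat)   # 0.3125
```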
@mahadmohamed2748 · 2 years ago
15:26 Why does multiplying the log-likelihoods by 2 give a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the proposed model and the null model?
@statquest · 2 years ago
That would require a whole other StatQuest to explain.
@karollipinski76 · 5 years ago
Keep it up, Josh Star-scat-mer.
@statquest · 5 years ago
You just made me laugh out loud! And you get double likes for watching this relatively obscure video. Only the best make it this far into my catalog! :)
@karollipinski76 · 5 years ago
@@statquest You are probably too modest. I'm trying to remedy my own obscurity here. By the way, will you discuss models more like the scream of Tarzan or Johnny Rotten?
@statquest · 5 years ago
I will do my best! :)
@jukazyzz · 4 years ago
Really nice video, like all others! However, I'm still unsure about how we can disregard the saturated model in the logistic regression to find our R^2 when that could mean that our R^2 will be larger than 1. For example, that would happen in this example that you showed. I'm probably missing something, could you please explain?
@statquest · 4 years ago
At 15:59 I show how that for logistic regression, the log(likelihood) of the saturated model = 0. Thus, it can be ignored for logistic regression.
@jukazyzz · 4 years ago
I saw that yes, but the R^2 of that logistic regression would be higher than 1, if your particular example from the video. What to do then?
@statquest · 4 years ago
Why do you say that the R^2 would be > 1 in this example? LL(Null) = -4.8, LL(Proposed) = -3.3, so R^2 = (-4.8 - (-3.3)) / -4.8 = 0.31.
@jukazyzz · 4 years ago
@@statquest Because I thought that the example from the first part of the video where you explain saturated models and deviance only is the same as the one in the later part of the video where you explain why we can get rid of the saturated models in the logistic regression. Specifically, numbers from the first example were: LL(null) = -3.51; LL(proposed) = 1.27. If these were the numbers for the logistic regression, we would get R^2 higher than 1 without using saturated model? I didn't realise that you used another example (numbers) for logistic regression because now it makes sense after your comment.
@salvatoregiordano6816 · 5 years ago
Great explanation. Thank you sir!
@winglau7713 · 5 years ago
learned a lot from your lessons, thx so much. BAM!!
@xruan6582 · 4 years ago
(2:50) can anyone explain why the probabilities 1.5, 2.5 are greater than 1 ?
@statquest · 4 years ago
Although we commonly treat the words "likelihood" and "probability" as if they mean the same thing, in statistics they are different. What you're seeing at 2:50 are likelihoods, not probabilities. I explain the difference in this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@kimicheng5611 · 4 years ago
@@statquest Could you please do an example that likelihood is greater than 1? I still feel confused how do you get 1.5 2.5 after watching the Probability vs Likelihood video.
@statquest · 4 years ago
@@kimicheng5611 Plot a normal distribution with mean = 0 and standard deviation = 0.1. The maximum y-axis value is 4.
@franzvonmoor6145 · 4 years ago
@@statquest Hey Josh, hey everyone, your videos are totes useful, thanks so much! I want to agree with other viewers here that the likelihoods above 1 in this video do cause quite a bit of confusion, since in most other videos you explain likelihoods with values on the scale of [0, 1]. Hence, I first understood it that likelihoods are always between zero and one and, respectively, log(likelihoods) are always below or equal to zero. Looking it up on StackExchange I found a helpful answer (stats.stackexchange.com/questions/4220/can-a-probability-distribution-value-exceeding-1-be-ok). I believe I have not totally grasped it yet, and hence can only humbly recommend bridging this gap by updating the video where you explain likelihoods vs probabilities to explain why and when likelihoods can be above 1. Best regards from Germany!
@jakobmeryn61 · 3 years ago
@@franzvonmoor6145 I still struggle with this concept. Would be great to have a new video.
@alanzhu7538 · 4 years ago
2:37 How do you come up with the numbers for the likelihood of the data?
@statquest · 4 years ago
The likelihoods are the y-axis coordinates on the normal curve. For more details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@alanzhu7538 · 4 years ago
StatQuest with Josh Starmer That is clear, thank you!
@jic3897 · 4 years ago
17:06 Can someone explain why the likelihoods will be between 0 and 1? I thought likelihoods are greater than 0 but can be arbitrarily large.
@statquest · 4 years ago
The value of the likelihoods depends on the distribution we are using. For example, with a normal distribution, the likelihoods can be much larger than 1. However, for the sigmoidal function that has a range from 0 to 1 that is used with Logistic Regression, the maximum value is 1. In other words, the likelihood is the vertical distance from a point on the x-axis to a point on the sigmoidal curve. Since that curve only goes to up to 1, then the maximum value is 1.
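A minimal sketch of that point (my own illustration; the intercept and slope values are made up): for logistic regression, each observation's likelihood is a height read off the sigmoid curve, so it can never exceed 1.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# the likelihood of one observation is the height of the sigmoid curve (if y = 1)
# or one minus that height (if y = 0), so it always lies between 0 and 1
def likelihood(x, y, intercept=-2.0, slope=1.5):
    p = sigmoid(intercept + slope * x)
    return p if y == 1 else 1 - p
```

Contrast this with a normal density, whose height is unbounded as the standard deviation shrinks.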
@shalompatole5710 · 4 years ago
@Josh Starmer. First of all thanks a lot. secondly, I am confused how you got your Saturated model likelihood. You multiplied 3.3 6 times. I understand 6 is no. of parameters. What is 3.3 here?
@statquest · 4 years ago
3.3 is the likelihood. To learn more about likelihoods, check out this 'Quest: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@shalompatole5710 · 4 years ago
@@statquest I have seen that video. I was just wondering how you got a value of 3.3 for it here? I understand that likelihood is value on y-axis for a given data point. Should it not be bounded by 1 in that case?
@statquest · 4 years ago
@@shalompatole5710 Likelihoods are not probabilities, so they are not bounded by 1. All that is required for a probability distribution is for the area under the curve to be equal to 1. For example, a uniform distribution that goes from 0 to 0.5 has a maximum value of 2, and this means that the area is 1 (since 0.5 * 2 = 1, where 0.5 is the width of the rectangle and 2 is the height of the rectangle). A normal distribution with standard deviation of 0.25 has a maximum value of 1.6. (In R, you can see this with: dnorm(x=0, mean=0, sd=0.25) ).
@shalompatole5710 · 4 years ago
@@statquest as usual a very crisp and clear explanation. Sigh!! how do you do it man? Thanks so much. DUnno how I would have managed without your explanations.
@Han-ve8uh · 4 years ago
This video talks a lot about using chi-sq distribution, I have a general question about how distributions come about, because it seems people can design infinitely many different experiments to generate infinitely many distributions. Is chi-sq invented to measure p-value of R2 of logistic regression? If no, is it pure coincidence that this p-value can be seen from chi-sq, or the inventors of the 2(null dev-res dev) formula saw some useful properties of chi-sq and decided to use chi-sq, and invented the null dev - res dev formula? Is this formula tied to chi-sq only, or we can put the 21.34 value at 12:40 on the x axis of another dist that is not chi-sq too? Similarly, how/why was F distribution chosen to provide p-value of R2 for linear regression models?
@statquest · 4 years ago
The people that created the statistics knew about the Chi-squared distribution in advance and saw that, given this type of data, they could create something that followed that distribution. In other words, for logistic regression, knowledge of the distributions came first and they tried to to make use of them.
@adenuristiqomah984 · 3 years ago
I know it's weird but I was waiting for the jackhammer's sound XD
@statquest · 3 years ago
:)
@kartikeyasharma4017 · 2 years ago
At 14:31, we have the p-value =0.002 which means that the proposed and the null model are statistically different but can't we infer that the result is by chance (connecting it with the root definition of p-value)
@statquest · 2 years ago
I probably should have been more careful with my wording at that point. I assumed that because the p-value was less than < 0.05 that we have rejected the null hypothesis that the data are due to random chance. However, you are correct, the p-value 0.002 tells us that the probability of random chance giving us the observed data or something rarer is 0.002.
@TheSambita20 · 1 year ago
Is there any pre-requisite video before watching this? FInding it little complicated to understand.
@statquest · 1 year ago
Well, it helps if you've stumbled over the term "saturated model" before. If not, check out this playlist, which will give you context: ru-vid.com/group/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe
@aojing · 8 months ago
@9:37 Hi Josh, can you explain why the Residual Deviance is defined as this form and what is the connection to chi-square test, or provide a reference link? Thanks.
@statquest · 8 months ago
Unfortunately I can't explain it, other than the definition allows the values to be modeled by a chi-square distribution.
@xruan6582 · 4 years ago
(12:05) why the null deviance changed from "saturated model - ..." to "proposed model - ..." after greying out
@statquest · 4 years ago
That's just a typo.
@potatopancake5259 · 2 years ago
Hey Josh, thanks for this video on saturated models and deviances! I'm wondering whether I'm missing some point in your video or whether there may be some slight confusion. On the slide at 14:22 it says that the p-value computed on the slide before for the difference between NullDeviance and ResidualDeviance (NullDev - ResDev) gives a p-value for the R-squared (R2). I can follow the argument up to the point that NullDev - ResDev has a chi-square distribution, since it is in fact a classical likelihood ratio, namely the same as 2*(LL(Prop model) - LL(Null model)). However, if we want to express the R2 in terms of deviances, R2 would be (NullDev - ResDev)/NullDev, i.e. we divide by NullDev. The distribution of this is not chi-square anymore and we would get a different p-value. Would you be able to clarify this?
@statquest · 2 years ago
I'm not sure this answers your question, but the R^2 we are using in this video is McFadden's Pseudo-R^2 ( see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9T0wlKdew6I.html ), and that is what we are calculating the p-value for. I'm not sure the formula you are using can be applied to Logistic Regression.
@potatopancake5259 · 2 years ago
@@statquest Thanks a lot for your quick reply! So we want to express Mc Fadden's R^2 in terms of deviances, and we get in the numerator: LL(null) - LL(Prop) = ResDev - NullDev. In the denominator we get: LL(Null) - LL(Sat) = - NullDev. (Ignoring the factor 2, which cancels out between numerator and denominator). So in total for McFadden's R^2 we get (NullDev - ResDev)/NullDev. In the slides (at 13:28; ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9T0wlKdew6I.html), the distribution of NullDev - ResDev is developed (chi-squared) and p-value computed for this quantity. And there is the statement that this p-value is the p-value for Mc Fadden's R^2 (slide at 14:28; ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9T0wlKdew6I.html). However, Mc Fadden's R^2 would rather be (NullDev - ResDev)/NullDev instead of (NullDev - ResDev), as far as I can see. So, I'm confused at what happens between slide at 14:19 (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9T0wlKdew6I.html) and slide at 14:28 (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9T0wlKdew6I.html). I'm wondering if I'm missing a piece here. Any input appreciated.
@statquest · 2 years ago
@@potatopancake5259 To be honest, I'm not sure I understand your question. We are calculating two different quantities: McFadden's R-squared and it's corresponding p-value, and it appears that you seem to want to use the same equation for both. If so, I'm not sure I understand why.
@hilfskonstruktion · 2 years ago
@@statquest What I'm asking is whether you're really calculating the p-value for McFadden's R-squared. The way I follow your slides, I think you are calcuating the p-value for the likelihood ratio test statistics (NullDev - ResDev in your notation) instead.
@statquest · 2 years ago
@@hilfskonstruktion It's possible that I am wrong here, but I believe what I have is consistent with McFadden's original manuscript. See eqs. 29 and 30 on page 121 of this: eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
@Han-ve8uh · 4 years ago
How is the saturated model at 16:19 created? I thought lines can only go upwards from left to right, but there is a line going downwards from 1st blue to 3rd red dot. Also how are there 3 line segments? Are they created from 3 logistic regression models, with first 2 red dot + 1st blue dot creating 1st model, 1st blue dot + 3rd red dot creating 2nd model, 3rd red dot + last 3 blue dot creating 3rd model?
@statquest · 4 years ago
The saturated model is just a model that fits the data perfectly. It does not have to follow rules like "must always increase" etc., because if it did, it would not be able to fit the data perfectly.
@vinhnguyenthi8656 · 3 years ago
Thank you for this video! I have some questions after watching. How to create the curve in the proposed model and why are you choose this curve when below the 2-curve area to calculate the log-likelihood?
@vinhnguyenthi8656 · 3 years ago
3:57 is 3.3 equal to the mean in the normal curve?
@statquest · 3 years ago
3.3 is the likelihood, the y-axis coordinate that corresponds to the highest point in the curve. The highest point on the curve occurs at the mean value.
@panpiyasil790 · 6 years ago
This one is difficult. Do you plan to do other goodness-of-fit methods as well? I'm looking for a way to do a count R-squared for conditional logistic regression in R.
@statquest · 6 years ago
Are you asking if I have plans to cover other approximations of R-squared for logistic regression?
@panpiyasil790 · 6 years ago
Yes
@statquest · 6 years ago
I'll add that to my "to-do" list.
@panpiyasil790 · 6 years ago
Thank you. I really appreciate your work.
@panagiotisgoulas8539 · 5 years ago
How do we know that mouse weight follows a normal distribution?
@statquest · 5 years ago
We can plot a histogram of mouse weights and look at it. If it looks normal, then that's a good indication that it is. We can also draw a "q-q plot" ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-okjYjClSjOg.html and use that to determine if it is normal.
@panagiotisgoulas8539 · 5 years ago
@@statquest I mean, we assumed normality from these 6 data points only? Also, why would we care about normality of the mouse weight, which is an independent variable? As far as I recall, regarding assumptions in linear regression I care about normality in the residuals and the dependent variable, so I assume something similar would be required in the logistic case since you basically transform the y-axis into log-odds. Thanks
@statquest · 5 years ago
@@panagiotisgoulas8539 In this video, the normal distribution is just an example, that let's us illustrate the concept of how to calculate likelihoods of the null and saturated models - it's not a requirement. The concepts work with any model that we can use to calculate likelihoods. At the end of the video at 15:59, I show how we can use a squiggle to calculate likelihoods. So, don't worry too much about the normal distribution here - it's just used to illustrate the concepts.
@patrickbormann8103 · 4 years ago
Is the Saturated Model always a model thats overfitted in machine learning lingus (16:30 in the vid)? If so, does it make sense to say we are looking for a proposed model that is close to the saturated model? OR is the term "close" the keyword, as we don't want an overfit, but just a very good model? Or are Overfit and Saturated Models something completely different, and not related to each other?
@statquest · 4 years ago
They are related concepts, but used in very different ways. When we use a model in a machine learning context, we are worried about overfitting. In contrast, when we are using a model in a statistical analysis context, where we are trying to determine if some variable (or set of variables) is (are) related to another, the goal is to get it to fit the data as closely as possible, because that means our model "explains the data" and the variables are related.
@patrickbormann8103 · 4 years ago
@@statquest Ok, thanks ! Got it :-) Quest on!
@bartoszszafranski8051 · 4 years ago
This got me confused. I've always thought of R^2 as a measure of how much better our model is at predicting the value of the dependent variable given the values of the features, but adding a saturated model to the equation makes it less intuitive for me. Of course, when we have the same number of parameters as observations we will maximize R^2, just because that's how the math works (since they're in the same dimension, you'll always be able to fit a curve through all of them), but overfitting is obviously malpractice. I don't get why we measure our proposed model's predictive ability by comparing it to a saturated one (which is pretty useless on its own).
@mahadmohamed2748 · 2 years ago
4:11 How can the likelihood of the data given the distribution be larger than 1. I thought each specific likelihood was a probability (so
@mahadmohamed2748 · 2 years ago
I think the above is only for logistic regression, since the y axis for the sigmoid function is a probability. Though this is not the case for the normal distribution.
@statquest · 2 years ago
Likelihood are conceptually different from probabilities and can be much larger than 1. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@PedroRibeiro-zs5go · 5 years ago
Thanks! The video was great as usual :)
@statquest · 5 years ago
You're welcome! :)
@mahdimohammadalipour3077 · 2 years ago
Why do we use the saturated model? Why do we use it as the basis for evaluating how good our model is? I can't comprehend the concept behind it! In the saturated model we consider a separate model for each point, and obviously that results in overfitting, doesn't it?
@statquest · 2 years ago
The Saturated model simply provides an upper bound for how well a model fits the data and we can use that upper bound to compare and contrast the model we are interested in using.
@junhaowang6513
@junhaowang6513 3 years ago
3:20 The likelihood is greater than 1? Is it possible to have a likelihood greater than 1?
@statquest
@statquest 3 years ago
Yes. Likelihoods are not probabilities and are not limited to being values between 0 and 1. See: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html For example, the likelihood at 0 for a normal distribution with mean = 0 and sd=0.25 is 1.6, which is > 1.
@junhaowang6513
@junhaowang6513 3 years ago
@@statquest Thank you!
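The dnorm-style example from the reply above can be checked with a hand-rolled normal density in Python (a sketch using the standard normal-curve formula):

```python
import math

def normal_pdf(x, mu, sd):
    # Height of the normal curve at x -- a likelihood, not a probability
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

lik = normal_pdf(0, mu=0, sd=0.25)
print(round(lik, 1))  # 1.6 -- a likelihood greater than 1
```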
@ahjiba
@ahjiba 3 years ago
Can you explain what the log likelihood based R^2 actually represents? I know that R^2 in normal linear regression just represents the strength of correlation but what is it here?
@statquest
@statquest 3 years ago
Are you asking about this? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xxFYro8QuXA.html
@hiclh4128
@hiclh4128 3 years ago
Could someone explain why it is a chi-square distribution instead of an F distribution? I am confused because I saw there's a ratio of two chi-square distributions.
@statquest
@statquest 3 years ago
That would take a whole other video, but I'll keep the topic in mind.
@stefanopalliggiano6596
@stefanopalliggiano6596 3 years ago
Question: when you calculate the likelihood based on two means at 03:20, how do you know which distribution to use? The arrows intersect both distributions, so for each data point we should multiply two likelihoods? Could you please clarify?
@statquest
@statquest 3 years ago
The data are assigned to one of the two distributions. That's the whole idea of the "fancy" model - that some observations come from the distribution on the left and others come from the distribution on the right. So, given those assignments (which you make as part of creating the "fancy" model), you can calculate likelihoods.
@stefanopalliggiano6596
@stefanopalliggiano6596 3 years ago
@@statquest I see now, thank you for the reply!
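A rough numeric sketch of that idea, with made-up measurements: each point's likelihood comes from the distribution it was assigned to, and the two-distribution ("fancy") model ends up with a higher log-likelihood than a single shared distribution:

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Made-up measurements, already assigned to the two groups of the "fancy" model
left_group = [1.0, 1.4, 1.8]
right_group = [4.0, 4.3, 4.9]
sd = 0.5  # assumed shared standard deviation

mu_left = sum(left_group) / len(left_group)
mu_right = sum(right_group) / len(right_group)

# Each point uses the distribution it was assigned to
ll_fancy = (sum(math.log(normal_pdf(x, mu_left, sd)) for x in left_group)
            + sum(math.log(normal_pdf(x, mu_right, sd)) for x in right_group))

# Compare to a single distribution fit to all the points at once
mu_all = sum(left_group + right_group) / 6
ll_single = sum(math.log(normal_pdf(x, mu_all, sd))
                for x in left_group + right_group)

print(ll_fancy > ll_single)  # True
```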
@hajer3335
@hajer3335 6 years ago
If I have a model containing four parameters with three variables, such as concentration, time, and their interaction, is the model with just their interaction as a variable and two parameters a nested model?
@statquest
@statquest 6 years ago
It could be. I'd have to see the formulas to be sure.
@hajer3335
@hajer3335 6 years ago
StatQuest with Josh Starmer yes sure, I saw this formula in a paper about biology: Y is the transformed FDI, which may depend on time t, concentration C, and their interaction (i.e. Ct). The four-parameter model is Y = a + b1*log(t) + b2*log(c) + b3*log(t)*log(c) (1). If we omit the influence of the interaction between C and t, we get the three-parameter model Y = a + b1*log(t) + b2*log(c) (2). If the data do not depend on C or t separately, but on the product Ct, the parameters b1 and b2 can be merged into a single parameter b and we get the two-parameter model, which we will use here: Y = a + b*log(Ct)
@statquest
@statquest 6 years ago
I'll be honest, I haven't had a lot of experience with models like this. However, I would think that in order for the models to be nested, you would need the full model (the first one) to be this: y = a + b1log(Ct) + b2log(t)log(c) and the second (the reduced model) to be: y = a + b1log(Ct).
@hajer3335
@hajer3335 6 years ago
StatQuest with Josh Starmer Anyway thank you so much for your patience.🙏🏻
@shaadakhtar5986
@shaadakhtar5986 1 year ago
Why is the LL at 4:00 3.3 for each curve?
@statquest
@statquest 1 year ago
The likelihood is the y-axis coordinate on the curve above each point. And each curve is the same height, so the y-axis coordinate is the same for each point. For more details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Dn6b9fCIUpM.html
@gamaputradanusohibien5730
@gamaputradanusohibien5730 3 years ago
Why can a likelihood value be >1?? Can a probability value be >1?
@statquest
@statquest 3 years ago
Likelihood and probability are not always the same thing. Here's why: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@monishadamodaran677
@monishadamodaran677 4 years ago
How did you calculate log-likelihoods @16:10 and @16:33???
@statquest
@statquest 4 years ago
I used the natural log. The log base 'e' is the standard log function for statistics and machine learning.
@shubhamshukla5093
@shubhamshukla5093 4 years ago
2:34 How do you get the likelihood of the data? Please explain.
@statquest
@statquest 4 years ago
The likelihood is the y-axis coordinate value. So we just plug the x-axis coordinate into the equation for the normal distribution and we get the likelihoods. For more details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Dn6b9fCIUpM.html
@ehg02
@ehg02 3 years ago
Which models necessitate including the LL(saturated model) or equiv. to calculate R^2?
@statquest
@statquest 3 years ago
Logistic Regression
@ehg02
@ehg02 3 years ago
@@statquest But I thought you said the "sigmoidal plot" of the saturated model fits all the data points and hence the LL(sat. model) is zero. Thus, we can ignore it? Thank you.
@statquest
@statquest 3 years ago
@@ehg02 Technically, Logistic Regression needs it, but it goes away. However, there are other generalized linear models, like poisson regression, that may make more use of it.
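For example, in Poisson regression the saturated log-likelihood is not zero, so it has to stay in the deviance calculation. A sketch with made-up counts and hypothetical fitted means:

```python
import math

def poisson_log_lik(y, mu):
    # Sum of log Poisson pmf values: y*log(mu) - mu - log(y!)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

y = [2, 4, 3, 6]               # made-up counts
mu_fit = [3.0, 3.5, 3.2, 5.0]  # hypothetical fitted means from a proposed model

# Saturated model: one mean per observation, equal to the observation itself
ll_sat = poisson_log_lik(y, [float(yi) for yi in y])
ll_fit = poisson_log_lik(y, mu_fit)

deviance = 2 * (ll_sat - ll_fit)
print(ll_sat != 0.0)  # True -- unlike logistic regression
print(deviance > 0)   # True
```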
@BeVal
@BeVal 5 years ago
Oh My! I really love this man
@statquest
@statquest 5 years ago
Hooray!!! :)
@navneetpatwari1305
@navneetpatwari1305 2 years ago
How does the likelihood for the saturated model come to the value 3.3 at time 4:03?
@statquest
@statquest 2 years ago
The likelihood is the y-axis coordinate on each curve over the red dot. For details on likelihoods, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@anand.aditya31
@anand.aditya31 2 years ago
@@statquest Hi Josh! Just to make sure I am getting the concept right: if the std dev is smaller, the likelihood value will be higher, right? And the value of the likelihood at the mean will depend on the std dev only? At the end, thanks a ton for making these awesome videos. People like you, Andrew Ng, and Salman Khan of Khan Academy are making this world so much better. ❤️ Please clarify my doubt.
@statquest
@statquest 2 years ago
@@anand.aditya31 For a normal curve, the height of the likelihood is determined only by the standard deviation. A very narrow standard deviation, like 0.1, results in a y-axis coordinate at the mean ≈ 4. We can see this in R with the command: > dnorm(0, sd=0.1) [1] 3.989423
@duynguyennhu5677
@duynguyennhu5677 1 year ago
Can somebody explain to me how the likelihood of the data is calculated?
@statquest
@statquest 1 year ago
Sure - see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-p3T-_LMrvBc.html
@ausrace
@ausrace 6 years ago
Can you explain how you calculated the likelihood of the data please? If you are multiplying the probabilities then surely they must all be less than 1?
@statquest
@statquest 6 years ago
You are correct that probabilities should all be less than one, but likelihoods are different and can be larger. The probability is the area under the curve between two points. The likelihood is the height of the curve at a specific point. For more details, check out this StatQuest: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html
@ausrace
@ausrace 6 years ago
Thanks, will do.
@dominicj7977
@dominicj7977 5 years ago
I don't think it should be greater than 1. I think these values are not probabilities. But likelihood is just a probability
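The distinction in the thread above can be checked numerically: probability is an area under the curve (a CDF difference, always in [0, 1]), while likelihood is the height of the curve (a PDF value, which can exceed 1). A Python sketch for a normal curve with mean 0 and sd 0.25:

```python
import math

mu, sd = 0.0, 0.25

def pdf(x):
    # Likelihood: the height of the curve at x
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def cdf(x):
    # Used for probability: the area under the curve up to x
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

likelihood_at_mean = pdf(0.0)           # about 1.6 -- can exceed 1
prob_within_1sd = cdf(sd) - cdf(-sd)    # about 0.68 -- always in [0, 1]
print(likelihood_at_mean > 1)   # True
print(0 < prob_within_1sd < 1)  # True
```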
@justinking5964
@justinking5964 4 years ago
Hi Mr. handsome Josh. Is it possible to do an analysis for lottery pick 3? I have a different theory about lottery pick 3 and want to verify it.
@statquest
@statquest 4 years ago
Maybe one day - right now I have my hands full.
@hajer3335
@hajer3335 6 years ago
R^2 = 0.45 means the proposed model does not fit the data! Is this right? (I think R-squared must be very small.)
@statquest
@statquest 6 years ago
It depends on what you are modeling. R^2=0.45 means that 45% of the variation in the data is explained by the model. In some fields, like human genetics, that would be awesome and would mean you have a super model. In other fields, like engineering, that would be very low and mean you have a bad model. So it all depends on what you are studying.
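For reference, the log-likelihood-based R^2 from the video can be sketched with made-up outcomes and hypothetical fitted probabilities (for logistic regression, LL(saturated) = 0, so it drops out of the denominator):

```python
import math

def bernoulli_ll(y, p):
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

y = [0, 0, 0, 1, 1, 1, 1]                    # made-up outcomes
p_fit = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.9]  # hypothetical fitted probabilities

p_null = sum(y) / len(y)
ll_null = bernoulli_ll(y, [p_null] * len(y))
ll_fit = bernoulli_ll(y, p_fit)
ll_sat = 0.0  # saturated log-likelihood for logistic regression

# R^2 = (LL(null) - LL(fit)) / (LL(null) - LL(saturated))
r2 = (ll_null - ll_fit) / (ll_null - ll_sat)
print(0 < r2 < 1)  # True
```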
@hajer3335
@hajer3335 6 years ago
I do logistic regression in MAPLE 18. I want to learn another program to apply it in, and I need advice from you. How about R? I do not want to waste time. Please help me?
@statquest
@statquest 6 years ago
Today I will put up a video on how to do logistic regression in R.
@hajer3335
@hajer3335 6 years ago
Thank you so much, I'm waiting for your video.
@DrewAlexandros
@DrewAlexandros 21 days ago
@statquest
@statquest 20 days ago
:)
@dr.kingschultz
@dr.kingschultz 2 years ago
Your explanation at 6:20 is very confusing.
@statquest
@statquest 2 years ago
Are you familiar with R-squared? If not, see ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-2AQKmw14mHM.html
@emonreturns7811
@emonreturns7811 5 years ago
did you really do that with your mouth??????
@jayjayf9699
@jayjayf9699 5 years ago
I have no clue what you're talking about
@statquest
@statquest 5 years ago
Bummer.
@jayjayf9699
@jayjayf9699 5 years ago
@@statquest How did you calculate the null model? Or the two-parameter model (the fancier model)?
@statquest
@statquest 5 years ago
For details of how to fit models to data, you should either watch my series of videos on Linear Models (i.e. linear regression) and my series of videos on Logistic Regression. You can find links to these videos on my website here: statquest.org/video-index/ Once you have those concepts down, this video will make a lot more sense.
@jayjayf9699
@jayjayf9699 5 years ago
@@statquest OK, I'll check it out. I've already got the concept of linear regression down, using inference on the slope estimate, sums of squares, etc.
@statquest
@statquest 5 years ago
If you really understand linear regression well, then you should know how to fit a one-parameter model to data and you should know how to fit a 2 (or more) parameter model to data. So that should answer your original question.
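A sketch of exactly that, with made-up (x, y) data: fit the one-parameter (mean-only) model and the two-parameter (intercept + slope) least-squares model, then compare their sums of squared residuals:

```python
# Made-up (x, y) data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(xs)

# One-parameter model: just the mean of y
mean_y = sum(ys) / n
ss_null = sum((yv - mean_y) ** 2 for yv in ys)

# Two-parameter model: least-squares intercept and slope
mean_x = sum(xs) / n
slope = (sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(xs, ys))
         / sum((xv - mean_x) ** 2 for xv in xs))
intercept = mean_y - slope * mean_x
ss_fit = sum((yv - (intercept + slope * xv)) ** 2 for xv, yv in zip(xs, ys))

# Classic R^2: how much better the 2-parameter fit is than the mean alone
r2 = (ss_null - ss_fit) / ss_null
print(ss_fit < ss_null)  # True
```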