They are exactly the same in the sense that your chi-squared statistic will literally be the square of your z statistic (e.g., if z = 2, your chi-squared statistic will be 4). The only advantage of the z-test is that it more naturally allows you to do a 1-sided test, whereas chi-squared tests are 2-sided by default. For a two-sided z-test and a chi-squared test of a 2x2 table, you will get exactly the same results (p-value).
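A quick numerical check of the equivalence above, using a made-up 2x2 table (the counts are hypothetical, just for illustration). The two-proportion z-test with pooled variance and the Pearson chi-squared test (no continuity correction) give chi2 = z^2 and identical p-values:

```python
# Check: two-sided two-proportion z-test vs. chi-squared test on a 2x2 table
from math import sqrt, erfc

# Hypothetical data: 20/50 successes in group 1, 40/50 in group 2
x1, n1, x2, n2 = 20, 50, 40, 50

# Two-proportion z statistic with pooled variance
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_z = erfc(abs(z) / sqrt(2))  # two-sided p-value = 2 * (1 - Phi(|z|))

# Pearson chi-squared statistic for the same data arranged as a 2x2 table
table = [[x1, n1 - x1], [x2, n2 - x2]]
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
grand = sum(row)
chi2 = sum(
    (table[i][j] - row[i] * col[j] / grand) ** 2 / (row[i] * col[j] / grand)
    for i in range(2) for j in range(2)
)
p_chi2 = erfc(sqrt(chi2 / 2))  # chi-squared(df=1) tail prob = P(Z^2 > chi2)

print(z ** 2, chi2)  # identical
print(p_z, p_chi2)   # identical
```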
Loved the way you intuitively derived the Markov Inequality. In my opinion it is much better than the proof given in the Introduction to Probability book by Dimitri and John. Thank you very much.
I might make some videos in the future about minimal sufficiency or completeness, but probably not anytime soon. Most sufficient statistics you would ever find are minimal - minimal just means you're not including any extra information. For example, if the sample mean is sufficient, then the pair (sample mean, sample median) would NOT be minimal sufficient because it contains extra information. Similarly, the original complete data set itself is always sufficient, but it's usually not going to be minimal sufficient, because it contains a whole bunch of extra information.
Someone else yesterday requested a video on completeness, so I am going to make a series of short videos on minimal sufficiency, completeness, and ancillary statistics that should be out in the next week or so.
Great work, quick question! Why is it ok to use a normal distribution for response variables like weight if weight can't be negative, or zero? I see it a lot, but don't understand why it's so common.
There's pretty much nothing that *really* follows a normal distribution - it's all approximations. Take height, for example, and suppose height follows an approximately normal distribution with mean = 64 inches and sd = 4 inches. Even though a normal distribution has some probability of being less than 0 (which is impossible for height), because 0 is 16 standard deviations away from the mean, the probability is basically 0 anyway (less than 1 in a billion billion billion billion billion billion). So yes, you're totally right that it's impossible, but assuming it's normal makes things easy and the probability calculations are often pretty accurate!
Chernoff bounds are a further extension - in Markov’s we use the mean, in Chebyshev’s we use the variance, but in Chernoff bounds we use the moment generating function which gives us an even better bound. If you look at the Chernoff bound Wikipedia page, you’ll see the proof is very similar to this Chebyshev proof, as an extension of Markov.
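Here's a small illustration of that progression with X ~ Exponential(1) (my own example, not from the video): each bound on P(X >= a) uses more information and gets tighter.

```python
# Markov (mean) vs. Chebyshev (variance) vs. Chernoff (MGF) for Exponential(1)
from math import exp

a = 10.0
true_p = exp(-a)                 # exact: P(X >= a) = e^{-a}
markov = 1.0 / a                 # uses E[X] = 1: P(X >= a) <= E[X]/a
chebyshev = 1.0 / (a - 1) ** 2   # uses Var(X) = 1: P(|X - 1| >= a - 1) bound
# Chernoff: min over 0 < t < 1 of e^{-ta} * M(t), where M(t) = 1/(1-t);
# the minimum is at t = 1 - 1/a, which gives a * e^{1-a}
chernoff = a * exp(1 - a)

print(markov, chebyshev, chernoff, true_p)  # each bound tighter than the last
```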
Best explanation, I finally understand. Do you have a video about r and its relation to R^2? I have seen the Veritasium video about IQ where he shows a graph with a regression line and talks about getting R^2 from r. I wanted to understand both; now I know what R^2 is, r is left.
I don't have a video on the correlation coefficient r. It's just the square root of R^2, but r will be either negative or positive depending on whether the line is going up or down. For example, if R^2 = 0.49, then r will be either 0.7 (for a line that is going up) or -0.7 (for a line that slopes downward). So r gives you a little more information (the direction), but it doesn't have an easy interpretation - 0.7 doesn't really "mean" anything. R^2 is a little more general, because R^2 exists for any type of regression model (multiple regression, or more complicated forms of regression), whereas the correlation coefficient r only applies to simple linear regression where there is 1 predictor variable.
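The relationship above is easy to verify with a tiny made-up data set (numbers are mine, purely illustrative): r^2 equals R^2 in simple linear regression, and the sign of r matches the direction of the fitted line.

```python
# r vs. R^2 in simple linear regression, computed from scratch
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)        # correlation coefficient

# Least-squares fit and R^2 = 1 - SSE/SST
slope = sxy / sxx
intercept = my - slope * mx
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(r, r_squared)  # r^2 == R^2; r is positive because the slope is positive
```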
Great video, only let me express to you that you jumped a huge step when introducing the Poisson equation into the one that simplifies to ((n-1)/n)^t. Why did you transform the probability formalization into that equation? It's a huge step and I think many might lose track here. It would be nice if you added the explanation in a comment. Thanks.
The data follow a Poisson distribution, so everything eventually needs to be translated into something involving the Poisson formula. It relies on the fact that X1 follows a Poisson(lambda) distribution, the sum of X2…Xn follows a Poisson((n-1)*lambda) distribution, and the sum of X1…Xn follows a Poisson(n*lambda) distribution. Writing the conditional probability P(X1 = 0 | X1+…+Xn = t) as P(X1 = 0) * P(X2+…+Xn = t) / P(X1+…+Xn = t) and plugging in those three Poisson formulas, the lambdas cancel and you are left with ((n-1)/n)^t.
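The cancellation can be checked numerically (the lambda, n, and t values below are arbitrary choices of mine; the identity holds for any of them):

```python
# Check: P(X1 = 0 | X1+...+Xn = t) = ((n-1)/n)^t for iid Poisson(lam) data
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam, n, t = 1.7, 5, 8  # arbitrary values

# P(X1 = 0) * P(X2+...+Xn = t) / P(X1+...+Xn = t)
conditional = (poisson_pmf(0, lam) * poisson_pmf(t, (n - 1) * lam)
               / poisson_pmf(t, n * lam))
closed_form = ((n - 1) / n) ** t

print(conditional, closed_form)  # equal up to rounding
```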
Your mastery of the topic is evident when you are able to explain it so easily like that. Provides the intuition in a precise manner and with clear connection to the application. Thank you very much! Excellent teacher, please continue sharing, happily subscribed!
So at the beginning you said you couldn't use degrees Celsius as the random variable, but why not? In the video it is said that the values must be non-negative, so they always have a minimum of 0 - but degrees Celsius do have a minimum value (absolute zero), even though it is a negative number.
The requirement is not that the variables have to have *any* minimum - the minimum has to be 0 (the variable has to be non-negative). That's why we could only use Markov's Inequality for degrees Kelvin, because the minimum actually would be 0. For Celsius, since it is possible to be as low as −273.15, Markov's inequality would not work.
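Here's a tiny demonstration of why the non-negativity matters (my own numbers, not from the video): for a variable that can be negative, the Markov bound E[X]/a can simply be false.

```python
# Markov's inequality P(X >= a) <= E[X]/a requires X >= 0
a = 100

# Nonnegative case: X is 0 or 100 with equal probability, so E[X] = 50
p_nonneg = 0.5          # P(X >= 100)
bound_nonneg = 50 / a   # = 0.5, the bound holds (with equality here)

# Negative values allowed: X is -100 or 100 with equal probability, E[X] = 0
p_neg = 0.5             # P(X >= 100) is still 0.5
bound_neg = 0 / a       # = 0, yet the true probability is 0.5!

print(p_nonneg <= bound_nonneg)  # True
print(p_neg <= bound_neg)        # False: Markov breaks without X >= 0
```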
This has nothing to do with the video, but I tried to buy a biased coin from Amazon (no results), and while searching for other ways to get one (like Galton boards - they would be great in class), I found out that biased coins probably do not exist. Help me out if you know more about that.
I believe you could make a biased coin by welding a straight wire of a specific length to one side, sticking straight outward from the middle. When flipping the coin, the wire would act as a weight pulling that side downward, like a keel on a ship, so the coin would land with the wire pointing down more often. I've not tried this, but I believe it could work.
In your final slide, you say that the link function maps from the original scale to "the parameter of the relevant probability distribution". You also say the parameter is personalised... Is your final slide saying that, in general, the link function maps to the parameter of the data's distribution? E.g., "p" in Bernoulli, "sigma" in Rayleigh? Apologies if I haven't understood this correctly.
Yes, the link function is just transforming a real number with no restrictions (negative infinity to infinity) into something with the correct range of possibilities for the parameter of interest. In logistic regression, if we were predicting the probability of having diabetes based on weight, you and I would each get a personalized parameter p based on our weight. The heavier person might have p = 0.7, reflecting the fact that their weight makes it more likely that they have diabetes. The lighter person might have p = 0.3. But both will be between 0 and 1 no matter what, because the link function transformed the scale to ensure that it's between 0 and 1, which regular linear regression did not do.
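A sketch of that idea with made-up coefficients (the intercept and slope below are hypothetical, not from the video): the inverse logit link squashes any real-valued linear predictor into (0, 1).

```python
# Inverse logit link: unrestricted linear predictor -> probability in (0, 1)
from math import exp

def inverse_logit(eta):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1 / (1 + exp(-eta))

# Hypothetical logistic regression: log-odds = -7 + 0.04 * weight_in_lbs
b0, b1 = -7.0, 0.04
for weight in (140, 220):      # a lighter and a heavier person
    eta = b0 + b1 * weight     # unrestricted real number
    p = inverse_logit(eta)     # personalized p, always in (0, 1)
    print(weight, round(p, 3))
```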
Before this I was going through other YouTube videos. Everyone was saying "missing data," but in the first minute you said it's a missing column of the sheet, not a missing row. This cleared my confusion. Thank you, sir.
I was struggling to understand MLE for a few days. Finally came across your video, and it totally rewired my brain to understand this concept. You are awesome, man. Keep on making such videos.
At 5:06, the statement in the yellow box, "It is NOT a likelihood in this context either," confused me. When dealing with a single data point, the likelihood function is simply the probability density function (PDF) evaluated at that point. So the likelihood of observing a specific value x from a normal distribution with mean μ and variance σ² is given by L(μ, σ² | x) = f(x; μ, σ²), where f(x; μ, σ²) is the PDF of the normal distribution. Therefore, for a single value, the likelihood and density value are equivalent.
Yes, they are the same number, but the interpretation is different. A likelihood might be the same number as a pmf or pdf - of course, that's the entire point of the video.
@@ecarg007 You seem to understand that likelihood is sometimes the same number as a pdf. But a likelihood and a pdf are never going to have the same interpretation because a likelihood is not a probability. Not sure what you're trying to say about statquest's video.
@@statswithbrian "It is NOT a likelihood in this context either" - here, does "in this context" mean the following? In the density function with x = 1, the density value is 0.2419707; it is not a likelihood. But you can say the likelihood of observing x = 1 from a normal distribution with mean 0 and variance 1 is 0.2419707.
@@ecarg007 When we know the true parameters are mu = 0 and sigma = 1, 0.2419707 is the probability density at x = 1, which allows us to compute probabilities around x = 1. When we do not know the true parameters and only know our observed data x = 1, 0.2419707 is only the likelihood in the universe where mu = 0 and sigma = 1. The likelihood would be a completely different number with different parameters. For example, when x = 1, the likelihood is 0.3989 when mu = 1 and sigma = 1. So 0.3989 and 0.2419707 cannot both be probability densities in this context. They are likelihoods under different scenarios.
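The two numbers in this thread are easy to reproduce from the normal density formula:

```python
# Normal density at x = 1 under two different parameter "universes"
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(1, 0, 1))  # 0.2419707...
print(normal_pdf(1, 1, 1))  # 0.3989422...
```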
Great video, thank you. I was often confused between probability and likelihood and would just move on to the next topic still not fully understanding the difference, the "multiple universes" idea is the key part for me which finally explains it! Thank you.