If the sample is 30 or less and you do not know the population standard deviation, then you would use the t-table. Once you get to larger samples the t-distribution is pretty much the same as the Z distribution, so you can't go wrong using the t-distribution unless you are in a test with a picky examiner.
WAS LITERALLY ABOUT TO COMMENT THIS lowkey kinda mad that I woke up for an 8am lecture and felt really lost and confused and then watched this video and now I understand like why am I
@@alenjose3903 Thanks for that. Khan is not a statistics teacher and teaches like a maths tutor which is quite different. I really appreciate it when people can see the difference.
Quite right Mohammed! Sometimes Khan's instructors over explain and fail to stay on point with the subject matter. The result is that I walk away more frustrated and confused than I did before watching the video! Dr. Nic keeps it simple and plain; the way I like to learn!
This video teaches the more traditional way to calculate confidence intervals. It still concentrates on conceptual understanding rather than mathematical methods.
@@swalihashanu5827 This is not a term I am familiar with. It is a confidence interval for a mean based on a t distribution. Not sure what they mean by "exact"
Thanks a lot! Great statement for clarification at 4:19 about CI : "This is not the interval that will hold the weight of 95% of the apples in the orchard: It is the interval that we have 95% confidence will contain the true value of the population mean"
Great job, really made it simple. Superb. Just a little bit curious about the like being placed against the comment trashing Khan Academy. I agree that their explanation is not that great, but overall the guy has done a great job. One academic should not enjoy the trashing of another. Just saying.
Thanks. My concerns with Khan Academy are based around the procedural pedagogy and lack of vetting. Considering the funding that has gone into KA, there needs to be better quality control. I have written several blog posts about my concerns, including an open letter to Sal Khan. Many teachers share my concerns.
And you may have noticed that I try to like all the comments, as I appreciate any interaction. I also "liked" a comment saying I was almost as good as KA.
Statistics Learning Centre First disclaimer: I actually had to learn what pedagogy meant before replying :-). I agree that the explanation by Sal Khan is not up to the mark. And I may be mistaken, but the comment by M. Shafei was the only one with the logo along with it. Anyway, my knowledge is zilch compared to yours or Sal's, and I benefit from both. Again, thanks for making it so simple, and for taking the time to reply. Quick question: is Relative Risk applicable to control studies comparing a drug and a placebo? Thanks.
Thank you for your useful video, but I do not clearly understand the concept yet. I do not know what the "t" value is or how we calculate it. Why do we choose 90%, 95%, or 99%? Could you help me figure out these problems? Thank you again!
Hi - You do not calculate a t value, you read it from a table. However the computer generally does that for you. You choose the level of confidence depending on how confident you wish to be that the interval you get will contain the (unknown) population parameter. It is a trade-off, though, as the more confident you wish to be, the larger the interval gets.
You are a life saver. I hope you will continue to do this great work for years to come. i have been going through tons of tutorials and I find this refreshing. taking a data science course.
This was well explained and easily understood video. I am doing an online course, and this helped me understand the material so much better. THANK YOU!!
Hi Ma'am, thanks for your clear video. I also have a question. Can you explain when we can use this formula, compared with the formula (Average value) +/- 1.65*(Standard deviation)? Thanks in advance.
Nguyen Phuong The formula you have given relates to a normal distribution and gives the middle 90% of the values in that distribution. The big difference is that you have used the standard deviation, rather than the standard error. When we use the standard error s/sqrt(n), we are finding the range of values within which we expect the population mean to lie. It is the distribution of sample means, not the distribution of the values. I suggest you watch Understanding Confidence Intervals, to get an idea of what the distribution is of.
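To make the difference between the two formulas concrete, here is a minimal Python sketch. The 15 weights are made up for illustration, 1.645 is the more precise form of the 1.65 in the question, and 1.761 is the standard t-table value for 14 degrees of freedom with 5% in each tail (a 90% interval); none of these numbers come from the video.

```python
import math
import statistics

# Hypothetical sample of 15 apple weights in grams (made up for illustration).
weights = [152, 148, 160, 155, 149, 158, 151, 147, 156, 153, 150, 159, 154, 146, 157]

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)   # sample standard deviation
se = s / math.sqrt(n)           # standard error of the mean

# Middle 90% of individual values, assuming the weights are roughly normal.
values_range = (mean - 1.645 * s, mean + 1.645 * s)

# 90% confidence interval for the population mean.
T_90_DF14 = 1.761               # t-table value, 14 df, 5% in each tail
mean_ci = (mean - T_90_DF14 * se, mean + T_90_DF14 * se)

print("range of individual values:", values_range)
print("interval for the mean:     ", mean_ci)
```

The second interval is much narrower because it describes where the population mean is likely to be, not where individual apples fall.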
Guys, why don't we just skip hours-long college lectures and learn it all here in 5 minutes? I have a community medicine exam on Saturday. Thanks for the help!
GREAT VIDEO. REALLY HELPS TO UNDERSTAND THE CONCEPTS BEHIND CONFIDENCE INTERVALS. THE ENTIRE TEAM OF Statistics Learning Centre IS TO BE CONGRATULATED. I WISH THEM WELL IN CONTINUING THEIR ENDEAVOUR OF SIMPLIFYING STATISTICAL METHODOLOGY.
Hi Dr Nic, thanks for this wonderful video. I am confused about @2:23 confidence level and sample size data. I was wondering why 0.05 column in the t distribution is used for 90% confidence and 0.025 for 95%? Sorry Edit: I found the answer in below comments, because it's split between two confidence interval ends. But now confused the logic behind splitting it between two confidence interval ends, what if we just use e.g. 95% confidence for 0.05 t distribution column? (i see this as more straightforward). Would you be able to please explain? thank you
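One way to see why the tail area is split: the interval is centred, so whatever is left outside it is shared equally between the two ends. The sketch below is only an illustration. It uses the standard normal distribution from Python's standard library as a stand-in for the t distribution (an assumption made for convenience; the same logic applies to t), and the function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def tail_area(confidence):
    """Area left in EACH tail when the interval is centred."""
    return (1 - confidence) / 2


def central_coverage(one_tail_area):
    """Coverage of the centred interval whose upper cut-off leaves
    one_tail_area above it (and, by symmetry, the same below)."""
    cutoff = std_normal.inv_cdf(1 - one_tail_area)
    return std_normal.cdf(cutoff) - std_normal.cdf(-cutoff)


print(tail_area(0.95))           # the column to look up for 95% confidence
print(central_coverage(0.05))    # what the 0.05 column actually gives
print(central_coverage(0.025))   # what the 0.025 column gives
```

Using the 0.05 column leaves 5% out at each end, 10% in total, so the resulting interval is only a 90% interval.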
This is a great video, thank you. One thing I don't understand though is why the proportion of sample size to the original data isn't considered. E.g. All things being equal other than the population size, I would expect a calculation based on a sample of 200 people out of 400 to have a higher confidence than a calculation based on a sample of 200 people out of 10,000 people. This doesn't seem to be considered though, can you help me to understand why please?
In most of what we do, the population is big enough to ignore the sample size effect. There is such a thing as a finite population correction which should be applied if the sample is getting close to the size of the population. The issue is more that we do NOT sample with replacement, yet the formulas and theory assume that we do sample with replacement. Hope this helps.
Note for future revision. 4:18 "This is not the interval that will hold the weight of 95% of the apples in the orchard. It is the interval that we are 95% confident will contain the true value of the population mean." "If we were to take a lot of samples of size 15, and create 95% confidence intervals from them, 95% of them will contain the true population mean. But 5% of them will not contain the population mean."
@@DrNic Very helpful indeed. I have been confused by the definition of a confidence interval, and this video clarifies it. One thing is still unclear to me though. Re: "If we were to take a sample of 15 apples and create a 95% confidence interval from them". This sounds recursive... How do we create the 95% confidence interval from the samples in the first place (which is then used to define the confidence interval)??
@@karannchew2534 I'm not sure what you are asking. The way to get a confidence interval is to take a sample, decide on the confidence level and use the formula to calculate the confidence interval.
@Dr Nic's Maths and Stats Thanks for getting back. Let me rewrite the question. I am confused about the following two aspects of the CI definition: A) The 95% CI of one sample can be calculated using x̄ ± t × s/sqrt(n). If I did that calculation for many samples, I would end up with many CIs. B) 95% of the many CIs from (A) will include the true mean. There are two "95%". In A, it's 95% of the range under the t-distribution for ONE sample. In B, it's 95% of the CIs from ALL the samples. How do I link these two views of a CI? Hope my question is clear. I've been stuck on this point for a long time :-)
@@karannchew2534 The 95% is referring to the same thing in both instances. The 95% CI in A is called that because of B. You might find this blog post helpful. creativemaths.net/blog/sampling-distribution/
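The link between the two "95%"s in this exchange can be checked with a small simulation. The sketch below is only an illustration: the population mean and standard deviation are made-up numbers, the population is assumed to be normal, and 2.145 is the t value quoted elsewhere in this thread for a sample of 15. It builds many intervals with the formula from (A) and counts how often they contain the true mean, which is the statement in (B).

```python
import math
import random
import statistics

T_95_DF14 = 2.145   # t value for n = 15 (14 degrees of freedom), 95% confidence
TRUE_MEAN = 150.0   # hypothetical population mean in grams (an assumption)
TRUE_SD = 20.0      # hypothetical population standard deviation (an assumption)


def ci_95(sample):
    """95% confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - T_95_DF14 * se, mean + T_95_DF14 * se


def coverage(num_samples=10_000, n=15, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        low, high = ci_95(sample)
        if low <= TRUE_MEAN <= high:
            hits += 1
    return hits / num_samples


result = coverage()
print(result)  # should come out close to 0.95
```

Each interval uses the same 95% recipe, and the recipe is called "95%" precisely because about 95% of the intervals it produces capture the true mean.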
That is great to hear. You probably had most of the ideas already and this video helps to organise them for you. Good luck in the rest of your studies. You can find our videos organised here: creativemaths.net/videos/
The statement (at 2.48): " We are 95% confident that the population mean lies within this interval" is a BAYESIAN principle, whereas the construction of the confidence interval as demonstrated here is a FREQUENTIST's principle. Note (from here, en.wikipedia.org/wiki/Confidence_interval , section "Misunderstandings"): __"A 95% confidence level does not mean that for a given realized interval there is a 95% probability that the population parameter lies within the interval"__
Epic approach to teaching, keen knowledge of learner needs, and effortless ability to teach complex concepts simply through common sense illustration. Respect for Dr. Nic!! So helpful, thank you.
Your videos are remarkably clear. Much better than many lecturers I've had. Just a quick comment/question: my Harvard graduate Ph.D. in Epidemiology big head professor told me that I shouldn't make the assumption that "we are 95% confident that this interval contains the true value of the population" because this definition only applies when we are thinking of a hypothetical infinite number of samples (re-sampling/frequentist statistics approach) but never to any specific sample alone. Therefore, the explanation at 4:34 would be correct based on what he explained to me, but what you said a few seconds earlier (at 4:25) would be imprecise due to this technicality, because we can only say this when we are considering broad re-sampling (but never for our single sample). Would you say that he might be correct when saying this? Thanks a lot. Your contribution to people who want to understand statistics is truly enormous with such clear videos!
For any specific sample, providing it is random and representative of the population in question, we can say that we are 95% confident that a 95% confidence interval will surround the population mean. The explanation of 4:34 is the statistical meaning of "95% confident". We do not take more than one sample in any instance, so this is the correct interpretation of a single confidence interval. There is always a tension between very very precise definitions and definitions that are useful and useable.
In order to use Z you need to know the population standard deviation or have a large sample. If you know the population standard deviation, you would rarely be finding a confidence interval, as you would almost certainly already know the mean. If you have a large sample, the t statistic becomes practically the same as the Z statistic, so you don't have to worry.
On a table I refer to, for a sample size of 15 and confidence level of 95%, I get 2.131, your t value of 2.145 appears but for a sample size of 14. However, the explanation is fantastic.
Thanks. The number in the t-table is the degrees of freedom. For a confidence interval for a mean, the degrees of freedom is n-1 (or the sample size minus 1). That is why we look up 14, not 15. I realise at 2:25 this is a little confusing. At 2:14 I did put in that df is sample size - 1.
How awesome. I am seeing this video before my epidemiology and i am sure i ll b man of the match. Dr. Dr. Nic, I automatically subscribed to your channel. Plus , i am also impressed by the quality of the video. i wanna know how such very attractive videos are prepared. I need somebody's answer
Your video is great, a lot easier to understand. But I got confused at 1.55. You said that the more confident we are, the larger the CI will be. If we have more information then our CI should be small. The more information we have, the more confident we will be. Therefore our CI should be small.
This is a common source of confusion. The point is that we do not have more information. We are using the same information and we want to be more confident of including the population parameter in the interval, so we make it wider. Think of it as a net - to be more likely to catch something, we make the net bigger.
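The net analogy can be shown with numbers. In this sketch the sample mean and standard deviation are hypothetical, and the three t values are the standard table entries for 14 degrees of freedom. The data stay exactly the same; only the confidence level changes.

```python
import math

n = 15
sample_mean = 150.0   # hypothetical sample mean (grams)
sample_sd = 20.0      # hypothetical sample standard deviation
se = sample_sd / math.sqrt(n)

# Two-sided t-table values for 14 degrees of freedom.
T_TABLE_DF14 = {0.90: 1.761, 0.95: 2.145, 0.99: 2.977}

intervals = {
    conf: (sample_mean - t * se, sample_mean + t * se)
    for conf, t in T_TABLE_DF14.items()
}

for conf, (low, high) in intervals.items():
    print(f"{conf:.0%} interval: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

With the same information, asking for more confidence can only be paid for with a wider interval.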
This was a pretty helpful video. I have one small request. I want to know where you got the t value from and how you calculated it. I have tried a few 't' charts but none were helpful. Hoping you will be able to help me.
2.145 is the t value for a sample of 15 (which means 14 degrees of freedom) and a confidence level of 95%. You can get it from a table or use a spreadsheet to find it. In Excel you would put =T.INV.2T(0.05,14). It is the inverse t distribution with 5% outside the confidence interval (as there is 95% inside) and the degrees of freedom comes to 15 - 1 = 14.
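For anyone without a table or spreadsheet, the same lookup can be approximated in plain Python. This is a rough sketch rather than a library-quality routine: it integrates the t density numerically (Simpson's rule) and then searches for the cut-off by bisection. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value, the counterpart of T.INV.2T(1 - confidence, df)."""
    target = 1 - (1 - confidence) / 2   # e.g. 0.975 for 95% confidence
    low, high = 0.0, 100.0
    for _ in range(60):                 # bisection on the cut-off
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(round(t_critical(0.95, 14), 3))   # sample of 15, so 14 degrees of freedom
print(round(t_critical(0.95, 15), 3))   # the row you get by looking up 15 instead
```

The first value should agree with the 2.145 used in the video, and the second with the neighbouring table entry that appears if the sample size is looked up instead of the degrees of freedom.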
Thanks for the video. A small question: in this formula, the confidence interval is independent of the population size. Will the confidence interval not differ for a sample of 15 apples out of a population of 100 apples compared to 15 out of a population of, say, 10000 apples? If the population size has been taken into account, which part of the formula caters for that?
+Akash Deep Choudhury You are quite correct. At a basic level we assume that the population is big enough for it not to matter. If we are concerned about the size of the sample relative to the population, we should use a “finite population correction factor”. This only makes an appreciable difference when the population is small (less than about 1000) and the sample is more than 20% of the population. The finite population correction factor is the square root of the proportion of the population that isn’t sampled. So if we had a population of 1000, and sampled 200 of them, then the correction factor would be the square root of 0.8 = 0.89. If the population is 10000 and we sample 1000, then the correction factor is the square root of 0.9 = 0.95. This correction factor multiplies the standard error, so in the traditional confidence interval it narrows the interval by that factor.
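The correction factor described above is easy to check. This sketch implements it exactly as worded in the reply (the square root of the proportion not sampled); textbooks often write it with N - 1 in the denominator, which gives almost the same number. The function names and the standard error of 5.0 are hypothetical.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Square root of the proportion of the population that is not sampled."""
    return math.sqrt(1 - sample_size / population_size)


def corrected_standard_error(se, population_size, sample_size):
    """Standard error after applying the finite population correction."""
    return se * finite_population_correction(population_size, sample_size)


print(round(finite_population_correction(1000, 200), 2))     # 0.89
print(round(finite_population_correction(10000, 1000), 2))   # 0.95
print(corrected_standard_error(5.0, 1000, 200))              # smaller than 5.0
```

Because the factor is always below 1, applying it can only make the interval narrower, and for a small sample from a large population it is so close to 1 that it can be ignored.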
Thank you so much Dr. You are the best teacher. I have learned a lot from your demonstration. Now I am clear on the concepts of confidence interval and point estimator, standard error of the mean and t value.