See all my videos at www.zstatistics.com/videos/ 0:00 Introduction 1:20 Definition 6:40 Confidence Intervals 12:51 Proportions 17:16 Challenge Question Series music by Purdy. purdy.bandcamp.com/ Song: 3 Friends to the Stars
Seriously man, I can't believe how much of a difference creators like him make. Honestly, they are the reason people like us have the confidence to accomplish things in our field. You are the reason this kind of growth happens in society!
SE(x̄) = s/√n
s/√(n+x) = ½(s/√n)
√(n+x) = 2√n
n + x = 4n
x = 3n
x = 3 * 20 = 60
You will likely require 60 more measurements, provided the standard deviation remains roughly the same as the sample size increases.
Here's a much simpler solution and the key takeaway:
SE(x̄) = s/√n
½SE(x̄) = ½(s/√n)
Once you are here, you're done. Notice that 1/2 = 1/√4, therefore:
½SE(x̄) = s/(√n·√4)
½SE(x̄) = s/√(4n)
So, in general, we need 4 times as many observations to cut SE(x̄) in half. More generally, reducing SE by a factor of x requires x² times as many observations:
(1/x)·SE(x̄) = s/√(n·x²)
For the solution to the specific problem: 4n - n = 3n = 60.
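A quick numeric check of the "4x the observations halves the SE" rule, in plain Python (the values s = 5 and n = 20 are taken from the video's example; treat this as a sketch that assumes s stays fixed as n grows):

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s, n = 5, 20                               # sample SD and size from the example
se_original = standard_error(s, n)         # 5 / sqrt(20)  ~ 1.118
se_quadrupled = standard_error(s, 4 * n)   # 5 / sqrt(80)  ~ 0.559

# Quadrupling n cuts the standard error exactly in half:
print(se_original, se_quadrupled)
```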
Very, very helpful. I landed here on my Lean Six Sigma education journey. This series and your other videos helped me sharpen my statistics concepts.
Very informative. I recommend this channel to students who want to learn statistics for data science and machine learning, as it covers almost all the concepts essential for understanding machine learning models. Thank you, and I really appreciate your work.
Challenge: SE(sample mean)/2 = sample SD/√(4n), so it will be 80 measurements (4 * 20), meaning we need 60 more measurements 🙂 Thank you for sharing these wonderful lectures!
Hi Justin, I've made it, but I cannot find the videos on outliers, boutique measures, and Range + IQR, which, according to the diagram, should be on the list. I searched your channel but could not find them. Are they posted yet?
EDIT: this answer is wrong, but in an interesting way, so I'm going to leave it here, with a correction posted below. I have posted a separate answer with what I believe is the correct math for the challenge question.
_________
If you increase the sample size, two things will happen: (1) n will increase, and (2) you'll be adding random data points, so they will probably end up with a somewhat different mean and standard deviation.

About #1: n appears in the denominator of the formulas for standard deviation *and* SE, so an increase in n is sort of double-counted; combining the formulas, you get a denominator of [sqrt(n-1)*sqrt(n)], which is pretty close to just n, especially for larger n. So in short, doubling n should get you close to halving SE.

About #2: The change in mean doesn't affect SE, but if the variation about the mean changes as new data comes in, it would alter the numerator of the standard deviation, which in turn alters the numerator of SE. It's beyond my skills to know whether this is likely to change up or down as n grows, so I'll just assume that, on average, the actual variation about the mean does not change.

Thus, all that matters is the increase in n, where a doubling of the sample would approximately halve the SE.
In the interest of helping anyone who reads this in the future, I'll explain my error here and then post a new comment with the correct answer to the challenge question.

My insight was to look at where n appears in the standard deviation formula. My mistake was not realizing it appears (implicitly) *twice* in that formula. We see n in the denominator, but it is *also* implicitly in the numerator, because the more values you sample, the larger the summation gets. (If you double your sample size, this summation approximately doubles.) So the implicit n in the numerator of the standard deviation and the n in the denominator of the SE approximately cancel out. We therefore need only use the denominator of the standard deviation to answer this question: it is sqrt(n-1), so we need slightly less than 4x the original sample size.

@The Gao Thanks for pushing back. This led me to go and reevaluate the algebra again. You might consider that your response would have been more useful if you had engaged with the math I wrote, which introduced a new idea about the role of n in the _standard_ _deviation_ formula. Just telling me I'm wrong, or telling me the correct answer without engaging with my explanation, doesn't help me learn. I ended up going back and figuring it out for myself, so no worries there. It just didn't help me or future readers avoid the mistake I made.
Hi Justin, I didn't find your video on boutique measures. I would love to go through that one as well, just like the other videos in this series. Love your content and explanations.
I am new to statistics, so this answer could be wrong, BUT I think the confidence interval is calculated to estimate the population mean: the population mean should be in that interval, while the sample mean doesn't need to be inside it. (Correct me if I am wrong.)
@@codercoder7270 We actually add some value to the sample mean and subtract the same value from it. So the distance from the sample mean to both the ceiling and the floor value of the interval would be the same.
Actually, what was wrong is that he forgot to change the sample mean value: he kept 112 from n=5. If you look, you will find that both interval bounds are equidistant, as I mentioned, from 112, not from the mean for n=50 or n=500. So, technically, he forgot to update the sample mean value when finding the new intervals.
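That symmetry is easy to verify in code: both bounds sit exactly the same t·SE distance from whatever sample mean you plug in, so the midpoint of the interval always recovers the mean. A minimal sketch in plain Python (the numbers 112, s = 15, n = 5 are illustrative, and 1.96 stands in for the exact t critical value):

```python
import math

def confidence_interval(mean, s, n, crit=1.96):
    """95% CI: mean +/- crit * (s / sqrt(n)) -- symmetric by construction."""
    se = s / math.sqrt(n)
    return mean - crit * se, mean + crit * se

lo, hi = confidence_interval(112, 15, 5)   # illustrative values
# The midpoint of the interval is the sample mean itself:
print((lo + hi) / 2)   # 112.0 (up to floating-point rounding)
```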
Is it possible for the confidence intervals not to contain the sample mean, as shown at 12:39 for n=500? If I understand this correctly, it is saying that when you have a sample mean of 114.7 there is a high likelihood that the population mean is between 110.8 and 113.1. Unless we had other information about the population, this doesn't seem reasonable to me.
@zedstatistics Perhaps I've missed something, but if the confidence intervals are the sample mean +/- (SE * t score), how is the 95% confidence interval for n=500 *lower* than the sample mean? The sample mean is 114.7, and the 95% confidence upper threshold is 113.1 in the slide (12:46). Should it not be 0.55 (SE) * 1.964729 (t score) + 114.7 = 115.78? In the slide, with n = 5 we have an even +/- difference from the sample mean (15.8); with n = 50 we have 6.9 below and 0.3 above; with n = 500 we have 3.9 below and 1.6 below.
I find the probability distribution picture in this example confusing because it appears to be treating the true mean as a random variable, which it is not.
I don't think the underlying distribution needs to be a normal random variable... the sample mean distribution will always approach a normal distribution due to the central limit theorem. Also, the z statistic is used when the population variance is known; otherwise the t statistic is used.
SE(x̄) = S/√n
Equation for 20 samples: SE(x̄) = 5/√20
Equation for the required measurements: ½SE(x̄) = S/√n, i.e. SE(x̄) = 2S/√n
Therefore, set equation 1 equal to equation 2:
5/√20 = 2·5/√n
5·√n = 2·5·√20
√n = 2√20 (cancel the 5s)
Square both sides:
n = 4·20 = 80
Therefore, you need 60 more measurements if you wish to halve the standard error.
Can anyone please explain the new formula for the standard error of the sample proportion at 13:54? Justin didn't quite go into details about it, and I'm not able to extrapolate from the similar formula for the standard error of the sample mean shown just before. Why does p*(1-p) appear in the numerator of the SE of the sample proportion?
So basically this has to do with the binomial distribution. When we talk about proportions, or situations with two outcomes (yes or no, voting in favor or against), we use the binomial distribution. For a single trial, p(1-p) is the variance, so [p(1-p)]^(1/2) is the standard deviation. Since standard error = standard deviation/(n^(1/2)), we can substitute that standard deviation directly into the formula.
SE -> ½SE means n -> 4n, so 20 -> 80: we need 60 more points for our sample.
All else being equal, n and SE should be the only changing factors in the SE formula (the others stay fixed); that is how to find the relationship between SE and n. Say my SE corresponds to s = 24 and n = 20. To halve it, I set ½(24/sqrt(20)) = s/sqrt(n), which gives n = 80 * (s²/24²). All else being equal, s = 24, so n = 80 * (s²/s²) = 80.
Here is one thing I do not understand about the t-distribution: why do we need to assume the population is normally distributed? In effect, the t-distribution is the sampling distribution for a given sample size, correct? I have read, and it makes intuitive sense as well, that sampling distributions do not depend on the underlying population distribution. So if you have, say, a sample size of 30, you will have a t-distribution approaching a normal distribution regardless of whether the population distribution is non-normal, bi-modal, skewed, or what not. Where am I going wrong here?
Thanks for your explanation, Justin! Very helpful. Regarding the question at the end: SE = SD/sqrt(n). Since SD = sqrt(Sum((x̄ - xi)²)/(n-1)), that makes SE = (whatever)/sqrt(n(n-1)). Therefore, to halve SE, one needs to double the denominator, making sqrt(a(a-1)) = 2sqrt(n(n-1)). Thus:
sqrt(a(a-1)) = 2sqrt(20(20-1))
a(a-1) = 4(20(20-1))
a² - a - 1520 = 0
Solving for positive a, we get roughly 40. So in order to halve the SE, we need to increase n by about 20. My best guess.
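Checking that last bit of algebra numerically (a sketch that just solves the commenter's quadratic with the standard formula; it does not judge whether this model of SE is the right one):

```python
import math

# Positive root of a^2 - a - 1520 = 0 via the quadratic formula:
a = (1 + math.sqrt(1 + 4 * 1520)) / 2
print(a)   # ~39.5, i.e. roughly 40 as stated above
```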
Hi, I still didn't get why you took 97.5% for the t-distribution part if you just want 95% when calculating the confidence interval for the sample mean?
I know this is old, but I can help. When you look at the distribution, 5% lies outside, but it's outside in both of what are called the tails: the 5% is split between the positive side and the negative side, so there is 2.5% in each tail. That's why we need to look up 0.975. It seems a bit weird, I know, but you are actually looking for the overall piece, which means 0.025 in either direction. When you look it up for 0.025, you are getting one tail; take it in both the + and the - direction, and twice that is the 0.05 you were looking for. I hope that helped.
Did you make some mistakes when computing the confidence intervals for n=50 and n=500? For example, for n=50 with x̄ = 115.3, your confidence interval is [108.5, 115.6]; however, (108.5 + 115.6)/2 does not equal 115.3. Looking forward to your help, thanks.
Just spotted that as well. My guess is that he forgot to change the mean and stuck with 112 as the sample mean. The final mean (114.7) isn't even in the confidence interval
Love this explanation, you laid it out so well! However, it is still not clear to me why we use the SQRT of n (why SQRT, where does that come from?). Could anyone clarify? Cheers!
For the most part, the standard deviation of your sample (the numerator) doesn't change much regardless of sample size. So thanks to the "magic" of mathematics, using the square root of n makes it easy to systematically reduce your standard error: with the square root of n as the denominator, multiplying your n by 4 cuts your SE in half. So this is a quick and easy way to eyeball how large your sample needs to be to reduce your standard error and consequently improve your confidence intervals. That's at least one good reason for using the square root of n as opposed to something else. For a more detailed response: statisticsbyjim.com/hypothesis-testing/standard-error-mean/
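The 1/√n behavior also shows up empirically. A simulation sketch in plain Python (a standard normal population and the sizes 20 and 80 are assumptions for illustration): draw many samples, take each sample's mean, and the spread of those means tracks 1/√n, so quadrupling n roughly halves it.

```python
import math
import random

random.seed(42)

def sim_se(n, reps=4000):
    """Empirical SE: the SD of `reps` sample means of size n,
    drawn from a standard normal population (true SD = 1)."""
    means = [sum(random.gauss(0, 1) for _ in range(n)) / n for _ in range(reps)]
    mu = sum(means) / reps
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (reps - 1))

se_20 = sim_se(20)    # should be near 1/sqrt(20) ~ 0.224
se_80 = sim_se(80)    # should be near 1/sqrt(80) ~ 0.112, i.e. about half
print(se_20, se_80)
```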
This is because, when using the inductive method, SE = std dev/sqrt(n) matches more closely the standard deviation obtained by the deductive method. Since we want our inductive analysis to be as close as possible to the deductive method, we use this expression for SE. Hope this is helpful.
√n appears because I think of it as the sqrt of the variance divided by the total number of observations (the mean of the total variance). Taking the sqrt of that mean total variance gives you standard deviation/√n. Hope this helps.
This is a t-test, which takes into account that the sample size is small. Go watch a video about it and it may become clearer, although it only really clicked with me today lol
Why does the IQ have to be normally distributed? Doesn't the central limit theorem state that the distribution of means will be normally distributed anyway? Great videos btw
I also got 80 at first, but that doesn't change the n in the sample variance denominator. So I inserted the formula for the standard deviation into the formula for the standard error and moved things around a bit so that the denominator is (n-1)*sqrt(n). Our goal is to halve the standard error, so we must double this denominator. Inserting n = 20 gives approx. 85, and doubled that is 170. The question now is: what n makes the denominator formula equal 170? That's some cumbersome algebra, so I put it in online and the result was approx. 31, so we LIKELY need 11 more samples. LIKELY because new samples may shift the numerator, which will shift the mean, which will shift the standard deviation, which will shift the standard error. Let me know what you think; I could be wrong, but I'm hoping to be on the right track haha.
I think you are on the right track! I like that you substituted in the formula for standard deviation, which allows you to consider all the places n appears. I think this is what most other commenters are missing. Nice approach! I also did this, and I worked through the math manually to come up with the number 57. (See my comments on this video for details and a simple exposition of the math.)
I like and understand your presentation. However, I can't seem to figure out what formula was used to calculate Std-Err-of-Percent in the following table?

Gender  Result    Frequency  Weighted-Frequency  Std-Err-of-Wgt-Freq  Percent   Std-Err-of-Percent  95%CI-for-Percent
Male    Negative  1          16.20746            16.20746             7.5605    7.5259              0.0000 - 22.6253
        Positive  4          64.82982            31.56546             30.2419   14.4387             1.3398 - 59.1441
        Total     5          81.03728            34.96896             37.8024   15.9078             5.9594 - 69.6454
Fem     Negative  3          100.00000           56.73086             46.6482   18.0779             10.4613 - 82.8350
        Positive  1          33.33333            33.33333             15.5494   14.1520             0.0000 - 43.8778
        Total     4          133.33333           64.91964             62.1976   15.9078             30.3546 - 94.0406
Total   Negative  4          116.20746           58.52508             54.2087   17.5366             19.1054 - 89.3120
        Positive  5          98.16316            45.08850             45.7913   17.5366             10.6880 - 80.8946
        Total     9          214.37061           71.16743             100.0000
We need about *57* _more_ measurements, for a total of about *77*.
_______________
Here's why: The sample size *n* affects the SE formula in 3 places:
(1) It appears in the denominator of the SE.
(2) It appears as "n-1" in the denominator of *s* (the standard deviation, which is the numerator of the SE formula).
(3) It _also_ appears implicitly in the numerator of *s*, because the summation there gets larger as the sample size increases; assuming the deviations are similar for the new measurements, the summation increases at the same rate as *n*.
(1) and (3) cancel: they are both just √n, so we can ignore them. Therefore we only need (2) to answer the question. Halving SE will require doubling the denominator of *s*. In this question, the denominator of *s* is √(n-1) = √19. Doubling this: 2√19 = √(4*19) = √76 = √(77-1) = √(n-1). So *n* = 77. This means we need 77 - 20 = 57 _more_ measurements to halve the SE.
(Suggestions for improvement to this answer are welcome! 🙂)
The standard formula for the standard error is $$SE = \frac{s}{\sqrt{n}}$$. As per the question, the desired objective is to reduce the SE by half, i.e. SE/2. Thus the equation becomes $$\frac{SE}{2} = \frac{s}{\sqrt{n+x}}$$, where n = 20 and x is the number of additional measurements we are looking for. Solving this (after substituting the values) gives a final result of 60. Thus 60 additional measurements are required to halve the standard error. Thanks. Note: the above notation is LaTeX; if you put it in an R Markdown document, you can see it rendered as normal mathematical notation (provided LaTeX or TinyTeX is installed).
I did not understand the proportions part. Why does p get to be 0.65? Why does 65 percent of votes for a specific party result in a p value of 0.65? Could you explain, please? I would be thankful. Your videos are superb, by the way!
At 12:20, shouldn't you calculate the 95% confidence intervals for n=50 and n=500 using their respective means, and not the mean for n=5? The whole point of these intervals looks wrong to me.

At 15:32, "If N gets pretty large, all distributions converge to a normal distribution" -- you are simply citing the Central Limit Theorem wrong. Try applying what you're saying to a uniform distribution... The Central Limit Theorem says that the SAMPLING DISTRIBUTION of the means (for example) of ANY distribution converges to a normal distribution, not that ANY distribution -> normal!!!
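The uniform-distribution point is easy to demonstrate in code. A simulation sketch in plain Python (sample size 30 and 5000 repetitions are arbitrary choices): the Uniform(0,1) population stays flat no matter what, but the sampling distribution of its mean comes out bell-shaped, centered at 1/2 with SD ≈ sqrt(1/12)/sqrt(n).

```python
import math
import random

random.seed(1)

# Uniform(0,1) is flat, not normal. But the sampling distribution of its
# mean (n=30 here) is approximately normal, with mean 1/2 and
# SD = sqrt(1/12)/sqrt(30) ~ 0.053 -- the CLT applies to the *means*.
n, reps = 30, 5000
means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
sd = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / (reps - 1))

print(round(grand_mean, 3))   # ~0.5
print(round(sd, 3))           # ~0.053
```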