Sample Size Estimation in A/B Tests Explained! 

Emma Ding
50K views

This video is a step-by-step guide on how to estimate sample sizes for A/B tests. This is a very important concept to master in order to ace your Data Science interviews.
Everything You Need to Know About A/B Testing in Data Science Interviews
• Crack A/B Testing Prob...
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001

Published: 12 Jan 2021

Comments: 42
@dwardster 2 years ago
Hi two questions: 1. I see this formula is used for two-sample T-tests, but what about the more common case of a two-sample Z-test for proportions? What would we use to calculate sample size in an interview in that case? 2. How do we estimate σ before running the experiment? Just take a sample before running the experiment and use its standard deviation to estimate σ?
@vitheltone 2 years ago
Super insightful, but please, at min 1:30 it says "acceptance of H0". We never accept H0; it's reject or fail to reject :)
@waynewang2071 2 years ago
The math trick was wrong: x = \Phi(z_x) for 0 < x < 1. But you can still use the fact that z_{\beta} = - z_{1-\beta} in this case, which will give you the correct formula in the end.
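The identity in the comment above can be checked numerically. A minimal sketch, assuming the convention that z_x denotes the x-quantile of the standard normal (so that Φ(z_x) = x); the helper name `z` is hypothetical:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z(x: float) -> float:
    """Hypothetical helper: the x-quantile z_x, defined by Phi(z_x) = x."""
    return std_normal.inv_cdf(x)

beta = 0.2
# Applying Phi to z_beta recovers beta itself, not -beta.
round_trip = std_normal.cdf(z(beta))
# Symmetry of the standard normal: z_beta = -z_{1 - beta}, so this sum is ~0.
symmetry_gap = z(beta) + z(1 - beta)
print(round_trip, symmetry_gap)
```

Under the other common convention (z_x as the upper-tail critical value), the signs flip, which may be the source of the confusion discussed in this thread.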
@shreyaschaturvedi1933 3 years ago
Hi Emma, can you explain how to calculate the standard deviation in this formula? Since the control and treatment groups will have different variances, should we use some kind of pooled/unpooled standard error for this number?
@sitongchen6688 3 years ago
Hi Emma, thanks for your great video! One question: I know you are assuming here that the control and treatment groups have equal variances. But in reality they may not have the same variance, so when applying this formula, which variance should we pick, or should we use the pooled variance? Or should we ensure that the two groups have the same variance? Thanks a lot!
@hiteshgupta7099 2 years ago
Hi Emma, thanks for this video. Does the sample size formula apply only when there are 2 variant groups (control + treatment)? If so, how does the formula change if we need more than 2 variant groups?
@user-mq9si8eh5k 1 year ago
Wow, so much information, thanks a lot!
@songxiyou2347 3 years ago
Hi Emma, question: beta = P(accept H0 | H0 is false) is a conditional probability, so why do we say beta = P(accept H0) directly?
@jingcaomath 2 years ago
Same question here. Can anyone explain this?
@weichiyao5712 2 years ago
I think this is a typo or something. Formally, beta = P(fail to reject H0 | H1) = P(|xbar / (s√(2/n))| < z_{alpha/2} | H1). Conditioning on H1 means xbar ~ N(µc-µt, s√(2/n)), with the second argument being the standard deviation. That's why in the next step she subtracts µc-µt from xbar.
@chihiroa1045 2 years ago
Thank you.
@lucasdeoliveirasilva8466 2 years ago
Is n the sum of the control and variant group sizes? I mean, should I consider n = 16*(sigma^2)/(lift^2) = n(control) + n(variant) (assuming the A/B test is 50/50)? Or n = n(variant) = n(control)?
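Since the derivation uses a variance of 2σ²/n for the difference in means, n reads most naturally as the size of each group, so a 50/50 test would need about 2n users in total. A minimal sketch under that assumption, with equal variances and a two-sided test; the function name and the example numbers are illustrative and not taken from the video:

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2, rounded up."""
    std_normal = NormalDist()
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)                # about 0.84 for power = 0.8
    return math.ceil(2 * (z_alpha_half + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Illustrative inputs: metric standard deviation 10, detectable difference 1.
n_per_group = sample_size_per_group(sigma=10, delta=1)
n_total = 2 * n_per_group  # control + variant in a 50/50 split
print(n_per_group, n_total)
```

The rule of thumb 16σ²/δ² gives 1600 per group for the same inputs, slightly more than the exact expression because the constant is rounded up to 16.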
@chelseazhang1458 3 years ago
Hey Emma! Your videos are very helpful and efficient for preparing for interviews! Can you explain a bit about when we should use a t-test vs. a z-test, or what the difference between the two is? I'm a little confused about the transition between the two. Thanks!!
@emma_ding 3 years ago
More videos about statistics are coming. Stay tuned!
@xinyuntang8124 3 years ago
The t-test is for comparing two samples and the z-test is for comparing a sample with a population. Based on your use case, it's usually pretty obvious whether you should use a t-test or a z-test.
@AdiJ8 3 years ago
The z-test assumes a known population standard deviation; the t-test assumes you don't know the population std, so you estimate it with a sample std.
@owenho561 3 years ago
Your video is very helpful. One clarifying question: you mention that delta squared is the difference between control and treatment. What variable is it the difference of? Can you give us an example? The variance is clear. Thanks in advance.
@owenho561 3 years ago
This is super helpful! Thanks for sharing. One quick clarification on delta squared: is the difference between Uc and Ut the population delta between the control and treatment groups, or is it something else? Can you give us an example?
@emma_ding 3 years ago
mu_c and mu_t are population means of the control and the treatment groups. Those are unknown and estimated from the data.
@joe162840 2 years ago
There is something wrong here....at 1:49, shouldn't it be accepting H(1) or rejecting H(0)?
@yuchingyang3089 2 years ago
Hi Emma, thanks for the clear explanation of sample size. I would like to know how, in practice, we can determine the minimum detectable effect (delta). Is it the same as the practical significance boundary?
@hiteshgupta9286 2 years ago
Usually yes.
@kagelsmith997 2 years ago
At 1:04, why is the variance 2*sigma^2 / n? Why do you need to multiply by 2?
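One way to see the factor of 2, assuming the two groups are independent, each has n users, and both share the same variance σ²: the test statistic is built on the difference of two sample means, and the variances of independent quantities add.

```latex
\operatorname{Var}\left(\bar{X}_t - \bar{X}_c\right)
  = \operatorname{Var}\left(\bar{X}_t\right) + \operatorname{Var}\left(\bar{X}_c\right)
  = \frac{\sigma^2}{n} + \frac{\sigma^2}{n}
  = \frac{2\sigma^2}{n}
```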
@zdaman011 3 years ago
Great explanation! Out of curiosity, when would we need to apply a formula like this? Wouldn't we already know what the sample size is if we have a sample variance?
@zdaman011 3 years ago
@Doorknob985 That makes sense, but what's confusing me is that the sample size formula requires us to have the variance of the sample means already, which implies we've already run an experiment.
@emma_ding 3 years ago
@zdaman011 No, the sample variance can be obtained from the data before the experiment (it's the same as the variance in the control group after you run the test), exactly as Ravi mentioned above, i.e., from an existing metric. Sample size is the amount of data that's needed to get a certain power for a test. This is an over-simplified formula, as it assumes the sample variance is the same in the control and treatment groups, but it is helpful for interviews. In reality, it's more common to have a "ramping" process to control risks rather than splitting all users into either control or treatment.
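The workflow described in this reply can be sketched as follows: estimate the variance from pre-experiment values of the metric, then plug it into the rule of thumb n = 16σ²/δ². The data values, the chosen delta, and the function name below are all made up for illustration.

```python
import math
from statistics import variance

def rule_of_thumb_n(sigma_sq: float, delta: float) -> int:
    """n = 16 * sigma^2 / delta^2 (roughly alpha = 0.05 two-sided, power = 0.8)."""
    return math.ceil(16 * sigma_sq / delta ** 2)

# Hypothetical pre-experiment values of the metric (e.g. spend per user).
historical_metric = [12.0, 15.5, 9.0, 20.0, 11.5, 14.0, 18.5, 10.0]

sigma_sq_hat = variance(historical_metric)  # sample variance, n - 1 denominator
delta = 2.0  # smallest difference in means worth detecting (an assumption)

n = rule_of_thumb_n(sigma_sq_hat, delta)
print(f"estimated variance = {sigma_sq_hat:.2f}, n = {n}")
```

In practice the historical sample would be far larger than eight values; the short list only keeps the example readable.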
@learnrepeatlearn 2 years ago
@emma_ding Hi Emma! Thank you for the great video. I have a quick question: what happens if the variance is different between control and treatment? Is there another equation we can use to calculate the sample sizes for each? I am assuming that the control and treatment groups may not necessarily have the same sizes here.
@popo-je8ze 2 years ago
I found that some practical issues are quite hard: 1. How do we calculate the required sample size for a ratio or quantile metric? 2. If we already know that there are correlations within our sample, so that the variance is underestimated, how do we estimate the required sample size correctly for either an A/A test or an A/B test?
@rickyg3390 3 years ago
Hi Emma, isn't phi(Z(x)) = x instead of -x?
@martingai7333 2 years ago
I agree, this is also where I got stuck. Have you figured it out?
@jianzezhou1301 2 years ago
This part is right. Actually it is the last part that is wrong: Z0.025 = 1.96 and Z0.2 = 0.84.
@jianzezhou1301 2 years ago
The last part is wrong: Z0.025 = 1.96 and Z0.2 = 0.84. Actually, if Z0.025 is negative, everything after the absolute value part is wrong.
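The values quoted above can be reproduced with the upper-tail convention, where z_a is the point that has probability a to its right. A minimal sketch assuming that convention (the helper name `upper_z` is hypothetical); it also shows that 2(z_{0.025} + z_{0.2})² comes out just under 16, which is consistent with the constant in the rule of thumb n = 16σ²/δ²:

```python
from statistics import NormalDist

def upper_z(a: float) -> float:
    """Upper-tail critical value: P(Z > upper_z(a)) = a for a standard normal Z."""
    return NormalDist().inv_cdf(1 - a)

z_alpha_half = upper_z(0.025)  # alpha = 0.05, two-sided test
z_beta = upper_z(0.2)          # beta = 0.2, i.e. power = 0.8

# Constant multiplying sigma^2 / delta^2 in the per-group sample size formula.
constant = 2 * (z_alpha_half + z_beta) ** 2

print(f"{z_alpha_half:.2f} {z_beta:.2f} {constant:.2f}")
```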
@EvanZamir 3 years ago
Is there a book that contains this derivation?
@emma_ding 3 years ago
Hmm, not that I'm aware of. But there are online resources on it if you search on Google.
@mvijayvenkatesh 3 years ago
I am using "Applied statistics & probability for engineers" by Douglas Montgomery and George Runger, 2nd Edition. Chapters 7,8,9 deal with these topics and the book is an excellent resource.
@overseasafrican9899 3 years ago
@evan zamir, You can find another derivation of this formula in Gerald van Belle's book Statistical Rules of Thumb. Chapter 2, page 30. van Belle has a free pdf of this particular chapter posted on his web page.
@adrianusqueiroz 2 years ago
How can we have a sample variance if we have not performed the experiment yet? This is confusing me!
@lucasdeoliveirasilva8466 2 years ago
That would be an estimate... I think.
@empiricalformulas9693 1 year ago
Right answer, incorrect alpha and beta.