Crack A/B Testing Problems for Data Science Interviews | Product Sense Interviews

156K views

A/B Testing questions are very commonly asked in Data Science interviews together with metric ("case") problems. In this video, we will go over everything you need to know about A/B testing. Make sure you stay till the end. I am going to share other A/B testing resources to help you with your interview preparation.
Read a More Comprehensive Article on A/B Testing
towardsdatascience.com/7-a-b-...
Step by Step Guide on Calculating Sample Sizes for A/B Tests
• Sample Size Estimation...
Cracking Product Sense Problems in Data Science Interviews
• Crack Metric/Business ...
Udacity's A/B Testing Course www.udacity.com/course/ab-tes...
My friend Kelly's post on Towards Data Science towardsdatascience.com/a-summ...
Book: Trustworthy Online Controlled Experiments www.amazon.com/Trustworthy-On...
LinkedIn's Ego Cluster Paper
arxiv.org/pdf/1903.08755.pdf
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
0:00 Intro
1:26 What is A/B testing
2:30 Designing an A/B test
4:46 Multiple testing problem
7:47 Novelty and primacy effect
9:38 Interference between groups
12:51 Dealing with interference
15:37 Resources

Published: 11 Jul 2024

Comments: 124

@emma_ding 3 years ago

FAQ:
1. 7:46: it should be 10 rather than 1 false positives in 200 metrics. Thanks Ayank for pointing it out!
2. Running one A/B test with 10 variants vs. running 10 A/B tests: testing 10 variants means you have 1 control and 9 treatments. For example, you want to test 10 colors of a button, so each group of users sees a different color. It's different from 10 A/B tests (each with 2 variants). For the 10-color example, you could run 10 A/B tests, each with two variants (1 control and 1 treatment), but it's less efficient. This article may help you understand the multiple testing concept: home.uchicago.edu/amshaikh/webfiles/palgrave.pdf
3. 5:50: probability of "no false positive". For details of how it's computed as (1 - alpha)^n, you can read more at home.uchicago.edu/amshaikh/webfiles/palgrave.pdf
4. 12:20: in two-sided markets the treatment effect would be overestimated. Why is that? For example, if a small group of Uber users receives incentives to take more rides, there will be enough drivers to accommodate the additional demand. However, if the incentives extend to all users, it's likely there will not be enough drivers to meet the huge increase in demand (in the short term). Therefore, the treatment effect would likely be overestimated.
Feel free to ask questions below. Your questions may help others as well! If you have specific questions in your job search, feel free to reach out to me here data-interview-questions.web.app/.

@rachitsingh3299 3 years ago

5:50 Why is .05 subtracted? I understand the part about no false positive. Why .05?

@jasonchen3062 3 years ago

@rachitsingh3299 5% is a commonly used value for the Type I error rate
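
The (1 - alpha)^n computation discussed above can be sketched in a few lines. This is a minimal illustration, assuming the comparisons are independent and each is run at the same significance level alpha; the function names are made up for this example.

```python
def prob_no_false_positive(alpha: float, n_comparisons: int) -> float:
    """P(no false positive) over n independent comparisons, each at level alpha."""
    return (1 - alpha) ** n_comparisons


def prob_at_least_one_false_positive(alpha: float, n_comparisons: int) -> float:
    """Complement of the above: chance that at least one comparison is a false positive."""
    return 1 - prob_no_false_positive(alpha, n_comparisons)


if __name__ == "__main__":
    alpha = 0.05  # the commonly used 5% Type I error rate
    for n in (1, 3, 10):
        print(n,
              round(prob_no_false_positive(alpha, n), 4),
              round(prob_at_least_one_false_positive(alpha, n), 4))
```

With a single comparison the probability of no false positive is 1 - 0.05 = 0.95, and it shrinks as comparisons are added, which is why 0.05 is the number being subtracted.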

@leoyuanluo 3 years ago

Hey Emma, at 12:06 you said, "...a new product that attracts more drivers in the treatment group...". Is the objective of the treatment to attract more drivers or to get Uber users to request more rides?

@sitongchen6688 3 years ago

Hi Emma, thanks for sharing! Regarding point 4 above, I feel this is a comparison between the pre- and post-launch states of a new feature. What about bias during the A/B test between the control and treatment groups? I think that should also lead to overestimation of the true treatment effect, since there will be fewer drivers available for the control group, which will cause fewer completed rides for control-group riders than in the normal scenario.

@oliverxu5134 1 year ago

For False Positive Rate, I want to know, how do you know a rejection is false positive. I mean, unlike classification, we know the true label and prediction, so we know whether a prediction is false positive or not. But in this case, we don't know whether the null hypothesis is true or not. Then how do we know a rejection is false positive? Besides, for each rejection, should we use the original criteria (p = 0.05) to reject?

@aspark47 3 years ago

Awesome content. I appreciate the structured walk-through of potential problems in designing A/B testing. I also like the idea of summarizing "trustworthy online controlled experiments." Looking forward to it!

@goodjuju2132 3 years ago

I was really struggling with A/B testing. This video + your friend Kelly's post just helped me ace an interview on it! You are a treasure

@lexichen4131 3 years ago

This 16mins saved me 3 hrs at least, thanks so much!

@taylorlee8196 3 years ago

Best video ever! Very organized and oriented! Look forward to seeing more!

@jeoffleonora4612 3 years ago

This is the best ab testing video. Period.

@abtestingvideos2259 3 years ago

This is more helpful than a paid A/B testing course on Udemy! Emma, you are so awesome!

@1-person-startup 3 years ago

this channel is a goldmine

@jieyuwang5120 3 years ago

Really great video! Thanks for making it available to everyone!

@alanzhu7538 2 years ago

Love the content! Keep going!

@Theartsygalslays 2 years ago

So well articulated and enlightening! This is the vocabulary I wish I had to explain A/B testing stats to less technical folks in the past. Thank you!

@emma_ding 2 years ago

Thank you for your kind words Veronica! :)

@miamamia354 3 years ago

Great! I am also reading the book you recommended. Looking forward to the next video.

@afridmondal3454 1 year ago

Amazing Explanation! Loved it ☺

@MrBlackitalian 2 years ago

Thank you so much for the resources!!

@kellypeng9026 3 years ago

Very comprehensive content! Honored to be mentioned in Emma’s video! 😄😄😊

@tattwadarshipanda6029 3 years ago

You both are an inspiration.

@klimmy. 2 years ago

Hey Emma, thank you, that's really helpful! Please note that for the multiple testing problem there is a common confusion between the p-value and the false positive rate, and I believe what you calculated at 5:58 is not exactly a false positive rate. They are related, but not the same (for reference, see pages 41 and 186 of Trustworthy Online Controlled Experiments, or the article "A Dirty Dozen: Twelve P-Value Misconceptions"). False positives depend on the p-value and on the prior belief in the hypothesis. This example helped me: if you are trying to convert steel to gold, you may get p-value = 0.05 in the experiment. But our prior belief is that this cannot be done from the chemical perspective. So 100% of rejections will be false, i.e. the False Positive Rate will be 1.00 for our experiment (not 0.05).
In probability terms (H0 means the null hypothesis is true, D means the data observed):
False Positive Rate = P(H0 | D)
p-value = P(D | H0), by definition
Their relation (Bayes' rule): P(H0 | D) = P(D | H0) * P(H0) / P(D)
Hope that helps :)
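
The relation between the significance level and the chance that a significant result is a false positive can be illustrated with a short Bayes' rule sketch. It assumes D stands for "the test came out statistically significant", and the power value of 0.8 is an assumption chosen for illustration, not a number from the video or the comment.

```python
def false_positive_risk(alpha: float, power: float, prior_null: float) -> float:
    """P(H0 is true | significant result), computed with Bayes' rule.

    alpha      -- P(significant | H0 true), the significance level
    power      -- P(significant | H0 false), assumed known for this sketch
    prior_null -- P(H0 true), the prior belief that there is no real effect
    """
    significant_and_null = alpha * prior_null
    significant_and_effect = power * (1 - prior_null)
    return significant_and_null / (significant_and_null + significant_and_effect)


if __name__ == "__main__":
    # Steel-to-gold case: the null is certainly true, so every rejection is false.
    print(false_positive_risk(alpha=0.05, power=0.8, prior_null=1.0))
    # A 50/50 prior gives a risk close to, but not equal to, alpha.
    print(round(false_positive_risk(alpha=0.05, power=0.8, prior_null=0.5), 4))
```

The point of the sketch is only that the answer moves with the prior, while alpha stays fixed at 0.05.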

@minma1987 2 years ago

This was very helpful, thank you!

@hameddadgour 1 year ago

Great content! Thank you for sharing.

@poopah4497 2 years ago

Thank you. Watch multiple times.

@halflearned2190 3 years ago

Excellent content, thanks!

@ceciliaxu 3 years ago

This is very helpful. Your voice is like one of my teachers at Bittiger. Her name is also Emma. 😊😊

@carloschavez9740 2 years ago

I've read a lot of articles and this video is amz

@mussdroid 3 years ago

I want to be data scientist. Emma rocks the industry 🙏

@goelnikhils 2 years ago

Thanks a lot. Amazing content

@diegozpulido 3 years ago

Hi Emma. Thank you very much for your videos. Thanks to them I got a Senior Data Scientist position at Facebook. I will forever thank you for your exceedingly good work.

@emma_ding 3 years ago

Congrats! I'm so glad to hear it, best of luck with your new job!

@tinos0330 2 months ago

wow it's very informative emma

@shelllu6888 3 years ago

Hey Emma, thanks a lot for creating the video. tbh this is the most applicable A/B testing video I've watched on YouTube! Great job on creating this, and thanks for helping the data science community grow.
1. A quick question on determining the # of days to run an A/B test: you mentioned dividing the sample size by the # of users in each group. If we have multiple groups with unequal numbers of users, how do we decide the # of days to run the A/B test?
2. About FDR: I'm still a bit confused by the definition. Why does the formula involve calculating an expectation? Is FDR a random variable? (If I'm lagging far behind, could you throw me a link so that I can read more to catch up?)
Thanks so much again!

@user-er7sn7ef2p 3 years ago

Brilliant!

@xingchenwang1471 3 years ago

I just read the summary article by Kelly a few days ago

@zhefeijin9627 3 years ago

Hi Emma. One more such useful video!! I have a question about 'split the control and treatment group by cluster'. I know the clustering by geo-location can introduce some selection bias. For example, we do not know if it works in the U.S when we test it in Spain. Therefore, Facebook and Linkedin make the cluster according to the social graph. My question is 'if randomly take some of these clusters (social graph) for testing, will it also have any selection bias'? Thank you so much!

@hasantao 2 years ago

Very well done.

@halflearned2190 3 years ago

Nice video, thanks!

@zzzs5545 3 years ago

Great! Looking for more ab testing contents.

@emma_ding 3 years ago

More to come!

@linhe5896 3 years ago

I enjoyed this one a lot Emma. You are becoming a pro at youtube content and style. You show more facial expression => user engagement. The second part I like is how relevant it is to real interview questions. Please keep going, and perhaps a case study combined with product sense and AB testing for future topic.

@RobertoAnzaldua 2 years ago

Great video, thanks for posting :D

@emma_ding 2 years ago

My pleasure! So happy it was helpful for you Roberto!

@santoshbiswal6567 1 year ago

Thanks Emma for putting this up. One question: if we want to compare the total revenue/acquisition of the test and control groups, what test (z-test, chi-square, etc.) can be used to test the hypothesis? Population size > 1Mn

@judyhe686 3 years ago

Hi Emma, thanks for this video and it's super helpful! I have a question around the ego-network randomization to solve network effect. I don't understand how it works because even if each user is not assigned a feature, they are still likely affected by users in the treatment group when it spills over? Can you elaborate more on that? Thanks!

@Alexandra-he8ol 3 years ago

Thank you very much🙏🏻

@iOSGamingDynasties 3 years ago

Great video Emma, some of the best A/B testing materials I have to say. However, I have some questions, when we say sample size, does it mean adding control + treatment groups? I read from somewhere that it is just the number of experimenters in a single group. Also why when we calculate the time it takes to run A/B test, we use the formula (sample size/# of users in a group)? Group here means control/treatment or just a batch of users that we show the experiment to at a single time? Do you think that it is a good idea to show all users at the same time, when the required sample size is small? Thanks!

@jfjoubertquebec 2 years ago

Subscribed, liked. Finally, someone who talks like an adult. Thank you for your professionalism!

@emma_ding 2 years ago

Thank you JJ!

@ARJUN-op2dh 3 years ago

Amazing........!!!!!!!!!!!

@timhsu87 2 years ago

Thank you so much 😊

@sophial.4488 2 years ago

Quality content in each and every video. Emma you are great to condense information into digestible format.

@zenofall4455 3 years ago

Emma, your channel is brilliant. Thanks for creating this content. I had a quick follow-up question: let's say we make a small format change on posts at FB and want to measure whether it has any effect on user interaction. We choose the metric #UsersWhoEngagedinAction/#TotalUsers. Your A/B testing video used the approximate formula N = 16*var/d^2 to determine the sample size. Typically, for a binomially distributed metric like the one we chose, var = p*(1-p). Say p = 0.2 and dmin = 2%; the sample size comes to ~6400. For a big company like FB with 2.5B DAU, approx. 30K users are active per minute (assumption: ignoring any other splitting of users by characteristics or time of day). If we decide to use only 1% of our active users per minute (30K * 1%) and split them into two groups of 150 each, the minimum sample required would be collected in about 21 minutes (6400/300). Is that correct? Are experiment durations this short for a problem like this on a high-traffic platform?

@emma_ding 3 years ago

You are right on the math. But in reality, companies don't assign all users to either the control or the treatment group of a single test, for a few reasons: 1. They may run hundreds (if not thousands) of experiments in parallel (especially companies such as FB), so each test doesn't get that many users. 2. It's more common to have a "ramping" process to control risk rather than splitting all users into either control or treatment, so the duration will be longer than the calculated value.

@lanaherman 3 years ago

why did you take (var=p*(1-p) instead of var=p*(1-p)*n) and (dmin=2%)?
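
The arithmetic in this thread can be reproduced with a small sketch of the rule of thumb N = 16*var/d^2, using the numbers from the question (p = 0.2, dmin = 2%, 300 users per minute entering the test). The function names are illustrative only. One caveat, stated as an assumption: this rule of thumb is commonly read as the sample size per group, in which case the collection time would be computed against 150 users per minute per group rather than 300.

```python
def required_sample_size(variance: float, delta: float) -> float:
    """Rule-of-thumb sample size: N = 16 * variance / delta^2."""
    return 16 * variance / delta ** 2


def minutes_to_collect(sample_size: float, users_per_minute: float) -> float:
    """Time to collect sample_size users at a constant arrival rate."""
    return sample_size / users_per_minute


if __name__ == "__main__":
    p, d_min = 0.2, 0.02
    variance = p * (1 - p)  # variance of a single Bernoulli (per-user) outcome
    n = required_sample_size(variance, d_min)
    print(round(n))
    print(round(minutes_to_collect(n, 300), 1))  # N read as the total across both groups
    print(round(minutes_to_collect(n, 150), 1))  # N read as the size of each group
```

The variance here is p*(1-p) rather than n*p*(1-p) because the metric is a per-user proportion, not a count of successes.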

@kelseyarthur6421 2 years ago

Great video

@omid9422 2 years ago

Excellent

@teddy911 2 years ago

Your videos are really good, very useful.

@SerenaKong 1 year ago

Thanks for sharing these videos. They are really clear and helpful! I have a question. How can we know if there is a spillover effect between the control group and the treatment group? Is there any way to detect it?

@thegreatlazydazz 3 years ago

I would like to say that I wholeheartedly support the idea of making a video on the book with the picture of the hippo. I am from a statistics background, but never quite understood how stats were being used in this A/B testing setting. Thanks a ton!!!!!

@jonathanloganmoran 3 years ago

Fantastic video-thank you, Emma, for your help! Just an FYI, you forgot to reference LinkedIn's ego-cluster paper in the description (14:45).

@emma_ding 3 years ago

I added a link to the paper in the description. Thanks!

@plttji2615 2 years ago

Hi Emma, thank you for the video. What if we want to decide among two features how can we design the AB testing? or Is it multivariate testing? Thank you

@yihongsui4525 11 months ago

Hey Emma, thanks so much for the great video! 9:34: when the test is already running and you want to deal with the novelty and primacy effects, would it be better to compare "first-time users in treatment" vs "first-time users in control"? Or even compare "first-time in treatment vs first-time in control" against "old in treatment vs old in control"?

@amneymnr6455 3 years ago

Thanks Emma! I got this question in a previous interview and would love your thoughts: "What methods can you use when an A/B test cannot be or has not been conducted?"

@emma_ding 3 years ago

Ideas could be comparing before and after, or implementing and comparing variants in different geo-regions (or based on other user segmentation methods). You can google and explore more ideas. Depending on the problem, the downside of not using A/B testing is that you may need more effort on analysis and/or bias correction.

@neeru1196 3 years ago

It would help if you explained the variables and talked about "parameters" in detail. Thanks for the video!

@emma_ding 3 years ago

Noted! Thanks for the feedback!

@lisawenyingliu3801 2 years ago

Hi Emma, thanks a lot for making these high quality tutorial videos, very helpful. But it is hard for me to understand because I don't have any basic knowledge, can I ask do you have any book to recommend for me to read so that I can better understand your videos?

@emma_ding 2 years ago

Hi Lisa, please check out this blog! towardsdatascience.com/how-i-got-4-data-science-offers-and-doubled-my-income-2-months-after-being-laid-off-b3b6d2de6938#6f86

@karundeep07 3 years ago

Hey Emma, one more quick question. At 3:50, when we are calculating the sample size, it is said that we can get the variance from the sample. Just wondering how we can get the variance while we are in the first phase of designing the A/B test, when we have not run the experiment and don't have the sample yet. How will we get the sample variance? Please help me here as well.

@tejashshah5202 1 year ago

Hi @Karundeep Yadav, did you find out the answer to your question. Would love to hear the answer in that case. Had same question too!

@nope4881 3 years ago

Hi, great topic! I have a question! You mentioned that the "difference between treatment and control" (delta) can be obtained from the MDE. How do we get it? How do we estimate delta from the MDE? Also, can you show an example that uses the formula sample size = 16 * sample variance / delta^2, obtains delta from the MDE, and gets a value for the sample size? Hope you understand the question :)

@emma_ding 3 years ago

You can refer to this video ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-JEAsoUrX6KQ.html for derivation of the sample size.

@tejas5872 3 years ago

Hey Emma, Thank you for the valuable content. I've been following your channel and it's helping me regarding the expectation in the interview!. I just have a question - You mentioned coding round will be conducted in the first round. Will the coding round be based on data structures (Linked Lists, Queues, Stacks, Dynamic programming etc) or basic coding challenges like print a palindrome? Please help

@emma_ding 3 years ago

Good question! This blog summarizes all the different kinds of coding interviews and I think it may clarify things towardsdatascience.com/the-ultimate-guide-to-acing-coding-interviews-for-data-scientists-d45c99d6bddc!

@janeli2487 3 years ago

Hi Emma, thanks for your video, it's very comprehensive. I am wondering what you would do or communicate to PMs if the p-value just barely misses, e.g. you got 0.051 while you defined your significance level at 0.05? Thanks

@emma_ding 3 years ago

The situation is debatable. An option could be to run the experiment a little longer to see if the p-value changes. The bottom line is that you don't want to compromise the criterion (i.e. the significance level) after seeing the results.

@janeli2487 3 years ago

@emma_ding Thanks!

@lingli8999 3 years ago

Emma, another great video, thank you! I had a question. You mentioned in this video that referral program is usually considered as long-term. I understand for referral programs for like housing, it takes a long time. How about other referral programs like Uber eats, Robinhood new user program with a random stock? Can those be tested with A/B testing?

@emma_ding 3 years ago

Even for Uber Eats and Robinhood, referral programs still take longer than an instantaneous change, e.g. a feature update. You can A/B test those, but with a longer feedback loop.

@lingli8999 3 years ago

@emma_ding Thanks a lot Emma!

@XuJiBoY 3 years ago

Hi Emma, thank you very much for the great informative video! I have a question: at 12:20 you mentioned that in two-sided markets the treatment effect would be overestimated, may I know why is that? I can't quite figure it out.

@emma_ding 3 years ago

For example, if a small group of Uber users receives incentives to take more rides, there will be enough drivers to accommodate the additional demand. However, if the incentives extend to all users, it's likely there will not be enough drivers to meet the huge increase in demand (in the short term). Therefore, the treatment effect would likely be overestimated.

@XuJiBoY 3 years ago

@emma_ding Thank you very much for the explanation. This makes sense. So it's the resource competition in the population of all users, which was not an issue in the sub-population of the experiment. I guess it's probably assumed that the treatment effect focuses on the increase in successful ride transactions, rather than pure ride demand from users (regardless of whether the demand is fulfilled).
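
The resource-competition argument in this thread can be made concrete with a toy model. Everything numeric below is hypothetical (baseline demand of 900 rides, driver capacity of 1000 rides, a 20% demand lift, 10% of riders treated), and the model assumes that when demand exceeds capacity, rides are fulfilled proportionally across all riders.

```python
def fill_rate(total_demand: float, capacity: float) -> float:
    """Share of requested rides that drivers can actually complete."""
    return min(1.0, capacity / total_demand)


def experiment_lift(base_demand: float, lift: float,
                    treated_share: float, capacity: float) -> float:
    """Relative lift in completed rides per rider, treatment vs control, during the test."""
    control_demand = base_demand * (1 - treated_share)
    treatment_demand = base_demand * treated_share * (1 + lift)
    rate = fill_rate(control_demand + treatment_demand, capacity)
    control_per_rider = rate * control_demand / (1 - treated_share)
    treatment_per_rider = rate * treatment_demand / treated_share
    return treatment_per_rider / control_per_rider - 1


def launch_lift(base_demand: float, lift: float, capacity: float) -> float:
    """Relative lift in completed rides once every rider gets the incentive."""
    before = base_demand * fill_rate(base_demand, capacity)
    after_demand = base_demand * (1 + lift)
    after = after_demand * fill_rate(after_demand, capacity)
    return after / before - 1


if __name__ == "__main__":
    print(round(experiment_lift(900, 0.20, 0.10, 1000), 3))  # what the A/B test measures
    print(round(launch_lift(900, 0.20, 1000), 3))            # what the full launch delivers
```

In this toy setup the experiment reports the full demand lift because drivers are not yet the bottleneck, while the launch is capped by driver capacity, so the measured effect overstates the realized one.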

@rachitsingh3299 3 years ago

Hey Emma! can you explain the difference between A/B testing and experimental design?

@emma_ding 3 years ago

A/B testing is the same as online controlled experiment.

@cl2hanovastar 2 years ago

At 7:43, what does "200 metrics" mean? According to the definition of FDR, it should be 200 rejected null hypotheses, not 200 tests. Could you please clarify?

@yidanhu7889 2 years ago

Hi Emma, I do not understand ego-network randomization. What is the difference between it and "create network effect" method? I do not understand your sentence in the video "meaning the effect of my immediate connections treatment on me"? Could you please help? The paper is too long. Thank you!!

@allison-hd1fg 2 years ago

Is minimum detectable effect the same thing as practical significance?

@LauraLigmail 2 years ago

Hey Emma, for 5% FDR, would u mind helping me understand how you got to ‘at least 1 false positive for 200 metrics ‘?

@jessesong9546 2 years ago

I think she meant that the probability of observing at least 1 false positive among 200 metrics is .05, hope this makes sense.

@nplgwnm 3 months ago

Video was made in 2021, and I busted into laughter when “company X” is mentioned 😂 who would know, right? 😂😂😂

@alanzhu7538 2 years ago

14:40 When you talked about splitting the clusters, do you mean randomly splitting people within a cluster to treatment and control group?

@nipundiwan 2 years ago

Let's say there are a total of n clusters in the entire sample. You randomly assign n/2 clusters to the treatment group and the remaining n/2 clusters to the control group.
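
The cluster-level split described in this reply can be sketched as follows. This is a minimal illustration, not LinkedIn's or the video's implementation: the cluster IDs, user IDs, and function names are hypothetical, and it assumes clusters have already been built and that each user belongs to exactly one cluster.

```python
import random


def assign_clusters(cluster_ids, seed=0):
    """Randomly send half of the clusters to treatment and the rest to control."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {cid: ("treatment" if i < half else "control")
            for i, cid in enumerate(shuffled)}


def assign_users(user_to_cluster, cluster_assignment):
    """Every user inherits the arm of the cluster they belong to."""
    return {user: cluster_assignment[cluster]
            for user, cluster in user_to_cluster.items()}


if __name__ == "__main__":
    clusters = [f"cluster_{i}" for i in range(10)]
    arms = assign_clusters(clusters, seed=42)
    users = {"alice": "cluster_0", "bob": "cluster_0", "carol": "cluster_7"}
    print(assign_users(users, arms))
```

The unit of randomization is the cluster, so users inside one cluster are never split across treatment and control.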

@roshanpatnaik1902 2 years ago

Hi Emma, In the sample size discussion i.e. where you mentioned that sample size is 16 sigma square/ Delta square, sigma is sample variance of the test or control?

@emma_ding 2 years ago

Hi Roshan, thank you for your question. Have you checked out my video -> ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VpTlNRUcIDo.html, where I discuss the basics of A/B testing? Have a look and let me know if you still have questions! Thanks for watching and sharing!

@karencao1538 3 years ago

Hi Emma, one question on sample variance when calculating sample size, are we referring to the sample variance of the treatment group before the experiment? Just a bit confused on what actually we're referring to here...

@emma_ding 3 years ago

The statistic we are testing is delta (the difference) so the "variance" is the variance of delta.

@rogerzhao1158 3 years ago

@emma_ding @Data Interview Pro Hi Emma, the video is super helpful. I have one question: the sample variance is calculated as the variance of the delta, so we can only calculate the sample size after the experiment is started and data is collected? But shouldn't we decide the sample size before we start the experiment? I get confused about the order and hope you can help clarify. Thank you.

@yingyingxu9926 3 years ago

Question: when you talk about the multiple testing problem, does that require the exact same test among the 10 groups? Like a 10x AA test? If we have 10 different variants, we can think of it as 10 different A/B tests conducted simultaneously. Am I missing something here?

@emma_ding 3 years ago

No, multiple testing means you have 10 variants, i.e. 1 control and 9 treatments. For example, you want to test 10 colors of a button, so each group of users sees a different color. It's different from 10 A/B tests (each with 2 variants). For the 10-color example, you could run 10 A/B tests, each with two variants (1 control and 1 treatment), but there's no need to do it.

@karundeep07 3 years ago

Thanks a lot Emma. One quick question. At 3:45, since we haven't run the test yet, how can we get the values of sigma and delta? Delta we can get from the minimum detectable effect. But what about sigma? Please help me understand this. Thanks again.

@emma_ding 3 years ago

Both sigma and delta are predetermined. They should be known before running the experiment.

@guancan 2 years ago

@emma_ding I wonder how we can know what the samples are if the sample size is not determined. If we don't know what the samples are, how could we observe the sample variance? Could you please explain further, Emma?

@haowu6918 2 years ago

How to estimate the variance from datasets?
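
One way to read "sigma is predetermined" is that the variance is estimated from data collected before the experiment starts. The sketch below works under that assumption; the historical values are made up, and the function names are illustrative rather than taken from the video.

```python
import statistics


def variance_continuous(historical_values):
    """Sample variance (n - 1 denominator) of a per-user metric, e.g. revenue per user."""
    return statistics.variance(historical_values)


def variance_binary(conversions: int, users: int) -> float:
    """Variance p * (1 - p) of a per-user 0/1 metric, e.g. clicked or not."""
    p = conversions / users
    return p * (1 - p)


if __name__ == "__main__":
    # Hypothetical pre-experiment data: revenue per user over some past period.
    past_revenue = [0.0, 12.5, 3.0, 0.0, 7.5, 20.0, 0.0, 4.0]
    print(round(variance_continuous(past_revenue), 2))
    # Hypothetical pre-experiment conversion data: 200 converters out of 1000 users.
    print(round(variance_binary(200, 1000), 4))
```

Either estimate can then be plugged into the sample size formula before any experiment data exists.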

@ayankgupta4796 3 years ago

7:46, should it not be 10 False positives in 200 metrics? Am i missing something

@emma_ding 3 years ago

Thanks for pointing it out!

@stellaying5483 3 years ago

Thanks. Had the same question.

@rogerzhang6296 3 years ago

same question here

@nathannguyen2041 3 years ago

Informative video! What is the difference between A/B testing and analysis of variance (design of experiments topics)? All of these topics are essentially the same, e.g. treatments/factors, randomisation, experiment design, Bonferroni/Kimball inequality, etc. Is there a particular reason why A/B testing is distinguished from the general ANOVA framework? I may have just answered my own question though ("the general ANOVA framework"), but it doesn't hurt to ask someone with more education and work experience than me.

@dunjianxiao4105 3 years ago

LIFESAVER

@Han-ve8uh 3 years ago

At 5:50 it shows (1-0.05)^3 for 3 groups (i assume it means variants also), then is the formula for 2 groups (1-0.05)^2? But this seems wrong because no False positive for 2 groups should just be 0.95? Something confusing here is the concept of number of tests and variants within a test. I'm not sure if these 2 are the same thing? At 5:30 i interpret it as 2 variants in a single test, suddenly at 5:50, the word variant disappeared and changed to 3 groups, making me think it's 3 variants in a single test, but it also looks like 3 tests, each containing 1 group/variant and the "no change" null group?

@emma_ding 3 years ago

Sorry for the confusion, I should've made it clearer. Group refers to the treatment group, 3 groups at 5:50 means there're 4 variants in total. Multiple testing problem is about more than two variants in a single test, it does not relate to multiple A/B tests (each has two variants). This may help you understand the concept better home.uchicago.edu/amshaikh/webfiles/palgrave.pdf "But this seems wrong because no False positive for 2 groups should just be 0.95?" - Why? If we have 2 variants (i.e. one control and one treatment), the false positive rate (Type 1 error or significance level) should be exactly 0.05 thus the probability of seeing no false positive is 0.95.
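
The distinction between variants and treatment groups in this reply maps onto a short calculation. The sketch assumes each treatment is compared against the single control and that the comparisons are independent; the Bonferroni adjustment shown is one common correction, included here as an illustration rather than as the method from the video.

```python
def n_comparisons(n_variants: int) -> int:
    """One control plus (n_variants - 1) treatments gives n_variants - 1 comparisons."""
    return n_variants - 1


def familywise_error(alpha_per_test: float, comparisons: int) -> float:
    """P(at least one false positive) across independent comparisons."""
    return 1 - (1 - alpha_per_test) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / comparisons


if __name__ == "__main__":
    alpha = 0.05
    for variants in (2, 4, 10):
        k = n_comparisons(variants)
        adjusted = bonferroni_alpha(alpha, k)
        print(variants, k,
              round(familywise_error(alpha, k), 4),      # uncorrected
              round(familywise_error(adjusted, k), 4))   # after Bonferroni
```

With 2 variants there is a single comparison and the error rate stays at 0.05; with 4 variants there are 3 comparisons, matching the (1 - 0.05)^3 term at 5:50.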

@jaysun2654 2 years ago

I found a typo at 8:23 that is word of 'lager' should be 'larger'.

@freya_yuen 9 months ago

Why can't I save this video for my playlist /.\

@enlightenment9834 3 years ago

You are so cutee

@seant7907 3 years ago

Emma, I don't mean any offense. Can you add subtitles to your vids? I find it hard to follow what you're speaking because I myself am not a native English speaker. Thank youu!!

@emma_ding 3 years ago

Thanks for the feedback! YouTube has a subtitles function (a "cc" icon at the bottom right of the video) that may help with understanding the content. It may have some errors though; I'll try to upload subtitles as soon as I can.