
Applied Statistics Interview Question | Google Data Scientist Interview 

DataInterview
Subscribers: 31K
Views: 12K

Published: 21 Oct 2024

Comments: 28
@DataInterview · 1 year ago
Want more questions like this? Join the prep community on www.datainterview.com/ 🚀
@gupnir · 1 year ago
This question was asked in my Walmart interview. Wish you'd released this video earlier!
@pragyantiwari3885 · 1 month ago
Another approach I'm thinking about is an INDEPENDENT Z TEST, where the null hypothesis would be that there is no difference in the proportion of heads and tails. And if the null hypothesis is rejected, we can say that the coin is not fair.
@AngelofWar16 · 1 year ago
We could use the binomial distribution; it would be more accurate. We would compute the probability of getting fewer than 30 heads plus the probability of getting more than 70 heads, and then compare it with the 0.05 threshold.
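A sketch of that exact binomial computation in plain Python, counting both tails inclusively, i.e. P(X ≤ 30) + P(X ≥ 70) under the null hypothesis of a fair coin (function names here are illustrative, not from the video):

```python
# Exact binomial test for a fair coin: 70 heads observed in 100 flips.
# Two-sided p-value = P(X <= 30) + P(X >= 70) under H0: p = 0.5.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p0 = 100, 0.5
p_value = (
    sum(binom_pmf(k, n, p0) for k in range(0, 31))
    + sum(binom_pmf(k, n, p0) for k in range(70, 101))
)
print(f"exact two-sided p-value = {p_value:.2e}")  # far below the 0.05 threshold
```

Since the exact p-value is tiny, the binomial test agrees with the z-test's conclusion here; the two only diverge noticeably for small n.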
@sssam844 · 1 year ago
This is how I learned statistics at a German uni.
@pravinborate1500 · 1 year ago
Yes, I think so... this is what hit my mind when I saw the question.
@robertwilsoniii2048 · 1 year ago
Exactly. I can't believe people are paid 6 figures to do z tests...
@robertwilsoniii2048 · 1 year ago
It feels like a practical joke. I thought they'd be doing stuff like mixed-model regression and hardcore generalized modeling.
@kylerasmussen4921 · 1 year ago
Remember, we are learning this for interviewing. A "Z test" computes p-values from the cumulative distribution function of the normal distribution. Since the binomial distribution converges to normal by the CLT, it's much easier to use. The alternative is to use the cumulative distribution function of the binomial (built from its mass function), which in fact changes based on N, which makes it much harder to do in practice.
@dmg-s · 3 months ago
The variance you've computed here is from a Bernoulli distribution, and I would like to know how it follows the normal as n increases. Isn't it the binomial distribution which does so?
@dmg-s · 3 months ago
Could chi2 be the best option here?
@user-wr4yl7tx3w · 1 year ago
It would be great to also see the results in the case of a Bayesian approach.
@xinyili7450 · 1 year ago
Could you give a more detailed answer for why a z-test was used here? Many thanks!
@DataInterview · 1 year ago
Hi! Though the outcome is binary, you can represent this in proportion. When your sample size is large enough, given CLT, the sampling distribution of proportion becomes normal. Z-test assumes that the sampling distribution is normal for the test to work. In this case, the condition is satisfied as mentioned above. So that’s why Z-Test for proportion works here.
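A minimal sketch of that z-test for a proportion, using the 70-heads-out-of-100 numbers discussed in this thread (variable names are illustrative):

```python
from math import sqrt

# One-sample z-test for a proportion.
# H0: p = 0.5 (fair coin); observed 70 heads in 100 flips.
p_hat, p0, n = 0.70, 0.50, 100
se = sqrt(p0 * (1 - p0) / n)   # standard error under the null = 0.05
z = (p_hat - p0) / se          # (0.70 - 0.50) / 0.05 = 4.0
print(z)  # 4.0, well past the two-sided 1.96 cutoff at alpha = 0.05
```

Note the standard error uses the null proportion p0, since the test statistic is computed assuming H0 is true.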
@shritejchavan6222 · 7 months ago
What if we calculate the confidence interval for the population proportion, using the standard error of the estimate sqrt(0.7*0.3/100), and check whether 0.5 lies in the resulting confidence interval?
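That confidence-interval approach can be sketched as follows; it uses the standard error based on the estimate, exactly as this comment proposes (a Wald interval), which differs slightly from the z-test's null-based standard error:

```python
from math import sqrt

# 95% Wald confidence interval for the proportion of heads.
p_hat, n = 0.70, 100
se = sqrt(p_hat * (1 - p_hat) / n)       # sqrt(0.7 * 0.3 / 100) ~= 0.0458
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")   # 0.5 falls outside, so reject fairness
```

Checking whether 0.5 lies inside this interval is equivalent to a two-sided test at alpha = 0.05 (up to the choice of standard error).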
@nu940 · 1 year ago
Thanks, this is a good explanation of the problem
@user-wr4yl7tx3w · 1 year ago
More videos like this. Great format.
@robertwilsoniii2048 · 1 year ago
Why not just do a Chi-square goodness of fit test?
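The chi-square goodness-of-fit variant is a one-liner on the observed/expected counts; for this two-category case it is equivalent to the two-sided z-test (the statistic comes out to z² = 4² = 16):

```python
# Chi-square goodness-of-fit test: observed [70, 30] heads/tails
# vs expected [50, 50] under a fair coin.
observed = [70, 30]
expected = [50, 50]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 16.0, far past the 3.84 critical value at alpha = 0.05, df = 1
```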
@AlirezaAminian · 2 months ago
Your face is on top of the critical solution number (z = 4.0) the whole time.
@jaspreetsingh-nr6gr · 1 year ago
I was expecting MLE parameter estimation, but that is also sort of Bayesian. Do you agree/disagree with that? It does rely on Bayesian principles.
@heyman620 · 1 year ago
It's based on the CLT (central limit theorem). I am not sure how you intend to use MLE for it; maybe I don't understand enough, but it seems like the MLE for this kind of task is just the mean. If you can generate more data, the law of large numbers would let you estimate mu directly!
@kylerasmussen4921 · 1 year ago
@heyman620 The LLN just provides that a stochastic process will tend to the sample mean asymptotically if the process is stationary. The CLT will already start converging by 100 data points. That being said, MLE is a statistical method to try to determine the PDF of the data, but it doesn't make statements like "bias". You still need to do hypothesis testing, whether that be chi-square or otherwise.
@heyman620 · 1 year ago
@kylerasmussen4921 Remember that in the end, all you test is that the means of two groups are different; you can do it in a gazillion ways. When you have a small amount of data you can make some assumptions regarding the distribution, e.g. assume it's normal. That being said, you don't have to; you can actually use Chebyshev's inequality to mimic the test, I believe. But assuming normality makes a lot of sense in this setup. What I think is that all you need is the means, the variance, and a way to know how likely it is to be an estimation error. I guess I just stated it implicitly, but you are right. Given infinite data and finite variance you don't need a test though, i.e., the LLN :). Very nice comment, thanks!
@heyman620 · 1 year ago
It's a nice solution, but I think that once you figure out you can use the normal distribution, talking about "computing the z value" is a little 3rd grade. I think what's more important is knowing the assumptions, i.e., independence, and understanding that this test is, in fact, based on the CLT (this mean is sampled from the distribution of the means!). Honestly, sorry, I am not sure I would give you a perfect score for the interview, since you just used a statistical test blindly (a pass for sure).
@KumarHemjeet · 4 months ago
How would you solve this problem then?
@heyman620 · 4 months ago
It's so much better to just simulate it... You can assume normality because of the convergence in distribution to normal in this setup, which stems from the CLT. However, this convergence happens as n -> infinity, and here n is fixed at 100. Instead, you can actually simulate with P = 0.5 and get a better estimate. Just find how many of the outcomes in your simulation have at least 70 heads; let K be this number and N the number of simulations. Your p-value is K/N (the null hypothesis is that P = 0.5, and you count instances in which it is as observed in the description). That's a form of bootstrapping.
@heyman620 · 4 months ago
@KumarHemjeet Like, what would you do if I told you it was 11 of 13? Would you still assume normality?
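The simulation this commenter describes (K sims with at least 70 heads out of N total sims, p-value = K/N) can be sketched with just the standard library; the seed and simulation count are arbitrary choices, not from the thread:

```python
import random

# Monte Carlo p-value under H0: p = 0.5, per the comment above.
random.seed(0)                       # fixed seed for reproducibility
n_flips, n_sims, observed = 100, 100_000, 70

# K = number of simulated experiments with at least 70 heads.
k = sum(
    1
    for _ in range(n_sims)
    if sum(random.random() < 0.5 for _ in range(n_flips)) >= observed
)
p_value = k / n_sims
print(f"simulated one-sided p-value ~= {p_value:.5f}")  # tiny: reject fairness
```

Since the exact one-sided probability is on the order of 4e-5, most runs of 100,000 simulations will see only a handful of such extreme outcomes; increasing n_sims tightens the estimate.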