
Applied Statistics Interview Question | Google Data Scientist Interview 

DataInterview
Subscribers: 31K
Views: 12K

Published: 21 Oct 2024

Comments: 28
@DataInterview · 1 year ago
Want more questions like this? Join the prep community on www.datainterview.com/ 🚀
@gupnir · 1 year ago
This question was asked in my Walmart interview. Wish you'd released this video earlier!
@pragyantiwari3885 · 1 month ago
Another approach I'm thinking about is an INDEPENDENT Z TEST, where the null hypothesis would be that there is no difference in the proportion of heads and tails. And if the null hypothesis is rejected, we can say that the coin is not fair.
@AngelofWar16 · 1 year ago
We could use the binomial distribution; it would be more accurate. We would compute the probability of getting fewer than 30 heads plus the probability of getting more than 70 heads, and then compare it with the 0.05 threshold.
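A sketch of that exact binomial computation in plain Python, counting both tails inclusively, i.e. P(X ≤ 30) + P(X ≥ 70) under the null hypothesis of a fair coin (function names here are illustrative, not from the video):

```python
# Exact binomial test for a fair coin: 70 heads observed in 100 flips.
# Two-sided p-value = P(X <= 30) + P(X >= 70) under H0: p = 0.5.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p0 = 100, 0.5
p_value = (
    sum(binom_pmf(k, n, p0) for k in range(0, 31))
    + sum(binom_pmf(k, n, p0) for k in range(70, 101))
)
print(f"exact two-sided p-value = {p_value:.2e}")  # far below the 0.05 threshold
```

Since the exact p-value is tiny, the binomial test agrees with the z-test's conclusion here; the two only diverge noticeably for small n.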
@sssam844 · 1 year ago
This is how I learned statistics at a German uni.
@pravinborate1500 · 1 year ago
Yes, I think so... this is what hit my mind when I saw the question.
@robertwilsoniii2048 · 1 year ago
Exactly. I can't believe people are paid 6 figures to do z tests...
@robertwilsoniii2048 · 1 year ago
It feels like a practical joke. I thought they'd be doing stuff like mixed-model regression and hardcore generalized modeling.
@kylerasmussen4921 · 1 year ago
Remember, we are learning this for interviewing. A "Z test" computes p-values from the cumulative distribution function of the normal distribution. Since the binomial distribution converges to normal by the CLT, it's much easier to use. The alternative is to use the cumulative distribution function of the binomial (built from its mass function), which in fact changes based on N, which makes it much harder to do in practice.
@dmg-s · 3 months ago
The variance you've computed here is from a Bernoulli distribution, and I would like to know how it follows the normal as n increases. Isn't it the binomial distribution which does so?
@dmg-s · 3 months ago
Could chi2 be the best option here?
@user-wr4yl7tx3w · 1 year ago
It would be great to also see the results in the case of a Bayesian approach.
@xinyili7450 · 1 year ago
Could you give a more detailed answer for why a z-test was used here? Many thanks!
@DataInterview · 1 year ago
Hi! Though the outcome is binary, you can represent this in proportion. When your sample size is large enough, given CLT, the sampling distribution of proportion becomes normal. Z-test assumes that the sampling distribution is normal for the test to work. In this case, the condition is satisfied as mentioned above. So that’s why Z-Test for proportion works here.
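A minimal sketch of that z-test for a proportion, using the 70-heads-out-of-100 numbers discussed in this thread (variable names are illustrative):

```python
from math import sqrt

# One-sample z-test for a proportion.
# H0: p = 0.5 (fair coin); observed 70 heads in 100 flips.
p_hat, p0, n = 0.70, 0.50, 100
se = sqrt(p0 * (1 - p0) / n)   # standard error under the null = 0.05
z = (p_hat - p0) / se          # (0.70 - 0.50) / 0.05 = 4.0
print(z)  # 4.0, well past the two-sided 1.96 cutoff at alpha = 0.05
```

Note the standard error uses the null proportion p0, since the test statistic is computed assuming H0 is true.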
@shritejchavan6222 · 7 months ago
What if we calculate the confidence interval for the population proportion, using the standard error of the estimate sqrt(0.7*0.3/100), and check whether 0.5 lies in the resulting confidence interval?
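That confidence-interval approach can be sketched as follows; it uses the standard error based on the estimate, exactly as this comment proposes (a Wald interval), which differs slightly from the z-test's null-based standard error:

```python
from math import sqrt

# 95% Wald confidence interval for the proportion of heads.
p_hat, n = 0.70, 100
se = sqrt(p_hat * (1 - p_hat) / n)       # sqrt(0.7 * 0.3 / 100) ~= 0.0458
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")   # 0.5 falls outside, so reject fairness
```

Checking whether 0.5 lies inside this interval is equivalent to a two-sided test at alpha = 0.05 (up to the choice of standard error).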
@nu940 · 1 year ago
Thanks, this is a good explanation of the problem
@user-wr4yl7tx3w · 1 year ago
More videos like this. Great format.
@robertwilsoniii2048 · 1 year ago
Why not just do a Chi-square goodness of fit test?
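The chi-square goodness-of-fit variant is a one-liner on the observed/expected counts; for this two-category case it is equivalent to the two-sided z-test (the statistic comes out to z² = 4² = 16):

```python
# Chi-square goodness-of-fit test: observed [70, 30] heads/tails
# vs expected [50, 50] under a fair coin.
observed = [70, 30]
expected = [50, 50]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 16.0, far past the 3.84 critical value at alpha = 0.05, df = 1
```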
@AlirezaAminian · 2 months ago
Your face is on top of the critical solution number (z = 4.0) the whole time.
@jaspreetsingh-nr6gr · 1 year ago
I was expecting MLE parameter estimation, but that is also sort of Bayesian. Do you agree/disagree with that? It does rely on Bayesian principles.
@heyman620 · 1 year ago
It's based on the CLT (central limit theorem). I am not sure how you intend to use MLE for it; maybe I don't understand enough, but it seems like the MLE for this kind of task is just the mean. If you can generate more data, the law of large numbers would let you estimate mu directly!
@kylerasmussen4921 · 1 year ago
@heyman620 The LLN just provides that a stochastic process will tend to the sample mean asymptotically if the process is stationary. The CLT will already start converging by 100 data points. That being said, MLE is a statistical method to try to determine the PDF of the data, but it doesn't make statements like "bias". You still need to do hypothesis testing, whether that be chi-square or otherwise.
@heyman620 · 1 year ago
@kylerasmussen4921 Remember that in the end, all you test is that the means of two groups are different; you can do it in a gazillion ways. When you have a small amount of data you can make some assumptions regarding the distribution, e.g. assume it's normal. That being said, you don't have to; you can actually use Chebyshev's inequality to mimic the test, I believe. But assuming normality makes a lot of sense in this setup. What I think is that all you need is the means, the variance, and a way to know how likely it is to be an estimation error. I guess I just stated it implicitly, but you are right. Given infinite data and finite variance you don't need a test though, i.e., the LLN :). Very nice comment, thanks!
@heyman620 · 1 year ago
It's a nice solution, but I think that once you figure out you can use the normal distribution, talking about "computing the z value" is a little 3rd grade. I think what's more important is knowing the assumptions, i.e., independence, and understanding that this test is, in fact, based on the CLT (this mean is sampled from the distribution of the means!). Honestly, sorry, I am not sure I would give you a perfect score for the interview, since you just used a statistical test blindly (a pass for sure).
@KumarHemjeet · 4 months ago
How would you solve this problem then?
@heyman620 · 4 months ago
It's so much better to just simulate it... You can assume normality because of the convergence in distribution to normal in this setup, which stems from the CLT. However, this convergence happens as n -> infinity, and here n is fixed at 100. Instead, you can actually simulate with P = 0.5 and get a better estimate. Just find how many of the outcomes in your simulation have at least 70 heads; let K be this number and N the number of simulations. Your p-value is K/N (the null hypothesis is that P = 0.5, and you count instances in which it is as observed in the description). That's a form of bootstrapping.
@heyman620 · 4 months ago
@KumarHemjeet Like, what would you do if I told you it was 11 of 13? Would you still assume normality?
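The simulation this commenter describes (K sims with at least 70 heads out of N total sims, p-value = K/N) can be sketched with just the standard library; the seed and simulation count are arbitrary choices, not from the thread:

```python
import random

# Monte Carlo p-value under H0: p = 0.5, per the comment above.
random.seed(0)                       # fixed seed for reproducibility
n_flips, n_sims, observed = 100, 100_000, 70

# K = number of simulated experiments with at least 70 heads.
k = sum(
    1
    for _ in range(n_sims)
    if sum(random.random() < 0.5 for _ in range(n_flips)) >= observed
)
p_value = k / n_sims
print(f"simulated one-sided p-value ~= {p_value:.5f}")  # tiny: reject fairness
```

Since the exact one-sided probability is on the order of 4e-5, most runs of 100,000 simulations will see only a handful of such extreme outcomes; increasing n_sims tightens the estimate.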