DataInterview
A one-stop shop for candidates preparing for data scientist, data engineering, and ML engineering interviews.
How to Land Data Roles in Chime and Amazon
1:04:09
2 years ago
Interview with an Amazon BI Engineer
23:36
2 years ago
Comments
@ghostofuchiha8124 5 days ago
I recently failed a Google DS interview on A/B testing. It feels really bad, and my next attempt won't be for another year, which makes me even sadder. So yeah, if you're sitting for FAANG, always look into A/B testing and statistics.
@joseharper3314 9 days ago
How did I not see this before the last 5 case studies I bombed 😢. Ty!! 😊
@joseharper3314 11 days ago
This was excellent
@annizheng5289 12 days ago
Nice for a brush-up! Thank you!
@nikitabuynyy6236 14 days ago
Absolutely fantastic video! Thank you so much!
@technicalboyshreyans 14 days ago
Here is a simple version. You do not necessarily need a subquery or a temp table to solve this question:
SELECT w.url, COUNT(a.event_id) AS click_count
FROM google_search_activity a
JOIN google_search_websites w ON a.website_id = w.website_id
WHERE a.event_type = 'clicked'
  AND a.creation_dt BETWEEN '2022-01-01' AND '2022-06-01'
GROUP BY w.url
ORDER BY click_count DESC;
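To try the query above without access to the interview dataset, here is a minimal sketch that runs it against an in-memory SQLite database. The table and column names come from the comment; the column types and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# The commenter's query, unchanged apart from line breaks.
QUERY = """
SELECT w.url, COUNT(a.event_id) AS click_count
FROM google_search_activity a
JOIN google_search_websites w ON a.website_id = w.website_id
WHERE a.event_type = 'clicked'
  AND a.creation_dt BETWEEN '2022-01-01' AND '2022-06-01'
GROUP BY w.url
ORDER BY click_count DESC;
"""


def run_click_counts():
    """Run the query on toy data (schema and rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE google_search_websites (
            website_id INTEGER PRIMARY KEY,
            url TEXT
        );
        CREATE TABLE google_search_activity (
            event_id INTEGER PRIMARY KEY,
            website_id INTEGER,
            event_type TEXT,
            creation_dt TEXT  -- ISO dates stored as text, an assumption
        );
        INSERT INTO google_search_websites VALUES
            (1, 'a.example'), (2, 'b.example');
        INSERT INTO google_search_activity VALUES
            (1, 1, 'clicked', '2022-01-15'),
            (2, 1, 'clicked', '2022-03-02'),
            (3, 2, 'clicked', '2022-02-10'),
            (4, 2, 'viewed',  '2022-02-11'),  -- filtered out: not a click
            (5, 2, 'clicked', '2022-07-01');  -- filtered out: outside range
        """
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_click_counts())
```

One thing to watch: BETWEEN is inclusive on both ends, so if creation_dt carries a time component, the upper bound may need adjusting depending on how the database compares the values.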
@SwatiTirthramGulia 15 days ago
Thank you, Dan, for consistently providing us with valuable and practical content
@sheetalborar6813 15 days ago
Is it the variance of the metric, or of the difference in the metric between control and treatment?
@sheetalborar6813 15 days ago
Can you please clarify what one-sample and two-sample mean?
@chetnamohapatra5181 15 days ago
Hey Dan! This is exactly what I wanted. Not just the template, but going into the details of each step and explaining why we chose one thing over another. This is amazing! I am going to subscribe to practice more of these.
@ayushmathur5984 16 days ago
In #2 at 4:30, how did he work out the mean, median, and mode values? Can anyone explain, please?
@SwatiTirthramGulia 19 days ago
Your ability to unravel complex problems and present them in simpler terms is truly impressive.
@dmg8529 20 days ago
Could a chi-squared test be the best option here?
@dmg8529 20 days ago
The variance you've computed here is from the Bernoulli distribution, and I would like to know how it approaches the normal as n increases. Isn't it the binomial distribution that does so?
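On the Bernoulli versus binomial question above: a sum of n independent Bernoulli(p) trials is Binomial(n, p), and the sample proportion is that sum divided by n, so its variance is p(1 - p)/n. The sketch below compares the exact binomial CDF with its normal approximation; the values n = 1000, p = 0.1 and k = 110 are arbitrary choices for illustration, not numbers from the video.

```python
import math


def variance_of_sample_proportion(p, n):
    """Variance of the mean of n independent Bernoulli(p) trials."""
    return p * (1 - p) / n


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1)
    )


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


exact = binomial_cdf(110, 1000, 0.1)
approx = normal_approx_cdf(110, 1000, 0.1)
print(exact, approx)  # the two values should be close for large n
```

So both statements are compatible: each observation is Bernoulli, the count is binomial, and by the central limit theorem the count (and the proportion) is approximately normal for large n.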
@jsaaiman 22 days ago
Hadn't heard of SMOTE or random forest. Helpful to understand.
@deliadane 27 days ago
This was so helpful and straightforward. Thank you so much!!
@sentiventi 28 days ago
For the second question, why can't I use:
SELECT DISTINCT email, prime_joined_dt
FROM amazon_users
JOIN amazon_transactions USING (user_id)
JOIN amazon_products USING (product_id)
WHERE category IN ('clothing', 'jewelry') AND prime_member = 1
ORDER BY prime_joined_dt ASC;
Am I missing something?
@abhishekchandrashukla3814 28 days ago
She is beautiful
@SuperMsmystery 29 days ago
Can't the alternative hypothesis be that revenue is higher in the target group, tested with a one-tailed test? It doesn't make sense for a business to change the algorithm just to make the same amount of money.
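To make the one-tailed versus two-tailed distinction in the question above concrete, here is a minimal sketch that computes both p-values from the same z statistic. The z value of 1.96 is an arbitrary illustration, not a number from the video.

```python
import math


def standard_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def p_values(z):
    """Return (one-sided upper-tail p-value, two-sided p-value) for a z statistic."""
    one_sided = 1 - standard_normal_cdf(z)        # H1: treatment > control
    two_sided = 2 * (1 - standard_normal_cdf(abs(z)))  # H1: treatment != control
    return one_sided, two_sided


one_sided, two_sided = p_values(1.96)
print(one_sided, two_sided)
```

For a positive z, the one-sided p-value is half the two-sided one, so a one-tailed test has more power in the hypothesized direction. The trade-off is that it cannot flag a change that hurts revenue, which is one common reason for keeping the test two-sided.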
@Mrys534 1 month ago
Airbnb: the problem it's trying to solve is two-sided. On the host side, it lets hosts make money from places they are not living in; on the customer side, it lets customers find cheap housing. Airbnb targets users with those needs, and the onboarding process is the signup process for the two sides.
User journey (customer side): they search for accommodation, land on the main site, and enter dates, destinations, and other filters. They are taken to the lodging page, where they can scroll, read, and visit multiple hosts' pages; then they can check the reviews, location, description, etc.
The site rewards users for loyalty, retains them through email notifications, voucher campaigns, etc., and grows its user base through marketing events, referral programs, promo offers, etc.
@juancruzguillen8288 1 month ago
How would you do it if you wanted to run an A/B/C test?
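One common way to handle the A/B/C case asked about above is to compare each treatment against the control and correct for multiple comparisons. The sketch below uses a Bonferroni correction; the comparison labels and p-values are hypothetical, and this is one option among several, not necessarily what the video recommends.

```python
def bonferroni_decisions(p_values_by_comparison, alpha=0.05):
    """Flag each comparison as significant at the Bonferroni-adjusted level.

    With m comparisons, each one is tested at alpha / m so that the
    family-wise error rate stays at or below alpha.
    """
    m = len(p_values_by_comparison)
    adjusted_alpha = alpha / m
    return {
        name: p <= adjusted_alpha
        for name, p in p_values_by_comparison.items()
    }


# Hypothetical p-values for two treatments compared against control A.
decisions = bonferroni_decisions({"B vs A": 0.03, "C vs A": 0.01})
print(decisions)
```

With two comparisons, each is tested at 0.025 instead of 0.05, which also means the experiment needs a larger sample per variant to keep the same power.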
@learningrealenglish4964 1 month ago
ChatGPT can answer all the questions related to code, so I don't think interviewers should ask those questions anymore. They should focus on how we solve problems with ideas.
@BINTETA.42CAORIS 1 month ago
Walk. Physical
@guieclipse 1 month ago
Is the expected output usually shown in interviews?
@prachihamlai2717 1 month ago
Thank you for creating this video. It provides such a clear and concise overview of A/B testing 👏
@wasami1103 1 month ago
Hi, I'm on the website, but I can't find the corresponding feedback video.
@christiansetzkorn6241 2 months ago
Shouldn't the variance be multiplied by 2?
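For context on the factor of 2 asked about above: when two groups of size n each have variance σ², the variance of the difference in their sample means is σ²/n + σ²/n = 2σ²/n. That is where the 2 in the usual per-group sample-size formula comes from. The sketch below assumes equal group sizes and equal variances; I can't tell from the comment which formula the video used, so treat the function and its inputs as illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample z-test on a difference of means.

    Uses n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2, where
    the factor 2 reflects Var(mean_T - mean_C) = 2 * sigma^2 / n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * sigma**2 * (z_alpha + z_power) ** 2 / delta**2
    return math.ceil(n)


# Hypothetical inputs: unit standard deviation, minimum detectable effect 0.1.
print(sample_size_per_group(sigma=1.0, delta=0.1))
```

With α = 0.05 and 80% power, (z + z)² is about 7.85, so 2 × 7.85 ≈ 16 gives the familiar rule of thumb n ≈ 16σ²/δ² per group.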
@aashgohil2665 2 months ago
The video is incorrectly labeled as market sizing; a more accurate description would be Measuring Success.
@laurab6180 2 months ago
I excelled at the SQL but couldn't ace the Python part.
@OjasBhanarkar 2 months ago
Hey! Can you please share what exactly you were asked in Python?
@vinayanayak1228 1 month ago
Hey, can you tell us what questions you were asked?
@liverpooler1997 2 months ago
I think the candidate was nervous and spoke before thinking many times in the product case. Maybe if she had had a couple of minutes to compose her thoughts, she could've done better.
@xiaowenkang9598 2 months ago
👍thank you so much
@liverpooler1997 2 months ago
Both the interviewer and the candidate were pretty bad here. The candidate was low energy, and the interviewer didn't really dive deeper... just asked surface-level questions and moved on.
@monicaakannan9461 2 months ago
@datainterview Wouldn't the click-through rate or the number of products purchased per user per day be a more effective success metric? If the metric is revenue per user per day, what if the control group purchases only one high-priced product whereas the treatment group purchases multiple low-priced products? The recommendation system works, but revenue would be higher for the control group.
@oorjamathur8459 2 months ago
1. Looking at the past history of the same airline, the primary cause of the flight delay might be operational issues.
2. Geography
3. Weather
4. Number of runways: if there are too few, then when one flight gets delayed, the chances of others getting delayed increase.
@kaiyan9589 2 months ago
What software/tool is used for the A/B testing example in the video?
@ilikegeorgiabutiveonlybeen6705 2 months ago
idc about "acing" anything
@jacksun7999 2 months ago
6:43 should the numerator be cov(X,Y)? Seems there is a 1/(N-1) term missing.
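A quick check of the point raised above, assuming the formula at 6:43 is the Pearson correlation written with raw sums of deviations: the numerator is indeed cov(X, Y) up to a 1/(N-1) factor, but the same factor appears in the product of the two standard deviations, so it cancels. The data below are made up for illustration.

```python
import math


def pearson_from_sums(xs, ys):
    """Pearson r written with raw sums of deviations (no 1/(N-1) anywhere)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_from_cov(xs, ys):
    """Pearson r as cov(X, Y) / (s_X * s_Y), each with the 1/(N-1) factor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_from_sums(xs, ys), pearson_from_cov(xs, ys))
```

Both functions return the same value, so omitting 1/(N-1) is harmless as long as it is omitted from the numerator and the denominator alike.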
@aresdan 3 months ago
Good video, but I think you forgot to ask the most important question: how much money can the business save with your model, and is it worth implementing in the first place? In your case, you want a model that predicts delays in real time for passengers who are already at the airport and due to depart in the next couple of hours. But then either your prediction won't matter at all (a 10-minute or 1-hour delay), since passengers can't do anything with this information, or the predicted delay is so large (5+ hours) that users will ask for extra compensation (since you didn't inform them earlier) and complain that you didn't tell them a day before that the plane was delayed by 5 hours.
You assumed the model is for people who are already at the airport, but I would argue that the actual end users are people whose flight is in a day or so. If the company informs users a day in advance that their flight is delayed by 5 hours, they can make adjustments and might not complain or request refunds as much, compared to when you predict a 5-hour delay while the user is already at the airport. It's more important to predict a delay 12+ hours before departure, so that passengers can decide for themselves when to leave home and whether to change other plans, than a couple of hours before departure, when most likely everyone is at the airport and the delay can be calculated with simple math. And when people are already at the airport, it's too late to tell them that their flight is delayed by 24+ hours. In that case you don't need real-time prediction.
Next point: I don't agree with the 95% cutoff. Yes, there might be some bugs, but I would not cut off like that, since those flight delays might actually be the most important ones (the ones with 24+ hour delays).
@samw2066 3 months ago
Question on the metric tested in the hypothesis test, avg. # of clicks per user: why wouldn't it be avg. revenue per user per day? I believe the user could click the "buy now" button and still exit, or they might buy but spend less money with the new button, etc.
@samw2066 3 months ago
Excellent video. Should the candidate list the guardrail metrics well before step 7 (i.e., at step 1) so they are known and agreed upon before the test starts?
@tanmayagrawal922 3 months ago
This is a neat solution, but it can be made 5x more efficient if you map the sum of the two rolls to an outcome on the seven-sided die. For example (sums -> number on the 7-sided die): (2, 5) -> 1, (3, 4) -> 2, (6) -> 3, and so on (you will need to remove one of the 6 cases that give a sum of 7, though). This way, we use 35 of the 36 possible outcomes instead of just 7.
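The mapping described above can be sketched as follows. The comment only spells out the first three groups, so the remaining ones (7 -> 4, 8 -> 5, (9, 12) -> 6, (10, 11) -> 7) and the choice of which sum-7 pair to reject are my own completion, picked so that every face is backed by exactly 5 of the 36 ordered pairs.

```python
import random
from collections import Counter
from itertools import product

# Sum of two six-sided rolls -> face of the seven-sided die.
# The first three groups follow the comment; the rest are an assumption.
SUM_TO_FACE = {
    2: 1, 5: 1,
    3: 2, 4: 2,
    6: 3,
    7: 4,
    8: 5,
    9: 6, 12: 6,
    10: 7, 11: 7,
}

# One of the six ordered pairs summing to 7 is rejected (arbitrary choice).
REJECTED_PAIR = (1, 6)


def face_for(a, b):
    """Map an ordered pair of d6 rolls to a d7 face, or None to re-roll."""
    if (a, b) == REJECTED_PAIR:
        return None
    return SUM_TO_FACE[a + b]


def roll_d7(rng=random):
    """Rejection sampling: re-roll only on the single rejected pair."""
    while True:
        face = face_for(rng.randint(1, 6), rng.randint(1, 6))
        if face is not None:
            return face


def outcome_counts():
    """Count how many of the 36 ordered pairs map to each d7 face."""
    faces = (face_for(a, b) for a, b in product(range(1, 7), repeat=2))
    return Counter(f for f in faces if f is not None)


print(sorted(outcome_counts().items()))
print(roll_d7())
```

Each pair of rolls is accepted with probability 35/36, compared with 7/36 when only 7 outcomes are used, which is where the 5x figure comes from.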
@iLoveTurtlesHaha 3 months ago
Bro I have an interview in the morning and your video is giving me the confidence I need to go through with it. 😅
@skid-ed2qk 3 months ago
Thanks for sharing! This video on A/B testing is hands down the most informative and practical one I've seen on YouTube.
@Whateverrrrr-no5em 3 months ago
The best A/B testing video.
@aakashdusane 3 months ago
What if the same user makes multiple purchases during the day? Does it make sense to use revenue per session or revenue per search as the metric? If the same user makes multiple purchases in a day and the metric is revenue per user per day, won't we have to change the randomization unit?
@alecryan8220 3 months ago
Hey Dan, can you talk about the decision to pick revenue-per-user-per-day? Doesn't using this metric require the delta method or bootstrapping if randomizing by user? Thanks!
@abeldomokos 3 months ago
Thank you very much for explaining this problem.
@maggiegeng8596 3 months ago
Why do we use a subquery for the second question instead of using joins and putting the filter at the end?
@TheRaju991 3 months ago
I am sorry, I have found a lot of good content on this channel, but this video has to be the worst one.