Amazon Data Science Business Case | FAANG Interview Prep

👉 Land Your Dream Data Job. Visit www.datainterview.com 🚀
👉 Join the Data Scientist Interview Bootcamp: www.datainterview.com/bootcam...
====== ✅ Details ======
In data science interviews, business cases are common. Top companies such as Meta, Amazon and Google ask business case questions to assess the candidate's technical competency, communication and business acumen.
In this video, Dan (a former Google data scientist) provides a step-by-step guide on how to crack the Amazon business case.
Here are the topics covered👇
📚 How business cases are conducted in interviews
✍️ How to structure your response
⭐ Business case - estimate CLV of Amazon customers
🚀 Looking for data science interview prep? Get access to 1:1 coaching, courses and Slack community groups on datainterview.com/
👍 Make sure to subscribe, like, and share!
====== ⏱️ Timestamps ======
00:00 - Amazon Data Science Business Case | FAANG Interview Prep
00:38 - Problem Statement
01:32 - Business Process
10:24 - Modeling Solution
====== 📚 Other Useful Contents ======
1. How to Ace Product Metric Questions 👉 bit.ly/3xeCgOl
2. Cracking Data Science Business Cases 👉 bit.ly/3trCHDP
3. Crack the Amazon Data Scientist Interview 👉 bit.ly/3MyC6XJ
====== Connect ======
📗 LinkedIn - / danleedata
📘 Medium - / datainterview
📧 Email - dan@datainterview.com

Published: Aug 3, 2024

Comments: 19

@datamar2891 1 year ago

Hey Dan, my name is Sam. I'm new to this and early in my DS/MLE journey. I've only watched one of your case videos, and I'm going to take a stab at this one using the template you used there. I would definitely appreciate some feedback.

Clarifying Questions:
1. What does the current process look like?
- How does Amazon currently calculate the CLV of its shoppers?
- What's the average lifespan of an Amazon customer?
- What's Amazon's profit percentage from a single sale?
2. What's the business objective?
- Is the goal to improve CLV, considering the latter half of the prompt?
3. Data sources:
- Customer-specific data: number of purchases, whether they were a Prime member (month-to-month vs. annual), total spent, time duration, address (including city and country)

Fleshing out the ML Pipeline:
1) EDA
- Observe the distributions of the features
- Correlation analysis between the Xs and Y using something like Pearson's correlation
2) Data Pre-processing
- Handle missing values: 1) If they account for more than 50% of the data, remove the feature. 2) Build a classifier with that feature as the target variable to fill in the missing values. 3) You can use clustering.
- Encode categorical features: 1) If the feature has few categories, you can do one-hot/dummy encoding. 2) If it has a high number of categories, you can do numerical target encoding.
- Normalize data (no need to if using a decision-tree-based model like Random Forest)
3) Feature Engineering
- If we had more detailed purchase history with timestamps, we could do some time decomposition: month - day - hour - minute - second
- Can't think of anything else
4) Feature Selection
- To avoid the curse of dimensionality, we can try PCA, Random Forest variable importance, or L1 regression
5) Model Selection
- Since we are predicting a continuous, real value, we will be doing regression
- Random Forest Regressor or XGBoost Regressor comes to mind, given their robust reputation
- Hyperparameter tuning: depth of trees, number of trees, min number of samples per leaf, pruning, learning rate in the case of XGBoost
6) Model Evaluation
- MSE, and possibly MAE, since it is less sensitive to outliers and we don't necessarily want our model to be too sensitive to them
- K-fold cross-validation to make sure our model generalizes well
7) Productionalize
- Not too experienced here, other than using a REST API with one of the cloud service providers like AWS or Azure, or even something like Databricks, which I know has that functionality
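
The Model Evaluation step above says MAE is less sensitive to outliers than MSE. A minimal sketch of why, using plain Python and made-up CLV values (the numbers and the `mse`/`mae` helper names are illustrative assumptions, not data from the video):

```python
def mse(y_true, y_pred):
    """Mean squared error: errors are squared, so large misses dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: each miss contributes in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical CLV values (in dollars) for four typical customers.
y_true = [100.0, 120.0, 90.0, 110.0]
y_pred = [105.0, 115.0, 95.0, 105.0]

# Same data plus one very high-spend customer the model badly underestimates.
y_true_outlier = y_true + [5000.0]
y_pred_outlier = y_pred + [200.0]

print("MSE:", mse(y_true, y_pred), "->", mse(y_true_outlier, y_pred_outlier))
print("MAE:", mae(y_true, y_pred), "->", mae(y_true_outlier, y_pred_outlier))
```

In this toy data a single badly missed customer inflates MSE by several orders of magnitude more than it inflates MAE, which is the trade-off to weigh when high spenders matter to the business.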

@DataInterview 1 year ago

Hey Sam 👋. Welcome aboard! Interviewing is challenging, but with practice, you will succeed =D. Thanks for your solution. I've placed my feedback in quotation marks below 👇

Clarifying Questions:
1. What's the current process look like:
- How does amazon currently calculate CLV of their shoppers?
- What's the average lifespan of a customer of Amazon?
- What's Amazon's profit percentage from a single sale?
👉 “Great questions to start off. The only thing I would advise is to generally avoid questions whose answers the interviewer would presume you already know. This is to demonstrate expertise from the get-go. For instance, if you are a practitioner in the field, or you've already done some research, you would anticipate that CLV is calculated over some fixed horizon, let's say 1 month, 3 months, 6 months, or 12 months. I would provide some info up front, then confirm with the interviewer.”
2. What's the business objective:
- Is the goal to improve CLV considering the latter half of the prompt?
3. Data sources:
- Customer-specific data: number of purchases, whether they were a Prime member (month-to-month vs. annual), total spent, time duration, address (including city and country)
👉 “Great list of signals!”

Fleshing out the ML Pipeline:
1) EDA
- Observe distributions of the features
- Correlation analysis between Xs and Y using something like Pearson's correlation
2) Data Pre-processing
- Handle missing values: 1) If they account for +50% of data remove feature. 2) Build a classifier and use that feature as your target variable. 3) You can use clustering.
- Encode categorical features: 1) If that feature has few categories then you can do one-hot/dummy encoding. 2) If high amount of categories you can do numerical target encoding.
- Normalize data (no need to if using decision tree based model like Random Forest)
3) Feature Engineering
- If we had more detailed purchase history with timestamps we can do some time decomposition: month - day - hour - minute - second
- Can't think of anything else
4) Feature Selection
- To avoid curse of dimensionality we can try PCA, Random Forest variable importance or L1 Regression
👉 “Agreed, but remember interpretation matters for the latter case. So, I'd stick to L1 Regression or RF.”
5) Model Selection
- Since we are predicting continuous, real-value we will be doing Regression
- Random Forest Regressor comes to mind or XGBoost Regressor given their robust reputation
- Hyperparameter tuning: depth of trees, number of trees, min number of samples per leaf, pruning, learning rate in case of XGBoost
👉 “+1”
6) Model Evaluation
- MSE and possibly try MAE since it is less sensitive to outliers and we don't want our model necessarily too sensitive to them
- K-fold cross validation to make sure our model generalizes well
👉 “+1”
7) Productionalize
- Not too experienced here other than using a REST API using one of the cloud service providers like AWS, Azure or even something like Databricks which I know has that functionality
👉 “Depending on the level of depth the interviewer expects, follow-up questions may vary. But generally the framework works like this: you need to build an ETL, establish a cron job (not sure what Amazon readily uses, but you could use Airflow), and wrap your model in a REST API that includes preprocessing, training, and prediction. The results should be stored in DBs for prediction, monitoring, and so on.”

Hope this helps! 😉 If you have any questions, feel free to reach out at dan@datainterview.com. Happy interviewing!
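
Dan's feedback above notes that CLV is typically calculated over a fixed horizon (1, 3, 6, or 12 months). A minimal sketch of building such a target from a transaction log might look like the following. The function name `clv_labels`, the tuple layout, the use of raw spend rather than profit, and approximating months as day counts are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def clv_labels(transactions, horizon_days):
    """Sum each customer's spend within `horizon_days` of their first purchase.

    `transactions` is a list of (customer_id, purchase_date, amount) tuples.
    """
    # Find each customer's first purchase date.
    first_purchase = {}
    for customer_id, purchase_date, _ in transactions:
        if customer_id not in first_purchase or purchase_date < first_purchase[customer_id]:
            first_purchase[customer_id] = purchase_date

    # Add up only the purchases that fall inside the horizon window.
    totals = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        cutoff = first_purchase[customer_id] + timedelta(days=horizon_days)
        if purchase_date < cutoff:
            totals[customer_id] += amount
    return dict(totals)


# Hypothetical transaction log.
transactions = [
    ("c1", date(2023, 1, 5), 40.0),
    ("c1", date(2023, 2, 10), 25.0),
    ("c1", date(2023, 9, 1), 60.0),
    ("c2", date(2023, 3, 1), 15.0),
    ("c2", date(2023, 3, 20), 30.0),
]

print(clv_labels(transactions, horizon_days=90))   # roughly a 3-month horizon
print(clv_labels(transactions, horizon_days=365))  # roughly a 12-month horizon
```

In practice the label is only valid for customers who have been observed for the full horizon, so a real pipeline would also filter out customers whose first purchase is too recent.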

@datamar2891 1 year ago

@DataInterview Thank you for your reply, Dan. I appreciate the feedback.

@chetnamohapatra5181 15 days ago

Hey Dan! This is exactly what I wanted. Not just the template, but going into the details of each step and explaining why we choose one option over another. This is amazing! I am going to subscribe to practice more of these.

@Aidan_Au 2 years ago

Thank you, Dan, for walking us through another question and providing commentary. Whoever doesn't leave a comment for your feedback is missing out!

@DeepakRajput-wl2pi 2 years ago

Please do more of these

@sonug2924 1 year ago

Great work! Thanks, Dan

@mariullom8105 1 year ago

I absolutely love your videos.

@rezarafieirad 1 year ago

Just another perfect video. Thanks!

@mohammadrahmaty521 10 months ago

Amazing! Thanks!

@TheMISBlog 2 years ago

Very informative, thanks Dan

@KS-df1cp 2 years ago

Thank you! One thing that I learnt is your style! I jumped straight into defining the target, getting features, and talking about metrics. I think my approach is robotic and the impact is missing. I will definitely practice some more on your channel. There is no way my thoughts can come that fast, though! Is that acceptable? How can I think faster in a system design interview? Is practice the only way? Thank you.

@DataInterview 2 years ago

Thanks! Many of us start out awkward when we try something new for the first time. But with practice and experience, you will get better! - Dan

@KS-df1cp 2 years ago

@DataInterview Thank you :) This is really helpful.

@anthonykinruizcalvo7516 2 years ago

Nice video! Could you please explain how you do numerical encoding when there is a large number of products? I couldn't understand how you avoid adding too many features if you are going to get the avg/total per product. Thanks!

@DataInterview 1 year ago

Hey Anthony, thanks for the question. Numerical encoding works like this: for each categorical value, you build a numerical representation of it. Suppose you have a categorical feature of product items. Instead of one-hot encoding each product, which would cause your feature space to explode, you aggregate some continuous variables per product, such as historical sales, volume, and inventory count. Now you have a dense representation for each product item, and you merge those columns into your model data. Hope this clarifies!
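
The aggregation-based encoding Dan describes above can be sketched as follows. This is only an illustration under assumptions: the function names, the column names, the choice of aggregates, and the zero fallback for products without history are hypothetical, not taken from the video.

```python
from collections import defaultdict


def product_encodings(sales_history):
    """Aggregate per-product history into a few dense numeric features.

    `sales_history` is a list of (product_id, sale_amount, units_sold) tuples.
    """
    amounts = defaultdict(list)
    units = defaultdict(int)
    for product_id, sale_amount, units_sold in sales_history:
        amounts[product_id].append(sale_amount)
        units[product_id] += units_sold

    return {
        product_id: {
            "product_total_sales": sum(values),
            "product_avg_sale": sum(values) / len(values),
            "product_total_units": units[product_id],
        }
        for product_id, values in amounts.items()
    }


def merge_encodings(model_rows, encodings):
    """Attach the product features to each model row (a left join on product_id)."""
    # Assumption: products with no history get zeros; other fallbacks are possible.
    default = {"product_total_sales": 0.0, "product_avg_sale": 0.0, "product_total_units": 0}
    return [{**row, **encodings.get(row["product_id"], default)} for row in model_rows]


# Hypothetical data: three historical sales and three rows of model data.
sales_history = [("p1", 10.0, 2), ("p1", 30.0, 1), ("p2", 5.0, 4)]
model_rows = [
    {"customer_id": "c1", "product_id": "p1"},
    {"customer_id": "c2", "product_id": "p2"},
    {"customer_id": "c3", "product_id": "p9"},  # product without any history
]

merged = merge_encodings(model_rows, product_encodings(sales_history))
for row in merged:
    print(row)
```

However many distinct products exist, each row gains only three numeric columns, whereas one-hot encoding would add one column per product.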

@anthonykinruizcalvo7516 1 year ago

@DataInterview Thank you very much!