
Meta and Uber SQL Problems | Data Science Interview 

DataInterview · 31K subscribers
9K views · Published 21 Oct 2024

Comments: 36
@Aidan_Au · 2 years ago
Thank you for walking us through two SQL questions. You explained your thought process really well. I'm gonna get good at communicating like you do! In the FB question, in the first CTE, we can alternatively use an AND clause in the JOIN to filter the dates too, so that only the rows with a signup date after 2020-01-01 get matched/joined. When calculating the proportion, I'd multiply by 1.0 just in case the format/type is not float. Some people like to do (time_per_userevent/time_per_user)::float too.
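The *1.0 guard matters because many engines truncate integer division. A minimal sketch using Python's sqlite3 (the table and values are invented purely for illustration):

```python
import sqlite3

# Hypothetical two-column table, just to show the integer-division pitfall.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (part INTEGER, total INTEGER)")
conn.execute("INSERT INTO t VALUES (30, 120)")

# Integer / integer truncates in SQLite (and several other engines).
truncated = conn.execute("SELECT part / total FROM t").fetchone()[0]
# Multiplying by 1.0 first promotes the expression to a float.
promoted = conn.execute("SELECT part * 1.0 / total FROM t").fetchone()[0]
print(truncated, promoted)  # 0 0.25
```

The `::float` cast mentioned above is Postgres syntax; `CAST(x AS FLOAT)` is the portable spelling.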
@Aidan_Au · 2 years ago
WITH userevent_time AS (
    SELECT u.user_id, u.name, a.event,
           SUM(a.time_spent) AS time_per_userevent
    FROM User u
    JOIN Activity a
      ON u.user_id = a.user_id
     AND u.date_signup > '2020-01-01'
    GROUP BY 1, 2, 3
),
total_time_per_user AS (
    SELECT user_id, SUM(time_per_userevent) AS total
    FROM userevent_time
    GROUP BY 1
)
SELECT ua.user_id, ua.name, ua.event,
       ua.time_per_userevent * 1.0 / t.total AS proportion
FROM userevent_time ua
JOIN total_time_per_user t ON ua.user_id = t.user_id

P.S. I like to be explicit with column aliases because it's easy to forget and make mistakes there.
@Aidan_Au · 2 years ago
Q2 Uber

WITH rider_ride_count AS (
    SELECT user_id, COUNT(*) AS ride_count
    FROM RideStatus
    WHERE status_of_order = 'Success'
    GROUP BY 1
),
uber_one_ride_ranking AS (
    SELECT u.market_id, u.user_id, r.ride_count,
           RANK() OVER (PARTITION BY u.market_id ORDER BY r.ride_count DESC) AS rk
    FROM UserProfile u
    JOIN rider_ride_count r
      ON u.user_id = r.user_id
     AND u.uber_one = 1
    ORDER BY 1, 4
)
SELECT *
FROM uber_one_ride_ranking
WHERE rk <= 3
@oscararmandocisnerosruvalc8503 · 2 years ago
Hello bro, have you been practicing using StrataScratch? Do you recommend buying the pro plan? Regards!!
@Aidan_Au · 2 years ago
@oscararmandocisnerosruvalc8503 Hi Oscar. I use Dan's Daetama now daily.
@oscararmandocisnerosruvalc8503 · 2 years ago
@Aidan_Au Cool, I think Dan and Nate from StrataScratch are the top SQL masters. Regards!!
@Aidan_Au · 2 years ago
Joma Tech has a SQL course and I learned a lot from it. I practice SQL on StrataScratch because they have a decent-sized question bank and an IDE. Thank you Dan for providing a list of clauses/commands that we should know in SQL. Would you please make a video about what we should do now that Meta has confirmed the hiring freeze?
@DataInterview · 2 years ago
Given the recent hiring freezes across companies, I think it would make for good video content. Good idea!
@Aidan_Au · 2 years ago
@DataInterview Looks like Twitter just announced a hiring freeze too, but that shouldn't be surprising with all the movement going on over there.
@siyanhu · 2 years ago
For the first SQL problem, can you simply use a window-function SUM over the events so that the query is shorter?
@DataInterview · 2 years ago
Yup, that is another way. In fact, it's probably more efficient, as there are fewer records to JOIN. Thanks!
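A sketch of that window-function variant, run on SQLite via Python. The schema mirrors the column names used in the thread (User(user_id, name, date_signup), Activity(user_id, event, time_spent)); the rows are invented sample data:

```python
import sqlite3

# One pass: aggregate per (user, event), then a window SUM over the derived
# table replaces the second CTE and the extra join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (user_id INT, name TEXT, date_signup TEXT);
CREATE TABLE Activity (user_id INT, event TEXT, time_spent INT);
INSERT INTO User VALUES (1, 'Ana', '2020-05-01'), (2, 'Bo', '2019-03-01');
INSERT INTO Activity VALUES (1, 'like', 30), (1, 'post', 90), (2, 'like', 10);
""")

rows = conn.execute("""
    SELECT user_id, name, event,
           total_event * 1.0 / SUM(total_event) OVER (PARTITION BY user_id)
             AS proportion
    FROM (
        SELECT u.user_id, u.name, a.event, SUM(a.time_spent) AS total_event
        FROM User u
        JOIN Activity a ON a.user_id = u.user_id
        WHERE u.date_signup > '2020-01-01'
        GROUP BY 1, 2, 3
    )
    ORDER BY user_id, event
""").fetchall()
print(rows)  # [(1, 'Ana', 'like', 0.25), (1, 'Ana', 'post', 0.75)]
```

Bo is filtered out by the signup-date predicate; Ana's proportions sum to 1 per user, as expected.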
@shampurnadas7664 · a year ago
Hi Dan, I can understand the SUM of time_spent as time_per_user_event, but I cannot understand summing that time_per_user_event again. Please explain.
@rajkamal2010 · 2 years ago
Very informative session
@nitishgalat5483 · 2 years ago
Hi @dan, I just love your content. Can you suggest some websites where we can practice this type of question? Thanks in advance.
@DataInterview · 2 years ago
Check out datainterview.com. There’s also LeetCode.
@SL-hp2zk · 2 years ago
Thanks for this great content, Dan! What level of data scientist position is this question for?
@DataInterview · 2 years ago
The questions are medium difficulty, so mid-to-senior level. Entry level in some cases, but that depends on the company.
@rahulruke307 · 2 months ago
Can anyone tell me what proportion they are getting as the output?
@Aidan_Au · 2 years ago
Thank you for covering a question on window functions. Your student Matt at Meta said that we need to be able to write window functions in our sleep. This Uber question can be made a bit harder by asking us to calculate the success rate and testing us on DENSE_RANK(). I posted my solution in the reply right below.
@DataInterview · 2 years ago
Right on, thanks Aidan!
@oscararmandocisnerosruvalc8503 · 2 years ago
Please try StrataScratch!!!
@krishnendudey7155 · 2 years ago
Try providing the DDL also - that would be very beneficial. Thanks 👍
@DataInterview · 2 years ago
Will do!
@marathiManus10 · 2 years ago
On another note, I am waiting for that fateful day when YouTube will allow markdown and an efficient way to tag people in comments.
@DataInterview · 2 years ago
Agreed!
@garyboy7135 · 2 years ago
Uber question: without a subquery (12 lines of code, using Snowflake syntax):

SELECT up.market_id, rs.user_id,
       COUNT(rs.ride_id) AS ride_count,
       RANK() OVER (PARTITION BY market_id ORDER BY ride_count DESC) AS rank
FROM RideStatus rs
JOIN UserProfile up ON rs.user_id = up.user_id
WHERE rs.status_of_order = 'Success'
  AND up.uber_one = 1
GROUP BY 1, 2
QUALIFY rank <= 3
@marathiManus10 · 2 years ago
Thanks Dan! @DataInterview always gives amazing content! Should we use DENSE_RANK in Q2 Uber, so that we capture multiple user_ids with the same number of rides? Let's say, for market_id 2, there are 9 user_ids with successful rides, AND:

user_ids A, B, C, D, E had 10 rides
user_id F had 5 rides
user_ids G, H had 3 rides
user_id I had 2 rides

THEN the top 3 riders by ride count in market 2 should be the users with ride counts 10, 5 and 3 - am I right? Please correct me if I am wrong!

user_id || ride_count || rank || dense_rank
A       || 10         || 1    || 1
B       || 10         || 1    || 1
C       || 10         || 1    || 1
D       || 10         || 1    || 1
E       || 10         || 1    || 1
F       || 5          || 6    || 2
G       || 3          || 7    || 3
H       || 3          || 7    || 3
I       || 2          || 9    || 4
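The gap behaviour described in the comment above can be checked directly. A small harness in Python's sqlite3, loading the same hypothetical ride counts:

```python
import sqlite3

# RANK() leaves gaps after ties; DENSE_RANK() does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (user_id TEXT, ride_count INT)")
conn.executemany("INSERT INTO rides VALUES (?, ?)",
                 [('A', 10), ('B', 10), ('C', 10), ('D', 10), ('E', 10),
                  ('F', 5), ('G', 3), ('H', 3), ('I', 2)])

rows = conn.execute("""
    SELECT user_id, ride_count,
           RANK()       OVER (ORDER BY ride_count DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY ride_count DESC) AS drnk
    FROM rides
    ORDER BY ride_count DESC, user_id
""").fetchall()
print(rows[5])   # ('F', 5, 6, 2)  -- RANK jumps to 6, DENSE_RANK moves to 2
print(rows[-1])  # ('I', 2, 9, 4)
```

So with RANK, `rk <= 3` would return only the five users tied at 10 rides, while DENSE_RANK captures the top three distinct ride counts.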
@DataInterview · 2 years ago
Yup, works!
@Aidan_Au · 2 years ago
Question 2 Uber

WITH user_ride_count AS (
    SELECT user_id, COUNT(*) AS ride_count
    FROM RideStatus
    WHERE status_of_order = 'Success'
    GROUP BY 1
),
uber_one_user_ride_count AS (
    SELECT u.market_id, u.user_id, r.ride_count,
           RANK() OVER (PARTITION BY u.market_id ORDER BY r.ride_count DESC) AS rk
    FROM UserProfile u
    JOIN user_ride_count r
      ON u.user_id = r.user_id
     AND u.uber_one = 1
)
SELECT market_id, user_id, ride_count, rk
FROM uber_one_user_ride_count
WHERE rk <= 3
@DataInterview · 2 years ago
Yup, good solution!
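For what it's worth, the two-CTE structure can be smoke-tested end to end. A minimal harness in Python's sqlite3; the table contents are invented, and only the shape of RideStatus/UserProfile comes from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RideStatus (ride_id INT, user_id INT, status_of_order TEXT);
CREATE TABLE UserProfile (user_id INT, market_id INT, uber_one INT);
-- user 5 is not an Uber One member and should be excluded.
INSERT INTO UserProfile VALUES (1,10,1), (2,10,1), (3,10,1), (4,10,1), (5,10,0);
-- Success counts: user 1 -> 3, user 2 -> 2, user 3 -> 4, user 4 -> 1.
INSERT INTO RideStatus VALUES
  (100,1,'Success'), (101,1,'Success'), (102,1,'Success'), (103,1,'Cancelled'),
  (104,2,'Success'), (105,2,'Success'),
  (106,3,'Success'), (107,3,'Success'), (108,3,'Success'), (109,3,'Success'),
  (110,4,'Success'),
  (111,5,'Success');
""")

rows = conn.execute("""
    WITH user_ride_count AS (
        SELECT user_id, COUNT(*) AS ride_count
        FROM RideStatus
        WHERE status_of_order = 'Success'
        GROUP BY 1
    ),
    ranked AS (
        SELECT u.market_id, u.user_id, r.ride_count,
               RANK() OVER (PARTITION BY u.market_id
                            ORDER BY r.ride_count DESC) AS rk
        FROM UserProfile u
        JOIN user_ride_count r
          ON u.user_id = r.user_id AND u.uber_one = 1
    )
    SELECT market_id, user_id, ride_count, rk
    FROM ranked
    WHERE rk <= 3
    ORDER BY rk
""").fetchall()
print(rows)  # [(10, 3, 4, 1), (10, 1, 3, 2), (10, 2, 2, 3)]
```

User 4 (rank 4) and the non-member user 5 both drop out, as intended.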
@frashertseng9426 · 2 years ago
with _temp as (
    select *
    from activity as a
    left join user as b on a.userid = b.userid
    where b.date_signup >= '2020-01-01'
)
select name, user_id, event_type,
       1.0 * sum(time_spent) / sum(sum(time_spent)) over (partition by name, user_id) as pct
from _temp
group by 1, 2, 3
order by 1, 2, 3

-------

with _temp as (
    select *
    from ride as r
    left join user as u on r.user_id = u.user_id
    where u.uber_one = 1
)
select *
from (
    select mkt_id, user_id,
           sum(case when status = 'successed' then 1 else 0 end) as success_rides,
           dense_rank() over (partition by mkt_id
                              order by sum(case when status = 'successed' then 1 else 0 end) desc) as rnk
    from _temp
    group by 1, 2
)
where rnk in (1, 2, 3)
@garyboy7135 · 2 years ago
FB question, using the window function method:

with user_signed_up_2021 AS (
    SELECT *
    FROM User
    WHERE date_signup >= '2021-01-01'
),
user_event_time_spent AS (
    SELECT user_id, event, SUM(time_spent) AS total_timespent_event
    FROM Activity
    GROUP BY 1, 2
),
user_event_time_calculation AS (
    SELECT signup.user_id, signup.name, timespent.event, timespent.total_timespent_event,
           SUM(total_timespent_event) OVER (PARTITION BY signup.user_id) AS total_timespent_user
    FROM user_signed_up_2021 signup
    LEFT JOIN user_event_time_spent timespent ON signup.user_id = timespent.user_id
)
SELECT name, user_id, event,
       total_timespent_event / total_timespent_user AS proportion
FROM user_event_time_calculation
@Aidan_Au · 2 years ago
Question 1 FB

WITH userevent_time AS (
    SELECT u.user_id, u.name, a.event,
           SUM(a.time_spent) AS time_per_userevent
    FROM User u
    JOIN Activity a
      ON u.user_id = a.user_id
     AND u.date_signup > '2020-01-01'
    GROUP BY 1, 2, 3
),
total_time_per_user AS (
    SELECT user_id, SUM(time_per_userevent) AS total
    FROM userevent_time
    GROUP BY 1
)
SELECT ua.name, ua.user_id, ua.event,
       ua.time_per_userevent * 1.0 / t.total AS proportion
FROM userevent_time ua
JOIN total_time_per_user t ON ua.user_id = t.user_id

P.S. I like to be explicit with column aliases because it's easy to forget and make mistakes there.
@marathiManus10 · 2 years ago
Hi @Aidan_Au. That's an elegant solution - especially the *1.0 hack. SQL newbie here - especially to window functions. Would this solution work for Q1 - Meta?

WITH userevent_time AS (
    SELECT U.user_id, U.name, A.event, A.time_spent,
           SUM(A.time_spent) OVER (PARTITION BY U.user_id) AS user_total_time
    FROM User U
    LEFT JOIN Activity A ON U.user_id = A.user_id
    WHERE U.date_signup > '2020-01-01'
) UET
SELECT user_id, name, event,
       time_spent * 1.0 / user_total_time AS proportion
FROM UET
GROUP BY 1, 2, 3;
@Aidan_Au · 2 years ago
@marathiManus10 Thank you, I'm flattered. Dan is a master of subqueries. There's a syntax error in your CTE: it should start with 'WITH (CTE name) AS' without any parameters (?) in your query. Also, we don't have to use a window function in this question. We're not looking for a cumulative sum for each row; rather, we can just use GROUP BY aggregation to get the aggregated time spent per event per user.
@marathiManus10 · 2 years ago
@Aidan_Au Thanks! I mimicked this style from some other video - I have edited the syntax error. I tried a window function to avoid creating an additional table and making another join in the final query. My understanding was: PARTITION allows aggregating along a single dimension while keeping every row, which GROUP BY does not allow. Do you think my query is logically correct? I am really interested in clearing up my understanding.
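The distinction the thread is circling can be seen side by side: GROUP BY collapses rows to one per group, while a window PARTITION BY keeps every row and attaches the aggregate alongside it. A tiny illustration in Python's sqlite3, with invented rows:

```python
import sqlite3

# Hypothetical activity table: two rows for user 1, one for user 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (user_id INT, time_spent INT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 30), (1, 90), (2, 10)])

# GROUP BY: one output row per user.
grouped = conn.execute(
    "SELECT user_id, SUM(time_spent) FROM a GROUP BY user_id ORDER BY user_id"
).fetchall()
# Window PARTITION BY: every input row survives, with the per-user total attached.
windowed = conn.execute(
    "SELECT user_id, time_spent, SUM(time_spent) OVER (PARTITION BY user_id) "
    "FROM a ORDER BY user_id, time_spent"
).fetchall()
print(grouped)   # [(1, 120), (2, 10)]
print(windowed)  # [(1, 30, 120), (1, 90, 120), (2, 10, 10)]
```

Both approaches can compute the per-event proportion; the window form just avoids the second CTE and the join back.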