Thank you for walking through us two SQL questions. You explained your thought process really well. I’m gonna get good at communicating like you do! In the FB question, in the first CTE, alternatively, we can use the AND clause to filter the dates too, so that only the rows with a signup date after 2020-01-01 will get matched/joined. When calculating the proportion, I’d *1.0 just in case that the format/type is not float. Some people like to do (time_per_userevent/time_per_user)::float too.
WITH userevent_time AS ( SELECT u.user_id, u.name, a.event, SUM(a.time_spent) AS time_per_userevent FROM User u JOIN Activity a ON u.user_id = a.user_id AND u.date_signup > ‘ > ‘ 2020 -01 -01 ’ GROUP BY 1, 2, 3 ), total_time_per_user AS ( SELECT user_id, SUM(time_per_userevent) AS total FROM user_activity_time GROUP BY 1 ) SELECT ua.user_id, ua.name, ua.event, ua.time_per_event * 1.0 / t.total AS proportion FROM userevent_time ua JOIN total_time_per_user t ON ua.user_id = t.user_id P.S. I like to explicit use the aliases for the columns because it's easy to forget and make mistake there
Q2 Uber WITH rider_ride_count AS ( SELECT user_id, count(*) AS ride_count FROM RideStatus WHERE status_of_order = 'Success' GROUP BY 1 ), uber_one_ride_ranking AS ( SELECT u.market_id, u.user_id, r.ride_count, RANK() OVER ( PARTITION BY u.market_id ORDER BY r.ride_count DESC ) AS rk FROM UserProfile u JOIN rider_ride_count s ON u.user_id = r.user_id AND uber_one = 1 ORDER BY 1, 4 ) SELECT * FROM uber_one_ride_ranking WHERE rk
Joma Tech has a SQL course. I learned a lot from his SQL course. I practice SQL on StrataScratch because they have a decent size of question banks there and an IDE. Thank you Dan for providing a list of clauses/commends that we should know in SQL. Would you please make a video about what we should do when Meta recently confirmed the hiring freeze?
Hi Dan I can understand the sum of tumescent as time_per_user_event but I can not understand again the sum of that time_per_user_event. Please explain.
Thank you for covering a question on Windows Function. Your student Matt in Meta said that we need to be able to write windows function during our sleep. This Uber question can be a bit harder by asking us to calculate the success rate and testing us on DENSE_RANK() I posted my solution in the reply right below here.
Uber question: 1. Without subquery (12 line of code, using snowflake syntax) SELECT up.market_id, rs.user_id, count(rs.ride_id) as ride_count , rank() OVER(partitioned by market_id order by ride_count desc) rank FROM RideStatus rs JOIN UserProfile up on rs.user_id = up.user_id WHERE rs.status_of_order = 'Success' AND up.uber_one = 1 GROUP BY 1, 2 QUALIFY rank
Thanks Dan! @DataInterview always gives amazing content! should we use DENSE_RANK in Q2 Uber, so that we capture multiple user_ids with same number of ridee? lets say - for a market_id 2, total user_ids with successful rides are 9 AND user_id A,B,C,D,E had 10 rides user_id F had 5 rides user_id G,H had 3 rides user_id I had 2 rides THEN top 3 riders by ride count in market 2 should be users with ride counts 10,5,3 - am I right? Please correct me if am wrong! user_id || ride_count || rank || dense_rank A | 10 || 1 || 1 B | 10 || 1 || 1 C | 10 || 1 || 1 D | 10 || 1 || 1 E | 10 || 1 || 1 F | 5 || 6 || 2 G | 3 || 7 || 3 H | 3 || 7 || 3 I | 2 || 9 || 4
Question 2 Uber WITH user_ride_count AS ( SELECT user_id, count(*) AS ride_count FROM RideStatus WHERE status_of_order = 'Success' GROUP BY 1 ), uber_one_user_ride_count AS( SELECT u.market_id, u.user_id, r.ride_count, RANK() OVER ( PARTITION BY u.market_id ORDER BY r.ride_count DESC ) AS rk FROM UserProfile u JOIN user_ride_count r ON u.user_id = r.user_id AND u.uber_one = 1 ) SELECT market_id, user_id, ride_count, rk FROM uber_one_user_ride_count WHERE rk
with _temp as (select * from activity as a left join user as b on a.userid = b.userid where b.date_signup >= '2020-01-01' ) select name, user_id, event_type, 1.0 * sum(time_spent) / sum(sum(time_spent)) over(partition by name, user_id) as pct from _temp group by 1,2,3 order by 1,2,3 ------- with _temp as (select * from ride as r left join user as u on r.user_id = u.user_id where u.uber_one = 1 ) select * from (select mkt_id, user_id, sum(case when status = 'successed' then 1 else 0 end) as success_rides, dense_rank() over(partition by mkt_id order by sum(case when status = 'successed' then 1 else 0 end) desc) as rnk from _temp group by 1,2 ) where rnk in (1,2,3)
FB question, using window function method: with user_signed_up_2021 AS ( SELECT * FROM User WHERE date_signup >= '2021-01-01' ), user_event_time_spent AS ( SELECT user_id, event, sum(time_spent) as total_timespent_event FROM Activity GROUP BY 1 ), user_event_time_calculation AS ( SELECT signup.user_id, signup.name, timespent.event, timespent.total_timespent_event, sum(total_timespent_event) OVER(partitioned by user_id) as total_timespent_user FROM user_signed_up_2021 signup LEFT JOIN user_event_time_spent timespent ON signup.user_id = timespent.user_id GROUP BY 1, 2, 3, 4 ) SELECT name, user_id, event, total_timespent_event/total_timespent_user AS proportion FROM user_event_time_calculation
Question 1 FB WITH userevent_time AS ( SELECT u.user_id, u.name, a.event, SUM(a.time_spent) AS time_per_userevent FROM User u JOIN Activity a ON u.user_id = a.user_id AND u.date_signup > ‘ > ‘ 2020 -01 -01 ’ GROUP BY 1, 2, 3 ), total_time_per_user AS ( SELECT user_id, SUM(time_per_userevent) AS total FROM userevent_time GROUP BY 1 ) SELECT ua.name, ua.user_id, ua.event, ua.time_per_event * 1.0 / t.total AS proportion FROM userevent_time ua JOIN total_time_per_user t ON ua.user_id = t.user_id P.S. I like to explicit use the aliases for the columns because it's easy to forget and make mistake there
Hi @Aidan Au. Thats elegant solution - specially the *1.0 hack. SQL newbie here - specially to window functions. Would this solution work for Q1 - meta. WITH userevent_time AS ( SELECT U.user_id, U.name, A.event, A.time_spent, SUM(A.time_spent) OVER (PARTITION BY U.user_id) AS user_total_time FROM User U LEFT JOIN Activity A ON U.user_id = A.user_id WHERE U.date_signup > '2020-01-01' ) UET SELECT user_id, name, event, time_spent * 1.0 / user_total_time AS proportion FROM UET GROUP BY 1,2,3;
@@marathiManus10 Thank you. I'm flattered. Dan is a Master of subqueries. There's a syntax error in your CTE. It should start with 'WITH (CTE's name) AS ' without any parameters (?) in your query. Also, we don't have to use windows function in this question. We're not looking for the cumulative sum of for each row. Rather, we can just use group by aggregation to get the aggregated time spent per each event per each user.
@@Aidan_Au Thanks! I mimicked this def style from some other video - hv edited the syntax error. I tried window function to avoid creating additional table and making another join in final query. My understanding was : PARTITION allows aggregating based on single dimension - which GROUP BY does not allow. do you think my query is correct logically? Am really interested to clear my understanding.