For the question on "second highest order amount in the fashion department", shouldn't we use DENSE_RANK() instead of RANK()? If two orders tie for the highest amount, the third order (which has the second-highest amount) gets rank 3 with RANK() but rank 2 with DENSE_RANK().
I feel like the interviewer focuses on parts that aren't really important, e.g. at 18:20. The focus should be on testing the logic and the ability to get to the solution.
At 27:27, do we really need to group by c.first_name and c.last_name? We're already grouping by c.customer_id, which has essentially a one-to-one relationship with the first and last name.
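To make the tie case concrete, here is a minimal sketch that runs both ranking functions side by side. It uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table, columns, and amounts are made up for illustration and are not the schema from the video.

```python
import sqlite3

# Hypothetical data: two fashion orders tie for the highest amount.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, department TEXT, order_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "fashion", 500.0),
        (2, "fashion", 500.0),
        (3, "fashion", 300.0),
        (4, "fashion", 100.0),
    ],
)

# Compute both ranks over the same ordering so they can be compared row by row.
rows = conn.execute(
    """
    SELECT order_id,
           order_amount,
           RANK()       OVER (ORDER BY order_amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY order_amount DESC) AS dense_rnk
    FROM orders
    WHERE department = 'fashion'
    ORDER BY order_amount DESC, order_id
    """
).fetchall()

# Filtering on rank = 2: RANK() skips 2 after a tie, DENSE_RANK() does not.
second_by_rank = [amount for _, amount, rnk, _ in rows if rnk == 2]
second_by_dense = [amount for _, amount, _, dense in rows if dense == 2]

for row in rows:
    print(row)
print("RANK() = 2:", second_by_rank)
print("DENSE_RANK() = 2:", second_by_dense)
```

With this data, filtering on RANK() = 2 returns nothing, while DENSE_RANK() = 2 returns the 300 order, which is the behaviour the question describes.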
Almost every written query had very serious issues and either won't run at all or would give wrong answers. I'm not talking about typos. I'm talking about, for example, using an aggregate function to define the ordering inside a window's partition clause, which is not possible. Or using a window function AND a GROUP BY in the same query, which won't give the result he was hoping for. Or, as in his last query, filtering the data down to a single month and then using the LAG function. The WHERE filter is executed before LAG, so by the time LAG runs, the dataset only has records for December, meaning he will get NULLs for every department. This is not a comprehensive list, mind you; there are more. And those are just the serious issues. On top of that, most of his filtering conditions were non-sargable, and he never thought about edge cases. For example, in the second problem it is possible that no users bought anything from one of the departments, in which case the inner join he used would drop that department and it wouldn't show in the result at all. Is that a plausible situation when working with real data? No. But it is definitely possible, and it should at least have been mentioned. Overall I am very disappointed with both the interviewee and the interviewer, who missed all of the mistakes mentioned above.
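The LAG point is easy to reproduce. The sketch below is not the query from the video: it assumes a hypothetical pre-aggregated table monthly_sales(department, month, total) with made-up numbers, and runs on SQLite via Python's sqlite3 (3.25+ for window functions). It compares filtering before the window function with filtering after it.

```python
import sqlite3

# Hypothetical monthly totals per department (names and values are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (department TEXT, month TEXT, total REAL)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [
        ("fashion", "2022-11", 100.0),
        ("fashion", "2022-12", 150.0),
        ("toys", "2022-11", 200.0),
        ("toys", "2022-12", 120.0),
    ],
)

# WHERE runs before the window function, so LAG only ever sees December rows.
filtered_first = conn.execute(
    """
    SELECT department, month, total,
           LAG(total) OVER (PARTITION BY department ORDER BY month) AS prev_total
    FROM monthly_sales
    WHERE month = '2022-12'
    ORDER BY department
    """
).fetchall()

# Compute LAG over all months in a CTE, then filter the result to December.
lag_first = conn.execute(
    """
    WITH with_prev AS (
        SELECT department, month, total,
               LAG(total) OVER (PARTITION BY department ORDER BY month) AS prev_total
        FROM monthly_sales
    )
    SELECT department, month, total, prev_total
    FROM with_prev
    WHERE month = '2022-12'
    ORDER BY department
    """
).fetchall()

print("filter, then LAG:", filtered_first)
print("LAG, then filter:", lag_first)
```

In the first query prev_total is NULL (None in Python) for every department. In the second, each December row carries its November total, so the month-over-month change can be computed.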
I did not understand the code of the last question. He had to calculate the increase or decrease in month-over-month growth for the year 2022. What was the basis for creating the CTE for November and December (hardcoded)? In the second CTE, he filtered out only the December amount. Does it calculate and compare all the data back to January?
WITH nov_dec_sum AS (
  SELECT department_name,
         SUM(CASE WHEN YEAR(order_date) = 2022 AND MONTH(order_date) = 11 THEN order_amount ELSE 0 END) AS nov_sum,
         SUM(CASE WHEN YEAR(order_date) = 2022 AND MONTH(order_date) = 12 THEN order_amount ELSE 0 END) AS dec_sum
  FROM orders o
  JOIN department d ON o.orders_id = d.orders_id
  GROUP BY department_name
)
SELECT department_name
FROM nov_dec_sum
ORDER BY nov_sum - dec_sum DESC
LIMIT 1;
Thanks a million for this work, highly appreciated, as I'll be applying for data positions at the end of the year. Quick observation: the CTE in the third query (orders_per_year) was never used, and the rank was not used either.
Hey newenglandnomad9405! The complexity of queries you'll write on the job can vary depending on the role, department, and company. In many cases, you'll have the support of senior colleagues and prior discussions to guide you. Technical interviews typically focus on standard topics and concepts relevant to the job, like SQL for data science or data structures for software engineering. These questions aim to assess the candidates' technical foundations, even if the skills might not be directly used in their day-to-day tasks. So, while you may encounter similar queries in interviews, the actual queries you write on the job may differ based on the specific role and requirements. Hope this helps!
SELECT customer_id, COUNT(order_id) AS total_orders FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '5' YEAR GROUP BY customer_id ORDER BY total_orders DESC LIMIT 1;