Thanks DJ jiang for pointing out that negative offset of LEAD or LAG works in PostgreSQL and Snowflake. I was referring to T-SQL that negative offset is not allowed and I should have clarified it in the video. Details here docs.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver15.
stopped in between of the video, the best part about the video is that it explains the difference between group by and window function. This is something not many youtuber do. I was struggling to visualize how it happens. Thanks a lot
This..... WAS FANTASTIC! I've got 4+ years of experience in SQL and I needed a quick refresher for window functions and this was exactly what I needed. Thank You!
Thanks Emma for sharing. What you shared helped me comprehend window functions. Most importantly, I applied those window functions to my day to day work, which boosted efficiency. Appreciated :)
Emma, your content is great. Your English is beautiful and clear. My respect For subscribers: I would like to use the comments to find ways to improve my English. If you are starting your journey in data analytics and you are a native speaker of English, we can be useful to each other) For example, weekly calls or something similar Best regards, Senior Data Analyst from Russia
Hi there, as a non-native speaker who's working in an English-speaking country, what I found most helpful is not to have English conversations, but to pay closer attention to all the sentences and expressions that come from the native speakers. Think about how you would have phrased your words vs. how they did. Find the difference and work on it. You could use everything as the source for that purpose: movies, youtube videos etc. But this is not to say that having conversations in English is not helpful. It is, but just maybe not as important as many people think. Cuz I know many non-native speakers, who have worked in English every day for many years, but still speak bad English.
Hello Emma, your videos are superb and great and very useful! Just one thing, we can use negative value when we use LEAD or LAG, basically lead(date,-1) over (order by date) equivalent of lag(date,1) over(order by date), which I tested in PostgreSQL and Snowflake, and they both works.
Thanks DJ for pointing it out! I was focusing on T-SQL docs.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver15 and should have mentioned it in the video. Thanks again!
@@emma_ding No worries, you are great! Your videos are very detailed and clear and well organized and very high level! Thank you and Keep up your hard works!
Great video! One quick question: At 9:10, I thought SUM(value) OVER(ORDER BY value) will return cumulative sum; but SUM(value) OVER() ORDER BY value will return the SUM shown in the video. or is it a PostgreSQL thing? I'm a little confused now.
" I thought SUM(value) OVER(ORDER BY value) will return cumulative sum; but SUM(value) OVER() ORDER BY value will return the SUM shown in the video". This is the correct statement. There must have been an error in the video.
You are 100%. Reason: (1.) When ORDER BY is specified and FRAMING is allowed, the DEFAULT FRAME is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW i.e all values up to the current value in the partition ( 2.) When ORDER BY is not specified and FRAMING is allowed, the DEFAULT FRAME is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING i.e The entire partition
Can you explain how this query works ? SELECT COUNT(p) OVER(order by p ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as p_count, p, c,d FROM `d.t` SELECT COUNT(p) OVER(order by p ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as p_count, p FROM `d.t` Why is the window affected when I specify c and d columns? I would expect to count 1 before, current one and next one. So for first row it would be 2 for next ones except last one 3 and for last one to be 3, because the window is made of 3 items so it will count 3 most of the time. But no.. it returns different counts that does not make any sense to me. I read the bigquery documentation and I really don't get it.
Sorry for another dumb question: I tried something like "select rank(a) over(order by a) from t" in PostgreSQL. It'll show error telling me to add WITHIN GROUP. But if I write "select rank() over(order by a) from t" then it's fine. Seems like I can't pass anything in the () after rank. Could anyone clarify?