Count Unique Active Users Design Deep Dive with Google SWE! | Systems Design Interview Question 26

5K views

I took a hyperloglog right after this video, damn you iHop
Much credit to Gaurav Sen for his HyperLogLog video which I heavily plagiarized
00:00 Introduction
00:35 Functional Requirements
01:46 Capacity Estimates
02:20 API Design
02:54 Database Schema
03:31 Architectural Overview

Science

Published: Jul 24, 2024

Comments: 36

@Snehilw 1 year ago

Excellent content as usual man. Keep these going! These videos are absolute masterpieces in their own league :)

@xskrish 1 year ago

ngl the best I came up with was the indexing approach, learning a lot from these vids! thanks.

@jordanhasnolife5163 1 year ago

Glad to hear!

@mayurdugar03 1 year ago

Love your efforts! 😎

@dind7926 9 months ago

Hey Jordan, is there a particular reason why we are using an RDBMS here to capture the status updates instead of, let's say, Cassandra? If we are just doing writes, I would assume a NoSQL DB would be a better option. Also, why are we streaming data changes from the DB if we could send the data straight from the client whenever a user logs in or logs out? Thank you 🙏

@jordanhasnolife5163 8 months ago

Hey! 1) Pretty much arbitrary, as I didn't really think that we'd be writing so frequently that our database write throughput would be a bottleneck. Cassandra seems like a fair choice here! 2) I think you can do that too, though sometimes you'd like to know for purposes other than our system who is currently logged in or not, hence the reason for keeping it in a database so that others can query it. Good questions!

@mnchester 1 year ago

"I took a hyperloglog". Saw that coming from a mile away, but still funny as shit. Literally

@jordanhasnolife5163 1 year ago

Low hanging fruit for sure - I eat a lot of those which is why I keep taking all these hyperloglogs

@vijayalaxmibasavaraj3038 1 year ago

Plz let me know how the algo would work. Suppose there are 10 active users, all of whose IDs map to a rightmost bit of 1 (i.e., there is no streak of 0s in the rightmost bits of any user ID).

@jordanhasnolife5163 1 year ago

That would then output 2 ^ 0 = 1
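
To make that reply concrete, here is a minimal sketch of the simplified "longest run of rightmost zeros" estimator discussed in this thread. It is not full HyperLogLog, which combines many such registers to reduce error. The function names and sample IDs are hypothetical, and the IDs are assumed to be already hashed to integers.

```python
def trailing_zeros(x: int, width: int = 32) -> int:
    """Count consecutive zero bits at the low end of x (width if x == 0)."""
    if x == 0:
        return width
    # x & -x isolates the lowest set bit; its position is the zero count.
    return (x & -x).bit_length() - 1


def estimate_unique(hashed_ids) -> int:
    """Single-register estimate: 2 ** (most trailing zeros seen so far)."""
    max_zeros = max((trailing_zeros(h) for h in hashed_ids), default=None)
    return 0 if max_zeros is None else 2 ** max_zeros


# Ten active users whose hashed IDs all end in a 1 bit (assumed sample data).
odd_ids = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(estimate_unique(odd_ids))  # 1
```

With a single register the estimate for these ten users is 1, which is exactly the kind of error that using many registers is meant to smooth out.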

@mrdanp123 1 year ago

Great video Jordan. I don't quite understand how to keep track of users going from active to inactive (using the HyperLogLog path). You mentioned that each server needs to store a single userId at a time, and we want to randomize which server clients hit (so no consistent hashing here). So how do you handle users going offline so we stop keeping track of them in the approximation? I know it will depend on how you define an active user, but I'm curious about your thoughts on this. Thanks

@jordanhasnolife5163 1 year ago

Great question, and sorry I didn't elaborate a bit more on this throughout the course of the video, that's my mistake. So for now, let's abstract away how users actually come "online" or "offline" but assume that when they do so the change in status goes to our status change service. I propose that the status change service can send every newly online userId to a hyperloglog server, and the hyperloglog server will hold on to it if it has more rightmost zeros than the one it currently holds. Additionally, the combo of (device, userId, hyperLogLogShard) can be held in the status database. When a given device goes offline, check its hyperLogLogShard in the DB, and see if that was the ID with the most rightmost zeros - if so, query the DB for the userId with the next most rightmost zeros for that shard, and send it to the hyperloglog server. That make sense? Without querying the DB, we'd have to keep many IDs on each hyperloglog server in the event that the userId with the most rightmost zeros goes offline.
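
The flow in that reply could be sketched roughly as follows. This is an illustration under stated assumptions, not the design from the video: StatusDB, HllShard, and StatusChangeService are hypothetical names, the status database is replaced by an in-memory dict, and user IDs are treated as already-hashed integers.

```python
from collections import defaultdict


def trailing_zeros(x: int, width: int = 32) -> int:
    """Count consecutive zero bits at the low end of x (width if x == 0)."""
    return width if x == 0 else (x & -x).bit_length() - 1


class StatusDB:
    """In-memory stand-in for the status database: shard -> {device: user_id}."""

    def __init__(self):
        self.online = defaultdict(dict)

    def set_online(self, shard, device, user_id):
        self.online[shard][device] = user_id

    def set_offline(self, shard, device):
        return self.online[shard].pop(device, None)

    def best_user(self, shard):
        # The online user on this shard with the most rightmost zeros, if any.
        return max(self.online[shard].values(), key=trailing_zeros, default=None)


class HllShard:
    """Holds only the single ID with the most rightmost zeros."""

    def __init__(self):
        self.best = None

    def offer(self, user_id):
        if self.best is None or trailing_zeros(user_id) > trailing_zeros(self.best):
            self.best = user_id


class StatusChangeService:
    def __init__(self, num_shards):
        self.db = StatusDB()
        self.shards = [HllShard() for _ in range(num_shards)]

    def online(self, shard, device, user_id):
        self.db.set_online(shard, device, user_id)
        self.shards[shard].offer(user_id)

    def offline(self, shard, device):
        user_id = self.db.set_offline(shard, device)
        if user_id is not None and user_id == self.shards[shard].best:
            # The shard's current best went offline: fall back to the DB.
            self.shards[shard].best = self.db.best_user(shard)


svc = StatusChangeService(num_shards=1)
svc.online(0, "phone", 8)
svc.online(0, "laptop", 4)
svc.offline(0, "phone")
print(svc.shards[0].best)  # 4
```

Because the fallback query runs after the device row is removed, a user who is still online on another device in the same shard is simply picked again.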

@mrdanp123 1 year ago

@jordanhasnolife5163 thanks man. Yes makes sense.

@abhis1560 1 year ago

Hi Jordan, please do post data structures and algorithms content as well if possible. Loved your session

@natemanafter 1 year ago

I had a virtual onsite a few days ago (AWS) where the interviewer laid out a scenario where there was a massively popular mobile app and then asked how I would get a real-time count of active users, stored/grouped by minute, to be displayed on an internal dashboard that would update automatically over time. The data needed to persist so that other services (outside of scope) could query the data. I worked out a plan where the application's login and logout APIs would write login/logout events to a queue, events would be aggregated and compared (logins minus logouts, minus logins with no heartbeat/activity) using Flink or even a simple consumer service, then writing time-sequenced data to a database. Dashboard users would view the dashboard and receive updated data at whatever frequency we wanted over a WebSocket connection... It did seem like overkill but he had really emphasized that it was a massively popular app. After all of that, but before I even talked about caching or what database to use, he said, "Why do all of that? Haven't you heard of Amplify or Firebase? They're real-time and push the data for you." That caught me by surprise because I had only really heard of Firebase in relation to mobile development and this was a backend SDE (aka SWE) position. He admitted that "firebase or amplify" was really what he was expecting for me to use and that he had been trying to hint at it, sort of implying that I got the wrong answer. Blew me away.
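
As a rough illustration of the "simple consumer service" option in that comment, the per-minute aggregation might look like the sketch below. The event shape (epoch seconds, user ID, kind) and the function name are assumptions, and heartbeat timeouts are deliberately left out to keep it short.

```python
def active_by_minute(events):
    """Return {minute_start: active-user count after the last event in that minute}.

    events: iterable of (epoch_seconds, user_id, kind), sorted by time,
    where kind is "login" or "logout". Minutes with no events are omitted.
    """
    active = set()
    counts = {}
    for ts, user_id, kind in events:
        if kind == "login":
            active.add(user_id)
        elif kind == "logout":
            active.discard(user_id)
        counts[ts - ts % 60] = len(active)  # bucket by start of the minute
    return counts


events = [
    (0, "a", "login"),
    (10, "b", "login"),
    (70, "a", "logout"),
    (130, "c", "login"),
]
print(active_by_minute(events))  # {0: 2, 60: 1, 120: 2}
```

Keeping a set of user IDs gives unique users rather than sessions; a stream processor would hold the same state per partition and write each minute's count to the database.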

@jordanhasnolife5163 1 year ago

That guy sends hella beta and you sound like a major alpha - but yeah actually if he's gonna say firebase you should say that's just a nosql database with support for websockets

@natemanafter 1 year ago

@jordanhasnolife5163 that's what I tried to explain. I tried to say, "yes, that's why I had mentioned websockets, so we control exactly what goes to the client and we can more tightly control access to the database because it's encapsulated through our own API." But I think I had already lost him. It was a bummer.

@natemanafter 1 year ago

@jordanhasnolife5163 your comment made me feel a lot better. I've been replaying that exchange in my head for the last 4 days.

@natemanafter 1 year ago

@jordanhasnolife5163 Oof. Got the rejection email today. Whatever.

@jordanhasnolife5163 1 year ago

@natemanafter It's okay man! I've been rejected by companies (literally) hundreds of times! Sometimes interviewers have stupid requirements, but at the end of the day it's all about how much you interview! Like everything else, this is a numbers game, and the only thing you can do is prepare yourself :) You got em next time, I promise!!

@yashkhd1100 1 year ago

Man, your content is top notch. Funny thing is, many of the algorithms you talk about (e.g. hyperloglog) I've never come across in any other books or videos. I'm just wondering how you keep track of all these novel algorithms.

@jordanhasnolife5163 1 year ago

I appreciate that man!! Just mostly going around YouTube doing research myself

@riteshsingh1245 1 year ago

Very nice video Jordan! Could you please also do a video on designing cloud managed services like AWS S3?

@jordanhasnolife5163 1 year ago

Will hopefully get there!

@jayshah5695 1 year ago

A couple of points of confusion: 1. How do you define an active user? A user who made a request (in whatever time frame we are looking at)? Or an online user (checking heartbeats of a persistent connection from the user's client devices)? 2. When I call the API getUniqueUsers(), you mentioned it returns the current unique users. What does that mean? Does it mean unique users who made a request in some specific time frame/window, like the last 1 min, last 1 hr, or last 1 day?

@jordanhasnolife5163 1 year ago

1) I think that's something you can work out with your interviewer. You could do something like heartbeats, or if you really want to keep things simple, just have a log in and log out button. 2) Again this is something you'd work out with your interviewer. In my video, I just made an assumption that it was all the "active devices" and you would find the unique users from those.

@abhishekmishraji11 1 year ago

Awesome video Jordan 🙂, I would like to request a video on designing a tagging-based system like the ones on Facebook/Twitter, i.e. where we tag our friends in a comment or on a picture, or use hashtags as well. Didn't find any good resources on it

@jordanhasnolife5163 1 year ago

Is this any more than just having a one to many relationship between post and taggedUsers? Perhaps even simpler, just on the frontend say @user is a hyperlink to the user profile
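
A minimal sketch of the post-to-taggedUsers relationship mentioned in that reply, using an in-memory SQLite database. The table and column names (posts, post_tags) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL
    );
    -- One post can have many tagged users; each (post, user) pair is unique.
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        user_id INTEGER NOT NULL,
        PRIMARY KEY (post_id, user_id)
    );
    """
)

conn.execute("INSERT INTO posts (id, body) VALUES (1, 'great trip with @alice and @bob')")
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [(1, 101), (1, 102)])

tagged = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM post_tags WHERE post_id = ? ORDER BY user_id", (1,)
    )
]
print(tagged)  # [101, 102]
```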

@abhishekmishraji11 1 year ago

@jordanhasnolife5163 yeah I get your idea, but hashtags seem a bit out of line here.