Heads up! Made a silly mistake with the primary key in the submission table. The submission table's primary key should be the ID, and then we'd want to add an index on competitionId. My bad 🫣
Again, a brilliant job. Just a couple of nitpicks to touch on for Staff+ interviewees and for the write-up version:
- The DB has to be write-optimized for the competition. The submission table would probably need to be on a different tech such as Cassandra. In any case, relational or NoSQL, it probably needs to be sharded, especially to handle the write demand at the end. The best candidate for the shard key is submission ID.
- The submission table PK has to be submission ID, not competition ID. You can have a secondary index on competition ID, but it's a serious error to say its primary key is competition ID.
- In the SQL query, MAX is the right aggregator, not MIN. You want the minimum of the maximum submission timestamp. Hence ASC order on MAX submission time :)
- Redis won't be able to handle 100K users pulling near the end of the competition. So some sharding and scatter/gather is needed there too.
- You want rate limiting on the submissions too. So the API gateway needs to be configured to do that, or alternatively it can be implemented in the primary server, but that would be much more complicated.
- Finally, the problem POST should return a submission ID in the body of the response, and the client should update the URL with the submission ID (instead of a page load). This is needed because if the user refreshes the page, they'd want to continue polling the latest submission.

These are the only nitpicks I can find. I'm just listing them for others as a reference. Overall great. Please keep these coming. I derive immense value out of these. Very, very good job!
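To make the MAX vs. MIN point concrete, here is a minimal sketch using Python's built-in sqlite3. The table and column names (`submissions`, `user_id`, `problem_id`, `passed`, `submitted_at`) and the sample rows are assumptions for illustration, not the schema from the video. It ranks users by problems solved, then breaks ties by the earliest "last accepted submission" time, i.e. ASC on MAX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: PK is the submission id, with a secondary
# index on competition_id for leaderboard-style lookups.
conn.executescript("""
CREATE TABLE submissions (
    id             INTEGER PRIMARY KEY,
    competition_id TEXT    NOT NULL,
    user_id        TEXT    NOT NULL,
    problem_id     TEXT    NOT NULL,
    passed         INTEGER NOT NULL,   -- 1 = accepted, 0 = failed
    submitted_at   INTEGER NOT NULL    -- seconds since competition start
);
CREATE INDEX idx_submissions_competition ON submissions (competition_id);
""")

# Made-up sample data.
rows = [
    ("c1", "alice", "p1", 1, 100),
    ("c1", "alice", "p2", 1, 300),
    ("c1", "bob",   "p1", 1, 150),
    ("c1", "bob",   "p2", 1, 200),
    ("c1", "carol", "p1", 1, 50),
    ("c1", "carol", "p2", 0, 60),
]
conn.executemany(
    "INSERT INTO submissions "
    "(competition_id, user_id, problem_id, passed, submitted_at) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Most problems solved first; ties go to whoever finished their
# last accepted submission earliest (MAX per user, sorted ASC).
LEADERBOARD_SQL = """
SELECT user_id,
       COUNT(DISTINCT problem_id) AS solved,
       MAX(submitted_at)          AS finished_at
FROM submissions
WHERE competition_id = ? AND passed = 1
GROUP BY user_id
ORDER BY solved DESC, finished_at ASC
"""

leaderboard = conn.execute(LEADERBOARD_SQL, ("c1",)).fetchall()
for row in leaderboard:
    print(row)
```

Bob and Alice both solve two problems, but Bob's last accepted submission lands earlier, so he ranks first. With MIN instead of MAX, Alice would win on her first submission time, which is not what you want. One simplification to be aware of: this takes MAX over all accepted submissions, so a later re-submission of an already solved problem would push a user's time back.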
These are by FAR the best product architecture/system design videos out there! I also highly recommend their mock interviews. I did 3 of them, and the feedback you get is more helpful than anything else you’d get out there! Please keep these videos coming
These videos are incredibly valuable and applicable not just for system design interviews but also for practical application in the real world. Thank you for the excellent content!
Just a nit: for many of the interviews posted on this channel, the database choice doesn't matter. I would like to see more where the database choice actually matters, with a deeper dive into it
@@hello_interview Thanks for the great video! Please keep posting such system design videos. One question: do we have to present the cron job solution first and then improve on it with the cache, or can we go straight to the cache solution without even mentioning the cron job? Same question for AWS SQS: can we propose it from the beginning, or should it come at the end, given that the interviewee has 15+ years of experience?
Okay wow! This is by far the best system design video I've seen on YT. I actually was smiling and nodding my head throughout the whole video. THANK YOU SO MUCH!
There is a big problem with your videos: they are so good that they have spoiled every other resource. For any question you haven't covered, I feel there is no good resource anywhere. Please keep them coming, and keep them free; they are much needed.
So far the best video I have seen on high-level design. It works iteratively, explains each bit, and focuses on the WHY. Keep doing the good stuff!!
This is the only channel I set up notifications for, as this has been one of the most valuable resources for tech interviews. Coding is easy to practice, but having these videos really shows you how to approach the problem. I consider these videos the best resource out there. Please keep it up!!!
@@hello_interview Seriously! I have read the Xu books and reviewed some other resources but this really helps me have a process which greatly helps to understand how to do these problems in an interview. Thank you so much!
This channel is a boon for software developers with upcoming interviews. I just discovered it last month and used your process in one of my interviews today. It was very comfortable for me; the only problem was that I had to make the interviewer understand that I would be doing the optimizations mostly in the deep dive. I still need to get accustomed to the infrastructure-based process, although I am sure I will get there through the resources you have. Thanks a lot.
Great job from the team. This is by far one of the best system design resource platforms on the internet. For handling asynchronous requests on the client, can one also explore the use of service workers as an alternative to websockets and good ol' polling? I think this serves as a reasonable middle ground between the other options.
Hmm, you might have better knowledge of SW than I, in which case, certainly. Based on my understanding, I'm not sure where the benefit would be over polling, we don’t want to cache here, we need to poll until the submission is complete or a time limit elapses
@@hello_interview Thank you. Thinking about it again, it doesn’t seem to have much benefit. Heck, it will be the same thing if the SW lets the request go to the network, unless the server is capable of sending push events (which isn’t the case here)
For the leaderboard low-latency deep dive, could we not use Spark MapReduce logic, as explained in another problem, and have a separate OLAP DB store the live leaderboard information? Users would hit that DB, since it is read-optimized and gives quick results back. I guess we would need to consider how frequent the competitions are in order to justify a separate DB. Would love to hear what other people say.
These are great. You're pretty much exposing system design interviews though and raising the bar for everyone. Before it was a secret club that only a few people had access to because they had done it before. Now even junior engineers will have Google Fellow level knowledge and the usefulness of these interviews as filters will decrease. I wonder what other types of interviews they'll come up with?
I recently went through a full loop at Meta and had roughly ~35min to implement a solution to a system design problem similar to this one. It felt very rushed (to me based on my experience.) Do you have any advice regarding the short(er) amount of time to do this? Should we skip some of this content? Or just talk faster and make sure to write everything down as we go? Thank you for all the valuable content you share! I look forward to the next one.
I recently had a loop interview for staff at Oracle. I thought I did pretty well following the same format as here or in Xu, but the interviewer was like, lol, I need an architecture diagram, not these kinds of boxes or a component diagram or HLD. It really depends on the interviewer too, and on how the day of your interview goes, since it's his expectation and his world might be a small shell. But this is amazing, keep it up. Prep is important, but luck is damn important too. I was literally waiting for the offer letter 😂
I have a question regarding the database. Sometimes I struggle to understand which database would be better, and two key factors always come to mind: 1. Size 2. Query complexity. How do you decide on the DB based on those two points? Something like 'we'll deal with 100GB, so maybe NoSQL' or 'we're doing a lot of complex joins, so SQL'? And what do we do if we have complex queries on a huge database?
A great deep dive I’ve seen - what happens when the Redis sorted set grows too large for a single node? How do we partition? How do we know which partition to write to? How do we shift set entries up or down partitions? What do we do if the redis node goes down?
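One common way to answer the partitioning questions (an assumption on my part, not something from the video) is to hash-partition by user ID and scatter/gather the top-k on reads. A minimal pure-Python sketch follows, with plain dicts standing in for the per-node Redis sorted sets; `NUM_SHARDS`, `shard_for`, `record_score`, and `top_k` are all hypothetical names.

```python
import heapq
import zlib

NUM_SHARDS = 4  # assumed shard count, purely illustrative

# Each dict stands in for one Redis sorted set living on its own node.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Stable hash so a given user always maps to the same shard."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def record_score(user_id: str, score: int) -> None:
    """Write path: exactly one shard is touched per update."""
    shards[shard_for(user_id)][user_id] = score


def top_k(k: int) -> list:
    """Read path: ask every shard for its local top k, then merge."""
    partials = []
    for shard in shards:  # scatter
        partials.extend(heapq.nlargest(k, shard.items(), key=lambda kv: kv[1]))
    # gather: the global top k must be among the per-shard top k lists
    return heapq.nlargest(k, partials, key=lambda kv: kv[1])


# Made-up data: user0..user19 with scores 0..19.
for i in range(20):
    record_score(f"user{i}", i)

top3 = top_k(3)
print(top3)
```

Because the shard is chosen by user ID rather than by score range, an entry never has to move between partitions when its score changes. The trade-off is that every leaderboard read fans out to all shards. Node failure is not covered here; replication per shard would be the usual starting point.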
Can you clarify on the docker container part? We need an image to instantiate the container. Is it implicit that the image will get built based on every solution code/language and run on the container?
This channel is super helpful to all levels of developers, thank you for providing this level of guidance :) . Is there any plan to provide similar resources for ML system design?
Instead of an SQL query, I think having a Leaderboard table would be good, with user_id, competition_id, and points as columns. When the user submits all the questions, calculate the points and store them here. Since things get written once and read multiple times, fetching the points would be the better way: just pass competition_id and sort on points, and you will have your results. Think of 100k users trying to see the leaderboard and that complex query getting executed for each one of them.
This is really interesting. I recently started learning system design on your channel. I would say you explain things very clearly; in other places people just confuse us while designing a system, and at the end of the design, when I tried to summarize it myself, it was very confusing. After going through your HLDs I am now confident, but can you also start putting out LLD as well?
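A rough sketch of that idea with Python's built-in sqlite3: a precomputed table that is updated on the write path and read with a cheap sorted lookup. The helper names `add_points` and `get_leaderboard` and the sample data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (competition, user); points are maintained at write time.
conn.execute("""
CREATE TABLE leaderboard (
    competition_id TEXT    NOT NULL,
    user_id        TEXT    NOT NULL,
    points         INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (competition_id, user_id)
)
""")


def add_points(competition_id: str, user_id: str, points: int) -> None:
    """Write path: bump the user's running total, creating the row if needed."""
    cur = conn.execute(
        "UPDATE leaderboard SET points = points + ? "
        "WHERE competition_id = ? AND user_id = ?",
        (points, competition_id, user_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO leaderboard (competition_id, user_id, points) "
            "VALUES (?, ?, ?)",
            (competition_id, user_id, points),
        )


def get_leaderboard(competition_id: str, limit: int = 10) -> list:
    """Read path: a simple filtered, sorted lookup with no aggregation."""
    return conn.execute(
        "SELECT user_id, points FROM leaderboard "
        "WHERE competition_id = ? "
        "ORDER BY points DESC, user_id ASC LIMIT ?",
        (competition_id, limit),
    ).fetchall()


# Made-up usage.
add_points("c1", "alice", 10)
add_points("c1", "bob", 30)
add_points("c1", "alice", 15)
add_points("c2", "zed", 99)

board = get_leaderboard("c1")
print(board)
```

This moves the aggregation cost from every read to each accepted submission. At 100k concurrent readers the table would likely still want a cache in front of it, but the query itself stays trivial.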
FYI, malicious or buggy code in the container can technically bring down your host machine (EC2 instance? in this case). One deep dive could be to use MicroVMs to run the code. (What AWS Lambda uses via Firecracker).
Yah probably wouldn’t be the end of the world. Unlikely, and the host is still isolated here using ECS or something and we’ll just bring a new one up. But good call out for sure.
Thanks for these videos, they're really super helpful. Just curious: when you say you'd want a mid-level or senior to pick up on certain things, is that always proactively, or can it be in response to interviewer prompts? For example, in this one, would you expect a mid-level to have picked up on their own that the get-leaderboard query is slow, or could they still pass if you had to nudge them and ask what they think about this query and/or how to improve it, and they came up with a good solution as you outlined?
Great video. I did have a small doubt about how we would handle ranking users along with tracking the time they took to solve problems, similar to LeetCode. One approach could be to use sorted sets for ranking and store problem completion times in a Redis hash for each user per contest, with problem IDs as keys. My concern is whether this would lead to the N+1 problem when fetching the leaderboard, or whether Redis is fast enough that it wouldn't significantly impact latency. What do you think?
Maybe it is worth introducing a queue for the results of the submissions, and another microservice that would read from this queue and update the leaderboard. This way the main service is not overloaded with unnecessary calculation of the scoreboard.
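One way to sidestep the N+1 lookup (a sketch of an alternative, not what the video does) is to pack both the solved count and the elapsed time into the single sorted-set score, so one range read returns everything needed for ranking. Redis stores scores as doubles, which represent integers exactly up to 2^53, so the packed value has to stay below that. `MAX_TIME`, `encode_score`, and `decode_score` are hypothetical names.

```python
# Assumed upper bound on elapsed seconds per user (about 115 days).
MAX_TIME = 10**7


def encode_score(solved: int, elapsed_seconds: int) -> int:
    """Pack (solved, time) so that a higher score means a better rank.

    More problems solved always wins; among equal solved counts,
    a smaller elapsed time yields a larger score.
    """
    if not 0 <= elapsed_seconds < MAX_TIME:
        raise ValueError("elapsed_seconds out of range")
    return solved * MAX_TIME + (MAX_TIME - 1 - elapsed_seconds)


def decode_score(score: int) -> tuple:
    """Recover (solved, elapsed_seconds) from a packed score."""
    solved, remainder = divmod(score, MAX_TIME)
    return solved, MAX_TIME - 1 - remainder


# Made-up entries: user -> (problems solved, total seconds taken).
entries = {
    "alice": (3, 200),
    "bob": (3, 100),
    "carol": (2, 10),
}

# Sorting by packed score descending mimics reading a sorted set
# from highest score to lowest.
ranked = sorted(entries, key=lambda u: encode_score(*entries[u]), reverse=True)
print(ranked)
```

With this encoding the leaderboard read is a single range query, and per-problem times can still live in a hash that is only fetched when someone opens a specific user's detail view.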
Good content, but I found the justification for CDC over a cron job hand-wavey. How specifically would a cron job have a higher infrastructure cost as you claimed? I also don’t think CDC is necessarily more reliable. I work at a big tech company and we’ve had three downtime events for our in-house CDC in the last two months. Also: the Kubernetes setting of “concurrencyPolicy: Forbid” would prevent the problem of too-frequent job runs you mentioned. And network isolation and no host machine system calls are the default behavior of a Docker container. The latter might not be possible at all.
Would love to know the suggestion of webhooks vs. polling when trying to get submission status? Sounds like the result would be equivalent and would lower the number of requests the server would receive?
The polling endpoint passes submissionId as a path parameter, but how would we know which submissionId to request if the submit-solution endpoint just returns a 200 status code? The submissionId would only be available once the submission record is created in the database, right?
Thanks for this video and all the other ones! I love your content. One thing I failed to understand in this video is where the Docker containers (language code runtime services) are actually running. Looking at your diagram, it seems like they are not running on the Primary Server. But they are Docker containers. Their images must be pulled onto some server and served there such that the Workers can actually talk to them. Please explain that if possible. Thanks again and keep rocking!
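A minimal in-memory sketch of the flow this question is getting at: the submit handler creates the record up front and returns its ID in the response body, and the client then polls with that ID. The handler names, the response shape, and the choice of 202 instead of 200 are assumptions for illustration, not the API from the video.

```python
import uuid

# Stand-in for the submissions table, keyed by submission ID.
submissions = {}


def submit_solution(user_id: str, problem_id: str, code: str) -> dict:
    """POST handler: create the record first, then hand back its ID."""
    submission_id = str(uuid.uuid4())
    submissions[submission_id] = {
        "userId": user_id,
        "problemId": problem_id,
        "code": code,
        "status": "pending",
    }
    # A real system would enqueue the job for a worker here.
    # 202 Accepted is an assumption; the comment above mentions 200.
    return {"status": 202, "body": {"submissionId": submission_id}}


def get_submission(submission_id: str) -> dict:
    """GET handler the client polls until the status leaves 'pending'."""
    record = submissions.get(submission_id)
    if record is None:
        return {"status": 404, "body": {}}
    return {
        "status": 200,
        "body": {"submissionId": submission_id, "status": record["status"]},
    }


def mark_complete(submission_id: str, passed: bool) -> None:
    """Called by the worker once the code has finished running."""
    submissions[submission_id]["status"] = "passed" if passed else "failed"


# Made-up usage: submit, poll once, let the "worker" finish, poll again.
response = submit_solution("u1", "p1", "print(1)")
sid = response["body"]["submissionId"]
before = get_submission(sid)
mark_complete(sid, passed=True)
after = get_submission(sid)
print(before["body"]["status"], "->", after["body"]["status"])
```

Since the ID exists as soon as the POST returns, the client can also put it in the URL, which lets a page refresh resume polling the same submission.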
@@hello_interview Do you mean to say that the Workers are running forever as EC2 instances, and that these worker instances were initially launched with language specific docker images? But how would they auto-scale? Kubernetes orchestration perhaps? Sorry I'm struggling to understand where these docker containers are actually running; and what's the real benefit of not using Lambdas in this design and how this design is actually solving the cold-start problem of Lambda functions?
Great video as always!! Just one doubt: is it right to make competitionId the primary key for the Submission table? It will be NULL when a submission is not associated with a competition, and multiple submissions will have the same competitionId. I might be missing something here. Please let me know your thoughts on this.
Thanks, very well explained!! I do have a question: why would you use competitionId as the PK for the submission table? Wouldn't that make querying a specific submission very slow? Also, SQS is in memory, right? What if it goes down? Will submissions be dropped? Thanks
Yah, this was a silly mistake. The submission table PK should be id, and then we need an index on competitionId. Good callout. As for SQS, no one would wait more than 5s anyway, so if it goes down and we lose them, it is what it is. The client shows an error saying they need to resubmit. They aren't going to wait 5 minutes for a response in any case.
I am curious what would you suggest for career development between Midlevel to Senior to Staff. I know there is a useful breakdown in terms of interview expectation, but I'm thinking more from the perspective of internal promotions. Do you have any useful tips?
Any reason for going with a primary monolithic service rather than microservices, like in other SD keys from HelloInterview? Would love to see a messaging service like WhatsApp or Messenger next.