We help software engineering candidates prepare for upcoming FAANG and FAANG-adjacent interviews via mock interviews with FAANG senior+ engineers and managers, AI tools, and extensive free content created by experts.
Love the content! I've watched so many videos and they've really helped me in interviews. I had one question: instead of dumping data from Kinesis to S3 to be read by Spark, why not just have Spark batch processing read right off of Kinesis? If you're doing reconciliation every hour, day, or whatever, and that window is smaller than the retention period on Kinesis, I thought you could just read it from there. What am I missing?
I have two questions and would be most grateful if you could answer them. 1) Is there a reason you drew the API Gateway component in the Ticketmaster example but not here? 2) Is there a reason you mentioned a CRUD service in the Ticketmaster example but not in this one?
Maybe a nit, or I'm not in the know, but SQS doesn't have built-in exponential retry, right? You'd need to implement it yourself with ApproximateReceiveCount and by modifying the visibility timeout?
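For anyone curious, the backoff itself is easy to compute from that attribute. A minimal sketch (the function name and the base/cap values are my own, not SQS defaults):

```python
def retry_visibility_timeout(receive_count, base=30, cap=900):
    """Exponential backoff for an SQS redelivery, capped at `cap` seconds.

    `receive_count` comes from the message's ApproximateReceiveCount
    attribute; it is 1 on the first delivery.
    """
    return min(base * 2 ** (receive_count - 1), cap)

# On a processing failure you would apply it via (boto3 sketch, not run here):
#   sqs.change_message_visibility(
#       QueueUrl=queue_url,
#       ReceiptHandle=msg["ReceiptHandle"],
#       VisibilityTimeout=retry_visibility_timeout(count),
#   )
```

So each failed receive pushes the message's next visibility further out, up to the cap.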
Can you please explain why we need Kinesis in the middle instead of writing directly to Flink? Just a suggestion: when you introduce a particular technology, please explain why. You do at times for the DB, but sometimes you miss it.
This is the best system design study resource I've come across that has depth and feedback (pass/no pass for mid/senior/staff) on design choices. Question: how can we use DynamoDB for the driver lock? I thought DDB is eventually consistent for writes, so it doesn't offer strong consistency there; it offers strong consistency only as a read option. Is that sufficient for a lock?
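If it helps anyone thinking about this: a DynamoDB-based lock typically doesn't rely on read consistency at all. Conditional writes (a `ConditionExpression` on `PutItem`) are evaluated atomically against the latest committed item, so they behave like a compare-and-set. A sketch of just the semantics, with an in-memory dict standing in for the real table (table and attribute names are illustrative):

```python
class FakeDynamoTable:
    """In-memory stand-in for a DynamoDB table, used only to illustrate
    the semantics of a conditional put (attribute_not_exists)."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, driver_id, ride_id):
        # Real call (boto3 sketch, not run here):
        #   table.put_item(
        #       Item={"driver_id": driver_id, "ride_id": ride_id},
        #       ConditionExpression="attribute_not_exists(driver_id)",
        #   )
        # DynamoDB evaluates the condition atomically against the latest
        # committed item, so two racing writers cannot both succeed.
        if driver_id in self.items:
            return False
        self.items[driver_id] = ride_id
        return True

table = FakeDynamoTable()
assert table.put_if_absent("driver-7", "ride-1") is True   # lock acquired
assert table.put_if_absent("driver-7", "ride-2") is False  # second request rejected
```

The second writer gets a `ConditionalCheckFailedException` in real DynamoDB rather than `False`, but the outcome is the same: only one request holds the driver at a time.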
As always the content is awesome. I had a quick question on the cron job that fixes stale reserved tickets. In the video you mentioned there could be a scenario where a ticket remains reserved for 19 minutes when the cron job runs every 10 minutes or so. I didn't understand that. If anyone is reading and has time, can you please explain? Thanks in advance!
BTW, really enjoying the content! I think if you made a statement like "Kafka never goes down" in an interview, it would be a bit of an odd thing to hear as an interviewer... of course Kafka could go down, even when running a managed service. Your application should surely handle this scenario in such a way that the system can recover when Kafka comes back online; thanks to its durability and persistent storage of events, your application should be able to resume where it left off. Wdyt?
Edit: I need to add precision to my statement here. Am I correct in saying Kafka is only as durable as you configure it to be, specifically how you configure replicas and how producers set replica acknowledgement before considering a write committed? If acks=1 is set, the producer only waits for acknowledgement from the leader broker, which could fail before replicating, and data could be lost. acks=all means leader and in-sync replica acknowledgement. So in summary, Kafka can certainly fail. I think a good answer to this question would detail how to configure it for durability, how that impacts performance, whether the operation needs such consistency vs. availability, how your application reacts when the broker is unavailable, and how it recovers when Kafka comes back online.
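To make that concrete, the knobs mentioned above are standard Kafka producer and topic/broker configs. A hedged example (the values are illustrative, not recommendations):

```properties
# Producer-side durability
acks=all                   # wait for the leader and all in-sync replicas
enable.idempotence=true    # avoid duplicates when retrying transient failures
retries=2147483647         # keep retrying transient broker errors

# Topic/broker side (set on the topic, shown here for context)
# replication.factor=3
# min.insync.replicas=2    # with acks=all, writes fail fast if the ISR shrinks below this
```

The tradeoff the comment describes falls directly out of these: acks=all plus min.insync.replicas=2 survives a leader failure at the cost of higher produce latency.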
Thanks for the video! When implementing the Redis solution, wouldn't availability of cache resources be an issue? Every swipe event now causes a cache check for the inverse swipe. If Redis is single-threaded, wouldn't that overload the cache?
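For intuition on the load: each swipe is a single O(1) key lookup, which one Redis node handles on the order of 100k ops/s, and the keyspace shards cleanly by user pair if one node isn't enough. The check-inverse-then-record flow sketched below uses a plain dict standing in for Redis (the key format and names are my own):

```python
def on_swipe(cache, swiper, swipee, direction):
    """Record a right-swipe and report whether it completes a match.

    `cache` stands in for Redis: get/delete of the inverse key is one
    lookup, and recording the swipe is one SET (with a TTL in real Redis).
    """
    if direction != "right":
        return False
    if cache.pop("swipe:%s:%s" % (swipee, swipee_to := swiper), None):
        return True  # inverse swipe was present: it's a match
    cache["swipe:%s:%s" % (swiper, swipee)] = 1  # real Redis: SET key 1 EX ttl
    return False

cache = {}
assert on_swipe(cache, "alice", "bob", "right") is False  # no inverse yet
assert on_swipe(cache, "bob", "alice", "right") is True   # completes the match
```

In real Redis the get-and-delete would be GETDEL (or a small Lua script) so the check and cleanup stay atomic on the single-threaded event loop.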
I want to take a moment to explain why Evan is actually the King, and it's not just his last name. Around 55 minutes into the video, when handling the consistency requirement of only sending one request to a driver at a time, he gave 2-3 solutions, starting with adding a driver status like request_sent. He explained it properly and even said it's good enough for mid-level, with the cron job etc. Then he removed all that and brought Redis into the picture with its TTL trick. This is where the whole thing becomes beautiful, because he gave multiple approaches and then chose the best, just like in any DSA problem: doing two loops first, then using some data structure or algorithm to make it linear. He also related it by saying it's just like a HashMap. I think this is how it will stick with people, instead of just saying from the start, "Okay, use Redis, bye." I have read tons of tutorials and some do exactly that. So yeah, Evan is the King! Thanks Evan. This might be the first time I have commented on every video of the channel I've gone through. You deserve it! Thanks
Watched the first 30 minutes of the video and I have to say this is by far the best Uber tutorial, because you haven't reached for WebSockets and all the other jargon yet. We must understand the basics first: the system should at least work for one user, and for me as a mid-level candidate, building what you did in the first 30 minutes is important for actually understanding things. All the other tutorials out there jump straight to WebSockets etc. Those are good, but I think your method is better. Amazing. Time to watch the remaining 30 minutes! Thank you, you are doing an awesome job for the community.
FWIU, Redis is crazy fast because it is an in-memory database. There's a possibility of data loss, during which we're likely to miss matches altogether. Adding Redis is one step closer to the better experience of instant matches, but we'd still need a cron job to look back for missed matches. Unless, of course, we use persistence like an append-only log, but then we lose performance: it becomes equivalent to a memtable plus an append-only log, same as Cassandra. Or am I wrong here?
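The tradeoff described above maps directly onto Redis's AOF settings; the `appendfsync` policy picks the point on the durability/performance curve (these are real redis.conf options, the comments are my gloss):

```
appendonly yes
appendfsync everysec   # fsync at most once per second: ~1s worst-case loss, small cost
# appendfsync always   # fsync every write: safest, slowest (closest to a durable log)
# appendfsync no       # leave fsync to the OS: fastest, most loss on a crash
```

So the comment's intuition holds: `everysec` is the usual middle ground, and even with AOF on, a crash can still drop the last moment of swipes, which is why a reconciliation job remains useful.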
Where do we draw the line between handling mega users and normal users? To me this suggests an issue with the design. Is this actually how big social media companies handle it?
How can you rate limit from a stateless API gateway? Are you suggesting a) using IP-hash-based load balancing and storing state on the gateway workers, or b) using a distributed state cache and reading from it on every HTTP request?
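Option (b) is the common pattern: the gateway stays stateless and every request does an atomic counter bump in a shared cache (in Redis, INCR plus EXPIRE, or a single Lua script). A fixed-window sketch with a plain dict standing in for the shared store (names and limits are illustrative):

```python
import time

def allow(counters, key, limit, window_s, now=None):
    """Fixed-window rate limit check against a shared counter store.

    `counters` stands in for a shared cache such as Redis; the
    increment-and-compare would be atomic there (INCR + EXPIRE).
    """
    now = time.time() if now is None else now
    bucket = (key, int(now) // window_s)      # which window this request falls in
    counters[bucket] = counters.get(bucket, 0) + 1
    return counters[bucket] <= limit

counters = {}
assert all(allow(counters, "1.2.3.4", 3, 60, now=100) for _ in range(3))
assert allow(counters, "1.2.3.4", 3, 60, now=100) is False  # 4th in same window
assert allow(counters, "1.2.3.4", 3, 60, now=161) is True   # new window resets
```

Fixed windows allow a burst at the boundary; sliding-window or token-bucket variants fix that at the cost of a bit more state per key.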
Hey Evan, this is truly awesome content. I had one question. In each of your videos you end up mentioning a lot of amazing things; in this one, for example, you mentioned FileSystemWatcher for detecting local changes. My question is more about how you learned all this stuff. You have 10-12 videos on different designs, so it's unlikely you practically built all those systems in your career. How do we learn these unknown-unknowns? Do you read books, etc.? I hope my question makes sense. Thanks in advance.
Watched 20 minutes of the video so far, and this is the 3rd resource I'm going through on Dropbox's design: I've read Alex's book, read the Grokking book, and now I'm watching this just for fun, and I think Evan King is actually the King lol. Amazing video, please keep adding more content. Yesterday I commented on the Tinder design video and now here. I think I might have to comment on all the videos once I watch them, because this is really good stuff and we viewers should appreciate it, so I will keep adding comments lol :D
Hi Evan, great video! A quick question on the consistency definition. In CAP, consistency is defined as "every read receives the most recent write or an error." Based on that definition, how do you derive the definition of matching consistency in this video? (i.e., <1> we don't send more than one request at a time for a given ride; <2> we don't send any driver more than one request at a time). Looking forward to your reply. Thank you!
Lol you are great, I stopped the video halfway to comment this. The way you explained the eventual consistency problem, where two users swipe on each other at the same time but Cassandra, being eventually consistent, prevents them from ever knowing, so they'll never find the love they're seeking lol. Amazing stuff. Requesting you to create more and more content. You are a legend!
As I watch this again: why do you say checkpointing is not required at around 45:30? How does a job keep track of which time interval to process for the click rate? Does Flink internally maintain the state of the last processed timestamp? (I'm not familiar with Flink!)
A few things aren't clear to me; I'd appreciate some explanation. 1. Is the database at the serializable isolation level? Would a weaker level work? 2. Say users A and B both try to book the same 3 tickets. User A's whole process goes through and books all three tickets. User B's network is laggy and the request reaches the booking service 10 minutes late; how will the system prevent double booking for user B? If we use select-then-update on the ticket table, then we must use the serializable isolation level, right? Will that isolation level hold up under heavy write traffic?
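On question 2, one common answer is that serializable isolation isn't required if the UPDATE itself carries the availability check: the row lock taken by a conditional UPDATE makes check-and-book atomic, so B's late request simply matches zero rows. A sketch with SQLite standing in for the ticket table (schema and names are my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO tickets (id, status) VALUES (?, 'available')", [(1,), (2,), (3,)]
)

def book(conn, ticket_id, user):
    # Conditional update: only succeeds if the ticket is still available.
    cur = conn.execute(
        "UPDATE tickets SET status = ? WHERE id = ? AND status = 'available'",
        ("booked:%s" % user, ticket_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated => someone else got there first

# User A books all three tickets.
assert all(book(conn, t, "A") for t in (1, 2, 3))
# User B's delayed request arrives later; every conditional update misses.
assert not any(book(conn, t, "B") for t in (1, 2, 3))
```

In Postgres/MySQL the equivalent works at read-committed, since the UPDATE's row lock plus the re-evaluated WHERE clause prevents the double booking, which is much cheaper than serializable under heavy write traffic.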
Aah thank you for this!! I'm just learning system design for the first time really. This is the first walkthrough I've watched and it is SO helpful to see your process and how all the building blocks fit together! 🤩
Hi Evan, thank you for sharing the high-quality video! One quick question: when drawing the design diagram, when do you use one-directional arrows between boxes, and when do you use bidirectional arrows? Looking forward to your reply. Thank you!
You can. The fact is there are a million things you *could* mention. CI/CD, monitoring, GDPR, all things you need in any system. Mention them if you like, but to me as an interviewer I don’t learn anything about the candidate when they say these things so it just wastes some of the valuable (and very limited) time we have
Hey Evan. I am a former FAANG staff engineer and I am interviewing actively now. Your content is the best out there. The deep dives are the best. They've helped me a lot. Please continue creating content. And if there is a Patreon or something I can contribute to then please let us know.
Hi, I just want to say your work is truly fantastic and you are an excellent educator! I started watching your videos to prepare for system design interviews, and your explanations and examples are top notch. I'm motivated to keep watching, not just to prep but also to expand my knowledge and improve my craft. Eagerly awaiting the great work you'll do next!