Тёмный

System Design of Doordash: Geo-Hashing and WebSockets for Location Based Services 

Gaurav Sen
Подписаться 583 тыс.
Просмотров 93 тыс.
50% 1

We go through a popular interview question: Design Doordash.
The system design of Doordash (similar to Swiggy and Zomato in India) involves matching food orders to riders. A rider has to be selected based on their location proximity to the restaurant (since the time taken to deliver a order from restaurant to customer wont change on rider).
For this, we shard geographical locations using hashes, known as Geo-Hashing. An area can be recursively broken down using a geo hash. Riders within a goehash region can be asked to pick an order on demand.
Tracking a delivery is done using server-side events or WebSockets. I suggested the idea of WebRTC for this, but it seems like overkill. Let me know your thoughts :D
Jordan's RU-vid Channel: ‪@jordanhasnolife5163‬
Location-based databases: • Designing a location d...
System Design Website: interviewready.io
00:00 Intro
01:35 Functional Requirements
02:50 Capacity Estimations
06:45 API Endpoints
08:10 Data Sources
11:30 Onboarding a restaurant
12:20 GeoHashes
23:20 Driver Updates
27:00 Data Consistency
28:30 Consistent Hashing
36:40 Optimizing Deliveries
40:20 Delivery Tracking
44:44 WebRTC
46:18 Concluding thoughts
You can follow me at:
Github: github.com/coding-parrot/
Instagram: / applepie404
LinkedIn: / gaurav-sen-56b6a941
Quora: www.quora.com/profile/Gaurav-...
Twitter: / gkcs_
#SystemDesign #InterviewReady #Coding

Опубликовано:

 

24 июл 2024

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 67   
@binwelbeck1482
@binwelbeck1482 Год назад
Jordan Knows what he is taking about ; I like his confidence and the way he explain the scenario. thanks gaurav for inviting Jordan
@payalgupta3487
@payalgupta3487 2 года назад
Digging deeper into single component is such a good idea. Rightly said that while discussing complex systems on the whole we tend to oversimplify. Enjoyed the video thoroughly. Thanks.
@hardikmenger7360
@hardikmenger7360 2 года назад
This guy was randomly seen on my feed and since then i am following him. His channel is the most no bs system design walkthrough channel.
@jordanhasnolife5163
@jordanhasnolife5163 2 года назад
Thanks Hardik!
@geekydanish5990
@geekydanish5990 2 года назад
I have been watching Jordan’s content for quite a while now and dude you are just killing it thanks Gaurav for inviting him
@jordanhasnolife5163
@jordanhasnolife5163 2 года назад
Thanks Geeky! I appreciate it! 😙
@ayonkar1534
@ayonkar1534 2 года назад
This is one of the best videos I have come across for SD Excellent interview candidate.
@modernsimplelife8706
@modernsimplelife8706 Год назад
Great video! It's very helpful to see a video where both of the interviewer and the interviewee know what they're talking about in a mock. These are some thoughts about WebRTC from my experience. I've been using and implemented WebRTC for 3 years. WebRTC uses UDP under the hood, however it uses SCTP protocol on top of UDP (probably will be replaced by QUIC eventually) to create a TCP-like experience for data channels. The developer is allowed to configure either guaranteed delivery (TCP) or lossy (UDP). WebRTC requires a signaling server to build a root of trust, and connection information for both the user and the driver. WebRTC and Websocket are stateful, meaning the server needs to keep the socket connection open for the entire session, otherwise the clients must reconnect. The stateful nature makes the solution harder to scale, and more complicated. As you guys mentioned, most users likely don't track on their food delivery as often as they would with Uber. I think WebRTC is overkill in this use case. The WebRTC connection likely gets broken (from closing the app) many times in the food delivery tracking session, and cause load churns to the server. I personally prefer doing HTTP polling with HTTP/3 transport (which is also based on UDP and support 0-RTT connection establishment). I think it is the simplest solution with reasonable low load to the server and easier to load balance horizontally. But, at the end of the day, we should look at metrics to decide the best solution for the problem. Strategically, starting with the simplest solution to quickly gather important metrics would be a good idea. So, I would start with HTTP polling, collect the data, then re-evaluate if WebRTC is needed. Finally, collect the data between HTTP polling and WebRTC. HTTP polling might be the most optimal solution already, simpler solutions can do wonder sometimes 😀. There are also ways to reduce states in WebRTC.
@sachin_yt
@sachin_yt 2 года назад
It's amazing watching both of you! Learned a lot! Thanks Gaurav! Def need more of these :)
@jordanhasnolife5163
@jordanhasnolife5163 2 года назад
I appreciate it!
@jackkohnjj
@jackkohnjj Год назад
This was a realistic interview in the sense that the interviewer realizes that the scale they are asking to design for is completely insane (500K drivers, 10M orders a day) but still proceeds as a thought experiment. I had an interview recently for a game platform and the interviewer said I should design for 50M concurrent users. LMAO.
@SoumikPradhan
@SoumikPradhan 2 года назад
This collab was worth the wait. Nice discussion.
@givemeabreakbro
@givemeabreakbro Год назад
Jordan knows what he's talking about. Every answer sounds calculated and well reasoned. Gaurav + Jordan is a killer combo. Looking forward to more such collabs
@ayushraj6525
@ayushraj6525 2 года назад
Learned a lot from this collab.. Thanks a lot Gaurav! 🔥
@user-pp6fi2bt4w
@user-pp6fi2bt4w 4 месяца назад
excellent explanation about geoshashing, thanks guys!
@TheHp6515b
@TheHp6515b 2 года назад
Great content. Jordan is very articulate and able to seamlessly translate theoretical knowledge to concrete solution. Quick note, consistently hashed cluster is great when keys evenly distribute across the hash space. In this case if we use `geo-hash` directly as the key, we get uneven distribution since small number of prefixes will have most of the data (e.g. geo-hashes representing Manhattan, Seattle etc) i.e. small number of keys having larger data. We could use a more dynamic range based sharding (involves a seperate manager service etc) to split key ranges based on load. Alternatively, we can hash the geo-hash itself to get even spread but loose colocation of relatively closer grids which could be important as we zoom-out for finding drivers. Either works but worth discussing the tradeoff especially considering latency is critical. (fan-out vs fan-in). Also wasn't clear what approach was proposed. Also, Jordan touched about point related to use of pure peer to peer communication for tracking (user and driver directly exchange communicate). Few reasons why this isn't practical generally, 1> Security & Firewalls. Exposed open ports in user's phones are prime target for DOS attacks. Most routers or n/w infra will actually block any new external incoming connection by default and will require a user override to whitelist/forward port. If it's a public router etc there won't be anyway at all to receive connection requests. 2> Relaying through a central service gives more reliable (in terms of throughput) network path (aws/gcp have interconnects with major isps and dedicated n/w infra), whereas for direct connections require to go through congested public interconnects. This is less of a problem when people are in close geo proximity so insignificant in this case, however relevant for whatsapp etc. Understandably not every aspect can be discussed in detail and interviewee isn't even expected to be aware of every aspect. Hence FAANG(at least F,G,U) gives interviewee leeway in picking one aspect of their choice for deep dive besides getting overall high level design. Side note. It is ok to say i don't know internals of x or y (say websockets or loadbalancers or storage engine etc) and move the interview to what you know. i.e. play by your strengths, not knowing deeply about few things is expected and doesn't result in -ve points but vaguely/incorrectly answering something will lead to -ve point and more followups around the same thing eventually leading to botched interview.
@colton8971
@colton8971 2 года назад
Jordan is a funny dude and not ugly!
@hitmusicworldwide
@hitmusicworldwide 2 года назад
In the USA geohash fixed addresses by zip code+4. The post office has done all the work. A more sophisticated analysis of verified address locations means that you aren't sharding for a lake, or other uninhabited area, saving tables, and flagging possible false orders and incorrect data.
@suhani7137
@suhani7137 2 года назад
hi gaurav! thank you for the amazingg content you are providing. i have binge watched your entire content in the past few months.on point explanation...your teaching skills are really goood. just a suggestion if you may...you should collaborate with people who amplify your reach ...like aman dhattarwal..or maybe launch some live courses on their platform...his reach is MASSIVE..it will also help you increase sales for your startup Idk if the suggestion i gave is even possible😅but just felt like this precious content and your courses should be reaching more and more people!
@okidokiyowyow356
@okidokiyowyow356 2 года назад
Hahaha. The introduction is really funny
@devam9999
@devam9999 Год назад
Would have been great to see details about the DB/tables design, and then going for the data storage estimations based on that. Is that something generally expected, or is the high level data storage estimation without getting into details about schema design, as done by Jordan here, also acceptable in an interview?
@apoorvchaturvedi2493
@apoorvchaturvedi2493 Год назад
Thanks Gaurav for this great video. Just one question as redis is being used here as geo-sharded database how concurrency will be handled with redis.
@imranshaikh115
@imranshaikh115 Год назад
Very nice 👌 I appreciate 🙏 💛 your efforts
@ankitnmnaik229
@ankitnmnaik229 2 года назад
Bhaiya, i just passed clas 12 th..and i want to get into programming (ml/Ai) but i have never been good at problem solving..so how can I improve my problem solving from now on...can u suggest ways..pls..
@tanveer.shaikh
@tanveer.shaikh 2 года назад
hi Gaurav, for every video can you also please mention the pre-requisite concepts , i felt few of the things going over head
@nishantgarg2815
@nishantgarg2815 2 года назад
Nice one. Learnt a lot. Keep up the good work
@user-pj9ck8ef1j
@user-pj9ck8ef1j Год назад
why this video can not be found on gaurav main page home->system design tag?
@michahubert1179
@michahubert1179 3 месяца назад
Thats a great content ! One thing that happens notoriously without noticing are units. 5Mb is 5 mega bits "b" not bytes "B".
@nitishkumarsingh6020
@nitishkumarsingh6020 Год назад
Which tool has been used for drawing?
@secondIncomePartTime
@secondIncomePartTime Год назад
jordan is genius
@constantinekhvalin6038
@constantinekhvalin6038 Год назад
is it secure to have peer-to-peer connections between dashers and clients?
@kaustabpaul4514
@kaustabpaul4514 4 месяца назад
thanks for sharing it was awesome
@MrMAhmed22
@MrMAhmed22 10 месяцев назад
Very nice effort, felt like gaurav missing some stuff on geo sharing and asking off questions was a bit confusing for the interviewee
@RahulDasAdhikary-usa
@RahulDasAdhikary-usa Год назад
Nice work! First example is nice - Sri Lanka to India Food delivery 😀 ha ha.. Anyways, I like your channel Gaurav.😊
@kamalsmusic
@kamalsmusic 2 года назад
Why do we need to do range queries within the redis instance in order to find dashers? If each shard of redis handles a range of geohashes, then can each redis shard just have a hashmap which maps a geohash -> [list of dasher ids]? You can use consistent hashing to locate the shard, then just do a hashmap lookup within that shard. As long as given a geohash, I can calculate the 8 surrounding box geohashes, this should be sufficient right? I think most libraries for geohash can calculate this quickly
@jordanhasnolife5163
@jordanhasnolife5163 2 года назад
I think it really just depends on the size of the geoshard that you use. If they are big, then you'll have to do a range query to quickly find a nearby driver to a given geohash. If they are the perfect size, then you could just select any of the dashers located in that node.
@chinmaysharma1867
@chinmaysharma1867 2 года назад
Do I need to know Web development before study system design ???
@abhineetsingh6720
@abhineetsingh6720 Год назад
great video
@RedBeardRetroTech
@RedBeardRetroTech Год назад
Thanks for the vid
@sanketsardesai1961
@sanketsardesai1961 Год назад
I would have used PostGIS definitely, store the data in geometry and use all the spatial functions, indexing and what not
@joaoluismoraes7215
@joaoluismoraes7215 9 месяцев назад
Hi Gaurav, this interview feature by Interview Ready looks nice. How did you guys build it? WebRTC? Have you written any tech deep dive on implementing it?
@gkcs
@gkcs 9 месяцев назад
We wrote the code in Golang, and accept http requests only. My next video is a deep dive on the architecture, stay tuned!
@hazemabdelalim5432
@hazemabdelalim5432 Год назад
Isn't MYSQL harder to shard ? i believe no-sql is easier to shard in that case so i would choose it .
@gkcs
@gkcs Год назад
It has to be sharded manually, yes.
@mansivishwakarma361
@mansivishwakarma361 2 года назад
💕💕💕💕
@poshettiakhil2664
@poshettiakhil2664 2 года назад
Hi Gaurav, This question is not related to doordash system design, Please can you explain the difference between message bus ,message queue and message broker
@protyaybanerjee5051
@protyaybanerjee5051 2 года назад
Message broker can be a distributed system which provides a reliable and fault tolerant way of delivering messages b/w multiple producer and consumer. A message broker can handle many to many mapping between a producer and consumer
@joaoluismoraes7215
@joaoluismoraes7215 9 месяцев назад
I've worked at a company that solved a similar problem but at a smaller scale (it had 200k users and more than half weren't active). Instead of Geohashing, they did it with MongoDB Geospatial queries. Why is Redis geohashing better in this case? When does geospatial queries start to become a problem?
@mhmdshaaban
@mhmdshaaban 8 месяцев назад
MongoDb supports both 2d Index where it uses b-tree and 2dsphere Index where it uses similar approach "geo hasing" I'm curious why do I have to implement that from the get-go and It's already implemented
@deepak655655
@deepak655655 2 месяца назад
Gaurav talking about UDP and his voice went berserk, talking about coincidence !!!
@tusharabbott9946
@tusharabbott9946 2 года назад
now i know why swiggy was not able to find a delivery partner for my order.
@rkara2
@rkara2 2 года назад
Why can't you just have all the actors (drivers, customers, businesses) subscribe (websockets) to the system and ping their location (lat/long) in real-time? Then when a customer / business needs a driver, they are getting the real-time picture of whos active in the system and where they are. So at the end of the day you don't care where drivers / customers are in the world, you just care about their availability. So you can have one table called Availability that you match everybody through.
@fabioschubert
@fabioschubert 11 месяцев назад
Lots of good stuff here but should've at the very least *mentioned* storing the Restaurant's data, menus, building the order, etc.
@protyaybanerjee5051
@protyaybanerjee5051 2 года назад
Interview Ready UX could do with some improvement
@gkcs
@gkcs 2 года назад
Yes. Thanks for the feedback. Any suggestions for improving the UX?
@jackkohnjj
@jackkohnjj Год назад
FYI, lat/lng to geohash is simple arithmetic. It would be ridiculous to call a service to do that.
@VireshGehlawat
@VireshGehlawat Год назад
The content was great, the tool needs to be improved. I can see both of them struggling with the interface. @Gaurav - As the interviewer in a real world scenario, the interviewer won't really be drawing as much as you were in this case.
@sipwhitemocha
@sipwhitemocha Год назад
Aye shoutout to J. Epstein
@anirudhsilverking5761
@anirudhsilverking5761 Год назад
I think I gained 30 IQ points just listening to this
@gkcs
@gkcs Год назад
Damn!
@KiritiSai93
@KiritiSai93 3 месяца назад
The comment about geohash being ordered and doing binary search is a bit flawed. Remember that geohash has this property: 'abc' is ALWAYS closer to 'abd', but 'def' for example might actually be also closer to 'abc' as well. So closer in edit distance is a sufficient condition but not a necessary one. Read System Design book by Alex Xu for more info on how this could happen.
@gkcs
@gkcs 3 месяца назад
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-OcUKFIjhKu0.html
@KiritiSai93
@KiritiSai93 3 месяца назад
Thank you! Looks like you've already covered in the topic :)
@timurakhalaya6289
@timurakhalaya6289 3 месяца назад
poor performance i would say, in term of technology picked and in terms of planning as well ...
@aparnaprabhu
@aparnaprabhu 2 года назад
I read this as system design of Doordarshan 🤦‍♀️🤦‍♀️
@gkcs
@gkcs 2 года назад
Lol
Далее
Elden Ring DLC - ПОДОЖГЛО ПОПУ!
07:26
Просмотров 529 тыс.
20 System Design Concepts Explained in 10 Minutes
11:41
Redis Deep Dive w/ a Ex-Meta Senior Manager
31:00
Просмотров 11 тыс.
System Design: TINDER as a microservice architecture
36:41