Thanks Jordan for all the good content! I had an interview at Microsoft, and I only knew DSA, design patterns, and AWS microservices; I didn't have any idea about the tradeoffs and challenges we face when designing systems. I went through a lot of system design content, but the creators did not propose alternative solutions to the problems, nor did they justify the decisions they took while designing the system. I binge-watched the two playlists - system design and system design questions - in the span of a week. I was able to design a graph database using NoSQL in the design round and got selected for the role. Also, your view counts don't do justice to the quality of content you have in these playlists. I will surely promote this channel later in a LinkedIn post, and your friends too will know how big of a nerd you are.
Since reads are 100x writes, I think we need a consistent DB like MongoDB or Postgres and then leverage a non-consensus algorithm. Yes, you said single leader. I'd go more with a linearizable compare-and-swap, which you also hinted at. However, this will increase latency. OR we can use Google Spanner for our key generator DB so that it ALWAYS doles out unique hashes to requests. It could be centralized with its own atomic clock. But again, that's SUPER expensive. :)
Thanks a lot for the great content, and especially for your dedication to keep producing it. Quick question: you mentioned Riak can be used, so that if a conflict occurs, we assign one of the conflicting writes the next available sequence. But at that point we might have already handed out the short URLs to both users - wouldn't that still be the same issue as with Cassandra? Am I missing anything here about using Riak?
Please excuse my ignorance if this is covered in the video or comments. The key generation service approach has another disadvantage: for the same actual URL passed by a user, the system will generate a new TinyURL every time. That creates duplicate data compared to the MD5 or SHA-1 approach, where the tiny URL has a 1:1 mapping with the actual URL.
Agreed - though I think that this is actually intended behavior for TinyURL just so that each user can track the performance of their own TinyURL. That's a great point to make during an interview though!
Hey Jordan, awesome content! 1. Does single leader replication imply there is always only one leader per DB, or one leader per partition? 2. What do you say about having a key generation service generate X keys/second, storing them in a Redis Cluster [key (string) -> used (bool)], and only using keys from Redis when needed? Redis uses single-leader replication and locks the data during updates, so theoretically there should always be one unique key returned? And for the link table we can use Cassandra freely (as the keys are already generated and stored in Redis)? 3. Would you use some message brokers? How should the services communicate?
1) One per partition. 2) That's reasonable, but I doubt you're going to get better performance using a separate node as a lock than you would just using locking on each node; otherwise you need a two-phase commit. 3) If I was gonna do the approach in step 2, I'd probably just have to use a two-phase commit here, no message broker.
Hey Jordan, I had a question about what caching strategy you would use on writes, and how to keep the cache coherent. Of all the strategies you've mentioned in your previous video, which would you suggest? 1) I'm guessing we would not use a write-back strategy, as we don't want to batch writes in our cache and then send them to our DB; having write consistency for our long URL -> short URL mapping is important, and if we batch writes, then two different users could end up having the same short URL for different long URLs. 2) I'm guessing we don't use write-around, as we don't generally update the same key (same shortened URL) very often, so there's no need to write the key to the DB and invalidate it in the cache. 3) So I'm guessing we'd use a write-through strategy, where we update the DB and the cache in parallel, but without using 2PC, as we only care about the write succeeding in the DB, not so much about the write to the cache. I see from your diagram that MongoDB is reaching out to your cache - does that mean you've used write-around?
Personally I think that write-around or write-through is fine. It's no big deal if there are a couple of cache misses on the first couple of reads, so long as the popular items are being pulled into the cache. Keep in mind the issue with write-through is that every URL written would be sent to our cache, and there are probably a lot of TinyURLs created that aren't read for a while, or at all.
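To make the write-around discussion above concrete, here's a minimal cache-aside sketch (a toy, not anything from the video - the `db` and `cache` dicts are stand-ins for the real database and something like Redis): new URLs are written straight to the DB, and the cache is only populated when a short link is actually read, so rarely-read links never take up cache space.

```python
class UrlCache:
    """Toy write-around + cache-aside store for short -> long URL lookups."""

    def __init__(self, db):
        self.db = db      # stand-in for the real database
        self.cache = {}   # stand-in for Redis/Memcached

    def create(self, short, long_url):
        # Write-around: only the DB is updated; the cache fills on first read.
        self.db[short] = long_url

    def resolve(self, short):
        if short in self.cache:       # cache hit: no DB round trip
            return self.cache[short]
        long_url = self.db[short]     # cache miss: fall through to the DB
        self.cache[short] = long_url  # pull the (now popular) item in
        return long_url
```

A cache miss on the first read of each link is the price paid for keeping never-read links out of the cache entirely.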
Hey Jordan. Thanks for all the great videos! Shouldn't the fetch/write servers query ZooKeeper in order to know which partition they should use for a given key?
Thanks, fantastic content, much more in-depth than other similar design videos (although requires more prior knowledge). What do you think about the idea of each service instance pre-loading (claiming) say 1000 keys each to minimise DB locking and synchronising on a single ever-increasing counter i.e. rowId. I think this was suggested in Grokking series you mentioned.
That's the key generation service idea! I believe that I mention that in this video as well, albeit perhaps less in depth :) Thanks for the nice comment!
Hey Jordan, I'm curious if you could elaborate more on the "key generation service". You glossed over this pretty quickly after elaborating on why we wouldn't want to use hashing for this. So... how do you create the keys? How do you make them only 8 characters long? Is this simply something you can do in Mongo, i.e., populate the LinkTable with a bunch of entries that are empty except for the key? How is this managed? Thanks. Also... where's the video of the train being run on u?
Oh, I actually think hashing is fine for this problem (and is probably what you should do); just realize that you have to accept the possibility of collisions and change one of the keys accordingly. As for the key generation service, the general premise is that all of the possible keys are pre-populated in the database, even if they don't have a website associated with them. Then, every time a user requests a new shortlink, they are assigned one of the available keys and that key is marked as used. The key thing to note here is that because each process is first going to read a row and then try to claim it, locks should be used on the rows to ensure that somebody does not claim a key only to have their claim overwritten by another concurrent request - it is important that all subsequent users see that the key has been taken. Thus, you would want a database with ACID properties, such as a relational one.
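As a rough sketch of that read-then-claim flow, here's what claiming a pre-populated key could look like against a relational store (SQLite here purely for illustration; the `keys` table and `claim_key` function are made-up names, not from the video). `BEGIN IMMEDIATE` takes the write lock before the SELECT, so no concurrent writer can claim the same row between our read and our update:

```python
import sqlite3

def claim_key(con):
    """Atomically claim the next free pre-generated key.

    The connection must be in autocommit mode (isolation_level=None)
    so this explicit BEGIN isn't nested inside an implicit transaction.
    """
    con.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = con.execute(
        "SELECT key FROM keys WHERE used = 0 LIMIT 1").fetchone()
    if row is None:
        con.rollback()
        raise RuntimeError("key space exhausted")
    # No other writer can have claimed this row between the SELECT and
    # this UPDATE, because we already hold the write lock.
    con.execute("UPDATE keys SET used = 1 WHERE key = ?", (row[0],))
    con.commit()
    return row[0]
```

In a real relational DB you'd express the same idea with `SELECT ... FOR UPDATE` or a single conditional UPDATE, but the principle is identical: the read and the claim happen inside one serialized transaction.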
Hey Jordan, I have a doubt; not sure if I understood correctly. 1) The create-key servers generate the key using hashing and store it in a particular partition - how do we decide which partition that short URL is stored in? 2) Does our hash function generate the same key if the same URL comes to the system again?
1) Based on the hash 2) It should yes, unless of course you don't want it to because you want different people to have different links for the same URL, in which case you should pick a different shortened key
Hey Jordan, thanks for this video. Appreciate your efforts. However, I have a question related to single leader replication. I would like to understand more about the two scenarios below: 1. Assuming there is one leader and our service is hosted across multiple regions (possibly in different countries as well), do all calls go to the same leader hosted in one region? 2. If the above is not true, and every region has a single leader with multiple replicas, then how do calls from multiple data centers guarantee uniqueness? Do all leaders in all regions call each other to see if the key has already been created?
1) For writes, yes. Since this is TinyURL, I'm not too concerned about write latency. If we really needed to, we could shard our leaders based on IP address and then have those leaders in separate data centers (e.g., North American clients write to a database in the USA, Indian clients write to a database in India). You could assign each region a disjoint subset of the possible TinyURLs to hand out so we don't have to worry about conflicts. 2) The above is true.
Great video! Question - if this is going to be a user oriented tinyurl generation, why not use the sha of (long url + userid)? Wouldn’t this ensure no collisions?
Hey, I'd recommend watching the newer version of this video - that's what I did. It doesn't ensure no collisions, but it should greatly reduce them.
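For illustration, a sketch of the sha-of-(long url + userid) idea from the question above - `short_key` is a hypothetical helper, and as the reply notes, a real write path would still need to detect the (now rare) collisions:

```python
import hashlib
import string

# Base62 alphabet: digits + lowercase + uppercase, as short-URL keys typically use.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def short_key(long_url, user_id, length=8):
    """Derive a deterministic short key from sha256(long_url + user_id).

    Different users shortening the same long URL get different keys,
    which shrinks (but does not eliminate) the chance of collisions.
    """
    digest = hashlib.sha256(f"{long_url}|{user_id}".encode()).digest()
    n = int.from_bytes(digest, "big")
    out = []
    for _ in range(length):
        n, r = divmod(n, 62)  # peel off one base62 digit at a time
        out.append(BASE62[r])
    return "".join(out)
```

With 8 base62 characters there are 62^8 (~218 trillion) possible keys, so two distinct (url, user) pairs colliding is rare but still possible.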
Why do you say "200 writes per second is no joke. We are gonna need to support high write throughput"? Most RDBMS can sustain that load. Could you elaborate more on rules of thumb you use for QPS and what different systems can support?
Hey Jordan, Thanks for the effort in putting the series. Few questions though 1. Why is it problematic if two users want to generate tinyURLs for same long URL? Beyond analytics use-cases and audit information for the user who created the tinyURL, what other drawback is there in returning the same URL? 2. Why not use something like Twitter Snowflake instead of Key Generation Service?
Hi! I think you mainly voiced the problem right there; otherwise it would be totally fine. However, afaik many interviewers will want unique URLs. As for question 2, I'll have to look into Twitter Snowflake, but generally niche technologies like that tend to be less viable for systems design interviews because the interviewer may not know them lol, so you'll have to explain it really well.
@Jordan, wondering if you have prepared slides for all the system design questions by any chance (similar to the system design playlist slides). If yes, those would greatly help as a refresher, as each video is 30 minutes and it's not possible to go through all of them to refresh the concepts/applications quickly before an interview.
@@jordanhasnolife5163 these sys design slides and videos are simply unparalleled btw.. love them man! Thanks for making these! Upped my sys design game by 10x for sure 😄. Great alternative to reading DDIA as I did twice, it is brick for sure yes 😀
Ahh, fair point - I always forget DynamoDB and Dynamo-style are different. Though I believe DynamoDB isn't open source, so it's hard to know exactly what's going on behind the scenes there.
Hey Jordan, can you elaborate a bit more on how single-leader replication helps with the collision problem better than a leaderless replication scheme?
In single leader replication the writes can only go to one place, so you can use locking to ensure that keys will not be duplicated. In a leaderless scheme, on the other hand, this is not possible, because two requests claiming the same short link will go to different replicas and only find out about one another later down the line. Then we will have a collision.
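The "writes go to one place, so locking works" point above can be sketched with a toy in-process allocator - the lock plays the role of the single leader serializing claims (all names here are made up for illustration):

```python
import threading

class KeyAllocator:
    """Single-writer-style allocator: one lock serializes all claims,
    the way a single leader serializes writes, so no two concurrent
    requests can ever receive the same key."""

    def __init__(self, keys):
        self._free = list(keys)
        self._lock = threading.Lock()

    def claim(self):
        with self._lock:  # read-and-claim is one atomic step
            return self._free.pop() if self._free else None

def demo():
    """Have 10 threads race to claim 100 keys each from a pool of 1000."""
    alloc = KeyAllocator(f"k{i}" for i in range(1000))
    claimed, claimed_lock = [], threading.Lock()

    def worker():
        for _ in range(100):
            k = alloc.claim()
            with claimed_lock:
                claimed.append(k)

    threads = [threading.Thread(target=worker) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Under leaderless replication there is no single lock like this to go through, which is exactly why two replicas can hand out the same key and only detect the conflict later.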
Hey Jordan, if there is a URL generator service that pre-populates free URLs to use (the GET API queries for a free URL while using locking), in that case how would a write request assigning long URL -> short URL cause write conflicts (and hence could we use leaderless replication on the database that stores the short URL -> long URL association)?
Hey Rahul, I'm a bit confused about what you're asking here! For the key generation service, there shouldn't be two users requesting the same short URL, because we just take the next free short URL - as opposed to hashing, where two distinct long URLs might hash to the same short URL. That being said, with a key generation service there can still be write conflicts under leaderless replication, because different replicas of the generation service may assign the same short URL to two different long URLs.
@@jordanhasnolife5163 You mentioned "different replicas of the generation service may assign the same short URL to two different long URLs" - how can that happen? I assumed the key generation service would have a database of free URLs that is partitioned (using consistent hashing to distribute load), and each partition grabs a lock on the row of the free URL and then marks the URL provisioned. How would two requests receive the same URL (which could potentially cause a write conflict)? Do you mean replication between free-URL partitions might cause replicas to be out of sync, so even locking will not guarantee unique free URLs? Would using single leader replication be better at this stage, rather than at the key generation service (detecting the conflict sooner)?
Hey Jordan! Awesome video. For the key generation service, if we are storing, let's say, 6 characters for each shortened URL, and we have a service that has a list of all the possible shortened URLs we can give out, how come we don't add a boolean column called "used" to our shortened URL database? Also, is there a process by which we can search for the next available shortened URL (i.e., some sort of indexing capability)?
Hi Stephen! Both of these suggestions definitely work! The biggest thing is to be very careful of race conditions - e.g., you and I try to claim abcdef at the same time, both of us see it is available, and then we both successfully claim it one after the other. The point is that locking is necessary here.
I am somewhat surprised people are worried about double key allocation and are using complex solutions with used/unused keys. If you request a key and then abandon it, just forget about the unused value. What do you think?
I think the one scenario you just want to avoid is where two people end up getting the same short URL for their link, but you can pretty easily avoid this with transactions
How do we come up with these calculations in an interview? Do we need to remember or calculate them, or can we ask the interviewer about the scale we should cater to?
I'd work with the interviewer and ask them how many users you can expect, and then start making up numbers based off of that - this part isn't overly important, you just need to get something in the ballpark of the correct scale.
Can you elaborate more on ZooKeeper, please? Does the fetch service, or the load balancer between the fetch service and the DB, communicate with ZooKeeper to get the details of MongoDB partitioning, i.e., which master/replica contains the full-length URL for a given shortened URL? Also, what happens when we add/remove servers from the MongoDB cluster - will ZooKeeper detect it automatically and adjust its partition mapping?
The job of ZooKeeper is basically just to tell database nodes which of the nodes in the cluster are up or down. Upon figuring out this information, the nodes in the cluster can make decisions to rebalance the partitions and things like that. I think MongoDB specifically may handle this for you, so perhaps ZooKeeper was unnecessary here, but if you were making your own distributed database you would likely need a coordination service, unless you elect to use a gossip protocol.
Makes sense! Most NoSQL DBs now support automatic sharding, but if we're building something on our own then we definitely need some sort of coordination, either through ZooKeeper or by making an HA configuration with specific nodes assigned as master nodes for coordination, just how Elasticsearch does it.
I have an interview in 5 days - where can I find condensed information about different databases, like "this one is single-leader" or "this one uses last-write-wins during conflicts"? Basically, I want to know if there is a source of tidbit info about various databases.
Tbh I don't really think there's a great source for that, which is unfortunate. I generally would just say Cassandra if you want leaderless, MongoDB for single-leader NoSQL, and MySQL for single-leader relational databases. You could always talk about NewSQL solutions in a system design interview, but to me those seem like the three most relevant.
@@jordanhasnolife5163 Thank you for your reply - I just used those terms because they were mentioned in the video. If you could make a video on something like this, it would be unique across YouTube and helpful for people looking to gain some info: a series comparing the different database solutions on different criteria, like collision handling, sharding support, and so on. It would be a good source for system design prep. Thank you for your time.
@@sagarmoghe While I agree, and had originally planned on doing something like this, the truth of the matter is that there are a lot of possible database solutions - if you look on my channel, I more or less made a video on each of the notable ones; I'd just recommend watching them all.
@@jordanhasnolife5163 Thank you. I am really early career-wise and don't have much experience with DBs. When u used terms like leaderless, they were intuitive to understand, and I didn't know something like this existed. Even if u don't make a dedicated video on its workings, I understood the gist of it without going through the DB internals in detail. Thank you 😁 I was just lazy and wanted condensed info/properties about databases without going deep into their workings. But I guess there is no way around it.
I understand we are using single leader replication to avoid race conditions. But how do we know whether a single replica can handle 200 writes/sec? At what point (what breakpoint write RPS) would the write load become a bottleneck, such that we'd consider other options?
Yeah this is where partitioning comes in handy. The writes for the same hashes go to the same server, so the write conflict will still be solved, while allowing us to scale out write throughput.
I do not understand the responsibility of ZooKeeper here. In an earlier video, you said that it holds the info about which hash can be found on which partition, but I do not understand what it does here.
Hey Jordan, Grokking estimates that tinyURL will need 12,500 servers... I think that is way too many and they must have made a mistake, but I want your confirmation before I go around saying that they're wrong.
@@jordanhasnolife5163 I disagree with the formula Number of Servers = DAU / 8000. It would make sense if each DAU were making one request per second, but it's more like each DAU makes one or two requests per day. So it should be (DAU * 2) / (8000 * 10^5), where 10^5 is an estimate of the number of seconds in a day. So given a DAU of 100M (as per Grokking), that should be like one server. Add a few more for replication, different availability zones, and maybe to account for peak traffic, but no more than 10.
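Spelling out the arithmetic in the comment above with the thread's own numbers (100M DAU, ~2 requests per user per day, 8,000 requests/second per server):

```python
dau = 100_000_000            # daily active users, per Grokking
reqs_per_user_per_day = 2    # each user resolves a link or two per day
seconds_per_day = 10 ** 5    # ~86,400, rounded up for easy math

# Average request rate across the whole user base.
avg_rps = dau * reqs_per_user_per_day / seconds_per_day   # 2,000 req/s

rps_per_server = 8_000       # the thread's per-server throughput figure

# Fraction of one server needed to absorb the average load.
servers = avg_rps / rps_per_server                        # 0.25
```

So at average load, a quarter of one server suffices; multiply by a generous peak factor, replication, and availability zones, and you still land near 10 servers rather than 12,500.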
Oh that’s the standard that Grokking chose to use as an average of the number of requests per second a server can handle. I’m not entirely convinced about it though. Would love to hear your thoughts about server load capacity and how many connections it can handle per second.
@@alicezhang3504 Yeah truthfully I'm not sure there and I'm sure it changes from framework to framework and CPU to CPU, but 8,000 doesn't seem terribly unreasonable.
Thanks! Assuming that you are completely new to coding - I'd first recommend learning a decently simple language like Python at freecodecamp, then make a personal project or two with it, and then finally start practicing medium questions at leetcode.com for a couple of months before applying to jobs!