@13:11 Everything was going well; you even mentioned the security issue of a counter making the URL guessable, and then you said we add 10-15 random bits. First, we cannot add bits that make the number bigger, because Zookeeper might eventually reach that range itself when counting up from 1 million. Second, let's say we append random characters instead: then the base62 encoding won't be 7 chars long, and you have to take the first 7 chars, which WILL result in a collision sometime in the future.
@@ericfries7229 Could you elaborate? From what I understand, if we concatenate the number with the random bits, the resulting string would be more than 7 chars. Say the new length is 10: base62 encoding would give another 10-char string, which can still result in a collision, no?
If we add 10-15 bits at the end of the counter number, it will increase the base62 output size and exceed the 7-character limit. So can you clarify how that addition of a random string works?
TLDR: md5("123") vs md5("123,xyz"). The original solution hashed the returned counter value, which is a simple increasing number; the resulting hash is then base62 encoded. The new solution takes the counter number and appends some extra characters at the end; this new string is then hashed and base62 encoded. This way, the string fed into the hash is not guessable. It works because the numeric portion is still guaranteed not to collide.
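A rough sketch of that difference in Python (the comma separator, the 16-bit salt size, and taking the first 7 base62 chars are assumptions pieced together from this thread, not the video's exact scheme):

```python
import hashlib
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n: int) -> str:
    # Repeatedly divide by 62, collecting remainders as digits.
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return s or "0"

def guessable_code(counter: int) -> str:
    # Original approach: hash the bare counter. Anyone who knows the
    # counter value can reproduce this code.
    digest = hashlib.md5(str(counter).encode()).hexdigest()
    return to_base62(int(digest, 16))[:7]

def salted_code(counter: int) -> str:
    # New approach: append random characters before hashing, so the
    # hash input can no longer be derived from the counter alone.
    salt = secrets.token_hex(2)  # 16 random bits (size is an assumption)
    digest = hashlib.md5(f"{counter},{salt}".encode()).hexdigest()
    return to_base62(int(digest, 16))[:7]
```

Note that taking the first 7 chars is exactly where the collision worry in this thread comes from; the sketch only illustrates the guessability fix.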
Even without appending extra digits, base62 does not guarantee 7 chars. That simply means we are not confining ourselves to 7 chars. We probably need to tell the interviewer that we would produce as small a hash as possible, but with no guarantees. This is a possible trade-off to avoid collisions.
is there a way to replace zookeeper? why not just store a record of used ranges in a db or redis cache? generating a new range only happens once in a while: when a new server is being initialized, and when a server's range runs out and it needs to fetch a new one.
LRU cache eviction has a big shortcoming. If we assume the cache is always at capacity, then with each new short-URL creation one of the top-20% URLs will surely get evicted from the cache to make room for a random unpopular URL someone has just submitted. This eviction approach means there will be a steady stream of URLs in the cache that are not popular at all.
does Base62 encoding of the counter guarantee to return only a 6- or 7-character string? If so, then Base62-encoding an MD5-hashed value should also return a 6- or 7-character string... please clear up this doubt
Believe me brother, none of the guys on YouTube/books/blogs explain this clearly. Base62 does not guarantee a result length; it generally grows with the input, though.
if we have to base62 the number, then why not precompute it and distribute it to servers, i.e. S1 will keep the base62 of 0-1M, S2 will keep it for 1M-2M, and so on.
Concerning the round-robin load balancing not taking server load into consideration, so that it might still forward a request to an overloaded or slow server: given how this approach works, if one server is overloaded then all the servers are probably overloaded, since the load is distributed equally. The exception is servers of unequal capacity, which raises the question of why you would spin up a lower-capacity server in the first place.
If you are using a unique counter, why do you still need MD5 and base62 encoding? And then you need additional complexity like integrating Zookeeper just to maintain counters.
Hi. Good video. I have a question though. If we base62-encode small numbers like 0, 1, 2..., we won't get a 7-char-long tiny URL, right? So how do we make sure all our short URLs are 7 chars long? Kindly clarify
@@p516entropy Hi. Thanks for responding. But now consider a bigger number, say 1000000: its base62 comes out as 11PVWGSpX6, which is more than 7 characters. I am still unable to understand how to fix the counter-based short URL to exactly 7 characters
@@ashwinnatty Oh sorry, I was in a hurry; now I understand. The base62 algorithm here takes the same approach as base2 (000, 001, 010, 011, 100, 101) or base16 (0000...FFFF), just with 62 symbols, left-padded with zeros to 7 characters: 0 → 0000000, 1 → 0000001, 2 → 0000002, 3 → 0000003, 4 → 0000004, ..., 59 → 000000X, 60 → 000000Y, 61 → 000000Z, 62 → 0000010, 63 → 0000011, 64 → 0000012, 65 → 0000013, ..., 56_800_235_583 → 0ZZZZZZ, 56_800_235_584 → 1000000, ..., 3_521_614_606_207 → ZZZZZZZ. And indeed, as you can see, there are no collisions.
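That padding scheme can be sketched as follows (the alphabet order 0-9, a-z, A-Z is assumed, which is what makes 59 → X, 60 → Y, 61 → Z line up with the examples above):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_7(n: int) -> str:
    # Plain base62 conversion, then left-pad with '0' to exactly 7 chars.
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return (s or "0").rjust(7, "0")

print(base62_7(0))            # 0000000
print(base62_7(61))           # 000000Z
print(base62_7(62))           # 0000010
print(base62_7(62**7 - 1))    # ZZZZZZZ, the largest 7-char value
```

Since every counter value below 62^7 maps to a distinct 7-character string, there are no collisions in that range.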
@@p516entropy sorry, still don't understand this. base62(101020)=FMA4abRo and base62(101021)=FMA4abRp. If we take the first 7 chars from each, we get a collision, "FMA4abR"
Why not just rely on a database cluster to generate a sequential numeric id, and just persist the id and the long url? So when a request comes in, we convert from base62 back to decimal and find the record in the db. What is wrong with this idea?
You mentioned Zookeeper. What if the number of servers is not stable: it could be 8 today, then 5 tomorrow, then 10 the day after. How does Zookeeper manage those ranges based on the number of servers available? I really need to understand this. Please help
why can't the database (mongodb) itself generate an auto-increment ID, and then the server can generate the MD5 or base62 after adding a salt to it? Since the db would never fail to generate a unique primary key on creation, it would avoid a single point of failure. Building a distributed solution just for generating IDs seems like overkill here... I am new to this, so let me know if there is a problem with this solution.
Thanks for the excellent video. A question: in solution 2 with Zookeeper, how does each server record the range it was given and detect when the range is exhausted? Is there a counter on each server?
That logic can reside on the zookeeper service itself. For instance, the coordination service can use a custom hashing function based on the incoming request and forward it to specific servers depending on the range value.
For caching, would it be better to have some kind of background job that populates the cache with the most popular URLs from the database? Otherwise, if you're always adding each new URL to the cache, it's no longer just the popular URLs but all of them (or at least as many as the cache can hold).
How does your hashing work? You take a 128-bit MD5 and then base62-encode it into a 20+ character string. How are you sure there are no collisions when you take only a subset of that base62 string (which is just the MD5 hash displayed differently)?
question about using counters: doesn't that mean that if the same user asks to shorten the same URL multiple times, they will get multiple short URLs? Is that OK from a requirements standpoint?
Whenever you get a request to shorten a URL, the first check should be whether it already exists in the cache or the main DB. Only if it doesn't should you proceed with short-URL creation. Or you could use the INSERT ... IF NOT EXISTS feature provided by a NoSQL solution like Cassandra.
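A minimal sketch of that check, with a plain dict standing in for the cache/DB lookup (the names and the code generator here are made up for illustration):

```python
# long_url -> short_code; stands in for the cache + main DB.
store: dict = {}

def shorten(long_url: str, new_code) -> str:
    # First check whether we already shortened this URL.
    existing = store.get(long_url)
    if existing is not None:
        return existing
    code = new_code()
    # In Cassandra this check-and-insert would instead be a single
    # atomic INSERT ... IF NOT EXISTS statement.
    store[long_url] = code
    return code

codes = iter(["aB3xY9z", "Q7wE2rT"])
print(shorten("https://example.com/a", lambda: next(codes)))  # aB3xY9z
print(shorten("https://example.com/a", lambda: next(codes)))  # aB3xY9z again
```

The second call returns the stored code rather than minting a new one, which is exactly the dedup behavior being described.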
The counter example is incorrect. Another solution could be to append the user ID (which should be unique) to the input URL. However, if the user has not signed in, we would have to ask them to choose a uniqueness key. Even then, if we hit a conflict, we have to keep generating keys until we get a unique one.
it isn't incorrect, because the counter itself will be unique every time even if the random bits are the same, and so the generated hash will be unique for every url.
With a number less than ~3.5 trillion, base62 would always produce a unique string of at most 7 chars, but the issue here is that it would be guessable: if 1,000,000 → xyz001 then 1,000,001 → xyz002. To tackle that, we could add random characters to the number, but that results in a base62 string with length > 7, and now that is an issue.
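The 7-char bound in that ~3.5 trillion figure can be checked directly:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def b62(n: int) -> str:
    # Standard base62 conversion via repeated division.
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return s or "0"

print(62**7)                # 3521614606208, just over 3.5 trillion
print(len(b62(62**7 - 1)))  # 7: the largest value that still fits
print(len(b62(62**7)))      # 8: one more and we exceed 7 chars
```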
Zookeeper provides increased availability through redundancy by operating in a multi-node cluster configuration; multiple Zookeeper nodes would have to go down to cause a service interruption. Since the URL-shortening service reserves a range of values rather than a single value, the number of round trips to this counter-reservation system is also reduced, which in turn reduces system load, network traffic, and the number of services involved in handling each reservation. All of this increases availability, and can also improve performance and reduce running costs. You can increase availability further by spanning your nodes and services across separate datacenters, so your infrastructure doesn't become a single point of failure.
Well, the lecture is great, but I have a doubt: MD5 generates a unique hash each time, so we can grab the first 7 chars of that hash. Why do we then need to convert it to base62?
ok, got it. MD5 generates a 128-bit output which is generally represented in hexadecimal; if you take the first 7 chars of that hex string, you can only make 16^7 combinations. However, if you use a base62-encoded string, you can fit many more combinations into 7 chars, i.e. 62^7.
Let's say the requirements change and we want a unique_url every time the same long_url is entered. (Use Case: you want to conduct analytics on the different entry points to the long_url site). How would we accommodate unique short_urls for each successive insert of the same long_url?
This design already allows us to associate the same long URL with one or more short URLs. To prevent the same long URL being associated with multiple short URLs, the service would first have to check if the long URL is already in use and, if so, return the existing short URL, which isn't something described in the requirements or the implementation detail.
base62 encoding can result in a string of any length. And we are not supposed to take the first 7 chars, to avoid collisions. So that means we take whatever output we get from base62. This is what I want to hear from these videos, but believe me, none of them emphasize this. They just say blah blah and use base62.
We are not encoding the original URL, because as you said that could result in a string of any length. Instead we are encoding the counter numbers (0 to ~3.5 trillion), which would not exceed 7 chars because 62^7 > 3.5 trillion.
Base62 should be considerably shorter than base 10, and the value you describe will come to about 7 characters. Base64 is shorter still, but its extra two characters might not be URL-safe! You may be converting the digit string itself (i.e. its bytes) to base62 rather than the number it represents, which would produce a longer result, since you would be starting from a much larger input.