
What is CONSISTENT HASHING and Where is it used? 

Gaurav Sen
594K subscribers
795K views

Published: 30 Sep 2024

Comments: 669
@andreimarculescu911 6 years ago
The best solution is not to use K hash functions, but to generate K replica ids for each server id. Designing K hash functions while maintaining random uniformity and consistency is hard. Generating K replica ids is easy: xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'. Then you take these replicas and generate K points on the ring with the same hash function, and this is what is actually used in practice. The Chord algorithm is just one example of this technique of adding K replicas for each server id.
@gkcs 6 years ago
That makes sense. K numbers assigned to each server would do the job :)
@pradipacharjee4915 5 years ago
Hi Andrei, can you tell me how to choose the ideal replica count (k), so that servers can be added or removed efficiently?
@dudejaa 5 years ago
The example that you took mentions xxx+1, +2, +3, ..., +k. Correct me if I am wrong, but if you assign k consecutive numbers to the same server, the load wouldn't be distributed uniformly (on adding or removing a server). Could that be one reason to look for different hash functions?
@charchitpatodi8677 5 years ago
@@dudejaa Just a thought: he probably doesn't mean +1, +2... Instead, if xxx is the id, M is the ring capacity and k is the number of servers, then the second position (after hash(xxx)) will be hash(xxx) + (M/k) or hash(xxx + M/k). The third position will probably be hash(xxx) + 2*(M/k), and so on for each multiple up to k.
@rishabhmalhotra7058 5 years ago
@Abhishek Dudeja xxx, xxx+1, ... are ids for one server that get hashed to reach the respective points on the ring; they are not the points on the ring themselves. The hashes of xxx, xxx+1, ... are completely different and random, and hence plot k uniformly random points. @CHARCHIT PATODI I don't think that's the case, because if you add multiple servers, each with k points placed by that technique (hash(xxx) + 2*(M/k), ... up to k), then you're not really randomizing, and there would be no difference between adding 1 point or k points per server when it comes to choosing a server for a request. It would be like repeating the same one-point-per-server layout k times around the ring, which would not get us what we want.
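A minimal sketch of the replica-id approach Andrei describes, assuming SHA-256 (truncated to 64 bits) as the single hash function, a ring of size M = 2**32, and string server ids. The names `ring_hash`, `build_ring` and `lookup` are hypothetical, not taken from the video or the linked repository. As rishabh notes, xxx + '1', xxx + '2', ... are only inputs to the hash; the ring points are their hashes.

```python
import bisect
import hashlib

M = 2**32  # assumed ring size: positions 0 .. M-1


def ring_hash(key: str) -> int:
    """Map a string to a ring position using one hash function (SHA-256, an arbitrary choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % M


def build_ring(server_ids, k):
    """Place k points per server by hashing the replica ids sid + '1', ..., sid + 'k'.

    Plain concatenation can be ambiguous for some id sets (e.g. 'S1' + '11' == 'S11' + '1');
    a separator such as '#' avoids that.
    """
    points = [(ring_hash(sid + str(i)), sid) for sid in server_ids for i in range(1, k + 1)]
    points.sort()
    return points


def lookup(points, request_id):
    """Return the server owning the first point at or clockwise after the request's position."""
    pos = ring_hash(request_id)
    idx = bisect.bisect_left(points, (pos, ""))
    return points[idx % len(points)][1]  # idx == len(points) wraps around to the first point


ring = build_ring(["S0", "S1", "S2"], k=4)
print(len(ring), lookup(ring, "user-42"))
```

Every router that builds the ring from the same server list gets the same sorted points, so no per-request state needs to be shared.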
@AlwaysAnAddiction 5 years ago
Sitting in the hotel room, watching this 1 hour before my Google interview in New York. Thanks Gaurav!
@gkcs 5 years ago
All the best!
@gkcs 5 years ago
@@AlwaysAnAddiction wow, tough stuff. How'd you reckon it went?
@AlwaysAnAddiction 5 years ago
@@gkcs I believe it went well. I have watched most of your system design videos; they were quite helpful. I am on the junior side (3 YOE), so I think they went easy on me in system design. Also, I was able to complete all coding questions in time. Google is always a long shot though. 🤞🤞
@karthikmucheli7930 4 years ago
@@AlwaysAnAddiction hope you got the job
@Leptoszom 4 years ago
You got the job, Bajpai?
@shreysom2060 4 years ago
I used to watch your "Competitive Programming" videos before joining a company, and now, after learning things there, I am watching your "System Design" videos. It feels good to grow with this channel. Thank you so much 😊
@keshavagrawal89 1 year ago
Couldn't understand a thing within the first 4 minutes. I know it's a ring of virtual addresses, but suddenly some random things started appearing in the video and I lost interest: when did the blue area become data, and why did the data after S4 get served magically by S1 and not by S0? Sorry, I'm sure you are an expert trying to help people out there, but I think it could be broken down into clearer diagrams and reasoning.
@gwho 2 years ago
This video was hard to follow.
@UlfAslak 3 years ago
Notes to self:
* The previous video gives the impression that there is a mapping from ranges of integers to server ids, and that consistent hashing is about mapping request ids to integers in those ranges, resulting in more consistent routing of requests to the same servers. -> I did realize that this would not work very well over time, as you would end up completely changing the ranges for higher-index servers with the addition of multiple servers.
* In this video, request ids map to an index in a ring with `M` indices. The "trick", then, is to map the server indices to indices in the ring using the same hash function that also hashes request ids. Now, to assign a server to a request, one simply looks clockwise for the nearest server.
* To make it less likely that load will be unbalanced due to (what I would call) unlucky hashing, another idea is used: simply have multiple hash functions for the servers, so as to map them to multiple locations in the index ring! (clever).
* @Andrei Marculescu points out that, rather than using multiple hash functions for server ids, it is easier to maintain multiple aliases for each server id ("...xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'.") and thus map servers to multiple locations in the index ring.
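The clockwise lookup from the second note can be traced by hand on a toy ring. This sketch assumes M = 100 and hypothetical server positions (as if each came from h(server_id) % M); the name `owner` is illustrative only.

```python
import bisect

M = 100  # hypothetical small ring so positions are easy to follow

# Hypothetical server positions, as if each came from h(server_id) % M.
server_positions = {"S0": 15, "S1": 40, "S2": 70, "S3": 90}
ring = sorted((pos, sid) for sid, pos in server_positions.items())
positions = [pos for pos, _ in ring]


def owner(request_pos: int) -> str:
    """Walk clockwise from request_pos to the nearest server position, wrapping past M-1 back to 0."""
    idx = bisect.bisect_left(positions, request_pos % M)
    return ring[idx % len(ring)][1]


print(owner(20))  # S1: the next server clockwise from 20 is at 40
print(owner(95))  # S0: nothing between 95 and 99, so it wraps around to 15
```

With this layout, adding a hypothetical S4 at position 30 would take over only the requests at 16..30, all of which previously went to S1; every other request keeps its server.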
@Luk3Stein 3 years ago
Thank you!! I had so many doubts after watching; reading this made it clearer.
@codingfork6708 2 years ago
How can we determine the value of `M`? Is [0, M-1] the range of the output of the hash function?
@UlfAslak 2 years ago
@@codingfork6708 Correct. I think there are good heuristics for choosing M (and probably everyone uses the same standard values). Your hash function has to apply modulus M, otherwise you get an index out of range.
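To make "apply modulus M" concrete, here is a sketch that reduces a SHA-256 digest (256 bits, an arbitrary choice here) to an index in [0, M). The function name `position` is hypothetical. If M already equals the hash's native output range, the modulus changes nothing. A power of two such as 2**32 or 2**64 is a common choice, in which case `% M` simply keeps the low-order bits.

```python
import hashlib


def position(key: str, m: int) -> int:
    """Reduce the 256-bit SHA-256 digest of key to a ring index in [0, m)."""
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    return h % m


print(position("S1", 30))     # some index in 0 .. 29
print(position("S1", 2**32))  # some index in 0 .. 2**32 - 1

# For a power-of-two ring size, the modulus is the same as masking the low bits.
full = int.from_bytes(hashlib.sha256(b"S1").digest(), "big")
assert position("S1", 2**32) == full & (2**32 - 1)
```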
@nxpy6684 2 years ago
Thank you! This helped me a lot!
@saitamaopm7561 4 years ago
You start out with "SO THE PROBLEM IS NOT ACTUALLY LOAD BALANCING!" "ADDING AND REMOVING SERVERS THAT WE SAW?" wait when did we do this? Over a cup of coffee? Feedback: I noticed this with most of your videos, video is not congruent with the title at all. Most of your videos have some pretext as if all the videos are part of a sequence. If someone were to just look for what is consistent hashing this video doesn't make any sense.
@gkcs 4 years ago
Yes, the load balancing and consistent hashing videos were shot on the same day. There is some context missing when this video is seen individually. I think a card link at the start of video will help. Good use of sarcasm btw. "Over a cup of coffee?" made me laugh :P
@nankitable 4 years ago
With multiple hashes being applied, can there be a case of collisions, i.e. multiple servers ending up in the same bucket? If not, why? If yes, how is it handled?
@srinivasasrikanthpodila4376 3 years ago
Gaurav, adding/deleting servers using k hash functions with a fixed ring size is a hard problem to solve correctly. It could be simplified by generating multiple ids for the same server.
@gkcs 3 years ago
That's right 👍
@mohammadkareem1187 2 years ago
And S4 is the happy guy, hmm, not really! :)))))))
@RicardoBuquet 4 years ago
2 years and no code... that doesn't sound like "soon" :D
@gkcs 4 years ago
github.com/coding-parrot/SystemDesignCourse/blob/master/service-orchestrator/src/main/java/algorithms/ConsistentHashing.java It's never too late 😛
@mrrobot6320 3 years ago
@@gkcs When we use k hash functions, how do we handle two hash values colliding? For example, using h1, s1 = 1 and s2 = 4, and using h2, s1 = 3 and s2 = 1?
@gocrazy6177 2 months ago
Fresher watching this video. Placements will start within a month.
@ankushhv6972 27 days ago
Great explanation! I explained the same intuition in my Zomato interview and the interviewer was really impressed. Thanks a lot Gaurav ❤
@gkcs 26 days ago
Cheers!
@jeffruan7701 5 years ago
Knowledgeable and confident presenter!
@rakeshvarma8091 3 years ago
Gaurav, this video is wonderful. I have a small doubt. Let's assume that request R1 is served by server S1. Now we add a new server S2, and because of this, request R1 now goes to S2. How is this scenario handled? When a new server S2 is added, do we have to move some portion of the data from the existing servers (S1) to the new server S2 based on its position on the ring? If so, how can we do that redistribution in real time?
@SK-ur3hw 5 years ago
Great video!! I thought that we can add a load factor or load limit like one server can have x requests. So once the load limit is reached, the incoming requests will point to next clockwise server. That way, no server will have too much load. But of course the virtual servers concept is good. Can you please add the code in the desc? Thanks. :)
@gkcs 5 years ago
Sounds interesting. There are variations on consistent hashing which allow this. Code link: github.com/coding-parrot/SystemDesignCourse/blob/master/service-orchestrator/src/main/java/algorithms/ConsistentHashing.java 😁
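One such variation is usually called consistent hashing with bounded loads: each server gets a capacity slightly above the average, and a key that lands on a full server keeps walking clockwise. Below is a simplified sketch, assuming keys arrive in a single batch, one point per server, SHA-256 positions, and a hypothetical slack parameter `epsilon`. `assign_bounded` is an illustrative name, not the linked repository's API.

```python
import bisect
import hashlib
import math

M = 2**32  # assumed ring size


def ring_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big") % M


def assign_bounded(servers, keys, epsilon=0.25):
    """Assign keys clockwise, skipping servers that already hold ceil((1 + epsilon) * avg) keys."""
    capacity = math.ceil((1 + epsilon) * len(keys) / len(servers))
    ring = sorted((ring_hash(s), s) for s in servers)
    positions = [p for p, _ in ring]
    load = {s: 0 for s in servers}
    assignment = {}
    for key in keys:
        idx = bisect.bisect_left(positions, ring_hash(key))
        # Total capacity >= len(keys), so some server always has room.
        for step in range(len(ring)):
            server = ring[(idx + step) % len(ring)][1]
            if load[server] < capacity:
                load[server] += 1
                assignment[key] = server
                break
    return assignment, load


keys = [f"user-{i}" for i in range(1000)]
assignment, load = assign_bounded(["S0", "S1", "S2", "S3"], keys)
print(load)  # no server holds more than ceil(1.25 * 1000 / 4) = 313 keys
```

The trade-off is that a key's owner now also depends on how loaded the servers were when it arrived, not only on its hash.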
@sreeram8942 3 years ago
@@gkcs As you said in the previous video about a user's cache data in a particular server, how does consistent hashing solve that issue?
@VishalYadav-gk1kg 10 months ago
Very Nice Explanation Sir, Thank you !
@keshavabhamidipaty3126 5 years ago
Great video! I was wondering, though: with this architecture, you have to ensure that the hash functions never collide, right? What would happen if an incoming request suddenly mapped to two servers that fell on the same point?
@gkcs 5 years ago
It's answered in the other comments 🙂
@timurmukhtarov1319 4 years ago
This was amazing! Haven't seen other videos that talked about provisioning virtual servers/using multiple hash functions! Hooked!
@phaneendran4208 5 years ago
Hi Gaurav, great series of videos. Thank you for sharing your experiences. I have one question on consistent hashing: which component of the distributed system is responsible for implementing this technique? 1) Is it the load balancer's job, because it is a load distribution technique? 2) Or is it the application's responsibility? Curious to hear your thoughts. Cheers!
@_romeopeter 2 years ago
I don't know if you still need an answer to this, but it's the load balancer's job, because it distributes the requests and allocates them to the right servers.
@sivas09 4 years ago
With 'n' being the number of servers and 'm' the number of possible hash values, would spacing out the servers at intervals of m/n be a working solution? For example, with m as 256 and n as 4, the first server could be at 64, the second at 128, the third at 192 and the fourth at 256, along those lines. I do understand the possibility of skewed allocations and the need for replicating ids, though. Hooked on your amazing content! Kudos
@SP-db6sh 1 year ago
This channel is like System-Design Wala, far, far better than most paid courses. Simple explanation.
@jeyakumar4728 4 years ago
Hi Gaurav, won't removing/adding servers to the cluster affect the hash function modulo (%)? Example: initially we have 4 servers, hash(request for same id) % 4 -> s2. If we remove 1 server: hash(request for same id) % 3 -> s1. So server 2 still has stale cache data, right?
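This example can be checked with a small simulation. The sketch below (hypothetical helper names, SHA-256 positions, 50 points per server, 10,000 synthetic keys) removes S3 from a 4-server cluster and counts how many keys change owner under `h % N` versus a ring. With `h % N`, a key keeps its server only when h % 4 == h % 3, which holds for 3 out of every 12 hash values, so about 75% of keys move; on the ring, only the keys that S3 owned move.

```python
import bisect
import hashlib

M = 2**32  # assumed ring size


def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def mod_n_owner(key, servers):
    """Traditional hashing: pick servers[h(key) % N]."""
    return servers[h(key) % len(servers)]


def build_ring(servers, k=50):
    """k points per server; returns parallel lists of sorted positions and their owners."""
    ring = sorted((h(f"{s}#{i}") % M, s) for s in servers for i in range(k))
    return [p for p, _ in ring], [s for _, s in ring]


def ring_owner(key, positions, owners):
    idx = bisect.bisect_left(positions, h(key) % M)
    return owners[idx % len(owners)]


keys = [f"user-{i}" for i in range(10_000)]
four, three = ["S0", "S1", "S2", "S3"], ["S0", "S1", "S2"]  # S3 is removed

mod_moved = sum(mod_n_owner(key, four) != mod_n_owner(key, three) for key in keys) / len(keys)
ring4, ring3 = build_ring(four), build_ring(three)
ring_moved = sum(ring_owner(key, *ring4) != ring_owner(key, *ring3) for key in keys) / len(keys)
print(f"h % N moved {mod_moved:.0%} of keys; the ring moved {ring_moved:.0%}")
```

So the stale-cache problem does not disappear, but it shrinks to the keys of the server that actually left (or, when adding, the keys the new server takes over).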
@pradyum88 4 years ago
Would the User caching still exist in consistent hashing? Great videos BTW!
@abdelrhmansamir1426 3 years ago
What would happen if there is a collision when you calculate the virtual servers? I mean if h1(S0) = h2(S1) = 1, there are 2 servers with the same ID, right?
@AbhishekKumar-ub8co 5 years ago
I loved the ease and simplicity with which you explained the problem using pictorial diagrams. Good work, keep it up!!
@gkcs 5 years ago
Thank you 😋
@baby_adventures 4 years ago
If we add a new server to this consistent hashing ring, will the caching problem remain the same? The requests which were going to s3 before adding the new server are now handled by s4, so s3's cache for those requests will be useless? Please explain.
@romanesterkin 4 years ago
Gaurav, I have a question: if the hash function h(x) maps values to the range of (0...M-1), why do you need h(Server Number)%M? %M is redundant here, isn't it?
@vipindixit5532 1 year ago
Same question from my side.
@NehaKumari-my7cv 5 years ago
Hi Gaurav, thanks for sharing such a nice concept. I have one doubt: what happens if one server dies, say s1 for 2 hours, and then comes back? How are requests handled in this case?
@crabjuice47 3 years ago
God Bless you Gaurav. Love from Pakistan.
@irockrock44 5 years ago
Thanks for the video, but I think you can improve overall. Specifically, you didn't talk much about why it is called "consistent" and how it matters when we lose/add servers.
@saurabh1203 3 years ago
What if the 2 different hashing algorithms for 2 different servers produce same result ? Like for S1, H1 gives 19 and for S4, H2 gives 19. Now both of them will be placed at same location in the ring. What is the solution for this ?
@yashkumardhawan6961 4 months ago
I have one question. Let's say there are two hash functions. Now each of the M servers gets 2 positions on the ring (M*2 in total). But consider a scenario where the hash of one server mod M and the second hash of another (virtual) server mod M come out the same. Let the hash values be as follows: H1(2) = 40, H2(5) = 20. The hash values are fairly different. But H1(2) % 5 = 40 % 5 = 0 and H2(5) % 5 = 20 % 5 = 0. In this case, both hashes are different, but virtual server 5 points to the same location as actual server 2. Isn't that a conflict?
@saypal18 4 months ago
I am thinking of a different method: we have a single hash function h(r), and if there are k servers, we compute all the values h(r)%1, h(r)%2, ..., h(r)%k and store them in an array. Now loop through the array in reverse, and the first time we get array[i] = i-1, we send the request to the ith server and break out of the loop. Let's prove this via induction. It's trivial for only 1 server. Let's say that we have n servers, and the load was equally distributed among them. Let's introduce a new server n+1. The new value added to the array for any given request is h(r) % (n+1). In only 1/(n+1) of evenly distributed cases will its output be n, and the request will be directed to the (n+1)th server. Since we assumed that the previous load was evenly distributed, the load on the new server will also be taken evenly from all previously existing servers.
@z08840 2 years ago
I didn't get the point: if the hash space is M, i.e. h is in [0, M), then h%M == h, so why do you need to take a remainder? How can h(0) be 39 if M is 30? Second, random distribution only averages out evenly; there is a big difference between uniformity and evenness. When you have a small number of servers (relative to M), you will inevitably get uneven segmentation in most cases. The load will be 1/N on AVERAGE, but in practice you will have significant differences, and ANY GOOD(!!!) hash function will have this flaw.
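The distinction between uniform and even can be measured. This sketch (SHA-256 positions, hypothetical helper `ownership`) computes the fraction of the ring each server owns, where each point owns the arc from the previous point up to itself. With one point per server the shares typically vary a lot; with more points per server they cluster closer to 1/N, though never exactly at it.

```python
import hashlib

M = 2**32  # assumed ring size


def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big") % M


def ownership(servers, k):
    """Fraction of the ring owned by each server when every server has k points."""
    ring = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(k))
    share = {s: 0 for s in servers}
    for j, (pos, s) in enumerate(ring):
        prev = ring[j - 1][0]  # for j == 0 this is the last point, so the arc wraps past M-1
        share[s] += (pos - prev) % M
    return {s: v / M for s, v in share.items()}


servers = ["S0", "S1", "S2", "S3", "S4"]
for k in (1, 10, 200):
    shares = ownership(servers, k)
    print(k, f"min={min(shares.values()):.3f}", f"max={max(shares.values()):.3f}")
```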
@me57shreyashjadhav37 3 months ago
Suppose there are three payment microservices, each with its own DB. If user1 sends a request, his ID is unique, and he will be assigned to one particular microservice every time, until a new microservice server is introduced or an existing one is removed (according to consistent hashing). Now my question is: if user1 makes a payment, his payment status will be stored in the DB of that particular payment microservice. If that DB or service fails after the transaction is done, how is this issue handled?
@perfectlyfantastic 5 years ago
At 8:33 it was said that the k value should be log(M). Is it just a suggestion, or is it the value we should definitely use?
@roamwithashutosh 4 years ago
🙂
@pavankongara792 6 months ago
Amazing!
@AbdurraffaySyed 2 months ago
Hi Gaurav, thank you for such an amazing video. I was thinking about how we can balance the load in a better way, and I came up with dictionaries. If we use a dictionary along with a pointer holding the id of the next server that will handle the request, we might be able to balance the load in a more optimized way, and the skewness and uneven load that consistent hashing can have in case of server failures might be handled. I might be wrong here. I would love to hear your thoughts on my solution.
@johnleonardo 3 years ago
your content is insanely good. seriously, the best! you were destined to teach others!
@ajayreddy9219 2 months ago
Big fan of your videos, but I think consistent hashing doesn't use the simple modulo operation on the number of servers we have in the hash space. It just uses the hash function to map both a server and the key object, unlike simple hashing. Correct me if I'm wrong.
@i.sapnadip 4 months ago
So does this mean backup servers? Like h1(0), h2(0)?
@maan9011 3 years ago
Hi Gaurav, just curious: are you a student or working? If working, where? You're such a brilliant kid. I have 10 years of experience in software and I'm still not as good as you.
@harshmashru 3 months ago
what if the number of requests is decreased or increased? In that case, M would not be constant right and the value of K would be changing, correct?
@gauravganna 2 years ago
K hash functions can have collisions: two hash functions assigning the same search space to different servers. How should this be avoided?
@GauravSharma-wb9se 2 years ago
What will M be for the other hash functions? It can't be the same, otherwise we'll get the same value... so either M must be changed or the input value must be changed. Which one should we change?
@mohitmalhotra9034 2 years ago
What about hash collisions, different hash functions can generate similar values very often
@bouzie8000 7 months ago
That virtual server solution blew my mind I'm so sorry. Geniuses have really paved the way for us in computer science.
@SamarendraPatra 3 years ago
What about Round Robin and Least Load Load balancing techniques... We use this in F5 Load balancer
@darkreaper4990 5 months ago
I am not a computer science student so the remainder calc. is kinda not intuitive. why do we use remainder here? sorry for the stupid question but I need to know.
@himanshubanerji8800 20 days ago
Adding more virtual servers to route those requests is fine, but in the end they are virtual servers, right? Doesn't that mean they don't exist in real life? Even if we route a request to a virtual server, what will the request do once it reaches that point if there is no actual server to process it?
@krutikpatel906 8 months ago
Can it be used to distribute workload among worker nodes? I see it mainly used in db applications
@TXS-xt6vj 8 days ago
Does this system design playlist have just 36 videos in total? I have watched about 5 of them and wanted to confirm: if I do these 36 videos and understand the concepts well, will I be prepared enough for a FAANG interview?
@gkcs 7 days ago
They are great to start with. For strong interview prep (product companies) or upskilling, I have a course at InterviewReady. interviewready.io/course-page/system-design-course
@mulllhausen 3 years ago
I guess the assumption is that all servers have the same data? Eg read-only slaves in a cluster. But what if they all have different data (shared)?
@theappareddy8376 3 years ago
Hi Gaurav, how can consistent hashing solve the hot-user server load problem? Let's say we are sharding by user id; even after consistent hashing, the hot user's server gets the same load.
@rahulkushwaha7896 4 months ago
Why not always divide the servers equally on the ring on every addition and removal event?
@RohitSharma-jc5bz 1 year ago
Question - how to prevent collisions? What if multiple servers hash to the same value?
@SuiMizu 5 years ago
You are a really good teacher, Gaurav! Please keep up your good work! :)
@gkcs 5 years ago
Thanks!
@yogeshthota9806 1 month ago
Crystal clear explanation of the concept. Thanks a million!
@yuvrajmonga2501 5 years ago
Why log(m)?
@andrespadilla1901 5 years ago
Why is your speech so nasal? It's difficult to understand.
@xawnia 4 years ago
Thanks a lot Gaurav, this was very clear! I was wondering what would happen if there is a clash between different (or the same) hash functions, h(x) = h1(y): which server will the load get assigned to?
@vikassaran6430 3 years ago
Same question... do you know the answer?
@MANPREETSINGH-wd8sz 16 days ago
@@vikassaran6430 I guess we can have some tie-breaking rule there, like assigning the request to the smallest-indexed server or something like that.
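The tie-breaking idea can be made deterministic by sorting ring points on (position, server id) and letting the smallest id keep a contested position. The sketch below uses hypothetical hand-picked positions that reproduce the H1/H2 collision from earlier on this page (a point of server 2 and a point of server 5 both at 0); all names are illustrative. The alternative gkcs mentions elsewhere on this page is to change one server's id and re-hash instead.

```python
import bisect

M = 64  # hypothetical tiny ring so a collision is easy to show


def build_ring(points):
    """points: (position, server_id) pairs. Sorting puts the smallest server id (by string
    comparison) first at each position; later entries at that position are dropped, so every
    router that builds the ring from the same input agrees on the owner."""
    ring = []
    for pos, sid in sorted(points):
        if not ring or ring[-1][0] != pos:
            ring.append((pos, sid))
    return ring


def owner(ring, request_pos):
    positions = [p for p, _ in ring]
    idx = bisect.bisect_left(positions, request_pos % M)
    return ring[idx % len(ring)][1]


# Hypothetical collision: one point of S5 and one point of S2 both land on position 0.
ring = build_ring([(0, "S5"), (0, "S2"), (20, "S1"), (45, "S3")])
print(ring)             # [(0, 'S2'), (20, 'S1'), (45, 'S3')]
print(owner(ring, 50))  # S2: wraps around to position 0, which S2 won
```

S5 simply loses that one point; with many points per server, the effect on its share of the ring is small.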
@aniket5736 3 years ago
Will consistent hashing be irrelevant if we use a caching server such as Redis?
@reactdeveloper2368 1 year ago
@guarav will this solution scale for WebSockets?
@hiteshaggarwal 3 years ago
Use of Log and how it solved the problem was not clear.
@GunjanAgarwal0811 6 years ago
Question - There are K hash functions to map servers on ring ? Then how do incoming requests uniformly get assigned to K virtual servers ? Are there K hash function to hash request id's as well ?
@gkcs 6 years ago
I think I was ambiguous here. No, the request falls on the ring and is picked up by the nearest clockwise server on the ring. This point is among the server points(N*K in total). Each server has K points on the ring. So the request is mapped to just one server.
@abhilashavaishnav7456 1 year ago
hash request id and server id using same or different hash function. create a ring with (0 to m-1 partition called as search space). mark output of hash(req id) into ring. then do h(serverid)%(M) and mark it in ring. we go clock wise and find nearest server that will serve the request.
@abhilashavaishnav7456 1 year ago
read from description for revision
@headoverbars8750 4 years ago
What an outstanding video! There is no shortage of tutorials on how to code or write algorithms out there, but not enough on systems design... This is truly outstanding. I've been writing software for 10 years and only fringely touch these concepts; heck, I work within them daily, yet I either forgot or never knew. Thanks so much!!
@jrajesh11 3 years ago
Simply brilliant and clear explanation . Keep doing such awesome work.
@mercuriallabs9 4 years ago
So basically you divide the hash space in equal k ranges, k being number of servers. My question is hashing sometimes suffer from clustering. Why take a chance. Why not just divide the entire hash space equally explicitly in k ranges and assign each range to a server?
@gkcs 4 years ago
Because that would require maintaining a mapping between servers and their assigned spaces. Stateful vs. Stateless.
@KKukreja 2 years ago
How is the value of 'M' chosen in practice?
@punerealestatebuilder 3 years ago
Hey Gaurav, I may be asking more, but could you please remake the video again in a better way. The magic is missing in this video, previous videos were like I am watching a movie. When I say the movie, means easy to understand and nothing to force my brain to understand something. Those were too easy. But this video is :(
@gkcs 3 years ago
The content quality has improved over time. This one is a little old :)
@90abyss 5 years ago
blue = request that comes in. red = server that the request goes to.
@gkcs 5 years ago
Nice catch 😛
@darshanr5869 4 years ago
What is M here? I am confused.
@SandeepVerma-yh9ec 6 years ago
Thanks, Gaurav. Nice work. I have a small doubt. As you told to handle the skewed request by having virtual servers[by having multiple hashing functions for servers], how can we handle the collisions? I mean server S1 and S2 got the same output(say O1) from the hash function. Both will be serving the user request then
@gkcs 6 years ago
That's rare. If that doesn't work, we can change one of the hash functions and rebalance 😁
@omarraghib905 1 year ago
@@gkcs While hash collision might be rare, but the mod M of hashes may collide more frequently. How do we handle those?
@AkshayKumar-fj9hd 4 years ago
Hi Gaurav, but the problem we were facing with traditional hashing is still there (correct me if I am wrong). Suppose request id 1 maps to slot 4 in the ring, and the nearest slot on the ring taken by any server is slot 8, by server1. All requests with id 1 will be handled by server1, so it has built up its cache accordingly. But now consider adding a new server (server4) that gets slot 7 in the ring after hashing. Now all requests with id 1 will be mapped to this newly added server, which I think is the same problem we were facing with traditional hashing. Thanks
@gkcs 4 years ago
The number of such requests which have to be remapped reduced due to consistent hashing.
@AkshayKumar-fj9hd 4 years ago
@@gkcs Maybe I'm missing something basic, but even if fewer requests are remapped, after adding that new server all the requests for id1 will go to that new server, and this was the problem we were facing with traditional hashing.
@ravitejathoram5466 5 years ago
Hey, I have a doubt in this approach. How do we decide on the range of Search Space and find "M" before actually building a system?
@gkcs 5 years ago
Read the other comments.
@AmitAgrawal-xu8uk 4 months ago
How do we decide the size of ring ? value of M ?
@Majitsu 2 years ago
why bother with hash functions? why not just generate a random number with uniform distribution?
@gkcs 2 years ago
Because you want the requests to stick to a server based on some parameters. h(userId) -> serverId. You can now cache the user details on this particular server, since every time a request comes from this user, the same server will be hit.
@vishalkalaskar8567 4 years ago
Hello Gaurav, when you said 'adding virtual servers', did you mean adding differently generated hashes of the available servers, so that their positions on the hash ring are uniformly distributed, giving us a less skewed distribution of requests? If yes, doesn't that imply that if 1 physical server goes down, all of its hashed positions also drop off the ring, giving more skewed results?
@sauravdas7591 4 years ago
Yes, it will affect the load, but consider this: if a server has, let's say, 4 points uniformly distributed across the hash ring, then when it crashes it will remove those 4 points, and since they are uniformly distributed, this will increase the load on the other servers by 25%.
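This point can be checked with a sketch: five servers with 4 points each (SHA-256 positions, hypothetical helper names), then S2 is removed. Only keys that S2 owned change hands, and they are split among whichever servers own the next point clockwise of each of S2's points. So usually several survivors share the extra load rather than one neighbour taking all of it; the exact split depends on where the hashes land.

```python
import bisect
import hashlib
from collections import Counter

M = 2**32  # assumed ring size


def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big") % M


def build_ring(servers, k):
    ring = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(k))
    return [p for p, _ in ring], [s for _, s in ring]


def owner(positions, owners, key):
    idx = bisect.bisect_left(positions, h(key))
    return owners[idx % len(owners)]


servers = ["S0", "S1", "S2", "S3", "S4"]
keys = [f"user-{i}" for i in range(5000)]
before = build_ring(servers, k=4)
after = build_ring([s for s in servers if s != "S2"], k=4)

# Which surviving servers inherit the keys that S2 used to own?
orphaned = [key for key in keys if owner(*before, key) == "S2"]
inherited = Counter(owner(*after, key) for key in orphaned)
print(len(orphaned), dict(inherited))
```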
@sayandey1478 3 years ago
Regarding the second hash function: do you mean that when a request gets mapped to the failed node by the first hash function, we rehash it using the second hash?
@sayandey1478 3 years ago
Also how do you use the second hash when you insert a new node?
@indexleo 3 years ago
Again, given the problem description, use round-robin! You are overshooting a simple problem that has a better and more reliable solution...
@gkcs 3 years ago
The problem is minimizing data migration on a server addition or removal in the system.
@giobaldu 5 years ago
Great video! Question: where do the requests sit in practice? Is there a node acting as a scheduler, dispatching request by request, or are the requests mapped immediately to a server and kept internally in memory? Or both, so that the requests can be rescheduled if the server goes down? (I suppose this would require the scheduler to periodically ping each server, or set a timeout.) What happens if the scheduler goes down? Second question: would it be possible to use work-stealing instead to reduce imbalance? Whenever a server is out of work, it would steal a request from the back of the queue of another random server. Or could this skew the execution order of the requests too much?
@gkcs 5 years ago
Thanks! The load balancer is a service which needs to tell the other services where a request is to be routed. It can either be queried per request (which is very expensive), or a snapshot of the current assignments can be cached by all services. If the snapshot changes at the load balancer, it can notify all interested clients. The service is distributed and backed by a 'reliable' database, so a single failure won't take the system down. Second answer: It sounds complicated and I have never seen it implemented on a large scale system.
@siddharthsingh5117 5 years ago
Hi Gaurav, first of all, thanks for such an informative session. My question: in the load balancing video we discussed that our cache gets cleared if a new server is added. How does consistent hashing solve that?
@dipanjandasroy9253 5 years ago
Hi Gaurav, I have the same question for you. Could you please take a moment to shed some light into it? Thank you.
@ashutoshmishra2328 4 years ago
Hey Gaurav, thanks for this great video. I have one question: can we achieve the same results using a stick-table (which keeps a user/IP-to-server mapping) in the load balancer, with a nondeterministic load balancing algorithm like round robin or least connections? If not, can you explain why?
@gkcs 4 years ago
The main objective here is to reduce the "rebalancing", i.e. the total number of cache loads and evictions. This is useful for load balancing on a cache cluster. The round-robin and least-connections algorithms are also useful, in different scenarios.
@AbhishekChoudhary-tu7ig 3 years ago
I am a 3rd sem student and I guess I should not be bothering about these things but your explanations are sooooo gooood that I always wanna watch them :D
@anmolkumar5923 6 years ago
My concern is: what if any two hash functions return the same value? In that case, for a single request r1 the point on the pie chart would be, say, 5, but the nearest point, 6, will have more than 1 server that can serve the request.
@gkcs 6 years ago
That's a very good question Anmol. However, the chances of that happening are really small, as you have N servers and a key space which is much larger. N would be typically at most 1000 and the key space can be as large as 10^18. Can it happen though? Yes absolutely. However, I don't think anyone has seen it happening in theory. Also, we can keep changing the server's ID till the conflict is resolved.
@hitmusicworldwide 3 years ago
Why can't the servers push "ready!" states to the load balancer to initiate requests for tasks, so then the tasks are only sent to servers that have made requests? In addition, if the servers or balancer is able to calculate time to completion they can inform the algorithm as to when each server is predicted to be ready to handle the next task/request. Would this then create a predictive balancing hash that adjusts to the environment?
@SuperSam4y0u 3 years ago
This works for distributing "tasks" among the servers, but this is about distributing data among the servers. If servers are allowed to request data, then when the data actually needs to be read, how can you deterministically pick the server that holds it?
@surajsri248 3 years ago
What happens to the cached data when we change the hash function itself? Won't the data placement change drastically with a change in the hash function?
@gkcs 3 years ago
That would need a deployment. We don't change the hash function.
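A quick way to see why the hash function is treated as fixed: with a different hash, every key and every server point lands somewhere new, so the mapping is effectively reshuffled. This sketch is an illustration only; SHA-256 and SHA-512 (via `hashlib.new`) stand in for an "old" and "new" hash, and the `server#i` virtual-point naming is a hypothetical choice.

```python
import bisect
import hashlib


def make_hash(algorithm: str):
    """Return a 64-bit hash function built on the named hashlib algorithm."""
    def h(value: str) -> int:
        digest = hashlib.new(algorithm, value.encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return h


def build_ring(servers, h, replicas=100):
    """Sorted ring positions and their owners, using `replicas` points per server."""
    points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(replicas))
    return [p for p, _ in points], [s for _, s in points]


def owner(key, ring, h):
    hashes, names = ring
    return names[bisect.bisect_right(hashes, h(key)) % len(names)]


servers = ["s0", "s1", "s2", "s3"]
keys = [f"user-{i}" for i in range(10_000)]
h_old, h_new = make_hash("sha256"), make_hash("sha512")
ring_old = build_ring(servers, h_old)
ring_new = build_ring(servers, h_new)
moved = sum(owner(k, ring_old, h_old) != owner(k, ring_new, h_new) for k in keys) / len(keys)
print(f"{moved:.0%} of keys changed servers")
```

With N roughly balanced servers, a key keeps its server only about 1/N of the time, so here around three quarters of the cache would go cold at once, which is why such a change would be planned as a deployment.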
@maddy232853 4 years ago
What if we have a server or service that acts like the dealer in a poker game and routes requests to the next available server, one after another? That way, every server gets an equal load. If any server goes down, just mark that node as dead until the next push message or "hello" from that server. By the way, I am not a system design engineer; I just watched the poker scene from the movie Casino Royale and was wondering how it would be implemented in this context. :-D
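The "dealer" described here is essentially round robin with health tracking. A minimal, hypothetical sketch (the `Dealer` class and its method names are illustrative, not from the video):

```python
import itertools


class Dealer:
    """Round-robin dispatcher that skips servers marked dead (illustrative only)."""

    def __init__(self, servers: list[str]):
        self.servers = list(servers)
        self.alive = set(self.servers)
        self._turns = itertools.cycle(self.servers)

    def mark_dead(self, server: str) -> None:
        self.alive.discard(server)

    def mark_alive(self, server: str) -> None:
        # e.g. called on the next heartbeat / "hello" from that server
        if server in self.servers:
            self.alive.add(server)

    def next_server(self) -> str:
        if not self.alive:
            raise RuntimeError("no live servers")
        # Deal to the next seat in order, skipping dead ones.
        for server in self._turns:
            if server in self.alive:
                return server
        raise AssertionError("unreachable: cycle() never ends")


dealer = Dealer(["a", "b", "c"])
dealer.mark_dead("b")
print([dealer.next_server() for _ in range(4)])  # ['a', 'c', 'a', 'c']
```

This evens out request counts, but the same user or key can land on any server, so each server's cache ends up seeing every key. That loss of key affinity is the "rebalancing" cost consistent hashing is meant to avoid.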
@anaygondhalekar7330 11 months ago
Isn’t it similar to Chord DHT?
@KarthikaRaghavan 6 years ago
Thanks for making this video; it's very useful. You mentioned creating virtual servers by using K different hash functions. What are the chances of collisions across the different hash functions within the M slots? I.e., many servers getting assigned to the same slot on the ring because the hash outputs coincide, even though the hash algorithms are different!
@gkcs 6 years ago
It's more unlikely than an asteroid smashing Earth. So we go with the probability :)
@sriharshamadala4656 5 years ago
@gkcs H_k(S_i) % M can take M possible values. If our hash function is good, we can ensure each server gets a different value, but how can we guarantee that collisions across the hash functions don't happen? In the extreme (dense) case of large K and a large number of servers, collisions happen with high probability. You said that to ensure uniformity we need to increase density, which IMO causes more collisions.
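How density affects collisions can be estimated with the standard birthday bound. Treating the n = N·K virtual points as uniformly random positions among M slots (an idealization), the chance that any two share a slot is about 1 - exp(-n(n-1)/(2M)), so collisions only become likely once n approaches √M. The sketch below just evaluates that formula; M = 2^32 is an illustrative ring size, not one from the video.

```python
import math


def collision_probability(points: int, m: int) -> float:
    """Birthday-bound estimate that at least two of `points` uniformly random
    ring positions land in the same one of `m` slots:
        p ~= 1 - exp(-points * (points - 1) / (2 * m))
    """
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-points * (points - 1) / (2 * m))


# N servers with K virtual points each on a ring of M slots.
for n_servers, k, m in [(1_000, 1, 10**18), (1_000, 1_000, 10**18), (1_000, 100, 2**32)]:
    p = collision_probability(n_servers * k, m)
    print(f"N={n_servers}, K={k}, M={m}: p ~ {p:.2e}")
```

For M = 10^18, even a million points give a probability around 5 × 10^-7, while on a 2^32 ring 100,000 points already give roughly 0.69. So the concern is valid for a dense ring with a small M; the remedies are a much larger M or a collision check such as changing the colliding ID, as suggested earlier in the thread.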
@sanchitcop19 4 years ago
I'm a simple man, I see Gaurav Sen I hit like
@fortnitenoob5124 5 years ago
What is M?
@gkcs 5 years ago
A large, probably prime, number. M for modulo.
@jananiravichandran8370 6 years ago
Thanks for doing this! Your videos have really helped me understand things better =)
@gkcs 6 years ago
Thanks Janani!
@zhikaicui3217 5 years ago
Nice video, Gaurav. I have a quick question: what if there are collisions among the K hash functions?
@gkcs 5 years ago
It's mentioned in the comments 🙂 The chance of two hash functions producing the same value is lower than that of an asteroid blowing up Earth :)
@amitdubey9201 2 years ago
This seems good for writes, but how would you map the reads?
@gkcs 2 years ago
I have a code link in the description.
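The linked code is not reproduced here, so this is a separate minimal sketch (not the author's code) of why reads need no extra bookkeeping: the read path runs exactly the same ring lookup as the write path. It assumes each "server" is just an in-memory dict, SHA-256 as the hash, and a hypothetical `server#i` naming for 50 virtual points per server.

```python
import bisect
import hashlib


def h(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ShardedStore:
    """Toy store where each 'server' is a dict; writes and reads share one lookup."""

    def __init__(self, servers: list[str], replicas: int = 50):
        self.data: dict[str, dict] = {s: {} for s in servers}
        points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(replicas))
        self.hashes = [p for p, _ in points]
        self.owners = [s for _, s in points]

    def server_for(self, key: str) -> str:
        i = bisect.bisect_right(self.hashes, h(key)) % len(self.hashes)
        return self.owners[i]

    def put(self, key: str, value) -> None:
        self.data[self.server_for(key)][key] = value

    def get(self, key: str):
        # Same key -> same hash -> same ring position -> the server that took the write.
        return self.data[self.server_for(key)].get(key)


store = ShardedStore(["cache-a", "cache-b", "cache-c"])
store.put("user-42", "Alice")
print(store.server_for("user-42"), store.get("user-42"))
```

As long as the server list and hash function do not change between the write and the read, the lookup is deterministic.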
@dannywadhwa1759 3 years ago
After doing all this, suppose some servers have a long request queue while others have no load. Can't requests from the former be reassigned to the next available server node?
@gkcs 3 years ago
No, we do not have "work stealing" in this model.
@saravanprathi6956 4 years ago
Thanks a ton for explaining the concept so clearly, Gaurav. I attempted to write code for this logic and have put the initial draft at github.com/saravan-prathi/Algo_Practise/blob/master/Algorithms/src/ConsistentHashing.java. Could you please take a look and comment? I initially tried to use a circular linked list as the data structure for the loop, but I realized it wouldn't be a good idea because this functionality is search-intensive. So I used a plain array instead, and the code became simple (in fact, too simple for a concept of this complexity). Hence I am doubtful whether I am heading in the right direction.
@saravanprathi6956 4 years ago
Did you get a chance to look at the code, Gaurav?
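A sorted array is a common fit for the ring: lookups are a binary search (O(log n) over the ring points), and the O(n) cost of inserting or deleting points is paid only when servers join or leave, which is rare compared with lookups. This is a minimal sketch of that idea in Python (not a review of the linked Java code); SHA-256 as the hash, 3 points per server, and the `server#i` naming are assumptions.

```python
import bisect
import hashlib


def h(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class SortedArrayRing:
    """Ring stored as two parallel sorted arrays; lookup is a binary search."""

    def __init__(self, replicas: int = 3):
        self.replicas = replicas
        self.hashes: list[int] = []  # sorted ring positions
        self.owners: list[str] = []  # owners[i] owns hashes[i]

    def add_server(self, server: str) -> None:
        for i in range(self.replicas):
            point = h(f"{server}#{i}")
            idx = bisect.bisect_left(self.hashes, point)
            self.hashes.insert(idx, point)  # O(n), but only on membership changes
            self.owners.insert(idx, server)

    def remove_server(self, server: str) -> None:
        keep = [(p, s) for p, s in zip(self.hashes, self.owners) if s != server]
        self.hashes = [p for p, _ in keep]
        self.owners = [s for _, s in keep]

    def lookup(self, key: str) -> str:
        if not self.hashes:
            raise LookupError("ring is empty")
        # Successor of the key's hash, wrapping around to index 0.
        i = bisect.bisect_right(self.hashes, h(key)) % len(self.hashes)
        return self.owners[i]


ring = SortedArrayRing(replicas=3)
for server in ["a", "b", "c"]:
    ring.add_server(server)
keys = [f"k{i}" for i in range(1_000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove_server("b")
after = {k: ring.lookup(k) for k in keys}
```

After removing `b`, only the keys that `b` owned get a new owner; everyone else's mapping is unchanged. So an array-based version being short is not by itself a sign that something is missing.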
@shireennagdivee 2 years ago
Great videos Gaurav! Also, 35%4 is 3 ;)
@gkcs 2 years ago
Yes, thank you 🙈
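The correction is easy to check: 35 = 8 × 4 + 3. Under plain hash-mod-N placement with 4 servers, a hash value of 35 would therefore go to server index 3. A one-line check in Python:

```python
quotient, remainder = divmod(35, 4)
assert 35 == quotient * 4 + remainder
print(quotient, remainder)  # 8 3
```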