Quite a wonderful set of tutorials. Sadly, not many professors have your gift of being able to explain these (not so hard after all) concepts. Many thanks!
I think what may be throwing people for a loop is the property of modulo. If you don't get that you won't be getting this video. Notice if something is %2=0, there is only a 50% chance (avg) that is %4=0. So by adding another bucket that is twice the factor of the previous modulo, in the process you split the modulo. Note: something can be crafted that doesn't. For instance, what would happen if you inserted 256 (divisible by 0,2,4,8,16,32,64,128), 10 times. I assume you'd split until you were under the capacity, and then you'd have a few overflow buckets for a long time. Really cool video. I'm local in Houston. Definitely want to meet sometime.
as far as i know you determine the function to use according to n, the bucket to be split. specifically you first check appropriate number of last bits which is largest k that satisfies # buckets >= 2^k. In that case it is 1 since there are 3 buckets. Then you check last k bits. If it is smaller than n you consider last k+1 bits for the function, ow you consider last k bits as the function. In this case n=1 so if last bit is 0, use last 2 bits 2 hash. ow use last bit to hash.
He says you need to check all elements if they fall into the new bucket. But IMHO the key idea to linear hashing is you do not have to check all elements in case of a split. You only need to redistribute the elements in the Bucket (including its overflows) which is splitted. Otherwise the table would get much slower the larger it gets.
so when he first hashes, he does 10 mod 2 because there are 2 buckets. that makes sense. but then when he needs to split and rehash, he does mod 4 on the first bucket, mod 2 on the second bucket, and mod 4 on the last bucket. why does he do this? why doesn't he do mod 3 for all the buckets because now there are 3 buckets?
+bmgag19 Because then he'd have to rehash not just the elements in the bucket that was split, but *all* the elements in the table in its entirety. That's slow. At least, that's my understanding.
The records originally in bucket 0 are distributed between the two buckets based on a different hashing function hi+1(K) = K mod 2M --- quoted from Ramez Elmasri
how do you know which hash function to use? After he adds 10 and there are buckets 0-2 he says bucket 0 and 2 use mod4 but bucket 1 still uses mod2. How do you know which hash function to use before you actually hash it....?
hey Gary, great video, i'm doin some late studying and this video helped majorly, i'm from south africa btw, there's just one thing i don't understand though :: i AM writing this evening after work but i'd still like to know: - innitially we would calculate the position of the value to add using (v mod 2), after the split happened the n pointer moved on bucket down and the value to mod with increased (2fold | by2)? ie: 4, but remained the same for the bucket to split ie: 2.
actually the formula for this mod 2 to mod 4 change is: k mod (Nx 2^L) N is the number of buckets (doesnt change after split) and L (Level) shows how much this file is doubled. So in this case after the first split L = 1 which means: k mod (2x2) -> k mod 4
Nothing is perfect. There is no perfect lesson for every student. Feel free to ask a question. And if you think something is explained elsewhere better please clue us off and bask in the upvotes.
As below the guy, That's exactly what i dont understand too. I couldnt get it clear that why he did the mod 4 not mod 3 and even it is true, why he did not the mod 4 on the second bucket.
In linear hashing, buckets are split one at a time. The change from mod 2 to mod 4 bears the idea that it is relevant to look at one more bit in the key, since we are splitting this bucket into 2 ( alpha mod 2 is the same as looking the least significant bit of alpha, while alpha mod 4 is the same as looking the 2 least significant bits in alpha).
great. but you dont really explain why you cange n, or why you change from mod 2 to mod 4. Is h(K) = K mod 2^i, where i is the number of splits you made? its a bit diffuse