I'm reading 'Designing Data-Intensive Applications' and needed a reminder of what exactly LSM trees are. This video is amazingly helpful! Everything I've learned so far is clearer now!
Same here. I wish there were a channel that summarized DDIA concepts chapter by chapter, so we could watch the videos after reading the text to further solidify our understanding.
You look very happy in this video, Gaurav! Nice to see this as I'm starting interviews again -- you helped me out last round, glad to have you onboard this time too :)
I think the explanation of compaction here is missing a major point. Compaction means throwing away duplicate keys in the log, since there can be multiple updates for a single key. With compaction we keep only the most recent one.
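A minimal sketch of the idea in the comment above (my own illustration, not from the video; the segment layout is an assumption): walk segments from oldest to newest so the most recent value for each key wins, then emit one sorted, deduplicated segment.

```python
def compact(segments):
    """Merge segments (ordered oldest to newest) of sorted (key, value)
    pairs, keeping only the most recent value for each key."""
    latest = {}
    for segment in segments:          # newer segments overwrite older entries
        for key, value in segment:
            latest[key] = value
    return sorted(latest.items())     # one sorted, deduplicated segment

old = [("a", 1), ("b", 2)]
new = [("a", 9), ("c", 3)]            # "a" was updated after the old segment
print(compact([old, new]))            # [('a', 9), ('b', 2), ('c', 3)]
```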
It would be good if you could make a video on "which storage engine to use and when", essentially telling us when to use traditional SQL vs NoSQL. I know there is no straightforward answer, but I'd just like to know your view.
Correct me if I'm wrong, but this is for analytics DBs. For transactional DBs, which need to change/delete values, a sorted array would slow down writes; transactional DBs still need to use B+ trees (at least the SQL variants). A database with heavy writes doesn't automatically mean an analytics DB. Imagine a scenario where you're storing background jobs' state in a DB and the number of jobs scales up to 1 billion for 1 billion users every 2 hours: you have 1 billion records ingested every hour that at the same time need to be updatable.
You did not consider duplicates caused by updating records. I found your video helpful while reading the Designing Data-Intensive Applications book; I'm exactly at this topic. Thanks for explaining it briefly yet covering the most important points.
An example you could give here for a bloom-filter-style check: consider the 0th and nth elements of the sorted array. If I need to search for a number and that number is within the range of the smallest and largest, there is a chance it could be there (it may not be there as well). However, if the number is less than a[0] (the smallest number in the sorted array) or greater than a[n] (the largest), there is no way it could be there. Hope that helps.
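A tiny sketch of that range check (my own example): like a bloom filter, a negative answer is certain, but a positive answer still needs a real search.

```python
def might_contain(sorted_chunk, x):
    # If x lies outside [min, max] of the chunk it is definitely absent;
    # otherwise it *might* be present and a real binary search is needed.
    return sorted_chunk[0] <= x <= sorted_chunk[-1]

chunk = [3, 7, 12, 25, 40]
print(might_contain(chunk, 2))    # False: below the smallest element
print(might_contain(chunk, 10))   # True, even though 10 is actually absent
```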
Why? I feel it's the same as a B+ tree overall, because if we look at a B+ tree abstractly, it's also maintaining sorted chunks of an array. But there we don't control the chunk size or when two sorted chunks get merged. In a B+ tree, insertion is always O(log n), while here it's constant most of the time (I guess). The read operation is a bit heavy here, but I guess the bloom filter can help reduce the time. Anyway, nice video!
@Gaurav Sen I have a doubt. Say a read query comes in for a data point which is NOT yet persisted in the database and is still waiting to be flushed. Wouldn't that be a 404 for the client?
When a notification arrives from Gaurav... Me: Hey, you don't know anything about this concept. Don't waste your time watching this. My brain: You stop, Gaurav makes things clear.
*Explaining bloom filter* - example word "cat", let's think of another word with c... "corona" 🤣 I had to check the date of the video, and indeed it's from after the COVID-19 outbreak. Keep up the good work! Amazing content!
Great channel, thanks a lot. I personally would love to see some real-world examples of this (open-source code etc.), maybe as posts/short vids. I know it's very time-consuming, but I bet it's very interesting to see and understand how real-world frameworks/projects use this fundamental knowledge under the hood.
We can also divide the searches into chunks of ranges. The first chunk ranges from, say, 1 to 30, the second chunk from 31 to 60, and so on. If we want to know whether the number 45 is present in the DB, we do a binary search on the second chunk.
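Sketching the chunked lookup from the comment above (my own illustration; the chunk boundaries are made up): pick the chunk whose range covers the key, then binary-search only inside it.

```python
import bisect

def search_chunks(chunks, x):
    """chunks: sorted lists covering disjoint, increasing key ranges."""
    for chunk in chunks:
        if chunk and chunk[0] <= x <= chunk[-1]:     # range test per chunk
            i = bisect.bisect_left(chunk, x)         # binary search inside
            return i < len(chunk) and chunk[i] == x
    return False

chunks = [[1, 12, 28, 30], [31, 45, 59]]
print(search_chunks(chunks, 45))  # True: found in the second chunk
print(search_chunks(chunks, 7))   # False: in the first chunk's range, absent
```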
I was going through the "Designing Data-Intensive Applications" book and the material read like plain text. I came here, watched the video, and now that same text forms images in my mind. Thanks man, you have created a perennial resource for system design topics.
Hi Gaurav, thanks for the video. With a B+ tree we have read and write complexity of O(log n), whereas you claim that with your approach we can make writes O(1), or at least less than O(log n). My concern is that this will vary a lot depending on how many sorted chunks you have, how many merges you do, the size of the sorted chunks, etc. On top of that, there is a lot of processing and complexity in implementing, maintaining, and monitoring these sorted chunks and bloom filters, plus corner cases. There are certainly cases where this approach beats B+ trees, but there are also cases where B+ trees will stand out. For example, at Facebook scale we have billions of records to process. How many chunks would you make, and how many would you merge and apply bloom filters to? You cannot go with 6 chunks as in your example; you might need 60k to 6M chunks, then a first-level search over those 6M chunks, second-level processing within a chunk, the bloom filter, and so on. And how would you handle record deletion? How many things would you need to change: the chunks, the bloom filter, maybe even more?
2:17 It'll actually be slower in C/C++ (etc.) if you do a dynamic allocation for every element at runtime, and you'll also lose out massively on cache performance. You should use a linked list of fixed-size arrays (chunks): O(1) insertion plus cache friendliness.
Hi Gaurav, thanks for the video. I have a doubt regarding constant time for the write operation. 1) Whenever a write request is triggered, it takes O(log m) (where m is the size of the memtable) to insert into the memtable, right? 2) A write query comprises two operations, right? i) the memtable, ii) the log file.
Hey Gaurav, when you are condensing multiple queries (to save on the number of I/O calls) and sending them together at once, how are we ensuring consistency (especially in SQL)? Say a user makes a write and almost immediately tries to retrieve the record again. Since the condensed queries might not have been executed by then, the read returns a "no record found" error. Maybe some kind of cache (a mini database table at the application-server level with the schema replicated) would help serve the scenario I mentioned? What are your thoughts on this?
Appending at the end of the linked list has time complexity O(N), since we need to traverse through the linked list before we can append. Correct me if I'm wrong.
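That is true if you only keep a head pointer; keeping a tail pointer as well makes the append O(1). A minimal sketch (my own, not from the video):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class AppendLog:
    """Singly linked list with a tail pointer: O(1) append,
    no traversal needed to reach the end."""
    def __init__(self):
        self.head = self.tail = None

    def append(self, value):
        node = Node(value)
        if self.tail is None:          # empty list
            self.head = self.tail = node
        else:
            self.tail.next = node      # link after current tail
            self.tail = node           # advance the tail pointer

    def to_list(self):
        out, cur = [], self.head
        while cur:
            out.append(cur.value)
            cur = cur.next
        return out

log = AppendLog()
for v in [1, 2, 3]:
    log.append(v)
print(log.to_list())  # [1, 2, 3]
```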
If I send a write query which is batched on the server, and before the batch is processed I send a read for the same row, the data is still not written to the DB, but the user might expect it to be present. How does this work in such cases? The same goes for when the data is written to the log (or linked list) but not yet to the sorted array.
I am so happy to watch this playlist. Most of the time I read on platforms like Quora or Reddit that developers never use their DSA knowledge in their work, and that it's all just to pass interviews. This series proves that wrong: if someone is working on developing a database, or some service that others will use as a black box, then DSA knowledge is very useful. Please correct me if I am wrong, and add some more examples if I am right.
2:29 I don't get it. Why do you care about read speed on the linked list? Doesn't the database have to read it in order? Or is it used as some kind of cache?
Honestly, this was a bit confusing. It also seems to suggest that we can have very fast writes and very fast reads, which I don't think is possible from what I have been reading and learning, based on the RUM conjecture. But I'll go check it out. The bloom filter: isn't that an extra cost for reads, in terms of space? I was also wondering how the bloom filter is checked and what the time complexity for that is. There are a lot of words and interesting things here, but I couldn't make sense of the whole thing. It felt like simply shooting words into the air.
I'll go read the papers on LSM trees and more, implement them, and come back to see if I can make sense of this. I did hear all these terms in many other videos though: sorted string table, bloom filter, LSM tree, B tree, B+ tree, etc.
In a production system, if compaction is in progress, how can we make sure the linked lists being compacted are still available for read operations?
I just finished "Storage and Retrieval" from Designing Data-Intensive Applications and this video came up. It adds more insight and flavor to that knowledge.
Really nice video. A few questions: 1. What happens if the server goes down before we get a chance to flush the data from memory to the DB? 2. If we already know the ideal segment size for reading data efficiently, why not start with an in-memory buffer of that size instead of merging sorted segments of smaller sizes?
The problem with Gaurav Sen's videos is that if one is 17:22 minutes long, you need to watch it for exactly 17:22 minutes to understand the concept. The moment you skip forward 5 seconds anywhere, you need to rewind 20 seconds.
A large DB will typically have millions of entries, so won't the merging be very inefficient in that case? Merging two sorted arrays has a time complexity of O(n1 + n2). Even if this compaction happens asynchronously, it can still slow down the DB because it will happen frequently.
@@gkcs The number of entries will depend on the data size of each unit, but let's say the data is 100 GB. Sorting that much data frequently will not be efficient.
I don't understand something at 7:50. If we have N records in a sorted list and we want to add another sorted list of 6 elements, we don't have to sort it again; we just need to merge it. That's O(n), not O(n log n), right?
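For reference, the linear merge the comment above describes (a standard two-pointer merge, my own sketch): each element of either list is looked at once, so it runs in O(len(a) + len(b)) with no re-sort.

```python
def merge(a, b):
    """Merge two already-sorted lists in linear time."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:               # take the smaller head element
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                  # one of these is empty;
    out.extend(b[j:])                  # the other holds the leftovers
    return out

print(merge([1, 3, 5], [2, 4]))  # [1, 2, 3, 4, 5]
```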
I got confused over how to do binary search on a linked list, because I couldn't think of any way to randomly access the middle node in O(1). With the two-pointer technique we can find the middle node, but that requires O(n/2) time, which is roughly 2x faster but still not O(log N). I'd appreciate it if someone could help me here.
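For what it's worth, here is the slow/fast two-pointer middle-finding the comment mentions (my own sketch). It is indeed O(n), which is why binary search gives no asymptotic win on a linked list; that's also why in-memory structures like balanced trees or skip lists are used instead.

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value, self.next = value, nxt

def middle(head):
    # Each step, `fast` covers two nodes while `slow` covers one,
    # so when fast hits the end, slow sits at the middle. O(n) overall.
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

# Build the list 1 -> 2 -> 3 -> 4 -> 5
head = None
for v in [5, 4, 3, 2, 1]:
    head = Node(v, head)
print(middle(head).value)  # 3
```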
Hi Gaurav, pardon me if I understood something wrong, but: 1. Around 7:45, if we have 10,000 sorted records (n) in the SST and we are trying to insert 6 new records (m), why do we need to sort (10,000 + 6) records? Instead, can't we do a binary search on the 10,000 records to find the insertion position (log n) for each of the 6 records? Then the time complexity would be O(m * log n) = 6 * log(10,000) = 60. 2. Also, if we go with the approach of merging sorted arrays (two arrays of 6 elements each merging into one array of 12 elements), why do we need to traverse all the arrays to look up a record x? Since the arrays are already sorted, we can easily compare the starting and ending elements of each array with x. If the starting element is greater than x, or the ending element is less than x, we don't need to traverse that array and can check another array instead.
8:03 It's still O(n log n) IMO, because in the worst case you'll go over every single chunk (n) and binary search within each (log n), as you can't really separate them into sequentially sorted chunks (e.g., the first chunk holding sorted values from 1 to 50, the second from 51 to 100, and so on). That would require sorting every chunk in O(n log n) time, because we're getting random values in every query block.
I went through the bloom filter video, and looking at the false-positive probability, why are we not using a trie in this LSM tree scenario? At least the error rate would be zero, even if it is not optimal.
An LSM tree uses the bloom filter because it takes a fixed amount of memory, and we can adjust the filter size to our requirements. Tries are notorious for taking large amounts of memory. However, if the key range is small, one could be used.
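A minimal bloom filter sketch to illustrate the fixed-memory point (the sizes, hash count, and salted-SHA-256 hashing scheme are my own choices for illustration, not what any real LSM engine uses): the bit array stays the same size no matter how many keys are added, and only the false-positive rate degrades.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array + k hash functions. Memory never grows
    with the number of keys, unlike a trie."""
    def __init__(self, size=1024, k=3):
        self.size, self.k = size, k
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k bit positions by salting the key before hashing.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for idx in self._positions(key):
            self.bits[idx] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[idx] for idx in self._positions(key))

bf = BloomFilter()
bf.add("cat")
print(bf.might_contain("cat"))  # True: added keys are never missed
```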
I have a good understanding of how MySQL does all of this, and the only real advantage of this queuing technique is multi-row inserts: inserting multiple rows at once is faster than doing the same number of individual row inserts. When you insert data into the (InnoDB) storage engine, it stores the records directly in the primary-key B-tree, so the sorting is done on insert. If you want to optimize bulk inserts, simply create a queue of records and periodically dump that queue to the database. The types of workloads that benefit from queued inserts are primarily ETL data transfers.
We can store updates in memory and send them to the DB in bulk, that part is clear, but how do we make sure our data is divided into chunks with each chunk sorted? Isn't that internal to the DBMS? In short, say I have a Postgres DBMS: how do I implement this concept?
Wow, I never thought about database optimization from a write and read perspective; I always thought about query optimization. Thanks a lot for providing such useful information, and keep up the great work!
Hi Gaurav - could you please recommend good books for learning data structures and algorithms? This is the first time I'm hearing about LSM trees, and I want to know more about other structures.
I think that's how Cassandra makes writes faster: a memtable in memory flushed to SSTables for good write latencies, with bloom filters and row/key caches to reduce read latency!
But checking a bloom filter means we need to iterate over the query string to compute its hash values. As the string gets longer, that operation grows too, and could end up costing more than our binary search.
When is a sorted chunk created? Is it during insertion, or some time later? And what happens when a query arrives for data which is not yet in a sorted chunk?