B-Trees on Disk (Database internals)

Подписаться 36 тыс.

Просмотров 13 тыс.

50% 1

This is the third video in the Database Internals series.
In this video we dive into the world of disk based storage using B-Trees.
This video will help you understand the key differences between main memory and disk storage.
Chapters:
00:00 Defining the Problem
01:18 Memory vs Disk storage
03:00 Search data structures
04:35 Why separate data structure for Disk?
06:37 Why Binary Search Tree doesn't work on the Disk?
08:25 Design goals for Disk storage
09:21 B-Trees
11:13 B-Tree of order 5 (example)
14:54 4 billion keys? How many disk seeks?
15:36 What are these keys in a B-Tree?
17:43 Storing data in B-Tree vs keys only
19:31 Types of B-Trees
Follow me on Linkedin and Twitter: / kaivalyaapte and / thegeeknarrator
If you like this episode, please hit the like button and share it with your network.
Also please subscribe if you haven't yet.
Database internals series: • Write-ahead-logging
Popular playlists:
Realtime streaming systems: • Realtime Streaming Sys...
Software Engineering: • Software Engineering
Distributed systems and databases: • Distributed Systems an...
Modern databases: • Modern Databases
Stay Curios! Keep Learning!
Cheers,
The GeekNarrator

Опубликовано:

25 июн 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 11

@darpanmalhotra2 7 месяцев назад

I am still trying to visualize how a B-Tree (which is in-memory data structure) gets persisted on disk blocks, as nodes are populated in B-Tree. Ultimately, what we call B-Tree for memory is a file (i.e. index file) for disk. Further, I am still trying to visualize how the reverse happens i.e. when DB starts, it has to read the index file from disk blocks and convert it into a B-Tree (in-memory data structure). So, how does that wire-up? Somewhere, I fail to see this memory to disk (and reverse) data movement and interpretation. Maybe this is just too basic, yet very confusing to me.

@Lokeshsanapalli1729 7 месяцев назад

Same for me

@TheGeekNarrator 7 месяцев назад

B(+)-Tree is a data structure which is designed to store data on the disk. In memory you can have a binary search tree or a b-tree. To visualise, lets see if the following helps? - In memory each node can have : Intermediate - a list of pointers to children - some keys used to guide the search algorithm for CRUD. (show me where to go next?) Leaf: - ordered list of key,value pairs - key is the search column - value is a pointer for actual value OR an actual value (depends on implementation). - On disk: - All of these are files, that are stored on blocks. - Each block has a list of entries, which can either represent an internal node OR a leaf node. - A pointer to another node can be replaced with a record_id (page_id, file offset). - Leaf node is a block that contains key,value pairs in order so you can search for a specific key once you read a block from the disk. - again value can be a record_id (ex. Postgres) OR the data itself (ex. MySQL). So basically, each in memory node represents a block on disk which has an ordered list of keys and|or pointers (record_id) to another blocks (children pointers). So when you read a block, you match the key and decide which block to load next. Now a block can be filled completely OR be half filled based on the how your data is distributed. But you can imagine, because of the data layout (as a B+-Tree) on the disk, the number of blocks you need to read from the disk is log(N) base B, where N is the total number of keys in the dataset and B is the number of keys in one block (node). Does this help? Please let me know if it doesn't, if I try to simplify further.

@darpanmalhotra2 7 месяцев назад

@@TheGeekNarrator This helps, thanks ! But I am stuck at visualizing how the in-memory pointers to nodes (be it top-down traversal or left-right traversal) get persisted into disk. Pointers are relevant in context of memory only. I think, you are saying the pointers get translated to block address at the time of persistence? If yes, then the question is: who does this translation and how? I thought, the block addresses are known deep at OS/Disk Driver/HDD level?

@balajisrinivasan6618 7 месяцев назад

Great insights ! I was thinking about why not Arrays as the leaf node instead of B-Tree to reduce the disk seek further. Would love to hear your views on this. Consider, our use-case is : given a key, we need to find a pageId, tupleId (for concreteness, let's take the size of each ID as 32 bytes, so overall 64 bytes). In case of using B-Tree, we need to store the keys even on the leaf nodes. In case of arrays, we can use array index as the keys. (Be it arrays or B-Tree, as it's stored on 0/1 on the disk, some processing power is required to serialise / deserialise) Using arrays, if we consider the block size as 16 KB, approximately, we will be storing 250 values in it. To support larger number of keys, we can front the arrays by B-Trees. So, storing the keys on root and intermediary nods makes sense totally. But, storing the keys on the leaf node seems redundant. I understand there are certain limitations like only integer keys can be supported by arrays naively, this can be extended to hash table if required. But, if we can get the lower disk seeks if we use integer as the index key, we shall definitely consider using it for most of the applications.

@TheGeekNarrator 7 месяцев назад

I am not sure I understand. Are you talking about replacing the usage of B-Trees with arrays? Or just the leaf nodes as arrays? I am not sure how using the array index would work on the disk. Could you please elaborate with an example?

@balajisrinivasan6618 7 месяцев назад

@@TheGeekNarrator Sure. Sorry for that. I was making a point that we can replace the leaf node as arrays. Use-case : Given a key, find the pageId, tupleId. For simplicity, let's consider the key as an integer. We are using B-Tree for storing indices. This B-Tree node itself will also be stored on the disk. Let's assume each node corresponds to a block on the disk. For CPU to process anything, it had to bring the information from disk to memory and process it. That said, the data on the disk corresponding to the root node will be put in memory, B-Tree node will be constructed and the execution begins. We scan the B-tree node, look at the key and depending on the key, move to the corresponding pointer. We continue this process until we reach the leaf node. In the leaf node, we will probably be storing k1 : p1, t1, k2:p2, t2, etc... Now, the curiosity is : storing keys on the root node / intermediate is required to figure out which path to traverse. On the leaf node, do we need to store these keys? In my opinion it's not required and we can leverage the computes internal representation of memory allocation to get the value for a given key. Let's say a leaf node is holding 256 keys that has a pageId and tupleId. In B-Tree node, we need to store these 256 keys in it which consumes space. If we use arrays as the leaf node, since anyways we have to construct the data structure in memory before we process it, if we construct it as an array, memory will be allocated sequentially (say 100 - 356). For getting the key 1 -> we can go to 100th memory address, fetch the value. for key 2 -> go to 101st memory address and fetch the value. On a side note, If my understanding is correct, for any data structure that requires say 1000 bytes, computers will allocate memory sequentially only. In such a case, we can just piggyback on the memory representation to figure out the indices. Because indices are also sequential.