What makes a DB a DB? It is the underlying database engine. SQL is 5% of all the databases used.

Myths With NoSQL:
"Use NoSQL because SQL doesn't scale." SQL doesn't scale with its constraints in place, but if you shard SQL, it will scale. NoSQL is sharded by default, so people claim it is scalable.

SQL DB Structure Myths:
"SQL uses a B+ tree." You can write your own storage engine and store the data anywhere. The popular and default engines, MyISAM and InnoDB, do use B+ trees, because they give O(log n) lookups.

NoSQL Structure:
It depends on the use case, as there is no standardization. A few types: NewSQL, in-memory, key-value, columnar, and hybrid DBs.

Document DB:
It is very close to a relational database, with a change in the modeling layer. MongoDB uses the WiredTiger engine. The underlying engine can be the same for SQL and NoSQL; if two databases have the same underlying engine, the difference lies in the guarantees they offer. Some can be distributed, some in-memory, centralized, or embedded.

A database system typically has several abstracted layers that handle different aspects of data management. These layers include:
The Physical Layer: This is the lowest layer, responsible for managing the actual storage of data on disk or other storage devices. It handles tasks such as allocating space for data, reading and writing data to disk, and managing data files.
The Storage Engine Layer: This layer sits above the physical layer and is responsible for managing the storage and retrieval of data. It handles tasks such as indexing data, managing data structures, and providing an API for querying data.
The Query Layer: This layer sits above the storage engine layer and is responsible for parsing and executing queries. It provides an API for querying data and translates high-level queries into operations that the storage engine can execute.
The Application Layer: This is the highest layer, responsible for interacting with the user or application.
It allows the user or application to interact with the database using a query language or an API.
These layers are abstracted from each other, so changes or updates to one layer do not affect the functionality of the other layers. All of them are plug-and-play. A DB is only as performant as its storage layer. We may see JSON at the top, but beneath the layers, the data is stored in a highly complicated way.

What does a node of the B+ tree contain? In a relational DB, it contains the actual row: because the schema (and therefore the row layout) is fixed, the engine knows how much space one row requires. However, a node is not limited to a single row; it can hold multiple rows.

Indexing:
Indexing works the same way in SQL and NoSQL: it makes reads (lookups) faster.
Sparse index: an indexed value plus an offset, kept for only some of the records, so the index is smaller.
Dense index: every word (key) appears in the index.

Why can't we do joins in NoSQL even if the underlying data structure is the same? Joins run on the compute side, not on the storage-engine side, and the data has to be on the same machine to be joined. In a sharded DB, you need to bring the data onto one machine, which is very costly (network overhead), so people say there are no joins in NoSQL. Instead, people tend to do approximate or partial joins.

Geo-sharding:
Geo-sharding is a technique used to distribute a database across multiple geographic locations to improve performance, scalability, and availability.

Master-Slave Architecture:
This is done to scale reads. All writes go to the master, and the replicas periodically pull those writes from what is called the replication log. It works because reads are far more common than writes.

Multi-Master Architecture:
Problems: conflict resolution, and how will you handle IDs?
Conflict logic options: first write wins, last write wins, concat, or not accept any.

Distributed Databases:
The masters are independent, as each one owns its own shard.

Joins in Sharded DB:
All the relevant data from the databases arrives at a single machine, and then the join happens there. That machine computes the result and sends it back to all machines.
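To make the cost of a sharded join concrete, here is a minimal Python sketch of the "bring all the relevant data to one machine, then join" approach. It assumes the shards are plain in-memory lists, and the names `users_shards`, `orders_shards`, `gather`, and `hash_join` are hypothetical; in a real system every `gather` step would be a network transfer, which is where the cost comes from. The join algorithm here is a simple hash join, one of several possible choices.

```python
from collections import defaultdict

# Hypothetical shards: each inner list stands in for the rows held by one machine.
users_shards = [
    [{"user_id": 1, "name": "asha"}, {"user_id": 3, "name": "ravi"}],
    [{"user_id": 2, "name": "meera"}],
]
orders_shards = [
    [{"order_id": 10, "user_id": 2, "amount": 250}],
    [{"order_id": 11, "user_id": 1, "amount": 120}, {"order_id": 12, "user_id": 1, "amount": 80}],
]


def gather(shards):
    """Copy every row from every shard onto the coordinator (a network hop per shard in reality)."""
    rows = []
    for shard in shards:
        rows.extend(shard)
    return rows


def hash_join(left, right, key):
    """Join two row lists on `key` by building a hash table over the left side."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# The join runs on a single machine, and only after all the data has arrived there.
result = hash_join(gather(users_shards), gather(orders_shards), "user_id")
for row in sorted(result, key=lambda r: r["order_id"]):
    print(row)
```

The joining itself is cheap; the expensive part is that every relevant row crosses the network before any joining can start.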
Such cross-shard join queries are good for analytics but not ideal for real-time use cases, as they are very expensive.

Use cases:
The strength of SQL DBs is ACID compliance. Some distributed databases also claim ACID compliance, which means they run distributed transactions, and that makes them slower. If we want strong consistency, we need a single node.
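The layered, plug-and-play design described in these notes can be illustrated with a toy: a query layer that talks only to an abstract storage-engine interface, with two interchangeable engines underneath. This is a sketch under stated assumptions, not any real database's API: `StorageEngine`, `HashEngine`, `SortedEngine`, and `QueryLayer` are hypothetical names, the hash engine stands in for a key-value store, and the sorted engine uses binary search (`bisect`) as a stand-in for the ordered, O(log n) lookups a B+ tree provides.

```python
import bisect
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical storage-engine interface: the only thing the query layer sees."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def range(self, lo, hi): ...


class HashEngine(StorageEngine):
    """Key-value style engine: O(1) point lookups, but range scans sort every key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, lo, hi):
        return [(k, self._data[k]) for k in sorted(self._data) if lo <= k <= hi]


class SortedEngine(StorageEngine):
    """Ordered engine standing in for a B+ tree: binary search gives O(log n) lookups
    (inserts into a Python list are still O(n) in this simplified version)."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, lo, hi):
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))


class QueryLayer:
    """Turns tiny 'queries' into engine calls; it never knows which engine it is using."""

    def __init__(self, engine: StorageEngine):
        self.engine = engine

    def insert(self, key, value):
        self.engine.put(key, value)

    def select_between(self, lo, hi):
        return self.engine.range(lo, hi)


for engine in (HashEngine(), SortedEngine()):
    db = QueryLayer(engine)
    for user_id, name in [(3, "ravi"), (1, "asha"), (2, "meera")]:
        db.insert(user_id, name)
    print(type(engine).__name__, db.select_between(1, 2))
```

The query layer returns the same answer with either engine; what changes is the cost profile (the hash engine sorts all keys on every range query, while this sorted engine pays on each insert), which echoes the point that a DB is only as performant as its storage layer.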
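The sparse-versus-dense index distinction from the notes can be shown with a toy sorted "data file" split into fixed-size blocks. In this sketch the block size of 3 and the names `build_dense_index`, `build_sparse_index`, `lookup_dense`, and `lookup_sparse` are illustrative assumptions: the dense index has one entry per key, while the sparse index keeps only the first key of each block plus that block's offset, so it is smaller but needs a short scan inside the block.

```python
import bisect

BLOCK_SIZE = 3  # hypothetical number of records per disk block

# A sorted "data file": (key, row) pairs laid out block by block.
records = [(k, f"row-{k}") for k in [2, 5, 8, 11, 14, 17, 20, 23]]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]


def build_dense_index(blocks):
    """Dense: every key maps to (block number, slot inside the block)."""
    return {key: (b, s) for b, block in enumerate(blocks) for s, (key, _) in enumerate(block)}


def build_sparse_index(blocks):
    """Sparse: only the first key of each block, paired with the block's offset."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]


def lookup_dense(index, blocks, key):
    if key not in index:
        return None
    b, s = index[key]
    return blocks[b][s][1]


def lookup_sparse(index, blocks, key):
    first_keys = [k for k, _ in index]
    # Find the last block whose first key is <= the search key.
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None
    _, b = index[pos]
    for k, row in blocks[b]:  # short scan inside one block
        if k == key:
            return row
    return None


dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)
print(len(dense), len(sparse))  # prints: 8 3
print(lookup_dense(dense, blocks, 14), lookup_sparse(sparse, blocks, 14))
```

Both lookups find the same row; the sparse index trades a small in-block scan for an index with one entry per block instead of one per key.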
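The master-slave flow in the notes (all writes go to the master, replicas periodically pull the replication log) can be sketched with an append-only log and a replica that remembers how far it has read. The `Master` and `Replica` classes and the `pull` method are hypothetical; real mechanisms such as MySQL binlog replication or PostgreSQL WAL shipping are far more involved. The sketch also shows why a replica can serve stale reads until its next pull.

```python
class Master:
    """Accepts all writes and appends each one to a replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # append-only list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads; catches up by pulling log entries it has not applied yet."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # how many log entries have been applied so far

    def pull(self):
        new_entries = self.master.log[self.applied:]
        for key, value in new_entries:
            self.data[key] = value
        self.applied += len(new_entries)

    def read(self, key):
        return self.data.get(key)


master = Master()
replica = Replica(master)
master.write("balance", 100)
replica.pull()
master.write("balance", 150)
print(replica.read("balance"))  # 100: stale, the second write has not been pulled yet
replica.pull()
print(replica.read("balance"))  # 150
```

The window between a write on the master and the replica's next pull is the replication lag you accept in exchange for scaling reads.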
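The conflict-logic options listed for multi-master setups can be compared on a single conflict: two masters accept different writes to the same key. This sketch assumes each write carries a comparable timestamp (real systems may use logical or hybrid clocks instead), and the `Write` type, the `resolve` function, and its policy strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp: int  # assumed comparable timestamps coming from each master
    master: str


def resolve(writes, policy):
    """Pick the surviving value for one key given conflicting writes from several masters."""
    ordered = sorted(writes, key=lambda w: w.timestamp)
    if policy == "first_write_wins":
        return ordered[0].value
    if policy == "last_write_wins":
        return ordered[-1].value
    if policy == "concat":
        return ",".join(w.value for w in ordered)
    if policy == "reject":
        # "Not accept any": refuse to pick a winner and surface the conflict instead.
        raise ValueError(f"conflict between masters: {[w.master for w in ordered]}")
    raise ValueError(f"unknown policy: {policy}")


conflict = [Write("blue", 105, "master-B"), Write("red", 100, "master-A")]
for policy in ("first_write_wins", "last_write_wins", "concat"):
    print(policy, resolve(conflict, policy))
```

Last write wins is easy to implement but silently discards the earlier write, while concat keeps both values at the cost of pushing the merge onto the application.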
I think B-trees, LSM trees, or any other index data structure will store the memory location and not the whole row. Can you help me fact-check that information?
Just amazing!!! I got so much knowledge for free, more than I would get even from a very expensive course. I love this channel. Do bring more staff software engineers who have such great experience.
Good podcast. I've been a fan of Arpit for a long time. His BitTorrent playlist was very interesting. About the podcast, I'd prefer it if you put your video and Arpit's side by side; it would give more of a conversation feel than this format.
Would a Postgres master-slave architecture be eventually consistent even with physical replication rather than logical replication on the storage layer? For example, an Aurora Postgres database.