A few things to add. I prefer partitioning on a key that is guaranteed not to distribute badly, so "first letter of the name" is a bad idea. Better to use the record id and group 100k or so of them into a partition. Then, before storing partitions on different servers, there are a few more things to do first. One is to split modifying queries from read-only queries (this has to be done at the application level), so that a simple read-replica server (which is trivial to set up in Postgres) can be used. Next, a database split at the logical level is possible: for example, keep the users' core data on db1 and chat messages on db2. Leaving out foreign keys and using weak references instead, with a periodic cleanup job that resolves broken links, is a good idea; it also eliminates issues on backup restore when the backup is cut at a bad moment.
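A minimal sketch of the three ideas in the comment above, assuming integer record ids and treating database connections as plain DSN strings so it runs without a server. `PARTITION_SIZE`, `partition_for`, `Router` and `find_dangling` are hypothetical names for illustration, not a Postgres or driver API.

```python
from dataclasses import dataclass, field

# Group consecutive record ids into fixed-size blocks (100k, as in the comment).
PARTITION_SIZE = 100_000


def partition_for(record_id: int) -> int:
    """Map a record id to its partition: ids 0..99_999 -> 0, 100_000.. -> 1, ..."""
    if record_id < 0:
        raise ValueError("record ids are assumed to be non-negative")
    return record_id // PARTITION_SIZE


@dataclass
class Router:
    """Application-level routing: writes go to the primary, reads to replicas."""
    primary: str
    replicas: list[str] = field(default_factory=list)
    _next: int = 0

    def target(self, read_only: bool) -> str:
        # The caller declares whether the query is read-only; guessing from the
        # SQL text is fragile (a CTE, for example, can contain an UPDATE).
        if not read_only or not self.replicas:
            return self.primary
        dsn = self.replicas[self._next % len(self.replicas)]
        self._next += 1  # simple round-robin across replicas
        return dsn


def find_dangling(user_ids: set[int], messages: dict[int, int]) -> list[int]:
    """Weak references: each message stores a user id without a foreign key.

    Returns the ids of messages whose user no longer exists, so a periodic
    cleanup job can delete or archive them.
    """
    return sorted(msg_id for msg_id, user_id in messages.items()
                  if user_id not in user_ids)


if __name__ == "__main__":
    print(partition_for(42), partition_for(250_000))
    router = Router("primary-db", ["replica-1", "replica-2"])
    print(router.target(read_only=False))
    print(router.target(read_only=True), router.target(read_only=True))
    print(find_dangling({1, 2}, {10: 1, 11: 3, 12: 2, 13: 4}))
```

Here `partition_for(250_000)` lands in partition 2, and messages 11 and 13 are reported as dangling because users 3 and 4 do not exist. Since the id grows monotonically, the newest partition takes all inserts; that is acceptable when partitions are about manageability rather than spreading write load.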
Coming from a decade+ of data work with health records, I have to bump this comment. Name, location and birthdate combined still aren't unique. Messing up data with potential overwrites like this, where one record tromps over another, is straight-up lethal in some fields. Remember, friends: bad data is worse than no data.
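A minimal sketch of that failure mode, using made-up patient records (the `Patient` class and its data are hypothetical): keying on the composite (name, location, birthdate) lets the second patient silently overwrite the first, while a generated surrogate id keeps both.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count


@dataclass(frozen=True)
class Patient:
    name: str
    location: str
    birthdate: date
    notes: str


records = [
    Patient("Maria Garcia", "Springfield", date(1980, 3, 14), "allergic to penicillin"),
    Patient("Maria Garcia", "Springfield", date(1980, 3, 14), "no known allergies"),
]

# Keyed on the natural key: the second patient replaces the first without error.
by_natural_key = {(p.name, p.location, p.birthdate): p for p in records}

# Keyed on a generated surrogate id: both patients survive.
ids = count(1)
by_surrogate_id = {next(ids): p for p in records}

print(len(by_natural_key), len(by_surrogate_id))  # 1 2
```

The surviving natural-key entry carries "no known allergies", which is exactly the kind of silent data loss the comment warns about.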
"commodity hardware does not have ECC - don’t run a db on it" SQLite is a file-based database; it doesn't have to stay resident in non-paged RAM. High-energy cosmic radiation can corrupt only the volatile memory cells, not the storage. Also, modern commodity hardware has some level of ECC for the CPU cache: single-bit ECC for the L2 cache and multi-bit ECC for the L1 cache (at least my 10-year-old Intel i7 has it). A whole query operation will probably fit into the CPU cache unless the column data exceeds the L2 cache size. Good luck exceeding that: say the L2 cache is 256 KB and even half of it is available for our query at this moment, including all the column data; it would take more than about 130 columns of 1,000 bytes each to cross that boundary, and queries that large are not typical of the workloads that run on commodity hardware anyway. Hospital billing, hotel management, restaurant billing? Nah. Take a worst-case memory access time of, say, 100 nanoseconds to fetch the data from RAM into the L2 cache. Radiation would have to flip those exact bits in RAM within that 100-nanosecond fetch window. Writing the data back then takes another 100 or so nanoseconds in memory, plus the disk write itself (a worst-case disk access time of 5 ms, i.e. 0.005 s, is assumed). It's extremely unlikely, next to impossible, for radiation to randomly flip those specific cells, out of the billions of memory cells in RAM, while the SQLite update/delete function is executing, given that it completes and saves the data to disk within about 10 milliseconds at most (including all system-call overhead). SQLite for desktop is your friend. However, if you intend to use a client-server database such as MySQL, then your statement is indeed valid.
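A quick back-of-envelope check of the arithmetic in that comment. Every input here (256 KB L2 treated as 256 KiB, half of it free, 1,000-byte columns, ~100 ns RAM access, ~5 ms disk access) is the commenter's assumption, not a measurement.

```python
# Inputs are the commenter's assumed figures, not measured values.
L2_CACHE_BYTES = 256 * 1024
available = L2_CACHE_BYTES // 2          # 131,072 bytes free for the query
column_bytes = 1_000                     # size of one column value
columns_that_fit = available // column_bytes
print(columns_that_fit)                  # 131 columns before the cache spills

RAM_ACCESS_NS = 100
DISK_ACCESS_NS = 5 * 1_000_000           # 5 ms expressed in nanoseconds
disk_vs_ram = DISK_ACCESS_NS // RAM_ACCESS_NS
print(disk_vs_ram)                       # 50000: the disk write dominates
```

So 100 columns of 1,000 bytes (100,000 bytes) would still fit; it takes roughly 131 before half of a 256 KiB L2 is exceeded. The timing ratio also shows that the RAM fetch window is tiny compared with the overall ~5 ms disk write.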
Another potential disadvantage, I would think: if you use commercial rather than open-source operating systems or databases, licensing costs grow with the number of servers.
The video script explains the basics of database sharding and partitioning in system design. It discusses how sharding can help manage large amounts of data by breaking it up into smaller partitions spread across multiple servers. The script also highlights the advantages and disadvantages of sharding in terms of scalability, performance, and operational complexity.
Key moments:
00:32 Traditional databases run into limitations as data grows, making sharding necessary to improve scalability and performance.
- Geo-based sharding partitions data by user location, reducing latency by routing users to the closest node.
- Range-based sharding divides data by key value ranges, simplifying partition computation but potentially leading to uneven splits.
- Hash-based sharding uses hashing algorithms to distribute data evenly across partitions, reducing hotspots but potentially separating related rows.
- Automatic sharding dynamically manages data partitioning for higher performance and scalability, whereas manual sharding at the application layer increases development complexity.
03:55 Sharding enables scaling, faster queries, and system availability, but poses challenges such as complex management, hotspots, and high operational costs.
- Advantages of sharding include scalability, faster queries, and improved system availability during outages.
- Disadvantages of sharding include complex data relationships, potential hotspots, and the operational cost of maintaining high availability.
Generated by sider.ai
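A minimal sketch contrasting the three shard-key strategies from that summary. The shard count, range boundaries and region-to-node map are made-up parameters for illustration, and sha256 is just one stable hash choice, not necessarily what any particular database uses.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4
# Range-based: shard 0 holds ids < 250k, shard 1 holds [250k, 500k), ...,
# and the last shard is open-ended. Boundaries are hypothetical.
RANGE_BOUNDS = [250_000, 500_000, 750_000]
# Geo-based: route each user region to a nearby node (hypothetical names).
REGION_TO_NODE = {"eu": "node-frankfurt", "us": "node-virginia", "ap": "node-singapore"}


def range_shard(key: int) -> int:
    # Cheap to compute, but contiguous keys land on the same shard.
    return bisect_right(RANGE_BOUNDS, key)


def hash_shard(key: int) -> int:
    # sha256 gives the same result in every process (unlike the built-in
    # hash() on strings), and it spreads neighbouring keys across shards.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def geo_shard(region: str) -> str:
    return REGION_TO_NODE[region]


if __name__ == "__main__":
    # The newest 10,000 ids: a typical "hot" insert pattern.
    recent = range(990_000, 1_000_000)
    print(Counter(range_shard(k) for k in recent))  # every key on shard 3
    print(Counter(hash_shard(k) for k in recent))   # roughly even across 4 shards
    print(geo_shard("eu"))                          # node-frankfurt
```

The range scheme turns recent ids into a hotspot on one shard, while the hash scheme spreads them out; the flip side is that a range scan over consecutive ids must then query every shard.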