The Problem With UUIDs

Подписаться 291 тыс.

Просмотров 179 тыс.

50% 1

THANK YOU PLANETSCALE FOR SPONSORING THIS VIDEO
UUID's have a time and place, but I rarely see them used correctly. I've wanted to do this rant for awhile and I'm happy I did because CUID2 is NOT a good option either 🙃
SOURCE
planetscale.com/blog/the-prob...
Check out my Twitch, Twitter, Discord more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏

Наука

Опубликовано:

6 май 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 974

@averageemogirl 24 дня назад

"hey guys! today i will be talking about something i clearly don't understand" is a great summary of this video

@einrealist Месяц назад

UUID versions are NOT supposed to replace each other. They are just different implementations that address different problem domains. Once humans become a serious space-faring civilization, we probably need yet another UUID version for problems like time dillation. But such a UUID V99 is not supposed to replace V4, but to complement it. V4 can still be used within certain bounds.

@vikingthedude Месяц назад

So its like newton’s laws

@TheOriginalBlueKirby Месяц назад

@@vikingthedudeTerrible analogy.

@Imperial_Squid Месяц назад

Calling them something like flavours rather than versions is probably more fitting

@steamer2k319 Месяц назад

@@Imperial_Squid Yeah. I've never been confused by the UUID types but I could see how one could be misled into expecting obviation/revision from a numerical sequence of "versions".

@vikingthedude Месяц назад

Newtons laws are still useful even though we’ve got general relativity

@flymetothemoon5138 Месяц назад

It's amazing watching somebody talk shit about something he obviously doesn't understand beyond his sponsors blog post

@pchris 26 дней назад

He talked up the video at the start like he's gone through many iterations of it so I was kind of shocked when the entire video was just him reading somebody else's blog post. Did he really have so few thoughts on it himself, or were they all about GUID and made obsolete as he read the UUID article?

@hellowill 26 дней назад

this frontend guy needs to stfu about backend and databases lol

@aguenter 25 дней назад

@@pchris Skimmed right over ULID too, which fits his stated requirements/wants.

@DavidOtto82 22 дня назад

lol i saw this top comment and thought "lets see how long it takes until i can call bullsh*t"...took 2 minutes. btw: former frontend dev here xD

@tahvohck 21 день назад

Thanks for saving me some time.

@cherubin7th Месяц назад

I propose v9, it is the same as v4, but instead of a 4 it has a 9. This changes everything.

@mx-kd2fl Месяц назад

lmao

@DavidLindes Месяц назад

vibes of the IPv9 (April Fools) RFC. :)

@johnblyberg4801 Месяц назад

Genius!

@follantic Месяц назад

It's incremented. Good for perormance. Then we can proceed with the A-F versions.

@follantic Месяц назад

Also, Tom, is that you?

@xdaniels13 Месяц назад

Oh I am still waiting on the video about PlanetScale removing the free plan and the cheaper price twice as much as competitors...

@portalpacific4578 Месяц назад

We need to stop watching compromised influencers pushing businesses despite not having any support from the dev community. I cant even hear the info cuz im so annoyed hes shilling their BS and taking their money.. Fck you Theo.

@portalpacific4578 Месяц назад

Theo = sell out.

@yiannis_p Месяц назад

@@portalpacific4578do you ever Google anything before calling people names? Name me one other company offering a hosted vitess cluster with a https proxy, backups, branches and online ddl that is cheaper than planetscale and then I will agree with you.

@yiannis_p Месяц назад

You do realise planetscale offers something no other company offers right ? The only other way to get a hosted vitess cluster with online ddl, backups and an http proxy is to make it yourself, and unless you value your time at zero dollars, it would be a lot more expensive…

@tanotive6182 Месяц назад

He already made a video about that

@eamonburns9597 Месяц назад

17:58 Just so you know, "I'm not telling you the difference between them, that's your problem", takes a lot more time to say than "1 byte is 8 bits"

@nArgari Месяц назад

Nice... So no difference between an octet and a Byte !? Or may be... The terms "octet" and "byte" are often used interchangeably in common usage, but there is a technical distinction between the two. A byte is a unit of digital information that consists of 8 bits. A bit is the smallest unit of data in a computer, and a byte is a group of 8 bits that can represent a single character of text or a small amount of numerical data. Bytes are commonly used to represent characters in text, encode instructions for a computer program, store data in memory, and transmit data over networks. An octet, on the other hand, is specifically defined as a unit of digital information that consists of 8 bits. In networking and data communication contexts, the term "octet" is often used to emphasize the size of the unit (8 bits) without assuming any particular character encoding or representation. The use of the term "octet" in networking standards helps to avoid confusion that may arise from different character encodings that use different numbers of bits per character. In summary, an octet is a specific term for a group of 8 bits, while a byte is a more general term for a group of 8 bits that can represent a character or other data. In practice, however, the two terms are often used interchangeably, especially in everyday language.

@vorrnth8734 29 дней назад

@@nArgari Actually a byte is the smallest directly accessible piece of memory. It does not need to be 8Bit wide. Historically there were other sizes too.

@uncaboat2399 28 дней назад

@@nArgari I thought an "octet" was a baby octopus? 😝

@PhilHibbs 27 дней назад

Anyone that deep into the video who doesn't know that is in the wrong place.

@eamonburns9597 27 дней назад

@@PhilHibbs I mean, if they are watching this channel, they are probably JavaScript developers, so....

@chepossofare Месяц назад

The proble with UUIDs is that you are not using PostgreSQL.

@mvnkycheez Месяц назад

What's the difference?

@KangoV Месяц назад

In PostgreSQL a UUID is stored in 16 bytes. Indexes on these are super fast as they are performed numerically.

@spicynoodle7419 Месяц назад

When using PostgreSQL instead of MySQL, your whole DB is much slower rather than only the UUIDs xD

@milanmiljus823 Месяц назад

@@spicynoodle7419 cringe

@landsman737 Месяц назад

hehe exactly

@tarunpai2121 Месяц назад

Dude literally just read an article and nothing more. Offer some POV at least. - I don't understand if this is really an issue with mysql or with postgres too? - Is this even as big a problem? or is it just clickbait? - What's the actual performance impact? there's a lot of "perf issues" thrown around with no(?) benchmarks.

@urmom8322 Месяц назад

What do you mean? He was also able to get paid to shill for planetscale. Amazing content

@intentionally_obscure Месяц назад

Thanks for saving me half an hour

@hum6le Месяц назад

thats kinda this whole channel

@AxioMATlC Месяц назад

Highly variable, but consider where an auto-incrementing 32bit int index is 16GB. UUIDs could be as large as 4x or 64GB. Now you are talking about the difference from a cheap server to a more expensive one unless you degrade to searching within an index based on drive speed instead of RAM. ~100x slower for index usage

@pchasco Месяц назад

@@AxioMATlCOK but you wouldn’t use cuid in a scenario where you can use a 32-bit int, either. You haven’t pointed to a problem with UUID here. This is a problem of choosing the wrong solution to a specific problem.

@rikschaaf Месяц назад

3:49 To my knowledge, not all versions in UUID should be seen as "the next version". Some are more like variants than versions. That's why both v1 and v4 are still used often. You could say that variant 5 is a new version of variant 3 though, because we now have better hashing algs that might have not been considered in v3. Variant 6 similarly is a new version of v1, where they reordered the date to be sortable (with v7 being a variant of that). You shouldn't think of every new --"version"-- variant being better than the previous, but that they complement each other.

@workmad3 Месяц назад

Exactly this. Going back to RFC-4122 (2005 one detailing the 5 currently standard variants), it says this about the variant bits: > The variant field determines the layout of the UUID. That is, the interpretation of all other bits in the UUID depends on the setting of the bits in the variant field. As such, it could more accurately be called a type field; we retain the original term for compatibility. The idea is you pick the type that suits your needs. If you don't have any particular needs, v4 (the 'random' type) is a good choice, because you simply want a long, random value.

@aredrih6723 Месяц назад

It's unfortunate that the article didn't mention "version" 0 (that only contains the nil UUID), it would have made that point more obvious.

@davidmartensson273 Месяц назад

Yes, they are intended to solve different problems and all come with different trade offs to achieve that, a user bound one is good if you need to be able to identify the user over values stored in many many systems to not have to add multiple extra columns. I see some problems with this, but I can se the reason. And while you sometimes do want UUIDs that are sortable, especially if you want to use them as primary keys to avoid having to inject things in pages, sometimes you really want more random with no traceable time component. This feels very much like a commercial for planetbase and not informational.

@rikschaaf Месяц назад

@@davidmartensson273 I don't think that the issue here was planetscale's article. It was very informative about the consequences of using those UUIDs in keys/indexes. I think that the issue here mostly came from Theo's prejudice and inability to see them as variants, rather than versions. To be fair, Wikipedia also talks about versions rather than variants, so Theo isn't alone in that regard. It would be good if official documentation and knowledge bases would be updated to reflect this. So, who's gonna change the word version to variant on wikipedia? 😄

@onoff5604 Месяц назад

sounds like 'versions' of raid storage...

@workmad3 Месяц назад

Thinking about the different UUID variants as 'versions' that replace previous versions isn't quite right... the idea is that the different variants are used in different circumstances, so a deterministic variant is useful in some circumstances. Which is why there's a 'variant' number in it, so different variants can be checked differently.

@supersat Месяц назад

V1 was a pretty good idea because collisions were basically guaranteed to never occur. But that was in a world where MAC addresses were guaranteed to be unique, and VMs basically killed that.

@ChrysusTV Месяц назад

"I spent years researching this to make sure I got it right. Turns out, I don't need to. Here's what my sponsor has to say." Sheeesh, DarkViper's video accurately notes a lot of key issues, like this one. Something always felt off about this channel and getting a different perspective really made things clear for me. This channel really is just regurgitation with, as you admit yourself in the first few seconds, little research.

@raphaelmorgan2307 22 дня назад

also in the video shows us himself posting someone else's joke on Twitter lmao

@dijikstra8 Месяц назад

Using a char(36) to store a uuid just seems like the worst idea every, does anyone actually do this? Postgres has a specific datatype for uuid, does MySQL still not have that?

@mehmeterencelik594 Месяц назад

Yes. For my opinion, this video only shows how better the postgres than other relational dbs.

@nathanalberg Месяц назад

Im moving our app to postgres because of the UUIDs alone. UUIDs in mysql is basically roll-your-own.

@SPeeSimon Месяц назад

Yes, they should have used varchar(36). Barely joking, because i actually saw a team use nvarchar(36) on an Oracle DB for their PK. Not really a shocker, because the NIH disorder was really high. So they did not use the same raw(16) type as everywhere else in the system and did not reuse existing code.

@Lexaire Месяц назад

@@SPeeSimon Codebase I work on uses varchar(40) and some uuids have hyphens and some don't. Thanks MySQL.

@johnbernardlambe8582 Месяц назад

If someone wants hexadecimal, they could at least omit the hyphens, giving 32 characters. Even worse is the 38-character format, with added chain brackets.

@steveoc64 Месяц назад

Javascript programmer using MySQL gives us a tech talk about the importance of performance at scale lol

@romankoncek150 Месяц назад

Probably more experienced with real projects at scale than 99.9% of people here posting useless opinionated statements having nothing better to do

@JohnSmith-xv1tp 28 дней назад

@@romankoncek150 As someone who has actually worked at scale, Theo should definitely have stayed in his lane on this one. He clearly doesn't know enough about databases nor UUID for this conversation.

@iPuls3 22 дня назад

Don't mind me, the Node.js TypeScript developer using MariaDB...

@danhorus Месяц назад

This channel does read a lot of blog posts

@Dekutard Месяц назад

i rather listen than read.

@thomassynths Месяц назад

Reading blog posts is fine and dandy. Regurgitating sponsor speak isn't though.

@loganyt8818 Месяц назад

instead of PlanetScale sponsoring Theo, they should have used that money to keep reasonable free plan

@sexymeal2427 Месяц назад

lmao

@__Brandon__ Месяц назад

Except now 100 thousand new people know about planet scale and probably 5% of those people are likely to use their service. 5k new users is a lot of users

@epajarjestys9981 Месяц назад

@@__Brandon__ "probably 5%", lol. that's a completely ridiculous and extremely improbable number. it's just as probable as theo's sponsorship money being enough to finance the free plan.

@unknown.profile.handle Месяц назад

btw it is an RFC, not just a proposal anymore. But there were many wrong assumptions in this video.

@falkkyburz Месяц назад

Why are we mixing "version" and "variant"?

@NicolasTroncoso Месяц назад

I believe there is some confusion about sortability. All primary keys are sortable, regardless of the value. They will fit in the B+ Tree pages with different packing sizes, and with the balancing issues described in the video. What some people want is that records are sorted by creation time. Which is what the CUID post talks about. Don't worry about the record being naturally sorted by creation time. Use a random key to avoid hot spots, and use a createAt (or reverse CreatedAt) index if you need to scan the table the naturally created order.

@rainmannoodles Месяц назад

Exactly. Locating a given CUID is still efficient with the tree. Internally, the tree structure just won’t follow the creation time order. There is still a defined sort order, it just doesn’t sort based on criteria that a human would consider “useful”.

@PatrikTrefil Месяц назад

I am glad that someone cleared this up in the comments. There is still one thing that is unclear to me though. How is it any different if you use ids that are not sorted based on time creation and use an index for createdAt? The secondary index for createdAt is again a B+ Tree (or any other datastructure) which again causes hotspots. In this case the hotspots are just in a different index. It seems like the performance issues are still there. Could anyone explain if and how this solves the problem?

@pacifico4999 Месяц назад

But what about page fragmentation?

@TricksterRad 18 дней назад

CUID's rationale behind why you don't need to worry about performance is "your db is fast enough anyway". The examples of security issues given are quite literally examples of bad security practices that won't be solved by just using a random ID instead of a sequential one. Here are the security issues raised btw: Unauthorized PW reset via guessed ID Unauthorized access to private GitLab issues Unauthorized PW reset via guessed GUID In case 1 and 3, the solution isn't random IDs, but signed password reset tokens with an expiry. In case 2, the solution isn't random IDs, but proper access controls on resources. In all cases, this is not a database design issue - it's a security practices issue. Designing your way out of this by degrading your database performance is a solution, but I hope we can all agree it is not the optimal solution.

@benmeehan1968 Месяц назад

Curious why there is no evidence regarding performance, only vague handwaving that storage space is a factor. What this does show is that MySQL has an incrementing index type and an indexing mechanism that is optimised for incrementing indexes. My (naive) understanding was that with a B-Tree, if the index values were random, the tree will likely tend towards being balanced (at least over time), and that B-Trees don't perform well with sequential index values, hence the need for an optimisation. Sequential values don't scale horizontally, hence the need for a larger 'namespace'. Sequential 32 bit values and random 128 bit values are apples and oranges, and meet different requirements. Choose your index keys appropriately, if horizontal scalability is a likely to be a factor (which is often a guess when we don't know just how large or distributed a storage system is going to be). Clearly a UUID can't compete with a 32bit sequential when you have a single key generator, but a 32bit sequential can't even function when there are multiple key generators (without segmenting the address space). Horses for courses.

@workmad3 Месяц назад

Yeah, the main thing I got from this was "MySQL sucks for UUID keys", because they've optimised their b-tree index page generation around sequential keys, and don't have the ability to change index type easily. Postgres, on the other hand, has a native uuid type that can automatically cast from strings, so you don't need to mess around with conversion functions, has a different optimization for b-trees that, while still has uuid impacts, isn't as severe from what I can tell (it's mostly a cache issue in postgres, as uuids are more likely to be spread over multiple pages, so if you've loaded a page into cache, it's not as likely to be hit again), and also gives the option of other index types that could be better depending on exact db use (like the hash index type, which only supports equality operations)

@7th_CAV_Trooper Месяц назад

You don't need evidence. This is btree 101. Left inserts are slow. Go watch a video to find out why right-only append is faster.

@benmeehan1968 Месяц назад

@@7th_CAV_Trooper I absolutely agree, but there is no quantification of how big the effect is, and that might mean any argument for optimisation is not relevant to any particular use case.

@7th_CAV_Trooper Месяц назад

@@benmeehan1968 worst case is every pk inserts at index zero and forces full right shift of the array each time. I think DB perf is quantitized in terms of O notation rather than ms.

@7th_CAV_Trooper Месяц назад

@@benmeehan1968 good point about perf not relevant for every use case

@dragons_advocate Месяц назад

"Planetscale is not very liked atm..." -- "Thank you planetscale for sponsoring this video"

@Kazyek Месяц назад

What I don't see mentioned here and should be mentioned is that your Database IDs, which you use for indexing, and referencing relationships, doesn't HAVE to be the same value you use to reference records from the API's PoV. You can just use an integer pirmary key, then add a binary UUID field with a unique key constraint, which you don't use for joining / referencing in the database but is used at api-surface-level to identify the records.

@TheCryn Месяц назад

Yes but I don't see how this improves things. Consider your primary key being an autoincrement integer, that you never expose via api and an object id that is some sort of UUID that you do expose. If you expose an object via the object id (UUID), you probably want to have an index on the object id, as you now need to access the object via object id (UUID), as you don't know the primary key on an api level. So you are at the rebalancing problem again, this time "just" for the index and you probably don't win anything performance wise. If your dataset is so small that you don't index, then the performance hit for using a UUID PK probably won't matter.

@HenryLoenwind 28 дней назад

@@TheCryn It improves things as you now only have one table with one index that has UUIDs. And this table will, in many cases, be relatively small, as the things you expose via ID often are short-lived (sessions, password reset tokens, email verification tokens, ...). That's way better than having a UUID primary key on each and every table---that could easily be hundreds of them. In any case, using a UUID as an external ID for a limited number of externally addressable objects is different from using it as the primary key for every table in the system.

@TheCryn 28 дней назад

@@HenryLoenwind I think most real world applications I have worked on had their PK for most tables exposed in one way or another (depending on where you draw the line on exposure). I have worked on a legacy system that dates back before the introduction of UUIDs there the original designers have put much effort into not using autoincrement PKs but to generate something time based (for sorting purposes) but also random (to prevent congestion on DB pages, if a lot of IDs/objects are added to the DB). You should also consider that a lot of IDs you expose (i.e. order IDs for a web shop) can give away insights you often don't want out there (see: German tank problem). So you probably still want most API IDs to not be predictable / autoincrement and need an ID-table for almost each table and an additional join for most queries (that can have a noticeable performance impact itself).

@titbarros Месяц назад

ULIDs resolve most, if not all of the problems mentioned. Is my go to now

@ckpioo Месяц назад

did he really just say storing 20x more is no big deal casually?, all computer engineers biting their nails rn

@Luxalpa Месяц назад

The 20x figure is highly misleading, because it depends on what is multiplicated. As he said in the video, your records already contain a lot of data, in that case it wouldn't be 20x, more like 1.05x.

@ckpioo Месяц назад

@@Luxalpa its not misleading instead you're misunderstanding it, the 20x figure is taking into account just the primary key, and youre also considering the *useful* data stored with the primary key, which you shouldn't be doing because the point is that if you consider the keys as useless data (because its mostly not used by the end user in an app) then what the 20x figure says is that as your data scales up, a larger % of the data will be "useless" data compared to another database which has the same data but is using a incremental primary key. the whole point of the key is just for locating the useful data, so if you are using a system which requires 20x more space to do just that then its basically just a big waste of storage space AND compute.

@Sharess Месяц назад

@@ckpioo you are correct, but overall, that 20+ x increase can be disregarded, unless you have an insane amount of data where reducing the storage and compute requirements would result in significant monetary savings

@DavidLindes Месяц назад

@@Sharess and @Luxalpa: the thing is, 20x _storage_ (long-term) for just the primary keys may not be a big deal in terms of overall table size, but it _is_ (or at least can be) for the _index size_ -- which does indeed have compute implications, as Theo discusses. And, ya know, it's nice when indexes are actually fast -- I've seen complex queries go from something absurd like 30 minutes (memory hazy on this, so grain of salt, but a long long time) to I think sub-1-second just by indexing a column that should have been indexed in the first place. But if those indexes are 20x slower (which may not strictly be the case here, not sure, but still)... ??

@danhorus Месяц назад

Keep in mind that these primary keys also appear as foreign keys in several other tables. It is not uncommon to have multi-million row transaction tables with dozens of foreign keys. This shit adds up. I'm also disappointed the video doesn't mention the performance loss in JOIN operations, which is also a big deal for applications with lots of users.

@worldadmin9811 Месяц назад

RIP ULIDs got no mention even when shown in article

@JohnDoe-jc4xp Месяц назад

Right ;(

@xelspeth Месяц назад

23:47 ULID mentioned

@QuentinDurot Месяц назад

We have to remember that he is a react dev and forgive him

@tabletuser123 Месяц назад

“this video is sponsored by planetscale” is a good way to make sure we know the video is not credible

@xelspeth Месяц назад

I mean he's just reading a blogpost and giving his own opinions, I don't see how this video would be credible in the first place lol You're supposed to form your own oppinion anyway

@TFayas 25 дней назад

Solution I've always used is have an incrementing int id primary key and a guid for external reference (and in some cases a unique human reasonable code name). Id for performance and internal joining, code name if you need to reference it in code, and guid for matching across environments in staging situations where you push items from one environment to another.

@disjustice Месяц назад

I generally use an integer sequence as the primary key and have an associated UUID in a unique index. The primary key is only ever used internally in the DB model and in application code. When exposing the entity in public references, the UUID is used. In addition to the issues touched on in the article, this saves us from another drawback: having to use that same 128bit UUID in foreign keys all over the DB. This goes for any globally unique identifier scheme. Because these unique identifiers need to be globally unique, they are generally so long that they are unwieldy to use as primary keys, even if they are sortable.

@Techman83 Месяц назад

This article appears to be entirely misunderstanding UUID as incremental versions, along with MySQL not handling them as well as something like PostgreSQL. UUID4 for guaranteed random, the rest have different, but valid use cases.

@iPuls3 22 дня назад

If used properly in MySQL, it's stored as binary.

@MeriaDuck Месяц назад

As others also pointet out, they are not versions, they are _variants_ with purposes per variant.

@fifty6737 Месяц назад

UUIDs event account for Time Travel but only as back as 1568, apparently, in the 1567 they were using Integers

@ericf6964 26 дней назад

Another problem with UUIDs is logging. First, logs almost always store UUIDs as text, which as you pointed out is very inefficient at 38 bytes per. Second, since they are designed to be unique, they tend to be relatively uncompressable compared to normal text. The time based UUIDs may be less bad in that respect. But the bottom line is that if you have a requirement to log your identifiers, then using UUIDs will practically guarantee huge storage costs for those logs.

@everton000rsc 25 дней назад

Good point, at massive scale any extra byte matters

@someed3702 24 дня назад

@@everton000rsc If we're talking extra bytes on a record, not really. When we're talking about extra bytes on every index record, Absolutely. It's easy to think that the index files are storing one copy of the primary key per record in the database. If your index is just for equality matches, that's even correct. But if your index handles sub string searches or most kinds of approximate match that's very much not the case. Note that I'm not just talking about the index for the primary key, which I would expect to be an equality index. I'm talking about any other columns on that record, so some of them could reasonably be indexed for sub string searches.

@Denominus Месяц назад

UUIDv7 is suitable for all the relational dbs, except MSSQL which would require V8 and a custom generator. Our largest table is many billions of rows and UUIDv7 still performs well in Postgres.

@paulie-g Месяц назад

Out of curiosity, what stops you using int autoincrement as primary key and uuids as an extra field if you need them for some reason? Is it some sort of roll-your-own-sharding thing?

@Denominus Месяц назад

@@paulie-g An earlier version of the system did actually have that, but it didn’t buy us much (if anything). It’s a big distributed system, so to say 99%+ of queries were done by the uuid (external_id as we called it) would be an underestimate. It also wrecks one of the benefits of using uuids as identifiers, that is you will need to do round trips to the DB on insert to get the ID for FKs.

@paulie-g Месяц назад

@@Denominus I always assumed people would use stored procedures or equivalent to avoid the round trip.

@uttula Месяц назад

@@paulie-gwhen you already have a (for most practical purposes) unique key you can use, adding a autoincrement field to use as your synthetic primary key serves zero purpose aside the fact that meatbag humans like sequential integers for some weird reason.

@paulie-g Месяц назад

@@uttula UUIDs are synthetic as well. Integer primary keys are an implementation detail. And they're an implementation detail for a reason that has nothing to do with human preference.

@username7763 Месяц назад

UUIDs were not designed to be database primary keys. They were designed to solve problems with distributed networking, not data storage. There is no problem with UUIDs if used as designed. Kids these days see UUIDs and immediately think databases. Ha!

@debasishraychawdhuri Месяц назад

MySQL does care about a distributed database, so they are interested mostly in the performance of a single B-Tree. Imagine how hard it would be to create an auto-incrementing ID in a distributed insertion. If you use an auto-incrementing ID, you can only insert in sequence. This is why people use UUID. Theo needs to learn about databases. Everyone writing enterprise software needs to learn about databases.

@scragar Месяц назад

Percona solved distributed ID insertions by having each master get a sequence, then each master only inserts IDs matching it's sequence mod the number of masters. So if you have 3 master servers you'll have one only inserting 1,4,7,10; another only inserting 2,5,8,11; and a third inserting 3,6,9,12; this ensures they never conflict on the ID. This does however mean the auto increment doesn't retain consistency between servers.

@workmad3 Месяц назад

@@scragar it also means that if you need to add a new master into the mix, you have more work to do adjusting every node, especially if you're using the id%n to identify the master for some purpose as well

@keithjohnson6510 Месяц назад

@@workmad3 A variant of @scragar option you just use a compound primary key of RecordID(seq), ClusterID, . ClusterID could even be a single byte giving you 256 nodes, U32 for the RecordID, then every node can have 4294967295 records. Guaranteed no collision with 5 bytes. Synchronising clusters is also very simple. Note also because is a sequence, RecordID, ClusterID, performance is also a added bonus as B-Tree's balance better than UUID.

@Novascrub Месяц назад

@@scragar We used to do something like this on the client side. Each client (web sever) would claim a block of ids by inserting a claim into an ids table (i claim id's 90-99), and then assign them client side. Combed uuids are just better. We knew about the page splits problem and solved it in 2003.

@paulie-g Месяц назад

You need to learn a bit more than you know to get you to the level you think you are at. Lamport timestamps or equivalent are necessary in a distributed system anyway for serializability and you get your unique, monotonically increasing id for free.

@Chamassa1210 Месяц назад

The bad part about the size comparison, is that where it matters you are probably using bigint already, making binary(16) only twice as big as bigint, without collisions

@XCanG Месяц назад

I used snowflakes a lot and like them slightly more, than other type of identifiers, especially considering that each of them have datetime baked in.

@2penry2 Месяц назад

Yeah snowflake IDs are nice, I like being able to sort via time on them

@Viviko Месяц назад

Why do we need to sort by ID? Why not sort through other fields, like a user’s name, creation date, etc…? Point of the ID is… as the name implies, identify some data. We don’t need to sort it using that field.

@7th_CAV_Trooper Месяц назад

Because the clustered index is sorted by the PK.

@itsbazyli Месяц назад

As explained in the video - it helps to improve performance of retrieving a single record by ID. It's not related to your application logic's sorting, just to how databases access records by ID.

@7th_CAV_Trooper Месяц назад

Sorted data makes it possible to perform a binary search. The ID isn't the offset into the data file. So even a point query requires a search.

@everton000rsc 25 дней назад

When the database gets to huge sizes any index created is costly. I have a use case in my company where the ordering by ids is used, if we had to order by createdAt we'd need an index in that column, and guess what would be the size of that index in disk in our DB? Almost 1TB (and it takes weeks to create)... The primary key is almost always in memory (cached), almost always the first and fastest to load, so using it for ordering is much less expensive compared to creating another index just for that. But I understand that use cases vary, depends on what kind of system you're working with and it's business rules and requirements.

@johnredberg Месяц назад

So... "The Problem with InnoDB".

@paulurban2 Месяц назад

Having sequential IDs create contention hotspots in the B-Tree. No good for horizontal scalability or vertical scalability.

@user-qq7yc1qp8z Месяц назад

Why even care about planetscale? For production, you have AWS RDS, for small projects you have Supabase.

@MrDadidou Месяц назад

how does it invalid the whole discussion around uuids ? It's like having a guy screenshare his IDE when talking about a complex programing question and your only input is "dude why even use the gruvbox theme in your IDE? you have Catpuccin and DoomOne available."

@carlosmspk Месяц назад

@@MrDadidou I like that you tried to show how silly his comment was, but ended up providing a valid comment for a video. It's perfectly valid for you to comment on whatever shows up in the video. He could even just go "Damn Theo, your hair is looking mighty fine today" and it would be a valid comment.

@jeffreyjdesir Месяц назад

@@MrDadidou I just wish the discussion wasn't based on faulty premises (like UUID variants serve to succeed their formers) cause it suggests biases I'm sure we're all aware of

@everton000rsc 25 дней назад

RDS gets really expensive when you get to the point you need horizontal scaling. Even planetscale is too expensive, my company choose to deploy our own vitess cluster manually to be able to save almost 2/3 in database costs. I get it that people don't like that planetscale removed the free plans but after I suffered so much in the migration from mysql/mariadb to vitess I understand the real value of what they are delivering. And they're almost the only maintainers of vitess atm so i hope they don't go out of business anytime soon...

@raphaelmorgan2307 22 дня назад

"why even care about planetscale?" because they're paying him money, obviously! (ty for posting other options)

@michaelutech4786 Месяц назад

You can always use UUIDs (or any stable data whatsoever) to compare them orderly. The problem with non-ordered ID's that that when inserting data, they are not growing according to that order. Sequential IDs do that, the next ID will always be greater than the previous one. When inserting random IDs, they tend to fill pages better (it's random on which side of a node the tree grows). Not all indexes are necessarily trees. The problem with such statements is that there is no A is better than B in all generality. You have to actually look at what your systems do with your data. If you need distributed keys, e.g. keys where there are multiple authorities (instances) that generate synthetic keys (keys that are not data driven), then UUIDs and other mechanisms make the handling of these authorities easy. That has a price. If you don't want to pay this price, you need to pay another one. Maybe you need to use data storage that works efficiently with your keys. Maybe you organize your key-namespaces or scopes manually such that you can use the most efficient local storage and key management and save runtime performance but then have to invest in deployment configuration. Any attempt to find a silver bullet can at best work for were wolfs and vampires but will fail for mosquitos or blue whales. These generalizations and over-simplifications are just stupid. UUIDs are perfectly fine and battle proven data types for certain applications. When used in the wrong context or in the wrong way (version, ...), they fail, just like any other mechanism does, 32 bit numbers or 640K were proven not to always be enough, despite timely expectations.

@paulmoore3755 Месяц назад

Some reference to the collisions that included the context around them would be appreciated

@winchesterdev Месяц назад

UUIDv7 can have counters which makes them sequential as well. The UUIDv7 that could be shipped with Postgres 17 has a randomly-seeded counter.

@mmathijs Месяц назад

Who needs standards, its all about the vibes. UUID v8 is by far the best

@Zeragamba Месяц назад

variant 8 still uses the same formatting of the 128bits so it's compatible with any system that can store UUIDs, but leaves the specific way to fill those bits up to the user/system

@yuwei2522 Месяц назад

Sounds more like a problem with MySQL. Some non-relational databases would actually benefit from the randomness of UUIDs.

@jongeduard 28 дней назад

Very true, I don't remember all the details, but for example SQL Server has solved this problem because of the way it sorts and stores the rows in a sorted order, while it generates the ids based on timestamps at the same time. Although it's still important how you configure things exactly, there are certain details which need to be done right.

@everton000rsc 25 дней назад

Unfortunately not all systems can/should/are built with non-relational DBs, it has some really bad drawbacks for some use cases

@amadensor 12 дней назад

Most relational databases also benefit from randomness. It keeps the trees balanced without having to do splits and tree balancing. You don't need to leave a gap for every potential value.

@TerranVisitor Месяц назад

@Theo : Funny. About 4 Decades ago I designed a UID that incorporated brute force and accident aspects - such as mentioned herein - for incorporation into (truck) weigh scale software. The software which utilized this UID was smaller than 1.44MByte, including graphics, credit card processing, remote access and update, and recording of transactions. The software was stored on an EEPROMish type device which emulated, as you MAY have guessed, a floppy disk. For those curious, the OS was (a form of) DOS and it was programmed in BASIC and assembly.

@JacobP81 21 день назад

Sounds like the UUIDv7 would make the most sense if you are using a UUID as a primary key because it combines the date/time with a random number, which to me actually sounds perfect if you need a primary key and just an incrementing value won't do. I also really like that it uses the UNIX Epoch for the time.

@TizzyT455 Месяц назад

Cuid2 sounds unnecessary. Its just a seeded sha3 from my reading. What is wrong with just combining a sequential counter and any secure prng? What I tend to use is an N-bit sequential part and an X-bit random number (usually 64/448). This allows to generate sequential numbers up to N, while having collision resistance of X). If you really wanted to, you could just use SHA3 to generate the X bits yourself if you want the same randomness as cuid2. Of course adjust as needed.

@peppybocan Месяц назад

I haven't heard of CUID, but I have heard of the ULID. I don't need the 128bit of random data. I am fine with the 64-80 bits of random data. It doesn't even need to be cryptographically secure, to be honest.

@foxquirk 22 дня назад

I saw the year 1568 and immediately thought 'Gregorian calendar', no joke, for real

@MikkoRantalainen Месяц назад

15:20 I don't understand this argument. If you're happy to store 32 bit integer in binary form, you should be happy to store the 128 bit UUID in binary form, too. The database can convert from binary value to human readable value on the fly, just the way it happens for binary 32 bit integers, too! With the same logic you could argue that 128 bit binary UUID is actually better format than 32 bit integer stored as base-10 number as a string. That said, if your database doesn't support UUID column type, maybe you should switch database.

@Mike-zr9wq Месяц назад

You'd have to be crazy to pay PlanetScale today

@sd_pjwal Месяц назад

My first startup I worked for, the main DB admin insisted on using primary keys made up of real user data or composite keys made up of real information. It was so insane. Social Security # as primary ID for a user? This was a benefits enrollment company. Shocker, some of our customers had illegals that shared SS #'s.

@User_2 Месяц назад

> Illegal argument exception 🤭

@7th_CAV_Trooper Месяц назад

SSN isn't unique by law and person can get a new one. They are not identifiers. Don't use natural keys.

@Spiker985Studios Месяц назад

@@7th_CAV_Trooper SSNs + Birth Date is unique, and it's why they are associated together. You will never see an SSN field without a DOB field, unless they're looking up your DOB by some other means

@7th_CAV_Trooper Месяц назад

@@Spiker985Studios yes, but OP suggested using SSN alone as PK, unless I misunderstood.

@RickYorgason Месяц назад

In Canada, I believe it's illegal to use social security numbers as database keys.

@mattiasfagerlund 23 дня назад

Random ids also have the disadvantage that for a large table, if you need to access 1000 rows, they'll be stored in 1000 different pages, even if the rows are conceptually consecutive. Non random ids instead create hotspots during inserts. So a semi-random index would tick both boxes, but every use case is different.

@Gersberms 26 дней назад

It turns out I accidentally invented my own UUID for references created on different systems. I wasn't smart enough to realize that this is what I was doing, but I was smart enough to have a sortable timestamp built in. I suppose I can pat myself on the back while kicking myself.

@everton000rsc 25 дней назад

Haha, it happens. Creating a whole solution for a problem and then finding out that there are already libs that solved that same thing but better... (happened to me more times than I would've liked...)

@TheOmfg02 Месяц назад

5:00 complains about 1568 as an implementation detail… proceeds to use 1970 instead.

@linminsu3443 Месяц назад

1970 is the unix epoch, which is already used everywhere

@mzg147 Месяц назад

complains that years 1568-1990 will never be used, therefore it's wasteful

@TheOmfg02 Месяц назад

@@linminsu3443 i think you missed the point of my comment 😅

@KangoV Месяц назад

Anyone who's done anything with dates should know what the Gregorian calendar is and when it started. I knew it straight away :) Try converting dates to other calendars like Chinese and JIS and you'll see how important this is!

@gbjbaanb Месяц назад

UNIX epoch time is as arbitrary a standard as Gregorian or any other time standard. You just have to pick a start point and a storage size. UNIX time isn't necessarily the best, as we will find out in 2038.

@dahahaka Месяц назад

Is indexes nowadays an accepted form of indices? When did that happen 😭 Edit: typo

@ceigey-au Месяц назад

Well, it's been more acceptable than spelling "accepted" as "excepted" for a while, but both are understood colloquially _irregardless_ ;-) (sorry, couldn't let those opportunities pass by...)

@dahahaka Месяц назад

@@ceigey-au omfg how did that happen, well that's embarrassing XD

@dahahaka Месяц назад

@@ceigey-au no worries, I would've done the same 💀

@ForcefighterX2 12 дней назад

If splitting your database is an expected requirement at some point, you can still add a local auto-increment number which is NOT your primary key, but only locally unique. This way you can still scale horizontally but also split by number locally on every DB.

@md5ize Месяц назад

Add to the advice including 'we don't write tests'. If you see this guy on a proposal, run.

@everton000rsc 25 дней назад

I had forgotten about that hahahaha

@robertluong3024 Месяц назад

4am, waiting for my medical procedure to start and I'm spending it learning about UUIDs.

@martinadoesthings Месяц назад

😮

@landsman737 Месяц назад

get well soon!

@klaudyw3 Месяц назад

Is the versioning done by the USB forum? It sure looks like it might be.

@KangoV Месяц назад

I think you've misunderstood what version means (it's actually the type).

@Spiker985Studios Месяц назад

@@KangoV I think that you missed the fact that the USB spec is horrendous, and that if using the version marker on UUID as an actual version, and not a variant - they are comparable to how horrible they are to follow

@speedytruck 23 дня назад

Regardless of the video's contents, what Firefox extension did you use for the tabs?

@ShinigamiZone Месяц назад

UUID v4 do not only have the "4" statically at index 12, but also there is a "8, 9, a or b" at index 16 (starting index by 0) So its /^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

@dfs-comedy Месяц назад

Using MySQL is the first mistake. FWIW, PostgreSQL has a native UUID type that occupies 16 byes of space. But yeah... I don't like UUIDs as primary keys.

@nicolaicornelis2853 Месяц назад

binary(16) is the same as a native UUID type though (storage wise).

@ziv132 Месяц назад

I personally like ULID's, not just because they are incremental, but also the fact that I can just swap to using them in the same BINARY(16) column that was used for UUIDs (probably not the best idea to switch and have mixed content, but eh)

@JimCraddock 27 дней назад

35 years as a data architect. Inevitably, I run into uuid's being used by people that do not understand databases. Use an incrementing integer for your clustered index, preferably make it your primary key. The primary key is included in every index on a table, so if you are using the uuid as the primary key you just increased the size of every single index by a lot compared to an integer.

@everton000rsc 25 дней назад

Yes, I initially believed that UUIDs are excellent and didn't care much about what ppl said about it's dawbacks, but after working on backend doing some query optimizations and understanding better how relational databases work I now understand how naive I was

@DonAlcohol Месяц назад

also for a start date/time to begin counting i have a good one :P , theres this relly weird but consistent thing the eath does , its like a wobble, and the whole thing lasts about 26000 years (25800) but as a consequence of this the sun rises in a different constellation on the spring equinox every 2160 years... so if we pick the beginning of one of those as the starting point to begin counting time wouldnt that be a great idea ? coincedentally the last time we shifted (from Ram to Pisces (its reverse form the horoscope) , its at the same time we shift from BC to AD so 1 jan of the year 1 seems like a nice spot to start counting time not?

@trapfethen Месяц назад

Couple Things. UUIDS can be stored as a regular column with an index for fast lookup while using an integer for the primary key. The UUID is used for externally representing the resource, while the integers are used internally for performance (it is more resource intensive to do a 128-bit compare than a 32 or 64 bit compare on most CPUs and that is if you use the binary format). This approach even allows for distributed systems to coordinate via message passing, only using the UUIDs instead of the internal integers which are treated as an implementation detail. This approach is less attractive than just storing the UUIDs as the primary key which makes coordination and replication trivially simple by comparison. It is an option worth exploring based on your needs. The second thing is that the rebalancing problem introduced by random UUIDs is more severe than communicated here. While it is true that random UUIDs can cause databases to not utilize a larger portion of page sizes, the real performance hit is the stochastic nature of B-tree rebalances. B-trees are a compromise between the theoretically optimal B-tree and the reality of modern computers where data retrieval is severely slower than multiple comparisons. B-trees lean into splitting up the tree more aggressively at each level and requiring more value comparisons over-all in order to reduce the number of costly data-fetches per look-up. This causes each rebalance to be more expensive than in a comparably unbalanced binary search tree and require touching more nodes than in a BST. In particular, unlike in a BST. b-tree balance procedures requires frequent changing of the node's content rather than just changing the links between nodes. Coupled with the ripple effect that can occur where b-trees might have to traverse all the way back to the root node in order to rebalance, the performance hit is worse than it first seems and is slightly random itself. When b-tree insertions are roughly ordered, these copies and extra comparisons are more akin to amortized costs like we see with auto-scaling Vectors. When the insertions are random, the frequency of multi-level rebalancing operations increases. If you support key removals, even more so.

@nouveu1 Месяц назад

But primary key is also an (unique) index and all indexed columns is what causes pressure when writing stuff. More indices means bigger penalty hence having both sequential id and uuid doesn't help and only make things worse. Not to mention that under hood index probably don't store whole record but pair uuid+id hence you have to make more lookups to get whole record.

@ed.puckett Месяц назад

I avoid sequential ids because they make it too easy to accidentally reference an incorrect record. With uuids, there is no chance of this.

@markm1514 Месяц назад

Editor: 'If you use the word "represented" too many times, too closely, it's bad writing, grammar and meaning be damned.'

@Danny1986il Месяц назад

I think in UUID the v stands for variant, not version

@sturdyfool103 Месяц назад

I always love the large discussion around databases, because they’re nearly useless to me, as in, in fifteen years I’ve never once used a database, and only three of the thousands of my applications would’ve been an ok use of a database, with the rest just needing a config file

@fgregerfeaxcwfeffece Месяц назад

The guy who came up with the 1568 wins a paddlin. That's a paddlin well deserved if I ever seen that. Unixtime was already the established standard and this new one does nothing better.

@guguludugulu Месяц назад

First UUIDs were used in 1980 and were counting time since 1980-01-01 and then were standardized around 2005 to use a gregorian calendar. While you could argue that unixtime was an established standard in 2005 i don't think it was "the" established standard back then since the programming landscape was very different and most of the programmers would use windows (well, arguably they still are). Windows times starts at 1601 btw. Either way that's a non-issue. If you have at least enough of a brain to carefully read the RFC and know how to count you won't have any problem implementing UUID.

@fgregerfeaxcwfeffece 22 дня назад

@@guguludugulu Fair point.

@MrL22 27 дней назад

Can someone explain what is wrong with storing number based id's but then using an encrypted version of that id for anywhere visible by the user? I have been using red-explosion/laravel-sqids

@zebraforceone Месяц назад

Of course they hurt performance, they will continue to do so until 128-bit architecture is a thing. They're not supposed to match integers in terms of performance.

@Hypn0s2 Месяц назад

Jebus. I couldn't last the full video. I don't know how you followed this all. I need to wash my brain now with something else and pretend this isn't real.

@diamondkingdiamond6289 Месяц назад

It isn’t those aren’t even the main reasons of why you shouldn’t use uuid in MySQL. Using uuids in MySQL is bad for performance in MySQL but it’s fine in postgresql

@KangoV Месяц назад

Perfectly easy to understand. Versions in this instance are actually Variants. Each variant does not supersede the last.

@denisblack9897 Месяц назад

Like in Loki?😅

@DominikZogg Месяц назад

Can we all agree that mysql is more of a pain in the ass than a decent database? So it's not a relevant metric for any solution.

@aldi_nh Месяц назад

@DominikZogg Месяц назад

@@aldi_nh if you burned your fingers often enough you'll change your mind. Postgres besides its ugly user management is superior. Or switch to non relational database cause most uses cases are better represented by documents anyway.

@aldi_nh Месяц назад

@@DominikZogg oh please, i beg to differ, mongodb are even harder to mantain. i dont know where you're coming from but from my experience, very little use cases are suitable with documents

@DominikZogg Месяц назад

@@aldi_nh web applications, websites

@byuwur Месяц назад

My dude doesn't know how a relationship works.

@gro967 Месяц назад

As long as you are using UUIDs with PostGres in your DB and Nanoid everywhere else, there is no problem.

@thomaseichinger1717 Месяц назад

What do you think about to use UUIDvTimebased for Businessobjects and snowflakes for provessingIdentifyers what are also not lived "long". integer/bigint/.. are amazingly fast because of the processor structure and also because of the storage structure.

@SatoshiCommentorto Месяц назад

click baited into a blog screenreader again

@MichaelKire Месяц назад

Regarding the sortability being a potential security: Security by obscurity is not actual security.

@jwueller Месяц назад

Technically it is if the obscurity is big enough. All of cryptography is technically just very hard to guess, not impossible.

@itsbazyli Месяц назад

There are a number of features where obscurity is the default mode of security. Think about situations like "share via link". You can do that to an unlisted RU-vid video. The only thing you need to know to see the contents is the link itself. If RU-vid's video ID was sequential, or guessable even just in part, you would be able to enumerate videos one-by-one, and discover every unlisted video on the platform. If the video ID is fully random though, your chance of "stumbling upon" an unlisted video is significantly lower. Not zero, but combined with rate limiting and other anti-bot security measures, becomes a pretty decent barrier from a bad actor doing just that.

@ivanfilhoz Месяц назад

This is a good motto for source code or other implementation details, but it shouldn't be taken generally. Exposing less information is part of security. The whole point of secrets, like keys, tokens or passwords, is to keep them safe. But they're not the only data that should be hidden, are they? Your application must only expose information it is supposed to, and if the sorting is guessable by your IDs, you are leaking data. Timestamps are not always sensitive data, but they often are, such as in bank transactions or medical history.

@pauldraper1736 28 дней назад

Bigger data size is extremely important. Sure it's irrelevant for disk storage or row size, but database are often RAM constrained. And perform terribly if they can't fit their indices (including UUID PKs, FKs) in memory.

@meryplays8952 13 дней назад

I really like this guy drawing attention to these subtle issues.

@tinymouse2 Месяц назад

I just use v4 and write my code with the idea of "can this collide? Oh it can? I'll check before generating it" Adding a read before generating and a write is a worthwhile step to ensure I don't run into a collision

@NubeBuster Месяц назад

You're writing a check for UUID4 collision? If you gave every person on the world 500 million uuids, there would be only 50% chance that there is 1 collision.

@tinymouse2 Месяц назад

@@NubeBuster What can I say, I'm generous with my UUID's. It's seriously overkill I know, but in the systems I've helped implement, the result of a collision could mean personal user data gets sent to the wrong user and that's not acceptable to me.

@jameslouizbuilds Месяц назад

Realistically how likely are you going to get a collision? Are you operating at a huge scale?

@tinymouse2 Месяц назад

@@jameslouizbuilds I added some logging for when we have a collision, and so far we've had none. The scale for the biggest thing I worked on with this is still relatively small, but we're generating maybe a few hundred UUID's a day, and doing a read that many times a day is a pretty small overhead for the absolute guarantee that we don't pipe sensitive information to the wrong party. If it was for something where security didn't matter I probably wouldn't have added the collision check honestly.

@NubeBuster Месяц назад

@@tinymouse2 tell the global news outlets when you get a collision. This will be world news.

@DominikZogg Месяц назад

Autoincrement (number) cannot be used by developers who care that an entity should have an id before being persisted, so for me any argument pro them is irrelevant.

@gbjbaanb Месяц назад

You pre-allocate the id in such cases. Most DBs will let you do that. But frankly you're probably doing it wrong if you want an entity to be tracked by the DB before you've even added it to the DB.

@fgregerfeaxcwfeffece Месяц назад

This makes me feel justified in using my own function, that literally just uses the crypto safe random byte generator of the OS. It's slower then the regular one, but slower in this context means: several megabytes per second. So it's just unfit to overwrite entire drives. I just took the shortcut and it turned out to be just way less of a mess, while being better at doing the job it is supposed to do.

@d3fau1thmph Месяц назад

I use internal sequence as PKEY and UUID as "public id". PK is never used in the code or visible in any API, it is only used on the database level for relational stuff. That aside, are UUIDs equally problematic in PostgreSQL?

@hansdietrich1496 Месяц назад

And all that hazzle, just because incrementing an integer is too boring for some people ...

@nnnik3595 Месяц назад

Seems like you do not understand basic programming so let me explain: Certain systems become so large and distributed it is difficult and slow to get definite answers. In those systems it is difficult to just "increment an integer". These systems are also fairly common due to the scale of computing we use in every day lives. As such we just generate an ID we know will be unique for all of our use cases.

@Luxalpa Месяц назад

Incrementing requires a mutation, so it requires one to wait for an authority to give out those IDs. That's a nightmare in distributed systems.

@hansdietrich1496 Месяц назад

@@Luxalpa No. You just need to give the initial value to the one creating its IDs (and only check for collision on insert). And for all creating authorities, you have their own IDs. which become part of the ID, e.g. 1_000_000_000_000 * CreatorID + generatedID. All easier than that UUID stuff.

@KangoV Месяц назад

@@hansdietrich1496 We have between 20-100 nodes that are created/destroyed based upon load. These are split across multiple availability zones and regions and are stateless. Now, how would you handle the creation of IDs across that scenario? We're talking trillions (dns queries).

@hansdietrich1496 Месяц назад

@@KangoV Not that complicated. Each of your upcoming nodes has a unique ID. 20-100, so one byte would be enough for that. Be generous and reserve two bytes. So each of your 2^16 max parallel hosts has its unique ID. Then you have two bytes of host-id and six bytes of whatever-id, and you got your IDs sorted within a single long int. If a single host runs out of IDs, just give him a new one of the probably superflous host IDs. And if in doubt, take some int128 and split it up accordingly.

@NeoChromer Месяц назад

imagine using mysql in 2024 holy..

@rafaelmesaglio1399 Месяц назад

So how do NanoIds (and other alternate ids) solve the sortability/paging issue? Not to mention those likely won't be natively handled by the database and language of choice, i don't think it's a great idea to go dependency heavy either

@danielreed5199 27 дней назад

I don't use them, simply because collisions can occur. The thought of trying to find that non-existent bug upsets me too much.

@coolemur976 Месяц назад

"The calculated probability of at least one collision occurring when generating 1 billion UUIDv4 identifiers is effectively 0. " Are collisions even real ? Ok, so you might get an error when creating few billions of records, so what ? Try, catch, retry. And about performance - if your UUIDv4 is too slow for your billions of records, perhaps it's time to archive the data ? I mean everyone was using UUIDv4 for a long time. And suddently it's a problem ? How so ? Or is it just someone wants to push their own agenda while introducing their own type of IDs?

@IceTank Месяц назад

Yeah, Wikipedia says you need to generate one billion v4 UUIDs per second for 86 years to have a 50% chance of collision.

@notangelmario Месяц назад

classic theo use someone else's work when you don't want to do it. man i love your work but you got to stop being so obvious about it

@philadams9254 Месяц назад

Do you mean because he copied Josh tried coding's video? Or something else?

@notangelmario Месяц назад

@@philadams9254 well i didn't know about that. but yeah sure, that and most of the videos. i am quite sad that these days i just click on the video and look up the article he is reading or the video he is watching and just do that instead of watching him

@rob011 Месяц назад

"barely a standard, more a vibe" is iconic

@TapetBart Месяц назад

I work a mot with time series data from measuring devices, and my biggest problem with UUIDs (each device has an uuid as a PK) is that you can basically not compress them. You can compress a 8 byte int by only storing the difference between each int.

@MonsterWriter Месяц назад

As I understand, the database does not need to rebalance the B-Tree very often if the indexed value is sequential (the id is sortable). If the ID is random, the tree deforms much faster.

@BruceWilsonTMB 22 дня назад

That's backwards. If the indexed value is sequential, then all the new rows will be added at the end, and the tail of the index will get very heavy. The existing part of the tree will be fast, the newly added records will be slower to access until it rebalances. With a random ID, the values are distributed equally across the index, and the index stays mostly in balance for a long time. He mentioned page splitting. Page splitting happens with both sequential and random IDs, it's just a question of where (a lot in one place, or a little bit everywhere). He mentioned the worst case of 50% full pages, but failed to mention that most databases tend toward 75% full over time.