Something to note is that the size of your data can affect performance as well as memory usage. CPUs are specifically designed to handle 32-bit and 64-bit values very fast, and sometimes, counterintuitively, an 8-bit value may take longer to process. So, as with everything, premature optimisation is the root of all evil. Keep the age as a 32-bit integer for now; if you have 10 million of them and have identified the memory use as a problem, _then_ go down to a u8 or use bit-packing methods. It's actually even more nuanced than that, because of cache locality: smaller data can be faster or slower depending on the circumstances. But that's very complex and should be left to experimentation if the need arises.
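To put rough numbers on the memory side of that trade-off, here's a minimal Rust sketch (the 10 million figure is just the hypothetical from the comment):

```rust
use std::mem::size_of;

fn main() {
    // 10 million ages stored as u32 vs u8.
    let n = 10_000_000usize;
    let as_u32 = n * size_of::<u32>(); // 40 MB of payload
    let as_u8 = n * size_of::<u8>();   // 10 MB of payload
    assert_eq!(as_u32, 40_000_000);
    assert_eq!(as_u8, 10_000_000);
    println!("u32: {as_u32} bytes, u8: {as_u8} bytes");
}
```

A 4x difference in footprint, which is exactly why it only matters once the array is big enough to show up in a profiler or in cache-miss counters.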
This is why I think using fixed-size integers is a mistake in almost any context that isn't data serialization and/or protocols. For performance-conscious parts of my recent projects' codebases, I'm considering a type selection system that defines word types of the optimal size for the current platform's CPU, like word, dword, qword, etc. Choosing types for your variables is a whole can of worms.
@@Acceleration3 but you have to consider that register access time (what you describe) isn't everything. You also can't treat an object like a loose collection of primitives. Let's stick to the students example: if you make age, id, birthday, etc. all 64-bit, one student becomes a huge object with tons of wasted space. An array of such students then wastes enormous amounts of space, at some point leading to cache misses, and the CPU has to wait for RAM to load the rest of your mostly empty data. At that point your register access times are meaningless by multiple orders of magnitude. Side note: the optimal spacings are often chosen by the compiler anyway, meaning that a student with one 32-bit value and two 8-bit values will in the end be 64 bits long. The compiler knows you need at least 8 bits for a particular value, and adding empty space beside it doesn't change the code. Tldr: by trying to be smart you waste space and keep information from the compiler, making the result worse. A datatype size is more or less a suggestion for the compiler, and it will make smarter decisions than you will. By giving it false information you will not achieve improvements.
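The padding behaviour described here is easy to verify with a minimal Rust sketch (the field names are made up for illustration):

```rust
use std::mem::{align_of, size_of};

// A "student" with one 32-bit field and two 8-bit fields,
// as in the comment above.
#[allow(dead_code)]
struct Student {
    id: u32,
    age: u8,
    grade: u8,
}

fn main() {
    // The struct's alignment is that of its widest field (4 bytes),
    // and its size is rounded up to a multiple of that alignment,
    // so 4 + 1 + 1 = 6 bytes of data still occupy 8 bytes (64 bits).
    assert_eq!(align_of::<Student>(), 4);
    assert_eq!(size_of::<Student>(), 8);
    println!("Student is {} bytes", size_of::<Student>());
}
```

So the two u8 fields cost nothing here: the compiler would have inserted that padding anyway.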
@@redcrafterlppa303 I know I'm making matters more complex, but if you're going to have lots of students, a better approach to improve cache locality and reduce memory usage without taking a performance hit COULD BE* using a Struct of Arrays. Have one array for ages, another for IDs, another for phone numbers... The first student created gets ages[0], ids[0], phoneNumbers[0]... COULD BE: IF your code is going to operate on only a few of the fields at a time - on all ages of everyone, then later on all phone numbers... This is Data-Oriented Design: structuring your data so you don't keep jumping around in both data memory and code memory.
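A minimal Rust sketch of the Struct-of-Arrays idea (all the names here are hypothetical, not from the video):

```rust
// Array-of-Structs: each student's fields sit together in memory.
#[allow(dead_code)]
struct StudentAoS {
    age: u8,
    id: u32,
    phone: u64,
}

// Struct-of-Arrays: each field gets its own dense, contiguous array.
struct StudentsSoA {
    ages: Vec<u8>,
    ids: Vec<u32>,
    phones: Vec<u64>,
}

impl StudentsSoA {
    fn push(&mut self, age: u8, id: u32, phone: u64) {
        self.ages.push(age);
        self.ids.push(id);
        self.phones.push(phone);
    }

    // A pass over only the ages touches one packed array of bytes,
    // instead of striding through padded AoS records.
    fn average_age(&self) -> f64 {
        let total: u64 = self.ages.iter().map(|&a| a as u64).sum();
        total as f64 / self.ages.len() as f64
    }
}

fn main() {
    let mut s = StudentsSoA { ages: vec![], ids: vec![], phones: vec![] };
    s.push(20, 1, 5_550_001);
    s.push(22, 2, 5_550_002);
    assert_eq!(s.average_age(), 21.0);
}
```

As the comment says, this only pays off if your hot loops really do touch one field at a time; otherwise the plain struct is simpler and just as fast.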
@@besknighter that's certainly possible, and it's what databases are for. Operating on larger amounts of data is something databases should be used for. Structuring in-memory data like this isn't all that helpful and leads to messy, hard-to-read code.
The best bit for me was his remark about automatic type coercion when doing things like "adding" a string and a number: "and this kind of bullsh*t is marketed as a feature". It reminds me of why (among so many other reasons) I hate php so very very much.
@@danielscott4514 honest question, is automatic number to string conversion really that bad? I see people shitting on it all the time but IMHO it's mostly harmless (at least in a sane language). Many languages have misfeatures which are at least 1000x worse (dynamic typing, everything being a reference type, nullability by default, etc.). Of all these things why do people fixate on int + string so much?
@@k2aj710 Certainly being able to do something like "number of results: " + resultCount (where resultCount is a numeric variable) is pretty harmless most of the time. However, if your language lets you concatenate a number onto a string - and if it uses the + operator for string concatenation as well as numeric addition - then what happens when you try "3" + 2? Do you get "32", or 5?

For what it's worth, I took the "and this is marketed as a feature" comment in the video's narration to be aimed at dynamic typing generally rather than at the specific example given. Dynamic typing is what causes the above kind of conundrum to be a thing. No strictly typed language will allow something as vague as "3" + 2. In C# I would have to write "3" + 2.ToString() if I wanted "32", and Int32.Parse("3") + 2 if I wanted 5 as the result. The very nature of the language eliminates that whole class of bug, which easily comes about when (normal, non-superhuman) programmers are not intimately familiar with every last detail of their dynamic language's type coercion behaviour.

On the subject of my first example: as a regular user of C#, I'm very conditioned to combining strings and numbers using methods that accept a format specifier, which outputs the number with things like currency symbols, commas to separate thousands, various numbers of decimal places, etc. In many cases you want more control over how your number "looks" as part of a string than simply concatenating it in whatever default representation the language uses. So, in my view, the value of being able to write code like "number of results: " + resultCount is questionable anyway.

Although I spend quite a bit of time in C# currently, I've coded plenty of Javascript, and suffered far too much PHP (which can truly make Javascript seem sane).
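For contrast, in a language with no implicit coercion you spell out which of the two results you want. A minimal Rust sketch of both explicit choices:

```rust
fn main() {
    // Rust won't coerce between &str and integers, so each intent
    // is written out explicitly and there is no "3" + 2 ambiguity.
    let concatenated = format!("{}{}", "3", 2);      // string building -> "32"
    let added = "3".parse::<i32>().unwrap() + 2;     // numeric parse  -> 5
    assert_eq!(concatenated, "32");
    assert_eq!(added, 5);
}
```

The cost is a little ceremony; the benefit is that the reader never has to guess which operation the + performs.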
Dynamic typing combined with some bad language design can really ruin your day (especially since the bugs appear only at runtime). I'm far happier and more productive in C#, where a huge number of errors are definitely caught at compile time, and the inability to write "string" + 3 avoids various footguns that aren't worth risking for the sake of saving the keystrokes of "string" + 3.ToString() (or various more modern C# alternatives, like string interpolation, but you get my point).

Given this is a comment on a video about low-level performance, as a side-side note: I rarely ever actually use the + operator to concatenate strings in C#. The reasons why are generally well known (if you're not sure why + is bad for concatenating strings, google "c# string concatenation best practice" and you're bound to get a pretty good rundown of how concatenating strings works and which approaches are best in which cases; they're considerations in any language that gets into the weeds deep enough to give you at least some choice over how much memory gets allocated, used, and eventually destroyed in the process of combining multiple strings).
@@k2aj710 Gawsh, my earlier reply ended up being a tome (ignore things like the side-side note on string concatenation - I put that there for the benefit of someone else who might read it later... kind of a StackOverflow learned behaviour, I think). Anyway, I just realised I've got so many horrible memories of things that various dynamically typed languages have done to me over the years that I probably didn't really answer your question. I've just bothered to check, and Javascript seems to give the string in a string + int operation some kind of precedence: it actually does what I would consider the preferable thing with "3" + 2 and gives "32". That said, at 7:26 in the video there are variations on that "3" + 2 theme which do end up doing arithmetic addition instead. I think the string + int thing - if it gets mentioned a lot - is more of an easy-to-explain-and-demonstrate example of the greater problem of dynamic typing in languages. It builds from there into the kinds of problems you can have with code like if (myVar) { ... }, where there are all kinds of rules for what string, numeric, and every other kind of value might count as "true".
@@keppycs the channel owner said they were not a native speaker in some other comment replies which is why they use the AI voice. It's definitely the script.
@@Tech.Library it's a speech synthesizer, probably something like UTAU (used for music production) or eSpeakNG (a utility more than anything) should give you similar results
I write Java code for my school's engineering team, and holy hell I hate seeing every single value be a double for NO reason. An 8-bit signed integer? Sure. But there is no need for the precision of a *double* of all things. At the very least, if you need precision past the decimal point, use a float. What's worse is that the people who develop the libraries should be a bit more considerate of their resources than that, given that they are (hopefully) a lot more mature than me.
Yeah, I know. I was supposed to include a little animation of the Zig logo saying "are you challenging me?" I just forgot. However, as someone else said, they don't look like that in memory, although with bitwise operations you can do anything with individual bits.
There is the same thing for Rust, just as a module. These u31 types seem iffy, though. It would be nice if we could pack them together with NULLs etc., so that, say, sizeof(Option<u31>) == sizeof(u32).
@@Lord2225 I would think they are amazing for things like Rust enums. If you have types in a struct that are oddly sized and guarantee padding, you can fit the variant bits in there and create zero-cost enums, just because a type sacrificed some bits it didn't need anyway. This is exactly what I do in my language, which is heavily inspired by Rust but tries to fix the sharp corners of the language, like lifetimes and dyn dispatch.
I appreciate how almost everything that’s spoken is demonstrated on screen, even going as far to show real error logs from the different programming languages. Thanks for making these videos, great refresher and learning material.
"The reasons behind this limitation are beyond the scope of this video" Noooo...! That's exactly what I was hoping to learn, haha. I'll subscribe if it means you'll cover that in the future!
I'm at 4:30, and while this is generally true, for most uses an int32 is what you need, even if you're wasting some space. Due to the architecture of modern CPUs, 32-bit int operations will often be faster than, say, byte operations. Conversely, the best type for graphics calculations is float. Of course, sometimes it's beneficial to have more choice. But in many programming languages, the default is the default for a reason, and you still have that choice.
It amazes me how stupid these comments are. 8 bit ops are equally as fast as 32 or 64 ones. It's honestly mindless people who repeat nonsense without understanding. Sure the throughput is lower. Clueless people shouldn't make comments.
@@gregorymorse8423 It amazes me how stupid you are. 8-bit operations will definitely not always be as fast as 32- or 64-bit ones. Many instruction sets have only limited support for 8-bit operations, so the 8-bit value has to be extended to 32 bits, the math done on that, and the result converted back to 8 bits. Those conversions take up precious CPU cycles. On the other hand, smaller variables (like 16 or 32 bit) vs larger ones (like 64 bit) can win performance-wise if the constraint is memory, i.e. more tightly packed data won't have as many cache misses. It's honestly mindless people who repeat nonsense without understanding. Clueless people shouldn't make comments.
@@cheesepie4ever because many modern architectures have only limited support for operations on 8-bit data. The 8 bits often have to be extended to 32, arithmetic done on that, and the result converted back to 8 bits, which means extra CPU cycles per operation. On the other hand, 8-bit operations can potentially be faster if you're operating on huge sets of data at a time - in which case the extra operations wouldn't hurt as much as cache misses would, since 8-bit values are obviously more tightly packed and you can fit more of them in cache. As with all things performance-related: don't theorize; benchmark. See for yourself.
7:33 Actually, in some Javascript interpreters, the "default" type is a 64-bit double, but other types can be expressed by setting the exponent to the specific value used for NaN and Infinity. As long as those values are reserved, the remaining 52 bits can be used to represent other types of values, including reference types, etc.
As far as I'm aware it's likely to be more than just "some interpreters". NaN boxing is an optimization used very widely in interpreters where the only number type is floating point. It provides a fast way to encode proper integers, allowing the use of the faster integer operations.
@@gregorymorse8423 NaN boxing uses doubles as tagged unions. All 11 exponent bits are 1s, and the most significant bit of the mantissa is a 1 (the NaN is marked "quiet"). The exact contents of the remaining 51 bits are effectively meaningless in floating-point math; the number is still a NaN regardless of the data stored in those bits. That means we can use those lower bits of the mantissa for anything we want. Current x86_64 pointers are only 48 bits wide, so we can store pointers in these bits just fine. We can also store integers or anything else we want in them. We can even operate on the values using integer math (except that if the value would overflow the payload bits we use, that's a problem we need to deal with separately). So you can store a 51-bit (or smaller) integer in a NaN-boxed double and treat it as though it's a regular integer for more efficient math. If you are only storing integers, you can use all the payload bits; if you are NaN boxing multiple types, you use several upper bits as a tag and still have most of the bits for the data you want.
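A minimal Rust sketch of the idea (the constants and the 48-bit payload width are illustrative choices, not any particular engine's layout):

```rust
// Quiet NaN template: sign 0, exponent all 1s, mantissa MSB set.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
// Low 48 bits carry the boxed payload (enough for current pointers).
const PAYLOAD_MASK: u64 = 0x0000_ffff_ffff_ffff;

// Hypothetical helpers: stash a payload inside a quiet NaN and pull it back out.
fn box_value(payload: u64) -> f64 {
    debug_assert!(payload <= PAYLOAD_MASK);
    f64::from_bits(QNAN | payload)
}

fn unbox_value(v: f64) -> u64 {
    v.to_bits() & PAYLOAD_MASK
}

fn main() {
    let boxed = box_value(42);
    assert!(boxed.is_nan());            // the FPU still sees a NaN
    assert_eq!(unbox_value(boxed), 42); // the payload round-trips exactly
}
```

A real interpreter would also reserve a few of the high payload bits as a type tag, as the comment describes.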
Regarding the alternative at 8:00: some architectures prefer aligned addresses, which leaves the least significant bits of a pointer unused - usually 3 bits on x86-64. If you're willing to sacrifice them, you can encode 8 types in a value that still fits a register. This is the tagging scheme used in dynamic programming language implementations. For example, a fixnum is a value whose lowest 3 bits are zero, encoding integers between -2^60 and 2^60 - 1. This fixnum representation fits perfectly in one register. Addition and subtraction of fixnums can use the same machine instructions as plain integers; a right arithmetic shift needs to be applied after multiplication. This example shows that even with clever optimization and representation, there will be some overhead in a dynamic programming language. Other compound data like records (or structs) and arrays are pointers, which are aligned, so their least significant bits can store the tag.
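A minimal Rust sketch of such a fixnum encoding (the 3-bit tag and the function names are illustrative; tag 000 means "integer"):

```rust
// Low 3 bits are the tag; shifting left by 3 makes them all zero,
// which is the hypothetical "fixnum" tag.
const TAG_BITS: u32 = 3;

fn to_fixnum(n: i64) -> i64 {
    n << TAG_BITS
}

fn from_fixnum(f: i64) -> i64 {
    f >> TAG_BITS // arithmetic shift, so negative values decode correctly
}

fn main() {
    let a = to_fixnum(20);
    let b = to_fixnum(22);

    // Addition works directly on the encoded form: (20<<3) + (22<<3) == (42<<3).
    assert_eq!(from_fixnum(a + b), 42);

    // Multiplication picks up an extra factor of 2^3, so one operand
    // is decoded first (equivalently, shift right once afterwards).
    assert_eq!(from_fixnum(a * from_fixnum(b)), 440); // 20 * 22
}
```

This is why fixnum addition is nearly free while multiplication carries a small fixed overhead, exactly as the comment notes.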
I like the style / cadence / pacing that you explain things. Just the perfect amount of detail + speed, clean speaking. You have earned yourself a sub and I look forward to seeing what you come up with!
Being a rookie to programing and languages as such, absolutely love how you touched upon stuff that I wouldnt have bothered learning about. What an absolutely great way of explaining usually boring stuff in an easier to understand and fun explanation. Way to go bro!
The speed of using 8-bit variables versus 32-bit variables can depend on several factors, including the specific CPU architecture and the access pattern of your program.

On modern 32-bit and 64-bit CPUs, operations on 32-bit and 64-bit integers are usually the fastest, because these CPUs are optimized for these sizes. Operations on 8-bit integers can be slower because the CPU may need to perform additional operations to handle the smaller size. For example, the CPU might need to zero out the upper 24 bits of a 32-bit register to perform an operation on an 8-bit integer.

However, using 8-bit integers can save memory, which can potentially improve cache efficiency and overall performance if your program is memory-bound. If your program accesses a large array of 8-bit integers, it can fit four times as many integers into the same amount of cache compared to an array of 32-bit integers. This can reduce cache misses and improve performance.

So, whether it's faster to use 8-bit variables or 32-bit variables can depend on the specific circumstances. It's not a myth that operations on 8-bit integers can be slower on modern CPUs, but the impact on overall performance can vary. As always, if performance is a concern, it's best to measure and optimize based on the specific requirements and behavior of your program.
Technically, Javascript's implicit conversions aren't undefined behavior. They are defined in the ECMAScript spec, unlike undefined behavior in C or unsafe Rust which is literally whatever the compiler implementation decides to do. But implicit casts are not intuitive, so code can behave in ways that are unpredictable.
One of the ways to define a programming language is by its type system. I remember when I started with C# and the .NET Framework, I struggled to read and implement these verbose types without getting errors. But they were worth it, because they helped me learn how to write Python code professionally. Although I hated the semicolons.
Rust's dependence on fixed-size types opens up foot-guns that it could have avoided if its default had been arbitrary-size types: overflow and loss of precision.
I started learning Rust a few days ago after having 20+ years of Java and higher level language experience. It feels great to get closer to the metal and I can already see this series of videos will be invaluable for filling in the blank spots in my knowledge. Thanks.
I like Rust's explicit static types. Some programming languages are counterintuitively implicitly statically typed. Even C falls into this category: despite being statically typed, the data types are implicit in the sense that they can differ between hardware. There is an stdint header file to help with this, but it doesn't eliminate all problems, because not all libraries use it. Even if you use it yourself all the time, there is no way to guarantee everyone will, so occasionally you will import someone else's code where not only may the behavior be unclear, it may even be filled with bugs caused solely by you running it on a different piece of hardware.

I've run into this once, spending ages trying to debug someone else's code only to figure out that the microcontroller I ported it from defines the implicit signedness of a char differently from my own. I honestly can't decide whether to call that the programmer's mistake or just bad code. Personally, I think it is a flaw in the language. I cannot see any justification for making the signedness or even the width of primitive data types something that isn't defined platform-independently. Hot take, but code that is written the same should generally run the same on every platform. The only exception should be when the programmer _explicitly_ puts in an exception. Anything implicit and platform-dependent is bad language design imo.
"a very bad language won't tell you anything but rather implicitly convert one of the values and then perform any possible operation." *shows JS code* My god that's based, I love this channel already.
Something fun about sizes of things in Rust: for Option<T> where T has a possible invalid state, Option<T> is represented as just T, with None being the invalid state. For example, an optional pointer will just be a null pointer in memory if it is None, rather than actually using an extra byte for the discriminant. This concept applies to other enum types as well.
This will only work for Option<T> where T actually has an invalid state. For, say, Option<u32>, all zeros would be an ambiguous state, since it would represent Some(0) and None at the same time. That optimization for pointers is only possible because each type in combination with Option is treated individually. An Option<u8> would likely be 16 bits, and an Option<u16> would maybe be 32 bits wide. On why it's not 24 bits, look into "struct padding".
@@redcrafterlppa303 I said if there is an invalid state. Zero is a number. I thought I remembered testing this with Option<bool> and it working. Update: just tried it, and Option<bool> is in fact one byte, or at least my editor says so.
Also important to add: according to the Rustonomicon, such an Option<&T> can be represented as &T, but doesn't need to be, which can have consequences when using something like transmute.
@@redcrafterlppa303 I still think my "invalid state" idea might be correct. I just tested it. Specifically, I made an enum called Test with 255 variants and got the size of Option<Test>, which was in fact 1 byte. With 255 variants, every _bit_ is used, but the combination 11111111 remains unused, which, if I am correct, is what None is represented as in this case.
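The niche optimization this subthread is describing can be checked directly with std::mem::size_of; a minimal Rust sketch:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // A reference can never be null, so None reuses the null bit pattern:
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // NonZeroU32 promises it is never 0, so None can be represented as 0:
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4);

    // Plain u32 uses every bit pattern, so Option<u32> needs extra space:
    assert_eq!(size_of::<Option<u32>>(), 8);

    // bool only uses 2 of its 256 bit patterns, so Option<bool> stays 1 byte:
    assert_eq!(size_of::<Option<bool>>(), 1);
}
```

Note that, as the Rustonomicon point above says, the exact layout (beyond a few guaranteed cases like Option<&T>) is an implementation detail, so these sizes are observations, not promises.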
Size matters, but what you put in the variables is more important: meaningful, helpful information matters more than the technical aspects of the variables.
God I'm really glad you started spell checking and grammar checking your scripts after this video. The spelling/grammar mistakes in this one are killing me.
See, right at the start: memory usage is important, but not when you're prototyping functionality. I've worked in circumstances where we didn't know what the final data sizes would be until later in the process, just that a value would be an integer or a decimal, so we used larger containers; later I refactored the code once we knew what the final implementation limits should be. You should always optimize when you see a place to do so, but making it behave the way you intend comes first. You can always look at the memory footprint at multiple stages in development; I like a workflow of "get it to work, get it committed, benchmark, look for concerns, pull request".

Also, I'm glad to see I'm not the only one who doesn't appreciate it when a language does an implicit cast and operates on it. That has absolutely wrecked me before, where I had to read an entire class to find the error, whereas a strongly typed language with explicit casts would have said "this isn't something we can do implicitly; if you really want that behavior, go explicitly cast it". I'd rather have an error tell me "hey, can't do that implicitly" and let me go review it, because chances are, if I didn't cast it myself at the time of writing the code, I made a mistake and fed the function something I didn't mean to.

This is also why, by default, my IDE is set to treat any warnings as errors, so it won't compile if there are warnings, and I can go review those warnings and determine whether I just missed a nullable declaration or made a more serious error. (During rapid prototyping I'll toggle that off, and once it works I'll go through the warnings then, but the default workflow is to handle any warnings before building.)
For example, to estimate the location of something in a data structure like a linked list and jump straight to that memory address, bypassing the linked-list traversal at times for performance.
Hey, just stumbled on this channel, I really like channels like these. Seeing as you're a relatively new channel, and aren't a native English speaker, I'd be glad to proofread your scripts for you to ensure the grammar is correct and natural. By the way, I'm a senior video game engineer, so I've already got a firm grasp on these topics, so you wouldn't need to worry about me improperly altering the meanings either. Looking forward to more videos!
To summarise this for my reference later, knowing variable types at compile time: 1) saves space, as you know the exact amount needed; 2) makes code more readable, with no hidden logic behind the scenes; 3) saves time and more space, since you don't need to store the data type and read, write, and compare it later.
My first computer was a Nascom 1 with 1K of video RAM and 1K of user RAM (960 bytes available to me). I did a lot of Z80 coding, so things like the stack, pointers and different length integers are second nature for me. Hearing "Memory is limited" in 2024, when I have 64 GB RAM, is quite amusing!
7:05 The problem isn't that JS or C# converts int to string automatically (C# has strong typing but also calls ToString() automatically), but rather that they use the same operator + for both concatenation and addition. For example, in PHP, which also uses automatic type conversion, + is addition and . is concatenation, so the problem of "2"+"2" equaling "22" doesn't occur. And it isn't undefined behavior when it works according to a specification; it's only a skill issue.
Your videos are teaching me about considerations that I was previously unaware of. I particularly like making use of arrays when programming. I am realising that it is very important to specify the necessary number of cells per array, and also the specific type of information I need.
Thank you for this type of programming content. Programming channels usually never touch on these types of concepts because they assume you already know them.
I would add an asterisk to your description of interpreted languages. A lot of them are going the JIT-compiler route, so you do get some benefits of traditionally compiled languages, like real primitives and no interpreter step running for every line read. Of course it's not perfect, but it's not as bad as you'd think.
Another example of why I wish programming courses would start students with microcontrollers, like arduino. So many of these concepts are super abstract that can kind of be ignored, without notice, even when writing low level on a full PC with an OS, but with embedded systems, these concepts come home to roost, forcing you to understand them, or fail.
I'm not planning on learning Rust, rather, I'm learning these concepts for Zig. Your video was still super helpful and well done. Looking forward to your next videos in the series! Subscribed.
Yeah, I know that. What I was trying to portray is that low-level people really don't like that kind of thing at all. Like, in what world does it make sense that 3 * "3" is 9 but 3 + "3" is "33"?
If you're referring specifically to the Javascript (ECMAScript) spec, then its "defined" behaviour across so very many edge cases is the subject of a lot of "oh my God, what the actual f*ck!" humour among programmers who don't have to deal with that kind of nonsense in their daily lives. I suspect Javascript developers just cry themselves to sleep instead. PHP developers have their own catalog of "this value + that value = nonsense", which they share among themselves when they want to speculate on what the designers of that language might have been smoking.
In Javascript all numbers are 64-bit floats, until you explicitly use BigInt or typed arrays. For me it is the same kind of knowledge as knowing that the u in u8 means unsigned integer. A better example would be PHP, because it has two numeric types that can convert automatically: int and float.
I think that's the reason why everyone should start learning programming with C. That was my first language, so it was a surprise for me that someone could have no idea what the size of a variable is!
Super cool video, fantastic channel! A question: in C/C++/Rust, how does the program know that, if I define a variable "float a = 3.0;", the memory location storing "a" holds a float32? In Python, you said there are additional bytes used to store this information, but in C the type is defined at compile time. So how does the program remember it?
It doesn't. The compiler makes sure that whatever instruction flow accesses that memory location only uses float32 instructions to process it UNLESS you specify otherwise. There is nothing to stop you from pointing at your float's location with an int pointer and then the real fun begins. You can even give the same memory location multiple names and types with a union.
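A safe Rust sketch of the same reinterpretation idea (using f32::to_bits rather than the raw pointer or union tricks described above, but the principle is identical: the bytes themselves carry no type):

```rust
fn main() {
    let a: f32 = 3.0;

    // Reinterpret the same 4 bytes as a u32. 3.0 is 1.5 * 2^1:
    // sign 0, biased exponent 128, mantissa 0.5 -> 0x40400000.
    let bits: u32 = a.to_bits();
    assert_eq!(bits, 0x4040_0000);

    // And back again: the bytes never "knew" they were a float.
    assert_eq!(f32::from_bits(bits), 3.0);
}
```

In C you'd reach the same place with a union or memcpy; either way, it's the instructions the compiler emits, not the memory, that give the bytes their meaning.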
5:40 javascript doesn't have integers, and I'd argue that's intuitive to every beginner programmer. Also what language even allows you to implicitly truncate floats? Pretty sure even C wants an explicit cast for that
If it did not have integers under the hood it would be a nightmare dealing with floating-point errors in code logic that assumes integers. Imagine a for loop that increments a variable by 1 every pass. If it was a float you would get values like 5.99999998, and when you try to use that as an array index you would get an error.
@@ParkourGrip no you wouldn't, that's not how IEEE754 works. Precision errors pop up when you try to represent fractions like 1/3, but you only start losing precision on integers once you get to ludicrously large numbers that a u64 wouldn't even be able to represent.
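A minimal Rust sketch of where f64 integer precision actually runs out, supporting the point above:

```rust
fn main() {
    // An f64 has a 52-bit mantissa, so every integer up to 2^53
    // is represented exactly...
    let max_exact = 2f64.powi(53); // 9_007_199_254_740_992

    // ...and beyond that, adjacent integers stop being distinguishable:
    assert_eq!(max_exact + 1.0, max_exact);

    // Small integer counters are perfectly safe; no drift ever appears:
    let mut i = 0.0f64;
    for _ in 0..1000 {
        i += 1.0;
    }
    assert_eq!(i, 1000.0);
}
```

So a float-backed loop counter never produces 5.99999998; precision loss only begins past 2^53, which is already beyond what many programs ever count to.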
Usually, but that's not where you are losing the big bucks on modern CPUs. Your real problem are cache misses. Where your variable is in memory is far more important than what it is.
The default is still often a signed int, whatever an int may be on your system. Using it for age could help describe a person who isn't to be born for decades, depending on your unit.
I wouldn't use an unsigned age. It's confusing. You could use a unix timestamp as date of birth and have a function that calculates the age (0 for born in the future) .
@@jwrm22 there is a 64bit version that gains adoption slowly but surely. It allows for prehistoric and astronomical dates solving 2 shortcomings at the same time.
I enjoyed the video and the shot at JavaScript for being able to add strings and integers. It would be awesome if you could make a video about the stack and heap. I get confused in c++ trying to figure out what gets put on the stack and heap all the time, so a video would be great!
Jorge** has added two excellent videos that cover each of those topics. His channel (this channel) is some of the very best content I've ever seen - anywhere - on this stuff. Definitely give the man a sub! (and go watch those two vids). In general anything that doesn't have a fixed size goes on the heap, whereas your various "primitive" types (integers, floating point numbers, single characters) and fixed-sized arrays of those primitive types go on the stack. In many languages you can also combine fixed combinations of those primitive types into "structs" which, because they are also a fixed size, can go on the stack. Things that don't have a fixed size (and must be stored on the heap) include; any sort of variable-length collection, strings (as opposed to a fixed-length array of characters), and objects. ** (it may be pronounced "hor-hey" unlike how the AI voice read it out, or maybe he's just had too many co-workers who can't pronounce his name and prefers "george" anyway?)
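A minimal Rust sketch of the stack/heap split described above (the variable names are made up for illustration):

```rust
fn main() {
    // Fixed-size values live on the stack:
    let age: u32 = 25;                 // 4 bytes, size known at compile time
    let point: [f64; 2] = [1.0, 2.0];  // fixed-size array, also on the stack

    // Growable values keep their contents on the heap; the stack holds
    // only a small fixed-size handle (pointer, length, capacity):
    let mut name = String::from("Jorge");
    name.push_str(" rocks");           // may reallocate its heap buffer

    let mut scores: Vec<u32> = Vec::new();
    scores.push(age);                  // heap-backed, grows as needed

    assert_eq!(name, "Jorge rocks");
    assert_eq!(scores.len(), 1);
    assert_eq!(point[0] + point[1], 3.0);
}
```

Rust makes the boundary explicit (String and Vec are the heap-owning types), but the same split applies to the C++ and C# cases discussed in this thread.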
One thing that I must nitpick is the list of things python can’t do. It can do far more than js at a lower level, and you have to worry about these things if you use the applicable packages. For example, sized data with ctypes, and parallelism with multiprocessing. I agree that most people won’t need it, however it is built into the language when you do.
So in the case of low-level languages like Rust or C, how does the compiler know which byte is a string and which one represents a number? I mean, where is that information stored? That information must also be read in order for the compiler to determine the operation type.
It's only being stored in your source code and then implicitly in the instruction flow of the compiled program. There is nothing in a compiled C program that will tell you explicitly what type of variable resides where and what its original name was. You can, if you want, include debug information in your program, then those tables will be included, together with the real names of your functions. But why make it easy for hackers to decompile your code, right?