ERRATA: 1. I mention that stack memory has faster access time than heap memory. While *allocating* and *deallocating* stack memory is much faster than doing so on the heap, it seems like access time for both types of memory is usually roughly the same.
I was just thinking about this at the beginning of the video. Heap and stack are just different areas of the same system memory. What matters here is that the stack is used to keep the "frame", i.e. all the values that are local to the current function. This is how, after a function call returns, the caller's local variables retain their values, and this is what makes recursion possible. This stack behavior is implemented by keeping a pointer to the "top" of the stack and, on each function call, moving that pointer by an amount equal to the size of the new function's stack frame. That's why the compiler needs to know the size of the stack frame, and consequently the size of every local variable in a function. Every object that's dynamic in nature, or recursive, has to live outside the stack, e.g. using Box. And like you just explained, deallocating on the stack is quite fast, since things aren't really "deallocated": the stack pointer is just moved back to where it was before the function call, while allocating and deallocating on the heap usually involves interacting with the operating system to ask for available memory. Great video! Keep it up!
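To illustrate the "recursive things must live outside the stack" point: a type that refers to itself has no compile-time size unless the recursive part goes behind a pointer. A minimal sketch (the cons-list and `sum` are made-up examples, not from the video):

```rust
// Without the Box, this enum would contain itself directly and the
// compiler couldn't compute its size. Boxing the tail gives the
// recursive field a fixed, pointer-sized footprint on the stack.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Walks the list and adds up the elements.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    assert_eq!(sum(&list), 3);
}
```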
I think "stack is faster than heap" is a pretty reasonable starting point, especially for a talk that isn't going into nitty-gritty details about allocators and caching. Stack memory is pretty much guaranteed to be in your fastest cache, but with heap memory a lot depends on access patterns. If you have a really hot Vec then sure, there's probably no performance difference compared to an array on the stack. But a Vec where each String has its own heap pointer into some random page, for example, isn't going to perform as well.
@@oconnor663 For most programmers who aren't going down the nitty-gritty sysprog hole, the assumption that "stack is faster than heap" covers 95% of all use cases. Most of the time spent dealing with memory is in allocating and deallocating, after all.
You'd need to set another register than EBP, but the type of memory is indeed exactly the same, and the cache covers both. There may, however, be system calls involved when using the heap. "In an ideal world you'd have everything on the stack" - I disagree if that's meant absolutely: bear in mind the stack is limited in size, and you often cannot control what was stacked before your function is called or what will be stacked by the code your function calls. It's not appropriate for collections either, because it would complicate size management and cause more memory moves (which are very power-consuming). But I think you meant it otherwise, for small objects in simple cases where this isn't a concern. These days memories are so large that people tend to forget about those limitations, and then they are surprised the first time they have to deal with embedded code. ;-)
It makes total sense, both are in RAM. The thing is, the stack is contiguous, so writing to it is fast because the writes are sequential, while the heap is probably fragmented, which means random writes. Edit: that's without taking into account what the others have said about frames, OS allocation, etc.; everything contributes.
Sir, your Rust tutorials are cohesive, easy to follow (due to great examples), and don't go overly deep into the details. A perfect combination. Keep up the good work.
Honestly, I've read about these things 3-4 times, and I more or less understand them, but it really clicks differently when someone tells you "these are the two main uses of Box: unsized things and self-referencing structs". Thank you, this is really helpful!
WOW WOW WOW! Rust is my favorite programming language, and I’ve used it for all sorts of things, but I’ve never dived into smart pointers (except box) and this was super helpful!
Thanks for the helpful video! It takes me a bit to catch everything on the first time around so I need repeat parts, but the clear examples and broken down explanation really help a lot.
I saw a lot of examples, including THE BOOK, Rust by Example, and a lot of YouTube videos, and still didn't fully understand the why, how, and what. Now I think I finally understand Rc. Thank you.
I excluded it from this video to keep things concise, and I wasn't convinced it would be useful for the vast majority of folks. But several people have requested I cover it, so I may at some point. In the meantime there is coverage of it in one of the later chapters of the Rust book.
Great video! I think a simpler way to explain the difference between Rc and Arc, without mentioning reordering, is that the increments and decrements of the internal strong and weak counters are represented as AtomicUsize in Arc (i.e. thread-safe) and as plain usize in Rc (i.e. non-thread-safe).
Thanks, and thanks for the feedback! Touching on ordering was probably a little confusing; to your point, I probably could have just mentioned the different counter types, and that one is thread-safe while the other isn't.
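For anyone curious, the practical consequence of that counter difference is that Arc handles can cross thread boundaries while Rc handles can't (Rc is not `Send`). A small sketch, with made-up names, showing an Arc clone being summed on another thread:

```rust
use std::sync::Arc;
use std::thread;

// Sums a shared Vec on another thread. The Arc clone only bumps an
// atomic counter; the Vec itself is never copied. Swapping Arc for Rc
// here would be a compile error, because Rc's plain usize counter
// makes it unsafe (and thus not `Send`) across threads.
fn sum_on_thread(data: Arc<Vec<i32>>) -> i32 {
    let clone = Arc::clone(&data);
    thread::spawn(move || clone.iter().sum()).join().unwrap()
}

fn main() {
    let data = Arc::new(vec![1, 2, 3]);
    assert_eq!(sum_on_thread(Arc::clone(&data)), 6);
    assert_eq!(data.len(), 3); // the original handle is still usable
}
```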
Calling .clone() on the data allocates new memory on the heap, while Rc::clone makes the new handle point to the same space in memory without duplicating the data. That makes a huge difference if you're into memory management.
I think it's just to make it explicitly clear that we're cloning a pointer, not the underlying struct. If I see foo.clone() in the wild, I'm instantly suspicious, but Rc::clone() is using the type exactly as intended.
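A small sketch of the distinction (names invented for illustration): `Rc::clone` bumps the count and shares the allocation, while cloning the inner value copies the data:

```rust
use std::rc::Rc;

// Clones the Rc handle and returns the resulting strong count.
fn count_after_clone(value: &str) -> usize {
    let a = Rc::new(String::from(value));
    let b = Rc::clone(&a);        // bumps the refcount, no String copy
    assert!(Rc::ptr_eq(&a, &b));  // both handles share one allocation
    Rc::strong_count(&a)
}

fn main() {
    assert_eq!(count_after_clone("hello"), 2);

    let a = Rc::new(String::from("hello"));
    let deep: String = (*a).clone(); // this clones the String itself
    assert_eq!(deep, "hello");
}
```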
Also, a note about Box use cases. The first use case covers it, but it's not straightforward: imagine that we might return any of several structs that implement the same trait from a function. In this case, the return type cannot be known at compile time, so we need to return a boxed trait object, e.g. Box<dyn Trait>.
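A minimal sketch of that use case (the trait and struct names are invented, not from the video): the concrete type is only known at runtime, so the function returns `Box<dyn Shape>`:

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square { side: f64 }
struct Circle { radius: f64 }

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}
impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}

// Two different concrete types behind one return type: the compiler
// only needs to know the size of the Box (a pointer), not the size
// of whichever Shape ends up inside it.
fn make_shape(round: bool) -> Box<dyn Shape> {
    if round {
        Box::new(Circle { radius: 1.0 })
    } else {
        Box::new(Square { side: 2.0 })
    }
}

fn main() {
    assert_eq!(make_shape(false).area(), 4.0);
}
```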
Omg, I _love_ your intro graphic, played at 0:30. *It's short!* Who wants to sit through 5 or 10 seconds of boring intro boilerplate every time they visit a channel, like a bad modal dialog box in some Windows 95 app? Drives me nuts.
thanks modolief! I'd thought about creating a little intro reel, but every time I consider it I conclude that it would hinder my mission to provide as much value as possible in as little time as possible
@@codetothemoon The channel "PBS Eons" also has a really good intro bit. They start their video, then at around 20 or 30 seconds they give their little imprint. But what I really like about it is that even though it's more than about 3 seconds it fades out quickly, and they already start talking again before the sound is done. Very artistic, yet not intrusive.
I am unsure whether one should demonstrate both safe and bad programming practice. At least it is safe, I suppose. Specifically, I do not understand one of these clone examples, where good practice might ask that the instance remain a singleton all the way through (both literally and figuratively). You show us how to do it, and you present it as if it were awesome.
I wouldn't say stack memory is faster to access, just that the allocation and deallocation is faster. It might be a bit faster in certain conditions since it will stay in cache most of the time.
Got it! Yeah my understanding was that stack memory is more likely to be stored on the CPU cache - but maybe that's possible for the heap as well... Though I haven't actually benchmarked this, maybe I'll do that...
Ordinary variables could also be assigned by the compiler to CPU registers, which makes them as fast as they get. This doesn't happen to the heap-allocated variables.
@@codetothemoon Access is fastest when the data is "near" a recent access, which is part of why data-oriented programming is so much faster. But I bet the methods of memory access have changed so much that what we were taught is not what is implemented in the most recent technology.
The stack is not faster than the heap. Both are locations in main memory. True, the stack might be partially in registers, but in general the stack is no different from the heap. Heap memory involves an allocator, which of course causes more overhead (internally, some atomics need to be swapped and free memory has to be found). But stack and heap are both located in equally fast main memory.
I understand if you’re coming from C or C++, the conceptual overhead of this stuff could make sense for you because it is largely stuff you actually already have to think about in a slightly different way. But if you have the option to use a garbage collected language, I have no idea why you’d drag along all of this conceptual baggage with you. I mean just look at the litany of peripheral specifiers that was created in this tiny example for no other reason than to appease the compiler. It’s a complete distraction from the problem you’re trying to solve.
actually, interestingly, I think C/C++ knowledge doesn't help much unless you're writing `unsafe` Rust - then it might. But in safe Rust code, while you'll see some of the same symbols - mainly '&' - they may have a completely different meaning. As for the "why", most folks should probably stick with a garbage collected language. Rust can shine in the following situations: 1. performance is valued above all else; 2. the project needs to run on hardware with extremely limited resources; 3. the project needs to handle a large volume of traffic while minimizing hosting costs - i.e. the "great problem to have" where a very small company makes a product that becomes heavily used.
Thanks for the reply! I enjoy your videos. I completely agree with your list of use cases. My mention of C/C++ wasn’t necessarily that it would make learning Rust easier, but that the seemingly crufty stuff that Rust does actually is an interesting solution to problems that do arise in those languages. So the overhead of dealing with it might make sense because it’s solving real problems that you commonly deal with in those languages (and not many others). For that reason I do think it would be easier for a C/C++ dev to pick up, because they’re at least familiar with the reasoning behind the design choices. But that’s definitely up for debate.
This was a super helpful primer on why/when to use these types! Would love to see more content building on it. I'm trying to form some internal decision tree for how to decide how long a given piece of data should live for. Going to go see if you have any videos on that topic right now... 😁
great, really happy you got something out of the video! I don't have a video specifically on deciding how long a piece of data should live for, but "Rust Demystified" does cover lifetimes.
🤔 I would understand them more intuitively if they were named more intuitively and consistently. One is a single ownership pointer, uniquely owned. One is a shared ownership pointer, implemented via reference counting. Another is the same as the previous, just with interlocked atomic increment/decrement. Names like "Box" and "Arc" though feel pulled out of a hat. A box has height, width, and depth, but there is nothing volumetric in Rust's "Box" (and loosely co-opting the concept of "boxing" from C# feels weird here).
Rc stands for reference counted and Arc for atomic reference counted; they're just abbreviations, which is good because they're used so frequently - imagine writing ReferenceCounter every time, especially when you have to wrap many things in them. Box could maybe be named better, but no other type is likely to want the name "box" - a math library would call its type cuboid, cube, rectangular prism, or something else. For frequently used types, short names are good.
Totally understand your frustration - to add to the other response, I believe "Box" and "Boxing" are terms that have histories that extend well prior to the inception of Rust, but are usually hidden from the developer by developer-facing language abstractions. I think Rust is just one of the first to actually expose the term directly to the developer.
@@codetothemoon Example dated usage: X.Leroy. Unboxed objects and polymorphic typing, 1992. The terms have been used in libraries also, at least since 2007 in Haskell and 2000 in Steel Bank Common Lisp. I suspect it could be traced back several decades more.
that's correct! Rc doesn't really help much if you intend to hang on to one reference until the program ends - you could just use regular borrows in that case - but in this example, to show the strong_count function, I just kept a reference in main.
One more thing. I'm assuming that for clarity, you used the explicit Arc::clone instead of the suffixed version. You can use .clone() on an Rc/Arc and it will clone the reference instead of the data.
Spread the word of Rust, son. The moment I realized the weakness of C/C++/C# it disgusted me. I craved the strength and certainty of Rust. Their kind calls C/C++/C# the temple but it will die and wither. And then they will beg us to save them. But I am already saved for Rust is immortal. Rust is inevitable. The Omnissiah the Blessing Machine revealed. Chaos exterminated.
For atomics, it's more than just the compiler having to forgo some optimizations - it also has to tell the CPU not to reorder, to lock the bus, and to handle cache-coherency issues. Both an INCrement and a DECrement really have three parts: load/read, compute, and store/write. Normally, both the compiler and the CPU can reorder many things and be lazy. So if you had this pseudo-code:

y = sin(x); if (cond) { i++; } printf("%d ", i);

then the compiler could reorder it to asm (pseudo x86):

mov %eax, [i]
mov %ebx, [cond]
fsin x
jz %ebx, prnt_label
inc %eax
prnt_label:
push %eax
push "%d"
call printf
mov [i], %eax

We can have a lot going on between mov %eax, [i] (LOAD) and mov [i], %eax (STORE). For an atomic, the compiler needs to combine mov %eax, [i] / inc %eax / mov [i], %eax into a single inc [i], but it also has to go further and add the lock prefix. The lock prefix tells the CPU that it has to hold the bus during the whole LOAD/COMPUTE/STORE phases of the instruction so another CPU doesn't do anything in the middle of all this. It also has to make sure that if other CPUs have L1, L2, etc. cache lines referencing that memory, they get invalidated. c9x.me/x86/html/file_module_x86_id_159.html
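In Rust terms, this locked read-modify-write is what `AtomicUsize::fetch_add` gives you - the same primitive Arc uses for its counters. A minimal sketch with made-up names, showing that no increments are lost across threads:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Increments a shared counter from several threads. fetch_add is an
// atomic read-modify-write, so concurrent increments can't clobber
// each other the way a plain `count += 1` could.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(parallel_count(4, 1000), 4000);
}
```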
11:02 this is what I don't like about Rust. Where did we pass truck_b's ownership to the thread? I don't see any obvious code telling me that truck_b moved into the thread. The Arc variable is cloned via a read-only reference, so why does it pass ownership?
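For what it's worth, the ownership transfer happens via the `move` keyword on the closure passed to `thread::spawn` - and it's the cloned handle that moves, not the original. A minimal sketch (variable names other than truck_b are made up):

```rust
use std::sync::Arc;
use std::thread;

// `move` transfers ownership of whatever the closure captures - here,
// the cloned Arc handle - into the spawned thread.
fn thread_len(shared: Arc<String>) -> usize {
    thread::spawn(move || shared.len()).join().unwrap()
}

fn main() {
    let truck_b = Arc::new(String::from("truck"));
    let for_thread = Arc::clone(&truck_b); // a second owning handle

    assert_eq!(thread_len(for_thread), 5); // `for_thread` moves away
    assert_eq!(truck_b.len(), 5);          // `truck_b` never moved
}
```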
I'm interested how Rc knows when data is going out of scope, or being dropped like you did. How is it aware that the memory is no longer accessible after a specific point without knowing where the objects are created in the program? How does the Rc know that there is a reference to truck_b in the main function, for example?
great question! In Rc's implementation of clone there is `self.inner().inc_strong();`, which increments the strong reference counter. So it doesn't necessarily know where the references are; it just increments a counter each time one is created. Then in Rc's implementation of the Drop trait (which has a drop method that is invoked when the implementor goes out of scope) there is `self.inner().dec_strong();`, and then if `self.inner().strong() == 0` the memory is cleaned up.
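You can watch this counting happen from the outside with the public `Rc::strong_count` API; a small sketch (names invented):

```rust
use std::rc::Rc;

// Returns the strong count before, during, and after a cloned
// handle's scope, demonstrating the increment in clone() and the
// decrement in Drop.
fn counts() -> (usize, usize, usize) {
    let a = Rc::new(5);
    let before = Rc::strong_count(&a);
    let during;
    {
        let _b = Rc::clone(&a); // clone() increments the counter
        during = Rc::strong_count(&a);
    }                            // _b dropped: Drop decrements it
    (before, during, Rc::strong_count(&a))
}

fn main() {
    assert_eq!(counts(), (1, 2, 1));
    // When the count reaches 0, the allocation itself is freed.
}
```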
2:11 Why would accessing the heap be slower? It's still RAM, like the stack, and can be cached by the CPU like any other memory. The only drawback of the heap is that it can suffer from fragmentation during allocation and deallocation. But it's incorrect to say it has slower access time.
Allocation and deallocation themselves are slower for the heap. Moreover, (just reading this from StackOverflow), the heap often needs to be thread-safe, meaning it cannot benefit from some of the same optimisations as the stack can.
@@spaghettiking653 yes, fragmentation can make allocation slower, but memory access isn't slower, which is what the video implied. Having an object on the heap is exactly as fast as anywhere else, and fragmentation issues only occur in rare cases. We're talking literal nanoseconds to find free space on the heap instead of putting it on the stack. Unless we're talking about a very hot loop in performance-critical software, it doesn't matter, and you shouldn't allocate in a hot loop anyway.
@@sbef Yes, fair point. What about the problems with thread safety? I really have no clue whether that's a real concern or whether it is a problem at all, as I literally read it minutes ago - what do you think/know?
Yeah I may have misspoken a bit here - stack memory is faster to allocate / deallocate than heap memory. Would patch this if I could :/ I'll pin a comment.
@@spaghettiking653 not sure how thread-safe the Rust default allocator is, to be honest, but I would expect it to be pretty much lock-free even in heavily concurrent applications. It's not my area of expertise, but allocator technology has been refined over the past 3 decades.
Would be great to understand ownership and the stack. "The stack is much faster than the heap" - I assume that if you pass variables by ref, the CPU knows "hey, I'm going to use this storage, so I'll keep it in the cache". But what happens if F1() passes ownership to F2(), which passes to F3()... F999() - is the data still in the bottom stack frame, and is that storage still in the cache? AFAIK the size of a stack frame cannot be changed. So is it safe to always say "stack is faster than heap"? That leads to crazy ideas like allocating a huge fixed-size array in the bottom-most stack frame to act as a "database" and then passing it through - or do I get something like "stack frame too big"? I can't believe that using the stack is better than the heap in this case. Maybe someone has a link that explains it in depth?
Technically a stack frame can't be "too big"; the error that can occur is that the stack runs out of memory - a stack overflow. A stack overflow can be caused either by one mega stack frame or by a multitude of small ones. Either way, the error is that the stack memory is depleted (stack size varies from platform to platform and OS to OS); the size of any individual frame doesn't matter, it's the total - either 1 large frame or N smaller ones - going over the stack size.

He also unnecessarily conflates ownership, lifetimes, and stack vs heap in these examples. The heap is generally "farther away" in memory than the stack. Computers have cache, often multiple levels, which is extremely fast and prefetched from main memory, and it rewards data access that is close in space (spatial locality) or close in time (temporal locality). So he also confuses what is fast about the stack: operating on a large "database", as you refer to it, is also fast, because its accesses have good spatial and temporal locality. If you are operating on each element in a loop, the CPU will "see" what you are trying to do, read that heap memory, and prefetch the data as your loop executes. When this happens, the heap is _exactly_ as fast as the stack, since your large data blob is being operated on sequentially, one element after the other (just like how the stack is laid out). This is the main reason you want data elements close to each other in memory: it lets the CPU fetch memory ahead of time and place some of it in the cache.

There is another benefit of the stack: cleaning up stack memory involves just subtracting N bytes from the stack pointer. If all your data on the stack is "trivial", no destructors are run. Compare this with the heap, where some cleanup must happen to free the memory - sometimes involving a system call, which is much slower than a normal function call, and even without system calls there will be some overhead.
@@simonfarre4907 thanks a lot for this detailed answer. Ah yeah, I tested it out and the largest amount of stack data on my system was about 8 MB - which is even less than the cache size of the CPU (Ubuntu 18, Ryzen). Probably there are good reasons to do it that way.
Thanks Thomas and Simon for pointing all of this out. I can definitely appreciate that "Stack vs Heap" is more nuanced than my brief portrayal of it in the video would lead you to believe.
Thanks, yeah I dug a little deeper. As far as I understand now: allocating and deallocating is faster on the stack, but for data that lives long it doesn't make a meaningful difference. I haven't tried it, but I can tell the linker to allow larger stacks. Therefore it could be possible to provoke a cache miss even on the stack? Or the OS panics if the stack exceeds the CPU cache size, because it always wants to have the whole stack at least in L2 or L3 - that would be a good reason for the default only allowing tiny stacks. If so, it might be faster in some scenarios to keep the stacks small, so the CPU has enough cache for the heap, instead of storing barely-accessed data on the stack.
Definitely doing this at some point, given the spooky factor it would have been a good one for halloween, but unfortunately it probably won't be ready in time 🎃
It’s really challenging. But so interesting. And as I learn Rust I feel as though I am learning very important concepts that are key to becoming a proficient software engineer.
This is timely for me. I ran into Rc and cell last night while trying to learn rust with GTK. I find it all very confusing. Anything you can provide including RefCell is greatly appreciated. Thanks.
It's a single-threaded mutex (well, a read/write lock). This might seem useless, but it can be used to create shared references that can still be modified: make an Rc wrapping the RefCell, which you can clone freely but can still lock for mutable writing. (If you try to take multiple write locks at the same time, the thread panics.) It's sort of like a pointer to an object in a regular OO language. You can also use it to make mutable thread-local data. Keep in mind anything containing a RefCell can't be sent across threads. They're also a pain to serialize.
RefCell seems to be frequently requested, so I'll probably make a video about it! In the meantime it looks like strangeWaters has a good description, and there is also an explanation in chapter 15 of the Rust book.