As requested: I just worked on a MicroPython ("embedded Python") project where the actual [re]allocation was what took a LOT of time (or so it seems without great profiling tools). I wasn't running out of RAM; it was the changing of capacity/allocation that was the bottleneck (in my case with strings, not lists, but probably similar; maybe just without the spare capacity). I like this deep-dive stuff. Even if it's not immediately practical, it's still useful to know how things work.
I would guess that it’s not particularly similar as strings in Python are immutable. There will never be a reallocation to extend capacity for a string because *any* mutating string operation creates a completely new object.
In Java there is a dedicated class, StringBuilder, created precisely because strings are immutable: you build the string from parts and join them once when you need it.
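The usual Python analogue, as a sketch (CPython can sometimes resize a string in place when it holds the only reference, but that's an optimization, not a guarantee):

```python
# Repeated concatenation may copy the accumulated string every time:
s = ""
for i in range(1000):
    s += str(i)

# The "StringBuilder" idiom in Python: collect parts, join once.
parts = []
for i in range(1000):
    parts.append(str(i))
s2 = "".join(parts)

assert s == s2
```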
this keeps happening to me with `std::string` on embedded C++... also one day i'll find out why the heck the standard library hard-depends on crap like glibc locales on embedded targets and refuses to compile about a random half of the string operations... but hey, at least there it does the thing when you more or less expect it, because there's no GC/... so it's not all too hard to at least shove the expensive code somewhere it doesn't run too often
This guy really went and took an elaborate deep dive for 10 minutes, only to debunk the relevance of his own video at the end. Well, at least you're humble!
It should be said that these are CPython specific internals and other implementations like PyPy or even subsequent versions of CPython can behave completely differently.
One interesting extra detail: for the last example with `for x in range`, it probably doesn't reallocate at all, but only allocates once. This is what the `__length_hint__` special method is for: it's supposed to return a guess of how many elements the iterator will yield.
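A minimal sketch of the protocol via `operator.length_hint` (whether a particular construct actually consults the hint is an implementation detail, so take this as illustrative):

```python
import operator

print(operator.length_hint(range(1000)))        # 1000 (range has __len__)
print(operator.length_hint(iter(range(1000))))  # 1000 (via __length_hint__)

gen = (x for x in range(1000))
print(operator.length_hint(gen, -1))            # -1: generators give no hint
```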
For `[0,0,0]`, `x=0; [x,x,x]`, `[0]*3`, and `[0 for _ in range(3)]` respectively:

Python 3.6.10 on macOS: 88 / 88 / 88 / 96
Python 3.10.1 on macOS: 120 / 80 / 80 / 88

I find the amount of fluctuation interesting, honestly.
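For anyone who wants to reproduce this and record their environment along with it (which clearly matters here):

```python
import platform
import sys

x = 0
candidates = {
    "[0, 0, 0]": [0, 0, 0],
    "[x, x, x]": [x, x, x],
    "[0] * 3": [0] * 3,
    "[0 for _ in range(3)]": [0 for _ in range(3)],
}
print(sys.version, "on", platform.platform())
for label, lst in candidates.items():
    print(f"{label}: {sys.getsizeof(lst)} bytes")
```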
I love these deep dives, but I feel like you could add to the description which OS and Python interpreter version showed this behavior, both so that we can compare and for future reference.
I think understanding the internals at some level (even if your details aren't 100% accurate) is *extremely* useful because having that vague idea makes it easier to recognize situations where you do need to dig in deeper. It's good to always have an idea of what's happening, even if in most cases, to borrow from physics, your "cow" is perfectly round and frictionless. Basically, when you're using an abstraction, you should at least understand the limits of that abstraction.
Well, in the past I was always complaining about shit in javascript like "[ ] == [ ] is false but ![ ] == [ ] is true" and "NaN === NaN is false" and praised python for its predictability. You prove me wrong every time you upload a video.
As an embedded c programmer, well, I just statically allocate list sizes... Using malloc() is too much work. Not programming work, having to deal with code reviewers and their inherent fear of that function is a real pain.
@@berylliosis5250 I once needed a b-tree for a project, but malloc() and friends were expressly forbidden by coding guidelines - no exceptions. So I created my own heap from a statically allocated array and my own allocation and deallocation functions to "dynamically" create/destroy elements. Was actually kind of fun and a good learning exercise.. plus they couldn't say crap when the unit tests passed and I never touched malloc😋
@@thefekete I do not understand that restriction, heh. I get it when malloc isn't available, or when there's very little memory, but making your own heap is both necessary and basically just malloc with extra steps ( _more_ potential for bugs than with actual malloc)
@@berylliosis5250 Big organization 101: no one gets fired for buying IBM. The same goes here. Nobody cares if your code works; they care about your code being compliant, so they do not get blamed when it does f* up. Yes, your malloc is almost guaranteed to be worse than the library version, but the reviewers are not required to enforce that. And believe me, they also know this is stupid, but no one wants their employment and pension gone.

And no, bosses are not idiots. They choose to enforce this for a good reason -- most people are dumb, so while it does impede innovation, staying with the rules actually helps more average people than it hurts the much fewer talents. This is sad, but this is human nature. That's why there is social science besides natural science, and there are people making tons of money in jobs like economist and government advisor. As engineers we often write their work off, but without them we could descend into chaos very quickly if everyone thought only about the locally best solution without regard to the bigger picture.
@@berylliosis5250 It can be better to write your own allocator because you know exactly what the use case is. You can write and tune your allocator to solve exactly the problem that needs solving and not every type of allocation possible like with malloc. However, in my experience, dynamically allocating memory on the heap is basically never needed. You almost always know your worst case and can just statically allocate enough memory for that situation. Not being allowed to use malloc in an embedded system is not that much of a loss in my opinion.
Thanks for the insight! (And yes, in embedded systems, or slow platforms, small differences in memory consumption/runtime can add up, so knowing about this can be very relevant.)
y'know, the more i learn about python's inner workings, the more i begin to question it. i like the idea of code being as simple to read as python does it, but seeing what it takes to get there makes me have second thoughts.
Why? None of this is specific to python, these sorts of considerations are needed in any language that gives you a mutable list datatype. And this is all happening in the interpreter, so it's just as fast as writing your own C code to handle reallocations would be.
@@sykescalvin09 yeah, i get that, but... idk, part of me prefers having to deal with manual memory allocation on c++ over dealing with python's strange quirks
@@TuMadre8000 If you use STL containers like std::vector and friends, this is still going on behind the scenes. If you handle memory allocation yourself, quirks (segfaults!) are far more likely to creep in, and unless you can optimise for your specific allocation patterns, the roll-your-own solution is probably less performant as well.
This was very interesting! Somewhat related, I have heard that Python dicts have faster lookup times than lists, but this doesn't really make sense to me. I would love to see your take on deconstructing the memory and speed differences between using lists and dicts. Great content as always!
Hi! Regardless of whether a video gets made for this, I thought this would help you out a bit. The reason the lookup in a dict is faster than in a list is that looking up a value in a list requires you to iterate through each item until you find the value you are looking for. This can be costly, especially for a big list. A dictionary (hash map), however, essentially uses the value of the key in the lookup (not the index, as you would in a list): it's either in the hash map or not, with no iteration through every item in the dictionary. This means it is faster.

In Big-O notation (which describes how time-complex a lookup is), a list would be O(n), as the speed of the lookup relies solely on how many items there are, and worst case the whole list has to be iterated through. A dictionary is O(1), as the lookup time is constant (the thing you are looking for is either there or not there).

Edit: some Python and low-level people may pick apart holes in what I said, i.e. it depends on the hashing algorithm under the hood etc.; I'm just trying to generalise.
@@LettuceAttack176 That would be true if the Python list were implemented as a linked list, but it's actually backed by an array and supports random access in O(1) time. And just as a tip, be mindful about using a lower-case o in Big-O analysis, as it has a different meaning from an upper-case O. As for why a lookup in a dictionary could be faster than in a list, I've got no clue. Both may require memory to be loaded from swap, causing delay, and the dict may suffer from bad hashing, so that seems to imply the opposite.

Bonus: the backing array is implemented with a fixed size, and if there's any room left, appending an element takes Theta(1). If the array is full, the entire array needs to be copied to another, larger block of allocated memory, taking Theta(n); therefore, technically, appending an element is O(n). However, as this copying only occurs when the array is full, its running time is sometimes averaged across the constant-time operations into what's known as amortised worst-case running time, which is ((n-1)*Theta(1) + Theta(n)) / n = Theta(1). Note that it still concerns the worst case, but averaged over multiple operations (as opposed to the average case over multiple different inputs).

Edit: my bad, when you mentioned lookup I figured you meant random access. If you want to determine whether an element is present in a list, then indeed the entire list would be traversed, whereas with a dict it takes time depending on the hash function and the existing collisions, but O(1) in the average case. If you need an ordering like a list that supports constant-time membership testing: starting from Python 3.7 the dict implementation preserves insertion order! Earlier versions may instead rely on OrderedDict.
@@LettuceAttack176 That's very interesting, thanks! Why do Python lists require iterating over the items? I always assumed Python lists (and probably also C++ std::vector) were more like C/C++ arrays which (as far as I know) are constant lookup instead of linked lists which require iterating through.
@@andrewglick6279 As far as I understand, lookup of a specific index in a list is O(1), similar to lookup by key in a dict. But if, rather than simply indexing, you're searching for a specific item, it's O(n) for a list whereas for the dictionary it remains O(1). By contrast, if Python lists were backed by linked lists, it would be O(n) both to retrieve the i-th item and to search the list for an item that may or may not be in it, because with a linked list you've got to iterate in both cases.
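A rough way to see the membership-test difference on your own machine (numbers will vary; a `set` would do just as well as the dict here):

```python
import timeit

data = list(range(100_000))
lookup = dict.fromkeys(data)

# Worst case for the list: the element we test for is at the very end.
print(timeit.timeit(lambda: 99_999 in data, number=1_000))    # linear scan
print(timeit.timeit(lambda: 99_999 in lookup, number=1_000))  # hash lookup
```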
My guess for the number-literal "quirk" is that it makes a deep copy of each arbitrary-precision int, so each one has a different address and a unique pointer to be stored. That would explain why using variables gives the same capacity as explicitly repeating the element. Maybe it internally uses run-length encoding to repeat the same pointer without making copies of it.
This is so much better than y’all just telling us. How I wish youtube instructors would debug in trials before telling us the answer. Print statements or it didn’t happen
I thought you chose the number at 1:45 because it is the maximum value for an unsigned int but I quickly realized that doesn't make sense and it's way too big to be the max value of an unsigned int. Do you mind sharing why you chose that number?
@ At first I assumed it was something to do with integer ranges, but it turned out not to be. Then I thought it was some kind of hidden message and tried converting it to different bases and ascii and stumbled upon the solution. In theory it should also work in binary and octal, but I couldn't do it, maybe I was doing something wrong.
@@SophieJMore it works in binary if you add 0 to the start, it doesn't work in octal because the bits don't align (1 character is 2 digits in hex, 8 digits in binary but 8/3 digits in octal)
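For anyone who'd rather let Python do the converting, a two-line sketch (the output spoils it; the code doesn't):

```python
n = 2129834018397712114277
print(bytes.fromhex(f"{n:x}").decode("ascii"))  # prints the hidden message
```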
Oh wow! YouTube just deleted my comment linking a Gist with benchmarks of different types of list construction. TL;DR: both list comprehensions and preallocating a list of 10000 zeros are, on my system, about 28-44% faster than allocating an empty list and then appending each element separately, and numpy is 5x faster than that. The gist is still up on my GitHub, same as my YouTube name. So there *is* actionable information in this video :).
The entire concept of a dynamically allocated array as a single structure or container comes straight from data structures and algorithms. The geometric progression is based on minimizing the number of instructions while also increasing performance and efficiency; in simple terms, it's driven by the Big-O time and space complexity of the operations on the desired container.

Although Python is a high-level scripting or interpreted language compared to languages such as C and C++ and abstracts away these details, why would the concepts of memory layout, locality, access times, and geometric expansion be any different? They aren't! These concepts apply in JavaScript or any other language that does computations on datasets; even database and querying languages such as SQL or PHP have these principles embedded within their internal frameworks. It's the nature of things: it always goes back to the mailbox problem of indexing into address spaces, and the phonebook problem with all of the different searching and sorting algorithms. Nice video though! I really enjoyed it!
I mainly use Rust, and there the capacity is actually important in some cases. For example, when you know you're going to add 100 elements to a vector (Rust's version of a list), it would be very wasteful to reallocate the vec a few times before actually reaching the needed capacity. That's why there is a `Vec::with_capacity` constructor.
@@raccoon1160 Yeah, but... it still needs an explanation; weird things have explanations too, especially if they repeat the same behavior (it seems to always be 8, not just a random number).
This one seems to be subtle. Checking the bytecode for a couple of functions with `dis.dis`, `[3,3,3]` caused a list to be created and then extended with a tuple `(3,3,3)`. `[e,e,e]`, on the other hand, used BUILD_LIST with 3 arguments, and BUILD_LIST in turn created the list with reserved capacity using PyList_New() before populating it backwards. extend used list_resize to allocate space, and that added some margin. These two behaviours can be demonstrated using `l = list(t)` vs `l = list(); l.extend(t)`. That's from my testing on CPython 3.9 on Linux x86_64. These are all implementation details that can and do vary between implementations.
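Something like this reproduces the check, if anyone wants to see the two code paths (exact bytecode varies by CPython version; the comments describe the 3.9+ behaviour):

```python
import dis

# Constant elements: built empty, then extended from a const tuple
# (LIST_EXTEND path, CPython 3.9+; older versions differ).
dis.dis(lambda: [3, 3, 3])

e = 3
# Non-constant elements: BUILD_LIST 3 reserves the capacity up front.
dis.dis(lambda: [e, e, e])
```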
@@igorswies5913 True, but there's also an array type (in the array module). Python's array type stores fixed size numbers which are converted to objects when read. This can make it more compact than a list, which has to store references to all elements, where each reference is usually (on 64-bit systems) the largest size available in array. Users are generally directed to use numpy instead, which offers more useful high level operations.
I know this was more on the informative side of videos, but honestly I was mostly laughing at your comments on screen lol. In terms of the vid itself, pretty much what I expected in terms of memory management, but awesome vid nevertheless!
Unfortunately it gets messier when you put lists in your lists. I knew I had an issue with the second version; it hurt me when I saw it, and I just remembered why. If this comment survives birth, you can run this piece of code and see the problem-output:

```python
aa = [[]] * 3
bb = [[] for _ in range(3)]
print(aa)
print(bb)
aa[0].append(1)
bb[0].append(1)
print(aa)
print(bb)
```
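(For anyone reading without running it: `aa` ends up as `[[1], [1], [1]]`, because `[[]]*3` repeats a reference to the *same* inner list three times, while `bb` is `[[1], [], []]`, since the comprehension builds a fresh inner list on each iteration.)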
Is that why, when we use async, we need to lock the list before editing or appending it? Is there a chance that the list could be edited while it's being moved to a new memory size?
This would be a potential issue with threading, but not with async, since async only swaps control at explicit points like "yield"/"await". However, current implementations of Python use a global lock (the GIL) to prevent this from being an issue even with threading.
Would be good to get a video about "loadFactor", which is what Java's HashMaps use to determine when to add new buckets. I'm also curious how they arrived at this 9/8ths growth factor. I thought the capacity generally doubles each time (I think that's the case for Java's ArrayList), which is obviously more wasteful of memory, but it's understandable why they went with that approach.
loadFactor is not used for lists, only dicts. PyPy keeps the dict size between 1/3 and 2/3 full; AFAIK CPython is the same. The list type grows with the following code in PyPy, which AFAIK was ported from CPython directly:

```python
if newsize < 9:
    some = 3
else:
    some = 6
some += newsize >> 3
new_allocated = newsize + some
```
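You can watch those growth points from Python itself (a sketch; the exact sizes depend on your CPython version):

```python
import sys

# Probe where CPython actually reallocates: getsizeof reflects the
# current capacity, so it only jumps when the buffer grows.
lst, last = [], None
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        print(f"len={len(lst):>2}  bytes={size}")
        last = size
```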
I was playing around with sys.getsizeof() on other built-in types and I stumbled on something surprising (for me). For strings, of course, the concept of capacity doesn't apply, because they are an immutable type, so it makes sense that exactly the amount of memory needed to store a given string is allocated, and no more. What surprised me is that for an empty string, sys.getsizeof returns 49. I expected a multiple of 8 bytes.
Strings use multiple different in-memory encodings. E.g. if your string is all ASCII it uses 1 byte per character, but if you include an emoji it increases to 2 or more bytes per character. The string therefore needs to store metadata inside the object about which scheme it is using, which is why you are getting this answer.
@@mCoding While true, that doesn't quite reveal why it's an odd size. I believe the reason is the encoding chosen was UTF8, and in that case an extra byte is allocated for a C-style NUL terminator.
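A quick probe for the curious (exact byte counts vary by CPython version and platform, but the pattern shows through):

```python
import sys

# ASCII-only strings use the most compact layout (1 byte per char);
# wider code points pay a larger header plus 2 or 4 bytes per char.
for s in ["", "a", "ab", "é", "🐍"]:
    print(repr(s), sys.getsizeof(s))
```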
Hmm... I hesitated before clicking the video, and it turned out exactly as I expected: curious but useless. If you care about exact memory management, you shouldn't be using Python. For certain structured data we use NumPy and/or Pandas; otherwise, switch back to C and do your own malloc() and free().
@@mCoding At first I thought it was some prime number with a special history, where group theorists make math jokes about it, but Google gave no answer. Then I thought it might already be HEX code, but it wasn't directly apparent; after going from binary to ASCII I already smelled the answer. Now I wonder if I subscribed because of the great content or the subliminal messages...
_sees title_ pre-coffee-brain: "well yeah it's python, no wonder the memory management is hella inefficient as well" but joke aside, i wouldn't call that "remembering" or "what *you* did" (please *do not* rely on any of those effects for anything!), it's just a side effect of an allocator not wasting all your CPU all the time to save on 2 bytes of theoretically unused ram, what gives?
Is a Python list contiguous? I don't see a reason it'd need to be in order to function, but I could see it for speed reasons, to make indexing fast like an array.
Yes, python lists are contiguous. However, all python objects are pointers, so it is a contiguous list of pointers that point to allocated memory that is probably not contiguous.
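One indirect way to see that pointer array (assuming a 64-bit build, where each pointer slot is 8 bytes):

```python
import sys

# [x] * n allocates exactly n slots, so the size difference from an
# empty list should grow by one pointer (8 bytes) per element.
base = sys.getsizeof([])
for n in range(1, 5):
    print(n, sys.getsizeof([None] * n) - base)
```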
I can see this knowledge being useful if you for some reason are using a ton of lists and loops and trying to optimize the code. But honestly, nobody codes in python for performance so it's probably, just as you said, good to know but not very useful.
Python is just a weird language all round. In the year 2022 it still doesn't understand tabs vs. spaces, for one thing. Then when you import another file, any code in that file that is OUTSIDE of a function is just merrily run by the calling file? That quirk of the language has been the subject of quite a few security bootcamp exercises I've worked on.
We understand that mixing tabs and spaces leads to confusion. And yes, running Python source runs statements. That's kind of a key thing. Are you comparing to programs that don't have top level statements like class and def?
Not directly; capacity is an implementation detail of CPython, so the best you can do is hope that [0]*n gives a capacity of exactly n for all n. But remember that Python also automatically shrinks the capacity, so you can't even do [0]*n and then clear(). I've been reading the implementation, and because of the automatic shrinking, providing any kind of control over the capacity would be a big rewrite at the very least.
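The automatic shrinking is easy to observe (again, a CPython implementation detail, so treat this as illustrative):

```python
import sys

lst = [0] * 100_000
print(sys.getsizeof(lst))  # large: holds capacity for all 100k pointers

lst.clear()
print(sys.getsizeof(lst))  # back to (roughly) the size of an empty list
```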
That's a method too. But imagine if you have a list that has a million elements and you want to add another one. Would you increase the capacity to 2 million?
I'm not very familiar with Rust (more to learn in the future, I suppose!). Does Rust not have a comprehensive standard library? I hear so much good stuff about it that I'd be surprised to learn its standard library was lacking.
Most of these things are pointers to objects in memory. Hence, 64 bits or 8 bytes. And on 64b processors, smaller value sizes are usually not worth the hassle.
You seem confused. Those two are equal, because 'a' is the same immutable object either way. The difference is important if you replace 'a' with an expression that creates a mutable object, e.g. []. In this case they'll still be equal (as all the elements are empty) but different structure: [a,a,a] vs [a,b,c].
The numbers in the video are output from the source code linked in the description and shown on screen, so in that sense the video is self-fact-checking, though you are welcome to run the code for yourself. If you are getting a different answer, you are most likely just using a different version of Python. One of the main points of the video is that this behavior is an implementation detail subject to change from version to version, not a fundamental property.
2129834018397712114277 isn't an integer limit, because log_2(2129834018397712114278) isn't an integer. You chose the number because it was the most conveniently random long thing you could type.