As requested: I just worked on a MicroPython ("embedded Python") project where the actual [re]allocation was what took a LOT of time (or so it seems without great profiling tools). I wasn't running out of RAM; it was the changing of capacity/allocation that was the bottleneck (in my case with strings, not lists, but probably similar; maybe just without the spare capacity). I like this deep-dive stuff. Even if it's not immediately practical, it's still useful to know how things work.
I would guess that it’s not particularly similar as strings in Python are immutable. There will never be a reallocation to extend capacity for a string because *any* mutating string operation creates a completely new object.
In Java there is a dedicated class, StringBuilder, created precisely because strings are immutable: you build the string from parts and join them once when you need it.
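The usual Python analogue, as a sketch (CPython can sometimes resize a string in place when it holds the only reference, but that's an optimization, not a guarantee):

```python
# Repeated concatenation may copy the accumulated string every time:
s = ""
for i in range(1000):
    s += str(i)

# The "StringBuilder" idiom in Python: collect parts, join once.
parts = []
for i in range(1000):
    parts.append(str(i))
s2 = "".join(parts)

assert s == s2
```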
this keeps happening to me with `std::string` on embedded C++... also one day i'll find out why the heck the standard library hard-depends on crap like glibc locales on embedded targets and refuses to compile about a random half of the string operations... but hey, at least there it does the thing when you more or less expect it, because there's no GC/... so it's not all too hard to at least shove the expensive code somewhere it doesn't run too often
This guy really went and took an elaborate deep dive for 10 minutes, only to debunk the relevance of his own video at the end. Well, at least you're humble!
It should be said that these are CPython specific internals and other implementations like PyPy or even subsequent versions of CPython can behave completely differently.
One interesting extra detail: for the last example with `for x in range`, it probably doesn't reallocate at all, but only allocates once. This is what the `__length_hint__` special method is for: it's supposed to return a guess of how many elements the iterator will yield.
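A minimal sketch of the protocol via `operator.length_hint` (whether a particular construct actually consults the hint is an implementation detail, so take this as illustrative):

```python
import operator

print(operator.length_hint(range(1000)))        # 1000 (range has __len__)
print(operator.length_hint(iter(range(1000))))  # 1000 (via __length_hint__)

gen = (x for x in range(1000))
print(operator.length_hint(gen, -1))            # -1: generators give no hint
```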
For `[0,0,0]`, `x=0; [x,x,x]`, `[0]*3`, and `[0 for _ in range(3)]` respectively:

Python 3.6.10 on macOS: 88 / 88 / 88 / 96
Python 3.10.1 on macOS: 120 / 80 / 80 / 88

I find the amount of fluctuation interesting, honestly.
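For anyone who wants to reproduce this and record their environment along with it (which clearly matters here):

```python
import platform
import sys

x = 0
candidates = {
    "[0, 0, 0]": [0, 0, 0],
    "[x, x, x]": [x, x, x],
    "[0] * 3": [0] * 3,
    "[0 for _ in range(3)]": [0 for _ in range(3)],
}
print(sys.version, "on", platform.platform())
for label, lst in candidates.items():
    print(f"{label}: {sys.getsizeof(lst)} bytes")
```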
I love these deep dives, but I feel like you could add to the description which OS and Python interpreter version showed this behavior, both so that we can compare and for future reference.
I think understanding the internals at some level (even if your details aren't 100% accurate) is *extremely* useful because having that vague idea makes it easier to recognize situations where you do need to dig in deeper. It's good to always have an idea of what's happening, even if in most cases, to borrow from physics, your "cow" is perfectly round and frictionless. Basically, when you're using an abstraction, you should at least understand the limits of that abstraction.
Well, in the past I was always complaining about shit in javascript like "[ ] == [ ] is false but ![ ] == [ ] is true" and "NaN === NaN is false" and praised python for its predictability. You prove me wrong every time you upload a video.
As an embedded c programmer, well, I just statically allocate list sizes... Using malloc() is too much work. Not programming work, having to deal with code reviewers and their inherent fear of that function is a real pain.
@@berylliosis5250 I once needed a b-tree for a project, but malloc() and friends were expressly forbidden by coding guidelines - no exceptions. So I created my own heap from a statically allocated array and my own allocation and deallocation functions to "dynamically" create/destroy elements. Was actually kind of fun and a good learning exercise.. plus they couldn't say crap when the unit tests passed and I never touched malloc😋
@@thefekete I do not understand that restriction, heh. I get it when malloc isn't available, or when there's very little memory, but making your own heap is both necessary and basically just malloc with extra steps ( _more_ potential for bugs than with actual malloc)
@@berylliosis5250 Big organization 101: no one gets fired for buying IBM. The same goes here. Nobody cares if your code works; they care about your code being compliant, so they do not get blamed when it does f* up. Yes, your malloc is almost guaranteed to be worse than the library version, but the reviewers are not required to enforce that. And believe me, they also know this is stupid, but no one wants their employment and pension gone.

And no, bosses are not idiots. They choose to enforce this for a good reason -- most people are dumb, so while it does impede innovation, staying with the rules actually helps more average people than it hurts the much fewer talents. This is sad, but this is human nature. That's why there is social science besides natural science, and there are people making tons of money in jobs like economist and government advisor. As engineers we often write their work off, but without them we could descend into chaos very quickly if everyone thought only about the locally best solution without regard to the bigger picture.
@@berylliosis5250 It can be better to write your own allocator because you know exactly what the use case is. You can write and tune your allocator to solve exactly the problem that needs solving and not every type of allocation possible like with malloc. However, in my experience, dynamically allocating memory on the heap is basically never needed. You almost always know your worst case and can just statically allocate enough memory for that situation. Not being allowed to use malloc in an embedded system is not that much of a loss in my opinion.
Thanks for the insight! (And yes, in embedded systems, or slow platforms, small differences in memory consumption/runtime can add up, so knowing about this can be very relevant.)
y'know, the more i learn about python's inner workings, the more i begin to question it. i like the idea of code being as simple to read as python does it, but seeing what it takes to get there makes me have second thoughts.
Why? None of this is specific to python, these sorts of considerations are needed in any language that gives you a mutable list datatype. And this is all happening in the interpreter, so it's just as fast as writing your own C code to handle reallocations would be.
@@sykescalvin09 yeah, i get that, but... idk, part of me prefers having to deal with manual memory allocation on c++ over dealing with python's strange quirks
@@TuMadre8000 If you use STL containers like std::vector and friends, this is still going on behind the scenes. If you handle memory allocation yourself, quirks (segfaults!) are far more likely to creep in, and unless you can optimise for your specific allocation patterns, the roll-your-own solution is probably less performant as well.
This was very interesting! Somewhat related, I have heard that Python dicts have faster lookup times than lists, but this doesn't really make sense to me. I would love to see your take on deconstructing the memory and speed differences between using lists and dicts. Great content as always!
Hi! Regardless of whether a video gets made for this, I thought this would help you out a bit. The reason the lookup in a dict is faster than in a list is that looking up a value in a list requires you to iterate through each item until you find the value you are looking for. This can be costly, especially for a big list. A dictionary (hash map), however, essentially uses the value of the key in the lookup (not the index, as you would in a list): it's either in the hash map or not, with no iteration through every item in the dictionary. This means it is faster.

In Big-O notation (which describes how time-complex a lookup is), a list would be O(n), as the speed of the lookup relies solely on how many items there are, and worst case the whole list has to be iterated through. A dictionary is O(1), as the lookup time is constant (the thing you are looking for is either there or not there).

Edit: some Python and low-level people may pick apart holes in what I said, i.e. it depends on the hashing algorithm under the hood etc.; I'm just trying to generalise.
@@LettuceAttack176 That would be true if the Python list were implemented as a linked list, but it's actually backed by an array and supports random access in O(1) time. And just as a tip, be mindful about using a lower-case o in Big-O analysis, as it has a different meaning from an upper-case O. As for why a lookup in a dictionary could be faster than in a list, I've got no clue. Both may require memory to be loaded from swap, causing delay, and the dict may suffer from bad hashing, so that seems to imply the opposite.

Bonus: the backing array is implemented with a fixed size, and if there's any room left, appending an element takes Theta(1). If the array is full, the entire array needs to be copied to another, larger block of allocated memory, taking Theta(n); therefore, technically, appending an element is O(n). However, as this copying only occurs when the array is full, its running time is sometimes averaged across the constant-time operations into what's known as amortised worst-case running time, which is ((n-1)*Theta(1) + Theta(n)) / n = Theta(1). Note that it still concerns the worst case, but averaged over multiple operations (as opposed to the average case over multiple different inputs).

Edit: my bad, when you mentioned lookup I figured you meant random access. If you want to determine whether an element is present in a list, then indeed the entire list would be traversed, whereas with a dict it takes time depending on the hash function and the existing collisions, but O(1) in the average case. If you need an ordering like a list that supports constant-time membership testing: starting from Python 3.7 the dict implementation preserves insertion order! Earlier versions may instead rely on OrderedDict.
@@LettuceAttack176 That's very interesting, thanks! Why do Python lists require iterating over the items? I always assumed Python lists (and probably also C++ std::vector) were more like C/C++ arrays which (as far as I know) are constant lookup instead of linked lists which require iterating through.
@@andrewglick6279 As far as I understand, lookup of a specific index in a list is O(1), similar to lookup by key in a dict. But if, rather than simply indexing, you're searching for a specific item, it's O(n) for a list whereas for the dictionary it remains O(1). By contrast, if Python lists were backed by linked lists, it would be O(n) both to retrieve the i-th item and to search the list for an item that may or may not be in it, because with a linked list you've got to iterate in both cases.
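A rough way to see the membership-test difference on your own machine (numbers will vary; a `set` would do just as well as the dict here):

```python
import timeit

data = list(range(100_000))
lookup = dict.fromkeys(data)

# Worst case for the list: the element we test for is at the very end.
print(timeit.timeit(lambda: 99_999 in data, number=1_000))    # linear scan
print(timeit.timeit(lambda: 99_999 in lookup, number=1_000))  # hash lookup
```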
My guess for the number-literal "quirk" is that it makes a deep copy of each arbitrary-precision int, so each one has a different address and a unique pointer to be stored. That would explain why using variables gives the same capacity as explicitly repeating the element. Maybe it internally uses run-length encoding to repeat the same pointer without making copies of it.
This is so much better than y’all just telling us. How I wish youtube instructors would debug in trials before telling us the answer. Print statements or it didn’t happen
I thought you chose the number at 1:45 because it is the maximum value for an unsigned int but I quickly realized that doesn't make sense and it's way too big to be the max value of an unsigned int. Do you mind sharing why you chose that number?
@ At first I assumed it was something to do with integer ranges, but it turned out not to be. Then I thought it was some kind of hidden message and tried converting it to different bases and ascii and stumbled upon the solution. In theory it should also work in binary and octal, but I couldn't do it, maybe I was doing something wrong.
@@SophieJMore it works in binary if you add 0 to the start, it doesn't work in octal because the bits don't align (1 character is 2 digits in hex, 8 digits in binary but 8/3 digits in octal)
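For anyone who'd rather let Python do the converting, a two-line sketch (the output spoils it; the code doesn't):

```python
n = 2129834018397712114277
print(bytes.fromhex(f"{n:x}").decode("ascii"))  # prints the hidden message
```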
Oh wow! YouTube just deleted my comment linking a Gist with benchmarks of different types of list construction. TL;DR: both list comprehensions and preallocating a list of 10000 zeros are, on my system, about 28-44% faster than allocating an empty list and then appending each element separately, and numpy is 5x faster than that. The gist is still up on my GitHub, same as my YouTube name. So there *is* actionable information in this video :).
The entire concept of a dynamically allocated array as a single structure or container comes straight from data structures and algorithms. The geometric progression is based on minimizing the number of instructions while also increasing performance and efficiency; in simple terms, it's driven by the Big-O time and space complexity of the operations on the desired container.

Although Python is a high-level scripting or interpreted language compared to languages such as C and C++ and abstracts away these details, why would the concepts of memory layout, locality, access times, and geometric expansion be any different? They aren't! These concepts apply in JavaScript or any other language that does computations on datasets; even database and querying languages such as SQL or PHP have these principles embedded within their internal frameworks. It's the nature of things: it always goes back to the mailbox problem of indexing into address spaces, and the phonebook problem with all of the different searching and sorting algorithms. Nice video though! I really enjoyed it!
I mainly use Rust, and there the capacity is actually important in some cases. For example, when you know you're going to add 100 elements to a vector (Rust's version of a list), it would be very wasteful to reallocate the vec a few times before actually reaching the needed capacity. That's why there is a `Vec::with_capacity` constructor.
@@raccoon1160 Yeah, but... it still needs an explanation; weird things have explanations too, especially if they repeat the same behavior (it seems to always be 8, not just a random number).
This one seems to be subtle. Checking the bytecode for a couple of functions with `dis.dis`, `[3,3,3]` caused a list to be created and then extended with a tuple `(3,3,3)`. `[e,e,e]`, on the other hand, used BUILD_LIST with 3 arguments, and BUILD_LIST in turn created the list with reserved capacity using PyList_New() before populating it backwards. extend used list_resize to allocate space, and that added some margin. These two behaviours can be demonstrated using `l = list(t)` vs `l = list(); l.extend(t)`. That's from my testing on CPython 3.9 on Linux x86_64. These are all implementation details that can and do vary between implementations.
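Something like this reproduces the check, if anyone wants to see the two code paths (exact bytecode varies by CPython version; the comments describe the 3.9+ behaviour):

```python
import dis

# Constant elements: built empty, then extended from a const tuple
# (LIST_EXTEND path, CPython 3.9+; older versions differ).
dis.dis(lambda: [3, 3, 3])

e = 3
# Non-constant elements: BUILD_LIST 3 reserves the capacity up front.
dis.dis(lambda: [e, e, e])
```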
@@igorswies5913 True, but there's also an array type (in the array module). Python's array type stores fixed size numbers which are converted to objects when read. This can make it more compact than a list, which has to store references to all elements, where each reference is usually (on 64-bit systems) the largest size available in array. Users are generally directed to use numpy instead, which offers more useful high level operations.
I know this was more on the informative side of videos, but honestly I was mostly laughing at your comments on screen lol. In terms of the vid itself, pretty much what I expected in terms of memory management, but awesome vid nevertheless!
Unfortunately it gets messier when you put lists in your lists. I knew I had an issue with the second version; it hurt me when I saw it, and I just remembered why. If this comment survives birth, you can run this piece of code and see the problem-output:

```python
aa = [[]] * 3
bb = [[] for _ in range(3)]
print(aa)
print(bb)
aa[0].append(1)
bb[0].append(1)
print(aa)
print(bb)
```
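(For anyone reading without running it: `aa` ends up as `[[1], [1], [1]]`, because `[[]]*3` repeats a reference to the *same* inner list three times, while `bb` is `[[1], [], []]`, since the comprehension builds a fresh inner list on each iteration.)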
Is that why, when we use async, we need to lock the list before editing or appending it? Is there a chance that the list could be edited while it's being moved to a new memory size?
This would be a potential issue with threading, but not with async, since async only swaps control at explicit points like "yield"/"await". However, current implementations of Python use a global lock (the GIL) to prevent this from being an issue even with threading.
Would be good to get a video about "loadFactor", which is what Java's HashMaps use to determine when to add new buckets. I'm also curious how they arrived at this 9/8ths growth factor. I thought the capacity generally doubles each time (I think that's the case for Java's ArrayList), which is obviously more wasteful of memory, but it's understandable why they went with that approach.
loadFactor is not used for lists, only dicts. PyPy keeps the dict size between 1/3 and 2/3 full; AFAIK CPython is the same. The list type grows with the following code in PyPy, which AFAIK was ported from CPython directly:

```python
if newsize < 9:
    some = 3
else:
    some = 6
some += newsize >> 3
new_allocated = newsize + some
```
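You can watch those growth points from Python itself (a sketch; the exact sizes depend on your CPython version):

```python
import sys

# Probe where CPython actually reallocates: getsizeof reflects the
# current capacity, so it only jumps when the buffer grows.
lst, last = [], None
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        print(f"len={len(lst):>2}  bytes={size}")
        last = size
```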
I was playing around with sys.getsizeof() on other built-in types and I stumbled on something surprising (for me). For strings, of course, the concept of capacity doesn't apply, because they are an immutable type, so it makes sense that exactly the amount of memory needed to store a given string is allocated, and no more. What surprised me is that for an empty string, sys.getsizeof returns 49. I expected a multiple of 8 bytes.
Strings use multiple different in-memory encodings. E.g. if your string is all ASCII it uses 1 byte per character, but if you include an emoji it increases to 2 or more bytes per character. The string therefore needs to store metadata inside the object about which scheme it is using, which is why you are getting this answer.
@@mCoding While true, that doesn't quite reveal why it's an odd size. I believe the reason is the encoding chosen was UTF8, and in that case an extra byte is allocated for a C-style NUL terminator.
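A quick probe for the curious (exact byte counts vary by CPython version and platform, but the pattern shows through):

```python
import sys

# ASCII-only strings use the most compact layout (1 byte per char);
# wider code points pay a larger header plus 2 or 4 bytes per char.
for s in ["", "a", "ab", "é", "🐍"]:
    print(repr(s), sys.getsizeof(s))
```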
Hmm... I hesitated before clicking the video, and it turned out exactly as I expected: curious but useless. If you care about exact memory management, you shouldn't be using Python. For certain structured data we use NumPy and/or Pandas; otherwise, switch back to C and do your own malloc() and free().
@@mCoding At first I thought it was some prime number with a special history, where group theorists make math jokes about it, but Google gave no answer. Then I thought it might already be HEX code, but it wasn't directly apparent; after going from binary to ASCII I already smelled the answer. Now I wonder if I subscribed because of the great content or the subliminal messages...
_sees title_ pre-coffee-brain: "well yeah it's python, no wonder the memory management is hella inefficient as well" but joke aside, i wouldn't call that "remembering" or "what *you* did" (please *do not* rely on any of those effects for anything!), it's just a side effect of an allocator not wasting all your CPU all the time to save on 2 bytes of theoretically unused ram, what gives?
Is a Python list contiguous? I don't see a reason it'd need to be in order to function, but I could see it for speed reasons, to make indexing fast like an array.
Yes, python lists are contiguous. However, all python objects are pointers, so it is a contiguous list of pointers that point to allocated memory that is probably not contiguous.
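One indirect way to see that pointer array (assuming a 64-bit build, where each pointer slot is 8 bytes):

```python
import sys

# [x] * n allocates exactly n slots, so the size difference from an
# empty list should grow by one pointer (8 bytes) per element.
base = sys.getsizeof([])
for n in range(1, 5):
    print(n, sys.getsizeof([None] * n) - base)
```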
I can see this knowledge being useful if you for some reason are using a ton of lists and loops and trying to optimize the code. But honestly, nobody codes in python for performance so it's probably, just as you said, good to know but not very useful.
Python is just a weird language all round. In the year 2022 it still doesn't understand tabs vs. spaces, for one thing. Then when you import another file, any code in that file that is OUTSIDE of a function is just merrily run by the calling file? That quirk of the language has been the subject of quite a few security bootcamp exercises I've worked on.
We understand that mixing tabs and spaces leads to confusion. And yes, running Python source runs statements. That's kind of a key thing. Are you comparing to programs that don't have top level statements like class and def?
Not directly; capacity is an implementation detail of CPython, so the best you can do is hope that [0]*n gives a capacity of exactly n for all n. But remember that Python also automatically shrinks the capacity, so you can't even do [0]*n and then clear(). I've been reading the implementation, and because of the automatic shrinking, providing any kind of control over the capacity would be a big rewrite at the very least.
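The automatic shrinking is easy to observe (again, a CPython implementation detail, so treat this as illustrative):

```python
import sys

lst = [0] * 100_000
print(sys.getsizeof(lst))  # large: holds capacity for all 100k pointers

lst.clear()
print(sys.getsizeof(lst))  # back to (roughly) the size of an empty list
```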
That's a method too. But imagine if you have a list that has a million elements and you want to add another one. Would you increase the capacity to 2 million?
I'm not very familiar with Rust (more to learn in the future, I suppose!). Does Rust not have a comprehensive standard library? I hear so much good stuff about it that I'd be surprised to learn its standard library was lacking.
Most of these things are pointers to objects in memory. Hence, 64 bits or 8 bytes. And on 64b processors, smaller value sizes are usually not worth the hassle.
You seem confused. Those two are equal, because 'a' is the same immutable object either way. The difference is important if you replace 'a' with an expression that creates a mutable object, e.g. []. In this case they'll still be equal (as all the elements are empty) but different structure: [a,a,a] vs [a,b,c].
The numbers in the video are output from the source code linked in the description and shown on screen, so in that sense the video is self-fact-checking, though you are welcome to run the code for yourself. If you are getting a different answer, you are most likely just using a different version of Python. One of the main points of the video is that this behavior is an implementation detail subject to change from version to version, not a fundamental property.
2129834018397712114277 isn't an integer limit, because log_2(2129834018397712114278) isn't an integer. You chose the number because it was the most conveniently random long thing you could type.