Nice one! Might be good to critique the system allocator on Alpine too, since glibc isn't the only system allocator out there, and Alpine's seems to perform quite poorly in some cases. Nice to have mimalloc and jemalloc available to work around it.
Looks like musl (the standard libc on Alpine) has a bespoke malloc implementation, mallocng (elixir.bootlin.com/musl/latest/source/src/malloc/mallocng/malloc.c). This allocator is significantly slower than glibc's (and than jemalloc/mimalloc). The good news is that it's just as easy to replace the allocator with musl as it is with glibc. After switching to mimalloc+musl, the test application shown at the end of the video runs only about 4% slower than mimalloc+glibc (roughly on par with glibc alone), while musl alone is 38% slower than mimalloc+musl. jemalloc_perf+musl performs the same as mimalloc+musl, but with the high initial memory overhead seen with jemalloc+glibc.
Didn't quite catch why the allocator wouldn't give the 2559 dirty pages back to the OS while those 64 bytes are in use. Does the allocator need us to free all the requested memory before it gives those pages back, or is it because we wrote data to the whole 10 MB but freed only 9.9 MB?
This is difficult to explain, sorry for the confusion. There are two types of allocation in Linux. One uses sbrk to move something called the break for the heap up and down. Think of the break like a line. Say you move the break up 10 MB, either in one big jump or in many small ones, use all of that 10 MB, and then free everything except the very top: the allocator cannot move the break back down, because that very top is still in use. The other way to allocate is via mmap. If you use mmap by hand, you can grab a 10 MB chunk of memory, use it, and then mark 9.9 MB of that memory as DONTNEED (madvise's MADV_DONTNEED flag). I've not seen that sort of behaviour when an allocator uses mmap to grab memory and then hands it to you via malloc/free. In the case where an allocator uses mmap, it will (hopefully) mark that 10 MB chunk as MADV_DONTNEED when you are completely done with it. Also, allocators try to be smart with mmap. E.g. jemalloc will wait some amount of time before marking an mmap'd region as MADV_DONTNEED, in case you ask for more memory.
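For what it's worth, here is a minimal sketch of the "mmap by hand" case described above, assuming Linux and a private anonymous mapping. The function name `release_all_but_tail` and the two size macros are made up for illustration; only `mmap`, `madvise`, `munmap` and `sysconf` are real calls. It dirties 10 MB, keeps the last page (where the 64 bytes live) and hands everything else back with `MADV_DONTNEED`. It is not what any particular allocator does internally.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (10u * 1024u * 1024u) /* the 10 MB from the discussion */
#define TAIL_SIZE   64u                   /* the 64 bytes still in use */

/* Hypothetical helper: returns 1 if the released pages read back as zeros
 * while the tail bytes survive, 0 if not, and -1 if a syscall failed. */
int release_all_but_tail(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* Write to every byte so that every page becomes dirty. */
    memset(region, 0xAB, REGION_SIZE);

    /* Keep only the last page; tell the kernel the rest is not needed. */
    size_t released = REGION_SIZE - (size_t)page;
    if (madvise(region, released, MADV_DONTNEED) != 0) {
        munmap(region, REGION_SIZE);
        return -1;
    }

    int ok = 1;

    /* On Linux, dropped private anonymous pages come back zero-filled. */
    for (size_t i = 0; i < released; i += (size_t)page)
        if (region[i] != 0)
            ok = 0;

    /* The bytes at the very top must still hold what we wrote. */
    for (size_t i = REGION_SIZE - TAIL_SIZE; i < REGION_SIZE; i++)
        if (region[i] != 0xAB)
            ok = 0;

    munmap(region, REGION_SIZE);
    return ok;
}
```

Calling `release_all_but_tail()` from a `main` on Linux should return 1: the kernel was free to drop every page below the last one, which is exactly what the break cannot do in the sbrk case, since the break can only be lowered from the top.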
@masmullin So if the allocator uses the sbrk syscall, there is a particular reason why the 9.9 MB isn't freed (because the 64 bytes sit on top of the 9.9 MB). But with mmap it seems nothing prevents the allocator from freeing the 9.9 MB if it wants to, because mmap doesn't raise the break address but instead gives us pages of memory somewhere else. So is it true that 'dirty pages' are really only possible with sbrk, because an allocator that uses mmap can make a syscall to free the memory pages that were already freed through the allocator API?