Beware that this can have performance impacts on some CPU architectures. For example, on an ARM Cortex-A8 configured with a 32K L1 data cache, if you've got an even page that aliases an odd page (by even/odd I'm referring to bit 12 of the virtual address), then accessing a cache line via one page will evict that same cache line if it was last accessed via the other page.
Finally, I can download more RAM!! On a more serious note, I wonder if it would be feasible to incorporate this functionality directly into Linux at the kernel level? If so, would there be any way to ensure user-space code isn't running MESH on top of kernel provided MESH?
As they've described it, it's just a modification to libc. If you preload a userspace malloc replacement, it replaces the existing malloc, so only one instance of the algorithm ever runs per program. As long as the two implementations replace the same functions, there shouldn't be a conflict, since the compactor runs independently for each program. If the system version were substantially different in structure, then some issues might start to crop up.
"Well, there already is one indirection.... how could we abuse it?" ...... Damn.... I would not think of that, and I would certainly not think it's possible... I am very impressed.
I've been trying out mimalloc, which can also be used through LD_PRELOAD. While not "compacting" as I understand it, they claim this: "*free list sharding*: instead of one big free list (per size class) we have many smaller lists per 'mimalloc page', which reduces fragmentation and increases locality -- things that are allocated close in time get allocated close in memory. (A mimalloc page contains blocks of one size class and is usually 64KiB on a 64-bit system.)"
So could this also be used to lower or remove the need for extra memory in languages with a GC? And on a similar note: the talk shows an example of Ruby with Mesh, but would there be benefits to building a GC "from the ground up" with the techniques of Mesh?
(1) re: reducing the need for extra memory with GC: yes, it can reduce the memory footprint of systems with garbage collection, as long as they use malloc/free (e.g., Ruby & Python) (2) re: GC + Mesh: we are exploring exactly that question!
No. A copying GC frees unused memory by copying the live allocations into a clean space, and the need for that clean space is the root cause of why GC'd code isn't highly efficient in memory usage.
He mentioned this in the last minute or so, in response to a question. GCs can do something even better, which is to move and compact individual allocations. And they have to traverse the graph of live objects anyway to identify and remove unreachable ones. So there's not much benefit to this technique, which is basically opportunistically mashing together pages when possible to recover some fraction of unused memory, when you could just recover all of it by compacting the whole heap.
No. With GC you need to make a trade-off between throughput, latency, and memory consumption. If you want low memory consumption, there are GCs that don't have big memory overhead, but they slow down your application.
I think there's one important point missing in the presentation, and I'm surprised nobody asked about it. After a mesh, what happens to the free virtual space that conflicts with already-allocated space in another virtual page of the same mesh? Is it marked as occupied to avoid conflicting allocations, or do you un-mesh the pages when a conflicting allocation occurs?
Any program that allocates memory without ever freeing it would see no change, as would programs that only use the stack. For example, the typical "hello world" in C does not dynamically allocate any memory in userspace. (printf with a const cstring input compiles to puts, which, in glibc at least, just does a kernel "write" syscall with no allocation, which writes to the stdout file)
I don't understand how this won't lead to VMA fragmentation over time (as VM regions are remapped to "meshed" regions of physical memory), in which case you're just trading one form of fragmentation for another. Has this been tested with very long-running, high-fragmentation server workloads?
Ok, I like this! When will this be the default malloc implementation? *EDIT:* Ok, I was excited and tried this. I think many people had the same first thought: *Chrome!* ..... *segfault* ... ok, but firefox works, right? .... also segfault... - Ok, well... git works fine, but doesn't really need it. XD I might try a few other things, but I think now I know *_why_* it isn't the default. (Don't get me wrong: I still think it's amazing. Life is just not this nice... Maybe it will work with some effort, but clearly those programs do some crazy things...)
This is really cool! I'm tinkering with some things (like a new interpreted language) that might benefit from this, eventually... edit: dang, windows is WIP.
2:08 ... wait, you have that page... it will most likely not result in a segfault; you will just read the wrong data, as far as I understand. Maybe I'm wrong, but if I had a pointer to that place a moment ago, it will probably not cause a segfault. That's what I think, anyway... I don't even do low-level programming, though...