
On 01/29/2011 01:25 AM, Dean Michael Berris wrote:
On Sat, Jan 29, 2011 at 6:53 AM, Steven Watanabe <watanabesj@gmail.com> wrote: [... elided by Patrick ...]
Note that the OS never gets an mmap request for less than 128 KB, and all the memory allocated via sbrk is always contiguous.
Yes, and this is the problem with sbrk: if you rely on your allocator to use sbrk, there is no guarantee that sbrk will simply expand the available segment in place. That means that, no matter what you do, the VMM will actually have to find the memory and hand you that chunk, which can mean swapping pages in and out. You reduce this likelihood by requesting page-sized, page-aligned chunks and using them properly. Making the data in these chunks immutable also makes "shareability" much saner and the performance profile much more predictable.
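The following is a rough POSIX sketch of a page-sized, page-aligned, immutable chunk. It assumes a system with anonymous `mmap` and `mprotect`, and it requests the chunk from `mmap` directly rather than through sbrk. The helper names `round_up_to_page` and `make_frozen_chunk` are hypothetical. The size is rounded up to whole pages, the chunk is filled once, and write permission is then dropped.

```c
#define _DEFAULT_SOURCE       /* expose MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a whole number of pages (the page size is a power of two). */
size_t round_up_to_page(size_t n)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    assert(ps != 0 && (ps & (ps - 1)) == 0);
    return (n + ps - 1) & ~(ps - 1);
}

/* Hypothetical helper: map a page-aligned chunk large enough for `len`
 * bytes (len must be > 0), copy `src` into it once, then make it
 * read-only so the contents are immutable and safe to share between
 * readers. On success, stores the mapped size in *mapped_len.
 * Returns NULL on failure. */
const void *make_frozen_chunk(const void *src, size_t len, size_t *mapped_len)
{
    size_t size = round_up_to_page(len);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;            /* mmap returns page-aligned memory or fails */
    memcpy(p, src, len);
    if (mprotect(p, size, PROT_READ) != 0) {
        munmap(p, size);
        return NULL;
    }
    *mapped_len = size;
    return p;
}
```

The caller releases the chunk with `munmap((void *)chunk, mapped_len)`. Any write after freezing raises `SIGSEGV`, which is how this sketch enforces immutability.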
The problem is then compounded when your program calls sbrk/brk on one core and the call fails. In that case mmap is called, and the memory it returns will actually reside in the memory module handled by one particular NUMA CPU. Now imagine using these chunks from another thread that is not running on the same core, and it goes downhill from there. That doesn't even factor in the cache hits and misses that can kill your performance, especially if the VMM has to page in or out memory that has been evicted from the processors' L3 cache (if there even is an L3).
All of this stems from the fact that, from the start of the program onward, you cannot rely on enough memory being available at any given point. It helps to think about it differently. Assume that every memory allocation you make will cause a page fault and may cause the VMM to allocate a page for you. Under that assumption, you should use that page of memory properly and minimize the number of page faults you generate. In terms of playing well with the OS VMM, this means using your memory wisely so that the VMM doesn't have to work too hard when you request a data segment of a given size. The same applies to sbrk, because asking to extend the memory available to your program as a "heap" will almost always cause page faults.
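The sketch below makes the "every allocation may fault" model concrete. It counts the minor page faults taken when a fresh anonymous mapping is first touched, using `getrusage` and its `ru_minflt` field. It assumes a POSIX system that reports `ru_minflt` (Linux does). The helper name `faults_on_first_touch` is hypothetical. Exact counts depend on the kernel; features such as huge pages can change them.

```c
#define _DEFAULT_SOURCE       /* expose MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: map `pages` fresh pages, write one byte to each,
 * and return how many minor page faults the process took while doing so.
 * Returns -1 if the mapping or getrusage fails. */
long faults_on_first_touch(size_t pages)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = pages * ps;
    assert(pages > 0);

    volatile char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, size);
        return -1;
    }
    /* mmap only reserved address space; the first write to each page
     * is what makes the kernel supply a physical page (a minor fault). */
    for (size_t i = 0; i < pages; i++)
        p[i * ps] = 1;
    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)p, size);
        return -1;
    }

    munmap((void *)p, size);
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical Linux system, `faults_on_first_touch(16)` reports on the order of one fault per page touched. Reusing pages you have already faulted in, rather than requesting fresh ones, avoids that cost.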
Just food for thought.

Have you guys thought about how this changes on a small, memory-constrained embedded device such as a controller or a phone?

Patrick