
On Sat, Jan 29, 2011 at 2:51 AM, Nevin Liber <nevin@eviloverlord.com> wrote:
On 28 January 2011 12:22, Dean Michael Berris <mikhailberis@gmail.com> wrote:
The fact that contiguous virtual pages aren't necessarily backed by contiguous physical memory compounds the problem even more. You feel this a lot when your system starts thrashing and the VMM does the work of swapping pages and re-arranging page tables for you. The round-trips between a CPU page fault and the VMM's page-fault handler, along with the cost of potentially invalidating the L1/L2/L3 caches, are significant in environments where you need to get as many things done per second as physically possible. If you can avoid having to go through this dance with a sane memory management strategy, I think that's a win even in the trivial case.
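To sketch what I mean by a sane strategy -- hypothetical names (allocate_segment, free_segment) and a hard-coded 4 KiB page size that a real implementation would query from the OS: storage carved into page-aligned, page-sized segments never asks the allocator for a large contiguous range, and no segment ever straddles a page boundary.

#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <new>        // std::bad_alloc
#include <cstddef>    // std::size_t

// Sketch only: fixed, page-sized segments, assuming 4 KiB pages.
// A real implementation would use sysconf(_SC_PAGESIZE).
const std::size_t kPageSize = 4096;

struct page_segment {
    char data[kPageSize];
};

page_segment* allocate_segment() {
    void* p = 0;
    // Page-aligned storage: one segment maps to exactly one page,
    // so the allocator never needs to find a large contiguous range.
    if (posix_memalign(&p, kPageSize, sizeof(page_segment)) != 0)
        throw std::bad_alloc();
    return static_cast<page_segment*>(p);
}

void free_segment(page_segment* s) {
    free(s);
}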
If I have to keep data in physical memory, the way to use the fewest pages is to store it contiguously with respect to virtual memory. Anything else will (except for degenerate cases) require *more* physical pages of RAM.
More pages doesn't mean more contiguous pages. If you really need contiguous pages, you already have those containers in place. I never said that the segmented storage for strings would require less memory. ;)
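For concreteness, assuming 4 KiB pages: 1 MiB of character data needs at least 1 MiB / 4 KiB = 256 pages either way. Stored contiguously, those 256 pages must be adjacent in virtual address space; stored in segments, any 256 free pages will do, plus a little per-segment bookkeeping -- which is exactly why the segmented layout can use slightly more memory, not less.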
There are many other reasons to break something up into a segmented data structure, but avoiding thrashing when storing the object is not one of them.
Hmmm... so how is avoiding the (mostly unnecessary) paging in and out, by not requiring contiguous pages, not a way of avoiding thrashing? (Too many negatives; too tired to edit that message.)
Honestly, trying to argue that your new class is more efficient *on all fronts* is futile. You'd do far better showing the tradeoffs you are making: which things are more efficient, and being honest about which things are less efficient.
Really, look -- I didn't say my new class is more efficient on all fronts; I think you're arguing against a strawman here. What was being discussed in this particular line of the thread is the effect of requiring contiguous pages of memory in a data structure like std::vector or std::string. For cases where you don't need contiguous pages -- as with immutable strings -- not using them is a good thing, better than unnecessarily forcing their use. It was never about saving memory; it was about limiting the potential for fragmentation and VMM paging involvement. Also, read the paper now that it's there. This gets tiring, especially when a strawman or red herring is thrown in every once in a while.
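For the record, here is the shape of the thing I'm describing -- a sketch only, with invented names (immutable_string, block_size), not the paper's actual interface: an immutable string stored as independently allocated, fixed-size blocks, none of which need to be adjacent to any other in memory.

#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen, std::memcpy
#include <vector>

// Sketch of an immutable, segmented string; illustrative only
// (not exception-safe, not the proposed interface).
class immutable_string {
    static const std::size_t block_size = 4096;  // assumed segment size
    std::vector<char*> blocks_;  // independently allocated segments
    std::size_t size_;

    immutable_string(immutable_string const&);             // non-copyable
    immutable_string& operator=(immutable_string const&);  // in this sketch
public:
    explicit immutable_string(char const* s) : size_(std::strlen(s)) {
        for (std::size_t i = 0; i < size_; i += block_size) {
            std::size_t n = size_ - i;
            if (n > block_size) n = block_size;
            char* block = new char[block_size];
            std::memcpy(block, s + i, n);
            blocks_.push_back(block);
        }
    }
    ~immutable_string() {
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            delete[] blocks_[i];
    }
    std::size_t size() const { return size_; }
    char operator[](std::size_t i) const {
        // No block needs to sit next to any other block.
        return blocks_[i / block_size][i % block_size];
    }
};

Indexing costs a division and an extra dereference compared to a flat array; that is part of the tradeoff under discussion, not something I'm hiding.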
It depends on what you're trying to do and what algorithms you're applying to your std::deque. This is the whole crux of the segmented iterators debate/paper by Matt Austern and others. Segmented versions of the algorithms would make std::deque far more attractive if you really needed a data structure that doesn't promote heap fragmentation as much as a growing std::vector does.
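The gist of the segmented-algorithms idea, sketched with invented accessor names (segment_begin, segment_end, local_iterator are illustrative, not Austern's actual interface): the algorithm walks the segments in an outer loop and runs a plain, cache-friendly loop over each contiguous block, instead of checking for a segment boundary on every increment.

// Sketch of a "hierarchical" algorithm in the spirit of the
// segmented-iterators paper; accessor names are invented here.
template <typename Segmented, typename F>
F segmented_for_each(Segmented& c, F f) {
    for (typename Segmented::segment_iterator s = c.segment_begin();
         s != c.segment_end(); ++s) {
        // Inner loop is an ordinary pass over one contiguous segment.
        for (typename Segmented::local_iterator i = s->begin();
             i != s->end(); ++i) {
            f(*i);
        }
    }
    return f;
}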
Given that a deque requires a new heap allocation for every N elements, how does it avoid heap fragmentation?
Consider the flip side with a std::vector, which has no way to reallocate in place: it requires a *larger* heap allocation every time it grows. Now riddle me this: what happens to the memory the vector used to occupy? Doesn't having irregularly sized allocations promote more fragmentation than having regularly sized ones? Same-sized segments are already a good thing for cases where you have an optimal way of allocating pages -- and most modern OSes have one. Growing a std::deque introduces less fragmentation, and therefore less thrashing, because the sizes of the segments can be tuned and aligned properly.
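You can watch this happen with a toy program -- growth factors and addresses vary by implementation, so treat the output as illustrative only:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    void const* last = 0;
    for (int i = 0; i < 100000; ++i) {
        v.push_back(i);
        if (static_cast<void const*>(&v[0]) != last) {
            // The vector moved: it allocated a strictly larger block
            // and released the old one, leaving a differently-sized
            // hole in the heap each time it relocates.
            std::printf("capacity %lu at %p\n",
                        static_cast<unsigned long>(v.capacity()),
                        static_cast<void const*>(&v[0]));
            last = &v[0];
        }
    }
    return 0;
}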
And I think this is straying off-topic from Boost...
I agree. Let's stop this talk now.

--
Dean Michael Berris
about.me/deanberris