Proposal: Monotonic Containers

Hello,

There is a need in high-performance realtime software for containers that may only grow in size. These are typically reset at the end of each update cycle.

The proposal here consists of a base 'monotonic allocator', which takes its storage from a buffer that you provide. Allocations are made from this buffer, and deallocation only calls an object's destructor; no actual memory management is performed at all.

This is very useful for realtime systems such as physics or rendering systems. These systems build up lists and other containers over the period of an update frame; at the end of that frame the storage can simply be reset.

The benefit of using this system over a normal std::allocator is that memory is not fragmented, and memory allocation and deallocation is essentially optimal.

Within the package are:

boost.monotonic.inline_storage
boost.monotonic.list
boost.monotonic.vector
boost.monotonic.map
boost.monotonic.set
boost.monotonic.ptr_list

I didn't bother adding the other ptr_* containers yet.

The files for the proposal are in the vault at http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator.zip&directory=&

I'd be happy to address any issues or comments. There are some improvements that could be made.

Regards, Christian.
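[Editor's note] The scheme described above can be sketched as a minimal modern-C++ allocator. The names here (bump_storage, bump_allocator) are illustrative, not the proposal's actual boost::monotonic classes: allocation just advances a pointer into a fixed buffer, deallocation is a no-op, and reset() reclaims everything at the end of an update cycle.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Fixed buffer plus bump pointer: the essence of a monotonic allocator.
struct bump_storage {
    alignas(std::max_align_t) char buf[4096];
    std::size_t used = 0;

    void* allocate(std::size_t n, std::size_t align) {
        std::size_t p = (used + align - 1) & ~(align - 1); // align the offset
        if (p + n > sizeof(buf)) throw std::bad_alloc();   // buffer exhausted
        used = p + n;
        return buf + p;
    }
    void reset() { used = 0; } // end of frame: all memory reclaimed at once
};

template <class T>
struct bump_allocator {
    typedef T value_type;
    bump_storage* store;

    explicit bump_allocator(bump_storage& s) : store(&s) {}
    template <class U>
    bump_allocator(const bump_allocator<U>& o) : store(o.store) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(store->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {} // no-op: only reset() frees memory
};

template <class T, class U>
bool operator==(const bump_allocator<T>& a, const bump_allocator<U>& b) { return a.store == b.store; }
template <class T, class U>
bool operator!=(const bump_allocator<T>& a, const bump_allocator<U>& b) { return !(a == b); }
```

A container then draws its memory from the storage, e.g. std::vector<int, bump_allocator<int>>. Note that the doubling-growth interaction discussed later in the thread still applies to any container used this way.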

Christian Schladetsch skrev:
Hello,
There is a need in high-performance realtime software to use containers that may only grow in size. These are typically reset at the end of each update cycle.
The proposal here consists of a base 'monotonic allocator', which takes its storage from a buffer that you provide. Allocations are made from this buffer, and deallocation only calls an object's destructor; no actual memory-management is performed at all.
This is very useful for realtime systems such as physics or rendering systems. These systems build up lists and other containers over the period of an update frame; at the end of that frame the storage can simply be reset.
The benefit of using this system over a normal std::allocator is that memory is not fragmented, and memory allocation and deallocation is basically optimal.
Within the package are:
boost.monotonic.inline_storage boost.monotonic.list boost.monotonic.vector boost.monotonic.map boost.monotonic.set boost.monotonic.ptr_list
I didn't bother adding the other ptr_* containers yet.
The files for the proposal are in the vault at http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator.zip&directory=&
Could you provide an online link to the docs, please. -Thorsten

Hi Thorsten,

Could you provide an online link to the docs, please.

What you see is what you get at the moment. I guess this just means I should write some.

Thanks, Christian.

Hello,

I have added a few pages of documentation, with motivational examples, to a new upload at http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator_dox.zip&directory=&

On Tue, Jun 9, 2009 at 7:24 PM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
Hi Thorsten,
Could you provide an online link to the docs, please.
What you see is what you get at the moment. I guess this just means I should write some.
Thanks, Christian.

Christian Schladetsch skrev:
Hello,
I have added a few pages of documentation, with motivational examples, to a new upload at http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator_dox.zip&directory=&
That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother. -Thorsten

Hello,

I agree that it is a little inconvenient, but: click on the link, open the archive, and open "index.html".

I guess you expected a personally hosted site. I do not run a personal website, so I can't post things locally. If there is a better way of doing it, please advise.

Christian.

On Tue, Jun 9, 2009 at 11:46 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christian Schladetsch skrev:
Hello,
I have added a few pages of documentation, with motivational examples, to a new upload at
That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother.
-Thorsten _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hello,

That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother.
OK, well, on second thought: what? I am also busy. I gave you a link to code and documentation in the one archive. http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator_dox.zip&directory=&

Click on the link, get the code and the documentation. Sorry that you have to click into the archive twice, but Boost doesn't have a better way of doing it at the moment that I am aware of.

Christian.

Christian Schladetsch wrote: On Tuesday, June 09, 2009 8:07 AM
That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother.
OK well on second thought, what?
I am also busy.
I gave you a link to code and documentation in the one archive.
Click on the link, get the code and the documentation. Sorry that you have to click into the archive twice, but boost doesn't have a better way of doing it ATM that I am aware of.
You're assuming that everyone uses an environment which makes viewing your HTML page just a couple of clicks, which is not necessarily true. Even when that's the case, those with no further interest must also remove the file(s) that result.

A bigger issue for me is the attitude that seems to pervade your message traffic, including that in the DirectX thread. I think I see efforts at patience on your part, when I read things like, "I have repeatedly said, ...." It suggests frustration on your part that folks "aren't getting it" or aren't paying attention. However, I have read a great many posts from you that I can't construe as other than antagonistic, like this one.

You are asking folks to take their time to investigate something you've proposed. Some folks will immediately jump at the idea and will be willing to expend a good deal of effort to evaluate what you have. Some are uninterested and won't expend any effort. There are others who have a passing interest and might have a look at what you've done, but could easily be thwarted from doing so by trivial things like having to download an archive, extract the files, view the HTML page, and then delete the files.

Thorsten either fits in the latter group, or was concerned for your sake about that group. He offered you helpful advice that should increase the number of people that give your proposal at least a cursory look. Your response appears intended to denigrate Thorsten for his laziness and imposition on you. That is hardly constructive and discourages others from working with you.

I hope you take the time to respond kindly and to assume the best of others henceforth. If I have misconstrued your attitude, then I ask that you think a little more about how you word your posts in order to avoid further misinterpretation insofar as it is within your power to do so.
For your edification, you might read http://www.boost.org/community/policy.html#culture and http://www.boost.org/community/policy.html#effective-discussions.

Rob Stewart robert.stewart@sig.com
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

On Jun 9, 2009, at 8:06 AM, Christian Schladetsch wrote:
Hello,
That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother.
OK well on second thought, what?
I am also busy.
But when trying to sell something, you should make it as easy as possible for the potential "customer". /David

Christian Schladetsch escribió:
Hello,
I agree that it is a little inconvenient, but:
Click on the link, open, and open "index.html"
I guess you expected a personally hosted site. I do not run a personal website, so I can't post things locally. If there is a better way of doing it, please advise.
You can request write access to the Boost sandbox and upload your stuff there so that people can access the docs directly, as exemplified by http://svn.boost.org/svn/boost/sandbox/msm/libs/msm/doc/index.html

HTH, Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

On 9 Jun 2009, at 12:56, Christian Schladetsch wrote:
Hello,
I agree that it is a little inconvenient, but:
Click on the link, open, and open "index.html"
I guess you expected a personally hosted site. I do not run a personal website, so I can't post things locally. If there is a better way of doing it, please advise.
Christian.
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector. I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible: you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think.

Consider your example, in g++ (whose internals I know):

boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack
boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage
foreach (object in world)
{
    if (IsDead(object))
        deathrow.push_back(object); // allocation is just advancing a pointer
}

This will build a vector which continually expands through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the total storage consumed 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer.

Am I missing something?

Chris
On Tue, Jun 9, 2009 at 11:46 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christian Schladetsch skrev:
Hello,
I have added a few pages of documentation, with motivational examples, to a new upload at
That is not what I asked for. I wanted online docs and code. We are very busy people, and anything that takes extra time is likely to mean we don't bother.
-Thorsten
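[Editor's note] Chris's arithmetic can be checked mechanically. Under doubling growth where freed blocks are never reused (exactly the monotonic situation), the total storage consumed to reach a given capacity is roughly twice that capacity. This is a standalone sketch, not code from either proposal:

```cpp
// Total buffer slots consumed by a doubling vector whose old blocks
// are never reclaimed: 1 + 2 + 4 + ... + target_capacity.
int storage_consumed_by_doubling(int target_capacity) {
    int consumed = 0;
    for (int cap = 1; cap <= target_capacity; cap *= 2)
        consumed += cap; // each step allocates a fresh block; old ones become dead space
    return consumed;
}
```

Reaching 33 elements forces a growth step to capacity 64, consuming 1+2+...+64 = 127 slots in total, which overflows the 100-slot buffer in the example above.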

Hi Chris,

Am I missing something?

Yes. The vector allocations are done via a custom allocator.

Regards, Christian

On 9 Jun 2009, at 13:19, Christian Schladetsch wrote:
Hi Chris,
Am I missing something?
Yes. The vector allocations are done via a custom allocator.
I can see that. The following code asserts for me, g++ 4 on Mac OS X. Notice that I only need to do 33 iterations. Do you have even a basic test suite for this code? I couldn't find one, and this seems like a fairly fundamental bug. Perhaps it doesn't occur on Windows?

#include "boost/monotonic/vector.h"

int main()
{
    boost::monotonic::inline_storage<100*sizeof(int)> storage; // create local storage on the stack
    boost::monotonic::vector<int> deathrow(storage); // create a std::vector that uses this storage
    for (int i = 0; i < 33; ++i)
        deathrow.push_back(i);
}

The following code asserts for me, g++ 4 on Mac OS X. Notice how I only need to do 33 iterations.
I will get back to you. No, I haven't done exhaustive cross-platform testing, but this is surprising and indicates a basic fault. I use Ubuntu by default.
Do you have even a basic test suite for this code? I couldn't find one, and this seems like a fairly fundamental bug. Perhaps it doesn't occur on Windows?
#include "boost/monotonic/vector.h"
int main()
{
    boost::monotonic::inline_storage<100*sizeof(int)> storage; // create local storage on the stack
    boost::monotonic::vector<int> deathrow(storage); // create a std::vector that uses this storage
    for (int i = 0; i < 33; ++i)
        deathrow.push_back(i);
}
Christian.

On 9 Jun 2009, at 13:34, Christian Schladetsch wrote:
The following code asserts for me, g++ 4 on Mac OS X. Notice how I only need to do 33 iterations.
I will get back to you. No, I haven't done exhaustive cross-platform testing, but this is surprising and indicates a basic fault. I use Ubuntu by default.
I find the code breaks identically on Ubuntu and Windows. Furthermore, I'm afraid to say I'm sure this is unfixable. vector requests a new block of memory, copies data across, and then frees the old block. You can't make it do what you want to.

You could, in your wrapper, do a .reserve() for the appropriate amount of memory, but that's certainly not guaranteed by the standard to work in all cases -- I don't know if it would work in practice across multiple compilers. This would only work for vector, not the other containers, which would need their own fixes (for example, most std::deque implementations allocate in blocks of their own, large, predefined size, and also need a separate allocation for a "master list" of pointers).

I imagine with std::set you might be better off, but then again that always does a fixed-size allocation, so arguably there you would be better off with an optimised fixed-size allocator, which can efficiently handle frees anyway.

Chris
Do you have even a basic test suite for this code? I couldn't find one, and this seems like a fairly fundamental bug. Perhaps it doesn't occur on Windows?
#include "boost/monotonic/vector.h"
int main()
{
    boost::monotonic::inline_storage<100*sizeof(int)> storage; // create local storage on the stack
    boost::monotonic::vector<int> deathrow(storage); // create a std::vector that uses this storage
    for (int i = 0; i < 33; ++i)
        deathrow.push_back(i);
}
Christian.
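[Editor's note] The .reserve() workaround Chris mentions is easy to see with a plain std::vector: reserving the final size up front means push_back never reallocates, so a monotonic buffer would see a single block request instead of the doubling sequence. A small sketch (the function name is illustrative):

```cpp
#include <cstddef>
#include <vector>

// After reserve(n), pushing up to n elements is guaranteed not to
// reallocate; we verify this by checking the data pointer is unchanged.
std::size_t pushes_without_reallocation(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                       // single up-front allocation
    const int* before = v.data();
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));
    return v.data() == before ? n : 0;  // unchanged pointer => no reallocation
}
```

As Chris notes, this only helps vector; node-based and deque-like containers allocate in their own patterns and need different treatment.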

Christopher Jefferson skrev:
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector.
I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible, because you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think.
Consider your example, in g++ (which I know how it operates internally)
boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack
boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage
foreach (object in world)
{
    if (IsDead(object))
        deathrow.push_back(object); // allocation is just advancing a pointer
}
This will build a vector which will continuously expand, through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the storage size 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer.
Am I missing something?
Maybe http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ http://www.cs.aau.dk/~nesotto/boost/trunk/boost/auto_buffer/ is what you are looking for? -Thorsten

On 10 Jun 2009, at 22:08, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector. I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible, because you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think. Consider your example, in g++ (which I know how it operates internally) boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage foreach (object in world) { if (IsDead(object)) deathrow.push_back(object); // allocation is just advancing a pointer } This will build a vector which will continuously expand, through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the storage size 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer. Am I missing something?
Maybe
http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ http://www.cs.aau.dk/~nesotto/boost/trunk/boost/auto_buffer/
is what you are looking for?
That doesn't do exactly what I want for two reasons, but both might be fixable.

1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery).
2) I want it to be an error if it is necessary to go to the heap.

For (1) I use a macro something like:

#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))

Because obviously you can't call alloca inside the auto_buffer constructor.

Chris
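[Editor's note] The `box` helper in Chris's macro isn't shown in the thread; the shape below is a guess that just pairs the element count with the raw stack memory. Note the macro as originally posted was missing a closing parenthesis; here it is fixed, with the argument parenthesised for safety:

```cpp
#include <alloca.h>   // non-standard, but present on the platforms named in the thread
#include <cstddef>

// Hypothetical `box`: a typed view over memory obtained elsewhere (e.g. alloca).
template <class T>
struct box {
    T* data;
    std::size_t size;
    box(std::size_t n, void* mem) : data(static_cast<T*>(mem)), size(n) {}
};

// alloca must run in the caller's stack frame, which is why this must be a macro.
#define GET_ALLOCA_MEM(type, n) box<type>((n), alloca((n) * sizeof(type)))
```

The memory is only valid until the calling function returns, and some compilers warn against alloca inside function-call arguments, so this is strictly a sketch of the pattern, not a recommendation.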

On 10 Jun 2009, at 22:28, Christopher Jefferson wrote:
On 10 Jun 2009, at 22:08, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector. I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible, because you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think. Consider your example, in g++ (which I know how it operates internally) boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage foreach (object in world) { if (IsDead(object)) deathrow.push_back(object); // allocation is just advancing a pointer } This will build a vector which will continuously expand, through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the storage size 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer. Am I missing something?
Maybe
http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ http://www.cs.aau.dk/~nesotto/boost/trunk/boost/auto_buffer/
is what you are looking for?
That doesn't do exactly what I want for two reasons, but most might be fixable.
1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery). 2) I want it to be an error if it is necessary to go to the heap.
For (1) I use a macro something like:
#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))
Because obviously you can't call alloca inside the auto_buffer constructor.
Replying to myself: passing an external buffer into auto_buffer isn't completely trivial, because you would also have to disable the copy constructor (at least, I can't think of anything else sensible to do).

Chris

On Thu, Jun 11, 2009 at 6:11 AM, Beman Dawes <bdawes@acm.org> wrote:
There is a pending request for sandbox write access from Christian Schladetsch.
This guy has been exhibiting problem behavior. He has not been following discussion policy and has insulted the motives and competence of various Boosters such as Joel.
I just don't have the time to deal with him, and won't until maybe Monday, so I'm hoping some other moderator could put him on warning that we will ban him if he doesn't follow our discussion policy.
Thanks,
--Beman

Christopher Jefferson skrev:
On 10 Jun 2009, at 22:28, Christopher Jefferson wrote:
On 10 Jun 2009, at 22:08, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector. I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible, because you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think. Consider your example, in g++ (which I know how it operates internally) boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage foreach (object in world) { if (IsDead(object)) deathrow.push_back(object); // allocation is just advancing a pointer } This will build a vector which will continuously expand, through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the storage size 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer. Am I missing something?
Maybe
http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ http://www.cs.aau.dk/~nesotto/boost/trunk/boost/auto_buffer/
is what you are looking for?
That doesn't do exactly what I want for two reasons, but most might be fixable.
1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery). 2) I want it to be an error if it is necessary to go to the heap.
For (1) I use a macro something like:
#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))
Because obviously you can't call alloca inside the auto_buffer constructor.
Replying to myself: passing an external buffer into auto_buffer isn't completely trivial, because you would also have to disable the copy constructor (at least, I can't think of anything else sensible to do).
As far as I know, alloca() is not portable. Am I wrong?

You might use variable length arrays (C99), but see "Imperfect C++" by Matthew Wilson on why they suck. His analysis is the reason he implements stlsoft::auto_buffer instead.

For 2), if you want to get an error when the stack buffer is exhausted, you may write a custom GrowPolicy that gives an error inside GrowPolicy::new_capacity().

-Thorsten

On 11 Jun 2009, at 13:35, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
On 10 Jun 2009, at 22:28, Christopher Jefferson wrote:
On 10 Jun 2009, at 22:08, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
Very interesting. I have been working on a similar, but much more limited, proposal which just does this for vector. I decided I had to implement a new class, because building on top of the 'allocator' framework wasn't sensible, because you waste too much buffer space. For example, I believe your example won't work. I'd be interested to see what you think. Consider your example, in g++ (which I know how it operates internally) boost::monotonic::inline_storage<100*sizeof(Object)> storage; // create local storage on the stack boost::monotonic::vector<Object> deathrow(storage); // create a std::vector that uses this storage foreach (object in world) { if (IsDead(object)) deathrow.push_back(object); // allocation is just advancing a pointer } This will build a vector which will continuously expand, through sizes 1,2,4,8,16,32. At each step, it will allocate a new block of memory and free the old one, making the storage size 1,3,7,15,31,63. When we try to push_back beyond 32 elements, the next allocation will overflow the buffer. Am I missing something?
Maybe
http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ http://www.cs.aau.dk/~nesotto/boost/trunk/boost/auto_buffer/
is what you are looking for?
That doesn't do exactly what I want for two reasons, but most might be fixable.
1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery). 2) I want it to be an error if it is necessary to go to the heap.
For (1) I use a macro something like:
#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))
Because obviously you can't call alloca inside the auto_buffer constructor.
Replying to myself: passing an external buffer into auto_buffer isn't completely trivial, because you would also have to disable the copy constructor (at least, I can't think of anything else sensible to do).
As far as I know, alloca() is not portable. Am I wrong?
It isn't standard, I believe, but it exists on at least Mac OS X, FreeBSD, Linux and Windows, and software I write uses it successfully on all these platforms.
You might use variable length arrays (C99), but see "Imperfect C++" by Matthew Wilson on why they suck. His analysis is the reason he implements stlsoft::auto_buffer instead.
stlsoft::auto_buffer seems to have exactly the same restriction as your auto_buffer, you must declare the amount of space (or at least an upper bound) as a compile-time constant. I haven't got a copy of Imperfect C++, but I have to say my personal experience is alloca and VLA (I experimented with using a VLA of chars instead of alloca, the two are basically implemented the same underneath) work fine.
For 2), if you want to get an error when the stack buffer is exhausted, you may write a custom GrowPolicy that gives an error inside GrowPolicy::new_capacity().
Thanks, that sounds like a good idea.
-Thorsten

Christopher Jefferson skrev:
On 11 Jun 2009, at 13:35, Thorsten Ottosen wrote:
That doesn't do exactly what I want for two reasons, but most might be fixable.
1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery).
For (1) I use a macro something like:
#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))
Because obviously you can't call alloca inside the auto_buffer constructor.
Replying to myself: passing an external buffer into auto_buffer isn't completely trivial, because you would also have to disable the copy constructor (at least, I can't think of anything else sensible to do).
As far as I know, alloca() is not portable. Am I wrong?
It isn't standard I believe, but it exists on at least Mac OS X, FreeBSD, linux and windows, and software I write uses it successfully on all these platforms.
You might use variable length arrays (C99), but see "Imperfect C++" by Matthew Wilson on why they suck. His analysis is the reason he implements stlsoft::auto_buffer instead.
stlsoft::auto_buffer seems to have exactly the same restriction as your auto_buffer, you must declare the amount of space (or at least an upper bound) as a compile-time constant. I haven't got a copy of Imperfect C++, but I have to say my personal experience is alloca and VLA (I experimented with using a VLA of chars instead of alloca, the two are basically implemented the same underneath) work fine.
In his book he describes how differently the VLA implementations behave, and how they have serious problems with recursion. I don't know if implementations have improved since then.

Anyway, I don't see an easy way to get rid of the restriction.

Maybe it is possible to use a custom dummy allocator with auto_buffer. Then pass an allocator instance (which has been initialized somehow with alloca()) to the constructor of auto_buffer. Then you can say

boost::alloca_allocator<T> alloc( n, alloca(sizeof(T)*n) );
boost::auto_buffer<T, boost::store_n_objects<0>, boost::bounded_grow_policy, boost::alloca_allocator<T> > buffer( alloc );

I wish we had template aliases :-)

As for disabling the copy-constructor, it would be nice, but I think you can simply make the allocator non-copyable to do that.

-Thorsten
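[Editor's note] Thorsten's alloca_allocator idea can be fleshed out a little. The auto_buffer parameters in his snippet (store_n_objects, bounded_grow_policy) are his library's; the standalone sketch below only illustrates the allocator half under assumed semantics: it is constructed with a pre-obtained block, hands out at most that block, and errors rather than falling back to the heap. The name prebound_allocator is invented for illustration.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// An allocator pre-initialized with externally obtained memory (e.g. from
// alloca in the caller's frame, or any stack buffer). Any request it cannot
// satisfy from that block throws instead of touching the heap.
template <class T>
struct prebound_allocator {
    typedef T value_type;
    T* mem;
    std::size_t cap;
    bool spent;

    prebound_allocator(std::size_t n, void* p)
        : mem(static_cast<T*>(p)), cap(n), spent(false) {}

    T* allocate(std::size_t n) {
        if (spent || n > cap) throw std::bad_alloc(); // never fall back to the heap
        spent = true;
        return mem;
    }
    void deallocate(T*, std::size_t) { spent = false; }
};

template <class T, class U>
bool operator==(const prebound_allocator<T>& a, const prebound_allocator<U>& b)
{ return a.mem == b.mem; }
template <class T, class U>
bool operator!=(const prebound_allocator<T>& a, const prebound_allocator<U>& b)
{ return !(a == b); }
```

Used with a container that reserves its full capacity up front, growth beyond the block becomes a hard error, which is exactly requirement (2) above.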

Hi Thorsten,

I have attempted to use boost::auto_buffer for monotonic::inline_storage.

If a resize is required and the inline_storage is used by a contiguous container such as a std::vector, the resize breaks said container. Just before a resize is required, if the inline_storage is used by a node container such as a std::map, it also breaks said container (under MSVC at least), as it thinks it has exhausted all of its memory :(

The code is at https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.c.... I have made reference to this in the documentation changes as well.

It would be nice to have the best of both worlds, as people have suggested previously, but I couldn't then and I can't now see a way for inline_storage, which is used by containers, to use auto_buffer for its allocation buffer. Perhaps I am missing something? Has anyone tried to use auto_buffer<> to provide memory for containers?

Regards, Christian.

On Fri, Jun 12, 2009 at 2:25 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christopher Jefferson skrev:
On 11 Jun 2009, at 13:35, Thorsten Ottosen wrote:
That doesn't do exactly what I want for two reasons, but both might be fixable.
1) I want run-time variable amounts of space (this does involve some alloca, and probably macro, hackery).
For (1) I use a macro something like:
#define GET_ALLOCA_MEM(type, x) box<type>(x, alloca(x * sizeof(type)))
Because obviously you can't call alloca inside the auto_buffer constructor.
Replying to myself: passing an external buffer into auto_buffer isn't completely trivial, because you would also have to disable the copy constructor (at least, I can't think of anything else sensible to do).
As far as I know, alloca() is not portable. Am I wrong?
It isn't standard I believe, but it exists on at least Mac OS X, FreeBSD, linux and windows, and software I write uses it successfully on all these platforms.
You might use variable length arrays (C99), but see "Imperfect C++" by Matthew Wilson on why they suck. His analysis is the reason he implements stlsoft::auto_buffer instead.
stlsoft::auto_buffer seems to have exactly the same restriction as your auto_buffer, you must declare the amount of space (or at least an upper bound) as a compile-time constant. I haven't got a copy of Imperfect C++, but I have to say my personal experience is alloca and VLA (I experimented with using a VLA of chars instead of alloca, the two are basically implemented the same underneath) work fine.
In his book he describes how differently the VLA implementations behave, and how they have serious problems with recursion. I don't know if implementations have improved since then.
Anyway, I don't see an easy way to get rid of the restriction.
Maybe it is possible to use a custom dummy allocator with auto_buffer. Then pass an allocator instance (which has been initialized somehow with alloca()) to the constructor of auto_buffer.
Then you can say
boost::alloca_allocator<T> alloc( n, alloca(sizeof(T)*n) );
boost::auto_buffer<T, boost::store_n_objects<0>, boost::bounded_grow_policy, boost::alloca_allocator<T> > buffer( alloc );
I wish we had template alias :-)
As for disabling the copy-constructor, it would be nice, but I think you can simply make the allocator non-copyable to do that.
-Thorsten
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Christian Schladetsch skrev:
Hi Thorsten,
I have attempted to use boost::auto_buffer for monotonic::inline_storage.
If a resize is required and the inline_storage is used by a contiguous container such as a std::vector, the resize breaks said container.
What particularly breaks the container? And why do you want to use auto_buffer for inline_storage? -Thorsten

On Mon, Jun 15, 2009 at 6:39 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christian Schladetsch skrev:
Hi Thorsten,
I have attempted to use boost::auto_buffer for monotonic::inline_storage.
If a resize is required and the inline_storage is used by a contiguous container such as a std::vector, the resize breaks said container.
What particularly breaks the container?
A resize that moves the storage for auto_buffer from the stack to the heap invalidates the pointers used by the vector.
And why do you want to use auto_buffer for inline_storage?
It was suggested to me that auto_buffer and monotonic were doing similar things, and that I should investigate using auto_buffer for the storage to remove any redundancy.
-Thorsten
On a related note: I am working on a series of containers that allow part of their storage to come from the stack, and other parts to be on the heap.

I have since written rope<T, C>. This presents a sequence container to the user, using a list of vectors internally. T is the value_type, C is the ChunkSize. Each vector within the list-of-vectors is at least ChunkSize long. It provides O(N/ChunkSize) access time to elements in the sequence, so it is slower than a vector but faster than a list. The speed of access can be tuned by the user by varying the ChunkSize. The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.

I realise that 'rope' is a bad name, as it is often used to describe a similar but different data-structure often used for strings. It's a working name only at the moment.

I created it mainly to address the issue of resizing vectors that use a monotonic allocator. When such a vector is resized, it is very wasteful of memory, because the old storage is not reclaimed until the underlying storage goes out of scope. By using a 'rope' in this case, a sequence with almost-O(1) access can use a monotonic allocator and still grow, well, monotonically, without wasting memory and without ever needing to be resized.

I am not really convinced of the utility of this, as the main reason for using a monotonic allocator in the first place is speed, so it is counter-intuitive and counter-productive to then have to use a slower, non-contiguous vector-like thing. However, it can also support sequences where resizing is otherwise undesirable or impossible. For example, my 'rope' supports containers where some of the elements are on the stack and others are on the heap. As another example, a 'rope' could also be made that supports the idea of a `sequence of noncopyable T`, so I'm sticking with the idea for a little while.

Anyway, it's in the sandbox. Regards, Christian

2009/6/14 Christian Schladetsch <christian.schladetsch@gmail.com>:
I have since written rope<T, C>. This presents a sequence container to the user, using a list of vectors internally. T is the value_type, C is the ChunkSize. Each vector within the list-of-vectors is at least ChunkSize long.
It provides O(N/ChunkSize) access time to elements in the sequence. So it is slower than a vector, but faster than a list. The speed of access can be tuned by the user by varying the ChunkSize.
The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.
How does this differ from a deque?

AMDG Scott McMurray wrote:
2009/6/14 Christian Schladetsch <christian.schladetsch@gmail.com>:
I have since written rope<T, C>. This presents a sequence container to the user, using a list of vectors internally. T is the value_type, C is the ChunkSize. Each vector within the list-of-vectors is at least ChunkSize long.
It provides O(N/ChunkSize) access time to elements in the sequence. So it is slower than a vector, but faster than a list. The speed of access can be tuned by the user by varying the ChunkSize.
The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.
How does this differ from a deque?
deque has fixed-size blocks. In Christ, Steven Watanabe

The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.
How does this differ from a deque?
deque has fixed size blocks
monotonic::inline_storage<4000> storage; // create local storage on the stack
{
    {
        std::deque<int, boost::monotonic::allocator<int> > deque(storage);
        generate_n(back_inserter(deque), 100, rand);
        cout << "deque: " << storage.used() << endl;
    }
    storage.reset();
    {
        std::vector<int, boost::monotonic::allocator<int> > vector(storage);
        generate_n(back_inserter(vector), 100, rand);
        cout << "vector: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int> rope(storage);
        generate_n(back_inserter(rope), 100, rand);
        cout << "default rope: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int, 100> rope2(storage);
        generate_n(back_inserter(rope2), 100, rand);
        cout << "rope<100>: " << storage.used() << endl;
    }
    storage.reset();
}
deque:        736
vector:       1724
default rope: 608
rope<100>:    464

Deque is certainly a valid container to use with monotonic allocators which avoids resizing. The main benefit of a rope over a deque is that it can span the stack/heap boundary. Obviously, I have to do some performance testing of deque vs. `rope`. Christian.

For completeness:

{
    std::list<int, boost::monotonic::allocator<int> > list(storage);
    generate_n(back_inserter(list), 100, rand);
    cout << "list: " << storage.used() << endl;
}

list:         1612
deque:        736
vector:       1724
default rope: 608
rope<100>:    464

On Mon, Jun 15, 2009 at 11:30 AM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.
How does this differ from a deque?
deque has fixed size blocks
monotonic::inline_storage<4000> storage; // create local storage on the stack
{
    {
        std::deque<int, boost::monotonic::allocator<int> > deque(storage);
        generate_n(back_inserter(deque), 100, rand);
        cout << "deque: " << storage.used() << endl;
    }
    storage.reset();
    {
        std::vector<int, boost::monotonic::allocator<int> > vector(storage);
        generate_n(back_inserter(vector), 100, rand);
        cout << "vector: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int> rope(storage);
        generate_n(back_inserter(rope), 100, rand);
        cout << "default rope: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int, 100> rope2(storage);
        generate_n(back_inserter(rope2), 100, rand);
        cout << "rope<100>: " << storage.used() << endl;
    }
    storage.reset();
}
deque:        736
vector:       1724
default rope: 608
rope<100>:    464
Deque is certainly a valid container to use with monotonic allocators which avoids resizing. The main benefit of a rope over a deque is that it can span the stack/heap boundary.
Obviously, I have to do some performance testing of deque vs. `rope`.
Christian.

On Jun 14, 2009, at 7:35 PM, Christian Schladetsch wrote:
For completeness:
{
    std::list<int, boost::monotonic::allocator<int> > list(storage);
    generate_n(back_inserter(list), 100, rand);
    cout << "list: " << storage.used() << endl;
}
list:         1612
deque:        736
vector:       1724
default rope: 608
rope<100>:    464
Why do I carry this feeling that all samples we will see here have monotonic (roped or not...) as the winner? ;-) /David
On Mon, Jun 15, 2009 at 11:30 AM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
The main benefit of a rope over a vector is that a rope can grow indefinitely without ever needing to be resized.
How does this differ from a deque?
deque has fixed size blocks
monotonic::inline_storage<4000> storage; // create local storage on the stack
{
    {
        std::deque<int, boost::monotonic::allocator<int> > deque(storage);
        generate_n(back_inserter(deque), 100, rand);
        cout << "deque: " << storage.used() << endl;
    }
    storage.reset();
    {
        std::vector<int, boost::monotonic::allocator<int> > vector(storage);
        generate_n(back_inserter(vector), 100, rand);
        cout << "vector: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int> rope(storage);
        generate_n(back_inserter(rope), 100, rand);
        cout << "default rope: " << storage.used() << endl;
    }
    storage.reset();
    {
        monotonic::rope<int, 100> rope2(storage);
        generate_n(back_inserter(rope2), 100, rand);
        cout << "rope<100>: " << storage.used() << endl;
    }
    storage.reset();
}

deque:        736
vector:       1724
default rope: 608
rope<100>:    464
Deque is certainly a valid container to use with monotonic allocators which avoids resizing. The main benefit of a rope over a deque is that it can span the stack/heap boundary.
Obviously, I have to do some performance testing of deque vs. `rope`.
Christian.

On Mon, Jun 15, 2009 at 11:37 AM, David Bergman < David.Bergman@bergmangupta.com> wrote:
[snip] Why do I carry these feeling that all samples we will see here have monotonic (roped or not...) as the winner? ;-)
I wrote a bubble-sort to compare the speed of monotonic structures produced on the stack, versus standard structures built on the heap. The idea was to test my theoretical claim regarding list-node sizes and spatial locality on the stack outlined a few posts ago. The test code is here http://tinyurl.com/m77bzh.

Results for VS2008, /O2:

size  mono   std    mono/std
3     0.001  0.008  0.12
13    0.027  0.047  0.57
23    0.085  0.119  0.71
33    0.179  0.23   0.77
43    0.305  0.377  0.80
53    0.472  0.552  0.85
63    0.655  0.759  0.86
73    0.883  1.009  0.87
83    1.14   1.295  0.88
93    1.444  1.639  0.88

Results for GCC 4.3.3, -O4:

size  mono  std   mono/std
3     0     0     n/a
13    0.01  0.02  0.5
23    0.02  0.03  0.666667
33    0.04  0.05  0.8
43    0.05  0.08  0.625
53    0.06  0.09  0.666667
63    0.11  0.13  0.846154
73    0.14  0.2   0.7
83    0.15  0.22  0.681818
93    0.22  0.27  0.814815

Times are in microseconds. It is worth noting that there is no list manipulation in this test; there is only list creation, forward iteration, and destruction.

Cheers, Christian

On 15 Jun 2009, at 08:55, Christian Schladetsch wrote:
On Mon, Jun 15, 2009 at 11:37 AM, David Bergman < David.Bergman@bergmangupta.com> wrote:
[snip] Why do I carry these feeling that all samples we will see here have monotonic (roped or not...) as the winner? ;-)
I wrote a bubble-sort to compare the speed of monotonic structures produced on the stack, versus standard structures built on the heap.
The idea was to test my theoretical claim regarding list-node sizes and spatial locality on the stack outlined a few posts ago.
I think you are over-stating the importance of the stack. I just ran your experiments with a heap-allocated inline_storage and got very similar results.
The test code is here http://tinyurl.com/m77bzh.
Results for VS2008, /O2
size  mono   std    mono/std
3     0.001  0.008  0.12
13    0.027  0.047  0.57
23    0.085  0.119  0.71
33    0.179  0.23   0.77
43    0.305  0.377  0.80
53    0.472  0.552  0.85
63    0.655  0.759  0.86
73    0.883  1.009  0.87
83    1.14   1.295  0.88
93    1.444  1.639  0.88
Results for GCC 4.3.3, -O4
size  mono  std   mono/std
3     0     0     n/a
13    0.01  0.02  0.5
23    0.02  0.03  0.666667
33    0.04  0.05  0.8
43    0.05  0.08  0.625
53    0.06  0.09  0.666667
63    0.11  0.13  0.846154
73    0.14  0.2   0.7
83    0.15  0.22  0.681818
93    0.22  0.27  0.814815
Times are in microseconds.
It is worth noting that there is no list manipulation in this test. There is only list creation, forward iteration, and destruction.
Cheers, Christian

Hi Christopher,
I think you are over-stating the importance of the stack. I just ran your experiments with a heap-allocated inline_storage and got very similar results.
You are right, and I should have posted results for heap- and stack-based storage originally. Belatedly, here are my results for heap-based monotonic storage, VS2008:

size  mono   std    mono/std
3     0.001  0.008  0.12
13    0.027  0.046  0.58
23    0.089  0.119  0.74
33    0.183  0.226  0.80
43    0.31   0.369  0.84
53    0.474  0.555  0.85
63    0.669  0.745  0.89
73    0.912  0.987  0.92
83    1.183  1.269  0.93
93    1.467  1.594  0.92

Compared with my posted results for stack-based storage, VS2008:

size  mono   std    mono/std
3     0.001  0.008  0.12
13    0.027  0.047  0.57
23    0.085  0.119  0.71
33    0.179  0.23   0.77
43    0.305  0.377  0.80
53    0.472  0.552  0.85
63    0.655  0.759  0.86
73    0.883  1.009  0.87
83    1.14   1.295  0.88
93    1.444  1.639  0.88

Putting the data on the stack makes a slight difference, and it can only be explained by spatial locality on the stack. Not much difference in this specific case, surely, but ~5% is far better than nothing, especially if it is free. All this test did was forward iterate over the list; I predict there would be greater benefit if there was more traffic between elements in the container and stack-based values. In any case, the fact that it works almost as well on the heap as on the stack is a good thing.

I need to isolate the cost of new and delete, to see what the benefits are to having a smaller data-structure, if any.

Christian.

Test of duplication speed: http://tinyurl.com/mshep8

VS2008:
test_dupe: mono: 2.356
test_dupe: std:  17.318

GCC:
test_dupe: mono: 2.2
test_dupe: std:  6.67

Christian.

PS. Yes, I triple-checked the VS result.

2009/6/15 Christian Schladetsch <christian.schladetsch@gmail.com>:
I wrote a bubble-sort to compare the speed of monotonic structures produced on the stack, versus standard structures built on the heap.
The idea was to test my theoretical claim regarding list-node sizes and spatial locality on the stack outlined a few posts ago.
If you're talking about spatial locality, then shouldn't you also be using something like a funnel sort that's designed to take advantage of it?

Hello,

Before I make a third proposal within a single week, I wanted to ask whatever crowd remains on this thread about the idea of a boost::chain. The basic interface is:

template <class T, size_t C = 64, class Al = std::allocator<T> >
struct chain;

A `chain` (what I previously called a `rope`) is a random-access sequence. Internally, it is a sequence of array<T>'s, where each such 'link in the chain' has C elements. By default, it has O(1) access time. It is similar to a deque (with the same interface), except that it is nominally more efficient and the user can specify the trade-off on access time vs. granularity. Not incidentally, it also supports different links having their storage in different places.

The complete interface is:

template <class T
    , size_t C = 64
    , class Al = std::allocator<T>
    , class Link = boost::uninitialised_array<T, C, Al>
    , class Links = std::deque<Link, Al> >
struct chain;

I have made reference to boost::uninitialised_array<T, C, Al> here. I don't think such a beast exists at the moment, but the idea is pretty simple: it is a fixed-size array of uninitialised T's using storage from the given allocator.

In the general case, chain<> is a hard sell over a std::deque<>. They are functionally equivalent and have the same big-O qualities. However, because chain<> uses fixed-size, inline storage for the links, and because it can be user-tuned, it can be more efficient in time and space than a deque. How much would be hard to guess, it is abusable by the user, and whether it would even be worth finding out is hard to tell.

But of course it is abundantly clear that the motivation for such a structure is to create an ideal 'array-like sequence' for use with a monotonic allocator, and further, for the case where some of the links are on the stack and others are on the heap. That said, it is also clear that it may have application outside of that.

So, my question is: should I bother writing this generally and make a proposal to put it into boost::, or do I just make it specialised and put it into boost::monotonic::chain?

Thanks in advance for your thoughts,

Regards, Christian

PS. The mere act of writing this post makes me think that chain<> would raise limited interest in the general case.

I can't really think of a good niche for this. A deque is probably close enough for most people, and in general I don't think that most users' performance bottlenecks are caused in STL containers. Moreover, those who *are* getting poor performance from STL containers will probably want to roll their own container that works in a custom way.

This "chain" seems like a container that would rarely be used, because in order to use it, a user would want something that was like a deque but performed nominally better, and would need to have profiled their code and discovered that it was the deque that was causing the problem. I am of the serious opinion that hardly anyone would have a legitimate use for it.

So if you are convinced that it would be useful, I would suggest against trying to submit it on its own. It's just a deque where you can choose the size of the chunks.

On Sun, Jun 14, 2009 at 11:16 PM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
Hello,
Before I make a third proposal within a single week, I wanted to ask whatever crowd remains on this thread about the idea of a boost::chain.
The basic interface is:
template <class T, size_t C = 64, class Al = std::allocator<T> > struct chain;
A `chain` (what I previously called a `rope`) is a random-access sequence. Internally, it is a sequence of array<T>'s, where each such 'link in the chain' has C elements.
By default, it has O(1) access time. It is similar to a deque (with the same interface), except that it is nominally more efficient and the user can specify the trade-off on access time vs. granularity. Not incidentally, it also supports different links having their storage in different places.
The complete interface is:
template <class T
    , size_t C = 64
    , class Al = std::allocator<T>
    , class Link = boost::uninitialised_array<T, C, Al>
    , class Links = std::deque<Link, Al> >
struct chain;
I have made reference to boost::uninitialised_array<T, C, Al> here. I don't think such a beast exists at the moment, but the idea is pretty simple. It is a fixed-sized array of uninitialised T's using storage from the given allocator.
In the general case, chain<> is a hard-sell over a std::deque<>. They are functionally equivalent, and have the same big-O qualities.
However, because chain<> uses fixed-size, inline storage for the links, and because it can be user-tuned, it can be more efficient in time and space than a deque. How much would be hard to guess, is abusable by the user, and whether it would be even worth finding out is hard to tell.
But of course it is abundantly clear that the motivation for such a structure is to create an ideal 'array-like sequence' for use with a monotonic allocator, and further, for the case where some of the links are on the stack, and others are on the heap.
That said, it is also clear that it may have application outside of that.
So, my question is: should I bother writing this generally and make a proposal to put it into boost::, or do I just make it specialised and put it into boost::monotonic::chain?
Thanks in advance for your thoughts,
Regards, Christian
PS. The mere act of writing this post makes me think that chain<> would raise limited interest in the general case.

Thanks Ross, I have come to the same conclusion. On Mon, Jun 15, 2009 at 5:04 PM, Ross Levine <ross.levine@uky.edu> wrote:
I can't really think of a good niche for this. A deque is probably close enough for most people, and in general I don't think that most users' performance bottlenecks are caused in STL containers. Moreover, for those who *are* getting poor performance from STL containers probably will want to roll their own container that works in a custom way. This "chain" seems like a container that would rarely be used, because in order to use it, a user would want something that was like a deque, but performed nominally better, and would need to have profiled their code and discovered that it was the deque that was causing the problem. I am of the serious opinion that hardly anyone would have a legitimate use for it.
So if you are convinced that it would be useful, I would suggest against trying to submit it on its own. It's just a deque where you can choose the size of the chunks.
On Sun, Jun 14, 2009 at 11:16 PM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
Hello,
Before I make a third proposal within a single week, I wanted to ask whatever crowd remains on this thread about the idea of a boost::chain.
The basic interface is:
template <class T, size_t C = 64, class Al = std::allocator<T> > struct chain;
A `chain` (what I previously called a `rope`) is a random-access sequence. Internally, it is a sequence of array<T>'s, where each such 'link in the chain' has C elements.
By default, it has O(1) access time. It is similar to a deque (with the same interface), except that it is nominally more efficient and the user can specify the trade-off on access time vs. granularity. Not incidentally, it also supports different links having their storage in different places.
The complete interface is:
template <class T
    , size_t C = 64
    , class Al = std::allocator<T>
    , class Link = boost::uninitialised_array<T, C, Al>
    , class Links = std::deque<Link, Al> >
struct chain;
I have made reference to boost::uninitialised_array<T, C, Al> here. I don't think such a beast exists at the moment, but the idea is pretty simple. It is a fixed-sized array of uninitialised T's using storage from the given allocator.
In the general case, chain<> is a hard-sell over a std::deque<>. They are functionally equivalent, and have the same big-O qualities.
However, because chain<> uses fixed-size, inline storage for the links, and because it can be user-tuned, it can be more efficient in time and space than a deque. How much would be hard to guess, is abusable by the user, and whether it would be even worth finding out is hard to tell.
But of course it is abundantly clear that the motivation for such a structure is to create an ideal 'array-like sequence' for use with a monotonic allocator, and further, for the case where some of the links are on the stack, and others are on the heap.
That said, it is also clear that it may have application outside of that.
So, my question is: should I bother writing this generally and make a proposal to put it into boost::, or do I just make it specialised and put it into boost::monotonic::chain?
Thanks in advance for your thoughts,
Regards, Christian
PS. The mere act of writing this post makes me think that chain<> would raise limited interest in the general case.

Christian Schladetsch skrev:
Hello,
The complete interface is:
template <class T
    , size_t C = 64
    , class Al = std::allocator<T>
    , class Link = boost::uninitialised_array<T, C, Al>
    , class Links = std::deque<Link, Al> >
struct chain;
I have made reference to boost::uninitialised_array<T, C, Al> here. I don't think such a beast exists at the moment, but the idea is pretty simple. It is a fixed-sized array of uninitialised T's using storage from the given allocator.
boost::auto_buffer is usable like this via the function uninitialized_grow(C). -Thorsten

On Tue, Jun 16, 2009 at 2:37 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christian Schladetsch skrev:
Hello,
The complete interface is:
template <class T , size_t C = 64 , class Al = std::allocator<T> , class Link = boost::uninitialised_array<T, C, Al> , class Links = std::deque<Link, Al> struct chain;
I have made reference to boost::uninitialised_array<T, C, Al> here. I don't think such a beast exists at the moment, but the idea is pretty simple. It is a fixed-sized array of uninitialised T's using storage from the given allocator.
boost::auto_buffer is usable like this via the function uninitialized_grow(C).
Ahh right. I shall look into using auto_buffer for my chain and see if it can be resurrected in the general case. It will still be tricky, because IIRC auto_buffer doesn't span the stack/heap boundary? That is, it is either all on the stack, or all on the heap? The original motivation for chain (earlier 'rope') was to avoid resize, and to allow some of the links in the chain to be on the stack, and other links to be on the heap. Christian.

Christian Schladetsch skrev:
boost::auto_buffer is usable like this via the function uninitialized_grow(C).
Ahh right. I shall look into using auto_buffer for my chain and see if it can be resurrected in the general case. It will still be tricky, because IIRC auto_buffer doesn't span the stack/heap boundary? That is, it is either all on the stack, or all on the heap?
The original motivation for chain (earlier 'rope') was to avoid resize, and to allow some of the links in the chain to be on the stack, and other links to be on the heap.
You're right that it is always either on the stack or on the heap. That is the only way to support T* iterators. OTOH, since you're only using it as a memory buffer, I see no reason why you should not be able to hand out pointers to both spaces. The main problem might be that it is difficult to get the pointer to the stack buffer after a resize, but the memory stays valid. It wouldn't be hard to add T* auto_buffer::stack_buffer() though. -Thorsten

Christian Schladetsch skrev:
On Mon, Jun 15, 2009 at 6:39 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christian Schladetsch skrev:
Hi Thorsten,
I have attempted to use boost::auto_buffer for monotonic::inline_storage.
If a resize is required and the inline_storage is used by a contiguous container such as a std::vector, the resize breaks said container.
What particularly breaks the container?
A resize that moves the storage for auto_buffer from the stack to the heap invalidates the pointers used by the vector.
And why do you want to use auto_buffer for inline_storage?
It was suggested to me that auto_buffer and monotonic were doing similar things, and that I should investigate using auto_buffer for the storage to remove any redundancy.
I'm not sure why you use the container that way. Don't you normally allocate everything at once? If so, then why don't you just do the same with the auto_buffer? Again, uninitialized_grow() might be useful. -Thorsten

Christian>
It was suggested to me that auto_buffer and monotonic were doing similar things, and that I should investigate using auto_buffer for the storage to remove any redundancy.
Thorsten> I'm not sure why you use the container that way. Don't you normally allocate everything at once? If so, then why don't you just do the same with the auto_buffer? Again, uninitialized_grow() might be useful.
auto_buffer<char, store_n_bytes<N> > as a longer way of saying char[N] is not compelling and doesn't add anything. Growing this is not possible on the stack, and 'growing' it by moving it to the heap makes it unsuitable as a storage basis for containers.

What I see instead is that containers like chain<> can use monotonic::storage where some of the earlier links are on the stack, and when that is exhausted, the remaining links go on the heap. This is transparent to the chain. Similarly for other node-based containers, but this cannot work, for obvious reasons, for vector. Christian.

Christian Schladetsch wrote:
What I see instead is that containers like chain<> can use monotonic::storage where some of the earlier links are on the stack, and when that is exhausted, the remaining links go on the heap. This is transparent to the chain. Similarly for other node-based containers, but this cannot work, for obvious reasons, for vector.
Your allocator needs to be smart about what to do if a request for something larger than the chunk size comes in. Luke

What I see instead is that containers like chain<> can use monotonic::storage where some of the earlier links are on the stack, and when that is exhausted, the remaining links go on the heap. This is transparent to the chain. Similarly for other node-based containers, but this cannot work, for obvious reasons, for vector.
Wrong again. The `new` storage<> can and does indeed support a vector 'migrating' correctly from the stack to the heap:

monotonic::storage<100> store;
{
    typedef std::vector<char, monotonic::allocator<char> > Vector;
    Vector vec(store);
    vec.resize(10);   // on the stack
    vec.resize(1000); // now on the heap
}

monotonic::storage<100> store;
{
    typedef std::vector<char, monotonic::allocator<char> > Vector;
    Vector vec(store);
    vec.resize(10);   // on the stack
    vec.resize(1000); // now on the heap
}
I am telling you, this code is not portable. Some compilers will create code that silently crashes, because std::vector is not required to use any specific monotonic::allocator instance.

Please, please remember that it is REQUIRED that all allocators of the same type are completely interchangeable in STL containers. That means NO non-static data. So what you have written below could fail on some compilers. This type of failure is especially noticeable in lists, and almost unavoidable when using the splice member function if the compiler makes the allocator equality assumption.

Actually, in general, I would assume this code would fail:

monotonic::storage<100> store1;
boost::monotonic::list<int> list1(store1);
{
    monotonic::storage<100> store2;
    boost::monotonic::list<int> list2(store2);
    list2.push_back(1);
    list1.splice(list1.begin(), list2);
}
std::cout << *list1.begin(); // segfault?

Though I haven't tested it. Of course, I don't think there's a way to fix that. But it's something to know.

Also, your allocator's member pointer storage is public. I assume it's supposed to be private?

BOOST_AUTO_TEST_CASE(test_splice)
{
    monotonic::storage<100> store1;
    std::list<int, monotonic::allocator<int> > list1(store1);
    {
        monotonic::storage<100> store2;
        std::list<int, monotonic::allocator<int> > list2(store2);
        list2.push_back(1);
        list1.splice(list1.begin(), list2);
    }
    BOOST_CHECK_EQUAL(*list1.begin(), 1);
}

Running 5 test cases...
*** No errors detected

On Tue, Jun 16, 2009 at 6:45 PM, Ross Levine <ross.levine@uky.edu> wrote:
monotonic::storage<100> store;
{
    typedef std::vector<char, monotonic::allocator<char> > Vector;
    Vector vec(store);
    vec.resize(10);   // on the stack
    vec.resize(1000); // now on the heap
}
I am telling you, this code is not portable. Some compilers will create code that silently crashes because std::vector is not required to use any specific monotonic::allocator.
Please, please remember that it is REQUIRED that all allocators of the same type are completely interchangeable in STL containers. That means NO non-static data. So what you have written below could fail on some compilers. This type of failure is especially noticeable in lists, and almost unavoidable when using the splice member function if the compiler makes the allocator equality assumption.
Actually, in general, I would assume this code would fail:
monotonic::storage<100> store1;
boost::monotonic::list<int> list1(store1);
{
    monotonic::storage<100> store2;
    boost::monotonic::list<int> list2(store2);
    list2.push_back(1);
    list1.splice(list1.begin(), list2);
}
std::cout << *list1.begin(); // segfault?
Though I haven't tested it. Of course, I don't think there's a way to fix that. But it's something to know.
Also, your allocator's member pointer storage is public. I assume it's supposed to be private?

However, you are right that it fails on GCC.

On Tue, Jun 16, 2009 at 6:53 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
[snip]

The list splice thing can't be helped, because you are splicing from a destroyed storage location. So that's just a note for the documentation; obviously, there's nothing that can be done.

However, the more pressing issue is the non-static data in monotonic::allocator (namely, the pointer "storage"). On newer compilers there will be no problems. However, the allocator is non-portable, and that REALLY bothers me. There has to be some way to avoid that. Luckily it seems C++0x will eliminate that assumption about equal allocators, but it is a royal pain in the meantime.

I have a few ideas and might play around with them.

On Tue, Jun 16, 2009 at 3:26 AM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
[snip]

Hi Ross,

Sorry for my staccato replies. You caught me on the way out the door from work, and I only had time to run some tests and post the results.

In truth, this is a non-issue. The code you presented broke the standard; the monotonic allocator didn't. Allocators are indeed allowed to have non-static members.

There are various solutions to this 'problem':

* Only use one monotonic::storage. Multiple allocators that use the one storage will correctly compare as equal and can be shared.
* Don't use std::list::splice between lists with different allocators on GCC.
* Fix the standard so that all STL implementations must respect allocator inequality. The fact that they are not required to is an anachronism. It is certainly possible.
* Use a single static 'storage'. I hate this the most.

None of these invalidate the library as it stands. BTW, this has nothing to do with my original code that you quoted, where a vector migrates from the stack to the heap transparently.
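The "only use one monotonic::storage" remedy can be sketched as follows, with a toy bump allocator standing in for monotonic::allocator (Bump/BumpAlloc are illustrative names, not the proposal's API). Because both lists draw on the same arena, their allocators compare equal, and splice leaves no node behind in storage that is about to die:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Bump {
    alignas(std::max_align_t) char buf[1 << 14];
    std::size_t used = 0;
};

template <class T>
struct BumpAlloc {
    typedef T value_type;
    Bump* pool;
    explicit BumpAlloc(Bump& b) : pool(&b) {}
    template <class U> BumpAlloc(const BumpAlloc<U>& o) : pool(o.pool) {}
    T* allocate(std::size_t n) {
        // plain bump allocation; per-type alignment handling omitted in this sketch
        T* p = reinterpret_cast<T*>(pool->buf + pool->used);
        pool->used += n * sizeof(T);
        return p;
    }
    void deallocate(T*, std::size_t) {}
    template <class U> bool operator==(const BumpAlloc<U>& o) const { return pool == o.pool; }
    template <class U> bool operator!=(const BumpAlloc<U>& o) const { return pool != o.pool; }
};

// One storage, two lists: splicing between them is safe because every node
// lives in (and dies with) the same arena.
inline int shared_storage_splice() {
    Bump pool;
    BumpAlloc<int> al(pool);
    std::list<int, BumpAlloc<int> > a(al);
    std::list<int, BumpAlloc<int> > b(al);
    b.push_back(1);
    a.splice(a.begin(), b);
    return a.front();
}
```

With equal allocators, splice is well defined on every implementation; the earlier segfault scenario only arises when the two lists' arenas differ.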
I have a few ideas and might play around with them.
I'm interested in hearing your thoughts on how the larger issue could be addressed. But as noted, it is not me who is breaking the standard if you splice between two lists that are using different allocators.

Regards, Christian.

On 16 Jun 2009, at 09:07, Christian Schladetsch wrote:
[snip]
There are various solutions to this 'problem':
* Only use one monotonic::storage. Multiple allocators that use the one storage will correctly compare as equal and can be shared.
* Don't use std::list::splice with multiple allocators with GCC.
* Fix the standard so that all STL implementations must respect allocator inequality. The fact that they are not required to is an anachronism. It is certainly possible.
This is not a simple "fix"; there are major issues to do with when, and how, assignment, construction and swapping move allocators around. There has been some serious discussion and research on this issue previously. Not to say it can't be fixed, but it isn't just a case of "oh yes, just say they respect it".

Chris

Chris> there are major issues, to do with when, or how, assignment, construction and swapping move allocators around. There has been some serious discussion and research on this issue previously.
And I don't pretend to be an expert on all the issues. From Wikipedia:

"Stepanov later commented that, while allocators 'are not such a bad idea in theory (...) [u]nfortunately they cannot work in practice'. He observed that to make allocators really useful, a change to the core language with regards to references (http://en.wikipedia.org/wiki/Reference_%28computer_science%29) was necessary."

However, at the end of the day, this specific proposal works very well without changing anything. The key motivation for the proposal was to make fast and small nested containers that can use the stack and/or heap for storage, and to make deallocation a no-op. This has been accomplished with just one storage used by a set of containers at or below a given stack frame, and also by having multiple storage areas that the programmer is required to keep separate. This is not a huge burden, and is in fact a very simple thing to achieve in the scheme of things. Can it be abused? Yes. But you can also dereference a freed pointer.
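The "deallocation is a no-op" discipline described above can be sketched in a few lines. This is a toy with hypothetical names, not the proposal's code, and a real version would have to run destructors (or be restricted to trivially-destructible types) before resetting:

```cpp
#include <cassert>
#include <cstddef>

// Per-frame arena: allocation is a pointer bump; "deallocating" the whole
// frame at the end of the update cycle is a single assignment.
struct FrameArena {
    alignas(std::max_align_t) char buf[1 << 12];
    std::size_t used = 0;
    void* allocate(std::size_t n) {
        void* p = buf + used;
        used += n;
        return p;
    }
    void reset() { used = 0; }  // end of frame: O(1), no per-object frees
};

inline bool frame_reset_demo() {
    FrameArena frame;
    int* a = static_cast<int*>(frame.allocate(sizeof(int)));
    *a = 1;
    std::size_t after_alloc = frame.used;
    frame.reset();              // everything from this frame is discarded at once
    return after_alloc == sizeof(int) && frame.used == 0;
}
```

This is where the O(1) allocation and zero fragmentation claims come from: nothing is ever returned piecemeal, so there are no free lists to search and no holes to coalesce.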
Not to say it can't be fixed, but it isn't just a case of "oh yes, just say they respect it".
I really didn't mean to trivialise the issue. I realise that the idea of allocator equivalence has many issues, and it will be smarter people than I who decide on them. I'm just making the point that it doesn't adversely affect the key aspects of this proposal. All of my performance tests to date have used one storage with one or more containers, have been completely safe (after adding correct alignment) and standards-based, and have shown measurable and repeatable benefit. I very clearly stand to be corrected on all results that I have posted.
Chris
Cheers, Christian.

On Tuesday 16 June 2009, Christian Schladetsch wrote:
In truth, this is a non-issue. The code you presented broke the standard; monotonic allocator didn't. Allocators are indeed allowed to have non-static members.
I hate to contradict your prejudices, but std C++03 containers are not required by the standard to support allocators which contain state, as has been noted several times already in this thread.

On Jun 16, 2009, at 8:46 AM, Frank Mori Hess wrote:
On Tuesday 16 June 2009, Christian Schladetsch wrote:
In truth, this is a non-issue. The code you presented broke the standard; monotonic allocator didn't. Allocators are indeed allowed to have non-static members.
I hate to contradict your prejudices, but std c++03 containers are not required by the standard to support allocators which contain state, as has been noted several times already in this thread.
DISCLAIMER: this post of mine might sound a bit harsh, but since Christian's "proposal revision factory" basically has occupied this list for a while, I think it is fair.

It is interesting how much leeway this group gives to a quite immature "proposal" and its stream of quick fixes. No offense to Christian; I truly enjoy the energy he brings, and there is something to the *overall idea* of having a limited pre-allocated block used by multiple containers, including lists and trees. I am quite certain that a proper proposal for such an idea will not come from Christian himself, due to a lack of experience with advanced C++ issues and a bit of a focus problem. It might come from Thorsten, though.

BUT, the truth is that as soon as we enter real high-performance use cases, we often end up with one of two scenarios:

1. We know how many elements we will deal with at most, and they are not too many; we can then reserve the space in a vector.

2. There are many elements, or they come and go during runtime. We then need a proper allocator, which also can deallocate.

It is not clear how frequent, or important, the cases not covered by these two scenarios are, and in some of the space between #1 and #2 we can use Thorsten's auto_buffer. The only cases that are really palatable to the idea behind Christian's stream of proposals are those of multiple small, and pretty fixed, containers being used for an extended time in a high-performance scenario. I am not sure this case is that common. I.e., it might very well be that even the *overall idea* concerns a non-problem in real life.

Questions to the populace here: Do we see a light in the tunnel here, i.e., is the problem being solved here a real problem? Do we want Christian to refine his ideas further? Do we want him to do it interactively on the list?

/David

Hi David,

DISCLAIMER: this post of mine might sound a bit harsh, but since Christian's "proposal revision factory" basically has occupied this list for a while, I think it is fair.
My proposal has hardly changed since I first introduced it. The only key non-breaking change that was made was an extra 5 lines to ensure that allocations for different types made from the same storage are aligned correctly for those types on different platforms. Due to requests from the list, the proposal has been extended to allow the initial inline_storage to grow transparently to the heap if required. All other 'changes' have been additions, like chain<> (required to have a sequence-like container that can literally span the stack/heap boundary), and more benchmarks and tests.
It is interesting how much leeway this group gives to a quite immature "proposal" and its stream of quick fixes.
Again, there has not been a "stream of quick fixes". I added generalised alignment for different types. That is the only 'fix' that has been made to the original proposal.
No offense to Christian,
None taken.
I truly enjoy the energy he brings and there is something to the *overall idea* of having a limited pre-allocated block used by multiple containers, including lists and trees.
There are a number of novel aspects to this proposal. I do not believe that anyone else has ever even tried, let alone succeeded, in creating a system that spans the stack and heap for storage for generalised containers. This system can start on the stack and grow to use the heap transparently, producing containers where some elements are on the stack others are on the heap. I have demonstrated repeatedly the efficacy of this approach, and have posted benchmarks showing real performance improvements.
I am quite certain that a proper proposal for such an idea will not come from Christian himself, due to a lack of experience with advanced C++ issues and a bit of a focus problem. It might come from Thorsten, though.
Rather than revert to ad hominem attacks, I invite you to criticise the proposal and the benchmarks as they stand.
BUT, the truth is that as soon as we enter real high-performance use cases, we often end up with one of two scenarios:
1. We know how many elements we will deal with at most, and they are not too many; we can then reserve the space in a vector.
We also often need node-based containers. In any case, a vector on the stack is faster than a vector on the heap.
2. There are many elements or they come and go during runtime. We then need a proper allocator, which also can deallocate.
If they come and go in a short space of time and there is some upper bound, we do not need to deallocate. We can just truncate memory at the end. Even if there is no previously known upper bound, we can still truncate at the end. By deferring all memory management we can realise real performance gains. I have posted benchmarks that demonstrate the practical benefit of this approach, which are being ignored.

I should also point out that monotonic storage has the following interface, and has no upper bound on the size of its buffer:

    /// storage that spans the stack/heap boundary.
    ///
    /// allocation requests first use inline fixed_storage of N bytes.
    /// once that is exhausted, later requests are serviced from the heap.
    ///
    /// all allocations remain valid at all times.
    template <size_t InlineSize = 8*1024,
              size_t MinHeapIncrement = InlineSize*1024,
              class Al = std::allocator<char> >
    struct storage;

It is not clear how frequent, or important, the cases not covered by these two scenarios are, and in some of the space between #1 and #2, we can use Thorsten's auto_buffer. The only cases that are really palatable to the idea behind Christian's stream of proposals
I have not made a stream of proposals. I have made one, which has hardly changed. That one proposal was altered once to add correct alignment of different types on different platforms. It was then extended to allow the initial inline_storage to grow as required.
is that of multiple small, and pretty fixed, containers being used for an extended time in a high performance scenario. I am not sure this case is that common.
I.e., it might very well be that even the *overall idea* concerns a non-problem in real life.
Questions to the populace here: Do we see an light in the tunnel here, i.e., is this problem trying to be solved a problem? Do we want Christian to refine his ideas further? Do we want him to do it interactively on the list?
I am happy to leave it be. I have posted the code and my results, and I have made my case. I do not think that anyone previously has ever even thought of doing what this proposal does, so I expected some friction against the idea, and yes, I was quite defensive, which has not helped my case. In truth, Thorsten's auto_buffer<> provides a subset of the functionality provided by this library, so I guess I have ruffled some feathers:

    monotonic::storage<> storage;
    std::vector<T0, monotonic::allocator<T0> > v0(storage);
    std::vector<T1, monotonic::allocator<T1> > v1(storage);
    ...
    std::vector<Tn, monotonic::allocator<Tn> > vn(storage);
    std::list<U0, monotonic::allocator<U0> > l0(storage);
    ...
    std::map<int, std::list<std::vector< ... > > m0(storage);

Use the containers as you wish. Storage for them will start on the stack, using a limit that you can provide. It will grow as required to the heap, in chunk sizes that you can provide. Allocation will be O(1). Deallocation is deferred until the storage goes out of scope. Using these containers is demonstrably faster than using the same containers with a default allocator.

Using the stack for storage for multiple containers is a good idea. Allowing that storage to then move transparently to the heap makes it more general and safer to use. By making deallocation a no-op, I also make allocation O(1), use less memory, and allow adjacent-in-time allocations to be adjacent-in-memory. All of these things make it easy to build small, fast and general data structures for use in high-performance applications without having to revert to custom containers. I have posted results that back up my claims.

Regards, Christian.
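The claimed behaviour, start inline, spill to the heap, keep every earlier allocation valid, can be modelled in a few lines. This is a toy sketch with made-up names, not the proposal's storage<> implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Toy model of a stack-then-heap storage: requests are served from a fixed
// inline buffer until it is exhausted, then from individually heap-allocated
// chunks. Earlier allocations are never moved or freed until the storage dies.
class GrowingStorage {
    alignas(std::max_align_t) char inline_buf_[256];
    std::size_t used_ = 0;
    std::vector<char*> chunks_;            // heap overflow blocks
public:
    ~GrowingStorage() {
        for (char* c : chunks_) ::operator delete(c);
    }
    bool overflowed() const { return !chunks_.empty(); }
    void* allocate(std::size_t n) {
        if (used_ + n <= sizeof inline_buf_) {     // fits inline ("stack")
            void* p = inline_buf_ + used_;
            used_ += n;
            return p;
        }
        char* c = static_cast<char*>(::operator new(n));  // spill to heap
        chunks_.push_back(c);
        return c;
    }
};

inline bool growth_demo() {
    GrowingStorage s;
    int* a = static_cast<int*>(s.allocate(64 * sizeof(int)));  // fills inline buffer
    a[0] = 7;
    int* b = static_cast<int*>(s.allocate(64 * sizeof(int)));  // goes to heap
    b[0] = 9;
    // the inline allocation stays valid after the storage has grown
    return a[0] == 7 && b[0] == 9 && s.overflowed();
}
```

The real library additionally handles alignment and services requests in configurable chunk sizes, but the "all allocations remain valid at all times" property is the essential point.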

On Wed, Jun 17, 2009 at 12:46 AM, Frank Mori Hess <frank.hess@nist.gov>wrote:
On Tuesday 16 June 2009, Christian Schladetsch wrote:
In truth, this is a non-issue. The code you presented broke the standard; monotonic allocator didn't. Allocators are indeed allowed to have non-static members.
I hate to contradict your prejudices, but std c++03 containers are not required by the standard to support allocators which contain state, as has been noted several times already in this thread.
I didn't claim that they did. Ross did: "Luckily it seems c++0x will eliminate that assumption about equal allocators".

In truth, the standard states that STL implementations are free to treat allocators of the same type isomorphically. It does not state that allocators may not have non-static data. That is all that I have claimed from the start.

What this means practically is that, yes, you can write code that breaks on some platforms (but not all) if you attempt to intermix elements from containers that use allocators based on different storage. In reality, this is rarely a problem. In any case, the monotonic library as it stands can be and is being safely used with just one storage and many allocators. If necessary, this can be enforced by using a static global.

Thanks, Christian.

On Tuesday 16 June 2009, Christian Schladetsch wrote:
On Wed, Jun 17, 2009 at 12:46 AM, Frank Mori Hess <frank.hess@nist.gov>wrote:
I hate to contradict your prejudices, but std c++03 containers are not required by the standard to support allocators which contain state, as has been noted several times already in this thread.
I didn't claim that they did. Ross did: "Luckily it seems c++0x will eliminate that assumption about equal allocators".
No, he didn't. You seem to be confusing c++03 and c++0x.
In truth, the standard states that STL implementations are free to treat allocators of the same type isomorphically. It does not state that allocators may not have non-static data. That is all that I have claimed from the start.
What this means practically is that yes, you can write code that breaks one some platforms (but not all) if you use and attempt to intermix elements from containers that use allocators that are based on different storage. In reality, this is rarely a problem.
What it means is that a conforming implementation of C++03 can choose to provide containers which do not even store an allocator object, and which completely ignore any allocator passed to their constructor. They can choose to default-construct a temporary allocator object whenever they need to do an allocation or deallocation, and then discard it. Your current monotonic::allocator will fail an assertion when it is default-constructed and an allocation is attempted through it.

I do think your allocator could be useful; it just needs to be paired with containers that it is guaranteed to work with. Anyways, I'm beginning to find this thread frustrating and don't intend to comment further.

Hi Frank,

I do think your allocator could be useful, it just needs to be paired with containers that it is guaranteed to work with.
I empathise, so I added monotonic::static_storage<> and monotonic::static_shared_storage<>. See http://tinyurl.com/lmojlv. I then re-ran test_map_list(), see http://tinyurl.com/n59qb8.

MSVC /O2: test_map_list

    count  mono    std     mono_static  mono/std   mono_static/std
    100    0.136   0.219   0.412        0.330097%  0.531553%
    1100   1.831   1.717   4.398        0.416326%  0.390405%
    2100   3.558   3.531   8.466        0.420269%  0.41708%
    3100   5.592   5.576   13.065       0.428014%  0.426789%
    4100   7.757   8.225   17.522       0.442701%  0.46941%
    5100   10.138  10.114  21.922       0.462458%  0.461363%
    6100   12.684  12.588  26.46        0.479365%  0.475737%
    7100   14.636  15.233  30.796       0.475257%  0.494642%
    8100   17.816  17.014  35.67        0.499467%  0.476983%
    9100   19.358  19.199  39.872       0.485504%  0.481516%

And for GCC 4.3.3 -O4:

    100    0.08    0.07    0.13         0.615385%  0.538462%
    1100   1.2     1.18    1.86         0.645161%  0.634409%
    2100   2.53    2.62    3.56         0.710674%  0.735955%
    3100   3.9     3.81    5.48         0.711679%  0.695255%
    4100   5.26    5.28    7.58         0.693931%  0.69657%
    5100   6.77    6.96    9.87         0.685917%  0.705167%
    6100   8.54    8.36    12.36        0.690939%  0.676375%
    7100   10.76   10.96   15.47        0.69554%   0.708468%
    8100   11.81   11.76   16.46        0.717497%  0.714459%
    9100   13.28   13.57   18.98        0.699684%  0.714963%

I believe this resolves the issue of default-constructed monotonic containers. You don't need to make an explicit storage now, but I still think it's better to do so.
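The static_storage idea can be illustrated with a sketch: back every default-constructed allocator with one static arena, so all instances compare equal and even an implementation that default-constructs temporary allocators internally gets usable storage. GlobalMonoAlloc is a hypothetical name; this is not the library's code, and it omits exhaustion checks and the reset hook:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// All default-constructed GlobalMonoAlloc<T> instances draw from one static
// arena, so they are genuinely interchangeable. (The function-local statics
// mean each rebound type gets its own buffer, which also sidesteps the
// per-type alignment problem in this sketch.)
template <class T>
struct GlobalMonoAlloc {
    typedef T value_type;
    GlobalMonoAlloc() {}                              // usable when default-constructed
    template <class U> GlobalMonoAlloc(const GlobalMonoAlloc<U>&) {}
    T* allocate(std::size_t n) {
        static alignas(std::max_align_t) char buf[1 << 16];
        static std::size_t used = 0;
        T* p = reinterpret_cast<T*>(buf + used);
        used += n * sizeof(T);
        return p;
    }
    void deallocate(T*, std::size_t) {}               // monotonic: a no-op
    template <class U> bool operator==(const GlobalMonoAlloc<U>&) const { return true; }
    template <class U> bool operator!=(const GlobalMonoAlloc<U>&) const { return false; }
};

// No storage object in sight: the container works with a default-constructed
// allocator, which is what a strict C++03 implementation may require.
inline int global_alloc_demo() {
    std::list<int, GlobalMonoAlloc<int> > xs;
    xs.push_back(5);
    xs.push_back(6);
    return xs.front() + xs.back();
}
```

The trade-off is the usual one with globals: the arena's lifetime and reset point are now a program-wide concern rather than a scoped one.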
Anyways, I'm beginning to find this thread frustrating and don't intend to comment further.
Sorry to see you go. Regards, Christian.

Christian Schladetsch wrote:
Hi Frank,
I do think your allocator could be useful, it just needs to be paired with
containers that it is guaranteed to work with.
I empathise, so I added monotonic::static_storage<> and monotonic::static_shared_storage<>. See http://tinyurl.com/lmojlv.
I then re-ran test_map_list(), see http://tinyurl.com/n59qb8.
MSVC /O2: count mono std mono_static mono/std mono_static/std 1100 1.831 1.717 4.398 0.416326% 0.390405%
And for GCC 4.3.3 -O4: 1100 1.2 1.18 1.86 0.645161% 0.634409%
First of all, speedup should be old/new, not new/old. Second, I looked at your benchmark code to confirm, and your mono, std and mono_static numbers above are apparently elapsed time in (I assume) seconds. The problem with this is that mono and mono_static have higher elapsed time in all cases in the data you present than std, but you somehow get this mono/std fraction that is less than one and tack a % sign on it for no reason that I can see. I find such things annoying; furthermore, these indicate a lack of background, experience and attention to detail that calls the quality of your work in general into question. Have you ever published a technical academic paper?

Finally, it is clear that GCC and MSVC are using completely different allocators, since you get very different results when comparing. Instead of comparing against whatever you get with default switches in two major compilers, why don't you go out and do some research into third-party allocators such as those provided by Google and TBB, which are known to be *better* than the std allocator in GCC, and compare against those?

I think you are learning a lot from all of this, but I think writing your own allocator to meet the need for a faster allocator might be the wrong way to solve your performance problem. You should be looking for a freely available alternative allocator that has the features you want, and only after you rule out that you can get one for free should you write your own.

I've thought some more about your application domain, and programming for XBox and Cell is a special kind of hell where cache coherency really is king, since you have to do it manually. I can sympathize with your desire to allocate everything on the stack in such an architecture. Perhaps you are solving a problem that is specific to your situation and not really general to C++ programming at large.

Regards, Luke
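The reporting convention at issue can be stated in two lines. Taking the quoted GCC row purely as an illustration, and assuming the 1.86 s column is the std baseline and 1.2 s is mono (the table's column labels are ambiguous, so these assignments are an assumption):

```cpp
#include <cassert>
#include <cmath>

// Speedup > 1 means the new code is faster: old time over new time.
inline double speedup(double old_time, double new_time) {
    return old_time / new_time;
}

// What the quoted table actually printed: the new time as a fraction of the
// old time (e.g. 1.2 / 1.86 = 0.645...), mislabelled with a '%' sign.
inline double fraction_of_baseline(double old_time, double new_time) {
    return new_time / old_time;
}
```

With old = 1.86 and new = 1.2, speedup(1.86, 1.2) gives 1.55, i.e. "1.55x faster", whereas fraction_of_baseline(1.86, 1.2) gives the 0.645161 that appears in the table, a ratio, not a percentage.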

Agreed. Christian hasn't posted any results for pool_allocator and fast_pool_allocator either. I am also tired of the complete lack of understanding of what is and is not defined in the standard. Originally I wanted to help with this idea, but the idea isn't very mature and I can't support it anymore. This discussion has officially become too frustrating for me to participate in.

On Wed, Jun 17, 2009 at 12:15 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
[snip]

Hi Ross,

Agreed. Christian hasn't posted any results for pool_allocator and fast_pool_allocator either.
I agree that I should do this and I shall. However, it should also be clear that these other allocators are specific to a given type T. Monotonic allocation is available to any type, and from any set of containers, from the same single storage buffer, which can be on the stack, on the heap, or spanning both.
I am also tired of the complete lack of understanding of what is and is not defined in the standard.
I have been completely aware of the standard from the start. My documentation references it; I talk about the issue of non-static data in an allocator there. I have always understood that the standard states that allocators of the same type may be considered isomorphic by a given STL implementation. All I have ever said is that the standard does not say that an allocator may not have non-static data.

I'll repeat that, because there seems to be some confusion. All I have ever said is that the standard does not say that an allocator cannot have non-static data. The standard states that an STL implementation may treat allocators of the same type as being all equivalent.

I can't help thinking that there is a case here of people cutting off their noses to spite their faces. My benchmarks have shown significant gains in both MSVC and GCC. This is true either for a local storage or for global storage.

So, ignore the local storage for now and consider the case for the static storage. This uses default-constructed monotonic::allocator<T>'s. All such allocators are isomorphic. They have non-static data, sure, but that doesn't matter, because an STL implementation can treat all monotonic::allocator<T> as being equivalent, and they can be default-constructed, used, then discarded, repeatedly and safely.

I resisted adding a global store because the idea was anathema to me, but now I see that it helps with acceptance. So, you can take an existing data structure, change it to use monotonic::allocator, recompile, and see performance gains. All you need to add is a place to reset the usage. To reject it out of hand seems an act of intentional obtuseness to me.

The key aspects of the proposal haven't changed since I first made it: I first added generalised alignment for different types from the same storage, then I added the ability for a store to merge into the heap as required, then added a default global storage for use by a default-constructed allocator.
These are not sweeping changes. I would consider these a normal part of developing a proposal. Adding a global default storage was absolutely trivial, and it solves any concerns people may've had about allocator equivalence. I should have had this in my proposal from the absolute start. I didn't, because I never had a need for it, and in my world globals are bad. Now that it's there, and addresses people's concerns about portability, everyone suddenly loses interest and leaves. Interesting. I was under the impression that proposals were just that, and were expected to be slightly modified as they were exposed to a wider audience. Indeed, isn't that the entire point?

There is no argument against the allocator in terms of the standard or use by older STL implementations, because monotonic::allocator<T> can be safely default-constructed, and all instances of default-constructed allocator<T> are exactly identical in every way. To ignore significant gains from very simple code based on personal bias seems silly to me.

I will re-write the documentation and base it on just using the single global storage, and make the local-store case secondary. Using default-constructed monotonic::allocator<T>'s (for all T, unlike pool_allocator et al.) with a single global store does not have any lurking issues with portability or unsafe STL implementations, and yet shows significant performance and memory gains out-of-the-box with no other changes required.

Sorry to see you go, Ross.

Regards, Christian.

On Jun 17, 2009, at 5:11 PM, Christian Schladetsch wrote:
Hi Ross,
Agreed. Christian hasn't posted any results for pool_allocator and
fast_pool_allocator either.
I agree that I should do this and I shall. However, it should also be clear that these other allocators are specific to a given type T. Monotonic allocation is available to any type, and from any set of containers, from the same single storage buffer, which can be on the stack, on the heap, or spanning both.
I am also tired of the complete lack of understanding of what is and is not defined in the standard.
I have been completely aware of the standard from the start. My documentation references it, I talk about the issue of non-static data in an allocator there. I have always understood that the standard states that allocators of the same type may be considered isomorphic by a given STL implementation. All I have ever said is that the standard does not say that an allocator may not have non-static data.
I'll repeat that, because there seems to be some confusion. All I have ever said is that the standard does not say that an allocator cannot have non-static data. The standard states that a STL implementation may treat allocators of the same type as being all equivalent.
I can't help thinking that there is a case here of people cutting off their noses to spite their faces. [snip]
Yes, it is not easy for a rational, calm and experienced developer to come with a solid proposal to all these hot-headed and self-destructive people... /David

Christian Schladetsch wrote:
These are not sweeping changes. I would consider these a normal part of developing a proposal. Adding a global default storage was absolutely trivial, and it solves any concerns people may've had about allocator equivalence. I should have had this in my proposal from the absolute start. I didn't, because I never had a need for it and in my world globals are bad. Now that it's there, and addresses people's concerns about portability, everyone suddenly loses interest and leaves. Interesting. I was under the impression that proposals were just that, and were expected to be slightly modified as it was exposed to a wider audience. Indeed, isn't that the entire point?
There is no argument against the allocator in terms of the standard or use by older STL implementations, because monotonic::allocator<T> can be safely default-constructed, and all instances of default-constructed allocator<T> are exactly identical in every way.
To ignore significant gains from very simple code based on personal bias seems silly to me.
People are naturally skeptical of bold claims. Up to now you have not made your point in a convincing manner. People are not ignoring significant gains out of personal bias; they are ignoring bold claims because you haven't offered convincing evidence yet that your allocator is a significant improvement on what is already available. We should be skeptical. What makes you so much smarter than the folks at Google or Intel who have implemented good memory allocators that are faster than what ships with the compiler or the OS?

The personal bias is not against your proposal, but against the way you react defensively to constructive feedback that is part of what you rightly identify as the normal process of developing a proposal. I can see you making an effort to be gracious, and I appreciate that, but people are getting frustrated with you because you are still going about making your proposal in the wrong way. I actually find your idea interesting, but you haven't convinced me yet that I should use your allocator.

I believe writing your own memory pool is the last thing a developer should do these days when there are so many freely available to choose from. I just spent half the day fixing a bug in a memory pool written five years ago that never worked when someone requested a block larger than the arbitrary threshold of 500000000 bytes, because it triggered a special case that left it with dangling pointers, a case which never happened before last week. Do I trust you more than I trust Google to replace this legacy memory pool? You are going to need to earn that trust.

Don't be defensive that people are skeptical; people should be skeptical. Defensiveness does not engender trust, but rather distrust. Regards, Luke

Hi Luke,

People are naturally skeptical of bold claims. Up to now you have not made your point in a convincing manner.
I realise I have blundered in a number of areas, but I stand by my claims.

count  mono   mono_static  std    std/mono  std/mono_static
100    0.07   0.08         0.13   185.714%  162.5%
1100   1.13   1.1          1.63   144.248%  148.182%
2100   2.3    2.29         3.36   146.087%  146.725%
3100   3.78   3.53         5.13   135.714%  145.326%
4100   4.9    4.89         7.13   145.51%   145.808%
5100   6.5    6.33         9.15   140.769%  144.55%
6100   7.91   7.79         11.79  149.052%  151.348%
7100   9.58   9.38         13.53  141.232%  144.243%
8100   11.22  11.01        16.1   143.494%  146.231%
9100   12.72  13.12        17.9   140.723%  136.433%

The test-code is here http://tinyurl.com/n59qb8. I have other tests of course, but I stick with this one because it makes a relatively modest claim (as opposed to, say, test_dupe(), which shows a many-times improvement).

According to a previous post, I have also added monotonic::shared_allocator<T>. Even though you could have used just monotonic::allocator<T> with a monotonic::shared_storage, Steven convinced me that it is better to be explicit about it at the allocator level. So now you can have general containers that use thread-safe monotonic storage without ambiguity about which allocation scheme or shared_storage to use. It's all there in the sandbox.

// single-threaded, use static_storage
std::map<int, std::list<int, monotonic::allocator<int> >, std::less<int>, monotonic::allocator<int> > map;

// multi-threaded, use static_shared_storage
std::map<int, std::list<int, monotonic::shared_allocator<int> >, std::less<int>, monotonic::shared_allocator<int> > shared_map;

I've also had the presumption to assume that in a vector of lists, the lists should use the same allocator as the vector. So:

typedef std::list<int, monotonic::allocator<int> > List;
typedef std::vector<List, monotonic::allocator<List> > Vector;
Vector vector;
vector.resize(42);
assert(vector[12].get_allocator().get_storage() == vector.get_allocator().get_storage());

This works in the general case for all containers of containers as well, and for shared and unshared allocation.
One outstanding issue is operator[] for std::map, which cannot be fixed. It cannot be fixed because STL implementations do not always call allocator.construct() after allocator.allocate(). In this case, a monotonic::map is necessary to ensure that a reference created with operator[], if it is a monotonic container, will use the same allocator as the parent map.
Don't be defensive that people are skeptical, people should be skeptical. Defensiveness does not engender trust, but rather distrust.
I completely understand the scepticism, and I recognise that I have made mistakes. But I find myself in a quandary. Do I continue to post results and develop the idea, or just drop it in the face of people, experienced people, telling me that it "just won't work"? But, well, it does work. I have the numbers that show that it works. In retrospect, I should have sat on my hands for a few weeks longer, decided on the necessity of default-construction myself, and created an academic-quality paper with references, tables, comparisons, and so on. I will do so now, retroactively, and will stop bickering.
Luke
Thanks for your comments. I will limit myself to just posting important results as they come, and I will compare with other allocation models. Regards, Christian.

Apologies in advance, I realise I am on the verge of spamming and will not continue.
I've also had the presumptiveness to assume that in a vector of lists, the lists should use the same allocator as the vector. [...] It cannot be fixed because STL implementations do not always call allocator.construct() after allocator.allocate().
Somehow I managed to debunk myself in the space of three paragraphs. It is not possible to do what I wanted even with lists or vectors, let alone map.

The reason that I risked this final post is to propose that boost needs a collection library that correctly deals with allocators. This would also have helped with boost::interprocess. The current state of affairs is painful in many respects. A lot of things would be fixed by boost::map, boost::list, boost::vector etc. that wrap the std:: containers but correctly use std::allocators, or, even better, a boost::allocator model that is a superset of std::allocator. There are many, many issues with this, I realise. For instance, what to do with allocators in operator=, copy-construction using a different allocator from the original, and so on.

Anyway, you guys aren't the only ones that are getting tired of this, so... Cheers, Christian.

On Thu, Jun 18, 2009 at 03:36, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
Apologies in advance, I realise I am on the verge of spamming and will not continue.
Now that this guy has left, I wanted to add in retrospect that the idea of the monotonic allocator was thrilling for an embedded & real-time user like me. Today we are settling with vector.reserve() or custom containers and this idea could be of use in some cases, if anyone doubted that at all. I understand that The Standard is here to protect us from making the same mistakes all over again, but we're sometimes a bit more relaxed than the people around this list. The unit test will tell eventually if it works or not, not the standard, I haven't read it. Nothing to add about the way this proposal evolved (and started) on the list... that's been made clear by several people already. Peter

Christian Schladetsch wrote:
In retrospect, I should have sat on my hat for a few weeks longer, decided on the necessity of default-construction myself, created an academic-quality paper with references, tables, comparisons, and so on. I will do so now, retro-actively and will stop bickering.
I'd be very surprised to learn that there isn't a memory pool out there that allows the user to specify an address and size as the initial buffer. This address could obviously come from the stack. You've repeatedly said you are the first or only one to do such-and-such, but haven't backed it up. Have you looked? If you want to write an allocator you should be well versed in what allocators are available and their relative strengths and weaknesses. You know there is a difference between gcc and MSVC allocators, but you don't know what that difference is. I'd like you to be able to answer that question if you are comparing yourself to them. You can step through their code in the debugger, after all.

Making an allocator thread-safe is not generally trivial, so saying you added mutexes and made it thread-safe doesn't inspire confidence. Do you know and understand the pitfalls in this? Do you know what the double-checked locking pattern is, or why people use it?

Along with a proposal (to boost) for a new allocator you need to be able to present a review of existing allocators and demonstrate mastery of the subject matter (including a historical perspective), so that if your library is accepted and people begin to defer to you as an expert in allocators, it is because, in fact, you are an expert in allocators and not just because you are very confident in yourself. Confidence is not a bad thing. I'm confident in myself too. Three years ago I wasn't an expert in much of anything but had some good ideas and some bad ideas. Be patient. Regards, Luke

2009/6/18 David Bergman <David.Bergman@bergmangupta.com>
On Jun 18, 2009, at 12:44 PM, Simonson, Lucanus J wrote: [snip]
Three years ago I wasn't an expert in much of anything but had some good ideas and some bad ideas. Be patient.
Three years? Is that all it takes? There is hope for me then. :-)
I thought it was ten years: http://norvig.com/21-days.html =]

Scott McMurray wrote:
2009/6/18 David Bergman <David.Bergman@bergmangupta.com>
On Jun 18, 2009, at 12:44 PM, Simonson, Lucanus J wrote: [snip]
Three years ago I wasn't an expert in much of anything but had some good ideas and some bad ideas. Be patient.
Three years? Is that all it takes? There is hope for me then. :-)
I thought it was ten years: http://norvig.com/21-days.html
I taught myself perl in three days. It would have been two, but I kept falling asleep over the 21-days book, which was thick and made a nice pillow when open. I've been programming in C++ for twelve years, so if it takes ten then I guess I wasn't an expert three years ago ;) I taught myself C++ in the summer of 1997, between my sophomore and senior year of high school, when I was on crutches and, once in my mother's basement in front of a dual-boot windows/linux box, was physically challenged to get away. I think I did the Borland C++ Builder 21-days book in 11, but it was a really terrible book and I didn't learn much except that OWL wasn't as good an idea as they thought. Regards, Luke

I'm jumping in on this. Pardon my noobish suggestion, but if we need to have an allocator with only static data, can't we have an allocator with a preallocated stack storage that is used to perform allocation, much like a FORTRAN COMMON block?
--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35

AMDG Christian Schladetsch wrote:
I am also tired of the complete lack of understanding of what is and is not defined in the standard.
I have been completely aware of the standard from the start. My documentation references it, I talk about the issue of non-static data in an allocator there. I have always understood that the standard states that allocators of the same type may be considered isomorphic by a given STL implementation. All I have ever said is that the standard does not say that an allocator may not have non-static data.
I'll repeat that, because there seems to be some confusion. All I have ever said is that the standard does not say that an allocator cannot have non-static data. The standard states that a STL implementation may treat allocators of the same type as being all equivalent.
Regardless, we already have a complete set of STL containers in Interprocess to work around this and other allocator issues in the current standard. http://tinyurl.com/kq3rs2
There is no argument against the allocator in terms of the standard or use by older STL implementations, because monotonic::allocator<T> can be safely default-constructed, and all instances of default-constructed allocator<T> are exactly identical in every way.
According to the standard, all instances of monotonic::allocator should compare equal, because deallocate is a no-op: any allocator can be used to deallocate memory allocated by any other allocator. Therefore, allocator equality is a non-issue.
I will re-write the documentation and base it on just using the single global storage, and make the local-store case secondary.
Allowing both local storage and a single global storage in the same allocator type seems like a bad idea to me. If you don't do any locking, then the global storage will not play well with threads. If you do lock, then you will introduce extra overhead for local storage. If you lock only for global storage, then the semantics may be confusing.
Using default-constructed monotonic::allocator<T>'s (for all T! unlike pool_allocator et al.)
boost::pool_allocator always uses global storage. rebind etc. work just fine. In Christ, Steven Watanabe

Steven> Allowing both local storage and a single global storage in the same allocator type seems like a bad idea to me. If you don't do any locking, then the global storage will not play well with threads.
That is a good point. On reflection, I am considering introducing shared_allocator<T>. There is monotonic::static_storage<> and monotonic::static_shared_storage<>, which is guarded with a mutex. There are also monotonic::storage<> and monotonic::shared_storage<> for local stores. I agree that there is room for error with these four different storage types. However, to use local storage you have to explicitly pass it to the container on construction, so the syntax and the intent are both clear:

std::list<int, monotonic::allocator<int> > uses_global;
monotonic::storage<> storage;
std::list<int, monotonic::allocator<int> > uses_local(storage);

However, selecting which of static_storage or static_shared_storage to use is currently done with a function call. It seems that the following would be better:

std::list<int, monotonic::allocator<int> > uses_global;
std::list<int, monotonic::shared_allocator<int> > uses_global_shared;
monotonic::shared_storage<> shared_local;
std::list<int, monotonic::allocator<int> > multithreaded(shared_local);

There is a case to be made for removing the local storage types completely. But I personally use them, and want to continue to use them. Perhaps something like monotonic::potentially_unsafe_if_you_mix_with_other_storage::use_at_your_own_risk::but_it_can_really_help::storage<> could be provided. Even so, monotonic::static_storage<> is completely safe and isomorphic, and monotonic::static_shared_storage<> is completely safe and isomorphic for threaded applications. Thanks for the idea of introducing shared_allocator<T>. Regards, Christian.

Hi, ----- Original Message ----- From: "Christian Schladetsch" <christian.schladetsch@gmail.com> To: <boost@lists.boost.org> Sent: Thursday, June 18, 2009 12:10 AM Subject: Re: [boost] Proposal: Monotonic Containers
Steven> Allowing both local storage and a single global storage in the same allocator type seems like a bad idea to me. If you don't do any locking, then the global storage will not play well with threads.
That is a good point. On considering this, I am considering introducing shared_allocator<T>.
Thanks for the idea of introducing shared_allocator<T>.
what do you think about having a monotonic allocator using a thread specific storage so this allocator will not need to use a mutex to protect the allocation? Best, Vicente

Hi Vicente, what do you think about having a monotonic allocator using a thread specific
storage so this allocator will not need to use a mutex to protect the allocation?
This seems like a good idea. How about:

std::list<int, monotonic::allocator<int> > global;
std::list<int, monotonic::shared_allocator<int> > shared_global; // uses mutex
std::list<int, monotonic::local_allocator<int> > thread_specific_global; // doesn't use mutex

This is now in the sandbox at https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/local_allo..., but untested. I have yet to write a test harness for multi-threaded applications.
Best, Vicente
Thanks for the idea Vicente, Regards, Christian PS. updated benchmark results for GCC and MSVC at http://tinyurl.com/lj6nab

First of all, speedup should be old/new, not new/old. Second, I looked at your benchmark code to confirm and your mono, std and mono_static numbers above are apparently elapsed time in (I assume) seconds. The problem with this is that mono and mono_static have higher elapsed time in all cases in the data you present than std,
I got the heading wrong after I hastily added the static_mono tests. What was
MSVC /O2:
count  mono   std    mono_static  mono/std   mono_static/std
1100   1.831  1.717  4.398        0.416326%  0.390405%
Should have been
MSVC /O2:
count  mono   mono_static  std    mono/std   mono_static/std
1100   1.831  1.717        4.398  0.416326%  0.390405%
I had mentioned the scales previously, I will make it part of the generated output now, as well as the headings.
but you somehow get this mono/std fraction that is less than one and tack a % sign on it for no reason that I can see.
It is a percentage of the time that the test took using a monotonic::allocator, in comparison to std::allocator. My apologies for not multiplying it by 100, and for transposing old/new versus new/old.
Finally, it is clear that GCC and MSVC are using completely different allocators since you get very different results when comparing.
Clearly the two runtimes have different allocation models. The results are ~50% faster on MSVC, and 25-35% faster on GCC. I do not think these are "very different results".
Instead of comparing against whatever you get with default switches in two major compilers, why don't you go out and do some research into 3rd party allocators such as are provided by google and TBB which are known to be *better* than the std allocator in gcc and compare against those.
Will do.

I think you are learning a lot from all of this, but I think writing your own allocator to meet the need for a faster allocator might be the wrong way to solve your performance problem. You should be looking for a freely available alternative allocator that has the features you want, and only after you rule out that you can get one for free should you write your own.
I have looked, and no other allocator can allocate from the stack, or the heap, or both, and no other allocator has zero memory overhead and a no-op dealloc. Even so, other allocation systems are specific to one type. Monotonic allocation is available to any type, and to multiple containers, from the one storage buffer.
I've thought some more about your application domain, and programming for XBox and Cell is a special kind of hell where cache coherency really is king, since you have to do it manually. I can sympathize with your desire to allocate everything on the stack in such an architecture.
However, I am not allocating everything on the stack. See https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/storage.hp... .
Perhaps you are solving a problem that is specific to your situation and not really general to C++ programming at large.
You could argue that, except that I get large performance gains in the general case.
Regards, Luke
Thanks for your comments, which were useful as usual. I've fixed the headings and multiplied the ratios by 100. Regards, Christian.

Hi Christian,

I'm afraid you are incorrect on both issues.

On the issue of non-static data: Allocators are allowed non-static data, but STL implementations are allowed to assume that allocators of the same type are equal. The statement of interest is in section 20.1.6 of the C++ standard: "Implementations of containers described in this International Standard are permitted to assume that their Allocator template parameter meets the following two additional requirements beyond those in Table 34. — *All instances of a given allocator type are required to be interchangeable and always compare equal to each other.*" A copy of the draft is available (free) at www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf. In particular:

monotonic::inline_storage<1000> store1, store2;
monotonic::allocator<int> alloc1(store1), alloc2(store2);
std::list<int, monotonic::allocator<int> > list1(alloc1), list2(alloc2);

STL implementations are allowed to assume alloc1 == alloc2, and list1 is allowed to allocate using alloc2, and vice versa. Moreover, it is allowed to use the list splice member on two lists with allocators of the same type. This has been pointed out multiple times, and you have yet to acknowledge the text that is plainly right there. Until you solve this issue, or create your own containers which explicitly do not make this assumption, I think your library should not be used. Quite honestly, I am trying to be nice because I think your containers have merit, but your repeated disregard for the C++ language rules is really annoying me.

On Tue, Jun 16, 2009 at 4:07 AM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
Hi Ross,
Sorry for my staccato replies. You caught me on the way out the door from work and I only had time to run some tests and post the results.
In truth, this is a non-issue. The code you presented broke the standard; monotonic allocator didn't. Allocators are indeed allowed to have non-static members.
There are various solutions to this 'problem':
* Only use one monotonic::storage. Multiple allocators that use the one storage will correctly compare as equal and can be shared.
* Don't use std::list::splice with multiple allocators with GCC.
* Fix the standard so that all STL implementations must respect allocator inequality. The fact that they are not required to is an anachronism. It is certainly possible.
* Use a single static 'storage'. I hate this the most.
None of these invalidate the library as it stands.
BTW, this has nothing to do with my original code that you quoted where a vector migrates from the stack to the heap transparently.
I have a few ideas and might play around with them.
I'm interested in hearing your thoughts on how the larger issue could be addressed. But as noted, it is not me that is breaking the standard if you splice between two lists that are using different allocators.
Regards, Christian. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Ross Levine skrev:
Hi Christian,
I'm afraid you are incorrect on both issues.
On the issue of non-static data: Allocators are allowed non-static data, but STL implementations are allowed to assume that allocators of the same type are equal. The statement of interest is in section 20.1.6 of the C++ standard: "Implementations of containers described in this International Standard are permitted to assume that their Allocator template parameter meets the following two additional requirements beyond those in Table 34. — *All instances of a given allocator type are required to be interchangeable and always compare equal to each other.*" A copy of the draft is available (free) at www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf. In particular:
monotonic::inline_storage<1000> store1, store2; monotonic::allocator<int> alloc1(store1), alloc2(store2); std::list<int, monotonic::allocator<int> > list1(alloc1), list2(alloc2);
STL implementations are allowed to assume alloc1 == alloc2, and list1 is allowed to allocate using alloc2, and vice versa.
How can list1 ever use alloc2?
Moreover, it is allowed to use the list splice member on two lists of allocators of the same type.
I'm wondering how this leads to any corruption. I'm assuming

1. that memory is never freed
2. that memory is only used locally (in local containers)

so even though some memory is mixed between the two lists, it shouldn't matter, as allocator::deallocate() is a no-op. What am I missing? Thanks -Thorsten

Hi Thorsten,

I'm wondering how this leads to any corruption.
It doesn't.
I'm assuming
1. that memory is never freed
Memory is freed when the storage goes out of scope.
2. that memory is only used locally (in local containers)
so even though some memory is mixed between the two lists, it shouldn't matter as allocator::deallocate() is a no-op.
What am I missing?
GCC throws an exception if you attempt to splice between lists that have a different allocator.
Thanks
-Thorsten
Regards, Christian

Hi Ross, //monotonic::inline_storage<1000> store1, store2;
monotonic::storage<> store1, store2; monotonic::allocator<int> alloc1(store1), alloc2(store2); std::list<int, monotonic::allocator<int> > list1(alloc1), list2(alloc2);
You can do that; that is fine. Just don't then try to mix the two lists. Attempting to do so is against the standard. The standard states that an STL implementation is allowed to treat two allocators of the same type isomorphically; however, monotonic allocators have non-static data. So if you use two different monotonic storages like that, then splice between the lists, the behaviour is undefined. It may work on some platforms, like MSVC, but it triggers an assertion on GCC, for example. I suggest that if you wish to write safe cross-platform code, you limit yourself to using just one monotonic::storage for each collection of containers. Attempting to mix between them can indeed work, but there are known problems with this approach on some platforms and this usage is not supported. Regards, Christian.

On Jun 9, 2009, at 7:56 AM, Christian Schladetsch wrote:
Hello,
I agree that it is a little inconvenient, but:
Click on the link, open, and open "index.html"
I guess you expected a personally hosted site. I do not run a personal website, so I can't post things locally. If there is a better way of doing it, please advise.
In general, use Sourceforge, GitHub or Google Code (http://code.google.com/ ). I prefer GitHub for various reasons. /David

Hi, can you motivate why the proposal is adding a new set of containers instead of only an allocator to be used with the standard containers? / Jonas Christian Schladetsch wrote:
Hello,
There is a need in high-performance realtime software to use containers that may only grow in size. These are typically reset at the end of each update cycle.
The proposal here consists of a base 'monotonic allocator', which takes its storage from a buffer that you provide. Allocations are made from this buffer, and deallocation only calls an object's destructor; no actual memory-management is performed at all.
This is very useful for realtime systems that have to do things like, for example, physics or renderering systems. These systems build up lists and other containers over the period of an update frame, however at the end of that frame the storage can just be reset.
The benefit of using this system over a normal std::allocator is that memory is not fragmented, and memory allocation and deallocation is basically optimal.
Within the package are:
boost.monotonic.inline_storage boost.monotonic.list boost.monotonic.vector boost.monotonic.map boost.monotonic.set boost.monotonic.ptr_list
I didn't bother adding the other ptr_* containers yet.
The files for the proposal are in the vault at http://www.boostpro.com/vault/index.php?action=downloadfile&filename=MonotonicAllocator.zip&directory=& .
I'd be happy to address any issues or comments. There are some improvements that could be made.
Regards, Christian.

can you motivate why the proposal is adding a new set of containers instead
of only an allocator to be used with the standard containers?
Because the container constructors need to be passed the storage. The container implementations are trivial.
I think you can omit the containers if you just expect the programmer to explicitly construct them over the storage_base. For example (I'm abbreviating monotonic):

typedef std::vector<T, mono::allocator<T> > Vector;
Vector v((mono::allocator<T>(store))); // extra parens avoid the most vexing parse

It may not appear as graceful, but it doesn't require the duplication of classes, which can be harmful. There are libraries in Boost that specialize (templates) on container types, and new containers would likely fail the specializations. Andrew Sutton andrew.n.sutton@gmail.com

Andrew Sutton escribió:
can you motivate why the proposal is adding a new set of containers instead
of only an allocator to be used with the standard containers?
Because the container constructors need to be passed the storage. The container implementations are trivial.
I think you can omit the containers if you just expect the programmer to explicitly construct them over the storage_base. For example (I'm abbreviating monotonic):
typedef std::vector<T, mono::allocator<T> > Vector; Vector v(mono::allocator<T>(store));
I concur. Also, replicating container classes for the sake of avoiding this little boilerplate code is a maintenance bottleneck. (Besides, the adapted container classes as they're presented are missing lots of variant constructors.) Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Hi Joaquín, I concur. Also, replicating container classes for the sake of avoiding this
little boilerplate code is a maintenance bottleneck.
STL isn't rapidly changed. I agree that my proposal is not complete with all forwarding construction parameters. But they are readily added if the underlying idea is accepted. People will expect that a monotonic::foo<..> is like a foo<..>, and they will accept that it requires a storage argument. But they will find it harder to accept that it requires retooling from a type-argument level of the allocator. Regards, Christian.

Christian Schladetsch escribió:
Hi Joaquín,
I concur. Also, replicating container classes for the sake of avoiding this
little boilerplate code is a maintenance bottleneck.
STL isn't rapidly changed. I agree that my proposal is not complete with all forwarding construction parameters. But they are readily added if the underlying idea is accepted.
I'm not saying the task is difficult (in fact it's trivial), but the idea of wrapping every container in sight is (IMHO) a flawed one because:

* Not only will you want to wrap containers in the STL but also those in Boost, which can be found in (off the top of my head) Boost.PtrContainer, Boost.MultiIndex, Boost.Intrusive, Boost.Interprocess, Boost.MultiArray, Boost.Unordered. I'm sure I'm missing some; we are talking about dozens of containers. And you'll have to keep track of new additions.

* At a more fundamental level, you're turning a constant-complexity task into a linear-complexity task: if you leave it at the allocator level, your job is finished as soon as you have written the allocator. With your approach, you are coupling with developments in the container arena, forcing yourself to track this indefinitely. That is the beauty, for instance, of the iterator concept: by decoupling containers from algorithms via iterators, you move from N*M possible combinations (N algorithms, M containers) to an M+N scenario. Allocators (ill-designed as they admittedly are) decouple one aspect, namely memory management, from the rest of the aspects contributing to a container implementation.

One last illustration of the point: say another author wrote wrapped containers simplifying the use of a template parameter other than the allocator, for instance:

template<typename T, typename Allocator>
struct greater_set : std::set<T, greater<T>, Allocator> { ... };

Now, how do you combine greater_set and monotonic::set?
People will expect that a monotonic::foo<..> is like a foo<..>, and they will accept that it requires a storage argument. But they will find it harder to accept that it requires retooling from a type-argument level of the allocator.
I can't say what people will expect, but I do know allocators are meant to be used and that's why they're part of the template argument interface. That's not "retooling". Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

STL isn't rapidly changed.
That may be the understatement of the year :)
People will expect that a monotonic::foo<..> is like a foo<..>, and they will accept that it requires a storage argument. But they will find it harder to accept that it requires retooling from a type-argument level of the allocator.
Why incur the overhead of expectation, when foo<T, monotonic::allocator<T>> gives you exactly what you want? Andrew Sutton andrew.n.sutton@gmail.com

On Tuesday, June 09, 2009 9:07 AM, Andrew Sutton wrote:
People will expect that a monotonic::foo<..> is like a foo<..>, and they will accept that it requires a storage argument. But they will find it harder to accept that it requires retooling from a type-argument level of the allocator.
Why incur the overhead of expectation, when foo<T, monotonic::allocator<T>> gives you exactly what you want?
There's a problem with using an allocator for the node based containers: they use the allocator to allocate the elements but not the nodes. That implies free store (de)allocations for the nodes which is contrary to the intended purpose of this proposal. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

Hi Andrew,
typedef std::vector<T, mono::allocator<T> > Vector; Vector v(mono::allocator<T>(store));
It may not appear as graceful, but it doesn't require the duplication of classes, which can be harmful. There are libraries in Boost that specialize (templates) on container types, and new containers would likely fail the specializations.
It is possible to use the proposed system without the need for custom container types. It is also possible to use STL without the need for default allocator types. Regards, Christian.

On Mon, Jun 8, 2009 at 11:51 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
My largest complaint with C++ is that containers do not fit well with high-perf low-overhead scenarios, so I'm very interested in seeing something like this. I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change. Let the user specify storage manually and throw if you go over it. Make default copying just copy the pointers around instead of all data. -- Cory Nelson http://int64.org

Hi Cory,
My largest complaint with C++ is that containers do not fit well with high-perf low-overhead scenarios, so I'm very interested in seeing something like this.
I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change.
I don't think anyone doing high-performance code uses std::string. It is a nightmare, and similarly for std::vector, for the reasons you claim. However, re-writing these is not what I wish to do here...
Let the user specify storage manually and throw if you go over it.
That is precisely what my proposal does.
Make default copying just copy the pointers around instead of all data.
We can't lose value semantics for containers by default. Regards, Christian.

On Jun 9, 2009, at 9:48 PM, Christian Schladetsch wrote:
Hi Cory,
My largest complaint with C++ is that containers do not fit well with high-perf low-overhead scenarios, so I'm very interested in seeing something like this.
I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change.
I don't think anyone doing high-performance code uses std::string.
I have; when there has been no need to add to it - obviously - or deal with Unicode.
It is a nightmare, and similarly for std::vector, for the reasons you claim.
I have... Why would it be a nightmare using std::vector? The allocation mechanism, for the implementations I know, is not that bad, and one can always handle it pre-emptively, just as one would with "my_cool_buffer_array_like_thingie," especially with the kind of PODs one often deals with in high-performance operations. I just wonder if the problem you see with the allocation scheme is an algorithmic one or something specific to a certain implementation. Putting allocation strategies aside - as in a lot of high-performance cases one is well aware of the "max buffer size" - what factors of std::vector do you see as intrinsically, or in current implementations, limiting performance? /David

Hi David, Why would it be a nightmare using std::vector? The allocation mechanism, for
the implementations I know, is not that bad, and one can always handle it pre-emptively, just as one would with "my_cool_buffer_array_like_thingie," especially with the kind of PODs one often deals with in high-performance operations. I just wonder if the problem you see with the allocation scheme is an algorithmic one or something specific to a certain implementation.
The 'problem' with std::vector, as with all the other containers, is that it treats memory allocation with little respect. This is precisely what my boost::monotonic proposal addresses. It gives you, the user of the containers, complete control over where the memory comes from, and how much there is of it.
Putting allocation strategies aside - as in a lot of high-performance cases one is well aware of the "max buffer size" - what factors do you see of std::vector that is intrinsically, or in current implementations, limiting performance?
Currently, all storage for a std::vector must be on the heap. That is not cool. It should also be able to come from the stack. That is what my proposal addresses. Having the ability to use STL containers which use stack-based memory is very important. It reduces memory fragmentation and increases cache coherency. Regards, Christian

Christian, Christian Schladetsch wrote:
This is precisely what my boost::monotonic proposal addresses. It gives you, the user of the containers, complete control over where the memory comes from,
I didn't follow his proposal closely, but I believe that is exactly what Thorsten Ottosen proposed with his auto_buffer proposal, no? AFAIK he has offered a fairly polished implementation, which is now awaiting review.
and how much there is of it.
In a std::vector, isn't that what vector::reserve is for (run-time reservation)? In a boost::auto_buffer, isn't that what StackBufferPolicy is for (compile-time reservation)? As has been asked before by Frank Mori Hess in this thread, and as I don't believe you have answered yet, in what respects is your proposal better/different from Thorsten's auto_buffer? Thanks, François

Hi Francois, I didn't follow his proposal closely, but I believe that is exactly
what Thorsten Ottosen proposed with his auto_buffer proposal, no? AFAIK he has offered a fairly polished implementation, which is now awaiting review.
This is the first I have heard of Thorsten Ottosen's auto_buffer proposal. Do you have a link to it?
As has been asked before by Frank Mori Hess in this thread, and as I don't believe you have answered yet, in what respects is your proposal better/different from Thorsten's auto_buffer?
My apologies if I overlooked that question. I do not recall seeing it. The purpose of boost::monotonic is to allow containers to share the same, fixed-size storage, and to not perform any memory management. Objects can be created and destroyed, but the percentage of used storage always monotonically increases. That fixed-size storage can come from either the heap or the stack. If auto_buffer does that, then yes there is a conflict and redundant effort.
Thanks, François
Cheers, Christian

Christian Schladetsch wrote:
This is the first I have heard of Thorsten Ottosen's auto_buffer proposal. Do you have a link to it?
I believe the last links he posted are: http://www.cs.aau.dk/~nesotto/boost/auto_buffer.zip http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/ Regards, François

This is unrelated. On Wed, Jun 10, 2009 at 5:49 PM, Francois Barel <frabar666@gmail.com> wrote:
Christian Schladetsch wrote:
This is the first I have heard of Thorsten Ottosen's auto_buffer proposal. Do you have a link to it?
I believe the last links he posted are: http://www.cs.aau.dk/~nesotto/boost/auto_buffer.zip http://www.cs.aau.dk/~nesotto/boost/trunk/libs/auto_buffer/doc/html/
Regards, François

Hello, My apologies; I should have been less succinct. auto_buffer is, in fact, related, inasmuch as it also attempts to use the stack for storage. However, boost::monotonic provides an allocator that means that *all* STL containers - and other ones - can use the stack for storage. Or the heap. In that respect, auto_buffer<T> is a subset of the proposed boost::monotonic::vector<T>. Regards, Christian.

On Jun 10, 2009, at 1:59 AM, Christian Schladetsch wrote:
Hello,
My apologies; I should have been less `succint`.
You are a bit trigger happy sometimes.
auto_buffer is, in fact, related, inasmuch as it also attempts to use the stack for storage.
Ok, so you could use Thorsten's auto_buffer as your storage, which actually would give you, potentially, the best of two worlds (allocation speed and - you seem to push this angle - cache) and dynamic heap beyond that initial segment.
However, boost::monotonic provides an allocator that means that *all* STL containers - and other ones - can use the stack for storage. Or the heap.
NOTE: your "storage_base" is not a base abstraction at all, but a "heap_storage"; having a non-leaf with concrete but totally different behavior from its leaf child is less than beautiful. Why is the behavior of whether to pick the heap storage ("storage_base"... ugh) or stack (if not embedded in a heap-allocated object...) decided at runtime, via that polymorphic "storage_interface"? It is set at the creation of the allocator, so why not make it a type parameter of the allocator, with a default to, say, "inline_storage"? Do you think it is important to have two containers sharing the exact same type even when one uses "storage_base" and the other "inline_storage"?
In that respect, auto_buffer<T> is a subset of the proposed boost::monotonic::vector<T>.
I still fail to see why you wrap all those containers, instead of providing just that allocator. Is the only reason what I just mentioned, to keep the storage mechanism variable ("storage_base" or "internal_storage") while the container types equal? That is the only somewhat logical rationale for me. Regarding injecting stuff, such as your allocator, in a template class higher up the hierarchy (such as in std::vector), you should have a look at my proposal at a "type injector": http://blog.davber.com/2006/07/24/hierarchy-generation-in-c/ Take care. /David

Hi David, auto_buffer is, in fact, related, inasmuch as it also attempts to use the
stack for storage.
Ok, so you could use Thorsten's auto_buffer as your storage, which actually would give you, potentially, the best of two worlds (allocation speed and - you seem to push this angle - cache) and dynamic heap beyond that initial segment.
Cache coherency is King! auto_buffer<T> is not analogous to monotonic::inline_storage<N> However, boost::monotonic provides an allocator that means that *all* STL
containers - and other ones - can use the stack for storage. Or the heap.
NOTE: your "storage_base" is not a base abstraction at all, but a
"heap_storage"; having a non-leaf having a concrete but totally different behavior from its leaf child is less than beautiful.
I agree that the current implementation is not ideal, but I don't follow what you mean here. storage_base may be ugly, but it is also required so that different containers can use the same storage. I agree that it could be improved. I had a lot of trouble with the more fundamental problems of std::allocator.
Why is the behavior of whether to pick the heap storage ("storage_base"... ugh) or stack (if not embedded in a heap-allocated object...) decided at runtime, via that polymorphic "storage_interface"? It is set at the creation of the allocator, so why not make it a type parameter of the allocator, with a default to, say, "inline_storage"? Do you think it is important to have two containers sharing the exact same type even when one uses "storage_base" and the other "inline_storage"?
I agree that there are some ugly aspects to the current proposal. However, it also works as advertised which is very important. I hate the current implementation's basis of using a virtual base class. It is anathema to me. I only presented it as it stands because I was more concerned about adoption of the general idea, rather than the smaller and less-relevant details of the implementation. I think most of us could write this from scratch with no problem, and without virtuals. Point being, if the idea is adopted, then yes there are optimisations and beautifying to do. That said, you may have also missed the point that multiple containers can use the same storage - which may be on the heap or the stack.
In that respect, auto_buffer<T> is a subset of the proposed boost::monotonic::vector<T>.
I still fail to see why you wrap all those containers, instead of providing
just that allocator. Is the only reason what I just mentioned, to keep the storage mechanism variable ("storage_base" or "internal_storage") while the container types equal? That is the only somewhat logical rationale for me. Regarding injecting stuff, such as your allocator, in a template class higher up the hierarchy (such as in std::vector), you should have a look at my proposal at a "type injector": http://blog.davber.com/2006/07/24/hierarchy-generation-in-c/
I will consider dropping the idea of including the containers on top of the allocator.

The 'wrapping of the containers' is very small, very straightforward, only requires the forwarding of construction parameters, and is completely safe, as none of them have any state outside of their base classes.

Inheriting from the std:: containers also means that existing code that pattern-matches against the std:: containers will also match against monotonic:: containers. They are functionally equivalent, and in-memory equivalent. They are the same thing, except that monotonic:: containers must be given either an allocator or storage from which to make one. The fact that STL containers do not have virtual dtors is not relevant. There is no state in the proposed monotonic containers that is not in std:: containers.

There is a case for:

boost::monotonic::inline_storage<N> storage;
std::vector<int, boost::monotonic::allocator<int> > v(storage); // (A)

versus:

boost::monotonic::inline_storage<N> storage;
boost::monotonic::vector<int> v(storage); // (B)

but what about:

boost::monotonic::inline_storage<N> storage;
boost::monotonic::map<int, float> v(storage);

versus:

boost::monotonic::inline_storage<N> storage;
std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);

My main concern would be that people just aren't comfortable with the idea of using custom allocator types, but they will accept the idea of needing to pass the storage in once they realise that they need a monotonic container.

There are other important issues, like having to give the predicate type for the associative containers before the allocator.

If I didn't think it was important, I wouldn't have proposed the library in the first place. Given that I think it's important, it is also important to me that the usage is clean. (A) above may be `better`, but (B) is cleaner. Take care.
/David
Cheers, Christian

I was wrong.

boost::monotonic::inline_storage<N> storage;
std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);

won't even work: the storage is not the allocator. Rather, avoiding the container overloads would require:

boost::monotonic::inline_storage<N> storage;
std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(boost::monotonic::allocator<int>(storage));

which is dramatically worse again.

Christian, 2009/6/10 Christian Schladetsch <christian.schladetsch@gmail.com>
I was wrong.
boost::monotonic::inline_storage<N> storage; std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);
Won't even work. Storage is not the allocator. Rather, to not use the container overloads would require:
boost::monotonic::inline_storage<N> storage; std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(boost::monotonic::allocator<int>(storage));
Which is dramatically even worse again.
Unless you make the constructor of the allocator explicit, it should work. Kind regards, Peter Bindels

Hi Peter, Unless you make the constructor of the allocator explicit, it should work.
Good point, but that doesn't resolve the fact that the std associative containers take the predicate type before the allocator type, and doesn't help with the ctors that take range arguments. I have requested sandbox write ability to update the library... Regards, Christian.

On Jun 10, 2009, at 4:32 AM, Christian Schladetsch wrote:
I was wrong.
boost::monotonic::inline_storage<N> storage; std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);
Won't even work. Storage is not the allocator. Rather, to not use the container overloads would require:
boost::monotonic::inline_storage<N> storage; std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(boost::monotonic::allocator<int>(storage));
Which is dramatically even worse again.
Yes, it is sad that C++ does not have parameterized typedefs, like haXe (http://www.haxe.org), but if you at least abstract away the specific containers, by:

template <template <class, class, class> class C, typename T>
class map : public C<T, std::less<T>, boost::monotonic::allocator<T> > {
    typedef C<T, std::less<T>, boost::monotonic::allocator<T> > base_cont;
    ...
};

and get rid of that polymorphic "storage_base", or at least use a good default; you should actually make it a policy instead of having to pass a "storage_base" (polymorphic) object to the constructor of your container. Also, that the default constructor of your containers creates a container that throws on any addition of elements is perhaps not ideal. Even the fact that the user has to make sure that this "storage_base" object resides in a safe place for the lifespan of your container is sad.
On Wed, Jun 10, 2009 at 8:16 PM, Christian Schladetsch < christian.schladetsch@gmail.com> wrote:
Hi David,
auto_buffer is, in fact, related, inasmuch as it also attempts to use the
stack for storage.
Ok, so you could use Thorsten's auto_buffer as your storage, which actually would give you, potentially, the best of two worlds (allocation speed and - you seem to push this angle - cache) and dynamic heap beyond that initial segment.
Cache coherency is King!
Yes, I have noticed that you push that angle very hard. Three comments:

1. In most high-performance applications I have written, the number of re-allocations of std::vector (or similar abstractions in other languages) has been quite low, and the cache hit ratio pretty good, with that quite firm contiguous block. You seem to think that there is something magical about stack space relative to heap space here. Can you explain that in more detail: i.e., why is cache coherency for a fixed block, such as your "internal_storage", so much higher than for an equally fixed block on the heap (using std::vector)? This as you labelled the use of the former as silly and a no-go in high-performance applications. To paraphrase, "nobody writing high-speed applications would ever consider using std::vector".

2. Why is cache coherency so much better when you use an inline array than when Thorsten does it? He also allocates inline (at first) and you only have a first, anyway. So, can you please explain why cache coherency is so much more King in your

internal_storage<1000> storage;
vector<Foo> chrisCont(storage); // and let us just pray that 'storage' survives this ;-)

than in Thorsten's

auto_buffer<store_n_bytes<1000> > thorCont;

?

NOTE: I know that your contiguous block could potentially span various containers, which *could be* a good thing for cache coherence, but in real life, there is often *one* container involved in a certain high-performance task. What is the use case you had in mind with more than one container, while having good cache coherence? It would have to involve a lot of activity in a specific region of that block (i.e., basically working on one container) or deal with a tiny block, which would make it less viable for most high-performance applications. And, there could not be many reallocations by the container class (such as growing.)

Corollary: did you even look at Thorsten's auto_buffer before exclaiming that "cache coherency is King"?
auto_buffer<T> is not analogous to monotonic::inline_storage<N>
However, boost::monotonic provides an allocator that means that *all* STL
containers - and other ones - can use the stack for storage. Or the heap.
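The claim that any STL container can draw from a fixed buffer is easy to sketch. The following is a minimal, hypothetical bump allocator (the names fixed_storage and bump_allocator are mine, not the library's) that lets a std::vector use a stack-resident buffer; "deallocation" is deliberately a no-op, as in the proposal, and this sketch skips alignment handling entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical sketch, not the proposed library's code: a fixed buffer
// plus an allocator that only ever bumps a cursor. Nothing is reclaimed
// until the storage object itself dies.
struct fixed_storage {
    char buffer[4096];
    std::size_t cursor;
    fixed_storage() : cursor(0) {}
    void* allocate(std::size_t bytes) {
        void* p = buffer + cursor; // no alignment handling in this sketch
        cursor += bytes;
        return p;
    }
};

template <class T>
struct bump_allocator {
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    template <class U> struct rebind { typedef bump_allocator<U> other; };

    fixed_storage* store;
    explicit bump_allocator(fixed_storage& s) : store(&s) {}
    template <class U> bump_allocator(const bump_allocator<U>& o) : store(o.store) {}

    pointer allocate(size_type n, const void* = 0) {
        return static_cast<pointer>(store->allocate(n * sizeof(T)));
    }
    void deallocate(pointer, size_type) {} // monotonic: nothing is freed
    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p) { p->~T(); }
    size_type max_size() const { return std::size_t(-1) / sizeof(T); }
};

template <class T, class U>
bool operator==(const bump_allocator<T>& a, const bump_allocator<U>& b) { return a.store == b.store; }
template <class T, class U>
bool operator!=(const bump_allocator<T>& a, const bump_allocator<U>& b) { return a.store != b.store; }
```

With this in place, a std::vector<int, bump_allocator<int> > constructed from a fixed_storage on the stack allocates out of that buffer and never touches the heap, which is the effect being described.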
NOTE: your "storage_base" is not a base abstraction at all, but a "heap_storage"; a non-leaf with concrete behavior totally different from its leaf child is less than beautiful.
I agree that the current implementation is not ideal, but I don't follow what you mean here. storage_base may be ugly, but it is also required so that different containers can use the same storage.
What I mean is that if you look at the inheritance graph:

    storage_interface (abstract)
          |
    storage_base (allocates on heap)
          |
    internal_storage (allocates inline, or stack...)

it is weird that "storage_base" (i) is concrete, (ii) does a completely different thing than the leaf, "internal_storage", and (iii) has a name that does not imply its behavior at all. It should be "heap_storage", and the graph should look like this:

    storage_interface (abstract)
          |
    storage_base (whatever concrete stuff is common for all or most storages)
        /            \
    heap_storage   internal_storage

I would actually merge the two abstract types, but that is just me.
I agree that it could be improved. I had a lot of trouble with the more fundamental problems of std::allocator.
Why is the choice of whether to pick the heap storage ("storage_base"... ugh) or the stack (if not embedded in a heap-allocated object...) decided at runtime, via that polymorphic "storage_interface"? It is set at the creation of the allocator, so why not make it a type parameter of the allocator, defaulting to, say, "inline_storage"? Do you think it is important to have two containers share the exact same type even when one uses "storage_base" and the other "inline_storage"?
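The compile-time alternative being suggested here can be sketched as follows (hypothetical names; this is not the proposal's code). The storage becomes a template parameter of the allocator, so no virtual dispatch is involved, and two containers share a type exactly when their storage types match:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a compile-time storage policy.
template <std::size_t N>
struct inline_storage {
    char buffer[N];
    std::size_t cursor;
    inline_storage() : cursor(0) {}
    void* allocate(std::size_t bytes) {
        void* p = buffer + cursor;
        cursor += bytes;
        return p;
    }
};

// The storage policy is a type parameter with a default, instead of a
// runtime choice behind a polymorphic storage_interface.
template <class T, class Storage = inline_storage<4096> >
struct monotonic_allocator {
    Storage* store;
    explicit monotonic_allocator(Storage& s) : store(&s) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(store->allocate(n * sizeof(T)));
    }
    void deallocate(T*, std::size_t) {} // nothing to do: monotonic
};
```

A heap-backed policy would just be another Storage type; the trade-off is exactly the one raised above: allocators over different storages no longer share a single container type.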
I agree that there are some ugly aspects to the current proposal. However, it also works as advertised which is very important.
I do not know what was advertised exactly, but (i) it does crash with a bus error when the container is default constructed without explicitly setting a storage, and (ii) it does crash on certain platforms and use cases, due to misaligned memory blocks. [I will comment more on this in another reply to you]

Besides, it does not even compile on GCC with strict error checking. And your 'map' does not work at all, since the allocator is expecting the key type instead of the proper 'pair<key, value>' which is expected by the standard container. Oh well, by skipping the 'map' and changing the 'P' to 'p' in a few instances, I got it to compile.

What is the meaning of your "New" function in your allocator? It does some funky back-stepping, repeating a construction.

GCC NOTE: on 4.3 and 4.4, INT_MAX is not defined in your code, whereas on GCC 4.0 it is.
I hate the current implementation's basis of using a virtual base class. It is anathema to me. I only presented it as it stands because I was more concerned about adoption of the general idea, rather than the smaller and less-relevant details of the implementation. I think most of us could write this from scratch with no problem, and without virtuals.
Point being: if the idea is adopted, then yes, there is optimising and beautifying to do.
That said, you may have also missed the point that multiple containers can use the same storage - which may be on the heap or the stack.
No, I did not miss that.
In that respect, auto_buffer<T> is a subset of the proposed boost::monotonic::vector<T>.
I still fail to see why you wrap all those containers, instead of providing
just that allocator. Is the only reason what I just mentioned: to keep the storage mechanism variable ("storage_base" or "internal_storage") while the container types stay equal? That is the only somewhat logical rationale for me. Regarding injecting stuff, such as your allocator, into a template class higher up the hierarchy (such as std::vector), you should have a look at my proposal of a "type injector": http://blog.davber.com/2006/07/24/hierarchy-generation-in-c/
I will consider dropping the idea of including the containers on top of the allocator.
The 'wrapping of the containers' is very small, very straightforward, only requires the forwarding of construction parameters, and is completely safe, as none of them has any state outside of their base classes.
Inheriting from the std::containers also means that existing code that pattern-matches against the std::containers will also match against monotonic::containers.
I do not follow you.
They are functionally equivalent,
So, you can replace 'std::vector' with 'boost::monotonic::vector', perhaps with a textual replace all?
and in-memory equivalent. They are the same thing, except that monotonic::containers must be given either an allocator or storage from which to make one.
Ah, yes, so they are not functionally equivalent per se.
The fact that STL containers do not have virtual dtors is not relevant. There is no state in the proposed monotonic containers that is not in std::containers.
There is a case for:
    boost::monotonic::inline_storage<N> storage;
    std::vector<int, boost::monotonic::allocator<int> > v(storage); // (A)
versus:

    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::vector<int> v(storage); // (B)
but what about:

    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::map<int, float> v(storage);
versus:

    boost::monotonic::inline_storage<N> storage;
    std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);
My main concern would be that people just aren't comfortable with the idea of using custom allocator types, but they will accept the idea of needing to pass the storage in once they realise that they need a monotonic container.
The people you target (those who would find the, in my opinion, proper way of doing it so much hairier than your "wrapped" way) would also assume that they can use the container default-constructed.
There are other important issues, like having to give the predicate type for the associative containers first.
These issues are not important, in my opinion.
If I didn't think it was important, I wouldn't have proposed the library in the first place.
Alas, it is intrinsically important, QED?
Given that I think that it's important, it is also important to me that the usage is clean. (A) above may be `better`, but (B) is cleaner.
I think the usage that *requires* an explicit storage object is less than clean. You should at least default to a specific storage, worst case heap-allocating it, since you do not want to slice a concrete storage object.

I like some aspects of your idea, which is why I take the time to go through your code and run some samples. I think Thorsten's auto_buffer covers most cases in performance-critical applications, but I like the idea of extending it to maps and lists.

So, get rid of those wrappers, tidy up the allocators a bit, and see how you can integrate it with Thorsten's effort, and we might have ourselves a winner.

/David

David Bergman wrote:
Ok, so you could use Thorsten's auto_buffer as your storage, which would actually give you, potentially, the best of both worlds (allocation speed and - you seem to push this angle - cache) and dynamic heap beyond that initial segment.
Cache coherency is King!
Yes, I have noticed that you push that angle very hard.
...
2. Why is cache coherency so much better when you use an inline array then when Thorsten does it? He also allocates inline (at first) and you only have a first, anyway. So, can you please explain why cache coherency is so much more King in your...
Cache coherency is, I think, a misapplied term here. What you two seem to be talking about is spatial locality of reference. The stack typically has good spatial locality of reference relative to the heap because it is contiguous. A vector or similar contiguous data structure always has good spatial locality of reference because it is contiguous, unless it is very small.

You will have no significant benefit from allocating a vector on the stack vs. the heap if it is very large because 1) your dynamic allocation cost is amortized over a very large number of vector elements, and 2) the benefit from the first few elements already being in cache because of being on the stack is amortized over a very large number of elements that are likely not in the cache and will perform the same as if allocated on the heap. Furthermore, the performance benefit of allocating a small vector on the stack as opposed to the heap will be dominated by the time saved by not going to the dynamic memory allocation routine, as opposed to the cache misses caused by the dynamically allocated memory being cold. Both of these performance concerns can be addressed by pooling allocators. I question the benefit of this stack- vs. heap-based allocation for containers when the terminology used to describe its benefit is misapplied. By the way, allocating large containers on the stack is an extremely, extremely foolish thing to do, because it causes crashes and performance problems.

Cache coherency is a concurrency issue where multiple processors (or cores) share memory; when modifying shared memory in their cache, they need to make sure that the cached version of the same address is also modified in the caches of other cores/processors. It is a serious problem when you have to handle it in software, and is best solved by hardware, which is tricky and expensive. Cell processors do not have hardware cache coherency, while Intel processors do.
Since you don't have to worry about cache coherency in general, and when you do need to worry about it your life is hell and you definitely know it, and since it is only an issue on some platforms, I doubt any boost library should ever consider the issue of cache coherency at all. Regards, Luke
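Luke's distinction can be illustrated concretely: the traversal-locality property under discussion belongs to the container's layout, not to where its block lives. A std::vector is element-contiguous whether its block came from the default (heap) allocator or anywhere else, which is checkable directly:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Spatial locality check: vector elements are contiguous regardless of
// which allocator produced the block, so sequential traversal walks
// consecutive addresses (and hence few cache lines) either way.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        if (&v[i] != &v[i - 1] + 1)
            return false;
    return true;
}
```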

On Jun 10, 2009, at 2:16 PM, Simonson, Lucanus J wrote:
Cache coherency is I think a misapplied term here.
I know, that is why I put apostrophes around it when initially talking about it; but that seemed to be the preferred term, with it being King and all. Yes, I am talking about locality of reference, and used basically the arguments you do here in order to get the idea that stacks are somehow intrinsically more cache-friendly than the heap out of somebody's head... /David

Hi Christian, 2009/6/10 Christian Schladetsch <christian.schladetsch@gmail.com>
There is a case for:
    boost::monotonic::inline_storage<N> storage;
    std::vector<int, boost::monotonic::allocator<int> > v(storage); // (A)
versus:

    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::vector<int> v(storage); // (B)
but what about:

    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::map<int, float> v(storage);
versus:

    boost::monotonic::inline_storage<N> storage;
    std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(storage);
My main concern would be that people just aren't comfortable with the idea of using custom allocator types, but they will accept the idea of needing to pass the storage in once they realise that they need a monotonic container.
There are other important issues, like having to give the predicate type for the associative containers first.
If I didn't think it was important, I wouldn't have proposed the library in the first place. Given that I think that it's important, it is also important to me that the usage is clean. (A) above may be `better`, but (B) is cleaner.
I much prefer using a standard template library container to any other, even if it's the same thing but different. I would therefore much prefer the std::map<int, float, std::less<int>, ...> approach to creating another container; especially as this one can also take a different comparison operator. My rationale is that when I want a vector with a different allocation scheme, I like to say "vector with a different allocation scheme" rather than "differently-allocated not-quite-a-std::vector". Kind regards, Peter Bindels

Hi Peter, I much prefer using a standard template library container to any other, even
if it's the same thing but different. I would therefore much prefer the std::map<int, float, std::less<int>, ...> approach to creating another container; especially as this one can also take a different comparison operator.
Yes, but what about

    boost::monotonic::inline_storage<N> storage;
    std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(boost::monotonic::allocator<int>(storage));

versus:

    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::map<int, float> v(storage);

My rationale is that when I want a vector with a different allocation
scheme, I like to say "vector with a different allocation scheme" rather than "differently-allocated not-quite-a-std::vector".
I grok that; but a boost::monotonic::container also is-a std::container, with guaranteed no extra junk. They are absolutely 100% type-compatible after creation. Regards, Christian

Christian Schladetsch skrev:
Hi Peter,
I much prefer using a standard template library container to any other, even
if it's the same thing but different. I would therefore much prefer the std::map<int, float, std::less<int>, ...> approach to creating another container; especially as this one can also take a different comparison operator.
Yes, but what about
    boost::monotonic::inline_storage<N> storage;
    std::map<int, float, std::less<int>, boost::monotonic::allocator<int> > v(boost::monotonic::allocator<int>(storage));
versus:
    boost::monotonic::inline_storage<N> storage;
    boost::monotonic::map<int, float> v(storage);
My rationale is that when I want a vector with a different allocation
scheme, I like to say "vector with a different allocation scheme" rather than "differently-allocated not-quite-a-std::vector".
I grok that; but a boost::monotonic::container also is-a std::container, with guaranteed no extra junk. They are absolutely 100% type-compatible after creation.
I think it is slightly wrong of you to put so much emphasis on syntactical convenience. In general, such is nice, but your library should offer some more real value. Especially since most containers end up being typedefs: so you can easily switch to another, so nested types are easy to type, and so you can give better-fitting names to your types for a given task.

Locality of reference seems like an important problem to improve. I don't agree with those who say high-performance applications usually involve a single container; mine certainly doesn't. Intrusive containers seem to give similar benefits w.r.t. locality of reference. Similar for flat_set/flat_map.

It is not clear to me in which niche your library would fit. I hope you can make that clear, and provide examples that show your library performs better than otherwise.

-Thorsten

On Jun 10, 2009, at 5:37 PM, Thorsten Ottosen wrote:
Locality of reference seems like an important problem to improve. I don't agree with those that say high-performance applications usually involves a single container; mine certainly doesn't.
Actually, I have been thinking a bit about it after I wrote it - I am afraid I am that "they" here ;-) - and you are right: most high-performance applications I write involve more than one container, just that one is pivotal and the typical bottleneck; but that fact ignores the "cache defocusing" of moving to that "secondary" container. Anyway, my other comments still hold. This locality across a few containers is something I get equally well, if not better, from a small heap than from this "monotonic" container proposal.
Intrusive containers seems to give similar benefits w.r.t. locality of reference. Similar for flat_set/flat_map.
Yes
It is not clear to me in which niche your library would fit. I hope you can make that clear, and provide examples that show your library performs better than otherwise.
I am curious as to how his ideas can affect the future of your auto_buffer, and perhaps creating an "auto_map" ;-) /David

David Bergman skrev:
It is not clear to me in which niche your library would fit. I hope you can make that clear, and provide examples that show your library performs better than otherwise.
I am curious as to how his ideas can affect the future of your auto_buffer, and perhaps creating an "auto_map" ;-)
Well, with a simple modification to flat_set/flat_map and circular_buffer, all these containers could be configured to use a custom buffer, e.g. an array or auto_buffer. Would this fit what you have in mind? -Thorsten

On Jun 11, 2009, at 4:25 AM, Thorsten Ottosen wrote:
David Bergman skrev:
It is not clear to me in which niche your library would fit. I hope you can make that clear, and provide examples that show your library performs better than otherwise. I am curious as to how his ideas can affect the future of your auto_buffer, and perhaps creating an "auto_map" ;-)
Well, with a simple modification to flat_set/flat_map and circular_buffer, all these containers should be configured to use a custom buffer, e.g. an array or auto_buffer.
Would this fit what you have in mind?
Yes. Thanks in advance ;-) /David

2009/6/10 Christian Schladetsch <christian.schladetsch@gmail.com>:
Cache coherency is King!
Well, for portability, cache obliviousness is King, since there's no way you can tune for every specific combination of memory caches, instruction caches, page caches, etc all at the same time. ( http://en.wikipedia.org/wiki/Cache-oblivious_algorithm )

I do not intend to argue about the difference between cache coherency and spatial locality; suffice to say they are related.

If you have a set of nodes for a map scattered over a heap and traverse them, you will generally be hitting different cache lines. If you have a set of nodes for a map on the stack and traverse them, you will generally be hitting fewer cache lines. This is a platform-independent fact of all modern hardware.

Now, if you have an unordered_map<string, list<Foo> >, and that is on the stack using a monotonic::allocator, using this data structure will be much faster than using the exact same structure with a std::allocator. Why? Because the first one is not doing *any memory management*. Despite claims otherwise, I challenge anyone to show me where the proposed system does a single memory allocation or de-allocation. All the storage is in a boost::array<char, N>. That is the allocation buffer. It is allocated by the compiler. It is on an aligned boundary. It is probably on the stack. I am just returning newly constructed objects from within that buffer. Should I align objects within that buffer? Sure, I've already done it; it's a trivial change. But it's still not an allocation, it is a re-interpretation of space within an already-allocated buffer.

So, the first reason why my data structures are much faster than the std:: ones is that they do no memory allocation. No new, no delete. "Allocation" is simply incrementing a pointer. "Deallocation" does nothing at all.

The second reason why containers using monotonic::allocator are faster than otherwise is that they can use the stack for storage of std:: containers. This means that you can put a map<string, list<map<int, vector<foo> > > > on the stack. All the nodes and the pointers within those nodes are also on the stack.
This means that traversing that structure is fast, and going from the vector<foo> inside to the outer map<string, ..> is fast, as the values for both will be stored adjacent to each other on the stack. Make a new structure using the same storage, somewhere down a call-chain perhaps, and it will play very nicely with the existing ones. I have a set of motivational examples in the documentation.

So, the three things that are different about this proposal are that 1. there is no memory management, 2. the stack can be used to store the nodes in a map, set or list (or unordered_map, etc.), and 3. there is no memory fragmentation.

The main compelling reason to use a monotonic::vector<T> over a std::vector<T> is that the former will not fragment memory, but the latter will. The downside of course is that resizing a monotonic::vector is very wasteful, as the memory for the old vector is not freed. That is why, for example, the following fails:

    monotonic::storage<100*sizeof(int)> storage; //< on the stack! not on the heap, as others have suggested
    monotonic::vector<int> v(storage);
    for (int n = 0; n < 100; ++n)
        v.push_back(n); // will fail at n = 34 with an exhausted storage

but this succeeds:

    monotonic::storage<100*sizeof(int)> storage;
    monotonic::vector<int> v(100, storage); // create with an initial size and storage
    for (int n = 0; n < 100; ++n)
        v[n] = n;

As the vector in the first case expands, it resizes. But the old memory is not freed - that is why it is explicitly called a "monotonic allocator".

On Thu, Jun 11, 2009 at 8:18 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
2009/6/10 Christian Schladetsch <christian.schladetsch@gmail.com>:
Cache coherency is King!
Well, for portability, cache obliviousness is King, since there's no way you can tune for every specific combination of memory caches, instruction caches, page caches, etc all at the same time.
( http://en.wikipedia.org/wiki/Cache-oblivious_algorithm )
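Christian's warning above, that resizing a monotonic vector wastes the superseded blocks, can be quantified. This sketch assumes a capacity-doubling growth policy (growth factors are implementation-specific; MSVC, for example, grows by 1.5x, which is why his loop fails at a different point than this sketch predicts):

```cpp
#include <cassert>
#include <cstddef>

// Sketch: total arena bytes consumed when a vector grows one element at
// a time inside a never-freed (monotonic) arena, assuming capacity
// doubling. Every superseded block stays allocated, so the arena pays
// for 1 + 2 + 4 + ... elements, not just the final capacity.
std::size_t arena_bytes_for_growth(std::size_t elements, std::size_t elem_size) {
    std::size_t consumed = 0;
    std::size_t capacity = 0;
    for (std::size_t n = 1; n <= elements; ++n) {
        if (n > capacity) {
            capacity = capacity ? capacity * 2 : 1;
            consumed += capacity * elem_size; // the old block is NOT reclaimed
        }
    }
    return consumed;
}
```

For 100 ints this comes to 255 * sizeof(int) bytes, more than 2.5x the final contents, which is why creating the vector at full size up front succeeds where element-by-element push_back exhausts a buffer sized for exactly 100 ints.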

On Wednesday 10 June 2009, Christian Schladetsch wrote:
    monotonic::storage<100*sizeof(int)> storage; //< on the stack! not on the heap, as others have suggested
    monotonic::vector<int> v(storage);
    for (int n = 0; n < 100; ++n)
        v.push_back(n); // will fail at n = 34 with an exhausted storage
but this succeeds:

    monotonic::storage<100*sizeof(int)> storage;
    monotonic::vector<int> v(100, storage); // create with an initial size and storage
    for (int n = 0; n < 100; ++n)
        v[n] = n;
As the vector in the first case expands, it resizes. But the old memory is not freed - that is why it is explicitly called a "monotonic allocator".
Have you considered also supporting a slightly modified allocator that falls back on the heap once the initial storage is exhausted? I'm more interested in a small-buffer optimization while still supporting larger buffers, rather than restricting usage to a fixed storage size.
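The fallback idea can be sketched like this (hypothetical code, not part of the proposal): allocations come from the fixed buffer while it lasts, then spill to the heap, and only the heap blocks are ever actually freed:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical sketch of a small-buffer-with-heap-fallback arena.
// Buffer blocks are never reclaimed (monotonic); heap blocks are.
template <std::size_t N>
class fallback_arena {
    char buffer[N];
    std::size_t cursor;
public:
    fallback_arena() : cursor(0) {}

    // did this pointer come from the fixed buffer?
    bool owns(const void* p) const {
        const char* c = static_cast<const char*>(p);
        return c >= buffer && c < buffer + N;
    }

    void* allocate(std::size_t bytes) {
        if (cursor + bytes <= N) {       // fast path: bump the cursor
            void* p = buffer + cursor;
            cursor += bytes;
            return p;
        }
        return ::operator new(bytes);    // slow path: heap
    }

    void deallocate(void* p) {
        if (!owns(p))                    // only heap blocks are freed
            ::operator delete(p);
    }
};
```

This keeps the bump-allocation fast path while removing the hard size ceiling; the cost is a bounds check per allocation and per deallocation.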

Christian Schladetsch wrote:
I do not intend to argue about the difference between cache coherency and spatial locality; suffice to say they are related.
Yes, there is the issue of false sharing of data that happens to fall on the same cache line. Depending on how hardware cache coherency is implemented, this leads to false cache coherency concerns (and performance loss) when the granularity of hardware cache coherency is the cache line, as opposed to the much more expensive word-size level. Both of these problems, however, are solved by custom allocators such as are provided by TBB and other threading libraries, which allocate entire cache lines and/or create separate heaps for each thread to prevent them from needing to serialize on access to static data in the allocator, while at the same time preventing false sharing. Neither is a good reason to use monotonic containers.

Please compare the performance of your container to std containers with a good memory-pooling allocator, and include system settings with your benchmarks so that we know what you are measuring. Your benchmarks mean nothing to me because I don't know what allocator you are comparing to, but I suspect you are comparing to the system allocator, which is clearly a straw-man argument.

Regards, Luke

Hi Luke, Please compare performance of your container to std containers with a good
memory pooling allocator and include system settings with your benchmarks so that we know what you are measuring. Your benchmarks mean nothing to me because I don't know what allocator you are comparing to, but suspect you are comparing to the system allocator, which is clearly a straw man argument.
I am directly comparing monotonic::allocator to std::allocator, so there is no straw man. I agree that eventually I'll have to compare against as many other possible arrangements as possible, stopping when any of them fails to provide a performance benefit for the proposed system.

The benchmarks I posted, while hardly exhaustive, do indeed show a performance increase of 10-100x over the standard schemes. At least this was a proof of concept over and above my hand-waving. I showed this for both stack- and heap-based storage, for the same algorithm implemented using the same containers, with no other changes or custom data structures.

I agree that more testing should be done, and I'll do so with a better system analysis. I'll post more results and an updated library with full constructor forwarding and alignment. It may well be that a more extensive performance analysis shows that my proposed allocator is a wash.

Cheers, Christian.

On Thu, Jun 11, 2009 at 7:26 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:

The benchmarks I posted, while hardly exhaustive, do indeed show a performance increase of 10-100x over the standard schemes. At least this was a proof-of-concept over and above my hand-waving. Cheers, Christian

I don't really have much faith in those 10-100x performance results. Someone else in that thread ran the code and got more like 1.2x better performance with the monotonic container. In my tests I, too, see about 1.2x improvement. Until somebody reproduces your performance results I will remain skeptical. -Chris Hopman

2009/6/11 Christian Schladetsch <christian.schladetsch@gmail.com>:
The benchmarks I posted, while hardly exhaustive, do indeed show a performance increase of 10-100x over the standard schemes. At least this was a proof-of-concept over and above my hand-waving.
As posted, however, I was unable to reproduce your results, getting more in the 0.1x range. Perhaps if you gave details of your environment, others would have more luck.

On Jun 11, 2009, at 8:26 PM, Christian Schladetsch wrote:
Hi Luke,
Please compare performance of your container to std containers with a good
memory pooling allocator and include system settings with your benchmarks so that we know what you are measuring. Your benchmarks mean nothing to me because I don't know what allocator you are comparing to, but suspect you are comparing to the system allocator, which is clearly a straw man argument.
I am directly comparing monotonic::allocator to std::allocator, so there is no straw-man.
I agree that eventually I'll have to compare against as many other possible arrangements as possible, stopping when any of them fails to provide a performance benefit for the proposed system.
The benchmarks I posted, while hardly exhaustive, do indeed show a performance increase of 10-100x over the standard schemes. At least this was a proof-of-concept over and above my hand-waving.
I showed this for both stack- and heap-based storage, for the same algorithm implemented using the same containers, which no other changes or custom data-structures.
NOTE: I do get that reserving space - on stack or heap - is better than incremental allocations (or reallocations), and your "monotonic" allocator could definitely be a good one if one is willing to bet on a fairly small maximum size.

But if we skip this reserved+monotonic vs. unreserved+standard comparison: what we have then is that your allocator simply creates crashing code for a lot of structures or classes on certain platforms, whereas it yields worse performance (for my simple examples of misaligned structures, I got between 10% and 20% slower) for the lenient (in this respect...) Intel processor family. /David
I agree that more testing should be done, and I'll do so with a better system analysis. I'll post more results and an updated library with full constructor forwarding and alignment. It may well be that a more extensive performance analysis shows that my proposed allocator is a wash.
Regards,
Luke
Cheers, Christian.

Hi David,

Thanks for helping to bring the problem of alignment to my (admittedly somewhat dim at times) attention. I have changed inline_storage.h:

    /// allocate storage. ignore hint.
    void *allocate(size_t num_bytes, void const * = 0)
    {
        // ensure we return a pointer on an aligned boundary
        size_t extra = num_bytes & 15; // todo: policy-based? or is the alignment overkill?
        size_t required = num_bytes + extra;
        char *ptr = &buffer[cursor];
        cursor += required;
        return ptr + extra;
    }

Then I re-ran your test. I also added another test, which does exactly the same thing, but using std::vector rather than monotonic::vector:

    Incrementing ord = 2.89 ps per iteration
    Incrementing ord = 2.86 ps per iteration
    STD: Incrementing ord = 3.1 ps per iteration
    STD: Incrementing ord = 3.1 ps per iteration

My other tests don't change much:

    monotonic: 0.001  std: 0.016
    monotonic: 0.01   std: 0.95

I am using VS2008, full optimisation. Note that your test case is ~8% faster when implemented using monotonic::vector over std::vector. This is because the monotonic vector is using the stack for storage, where the loop controls and other fields also are.

I will upload the change to boost/monotonic/inline_storage.h within hours, with this and more tests.

Regards and Thanks, Christian.

On Fri, Jun 12, 2009 at 4:58 PM, David Bergman <David.Bergman@bergmangupta.com> wrote:
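One thing worth noting about the snippet above: rounding by `num_bytes & 15` keys the adjustment to the request size rather than to the current cursor position, so the returned pointer is not guaranteed to land on a 16-byte boundary (e.g. cursor = 0, num_bytes = 4 returns buffer + 4). A cursor-based variant (a hypothetical sketch, assuming the buffer itself starts 16-byte aligned) would round the cursor up instead:

```cpp
#include <cassert>
#include <cstddef>

// Sketch: round the *cursor* up to the boundary before handing out the
// block, so every returned offset is 16-byte aligned within the buffer.
struct aligned_arena {
    char buffer[1024];
    std::size_t cursor;
    aligned_arena() : cursor(0) {}

    void* allocate(std::size_t num_bytes, std::size_t align = 16) {
        // round cursor up to the next multiple of 'align'
        // (align must be a power of two)
        std::size_t aligned = (cursor + align - 1) & ~(align - 1);
        void* p = buffer + aligned;
        cursor = aligned + num_bytes;
        return p;
    }
};
```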
On Jun 11, 2009, at 8:26 PM, Christian Schladetsch wrote:
Hi Luke,
Please compare performance of your container to std containers with a good memory pooling allocator and include system settings with your benchmarks so that we know what you are measuring. Your benchmarks mean nothing to me because I don't know what allocator you are comparing to, but suspect you are comparing to the system allocator, which is clearly a straw man argument.
I am directly comparing monotonic::allocator to std::allocator, so there is no straw-man.
I agree that eventually I'll have to compare against as many other possible arrangements as possible, stopping when any of them fails to provide a performance benefit for the proposed system.
The benchmarks I posted, while hardly exhaustive, do indeed show a performance increase of 10-100x over the standard schemes. At least this was a proof-of-concept over and above my hand-waving.
I showed this for both stack- and heap-based storage, for the same algorithm implemented using the same containers, with no other changes or custom data-structures.
[snip]

Can you post the full source for the test?

Ross

On Fri, Jun 12, 2009 at 2:10 AM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
[snip]

Hi Ross,

I am in the process of updating the sandbox, but in the meantime:

    #include <boost/timer.hpp>
    #include <boost/monotonic/vector.h>
    #include <iostream>
    #include <vector>

    namespace
    {
        template <typename C>
        struct Foo
        {
            long ord;
            C c;
        };

        const int LOOP_COUNT = 100000000;
        const int ELEM_COUNT = 1000;

        template <typename C>
        void test_loop_monotonic()
        {
            boost::monotonic::inline_storage<100000> storage;
            boost::monotonic::vector<Foo<C> > vec(storage);
            Foo<C> orig = { 'A', 65 };
            vec.assign(ELEM_COUNT, orig);
            boost::timer timer;
            for (int i = 0; i < LOOP_COUNT; ++i)
                ++vec[1 + i % (ELEM_COUNT - 2)].ord;
            double elapsed = timer.elapsed();
            std::cout << "Incrementing ord = " << 1000000000*elapsed/LOOP_COUNT << " ps per iteration" << std::endl;
        }

        template <class C>
        void test_loop_std()
        {
            Foo<C> orig = { 'A', 65 };
            std::vector<Foo<C> > vec;
            vec.assign(ELEM_COUNT, orig);
            boost::timer timer;
            for (int i = 0; i < LOOP_COUNT; ++i)
                ++vec[1 + i % (ELEM_COUNT - 2)].ord;
            double elapsed = timer.elapsed();
            std::cout << "STD: Incrementing ord = " << 1000000000*elapsed/LOOP_COUNT << " ps per iteration" << std::endl;
        }
    } // namespace

    void test_alignment()
    {
        test_loop_monotonic<char>();
        test_loop_monotonic<long>();
        test_loop_std<char>();
        test_loop_std<short>();
    }
    //EOF

Results:

    Incrementing ord = 2.88 ps per iteration
    Incrementing ord = 2.87 ps per iteration
    STD: Incrementing ord = 3.11 ps per iteration
    STD: Incrementing ord = 3.12 ps per iteration

Regards, Christian.

On Fri, Jun 12, 2009 at 6:28 PM, Ross Levine <ross.levine@uky.edu> wrote:
[snip]

Hello,

I should point out that no allocator can help you if you intentionally try to make non-aligned access to fields within an allocated array of structures.

On Fri, Jun 12, 2009 at 6:40 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
Hi Ross,
I am in the process of updating the sandbox, but in the meantime:
[snip] Regards, Christian.

2009/6/11 Christian Schladetsch <christian.schladetsch@gmail.com>:
    template <typename C>
    void test_loop_monotonic()
    {
        boost::monotonic::inline_storage<100000> storage;
        boost::monotonic::vector<Foo<C> > vec(storage);
        Foo<C> orig = { 'A', 65 };
        vec.assign(ELEM_COUNT, orig);
    }
This doesn't even compile for me (GCC 4.3.3), apparently because it can't swap the allocators.

Hi Scott,

Thanks for that; adding operator!= for monotonic::allocator seems to've fixed it. I am in the process of incorporating all the suggested changes into the sandbox now, and testing across more compilers.

Cheers, Christian.

On Fri, Jun 12, 2009 at 7:05 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
[snip]

Hello,

The code with the various suggested fixes and all referenced tests is now in the sandbox at svn.boost.org/svn/boost/monotonic.

On VS2008, full optimisation:

    Incrementing ord = 2.99 ps per iteration
    Incrementing ord = 2.99 ps per iteration
    STD: Incrementing ord = 3.22 ps per iteration
    STD: Incrementing ord = 3.19 ps per iteration
    monotonic: 0.001
    std: 0.016
    monotonic: 0.014
    std: 0.955

On GCC 4.3.3, -O4:

    Incrementing ord = 2.6 ps per iteration
    Incrementing ord = 2.7 ps per iteration
    STD: Incrementing ord = 3.1 ps per iteration
    STD: Incrementing ord = 2.7 ps per iteration
    monotonic: 0
    std: 0
    monotonic: 0.02
    std: 0.02

Obviously, I have a lot of testing to do to show anything real (other than that GCC is better at optimisation than VS in these cases). None of the results are discouraging, though; there is no case where monotonic is slower, and there are cases where it is faster. That said, I am the first to admit that I really need to provide far better and more exhaustive testing. Now that the code is in the sandbox, that will be easier to do.

Thanks and Regards, Christian.

On Fri, Jun 12, 2009 at 7:36 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
[snip]

On Friday 12 June 2009, Christian Schladetsch wrote:
[snip]
Are you setting _SCL_SECURE to zero on msvc? It's a common cause of performance drops with std containers on msvc.

Christian Schladetsch wrote:
Note that your test case is ~8% faster when implemented using monotonic::vector over using std::vector. This is because the monotonic vector is using the stack for storage, where the loop controls and other fields are also.
How do you know? Unless you went in and measured with vtune how many cache misses you get with each, you are just making things up and stating them as fact. I highly doubt the difference is due to improved spatial locality from allocation being on the stack.

Frank Mori Hess wrote:
Are you setting _SCL_SECURE to zero on msvc? It's a common cause of performance drops with std containers on msvc.
How many times has someone come to boost with an optimization that isn't, because they don't know about _SCL_SECURE? This is how the stl gets the reputation of being slow, which so damages people's ability to write sensible code. Forcing everyone who is ignorant of what the compiler is doing to write containers with bugs in them, because they don't like the fact that the stl is made slow by default in MSVC, is hardly improving security.

Storing all the nodes of a tree in a buffer instead of scattered in the heap is a good idea. Storing all the nodes of a tree in a buffer of a fixed size that causes a crash when you exceed it is a bad idea.

We get massive performance improvements by using a (non-standard) memory buffer allocator in internal code that implements a variant of R*Tree. We don't crash if we keep adding elements to the tree. We find that in toy benchmarks like Christian shows, the performance advantage of storing the nodes of a tree in a contiguous buffer is much smaller than in a real application, because even the system allocator will cluster them together in memory when memory is completely free, and because everything in a toy benchmark fits in cache anyway.

What Christian is trying to do is in general good, but this idea that allocating the buffer on the stack is somehow a great idea seems to be getting in the way of the portion of his overall proposal that is actually good. He also hasn't yet addressed the design flaw which makes his data structures unsafe to use. Performance at the expense of safety is easy to achieve; just program in C.

Instead of reinventing the wheel, perhaps we can just find some good allocators people have written and compare them? Since there is no std::R*Tree, and no way Christian would implement monotonic::R*Tree, it would be better to provide a standard compliant allocator people can use on their own containers that are modeled after the standard than to re-implement the stl containers with a non-standard compliant allocator.

Luke

Hi Luke,

On Sat, Jun 13, 2009 at 3:51 AM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
Christian Schladetsch wrote:
Note that your test case is ~8% faster when implemented using monotonic::vector over using std::vector. This is because the monotonic vector is using the stack for storage, where the loop controls and other fields are also.
How do you know? Unless you went in and measured with vtune how many cache misses you get with each you are just making things up and stating them as fact. I highly doubt the difference is due to improved spatial locality from allocation being on the stack.
I agree that I was brash in my statement, and that I have been wrong before. I hypothesised that it would be the case. I argued my case in theory, then someone else wrote a benchmark. I tested that benchmark using my system and using the std::system. The measurements I took showed my system was faster, so I logically took it as support of my theory. If you could find the time to look at the test in question, I would be delighted to hear alternative theories.
Frank Mori Hess wrote:
Are you setting _SCL_SECURE to zero on msvc? It's a common cause of performance drops with std containers on msvc.
How many times has someone come to boost with an optimization that isn't because they don't know about _SCL_SECURE? This is how the stl gets the reputation of being slow which so damages people's ability to write sensible code. Forcing everyone who is ignorant of what the compiler is doing to write containers with bugs in them because they don't like the fact that stl is made slow by default in MSVC is hardly improving security.
I tested it on GCC and VS. I found that yes indeed, my initial report of a 100x improvement was still demonstrable on VS, but it was much more modest, or non-existent, on GCC.

So I dug a little deeper, ran and re-ran the tests. Same result; VS was just much slower than GCC for that test. At first I put it down to a difference in the compilers. After all, I thought, a 100x improvement on VS, even if not reproducible on GCC, was still worthwhile to everyone using VS for realtime code. Including, of course, myself.

However, it nagged at me, and later more elaborate tests (which are now in the sandbox with revised implementation at http://svn.boost.org/svn/boost/sandbox/monotonic) showed that MSVC was getting pathologically poor results. Something was wrong.

To cut a short story long, I had once made a change to the project settings to use /Zi, which includes debug symbols in the /O2 optimised build. That stayed, and also just so happened to give results showing that the std::containers were far slower than the monotonic::containers on MSVC.

A newbie mistake to be sure, compounded by the effect of 'wishful thinking', which in this case was compounded by the fact that that specific fault gave just the results I had hoped for.

My current benchmarks for VS2008 /O2 (and *not* /Zi) are:

    Incrementing ord = 3.16 ps per iteration
    Incrementing ord = 3.18 ps per iteration
    STD: Incrementing ord = 3.43 ps per iteration
    STD: Incrementing ord = 4.04 ps per iteration
    monotonic: 0.001
    std: 0.002
    monotonic: 0.011
    std: 0.019   <<--- big drop from ~1.00!!
    test_map_list: mono: 19.628
    test_map_list: std: 46.226

And for GCC 4.3.3 -O4:

    Incrementing ord = 3.1 ps per iteration
    Incrementing ord = 3.1 ps per iteration
    STD: Incrementing ord = 2.8 ps per iteration
    STD: Incrementing ord = 2.9 ps per iteration
    monotonic: 0
    std: 0
    monotonic: 0.02
    std: 0.01   <<--- std:: is actually faster on GCC; deviation is high for this test
    test_map_list: mono: 14.5
    test_map_list: std: 21.76

So the results are mixed, except for the test_map_list test, which is 2.3x faster using MSVC and 1.5x faster using GCC. That test is the first more-real-world test provided, and I think it is starting to get outside of the 'toy benchmarks' area. The implementation is at https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.c... .

This test creates and uses a map<int, list<int> > a few times. It also more accurately fits the original motivations outlined in https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/doc/index.h... .

Is a 50%-200% improvement worth investigating further? Perhaps, perhaps not. It is certainly not 'compelling'.

Storing all the nodes of a tree in a buffer instead of scattered in the heap
is a good idea. Storing all the nodes of a tree in a buffer of a fixed size that causes a crash when you exceed it is a bad idea.
There are many cases in high-performance, realtime code where you can, or should, know and set high-water marks for memory usage, either per-frame or per-configuration or both. In any case, the same argument can be made against exhausting any memory pool, including the heap itself. To claim that the proposed system is faulty because it can exhaust an explicitly-sized allocation buffer seems a little mischievous.
We get massive performance improvements by using a (non-standard) memory buffer allocator in internal code that implements a variant of R*Tree. We don't crash if we keep adding elements to the tree. We find that in toy benchmarks like Christian shows the performance advantage of storing the nodes of a tree in a contiguous buffer is greatly less than in a real application because even the system allocator will cluster them together in memory when memory is completely free and because everything in a toy benchmark fits in cache anyway.
Perhaps. I didn't think "gee, how can I make some wacky boost proposal; I know, I'll come up with a strange allocator and try to befuddle things with poorly executed benchmarks!" Perhaps that's how it has progressed thus far, however. Even so, the original idea and implementation came from performance analysis of real-world game-system code that I write for 360 and PS3. I saw that, especially on the PS3, cache was really important, and I didn't want to be jumping around all over the heap. I wanted to use standard containers (not custom ones) which used either the stack or the heap for memory. I especially liked the idea of using the stack, as it was often the case that these temporary containers only existed for the lifetime of a stack-frame. I also knew that I didn't want memory to be "allocated and freed"; I knew how much memory would be required by a static analysis of the code, so I did not need a "memory allocator" in the traditional sense.
What Christian is trying to do is in general good, but this idea that allocating the buffer on the stack is somehow a great idea seems to be getting in the way of the portion of his overall proposal that is actually good. He also hasn't yet addressed the design flaw which makes his data structures unsafe to use.
If you are referring to the memory alignment issues, the current implementation finally does a very basic alignment, after some wrong-headedness from me. As noted by Thorsten, it should be changed again to use boost::align_memory.

Another important part of my motivation, which hasn't been explicitly mentioned much because it is a "soft" argument and hard to measure, is that of heap fragmentation. This is a huge issue for any real-time system that has to stay up and running indefinitely, and was also a major motivation for the monotonic proposal. Being able to 'wipe the slate clean' after each frame or configuration state is vitally important for continuous realtime systems. This is what the 'real-world-like' test (test_map_list in the output pasted above), which currently provides 50%-250% gains in performance, demonstrates. On top of the performance gains, the test also demonstrates that the monotonic system does not fragment the heap over time.

I haven't been pushing this aspect of my proposal a lot because I know it's soft and hard to measure, and because it could be argued to be too 'niche' a concern to be much of an issue. So I've pushed for results, perhaps too aggressively, in other areas like performance and ease of use.
Performance at the expense of safety is easy to achieve, just program in C.
Instead of reinventing the wheel perhaps can we just find some good allocators people have written and compare them? Since there is no std::R*Tree and no way Christian would implement monotonic::R*Tree
I wouldn't need to implement monotonic::R*Tree; it would just be the said hypothetical R*Tree implementation using a monotonic::allocator.
it would be better to provide a standard compliant allocator people can use on their own containers that are modeled after the standard than re-implement the stl containers with a non-standard compliant allocator.
I haven't re-implemented the std::containers. I have, perhaps in poor judgement, made a proposal that includes a wrapping of them that facilitates using them with a monotonic allocator. The proposal can be used either way:

    typedef boost::monotonic::map<int, boost::monotonic::list<int> > Map;

or:

    typedef std::map<int, std::list<int, boost::monotonic::allocator<int> >,
        std::less<int>, boost::monotonic::allocator<int> > Map;

So far, I have avoided a discussion about the merits, and lack thereof, of the design and practical use of std::allocators. monotonic::allocator stores a pointer to storage. If this is an affront to the standard, why can, and indeed must, allocators be compared for equality? Put another way, if allocators really cannot have any state at all, not even a pointer, why must they be compared? If they really cannot have state, then a work-around is to have a static global that points to the storage to use. Hack? Yes, but so are std::allocators.
Luke
Thanks for your comments, Luke; I really appreciate your thoughts on the matter. At the end of the day, we all want the same thing: standards-compliant, cross-platform, high-performance programs. If monotonic never sees the light of day, but at least starts a discussion about the issues it tries to address, and something else comes of it completely, then I would still consider that a complete win.

I also need to look at using auto_buffer<T> for storage and boost::align_memory, research R*Trees, and a lot of other things besides, so for me personally it has been very beneficial. So I say thanks, guys, for all the comments, and I apologise for sometimes being obtuse.

Regards, Christian.

This is the detail of the 'more-real-world' test mentioned below. My results show that the monotonic approach is 50% faster in GCC and much faster again in MSVC. Equally importantly, the monotonic implementation does not fragment the heap:

    // the purpose of this test is to simulate a `realtime system`
    void test_map_list_realtime()
    {
        monotonic::inline_storage<1000000> storage;
        const size_t outter_loops = 10*1000;
        const size_t inner_loops = 10000;

        boost::timer t0;
        for (int n = 0; n < outter_loops; ++n)
        {
            test_map_list_impl_mono(inner_loops, storage);
            // *** reset the memory usage to zero, hence zero heap fragmentation ***
            storage.reset();
        }
        double e0 = t0.elapsed();
        cout << "test_map_list: mono: " << e0 << endl;

        boost::timer t1;
        for (int n = 0; n < outter_loops; ++n)
        {
            test_map_list_impl_std(inner_loops);
        }
        double e1 = t1.elapsed();
        cout << "test_map_list: std: " << e1 << endl;
    }

    // part of the test_map_list_realtime test.
    // this is to be like running part of a simulation.
    void test_map_list_impl_mono(size_t count, monotonic::storage_base &storage)
    {
        typedef monotonic::list<int> List;
        typedef monotonic::map<int, List> Map;
        Map map(storage);
        size_t mod = count/10;
        for (size_t n = 0; n < count; ++n)
        {
            int random = rand() % mod;
            Map::iterator iter = map.find(random);
            if (iter == map.end())
            {
                map.insert(make_pair(random, List(storage)));
            }
            else
            {
                iter->second.push_back(n);
            }
        }
    }

    // same as test_map_list_impl_mono, but using std::containers
    void test_map_list_impl_std(size_t count)
    {
        typedef std::list<int> List;
        typedef std::map<int, List> Map;
        Map map;
        size_t mod = count/10;
        for (size_t n = 0; n < count; ++n)
        {
            int random = rand() % mod;
            Map::iterator iter = map.find(random);
            if (iter == map.end())
            {
                map.insert(make_pair(random, List()));
            }
            else
            {
                iter->second.push_back(n);
            }
        }
    }

On Sat, Jun 13, 2009 at 1:39 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
Hi Luke,
On Sat, Jun 13, 2009 at 3:51 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Christian Schladetsch wrote:
Note that your test case is ~8% faster when implemented using monotonic::vector over using std::vector. This is because the monotonic vector is using the stack for storage, where the loop controls and other fields are also.
How do you know? Unless you went in and measured with vtune how many cache misses you get with each you are just making things up and stating them as fact. I highly doubt the difference is due to improved spatial locality from allocation being on the stack.
I agree that I was brash in my statement, and that I have been wrong before. I hypothesised that it would be the case. I argued my case in theory, then someone else wrote a benchmark. I tested that benchmark using my system and using the std::system. The measurements I took showed my system was faster, so I logically took it as support of my theory. If you could find the time to look at the test in question, I would be delighted to hear alternative theories.
Frank Mori Hess wrote:
Are you setting _SCL_SECURE to zero on msvc? It's a common cause of performance drops with std containers on msvc.
How many times has someone come to Boost with an 'optimization' that isn't one, simply because they didn't know about _SCL_SECURE? This is how the STL gets its reputation for being slow, which so damages people's ability to write sensible code. Forcing everyone who is ignorant of what the compiler is doing to write containers with bugs in them, because they don't like the fact that the STL is made slow by default in MSVC, is hardly improving security.
I tested it on GCC and VS. I found that yes, indeed, my initial report of a 100x improvement was still demonstrable on VS, but it was much more modest, or non-existent, on GCC.
So I dug a little deeper, ran and re-ran the tests. Same result; VS was just much slower than GCC for that test. At first I put it down to being a difference in the compiler. After all I thought, 100x improvement on VS, even if not reproducible on GCC, was still worthwhile to everyone using VS for realtime code. Including, of course, myself.
However, it nagged at me, and later more elaborate tests (which are now in the sandbox with revised implementation at http://svn.boost.org/svn/boost/sandbox/monotonic) showed that MSVC was getting pathologically poor results. Something was wrong.
To cut a short story long, I had once made a change to the project settings to use /Zi, which includes debug symbols in the /O2 optimised build. That stayed, and also just so happened to give results that showed that the std::containers were far slower than the monotonic::containers for MSVC.
A newbie mistake to be sure, compounded by the effect of 'wishful thinking', which in this case was compounded by the effect that that specific fault gave just the results I had hoped for.
My current benchmarks for VS2008 /O2 (and *not* /Zi) are:

    Incrementing ord = 3.16 ps per iteration
    Incrementing ord = 3.18 ps per iteration
    STD: Incrementing ord = 3.43 ps per iteration
    STD: Incrementing ord = 4.04 ps per iteration
    monotonic: 0.001   std: 0.002
    monotonic: 0.011   std: 0.019   <<--- big drop from ~1.00!!
    test_map_list: mono: 19.628
    test_map_list: std: 46.226
And for GCC 4.3.3 -O4:

    Incrementing ord = 3.1 ps per iteration
    Incrementing ord = 3.1 ps per iteration
    STD: Incrementing ord = 2.8 ps per iteration
    STD: Incrementing ord = 2.9 ps per iteration
    monotonic: 0      std: 0
    monotonic: 0.02   std: 0.01   <<--- std:: is actually faster on GCC; deviation is high for this test
    test_map_list: mono: 14.5
    test_map_list: std: 21.76
So the results are mixed, except for the test_map_list test, which is 2.3x faster using MSVC and 1.5x faster using GCC.
That test is the first more-real-world test provided, and I think is starting to get outside of the 'toy benchmarks' area. The implementation is at https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.c... .
This test creates and uses a map<int, list<int> > a few times. It also more accurately fits the original motivations outlined in https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/doc/index.h... .
Is a 50%-200% improvement worth investigating further? Perhaps, perhaps not. It is certainly not 'compelling'.
Storing all the nodes of a tree in a buffer instead of scattered in the heap is a good idea. Storing all the nodes of a tree in a buffer of a fixed size that causes a crash when you exceed it is a bad idea.
There are many cases in high-performance, realtime code where you can, and should, know and set high-water marks for memory usage, either per-frame, per-configuration, or both.
In any case, the same argument can be made against exhausting any memory pool, including the heap itself. To claim that the proposed system is faulty because it can exhaust an explicitly-sized allocation buffer seems a little mischievous.
We get massive performance improvements by using a (non-standard) memory buffer allocator in internal code that implements a variant of R*Tree. We don't crash if we keep adding elements to the tree. We find that in toy benchmarks like Christian's, the performance advantage of storing the nodes of a tree in a contiguous buffer is much smaller than in a real application, because even the system allocator will cluster them together in memory when memory is completely free, and because everything in a toy benchmark fits in cache anyway.
Perhaps. I didn't think "gee, how can I make some wacky boost proposal; I know, I'll come up with a strange allocator and try to befuddle things with poorly executed benchmarks!" Perhaps that's how it has progressed thus far, however.
Even so, the original idea and implementation came from performance analysis of real-world game-system code that I write for 360 and PS3. I saw that, especially on the PS3, cache was really important and I didn't want to be jumping around all over the heap. I wanted to use standard containers (not custom ones), which used either the stack or the heap for memory. I especially liked the idea of using the stack, as it was often the case that these temporary containers only existed for the lifetime of a stack-frame. I also knew that I didn't want memory to be "allocated and freed"; I knew how much memory would be required by a static analysis of the code, so I did not need a "memory allocator" in the traditional sense.
What Christian is trying to do is in general good, but this idea that allocating the buffer on the stack is somehow a great idea seems to be getting in the way of the portion of his overall proposal that is actually good. He also hasn't yet addressed the design flaw which makes his data structures unsafe to use.
If you are referring to the memory alignment issues, the current implementation finally does a very basic alignment, after some wrong-headedness from me. As noted by Thorsten, it should be changed again to use boost::align_memory.
Another important part of my motivation, which hasn't been repeatedly explicitly mentioned because it is a "soft" argument and hard to measure, is that of heap fragmentation.
This is a huge issue for any real-time system that has to stay up and running indefinitely, and was also a major motivation for the monotonic proposal. Being able to 'wipe the slate clean' after each frame or configuration state is vitally important for continuous realtime systems.
This is what the 'real-world-like' test (test_map_list in the output pasted above) demonstrates, with 50%-250% gains in performance. On top of the performance gains, the test also demonstrates that the monotonic system does not fragment the heap over time.
I haven't been pushing this aspect of my proposal a lot because I know it's soft and hard to measure, and because it could be argued to be too 'niche' a concern to be much of an issue. So I've pushed for results, perhaps too aggressively, in other areas like performance and ease of use.
Performance at the expense of safety is easy to achieve, just program in C.
Instead of reinventing the wheel perhaps can we just find some good allocators people have written and compare them? Since there is no std::R*Tree and no way Christian would implement monotonic::R*Tree
I wouldn't need to implement monotonic::R*Tree, which would just be that hypothetical R*Tree implementation using a monotonic::allocator.
it would be better to provide a standard compliant allocator people can use on their own containers that are modeled after the standard than re-implement the stl containers with a non-standard compliant allocator.
I haven't re-implemented std::containers. I have, perhaps in poor judgement, made a proposal that includes a wrapping of them that facilitates using them with a monotonic allocator. The proposal can be used either way:
typedef boost::monotonic::map<int, boost::monotonic::list<int> > Map;
or:
typedef std::map<int, std::list<int, boost::monotonic::allocator<int> >, std::less<int>, boost::monotonic::allocator<int> > Map;
So far, I have avoided a discussion about the merits and lack thereof of the design and practical use of std::allocators.
monotonic::allocator stores a pointer to storage. If this is an affront to the standard, why can, and indeed must, allocators be compared for equality?
Put another way, if allocators really cannot have any state at all, not even a pointer, why must they be compared?
If they really cannot have state, then a work-around is to have a static global that points to the storage to use. Hack? Yes, but so are std::allocators.
Luke
Thanks for your comments Luke, I really appreciate your thoughts on the matter.
At the end of the day, we all want the same thing; standards-compliant, cross-platform, high-performance programs. If monotonic never sees the light of day, but at least starts a discussion about the issues it tries to address, and something else comes of it completely, then I would still consider that a complete win.
I also need to look at using auto_buffer<T> for storage, boost::align_memory, and research R*Trees, and a lot of other things besides, so for me personally it has been very beneficial.
So I say thanks guys for all the comments, and I apologise for sometimes being obtuse.
Regards, Christian.

Christian Schladetsch wrote:
I agree that I was brash in my statement, and that I have been wrong before. I hypothesised that it would be the case. I argued my case in theory, then someone else wrote a benchmark. I tested that benchmark using my system and using the std:: equivalent. The measurements I took showed my system was faster, so I logically took it as support of my theory. If you could find the time to look at the test in question, I would be delighted to hear alternative theories.
The memory overhead of header information on dynamically allocated data in the heap could effectively reduce your apparent cache size by causing fewer total elements to fit in cache. This would lead to more cache misses without being due to spatial locality. This could explain the entire difference in runtime, though I wouldn't go so far as to say I'm sure that is the case.
Another important part of my motivation, which hasn't been repeatedly explicitly mentioned because it is a "soft" argument and hard to measure, is that of heap fragmentation.
Heap fragmentation is a huge concern and very important in real applications. However, heap fragmentation does not occur if you free all the memory you allocate. Using dynamic allocation doesn't necessarily lead to heap fragmentation, only using it to allocate many small, long lived objects interleaved with other usage. Heap fragmentation is solved in a number of ways including memory pooling allocators. Heap fragmentation is one of the things I was alluding to when I mentioned the difference between toy benchmarks and real-world applications.
I haven't re-implemented std::containers. I have, perhaps in poor judgement, made a proposal that includes a wrapping of them that facilitates using them with a monotonic allocator. The proposal can be used either way:
typedef boost::monotonic::map<int, boost::monotonic::list<int> > Map;

or:

typedef std::map<int, std::list<int, boost::monotonic::allocator<int> >, std::less<int>, boost::monotonic::allocator<int> > Map;
I didn't realize. This is much better than what I thought you had. In fact I find myself liking the proposal more. I think you are thinking in the right direction to improve performance, and you remind me of some of my colleagues in your enthusiasm and the types of ideas you come up with.

We have a data structure very similar to your "rope" or "chain" idea that we use to allocate storage for a 2D spatial query data structure that is a substitute for the R*Tree variant I mentioned before. I like that you can start with an array on the stack and then extend into the heap as needed. This is a nice feature for when the object turns out to be extremely small. I like that it is safe, as opposed to a fixed-size buffer. Ours has a map instead of a linked list to access the nodes in the chain, so that we have log n/chunksize access time. The data structure is actually a substitute for map, where insertion and removal time is not better than a map, but access time is better and iteration speed is much better. It also consumes much less memory. It differs from a map also in that random insertion/removal invalidates iterators, because we insert into the middle of each chunk. If you only insert at the end you obviously keep your iterators valid.

Still, an allocator is no substitute for using the right data structures and algorithms. Using a map is a well known way to write slug code when you could use a vector with reserve, sort the vector, and get log n random-access lookup in the cases where no insertion or removal need be performed after the data structure is initially built, which is often the case. A map of int to list is a data structure that can be expected to perform very poorly. It isn't completely fair to do performance comparisons against such a poor data structure.
Thanks for your comments Luke, I really appreciate your thoughts on the matter.
At the end of the day, we all want the same thing; standards-compliant, cross-platform, high-performance programs. If monotonic never sees the light of day, but at least starts a discussion about the issues it tries to address, and something else comes of it completely, then I would still consider that a complete win.
I think an allocator that uses your chain buffer for storage and is safe could be a good thing. I think that the allocator can be implemented in a way that it doesn't need non-static data members. The chain is in and of itself not interesting as a replacement for deque, but as a memory buffer for a good allocator I think it could find its niche. I'd like to see the memory buffer implement object deallocation and memory reuse, and perhaps thread safety (maybe optional thread safety, enabled by default). I almost coded up a proposal for such an allocator based on a deque when I last replied to you, but I see you've thought way further along those lines without my prompting.

On Mon, Jun 15, 2009 at 12:11 PM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
I think an allocator that uses your chain buffer for storage and is safe could be a good thing. I think that the allocator can be implemented in a way that it doesn't need non-static data members. The chain is in and of itself not interesting as a replacement for deque, but as a memory buffer for a good allocator I think it could find its niche. I'd like to see the memory buffer implement object deallocation and memory reuse, and perhaps thread saftey (maybe optional thread safety enabled by default). I almost coded up a proposal for such an allocator based on a deque when I last replied to you, but I see you've thought way further along those lines without my prompting.
Now that is a great idea! A boost::allocators library would be a great addition. Then everyone wins, because we could have a cross-platform stack_based_allocator and monotonic_allocator and the like.

Yes, I was going to suggest boost::allocator::monotonic, rather than boost::monotonic::allocator. More on this later. On Tue, Jun 16, 2009 at 5:26 AM, Ross Levine <ross.levine@uky.edu> wrote:
Now that is a great idea! A boost::allocators library would be a great addition. Then everyone wins, because we could have a cross-platform stack_based_allocator and monotonic_allocator and the like.

On Tue, Jun 16, 2009 at 4:11 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Christian Schladetsch wrote:
[...] if you could find the time to look at the test in question, I would be delighted to hear alternative theories.
The memory overhead of header information on dynamically allocated data in the heap could effectively reduce your apparent cache size by causing fewer total elements to fit in cache.
I have surmised this as well. This would lead to more cache misses without being due to spatial locality.
This could explain the entire difference in runtime, though I wouldn't go so far as to say I'm sure that is the case.
However, for the test in point, there was no such overhead in any case, and allocation was not being compared, as the container being tested was a fully reserved vector in both cases:

    for (int i = 0; i < LOOP_COUNT; ++i)
        ++vec[1 + i % (ELEM_COUNT - 2)].ord;

The same principle is shown by the bubble-sort results I posted earlier. Placing the storage for the bubble-sort test on the stack resulted in a faster execution time. Placing *the same* monotonically-allocated structure on the heap is slower. Both were faster than a std::allocation on the heap. The difference isn't much, but it is not negligible, and it is free if you have a known and fixed upper bound on storage size. chain<> will go some way to assisting with structures that span the stack/heap boundary and hence do not need a fixed upper bound.

In general, there are five factors at work that make monotonic allocation, and containers that use it, faster and smaller:

1. Optional stack locality (memory access doesn't need the heap)
2. No memory-management size overhead (smaller structures; nodes are adjacent)
3. Faster allocation (no free-list traversal or modification)
4. Faster deallocation (it does nothing)
5. Free thread safety (when on the stack)

Heap fragmentation is one of the things I was alluding to when I mentioned
the difference between toy benchmarks and real-world applications.
I shall re-post results that demonstrate how monotonic allocation helps with this problem, as well as being faster in 'non-toy' situations, after I reply here.
I haven't re-implemented std::containers. [...] I didn't realize. This is much better than what I thought you had. In fact I find myself liking the proposal more. I think you are thinking in the right direction to improve performance, and you remind me of some of my
colleagues in your enthusiasm and the types of ideas you come up with.
Thanks :) I fumbled the way in which I originally made the proposal, with my nervous and defensive manner, and in how I attempted to communicate my ideas at first. I was abrasive and rude, and I apologise for that to Ayrton and Thorsten and others. Live and learn. It is pleasing to see that with some persistence and tests I am becoming a little less incoherent.
We have a data structure very similar to your "rope" or "chain" idea that we use to allocate storage for a 2D spatial query data structure that is a substitute for the R*Tree variant I mentioned before. I like that you can start with an array on the stack and then extend into the heap as needed. This is a nice feature for when the object turns out to be extremely small. I like that it is safe, as opposed to a fixed-size buffer. Ours has a map instead of a linked list to access the nodes in the chain, so that we have log n/chunksize access time. The data structure is actually a substitute for map, where insertion and removal time is not better than a map, but access time is better and iteration speed is much better. It also consumes much less memory. It differs from a map also in that random insertion/removal invalidates iterators, because we insert into the middle of each chunk. If you only insert at the end you obviously keep your iterators valid.
Interesting. chain<> defaults to using a deque, not a list, for link management. This is O(1). Still, an allocator is no substitute for using the right data structures and
algorithms.
I agree. However, whichever data-structures and algorithms you use, they will need storage and an allocator.
A map of int to list is a data structure that can be expected to perform very poorly. It isn't completely fair to do performance comparisions against such a poor data structure.
This would be true if I were proposing new data-structures. However, I am not comparing the performance of data-structures, so that is irrelevant. The comparisons and results I am showing use *the exact same code and data-structures* for both tests. I am merely testing the execution speed of the code produced by (a) using std::allocator and (b) using monotonic::allocator. In either case the data-structures and the code are identical, and hence largely irrelevant. Of course, one can write code that is designed to skew results one way or the other, which I have endeavoured to avoid; you can judge how well I have done that for yourself. Tests like test_dupe(), however, are designed to show that having a no-op deallocation is indeed fast.
I think an allocator that uses your chain buffer for storage and is safe could be a good thing. I think that the allocator can be implemented in a way that it doesn't need non-static data members. The chain is in and of itself not interesting as a replacement for deque, but as a memory buffer for a good allocator I think it could find its niche. I'd like to see the memory buffer implement object deallocation and memory reuse, and perhaps thread safety (maybe optional thread safety, enabled by default).
Stack-based containers are completely thread-safe by default, with no extra work and no locking required. I have added:

    monotonic::storage<N>          // renamed from monotonic::inline_storage<N>
    monotonic::shared_storage<N>   // for thread-safe heap-based storage

The nice thing about using monotonic::shared_storage is that you can use all containers in a thread-safe way transparently and efficiently, without needing to change the containers themselves. This is a very pleasing result.

Regards, Christian.

Results for http://tinyurl.com/n59qb8. This tests multiple uses of a map<int, list<int> > over a period of time.

    MSVC /O2
    test_map_list: mono: 18.727
    test_map_list: std: 44.936

    GCC 4.3.3 -O4
    test_map_list: mono: 14.86
    test_map_list: std: 21.93

Cheers, Christian.

I ran the same test from http://tinyurl.com/n59qb8 multiple times, for different data-structure sizes:

    MSVC /O2
    size      mono      std       mono/std
    --------------------------------------------------
    100       0.103     0.355     0.290141
    1100      1.439     4.402     0.326897
    2100      3.01      8.669     0.347214
    3100      4.841     13.186    0.367132
    4100      6.598     17.344    0.38042
    5100      8.584     21.848    0.392896
    6100      10.629    26.113    0.407039
    7100      12.595    30.762    0.409434
    8100      14.512    35.135    0.413035
    9100      16.996    39.713    0.427971

    GCC 4.3.3 -O4
    size      mono      std       mono/std
    --------------------------------------------------
    100       0.08      0.12      0.666667
    1100      1.18      1.71      0.690058
    2100      2.4       3.48      0.689655
    3100      3.7       5.42      0.682657
    4100      5.11      8.06      0.633995
    5100      7         9.62      0.727651
    6100      8.27      11.78     0.702037
    7100      9.69      14.02     0.691155
    8100      11.72     16.29     0.71946
    9100      12.99     18.64     0.696888

Cheers, Christian.

Due to popular demand, I introduce monotonic::chained_storage. http://tinyurl.com/llahjq

    /// storage that spans the stack/heap boundary.
    ///
    /// allocation requests first use inline storage of N bytes.
    /// once that is exhausted, later requests are serviced from the heap.
    ///
    /// all allocations remain valid at all times.
    template <size_t N, size_t MinLinkSize = N*1000, class Al = std::allocator<char> >
    struct chained_storage;

I re-ran test_map_list() using:

    monotonic::chained_storage<1000> storage;

rather than:

    monotonic::storage<1000000> storage;

The rest of the code remained unchanged.

    MSVC /O2, chained_storage<1000>
    size      mono      std       mono/std
    --------------------------------------------------
    100       0.175     0.358     0.488827
    1100      2.068     4.415     0.468403
    2100      4.145     8.59      0.482538
    3100      6.604     13.08     0.504893
    4100      8.821     17.246    0.511481
    5100      11.145    22.099    0.504321
    6100      13.708    26.431    0.518633
    7100      16.97     31.065    0.546274
    8100      19.394    36.177    0.536086
    9100      21.698    40.566    0.534881

    GCC 4.3.3 -O4
    size      mono      std       mono/std
    --------------------------------------------------
    100       0.09      0.13      0.692308
    1100      1.19      1.74      0.683908
    2100      2.45      3.69      0.663957
    3100      3.94      5.39      0.730983
    4100      5.25      7.4       0.709459
    5100      6.73      9.96      0.675703
    6100      8.64      11.68     0.739726
    7100      10.3      13.92     0.739943
    8100      11.63     16.29     0.713935
    9100      13.44     19.07     0.704772

Point being, chained_storage<N> behaves the same as storage<N>, but chained_storage<N> will only fail an allocation request after the heap itself is exhausted. chained_storage can be used as a buffer for either sequence-based or node-based containers. This buffer can start on the stack if you like, or via an inline buffer on the heap. Later requests that cannot be serviced by this inline buffer then fall back to heap allocation, which is cached on reset(). You can use chained_storage<N> and storage<N> interchangeably. Indeed, there is little need for storage<N> now, unless you really want to set a fixed high-water mark on memory usage. I have renamed storage<N> to fixed_storage<N>, and chained_storage<N> to storage<>, with a default inline-storage size of 8k. Regards, Christian.

For completeness, I have added monotonic::shared_storage<>: http://tinyurl.com/m7hyap

This is not tested. However, it simply puts a mutex guard around all pertinent methods, which are passed through to a private storage. Note that you do not need shared_storage, and do not pay for the mutex locks, if your containers are on the stack.

So, you now have stack-based storage for any container or number of containers if you wish, which will then transparently use the heap as required, without invalidating containers that have some of their data on the stack. It is also thread-safe, either by guaranteeing data is on the stack using fixed_storage<N>, or by using shared_storage<>. Christian.

on Mon Jun 15 2009, Christian Schladetsch <christian.schladetsch-AT-gmail.com> wrote:
I ran the same test from http://tinyurl.com/n59qb8 multiple times, for different data-structure sizes:
...
GCC 4.3.3 -O4
IIUC, even -O3 is not really supported by the GCC people. Again IIUC, if you report a codegen bug with -O3, it may or may not be fixed, whereas an -O2 codegen bug is considered a defect. I can only imagine how reliable -O4 is. I'm not sure it's reasonable to base any comparisons on the result of such a feature. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

The nice thing about using monotonic::shared_storage is that you can use all containers in a thread-safe way transparently and efficiently, without needing to change the containers themselves.
This is a very pleasing result.
That may be, but it is also incorrect. Making the allocator thread-safe doesn't mean that the containers that use that allocator are also thread-safe.

Christian Schladetsch skrev:
Hi David,
Then I re-ran your test. I also added another test, which does exactly the same thing, but using std::vector, rather than monotonic::vector:
Incrementing ord = 2.89 ps per iteration Incrementing ord = 2.86 ps per iteration STD: Incrementing ord = 3.1 ps per iteration STD: Incrementing ord = 3.1 ps per iteration
Please compare with Boost.Intrusive also. It's not hard to beat a standard implementation, but it is probably harder to beat intrusive containers. Also, if you need aligned storage, please use boost::aligned_storage. -Thorsten

Christian Schladetsch wrote:
But it's still not an allocation; it is a re-interpretation of space within an already-allocated buffer.
Which is what an allocation is. An allocation, as the name suggests, is simply the act of assigning some range of the computer's memory to some use. What you're doing is simply allocating some range of memory on the execution stack to your container. I fail to see how the standard allocator concept wouldn't be as good for that. Sure, the standard allocator concept has deficiencies: it doesn't provide two-way growing and shrinking, nor does it say how much memory it has actually allocated to you. But people are working on fixing that, it seems.

Hi Mathias, Christian Schladetsch wrote:
But it's still not an allocation; it is a re-interpretation of space within an already-allocated buffer.
Which is what an allocation is. An allocation, as the name suggests, is simply the act of assigning some range of the computer memory to some use.
Yes, I was completely wrong about that. Although indeed the entire buffer is first allocated, there are many 'allocations' made by different containers of different types within that initial buffer, each requiring correct alignment. I have corrected the lack of correct alignment in the code in the sandbox. It now uses boost::aligned_storage<T> in the allocator, which passes the alignment down to the storage to make correctly aligned allocations within that shared buffer.
What you're doing is simply allocating some range of memory on the execution stack to your container.
That is indeed part of it, yes. Except that the same range of memory can be used by multiple containers, the range of memory can be placed on the stack *or* the heap, and by making de-allocation a no-op, memory management is optimally efficient in time but always increases in space.
I fail to see how the standard allocator concept wouldn't be as good for that.
It is indeed; the allocator in the proposed library is-a std::allocator. One key extra component that the library provides is a coupling with a storage concept that can be on the heap or the stack, and can be shared by multiple allocators, and by being 'monotonic', memory management is extremely fast.
Sure, the standard allocator concept has deficiencies, it doesn't provide two-way growing and shrinking, nor does it say how much memory it actually has allocated you. But people are working on fixing that it seems.
This proposal is not about extending or changing the std::allocator concept. There are other systems proposed for that, as I think most agree that it needs some work. This proposal uses std::allocators and a common storage concept to provide correctly-aligned allocation of objects of different types from the same buffer which can be on the heap or on the stack, using an optimal-in-time allocation scheme. Thanks, Christian.

On Jun 13, 2009, at 7:01 PM, Christian Schladetsch wrote:
Hi Mathias,
Christian Schladetsch wrote:
But it's still not an allocation; it is a re-interpretation of space within an already-allocated buffer.
Which is what an allocation is. An allocation, as the name suggests, is simply the act of assigning some range of the computer memory to some use.
Yes, I was completely wrong about that.
Although indeed the entire buffer is first allocated, there are many `allocations` made by different containers of different types within that initial buffer, each requiring correct alignment.
I have corrected the lack of correct alignment in the code in the sandbox. This now uses boost::aligned_storage<T> in the allocator, which passes that down to the storage to make correctly aligned allocations within that shared buffer.
What you're doing is simply allocating some range of memory on the execution stack to your container.
That is indeed part of it, yes. Except that the same range of memory can be used by multiple containers, the range of memory can be placed on the stack *or* the heap, and by making de-allocation a no-op, memory management is optimally efficient in time but always increases in space.
This is the key unique aspect of your allocator, so what I think you need to do is show that this specific aspect can bring higher performance. So create a sample with at least two containers, preferably one vector-like and the other a tree, using the same block. Make the block big enough to be serious, but fitting in an L2 cache comfortably ;-) That way you can sell this idea, which I think is rather unique: having a small contiguous segment with cheap internal allocation for *two or more containers, and not restricted to vector-like containers*. [If one makes the block really big, one starts to compete with regular allocator schemes, which is not what I think you should do; i.e., focus on a few containers cooperating in a high-performance scenario] /David

Hi David, Christian> [...] the same range of memory can be
used by multiple containers, the range of memory can be placed on the stack *or* the heap, and by making de-allocation a no-op, memory management is optimally efficient in time but always increases in space.
David> This is the key unique aspect of your allocator, so what I think you need to do is show that this specific aspect can bring higher performance. So create a sample with at least two containers, preferably one vector-like and the other a tree, using the same block.
I have done this. The results show a 50% improvement. I posted the code three posts ago on this thread. It exists in the sandbox at https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.c..., starting on line 419 in "void test_map_list_realtime()".
That way you can sell this idea, which I think is rather unique: of having a small contiguous segment with cheap internal allocation for *two or more containers, and not restricted to vector-like containers*.
After I posted my initial map<int, list<int> > implementation using the proposed system, and got some good results (if tainted somewhat by my blunder on the 100x claim), Scott McMurray wrote a custom intrusive container just for this one case, and compared the results:
Scott: Upping it to 0x1010101 gave this:
mono: 35.82/35.71/35.82
std: 37.94/37.83/37.81
intrusive: 36.17/35.78/35.62

So, using a monotonic allocator with the stock standard containers is comparable to, if not better than, even a hand-built intrusive container. Why?

One thing to consider is that the monotonic allocator is also more efficient in space, because there is no overhead for keeping track of freelists or other memory-management bookkeeping. Specifically, say you have a map<int, int> and want to insert two nodes. Using MSVC's <xtree> for reference:

struct Node {
    _Nodeptr _Left;    // left subtree, or smallest element if head
    _Nodeptr _Parent;  // parent, or root of tree if head
    _Nodeptr _Right;   // right subtree, or largest element if head
    value_type _Myval; // the stored value, unused if head
    char _Color;       // _Red or _Black, _Black if head
    char _Isnil;       // true only if head (also nil) node
};

This is 3*4 + 4 + 2 = 18 bytes per node on a typical 32-bit system. (I assume a typical 32-bit system in the rest of this post; there will be vagaries between platforms and architectures, but these are incidental to the underlying point I will try to make.)

Now let's have a quick look at what happens when you put one of these on the heap with new. There will be whatever alignment padding is required, plus whatever overhead your memory allocator uses. We can't say too much about what a generalised allocator requires, but at a guess:

- two extra pointers to insert into a linked list = +8 bytes
- a block size and a used size = +8 bytes

In a real system there would likely be more, but let's just say it's around 16 bytes. Allocating two new nodes here is then (18 + 16)*2 = 68 bytes. Compare this to a monotonic allocator: two new nodes use 36 bytes, and three use 54 bytes, which is still less than the space required for just two nodes using new. (I have intentionally ignored alignment in both cases for the moment; I'll get to that shortly.)

Now, compare to a list node.
In this case, the ratio of overhead to value is higher still. A list node for ints uses 12 bytes, which is smaller than the allocation overhead itself! So you lose every time you allocate. On a typical 32-bit system:

new allocating 4 nodes for list<int>: 4*(16 + 12) = 112 bytes
mono allocating 4 nodes for list<int>: 4*12 = 48 bytes

Plus, and this is very important for speed, the 4 nodes in the monotonic storage are exactly adjacent in memory. On GCC, which uses 4-byte alignment on Intel platforms, there is not even any padding between the nodes. They reside precisely adjacent in memory, and possibly right there on the stack. I am guessing about the overhead for each allocation made by new; whether it is 8, 16 or 32 bytes, it is still higher than the overhead for a monotonic allocator, which is exactly zero. Of course, the ratio of waste goes down for larger values stored in the nodes, but whatever that ratio is, it will be smallest for a monotonic allocator, which has exactly zero overhead. Yes, extra space is required for alignment, but that is true for any allocator. Thanks again to Artyom and Thorsten for their perseverance against my ignorance on this matter.

So this is why a tree of lists of vectors is very fast and very small using monotonic allocators, and why using a monotonic allocator with the standard containers produces code that is comparable to, or even faster than, hand-crafted intrusive containers. Add to this the fact that the nodes themselves can be placed on the stack, and that pointers in those nodes point to other nodes that are also on the stack, quite likely close by. The offsets are small, the entire data structure itself is very small, there is no overhead other than the required alignment, and it is far easier to make efficient data structures.
As David and others have pointed out, any benefit given by this approach becomes impractical to leverage, or is swamped by other issues, for very large structures. Even then, however, if an upper limit is known, the storage can be put on the heap rather than the stack and you still get the benefit of zero allocation overhead and zero heap fragmentation. It is not for everyone, and it's certainly not a general solution. The approach here was born from the need for small, fast, local, short-lived, but general and nested containers, and it works best in that context.
/David
Regards, Christian.

On Wednesday 10 June 2009, Christian Schladetsch wrote:
However, boost::monotonic provides an allocator that means that *all* STL containers - and other ones - can use the stack for storage. Or the heap.
Not the containers specified by C++03. Quoting from the allocator requirements, section 20.1.5: "Implementations of containers described in this International Standard are permitted to assume that their Allocator template parameter meets the following two additional requirements beyond those in Table 32. — All instances of a given allocator type are required to be interchangeable and always compare equal to each other." This implies they are only required to support stateless allocators. I do believe providing an allocator that supports stack allocation with buffer fallback is the way to go, though. It could still be used portably with the containers Ion mentioned that support stateful allocators. Or, you might be able to use the old trick of storing the state used by the allocator immediately before the allocated buffer: allocate a larger block than requested, use the beginning for state, and return a pointer offset beyond the state information. Also, is "monotonic" a good name? std::vector grows its capacity monotonically too, so the name doesn't seem to capture any novel feature of your code over existing standard features.

On Tue, Jun 9, 2009 at 6:48 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
Hi Cory,
My largest complaint with C++ is that containers do not fit well with high-perf low-overhead scenarios, so I'm very interested in seeing something like this.
I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change.
I don't think anyone doing high-performance code uses std::string. It is a nightmare, and similarly for std::vector, for the reasons you claim. However, re-writing these is not what I wish to do here...
Let the user specify storage manually and throw if you go over it.
That is precisely what my proposal does.
Make default copying just copy the pointers around instead of all data.
We can't lose value semantics for containers by default.
Where does the copy constructor get the new storage from? -- Cory Nelson http://int64.org

Where does the copy constructor get the new storage from?
From the source object.
-- Cory Nelson http://int64.org _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

In fact, I just did a test:

void test_copy()
{
    boost::monotonic::inline_storage<1000*sizeof(int)> storage;
    boost::monotonic::vector<int> v1(storage);
    for (int n = 0; n < 100; ++n)
        v1.push_back(n);
    size_t rem1 = storage.remaining();
    boost::monotonic::vector<int> v2(v1);
    size_t rem2 = storage.remaining();
    assert(v2 == v1);
    assert(rem1 - rem2 == 100*sizeof(int));
}

The interesting thing is that stepping through the copy in Visual Studio showed no source code: it was entirely inlined and performed in one step, even in a debug build. On Wed, Jun 10, 2009 at 3:33 PM, Christian Schladetsch <christian.schladetsch@gmail.com> wrote:
Where does the copy constructor get the new storage from?
From the source object.
-- Cory Nelson http://int64.org

2009/6/9 Cory Nelson <phrosty@gmail.com>:
I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change.
An implementation could already define new/delete and std::allocator in such a way that the allocation block size wasn't stored in the allocation system. Are there any that do so? (And -- if we could ignore backwards compatibility -- would it be worth requiring delete[n] so that the same could be done for new[]/delete[]? My gut feeling is that programmers have that information around at delete[]-time anyway...) ~ Scott

On Tue, Jun 9, 2009 at 7:41 PM, Scott McMurray<me22.ca+boost@gmail.com> wrote:
2009/6/9 Cory Nelson <phrosty@gmail.com>:
I would want to see it taken a step further though. Overhead can be a big deal for large objects, so re-implementing string and vector completely (so that you don't have the pointer/capacity specified twice, in container & allocator) would be a welcome change.
An implementation could already define new/delete and std::allocator in such a way that the allocation block size wasn't stored in the allocation system.
Yes, I should have gone into more depth -- I meant allocating large objects with an arena allocator (just incrementing a pointer). -- Cory Nelson http://int64.org
participants (24)
-
Andrew Sutton
-
Christian Schladetsch
-
christopher hopman
-
Christopher Jefferson
-
Cory Nelson
-
David Abrahams
-
David Bergman
-
Francois Barel
-
Frank Mori Hess
-
joaquin@tid.es
-
joel
-
John Phillips
-
Jonas Persson
-
Mathias Gaunard
-
Peter Bindels
-
Peter Soetens
-
Peter Soetens
-
Ross Levine
-
Scott McMurray
-
Simonson, Lucanus J
-
Steven Watanabe
-
Stewart, Robert
-
Thorsten Ottosen
-
vicente.botet