[utility] new auto_buffer class --- RFC

Dear all,

For a long time I have been missing this utility in my own work, and in boost related stuff too. I think there are a number of libraries in boost that can benefit from this utility. Libs like boost.format can be quite a bit more effective, as can logging libraries, etc.

Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99). It's based on Matthew Wilson's stlsoft::auto_buffer, which is described in his book "Imperfect C++". I have changed a few things though:

- no swap() and resize() provided
- works with allocators from Boost.Interprocess
- does not store T[N] on the stack, as this is bad when the type has a non-trivial default constructor
- handles non-POD types in an exception-safe manner
- has a fixed capacity, but a non-fixed size
- provides a non-growing (thus inlineable) push_back()

Comments are most welcome.

best regards
-Thorsten
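For readers skimming the thread, here is a rough sketch of the idea. The names and members below are illustrative only, not the actual proposal; in particular it keeps a plain T[N] member for simplicity, which the real class deliberately avoids for types with non-trivial default constructors.

```cpp
#include <cstddef>

// Illustrative sketch only: stack storage for capacities up to N,
// heap fallback otherwise. Restricted to trivially copyable T here;
// the proposed class uses raw storage instead of T stack_[N].
template< class T, std::size_t N >
class auto_buffer_sketch
{
    T stack_[N];            // inline storage, used when capacity <= N
    T* buffer_;             // points at stack_ or at a heap allocation
    std::size_t capacity_;
    std::size_t size_;

public:
    explicit auto_buffer_sketch( std::size_t capacity )
        : buffer_( capacity <= N ? stack_ : new T[capacity] ),
          capacity_( capacity ), size_( 0 )
    { }

    ~auto_buffer_sketch()
    { if( buffer_ != stack_ ) delete [] buffer_; }

    // non-growing push_back(): capacity is fixed, size is not
    void push_back( const T& x ) { buffer_[size_++] = x; }

    std::size_t size() const { return size_; }
    bool on_stack() const    { return buffer_ == stack_; }

    T& operator[]( std::size_t i )             { return buffer_[i]; }
    const T& operator[]( std::size_t i ) const { return buffer_[i]; }

private:
    auto_buffer_sketch( const auto_buffer_sketch& );            // non-copyable
    auto_buffer_sketch& operator=( const auto_buffer_sketch& ); // in this sketch
};
```

For example, auto_buffer_sketch<int, 8> constructed with capacity 4 keeps its elements entirely on the stack, while the same type constructed with capacity 100 transparently uses the heap.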

AMDG Mathias Gaunard wrote:
Thorsten Ottosen wrote:
Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99).
N is known at compile-time. Thus I do not see how it is related to C99 VLAs at all.
N is not the number of elements in the container. N controls how much stack storage is allocated. The number of elements is dynamic. In Christ, Steven Watanabe

Steven Watanabe skrev:
AMDG
Mathias Gaunard wrote:
Thorsten Ottosen wrote:
Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99).
N is known at compile-time. Thus I do not see how it is related to C99 VLAs at all.
N is not the number of elements in the container. N controls how much stack storage is allocated. The number of elements is dynamic.
Exactly. The idea is that you know approximately in advance the maximum number of elements you need for your temporary array. For example, in a logging library, you might expect 95% of all lines to be less than 512 chars. Thus you can use

boost::auto_buffer<char,512> buffer;

to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.

Also, Matthew Wilson compares VLA implementations, and they are very bad, often not working properly in connection with recursion. auto_buffer simply works.

-Thorsten
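The logging pattern described here can be sketched even without auto_buffer itself, which merges both paths behind one interface. The function name, format, and the 512-byte threshold below are illustrative assumptions:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative only: the common case formats into a 512-byte stack
// buffer; the rare long line falls back to a heap allocation.
// auto_buffer's point is that both paths would share the same code.
std::string format_line( int line_no, const char* text )
{
    char stack_buf[512];
    int needed = std::snprintf( stack_buf, sizeof stack_buf,
                                "%d: %s", line_no, text );

    if( needed >= 0 && needed < static_cast<int>( sizeof stack_buf ) )
        return std::string( stack_buf, needed );    // ~95% case: no heap buffer

    std::vector<char> heap_buf( needed + 1 );       // ~5% case: heap fallback
    std::snprintf( &heap_buf[0], heap_buf.size(), "%d: %s", line_no, text );
    return std::string( &heap_buf[0], needed );
}
```

With auto_buffer the branch disappears: the container itself decides whether its storage is the internal array or a heap block.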

On Mon, Mar 2, 2009 at 4:35 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Steven Watanabe skrev:
AMDG
Mathias Gaunard wrote:
Thorsten Ottosen wrote:
Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99).
N is known at compile-time. Thus I do not see how it is related to C99 VLAs at all.
N is not the number of elements in the container. N controls how much stack storage is allocated. The number of elements is dynamic.
Exactly.
The idea is that you know approximately in advance the maximum number of elements you need for your temporary array. For example, in a logging library, you might expect 95% of all lines to be less than 512 chars. Thus you can use
boost::auto_buffer<char,512> buffer;
to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.
I think that sounds very useful. I'd be interested in seeing something like that in boost. But it's not anything like C99 VLAs, I wouldn't refer to it as such. -- Cory Nelson

Cory Nelson skrev:
On Mon, Mar 2, 2009 at 4:35 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Steven Watanabe skrev:
AMDG
Mathias Gaunard wrote:
Thorsten Ottosen wrote:
Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99). N is known at compile-time. Thus I do not see how it is related to C99 VLAs at all. N is not the number of elements in the container. N controls how much stack storage is allocated. The number of elements is dynamic. Exactly.
[snip]
I think that sounds very useful. I'd be interested in seeing something like that in boost. But it's not anything like C99 VLAs, I wouldn't refer to it as such.
Well, AFAIK, they are very similar. You set the capacity once when you initialize an object: boost::auto_buffer<T> buffer( get_max_capacity_from_somewhere() ); isn't that exactly what you do with VLAs? -Thorsten

Thorsten Ottosen wrote:
Well, AFAIK, they are very similar. You set the capacity once when you initialize an object:
boost::auto_buffer<T> buffer( get_max_capacity_from_somewhere() );
isn't that exactly what you do with VLAs?
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).

On Mon, Mar 2, 2009 at 9:42 AM, Mathias Gaunard < mathias.gaunard@ens-lyon.org> wrote:
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).
I am wondering if there should be a default at all for N here as it seems like it is not really possible to make a good decision as to what that default value should be at the library level. If anything, I'd suggest at least having the default value able to be easily changed by users of the library with a #define. That raw, magic 256 at such a level is just a little scary to me. -- -Matt Calabrese

Matt Calabrese skrev:
On Mon, Mar 2, 2009 at 9:42 AM, Mathias Gaunard < mathias.gaunard@ens-lyon.org> wrote:
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).
I am wondering if there should be a default at all for N here as it seems like it is not really possible to make a good decision as to what that default value should be at the library level. If anything, I'd suggest at least having the default value able to be easily changed by users of the library with a #define. That raw, magic 256 at such a level is just a little scary to me.
I'm more scared by a #define than by the 256 default. Different users (or different library implementers) can make their own decision regarding this based on domain-specific knowledge. For example, Boost.Format might conjecture that 99% of all strings are less than 350 chars, and so use

boost::auto_buffer<char,350>

OTOH, in my custom project where we work with NP-hard graph problems, I know that N=64 is the maximum possible vertex set size I can handle. So I can use

boost::auto_buffer<unsigned,64>

-------------------

One might ask whether the default should be the number of elements of the stack buffer, or the total size of the stack buffer. If somebody wrote

boost::auto_buffer< boost::array<T,100> > buffer;

this would consume quite a lot of stack space. Alternatively we might try to compute a reasonable default size based on the size of the type and on minimum and maximum default sizes.

-Thorsten

On Mon, Mar 2, 2009 at 10:29 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
I'm more scared by a #define than by the 256 default.
Hmm? Why is that? On the contrary, I would say a #define here is simply good practice. A raw 256 is only providing a default at a level where you have little-to-no knowledge of what such a default should be. Allowing the default to be overridden via a #define lets a user easily override that value based on his own requirements, and do so without having to directly modify any boost code (often just from command line arguments to the compiler or from project settings in an IDE) and without having to use a metafunction/template alias/wrapper to make a new default for his or her domain. I see all of this as much more preferable to a strict default value of 256. I'm all about avoiding the preprocessor when there are better alternatives, but here it seems to be the ideal solution and offers the most flexibility at no cost. Different users (or different library implementers)
can make their own decision regarding this based on domain specific knowledge.
For example, Boost.Format might conjecture that 99% of all strings are less than 350 chars, and so use
OTOH, in my custom project where we work with NP-Hard graph problems, I know that N=64
One might speculate if the default should be the number of elements of the stack buffer, or the total size of the stack buffer. If somebody wrote
This is precisely my point. At what point does it make sense for 256 to be the default? IMHO, if you have little basis for the value other than that you think some default should be there and 256 is a nice round power of 2, it should at least be easy to customize by users of the library, otherwise I question the usefulness of such a default. -- -Matt Calabrese

On Mon, Mar 2, 2009 at 1:11 PM, Matt Calabrese <rivorus@gmail.com> wrote:
On Mon, Mar 2, 2009 at 10:29 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
I'm more scared by a #define than by the 256 default.
Hmm? Why is that? On the contrary, I would say a #define here is simply good practice. A raw 256 is only providing a default at a level where you have little-to-no knowledge of what such a default should be. Allowing the default to be overridden via a #define lets a user easily override that value based on his own requirements, and do so without having to directly modify any boost code (often just from command line arguments to the compiler or from project settings in an IDE) and without having to use a metafunction/template alias/wrapper to make a new default for his or her domain. I see all of this as much more preferable to a strict default value of 256.
I'd rather have no default at all. It avoids ODR violations caused by defining different defaults for different libraries, and there's no reason why one number would be a better default than another. If someone wants a specific default, they can derive from auto_buffer. [snip]
-- -Matt Calabrese
-- Felipe Magno de Almeida

On Mon, Mar 2, 2009 at 11:19 AM, Frank Mori Hess <frank.hess@nist.gov> wrote:
If a user wanted the default stack capacity for some auto_buffer to be set via macro, couldn't they just do
boost::auto_buffer<int, DEFAULT_CAPACITY> buf;
One could say the same for default function or template arguments in any context. Of course you could always have the user explicitly pass a "default" with every use, but that in many ways defeats the purpose of having a default to begin with. If in the general sense it is an implementation detail, especially one which may require knowledge of the underlying implementation to set "correctly," an appropriate implicit default is much better than having a user explicitly specify the same constant over and over. A general user should not have to be concerned with explicitly passing the default around.

I do, however, agree that no default is better than a fixed 256 default. It should be easily customizable if need be. Direct support for a #define makes it easy for users to have some implementation-defined default during development and then non-intrusively go back and tweak the default if it turns out to be a problem. They can always wrap auto_buffer or use metafunction calls, etc., but those both require unnecessary effort simply to configure a default, whereas in most compilers configuring it via a #define means simply passing a command line argument. It also makes it easy for someone to go back and change the default if throughout their code they were using auto_buffer directly rather than a wrapper, but later on in development realized that the default was insufficient for their needs. Without the #define, you would pretty much always be wrapping auto_buffer or explicitly passing a default everywhere in order to be "future-safe" if there is any chance that you may need to change the default for some reason (e.g. compiling for a platform where you have a much smaller stack).

On Mon, Mar 2, 2009 at 11:18 AM, Felipe Magno de Almeida < felipe.m.almeida@gmail.com> wrote:
I'd rather have no default at all. It avoids ODR violations caused by defining different defaults for different libraries, and there's no reason why one number would be a better default than another. If someone wants a specific default, they can derive from auto_buffer.
Doesn't a similar issue exist with certain other libraries in boost that offer customization of limits? You have to be careful not to violate the ODR with any kind of configuration like this, but the functionality is at least still there. To be clear, I wouldn't propose supporting the use of different macro definitions in the same program; for that you should fall back to using a metafunction or template alias, etc. This is intended for per-program use.

As well, you can catch at least some issues that may arise by having the namespace the template is defined in be dependent on the value of the #define, then pull that template into namespace boost via a using declaration to allow easy usage. This has the benefit of making calls fail to link if a function is defined in one translation unit with one #define but invoked from another translation unit with another #define, since the namespace the using declaration resolves to would be different in each case. Example:

_____________________

namespace boost
{
  namespace BOOST_PP_CAT( auto_buffer, BOOST_AUTO_BUFFER_DEFAULT_SIZE )
  {
    // Definition here
  }

  using BOOST_PP_CAT( auto_buffer, BOOST_AUTO_BUFFER_DEFAULT_SIZE )::auto_buffer;
}

_____________________

It's still hairy overall and the user should again avoid using 2 different defines in the same program, but at least a way is offered to specify a default. If no one else is with me on this, I am in support of no default at all rather than a fixed one. -- -Matt Calabrese

On Monday 02 March 2009, Matt Calabrese wrote:
Hmm? Why is that? On the contrary, I would say a #define here is simply good practice. A raw 256 is only providing a default at a level where you have little-to-no knowledge of what such a default should be. Allowing the default to be overridden via a #define lets a user easily override that value based on his own requirements, and do so without having to directly modify any boost code (often just from command line arguments to the compiler or from project settings in an IDE) and without having to use a metafunction/template alias/wrapper to make a new default for his or her domain. I see all of this as much more preferable to a strict default value of 256.
I'm all about avoiding the preprocessor when there are better alternatives, but here it seems to be the ideal solution and offers the most flexibility at no cost.
If a user wanted the default stack capacity for some auto_buffer to be set via macro, couldn't they just do

boost::auto_buffer<int, DEFAULT_CAPACITY> buf;

where DEFAULT_CAPACITY is the macro? That would seem to be more flexible, as you could define multiple macros used by different auto_buffers that serve different purposes.

On Mon, Mar 2, 2009 at 10:29 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Alternatively we might try to compute a reasoable default size based on the size of the type and based on minimum and maximum default sizes.
Agreed. Especially in terms of providing a meaningful default, the number of bytes being used up on the stack is much more of a concern than the number of individual objects on the stack. In retrospect, it might even make sense for the template parameter to be in bytes rather than object count.

Concerning your implementation, there is at least one major issue I have -- your use of "optimized_const_reference." You define optimized_const_reference to be a const value_type if the value_type has a trivial assignment operation. You then return this from const accessors such as front, back, and operator []. While at first this might seem like a clever trick, it is flawed in several subtle ways. Firstly, it is not necessarily optimized, because the object may be considerably large, implying you'd be making a copy of a large object simply by calling operator []; but aside from that, there are much more important reasons why you should avoid this.

Unless I am mistaken, it makes your container not a valid STL container, although it already has an O(n) swap, which is unavoidable. But the main reason why I consider this a fault is that if you store a pointer/reference to an element in the array taken from a non-const accessor, it will not reference the same object as what is returned from a const accessor, which can cause subtle issues (and it means you cannot get a pointer to an element in the array when all you have is a const reference to the container, unless you use "data")! It also means that you cannot directly take the address of the values returned from your const accessor functions in cases where the contained type has a trivial assignment operation, since you can't directly take the address of a temporary.

All of this is avoided by simply returning a const_reference to the contained object, and doing so makes the behavior of your template consistent between different instantiations, not to mention with containers in the STL.
While the optimized_const_reference trick is clever, I do not believe it is appropriate here. -- -Matt Calabrese
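The aliasing pitfall described in this message can be seen with a toy type (all names here are illustrative, not taken from the posted implementation):

```cpp
// Toy illustration of a const accessor that returns by value
// ("optimized_const_reference" style): the caller gets a temporary,
// not the stored element.
struct by_value_container
{
    int data[4];

    by_value_container() { for( int i = 0; i != 4; ++i ) data[i] = i; }

    int& operator[]( int i )       { return data[i]; }  // the real element
    int  operator[]( int i ) const { return data[i]; }  // a copy!
};

// Given:   by_value_container c;
//          const by_value_container& cc = c;
// then &c[0] is the address of the element, but &cc[0] is ill-formed
// (address of a temporary), and
//          const int& r = cc[0];
// binds r to a temporary whose address differs from &c.data[0] --
// exactly the const/non-const inconsistency discussed above.
```

Returning a genuine const_reference from the const overload makes both paths refer to the same object and restores the usual container guarantees.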

Matt Calabrese skrev:
On Mon, Mar 2, 2009 at 10:29 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Alternatively we might try to compute a reasoable default size based on the size of the type and based on minimum and maximum default sizes.
Agreed. Especially in terms of providing a meaningful default, the number of bytes being used up on the stack is much more of a concern than the number of individual objects on the stack. In retrospect, it might even make sense for the template parameter to be in bytes rather than object count.
Concerning your implementation, there is at least one major issue I have -- your use of "optimized_const_reference." You define optimized_const_reference to be a const value_type if the value_type has a trivial assignment operation. You then return this from const accessors such as front, back, and operator []. While at first this might seem like a clever trick, it is flawed in several subtle ways. Firstly, it is not necessarily optimized, because the object may be considerably large, implying you'd be making a copy of a large object simply by calling operator []; but aside from that, there are much more important reasons why you should avoid this.
Yes, it should also take into account that the object's size should be <= sizeof(long double).
Unless I am mistaken, it makes your container not a valid STL container, although it already has an O(n) swap, which is unavoidable. But the main reason why I consider this a fault is that it now means that if you store a pointer/reference to an element in the array taken from a non-const accessor it will not reference the same object as what is returned from a const accessor, which can cause subtle issues (and it means you cannot get a pointer to an element in the array when all you have is a const reference to the container unless you use "data")!
or obj.begin()[n]. So there are plenty of ways to get a reference into the array.
It also means that you cannot directly take the address of the values returned from your const accessor functions in cases where the type contained has a trivial assignment operation since you can't directly take the address of a temporary.
True, you can only bind to a local const reference.
All of this is avoided by simply returning a const_reference to the contained object, and it makes the behavior for your template consistent between different instantiations, not to mention with containers in the STL. While the optimized_const_reference trick is clever, I do not believe it is appropriate here.
Major use-cases are for types like char, int, wchar_t. Maybe this is a good place to discuss this, but AFAIR, boost::circular_buffer does exactly this. -Thorsten

On Mon, Mar 2, 2009 at 1:04 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
or obj.begin()[n]. So there are plenty of ways to get a reference into the array.
True, though you have to admit that is a bit odd and inconsistent with the STL. If someone really wants the optimized returns, I would maybe consider having them named differently from those of container concepts but still supporting STL-style accessors by default. This way, generic code written for standard containers will work with your types. On Mon, Mar 2, 2009 at 1:04 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Maybe this is a good place to discuss this, but AFAIR, boost::circular_buffer does exactly this.
Oh wow, I had no idea. If that's true, I'm a little bit concerned, though if it went through review fine maybe I'm just being too pedantic. -- -Matt Calabrese

Matt Calabrese skrev:
On Mon, Mar 2, 2009 at 1:04 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
or obj.begin()[n]. So there are plenty of ways to get a reference into the array.
True, though you have to admit that is a bit odd and inconsistent with the STL. If someone really wants the optimized returns, I would maybe consider having them named differently from those of container concepts but still supporting STL-style accessors by default. This way, generic code written for standard containers will work with your types.
Well, it has to be generic code that iterates over a const container, and that tries to store a const reference to an object non-locally via operator[]. Most generic code is based on iterators, and for *i there is no problem.
On Mon, Mar 2, 2009 at 1:04 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Maybe this is a good place to discuss this, but AFAIR, boost::circular_buffer does exactly this.
Oh wow, I had no idea. If that's true, I'm a little bit concerned, though if it went through review fine maybe I'm just being too pedantic.
I don't know if this was present before the review, or if it was added later. Any comments, Jan? -Thorsten

Thorsten Ottosen skrev:
Matt Calabrese skrev:
On Mon, Mar 2, 2009 at 1:04 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
or obj.begin()[n]. So there are plenty of ways to get a reference into the array.
True, though you have to admit that is a bit odd and inconsistent with the STL. If someone really wants the optimized returns, I would maybe consider having them named differently from those of container concepts but still supporting STL-style accessors by default. This way, generic code written for standard containers will work with your types.
I don't know if this was present before the review, or if it was added later. Any comments, Jan?
I talked with Jan. He said it was suggested by a reviewer. -Thorsten

Mathias Gaunard skrev:
Thorsten Ottosen wrote:
Well, AFAIK, they are very similar. You set the capacity once when you initialize an object:
boost::auto_buffer<T> buffer( get_max_capacity_from_somewhere() );
isn't that exactly what you do with VLAs?
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).
Yes. I'm talking about the conceptual behavior of the class. -Thorsten

Thorsten Ottosen wrote:
Mathias Gaunard skrev:
Thorsten Ottosen wrote:
Well, AFAIK, they are very similar. You set the capacity once when you initialize an object:
boost::auto_buffer<T> buffer( get_max_capacity_from_somewhere() );
isn't that exactly what you do with VLAs?
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).
Yes. I'm talking about the conceptual behavior of the class.
A VLA is a fixed-size array whose size is determined at runtime. It is conceptually more like scoped_array than auto_buffer.

Basically, we have all these array type possibilities (fixed capacity implies dynamic size):
- fixed compile-time size: C/C++ arrays or boost::array
- fixed runtime size: VLAs or scoped_array (we need better, something without pointers in the interface and with copy semantics)
- fixed compile-time capacity: nothing (something would be useful)
- fixed runtime capacity: auto_buffer
- variable capacity (necessarily at runtime): std::vector

All of these situations have different strategies for memory management:
- allocate everything within the object itself, that is to say on the execution stack (only possible for fixed compile-time size or capacity)
- allocate everything in an area external to the object, such as what is colloquially known as the heap, and reference that area (possible for every situation, and may be a better strategy even if the first is possible)
- allocate the buffer in the object itself if the buffer is smaller than a certain fixed compile-time size, and use an external memory source otherwise. That technique is typically known as the Small Buffer Optimization (possible for every situation)

So auto_buffer is a fixed runtime capacity array which uses the third strategy.

I suggest all array type possibilities be implemented with all memory management strategies. That obviously means some kind of policy-based design, with SBO something independent that you can plug into auto_buffer.

I have worked on some kind of allocator interface that allows SBO, if there is interest.
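As a rough illustration of the third strategy as a pluggable piece, an SBO storage policy might look like this. This is purely an assumption about what such an interface could look like, not Mathias's actual design:

```cpp
#include <cstddef>
#include <new>

// Hypothetical SBO storage policy: serve one allocation from an internal
// buffer when it fits, fall back to the heap otherwise. A container
// parameterized on such a policy gets SBO independently of its sizing model.
template< std::size_t Bytes >
class sbo_storage
{
    // union forces a strict alignment for the small buffer (pre-alignas idiom)
    union { unsigned char small_[Bytes]; long double align_; };
    bool small_in_use_;

public:
    sbo_storage() : small_in_use_( false ) { }

    void* allocate( std::size_t n )
    {
        if( !small_in_use_ && n <= Bytes )
        {
            small_in_use_ = true;
            return small_;          // served from the object itself
        }
        return ::operator new( n ); // external memory source
    }

    void deallocate( void* p )
    {
        if( p == small_ )
            small_in_use_ = false;
        else
            ::operator delete( p );
    }

    bool is_small( const void* p ) const { return p == small_; }
};
```

A container could then hold an sbo_storage<Bytes> member and route all of its buffer management through allocate/deallocate, making the stack-or-heap decision an orthogonal policy rather than part of the container.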

Mathias Gaunard skrev:
Thorsten Ottosen wrote:
Mathias Gaunard skrev:
Thorsten Ottosen wrote:
Well, AFAIK, they are very similar. You set the capacity once when you initialize an object:
boost::auto_buffer<T> buffer( get_max_capacity_from_somewhere() );
isn't that exactly what you do with VLAs?
With VLAs, the buffer is on the stack whatever the capacity. With auto_buffer, the buffer is on the stack only if the capacity is smaller than N (which you define to be 256 by default).
Yes. I'm talking about the conceptual behavior of the class.
A VLA is a fixed-size array whose size is determined at runtime. It is conceptually more like scoped_array than auto_buffer.
Basically, we have all these array type possibilities (fixed capacity implies dynamic size):
- fixed compile-time size: C/C++ arrays or boost::array
- fixed runtime size: VLAs or scoped_array (we need better, something without pointers in the interface and with copy semantics)
- fixed compile-time capacity: nothing (something would be useful)
Let's call this static_auto_buffer.
- fixed runtime capacity: auto_buffer - variable capacity (necessarily at runtime): std::vector
All of these situations have different strategies for memory management:
- allocate everything within the object itself, that is to say on the execution stack (only possible for fixed compile-time size or capacity)
- allocate everything in an area external to the object, such as what is colloquially known as the heap, and reference that area (possible for every situation, and may be a better strategy even if the first is possible)
- allocate the buffer in the object itself if the buffer is smaller than a certain fixed compile-time size, and use an external memory source otherwise. That technique is typically known as the Small Buffer Optimization (possible for every situation)
So auto_buffer is a fixed runtime capacity array which uses the third strategy.
I suggest all array type possibilities be implemented with all memory management strategies. Which means obviously making some kind of policy-based design, with SBO something independent that you can plug into auto_buffer.
I have worked on some kind of allocator interface that allows SBO, if there is interest.
Thanks for the detailed analysis. The question is whether there is much need for all these permutations. I do use static_auto_buffer, but the overhead induced by auto_buffer is very small (1 word + a few instructions in initialization). I question using auto_buffer without an internal stack buffer; that buffer is more or less the whole reason for its existence. -Thorsten

On Monday 02 March 2009, Thorsten Ottosen wrote:
you might expect 95% of all lines to be less than 512 chars. Thus you can use
boost::auto_buffer<char,512> buffer;
to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet. That said, I would be able to use such a container if it were added as a utility to boost, as I implemented one for my own use as an internal optimization in Boost.Signals2 (boost/signals2/detail/stack_vector.hpp and stack_allocator.hpp).

On Monday 02 March 2009, Frank Mori Hess wrote:
On Monday 02 March 2009, Thorsten Ottosen wrote:
you might expect 95% of all lines to be less than 512 chars. Thus you can use
boost::auto_buffer<char,512> buffer;
to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet. Although, I would be able to use such a container if it were added as a utility to boost, as I implemented one for my own use as an internal optimization in Boost.Signals2 (boost/signals2/detail/stack_vector.hpp and stack_allocator.hpp).
Ah, looking a little further at the code, I see that when you say fall back on the heap, you only mean you can pass a capacity to the constructor that exceeds the stack storage, but the auto_buffer can never be resized after that.

to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet.
In which case how does this differ from boost::array? John.

John Maddock skrev:
to avoid heap allocations in 95% of the cases. This can be a major speed improvement. For the remaining 5% of the cases, the buffer will be placed on the heap, but the code from the user's perspective remains the same.
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet.
Yes, that is necessary to make push_back() inlinable. I also think this fits most usage scenarios.
In which case how does this differ from boost::array?
The capacity is fixed (at run-time), but the size is not. The size is initially 0, but can be expanded (mainly for PODs) with uninitialized_grow(). I don't use T[N] internally to allow this. -Thorsten

Thorsten Ottosen:
John Maddock skrev:
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet.
Yes, that is necessary to make push_back() inlinable.
It's also necessary if you want to introduce stack buffer overflow attacks. Now, I don't question the right of every C++ programmer to be able to overflow the stack, but I don't like this ability being presented under the name "push_back".
I also think this fits most usage scenarios.
I'm not sure. A typical use case for push_back is when the programmer doesn't know the number of elements in advance.

    for x in w
        if pred(x)
            v.push_back(x)

The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance. The inability to grow also penalizes cases like

    for # of rows
        v.clear()
        read number of elements into n
        for each element
            read it into e, v.push_back(e)
        process( v.begin(), v.end() )

In this case the programmer wants to reuse v between the rows to minimize the number of heap allocations. But there is no way to know what the maximum n would end up being.

Peter Dimov skrev:
Thorsten Ottosen:
John Maddock skrev:
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet.
Yes, that is necessary to make push_back() inlinable.
It's also necessary if you want to introduce stack buffer overflow attacks. Now, I don't question the right of every C++ programmer to be able to overflow the stack, but I don't like this ability being presented under the name "push_back".
I also think this fits most usage scenarios.
I'm not sure. A typical use case for push_back is when the programmer doesn't know the number of elements in advance.
    for x in w
        if pred(x)
            v.push_back(x)
The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance.
Is it not |w|?
The inability to grow also penalizes cases like
    for # of rows
        v.clear()
        read number of elements into n
        for each element
            read it into e, v.push_back(e)
        process( v.begin(), v.end() )
In this case the programmer wants to reuse v between the rows to minimize the number of heap allocations. But there is no way to know what the maximum n would end up being.
Good points. FWIW, stlsoft::auto_buffer is resizeable. It's not that hard to add this functionality, though. And we can add unchecked_push_back() then. -Thorsten

On Mon, Mar 2, 2009 at 12:46 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
FWIW, stlsoft::auto_buffer is resizeable. It's not that hard to add this functionality, though.
Could you expand on that a bit further? I assume references and iterators were invalidated on resize?
And we can add unchecked_push_back() then.
Yep. --Beman

Beman Dawes skrev:
On Mon, Mar 2, 2009 at 12:46 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
FWIW, stlsoft::auto_buffer is resizeable. It's not that hard to add this functionality, though.
Could you expand on that a bit further? I assume references and iterators were invalidated on resize?
Yes, that is unavoidable for array-based containers. -Thorsten

Thorsten Ottosen:
Peter Dimov skrev: ...
    for x in w
        if pred(x)
            v.push_back(x)
The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance.
Is it not |w|?
Yes, it technically is bounded by |w|. But using |w| as the capacity of v defeats the purpose of the optimization, which is to avoid allocating storage for |w| elements (typically on the heap).

Peter Dimov skrev:
Thorsten Ottosen:
Peter Dimov skrev: ....
    for x in w
        if pred(x)
            v.push_back(x)
The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance.
Is it not |w|?
Yes, it technically is bounded by |w|. But using |w| as the capacity of v defeats the purpose of the optimization, which is to avoid allocating storage for |w| elements (typically on the heap).
But then you can't know if using auto_buffer is an optimization. If I use auto_buffer<T> buffer(|w|), I risk that the buffer must be resized several times. If it must be resized only once, simply using vector + reserve() would be at least as good. -Thorsten

Thorsten Ottosen:
Peter Dimov skrev:
Thorsten Ottosen:
Peter Dimov skrev: ....
    for x in w
        if pred(x)
            v.push_back(x)
The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance.
Is it not |w|?
Yes, it technically is bounded by |w|. But using |w| as the capacity of v defeats the purpose of the optimization, which is to avoid allocating storage for |w| elements (typically on the heap).
But then you can't know if using auto_buffer is an optimization.
It depends on the distribution of |v|. Let's say that in X% of the cases the final |v| is less than a constant K. When X is close enough to 100, using auto_buffer<T,K> is a win. Even though some outliers may cause more than one allocation, the majority will cause zero.

Peter Dimov skrev:
Thorsten Ottosen:
Peter Dimov skrev:
Thorsten Ottosen:
Peter Dimov skrev: ....
    for x in w
        if pred(x)
            v.push_back(x)
The typical number of elements satisfying pred may be ~7.1, so making v have a stack capacity of 8 will eliminate a heap allocation and will be a big win. But you can't determine the maximum capacity in advance.
Is it not |w|?
Yes, it technically is bounded by |w|. But using |w| as the capacity of v defeats the purpose of the optimization, which is to avoid allocating storage for |w| elements (typically on the heap).
But then you can't know if using auto_buffer is an optimization.
It depends on the distribution of |v|. Let's say that in X% of the cases the final |v| is less than a constant K. When X is close enough to 100, using auto_buffer<T,K> is a win. Even though some outliers may cause more than one allocation, the majority will cause zero.
Right. But it is *very* complicated to analyse such a situation. It could, as you say, include multiple buffer-expansions with different "movability cost" based on T. And the check for buffer overflow has a cost. And the fact that push_back() is not inlined could also have a *major* impact. <stupid_idea> I guess it would be an advantage if push_back could say "if we must go for a heap buffer, make it at least |w|". So maybe

    for x in w
        if pred(x)
            v.push_back(x, |w|)

</stupid_idea> -Thorsten

Thorsten Ottosen skrev:
Peter Dimov skrev:
The inability to grow also penalizes cases like
    for # of rows
        v.clear()
        read number of elements into n
        for each element
            read it into e, v.push_back(e)
        process( v.begin(), v.end() )
In this case the programmer wants to reuse v between the rows to minimize the number of heap allocations. But there is no way to know what the maximum n would end up being.
Good points.
FWIW, stlsoft::auto_buffer is resizeable. It's not that hard to add this functionality, though.
I guess your case above should be implemented as

    for # of rows
        v.clear()
        v.resize( n )
        for each element
            read e
            v.push_back_unchecked(e)
        process( v.begin(), v.end() )

where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?) But as the example above shows, we do not need a growing push_back() here. If the name push_back() is problematic, then I'm all for just calling it something else. But I don't like adding a growing push_back() unless we have a use-case. -Thorsten

On Mon, Mar 2, 2009 at 3:19 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Thorsten Ottosen skrev:
[snip]
I guess your case above should be implemented as
    for # of rows
        v.clear()
        v.resize( n )
        for each element
            read e
            v.push_back_unchecked(e)
        process( v.begin(), v.end() )
where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?)
But as the example above shows, we do not need a growing push_back() here.
Nor would we for std::vector if we use it the way you suggest.
If the name push_back() is problematic, then I'm all for just calling it something else.
How is the usage you showed faster than a growing push_back?
But I don't like adding a growing push_back() unless we have a use-case.
The use-case seems obvious to me. A vector that can use stack-allocation for >90% of the cases for an application. As someone pointed out, signals2 uses one in a detail namespace.
-Thorsten
Regards, -- Felipe Magno de Almeida

Felipe Magno de Almeida skrev:
On Mon, Mar 2, 2009 at 3:19 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Thorsten Ottosen skrev:
[snip]
I guess your case above should be implemented as
    for # of rows
        v.clear()
        v.resize( n )
        for each element
            read e
            v.push_back_unchecked(e)
        process( v.begin(), v.end() )
where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?)
But as the example above shows, we do not need a growing push_back() here.
Nor would we for std::vector if we use it the way you suggest.
Right. But vector<T>::unchecked_push_back() does not exist.
If the name push_back() is problematic, then I'm all for just calling it something else.
How is the usage you showed faster than a growing push_back?
It's inlineable. And it doesn't check for a full buffer. The difference can be anything from negligible to substantial.
But I don't like adding a growing push_back() unless we have a use-case.
The use-case seems obvious to me. A vector that can use stack-allocation for >90% of the cases for an application. As someone pointed out, signals2 uses one in a detail namespace.
Is that use case different from the above? -Thorsten

On Mon, Mar 2, 2009 at 3:38 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Felipe Magno de Almeida skrev:
[snip]
How is the usage you showed faster than a growing push_back?
It's inlineable. And it doesn't check for a full buffer. The difference can be anything from negligible to substantial.
Can't push_back call reserve if it doesn't fit? That way push_back could still be inlineable. Or am I missing something?
But I don't like adding a growing push_back() unless we have a use-case.
The use-case seems obvious to me. A vector that can use stack-allocation for >90% of the cases for an application. As someone pointed out, signals2 uses one in a detail namespace.
Is that use case different from the above?
I don't think so. But it is quite common it seems.
-Thorsten
-- Felipe Magno de Almeida

Felipe Magno de Almeida skrev:
On Mon, Mar 2, 2009 at 3:38 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Felipe Magno de Almeida skrev:
[snip]
How is the usage you showed faster than a growing push_back? It's inlineable. And it doesn't check for a full buffer. The difference can be anything from negligible to substantial.
Can't push_back call reserve if it doesn't fit? That way push_back could still be inlineable. Or am I missing something?
Last I checked, vector<T>::push_back() was not inlined, no matter the optimization level. The whole involvement of the allocator, and the copying of elements on expansion, seems to tell the compiler it is inadvisable. And even if it were inlinable on compiler X, then what about compiler Y? And then there is the additional check for buffer expansion.
But I don't like adding a growing push_back() unless we have a use-case. The use-case seems obvious to me. A vector that can use stack-allocation for >90% of the cases for an application. As someone pointed out, signals2 uses one in a detail namespace. Is that use case different from the above?
I don't think so. But it is quite common it seems.
But never mind, Peter constructed a (perhaps rare) use-case in another branch of this discussion. -Thorsten

On Mon, Mar 2, 2009 at 3:59 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Felipe Magno de Almeida skrev:
[snip]
Can't push_back call reserve if it doesn't fit? That way push_back could still be inlineable. Or am I missing something?
Last I checked, vector<T>::push_back() was not inlined, no matter the optimization level. The whole involvement of the allocator, and the copying of elements on expansion, seems to tell the compiler it is inadvisable.
I see. And auto_buffer::push_back is inlined? Have you tried with the buffer expansion check? I still advocate for a growing push_back even if it is not inlined, and adding an unchecked_push_back if somebody needs more performance.
And even if it was inlinable on compiler X, then what about compile Y?
I guess this can't really be enforced in a standard way anyway. No matter how small.
And then there is the additional check for buffer expansion.
I would think this check would be quite inexpensive. [snip]
-Thorsten
Regards, -- Felipe Magno de Almeida

Felipe Magno de Almeida skrev:
On Mon, Mar 2, 2009 at 3:59 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Felipe Magno de Almeida skrev:
[snip]
Can't push_back call reserve if it doesn't fit? That way push_back could still be inlineable. Or am I missing something? Last I checked, vector<T>::push_back() was not inlined, no matter the optimization level. The whole involvement of the allocator, and the copying of elements on expansion, seems to tell the compiler it is inadvisable.
I see. And auto_buffer::push_back is inlined?
At least it is much more likely to be. Again, the standard doesn't guarantee it.
Have you tried with the buffer expansion check? I still advocate for a growing push_back even if it is not inlined, and adding an unchecked_push_back if somebody needs more performance.
Yes, but both are needed.
And then there is the additional check for buffer expansion.
I would think this check would be quite inexpensive.
Yeah, probably small compared to the function not being inlined. But still an overhead we would like to avoid if possible. -Thorsten

Thorsten Ottosen:
I guess your case above should be implemented as
    for # of rows
        v.clear()
        v.resize( n )
v.reserve( n );
        for each element
            read e
            v.push_back_unchecked(e)
        process( v.begin(), v.end() )
...
But as the example above shows, we do not need a growing push_back() here.
Yes, this second example was supposed to show that we sometimes need to alter the capacity. If you want it to argue for push_back, change the file format so that the rows are terminated with a sentinel instead of prefixed with n. :-)

Peter Dimov skrev:
Thorsten Ottosen:
I guess your case above should be implemented as
    for # of rows
        v.clear()
        v.resize( n )
v.reserve( n );
Right :-)
        for each element
            read e
            v.push_back_unchecked(e)
        process( v.begin(), v.end() )
....
But as the example above shows, we do not need a growing push_back() here.
Yes, this second example was supposed to show that we sometimes need to alter the capacity. If you want it to argue for push_back, change the file format so that the rows are terminated with a sentinel instead of prefixed with n. :-)
Right. -Thorsten
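The corrected pattern above (reserve once per row, then push without any possibility of reallocation) can be sketched with std::vector standing in for the proposed buffer. push_back_unchecked is hypothetical; after reserve(n), a checked push_back is already guaranteed not to reallocate, which is exactly the situation the unchecked variant would exploit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the row-processing loop from the discussion.  std::vector is a
// stand-in for the proposed buffer: reserve(n) adjusts the capacity once per
// row (and never shrinks it), after which no push_back can reallocate.
void process_rows(const std::vector<std::vector<int> >& rows) {
    std::vector<int> v;                      // reused across rows
    for (std::size_t r = 0; r < rows.size(); ++r) {
        v.clear();                           // keeps the capacity
        v.reserve(rows[r].size());
        for (std::size_t i = 0; i < rows[r].size(); ++i)
            v.push_back(rows[r][i]);         // cannot reallocate here
        // process(v.begin(), v.end());
        assert(v.size() == rows[r].size());
    }
}
```

Because the capacity only ever grows, the number of heap allocations is bounded by the number of rows whose size exceeds every previous row's size, not by the number of rows.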

On Monday 02 March 2009, Thorsten Ottosen wrote:
where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?)
I believe the usual way std::vector handles this is that resize() never reduces capacity, and if you do want to reduce capacity you have to swap your vector with a newly constructed one. Hmm, your auto_buffer doesn't support swap though.
But as the example above shows, we do not need a growing push_back() here. If the name push_back() is problematic, then I'm all for just calling it something else. But I don't like adding a growing push_back() unless we have a use-case.
I'm using it in Boost.Signals2 to hold tracked shared_ptrs during signal invocation. Before invoking each slot, I need to hold shared_ptrs to all the slot's tracked objects. So I store them in a std::vector which has a custom allocator that uses the stack as long as the vector size doesn't exceed 10, then falls back to heap allocation. So before each slot is run, the vector of tracked objects is cleared and the tracked objects from the next slot are pushed onto it. If all the slots in the signal's slot list have 10 or fewer tracked objects, no allocations occur. But it can also handle slots with large numbers of tracked objects.
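The stack-then-heap allocator Frank describes can be sketched roughly as follows (names like arena and stack_allocator are hypothetical, and C++11 is assumed for alignas; the real implementation lives in boost/signals2/detail/stack_allocator.hpp and is more careful about rebinding and copying). The first allocation of at most N elements is served from an embedded buffer; anything larger, or a second concurrent allocation, falls back to the heap.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Embedded storage for up to N elements of type T, plus a flag saying
// whether it is currently handed out.
template<class T, std::size_t N>
struct arena {
    alignas(T) char buf[N * sizeof(T)];
    bool used;
    arena() : used(false) {}
};

// Minimal C++11 allocator: serves the first request of at most N elements
// from the arena, everything else from the heap.
template<class T, std::size_t N>
class stack_allocator {
public:
    typedef T value_type;
    template<class U> struct rebind { typedef stack_allocator<U, N> other; };

    explicit stack_allocator(arena<T, N>& a) : a_(&a) {}

    T* allocate(std::size_t n) {
        if (!a_->used && n <= N) {               // fits the embedded buffer
            a_->used = true;
            return reinterpret_cast<T*>(a_->buf);
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        if (p == reinterpret_cast<T*>(a_->buf))
            a_->used = false;                    // hand the buffer back
        else
            ::operator delete(p);
    }
    bool operator==(const stack_allocator& o) const { return a_ == o.a_; }
    bool operator!=(const stack_allocator& o) const { return a_ != o.a_; }

    arena<T, N>* a_;
};
```

With an arena<int, 10> on the stack, a vector<int, stack_allocator<int, 10> > that never exceeds 10 elements performs no heap allocation; the eleventh push_back transparently reallocates on the heap.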

Frank Mori Hess skrev:
On Monday 02 March 2009, Thorsten Ottosen wrote:
where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?)
I believe the usual way std::vector handles this is that resize() never reduces capacity, and if you do want to reduce capacity you have to swap your vector with a newly constructed one. Hmm, your auto_buffer doesn't support swap though.
No, as it might be O(n). I'm not against providing "swap()", it should just be under a different name, e.g. linear_swap().
But as the example above shows, we do not need a growing push_back() here. If the name push_back() is problematic, then I'm all for just calling it something else. But I don't like adding a growing push_back() unless we have a use-case.
I'm using it in Boost.Signals2 to hold tracked shared_ptrs during signal invocation. Before invoking each slot, I need to hold shared_ptrs to all the slot's tracked objects. So I store them in a std::vector which has a custom allocator that uses the stack as long as the vector size doesn't exceed 10, then falls back to heap allocation. So before each slot is run, the vector of tracked objects is cleared and the tracked objects from the next slot are pushed onto it. If all the slots in the signal's slot list have 10 or fewer tracked objects, no allocations occur. But it can also handle slots with large numbers of tracked objects.
Ok, but can you determine the size of the buffer in advance, so you can call reserve? -Thorsten

Thorsten Ottosen wrote:
Frank Mori Hess skrev:
On Monday 02 March 2009, Thorsten Ottosen wrote:
where v.resize() should never actually shrink the buffer. (Do we need another function that does that? Or should we simply rename v.resize(n) to v.ensure_capacity(n)?)
I believe the usual way std::vector handles this is that resize() never reduces capacity, and if you do want to reduce capacity you have to swap your vector with a newly constructed one. Hmm, your auto_buffer doesn't support swap though.
No, as it might be O(n). I'm not against providing "swap()", it should just be under a different name, e.g. linear_swap().
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap? Also, I think the following sort of code is quite plausible: std::pair<int, auto_buffer<int, 2> > a,b; //... swap(a,b); (e.g. I often find myself sorting vectors of such pairs) and for that to work you need to provide a free swap overload. John Bytheway
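The free swap overload John asks for can be illustrated with a minimal hypothetical buffer type. Generic code, including std::pair's swap and std::sort, calls swap unqualified, so argument-dependent lookup picks up a namespace-level overload declared next to the class.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

namespace buf {

// Tiny stand-in (hypothetical) for a fixed-capacity buffer type.
struct tiny_buffer {
    int data[2];
    std::size_t size;
    tiny_buffer() : size(0) { data[0] = data[1] = 0; }

    void swap(tiny_buffer& other) {
        std::swap(data[0], other.data[0]);
        std::swap(data[1], other.data[1]);
        std::swap(size, other.size);
    }
};

// The ADL entry point: an unqualified swap(a, b) anywhere resolves to this
// whenever either argument is a buf::tiny_buffer.
inline void swap(tiny_buffer& a, tiny_buffer& b) { a.swap(b); }

} // namespace buf
```

With this overload in place, swapping a std::pair<int, buf::tiny_buffer> swaps the buffer member through buf::swap rather than through copy construction, which is exactly the property that makes the pair-sorting example work efficiently.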

John Bytheway skrev:
Thorsten Ottosen wrote:
Frank Mori Hess skrev: Hmm, your
auto_buffer doesn't support swap though. No, as it might be O(n). I'm not against providing "swap()", it should just be under a different name, e.g. linear_swap().
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap?
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally.
Also, I think the following sort of code is quite plausible:
std::pair<int, auto_buffer<int, 2> > a,b; //... swap(a,b);
(e.g. I often find myself sorting vectors of such pairs) and for that to work you need to provide a free swap overload.
Ok. That justifies some way of performing the swap. -Thorsten

AMDG Thorsten Ottosen wrote:
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap?
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally.
Please define n. The cost is bounded by the stack capacity, rather than the actual size. In Christ, Steven Watanabe

Steven Watanabe skrev:
AMDG
Thorsten Ottosen wrote:
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap?
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally.
Please define n.
n would be c.size() -Thorsten

On Tuesday 03 March 2009, Thorsten Ottosen wrote:
Steven Watanabe skrev:
AMDG
Thorsten Ottosen wrote:
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally.
Please define n.
n would be c.size()
n needs to be stack capacity if you want to call it O(n) swap. If n is size(), then swap becomes O(1) since once you get large enough to fall back on dynamic allocation you can just swap the pointers.

Frank Mori Hess skrev:
On Tuesday 03 March 2009, Thorsten Ottosen wrote:
Steven Watanabe skrev:
AMDG
Thorsten Ottosen wrote:
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally. Please define n. n would be c.size()
n needs to be stack capacity if you want to call it O(n) swap. If n is size(), then swap becomes O(1) since once you get large enough to fall back on dynamic allocation you can just swap the pointers.
Let me call it a linear-time swap for n >= N. -Thorsten

On Mar 3, 2009, at 1:45 AM, Thorsten Ottosen wrote:
Steven Watanabe skrev:
AMDG Thorsten Ottosen wrote:
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap?
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally. Please define n.
n would be c.size()
I'm sorry, I know this will sound harsh, but I don't know how else to say it: we went down this road with ptr_container, and should not repeat that mistake. The instinct to protect naive users from performance pitfalls by leaving out fundamental operations (swap in this case; assignment and copy in the case of ptr_container) is just wrong. That's especially true when the implementations would NOT violate the efficiency guarantees required by the concepts. Let me also point out that std::array will support swap for C++0x (by swapping individual elements). -- David Abrahams BoostPro Computing http://boostpro.com

David Abrahams skrev:
On Mar 3, 2009, at 1:45 AM, Thorsten Ottosen wrote:
Steven Watanabe skrev:
AMDG Thorsten Ottosen wrote:
Certainly swap should be provided somehow; a fast-as-possible swap is not otherwise possible since the user can't get at the buffer pointer. boost::array supports swap, and yours should be at least as fast as that. On that precedent I'd say you should support swap() under that name. Has anyone ever complained about the deceptive slowness of boost::array's swap?
I don't know. I just stay clear of it. The point is that we don't want inexperienced users to use an O(n) swap accidentally. Please define n.
n would be c.size()
I'm sorry, I know this will sound harsh, but I don't know how else to say it: we went down this road with ptr_container, and should not repeat that mistake.
ptr_containers have been copy-constructible and assignable for some releases now.
The instinct to protect naive users from performance pitfalls by leaving out fundamental operations (swap in this case; assignment and copy in the case of ptr_container) is just wrong. That's especially true when the implementations would NOT violate the efficiency guarantees required by the concepts.
I have provided a swap(), see the thread "[utility] auto_buffer v2". But a linear size() can be problematic, and so can a linear swap().
Let me also point out that std::array will support swap for C++0x (by swapping individual elements).
Ok. Somebody should add it to boost::array then. -Thorsten

On Monday 02 March 2009, Thorsten Ottosen wrote:
Ok, but can you determine the size of the buffer in advance, so you can call reserve?
Yes. If you're proposing adding the ability to grow an auto_buffer's capacity after it is constructed via a reserve() call, I could live with that.

Frank Mori Hess skrev:
On Monday 02 March 2009, Thorsten Ottosen wrote:
Ok, but can you determine the size of the buffer in advance, so you can call reserve?
Yes. If you're proposing adding the ability to grow an auto_buffer's capacity after it is constructed via a reserve() call, I could live with that.
No, I was actually advocating that push_back should not grow. But as pointed out in another part of this thread, we need both cases: push_back() and push_back_unchecked(). You will be able to use the latter, which is more efficient. -Thorsten

On 2 Mar 2009, at 20:12, Thorsten Ottosen wrote:
Frank Mori Hess skrev:
On Monday 02 March 2009, Thorsten Ottosen wrote:
Ok, but can you determine the size of the buffer in advance, so you can call reserve? Yes. If you're proposing adding the ability to grow an auto_buffer's capacity after it is constructed via a reserve() call, I could live with that.
No, I was actually advocating that push_back should not grow.
But as pointed out in another part of this thread, we need both cases.
push_back() push_back_unchecked()
you will be able to use the latter, which is more efficient.
You could make exactly the same argument about pushing back into a vector: why not have a push_back_unchecked that doesn't check if reallocation must occur? I did once experiment with this and got no measurable speed improvement. Chris

Christopher Jefferson skrev:
On 2 Mar 2009, at 20:12, Thorsten Ottosen wrote:
Frank Mori Hess skrev:
On Monday 02 March 2009, Thorsten Ottosen wrote:
Ok, but can you determine the size of the buffer in advance, so you can call reserve? Yes. If you're proposing adding the ability to grow an auto_buffer's capacity after it is constructed via a reserve() call, I could live with that.
No, I was actually advocating that push_back should not grow.
But as pointed out in another part of this thread, we need both cases.
push_back() push_back_unchecked()
you will be able to use the latter, which is more efficient.
You could make exactly the same argument about pushing back into a vector: why not have a push_back_unchecked that doesn't check if reallocation must occur? I did once experiment with this and got no measurable speed improvement.
So did I in a real-time ray-tracing application. The difference was a deal-breaker. -Thorsten

On Mon, Mar 2, 2009 at 4:32 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christopher Jefferson skrev:
You could make exactly the same argument about pushing back into a vector, why not have a push_back_unchecked that doesn't check if reallocation must occur. I did once experiment with this and got no measurable speed improvement.
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
I definitely see this tool as being all about low-level optimization, which is one of the few things we can say almost for certain concerning its domain of usage, so it might as well directly offer as much functionality in that regard as possible. -- -Matt Calabrese

On Mon, Mar 2, 2009 at 17:56, Matt Calabrese <rivorus@gmail.com> wrote:
On Mon, Mar 2, 2009 at 4:32 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Christopher Jefferson skrev:
You could make exactly the same argument about pushing back into a vector, why not have a push_back_unchecked that doesn't check if reallocation must occur. I did once experiment with this and got no measurable speed improvement.
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
I definitely see this tool as being all about low-level optimization, which is one of the few things we can say almost for certain concerning its domain of usage, so it might as well directly offer as much functionality in that regard as possible.
LLVM has a similar class, FWIW: http://llvm.org/docs/ProgrammersManual.html#dss_smallvector It seems to be entirely about reducing heap overheads, though, with apparently no need for *_unchecked functions. It seems to me that if you need unchecked, then the class shouldn't be storing the capacity at all (outside of debug versions), as it's unnecessary overhead, which suggests to me that it would be the domain of a separate, obviously unsafe class (unchecked_vector?). ~ Scott

On Mon, Mar 2, 2009 at 6:11 PM, Scott McMurray <me22.ca+boost@gmail.com>wrote:
It seems to me that if you need unchecked, then the class shouldn't be storing the capacity at all (outside of debug versions), as it's unnecessary overhead, which suggests to me that it would be the domain of a separate, obviously unsafe class (unchecked_vector?).
True, or behavior could be specified by a template argument -- one instantiation has a static max_size() function that yields the appropriate value based on the N argument and has an unsafe, unchecked push_back, the other is more flexible/safe and behaves more similarly to vector. -- -Matt Calabrese

Matt Calabrese skrev:
On Mon, Mar 2, 2009 at 6:11 PM, Scott McMurray <me22.ca+boost@gmail.com>wrote:
It seems to me that if you need unchecked, then the class shouldn't be storing the capacity at all (outside of debug versions), as it's unnecessary overhead, which suggests to me that it would be the domain of a separate, obviously unsafe class (unchecked_vector?).
True, or behavior could be specified by a template argument -- one instantiation has a static max_size() function that yields the appropriate value based on the N argument and has an unsafe, unchecked push_back, the other is more flexible/safe and behaves more similarly to vector.
We can't really implement a growing push_back() or reserve() without a capacity. I think static_auto_buffer<T,N> where T is a POD might fill a small niche. So I can provide both classes. -Thorsten
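A minimal sketch of what such a fixed-capacity companion class could look like, offering both a checked and an unchecked push_back (all names here are invented for illustration; this is not the posted auto_buffer code):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch: a fixed-capacity buffer with a static max_size(),
// a checked push_back() that throws when full, and an unchecked variant
// for callers who have already guaranteed the capacity.
template<class T, std::size_t N>
class fixed_buffer
{
    T data_[N];               // assumes T is a POD, as for static_auto_buffer
    std::size_t size_;
public:
    fixed_buffer() : size_(0) {}

    static std::size_t max_size() { return N; } // known at compile time

    void push_back(const T& x)                  // checked: safe default
    {
        if (size_ == N)
            throw std::length_error("fixed_buffer full");
        data_[size_++] = x;
    }

    void unchecked_push_back(const T& x)        // caller guarantees room
    {
        assert(size_ < N);                      // debug-only check
        data_[size_++] = x;
    }

    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const { return data_[i]; }
};
```

The checked/unchecked split could equally be selected by a policy template argument, as Matt suggests; the two-member-function form above is just the simplest way to show both behaviors side by side.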

2009/3/2 Scott McMurray <me22.ca+boost@gmail.com>:
LLVM has a similar class, FWIW: http://llvm.org/docs/ProgrammersManual.html#dss_smallvector
This is much closer to what I really want. If I were designing it, given

    template< typename T, size_t N = 0, typename A = allocator<T> >
    class ebo_vector;

(where ebo stands for Embedded Buffer Optimization):

1. Interface is a drop-in replacement for std::vector<T>, T != bool
2. For ebo_vector<bool>, behave as if the std::vector<bool> specialization did not exist
3. Runtime ability to set/reset the maximum capacity; defaults to std::vector<T, A>::capacity()
4. Ability to construct/resize with default (in addition to value) initialized elements

Other comments: To keep the interface simple, N should be the number of embedded elements (I don't want to call them stack elements, as that is only true if the instantiation is on the stack) and should only be specified in terms of number of elements. If folks want to do it in terms of bytes, have some kind of traits class that can do the conversion. -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

Nevin ":-]" Liber skrev:
2009/3/2 Scott McMurray <me22.ca+boost@gmail.com>:
LLVM has a similar class, FWIW: http://llvm.org/docs/ProgrammersManual.html#dss_smallvector
This is much closer to what I really want. If I were designing it, given
template< typename T, size_t N = 0, typename A = allocator<T> > class ebo_vector; (where ebo stands for Embedded Buffer Optimization):
1. Interface is a drop-in replacement for std::vector<T>, T != bool
2. For ebo_vector<bool>, behave as if the std::vector<bool> specialization did not exist
3. Runtime ability to set/reset the maximum capacity; defaults to std::vector<T, A>::capacity()
4. Ability to construct/resize with default (in addition to value) initialized elements
Already there.
Other comments:
To keep the interface simple, N should be the number of embedded elements (I don't want to call them stack elements, as that is only true if the instantiation is on the stack) and should only be specified in terms of number of elements. If folks want to do it in terms of bytes, have some kind of traits class that can do the conversion.
There is a newer version if you look in the thread [utility] auto_buffer v2. I do not want to supply the whole interface of vector. For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance. -Thorsten

On Thu, Mar 5, 2009 at 18:29, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I do not want to supply the whole interface of vector.
Why not?
For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
I think anything that's an "almost" needs an explicit rationale. I really don't, for example, see why the self-assignment check is worth omitting.

Scott McMurray skrev:
On Thu, Mar 5, 2009 at 18:29, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I do not want to supply the whole interface of vector.
Why not?
It's a lot of work, so I'd rather have a reason to do it. And in general it is not a drop-in replacement for vector, so there is no reason to pretend it is.
For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
I think anything that's an "almost" needs an explicit rationale. I really don't, for example, see why the self-assignment check is worth omitting.
The rationale would be simplicity, speed and less generated code. -Thorsten

On Friday 06 March 2009, Thorsten Ottosen wrote:
Scott McMurray skrev:
On Thu, Mar 5, 2009 at 18:29, Thorsten Ottosen
<thorsten.ottosen@dezide.com> wrote:
I do not want to supply the whole interface of vector.
Why not?
It's a lot of work, so I'd rather have a reason to do it. And in general it is not a drop-in replacement for vector, so there is no reason to pretend it is.
What about the more general container concepts, for the sake of generic programming? Is it a Container/Forward Container/Sequence/Back Insertion Sequence?

Frank Mori Hess skrev:
On Friday 06 March 2009, Thorsten Ottosen wrote:
Scott McMurray skrev:
On Thu, Mar 5, 2009 at 18:29, Thorsten Ottosen
<thorsten.ottosen@dezide.com> wrote:
I do not want to supply the whole interface of vector. Why not? It's a lot of work, so I'd rather have a reason to do it. And in general it is not a drop-in replacement for vector, so there is no reason to pretend it is.
What about the more general container concepts, for the sake of generic programming? Is it a Container/Forward Container/Sequence/Back Insertion Sequence?
Currently it satisfies the reversible container concept. -Thorsten

Sent from my iPhone

On Mar 6, 2009, at 5:30 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
On Thu, Mar 5, 2009 at 18:29, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I do not want to supply the whole interface of vector. Why not?
It's a lot of work, so I'd rather have a reason to do it. And in general it is not a drop-in replacement for vector, so there is no reason to pretend it is.
For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. I think anything that's an "almost" needs an explicit rationale. I really don't, for example, see why the self-assignment check is worth omitting.
The rationale would be simplicity, speed and less generated code.
-Thorsten

On Mar 6, 2009, at 2:30 AM, Thorsten Ottosen wrote:
For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
I think anything that's an "almost" needs an explicit rationale. I really don't, for example, see why the self-assignment check is worth omitting.
The rationale would be simplicity, speed and less generated code.
Have you measured the speed/space difference and found it to be significant in any real application? A class that attempts to provide value semantics but doesn't support x = x is putting a big hole in the system of equational reasoning. Justifying that (to me) would take some pretty heavy proof. -- David Abrahams BoostPro Computing http://boostpro.com

David Abrahams skrev:
On Mar 6, 2009, at 2:30 AM, Thorsten Ottosen wrote:
For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
I think anything that's an "almost" needs an explicit rationale. I really don't, for example, see why the self-assignment check is worth omitting.
The rationale would be simplicity, speed and less generated code.
Remark: the comment is more general, and also pertains to e.g. not allowing overlapping ranges.
Have you measured the speed/space difference and found it to be significant in any real application?
A class that attempts to provide value semantics but doesn't support x = x is putting a big hole in the system of equational reasoning. Justifying that (to me) would take some pretty heavy proof.
I have not really seen code that exhibits x = x. I've seen lots of discussion (e.g. Sutter & Meyers). Does anybody write such code? -Thorsten

On Sat, Mar 14, 2009 at 04:47, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
David Abrahams skrev:
A class that attempts to provide value semantics but doesn't support x = x is putting a big hole in the system of equational reasoning. Justifying that (to me) would take some pretty heavy proof.
I have not really seen code that exhibits x = x. I've seen lots of discussion (e.g. Sutter & Meyers). Does anybody write such code?
On purpose? Doubtful. Does it happen? Probably. Is there really a cost to allowing it, though? I'd assume it would have a vector-like operator= that'd be something like this:

    copy(other.begin(), other.begin() + size(), begin());
    if (size() > other.size())
        erase(begin() + other.size(), end());
    else
        insert(end(), other.begin() + size(), other.end());

That works fine when this == &other, and I can't see a reason it would be slower than some version that doesn't allow self-assignment. Sure, it means self-assignment is O(n), but that's acceptable and consistent. It's sufficiently exception-safe, too.
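Scott's fragment, written out as a compilable sketch against std::vector so the self-assignment case can be exercised directly. An element-wise loop replaces the copy() call for the overlapping part, since std::copy formally forbids a destination inside the source range:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Scott's assignment pattern (hypothetical helper, not auto_buffer
// code): copy the common prefix, then either trim the tail or append the
// remainder. No `this != &other` check is needed.
template<class T>
void assign_like_vector(std::vector<T>& self, const std::vector<T>& other)
{
    const std::size_t common =
        self.size() < other.size() ? self.size() : other.size();
    for (std::size_t i = 0; i != common; ++i)  // safe even when &self == &other
        self[i] = other[i];
    if (self.size() > other.size())
        self.erase(self.begin() + other.size(), self.end());
    else
        self.insert(self.end(), other.begin() + common, other.end());
}
```

In the self-assignment case the sizes are equal, so the loop copies each element onto itself and the insert() receives an empty range; nothing special has to happen.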

Scott McMurray skrev:
On Sat, Mar 14, 2009 at 04:47, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
David Abrahams skrev:
A class that attempts to provide value semantics but doesn't support x = x is putting a big hole in the system of equational reasoning. Justifying that (to me) would take some pretty heavy proof.
I have not really seen code that exhibits x = x. I've seen lots of discussion (e.g. Sutter & Meyers). Does anybody write such code?
On purpose? Doubtful. Does it happen? Probably.
Is there really a cost to allowing it, though? I'd assume it would have a vector-like operator= that'd be something like this:
    copy(other.begin(), other.begin() + size(), begin());
    if (size() > other.size())
        erase(begin() + other.size(), end());
    else
        insert(end(), other.begin() + size(), other.end());
That is not quite how I implemented it, but for operator= the extra check is not that important, so I can add it. -Thorsten

On 2009-03-15, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
Is there really a cost to allowing it, though? I'd assume it would have a vector-like operator= that'd be something like this:
    copy(other.begin(), other.begin() + size(), begin());
    if (size() > other.size())
        erase(begin() + other.size(), end());
    else
        insert(end(), other.begin() + size(), other.end());
That is not quite how I implemented it, but for operator= the extra check is not that important, so I can add it,
I agree that self-assignment is uncommon enough for an explicit check to be a pessimization, though. Is the algorithm you have (the attachment to the first post in this thread doesn't have operator=) really more efficient than one that doesn't need the explicit check?

Scott McMurray skrev:
On 2009-03-15, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
Is there really a cost to allowing it, though? I'd assume it would have a vector-like operator= that'd be something like this:
    copy(other.begin(), other.begin() + size(), begin());
    if (size() > other.size())
        erase(begin() + other.size(), end());
    else
        insert(end(), other.begin() + size(), other.end());

That is not quite how I implemented it, but for operator= the extra check is not that important, so I can add it.
I agree that self-assignment is uncommon enough for an explicit check to be a pessimization, though. Is the algorithm you have ( the attachment to the first post in this thread doesn't have operator= ) really more efficient than one that doesn't need the explicit check?
Here's the current version. I guess I would have to do some test to make 100% sure it's faster. -Thorsten

    auto_buffer& operator=( const auto_buffer& r ) // basic
    {
        BOOST_ASSERT( this != &r );
        difference_type diff = size_ - r.size_;
        if( diff >= 0 )
        {
            pop_back_n( static_cast<size_type>(diff) );
            assign_impl( r.begin(), r.end(), begin() );
        }
        else
        {
            if( capacity_ >= r.size() )
            {
                unchecked_push_back_n( static_cast<size_type>(-diff) );
                assign_impl( r.begin(), r.end(), begin() );
            }
            else
            {
                pointer new_buffer = allocate( r.size() );
                (*this).~auto_buffer();
                buffer_ = new_buffer;
                capacity_ = r.size();
                copy_impl( r.begin(), r.end(), buffer_ );
                size_ = capacity_;
            }
        }
        BOOST_ASSERT( size() == r.size() );
        BOOST_ASSERT( is_valid() );
        return *this;
    }

On Sun, Mar 15, 2009 at 14:34, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
I agree that self-assignment is uncommon enough for an explicit check to be a pessimization, though. Is the algorithm you have ( the attachment to the first post in this thread doesn't have operator= ) really more efficient than one that doesn't need the explicit check?
Here's the current version. I guess I would have to do some test to make 100% sure it's faster.
I would be interested to see if the manual inlining (essentially) really does make a noticeable speedup. For small n it wouldn't surprise me if it did, though. The important part, though, is that what you have should work just fine for self-assignment. diff will be 0, so you go into the first if, pop_back nothing, and then copy over it with itself. So I think you actually wrote a self-assignment-safe version without realizing it. ~ Scott P.S. What kind of exception-throwing limitations are you assuming? That (*this).~auto_buffer(); is scary, especially when you don't yet have the replacement memory initialized...

Scott McMurray skrev:
On Sun, Mar 15, 2009 at 14:34, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
I agree that self-assignment is uncommon enough for an explicit check to be a pessimization, though. Is the algorithm you have ( the attachment to the first post in this thread doesn't have operator= ) really more efficient than one that doesn't need the explicit check? Here's the current version. I guess I would have to do some test to make 100% sure it's faster.
I would be interested to see if the manual inlining (essentially) really does make a noticeable speedup. For small n it wouldn't surprise me if it did, though.
The important part, though, is that what you have should work just fine for self-assignment. diff will be 0, so you go into the first if, pop_back nothing, and then copy over it with itself. So I think you actually wrote a self-assignment-safe version without realizing it.
No, because assign_impl() does memcpy() for pods which doesn't allow overlapping ranges. -Thorsten
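A small illustration of the constraint Thorsten mentions: memcpy's contract forbids any overlap between source and destination, and self-assignment is exactly the fully-overlapping case. One hypothetical way to keep a fast POD path that tolerates overlap is to use memmove instead, which is specified for overlapping ranges:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical POD copy helper (not the actual assign_impl): memmove has
// well-defined behavior for overlapping ranges, including the identical
// ranges produced by x = x, while memcpy does not.
void copy_pods(int* dest, const int* src, std::size_t n)
{
    std::memmove(dest, src, n * sizeof(int));
}
```

Whether the (usually small) cost of memmove over memcpy is acceptable on the fast path is exactly the trade-off being debated in this thread.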

on Sun Mar 15 2009, Scott McMurray <me22.ca+boost-AT-gmail.com> wrote:
On Sun, Mar 15, 2009 at 14:34, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Scott McMurray skrev:
I agree that self-assignment is uncommon enough for an explicit check to be a pessimization, though. Is the algorithm you have ( the attachment to the first post in this thread doesn't have operator= ) really more efficient than one that doesn't need the explicit check?
Here's the current version. I guess I would have to do some test to make 100% sure it's faster.
I would be interested to see if the manual inlining (essentially) really does make a noticeable speedup. For small n it wouldn't surprise me if it did, though.
It would surprise me.
The important part, though, is that what you have should work just fine for self-assignment. diff will be 0, so you go into the first if, pop_back nothing, and then copy over it with itself. So I think you actually wrote a self-assignment-safe version without realizing it.
~ Scott
P.S. What kind of exception-throwing limitations are you assuming? That (*this).~auto_buffer(); is scary, especially when you don't yet have the replacement memory initialized...
Why wouldn't you use copy and swap for that branch? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

on Sat Mar 14 2009, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:
David Abrahams skrev:
A class that attempts to provide value semantics but doesn't support x = x is putting a big hole in the system of equational reasoning. Justifying that (to me) would take some pretty heavy proof.
I have not really seen code that exhibits x = x. I've seen lots of discussion (e.g. Sutter & Meyers). Does anybody write such code?
Yes, it happens. It doesn't happen that way, of course. It happens in x = y when x and y happen to refer to the same object. Regardless, as I've been saying, the burden of proof runs in the other direction. You need to show a /realistic/ benchmark that suffers significantly. And don't bias the test unfairly, either: you only need the self-assignment check in your slow branch, the one that does the allocation. The great computer scientists that came before us have shown us the way. http://en.wikipedia.org/wiki/Optimization_(computer_science)#When_to_optimiz... -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Scott McMurray wrote: On Thursday, March 05, 2009 10:24 PM
really don't, for example, see why the self-assignment check is worth omitting.
The conditional leads to predictive execution by modern CPUs. If the predicted outcome of the conditional differs from reality, the pipeline stalls. Thus, the canonical copy assignment operator uses copy-and-swap. That reuses the cctor logic and the (expected) efficiency of swap() to make copying safe and efficient without the need for a self-assignment check. In those rare situations in which self-assignment actually occurs, profiling might reveal the extra copying and swapping and expose the need to redesign the calling code. For the normal use case, the non-checking copy assignment operator will be as efficient as possible. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com
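The canonical copy-and-swap assignment Rob describes, sketched on a toy type (invented for illustration). Taking the parameter by value makes the copy constructor do the copying, and a nothrow swap() commits the result; self-assignment needs no check and the operator is strongly exception-safe:

```cpp
#include <cstddef>
#include <vector>

// Toy type demonstrating the copy-and-swap idiom; not part of auto_buffer.
class int_box
{
    std::vector<int> data_;
public:
    int_box(std::size_t n, int v) : data_(n, v) {}

    int_box& operator=(int_box other)  // by-value: the copy is already made
    {
        data_.swap(other.data_);       // nothrow commit
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    int front() const { return data_.front(); }
};
```

Note that, as Thorsten points out in his reply, this idiom assumes a cheap nothrow swap(), which a class holding an internal stack buffer cannot provide.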

Stewart, Robert skrev:
Scott McMurray wrote: On Thursday, March 05, 2009 10:24 PM
really don't, for example, see why the self-assignment check is worth omitting.
The conditional leads to predictive execution by modern CPUs. If the predicted outcome of the conditional differs from reality, the pipeline stalls. Thus, the canonical copy assignment operator uses copy-and-swap. That reuses the cctor logic and the (expected) efficiency of swap() to make copying safe and efficient without the need for a self-assignment check.
In those rare situations in which self-assignment actually occurs, profiling might reveal the extra copying and swapping and expose the need to redesign the calling code. For the normal use case, the non-checking copy assignment operator will be as efficient as possible.
Unfortunately, the canonical operator=(T) that swaps is not efficient for this container. -Thorsten

Thorsten Ottosen wrote: On Thursday, March 05, 2009 6:30 PM
I do not want to supply the whole interface of vector. For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
If your goal is speed at the expense of all else, then name the class "fast_buffer" or something of that sort. It should be clear that there are many interested in a vector replacement that includes the small buffer optimization. If you're not interested in creating the latter, perhaps another will. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

Stewart, Robert skrev:
Thorsten Ottosen wrote: On Thursday, March 05, 2009 6:30 PM
I do not want to supply the whole interface of vector. For the operations that they have in common, it is almost a drop-in replacement. I say almost, because this class is really about speed, and often doesn't allow overlapping ranges, assignment to *this etc. The exception-safety guarantees might also be weaker if it hurts performance.
If your goal is speed at the expense of all else, then name the class "fast_buffer" or something of that sort. It should be clear that there are many interested in a vector replacement that includes the small buffer optimization. If you're not interested in creating the latter, perhaps another will.
Perhaps. It is difficult to design a class that fits both purposes well. For example, the resize() strategy would be different -- in auto_buffer it is aggressive, because we know it is (almost always) used locally. The fact that it can have a large space overhead is not a problem for auto_buffer's intended use, but could be if people start using it as a vector. The same goes for the complexity/nothrow guarantee of swap() etc. -Thorsten

On 2 Mar 2009, at 21:32, Thorsten Ottosen wrote:
Christopher Jefferson skrev:
On 2 Mar 2009, at 20:12, Thorsten Ottosen wrote:
Frank Mori Hess skrev:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Monday 02 March 2009, Thorsten Ottosen wrote:
Ok, but can you determine the size of the buffer in advance, so you can call reserve? Yes. If you're proposing adding the ability to grow an auto_buffer's capacity after it is constructed via a reserve() call, I could live with that.
No, I was actually advocating the that push_back should not grow.
But as pointed out in another part of this thread, we need both cases.
push_back() push_back_unchecked()
you will be able to use the latter, which is more efficient. You could make exactly the same argument about pushing back into a vector: why not have a push_back_unchecked() that doesn't check whether reallocation must occur? I did once experiment with this and got no measurable speed improvement.
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
Interesting, I wouldn't have expected to get such a big difference. I certainly have no trouble with a push_back_unchecked(), but think push_back() should do what it does in vector and friends. Chris

Christopher Jefferson skrev:
On 2 Mar 2009, at 21:32, Thorsten Ottosen wrote:
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
Interesting, I wouldn't have expected to get such a big difference. I certainly have no trouble with a push_back_unchecked(), but think push_back() should do what it does in vector and friends.
Right. -Thorsten

Thorsten Ottosen:
Christopher Jefferson skrev:
On 2 Mar 2009, at 21:32, Thorsten Ottosen wrote:
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
Interesting, I wouldn't have expected to get such a big difference. I certainly have no trouble with a push_back_unchecked(), but think push_back() should do what it does in vector and friends.
Right.
The reason for the apparent discrepancy is probably that Thorsten is comparing Dinkumware's vector::push_back against his unchecked auto_buffer::push_back, and erroneously attributing the difference in performance to the fact that push_back is not inlined (false; push_back itself is easily inlined by all compilers I know of) and to the presence of the capacity check. In fact, the problem lies in the Dinkumware implementation of push_back, not in the interface requirements. Here's the older version (VC6):

    void push_back(const _Ty& _X)
        { insert(end(), _X); }

Why this underperforms should be obvious. Here's the more recent version (VC7.1-9.0):

    void push_back(const _Ty& _Val)
    {   // insert element at end
        if (size() < capacity())
            _Mylast = _Ufill(_Mylast, 1, _Val);
        else
            insert(end(), _Val);
    }

This is somewhat better, but still suboptimal, for two reasons. One, it diligently goes through Alloc::construct, using two nested function calls for that; two, size() and capacity() hide two divisions by sizeof(T). Changing the implementation to check for _Mylast < _Myend and inlining the _Ufill call by hand improves the performance of vector<int>::push_back dramatically, and the unchecked version no longer has any significant advantage.
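A sketch of the pointer-based layout Peter alludes to, with the `_Mylast < _Myend` style check. The hot path of push_back is a single pointer comparison plus an in-place store; the rare reallocation is pushed into an out-of-line cold path. All names are invented, and the sketch is POD-only for brevity:

```cpp
#include <cstddef>

// Hypothetical minimal vector showing the fast push_back hot path; this is
// neither auto_buffer nor the Dinkumware code, just an illustration.
template<class T>
class tiny_vector
{
    T* first_;
    T* last_;  // one past the last element
    T* end_;   // one past the allocated storage

    tiny_vector(const tiny_vector&);            // non-copyable sketch
    tiny_vector& operator=(const tiny_vector&);

    void grow_and_push(const T& x)              // cold path, out of line
    {
        std::size_t n = static_cast<std::size_t>(last_ - first_);
        T* p = new T[2 * n];
        for (std::size_t i = 0; i != n; ++i)
            p[i] = first_[i];
        delete[] first_;
        first_ = p;
        last_ = p + n;
        end_ = p + 2 * n;
        *last_++ = x;
    }
public:
    tiny_vector() : first_(new T[4]), last_(first_), end_(first_ + 4) {}
    ~tiny_vector() { delete[] first_; }

    void push_back(const T& x)
    {
        if (last_ != end_)      // hot path: one pointer comparison,
            *last_++ = x;       // no division by sizeof(T)
        else
            grow_and_push(x);
    }

    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    const T& operator[](std::size_t i) const { return first_[i]; }
};
```

With this layout the capacity check costs one predictable branch, which is why a well-written checked push_back can be nearly as fast as an unchecked one.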

Peter Dimov skrev:
Thorsten Ottosen:
Christopher Jefferson skrev:
On 2 Mar 2009, at 21:32, Thorsten Ottosen wrote:
So did I in a real-time ray-tracing application. The difference was a deal-breaker.
Interesting, I wouldn't have expected to get such a big difference. I certainly have no trouble with a push_back_unchecked(), but think push_back() should do what it does in vector and friends.
Right.
The reason for the apparent discrepancy is probably because Thorsten is comparing Dinkumware's vector::push_back against his unchecked auto_buffer::push_back, and erroneously attributing the difference in performance to the fact that push_back is not inlined (false; push_back itself is easily inlined by all compilers I know of) and to the presence of the capacity check.
In fact, the problem lies in the Dinkumware implementation of push_back, not in the interface requirements. Here's the older version (VC6):
void push_back(const _Ty& _X) {insert(end(), _X); }
Why this underperforms should be obvious.
Here's the more recent version (VC7.1-9.0):
    void push_back(const _Ty& _Val)
    {   // insert element at end
        if (size() < capacity())
            _Mylast = _Ufill(_Mylast, 1, _Val);
        else
            insert(end(), _Val);
    }
This is somewhat better, but still suboptimal, for two reasons. One, it diligently goes through Alloc::construct, using two nested function calls for that; two, size() and capacity() hide two divisions by sizeof(T).
Changing the implementation to check for _Mylast < _Myend and inlining the _Ufill call by hand improves the performance of vector<int>::push_back dramatically, and the unchecked version no longer has any significant advantage.
Thanks for the analysis. But don't you agree that it is worth avoiding the check for the buffer being full when this is possible? -Thorsten

Thorsten Ottosen:
But don't you agree that it is worth avoiding the check for the buffer being full when this is possible?
I'm fine with having a push_back_unchecked for the small percentage of cases where its use would be justified. But the main and encouraged mode of operation should be push_back.

On Mon, Mar 2, 2009 at 12:32 PM, Peter Dimov <pdimov@pdimov.com> wrote:
Thorsten Ottosen:
John Maddock skrev:
From glancing at the implementation posted to the list, it does not appear to fall back to heap allocation once the maximum stack capacity is reached. push_back() simply asserts that the stack capacity hasn't been used up yet.
Yes, that is necessary to make push_back() inlinable.
It's also necessary if you want to introduce stack buffer overflow attacks. Now, I don't question the right of every C++ programmer to be able to overflow the stack, but I don't like this ability being presented under the name "push_back".
I agree strongly with Peter. The default needs to be safety, unless there is something to indicate the danger is accepted. Thus push_back() could throw on overflow, while unchecked_push_back() could have the semantics of the current push_back() implementation. There are probably other approaches, too, that would provide reasonable security. --Beman

On Mon, Mar 2, 2009 at 2:50 PM, Beman Dawes <bdawes@acm.org> wrote:
On Mon, Mar 2, 2009 at 12:32 PM, Peter Dimov <pdimov@pdimov.com> wrote:
[snip]
It's also necessary if you want to introduce stack buffer overflow attacks. Now, I don't question the right of every C++ programmer to be able to overflow the stack, but I don't like this ability being presented under the name "push_back".
I agree strongly with Peter.
The default needs to be safety, unless there is something to indicate the danger is accepted. Thus push_back() could throw on overflow,
I'd rather have auto_buffer be growable. Having an exception thrown is rarely what the user wants. I wanted auto_buffer to be an SBO class with an STL interface. Maybe it would be nice to have an SBO class for heterogeneous arrays as well?
while unchecked_push_back() could have the semantics of the current push_back() implementation. There are probably other approaches, too, that would provide reasonable security.
Having unchecked_push_back can't hurt.
--Beman
Regards, -- Felipe Magno de Almeida

On Sun, Mar 1, 2009 at 4:04 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Dear all,
For a long time I have been missing this utility in my own work, and in boost related stuff too. I think there are a number of libraries in boost that can benefit from this utility. Libs like boost.format can be quite a bit more effective. So can logging libraries. etc.
...
Comments are most welcome.
I think auto_buffer would be a very useful addition to Boost. A few quick comments:

* Move the comments about VLAs and differences from stlsoft::auto_buffer to a "Design History" (or similar) section of the documentation. They just confuse readers who don't know anything about VLAs or have access to Matthew's book.
* Rename N to something more explicit. MaxStackElements perhaps?
* It looks like auto_buffer would often be preferable to boost::scoped_array. Or am I missing something?

--Beman

Beman Dawes skrev:
On Sun, Mar 1, 2009 at 4:04 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Dear all,
For a long time I have been missing this utility in my own work, and in boost related stuff too. I think there are a number of libraries in boost that can benefit from this utility. Libs like boost.format can be quite a bit more effective. So can logging libraries. etc.
...
Comments are most welcome.
I think auto_buffer would be a very useful addition to Boost.
A few quick comments:
* Move the comments about VLAs and differences from stlsoft::auto_buffer to a "Design History" (or similar) section of the documentation. They just confuse readers who don't know anything about VLAs or have access to Matthew's book.
Good idea. I should have written some docs, but I first wanted to hear whether there is interest.
* Rename N to something more explicit. MaxStackElements perhaps?
Good idea.
* It looks like auto_buffer would often be preferable to boost::scoped_array. Or am I missing something?
No. There is one thing that could let boost::scoped_array<T> (and scoped_ptr<T>, for that matter) occupy a slight niche: there is no general way to give up ownership of the buffer inside boost::auto_buffer<T>, because it might be on the heap or it might be on the stack. If boost::scoped_array<T> added a release() member function, it would occupy that niche. release() is really useful when dealing with legacy code, so I have never understood why it is not there in the first place (and yes, I understand the encapsulation concerns, but I much prefer usability when we have to choose between them). -Thorsten

A very useful addition, I'm all for it. On Mon, Mar 2, 2009 at 9:23 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
There is one thing that could make boost::scoped_array<T> (and scoped_ptr<T> for that matter) occupy a slight niche. There is no general way to give up ownership of the buffer inside boost::auto_buffer<T> because it might be on the heap, or it might be on the stack. If boost::scoped_array<T> added a release() member function it would occupy that niche. release() is really useful to have when dealing with legacy code, so I have been unable to understand why it is not there in the first place (and yes, I understand the encapsulation concerns, but much prefer usability when we have to choose one of them).
I have the same issue and I'm glad to see I'm not the only one. A release function really should be there IMO. -- -Matt Calabrese

Hi, I think this would be very useful in almost all applications. I had a quick look at the code. Just one comment: there appears to be no reason for the copy[_rai]() functions to be non-static members - it will just create code bloat when auto_buffer<>s are created with different N. Am I missing something, or would it make sense to move them into a private namespace? Amit

Amit skrev:
Hi,
I think this would be very useful, in almost all applications.
I had a quick look at the code. Just one comment: there appears to be no reason for the copy[_rai]() functions to be non-static members - it will just create code bloat when auto_buffer<>s are created with different N.
Am I missing something, or would it make sense to move them into a private namespace?
I guess this could be a general algorithm added to range_ex. I don't think this will generate bloat unless you explicitly instantiate the class. -Thorsten

Thorsten Ottosen wrote:
Dear all,
For a long time I have been missing this utility in my own work, and in Boost-related code too. I think there are a number of libraries in Boost that could benefit from it. Libraries like Boost.Format could be made quite a bit more efficient, as could logging libraries, etc.
Basically boost::auto_buffer<T,N> is a C++ version of a variable length array (from C99). It's based on Matthew Wilson's stlsoft::auto_buffer, which is described in his book "Imperfect C++".
I have changed a few things though:
- no swap() and resize() provided
- works with allocators from Boost.Interprocess
- does not store T[N] on the stack, as this is bad when the type has a non-trivial default constructor
- handles non-POD types in an exception-safe manner
- has a fixed capacity, but a non-fixed size
- provides a non-growing (thus inlineable) push_back()
Comments are most welcome.
What does a C VLA, or your auto_buffer class, offer that a std::vector<T> does not have?

Edward Diener skrev:
Thorsten Ottosen wrote:
Dear all,
For a long time I have been missing this utility in my own work, and in Boost-related code too. I think there are a number of libraries in Boost that could benefit from it. Libraries like Boost.Format could be made quite a bit more efficient, as could logging libraries, etc.
Comments are most welcome.
What does a C VLA, or your auto_buffer class, offer that a std::vector<T> does not have?
It can avoid many heap allocations, leading to a major speed improvement. It doesn't allow overlapping ranges, so it can use memcpy(), which also yields some improvement. -Thorsten
participants (19)
- Amit
- Beman Dawes
- Christopher Jefferson
- Cory Nelson
- David Abrahams
- Edward Diener
- Felipe Magno de Almeida
- Frank Mori Hess
- Frank Mori Hess
- John Bytheway
- John Maddock
- Mathias Gaunard
- Matt Calabrese
- Nevin ":-]" Liber
- Peter Dimov
- Scott McMurray
- Steven Watanabe
- Stewart, Robert
- Thorsten Ottosen