any_range vs. “canonical form” - what is the latter?

Boost's any_range documentation says the following: "Despite the underlying any_iterator being the fastest available implementation, the performance overhead of any_range is still appreciable due to the cost of virtual function calls required to implement increment, decrement, advance, equal etc. Frequently a better design choice is to convert to a canonical form." What does the author mean by a "canonical form"? Can someone give an example? Many thanks [question originally asked at http://stackoverflow.com/questions/5601303]

AMDG On 04/08/2011 11:46 PM, John Funnell wrote:
Boost's any_range documentation says the following:
"Despite the underlying any_iterator being the fastest available implementation, the performance overhead of any_range is still appreciable due to the cost of virtual function calls required to implement increment, decrement, advance, equal etc. Frequently a better design choice is to convert to a canonical form."
What does the author mean by a "canonical form"? Can someone give an example?
For example, copying the range into a vector. In Christ, Steven Watanabe

On Sat, Apr 9, 2011 at 4:36 PM, Steven Watanabe
AMDG
On 04/08/2011 11:46 PM, John Funnell wrote:
Boost's any_range documentation says the following:
"Despite the underlying any_iterator being the fastest available implementation, the performance overhead of any_range is still appreciable due to the cost of virtual function calls required to implement increment, decrement, advance, equal etc. Frequently a better design choice is to convert to a canonical form."
What does the author mean by a "canonical form"? Can someone give an example?
For example, copying the range into a vector.
Yes, this is exactly the alternative design that I had in mind when writing the documentation. The overhead of iterating over an any_range is quite considerable, and often compares poorly with copying a concrete result-type into a container such as a vector. However, this is not always the case and some of the users of Boost.Range have desired the ability to implement algorithms that operate upon any_range instances. This is sometimes desirable to allow, for example, exposure of algorithms from a shared library that supports various containers. The use of any_range may also make sense where the number of passes over the range are small, but the memory size of the underlying container is very large. In many cases, the performance overhead will not matter. I wanted to ensure that I did not mislead anyone into the widespread adoption of any_range usage. I believe that the valid usages for this class are few, but sometimes it is *exactly* the correct design choice. I shall improve the documentation with some additional clarification and examples in due course. Thank you for your great question. Regards, Neil Groves

Den 09-04-2011 18:45, Neil Groves skrev:
On Sat, Apr 9, 2011 at 4:36 PM, Steven Watanabe
For example, copying the range into a vector.
Yes, this is exactly the alternative design that I had in mind when writing the documentation. The overhead of iterating over an any_range is quite considerable, and often compares poorly with copying a concrete result-type into a container such as a vector. However, this is not always the case and some of the users of Boost.Range have desired the ability to implement algorithms that operate upon any_range instances. This is sometimes desirable to allow, for example, exposure of algorithms from a shared library that supports various containers.
Two questions: A. would it not be more efficient to store only a range object instead of two iterators as this leads to each iterator begin potentially heap-allocated. (e.g. use boost::iterator_range<I> internally) B. would it not make sense to remove a whole bunch of those template arguments to any_range? Why can't I just say void foo( const any_range<int>& rng ) { switch( rng.category() ) { case boost::ranges::consequtive: ...; break; case boost::ranges::randome_access: ...; break; // etc } } -Thorsten

On Sat, Apr 9, 2011 at 7:36 PM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:
Den 09-04-2011 18:45, Neil Groves skrev:
On Sat, Apr 9, 2011 at 4:36 PM, Steven Watanabe
For example, copying the range into a vector.
Yes, this is exactly the alternative design that I had in mind when writing the documentation. The overhead of iterating over an any_range is quite considerable, and often compares poorly with copying a concrete result-type into a container such as a vector. However, this is not always the case and some of the users of Boost.Range have desired the ability to implement algorithms that operate upon any_range instances. This is sometimes desirable to allow, for example, exposure of algorithms from a shared library that supports various containers.
Two questions:
A. would it not be more efficient to store only a range object instead of two iterators as this leads to each iterator begin potentially heap-allocated. (e.g. use boost::iterator_range<I> internally)
This is worth investigating particularly for large iterators. I've found through profiling many applications that my current small-buffer optimization means that the iterators are allocated in an embedded buffer, so the iterators are typically allocated using placement new which compiles out to a no-op. Hence I think that the further optimization you have proposed would be limited to the unusual case of wrapping large iterators. It is still worth some brain cycles though. I'll look into this.
B. would it not make sense to remove a whole bunch of those template arguments to any_range? Why can't I just say
void foo( const any_range<int>& rng ) { switch( rng.category() ) { case boost::ranges::consequtive: ...; break; case boost::ranges::randome_access: ...; break; // etc } }
I didn't go this route because a user can decide to use the default arguments for the other template parameters and implement specialized algorithms as you have proposed. However, by allowing the optional use of the additional template parameter types, it allows full interoperability with algorithms not explicitly coded for use with any_range. I should probably give some thought to improved support for people wishing to code algorithms that use runtime detection of category etc. This could definitely be improved.
-Thorsten
Thanks for your ideas. Regards, Neil Groves

Den 09-04-2011 22:19, Neil Groves skrev:
I should probably give some thought to improved support for people wishing to code algorithms that use runtime detection of category etc. This could definitely be improved.
Also consider the possibility of performing an any-like cast to get the original range out. -Thorsten
participants (4)
-
John Funnell
-
Neil Groves
-
Steven Watanabe
-
Thorsten Ottosen