Overloading boost::for_each()
There are several container types (such as typical deque implementations, if you have access to their internals) that can have for_each() implemented for them in a way that is faster than using iterators and a generic for_each() function. Is there a way to take advantage of that in Boost.Range, such as by overloading boost::for_each()? Is there some kind of Iterable concept that doesn't require actual iterators or allows them to be slow? Would it be worthwhile to add one?

Are there any other techniques that would be useful instead, such as always calling for_each() unqualified in my code and using ADL to find custom versions (as is done for swap)? I am not interested in making Boost.Range algorithms use this new concept, but there are places in my own code that could use for_each() rather than explicit loops if that were faster.

Thank you for any answers or advice. -- Jeremiah Willcock
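A minimal sketch of that swap-style dispatch, assuming a hypothetical mylib namespace, chunked_seq container, and apply_all algorithm:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    namespace mylib {

    // Fallback: any range without a custom overload gets the generic loop.
    template<typename Range, typename F>
    F for_each(Range& r, F f) {
        return std::for_each(r.begin(), r.end(), f);
    }

    // A container that can traverse its internal chunks faster than its
    // iterators can.
    struct chunked_seq {
        std::vector<std::vector<int> > chunks;
    };

    // Custom overload, found by ADL because chunked_seq lives in mylib.
    template<typename F>
    F for_each(chunked_seq& s, F f) {
        for (std::size_t i = 0; i < s.chunks.size(); ++i)
            f = std::for_each(s.chunks[i].begin(), s.chunks[i].end(), f);
        return f;
    }

    } // namespace mylib

    // Call sites do the same two-step dance as for swap:
    template<typename Range, typename F>
    F apply_all(Range& r, F f) {
        using mylib::for_each;  // like "using std::swap;"
        return for_each(r, f);  // unqualified: ADL can find a faster overload
    }

A call like apply_all(seq, f) then resolves to the chunked overload for mylib types and to the generic loop for everything else.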
BoostPro Computing, http://boostpro.com
Sent from coveted but awkward mobile device
--
On Aug 28, 2010, at 3:58 PM, Jeremiah Willcock wrote:
There are several container types (such as typical deque implementations, if you have access to their internals) that can have for_each() implemented for them in a way that is faster than using iterators and a generic for_each() function. Is there a way to take advantage of that in Boost.Range, such as by overloading boost::for_each()? Is there some kind of Iterable concept that doesn't require actual iterators or allows them to be slow?
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
On Sat, 28 Aug 2010, Dave Abrahams wrote:
On Aug 28, 2010, at 3:58 PM, Jeremiah Willcock wrote:
There are several container types (such as typical deque implementations, if you have access to their internals) that can have for_each() implemented for them in a way that is faster than using iterators and a generic for_each() function. Is there a way to take advantage of that in Boost.Range, such as by overloading boost::for_each()? Is there some kind of Iterable concept that doesn't require actual iterators or allows them to be slow?
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level and still based on iterators, though; one case I was thinking of is compressed data that truly has a linear sequence but it might be faster to have the decompressor control the program's control flow rather than having the algorithm do it (of course, that isn't as general as iterators, but works for some algorithms). -- Jeremiah Willcock
BoostPro Computing, http://boostpro.com
Sent from coveted but awkward mobile device
--
On Aug 28, 2010, at 6:47 PM, Jeremiah Willcock wrote:
On Sat, 28 Aug 2010, Dave Abrahams wrote:
On Aug 28, 2010, at 3:58 PM, Jeremiah Willcock wrote:
There are several container types (such as typical deque implementations, if you have access to their internals) that can have for_each() implemented for them in a way that is faster than using iterators and a generic for_each() function. Is there a way to take advantage of that in Boost.Range, such as by overloading boost::for_each()? Is there some kind of Iterable concept that doesn't require actual iterators or allows them to be slow?
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The local iterator can itself be segmented.
and still based on iterators, though; one case I was thinking of is compressed data that truly has a linear sequence but it might be faster to have the decompressor control the program's control flow rather than having the algorithm do it (of course, that isn't as general as iterators, but works for some algorithms).
Ah, push or pull—the eternal struggle.
On 29/08/2010 00:53, Dave Abrahams wrote:
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The local iterator can itself be segmented.
I see some potential for some interesting discussion here, sorry in advance for hijacking this thread.

I think the whole segmented iterators/ranges idea is a bad one, as it has quite a few disadvantages and there are alternative, simpler solutions that provide the same benefits.

Segmented ranges have the following disadvantages:
- while segmented iterators are easier to write than flat iterators in certain cases, they're still not as easy to write as they could be
- any code or algorithm that deals with regular ranges has to be rewritten, not just to obtain the performance gains, but to get basic functionality working as well. That's about as bad as it gets from an integration point of view.
- only trivial operations that manipulate elements one by one (for_each, copy, transform, accumulate, erase_if) can really exploit the performance gains of segmented ranges
- making non-trivial operations (any operation that needs to manipulate several adjacent elements [1]) deal with segmented ranges can be so complicated that they can realistically only be written with a flattened non-segmented view of the range.

[1] examples include character encoding conversion, base64 encoding/decoding, gzip encoding/decoding, etc.
and still based on iterators, though; one case I was thinking of is compressed data that truly has a linear sequence but it might be faster to have the decompressor control the program's control flow rather than having the algorithm do it (of course, that isn't as general as iterators, but works for some algorithms).
Ah, push or pull—the eternal struggle.
Push and pull are opposite ends of the same thing. When you want to generate data, you'd rather push than be pulled; when you want to read it, you'd rather pull than be pushed.

So what's the solution? Generators. And by that, I mean a function that writes data to an output iterator.

A generator is a pure-push primitive, and is trivial to write efficiently. Trivial operations can be implemented with no cost on top of a generator, by using something such as boost::function_output_iterator.

There are generic techniques to build a (normal, non-segmented) range that you can then easily pull from on top of a generator, but those are more complicated and introduce some overhead; overhead you would need anyway to do any non-trivial operation. There are two techniques to achieve this.

The first one, the magical silver bullet, involves using a coroutine. Every time you generate a value, yield. This gives control back to the other end, which can do whatever it wants with the element; when it pulls again, control returns to the generator exactly where it was, and it can keep on pushing. The code keeps swapping between a pushing and a pulling context and achieves exactly the push->pull integration we wanted. It does, however, raise a few concerns: context size, allocation and management, and potentially excessive context swapping.

The other technique is to generate data step by step, each step having a fixed maximum size. Then you can run a step on a temporary buffer, read off that temporary buffer, then run a new step once everything has been read. An iterator that does this transparently on top of a generator can be written. Although it does introduce some overhead, it could be less than that of the coroutine solution. The biggest concern is that it makes the iterator fat, since the temporary buffer is part of the iterator; and iterators are typically passed by value.

The latter approach is the one I used to implement my generic Converter system as part of my Unicode library. A converter is a step-generator, except it takes input to convert as well. I've got an iterator adaptor that can then run a converter as the data is being iterated, and thus lazily perform any arbitrary conversion. Note that the end result, in the case of Unicode, is not very different from the Unicode iterator adaptors hand-written by John Maddock, except the implementation is much simpler, and can be easily used for efficient pushing as well.

So, at the end of this long message, what do I propose to solve the original problem? Have your container provide a step-generator. You can then implement your iterator on top of it, or not, your choice (but I'd recommend using it to avoid unnecessary code duplication and maintenance). A type trait could then be introduced to tell whether a sequence provides a generator or not. Boost.Range could make use of that type trait to specialize trivial operations to use the generator instead of the iterators.

We end up with:
- an easy to write solution
- compatibility with all code that takes ranges as input
- only the operations that want to can be specialized to exploit the performance gains
- no restriction imposed on operations that cannot be implemented in terms of the generator directly
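To make the function_output_iterator point concrete, here is a minimal sketch with a hypothetical iota_generator standing in for a real pure-push generator:

    #include <boost/function_output_iterator.hpp>

    // A pure-push generator: writes its whole sequence to an output iterator.
    struct iota_generator {
        int n;
        explicit iota_generator(int n_) : n(n_) {}

        template<typename Out>
        Out operator()(Out out) const {
            for (int i = 0; i < n; ++i)
                *out++ = i;
            return out;
        }
    };

    // A trivial operation on top of it: wrap f in an output iterator and
    // let the generator push into it. (The iterator stores a copy of f,
    // which is fine for stateless function objects.)
    template<typename Generator, typename F>
    void generator_for_each(Generator const& gen, F f) {
        gen(boost::make_function_output_iterator(f));
    }

The pull side never appears: the generator keeps control of the program flow for the whole traversal.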
On Sun, Aug 29, 2010 at 4:34 AM, Mathias Gaunard wrote:
So, at the end of this long message, what do I propose to solve the original problem?
Have your container provide a step-generator. You can then implement your iterator on top of it, or not, your choice (but I'd recommend using it to avoid unnecessary code duplication and maintenance).
I don't understand your suggestion. To make your proposal understandable, could you please illustrate, in code, what a step-generator would look like for std::deque, and how it could be used? -- Dave Abrahams BoostPro Computing http://www.boostpro.com
On 29/08/2010 17:53, Dave Abrahams wrote:
On Sun, Aug 29, 2010 at 4:34 AM, Mathias Gaunard wrote:
So, at the end of this long message, what do I propose to solve the original problem?
Have your container provide a step-generator. You can then implement your iterator on top of it, or not, your choice (but I'd recommend using it to avoid unnecessary code duplication and maintenance).
I don't understand your suggestion. To make your proposal understandable, could you please illustrate, in code, what a step-generator would look like for std::deque, and how it could be used?
I'm not really familiar with the implementation of std::deque, so I'll just assume it's a vector of statically-sized arrays. (I'm also assuming each statically-sized array is full for the sake of simplicity, but in truth the ones on both ends wouldn't be.)

I don't know yet what a generator that just runs a step should look like; here's a first try:
    template<typename T>
    struct deque_step_generator
    {
        typedef T output_type;
        static const size_t max_output = whatever; // i.e. the segment size

        deque_step_generator(deque& deq_) : deq(deq_), it(deq.begin()) {}

        // run one step: copy the current segment to the output iterator
        template<typename Out>
        Out operator()(Out out)
        {
            return copy(*it++, out);
        }

        bool at_end()
        {
            return it == deq.end();
        }

    private:
        // the assumed layout: a vector of statically-sized arrays
        typedef vector< array<T, max_output> > deque;
        deque& deq;
        typename deque::const_iterator it;
    };
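A sketch of how an algorithm might consume such a step-generator -- for instance a hypothetical for_each_via_steps helper that pushes each step into a small buffer and pulls it back out:

    // Drive the step generator one step at a time. max_output bounds how
    // much a single step may write, so a stack buffer suffices.
    template<typename T, typename F>
    F for_each_via_steps(deque_step_generator<T>& gen, F f) {
        T buffer[deque_step_generator<T>::max_output];
        while (!gen.at_end()) {
            T* end = gen(buffer);              // push one step into the buffer
            for (T* p = buffer; p != end; ++p)
                f(*p);                         // pull it back out
        }
        return f;
    }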
At Mon, 30 Aug 2010 00:43:57 +0100, Mathias Gaunard wrote:
Of course, for certain data structures, providing a step-generator, as I call it, can still be too complicated, and they could only have a 'global' generator, which, in the case of our deque, would be:
    template<typename T>
    struct deque_generator
    {
        typedef T output_type;

        deque_generator(deque& deq_) : deq(deq_) {}

        // copy the whole container, one segment at a time
        template<typename Out>
        Out operator()(Out out)
        {
            for(typename deque::const_iterator it = deq.begin(); it != deq.end(); ++it)
                out = copy(*it, out);
            return out;
        }

    private:
        typedef vector< array<T, max_output> > deque; // as before
        deque& deq;
    };

In the case of a global generator, implementing algorithms such as std::accumulate should be basically the same thing as the string_appender example here: http://www.boost.org/doc/libs/1_44_0/libs/iterator/doc/function_output_itera... In the case of the step one, it's the same, except you need to do it in a loop.
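Following that pattern, a sketch of accumulate on top of the global generator, with a hypothetical adder functor playing the role of string_appender:

    #include <boost/function_output_iterator.hpp>

    // Folds each pushed value into a running sum held by pointer, so that
    // copies of the functor made by the output iterator share the sum.
    template<typename T>
    struct adder {
        T* sum;
        explicit adder(T& s) : sum(&s) {}
        void operator()(T const& x) const { *sum += x; }
    };

    template<typename T>
    T accumulate_via_generator(deque_generator<T>& gen, T init) {
        gen(boost::make_function_output_iterator(adder<T>(init)));
        return init;
    }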
[Followup-To set to the developers' list, as this doesn't seem like a boost-users discussion anymore.]

Hi Mathias,

Thanks, but you only answered the first part of my question. I don't understand how you'd use such a thing. Thanks again. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
On 30/08/2010 13:05, David Abrahams wrote:
Hi Mathias,
Thanks, but you only answered the first part of my question. I don't understand how you'd use such a thing.
I don't really understand your question, so I've put together a practical example, which is available on the vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=generator_iterator.zip&directory=&

To demonstrate the idea, I've implemented a simplistic binary tree in binary_tree.hpp, and provided for it a forward iterator and a global generator. As can be clearly seen, the global generator has a trivial (and possibly more efficient) implementation, while the iterator is a bit more complicated and shouldn't fare as well. I'm not even sure the iterator implementation is correct, which shows that writing iterators is quite a bit more complicated.

I've implemented a range-based 'copy' function that behaves just like boost::copy from Boost.Range, except that if the type has a generator, it can use it instead. The same thing could be done for all standard algorithms: for_each, accumulate, etc.

I've also implemented a facility to build an iterator from a generator using coroutines (it uses the latest version of Boost.Coroutine I could find). I tried to do some simple timings, but I must be doing it wrong, since the results can be extremely variable.
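The dispatch described here could look something like the following sketch (has_generator and make_generator are hypothetical names, not necessarily the ones used in the zip):

    #include <boost/range/algorithm/copy.hpp>
    #include <boost/type_traits/integral_constant.hpp>
    #include <boost/utility/enable_if.hpp>

    // Sequences opt in by specializing this to boost::true_type.
    template<typename T>
    struct has_generator : boost::false_type {};

    // Default: the usual iterator-based copy from Boost.Range.
    template<typename Range, typename Out>
    typename boost::disable_if<has_generator<Range>, Out>::type
    copy(Range const& r, Out out) {
        return boost::copy(r, out);
    }

    // If the sequence advertises a generator, push through it instead.
    template<typename Range, typename Out>
    typename boost::enable_if<has_generator<Range>, Out>::type
    copy(Range const& r, Out out) {
        return make_generator(r)(out); // make_generator: hypothetical factory
    }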
Sent from my iPhone
On Aug 30, 2010, at 3:26 PM, Mathias Gaunard wrote:
I don't really understand your question, so I've put together a practical example, which is available on the vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=generator_iterator.zip&directory=&
And if you'd put that in a github repo I'd have been able to peruse it with my coveted but awkward mobile device ;-)
On Mon, Aug 30, 2010 at 3:22 PM, Dave Abrahams wrote:
Sent from my iPhone. On Aug 30, 2010, at 3:26 PM, Mathias Gaunard wrote:
I don't really understand your question, so I've put together a practical example, which is available on the vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=generator_iterator.zip&directory=&
And if you'd put that in a github repo I'd have been able to peruse it with my coveted but awkward mobile device ;-)
My coveted, yet awkward mobile device[0] can handle that link, extract the zip, and peruse the source code just fine. ;-)

[0] HTC Touch Pro, dual-booting Windows Mobile 6.5 and Android 2.2.
On Sun, 29 Aug 2010, Mathias Gaunard wrote:
On 29/08/2010 00:53, Dave Abrahams wrote:
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The local iterator can itself be segmented.
I see some potential for some interesting discussion here, sorry in advance for hijacking this thread.
I think the whole segmented iterators/ranges idea is a bad one, as it has quite a few disadvantages and there are alternative, simpler solutions that provide the same benefits.
segmented ranges have the following disadvantages:
- while segmented iterators are easier to write than flat iterators in certain cases, they're still not as easy to write as they could be
- any code or algorithm that deals with regular ranges has to be rewritten, not just to obtain the performance gains, but to get basic functionality working as well. That's about as bad as it gets from an integration point of view.
- only trivial operations that manipulate elements one by one (for_each, copy, transform, accumulate, erase_if) can really exploit the performance gains of segmented ranges
- making non-trivial operations (any operation that needs to manipulate several adjacent elements [1]) deal with segmented ranges can be so complicated that they can realistically only be written with a flattened non-segmented view of the range.
[1] examples include character encoding conversion, base64 encoding/decoding, gzip encoding/decoding, etc.
I agree with all of this; an algorithm that works on segmented iterators would effectively need to treat the leaves of an arbitrary tree as its sequence. Additionally, in my reading of the paper, the depth of the tree is encoded into the iterator types (and so must have a compile-time bound).
and still based on iterators, though; one case I was thinking of is compressed data that truly has a linear sequence but it might be faster to have the decompressor control the program's control flow rather than having the algorithm do it (of course, that isn't as general as iterators, but works for some algorithms).
Ah, push or pull—the eternal struggle.
Push and pull are opposite ends of the same thing. When you want to generate data, you'd rather push than be pulled; when you want to read it, you'd rather pull than be pushed.
So what's the solution? Generators. And by that, I mean a function that writes data to an output iterator.
A generator is a pure-push primitive, and is trivial to write efficiently. Trivial operations can be implemented with no cost on top of a generator, by using something such as boost::function_output_iterator.
Yes, we just don't have them implemented.
There are generic techniques to build a (normal, non-segmented) range that you can then easily pull from on top of a generator, but those are more complicated and introduce some overhead; overhead you would need anyway to do any non-trivial operation.
There are two techniques to achieve this.
The first one, the magical silver bullet, involves using a coroutine. Every time you generate a value, yield. This gives control back to the other end, which can do whatever it wants with the element; when it pulls again, control returns to the generator exactly where it was, and it can keep on pushing. The code keeps swapping between a pushing and a pulling context and achieves exactly the push->pull integration we wanted. It does, however, raise a few concerns: context size, allocation and management, and potentially excessive context swapping.
That's my concern, plus implementation complexity.
The other technique is to generate data step by step, each step having a fixed maximum size. Then you can run a step on a temporary buffer, read off that temporary buffer, then run a new step once everything has been read. An iterator that does this transparently on top of a generator can be written. Although it does introduce some overhead, it could be less than that of the coroutine solution. The biggest concern is that it makes the iterator fat, since the temporary buffer is part of the iterator; and iterators are typically passed by value.
For this special kind of iterator, though, algorithms that know about the special kind can pass iterators by reference/rvalue reference and avoid copying them.
The latter approach is the one I used to implement my generic Converter system as part of my Unicode library. A converter is a step-generator, except it takes input to convert as well. I've got an iterator adaptor that can then run a converter as the data is being iterated, and thus lazily perform any arbitrary conversion.
Note that the end result, in the case of Unicode, is not very different from the Unicode iterator adaptors hand-written by John Maddock, except the implementation is much simpler, and can be easily used for efficient pushing as well.
So, at the end of this long message, what do I propose to solve the original problem?
Have your container provide a step-generator. You can then implement your iterator on top of it, or not, your choice (but I'd recommend using it to avoid unnecessary code duplication and maintenance).
The problem is that, without real coroutines, I'm concerned about the performance of this approach. The example I'm thinking of is some code that I wrote a few years ago that encoded decompressor state in control flow (as I assume a fast UTF-8 decoder would); requiring that code to be steppable without saving the program counter and registers (i.e., without a "real" coroutine library) would rule out that technique.
A type trait could then be introduced to tell whether a sequence provides a generator or not. Boost.Range could make use of that type trait to specialize trivial operations to use the generator instead of the iterators.
I do agree with that, and that's kind of what I was thinking. That's what I would view a specialized for_each() as: write algorithms in terms of for_each() when that is efficient, and ranges that have a faster for_each() operation than the default using iterators would overload that function to get performance improvements in the algorithms.
We end up with:
- an easy to write solution
- compatibility with all code that takes ranges as input
- only the operations that want to can be specialized to exploit the performance gains
- no restriction imposed on operations that cannot be implemented in terms of the generator directly
Yes, I agree with all of this, except for the issue of requiring generators to be steppable. I think just overloading for_each() and writing algorithms in terms of that when possible would get the same benefits, though. -- Jeremiah Willcock
On 8/29/2010 1:03 PM, Jeremiah Willcock wrote:
I think just overloading for_each() and writing algorithms in terms of that when possible would get the same benefits, though.
This way lies madness. The separation of containers and algorithms in the STL is what keeps it manageable. You don't want to implement N algorithms over M container types. You want to define a concept on which you can "overload" for_each. -- Eric Niebler BoostPro Computing http://www.boostpro.com
On Sun, 29 Aug 2010, Eric Niebler wrote:
On 8/29/2010 1:03 PM, Jeremiah Willcock wrote:
I think just overloading for_each() and writing algorithms in terms of that when possible would get the same benefits, though.
This way lies madness. The separation of containers and algorithms in the STL is what keeps it manageable. You don't want to implement N algorithms over M container types. You want to define a concept on which you can "overload" for_each.
Yes, that's what I meant. Have a default implementation, but specifically allow it to be overloaded, then make other algorithms use for_each() so that they can use overloaded versions. Maybe the base algorithm should be some kind of fold like it is in a functional language, though. -- Jeremiah Willcock
On 8/29/2010 2:27 PM, Jeremiah Willcock wrote:
On Sun, 29 Aug 2010, Eric Niebler wrote:
On 8/29/2010 1:03 PM, Jeremiah Willcock wrote:
I think just overloading for_each() and writing algorithms in terms of that when possible would get the same benefits, though.
This way lies madness. The separation of containers and algorithms in the STL is what keeps it manageable. You don't want to implement N algorithms over M container types. You want to define a concept on which you can "overload" for_each.
Yes, that's what I meant. Have a default implementation, but specifically allow it to be overloaded, then make other algorithms use for_each() so that they can use overloaded versions. Maybe the base algorithm should be some kind of fold like it is in a functional language, though.
No. There is no one-algorithm-to-rule-them-all, with which all other algorithms can be implemented. There are different algorithms and different sequence types. The trick is matching the best algorithm implementation with the specific capabilities supported by the sequence. Hence the iterator categories. And segmented iterators are an extra category for which the algorithms can be specialized. But I'm pretty sure you know all this, so I feel a bit strange saying it. Have I missed something? -- Eric Niebler BoostPro Computing http://www.boostpro.com
On Sun, 29 Aug 2010, Eric Niebler wrote:
On 8/29/2010 2:27 PM, Jeremiah Willcock wrote:
On Sun, 29 Aug 2010, Eric Niebler wrote:
On 8/29/2010 1:03 PM, Jeremiah Willcock wrote:
I think just overloading for_each() and writing algorithms in terms of that when possible would get the same benefits, though.
This way lies madness. The separation of containers and algorithms in the STL is what keeps it manageable. You don't want to implement N algorithms over M container types. You want to define a concept on which you can "overload" for_each.
Yes, that's what I meant. Have a default implementation, but specifically allow it to be overloaded, then make other algorithms use for_each() so that they can use overloaded versions. Maybe the base algorithm should be some kind of fold like it is in a functional language, though.
No. There is no one-algorithm-to-rule-them-all, with which all other algorithms can be implemented.
I agree, but there are some algorithms that can be written in terms of for_each or some more general construct.
There are different algorithms and different sequence types. The trick is matching the best algorithm implementation with the specific capabilities supported by the sequence. Hence the iterator categories. And segmented iterators are an extra category for which the algorithms can be specialized.
I agree. The extra capability I am talking about is that for_each() can be done for a particular sequence in a way that is faster than the standard implementation using iterators.
But I'm pretty sure you know all this, so I feel a bit strange saying it. Have I missed something?
I think the issue is that I was talking about overloading an existing algorithm, rather than having for_each() be a member of a concept like you suggested before. Basically, have a concept ForEachable that requires a for_each() operation, have it be refined by all iterator-based sequence concepts (or have a generic model defined), then let individual sequences model it explicitly/override the generic implementation when appropriate (see the sketch below). Then, those algorithms that can be implemented using only for_each() -- but, as you said, not all algorithms -- can be changed to use the new concept and a for_each() operation rather than iterator-based loops.

I think we agree on a lot more than it looks like; what I want is to give even more genericity/specialization potential to certain types of sequences with certain algorithms while still reusing a single algorithm implementation across all sequences. Some containers will still just want an iterator-based for_each(), and some algorithms will still need explicit iterators or ranges rather than just a for_each() operation; this new concept only changes the behavior of some sequences with some algorithms. -- Jeremiah Willcock
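One possible shape for that ForEachable idea, as a sketch (all names hypothetical): a customization point with an iterator-based default that individual sequences can specialize, plus an algorithm written on top of it.

    #include <algorithm>

    // The generic model: anything with begin()/end() gets the
    // iterator-based default.
    template<typename Seq>
    struct for_each_impl {
        template<typename F>
        static F apply(Seq& s, F f) {
            return std::for_each(s.begin(), s.end(), f);
        }
    };

    // The ForEachable operation itself.
    template<typename Seq, typename F>
    F for_each(Seq& s, F f) {
        return for_each_impl<Seq>::apply(s, f);
    }

    // A sequence with faster internal traversal overrides the default:
    // template<> struct for_each_impl<my_compressed_seq> { ... };

    // An algorithm that only needs element-at-a-time access, written
    // once in terms of for_each():
    template<typename T>
    struct add_to {
        T* sum;
        void operator()(T const& x) const { *sum += x; }
    };

    template<typename Seq, typename T>
    T accumulate(Seq& s, T init) {
        add_to<T> a = { &init };
        for_each(s, a);
        return init;
    }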
On Sun, Aug 29, 2010 at 5:34 AM, Mathias Gaunard wrote:
On 29/08/2010 00:53, Dave Abrahams wrote:
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The local iterator can itself be segmented.
I see some potential for some interesting discussion here, sorry in advance for hijacking this thread.
I think the whole segmented iterators/ranges idea is a bad one, as it has quite a few disadvantages and there are alternative, simpler solutions that provide the same benefits.
segmented ranges have the following disadvantages:
- while segmented iterators are easier to write than flat iterators in certain cases, they're still not as easy to write as they could be
I don't think anyone is suggesting replacing flat iterators with segmented iterators -- they would coexist. I don't see anything wrong with letting generic algorithms (like for_each) transparently take advantage of the underlying storage of a sequence.
- making non-trivial operations (any operation that needs to manipulate several adjacent elements [1]) deal with segmented ranges can be so complicated that they can realistically only be written with a flattened non-segmented view of the range.
[1] examples include character encoding conversion, base64 encoding/decoding, gzip encoding/decoding, etc.
While developing an HTTP parser, I wanted to be able to parse between two or more buffers (segments), but that was quite slow. I wanted full performance for the 99% of the time that a valid sequence would exist in a single buffer. I wanted to avoid complicated code to move between segments, and didn't want to copy buffers around.

I ended up letting the parser keep taking flat iterators as input -- as you say, clearly these are the simplest kind to work with. On top of that I built another parser that first uses a parser
On 29/08/2010 13:34, Mathias Gaunard wrote:
I think the whole segmented iterators/ranges idea is a bad one, as it has quite a few disadvantages and there are alternative, simpler solutions that provide the same benefits.
segmented ranges have the following disadvantages:
- any code or algorithm that deals with regular ranges has to be rewritten, not just to obtain the performance gains, but to get basic functionality working as well. That's about as bad as it gets from an integration point of view.
I just re-read the paper (I had read it a long time ago), and it seems I had missed an important element: a segmented iterator is still an iterator, out of which you can extract the actual iterator of the segments and the iterator of the data within a segment. (The terminology makes it a bit confusing.)

I thought the segmented iterator and the iterator of the segments were the same thing, which is why I thought it wasn't a good solution: passing the iterator of the segments to existing algorithms wouldn't work. But with the approach as described in the paper, it will, and segmented iterators are indeed a refinement of input iterators. Algorithms written in terms of input iterators do not have to be rewritten to work, since segmented iterators are input iterators with an extra, distinct iteration mode.
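That extra iteration mode is, roughly, the hierarchical for_each from the paper, sketched here from memory (segmented_iterator_traits follows the paper's protocol; it is not an existing Boost facility):

    struct not_segmented_tag {};
    struct segmented_tag {};

    // Default: an ordinary flat iterator. A segmented iterator would
    // specialize this with segmented_tag and provide segment_iterator,
    // local_iterator, and the segment/local/begin/end functions.
    template<typename Iter>
    struct segmented_iterator_traits {
        typedef not_segmented_tag is_segmented;
    };

    template<typename Iter, typename F>
    F for_each(Iter first, Iter last, F f); // forward declaration

    // Flat mode: the classic loop.
    template<typename Iter, typename F>
    F for_each_dispatch(Iter first, Iter last, F f, not_segmented_tag) {
        for (; first != last; ++first)
            f(*first);
        return f;
    }

    // Segmented mode: partial first segment, full middle segments,
    // partial last segment. Each inner call dispatches again, so a
    // segmented local iterator recurses one level deeper.
    template<typename SegIter, typename F>
    F for_each_dispatch(SegIter first, SegIter last, F f, segmented_tag) {
        typedef segmented_iterator_traits<SegIter> traits;
        typename traits::segment_iterator sfirst = traits::segment(first);
        typename traits::segment_iterator slast  = traits::segment(last);
        if (sfirst == slast)
            return for_each(traits::local(first), traits::local(last), f);
        f = for_each(traits::local(first), traits::end(sfirst), f);
        for (++sfirst; sfirst != slast; ++sfirst)
            f = for_each(traits::begin(sfirst), traits::end(sfirst), f);
        return for_each(traits::begin(slast), traits::local(last), f);
    }

    template<typename Iter, typename F>
    F for_each(Iter first, Iter last, F f) {
        return for_each_dispatch(first, last, f,
            typename segmented_iterator_traits<Iter>::is_segmented());
    }

Ordinary code that calls for_each() with plain iterators goes through the not_segmented_tag overload unchanged, which is the compatibility property described above.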
participants (7)
- Cory Nelson
- Dave Abrahams
- David Abrahams
- Eric Niebler
- Jeremiah Willcock
- Mathias Gaunard
- OvermindDL1