
As part of my work on the MTL rewrite (http://www.boost-consulting.com/projects/mtl), I have to re-implement the FAST library (http://www.osl.iu.edu/research/mtl/reference/html/#fast). I am re-imagining FAST to use "iterators" whose positions are represented in their types, a concept similar to that of a Fusion iterator (http://www.boost.org/libs/spirit/fusion/readme.txt). That will allow full loop unrolling for fixed-size arrays and their subranges -- the whole point of FAST -- without having to explicitly supply integral constants. Naturally such iterators can't be incremented in place; traversal will be done with next(i). There will be a function to convert FAST iterators into regular std:: iterators. FAST iterators differ from Fusion iterators in that respect; Fusion iterators are heterogeneous in general and so can't have a single value_type.

It turns out that OSL (and probably Boost in general) is going to be writing several kinds of algorithm implementations for different kinds of iterators, including segmented algorithms (http://lafstern.org/matt/segmented.pdf). We also have the standard algorithms, Fusion algorithms, and who-knows-what-else, so it makes sense to try to supply a common interface that can be used generically, without regard to the kinds of sequences being operated on. It would also be nice to make the system extensible, so there's at least a chance of providing new algorithm implementations for new kinds of iterator and having things "mostly work."

I am planning a sequence/range-based (as opposed to iterator-based) interface, to make sequence adaptation and chaining easier. In case you disagree, I consider that a comparatively uninteresting and orthogonal point, so I'd prefer to save any arguments about it for later.
As far as I can tell, the only reasonable way to unify these libraries extensibly is to provide a central algorithm dispatcher that is called with qualification, e.g.:

    boost::algorithm::transform(input_sequence, op, output_sequence)

The easiest way to do dispatching is to use ADL based on tags associated with the sequences or their iterators, something like:

    // algorithm.hpp
    namespace boost { namespace algorithm {

      template <class I, class Op, class O>
      something transform(I input_sequence, Op op, O output_sequence)
      {
          return transform_impl(
              input_sequence
            , algorithm::get_tag(input_sequence)
            , op
            , output_sequence
            , algorithm::get_tag(output_sequence));
      }

    }}

    // fast.hpp
    namespace boost { namespace algorithm { namespace fast {

      struct fast_tag {};

      template <class I, class Op, class O, class T>
      something transform_impl(
          I input_sequence, fast_tag, Op op, O output_sequence, T)
      {
          // ... whatever ...
      }

    }}}

I believe ambiguities will be the greatest obstacle to extensibility, especially because many algorithms (e.g. the 2-sequence version of transform) can operate on utterly different sequences. It's possible to add another transform overload without the tag arguments in fast:: so that users can explicitly invoke fast::transform in case algorithm::transform is ambiguous, but that won't help when transform is being invoked from within some library whose code the user doesn't control.

Another problem is that determining the type of "something" for the outer transform can be complicated. That's related to the use of ADL dispatching, because we might have to determine how the ADL works out just to understand where to find the return type calculation. Yes, typeof would solve that problem.

The ambiguity problem can be resolved by going to a specialization-based dispatch. It allows us to silently choose one of many equal implementations in case of ambiguity, which can be necessary if you want to prevent ambiguities from arising in libraries over which the user has no control.
The user can also add specializations to resolve the ambiguity, though that leads to the risk of clashing specializations. That approach adds a great deal of machinery for something we haven't even seen to be a problem in practice yet. For an example, see the end of this message. I am inclined to go with ADL-based dispatching, at least for now.

Whether results should be lazy (as in Fusion) or greedy (as in the standard algorithms), or whether both variants should be provided somehow, are interesting questions. The answers change the usage pattern, i.e.:

    copy(transform(s1,op), output)
    or output = transform(s1,op) // if output supports it

vs.

    transform(s1,op,output)

Your thoughts on these topics would be much appreciated.

----------

    // Specialization-based dispatch example

    // algorithm.hpp
    namespace boost { namespace algorithm {

      // algorithm identifier
      struct transform_tag {};

      // Map some implementation identifier into a transform implementation.
      template <class IO> struct transform_impl;

      // A metafunction to trace refinement relations among sequence
      // concept tags.  E.g.,
      //
      //    refines<random_access_sequence_tag>::type
      //
      // yields bidirectional_sequence_tag.  We can work out a protocol
      // for "multiple refinement," if necessary.
      template <class Tag> struct refines;

      // implementation -- produce an implementation for an algorithm
      // and set of sequence types and category tags
      //
      // ID must be a function type of the form:
      //
      //    algorithm-identifier ( Seq1, tag1, ... SeqN, tagN )
      //
      template <class ID>
      struct implementation
      {
          typedef mpl::false_ is_specialized;
          typedef mpl::int_<0> score;
      };

      // If there's no closer specialization, then try again with the
      // "base" sequence category.
      template <class Algorithm, class S1, class Tag>
      struct implementation<Algorithm(S1,Tag)>
        : implementation<Algorithm(S1,typename refines<Tag>::type)>
      {
          typedef typename mpl::next<
              typename implementation<
                  Algorithm(S1,typename refines<Tag>::type)
              >::score
          >::type score;
      };

      // If there's no closer specialization, then try again with the
      // "base" sequence categories.
      template <class Algorithm, class S1, class Tag1, class S2, class Tag2>
      struct implementation<Algorithm(S1,Tag1,S2,Tag2)>
        : best_scoring<          // definition left as exercise ;-)
              mpl::vector2<
                  implementation<
                      Algorithm(S1,typename refines<Tag1>::type,S2,Tag2)
                  >
                , implementation<
                      Algorithm(S1,Tag1,S2,typename refines<Tag2>::type)
                  >
              >
          >::type
      {};

      // ...

      // impl -- dispatch an algorithm and sequence types to a given
      // implementation
      //
      // ID must be a function type of the form:
      //
      //    algorithm-identifier ( Seq1, ... SeqN )
      //
      template <class ID> struct impl;

      template <class Algorithm, class S1>
      struct impl<Algorithm(S1)>
        : implementation<Algorithm(S1, typename sequence_tag<S1>::type)>
      {};

      template <class Algorithm, class S1, class S2>
      struct impl<Algorithm(S1,S2)>
        : implementation<
              Algorithm(
                  S1, typename sequence_tag<S1>::type
                , S2, typename sequence_tag<S2>::type
              )>
      {};

      // ...

      // Dispatching wrapper for transform implementations
      template <class I, class Op, class O>
      typename impl<transform_tag(I,O)>::result_type
      transform(I input_sequence, Op op, O output_sequence)
      {
          return impl<transform_tag(I,O)>
              ::execute(input_sequence, op, output_sequence);
      }

    }}

    // fast.hpp
    namespace boost { namespace algorithm { namespace fast {

      struct tag {};  // identify FAST sequences

    }}}

    namespace boost { namespace algorithm {

      // A transform algorithm implementation for FAST input sequences
      template <class I, class O, class OTag>
      struct implementation< transform_tag(I,fast::tag,O,OTag) >
      {
          // ...
      };

    }}

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

David Abrahams wrote:
Whether results should be lazy (as in Fusion) or greedy, as in standard algorithms, or whether both variants should be provided somehow, are interesting questions. The answers change the usage pattern, i.e.:
copy(transform(s1,op), output)
or output = transform(s1,op) // if output supports it
vs.
transform(s1,op,output)
WRT Fusion, I am reconsidering its laziness. MPL algorithms are sequence-preserving; Fusion's are not, due to laziness. I.e.:

    transform(tuple, op)

does not return a sequence like tuple. Instead, it returns a view (transform_view). However, I realize now that in most use cases I had to eagerly generate the result into a tuple almost immediately after each transform step.

After leaving this for a while in my thought queue, and after implementing phoenix2's lazy STL algorithms, I realized that the lazy Fusion algorithms could just as well be implemented the way phoenix2's lazy STL algorithms are. If I implemented the Fusion algorithms as non-lazy and sequence-preserving like MPL's, I could easily write lazy fusion algorithms through phoenix.

I am still unsure about which is the best way to go. I just thought I'd share my recent thoughts on the issue.

Cheers,

-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net

Joel de Guzman <joel@boost-consulting.com> writes:
David Abrahams wrote:
Whether results should be lazy (as in Fusion) or greedy, as in standard algorithms, or whether both variants should be provided somehow, are interesting questions. The answers change the usage pattern, i.e.: copy(transform(s1,op), output) or output = transform(s1,op) // if output supports it vs. transform(s1,op,output)
WRT Fusion, I am re-considering its laziness. MPL algorithms are sequence preserving. Fusion is not, due to laziness. i.e:
transform(tuple, op)
does not return a sequence like tuple. instead, it returns a view (transform_view).
However, I realize now, that in most use cases, I noticed that I had to eagerly generate the result into a tuple almost as immediately after each transform step.
Really; both the compile-time part (type) and the runtime part (value)?
After leaving this for a while in my thought queue, and after implementing phoenix2's lazy STL algorithms, I realized that the lazy Fusion algorithms could as well be implemented as phoenix2's lazy fusion algorithms.
Interesting. I'm not sure I understand what you mean yet, though.
If I implemented Fusion algorithms as non-lazy and sequence preserving like MPL, I could easily write lazy fusion algorithms through phoenix.
I'd be interested to see that.
I am still unsure about which is the best way to go. I just thought I'd share my recent thoughts on the issue.
Well, the use of segmented iterators and hierarchical algorithms (http://lafstern.org/matt/segmented.pdf) has some interesting implications. You can't just use iterator adaptors such as transform_iterator to implement laziness unless either:

1. you want to pay a big efficiency hit, or

2. the adaptors expose the same "extended properties" (e.g. segmented_iterator_traits) as the iterators they're adapting. That requires knowledge of all extended properties.

It seems to me that in order to do the traversal optimally, more than laziness is required: you'd have to build expression templates and then somehow analyze the traversal properties of the leaf sequences in order to evaluate the expressions.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

David Abrahams wrote:
Joel de Guzman <joel@boost-consulting.com> writes:
David Abrahams wrote:
Whether results should be lazy (as in Fusion) or greedy, as in standard algorithms, or whether both variants should be provided somehow, are interesting questions. The answers change the usage pattern, i.e.: copy(transform(s1,op), output) or output = transform(s1,op) // if output supports it vs. transform(s1,op,output)
WRT Fusion, I am re-considering its laziness. MPL algorithms are sequence preserving. Fusion is not, due to laziness. i.e:
transform(tuple, op)
does not return a sequence like tuple. instead, it returns a view (transform_view).
However, I realize now, that in most use cases, I noticed that I had to eagerly generate the result into a tuple almost as immediately after each transform step.
Really; both the compile-time part (type) and the runtime part (value)?
Yes. However, see below...
After leaving this for a while in my thought queue, and after implementing phoenix2's lazy STL algorithms, I realized that the lazy Fusion algorithms could as well be implemented as phoenix2's lazy fusion algorithms.
Interesting. I'm not sure I understand what you mean yet, though.
Consider an immediate fusion algorithm:

    transform(s, f)

which returns another s immediately. Now consider a corresponding phoenix lazy function:

    phoenix::transform(_1, f)

This version returns a curried (ahem... partially applied) function that is lazily evaluated, later, when you pass in a sequence, s. Yet...

Well, it turns out that I am wrong. Such a scheme will suffer because, while the example above is as lazy as it can get, composition of algorithms such as:

    transform(remove_if(_1, f1), f2)

will turn out to be suboptimal: the immediate functions being forwarded to (the immediate versions of transform and remove_if) will both return intermediate structures, unlike with current fusion, where no intermediates are ever created; just views.
Well, the use of segmented iterators and hierarchical algorithms (http://lafstern.org/matt/segmented.pdf) has some interesting implications. You can't just use iterator adaptors such as transform_iterator to implement laziness unless either:
1. You want to pay a big efficiency hit
or
2. The adaptors expose the same "extended properties" (e.g. segmented_iterator_traits) as the iterators they're adapting. That requires knowledge of all extended properties.
It seems to me that in order to do the traversal optimally, more than laziness is required: you'd have to build expression templates and then somehow analyze the traversal properties of the leaf sequences in order to evaluate the expressions.
After reading the doc "segmented.pdf", I see what you mean. I am not sure, but perhaps views that take advantage of the segmentation information will work as well as ETs, because these views can exploit the traversal properties of the sequences they are acting on.

Regards,

-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net

Joel de Guzman <joel@boost-consulting.com> writes:
Consider an immediate fusion algorithm:
transform(s, f)
which returns another s immediately. Now consider a corresponding phoenix lazy function:
phoenix::transform(_1, f)
phoenix::transform(_1, _2) would correspond more closely.
This version returns a curried (ahem... partially applied) function that is lazily evaluated, later, when you pass in a sequence, s.
Yet...
Well, it turns out that I am wrong. Such a scheme will suffer because, while the example above is as lazy as it can get, composition of algos such as:
transform(remove_if(_1, f1), f2)
will turn out to be suboptimal because the immediate functions being forwarded to (the immediate versions of transform and remove_if) will both return intermediate structures, unlike with current fusion, where no intermediates are ever created; just views.
Yep.
Well, the use of segmented iterators and hierarchical algorithms (http://lafstern.org/matt/segmented.pdf) has some interesting implications. You can't just use iterator adaptors such as transform_iterator to implement laziness unless either:
1. You want to pay a big efficiency hit
or
2. The adaptors expose the same "extended properties" (e.g. segmented_iterator_traits) as the iterators they're adapting. That requires knowledge of all extended properties.
It seems to me that in order to do the traversal optimally, more than laziness is required: you'd have to build expression templates and then somehow analyze the traversal properties of the leaf sequences in order to evaluate the expressions.
After reading the doc "segmented.pdf", I see what you mean. I am not sure, but perhaps views that take advantage of the segmentation information will work as well as ETs because these views can exploit the traversal properties of the sequences they are acting on.
The real problem with views, if you care about optimal efficiency as we do in the MTL, is that in real life iterator adaptors will impose an abstraction penalty when applied to pointers, at least on most compilers :(

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com