
As part of my work on the MTL rewrite (http://www.boost-consulting.com/projects/mtl) I have to re-implement the FAST library (http://www.osl.iu.edu/research/mtl/reference/html/#fast). I am re-imagining FAST to use "iterators" whose positions are represented in their types, a concept similar to that of a Fusion iterator (http://www.boost.org/libs/spirit/fusion/readme.txt). That will allow full loop unrolling for fixed-size arrays and their subranges -- the whole point of FAST -- without having to explicitly supply integral constants. Naturally, such iterators can't be incremented in place; traversal will be done with next(i). There will be a function to convert FAST iterators into regular std:: iterators. FAST iterators differ from Fusion iterators in that respect; Fusion iterators are heterogeneous in general and so can't have a single value_type.

It turns out that OSL (and probably Boost in general) is going to be writing several kinds of algorithm implementations for different kinds of iterators, including segmented algorithms (http://lafstern.org/matt/segmented.pdf). We also have the standard algorithms, Fusion algorithms, and who-knows-what-else, so it makes sense to try to supply a common interface that can be used generically, without regard to the kinds of sequences being operated on. It would also be nice to make the system extensible, so there's at least a chance of providing new algorithm implementations for new kinds of iterator and having things "mostly work."

I am planning a sequence/range-based (as opposed to iterator-based) interface, to make sequence adaptation and chaining easier. In case you disagree, I consider that a comparatively uninteresting and orthogonal point, so I'd prefer to save any arguments about it for later.
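To make the position-in-the-type idea concrete, here's a rough sketch -- names invented for illustration, not the eventual FAST interface -- of an iterator that carries its position as a template parameter, so next(i) yields a *different* type and a "loop" over the sequence becomes a chain of recursive instantiations:

```cpp
#include <cassert>
#include <cstddef>

// An iterator over a T[N] whose position Pos is part of its type.
template <class T, std::size_t N, std::size_t Pos>
struct fast_iterator
{
    T* base; // start of the underlying fixed-size array
    T& operator*() const { return base[Pos]; }
};

// Traversal yields a new iterator type; no in-place increment.
template <class T, std::size_t N, std::size_t Pos>
fast_iterator<T, N, Pos + 1> next(fast_iterator<T, N, Pos> i)
{
    fast_iterator<T, N, Pos + 1> j = { i.base };
    return j;
}

// copy as recursive instantiation: one step per position, so the
// compiler sees straight-line code instead of a runtime loop.
template <class T, std::size_t N, std::size_t Pos, class Out>
struct copy_steps
{
    static Out apply(fast_iterator<T, N, Pos> first, Out out)
    {
        *out = *first;
        return copy_steps<T, N, Pos + 1, Out>::apply(next(first), ++out);
    }
};

// Terminate when the position reaches N (the past-the-end "iterator").
template <class T, std::size_t N, class Out>
struct copy_steps<T, N, N, Out>
{
    static Out apply(fast_iterator<T, N, N>, Out out) { return out; }
};

template <class T, std::size_t N, class Out>
Out fast_copy(T (&a)[N], Out out)
{
    fast_iterator<T, N, 0> first = { a };
    return copy_steps<T, N, 0, Out>::apply(first, out);
}
```

For, say, an int[3], instantiating fast_copy produces three assignments with no loop -- which is the effect FAST wants without the user supplying integral constants anywhere.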
As far as I can tell, the only reasonable way to unify these libraries extensibly is to provide a central algorithm dispatcher that is called with qualification, e.g.:

    boost::algorithm::transform(input_sequence, op, output_sequence)

The easiest way to do dispatching is to use ADL based on tags associated with the sequences or their iterators, something like:

    // algorithm.hpp
    namespace boost { namespace algorithm {

        template <class I, class Op, class O>
        something transform(I input_sequence, Op op, O output_sequence)
        {
            return transform_impl(
                input_sequence
              , algorithm::get_tag(input_sequence)
              , op
              , output_sequence
              , algorithm::get_tag(output_sequence));
        }

    }}

    // fast.hpp
    namespace boost { namespace algorithm { namespace fast {

        struct fast_tag {};

        template <class I, class Op, class O, class T>
        something transform_impl(
            I input_sequence, fast_tag, Op op, O output_sequence, T)
        {
            // ... whatever ...
        }

    }}}

I believe ambiguities will be the greatest obstacle to extensibility, especially because many algorithms (e.g. the 2-sequence version of transform) can operate on utterly different kinds of sequences. It's possible to add another transform overload without the tag arguments in fast:: so that users can explicitly invoke fast::transform in case algorithm::transform is ambiguous, but that won't help when transform is being invoked from within some library whose code the user doesn't control.

Another problem is that determining the type of "something" for the outer transform can be complicated. That's related to the use of ADL dispatching, because we might have to determine how the ADL works out just to understand where to find the return type calculation. Yes, typeof would solve that problem.

The ambiguity problem can be resolved by going to specialization-based dispatch. It allows us to silently choose one of many equally good implementations in case of ambiguity, which can be necessary if you want to prevent ambiguities from arising in libraries over which the user has no control.
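Here is a stripped-down but compilable illustration of the ADL scheme (namespace and sequence names are purely for the example): the central dispatcher makes an unqualified call to transform_impl, and argument-dependent lookup finds the overload living in the same namespace as each sequence's tag, so a new sequence kind plugs in without touching the dispatcher:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace mini { namespace algorithm {

    struct std_tag {}; // tag for ordinary std:: sequences

    template <class T>
    std_tag get_tag(std::vector<T> const&) { return std_tag(); }

    // Fallback implementation for ordinary sequences.
    template <class I, class Op, class O, class T>
    void transform_impl(I const& in, std_tag, Op op, O& out, T)
    {
        for (std::size_t i = 0; i != in.size(); ++i)
            out[i] = op(in[i]);
    }

    namespace fast {

        struct fast_tag {};             // identifies FAST-style sequences
        struct array2 { int data[2]; }; // stand-in fixed-size sequence

        // Unrolled implementation; ADL finds it here via fast_tag/array2.
        template <class Op, class T>
        void transform_impl(array2 const& in, fast_tag, Op op, array2& out, T)
        {
            out.data[0] = op(in.data[0]);
            out.data[1] = op(in.data[1]);
        }
    }

    fast::fast_tag get_tag(fast::array2 const&) { return fast::fast_tag(); }

    // The central dispatcher: the unqualified transform_impl call is
    // resolved by ADL on the sequences' tag types.
    template <class I, class Op, class O>
    void transform(I const& in, Op op, O& out)
    {
        transform_impl(in, algorithm::get_tag(in),
                       op, out, algorithm::get_tag(out));
    }
}}

int twice(int x) { return 2 * x; }
```

Calling mini::algorithm::transform on a std::vector lands in the fallback loop, while calling it on an array2 lands in the unrolled fast:: overload -- no changes to transform itself.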
The user can also add specializations to resolve the ambiguity, though that leads to the risk of clashing specializations. That approach also adds a great deal of machinery for something we haven't even seen to be a problem in practice yet. For an example, see the end of this message. I am inclined to go with ADL-based dispatching, at least for now.

Whether results should be lazy (as in Fusion) or greedy (as in the standard algorithms), or whether both variants should be provided somehow, are interesting questions. The answers change the usage pattern, i.e.:

    copy(transform(s1,op), output)

or

    output = transform(s1,op)  // if output supports it

vs.

    transform(s1,op,output)

Your thoughts on these topics would be much appreciated.

----------

    // Specialization-based dispatch example

    // algorithm.hpp
    namespace boost { namespace algorithm {

        // algorithm identifier
        struct transform_tag {};

        // Map some implementation identifier into a transform implementation.
        template <class IO> struct transform_impl;

        // A metafunction to trace refinement relations among sequence
        // concept tags. E.g.,
        //
        //     refines<random_access_sequence_tag>::type
        //
        // yields bidirectional_sequence_tag. We can work out a protocol
        // for "multiple refinement," if necessary.
        template <class Tag> struct refines;

        // implementation -- produce an implementation for an algorithm
        // and set of sequence types and category tags
        //
        // ID must be a function type of the form:
        //
        //     algorithm-identifier ( Seq1, tag1, ... SeqN, tagN )
        //
        template <class ID>
        struct implementation
        {
            typedef mpl::false_ is_specialized;
            typedef mpl::int_<0> score;
        };

        // If there's no closer specialization, then try again with the
        // "base" sequence category.
        template <class Algorithm, class S1, class Tag>
        struct implementation<Algorithm(S1,Tag)>
          : implementation<Algorithm(S1, typename refines<Tag>::type)>
        {
            typedef typename mpl::next<
                typename implementation<
                    Algorithm(S1, typename refines<Tag>::type)
                >::score
            >::type score;
        };

        // If there's no closer specialization, then try again with the
        // "base" sequence categories.
        template <class Algorithm, class S1, class Tag1, class S2, class Tag2>
        struct implementation<Algorithm(S1,Tag1,S2,Tag2)>
          : best_scoring<   // definition left as exercise ;-)
                mpl::vector2<
                    implementation<
                        Algorithm(S1, typename refines<Tag1>::type, S2, Tag2)
                    >
                  , implementation<
                        Algorithm(S1, Tag1, S2, typename refines<Tag2>::type)
                    >
                >
            >::type
        {};

        // ...

        // impl -- dispatch an algorithm and sequence types to a given
        // implementation
        //
        // ID must be a function type of the form:
        //
        //     algorithm-identifier ( Seq1, ... SeqN )
        //
        template <class ID> struct impl;

        template <class Algorithm, class S1>
        struct impl<Algorithm(S1)>
          : implementation<Algorithm(S1, typename sequence_tag<S1>::type)>
        {};

        template <class Algorithm, class S1, class S2>
        struct impl<Algorithm(S1,S2)>
          : implementation<
                Algorithm(
                    S1, typename sequence_tag<S1>::type
                  , S2, typename sequence_tag<S2>::type
                )>
        {};

        // ...

        // Dispatching wrapper for transform implementations
        template <class I, class Op, class O>
        typename impl<transform_tag(I,O)>::result_type
        transform(I input_sequence, Op op, O output_sequence)
        {
            return impl<transform_tag(I,O)>
                ::execute(input_sequence, op, output_sequence);
        }

    }}

    // fast.hpp
    namespace boost { namespace algorithm { namespace fast {

        struct tag {}; // identify FAST sequences

    }}}

    namespace boost { namespace algorithm {

        // A transform algorithm implementation for FAST input sequences
        template <class I, class O, class OTag>
        struct implementation< transform_tag(I,fast::tag,O,OTag) >
        {
            // ...
        };

    }}

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
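In case it helps to see the core of that machinery actually running, here is a reduced, compilable version of the single-sequence fallback -- scoring omitted, tag names invented for the example. If no implementation is registered for a tag, the primary partial specialization retries with the tag's "base" category from refines<>:

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

    // A linear refinement chain: random_access -> bidirectional -> forward.
    struct forward_tag {};
    struct bidirectional_tag {};
    struct random_access_tag {};

    template <class Tag> struct refines;
    template <> struct refines<random_access_tag> { typedef bidirectional_tag type; };
    template <> struct refines<bidirectional_tag> { typedef forward_tag type; };

    struct copy_tag {}; // algorithm identifier

    template <class ID> struct implementation;

    // No exact match: retry with the "base" sequence category.
    template <class Algorithm, class S, class Tag>
    struct implementation<Algorithm(S, Tag)>
      : implementation<Algorithm(S, typename refines<Tag>::type)>
    {};

    // An actual implementation, registered for forward sequences only.
    template <class S>
    struct implementation<copy_tag(S, forward_tag)>
    {
        static const char* name() { return "forward copy"; }
    };
}
```

Asking for implementation<copy_tag(S, random_access_tag)> falls through two levels of refines<> and inherits the forward implementation, which is exactly the silent "pick something that works" behavior I described -- without any overload ambiguity.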