[boost] RangeEx review

1 Mar 2009

      My RangeEx review follows.  I'm afraid that I have not been following 
much of the ongoing discussion of this library on the list so apologies 
if I simply repeat what other people are saying.

Algorithms
----------

I have always found the verbosity of things like 
std::sort(container.begin(),container.end()) to be an embarrassing 
misfeature of the std library so it's good to be able to now write the 
much more sensible sort(container).  Just a couple of comments:

- It would be good if the docs were explicit about this being a thin 
wrapper around your existing std::sort, and identifying those 
algorithms that are not so thin.

- Do we really want this in the top-level boost:: namespace?  There 
have been discussions about sorting libraries for Boost who might also 
want boost::sort.  Would it not be better to put these algorithms in a 
boost::range namespace?

Then we move on to the algorithms where you can select what is returned 
using a template parameter.  This is an interesting idea, but it's a 
little verbose and I can't think of any precedent for it in the std 
library or elsewhere in Boost.  Being unfamiliar increases the learning 
curve for a feature.  I was wondering if it would be possible to have 
something more like:

   boost::unique(rng).found  instead of  boost::unique<boost::return_found>(rng)

Or maybe not; just a thought.

You then have this example for sorting and removing duplicates:

   boost::erase( vec, boost::unique<boost::return_found_end>( 
boost::sort(vec) ) );

Err... yuk.  Why can't I write:

   unique(vec);

Or maybe

   unique(sort(vec));

or

   sort(vec);
   unique(vec);

My point is that std::unique only has its "move duplicates to the end" 
behaviour because the conventional interface with a pair of iterators 
doesn't let you change the size of the container.  Now that we have a 
unique() that takes a container we don't need that restriction.

I recently tried to write a starts_with algorithm:

template <typename CONTAINER>
bool starts_with(const CONTAINER& a, const CONTAINER& b) {
   // does a start with b ?
   return std::equal(b.begin(), b.end(), a.begin());
}

The problem with this is that it will crash if a is shorter than b, 
because std::equal takes only three not four iterators.  The fix is to 
add an additional test:

   return a.size() >= b.size() && std::equal(b.begin(), b.end(), a.begin());

Unfortunately, for containers where size() is O(N) this makes the 
algorithm O(N+M), which is especially problematic if the common case if 
for a difference to be found early in the sequences.  So I was curious 
to see if range::equal addresses this problem: it has the potential to 
do so since it has access to all four ends of the containers.  
According to the docs the complexity is "Linear. Exactly distance(rng) 
comparisons." which doesn't look hopeful, but what "rng" is it 
referring to?  Looking at the source, however, it seems that it does 
have an optimal implementation.  I think this needs a documentation fix.

This also makes me wonder if these would be useful algorithms:

bool longer(rng1,rng2);
bool equal_length(rng1,rng2);
bool size_greater_than(rng,size_t);
bool size_equal_to(rng,size_t);

Adaptors
--------

The author seems to be very enthusiastic about chaining functions using 
operator|.  I'm not inherently opposed to this idea but I don't think 
it should be the role of this library, RangeEx, to provide it.  In my 
opinion this feature should be a general-purpose one.  Example:

   y = sqrt ( sin (x) );

In natural language we may choose to describe that as "the square root 
of the sine of x" or "take the sine of x and then take the square root 
of that".  If the second of those seems more "natural" to you then you 
may prefer to write:

   y = x | sin | sqrt;

or maybe even

   y = (sin | sqrt) (x);

I guess that someone who is better at metaprogramming than me could 
write a general-purpose operator| that can be used in this way.  Here's 
a start:

template <typename T1, typename T2, typename T3>
struct chained_func {
   typedef function<T1(T2)> f1_t;
   typedef function<T2(T3)> f2_t;
   f1_t f1;
   f2_t f2;
   chained_func(f1_t f1_, f2_t f2_): f1(f1_), f2(f2_) {}
   T1 operator()(T3 arg) { return f1(f2(arg)); }
};

template <typename T1, typename T2, typename T3>
chained_func<T1,T2,T3> operator|(function<T1(T2)> f1, function<T2(T3)> 
f2) {
   return chained_func<T1,T2,T3>(f1,f2);
}

template <typename T1, typename T2, typename T3>
T1 operator|(T3 val, chained_func<T1,T2,T3> cf) {
   return cf(val);
}

This feature could then be available for any purpose that the user 
prefers, including range adaptors, and RangeEx users who don't want to 
use it can ignore it entirely.  (For all I know, maybe this already 
exists somewhere.)

Even if you choose to keep operator| I suggest making the documentation 
less "evangelical".  Quote: "Why do I prefer the operator| syntax? The 
answer is readbility:" and you go on to show one version that uses 
operator| and one that doesn't.  I would argue that the version using 
operator| is _less_ readable simply because it uses operator|, and the 
reader will wonder why you're doing a bitwise OR between a vector and 
a, umm, a something else.  Of course anyone who reads the documentation 
will understand it, but one of the best features of this library is 
that, as in the example of sort(container), the functions do "exactly 
what they say on the tin", and a casual reader of code using it doesn't 
need to have digested the docs first.

Quote:
   * boost::copy_if( rng, pred, out )
   * boost::count_if( rng, pred )
   These algorithms are not defined and the reason is that both of them 
can be
   expressed by using the original algorithm togther with a Adaptor Generator:

The fact that something can be expressed using a combination of other 
things is not a sufficient  justification for leaving it out.  For 
example, sort(container) could be expressed as 
sort(container.begin(),container.end()) yet you have still provided 
it.  Why should I have to write:

   int n_zeros = count( container | boost::adaptors::filtered(_1==0) );

instead of

   int n_zeros = count_if(container, _1==0);

?

Summary
-------

- What is your evaluation of the design?

I would prefer that operator| be factored out.  Otherwise I am 
satisfied with the design.

- What is your evaluation of the implementation?

I have not properly studied the implementation, but I have not seen any 
problems in the code that I have looked at.

- What is your evaluation of the documentation?

Good.

- What is your evaluation of the potential usefulness of the library?

The basic feature of being able to pass containers to algorithms makes 
it useful enough to justify inclusion.

- Did you try to use the library? With what compiler? Did you have any
problems?

I did some very basic experiments with gcc 4.1 on linux, with no problems.

- How much effort did you put into your evaluation? A glance? A quick 
reading? In-depth study?

A couple of hours.

- Are you knowledgeable about the problem domain?

Not especially.

- Do you think the library should be accepted as a Boost library?

I would prefer to see the operator| stuff factored out, as described 
above.  But given the choice between including the library as-is or not 
including it at all I would choose to include it as is, since the 
algorithms alone are sufficiently useful to justify inclusion.

Regards,  Phil.

[boost] RangeEx review

Phil Endecott