Re: [boost] Formal Review: Boost.RangeEx - missing algorithms

22 Feb 2009

      Dear Vincente,

On Sun, Feb 22, 2009 at 3:08 PM, vicente.botet <vicente.botet@wanadoo.fr>wrote:
...
----- Original Message -----
From: "Neil Groves" <neil@grovescomputing.com>
To: <boost@lists.boost.org>
Sent: Sunday, February 22, 2009 3:11 PM
Subject: Re: [boost] Formal Review: Boost.RangeEx - missing algorithms
...
Dear Vicente,
On Sun, Feb 22, 2009 at 12:13 PM, vicente.botet <
vicente.botet@wanadoo.fr>wrote:
...
<snip>
</snip>
...
May be it will be godd to add the equivalences. The user can always
include
...
them in their application if they found useful.
I'm not sure about this issue. I really want to obtain feedback from as
many
people as possible. I understand the appeal of providing everything that
is
in <algorithm> but I dislike setting what is, in essence, a bad example
of
how to write algorithms. My personal preference is to drop the inferior
technique, this will hopefully make it easy to do the right thing, and
hard
to do the wrong thing. I am, of course, open to feedback on this issue.
Ultimately the library is for it's users.
It's OK. I will wait from  feedback from others also.
...
...
...
The _n operations are replaced if the input is a model of the
RandomAccessRangeConcept with boost::copy( output, input | sliced(0,
n) )
for example, but on reflection this can and must be improved before
release.
If I simply add a first_n adaptor then we can replace all _n
algorithms
with
a superior adaptor for all valid ranges. Hence I propose to add
something
that would allow:
boost::copy( output, input | first_n(n) );
I am pleased that you made me justify dropping the _n versions,
because
frankly my solution is currently insufficient to be a perfect
substition.
Yes, somthing like that should work, but what will be the difference
between
first_n(n) and sliced(0, n)?
The key difference is that while sliced works perfectly fine for
RandomAccessRange's it does not work for SinglePassRange's. It is
possible
to make a first_n work for all valid Ranges. Hence I can't correctly
state
that I have currently provided the tools to make the _n versions of the
algorithms always replacable with a superior alternative. I shall,
because I
want to get rid of them, I should have provided a better alternative
first.
Ok, I see.
...
...
...
As for is_sorted and is_heap, these seem like useful additions to the
algorithm_ext.hpp.
And horror of horrors, the random_sample is missing due to no good
justifiable reason what-so-ever. This I will address.
waiting for them.
...
...
There are surely hidden reasons to don't include some of them.
<snip>
...
I definately like the idea of carefully adding partitioning that is
compatible with Boost.Range. I need to revist posts and suggestions
made
by
others on this list.
IIRC Dr. Arnold Schodl had some interesting ideas, and partitioning is
actively being worked on.
...
But a dynamic one would also be interesting. What do you think?
The only comment I can make at this point is that it requires some
investigation. There are others that have investigate partitioning in
much
more detail and I it would be silly for me to comment before studying
their
contributions. It appears to be something that can be extended and
made
compatible with range very easily without affecting the core.
Hopefully I
can still find enough time to make this happen before release.
Could you give me some pointers?
I have already found the thread in this ML, but I'm not sure we address the
same partition concept. I need to reread it.
I do too!
...
...
...
I have just a last question, is there any difference between
using directly a Range
   Range range;
using a sub_range
   boost::sub_range<Range> sub_rng(boost::begin(range),
boost::end(range));
and using a sliced adaptor
   range | sliced(0, boost:size(range))
?
The idea is that the algorithm may be written without regard for what
actual
range is being used. There are reasons to have iterator_range, sub_range
and
special returns from the adaptors. I believe these are all documented, so
I'm going to be cheeky and ask that you look in the documentation for
sub_range.
Sorry, I was not enough explicit. My question was related to performances.
So is there any performance difference between ...
Actually having re-read my paragraph I state something I don't really mean.
The user of the algorithm must be able to call it for any range that models
a valid Range concept. In very few cases when implementing an algorithm you
can benefit from using the sub_range as a tool to help build the algorithm.

There is no performance difference between iterator_range, and sub_range to
the best of my knowledge and belief. I have not yet measured so this is
slightly speculative.
...
BTW, if I need to make a function that gets the i-th partition of a range
divided on k-parts
template <typename Range>
??? get_partition(Range& range, unsigned i, unsigned parts) {
        std::size_t size = boost::size(range);
        if (i==parts - 1)
           return range | sliced(boost::begin(range_)+i*(size/parts_),
boost::end(range_));
       else
           return range | sliced(boost::begin(range_)+i*(size/parts_),
boost::begin(range_)+((i+1)*(size/parts_))-1);
}
Which will be the type of get_partition?
Should this be done in another way?
I can't see any reason why you should not be able to do it this way.

What this example shows is that I was wrong to state that the return type of
adaptors did not need to be known. I will add the return type information
into the documentation.

The answer in this particular case is:
iterator_range<typename range_iterator<Range>::type >
...
...
Indeed it is a core design goal that the user may provide free
functions to allow other containers to work with the range library
seamlessly. This indeed is exemplified by the code that allows ATL and
MFC
'collections' to operate as ranges without requiring modification. The
relevant documentation is here index.html -> reference -> extending the
library.
The use of anything that is a valid model of the appropriate Range
Concept
must work. It's a defect if it doesn't.
I'm not sure if my example is a counter-example. I'm trying to define a
parallel_sort function taking a Range as parameter. I have started with

...
template <typename Range>
void parallel_sort(Range& range, unsigned cutoff=10000) {
   pool_type pool( boost::tp::poolsize( 2) );
   x_sort fct( pool, cutoff);
   fct.execute(range);
}
class x_sort {
private:
 pool_type &  pool_;
 unsigned    cutoff_;
template <typename Range>
 void seq_( Range& range)  {
   sort_fct()(range);
 }
template <typename Range>
 void par_( Range& range)  {
   unsigned size = boost::size(range);
   if ( size <= cutoff_) return seq_( range);
   else {
       boost::sub_range<Range> left(boost::begin(range),
boost::begin(range)+(size/2));
       boost::sub_range<Range> right(boost::begin(range)+(size/2)+1,
boost::end(range));
       task_type task(pool_.submit(
               boost::bind(& x_sort::par_<Range>,
                   boost::ref( * this), boost::ref(left))));
       this->par_(right);
       task.wait();
       inplace_merge_fct()(range, boost::begin(range)+(size/2));
   }
 }
public:
 x_sort( pool_type & pool, unsigned cutoff)  : pool_( pool), cutoff_(
cutoff)  {}
 template <typename Range>  void execute( boost::sub_range<Range>& range)
 {
   par_( range);
 }
};
I think it's a good example of where the differing range types are not as
easy to implement an algorithm as we would wish. The finished algorithm
should be able to work with anything that is a model of a Range Concept.
...
The problem is that the instantiation of par_<Range> requests the
instantiation of par_<boost::sub_range<Range>>, and ..., which fall in an
recursive instantiation that do not finish.
Should the types boost::sub_range<Range> and
boost::sub_range<boost::sub_range<Range> > be convertibles?
I have taken a workaround, create a sub_range before calling to the
internal function execute.
template <typename Range>
void parallel_sort(Range& range, unsigned cutoff=10000) {
   pool_type pool( boost::tp::poolsize( 2) );
   x_sort fct( pool, cutoff);
   boost::sub_range<Range> rng(range);
   fct.execute(range);
}
The internal templates have a boost::sub_range<Range> parameter instead of
a Range, so the single tempalte instantiation is
par_<boost::sub_range<Range> >.
class x_sort {
private:
 pool_type &  pool_;
 unsigned    cutoff_;
template <typename Range>
 void seq_( boost::sub_range<Range>& range)  {
   sort_fct()(range);
 }
template <typename Range>
 void par_( boost::sub_range<Range>& range)  {
   unsigned size = boost::size(range);
   if ( size <= cutoff_) return seq_( range);
   else {
       boost::sub_range<Range> left(boost::begin(range),
boost::begin(range)+(size/2));
       boost::sub_range<Range> right(boost::begin(range)+(size/2)+1,
boost::end(range));
       // fork a new sub-action t1 in pool
       task_type task(pool_.submit(
               boost::bind(& x_sort::par_<Range>,
                   boost::ref( * this), boost::ref(left))));
       this->par_(right);
       task.wait();
       inplace_merge_fct()(range, boost::begin(range)+(size/2));
   }
 }
public:
 x_sort( pool_type & pool, unsigned cutoff)
 : pool_( pool), cutoff_( cutoff)
 {}
template <typename Range>
 void execute( boost::sub_range<Range>& range)
 {
   par_( range);
 }
};
Is this the correct way to use Range and sub_range?
Yes, this is exactly what I intend. The public algorithm exposes an
interface that can accept any Range Concept, and the internal implementation
has to, occasionally jump through some hoops to handle sub_range etc. I
haven't been able to think of a way to eliminate this need. Fortunately for
most algorithm implementations it is not necessary, and I think the extra
function call is not a huge deal.

If anyone knows a technique to avoid the need to use sub_range that
simplifies matters, I would gladly use it.
...
Best,
Vicente
Thanks,
Neil Groves
...
_______________________________________________
Unsubscribe & other changes:
http://lists.boost.org/mailman/listinfo.cgi/boost