Re: [boost] Interest in some Algorithms

27 Jan 2016

...
On 26 Jan 2016, at 05:15, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Dean Michael Berris wrote:
...
You can
leverage the fact that the ranges are already sorted, using
std::lower_bound and std::upper_bound to find contiguous ranges that can be
rotated out of the way incrementally.
Let's see the code.  And the benchmarks.  And your complexity claims!
Indeed. I’ll go through my employer’s patching process for this. :)
...
The challenge is surely to get linear complexity worst-case, which
you can do with the sort of scan Steven suggested and also has good
cache locality characteristics, while keeping the sublinear complexity
in best cases.
I think if it was possible to do a specialisation on taking forward iterators, the linear algorithm would be fine. I've forgotten about whether this is possible or even a good idea to do without concept support in the language yet.
...
Here's the obvious linear algorithm:
template <typename I1, typename I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
{
 I1 o = f1;
 I1 i1 = f1;
 I2 i2 = f2;
while (i1 < l1 && i2 < l2) {
   if (*i1 < *i2) {
     std::swap(*o,*i1);
     ++i1;
     ++o;
   } else if (*i2 < *i1) {
     ++i2;
   } else {
     ++i1;
     ++i2;
   }
 }
while (i1 < l1) {
   std::swap(*o,*i1);
   ++i1;
   ++o;
 }
return o;
}
Now I think that you can make that logarithmic in the best case, but without
making it O(N log N) in the worst case like my recursive algorithm did, like
this:
template <typename I1, typename I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
{
 I1 o = f1;
 I1 i1 = f1;
 I2 i2 = f2;
 int log_size = log2(l2-f2);
int n = 0;
 while (i1 < l1 && i2 < l2) {
   if (*i1 < *i2) {
     std::swap(*o,*i1);
     ++i1;
     ++o;
     n = 0;
   } else if (*i2 < *i1) {
     ++i2;
     ++n;
     if (n == log_size) {
       i2 = std::lower_bound(i2,l2,*i1);
       n = 0;
     }
   } else {
     ++i1;
     ++i2;
     n = 0;
   }
 }
if (i1 == o) return l1;
while (i1 < l1) {
   std::swap(*o,*i1);
   ++i1;
   ++o;
 }
return o;
}
That optimises skipping through the second sequence.  You could do
something similar for the first sequence in the first if-block but
only for the case where i1==o, and also in the third if-block for
sequences of equal values.  (I hadn't previously considered duplicate
values; if these are sets, is that allowed?)
I had been working on the assumption that sets were allowed to have multiple contiguous runs of similar elements. If this wasn't the case then indeed it would be simpler (it would obviate the need to use std::upper_bound at least from my implementation).
...
Something else to note is that my recursive algorithm keeps the
to-discard items sorted, so you can actually think of it as a kind of
partition algorithm i.e. both halves of the output are useful.  The
linear algorithms don't do that.
Right, it's a form of stable partition. Maybe it should be called a "stable_set_inplace_difference", and that would certainly be useful.

Thanks Phil!