Re: [boost] Proposed templated integer_sort

12 Jul 2008

      Steven Ross wrote:
...
Are there any more suggestions for the core algorithm?
for each bin {
    for each element between this bin's current_position and its end {
      while (1) {
        determine the bin this element should be in
        if it's in the right bin { break }
        else {
          swap it with the element at the right bin's current_position
          increment that bin's current_position
        }
      }
    }
}

I'm considering the case where the type is relatively large and swap 
takes time proportional to its size, e.g. a large struct.  Currently 
your code (and my version above) will, on average, swap each element 
twice: once to what's probably the wrong bin and then into the right 
bin.  If swap itself uses a temporary then each element is copied on 
average 3 times.  Maybe the compiler can optimise some of that, and the 
cache may make some of those copies faster than others.  But I guess 
that it can probably be improved to nearer one copy per element.  
Here's the idea:

We want to move element A to the position currently occupied by element 
B.  We could swap them, but moving B to A's position is probably not 
useful.  So let's first move B to a good position for it, and then move 
A into its space.  Of course some other element C will be occupying the 
position where we want to put B, so we have to keep going until we find 
an element that wants to be in the position where A was.

It's relatively easy to code a recursive version of this:

void circulate(Iter i) {
   // Do an N-way circular swap that eventually puts the element at i in 
its right place,
   // and puts an element that wants to be at i there.
   *i = circulate_rec(i,i);
}

T circulate_rec(Iter source, Iter current) {
   // Move the element at current to its right position, and return an element
   // that can be stored at source.
   // Find current's right position:
   Iter target = right_bin_for(*current).current_position++;
   // Base case: if the element at target would be happy at position 
source, that's easy:
   if (right_bin_for(*target) == bin_of(source)) {
     tmp = *target;

   } else {
     // Otherwise, first move target to a good position
     tmp = circulate_rec(source,target);
   }

   // Now actually copy the element at current to the vacant target
   *target = *current;
   return tmp;
}

The problem with that is that you might recurse a lot and fill your 
stack, so in practice you'd want to limit it to a few iterations and 
then give up and put a "wrong" value back at the source.  Also you need 
to be certain that 'tmp' gets some sort of return value optimisation 
and isn't copied during each return.  An explicitly unrolled iterative 
version would probably be best:

Iter a = ...;
Iter b = right_bin_for(*a).current_position++;
if (right_bin_for(*b) != bin_of(a)) {
   Iter c = right_bin_for(*b).current_position++;
   if (right_bin_for(*c) != bin_of(a)) {
     Iter d = right_bin_for(*c).current_position++;
     if (right_bin_for(*d != bin_of(a)) {
       Iter e = right_bin_for(*d);
       tmp = *e;
       *e = *d;
     } else {
       tmp = *d;
     }
     *d = *c;
   } else {
     tmp = *c;
   }
   *c = *b;
} else {
   tmp = *b;
}
*b = *a;
*a = tmp;

Whether it actually improves performance will depend on quite how large 
the structs are.  It certainly won't help with ints.  So let's see your 
version that sorts structs!

Phil.

Re: [boost] Proposed templated integer_sort

Phil Endecott