
Steven Ross wrote:
Are there any more suggestions for the core algorithm?
Log2 can be done in O(log Nbits) steps, rather than O(Nbits). I spent a while trying to understand this bit of code: //Swap into place //This dominates runtime, especially in the swap; std::sort calls are the other main time-user RandomAccessIter current = first; for(unsigned u = 0; current != last; ++u) { for(current_bin = (bins + ((*current >> log_divisor) - div_min)); (current_bin->count > u); current_bin = bins + ((*current >> log_divisor) - div_min)) std::swap(*current, *(current_bin->current_position++)); //Now that we've found the item belonging in this position, increment the bucket count if(current_bin->current_position == current++) ++(current_bin->current_position); } You don't help understanding by changing the meaning of bin.count half way through :-(. Anyway, I think you're doing more work than necessary here. For each bin, the elements between the start of the bin and bin.current_position will be correct and can be skipped over; this will be on average half of the elements. This doesn't save any swaps but it does save the shifting and some comparisons. Pseudo-code: for each bin { for each element between this bin's current_position and its end { while (1) { determine the bin this element should be in if it's in the right bin { break } else { swap it with the element at the right bin's current_position increment that bin's current_position } } } } Have I understood this correctly? Phil.