Re: [boost] Proposed templated integer_sort

30 Jul 2008

      On Tue, Jul 29, 2008 at 8:39 PM, Steven Watanabe <watanabesj@gmail.com>wrote:
...
Steven Ross wrote:
...
A functor is required because of the cast operation; I don't see any
generic
way to code a general cast operation.
How about using a traits template?
That's worth considering.
Can I use is_floating_point to mean that the entire value_type can be safely
cast to an integer?
I could cast a built-in float/double type to a built-in signed integer type
of the same size, but this wouldn't work for user-defined data types being
sorted on a float key (unless I cast the result of their user-defined >>
operator, which would be weird), so effectively it would be a special-case
for built-in data types.
float_sort works by taking a signed-magnitude floating point number,
casting it to a 2's complement integer, and then reversing the negative
buckets to fix the order.

Since the usage that I can come up with for a functor-free version is so
specialized, I would prefer to skip it, but if the application is more
general than I imagine, then I'll consider it seriously.

I've thought about some prior suggestions for making
integer_sort/float_sort/string_sort apply to any forward_iterator, and came
to this conclusion:
Spreadsort (the underlying algorithm) is optimized on the assumption that
memory allocation is slow and best minimized, random accesses are somewhat
(but not greatly) slower than comparisons, and comparisons are slower than
iterating across the provided iterators.
What should be done if these assumptions are substantially off:
If memory allocation is fast and memory reasonably available, use an LSB
radix sort or the original non-in-place variant of Spreadsort.
If random accesses are really slow, an entirely different algorithm is
called for, such as the file sorting version of Spreadsort (not open
source), or Mergesort.
If random accesses are fast, MAX_SPLITS can be eliminated (or made infinite)
and Spreadsort will run in theta(n) time on continuous integrable function
distributions.
If comparisons are extremely slow, use a pure radix sort, such as LSB radix
sort; Spreadsort isn't optimized for this situation; if they're just a
little slow, tune Spreadsort to prefer radix sorting.
If iteration is slow then MAX_SPLITS should be eliminated, and the
non-in-place Spreadsort should probably be used.

Based upon these optimization assumptions, I don't think my in-place
Spreadsort algorithm is appropriate for non-random-access sorting in a
situation where a programmer is using an appropriate data structure for
their problem; a different and probably specialized algorithm would be
called for.

The basic idea behind integer_sort/float_sort/string_sort is to provide
hybrid generic algorithms that are substantially faster for most common
sorting problems than std::sort.  The assumption is that for more
specialized tasks, a different off-the-shelf algorithm will probably be
better.

Do you recommend any particular reference on traits, such as Mr. Abraham's
book?

The standard states that, "For the algorithms to work correctly,
...
comp has to induce a strict weak ordering on the values."
This is not true for NaNs, therefore, IMO, you don't need to worry
about them.
Thanks for checking, Steve, I won't worry about them then.