
On Tue, Jul 29, 2008 at 8:39 PM, Steven Watanabe <watanabesj@gmail.com>wrote:
Steven Ross wrote:
A functor is required because of the cast operation; I don't see any generic way to code a general cast operation.
How about using a traits template?
That's worth considering. Can I use is_floating_point to mean that the entire value_type can be safely cast to an integer? I could cast a built-in float/double type to a built-in signed integer type of the same size, but this wouldn't work for user-defined data types being sorted on a float key (unless I cast the result of their user-defined >> operator, which would be weird), so effectively it would be a special-case for built-in data types. float_sort works by taking a signed-magnitude floating point number, casting it to a 2's complement integer, and then reversing the negative buckets to fix the order. Since the usage that I can come up with for a functor-free version is so specialized, I would prefer to skip it, but if the application is more general than I imagine, then I'll consider it seriously. I've thought about some prior suggestions for making integer_sort/float_sort/string_sort apply to any forward_iterator, and came to this conclusion: Spreadsort (the underlying algorithm) is optimized on the assumption that memory allocation is slow and best minimized, random accesses are somewhat (but not greatly) slower than comparisons, and comparisons are slower than iterating across the provided iterators. What should be done if these assumptions are substantially off: If memory allocation is fast and memory reasonably available, use an LSB radix sort or the original non-in-place variant of Spreadsort. If random accesses are really slow, an entirely different algorithm is called for, such as the file sorting version of Spreadsort (not open source), or Mergesort. If random accesses are fast, MAX_SPLITS can be eliminated (or made infinite) and Spreadsort will run in theta(n) time on continuous integrable function distributions. If comparisons are extremely slow, use a pure radix sort, such as LSB radix sort; Spreadsort isn't optimized for this situation; if they're just a little slow, tune Spreadsort to prefer radix sorting. If iteration is slow then MAX_SPLITS should be eliminated, and the non-in-place Spreadsort should probably be used. Based upon these optimization assumptions, I don't think my in-place Spreadsort algorithm is appropriate for non-random-access sorting in a situation where a programmer is using an appropriate data structure for their problem; a different and probably specialized algorithm would be called for. The basic idea behind integer_sort/float_sort/string_sort is to provide hybrid generic algorithms that are substantially faster for most common sorting problems than std::sort. The assumption is that for more specialized tasks, a different off-the-shelf algorithm will probably be better. Do you recommend any particular reference on traits, such as Mr. Abraham's book? The standard states that, "For the algorithms to work correctly,
comp has to induce a strict weak ordering on the values." This is not true for NaNs, therefore, IMO, you don't need to worry about them.
Thanks for checking, Steve, I won't worry about them then.