[library submission] smoothsort

Hello, I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two"). When sorting nearly sorted containers smoothsort achieves excellent performances. If you keep sorting the same container, smoothsort may be the right algorithm for you. This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow). It only requires the "<" operator between two elements and the user can supply the function with a binary predicate. To my knowledge, there is no equivalent public C++ implementation. I was wondering if this was interesting enough to be included in the boost library. Kind regards. -- Edouard Alligand

I would certainly be interested in such a library. Have you benchmarked it against std::sort in any compilers? Chris On 14 Dec 2008, at 14:18, Edouard A. wrote:
Hello,
I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two").
When sorting nearly sorted containers smoothsort achieves excellent performances. If you keep sorting the same container, smoothsort may be the right algorithm for you.
This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow). It only requires the "<" operator between two elements and the user can supply the function with a binary predicate.
To my knowledge, there is no equivalent public C++ implementation.
I was wondering if this was interesting enough to be included in the boost library.
Kind regards.
--
Edouard Alligand
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

I have done only very few tests. For the moment the theory doesn't seem to pay off. Although the complexity should be close to O(N) for sorted input, std::sort performs incredibly well on such sorted input (MS' implementation is an introspective sort). To use a technical term, std::sort is currently spanking my smoothsort very hard. This will need more tests, and again, my implementation is not optimized. For example I have kept iterator's arithmetic generic so that it may work on std::list. A hint that I can do much better is that even std::make_heap, std::sort_heap outperforms my current implementation. Compiler : Visual Studio 2008, x64 platform, all optimizations known to me turned on. Processor : Intel Q6600. -- Edouard
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost- bounces@lists.boost.org] On Behalf Of Christopher Jefferson Sent: dimanche 14 décembre 2008 16:06 To: boost@lists.boost.org Subject: Re: [boost] [library submission] smoothsort
I would certainly be interested in such a library.
Have you benchmarked it against std::sort in any compilers?
Chris
On 14 Dec 2008, at 14:18, Edouard A. wrote:
Hello,
I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two").
When sorting nearly sorted containers smoothsort achieves excellent performances. If you keep sorting the same container, smoothsort may be the right algorithm for you.
This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow). It only requires the "<" operator between two elements and the user can supply the function with a binary predicate.
To my knowledge, there is no equivalent public C++ implementation.
I was wondering if this was interesting enough to be included in the boost library.
Kind regards.
--
Edouard Alligand
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

on Sun Dec 14 2008, "Edouard A." <edouard-AT-fausse.info> wrote:
I have done only very few tests. For the moment the theory doesn't seem to pay off. Although the complexity should be close to O(N) for sorted input, std::sort performs incredibly well on such sorted input (MS' implementation is an introspective sort).
Yeah, SGI was the first implementation I know of to use introsort. By now I hope they all do.
To use a technical term, std::sort is currently spanking my smoothsort very hard.
In that case, maybe you'd better prove that smoothsort can outperform introsort first :-)
This will need more tests, and again, my implementation is not optimized. For example I have kept iterator's arithmetic generic so that it may work on std::list.
Using advance() and distance() for that works well, and should be optimally efficient for random-access iterators. That's what the binary searches in most std implementations use. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Sorry for the double post, but good news everyone: I just did compilation with profiling. Smoothsort is twice faster than std::sort on sorted input, up to 5 times slower on random input. My benchmark output, on a vector of size 1,000,000 : std::sort running time: 0.171 s std::sort running time (sorted): 0.082 s Smoothsort running time: 0.6 s Smoothsort running time (sorted): 0.035 s Now I must figure out why without profiling I dont have good performances. Maybe there's some built-in optimization in MS' compiler when it encounters its own std::sort? -- Edouard
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost- bounces@lists.boost.org] On Behalf Of Christopher Jefferson Sent: dimanche 14 décembre 2008 16:06 To: boost@lists.boost.org Subject: Re: [boost] [library submission] smoothsort
I would certainly be interested in such a library.
Have you benchmarked it against std::sort in any compilers?
Chris
On 14 Dec 2008, at 14:18, Edouard A. wrote:
Hello,
I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two").
When sorting nearly sorted containers smoothsort achieves excellent performances. If you keep sorting the same container, smoothsort may be the right algorithm for you.
This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow). It only requires the "<" operator between two elements and the user can supply the function with a binary predicate.
To my knowledge, there is no equivalent public C++ implementation.
I was wondering if this was interesting enough to be included in the boost library.
Kind regards.
--
Edouard Alligand
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

on Sun Dec 14 2008, "Edouard A." <edouard-AT-fausse.info> wrote:
Sorry for the double post, but good news everyone:
I just did compilation with profiling. Smoothsort is twice faster than std::sort on sorted input, up to 5 times slower on random input.
That's progress.
My benchmark output, on a vector of size 1,000,000 :
std::sort running time: 0.171 s std::sort running time (sorted): 0.082 s Smoothsort running time: 0.6 s Smoothsort running time (sorted): 0.035 s
Now I must figure out why without profiling I don’t have good performances. Maybe there's some built-in optimization in MS' compiler when it encounters its own std::sort?
Did you use -D_SECURE_SCL=0 ? Without that, anything that you do with most iterators is a dog. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

on Sun Dec 14 2008, "Edouard A." <edouard-AT-fausse.info> wrote:
Hello,
I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two").
When sorting nearly sorted containers smoothsort achieves excellent performances. If you keep sorting the same container, smoothsort may be the right algorithm for you.
Awesome. Sounds really useful.
This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow).
Why? Is the complexity bound still respected?
It only requires the "<" operator between two elements and the user can supply the function with a binary predicate.
To my knowledge, there is no equivalent public C++ implementation.
I was wondering if this was interesting enough to be included in the boost library.
Absolutely, IMO. I think we might need a general algorithms library to put it in. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On 14 Dec 2008, at 19:21, David Abrahams wrote:
Absolutely, IMO. I think we might need a general algorithms library to put it in.
I have often been surprised at the lack of an algorithms library in boost. There has often been times I've wished for a guarded or unguarded insertion sort, or riffle shuffle. Is his lack just down to no-one stepping forward to help write and maintain such a library of algorithms? Chris

on Sun Dec 14 2008, Christopher Jefferson <chris-AT-bubblescope.net> wrote:
On 14 Dec 2008, at 19:21, David Abrahams wrote:
Absolutely, IMO. I think we might need a general algorithms library to put it in.
I have often been surprised at the lack of an algorithms library in boost. There has often been times I've wished for a guarded or unguarded insertion sort, or riffle shuffle. Is his lack just down to no-one stepping forward to help write and maintain such a library of algorithms?
Seems like it -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
on Sun Dec 14 2008, Christopher Jefferson <chris-AT-bubblescope.net> wrote:
I have often been surprised at the lack of an algorithms library in boost. There has often been times I've wished for a guarded or unguarded insertion sort, or riffle shuffle. Is his lack just down to no-one stepping forward to help write and maintain such a library of algorithms?
Seems like it
I seem to recall that at BoostCon this year, there was a push to extend the algorithm support in Boost as part of the Library in a Week program. Did anything significant come of that? - Jim

David Abrahams wrote:
on Sun Dec 14 2008, Christopher Jefferson <chris-AT-bubblescope.net> wrote:
I have often been surprised at the lack of an algorithms library in boost. There has often been times I've wished for a guarded or unguarded insertion sort, or riffle shuffle. Is his lack just down to no-one stepping forward to help write and maintain such a library of algorithms?
Seems like it
I seem to recall that at BoostCon this year, there was a push to extend the algorithm support in Boost as part of the Library in a Week program. Did anything significant come of that?
There's some stuff in the sandbox (some of which I wrote), but as far as I know, none has come up for review. I'm hoping to get off my duff and write some docs and get some of it reviewed before this year's BoostCon. -- -- Marshall Marshall Clow Idio Software <mailto:marshall@idio.com> It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shaking, the shaking becomes a warning. It is by caffeine alone I set my mind in motion.

This implementation works on all sortable containers, with best performances on containers supporting random access iterators (in other words, smoothsorting a std::list is quite slow).
Why? Is the complexity bound still respected? [Edouard A.]
You have a lot of std::advance within the algorithm, typically you will do std::advance(iterator, 25) and the like. Sucks on std::list's iterators.
Absolutely, IMO. I think we might need a general algorithms library to put it in.
[Edouard A.] I was thinking the same, by itself smoothsort would not justify the creation of a library, but where to place it? And I'm sure we could figure out a nice catalog of algorithms to implement.
Did you use -D_SECURE_SCL=0 ?
Without that, anything that you do with most iterators is a dog.
\o/ You are my hero I totally forgot about that \o/ Without profiling: std::sort running time : 0.098 std::sort running time (sorted) : 0.024 Smoothsort running time : 0.367 Smoothsort running time (sorted) : 0.017 Press any key to continue . . . For a 1,000,000 std::vector<long>. And I'm sure it's possible to improve my implementation as the processing for high entropy input is not very good. -- Edouard

On Sun, Dec 14, 2008 at 2:25 PM, Edouard A. <edouard@fausse.info> wrote:
I was thinking the same, by itself smoothsort would not justify the creation of a library, but where to place it? And I'm sure we could figure out a nice catalog of algorithms to implement.
I have a Spreadsort-based hybrid radix-based/comparison-based library I'm preparing to submit as a proposed Boost library (currently I'm regression testing a final version). I've finished adding integer, floating-point, and string versions, with functor version for flexibility. It's roughly twice as fast as std::sort across a wide range of (random) distributions. It's even faster (more than 2X) on already-sorted data relative to std::sort. I propose this: 1) Combine your unmodified smoothsort implementation with my sorting library, which I consider a logical combination. 2) I can implement versions of integer_sort, float_sort, and string_sort that use smoothsort for their comparison-based sorting approach (currently it uses std::sort), which would provide an algorithm that (ideally) is no slower than std::sort on the vast majority of data sets, and is much faster on already-sorted data. I could also convert the versions which use smoothsort to use std::advance to support non-random-access iterators. Does that interest you?

On Sun, 14 Dec 2008 16:46:25 -0800, "Steven Ross" <spreadsort@gmail.com> wrote:
On Sun, Dec 14, 2008 at 2:25 PM, Edouard A. <edouard@fausse.info> wrote:
I have a Spreadsort-based hybrid radix-based/comparison-based library I'm preparing to submit as a proposed Boost library (currently I'm regression testing a final version). I've finished adding integer, floating-point, and string versions, with functor version for flexibility. It's roughly
twice
as fast as std::sort across a wide range of (random) distributions. It's even faster (more than 2X) on already-sorted data relative to std::sort.
I propose this: 1) Combine your unmodified smoothsort implementation with my sorting library, which I consider a logical combination. 2) I can implement versions of integer_sort, float_sort, and string_sort that use smoothsort for their comparison-based sorting approach (currently it uses std::sort), which would provide an algorithm that (ideally) is no slower than std::sort on the vast majority of data sets, and is much faster on already-sorted data. I could also convert the versions which use smoothsort to use std::advance to support non-random-access iterators.
Does that interest you?
That sounds like a good idea, if I understood correctly what you propose is to start building up a Boost.Sorting library? Even more good news, I woke with a lot of ideas to improve my existing implementation, this is related to the way leonardo numbers are computed, the current way of doing is clearly suboptimal. -- EA

That sounds like a good idea, if I understood correctly what you propose is to start building up a Boost.Sorting library?
Yes, or Boost.Algorithm.Sorting.
Even more good news, I woke with a lot of ideas to improve my existing implementation, this is related to the way leonardo numbers are computed, the current way of doing is clearly suboptimal.
Considering that your algorithm is still in the tuning stages, and mine is in final regression testing, I suspect I'd be better off proposing a boost/algorithm/sorting with my algorithms, and then include smoothsort (and spreadsort hybrids with smoothsort if appropriate) when you've finished with it. I'd also be happy to take a look at your algorithm when you've finished optimizing it to see if I can tweak it a little. I've been amazed by the net improvement I've made with one 10 percent improvement and a bunch of 1 to 3 percent improvements over the last few months. They add up.

Considering that your algorithm is still in the tuning stages, and mine is in final regression testing, I suspect I'd be better off proposing a boost/algorithm/sorting with my algorithms, and then include smoothsort (and spreadsort hybrids with smoothsort if appropriate) when you've finished with it.
Perfect. Ideally I see the sorting library as a toolset providing default sorting constructs, but it should also be possible to combine sorting algorithms together to match a very specific need.
I'd also be happy to take a look at your algorithm when you've finished optimizing it to see if I can tweak it a little. I've been amazed by the net improvement I've made with one 10 percent improvement and a bunch of 1 to 3 percent improvements over the last few months. They add up.
I should be able to send a version to you very soon. It's quite possible that you find room for improvement. -- Edouard

Edouard A. wrote:
Absolutely, IMO. I think we might need a general algorithms library to put it in.
[Edouard A.]
I was thinking the same, by itself smoothsort would not justify the creation of a library, but where to place it? And I'm sure we could figure out a nice catalog of algorithms to implement.
There already is boost/algorithm directory. boost/algorithm/sorting seems like a good choice to me.

On Mon, 15 Dec 2008, Andrey Semashev wrote: Hi,
There already is boost/algorithm directory. boost/algorithm/sorting seems like a good choice to me.
Yep, a lot of that is the result of the BoostCon library-in-a-week project. You might contact Jeff Garland for further information-- we seem to have been low on steam lately, but there's a start! _Jesse Williamson ;-};

Good news everyone, In adding a precomputation for leonardo's numbers I am now consistently faster than std::sort on (nearly) sorted input (between 10% to 100%). std::sort is however much better on very random input (where it performs the best), up to 5 times better. I also wanted to add an optimization for counting the 0 to the right, but as surprising as it may sound the naïve while(!(p & 1)) { p >>= 1; ln.up<1>(); } Is faster than using tables to get the immediate count. Steven, if this is ok with you I can send you privately the current state of my implementation. It's less than 10 kb big and consist in only two header files. -- Edouard

Edouard wrote:
I just finished a C++ implementation of the smoothsort algorithm (a derivative of the heapsort algorithm by Dijkstra that, in his words, "is of order N log N in the worst case, but of order N in the best case, with a smooth transition between the two").
Wouldn't something like the following be more practical? (note: incomplete and untested) template <typename iT> void optimistic_sort(iT b, iT e) { iT low_end_of_unsorted_range = find_first_out_of_order_element(b, e); iT high_end_of_unsorted_range = find_last_out_of_order_element(low_end_of_unsorted_range, e); iT low_sorting_range = binary_search(b, low_end_of_unsorted_range, min_element(low_end_of_unsorted_range, high_end_of_unsorted_range)); iT high_sorting_range = binary_search(b, low_end_of_unsorted_range, max_element(low_end_of_unsorted_range, high_end_of_unsorted_range)); std::sort(low_sorting_range, high_sorting_range); } By implementing a function that finds the high end of the unsorted range as well as the min and max element in that range in a single pass you would have a more optimal implementation of optimistic sort than what I sketched out here. In the best case optimistic sort would be linear and generally less than 2X std::sort runtime in the worst case because its overhead is linear. It would provide a smooth transition from linear to O(n log n) as the size of the unsorted range increases. Optimistically, people who have a mostly sorted range have a small unsorted-subrange. Everyone wants a better sorting algorithm. Everyone wants a better binary tree. One thing I think would be useful is a bucketed binary tree that is more efficient with memory consumption and keeps its elements in contiguous order in the buckets. The tree would rebalance the elements to keep its bucket utilization high, but insertion and removal into the tree would invalidate its iterators. You get better cache performance and lookup/iteration speed than the std tree at the cost of iterators being invalidated by modifying the data structure. This is a pretty simple data structure to implement using a class encapsulated array as the elements of a std::set. Alternately, you can use a linked list of class encapsulated arrays with iterators into the list stored in a std::set along with the lower bound of each bucket to avoid loading buckets into cache if they aren't the one you are looking for. I may end up submitting a bucketed tree data structure along with my geometry algorithms if it turns out that it helps their performance meaningfully. Because it is a general data structure, it might not belong in the guts of a geometry library, and ought to be factored out. Regards, Luke
participants (9)
-
Andrey Semashev
-
Christopher Jefferson
-
David Abrahams
-
Edouard A.
-
James Porter
-
Jesse Williamson
-
Marshall Clow
-
Simonson, Lucanus J
-
Steven Ross