[sorting] Taking away quicksort and mergesort

Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl.

Sam Schetterer wrote:
Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl.
Sorry I haven't looked at your lib yet, I plan to this weekend... But perhaps you should compare you quicksort and mergesort implementation to the stl ones to see if there are differences. I ask because I have a quicksort implementation specifically written, not to be faster, but to have tighter complexity bounds. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene: What do you mean by "tighter complexity bounds"? Just curious, -- Hervé Brönnimann hervebronnimann@mac.com On Mar 17, 2007, at 1:16 AM, Rene Rivera wrote:
Sam Schetterer wrote:
Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl.
Sorry I haven't looked at your lib yet, I plan to this weekend... But perhaps you should compare you quicksort and mergesort implementation to the stl ones to see if there are differences. I ask because I have a quicksort implementation specifically written, not to be faster, but to have tighter complexity bounds.
-- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/ listinfo.cgi/boost

Hervé Brönnimann wrote:
Rene: What do you mean by "tighter complexity bounds"? Just curious,
I meant tighter guarantees of the complexity of the algorithm, in big O. The customary quicksort is usually O(NlogN) on most sequences, but it can be as high as O(N2). The quicksort variant I have has a guaranteed worst case of O(NlogN). The price for that guarantee is a larger K overall. Which if one is using objects for which the comparison is expensive is a better trade off. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

on Sat Mar 17 2007, Rene Rivera <grafikrobot-AT-gmail.com> wrote:
Hervé Brönnimann wrote:
Rene: What do you mean by "tighter complexity bounds"? Just curious,
I meant tighter guarantees of the complexity of the algorithm, in big O. The customary quicksort is usually O(NlogN) on most sequences, but it can be as high as O(N2). The quicksort variant I have has a guaranteed worst case of O(NlogN). The price for that guarantee is a larger K overall. Which if one is using objects for which the comparison is expensive is a better trade off.
How is your version different from introsort? http://en.wikipedia.org/wiki/Introsort -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
on Sat Mar 17 2007, Rene Rivera <grafikrobot-AT-gmail.com> wrote:
Hervé Brönnimann wrote:
Rene: What do you mean by "tighter complexity bounds"? Just curious, I meant tighter guarantees of the complexity of the algorithm, in big O. The customary quicksort is usually O(NlogN) on most sequences, but it can be as high as O(N2). The quicksort variant I have has a guaranteed worst case of O(NlogN). The price for that guarantee is a larger K overall. Which if one is using objects for which the comparison is expensive is a better trade off.
How is your version different from introsort?
It's different in that it's not introsort ;-) It's the 3-way partitioning quicksort by Bentley and McIlroy <http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol23/issue11/spe862jb.pdf>. Analysis of the complexity in this presentation <http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf>. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene: you misunderstood the meaning of the word optimal in that algorithm. There is no version of quicksort which takes n log n time in the worst case, short of doing some selection on the pivot. The algorithm you quote, iirc, is optimal on average even when the distribution has many equal elements (which in itself is an achievement since almost any version of quicksort is not going to have an average time of n log n when many elements are equal). It can still (for some particular sequence) take quadratic time. Doug McIlroy gave such a sequence in the 80s (he calls it the killer sequence since it makes almost any variation of quicksort run in quadratic time). Please, take a look at John Valois revisited version of introsort. John no longer works in academia, but at some point he posted his code online and I believe it must still be out there. It's a good read ! Herve -- Hervé Sent from my BlackBerry® wireless handheld Please excuse misspellings and typos -----Original Message----- From: Rene Rivera <grafikrobot@gmail.com> Date: Sun, 18 Mar 2007 13:03:27 To:boost@lists.boost.org Subject: Re: [boost] [sorting] Taking away quicksort and mergesort David Abrahams wrote:
on Sat Mar 17 2007, Rene Rivera <grafikrobot-AT-gmail.com> wrote:
Hervé Brönnimann wrote:
Rene: What do you mean by "tighter complexity bounds"? Just curious, I meant tighter guarantees of the complexity of the algorithm, in big O. The customary quicksort is usually O(NlogN) on most sequences, but it can be as high as O(N2). The quicksort variant I have has a guaranteed worst case of O(NlogN). The price for that guarantee is a larger K overall. Which if one is using objects for which the comparison is expensive is a better trade off.
How is your version different from introsort?
It's different in that it's not introsort ;-) It's the 3-way partitioning quicksort by Bentley and McIlroy <http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol23/issue11/spe862jb.pdf>. Analysis of the complexity in this presentation <http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf>. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Herve Bronnimann wrote:
Rene: you misunderstood the meaning of the word optimal in that algorithm.
Perhaps :-) I only found the references I posted today. My original source for the algo was in a paper for creation of suffix arrays. In my use case I can't use the standard sort (specifically stable_sort), because the implementation doesn't provide the appropriate hooks to do the needed updates on the suffix array build procedure.
There is no version of quicksort which takes n log n time in the worst case, short of doing some selection on the pivot.
Now it's you that has to be clear. Do you mean "short of doing some selection *of* the pivot"? I ask because the Bentley-McIlroy paper does use pivot selection. Albeit it's a statistical sampling selection, which is why it adds to the complexity and it also doesn't play nice with the cache.
The algorithm you quote, iirc, is optimal on average even when the distribution has many equal elements (which in itself is an achievement since almost any version of quicksort is not going to have an average time of n log n when many elements are equal). It can still (for some particular sequence) take quadratic time. Doug McIlroy gave such a sequence in the 80s (he calls it the killer sequence since it makes almost any variation of quicksort run in quadratic time).
Hm, I would think McIlroy would take such a sequence into consideration in the paper he writes, with Bentley, in 93.
Please, take a look at John Valois revisited version of introsort. John no longer works in academia, but at some point he posted his code online and I believe it must still be out there. It's a good read ! Herve -- Hervé
I'm failing to find a reference to that. Any change you know where to find it? ...Anyway my original point in this thread was that different sort implementations offer different trade offs if one has some knowledge about the input sequences. Hence dismissing them just because they aren't faster than the standard sort is premature. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Sam Schetterer wrote:
Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl.
Sorry I haven't looked at your lib yet, I plan to this weekend... But perhaps you should compare you quicksort and mergesort implementation to the stl ones to see if there are differences. I ask because I have a quicksort implementation specifically written, not to be faster, but to have tighter complexity bounds.
To illustrate I expanded a perf test I have for sorting to include the two std sorts, and the other sort I have "inplace_mergesort". Sorry about the wrapping... The interesting case is at the bottom where it sorts 512K strings. --- Start iteration #0, item count = 524288 --- --| struct rsi_tri_part_quicksort_, no-op: class std::vector<int,class std::allocator<int> > --| total time 2.1250 iterations 10485760 iteration 0.0000 iterations/second 4.7059 M --| struct rsi_inplace_mergesort_, no-op: class std::vector<int,class std::allocator<int> > --| total time 4.6719 iterations 10485760 iteration 0.0000 iterations/second 2.1405 M --| struct std_sort_, no-op: class std::vector<int,class std::allocator<int> > --| total time 0.5938 iterations 10485760 iteration 0.0000 iterations/second 16.8421 M --| struct std_stable_sort_, no-op: class std::vector<int,class std::allocator<int> > --| total time 1.9063 iterations 10485760 iteration 0.0000 iterations/second 5.2459 M --- --| struct rsi_tri_part_quicksort_, reverse: class std::vector<int,class std::allocator<int> > --| total time 2.1563 iterations 10485760 iteration 0.0000 iterations/second 4.6377 M --| struct rsi_inplace_mergesort_, reverse: class std::vector<int,class std::allocator<int> > --| total time 4.5938 iterations 10485760 iteration 0.0000 iterations/second 2.1769 M --| struct std_sort_, reverse: class std::vector<int,class std::allocator<int> > --| total time 0.6563 iterations 10485760 iteration 0.0000 iterations/second 15.2381 M --| struct std_stable_sort_, reverse: class std::vector<int,class std::allocator<int> > --| total time 2.3438 iterations 10485760 iteration 0.0000 iterations/second 4.2667 M --- --| struct rsi_tri_part_quicksort_, random: class std::vector<int,class std::allocator<int> > --| total time 3.3594 iterations 10485760 iteration 0.0000 iterations/second 2.9767 M --| struct rsi_inplace_mergesort_, random: class std::vector<int,class std::allocator<int> > --| total time 5.1250 iterations 10485760 iteration 0.0000 iterations/second 1.9512 M --| struct std_sort_, random: class std::vector<int,class std::allocator<int> > --| total time 1.5625 iterations 10485760 iteration 0.0000 iterations/second 6.4000 M --| struct std_stable_sort_, random: class std::vector<int,class std::allocator<int> > --| total time 2.4688 iterations 10485760 iteration 0.0000 iterations/second 4.0506 M --- --| struct rsi_tri_part_quicksort_, blocks of 3 - no-op: class std::vector<int,class std::allocator<int> > --| total time 2.2031 iterations 10485760 iteration 0.0000 iterations/second 4.5390 M --| struct rsi_inplace_mergesort_, blocks of 3 - no-op: class std::vector<int,class std::allocator<int> > --| total time 4.6094 iterations 10485760 iteration 0.0000 iterations/second 2.1695 M --| struct std_sort_, blocks of 3 - no-op: class std::vector<int,class std::allocator<int> > --| total time 0.5625 iterations 10485760 iteration 0.0000 iterations/second 17.7778 M --| struct std_stable_sort_, blocks of 3 - no-op: class std::vector<int,class std::allocator<int> > --| total time 1.9063 iterations 10485760 iteration 0.0000 iterations/second 5.2459 M --- --| struct rsi_tri_part_quicksort_, blocks of 3 - reverse: class std::vector<int,class std::allocator<int> > --| total time 2.2656 iterations 10485760 iteration 0.0000 iterations/second 4.4138 M --| struct rsi_inplace_mergesort_, blocks of 3 - reverse: class std::vector<int,class std::allocator<int> > --| total time 4.7344 iterations 10485760 iteration 0.0000 iterations/second 2.1122 M --| struct std_sort_, blocks of 3 - reverse: class std::vector<int,class std::allocator<int> > --| total time 0.6719 iterations 10485760 iteration 0.0000 iterations/second 14.8837 M --| struct std_stable_sort_, blocks of 3 - reverse: class std::vector<int,class std::allocator<int> > --| total time 2.2188 iterations 10485760 iteration 0.0000 iterations/second 4.5070 M --- --| struct rsi_tri_part_quicksort_, blocks of 3 - random: class std::vector<int,class std::allocator<int> > --| total time 3.1094 iterations 10485760 iteration 0.0000 iterations/second 3.2161 M --| struct rsi_inplace_mergesort_, blocks of 3 - random: class std::vector<int,class std::allocator<int> > --| total time 5.1563 iterations 10485760 iteration 0.0000 iterations/second 1.9394 M --| struct std_sort_, blocks of 3 - random: class std::vector<int,class std::allocator<int> > --| total time 1.5313 iterations 10485760 iteration 0.0000 iterations/second 6.5306 M --| struct std_stable_sort_, blocks of 3 - random: class std::vector<int,class std::allocator<int> > --| total time 2.5313 iterations 10485760 iteration 0.0000 iterations/second 3.9506 M --- --| struct rsi_tri_part_quicksort_, random: class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > --| total time 39.2656 iterations 10485760 iteration 0.0000 iterations/second 260.7879 K --| struct rsi_inplace_mergesort_, random: class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > --| total time 340.3269 iterations 10485760 iteration 0.0000 iterations/second 30.0887 K --| struct std_sort_, random: class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > --| total time 44.5469 iterations 10485760 iteration 0.0000 iterations/second 229.8702 K --| struct std_stable_sort_, random: class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > --| total time 53.4688 iterations 10485760 iteration 0.0000 iterations/second 191.5137 K -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Sam Schetterer wrote:
Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl.
I think you meant "standard library" when you wrote "stl". Your statement isn't correct. Neither "quicksort" (there's a qsort function that got inherited from C. That function neither needs to implement quicksort nor does it fit very well into the C++ standard library) nor "mergesort" are mentioned at any point in the C++ standard (there are merge operations, though. However, they aren't a complete sorting algorithm). Do not confuse the specific C++ implementation you are using with requirements from the standard. However, adding another quicksort imlpementation doesn't make much sense if it doesn't cover anything that isn't already covered by the specification of std::sort, std::stable_sort etc. Mergesort, OTOH, could be very interesting to have. Mergesort is a good choice for sorting large amounts of data on external storage. Among the parameters for that algorithm could be the point of a strategy switch, e.g. number of data elements that get sorted in memory (similar to switching from QS to some other algorihm for smaller sets.) Regards, m Send instant messages to your online friends http://au.messenger.yahoo.com

Sam: the Stl already implements sorting (stable and not), inplace and partial sorting (uses heapsort), nth_element (generalized median), functors and iterators (generic interface). As far as I can tell, your library has no added value, except you claim performance. I think you might badly underestimate the effort that goes into implementing the C++ std library. The only reason someone would consider using your library over the standard library would be if you could convincingly and reproducingly show that you are faster (and by more than a small percentage). I haven't seen numbers. I looked at your code and I see that you're using unguarded recursion in quicksort (check Valois's introspective sort revisited, published in software practice and experience, 2000), swap in insertion sort, int instead of ptrdiff_t (you're going to have a bad surprise sorting 2 billion integers on a 64bit platform), etc. My experiences showed that radix sort was not faster than quicksort, due to the random access pattern (for large arrays) and overhead of counters (for small arrays). And I have some doubts about the endianness (did you test on both big and small endian platforms?). That's a lot of comments. I'll leave it at that. Cheers, Herve -- Hervé Sent from my BlackBerry® wireless handheld Please excuse misspellings and typos -----Original Message----- From: Sam Schetterer <samthecppman@gmail.com> Date: Fri, 16 Mar 2007 22:06:36 To:boost <boost@lists.boost.org> Subject: [boost] [sorting] Taking away quicksort and mergesort Would users care if I took quicksort and merge sort out of the library? They are already implemented by the stl. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (6)
-
David Abrahams
-
Herve Bronnimann
-
Hervé Brönnimann
-
Martin Wille
-
Rene Rivera
-
Sam Schetterer