Re: [boost] [sorting]

Sam: again I'm curious (read: highly skeptical). When you say "much slower" do you base yourself on actual data? >I.e. Have you tried it? The reason I ask is that I don't believe it. Let's talk for concreteness of
Herve Bronnimann wrote three levels (name, >number, grade). Any algorithm that does three passes on the data when one will do will be slower (doesn't matter how >complicated your comparison is, if all the name data is distinct the outer sort comparison will conclude quickly). >Radix sort insists on doing all three passes even if all names are distinct. I'm sure data will back me up, if not please >show me the numbers. --Herve Here are some numbers for normal quicksort and radix quicksort on sorting strings - these are from Algorithms in C++ third edition, by Robert Sedgewick. N is the number of words, and these words are from Moby Dick. N standard quicksort three way partitioning quicksort radix quicksort 12500 7 6 6 25000 14 12 11 50000 34 26 25 100000 83 61 54 Now, these results are from single words. here is the general equation for quicksort and radix quicksort on strings Radix quicksort - k + (2N log N), where k is the average number of bytes in each string. Normal quicksort k(2N log N), where k is the average number of bytes in each string. Imagine the speed difference in the sorting algorithms if the strings were first and last names, which commonly reach lengths of 15 or more characters, unlike the words in Moby Dick. Think of how much faster radix quicksort would be.
From the point of view of a user appending bytes onto the end of a string from a native arithmetic type, even if you lose time on the integer bytes, you gain a very large amount of time when you sort the name part of the string.

Sam: again I'm curious (read: highly skeptical). When you say "much slower" do you base yourself on actual data? >I.e. Have you tried it? The reason I ask is that I don't believe it. Let's talk for concreteness of
Sam: I have my copy of the sedgewick (I took his advanced algorithms class back when I was a graduate student in geometric algorithms at Princeton). I meant *your* impl of radix sort vs *std:sort* which btw is not the standard quicksort. Remember: I progammed radix sort too and I didn't see an improvement contrary to what I expected. I guess I will have to try your code tonight... --Herve -- Hervé Sent from my BlackBerry® wireless handheld Please excuse misspellings and typos -----Original Message----- From: Sam Schetterer <samthecppman@gmail.com> Date: Sun, 18 Mar 2007 14:18:11 To:boost <boost@lists.boost.org> Subject: Re: [boost] [sorting] Herve Bronnimann wrote three levels (name, >number, grade). Any algorithm that does three passes on the data when one will do will be slower (doesn't matter how >complicated your comparison is, if all the name data is distinct the outer sort comparison will conclude quickly). >Radix sort insists on doing all three passes even if all names are distinct. I'm sure data will back me up, if not please >show me the numbers. --Herve Here are some numbers for normal quicksort and radix quicksort on sorting strings - these are from Algorithms in C++ third edition, by Robert Sedgewick. N is the number of words, and these words are from Moby Dick. N standard quicksort three way partitioning quicksort radix quicksort 12500 7 6 6 25000 14 12 11 50000 34 26 25 100000 83 61 54 Now, these results are from single words. here is the general equation for quicksort and radix quicksort on strings Radix quicksort - k + (2N log N), where k is the average number of bytes in each string. Normal quicksort k(2N log N), where k is the average number of bytes in each string. Imagine the speed difference in the sorting algorithms if the strings were first and last names, which commonly reach lengths of 15 or more characters, unlike the words in Moby Dick. Think of how much faster radix quicksort would be.
From the point of view of a user appending bytes onto the end of a string from a native arithmetic type, even if you lose time on the integer bytes, you gain a very large amount of time when you sort the name part of the string.
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (2)
-
Herve Bronnimann
-
Sam Schetterer