I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own. I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity. Does anyone have a strong preference, advice, or comments on how they use Boost.Sort? Should I run a mini-review for pdqsort?
Do you have any description of the algorithm ?
Where can find an implementation of it ?
I will examine, and say my opinion as soon as possible
2017-01-13 19:49 GMT+01:00 Steven Ross
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
Should I run a mini-review for pdqsort?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
On Fri, Jan 13, 2017 at 2:51 PM Francisco José Tapia
Do you have any description of the algorithm ? Where can find an implementation of it ?
Library with Boost license: https://gist.github.com/orlp/2c417ab76391b126e1d58bac4bf4af0f Repository: https://github.com/orlp/pdqsort Paper: https://drive.google.com/file/d/0B1-vl-dPgKm_T0Fxeno1a0lGT0E/view
I will examine, and say my opinion as soon as possible
2017-01-13 19:49 GMT+01:00 Steven Ross
: I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
Should I run a mini-review for pdqsort?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Fri, 13 Jan 2017, Steven Ross wrote:
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
Should I run a mini-review for pdqsort?
I assume this is the same pdqsort (https://github.com/orlp/pdqsort) that was proposed for both libc++ and libstdc++ as an implementation of std::sort. For libstdc++, sadly, it was never reviewed because the reviewer turned out to be busier than he expected. I didn't follow what happened for libc++.
From my notes (https://gcc.gnu.org/ml/libstdc++/2015-10/msg00040.html), on my application where copy is cheap (one pointer) but the comparison can be a bit expensive (reading memory through several pointers), libstdc++ sort was taking 3.46s, stable_sort 3.08s, and pdqsort 2.85s.
I am surprised you found it so much slower for strings. Was it C++03 or C++11? COW strings or not? Short or long strings? Did the author have any comment on this benchmark? Now the code may have changed since I tried it in 2015, maybe it was re-tuned more for int/float (the branch-reducing stuff?) and less for more expensive types since then... (A naive question: do people ever sort a large, plain vector<int> or vector<double>? I don't think I've ever used sort with basic types, or at least not on large vectors where performance was critical. The closest would have the int/double as one element of a pair.) I'd rather see the various implementations of std::sort improved to the point where integrating pdqsort in boost would be useless, but I am not sure that's going to happen. I am not sure what your benchmark involves exactly. If pdqsort is used only as a fallback for spreadsort, that's a special kind of input that might not represent what users would feed it if they called it directly. However, a 40% slowdown is a serious cause for concern... You don't say if disabling the branch-reduction option improves performance for strings, which your #4 hints at. -- Marc Glisse
On Fri, Jan 13, 2017, 4:05 PM Marc Glisse
On Fri, 13 Jan 2017, Steven Ross wrote:
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
Should I run a mini-review for pdqsort?
I assume this is the same pdqsort (https://github.com/orlp/pdqsort) that was proposed for both libc++ and libstdc++ as an implementation of std::sort. For libstdc++, sadly, it was never reviewed because the reviewer turned out to be busier than he expected. I didn't follow what happened for libc++.
From my notes (https://gcc.gnu.org/ml/libstdc++/2015-10/msg00040.html), on my application where copy is cheap (one pointer) but the comparison can be a bit expensive (reading memory through several pointers), libstdc++ sort was taking 3.46s, stable_sort 3.08s, and pdqsort 2.85s.
I am surprised you found it so much slower for strings. Was it C++03 or C++11? COW strings or not? Short or long strings? Did the author have any comment on this benchmark? Now the code may have changed since I tried it in 2015, maybe it was re-tuned more for int/float (the branch-reducing stuff?) and less for more expensive types since then...
(A naive question: do people ever sort a large, plain vector<int> or vector<double>? I don't think I've ever used sort with basic types, or at least not on large vectors where performance was critical. The closest would have the int/double as one element of a pair.)
I'd rather see the various implementations of std::sort improved to the point where integrating pdqsort in boost would be useless, but I am not sure that's going to happen.
I am not sure what your benchmark involves exactly. If pdqsort is used only as a fallback for spreadsort, that's a special kind of input that might not represent what users would feed it if they called it directly. However, a 40% slowdown is a serious cause for concern... You don't say if disabling the branch-reduction option improves performance for strings, which your #4 hints at.
It does; the speed is roughly the same as std::sort as a fallback for spreadsort for both ints and strings when I revert the branch reduction optimization. The standalone (outside spreadsort) improvement for mostly-sorted data is substantial both with and without the branch optimization. I have sorted basic data types at work (it's especially useful for compression), but pairs are more common.
-- Marc Glisse
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 01/13/2017 10:05 PM, Marc Glisse wrote:
(A naive question: do people ever sort a large, plain vector<int> or vector<double>? I don't think I've ever used sort with basic types, or at least not on large vectors where performance was critical. The closest would have the int/double as one element of a pair.)
Especially your second case is quite common in my work. I also have some occasional vectors of integers only that need to be sorted. These are (in my case) always lists of some hashes or indices that are ordered for later (binary) searching if a special element exists. Both cases usually involve between 10 and a few hundred million values. So although I have to admit that sorting (and searching) these lists is not where most of the time is spent in my work, it was fairly easy to speed these parts up by simply using spreadsort over std::sort. So if I get increased performance practically for free I say "Thank you!" and take it. Norbert
On 01/13/2017 07:49 PM, Steven Ross wrote:
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
I don't have a *strong* preference, but if easily possible I prefer option 2 so I can make my own measurements and decide whether I want to use pdqsort. If you additionally add it as a fallback to default spreadsort for whatever cases you figure out it works reasonably better than std::sort I'd consider this a plus. Norbert
On 13/01/2017 19:49, Steven Ross wrote:
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
I'd be interested in having Boost-licensed quicksort-like algorithm. In some containers supporting C++03 I need a move-emulation enabled implementation. If the general-purpose algorithm is not suitable for Boost.Sort, then I'd add it as an implementation detail in Boost.Move. Is the original author proposing an implementation with a Boost license or someone has rewritten the algorithm based on the paper? Best, Ion
On Thu, Jan 19, 2017 at 11:12 AM Ion Gaztañaga
On 13/01/2017 19:49, Steven Ross wrote:
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
I'd be interested in having Boost-licensed quicksort-like algorithm. In some containers supporting C++03 I need a move-emulation enabled implementation. If the general-purpose algorithm is not suitable for Boost.Sort, then I'd add it as an implementation detail in Boost.Move.
Is the original author proposing an implementation with a Boost license or someone has rewritten the algorithm based on the paper?
It's Orson Peters, so I think he's the author.
Best,
Ion
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (5)
-
Francisco José Tapia
-
Ion Gaztañaga
-
Marc Glisse
-
Norbert Wenzel
-
Steven Ross