[sort] Timsort review result

older
Re: [boost] [cmake] Minimum viable...

Steven Ross

13 Jun 2017 13 Jun '17

11:09 a.m.

After reviewing the responses, I have rejected Timsort. Here are the reasons: 1. No author/maintainer responses to the questions on boost (except for forwarding some documentation links). We need a maintainer for all our source, and I'm not volunteering to maintain new functions as complex as Timsort. 2. There was no response on the Timsort bug reported during the review, nor any tests for it. 3. We need better tests. 4. Francisco is including a more efficient stable sort as part of his additions to the library, including handling special cases like Timsort handles better than stable_sort does. Francisco will maintain his code. I'd like to note that many people THINK they are experts on sorting because they covered it in class or may have read about it in a textbook, but most do not understand hybrid sorts well. As a specific example, I saw O(N*log(N)) mentioned as the worst-case speed limit of general-case sorts, when in fact Spreadsort beats that as described in its docs: http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.over... Many people also don't carefully analyze for worst-case performance, which can be tricky. With regards to future sorting library reviews, I'll get at least a one-page doc describing the algorithm for the review in advance, but the boost documentation formatting is an unreasonable request for a simple sorting function, and I don't see the value it adds for these simple reviews. It is useful for a full library, but we'll be integrating the documentation into our full library after a new function is accepted. It isn't accurate to call it a mini-review (as described here: http://www.boost.org/community/reviews.html#Fast-Track) because it's reviewing completely new source we're talking about adding to the library. That said, do you still want me to a call the upcoming one for pdqsort a "mini-review"?

Show replies by date

Soul Studios

16 Jun 16 Jun

10:07 p.m.

Time complexity is not the only consideration when reviewing sorting techniques. Smoothsort is purported to have much better time complexity across a range of scenarios, but in reality it's cache performance is rubbish. That's where quicksort variants and timsort work well. I haven't investigated spreadsort personally. On 13/06/2017 11:09 p.m., Steven Ross via Boost wrote:

...

After reviewing the responses, I have rejected Timsort. Here are the reasons:

1. No author/maintainer responses to the questions on boost (except for forwarding some documentation links). We need a maintainer for all our source, and I'm not volunteering to maintain new functions as complex as Timsort. 2. There was no response on the Timsort bug reported during the review, nor any tests for it. 3. We need better tests. 4. Francisco is including a more efficient stable sort as part of his additions to the library, including handling special cases like Timsort handles better than stable_sort does. Francisco will maintain his code.

I'd like to note that many people THINK they are experts on sorting because they covered it in class or may have read about it in a textbook, but most do not understand hybrid sorts well. As a specific example, I saw O(N*log(N)) mentioned as the worst-case speed limit of general-case sorts, when in fact Spreadsort beats that as described in its docs: http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.over...

Many people also don't carefully analyze for worst-case performance, which can be tricky.

With regards to future sorting library reviews, I'll get at least a one-page doc describing the algorithm for the review in advance, but the boost documentation formatting is an unreasonable request for a simple sorting function, and I don't see the value it adds for these simple reviews. It is useful for a full library, but we'll be integrating the documentation into our full library after a new function is accepted.

It isn't accurate to call it a mini-review (as described here: http://www.boost.org/community/reviews.html#Fast-Track) because it's reviewing completely new source we're talking about adding to the library. That said, do you still want me to a call the upcoming one for pdqsort a "mini-review"?

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Steven Ross

11:39 p.m.

On Fri, Jun 16, 2017, 6:08 PM Soul Studios via Boost <boost@lists.boost.org> wrote:

...

Time complexity is not the only consideration when reviewing sorting techniques. Smoothsort is purported to have much better time complexity across a range of scenarios, but in reality it's cache performance is rubbish. That's where quicksort variants and timsort work well. I haven't investigated spreadsort personally.

Agreed, that's why spreadsort has to limit itself to dividing into a number of buckets that will fit in L1 cache (its theoretical time complexity would be better otherwise). It works out to between slightly faster than std::sort to about 2X as fast on most data. With string or other multiple piece keys the spreadsort speedup can be spectacular, as it will traverse the full length of the key at most once during the radix portion of the sort, where comparison sorts can do so for each comparison. When I modified bzip to use spreadsort the overall compression runtime (more than just sorting) dropped 40%. Smoothsort has the same nlogn worst case as other comparison sorts, and it has the cache issues of heapsort, which makes for poor performance on average.

...

On 13/06/2017 11:09 p.m., Steven Ross via Boost wrote:

...
After reviewing the responses, I have rejected Timsort. Here are the reasons:

1. No author/maintainer responses to the questions on boost (except for forwarding some documentation links). We need a maintainer for all our source, and I'm not volunteering to maintain new functions as complex as Timsort. 2. There was no response on the Timsort bug reported during the review, nor any tests for it. 3. We need better tests. 4. Francisco is including a more efficient stable sort as part of his additions to the library, including handling special cases like Timsort handles better than stable_sort does. Francisco will maintain his code.

I'd like to note that many people THINK they are experts on sorting because they covered it in class or may have read about it in a textbook, but most do not understand hybrid sorts well. As a specific example, I saw O(N*log(N)) mentioned as the worst-case speed limit of general-case sorts, when in fact Spreadsort beats that as described in its docs:

http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.over...

...
Many people also don't carefully analyze for worst-case performance,

...
can be tricky.

With regards to future sorting library reviews, I'll get at least a one-page doc describing the algorithm for the review in advance, but the boost documentation formatting is an unreasonable request for a simple sorting function, and I don't see the value it adds for these simple reviews. It is useful for a full library, but we'll be integrating the documentation into our full library after a new function is accepted.

It isn't accurate to call it a mini-review (as described here: http://www.boost.org/community/reviews.html#Fast-Track) because it's reviewing completely new source we're talking about adding to the

which library.

...
That said, do you still want me to a call the upcoming one for pdqsort a "mini-review"?

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Paul A. Bristow

21 Jun 21 Jun

9:52 a.m.

...

-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross via Boost Sent: 13 June 2017 12:09 To: boost@lists.boost.org Cc: Steven Ross Subject: [boost] [sort] Timsort review result

After reviewing the responses, I have rejected Timsort. Here are the reasons:

1. No author/maintainer responses to the questions on boost (except for forwarding some documentation links). We need a maintainer for all our source, and I'm not volunteering to maintain new functions as complex as Timsort. 2. There was no response on the Timsort bug reported during the review, nor any tests for it.

Disappointing but agree with your judgement - a commitment to maintenance is vital, no matter how perfect the code ;-)

...

3. We need better tests. 4. Francisco is including a more efficient stable sort as part of his additions to the library, including handling special cases like Timsort handles better than stable_sort does. Francisco will maintain his code.

:-)

...

With regards to future sorting library reviews, I'll get at least a one-page doc describing the algorithm for the review in advance, but the boost documentation formatting is an unreasonable request for a simple sorting function, and I don't see the value it adds for these simple reviews. It is useful for a full library, but we'll be integrating the documentation into our full library after a new function is accepted.

That sounds fine to me - it was the total lack of docs that wasn't OK.

...

It isn't accurate to call it a mini-review (as described here: http://www.boost.org/community/reviews.html#Fast-Track) because it's reviewing completely new source we're talking about adding to the library. That said, do you still want me to a call the upcoming one for pdqsort a "mini-review"?

FWIW, and not wanting to go all legalistic, I think pdqsort does qualify - provided the Review Wizard says it does ;-) Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830

2930

Age (days ago)

2938

Last active (days ago)

List overview

Download

3 comments

3 participants

participants (3)

Paul A. Bristow
Soul Studios
Steven Ross