[sort] Timsort review reminder
The review for C++ Timsort in Boost is ongoing and will continue through June 12. I am the review manager. The code can be found here: https://github.com/boostorg/sort/pull/12 To merge it into a local copy of the Boost.Sort library: git checkout -b ZaMaZaN4iK-feature_branch/TimSort develop git pull https://github.com/ZaMaZaN4iK/sort.git \ feature_branch/TimSort If you want to see a proposed version of the Sort library source with Francisco's changes and Timsort included, download the zip file https://drive.google.com/file/d/0B4s-GPKYVV0jOGx5ckJUTkZzMWM/view?usp=sharin.... Feel free to compare flat_stable_sort with Timsort. Please answer these questions in reply to this thread: 1. What is your evaluation of the Timsort implementation? 2. What is your evaluation of the comments? 3. Did you try to use the library? With what compiler? Did you have any problems? 4. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? 5. How familiar are you with the details of hybrid and stable sorts? 6. Would you actually use Timsort if it was in Boost.Sort? If so, how is it better than the next best alternative? 7. Do you think Timsort should be included in Boost.Sort? Please ask Alexander Zaitsev any questions you have about Timsort for this review before submitting your final review. Thanks, Steven Ross
Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got that much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere? Zach On Tue, Jun 6, 2017 at 10:09 AM, Steven Ross via Boost < boost@lists.boost.org> wrote:
The review for C++ Timsort in Boost is ongoing and will continue through June 12. I am the review manager. The code can be found here: https://github.com/boostorg/sort/pull/12 To merge it into a local copy of the Boost.Sort library: git checkout -b ZaMaZaN4iK-feature_branch/TimSort develop git pull https://github.com/ZaMaZaN4iK/sort.git \ feature_branch/TimSort
If you want to see a proposed version of the Sort library source with Francisco's changes and Timsort included, download the zip file <https://drive.google.com/file/d/0B4s-GPKYVV0jOGx5ckJUTkZzMWM/view? usp=sharing>. Feel free to compare flat_stable_sort with Timsort.
Please answer these questions in reply to this thread:
1. What is your evaluation of the Timsort implementation? 2. What is your evaluation of the comments? 3. Did you try to use the library? With what compiler? Did you have any problems? 4. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? 5. How familiar are you with the details of hybrid and stable sorts? 6. Would you actually use Timsort if it was in Boost.Sort? If so, how is it better than the next best alternative? 7. Do you think Timsort should be included in Boost.Sort?
Please ask Alexander Zaitsev any questions you have about Timsort for this review before submitting your final review.
Thanks,
Steven Ross
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
On 6 June 2017 at 17:40, Zach Laine via Boost
Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got that much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere?
Zach
On Tue, Jun 6, 2017 at 10:09 AM, Steven Ross via Boost < boost@lists.boost.org> wrote:
The review for C++ Timsort in Boost is ongoing and will continue through June 12. I am the review manager. The code can be found here: https://github.com/boostorg/sort/pull/12 To merge it into a local copy of the Boost.Sort library: git checkout -b ZaMaZaN4iK-feature_branch/TimSort develop git pull https://github.com/ZaMaZaN4iK/sort.git \ feature_branch/TimSort
If you want to see a proposed version of the Sort library source with Francisco's changes and Timsort included, download the zip file <https://drive.google.com/file/d/0B4s-GPKYVV0jOGx5ckJUTkZzMWM/view? usp=sharing>. Feel free to compare flat_stable_sort with Timsort.
Please answer these questions in reply to this thread:
1. What is your evaluation of the Timsort implementation? 2. What is your evaluation of the comments? 3. Did you try to use the library? With what compiler? Did you have any problems? 4. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? 5. How familiar are you with the details of hybrid and stable sorts? 6. Would you actually use Timsort if it was in Boost.Sort? If so, how is it better than the next best alternative? 7. Do you think Timsort should be included in Boost.Sort?
Please ask Alexander Zaitsev any questions you have about Timsort for this review before submitting your final review.
Thanks,
Steven Ross
I agree, a summary of what the hell is TimSort would be useful for potential reviewers. Joël Lamotte
On 6 June 2017 at 22:09, Klaim - Joël Lamotte via Boost < boost@lists.boost.org> wrote:
I agree, a summary of what the hell is TimSort would be useful for potential reviewers.
It seems docs are missing from this implementation, but it *is* possible to find out what the hell TimSort is. https://en.wikipedia.org/wiki/Timsort degski -- "*Ihre sogenannte Religion wirkt bloß wie ein Opiat reizend, betäubend, Schmerzen aus Schwäche stillend.*" - Novalis 1798
On 6/6/17 9:41 PM, degski via Boost wrote:
On 6 June 2017 at 22:09, Klaim - Joël Lamotte via Boost < boost@lists.boost.org> wrote:
I agree, a summary of what the hell is TimSort would be useful for potential reviewers.
It seems docs are missing from this implementation, but it *is* possible to find out what the hell TimSort is.
https://en.wikipedia.org/wiki/Timsort
degski
Wait a minute ! Is being suggested that a library without documentation is going to be reviewed by Boost. If this is true, it is simply not possible. Reviewing a library is a lot of effort and reviewers (or users) can't be expected to decipher the how the library is to be used from through the source. It should not be accepted for review in this condition. Robert Ramey
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Klaim - Joël Lamotte via Boost Sent: 06 June 2017 20:09 To: Boost Developers List Cc: Klaim - Joël Lamotte Subject: Re: [boost] [sort] Timsort review reminder
On 6 June 2017 at 17:40, Zach Laine via Boost
wrote: Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got that much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere?
Zach
On Tue, Jun 6, 2017 at 10:09 AM, Steven Ross via Boost < boost@lists.boost.org> wrote:
The review for C++ Timsort in Boost is ongoing and will continue through June 12. I am the review manager. The code can be found here: https://github.com/boostorg/sort/pull/12 To merge it into a local copy of the Boost.Sort library: git checkout -b ZaMaZaN4iK-feature_branch/TimSort develop git pull https://github.com/ZaMaZaN4iK/sort.git \ feature_branch/TimSort
If you want to see a proposed version of the Sort library source with Francisco's changes and Timsort included, download the zip file <https://drive.google.com/file/d/0B4s-GPKYVV0jOGx5ckJUTkZzMWM/view? usp=sharing>. Feel free to compare flat_stable_sort with Timsort.
Please answer these questions in reply to this thread:
1. What is your evaluation of the Timsort implementation? 2. What is your evaluation of the comments? 3. Did you try to use the library? With what compiler? Did you have any problems? 4. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? 5. How familiar are you with the details of hybrid and stable sorts? 6. Would you actually use Timsort if it was in Boost.Sort? If so, how is it better than the next best alternative? 7. Do you think Timsort should be included in Boost.Sort?
Please ask Alexander Zaitsev any questions you have about Timsort for this review before submitting your final review.
Thanks,
Steven Ross
I agree, a summary of what the hell is TimSort would be useful for potential reviewers.
It is totally unacceptable for there not to be any documentation at all (even if the Wikipedia article gives good details). At the very least it should say "This is a C++ implementation of the referenced algorithm by Tim Peters." I note this in the Wikipedia article "Formal verification- In 2015, Dutch and German researchers in the EU FP7 ENVISAGE project found a bug in the standard implementation of Timsort.[8]" I think we must have a comment on if this is 'fixed' - or ? "GNU Octave's oct-sort.cc – the C++ implementation of Timsort used in GNU Octave" are there any licensing issues from 'borrowing' from this code? I see mention of documentation - is this intended to be merged into the main Boost.Sort documentation? Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
On 06/06/2017 16:40, Zach Laine via Boost wrote:
Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got that much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere?
I see the pull request implements no documentation page matching http://www.boost.org/doc/libs/1_64_0/libs/sort/doc/html/sort/sort_hpp/intege... nor modifying it. Some doxygen comment documentation can be found at: https://github.com/boostorg/sort/pull/12/files#diff-c13fbe11aeffb2befb942f91... I'll be honest here: I personally would have felt this pull request better reviewed and handled by Boost.Sort's maintainer. It's too small and limited for a full fat review by the entire community unless there is something very controversial about it and the maintainer feels the community needs to invest a week into thinking about this. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Wed, Jun 7, 2017 at 6:33 AM Niall Douglas via Boost < boost@lists.boost.org> wrote:
Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got
On 06/06/2017 16:40, Zach Laine via Boost wrote: that
much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere?
I see the pull request implements no documentation page matching
http://www.boost.org/doc/libs/1_64_0/libs/sort/doc/html/sort/sort_hpp/intege... nor modifying it. Some doxygen comment documentation can be found at:
https://github.com/boostorg/sort/pull/12/files#diff-c13fbe11aeffb2befb942f91...
Here is some documentation on Timsort: 1. Wiki https://en.wikipedia.org/wiki/Timsort 2. brief description in python dev list http://svn.python.org/projects/python/trunk/Objects/listsort.txt 3. Original implementation https://github.com/gfx/cpp-TimSort Francisco and I are in the middle of rewriting the docs for the Sort library. Integrating Timsort in there once done will be relatively simple.
I'll be honest here: I personally would have felt this pull request better reviewed and handled by Boost.Sort's maintainer. It's too small and limited for a full fat review by the entire community unless there is something very controversial about it and the maintainer feels the community needs to invest a week into thinking about this.
Niall
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection).but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort? Will Alexander maintain the source and answer questions about it? If you don't care about Timsort, you don't need to review it.
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Francisco and I are in the middle of rewriting the docs for the Sort library. Integrating Timsort in there once done will be relatively simple.
I really wish you'd supplied that single page already. New Boost code is supposed to come with documentation.
I'll be honest here: I personally would have felt this pull request better reviewed and handled by Boost.Sort's maintainer. It's too small and limited for a full fat review by the entire community unless there is something very controversial about it and the maintainer feels the community needs to invest a week into thinking about this.
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection).but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort?
My sole experience with Timsort was many years ago when I was confounded by some Python code I was writing where a sort was not scaling with input as it should have done (it was considerably better than N log N, and I couldn't see how that was possible). So yes, Timsort should be in C++, preferably with std::sort() doing it automatically when the to be sorted set is big enough to make it worthwhile. Exactly where than cutoff is is very hard to estimate though. And I haven't looked at the implementation, a big worry would be exception safety and guarantees. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 7 Jun 2017, at 13:31, Niall Douglas via Boost
Francisco and I are in the middle of rewriting the docs for the Sort library. Integrating Timsort in there once done will be relatively simple.
I really wish you'd supplied that single page already. New Boost code is supposed to come with documentation.
I'll be honest here: I personally would have felt this pull request better reviewed and handled by Boost.Sort's maintainer. It's too small and limited for a full fat review by the entire community unless there is something very controversial about it and the maintainer feels the community needs to invest a week into thinking about this.
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection).but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort?
My sole experience with Timsort was many years ago when I was confounded by some Python code I was writing where a sort was not scaling with input as it should have done (it was considerably better than N log N, and I couldn't see how that was possible).
So yes, Timsort should be in C++, preferably with std::sort() doing it automatically when the to be sorted set is big enough to make it worthwhile. Exactly where than cutoff is is very hard to estimate though
I agree that TimSort should be in Boost. But it isn't just a matter of size - TimSort excels when the input is already mostly-sorted at the possible cost in the case when it is really unsorted. Of course std::sort has to work for everything without being told ahead of time whether it is mostly sorted or not. So it would be great to be explicitly choose TimSort from Boost.Sort when I /do/ know that.
And I haven't looked at the implementation, a big worry would be exception safety and guarantees
I took a brief look at the pull request. Apologies if I wasn't looking at the latest code but brief comments: - Tests didn't appear to cover custom comparator cases at all. - Corner case tests give misleading re-assurance. They test for the zero and one element cases, but these are irrelevant for TimSort - very small inputs automatically fall through to a simple sort. You need to test the real boundaries (around 32 elements?) - Given users are only going to use this to squeeze performance, why are the internal vectors tmp_ and pending_ hard-coded to std::allocator<T> - the user may want to provide an alternate allocator or even a completely separate workspace - he may have an existing workspace that can be re-used - should this be a customisation point? - Documentation needs to give some comparison against some popular std::sort implementations. Against which and for what data is TimSort more performant?
On Wed, Jun 7, 2017 at 5:54 AM, Steven Ross via Boost wrote: I would prefer a mini-review myself (as I agreed to do when adding new
algorithms to the collection).but Ronald suggested I do a full review. My
main questions are: Does anyone care about Timsort?
Will Alexander maintain the source and answer questions about it? If you don't care about Timsort, you don't need to review it. I don't know if I care about TimSort (though I'd really *like* to know
that). That's exactly the sort of question documentation is meant to
answer. It needs to be ready before the review, so that reviewers can even
do the review. It also needs to exist so that after the review, users can
determine if the code will be useful to them, without having to create or
run benchmarks, or read source code.
Zach
On 6/7/17 3:54 AM, Steven Ross via Boost wrote:
On Wed, Jun 7, 2017 at 6:33 AM Niall Douglas via Boost <
If you don't care about Timsort, you don't need to review it.
Well I can't know whether I care about Timsort or not. I probably don't. But I CAN tell you I care about Boost. And this kind of thing - no documentation - no rationale, etc is exactly what Boost was conceived to avoid. Robert Ramey
On 6/7/17 8:09 AM, Robert Ramey via Boost wrote:
On 6/7/17 3:54 AM, Steven Ross via Boost wrote:
Actually, I'm thinking that since this is (or should be) an incremental improvement to an existing boost libary it should probably be handled as a "mini review" described here: http://www.boost.org/community/reviews.html#Fast-Track This would be easier for everyone involved. FWIW - "mini review" should not be considered a shortcut to compromising boost standards - such as they are. It should still have documentation, tests, etc. But would be an integral part of the sort library. Robert Ramey
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross via Boost Sent: 07 June 2017 11:54 To: boost@lists.boost.org Cc: Steven Ross Subject: Re: [boost] [sort] Timsort review reminder
On Wed, Jun 7, 2017 at 6:33 AM Niall Douglas via Boost < boost@lists.boost.org> wrote:
Please provide a link to online documentation, if if one exists. All I've been able to find so far is a paper and Doxygen reference docs. I got
On 06/06/2017 16:40, Zach Laine via Boost wrote: that
much by cloning the GitHub repo (putting docs online somewhere will get you a lot more reviewers :). Did I miss it somewhere?
I see the pull request implements no documentation page matching
http://www.boost.org/doc/libs/1_64_0/libs/sort/doc/html/sort/sort_hpp/intege... nor modifying it. Some doxygen comment documentation can be found at:
https://github.com/boostorg/sort/pull/12/files#diff-c13fbe11aeffb2befb942f91...
Here is some documentation on Timsort:
1. Wiki https://en.wikipedia.org/wiki/Timsort 2. brief description in python dev list http://svn.python.org/projects/python/trunk/Objects/listsort.txt 3. Original implementation https://github.com/gfx/cpp-TimSort
Francisco and I are in the middle of rewriting the docs for the Sort library. Integrating Timsort in there once done will be relatively simple.
I'll be honest here: I personally would have felt this pull request better reviewed and handled by Boost.Sort's maintainer. It's too small and limited for a full fat review by the entire community unless there is something very controversial about it and the maintainer feels the community needs to invest a week into thinking about this.
Niall
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection) but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort?
I agree that TimSort is well worth having in Boost. So a YES to accept. But provided we can have some documentation on when it is likely to perform well - covered well by the Wikipedia and original articles (including implementation notes on details like the 'bug' discovered). It should be clear about acknowledging other people's work too. Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
On 6/7/17 9:03 AM, Paul A. Bristow via Boost wrote:
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection) but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort?
I agree that TimSort is well worth having in Boost. So a YES to accept. But provided we can have some documentation on when it is likely to perform well - covered well by the Wikipedia and original articles (including implementation notes on details like the 'bug' discovered).
It should be clear about acknowledging other people's work too.
Hmmm - so that's your review? Here are some questions you might want to answer in your review: What is your evaluation of the design? don't know What is your evaluation of the implementation? don't know What is your evaluation of the documentation? there is a link to a Wikipedia article with a general description of TimSort What is your evaluation of the potential usefulness of the library? very useful? Did you try to use the library? With what compiler? Did you have any problems? Nope - just looked at ???? How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? about zero Are you knowledgeable about the problem domain? yes - we're all knowledgeable about sorting. YES to accept
Paul
--- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey via Boost Sent: 07 June 2017 17:17 To: Paul A. Bristow via Boost Cc: Robert Ramey Subject: Re: [boost] [sort] Timsort review reminder
On 6/7/17 9:03 AM, Paul A. Bristow via Boost wrote:
I would prefer a mini-review myself (as I agreed to do when adding new algorithms to the collection) but Ronald suggested I do a full review. My main questions are: Does anyone care about Timsort?
I agree that TimSort is well worth having in Boost. So a YES to accept. But provided we can have some documentation on when it is likely to perform well - covered well by the Wikipedia and original articles (including implementation notes on details like the 'bug' discovered).
It should be clear about acknowledging other people's work too.
Hmmm - so that's your review?
Here are some questions you might want to answer in your review:
What is your evaluation of the design?
don't know
What is your evaluation of the implementation?
don't know
What is your evaluation of the documentation?
there is a link to a Wikipedia article with a general description of TimSort
What is your evaluation of the potential usefulness of the library?
very useful?
Did you try to use the library? With what compiler? Did you have any problems?
Nope - just looked at ????
How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
about zero
Are you knowledgeable about the problem domain?
yes - we're all knowledgeable about sorting.
YES to accept
TL;DR We should not have had a formal review. Author and maintainer should have just have asked if anyone thought this a bad idea ;-) Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
Just some info for those who're unfamiliar with the algorithm: As far as I understand it this particular implementation is based on Fugi Goro's implementation here, which I have contributed to a little: https://github.com/gfx/cpp-TimSort/ As the page will tell you, Timsort is faster for mostly-sorted or reversed sequences, is stable as opposed to the bulk of std::sort implementations which use quicksort variants, and in my experience is also faster on random sequences where the range of values is low (more benchmarking required). As Niall has noted, it is O(n log n) in worst case scenario, has a memory cost over quicksort/introsort/etc, and is generally slower for more random sequences. In some scenarios std::stable_sort is better, some worse. Like Paul I see no reason for a review, just a headsup. Matt Bentley
participants (11)
-
Alexander Zaitsev
-
Alexander Zaitsev
-
degski
-
Klaim - Joël Lamotte
-
Niall Douglas
-
Paul A. Bristow
-
Pete Bartlett
-
Robert Ramey
-
Soul Studios
-
Steven Ross
-
Zach Laine