Re: [boost] [review][sort] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 19
On 10 Nov 2014 at 0:39, Edward Diener wrote:
- What is your evaluation of the design?
It's okay. I would prefer a design where there is a sorted_view which, if you std::copy it, gives an actually sorted copy. Obviously one would then specialise that scenario with a more directly optimised specialisation, and you'd also have the API presented here, which looks like std::sort(). I'd also prefer to see fallback implementations for iterators weaker than random access, right down to forward-only iterators.
- What is your evaluation of the implementation?
There seems to be an assumption that user-supplied overloads and specialisations, e.g. swap(), don't throw exceptions. I think that is unwise. The noexcept operator (BOOST_NOEXCEPT_EXPR in Boost.Config) can test whether something can throw. Explicit guarantees of what happens when something does throw need to be in the documentation, per API. There seems to be a general lack of explicit C++11 support as well. In particular, I think all Boost libraries should use inline namespaces to implement ABI version management. The implementation is also not capable of a dual build, i.e. standalone use without any dependency on Boost. I'd suggest my forthcoming Boost.BindLib library for that.
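For illustration, ABI versioning with a C++11 inline namespace looks roughly like this (the namespace and function names here are placeholders, not the library's actual layout):

    // Illustrative only: versioning a library ABI with a C++11 inline
    // namespace. Users still write mylib::sort_something(), but the symbol
    // is mangled with the version, so v1 and v2 can coexist in one process.
    namespace mylib {
    inline namespace v1 {   // bump to v2 (keeping a non-inline v1) on an ABI break
      template <class RandomAccessIter>
      void sort_something(RandomAccessIter first, RandomAccessIter last);
    }  // namespace v1
    }  // namespace mylib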
- What is your evaluation of the documentation?
Lacks exception guarantees, one per API. Lacks scaling graphs near front so people can quickly evaluate usefulness compared to std::sort. I believe the author will fix this though.
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
No.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Glance.
- Are you knowledgeable about the problem domain?
Author of a very popular open source indexing algorithm library.
And finally, every review should attempt to answer this question:
- Do you think the library should be accepted as a Boost library?
Yes, with these conditions:

1. Either change its name to Boost.Spreadsort, or do a much better job at a general sort library, e.g. a sorted_view, more than one sort algorithm not in the STL, and more than random-access-iterator support.

2. Exception safety is not obvious, and explicit exception guarantees need to be made and stated per API in the documentation.

3. Explicit ABI version management in C++11 using inline namespaces. The forthcoming Boost.BindLib can do this for you, or it can be done by hand.

It would also be nice if this library were capable of a standalone build without Boost, but this is not a requirement.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Niall,
On Fri, Nov 14, 2014 at 7:31 AM, Niall Douglas wrote:
On 10 Nov 2014 at 0:39, Edward Diener wrote:
- What is your evaluation of the design?
It's okay. I would prefer a design where there is a sorted_view which, if you std::copy it, gives an actually sorted copy. Obviously one would then specialise that scenario with a more directly optimised specialisation, and you'd also have the API presented here, which looks like std::sort().
I'm not sure exactly what you mean by a sorted_view.
I'd also prefer to see fallback implementations for iterators weaker than random access, right down to forward-only iterators.
To sort at reasonable speed requires either random access iterators or allocating a deep or shallow copy of the data. I could write a simple wrapper to create a vector of iterators pointing to the input data, sort the iterators, and then use them to permute the original list, but what's the use case?
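A rough sketch of the kind of wrapper described above, sorting a vector of iterators and then rewriting the range (illustrative only, with a made-up function name; note it still ends up deep-copying/moving the values on the write-back):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <utility>
    #include <vector>

    // Sort a forward-only range by sorting iterators into it, then writing
    // the values back in sorted order. Uses O(N) extra space.
    template <class ForwardIter, class Compare>
    void indirect_sort(ForwardIter first, ForwardIter last, Compare comp)
    {
      typedef typename std::iterator_traits<ForwardIter>::value_type value_type;
      std::vector<ForwardIter> index;
      for (ForwardIter it = first; it != last; ++it)
        index.push_back(it);
      std::sort(index.begin(), index.end(),
                [comp](ForwardIter a, ForwardIter b) { return comp(*a, *b); });
      std::vector<value_type> sorted;
      sorted.reserve(index.size());
      for (std::size_t i = 0; i < index.size(); ++i)
        sorted.push_back(std::move(*index[i]));   // move all values out...
      for (std::size_t i = 0; i < sorted.size(); ++i, ++first)
        *first = std::move(sorted[i]);            // ...then write back in order
    }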
Lacks scaling graphs near front so people can quickly evaluate usefulness compared to std::sort. I believe the author will fix this though.
Done in the develop branch.
2. Exception safety is not obvious, and explicit exception guarantees need to be made and stated per API in the documentation.
How about statements like this in the documentation for each of the headers: Exceptions: Throws if any of the element comparisons, the element swaps (or moves), the right shift, subtraction of right-shifted elements, or the operations on iterators throws.
On 14 Nov 2014 at 21:34, Steven Ross wrote:
- What is your evaluation of the design?
It's okay. I would prefer a design where there is a sorted_view which, if you std::copy it, gives an actually sorted copy. Obviously one would then specialise that scenario with a more directly optimised specialisation, and you'd also have the API presented here, which looks like std::sort().
I'm not sure exactly what you mean by a sorted_view.
    some_iterator first, last;
    sorted_view v(first, last);  // O(N), builds an index
    for(auto &i : v)             // O(N log N), but can early out
      ....

This is the sort of stuff that separates a sort function from a Boost general purpose sort library. sorted_view builds a lookup index in O(N) time and space and can be looked at as a fixed initial cost. Iterating the view may take considerably longer, but because the O(N log N) complexity is pushed into the iteration, code can early out or do other more efficient things. Point is you give options to code with this design rather than sorting being all or nothing as it is with std::sort. What one is looking for is *significantly* improving on the STL sort facilities, and performance is one of many ways of improving on the STL.

A particular advantage of the above design is that it should be much more cache friendly for large datasets. You blitz the LL cache in a single pre-iteration step when building the index, after that iteration should only touch a small subset of total cache lines.

As a practical example of utility, coincidentally this Monday I should be starting yet another new Boost library, this one implementing a portable kernel memory allocator and manager. This abstracts the management of memory allocated in the kernel. One thing you can do with it is work with sparse really huge datasets, ones which cannot possibly fit into a process address space. A sorted_view design would let you build the sort index as a very slow initial overhead (each window into the sparse data has to be individually mapped, processed and unmapped) and then iteration could "pick off" the top N items from the sort using very few window maps.

Another example of what I mean by sorted_view. Hopefully it makes sense now.
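As a purely illustrative aside (not a proposed interface for this library), the "pay as you iterate" behaviour described above can be approximated with standard facilities by heaping iterators: building the heap is O(N), and each element consumed costs O(log N), so early-out only pays for what is actually read.

    #include <algorithm>
    #include <functional>
    #include <iterator>
    #include <vector>

    // Illustrative sketch: build cost is O(N), and each element consumed
    // costs O(log N), so reading only the first K elements costs
    // O(N + K log N) rather than a full O(N log N) sort.
    template <class RandomAccessIter,
              class Compare = std::less<
                  typename std::iterator_traits<RandomAccessIter>::value_type> >
    class lazy_sorted_view
    {
      std::vector<RandomAccessIter> heap_;
      Compare comp_;
      bool greater(RandomAccessIter a, RandomAccessIter b) const
      { return comp_(*b, *a); }  // reversed, so the std heap acts as a min-heap
    public:
      lazy_sorted_view(RandomAccessIter first, RandomAccessIter last,
                       Compare comp = Compare())
        : comp_(comp)
      {
        for (; first != last; ++first)
          heap_.push_back(first);
        std::make_heap(heap_.begin(), heap_.end(),   // O(N)
                       [this](RandomAccessIter a, RandomAccessIter b)
                       { return greater(a, b); });
      }
      bool empty() const { return heap_.empty(); }
      // Pop and return an iterator to the next smallest element: O(log N).
      RandomAccessIter next()
      {
        std::pop_heap(heap_.begin(), heap_.end(),
                      [this](RandomAccessIter a, RandomAccessIter b)
                      { return greater(a, b); });
        RandomAccessIter smallest = heap_.back();
        heap_.pop_back();
        return smallest;
      }
    };

Reading just the first K elements is then K calls to next() on top of the O(N) setup, which is the early-out benefit described above.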
I'd also prefer to see fallback implementations for iterators weaker than random access, right down to forward-only iterators.
To sort at reasonable speed requires either random access iterators or allocating a deep or shallow copy of the data. I could write a simple wrapper to create a vector of iterators pointing to the input data, sort the iterators, and then use them to permute the original list, but what's the use case?
You're assuming sizes of dataset which can fit into memory, and/or all items are available quickly to the sorting process. I am also unsure if random access iterators really are necessary. A brute force solution is to create a copy of a forward only iterator one per item, and swap those around. I'm sure there must be sort algorithms that can do better than that though.
2. Exception safety is not obvious, and explicit exception guarantees need to be made and stated per API in the documentation.
How about statements like this in the documentation for each of the headers: Exceptions: Throws if any of the element comparisons, the element swaps (or moves), the right shift, subtraction of right-shifted elements, or the operations on iterators throws.
Exception safety is about what "throws" exactly does at any point in the sort. Does it leave things half sorted? If all your operators above are marked noexcept, can you use an inplace sort instead of making copies? Does your sort recover from a throw, or take alternative action e.g. exclude the items where a throw happens. That sort of thing.

Generally the STL follows a rule of leaving inputs untouched if an exception is thrown which is why when operators are not noexcept it must do a lot of copying. This is where std::move_if_noexcept comes from. Probably code entering a Boost library should copy STL conventions here where appropriate.

Now, none of the above I think is a requirement for entry into Boost which is why I didn't make any of it a condition. Just food for thought on how to make a really valuable general purpose sort library. Your library, if renamed to Boost.SpreadSort, is not far from entry as is.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
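For reference, a minimal illustration of the std::move_if_noexcept behaviour mentioned above (the two types are made up for the example):

    #include <utility>
    #include <vector>

    struct SafeMove {
      SafeMove() {}
      SafeMove(SafeMove&&) noexcept {}   // move cannot throw
      SafeMove(const SafeMove&) {}
    };
    struct RiskyMove {
      RiskyMove() {}
      RiskyMove(RiskyMove&&) {}          // move may throw (not noexcept)
      RiskyMove(const RiskyMove&) {}
    };

    // std::move_if_noexcept yields an rvalue only when T's move constructor
    // is noexcept; otherwise it yields a const lvalue, forcing a copy so a
    // throw part-way through cannot lose the original element.
    template <class T>
    void append(std::vector<T>& dst, T& src)
    {
      dst.push_back(std::move_if_noexcept(src));  // moves SafeMove, copies RiskyMove
    }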
Niall,
On Sat, Nov 15, 2014, 9:13 AM Niall Douglas wrote:
    some_iterator first, last;
    sorted_view v(first, last);  // O(N), builds an index
    for(auto &i : v)             // O(N log N), but can early out
      ....
This is the sort of stuff that separates a sort function from a Boost general purpose sort library. sorted_view builds a lookup index in O(N) time and space and can be looked at as a fixed initial cost. Iterating the view may take considerably longer, but because the O(N log N) complexity is pushed into the iteration, code can early out or do other more efficient things. Point is you give options to code with this design rather than sorting being all or nothing as it is with std::sort. What one is looking for is *significantly* improving on the STL sort facilities, and performance is one of many ways of improving on the STL.
A particular advantage of the above design is that it should be much more cache friendly for large datasets. You blitz the LL cache in a single pre-iteration step when building the index, after that iteration should only touch a small subset of total cache lines.
As a practical example of utility, coincidentally this Monday I should be starting yet another new Boost library, this one implementing a portable kernel memory allocator and manager. This abstracts the management of memory allocated in the kernel. One thing you can do with it is work with sparse really huge datasets, ones which cannot possibly fit into a process address space. A sorted_view design would let you build the sort index as a very slow initial overhead (each window into the sparse data has to be individually mapped, processed and unmapped) and then iteration could "pick off" the top N items from the sort using very few window maps.
Another example of what I mean by sorted_view. Hopefully it makes sense now.
Thanks for the explanation. Have you considered partial_sort? That provides the separation of the sort runtime overhead you're requesting (as do heaps).

With regards to sorting data that won't fit in main memory, there are two standard approaches: k-way merge, and radix-based division of the problem. Radix-based division is usually faster (merging is simply concatenation), but k-way merge guarantees that it will split up the data small enough to fit in RAM in one pass (radix division usually needs 1 pass, and can guarantee no more than 2). Radix-based division is compatible with the kind of piece-at-a-time sorting you mention. Both approaches are efficient ways to sort in parallel and/or sort data on disk. I've written code for both in the past.

Disk-based sorting requires writing and managing a bunch of temp files, for which the optimal approach may vary across OSes. Most people who want a scalable way to sort data that is too large to fit in a single system's memory are probably best off using something like MapReduce. I could write a radix-based division/map function to split up data into manageable chunks as you're requesting, but handling the output streams efficiently is more complicated than the trivial bucketing the function would provide. Have you considered just bucketing your data based on the top X bits (say 8-16), which allows you to finish the sort later for each bucket as needed?
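For the early-out case, a small illustration of how std::partial_sort_copy already separates "give me the smallest K" from a full sort (example code only, not part of the library):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Smallest k elements of the input, in order, in roughly O(N log k);
    // the rest of the input is never fully sorted.
    std::vector<int> smallest_k(const std::vector<int>& input, std::size_t k)
    {
      std::vector<int> out(std::min<std::size_t>(k, input.size()));
      std::partial_sort_copy(input.begin(), input.end(), out.begin(), out.end());
      return out;
    }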
I'd also prefer to see fallback implementations for iterators weaker than random access, right down to forward-only iterators.
To sort at reasonable speed requires either random access iterators or allocating a deep or shallow copy of the data. I could write a simple wrapper to create a vector of iterators pointing to the input data, sort the iterators, and then use them to permute the original list, but what's the use case?
You're assuming sizes of dataset which can fit into memory, and/or all items are available quickly to the sorting process.
I am also unsure if random access iterators really are necessary. A brute force solution is to create a copy of a forward only iterator one per item, and swap those around. I'm sure there must be sort algorithms that can do better than that though.
You are correct about the brute force approach with forward iterators: that is a shallow copy. Sorting data that won't fit in memory normally requires a deep copy. Once you get into network or file I/O, the limitations of the I/O tend to dominate performance if the merge/split and sorting algorithms are good.

My recommended flow for sorting files from a forward iterator on a single system with too much data to fit in RAM using k-way merge: Read a section of the file at a time that will comfortably fit in RAM. Sort it (possibly with a separate thread), and write it to disk. Repeat with the next section, until you have a set of sorted files. Then merge them all together in a k-way merge.

My recommended approach with MSD radix sorting: Read the input stream, writing each element into a file corresponding to a bucket based on its most significant bits. Track the counts of elements within each interval at sufficient granularity (1000 or so) so that, if some buckets are too full, you'll know exactly how to split them up in a second pass, and store that with the buckets that need it. Then sort each bucket from the top or the bottom as needed, concatenating the results. These sub-sorts can be done on separate threads.

I'm willing to code up the radix bucketing with a guaranteed maximum of 2 bucketing passes, or the k-way merge step, but they can't wrap the entire functionality without adding a bunch of file operations, which can be tricky to port efficiently. They'd also need a bunch of parameters about things like maximum RAM to use, number of files, maximum numbers of files to have open simultaneously, etc.
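A minimal sketch of the in-memory merge step of the k-way flow described above, using a priority queue over the heads of the runs (file handling, run generation, and buffering are deliberately omitted; names are illustrative):

    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Merge k already-sorted runs into one sorted output in O(N log k).
    // A real external sort would stream each run from a temp file instead
    // of holding them all in memory; only the merge logic is shown here.
    std::vector<int> kway_merge(const std::vector<std::vector<int> >& runs)
    {
      typedef std::pair<int, std::size_t> entry;   // (value, index of its run)
      std::priority_queue<entry, std::vector<entry>, std::greater<entry> > heap;
      std::vector<std::size_t> pos(runs.size(), 0);
      for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
          heap.push(entry(runs[r][0], r));
      std::vector<int> out;
      while (!heap.empty()) {
        entry e = heap.top();
        heap.pop();
        out.push_back(e.first);                    // emit the smallest head
        std::size_t r = e.second;
        if (++pos[r] < runs[r].size())
          heap.push(entry(runs[r][pos[r]], r));    // refill from that run
      }
      return out;
    }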
2. Exception safety is not obvious, and explicit exception guarantees need to be made and stated per API in the documentation.
How about statements like this in the documentation for each of the headers: Exceptions: Throws if any of the element comparisons, the element swaps (or moves), the right shift, subtraction of right-shifted elements, or the operations on iterators throws.
Exception safety is about what "throws" exactly does at any point in the sort. Does it leave things half sorted? If all your operators above are marked noexcept, can you use an inplace sort instead of making copies? Does your sort recover from a throw, or take alternative action e.g. exclude the items where a throw happens. That sort of thing.
Generally the STL follows a rule of leaving inputs untouched if an exception is thrown which is why when operators are not noexcept it must do a lot of copying. This is where std::move_if_noexcept comes from. Probably code entering a Boost library should copy STL conventions here where appropriate.
I've added some basic documentation on exception safety along these lines, just like std::sort has, in the develop branch.
On 15 Nov 2014 at 21:29, Steven Ross wrote:
Another example of what I mean by sorted_view. Hopefully it makes sense now.
Thanks for the explanation. Have you considered partial_sort? That provides the separation of the sort runtime overhead you're requesting (as do heaps).
They're not quite commensurate. A bit like a string_view, a sorted_view lets you write code as if a full sort had taken place; it's just delayed until the point of need. A partial sort requires different coding.
With regards to sorting data that won't fit in main memory, there are two standard approaches: k-way merge, and radix-based division of the problem. Radix-based division is usually faster (merging is simply concatenation), but k-way merge guarantees that it will split up the data small enough to fit in RAM in one pass (radix division usually needs 1 pass, and can guarantee no more than 2). Radix-based division is compatible with the kind of piece-at-a-time sorting you mention. Both approaches are efficient ways to sort in parallel and/or sort data on disk. I've written code for both in the past. Disk-based sorting requires writing and managing a bunch of temp files, for which the optimal approach may vary across OSes. Most people who want a scalable way to sort data that is too large to fit in a single system's memory are probably best off using something like MapReduce. I could write a radix-based division/map function to split up data into manageable chunks as you're requesting, but handling the output streams efficiently is more complicated than the trivial bucketing the function would provide. Have you considered just bucketing your data based on the top X bits (say 8-16), which allows you to finish the sort later for each bucket as needed?
:) I have no present need to sort out-of-memory datasets, nor do I expect to any time soon. My need for a kernel memory allocator is actually to enable true zero-copy network packet handling, whereby incoming and outgoing UDP packets are never ever copied, not once, except where other people's code gets in the way. We essentially need UDP packets to be assembled into a form we can dispatch to the GPU via OpenCL and process there with hashing and decryption/encryption before sending the data straight to the filing system. Local process code need never actually touch any of the data moving between disk and the NIC; it all remains either in the GPU or in the kernel.
You are correct about the brute force approach with forward iterators: that is a shallow copy. Sorting data that won't fit in memory normally requires a deep copy.
If you can SHA512 the serialisation of an item, you can thereafter work with 512-bit integers and not need to touch the original. With AVX-512, a 512-bit integer even fits in a single register now, amazing really.
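A rough sketch of that idea, sorting fixed-size digests as proxy keys; compute_digest here is a hypothetical stand-in for SHA-512 or whatever real hash is used, not an actual API:

    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef std::array<std::uint8_t, 64> digest512;   // a 512-bit sort key

    // Hypothetical: produce a 512-bit digest of an item's serialisation.
    digest512 compute_digest(const std::vector<std::uint8_t>& serialised);

    struct keyed_item {
      digest512   key;             // digest of the (possibly huge) item
      std::size_t original_index;  // enough to locate the item again later
    };

    // Sort by digest only; the original items are never touched again.
    inline void sort_by_digest(std::vector<keyed_item>& items)
    {
      std::sort(items.begin(), items.end(),
                [](const keyed_item& a, const keyed_item& b)
                { return a.key < b.key; });
    }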
I'm willing to code up the radix bucketing with a guaranteed maximum of 2 bucketing passes, or the k-way merge step, but they can't wrap the entire functionality without adding a bunch of file operations, which can be tricky to port efficiently. They'd also need a bunch of parameters about things like maximum RAM to use, number of files, maximum numbers of files to have open simultaneously, etc.
There is a library in the Boost review queue which implements portable asynchronous file i/o. I may have had something to do with it.
I've added some basic documentation on exception safety along these lines, just like std::sort has, in the develop branch.
Cool, thanks.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
participants (2):
- Niall Douglas
- Steven Ross