Hi Steven I have the final version of all the algorithms. I wrote 6 algorithms of stable sort, and selected the fastest. On my computer is about 15% faster than the GCC stable sort. I had problems too with an error in an algorithm mixed with an error in the GCC dynamic memory system, Under certain sequence of request, the system provide a non valid address and due this a segmentation fault. The debug for to detect and correct had been a nightmare. The algorithms provided are : 1.- sort : The algorithm is intro_sort. Don't need additional memory. Performance similar to the GCC version and Windows version. Non stable 2.- parallel_sort : Decompose the global range in small ranges for to be sorted with intro_sort. Don't need additional memory. Similar performance than the TBB version and Microsoft version. Non Stable 3.- stable_sort : The algorithm is called smart_merge_sort, and is about 10-15% faster than the GCC stable sort version, The additional memory is an half of the memory used by the data.( The GCC stable sort use the same memory than the data) 4.- sample_sort : is a parallel stable sort. Have the same performance than the TBB implementations, and much more faster than the GCC parallel stable_sort. The additional memory is the same than the data 5.- parallel_stable_sort : This algorithm is based on the sample sort. Increase the sample_sort time about a 5%, but the additional memory used is an half of the memory used by the data. The functions are in the namespàce boost::sort All the algorithms are exception safe, and use indirect sort when the size of the objects is big. According to my benchmarks, the size limit for to change a indirect sort is 128 bytes. The function format of the parallel algorithms are algorithm ( first, last, compare , number_threads). The number of threads is passed with an struct NThreads which receives an unsigned, and return an unsigned. The default construct of this struct is filled with the HW threads At today I am testing on Windows with Visual Studio 2013 CTP 4, and changing some things of the code due to the incompatibility with C++11 features In a few days I will document all the functions of the code for generate a doxygen documentation, and can send you a version of the code. Then we must talk about how to design the benchmark. I think the starting point can be your experience with your code The C++ committee propose a format for to call the parallel functions described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3554.pdf it would be a good idea to examine, and decide if follow the proposal, if we don't follow or if we provide the two interfaces for the parallel algorithms. According with my first impression it's not very complex to implement. Yours Francisco Tapia 2015-01-14 17:26 GMT+01:00 Francisco José Tapia <fjtapia@gmail.com>:
Hi Thorsten,
The idea about the indirect sort is very easy. Instead of sort the elements, create a vector of pointers, and sort the vector of pointers, with this you don't move the elements , move the pointers. And at end with the vector of pointers sorted, move the data only 1 time.
In the stable sort, the algorithms usually need additional memory. The GCC and TBB the same memory than the data, in my algorithms only 1 half. Imagine you want to sort 10 million elements of 128 bytes each. With the indirect sort you need 10 million pointers and additionally memory for to allocate 5 million pointers.
The memory used by the data is 1280 Megas, but pointers and the additional memory are 15 million * 8 = 120 Megas, around a 10% of the memory used by the data.
In the first messages of this conversation, I sent a zip with code. In this file you can find the code of the indirect sort. If don't understand, something, please say me, and I will try to explain.
Paul,
about the name space, if at end, the results are satisfactory, and decide to include in the library, only say me the name space and I will put. For me it's indifferent . But, I need time for the new algorithm, and design a complete test suite of the code, the benchmarks, for to compare with others versions and the documentation.
If you want a benchmark with multi precision numbers, we can do without problem, even if you want to check the speed with different sizes of the numbers
Yours
Francisco