Hi Francisco,
On Wed, Mar 25, 2015 at 3:57 PM Francisco José Tapia
TEST OBJECTS
The object used in the test ( Reburcio is a word of an exercise from my first pascal book, and don't have any sense) In the construction , copy construction and operator =, copy all the elements with a loop. The only one operation which use only the first element is the comparison.
I had done a new comparison, and compare the sum of all the elements of the object. The time are similar, you must add 5% in the small objects to 15% in the big objects.
I'm not precisely sure of your meaning, but I suggested modification of the test object to use default operators where possible, to randomize the entire contents of the object (so that not all integers in the array are the same), and to use the entire contents of the object for comparison. I believe that these suggested changes make it more reflective of the conventional large-object comparison case. Most of the time, only the first integer will be compared, but in equal cases the entire array will be compared. I don't like feeding overly simplified objects into benchmarks, because compilers can make some surprising optimizations.
TIMSORT
The Tim Sort code is not mine.
Agreed. I think we can drop it from this comparison, as your algorithm is clearly much faster. I mentioned it because it is used with other languages, and responded when somebody else on the email group suggested I allow it. I'm not asking you for an implementation. Timsort uses N/2 additional memory.
DESIGN PHASE
Now , we are in the design phase. Nothing is fixed. We can comment and change anything.
The specification is simple, provide sort algorithms, stable and not stable, for to run with 1 thread or with any number of threads, without any other external library ( as Open MP , TBB, CilkPlus..). Only with the C++ standard.
If you want to discuss about this, we can do. We admit ideas, suggestions, opinions. Any help is welcome. The final decision about if the code is included or not in the boost sort library, depend of Steven, the creator of the library.
After the design phase, with the decisions taken and the interfaces fixed. We prepare the final version, redesign a test system, test in deep and create the documentation.
I suggest deciding what you want to try to include. Getting a competitive parallel unstable sort to tbb will be hard, and I'm skeptical of including a library that just replicates what tbb provides for easy use.
I tried to improve, I would like to have more time, more equipment, and better documentation on order to find the best options, but I don't have, and I do this in my free time after my work and my family.
I'm in the same situation with regards to available time. I think many boost participants are.
I designed 6 stable sort algorithms, for to select the fast (internally named smart_merge_sort, about 10% faster than the GCC version). I prepared a version of Sample Sort, only with threads and atomic variables. In the benchmarks have similar performance than the TBB version, use one half of the auxiliary memory used by TBB and is exception safe.
I like your parallel stable sorts. They look like the most promising part of your proposed library additions. Indirect sorting is also interesting. I just want reliable large-object and string comparison numbers for them before diving in further. Have you looked at integrating your testing with the approach in the Boost Sort library? The randomgen binary generates files with different types of random number distributions that can test a variety of scenarios, and the tune.pl script controls automated performance testing.
In my opinion, the parallel stable sort is reasonably good for to be included in the final version. The important case, which shows the quality of the algorithm, is when the data are unsorted. I will try to implement the detection of the sorted elements , like in TBB, but I am not sure about the results. Th internal loops of the algorithm are delicate, and a small change , can produce big changes in the performance, as happened in other algorithms
Is there a way you can do a fast initial pass, checking for already-sorted
data? That's what Spreadsort does (at each level of the sort).