
Franciso, On Sun, Dec 14, 2014 at 1:03 PM, Francisco José Tapia <fjtapia@gmail.com> wrote:
Sorry by the delay, As promised, the sorting parallel algorithms
On my computer (Quad Core) the GCC std::sort and stable_sort are slight faster than my version of intro_sort and merge_sort (stable sort is a synonym of merge_sort). In other computer with an Intel I3, my algorithms are slight faster than in the GCC version.
An I3 is low-end hardware. I recommend testing on an i7 or faster system.
This code is a preliminary version, done for my countertree library. If you consider useful, I will change the name space and document the code, the algorithms and the ideas obtained from the benchmarks.
Please feel free for ask me any question or for to modify or change anything. The goal is to provide good sort methods to the users.
In the zip file you have the code and a file with a description of each code.
First, thanks for sending this in a fairly easy to test format. I suggest updating your docs to recommend single-line compilation (if appropriate), which saves a step: Sort/Modified/benchmark/GCC/util_sort$ g++ -std=c++11 -Wall -fexceptions -fopenmp -fomit-frame-pointer -fexpensive-optimizations -O3 -I../../../src benchmark_sort_03.cpp -fopenmp -s -lpthread -ltbb -o benchmark_sort_03 -pthread I was unable to run your indirect sort benchmark: Sort/Modified/benchmark/GCC/util_sort$ g++ -std=c++11 -Wall -fexceptions -fopenmp -fomit-frame-pointer -fexpensive-optimizations -O3 -I../../../src -c benchmark_sort_indirect.cpp -o benchmark_sort_indirect.o -pthread Sort/Modified/benchmark/GCC/util_sort$ g++ -o benchmark_sort_indirect benchmark_sort_indirect.o -pthread -fopenmp -s -lpthread -ltbb Sort/Modified/benchmark/GCC/util_sort$ ./benchmark_sort_indirect Size of element 8 Number of elements :100000000 terminate called after throwing an instance of 'std::system_error' what(): Enable multithreading to use std::thread: Operation not permitted Aborted (core dumped) I'm actually quite interested in the generalized indirect sort idea (I looked at your two functions). Do you think you could package that up so it's a simple wrapper, where the user passes in the sort function they want to use, and it takes care of changing the input and comparison (if possible) to be indirect, running the sort, and then swapping into place based on the results of the indirect sort? If that can be done without significant loss of efficiency, it would seem quite useful (I've had do indirect sorts multiple times, and it's always some code). Even your two functions look useful on their own. I suggest using the boost random number generator (or if you stick with the c++11 requirement, which I don't recommend, the std:: equivalent): #include <boost/random/mersenne_twister.hpp> #include <boost/random/uniform_int_distribution.hpp> random::mt19937 generator; random::uniform_int_distribution<int> distribution; And for each random number: int result = distribution(generator) You may want to reuse and modify some of my tests in https://github.com/spreadsort/sort/tree/master/example rand() is returning lots of duplicate elements, as you're sorting more than RAND_MAX elements. I'm not sure why you're putting so much effort into competing with stable_sort. Some people use it, but I've never seen it in actual production code. std::stable_sort is not just merge sort, as based on the documentation it can use less memory than merge sort: http://www.cplusplus.com/reference/algorithm/stable_sort/ Complexity If enough extra memory is available, linearithmic in the distance between first and last: Performs up to N*log2(N) element comparisons (where N is this distance), and up to that many element moves. Otherwise, polyloglinear in that distance: Performs up to N*log22(N) element comparisons, and up to that many element swaps. The speed results I get on my system for your library are below. It just doesn't seem competitive with tbb for speed on my system. If you get it there (within 5% on random, sorted, and mostly-sorted data on both Windows and Linux), or if you find someone else with a genuine need to use this in production code, I'll be interested to take another look. This is compiled using g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2 and run on a Intel(R) Core(TM) i7-3612QM CPU @ 2.10GHz: max. OpenMP threads:8 (processors)8 Sort of 200000000 elements in a vector ------------------- Sorted elements ----------------------------------------------- STL std::sort :3.60897 secs parallel::sort :1.24468 secs tbb::parallel_sort :0.059404 secs countertree::parallel_sort :1.25563 secs std::stable_sort :10.3108 secs parallel::stable_sort :3.1866 secs countertree::parallel_merge_sort :3.71677 secs Reverse sorted elements --------------------------------------- STL std::sort :2.57237 secs parallel::sort :1.02421 secs tbb::parallel_sort :0.895415 secs countertree::parallel_sort :1.30754 secs std::stable_sort :10.4286 secs parallel::stable_sort :3.21695 secs countertree::parallel_merge_sort :4.08913 secs Random elements, few repeated ( rand() )----------------------- STL std::sort :19.7451 secs parallel::sort :3.98043 secs tbb::parallel_sort :4.28722 secs countertree::parallel_sort :6.85041 secs std::stable_sort :18.6291 secs parallel::stable_sort :4.08416 secs countertree::parallel_merge_sort :6.09388 secs Random elements, quite repeated ( rand() % (NELEM/2) )--------- STL std::sort :19.5069 secs parallel::sort :3.93207 secs tbb::parallel_sort :4.66967 secs countertree::parallel_sort :6.29049 secs std::stable_sort :18.8519 secs parallel::stable_sort :4.31173 secs countertree::parallel_merge_sort :6.1165 secs Random element many repeated (rand() % 10000)-------------------- STL std::sort :11.8256 secs parallel::sort :2.65534 secs tbb::parallel_sort :3.44815 secs countertree::parallel_sort :3.79712 secs std::stable_sort :18.7368 secs parallel::stable_sort :3.70629 secs countertree::parallel_merge_sort :6.09851 secs Equal elements -------------------------------------------------- STL std::sort :4.50343 secs parallel::sort :1.44262 secs tbb::parallel_sort :0.0588514 secs countertree::parallel_sort :1.53466 secs std::stable_sort :10.382 secs parallel::stable_sort :3.22236 secs countertree::parallel_merge_sort :4.17672 secs