We can also do a benchmark with variable-length elements, using strings.
If you specify the operations to perform in that test, I can prepare an
additional benchmark with all the algorithms and your operations.
After that, we can look at the results and make a decision.
Yours
Francisco
2014-12-23 4:34 GMT+01:00 Steven Ross
Francisco,
On Mon Dec 22 2014 at 5:04:17 PM Francisco José Tapia wrote:

I have been working on the memory used by the different algorithms, especially the stable ones. I developed a low-memory stable algorithm, circular_stable_sort, which uses only about 10% more memory; its speed decreases around 15% with small objects, and with big objects it is similar to merge_sort.
Some people may find that useful. What is the worst-case computational complexity? Is it a well-known algorithm? The bar is going to be higher (may require a full review) for previously unknown algorithms.
I have also been looking at the TimSort algorithm. I found a C++ implementation at https://github.com/gfx/cpp-TimSort .
I inserted this algorithm into the benchmark programs. With small objects it is slower than stable_sort and merge_sort, but it performs well with big objects larger than 64 bits. The decision of Java and Python makes sense, because they must sort objects, which are usually non-contiguous and bigger than a bare number.
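For reference, the cpp-TimSort entry point (assuming the gfx::timsort interface shown in that repository's README; the element type here is just a placeholder) is used roughly like this:

    #include "timsort.hpp"  // from https://github.com/gfx/cpp-TimSort
    #include <functional>
    #include <vector>

    void example() {
        std::vector<int> v = {3, 1, 2};
        gfx::timsort(v.begin(), v.end());                    // default operator<
        gfx::timsort(v.begin(), v.end(), std::less<int>());  // explicit comparator
    }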
To measure the memory used by a program, I use the command /usr/bin/time -v program_to_execute. I take benchmark_algorithm_04 (100000000 uint64_t numbers), comment out all the invocations of the algorithms except one, compile, and run it in a terminal with the command /usr/bin/time -v benchmark_sort_04.
I repeat this process with all the algorithms to find the memory used by each one.
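The figure of interest in the -v output is the peak resident set size, which GNU time reports as shown below (the value is whatever each run produces):

    $ /usr/bin/time -v ./benchmark_sort_04
    ...
    Maximum resident set size (kbytes): ...
    ...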
I suggest changing the struct you use for benchmarking to this:

    template <int NN>
    struct reburcio {
        uint64_t M[NN];

        reburcio() { init(); }

        void init() {
            for (int i = 0; i < NN; ++i) {
                M[i] = my_rand();
            }
        }

        bool operator<(const reburcio &R) const {
            for (int i = 0; i < NN; ++i) {
                if (M[i] != R.M[i]) {
                    return M[i] < R.M[i];
                }
            }
            return false;
        }
    };
That way it uses the default copy and assignment operators, which copy the entire data structure, and it initializes and compares all elements of the array. It appears that the benchmark you sent me only copies over and compares the first element in the array, which might have caused some of the large-element differences in performance you were seeing.
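A minimal sketch of how that struct might be exercised in a benchmark (my_rand here is a stand-in for whatever generator your benchmark actually uses):

    #include <algorithm>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Stand-in for the benchmark's random generator.
    static uint64_t my_rand() { return (uint64_t(std::rand()) << 32) ^ std::rand(); }

    // ... definition of reburcio<NN> as above ...

    void benchmark_example() {
        // NN = 8 gives 64-byte elements; each element is fully
        // initialized, copied, and compared.
        std::vector<reburcio<8>> data(1000000);
        std::sort(data.begin(), data.end());
    }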
With this data structure, countertree intro_sort is competitive on my linux system with std::sort on fully random data, and countertree parallel_sort is competitive for 8, 16, and 256 bytes per element, but significantly underperforms tbb (by 10-15%) for 32 and 64 bytes.
As you can see, with 256-byte elements the indirect parallel_stable_sort is 3 times faster, and the indirect_sort is 25% faster.
Yes, indirect sort should eventually be faster, given a large enough element, especially if indirect sort is done with the fastest underlying algorithm available.
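For readers unfamiliar with the technique, here is a minimal sketch of the indirect-sort idea (a generic illustration, not Francisco's implementation): sort iterators to the elements with any underlying algorithm, then apply the resulting permutation so each large element is moved only once.

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <utility>
    #include <vector>

    // Generic illustration of indirect sorting for random-access ranges:
    // sort iterators by the values they point to, then permute the values
    // by following the cycles of the permutation.
    template <class Iter, class Compare>
    void indirect_sort_sketch(Iter first, Iter last, Compare comp) {
        using T = typename std::iterator_traits<Iter>::value_type;
        std::vector<Iter> p;
        p.reserve(std::distance(first, last));
        for (Iter it = first; it != last; ++it) p.push_back(it);

        // Any fast algorithm can be used here; only iterators are moved.
        std::sort(p.begin(), p.end(),
                  [&comp](Iter a, Iter b) { return comp(*a, *b); });

        // p[j] points at the element that belongs in slot j. Rotate each
        // cycle so every element is moved exactly once.
        std::vector<bool> done(p.size(), false);
        for (std::size_t i = 0; i < p.size(); ++i) {
            if (done[i]) continue;
            T tmp = std::move(first[i]);
            std::size_t j = i;
            while (true) {
                done[j] = true;
                std::size_t k = std::distance(first, p[j]);
                if (k == i) { first[j] = std::move(tmp); break; }
                first[j] = std::move(first[k]);
                j = k;
            }
        }
    }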
The final solution, if one is adopted in the end, will be a hybrid algorithm that chooses based on the size of the elements.
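A rough sketch of what such a size-based dispatch could look like (the 64-byte threshold is a made-up placeholder that the benchmarks would have to tune, and indirect_sort_sketch refers to the generic sketch above):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>

    // As defined in the earlier sketch.
    template <class Iter, class Compare>
    void indirect_sort_sketch(Iter first, Iter last, Compare comp);

    template <class Iter, class Compare>
    void hybrid_sort_sketch(Iter first, Iter last, Compare comp) {
        using T = typename std::iterator_traits<Iter>::value_type;
        const std::size_t kIndirectThreshold = 64;  // hypothetical cutoff in bytes
        if (sizeof(T) > kIndirectThreshold)
            indirect_sort_sketch(first, last, comp);  // move big elements once
        else
            std::sort(first, last, comp);             // sort small elements directly
    }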
That sounds reasonable for fixed-size elements. What about variable-length elements, like strings? Will this only be for fixed-size elements?
I propose the following: give me 2 or 3 weeks and I will prepare a big benchmark with all the algorithms, including the indirect versions of the stable and unstable sorts. We will run the benchmarks on different machines and, according to the results, decide whether we propose something to Boost, and which algorithms should be used in the hybrid version.
If, in the end, we decide not to propose anything, I can only say this has been a nice and exciting experience.
I think indirect sorting is promising.
My intention is not to impose my code; I want to provide the final users with the best solution, if it exists, whether with my code or someone else's. I will be the first to applaud the winner.
We are doing this because this parallel implementation doesn't need any other library, only the standard one. There are compilers without OpenMP or TBB support, and sometimes the user simply doesn't want to pull in those dependencies just to do a sort.
Agreed, but OpenMP is very common (even Android supports it!), so systems without OpenMP are a very small niche in terms of total systems out there. TBB is supported on intel-compatible processors, and the vast majority of deployed processors where parallel sorting makes sense (won't drain the battery really quickly) are intel-compatible. If we're going to provide an alternative, it should cost very little in terms of performance, and ideally provide some kind of benefit besides eliminating a dependency.
I have several improvements to the algorithms in mind. I will code and test them, and if they run well, I will include them. I also want to simplify the interface of indirect_sort, so that it can be used easily by any sort algorithm. If you are interested, I will also prepare a brief document with an explanation of the internal algorithm.
I'd like to see that document. Along with testing a struct that is an array of ints, you should probably test variable-length string sorting, as string sorting is very common.
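A minimal sketch of how variable-length string input for such a benchmark could be generated (the length range and alphabet are arbitrary assumptions):

    #include <cstddef>
    #include <cstdlib>
    #include <string>
    #include <vector>

    // Generate strings of random length so that comparisons and copies
    // have variable cost, unlike the fixed-size struct above.
    std::vector<std::string> make_string_data(std::size_t n) {
        std::vector<std::string> v(n);
        for (auto &s : v) {
            std::size_t len = 1 + std::rand() % 64;  // arbitrary 1-64 chars
            s.resize(len);
            for (auto &c : s) c = 'a' + std::rand() % 26;
        }
        return v;
    }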
Merry Christmas,
Steve