
Hi Steven, About parallel_introsort, there are two strategies : 1.- Divide the the elements in two parts with the quicksort procedure, several times and when the parts are smaller than predefined value, sort with introsort. This method don't need additional memory ( only the variables of the recursive calls). This have a problem, the first division is done with 1 HW thread, the second with 2, the third with 4, and the fourth with 8. With processors with a small number of HW threads, run well and usually is the fastest. But with a great number of HW threads this is not true. This strategy is used by TBB parallel_sort, Boost parallel_introsort and Microsoft PPL parallel_sort. 2.- Sort small parts of the data , and do a fusion, usually using the same technique than sample_sort. With a great number of cores this version outperform the previous strategy, but need an additional memory equal than the used with the data. This strategy is used by GCC parallel_sort, and Microsoft parallel_buffered_sort The most critical element for the performance is the data bus. A personal computer have a dual channel, and a server have a quad channel and a big cache. Due this, sometimes, good results in small machines are bad in big machines , and the opposite too. The attached file is the result of running on a machine with 16 cores of the Polytechnic University of Madrid. The information about the memory used is in the documentation. But if you want to check yourself, I send you too the old benchmark files, which sort the same than in the benchmark code of the documentation. On Linux you must execute /usr/bin/time -f “%M” program [parameters], and print the memory used by the program. By example parallel_numbers 1, sort with the GCC sort, with the 2 as parameter, with introsort, and without parameters print a menu, and must input a option by the keyboard. In Windows, open the administration tool for to see the process, and there is a column showing the maximum memory used. I changed the syntax errors in the README.md , and now I go to change the int_array Francisco 2015-06-30 3:04 GMT+02:00 Steven Ross <spreadsort@gmail.com>:
On Fri, Jun 26, 2015 at 11:00 AM, Francisco José Tapia <fjtapia@gmail.com> wrote:
Hi
the code and the documentation are finished
All is in https://github.com/fjtapia/sort_parallel
Please, if you see any error, mistake or something to correct, say me in order to change as soon as possible
Francisco,
Please fix your int_array type; I still see unnecessary calls in it. Here's what I suggest:
template <uint32_t NN> struct int_array { uint64_t M[NN];
template <class generator > static int_array<NN> generate (generator &gen) { int_array<NN> result; for ( uint32_t i =0 ; i < NN ; ++i) result.M[i] = gen(); return result; };
uint64_t counter ( void) const { uint64_t Acc =0 ; for ( uint32_t i =0 ; i < NN ; Acc += M[i++]) ; return Acc ; }
bool operator < ( const int_array &R) const { return ( counter() < R.counter()); }; };
There is no need to pass an int_array to the generate method. If you want, I can send you a pull request.
With the int_array corrected, I see a case in which the introsort is useful too; I think we can keep it all (depending on how Windows testing goes), but please fix the int_array.
If you'd prefer, I can send a pull request with the changes, though you'll need to make minor test fixes to accommodate it.
Steve
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost