Hi Steven,
About parallel_introsort, there are two strategies :
1.- Divide the the elements in two parts with the quicksort procedure,
several times and when the parts are smaller than predefined value, sort
with introsort. This method don't need additional memory ( only the
variables of the recursive calls).
This have a problem, the first division is done with 1 HW thread, the
second with 2, the third with 4, and the fourth with 8. With processors
with a small number of HW threads, run well and usually is the fastest. But
with a great number of HW threads this is not true.
This strategy is used by TBB parallel_sort, Boost parallel_introsort and
Microsoft PPL parallel_sort.
2.- Sort small parts of the data , and do a fusion, usually using the same
technique than sample_sort. With a great number of cores this version
outperform the previous strategy, but need an additional memory equal than
the used with the data.
This strategy is used by GCC parallel_sort, and Microsoft
parallel_buffered_sort
The most critical element for the performance is the data bus. A personal
computer have a dual channel, and a server have a quad channel and a big
cache. Due this, sometimes, good results in small machines are bad in big
machines , and the opposite too.
The attached file is the result of running on a machine with 16 cores of
the Polytechnic University of Madrid.
The information about the memory used is in the documentation. But if you
want to check yourself, I send you too the old benchmark files, which sort
the same than in the benchmark code of the documentation.
On Linux you must execute /usr/bin/time -f “%M” program [parameters], and
print the memory used by the program. By example parallel_numbers 1, sort
with the GCC sort,
with the 2 as parameter, with introsort, and without parameters print a
menu, and must input a option by the keyboard.
In Windows, open the administration tool for to see the process, and there
is a column showing the maximum memory used.
I changed the syntax errors in the README.md , and now I go to change the
int_array
Francisco
2015-06-30 3:04 GMT+02:00 Steven Ross
On Fri, Jun 26, 2015 at 11:00 AM, Francisco José Tapia
wrote: Hi
the code and the documentation are finished
All is in https://github.com/fjtapia/sort_parallel
Please, if you see any error, mistake or something to correct, say me in order to change as soon as possible
Francisco,
Please fix your int_array type; I still see unnecessary calls in it. Here's what I suggest:
template
struct int_array { uint64_t M[NN]; template <class generator > static int_array<NN> generate (generator &gen) { int_array<NN> result; for ( uint32_t i =0 ; i < NN ; ++i) result.M[i] = gen(); return result; };
uint64_t counter ( void) const { uint64_t Acc =0 ; for ( uint32_t i =0 ; i < NN ; Acc += M[i++]) ; return Acc ; }
bool operator < ( const int_array &R) const { return ( counter() < R.counter()); }; };
There is no need to pass an int_array to the generate method. If you want, I can send you a pull request.
With the int_array corrected, I see a case in which the introsort is useful too; I think we can keep it all (depending on how Windows testing goes), but please fix the int_array.
If you'd prefer, I can send a pull request with the changes, though you'll need to make minor test fixes to accommodate it.
Steve
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost