
Edouard A." wrote:
That's a lot of extra copying. [Edouard A.]
I don't do in place merge. I don't know if this would be faster.
My point is that you're doing extra copying compared even to a standard out-of-place mergesort. You're copying all the data on the way down, and then copying it all again during the merge on the way back up. Using in-place merge can avoid some of the copying on the way back up. I think you should get rid of the copying on the way down.
What do you mean by "hidden system locks"? [Edouard A.]
If you have two or more CPUs trying to access the same cache line, locking will occur.
Right. But do you think this effect is more significant than copying all of your data O(log n) times? Surely not. But even if it were a problem you could easily fix it by aligning your split points to cache lines. I encourage you to investigate parallel sorting algorithms in the literature - doesn't Knuth have a whole book of them? - rather than inventing your own. And base your improvements on benchmark results. Regards, Phil.