
I also increased the block size to 10,000 and performance improved a little (to ~0.316). If I increase it to 100,000, performance goes back to ~0.331.
Good news, isn't it?
Yes, but there is room for improvement... As long as we aren't four times faster than std::sort, we can do better. ;)
I've run a test with sizes between 100 and 150,000, and the size seems to have little impact on the outcome. This is strange. It's a bit early to say whether it's bad news. See the CSV & graph attached. I've used GetTickCount to measure.
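For context, the measurement itself is roughly the following (a sketch; sortTest is a made-up name standing in for whatever call we time, and sliceSize for the size under test):

    // Rough shape of the timing harness. sortTest is a stand-in name,
    // not our real entry point.
    const DWORD before = ::GetTickCount();
    sortTest(data, dataCount, sliceSize);
    const DWORD elapsedMs = ::GetTickCount() - before;
    // Note: GetTickCount typically advances in ~10-16 ms steps, so
    // short runs need to be repeated and averaged for a stable number.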
I agree this is suspect.
We should perhaps run a test where we inject a large number of tasks of precise duration and see how the scheduler behaves. It would also be interesting to measure the execution delay of one of the tasks (if you know a task should last 1 s, measure how long it actually takes).
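Something like the sketch below could do it, assuming a hypothetical scheduler object with a submit() entry point (both "scheduler" and "submit" are made-up names here, and spinFor is the busy-wait helper sketched further down):

    // Sketch: measure the queueing delay and actual run time of one
    // task of known duration. scheduler/submit are assumed names,
    // not our real API.
    const DWORD expectedMs = 1000;
    const DWORD queued = ::GetTickCount();
    scheduler.submit([queued, expectedMs]() {
        const DWORD started  = ::GetTickCount();
        spinFor(expectedMs);                 // burn CPU for expectedMs
        const DWORD finished = ::GetTickCount();
        printf("queue delay %lu ms, run time %lu ms (expected %lu ms)\n",
               started - queued, finished - started, expectedMs);
    });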
Profiling would be welcome.
The latency + bandwidth test should explain why the slice size doesn't seem to affect performance. For the test task I see something like: for (DWORD count = ::GetTickCount(); count < target; count = ::GetTickCount()); (comparing with < rather than != so a coarse tick step can't jump past target and spin forever). This is a trivial spin to make sure the tasks eat up some CPU for the desired number of ms. -- EA
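PS: packaged as a self-contained helper it would look something like this (spinFor is just a name for the sketch; the subtraction form also survives GetTickCount wrapping around after ~49.7 days):

    #include <windows.h>

    // Busy-wait for durationMs: burn CPU without yielding so the task
    // has a precisely known cost. Comparing the elapsed delta with <
    // (instead of testing for equality) means a coarse tick step can't
    // jump past the target and leave the loop spinning forever.
    void spinFor(DWORD durationMs)
    {
        const DWORD start = ::GetTickCount();
        while (::GetTickCount() - start < durationMs)
        {
            // keep spinning
        }
    }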