
On 6/16/06, Matt Calabrese <rivorus@gmail.com> wrote:
I can't say that I agree with your conclusion that parallel algorithms
aren't very useful for general applications nor that manually multithreading applications is the best option. On the contrary, I think it's going to be very important that the average programmer has high-level generic tools for working with parallel algorithms for even seemingly trivial applications in the not-too-far future.
I agree that in the future this will be important, but the current technologies to support it (thread APIs and mutexes) are not good enough to make it happen. I have uploaded what I came up with to parallel_stl_test.zip in the concurrent programming section of the Boost Vault. Perhaps I am doing something horribly wrong; I tried several different approaches to implementing things, and this one ended up being the fastest. All of the tests are also written to be kind to the parallel execution mode, and they still end up falling behind. If you were to 'optimize' the serial searching tests even further by using a hash table or qsort, there would be no way the parallel versions could keep up.

I approach the problem in the same way STAPL does (I assume), using parallel containers and algorithms. It's only implemented just far enough that it will work at all. The one other thing STAPL does that I never got around to was optimizing algorithms at runtime, so that a serial version would be run instead of a parallel version whenever that ended up being faster. But I'm fairly certain that the serial version will be faster 95% of the time - so often that it would negate any rare benefits of the parallel mode several times over. The way I see things, if the algorithms don't make inherently serial tasks faster then they have no reason to exist, because inherently parallel things can already be made parallel easily.

By the way, this is how to interpret the output of the test application:

find total = the total time it takes to execute
overhead:
  setup = how much time has been spent in function overheads, copying values around, etc.
  pool  = how much time has been spent in the thread pool, which includes all the waiting for the primary thread
  wait0 = how much time has been spent waiting in thread 0
  wait1 = how much time has been spent waiting in thread 1
execution = how much time was actually spent executing the task at hand on each CPU; this is how fast things could be if it weren't for all the timing and waiting going on.

find total should equal setup + pool + cpu0.

- Clint Levijoki