
On 6/16/06, Matt Calabrese <rivorus@gmail.com> wrote:
I can't say that I agree with your conclusion that parallel algorithms
aren't very useful for general applications nor that manually multithreading applications is the best option. On the contrary, I think it's going to be very important that the average programmer has high-level generic tools for working with parallel algorithms for even seemingly trivial applications in the not-too-far future.
I agree that in the future this will be important, but the current technologies to support it (thread APIs and mutexes) are not good enough to make it happen. I have uploaded what I came up with to parallel_stl_test.zip in the concurrent programming section of the Boost Vault. Perhaps I am doing something horribly wrong; I tried several different approaches to implementing things, and this one ended up being the fastest. All of the tests are also written to be kind to the parallel execution mode, and they still end up falling behind. If you were to 'optimize' the serial searching tests even further by using a hash table or qsort, there would be no way the parallel versions could keep up.

I approach the problem in the same way STAPL does (I assume), using parallel containers and algorithms. It's only implemented just far enough that it will work at all. The one other thing STAPL does that I never got around to was optimizing algorithms at runtime, so that a serial version would be run instead of a parallel version whenever that ended up being faster. But I'm fairly certain that the serial version will be faster 95% of the time - so often that it would negate any rare benefits of the parallel mode several times over. The way I see things, if the algorithms don't make inherently serial tasks faster then they have no reason to exist, because inherently parallel things can already be made parallel easily.

By the way, this is how to interpret the output of the test application:

find total = the total time it takes to execute
overhead:
  setup = how much time has been spent in function overheads, copying values around, etc.
  pool  = how much time has been spent in the thread pool, which includes all the waiting for the primary thread
  wait0 = how much time has been spent waiting in thread 0
  wait1 = how much time has been spent waiting in thread 1
execution = how much time was actually spent executing the task at hand on each CPU; this is how fast things could be if it weren't for all the timing and waiting going on.

find total should equal setup + pool + cpu0.

- Clint Levijoki