
on Sun Dec 16 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:
In the degenerate case where the dependency graph is strictly linear, concurrency might be completely wasted unless you're able to take advantage of pipelining.
Isn't execution of linearly dependent operations always a pipeline?
Yes, but pipelining doesn't help if the input period exceeds the pipeline latency.
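To make that claim concrete: a pipeline only overlaps work when a new item arrives before the previous one has drained out the far end. A minimal back-of-the-envelope calculation (my own illustration, not from the thread) is:

```python
from math import ceil

def items_in_flight(input_period, stage_latencies):
    """Steady-state number of items concurrently inside a pipeline.

    input_period: time between successive inputs.
    stage_latencies: per-stage processing times; their sum is the
    end-to-end pipeline latency.
    """
    latency = sum(stage_latencies)
    # A new item enters every `input_period` units and each item stays
    # inside for `latency` units, so at most ceil(latency / period)
    # items ever overlap.
    return max(1, ceil(latency / input_period))

# Inputs arrive every 5 time units, two stages of 1 unit each:
# latency (2) < period (5), so only one item is ever in flight and
# the downstream stage idles -- pipelining buys nothing.
print(items_in_flight(5, [1, 1]))   # 1

# Inputs arrive every 1 unit: two items overlap, one per stage.
print(items_in_flight(1, [1, 1]))   # 2
```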
Assuming that, isn't every vertex in a dependency graph a pipeline by itself?
I suppose it's a degenerate pipeline, but I don't know what you're getting at.
Assuming that, isn't pipelining the entire purpose of any form of computational composition?
IMO the purpose of computational composition is to manage complexity through re-use.
And there's still a question of how that concurrency is carved up. If you have a complicated dependency graph and you can run several of these flow graphs in parallel, but each in a single thread (as described below), that seems likely to induce less synchronization than if you try to run the nodes of a single flow graph in parallel. Of course whether that decomposition can work depends on your problem.
You have a point. But: suppose that you have two operations, A and B, and you want to output B(A(item)) for each item in a collection of inputs. The alternatives you're suggesting are:

(1) have operation A executed by one thread and operation B by the other, then pipeline these threads;
(2) have two (or more) identical threads, each taking an equal share of the inputs and executing B o A.
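Scenario (1) can be sketched as two threads joined by a queue. This is a minimal sketch, with A and B as hypothetical placeholder operations:

```python
import queue
import threading

def A(x): return x + 1          # hypothetical first operation
def B(x): return x * 2          # hypothetical second operation

def pipeline(inputs):
    """Scenario (1): one thread runs A, a second runs B, connected
    by a queue of intermediate results."""
    mid, out = queue.Queue(), []
    SENTINEL = object()

    def stage_a():
        for x in inputs:
            mid.put(A(x))
        mid.put(SENTINEL)       # signal end of stream

    def stage_b():
        while True:
            y = mid.get()
            if y is SENTINEL:
                break
            out.append(B(y))

    ta = threading.Thread(target=stage_a)
    tb = threading.Thread(target=stage_b)
    ta.start(); tb.start()
    ta.join(); tb.join()
    return out

print(pipeline([1, 2, 3]))      # [4, 6, 8]
```

The single consumer preserves input order; the queue is exactly the synchronization point that scenario (2) avoids.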
In scenario (2) the threads don't need to communicate, so you're right that it will involve less waiting when generalized to more complex computations. But we're not taking into account that (1) is only one member of a much larger class of solutions to the same problem. If we realistically assume that one operation takes longer than the other, say that B takes three times as long as A, then the following network will prove superior to both (1) and (2), provided you have a CPU with at least four hardware threads:
(1b) We use four execution threads (or any multiple of four). The first takes the inputs, applies operation A to them and divides the intermediate results among the other three threads. Those three threads apply operation B, each to its own share of the intermediate results.
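Network (1b) can be sketched as one A-thread fanning out to three B-workers over a queue. Again A and B are placeholder operations of my own; the point is only the topology, with B costing roughly three times A so that all four threads stay busy:

```python
import queue
import threading

def A(x): return x + 1          # hypothetical fast operation
def B(x): return x * 2          # hypothetical slow operation (~3x A)

def fan_out(inputs, b_workers=3):
    """Network (1b): one producer thread applies A, several worker
    threads consume intermediate results and apply B."""
    mid, out = queue.Queue(), queue.Queue()
    SENTINEL = object()

    def a_stage():
        for x in inputs:
            mid.put(A(x))
        for _ in range(b_workers):   # one shutdown signal per worker
            mid.put(SENTINEL)

    def b_stage():
        while True:
            y = mid.get()
            if y is SENTINEL:
                break
            out.put(B(y))

    threads = [threading.Thread(target=a_stage)]
    threads += [threading.Thread(target=b_stage) for _ in range(b_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = []
    while not out.empty():
        results.append(out.get())
    return sorted(results)           # B-workers finish in any order

print(fan_out(range(4)))             # [2, 4, 6, 8]
```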
This also generalizes to more complex dependency graphs. In fact, the more operations are involved, the more likely you are to find significantly different execution times.
Yes.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Software Development, Training
Clang/LLVM/EDG Compilers, C++, Boost