
on Wed Dec 12 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:
> Dave Abrahams wrote:
>> If you want such a library to be truly efficient, you should be able to wire together the dataflows at compile-time and not pay any overhead for composing large computations out of smaller ones. The accumulators library uses something like a dataflow model, with the "wiring" done completely at compile-time.
> Because of your remark I studied the Accumulators library, and I must say it's very elegant indeed. Dependencies between computations are resolved at compile-time (wow!) so they don't add unnecessary overhead at run-time, and everything is executed in the most efficient order.
> However, note the word "order". Boost.Accumulators was clearly designed under the assumption of single-threaded operation and tightly interdependent computations. I believe it may well be the best possible approach under those assumptions, but when the assumptions don't apply, other considerations may call for a different solution.
> "Dataflow" is a very general term, but the people who advertise with that word typically mean something more specific, namely lock-free multithreading with message passing. The underlying assumptions are more or less the opposite of those in Boost.Accumulators: computations can run concurrently without waiting for each other, as long as they have input available and somewhere to dispose of their output. If computations can take place /at the same time/, it does not really matter whether connections are made at compile-time or at run-time; /waiting/ is eliminated altogether, so building connections at run-time adds only constant overhead.
It's not just the cost of *building* dynamic connections but also the cost of *traversing* them. Of course, if your sub-computations happen to have large granularity, these costs may be swamped by others, but a truly flexible system would allow you to tune the approach to the problem. The accumulators library was first used in a massively-parallel context with each process running thousands or millions of accumulator sets. Obviously it depends on your problem domain, but that might be a more efficient way to break down the parallel processing. A truly flexible system would (as Jeremiah suggested his design does) allow you to combine static and dynamic wiring (sort of like Xpressive does).
> That constant overhead is worth it if accepting it helps you to avoid other problems. Note that the "wires" built by Accumulators at compile-time are /logical/ connections between /types/ that represent operations. Message-passing "wires" are /physical/ connections between /threads/.
The accumulators library's wires are just as "physical" as any others; they're just encoded in the instruction stream instead of in mutable data. I don't see the relevance of your emphasis on the word "types."
> Instantiating the latter kind of connection at compile-time seems very hard, if not impossible.
No, you could easily use threads with a system like the accumulators framework.
> Actually I think the Accumulators framework and lock-free message passing are completely orthogonal.
Exactly. It seems as though you're now contradicting yourself.
> One could use a "dataflow" library to connect threads at run-time while composing the computations for the individual threads at compile-time with Boost.Accumulators, and get the best of both worlds.
> I hope I'm making sense. Please let me know what you think.
I think you eventually made sense :-)

-- 
Dave Abrahams
BoostPro Computing
Software Development Training
http://www.boostpro.com
Clang/LLVM/EDG Compilers  C++  Boost