
On Wed, Sep 3, 2008 at 5:36 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
> I just noticed this in the "lifetime of ranges vs. iterators" thread (which I've not really been following):
>
> Arno Schödl wrote:
>
>> rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )
>
> I thought it worth pointing out the similarity, and also the difference, between this and the proposed dataflow notation. Here, operator| is being used like a shell pipe operator. In dataflow, operator| has a quite different meaning: it's a vertical line, distributing the output of "rng" to the inputs of the funcs in parallel. Confusing, perhaps?
Yes, I anticipated possible confusion between the dataflow library's use of "|" for branching and the common use of "|" for piping. A different operator could be used in dataflow, if that is preferable.
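For reference, here is what I understand the range-pipeline meaning of "|" to be -- a small compilable sketch using Boost.Range adaptors (the predicates are just placeholders):

#include <boost/range/adaptor/filtered.hpp>
#include <boost/foreach.hpp>
#include <vector>
#include <iostream>

// placeholder predicates standing in for funcA, funcB, ...
bool funcA(int i) { return i % 2 == 0; }
bool funcB(int i) { return i % 3 == 0; }

int main()
{
    using boost::adaptors::filtered;

    std::vector<int> rng;
    for (int i = 0; i < 100; ++i)
        rng.push_back(i);

    // Each operator| wraps the previous range in a lazy filtered view;
    // elements are tested on demand as the result is iterated.
    BOOST_FOREACH (int i, rng | filtered(funcA) | filtered(funcB))
        std::cout << i << '\n';
}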
> Anyway you could presumably write something like
>
> rng >>= funcA >>= funcB ....
>
> and I would be interested to hear how the two implementations compare. Is it true to say that stacked iterators implement a "data pull" style, while dataflow implements "data push"?
Dataflow.Signals networks are typically implemented as push networks, but they can also be used for pull processing: http://www.dancinghacker.com/code/dataflow/dataflow/signals/introduction/tut... The direction indicated by >>= matches the direction of the signal (the function call), but the data can flow either way (sent forward as a function call argument, or sent back through the return value). So, you could write rng >>= funcA >>= funcB or funcB >>= funcA >>= rng, depending on how the func and rng components are implemented.
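To make the push/pull distinction concrete, here is a minimal sketch using plain function objects (not the actual Dataflow.Signals components, just an illustration of which side drives the call and where the data travels):

#include <iostream>

// Push style: the producer drives the network; data travels forward
// as the function call argument.
struct printer {
    void operator()(int x) { std::cout << "push got " << x << '\n'; }
};

struct counter_push {
    printer* sink;
    void run() { for (int i = 0; i < 3; ++i) (*sink)(i); }
};

// Pull style: the consumer drives the network; data travels backward
// through the return value.
struct counter_pull {
    int next;
    int operator()() { return next++; }
};

struct printer_pull {
    counter_pull* source;
    void run() { for (int i = 0; i < 3; ++i) std::cout << "pull got " << (*source)() << '\n'; }
};

int main()
{
    printer p;               counter_push cp = { &p }; cp.run();
    counter_pull c = { 0 };  printer_pull pp = { &c }; pp.run();
}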
> I also note that Arno wants to use stacked iterators because this alternative:
>
> result = fn1( fn2( fn3( fn4( huge_document ) ) ) );
>
> creates large intermediates and requires dynamic allocation. Again, a framework that allowed buffering of "sensible size" chunks and potentially distributed the work between threads could be a good solution.
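For illustration, here is the contrast as I understand it, sketched with Boost.Range's transformed adaptor (the per-character functions and huge_document are placeholders, not anyone's actual code):

#include <boost/range/adaptor/transformed.hpp>
#include <boost/foreach.hpp>
#include <string>
#include <cctype>
#include <iostream>

// Placeholder per-character transformations standing in for fn3 and fn4.
struct to_upper {
    typedef char result_type;
    char operator()(char c) const
    { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
};
struct space_to_underscore {
    typedef char result_type;
    char operator()(char c) const { return c == ' ' ? '_' : c; }
};

int main()
{
    std::string huge_document(10 * 1024 * 1024, 'x');

    // Eager nesting: each fn would build and return a complete intermediate
    // copy of the document before the next fn runs:
    //   result = fn3( fn4( huge_document ) );

    // Stacked iterators: the adaptors only wrap the underlying range, and
    // each character is transformed on demand, with no intermediate copies.
    using boost::adaptors::transformed;
    std::size_t count = 0;
    BOOST_FOREACH (char c, huge_document | transformed(space_to_underscore())
                                         | transformed(to_upper()))
        count += (c == 'X');
    std::cout << count << '\n';
}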
As far as the dataflow library goes, some sort of "automatic task division" library would indeed be great in conjunction with dataflow, but I see the two as orthogonal: automatic task division could be useful without dataflow, and dataflow could be useful without automatic task division. Is it your opinion that some sort of task division strategy would be necessary for the dataflow library to be useful?

Kind regards,

Stjepan