[review] Dataflow Review starts today, September 1st

The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.

---------------------------------------------------------
Description of the library:

Dataflow is a generic library for dataflow programming. Dataflow programs can typically be expressed as a graph in which vertices represent components that process data, and edges represent the flow of data between the components. As such, dataflow programs can be easily reconfigured by changing the components and/or the connections.

This review focuses on the Dataflow.Signals layer of the library. For its data transport mechanism, Dataflow.Signals uses Boost.Signals, which can be used to make lasting dataflow connections based on function calls. Dataflow.Signals provides the following to facilitate signals-based dataflow networks:

* A number of useful general-purpose components, and building blocks for implementing new components.
* Various free functions and operators for connecting and using components.

The library documentation provides some concrete examples of how the Dataflow.Signals layer can be used. Some examples are:

* Implementing distributed dataflow applications using Dataflow.Signals and Boost.Asio
* An image processing network using Dataflow.Signals and Boost.GIL
* A GUI dataflow editor (located in the Dataflow.Blueprint documentation)

While the Dataflow library contains other layers, only the Dataflow.Signals layer is ready for review. Reviewers are welcome to provide feedback for any part of the library, but please be aware that the documentation and implementation for the other layers may be lacking (for example, there is the generic support layer, which provides concepts applicable to different dataflow frameworks, and can be used to develop generic dataflow code, as well as the Dataflow.Blueprint layer which provides run-time reflection and modeling of dataflow networks in a Boost Graph Library graph for any dataflow framework with implemented Dataflow library support). For the time being, please consider these other layers as implementation details or proof-of-concept examples, as appropriate.

The library is accessible as a tarball at: http://www.boostpro.com/vault/index.php?&directory=Dataflow

The documentation can be accessed here: http://www.dancinghacker.com/code/dataflow/

The documentation particular to the Dataflow.Signals layer under review: http://www.dancinghacker.com/code/dataflow/dataflow/signals.html

Dataflow depends on many existing Boost libraries, such as Fusion and MPL. The Dataflow.Signals layer builds functionality over Boost.Signals. The Dataflow library has been tested using a recent version of the Boost trunk, as well as the 1.35 release. Tests and examples have been built successfully on OS X (GCC 4.0, 4.2), Linux (GCC 4.2), and Windows (MSVC 8.0 and, to some degree, MinGW GCC 4.2).

---------------------------------------------------------
---------------------------------------------------------
Questions you may want to answer in your review:

- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?
In particular, please remember to answer the following question explicitly:

- Do you think the library should be accepted as a Boost library?

Jaakko Järvi
Review Manager

On Mon, Sep 1, 2008 at 7:19 AM, Jaakko Järvi <jarvi@cs.tamu.edu> wrote:
The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.
Hi, while skimming through the Dataflow documentation, I noticed an error in the distributed example (http://tinyurl.com/5qcjxx):

    void asio_server()
    {
        ....
        {
            boost::mutex::scoped_lock lock(mutex_);
            acceptor.listen();
            cond.notify_all();
        }
        ....
    }

    int main(int, char* [])
    {
        // start the server in a separate thread, and wait until it is listening
        boost::mutex::scoped_lock lock(mutex_);
        boost::thread t(asio_server);
        cond.wait(lock);
    }
    ....

cond is a condition variable, but it is being used wrongly. You need to tie the waiting to a predicate (for example an external shared flag, initially false) that must be checked in a loop:

    int main(...)
    {
        ...
        boost::thread t(asio_server);
        {
            boost::mutex::scoped_lock lock(mutex_);
            while (!other_thread_ready)
                cond.wait(lock);
        }
        ...
    }

and you should set it before the broadcast:

    {
        boost::mutex::scoped_lock lock(mutex_);
        acceptor.listen();
        other_thread_ready = true;
        cond.notify_all();
    }

BTW, hopefully I'll be able to write a review. The library seems interesting.

-- gpd
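For reference, a self-contained sketch of the corrected pattern might look like the following. This is only an illustration of the handshake itself: the Asio acceptor setup is reduced to a placeholder comment, and the flag is the predicate that guards against both a missed notification and spurious wakeups.

    #include <boost/thread/condition.hpp>
    #include <boost/thread/mutex.hpp>
    #include <boost/thread/thread.hpp>
    #include <iostream>

    boost::mutex mutex_;
    boost::condition cond;
    bool server_ready = false;   // the predicate, protected by mutex_

    void asio_server()
    {
        // ... set up the acceptor here ...
        {
            boost::mutex::scoped_lock lock(mutex_);
            // acceptor.listen();   // placeholder for the Asio call in the example
            server_ready = true;    // set the flag before broadcasting
            cond.notify_all();
        }
        // ... accept and serve connections ...
    }

    int main()
    {
        boost::thread t(asio_server);

        // Wait until the server thread reports that it is listening.
        {
            boost::mutex::scoped_lock lock(mutex_);
            while (!server_ready)
                cond.wait(lock);
        }

        std::cout << "server is listening\n";
        // ... run the client side ...
        t.join();
    }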

On Mon, Sep 1, 2008 at 9:35 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Mon, Sep 1, 2008 at 7:19 AM, Jaakko Järvi <jarvi@cs.tamu.edu> wrote:
The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.
Hi, while skimming through the Dataflow documentation, I noticed an error in the distributed example (http://tinyurl.com/5qcjxx):
Thanks for raising this issue - threading is definitely not my forte, and it's quite possible that I'm doing something weird. I see, looking at the docs:

    template <typename ScopedLock> void wait(ScopedLock& lock);

    Danger: This version should always be used within a loop checking that the state logically associated with the condition has become true. Without the loop, race conditions can ensue due to possible "spurious wake ups".

I wasn't aware of the "spurious wake ups" issue. I will fix the code as you recommended.
BTW, hopefully I'll be able to write a review. The library seems interesting.
That would be great! Thanks, Stjepan

On Mon, Sep 1, 2008 at 7:44 PM, Stjepan Rajko <stipe@asu.edu> wrote:
On Mon, Sep 1, 2008 at 9:35 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Mon, Sep 1, 2008 at 7:19 AM, Jaakko Järvi <jarvi@cs.tamu.edu> wrote:
The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.
Hi, while skimming through the Dataflow documentation, I noticed an error in the distributed example (http://tinyurl.com/5qcjxx):
Thanks for raising this issue - threading is definitely not my forte, and it's quite possible that I'm doing something weird. I see, looking at the docs:
template <typename ScopedLock> void wait(ScopedLock& lock);
Danger: This version should always be used within a loop checking that the state logically associated with the condition has become true. Without the loop, race conditions can ensue due to possible "spurious wake ups".
I wasn't aware of the "spurious wake ups" issue. I will fix the code as you recommended.
Spurious wakeups are rare in practice. OTOH, what is more likely to happen is that the main thread could miss the wake-up if the worker thread managed to signal the condition variable before the main thread got to wait for it (remember that condition variables are stateless). HTH, -- gpd

On Mon, Sep 1, 2008 at 10:50 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
Spurious wakeups are rare in practice. OTOH, what is more likely to happen is that the main thread could miss the wake-up if the worker thread managed to signal the condition variable before the main thread got to wait for it (remember that condition variables are stateless).
Wouldn't the mutex lock have taken care of that? asio_server blocks before notifying, and the main thread doesn't release the mutex until getting to the wait point. Stjepan

On Mon, Sep 1, 2008 at 8:00 PM, Stjepan Rajko <stipe@asu.edu> wrote:
On Mon, Sep 1, 2008 at 10:50 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
Spurious wakeups are rare in practice. OTOH, what is more likely to happen is that the main thread could miss the wake-up if the worker thread managed to signal the condition variable before the main thread got to wait for it (remember that condition variables are stateless).
Wouldn't the mutex lock have taken care of that? asio_server blocks before notifying, and the main thread doesn't release the mutex until getting to the wait point.
Yes, you are right. It can fail only because of a spurious wakeup. -- gpd

----- Original Message -----
From: "Jaakko Järvi" <jarvi@cs.tamu.edu>
To: <boost-announce@lists.boost.org>; <boost@lists.boost.org>; <boost-users@lists.boost.org>
Sent: Monday, September 01, 2008 7:19 AM
Subject: [boost] [review] Dataflow Review starts today, September 1st

Hi Stjepan,

Thanks for such an interesting library.
This review focuses on the Dataflow.Signals layer of the library
Do you mean that the Generic Support Layer is not reviewed?

For the moment, just some questions and remarks on the documentation (typos, ...):

* What does D mean in the following Port requirement? Is it a Component, and shouldn't the parameter (p) be a component instance?

    Name:         Get Default Port
    Expression:   get_default_port<D,M,T>(p)
    Result Type:  p
    Semantics:    Returns the port object.

* Could you be more explicit here? "ComplementedPorts are useful in situations where Port types are BinaryOperable in a one-to-one fashion (a pair of Port types are each other's port complements), or in a one-to-many fashion (a number of Port types have the same complement port). An example of the latter is Dataflow.Signals, where any signal of signature T has a complement port of type function<T>, and can therefore model ComplementedPort, but function<T> cannot because there are many signal types to which it can be connected."

* In VectorPort Refines, the link PortVevtor is invalid, and PortVector is not defined previously.

* The names PortVector and VectorPort are confusing.

* There is an error in the PortVector requirements:

    PortVector Traits   traits_of<>::type   PVT   The ComponentTraits of the component.

  Something is missing between <>, and the semantics do not match the name.

* The notation needs to explain what M and c are in the PortVector requirements:

    GetPort   get_port_c<M, I>(c)   Returns the I'th PortVectorTraits exposed by C

  The name get_port_c does not match the semantics very well. Is something wrong?

* What does M stand for in the Component requirements?

    GetComponentPort   get_port<M, I>(c)   Returns the I'th Port exposed by C

* In ComponentOperable the use of C seems more convenient than P?

* In "Setting up a producer Port and a consumer Port for VTK": "Now that we have the mechanism" - what does mechanism stand for here (maybe this should be tag)?

Best Regards,
Vicente

Hi Vicente, On Mon, Sep 1, 2008 at 4:23 PM, vicente.botet <vicente.botet@wanadoo.fr> wrote:
----- Original Message ----- From: "Jaakko Järvi" <jarvi@cs.tamu.edu>
Thanks for such an interesting library.
Thank you for your feedback!
This review focuses on the Dataflow.Signals layer of the library
Do you mean that the Generic Support Layer is not reviewed?
Yes and no :-) As something that the Dataflow.Signals layer is based on, it certainly deserves scrutiny and I very much welcome feedback about it. But I don't believe that the Generic Support Layer in its current state is really review-ready in its own right. For one, it is not truly generic - its design is heavily biased towards the needs of Dataflow.Signals, and even though it has been successfully applied to other dataflow frameworks, it leaves much to be desired. Also, the docs / examples aren't as complete as I think they should be to facilitate a thorough review of that layer. I do intend to keep working on that layer and would love to have a formal review of it at some later point.
For the moment just some questions and remarks on the documentation (typos, ...):
* What does D mean in the following Port requirement? Is it a Component, and shouldn't the parameter (p) be a component instance?

    Name:         Get Default Port
    Expression:   get_default_port<D,M,T>(p)
    Result Type:  p
    Semantics:    Returns the port object.
Thanks for the catch - D is not documented on that page at all. It stands for a Direction type, and can be either dataflow::args::left or dataflow::args::right.

For ports (when p is a port), get_default_port is an identity function - it will just return the port. If passed a Component c, get_default_port<D,M,T>(c) will return the default port of the component for that Direction and that Mechanism (its ComponentTraits have a map of default ports, keyed by the Direction and the Mechanism). As default ports are usually accessed during a binary operation (like connect), the Direction is deduced from the component's position in the expression (args::left if it is on the left, args::right if on the right). The Mechanism is usually associated with the operation. In Dataflow.Signals, the mechanism associated with the connect operation is signals::connect_mechanism.

Take, for example, in the context of Dataflow.Signals:

    c >>= p;

The underlying connect operation will do get_default_port<args::left, M, T>(c) and get_default_port<args::right, M, T>(p) to get the actual ports to be connected (where M=signals::connect_mechanism and T=signals::tag). If the component specifies a default port for args::left and mechanism M, it will return it, and that is what will get connected to port p, if possible.
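To illustrate just the resolution step (looking up a default port by direction), here is a deliberately simplified toy model. It is not the library's actual code - every name in it is invented for this sketch, the direction is passed as an ordinary argument, and the Mechanism/Tag parameters are left out entirely:

    #include <functional>
    #include <iostream>

    namespace args { struct left {}; struct right {}; }

    // A "component" exposing one producer port, which it names as its default
    // port for the left-hand side of a connect expression.
    struct number_source
    {
        std::function<void(int)> output;   // the producer port
        std::function<void(int)>& default_port(args::left) { return output; }
    };

    // Resolution is the identity on things that are already ports...
    std::function<void(int)>& get_default_port(std::function<void(int)>& port, args::right)
    {
        return port;
    }

    // ...and asks a component for its default port for the given direction.
    std::function<void(int)>& get_default_port(number_source& c, args::left)
    {
        return c.default_port(args::left());
    }

    // A binary operation first resolves each side to a port, then wires them up.
    template <typename Producer, typename Consumer>
    void connect(Producer& producer, Consumer& consumer)
    {
        get_default_port(producer, args::left()) =
            get_default_port(consumer, args::right());
    }

    int main()
    {
        number_source source;
        std::function<void(int)> printer = [](int x) { std::cout << x << "\n"; };

        connect(source, printer);   // source's default output port now calls printer
        source.output(42);          // prints 42
    }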
* Could you be more explicit here? "ComplementedPorts are useful in situations where Port types are BinaryOperable in a one-to-one fashion (a pair of Port types are each other's port complements), or in a one-to-many fashion (a number of Port types have the same complement port). An example of the latter is Dataflow.Signals, where any signal of signature T has a complement port of type function<T>, and can therefore model ComplementedPort, but function<T> cannot because there are many signal types to which it can be connected."
Maybe I can give you a concrete example of how ComplementedPorts can be used. A part of what the Dataflow.Blueprint layer does is take Dataflow concepts and "turn them into" polymorphic runtime classes. For example, there is the blueprint::port class, which is a polymorphic base class that captures some Port functionality. The source is at:

http://svn.boost.org/svn/boost/sandbox/SOC/2007/signals/boost/dataflow/bluep...

There is also a class template port_t that inherits blueprint::port.

So now let's say I have two Port types - Source and Target. And let's say I want to have a run-time function connect as follows:

    void connect(blueprint::port &producer, blueprint::port &consumer);

I would use it in a program something like this:

    // the ports
    Source source;
    Target target;

    // suppose we need type erasure
    blueprint::port s = blueprint::port_t<Source>(source);
    blueprint::port t = blueprint::port_t<Target>(target);

    // we still want to connect
    connect(s, t);

Now, we try to write the connect function. In order for it to do a dataflow connect, it needs to grab the underlying compile-time connect function, which is templated on both Source and Target. ComplementedPorts allow it to do that. If Source ports can only connect to Target ports, we can make Source a ComplementedPort with Target as its complement. Now, port_t<Source> has everything one would need to know about instantiating the right operation to connect Source to Target (because Target is Source's complement port type). If you look at the port source file I listed above, you will see that there is a complemented_port class with the member function port_to_complement_connector() - that function returns a function that knows how to connect the port to its complement.

Returning to the implementation of void connect(blueprint::port &producer, blueprint::port &consumer) - to do what it needs to do, this function can do the following:

1. check to see if producer is a complemented port (blueprint::port has a member function is_complemented_port)
2. if so, downcast it to complemented_port
3. find out the type_info of its complement
4. check to see whether consumer is of that type
5. if all went well, get the port_to_complement_connector from producer, and apply it to (producer, consumer). Now they are connected.

I'm not sure whether this description helped at all. Basically, if PortTypeA only plays with PortTypeB, we can make it a ComplementedPort, which basically says "PortTypeA only plays with PortTypeB", and that piece of information can make things easier.
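In case a compilable illustration helps, here is a self-contained toy re-creation of those five steps. It is not the Blueprint code - every class and function name below is invented for this sketch - but it shows how knowing the complement's type lets a type-erased connect dispatch to the right compile-time connection:

    #include <functional>
    #include <iostream>
    #include <stdexcept>
    #include <typeinfo>

    // The two concrete port types: a Source that can only connect to a Target.
    struct Target { void receive(int x) { std::cout << "received " << x << "\n"; } };
    struct Source { Target* sink = nullptr; void send(int x) { if (sink) sink->receive(x); } };

    // Type-erased wrappers, playing roles similar to blueprint::port.
    struct erased_port
    {
        virtual ~erased_port() {}
        virtual bool is_complemented_port() const { return false; }
        virtual const std::type_info& type() const = 0;
        virtual void* get() = 0;
    };

    struct erased_complemented_port : erased_port
    {
        bool is_complemented_port() const { return true; }
        virtual const std::type_info& complement_type() const = 0;
        // a callable that knows how to connect the wrapped port to its complement
        virtual std::function<void(void*, void*)> connector() const = 0;
    };

    struct target_port : erased_port
    {
        Target& t;
        explicit target_port(Target& t_) : t(t_) {}
        const std::type_info& type() const { return typeid(Target); }
        void* get() { return &t; }
    };

    struct source_port : erased_complemented_port
    {
        Source& s;
        explicit source_port(Source& s_) : s(s_) {}
        const std::type_info& type() const { return typeid(Source); }
        void* get() { return &s; }
        const std::type_info& complement_type() const { return typeid(Target); }
        std::function<void(void*, void*)> connector() const
        {
            // knows, at compile time, how to connect a Source to its complement
            return [](void* src, void* tgt)
            { static_cast<Source*>(src)->sink = static_cast<Target*>(tgt); };
        }
    };

    // The run-time connect follows the five steps described above.
    void connect(erased_port& producer, erased_port& consumer)
    {
        if (!producer.is_complemented_port())                             // step 1
            throw std::runtime_error("producer has no known complement");
        erased_complemented_port& cp =
            dynamic_cast<erased_complemented_port&>(producer);            // step 2
        if (cp.complement_type() != consumer.type())                      // steps 3 and 4
            throw std::runtime_error("consumer is not the complement type");
        cp.connector()(producer.get(), consumer.get());                   // step 5
    }

    int main()
    {
        Source source; Target target;
        source_port s(source); target_port t(target);
        connect(s, t);
        source.send(7);   // prints "received 7"
    }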
* In VectorPort Refines, the link PortVevtor is invalid, and PortVector is not defined previously.
Oops
* The names PortVector and VectorPort are confusing.
I agree 100%. For the time being, you can think of PortVector as a "vector of ports", and "VectorPort" as "a vector of ports that is itself a Port". I have a redesign of the Generic Support Layer in mind, and this nomenclature will go away.
* There is an error in the PortVector requirements:

    PortVector Traits   traits_of<>::type   PVT   The ComponentTraits of the component.

  Something is missing between <>, and the semantics do not match the name.
Yeah, that should be traits_of<PV>::type, and the description should be "The PortVectorTraits of the PortVector".
* The notation needs to explain what M and c are in the PortVector requirements:

    GetPort   get_port_c<M, I>(c)   Returns the I'th PortVectorTraits exposed by C

  The name get_port_c does not match the semantics very well. Is something wrong?
Yikes, that page has lots of errors.
* What does M stand for in the Component requirements?

    GetComponentPort   get_port<M, I>(c)   Returns the I'th Port exposed by C
That should be get_port<I, T>(c), where T is a tag
* In ComponentOperable the use of C seems more convenient than P?
Another messed up page...
* In "Setting up a producer Port and a consumer Port for VTK Now that we have the mechanism" What mechanism stands for (maybe this should be tag)?
Yep, should be tag. At one point, Mechanism became Tag, and then I brought back Mechanism for a different purpose. Recipe for disaster. Thanks for all the error-catching. As you see, there are some good reasons behind me thinking that the Generic Support Layer is not ready for review :-) But I'm glad you're giving it a look - I will give the docs for that layer a pass tomorrow and hopefully make them a little bit more up to snuff. Kind regards, Stjepan

On Mon, Sep 1, 2008 at 9:42 PM, Stjepan Rajko <stipe@asu.edu> wrote:
On Mon, Sep 1, 2008 at 4:23 PM, vicente.botet <vicente.botet@wanadoo.fr> wrote:
[snip documentation bugs]
Thanks for all the error-catching. As you see, there are some good reasons behind me thinking that the Generic Support Layer is not ready for review :-) But I'm glad you're giving it a look - I will give the docs for that layer a pass tomorrow and hopefully make them a little bit more up to snuff.
I have updated the generic support layer docs, and uploaded the new version: http://www.dancinghacker.com/code/dataflow/dataflow/support.html I (hopefully) corrected the errors you found as well as other ones. I have also updated the beginning of the VTK layer example to talk a little bit about the complemented ports it now uses (the latter part of the VTK layer example is still very much underdocumented). I also added another (undocumented) example of a support layer which is much simpler, but I'm not sure if it's any clearer without documentation. Thanks again, Stjepan

On Wed, Sep 3, 2008 at 12:10 PM, Stjepan Rajko <stipe@asu.edu> wrote:
I have updated the generic support layer docs, and uploaded the new version: http://www.dancinghacker.com/code/dataflow/dataflow/support.html
I just had my first glance at the Dataflow library documentation and I find it insufficient. Are there any examples of how the library is to be used? Are there any functions that are documented? For example, where is the documentation for the component_operation function template?

Many people understand the general concepts, but the devil is in the details, and so I think that good documentation is critical for a dataflow library.

A few questions:

I haven't looked closely, but I noticed that the library uses operator overloading to connect ports. What is the rationale for using operator overloading?

Are weak connections supported? Can I connect two nodes in such a way that the connection automatically tears down if one of the components is destroyed?

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode

On Wed, Sep 3, 2008 at 12:34 PM, Emil Dotchevski <emil@revergestudios.com> wrote:
On Wed, Sep 3, 2008 at 12:10 PM, Stjepan Rajko <stipe@asu.edu> wrote:
I have updated the generic support layer docs, and uploaded the new version: http://www.dancinghacker.com/code/dataflow/dataflow/support.html
I just had my first glance at the Dataflow library documentation and I find it insufficient. Are there any examples of how the library is to be used? Are there any functions that are documented? For example, where is the documentation for the component_operation function template?
Just to be sure - did you only see the generic support layer referenced above or the entire documentation? The full docs are at: http://www.dancinghacker.com/code/dataflow/ BTW, the generic support layer is only provided as an implementation detail of the Dataflow.Signals layer (which is the focus of this review). The documentation of the generic support layer may be helpful in understanding how the library works, but it is by no means complete.
Many people understand the general concepts, but the devil is in the details, and so I think that good documentation is critical for a dataflow library.
A few questions:
I haven't looked closely but I noticed that the library uses operator overloading to connect ports. What is the rationale for using operator overloading?
Operators are just syntactic sugar. Instead of a >>= b, one can write connect(a, b). The rationale is my belief that in some cases, operator expressions might be more readable and concise, e.g.,

    a >>= b >>= c;

instead of

    connect(a, b); connect(b, c);

Ultimately, the user can use whichever method he or she prefers.
Are weak connections supported? Can I connect two nodes in such a way that the connection automatically tears down if one of the components is destroyed?
I used to take advantage of the trackable functionality provided by Boost.Signals (it's just a matter of the library components inheriting from the appropriate class), which accomplishes this goal. At some point, I dropped it partly because the upcoming thread_safe_signals library didn't support that functionality (but I think things may have changed in the meantime), and partly because I ran into a case which didn't seem to be handled correctly by Boost.Signals (Doug Gregor thought it looked like a bug in Boost.Signals, but I never looked into it much further). I can reintroduce support for the trackable functionality in the library if desired (perhaps as an option). Thanks for taking a look! Stjepan
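For readers who haven't used it, the trackable behaviour being referred to is plain Boost.Signals functionality, independent of Dataflow. A minimal reminder of how it works: a slot bound to an object that derives from boost::signals::trackable is disconnected automatically when that object is destroyed.

    #include <boost/bind.hpp>
    #include <boost/signal.hpp>
    #include <boost/signals/trackable.hpp>
    #include <iostream>

    struct printer : boost::signals::trackable
    {
        void print(int x) { std::cout << "got " << x << "\n"; }
    };

    int main()
    {
        boost::signal<void (int)> sig;
        {
            printer p;
            sig.connect(boost::bind(&printer::print, &p, _1));
            sig(1);   // "got 1" -- the connection is live
        }             // p destroyed, the connection is torn down automatically
        sig(2);       // no output, and nothing dangles
    }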

Jaakko Järvi wrote:
The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.
[snip]
Hey,

The filter of the GIL example is always processed, whether or not it is multiplexed to the output, right? Maybe it would be good to insert a conditional component right before the filter, so that the data is dropped if the multiplexer takes data from slot 0. Otherwise a pointless computation is done when the filter is turned off.

I wanted to alter the example as suggested, but I don't know how to use the conditional component. I couldn't find an example for that. Maybe you could point out how to use it.

Kind Regards,
Manuel

Hi Manuel, On Tue, Sep 2, 2008 at 8:48 AM, Manuel Jung <gzahl@arcor.de> wrote:
Hey,
The filter of the GIL example is always processed, whether or not it is multiplexed to the output, right? Maybe it would be good to insert a conditional component right before the filter, so that the data is dropped if the multiplexer takes data from slot 0. Otherwise a pointless computation is done when the filter is turned off.
This is correct.
I wanted to alter the example as suggested, but I don't know how to use the conditional component. I couldn't find an example for that. Maybe you could point out how to use it.
The conditional component is a generic component that is used to make other components. As such, it wouldn't serve the purpose you need out of the box. But the junction component would: http://www.dancinghacker.com/code/dataflow/dataflow/signals/components/flow/... That particular page doesn't mention it (although the example demonstrates it), but junction also has gate functionality (it's mentioned in the component summary). Actually, "gate" would probably be a better name for this component. Incidentally, junction is implemented using the conditional component like this: http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/signals/boost/dataf... Best, Stjepan

Stjepan Rajko wrote: Hey,
Hi Manuel,
On Tue, Sep 2, 2008 at 8:48 AM, Manuel Jung <gzahl@arcor.de> wrote:
Hey,
The filter of the GIL example is always processed, whether or not it is multiplexed to the output, right? Maybe it would be good to insert a conditional component right before the filter, so that the data is dropped if the multiplexer takes data from slot 0. Otherwise a pointless computation is done when the filter is turned off.
This is correct.
I wanted to alter the example as suggested, but I don't know how to use the conditional component. I couldn't find an example for that. Maybe you could point out how to use it.
The conditional component is a generic component that is used to make other components. As such, it wouldn't serve the purpose you need out of the box. But the junction component would:
http://www.dancinghacker.com/code/dataflow/dataflow/signals/components/flow/...
That particular page doesn't mention it (although the example demonstrates it), but junction also has gate functionality (it's mentioned in the component summary). Actually, "gate" would probably be a better name for this component.
Ah, yes, I read the junction page, even opened it through the conditional page, but missed it. And I agree, "gate" would be a better name!
Incidentally, junction is implemented using the conditional component like this:
http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/signals/boost/dataf...
Best,
Stjepan
Kind regards Manuel

The review of Stjepan Rajko's Dataflow library starts today, September 1st, and will run until September 10th.
[snip]
While the Dataflow library contains other layers, only the Dataflow.Signals layer is ready for review. Reviewers are welcome to provide feedback for any part of the library, but please be aware that the documentation and implementation for the other layers may be lacking (for example, there is the generic support layer, which provides concepts applicable to different dataflow frameworks, and can be used to develop generic dataflow code, as well as the Dataflow.Blueprint layer which provides run-time reflection and modeling of dataflow networks in a Boost Graph Library graph for any dataflow framework with implemented Dataflow library support). For the time being, please consider these other layers as implementation details or proof-of-concept examples, as appropriate.
[snip]
- Do you think the library should be accepted as a Boost library?
Is the work proposed for inclusion in Boost in its current form? (In which case, we should take into account its scope and reject it if we consider it too limited.) Or are we to review this as a first part of something, with subsequent parts to be reviewed separately before anything is finally accepted? For example, am I right in thinking that the Signals component is not even usable without the Generic component, which is not being reviewed? Perhaps the review manager could comment.

A couple of initial thoughts, to provoke discussion:

- I hope everyone has seen the 2-dimensional "ASCII Art" dataflow graphs using operator overloading:

    // ---Connect the dataflow network -----------------------------------------
    //
    //                 ,---------.
    //                 | control | --------------------+
    //                 `---------'                     |
    //                      |                          |
    //                      v                          v
    //  ,-------.     ,-----------.                ,--------.
    //  | timer | --> | generator | -+-----------> 0        |     ,---------.
    //  `-------'     `-----------'  |             |  mux   | --> | display |
    //                               |  ,--------. |        |     `---------'
    //                               +->| filter |-> 1      |
    //                                  `--------' `--------'
    //
    // -------------------------------------------------------------------------

    timer >>= generator | mux.slot<0>() | (filter >>= mux.slot<1>());
    mux >>= display;
    control.value_signal >>= generator;
    control.select_signal >>= mux.select_slot();

My comment: if the operator overloading were good enough, you wouldn't need the picture in the comment block. Since it isn't good enough, why not stick to connect():

    connect(timer, generator);
    connect(generator, mux.slot<0>());
    connect(generator, filter);
    connect(filter, mux.slot<1>());
    connect(mux, display);
    connect(control.value_signal, generator);
    connect(control.select_signal, mux.select_slot());

- The hard aspects of this sort of thing are buffering, distributing work over threads, and so on. I don't think you have any of that; the components always process one datum at a time. As it happens, I've recently written some code that grabs an image from a web cam, annotates it with some text and shows it on the screen - very much like your example. This was using the video4linux2 API. To get a respectable framerate you need to queue up a number of frame requests in the kernel, but if you queue too many you'll get too much latency. I honestly can't say that I feel this library would have made this code any easier to write, or given any better results. I wonder if there are more complex motivating examples that would better illustrate it?

Regards, Phil.

On Tue, Sep 2, 2008 at 2:53 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
- Do you think the library should be accepted as a Boost library?
Is the work proposed for inclusion in Boost in its current form? (In which case, we should take into account its scope and reject it if we consider it too limited.) Or are we to review this as a first part of something, with subsequent parts to be reviewed separately before anything is finally accepted? For example, am I right in thinking that the Signals component is not even usable without the Generic component, which is not being reviewed?
I was intending the Generic layer to be considered as an implementation detail of the Dataflow.Signals layer. If Dataflow.Signals got accepted into boost, it would definitely need the generic layer to function, but I would not expose the generic layer (in the docs, for example) until it underwent its own review and got reviewed favorably. I left the docs for the other layers included for the review so that reviewers can have access to as much information as possible, but as was indicated in the review statement, please treat the other layers as implementation details (e.g., the generic layer) or as proof of concept examples (e.g., the Blueprint layer or the GUI editor based on it).
Perhaps the review manager could comment.
A more official opinion would be great.
A couple of initial thoughts, to provoke discussion:
- I hope everyone has seen the 2-dimensional "ASCII Art" dataflow graphs using operator overloading:
[snip]
My comment: if the operator overloading were good enough, you wouldn't need the picture in the comment block. Since it isn't good enough, why not stick to connect():
I agree that the operator syntax isn't always perfectly readable, especially until one gets used to it. Would it be useful if I provided both operator-based and connect-based construction of networks in the examples? I think in some cases the operator syntax is good enough (and if you have suggestions on how you would like the operator syntax to look so that it is more readable, I'm open to changing it).
- The hard aspects of this sort of thing are buffering, distributing work over threads, and so on. I don't think you have any of that; the components always process one datum at a time. As it happens I've recently written some code that grabs an image from a web cam, annotates it with some text and shows it on the screen - very much like your example. This was using the video4linux2 API. To get a respectable framerate you need to queue up a number of frame requests in the kernel, but if you queue too many you'll get too much latency. I honestly can't say that I feel this library would have made this code any easier to write, or given any better results. I wonder if there are more complex motivating examples that would better illustrate it?
I've been working on an example that uses thread_safe_signals and the threadpool library to allow asynchronous tasks. It was based on a suggestion by Manuel Jung - the thread can be seen here: http://tinyurl.com/5huap7 [nabble] I am planning to provide a couple more examples of how this could be added to Dataflow.Signals, based on Manuel Jung's latest feedback. Is this closer to what you would find useful? Thank you for your feedback, Stjepan

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
- Do you think the library should be accepted as a Boost library?
Is the work proposed for inclusion in Boost in its current form? (In which case, we should take into account its scope and reject it if we consider it too limited.) Or are we to review this as a first part of something, with subsequent parts to be reviewed separately before anything is finally accepted? For example, am I right in thinking that the Signals component is not even usable without the Generic component, which is not being reviewed? Perhaps the review manager could comment.
The roles of the layers are (Stjepan, feel free to correct me) as follows:

-- The Generic Support layer specifies a generic interface (a set of concepts) of any dataflow framework. The purpose of the Generic Support layer is to enable generic code that is parameterized over a particular dataflow framework. The library gives an example of such code: Dataflow.Blueprint.

-- Dataflow.Signals is one particular instance of a framework that conforms to the Generic Support Layer's interface (the library gives an example of another, VTK).

Per the request of the submitter, the library under review is the Dataflow.Signals library. As review manager, I will interpret a positive vote to mean support for including the Dataflow.Signals library in Boost, without any expectations or conditions regarding the other layers.

The division between the layers is not clear cut, as the Signals layer does rely on the Generic layer. The components that are necessary to include from the Generic layer will not be part of Dataflow.Signals's public interface. Of course, the usefulness of Dataflow.Signals as a stand-alone library is a relevant criterion in a review. The presence of the other layers gives the bigger picture, and shows what the author plans to bring forward in the future. I'm assuming that their own separate reviews will be required for those layers.

Best Regards,
Jaakko Järvi
Review Manager

I was reading other review commentary when it occurred to me that Boost.Accumulators is also implementing a dataflow system. I would like to know if Stjepan considered Boost.Accumulators and whether his library could build on some of that work. Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Sep 3, 2008 at 5:08 PM, David Abrahams <dave@boostpro.com> wrote:
I was reading other review commentary when it occurred to me that Boost.Accumulators is also implementing a dataflow system. I would like to know if Stjepan considered Boost.Accumulators and whether his library could build on some of that work.
I have looked at Boost.Accumulators, and really like its dataflow characteristics, especially the way it deals with dependencies. As far as building on it, at the moment the Dataflow library is in its entirety focused on connecting components at run-time, and (unless I am mistaken), the dataflow connections in a Boost.Accumulators accumulator_set are all determined at compile time.

So far, my only adventure into compile-time connections was a brief component composition study for Dataflow.Signals, where I verified that one could take a chain of components (what would at run-time be connected using the syntax, e.g., c1 >>= c2 >>= c3) and turn it into a single component that performs the same operation, but with the intermediate calls being connected at compile-time. By that I mean that, e.g., where in the run-time chain c1 would send a signal using Boost.Signals to c2, in the compile-time composition c1 calls c2 directly (with the call possibly being optimized out by the compiler).

At some point (and that might be far into the future), I would like to extend the Dataflow library to the compile-time realm. At that point, it will need a way for dataflow networks to be specified at compile time, and so far my thoughts on possible ways of doing that are:

* using Proto expression trees
* using, factoring out, or mimicking the way Boost.Accumulators does it
* using a compile-time metagraph library, like the one suggested by Gordon Woodhull: http://archives.free.net.ph/message/20080706.145113.313c713e.en.html

Best, Stjepan
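To make the compile-time composition idea above concrete, here is a small toy sketch (not Dataflow code): a chain of function objects is fused into a single callable in which each stage invokes the next directly, so the compiler is free to inline the whole pipeline.

    #include <iostream>

    // A stage fused with the rest of the pipeline: calling it runs the stage
    // and passes the result straight to the next callable.
    template <typename Stage, typename Next>
    struct fused
    {
        Stage stage;
        Next next;
        template <typename T>
        void operator()(T x) const { next(stage(x)); }
    };

    template <typename Stage, typename Next>
    fused<Stage, Next> fuse(Stage s, Next n) { return fused<Stage, Next>{s, n}; }

    struct doubler { int operator()(int x) const { return 2 * x; } };
    struct add_one { int operator()(int x) const { return x + 1; } };
    struct printer { void operator()(int x) const { std::cout << x << "\n"; } };

    int main()
    {
        // roughly the compile-time analogue of c1 >>= c2 >>= c3
        auto pipeline = fuse(doubler(), fuse(add_one(), printer()));
        pipeline(20);   // prints 41: printer(add_one(doubler(20)))
    }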

on Wed Sep 03 2008, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
On Wed, Sep 3, 2008 at 5:08 PM, David Abrahams <dave@boostpro.com> wrote:
I was reading other review commentary when it occurred to me that Boost.Accumulators is also implementing a dataflow system. I would like to know if Stjepan considered Boost.Accumulators and whether his library could build on some of that work.
I have looked at Boost.Accumulators, and really like its dataflow characteristics, especially the way it deals with dependencies. As far as building on it, at the moment the Dataflow library is in its entirety focused on connecting components at run-time, and (unless I am mistaken), the dataflow connections in a Boost.Accumulators accumulator_set are all determined at compile time.
I believe that is correct, except for "dropping" of accumulators, which adds a runtime component.

I am not really an expert on what you'd use such a library for. It seems like compile-time configurability is of much greater interest in general for problems you'd approach with dataflow, especially if you are using a DSEL to describe the system. In other words, when you actually *need* runtime configurability you'd probably want a graphical front-end or something, and the syntax of making connections in C++ wouldn't matter much. Am I missing something?
So far, my only adventure into compile-time connections was a brief component composition study for Dataflow.Signals, where I verified that one could take a chain of components (what would at run-time be connected using the syntax, e.g., c1 >>= c2 >>= c3) and turn it into a single component that performs the same operation, but with the intermediate calls being connected at compile-time. By that I mean that, e.g., where in the run-time chain c1 would send a signal using Boost.Signals to c2, in the compile-time composition c1 calls c2 directly (with the call possibly being optimized out by the compiler).
At some point (and that might be far into the future), I would like to extend the Dataflow library to the compile-time realm. At that point, it will need a way for dataflow networks to be specified at compile time, and so far my thoughts on possible ways of doing that are:

* using Proto expression trees
* using, factoring out, or mimicking the way Boost.Accumulators does it
* using a compile-time metagraph library, like the one suggested by Gordon Woodhull: http://archives.free.net.ph/message/20080706.145113.313c713e.en.html
Yeah, I couldn't really get a grip on what Gordon was describing, so I thought I'd wait for his "soon, soon" which AFAIK hasn't happened yet :-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Sep 4, 2008 at 6:51 AM, David Abrahams <dave@boostpro.com> wrote:
on Wed Sep 03 2008, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
I have looked at Boost.Accumulators, and really like its dataflow characteristics, especially the way it deals with dependencies. As far as building on it, at the moment the Dataflow library is in its entirety focused on connecting components at run-time, and (unless I am mistaken), the dataflow connections in a Boost.Accumulators accumulator_set are all determined at compile time.
I believe that is correct, except for "dropping" of accumulators, which adds a runtime component.
Yes, that is true.
I am not really an expert on what you'd use such a library for. It seems like compile-time configurability is of much greater interest in general for problems you'd approach with dataflow, especially if you are using a DSEL to describe the system. In other words, when you actually *need* runtime configurability you'd probably want a graphical front-end or something, and the syntax of making connections in C++ wouldn't matter much. Am I missing something?
You're right - in fact I already put together a proof-of-concept editor which can be used with any framework that has a Dataflow library support layer. Some videos can be seen here: http://dancinghacker.blip.tv/posts?view=archive
* using a compile-time metagraph library, like the one suggested by Gordon Woodhull: http://archives.free.net.ph/message/20080706.145113.313c713e.en.html
Yeah, I couldn't really get a grip on what Gordon was describing, so I thought I'd wait for his "soon, soon" which AFAIK hasn't happened yet :-)
I think he has some more descriptive posts, but that was the best I could find last night :-( If it helps any more to illustrate his idea, I think these examples are his: http://svn.boost.org/svn/boost/sandbox/metagraph/libs/metagraph/example/ Kind regards, Stjepan

on Thu Sep 04 2008, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
On Thu, Sep 4, 2008 at 6:51 AM, David Abrahams <dave@boostpro.com> wrote:
I am not really an expert on what you'd use such a library for. It seems like compile-time configurability is of much greater interest in general for problems you'd approach with dataflow, especially if you are using a DSEL to describe the system. In other words, when you actually *need* runtime configurability you'd probably want a graphical front-end or something, and the syntax of making connections in C++ wouldn't matter much. Am I missing something?
You're right - in fact I already put together a proof-of-concept editor which can be used with any framework that has a Dataflow library support layer.
Okay, but which applications need runtime configurability? Would not the performance advantages of a compile-time structure be more valuable than the flexibility of runtime configuration in most applications?
Some videos can be seen here:
Thanks, I recall looking at those. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Sep 4, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
on Thu Sep 04 2008, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
On Thu, Sep 4, 2008 at 6:51 AM, David Abrahams <dave@boostpro.com> wrote:
I am not really an expert on what you'd use such a library for. It seems like compile-time configurability is of much greater interest in general for problems you'd approach with dataflow, especially if you are using a DSEL to describe the system. In other words, when you actually *need* runtime configurability you'd probably want a graphical front-end or something, and the syntax of making connections in C++ wouldn't matter much. Am I missing something?
You're right - in fact I already put together a proof-of-concept editor which can be used with any framework that has a Dataflow library support layer.
Okay, but which applications need runtime configurability? Would not the performance advantages of a compile-time structure be more valuable than the flexibility of runtime configuration in most applications?
Distributed applications could be a good use case. If you have your components on different machines, you can't take advantage of compile-time structure anyways. Even for normal applications it is often very useful to be able to reorganize your pipeline without a recompile, at least for coarse-grained components, where the benefit of static checking and optimization might be less important. In particular, the ability to add or remove sinks and sources, or to disable optional components, is very useful. -- gpd

on Thu Sep 04 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Thu, Sep 4, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
Okay, but which applications need runtime configurability? Would not the performance advantages of a compile-time structure be more valuable than the flexibility of runtime configuration in most applications?
Distributed applications could be a good use case. If you have your components on different machines, you can't take advantage of compile-time structure anyways.
Even for normal applications it is often very useful to be able to reorganize your pipeline without a recompile, at least for coarse-grained components, where the benefit of static checking and optimization might be less important. In particular, the ability to add or remove sinks and sources, or to disable optional components, is very useful.
Good points, all. Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Sep 4, 2008 at 10:58 AM, David Abrahams <dave@boostpro.com> wrote:
on Thu Sep 04 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Thu, Sep 4, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
Okay, but which applications need runtime configurability? Would not the performance advantages of a compile-time structure be more valuable than the flexibility of runtime configuration in most applications?
Distributed applications could be a good use case. If you have your components on different machines, you can't take advantage of compile-time structure anyways.
Even for normal applications it is often very useful to be able to reorganize your pipeline without a recompile, at least for coarse-grained components, where the benefit of static checking and optimization might be less important. In particular, the ability to add or remove sinks and sources, or to disable optional components, is very useful.
Good points, all.
Indeed - thanks for mentioning these. Some more examples are when you would like to load the components dynamically (e.g., using Boost.Extension) so that new components could be added without touching the main application. In combination with the GUI system, it can also be used for rapid prototyping (in a future far far away, when/if the Dataflow library starts supporting compile-time specification of dataflow networks, I could see a cycle involving rapid prototyping using the GUI, and then exporting the final solution to a source file which can be used to produce an optimized version of the run-time network). Stjepan

on Thu Sep 04 2008, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
Indeed - thanks for mentioning these. Some more examples are when you would like to load the components dynamically (e.g., using Boost.Extension) so that new components could be added without touching the main application. In combination with the GUI system, it can also be used for rapid prototyping (in a future far far away, when/if the Dataflow library starts supporting compile-time specification of dataflow networks, I could see a cycle involving rapid prototyping using the GUI, and then exporting the final solution to a source file which can be used to produce an optimized version of the run-time network).
Maybe I am misremembering, but IIRC we did the same thing for the accumulators library by adding Python wrappers. Naturally that would mean there needed to be *some* kind of dynamic layer. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Stjepan: I'm just curious, because my wife uses such a dataflow program for interactive music performance. The program is called MAX and is distributed by a company named Cycling '74 (http://www.cycling74.com/products/max5). Some of the components are MIDI inputs or outputs, and the signal can consist of a scalar or more evolved entities like a soundwave, etc. It has MSP (audio signal processing) and Jitter (video processing) modules. It's been in use for more than 20 yrs, allows graphical interface programming, generation of C code, and separate compilation.

One of the worst quirks when flowing the data (in a push mode, I guess) is that components are processed in (get this) geometric order, from left to right (breaking ties with top to bottom, I think). I guess it's important to know what the order is, when you allow feedback loops (cycles). But the left-to-right has been a never-ending source of bugs, in my wife's experience :-)

Which prompts me to ask:
- can you create push networks with cycles?
- in the presence of cycles, how is the signal flowing to the data sinks?
- in the absence of cycles, do you process the components in topological sorting order? or is the evaluation order arbitrary?

Surely I could deduce that from your docs if I had read carefully the interface of the components, but why not discuss it up front? I'm lacking a general high-level view of what I can and can't do with this library.

A nice feature of Max/MSP though, is that you can create a "patch" which recursively acts as an atomic component, and can be compiled separately, so you can create libraries of patches some of which are opaque to enforce intellectual property (not that I approve, but...). Is there an abstract base class for component? This would enable one conceivably to distribute one's own components as DLLs (i.e., not source).

MAX was originally conceived by Miller Puckette, who went on to rewrite it in Tcl/Tk as PureData (http://en.wikipedia.org/wiki/Pure_Data). It still has connection to Midi, but is more geared toward data manipulation.

These are the ones I'm more familiar with. I just found the WikiPedia page on my own (http://en.wikipedia.org/wiki/Dataflow_programming) and *then* (only then) noticed the link in your description (second section of Introduction page documentation). It's way too discreet imho. Bring it out, write a short paragraph about some known examples, fire up the imagination, something. As it is, it's a really steep curve in the introduction (which is an introduction to your library, but not to the topic).

Beyond that, I've just started browsing, but I was a bit put off by the steepness of the documentation and the use of operator overloading. It would really benefit from a gentler introduction that is not based on the existing library but on the general idea of dataflow programming. It gets more concrete in the examples, but that's too late, I've already been hopelessly confused by that point :)

Cheers,
-- Hervé Brönnimann
hervebronnimann@mac.com

On Thu, Sep 4, 2008 at 8:24 PM, Hervé Brönnimann <hervebronnimann@mac.com> wrote:
Stjepan: I'm just curious, because my wife uses such a dataflow program for interactive music performance.
The program is called MAX and is distributed by a company named Cycling '74
I really appreciate you bringing up MAX - it was actually my exposure to MAX that eventually led me to write the Dataflow library. I had previously seen dataflow programming in LabVIEW, but it wasn't until MAX that I realized how easily people (with no formal programming training) did really cool things with such an environment. It led me to experiment with the dataflow paradigm within C++, and I eventually realized that Boost.Signals is a good way of connecting components together. That's how the Dataflow.Signals layer got started.
One of the worst quirks when flowing the data (in a push mode, I guess) is that components are processed in (get this) geometric order, from left to right (breaking ties with top to bottom, I think). I guess it's important to know what the order is, when you allow feedback loops (cycles). But the left-to-right has been a never-ending source of bugs, in my wife's experience :-)
Yep :-)
Which prompts me to ask: - can you create push networks with cycles?
Yes, as long as the components are designed in such a way that doesn't propagate the signal in an infinitely recursive loop. For example, MAX solves some situations with cycles using a convention - signals received on the leftmost inlet are typically processed and propagated, while those received on other inlets are just stored and not propagated. Hence, connecting something to a non-leftmost inlet can break the infinite recursion in cycles. Another way of dealing with cycles is using threading (with some components / sets of components executing in their own threads). The example that was developed at the end of the following thread would allow cycles through the use of threading: http://tinyurl.com/5huap7 [nabble] I hope to expand on that example soon.
- in the presence of cycles, how is the signal flowing to the data sinks?
In the case of the above threading example, at some point of the cycle a component would submit a task to a thread pool corresponding to the signal call (all data communication in the Dataflow.Signals layer happens through Boost.Signals).
- in the absence of cycles, do you process the components in topological sorting order? or is the evaluation order arbitrary?
Currently, arbitrary. In practice, I believe the signals get sent out according to the order in which the consumers were connected to the producer, but I don't think that Boost.Signals guarantees that. Boost.Signals does have a way of ordering signals: http://www.boost.org/doc/libs/1_36_0/doc/html/signals/tutorial.html#id347051... ... Unfortunately, Dataflow.Signals doesn't take advantage of it (yet). The library currently offers no way of customizing a connection (e.g., to specify the signaling order), which is a serious limitation that I need to fix.
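For reference, the ordering mechanism in plain Boost.Signals uses connection groups - this is Boost.Signals itself, not something Dataflow.Signals currently exposes:

#include <boost/signal.hpp>
#include <iostream>

void first(int x)  { std::cout << "first "  << x << std::endl; }
void second(int x) { std::cout << "second " << x << std::endl; }

int main()
{
    boost::signal<void (int)> sig;
    // slots in lower-numbered groups are invoked first, regardless of connection order
    sig.connect(1, &second);
    sig.connect(0, &first);
    sig(42);   // prints "first 42" then "second 42"
}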
Surely I could deduce that from your docs if I had read carefully the interface of the components, but why not discuss it up front? I'm lacking a general high-level view of what I can and can't do with this library.
This review is helping me tremendously in understanding which parts of the documentation are more useful than others, and what is missing. I think the docs are definitely going to get restructured :-) To answer your question though, I think the most useful thing about the Dataflow.Signals layer (this layer is the focus of the review) is that it provides tools to implement components that can be used in a dataflow network - both specific ones (e.g., something that generates a particular type of image), and generic ones (something that can be used with any signal signature / any number of arguments, e.g., a component that doubles each of the arguments it receives before passing them on). It also makes connecting components easier, IMO. As far as applications where a library like this might be useful, Giovanni Piero Deretta mentioned a few examples earlier in this thread. The specific domains would be determined by what components one has available / is willing to develop.
A nice feature of Max/MSP though, is that you can create a "patch" which recursively acts as an atomic component, and can be compiled separately, so you can create libraries of patches some of which are opaque to enforce intellectual property (not that I approve, but...). Is there an abstract base class for component? This would enable one conceivably to distribute one's own components as DLLs (i.e., not source).
Yes - the Dataflow.Blueprint layer offers such functionality (but note that this layer is still very much a prototype). As you're familiar with MAX, you might find interesting a sample visual editor I developed for the Dataflow library: http://dancinghacker.com/code/dataflow/dataflow/blueprint/examples/fltk_gui.... If you haven't seen them, there are some videos of the editor in action: http://dancinghacker.blip.tv/posts?view=archive&nsfw=dc
These are the ones I'm more familiar with. I just found the Wikipedia page on my own (http://en.wikipedia.org/wiki/Dataflow_programming) and *then* (only then) noticed the link in your description (second section of the Introduction page of the documentation). It's way too discreet imho. Bring it out, write a short paragraph about some known examples, fire up the imagination, something. As it is, it's a really steep curve in the introduction (which is an introduction to your library, but not to the topic).
OK, I can try to improve that part of the documentation.
Beyond that, I've just started browsing, but I was a bit put off by the steepness of the documentation and the use of operator overloading. It would really benefit from a gentler introduction that is not based on the existing library but on the general idea of dataflow programming. It gets more concrete in the examples, but that's too late, I've already been hopelessly confused by that point :)
Oh oh :-( I was hoping that the "Dataflow programming in C++" section would serve as a gentler introduction, but perhaps it does jump into the library rather quickly and steeply. I am curious, what did you find off-putting about the operator overloading? I am finding this to be a rather contentious issue. Thank you for your feedback, Stjepan

Stjepan Rajko wrote:
On Thu, Sep 4, 2008 at 8:24 PM, Hervé Brönnimann <hervebronnimann@mac.com> wrote:
Stjepan: I'm just curious, because my wife uses such a dataflow program for interactive music performance.
The program is called MAX and is distributed by a company named Cycling '74
I really appreciate you bringing up MAX - it was actually my exposure to MAX that eventually led me to write the Dataflow library. I had previously seen dataflow programming in LabVIEW, but it wasn't until MAX that I realized how easily people (with no formal programming training) did really cool things with such an environment.
This is some very interesting rationale that you hadn't previously shared with us ;-)

A question that had been forming in my mind was, "Who is this library for?". Do you see this library useful primarily for people with no (or little) formal programming training? If so, do you really think that Boost, or for that matter C++, is the right starting point? Or perhaps you believe that the benefits that those people see would also apply to mainstream software developers? Are you hoping that "regular C++ programmers like us" will start to use the dataflow style with the help of the library, or is there an existing body of dataflow programmers who currently use some other language who can be "converted"?

I also wonder how much of the benefit of a graphical environment like LabVIEW carries over into your textual dataflow description (even with 2D operator overloading). It seems to me that one of the main benefits of a GUI is that the user is somewhat guided towards a "syntactically correct" program by the help of, for example, labelled parameter fields to fill in on the components. That, and other aspects, are lost.

It may be significant that systems like LabView and MAX have not escaped from their niche application areas. "Real programmers prefer text" perhaps. Note that over the last couple of decades, chip design has almost entirely moved from GUI input (schematic capture) to textual input (hardware description languages). And while thinking about hardware description languages, note also that they don't expose any sort of dataflow model even though the underlying circuit often has that sort of structure. Similarly, when people were building "dataflow computers" back in the '80s they wrote compilers that hid the dataflow nature of the hardware behind a more conventional programming language (e.g. SISAL).

The closest to what you're proposing that I have seen is the stuff that the GNU Radio project is doing; I mentioned them on this list once before. (They have been in the news recently after operating a GSM base station using their software radio at Burning Man.) It would be really helpful if you could perhaps try to re-implement some of their stuff using your library, and see how it compares in terms of ease of coding, performance etc. You could ask them to compare it themselves and submit reviews here.

I think you really need to justify to us why this library is useful, and to whom.
- can you create push networks with cycles?
Yes, as long as the components are designed in such a way that they don't propagate the signal in an infinitely recursive loop.
I am reminded here of the design of asynchronous circuits using handshake signalling. You might like to have a look at Kees van Berkel's thesis, which seems to be visible at books.google.com. Regards, Phil.

I think you really need to justify to us why this library is useful, and to whom.
I plan to use Boost.Dataflow to stack protocol classes into protocol stacks (communication protocol stacks). Maybe I can use Boost.Dataflow for my RETE implementation. Oliver

On Fri, Sep 5, 2008 at 4:40 AM, Kowalke Oliver (QD IT PA SI) <Oliver.Kowalke@qimonda.com> wrote:
I plan to use Boost.Dataflow to stack protocol classes into protocol stacks (communication protocol stacks). Maybe I can use Boost.Dataflow for my RETE implementation.
Please let me know how it goes. RETE as in http://en.wikipedia.org/wiki/Rete_algorithm ? Stjepan

On Fri, Sep 5, 2008 at 4:30 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Stjepan Rajko wrote:
I really appreciate you bringing up MAX - it was actually my exposure to MAX that eventually led me to write the Dataflow library. I had previously seen dataflow programming in LabVIEW, but it wasn't until MAX that I realized how easily people (with no formal programming training) did really cool things with such an environment.
This is some very interesting rationale that you hadn't previously shared with us ;-)
A question that had been forming in my mind was, "Who is this library for?". Do you see this library useful primarily for people with no (or little) formal programming training?
Thanks for bringing this up - it is a very important piece of information that I completely neglected to address in the documentation. In its current state, I think the library is more geared towards people that are designing plugin-based dataflow systems (like MAX or LabVIEW), or developing an application / library for which the dataflow paradigm is particularly suitable. In the context of Dataflow.Signals, that means providing facilities for programmers to create new components, and providing different ways in which the components can be connected (e.g., through a GUI, using operators, using the connect function, using the bind_mem_fn helper). Once someone develops a reasonably complete set of components for a particular domain, *then* a person with less programming experience could use those components to create their own applications, assuming they are also comfortable with whatever facility for connecting is given to them (GUI vs. connecting in C++).
If so, do you really think that Boost, or for that matter C++, is the right starting point? Or perhaps you believe that the benefits that those people see would also apply to mainstream software developers? Are you hoping that "regular C++ programmers like us" will start to use the dataflow style with the help of the library, or is there an existing body of dataflow programmers who currently use some other language who can be "converted"?
I have a few points related to these questions.

I can tell you from my personal experience as a "regular C++ programmer" that, after I started playing with MAX, I found that way of programming (meaning dataflow, not necessarily needing a GUI editor) very well suited for certain kinds of applications. I worked with signal processing quite a bit (in C++), and after seeing MAX I found myself trying to implement signal processing applications in a dataflow sort of way - developing components, and then connecting them. I came up with a precursor to Dataflow.Signals, and found it very useful for signal processing (I used the operator syntax with it). It eased development and maintenance, simply because it allowed me to program in a paradigm that matched the application very well.

To address its applicability to Boost, I have two angles on that. For one, I would assume that a good number of programmers that use Boost have clients with limited programming expertise. They could use Dataflow.Signals like I mentioned above - by developing a relevant set of components, and training the client on how to connect them and use them. Again, from my experience, people with absolutely no programming experience can understand the dataflow paradigm very quickly. I've seen it happen. I think it is because it uses metaphors that are readily accessible to many people, as we tend to connect things in our daily lives (e.g., connecting headphones to an iPod, a DVD player to a TV, a guitar to a pedal to an amp), and we are also used to tweaking the behavior of the components to get the desired result (e.g., turning the volume knob).

As far as the other angle... Bjarne Stroustrup gave a very interesting talk at the last BoostCon, and it had to do with providing different levels of accessibility / configurability trade-offs to C++ programmers. E.g., an entry-level version of a library (or set of libraries) that is super easy to use (and debug with) but maybe not very flexible (although providing enough features to make it useful & fun), vs. a fully templated super powerful version of the same library that can assume you really know what you are doing. I actually do see the Dataflow library as something that can address the high accessibility / low configurability (of individual components) end of the spectrum, especially when combined with a GUI. Furthermore, I think it can be conducive to leading people to learn more about C++ (because the system itself could be used without any C++ knowledge whatsoever). Sooner or later, many users of a GUI/dataflow system will find the need to extend a component in some new way or develop a new component. It wouldn't be difficult to show someone how to take an existing component that adds two numbers, and change it to something that multiplies the two numbers (granted, they would have to learn how to compile C++ / Boost, which can be non-trivial). If that sparks their interest and they learn more C++, perhaps they will advance to wanting to develop a component that works with files, and learn Boost.Filesystem :-)
I also wonder how much of the benefit of a graphical environment like LabVIEW carries over into your textual dataflow description (even with 2D operator overloading). It seems to me that one of the main benefits of a GUI is that the user is somewhat guided towards a "syntactically correct" program by the help of, for example, labelled parameter fields to fill in on the components. That, and other aspects, are lost.
Yes, the operator syntax is a gross simplification / approximation and probably only suitable for simple networks. As far as labeling fields, the library offers some help. It is possible to provide appropriately named member functions that return a port - for example, the storage component has a member function send_slot(), which returns the port corresponding to the send() member function. So, if I do:

storage<void()> initiate_send_of_zero;
storage<void(int)> zero(0);
storage<void(int)> receiver;

connect(initiate_send_of_zero, zero.send_slot());
connect(zero, receiver);

initiate_send_of_zero.send();

what will happen is: initiate_send_of_zero will send a void() signal to the zero.send() function (i.e., the zero.send() function will be called); when zero.send() is called, it will send its stored value (0) to receiver. receiver now holds a 0.

In addition, the library offers a way to enumerate all of the ports - you can do things like:

connect(get_port_c<0>(a), get_port_c<1>(b));

there are also mpl versions:

connect(get_port<mpl::int_<0> >(a), get_port<mpl::int_<1> >(b));

So, you can have member typedefs for the ports that carry the name of the port (e.g., typedef mpl::int_<0> output;), in which case you could do:

connect(get_port<A::output>(a), get_port<B::input>(b));

Unfortunately, registering all the ports so they can be enumerated is an advanced topic and not discussed in the documentation yet :-(
It may be significant that systems like LabView and MAX have not escaped from their niche application areas. "Real programmers prefer text" perhaps. Note that over the last couple of decades, chip design has almost entirely moved from GUI input (schematic capture) to textual input (hardware description languages).
The Dataflow library as a whole is an attempt to provide dataflow functionality regardless of the programming medium (text or visual).
And while thinking about hardware description languages, note also that they don't expose any sort of dataflow model even though the underlying circuit often has that sort of structure. Similarly, when people were building "dataflow computers" back in the '80s they wrote compilers that hid the dataflow nature of the hardware behind a more conventional programming language (e.g. SISAL).
I have found dataflow programming useful precisely when it matches the underlying problem structure.
The closest to what you're proposing that I have seen is the stuff that the GNU Radio project is doing; I mentioned them on this list once before. (They have been in the news recently after operating a GSM base station using their software radio at Burning Man.) It would be really helpful if you could perhaps try to re-implement some of their stuff using your library, and see how it compares in terms of ease of coding, performance etc. You could ask them to compare it themselves and submit reviews here.
I tried suggesting a Google Summer of Code project along these lines: http://lists.gnu.org/archive/html/discuss-gnuradio/2008-02/msg00247.html ... but got very little response. In terms of reimplementing - when it comes to an existing dataflow system, the purpose of the dataflow library wouldn't be so much in reimplementing it, but helping build on the core of the dataflow framework. I.e., we can take the base GNU Radio framework which provides the component / connection code, and develop a Dataflow support layer for it. Then, we get things like the GUI for free. Unfortunately, it seems like GNU radio doesn't support pure C++ programs fully yet (I will give it a try when it does). I have done a similar experiment with VTK though - in the dataflow library, there is an example in libs/dataflow/example/glv_gui/glvgui_vtk which brings up the GUI editor with the 5 components from the following example: http://www.dancinghacker.com/code/dataflow/dataflow/support/examples/new_lay... With the GUI editor, you can instantiate the 5 components, connect them in the correct chain, and click on the final component to invoke the rendered scene.
I think you really need to justify to us why this library is useful, and to whom.
This has definitely been lacking, but I hope I did a little bit of a better job in my response here.
- can you create push networks with cycles?
Yes, as long as the components are designed in such a way that they don't propagate the signal in an infinitely recursive loop.
I am reminded here of the design of asynchronous circuits using handshake signalling. You might like to have a look at Kees van Berkel's thesis, which seems to be visible at books.google.com.
Will do - thanks for the reference. And thank you for the continuing discussion about the library, it is very helpful. Best, Stjepan

Phil Endecott wrote: <snip>
It may be significant that systems like LabView and MAX have not escaped from their niche application areas. "Real programmers prefer text" perhaps. Note that over the last couple of decades, chip design has almost entirely moved from GUI input (schematic capture) to textual input (hardware description languages).
And while thinking about hardware description languages, note also that they don't expose any sort of dataflow model even though the underlying circuit often has that sort of structure. Similarly, when people were building "dataflow computers" back in the '80s they wrote compilers that hid the dataflow nature of the hardware behind a more conventional programming language (e.g. SISAL).
I don't think I understand. Verilog supports dataflow modeling and for anything non-trivial is the main mode of modeling. What am I missing?
The closest to what you're proposing that I have seen is the stuff that the GNU Radio project is doing; I mentioned them on this list once before. (They have been in the news recently after operating a GSM base station using their software radio at Burning Man.) It would be really helpful if you could perhaps try to re-implement some of their stuff using your library, and see how it compares in terms of ease of coding, performance etc. You could ask them to compare it themselves and submit reviews here.
I think you really need to justify to us why this library is useful, and to whom.
I think questioning a library's audience and usefulness is a requirement but I am surprised at the number of people who don't see how/why dynamic dataflow is useful. Are you questioning the value of dataflow modeling or this specific library? I use a dataflow framework that is driven from XML description files. The framework utilizes dynamically loaded components as Stjepan has described in an earlier post. The framework has been deployed in applications ranging from digital video/audio systems to aviation flight/meteorology equipment in international airports. I hope to have an opportunity to review the Dataflow library. -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com

Michael Caisse wrote:
Phil Endecott wrote:
And while thinking about hardware description languages, note also that they don't expose any sort of dataflow model even though the underlying circuit often has that sort of structure.
I don't think I understand. Verilog supports dataflow modeling and for anything non-trivial is the main mode of modeling. What am I missing?
In Verilog (or VHDL), if I have two components that I want to "pipe" together I need to declare a wire that will be the channel for the communication and then declare the two components with this wire connected to the appropriate port. Something like this:

wire[7:0] a;
ExampleSource src (.the_output(a));
ExampleSink sink (.the_input(a));

As far as I am aware, neither language has syntax to pipe them together more concisely, i.e.

ExampleSource src >>= ExampleSink sink;
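For comparison, the closest C++ analogue of that explicit-wire style, sketched with plain Boost.Signals and made-up component names (nothing specific to the library under review), would be something like:

#include <boost/signal.hpp>
#include <boost/bind.hpp>
#include <iostream>

// made-up sink with a named input
struct ExampleSink
{
    void the_input(int x) { std::cout << x << std::endl; }
};

int main()
{
    // the explicit "wire": a channel object declared separately from both components
    boost::signal<void (int)> wire;

    ExampleSink sink;
    wire.connect(boost::bind(&ExampleSink::the_input, &sink, _1));

    // a source would be handed a reference to 'wire' and invoke it when it produces data
    wire(42);
}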
I think questioning a library's audience and usefulness is a requirement but I am surprised at the number of people who don't see how/why dynamic dataflow is useful. Are you questioning the value of dataflow modeling or this specific library?
I am questioning it all, but with an open mind ready to be convinced of its value.
I use a dataflow framework that is driven from XML description files. The framework utilizes dynamically loaded components as Stjepan has described in an earlier post. The framework has been deployed in applications ranging from digital video/audio systems to aviation flight/meteorology equipment in international airports.
I hope to have an opportunity to review the Dataflow library.
Please do! As soon as some people who actually know something about Dataflow start posting comments then I will shut up. My comments are intended really just to provoke debate. Regards, Phil.

In Verilog (or VHDL), if I have two components that I want to "pipe" together I need to declare a wire that will be the channel for the communication and then declare the two components with this wire connected to the appropriate port. Something like this:
wire[7:0] a; ExampleSource src (.the_output(a)); ExampleSink sink (.the_input(a));
As far as I am aware, neither language has syntax to pipe them together more concisely, i.e.
ExampleSource src >>= ExampleSink sink;
I've been lurking and have faced many of these problems both in VHDL and more recently in designing multi-processor pluggable signal processors in software that go even further than dataflow with their dynamic reconfigurability.

I've found the separate concept of a wire to be useful in some situations (separating data distribution from the processing), particularly where flows can be 'one to many' either as copies or read only, where in-place data operations provide performance benefits, and where persistence of the data (even if just short term) is usefully managed by some form of controller. Typically these all involve multiple threads, so simple assumptions about data availability cannot be made. That said, when I've not needed it, the software complexity of having explicit wires over something implicit in the linking of components is an overhead.

I really like the dataflow work and am watching with interest, but ongoing changes in my circumstances mean I can't do a full review or participate more fully.

I would be interested in what the use cases are that dataflow is tackling. e.g. which of:

1) 'Single shot' data flow - all components run once. Typically the data being passed is pre-sized such that the size of the block on inputs gives rise to the answer on the output. e.g. Take the whole music file and then filter it all in one hit to give one answer

2) Packetised data flow (components laid out once and then operating on data packets) with explicit packet aggregation by components and having 'do_processing' events driven by the availability of a packet from its inputs. A component may consume three input packets and deliver one. e.g. Process music in 20ms lumps e.g. using FFTs for filtering

3) Streaming use cases (components laid out once and then operating on data streams). Each component is called at a sample rate but perhaps only processes or packages a 'packet' 1 out of every N calls (or when all inputs have provided enough data to satisfy a block algorithm). e.g. do continuous processing at front end FIR filters, but explicitly decimate and manage a slower processing rate in the back end of the processing. This one is akin to much hardware I've designed where one may have clock domains and manage data events separate from processing events. In some places one can do processing every clock cycle, other places things are more complex.

4) Not only are data events managed, but the processing flow itself may be dynamically altered - e.g. music contains some vocals so I'll add a flow through a 'remove vocals' process (normally don't bother) or perhaps other dynamic constraints from user interaction will dynamically alter the flows being used and there are sufficient permutations not to pre-allocate a full network with switched decision points.

Which of the above is dataflow suited to?

Thanks to all who have provided various interesting links. To add to that list... I found the descriptions and tutorials accompanying the product Gedae (www.gedae.com) to nicely decouple and present many of the concepts familiar to hardware designers that a software based dataflow library may also want to consider. I also found the research and development pages at Insomniac Games provide another useful perspective, where they manage both data and component instantiation and flow on the parallel cores of the PlayStation 3: http://www.insomniacgames.com/tech/techpage.php .
While they largely deal with implementation issues with their SPU shaders and Dynamic component system, the focus is making things work in a multi-core system where dataflow also needs awareness of the resources on which to run. Regards and good luck with the review Paul Baxter

On Sun, Sep 7, 2008 at 8:43 AM, Paul Baxter <pauljbaxter@hotmail.com> wrote:
I really like the dataflow work and am watching with interest, but ongoing changes in my circumstances mean I can't do a full review or participate more fully.
Thank you for taking the time to join the discussion!
I would be interested in what the use cases are that dataflow is tackling. e.g. which of:
I will answer your questions from the standpoint of what the Dataflow.Signals framework / layer offers, since that one is the focus of the review. Dataflow.Signals is intended for component-run, signal-driven processing. By that I mean that the network has no brains whatsoever. It is up to the components to send signals, decide when to propagate signals and when not to, etc. It is possible to have something controlling the network from the outside (e.g., activating a component, inserting some data, grabbing a result), but none of this is done by the framework - it has to be done by the user. Below, I will provide examples of what components you might need (unless I specify that a component is a Dataflow.Signals component, the component would have to be implemented) and how you'd connect them and run the network.
1) 'Single shot' data flow - all components run once. Typically the data being passed is pre-sized such that the size of the block on inputs gives rise to the answer on the output. e.g. Take the whole music file and then filter it all in one hit to give one answer
Example:

// this would be a component that can read a music file.
// It provides a member function .send() that will send the
// contents of the file via a signal of signature void(const MusicFile &)
whole_music_file_reader reader("file.in");

// this would be a filter component that takes as input a music file
// and outputs a filtered version
music_filter filter;

// this component is provided by Dataflow.Signals. It
// will store values from incoming signals.
signals::storage<void(const MusicFile &)> result;

// connect
reader >>= filter >>= result;

// run once
reader.send();

// we have the result - the at<0> member function will access it
// (0 because the MusicFile is the 1st parameter of the signature)
result.at<0>().play();
2) Packetised data flow (components laid out once and then operating on data packets) with explicit packet aggregation by components and having 'do_processing' events driven by the availability of a packet from its inputs. A component may consume three input packets and deliver one. e.g. Process music in 20ms lumps e.g. using FFTs for filtering
You could do this as long as the components could figure out everything locally - but depending on the details this might not be the most suited task for Dataflow.Signals. Here is something you could do:

// A Dataflow.Signals component which will run in its own thread,
// and generate periodic void() signals
signals::timed_generator<void()> timer;

// A packetised sound source. Each time it receives a void() signal,
// it sends a void(const Packet &) signal
sound_source source;

// Some packet filters. Let's say that the filter will consume
// 3 packets before producing 1, and then produce a packet
// on each packet received. That is, if its input packets are
// ip1, ip2, ip3, ip4... and its output packets are opa, opb, opc...
// the filter would use ip1, ip2, ip3 to produce opa, it would
// use ip2, ip3, ip4 to produce opb, etc.
packet_filter filter1(some_filter_fn), filter2(some_other_filter_fn);

// connect the network
connect(timer, source);
connect(source, filter1);
connect(source, filter2);

// set the timer to produce a signal every 20ms
timer.enable(0.02);

So far so good - the timer will drive the source, which will provide the input to the filters (the filters need to take care of buffering the inputs themselves). But now we get to places where the suitability of Dataflow.Signals breaks down. For example, what about the results of filter1 and filter2? If they each go to separate outputs (call them output1 and output2), no problem:

connect(filter1, output1);
connect(filter2, output2);

but what if they both go to the same output (which combines them in some way)? Then that output needs to be smart about how to handle two inputs. It has to provide two input ports:

connect(filter1, output.input_port_1());
connect(filter2, output.input_port_2());

This is not the problem - the problem is that output will get calls from each of the filters, and have no idea what frames of source data the filtered data corresponds to. It could use a naive strategy where it waits to get both an input from port_1 and an input from port_2, then combine, then repeat, but that is not very robust (what if filter2 was connected a few frames after filter1?). So, we'd need to add some information to the data, which would allow the components to be "smart enough". In this case, we would explicitly need to add something like a frame number to the Packet, and the component would have to figure out what to do with data originating from different frames. Not ideal.
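For illustration, here is a rough sketch of the kind of frame-aware combining component described above, in plain C++ with a made-up Packet type (none of this is Dataflow.Signals-specific):

#include <map>

struct Packet
{
    unsigned frame;   // the frame number added to the data, as described above
    // ... samples ...
};

class frame_combiner
{
public:
    // called through the connection from filter1
    void input_port_1(const Packet &p) { add(p, from_1_, from_2_); }
    // called through the connection from filter2
    void input_port_2(const Packet &p) { add(p, from_2_, from_1_); }

private:
    typedef std::map<unsigned, Packet> buffer;

    void add(const Packet &p, buffer &mine, buffer &other)
    {
        buffer::iterator match = other.find(p.frame);
        if (match == other.end())
        {
            mine[p.frame] = p;          // wait for the matching half of this frame
            return;
        }
        combine(p, match->second);      // both halves of the frame are available
        other.erase(match);
    }

    void combine(const Packet &, const Packet &) { /* mix and forward the result */ }

    buffer from_1_, from_2_;
};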
3) Streaming use cases (components laid out once and then operating on data streams). Each component is called at a sample rate but perhaps only processes or packages a 'packet' 1 out of every N calls (or when all inputs have provided enough data to satisfy a block algorithm). e.g. do continuous processing at front end FIR filters, but explicitly decimate and manage a slower processing rate in the back end of the processing.
If you could put all of the logic into the components, you could do this, but this is where the Dataflow.Signals framework would probably be unsuitable. I started working on a different framework (called Dataflow.Managed), which is an example of a smarter network. In what I have so far, the framework takes care of things like "only invoke a component when its inputs have changed", and figures out the correct order of invocation, but one could similarly extend Dataflow.Managed (or create a new framework) that would allow the user to specify things like "only invoke a component when all its inputs are ready", or other things that are needed.
This one is akin to much hardware I've designed where one may have clock domains and manage data events separate from processing events. In some places one can do processing every clock cycle, other places things are more complex.
One could make a Dataflow.Signals network where each component is connected to a clock, so that each component does get control at every clock cycle and does what it wants. But, the user would have to make sure that the order of invocation is correct - and that's why I'd say Dataflow.Signals is not the right tool for a network like this.
4) Not only are data events managed, but the processing flow itself may be dynamically altered - e.g. music contains some vocals so I'll add a flow through a 'remove vocals' process (normally don't bother) or perhaps other dynamic constraints from user interaction will dynamically alter the flows being used and there are sufficient permutations not to pre-allocate a full network with switched decision points.
This is not a problem - the network can be dynamically altered.
Which of the above is dataflow suited to?
I hope the above illustrated what the Dataflow.Signals layer is suitable or unsuitable for. Ideally, the Dataflow library as a whole will evolve to a point where it can accommodate frameworks that can handle all of the above appropriately - then it will be a matter of providing implementations of frameworks that function in the appropriate way (or writing support layers for frameworks that already do).
Thanks to all who have provided various interesting links. To add to that list... I found the descriptions and tutorials accompanying the product Gedae (www.gedae.com) to nicely decouple and present many of the concepts familiar to hardware designers that a software based dataflow library may also want to consider.
I also found the research and development pages at insomniac games provide another useful perspective where they manage both data and component instantiation and flow on the parallel cores of the playstation 3. http://www.insomniacgames.com/tech/techpage.php . While they largely deal with implementation issues with their SPU shaders and Dynamic component system, the focus is making things work in a multi-core system where dataflow also needs awareness of the resources on which to run.
Regards and good luck with the review
Thanks! Kind regards, Stjepan

On Sunday 07 September 2008 14:57:21 Stjepan Rajko wrote:
This one is akin to much hardware I've designed where one may have clock domains and manage data events separate from processing events. In some places one can do processing every clock cycle, other places things are more complex.
One could make a Dataflow.Signals network where each component is connected to a clock, so that each component does get control at every clock cycle and does what it wants. But, the user would have to make sure that the order of invocation is correct - and that's why I'd say Dataflow.Signals is not the right tool for a network like this.
This is my fundamental problem with my use cases for the dataflow library. In order to write a review, I tried to implement a simple OFDM demodulator. Samples come in at the "correct" rate and are stored in a cyclic buffer; once in a while, an FFT is performed, and then a channel equalizer block provides the equalized signal and soft information at each carrier.

I could not figure out how to introduce the notion of a clock, which is necessary for modeling concurrent hardware execution. Initially, it seemed that the dataflow library provided the tools required to replace SystemC in a cleaner and more efficient way, but I haven't yet managed to reach that goal. In order to model a clock, the connection between two modules would need to be managed by an object that knows about the clock. In particular, a value written to a connection object would not be visible before the next clock (next active edge for the Verilog/VHDL people here). Such an object is easy enough to write until one encounters multiple clock domains (pre- and post-FFT in the simple example above) with feedback.

In the documentation, you mentioned a "pin-based" approach. Such an approach would seem to map very well to modeling concurrent hardware, but the notion of clock construction eludes me even for that case. A combination of a clock-based connection, a pin-based model and a suite of fixed-point numbers based on expression templates would be sufficient to replace SystemC for my use cases.

As an aside, I have been waiting (a long time) for Maurizio Vitale to post his completed fixed-point classes based on Proto. Does anyone know what has happened to him? I haven't seen anything from him on this list for a while.

Regards, Ravi

On Mon, Sep 8, 2008 at 3:09 PM, Ravikiran Rajagopal <ravi.rajagopal@amd.com> wrote:
One could make a Dataflow.Signals network where each component is connected to a clock, so that each component does get control at every clock cycle and does what it wants. But, the user would have to make sure that the order of invocation is correct - and that's why I'd say Dataflow.Signals is not the right tool for a network like this.
This is my fundamental problem with my use cases for the dataflow library. In order to write a review, I tried to implement a simple OFDM demodulator. Samples come in at the "correct" rate and are stored in a cyclic buffer; once in a while, an FFT is performed, and then a channel equalizer block provides the equalized signal and soft information at each carrier.
Thanks for giving it a try!
I could not figure out how to introduce the notion of a clock, which is necessary for modeling concurrent hardware execution. Initially, it seemed that the dataflow library provided the tools required to replace SystemC in a cleaner and more efficient way, but I haven't yet managed to reach that goal. In order to model a clock, the connection between two modules would need to be managed by an object that knows about the clock. In particular, a value written to a connection object would not be visible before the next clock (next active edge for the Verilog/VHDL people here). Such an object is easy enough to write until one encounters multiple clock domains (pre- and post-FFT in the simple example above) with feedback.
Multiple clock domains with feedback definitely sounds like a scenario not suitable for Dataflow.Signals. I think in general, as soon as you start needing a control object that needs to have some overall knowledge of the network (unless it is a specific network for which a custom control object can be implemented), Dataflow.Signals is probably the wrong tool.
In the documentation, you mentioned a "pin-based" approach. Such an approach would seem to map very well to modeling concurrent hardware, but the notion of clock construction eludes me even for that case. A combination of a clock-based connection, a pin-based model and a suite of fixed-point numbers based on expression templates would be sufficient to replace SystemC for my use cases.
Perhaps a Dataflow library support layer could be written for SystemC. I looked at SystemC a while back when someone referred me to it, but never ended up doing anything with it...

Anyway, regarding the pin-based approach - recently I started writing a framework for the Dataflow library called Dataflow.Managed, which is inspired by the pin-based approach mentioned in the documentation. You can see a test case here: http://svn.boost.org/svn/boost/sandbox/SOC/2007/signals/libs/dataflow/test/m... (sorry it is not well documented) The Dataflow.Managed layer provides "intelligent" invocation of components - it will only invoke components whose inputs have changed. Even though it is not specifically geared towards clock-driven networks, you could have a clock component that changes its output at every edge or clock cycle, and thereby invokes each of the components connected to it. If you wanted to do hardware simulation, the components would still need to simulate signal delays (a simple solution could be that the components read and buffer their inputs in one half of the clock cycle, and write to their outputs in the other), so this would still not be an ideal solution.

I think after I finalize the generic layer of the Dataflow library (this is the underlying layer of concepts on top of which Dataflow.Signals is built), I will try to provide small example frameworks suitable for different kinds of scenarios. A framework specifically intended for clock-based networks should definitely be one of those examples.

Thanks again for trying to implement something using the library - I'm sorry it turned out Dataflow.Signals was the wrong tool for the job (in the future I will know to do a better job of describing the suitability/unsuitability of Dataflow.Signals in the documentation). If you feel like you've gotten a good sense of where Dataflow.Signals is useful and where it isn't (perhaps you were able to implement some simplified version of the problem?), I would still encourage you to write a review if you have the time.
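To make the read-in-one-half / write-in-the-other idea a bit more concrete, here is a rough sketch of a double-buffered connection object for a single clock domain (plain C++ with made-up names - not part of Dataflow, Dataflow.Managed, or SystemC):

template <typename T>
class clocked_wire
{
public:
    clocked_wire() : current_(), next_() {}

    // producer side: the written value is not yet visible to readers
    void write(const T &value) { next_ = value; }

    // consumer side: always sees the value as of the last clock edge
    const T &read() const { return current_; }

    // called once per active edge by whatever drives this clock domain
    void clock_edge() { current_ = next_; }

private:
    T current_, next_;
};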
As an aside, I have been waiting (a long time) for Maurizio Vitale to post his completed fixed-point classes based on Proto. Does anyone know what has happened to him? I haven't seen anything from him on this list for a while.
You might want to re-post this part in a new e-mail with an appropriate subject line - it is likely to go unnoticed here. Kind regards, Stjepan

Phil Endecott wrote:
Michael Caisse wrote:
I don't think I understand. Verilog supports dataflow modeling and for anything non-trivial is the main mode of modeling. What am I missing?
In Verilog (or VHDL), if I have two components that I want to "pipe" together I need to declare a wire that will be the channel for the communication and then declare the two components with this wire connected to the appropriate port. Something like this:
wire[7:0] a; ExampleSource src (.the_output(a)); ExampleSink sink (.the_input(a));
As far as I am aware, neither language has syntax to pipe them together more concisely, i.e.
ExampleSource src >>= ExampleSink sink;
Maybe I need some educating. It wouldn't be the first time (o; . I thought Dataflow programming dealt primarily with the fact that components have inputs and outputs. These inputs/outputs are connected to other components. Components "execute" once the required inputs are available. Is there a definition that requires the connection to be defined by a pipe-like construct, or can the binding be a wire? I'm not sure that the binding description is important. For my own work I describe the components and then describe the bindings. I think of it as wiring up the outputs and inputs (which can of course fan out). Eventually I get a directed graph. I guess this sounds a lot like the Verilog Data Flow abstraction to me. I assume I am still missing something.
Please do! As soon as some people who actually know something about Dataflow start posting comments then I will shut up. My comments are intended really just to provoke debate.
Regards, Phil.
Debate is good. Best Regard- Michael -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com

On Sat, Sep 6, 2008 at 4:03 PM, Michael Caisse <boost@objectmodelingdesigns.com> wrote:
I use a dataflow framework that is driven from XML description files. The framework utilizes dynamically loaded components as Stjepan has described in an earlier post. The framework has been deployed in applications ranging from digital video/audio systems to aviation flight/meteorology equipment in international airports.
Nice! Out of curiosity, (if you can share) what XML syntax did you use? With one of the dataflow systems I worked on I ended up using something like this:

<component name="A" type="some_source_type" parameter1="value1" parameter2="value2" />

<component type="some_sink_type" some_other_parameter="other_value">
  <input name="A"/>
</component>

At some point I hope to provide an example that uses XML configuration files to initialize a network, but built on top of the Dataflow Generic Support Layer so that it can be used with any framework with an implemented support layer (in the same way the GUI example does). There are some nice possibilities using Boost.Fusion with reading the parameters from the XML file and providing them type-safely to the constructor of the component.
I hope to have an opportunity to review the Dataflow library.
That would be great! Best, Stjepan

Stjepan Rajko wrote:
Nice! Out of curiosity, (if you can share) what XML syntax did you use? With one of the dataflow systems I worked on I ended up using something like this:
<component name="A" type="some_source_type" parameter1="value1" parameter2="value2" />
<component type="some_sink_type" some_other_parameter="other_value"> <input name="A"/> </component>
At some point I hope to provide an example that uses XML configuration files to initialize a network, but built on top of the Dataflow Generic Support Layer so that it can be used with any framework with an implemented support layer (in the same way the GUI example does). There are some nice possibilities using Boost.Fusion with reading the parameters from the XML file and providing them type-safely to the constructor of the component.
I have two frameworks that I use. The one that most closely matches the above description looks like this:

<!-- The framework will create an object of someType using a factory. -->
<!-- The factory can instantiate objects based on loadable modules. -->
<component name="myName" type="someType">
  <dataMap>
    <input name="data_name_1" internalName="my_inside_parameter_name1" expire="100" />
    <input name="data_name_2" internalName="my_inside_parameter_name2" expire="100" />
  </dataMap>
  <!-- additional component configuration goes here -->
</component>

<component name="yourName" type="anotherType">
  <dataMap>
    <input name="foo" />
  </dataMap>
</component>

The internalName and expire parameters are optional. internalName allows data inputs to have a name that the component would know. The expire came about because the validity of inputs to a component (in the work I perform at least) is often based on the relative time between inputs. As a simple example... Calculating dew point requires temperature and relative humidity (RH). When the dew point component has both valid temperature and RH it can then execute. However, utilizing temperature from 10 minutes ago and a current RH value may not be appropriate for the application. This pattern of validity occurred so often for me that I just built handling the constraint into the framework. The XML description could be improved on, I am sure. It has worked well for what I've needed over the years. Best Regards - Michael -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com

on Thu Sep 04 2008, Hervé Brönnimann <hervebronnimann-AT-mac.com> wrote:
Stjepan: I'm just curious, because my wife uses such a dataflow program for interactive music performance.
The program is called MAX and is distributed by a company named Cycling '74 (http://www.cycling74.com/products/max5). Some of the components are MIDI inputs or outputs, and the signal can consist of a scalar or more evolved entities like a soundwave, etc. It has MSP (audio signal processing) and Jitter (video processing) modules. It's been in use for more than 20 yrs, allows graphical interface programming, generation of C code, and separate compilation.
Wow, MAX is still around! Pretty cool. I started out in the music software biz. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Okay, but which applications need runtime configurability? Would not the performance advantages of a compile-time structure be more valuable than the flexibility of runtime configuration in most applications?
Distributed applications could be a good use case. If you have your components on different machines, you can't take advantage of compile-time structure anyways.
Even for normal applications it is often very useful to be able to reorganize your pipeline without a recompile, at least for coarse-grained components, where the benefit of static checking and optimization might be less important. In particular, the ability to add or remove sinks and sources, or to disable optional components, is very useful.
A few months ago I was developing a motions library (not finished yet) that led me to the same kind of questions. The principle is quite the same as Accumulators or Dataflow (btw I wonder if I could reimplement everything using one of them). The goal is to easily define how some values change in time, by starting with some very basic movements (linear speed, random values, etc...) and making the values pass through a pipeline that alters them to add some effects (inertia, stepping, recording, dependencies between motions, etc...). I wanted to have compile-time connections inside the pipeline because a runtime overhead wasn't acceptable, as it was for a game engine. But sometimes some flexibility was required: disabling a dependency, removing an effect, etc... Finally it turned out that the best option was to have some special components implementing those points of flexibility. So I added some components called "dropping", "variant", "switch" that are still connected at compile time but implement a precise point of flexibility at runtime. That way, the runtime overhead is limited to only the few points of flexibility that are really needed in a precise context. The concept is finally the following: sometimes you need runtime flexibility, but you always know at compile time if you'll need it, how you'll need it and where you'll need it. Bruno
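As an illustration of the kind of compile-time-connected "switch" component described above, here is a rough sketch (hypothetical names, not taken from any existing library):

// the two branches are fixed at compile time; only the choice between them is runtime
template <typename Left, typename Right>
class switch_component
{
public:
    switch_component(Left &left, Right &right)
        : left_(left), right_(right), use_left_(true) {}

    void use_left(bool b) { use_left_ = b; }   // the single point of runtime flexibility

    template <typename T>
    void operator()(const T &value)
    {
        if (use_left_) left_(value);
        else           right_(value);
    }

private:
    Left &left_;
    Right &right_;
    bool use_left_;
};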

Hello, I took a quick look at the lib and I have the following questions:

- How does Boost.DataFlow support exchanging data in both directions between two components (duplex communication)? Example: stack of network protocols:

service_contract >>= serializer >>= encoder >>= protocol_1 >>= protocol_2 >>= transport (send data)
transport >>= protocol_2 >>= protocol_1 >>= encoder >>= deserializer >>= service_contract (receive data)

- If one consumer is connected to multiple producers, how can the consumer selectively disconnect from specific producers?

Contrary to some other posts, I find the syntax of overloaded operators not confusing:
>>= push semantic
=<< pull semantic
The pipe operator '|' could also be used in the parallel execution semantic. regards, Oliver

On Thu, Sep 4, 2008 at 4:06 PM, Kowalke Oliver (QD IT PA SI) <Oliver.Kowalke@qimonda.com> wrote: [snip]
The pipe operator '|' could also be used in the parallel execution semantic.
How about the 'caret' operator ("^") to signify parallelism instead of operator| which more semantically means piping data from left to right? -- Dean Michael C. Berris Software Engineer, Friendster, Inc.

On Thu, Sep 4, 2008 at 4:06 PM, Kowalke Oliver (QD IT PA SI) <Oliver.Kowalke@qimonda.com> wrote: [snip]
The pipe operator '|' could also be used in the parallel
execution semantic.
How about the 'caret' operator ("^") to signify parallelism instead of operator| which more semantically means piping data from left to right?
Yes, also possible. It was only a hint for expressing parallel execution of data processing (in separate threads). Oliver

On Thu, Sep 4, 2008 at 1:06 AM, Kowalke Oliver (QD IT PA SI) <Oliver.Kowalke@qimonda.com> wrote:
Hello,
I took a quick look at the lib and I have the following questions:
- How does Boost.DataFlow support exchanging data in both directions between two components (duplex communication)? Example: stack of network-protocols:
service_contract >>= serializer >>= encoder >>= protocol_1 >>= protocol_2 >>= transport (send data)
transport >>= protocol_2 >>= protocol_1 >>= encoder >>= deserializer >>= service_contract (receive data)
Depends on the exact details. If the communication is always such that the left component initiates the exchange by sending its piece of data, and then the right component responds, in the Dataflow.Signals layer you could use a signature like type1(type2), i.e. use the argument to send the data to the right and the return value to send the data to the left. Here is an example (untested):

class duplex_multiplier : public signals::filter<duplex_multiplier, int(int)>
{
public:
    int operator()(int x)
    {
        // double the value received from the left, and send that to the right
        int response_from_right = out(x*2);
        // then return triple the response to the component on the left
        return 3 * response_from_right;
    }
};

// signal consumers don't need to know their signature
class loop_back : public signals::consumer<loop_back>
{
public:
    int operator()(int x)
    {
        // just return the received value back to the left
        return x;
    }
};

// now:
duplex_multiplier a, b, c;
loop_back d;

// the following connections are bidirectional because of the int(int) signature
// so >>= is misleading and a different operator might be better to use
a >>= b >>= c >>= d;

int response = a(1); // should return 1*2*2*2*3*3*3

If in your scenario a component would need to respond instantly (before sending data to the right and receiving a response), then it would need to send the rightward signal in a separate thread. If, on the other hand, you have a true duplex scenario where both the leftmost and rightmost component can initiate a signal, then each component would need an additional signal. The network would then look something like:

c1 >>= c2 >>= c3 >>= c4
c4.left_signal() >>= c3.left_signal() >>= c2.left_signal() >>= c1

This is similar to: http://www.dancinghacker.com/code/dataflow/dataflow/signals/introduction/tut...
- If one consumer is connected to multiple producers, how can the consumer selectively disconnect from specific producers?
I would recommend storing the connection object and using it to break the connection, as shown here: http://www.dancinghacker.com/code/dataflow/dataflow/signals/introduction/tut... At one point, I put together an example of how this tracking of connection objects by the consumer could be done automatically: http://svn.boost.org/svn/boost/sandbox/SOC/2007/signals/libs/dataflow/exampl... http://svn.boost.org/svn/boost/sandbox/SOC/2007/signals/libs/dataflow/exampl... In these examples, the consumer was programmed to disconnect itself from any producer that sends it a signal after it has already received a specified number of signals, but the logic could be changed.
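For context, the underlying mechanism at the plain Boost.Signals level looks like this (this is Boost.Signals itself, not the Dataflow-level connect):

#include <boost/signal.hpp>
#include <iostream>

void consumer_slot(int x) { std::cout << x << std::endl; }

int main()
{
    boost::signal<void (int)> producer1, producer2;

    // keep the connection objects around so individual links can be broken later
    boost::signals::connection c1 = producer1.connect(&consumer_slot);
    boost::signals::connection c2 = producer2.connect(&consumer_slot);

    c1.disconnect();   // the consumer keeps receiving from producer2 only
    producer1(1);      // ignored
    producer2(2);      // printed
}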
Contrary to some other posts, I find the syntax of overloaded operators not confusing:
>>= push semantic
=<< pull semantic
The pipe operator '|' could also be used in the parallel execution semantic.
I am also happy with >>= (and adding <<= to indicate pull semantics), but the example above showed another problem - if the connection is bidirectional, neither >>= nor <<= seems appropriate. What do people think of making the operator choice completely up to the user (i.e., the user can specify what operator they would like to use for what operation, with some default mappings provided)? Thanks for taking a look at the library! Best, Stjepan

On Monday, 1 September 2008 07:19:13, Jaakko Järvi wrote:
--------------------------------------------------------- Questions you may want to answer in your review:
- What is your evaluation of the design?
I think the design is in good shape
- What is your evaluation of the documentation?
could be a little more expressive
- What is your evaluation of the potential usefulness of the library?
I find it very useful for several aspects of my projects (pipelined architectures)
- Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A quick reading but I plan to use the lib in my projects
- Are you knowledgeable about the problem domain?
A little bit
In particular, please remember to answer the following question explicitly:
- Do you think the library should be accepted as a Boost library?
Yes - I would vote for it. Regards, Oliver

On Wed, Sep 10, 2008 at 11:28 AM, <k-oli@gmx.de> wrote:
On Monday, 1 September 2008 07:19:13, Jaakko Järvi wrote:
--------------------------------------------------------- Questions you may want to answer in your review:
- What is your evaluation of the design?
I think the design is in good shape
- What is your evaluation of the documentation?
could be a little more expressive
- What is your evaluation of the potential usefulness of the library?
I find it very useful for several aspects of my projects (pipelined architectures)
- Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A quick reading but I plan to use the lib in my projects
- Are you knowledgeable about the problem domain?
A little bit
In particular, please remember to answer the following question explicitly:
- Do you think the library should be accepted as a Boost library?
Yes - I would vote for it
regards, Oliver
Thanks for taking a look at the library and submitting a review. If you do start using the library in your projects, I would very much welcome your further feedback. Best regards, Stjepan
participants (16)
- Bruno Lalande
- David Abrahams
- Dean Michael Berris
- Emil Dotchevski
- Giovanni Piero Deretta
- Hervé Brönnimann
- jarvi@cs.tamu.edu
- k-oli@gmx.de
- Kowalke Oliver (QD IT PA SI)
- Manuel Jung
- Michael Caisse
- Paul Baxter
- Phil Endecott
- Ravi
- Stjepan Rajko
- vicente.botet