
Reviewer info: ============== I have an M.Sc. in computing science. Much of my experience is with state-space search, which has certain similarities to the topic at hand. Also, lately I have been working daily with code that ought to have been written using an FSM library, but wasn't! :-( I have been following the FSM discussion on the list, and have focussed on the library for one day, reading through the docs and rationale, plus Harel's initial statechart paper. The rationale document did not convince me of the correctness of some design choices, so I did not attempt to write code with the library. Reviewer comments: ================== Let us consider that we have multiple fsms and we would like to apply operations to these fsms, e.g. direct concatenation (the terminal state of one automata becomes the start state of another), or choice (depending on an action of, or an accept/reject by fsm a, proceed to execute fsm b or fsm c). This is apparently straightforwared when working at the level of states; however, the library draws a distinction between a state_machine and a state. This acts as a barrier to the direct composition of state machines that we would like to use as components within large state machines. The distinction made between state and state_machine, which I consider to be unfortunate, was not justified in the rationale. For naturally recursive structures, distinguishing between the structure's frontier and its interior by type is best avoided. (Analogously, the head of a (singly-linked) list and the root of a tree would not be distinguished by type from their successors and children, respectively.) Composition is critical! See, for instance, section 12.4 of Keller (http://www.cs.hmc.edu/claremont/keller/webBook/ch12/). An example of multiple-machine composition into a large machine should also exist in the tutorial. The simplest useful such case would likely be connecting two half-adders to make a full-adder machine. There is a brief note in the tutorial re: submachines that fails to do this topic justice. If the template approach recommended there is to be the canonical form for a reusable component, that style should be propagated through all but the simplest examples in the tutorial. Each submachine should clearly be labelled as part of its own .hpp file. I was pleased that the tutorial and rationale collectively contain considerable discussion of exception handling. In the speed vs. scalability trade-off section of the rationale page, it is claimed that using an fsm for a parsing task is an abuse, excusing the relatively low performance of the library. However, finite-state automata are inextricably linked with the acceptance of regular languages: see, for instance, section 12.2 of Keller (again, http://www.cs.hmc.edu/claremont/keller/webBook/ch12/); it is not too much to ask that an fsm library to be able to serve as the underpinning of a reasonably efficient regex library. The lack of support for dynamically-configurable automata is also unfortunate, though not unexpected given the requirements list. Other reviewers have also already commented upon the lack of a table-lookup dispatch method. From the rationale: "To warrant fast table lookup, states and events must be represented with an integer. To keep the table as small as possible, the numbering should be continuous, e.g. if there are ten states, it's best to use the ids 0-9. To ensure continuity of ids, all states are best defined in the same header file. The same applies to events. Again, this does not scale." However, I think that for table lookup there is no requirement that ids be consecutive -- a compile-time write-once read-many hash table could do the trick. But even that may be an aside. I am unconvinced by: "To satisfy requirement 6 [to support the development of arbitrarily large and complex state machines], it should be possible that to spread a single state machine over several translation units. This however means that the dispatch table must be filled at runtime and the different trnaslation units must somehow make themselves 'known' so that their part of the state machine can be added to the table. There simply is no way to do this automatically and portably." While perhaps each state (compilation module) must know of its successor states, no central table of "current state x action x next state" encompassing the entire machine need be explicitly created. Under the "Limitations" section, reference is made to "lost events": "It is currently not possible to specially handle events that have not triggered a recation (and would thus be silently discarded). However, a facility allowing this will probably be added in the not-too-distant future." If the event is irrelevant at that point in the state machine, it is most properly ignored. If it is not irrelevant, there ought to be a transition (perhaps a self-transition). Therefore, I don't understand the comment. I have an uneasy feeling about how differently asynchronous and synchronous machines are operated, but did not consider this issue in detail. Allowing the tracking of history is intruiging; the prohibition against deep history of states with orthogonal regions is suspicious. The explanation given is that it would be more complicated (but still possible) to implement. In my experience, that sort of reasoning just doesn't fly at Boost. The discussion of fork and join bars, combined with the event dispatch to orthogonal regions discussion immediately below that, suggests that the implementation is not truly designed to handle concurrent states. I think, however, if one is to claim that UML statecharts are supported that the semantics of execution ought to be the same! I can well imagine that this "would complicate the implementation quite a bit", but Boost is also about interface, not only implementation, and if the wrong interface is standardized (whether de jure or de facto) it's worse for everybody in the end. There is no explicit discussion of non-deterministic fsms. Of course, every such fsm can be re-expressed deterministically. Similarly, a given fsm may be non-minimal, but such minimization procedures exist. Both of these procedures necessarily require a global view of the fsm. As this is contrary to the scalability requirement listed in the rationale, I conclude that neither are present. :-) However, I would like to ask if (excepting possible compiler limitations) is there anything about the library that precludes extending it such that NDFAs could be specified in code, and converted into the minimized DFA during compilation? (http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE207.HTM# SECTION03177000000000000000 gives a toy example of FSM minimization). We had tuple and mpl separately before fusion; it seems reasonable for boost to have both compile- and run-time fsm libraries, if ultimately they may be similarly fused. Reviewer conclusion: ==================== I like this library, but I'm not satisfied with it. What level of library is good enough for boost? The library is not ready for standardization, however, this doesn't appear to be the standard that most people use when deciding upon their vote. This library is certainly good enough for people to make successful use of. Ultimately, I think there are issues of design and rationale that must be sorted through, appropriate modifications made, followed by re-review. Less weighty issues: ==================== struct Greeting; struct Machine: fsm::state_machine<Machine, Greeting> {}; (and all similar) should be changed to struct Machine: fsm::state_machine<Machine, struct Greeting> {}; If necessary, the old way can be metioned as a workaround for older compilers in a footnote. The "default" black disc in the diagrams is too large! Maybe make the radius about 2/3s of what it is now. Change "Such transitions are currently flagged" to "Transitions across orthogonal regions are currently flagged". It's the first sentence in the last paragraph of the rationale.

Dave Gomboc wrote:
The distinction made between state and state_machine, which I consider to be unfortunate, was not justified in the rationale...
There is at least one good reason - the library supports both synchronous and asynchronous state machines, which are essentially different beasts and are operated quite differently. No such distinction exists for states, therefore they shouldn't be treated the same way as state machines. Otherwise you would be able to mix both synchronous and asynchronous states in one machine, which doesn't make sense.

Hi Dave
Let us consider that we have multiple fsms and we would like to apply operations to these fsms, e.g. direct concatenation (the terminal state of one automata becomes the start state of another), or choice (depending on an action of, or an accept/reject by fsm a, proceed to execute fsm b or fsm c). This is apparently straightforwared when working at the level of states; however, the library draws a distinction between a state_machine and a state. This acts as a barrier to the direct composition of state machines that we would like to use as components within large state machines.
A much better way to achieve that is with templated states, as you seem to have observed yourself.
The distinction made between state and state_machine, which I consider to be unfortunate, was not justified in the rationale.
I didn't justify it because it feels natural to me that a state machine has only one event queue, and one place where history is stored (both are containers inside state_machine). Admittedly, because the outermost state knows at compile time that it is the outermost state, the state_machine class template could be "abstracted away". You'd then only have simple_state and state. However, this seems at best obscure to me. [snip]
There is a brief note in the tutorial re: submachines that fails to do this topic justice. If the template approach recommended there is to be the canonical form for a reusable component, that style should be propagated through all but the simplest examples in the tutorial. Each submachine should clearly be labelled as part of its own .hpp file.
While I might agree with your reasoning in theory, I don't think that it would work in practice. Even now I sometimes get complaints that the examples are too complex (too many templates) and changing all examples to work with templated states would make them even more so. Plus, often there is no need to use a state in more than one place, so why bother with templated states then? [snip]
In the speed vs. scalability trade-off section of the rationale page, it is claimed that using an fsm for a parsing task is an abuse
In its generality this remark is of course false and I should have replaced it with better reasoning long ago. What I want to say here is that it is a bad idea to use this library for parsing, as it is so general that performance tradeoffs are inevitable. For parsing, one is much better off with a parser framework, which is optimized for the task. The IMO important observation is the following: Parsers like Spirit only need to return when they have finished processing all "events" (characters). An FSM however must return after processing each event. Parsers can exploit this fact and store state with recursive calls, which is of course much faster (some stuff can even be inlined) than any super-clever double-dispatch scheme. There is often a hard limit for stack space which - I think - is the reason why FSMs do make sense in parsers nevertheless. However, all the FSM-employing parsers I know are of the dynamic variant (e.g. regex) and these FSMs are not directly designed by humans but computed by an algorithm. Last but not least I think it would be tedious for a human to implement a parser on the state- level, which is probably the reason why I dared to make the admittedly dumb remark in the first place. I think Darryl puts this rather nicely: <quote> Fast flat fsms to process simple events and having little additional context are best implemented using other methods. However such implementations are not difficult, and there are plenty of domain specific tools already (eg various language recognisers). </quote>
, excusing the relatively low performance of the library.
That was not intended as an excuse, see above.
However, finite-state automata are inextricably linked with the acceptance of regular languages: see, for instance, section 12.2 of Keller (again, http://www.cs.hmc.edu/claremont/keller/webBook/ch12/); it is not too much to ask that an fsm library to be able to serve as the underpinning of a reasonably efficient regex library.
See above. A regex library needs a dynamic FSM. [snip]
Other reviewers have also already commented upon the lack of a table-lookup dispatch method. From the rationale: "To warrant fast table lookup, states and events must be represented with an integer. To keep the table as small as possible, the numbering should be continuous, e.g. if there are ten states, it's best to use the ids 0-9. To ensure continuity of ids, all states are best defined in the same header file. The same applies to events. Again, this does not scale."
However, I think that for table lookup there is no requirement that ids be consecutive -- a compile-time write-once read-many hash table could do the trick.
Unless you use a perfect hashing algorithm (which is infeasible because you need to know all the ids beforehand), this would essentially wreck the ability to use this library in real-time systems. Hashed containers are a bad idea for real-time as there is no upper limit for a search operation.
But even that may be an aside. I am unconvinced by: "To satisfy requirement 6 [to support the development of arbitrarily large and complex state machines], it should be possible that to spread a single state machine over several translation units. This however means that the dispatch table must be filled at runtime and the different trnaslation units must somehow make themselves 'known' so that their part of the state machine can be added to the table. There simply is no way to do this automatically and portably." While perhaps each state (compilation module) must know of its successor states, no central table of "current state x action x next state" encompassing the entire machine need be explicitly created.
Which is exactly what this library is doing. There is a lookup "table" within each state and if that fails lookup considers the states outer state. However, this inevitably means that you cannot have constant time lookup. The larger your FSM becomes the longer lookup takes. BTW, as I've written in another post I don't see why reaction search is so important for people, because in typical FSMs I'd expect transition execution to be much more of a performance hog.
Under the "Limitations" section, reference is made to "lost events": "It is currently not possible to specially handle events that have not triggered a recation (and would thus be silently discarded). However, a facility allowing this will probably be added in the not-too-distant future." If the event is irrelevant at that point in the state machine, it is most properly ignored. If it is not irrelevant, there ought to be a transition (perhaps a self-transition). Therefore, I don't understand the comment.
There are applications of state-machines where an ignored event consitutes an error. Since the library currently does not allow to define transitions that are triggered by any event there is no easy way to handle such "ignored" events.
I have an uneasy feeling about how differently asynchronous and synchronous machines are operated, but did not consider this issue in detail.
Why? They are rather different things aren't they?
Allowing the tracking of history is intruiging; the prohibition against deep history of states with orthogonal regions is suspicious.
Why?
The explanation given is that it would be more complicated (but still possible) to implement.
It would be much more complicated to be precise.
In my experience, that sort of reasoning just doesn't fly at Boost.
Why? Must I really offer all the functionality in the first version of the library? Very few people use deep history and orthogonal states in conjunction. For those that do I describe a work-around. I have not received a single request to implement full deep history.
The discussion of fork and join bars, combined with the event dispatch to orthogonal regions discussion immediately below that, suggests that the implementation is not truly designed to handle concurrent states.
Why?
I think, however, if one is to claim that UML statecharts are supported that the semantics of execution ought to be the same! I can well imagine that this "would complicate the implementation quite a bit", but Boost is also about interface, not only implementation, and if the wrong interface is standardized (whether de jure or de facto) it's worse for everybody in the end.
The interface change for this is rather easy: Simply specify multiple rather than just one target state. I don't see the problem. BTW, this is connected to deep history not working for orthogonal sub-states.
There is no explicit discussion of non-deterministic fsms. Of course, every such fsm can be re-expressed deterministically.
I don't see why I should discuss that.
Similarly, a given fsm may be non-minimal, but such minimization procedures exist. Both of these procedures necessarily require a global view of the fsm. As this is contrary to the scalability requirement listed in the rationale, I conclude that neither are present.
This has never been a goal and I don't see why I should provide such functionality. Also I suspect that state-local storage could prevent such a minimization.
However, I would like to ask if (excepting possible compiler limitations) is there anything about the library that precludes extending it such that NDFAs could be specified in code, and converted into the minimized DFA during compilation?
Yes, the scalability requirements. As I explain above (and you seem to have found that yourself) there is no way to have a global view at an FSM implemented with this library.
We had tuple and mpl separately before fusion; it seems reasonable for boost to have both compile- and run-time fsm libraries, if ultimately they may be similarly fused.
I don't think that is feasible unless you limit yourself to FSMs specified in single TUs so that a global view is possible.
I like this library, but I'm not satisfied with it. What level of library is good enough for boost? The library is not ready for standardization, however, this doesn't appear to be the standard that most people use when deciding upon their vote. This library is certainly good enough for people to make successful use of.
The "default" black disc in the diagrams is too large! Maybe make the radius about 2/3s of what it is now.
I think the size is in line with the current UML standard, but I'll check again.
Change "Such transitions are currently flagged" to "Transitions across orthogonal regions are currently flagged". It's the first sentence in the last paragraph of the rationale.
Ok, noted. Thanks for your review! Regards, -- Andreas Huber When replying by private email, please remove the words spam and trap from the address shown in the header.

"Andreas Huber" <ahd6974-spamgroupstrap@yahoo.com> wrote in message news:loom.20050307T143817-929@post.gmane.org...
Hi Dave
Let us consider that we have multiple fsms and we would like to apply operations to these fsms, e.g. direct concatenation (the terminal state of one automata becomes the start state of another), or choice (depending on an action of, or an accept/reject by fsm a, proceed to execute fsm b or fsm c). This is apparently straightforwared when working at the level of states; however, the library draws a distinction between a state_machine and a state. This acts as a barrier to the direct composition of state machines that we would like to use as components within large state machines.
A much better way to achieve that is with templated states, as you seem to have observed yourself.
The distinction made between state and state_machine, which I consider to be unfortunate, was not justified in the rationale.
I didn't justify it because it feels natural to me that a state machine has only one event queue, and one place where history is stored (both are containers inside state_machine). Admittedly, because the outermost state knows at compile time that it is the outermost state, the state_machine class template could be "abstracted away". You'd then only have simple_state and state. However, this seems at best obscure to me.
[snip]
There is a brief note in the tutorial re: submachines that fails to do this topic justice. If the template approach recommended there is to be the canonical form for a reusable component, that style should be propagated through all but the simplest examples in the tutorial. Each submachine should clearly be labelled as part of its own .hpp file.
While I might agree with your reasoning in theory, I don't think that it would work in practice. Even now I sometimes get complaints that the examples are too complex (too many templates) and changing all examples to work with templated states would make them even more so. Plus, often there is no need to use a state in more than one place, so why bother with templated states then?
[snip]
In the speed vs. scalability trade-off section of the rationale page, it is claimed that using an fsm for a parsing task is an abuse
In its generality this remark is of course false and I should have replaced it with better reasoning long ago. What I want to say here is that it is a bad idea to use this library for parsing, as it is so general that performance tradeoffs are inevitable. For parsing, one is much better off with a parser framework, which is optimized for the task. The IMO important observation is the following: Parsers like Spirit only need to return when they have finished processing all "events" (characters). An FSM however must return after processing each event. Parsers can exploit this fact and store state with recursive calls, which is of course much faster (some stuff can even be inlined) than any super-clever double-dispatch scheme. There is often a hard limit for stack space which - I think - is the reason why FSMs do make sense in parsers nevertheless. However, all the FSM-employing parsers I know are of the dynamic variant (e.g. regex) and these FSMs are not directly designed by humans but computed by an algorithm. Last but not least I think it would be tedious for a human to implement a parser on the state- level, which is probably the reason why I dared to make the admittedly dumb remark in the first place. I think Darryl puts this rather nicely: <quote> Fast flat fsms to process simple events and having little additional context are best implemented using other methods. However such implementations are not difficult, and there are plenty of domain specific tools already (eg various language recognisers). </quote>
, excusing the relatively low performance of the library.
That was not intended as an excuse, see above.
However, finite-state automata are inextricably linked with the acceptance of regular languages: see, for instance, section 12.2 of Keller (again, http://www.cs.hmc.edu/claremont/keller/webBook/ch12/); it is not too much to ask that an fsm library to be able to serve as the underpinning of a reasonably efficient regex library.
See above. A regex library needs a dynamic FSM.
[snip]
Other reviewers have also already commented upon the lack of a table-lookup dispatch method. From the rationale: "To warrant fast table lookup, states and events must be represented with an integer. To keep the table as small as possible, the numbering should be continuous, e.g. if there are ten states, it's best to use the ids 0-9. To ensure continuity of ids, all states are best defined in the same header file. The same applies to events. Again, this does not scale."
However, I think that for table lookup there is no requirement that ids be consecutive -- a compile-time write-once read-many hash table could do the trick.
Unless you use a perfect hashing algorithm (which is infeasible because you need to know all the ids beforehand), this would essentially wreck the ability to use this library in real-time systems. Hashed containers are a bad idea for real-time as there is no upper limit for a search operation.
But even that may be an aside. I am unconvinced by: "To satisfy requirement 6 [to support the development of arbitrarily large and complex state machines], it should be possible that to spread a single state machine over several translation units. This however means that the dispatch table must be filled at runtime and the different trnaslation units must somehow make themselves 'known' so that their part of the state machine can be added to the table. There simply is no way to do this automatically and portably." While perhaps each state (compilation module) must know of its successor states, no central table of "current state x action x next state" encompassing the entire machine need be explicitly created.
Which is exactly what this library is doing. There is a lookup "table" within each state and if that fails lookup considers the states outer state. However, this inevitably means that you cannot have constant time lookup. The larger your FSM becomes the longer lookup takes. BTW, as I've written in another post I don't see why reaction search is so important for people, because in typical FSMs I'd expect transition execution to be much more of a performance hog.
Under the "Limitations" section, reference is made to "lost events": "It is currently not possible to specially handle events that have not triggered a recation (and would thus be silently discarded). However, a facility allowing this will probably be added in the not-too-distant future." If the event is irrelevant at that point in the state machine, it is most properly ignored. If it is not irrelevant, there ought to be a transition (perhaps a self-transition). Therefore, I don't understand the comment.
There are applications of state-machines where an ignored event consitutes an error. Since the library currently does not allow to define transitions that are triggered by any event there is no easy way to handle such "ignored" events.
I have an uneasy feeling about how differently asynchronous and synchronous machines are operated, but did not consider this issue in detail.
Why? They are rather different things aren't they?
Allowing the tracking of history is intruiging; the prohibition against deep history of states with orthogonal regions is suspicious.
Why?
The explanation given is that it would be more complicated (but still possible) to implement.
It would be much more complicated to be precise.
In my experience, that sort of reasoning just doesn't fly at Boost.
Why? Must I really offer all the functionality in the first version of the library? Very few people use deep history and orthogonal states in conjunction. For those that do I describe a work-around. I have not received a single request to implement full deep history.
The discussion of fork and join bars, combined with the event dispatch to orthogonal regions discussion immediately below that, suggests that the implementation is not truly designed to handle concurrent states.
Why?
I think, however, if one is to claim that UML statecharts are supported that the semantics of execution ought to be the same! I can well imagine that this "would complicate the implementation quite a bit", but Boost is also about interface, not only implementation, and if the wrong interface is standardized (whether de jure or de facto) it's worse for everybody in the end.
The interface change for this is rather easy: Simply specify multiple rather than just one target state. I don't see the problem. BTW, this is connected to deep history not working for orthogonal sub-states.
There is no explicit discussion of non-deterministic fsms. Of course, every such fsm can be re-expressed deterministically.
I don't see why I should discuss that.
Similarly, a given fsm may be non-minimal, but such minimization procedures exist. Both of these procedures necessarily require a global view of the fsm. As this is contrary to the scalability requirement listed in the rationale, I conclude that neither are present.
This has never been a goal and I don't see why I should provide such functionality. Also I suspect that state-local storage could prevent such a minimization.
However, I would like to ask if (excepting possible compiler limitations) is there anything about the library that precludes extending it such that NDFAs could be specified in code, and converted into the minimized DFA during compilation?
Yes, the scalability requirements. As I explain above (and you seem to have found that yourself) there is no way to have a global view at an FSM implemented with this library.
We had tuple and mpl separately before fusion; it seems reasonable for boost to have both compile- and run-time fsm libraries, if ultimately they may be similarly fused.
I don't think that is feasible unless you limit yourself to FSMs specified in single TUs so that a global view is possible.
I like this library, but I'm not satisfied with it. What level of library is good enough for boost? The library is not ready for standardization, however, this doesn't appear to be the standard that most people use when deciding upon their vote. This library is certainly good enough for people to make successful use of.
The "default" black disc in the diagrams is too large! Maybe make the radius about 2/3s of what it is now.
I think the size is in line with the current UML standard, but I'll check again.
Change "Such transitions are currently flagged" to "Transitions across orthogonal regions are currently flagged". It's the first sentence in the last paragraph of the rationale.
Ok, noted.
Thanks for your review!
Regards,
-- Andreas Huber
When replying by private email, please remove the words spam and trap from the address shown in the header.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Jonathan Turkanis wrote:
"Andreas Huber" wrote in message
Hi Dave
Let us consider that we have multiple fsms and we would like to apply operations to these fsms, e.g. direct concatenation (the terminal state of one automata becomes the start state of another),
I guess I must have sent this message by mistake. Sorry. Jonathan
participants (4)
-
Andreas Huber
-
Dave Gomboc
-
Jonathan Turkanis
-
Peter Petrov