[boost] review of proposed fsm library

7 Mar 2005

      Reviewer info:
==============

I have an M.Sc. in computing science.  Much of my experience is with
state-space search, which has certain similarities to the topic at
hand.  Also, lately I have been working daily with code that ought
to have been written using an FSM library, but wasn't! :-(

I have been following the FSM discussion on the list, and have
focussed on the library for one day, reading through the docs and
rationale, plus Harel's initial statechart paper.  The rationale
document did not convince me of the correctness of some design
choices, so I did not attempt to write code with the library.

Reviewer comments:
==================

Let us consider that we have multiple fsms and we would like to
apply operations to these fsms, e.g. direct concatenation (the
terminal state of one automata becomes the start state of another),
or choice (depending on an action of, or an accept/reject by fsm a,
proceed to execute fsm b or fsm c).  This is apparently
straightforwared when working at the level of states; however, the
library draws a distinction between a state_machine and a state.
This acts as a barrier to the direct composition of state machines
that we would like to use as components within large state machines.

The distinction made between state and state_machine, which I
consider to be unfortunate, was not justified in the rationale.  For
naturally recursive structures, distinguishing between the
structure's frontier and its interior by type is best avoided.
(Analogously, the head of a (singly-linked) list and the root of a
tree would not be distinguished by type from their successors and
children, respectively.)  Composition is critical!  See, for
instance, section 12.4 of Keller
(http://www.cs.hmc.edu/claremont/keller/webBook/ch12/).  An example
of multiple-machine composition into a large machine should also
exist in the tutorial.  The simplest useful such case would likely
be connecting two half-adders to make a full-adder machine.

There is a brief note in the tutorial re: submachines that fails to
do this topic justice.  If the template approach recommended there
is to be the canonical form for a reusable component, that style
should be propagated through all but the simplest examples in the
tutorial.  Each submachine should clearly be labelled as part of its
own .hpp file.

I was pleased that the tutorial and rationale collectively contain
considerable discussion of exception handling.

In the speed vs. scalability trade-off section of the rationale
page, it is claimed that using an fsm for a parsing task is an
abuse, excusing the relatively low performance of the library.
However, finite-state automata are inextricably linked with the
acceptance of regular languages: see, for instance, section 12.2 of
Keller (again,
http://www.cs.hmc.edu/claremont/keller/webBook/ch12/); it is not too
much to ask that an fsm library to be able to serve as the
underpinning of a reasonably efficient regex library.  The lack of
support for dynamically-configurable automata is also unfortunate,
though not unexpected given the requirements list.

Other reviewers have also already commented upon the lack of a
table-lookup dispatch method.  From the rationale:
    "To warrant fast table lookup, states and events must be
represented with an integer.  To keep the table as small as
possible, the numbering should be continuous, e.g. if there are ten
states, it's best to use the ids 0-9.  To ensure continuity of ids,
all states are best defined in the same header file.  The same
applies to events.  Again, this does not scale."

However, I think that for table lookup there is no requirement that
ids be consecutive -- a compile-time write-once read-many hash table
could do the trick.  But even that may be an aside.  I am
unconvinced by:
    "To satisfy requirement 6 [to support the development of
arbitrarily large and complex state machines], it should be possible
that to spread a single state machine over several translation
units.  This however means that the dispatch table must be filled at
runtime and the different trnaslation units must somehow make
themselves 'known' so that their part of the state machine can be
added to the table.  There simply is no way to do this automatically
and portably."
While perhaps each state (compilation module) must know of its
successor states, no central table of "current state x action x next
state" encompassing the entire machine need be explicitly created.

Under the "Limitations" section, reference is made to "lost events":
    "It is currently not possible to specially handle events that
have not triggered a recation (and would thus be silently
discarded).  However, a facility allowing this will probably be
added in the not-too-distant future."
If the event is irrelevant at that point in the state machine, it is
most properly ignored.   If it is not irrelevant, there ought to be
a transition (perhaps a self-transition).  Therefore, I don't
understand the comment.

I have an uneasy feeling about how differently asynchronous and
synchronous machines are operated, but did not consider this issue
in detail.

Allowing the tracking of history is intruiging; the prohibition
against deep history of states with orthogonal regions is
suspicious.  The explanation given is that it would be more
complicated (but still possible) to implement.  In my experience,
that sort of reasoning just doesn't fly at Boost.

The discussion of fork and join bars, combined with the event
dispatch to orthogonal regions discussion immediately below that,
suggests that the implementation is not truly designed to handle
concurrent states.  I think, however, if one is to claim that UML
statecharts are supported that the semantics of execution ought to
be the same!  I can well imagine that this "would complicate the
implementation quite a bit", but Boost is also about interface, not
only implementation, and if the wrong interface is standardized
(whether de jure or de facto) it's worse for everybody in the end.

There is no explicit discussion of non-deterministic fsms.  Of
course, every such fsm can be re-expressed deterministically.
Similarly, a given fsm may be non-minimal, but such minimization
procedures exist.  Both of these procedures necessarily require a
global view of the fsm.  As this is contrary to the scalability
requirement listed in the rationale, I conclude that neither are
present. :-)  However, I would like to ask if (excepting possible
compiler limitations) is there anything about the library that
precludes extending it such that NDFAs could be specified in code,
and converted into the minimized DFA during compilation?

(http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK5/NODE207.HTM#
SECTION03177000000000000000 gives a toy example of FSM
minimization).

We had tuple and mpl separately before fusion; it seems reasonable
for boost to have both compile- and run-time fsm libraries, if
ultimately they may be similarly fused.

Reviewer conclusion:
====================

I like this library, but I'm not satisfied with it.  What level of
library is good enough for boost?  The library is not ready for
standardization, however, this doesn't appear to be the standard
that most people use when deciding upon their vote.  This library is
certainly good enough for people to make successful use of.

Ultimately, I think there are issues of design and rationale that
must be sorted through, appropriate modifications made, followed by
re-review.

Less weighty issues:
====================

    struct Greeting;
    struct Machine: fsm::state_machine<Machine, Greeting> {};

(and all similar) should be changed to

    struct Machine: fsm::state_machine<Machine, struct Greeting> {};

If necessary, the old way can be metioned as a workaround for older
compilers in a footnote.

The "default" black disc in the diagrams is too large!  Maybe make
the radius about 2/3s of what it is now.

Change "Such transitions are currently flagged" to "Transitions
across orthogonal regions are currently flagged".  It's the first
sentence in the last paragraph of the rationale.

[boost] review of proposed fsm library

Dave Gomboc