[boost] FSM review

10 Mar 2005

      I vote to reject the FSM library because I find the rationale for not providing
better performance insufficient. Specifically, I am not convinced that the
framework could not have been implemented to use static dispatch, constant-time
dynamic dispatch or some combination of these and the current approach.

Normally, before voting to reject a library I like to study the implementation
thoroughly so I can make concrete proposals about how it might be improved. I
haven't had time to do this with the FSM library, but I would still like my view
to be known.

I'll start by going through the relevant portions of rationale.html; then I'll
answer the review questions.

-------

* Introduction. The nine requirements are well-stated, and I accept them.

* Dynamic configurability. I agree with the argument for not making all state
machines dynamically-configurable. However, the documentation does not explain
why it would not be possible to support both dynamically-configurable and
non-dynamically-configurable state machines in the same framework, the way
xpressive supports both dynamically and statically specified regular
expressions. However, I don't consider dynamically-configurability essential, so
I'm satisfied with the library in this respect.

* Speed versus scalability tradeoffs.

Here there is a big problem. I accept that there must be a speed versus
scalability tradeoff; in fact, I'd be slightly surprised if there weren't.
However, I would normally expect a slow-down of at most a few percentage points
when using a framework; I find it stunning that the difference in performance
between hand-written machines and framework-generated machines is so significant
(10ns/10ns vs. 130ns/220ns), even in a "somewhat unrealistic" test case.

In order to accept this performance gap I would need a very convincing
explanation. Instead, the explanation seems to be of this form: "The
requirements are X; I tried hard to satisfy X without compromising performance,
but I couldn't; therefore, it can't be done." Even if we accept that the author
really has tried all known techniques, it is important to document what was
tried and why it doesn't work, since many library users are sure to be surprised
by the performance difference and wonder whether it represents a flawed design.

The explanation of why the performance difference shouldn't matter is also
unconvincing. I'm particularly disturbed by the implication that FMSs should not
be used for parsing. Why not? I have a candidate application of FMSs which is
very similar to parsing. I'm currently modifying the Iostreams library to
accommodate non-blocking i/o, and I have come to see that many filters can be
thought of as state machines. The fact that a dispatch may involve a linear
serach involving an RTTI check at each step immediately rules out use of the FSM
framework to implement filters, since this search would have to be performed for
each character. So if I want to benefit from the FSM abstraction, I would have
to write my own framework. I don't see why this should be necessary.

* Double dispatch. This seems like it should be a subsection of "Speed versus
scalability tradeoffs"

- Acyclic visitor: I accept this explanation.
- GOF Visitor: It's natural to ask why a complex state machine can't be broken
up into several parts which don't know much about each other, each of which uses
a more efficient form of dispatch. Less efficient dispatch would be needed only
at
junctions between the separate parts. Dave Gomboc raised a similar point, and I
don't think it has been adequately addressed.
- Two-dimensional array of function pointers. This explanation makes no sense to
me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's
common for a composite component to make use of several components implemented
in different TUs. For instance, the composite commonent may "know about" the
other components by derivation, by containment, typelists, etc. When it needs to
assemble the array of pointers, it can simply ask each of them for their
contribution.
If there are order of initialization problems, the arrays can be allocated on
first use.

-------

In my experience, C++ provides extremely fine-grained control over which parts
of a computation are performed at compile time and which at runtime. I have a
strong suspicion that a FSM library could be written which satisfies the nine
requirements and also provides ruthless efficiencies, at least in many common
cases. It may turn out to be necessary to sacrifice performance for some FSMs,
but I believe I hybrid approach may be possible which uses fast dispatch in the
majority of cases. I'm willing to be convinced that it is impossible, but I
haven't seen a satisfying explanation yet.

Pavel Vozenilek wrote:
...
Here are some questions you might want to answer in
your review:
* What is your evaluation of the documentation?
      How easy (or hard) it is to understand library
      features? What can be improved?
The documentation is quite good, except that the Rationale is insufficient.
...
* What is your evaluation of the design?
      What features are supported better by
      alternative FSMs? What could be added
      (or removed) from the library?
Since I've already discussed my dissatisfaction with performance, I'll just
address the library's user interface.

Overall, I find the design very good. Others have complained about use of CRTP
or having to forward to declare states, but to me it looks pretty easy to use. I
only have a few comments:

- The documentation warns that executing code after calling
simple_state::transit (and several other functions) can lead to undefined
behavior. I believe the library should prevent this from occurring. One approach
would be to have the function specify a result which is stored until react has
finished executing, and then acted upon. Another possibility would be to give
result move semantics, so result instances can't be stored in local variables
and returned later.

- The restriction on event deferral -- one must pass an event pointed to by an
instance of intrusive_ptr -- is extremely inelegant:

       int main()
      {
            myMachine.process_event(
                 *boost::intrusive_ptr<Event>( new Event() )
            );
      }

I'm sure there is a good reason for this restriction, but there must be a
cleaner way to accomplish the same thing.

- The procedure for posting events from inside entry actions looks messy; this
is noted in the docs, and it's not nearly as bad as the syntax for event
deferral, but I'd like to see it improved.

- I don't understand why it is necessary to pass a type as a length-one
mpl::list simply because it is a class template instantiation. The docs say it
is "due to some template-related implementation details", but this sounds fishy
to me.
...
* The library documentation contains
      few not yet solved issues (name,
      separating the library into two parts,
      exception handling). What is you opinion here?
I have no opinion.
...
* What is your evaluation of the implementation?
      Are there parts of code too obscure or
      duplicating exiting Boost functionality?
      Can something be factored out to standalone
      library or among general utilities?
Unfortunately, I only had time to glance at the implementation.
...
* Are there performance bottlenecks?
      Does the library design fit requirements
      of real-time systems? How is it useable
      for embedded systems?
      Is the documentation clear on performance
      tradeoffs?
See above.
...
* What is your evaluation of the potential
      usefulness of the library?
Useful, judging by the comments of reviewers who are actually using the library.
...
Can you compare this FSM to other implementations?
No.
...
* Did you try to use the library?  With what
      compiler? Did you have any problems?
      Do you have tips for better support of older
      compilers? What are possible portability problems?
I ran the regression tests on seven compilers, with the following results:

1. VC7.1 -- passed all tests
2. VC8.0 -- passed all tests
3. VC7.0 --
       ...failed updating 36 targets..
       ...skipped 144 targets...
       ...updated 506 targets...
4. GCC 3.4.1 (Cygwin) -- passed all tests
4. GCC 3.2 (MinGW) -- passed all tests
5. Intel 8.0
    ...failed updating 4 targets...
    ...skipped 4 targets...
    ...updated 678 targets...
6. CW 8.3
    ...failed updating 28 targets...
    ...skipped 112 targets...
    ...updated 546 targets...
7. Borland 5.6.4
   ...failed updating 36 targets...
    ...skipped 144 targets...
    ...updated 506 targets...

I'd be happy to send the full error messages.
...
* How much effort did you put into your
      evaluation? A glance? A quick reading? In-depth study?
Probably about six or seven hours studying the documentation, reading the review
comments and preparing my review.
...
* Are you knowledgeable about the problem domain?
I'm not very familiar with FMSs, but know quite a bit about other formal models
of computation such as DFAs and Abstract State Machines.
...
Pavel Vozenilek
FSM review manager
Jonathan

[boost] FSM review

Jonathan Turkanis