
Jonathan Turkanis wrote:
Do I really say this? I don't see any place where I say or imply that something can't be done because *I* couldn't think of a solution.
Of course not. Otherwise I wouldn't have had to make up the obviously fake quote. Consider: "Quite a bit of effort has gone into making the library fast for small simple machines and scaleable at the same time ... the scalability does not come for free." I naturally assumed it was your effort you were talking about.
Yes, but I don't see how one can *imply* from the above that it can't be done because *I* haven't found ways to do it. Anyway, this is rather OT and I won't argue any more about this.
I'm just arguing that the Rationale doesn't adequately explain why better performance is not possible. I'm willing to be convinced that it isn't possible, and that you've made the best possible effort to find more efficient alternatives.
Define "best possible effort". How much time do you think I should spend on this?
- Two-dimensional array of function pointers: This explanation makes no sense to me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's common for a composite component to make use of several components implemented in different TUs. For instance, the composite component may "know about" the other components by derivation, by containment, by typelists, etc. When it needs to assemble the array of pointers, it can simply ask each of them for their contribution.
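[For illustration, the kind of assembly being suggested here might look roughly like this; this is generic C++, not any particular library, and all names are made up:]

#include <cstddef>

typedef void (*Reaction)();               // one function pointer per (state, event) cell
const std::size_t maxEvents = 8;          // assumed upper bound, purely illustrative

struct Component                          // e.g. a state implemented in its own TU
{
    virtual ~Component() {}
    // each component fills in its own row of the two-dimensional table
    virtual void contribute( Reaction ( &row )[ maxEvents ] ) const = 0;
};

struct Composite
{
    // the composite knows its components (by containment, derivation, typelists, ...)
    void assemble( Component * const components[], std::size_t n,
                   Reaction table[][ maxEvents ] ) const
    {
        for ( std::size_t i = 0; i != n; ++i )
            components[ i ]->contribute( table[ i ] );
    }
};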
No, in a nutshell it cannot because it doesn't know how many or what components exist.
I'm not talking about the proposed library here.
Neither do I, I'm talking about any library satisfying the 9 requirements.
I accept that it's not possible in your implementation.
Noted.
Of course, there must be links between the components but C++ does not allow you to follow these links in a useful way and there is no way to force the components in the TUs to register themselves. The only way you can guarantee the registration is by requiring the programmer to register all the parts explicitly, which is contrary to the scalability requirements.
You frequently mention "manually registering" components. It would be helpful if you would explain what you mean by this. For example, a specialization of std::pair has two components, first_type and second_type. When I write
typedef std::pair<int, float> pair_type;
does this count as "manually registering" the components of pair_type, since I have to know what they are and write them down?
I'm inclined to answer yes but I'm not sure whether this example isn't too far from what we're currently discussing.
Lots of C++ libraries allow users to construct composite components out of components defined in different TUs.
I think it is worth noting that "usual" C++ components consist of a declaration and a definition. In the example above, if int and float were user-defined types (classes), you'd need to include their header files. In the context of this discussion this does *not* count as multiple TUs. That's because I don't see how such a separation is useful/feasible for states (as always: suggestions are welcome), i.e. state declaration and definition are more or less inevitably in one place.
I'm not trying to be stubborn; I really don't understand why FSM is special in this regard.
See above.
Looking at the code above we notice several things:
1. Visibility: In the proposed FSM lib, an outer state "knows" only its inner initial state (which can be an incomplete type) but none of its other inner states (Active does not know that Running exists, and neither does StopWatch). An inner state knows all its outer states. That it is done this way and not the other way round (outer states knowing all their inner states but inner states not knowing their outer states) is no coincidence, for the simple reason that the other way round wouldn't leave any room for parts in multiple TUs. I also think this is pretty much the only sensible way to spread an FSM over multiple TUs. I can't prove this (to me it is just obvious).
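[For concreteness, a schematic sketch of this visibility rule in plain C++; this is not the library interface, and the typedef names are made up:]

// StopWatch.hpp -------------------------------------------------------------
struct Active;                                   // incomplete type suffices
struct StopWatch { typedef Active InitialInnerState; };

struct Stopped;                                  // incomplete type suffices
struct Active    { typedef StopWatch Context; typedef Stopped InitialInnerState; };
struct Stopped   { typedef Active Context; };    // knows its outer states ...

// Running.cpp (separate TU) --------------------------------------------------
// #include "StopWatch.hpp"
struct Running   { typedef Active Context; };    // ... and so does Running, but nothing
                                                 // in StopWatch.hpp ever mentions Running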
Here you've come quite close to saying: "The requirements are X; I tried hard to satisfy X ...but I couldn't; therefore, it can't be done."
Define "quite close". I'm saying that *I* don't see other ways (you're welcome to show such a way). I'm am *not* saying that it can't be done. As mentioned above, I don't think such word games are very productive. I won't argue any longer about this.
So, if you have any objections be sure to attach a proposal on how you would select the visibility to enable spreading over multiple TUs.
Please -- I acknowledged at the beginning of my review that I hadn't had time to thoroughly examine the implementation and suggest improvements.
The visibility discussion is purely about the interface, it has nothing to do with the implementation.
But I do have a little experience in C++, and I wanted to point out what I saw as gaps in your rationale.
That's fine with me.
2. Connections: The proposed FSM lib connects the parts at link time, through a call to the react function. This works fine but does leave you with the "problem" that there is no way to get a global view of your FSM, *not* *even* *at* *runtime*. While the FSM still resides inside Stopped you don't have *any* way of knowing that Running even exists. Likewise, once the transition to Running has been made, you cannot possibly know whether there are other states, because Running might itself define similar custom_reactions that lead to yet other states when they are triggered. Note that the declaration of the custom reaction cannot have any knowledge of the target state, as you could then no longer hide Running in a separate TU.
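[Roughly, in code. This follows the StopWatch tutorial in spirit only: includes and the event definition are omitted, and the exact template parameters and the transit spelling should be checked against the docs.]

// Main.cpp: Stopped declares a custom reaction but cannot name the target state.
struct Stopped : fsm::simple_state< Stopped, Active,
    fsm::custom_reaction< EvStartStop > >
{
    fsm::result react( const EvStartStop & );    // defined in Running.cpp
};

// Running.cpp: only here does the FSM "learn" that Running exists.
struct Running : fsm::simple_state< Running, Active > {};

fsm::result Stopped::react( const EvStartStop & )
{
    return transit< Running >();                  // the connection is made at link time
}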
Here you're just describing your particular implementation.
Yes.
Now, let us consider other possibilities for how Running could be connected to Stopped (I'm assuming neither a particular lib interface nor a particular lib implementation; the only thing I assume is that we have an FSM employing a hypothetical lib with the same states Active, Stopped and Running spread over the TUs Running.cpp and Main.cpp):
Good, now you're going to consider alternatives.
It seems you haven't noticed that I did that in the Visibility discussion too.
a) At compile time only: I don't see how that could work: Running.cpp does not have any knowledge about Main.cpp and vice versa.
As far as I'm concerned, you've just dismissed this out of hand because you can't see how to do it.
No, I haven't dismissed that possibility; I've just stated a fact, namely that *I* don't see how it could work. This also falls into the word-games category and I won't argue any more about this.
Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU.
Yes, that occurred to me also. I don't see how I can make the link weaker without sacrificing static checking (or other requirements). And please: I'm NOT dismissing that possibility and concrete suggestions are always *very* welcome.
b) At runtime only: State definition could contain a static instance of a class, which, in its constructor, registers the state with its outer state.
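[A generic sketch of that idiom, with made-up names, together with the well-known catch:]

#include <vector>

struct Registry { std::vector< const char * > innerStates_; };

Registry & activeRegistry()                      // registry kept by the outer state Active
{
    static Registry theRegistry;
    return theRegistry;
}

// Running.cpp
namespace
{
    struct RunningRegistrar
    {
        RunningRegistrar() { activeRegistry().innerStates_.push_back( "Running" ); }
    } const runningRegistrar;                    // the catch: if nothing in this TU is ever
}                                                // referenced, this object's initialization
                                                 // (and hence the registration) may never happen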
This is well-known not to work, for the reasons you state.
c) Any combination of the above: I don't see how that could improve things in any way.
If I were working on the library, I'd try a combination of your current technique and a).
Concrete suggestions welcome. Especially interesting would be a proof that shows either of the following:
a) Different TUs can share (read/write) information at compile time.
b) If a) isn't possible, how link-time connections can be "improved" by compile-time information available from a single TU.
I hope we agree that proving a) or b) is a lot less work than proving that it can't be done.
3. State-local storage: Consider what we need to do if we want to give Running a new auxiliary state variable, e.g. the startTime_ mentioned in the tutorial. Because Running has its own storage, we can simply add a member to its class and are done. We don't need to change Active or the StopWatch. If the proposed FSM lib didn't provide storage for each state, it would force people to change the state_machine each time they need to introduce such a variable (or at least once, when they introduce the first variable).
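[For example, following the tutorial's startTime_; the exact simple_state parameters are not the point here:]

#include <ctime>

// Running.cpp -- the only file that changes when Running gains a variable
struct Running : fsm::simple_state< Running, Active >
{
    Running() : startTime_( std::time( 0 ) ) {}
    std::time_t startTime_;    // state-local: constructed on entry, destroyed on exit;
                               // Active and StopWatch remain untouched
};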
I never said you should omit state-local storage.
Agreed. I added this discussion because you are dissatisfied with the overall performance and I think that state-local storage is usually the main perf bottleneck.
Point 1 & 2 limit the maximum possible dispatch performance, more on that later. Point 3 limits the maximum possible transition performance, which will typically be the main performance bottleneck.
If there are order of initialization problems, the arrays can be allocated on first use.
No, that is not an option. Unless the allocation and the filling of the array always take considerably less time than the reaction itself, this would prohibit the use of such an FSM lib in real-time applications. In real-time applications you mostly cannot afford such lazy-init work, because it would raise the upper limit of the reaction time to often unacceptable levels.
You snipped the context.
The context is irrelevant, see below.
Jonathan Turkanis wrote:
When it needs to assemble the array of pointers, it can simply ask each of them for their contribution. If there are order of initialization problems, the arrays can be allocated on first use.
The first use would be when the machine is constructing the table of pointers, which would presumably be when it is constructed or immediately after it is started.
How does the machine construct the table of pointers when it doesn't know what the full FSM layout looks like?
In my experience, C++ provides extremely fine-grained control over which parts of a computation are performed at compile time and which at runtime. I have a strong suspicion that an FSM library could be written which satisfies the nine requirements and also provides ruthless efficiency, at least in many common cases.
I believe a library satisfying the 9 requirements can at best provide O(n) dispatch for an FSM using all the features, where n is the number of currently active orthogonal regions. So, if you don't use orthogonal states you get constant time dispatch. The reason lies in the fact that, as I hope I have shown above,
Sorry, but no.
Too bad, it's the best I can do. As I said, it is much more difficult to prove that a certain design is the best possible than to show, with a concrete proposal, that an improvement is possible. I for one don't think that spending any more effort on the former is justified.
you cannot know what your whole FSM looks like. Dispatch must always begin with a currently active innermost state and consider its reactions first. Only if there is no suitable reaction for the innermost state is the next outer state considered, and so on until an outermost state is reached. Now, since the innermost state knows all its outer states at compile time, it could collapse all the reactions of the outer states into a single STT, yielding constant-time lookup for each innermost state *separately*. Hence, if there are n innermost states active at the same time, you have O(n) lookup. In any case, I don't expect dispatch to be the main performance bottleneck in a typical FSM, see above.
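[In other words, roughly; this is a generic sketch of the bound being argued, not the library's code, and all names are made up:]

#include <cstddef>

typedef bool (*Reaction)();                 // returns true if the event was consumed

// One table per innermost state: the reactions of all its outer states have
// already been collapsed into it, which is possible because the innermost
// state knows all its outer states at compile time.
struct CollapsedStt
{
    const Reaction * byEventId_;            // indexed by event id -> O(1) per region
    std::size_t size_;
};

// n currently active innermost states (orthogonal regions) -> n constant-time
// lookups -> O(n) dispatch overall.
inline void dispatch( const CollapsedStt regions[], std::size_t n, std::size_t eventId )
{
    for ( std::size_t i = 0; i != n; ++i )
    {
        if ( eventId < regions[ i ].size_ && regions[ i ].byEventId_[ eventId ] != 0 )
            regions[ i ].byEventId_[ eventId ]();
    }
}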
It may turn out to be necessary to sacrifice performance for some FSMs, but I believe a hybrid approach may be possible which uses fast dispatch in the majority of cases. I'm willing to be convinced that it is impossible, but I haven't seen a satisfying explanation yet.
A hybrid approach could indeed be possible, but I don't think it can be done without requiring help from the user (i.e. complicating the interface) or constraining the functionality.
Have you tried?
No, because my instincts say the opposite of what yours say.
My instinct says it can be done.
Suggestions welcome.
For example, you could require the user to somehow (at compile time or runtime) register all the possible states of a machine, or you could prohibit custom_reactions (i.e. require the implementation of the whole FSM in one TU).
Again, I don't know what you mean by "registering all possible states". If it just means listing them in a typelist, for instance, I
Yes, that is one possibility.
don't see why this can't be done separately for each TU that contains part of the state machine implementation. The definition of the whole machine doesn't necessarily have to enumerate all the states.
You seem to be ignoring orthogonal states.
And I'm not suggesting you prohibit custom reactions.
Noted.
Both would give you O(1) lookup. But let's not forget that lookup is typically not the main performance bottleneck! Neither of these measures would speed up transition execution.
As far as I can see, you have not explained why state-local storage is expensive, even in your implementation. Is it dynamic allocation?
No.
Why not construct small states into a properly aligned buffer?
You can do that with the current version by overloading operator new of each state. Construction is inefficient because that involves various book-keeping tasks. I think it is best if you have a look at the code.
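[Something along these lines, for a single state; this is hypothetical, pre-C++11 style, and the buffer size is an assumed upper bound:]

#include <cstddef>

struct Running /* : fsm::simple_state< Running, Active > */
{
    void * operator new( std::size_t )        // construct the state into static storage ...
    {
        return storage_.buffer_;
    }
    void operator delete( void *, std::size_t ) {}   // ... so there is nothing to free

private:
    union Storage { double aligner_; char buffer_[ 64 ]; };  // 64: assumed to be enough
    static Storage storage_;
};

Running::Storage Running::storage_;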
Transition execution can only be sped up by prohibiting state-local storage, i.e. by further constraining the functionality. "Why not?" you might ask.
Of course.
It seems that this can only be done if you complicate the interface for everyone, including the users who need the full functionality. A user requiring state-local storage would be forced to somehow associate a state identifier with a type, which she can then use as a container to hold the auxiliary variables.
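[Something like the following, purely hypothetically; none of these names exist anywhere:]

#include <ctime>
#include <map>

struct RunningData { std::time_t startTime_; };    // the auxiliary variables for Running

typedef int StateId;                                // some state identifier
std::map< StateId, RunningData > auxiliaryStorage;  // maintained by the user, outside the states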
I can't see why this would be necessary. Honestly, I really don't understand what you're talking about here. As I said at the beginning I accept the requirements, including state local storage.
See above, because it is the main performance hog.
I guess this is probably where our main disagreement lies: for me, the users requiring the full functionality have priority. Why? Because I think that in a certain class of applications most non-trivial FSMs will hugely benefit from state-local storage (at least some users have confirmed that).
Nothing I said in my review implied that you should sacrifice features for performance; all I want is a detailed explanation why this would be necessary.
Ok, it was just a guess.
Others have suggested that the library might be accepted but renamed to indicate that it did not pretend to be the definitive C++ implementation of FSMs. For instance, the proposed library would be "Boost.UMLStatecharts," while a separate library might cover faster state machines with fewer features.
I'm against this unless it can be shown that the better performance can only be obtained by sacrificing functionality. I want to emphasize again that I phrased my review as an objection to the Rationale section of the documentation.
As I said, I can't explain it any better than I already did. This doesn't mean that I won't answer questions. I have to accept the consequences.
To conclude the above:
2. I agree that it is theoretically possible to implement a hybrid library that supports both classes, but *only* at the cost of a much more complex interface *and* a much more complex implementation. I would even go as far as saying that two separate libraries optimized for each class would have very little in common (interface or implementation). I don't see any benefits in a hybrid approach other than being able to say: "This is *the* solution for all your FSM problems". And it could very well turn out that the hybrid interface is so complex that users would not consider using it.
My guess is that a smooth interface might be possible.
A concrete proposal is welcome.
- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:
int main()
{
    myMachine.process_event( *boost::intrusive_ptr<Event>( new Event() ) );
}
I'm sure there is a good reason for this restriction, but there must be a cleaner way to accomplish the same thing.
Have you read the rationale for this?
http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)
You don't explain why a pointer to a dynamically allocated event can't be passed.
I could offer such an interface, but that would leave users guessing who will be deleting the event after consumption.
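[To illustrate the ownership question with hypothetical signatures:]

#include <boost/intrusive_ptr.hpp>

struct Event;

void process_event( const Event * pEvent );
// raw pointer: once the event has been consumed (possibly only after deferral),
// who deletes it -- the machine or the caller?

void process_event( const boost::intrusive_ptr< const Event > & pEvent );
// reference counting answers that question unambiguously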
- I don't understand why it is necessary to pass a type as a length-one mpl::list simply because it is a class template instantiation. The docs say it is "due to some template-related implementation details", but this sounds fishy to me.
This is the example in the docs:
struct NotShooting : fsm::simple_state< NotShooting, Camera, /* ... */,
    mpl::list< fsm::deep_history< Idle > >, fsm::has_deep_history >
{
    // ...
};
Internally, I use mpl::is_sequence to find out whether we already have a list of inner initial states. If there is none, I need to make such a list. At the point of definition of NotShooting, Idle is an incomplete type. Now the problem is that mpl::is_sequence triggers an instantiation of the deep_history template, and I have not found a way to prevent access to Idle inside that template. When you wrap that in a list, the deep_history template is not instantiated and Idle is not accessed. I'm sure there is a solution, but the problem simply never made it to the top of my to-do list.
Perhaps you could automatically wrap the template argument at that position in a sequence-wrapper which turns most types into length-one sequences but is specialized for mpl::list to pass through the intrinsic operations.
That might work, yes.

Regards,

--
Andreas Huber
When replying by private email, please remove the words spam and trap from the address shown in the header.