
I vote to reject the FSM library because I find the rationale for not providing better performance insufficient. Specifically, I am not convinced that the framework could not have been implemented to use static dispatch, constant-time dynamic dispatch, or some combination of these and the current approach. Normally, before voting to reject a library I like to study the implementation thoroughly so I can make concrete proposals about how it might be improved. I haven't had time to do this with the FSM library, but I would still like my view to be known. I'll start by going through the relevant portions of rationale.html; then I'll answer the review questions.

-------

* Introduction. The nine requirements are well-stated, and I accept them.

* Dynamic configurability. I agree with the argument for not making all state machines dynamically configurable. However, the documentation does not explain why it would not be possible to support both dynamically-configurable and non-dynamically-configurable state machines in the same framework, the way xpressive supports both dynamically and statically specified regular expressions. Still, I don't consider dynamic configurability essential, so I'm satisfied with the library in this respect.

* Speed versus scalability tradeoffs. Here there is a big problem. I accept that there must be a speed versus scalability tradeoff; in fact, I'd be slightly surprised if there weren't. However, I would normally expect a slowdown of at most a few percentage points when using a framework; I find it stunning that the difference in performance between hand-written machines and framework-generated machines is so significant (10ns/10ns vs. 130ns/220ns), even in a "somewhat unrealistic" test case. In order to accept this performance gap I would need a very convincing explanation.
Instead, the explanation seems to be of this form: "The requirements are X; I tried hard to satisfy X without compromising performance, but I couldn't; therefore, it can't be done." Even if we accept that the author really has tried all known techniques, it is important to document what was tried and why it doesn't work, since many library users are sure to be surprised by the performance difference and will wonder whether it represents a flawed design.

The explanation of why the performance difference shouldn't matter is also unconvincing. I'm particularly disturbed by the implication that FSMs should not be used for parsing. Why not? I have a candidate application of FSMs which is very similar to parsing. I'm currently modifying the Iostreams library to accommodate non-blocking i/o, and I have come to see that many filters can be thought of as state machines. The fact that a dispatch may involve a linear search with an RTTI check at each step immediately rules out use of the FSM framework to implement filters, since this search would have to be performed for each character. So if I want to benefit from the FSM abstraction, I would have to write my own framework. I don't see why this should be necessary.

* Double dispatch. This seems like it should be a subsection of "Speed versus scalability tradeoffs".

- Acyclic visitor: I accept this explanation.

- GOF Visitor: It's natural to ask why a complex state machine can't be broken up into several parts which don't know much about each other, each of which uses a more efficient form of dispatch. Less efficient dispatch would be needed only at junctions between the separate parts. Dave Gomboc raised a similar point, and I don't think it has been adequately addressed.

- Two-dimensional array of function pointers: This explanation makes no sense to me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's common for a composite component to make use of several components implemented in different TUs.
For instance, the composite component may "know about" the other components by derivation, by containment, by typelists, etc. When it needs to assemble the array of pointers, it can simply ask each of them for their contribution. If there are order-of-initialization problems, the arrays can be allocated on first use.

-------

In my experience, C++ provides extremely fine-grained control over which parts of a computation are performed at compile time and which at runtime. I have a strong suspicion that an FSM library could be written which satisfies the nine requirements and also provides ruthless efficiency, at least in many common cases. It may turn out to be necessary to sacrifice performance for some FSMs, but I believe a hybrid approach may be possible which uses fast dispatch in the majority of cases. I'm willing to be convinced that it is impossible, but I haven't seen a satisfying explanation yet.

Pavel Vozenilek wrote:
Here are some questions you might want to answer in your review:
* What is your evaluation of the documentation? How easy (or hard) is it to understand the library features? What can be improved?
The documentation is quite good, except that the Rationale is insufficient.
* What is your evaluation of the design? What features are supported better by alternative FSMs? What could be added (or removed) from the library?
Since I've already discussed my dissatisfaction with performance, I'll just address the library's user interface. Overall, I find the design very good. Others have complained about the use of CRTP or about having to forward-declare states, but to me it looks pretty easy to use. I only have a few comments:

- The documentation warns that executing code after calling simple_state::transit (and several other functions) can lead to undefined behavior. I believe the library should prevent this from occurring. One approach would be to have the function specify a result which is stored until react has finished executing, and then acted upon. Another possibility would be to give result move semantics, so result instances can't be stored in local variables and returned later.

- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:

    int main()
    {
        myMachine.process_event(
            *boost::intrusive_ptr<Event>( new Event() ) );
    }

  I'm sure there is a good reason for this restriction, but there must be a cleaner way to accomplish the same thing.

- The procedure for posting events from inside entry actions looks messy; this is noted in the docs, and it's not nearly as bad as the syntax for event deferral, but I'd like to see it improved.

- I don't understand why it is necessary to pass a type as a length-one mpl::list simply because it is a class template instantiation. The docs say it is "due to some template-related implementation details", but this sounds fishy to me.
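The move-semantics idea in the first comment could look something like the following sketch. This is C++11 for brevity, all names are invented, and it is emphatically not the library's actual interface; it only shows how a move-only, framework-constructed result type makes "store the result and return it later" impossible to express:

```cpp
#include <cassert>

// Hypothetical sketch: a reaction result with move-only semantics, so a
// handler cannot copy a result into a long-lived local and keep running.
class reaction_result {
public:
    reaction_result(reaction_result&& other) noexcept
        : consumed_(other.consumed_) { other.consumed_ = false; }
    reaction_result(const reaction_result&) = delete;            // no copies
    reaction_result& operator=(const reaction_result&) = delete;

    bool consumed() const { return consumed_; }

private:
    friend class dispatcher;               // only the framework creates one
    explicit reaction_result(bool consumed) : consumed_(consumed) {}
    bool consumed_;
};

// Stand-in for the framework: hands out results and acts on them only
// after react() has returned, as the review suggests.
class dispatcher {
public:
    reaction_result transit() { return reaction_result(true); }
    reaction_result forward() { return reaction_result(false); }

    template <typename Reaction>
    bool dispatch(Reaction react) {
        reaction_result r = react(*this);  // result acted upon here,
        return r.consumed();               // after react() has finished
    }
};
```

A handler then has little choice but to return the result of transit() directly, which is exactly the discipline the documentation currently asks for by convention.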
* The library documentation contains a few not-yet-solved issues (the name, separating the library into two parts, exception handling). What is your opinion here?
I have no opinion.
* What is your evaluation of the implementation? Are there parts of the code that are too obscure or that duplicate existing Boost functionality? Can something be factored out into a standalone library or among general utilities?
Unfortunately, I only had time to glance at the implementation.
* Are there performance bottlenecks? Does the library design fit the requirements of real-time systems? How usable is it for embedded systems? Is the documentation clear on performance tradeoffs?
See above.
* What is your evaluation of the potential usefulness of the library?
Useful, judging by the comments of reviewers who are actually using the library.
Can you compare this FSM to other implementations?
No.
* Did you try to use the library? With what compiler? Did you have any problems? Do you have tips for better support of older compilers? What are possible portability problems?
I ran the regression tests on eight compilers, with the following results:

1. VC7.1 -- passed all tests
2. VC8.0 -- passed all tests
3. VC7.0 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...
4. GCC 3.4.1 (Cygwin) -- passed all tests
5. GCC 3.2 (MinGW) -- passed all tests
6. Intel 8.0 -- ...failed updating 4 targets... ...skipped 4 targets... ...updated 678 targets...
7. CW 8.3 -- ...failed updating 28 targets... ...skipped 112 targets... ...updated 546 targets...
8. Borland 5.6.4 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...

I'd be happy to send the full error messages.
* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Probably about six or seven hours studying the documentation, reading the review comments and preparing my review.
* Are you knowledgeable about the problem domain?
I'm not very familiar with FSMs, but I know quite a bit about other formal models of computation, such as DFAs and Abstract State Machines.
Pavel Vozenilek FSM review manager
Jonathan

Hi Jonathan

Thanks for your review! It seems I have already answered at least one of your doubts/questions elsewhere; in particular:
The explanation of why the performance difference shouldn't matter is also unconvincing. I'm particularly disturbed by the implication that FSMs should not be used for parsing. Why not? I have a candidate
http://article.gmane.org/gmane.comp.lib.boost.devel/119246

Let me know whether that's convincing or not. At the moment, I don't have time to answer your other points.

Regards,

--
Andreas Huber
When replying by private email, please remove the words spam and trap from the address shown in the header.

Andreas Huber wrote:
Hi Jonathan
Thanks for your review!
I'm sorry it was not more positive.
It seems I have already answered at least one of your doubts/questions elsewhere; in particular:
The explanation of why the performance difference shouldn't matter is also unconvincing. I'm particularly disturbed by the implication that FSMs should not be used for parsing. Why not? I have a candidate
Sorry I missed this. I think I missed the whole message.
Let me know whether that's convincing or not.
Okay. Let me see if I understand you correctly: Andreas Huber wrote:
... What I want to say here is that it is a bad idea to use this library for parsing, as it is so general that performance tradeoffs are inevitable.
This seems to be a restatement of the performance vs. scalability part of the rationale, correct?
For parsing, one is much better off with a parser framework, which is optimized for the task.
The IMO important observation is the following: Parsers like Spirit only need to return when they have finished processing all "events" (characters). An FSM, however, must return after processing each event. A parser can exploit this fact and store state with recursive calls, which is of course much faster (some stuff can even be inlined) than any super-clever double-dispatch scheme. There is often a hard limit on stack space, which - I think - is the reason why FSMs do make sense in parsers nevertheless.
Here, are you talking about any FSM framework or about your library in particular? At any rate, I don't understand why recursion is more efficient. In the case I mentioned -- non-blocking filters -- there is a strong resemblance to FSMs, because the filter may be interrupted at any point during its processing if input is temporarily unavailable or if output buffers are full. Filter writers must categorize the various points at which the filtering algorithm might be interrupted and save this information so that when processing begins again the algorithm can continue correctly. These categories correspond to states. Furthermore, there is often auxiliary information that must be stored; e.g., if a filter was interrupted after it had consumed a character but before it could pass it downstream, the character must be stored together with the state. This auxiliary information varies from state to state; it would therefore seem to correspond to state-local storage.
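To make the analogy concrete, here is a minimal sketch (names invented; this is not Iostreams code) of a non-blocking filter written as an explicit state machine. The pending character is exactly the kind of state-local storage described above:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// A toy non-blocking filter that uppercases input. It may be interrupted
// whenever the output buffer is full, so it must save both which state it
// was in and any consumed-but-unwritten character, then resume correctly.
class toupper_filter {
public:
    toupper_filter() : state_(reading), pending_(0) {}

    // Appends to out until out reaches out_capacity; returns the number of
    // input characters consumed. May be called repeatedly to resume.
    std::size_t filter(const std::string& in, std::string& out,
                       std::size_t out_capacity) {
        std::size_t consumed = 0;
        for (;;) {
            if (state_ == writing) {           // resume: flush saved char
                if (out.size() >= out_capacity)
                    return consumed;           // interrupted again
                out += pending_;
                state_ = reading;
            }
            if (consumed == in.size())
                return consumed;
            pending_ = static_cast<char>(
                std::toupper(static_cast<unsigned char>(in[consumed++])));
            state_ = writing;                  // consumed, not yet written
        }
    }

private:
    enum state { reading, writing };           // the two states of the FSM
    state state_;                              // which state is active
    char pending_;                             // state-local storage
};
```

Even this trivial filter needs one state variable and one piece of per-state auxiliary data; realistic filters multiply both, which is why a framework would help.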
However, all the FSM-employing parsers I know are of the dynamic variant (e.g. regex) and these FSMs are not directly designed by humans but computed by an algorithm.
I don't see the relevance of this.
Last but not least, I think it would be tedious for a human to implement a parser on the state level.
It's also tedious to write a non-blocking filter by hand; being able to use FSMs could help bring some order to the chaos.
At the moment, I don't have time to answer your other points.
Okay.
Regards,
Jonathan

Jonathan Turkanis wrote:
Sorry I missed this. I think I missed the whole message.
Let me know whether that's convincing or not.
Okay. Let me see if I understand you correctly:
Andreas Huber wrote:
... What I want to say here is that it is a bad idea to use this library for parsing, as it is so general that performance tradeoffs are inevitable.
This seems to be a restatement of the performance vs. scalability part of the rationale, correct?
Yes.
For parsing, one is much better off with a parser framework, which is optimized for the task.
The IMO important observation is the following: Parsers like Spirit only need to return when they have finished processing all "events" (characters). An FSM, however, must return after processing each event. A parser can exploit this fact and store state with recursive calls, which is of course much faster (some stuff can even be inlined) than any super-clever double-dispatch scheme. There is often a hard limit on stack space, which - I think - is the reason why FSMs do make sense in parsers nevertheless.
Here are you talking about any FSM framework or about your library in particular?
I'm talking about any FSM framework.
At any rate, I don't understand why recursion is more efficient.
Recursion can be implemented with simple (non-virtual) function calls. Moreover, if recursion depth is known at compile time, you can inline such calls. I believe double-dispatch needs at least one indirection (virtual call, function pointer, whatever). I might be wrong.
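A minimal illustration of the difference (all names invented): the parser keeps its position in the call stack via direct, inlinable calls, while the event-driven machine must return after every event and therefore reaches the next reaction through at least one stored indirection:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parser style: "state" lives in the call stack; direct, inlinable calls.
bool parse_digits(const std::string& s, std::size_t i) {
    if (i == s.size()) return true;
    if (s[i] < '0' || s[i] > '9') return false;
    return parse_digits(s, i + 1);             // direct recursive call
}

// FSM style: the machine must return to its caller after each event, so
// the current reaction is stored as a function pointer and every event
// pays for one indirect call.
struct fsm;
typedef bool (*state_fn)(fsm&, char);
struct fsm { state_fn current; bool ok; };

bool digit_state(fsm& m, char c) {
    m.ok = (c >= '0' && c <= '9');
    return m.ok;
}

bool feed(fsm& m, const std::string& s) {
    m.current = &digit_state;
    m.ok = true;
    for (std::size_t i = 0; i < s.size() && m.ok; ++i)
        m.current(m, s[i]);                    // indirection on every event
    return m.ok;
}
```

Both accept the same language; the structural difference is only where the state lives and how the next step is reached.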
In the case I mentioned -- non-blocking filters -- there is a strong resemblance to FSMs, because the filter may be interrupted at any point during its processing if input is temporarily unavailable or if output buffers are full. Filter writers must categorize the various points at which the filtering algorithm might be interrupted and save this information so that when processing begins again the algorithm can continue correctly. These categories correspond to states. Furthermore, there is often auxiliary information that must be stored; e.g., if a filter was interrupted after it had consumed a character but before it could pass it downstream, the character must be stored together with the state. This auxiliary information varies from state to state; it would therefore seem to correspond to state-local storage.
Ok.
However, all the FSM-employing parsers I know are of the dynamic variant (e.g. regex) and these FSMs are not directly designed by humans but computed by an algorithm.
I don't see the relevance of this.
It's a real-world data point that supports (only weakly, admittedly) my view that you do not normally implement parser FSMs manually; rather, you use a tool to do that for you.
Last but not least, I think it would be tedious for a human to implement a parser on the state level.
It's also tedious to write a non-blocking filter by hand; being able to use FSMs could help bring some order to the chaos.
I can't really comment on this; I'm not knowledgeable about these things.

Regards,

--
Andreas Huber

* Introduction. The nine requirements are well-stated, and I accept them.
Good. That's a start. [snip]
* Speed versus scalability tradeoffs.
Here there is a big problem. I accept that there must be a speed versus scalability tradeoff; in fact, I'd be slightly surprised if there weren't. However, I would normally expect a slow-down of at most a few percentage points when using a framework; I find it stunning that the difference in performance between hand-written machines and framework-generated machines is so significant (10ns/10ns vs. 130ns/220ns), even in a "somewhat unrealistic" test case.
See below.
In order to accept this performance gap I would need a very convincing explanation. Instead, the explanation seems to be of this form: "The requirements are X; I tried hard to satisfy X without compromising performance, but I couldn't; therefore, it can't be done."
Do I really say this? I don't see any place where I say or imply that something can't be done because *I* couldn't think of a solution.
The explanation of why the performance difference shouldn't matter is also unconvincing. I'm particularly disturbed by the implication that FSMs should not be used for parsing. Why not?
See my upcoming answer in the branch of this thread.
* Double dispatch. This seems like it should be a subsection of "Speed versus scalability tradeoffs"
Noted.
- GOF Visitor: It's natural to ask why a complex state machine can't be broken up into several parts which don't know much about each other, each of which uses a more efficient form of dispatch. Less efficient dispatch would be needed only at junctions between the separate parts. Dave Gomboc raised a similar point, and I don't think it has been adequately addressed.
I think the answer lies in how an FSM can be split up usefully; see below. Also, dispatch is typically not the main performance problem.
- Two-dimensional array of function pointers. This explanation makes no sense to me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's common for a composite component to make use of several components implemented in different TUs. For instance, the composite component may "know about" the other components by derivation, by containment, by typelists, etc. When it needs to assemble the array of pointers, it can simply ask each of them for their contribution.
No, in a nutshell it cannot, because it doesn't know how many or what components exist. Of course, there must be links between the components, but C++ does not allow you to follow these links in a useful way, and there is no way to force the components in the TUs to register themselves. The only way you can guarantee the registration is by requiring the programmer to register all the parts explicitly, which is contrary to the scalability requirements. I can only show this with an example. What follows is stripped-down StopWatch code, with one state (Running) implemented in a separate TU. I'm currently too tired to push this through a compiler, so I hope this is correct enough to make sense:

*** Main.hpp ***

    #include <boost/fsm/state_machine.hpp>
    #include <boost/fsm/event.hpp>
    #include <boost/fsm/simple_state.hpp>

    struct EvStartStop : fsm::event< EvStartStop > {};

    struct Active;
    struct StopWatch : fsm::state_machine< StopWatch, Active > {};

    struct Stopped;
    struct Active : fsm::simple_state<
      Active, StopWatch, fsm::no_reactions, Stopped > {};

*** Main.cpp ***

    #include "Main.hpp"

    struct Stopped : fsm::simple_state<
      Stopped, Active, fsm::custom_reaction< EvStartStop > >
    {
      // Here we only say that we might do something with EvStartStop
      fsm::result react( const EvStartStop & );
    };

    int main()
    {
      StopWatch stopWatch;
      stopWatch.process_event( EvStartStop() );
      return 0;
    }

*** Running.cpp ***

    #include "Main.hpp"

    struct Running : fsm::simple_state< Running, Active > {};

    fsm::result Stopped::react( const EvStartStop & )
    {
      // Here we connect the so far unrelated parts of the FSM together.
      return transit< Running >();
    }

Looking at the code above, we notice several things:

1. Visibility: In the proposed FSM lib, an outer state "knows" only its inner initial state (which can be an incomplete type) but none of its other inner states (Active does not know that Running exists, and neither does StopWatch). An inner state knows all its outer states.
Now, that it is done this way and not the other way around (outer states knowing all their inner states, but inner states not knowing their outer states) is not a coincidence, for the simple reason that the other way wouldn't leave any room for parts in multiple TUs. I also think this is pretty much the only sensible way to spread an FSM over multiple TUs. I can't prove this (for me it is just obvious). So, if you have any objections, be sure to attach a proposal for how you would select the visibility to enable spreading over multiple TUs. It is worth noting that there are strong links to base and derived classes: a base class corresponds to an outer state and a derived class to an inner state, with the same directions of visibility, order of definition and inheritance of properties.

2. Connections: The proposed FSM lib connects the parts at link time, through a call to the react function. This works fine but does leave you with the "problem" that there is no way to have a global view of your FSM, *not* *even* *at* *runtime*. When the FSM still resides inside Stopped, you don't have *any* way of knowing that Running even exists. Likewise, when the transition to Running has been made, you cannot possibly know whether there are other states, because Running might itself define similar custom_reactions that lead you to other states when the reactions are triggered. Note that the custom reaction cannot possibly have any knowledge about the target state, as you could then no longer hide Running in a separate TU.
Now, let us consider other possibilities for how Running could be connected to Stopped (I'm assuming neither a particular lib interface nor a particular lib implementation; the only thing I assume is that we have an FSM employing a hypothetical lib with the same states Active, Stopped, Running spread over the TUs Running.cpp and Main.cpp):

a) At compile time only: I don't see how that could work; Running.cpp does not have any knowledge of Main.cpp and vice versa.

b) At runtime only: Each state definition could contain a static instance of a class which, in its constructor, registers the state with its outer state. The outermost states would register with the FSM. This way the FSM would know all its states and also their place in the state hierarchy. The problem is: C++ does not guarantee when exactly the ctors of the static objects are called (see 9.4.2/7, 3.6.2/3). They might be called only "before the first use of any function or object defined in the same translation unit as the [static] object". So, registration with static instances is pretty much useless in our case. It's a chicken-and-egg problem: the states don't register themselves until we call a function in the TU they are defined in, but we don't know what functions we need to call until they are registered. I might be reading the standard wrong, but I know of at least one platform that implements it exactly this way (MSVC7.1). That is enough for me to steer well clear of automatic runtime registration. There's another problem: runtime-only registration pretty much rules out some forms of static checking (invalid transitions).

c) Any combination of the above: I don't see how that could improve things in any way.

3. State-local storage: Consider what we need to do if we want to give Running a new auxiliary state variable, e.g. the startTime_ mentioned in the tutorial. Because Running has its own storage, we can simply add a member to its class and we are done. We don't need to change Active or the StopWatch.
If the proposed FSM lib didn't provide storage for each state, it would force people to change the state_machine each time they need to introduce such a variable (or at least once, when they introduce the first variable). This is contrary to the scalability requirements. I can't think of other possibilities to ensure scalability in this area. To improve the situation one can surely toy around with not destructing the objects holding the variables on exit (calling entry() and exit() member functions instead), but I don't think this will buy you much performance-wise, because it would make the bookkeeping inside the FSM harder (somewhere you still need to track active and inactive states). Also, in FSMs with many states this would fatten the memory footprint considerably.

Points 1 & 2 limit the maximum possible dispatch performance; more on that later. Point 3 limits the maximum possible transition performance, which will typically be the main performance bottleneck.
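For what it's worth, the runtime-only registration scheme from point 2b can be sketched as follows. All names here are invented for illustration; the code shows the pattern, and the final comment marks where the standard's weak guarantee bites:

```cpp
#include <cassert>
#include <map>
#include <string>

// A central registry mapping each state's name to its outer state's name.
struct state_registry {
    static std::map<std::string, std::string>& states() {
        static std::map<std::string, std::string> m;  // init on first use
        return m;
    }
};

// A helper whose constructor performs the registration; one namespace-scope
// static instance of it would live next to each state definition.
struct auto_register {
    auto_register(const std::string& state, const std::string& outer) {
        state_registry::states()[state] = outer;
    }
};

// In Running.cpp one would write, at namespace scope:
static auto_register running_reg("Running", "Active");
// The catch (9.4.2/7, 3.6.2/3): this constructor is only guaranteed to run
// before the first use of a function or object defined in the same TU. The
// machine never calls into Running.cpp until the state is registered, and
// the state is never registered until the machine calls into Running.cpp:
// the chicken-and-egg problem described in the text.
```

When everything is linked into one program and the TU is actually entered, the registration does happen; the problem is purely that nothing forces it to happen before dispatch needs it.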
If there are order of initialization problems, the arrays can be allocated on first use.
No, that is not an option. Unless the allocation and the filling of the array always take considerably less time than the reaction itself, this would prohibit the use of such an FSM lib in real-time applications. In real-time applications you mostly cannot afford such lazy-init tricks, because they would raise the upper limit on the reaction time to often unacceptable levels. In other words: in real-time applications you much prefer having the same performance characteristics for every event, even if that means the average performance is worse than the average that could be achieved by making lazy-init optimizations for the first event.
In my experience, C++ provides extremely fine-grained control over which parts of a computation are performed at compile time and which at runtime. I have a strong suspicion that a FSM library could be written which satisfies the nine requirements and also provides ruthless efficiencies, at least in many common cases.
I believe a library satisfying the 9 requirements can at best provide O(n) dispatch for an FSM using all the features, where n is the number of currently active orthogonal regions. So, if you don't use orthogonal states, you get constant-time dispatch. The reason lies in the fact that, as I hope I have shown above, you cannot know what your whole FSM looks like. Dispatch must always begin with a currently active innermost state and consider its reactions first. Only if there is no suitable reaction for the innermost state can the next outer state be considered, and so on until an outermost state is reached. Now, since the innermost state knows all its outer states at compile time, it could collapse all the reactions of the outer states into a single STT, yielding constant-time lookup for each innermost state *separately*. Hence, if there are n innermost states active at the same time, you have O(n) lookup. In any case, I don't expect dispatch to be the main performance bottleneck in a typical FSM; see above.
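A sketch of this collapsed-STT idea, with purely illustrative names (a two-state, two-event machine rather than anything from the library): each row of the table already contains the outer states' reactions, flattened at compile time, so dispatch per active innermost state is a single array lookup.

```cpp
#include <cassert>

// Illustrative state and event identifiers for a tiny StopWatch-like FSM.
enum state_id { Stopped, Running, num_states };
enum event_id { EvStartStop, EvReset, num_events };

typedef state_id (*reaction_fn)(state_id current);

state_id to_running(state_id) { return Running; }
state_id to_stopped(state_id) { return Stopped; }
state_id no_change(state_id s) { return s; }

// One row per innermost state, one column per event. Each row stands for
// the state's own reactions merged with those inherited from its outer
// states, collapsed into a single flat table.
const reaction_fn stt[num_states][num_events] = {
    /* Stopped */ { to_running, no_change  },
    /* Running */ { to_stopped, to_stopped },
};

state_id dispatch(state_id current, event_id ev) {
    return stt[current][ev](current);   // O(1) lookup per active region
}
```

With n active orthogonal regions one such lookup is needed per region, which is exactly the O(n) bound argued for above.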
It may turn out to be necessary to sacrifice performance for some FSMs, but I believe I hybrid approach may be possible which uses fast dispatch in the majority of cases. I'm willing to be convinced that it is impossible, but I haven't seen a satisfying explanation yet.
A hybrid approach could indeed be possible, but I don't think it can be done without requiring help from the user (i.e. complicating the interface) or constraining the functionality. For example, you could require the user to somehow (at CT or RT) register all the possible states of a machine, or you could prohibit custom_reactions (i.e. require the implementation of the whole FSM in one TU). Both would give you O(1) lookup. But let's not forget that lookup is typically not the main performance bottleneck! Neither of these measures would speed up transition execution. Transition execution can only be sped up by prohibiting state-local storage, i.e. by further constraining the functionality. "Why not?" you might ask. It seems that this can only be done by complicating the interface for everyone, including the users who need the full functionality. A user requiring state-local storage would be forced to somehow associate a state identifier with a type, which she can then use as a container to hold the auxiliary variables.

I guess here probably lies our main disagreement: for me, the users requiring the full functionality have priority. Why? Because I think that in a certain class of applications most non-trivial FSMs will hugely benefit from state-local storage (at least some users have confirmed that).

To conclude the above:

1. Users confirm that the current interface and features are useful for a certain (IMO sufficiently large) class of applications. I realize that there is another sufficiently large class of applications which doesn't need many of the performance-degrading features present in the proposed FSM library.

2. I agree that it is theoretically possible to implement a hybrid library that supports both classes, but *only* at the cost of a much more complex interface *and* a much more complex implementation.
I would even go as far as saying that two separate libraries optimized for each class would have very little in common (interface or implementation). I don't see any benefits in a hybrid approach other than being able to say: "This is *the* solution for all your FSM problems". And it could very well turn out that the hybrid interface is so complex that users would not consider using it.
- The documentation warns that executing code after calling simple_state::transit (and several other functions) can lead to undefined behavior. I believe the library should prevent this from occurring. One approach would be to have the function specify a result which is stored until react has finished executing, and then acted upon. Another possibility would be to give result move semantics, so result instances can't be stored in local variables and returned later.
- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:
    int main()
    {
        myMachine.process_event(
            *boost::intrusive_ptr<Event>( new Event() ) );
    }
I'm sure there is a good reason for this restriction, but there must be a cleaner way to accomplish the same thing.
Have you read the rationale for this? http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)
- The procedure for posting events from inside entry actions looks messy; this is noted in the docs, and it's not nearly as bad as the syntax for event deferral, but I'd like to see it improved.
I don't see how. Suggestions welcome.
- I don't understand why it is necessary to pass a type as a length-one mpl::list simply because it is a class template instantiation. The docs say it is "due to some template-related implementation details", but this sounds fishy to me.
This is the example in the docs:

    struct NotShooting : fsm::simple_state<
      NotShooting, Camera, /* ... */,
      mpl::list< fsm::deep_history< Idle > >,
      fsm::has_deep_history >
    {
      // ...
    };

Internally, I use mpl::is_sequence to find out whether we already have a list of inner initial states. If there's none, I need to make such a list. At the point of definition of NotShooting, Idle is an incomplete type. Now the problem is that mpl::is_sequence triggers an instantiation of the deep_history template, and I have not found a way to prevent access to Idle inside that template. When you wrap it in a list, the deep_history template is not instantiated and Idle is not accessed. I'm sure there is a solution, but the problem simply never made it to the top of my to-do list.

[snip]
I ran the regression tests on eight compilers, with the following results:
1. VC7.1 -- passed all tests
2. VC8.0 -- passed all tests
3. VC7.0 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...
4. GCC 3.4.1 (Cygwin) -- passed all tests
5. GCC 3.2 (MinGW) -- passed all tests
6. Intel 8.0 -- ...failed updating 4 targets... ...skipped 4 targets... ...updated 678 targets...
7. CW 8.3 -- ...failed updating 28 targets... ...skipped 112 targets... ...updated 546 targets...
8. Borland 5.6.4 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...
I'd be happy to send the full error messages.
Please do, and thanks for the testing!

Regards,

--
Andreas Huber

Andreas Huber wrote:
* Introduction. The nine requirements are well-stated, and I accept them.
Good. That's a start.
[snip]
* Speed versus scalability tradeoffs.
In order to accept this performance gap I would need a very convincing explanation. Instead, the explanation seems to be of this form: "The requirements are X; I tried hard to satisfy X without compromising performance, but I couldn't; therefore, it can't be done."
Do I really say this? I don't see any place where I say or imply that something can't be done because *I* couldn't think of a solution.
Of course not. Otherwise I wouldn't have had to make up the obviously fake quote. Consider: "Quite a bit of effort has gone into making the library fast for small simple machines and scaleable at the same time ... the scalability does not come for free." I naturally assumed it was your effort you were talking about. I'm just arguing that the Rationale doesn't adequately explain why better performance is not possible. I'm willing to be convinced that it isn't possible, and that you've made the best possible effort to find more efficient alternatives.
- Two-dimensional array of function pointers. This explanation makes no sense to me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's common for a composite component to make use of several components implemented in different TUs. For instance, the composite component may "know about" the other components by derivation, by containment, typelists, etc. When it needs to assemble the array of pointers, it can simply ask each of them for their contribution.
No, in a nutshell it cannot because it doesn't know how many or what components exist.
I'm not talking about the proposed library here. I accept that it's not possible in your implementation.
Of course, there must be links between the components but C++ does not allow you to follow these links in a useful way and there is no way to force the components in the TUs to register themselves. The only way you can guarantee the registration is by requiring the programmer to register all the parts explicitly, which is contrary to the scalability requirements.
You frequently mention "manually registering" components. It would be helpful if you would explain what you mean by this. For example, a specialization of std::pair has two components, first_type and second_type. When I write

typedef std::pair<int, float> pair_type;

does this count as "manually registering" the components of pair_type, since I have to know what they are and write them down? Lots of C++ libraries allow users to construct composite components out of components defined in different TUs. I'm not trying to be stubborn; I really don't understand why FSM is special in this regard.
I can only show this with an example:
<snip detailed example>
Looking at the code above we notice several things:
1. Visibility: In the proposed FSM lib, an outer state "knows" only its inner initial state (which can be an incomplete type) but none of its other inner states (Active does not know that Running exists; neither does StopWatch). An inner state knows all its outer states. That it is done this way and not the other way round (outer states knowing all their inner states, but inner states not knowing their outer states) is no coincidence, for the simple reason that the other way wouldn't leave any room for parts in multiple TUs. I also think this is pretty much the only sensible way to spread an FSM over multiple TUs. I can't prove this (for me it is just obvious).
Here you've come quite close to saying: "The requirements are X; I tried hard to satisfy X ...but I couldn't; therefore, it can't be done."
So, if you have any objections be sure to attach a proposal on how you would select the visibility to enable spreading over multiple TUs.
Please -- I acknowledged at the beginning of my review that I hadn't had time to thoroughly examine the implementation and suggest improvements. But I do have a little experience in C++, and I wanted to point out what I saw as gaps in your rationale.
2. Connections: The proposed FSM lib connects the parts at link time, through a call to the react function. This works fine but does leave you with the "problem" that there is no way to have a global view of your FSM, *not* *even* *at* *runtime*. When the FSM still resides inside Stopped you don't have *any* way of knowing that Running even exists. Likewise, when the transition to Running has been made, you cannot possibly know whether there are other states, because Running might itself define similar custom_reactions that lead to other states when the reactions are triggered. Note that the custom reaction cannot possibly have any knowledge of the target state, as you could then no longer hide Running in a separate TU.
Here you're just describing your particular implementation.
Now, let us consider other ways in which Running could be connected to Stopped (I'm assuming neither a particular lib interface nor a particular lib implementation; the only thing I assume is that we have an FSM employing a hypothetical lib with the same states Active, Stopped, Running spread over the TUs Running.cpp and Main.cpp):
Good, now you're going to consider alternatives.
a) At compile time only: I don't see how that could work: Running.cpp does not have any knowledge about Main.cpp and vice versa.
As far as I'm concerned, you've just dismissed this out of hand because you can't see how to do it. Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU.
b) At runtime only: State definition could contain a static instance of a class, which, in its constructor, registers the state with its outer state.
This is well-known not to work, for the reasons you state.
c) Any combination of the above: I don't see how that could improve things in any way
If I were working on the library, I'd try a combination of your current technique and a).
3. State-local storage: Consider what we need to do if we want to give Running a new auxiliary state variable, e.g. the startTime_ mentioned in the tutorial. Because Running has its own storage, we can simply add a member to its class and are done. We don't need to change Active or the StopWatch. If the proposed FSM lib didn't provide storage for each state, it would force people to change the state_machine each time they need to introduce such a variable (or at least once, when they introduce the first variable).
I never said you should omit state-local storage.
Points 1 & 2 limit the maximum possible dispatch performance, more on that later. Point 3 limits the maximum possible transition performance, which will typically be the main performance bottleneck.
If there are order of initialization problems, the arrays can be allocated on first use.
No, that is not an option. Unless the allocation and the filling of the array always take considerably less time than the reaction itself, this would prohibit the use of such an FSM lib in real-time applications. In real-time applications you mostly cannot afford to do such lazy init stuff, because that would raise the upper limit of the reaction time to often unacceptable levels.
You snipped the context. Jonathan Turkanis wrote:
When it needs to assemble the array of pointers, it can simply ask each of them for their contribution. If there are order of initialization problems, the arrays can be allocated on first use.
The first use would be when the machine is constructing the table of pointers, which would presumably be when it is constructed or immediately after it is started.
In my experience, C++ provides extremely fine-grained control over which parts of a computation are performed at compile time and which at runtime. I have a strong suspicion that an FSM library could be written which satisfies the nine requirements and also provides ruthless efficiency, at least in many common cases.
I believe a library satisfying the 9 requirements can at best provide O(n) dispatch for an FSM using all the features, where n is the number of currently active orthogonal regions. So, if you don't use orthogonal states you get constant time dispatch. The reason lies in the fact that, as I hope I have shown above,
Sorry, but no.
you cannot know how your whole FSM looks. Dispatch must always begin with a currently active innermost state and consider its reactions first. Only if there is no suitable reaction for the innermost state can the next outer state be considered, and so on until an outermost state is reached. Now, since the innermost state knows all its outer states at compile time, it could collapse all the reactions of the outer states into a single STT, yielding constant-time lookup for each innermost state *separately*. Hence, if there are n innermost states active at the same time you have O(n) lookup. In any case I don't expect dispatch to be the main performance bottleneck in a typical FSM, see above.
It may turn out to be necessary to sacrifice performance for some FSMs, but I believe a hybrid approach may be possible which uses fast dispatch in the majority of cases. I'm willing to be convinced that it is impossible, but I haven't seen a satisfying explanation yet.
A hybrid approach could indeed be possible, but I don't think it can be done without requiring help from the user (i.e. complicating the interface) or constraining the functionality.
Have you tried? My instinct says it can be done.
For example you could require the user to somehow (at CT or RT) register all the possible states of a machine or you could prohibit custom_reactions (i.e. require the implementation of the whole FSM in one TU).
Again, I don't know what you mean by "registering all possible states". If it just means listing them in a typelist, for instance, I don't see why this can't be done separately for each TU that contains part of the state machine implementation. The definition of the whole machine doesn't necessarily have to enumerate all the states. And I'm not suggesting you prohibit custom reactions.
Both would give you O(1) lookup. But let's not forget that lookup is typically not the main performance bottleneck! Neither of these measures would speed up transition execution.
As far as I can see, you have not explained why state-local storage is expensive, even in your implementation. Is it dynamic allocation? Why not construct small states into a properly aligned buffer?
Transition execution can only be sped up by prohibiting state-local storage, i.e. by further constraining the functionality. "Why not?" you might ask.
Of course.
It seems that this can only be done if you complicate the interface for everyone, including the users who need the full functionality. A user requiring state-local storage would be forced to somehow associate a state identifier with a type, which she can then use as a container to hold the auxiliary variables.
I can't see why this would be necessary. Honestly, I really don't understand what you're talking about here. As I said at the beginning I accept the requirements, including state local storage.
I guess here probably lies our main disagreement: For me the users requiring the full functionality have the priority. Why? Because I think that in a certain class of applications most non-trivial FSMs will hugely benefit from state-local storage (at least some users have confirmed that).
Nothing I said in my review implied that you should sacrifice features for performance; all I want is a detailed explanation of why this would be necessary. Others have suggested that the library might be accepted but renamed to indicate that it does not pretend to be the definitive C++ implementation of FSMs. For instance, the proposed library would be "Boost.UMLStatecharts," while a separate library might cover faster state machines with fewer features. I'm against this unless it can be shown that the better performance can only be obtained by sacrificing functionality. I want to emphasize again that I phrased my review as an objection to the Rationale section of the documentation.
To conclude the above:
2. I agree that it is theoretically possible to implement a hybrid library that supports both classes, but *only* at the cost of a much more complex interface *and* a much more complex implementation. I would even go as far as saying that two separate libraries optimized for each class would have very little in common (interface or implementation). I don't see any benefits in a hybrid approach other than being able to say: "This is *the* solution for all your FSM problems". And it could very well turn out that the hybrid interface is so complex that users would not consider using it.
My guess is that a smooth interface might be possible.
- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:
int main()
{
    myMachine.process_event(
        *boost::intrusive_ptr<Event>( new Event() ) );
}
I'm sure there is a good reason for this restriction, but there must be a cleaner way to accomplish the same thing.
Have you read the rationale for this?
http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)
You don't explain why a pointer to a dynamically allocated event can't be passed. Even if there is a reason this can't be done, there's got to be a better way than having users write

    *boost::intrusive_ptr<Event>( new Event() )

Was this just a toy example? What would a real-world use look like?
- The procedure for posting events from inside entry actions looks messy; this is noted in the docs, and it's not nearly as bad as the syntax for event deferral, but I'd like to see it improved.
I don't see how. Suggestions welcome.
Okay.
- I don't understand why it is necessary to pass a type as a length-one mpl::list simply because it is a class template instantiation. The docs say it is "due to some template-related implementation details", but this sounds fishy to me.
This is the example in the docs:
struct NotShooting : fsm::simple_state< NotShooting, Camera,
    /* ... */,
    mpl::list< fsm::deep_history< Idle > >,
    fsm::has_deep_history >
{
    // ...
};
Internally, I use mpl::is_sequence to find out whether we already have a list of inner initial states. If there's none I need to make such a list. At the point of definition of NotShooting, Idle is an incomplete type. Now the problem is that mpl::is_sequence triggers an instantiation of the deep_history template, and I have not found a way to prevent access to Idle inside that template. When you wrap the type in a list, the deep_history template is not instantiated and Idle is not accessed. I'm sure there is a solution, but the problem simply never made it to the top of my to-do list.
Perhaps you could automatically wrap the template argument at that position in a sequence-wrapper which turns most types into length-one sequences but is specialized for mpl::list to pass through the intrinsic operations.
I ran the regression tests on eight compilers, with the following results:

1. VC7.1 -- passed all tests
2. VC8.0 -- passed all tests
3. VC7.0 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...
4. GCC 3.4.1 (Cygwin) -- passed all tests
5. GCC 3.2 (MinGW) -- passed all tests
6. Intel 8.0 -- ...failed updating 4 targets... ...skipped 4 targets... ...updated 678 targets...
7. CW 8.3 -- ...failed updating 28 targets... ...skipped 112 targets... ...updated 546 targets...
8. Borland 5.6.4 -- ...failed updating 36 targets... ...skipped 144 targets... ...updated 506 targets...
I'd be happy to send the full error messages.
Please do, thanks for the testing!
Okay, I will. Jonathan

Jonathan Turkanis wrote:
Do I really say this? I don't see any place where I say or imply that something can't be done because *I* couldn't think of a solution.
Of course not. Otherwise I wouldn't have had to make up the obviously fake quote. Consider: "Quite a bit of effort has gone into making the library fast for small simple machines and scaleable at the same time ... the scalability does not come for free." I naturally assumed it was your effort you were talking about.
Yes, but I don't see how one can *infer* from the above that it can't be done because *I* haven't found ways to do it. Anyway, this is rather OT and I won't argue any more about this.
I'm just arguing that the Rationale doesn't adequately explain why better performance is not possible. I'm willing to be convinced that it isn't possible, and that you've made the best possible effort to find more efficient alternatives.
Define best possible effort. How much time do you think should I spend on this?
- Two-dimensional array of function pointers. This explanation makes no sense to me, even as supplemented by Andreas's reply to Iain Hanson this morning. It's common for a composite component to make use of several components implemented in different TUs. For instance, the composite component may "know about" the other components by derivation, by containment, typelists, etc. When it needs to assemble the array of pointers, it can simply ask each of them for their contribution.
No, in a nutshell it cannot because it doesn't know how many or what components exist.
I'm not talking about the proposed library here.
Neither am I; I'm talking about any library satisfying the 9 requirements.
I accept that it's not possible in your implementation.
Noted.
Of course, there must be links between the components but C++ does not allow you to follow these links in a useful way and there is no way to force the components in the TUs to register themselves. The only way you can guarantee the registration is by requiring the programmer to register all the parts explicitly, which is contrary to the scalability requirements.
You frequently mention "manually registering" components. It would be helpful if you would explain what you mean by this. For example, a specialization of std::pair has two components, first_type and second_type. When I write
typedef std::pair<int, float> pair_type;
does this count as "manually registering" the components of pair_type, since I have to know what they are and write them down?
I'm inclined to answer yes but I'm not sure whether this example isn't too far from what we're currently discussing.
Lots of C++ libraries allow users to construct composite components out of components defined in different TUs.
I think it is worth noting that "usual" C++ components consist of a declaration and a definition. In the example above, if int and float were user-defined types (classes), you'd need to include their header files. In the context of this discussion this does *not* count as multiple TUs. That's because I don't see how such a separation is useful/feasible for states (as always: suggestions are welcome), i.e. state declaration and definition are more or less inevitably in one place.
I'm not trying to be stubborn; I really don't understand why FSM is special in this regard.
See above.
Looking at the code above we notice several things:
1. Visibility: In the proposed FSM lib, an outer state "knows" only its inner initial state (which can be an incomplete type) but none of its other inner states (Active does not know that Running exists; neither does StopWatch). An inner state knows all its outer states. That it is done this way and not the other way round (outer states knowing all their inner states, but inner states not knowing their outer states) is no coincidence, for the simple reason that the other way wouldn't leave any room for parts in multiple TUs. I also think this is pretty much the only sensible way to spread an FSM over multiple TUs. I can't prove this (for me it is just obvious).
Here you've come quite close to saying: "The requirements are X; I tried hard to satisfy X ...but I couldn't; therefore, it can't be done."
Define "quite close". I'm saying that *I* don't see other ways (you're welcome to show such a way). I am *not* saying that it can't be done. As mentioned above, I don't think such word games are very productive. I won't argue any longer about this.
So, if you have any objections be sure to attach a proposal on how you would select the visibility to enable spreading over multiple TUs.
Please -- I acknowledged at the beginning of my review that I hadn't had time to thoroughly examine the implementation and suggest improvements.
The visibility discussion is purely about the interface, it has nothing to do with the implementation.
But I do have a little experience in C++, and I wanted to point out what I saw as gaps in your rationale.
That's fine with me.
2. Connections: The proposed FSM lib connects the parts at link time, through a call to the react function. This works fine but does leave you with the "problem" that there is no way to have a global view of your FSM, *not* *even* *at* *runtime*. When the FSM still resides inside Stopped you don't have *any* way of knowing that Running even exists. Likewise, when the transition to Running has been made, you cannot possibly know whether there are other states, because Running might itself define similar custom_reactions that lead to other states when the reactions are triggered. Note that the custom reaction cannot possibly have any knowledge of the target state, as you could then no longer hide Running in a separate TU.
Here you're just describing your particular implementation.
Yes.
Now, let us consider other ways in which Running could be connected to Stopped (I'm assuming neither a particular lib interface nor a particular lib implementation; the only thing I assume is that we have an FSM employing a hypothetical lib with the same states Active, Stopped, Running spread over the TUs Running.cpp and Main.cpp):
Good, now you're going to consider alternatives.
It seems you haven't noticed that I did that in the Visibility discussion too.
a) At compile time only: I don't see how that could work: Running.cpp does not have any knowledge about Main.cpp and vice versa.
As far as I'm concerned, you've just dismissed this out of hand because you can't see how to do it.
NO, I haven't dismissed that possibility, I've just stated a fact, namely that *I* don't see how it could work. This also falls into the word-games category and I won't argue any more about this.
Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU.
Yes, that occurred to me also. I don't see how I can make the link weaker without sacrificing static checking (or other requirements). And please: I'm NOT dismissing that possibility and concrete suggestions are always *very* welcome.
b) At runtime only: State definition could contain a static instance of a class, which, in its constructor, registers the state with its outer state.
This is well-known not to work, for the reasons you state.
c) Any combination of the above: I don't see how that could improve things in any way
If I were working on the library, I'd try a combination of your current technique and a).
Concrete suggestions welcome. Especially interesting would be a proof that shows either of the following:

a) Different TUs can share (read/write) information at compile time
b) If a) isn't possible, how link-time connections can be "improved" by CT information available from a single TU

I hope we agree that proving a) or b) is a lot less work than proving that it can't be done.
3. State-local storage: Consider what we need to do if we want to give Running a new auxiliary state variable, e.g. the startTime_ mentioned in the tutorial. Because Running has its own storage, we can simply add a member to its class and are done. We don't need to change Active or the StopWatch. If the proposed FSM lib didn't provide storage for each state, it would force people to change the state_machine each time they need to introduce such a variable (or at least once, when they introduce the first variable).
I never said you should omit state-local storage.
Agreed. I added this discussion because you are dissatisfied with the overall performance and I think that state-local storage is usually the main perf bottleneck.
Points 1 & 2 limit the maximum possible dispatch performance, more on that later. Point 3 limits the maximum possible transition performance, which will typically be the main performance bottleneck.
If there are order of initialization problems, the arrays can be allocated on first use.
No, that is not an option. Unless the allocation and the filling of the array always take considerably less time than the reaction itself, this would prohibit the use of such an FSM lib in real-time applications. In real-time applications you mostly cannot afford to do such lazy init stuff, because that would raise the upper limit of the reaction time to often unacceptable levels.
You snipped the context.
The context is irrelevant, see below.
Jonathan Turkanis wrote:
When it needs to assemble the array of pointers, it can simply ask each of them for their contribution. If there are order of initialization problems, the arrays can be allocated on first use.
The first use would be when the machine is constructing the table of pointers, which would presumably be when it is constructed or immediately after it is started.
How does the machine construct the table of pointers when it doesn't know what the full FSM layout looks like?
In my experience, C++ provides extremely fine-grained control over which parts of a computation are performed at compile time and which at runtime. I have a strong suspicion that an FSM library could be written which satisfies the nine requirements and also provides ruthless efficiency, at least in many common cases.
I believe a library satisfying the 9 requirements can at best provide O(n) dispatch for an FSM using all the features, where n is the number of currently active orthogonal regions. So, if you don't use orthogonal states you get constant time dispatch. The reason lies in the fact that, as I hope I have shown above,
Sorry, but no.
Too bad, it's the best I can do. As I said, it is much more difficult to prove that a certain design is the best possible than showing with a concrete proposal that an improvement is possible. I for one don't think that spending any more effort on the former is justified.
you cannot know how your whole FSM looks. Dispatch must always begin with a currently active innermost state and consider its reactions first. Only if there is no suitable reaction for the innermost state can the next outer state be considered, and so on until an outermost state is reached. Now, since the innermost state knows all its outer states at compile time, it could collapse all the reactions of the outer states into a single STT, yielding constant-time lookup for each innermost state *separately*. Hence, if there are n innermost states active at the same time you have O(n) lookup. In any case I don't expect dispatch to be the main performance bottleneck in a typical FSM, see above.
It may turn out to be necessary to sacrifice performance for some FSMs, but I believe a hybrid approach may be possible which uses fast dispatch in the majority of cases. I'm willing to be convinced that it is impossible, but I haven't seen a satisfying explanation yet.
A hybrid approach could indeed be possible, but I don't think it can be done without requiring help from the user (i.e. complicating the interface) or constraining the functionality.
Have you tried?
No, because my instincts say the opposite of what yours say.
My instinct says it can be done.
Suggestions welcome.
For example you could require the user to somehow (at CT or RT) register all the possible states of a machine or you could prohibit custom_reactions (i.e. require the implementation of the whole FSM in one TU).
Again, I don't know what you mean by "registering all possible states". If it just means listing them in a typelist, for instance, I
Yes, that is one possibility.
don't see why this can't be done separately for each TU that contains part of the state machine implementation. The definition of the whole machine doesn't necessarily have to enumerate all the states.
You seem to be ignoring orthogonal states.
And I'm not suggesting you prohibit custom reactions.
Noted.
Both would give you O(1) lookup. But let's not forget that lookup is typically not the main performance bottleneck! Neither of these measures would speed up transition execution.
As far as I can see, you have not explained why state-local storage is expensive, even in your implementation. Is it dynamic allocation?
No.
Why not construct small states into a properly aligned buffer?
You can do that with the current version by overloading operator new of each state. Construction is inefficient because it involves various book-keeping tasks. I think it is best if you have a look at the code.
Transition execution can only be sped up by prohibiting state-local storage, i.e. by further constraining the functionality. "Why not?" you might ask.
Of course.
It seems that this can only be done if you complicate the interface for everyone, including the users who need the full functionality. A user requiring state-local storage would be forced to somehow associate a state identifier with a type, which she can then use as a container to hold the auxiliary variables.
I can't see why this would be necessary. Honestly, I really don't understand what you're talking about here. As I said at the beginning I accept the requirements, including state local storage.
See above, because it is the main performance hog.
I guess here probably lies our main disagreement: For me the users requiring the full functionality have the priority. Why? Because I think that in a certain class of applications most non-trivial FSMs will hugely benefit from state-local storage (at least some users have confirmed that).
Nothing I said in my review implied that you should sacrifice features for performance; all I want is a detailed explanation why this would be necessary.
Ok, it was just a guess.
Others have suggested that the library might be accepted but renamed to indicate that it did not pretend to be the definitive C++ implementation of FSMs. For instance, the proposed library would be "Boost.UMLStatecharts," while a separate library might cover faster state machines with fewer features.
I'm against this unless it can be shown that the better performance can only be obtained by sacrificing functionality. I want to emphasize again that I phrased my review as an objection to the Rationale section of the documentation.
As I said, I can't explain it any better than I already did. This doesn't mean that I won't answer questions. I have to accept the consequences.
To conclude the above:
2. I agree that it is theoretically possible to implement a hybrid library that supports both classes, but *only* at the cost of a much more complex interface *and* a much more complex implementation. I would even go as far as saying that two separate libraries optimized for each class would have very little in common (interface or implementation). I don't see any benefits in a hybrid approach other than being able to say: "This is *the* solution for all your FSM problems". And it could very well turn out that the hybrid interface is so complex that users would not consider using it.
My guess is that a smooth interface might be possible.
A concrete proposal is welcome.
- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:
int main()
{
    myMachine.process_event(
        *boost::intrusive_ptr<Event>( new Event() ) );
}
I'm sure there is a good reason for this restriction, but there must be a cleaner way to accomplish the same thing.
Have you read the rationale for this?
http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)
You don't explain why a pointer to a dynamically allocated event can't be passed.
I could offer such an interface, but that would leave users guessing who will be deleting the event after consumption.
- I don't understand why it is necessary to pass a type as a length-one mpl::list simply because it is a class template instantiation. The docs say it is "due to some template-related implementation details", but this sounds fishy to me.
This is the example in the docs:
struct NotShooting : fsm::simple_state< NotShooting, Camera, /* ... */,
    mpl::list< fsm::deep_history< Idle > >,
    fsm::has_deep_history >
{
    // ...
};
Internally, I use mpl::is_sequence to find out whether we already have a list of inner initial states. If there is none, I need to make such a list. At the point of definition of NotShooting, Idle is an incomplete type. The problem is that mpl::is_sequence triggers an instantiation of the deep_history template, and I have not found a way to prevent access to Idle inside that template. When you wrap it in a list, the deep_history template is not instantiated and Idle is not accessed. I'm sure there is a solution, but the problem simply never made it to the top of my to-do list.
Perhaps you could automatically wrap the template argument at that position in a sequence-wrapper which turns most types into length-one sequences but is specialized for mpl::list to pass through the intrinsic operations.
That might work, yes. Regards, -- Andreas Huber When replying by private email, please remove the words spam and trap from the address shown in the header.

Hi Andreas,

Instead of replying point by point, let me summarize your message and reply in general.

1. My one sentence summary of your rationale was unfair
2. If I object to part of your library I should offer a concrete proposal to fix it
3. If I want to understand where the performance bottlenecks are, I should examine the code.

I'm still not sure what part of my summary was wrong, but I'm sorry I offended you. What I would like to see in the rationale is a comparison of a small handful (2-5) of alternate implementation techniques, either approaches taken by other libraries or approaches you tried yourself early in development, together with an explanation of why they fail to satisfy the requirements. I realize you do mention some other frameworks, but you don't sketch their design/implementation or show how they fail to satisfy the requirements. Furthermore, readers wishing to understand the performance limitations of your library should not have to be told to examine the code. You should provide a section giving a fairly detailed presentation of the implementation, explaining where the performance problems arise.

Andreas Huber wrote:
Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
- The restriction on event deferral -- one must pass an event pointed to by an instance of intrusive_ptr -- is extremely inelegant:
You don't explain why a pointer to a dynamically allocated event can't be passed.
I could offer such an interface, but that would leave users guessing who will be deleting the event after consumption.
Jonathan

Jonathan Turkanis wrote:
Instead of replying point by point, let me summarize your message and reply in general.
1. My one sentence summary of your rationale was unfair
I think so, yes. To me, the undertone of that summary and other sentences suggested arrogance on my part. You wrote: "The requirements are X; I tried hard to satisfy X without compromising performance, but I couldn't; therefore, it can't be done." To me, especially the last 8 words imply that I think I have tried *every* technique imaginable. This implies that I know pretty much everything there is to know about C++ and associated techniques. I'm very far from claiming that.
2. If I object to part of your library I should offer a concrete proposal to fix it
Either that, or prototypes of the techniques I said I couldn't see working. I said that because I have invested considerable time in discussing things that might come to mind when one wants to improve performance. I can't think of much else I can say in support of the current design performance-wise. So I thought it would be more productive if you or anyone else could make concrete proposals. Such a proposal would either send me back to the drawing board or I would need to argue convincingly why it doesn't work.
3. If I want to understand where the performance bottlenecks are, I should examine the code.
No, I didn't mean that in the generality your summary suggests. In *one* *specific* case I thought it is better if *you* (Jonathan) look at the code before I explain all the details in text. I did that because I *assumed* (maybe wrongly) that you want a very detailed description for which I really don't have the time at the moment.
I'm still not sure what part of my summary was wrong,
See above.
but I'm sorry I offended you.
Apology accepted.
What I would like to see in the rationale is a comparison of a small handful (2-5) of alternate implementation technique, either approaches taken by other libraries or approaches you tried yourself early in development, together with an explanation of why they fail to satisfy the requirements.
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise. And performance was your biggest concern, wasn't it?
I realize you do mention some other frameworks, but you don't sketch their design/implementation or show how they fail to satisfy the requirements. Furthermore, readers wishing to understand the performance limitations of your library should not have to be told to examine the code.
That was never my intention, see above.
You should provide a section giving a fairly detailed presentation of the implementation, explaining where the performance problems arise.
Ok, noted. Regards, -- Andreas Huber

Andreas Huber wrote:
Jonathan Turkanis wrote:
2. If I object to part of your library I should offer a concrete proposal to fix it
Either that, or prototypes of the techniques I said I couldn't see working. I said that because I have invested considerable time in discussing things that might come to mind when one wants to improve performance. I can't think of much else I can say in support of the current design performance-wise. So I thought it would be more productive if you or anyone else could make concrete proposals. Such a proposal would either send me back to the drawing board or I would need to argue convincingly why it doesn't work.
I've read the documentation and most if not all of the review discussion, and still I do not find your explanations satisfying. It's very tempting to try to design an FSM library with better performance characteristics, but I don't have anywhere near enough time, esp. because I'd have to start out by becoming an FSM expert. I don't think I should have to do this before writing a review.
3. If I want to understand where the performance bottlenecks are, I should examine the code.
No, I didn't mean that in the generality your summary suggests. In *one* *specific* case I thought it is better if *you* (Jonathan) look at the code before I explain all the details in text. I did that because I *assumed* (maybe wrongly) that you want a very detailed description for which I really don't have the time at the moment.
But this one specific case -- the expense of implementing state local storage -- forms the basis for your argument that dispatch time is not so important, right? Jonathan

Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
2. If I object to part of your library I should offer a concrete proposal to fix it
Either that, or prototypes of the techniques I said I couldn't see working. I said that because I have invested considerable time in discussing things that might come to mind when one wants to improve performance. I can't think of much else I can say in support of the current design performance-wise. So I thought it would be more productive if you or anyone else could make concrete proposals. Such a proposal would either send me back to the drawing board or I would need to argue convincingly why it doesn't work.
I've read the documentation and most if not all of the review discussion, and still I do not find your explanations satisfying.
It helps if you explain exactly what you don't find satisfying, see below.
It's very tempting to try to design an FSM library with better performance characteristics, but I don't have anywhere near enough time, esp. because I'd have to start out by becoming an FSM expert. I don't think I should have to do this before writing a review.
Certainly not. However, I do expect that people questioning the performance (and along with it the implementation) are able to ask concrete questions why some things are the way they are. You partially did that in your previous posts, but only partially. I think it is difficult to respond to statements like:

<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>

Suggestions like this aren't of much help as long as you don't get more concrete. In the worst case you get an unconvincing "I don't see how"; in the best case I can respond with a list of ideas I tried and lengthy explanations why those ideas don't work. While the latter might be more convincing, it is rather inefficient because I likely end up explaining things that you already know or have figured out yourself.

As I said before, I think it is close to impossible to prove that a certain design is the best performance-wise. Also, I don't claim that there isn't any room in the current implementation for improvements; rather, I think it has reached a stage where substantial (>= 2x faster) optimizations are non-trivial. I can only describe why common optimizations don't work or why they would violate one or more of the requirements. Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.

Moreover, during any programmer's career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone, which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying. Ergo, the only thing I can do is listen to the questions people ask and either add explanations to the rationale or change the implementation.
3. If I want to understand where the performance bottlenecks are, I should examine the code.
No, I didn't mean that in the generality your summary suggests. In *one* *specific* case I thought it is better if *you* (Jonathan) look at the code before I explain all the details in text. I did that because I *assumed* (maybe wrongly) that you want a very detailed description for which I really don't have the time at the moment.
But this one specific case -- the expense of implementing state local storage -- forms the basis for your argument that dispatch time is not so important, right?
Right. Here's a description of the relevant parts of the implementation:

a) The currently active states form a tree where the number of branches is equal to the number of currently active orthogonal regions. In flat FSMs this tree consists of only the root object. In hierarchical but non-orthogonal FSMs the tree consists of exactly one branch, with the depth being equal to the nesting depth.
b) The state_machine subtype object maintains pointers to the currently active outermost state and all currently active innermost states.
c) Each object in the tree maintains pointers to its outer context and all its currently active direct inner states. Innermost states contain a std::list iterator that points to the position where the state is stored in the state_machine subtype's innermost-states list.

As a result of this structure, the following needs to be done during transitions:

1) State entry: For every state that is entered, an object is constructed. During construction a vtable pointer is assigned. The (intrusive) reference count is set to 1. If BOOST_FSM_USE_NATIVE_RTTI is not defined, another pointer member is assigned. The state is registered with its outer context. An internal pointer to the outer context is assigned; as a result, the reference count of the possibly present outer state is incremented.
2) State exit: For every state that is exited, an object needs to be destructed. Destruction involves deregistering the state from its outer state and decrementing the reference count of the outer state.

In an FSM without state-local storage, all of 1 & 2 could be avoided. Instead, a transition is simply a series of function calls to entry/exit functions. Functions with no or only a small implementation can easily be inlined.

Regards, -- Andreas Huber

Andreas Huber wrote:
I've read the documentation and most if not all of the review discussion, and still I do not find your explanations satisfying.
It helps if you explain exactly what you don't find satisfying, see below.
Of course. I was just summarizing my previous messages.
It's very tempting to try to design an FSM library with better performance characteristics, but I don't have anywhere near enough time, esp. because I'd have to start out by becoming an FSM expert. I don't think I should have to do this before writing a review.
Certainly not. However, I do expect that people questioning the performance (and along with it the implementation) are able to ask concrete questions why some things are the way they are. You partially did that in your previous posts but only partially.
I think it is difficult to respond to statements like:
<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>
Why should it be easy to respond? What I found shocking in your documentation was the statement that hand-written machines can be much faster than framework-generated machines. To accept a library with such poor performance (even if it is adequate in many cases) requires a very convincing justification, which I believe is obviously absent from the documentation. As a reviewer, I could have done my job simply by pointing to the absence of justification and leaving it at that. (This is similar to a motion for summary judgment by the defense in an American trial: if the plaintiff hasn't presented evidence, you just point out that fact and you're done.)

All my other vague suggestions were attempts to suggest either (i) how the library could have been redesigned to yield better performance or (ii) how you might have documented the impossibility of achieving better performance given your requirements. I'm aware they were not fully fleshed-out proposals. I was hoping you would try to meet me half-way, by trying to figure out what I was getting at and explaining why it wouldn't work. At the very least, you could have asked me what I meant.
As I said before, I think it is close to impossible to prove that a certain design is the best performance-wise. Also, I don't claim that there isn't any room in the current implementation for improvements; rather, I think it has reached a stage where substantial (>= 2x faster) optimizations are non-trivial. I can only describe why common optimizations don't work or why they would violate one or more of the requirements.
All I wanted is a succinct description of the various known approaches to implementing FSMs -- including existing frameworks and stuff you tried in the early stages of developing your framework -- together with an explanation of the performance tradeoffs.
Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.
If all the things you unsuccessfully tried are described in the rationale, then you didn't try very much. I was giving you the benefit of the doubt by assuming you had omitted describing your alternate designs.
Moreover, during any programmer career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone
This is obviously true.
which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying.
This is just funny. People who find your rationale satisfying could be people who don't care about performance as long as it's good enough for their application or people who failed to consider the same design alternatives you failed to consider.
Ergo, the only thing I can do is listen to the questions people ask and either add explanations to the rationale or change the implementation.
3. If I want to understand where the performance bottlenecks are, I should examine the code.
No, I didn't mean that in the generality your summary suggests. In *one* *specific* case I thought it is better if *you* (Jonathan) look at the code before I explain all the details in text. I did that because I *assumed* (maybe wrongly) that you want a very detailed description for which I really don't have the time at the moment.
But this one specific case -- the expense of implementing state local storage -- forms the basis for your argument that dispatch time is not so important, right?
Right. Here's a description of the relevant parts of the implementation:
<snip detailed description> This sort of stuff should be in the docs. Jonathan

"Jonathan Turkanis" <technews@kangaroologic.com> writes:
All I wanted is a succinct description of the various known approaches to implementing FSMs -- including existing frameworks and stuff you tried in the early stages of developing your framework -- together with an explanation of the performance tradeoffs.
That does seem like a reasonable request. Rationales and comparisons with alternate designs are very important in Boost documentation. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Jonathan Turkanis" <technews@kangaroologic.com> writes:
All I wanted is a succinct description of the various known approaches to implementing FSMs -- including existing frameworks and stuff you tried in the early stages of developing your framework -- together with an explanation of the performance tradeoffs.
That does seem like a reasonable request. Rationales and comparisons with alternate designs are very important in Boost documentation.
FWIW, I also think such a request is reasonable. I questioned the expectations Jonathan seems to have regarding such documentation. Regards, -- Andreas Huber

Jonathan Turkanis wrote:
It's very tempting to try to design an FSM library with better performance characteristics, but I don't have anywhere near enough time, esp. because I'd have to start out by becoming an FSM expert. I don't think I should have to do this before writing a review.
Certainly not. However, I do expect that people questioning the performance (and along with it the implementation) are able to ask concrete questions why some things are the way they are. You partially did that in your previous posts but only partially.
I think it is difficult to respond to statements like:
<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>
Why should it be easy to respond?
I'm not saying responding should be easy. My point was that such statements are not likely to bring up useful answers.
What I found shocking in your documentation was the statement that hand-written machines can be much faster than framework-generated machines.
One could probably produce an equally shocking case for just about any non-trivial library out there. My "mistake" was to measure it and publish the numbers.
To accept a library with such poor performance (even if it is adequate in many cases) requires a very convincing justification, which I believe is obviously absent from the documentation.
That became sufficiently apparent during the review. My hope was that people would understand that the provided functionality must have a significant overhead. I was obviously wrong and there will be better documentation in this direction.
As a reviewer, I could have done my job simply by pointing to the absence of justification, and left it at that.
Fortunately, there are many other things that matter in a review.
(This is similar to a motion for summary judgment by the defense in an American trial: if the plaintiff hasn't presented evidence, you just point out that fact and you're done.)
This seems to imply that in a review the library author is the plaintiff and the reviewers are the defendants. I don't think the roles in a review regarding evidence are usually as clear-cut as they are in a trial.
All my other vague suggestions were attempts to suggest either (i) how the library could have been redesigned to yield better performance or (ii) how you might have documented the impossibility of achieving better performance given your requirements.
This is exactly something that I don't think I will ever be able to prove. I can only provide evidence that hints in this direction. New C++ techniques and idioms are discovered almost on a monthly basis and there's always the possibility that such a technique enables a much better implementation (e.g. I wouldn't have imagined that it is possible to enumerate overloads like Alexander does or I couldn't possibly have conceived the technique that makes possible the FOREACH macro). I think it is close to impossible for an individual to be proficient in all the currently known techniques, not to mention the ones that have yet to be discovered.
I'm aware they were not fully fleshed-out proposals. I was hoping you would try to meet me half-way, by trying to figure out what I was getting at and explaing why it wouldn't work.
... which I did in the cases where I could imagine candidate solutions. In the cases where I couldn't see one, I said so. Since the latter usually didn't lead to a follow-up on your part, I assumed that you don't have a solution either. There's no meeting half-way in such a situation.
At the very least, you could have asked me what I meant.
AFAICT, that was usually not the problem. I *think* I mostly understood what you meant, see above.
As I said before, I think it is close to impossible to prove that a certain design is the best performance-wise. Also, I don't claim that there isn't any room in the current implementation for improvements; rather, I think it has reached a stage where substantial (>= 2x faster) optimizations are non-trivial. I can only describe why common optimizations don't work or why they would violate one or more of the requirements.
All I wanted is a succinct description of the various known approaches to implementing FSMs -- including existing frameworks and stuff you tried in the early stages of developing your framework -- together with an explanation of the performance tradeoffs.
I'll add a few links with a discussion of the performance-tradeoffs.
Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.
If all the things you unsuccessfully tried are described in the rationale, then you didn't try very much. I was giving you the benefit of the doubt by assuming you had omitted describing your alternate designs.
What else should I have tried?
Moreover, during any programmer career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone
This is obviously true.
which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying.
This is just funny. People who find your rationale satisfying could be people who don't care about performance as long as it's good enough for their application or people who failed to consider the same design alternatives you failed to consider.
Which alternatives did I fail to consider?
Right. Here's a description of the relevant parts of the implementation:
<snip detailed description>
This sort of stuff should be in the docs.
It will be. -- Andreas Huber

Andreas Huber wrote:
Jonathan Turkanis wrote:
Andreas Huber wrote:
I think it is difficult to respond to statements like:
<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>
Why should it be easy to respond?
I'm not saying responding should be easy. My point was that such statements are not likely to bring up useful answers.
Since you apparently don't know what I was suggesting, and you didn't bother to ask, I don't see how you can say this.
What I found shocking in your documentation was the statement that hand-written machines can be much faster than framework-generated machines.
One could probably produce an equally shocking case for just about any non-trivial library out there. My "mistake" was to measure it and publish the numbers.
This is ridiculous. Consider, for example, the iterators library, or Andrei and Dave's smart pointer library. If incrementing an iterator constructed using the framework could take up to twenty times as long as incrementing a hand-written iterator, or if copying a framework-generated smart pointer was twenty times as expensive as copying a hand-written smart pointer, these libraries would be dead in the water.

A case closer to home is the iostreams library. I conducted measurements showing that a stringstream constructed using the library was essentially indistinguishable from the Dinkumware stringstream, and that a library-generated fstream was slightly faster (see http://lists.boost.org/MailArchives/boost/msg71285.php for caveats). I couldn't honestly recommend that somebody use a library-generated socket_stream, say, if it would involve any significant slowdown.
To accept a library with such poor performance (even if it is adequate in many cases) requires a very convincing justification, which I believe is obviously absent from the documentation.
That became sufficiently apparent during the review. My hope was that people would understand that the provided functionality must have a significant overhead. I was obviously wrong and there will be better documentation in this direction.
As a reviewer, I could have done my job simply by pointing to the absence of justification, and left it at that.
Fortunately, there are many other things that matter in a review.
My objection to your library was based on a deficiency in the rationale. I don't know if this deficiency could be remedied without exposing fundamental problems in your library. The only way to find out would be to try to fix it.
(This is similar to a motion for summary judgment by the defense in an American trial: if the plaintiff hasn't presented evidence, you just point out that fact and you're done.)
This seems to imply that in a review the library author is the plaintiff and the reviewers are the defendants.
Yes. Metaphorically, of course.
I don't think the roles in a review regarding evidence are usually as clear-cut as they are in a trial.
Hah! You should read a few trial transcripts. But we digress.
All my other vague suggestions were attempts to suggest either (i) how the library could have been redesigned to yield better performance or (ii) how you might have documented the impossibility of achieving better performance given your requirements.
This is exactly something that I don't think I will ever be able to prove. I can only provide evidence that hints in this direction. New C++ techniques and idioms are discovered almost on a monthly basis and there's always the possibility that such a technique enables a much better implementation (e.g. I wouldn't have imagined that it is possible to enumerate overloads like Alexander does or I couldn't possibly have conceived the technique that makes possible the FOREACH macro). I think it is close to impossible for an individual to be proficient in all the currently known techniques, not to mention the ones that have yet to be discovered.
Okay, I shouldn't have said "impossibility"; maybe "impossibility, using known methods" or "unlikelihood, using known methods" would have been more accurate. But seriously, you must know that I'm not insisting you prove that your library could not possibly be improved by yet-to-be-discovered techniques.
I'm aware they were not fully fleshed-out proposals. I was hoping you would try to meet me half-way, by trying to figure out what I was getting at and explaing why it wouldn't work.
... which I did in the cases where I could imagine candidate solutions. In the cases where I couldn't see one, I said so. Since the latter usually didn't lead to a follow-up on your part, I assumed that you don't have a solution either. There's no meeting half-way in such a situation.
This is getting very vague.
Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.
If all the things you unsuccessfully tried are described in the rationale, then you didn't try very much. I was giving you the beneift of the doubt by assuming you had omitted describing your alternate designs.
What else should I have tried?
Normally, in designing a framework one considers a small handful (say two to six) of alternate designs which view the subject matter from completely different perspectives. In your case, what were those alternate designs?
Moreover, during any programmer career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone
This is obviously true.
which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying.
This is just funny. People who find your rationale satisfying could be people who don't care about performance as long as it's good enough for their application or people who failed to consider the same design alternatives you failed to consider.
Which alternatives did I fail to consider?
Note the subjunctive mood. Let me make one last attempt to explain what should be in the rationale. Look at section 16.2 of "The C++ Programming Language", 3rd ed. Here Stroustrup gives two design criteria for a containers library, and then discusses three designs:

- "Specialized containers and iterators" - satisfies the first criterion
- "Based containers" - satisfies the second criterion
- "STL" - satisfies both criteria

I'm not suggesting that you give as detailed a treatment as Stroustrup. However, if you were to give a similar overview of the landscape of possible FSM frameworks, and thoughtfully discuss the tradeoffs in flexibility and performance which each involves, reviewers might have some confidence that you are aware of the techniques which might be used to construct a fast, flexible FSM framework, and have carefully considered the design. This type of overview would probably be a useful component of any library's documentation; in your case it is made absolutely essential by the performance figures you cite. Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Andreas Huber wrote:
Jonathan Turkanis wrote:
Andreas Huber wrote:
<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>
Why should it be easy to respond?
I'm not saying responding should be easy. My point was that such statements are not likely to bring up useful answers.
Since you apparently don't know what I was suggesting, and you didn't bother to ask, I don't see how you can say this.
The discussion is devolving. I think you both are seriously trying to understand the other, but both are failing. Without digging out the previous messages and confirming the matter, I think Jonathan's suggestion, quoted above, was simply that *when* the user wants separate translation units *and* the features that would typically require tight coupling, you could introduce a weaker link -- some form of indirection -- that would slow the resulting FSM or even complicate its specification, but would at least permit the functionality.
What I found shocking in your documentation was the statement that hand-written machines can be much faster than framework-generated machines.
One could probably produce an equally shocking case for just about any non-trivial library out there. My "mistake" was to measure it and publish the numbers.
This is ridiculous. Consider, for example, the iterators library, or Andrei and Dave's smart pointer library. If incrementing an iterator constructed using the framework could take up to twenty times as long as incrementing a hand-written iterator, or if copying a framework-generated smart pointer was twenty times as expensive as copying a hand-written smart pointer, these libraries would be dead in the water.
A case closer to home is the iostreams library. I conducted measurements showing that a stringstream constructed using the library was essentially indistinguishable from the Dinkumware stringstream, and that a library-generated fstream was slightly faster (see http://lists.boost.org/MailArchives/boost/msg71285.php for caveats). I couldn't honestly recommend that somebody use a library-generated socket_stream, say, if it would involve any significant slowdown.
This is right on target. *Since* the current design, with all of its requirements, is so much slower, the onus is on Andreas to explain why the requirements his library meets exceed those met by other FSM libraries and directly lead to the performance overhead. Clear distinctions regarding the performance improvements gained by avoiding the performance-draining features of an FSM while still using this library will help to demonstrate that situation, too.
To accept a library with such poor performance (even if it is adequate in many cases) requires a very convincing justification, which I believe is obviously absent from the documentation.
That became sufficiently apparent during the review. My hope was that people would understand that the provided functionality must have a significant overhead. I was obviously wrong and there will be better documentation in this direction.
As a reviewer, I could have done my job simply by pointing to the absence of justification, and left it at that.
Fortunately, there are many other things that matter in a review.
My objection to your library was based on a deficiency in the rationale. I don't know if this deficiency could be remedied without exposing fundamental problems in your library. The only way to find out would be to try to fix it.
It is also possible that the only way to remedy the deficiency is to remove a requirement.
All my other vague suggestions were attempts to suggest either (i) how the library could have been redesigned to yield better performance or (ii) how you might have documented the impossibility of achieving better performance given your requirements.
This is exactly something that I don't think I will ever be able to prove. I can only provide evidence that hints in this direction. New C++ techniques and idioms are discovered almost on a monthly basis and there's always the possibility that such a technique enables a much better implementation (e.g. I wouldn't have imagined that it is possible to enumerate overloads like Alexander does or I couldn't possibly have conceived the technique that makes possible the FOREACH macro). I think it is close to impossible for an individual to be proficient in all the currently known techniques, not to mention the ones that have yet to be discovered.
Okay, I shouldn't have said "impossibility"; maybe "impossibility, using known methods" or "unlikelihood, using known methods" would have been more accurate. But seriously, you must know that I'm not insisting you prove that your library could not possibly be improved by yet-to-be-discovered techniques.
I think too much emotion is being attached to words both consciously and subconsciously.
Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.
If all the things you unsuccessfully tried are described in the rationale, then you didn't try very much. I was giving you the benefit of the doubt by assuming you had omitted describing your alternate designs.
What else should I have tried?
Normally, in designing a framework one considers a small handful (say two to six) of alternate designs which view the subject matter from completely different perspectives. In your case, what were those alternate designs?
Note that Jonathan is asking that those alternatives be documented. That way, other developers who wonder why you didn't try X will see that you did and they'll see why you decided it was not appropriate.
Let me make one last attempt to explain what should be in the rationale. Look at section 16.2 of "The C++ Programming Language", 3rd ed. Here Stroustrup gives two design criteria for a containers library, and then discusses three designs:
"Specialized containers and iterators" - satisfies the first criterion
"Based Containers" - satisfies the second criterion
"STL" - satisfies both criteria
I'm not suggesting that you give such a detailed treatment as Stroustrup. However, if you were to give a similar overview of the landscape of possible FSM frameworks, and thoughtfully discuss the tradeoffs in flexibility and performance which each involves, reviewers might have some confidence that you are aware of the techniques which might be used to construct a fast, flexible FSM framework, and have carefully considered the design.
This type of overview would probably be a useful component of any library's documentation; in your case it is made absolutely essential by the performance figures you cite.
I don't think I can add any more to that. I think it is reasonable and clear.

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;

Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
Andreas Huber wrote:
I think it is difficult to respond to statements like:
<quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote>
Why should it be easy to respond?
I'm not saying responding should be easy. My point was that such statements are not likely to bring up useful answers.
Since you apparently don't know what I was suggesting, and you didn't bother to ask, I don't see how you can say this.
I believe I know very well what you were suggesting and I think - please correct me if I'm wrong - that should be apparent from my answer to the quote above:

<answer> Yes, that occurred to me also. I don't see how I can make the link weaker without sacrificing static checking (or other requirements). And please: I'm NOT dismissing that possibility and concrete suggestions are always *very* welcome. </answer>

I assume that you would have followed up on that had my answer shown any signs of misunderstanding. You didn't.
What I found shocking in your documentation was the statement that hand-written machines can be much faster than framework-generated machines.
One could probably produce an equally shocking case for just about any non-trivial library out there. My "mistake" was to measure it and publish the numbers.
This is ridiculous. Consider, for example, the iterators library, or Andrei and Dave's smart pointer library. If incrementing an iterator constructed using the framework could take up to twenty times as long as incrementing a hand-written iterator, or if copying a framework-generated smart pointer was twenty times as expensive as copying a hand-written smart pointer, these libraries would be dead in the water.
A much more adequate comparison would be measuring dumb C++ pointers vs. reference-counted smart pointers. IIRC, the overhead for copying was way beyond a few percentage points for shared_ptr. [snip]
As a reviewer, I could have done my job simply by pointing to the absence of justification, and left it at that.
Fortunately, there are many other things that matter in a review.
My objection to your library was based on a deficiency in the rationale. I don't know if this deficiency could be remedied without exposing fundamental problems in your library. The only way to find out would be to try to fix it.
Fix the documentation or the library? [snip]
Hah! You should read a few trial transcripts. But we digress.
Indeed.
All my other vague suggestions were attempts to suggest either (i) how the library could have been redesigned to yield better performance or (ii) how you might have documented the impossibility of achieving better performance given your requirements.
This is exactly something that I don't think I will ever be able to prove. I can only provide evidence that hints in this direction. New C++ techniques and idioms are discovered almost on a monthly basis and there's always the possibility that such a technique enables a much better implementation (e.g. I wouldn't have imagined that it is possible to enumerate overloads like Alexander does or I couldn't possibly have conceived the technique that makes possible the FOREACH macro). I think it is close to impossible for an individual to be proficient in all the currently known techniques, not to mention the ones that have yet to be discovered.
Okay, I shouldn't have said "impossibility"; maybe "impossibility, using known methods"
Even that is close to impossible.
or "unlikelihood, using known methods"
I would agree if you replaced the last three words with: "using methods known to me and to people who studied the implementation". But I guess that's what you meant anyway.
But seriously, you must know that I'm not insisting you prove that your library could not possibly be improved by yet-to-be-discovered techniques.
Given your previous comments I couldn't be sure. To me those comments always had the air that something close to a proof is feasible. I must have misunderstood you.
I'm aware they were not fully fleshed-out proposals. I was hoping you would try to meet me half-way, by trying to figure out what I was getting at and explaining why it wouldn't work.
... which I did in the cases where I could imagine candidate solutions. In the cases where I couldn't see one, I said so. Since the latter usually didn't lead to a follow-up on your part, I assumed that you don't have a solution either. There's no meeting half-way in such a situation.
This is getting very vague.
I don't think so, but let's leave it at that then.
Things that I *consciously* avoided or unsuccessfully tried are described in the rationale.
If all the things you unsuccessfully tried are described in the rationale, then you didn't try very much. I was giving you the benefit of the doubt by assuming you had omitted describing your alternate designs.
What else should I have tried?
Normally, in designing a framework one considers a small handful (say two to six) of alternate designs which view the subject matter from completely different perspectives.
I agree with this general statement but I'm still wondering how you came to the conclusion that I haven't tried very much.
In your case, what were those alternate designs?
The documentation contains fairly visible hints that I considered:

1. GOF Visitor
2. Two-dimensional table of function pointers
3. Acyclic visitor

Moreover, I also considered:

4. GOF FSM-like implementation (where subclassing is used for substates)
5. Herb Sutter's function pointer FSM
6. Aleksey's FSM
7. Dave's FSM variants

I didn't bother to implement any of these with the exception of 3, because it became very apparent very soon that the other approaches would not satisfy my requirements and that it would be very difficult or impossible to modify them to suit. So, I really tried only one other approach. If you are still of the opinion that this is not much, I'd be interested in your *concrete* suggestions as to what else I should have tried.
Moreover, during any programmer's career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone
This is obviously true.
which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying.
This is just funny. People who find your rationale satisfying could be people who don't care about performance as long as it's good enough for their application or people who failed to consider the same design alternatives you failed to consider.
Which alternatives did I fail to consider?
Note the subjunctive mood.
Ah yes, now that you mention it.
Let me make one last attempt to explain what should be in the rationale. Look at section 16.2 of "The C++ Programming Language", 3rd ed. Here Stroustrup gives two design criteria for a containers library, and then discusses three designs:
"Specialized containers and iterators" - satisfies the first criterion
"Based Containers" - satisfies the second criterion
"STL" - satisfies both criteria
I'm not suggesting that you give such a detailed treatment as Stroustrup. However, if you were to give a similar overview of the landscape of possible FSM frameworks, and thoughtfully discuss the tradeoffs in flexibility and performance which each involves, reviewers might have some confidence that you are aware of the techniques which might be used to construct a fast, flexible FSM framework, and have carefully considered the design.
I'm not sure how this is relevant (I already agreed to add such documentation).

--
Andreas Huber
When replying by private email, please remove the words spam and trap from the address shown in the header.

Hi Andreas,

I agree with Rob that the discussion is devolving. I think there are still a few legitimate points to be made, but I'm going to take a few days away from this thread.

Jonathan

Andreas Huber wrote:
Jonathan Turkanis wrote:
I forgot to respond to part of your message.
What I would like to see in the rationale is a comparison of a small handful (2-5) of alternate implementation techniques, either approaches taken by other libraries or approaches you tried yourself early in development, together with an explanation of why they fail to satisfy the requirements.
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
And performance was your biggest concern, wasn't it?
Yes. Jonathan

Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
I forgot to respond to part of your message.
What I would like to see in the rationale is a comparison of a small handful (2-5) of alternate implementation techniques, either approaches taken by other libraries or approaches you tried yourself early in development, together with an explanation of why they fail to satisfy the requirements.
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
I don't understand. A list of alternate techniques would show that none of these techniques satisfies the requirements, but it doesn't say anything about why the current FSM implementation performs worse than any of these techniques.

Regards,
Andreas

Andreas Huber wrote:
Jonathan Turkanis wrote:
Andreas Huber wrote:
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
I don't understand. A list of alternate techniques would show that none of these techniques satisfies the requirements, but it doesn't say anything about why the current FSM implementation performs worse than any of these techniques.
You don't just list the techniques. You explain them and summarize their performance characteristics. Jonathan

Jonathan Turkanis wrote:
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
I don't understand. A list of alternate techniques would show that none of these techniques satisfies the requirements, but it doesn't say anything about why the current FSM implementation performs worse than any of these techniques.
You don't just list the techniques. You explain them and summarize their performance characteristics.
That's what I meant. This will still not show how "good" the current design is. It will only show that the current design satisfies requirements that other designs don't and explain the techniques (and performance characteristics) I have chosen to satisfy the requirements. Whether those techniques are the best performance-wise is a completely different matter.

Andreas Huber wrote:
Jonathan Turkanis wrote:
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
I don't understand. A list of alternate techniques would show that none of these techniques satisfies the requirements, but it doesn't say anything about why the current FSM implementation performs worse than any of these techniques.
You don't just list the techniques. You explain them and summarize their performance characteristics.
That's what I meant. This will still not show how "good" the current design is. It will only show that the current design satisfies requirements that other designs don't and explain the techniques (and performance characteristics) I have chosen to satisfy the requirements. Whether those techniques are the best performance-wise is a completely different matter.
I don't understand the distinction you're trying to make. Jonathan

Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
That's interesting. I was under the impression that exactly such a list of alternate implementation techniques would not satisfy you, because it would in no way show that the design I chose is the best performance-wise.
Why not? That would be the main point.
I don't understand. A list of alternate techniques would show that none of these techniques satisfies the requirements, but it doesn't say anything about why the current FSM implementation performs worse than any of these techniques.
You don't just list the techniques. You explain them and summarize their performance characteristics.
That's what I meant. This will still not show how "good" the current design is. It will only show that the current design satisfies requirements that other designs don't and explain the techniques (and performance characteristics) I have chosen to satisfy the requirements. Whether those techniques are the best performance-wise is a completely different matter.
I don't understand the distinction you're trying to make.
We've discussed that in the other thread...
participants (4)
- Andreas Huber
- David Abrahams
- Jonathan Turkanis
- Rob Stewart