
Jonathan Turkanis wrote:
Andreas Huber wrote:
Jonathan Turkanis wrote:
2. If I object to part of your library I should offer a concrete proposal to fix it
Either that or prototypes of techniques of which I said I can't see how they will work. I said that because I have invested considerable time in discussing things that might come to mind when one wants to improve performance. I can't think of much else I can say in support of the current design performance-wise. So I thought it would be more productive if you or anyone else could make concrete proposals. Such a proposal would either send me back to the drawing board or I would need to argue convincingly why it doesn't work.
I've read the documentation and most if not all of the review discussion, and still I do not find your explanations satisfying.
It helps if you explain exactly what you don't find satisfying, see below.
It's very tempting to try to design an FSM library with better performance characteristics, but I don't have anywhere near enough time, esp. because I'd have to start out by becoming an FSM expert. I don't think I should have to do this before writing a review.
Certainly not. However, I do expect that people questioning the performance (and along with it the implementation) are able to ask concrete questions why some things are the way they are. You partially did that in your previous posts but only partially. I think it is difficult to respond to statements like: <quote> Possibly the joints between parts of machines defined in separate TUs must be weaker (i.e., less known at compile time) than the connections between collections of states implemented in a single TU. </quote> Suggestions like this aren't of much help as long as you don't get more concrete. In the worst case you get an unconvincing "I don't see how", in the best case I can respond with a list of ideas I tried and lengthy explanations why those ideas don't work. While the latter might be more convincing, it is rather inefficient because I likely end up explaining things that you already know or have figured out yourself. As I said before, I think it is close to impossible to prove that a certain design is the best performance-wise. Also, I don't claim that there isn't any room in the current implementation for improvements, rather I think it has reached a stage where substantial (>= 2x faster) optimizations are non-trivial. I can only describe why common optimizations don't work or why they would violate one ore more of the requirements. Things that I *consciously* avoided or unsuccessfully tried are described in the rationale. Moreover, during any programmer career a bag of knowledge is acquired that is unconsciously applied to every programming job. Naturally, this bag of knowledge is not the same for everyone which is probably the main reason why the rationale seems to be sufficient for some people while others find it unsatisfying. Ergo, the only thing I can do is listen to the questions people ask and either add explanations to the rationale or change the implementation.
3. If I want to understand where the performance bottlenecks are, I should examine the code.
No, I didn't mean that in the generality your summary suggests. In *one* *specific* case I thought it is better if *you* (Jonathan) look at the code before I explain all the details in text. I did that because I *assumed* (maybe wrongly) that you want a very detailed description for which I really don't have the time at the moment.
But this one specific case -- the expense of implementing state local storage -- forms the basis for your argument that dispatch time is not so important, right?
Right. Here's a description of the relevant parts of the implementation: a) The currently active states form a tree where the number of branches is equal to the number of currently active orthogonal regions. In flat FSMs this tree consists of only the root object. In hierarchical but non-orthogonal FSMs the tree consists of exactly one branch, with the depth being equal to the nesting depth b) The state_machine subtype object maintains pointers to the currently active outermost state and all currently active innermost states c) Each object in the tree maintains pointers to its outer context and all its currently active direct inner states. Innermost states contain a std::list iterator that points to the position where the state is stored in the state_machine subtypes' innermost states list As a result of this structure, the following needs to be done during transitions: 1) State entry: For every state that is entered an object is constructed. During construction a vtable pointer is assigned. The (intrusive) reference count is set to 1. If BOOST_FSM_USE_NATIVE_RTTI is not defined another pointer member is assigned. The state is registered with its outer context. An internal pointer to the outer context is assigned, as a result the reference count of the possibly present outer state is incremented. 2) State exit: For every state that is exited an object needs to be destructed. Destruction involves deregistering the state with its outer state and decrementing the reference count of the outer state. In a FSM without state-local storage all in 1 & 2 could be avoided. Instead, a transition is simply a series of function calls to entry/exit functions. Functions with none or a small implementation can easily be inlined. Regards, -- Andreas Huber When replying by private email, please remove the words spam and trap from the address shown in the header.