
Hi Simon,
I think it would be worthwhile talking a little more about compile time performance in the documentation.
You're right, I could explain more in the documentation. Unfortunately, the answer is not straightforward. The compilation time (and how far the compiler plays along with us) depends on a number of factors:
1) the compiler
2) the compiler version
3) even your hardware (I get very different compile times on 2 different machines with the same processor)
4) the state machine structure and the chosen front-end
On those points:
1) usually, g++ crashes later than VC and needs fewer workarounds
2) surprisingly, g++ 4.2 > g++ 4.4. VC10 > VC9 > VC8
3) you will need RAM, especially with VC8
4) the compile time is proportional to the number of transitions in the transition table and to the depth of the state machine (meaning submachine of submachine of...). eUML will cost you much more with VC than with g++, as it adds a metaprogramming layer on top of the standard front-end's metaprogramming, and because the underlying decltype/typeof seems much better implemented in g++ (if you don't count the generated code size).
A small test with VC9 in Release mode (and no other code) gave me:
20 transitions => 11s
30 transitions => 16s
40 transitions => 22s
50 transitions => 31s
This is just to give you an idea, as it can greatly depend on the factors named above.
Also I'd like to see some suggestions of how to improve compile times (and how factors including the number of entries in a transition table affect the compile time).
There is really just one solution: reduce the number of transitions. The best way to do it is to use orthogonal regions. For example, if you see in your diagram many transitions to a single state based on the same event, you should factor them all into a second region. Adding submachines can also help in some cases: on one hand the compiler now has to generate code for a second machine, OTOH it reduces the number of transitions in the main one. This will only help if your submachine handles a small number of events, as Msm will add a row in the main table for every event of the submachine. If in exchange you can reduce the number of transitions, it can help.
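To illustrate the idea, here is a minimal sketch with the standard front-end (the machine, states and events are made up for the example, not taken from a real design). Declaring two initial states opens a second orthogonal region, so the single error row in region 2 replaces one error transition per state in region 1:

#include <boost/msm/back/state_machine.hpp>
#include <boost/msm/front/state_machine_def.hpp>
#include <boost/mpl/vector.hpp>

namespace msm = boost::msm;
namespace mpl = boost::mpl;

// events (made up for the example)
struct next {};
struct error {};

struct machine_ : public msm::front::state_machine_def<machine_>
{
    // states (made up for the example)
    struct Idle    : public msm::front::state<> {};
    struct Working : public msm::front::state<> {};
    struct AllOk   : public msm::front::state<> {};
    struct Failed  : public msm::front::state<> {};

    // two initial states => two orthogonal regions
    typedef mpl::vector<Idle, AllOk> initial_state;

    struct transition_table : mpl::vector<
        //    Source    Event   Target
        _row< Idle    , next  , Working >,
        _row< Working , next  , Idle    >,
        // region 2: this single row handles "error" no matter where
        // region 1 currently is, instead of one error row per state
        _row< AllOk   , error , Failed  >
    > {};
};
typedef msm::back::state_machine<machine_> machine;

int main()
{
    machine m;
    m.start();
    m.process_event(next());    // handled by region 1
    m.process_event(error());   // handled by region 2
    return 0;
}

The main table stays small because region 2 grows with the number of events it handles, not with the number of states in region 1.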
Is there provision to split a machine across multiple compilation units (I assume not)?
You unfortunately assume correctly: what really costs compile time is the heavy metaprogramming.
You include instructions for how to increase the maximum size of an MPL vector, but this only works up to 50 entries. What do we need to do to increase this further?
This is pretty simple, we need to:
- add a vector60/70/80....hpp in mpl/vector
- add a map60/70....hpp in mpl/map
- if more than 50 states, add vector60/70....hpp in fusion/vector
If that helps, I can add them into the Msm sandbox, as I already wrote them. We could also request a feature in the MPL.
You will however not manage to increase the size of the transition table forever. At some point, almost every compiler will crash in a big cloud of dark smoke. To give you an idea, I usually manage:
- 60-80 transitions max with VC9. The exact point varies with a number of factors which are hard to pin down.
- probably less with VC8. VC8 will also quite quickly need > 2GB of RAM.
- 80 transitions with g++ 4.2. It could probably manage more, but I ran out of RAM.
- 50-60 with g++ 4.4. Why 4.4 is in this case worse than 4.2 is a mystery to me.
- in Debug mode, you'll see the compiler give up faster.
- eUML will also have an influence.
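For completeness, the route up to the 50 limit mentioned in the question is, if I recall the macro names correctly, just a few macros defined before any Boost header is included (the values below are only an example; going beyond 50 needs the extra vectorXX.hpp headers discussed above):

// must appear before any Boost header that pulls in MPL / Fusion
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 40   // rounded to a multiple of 10, 50 max
#define BOOST_MPL_LIMIT_MAP_SIZE    40
// with more than 10 states, the fusion sequence used by the back-end
// also needs growing
#define FUSION_MAX_VECTOR_SIZE      20

#include <boost/msm/back/state_machine.hpp>
#include <boost/msm/front/state_machine_def.hpp>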
Euml looks really cool, but I'm getting an ICE when using it with 36 transitions in EumlSimple.cpp (12 and 24 work fine though)
VC8 only has limited support for eUML, for 3 reasons:
- less than perfect metaprogramming capabilities
- bad support for SFINAE
- VC problems with decltype/typeof
So yes, with VC8 I would advise eUML only for small state machines and without complicated functors as actions/guards. VC9 works much better, and VC10 better than VC9 (so Microsoft seems to be working on it).
One of my favourite features of boost::statechart is state local storage. It looks like this would have to be done manually in msm?
Yes, this is not supported yet. However, Msm has always been developed in a very user-oriented way, and what users ask for, they usually get. Msm has from the beginning greatly gained from clever people suggesting new features, and I will not stop the development after the review, whatever the outcome is. If you'd like a feature, I'd be happy to discuss a solution with you and implement it in the next version, thus making Msm more useful.
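Until then, a rough sketch of the manual way (names made up, and it only approximates boost::statechart's behaviour, since the state object and its data live as long as the machine instead of being destroyed on exit): an Msm state is an ordinary struct, so it can simply carry its own members and reset them in on_entry.

#include <boost/msm/back/state_machine.hpp>
#include <boost/msm/front/state_machine_def.hpp>

namespace msm = boost::msm;

// a state holding its own "local" data
struct Counting : public msm::front::state<>
{
    int count;   // the state-local storage, kept inside the state itself

    template <class Event, class Fsm>
    void on_entry(Event const&, Fsm&) { count = 0; }   // re-initialize on every entry
    template <class Event, class Fsm>
    void on_exit(Event const&, Fsm&) {}                // nothing to tear down here
};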
The library is looking really powerful and expressive (not to mention fast!) for small state machines.
Thanks. This is the reason for the compile-times ;-) Christophe

Quoting Christophe Henry <christophe.j.henry@googlemail.com>:
The compilation time (and how far the compiler plays with us) depends on a number of factors: 1) the compiler 2) the compiler version 3) even your hardware (I get very different compile times on 2 different machines with the same processor) 4) the state machine structure and the chosen front-end
Sorry for the OT, but you seem to be experienced with compile-time optimization and I'm new to it. A library that I plan to submit to Boost causes > 50 s compile times even for simple examples, so do you have any recommendations on where to start? Is there a compiler that can profile its own compile time? IIRC, GCC has a debug option that prints the name of the template that is currently instantiated, but creating a profile from that is still a lot of work. My library invokes a lot of Boost.Serialization and some MPL/fusion code, so I'm not even sure if I can do anything about it within my library. Thanks

Hi Christophe, I've dragged myself back to 2009 from 2004 where I've been stuck with CodeWarrior and an emulated OSX 10.3 environment with no chance of using any modern boost libs. So I hope to have a review in by the end of the week. Christophe Henry wrote:
Hi Simon,
I think it would be worthwhile talking a little more about compile time performance in the documentation.
You're right, I could explain more in the documentation. Unfortunately, the answer is not straightforward. The compilation time (and how far the compiler plays with us) depends on a number of factors: 1) the compiler 2) the compiler version 3) even your hardware (I get very different compile times on 2 different machines with the same processor) 4) the state machine structure and the chosen front-end
1) usually, g++ crashes later than VC and needs fewer workarounds
2) surprisingly, g++ 4.2 > g++ 4.4. VC10 > VC9 > VC8
3) you will need RAM, especially with VC8
Indeed, with MSM 1.x, g++ 4.2 was much faster and used less RAM than VC8. Is MSM 2.x any faster or more frugal with RAM?
4) the compile-time is proportional to the number of transitions in the transition table and to the depth of the state machine (meaning submachine of submachine of...). EUML will cost you much more with VC than g++ as it implies a metaprogramming layer above the standard metaprogramming of the standard front-end and because the underlying decltype/typeof seems much better implemented with g++ (if you don't count the generated code size).
A small test with VC9 in Release mode (and no other code) gave me: 20 transitions => 11s 30 transitions => 16s 40 transitions => 22s 50 transitions => 31s
For 41 transitions and 5 states, I was seeing about 30s on XCode with g++ 4.2.x. Unfortunately, the same code takes over 2.5 minutes on VC8. In fact, so many compute resources were used that Incredibuild, our distributed build tool, was blocked from distributing any other compilation units for about 2 minutes. This forced me to revert to the TMP book's STT implementation, which compiled in less than 30 seconds on both platforms and allows Incredibuild to do its job.
Aah! I should mention these times are for debug builds, which is a much more important measurement in an iterative/agile development environment. It's not that big of a deal if the nightly release build takes a few minutes longer. By the way, I'm on XP with 4 GB RAM using the /3GB switch. Interestingly, if I just put all the MSM implementation inline, there is about a 1 minute overall build-time improvement.
This is just to give you an idea as it can greatly depend on the factors named above.
Also I'd like to see some suggestions of how to improve compile times (and how factors including the number of entries in a transition table affect the compile time).
There is just one solution: reduce the number of transitions. The best way to do it is using orthogonal regions. For example, if you see in your diagram many transitions to a single state based on the same event, you should factor them all into a second region. Adding submachines could help in some cases. On one hand, the compiler now has to generate code for a second machine, OTOH it reduces the number of transitions in the main one. This will only help if your submachine handles a small number of events, as Msm will add a row in the main table for every event of the submachines. If you can in exchange reduce the number of transitions, it can help.
My efforts at submachine refactoring actually increased compile times significantly. Also, my main motivation for using the STT is its ability to give a forest-and-trees presentation of a moderately complex set of operations. Any refactoring done just to get better compile times that obscures this ease of understanding negates the benefits. Of course, if refactoring into regions/zones exposes some underlying relations inherent in these states, transitions and operations, then it's fully justified, IMHO. [snip]
One of my favourite features of boost::statechart is state local storage. It looks like this would have to be done manually in msm?
Yes, this is not supported yet. However, Msm has always been developed very user-oriented and what users ask, they usually get it. Msm has
I can attest to Christophe's responsiveness and willingness to accommodate users' needs. Jeff

At Wed, 09 Dec 2009 11:18:16 -0500, Jeff Flinn wrote:
For 41 transitions and 5 states, I was seeing about 30s on XCode with g++4.2.x.
Unfortunately, the same code takes over 2.5 minutes on VC8. In fact, so many compute resources were used that Incredibuild, our distributed build tool, was blocked from distributing any other compilation units for about 2 minutes. This forced me to revert to the TMP book's STT implementation, which compiled in less than 30 seconds on both platforms and allows Incredibuild to do its job.
Aah! I should mention these times are for debug builds, which is a much more important measurement in an iterative/agile development environment. It's not that big of a deal if the nightly release build takes a few minutes longer. By the way, I'm on XP with 4 GB RAM using the /3GB switch.
It might be interesting to contemplate a "fast-compile mode" that builds the same logical state machine using the same input syntax but trades away a little run time for compile time.
--
Dave Abrahams
Meet me at BoostCon: http://www.boostcon.com
BoostPro Computing
http://www.boostpro.com

David Abrahams wrote:
At Wed, 09 Dec 2009 11:18:16 -0500, Jeff Flinn wrote:
For 41 transitions and 5 states, I was seeing about 30s on XCode with g++4.2.x.
Unfortunately, the same code takes over 2.5 minutes on VC8. In fact, so many compute resources were used that Incredibuild, our distributed build tool, was blocked from distributing any other compilation units for about 2 minutes. This forced me to revert to the TMP book's STT implementation, which compiled in less than 30 seconds on both platforms and allows Incredibuild to do its job.
Aah! I should mention these times are for debug builds, which is a much more important measurement in an iterative/agile development environment. It's not that big of a deal if the nightly release build takes a few minutes longer. By the way, I'm on XP with 4 GB RAM using the /3GB switch.
It might be interesting to contemplate a "fast-compile mode" that builds the same logical state machine using the same input syntax but trades away a little run time for compile time.
Yes, I'd be willing to trade that, as my SM is not running in a real-time environment but is handling user mouse and menu interaction. Jeff

Christophe Henry wrote:
You're right, I could explain more in the documentation. Unfortunately, the answer is not straightforward. The compilation time (and how far the compiler plays with us) depends on a number of factors: 1) the compiler 2) the compiler version 3) even your hardware (I get very different compile times on 2 different machines with the same processor) 4) the state machine structure and the chosen front-end
1) usually, g++ crashes later than VC and needs fewer workarounds
2) surprisingly, g++ 4.2 > g++ 4.4. VC10 > VC9 > VC8
3) you will need RAM, especially with VC8
4) the compile time is proportional to the number of transitions in the transition table and to the depth of the state machine (meaning submachine of submachine of...). eUML will cost you much more with VC than with g++, as it adds a metaprogramming layer on top of the standard front-end's metaprogramming, and because the underlying decltype/typeof seems much better implemented in g++ (if you don't count the generated code size).
Christophe, I'm sorry if my 2 cents aren't relevant, since I'm not very familiar with either MSM or MPL, but I have experience doing metaprogramming with some in-house libraries. The most common reason for compiler slowness/crashes that I've seen came from too deep a template recursion. I'm not totally sure, but I think that you're using mpl::vector, which, I think, is a form of type list. Type lists have a recursion depth equal to the number of elements in the list. In my work I ended up using type trees, which have a recursion depth of log(number_of_types), and that greatly increased the compiler's resilience in the face of an increasing number of types. Sometimes it even increased compilation speed. I'm not sure if MPL has type trees or not, but they aren't hard to implement. Thanks, Andy.
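For illustration, a small self-contained sketch (not MPL code, all names made up) of the depth difference Andy describes: computing the size of a cons-style list instantiates one template per element, while the same computation on a balanced tree only recurses into each half.

struct nil {};

// cons-style type list: computing its size recurses once per element,
// so the template instantiation depth is O(n)
template <class Head, class Tail = nil>
struct cons { typedef Tail tail; };

template <class List>
struct list_size { enum { value = 1 + list_size<typename List::tail>::value }; };
template <>
struct list_size<nil> { enum { value = 0 }; };

// balanced type tree: the same computation recurses into each half,
// so the depth is only O(log n)
template <class T> struct leaf {};
template <class L, class R> struct node {};

template <class Tree> struct tree_size;
template <class T> struct tree_size< leaf<T> > { enum { value = 1 }; };
template <class L, class R>
struct tree_size< node<L, R> >
{ enum { value = tree_size<L>::value + tree_size<R>::value }; };

// 4 types: the list needs ~4 nested instantiations, the tree only ~2
typedef cons<int, cons<char, cons<long, cons<short> > > > four_as_list;
typedef node< node< leaf<int>, leaf<char> >,
              node< leaf<long>, leaf<short> > > four_as_tree;
int check_list[ list_size<four_as_list>::value == 4 ? 1 : -1 ];
int check_tree[ tree_size<four_as_tree>::value == 4 ? 1 : -1 ];

int main() { return 0; }   // everything happens at compile time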
participants (5):
- Andy Venikov
- Christophe Henry
- David Abrahams
- Jeff Flinn
- strasser@uni-bremen.de