Re: [boost] [msm] scalability

David Abrahams wrote:
The idea would be to do all parts that are “hard on the compiler” at runtime. I don't see why you should have to sacrifice features.
I promised to make an analysis of which feature costs how much compile-time and test your point. I hope to bring a beginning of an answer and hope you will enjoy the trip to VC++ world ;-)

I took the iPod example from the last BoostCon and compiled it with VC8 and VC9 in Debug and Release. I think it is an interesting example: submachines, orthogonal regions, a good number of states. I also took the MPLLimitTest example (80 transitions, 80 states) to check a wide state machine not factored into any submachines. Both can be found in the sandbox in libs/msm/doc (/iPod or directly for the MPLLimitTest).

So, the results:

VC9 (Release): 60s
VC9 (Debug): 79s
VC8 (Release): 64s
VC8 (Debug): 183s (!!!)

Notice how Debug takes so much longer, as rightly noticed by Jeff.

Part 1: we now move the submachines into different headers, simply included in the main application. Should change nothing, right? First small surprise, it helps VC8 in Debug:

VC9 (Release): 65s
VC9 (Debug): 80s
VC8 (Release): 68s
VC8 (Debug): 166s

I didn't see why Debug took so much longer, so I played a bit with the compiler options until I found the culprit. If we now deactivate the /Gm option ("Enable minimal rebuild") in Debug (not activated in Release), we now see a big difference:

VC9 (Debug): 58s
VC8 (Debug): 62s

Wow! 20s better for VC9, 102 for VC8. This could explain Jeff's problems. This is what I call a successful optimization from the compiler to create a minimal rebuild. Now Debug builds are faster than Release (due to shorter link time, in case you wonder).

Part 2: This is nice but we did not add any new policy for faster compiling yet. I now add one. Unfortunately, reducing the number of template instantiations proved to be a hard business, as most of the removed instantiations which were "hard on the compiler" were still done somewhere else, and VC is clever enough not to repeat itself. On the MPLLimitTest it gives:

Before:
VC9 (Release): 134s
VC9 (Debug): 135s (without /Gm. Otherwise count 275!)
VC8 (Release): crash :( (ok, this is really a hard test)
VC8 (Debug): I stopped after the compiler used up 3.5GB of my 4GB...

After:
VC9 (Release): 84s
VC9 (Debug): 84s
VC8 (Release): crash :(
VC8 (Debug): 153s

Ok, better. It can also be better for the iPod example, as the policy allows you to move some of the metaprogramming for submachines to other TUs and compile with 3 cores (one fsm, 2 submachines) using /MP (available on VC8 and VC9). See in the doc the example in iPod/Part2. We now have the following compile times (using 3 cores):

VC9 (Release): 44s
VC9 (Debug): 37s
VC8 (Release): 50s
VC8 (Debug): 40s

This looks much better, as it allows the clever user to build submachines and enjoy better compile times. For cases where performance is needed, simply omit the submachines' cpp files from the build.

The trick works using a boost::any to contain the event, thus allowing you to avoid instantiating a process_event inside the main machine's process_event. The only feature you lose with this policy is the new possibility to make a derived event fire its base event's transition (any can recognize only strict types).

So this is the current state. But it can be better. We still pay for the instantiation of submachines into the main machine, which still costs you time. I made a test using a proxy for submachines. Since the proxy contains only a void* to the real submachine, instantiating it is cheap. It's not finished, but we now build on 5 cores (the main fsm, 2 for each submachine, 2 submachines). Yes, we can now build a submachine on 2 cores :) This gives us the possible future compile times:

VC9 (Release): 34s
VC9 (Debug): 27s
VC8 (Release): 40s
VC8 (Debug): 30s

I hope this will make the compile-time problem more manageable. Again, more to come on this later. Jeff, if you could give it a try, it would be really great!

Christophe

PS: I am using a Q6600, 4 cores, 2.4GHz, no hyperthreading. We can soon use the new hexacore with hyperthreading :)
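
The boost::any trick described above can be illustrated with a minimal sketch. This is not the msm implementation: the class and event names (Player, PlayingSubmachine, NextSong, FastNextSong, Stop, process_any_event) are hypothetical, and std::any from C++17 stands in for boost::any so that the example needs no Boost headers (so it would not build on VC8 or VC9 as written). It only shows the general idea: the main machine erases the event type before forwarding it, so the submachine's dispatch does not have to be instantiated inside the main machine's process_event, and the price is that a derived event no longer matches its base event.

```cpp
#include <any>
#include <cassert>

// Hypothetical events (illustrative names, not those of the iPod example).
struct NextSong {};
struct FastNextSong : NextSong {};  // event derived from NextSong
struct Stop {};

// Hypothetical submachine with a non-template entry point. Because it takes
// a type-erased event, its definition could be moved to a separate .cpp file
// and compiled in another translation unit.
class PlayingSubmachine {
public:
    bool process_any_event(const std::any& evt);
    int song_index() const { return song_index_; }
private:
    int song_index_ = 0;
};

// Dispatch on the exact stored type: the pointer form of any_cast returns
// nullptr unless the contained type is exactly the requested one.
bool PlayingSubmachine::process_any_event(const std::any& evt) {
    if (std::any_cast<NextSong>(&evt) != nullptr) {
        ++song_index_;
        return true;
    }
    return false;  // no transition for this event type
}

// Hypothetical main machine. Its process_event template only wraps the event
// in a std::any; it does not instantiate any submachine dispatch code.
class Player {
public:
    template <class Event>
    bool process_event(const Event& evt) {
        return playing_.process_any_event(std::any(evt));
    }
    int song_index() const { return playing_.song_index(); }
private:
    PlayingSubmachine playing_;
};

// Small demonstration of the behaviour, including the lost feature.
void demo() {
    Player p;
    assert(p.process_event(NextSong{}));       // exact type: handled
    assert(p.song_index() == 1);
    assert(!p.process_event(Stop{}));          // unknown event: ignored
    assert(!p.process_event(FastNextSong{}));  // derived event: not matched
    assert(p.song_index() == 1);
}
```

In this sketch, FastNextSong is rejected even though it derives from NextSong, because the any holds a FastNextSong and the cast asks for exactly NextSong. That is the "strict types" limitation mentioned above.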

Christophe Henry wrote:
David Abrahams wrote:
The idea would be to do all parts that are “hard on the compiler” at runtime. I don't see why you should have to sacrifice features.
I promised to make an analysis of which feature costs how much compile-time and test your point. I hope to bring a beginning of an answer and hope you will enjoy the trip to VC++ world ;-)
I took the iPod example from the last BoostCon and compiled it with VC8 and VC9 in Debug and Release. I think it is an interesting example: submachines, orthogonal regions, a good number of states. I also took the MPLLimitTest example (80 transitions, 80 states) to check a wide state machine not factored into any submachines. Both can be found in the sandbox in libs/msm/doc (/iPod or directly for the MPLLimitTest).
So, the results:
VC9 (Release): 60s
VC9 (Debug): 79s
VC8 (Release): 64s
VC8 (Debug): 183s (!!!)
...
I didn't see why Debug took so much longer, so I played a bit with the compiler options until I found the culprit. If we now deactivate the /Gm option ("Enable minimal rebuild") in Debug (not activated in Release), we now see a big difference:
VC9 (Debug): 58s
VC8 (Debug): 62s
Wow! 20s better for VC9, 102 for VC8. This could explain Jeff's problems. This is what I call a successful optimization from the compiler to create a minimal rebuild. Now Debug builds are faster than Release (due to shorter link time, in case you wonder).
Sorry for the delay. That is awesome, Christophe! I've turned off /Gm for all my projects and my state machine now compiles in the 6-sec range, and Incredibuild is able to keep the entire compile farm running at capacity. You shaved >30% off my full build times.

Jeff
participants (2): Christophe Henry, Jeff Flinn