
On Wednesday, September 14, 2011 10:15:54 AM Beren Minor wrote:
Hi,
I'm interested in this compile-time issue because I'm facing the same kind of problem in some of my own projects. I use Boost extensively, so I don't really know whether it comes from Boost or from my own code (which similarly uses a lot of templates).
Anyway, could you share a little bit about how you find out what the big compile-time hitters are? I've got some sources taking something like 40s to build with gcc and it's getting really annoying. Are there some best practices when using template metaprogramming? Some tricks for knowing which patterns are slow and which are quicker for the compiler?
For example -- as this is a phoenix thread -- could you share some examples of what took time in phoenix and how you fixed it?
Unfortunately I can't really give general advice on what to do. Here is what I have done for parts of phoenix that brought down compile times on gcc and clang; obviously this doesn't necessarily hold for MSVC.

1) Partially preprocessing headers that had preprocessing loops to emulate variadic templates (an example of such a loop is sketched below the list). This helped by reducing the constant time the preprocessor needs to generate code. Compile times were reduced significantly, but only in TUs that are not that big (i.e. if you have long expressions that take ages to compile it is not really noticeable; the example here is spirit, which rarely depends on variadics and spends most of its time instantiating templates).

2) Avoid mpl metafunctions like at, if_ etc. and use at_c, if_c instead (see the second sketch below). This reduces the number of instantiated templates because those metafunctions are usually implemented in terms of their _c counterparts, for example:

    template <typename Sequence, typename N>
    struct at : at_c<Sequence, N::value> {};

Unfortunately this trick doesn't hold for fusion, as it is the other way around there: the _c functions are implemented in terms of their non-_c counterparts. (FWIW, I think there lies a potential optimization opportunity for fusion.)

3) Avoid full specializations. According to the standard, a full specialization is a definition in its own right and has to be processed as soon as the compiler sees it, while a partial specialization is only instantiated when it is actually used. I used this technique for the various customization points in phoenix, for example to register rules and actions (a complete example is sketched below the list). The definition of an action is:

    struct default_actions
    {
        template <typename Rule, typename Dummy = void>
        struct when;
    };

When registering an action for a certain rule you can write:

    template <typename Dummy>
    struct default_actions::when<your_rule, Dummy>
    {
        // ...
    };

The extra Dummy parameter turns this into a partial specialization, so that specific template is not instantiated when the header containing it is included but the expression that would trigger it is never actually used, thus saving time. However, I tried to apply that trick to the various fusion extension points and it didn't work. One possible explanation for this behaviour is that the fusion extension points are very lightweight structs themselves. That is, they contain a nested struct template that can't be instantiated yet anyway. Thus the added complexity of the Dummy parameter increased compile times instead of decreasing them, despite the smaller number of instantiations. The nested when struct of the phoenix actions is different, as it is heavier to instantiate (it is usually implemented by deriving from phoenix::call, which might be quite heavy on the compiler ... there might be another optimization opportunity).

4) Avoid SFINAE. It doesn't scale. Consider a function that is overloaded with different SFINAE conditions enabled. Upon a function call, the compiler puts all of the overloads that have the same arity as the call into the "suitable function set"; after that, every SFINAE template needs to be instantiated and the compiler has to decide whether the type expression is valid or not. Prefer tag dispatching and boolean metafunctions to dispatch to the correct function (a sketch of the tag dispatched version follows below).
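For point 1, this is the kind of preprocessor loop I mean (an illustrative sketch, not actual phoenix code; GENERATE_TUPLE and the tupleN names are made up). Every TU that includes such a header pays the cost of expanding the loop, so pre-expanding it once (e.g. with Boost.Wave) and shipping the expanded header removes that constant cost:

    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/repetition/enum_params.hpp>
    #include <boost/preprocessor/repetition/repeat_from_to.hpp>

    // Emulating variadic templates by generating tuple1 ... tuple9:
    #define GENERATE_TUPLE(z, n, _)                          \
        template <BOOST_PP_ENUM_PARAMS_Z(z, n, typename T)>  \
        struct BOOST_PP_CAT(tuple, n) {};

    BOOST_PP_REPEAT_FROM_TO(1, 10, GENERATE_TUPLE, ~)
    #undef GENERATE_TUPLE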
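For point 2, a minimal sketch using Boost.MPL (the exact savings of course depend on the compiler): calling if_c directly skips the forwarding if_ layer, so one template instantiation fewer per use.

    #include <boost/mpl/if.hpp>
    #include <boost/mpl/bool.hpp>

    struct fast {};
    struct slow {};

    // Going through the wrapper metafunction instantiates mpl::if_<...>
    // *and* the mpl::if_c<...> it forwards to:
    typedef boost::mpl::if_<boost::mpl::true_, fast, slow>::type t1;

    // Calling the _c form directly instantiates only one template:
    typedef boost::mpl::if_c<true, fast, slow>::type t2;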
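For point 3, a self-contained sketch of the trick (my_rule and apply are made-up names, not actual phoenix identifiers): because when<my_rule, Dummy> is a partial specialization, nothing gets instantiated until somebody actually uses when<my_rule>.

    #include <iostream>

    struct my_rule {};   // hypothetical rule tag

    struct default_actions
    {
        // primary template, intentionally left undefined
        template <typename Rule, typename Dummy = void>
        struct when;
    };

    // Partial specialization thanks to the extra Dummy parameter;
    // it is only instantiated when when<my_rule> is actually used.
    template <typename Dummy>
    struct default_actions::when<my_rule, Dummy>
    {
        static void apply() { std::cout << "action for my_rule\n"; }
    };

    int main()
    {
        // Only this use triggers the instantiation:
        default_actions::when<my_rule>::apply();
    }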
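And for point 4, a sketch of the tag dispatching alternative (is_heavy, process and the two example types are made-up names): the boolean metafunction is evaluated once per call, and the compiler never has to instantiate a SFINAE condition for every overload in the set.

    #include <boost/mpl/bool.hpp>
    #include <iostream>

    struct heavy_type { static const bool heavy = true; };
    struct light_type { static const bool heavy = false; };

    // boolean metafunction used for dispatching
    template <typename T>
    struct is_heavy : boost::mpl::bool_<T::heavy> {};

    // Instead of two SFINAE-constrained overloads, dispatch on a tag:
    template <typename T>
    void process_impl(T const&, boost::mpl::true_)  { std::cout << "heavy path\n"; }

    template <typename T>
    void process_impl(T const&, boost::mpl::false_) { std::cout << "light path\n"; }

    template <typename T>
    void process(T const& x)
    {
        process_impl(x, typename is_heavy<T>::type());
    }

    int main()
    {
        process(heavy_type());  // heavy path
        process(light_type());  // light path
    }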
I've tried Steven Watanabe's template profiler but ran into a lot of trouble, with the STL complaining about its code being modified (which is true, as the template profiler adds some code to it).
I haven't really used it myself yet, as I find it hard to extract valuable information from its output. Additionally, the instantiation count alone isn't really meaningful, as can be seen from the diverging compile times of gcc and msvc in the other posts; the results of the various optimization attempts also show that it isn't enough. I hope the lines I wrote above are enough to continue the discussion. Please correct me if I am wrong, and I would be happy if others could contribute their experience too.

Joel Falcou also suggested compiling a list of the usual TMP scenarios with different use case patterns. Based on these small scale examples we could analyze certain optimization or pessimization techniques more efficiently. I think we just don't fully understand the impact of TMP on compilers yet. I would like to get opinions and insight from compiler vendors too, as they are at the source of "evil".