
On Wednesday, September 14, 2011 10:15:54 AM Beren Minor wrote:
Hi,
I'm interested in this compile-time issue because I'm facing the same kind of problem in some of my own projects. I use Boost extensively, so I don't really know whether it comes from Boost or from my own code (which similarly uses a lot of templates).
Anyway, could you share a little bit about how you find out what the big compile-time hitters are? I've got some sources taking something like 40s to build with gcc and it's getting really annoying. Are there some best practices when using template metaprogramming? Some tricks for knowing which patterns are slow and which are quicker for the compiler?
For example -- as this is a phoenix thread -- could you share some examples of what took time in phoenix and how you fixed it?
Unfortunately I can't really give general advice on what to do. Here is what I have done for parts of phoenix that brought down compile times on gcc and clang; obviously this doesn't necessarily hold for MSVC.

1) Partially preprocessing headers that had preprocessing loops to emulate variadic templates (an example of such a loop is sketched below the list). This helped by reducing the constant time the preprocessor needs to generate code. Compile times were reduced significantly, but only in TUs that are not that big (i.e. if you have long expressions that take ages to compile it is not really noticeable; the example here is spirit, which rarely depends on variadics and spends most of its time instantiating templates).

2) Avoid mpl metafunctions like at, if_ etc. and use at_c, if_c instead (see the second sketch below). This reduces the number of instantiated templates because those metafunctions are usually implemented in terms of their _c counterparts, for example:

    template <typename Sequence, typename N>
    struct at : at_c<Sequence, N::value> {};

Unfortunately this trick doesn't hold for fusion, as it is the other way around there: the _c functions are implemented in terms of their non-_c counterparts. (FWIW, I think there lies a potential optimization opportunity for fusion.)

3) Avoid full specializations. According to the standard, a full specialization is a definition in its own right and has to be processed as soon as the compiler sees it, while a partial specialization is only instantiated when it is actually used. I used this technique for the various customization points in phoenix, for example to register rules and actions (a complete example is sketched below the list). The definition of an action is:

    struct default_actions
    {
        template <typename Rule, typename Dummy = void>
        struct when;
    };

When registering an action for a certain rule you can write:

    template <typename Dummy>
    struct default_actions::when<your_rule, Dummy>
    {
        // ...
    };

The extra Dummy parameter turns this into a partial specialization, so that specific template is not instantiated when the header containing it is included but the expression that would trigger it is never actually used, thus saving time. However, I tried to apply that trick to the various fusion extension points and it didn't work. One possible explanation for this behaviour is that the fusion extension points are very lightweight structs themselves. That is, they contain a nested struct template that can't be instantiated yet anyway. Thus the added complexity of the Dummy parameter increased compile times instead of decreasing them, despite the smaller number of instantiations. The nested when struct of the phoenix actions is different, as it is heavier to instantiate (it is usually implemented by deriving from phoenix::call, which might be quite heavy on the compiler ... there might be another optimization opportunity).

4) Avoid SFINAE. It doesn't scale. Consider a function that is overloaded with different SFINAE conditions enabled. Upon a function call, the compiler puts all of the overloads that have the same arity as the call into the "suitable function set"; after that, every SFINAE template needs to be instantiated and the compiler has to decide whether the type expression is valid or not. Prefer tag dispatching and boolean metafunctions to dispatch to the correct function (a sketch of the tag dispatched version follows below).
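For point 1, this is the kind of preprocessor loop I mean (an illustrative sketch, not actual phoenix code; GENERATE_TUPLE and the tupleN names are made up). Every TU that includes such a header pays the cost of expanding the loop, so pre-expanding it once (e.g. with Boost.Wave) and shipping the expanded header removes that constant cost:

    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/repetition/enum_params.hpp>
    #include <boost/preprocessor/repetition/repeat_from_to.hpp>

    // Emulating variadic templates by generating tuple1 ... tuple9:
    #define GENERATE_TUPLE(z, n, _)                          \
        template <BOOST_PP_ENUM_PARAMS_Z(z, n, typename T)>  \
        struct BOOST_PP_CAT(tuple, n) {};

    BOOST_PP_REPEAT_FROM_TO(1, 10, GENERATE_TUPLE, ~)
    #undef GENERATE_TUPLE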
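For point 2, a minimal sketch using Boost.MPL (the exact savings of course depend on the compiler): calling if_c directly skips the forwarding if_ layer, so one template instantiation fewer per use.

    #include <boost/mpl/if.hpp>
    #include <boost/mpl/bool.hpp>

    struct fast {};
    struct slow {};

    // Going through the wrapper metafunction instantiates mpl::if_<...>
    // *and* the mpl::if_c<...> it forwards to:
    typedef boost::mpl::if_<boost::mpl::true_, fast, slow>::type t1;

    // Calling the _c form directly instantiates only one template:
    typedef boost::mpl::if_c<true, fast, slow>::type t2;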
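For point 3, a self-contained sketch of the trick (my_rule and apply are made-up names, not actual phoenix identifiers): because when<my_rule, Dummy> is a partial specialization, nothing gets instantiated until somebody actually uses when<my_rule>.

    #include <iostream>

    struct my_rule {};   // hypothetical rule tag

    struct default_actions
    {
        // primary template, intentionally left undefined
        template <typename Rule, typename Dummy = void>
        struct when;
    };

    // Partial specialization thanks to the extra Dummy parameter;
    // it is only instantiated when when<my_rule> is actually used.
    template <typename Dummy>
    struct default_actions::when<my_rule, Dummy>
    {
        static void apply() { std::cout << "action for my_rule\n"; }
    };

    int main()
    {
        // Only this use triggers the instantiation:
        default_actions::when<my_rule>::apply();
    }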
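And for point 4, a sketch of the tag dispatching alternative (is_heavy, process and the two example types are made-up names): the boolean metafunction is evaluated once per call, and the compiler never has to instantiate a SFINAE condition for every overload in the set.

    #include <boost/mpl/bool.hpp>
    #include <iostream>

    struct heavy_type { static const bool heavy = true; };
    struct light_type { static const bool heavy = false; };

    // boolean metafunction used for dispatching
    template <typename T>
    struct is_heavy : boost::mpl::bool_<T::heavy> {};

    // Instead of two SFINAE-constrained overloads, dispatch on a tag:
    template <typename T>
    void process_impl(T const&, boost::mpl::true_)  { std::cout << "heavy path\n"; }

    template <typename T>
    void process_impl(T const&, boost::mpl::false_) { std::cout << "light path\n"; }

    template <typename T>
    void process(T const& x)
    {
        process_impl(x, typename is_heavy<T>::type());
    }

    int main()
    {
        process(heavy_type());  // heavy path
        process(light_type());  // light path
    }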
I've tried Steven Watanabe's template profiler but ran into a lot of trouble, with the STL complaining about its code being modified (which is true, as the template profiler adds some code to it).
I haven't really used it myself yet, as I find it hard to extract valuable information from its output. Additionally, the instantiation count alone isn't really meaningful, as can be seen from the diverging compile times of gcc and msvc in the other posts; the results of the various optimization attempts also show that it isn't enough. I hope the lines I wrote above are enough to continue the discussion. Please correct me if I am wrong, and I would be happy if others could contribute their experience too.

Joel Falcou also suggested compiling a list of the usual TMP scenarios with different use case patterns. Based on these small scale examples we could analyze certain optimization or pessimization techniques more efficiently. I think we just don't fully understand the impact of TMP on compilers yet. I would like to get opinions and insight from compiler vendors too, as they are at the source of "evil".