
On Tue, Jun 2, 2009 at 10:09 AM, Eric Niebler <eric@boostpro.com> wrote:
Joel de Guzman wrote:
Sebastian Redl wrote:
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off hundreds of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming, which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than at parsing the equivalent raw code. (No benchmarks - but I know the code.)
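
To make the tradeoff concrete, here is a rough sketch of the two styles being contrasted; the names are invented for illustration and are not the actual Fusion vector_n_chooser code:

    #include <boost/preprocessor/repetition/repeat_from_to.hpp>

    // Template-recursive form: every use instantiates a chain of class templates.
    template <int N> struct chooser : chooser<N - 1> {};
    template <>      struct chooser<0> {};

    // Preprocessor form: the compiler just parses ten plain structs and
    // instantiates nothing -- more raw code to parse, fewer templates.
    #define MAKE_CHOOSER(z, n, unused) struct chooser##n {};
    BOOST_PP_REPEAT_FROM_TO(0, 10, MAKE_CHOOSER, ~)
    #undef MAKE_CHOOSER

Which form compiles faster then depends on whether the compiler's bottleneck is parsing or instantiation, which is exactly the point at issue.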
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case.
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures. In GCC, for example, a surprising amount of time is wasted determining whether the class template specialization X<T1, T2, ..., TN> refers to an already-known template instantiation (or specialization), because GCC stores all of the template instantiations in a linked list. Thus, you pay each time you name X<T1, T2, ..., TN>, even if you don't instantiate it. That's why we see quadratic (or worse) behavior for template metaprograms with GCC. I suspect that other compilers have the same problem.
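
To picture the effect, here is a toy sketch (mine, not anything from GCC or Clang) in which many distinct specializations of X are named but never instantiated; with a linked-list specialization table, every one of those mentions is a linear scan over the specializations seen so far:

    template <int N> struct X;               // declared, never defined

    template <int N> struct name_many
    {
        X<N>* unused;                        // names X<N> without instantiating it
        name_many<N - 1> rest;               // recurse: also names X<N-1>, X<N-2>, ...
    };
    template <> struct name_many<0> { X<0>* unused; };

    int main()
    {
        // Completing name_many<200> names ~200 distinct X<I> specializations;
        // with a linear scan per lookup the total work grows quadratically.
        name_many<200> chain;
        (void)chain;
        return 0;
    }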
If Clang has found a way to solve these problems, that's good news indeed.
We're working on it. I did some simple benchmarking with the ultra-boring Fibonacci template metaprogram last Friday, just to see how Clang is doing, and posted the results here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/attachments/20090529/a392b024/att...

The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member. However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case.

I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
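
For readers who don't have the attachment handy, the benchmark is presumably the usual formulation of the Fibonacci metaprogram, something along these lines (my reconstruction, not the attached code):

    // Each Fibonacci<N> is instantiated only once (instantiations are memoized),
    // but every mention of Fibonacci<K> on the right-hand side still makes the
    // compiler check whether that specialization is already known -- which is
    // where GCC's linear scan shows up.
    template <unsigned N>
    struct Fibonacci
    {
        static const unsigned long value =
            Fibonacci<N - 1>::value + Fibonacci<N - 2>::value;
    };

    template <> struct Fibonacci<0> { static const unsigned long value = 0; };
    template <> struct Fibonacci<1> { static const unsigned long value = 1; };

    int main()
    {
        // Forces instantiation of Fibonacci<40> down through Fibonacci<0>.
        return Fibonacci<40>::value % 256;
    }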
I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
I can't comment on that, but I appreciate Dave's wager ;)

- Doug

[*] And it lacks function templates, for you snarky folks trying to fool compilers into handling template metaprograms better :)