
On Tue, Jun 2, 2009 at 10:09 AM, Eric Niebler <eric@boostpro.com> wrote:
Joel de Guzman wrote:
Sebastian Redl wrote:
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off hundreds of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming, which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than at parsing the equivalent raw code. (No benchmarks - but I know the code.)
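
To make the tradeoff concrete, here is a rough sketch of the two styles being contrasted; the names are invented for illustration and are not the actual Fusion vector_n_chooser code:

    #include <boost/preprocessor/repetition/repeat_from_to.hpp>

    // Template-recursive form: every use instantiates a chain of class templates.
    template <int N> struct chooser : chooser<N - 1> {};
    template <>      struct chooser<0> {};

    // Preprocessor form: the compiler just parses ten plain structs and
    // instantiates nothing -- more raw code to parse, fewer templates.
    #define MAKE_CHOOSER(z, n, unused) struct chooser##n {};
    BOOST_PP_REPEAT_FROM_TO(0, 10, MAKE_CHOOSER, ~)
    #undef MAKE_CHOOSER

Which form compiles faster then depends on whether the compiler's bottleneck is parsing or instantiation, which is exactly the point at issue.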
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case.
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures. In GCC, for example, a surprising amount of time is wasted determining whether the class template specialization X<T1, T2, ..., TN> refers to an already-known template instantiation (or specialization), because GCC stores all of the template instantiations in a linked list. Thus, you pay each time you name X<T1, T2, ..., TN>, even if you don't instantiate it. That's why we see quadratic (or worse) behavior for template metaprograms with GCC. I suspect that other compilers have the same problem.
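
To picture the effect, here is a toy sketch (mine, not anything from GCC or Clang) in which many distinct specializations of X are named but never instantiated; with a linked-list specialization table, every one of those mentions is a linear scan over the specializations seen so far:

    template <int N> struct X;               // declared, never defined

    template <int N> struct name_many
    {
        X<N>* unused;                        // names X<N> without instantiating it
        name_many<N - 1> rest;               // recurse: also names X<N-1>, X<N-2>, ...
    };
    template <> struct name_many<0> { X<0>* unused; };

    int main()
    {
        // Completing name_many<200> names ~200 distinct X<I> specializations;
        // with a linear scan per lookup the total work grows quadratically.
        name_many<200> chain;
        (void)chain;
        return 0;
    }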
If Clang has found a way to solve these problems, that's good news indeed.
We're working on it. I did some simple benchmarking with the ultra-boring Fibonacci template metaprogram last Friday, just to see how Clang is doing, and posted the results here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/attachments/20090529/a392b024/att...

The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member. However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case.

I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
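
For readers who don't have the attachment handy, the benchmark is presumably the usual formulation of the Fibonacci metaprogram, something along these lines (my reconstruction, not the attached code):

    // Each Fibonacci<N> is instantiated only once (instantiations are memoized),
    // but every mention of Fibonacci<K> on the right-hand side still makes the
    // compiler check whether that specialization is already known -- which is
    // where GCC's linear scan shows up.
    template <unsigned N>
    struct Fibonacci
    {
        static const unsigned long value =
            Fibonacci<N - 1>::value + Fibonacci<N - 2>::value;
    };

    template <> struct Fibonacci<0> { static const unsigned long value = 0; };
    template <> struct Fibonacci<1> { static const unsigned long value = 1; };

    int main()
    {
        // Forces instantiation of Fibonacci<40> down through Fibonacci<0>.
        return Fibonacci<40>::value % 256;
    }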
I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
I can't comment on that, but I appreciate Dave's wager ;)

- Doug

[*] And it lacks function templates, for you snarky folks trying to fool compilers into handling template metaprograms better :)