
Doug Gregor wrote:
On Tue, Jun 2, 2009 at 10:09 AM, Eric Niebler <eric@boostpro.com> wrote:
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case.
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures.
Indeed, I recall Walter saying it's an N^2 problem. I've convinced him to write a blog entry about why template instantiation in C++ is inherently slow, so soon we'll know why he thinks so. Maybe your work in Clang can prove him wrong.
In GCC, for example, a surprising amount of time is wasted determining whether the class template specialization X<T1, T2, ..., TN> refers to an already-known template instantiation (or specialization), because GCC stores all of the template instantiations in a linked list. Thus, you pay each time you name X<T1, T2, ..., TN>, even if you don't instantiate it.
That sucks.
That's why we see quadratic (or worse) behavior for template metaprograms with GCC. I suspect that other compilers have the same problem.
If Clang has found a way to solve these problems, that's good news indeed.
We're working on it. I did some simple benchmarking with the ultra-boring Fibonacci template metaprogram last Friday, just to see how Clang is doing, and posted the results here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/attachments/20090529/a392b024/att...
Lookin' good!
The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member.
Would it go faster if the compiler didn't have to create the special member functions? Could we use the declared-but-not-defined trick to suppress their generation and speed up template instantiations for metafunctions?
However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case.
I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
Where can I read more and follow the team's progress? -- Eric Niebler BoostPro Computing http://www.boostpro.com