[fusion] improving compile times

I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, and there are likely to be many more such patches in the near future. I could open a trac ticket and attach all the patches there, or I could just commit them to trunk as I go. Thoughts?

P.S. It would be great to get some other heavy users of template metaprogramming interested in making other such changes to bring down the compile times of the core TMP libraries like MPL, Fusion, Proto, etc., etc.

-- Eric Niebler BoostPro Computing http://www.boostpro.com

Index: vector_n_chooser.hpp
===================================================================
--- vector_n_chooser.hpp (revision 53535)
+++ vector_n_chooser.hpp (working copy)
@@ -25,11 +25,13 @@
 #include <boost/fusion/container/vector/vector50.hpp>
 #endif

-#include <boost/mpl/distance.hpp>
-#include <boost/mpl/find.hpp>
-#include <boost/mpl/begin_end.hpp>
 #include <boost/preprocessor/cat.hpp>
+#include <boost/preprocessor/arithmetic/dec.hpp>
+#include <boost/preprocessor/arithmetic/sub.hpp>
+#include <boost/preprocessor/facilities/intercept.hpp>
 #include <boost/preprocessor/repetition/enum_params.hpp>
+#include <boost/preprocessor/repetition/enum_trailing_params.hpp>
+#include <boost/preprocessor/repetition/enum_params_with_a_default.hpp>

 namespace boost { namespace fusion
@@ -38,40 +40,23 @@ namespace boost { namespace fusion
 {
     namespace detail
     {
-        template <int N>
-        struct get_vector_n;
+        template <BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, typename T)>
+        struct vector_n_chooser
+        {
+            typedef BOOST_PP_CAT(vector, FUSION_MAX_VECTOR_SIZE)<BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, T)> type;
+        };

         template <>
-        struct get_vector_n<0>
+        struct vector_n_chooser<BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, void_ BOOST_PP_INTERCEPT)>
         {
-            template <BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, typename T)>
-            struct call
-            {
-                typedef vector0 type;
-            };
+            typedef vector0 type;
         };

 #define BOOST_PP_FILENAME_1 \
     <boost/fusion/container/vector/detail/vector_n_chooser.hpp>
-#define BOOST_PP_ITERATION_LIMITS (1, FUSION_MAX_VECTOR_SIZE)
+#define BOOST_PP_ITERATION_LIMITS (1, BOOST_PP_DEC(FUSION_MAX_VECTOR_SIZE))
 #include BOOST_PP_ITERATE()

-        template <BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, typename T)>
-        struct vector_n_chooser
-        {
-            typedef
-                mpl::BOOST_PP_CAT(vector, FUSION_MAX_VECTOR_SIZE)
-                    <BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, T)>
-            input;
-
-            typedef typename mpl::begin<input>::type begin;
-            typedef typename mpl::find<input, void_>::type end;
-            typedef typename mpl::distance<begin, end>::type size;
-
-            typedef typename get_vector_n<size::value>::template
-                call<BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, T)>::type
-            type;
-        };
 }}}

 #endif
@@ -85,14 +70,12 @@

 #define N BOOST_PP_ITERATION()

-    template <>
-    struct get_vector_n<N>
+    template <BOOST_PP_ENUM_PARAMS(N, typename T)>
+    struct vector_n_chooser<
+        BOOST_PP_ENUM_PARAMS(N, T)
+        BOOST_PP_ENUM_TRAILING_PARAMS(BOOST_PP_SUB(FUSION_MAX_VECTOR_SIZE, N), void_ BOOST_PP_INTERCEPT)>
     {
-        template <BOOST_PP_ENUM_PARAMS(FUSION_MAX_VECTOR_SIZE, typename T)>
-        struct call
-        {
-            typedef BOOST_PP_CAT(vector, N)<BOOST_PP_ENUM_PARAMS(N, T)> type;
-        };
+        typedef BOOST_PP_CAT(vector, N)<BOOST_PP_ENUM_PARAMS(N, T)> type;
     };

 #undef N

Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, and there are likely to be many more such patches in the near future. I could open a trac ticket and attach all the patches there, or I could just commit them to trunk as I go. Thoughts?
Awesome! Eric, feel free to commit.
P.S. It would be great to get some other heavy users of template metaprogramming interested in making other such changes to bring down the compile times of the core TMP libraries like MPL, Fusion, Proto, etc., etc.
Agreed 100% Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
Somewhat off topic: What I'd really like to see is someone explain how metaprogramming improves compile speed. Steven said earlier that the slowdown depends on the template: http://article.gmane.org/gmane.comp.lib.boost.devel/186051
I would have guessed that the more template instances that are created, the slower the compile time. However, Steven's remark made me wonder. What I'm guessing is that if the template metaprogram produces a lot of intermediate results, then it might be better to use preprocessor metaprogramming to just produce the final result. Is that about right, or is it more complicated? -regards, Larry

Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
Somewhat off topic:
What I'd really like to see is someone explain how metaprogramming improves compile speed. Steven said earlier that the slowdown depends on the template:
Interesting.
I would have guessed that the more template instances that are created, the slower the compile time. However, Steven's remark made me wonder. What I'm guessing is that if the template metaprogram produces a lot of intermediate results, then it might be better to use preprocessor metaprogramming to just produce the final result. Is that about right, or is it more complicated?
My experience matches yours: more instantiations --> longer compiles. I wonder what Steven's experience is. Steven? -- Eric Niebler BoostPro Computing http://www.boostpro.com

Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than parsing raw code. (No benchmarks - but I know the code.) So really, before committing something, you should measure its real-time impact, and measure it in at least two compilers. Sebastian

Sebastian Redl wrote:
Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than parsing raw code. (No benchmarks - but I know the code.)
Agreed 100% Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

Joel de Guzman wrote:
Sebastian Redl wrote:
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than parsing raw code. (No benchmarks - but I know the code.)
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case. If Clang has found a way to solve these problems, that's good news indeed. I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
Agreed 100%
OK. When compiling Fusion's vector_make.cpp test ... Before ...
$ time g++ -I ../../../.. -c vector_make.cpp
real 0m1.670s user 0m1.216s sys 0m0.325s
After ...
$ time g++ -I ../../../.. -c vector_make.cpp
real 0m1.208s user 0m0.684s sys 0m0.309s
From the user time, my recent changes make this test compile twice as fast for gcc-3.4 (cygwin). For MSVC, the wins are less dramatic. Your point is taken, though ... instantiation count is merely a rule of thumb and the real measure is clock time. It is, in my experience and with compilers actually in use today, a very good rule of thumb, though. -- Eric Niebler BoostPro Computing http://www.boostpro.com

AMDG Eric Niebler wrote:
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case. If Clang has found a way to solve these problems, that's good news indeed. I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
It should be possible to reduce the cost by using fancier data structures. With appropriate data structures, you don't necessarily have to look at every specialization or overload to know which is the most specialized. In most cases with class template specialization, there are only a few cases that can lead to ambiguities. Most specializations are disjoint, so it should be possible to prune most of them quickly. (I haven't thought this through fully--it's probably pretty difficult to get right). Also, except for the templates used for the various extension mechanisms, class templates tend to have only a few specializations. In Christ, Steven Watanabe

Eric Niebler wrote:
From the user time, my recent changes make this test compile twice as fast for gcc-3.4 (cygwin). For MSVC, the wins are less dramatic.
Your point is taken, though ... instantiation count is merely a rule of thumb and the real measure is clock time. It is, in my experience and with compilers actually in use today, a very good rule of thumb, though.
That's already a huge improvement, Eric! Cool! Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

on Tue Jun 02 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
Joel de Guzman wrote:
Sebastian Redl wrote:
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than parsing raw code. (No benchmarks - but I know the code.)
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case. If Clang has found a way to solve these problems, that's good news indeed.
It may be "inherently expensive" by some measure, but most compilers were implemented by people for whom template instantiation speed was way down the list of priorities, and most got their template implementations before "interesting TMP" was even available for them to test against. In some cases they do *really* dumb things.
I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
I wouldn't bet against Doug Gregor when he's firing on all cylinders :-)
Agreed 100%
OK. When compiling Fusion's vector_make.cpp test ...
Before ...
$ time g++ -I ../../../.. -c vector_make.cpp
real 0m1.670s user 0m1.216s sys 0m0.325s
After ...
$ time g++ -I ../../../.. -c vector_make.cpp
real 0m1.208s user 0m0.684s sys 0m0.309s
From the user time, my recent changes make this test compile twice as fast for gcc-3.4 (cygwin). For MSVC, the wins are less dramatic.
Your point is taken, though ... instantiation count is merely a rule of thumb and the real measure is clock time. It is, in my experience and with compilers actually in use today, a very good rule of thumb, though.
Well, it's great to get the instantiation count down, but consider that what you're replacing it with may not be any faster :-) If you *are* getting a win from PP metaprogramming, there's a good chance that you could improve the speed a lot more, e.g. by using the "z" parameter as described in http://www.boostpro.com/tmpbook/preprocessor.html#horizontal-repetition -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
on Tue Jun 02 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
I wouldn't bet against Doug Gregor when he's firing on all cylinders :-)
I didn't know Doug was involved. That changes everything! :-)
From the user time, my recent changes make this test compile twice as fast for gcc-3.4 (cygwin). For MSVC, the wins are less dramatic.
Your point is taken, though ... instantiation count is merely a rule of thumb and the real measure is clock time. It is, in my experience and with compilers actually in use today, a very good rule of thumb, though.
Well, it's great to get the instantiation count down, but consider that what you're replacing it with may not be any faster :-)
See the comment above about a measured 2x speed-up.
If you *are* getting a win from PP metaprogramming, there's a good chance that you could improve the speed a lot more, e.g. by using the "z" parameter as described in http://www.boostpro.com/tmpbook/preprocessor.html#horizontal-repetition
Yep, I know about that, but generally avoid nested horizontal repetition. Nevertheless, I appreciate the suggestions. For anybody concerned about the nature of my changes, here is an example of the fat in Fusion I'm trimming: https://svn.boost.org/trac/boost/changeset/53566/trunk/boost/fusion/containe... -- Eric Niebler BoostPro Computing http://www.boostpro.com

On Tue, Jun 2, 2009 at 10:09 AM, Eric Niebler <eric@boostpro.com> wrote:
Joel de Guzman wrote:
Sebastian Redl wrote:
Eric Niebler wrote:
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
That's not necessarily a good benchmark, especially if you replace it by preprocessor metaprogramming which leads to more non-template code. GCC is extremely slow at instantiating templates, but this is not necessarily true for other compilers - I believe, for example, that Clang will be faster at instantiating templates than parsing raw code. (No benchmarks - but I know the code.)
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case.
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures. In GCC, for example, a surprising amount of time is wasted determining whether the class template specialization X<T1, T2, ..., TN> refers to an already-known template instantiation (or specialization), because GCC stores all of the template instantiations in a linked list. Thus, you pay each time you name X<T1, T2, ..., TN>, even if you don't instantiate it. That's why we see quadratic (or worse) behavior for template metaprograms with GCC. I suspect that other compilers have the same problem.
If Clang has found a way to solve these problems, that's good news indeed.
We're working on it. I did some simple benchmarking with the ultra-boring Fibonacci template metaprogram last Friday, just to see how Clang is doing, and posted the results here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/attachments/20090529/a392b024/att... The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member. However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case. I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
I read from the Wikipedia entry that Clang's C++ support is 2-3 years from being usable, though.
I can't comment on that, but I appreciate Dave's wager ;) - Doug [*] And it lacks function templates, for you snarky folks trying to fool compilers into handling template metaprograms better :)

Doug Gregor wrote:
On Tue, Jun 2, 2009 at 10:09 AM, Eric Niebler <eric@boostpro.com> wrote:
Cool! I wonder how that's possible. I have it from Walter Bright (Zortech, Symantec, Digital Mars) that instantiating a template is inherently expensive, and certain features of the C++ language (ADL, partial specialization, etc.) force that to be the case.
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures.
Indeed, I recall Walter saying it's an N^2 problem. I've convinced him to write a blog entry about why template instantiation in C++ is inherently slow, so soon we'll know why he thinks so. Maybe your work in Clang can prove him wrong.
In GCC, for example, a surprising amount of time is wasted determining whether the class template specialization X<T1, T2, ..., TN> refers to an already-known template instantiation (or specialization), because GCC stores all of the template instantiations in a linked list. Thus, you pay each time you name X<T1, T2, ..., TN>, even if you don't instantiate it.
That sucks.
That's why we see quadratic (or worse) behavior for template metaprograms with GCC. I suspect that other compilers have the same problem.
If Clang has found a way to solve these problems, that's good news indeed.
We're working on it. I did some simple benchmarking with the ultra-boring Fibonacci template metaprogram last Friday, just to see how Clang is doing, and posted the results here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/attachments/20090529/a392b024/att...
Lookin' good!
The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member.
Would it go faster if the compiler didn't have to create the special member functions? Could we use the declared-but-not-defined trick to suppress their generation and speed up template instantiations for metafunctions?
However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case.
I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
Where can I read more and follow the team's progress? -- Eric Niebler BoostPro Computing http://www.boostpro.com

On Wed, Jun 3, 2009 at 1:46 PM, Eric Niebler <eric@boostpro.com> wrote:
Doug Gregor wrote:
Template instantiation is expensive, but you're probably seeing some O(n^2) or worse effects because of a poor choice in data structures.
Indeed, I recall Walter saying it's an N^2 problem. I've convinced him to write a blog entry about why template instantiation in C++ is inherently slow, so soon we'll know why he thinks so. Maybe your work in Clang can prove him wrong.
I hope so!
The cost of template *instantiation* for the Fibonacci example is quite small for both compilers, since we're just talking about creating a class with its special member functions and a single static data member.
Would it go faster if the compiler didn't have to create the special member functions? Could we use the declared-but-not-defined trick to suppress their generation and speed up template instantiations for metafunctions?
Yes. GCC already optimizes this case, in the sense that it doesn't build the declaration for a special member function until that declaration is actually needed. Other compilers most certainly have a similar optimization, but Clang does not have it (yet).
However, GCC is exhibiting quadratic behavior because every time we name Fibonacci<I> for some value I, it's doing a linear search to see if there's already a specialization for that value of I. Clang is scaling much better here because our search for an already-named specialization is constant time in the average case.
I can't promise that the improvements we see in Fibonacci will extend to real template metaprograms, because I haven't tried it. Nor can I: Clang lacks both member templates and class template partial specialization [*], which means that we can't compile a serious template metaprogram with Clang. Obviously, template metaprogramming is important to me, personally, so we'll do our best to scale this well for real template metaprograms.
Where can I read more and follow the team's progress?
Information about Clang is available here: http://clang.llvm.org/ Clang is open source, so the best way to follow the team's progress is to join us and help Clang progress faster :) Barring that, the developer and commit mailing lists can help watch progress at a coarse-grained level, and the C++ Status page shows roughly where we think we are. - Doug

on Wed Jun 03 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
Would it go faster if the compiler didn't have to create the special member functions? Could we use the declared-but-not-defined trick to suppress their generation and speed up template instantiations for metafunctions?
At BoostCon, a few of us hacked on a C++0x scheme that did exactly one class template instantiation per metaprogram. Here's my work: http://github.com/techarcana/mpl0x And one of my colleagues': https://svn.boost.org/trac/boost/browser/sandbox/ftmpl Unfortunately because of GCC's dumb linear list search, it spends most of its time looking up function template specializations and we get almost no speedup. But this approach could be a big win with a smarter compiler implementation. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG Eric Niebler wrote:
Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
Somewhat off topic:
What I'd really like to see is someone explain how metaprogramming improves compile speed. Steven said earlier that the slowdown depends on the template:
Interesting.
I would have guessed that the more template instances that are created, the slower the compile time. However, Steven's remark made me wonder. What I'm guessing is that if the template metaprogram produces a lot of intermediate results, then it might be better to use preprocessor metaprogramming to just produce the final result. Is that about right, or is it more complicated?
My experience matches yours: more instantiations --> longer compiles. I wonder what Steven's experience is. Steven?
For macro optimizations that substantially reduce the number of template instantiations, that's a pretty safe assumption. I definitely don't trust it for micro optimizations, though. Also, be wary of the total instantiation count produced by my tool. It's easy to change it by changing the #includes for instance. In Christ, Steven Watanabe

Eric Niebler wrote:
Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler. I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
Be careful. I once spent a couple of days trying to speed up some compilations that were taking ~5mins per file. All my efforts to reduce template instantiations had no effect on the compile time. Eventually I realised that ccache was actually caching the compiles and my timing was ignoring the compile step entirely; those minutes were being spent in the preprocessor. I rewrote some preprocessor metaprogramming, changing an algorithm from O(n^4) to O(n) (at the expense of some runtime memory), and it all went away. I later had to abandon the project because the template metaprogramming was too memory-hungry, but it was a valuable lesson. I don't suppose anyone's written a preprocessing metaprogramming profiler yet? John

AMDG John Bytheway wrote:
I later had to abandon the project because the template metaprogramming was too memory-hungry, but it was a valuable lesson. I don't suppose anyone's written a preprocessing metaprogramming profiler yet?
It should be easy to build it on top of wave. In Christ, Steven Watanabe

Steven Watanabe wrote:
John Bytheway wrote:
I later had to abandon the project because the template metaprogramming was too memory-hungry, but it was a valuable lesson. I don't suppose anyone's written a preprocessing metaprogramming profiler yet?
It should be easy to build it on top of wave.
Yeah, easy enough to hack that together relatively quickly. I added a new command line option to the Wave tool, --macrocounts/-c, allowing you to specify a file name (or '-' for stdout) where the tool will print the names and invocation counts of all expanded macros. The best way to profile macro expansion counts seems to be 'wave -c- -o- ...include paths... filename', where -c- prints the counts to cout and -o- suppresses any output from the actual preprocessing. HTH Regards Hartmut

on Tue Jun 02 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
Larry Evans wrote:
On 06/01/09 20:26, Eric Niebler wrote:
I'm attaching a simple patch to vector_n_chooser.hpp that replaces some template metaprogramming with preprocessor metaprogramming in the interest of improving compile times. I found this hotspot through profiling, [snip]
Eric,
Could you post your benchmark code that showed the improvement in compile speed? I'd like to eventually try it with a variadic template compiler version of fusion vector.
I confess that I'm not actually benchmarking compile speed; rather, I'm benchmarking the number of template instantiations as reported by Steven's template profiler.
Watch out: PP metaprogramming can slow down compilation, too. I had to get Paul Mensonides to help me figure out how to make it efficient in Boost.Python, because what I was doing caused noticeable drag.
I'm profiling TMP-heavy code like some of Proto's and xpressive's tests and cherry-picking the worst offenders. The Fusion vector_n_chooser patch knocked off 100's of template instantiations, for instance.
Better make sure you're actually saving cycles, or you could be obfuscating code for naught. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

on Mon Jun 01 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
P.S. It would be great to get some other heavy users of template metaprogramming interested in making other such changes to bring down the compile times of the core TMP libraries like MPL, Fusion, Proto, etc., etc.
MPL has already done so much with "pre-preprocessing" that I'd be surprised if there's a great deal to gain. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
on Mon Jun 01 2009, Eric Niebler <eric-AT-boostpro.com> wrote:
P.S. It would be great to get some other heavy users of template metaprogramming interested in making other such changes to bring down the compile times of the core TMP libraries like MPL, Fusion, Proto, etc., etc.
MPL has already done so much with "pre-preprocessing" that I'd be surprised if there's a great deal to gain.
I've posted a message here (and cc'ed Aleksey) about the inefficiency of mpl::sequence_tag, which is invoked from pretty much everywhere. I haven't heard back from him. -- Eric Niebler BoostPro Computing http://www.boostpro.com
participants (9)
- David Abrahams
- Doug Gregor
- Eric Niebler
- Hartmut Kaiser
- Joel de Guzman
- John Bytheway
- Larry Evans
- Sebastian Redl
- Steven Watanabe