[proto] Looong compile times and other issues

John Maddock

11 Sep 2011

5:18 p.m.

So.... I thought I had all the expression template stuff pretty much done, and then I tried some real world use cases (compiling all the students_t distribution functions with my extended-precision FP type) and everything fell apart:

* VC10 wouldn't compile the code at all - it more or less runs the system out of swap space (takes about 10 minutes or more!), then exits with an internal compiler error (if I break the code down into its parts and instantiate each part separately it does compile - that's not a solution though!).
* GCC-4.4.x fails to compile the code due to clashes between boost::math::complement (a function) and boost::proto::complement (a class). I suspect this is an old gcc bug (finding structures via ADL) - I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup?
* GCC-4.5.0 fails with an internal compiler error :-(
* GCC-4.6.0 builds the code OK, but takes a long time - though possibly just barely acceptable.

I suspect this is a "triple template" problem:

* proto is a complex template library.
* my number class is a fairly large/complex template in its own right.
* the code above gets instantiated from deep within Boost.Math's internals.

Other than using proto::switch in the grammar (which I'm doing already), is there anything I can do to reduce the template load on the proto side of things?

Otherwise, since I'm only using a tiny fraction of proto's capabilities, the only other option I can see is to rip it out and replace it with a mini-proto designed to minimize template instantiations within this particular use case.

Thanks in advance, John.


Joel Falcou

11 Sep

5:44 p.m.

Le 11/09/2011 19:18, John Maddock a écrit :

...

So.... I thought I had all the expression template stuff pretty much done, and then I tried some real world use cases (compiling all the students_t distribution functions with my extended-precision FP type) and everything fell apart:

* VC10 wouldn't compile the code at all - more or less runs the system out of swap space (takes about 10 minutes or more!), then exits with an internal compiler error (if I break the code down into it's parts and instantiate each part separately it does compile though - that's not a solution though!). * GCC-4.4.x fails to compile the code due to clashes between boost::math::complement (a function) and boost::proto::complement (a class). I suspect this is an old gcc bug (finding structures via ADL) - I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup? * GCC-4.5.0 Fails with an internal compiler error :-( * GCC-4.6.0 Builds the code OK, but takes a long time - though possibly just barely acceptable.

I suspect this is a "triple template" problem:

* proto is complex template library. * my number class is a fairly large/complex template in it's own right. * the code above gets instantiated from deep within Boost.Math's internals.

Other than using proto::switch in the grammar (which I'm doing already), is there anything I can do to reduce the template load on the proto side of things?

Otherwise since I'm only using a tiny fraction of proto's capabilities, the only other option I can see is to rip it out and replace with a mini-proto designed to minimalize template instantiations within this particular use case.

What kind of transforms do you use? I found out that collapsing complex transforms into primitive ones eased the problem.

Next to that:

- improve CT (compile time) through forward declarations. proto is a bit wobbly on this side and I think it may help if we split the proto files in a finer-grained fashion
- fusion is also a big hitter: lots of PP (preprocessor) and lots of forced instantiation instead of lazy specialization.

As for the ADL, can your terminals be made PODs? We can see if adding ADL barriers in proto helps.

For the ICE, are there any hints in the core dump?

Eric Niebler

7:11 p.m.

On 9/11/2011 1:44 PM, Joel Falcou wrote:

...

Le 11/09/2011 19:18, John Maddock a écrit :

...
So.... I thought I had all the expression template stuff pretty much done, and then I tried some real world use cases (compiling all the students_t distribution functions with my extended-precision FP type) and everything fell apart:

* VC10 wouldn't compile the code at all - more or less runs the system out of swap space (takes about 10 minutes or more!), then exits with an internal compiler error (if I break the code down into it's parts and instantiate each part separately it does compile though - that's not a solution though!). * GCC-4.4.x fails to compile the code due to clashes between boost::math::complement (a function) and boost::proto::complement (a class). I suspect this is an old gcc bug (finding structures via ADL) - I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup?

This is not a gcc bug. At least, the gcc developers would say so. There is some ambiguity in the standard about whether ADL should find classes in addition to functions -- despite the fact that finding classes is never what the user wants. John, where does complement show up unqualified in your code?

...

...
* GCC-4.5.0 Fails with an internal compiler error :-( * GCC-4.6.0 Builds the code OK, but takes a long time - though possibly just barely acceptable.

I suspect this is a "triple template" problem:

* proto is complex template library. * my number class is a fairly large/complex template in it's own right. * the code above gets instantiated from deep within Boost.Math's internals.

Other than using proto::switch in the grammar (which I'm doing already), is there anything I can do to reduce the template load on the proto side of things?

Otherwise since I'm only using a tiny fraction of proto's capabilities, the only other option I can see is to rip it out and replace with a mini-proto designed to minimalize template instantiations within this particular use case.

What kind of transforms do you use? I found out that collapsing complex transforms into primitive ones eased the problem.

Right. The nice transform syntax that allows composition via function types is rather expensive at compile time. It's great for knocking together a DSEL quickly, but once you have it all working, you can replace them one by one with the equivalent primitive transforms, like Joel suggests.

...

Next to that: - improve CT through forward declarations. proto is a bit wobbly on this side and I think it may help if we split the proto files in a finer-grained fashion

Joel, can you give an example of where this might help?

...

- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

I avoid fusion, sadly. Proto is not built on top of fusion for just this reason. Proto is expensive enough all by itself. :-P

...

As for the ADL, can your terminals be made PODs ? We can see if adding ADL barriers in proto helps

Proto already has ADL barriers. The expression types are in boost::proto::exprns_, for instance. I would have to see the code to know what gcc is complaining about.
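For readers unfamiliar with the term, an ADL barrier can be sketched as follows. This is a minimal illustration and not Proto's actual code: the namespaces `lib`, `lib::exprns_` and `user`, and the names `expr` and `complement`, are made up for the example. The type is defined in an inner namespace and re-exported with a using-declaration, so only the inner namespace is associated with it for argument-dependent lookup.

```cpp
#include <cassert>

namespace lib {
    // Hypothetical barrier namespace: the expression type is defined here,
    // so lib::exprns_ (not lib) is its associated namespace for ADL.
    namespace exprns_ {
        struct expr { int value; };
    }
    using exprns_::expr;  // users still write lib::expr

    // A class that shares its name with a user-level function.
    struct complement {};
}

namespace user {
    // A free function with the same name as lib::complement.
    inline int complement(const lib::expr& e) { return ~e.value; }
}

inline int call_unqualified() {
    using namespace user;
    lib::expr e = { 5 };
    // Unqualified call: ordinary lookup finds user::complement; ADL searches
    // lib::exprns_ only, so lib::complement is never considered.
    return complement(e);
}
```

Because `lib` itself is never an associated namespace of `lib::expr`, the unqualified call cannot trip over `lib::complement`, even on a compiler that lets ADL find class names.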

...

For the ICE, is there any hints in the core dump ?

There usually isn't, AFAIK. John, if you can post your code publicly, perhaps one of the proto cognoscenti can help. -- Eric Niebler BoostPro Computing http://www.boostpro.com

Joel Falcou

7:43 p.m.

Le 11/09/2011 21:11, Eric Niebler a écrit :

...

Joel, can you give an example of where this might help?

I don't like how all the transforms are crammed into a single header, or how proto_fwd is hugely huge :/ I wonder if having a forward folder containing fine-grained forward headers wouldn't be better.

...

I avoid fusion, sadly. Proto is not built on top of fusion for just this reason. Proto is expensive enough all by itself. :-P

I was thinking of when you use some fusion algorithm on an expression. This is a purely Fusion problem.

...

There usually isn't, AFAIK.

Herp, I thought there would be :(

Eric Niebler

8:07 p.m.

On 9/11/2011 3:43 PM, Joel Falcou wrote:

...

Le 11/09/2011 21:11, Eric Niebler a écrit :

...
Joel, can you give an example of where this might help?

I don't like how all the transforms are crammed into a single header

Many are in matches.hpp, and this could be broken up. Others are already defined in separate headers under the transform/ directory.

...

or how the proto_fwd is hugely huge :/

I tend to doubt a few hundred forward declarations and typedefs present much of a problem.

...

I wonder if having a forward folder containing fine-grained forward headers wouldn't be better.

...
I avoid fusion, sadly. Proto is not built on top of fusion for just this reason. Proto is expensive enough all by itself. :-P

I was thinking of when you use some fusion algorithm on an expression. This is a purely Fusion problem.

I think this only happens in proto::(reverse_)fold, and only when the sequence isn't a proto expression type. But perhaps I'm forgetting something.

...

...
There usually isn't, AFAIK.

Herp, I thought there would be :(

Herp? I suspect something is really very wrong for the compile times John is seeing. But at this point, it's impossible to say. John, you might also try Steven's template profiler. -- Eric Niebler BoostPro Computing http://www.boostpro.com

Joel Falcou

8:35 p.m.

Le 11/09/2011 22:07, Eric Niebler a écrit :

...

On 9/11/2011 3:43 PM, Joel Falcou wrote:

...
Le 11/09/2011 21:11, Eric Niebler a écrit :

...
Joel, can you give an example of where this might help?

I don't like how all the transforms are crammed into a single header

Many are in matches.hpp, and this could be broken up. Others are already defined in separate headers under the transform/ directory.

matches.hpp is indeed the file i had in mind

...

I tend to doubt a few hundred forward declarations and typedefs present much of a problem.

i retract this indeed.

...

I think this only happens in proto::(reverse_)fold, and only when the sequence isn't a proto expression type. But perhaps I'm forgetting something.

...

Herp?

random disappointment sounds :p

...

I suspect something is really very wrong for the compile times John is seeing. But at this point, it's impossible to say. John, you might also try Steven's template profiler.

The last time I had these, it was when I was using ICC, which, for some reason I don't know, is that slow and brittle with template-based code.

Mathias Gaunard

12 Sep

9:47 a.m.

On 09/11/2011 09:43 PM, Joel Falcou wrote:

...

I don't like how all the transforms are crammed into a single header, or how proto_fwd is hugely huge :/ I wonder if having a forward folder containing fine-grained forward headers wouldn't be better.

It would be good as well if proto/tags.hpp could avoid including the whole of proto_fwd.hpp

John Maddock

8:42 a.m.

----- Original Message -----
From: "Eric Niebler" <eric@boostpro.com>
To: <boost@lists.boost.org>
Sent: Sunday, September 11, 2011 8:11 PM
Subject: Re: [boost] [proto] Looong compile times and other issues

On 9/11/2011 1:44 PM, Joel Falcou wrote:

...

Le 11/09/2011 19:18, John Maddock a écrit :

...
So.... I thought I had all the expression template stuff pretty much done, and then I tried some real world use cases (compiling all the students_t distribution functions with my extended-precision FP type) and everything fell apart:

* VC10 wouldn't compile the code at all - more or less runs the system out of swap space (takes about 10 minutes or more!), then exits with an internal compiler error (if I break the code down into it's parts and instantiate each part separately it does compile though - that's not a solution though!). * GCC-4.4.x fails to compile the code due to clashes between boost::math::complement (a function) and boost::proto::complement (a class). I suspect this is an old gcc bug (finding structures via ADL) - I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup?

...

This is not a gcc bug. At least, the gcc developers would say so. There is some ambiguity in the standard about whether ADL should find classes in addition to functions -- despite the fact that finding classes is never what the user wants.

Oh :-(

...

John, where does complement show up unqualified in your code?

It's in my concept-checking code, the actual (abbreviated) error message is:

  t.cpp:13: instantiated from here
  ../../../../trunk/boost/proto/tags.hpp:30: error: 'struct boost::proto::tag::complement' is not a function,
  ../../../../trunk/boost/math/distributions/complement.hpp:156: error: conflict with 'template<class Dist, class RealType> boost::math::complemented2_type<Dist, RealType> boost::math::complement(const Dist&, const RealType&)'
  ../../../../trunk/boost/math/concepts/distributions.hpp:132: error: in call to 'complement'
  ../../../../trunk/boost/proto/proto_fwd.hpp:488: error: 'template<class T> struct boost::proto::complement' is not a function,
  ../../../../trunk/boost/math/distributions/complement.hpp:156: error: conflict with 'template<class Dist, class RealType> boost::math::complemented2_type<Dist, RealType> boost::math::complement(const Dist&, const RealType&)'
  ../../../../trunk/boost/math/concepts/distributions.hpp:132: error: in call to 'complement'

The point is that boost::math::complement is a documented public interface so:

  using namespace boost::math;
  real_type a = cdf(complement(distribution_type, real_type));

is going to be commonplace code - so I really don't want these kinds of ambiguities coming up - not least because the error messages are sufficiently cryptic that it's not immediately obvious what the issue is.

...

...
* GCC-4.5.0 Fails with an internal compiler error :-( * GCC-4.6.0 Builds the code OK, but takes a long time - though possibly just barely acceptable.

I suspect this is a "triple template" problem:

* proto is complex template library. * my number class is a fairly large/complex template in it's own right. * the code above gets instantiated from deep within Boost.Math's internals.

Other than using proto::switch in the grammar (which I'm doing already), is there anything I can do to reduce the template load on the proto side of things?

Otherwise since I'm only using a tiny fraction of proto's capabilities, the only other option I can see is to rip it out and replace with a mini-proto designed to minimalize template instantiations within this particular use case.

What kind of transforms do you use? I found out that collapsing complex transforms into primitive ones eased the problem.

...

Right. The nice transform syntax that allows composition via function types is rather expensive at compile time. It's great for knocking together a DSEL quickly, but once you have it all working, you can replace them one by one with the equivalent primitive transforms, like Joel suggests.

Ummm, well I confess I'm not using any transforms - just the expression template creation - then I unpick the expression myself. Partly that allowed me to get things going quickly, and partly it's allowed me to do all kinds of interesting things with the expression. The actual code that unpicks the expression is somewhat verbose, but not all that complex (not too much template depth... I hope).

...

Next to that: - improve CT through forward declarations. proto is a bit wobbly on this side and I think it may help if we split the proto files in a finer-grained fashion

...

Joel, can you give an example of where this might help?

...

- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

...

I avoid fusion, sadly. Proto is not built on top of fusion for just this reason. Proto is expensive enough all by itself. :-P

...

As for the ADL, can your terminals be made PODs ? We can see if adding ADL barriers in proto helps

...

Proto already has ADL barriers. The expression types are in boost::proto::exprns_, for instance. I would have to see the code to know what gcc is complaining about.

...

For the ICE, is there any hints in the core dump ?

...

There usually isn't, AFAIK.

Not for GCC, for MSVC I do get some kind of error location - but it's different for every problem case :-(

...

John, if you can post your code publicly, perhaps one of the proto cognoscenti can help.

OK to reproduce you will need:

* Current trunk.
* The contents of sandbox/big_number.
* A copy of MPFR.

Then:

  #include <boost/math/big_number/mpfr.hpp>
  #include <boost/math/concepts/distributions.hpp>
  #include <boost/math/distributions.hpp>

  void foo()
  {
     using namespace boost;
     using namespace boost::math;
     using namespace boost::math::concepts;

     function_requires<DistributionConcept<students_t_distribution<boost::math::mpfr_real_50> > >();
  }

If the MPFR dependency proves an issue, then I can try and produce a test case without it.

For those preferring to read code, the proto-dependent stuff is mostly all in:

  boost/math/big_number/big_number_base.hpp (grammar and meta programming).
  boost/math/big_number.hpp (the actual terminal, and expression evaluation).

Thanks, John.

Eric Niebler

6:28 p.m.

On 9/12/2011 4:42 AM, John Maddock wrote:

...

It's in my concept-checking code, the actual (abbreviated) error message is:

t.cpp:13: instantiated from here ../../../../trunk/boost/proto/tags.hpp:30: error: 'struct boost::proto::tag::complement' is not a function,

Ah! The tags need to be in an ADL-blocking namespace. That should be an easy fix. Can you file a bug? I'm currently on a week-long vacation with my family.
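Why the tags matter can be shown with a small sketch. It is an illustration under assumptions, not Proto's source: `lib`, `lib::tag`, `lib::tagns_`, `expr` and `which` are hypothetical names. The namespace of a template type argument is an associated namespace of the specialization, so `expr<tag::plus>` drags `lib::tag` into ADL. A function template stands in for the class `complement` here so that the effect is observable on any conforming compiler.

```cpp
#include <cassert>

// Global fallback, chosen only when ADL finds nothing better.
inline int which(...) { return 0; }

namespace lib {
    template<class Tag> struct expr {};

    namespace tag {
        struct plus {};  // defined directly in lib::tag
        // Reachable via ADL for any expr<Tag> whose Tag is defined in lib::tag.
        template<class E> int which(const E&) { return 1; }
    }

    namespace tagns_ {   // hypothetical ADL-blocking namespace
        struct minus {};
    }
    namespace tag { using tagns_::minus; }  // still spelled lib::tag::minus
}

// lib::tag is associated through the template argument: ADL finds tag::which.
inline int probe_plus()  { return which(lib::expr<lib::tag::plus>()); }
// minus is defined in lib::tagns_, so lib::tag is not searched.
inline int probe_minus() { return which(lib::expr<lib::tag::minus>()); }
```

The using-declaration keeps the public spelling `lib::tag::minus` while taking `lib::tag` out of the set of associated namespaces, which is the kind of fix described above.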

...

Ummm, well I confess I'm not using any transforms - just the expression template creation - then I unpick the expression myself. Partly that allowed me to get things going quickly, and partly it's allowed me to do all kinds of interesting things with the expression. The actual code that unpicks the expression is somewhat verbose, but not all that complex (not too much template depth... I hope).

It sounds like you're not actually using Proto for that much. Which leaves me puzzled because I can't figure where the perf problems would be coming from. Disclaimer: I haven't actually looked at the code, because I'm on vacation.

...

...
John, if you can post your code publicly, perhaps one of the proto cognoscenti can help.

OK to reproduce you will need:

* Current trunk. * The contents of sandbox/big_number. * A copy of MPFR. <snip>

I can't get to this this week. If one of the other proto users wanted to take a crack at this, it would be much appreciated. -- Eric Niebler BoostPro Computing http://www.boostpro.com

Joel Falcou

6:50 p.m.

Le 12/09/2011 20:28, Eric Niebler a écrit :

...

On 9/12/2011 4:42 AM, John Maddock wrote:

...
It's in my concept-checking code, the actual (abbreviated) error message is:

t.cpp:13: instantiated from here ../../../../trunk/boost/proto/tags.hpp:30: error: 'struct boost::proto::tag::complement' is not a function,

Ah! The tags need to be in an ADL-blocking namespace. That should be an easy fix. Can you file a bug? I'm currently on a week-long vacation with my family.

File the ticket, I'll take care of it during the week.

Eric Niebler

13 Sep

4:41 p.m.

On 9/12/2011 2:50 PM, Joel Falcou wrote:

...

Le 12/09/2011 20:28, Eric Niebler a écrit :

...
On 9/12/2011 4:42 AM, John Maddock wrote:

...
It's in my concept-checking code, the actual (abbreviated) error message is:

t.cpp:13: instantiated from here ../../../../trunk/boost/proto/tags.hpp:30: error: 'struct boost::proto::tag::complement' is not a function,

Ah! The tags need to be in an ADL-blocking namespace. That should be an easy fix. Can you file a bug? I'm currently on a week-long vacation with my family.

File the ticket, I'll take care of it during the week.

I just committed a fix for this on trunk. -- Eric Niebler BoostPro Computing http://www.boostpro.com

John Maddock

8:01 a.m.

...

...
Ummm, well I confess I'm not using any transforms - just the expression template creation - then I unpick the expression myself. Partly that allowed me to get things going quickly, and partly it's allowed me to do all kinds of interesting things with the expression. The actual code that unpicks the expression is somewhat verbose, but not all that complex (not too much template depth... I hope).

It sounds like you're not actually using Proto for that much. Which leaves me puzzled because I can't figure where the perf problems would be coming from. Disclaimer: I haven't actually looked at the code, because I'm on vacation.

Nod, could be my template metaprogramming that's at fault.... I can't help thinking that if I rolled my own ET's I could eliminate most of those by rolling the functionality directly into the operator overloads.... of course that may end up being even worse :-( Anyhow I've filed a bug report on the ADL issue, Thanks, John.
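As a rough idea of what hand-rolled expression templates with the functionality folded into the operator overloads might look like, here is a minimal sketch. It is not the big_number code: `expr_tag`, `is_expr`, `add_expr`, `mul_expr`, `number` and `demo` are hypothetical names, and a plain `double` stands in for the extended-precision type.

```cpp
#include <cassert>
#include <type_traits>

struct expr_tag {};  // marks every type that may appear in an expression

template<class E>
struct is_expr : std::is_base_of<expr_tag, E> {};

// Node for lhs + rhs: stores references and evaluates only on demand.
template<class L, class R>
struct add_expr : expr_tag {
    const L& lhs; const R& rhs;
    add_expr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double eval() const { return lhs.eval() + rhs.eval(); }
};

// Node for lhs * rhs.
template<class L, class R>
struct mul_expr : expr_tag {
    const L& lhs; const R& rhs;
    mul_expr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double eval() const { return lhs.eval() * rhs.eval(); }
};

struct number : expr_tag {
    double v;
    explicit number(double x = 0.0) : v(x) {}
    // Converting constructor: walks the whole expression tree once.
    template<class E, class = typename std::enable_if<is_expr<E>::value>::type>
    number(const E& e) : v(e.eval()) {}
    double eval() const { return v; }
};

// The operators build nodes directly; no grammar or generator layer.
template<class L, class R>
typename std::enable_if<is_expr<L>::value && is_expr<R>::value, add_expr<L, R> >::type
operator+(const L& l, const R& r) { return add_expr<L, R>(l, r); }

template<class L, class R>
typename std::enable_if<is_expr<L>::value && is_expr<R>::value, mul_expr<L, R> >::type
operator*(const L& l, const R& r) { return mul_expr<L, R>(l, r); }

inline double demo() {
    number a(2.0), b(3.0), c(4.0);
    number r = a + b * c;  // nodes live until the end of this statement
    return r.v;
}
```

Each operator instantiates one node type and one function, which is where the hoped-for saving would come from; whether that really beats Proto for a full number type would have to be measured.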

Joel de Guzman

12 Sep

1:20 a.m.

On 9/12/2011 1:44 AM, Joel Falcou wrote:

...

- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

PP: right, this can be fixed.

lots of forced instantiation: I don't know what you mean. Can you please be more specific?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

Eric Niebler

3:42 a.m.

On 9/11/2011 9:20 PM, Joel de Guzman wrote:

...

On 9/12/2011 1:44 AM, Joel Falcou wrote:

...
- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

PP: right, this can be fixed.

lots of forced instanciation: I don't know what you mean. Can you please be more specific?

I can't speak for Joel F. here, but consider the templates instantiated simply to access the Nth element of a fusion vector. From a cursory inspection of sequence/intrinsic/at.hpp, the following call:

  at_c<N>(v);

where v is a fusion vector instantiates:

  lazy_disable_if
  is_const
  result_of::at_c
  result_of::at
  mpl::int_
  detail::tag_of
  extension::at_impl
  extension::at_impl::apply
  mpl::at
  detail::ref_result
  add_reference

I believe that it also must compute the return type of the const overload of at_c in order to do overload resolution, so that many of the above templates must be instantiated twice: once for a const vector and once for non-const, and throw in an additional add_const. (And come to think of it, Proto probably suffers from this problem too!)

That's a lot of templates for a simple element access. I didn't chase the template breadcrumbs into mpl, so there may be more.

In Proto, I roll my own poor-man's heterogeneous sequences so I can save template instantiations. I even use my own typelists over mpl::vector because it had a measurable impact. And yes, I measured.

I'll say this: Fusion's implementation is BEAUTIFUL compared to Proto's, which is an ugly, hard-to-maintain mess of PP gunk. Proto also exposes fewer customization points. These were tradeoffs I made in the interest of compile times, and it's still not enough.

A complete rewrite using decltype, rvalue refs and variadic templates would GREATLY improve things. That's probably true for both Fusion and Proto.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
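To give a feel for why variadic templates could help, here is a minimal sketch of a heterogeneous sequence whose element access needs only one function template instantiation per index. It is neither Fusion's nor Proto's implementation; `leaf`, `seq_impl`, `seq` and this `at_c` are hypothetical names, and it assumes a C++14 standard library for `std::index_sequence`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// One leaf per element; the index keeps the base classes distinct even
// when two elements have the same type.
template<std::size_t I, class T>
struct leaf { T value; };

template<class Indices, class... Ts>
struct seq_impl;

template<std::size_t... Is, class... Ts>
struct seq_impl<std::index_sequence<Is...>, Ts...> : leaf<Is, Ts>... {
    seq_impl(Ts... vs) : leaf<Is, Ts>{vs}... {}
};

template<class... Ts>
using seq = seq_impl<std::index_sequence_for<Ts...>, Ts...>;

// Element access: T is deduced from the unique leaf<I, T> base class, so
// no separate result-type metafunction is involved.
template<std::size_t I, class T>
T& at_c(leaf<I, T>& l) { return l.value; }

template<std::size_t I, class T>
const T& at_c(const leaf<I, T>& l) { return l.value; }
```

With `seq<int, double, char> s(1, 2.5, 'x');`, the call `at_c<1>(s)` yields a reference to the `double`. The return type falls out of template argument deduction, which is the sort of saving the paragraph above anticipates; no measurements are claimed here.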

Joel Falcou

5:16 a.m.

Le 12/09/2011 05:42, Eric Niebler a écrit :

...

On 9/11/2011 9:20 PM, Joel de Guzman wrote:

...
On 9/12/2011 1:44 AM, Joel Falcou wrote:

...
- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

PP: right, this can be fixed.

lots of forced instanciation: I don't know what you mean. Can you please be more specific?

I can't speak for Joel F. here, but consider the templates instantiated simply to access the Nth element of a fusion vector. From a cursory inspection of sequence/intrinsic/at.hpp, the following call:

at_c<N>(v);

where v is a fusion vector instantiates:

lazy_disable_if is_const result_of::at_c result_of::at mpl::int_ detail::tag_of extension::at_impl extension::at_impl::apply mpl::at detail::ref_result add_reference

I believe that it also must compute the return type of the const overload of at_c in order to do overload resolution, so that many of the above templates must be instantiated twice: once for a const vector and once for non-const, and throw in an additional add_const. (And come to think of it, Proto probably suffers from this problem too!)

That's a lot of templates for a simple element access. I didn't chase the template breadcrumbs into mpl, so there may be more.

^ This

and the fact that the _impl structs are all made like:

  template<class Tag> struct at_impl;
  template<> struct at_impl<some_tag>

instead of a more CT-friendly

  template<class Tag, class Dummy=void> struct at_impl;
  template<class Dummy> struct at_impl<some_tag,Dummy>

I think Heller started playing with that and got some measurable CT performance increase.

The C++11 rewrite is obviously a long-term project. MPL has to go this way too (my secret dream is to merge Fusion and MPL and have MPL be decltype over Fusion calls) and I think at some point we should start thinking of doing it. Fusion already has a 0x implementation in the SOC sandbox folder; I think it can be pushed a bit further, but that will require us to have access to a couple of strong C++11-enabled compilers.

The CT performance of our infrastructure trifecta (Fusion/MPL/Proto) should become target #1 at some point.
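Spelled out as compilable code, the two styles look like this. It is only a sketch: `some_tag`, `at_impl_full`, `at_impl_lazy` and the `call` member are placeholder names, not Fusion's real extension points.

```cpp
#include <cassert>

struct some_tag {};

// Style 1: explicit (full) specialization. This is an ordinary class, so
// the compiler processes its definition wherever the header is included,
// whether or not anything uses it.
template<class Tag> struct at_impl_full;

template<> struct at_impl_full<some_tag> {
    static int call() { return 42; }
};

// Style 2: partial specialization through a defaulted dummy parameter.
// It is still a template, so it is only instantiated when first used.
template<class Tag, class Dummy = void> struct at_impl_lazy;

template<class Dummy> struct at_impl_lazy<some_tag, Dummy> {
    static int call() { return 42; }
};
```

Both `at_impl_full<some_tag>::call()` and `at_impl_lazy<some_tag>::call()` behave identically at run time; the second form just defers the compiler's work until a translation unit actually needs it, which is the compile-time gain suggested above.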

Joel de Guzman

6:57 a.m.

On 9/12/2011 1:16 PM, Joel Falcou wrote:

...

Le 12/09/2011 05:42, Eric Niebler a écrit :

...
On 9/11/2011 9:20 PM, Joel de Guzman wrote:

...
On 9/12/2011 1:44 AM, Joel Falcou wrote:

...
- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

PP: right, this can be fixed.

lots of forced instanciation: I don't know what you mean. Can you please be more specific?

I can't speak for Joel F. here, but consider the templates instantiated simply to access the Nth element of a fusion vector. From a cursory inspection of sequence/intrinsic/at.hpp, the following call:

at_c<N>(v);

where v is a fusion vector instantiates:

lazy_disable_if is_const result_of::at_c result_of::at mpl::int_ detail::tag_of extension::at_impl extension::at_impl::apply mpl::at detail::ref_result add_reference

I believe that it also must compute the return type of the const overload of at_c in order to do overload resolution, so that many of the above templates must be instantiated twice: once for a const vector and once for non-const, and throw in an additional add_const. (And come to think of it, Proto probably suffers from this problem too!)

That's a lot of templates for a simple element access. I didn't chase the template breadcrumbs into mpl, so there may be more.

^ This

and the fact that the _impl struct are all made like :

template<class Tag> struct at_impl;

template<> struct at_impl<some_tag>

instead of a more CT friendly

template<class Tag, class Dummy=void> struct at_impl;

template<class Dummy> struct at_impl<some_tag,Dummy>

I think Heller started playing with that and got some measurable CT performance increase.

The C++11 rewrite is obviously a long-term project. MPL has to go this way too (my secret dream is to merge Fusion and MPL and have MPL be decltype over Fusion calls) and I think at some point we should start thinking of doing it. Fusion already has a 0x implementation in the SOC sandbox folder; I think it can be pushed a bit further, but that will require us to have access to a couple of strong C++11-enabled compilers.

The CT performances of our infrastructure trifecta (Fusion/MPL/Proto) should become target #1 at some point.

The one you show above is also a very simple tweak. I welcome any CT improvements we can do as long as the code is kept in a reasonably comprehensible state. I applaud what you and Heller are doing.

Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto. First, as Eric says, Proto avoids Fusion, and second, there's a clear indication that a library without Proto is still faster, regardless of the intense CT perf tweaks done thus far.

For example, here is the current CT status of Phoenix2 vs Phoenix3, comparing the elapsed (CT) time for the phoenix2 vs. phoenix3 lambda_tests.cpp (**):

  MSVC 10:  Phoenix2: 00:04.5   Phoenix3: 00:29.9
  G++ 4.5:  Phoenix2: 00:02.6   Phoenix3: 00:04.7

You all know that Phoenix2 uses Fusion exclusively. Phoenix3 uses proto, which according to Eric does not use Fusion, although IIRC the core of Phoenix3 uses some Fusion still (quick check: Thomas uses an optimized-PP version of fusion::vector for phoenix3).

Heller did a helluva lot of perf-tweaks for Phx3 to get that number for g++ (alas, not MSVC). In fairness, I did absolutely no CT perf-tweaks for either Phoenix2 or Fusion.

(** I made sure both tests have exactly the same code, so I removed the last test. I can post the exact code if need be)

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

Thomas Heller

8:06 a.m.

On Monday, September 12, 2011 02:57:21 PM Joel de Guzman wrote:

...

On 9/12/2011 1:16 PM, Joel Falcou wrote:

...
Le 12/09/2011 05:42, Eric Niebler a écrit :

...
On 9/11/2011 9:20 PM, Joel de Guzman wrote:

...
On 9/12/2011 1:44 AM, Joel Falcou wrote:

...
- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.

PP: right, this can be fixed.

lots of forced instanciation: I don't know what you mean. Can you please be more specific?

I can't speak for Joel F. here, but consider the templates instantiated simply to access the Nth element of a fusion vector. From a cursory inspection of sequence/intrinsic/at.hpp, the following call:

at_c<N>(v);

where v is a fusion vector instantiates:

lazy_disable_if is_const result_of::at_c result_of::at mpl::int_ detail::tag_of extension::at_impl extension::at_impl::apply mpl::at detail::ref_result add_reference

I believe that it also must compute the return type of the const overload of at_c in order to do overload resolution, so that many of the above templates must be instantiated twice: once for a const vector and once for non-const, and throw in an additional add_const. (And come to think of it, Proto probably suffers from this problem too!)

That's a lot of templates for a simple element access. I didn't chase the template breadcrumbs into mpl, so there may be more.

^ This

and the fact that the _impl struct are all made like :

template<class Tag> struct at_impl;

template<> struct at_impl<some_tag>

instead of a more CT friendly

template<class Tag, class Dummy=void> struct at_impl;

template<class Dummy> struct at_impl<some_tag,Dummy>

I think Heller started playing with that and got some measurable CT performance increase.

The C++11 rewrite is obviously a long-term project. MPL has to go this way too (my secret dream is to merge Fusion and MPL and have MPL be decltype over Fusion calls) and I think at some point we should start thinking of doing it. Fusion already has a 0x implementation in the SOC sandbox folder; I think it can be pushed a bit further, but that will require us to have access to a couple of strong C++11-enabled compilers.

The CT performances of our infrastructure trifecta (Fusion/MPL/Proto) should become target #1 at some point.

The one you show above is also a very simple tweak. I welcome any CT improvements we can do as long as the code is kept in a reasonably comprehensible state. I applaud what you and Heller are doing.

I am trying to come up with a patch today. The changes Joel Falcou suggests are really easy to do. And already promise to show significant CT improvements.

...

Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto. First, as Eric says, Proto avoids Fusion and Second, there's a clear indication that a library without Proto is still faster, regardless of the intense CT perf tweaks done thus far.

For example, here is the current CT status of Phoenix2 vs Phoenix3 comparing the elapsed (CT) time for the phoenix2 vs. phoenix3 lambda_tests.cpp (**):

MSVC 10: Phoenix2: 00:04.5 Phoenix3: 00:29.9

G++ 4.5: Phoenix2: 00:02.6 Phoenix3: 00:04.7

I wasn't aware that Phoenix3 was so bad under MSVC 10.

...

You all know that Phoenix2 uses Fusion exclusively. Phoenix3 uses Proto, which according to Eric does not use Fusion, although IIRC the core of Phoenix3 uses some Fusion still (quick check: Thomas uses an optimized-PP version of fusion::vector for Phoenix3).

Heller did a helluva lot of perf tweaks for Phx3 to get that number for g++ (alas, not MSVC). In fairness, I did absolutely no CT perf tweaks for either Phoenix2 or Fusion.

(** I made sure both tests have exactly the same code, so I removed the last test. I can post the exact code if need be)

FWIW, there are some unit tests that outperform the compile times of Phoenix2 (with gcc); the current bad hit on compile times seems to occur only with let, lambda and switch/case expressions.

...

Regards.

Joel de Guzman

10:50 a.m.

On 9/12/2011 4:06 PM, Thomas Heller wrote:

...

On Monday, September 12, 2011 02:57:21 PM Joel de Guzman wrote:

...
The one you show above is also a very simple tweak. I welcome any CT improvements we can do as long as the code is kept in a reasonably comprehensible state. I applaud what you and Heller are doing.

I am trying to come up with a patch today. The changes Joel Falcou suggests are really easy to do and already promise to show significant CT improvements.

Awesome! Looking forward to it.

...

...
Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto. First, as Eric says, Proto avoids Fusion; second, there's a clear indication that a library without Proto is still faster, regardless of the intense CT perf tweaks done thus far.

For example, here is the current CT status of Phoenix2 vs Phoenix3 comparing the elapsed (CT) time for the phoenix2 vs. phoenix3 lambda_tests.cpp (**):

MSVC 10: Phoenix2: 00:04.5 Phoenix3: 00:29.9

G++ 4.5: Phoenix2: 00:02.6 Phoenix3: 00:04.7

I wasn't aware that Phoenix3 was so bad under MSVC 10.

Yes. A solution for compiler X does not necessarily work for compiler Y.

...

...
You all know that Phoenix2 uses Fusion exclusively. Phoenix3 uses Proto, which according to Eric does not use Fusion, although IIRC the core of Phoenix3 uses some Fusion still (quick check: Thomas uses an optimized-PP version of fusion::vector for Phoenix3).

Heller did a helluva lot of perf tweaks for Phx3 to get that number for g++ (alas, not MSVC). In fairness, I did absolutely no CT perf tweaks for either Phoenix2 or Fusion.

(** I made sure both tests have exactly the same code, so I removed the last test. I can post the exact code if need be)

FWIW, there are some unit tests that outperform the compile times of Phoenix2 (with gcc); the current bad hit on compile times seems to occur only with let, lambda and switch/case expressions.

Fair enough, but:

1) You are comparing (CT) optimized code (Phx3) with unoptimized code (Phx2). Apply the same optimizations to Phx2 and it will again leave Phx3 in the dust.

2) It is in these more complex expressions that Proto is purportedly supposed to shine. Otherwise, an easy hand-written ET such as that of Phx2, with less CT overhead, would suffice.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

Thomas Heller

1:36 p.m.

On Monday, September 12, 2011 06:50:08 PM Joel de Guzman wrote:

...

On 9/12/2011 4:06 PM, Thomas Heller wrote:

...
On Monday, September 12, 2011 02:57:21 PM Joel de Guzman wrote:

...
The one you show above is also a very simple tweak. I welcome any CT improvements we can do as long as the code is kept in a reasonably comprehensible state. I applaud what you and Heller are doing.

I am trying to come up with a patch today. The changes Joel Falcou suggests are really easy to do and already promise to show significant CT improvements.

Awesome! Looking forward to it.

Turns out it didn't work out that well ... The patch can be downloaded here: https://gist.github.com/1211241

The Fusion test cases compile; unfortunately they are around 10% slower to compile with that patch applied ...

What I have done is simply to avoid full specialisations of the form: template <> struct some_impl<some_tag> ...

The idea was to avoid eager instantiations when this particular feature wasn't used. Looks like we have a false friend here. Maybe I missed something. I would be glad if someone could point out what I am missing.

Eric Niebler

6:39 p.m.

On 9/12/2011 2:57 AM, Joel de Guzman wrote:

...

Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto.

I never said that. I'm not sure Joel F. said that either, just that IF John is using Fusion, it can add to the TMP load. But that's jumping the gun. Let's not point fingers until we measure and figure out where the bloat is coming from.

...

First, as Eric says, Proto avoids Fusion; second, there's a clear indication that a library without Proto is still faster, regardless of the intense CT perf tweaks done thus far.

For example, here is the current CT status of Phoenix2 vs Phoenix3 comparing the elapsed (CT) time for the phoenix2 vs. phoenix3 lambda_tests.cpp (**):

MSVC 10: Phoenix2: 00:04.5 Phoenix3: 00:29.9

Ouch! Thomas, what can we do here?

...

G++ 4.5: Phoenix2: 00:02.6 Phoenix3: 00:04.7

That's more reasonable, esp. considering all that Phx3 does that Phx2 doesn't.

But this is drifting off-topic. Let's focus on John's problem, not the compile-time performance of Phoenix3.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Joel de Guzman

13 Sep

2:08 a.m.

On 9/13/2011 2:39 AM, Eric Niebler wrote:

...

On 9/12/2011 2:57 AM, Joel de Guzman wrote:

...
Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto.

I never said that. I'm not sure Joel F. said that either, just that IF John is using Fusion, it can add to the TMP load. But that's jumping the gun. Let's not point fingers until we measure and figure out where the bloat is coming from.

[...]

...

But this is drifting off-topic. Let's focus on John's problem, not the compile-time performance of Phoenix3.

Sorry about that. I was just reacting to Joel F's note saying "- fusion is also a big hitter" in a thread titled "[proto] Looong compile times and other issues". I was just defending Fusion with clear proof that the overhead is not due to Fusion and that proof happens to come from Phoenix2 which uses Fusion exclusively.

I agree. Let's not point fingers. I would've expected Joel F to substantiate any claims like that with real numbers, lest it starts to sound more like spreading FUD.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

Joel Falcou

9:22 a.m.

On 13/09/2011 04:08, Joel de Guzman wrote:

...

Sorry about that. I was just reacting to Joel F's note saying "- fusion is also a big hitter" in a thread titled "[proto] Looong compile times and other issues". I was just defending Fusion with clear proof that the overhead is not due to Fusion and that proof happens to come from Phoenix2 which uses Fusion exclusively.

I agree. Let's not point fingers. I would've expected Joel F to substantiate any claims like that with real numbers, lest it starts to sound more like spreading FUD.

Regards,

Shortcuts are to blame. As Eric said, I was pointing out that, in some cases, the cocktail of Fusion inside Proto transforms leads to surprising compile times and that, before seeing John's code, it *may* be the case here. I use Fusion et al. for practically everything, and I am starting to know when and how things can go wrong; I would not take it upon myself to spread FUD on this subject.

Joel de Guzman

11:44 a.m.

On 9/13/2011 5:22 PM, Joel Falcou wrote:

...

On 13/09/2011 04:08, Joel de Guzman wrote:

...
Sorry about that. I was just reacting to Joel F's note saying "- fusion is also a big hitter" in a thread titled "[proto] Looong compile times and other issues". I was just defending Fusion with clear proof that the overhead is not due to Fusion and that proof happens to come from Phoenix2 which uses Fusion exclusively.

I agree. Let's not point fingers. I would've expected Joel F to substantiate any claims like that with real numbers, lest it starts to sound more like spreading FUD.

Regards,

Shortcuts are to blame. As Eric said, I was pointing out that, in some cases, the cocktail of Fusion inside Proto transforms leads to surprising compile times and that, before seeing John's code, it *may* be the case here. I use Fusion et al. for practically everything, and I am starting to know when and how things can go wrong; I would not take it upon myself to spread FUD on this subject.

Ok, fair enough. I'm sorry if I overreacted. I did not intend to say that you are intentionally spreading FUD. What I meant to say is that sweeping statements like that are easily misconstrued and would be better substantiated with sufficient explanation, code and real benchmarks.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

greened＠obbligato.org

14 Sep

4:32 p.m.

Eric Niebler <eric@boostpro.com> writes:

...

A complete rewrite using decltype, rvalue refs and variadic templates would GREATLY improve things. That's probably true for both Fusion and Proto.

Speaking as a Proto user working with a rather large grammar, this would be most welcome. :)

-Dave

Mathias Gaunard

12 Sep

9:45 a.m.

On 09/11/2011 07:18 PM, John Maddock wrote:

...

* GCC-4.4.x fails to compile the code due to clashes between boost::math::complement (a function) and boost::proto::complement (a class). I suspect this is an old gcc bug (finding structures via ADL) -

Could you demonstrate the problem?

...

I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup?

The solution to ADL problems is usually to set up your namespaces correctly or to qualify.
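As a rough illustration of the "qualify" option, the sketch below uses two hypothetical namespaces, `proto_like` and `math_like`, that only mimic the shape of the clash. It does not reproduce the GCC 4.4 behaviour John describes (a class being found through ADL); it shows the ordinary, conforming case where deriving from a library base class pulls that library's functions into overload resolution, and two ways of switching ADL off at the call site.

```cpp
namespace proto_like
{
    // Hypothetical stand-in for a library base class.
    struct expr_base {};

    // Visible through ADL for every type derived from expr_base.
    template<class T>
    constexpr int complement(T const&) { return 1; }
}

namespace math_like
{
    // Deriving from expr_base makes proto_like an associated namespace
    // of number, so ADL also searches proto_like for function names.
    struct number : proto_like::expr_base {};

    template<class T>
    constexpr int complement(T const&) { return 2; }

    // An unqualified complement(x) here would see both templates and
    // be ambiguous. Qualifying the name switches ADL off.
    constexpr int call_qualified(number const& x)
    {
        return math_like::complement(x);
    }

    // Parenthesizing the function name also suppresses ADL, leaving
    // only ordinary lookup, which stops at math_like::complement.
    constexpr int call_parenthesized(number const& x)
    {
        return (complement)(x);
    }
}
```

The other option mentioned, setting up namespaces so that the number type has no proto-like base or template argument, avoids the extra associated namespace altogether.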

...

* the code above gets instantiated from deep within Boost.Math's internals.

Why do you need to use Proto in the internals?

John Maddock

11:10 a.m.

...

...
I guess the solution is to not derive my number type from a proto-type so ADL can't find proto:: classes? Or will I hit this from some other unforeseen lookup?

The solution to ADL problems is usually to set up your namespaces correctly or to qualify.

...
* the code above gets instantiated from deep within Boost.Math's internals.

Why do you need to use Proto in the internals?

It's not: the issue is that a proto-ized type is used as an argument to template functions, so you get quite a deep instantiation tree even before the proto-expressions start to get instantiated. That said I don't have any issues instantiating those functions with mpfr_class (http://math.berkeley.edu/~wilken/code/gmpfrxx/) - which uses its own (admittedly fairly simple) expression-template code.

HTH, John.

Mathias Gaunard

11:36 a.m.

On 12/09/2011 13:10, John Maddock wrote:

...

It's not: the issue is that a proto-ized type is used as an argument to template functions, so you get quite a deep instantiation tree even before the proto-expressions start to get instantiated. That said I don't have any issues instantiating those functions with mpfr_class (http://math.berkeley.edu/~wilken/code/gmpfrxx/) - which uses its own (admittedly fairly simple) expression-template code.

You could directly use your own backend with the associated evaluation functions that don't require any expression templates. I don't really see what expression templates bring to your big_number classes.

The only use I see for that kind of thing is recognizing certain patterns of operators to call a better primitive. But in the internals you know what the patterns are, and you can optimize them by hand.

Otherwise, I gave a cursory look at your code, and it doesn't seem to be making good use of Proto, especially in the file big_number_base.hpp. The whole expression_type thing seems unnecessary and the assign_and_eval_imp look like badly-written (and slow) grammars.
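To make "recognizing certain patterns of operators to call a better primitive" concrete, here is a small hand-written sketch that uses neither Proto nor John's big_number code. Every name in it (`et_sketch`, `number`, `product`, `multiply_add`) is hypothetical, and `multiply_add` merely stands in for whatever fused primitive a real backend might offer. The `fused` flag exists only so the dispatch can be observed.

```cpp
namespace et_sketch
{
    struct number
    {
        double value;
        bool fused;   // true only if the fused primitive produced the value
    };

    // Node built by a * b. Nothing is computed until the node is either
    // converted to number or matched by a more specific operator.
    struct product
    {
        double lhs, rhs;
        constexpr operator number() const { return number{lhs * rhs, false}; }
    };

    constexpr product operator*(number a, number b)
    {
        return product{a.value, b.value};
    }

    // Stand-in for a backend primitive that does a * b + c in one step.
    constexpr number multiply_add(double a, double b, double c)
    {
        return number{a * b + c, true};
    }

    // Pattern a * b + c: the product node is consumed by the primitive.
    constexpr number operator+(product p, number c)
    {
        return multiply_add(p.lhs, p.rhs, c.value);
    }

    // Plain addition, used when no pattern applies.
    constexpr number operator+(number a, number b)
    {
        return number{a.value + b.value, false};
    }
}
```

With these overloads, `a * b + c` resolves to the `product` overload of `operator+` and reaches `multiply_add`, while `a + b` takes the plain path. Code that already knows it wants the fused operation can call the primitive directly, which is the hand optimization suggested above.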
