
On Monday, September 12, 2011 02:57:21 PM Joel de Guzman wrote:
On 9/12/2011 1:16 PM, Joel Falcou wrote:
On 12/09/2011 05:42, Eric Niebler wrote:
On 9/11/2011 9:20 PM, Joel de Guzman wrote:
On 9/12/2011 1:44 AM, Joel Falcou wrote:
- fusion is also a big hitter: lots of PP and lots of forced instantiation instead of lazy specialization.
PP: right, this can be fixed.
lots of forced instantiation: I don't know what you mean. Can you please be more specific?
I can't speak for Joel F. here, but consider the templates instantiated simply to access the Nth element of a fusion vector. From a cursory inspection of sequence/intrinsic/at.hpp, the following call:
at_c<N>(v);
where v is a fusion vector, instantiates:
lazy_disable_if
is_const
result_of::at_c
result_of::at
mpl::int_
detail::tag_of
extension::at_impl
extension::at_impl::apply
mpl::at
detail::ref_result
add_reference
I believe that it also must compute the return type of the const overload of at_c in order to do overload resolution, so that many of the above templates must be instantiated twice: once for a const vector and once for non-const, and throw in an additional add_const. (And come to think of it, Proto probably suffers from this problem too!)
That's a lot of templates for a simple element access. I didn't chase the template breadcrumbs into mpl, so there may be more.
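To make the const/non-const point concrete, here is a minimal sketch of the overload pair. It is not Fusion's code: vec2, result_of_at_c, getter and at_c_demo are hypothetical stand-ins, and std::enable_if plays the role of lazy_disable_if.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical two-element sequence standing in for fusion::vector.
template <class T0, class T1>
struct vec2 {
    T0 m0;
    T1 m1;
};

// Stand-in for result_of::at_c. "vec2<...>" and "vec2<...> const" are
// distinct template arguments, so each one is a separate instantiation.
template <class Seq, std::size_t N>
struct result_of_at_c;

template <class T0, class T1>
struct result_of_at_c<vec2<T0, T1>, 0> { typedef T0& type; };
template <class T0, class T1>
struct result_of_at_c<vec2<T0, T1>, 1> { typedef T1& type; };
template <class T0, class T1>
struct result_of_at_c<vec2<T0, T1> const, 0> { typedef T0 const& type; };
template <class T0, class T1>
struct result_of_at_c<vec2<T0, T1> const, 1> { typedef T1 const& type; };

// Element access helper; R is the already computed return type.
template <std::size_t N> struct getter;
template <> struct getter<0> {
    template <class R, class Seq> static R call(Seq& s) { return s.m0; }
};
template <> struct getter<1> {
    template <class R, class Seq> static R call(Seq& s) { return s.m1; }
};

// Non-const overload, removed from the overload set when Seq is const.
template <std::size_t N, class Seq>
typename std::enable_if<
    !std::is_const<Seq>::value, result_of_at_c<Seq, N>
>::type::type
at_c(Seq& s) {
    return getter<N>::template call<typename result_of_at_c<Seq, N>::type>(s);
}

// Const overload. Its return type is substituted for every call,
// including calls that end up selecting the non-const overload.
template <std::size_t N, class Seq>
typename result_of_at_c<Seq const, N>::type
at_c(Seq const& s) {
    return getter<N>::template call<
        typename result_of_at_c<Seq const, N>::type>(s);
}

// Small usage check for the sketch above.
inline void at_c_demo() {
    vec2<int, double> v = {1, 2.5};
    at_c<0>(v) = 7;                      // picks the non-const overload
    assert(v.m0 == 7);
    vec2<int, double> const& cv = v;
    assert(at_c<1>(cv) == 2.5);          // picks the const overload
}
```

For the call on the non-const v, the compiler has to form both signatures before it can choose one, so in this sketch result_of_at_c is instantiated for vec2<int, double> and for vec2<int, double> const even though only the first is used.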
^ This
and the fact that the _impl structs are all written like:
template<class Tag> struct at_impl;
template<> struct at_impl<some_tag>
instead of a more CT friendly
template<class Tag, class Dummy=void> struct at_impl;
template<class Dummy> struct at_impl<some_tag,Dummy>
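Fleshed out, the second form could look like the sketch below. Only at_impl and some_tag come from the snippet above; pair_seq, the body of apply, and at_c_demo are assumptions made up for illustration, not Fusion's implementation. The idea being claimed is that a partial specialization is still a template, so it is only instantiated when something names at_impl<some_tag>, whereas a full specialization is an ordinary class that is processed wherever its header is included.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical tag and sequence standing in for Fusion's own.
struct some_tag {};

template <class T0, class T1>
struct pair_seq {
    typedef some_tag tag;
    typedef T0 first_type;
    typedef T1 second_type;
    T0 first;
    T1 second;
};

// Primary template. Dummy exists only so that per-tag specializations
// can be partial instead of full.
template <class Tag, class Dummy = void>
struct at_impl;

// Partial specialization for some_tag: nothing in here is instantiated
// until at_impl<some_tag> is actually used.
template <class Dummy>
struct at_impl<some_tag, Dummy> {
    template <class Seq, std::size_t N>
    struct apply {
        typedef typename std::conditional<
            N == 0, typename Seq::first_type, typename Seq::second_type
        >::type& type;

        static type call(Seq& s) {
            return get(s, std::integral_constant<std::size_t, N>());
        }

    private:
        static typename Seq::first_type&
        get(Seq& s, std::integral_constant<std::size_t, 0>) { return s.first; }
        static typename Seq::second_type&
        get(Seq& s, std::integral_constant<std::size_t, 1>) { return s.second; }
    };
};

// Callers never mention Dummy; the default argument supplies it.
template <std::size_t N, class Seq>
typename at_impl<typename Seq::tag>::template apply<Seq, N>::type
at_c(Seq& s) {
    return at_impl<typename Seq::tag>::template apply<Seq, N>::call(s);
}

// Small usage check for the sketch above.
inline void at_c_demo() {
    pair_seq<int, char> p = {1, 'a'};
    at_c<1>(p) = 'z';
    assert(p.second == 'z');
    assert(at_c<0>(p) == 1);
}
```

Whether this buys measurable compile time depends on the compiler and on how many specializations a translation unit pulls in without using them, so it would need to be measured rather than assumed.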
I think Heller started playing with that and got a measurable CT performance increase.
The C++11 rewrite is obviously a long-term project. MPL has to go this way too (my secret dream is to merge Fusion and MPL and have MPL be decltype over Fusion calls), and I think at some point we should start thinking about doing it. Fusion already has a 0x implementation in the SOC sandbox folder. I think it can be pushed a bit further, but that will require us to have access to a couple of strong C++11-enabled compilers.
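As a rough illustration of what a type-level query written as decltype over a value-level call could look like, here is a tiny sketch. std::tuple and std::get merely stand in for a C++11 Fusion sequence and at_c, and at_c_t is a hypothetical name, not an existing MPL or Fusion facility.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Type-level "element N of Seq", obtained by asking the compiler for the
// type of the corresponding value-level call. No separate metafunction
// machinery is written for the type computation.
template <class Seq, std::size_t N>
using at_c_t = typename std::remove_reference<
    decltype(std::get<N>(std::declval<Seq&>()))
>::type;

static_assert(
    std::is_same<at_c_t<std::tuple<int, char, double>, 2>, double>::value,
    "element 2 of tuple<int, char, double> is double");
```

The appeal is that the sequence only has to implement the runtime accessor once, and the type-level answer follows from it.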
The CT performance of our infrastructure trifecta (Fusion/MPL/Proto) should become target #1 at some point.
The one you show above is also a very simple tweak. I welcome any CT improvements we can do as long as the code is kept in a reasonably comprehensible state. I applaud what you and Heller are doing.
I am trying to come up with a patch today. The changes Joel Falcou suggests are really easy to make and already promise significant CT improvements.
Let me make it clear though that it is an unfair characterization to say that Fusion is the cause of CT slowdown for Proto. First, as Eric says, Proto avoids Fusion, and second, there's a clear indication that a library without Proto is still faster, regardless of the intense CT perf tweaks done thus far.
For example, here is the current elapsed compile time for lambda_tests.cpp under Phoenix2 and Phoenix3 (**):
MSVC 10: Phoenix2: 00:04.5 Phoenix3: 00:29.9
G++ 4.5: Phoenix2: 00:02.6 Phoenix3: 00:04.7
I wasn't aware that Phoenix3 was so bad under MSVC 10.
You all know that Phoenix2 uses Fusion exclusively. Phoenix3 uses Proto, which according to Eric does not use Fusion, although IIRC the core of Phoenix3 still uses some Fusion (quick check: Thomas uses an optimized-PP version of fusion::vector for Phoenix3).
Heller did a helluva lot of perf tweaks for Phx3 to get that number for g++ (alas, not MSVC). In fairness, I did absolutely no CT perf tweaks for either Phoenix2 or Fusion.
(** I made sure both tests have exactly the same code, so I removed the last test. I can post the exact code if need be)
FWIW, there are some unit tests that outperform the compile times of Phoenix2 (with gcc); the current bad hit on compile times seems to occur only with let, lambda and switch/case expressions.
Regards.