
On 09/26/09 10:40, Christopher Schmidt wrote:
Larry Evans wrote:
There's no type_at in:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
There is, however, a tuple<,>::at_type<I> nested struct. Is that what you're referring to? If so, I don't understand why that would be expensive, since neither at_type nor what it calls, at_elem, uses recursion. Could you explain why at_type would be expensive? Or maybe you've run some timing tests on it; if so, could you provide the test driver so I can see why it's so expensive?
[snip]
TIA.
-Larry
I am sorry to have kept you waiting so long. I have been busy with moving to a new flat.
No problem at all (I actually thought the response was pretty fast).
Consider an instantiation of a tuple with 7 elements, the types foo, bar, int, char, blub, dub and mup.
My fusion vector will directly evolve to
vector<foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
      : detail::vector_impl<4,blub,dub,mup>
As you elaborated, your tuple will evolve to
tuple<char, foo,bar,int,char,blub,dub,mup>
  : _Tuple_impl< mpl::package_c<char,0,1,2,3,4,5,6>,
                 foo,bar,int,char,blub,dub,mup >
      : _Tuple_element<mpl::integral_c<char, 0>, foo>
      , _Tuple_element<mpl::integral_c<char, 1>, bar>
      , _Tuple_element<mpl::integral_c<char, 2>, int>
      , _Tuple_element<mpl::integral_c<char, 3>, char>
      , _Tuple_element<mpl::integral_c<char, 4>, blub>
      , _Tuple_element<mpl::integral_c<char, 5>, dub>
      , _Tuple_element<mpl::integral_c<char, 6>, mup>
Summed up, there are N/4+1 template instantiations for my vector and N+2+x for your tuple, with x being the number of instantiations for mpl::package_range_forward. If mpl::package_range_forward<...> is not cached, x will be N+C.
Also you need to consider the overhead of template expressions that are not instantiated but still are evaluated.
Good point. I wasn't aware of that.
I think your at_type implementation is expensive as a complete template function invocation expression is evaluated. This involves a lot of steps in the lookup process that are not necessary in a meta-only implementation.
Good point. I had not considered that.
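To make sure I follow, here is how I picture the difference, boiled down (the names are made up for the sketch, the tuple is hard-wired to three elements, and this is not the actual sandbox or fusion code):

#include <utility>   // std::declval

template<int I, class T> struct element { T value; };   // tagged leaf holding one member
template<class... T> struct tup;

// minimal "horizontal" layout, one tagged base per element
template<class T0, class T1, class T2>
struct tup<T0, T1, T2> : element<0, T0>, element<1, T1>, element<2, T2> {};

// declaration only, never called at run time; giving I explicitly makes
// overload resolution pick the one base whose tag matches
template<int I, class T>
T pick(element<I, T> const&);

// (a) type lookup through a call expression, roughly the shape of my at_type:
//     the compiler builds the call, runs overload resolution and deduction,
//     and only then takes decltype -- once per lookup
template<int I, class Tuple>
struct at_type_via_call
{
    typedef decltype(pick<I>(std::declval<Tuple const&>())) type;
};

// (b) a meta-only lookup: nothing but class template instantiations
template<int I, class Tuple> struct meta_at;

template<class T0, class... T>
struct meta_at<0, tup<T0, T...>> { typedef T0 type; };

template<int I, class T0, class... T>
struct meta_at<I, tup<T0, T...>> : meta_at<I - 1, tup<T...>> {};

So at_type_via_call<1, tup<foo, bar, int>>::type and meta_at<1, tup<foo, bar, int>>::type both name bar, but (a) pays for evaluating a call expression on every lookup while (b) never forms one. That would also fit your timings below, where -DMETA_AT adds less than a second in the VERTICAL case but more than doubles the HORIZONTAL time.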
Also, the lookup process should scale commensurately with N.
Is that not true of my tuple? AFAICT, it's not dependent on N. OOPS, wait: because the number of template instantiations depends on N, and the lookup time depends on the number of template instantiations, the lookup time does depend on N; but AFAICT it is still commensurate (if by that you mean linear time complexity). Is that what you mean by commensurate?
I cannot see how this should be faster than a controlled meta-recursion with a break condition.
What's "controlled meta-recursion" and what's "a break condition". Is "a break condition" just the template recursion termination condition? Is "controlled meta-recursion" the recursion every 4 (or whatever) elements while for those 4 elements boost_pp is used to generate the member variables and access one of those 4 elements?
On top of that, your implementation is limited to compilers with decltype support. I tried to avoid implementations in which certain c++0x features depend on others. A user whose compiler supports variadic templates but not decltype should still be able to enjoy the variadic template candy. This is not much of a problem in our specific case, but in other cases an independent implementation is crucial.
I hadn't thought that was important because I'd assumed that by the time variadic templates were adopted, decltype would surely be adopted.
Either way, there are tons of pros and cons for both of our implementations. I really do like your implementation. It looks and feels native. Maybe it is faster than mine. Maybe it is not. Who knows? Compilers are too different to let one state performance predictions a priori. I judge the compile-time performance of an expression by the number of steps (and their orders of growth) involved in evaluating it, because I think this comes closest to actual real-life performance in most cases. This led to my conclusion that an unrolled recursive implementation might be faster.
I attached a test driver to this post. The test case instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then, optionally, a type lookup (meta-at) and/or an object lookup (at) is done on each element.
Very nice! I love the way you've filtered out anything that's "extraneous" both in the "vertical" (i.e. the boost_pp method) and the "horizontal" (the tagged multiple inheritance method). This makes it easier to see what's going on. However, I did have a hard time understanding how the test cases were generated. I added several comments which I hope make it clearer. I could upload this extra commented version to the boost vault under the variadic_template directory if you're interested.
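For anyone reading along without the attachment, my understanding is that each generated case boils down to something like the following, repeated for every cell of the 25x25 grid with 0 to 10 elements (this is only my paraphrase with stand-in names, not the attached code):

// paraphrase only -- not the attached driver
template<class... T> struct tuple_under_test {};   // stand-in for whichever tuple
                                                   // (vertical or horizontal) is
                                                   // being measured
template<int X, int Y, int E> struct elem {};      // a distinct type per grid cell
                                                   // and slot, so no two cases
                                                   // share an instantiation

void case_3_7()   // one grid cell, here with 3 elements
{
    tuple_under_test<elem<3, 7, 0>, elem<3, 7, 1>, elem<3, 7, 2>> t;
    // with -DMETA_AT: additionally do a type lookup on each of the 3 elements
    // with -DAT:      additionally do an object lookup (at) on each of the 3 elements
    (void)t;
}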
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower.
gcc 4.5 now uses hashing to speed up template lookup: http://gcc.gnu.org/viewcvs?view=revision&revision=149188 So that may lessen the difference.
Surprisingly, the at operations differ by much more.
=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s
Putting this in a table for easier comparison (times in seconds):

              VERTICAL   HORIZONTAL
container       4.884       5.274
at              7.136      11.821
meta_at         5.835      12.161
at;meta_at      8.156      19.565
I hope I made sense.
Yes. Thanks very much. I would like to see this test case as part of the fusion documentation so that others could see the advantage of BOOST_PP vs the Tagged MI option. Maybe it should be relegated to a separate directory documenting the implementation instead of being included in the user docs. That way it would be available to future maintainers but wouldn't distract current users.
[snip]
By the way, your variadic mpl implementation looks great.
Thank you.
Do you have any plans to integrate your work into the trunk?
I'd be happy to if there's interest. IIUC, there was some work done on that during BoostCon09; however, as noted in another post: http://article.gmane.org/gmane.comp.lib.boost.devel/194263 there's no code in the sandbox to compare with :( Also, I have a strong suspicion that others will suggest the code be modified, maybe with some BOOST_PP magic, to lessen the number of template instantiations. In particular, I suspect this will be suggested where foldr_pack is used in the supertypes of the sequences; I'd think the technique you used in your test driver (in the VERTICAL part) could be used to some benefit there. The interesting thing is that the original mpl didn't do that; it used just plain template recursion. OTOH, maybe it wouldn't really be of much use, but who knows?

Thanks for your work, Christopher.

-regards,
Larry