
On 09/26/09 10:40, Christopher Schmidt wrote:
Larry Evans wrote:
There's no type_at in:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
There is, however, a tuple<,>::at_type<I> nested struct. Is that what you're referring to? If so, I don't understand why that would be expensive, since neither at_type nor what it calls, at_elem, uses recursion. Could you explain why at_type would be expensive? Or maybe you've run some timing tests on it; if so, could you provide the test driver so I can see why it's so expensive?
[snip]
TIA.
-Larry
I am sorry to have kept you waiting so long. I have been busy with moving to a new flat.
No problem at all (I actually thought the response was pretty fast).
Consider an instantiation of a tuple with 7 elements, the types foo, bar, int, char, blub, dub and mup.
My fusion vector will directly evolve to
vector<foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
      : detail::vector_impl<4,blub,dub,mup>
As you elaborated, your tuple will evolve to
tuple<char, foo,bar,int,char,blub,dub,mup>
  : _Tuple_impl< mpl::package_c<char,0,1,2,3,4,5,6>,
                 foo,bar,int,char,blub,dub,mup >
      : _Tuple_element<mpl::integral_c<char, 0>, foo>
      , _Tuple_element<mpl::integral_c<char, 1>, bar>
      , _Tuple_element<mpl::integral_c<char, 2>, int>
      , _Tuple_element<mpl::integral_c<char, 3>, char>
      , _Tuple_element<mpl::integral_c<char, 4>, blub>
      , _Tuple_element<mpl::integral_c<char, 5>, dub>
      , _Tuple_element<mpl::integral_c<char, 6>, mup>
Summed up, there are N/4+1 template instantiations for my vector and N+2+x for your tuple, with x being the number of instantiations for mpl::package_range_forward. If mpl::package_range_forward<...> is not cached, x will be N+C.
Also you need to consider the overhead of template expressions that are not instantiated but still are evaluated.
Good point. I wasn't aware of that.
I think your at_type implementation is expensive as a complete template function invocation expression is evaluated. This involves a lot of steps in the lookup process that are not necessary in a meta-only implementation.
Good point. I had not considered that.
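To make sure I follow, here is how I picture the difference, boiled down (the names are made up for the sketch, the tuple is hard-wired to three elements, and this is not the actual sandbox or fusion code):

#include <utility>   // std::declval

template<int I, class T> struct element { T value; };   // tagged leaf holding one member
template<class... T> struct tup;

// minimal "horizontal" layout, one tagged base per element
template<class T0, class T1, class T2>
struct tup<T0, T1, T2> : element<0, T0>, element<1, T1>, element<2, T2> {};

// declaration only, never called at run time; giving I explicitly makes
// overload resolution pick the one base whose tag matches
template<int I, class T>
T pick(element<I, T> const&);

// (a) type lookup through a call expression, roughly the shape of my at_type:
//     the compiler builds the call, runs overload resolution and deduction,
//     and only then takes decltype -- once per lookup
template<int I, class Tuple>
struct at_type_via_call
{
    typedef decltype(pick<I>(std::declval<Tuple const&>())) type;
};

// (b) a meta-only lookup: nothing but class template instantiations
template<int I, class Tuple> struct meta_at;

template<class T0, class... T>
struct meta_at<0, tup<T0, T...>> { typedef T0 type; };

template<int I, class T0, class... T>
struct meta_at<I, tup<T0, T...>> : meta_at<I - 1, tup<T...>> {};

So at_type_via_call<1, tup<foo, bar, int>>::type and meta_at<1, tup<foo, bar, int>>::type both name bar, but (a) pays for evaluating a call expression on every lookup while (b) never forms one. That would also fit your timings below, where -DMETA_AT adds less than a second in the VERTICAL case but more than doubles the HORIZONTAL time.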
Also, the lookup process should scale commensurately with N.
Is that not true of my tuple? AFAICT, it's not dependent on N. OOPS, wait: because the number of template instantiations depends on N, and the lookup time depends on the number of template instantiations, the lookup time does depend on N; but AFAICT it is still commensurate (if by that you mean linear time complexity). Is that what you mean by commensurate?
I cannot see how this should be faster than a controlled meta-recursion with a break condition.
What's "controlled meta-recursion" and what's "a break condition". Is "a break condition" just the template recursion termination condition? Is "controlled meta-recursion" the recursion every 4 (or whatever) elements while for those 4 elements boost_pp is used to generate the member variables and access one of those 4 elements?
On top of that, your implementation is limited to compilers with decltype support. I tried to avoid implementations in which certain c++0x features depend on others. A user whose compiler supports variadic templates but not decltype should still be able to enjoy the variadic template candy. This is not much of a problem in our specific case, but in other cases an independent implementation is crucial.
I hadn't thought that was important because I'd assumed that by the time variadic templates were adopted, decltype would surely be adopted.
Either way, there are tons of pros and cons for both of our implementations. I really do like your implementation. It looks and feels native. Maybe it is faster than mine. Maybe it is not. Who knows? Compilers are too different to let one state performance predictions a priori. I judge the compile-time performance of an expression by the number of steps (and their orders of growth) involved in evaluating it, because I think this comes closest to actual real-life performance in most cases. This led to my conclusion that an unrolled recursive implementation might be faster.
I attached a test driver to this post. The test case instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then, optionally, a type lookup (meta-at) and/or an object lookup (at) is done on each element.
Very nice! I love the way you've filtered out anything that's "extraneous" both in the "vertical" (i.e. the boost_pp method) and the "horizontal" (the tagged multiple inheritance method). This makes it easier to see what's going on. However, I did have a hard time understanding how the test cases were generated. I added several comments which I hope make it clearer. I could upload this extra commented version to the boost vault under the variadic_template directory if you're interested.
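For anyone reading along without the attachment, my understanding is that each generated case boils down to something like the following, repeated for every cell of the 25x25 grid with 0 to 10 elements (this is only my paraphrase with stand-in names, not the attached code):

// paraphrase only -- not the attached driver
template<class... T> struct tuple_under_test {};   // stand-in for whichever tuple
                                                   // (vertical or horizontal) is
                                                   // being measured
template<int X, int Y, int E> struct elem {};      // a distinct type per grid cell
                                                   // and slot, so no two cases
                                                   // share an instantiation

void case_3_7()   // one grid cell, here with 3 elements
{
    tuple_under_test<elem<3, 7, 0>, elem<3, 7, 1>, elem<3, 7, 2>> t;
    // with -DMETA_AT: additionally do a type lookup on each of the 3 elements
    // with -DAT:      additionally do an object lookup (at) on each of the 3 elements
    (void)t;
}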
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower.
gcc 4.5 now uses hashing to speed up template lookup: http://gcc.gnu.org/viewcvs?view=revision&revision=149188 So that may lessen the difference.
Surprisingly, the at operations differ by much more.
=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s
Putting this in a table for easier comparison (times in seconds):

              VERTICAL   HORIZONTAL
container       4.884       5.274
at              7.136      11.821
meta_at         5.835      12.161
at;meta_at      8.156      19.565
I hope I made sense.
Yes. Thanks very much. I would like to see this test case as part of the fusion documentation so that others could see the advantage of BOOST_PP vs the Tagged MI option. Maybe it should be relegated to a separate directory documenting the implementation instead of being included in the user docs. That way it would be available to future maintainers but wouldn't distract current users.
[snip]
By the way, your variadic mpl implementation looks great.
Thank you.
Do you have any plans to integrate your work into the trunk?
I'd be happy to if there's interest. IIUC, there was some work done on that during BoostCon09; however, as noted in another post: http://article.gmane.org/gmane.comp.lib.boost.devel/194263 there's no code in the sandbox to compare with :( Also, I have a strong suspicion that others will suggest the code be modified, maybe with some BOOST_PP magic, to lessen the number of template instantiations. In particular, I suspect this will be suggested where foldr_pack is used in the supertypes of the sequences; I'd think the technique you used in your test driver (in the VERTICAL part) could be used to some benefit there. The interesting thing is that the original mpl didn't do that; it used just plain template recursion. OTOH, maybe it wouldn't really be of much use, but who knows?

Thanks for your work, Christopher.

-regards,
Larry