[Fusion] port to c++0x

I participated in Google's 'Summer of Code' for Boost this year. My project was to port Boost.Fusion to c++0x. The program is over now and I would like to give you a short report on my work and put my code up for comments.

In a nutshell, there is rvalue support, native variadic templates are used if possible, and the public classes and (meta-)functions are guarded by a set of static asserts that detect most invalid (template) arguments. I also added certain new functionality to provide an easier, faster and more intuitive programming interface. My implementation is pretty much backwards compatible and all test cases pass on gcc 3.4/4.4 and vc 9.

Here is a short overview of the most striking changes besides the implementation of raw c++0x support.

I removed the deque container and the extension classes. The deque container is undocumented and, in my opinion, does not go well with the other containers due to its intrinsic extension mechanism. There also should not be a performance gain compared to a regular fusion::joint_view in combination with two fusion::vectors. The extension classes are pretty much Boost.Proto specific. At the moment they seem to be out of place, especially as they are undocumented, unmaintained and not even part of the regular file structure. I would like to see them either fully documented (along with the underlying concept of segmented sequences) or moved into the detail namespace of Boost.Proto.

I added support for associative iterators. An associative forward sequence shall provide an iterator type which implements the extension backend of the intrinsic (meta-)functions fusion::result_of::value_of_data, fusion::deref_data and fusion::result_of::key_of. This has the advantage that all view classes, even fusion::iterator_range, are able to preserve the underlying sequence's associative-ness. fusion::transform_view, as a special case, got an additional template parameter.
This parameter specifies whether the view should model the forward concept only, or both the associative and the forward concept. This design breaks consistency with Boost.MPL. Unfortunately these changes are crucial, and I would rather risk this break with Boost.MPL than miss that important feature. Using the old implementation, there is no way to avoid explicitly converting from a view back to an associative sequence to preserve the associative-ness. This leads to a nasty runtime overhead due to the evaluation of every element of the view and copy-/move-constructing it.

Besides, other changes include:

- If the compiler supports rvalue references, all fusion containers have a template constructor (taking either const lvalue or 'templated' rvalue references).
- If the compiler supports variadic templates, fusion::cons, fusion::pair and fusion::single_view have a variadic template constructor (taking either const lvalue or 'templated' rvalue references) that allows emplace construction.
- The view classes adopt the full category of their underlying sequences. As mentioned above, the associative-ness is preserved, but also, if possible, the random-access or bidirectional trait.
- The result_of::-metafunctions accept rvalue and lvalue reference arguments. If an argument is not reference-qualified, it will be considered to be an lvalue reference. All existing extension implementations must be changed to adapt to this new behaviour.
- The algorithm functions now also take (and correctly forward) non-const reference arguments.
- Fusion iterators (with reference-compatible sequence types and the very same index) and views (with reference-compatible sequence types) have a working operator= that correctly reassigns the underlying sequence references.
- Fusion now also adapts MPL iterators. If, and only if, the <boost/fusion/adapted/mpl.hpp> header is included, mpl sequences and iterators are recognized as fusion sequences and iterators.
- There are new headers to adapt std::(tr1::)array and std::(tr1::)tuple.

The overall compile-time performance of my code is satisfying. Compiling a compilation unit that includes all possible fusion headers takes slightly more time on gcc 4.4 than with the official fusion implementation. If native variadic templates are present, the compilation is a lot faster (usually 3 to 6 times), regardless of the value of FUSION_MAX_xxx_SIZE.

The compile-time performance of actual fusion code is mixed. The intrinsic container and iterator (meta-)functions compile a little bit slower compared to the official fusion implementation. This is due to the explicit removal of the references on the arguments of all result_of:: metafunctions and some additional abstraction layers. If the compiler supports rvalue references, there has to be a rather ugly and slow type evaluation/transformation to avoid issues with rvalue references on non-array, non-class types (see /support/internal/ref.hpp and 8.5.3p5 of the current c++0x draft for more details). On the other hand, I added several optimizations (such as result preevaluation), and, if variadic templates are present, the actual instantiation of fusion containers is a lot faster.

All in all, the compile-time performance is a little bit worse if c++0x features are not present. If c++0x features are present, the code should compile a little bit faster in most cases. For example, the lambda example of the Boost.Proto documentation, which internally relies heavily on Boost.Fusion, takes on my machine 2.6 seconds to compile with the 'old' fusion on gcc 4.4, 3 seconds with my port without -std=c++0x, and 2.1 seconds with c++0x features present. The overall performance might be even better once Boost.MPL adopts native variadic templates and the compiler supports template aliases.

There have been tons of other changes, mostly bugfixes, optimizations (in terms of compile time, of course) and removals of redundancy.
Whenever I had the choice, I preferred clean, redundancy-free code over heavily optimized code. The overall performance loss is not noticeable though, and most compile-time optimizations that go beyond the as-little-template-instantiations-as-possible paradigm are pretty much compiler dependent anyway.

There is still some work to do. I did not add c++0x-specific test cases, and there are several parts of the documentation which need to be updated. By the way, are there any official guidelines for documenting c++03/c++0x-specific differences? For now, I would like to hold off on implementing these changes until Boost.TypeTraits adapts c++0x. Right now I have implemented missing or incorrectly behaving trait functions in the fusion::detail namespace. I would prefer to use the official ones, at least in the documentation and in the test cases.

Besides that, I think my implementation is trunk-ready. If there are no objections, I would like to finish the outstanding work as soon as possible, and then merge my code with the trunk. Of course I will provide long-term support for the project. In fact, I would love to become an official fusion maintainer.

You can find the code in the boost sandbox (/sandbox/SOC/2009/fusion). If you test the code with a compiler that has decltype support, please use the HEAD of the trunk rather than the official 1.40 distribution package to compile your code.

Best regards
Christopher

Christopher Schmidt wrote:
I participated in Google's 'Summer of Code' for Boost this year. My project was to port Boost.Fusion to c++0x. The program is over now and I would like to give you a short report on my work and put my code up for comments.
Kudos to Christopher. You did a splendid job. I'm very happy with the work that you have done. I'm looking forward to having your code incorporated into Fusion in the future. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net http://tinyurl.com/facebook-jdg

Joel de Guzman schrieb:
Christopher Schmidt wrote:
I participated in Google's 'Summer of Code' for Boost this year. My project was to port Boost.Fusion to c++0x. The program is over now and I would like to give you a short report on my work and put my code up for comments.
Kudos to Christopher. You did a splendid job. I'm very happy with the work that you have done. I'm looking forward to having your code incorporated into Fusion in the future.
Regards,
Thanks a lot! Based on the feedback of Dan, Hartmut, Larry and you I think I can start working on making my code trunk-mergable. For now I would like to try to keep up with the mainline development and get all boost test-cases (in particular the Boost.Proto and Boost.Spirit ones) passing when being compiled with my port. Once this is done, I will get back to this mailing list so a further approach can be discussed. Is this okay? -Christopher

Christopher Schmidt wrote:
Joel de Guzman schrieb:
Christopher Schmidt wrote:
I participated in Google's 'Summer of Code' for Boost this year. My project was to port Boost.Fusion to c++0x. The program is over now and I would like to give you a short report on my work and put my code up for comments.
Kudos to Christopher. You did a splendid job. I'm very happy with the work that you have done. I'm looking forward to having your code incorporated into Fusion in the future.
Regards,
Thanks a lot!
Based on the feedback of Dan, Hartmut, Larry and you I think I can start working on making my code trunk-mergable. For now I would like to try to keep up with the mainline development and get all boost test-cases (in particular the Boost.Proto and Boost.Spirit ones) passing when being compiled with my port. Once this is done, I will get back to this mailing list so a further approach can be discussed. Is this okay?
Fine by me :-) Looks like we have a new Fusion maintainer :-P Not that I'll be abandoning it, but there's no harm in having more than a couple either. Hats off to you, Christopher! :-) Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net http://tinyurl.com/facebook-jdg

----- Original Message ---- From: Joel de Guzman <joel@boost-consulting.com> To: boost@lists.boost.org Sent: Thursday, 1 October, 2009 3:58:14 Subject: Re: [boost] [Fusion] port to c++0x
Joel de Guzman wrote: Christopher Schmidt wrote:
Joel de Guzman schrieb:
Christopher Schmidt wrote:
Based on the feedback of Dan, Hartmut, Larry and you I think I can start working on making my code trunk-mergable. For now I would like to try to keep up with the mainline development and get all boost test-cases (in particular the Boost.Proto and Boost.Spirit ones) passing when being compiled with my port. Once this is done, I will get back to this mailing list so a further approach can be discussed. Is this okay?
Fine by me :-) Looks like we have a new Fusion maintainer :-P Not that I'll be abandoning it, but there's no harm in having more than a couple either.
Hats off to you, Christopher! :-)
Welcome aboard Christopher! Cheers Dan

dan marsden wrote:
----- Original Message ---- From: Joel de Guzman <joel@boost-consulting.com> To: boost@lists.boost.org Sent: Thursday, 1 October, 2009 3:58:14 Subject: Re: [boost] [Fusion] port to c++0x
Joel de Guzman wrote: Christopher Schmidt wrote:
Joel de Guzman schrieb:
Christopher Schmidt wrote:
Based on the feedback of Dan, Hartmut, Larry and you I think I can start working on making my code trunk-mergable. For now I would like to try to keep up with the mainline development and get all boost test-cases (in particular the Boost.Proto and Boost.Spirit ones) passing when being compiled with my port. Once this is done, I will get back to this mailing list so a further approach can be discussed. Is this okay?
Fine by me :-) Looks like we have a new Fusion maintainer :-P Not that I'll be abandoning it, but there's no harm in having more than a couple either.
Hats off to you, Christopher! :-)
Welcome aboard Christopher!
As an aside, one of the GSOC selection criteria was "how likely are they to become ongoing boost contributors?". Obviously this project stands out. So an additional thanks from the rest of the '09 GSOC gang to you and Hartmut. Ausgezeichnet. [golfclap] -t

troy d. straszheim schrieb:
dan marsden wrote:
----- Original Message ---- From: Joel de Guzman <joel@boost-consulting.com> To: boost@lists.boost.org Sent: Thursday, 1 October, 2009 3:58:14 Subject: Re: [boost] [Fusion] port to c++0x
Joel de Guzman wrote: Christopher Schmidt wrote:
Joel de Guzman schrieb:
Christopher Schmidt wrote:
Based on the feedback of Dan, Hartmut, Larry and you I think I can start working on making my code trunk-mergable. For now I would like to try to keep up with the mainline development and get all boost test-cases (in particular the Boost.Proto and Boost.Spirit ones) passing when being compiled with my port. Once this is done, I will get back to this mailing list so a further approach can be discussed. Is this okay?
Fine by me :-) Looks like we have a new Fusion maintainer :-P Not that I'll be abandoning it, but there's no harm in having more than a couple either.
Hats off to you, Christopher! :-)
Welcome aboard Christopher!
As an aside, one of the GSOC selection criteria was "how likely are they to become ongoing boost contributors?". Obviously this project stands out. So an additional thanks from the rest of the '09 GSOC gang to you and Hartmut. Ausgezeichnet.
[golfclap]
-t
Thank you very much for the kind words. It's great to read feedback like that. I will not let you down. Regards -Christopher

On 09/22/09 17:40, Christopher Schmidt wrote:
I participated in Google's 'Summer of Code' for Boost this year. My project was to port Boost.Fusion to c++0x. The program is over now and I would like to give you a short report on my work and put my code up for comments. [snip]
You can find the code in the boost sandbox (/sandbox/SOC/2009/fusion). If you test the code with a compiler that has decltype support, please use the HEAD of the trunk rather than the official 1.40 distribution package to compile your code.
Best regards Christopher
Thanks for all the work.

However, I was wondering why you couldn't eliminate some of the extensive BOOST_PP magic in:

https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/fusion/boost/fusio...

by using the "tagged multiple inheritance"(TMI) method shown around here:

https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...

The only disadvantage of TMI that I can think of is that maybe the template recursion of package_range_c shown at:

https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/boost/mp...

suffers some limitation imposed by the template recursion depth. However, I don't know if that's a real limitation, since I have tested a case where, if a single template is used in the recursion, the -ftemplate-depth-XX option seems irrelevant:

http://gcc.gnu.org/ml/gcc-help/2009-05/msg00279.html

The only other objection might be to limit the number of instantiations because the more template instantiations, the slower the compile time; however, Doug Gregor suggested he wasn't too worried about that at the bottom of:

http://groups.google.com/group/comp.std.c++/msg/6449d909fd3d5cdc

-regards, Larry

Larry Evans schrieb:
Thanks for all the work.
However, I was wondering why you couldn't eliminate some of the extensive BOOST_PP magic in:
https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/fusion/boost/fusio...
by using the "tagged multiple inheritance"(TMI) method shown around here:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
...
The only other objection might be to limit the number of instantiations because the more template instantiations, the slower the compile time; however, Doug Gregor suggested he wasn't too worried about that at the bottom of:
http://groups.google.com/group/comp.std.c++/msg/6449d909fd3d5cdc
-regards, Larry
Thank you very much for your feedback.

I did consider that specific method. I decided against using it due to the enormous amount of template instantiations and the expensive type_at. The current implementation is 4-unrolled recursive. I am pretty certain that it is faster - there are about 4 to 8 times fewer template instantiations.

Besides, this is the only place in the native variadic template code in which I use heavy preprocessor repetition. All other containers (besides fusion::cons) have a clean implementation based on fusion::vector.

-Christopher

On 09/23/09 14:47, Christopher Schmidt wrote:
Larry Evans schrieb:
Thanks for all the work.
However, I was wondering why you couldn't eliminate some of the extensive BOOST_PP magic in:
https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/fusion/boost/fusio...
by using the "tagged multiple inheritance"(TMI) method shown around here:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
...
The only other objection might be to limit the number of instantiations because the more template instantiations, the slower the compile time; however, Doug Gregor suggested he wasn't too worried about that at the bottom of:
http://groups.google.com/group/comp.std.c++/msg/6449d909fd3d5cdc
-regards, Larry
Thank you very much for your feedback.
You're very welcome.
I did consider that specific method. I decided against using it due to the enormous amount of template instantiations and the expensive type_at.
I'm not sure there would be more template instantiations. It would really help my understanding if you could explain (with maybe a few examples and concrete numbers for the number of instantiations) why there would be more instantiations with the TMI method. I know that would be a lot of work, but it certainly would make your case a lot stronger.
The current implementation is 4-unrolled recursive. I am pretty certain that it is faster - there are about 4 to 8 times fewer template instantiations.
Yeah, I saw that 4 in a lot of places. I think other libraries use some macro to set such a number. For example, BOOST_MPL_LIMIT_UNROLLING here:

https://svn.boost.org/trac/boost/browser/trunk/boost/mpl/aux_/iter_fold_if_i...

Are there plans to add a similar macro to variadic fusion vector?

On 09/23/09 14:47, Christopher Schmidt wrote:
Larry Evans schrieb:
by using the "tagged multiple inheritance"(TMI) method shown around here:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
...
The only other objection might be to limit the number of instantiations because the more template instantiations, the slower the compile time; however, Doug Gregor suggested he wasn't too worried about that at the bottom of:
http://groups.google.com/group/comp.std.c++/msg/6449d909fd3d5cdc
-regards, Larry
Thank you very much for your feedback.
I did consider that specific method. I decided against using it due to the enormous amount of template instantiations and the expensive type_at. The current implementation is 4-unrolled recursive. I am pretty certain that it is faster - there are about 4 to 8 times fewer template instantiations.
There's no type_at in:

https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...

There is, however, a tuple<,>::at_type<I> nested struct. Is that what you're referring to? If so, I don't understand why that would be expensive since neither at_type or what it calls, at_elem, use recursion. Could you explain why at_type would be expensive? Or maybe you've run some timing tests on it. If so, could you provide the test driver so I could test to see why it's so expensive?

Now, AFAICT, the number of template instantiations is, neglecting those needed for the package_range_c at:

https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...

which (as Doug mentioned near the bottom of his 12 Jan 2009 post) are amortized, the number of "overhead" template instantiations is N+C where N is the number of elements and C is a small constant. I think C would be around 1, i.e. 1 for _Tuple_impl. The N is for the N _Tuple_element<Tag,Value> values where Tag is some instance of mpl::integral_c<IndexType,IndexValue>. By "overhead" I mean the number needed besides the N for each element in the tuple and the 1 for the overall tuple (in the case of vector, that would be mpl::vector).

Does that make sense?

TIA.

-Larry

Larry Evans schrieb:
There's no type_at in:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
There is, however, a tuple<,>::at_type<I> nested struct. Is that what you're referring to? If so, I don't understand why that would be expensive since neither at_type or what it calls, at_elem, use recursion. Could you explain why at_type would be expensive? Or maybe you've run some timing tests on it. If so, could you provide the test driver so I could test to see why it's so expensive?
Now, AFAICT, the number of template instantiations is, neglecting those needed for the package_range_c at:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
which (as Doug mentioned near the bottom of his 12 Jan 2009 post) are amortized, the number of "overhead" template instantiations is N+C where N is the number of elements and C is a small constant. I think C would be around 1, i.e. 1 for _Tuple_impl. The N is for the N _Tuple_element<Tag,Value> values where Tag is some instance of mpl::integral_c<IndexType,IndexValue>. By "overhead" I mean the number needed besides the N for each element in the tuple and the 1 for the overall tuple (in the case of vector, that would be mpl::vector).
Does that make sense?
TIA.
-Larry
I am sorry to have kept you waiting so long. I have been busy with moving to a new flat.

Consider an instantiation of a tuple with 7 elements, the types foo, bar, int, char, blub, dub and mup.

My fusion vector will directly evolve to

vector<foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<4,blub,dub,mup>

As you elaborated, your tuple will evolve to

tuple<char, foo,bar,int,char,blub,dub,mup>
  : _Tuple_impl<
        mpl::package_c<char,0,1,2,3,4,5,6>,
        foo,bar,int,char,blub,dub,mup
    >
  : _Tuple_element<mpl::integral_c<char, 0>, foo>
  , _Tuple_element<mpl::integral_c<char, 1>, bar>
  , _Tuple_element<mpl::integral_c<char, 2>, int>
  , _Tuple_element<mpl::integral_c<char, 3>, char>
  , _Tuple_element<mpl::integral_c<char, 4>, blub>
  , _Tuple_element<mpl::integral_c<char, 5>, dub>
  , _Tuple_element<mpl::integral_c<char, 6>, mup>

Summed up, there are N/4+1 template instantiations for my vector and N+2+x for your tuple, with x being the amount of instantiations for mpl::package_range_forward. If mpl::package_range_forward<...> is not cached, x will be N+C.

Also you need to consider the overhead of template expressions that are not instantiated but still are evaluated. I think your at_type implementation is expensive as a complete template function invocation expression is evaluated. This involves a lot of steps in the lookup process that are not necessary in a meta-only implementation. Also the lookup process should scale commensurate to N. I cannot see how this should be faster than a controlled meta-recursion with a break condition.

On top of that your implementation is limited to compiler with decltype support. I tried to avoid implementations in which certain c++0x features depend on others. A user, whose compiler does support variadic templates but not decltype, should still be able to enjoy the variadic template candy.
This is not much of a problem in our specific case but in other cases an independent implementation is crucial.

Either way, there are tons of pros and cons for both of our implementations. I really do like your implementation. It looks and feels native. Maybe it is faster than mine. Maybe it is not. Who knows? Compilers are too different to state performance predictions a priori. I judge the compile-time performance of an expression by the number of steps (and their orders of growth) involved in evaluating it, because I think this is what comes closest to actual real-life performance in most cases. This led to my conclusion that an unrolled recursive implementation might be faster.

I attached a test driver to this post. The testcase instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then optionally a type (meta-at) and/or object lookup (at) is done on each element. I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower. The at-operations differ all the more surprisingly.

=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s

=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s

=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s

=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s

=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s

=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s

=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s

=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s

I hope I made sense.

Larry Evans schrieb:
Yeah, I saw that 4 a lot of places. I think other libraries use some macro to set such a number. For example, BOOST_MPL_LIMIT_UNROLLING here:
https://svn.boost.org/trac/boost/browser/trunk/boost/mpl/aux_/iter_fold_if_i...
Are there plans to add a similar macro to variadic fusion vector?
This is a good idea. I will put it on my TODO-list. Thanks!

By the way, your variadic mpl implementation looks great. Do you have any plans to integrate your work into the trunk?

-Christopher

template<int> struct identity_int {};

#ifdef VERTICAL

template<int Index,typename... Args> struct tuple_impl;

template<int Index> struct tuple_impl<Index>
{};

template<int Index,typename H0> struct tuple_impl<Index,H0>
{
    typedef H0 t0;

    H0 h0;
    H0& get(identity_int<Index+0>){return h0;}
    H0 const& get(identity_int<Index+0>)const{return h0;}
};

template<int Index,typename H0, typename H1> struct tuple_impl<Index,H0,H1>
{
    typedef H0 t0;
    typedef H1 t1;

    H0 h0;
    H0& get(identity_int<Index+0>){return h0;}
    H0 const& get(identity_int<Index+0>)const{return h0;}
    H1 h1;
    H1& get(identity_int<Index+1>){return h1;}
    H1 const& get(identity_int<Index+1>)const{return h1;}
};

template<int Index,typename H0, typename H1, typename H2>
struct tuple_impl<Index,H0,H1,H2>
{
    typedef H0 t0;
    typedef H1 t1;
    typedef H2 t2;

    H0 h0;
    H0& get(identity_int<Index+0>){return h0;}
    H0 const& get(identity_int<Index+0>)const{return h0;}
    H1 h1;
    H1& get(identity_int<Index+1>){return h1;}
    H1 const& get(identity_int<Index+1>)const{return h1;}
    H2 h2;
    H2& get(identity_int<Index+2>){return h2;}
    H2 const& get(identity_int<Index+2>)const{return h2;}
};

template<int Index,typename H0, typename H1, typename H2, typename H3>
struct tuple_impl<Index,H0,H1,H2,H3>
{
    typedef H0 t0;
    typedef H1 t1;
    typedef H2 t2;
    typedef H3 t3;

    H0 h0;
    H0& get(identity_int<Index+0>){return h0;}
    H0 const& get(identity_int<Index+0>)const{return h0;}
    H1 h1;
    H1& get(identity_int<Index+1>){return h1;}
    H1 const& get(identity_int<Index+1>)const{return h1;}
    H2 h2;
    H2& get(identity_int<Index+2>){return h2;}
    H2 const& get(identity_int<Index+2>)const{return h2;}
    H3 h3;
    H3& get(identity_int<Index+3>){return h3;}
    H3 const& get(identity_int<Index+3>)const{return h3;}
};

template<int Index,typename H0, typename H1, typename H2, typename H3,
    typename... Others>
struct tuple_impl<Index,H0,H1,H2,H3,Others...>
  : tuple_impl<Index+4,Others...>
{
    typedef tuple_impl<Index+4,Others...> base;
    typedef H0 t0;
    typedef H1 t1;
    typedef H2 t2;
    typedef H3 t3;

    using base::get;

    H0 h0;
    H0& get(identity_int<Index+0>){return h0;}
    H0 const& get(identity_int<Index+0>)const{return h0;}
    H1 h1;
    H1& get(identity_int<Index+1>){return h1;}
    H1 const& get(identity_int<Index+1>)const{return h1;}
    H2 h2;
    H2& get(identity_int<Index+2>){return h2;}
    H2 const& get(identity_int<Index+2>)const{return h2;}
    H3 h3;
    H3& get(identity_int<Index+3>){return h3;}
    H3 const& get(identity_int<Index+3>)const{return h3;}
};

template<typename... Args> struct tuple
  : tuple_impl<0,Args...>
{};

template<typename Tuple, int Index>
struct meta_value_at
  : meta_value_at<typename Tuple::base, Index-4>
{};

template<typename Tuple> struct meta_value_at<Tuple,0>
{
    typedef typename Tuple::t0 type;
};

template<typename Tuple> struct meta_value_at<Tuple,1>
{
    typedef typename Tuple::t1 type;
};

template<typename Tuple> struct meta_value_at<Tuple,2>
{
    typedef typename Tuple::t2 type;
};

template<typename Tuple> struct meta_value_at<Tuple,3>
{
    typedef typename Tuple::t3 type;
};

#else

template<int Max,int... Args>
struct make_package
  : make_package<Max-1,Max-1,Args...>
{};

template<int...> struct package;

template<int... Args> struct make_package<0,Args...>
{
    typedef package<Args...> type;
};

template<typename, typename Arg> struct element
{
    Arg a;
};

template<typename Keys, typename... Args> struct tuple_impl;

template<int... Indices, typename... Args>
struct tuple_impl<package<Indices...>, Args...>
  : element<identity_int<Indices>, Args>...
{};

template<typename... Args> struct tuple
  : tuple_impl<typename make_package<sizeof...(Args)>::type, Args...>
{
    template<int I, typename T>
    static T& at_elem(element<identity_int<I>,T>& a){return a.a;}
    template<int I, typename T>
    static T const& at_elem(element<identity_int<I>,T>const& a){return a.a;}

    template<int I, typename T>
    static T at_type_helper(element<identity_int<I>,T>& a);

    template<int I> struct at_type
    {
        typedef decltype(at_type_helper<I>(*reinterpret_cast<tuple*>(0))) type;
    };
};

#endif

template<int I,int J, int Max, typename... Args>
struct get_tuple
  : get_tuple<I,J,Max-1,identity_int<I*1000+J*10+Max>,Args...>
{};

template<int I,int J, typename... Args> struct get_tuple<I,J,0,Args...>
{
    typedef tuple<Args...> type;
};

template<int I,int J=0> struct test_impl
{
    typedef typename get_tuple<I, J, J%10>::type tuple_type;

    template<int K>
    static void at_test(tuple_type& t,identity_int<K>)
    {
#ifdef META_AT
# ifdef VERTICAL
        typename meta_value_at<tuple_type,K>::type();
# else
        typename tuple_type::template at_type<K>::type();
# endif
#endif
#ifdef AT
# ifdef VERTICAL
        t.get(identity_int<K>());
# else
        tuple_type::template at_elem<K>(t);
# endif
#endif
        at_test(t,identity_int<K+1>());
    }
    static void at_test(tuple_type&,identity_int<J%10>){}

    static void exec()
    {
        tuple_type t;
        at_test(t,identity_int<0>());
        test_impl<I,J+1>::exec();
    }
};

template<int I> struct test_impl<I,25>
{
    static void exec(){}
};

template<int I=0> struct test
{
    static void exec()
    {
        test_impl<I>::exec();
        test<I+1>::exec();
    }
};

template<> struct test<25>
{
    static void exec(){}
};

int main()
{
    test<>::exec();
}

On 09/26/09 10:40, Christopher Schmidt wrote:
Larry Evans schrieb:
There's no type_at in:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
There is, however, a tuple<,>::at_type<I> nested struct. Is that what you're referring to? If so, I don't understand why that would be expensive since neither at_type or what it calls, at_elem, use recursion. Could you explain why at_type would be expensive? Or maybe you've run some timing tests on it. If so, could you provide the test driver so I could test to see why it's so expensive?
[snip]
TIA.
-Larry
I am sorry to have kept you waiting so long. I have been busy with moving to a new flat.
No problem at all (I actually thought the response was pretty fast).
Consider an instantiation of a tuple with 7 elements, the types foo, bar, int, char, blub, dub and mup.
My fusion vector will directly evolve to
vector<foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
  : detail::vector_impl<4,blub,dub,mup>
As you elaborated, your tuple will evolve to
tuple<char, foo,bar,int,char,blub,dub,mup>
  : _Tuple_impl<
        mpl::package_c<char,0,1,2,3,4,5,6>,
        foo,bar,int,char,blub,dub,mup
    >
  : _Tuple_element<mpl::integral_c<char, 0>, foo>
  , _Tuple_element<mpl::integral_c<char, 1>, bar>
  , _Tuple_element<mpl::integral_c<char, 2>, int>
  , _Tuple_element<mpl::integral_c<char, 3>, char>
  , _Tuple_element<mpl::integral_c<char, 4>, blub>
  , _Tuple_element<mpl::integral_c<char, 5>, dub>
  , _Tuple_element<mpl::integral_c<char, 6>, mup>
Summed up, there are N/4+1 template instantiations for my vector and N+2+x for your tuple, with x being the number of instantiations for mpl::package_range_forward. If mpl::package_range_forward<...> is not cached, x will be N+C.
Also you need to consider the overhead of template expressions that are not instantiated but still are evaluated.
Good point. I wasn't aware of that.
I think your at_type implementation is expensive as a complete template function invocation expression is evaluated. This involves a lot of steps in the lookup process that are not necessary in a meta-only implementation.
Good point. I had not considered that.
Also the lookup process should scale commensurate to N.
Is that not true with my tuple? AFAICT, it's not dependent on N. OOPS, wait, because the number of template instantiations depends on N, and the lookup time is dependent on the number of template instantiations, the lookup time is dependent on N, but AFAICT, it's commensurate (if by that you mean linear time complexity). Is that what you mean by commensurate?
I cannot see how this should be faster than a controlled meta-recursion with a break condition.
What's "controlled meta-recursion" and what's "a break condition". Is "a break condition" just the template recursion termination condition? Is "controlled meta-recursion" the recursion every 4 (or whatever) elements while for those 4 elements boost_pp is used to generate the member variables and access one of those 4 elements?
On top of that, your implementation is limited to compilers with decltype support. I tried to avoid implementations in which certain c++0x features depend on others. A user whose compiler supports variadic templates but not decltype should still be able to enjoy the variadic template candy. This is not much of a problem in our specific case, but in other cases an independent implementation is crucial.
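The feature independence Christopher describes can be sketched roughly as follows; HAS_DECLTYPE is a made-up stand-in for a Boost.Config-style feature macro, and the types are toy examples, not the actual Fusion code. The public metafunction keeps the same name while the backend switches:

```cpp
#include <cassert>
#include <type_traits>

#define HAS_DECLTYPE 1   // hypothetical feature-detection macro

template<int I> struct identity_int {};
template<typename Key, typename T> struct element { T a; };

// A two-element toy tuple with tagged bases, as in the horizontal layout.
template<typename T0, typename T1>
struct pair2
  : element<identity_int<0>, T0>
  , element<identity_int<1>, T1>
{};

#if HAS_DECLTYPE
// decltype backend: type lookup through an unevaluated overload call.
template<int I, typename T>
T at_type_helper(element<identity_int<I>, T>&);   // never defined

template<int I, typename Tuple>
struct value_at
{
    typedef decltype(at_type_helper<I>(*static_cast<Tuple*>(0))) type;
};
#else
// meta-only fallback: needs variadic templates only, no decltype.
template<int I, typename Tuple> struct value_at;

template<typename T0, typename T1>
struct value_at<0, pair2<T0, T1> > { typedef T0 type; };

template<typename T0, typename T1>
struct value_at<1, pair2<T0, T1> > { typedef T1 type; };
#endif
```

Either branch yields the same value_at<> interface, so client code does not care which c++0x features the compiler actually implements.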
I hadn't thought that was important because I'd assumed that by the time variadic templates were adopted, decltype would surely be adopted.
Either way, there are tons of pros and cons for both of our implementations. I really do like your implementation. It looks and feels native. Maybe it is faster than mine, maybe it is not. Who knows? Compilers are too different to let one state performance predictions a priori. I judge the compile-time performance of an expression by the number of steps (and their orders of growth) involved in evaluating it, because I think this is what comes closest to actual real-life performance in most cases. This led to my conclusion that an unrolled recursive implementation might be faster.
I attached a test driver to this post. The testcase instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then optionally a type (meta-at) and/or object lookup (at) is done on each element.
Very nice! I love the way you've filtered out anything that's "extraneous" both in the "vertical" (i.e. the boost_pp method) and the "horizontal" (the tagged multiple inheritance method). This makes it easier to see what's going on. However, I did have a hard time understanding how the test cases were generated. I added several comments which I hope make it clearer. I could upload this extra commented version to the boost vault under the variadic_template directory if you're interested.
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower.
gcc 4.5 is now using hashing to speed up the template lookup: http://gcc.gnu.org/viewcvs?view=revision&revision=149188 So, that may lessen the difference.
The at-operations differ all the more surprisingly.
=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s
Putting this in a table for easier comparison:

              VERTICAL   HORIZONTAL
container      4.884      5.274
at             7.136     11.821
meta_at        5.835     12.161
at;meta_at     8.156     19.565
I hope I made sense.
Yes. Thanks very much. I would like to see this test case as part of the fusion documentation so that others could see the advantage of BOOST_PP vs the Tagged MI option. Maybe it should be relegated to a separate directory for documentation of the implementation instead of including it in the user docs. That way it would be available for future maintainers but wouldn't distract current users.
[snip]
By the way, your variadic mpl implementation looks great.
Thank you.
Do you have any plans to integrate your work into the trunk?
I'd be happy to if there's interest. IIUC, there was some work done during BoostCon09 on that; however, as noted in another post: http://article.gmane.org/gmane.comp.lib.boost.devel/194263 there's no code in the sandbox to compare with :( Also, I've a strong suspicion that others will suggest the code be modified, using maybe some BOOST_PP magic, to lessen the number of template instantiations. In particular, I suspect this will be suggested for where foldr_pack is used in the supertypes of the sequences. I'd think the technique you used in your test driver (in the VERTICAL part) could be used with some benefit there. The interesting thing is, the original mpl didn't do that; instead it used just plain template recursion. OTOH, maybe it wouldn't really be much use, but who knows?

Thanks for your work, Christopher.

-regards,
Larry

comments inline... Larry Evans schrieb: ...
Also the lookup process should scale commensurate to N.
Is that not true with my tuple? AFAICT, it's not dependent on N. OOPS, wait, because the number of template instantiations depends on N, and the lookup time is dependent on the number of template instantiations, the lookup time is dependent on N, but AFAICT, it's commensurate (if by that you mean linear time complexity). Is that what you mean by commensurate?
I cannot see how this should be faster than a controlled meta-recursion with a break condition.
What's "controlled meta-recursion" and what's "a break condition". Is "a break condition" just the template recursion termination condition? Is "controlled meta-recursion" the recursion every 4 (or whatever) elements while for those 4 elements boost_pp is used to generate the member variables and access one of those 4 elements?
I expressed the above inartfully. I was just referring to the recursive at-metafunction of the vertical implementation that terminates once the relevant index is reached. The decltype-meta-at in a horizontal implementation visits every public baseclass as part of the lookup process. Even if a matching baseclass is found, the others will still be visited. This is not much of a problem, though. In most cases N is <15, and so this should not be relevant. It is much more important that the decltype-meta-at is not plain meta. As mentioned above, the argument matching process should be a lot more expensive than the simple meta-recursion.
On top of that, your implementation is limited to compilers with decltype support. I tried to avoid implementations in which certain c++0x features depend on others. A user whose compiler supports variadic templates but not decltype should still be able to enjoy the variadic template candy. This is not much of a problem in our specific case, but in other cases an independent implementation is crucial.
I hadn't thought that was important because I'd assumed that by the time variadic templates were adopted, decltype would surely be adopted.
From what I have heard, VC 10 does not support decltype. (AFAIK it does not support variadic templates either.) Anyway, the core language additions are way too complex to be adopted within one release cycle. Especially when it comes to more exotic compilers (such as IBM's C++ compiler for z/OS mainframes), adopting the full extent of c++0x will probably take decades.
Either way, there are tons of pros and cons for both of our implementations. I really do like your implementation. It looks and feels native. Maybe it is faster than mine, maybe it is not. Who knows? Compilers are too different to let one state performance predictions a priori. I judge the compile-time performance of an expression by the number of steps (and their orders of growth) involved in evaluating it, because I think this is what comes closest to actual real-life performance in most cases. This led to my conclusion that an unrolled recursive implementation might be faster.
I attached a test driver to this post. The testcase instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then optionally a type (meta-at) and/or object lookup (at) is done on each element.
Very nice! I love the way you've filtered out anything that's "extraneous" both in the "vertical" (i.e. the boost_pp method) and the "horizontal" (the tagged multiple inheritance method). This makes it easier to see what's going on. However, I did have a hard time understanding how the test cases were generated. I added several comments which I hope make it clearer. I could upload this extra commented version to the boost vault under the variadic_template directory if you're interested.
Yeah, the code is a mess. Sorry. Go ahead and upload your version. It definitely will be helpful for other developers.
...
Yes. Thanks very much. I would like to see this test case as part of the fusion documentation so that others could see the advantage of BOOST_PP vs the Tagged MI option. Maybe it should be relegated to a separate directory for documentation of the implementation instead of including it in the user docs. That way it would be available for future maintainers but wouldn't distract current users.
That's a good idea. Once (if?) I merge my branch to the trunk, I will get back to you.
...
Also, I've a strong suspicion that others will suggest the code be modified, using maybe some BOOST_PP magic, to lessen the number of template instantiations. In particular, I suspect this will be suggested for where foldr_pack is used in the supertypes of the sequences. I'd think the technique you used in your test driver (in the VERTICAL part) could be used with some benefit there. The interesting thing is, the original mpl didn't do that; instead it used just plain template recursion. OTOH, maybe it wouldn't really be much use, but who knows?
I think both methods (unpacked vertical, horizontal) should be implemented and enabled/disabled on-the-fly based on empirically determined test results of the specific compiler (version). At least it makes sense in fusion as only the implementation of fusion::vector would need to be adjusted. Regards -Christopher

On 09/26/09 10:40, Christopher Schmidt wrote: [snip]
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower. The at-operations differ all the more surprisingly.
=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s
[snip]
There's now a tuple.benchmark.zip in boost vault under the variadic_templates directory. It uses the compiler's -ftime-report flag to report times. The zip file also contains a short makefile (suffix .mk).
The report shows a large difference between gcc4.4 and gcc4.5 in the HORIZONTAL implementation. It shows with 4.5 there's not nearly as large a difference compared to the VERTICAL implementation. -regards, Larry

Larry Evans schrieb:
On 09/26/09 10:40, Christopher Schmidt wrote: [snip]
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower. The at-operations differ all the more surprisingly.
=== g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
=== g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
=== g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
=== g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
=== g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
=== g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s
[snip]
There's now a tuple.benchmark.zip in boost vault under the variadic_templates directory. It uses the compiler's -ftime-report flag to report times. The zip file also contains a short makefile (suffix .mk).
The report shows a large difference between gcc4.4 and gcc4.5 in the HORIZONTAL implementation. It shows with 4.5 there's not nearly as large a difference compared to the VERTICAL implementation.
-regards, Larry
Amazing results. I guess when it comes to gcc 4.5, the overall complexity of the code, rather than the raw number of template instantiations, ought to be the metric for measuring compile-time performance. Thanks for all your work! -Christopher

on Sat Sep 26 2009, Christopher Schmidt <mr.chr.schmidt-AT-online.de> wrote:
I attached a test driver to this post. The testcase instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then optionally a type (meta-at) and/or object lookup (at) is done on each element. I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower.
FYI, gcc 4.5 is going to look dramatically different (better) than 4.4.x for many metaprogramming tasks, because they're switching from linear to O(1) lookup of template instantiations.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

David Abrahams wrote:
on Sat Sep 26 2009, Christopher Schmidt <mr.chr.schmidt-AT-online.de> wrote:
I attached a test driver to this post. The testcase instantiates 25*25 distinct tuples (either vertical or horizontal ones) with 0 to 10 elements. Then optionally a type (meta-at) and/or object lookup (at) is done on each element. I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is pretty interesting that the instantiation of the horizontal tuple type is just slightly slower.
FYI, gcc 4.5 is going to look dramatically different (better) than 4.4.x for many metaprogramming tasks, because they're switching from linear to O(1) lookup of template instantiations.
Amazing!

Regards,
--
Joel de Guzman  http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel
Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon
participants (6)
- Christopher Schmidt
- dan marsden
- David Abrahams
- Joel de Guzman
- Larry Evans
- troy d. straszheim