
On 10/24/12 14:09, Eric Niebler wrote: [snip]
I presented at BoostCon my own benchmarks of tuple with and without preprocessing. The results were unambiguously and strongly in favor of unrolling with the preprocessor. Tested with gcc. The presentation is here:
https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/troubl...

Thanks. I took a look at it with:

  http://www.viewdocsonline.com/document/

and saw the comparison chart on slide 12. That chart, as you say above, unambiguously favors the unrolled tuple.
The source code is here:
I downloaded that and, AFAICT:

* The preprocessor method is in unrolled_tuple.hpp and is roughly the same as the vertical tuple implementation here:

    http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...

  The main difference, AFAICT, is that unrolled uses aggregation (via the member declaration

    tuple<Tail...> tail;

  on line 133), whereas the vertical tuple uses inheritance:

    struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK, TUPLE_IMPL_TYPE_NAME), Others...>
      : tuple_impl<Index+TUPLE_CHUNK, Others...>

  as shown on line 42 of the .hpp file.

  I'm still trying to understand how get works. What puzzles me is:

    template<typename Tuple, int I>
    static inline constexpr auto get_elem(Tuple &&that, int_<I>)
    RETURN(
      impl<I-I>::get_elem(static_cast<Tuple &&>(that).tail, int_<I-UNROLL_MAX>())
    )

  Since I-I has got to be 0, why write impl<I-I> instead of impl<0>? Also, the impl template parameter, J, is not used anywhere. I'm sure I could figure out the reason eventually, but not yet :(. A brief explanation would help. It's also not obvious to me why

    static_cast<Tuple &&>(that)

  is needed, since that has already been declared as Tuple &&. I have no idea what the pros and cons of the two methods (unrolled vs vertical) are.

* The variadic template method is in tuple.cpp, which is close to the one here:

    http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...

  in that both methods use multiple inheritance, pairing an int key with the tuple element type. In tuple.cpp the pairing is done with:

    template<int I, typename T> struct tuple_elem

  while in tuple_impl_horizontal it is done with:

    template<typename Key, typename Value> struct element;
    template<int Key, typename Value> struct element<int_key<Key>,Value> { Value value; };

  The get functions are essentially the same.

After looking at the code (and the Makefile), it's not clear how the benchmark was done.
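To make the three storage layouts being compared concrete, here is a minimal sketch of each. It is my own illustration, not the code from either repository: the names agg_tuple, agg_getter, inh_impl, inh_tuple, inh_get, horiz_impl, horiz_tuple and horiz_get are hypothetical; only element and int_key follow the declarations quoted above; there is no preprocessor unrolling (one element per level); and C++14 is assumed for std::make_integer_sequence and auto& return types.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <string>
#include <utility>   // std::integer_sequence, std::make_integer_sequence

// ---- 1. Recursive storage by aggregation (cf. "tuple<Tail...> tail;") ----
template<typename... Ts> struct agg_tuple {};

template<typename Head, typename... Tail>
struct agg_tuple<Head, Tail...> {
    Head head;
    agg_tuple<Tail...> tail;  // remaining elements are a data member
};

// Element I is reached by stepping I times through .tail.
template<int I> struct agg_getter {
    template<typename Tuple>
    static auto& get(Tuple& t) { return agg_getter<I - 1>::get(t.tail); }
};
template<> struct agg_getter<0> {
    template<typename Tuple>
    static auto& get(Tuple& t) { return t.head; }
};

// ---- 2. Recursive storage by inheritance (cf. vertical tuple_impl) ----
template<int I, typename... Ts> struct inh_impl {};

template<int I, typename Head, typename... Tail>
struct inh_impl<I, Head, Tail...> : inh_impl<I + 1, Tail...> {
    Head value;  // element I; elements I+1 onward live in the base
};

template<typename... Ts> struct inh_tuple : inh_impl<0, Ts...> {};

// With I given explicitly, deduction matches the unique base inh_impl<I, ...>.
template<int I, typename Head, typename... Tail>
Head& inh_get(inh_impl<I, Head, Tail...>& base) { return base.value; }

// ---- 3. Flat multiple inheritance keyed by index (cf. element/int_key) ----
template<int K> struct int_key {};
template<typename Key, typename Value> struct element;
template<int K, typename Value>
struct element<int_key<K>, Value> { Value value; };

template<typename Indices, typename... Ts> struct horiz_impl;
template<int... Is, typename... Ts>
struct horiz_impl<std::integer_sequence<int, Is...>, Ts...>
    : element<int_key<Is>, Ts>... {};  // one direct base per element

template<typename... Ts>
struct horiz_tuple
    : horiz_impl<std::make_integer_sequence<int, sizeof...(Ts)>, Ts...> {};

// Value is deduced from the single base element<int_key<K>, Value>.
template<int K, typename Value>
Value& horiz_get(element<int_key<K>, Value>& e) { return e.value; }
```

For example, given horiz_tuple<int, double, std::string> h{};, the call horiz_get<2>(h) deduces Value = std::string from the base element<int_key<2>, std::string>, so the element is found by deduction against the bases in one step, whereas agg_getter<2> instantiates a chain of getters, one per level of nesting.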
The Makefile has nothing about timing in it, and readme.txt mentions nothing about timing either. The tuple.cpp code contains something called tree_builder, which sounds like it might be the benchmark code; however, so does unrolled_tuple.cpp. So, what benchmark was used to produce the chart on slide 12 of trouble_with_tuples.pptx?
I also thought it interesting that clang seems to do better than gcc, as reported here:
Interesting. I didn't test with clang.
If you provide the benchmark code, I'll try it with both clang and g++ and post the results.

-regards,
Larry