Re: [boost] [Fusion] port to c++0x

26 Sep 2009

      Larry Evans schrieb:
...
There's no type_at in:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
There is, however, a tuple<,>::at_type<I> nested struct.  Is that what
you're referring to?  If so, I don't understand why that would be
expensive since neither at_type or what it calls, at_elem, use
recursion. Could you explain why at_type would be expensive?  Or maybe
you've run some timing tests on it.  If so, could you provide the test
driver so I could test to see why it's so expensive?
Now, AFAICT, the number of template instantiations is, neglecting
those needed for the package_range_c at:
https://svn.boost.org/trac/boost/browser/sandbox/variadic_templates/libs/mpl...
which (as Doug mentioned near the bottom of his 12 Jan 2009 post) are
amortized, the number of "overhead" template instantiations is N+C
where N is the number of elements and C is a small constant.  I think
C would be around 1, i.e. 1 for _Tuple_impl.  The N is for the N
_Tuple_element<Tag,Value> values where Tag is some instance of
mpl::integral_c<IndexType,IndexValue>.  By "overhead" I means the
number needed besides the N for each element in the tuple and the 1
for the overall tuple (in the case of vector, that would be
mpl::vector).
Does that make sense?
TIA.
-Larry
I am sorry to have kept you waiting so long. I have been busy with 
moving to a new flat.

Consider an instantiation of a tuple with 7 elements, the types foo, 
bar, int, char, blub, dub and mup.

My fusion vector will directly evolve to

vector<foo,bar,int,char,blub,dub,mup>
   : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
       : detail::vector_impl<4,blub,dub,mup>

As you elaborated, your tuple will evolve to

tuple<char, foo,bar,int,char,blub,dub,mup>
   : _Tuple_impl<
         mpl::package_c<char,0,1,2,3,4,4,5,6,7>,
         foo,bar,int,char,blub,dub,mup
     >
       : _Tuple_element<mpl::integral_c<char, 0>, foo>
       , _Tuple_element<mpl::integral_c<char, 1>, bar>
       , _Tuple_element<mpl::integral_c<char, 2>, int>
       , _Tuple_element<mpl::integral_c<char, 3>, char>
       , _Tuple_element<mpl::integral_c<char, 4>, blub>
       , _Tuple_element<mpl::integral_c<char, 5>, dub>
       , _Tuple_element<mpl::integral_c<char, 6>, mup>

Summed up, there are N/4+1 template instantiations for my vector and 
N+2+x for your tuple, with x being the amount of instantiations for 
mpl::package_range_forward. If mpl::package_range_forward<...> is not 
cached, x will be N+C. Also you need to consider the overhead of 
template expressions that are not instantiated but still are evaluated.

I think your at_type implementation is expensive as a complete template 
function invocation expression is evaluated. This involves a lot of 
steps in the lookup process that are not necessary in a meta-only 
implementation. Also the lookup process should scale commensurate to N. 
I cannot see how this should be faster than a controlled meta-recursion 
with a break condition.

On top of that your implementation is limited to compiler with decltype 
support. I tried to avoid implementations in which certain c++0x 
features depend on others. A user, whose compiler does support variadic 
templates but not decltype, should still be able to enjoy the variadic 
template candy. This is not much of a problem in our specific case but 
in other cases an independent implementation is crucial.

Either way, there are tons of pros and cons for both of our 
implementations. I really do like your implementation. It looks and 
feels native. Maybe it faster than mine. Maybe it is not. Who knows? 
Compiler are too different to let one state performance predictions a 
priori.
I judge compile-time performance of an expression by the number steps 
(and their orders of growth) involved into evaluating it, because I 
think, this is what comes closest to actual real-life performance in 
most cases. This led to my conclusion that an unrolled recursive 
implementation might be faster.

I attached a test driver to this post. The testcase instantiates 25*25 
distinct tuples (either vertical or horizontal ones) with 0 to 10 
elements. Then optionally a type (meta-at) and/or object lookup (at) is 
done on each element.
I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is 
pretty interesting that the instantiation of the horizontal tuple type 
is just slightly slower. The at-operations differ all the more surprisingly.

===  g++ -std=c++0x -c test.cpp ===
Execution time: 5.274 s
===  g++ -std=c++0x -c test.cpp -DAT ===
Execution time: 11.821 s
===  g++ -std=c++0x -c test.cpp -DMETA_AT ===
Execution time: 12.161 s
===  g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
Execution time: 19.565 s
===  g++ -std=c++0x -c test.cpp -DVERTICAL ===
Execution time: 4.884 s
===  g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
Execution time: 7.136 s
===  g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
Execution time: 5.835 s
===  g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
Execution time: 8.156 s

I hope I made sense.

Larry Evans schrieb:
...
Yeah, I saw that 4 a lot of places.  I think other libraries use some
macro to set such a number.  For example, BOOST_MPL_LIMIT_UNROLLING
here:
https://svn.boost.org/trac/boost/browser/trunk/boost/mpl/aux_/iter_fold_if_i...
...
Are there plans to add a similar macro to variadic fusion vector?
This is a good idea. I will put it on my TODO-list. Thanks!

By the way, your variadic mpl implementation looks great. Do you have 
any plans to integrate your work into the trunk?

-Christopher

template<int>
struct identity_int
{};

#ifdef VERTICAL
template<int Index,typename... Args>
struct tuple_impl;
template<int Index>
struct tuple_impl<Index>
{};
template<int Index,typename H0>
struct tuple_impl<Index,H0>
{
	typedef H0 t0;
	H0 h0;
	H0& get(identity_int<Index+0>){return h0;}
	H0 const& get(identity_int<Index+0>)const{return h0;}
};
template<int Index,typename H0, typename H1>
struct tuple_impl<Index,H0,H1>
{
	typedef H0 t0; typedef H1 t1;
	H0 h0;
	H0& get(identity_int<Index+0>){return h0;}
	H0 const& get(identity_int<Index+0>)const{return h0;}
	H1 h1;
	H1& get(identity_int<Index+1>){return h1;}
	H1 const& get(identity_int<Index+1>)const{return h1;}
};

template<int Index,typename H0, typename H1, typename H2>
struct tuple_impl<Index,H0,H1,H2>
{
	typedef H0 t0; typedef H1 t1; typedef H2 t2;
	H0 h0;
	H0& get(identity_int<Index+0>){return h0;}
	H0 const& get(identity_int<Index+0>)const{return h0;}
	H1 h1;
	H1& get(identity_int<Index+1>){return h1;}
	H1 const& get(identity_int<Index+1>)const{return h1;}
	H2 h2;
	H2& get(identity_int<Index+2>){return h2;}
	H2 const& get(identity_int<Index+2>)const{return h2;}
};
template<int Index,typename H0, typename H1, typename H2, typename H3>
struct tuple_impl<Index,H0,H1,H2, H3>
{
	typedef H0 t0; typedef H1 t1; typedef H2 t2; typedef H3 t3;
	H0 h0;
	H0& get(identity_int<Index+0>){return h0;}
	H0 const& get(identity_int<Index+0>)const{return h0;}
	H1 h1;
	H1& get(identity_int<Index+1>){return h1;}
	H1 const& get(identity_int<Index+1>)const{return h1;}
	H2 h2;
	H2& get(identity_int<Index+2>){return h2;}
	H2 const& get(identity_int<Index+2>)const{return h2;}
	H3 h3;
	H3& get(identity_int<Index+3>){return h3;}
	H3 const& get(identity_int<Index+3>)const{return h3;}
};
template<int Index,typename H0, typename H1, typename H2, typename H3, typename... Others>
struct tuple_impl<Index,H0,H1,H2, H3,Others...>:
	tuple_impl<Index+4,Others...>
{
	typedef tuple_impl<Index+4,Others...> base;
	typedef H0 t0; typedef H1 t1; typedef H2 t2; typedef H3 t3;
	using base::get;
	H0 h0;
	H0& get(identity_int<Index+0>){return h0;}
	H0 const& get(identity_int<Index+0>)const{return h0;}
	H1 h1;
	H1& get(identity_int<Index+1>){return h1;}
	H1 const& get(identity_int<Index+1>)const{return h1;}
	H2 h2;
	H2& get(identity_int<Index+2>){return h2;}
	H2 const& get(identity_int<Index+2>)const{return h2;}
	H3 h3;
	H3& get(identity_int<Index+3>){return h3;}
	H3 const& get(identity_int<Index+3>)const{return h3;}
};
template<typename... Args>
struct tuple:
	tuple_impl<0,Args...>
{};

template<typename Tuple, int Index>
struct meta_value_at:
	meta_value_at<typename Tuple::base, Index-4>
{};
template<typename Tuple>
struct meta_value_at<Tuple,0>
{
	typedef typename Tuple::t0 type;
};
template<typename Tuple>
struct meta_value_at<Tuple,1>
{
	typedef typename Tuple::t1 type;
};
template<typename Tuple>
struct meta_value_at<Tuple,2>
{
	typedef typename Tuple::t2 type;
};
template<typename Tuple>
struct meta_value_at<Tuple,3>
{
	typedef typename Tuple::t3 type;
};
#else
template<int Max,int... Args>
struct make_package:
	make_package<Max-1,Max-1,Args...>
{};
template<int...>
struct package;
template<int... Args>
struct make_package<0,Args...>
{
	typedef package<Args...> type;
};

template<typename, typename Arg>
struct element{Arg a;};
template<typename Keys, typename... Args>
struct tuple_impl;
template<int... Indices, typename... Args>
struct tuple_impl<package<Indices...>, Args...>:
	element<identity_int<Indices>, Args>... 
{};

template<typename... Args>
struct tuple:
	tuple_impl<typename make_package<sizeof...(Args)>::type, Args...> 
{
	template<int I, typename T>
	static T& at_elem(element<identity_int<I>,T>& a){return a.a;}
	template<int I, typename T>
	static T const& at_elem(element<identity_int<I>,T>const& a){return a.a;}

	template<int I, typename T>
	static T at_type_helper(element<identity_int<I>,T>& a); 
	template<int I>
	struct at_type
	{
		typedef decltype(at_type_helper<I>(*reinterpret_cast<tuple*>(0))) type;
	};
};
#endif

template<int I,int J, int Max, typename... Args>
struct get_tuple:
	get_tuple<I,J,Max-1,identity_int<I*1000+J*10+Max>,Args...>
{};
template<int I,int J, typename... Args>
struct get_tuple<I,J,0, Args...>
{
	typedef tuple<Args...> type;
};

template<int I,int J=0>
struct test_impl
{
	typedef typename get_tuple<I, J, J%10>::type tuple_type;

	template<int K>
	static void at_test(tuple_type& t,identity_int<K>)
	{
#ifdef META_AT
#	ifdef VERTICAL
		typename meta_value_at<tuple_type,K>::type();
#	else
		typename tuple_type::template at_type<K>::type();
#	endif
#endif
#ifdef AT
#	ifdef VERTICAL
		t.get(identity_int<K>());
#	else
		tuple_type::template at_elem<K>(t);
#	endif
#endif

		at_test(t,identity_int<K+1>());
	}
	static void at_test(tuple_type&,identity_int<J%10>){}

	static void exec()
	{
		tuple_type t;
		at_test(t,identity_int<0>());
		test_impl<I,J+1>::exec();
	}
};
template<int I>
struct test_impl<I,25>
{
	static void exec(){}
};

template<int I=0>
struct test
{
	static void exec()
	{
		test_impl<I>::exec();
		test<I+1>::exec();
	}
};
template<>
struct test<25>
{
	static void exec(){}
};

int main()
{
	test<>::exec();
}

Re: [boost] [Fusion] port to c++0x

Christopher Schmidt