Interesting article on stack-based TMP

Hi All, This article came out of the reddit C++ section: http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-... It looks like TMP made easy (finally!). I am not so sure it is, though, and was wondering if anybody had any comments on it. Regards, Julien

On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming, I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes: template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {}; which is almost as good as the concatenative version. The extra <> noise is annoying in this case, but in the general case, where the syntax tree is more complex, the classic MPL version would actually be more readable. Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL? -- gpd

On 10/22/12 07:26, Giovanni Piero Deretta wrote:
On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming,
What is "point-free" programming?
I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes:
template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {};
Which is almost as good as the concatenative version.
What is "the concatenative" version?
The extra <> noise is annoying in this case, but in the general case where the syntax tree is more complex, the classic MPL version would actually be more readable.
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
-- gpd

On Mon, Oct 22, 2012 at 2:14 PM, Larry Evans <cppljevans@suddenlink.net> wrote:
On 10/22/12 07:26, Giovanni Piero Deretta wrote:
On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming,
What is "point-free" programming?
in general, expressions without explicit arguments. [snip]
Which is almost as good as the concatenative version.
What is "the concatenative" version?
The concatenative version is the one given in the article. In general, concatenative programming is a form of point-free programming where function calls are just juxtaposed, without any explicit delimitation of subexpressions. Stack languages are a subset of concatenative languages. For a more in-depth description see: http://en.wikipedia.org/wiki/Concatenative_programming_language HTH, -- gpd

I'm the author of that article, and it's coincidental that I was actually thinking about gauging interest here in the library implementation of it that I'm working on. The library is coming along great and has support for C++03, full continuations, and Boost.MPL wrapper words. If there's sufficient interest I'll consider submitting it for formal review when I'm done.
It looks like TMP made easy (finally !)
"Easy" is not the correct word, but "easier" is absolutely true.
While I'm a fan of point-free programming, I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes:
template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {};
Which is almost as good as the concatenative version. The extra <> noise is annoying in this case, but in the general case where the syntax tree is more complex, the classic MPL version would actually be more readable.
You're correct, it's not a fair comparison; however, there are two additional points to consider:
- 'using' syntax is a huge syntactic improvement, although from what I understand it cannot be used in higher-order metafunctions without some additional boilerplate. Not really a problem for a library, though - I'd love to see Boost.MPL updated to support C++11.
- Functional TMP unfortunately lacks the infix operators found in many other functional languages and thus loses a lot of their readability.
(Hopefully I sent this email correctly, I'm unfamiliar with mailing lists) -- Patrick

On 10/22/12 07:26, Giovanni Piero Deretta wrote: [snip]
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
Anyone attempting such a rewrite and planning to use the variadic template features might want to start from: http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/mpl/ However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using: http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html because of large compile times. -regards, Larry

On 23-10-2012 15:05, Larry Evans wrote:
On 10/22/12 07:26, Giovanni Piero Deretta wrote: [snip]
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
Anyone attempting such a rewrite and planning to use the variadic template features might want to start from:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/mpl/
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm. This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates? -Thorsten

(dropping the boost-users list...) On 10/23/2012 7:27 AM, Thorsten Ottosen wrote:
On 23-10-2012 15:05, Larry Evans wrote:
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm.
This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates?
I don't want the results of my early experimentation with variadic templates to be used to draw conclusions about the feature. What Larry says is true, but in part it was due to my own naivete. Variadic templates are very good for *some* things, like perfect forwarding. Other things can be done in very tricky ways of which I was not aware at the time, like getting O(1) access to the back of a template parameter pack. And yeah, sometimes preprocessing can still improve compile-times. Without random access into a parameter pack, a data structure like std::tuple is a massive, bloated TMP hog. A fully variadic Proto *is* possible, and would probably compile faster than the non-variadic one. Creating such a library is a non-trivial undertaking, but you can see my progress thus far at: https://github.com/ericniebler/proto-0x I have only needed preprocessing to partly unroll proto::expr, which is a tuple-like data structure. I imagine that a fully variadic mpl::vector will also be problematic. -- Eric Niebler BoostPro Computing http://www.boostpro.com

(renaming from Re: Interesting article on stack-based TMP) On 10/23/12 11:05, Eric Niebler wrote:
(dropping the boost-users list...) [snip]
I don't want the results of my early experimentation with variadic templates to be used to draw conclusions about the feature. What Larry says is true, but in part it was due to my own naivete. Variadic templates are very good for *some* things, like perfect forwarding. Other things can be done in very tricky ways of which I was not aware at the time, like getting O(1) access to the back of a template parameter pack.
Could you please explain this O(1) trick briefly?
And yeah, sometimes preprocessing can still improve compile-times. Without random access into a parameter pack, a data structure like std::tuple is a massive, bloated TMP hog. [snip]
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them. I thought it also interesting that clang seems to do better than gcc, as reported here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54710#c10 HTH. -regards, Larry

On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers. I presented at BoostCon my own benchmarks of tuple with and without preprocessing. The results were unambiguously and strongly in favor of unrolling with the preprocessor. Tested with gcc. The presentation is here: https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/troubl... The source code is here: https://github.com/ericniebler/home/tree/master/src/tuple
I thought it also interesting that clang seems to do better than gcc, as reported here:
Interesting. I didn't test with clang. -- Eric Niebler BoostPro Computing http://www.boostpro.com

On 10/24/12 14:09, Eric Niebler wrote:
On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers.
Are they not here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/ ? I meant to suggest that they could be found there with:
while in the working directory produced by `svn checkout` on the above svn directory.
Of course maybe that was not plain enough. If so, sorry. I will try to remember to be more explicit. -regards, Larry

On 10/24/12 14:14, Larry Evans wrote:
On 10/24/12 14:09, Eric Niebler wrote:
On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers.
Are they not here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/
OOPS. Using the -MMD compiler option showed the above was missing: macros.benchmark.pp.hpp Just added that. Hopefully that's all the needed files. Does it work for you now? -regards, Larry

On 10/24/12 14:09, Eric Niebler wrote: [snip]
I presented at BoostCon my own benchmarks of tuple with and without preprocessing. The results were unambiguously and strongly in favor of unrolling with the preprocessor. Tested with gcc. The presentation is here:
https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/troubl... Thanks. I took a look at it with: http://www.viewdocsonline.com/document/ and saw the comparison chart on slide 12. That chart, as you say above, unambiguously favors the unrolled tuple.
The source code is here:
I downloaded that and AFAICT:
* The preprocessor method is in: unrolled_tuple.hpp and is roughly the same as the vertical tuple implementation here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... The main difference, AFAICT, is that unrolled uses aggregation (via the member declaration: tuple<Tail...> tail; on line 133). In contrast, the vertical tuple uses inheritance: struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK, TUPLE_IMPL_TYPE_NAME), Others...> : tuple_impl<Index+TUPLE_CHUNK, Others...> as shown on line 42 of the .hpp file. I'm still trying to understand how the get works. What's puzzling to me is: template<typename Tuple, int I> static inline constexpr auto get_elem(Tuple &&that, int_<I>) RETURN( impl<I-I>::get_elem(static_cast<Tuple &&>(that).tail, int_<I-UNROLL_MAX>()) ) since impl<I-I> has got to be 0, why use I-I? Also, the impl template parameter, J, is not used anywhere. I'm sure I could figure the reason out eventually, but not yet :(. A brief explanation would help. Also, it's not obvious to me why: static_cast<Tuple &&>(that) is needed, because that has been declared as Tuple &&. I've no idea what the pros and cons of the two methods (unrolled vs vertical) are.
* The variadic template method is in: tuple.cpp which is close to that here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... in that both methods use multiple inheritance with an int key type paired with the tuple element type. In the case of tuple.cpp, the pairing is done with: template<int I, typename T> struct tuple_elem In tuple_impl_horizontal, pairing is done with: template<typename Key, typename Value> struct element; template<int Key, typename Value> struct element<int_key<Key>,Value> { Value value; }; The get functions are essentially the same.
After looking at the code (and Makefile) it's not clear how the benchmark was done.
The Makefile has nothing about timing in it, and the readme.txt mentions nothing about timing. Looking at the tuple.cpp code shows something with tree_builder in it, which sounds like it might be the benchmark code; however, so does unrolled_tuple.cpp. So, what is the benchmark used to produce the chart on slide 12 of trouble_with_tuples.pptx?
I thought it also interesting that clang seems to do better than gcc, as reported here:
Interesting. I didn't test with clang.
I'll try testing your benchmark, if you provide the code, with both clang and g++ and post the results. -regards, Larry

On 10/25/12 11:58, Larry Evans wrote: [snip]
On 10/24/12 14:09, Eric Niebler wrote: [snip]
The source code is here:
I downloaded that [snip] After looking at the code (and Makefile) it's not clear how the benchmark was done. The Makefile has nothing about timing in it, and the readme.txt mentions nothing about timing. Looking at the tuple.cpp code shows something with tree_builder in it, which sounds like it might be the benchmark code; however, so does unrolled_tuple.cpp. [snip] I should have looked more closely at the code, which would have shown that tuple.cpp and unrolled_tuple.cpp are the same starting at:
template<int I> struct tree_builder { So the benchmark code was just the code starting at that point, AFAICT. Right? -regards, Larry

On Thu, Oct 25, 2012 at 9:58 AM, Larry Evans <cppljevans@suddenlink.net>wrote: [...]
Also, it's not obvious to me why:
static_cast<Tuple &&>(that)
is needed because that has been declared as Tuple &&.
[...] The expression "that" (without quotes) is an lvalue, regardless of the declared type of the variable "that". Same reason that boost::forward/std::forward exists. It's like that (har har) to prevent you from unintentionally moving. - Jeff

On 10/25/12 11:58, Larry Evans wrote:
On 10/24/12 14:09, Eric Niebler wrote: [snip]
I presented at BoostCon my own benchmarks of tuple with and without preprocessing. [snip]
The source code is here:
I downloaded that and AFAICT:
* The preprocessor method is in:
unrolled_tuple.hpp
and is roughly the same as the vertical tuple implementation here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
The main difference, AFAICT, is that unrolled uses aggregation (via the member declaration:
tuple<Tail...> tail;
on line 133. In contrast, the vertical tuple uses inheritance:
struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK, TUPLE_IMPL_TYPE_NAME), Others...> : tuple_impl<Index+TUPLE_CHUNK, Others...>
as shown on line 42 of the .hpp file.
[snip]
* The variadic template method is in:
tuple.cpp
which is close to that here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
in that both methods use multiple inheritance with an int key type paired with the tuple element type. In the case of tuple.cpp, the pairing is done with:
template<int I, typename T> struct tuple_elem
in tuple_impl_horizontal, pairing is done with:
template<typename Key, typename Value> struct element ; template<int Key, typename Value> struct element<int_key<Key>,Value> { Value value; };
The get functions are essentially the same.
[snip] Actually, another, maybe significant, difference is the templated CTORs in the BoostCon implementations. The sandbox/slim implementations don't have them, relying just on the default CTORs for the tuple elements. Currently in the sandbox slim/test directory: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/ there are: tuple_impl_bcon12_horizontal.hpp tuple_impl_bcon12_vertical.hpp As indicated in comments within these files, the code was copied from files in: https://github.com/ericniebler/home/tree/master/src/tuple and then slightly modified to allow use in the tuple.benchmark.mini.cpp in the sandbox slim/test directory. The python program in that directory was also modified to use these 2 new implementations. The resulting test output is also in the sandbox slim/test directory, in file: tuple.benchmark.mini.time.txt It shows that the bcon12 implementations: tuple_impl_bcon12_horizontal.hpp tuple_impl_bcon12_vertical.hpp compile significantly more slowly than the original implementations: tuple_impl_horizontal.hpp tuple_impl_vertical.hpp I hadn't expected that since, as mentioned above, there didn't seem to be that much difference in the 2 implementations. Anyone have any theories on why that's the case? (I wouldn't think the templated CTORs would cause any slowdown since they're not used.) -regards, Larry

On 23-10-2012 18:05, Eric Niebler wrote:
(dropping the boost-users list...)
On 10/23/2012 7:27 AM, Thorsten Ottosen wrote:
On 23-10-2012 15:05, Larry Evans wrote:
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm.
This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates?
I guess some of my confusion comes from reading http://www.jot.fm/issues/issue_2008_02/article2.pdf some time ago. I think this article is why I thought they were a bonus for compilation time. It probably only affects the parsing time. -Thorsten

On 10/25/12 12:24, Thorsten Ottosen wrote: [snip]
I guess some of my confusion comes from reading
http://www.jot.fm/issues/issue_2008_02/article2.pdf
some time ago. I think this article is why I thought they were a bonus for compilation time.
In particular, page 33, in the last sentence before section 3, says: Section 4 demonstrates how variadic templates drastically reduce compilation times -Larry

on Thu Oct 25 2012, Larry Evans <cppljevans-AT-suddenlink.net> wrote:
On 10/25/12 12:24, Thorsten Ottosen wrote: [snip]
I guess some of my confusion comes from reading
http://www.jot.fm/issues/issue_2008_02/article2.pdf
some time ago. I think this article is why I thought they were a bonus for compilation time.
In particular, page 33, in the last sentence before section 3, says:
Section 4 demonstrates how variadic templates drastically reduce compilation times
I believe that they do/can. However, getting the most performance out of variadic templates involves avoiding O(N) recursive template instantiations, which can be maddeningly difficult to achieve for some problems. -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On 22/10/12 14:14, Julien Nitard wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
It's just an evaluator for reverse polish notation. You can type evaluate_rpn(f, g, x) instead of f(g(x)) (assuming f and g have fixed arities of 1, which the system is informed of). Hardly a revolution. I seriously doubt it makes debugging easier, and it will definitely only make compilation slower.

On 22/10/12 15:07, Mathias Gaunard wrote:
On 22/10/12 14:14, Julien Nitard wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
It's just an evaluator for reverse polish notation.
Sorry, that's just Polish notation, not reverse Polish notation.
participants (9)
- David Abrahams
- Eric Niebler
- Giovanni Piero Deretta
- Jeffrey Lee Hellrung, Jr.
- Julien Nitard
- Larry Evans
- Mathias Gaunard
- Patrick Bene
- Thorsten Ottosen