Interesting article on stack-based TMP

Hi All, This article came out of the reddit C++ section: http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-... It looks like TMP made easy (finally!). I am not so sure it is, though, and was wondering if anybody had any comments on it. Regards, Julien

On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming, I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes: template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {}; which is almost as good as the concatenative version. The extra <> noise is annoying in this case, but in the general case, where the syntax tree is more complex, the classic MPL version would actually be more readable. Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL? -- gpd

On 10/22/12 07:26, Giovanni Piero Deretta wrote:
On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming,
What is "point-free" programming?
I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes:
template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {};
Which is almost as good as the concatenative version.
What is "the concatenative" version?
The extra <> noise is annoying in this case, but in the general case where the syntax tree is more complex, the classic MPL version would actually be more readable.
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
-- gpd

On Mon, Oct 22, 2012 at 2:14 PM, Larry Evans <cppljevans@suddenlink.net> wrote:
On 10/22/12 07:26, Giovanni Piero Deretta wrote:
On Mon, Oct 22, 2012 at 1:14 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
While I'm a fan of point-free programming,
What is "point-free" programming?
in general, expressions without explicit arguments. [snip]
Which is almost as good as the concatenative version.
What is "the concatenative" version?
The concatenative version is the one given in the article. In general, concatenative programming is a form of point-free programming where function calls are just juxtaposed, without any explicit delimitation of subexpressions. Stack languages are a subset of concatenative languages. For a more in-depth description see: http://en.wikipedia.org/wiki/Concatenative_programming_language HTH, -- gpd

I'm the author of that article, and it's coincidental that I was actually thinking about gauging interest here in the library implementation of it that I'm working on. The library is coming along great and has support for C++03, full continuations, and Boost.MPL wrapper words. If there's sufficient interest I'll consider submitting it for formal review when I'm done.
It looks like TMP made easy (finally !)
"Easy" is not the correct word, but "easier" is absolutely true.
While I'm a fan of point-free programming, I think the comparison is not completely fair; if you allow C++11 template using syntax, then the classic MPL metafunction notation becomes:
template<typename Predicate, typename List> struct foo : add<square<length<filter<Predicate, List>>>, int_<5>> {};
Which is almost as good as the concatenative version. The extra <> noise is annoying in this case, but in the general case where the syntax tree is more complex, the classic MPL version would actually be more readable.
You're correct, it's not a fair comparison; however, there are two additional points to consider:
- 'using' syntax is a huge syntactic improvement, although from what I understand it cannot be used in higher-order metafunctions without some additional boilerplate. Not really a problem for a library, though - I'd love to see Boost.MPL updated to support C++11.
- Functional TMP unfortunately lacks the infix operators found in many other functional languages and thus loses a lot of their readability.
(Hopefully I sent this email correctly, I'm unfamiliar with mailing lists) -- Patrick

On 10/22/12 07:26, Giovanni Piero Deretta wrote: [snip]
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
Anyone attempting such a rewrite and planning to use the variadic template features might want to start from: http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/mpl/ However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using: http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html because of large compile times. -regards, Larry

On 23-10-2012 15:05, Larry Evans wrote:
On 10/22/12 07:26, Giovanni Piero Deretta wrote: [snip]
Now, to make this thread on-topic: is anybody working on a C++11 rewrite of Boost.MPL?
Anyone attempting such a rewrite and planning to use the variadic template features might want to start from:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/boost/mpl/
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm. This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates? -Thorsten

(dropping the boost-users list...) On 10/23/2012 7:27 AM, Thorsten Ottosen wrote:
On 23-10-2012 15:05, Larry Evans wrote:
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm.
This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates?
I don't want the results of my early experimentation with variadic templates to be used to draw conclusions about the feature. What Larry says is true, but in part it was due to my own naivete. Variadic templates are very good for *some* things, like perfect forwarding. Other things can be done in very tricky ways of which I was not aware at the time, like getting O(1) access to the back of a template parameter pack. And yeah, sometimes preprocessing can still improve compile-times. Without random access into a parameter pack, a data structure like std::tuple is a massive, bloated TMP hog. A fully variadic Proto *is* possible, and would probably compile faster than the non-variadic one. Creating such a library is a non-trivial undertaking, but you can see my progress thus far at: https://github.com/ericniebler/proto-0x I have only needed preprocessing to partly unroll proto::expr, which is a tuple-like data structure. I imagine that a fully variadic mpl::vector will also be problematic. -- Eric Niebler BoostPro Computing http://www.boostpro.com

(renaming from Re: Interesting article on stack-based TMP) On 10/23/12 11:05, Eric Niebler wrote:
(dropping the boost-users list...) [snip]
I don't want the results of my early experimentation with variadic templates to be used to draw conclusions about the feature. What Larry says is true, but in part it was due to my own naivete. Variadic templates are very good for *some* things, like perfect forwarding. Other things can be done in very tricky ways of which I was not aware at the time, like getting O(1) access to the back of a template parameter pack.
Could you please explain this O(1) trick briefly?
And yeah, sometimes preprocessing can still improve compile-times. Without random access into a parameter pack, a data structure like std::tuple is a massive, bloated TMP hog. [snip]
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them. I thought it also interesting that clang seems to do better than gcc, as reported here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54710#c10 HTH. -regards, Larry

On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers. I presented at BoostCon my own benchmarks of tuple with and without preprocessing. The results were unambiguously and strongly in favor of unrolling with the preprocessor. Tested with gcc. The presentation is here: https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/troubl... The source code is here: https://github.com/ericniebler/home/tree/master/src/tuple
I thought it also interesting that clang seems to do better than gcc, as reported here:
Interesting. I didn't test with clang. -- Eric Niebler BoostPro Computing http://www.boostpro.com

On 10/24/12 14:09, Eric Niebler wrote:
On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers.
Are they not here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/ ? I meant to suggest that they could be found there with:
while in the working directory produced by `svn checkout` on the above svn directory.
Of course maybe that was not plain enough. If so, sorry. I will try to remember to be more explicit. -regards, Larry

On 10/24/12 14:14, Larry Evans wrote:
On 10/24/12 14:09, Eric Niebler wrote:
On 10/24/2012 8:47 AM, Larry Evans wrote:
The attached is the output .txt file produced by running:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
while in the working directory produced by `svn checkout` on the above svn directory. It shows that with std::tuple, the compile times are 4 times as long as with the "horizontal" tuple implementation which uses no preprocessing. It also shows the "horizontal" is about the same as the "vertical" which does use preprocessing. The "horizontal" trick (as Douglas Gregor has explained elsewhere) is to use multiple inheritance where the tuple elements are "paired" with the key to retrieve them.
Without seeing the different header files referenced by your benchmark program, or the tuple client code, it's impossible for anyone to draw any conclusions from your numbers.
Are they not here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/
OOPS. Using the -MMD compiler option showed the above was missing: macros.benchmark.pp.hpp Just added that. Hopefully that's all the needed files. Does it work for you now? -regards, Larry

On 10/24/12 14:09, Eric Niebler wrote: [snip]
I presented at BoostCon my own benchmarks of tuple with and without preprocessing. The results were unambiguously and strongly in favor of unrolling with the preprocessor. Tested with gcc. The presentation is here:
https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/troubl... Thanks. I took a look at it with: http://www.viewdocsonline.com/document/ and saw the comparison chart on slide 12. That chart, as you say above, unambiguously favors the unrolled tuple.
The source code is here:
I downloaded that and AFAICT:
* The preprocessor method is in: unrolled_tuple.hpp and is roughly the same as the vertical tuple implementation here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... The main difference, AFAICT, is that unrolled uses aggregation (via the member declaration: tuple<Tail...> tail; on line 133). In contrast, the vertical tuple uses inheritance: struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK, TUPLE_IMPL_TYPE_NAME), Others...> : tuple_impl<Index+TUPLE_CHUNK, Others...> as shown on line 42 of the .hpp file. I'm still trying to understand how the get works. What's puzzling to me is: template<typename Tuple, int I> static inline constexpr auto get_elem(Tuple &&that, int_<I>) RETURN( impl<I-I>::get_elem(static_cast<Tuple &&>(that).tail, int_<I-UNROLL_MAX>()) ) since impl<I-I> has got to be 0, why use I-I? Also, the impl template parameter, J, is not used anywhere. I'm sure I could figure the reason out eventually, but not yet :(. A brief explanation would help. Also, it's not obvious to me why: static_cast<Tuple &&>(that) is needed, because that has been declared as Tuple &&. I've no idea what the pros and cons of the two methods (unrolled vs vertical) are.
* The variadic template method is in: tuple.cpp which is close to that here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/... in that both methods use multiple inheritance with an int key type paired with the tuple element type. In the case of tuple.cpp, the pairing is done with: template<int I, typename T> struct tuple_elem In tuple_impl_horizontal, pairing is done with: template<typename Key, typename Value> struct element; template<int Key, typename Value> struct element<int_key<Key>,Value> { Value value; }; The get functions are essentially the same.
After looking at the code (and Makefile) it's not clear how the benchmark was done.
The Makefile has nothing about timing in it, and the readme.txt mentions nothing about timing. Looking at the tuple.cpp code shows something with tree_builder in it, which sounds like it might be the benchmark code; however, so does unrolled_tuple.cpp. So, what is the benchmark used to produce the chart on slide 12 of trouble_with_tuples.pptx?
I thought it also interesting that clang seems to do better than gcc, as reported here:
Interesting. I didn't test with clang.
I'll try testing your benchmark, if you provide the code, with both clang and g++ and post the results. -regards, Larry

On 10/25/12 11:58, Larry Evans wrote: [snip]
On 10/24/12 14:09, Eric Niebler wrote: [snip]
The source code is here:
I downloaded that [snip] After looking at the code (and Makefile) it's not clear how the benchmark was done. The Makefile has nothing about timing in it, and the readme.txt mentions nothing about timing. Looking at the tuple.cpp code shows something with tree_builder in it, which sounds like it might be the benchmark code; however, so does unrolled_tuple.cpp. [snip] I should have looked more closely at the code, which would have shown that tuple.cpp and unrolled_tuple.cpp are the same starting at:
template<int I> struct tree_builder { So the benchmark code was just the code starting at that point, AFAICT. Right? -regards, Larry

On Thu, Oct 25, 2012 at 9:58 AM, Larry Evans <cppljevans@suddenlink.net>wrote: [...]
Also, it's not obvious to me why:
static_cast<Tuple &&>(that)
is needed because that has been declared as Tuple &&.
[...] The expression "that" (without quotes) is an lvalue, regardless of the declared type of the variable "that". Same reason that boost::forward/std::forward exists. It's like that (har har) to prevent you from unintentionally moving. - Jeff

On 10/25/12 11:58, Larry Evans wrote:
On 10/24/12 14:09, Eric Niebler wrote: [snip]
I presented at BoostCon my own benchmarks of tuple with and without preprocessing. [snip]
The source code is here:
I downloaded that and AFAICT:
* The preprocessor method is in:
unrolled_tuple.hpp
and is roughly the same as the vertical tuple implementation here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
The main difference, AFAICT, is that unrolled uses aggregation (via the member declaration:
tuple<Tail...> tail;
on line 133. In contrast, the vertical tuple uses inheritance:
struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK, TUPLE_IMPL_TYPE_NAME), Others...> : tuple_impl<Index+TUPLE_CHUNK, Others...>
as shown on line 42 of the .hpp file.
[snip]
* The variadic template method is in:
tuple.cpp
which is close to that here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/...
in that both methods use multiple inheritance with an int key type paired with the tuple element type. In the case of tuple.cpp, the pairing is done with:
template<int I, typename T> struct tuple_elem
in tuple_impl_horizontal, pairing is done with:
template<typename Key, typename Value> struct element ; template<int Key, typename Value> struct element<int_key<Key>,Value> { Value value; };
The get functions are essentially the same.
[snip] Actually, another, maybe significant, difference is the templated CTORs in the BoostCon implementations. The sandbox/slim implementations don't have them, relying just on the default CTORs for the tuple elements. Currently in the sandbox slim/test directory: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/ there are: tuple_impl_bcon12_horizontal.hpp tuple_impl_bcon12_vertical.hpp As indicated in comments within these files, the code was copied from files in: https://github.com/ericniebler/home/tree/master/src/tuple and then slightly modified to allow use in the tuple.benchmark.mini.cpp in the sandbox slim/test directory. The python program in that directory was also modified to use these 2 new implementations. The resulting test output is also in the sandbox slim/test directory, in file: tuple.benchmark.mini.time.txt It shows that the bcon12 implementations: tuple_impl_bcon12_horizontal.hpp tuple_impl_bcon12_vertical.hpp compile significantly more slowly than the original implementations: tuple_impl_horizontal.hpp tuple_impl_vertical.hpp I hadn't expected that since, as mentioned above, there didn't seem to be that much difference in the 2 implementations. Anyone have any theories on why that's the case? (I wouldn't think the templated CTORs would cause any slowdown since they're not used.) -regards, Larry

On 23-10-2012 18:05, Eric Niebler wrote:
(dropping the boost-users list...)
On 10/23/2012 7:27 AM, Thorsten Ottosen wrote:
On 23-10-2012 15:05, Larry Evans wrote:
However, using variadic templates may cause a slowdown in the compile times. IIRC, a variadic template version of proto was developed but was abandoned in favor of using:
http://www.boost.org/doc/libs/1_51_0/libs/preprocessor/doc/index.html
because of large compile times.
Hm.
This is really sad if variadic templates can't be used for the thing they were created for. Is this an inherent problem for large-scale use of variadic templates?
I guess some of my confusion comes from reading http://www.jot.fm/issues/issue_2008_02/article2.pdf some time ago. I think this article is why I thought they were a bonus for compilation time. It probably only affects the parsing time. -Thorsten

On 10/25/12 12:24, Thorsten Ottosen wrote: [snip]
I guess some of my confusion comes from reading
http://www.jot.fm/issues/issue_2008_02/article2.pdf
some time ago. I think this article is why I thought they were a bonus for compilation time.
In particular, page 33, in the last sentence before section 3, says: Section 4 demonstrates how variadic templates drastically reduce compilation times -Larry

on Thu Oct 25 2012, Larry Evans <cppljevans-AT-suddenlink.net> wrote:
On 10/25/12 12:24, Thorsten Ottosen wrote: [snip]
I guess some of my confusion comes from reading
http://www.jot.fm/issues/issue_2008_02/article2.pdf
some time ago. I think this article is why I thought they were a bonus for compilation time.
In particular, page 33, in the last sentence before section 3, says:
Section 4 demonstrates how variadic templates drastically reduce compilation times
I believe that they do/can. However, getting the most performance out of variadic templates involves avoiding O(N) recursive template instantiations, which can be maddeningly difficult to achieve for some problems. -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On 22/10/12 14:14, Julien Nitard wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
It's just an evaluator for reverse polish notation. You can type evaluate_rpn(f, g, x) instead of f(g(x)) (assuming f and g have fixed arities of 1, which the system is informed of). Hardly a revolution. I seriously doubt it makes debugging easier, and it will definitely only make compilation slower.

On 22/10/12 15:07, Mathias Gaunard wrote:
On 22/10/12 14:14, Julien Nitard wrote:
Hi All,
That article came out of reddit C++ section:
http://pubby8.wordpress.com/2012/10/15/stack-based-template-metaprogramming-...
It looks like TMP made easy (finally !). I am not that sure it is though and was wondering if anybody had any comments on it.
It's just an evaluator for reverse polish notation.
Sorry, that's just Polish notation, not reverse Polish notation.
participants (9)
- David Abrahams
- Eric Niebler
- Giovanni Piero Deretta
- Jeffrey Lee Hellrung, Jr.
- Julien Nitard
- Larry Evans
- Mathias Gaunard
- Patrick Bene
- Thorsten Ottosen