[offtopic] C++11 useful trick

I recently ran into a limitation in recursive functions that make use of decltype in trailing return types. I've come up with a work-around that I think is a little non-obvious, so I thought I'd share.

The problem came up while trying to implement something like a variadic back() function. The "obvious" implementation doesn't work:

template<typename T>
T back(T && t) { return t; }

template<typename T, typename ...U>
auto back(T && t, U &&... u)
  -> decltype(::back(static_cast<U &&>(u)...)) // ERROR
{
    return ::back(static_cast<U &&>(u)...);
}

int main()
{
    int i = ::back(1,2,3);
}

The problem occurs on the line marked ERROR. Trouble is, the variadic "back" overload isn't in scope until after its trailing return type has been seen. However, we'd like to use it when specifying the return type.

The solution uses a defaulted function template parameter to make the ::back symbol dependent, thereby deferring its lookup until the second phase of two-phase lookup, when the symbol is visible. Check it:

struct S
{
    template<typename T>
    static T back(T && t) { return t; }

    template<typename T, typename ...U, typename Impl = S>
    static auto back(T && t, U &&... u)
      -> decltype(Impl::back(static_cast<U &&>(u)...))
    {
        return Impl::back(static_cast<U &&>(u)...);
    }
};

int main()
{
    int i = S::back(1,2,3);
}

The trick is the "typename Impl = S" followed by a (now dependent) invocation of Impl::back. Voila.

Apologies if this is a known technique, and I'm spamming the list with old news. It's news to me.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

On Jul 1, 2012, at 4:20 PM, Eric Niebler wrote:
[snip]
Isn’t it the case that the back function without the U parameter pack has the same function signature as the back function with an empty U parameter pack, which would be the source of your troubles in the first case? The struct-based case simply ensures that the two backs will always have differing signatures, since there will always be at least two template parameters for the second function, but this isn’t due to any special struct-related property. Admittedly, I’m still working to understand variadic templates, so I might be off-base here...

Steve

On 7/1/2012 9:53 PM, Steve Ramsey wrote:
[snip]
Isn’t it the case that the back function without the U parameter pack has the same function signature as the back function with an empty U parameter pack, which would be the source of your troubles in the first case? The struct-based case simply ensures that the two backs will always have differing signatures since there will always be at least two template parameters for the second function, but this isn’t due to any special struct-related property. Admittedly, I’m still working to understand variadic templates, so I might be off-base here...
You're off base. Although the variadic back *could* be called with 1 argument, overload resolution would prefer the non-variadic function over the variadic one. But that's irrelevant in this case. Look at the invocation: ::back(1,2,3). That can *only* call the variadic one. It peels off one element and then tries to call ::back(2,3). That also could only possibly call the variadic one. However, it can't, for the simple reason that the compiler doesn't know about it until after the trailing return type is parsed. But by making the invocation of back in the trailing return type dependent on a template argument, it forces name lookup to happen later, when the template arguments are known.

HTH,

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

On 2012-07-02 01:20, Eric Niebler wrote:
[snip]
Hmm, I would have done something like this:

// -----------------------------------------------------------------
namespace detail
{
    template<typename T, typename ...U>
    struct back_t
    {
        typedef typename back_t<U...>::type type;
    };

    template<typename T>
    struct back_t<T>
    {
        typedef T type;
    };
}

template<typename T>
T back(T && t) { return t; }

template<typename T, typename ...U>
typename detail::back_t<U...>::type back(T && t, U &&... u)
{
    return ::back(static_cast<U &&>(u)...);
}

int main()
{
    int i = ::back(1,2,3);
}
// -----------------------------------------------------------------

The advantages I see are that

* nobody will be confused by the Impl parameter.
* no need to put back() into a struct

Do you see any drawbacks?

Regards, Roland

On 7/2/2012 1:12 AM, Roland Bock wrote:
[snip]
The advantages I see are that
* nobody will be confused by the Impl parameter.
* no need to put back() into a struct
Do you see any drawbacks?
Yeah, it's booo-ring. ;-) Also, it needlessly instantiates O(N) templates. And why write both a function template *and* a class template when the function template alone will do? Now you have to maintain two computations in parallel. I hate writing metafunctions to compute things the compiler already knows. And in less trivial examples, the metafunction wouldn't be so easy to write.

BTW, only the implementation of back needs to go in a struct. The user-visible function could fwd to it (a quick sketch of what I mean is at the end of this message).

This is just a trivial example of the technique. I've been doing a lot of C++11 making use of automatic type deduction via decltype+trailing return types. This issue comes up fairly often for me, so finding a solution to this problem was satisfying and greatly simplified my code.

BTW, you can also solve this problem by making the recursive call unqualified and relying on ADL, but then you're using ADL and have bigger issues.
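The fwd-ing arrangement I mentioned would look something like this (just a sketch; the free back() is added purely for illustration, and struct S is the one from my first message):

struct S
{
    template<typename T>
    static T back(T && t) { return t; }

    template<typename T, typename ...U, typename Impl = S>
    static auto back(T && t, U &&... u)
      -> decltype(Impl::back(static_cast<U &&>(u)...))
    {
        return Impl::back(static_cast<U &&>(u)...);
    }
};

// user-visible function: no Impl parameter, it just forwards to the implementation
template<typename ...T>
auto back(T &&... t) -> decltype(S::back(static_cast<T &&>(t)...))
{
    return S::back(static_cast<T &&>(t)...);
}

int main()
{
    int i = back(1,2,3);
    (void)i;
}

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com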

On 02/07/2012 19:29, Eric Niebler wrote:
Yeah, it's booo-ring. ;-) Also, it needlessly instantiates O(N) templates. And why write both a function template *and* a class template when the function template alone will do? Now you have to maintain two computations in parallel. I hate writing metafunctions to compute things the compiler already knows. And in less trivial examples, the metafunction wouldn't be so easy to write. BTW, only the implementation of back needs to go in a struct. The user-visible function could fwd to it.
Have you done benchmarks that show your approach is significantly faster? Use of a C++11 feature just for the sake of it doesn't seem like such a good idea to me. Especially since BOOST_TYPEOF in the return type tends to be quite fragile, so it cannot really be emulated.

On 7/2/2012 11:18 AM, Mathias Gaunard wrote:
[snip]
Have you done benchmarks that show your approach is significantly faster?
No.
Use of a C++11 feature just for the sake of it doesn't seem like such a good idea to me.
<baffle> Have I not given enough good reasons above? Maybe I'll just keep my good ideas to myself next time. :-/
Especially since BOOST_TYPEOF in the return type tends to be quite fragile, so it cannot really be emulated.
True. This is only for C++11 code. That's why the subject says "C++11 useful trick". :-/

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

On 02/07/2012 21:41, Eric Niebler wrote:
<baffle> Have I not given enough good reasons above? Maybe I'll just keep my good ideas to myself next time. :-/
Tricks are nice, but they can also obfuscate code somewhat and make it hard to maintain. If the trick is very useful, it may become an idiom. If its use is limited, it's more of a hack. Don't know what kind of trick this one is just yet. Sorry if I offended you.

On Monday, July 02, 2012 12:41:22 PM Eric Niebler wrote:
Use of a C++11 feature just for the sake of it doesn't seem like such a good idea to me.
<baffle> Have I not given enough good reasons above? Maybe I'll just keep my good ideas to myself next time. :-/
Please don't. This one is actually pretty widely applicable if you are doing type deduction in C++11; it took me quite a while to figure this one out when I ran into it. I, too, fail to see how the reasons you gave were not perfectly clear. Regards, Ravi

On 7/2/2012 7:26 PM, Ravi wrote:
On Monday, July 02, 2012 12:41:22 PM Eric Niebler wrote:
Use of a C++11 feature just for the sake of it doesn't seem like such a good idea to me.
<baffle> Have I not given enough good reasons above? Maybe I'll just keep my good ideas to myself next time. :-/
Please don't. This one is actually pretty widely applicable if you are doing type deduction in C++11; it took me quite a while to figure this one out when I ran into it. I, too, fail to see how the reasons you gave were not perfectly clear.
Thanks Ravi. For the record, this trick is also useful for CRTP bases when you need to static_cast this from base to derived in a return type:

template<typename Derived>
struct Base
{
    template<typename D = Derived>
    auto foo() -> decltype(static_cast<D *>(this)->bar())
    {
        return static_cast<D *>(this)->bar();
    }
};

struct Derived : Base<Derived>
{
    int bar() { return 0; }
};

int main()
{
    Derived d;
    d.foo();
}

If you try to do this instead:

auto foo() -> decltype(static_cast<Derived *>(this)->bar())

it won't compile because the compiler can't tell that the static_cast is legit. You might think this would work:

auto foo() -> decltype(std::declval<Derived &>().bar())

...but then it can't find member bar() in Derived.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

On Jul 3, 2012, at 3:13 AM, Eric Niebler wrote:
[snip]
Thanks Ravi. For the record, this trick is also useful for CRTP bases when you need to static_cast this from base to derived in a return type:
Something like this also showed up in Matt Calabrese's suggested BOOST_ENABLE_IF() macro that appeared in this thread:

http://lists.boost.org/Archives/boost/2011/08/184465.php

I didn't really understand the introduction of the dummy template parameter at the time, and the explanation Matt gave was (rough paraphrase, I can't find the exact quote):

  "The BoostDetailEnableIfDependentType template parameter exists in order to ensure that the later condition is dependent on a template argument."

which was pretty opaque to me at the time, but with Eric's explanation I think maybe I now understand.
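If I finally have it right, the point is something like this (my own toy illustration of why the extra, defaulted parameter matters; this is not Matt's actual macro):

#include <type_traits>

template<typename T>
struct wrapper
{
    // D defaults to T, so the enable_if condition stays dependent and is
    // only checked when get() is actually used. Spell the condition directly
    // in terms of T and instantiating wrapper<double> becomes a hard error,
    // because enable_if<false, int>::type doesn't exist.
    template<typename D = T,
             typename std::enable_if<std::is_integral<D>::value, int>::type = 0>
    int get() const { return 42; }
};

int main()
{
    wrapper<int> w;     // fine, and w.get() is callable
    wrapper<double> v;  // also fine; get() simply isn't callable here
    (void)v;
    return w.get() == 42 ? 0 : 1;
}

Same deferral idea as Eric's "typename Impl = S", just with an enable_if condition instead of a recursive call.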

On 2012-07-02 19:29, Eric Niebler wrote:
Yeah, it's booo-ring. ;-) Also, it needlessly instantiates O(N) templates. And why write both a function template *and* a class template when the function template alone will do? Now you have to maintain two computations in parallel. I hate writing metafunctions to compute things the compiler already knows. And in less trivial examples, the metafunction wouldn't be so easy to write. BTW, only the implementation of back needs to go in a struct. The user-visible function could fwd to it.
Boring? Hey! :-)

The other arguments are certainly valid, though.

I used the attached versions to compare compilation times, using clang with -O3:

Eric:   real 0m0.471s  user 0m0.420s  sys 0m0.030s
Roland: real 0m0.512s  user 0m0.470s  sys 0m0.020s

Obviously the measurement is rather crude and imprecise, but your version definitely compiles faster. And it is also much shorter and easier to set up for such a comparison :-)
This is just a trivial example of the technique. I've been doing a lot of C++11 making use of automatic type deduction via decltype+trailing return types. This issue comes up fairly often for me, so finding a solution to this problem was satisfying and greatly simplified my code.

I'll look through some of the code I've written recently. I guess I can apply this trick here and there.
Thanks for sharing! Regards, Roland

On 07/03/2012 09:50 AM, Roland Bock wrote:
I used the attached versions to compare compilation times:
Using clang with -O3
Eric: real 0m0.471s user 0m0.420s sys 0m0.030s
Roland: real 0m0.512s user 0m0.470s sys 0m0.020s
Obviously the measurement is rather crude and imprecise, but your version definitely compiles faster. And it is also much shorter and easier to setup for such a comparison :-)
Given your test, I suggest you benchmark the preprocessed output rather than the source directly.

On 2012-07-03 16:52, Mathias Gaunard wrote:
Given your test, I suggest you benchmark the preprocessed output rather than the source directly.
OK, created preprocessed versions with clang -E and measured again, but the results are similarly close (and way too much under the influence of the sheer amount of bytes that need to be processed). I came up with a new test, see attachments:

Without optimization:
Eric:   real 0m11.457s  user 0m11.080s  sys 0m0.330s
Roland: real 0m11.228s  user 0m10.790s  sys 0m0.400s

With -O3:
Eric:   real 0m2.910s  user 0m2.730s  sys 0m0.150s
Roland: real 0m2.867s  user 0m2.680s  sys 0m0.160s

Interestingly enough, Eric's version takes a bit longer to compile on my machine:

clang version 3.2 (trunk 155315)
Target: x86_64-unknown-linux-gnu

And clang crashes when I add another row of parameters in Eric's version. No problems with my version... Can somebody try with a different compiler? Or is this test complete nonsense for some reason?

Regards, Roland

On 7/3/2012 9:22 AM, Roland Bock wrote:
On 2012-07-03 16:52, Mathias Gaunard wrote:
Given your test, I suggest you benchmark the preprocessed output rather than the source directly.
OK, created preprocessed versions with clang -E and measured again, but the results are similarly close (and way too much under the influence of the sheer amount of bytes that need to be processed).
I came up with a new test, see attachments:
Without optimization: Eric: real 0m11.457s user 0m11.080s sys 0m0.330s
Roland: real 0m11.228s user 0m10.790s sys 0m0.400s
With -O3: Eric: real 0m2.910s user 0m2.730s sys 0m0.150s
Roland: real 0m2.867s user 0m2.680s sys 0m0.160s
Interestingly enough, Eric's version takes a bit longer to compile on my machine:
Too close to be very meaningful. I find that with TMP, the real costs don't become evident until you have a non-trivial program.
clang version 3.2 (trunk 155315) Target: x86_64-unknown-linux-gnu
And clang crashes when I add another row of parameters in Eric's version. No problems with my version...
I hope you filed a bug. :-)
Can somebody try with a different compiler? Or is this test complete nonsense for some reason?
-- Eric Niebler BoostPro Computing http://www.boostpro.com

On 2012-07-03 19:36, Eric Niebler wrote:
On 7/3/2012 9:22 AM, Roland Bock wrote:
On 2012-07-03 16:52, Mathias Gaunard wrote:
Given your test, I suggest you benchmark the preprocessed output rather than the source directly.
OK, created preprocessed versions with clang -E and measured again, but the results are similarly close (and way too much under the influence of the sheer amount of bytes that need to be processed).
I came up with a new test, see attachments:
Without optimization: Eric: real 0m11.457s user 0m11.080s sys 0m0.330s
Roland: real 0m11.228s user 0m10.790s sys 0m0.400s
With -O3: Eric: real 0m2.910s user 0m2.730s sys 0m0.150s
Roland: real 0m2.867s user 0m2.680s sys 0m0.160s
Interestingly enough, Eric's version takes a bit longer to compile on my machine:
Too close to be very meaningful. I find that with TMP, the real costs don't become evident until you have a non-trivial program.
clang version 3.2 (trunk 155315) Target: x86_64-unknown-linux-gnu
And clang crashes when I add another row of parameters in Eric's version. No problems with my version...
I hope you filed a bug. :-)
Will do :-)

On 2012-07-03 19:50, Roland Bock wrote:
On 2012-07-03 19:36, Eric Niebler wrote:
And clang crashes when I add another row of parameters in Eric's version. No problems with my version...
I hope you filed a bug. :-)
Will do :-)

On 2012-07-03 20:52, Roland Bock wrote:
On 2012-07-03 19:50, Roland Bock wrote:
On 2012-07-03 19:36, Eric Niebler wrote:
And clang crashes when I add another row of parameters in Eric's version. No problems with my version...
I hope you filed a bug. :-)
Will do :-)
This is going way over my current abilities, but the ticket has caught attention:

<snip>
--- Comment #5 from Richard Smith <richard-llvm@metafoo.co.uk> 2012-07-03 16:31:55 CDT ---
To be clear, we have three problems on the original code:

1) We don't appear to apply the instantiation depth limit in this case (we still hit a stack overflow with -ftemplate-depth=1).

2) The stack usage per instantiation is sometimes high enough to blow out the stack before we would have hit the limit anyway (depending on ulimits).

3) The compile-time (and, potentially, runtime) performance is quadratic in the number of arguments (but this is fundamental to the way the code has been written).

Your second attachment avoids the first problem by using class templates (for which the limit is enforced), and mitigates the second problem (because the stack usage per instantiation is lower for that case).
</snip>

- original code: Eric's version
- second attachment: my "boring" version

Richard also offers an alternative (see ticket link above) which he claims to be much more effective, but I haven't comprehended it yet...

On 7/3/2012 11:12 PM, Roland Bock wrote:
On 2012-07-03 20:52, Roland Bock wrote:
On 2012-07-03 19:50, Roland Bock wrote:
On 2012-07-03 19:36, Eric Niebler wrote:
And clang crashes when I add another row of parameters in Eric's version. No problems with my version... I hope you filed a bug. :-) Will do :-)
This is going way over my current abilities, but the ticket has caught attention:
<snip>
--- Comment #5 from Richard Smith <richard-llvm@metafoo.co.uk> 2012-07-03 16:31:55 CDT --- To be clear, we have three problems on the original code:
1) We don't appear to apply the instantiation depth limit in this case (we still hit a stack overflow with -ftemplate-depth=1).
Clang bug.
2) The stack usage per instantiation is sometimes high enough to blow out the stack before we would have hit the limit anyway (depending on ulimits).
Clang QoI. Hope they can get a handle on this one.
3) The compile-time (and, potentially, runtime) performance is quadratic in the number of arguments (but this is fundamental to the way the code has been written).
After scratching my head about this one, I think what he's getting at is the number of times arguments have to be copied and pushed on/popped off the stack: each recursive call forwards the remaining arguments, so for N arguments that's roughly N + (N-1) + ... + 1 argument passes. I can see how this can get expensive. The other version you posted, Roland, has the same problem.
Your second attachment avoids the first problem by using class templates (for which the limit is enforced), and mitigates the second problem (because the stack usage per instantiation is lower for that case).
He seems to be saying that instantiating a class template is cheaper (or requires less stack space?) than instantiating a function template. This surprises me.
- original code: Eric's version - second attachment: my "boring" version Richard also offers an alternative (see ticket link above) which he claims to be much more effective, but I haven't comprehended it yet...
Yes, I get it. He's using a variant of the variadic tuple trick. He wants to find the Nth element in a parameter pack. So he puts all the elements in a tuple-like thingy (select_impl) which holds individual elements, each in its own base class (select_base). But only the Nth base actually holds onto its argument. (See the partial specialization of select_base.) His solution instantiates O(N) instances of select_base, but gets you the Nth element in O(1). Fiendishly clever.
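For anyone who doesn't want to dig through the attachment, the shape of it is roughly this. It's my own reconstruction from the description above, not Richard's actual code; select_base and select_impl are his names, the rest (indices, make_indices, select, back) I made up for the sketch:

#include <cstddef>
#include <utility>

template<std::size_t ...I> struct indices {};

// the usual recursive index generator: make_indices<3>::type is indices<0, 1, 2>
template<std::size_t N, std::size_t ...I>
struct make_indices : make_indices<N - 1, N - 1, I...> {};

template<std::size_t ...I>
struct make_indices<0, I...> { typedef indices<I...> type; };

// element I of the pack when it is *not* the one we want: discard it
template<std::size_t I, std::size_t N, typename T>
struct select_base
{
    explicit select_base(T &&) {}
};

// element N of the pack: hold on to it
template<std::size_t N, typename T>
struct select_base<N, N, T>
{
    T && value;
    explicit select_base(T && t) : value(static_cast<T &&>(t)) {}
};

// one shallow base per element; no recursion over the pack itself
template<typename Indices, typename ...T>
struct select_impl;

template<std::size_t ...I, typename ...T>
struct select_impl<indices<I...>, T...> : select_base<I, sizeof...(T) - 1, T>...
{
    explicit select_impl(T &&... t)
      : select_base<I, sizeof...(T) - 1, T>(static_cast<T &&>(t))... {}
};

// deduction picks out the unique base whose first index equals N
template<std::size_t N, typename T>
T && select(select_base<N, N, T> & b)
{
    return static_cast<T &&>(b.value);
}

// back() itself; needs at least one argument
template<typename ...T>
auto back(T &&... t)
  -> decltype(select<sizeof...(T) - 1>(
       std::declval<select_impl<typename make_indices<sizeof...(T)>::type, T...> &>()))
{
    select_impl<typename make_indices<sizeof...(T)>::type, T...> impl(static_cast<T &&>(t)...);
    return select<sizeof...(T) - 1>(impl);
}

int main()
{
    int i = back(1,2,3); // the temporaries live until the end of the full expression
    return i == 3 ? 0 : 1;
}

Only the last base stores anything, and select<N> deduces exactly that base, so there's no recursive peeling of arguments to get at the final element.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com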

On 2012-07-04 09:50, Eric Niebler wrote:
- original code: Eric's version - second attachment: my "boring" version Richard also offers an alternative (see ticket link above) which he claims to be much more effective, but I haven't comprehended it yet... Yes, I get it. He's using a variant of the variadic tuple trick. He wants to find the Nth element in a parameter pack. So he puts all the elements in a tuple-like thingy (select_impl) which holds individual elements, each in its own base class (select_base). But only the Nth base actually holds onto its argument. (See the partial specialization of select_base.) His solution instantiates O(N) instances of select_base, but gets you the Nth element in O(1). Fiendishly clever.
Huh! Thanks for the explanation! Really awesome! :-)

on Wed Jul 04 2012, Eric Niebler <eric-AT-boostpro.com> wrote:
He's using a variant of the variadic tuple trick. He wants to find the Nth element in a parameter pack. So he puts all the elements in a tuple-like thingy (select_impl) which holds individual elements, each in its own base class (select_base). But only the Nth base actually holds onto its argument. (See the partial specialization of select_base.) His solution instantiates O(N) instances of select_base, but gets you the Nth element in O(1). Fiendishly clever.
That explanation doesn't match what I'm seeing in his latest attachment (http://llvm.org/bugs/attachment.cgi?id=8838&action=edit), which doesn't require storing anything. Which is pretty cool!

--
Dave Abrahams
BoostPro Computing
Software Development Training
http://www.boostpro.com
Clang/LLVM/EDG Compilers  C++  Boost

On 7/28/2012 7:10 PM, Dave Abrahams wrote:
on Wed Jul 04 2012, Eric Niebler <eric-AT-boostpro.com> wrote:
He's using a variant of the variadic tuple trick. He wants to find the Nth element in a parameter pack. So he puts all the elements in a tuple-like thingy (select_impl) which holds individual elements, each in its own base class (select_base). But only the Nth base actually holds onto its argument. (See the partial specialization of select_base.) His solution instantiates O(N) instances of select_base, but gets you the Nth element in O(1). Fiendishly clever.
That explanation doesn't match what I'm seeing in his latest attachment
Right, I was commenting on this: http://llvm.org/bugs/attachment.cgi?id=8825
(http://llvm.org/bugs/attachment.cgi?id=8838&action=edit), which doesn't require storing anything. Which is pretty cool!
His latest implementation is very, very cool and I've already found another use for his technique. I'm slightly dismayed that we need to be so very clever to use variadic templates efficiently. This stuff is not obvious AT ALL.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

On 07/03/2012 06:22 PM, Roland Bock wrote:
Can somebody try with a different compiler? Or is this test complete nonsense for some reason?
With GCC 4.7.0, no optimization:

Roland: real 4.87  user 4.36  sys 0.20
Eric:   Exhausts all my ram (8 GB) after 24.27 seconds

With -O3:

Roland: real 1.28  user 0.95  sys 0.18
Eric:   Exhausts all my ram (8 GB) after 25.07 seconds

On 07/03/12 12:46, Mathias Gaunard wrote:
On 07/03/2012 06:22 PM, Roland Bock wrote:
Can somebody try with a different compiler? Or is this test complete nonsense for some reason?
With GCC 4.7.0, no optimization
Roland: real 4.87 user 4.36 sys 0.20
Eric: Exhausts all my ram (8 GB) after 24.27 seconds
With -O3
Roland: real 1.28 user 0.95 sys 0.18
Eric: Exhausts all my ram (8 GB) after 25.07 seconds
I'm surprised that with more optimization (-O3) less compile time is taken (1.28 < 4.87). The gcc 4.7 docs here:

http://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Optimize-Options.html#Optimize-Options

claim:

"Optimizing compilation takes somewhat more time, and a lot more memory for a large function."

Anyone know why, in this particular case, -O3 is faster than with no optimization?

-regards, Larry

On 07/03/2012 08:19 PM, Larry Evans wrote:
Anyone know why, in this particular case, -O3 is faster than with no optimization?
Without optimizations:

Execution times (seconds)
 phase setup          : 0.01
 phase parsing        : 0.10
 phase lang. deferred : 0.54
 phase cgraph         : 3.50
 phase generate       : 4.04
 |name lookup         : 0.08
 |overload resolution : 0.30

With optimizations:

Execution times (seconds)
 phase setup          : 0.01
 phase parsing        : 0.09
 phase lang. deferred : 0.52
 phase cgraph         : 0.40
 phase generate       : 0.92
 |name lookup         : 0.03
 |overload resolution : 0.34

Optimizations done by the frontend resulted in far fewer symbols to compute the call graph for.

With GCC 4.7.0, no optimization
[snip]
Eric: Exhausts all my ram (8 GB) after 24.27 seconds
With -O3
[snip]
Eric: Exhausts all my ram (8 GB) after 25.07 seconds
You may want to report that as a GCC bug. I experienced similar symptoms when reporting PR52748 [1]; it could well be the same root cause.

Regards, Nate

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52748

On 7/3/2012 12:18 PM, Nathan Ridge wrote:
With GCC 4.7.0, no optimization
[snip]
Eric: Exhausts all my ram (8 GB) after 24.27 seconds
With -O3
[snip]
Eric: Exhausts all my ram (8 GB) after 25.07 seconds
You may want to report that as a GCC bug.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53846
I experienced similar symptoms when reporting PR52748 [1], it could well be the same root cause.
Regards, Nate
This is not likely to be the same bug. N3276 is about decltype causing the recursive instantiation of a class template -- unconditionally. In this case, a call to the back function succeeds if you pass fewer parameters, but fails with more. Seems to merely indicate an inefficient algorithm, not a fundamental problem.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
participants (9)
- Dave Abrahams
- Eric Niebler
- Kim Barrett
- Larry Evans
- Mathias Gaunard
- Nathan Ridge
- Ravi
- Roland Bock
- Steve Ramsey