[math] pow redux and observations

I noticed that the trunk now has corrections for pow<N> to reduce template instantiations. However, the "N=1" specialization isn't correct. The second template parameter (bool odd) should be "true," not "false." I could be mistaken that 1 is an odd number, though. :-)

As it is now, the whole template would work, but would ignore the 1/false specialization and create a new 1/true specialization that just multiplies "base" by the result of 0/false. Changing 1/false to 1/true will just cause it to return "base," as desired.

Note also, please, that positive_power<0,false> returns 1 when base is 0. Since 0^0 is technically undefined, the "N=0" case should probably be handled at a higher level (in power_if_positive, perhaps), and cause an error, just like negative powers of 0. If that is done, the 0/false case could be removed from positive_power.
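For reference, a minimal sketch of the specialization in question, assuming the two-parameter positive_power<N, odd> layout described above (return types are simplified to T here; this is an illustration, not the exact trunk code):

template <int N, bool odd>
struct positive_power;          // general case, as in the trunk

// 1 is odd, so the terminating specialization has to be <1, true>;
// a <1, false> specialization is simply never selected, and the
// general template then builds a fresh <1, true> instantiation that
// multiplies "base" by the result of <0, false>.
template <>
struct positive_power<1, true>
{
    template <typename T>
    static T result(T base) { return base; }
};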

John Moeller <fishcorn <at> gmail.com> writes:
Note also, please, that positive_power<0,false> returns 1 when base is 0. Since 0^0 is technically undefined, the "N=0" case should probably be handled at a higher level (in power_if_positive, perhaps), and cause an error, just like negative powers of 0. If that is done, the 0/false case could be removed from positive_power.
Actually, a rationale could be provided for pow<0>(0) == 1. But it should be explained in the documentation. Alternatively, you could use a policy class, I suppose. Either way, pow<0> should probably be handled as a special case, not in positive_power.

Hello,
Actually, a rationale could be provided for pow<0>(0) == 1. But it should be explained in the documentation. Alternatively, you could use a policy class, I suppose. Either way, pow<0> should probably be handled as a special case, not in positive_power.
As returning 1 makes sense in most cases, I'd be inclined to simply state this behavior in the doc. We can use policies, but in this case I vote for having 2 different policies for negative powers of 0 and 0^0, with throw_on_error as a default for the first one and ignore_error (namely: return 1) for the second one. Bruno

Bruno Lalande wrote:
Hello,
Actually, a rationale could be provided for pow<0>(0) == 1. But it should be explained in the documentation. Alternatively, you could use a policy class, I suppose. Either way, pow<0> should probably be handled as a special case, not in positive_power.
As returning 1 makes sense in most cases, I'd be inclined to simply state this behavior in the doc.
We can use policies, but in this case I vote for having 2 different policies for negative powers of 0 and 0^0, with throw_on_error as a default for the first one and ignore_error (namely: return 1) for the second one.
Bruno, ping me if you need any help adding new policies to the existing framework. Do you have Sandbox SVN commit access BTW? Let me know if not and I'll sort that out as well (the procedure I've been testing out with this library is to make all substantive changes to the Sandbox version and only trivial fixes to Trunk and then periodically synchronise the Trunk and Sandbox versions). Cheers, John.

Bruno, ping me if you need any help adding new policies to the existing framework.
OK, I'll do this when I've finished comparing Hervé's and John's solutions proposed in the other thread in terms of performance on several compilers. All this could take a few days since I'm a bit busy this week...
Do you have Sandbox SVN commit access BTW? Let me know if not and I'll sort that out as well (the procedure I've been testing out with this library is to make all substantive changes to the Sandbox version and only trivial fixes to Trunk and then periodically synchronise the Trunk and Sandbox versions).
No, I don't have Sandbox access for now; it would indeed be helpful for sharing my work more easily. Bruno

John Moeller wrote:
I noticed that the trunk now has corrections for pow<N> to reduce template instantiations. However, the "N=1" specialization isn't correct. The second template parameter (bool odd) should be "true," not "false." I could be mistaken that 1 is an odd number, though. :-)
As it is now, the whole template would work, but would ignore the 1/false specialization and create a new 1/true specialization that just multiplies "base" by the result of 0/false. Changing 1/false to 1/true will just cause it to return "base," as desired.
Note also, please, that positive_power<0,false> returns 1 when base is 0. Since 0^0 is technically undefined, the "N=0" case should probably be handled at a higher level (in power_if_positive, perhaps), and cause an error, just like negative powers of 0. If that is done, the 0/false case could be removed from positive_power.
Right. I've fixed the partial specialization buglet, but please note that the current version is really just a stopgap: there have been many excellent suggestions on this list, and if Bruno can wrap them all together into a new improved version, that would probably be the best solution. John.

John Moeller <fishcorn <at> gmail.com> writes:
I noticed that the trunk now has corrections for pow<N> to reduce template instantiations. However, the "N=1" specialization isn't correct. The second template parameter (bool odd) should be "true," not "false." I could be mistaken that 1 is an odd number, though.
As it is now, the whole template would work, but would ignore the 1/false specialization and create a new 1/true specialization that just multiplies "base" by the result of 0/false. Changing 1/false to 1/true will just cause it to return "base," as desired.
One more thing. Note that this is an honest question from me: Can it be proportionally difficult for a compiler to inline a series of nested function calls the deeper the calls go? If the answer to this question is "yes," it may be advantageous to alter the base template of positive_power from this:

template <int N, bool odd>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<N-1, (N-1)%2>::result(base);
    }
};

to this:

template <int N, bool odd>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<2, false>::result(
                   positive_power<N/2, (N/2)%2>::result(base));
    }
};

The reason is that for pow<31>, the first generates calls that go 8 deep with 3 2/F calls:

31/T, 30/F, 15/T, 14/F, 7/T, 6/F, 3/F, 2/F

while the second generates calls that go 5 deep with 4 2/F calls:

31/T, 15/T, 7/T, 3/T, 1/T

The second generates one more 2/F call than the first, but the 2/F calls don't add to the depth of the call; they're sibling calls to the recursive calls. This may only be a minor problem for builtin types, but I could see this being a problem for a "mod-multiplication" type, for example, that can raise numbers to high powers.

John Moeller <fishcorn <at> gmail.com> writes:
One more thing. Note that this is an honest question from me:
Can it be proportionally difficult for a compiler to inline a series of nested function calls the deeper the calls go?
If the answer to this question is "yes," it may be advantageous to alter the base template of positive power from this:
...
Hervé Brönnimann should get credit for that suggestion, as it was his expansion to begin with.

John Moeller wrote:
Can it be proportionally difficult for a compiler to inline a series of nested function calls the deeper the calls go?
Probably yes.
If the answer to this question is "yes," it may be advantageous to alter the base template of positive power from this:
template <int N, bool odd>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<N-1, (N-1)%2>::result(base);
    }
};
to this:
template <int N, bool odd>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<2, false>::result(
                   positive_power<N/2, (N/2)%2>::result(base));
    }
};
The reason is that for pow<31>, the first generates calls that go 8 deep with 3 2/F calls:
31/T, 30/F, 15/T, 14/F, 7/T, 6/F, 3/F, 2/F
while the second generates calls that go 5 deep with 4 2/F calls:
31/T, 15/T, 7/T, 3/T, 1/T
The second generates one more 2/F call than the first, but the 2/F calls don't add to the depth of the call; they're sibling calls to the recursive calls. This may only be a minor problem for builtin types, but I could see this being a problem for a "mod-multiplication" type, for example, that can raise numbers to high powers.
Right, and it also reduces the number of template instantiations needed? Looks like an excellent idea to me. Regards, John.

Right, and it also reduces the number of template instantiations needed?
Yep, it instantiates almost half the number of templates instantiated by the first implementation. About inlining considerations: I don't think the compiler cares about the depth of the function calls it has to inline. I see optimization and inlining as a recursive algorithm that does its job exactly the same way however deep you are. But I'm not a compiler expert... Anyway, your solution is really good and I'll certainly use it in the next implementation. Thanks! Bruno

Hi,

I thought about something last night... since with the last proposal, the odd and even specializations are almost identical, it's possible to simplify the code and remove the second template parameter by doing this:

template <int N>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return positive_power<N%2>::result(base)
              *positive_power<2>::result(
                   positive_power<N/2>::result(base));
    }
};

At the cost of one additional template instantiation (power<0> will always be instantiated) we get rid of the second parameter and have no runtime evaluation, as advocated in the other thread.

Bruno
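For completeness, here is a sketch of the terminating specializations the snippet above relies on; they are left out of the mail (Bruno confirms later in the thread that he had them), and return types are simplified to T:

// assumes the primary positive_power<N> template from the mail above
template <>
struct positive_power<0>
{
    // the 0^0 question aside, this is the multiplicative identity
    template <typename T>
    static T result(T) { return 1; }
};

template <>
struct positive_power<1>
{
    template <typename T>
    static T result(T base) { return base; }
};

template <>
struct positive_power<2>
{
    // the squaring step used by the general template
    template <typename T>
    static T result(T base) { return base * base; }
};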

At the cost of one additional template instantiation (power<0> will always be instantiated) we get rid of the second parameter and have no runtime evaluation, as advocated in the other thread.
Oops... I realize that this falls back to the solution initially proposed by John Moeller, which had been rejected by Hervé due to the multiplication by 1. Sorry :-) Bruno

I don't remember the following being suggested:

--ges

template<int N>
struct positive_power_increment;

template<>
struct positive_power_increment<1>
{
    template<typename T>
    static T result(T pow2, T base) { return pow2 * base; }
};

template <>
struct positive_power_increment<0>
{
    template<typename T>
    static T result(T pow2, T base) { return pow2; }
};

template<int N>
struct positive_power
{
    template <typename T>
    static T result(T base)
    {
        T val = positive_power<N/2>::result(base);
        return positive_power_increment<N%2>::result(val*val, base);
    }
};

template<>
struct positive_power<1>
{
    template <typename T>
    static T result(T base) { return base; }
};

I don't remember the following being suggested:
Yep, it's a possibility. It's close to my last attempt, using the MPL:

template <int N>
struct even_positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return positive_power<2>::result(
                   positive_power<N/2>::result(base));
    }
};

template <int N>
struct odd_positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<2>::result(
                   positive_power<N/2>::result(base));
    }
};

template <int N>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return mpl::if_<
            mpl::greater_equal<mpl::int_<N%2>, mpl::int_<1> >,
            odd_positive_power<N>,
            even_positive_power<N>
        >::type::result(base);
    }
};

BTW, I have a question about the MPL. I would have preferred to write "equal<int_<N%2>, int_<0> >" followed by even_ then odd_, but it doesn't work. I'm apparently missing something in the use of mpl::equal. Could someone explain to me why mpl::equal<mpl::int_<1>, mpl::int_<0> >::type::value equals 1?

Thanks
Bruno
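As an aside (this is not answered in the thread; it is a likely explanation only): mpl::equal is the sequence-comparison algorithm, so it is not the right tool for comparing two integral constants; the element-wise comparison metafunction is mpl::equal_to, which is probably what was intended here. A minimal check:

#include <boost/mpl/equal_to.hpp>
#include <boost/mpl/int.hpp>

namespace mpl = boost::mpl;

// equal_to compares the ::value of two integral constant wrappers,
// so this is false_, as one would expect for 1 == 0.
typedef mpl::equal_to<mpl::int_<1>, mpl::int_<0> > one_equals_zero;

int main()
{
    return one_equals_zero::type::value ? 1 : 0;   // returns 0
}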

Bruno Lalande <bruno.lalande <at> gmail.com> writes:
Yep it's a possibility. It's close to my last attempt, using the MPL:
template <int N>
struct even_positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return positive_power<2>::result(
                   positive_power<N/2>::result(base));
    }
};

template <int N>
struct odd_positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return base*positive_power<2>::result(
                   positive_power<N/2>::result(base));
    }
};

template <int N>
struct positive_power
{
    template <typename T>
    static typename tools::promote_args<T>::type result(T base)
    {
        return mpl::if_<
            mpl::greater_equal<mpl::int_<N%2>, mpl::int_<1> >,
            odd_positive_power<N>,
            even_positive_power<N>
        >::type::result(base);
    }
};
Bruno, I think that you'll need a specialization for positive_power<1> *and* positive_power<2> with this. In fact, both of these otherwise generate a 4-deep infinitely-recursive function call at runtime. Starting at positive_power<2>:

positive_power<2>::result
even_positive_power<2>::result
positive_power<1>::result
odd_positive_power<1>::result
positive_power<2>::result

Starting at positive_power<1> yields the same result. Specializing only positive_power<1> will still cause an infinite recursion in positive_power<2>:

positive_power<2>::result
even_positive_power<2>::result
positive_power<1>::result
positive_power<2>::result

Note that these won't be infinite *template* generations, but infinite *runtime* calls.

One more thing. The argument is passed by value, which is ok for builtins. Will all those potentially excess temporaries be an issue for larger UDTs? Perhaps the argument type of result() should be wrapped so that it's a const reference if it's larger than a pointer.

John M.

John Moeller wrote:
One more thing. The argument is passed by value, which is ok for builtins. Will all those potentially excess temporaries be an issue for larger UDTs? Perhaps the argument type of result() should be wrapped so that it's a const reference if it's larger than a pointer.
It may be best to simply pass all argument types by const reference in this instance. When the function is instantiated and then inlined, the call overhead related to making a temporary copy of the argument doesn't need to be stripped away by the compiler during subsequent optimization, which can make it easier for the compiler to identify that it is a good idea to inline the function in the first place. There is no real drawback to passing a built-in type by const reference, particularly if your intention is for the compiler to inline the function anyway. Luke

I have to react to this. There is something called aliasing that prevents compilers from optimizing, and passing builtins by const ref is generally NOT a good idea performance-wise. There is something called the const-forwarding type that resolves to value for built-ins and const refs for UDT that might be useful here.

Finally, something that no one seems to be concerned about in this thread is the cost of all these templates in terms of bytes of object code. In such a low-level header, you can expect all these templates to end up in every UoR in your libraries, linked together into an executable. Keeping positive_ and negative_ out of your template names is going to save that many bytes in your object. You may think it's nothing. I've seen placeholders (_1 to _10) defined as static const Placeholder<N> take up to .5MB for essentially nothing (it's an empty class!). We saved that .5MB by declaring them extern, as is done in Boost. So the rule is: keep it as simple and short as possible while remaining readable code.

Sent via BlackBerry from T-Mobile

I have to react to this. There is something called aliasing that prevents compilers from optimizing, and passing builtins by const ref is generally NOT a good idea performance-wise.
There is something called the const-forwarding type that resolves to value for built-ins and const refs for UDT that might be useful here.
Usually, to solve this problem, I simply use call_traits::param_type; is there any objection to this?
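For reference, a minimal sketch of how call_traits<T>::param_type behaves, with a hypothetical cube<> helper standing in for positive_power (illustrative only, not the committed code). One caveat worth noting: param_type is a dependent type, so T can no longer be deduced from the argument and would have to be supplied explicitly at the call sites.

#include <boost/call_traits.hpp>

// Hypothetical stand-in for positive_power's result(), for illustration.
template <typename T>
struct cube
{
    // For small built-in types param_type is a by-value parameter (so no
    // aliasing worries); for larger user-defined types it is const T&.
    static T result(typename boost::call_traits<T>::param_type base)
    {
        return base * base * base;
    }
};

// usage (T given explicitly, since it cannot be deduced through param_type):
//   double d = cube<double>::result(3.0);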
Finally, something that no one seems to be concerned about in this thread is the cost of all these templates in terms of bytes of object code. In such a low-level header, you can expect all these templates to end up in every UoR in your libraries, linked together into an executable. Keeping positive_ and negative_ out of your template names is going to save that many bytes in your object. You may think it's nothing. I've seen placeholders (_1 to _10) defined as static const Placeholder<N> take up to .5MB for essentially nothing (it's an empty class!). We saved that .5MB by declaring them extern, as is done in Boost. So the rule is: keep it as simple and short as possible while remaining readable code.
OK, I'll take a closer look at how the executable size changes with the different implementations; it's indeed a point I had somewhat overlooked... Bruno

Bruno Lalande <bruno.lalande <at> gmail.com> writes:
...
There is something called the const-forwarding type that resolves to value for built-ins and const refs for UDT that might be useful here.
Usually to solve this problem I simply use call_traits::param_type, is there any objection to this?
That's precisely the tool I was thinking of. For some reason, I had it in my mind that it was constructed for an example, and not actually somewhere in boost.
Finally, something that no one seems to be concerned about in this thread [...]
OK, I'll take a closer look at how the executable size changes with the different implementations; it's indeed a point I had somewhat overlooked...
Can't that problem be solved by stripping symbols? John M.

With the following code I am unable to induce the compiler to ever fail to inline and produce a symbol in the symbol table. It doesn't matter if you pass by const ref, or what you name the function, if it is always inlined and that stuff is just compiled out of existence. I think this whole thread may be solving a problem that doesn't exist. If the compiler can succeed in inlining all N recursive function calls it doesn't matter whether there are 4, or 8, or what they are named.

In the end what (might) matter is whether you have many or few multiplication operations (with the split recursive algorithm). That same algorithm can be implemented as a loop in a single function that takes the exponent by value as a runtime argument, and the compiler would still have the opportunity to unroll the loop when the exponent is a constant where the function is inlined. I tried that, and upon inspection of the assembly the compiler (gcc 4.3.2 -O2) fails to unroll the loop.

I also inspected the assembly generated for my template and noticed something else that is interesting. The compiler takes the O(N) multiplications my lazily implemented positive_power template generates and optimizes them upon instantiation and inlining for N=21 to only 6 multiplications, meaning that the compiler is actually rewriting the algorithm for you to reduce the number of multiply operations. In this thread we have just been speculating about how best to do for the compiler what it is perfectly capable of doing on its own.

I vote we leave the pow template alone, since it isn't broken and doesn't need fixing. (Yes I know my vote doesn't count.)

Thanks,
Luke

//recursive template version
#include <iostream>

template <int N>
struct positive_power
{
    template <typename T>
    static T result(T base)
    {
        return positive_power<N-1>::result(base) * base;
    }
};

template <>
struct positive_power<0>
{
    template <typename T>
    static T result(T base) { return 1; }
};

template <>
struct positive_power<1>
{
    template <typename T>
    static T result(T base) { return base; }
};

int main(int argc, char** argv)
{
    std::cout << positive_power<21>::result(argc);
}

//assembly generated (note only 6 multiplies)
main:       0x89 movl  %edi,%esi
0x004007a2: 0x48 subl  $8,%rsp
0x004007a6: 0x0f imull %edi,%esi
0x004007a9: 0x0f imull %esi,%esi
0x004007ac: 0x0f imull %edi,%esi
0x004007af: 0x0f imull %esi,%esi
0x004007b2: 0x0f imull %esi,%esi
0x004007b5: 0x0f imull %edi,%esi
0x004007b8: 0xbf movl  $0x600c20,%edi

//loop version
#include <iostream>

inline int pow(int base, int exponent)
{
    if(exponent == 0)
        return 1;
    int result = base;
    for(int i = exponent; i > 1; i /= 2)
    {
        result *= result;
        if(i % 2)
            result *= base;
    }
    return result;
}

int main(int argc, char** argv)
{
    std::cout << pow(2, 8) << std::endl;
}

//assembly (note one multiply in a loop)
main:       pushl %r13
0x004008a2: movl  $8,%ecx
0x004008a7: movl  $3,%edx
0x004008ac: sarl  1,%ecx
0x004008ae: movl  $4,%esi
0x004008b3: pushl %r12
0x004008b5: pushl %rbp
0x004008b6: pushl %rbx
0x004008b7: subl  $264,%rsp
0x004008be: subl  $1,%edx
0x004008c1: je    0x4008d6
0x004008c3: imull %esi,%esi
0x004008c6: testb $1,%cl
0x004008c7: roll  $141,(%rcx)
0x004008ca: addb  $54,%al
0x004008cc: cmovne %eax,%esi
0x004008cf: sarl  1,%ecx
0x004008d1: subl  $1,%edx
0x004008d4: jne   0x4008c3
0x004008d6: movl  $0x600e20,%edi

On Thu, May 22, 2008 at 8:30 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
I vote we leave the pow template alone, since it isn't broken and doesn't need fixing. (Yes I know my vote doesn't count.)
Runtime-wise, it may be fine, yes. The gain is in going from O(N) template instantiations to O(log N), which helps at compile-time.

Hi, On Fri, May 23, 2008 at 3:11 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Thu, May 22, 2008 at 8:30 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
I vote we leave the pow template alone, since it isn't broken and doesn't need fixing. (Yes I know my vote doesn't count.)
Runtime-wise, it may be fine, yes.
The gain is in going from O(N) template instantiations to O(log N), which helps at compile-time.
Even at runtime I don't really agree, since the solution proposed here was the one I had originally proposed. And when Joaquin corrected it by proposing something that better optimizes the number of multiplications, my tests showed that the gain at runtime was real and that my compiler didn't do that by itself. This being said, I should redo my tests to be sure I didn't miss anything. Anyway, I think all this is a matter of compiler, and the best way is to take the compiler's hand and force it to write the right code, even for those compilers that would have been able to do it by themselves. Luke, which compiler do you use? Bruno

Bruno wrote:
The gain is in going from O(N) template instantiations to O(log N), which helps at compile-time.
Even at runtime I don't really agree, since the solution proposed here was the one I had originally proposed. And when Joaquin corrected it by proposing something that better optimizes the number of multiplications, my tests showed that the gain at runtime was real and that my compiler didn't do that by itself. This being said, I should redo my tests to be sure I didn't miss anything.
Anyway, I think all this is a matter of compiler, and the best way is to take the compiler's hand and force it to write the right code, even for those compilers that would have been able to do it by themselves.
Luke, which compiler do you use?
gcc 4.3.2 -O2.

I didn't mean we should use the O(N) algorithm I tested, I meant we shouldn't obsess about improving the O(log N) algorithm already checked into the trunk. Whether x^32 results in 8, 6, or 4 template instantiations probably doesn't matter as much as clarity or concise code. People should be able to use it as an example for how to write their own generic functions. Yes, it is better to force the compiler's hand than trust it. I certainly won't trust the compiler to unroll a loop for me. For built-in multiply operations the compiler can optimize the expression tree you give it, but for user-defined multiply operators on user-defined types it will give you N multiplies if that is what you ask for. I think we should get the algorithm right, for sure. Have you ever seen the compiler fail to inline pow? Thanks, Luke

Simonson, Lucanus J <lucanus.j.simonson <at> intel.com> writes:
Bruno wrote:
The gain is in going from O(N) template instantiations to O(log N), which helps at compile-time.
Even at runtime I don't really agree, since the solution proposed here was the one I had originally proposed. And when Joaquin corrected it by proposing something that better optimizes the number of multiplications, my tests showed that the gain at runtime was real and that my compiler didn't do that by itself. This being said, I should redo my tests to be sure I didn't miss anything.
Anyway, I think all this is a matter of compiler, and the best way is to take the compiler's hand and force it to write the right code, even for those compilers that would have been able to do it by themselves.
Luke, which compiler do you use?
gcc 4.3.2 -O2
I didn't mean we should use the O(N) algorithm I tested, I meant we shouldn't obsess about improving the O(log N) algorithm already checked into the trunk.
No, we shouldn't, but that's not really what we're doing here.
Whether x^32 results in 8, 6, or 4 template instantiations probably doesn't matter as much as clarity or concise code. People should be able to use it as an example for how to write their own generic functions.
That people should be able to use it as an example for how to write their own generic functions is precisely the reason we should worry about how many template instantiations are performed. Template instantiations cost development time, and just because they aren't felt at runtime doesn't mean they aren't important. If you can reduce them *in every case* (with a simple realignment of templates), then you should. At the same time, if you can reduce the number of excess temporaries in every case just by a wise choice of parameter type, then you should. It's not as though we're talking about creating compiler-specific code specializations just to squeeze an extra 5% of speed out of pow<>. We're talking about writing good generic code, and good generic code considers the compile-time factor too.
Yes, it is better to force the compiler's hand than trust it. I certainly won't trust the compiler to unroll a loop for me. For built-in multiply operations the compiler can optimize the expression tree you give it, but for user-defined multiply operators on user-defined types it will give you N multiplies if that is what you ask for. I think we should get the algorithm right, for sure.
Have you ever seen the compiler fail to inline pow?
We're just making suggestions here. Bruno's going to measure the results and do the best thing. John M.

John Moeller wrote:
That people should be able to use it as an example for how to write their own generic functions is precisely the reason we should worry about how many template instantiations are performed. Template instantiations cost development time, and just because they aren't felt at runtime doesn't mean they aren't important. If you can reduce them *in every case* (with a simple realignment of templates), then you should. At the same time, if you can reduce the number of excess temporaries in every case just by a wise choice of parameter type, then you should.
It's not as though we're talking about creating compiler-specific code specializations just to squeeze an extra 5% of speed out of pow<>. We're talking about writing good generic code, and good generic code considers the compile-time factor too.
Can we really predict the compile speed difference between one implementation and another? Take, for instance, this template-based condition for an odd exponent:

template <int N>
struct even_odd
{
    template <typename T>
    static const T& result(const T& t, const T& t2) { return t2; }
};

template <>
struct even_odd<1>
{
    template <typename T>
    static T result(const T& t, const T& t2) { return t * t2; }
};

template <int N>
struct positive_power
{
    template <typename T>
    static T result(T base)
    {
        return even_odd<N%2>::result(base,
                   positive_power<N/2>::result(base)
                 * positive_power<N/2>::result(base));
    }
};

Is it slower to compile than this version, which depends on the if statement being compiled away because it has a compile-time condition?

template <int N>
struct positive_power
{
    template <typename T>
    static T result(T base)
    {
        if(N%2)
            return base * positive_power<N/2>::result(base)
                        * positive_power<N/2>::result(base);
        return positive_power<N/2>::result(base)
             * positive_power<N/2>::result(base);
    }
};

I don't know. It depends on the order in which the compiler optimizes and whether it takes longer to instantiate even_odd or to compile away the compile-time conditional. It is subject to change as compilers improve. I don't think it matters. I would write it the second way if I were writing it, because it is less code, and I value that. I would also try to avoid code duplication over reducing the number of template instantiations. Here I rely on the compiler to compile away the multiply-by-1 identity:

template <int N>
struct positive_power
{
    template <typename T>
    static T result(T base)
    {
        return positive_power<N/2>::result(base)
             * positive_power<N/2>::result(base)
             * (N%2 ? base : T(1));
    }
};

which simplifies the code still further. Is that better? Again, it depends what you think is important. So, as we weigh the value of achieving one objective at the expense of another, it comes down to subjective judgments about what is best. It is subjective because we often can't quantify things and also because we all value different things to a different degree. I probably undervalue compile speed.

Thanks,
Luke

Hello, Luke, your considerations make sense, but one thing is sure: the compilation time increases with the number of distinct N values with which a class template gets instantiated, simply because the compiler has to compile each new instantiation. And this is the whole point of the solutions being discussed here. Tomorrow I hope I'll have the time to put together a complete perf test suite with different compilers (probably gcc-4, msvc-8, msvc-9) and show the results here. I'd like to measure, for each solution, the compilation time, the execution time, and the executable size. Bruno

Hi,

Finally I only had the time to test on Linux with GCC 4. Here are the executable size, compilation time and execution time of the test suite pow_test.cpp for each "pow.hpp.#" file attached to this mail. Each time is an average of 6 runs.

#0 (2758478 bytes): 17.476s, 2.174s
#1 (2746484 bytes): 17.367s, 2.228s
#2 (2747612 bytes): 17.345s, 2.217s
#3 (2748912 bytes): 17.326s, 2.157s
#4 (2748104 bytes): 17.331s, 2.201s
#5 (2790112 bytes): 17.809s, 2.181s

Given that an empty implementation of pow (always returning 1) gives a size of 2230722 bytes, a compilation time of 14.576s and an execution time of 1.941s, we can subtract this baseline from every result above to see more precisely what the deltas between solutions really represent:

#0 (527756 bytes): 2.900s, 0.233s
#1 (515762 bytes): 2.791s, 0.287s
#2 (516890 bytes): 2.769s, 0.276s
#3 (518190 bytes): 2.750s, 0.216s
#4 (517382 bytes): 2.755s, 0.260s
#5 (559390 bytes): 3.233s, 0.240s

In the end, the differences are very small. We notice a big compile-time overhead for #5, due to the use of the MPL, which immediately rules it out. The best compromise seems to be obtained with #3, which gives good compilation and execution times. Its executable size is a bit high, but not as high as for the only solution giving a better execution time, that is #0. So I'm going to make a new version based on #3 if everybody's OK with that.

Bruno

Hi John,

I've committed the new implementation in the sandbox, and began to see how to handle the 0^0 case. I'm going to need help, because things are not exactly how I thought they were.

As I said, what I'd like is to have a default behavior that returns 1. I thought I could use the already existing ignore_error policy, but I realize its behavior is imposed by the framework. For domain_error, which is the policy that best fits the 0^0 case IMO, ignore_error returns quiet_NaN, which is not what I want. I can't use a user_error either, since it would force me to predefine the user_domain_error function, thus taking this point of customization away from the user. So I'm realizing I'll have to adapt the framework (that's probably what you meant the other day by "adding new policies"; I didn't notice at that moment).

I don't know Boost.Math policies very well, but my understanding is that we don't need a new policy, just a new action type that I could use to return 1. I see 2 possibilities: creating a brand new action type, or allowing the existing "ignore_error" action to take in its constructor the value that we want it to return in place of quiet_NaN.

Or maybe you prefer to simply return quiet_NaN instead of 1? Note that it's not the behavior of the C pow function so it could be misleading for users.

What do you think about that?

Thanks
Bruno

Bruno Lalande wrote:
As I said, what I'd like is to have a default behavior that returns 1. I thought I could use the already existing ignore_error policy, but I realize its behavior is imposed by the framework. For domain_error, which is the policy that best fits the 0^0 case IMO, ignore_error returns quiet_NaN, which is not what I want. I can't use a user_error either, since it would force me to predefine the user_domain_error function, thus taking this point of customization away from the user. So I'm realizing I'll have to adapt the framework (that's probably what you meant the other day by "adding new policies"; I didn't notice at that moment).
I don't know Boost.Math policies very well, but my understanding is that we don't need a new policy, just a new action type that I could use to return 1. I see 2 possibilities: creating a brand new action type, or allowing the existing "ignore_error" action to take in its constructor the value that we want it to return in place of quiet_NaN.
Or maybe you prefer to simply return quiet_NaN instead of 1? Note that it's not the behavior of the C pow function so it could be misleading for users.
What do you think about that?
I don't like the idea of returning a NaN: returning 1 seems to be the right thing to do, and should probably be the default behaviour? The other problem with reusing a domain_error is that the default behaviour is to throw an exception, which may not be what users expect for this corner case? So I guess I'm leaning towards yet another error action, "zero_power_error" maybe?

On the other hand.... Googling around, it's clear that 0^0 has an indeterminate value, and C99 says of the pow function:

"A domain error occurs if x is finite and negative and y is finite and not an integer value. A domain error may occur if x is zero and y is less than or equal to zero."

However, in the appendix under recommended practice it lists lots of special cases:

F.9.4.4 The pow functions
- pow(±0, y) returns ±∞ and raises the "divide-by-zero" floating-point exception for y an odd integer < 0.
- pow(±0, y) returns +∞ and raises the "divide-by-zero" floating-point exception for y < 0 and not an odd integer.
- pow(±0, y) returns ±0 for y an odd integer > 0.
- pow(±0, y) returns +0 for y > 0 and not an odd integer.
- pow(-1, ±∞) returns 1.
- pow(+1, y) returns 1 for any y, even a NaN.
- pow(x, ±0) returns 1 for any x, even a NaN.
- pow(x, y) returns a NaN and raises the "invalid" floating-point exception for finite x < 0 and finite non-integer y.
- pow(x, -∞) returns +∞ for |x| < 1.
- pow(x, -∞) returns +0 for |x| > 1.
- pow(x, +∞) returns +0 for |x| < 1.
- pow(x, +∞) returns +∞ for |x| > 1.
- pow(-∞, y) returns -0 for y an odd integer < 0.
- pow(-∞, y) returns +0 for y < 0 and not an odd integer.
- pow(-∞, y) returns -∞ for y an odd integer > 0.
- pow(-∞, y) returns +∞ for y > 0 and not an odd integer.
- pow(+∞, y) returns +0 for y < 0.
- pow(+∞, y) returns +∞ for y > 0.

The key bit here is: "pow(x, ±0) returns 1 for any x, even a NaN." So the result should be 1! Personally I can't help wondering if they just overlooked the 0^0 case :-(

Grrr, still not sure what to do here, how about:

* If the domain_error action is "ignore_error" then return 1, otherwise,
* Raise a domain_error as normal?

Not sure if this helps,
John.
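A rough sketch of what that last suggestion could look like, assuming the policy class exposes its selected domain_error action as a nested domain_error_type typedef (as the library's own error-handling dispatch does); this illustrates the proposed behaviour and is not a patch:

#include <boost/math/policies/policy.hpp>
#include <boost/math/policies/error_handling.hpp>
#include <boost/type_traits/is_same.hpp>

template <class T, class Policy>
T pow_zero_exponent(const T& base, const Policy& pol)
{
    namespace bp = boost::math::policies;
    // If the user's domain_error action is ignore_error, quietly return 1;
    // otherwise route 0^0 through the usual domain_error machinery
    // (throw, set errno, call the user handler, ...).
    if (boost::is_same<typename Policy::domain_error_type,
                       bp::domain_error<bp::ignore_error> >::value)
        return static_cast<T>(1);
    return bp::raise_domain_error<T>(
        "boost::math::pow(%1%)",
        "pow<0>(%1%): 0^0 is indeterminate",
        base, pol);
}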

I don't like the idea of returning a NaN: returning 1 seems to be the right thing to do, and should probably be the default behaviour? The other problem with reusing a domain_error is that the default behaviour is to throw an exception, which may not be what users expect for this corner case?
Originally I wanted to bypass that default behavior by having the second overload of pow call the first one with a specific policy rather than just policy<>(). But this may not be a good approach (users could be surprised not to get the same result from pow<N>(x) as from pow<N>(x, policy<>())).
So I guess I'm leaning towards yet another error action, "zero_power_error" maybe?
Are you talking about an action or a policy? If it's an action, having an action dedicated to this case is weird, since it won't apply to anything other than a domain_error (if we finally use that). If it's a policy, having a policy dedicated to one single function of the library sounds bad to me too. Why not be more general and call it "undetermined_result_error" or something else that doesn't interfere with any other policy and could eventually apply to more situations?
<snip>
Personally I can't help wondering if they just overlooked the 0^0 case :-(
Indeed they seem to contradict themselves a bit...
Grrr, still not sure what to do here, how about:
* If the domain_error action is "ignore_error" then return 1, otherwise,
* Raise a domain_error as normal.
I had thought about this solution but didn't know 1) whether the framework allows this kind of test easily and safely, and 2) whether it breaks the general philosophy of policies in Boost.Math (a bit like doing type testing in OO). But if you think it's possible, why not... I'm like you, really not sure what to do... I'll continue to think about all that. Bruno

Hi John,

First of all: I'd like to make sure you got my response of May 27, since the mail system sent me a notification today saying it couldn't be delivered...?? When I search on Gmane I can see it, so I think it's OK though...

Second: in my previous mails I used the term "policy" where I should have used "error", since policies are rather the actions taken in case of error. Sorry for the terminological confusion...

So finally, I think the best way is to create a new error. There are 2 options: either we call it zero_power_error and it's bound to be used only in pow, or we call it undetermined_result_error so that it can eventually be reused later for something else. The default policy for this error will be ignore_error. If we choose "zero_power_error", the result in case of ignore_error will be 1. If we choose "undetermined_result_error", the best way is to make the result depend on the function where it's used. In this case, raise_undetermined_result_error will have to take that value as a parameter. For example, in pow:

return policies::raise_undetermined_result_error<T>(
    "boost::math::pow(%1%)",
    "The result of pow<0>(%1%) is undetermined",
    base,
    T(1) // value to return in case of ignore_error policy
);

Which of the 2 options do you prefer?

For the throw_on_error policy, do you see any existing exception (std or boost) that could fit, or is it better to create a new one?

For the errno_on_error policy, how about setting errno to EDOM and returning a NaN?

Bruno

Bruno Lalande wrote:
Hi John,
First of all: I'd like to make sure you got my response of May 27, since the mail system sent me a notification today saying it couldn't be delivered...?? When I search on Gmane I can see it, so I think it's OK though...
Yep.
So finally, I think the best way is to create a new error. There are 2 options: either we call it zero_power_error and it's bound to be used only in pow, or we call it undetermined_result_error so that it can eventually be reused later for something else. The default policy for this error will be ignore_error. If we choose "zero_power_error", the result in case of ignore_error will be 1. If we choose "undetermined_result_error", the best way is to make the result depend on the function where it's used. In this case, raise_undetermined_result_error will have to take that value as a parameter. For example, in pow:
return policies::raise_undetermined_result_error<T>(
    "boost::math::pow(%1%)",
    "The result of pow<0>(%1%) is undetermined",
    base,
    T(1) // value to return in case of ignore_error policy
);
Which of the 2 options do you prefer?
Generally I prefer generic rather than specific solutions, so I guess I'd go for the latter - but would it be better called "indeterminate_result_error" ?
For the throw_on_error policy, do you see any existing exception (std or boost) that could fit or is it better to create a new one?
Not sure, the only obvious one is domain_error. Otherwise we're left to deriving a new exception type: would this gain us anything - storing the return value maybe?
For the errno_on_error policy, how about setting errno to EDOM and returning a NaN?
Yep, that's sensible, although should we return the value passed to the function: that would be consistent with C99: return 1 but optionally set errno as well. Cheers, John.

Hi,
Which of the 2 options do you prefer?
Generally I prefer generic rather than specific solutions, so I guess I'd go for the latter - but would it be better called "indeterminate_result_error" ?
Yep, you're probably right; as English is not my native language, you surely master those aspects better than I do.
For the throw_on_error policy, do you see any existing exception (std or boost) that could fit or is it better to create a new one?
Not sure, the only obvious one is domain_error. Otherwise we're left to deriving a new exception type: would this gain us anything - storing the return value maybe?
I had thought about domain_error too. I don't think storing the return value in a new exception type would really bring anything valuable, so I think we can go with domain_error.
For the errno_on_error policy, how about setting errno to EDOM and returning a NaN?
Yep, that's sensible, although should we return the value passed to the function: that would be consistent with C99: return 1 but optionally set errno as well.
OK if you think it's more consistent with C99, let's return the same value as ignore_error and set errno. I should have the time to implement all that next week. Regards Bruno

Hi John,

I have added the new undeterminate_result_error to the framework, used it in the pow function, and checked it into the sandbox. Please could you tell me if all looks good to you?

In test_error_handling.cpp, I noticed that not all policies were tested: ignore_error and errno_on_error were missing. I added some tests for them with undeterminate_result_error; do you want me to do the same for the other types of error? It's pretty straightforward now.

I haven't been able to add a test to the last part of test_policy_2.cpp, where you test different overloads of make_policy(). I wanted to add the same test as at line 103, but it didn't pass. I reduced the line until I ended up with this:

BOOST_CHECK(check_same(
    make_policy(undeterminate_result_error<ignore_error>()),
    policy<undeterminate_result_error<ignore_error> >()
));

But it still doesn't pass. I guess it's a problem with the way I integrated the new error into the normalise class, but I'm a bit confused about what to do exactly...

Also, test_tr1 doesn't pass for me but I guess it's not related to my modifications...?

Regards
Bruno

Bruno Lalande wrote:
Hi John,
I have added the new undeterminate_result_error to the framework, used it in the pow function, and checked it into the sandbox. Please could you tell me if all looks good to you?
Just one thing: "undeterminate" should be spelt "indeterminate" throughout. No one said English was going to be logical :-)
In test_error_handling.cpp, I noticed that not all policies were tested: ignore_error and errno_on_error were missing. I added some tests for them with undeterminate_result_error; do you want me to do the same for the other types of error? It's pretty straightforward now.
Please!
I haven't been able to add a test to the last part of test_policy_2.cpp, where you test different overloads of make_policy(). I wanted to add the same test as at line 103, but it didn't pass. I reduced the line until I ended up with this:
BOOST_CHECK(check_same(
    make_policy(undeterminate_result_error<ignore_error>()),
    policy<undeterminate_result_error<ignore_error> >()
));
But it still doesn't pass. I guess it's a problem with the way I integrated the new error into the normalise class, but I'm a bit confused about what to do exactly...
It's because ignore_error is the default for indeterminate errors, so make_policy(undeterminate_result_error<ignore_error>()) simply returns policy<>() and not policy<undeterminate_result_error<ignore_error> >(). Try it with an error action that's not the default and the test should pass.

Also one small point, in pow_test you check for a special return value: "return T(123.456)", but since that's not exactly representable as a binary floating point number it may be susceptible to spurious failures due to rounding issues etc. Probably better to return an exact binary value like 123.25 or something just in case.
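For instance, a hypothetical variant of that check using a non-default action, which should then survive the normalisation (illustrative, not the committed test):

BOOST_CHECK(check_same(
    make_policy(undeterminate_result_error<throw_on_error>()),
    policy<undeterminate_result_error<throw_on_error> >()
));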
Also, test_tr1 doesn't pass for me but I guess it's not related to my modifications...?
Oh :-( What compiler/platform/errors? There are a small number of fixes in the Trunk for that test as well that haven't been back-merged to the Sandbox yet: can you see if the Trunk version of that test passes for you? Thanks, John.

Just one thing: "undeterminate" should be spelt "indeterminate" throughout. No one said English was going to be logical :-)
Oops :-) OK, fixed it.
In test_error_handling.cpp, I noticed that not all policies were tested: ignore_error and errno_on_error were missing. I added some tests for them with undeterminate_result_error; do you want me to do the same for the other types of error? It's pretty straightforward now.
Please!
I have trouble testing for NaN values. I wanted to use boost::math::isnan() but it doesn't work with the real_concept type. Is there a generic way to do this, or should I just exclude real_concept from this test?

Is it normal that raise_evaluation_error returns "val" for ignore_error while the documentation specifies infinity? The corresponding test fails because of that. I added all the tests but commented out the problematic ones for now.
It's because ignore_error is the default for indeterminate errors, so make_policy(undeterminate_result_error<ignore_error>() simply returns policy<>() and not policy<undeterminate_result_error<ignore_error> >(). Try it with an error action that's not the default and the test should pass.
OK, I've been able to add indeterminate_error to those tests. With this additional error type, the last test (line 110) ends up with a make_policy() of 11 arguments so I had to add a new overload. I think it's normal since the maximum number of arguments should be the same as the total number of types that policy<> can handle, right?
Also one small point, in pow_test you check for a special return value: "return T(123.456)", but since that's not exactly representable as a binary floating point number it may be susceptible to spurious failures due to rounding issues etc. Probably better to return an exact binary value like 123.25 or something just in case.
OK, I made those values representable in single precision.
Also, test_tr1 doesn't pass for me but I guess it's not related to my modifications...?
Oh :-(
What compiler/platform/errors? There are a small number of fixes in the Trunk for that test as well that haven't been back-merged to the Sandbox yet: can you see if the Trunk version of that test passes for you?
Indeed, it passes in Trunk so I think a merge to the sandbox would fix the problem. I compile with GCC 4.2.3 on a 32-bit Kubuntu Hardy. The output is:

====== BEGIN OUTPUT ======
Running 1 test case...
Testing type float
unknown location(0): fatal error in "test_main_caller( argc, argv )":
memory access violation at address: 0xbf10dfec: no mapping at fault address
test_tr1.cpp(52): last checkpoint
*** 1 failure detected in test suite "Test Program"
EXIT STATUS: 201
====== END OUTPUT ======

The error occurs on the line testing expm1. Commenting it out makes the test pass. If you merge your recent fixes back to the sandbox I can test again to see if it solves the issue.

Bruno

Bruno Lalande wrote:
In test_error_handling.cpp, I noticed that not all policies were tested: ignore_error and errno_on_error were missing. I added some tests for them with undeterminate_result_error; do you want me to do the same for the other types of error? It's pretty straightforward now.
Please!
I have trouble testing for NaN values. I wanted to use boost::math::isnan() but it doesn't work with the real_concept type. Is there a generic way to do this, or should I just exclude real_concept from this test?
real_concept doesn't have NaNs as such (I realise you can initialise it with a NaN, but the idea is to model a type with a minimum set of requirements, and that means no NaNs). Use std::numeric_limits<T>::has_quiet_NaN to check if the type has a NaN before testing - there may be some obscure platforms with no NaN support for the builtin floating point types in any case; likewise when testing infinities.
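A minimal sketch of the kind of guard being suggested, as a helper one might drop into the test (the helper name is made up for illustration):

#include <boost/math/special_functions/fpclassify.hpp>
#include <limits>

// Check a NaN result only on types that actually have a quiet NaN
// (real_concept, for instance, does not).
template <class T>
bool result_is_nan_if_supported(const T& val)
{
    if (!std::numeric_limits<T>::has_quiet_NaN)
        return true;                      // nothing to verify for this type
    return (boost::math::isnan)(val);
}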
Is it normal that raise_evaluation_error returns "val" for ignore_error while the documentation specifies infinity? The corresponding test fails because of that.
Oops, the docs are just plain wrong :-( It returns the closest approximation to the result found so far: it typically gets raised when an iterative method doesn't converge fast enough, so this is the behaviour that makes the most sense. I'll fix those docs.
It's because ignore_error is the default for indeterminate errors, so make_policy(undeterminate_result_error<ignore_error>()) simply returns policy<>() and not policy<undeterminate_result_error<ignore_error> >(). Try it with an error action that's not the default and the test should pass.
OK, I've been able to add indeterminate_error to those tests. With this additional error type, the last test (line 110) ends up with a make_policy() of 11 arguments so I had to add a new overload. I think it's normal since the maximum number of arguments should be the same as the total number of types that policy<> can handle, right?
Yep, and it's possible that we may need to add more template arguments to policy<> now that we're adding more policies; it's just something I've been putting off - especially as it may bump up compile times :-(
What compiler/platform/errors? There are a small number of fixes in the Trunk for that test as well that haven't been back-merged to the Sandbox yet: can you see if the Trunk version of that test passes for you?
Indeed, it passes in Trunk so I think a merge to the sandbox would fix the problem. I compile with GCC 4.2.3 on a 32-bit Kubuntu Hardy. The output is:
====== BEGIN OUTPUT ======
Running 1 test case...
Testing type float
unknown location(0): fatal error in "test_main_caller( argc, argv )":
memory access violation at address: 0xbf10dfec: no mapping at fault address
test_tr1.cpp(52): last checkpoint
*** 1 failure detected in test suite "Test Program"
EXIT STATUS: 201
====== END OUTPUT ======
The error occurs on the line testing expm1. Commenting it out makes the test pass. If you merge your recent fixes back to the sandbox I can test again to see if it solves the issue.
OK, I think I know what the issue is then. I don't really want to back-merge from Trunk to Sandbox until we're ready to synchronise the two - it's so much easier to keep track of things if the two versions get synchronised in both directions rather than making ad-hoc changes to both.

real_concept doesn't have NaNs as such (I realise you can initialise it with a NaN, but the idea is to model a type with a minimum set of requirements, and that means no NaNs). Use std::numeric_limits<T>::has_quiet_NaN to check if the type has a NaN before testing - there may be some obscure platforms with no NaN support for the builtin floating point types in any case; likewise when testing infinities.
OK, I added this check to the tests and everything passes now. Just an idea: shouldn't there be such a check directly in isnan(), so that it just returns false when has_quiet_NaN is false?
I'll fix those docs.
OK, and I will update the docs in the sandbox to add the new indeterminate_result_error soon. Bruno

Hi John, I have updated the docs of the error handling framework and of the pow function in the sandbox. Let me know if there's another place in the docs where the existence of the new error should be mentioned. On the way I have also fixed the mistake about ignore_error for an evaluation_error. Regards Bruno

Bruno Lalande wrote:
Hi John,
I have updated the docs of the error handling framework and of the pow function in the sandbox. Let me know if there's another place in the docs where the existence of the new error should be mentioned.
On the way I have also fixed the mistake about ignore_error for an evaluation_error.
Thanks Bruno, I'm going to be tied up for a few days, I'll take a look afterwards, John.

Bruno Lalande wrote:
Hi John,
I have updated the docs of the error handling framework and of the pow function in the sandbox. Let me know if there's another place in the docs where the existence of the new error should be mentioned.
On the way I have also fixed the mistake about ignore_error for an evaluation_error.
Thanks, I've corrected a couple of typos, but other than that, is this all ready to merge to Trunk now? Cheers, John.

Bruno Lalande wrote:
Thanks, I've corrected a couple of typos, but other than that, is this all ready to merge to Trunk now?
Yep, you can merge the docs and the sources. Tests pass successfully here on a locally merged trunk, so everything should be OK.
Merge done: if you can cast a critical eye over the Trunk to make sure I haven't messed anything up then that would be great! Thanks, John.

Hi,
I think that you'll need a specialization for positive_power<1> *and* positive_power<2> with this. In fact, both of these otherwise generate a 4- deep infinitely-recursive function call at runtime. Starting at positive_power<2>:
positive_power<2>::result even_positive_power<2>::result positive_power<1>::result odd_positive_power<1>::result positive_power<2>::result
Actually I had those specializations in my code but didn't put them in the mail, since their presence is almost obvious now (or, at least, I try at each new implementation to remove each of them and see the effect). I will put them all in my next mails to be explicit. Bruno

Bruno Lalande <bruno.lalande <at> gmail.com> writes:
Hi,
I think that you'll need a specialization for positive_power<1> *and* positive_power<2> with this. In fact, both of these otherwise generate a 4- deep infinitely-recursive function call at runtime. Starting at positive_power<2>:
positive_power<2>::result even_positive_power<2>::result positive_power<1>::result odd_positive_power<1>::result positive_power<2>::result
Actually I had those specializations in my code but didn't put them in the mail, since their presence is almost obvious now (or, at least, I try at each new implementation to remove each of them and see the effect). I will put them all in my next mails to be explicit.
No need. I just wanted to make sure that it was caught. Honestly, even if you hadn't thought of it, you'd probably have caught it on testing anyway. :-) Sorry for the noise. John
participants (9)
- Bruno Lalande
- hervebronnimann@mac.com
- John Maddock
- John Maddock
- John Moeller
- Schrader, Glenn
- Scott McMurray
- Simonson, Lucanus J
- Steven Watanabe