
Hello,
To be honest, I've done some experiments, also with a polygon whose number of vertices is fixed at compile time. I called it the template <size_t D> gon, so a gon<3> is a triangle, etc. It disappointed me a bit, because the compile-time area calculation routine I drafted turned out to be slower than the runtime version...
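For reference, a minimal sketch of what such a fixed-size polygon might look like (the names point and triangle are my own here, not necessarily the ones used in the actual experiment):

#include <cstddef>

struct point { double x, y; };

// The vertex count D is a template parameter, so any loop over
// the vertices can in principle be unrolled at compile time.
template <std::size_t D>
struct gon
{
    point v[D];  // exactly D vertices, known at compile time
};

typedef gon<3> triangle;  // gon<3> is a triangle, gon<4> a quad, etc.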
For the Pythagoras version I gave above, my tests show that performance is the same as with the original version on GCC 4, which is not surprising. This is true with or without optimizations (though I don't care about the "without" case, anyway).
Do you mean the version generated using a compile-time area calculation turns out to be slower at runtime than the native runtime version? I don't understand how this might be possible. What are you trying to achieve at compile time?
Yep, it would be interesting to see your compile-time version, because the fact that it's slower sounds very surprising indeed.
In the post, the compile-time version only unrolls the loop during compilation; it doesn't compute the result at compile time from values known in the source. Thus, the compile-time function could actually have a higher cost. It will almost certainly have a higher cost in a debug build, since many compilers do not inline in that situation, and a chain of recursive function calls is going to be much more expensive than a simple for loop. In release builds, it really depends on how well the compiler inlines. I haven't done extensive testing on any current compiler, but I know MSVC 6.0 optimized very well when __forceinline was used judiciously, and was quite a bit more hit-or-miss when relying on its own inlining decisions.
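To make the comparison concrete, here is a hypothetical sketch of the two styles, reusing the point/gon<D> types from the sketch above (the shoelace formula stands in for whatever area routine was actually measured):

// Runtime version: a plain loop over the vertices.
template <std::size_t D>
double area_runtime(const gon<D>& g)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < D; ++i)
    {
        const point& a = g.v[i];
        const point& b = g.v[(i + 1) % D];  // wrap around to vertex 0
        sum += a.x * b.y - b.x * a.y;
    }
    return sum / 2.0;
}

// "Compile-time" version: the same loop unrolled through template
// recursion. Each step is a separate function; unless the compiler
// inlines the whole chain, this costs D real calls instead of one loop.
template <std::size_t I, std::size_t D>
struct area_step
{
    static double apply(const gon<D>& g)
    {
        const point& a = g.v[I];
        const point& b = g.v[(I + 1) % D];
        return a.x * b.y - b.x * a.y + area_step<I + 1, D>::apply(g);
    }
};

template <std::size_t D>
struct area_step<D, D>  // terminator: all D edges handled
{
    static double apply(const gon<D>&) { return 0.0; }
};

template <std::size_t D>
double area_unrolled(const gon<D>& g)
{
    return area_step<0, D>::apply(g) / 2.0;
}

In a debug build the unrolled version really is D nested calls; with optimizations on, a good compiler flattens the chain into the same straight-line code, which matches the point that the two only compare fairly with inlining enabled.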
When I meta-program, I usually don't care about what will happen with a non-optimizing compiler, since meta-programming relies entirely on the compiler's inlining and optimizing skills anyway. Thus, when comparing performance, I measure both with optimizations turned on and off, but I only care about the results of the "on" version (see above).

Bruno