[Boost.Array] surprising performance issue
Hello,
I've discovered a situation where using boost::array results in a 2-fold performance degradation over other array types, including C arrays, tr1::array, and a simple handwritten array class.
While the tr1::array performs well, I was surprised to find that it pads the array, so sizeof(tr1::array<float,3>) == sizeof(tr1::array<float,4>). Is that a requirement of tr1::array?
Anyway, the performance problem can be seen using Intel C++ (v9.1) on the Itanium platform, using '-O3' optimization.
I'm forced to take boost::array out of my computation's inner loop. Is this a previously known issue? Any ideas why this performance discrepancy might exist?
Thanks,
Bryan Green
I'm forced to take boost::array out of my computation's inner loop. Is this a previously known issue? Any ideas why this performance discrepancy might exist?
I am curious; what is the performance problem? I don't think you will get much help w/o being a little more specific. Chris
Bryan,
I've discovered a situation where using boost::array results in a 2-fold performance degradation over other array types, including C arrays, tr1::array, and a simple handwritten array class. While the tr1::array performs well, I was surprised to find that it pads the array, so sizeof(tr1::array<float,3>) == sizeof(tr1::array<float,4>). Is that a requirement of tr1::array?
It is very strange that you see a performance difference between tr1::array and boost::array. I think that boost::array was used as a submission for tr1::array (they are likely one and the same).
Anyway, the performance problem can be seen using Intel C++ (v9.1) on the Itanium platform, using '-O3' optimization.
In your build environment, ensure that you ARE NOT defining _DEBUG, and that you ARE defining NDEBUG. These flags can make a significant difference in run-time performance. Just because you are compiling with optimization flags does not necessarily mean you are running the most optimized code :-) Hope This Helps, Justin
I have already heard about boost::array's lack of performance.
If you need a really fast math library, you could try the Blitz++ library's arrays; its performance is nearly as good as that of a Fortran-coded program. You may find some information about Blitz++ on codeproject.com or codeguru.com.
Hope this helps.
Andrei
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
KSpam writes:
I've discovered a situation where using boost::array results in a 2-fold performance degradation over other array types, including C arrays, tr1::array, and a simple handwritten array class.
In your build environment, ensure that you ARE NOT defining _DEBUG, and that you ARE defining NDEBUG. These flags can make a significant difference in run-time performance.
Setting NDEBUG took care of it. Thanks. :)
While the tr1::array performs well, I was surprised to find that it pads the array, so sizeof(tr1::array<float,3>) == sizeof(tr1::array<float,4>). Is that a requirement of tr1::array?
It is very strange that you see a performance difference between tr1::array and boost::array. I think that boost::array was used as a submission for tr1::array (they are likely one and the same).
The tr1::array does not do range checking or an assert in operator[].
As for the alignment, this explains it:
from gcc tr1/array:
    value_type _M_instance[_Nm ? _Nm : 1] __attribute__((__aligned__));
This is in the implementation of tr1::array, for gcc's standard C++ library, which is used by Intel C++.
I sure would like to know the rationale for putting in the alignment attribute, as it changes the code in a very non-trivial way.
-bgreen
On 28/11/2007, Bryan Green
I sure would like to know the rationale for putting in the alignment attribute, as it changes the code in a very non-trivial way.
-bgreen
This was mentioned on the libstdc++ list:
---------- Forwarded message ----------
From: Aaron Graham
Date: 23 Oct 2007 23:33
Subject: sizeof std::tr1::array
To: libstdc++@gcc.gnu.org
This has probably been discussed before, but I can't find any information on it.
Unlike boost::array, std::tr1::array (and std::array in 4.3.x) has an attribute-alignment on its data, which means that sizeof(the_array) is always a multiple of 16, at least on all systems I'm currently using:
    value_type _M_instance[_Nm ? _Nm : 1] __attribute__((__aligned__));
This is rather inconvenient, and somewhat unnecessary, especially on embedded systems, since there's a nontrivial hidden cost here. I have a harder time convincing programmers not to roll-their-own when I see stuff like that.
Can I get an explanation please?
Thanks in advance.
Aaron
---------- Forwarded message ----------
From: Paolo Carlini
Date: 24 Oct 2007 06:14
Subject: Re: sizeof std::tr1::array
To: Aaron Graham
Cc: libstdc++@gcc.gnu.org
Aaron Graham wrote:
Can I get an explanation please?
No big deal, we wanted to play safe wrt some extensions which need a large alignment. I agree we can change it back to the natural alignment and fix that other stuff... Thanks, Paolo.
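The padding discussed in the forwarded messages can be reproduced with a small sketch. The two structs below are hypothetical stand-ins, not the real boost::array or libstdc++ classes; the only line taken from the thread is the member declaration with the alignment attribute, which is a GCC extension (also accepted by Clang). The exact alignment chosen is target-dependent, so the code checks relationships rather than fixed sizes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in: element storage with natural alignment.
template <typename T, std::size_t N>
struct plain_array {
    T elems[N ? N : 1];
};

// Hypothetical stand-in: same storage, but declared like the member quoted
// in the thread. With no argument, __aligned__ asks the compiler for the
// target's default maximum alignment (GCC/Clang extension).
template <typename T, std::size_t N>
struct aligned_array {
    T elems[N ? N : 1] __attribute__((__aligned__));
};

// Self-check of the layout claims that hold regardless of the target.
inline void check_layout() {
    // Natural alignment: three floats occupy exactly three floats' worth of bytes.
    assert(sizeof(plain_array<float, 3>) == 3 * sizeof(float));
    // The size of any object type is a multiple of its alignment, so an
    // over-aligned member forces the struct size to be rounded up.
    assert(sizeof(aligned_array<float, 3>) % alignof(aligned_array<float, 3>) == 0);
}
```

Comparing sizeof(aligned_array<float, 3>) with sizeof(aligned_array<float, 4>) on a given compiler is one way to see the effect Bryan and Aaron describe; the padded size itself depends on the platform.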
On 26/11/2007, Bryan Green
I'm forced to take boost::array out of my computation's inner loop. Is this a previously known issue? Any ideas why this performance discrepancy might exist?
I know that boost::array asserts that the index is in-range when using operator[] (as it's caught a few bugs for me), but that's an overhead that -O3 isn't going to eliminate. As KSpam mentioned, try with -DNDEBUG as well.
~ Scott
--
Sed quis custodiet ipsos custodes?
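To illustrate the point about the range check, here is a minimal sketch. checked_array, sum, and asserts_enabled are hypothetical names made up for this example and are not the Boost source; the sketch only assumes, as described above, that operator[] guards the index with an assert. The standard assert macro expands to nothing when NDEBUG is defined before <cassert> is included, which is what -DNDEBUG on the command line achieves.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an array class whose operator[] asserts the index.
template <typename T, std::size_t N>
struct checked_array {
    T elems[N];

    T& operator[](std::size_t i) {
        // Compiled out entirely when NDEBUG is defined.
        assert(i < N && "index out of range");
        return elems[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < N && "index out of range");
        return elems[i];
    }
};

// Inner-loop style use: every element access goes through operator[],
// so a debug build pays for one comparison per access.
template <typename T, std::size_t N>
T sum(const checked_array<T, N>& a) {
    T total = T();
    for (std::size_t i = 0; i < N; ++i)
        total += a[i];
    return total;
}

// Reports whether assert() checks are compiled into this translation unit.
inline bool asserts_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

Building the same file with and without -DNDEBUG and comparing timings or the generated code is one way to check whether the assertion accounts for a slowdown; in this thread, Bryan reports that setting NDEBUG took care of it.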
Bryan Green wrote:
Hello, I've discovered a situation where using boost::array results in a 2-fold performance degradation over other array types, including C arrays, tr1::array, and a simple handwritten array class. While the tr1::array performs well, I was surprised to find that it pads the array, so sizeof(tr1::array<float,3>) == sizeof(tr1::array<float,4>). Is that a requirement of tr1::array?
No, that will be an effect of the compiler's structure alignment rules. BTW tr1::array and boost::array are basically the same thing, so I'm surprised that there's any difference at all.
Anyway, the performance problem can be seen using Intel C++ (v9.1) on the Itanium platform, using '-O3' optimization.
I'm forced to take boost::array out of my computation's inner loop. Is this a previously known issue? Any ideas why this performance discrepancy might exist?
Can you share a test case? As others have already noted, defining NDEBUG is likely to have quite an effect as well. HTH, John.
(Sorry if the formatting is bad, I'm fighting the new Hotmail editor that seems to think graphics are more important than text....)
Can you share a test case?
And the compiled code, if it is not too difficult? I don't know if I can contribute anything, but I am curious to see what the compilers generate. If there is a spurious flag or combination of flags, it may be obvious in the generated code (extra calls in the inner loop or some other dumb thing). I guess it is possible you could have a weird case where padding makes things worse by increasing the cache miss rate or something, but otherwise I wouldn't expect it to be a big deal.
Thanks.
Mike Marchywka
----------------------------------------
From: john@johnmaddock.co.uk To: boost-users@lists.boost.org Date: Tue, 27 Nov 2007 16:52:28 +0000 Subject: Re: [Boost-users] [Boost.Array] surprising performance issue
Bryan Green wrote:
Hello, I've discovered a situation where using boost::array results in a 2-fold performance degradation over other array types, including C arrays, tr1::array, and a simple handwritten array class. While the tr1::array performs well, I was surprised to find that it pads the array, so sizeof(tr1::array<float,3>) == sizeof(tr1::array<float,4>). Is that a requirement of tr1::array?
No, that will be an effect of the compiler's structure alignment rules.
BTW tr1::array and boost::array are basically the same thing, so I'm surprised that there's any difference at all.
Anyway, the performance problem can be seen using Intel C++ (v9.1) on the Itanium platform, using '-O3' optimization.
I'm forced to take boost::array out of my computation's inner loop. Is this a previously known issue? Any ideas why this performance discrepancy might exist?
Can you share a test case?
As others have already noted, defining NDEBUG is likely to have quite an effect as well.
HTH, John.
participants (7)
- Andrei
- Bryan Green
- Chris Weed
- John Maddock
- KSpam
- Mike Marchywka
- Scott McMurray