[GGL] [geometry] Inexplicable speed benefit when using Visual C++ 2010

Hi everybody,

my first attempt to post to this list bounced, so I am trying again.

My employer is an early adopter of the Boost Generic Geometry Library [GGL] in an engineering application related to mobile radio communications. We use it to estimate and optimize the coverage of 4G radio networks. Our code uses a lot of multi-polygon unions to estimate the amount of ground covered (and not covered) by radio beams, and it iteratively improves the antenna parameters.

We've been compiling and shipping our application with Visual C++ 2008 so far.

We found that GCC 4.4 on Linux was about 100% faster (i.e. roughly twice as fast) than Visual C++ 2008 on Windows, without modifying the code. This bothered us quite a bit, as both compilers were allowed to use full optimization. We found that by optimizing the new and delete operators (overloading them globally) to re-use allocated memory fragments on Windows, we were able to get nearly a 50% speed benefit, so we attributed much of the performance difference to sub-optimal heap management in Visual C++ 2008.

Then we tried recompiling the project with Visual C++ 2010 Ultimate Release Candidate (RC). The speed gain of the algorithm was 900% (not joking), and the results still appear to be correct. Now this is surreal, and no one here in the office has found a reasonable explanation yet without venturing into the metaphysical domain.

Would anyone with knowledge of compiler and runtime internals be able to make an educated guess as to how a speed gain of a factor of 10 is possible? Is anyone else seeing similar speedups in Boost or in the geometry library when compiling with Visual C++ 2010 RC? (HINT: it's a free download, so anyone can try it out until the end of June 2010.)

Christian
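The "globally overloaded new and delete that re-use allocated memory fragments" mentioned above could look roughly like the sketch below. It is a hypothetical minimal version, not the poster's actual code: it assumes a single-threaded program, caches freed blocks of up to 256 bytes in per-size-class free lists, and never returns cached memory to the CRT heap. The names (`g_reused_blocks`, the size-class constants) are illustrative. The sized and nothrow forms are replaced too, so every allocation path agrees on the block header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical sketch: global operator new/delete that keep freed small blocks
// in per-size-class free lists and hand them out again, instead of returning
// every block to the CRT heap. Not thread-safe; cached blocks are never freed.
namespace {
constexpr std::size_t kStep = 16;      // size classes: 16, 32, ..., 256 bytes
constexpr std::size_t kClasses = 16;
constexpr std::size_t kLarge = static_cast<std::size_t>(-1);  // "not cached" marker

struct FreeNode { FreeNode* next; };
FreeNode* g_free_lists[kClasses] = {};

// Header stored in front of each block so operator delete knows its size class.
struct alignas(alignof(std::max_align_t)) Header { std::size_t size_class; };
}  // namespace

std::size_t g_reused_blocks = 0;  // allocations served from a free list

void* operator new(std::size_t size) {
    if (size == 0) size = 1;
    std::size_t cls = kLarge;
    std::size_t bytes = size;
    if (size <= kStep * kClasses) {
        cls = (size - 1) / kStep;              // 1..16 -> 0, 17..32 -> 1, ...
        bytes = (cls + 1) * kStep;             // round up to the class size
        if (FreeNode* node = g_free_lists[cls]) {
            g_free_lists[cls] = node->next;    // re-use a cached block
            ++g_reused_blocks;
            return node;                       // its header still holds cls
        }
    }
    void* raw = std::malloc(sizeof(Header) + bytes);
    if (raw == nullptr) throw std::bad_alloc();
    Header* header = ::new (raw) Header{cls};
    return header + 1;
}

void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    Header* header = static_cast<Header*>(p) - 1;
    if (header->size_class == kLarge) {        // large blocks go straight back
        std::free(header);
        return;
    }
    // Small blocks are pushed onto their free list instead of being freed.
    g_free_lists[header->size_class] = ::new (p) FreeNode{g_free_lists[header->size_class]};
}

// Route the other replaceable forms through the two functions above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    try { return ::operator new(size); } catch (...) { return nullptr; }
}
void operator delete(void* p, const std::nothrow_t&) noexcept { ::operator delete(p); }
```

Allocating 8 bytes, deleting them and allocating 8 bytes again should be served from the free list and bump `g_reused_blocks`. A real-world version would need locking or per-thread caches.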

On Fri, Apr 16, 2010 at 2:30 PM, Christian Buchner <christian.buchner@gmail.com> wrote: [...]
Would anyone with knowledge of compiler and runtime internals be able to make an educated guess as to how such a speed gain of factor 10 is possible? Is anyone else seeing similar speedups in boost or in the geometry library when compiling with Visual C++ 2010 RC (HINT: it's a free download, so anyone can try it out until end of June 2010).
VC++ 2010 now has iterator checking disabled by default; it is probably unlikely that you were unknowingly running with checking enabled, but that would easily explain the speedup.

HTH,
-- gpd
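To make "iterator checking" concrete: a checked iterator carries extra state (here, the bounds of the range it walks) and validates it on every dereference and increment, which adds a branch per operation in tight loops. The toy below is a hypothetical illustration of that idea only; it is not MSVC's actual `_SECURE_SCL` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical toy checked iterator: it remembers the bounds of its range and
// validates them on every dereference and increment.
class checked_iterator {
public:
    checked_iterator(const int* first, const int* last, const int* pos)
        : first_(first), last_(last), pos_(pos) {}

    const int& operator*() const {
        if (pos_ < first_ || pos_ >= last_) throw std::out_of_range("dereference out of range");
        return *pos_;
    }
    checked_iterator& operator++() {
        if (pos_ >= last_) throw std::out_of_range("increment past end");
        ++pos_;
        return *this;
    }
    bool operator!=(const checked_iterator& other) const { return pos_ != other.pos_; }

private:
    const int* first_;
    const int* last_;
    const int* pos_;
};

// Sums a vector through the checked iterator; every step pays for the checks.
long long checked_sum(const std::vector<int>& v) {
    const int* first = v.data();
    const int* last = v.data() + v.size();
    long long total = 0;
    for (checked_iterator it(first, last, first), end(first, last, last); it != end; ++it)
        total += *it;
    return total;
}
```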

I guess it's also possible that you're getting some speedup from the C++0x features in the standard library (move support in containers and such)?
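A sketch of why move support can matter for polygon-heavy code: when a `std::vector` of rings (each itself a vector of coordinates) grows, a C++0x/C++11 library can move each ring (a pointer hand-over) instead of deep-copying it. The `Ring` type below is hypothetical; it only counts which constructor the container used.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical instrumented element type: counts how often std::vector copies
// vs. moves it. The noexcept move constructor is what allows std::vector to
// move (rather than copy) existing elements when it reallocates.
struct Ring {
    static int copies;
    static int moves;
    std::vector<double> coords;  // stands in for a ring's point list

    Ring() : coords(1000, 0.0) {}
    Ring(const Ring& other) : coords(other.coords) { ++copies; }
    Ring(Ring&& other) noexcept : coords(std::move(other.coords)) { ++moves; }
};
int Ring::copies = 0;
int Ring::moves = 0;

// Pushes n rings without reserve(), forcing several reallocations.
void grow_rings(int n) {
    std::vector<Ring> rings;
    for (int i = 0; i < n; ++i) rings.push_back(Ring());
}
```

Without `noexcept` on the move constructor, `std::vector` would fall back to copying during reallocation to preserve its strong exception guarantee.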

Hi Giovanni,
VC++ 2010 now has the iterator checking disabled by default
LOL, an old employer of mine used to be less than happy with the performance of my code, so he would often, half-jokingly, ask me to finally remove that "delay(1000)" line from the source ;) I guess someone within the VC++ team finally heard him!

P.S.: It was about time they took that back!

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

VC++ 2010 now has iterator checking disabled by default; it is probably unlikely that you were unknowingly running with checking enabled, but that would easily explain the speedup.
I tried setting -D_SECURE_SCL=0 in Visual C++ Release mode, but then the program crashed right away, possibly because dependencies such as OpenSceneGraph and Qt were still built with the default settings. Anyway, thanks for the pointer; that looks like a hot candidate to investigate.
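One plausible mechanism for such a crash (an assumption, not a diagnosis of this particular program): the checking setting changes the layout of standard-library objects, so code compiled with one setting misreads objects created by code compiled with the other. The two hypothetical structs below mimic that with one extra bookkeeping pointer; they are not MSVC's real `std::vector` layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts of the "same" container built without and with
// iterator checking. The checked build carries an extra bookkeeping pointer,
// so the object is larger and every data member sits at a different offset.
struct vector_unchecked {
    int* first;
    int* last;
    int* end_of_storage;
};

struct vector_checked {
    void* debug_proxy;  // bookkeeping present only in the checked build
    int* first;
    int* last;
    int* end_of_storage;
};

// True only if both builds agree on the object's size and on where `last`
// lives; passing such objects between the two builds silently relies on this.
bool layouts_agree() {
    return sizeof(vector_unchecked) == sizeof(vector_checked) &&
           offsetof(vector_unchecked, last) == offsetof(vector_checked, last);
}
```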

On 4/16/2010 9:04 AM, Giovanni Piero Deretta wrote:
VC++ 2010 now has iterator checking disabled by default; it is probably unlikely that you were unknowingly running with checking enabled, but that would easily explain the speedup.
Does anyone have a reference to confirm the above? I have googled away but found no evidence to support this. It would be great news indeed, as the burden of keeping _SECURE_SCL consistent and correct across a set of 3rd-party libraries is too high.

There's a mention of it at:
http://blogs.msdn.com/vcblog/archive/2009/10/22/visual-studio-2010-beta-2-is...
I thought it was mentioned in more detail in another post there, but I can't find the post at the moment.

On 4/16/2010 12:42 PM, Richard Webb wrote:
There's a mention of it at: http://blogs.msdn.com/vcblog/archive/2009/10/22/visual-studio-2010-beta-2-is...
I thought it was mentioned in more detail in another post there, but I can't find the post at the moment.
Ok... sleuthing further in MSDN, I see
1) http://msdn.microsoft.com/en-us/library/aa985896%28VS.100%29.aspx
which supports your hypothesis, and
2) http://msdn.microsoft.com/en-us/library/aa985965%28VS.100%29.aspx
which (ambiguously) refutes it... sigh.

On Fri, Apr 16, 2010 at 3:35 PM, eg <egoots@gmail.com> wrote:
On 4/16/2010 12:42 PM, Richard Webb wrote:
There's a mention of it at:
http://blogs.msdn.com/vcblog/archive/2009/10/22/visual-studio-2010-beta-2-is...
I thought it was mentioned in more detail in another post there, but I can't find the post at the moment.
Ok... sleuthing further in MSDN, I see
1) http://msdn.microsoft.com/en-us/library/aa985896%28VS.100%29.aspx
Which supports your hypothesis
and
2) http://msdn.microsoft.com/en-us/library/aa985965%28VS.100%29.aspx
Which (ambiguously) refutes it...
Actually, it does not really: enabling checked iterators is handled by a different define, which is globally disabled by _SECURE_SCL anyway, but I still disable it manually in all my builds. I pass about six definitions to disable various things that Visual Studio otherwise breaks rather horribly.

On 16/04/2010 20:06, eg wrote:
On 4/16/2010 9:04 AM, Giovanni Piero Deretta wrote:
VC++ 2010 now has iterator checking disabled by default; it is probably unlikely that you were unknowingly running with checking enabled, but that would easily explain the speedup.
Does anyone have a reference to confirm the above?
I have googled away but found no evidence to support this. It would be great news indeed, as the burden of keeping _SECURE_SCL consistent and correct across a set of 3rd-party libraries is too high.
"Friday, June 26, 2009 2:47 PM by Stephan T. Lavavej [MSFT] [...] Fortunately, VC10 after Beta 1 will contain "#pragma detect_mismatch", which will detect _ITERATOR_DEBUG_LEVEL mismatch deterministically at link time, instead of crashing mysteriously at run time.
Is the default in release mode now to turn SECURE_SCL off?
In VC10 after Beta 1, _ITERATOR_DEBUG_LEVEL in release mode will default to 0. And there was much rejoicing." (http://blogs.msdn.com/vcblog/archive/2009/06/23/stl-performance.aspx)

I can confirm both points (see [1] and [2]). I'm really glad the VS team finally listened and disabled checked iterators by default in release mode. The ODR violation check is just the cherry on top.

From what I understand, a new macro, _ITERATOR_DEBUG_LEVEL, supersedes both _SECURE_SCL and _HAS_ITERATOR_DEBUGGING. By default, _ITERATOR_DEBUG_LEVEL is set to 2 in Debug mode and to 0 in Release mode. The documentation is a bit weak, but if you search for _ITERATOR_DEBUG_LEVEL inside the VC header files, you'll see the actual logic.

[1] http://social.msdn.microsoft.com/Forums/en/vcpluslanguage/thread/47cac5fe-50...
[2] Iterator debug level preprocessor checks I've used when testing VC10, just to be sure:

#if _ITERATOR_DEBUG_LEVEL == 0
#pragma message("_ITERATOR_DEBUG_LEVEL == 0")
#elif _ITERATOR_DEBUG_LEVEL == 1
#pragma message("_ITERATOR_DEBUG_LEVEL == 1")
#elif _ITERATOR_DEBUG_LEVEL == 2
#pragma message("_ITERATOR_DEBUG_LEVEL == 2")
#endif

#if _ITERATOR_DEBUG_LEVEL == 0 && _SECURE_SCL != 0
#error _SECURE_SCL != 0 while _ITERATOR_DEBUG_LEVEL == 0
#endif
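The `#pragma detect_mismatch` mentioned above also suggests how a library can avoid crashes like the one reported with `-D_SECURE_SCL=0`: record its own build-affecting settings so that a mixed build fails at link time instead of at run time. The sketch below assumes a hypothetical library macro `MYLIB_CHECKED`; the pragma exists in VC10 and later and is guarded out for other compilers.

```cpp
#include <cassert>

// Hypothetical library configuration macro that changes object layout or
// behaviour; components built with different values must not be mixed.
#ifndef MYLIB_CHECKED
#define MYLIB_CHECKED 0
#endif

#define MYLIB_STR2(x) #x
#define MYLIB_STR(x) MYLIB_STR2(x)

// MSVC 2010+ only: embed the setting in every object file that includes this
// header; the linker reports error LNK2038 if two objects disagree.
#if defined(_MSC_VER) && _MSC_VER >= 1600
#pragma detect_mismatch("MYLIB_CHECKED", MYLIB_STR(MYLIB_CHECKED))
#endif

// Also expose the value at run time, e.g. for logging the build configuration.
inline int mylib_checked_setting() { return MYLIB_CHECKED; }
```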

Hi Christian,
We found that GCC 4.4 on Linux was about 100% faster than Visual C++ 2008 on Windows without modifying the code.
I can confirm your speed measurement differences. About a year ago I did benchmarks with Boost.Geometry using several different compilers. It is indeed the case that GCC was much faster than VC 2008. However, the largest difference was between VC 2005 and VC 2008: in my findings, the VC 2005 compiler produces much faster code than VC 2008 does. I didn't measure 2010 (beta) at that time, but it seems that the problem introduced in 2008 is now solved. Good news.

Last year at BoostCon I asked someone from Microsoft who was present whether they knew about this issue, but apparently they didn't, and he told me it would surprise him because the compiler was basically the same. When I later used the VC 2008 command-line compiler (i.e. not from within Visual Studio), the problem disappeared and the code was fast. So it was not the compiler itself but some setting in the IDE or the project file (VCPROJ). I've never managed to find which setting, though I studied it carefully.

See also our page here: <http://geometrylibrary.geodan.nl/compiling.html>, stating that "our measurements indicate that MSVC 2005 generates faster code than MSVC 2008".

I just repeated my measurements, now also including 2010, so yes, the 2008 issue is indeed solved:
- VS 2005 / VS 2010: fastest code
- GCC 3.4.5/MinGW: slower (~30%)
- VS 2008: very slow (~500%)

All VS builds use _SECURE_SCL=0, and I'm talking about the Express versions of all three. In my last 2010 measurement I turned off _SECURE_SCL and there was no measurable difference (referring to the last message in this thread).

Regards,
Barend
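Comparisons like the ones above are easier to repeat with a small, compiler-independent timing helper. A minimal sketch (the name `best_of_ms` is hypothetical) that runs a workload several times and keeps the fastest wall-clock time, which filters out one-off noise better than a single run:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: run `work` `runs` times and return the fastest run in
// milliseconds, measured with a monotonic clock. Returns -1 if runs <= 0.
template <typename Workload>
double best_of_ms(int runs, Workload&& work) {
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        best = (best < 0.0) ? ms : std::min(best, ms);
    }
    return best;
}
```

Building the same benchmark with each compiler and setting (for example `_SECURE_SCL=0` on VS 2005/2008, as above) and comparing the reported times keeps the measurement code itself identical across compilers.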

On 16/04/2010 11:30 PM, Christian Buchner wrote:
[...]
Would anyone with knowledge of compiler and runtime internals be able to make an educated guess as to how such a speed gain of factor 10 is possible? Is anyone else seeing similar speedups in boost or in the geometry library when compiling with Visual C++ 2010 RC (HINT: it's a free download, so anyone can try it out until end of June 2010).
Nothing new here. With msvc10, GPC, which is entirely C (no templates), sees a roughly 6x-7x speedup over msvc9 with fastcode and favour-speed settings; with PGO it gets to about 8x-9x, and with Intel v11 with PGO et al. you're looking at around an 11x-12x increase over Intel v10 or msvc9. The point here is that the increase is mainly centered around new memory allocation mechanisms (as described by Stephan) in the msvc10 backend, not necessarily anything special MS has done with respect to C++ specifically. (By the way, the polygons used range from simple 4-5 corner convex ones to 100k+ corner concave-disjoint ones with holes and concentric islands; the operations are union, diff and xor.) On a side note, with Intel v11, if loop unrolling is set correctly for the target processor and SSE4.1 is available, it peaks at around 13x-14x - and this is with a code base that was last touched nearly 8 years ago.

Actually, I said that there were no massive performance improvements between VC9 and VC10, except for the addition of rvalue references. Keeping everything else constant, there shouldn't be order-of-magnitude performance improvements between VC9 and VC10 for purely C code. Of course, we improve our compiler back-end's code generation in every major release, but by a few percent (if we're very lucky). (VC does not yet implement anything like autovectorization.)

Also, as I explained, malloc()/new are unchanged between VC9 and VC10 - they both call HeapAlloc(). Now, if you're not keeping everything else constant - e.g. switching from x86 to x64, or from XP to Vista+ - then the Low Fragmentation Heap may be responsible. If it isn't rvalue references and it isn't the LFH, then I'm truly stumped.

Additionally, I find it interesting that you say that performance also massively increased between Intel v10 and Intel v11. That's when they implemented rvalue references too. Are you sure that you're not wrapping GPC (or anything else) in C++ code that would automatically benefit from rvalue references?

Thanks,
STL

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Arash Partow
Sent: Tuesday, April 20, 2010 5:50 PM
To: boost@lists.boost.org
Subject: Re: [boost] [GGL] [geometry] Inexplicable speed benefit when using Visual C++ 2010

On 16/04/2010 11:30 PM, Christian Buchner wrote:
[...]
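The Low Fragmentation Heap remark can be checked directly: on Windows XP the LFH has to be enabled explicitly, while Vista and later use it by default. A hedged sketch, assuming an MSVC build where `_get_heap_handle()` returns the CRT heap; `enable_low_fragmentation_heap` is a hypothetical helper name, and on other compilers it simply returns false.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <windows.h>
#include <malloc.h>  // _get_heap_handle()
#endif

// Sketch: switch the CRT heap to the Low Fragmentation Heap, so an XP-vs-Vista+
// comparison is not skewed by the heap type. Returns false when not built with
// MSVC or when HeapSetInformation fails (e.g. with heap debugging tools active).
bool enable_low_fragmentation_heap() {
#if defined(_MSC_VER)
    ULONG heap_type = 2;  // 2 selects the LFH for HeapCompatibilityInformation
    HANDLE crt_heap = reinterpret_cast<HANDLE>(_get_heap_handle());
    return HeapSetInformation(crt_heap, HeapCompatibilityInformation,
                              &heap_type, sizeof(heap_type)) != FALSE;
#else
    return false;  // the LFH is a Windows heap feature
#endif
}
```

Calling it once at startup on XP should make such a comparison depend less on the operating system's default heap type.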

On 21/04/2010 11:23 AM, Stephan T. Lavavej wrote:
Additionally, I find it interesting that you say that performance also massively increased between Intel v10 and Intel v11. That's when they implemented rvalue references too. Are you sure that you're not wrapping GPC (or anything else) in C++ code that would automatically benefit from rvalue references?
No rvalue references, no library wrapping: just better loop unrolling and an interesting reordering of some jump tables based on the input set used during PGO (different from their layout when compiled with just O2). Sometimes the simplest things seem to be the most effective.
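As a hedged illustration of what loop unrolling is (not of what Intel's or MSVC's optimizer actually emitted for GPC), here is a summation loop next to a hand-unrolled-by-4 equivalent. Integer sums are used so that splitting the work across several accumulators cannot change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward loop: one add plus one loop-condition check per element.
long long sum_simple(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// The same loop unrolled by 4: fewer branch/index updates per element and
// independent partial sums the CPU can overlap. A scalar remainder loop
// handles the last n % 4 elements.
long long sum_unrolled4(const std::vector<int>& v) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```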
participants (10)
- Arash Partow
- Barend Gehrels
- Christian Buchner
- eg
- Fernando Cacciola
- Giovanni Piero Deretta
- OvermindDL1
- Richard Webb
- Stephan T. Lavavej
- Tanguy Fautré