
On 16/04/2010 11:30 PM, Christian Buchner wrote:
Hi everybody,
my first attempt to post to this list bounced, so I am trying again.
My employer is an early adopter of the Boost Generic Geometry Library [GGL] in an engineering application related to mobile radio communications. We use it to estimate and optimize the coverage of 4G radio networks. Our code uses a lot of multi-polygon unions to estimate the amount of ground covered (and not covered) by radio beams, and it iteratively improves the antenna parameters.
We've been compiling and shipping our application with Visual C++ 2008 so far.
We found that GCC 4.4 on Linux was about 100% faster than Visual C++ 2008 on Windows without modifying the code. This bothered us quite a bit, as both compilers were allowed to use full optimization. We found that by globally overloading the new and delete operators to re-use allocated memory fragments on Windows, we were able to get a nearly 50% speed benefit, so we attributed much of the performance difference to sub-optimal heap management in Visual C++ 2008.
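For readers unfamiliar with the trick, here is a minimal sketch of what globally replacing new and delete to re-use freed fragments can look like. It is not our production code: the namespace pool_sketch, the 16-byte size classes, the 512-byte cut-off and the per-thread free lists are all assumptions made for illustration, and it skips the std::new_handler protocol that a complete replacement would follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace pool_sketch {  // hypothetical name, for illustration only

constexpr std::size_t kHeader  = alignof(std::max_align_t);  // keeps payload aligned
constexpr std::size_t kGranule = 16;        // size classes: 16, 32, ..., 512 bytes
constexpr std::size_t kClasses = 32;
constexpr std::size_t kNoClass = kClasses;  // tag for blocks too large to cache
static_assert(kHeader >= sizeof(std::size_t), "header must hold the class tag");

struct FreeNode { FreeNode* next; };

// Zero-initialised with no dynamic initialisation, so the operators below can
// be called from static constructors. Per-thread lists avoid locking; a block
// freed on another thread simply joins that thread's list.
thread_local FreeNode* t_free[kClasses];
thread_local std::size_t t_reused;  // allocations served from a free list

void* allocate(std::size_t n) {
    if (n == 0) n = 1;
    std::size_t cls = (n - 1) / kGranule;   // 1..16 -> 0, 17..32 -> 1, ...
    if (cls >= kClasses) cls = kNoClass;
    unsigned char* raw = nullptr;
    if (cls != kNoClass && t_free[cls] != nullptr) {
        FreeNode* node = t_free[cls];       // re-use a previously freed fragment
        t_free[cls] = node->next;
        raw = reinterpret_cast<unsigned char*>(node);
        ++t_reused;
    } else {
        const std::size_t payload = (cls == kNoClass) ? n : (cls + 1) * kGranule;
        raw = static_cast<unsigned char*>(std::malloc(kHeader + payload));
        if (raw == nullptr) return nullptr;
    }
    ::new (raw) std::size_t(cls);           // remember the class in the header
    return raw + kHeader;
}

void deallocate(void* p) noexcept {
    if (p == nullptr) return;
    unsigned char* raw = static_cast<unsigned char*>(p) - kHeader;
    const std::size_t cls = *reinterpret_cast<std::size_t*>(raw);
    if (cls == kNoClass) {                  // large block: hand back to the C heap
        std::free(raw);
        return;
    }
    FreeNode* head = t_free[cls];
    t_free[cls] = ::new (raw) FreeNode{head};  // cached blocks are never released
}

}  // namespace pool_sketch

// Global replacements. The default array and nothrow forms forward to these.
void* operator new(std::size_t n) {
    void* p = pool_sketch::allocate(n);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete(void* p) noexcept { pool_sketch::deallocate(p); }
void operator delete(void* p, std::size_t) noexcept { pool_sketch::deallocate(p); }
```

The point of the design is that small, short-lived allocations (for example the temporary containers created during a polygon union) come straight off a free list instead of going through the general-purpose heap each time. The price is that cached memory is never returned to the system, which may or may not be acceptable for a given application.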
Then we tried recompiling the project with Visual C++ 2010 Ultimate Release Candidate (RC). The speed gain of the algorithm was 900% (not joking) and the results still appear to be correct. Now this is surreal and no one here in the office has found a reasonable explanation yet without going into the metaphysical domain.
Would anyone with knowledge of compiler and runtime internals be able to make an educated guess as to how a speed gain of a factor of 10 is possible? Is anyone else seeing similar speedups in Boost or in the geometry library when compiling with Visual C++ 2010 RC? (Hint: it's a free download, so anyone can try it out until the end of June 2010.)
Nothing new here. With msvc10, GPC, which is entirely C (no templates), sees a speedup of roughly 6x-7x over msvc9 on the fastcode and favour-speed settings. With PGO it gets to about 8x-9x, and on intel v11 with PGO et al. you're looking at around an 11x-12x increase over intel v10 or msvc9.
The point here is that the increase is mainly centered around the new memory allocation mechanisms (as described by Stephan) in the msvc10 backend, not necessarily anything special MS has done wrt C++ specifically. (Btw, the polygons used range from simple 4-5 corner convex ones to concave-disjoint ones with 100k+ corners, holes and concentric islands; the operations are union, diff and xor.)
On a side note, with intel v11, if the loop unrolling is set correctly for the target processor and SSE4.1 is available, it peaks at around 13x-14x, and this is with a code base that was last touched nearly 8 years ago.
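If you want a quick feel for how much of your run time is allocator-bound before blaming the optimizer, something along these lines can help. It is only a rough sketch: measure_churn and ChurnResult are made-up names, and a loop this small is no substitute for profiling the real application. It times the same work twice, once allocating a fresh buffer every round and once re-using a single buffer.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct ChurnResult {
    double fresh_seconds;   // a new vector is allocated and freed every round
    double reused_seconds;  // one vector is re-used across all rounds
    double checksum;        // keeps the work observable to the optimizer
};

// Hypothetical helper: `points` stands in for the size of a temporary
// polygon ring, `rounds` for the number of operations performed.
ChurnResult measure_churn(std::size_t rounds, std::size_t points) {
    using clock = std::chrono::steady_clock;
    ChurnResult r{0.0, 0.0, 0.0};
    if (points == 0) return r;

    const auto t0 = clock::now();
    for (std::size_t i = 0; i < rounds; ++i) {
        std::vector<double> ring(points, static_cast<double>(i));  // heap round-trip
        r.checksum += ring.back();
    }
    const auto t1 = clock::now();

    std::vector<double> ring;
    for (std::size_t i = 0; i < rounds; ++i) {
        ring.assign(points, static_cast<double>(i));  // capacity is kept after round 0
        r.checksum += ring.back();
    }
    const auto t2 = clock::now();

    r.fresh_seconds  = std::chrono::duration<double>(t1 - t0).count();
    r.reused_seconds = std::chrono::duration<double>(t2 - t1).count();
    return r;
}
```

A large gap between fresh_seconds and reused_seconds on one toolchain that mostly disappears on another would point at the runtime's heap rather than at code generation.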