Re: [Boost-users] Boost.MultiIndex question on merge

28 Apr 2009

On Tue, Apr 28, 2009 at 5:05 PM, arm2arm <arm2arm@gmail.com> wrote:
...
> To: Ovanes
> Is there a gain in speed if I use for_each?
> I always avoid for_each so that the compiler (e.g. Intel's) can
> auto-parallelize the regions.
> But in this particular case it is not an issue.

To be honest, that's strange. I know from MSVC that using the std::
algorithms allows parallelisation.

E.g. using std::find to search for a 32-bit numeric value in a vector runs up
to 4 times faster than a hand-written loop, due to XMM register optimization.
A copy of the std::find loop implementation runs slower, since the compiler
does not know how the data pointed to by the pointer/iterator is aligned. This
is explained pretty well in Chapter 11 of
http://www.agner.org/optimize/optimizing_cpp.pdf. My experiments with MSVC
2003 & 2005 show that searching for a number (which is not present in the
vector) with a loop copied from the find implementation is 4 times slower
than using find itself. I am curious how this works out with the Intel
compiler.

IMO the STL algorithms delivered with a compiler are pretty well optimized for
that particular compiler version. If you write your own loop, it is certainly
not faster than the comparable STL algorithm distributed with the compiler.
Does Intel explicitly state that they do not optimize STL code, i.e. that it
is not parallelized and does not use XMM registers?

Thanks,
Ovanes

Ovanes Markarian