Thanks, Ovanes, for the explanations.

I have not touched MSVC in ages; currently I am using g++ and icpc. Intel probably also provides its own optimized STL library. I am not an expert, but a quick Google search always turns up Intel presentations that avoid using for_each. I will benchmark that part of the code to see which approach is faster in my case, insert or for_each.

But I am running into trouble with inserter:

    struct strData{
        int id;
        float a, b;
        strData(int id_, float a_, float b_):a(a_),b(b_),id(id_){}
    };

    typedef multi_index_container<
        strData,
        indexed_by<
            ordered_unique< tag<id>, BOOST_MULTI_INDEX_MEMBER(strData,int,id)>,
            ordered_non_unique< tag<snap>, BOOST_MULTI_INDEX_MEMBER(strData,float,a)>
        >
    > indexed_data_type;
    indexed_data_type data_setA, data_setB; // we need to append setB to setA

    {
        scoped_timer timeme("merge by INSERT: ");
        data_setA.insert(data_setB.begin(), data_setB.end());
    }
    {
        scoped_timer timeme("merge by FOR_EACH: ");
        std::for_each(data_setB.begin(), data_setB.end(), std::inserter(data_setA));
    }

The compiler gives a long error:

    error: no matching function for call to 'inserter(boost::multi_index::multi_index_container etc....

What is the correct syntax for inserter with multi_index_container?

Thanks,
Arman.

Ovanes Markarian wrote:
On Tue, Apr 28, 2009 at 5:05 PM, arm2arm wrote:

To: Ovanes
Is there a gain in speed if I use for_each? I always avoid for_each so that the compiler (like Intel's) can auto-parallelize those regions. But for this particular case it is not an issue.
To be honest, that's strange. I know from MSVC that using the std:: algorithms allows parallelisation.
E.g. using std::find to search for a 32-bit numeric value in a vector runs up to 4 times faster, due to XMM register optimization. Copying out the std::find loop implementation runs slower, since the compiler does not know how the vector pointed to by the pointer/iterator is aligned. It is explained pretty well here: http://www.agner.org/optimize/optimizing_cpp.pdf, Chapter 11. My experiments with MSVC 2003 & 2005 show that searching for a number (which is not present in the vector) in a loop copied from the find implementation is 4 times slower than using find itself. I am curious how it is with the Intel compiler.
IMO the STL algorithms delivered with a compiler are pretty well optimized for that particular compiler version. If you write your own loop, it is for sure not faster than the comparable STL algorithm distributed with the compiler. Does Intel explicitly state that they do not optimize STL code, that it is not parallelized, or that the STL does not use XMM registers?
Thanks, Ovanes
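For anyone who wants to check the std::find claim on g++ or icpc, a minimal timing sketch could look like the following. All names here (find_by_hand, use_std_find, use_hand_loop, seconds_for) are hypothetical and not from the thread, and the outcome depends entirely on compiler, flags and hardware, so no numbers are claimed. Each repetition searches for a different absent value so that the optimizer cannot reduce the repeated searches to a single one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hand-written loop with the same contract as std::find (hypothetical name).
const int* find_by_hand(const int* first, const int* last, int value) {
    for (; first != last; ++first)
        if (*first == value) break;
    return first;
}

// Small functors so both variants can go through the same timing code.
struct use_std_find {
    const int* operator()(const int* f, const int* l, int x) const {
        return std::find(f, l, x);
    }
};
struct use_hand_loop {
    const int* operator()(const int* f, const int* l, int x) const {
        return find_by_hand(f, l, x);
    }
};

// Runs `repeats` searches and returns the CPU seconds spent.
// Assumes v holds only non-negative values and `missing` is negative,
// so every searched value (missing - i) is absent from v.
// The offsets of the results are added to `checksum` so the work is used.
template <class Search>
double seconds_for(Search search, const std::vector<int>& v,
                   int missing, int repeats, std::size_t& checksum) {
    assert(!v.empty());
    const int* first = &v[0];
    const int* last = first + v.size();
    std::clock_t start = std::clock();
    for (int i = 0; i < repeats; ++i)
        checksum += static_cast<std::size_t>(search(first, last, missing - i) - first);
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(use_std_find(), v, -1, repeats, sum) and seconds_for(use_hand_loop(), v, -1, repeats, sum) on the same large vector gives the two timings to compare.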
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
--
View this message in context: http://www.nabble.com/Boost.MultiIndex-question-on-merge-tp23277496p23282660...
Sent from the Boost - Users mailing list archive at Nabble.com.
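On the inserter question at the top of this message: std::inserter takes two arguments, the container and a position hint, which is why the one-argument call finds no matching function. In addition, std::for_each expects a callable, whereas an insert iterator is an output iterator and is normally driven by std::copy. A minimal sketch follows. To keep it self-contained without Boost it uses a std::set ordered by id as a stand-in for indexed_data_type; that substitution, the by_id comparator and the append_all and merge_demo helpers are assumptions for illustration only. The same call shape should carry over to multi_index_container, since its ordered indices also provide insert(position, value).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

struct strData {
    int id;
    float a, b;
    strData(int id_, float a_, float b_) : id(id_), a(a_), b(b_) {}
};

// Hypothetical comparator: orders by id, mimicking the ordered_unique index.
struct by_id {
    bool operator()(const strData& x, const strData& y) const {
        return x.id < y.id;
    }
};

// Stand-in for indexed_data_type (assumption: std::set instead of Boost).
typedef std::set<strData, by_id> stand_in_type;

// Hypothetical helper: appends every element of src to dest.
// std::inserter needs the container AND a hint position, and it is
// std::copy (not std::for_each) that writes through the output iterator.
template <class Dest, class Src>
void append_all(Dest& dest, const Src& src) {
    std::copy(src.begin(), src.end(), std::inserter(dest, dest.end()));
}

// Merges two small sets and returns the size of the result.
std::size_t merge_demo() {
    stand_in_type data_setA, data_setB;
    data_setA.insert(strData(1, 0.1f, 0.2f));
    data_setA.insert(strData(2, 0.3f, 0.4f));
    data_setB.insert(strData(2, 9.0f, 9.0f)); // duplicate id, rejected on merge
    data_setB.insert(strData(3, 0.5f, 0.6f));
    append_all(data_setA, data_setB);
    assert(data_setA.find(strData(3, 0.0f, 0.0f)) != data_setA.end());
    return data_setA.size();
}
```

With the real container the call would read std::copy(data_setB.begin(), data_setB.end(), std::inserter(data_setA, data_setA.end())). Whether that beats the range insert is something the scoped_timer comparison would have to show.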