I did some boost::mpl programming intended for high-performance mathematical calculation. The class had a template parameter that was supposed to be a boost::mpl::vector containing boolean elements. An operation between two instances (with different template vector classes) should result in a new type whose vector elements are the element-wise OR of the operands' elements.

I relied on the boost::mpl::for_each function being inlined. I also relied on tests of the vector elements in an if-statement being optimized away: any if(vector-element) .. else .. should compile to code that no longer checks this vector element, since it is constant for every type.

What I ended up with were function calls. First, the execute methods inside for_each.hpp were not inlined. I added some __forceinline (or the matching attribute for gcc) to the boost::mpl::for_each code:

diff -wr boost_1_35_0/boost/mpl/for_each.hpp boost_1_35_0.saved/boost_1_35_0/boost/mpl/for_each.hpp
41,42c41
< __attribute__ ((__always_inline__))
< inline static void execute(
---
> static void execute(
61,62c60
< __attribute__ ((__always_inline__))
< inline static void execute(
---
> static void execute(
92d89
< __attribute__ ((__always_inline__))
107d103
< __attribute__ ((__always_inline__))

The next problem was some handling of a value (which I don't understand) that was not inlined:

diff -wr boost_1_35_0/boost/utility/value_init.hpp boost_1_35_0.saved/boost_1_35_0/boost/utility/value_init.hpp
55,56d54
< __attribute__ ((__always_inline__))
< inline
75,76d72
< __attribute__ ((__always_inline__))
< inline
82,83d77
< __attribute__ ((__always_inline__))
< inline
92,93d85
< __attribute__ ((__always_inline__))
< inline
99,100d90
< __attribute__ ((__always_inline__))
< inline
106,107d95
< __attribute__ ((__always_inline__))
< inline
115,116d102
< __attribute__ ((__always_inline__))
< inline
122,123d107
< __attribute__ ((__always_inline__))
< inline

I don't understand what is being achieved with this value_init stuff. We are dealing with const bool values, which do not need to reside on the stack again. Somehow these value initializations made it into the optimized assembler code and were not removed. They are not referenced in the assembler code either: it loads 0 or 1 byte literals into stack variables which are never referenced later.

In the end I had slower code than before, although it should have been faster, since some unnecessary operations are now removed.

The idea of these type conversions and checked constructors and assignment operators is very appealing. Thanks for the mpl library. I hope you consider my experience in making this code faster and so more widely applicable.

Peter Foelsche
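
For readers who want to see the kind of setup described above, here is a minimal sketch. It is not the code from this post: it uses plain C++11 variadic templates as a stand-in for boost::mpl::vector and boost::mpl::for_each, and the names flags, or_flags, sum_active and self_check are hypothetical. It shows a result type built from the element-wise OR of two boolean lists, and overload resolution on the flag value taking the place of an if(vector-element) test, so that no runtime check on a flag is written in the source.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a boost::mpl::vector of boolean constants.
template <bool... Bs>
struct flags {};

// Element-wise OR of two flag lists of equal length.
template <typename A, typename B>
struct or_flags;

template <bool... As, bool... Bs>
struct or_flags<flags<As...>, flags<Bs...> > {
    typedef flags<(As || Bs)...> type;
};

// Sum only the components whose flag is true. Recursion over the flag
// list plays the role of for_each; the overload is selected by the
// first flag, so there is no runtime branch on the flag value.
inline double sum_active(const double*, flags<>) { return 0.0; }

template <bool... Rest>
inline double sum_active(const double* p, flags<true, Rest...>);
template <bool... Rest>
inline double sum_active(const double* p, flags<false, Rest...>);

// Active component: include it in the sum.
template <bool... Rest>
inline double sum_active(const double* p, flags<true, Rest...>) {
    return *p + sum_active(p + 1, flags<Rest...>());
}

// Inactive component: skip it without reading it.
template <bool... Rest>
inline double sum_active(const double* p, flags<false, Rest...>) {
    return sum_active(p + 1, flags<Rest...>());
}

// Small self-check that exercises the pieces above.
inline void self_check() {
    typedef or_flags<flags<true, false, false>,
                     flags<false, false, true> >::type result;
    static_assert(std::is_same<result, flags<true, false, true> >::value,
                  "unexpected OR result");
    const double data[3] = {1.0, 2.0, 4.0};
    assert(sum_active(data, result()) == 5.0);  // 1.0 + 4.0, index 1 skipped
}
```

Whether the recursion collapses into straight-line code still depends on the compiler and optimization level, which is the point raised in the post; the sketch only removes the flag tests from the source.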