
On Thu, Sep 10, 2009 at 10:17 AM, DE <satan66613@yandex.ru> wrote:
> I'd like to present a micro-optimization aimed at pipelined CPUs.
> Traditionally, min() has an implementation like
> template<typename type> inline type min(type a, type b) { return a<b ? a : b; }
> This function (even if inlined) introduces a branch into the instruction sequence.

I'm curious: have you actually looked at the generated instructions? On a 686+ this should compile into two non-branching instructions for min():

    cmp   a, b
    cmova a, b

max() is the same thing, but with cmovb in place of cmova. Your solution should generate at least 5 instructions.

If an Intel engineer is to be believed, cmov is great for use in normal code, but in a short loop it creates dependencies that cause a performance hit. It sounds like your benchmark may be hitting this inefficiency squarely on the nose.

https://mail.mozilla.org/pipermail/tamarin-devel/2008-April/000454.html

-- 
Cory Nelson
http://int64.org
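
For anyone who wants to check the generated instructions themselves, here is a minimal sketch of a test file. The names min_ternary, max_ternary, min_u32, max_u32 and sanity_check are hypothetical and not taken from the original post; the templates simply mirror the ternary form quoted above. Whether a given compiler actually emits cmov for them depends on the compiler, the optimization level and the target, so treat this as something to inspect rather than a guarantee.

```cpp
#include <cassert>

// Same shape as the quoted min(): a plain ternary, no tricks.
template <typename T>
inline T min_ternary(T a, T b) { return a < b ? a : b; }

// The mirror image for max().
template <typename T>
inline T max_ternary(T a, T b) { return a < b ? b : a; }

// Non-template wrappers with external linkage, so each one shows up
// under a readable symbol name in the assembly listing.
unsigned min_u32(unsigned a, unsigned b) { return min_ternary(a, b); }
unsigned max_u32(unsigned a, unsigned b) { return max_ternary(a, b); }

// Quick correctness check, unrelated to the code generation question.
void sanity_check() {
    assert(min_u32(3u, 7u) == 3u);
    assert(max_u32(3u, 7u) == 7u);
    assert(min_u32(5u, 5u) == 5u);
}
```

Compiling this with optimization on and asking for assembly output (for example g++ -O2 -S) and then reading the bodies of min_u32 and max_u32 shows whether the compiler chose a conditional move or a branch. On 32-bit x86 the target may need to be at least i686 (for example -march=i686) before cmov is available to the compiler at all.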