
You might find some sections of Sean Aaron Anderson's "Bit-Twiddling Hacks" of interest, though perhaps not of practical use. Check out: Compute the minimum (min) or maximum (max) of two integers without branching <http://graphics.stanford.edu/%7Eseander/bithacks.html#IntegerMinOrMax> Conditionally set or clear bits without branching <http://graphics.stanford.edu/%7Eseander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching> Conditionally negate a value without branching <http://graphics.stanford.edu/%7Eseander/bithacks.html#ConditionalNegate> Its worth noting that the better optimization your compiler has the better the loop in your test case will do on your branchless method. But a good optimizing compiler would probably avoid the branch in the first place. To use the technique to its fullest you or the optimizer should translate your loop into something like (untested): type a[2] = {BIG_VALUE, ANY_VALUE}; type* vp[2] = {--&v1[0], --&v2[0]}; // I know, but I'm expressing some possibly low level optimization. for (size_t i = 0; i<n; ++i) { a[1] = *vp[*++vp[1]<*++vp[0]]; a[0] = a[a[1]<a[0]]; } (My apologies if I did something dumb -- my work situation has resulted in my not doing much C++ for a number of years -- but you should get the idea). A good optimizing compiler should be able to do most or all of this from the inlined method, though the "relaxation" of the transfer of the move to "c[]" into an indirection as vp is pretty sophisticated. Binding the local a and the temporary containing the min of the v's to the vector they would otherwise be transfered to is routine. Of course, if this were code you really wanted to optimize, rather than a test loop, you would improve locality by processing each vector separately by making c[] a reference Topher DE wrote:
i'd like to present some kind of micro optimization towards pipelined cpus
traditionaly min() has an implementation like
template<typename type> inline type min(type a, type b) { return a<b ? a : b; }
this function (even if inlined) introduces a branch in a sequence of instructions though branch prediction algorithms do exist they will guess at most 50% of them because routine input is arbitrary since a wrong branch prediction causes pipeline stall i thought that such a useful routine like min() MUST have non-stalling (read 'non-branching') implementation i thought hard and in the end recalled a simple implementation which is called LUT (for 'look-up table') here's what i finally got:
template<typename type> inline type min(type a, type b) { const type c[2] = {a, b}; return c[b<a]; }
intuitively this implementation should work slower than the first one but in real life it runs faster (roughly by a factor of 2 for 4 byte types on core2duo) here's a test code:
#include <vector> int main() { typedef int type; const int n = 1000000; std::vector<type> v1(n), v2(n); //filling v1 and v2 somehow type min = BIG_VALUE; for (size_t i = 0; i<n; ++i) a = min(a, min(v1[i], v2[i])); //searching minima of v1 and v2 return 0; }
however for 'double' it runs ~2% slower (i think it's because creating a lut for 2 elements 8 bytes each is slower than a branch on c2d) nevertheless i think it's a great approach at least for ints also max() can implemented very similar
you may skip the rest example of usage imagine we want to implement a symmetric matrix that stores only half of its elements (say upper triangle)
a00 a01 a02 --- a11 a12 --- --- a22
then access requests to the lower triangle elements would be redirected to upper triangle ones here we have a constraint that actually we only have access to elements a(i, j) where i<=j must always be true (look at indices above) if we use a branch like
//... return i<=j ? a(i, j) : a(j, i);
we get a pipeline stall almost every time we want to access an arbitrary element however if we rewrite it carefully as
//... const size_t m = min(i, j); //non-branching min() here return a(m, m + abs(int(i) - int(j))); //abs() is also //non-branching we get rid of branches at all! and we can hope a pipelined cpu will handle it better than the other (branched) implementation
-- pavel
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost