
On 2011-09-07 22:30:35 +0300, Beman Dawes said:
inline void reorder(int64_t source, int64_t& target)
{
    target = ((source << 0x38) & 0xFF00000000000000)
           | ((source << 0x28) & 0x00FF000000000000)
           | ((source << 0x18) & 0x0000FF0000000000)
           | ((source << 0x08) & 0x000000FF00000000)
           | ((source >> 0x08) & 0x00000000FF000000)
           | ((source >> 0x18) & 0x0000000000FF0000)
           | ((source >> 0x28) & 0x000000000000FF00)
           | ((source >> 0x38) & 0x00000000000000FF);
}
If you use uint64_t instead of int64_t (always the right thing when doing bit-twiddling), there shouldn't be any UB, right?
Good point! Each occurrence of "source" in the code above could be changed to "static_cast<uint64_t>(source)", and then the whole expression cast back to int64_t.
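A minimal sketch of what that change might look like, keeping all shifting and masking on an unsigned value so no sign bit is ever shifted (the final conversion back to int64_t is implementation-defined rather than undefined behaviour before C++20):

    #include <cstdint>

    inline void reorder(int64_t source, int64_t& target)
    {
        // All the work happens on uint64_t, so the shifts have no UB;
        // the top and bottom bytes need no mask once the value is unsigned.
        const uint64_t u = static_cast<uint64_t>(source);
        target = static_cast<int64_t>( (u << 0x38)
               | ((u << 0x28) & 0x00FF000000000000)
               | ((u << 0x18) & 0x0000FF0000000000)
               | ((u << 0x08) & 0x000000FF00000000)
               | ((u >> 0x08) & 0x00000000FF000000)
               | ((u >> 0x18) & 0x0000000000FF0000)
               | ((u >> 0x28) & 0x000000000000FF00)
               |  (u >> 0x38) );
    }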
If you're doing a proper benchmark, Beman, I'd add one more trick to compare in the test:

    inline void reorder(uint64_t source, uint64_t& target)
    {
        uint64_t step32, step16;
        step32 = source << 32 | source >> 32;
        step16 = (step32 & 0x0000FFFF0000FFFF) << 16 | (step32 & 0xFFFF0000FFFF0000) >> 16;
        target = (step16 & 0x00FF00FF00FF00FF) << 8  | (step16 & 0xFF00FF00FF00FF00) >> 8;
    }

It at least appears to be less work than tymofey's code above, though compiler optimizations might end up making it worse.

PS. The code for 128-bit integers takes one step more, etc.

-- Pyry Jahkola pyry.jahkola@iki.fi
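For what it's worth, here is a sketch of the "one step more" 128-bit variant the PS alludes to. It assumes GCC/Clang's unsigned __int128 extension (not standard C++), and the rep() helper is introduced here purely to build the repeated mask constants:

    #include <cstdint>

    typedef unsigned __int128 u128;

    // Repeat a 64-bit mask pattern across both halves of a 128-bit word.
    inline u128 rep(uint64_t pattern)
    {
        return (u128)pattern << 64 | pattern;
    }

    inline void reorder(u128 source, u128& target)
    {
        u128 step64, step32, step16;
        step64 = source << 64 | source >> 64;  // the extra step: swap 64-bit halves
        step32 = (step64 & rep(0x00000000FFFFFFFF)) << 32 | (step64 & rep(0xFFFFFFFF00000000)) >> 32;
        step16 = (step32 & rep(0x0000FFFF0000FFFF)) << 16 | (step32 & rep(0xFFFF0000FFFF0000)) >> 16;
        target = (step16 & rep(0x00FF00FF00FF00FF)) << 8  | (step16 & rep(0xFF00FF00FF00FF00)) >> 8;
    }

As with the 64-bit version, each step swaps progressively smaller halves, so the whole byte reversal takes log2(width/8) steps.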