
inline void reorder(int64_t source, int64_t& target) { target = ((source << 0x38) & 0xFF00000000000000) | ((source << 0x28) & 0x00FF000000000000) | ((source << 0x18) & 0x0000FF0000000000) | ((source << 0x08) & 0x000000FF00000000) | ((source >> 0x08) & 0x00000000FF000000) | ((source >> 0x18) & 0x0000000000FF0000) | ((source >> 0x28) & 0x000000000000FF00) | ((source >> 0x38) & 0x00000000000000FF); }
would be more efficient?
Yes, but if I read the standard correctly, it is relying on undefined behavior.
Both the C and C++ standards say:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2 [to the power] E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2 [to the power] E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
In practice I don't think this is a problem, but Boost code generally doesn't rely on code that has undefined behavior. Perhaps it would be OK to #ifdef on the architecture, and use shift-and-mask implementations only on architectures that don't do something weird (like cause an interrupt) in the case covered by the last sentence.
If you use uint64_t instead of int64_t (always the right thing when doing bit-twidling), there shouldn't be any UB, right?
Good point! Each case of "source" in the above code could be changed to "static_cast<uint32_t>(source)", and then the whole expression cast back to int32_t. AFAICS, that does eliminate the UB. --Beman