
On Wed, Sep 7, 2011 at 6:15 PM, Beman Dawes <bdawes@acm.org> wrote:
On Mon, Sep 5, 2011 at 2:56 PM, tymofey <tymofey@qip.ru> wrote:
This is not a full review, but rather a few comments on the implementation of reorder(x) functions. What you are doing is casting pointers around and assigning individual bytes, which might not really effective:
inline void reorder(int64_t source, int64_t& target) { const char* s (reinterpret_cast<const char*>(&source)); char * t (reinterpret_cast<char*>(&target) + sizeof(target) - 1); *t = *s; *--t = *++s; *--t = *++s; *--t = *++s; *--t = *++s; *--t = *++s; *--t = *++s; *--t = *++s; }
it does eight increments, eight decrements, one addition, one substraction and eight assignments. maybe something like:
inline void reorder(int64_t source, int64_t& target) { target = ((source << 0x38) & 0xFF00000000000000) | ((source << 0x28) & 0x00FF000000000000) | ((source << 0x18) & 0x0000FF0000000000) | ((source << 0x08) & 0x000000FF00000000) | ((source >> 0x08) & 0x00000000FF000000) | ((source >> 0x18) & 0x0000000000FF0000) | ((source >> 0x28) & 0x000000000000FF00) | ((source >> 0x38) & 0x00000000000000FF); }
would be more efficient?
Yes, but if I read the standard correctly, it is relying on undefined behavior.
Both the C and C++ standards say:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2 [to the power] E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2 [to the power] E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
In practice I don't think this is a problem, but Boost code generally doesn't rely on code that has undefined behavior. Perhaps it would be OK to #ifdef on the architecture, and use shift-and-mask implementations only on architectures that don't do something weird (like cause an interrupt) in the case covered by the last sentence.
If you use uint64_t instead of int64_t (always the right thing when doing bit-twidling), there shouldn't be any UB, right? -- gpd