
even though uuid::data is uint8_t[16]. I see an assembly listing that is virtually identical to the one you posted for _Equal, which is also just a memcmp call.

You probably want to make sure the /Oi switch (Generate Intrinsic Functions) is present.
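/Oi lets MSVC replace calls to functions such as memcmp with inline intrinsic code. A minimal sketch of the same request made per file with MSVC's #pragma intrinsic, guarded so that other compilers skip it. The helper name bytes_equal16 is hypothetical, and the compiler is still free to emit a real call to memcmp.

```cpp
#include <cstdint>
#include <cstring>

#ifdef _MSC_VER
// Per-file request for the intrinsic form of memcmp;
// the /Oi switch enables intrinsics for the whole compilation instead.
#pragma intrinsic(memcmp)
#endif

// Hypothetical helper: the length is a compile-time constant, which gives
// the optimizer the best chance to expand memcmp inline instead of calling
// the CRT function.
inline bool bytes_equal16(const std::uint8_t* a, const std::uint8_t* b)
{
    return std::memcmp(a, b, 16) == 0;
}
```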
    inline bool __CLRCALL_OR_CDECL _Equal(const unsigned char *_First1,
        const unsigned char *_Last1, const unsigned char *_First2,
        random_access_iterator_tag, _Range_checked_iterator_tag)
        {    // compare [_First1, _Last1) to [First2, ...), for unsigned chars
    #if _HAS_ITERATOR_DEBUGGING
        _DEBUG_RANGE(_First1, _Last1);
        if (_First1 != _Last1)
            _DEBUG_POINTER(_First2);
    #endif /* _HAS_ITERATOR_DEBUGGING */

        return (::memcmp(_First1, _First2, _Last1 - _First1) == 0);
        }

You have one level of indirection here, and the original code has two: the iterators plus the _Equal call. As you can see, this is too complex for the MS optimizer: it appears to lose the data size and alignment information and give up.
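To make the two routes concrete, here is a minimal sketch using a hypothetical uuid_like type (not the actual Boost.Uuid source). One function compares through iterators and std::equal, which in VC++ ends up in an _Equal overload like the one above. The other calls memcmp directly with a size the compiler knows at the call site.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a UUID type; not the actual Boost.Uuid code.
struct uuid_like
{
    std::uint8_t data[16];

    const std::uint8_t* begin() const { return data; }
    const std::uint8_t* end() const { return data + sizeof(data); }
};

// Iterator route: std::equal only sees a [first, last) range, so the fixed
// length of 16 reaches memcmp only after the library's internal dispatch.
inline bool equal_via_iterators(const uuid_like& a, const uuid_like& b)
{
    return std::equal(a.begin(), a.end(), b.begin());
}

// Direct route: the length is a compile-time constant at the call site.
inline bool equal_via_memcmp(const uuid_like& a, const uuid_like& b)
{
    return std::memcmp(a.data, b.data, sizeof(a.data)) == 0;
}
```

Both functions return the same result. The only difference is how much information about the size the optimizer still has when it reaches memcmp.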
I'm using VC++ 2005, though. I do remember that at least the memcpy intrinsic was handled quite cleverly by that compiler, and the optimizers in subsequent versions are no doubt better.
Although in both cases the loop that is actually executed is identical to the one you gave above.

You are right, and I was unable to make VC unroll it. That is why I believe a library intended for use in commercial applications should do its best to hand-optimize obvious cases. And I believe uint8_t data[16] is a crystal-clear case in all respects, including alignment.
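A minimal sketch of one way a library could hand-optimize this case, assuming the 16-byte buffer of uint8_t data[16]. The required alignment of uint8_t[16] is only 1, so the sketch does not cast the pointer to a wider type. It copies the bytes into two 64-bit words with memcpy, which is well-defined for any alignment. The function name uuid_bytes_equal is hypothetical, and it is not guaranteed that a given compiler turns each memcpy into a single load.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical hand-unrolled equality for a 16-byte UUID buffer.
// memcpy into 64-bit words avoids any alignment assumption; optimizers
// commonly lower each 8-byte memcpy to one (possibly unaligned) load.
inline bool uuid_bytes_equal(const std::uint8_t (&a)[16], const std::uint8_t (&b)[16])
{
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a, 8);
    std::memcpy(&a1, a + 8, 8);
    std::memcpy(&b0, b, 8);
    std::memcpy(&b1, b + 8, 8);
    // Zero only if both halves match; avoids a second branch.
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Writing a0 == b0 && a1 == b1 would be equally correct. The XOR/OR form just keeps the comparison branch-free.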
-- Michael Kochetkov