Re: [boost] boost.atomic and boost.lockfree backports

30 May 2011

      ...
double-checked with intel's software developer's manual, volume 3a, section
8.1.1: there is no way to access 128bit words atomically. afaik, the only way
to implement them correctly is to use cmpxchg16b ...

That's the answer we got from Intel & AMD architects a year ago. Unlike
aligned 8-byte memory accesses on x86, 16-byte memory accesses on x64 carry
no atomicity guarantee (except for cmpxchg16b), though I believe all
implementations at that time performed them atomically in practice.
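
To illustrate what "only via cmpxchg16b" means for a plain load, here is a
minimal sketch of a read expressed as a compare-exchange. It uses std::atomic
with a 64-bit value as a portable stand-in, not the actual 128-bit
instruction; load_via_cas is a hypothetical name, not something from
boost.atomic. One known cost of this pattern is that the location must be
writable, since even a "load" performs a locked read-modify-write.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: read an atomic value using only compare-exchange,
// the same shape a cmpxchg16b-based 128-bit load has to take.
template <typename T>
T load_via_cas(std::atomic<T>& target) {
    T expected{};  // arbitrary guess at the current value
    // If the guess is right, the same value is stored back (no visible change).
    // If the guess is wrong, the exchange fails and writes the current value
    // into 'expected'. Either way 'expected' ends up holding the current value.
    target.compare_exchange_strong(expected, expected);
    return expected;
}
```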

A few notes, after a one minute look over one file only:

Some of the functions seem pretty weird. E.g. why is a 128-bit fetch_add
implemented in terms of a packed add? Don't the individual components of the
vector just wrap around on overflow without carrying over? And
__declspec(align) takes a number of bytes, not bits.
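
The carry concern can be shown with a small portable model. This sketch
assumes the packed add works on two independent 64-bit lanes (as SSE2 paddq
does); U128, packed_add and wide_add are illustrative names, not taken from
the code under review.

```cpp
#include <cstdint>

// A 128-bit value modelled as two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Lane-wise add: each half wraps on its own, nothing carries across lanes.
inline U128 packed_add(U128 a, U128 b) {
    return U128{a.lo + b.lo, a.hi + b.hi};
}

// Full 128-bit add: overflow of the low half carries into the high half.
inline U128 wide_add(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;
    const std::uint64_t carry = (r.lo < a.lo) ? 1 : 0;  // low half wrapped
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Adding 1 to a value whose low half is all ones gives {0, 0} with packed_add
but {0, 1} with wide_add, so the two only agree when the low lane does not
overflow.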

BTW, VC 9 had a fairly random set of supported intrinsics for various locked
operations on x86/x64. I added a couple more to make this a bit more
consistent across bit widths in VC10 -- to support my implementation of
<atomic>.

Also alignment is not really alignment in x86 VC. Sadly, VC++ for x86 has no
strong stack alignment. By default, there are no stronger guarantees than
that ESP is 4 byte aligned on function entry. Hence, for locals with
stronger alignment requirements dynamic stack alignment is required
(interprocedural optimizations can sometimes elide that setup). Here doubles
differ from long longs, and both differ from types with explicit alignment
requirements (i.e. __declspec(align)). Only the latter are really guaranteed
to produce aligned addresses when declared as locals.

I'm not sure what aligned_storage and friends do, but just because __alignof
reports some value does not mean that locals will be properly aligned.
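
One way to check this rather than trust the reported alignment is to test
the address of an actual local at run time. A minimal sketch, using standard
alignas in place of __declspec(align) (both take a byte count); is_aligned,
Slot128 and local_slot_is_aligned are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// True if 'p' is a multiple of 'alignment' (given in bytes).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Explicit alignment requirement: 16 bytes, not 16 bits.
struct alignas(16) Slot128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

static_assert(alignof(Slot128) == 16, "alignment is expressed in bytes");

// Checks the address the compiler actually picked for a stack local.
inline bool local_slot_is_aligned() {
    Slot128 local{1, 2};
    return is_aligned(&local, alignof(Slot128));
}
```

The same is_aligned check can be pointed at a local double or long long to
see whether a given compiler and set of flags aligns them as well.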

-hg

Holger Grund