
On Wednesday 03 August 2011 09:39:21 Grund, Holger wrote:
Efficient loads & stores are a bit tricky in that SSE2 is not a requirement for 32-bit Windows. Without it, I think we need to resort to FILD/FISTP, which is a pain.
iirc, sse2 intrinsics are not guaranteed to be atomic, so sometimes memory access has to be emulated via CAS.
All aligned 64-bit accesses are guaranteed to be atomic on x86. The same is not true for 128-bit loads and stores on x64 (at least there are no architectural guarantees -- I think most, if not all, Intel & AMD implementations still performed them atomically as of 2009).
I'm not really sure how you would implement a fully correct lock-free atomic<int128_t> on x64. A cmpxchg16b requires the underlying page to be writable.
if the page is not writable, then why would you need an atomic<int128_t> in the first place?

- if the data is unchanging, then it doesn't matter
- if the data is changing (through a writable mapping by someone else to the page), then you have some sort of producer-/consumer-problem and that is trivially solvable with word-sized atomic operations

IMHO the same rationale holds for 64 bit atomics on 32 bit, so emulation via DCAS is acceptable -- since the lock prefix is needed anyway before cmpxchg8b/cmpxchg16b this should also deal with misalignment (even though this incurs a hefty performance penalty)

Helge
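
For illustration, here is a minimal sketch of what emulating a 64-bit load and store via CAS could look like. It is written against C++11 std::atomic purely as an assumption for the example, and the helper names load_via_cas, store_via_cas and example are hypothetical; none of this is taken from an actual library implementation. Whether the compare-exchange compiles down to lock cmpxchg8b depends on the compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.

// Load a 64-bit value using nothing but compare-and-swap.
// If the CAS fails, the current value is copied into `guess`.
// If it succeeds, the object held `guess` already and the same value is
// written back. Either way `guess` ends up holding an atomically read value.
// The parameter is non-const because a CAS counts as a write access, which
// is why this approach needs the underlying page to be writable.
inline std::uint64_t load_via_cas(std::atomic<std::uint64_t>& a) {
    std::uint64_t guess = 0;
    a.compare_exchange_strong(guess, guess);
    return guess;
}

// Store a 64-bit value using a CAS retry loop.
inline void store_via_cas(std::atomic<std::uint64_t>& a, std::uint64_t desired) {
    std::uint64_t expected = load_via_cas(a);
    while (!a.compare_exchange_weak(expected, desired)) {
        // `expected` now holds the latest value; try again.
    }
}

// Usage sketch: single-threaded round trip.
inline void example() {
    std::atomic<std::uint64_t> v{0};
    store_via_cas(v, 0x0123456789abcdefULL);
    assert(load_via_cas(v) == 0x0123456789abcdefULL);
}
```

The load is the expensive part of this scheme: even a pure read turns into a locked read-modify-write, so it contends with other readers of the same cache line.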