RE: [boost] platform specific structures

(first post to this forum, so be nice guys! :)
For example, on ia32 32 bit aligned ops are atomic
But you will also need to express the constraint that the data be aligned?
, on ia64 it is 64 bit.
However, remember that the fact that a given data type can be written in a single operation != visibility across all processors in an MP system.
doubles on ia32 need locking to be atomic, on ia64 they don't.
I think that the later models of ia32 can write up to 8 bytes at a time.
Any suggestions on best practice for this?
I think you might want to send this post to comp.programming.threads too - you may get some interesting feedback. I think Alexander Terekhov has some class wrapper for atomic types that I am sure he'd be keen to discuss, but unfortunately I've never had the time to review it.
Tom

Thanks for the interest Tom,
<tomas.puverle@morganstanley.com> wrote:
(first post to this forum, so be nice guys! :)
For example, on ia32 32 bit aligned ops are atomic
But you will also need to express the constraint that the data be aligned?
Yep, I just assume the appropriate alignment compiler option at present, which isn't really good enough, but it works for me as I use 8 byte alignment, the default on Intel/MSVC compilers. This will also break if you otherwise create a non-aligned object. But in practice need_lock<T> works for many situations for me when used wisely. The performance benefits from avoiding such locks can be dramatic, and automating the choice is handy, especially as it adds to the self-documenting nature of the code.

Without such an assumption you would need to check whether the object's address modulo 4 or 8 == 0. This is a bit different, as it depends on the object rather than the type, and is more than a compile-time trait can handle. Thus need_lock<T> would only be indicative of capability unless alignment is assumed. Perhaps a preprocessor check or some such for appropriate alignment could be used as well, but this would still not guarantee correctness, just help sidestep a common pitfall.
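To make the type-versus-object distinction concrete, here is a minimal sketch. It is not the actual need_lock<T> from this thread, whose implementation is not shown. The MaxAtomicBytes parameter, the is_scalar restriction, and the helpers is_aligned and assert_naturally_aligned are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>

// Hypothetical trait: true when a plain read or write of T should be locked.
// Assumption: MaxAtomicBytes is the widest aligned access the target performs
// atomically (4 as a conservative ia32 setting, 8 on ia64).
template <typename T, std::size_t MaxAtomicBytes = 4>
struct need_lock {
    static constexpr bool value =
        !std::is_scalar<T>::value || sizeof(T) > MaxAtomicBytes;
};

// Runtime check on one particular object: is its address a multiple of
// `alignment`? The trait cannot answer this, since it only sees the type.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Debug-build guard for code that relies on need_lock<T>::value being false.
template <typename T>
void assert_naturally_aligned(const T& object) {
    assert(is_aligned(std::addressof(object), sizeof(T)));
}
```

The trait answers "could this type ever be accessed without a lock?", while the assert catches the individual object that was packed or placed at an odd address. Neither replaces the compiler's alignment setting; they only make the assumption visible.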
, on ia64 it is 64 bit.
However, remember that the fact that a given data type can be written in a single operation != visibility across all processors in an MP system.
True. The usefulness depends on how you use it. For example, for many size_t, float and int setters and getters on ia32 it is just dandy. However, as you suggest, care must be taken: if setting or getting the property may interfere with the incomplete state of another concurrent transaction, then you're in dangerous waters. I usually find this not to be the case, but when saving such cycles you need to be aware of this and of the atomicity, consistency and isolation aspects. Objects with complex operations should always lock completely unless you are very sure of what you are doing. For a class which consists of a simple bunch of properties, these concerns typically don't exist.
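As an illustration of the "simple bunch of properties" case, here is a sketch that assumes a C++11 or later compiler and uses std::atomic with relaxed ordering. That is a swapped-in technique, not the plain aligned loads and stores discussed above. The class name Settings and its members are hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical property bag. Each field is read or written on its own, so no
// invariant spans two fields and no lock is taken.
class Settings {
public:
    // Relaxed ordering gives atomicity of the single value only. It says
    // nothing about when other, non-atomic data becomes visible to readers.
    void set_gain(float g) { gain_.store(g, std::memory_order_relaxed); }
    float gain() const { return gain_.load(std::memory_order_relaxed); }

    void set_limit(std::size_t n) { limit_.store(n, std::memory_order_relaxed); }
    std::size_t limit() const { return limit_.load(std::memory_order_relaxed); }

private:
    std::atomic<float> gain_{0.0f};
    std::atomic<std::size_t> limit_{0};
};
```

If a caller ever needs gain and limit to change together, this design stops being safe and the class should go back to locking completely, as noted above.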
doubles on ia32 need locking to be atomic, on ia64 they don't.
I think that the later models of ia32 can write up to 8 bytes at a time.
You're very right. From below it appears to me that Pentium and above can, only the 486/386 seem to miss out. Allowing atomic doubles and 64 bit ints with need_lock<T> is much nicer as long as your platform supports it. Begs the question: Should boost have specific architecture flags for libs such as Boost.Thread?

This is the song and verse from Intel's software developer ia32 manual (vol 3):
____________________________________
<quote>
7.1.1. Guaranteed Atomic Operations

The Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors guarantee that the following basic memory operations will always be carried out atomically:
• reading or writing a byte
• reading or writing a word aligned on a 16-bit boundary
• reading or writing a doubleword aligned on a 32-bit boundary

The Pentium 4, Intel Xeon, and P6 family, and Pentium processors guarantee that the following additional memory operations will always be carried out atomically:
• reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors guarantee that the following additional memory operation will always be carried out atomically:
• unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a 32-byte cache line

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
</quote>
____________________________________
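One way such an architecture flag might look, as a rough sketch only. EXAMPLE_MAX_ATOMIC_BYTES, max_atomic_bytes and platform_need_lock are hypothetical names, not existing Boost macros or traits. The detection is deliberately conservative: it reports 8 bytes only where the compiler's predefined macros indicate Pentium-class or 64-bit hardware, and 0 (always lock) where the target is unknown.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical configuration macro: widest aligned access assumed atomic.
#if defined(EXAMPLE_MAX_ATOMIC_BYTES)
    // An explicit override from the build system wins.
#elif defined(__x86_64__) || defined(_M_X64) || defined(__ia64__) || defined(_M_IA64)
#   define EXAMPLE_MAX_ATOMIC_BYTES 8
#elif (defined(_M_IX86) && _M_IX86 >= 500) || defined(__i586__) || defined(__i686__)
    // 32-bit x86 targeting Pentium or later: aligned quadwords are atomic.
#   define EXAMPLE_MAX_ATOMIC_BYTES 8
#elif defined(_M_IX86) || defined(__i386__)
    // 386/486, or a 32-bit x86 target we cannot classify: doublewords only.
#   define EXAMPLE_MAX_ATOMIC_BYTES 4
#else
    // Unknown architecture: claim nothing, so every type needs a lock.
#   define EXAMPLE_MAX_ATOMIC_BYTES 0
#endif

constexpr std::size_t max_atomic_bytes = EXAMPLE_MAX_ATOMIC_BYTES;

// Hypothetical trait driven by the flag above. Alignment is still assumed.
template <typename T>
struct platform_need_lock
    : std::integral_constant<bool, !std::is_scalar<T>::value ||
                                       (sizeof(T) > max_atomic_bytes)> {};
```

Keeping the detection in one place means the 486 versus Pentium distinction from the quoted manual section becomes a build setting rather than something each class has to know about.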
Any suggestions on best practice for this?
I think you might want to send this post to comp.programming.threads too - you may get some interesting feedback. I think Alexander Terekhov has some class wrapper for atomic types that I am sure he'd be keen to discuss, but unfortunately I've never had the time to review it.
I think that is a different type of atomicity from what I'm discussing here, which departs from the traditional view of atomic ops. Atomic ops do something to a location. Here I'm just talking about getting or setting the value rather than operating on it. There is no exchange or operation consistency taking place, just: yep, you can read or write this many bytes with consistency guaranteed.

There has also been talk of a memory model for C++ on std.c++ recently. A causal syntax for required ordering would be useful, but getting beyond sequence points to a memory model with fencing or whatever is going to be a hard slog. Probably better off making a memory-fence-like primitive available through a portable mechanism. The problem with that is that some platforms might offer 15 different types and another might offer 3, as in an example that was given. No easy answers there...
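For what a portable fence primitive can look like, here is a sketch using std::atomic_thread_fence from C++11 and later. This is one possible shape of the idea, not something proposed in this thread. The namespace fence_example and the names payload, ready, publish and try_consume are hypothetical.

```cpp
#include <atomic>

namespace fence_example {

int payload = 0;                 // plain data, written before publication
std::atomic<bool> ready{false};  // flag that announces the payload

// Writer side: the release fence keeps the payload write from being
// reordered after the flag store.
inline void publish(int value) {
    payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Reader side: once the flag is seen, the acquire fence ensures the payload
// read observes the value written before the release fence.
inline bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}

}  // namespace fence_example
```

The getter and setter are each a single atomic access, as in the discussion above. The fences add the ordering between the flag and the other data, which is the part that atomicity of a single write does not provide.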
Tom
Thanks Tom.
Any view on the ifdef, separate header, API wrapping tradeoffs for such code?

Regards,
Matt Hurd
matthurd@acm.org
www.hurd.com.au
participants (2)
- Matt Hurd
- Puverle, Tomas (IT)