Solaris versions of sp_counted_base and atomic_count by Michael van der Westhuizen

Michael van der Westhuizen has generously contributed Solaris versions of atomic_count and sp_counted_base. You can find them in the CVS HEAD and as attachments to this message. I haven't yet applied his patch (also attached) to atomic_count.hpp and sp_counted_base.hpp that would enable them, but I plan to do so in a couple of days if there are no objections. If you have any suggestions or comments, please let me know; in particular, if you have a better idea of how to detect SunOS 5.10 or later, or if you know what memory synchronization guarantees the Solaris atomics are supposed to provide. :-) Thanks in advance for any help or interest, and of course, big thanks to Michael for his contribution.

Bronek Kozicki kindly reminded me of a g++/Sparc version of sp_counted_base by Piotr Wyderski that he sent me some months ago, which I'd completely forgotten about. I've attached it to this message; it requires the V9 instruction set and can be enabled by
#elif defined( __GNUC__ ) && ( defined( __sparc ) || defined(__sparc_v9__ ) )
# include <boost/detail/sp_counted_base_gcc_sparc.hpp>
in sp_counted_base.hpp. We might want to use it in preference to the Solaris version. Anybody using g++ on Sparc care to comment or give it a try?

Peter Dimov wrote:
Bronek Kozicki kindly reminded me of a g++/Sparc version of sp_counted_base by Piotr Wyderski he sent me some months ago, which I'd completely forgotten about. I've attached it to this message; it requires the V9 instruction set and can be enabled by
#elif defined( __GNUC__ ) && ( defined( __sparc ) || defined(__sparc_v9__ ) ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
in sp_counted_base.hpp.
Actually, now that I think about it, this should be
#elif defined( __GNUC__ ) && defined( __sparc_v9__ )
# include <boost/detail/sp_counted_base_gcc_sparc.hpp>
as the implementation requires Sparc V9.

#elif defined( __GNUC__ ) && ( defined( __sparc ) || defined(__sparc_v9__ ) ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
in sp_counted_base.hpp.
Actually, now that I think about it, this should be
#elif defined( __GNUC__ ) && defined( __sparc_v9__ ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
as the implementation requires Sparc V9.
Peter, You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct. Tom

Tomas Puverle wrote:
#elif defined( __GNUC__ ) && ( defined( __sparc ) || defined(__sparc_v9__ ) ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
in sp_counted_base.hpp.
Actually, now that I think about it, this should be
#elif defined( __GNUC__ ) && defined( __sparc_v9__ ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
as the implementation requires Sparc V9.
Peter,
You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct.
Unfortunately, I'm told that the default for g++ is V7 and the code is being rejected. I don't have access to a Sparc and am unable to test it myself. :-)

You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct.
Unfortunately, I'm told that the default for g++ is V7 and the code is being rejected. I don't have access to a Sparc and am unable to test it myself.
For 32bit compilation, consider this:
g++ -mcpu=v8 -mtune=ultrasparc3 <file>
For 64bit compilation, try this:
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 <file>
I think this should work.
Tom

Tomas Puverle wrote:
You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct.
Unfortunately, I'm told that the default for g++ is V7 and the code is being rejected. I don't have access to a Sparc and am unable to test it myself.
For 32bit compilation, consider this:
g++ -mcpu=v8 -mtune=ultrasparc3 <file>
For 64bit compilation, try this
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 <file>
I think this should work.
I'm more interested in the source code side; as I understand it, g++ <file> will fail if we test __sparc__ and try to use cas. Do you think that #if defined( __sparc_v8__ ) || defined( __sparc_v9__ ) will work reliably (by which I mean, include the header file when it would compile, and not include it when its compilation would fail)?

On 7/3/06, Peter Dimov <pdimov@mmltd.net> wrote:
Tomas Puverle wrote:
You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct.
Unfortunately, I'm told that the default for g++ is V7 and the code is being rejected. I don't have access to a Sparc and am unable to test it myself.
For 32bit compilation, consider this:
g++ -mcpu=v8 -mtune=ultrasparc3 <file>
For 64bit compilation, try this
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 <file>
I think this should work.
I'm more interested in the source code side; as I understand it, g++ <file> will fail if we test __sparc__ and try to use cas. Do you think that
#if defined( __sparc_v8__ ) || defined( __sparc_v9__ )
will work reliably (by which I mean, include the header file when it would compile, and not include it when its compilation would fail)?
I've just tested the above guard, and it works. Before changing to this, I got the following error:
/usr/ccs/bin/as: "/var/tmp//ccHTWOlm.s", line 126: error: cannot use v8plus instructions in a non-v8plus target binary
After the change, g++ with no machine/cpu arguments produces a binary which file says is:
shared_ptr_mt_test: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
With machine/cpu flags set to "-m32 -mcpu=ultrasparc -mtune=ultrasparc3":
shared_ptr_mt_test: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
Timing tests tell me that the correct code is selected :-)
Michael
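The guard under discussion can be tried in isolation before it goes into sp_counted_base.hpp. The following is only a sketch: the helper names selected_sp_backend and uses_sparc_cas and the returned strings are hypothetical, and only the macro names are taken from this thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports which branch of the guard from this thread is taken.
inline const char* selected_sp_backend()
{
#if defined( __GNUC__ ) && ( defined( __sparc_v8__ ) || defined( __sparc_v9__ ) )
    return "gcc_sparc";   // branch that would include sp_counted_base_gcc_sparc.hpp
#else
    return "fallback";    // any other target, including g++ on Sparc with no cpu flags
#endif
}

// Hypothetical convenience wrapper around the string comparison.
inline bool uses_sparc_cas()
{
    const char* backend = selected_sp_backend();
    assert( backend != 0 );
    return std::strcmp( backend, "gcc_sparc" ) == 0;
}
```

Printing selected_sp_backend() from a small test program built with and without the -mcpu flags mentioned earlier would show whether the guard follows what the assembler accepts on a given box.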

For 32bit compilation, consider this:
g++ -mcpu=v8 -mtune=ultrasparc3 <file>
For 64bit compilation, try this
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 <file>
<snip> Re-reading the g++ docs, it seems that compiling with v9 in 32-bit mode implies v8+. So a small change to my original compile line: 32-bit build:
g++ -mcpu=v9 -mtune=ultrasparc3 foo.C
file a.out
a.out: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, dynamically linked, not stripped
64-bit build:
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 foo.C
file a.out
a.out: ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped
Tom

I'm more interested in the source code side; as I understand it, g++ <file> will fail if we test __sparc__ and try to use cas. Do you think that
#if defined( __sparc_v8__ ) || defined( __sparc_v9__ )
will work reliably (by which I mean, include the header file when it would compile, and not include it when its compilation would fail)?
Yes, based on a cursory look. Can you please forward the whole file to me so I can review it properly? You can PM me. Thanks, Tom

Hi, On 7/3/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:
You may want to reconsider this fix. The assembly code in the previous post will work on v8 or v9. The original test is correct.
Unfortunately, I'm told that the default for g++ is V7 and the code is being rejected. I don't have access to a Sparc and am unable to test it myself.
For 32bit compilation, consider this:
g++ -mcpu=v8 -mtune=ultrasparc3 <file>
For 64bit compilation, try this
g++ -mcpu=v9 -mtune=ultrasparc3 -m64 <file>
I think this should work.
Tom
This is confirmed to work with changes (you actually need v8plus, but gcc didn't grok that, so I used ultrasparc instead), and with Peter's include guard changes this looks like a solid solution.
Here are test results for shared_ptr_mt_test and weak_ptr_mt_test using sp_counted_base_gcc_sparc. I've run tests in both 32 bit (sparcv8plus) and 64 bit (sparcv9) flavours. These tests were run on a dual CPU Sun Fire 280 (2x750MHz UltraSPARC III) which is so old it's already desupported by Sun...
[michael@hawk test]$ uname -a; g++ -v
SunOS hawk 5.9 Generic_118558-21 sun4u sparc SUNW,Sun-Fire-280R
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.9/3.3/specs
Configured with: ../configure --disable-nls --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld
Thread model: posix
gcc version 3.3
Anyway, test results appended.
Michael
*************************
** 32 bit tests
*************************
[michael@hawk test]$ g++ -DBOOST_SP_USE_PTHREADS -pthreads -m32 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o shared_ptr_mt_test shared_ptr_mt_test.cpp
[michael@hawk test]$ file shared_ptr_mt_test
shared_ptr_mt_test: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./shared_ptr_mt_test
Using POSIX threads: 16 threads, 1048576 iterations: 71.740 seconds.
real 39.29 user 1:02.11 sys 9.64
[michael@hawk test]$ g++ -pthreads -m32 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o shared_ptr_mt_test shared_ptr_mt_test.cpp
[michael@hawk test]$ file shared_ptr_mt_test
shared_ptr_mt_test: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./shared_ptr_mt_test
Using POSIX threads: 16 threads, 1048576 iterations: 31.090 seconds.
real 16.20 user 29.58 sys 1.53
[michael@hawk test]$ g++ -DBOOST_SP_USE_PTHREADS -pthreads -m32 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o weak_ptr_mt_test weak_ptr_mt_test.cpp
[michael@hawk test]$ file weak_ptr_mt_test
weak_ptr_mt_test: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./weak_ptr_mt_test
Using POSIX threads: 16 threads, 16384 * 512 iterations:
382300 locks, 182933 forced rebinds, 8022692 normal rebinds.
364167 locks, 173352 forced rebinds, 8040825 normal rebinds.
379800 locks, 181960 forced rebinds, 8025192 normal rebinds.
377893 locks, 180638 forced rebinds, 8027099 normal rebinds.
384408 locks, 183885 forced rebinds, 8020584 normal rebinds.
383262 locks, 183571 forced rebinds, 8021730 normal rebinds.
409346 locks, 196148 forced rebinds, 7995646 normal rebinds.
387171 locks, 185867 forced rebinds, 8017821 normal rebinds.
392784 locks, 188247 forced rebinds, 8012208 normal rebinds.
372941 locks, 178030 forced rebinds, 8032051 normal rebinds.
372678 locks, 177818 forced rebinds, 8032314 normal rebinds.
370134 locks, 176891 forced rebinds, 8034858 normal rebinds.
384430 locks, 184190 forced rebinds, 8020562 normal rebinds.
375437 locks, 179117 forced rebinds, 8029555 normal rebinds.
391604 locks, 187686 forced rebinds, 8013388 normal rebinds.
360062 locks, 172281 forced rebinds, 8044930 normal rebinds.
60.330 seconds.
real 31.12 user 1:00.31 sys 0.03
[michael@hawk test]$ g++ -pthreads -m32 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o weak_ptr_mt_test weak_ptr_mt_test.cpp
[michael@hawk test]$ file weak_ptr_mt_test
weak_ptr_mt_test: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./weak_ptr_mt_test
Using POSIX threads: 16 threads, 16384 * 512 iterations:
384712 locks, 184139 forced rebinds, 8020280 normal rebinds.
383976 locks, 183671 forced rebinds, 8021016 normal rebinds.
366409 locks, 175521 forced rebinds, 8038583 normal rebinds.
379393 locks, 181134 forced rebinds, 8025599 normal rebinds.
382390 locks, 183082 forced rebinds, 8022602 normal rebinds.
383288 locks, 183024 forced rebinds, 8021704 normal rebinds.
372383 locks, 178038 forced rebinds, 8032609 normal rebinds.
371368 locks, 177416 forced rebinds, 8033624 normal rebinds.
371369 locks, 177616 forced rebinds, 8033623 normal rebinds.
382172 locks, 183196 forced rebinds, 8022820 normal rebinds.
384192 locks, 184367 forced rebinds, 8020800 normal rebinds.
393082 locks, 188548 forced rebinds, 8011910 normal rebinds.
381293 locks, 182481 forced rebinds, 8023699 normal rebinds.
405369 locks, 194527 forced rebinds, 7999623 normal rebinds.
371669 locks, 177506 forced rebinds, 8033323 normal rebinds.
377079 locks, 180282 forced rebinds, 8027913 normal rebinds.
59.540 seconds.
real 30.21 user 59.53 sys 0.03
*************************
** 64 bit tests
*************************
[michael@hawk test]$ g++ -DBOOST_SP_USE_PTHREADS -pthreads -m64 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o shared_ptr_mt_test shared_ptr_mt_test.cpp
[michael@hawk test]$ file shared_ptr_mt_test
shared_ptr_mt_test: ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./shared_ptr_mt_test
Using POSIX threads: 16 threads, 1048576 iterations: 83.140 seconds.
real 45.19 user 1:09.11 sys 14.05
[michael@hawk test]$ g++ -pthreads -m64 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o shared_ptr_mt_test shared_ptr_mt_test.cpp
[michael@hawk test]$ file shared_ptr_mt_test
shared_ptr_mt_test: ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./shared_ptr_mt_test
Using POSIX threads: 16 threads, 1048576 iterations: 30.560 seconds.
real 16.28 user 27.99 sys 2.58
[michael@hawk test]$ g++ -DBOOST_SP_USE_PTHREADS -pthreads -m64 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o weak_ptr_mt_test weak_ptr_mt_test.cpp
[michael@hawk test]$ file weak_ptr_mt_test
weak_ptr_mt_test: ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./weak_ptr_mt_test
Using POSIX threads: 16 threads, 16384 * 512 iterations:
364959 locks, 174877 forced rebinds, 8040033 normal rebinds.
382668 locks, 183003 forced rebinds, 8022324 normal rebinds.
394876 locks, 189060 forced rebinds, 8010116 normal rebinds.
380618 locks, 181686 forced rebinds, 8024374 normal rebinds.
362100 locks, 172931 forced rebinds, 8042892 normal rebinds.
372930 locks, 178306 forced rebinds, 8032062 normal rebinds.
386203 locks, 184726 forced rebinds, 8018789 normal rebinds.
369316 locks, 176145 forced rebinds, 8035676 normal rebinds.
388403 locks, 185688 forced rebinds, 8016589 normal rebinds.
367113 locks, 175432 forced rebinds, 8037879 normal rebinds.
381852 locks, 182550 forced rebinds, 8023140 normal rebinds.
376530 locks, 180348 forced rebinds, 8028462 normal rebinds.
358899 locks, 171207 forced rebinds, 8046093 normal rebinds.
398739 locks, 191667 forced rebinds, 8006253 normal rebinds.
349401 locks, 166384 forced rebinds, 8055591 normal rebinds.
368334 locks, 176060 forced rebinds, 8036658 normal rebinds.
40.560 seconds.
real 20.55 user 40.52 sys 0.06
[michael@hawk test]$ g++ -pthreads -m64 -mcpu=ultrasparc -mtune=ultrasparc3 -O2 -Wall -ftemplate-depth-255 -Wno-non-virtual-dtor -I "/export/home/michael/boost_cvs/boost" -o weak_ptr_mt_test weak_ptr_mt_test.cpp
[michael@hawk test]$ file weak_ptr_mt_test
weak_ptr_mt_test: ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, dynamically linked, not stripped
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./weak_ptr_mt_test
Using POSIX threads: 16 threads, 16384 * 512 iterations:
373029 locks, 178768 forced rebinds, 8031963 normal rebinds.
371022 locks, 177439 forced rebinds, 8033970 normal rebinds.
384749 locks, 184009 forced rebinds, 8020243 normal rebinds.
382946 locks, 183064 forced rebinds, 8022046 normal rebinds.
364680 locks, 174424 forced rebinds, 8040312 normal rebinds.
388327 locks, 186181 forced rebinds, 8016665 normal rebinds.
386915 locks, 185417 forced rebinds, 8018077 normal rebinds.
362228 locks, 172848 forced rebinds, 8042764 normal rebinds.
393288 locks, 188491 forced rebinds, 8011704 normal rebinds.
345412 locks, 164552 forced rebinds, 8059580 normal rebinds.
371340 locks, 177177 forced rebinds, 8033652 normal rebinds.
375262 locks, 179372 forced rebinds, 8029730 normal rebinds.
388247 locks, 185931 forced rebinds, 8016745 normal rebinds.
364153 locks, 173886 forced rebinds, 8040839 normal rebinds.
375689 locks, 179917 forced rebinds, 8029303 normal rebinds.
386681 locks, 185009 forced rebinds, 8018311 normal rebinds.
37.290 seconds.
real 19.01 user 37.27 sys 0.04
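The elapsed times reported above can be turned into ratios to make the comparison easier to read. This is a small sketch that only restates the printed numbers; the Timing struct and speedup function are hypothetical names, and it assumes the builds without -DBOOST_SP_USE_PTHREADS picked up sp_counted_base_gcc_sparc.

```cpp
#include <cassert>

// Elapsed times in seconds, copied from the test program output in the message above.
struct Timing
{
    const char* test;
    double pthreads_s;   // built with -DBOOST_SP_USE_PTHREADS
    double cas_s;        // built without it (assumed to use the cas-based header)
};

static const Timing timings[] = {
    { "shared_ptr_mt_test, 32 bit", 71.740, 31.090 },
    { "shared_ptr_mt_test, 64 bit", 83.140, 30.560 },
    { "weak_ptr_mt_test, 32 bit",   60.330, 59.540 },
    { "weak_ptr_mt_test, 64 bit",   40.560, 37.290 },
};

// Ratio of the pthreads time to the cas time; values above 1 favour cas.
inline double speedup( const Timing& t )
{
    assert( t.cas_s > 0.0 );
    return t.pthreads_s / t.cas_s;
}
```

On these numbers shared_ptr_mt_test comes out roughly 2.3 times faster in 32 bit and 2.7 times faster in 64 bit, while weak_ptr_mt_test changes by only a few percent.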

inline long compare_exchange(long* v, const long c, const long n) {
    long r;
#ifdef __arch64__
    // UltraSparc 64-bit variant
    __asm__ __volatile__("casx [%2], %3, %0 \t\n" \
                         " \t\n"
                         : "=&r"(r), "=m"(*v)
                         : "r"(v), "r"(c), "0"(n), "m"(*v)
                         :);
#else
    // Legacy 32-bit variant
    __asm__ __volatile__("cas [%2], %3, %0 \t\n" \
                         " \t\n"
                         : "=&r"(r), "=m"(*v)
                         : "r"(v), "r"(c), "0"(n), "m"(*v)
                         :);
Here's my version:
inline uint32_t compareAndSwap(uint32_t * dest_, uint32_t compare_, uint32_t swap_)
{
    __asm__ __volatile__("cas %0, %1, %2 \n\t"
                         : "+m" (*dest_), "+r" (compare_)
                         : "r" (swap_)
                         : );
    return compare_;
}
inline uint64_t compareAndSwap(uint64_t * dest_, uint64_t compare_, uint64_t swap_)
{
    __asm__ __volatile__("casx %0, %1, %2 \n\t"
                         : "+m" (*dest_), "+r" (compare_)
                         : "r" (swap_)
                         : );
    return compare_;
}
I also have these intrinsics for the Sunpro compiler. The only problem is that with Sunpro, these have to come in a separate .il file, which has to be included on the compile line at the same time as the .cpp being compiled.
Tom

Hi Tom, On 7/3/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:
inline long compare_exchange(long* v, const long c, const long n) { [snip intrinsics code]
I also have these intrinsics for the Sunpro compiler. The only problem is that with Sunpro, these have to come in a separate .il file, which has to be included on the compile line at the same time as the .cpp being compiled.
Could you send the sunpro versions to me (or the list) please? I'll see what I can do about getting them into asm() blocks in the morning. For those not familiar with sunpro asm() blocks, they're a nightmare, and are completely defeated by inlining, but I think I have a workaround... Michael

Could you send the sunpro versions to me (or the list) please? I'll see what I can do about getting them into asm() blocks in the morning.
I have tried and tried a while ago. :) My conclusion was that unfortunately register allocation doesn't play nicely with inlining. Also, returning a parameter is REALLY hard to do and I wasn't able to get it to work in the general case.
The atomics for sunpro are pretty much identical to the gcc version, however, because the asm() block is inserted at a different stage of compilation than the .il file, the block doesn't recognise the synthetic instructions such as cas or casx. You will need to use the real version of the instruction, which is CASA and CASXA, so your asm block will look like this (if my memory serves me well):
//32-bit
asm("casa (%reg) #ASI_PRIMARY_BIG, %compareReg, %swapReg")
//64-bit
asm("casxa (%reg) #ASI_PRIMARY_BIG, %compareReg, %swapReg")
If you can get this to work, I'd love to hear back from you because I absolutely loathe the .il model. It really makes it difficult to write libs, because you have to do special magic to add the .il to the compile line when your library gets included.
Tom

Hi Tom, On 7/4/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:
Could you send the sunpro versions to me (or the list) please? I'll see what I can do about getting them into asm() blocks in the morning.
I have tried and tried a while ago. :) My conclusion was that unfortunately register allocation doesn't play nicely with inlining. Also, returning a parameter is REALLY hard to do and I wasn't able to get it to work in the general case.
Yes, that's the nightmare part I was talking about :-)
The atomics for sunpro are pretty much identical to the gcc version, however, because the asm() block is inserted at a different stage of compilation than the .il file, the block doesn't recognise the synthetic instructions such as cas or casx. You will need to use the real version of the instruction, which is CASA and CASXA, so your asm block will look like this (if my memory serves me well):
[snip]
If you can get this to work, I'd love to hear back from you because I absolutely loathe the .il model. It really makes it difficult to write libs, because you have to do special magic to add the .il to the compile line when your library gets included.
We are starting to drift a little off topic now, but if the list doesn't mind indulging us :-)
As Tom correctly states above, inlining a function containing an asm() call makes it difficult/impossible to refer to registers when compiling with Sun Studio.
The following technique is not suitable for applications which cannot accept the overhead of a function call, but it is suitable for libraries wanting to remain header-only while using inline assembly. All of that aside, this technique also smells like a hack, but it does work.
What you do is create your functions containing your inline assembly as a template in an anonymous namespace in your header, like so:
namespace {
// effects: (*target)++;
template <typename T>
void templated_atomic_inc_32(volatile uint32_t *target)
{
#if defined(__i386)
    asm(".volatile \n\
movl 8(%ebp), %eax \n\
lock \n\
incl (%eax) \n\
.nonvolatile \n\
");
#else
# error Port me
#endif
}
}
Then give it a "normal" name, using an inline function, like this:
inline void my_atomic_inc_32(volatile uint32_t *target)
{
    templated_atomic_inc_32<bool>(target);
}
Then just use "my_atomic_inc_32" wherever you need it, and you won't have to drag .il files around with you!
I've attached my proof-of-concept source. This compiles and works predictably with or without inlining, and in debug or at the highest optimisation levels.
Tom, I hope this is the solution you were looking for - if your performance requirements can accept the extra function call, then this should work for you.
Michael

Tomas Puverle wrote:
Here's my version:
inline uint32_t compareAndSwap(uint32_t * dest_, uint32_t compare_, uint32_t swap_) { __asm__ __volatile__("cas %0, %1, %2 \n\t" : "+m" (*dest_), "+r" (compare_) : "r" (swap_) : ); return compare_; }
Is it possible to make this: inline bool compareAndSwap(uint32_t * dest_, uint32_t * compare_, uint32_t swap_ ); such that compare_ receives the old value and the function returns true on success, false on failure? I tried finding information on whether the CAS instruction sets any flags but the online manuals are of little use. Also, where is uint32_t defined? We probably need to include the appropriate header.
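The requested interface (old value written back through compare_, bool result) is the same contract that C++11 later gave std::atomic compare_exchange_strong. The sketch below uses std::atomic purely as a stand-in for the Sparc instruction so the semantics can be tried on any machine; the name compare_and_swap_bool is hypothetical, and this is not the code that would go into the header.

```cpp
#include <atomic>
#include <cassert>
#include <stdint.h>

// Sketch only: std::atomic stands in for the Sparc cas instruction.
// Returns true if *dest held *compare and was replaced by swap.
// In both cases *compare holds the value that was in *dest before the call.
inline bool compare_and_swap_bool( std::atomic<uint32_t>* dest, uint32_t* compare, uint32_t swap )
{
    assert( dest != 0 && compare != 0 );
    // On failure compare_exchange_strong stores the observed value into *compare;
    // on success *compare already equals the old value.
    return dest->compare_exchange_strong( *compare, swap );
}
```

With an old-value-returning primitive, the bool would presumably be computed by comparing the returned value against the expected one after the instruction has run.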

Hi Peter, On 7/7/06, Peter Dimov <pdimov@mmltd.net> wrote: [snip]
Is it possible to make this:
inline bool compareAndSwap(uint32_t * dest_, uint32_t * compare_, uint32_t swap_ );
[snip]
Also, where is uint32_t defined? We probably need to include the appropriate header.
You're looking for: #include <inttypes.h> Michael

I wrote:
I tried finding information on whether the CAS instruction sets any flags...
It doesn't. Michael van der Westhuizen wrote:
Also, where is uint32_t defined? We probably need to include the appropriate header.
You're looking for: #include <inttypes.h>
Thanks. I've attached my current version of sp_counted_base_gcc_sparc.hpp. Does it look correct to you?

Hi Peter, On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Thanks. I've attached my current version of sp_counted_base_gcc_sparc.hpp. Does it look correct to you?
compare_and_swap certainly looks correct. atomic_fetch_and_add and atomic_conditional_increment both call compare_exchange, which does not exist in this file - are they meant to call compare_and_swap? I like the use of __builtin_expect - I didn't know that existed! Michael
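For readers without the attachment, the two helpers named here are typically written as retry loops around compare_and_swap. The sketch below is a guess at that shape, not the attached file: compare_and_swap is replaced by the GCC/Clang __sync_val_compare_and_swap builtin so that it builds off Sparc, and the loop bodies are assumptions.

```cpp
#include <cassert>
#include <stdint.h>

// Stand-in for the Sparc cas wrapper: returns the value *dest held before the call.
// The __sync builtin is an assumption made so the sketch compiles on other targets.
inline int32_t compare_and_swap( int32_t* dest, int32_t compare, int32_t swap )
{
    return __sync_val_compare_and_swap( dest, compare, swap );
}

// Adds dv to *pw and returns the previous value.
inline int32_t atomic_fetch_and_add( int32_t* pw, int32_t dv )
{
    assert( pw != 0 );
    for( ;; )
    {
        int32_t r = __atomic_load_n( pw, __ATOMIC_RELAXED );
        // Success is the expected case, hence the __builtin_expect hint.
        if( __builtin_expect( compare_and_swap( pw, r, r + dv ) == r, 1 ) )
            return r;
    }
}

// Increments *pw unless it is zero; returns the previous value.
inline int32_t atomic_conditional_increment( int32_t* pw )
{
    assert( pw != 0 );
    for( ;; )
    {
        int32_t r = __atomic_load_n( pw, __ATOMIC_RELAXED );
        if( r == 0 ) return r;   // count already dropped to zero, do not revive it
        if( __builtin_expect( compare_and_swap( pw, r, r + 1 ) == r, 1 ) )
            return r;
    }
}
```

The conditional increment is what a weak_ptr lock needs: it must not bump a use count that has already reached zero.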

Michael van der Westhuizen wrote:
Hi Peter,
On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Thanks. I've attached my current version of sp_counted_base_gcc_sparc.hpp. Does it look correct to you?
compare_and_swap certainly looks correct.
atomic_fetch_and_add and atomic_conditional_increment both call compare_exchange, which does not exist in this file - are they meant to call compare_and_swap?
Yes indeed. I've attached a fixed version.
I like the use of __builtin_expect - I didn't know that existed!
Credit for that goes to Piotr Wyderski, the author of the original. :-)

Hi Peter, On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Michael van der Westhuizen wrote:
On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Thanks. I've attached my current version of sp_counted_base_gcc_sparc.hpp. Does it look correct to you?
compare_and_swap certainly looks correct.
Yes indeed. I've attached a fixed version.
Ok, it looks like I spoke too soon. I was getting core dumps on weak_ptr_mt_test until I changed compare_and_swap to this:
inline int32_t compare_and_swap( int32_t * dest_, int32_t compare_, int32_t swap_ )
{
    __asm__ __volatile__( "cas [%2], %3, %0"
                        : "=&r"(compare_), "=m"(*dest_)
                        : "r"(dest_), "r"(compare_), "0"(swap_), "m"(*dest_)
                        : "memory" );
    return compare_;
}
After that, all mt tests pass.
I also had to change the selector code in detail/sp_counted_base to this:
#elif defined(__GNUC__) && ( defined(__sparcv8) || defined(__sparcv9) )
# include <boost/detail/sp_counted_base_gcc_sparc.hpp>
Test results follow.
Michael
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./shared_ptr_mt_test_v8
Using POSIX threads: 16 threads, 1048576 iterations: 30.880 seconds.
real 16.01 user 29.38 sys 1.52
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./weak_ptr_mt_test_v8
Using POSIX threads: 16 threads, 16384 * 512 iterations:
379813 locks, 181731 forced rebinds, 8025179 normal rebinds.
371551 locks, 177624 forced rebinds, 8033441 normal rebinds.
374752 locks, 178872 forced rebinds, 8030240 normal rebinds.
383374 locks, 183838 forced rebinds, 8021618 normal rebinds.
380078 locks, 181728 forced rebinds, 8024914 normal rebinds.
382020 locks, 182268 forced rebinds, 8022972 normal rebinds.
369253 locks, 176284 forced rebinds, 8035739 normal rebinds.
393058 locks, 187938 forced rebinds, 8011934 normal rebinds.
382446 locks, 183078 forced rebinds, 8022546 normal rebinds.
375525 locks, 179473 forced rebinds, 8029467 normal rebinds.
400324 locks, 191431 forced rebinds, 8004668 normal rebinds.
386603 locks, 185354 forced rebinds, 8018389 normal rebinds.
371274 locks, 177531 forced rebinds, 8033718 normal rebinds.
373879 locks, 178946 forced rebinds, 8031113 normal rebinds.
413007 locks, 197973 forced rebinds, 7991985 normal rebinds.
397481 locks, 191423 forced rebinds, 8007511 normal rebinds.
56.960 seconds.
real 29.46 user 56.95 sys 0.03
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./shared_ptr_mt_test_v9
Using POSIX threads: 16 threads, 1048576 iterations: 29.900 seconds.
real 16.11 user 27.08 sys 2.83
[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./weak_ptr_mt_test_v9
Using POSIX threads: 16 threads, 16384 * 512 iterations:
374408 locks, 179259 forced rebinds, 8030584 normal rebinds.
373631 locks, 178540 forced rebinds, 8031361 normal rebinds.
366733 locks, 175164 forced rebinds, 8038259 normal rebinds.
365186 locks, 174167 forced rebinds, 8039806 normal rebinds.
371817 locks, 178056 forced rebinds, 8033175 normal rebinds.
388075 locks, 186092 forced rebinds, 8016917 normal rebinds.
380838 locks, 181954 forced rebinds, 8024154 normal rebinds.
364328 locks, 174298 forced rebinds, 8040664 normal rebinds.
360716 locks, 172291 forced rebinds, 8044276 normal rebinds.
359705 locks, 171985 forced rebinds, 8045287 normal rebinds.
374523 locks, 179275 forced rebinds, 8030469 normal rebinds.
381814 locks, 182852 forced rebinds, 8023178 normal rebinds.
380550 locks, 182078 forced rebinds, 8024442 normal rebinds.
375862 locks, 179565 forced rebinds, 8029130 normal rebinds.
351562 locks, 167660 forced rebinds, 8053430 normal rebinds.
398814 locks, 191018 forced rebinds, 8006178 normal rebinds.
38.000 seconds.
real 19.21 user 38.01 sys 0.01

Michael van der Westhuizen wrote:
Ok, it looks like I spoke too soon. I was getting core dumps on weak_ptr_mt_test until I changed compare_and_swap to this:
inline int32_t compare_and_swap( int32_t * dest_, int32_t compare_, int32_t swap_ )
{
    __asm__ __volatile__( "cas [%2], %3, %0"
                        : "=&r"(compare_), "=m"(*dest_)
                        : "r"(dest_), "r"(compare_), "0"(swap_), "m"(*dest_)
                        : "memory" );
    return compare_;
}
It's better if we can get an +m(*p)/=m(*p) formulation to work, instead of passing p in a register with r(p). It makes a difference in efficiency on platforms that can address an expression in addition to a single register, and on some architectures/modes when a C++ pointer is 32 bit but the hardware address is 64 bit, using r(p) truncates p to 32 bits and crashes. :-)
Actually, the bug in the original seems to be that CAS returns the old value in swap_ and not compare_, so how about:
inline int32_t compare_and_swap( int32_t * dest_, int32_t compare_, int32_t swap_ )
{
    __asm__ __volatile__( "cas %0, %2, %1"
                        : "+m" (*dest_), "+r" (swap_)
                        : "r" (compare_)
                        : "memory" );
    return swap_;
}
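The operand roles can be checked against a plain C++ model of the instruction. The sketch below is single-threaded and not atomic; it only mirrors the documented data flow of cas [rs1], rs2, rd, where rd receives the old memory value whether or not the swap happens. The names model_cas and model_compare_and_swap are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// Plain C++ model of "cas [rs1], rs2, rd" (NOT atomic, for illustration only):
// memory is compared with rs2; on a match memory takes the value of rd;
// either way rd ends up holding the value memory had before the instruction.
inline void model_cas( int32_t* mem, int32_t rs2, int32_t& rd )
{
    assert( mem != 0 );
    int32_t old = *mem;
    if( old == rs2 ) *mem = rd;
    rd = old;
}

// Same operand order as the corrected wrapper: compare_ plays rs2, swap_ plays rd.
inline int32_t model_compare_and_swap( int32_t* dest_, int32_t compare_, int32_t swap_ )
{
    model_cas( dest_, compare_, swap_ );
    return swap_;   // old value of *dest_, which is why swap_ is the one to return
}
```

Returning compare_ instead would hand back the caller's own expected value, so every call would look like a success.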

Hi, On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Actually, the bug in the original seems to be that CAS returns the old value in swap_ and not compare_, so how about:
inline int32_t compare_and_swap( int32_t * dest_, int32_t compare_, int32_t swap_ )
{
    __asm__ __volatile__( "cas %0, %2, %1"
                        : "+m" (*dest_), "+r" (swap_)
                        : "r" (compare_)
                        : "memory" );
    return swap_;
}
I've just tested with the above code, and it works perfectly. Test results below.

Michael

[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./shared_ptr_mt_test_v8
LD_LIBRARY_PATH=/usr/local/lib timex ./weak_ptr_mt_test_v8
LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./shared_ptr_mt_test_v9
LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./weak_ptr_mt_test_v9
Using POSIX threads: 16 threads, 1048576 iterations: 29.290 seconds.

real 15.35
user 27.80
sys 1.50

[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib timex ./weak_ptr_mt_test_v8
Using POSIX threads: 16 threads, 16384 * 512 iterations:
393142 locks, 188334 forced rebinds, 8011850 normal rebinds.
370140 locks, 176821 forced rebinds, 8034852 normal rebinds.
379069 locks, 181471 forced rebinds, 8025923 normal rebinds.
385131 locks, 183882 forced rebinds, 8019861 normal rebinds.
386402 locks, 184821 forced rebinds, 8018590 normal rebinds.
371917 locks, 177690 forced rebinds, 8033075 normal rebinds.
354785 locks, 169169 forced rebinds, 8050207 normal rebinds.
386274 locks, 184679 forced rebinds, 8018718 normal rebinds.
365789 locks, 174817 forced rebinds, 8039203 normal rebinds.
386848 locks, 185118 forced rebinds, 8018144 normal rebinds.
391157 locks, 187057 forced rebinds, 8013835 normal rebinds.
393826 locks, 188886 forced rebinds, 8011166 normal rebinds.
379926 locks, 181955 forced rebinds, 8025066 normal rebinds.
368179 locks, 175849 forced rebinds, 8036813 normal rebinds.
376914 locks, 179919 forced rebinds, 8028078 normal rebinds.
393944 locks, 188666 forced rebinds, 8011048 normal rebinds.
57.650 seconds.

real 29.55
user 57.64
sys 0.03

[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./shared_ptr_mt_test_v9
Using POSIX threads: 16 threads, 1048576 iterations: 29.490 seconds.

real 16.02
user 26.97
sys 2.55

[michael@hawk test]$ LD_LIBRARY_PATH=/usr/local/lib/sparcv9 timex ./weak_ptr_mt_test_v9
Using POSIX threads: 16 threads, 16384 * 512 iterations:
388081 locks, 186099 forced rebinds, 8016911 normal rebinds.
364511 locks, 173847 forced rebinds, 8040481 normal rebinds.
369474 locks, 176415 forced rebinds, 8035518 normal rebinds.
383873 locks, 183812 forced rebinds, 8021119 normal rebinds.
384370 locks, 184453 forced rebinds, 8020622 normal rebinds.
383454 locks, 183042 forced rebinds, 8021538 normal rebinds.
388084 locks, 186153 forced rebinds, 8016908 normal rebinds.
371612 locks, 178114 forced rebinds, 8033380 normal rebinds.
372215 locks, 177854 forced rebinds, 8032777 normal rebinds.
381007 locks, 181799 forced rebinds, 8023985 normal rebinds.
377198 locks, 180555 forced rebinds, 8027794 normal rebinds.
375531 locks, 179215 forced rebinds, 8029461 normal rebinds.
383229 locks, 183052 forced rebinds, 8021763 normal rebinds.
399208 locks, 191269 forced rebinds, 8005784 normal rebinds.
404128 locks, 193853 forced rebinds, 8000864 normal rebinds.
372118 locks, 178341 forced rebinds, 8032874 normal rebinds.
37.060 seconds.

real 19.37
user 37.05
sys 0.02
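For readers without a SPARC box, the contract that the fix relies on (CAS hands back whatever was in memory before the operation, so the caller detects success by comparing that value with what it expected) can be sketched portably. This is only a model under stated assumptions: std::atomic stands in for the `cas` instruction, and the names compare_and_swap_model and conditional_increment are hypothetical, not the code that went into CVS.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Portable stand-in for the SPARC `cas` wrapper (hypothetical name).
// Always returns the value that was in *dest before the operation,
// whether or not the swap took place.
inline std::int32_t compare_and_swap_model( std::atomic<std::int32_t> * dest,
                                            std::int32_t compare,
                                            std::int32_t swap )
{
    std::int32_t expected = compare;
    // On failure, compare_exchange_strong writes the current value into
    // `expected`; on success `expected` still equals the old value.
    dest->compare_exchange_strong( expected, swap );
    return expected;
}

// Increment *pw unless it is zero (hypothetical helper, the kind of loop
// a weak-to-shared lock needs). Returns the new value, or 0 if *pw was 0.
inline std::int32_t conditional_increment( std::atomic<std::int32_t> * pw )
{
    for( ;; )
    {
        std::int32_t r = pw->load();
        if( r == 0 ) return 0;
        // Success is signalled by the returned old value matching r.
        if( compare_and_swap_model( pw, r, r + 1 ) == r ) return r + 1;
    }
}
```

The point of the loop is that a caller which compared the return value against swap_ instead of the expected value would misjudge success, which appears to be the bug described above.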

Michael van der Westhuizen wrote:
inline int32_t compare_and_swap( int32_t * dest_, int32_t compare_, int32_t swap_ )
{
    __asm__ __volatile__( "cas %0, %2, %1"
                        : "+m" (*dest_), "+r" (swap_)
                        : "r" (compare_)
                        : "memory" );
    return swap_;
}
I've just tested with the above code, and it works perfectly. Test results below.
Great! Added to CVS, let's hope that I haven't broken anything in the final round. Regarding the Solaris situation, perhaps we need to just depend on the user-specified BOOST_USE_SOLARIS_ATOMICS for now? Has Solaris 5.11 been officially released?

Hi Peter, On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Regarding the Solaris situation, perhaps we need to just depend on the user-specified BOOST_USE_SOLARIS_ATOMICS for now? Has Solaris 5.11 been officially released?
Sounds good to me. Solaris 11 has not yet been released - it is currently only available as pre-release software via the Solaris Express program, and as OpenSolaris. I've got no problems with specifying BOOST_USE_SOLARIS_ATOMICS. I'll log an issue against the Sun Studio C++ issue tracker for easier to use SunOS version detection. Michael
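A rough sketch of what depending on a user-specified macro could look like. The header names come from this thread, but the ordering of the branches and the function selected_counted_base are assumptions made purely for illustration; the real sp_counted_base.hpp selects headers with #include directives rather than returning strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration: report which implementation header the
// preprocessor conditions discussed in this thread would pick.
inline std::string selected_counted_base()
{
#if defined( BOOST_USE_SOLARIS_ATOMICS )
    // Opt-in only: the user asserts that <atomic.h> is available.
    return "sp_counted_base_solaris.hpp";
#elif defined( __GNUC__ ) && defined( __sparc_v9__ )
    // Inline assembly version; requires the V9 instruction set.
    return "sp_counted_base_gcc_sparc.hpp";
#else
    // Whatever the library would otherwise have chosen.
    return "fallback";
#endif
}
```

Compiling with -DBOOST_USE_SOLARIS_ATOMICS flips the result to the Solaris header, which is the whole of the opt-in mechanism; no SunOS version detection is attempted.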

Hi All, On 7/8/06, Michael van der Westhuizen <r1mikey@gmail.com> wrote:
Hi Peter, [snip] I've got no problems with specifying BOOST_USE_SOLARIS_ATOMICS. I'll log an issue against the Sun Studio C++ issue tracker for easier to use SunOS version detection.
This issue is now logged with Sun as bug ID 6448611. The issue will be viewable in a few days at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6448611. If you are a Sun Developer Network member, please vote for the bug to be resolved at http://bugs.sun.com/bugdatabase/addVote.do?bug_id=6448611, or keep an eye on the bug at http://bugs.sun.com/bugdatabase/addBugWatch.do?bug_id=6448611. Michael

On 7/8/06, Peter Dimov <pdimov@mmltd.net> wrote:
Regarding the Solaris situation, perhaps we need to just depend on the user-specified BOOST_USE_SOLARIS_ATOMICS for now? Has Solaris 5.11 been officially released?
Picking nits as is my wont, shouldn't this be SPARC and not SOLARIS? Solaris runs on architectures other than SPARC (e.g. x86). The -- Caleb Epstein

Hi Caleb, On 7/13/06, Caleb Epstein <caleb.epstein@gmail.com> wrote: [snip]
Picking nits as is my wont, shouldn't this be SPARC and not SOLARIS? Solaris runs on architectures other than SPARC (e.g. x86). The
A worthwhile nit to pick, and an important question :-) Solaris provides versions of the underlying functions used by this implementation of sp_counted_base for all supported CPU types (sparc , sparcv9, i386 and amd64). Implementations can be viewed in the appropriately named subdirectories of http://tinyurl.com/m5xbj Michael

Also, where is uint32_t defined? We probably need to include the appropriate header.
You're looking for: #include <inttypes.h>
Sorry for being out of the loop for a few days. Why not use <boost/cstdint.hpp>?

On 7/3/06, Peter Dimov <pdimov@mmltd.net> wrote:
Bronek Kozicki kindly reminded me of a g++/Sparc version of sp_counted_base by Piotr Wyderski he sent me some months ago, which I'd completely forgotten about. I've attached it to this message; it requires the V9 instruction set and can be enabled by
#elif defined( __GNUC__ ) && ( defined( __sparc ) || defined(__sparc_v9__ ) ) # include <boost/detail/sp_counted_base_gcc_sparc.hpp>
in sp_counted_base.hpp. We might want to use it in preference to the Solaris version. Anybody using g++ on Sparc care to comment or give it a try?
Just a note: I had to delete the last ":" in the inline ASM (both 32 and 64 bit versions) for compare_exchange to get this to compile. Michael

Just a note: I had to delete the last ":" in the inline ASM (both 32 and 64 bit versions) for compare_exchange to get this to compile.
I think we should simplify the constraints a bit more, see my previous email.

On 7/3/06, Peter Dimov <pdimov@mmltd.net> wrote: [snip]
If you have any suggestions or comments, please let me know; in particular, if you have a better idea how to detect SunOS 5.10 or later, or if you know what memory synchronization guarantees the Solaris atomics are supposed to provide. :-)
Regarding memory synchronisation...

The SPARC v9 architecture manual (http://developers.sun.com/solaris/articles/sparcv9.pdf) suggests that memory ordering is not a problem. Section 8.4.6 (Hardware Primitives for Mutual Exclusion) lists Compare and Swap (CAS) as being completely atomic. The following sentence from the manual suggests that (since we're not doing DMA or IO) we have a reliable solution:

"In addition, the atomicity of hardware mutual-exclusion primitives is guaranteed only for processor memory references and not when the memory location is simultaneously being addressed by an I/O device such as a channel or DMA (impl. dep. #120)."

The atomic_inc_32 and atomic_dec_32 functions are implemented in terms of "add32", which uses a CAS construct to perform the "add". I'd guess that these were implemented this way due to the atomic nature of CAS.

All of that aside, the Solaris manual pages state that this interface is MT-safe, and the kernel versions of these functions are used in a similar way to what I've used in sp_counted_base_solaris.hpp, so I *hope* there are no memory ordering bugs lurking.

The SPARC implementation of the atomic functions can be viewed in the OpenSolaris repository: http://cvs.opensolaris.org/source/xref/on/usr/src/common/atomic/sparc/atomic...

I'm also going to post to the Solaris developers forums to see what feedback I can get on memory ordering issues, but questions like that have been pretty much ignored in the past.

Michael
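To make the "add built from CAS" construct concrete, here is a minimal portable sketch. It is not the OpenSolaris source: add32_via_cas, inc32_model and dec32_model are hypothetical names, and std::atomic with its default sequentially consistent ordering is used as a stand-in, which says nothing about the ordering the Solaris functions actually guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of an "add32"-style helper: compute the new value
// from a snapshot, then retry the compare-and-swap until no other thread
// changed the target in between. Returns the new value.
inline std::uint32_t add32_via_cas( std::atomic<std::uint32_t> * target,
                                    std::int32_t delta )
{
    std::uint32_t old_value = target->load();
    std::uint32_t new_value;
    do
    {
        // Unsigned arithmetic wraps, so a negative delta subtracts.
        new_value = old_value + static_cast<std::uint32_t>( delta );
        // On failure, old_value is refreshed with the current contents.
    }
    while( !target->compare_exchange_weak( old_value, new_value ) );
    return new_value;
}

// Increment and decrement expressed through the same helper.
inline void inc32_model( std::atomic<std::uint32_t> * target )
{
    add32_via_cas( target, 1 );
}

inline void dec32_model( std::atomic<std::uint32_t> * target )
{
    add32_via_cas( target, -1 );
}
```

The loop shows why atomicity of CAS alone is enough for a correct count; whether it is also enough for the ordering shared_ptr needs around the final release is the open question in this message.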

If you have any suggestions or comments, please let me know; in particular, if you have a better idea how to detect SunOS 5.10 or later, or if you know what memory synchronization guarantees the Solaris atomics are supposed to provide.
Regarding memory synchronisation...
The SPARC v9 architecture manual (http://developers.sun.com/solaris/articles/sparcv9.pdf) suggests that memory ordering is not a problem.
<snip> As a second check, from my reading of the UltraSPARC III manual (http://www.sun.com/processors/manuals/USIIIv2.pdf), it does indeed seem that MEMBAR is not necessary on atomic instructions, whether the CPU is running in TSO or in the relaxed memory model mode. As for the OS primitives, as far as I'm concerned it's much better to use the inline asm (and it will work on pre-Solaris 10, too!) Tom

Hi, On 7/3/06, Peter Dimov <pdimov@mmltd.net> wrote:
If you have any suggestions or comments, please let me know; in particular, if you have a better idea how to detect SunOS 5.10 or later, or if you know what memory synchronization guarantees the Solaris atomics are supposed to provide. :-)
I've just completed testing of sp_counted_base_solaris using Solaris 10 x86 and amd64 with Sun Studio 11. These tests were run on an HT P4 2.8GHz, not a real SMP system, as I don't have any of those. The 32 bit pthreads vs. atomic MT test results are surprising; I ran these tests three times to ensure that there wasn't spurious load on the machine at the time of the pthreads tests. Test results below.

Michael

=== 32 bit tests ===

=> shared_ptr pthreads
Using POSIX threads: 16 threads, 1048576 iterations: 697.770 seconds.

real 5:54.93
user 11:35.56
sys 2.21

=> shared_ptr atomic
Using POSIX threads: 16 threads, 1048576 iterations: 9.990 seconds.

real 5.26
user 9.46
sys 0.53

=> weak_ptr pthreads
Using POSIX threads: 16 threads, 16384 * 512 iterations:
399466 locks, 191047 forced rebinds, 8005526 normal rebinds.
385025 locks, 184121 forced rebinds, 8019967 normal rebinds.
380024 locks, 181677 forced rebinds, 8024968 normal rebinds.
383451 locks, 183404 forced rebinds, 8021541 normal rebinds.
384869 locks, 184049 forced rebinds, 8020123 normal rebinds.
380370 locks, 181706 forced rebinds, 8024622 normal rebinds.
380380 locks, 182010 forced rebinds, 8024612 normal rebinds.
389058 locks, 186247 forced rebinds, 8015934 normal rebinds.
373268 locks, 178289 forced rebinds, 8031724 normal rebinds.
383204 locks, 183411 forced rebinds, 8021788 normal rebinds.
378338 locks, 180868 forced rebinds, 8026654 normal rebinds.
373563 locks, 178833 forced rebinds, 8031429 normal rebinds.
382966 locks, 183010 forced rebinds, 8022026 normal rebinds.
385417 locks, 184716 forced rebinds, 8019575 normal rebinds.
407387 locks, 195585 forced rebinds, 7997605 normal rebinds.
373062 locks, 177987 forced rebinds, 8031930 normal rebinds.
31.660 seconds.

real 16.09
user 31.64
sys 0.02

=> weak_ptr atomic
Using POSIX threads: 16 threads, 16384 * 512 iterations:
402357 locks, 193055 forced rebinds, 8002635 normal rebinds.
374548 locks, 179180 forced rebinds, 8030444 normal rebinds.
357985 locks, 170962 forced rebinds, 8047007 normal rebinds.
385048 locks, 184842 forced rebinds, 8019944 normal rebinds.
385500 locks, 184454 forced rebinds, 8019492 normal rebinds.
357195 locks, 170384 forced rebinds, 8047797 normal rebinds.
401780 locks, 192791 forced rebinds, 8003212 normal rebinds.
406168 locks, 194565 forced rebinds, 7998824 normal rebinds.
394920 locks, 189816 forced rebinds, 8010072 normal rebinds.
378184 locks, 180446 forced rebinds, 8026808 normal rebinds.
385599 locks, 184390 forced rebinds, 8019393 normal rebinds.
379443 locks, 181784 forced rebinds, 8025549 normal rebinds.
377387 locks, 180414 forced rebinds, 8027605 normal rebinds.
371901 locks, 178113 forced rebinds, 8033091 normal rebinds.
386731 locks, 185463 forced rebinds, 8018261 normal rebinds.
381244 locks, 182401 forced rebinds, 8023748 normal rebinds.
28.120 seconds.

real 14.30
user 28.10
sys 0.02

=== 64 bit tests ===

=> shared_ptr pthreads
Using POSIX threads: 16 threads, 1048576 iterations: 37.980 seconds.

real 19.73
user 36.73
sys 1.25

=> shared_ptr atomic
Using POSIX threads: 16 threads, 1048576 iterations: 9.410 seconds.

real 5.18
user 8.35
sys 1.06

Hello All, On 7/3/06, Peter Dimov <pdimov@mmltd.net> wrote: [snip]
If you have any suggestions or comments, please let me know; in particular, if you have a better idea how to detect SunOS 5.10 or later, or if you know what memory synchronization guarantees the Solaris atomics are supposed to provide. :-)
A note on what's needed to actually use sp_counted_base_solaris (or compile it...).

This code is valid for Solaris 10 update 1 (1/06 release) and later, including OpenSolaris.

The code is valid for the original Solaris 10 GA (3/05) if the following patches are applied:

(on x86/amd64)
118885-01 (or greater) atomic.h patch
118844-12 (or greater) kernel patch
118891-01 (or greater) llib-lc patch
118345-04 (or greater) libc.so.1 patch

(on SPARC)
118884-01 (or greater) atomic.h patch
118822-12 (or greater) kernel patch
118890-01 (or greater) llib-lc patch
119689-03 (or greater) libc.so.1 patch

The above patches were released in September 2005.

The code is not valid for Solaris 9 or older unless Sun decides to release patches for the older operating systems.

Michael

This code is valid for Solaris 10 update 1 (1/06 release) and later, including OpenSolaris.
The code is valid for the original Solaris 10 GA (3/05) if the following patches are applied:
<snip>
The code is not valid for Solaris 9 or older unless Sun decides to release patches for the older operating systems.
Hold on, what do you mean? Surely the atomic intrinsics we've written will work on ANY version of Solaris, since they only use HW primitives? Are you talking about using the OS-provided ones? Why would you do that if you have the assembly for the former?

Hi, On 7/4/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote: [snip]
The code is not valid for Solaris 9 or older unless Sun decides to release patches for the older operating systems.
Hold on, what do you mean? Surely the atomic intrinsics we've written will work on ANY version of Solaris, since they only use HW primitives? Are you talking about using the OS-provided ones? Why would you do that if you have the assembly for the former?
Yes, I'm talking about the OS provided intrinsics. The code I provided in sp_counted_base_solaris only uses OS provided intrinsics. Inline assembly versions will, of course, work anywhere that the assembly is valid. Michael
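As a rough illustration of the distinction, a reference count built on the OS-provided functions might look like the sketch below. This is not the contents of sp_counted_base_solaris.hpp: the class use_count_sketch and the helper names are hypothetical, the use of atomic_dec_32_nv (the variant that returns the new value) is my assumption, and the Solaris branch is untested here. On any other platform the sketch falls back to std::atomic so that it still compiles and runs.

```cpp
#include <cassert>
#include <cstdint>

#if defined( __sun ) && defined( BOOST_USE_SOLARIS_ATOMICS )
#  include <atomic.h>  // OS-provided atomics (patched Solaris 10 GA or later)
typedef volatile std::uint32_t counter_type;
inline void counter_increment( counter_type * p ) { atomic_inc_32( p ); }
// atomic_dec_32_nv returns the decremented value.
inline std::uint32_t counter_decrement( counter_type * p ) { return atomic_dec_32_nv( p ); }
#else
#  include <atomic>    // portable stand-in for everything else
typedef std::atomic<std::uint32_t> counter_type;
inline void counter_increment( counter_type * p ) { p->fetch_add( 1 ); }
// fetch_sub returns the previous value, so subtract one to get the new one.
inline std::uint32_t counter_decrement( counter_type * p ) { return p->fetch_sub( 1 ) - 1; }
#endif

// Hypothetical, stripped-down use count in the spirit of sp_counted_base.
class use_count_sketch
{
public:
    use_count_sketch(): count_( 1 ) {}

    void add_ref() { counter_increment( &count_ ); }

    // Returns true when the last reference has been dropped.
    bool release() { return counter_decrement( &count_ ) == 0; }

private:
    counter_type count_;
};
```

The trade-off is the one raised in this exchange: the OS functions cover every CPU type Solaris supports but tie the code to a minimum OS release, while inline assembly works wherever the instructions are valid.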
participants (4): Caleb Epstein, Michael van der Westhuizen, Peter Dimov, Tomas Puverle