shared_ptr and weak_ptr concurrency

Rajeev Rao

1 Sep 2009

12:32 a.m.

This is the first time I'm using boost in a multi-threaded env (linux x86_64,gcc 4). This question may have been asked in other forms. However, from what I've been able to search (google) up, I could not get a crystal clear answer. I'm essentially trying to use the solution provided in this page. http://onlamp.com/pub/a/onlamp/2006/05/04/smart-pointers.html?page=5.

...

From the boost documentation, it appears that this could be an undefined operation, as the global shared pointer is being read and written by multiple threads. Is that correct?

In case the link fails or someone wants more details, please read on. I've got n reader threads and 1 writer thread.

Initialization thread:

    shared_ptr< MyClass > global_ptr(createNewObject()) ;

WriterThread:

    global_ptr.reset( createNewObject())

ReaderThreads:

    weak_ptr<MyClass> local_weak_ptr (globally_ptr) ;
    ...
    if(shared_ptr< MyClass > local_shared_ptr = local_weak_ptr.lock() ) {
        // using local_shared_ptr.
    } else {
        // recreate weak ptr from global ?
    }

thanks.

Rajeev


Gottlob Frege

2 Sep

2:51 a.m.

On Mon, Aug 31, 2009 at 8:32 PM, Rajeev Rao <rbsrao79@yahoo.co.in> wrote:

...

I don't think that code is thread-safe. You can modify the shared count safely between threads, but you can't modify the pointer itself. Tony

Rajeev Rao

5:18 a.m.

Thanks for the response. What if I place accesses to the global pointer within critical sections (indicated by mutex.lock()-unlock())? I've enclosed the dominant portions of the code in while loops. Does this make it thread safe?

<code>
//initialization Thread (runs before any other thread):
shared_ptr< MyClass > global_ptr(createNewObject()) ;

//WriterThread :
while(true) {
    mutex.lock()
    global_ptr.reset( createNewObject())
    mutex.unlock();
    sleep (5) ;
}

//ReaderThreads:
mutex.lock()
weak_ptr<MyClass> local_weak_ptr (globally_ptr) ;
mutex.lock()
...
while(true) {
    if(shared_ptr< MyClass > local_shared_ptr = local_weak_ptr.lock() ) {
        // using local_shared_ptr.
    } else {
        // recreate weak ptr from global ?
    }
} // end of while loop
</code>

Gottlob Frege

3 Sep

4:02 a.m.

On Wed, Sep 2, 2009 at 1:18 AM, Rajeev Rao <rbsrao79@yahoo.co.in> wrote:

...

Thanks for the response. What if I place accesses to the global pointer within critical sections (indicated by mutex.lock()-unlock()) ? I've enclosed the dominant portions of the code in while loops. Does this make it thread safe ?

<code>

//initializion Thread (runs before any other thread): shared_ptr< MyClass > global_ptr(createNewObject()) ;

//WriterThread :

while(true) {
    mutex.lock()
    global_ptr.reset( createNewObject())
    mutex.unlock();
    sleep (5) ;
}

By the way, to minimize contention, call createNewObject() outside the lock:

    local = createNewObject();
    lock();
    global = local;
    unlock();
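The suggestion above can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: it uses the C++11 `std::shared_ptr` and `std::mutex` in place of the Boost types, and `MyClass`, `publish` and `snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical payload type standing in for MyClass from the thread.
struct MyClass {
    int value;
    explicit MyClass(int v) : value(v) {}
};

std::mutex global_mutex;               // guards global_ptr only
std::shared_ptr<MyClass> global_ptr;   // shared between writer and readers

// Writer: construct the new object first, then hold the lock
// only for the pointer assignment itself.
void publish(int v) {
    std::shared_ptr<MyClass> local = std::make_shared<MyClass>(v);  // no lock held
    std::lock_guard<std::mutex> guard(global_mutex);
    global_ptr = local;
}

// Reader: copy the pointer under the lock, then use the copy freely.
std::shared_ptr<MyClass> snapshot() {
    std::lock_guard<std::mutex> guard(global_mutex);
    return global_ptr;
}
```

The potentially slow construction happens while no lock is held, so readers are blocked only for the duration of one pointer assignment.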

...

//ReaderThreads:

mutex.lock() weak_ptr<MyClass> local_weak_ptr (globally_ptr) ; mutex.lock()

I assume you meant UNlock in that second mutex call above!

...

...
while(true) {
    if(shared_ptr< MyClass > local_shared_ptr = local_weak_ptr.lock() ) {
        // using local_shared_ptr.
    } else {
        // recreate weak ptr from global ?

if you decide to recreate, you need to relock, of course.

...

    }
} // end of while loop

yep, that's the typical usage. Tony
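Putting the corrections above together, the reader side might look like the sketch below. It assumes the C++11 standard-library types rather than Boost, and `Config`, `replace_config` and `Reader` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical payload type for the illustration.
struct Config {
    int version;
    explicit Config(int v) : version(v) {}
};

std::mutex g_mutex;                                               // guards g_current
std::shared_ptr<Config> g_current = std::make_shared<Config>(1);  // the global pointer

// Writer side: replace the global object under the lock.
void replace_config(int version) {
    std::shared_ptr<Config> fresh = std::make_shared<Config>(version);
    std::lock_guard<std::mutex> guard(g_mutex);
    g_current = fresh;
}

// Reader side: each reader thread owns one Reader and never shares it.
class Reader {
public:
    // Returns a live object, refreshing the weak_ptr from the global
    // (under the lock) when the previously seen object has expired.
    std::shared_ptr<Config> acquire() {
        if (std::shared_ptr<Config> sp = cached_.lock()) {
            return sp;                                // still alive, no global access
        }
        std::lock_guard<std::mutex> guard(g_mutex);   // relock before reading the global
        cached_ = g_current;
        return g_current;
    }

private:
    std::weak_ptr<Config> cached_;   // private to one thread
};
```

Only the global pointer is touched under the mutex; `cached_.lock()` works on a thread-private `weak_ptr`, which is the case the shared count is designed to handle.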

Stefan Strasser

6:30 a.m.

Am Thursday 03 September 2009 06:02:05 schrieb Gottlob Frege:

...

...
mutex.lock() weak_ptr<MyClass> local_weak_ptr (globally_ptr) ; mutex.lock()

why would you even need a lock here? the shared_ptr doc says that you can expect the same thread safety from shared_ptr as you can from built-in types. you can use multiple-readers-single-writer without any locks on built-in types.

Kevin Kassil

4 Sep

2:31 p.m.

Stefan,

On Thu, Sep 3, 2009 at 2:30 AM, Stefan Strasser <strasser@uni-bremen.de> wrote:

...

why would you even need a lock here? the shared_ptr doc says that you can expect the same thread safety from shared_ptr as you can from built-in types. you can use multiple-readers-single-writer without any locks on built-in types.

You can? Is assigning to a char or a double guaranteed to be atomic? How can the compiler guarantee that? What if there is some architecture for which it's not a single instruction assignment?

Kevin

John Dlugosz

3:58 p.m.

I know that on x86 and x64 architectures, assigning to or reading from any location up to the native general-purpose register size is atomic if the item in question is properly aligned for its size. That is, { a=b; in parallel with b=c; } will not read a partially-changed b, if b is declared in the normal manner. Reading oddly-aligned things out of a packed stream, or using pragmas to change the alignment options, may upset this. Furthermore, the cache is coherent among multiple CPUs or cores, even on a NUMA server.

What you have to watch out for is when the compiler actually issues the read or write, since it can use a register and save it back to memory much later, or re-arrange the requests. Furthermore, the chip queues requests to memory with reads having priority, so a write followed by a read needs special consideration.

Assigning to a (non-volatile) char might do something “interesting”. For example, if two separate char variables are declared, the compiler might keep them in registers and save them both out at the end with a single 16-bit write. The x86/x64 instruction set is conducive to that, but not to other cases. But in general an architecture might indeed merge separate variables into a single larger register. The compiler might then re-save something that didn’t change, thus clobbering a change made on another thread.

The current C++ standard does not address threads, so there is indeed no portable way to guarantee that. You have to encapsulate and implement for each architecture, and use compiler-specific extensions.

It would be interesting to see a list of architectures noting whether or not primitive type reads and writes are atomic, if only to verify that they are, even if nothing on the list turns out otherwise. I’m sure they would all have footnotes to that, as I described above for the one I’m familiar with.

--John

TradeStation Group, Inc. is a publicly-traded holding company (NASDAQ GS: TRAD) of three operating subsidiaries, TradeStation Securities, Inc. (Member NYSE, FINRA, SIPC and NFA), TradeStation Technologies, Inc., a trading software and subscription company, and TradeStation Europe Limited, a United Kingdom, FSA-authorized introducing brokerage firm. None of these companies provides trading or investment advice, recommendations or endorsements of any kind. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.

Rajeev Rao

4:06 p.m.

Even assuming it's possible for pointer assignments to be atomic on a given implementation, I don't think it follows that shared_ptr assignments will be atomic (since there is a change to the count and the actual pointer).

Rajeev

Chris Uzdavinis

4:17 p.m.

On Thu, Sep 3, 2009 at 2:30 AM, Stefan Strasser<strasser@uni-bremen.de> wrote:

...

...
...
mutex.lock() weak_ptr<MyClass> local_weak_ptr (globally_ptr) ; mutex.lock()

why would you even need a lock here?

Because it's unsafe otherwise.

...

the shared_ptr doc says that you can expect the same thread safety from shared_ptr as you can from built-in types.

This is true, but you're drawing the wrong conclusion from it. You need a lock around built-in types as well. You need to use an "atomic" type to safely do what you're saying. In the same documentation, they give examples which are enlightening:

//--- Example 3 ---

// thread A
p = p3; // reads p3, writes p

// thread B
p3.reset(); // writes p3; undefined, simultaneous read/write

...

you can use multiple-readers-single-writer without any locks on built-in types.

Not true. You can use multiple-readers, NO-writers without any locks, however. Chris
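For a built-in type, the "atomic" type mentioned above is available portably as `std::atomic` from C++11 onward, which postdates this thread. The sketch below is an illustration under that assumption; `shared_value` and `run_single_writer_many_readers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> shared_value{0};   // hypothetical variable shared by all threads

// One writer stores increasing values while several readers load them
// without any lock. Returns true if no reader ever saw the value go backwards.
bool run_single_writer_many_readers(int last, int reader_count) {
    shared_value.store(0);
    std::atomic<bool> monotonic{true};
    std::vector<std::thread> readers;
    for (int i = 0; i < reader_count; ++i) {
        readers.emplace_back([&monotonic, last] {
            int previous = 0;
            while (previous < last) {
                int seen = shared_value.load();               // atomic read
                if (seen < previous) monotonic.store(false);  // would be a broken read
                previous = seen;
            }
        });
    }
    std::thread writer([last] {
        for (int v = 1; v <= last; ++v) shared_value.store(v);  // atomic write
    });
    writer.join();
    for (std::thread& t : readers) t.join();
    return monotonic.load();
}
```

With a plain `int` in place of `std::atomic<int>`, the same program would contain a data race under the C++11 memory model, which is the point being made above about built-in types.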

Stefan Strasser

6:28 p.m.

Am Friday 04 September 2009 18:17:36 schrieb Chris Uzdavinis:

...

...
the shared_ptr doc says that you can expect the same thread safety from shared_ptr as you can from built-in types.

This is true, but you're drawing the wrong conclusion from it. You need a lock around built-in types as well. You need to use an "atomic" type to safely do what you're saying.

In the same documentation, they give examples which are enlightening:

//--- Example 3 ---

// thread A
p = p3; // reads p3, writes p

// thread B
p3.reset(); // writes p3; undefined, simultaneous read/write

I believe this means that the contents of p are undefined, not that there is an undefined state within a shared_ptr (on a platform that guarantees atomic pointers)

...

...
you can use multiple-readers-single-writer without any locks on built-in types.

Not true. You can use multiple-readers, NO-writers without any locks, however.

you can on platforms with atomic builtin types, as the c++ standard doesn't say anything about that (yet). so if it is really true that that's not the case for shared_ptr, then the statement of the shared_ptr doc would be wrong. but as far as I can see from the shared_ptr implementation there is nothing that would indicate that. could someone with more insight into the implementation clear this up please? (and maybe the documentation)

John Dlugosz

6:42 p.m.

"Undefined" means that anything at all could happen. The computer might crash. In real life, I expect the worst to be that the reader gets inconsistent junk when reading p3, both because it is partly updated and because the reader might peek at the state more than once over time, with it still changing.

I agree that "just like built-in types" would not hold for the nonstandard implementation behavior of atomic reads and writes, so many people might misunderstand that.

I've had to implement smart-pointer-like things that do have this feature, so it must be carefully designed to have only a single pointer in the main struct and update things to maintain a valid state at all times (e.g. swap in that main pointer last). I don't suppose a standard-conforming shared_ptr would have that feature unless explicitly advertised as such.

Stefan Strasser

8:25 p.m.

Am Friday 04 September 2009 20:42:48 schrieb John Dlugosz:

...

I agree that "just like built-in types" would not hold for the nonstandard implementation behavior of atomic reads and writes, so many people might misunderstand that.

I've had to implement smart-pointer like things that do have this feature, so it must be carefully designed to have only a single pointer in the main struct and update things to maintain a valid state at all times (e.g. swap in that main pointer last). I don't suppose a standard-conforming shared_ptr would have that feature unless explicitly advertised as such.

can you point to anything that the boost shared_ptr does that is unsafe in the one-writer-multiple-reader case?

TR1 doesn't mention thread safety or atomicity at all, so strictly speaking the guarantee that the boost shared_ptr doc gives is already an extension.

writing to an expired weak_ptr while multiple readers are trying to lock() it seems safe to me. (in the current implementation)

John Dlugosz

7:36 p.m.

...

can you point to anything that the boost shared_ptr does that is unsafe in the one-writer-multiple-reader case?

Just looking at the header, I see two direct members, px and pn. So straight assignment isn't going to copy the struct in one atomic operation.

Looking at operator=, I see two separate assignments. First the assignment to the contained pointer takes place. Now, a reader on another thread will see the object to have the wrong shared_count structure. Then, the assigning thread continues, taking its time to crank through the shared_count assignment which changes two reference counts (lengthy pipeline stalls for atomic operations, CPU stays busy for a while) before finally writing the new pn value.

It could be made safe by implementing it like this:

    1. make a temp shared_ptr object, initialized from the RHS.
    2. issue a double-wide atomic swap instruction, swapping the LHS for the temp.
    3. let the temp destruct.

That's machine architecture specific, and compiler specific on how to make it emit the correct swap instruction. The x86 and x64 have an atomic exchange that's twice the size of a normal pointer, so it will swap two pointers in one struct. That's what it's there for -- implementing things like this. You could easily change the function in shared_ptr.hpp if you wanted it to work that way.

--John
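The three-step shape described above (temp, swap, destruct) can be shown portably if a mutex is substituted for the double-wide atomic swap instruction. That substitution is an assumption made for this sketch; it is not the lock-free version John describes. `SwappedPtr` is a hypothetical wrapper, and the C++11 `std::shared_ptr` stands in for the Boost one.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical wrapper. The mutex stands in for the double-wide atomic swap,
// so px and pn always change together from a reader's point of view.
template <class T>
class SwappedPtr {
public:
    explicit SwappedPtr(std::shared_ptr<T> initial) : ptr_(std::move(initial)) {}

    void assign(const std::shared_ptr<T>& rhs) {
        std::shared_ptr<T> temp(rhs);   // step 1: temp initialized from the RHS
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ptr_.swap(temp);            // step 2: exchange LHS and temp as one unit
        }
    }                                   // step 3: temp destructs, releasing the old value

    std::shared_ptr<T> get() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return ptr_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<T> ptr_;
};
```

All reference-count work (copying the RHS, releasing the old value) happens outside the critical section; only the exchange of the two members is protected.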

Stefan Strasser

10:41 p.m.

Am Friday 04 September 2009 21:36:53 schrieb John Dlugosz:

...

...
can you point to anything that the boost shared_ptr does that is unsafe in the one-writer-multiple-reader case?

Just looking at the header, I see two direct members, px and pn. So straight assignment isn't going to copy the struct in one atomic operation.

Looking at operator=, I see two separate assignments.

I can see why it is not atomic in general. (although I still think the documentation should be changed. I don't think very many people understand that statement as "the c++ standard doesn't guarantee atomicity for builtin types, so shared_ptr isn't either", but as "I can do with shared_ptr anything I can do with an 'int' on my platform".)

but I'm still not convinced that there's a lock required in my case, which was: "writing to an expired weak_ptr while multiple readers are trying to lock() it seems safe to me. (in the current implementation)"

the relevant code is:

writing:

    template<class Y>
    weak_ptr & operator=(shared_ptr<Y> const & r) // never throws
    {
        px = r.px;
        // (*)
        pn = r.pn;
        return *this;
    }

reading:

    template<class Y>
    explicit shared_ptr(weak_ptr<Y> const & r): pn(r.pn) // may throw
    {
        // it is now safe to copy r.px, as pn(r.pn) did not throw
        px = r.px;
    }

if we assume that the reads/writes are not reordered by the compiler (which I think is true because assigning to pn acquires a mutex or does something equivalent on lock-free platforms which should act as a memory barrier), then reading from a weak_ptr which was expired and is now in between assignment (at line marked with (*)) doesn't have any effect because the shared_ptr(weak_ptr) constructor only proceeds if there is a positive shared count.

am I missing something?
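The check this argument relies on, that a shared_ptr can only be obtained from a weak_ptr while the shared count is positive, can be observed in a single thread. The sketch below uses the C++11 `std::weak_ptr` as a stand-in for the Boost one, and `construct_throws` is a hypothetical helper. It demonstrates the check only and says nothing about whether the concurrent case is safe.

```cpp
#include <cassert>
#include <memory>

// Returns true when constructing a shared_ptr from wp throws std::bad_weak_ptr,
// which is what happens once the shared count has dropped to zero.
bool construct_throws(const std::weak_ptr<int>& wp) {
    try {
        std::shared_ptr<int> sp(wp);   // proceeds only if the object is still alive
        return false;
    } catch (const std::bad_weak_ptr&) {
        return true;
    }
}
```

`lock()` performs the same check but returns an empty shared_ptr instead of throwing, which is why the reader loop earlier in the thread tests the result in an `if`.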

John Dlugosz

11:51 p.m.

...

I can see why it is not atomic in general. (although I still think the documentation should be changed. I don't think very many people understand that statement as "the c++ standard doesn't guarantee atomicity for builtin types, so shared_ptr isn't either", but as "I can do with shared_ptr anything I can do with an 'int' on my platform.)

Having just had an issue with documentation myself on another thread, I agree that it is spartan and not illustrative in nature.

...

but I'm still not convinced that there's a lock required in my case, which was: "writing to an expired weak_ptr while multiple readers are trying to lock() it seems safe to me. (in the current implementation)"

I think it matters if you are simply dereferencing (as long as px is one value or the other, you'll take it), or copying into another smart pointer object (which must get px and pn in sync to work correctly). It's my recent experience in my current work that carefully considering use cases is what allows for ultra-high performance code. But these assumptions also makes it brittle against future maintenance and changes to the program, so it's important to understand and document them.

...

if we assume that the reads/writes are not reordered by the compiler (which I think is true because assigning to pn acquires a mutex or does something equivalent on lock-free platforms which should act as a memory barrier),

The compiler is free to rearrange non-volatile reads and writes, and with inlining can get pretty creative with that. Just coding "do all this stuff to the structure, AND THEN assign a pointer to that completed structure" is a known pitfall. Even if the pointer itself is declared volatile, the contents can still be written after the "final" pointer assignment.

Looking at the calls to inc and dec involved, I think (it's hard to follow) it ends up calling the Win32 API function. Oh, but you didn't say what platform you are on.

In the past, I've seen compilers surprise me by keeping things in registers even across function calls, as it assumed that something declared locally and never apparently having its address taken could not be known anywhere else. Well, it was wrong <g>. I don't know to what extent the compiler may take liberties in assuming that an imported function might know an alias to some variables of yours. But a smart compiler *could* re-arrange things. Adding compiler-specific decorations to the functions is a way to improve performance, so it might very well "know" that the function only uses its parameters and they don't alias anything (Microsoft has several ways of promising that). Point is, if it's not declared volatile, the compiler MAY re-arrange it, even across function calls.

The compiler re-arranging access to variables, holding them in registers and sending them back later, etc. is a separate issue from what the platform does once it hits the "mov" instruction targeting that memory location. CPU memory fences are distinct from compiler memory fences. You must use both.

So... make both writes to volatile variables so the compiler will do that promptly and not reverse them. You can use reference casts to make "just this write" volatile. Meanwhile, I know that on the x86/x64 writes take effect in the order in which they are issued (it's when mixing reads and writes that things get interesting).

Furthermore, in this example, the shown operator= is only for #if defined(__BORLANDC__) || defined(__GNUC__) and it normally uses the generated assignment operator. I don't think that the standard requires the members to be assigned in any particular order (but I'd have to check to be sure). But, since neither variable is volatile, it could rearrange at will. In particular, the expanded inlined pn assignment contains several statements, and all those combined with the assignment to px are fair game to re-arrange to maximize throughput and avoid memory bottlenecks.

To make sure it works, add an explicit operator= that's like the one shown for BORLANDC and GNUC, but aliases both px and pn to volatile variables. Ah, but then you'll have trouble with the function call, so add 'volatile' to that function, and so it goes.

You might also use the compiler-specific features to prevent code movement. For Microsoft, that would be the intrinsic pseudo-function _WriteBarrier(). Put it between the two statements, and you know it will emit the code for the px assignment first in the final machine code.

--John
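The "fill in the structure, AND THEN publish the pointer" pitfall described above has a portable expression in C++11 and later, which postdates this thread: a release store paired with an acquire load constrains both the compiler and the CPU. The sketch below is an illustration under that assumption, with `Record`, `producer`, `consumer` and `run_publication` as hypothetical names; it is not the volatile-based approach John outlines.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Record { int a; int b; };          // hypothetical payload

std::atomic<Record*> published{nullptr};  // the pointer readers wait on

// Fill in the structure first, then publish with release ordering:
// the field writes may not be moved after the pointer store.
void producer(Record* r) {
    r->a = 1;
    r->b = 2;
    published.store(r, std::memory_order_release);
}

// The acquire load pairs with the release store: once the pointer is
// visible, the field writes that preceded it are visible as well.
int consumer() {
    Record* r;
    while ((r = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return r->a + r->b;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_publication() {
    static Record record{0, 0};
    published.store(nullptr);
    int sum = 0;
    std::thread reader([&sum] { sum = consumer(); });
    std::thread writer([] { producer(&record); });
    writer.join();
    reader.join();
    return sum;
}
```

The consumer can never observe the pointer without also observing both fields, so the sum it returns is always that of the completed structure.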

OvermindDL1

6 Sep

12:25 a.m.

On Fri, Sep 4, 2009 at 1:36 PM, John Dlugosz<JDlugosz@tradestation.com> wrote:

...

...
can you point to anything that the boost shared_ptr does that is unsafe in the one-writer-multiple-reader case?

Just looking at the header, I see two direct members, px and pn. So straight assignment isn't going to copy the struct in one atomic operation.

Actually you could. Atomic CAS instructions support up to 64bits on a 32bit platform, and 128bits on a 64bit platform. As long as those are aligned and side-by-side, you can change both atomically. Although Boost does not currently do this.

John Dlugosz

8 Sep

3:24 p.m.

...

Actually you could. Atomic CAS instructions support up to 64bits on a 32bit platform, and 128bits on a 64bit platform. As long as those are aligned and side-by-side, you can change both atomically. Although Boost does not currently do this.

Right, but the assignment operator doesn't *only* store those two values. Like I detailed elsewhere, it would need to initialize a temporary, do the double-wide swap of that with the LHS, and then let the temp go out of scope.

The instruction is CMPXCHG8B (or CMPXCHG16B on x64), which as you point out also does a compare. So code a read followed by the cmp-and-swap.

Checking the CPU manual, I also see that in x64 the operand must be aligned on a 16-byte boundary. The compiler will only align the structure on the 8-byte boundary (each field is an 8-byte pointer), so you'll also have to convince the compiler to align the structure more strictly. That doesn't seem to be necessary in x86 mode.
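The stricter alignment mentioned above can be requested portably with `alignas` in C++11 and later; compilers of this thread's era offered extensions such as `__declspec(align(16))` or `__attribute__((aligned(16)))` instead. The sketch below covers only the alignment requirement, not the compare-and-swap itself, and `PointerPair` is a hypothetical stand-in for the px/pn pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the px/pn pair held by a shared_ptr.
struct alignas(16) PointerPair {
    void* px;   // pointer to the object
    void* pn;   // pointer to the shared count
};

// Without alignas, a struct of two 8-byte pointers would only require 8-byte alignment.
static_assert(alignof(PointerPair) == 16, "pair must sit on a 16-byte boundary");

// Returns true when the given object actually lies on a 16-byte boundary.
bool is_16_byte_aligned(const PointerPair& p) {
    return reinterpret_cast<std::uintptr_t>(&p) % 16 == 0;
}
```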
