Re: [Boost-users] [general question] on threading anddoublecheckedlocking pattern

Sohail Somani

23 Jan 2007

3:33 a.m.

Actually, it's not so much wrong as it is "working currently but will probably break at some stage". Apparently, since Solaris always runs in "total storage ordering" mode, their implementation of pthread_once is correct for those hardware platforms. Just thought I'd clarify that.
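For readers who have not used it, here is a minimal sketch of how pthread_once is typically called for one-time creation. The names OnceSingleton, create_once_singleton and once_instance are made up for illustration and do not come from the Solaris code or the bug report; the sketch only shows the calling side and says nothing about how any platform implements pthread_once internally.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical type, used only for this illustration.
struct OnceSingleton {
    int value;
    OnceSingleton() : value(42) {}
};

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static OnceSingleton* once_instance_ptr = NULL;

// Init routine handed to pthread_once; the library runs it at most once.
extern "C" void create_once_singleton() {
    once_instance_ptr = new OnceSingleton;  // intentionally never deleted
}

// Every caller goes through pthread_once, so the library (not user code)
// is responsible for making the created object visible to all threads.
OnceSingleton* once_instance() {
    int rc = pthread_once(&once_control, create_once_singleton);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return once_instance_ptr;
}
```

The point of routing every call through pthread_once is that the ordering problem becomes the library's job, which is exactly why a subtle bug in its implementation matters.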

...

-----Original Message----- From: Sohail Somani Sent: Monday, January 22, 2007 7:25 PM To: 'boost-users@lists.boost.org' Subject: RE: [Boost-users] [general question] on threading anddoublecheckedlocking pattern

Oopsie:

http://bugs.opensolaris.org/view_bug.do?bug_id=6513516

So even the smart guys get it wrong.

...
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Sohail Somani Sent: Monday, January 22, 2007 5:12 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] [general question] on threading anddoublecheckedlocking pattern


Ovanes Markarian

23 Jan 2007

9:39 a.m.

New subject: [general question] on threading and double checked locking pattern

Actually I read all your and Tony's points, and maybe I was misunderstood. My first question is: if a mutex does not guarantee thread safety, what then?

    static boost::mutex s_m;

    class Singleton
    {
        //everything as before in the class
    };

    //creative get
    Singleton* Singleton::creating_singleton_getter()
    {
        boost::mutex::scoped_lock lock(s_m); //always called when entered

        //all other calls to this function
        // are blocking, so it is not possible
        // to enter this function twice if lock is active
        if(Singleton::pInstance == NULL)
            Singleton::pInstance = new Singleton;

        //does not matter how these steps are executed
        // and reordered by compiler, since the function
        // can only be entered when s_m is unlocked
        Singleton::getter = &non_creating_getter; //this is still guarded by locked mutex!!!
        return Singleton::pInstance;
    } //mutex unlock

    //everything else as it was before

So the point is: As long as Singleton::instance is called from multiple threads and these are not created from global vars before main is called, this code should be thread safe. The scenario is like this:

    Threads:
    A          B          C          D
    instance   instance   instance              //only A, B or C will get access to instance, the others will wait
                                     instance   //if creating get was successful, D calls the lightweight version of getter

Static class variables are guaranteed to be initialized before main is entered. C++ standard 9.4.2 states:

    ... Static data members are initialized and destroyed exactly like non-local objects (3.6.2, 3.6.3). ...

3.6.2 states:

    ... Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. Zero-initialization and initialization with a constant expression are collectively called static initialization; all other initialization is dynamic initialization. ...

So I assume that initialization of getter with the address of a (static) class function is a constant expression and therefore is not a dynamic initialization. (Please see 5.19 of the standard, especially:

    ... Other expressions are considered constant-expressions only for the purpose of non-local static object initialization (3.6.2). Such constant expressions shall evaluate to one of the following: ... - an address constant expression, ... An address constant expression is a pointer to an lvalue designating an object of static storage duration, a string literal (2.13.4), or a function.)

Therefore there should be a guarantee that the Singleton static members are initialized before main is entered. The locked mutex guarantees that only one thread at one processor will enter the function at the same time. Isn't it so?

Thanks for your ideas and answers.

Best Regards,
Ovanes
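The static-initialization argument in this message can be checked with a small sketch. The class layout, the Probe helper and getter_ready_before_dynamic_init are assumptions made up for illustration, not the code from earlier in the thread. The idea: an object with a dynamic initializer is defined before getter in the same file, and it still observes getter as already set, because initializing a function pointer with the address of a function is static initialization.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reduced Singleton; only the getter pointer matters here.
class Singleton {
public:
    typedef Singleton* (*getter_type)();
    static Singleton* instance() {
        assert(getter != NULL);
        return getter();
    }
    static Singleton* creating_getter();
    static getter_type getter;
};

// Probe has a non-trivial constructor, so 'probe' is dynamically initialized.
struct Probe {
    bool getter_was_set;
    Probe() : getter_was_set(Singleton::getter != NULL) {}
};
Probe probe;  // deliberately defined BEFORE getter

// Initialized with the address of a function: static initialization,
// so it is complete before any dynamic initializer (such as probe's) runs.
Singleton::getter_type Singleton::getter = &Singleton::creating_getter;

// Placeholder body; the real creating getter is the subject of the thread.
Singleton* Singleton::creating_getter() {
    static Singleton only_instance;
    return &only_instance;
}

bool getter_ready_before_dynamic_init() { return probe.getter_was_set; }
```

This only covers how getter gets its initial value. It says nothing about what other threads see once getter is later overwritten at run time.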

_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Gottlob Frege

6:08 p.m.

New subject: [general question] on threading and double checked locking pattern

On 1/23/07, Ovanes Markarian <om_boost@keywallet.com> wrote:

...

> Actually I read all your and Tony's points, and maybe I was misunderstood.

You were not misunderstood at all. I've gone down the same road as you. More than once. With various techniques, including this creating_getter vs non_creating_getter idea.

> My first question is: If mutex does not guarantee thread safety what then?

It only guarantees thread safety when used for ALL accesses of the shared variables. Not just on the write of the shared variables. You need it for both read and write. Not just because the shared variable may change 'while' you are reading it, but because it may have changed, but your processor hasn't 'seen' those changes yet, even though it has seen other changes that happened 'before' the shared variable changed. This is the seeming paradox of DCLP and modern CPU architecture.
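To make "use the mutex for ALL accesses" concrete, here is a minimal sketch of a getter that takes the lock on every call, reads included. It assumes a C++11 compiler and uses std::mutex rather than boost::mutex so that it stands alone; LockedSingleton and all_threads_see_same_instance are names invented for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: every access to pInstance happens under the mutex.
class LockedSingleton {
public:
    static LockedSingleton* instance() {
        // Locking on the read path too is what orders this thread's read
        // after the creating thread's writes.
        std::lock_guard<std::mutex> lock(s_m);
        if (pInstance == nullptr)
            pInstance = new LockedSingleton;  // intentionally never deleted
        return pInstance;
    }
    int value() const { return value_; }
private:
    LockedSingleton() : value_(42) {}
    int value_;
    static std::mutex s_m;
    static LockedSingleton* pInstance;
};

std::mutex LockedSingleton::s_m;
LockedSingleton* LockedSingleton::pInstance = nullptr;

// Calls instance() from several threads and reports whether all of them
// got the same, fully constructed object.
bool all_threads_see_same_instance(int thread_count) {
    assert(thread_count > 0);
    std::vector<LockedSingleton*> seen(thread_count, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&seen, i] { seen[i] = LockedSingleton::instance(); });
    for (std::thread& t : threads) t.join();
    for (LockedSingleton* p : seen)
        if (p != seen[0] || p->value() != 42) return false;
    return true;
}
```

The cost is a lock acquisition on every call, which is precisely the overhead that DCLP tries (unsafely) to avoid.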

...

> //creative get
> Singleton* Singleton::creating_singleton_getter()
> {
>     boost::mutex::scoped_lock lock(s_m); //always called when entered
>
>     //all other calls to this function
>     // are blocking, so it is not possible
>     // to enter this function twice if lock is active
>     if(Singleton::pInstance == NULL)
>         Singleton::pInstance = new Singleton;
>
>     //does not matter how these steps are executed
>     // and reordered by compiler, since the function
>     // can only be entered when s_m is unlocked
>     Singleton::getter = &non_creating_getter; //this is still guarded by locked mutex!!!

No, getter is not 'still guarded'. As soon as it is set, another thread can start using the non_creating_getter. What if the compiler DID reorder the instructions?

    Singleton::getter = &non_creating_getter;   // line 1
    if(Singleton::pInstance == NULL)
        Singleton::pInstance = new Singleton;   // line 2

Certainly in this case, between line 1 and line 2, another thread could come in and start using non_creating_getter too early.

Now, imagine that it wasn't the compiler that reordered the lines, but instead the processor (ie using speculative execution). Or not the processor, but the memory bus. That's what happens. They will still appear in order for the one processor, but not necessarily for another processor. Worse, it depends on the platform, so this bug is not yet very visible, and that's why we have so much code relying on it working. So much that I'm surprised that chip makers even consider allowing the reordering to happen - I would expect it to break too much code.

Similarly, by the way, you can't even be sure that another processor sees the bytes of the Singleton as written before it sees the pointer pInstance as set!

> } //mutex unlock

So let's tighten the mutex boundary:

    Singleton* Singleton::creating_singleton_getter()
    {
        {
            boost::mutex::scoped_lock lock(s_m); //acquire mutex here
            if(Singleton::pInstance == NULL)
                Singleton::pInstance = new Singleton;
        }
        Singleton::getter = &non_creating_getter;
        return Singleton::pInstance;
    }

Now the mutex is unlocked before getter is set - this puts a write barrier between the 2 instructions - which means that THIS processor (and its memory-handler queue) will NOT change the order of when getter is set. In effect, it flushes the memory-write-request queue before getter is written (or, more accurately, before the request to write the global memory for getter is placed in the write-queue).

And this is where it gets fuzzy for me - from my understanding, it requires the other processor (where some other thread is running) to queue up 2 read requests:

    - 'read getter please'
    - 'read the bytes of the new Singleton'

and then have those requests reordered. The oddity being why would the second request be in the queue before the first request was answered - ie the second request *depends* on the answer of the first. I can only imagine that this happens because of 2 reasons:

    - speculative execution - the CPU could see that it was 'probably' going to read pInstance regardless of getter (which seems more plausible in the traditional DCLP case where getter is just a flag, then checked in an if, so the CPU can easily look ahead).
    - the CPU (or memory controller) had recently read and cached the memory where pInstance points, and didn't feel a need to re-read it (ie there were no obvious dependencies and/or no reason that the memory should be different since the last time it read or wrote that memory).

Basically, the idea here is that the CPU, as a single CPU, is consistent - it is only inconsistent in the presence of other CPUs, and it depends on the architecture as to whether those inconsistencies are allowed to exist or not.

And this is where/when you need to start asking on comp.programming.threads, but I suspect they'll tell you (with better detail and understanding) the same thing - it just doesn't work without a read barrier on the other threads.

> So the point is: As long as Singleton::instance is called from multiple threads and these are not created from global vars before main is called, this code should be thread safe.

I'm not sure what you are saying about before main, etc. If you are just concerned about creating_getter being initially set properly, I agree you are probably OK, since it is static initialization. My only concern there would be, as mentioned, with Singletons inside DLLs / shared libraries - I don't think loading shared libraries is thread safe under Linux (which boggles my mind, but that's what I've heard). And the standard doesn't say anything about shared libraries.

> The scenario is like this:
>
> Threads:
> A          B          C          D
> instance   instance   instance              //only A, B or C will get access to instance, other will wait
>                                  instance   // if creating get was successful, D calls the lightweight version of getter

The scenario is that D reads the 'new' getter, but still manages to read the 'old' (uninitialized) Singleton, because of crazy modern memory architectures.

> Static class variables are guaranteed to be initialized before main is entered: C++ standard 9.4.2 states: ... Static data members are initialized and destroyed exactly like non-local objects (3.6.2, 3.6.3). ...
>
> 3.6.2 states: ... Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. Zero-initialization and initialization with a constant expression are collectively called static initialization; all other initialization is dynamic initialization. ...
>
> So I assume, that initialization of getter with address of a (static) class function is a constant expression and therefore is not a dynamic initialization.

OK.

> (Please see 5.19 of the standard, especially: ... Other expressions are considered constant-expressions only for the purpose of non-local static object initialization (3.6.2). Such constant expressions shall evaluate to one of the following: ... - an address constant expression, ... An address constant expression is a pointer to an lvalue designating an object of static storage duration, a string literal (2.13.4), or a function.)
>
> Therefore there should be a guarantee that the Singleton static members are initialized before main is entered. The locked mutex guarantees that only one thread at one processor will enter the function at the same time. Isn't it so?

Yep, only one thread gets into the guarded part of creating_singleton_getter, but non_creating_getter might still be seen and used too early.

> Thanks for your ideas and answers.
>
> Best Regards,
> Ovanes

I hope it makes sense - it didn't make much sense to me the first 10 times. You might also want to try comp.programming.threads - it has been discussed there a few times.

Tony.
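As a rough sketch of the "read barrier on the other threads" mentioned in this reply: assuming a C++11 compiler (which did not exist when this thread was written), the barrier pair can be expressed with std::atomic, a release store on the creating thread and an acquire load on every reader. AtomicSingleton and every_thread_sees_initialized_object are hypothetical names for illustration; this is one way to write it under that assumption, not the code discussed in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class AtomicSingleton {
public:
    static AtomicSingleton* instance() {
        // Acquire load: the read barrier on the reading thread. If it sees a
        // non-null pointer, it also sees the constructed bytes behind it.
        AtomicSingleton* p = pInstance.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(s_m);
            // Second check under the lock; the mutex orders this read.
            p = pInstance.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new AtomicSingleton;  // intentionally never deleted
                // Release store: the write barrier. Construction above
                // cannot become visible after the pointer itself.
                pInstance.store(p, std::memory_order_release);
            }
        }
        return p;
    }
    int value() const { return value_; }
private:
    AtomicSingleton() : value_(42) {}
    int value_;
    static std::mutex s_m;
    static std::atomic<AtomicSingleton*> pInstance;
};

std::mutex AtomicSingleton::s_m;
std::atomic<AtomicSingleton*> AtomicSingleton::pInstance{nullptr};

// Reads the singleton from several threads and reports whether each one
// observed the fully initialized value.
bool every_thread_sees_initialized_object(int thread_count) {
    assert(thread_count > 0);
    std::vector<int> values(thread_count, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&values, i] { values[i] = AtomicSingleton::instance()->value(); });
    for (std::thread& t : threads) t.join();
    for (int v : values)
        if (v != 42) return false;
    return true;
}
```

A passing run on one machine proves little for this kind of bug, as the thread itself points out; the guarantee comes from the acquire/release pairing, not from the test.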
