
So, we're thinking of using the signals library in one of our projects, but the lack of thread-safety is a concern for us. What issues are preventing Boost.Signals from being made threadsafe? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Monday 26 April 2004 09:53 am, David Abrahams wrote:
So, we're thinking of using the signals library in one of our projects, but the lack of thread-safety is a concern for us. What issues are preventing Boost.Signals from being made threadsafe?
The biggest issue is granularity. Do we use signal-level locking or connection/slot-level locking? The former requires few locks, but constrains the coding style quite a bit, whereas the latter is going to incur a bit of overhead (a lock for every slot call, disconnect, etc.). The other issue is correctness w.r.t. deletion of trackable objects, especially when dealing with the slot call iterators: for instance, consider a simple combiner that does this: while (first != last) *first++; If that "++" happens and first != last is true (because there are more slots to call), then another thread disconnects all of the signals from first up to last, then the dereference operator is Bad News. I'm sure there are more issues, but that's a start. Doug

Douglas Gregor wrote:
On Monday 26 April 2004 09:53 am, David Abrahams wrote:
So, we're thinking of using the signals library in one of our projects, but the lack of thread-safety is a concern for us. What issues are preventing Boost.Signals from being made threadsafe?
The biggest issue is granularity. Do we use signal-level locking or connection/slot-level locking? The former requires few locks, but constrains the coding style quite a bit, whereas the latter is going to incur a bit of overhead (a lock for every slot call, disconnect, etc.).
I was thinking there would be one lock object per signal object. I'm not sure I understand the other option you mention. Could you explain a bit? Regardless, locking behavior should be made a policy. That way, users who are not interested in paying for thread-safety are accomodated.
The other issue is correctness w.r.t. deletion of trackable objects, especially when dealing with the slot call iterators: for instance, consider a simple combiner that does this:
while (first != last) *first++;
If that "++" happens and first != last is true (because there are more slots to call), then another thread disconnects all of the signals from first up to last, then the dereference operator is Bad News.
I've been giving this issue some thought, and I think I can see a way to make this thread-safe. Here's how: 1) signal::operator() grabs the lock 2) signal::operator() passes the slot iterators to combiner::operator() 3) slot_iterator::operator* does the following: i) copies the slot into a local ii) releases the lock iii) calls the slot through the local iv) reaquires the lock. 4) slot_iterator::operator++ skips any nulled-out slots (note that it always executes when the lock is held) 5) combiner::operator() returns and signal::operator() releases the lock. Now, you must be careful in signal::connect() and signal::disconnect() to do nothing that would invalidate the slot iterators. Suppose they were std::list::iterators, and that signal::disconnect() merely nulled out slots instead of removing them from the list. Then this would work. But it needs to be documented that slot_iterators are only valid for the duration of combiner::operator(). (That is, you can't copy them into a global and use them after the lock has been released.) What do you think? -- Eric Niebler Boost Consulting www.boost-consulting.com

On Wed, 5 May 2004, Eric Niebler wrote:
Douglas Gregor wrote:
projects, but the lack of thread-safety is a concern for us. What issues are preventing Boost.Signals from being made threadsafe?
The biggest issue is granularity. Do we use signal-level locking or connection/slot-level locking? The former requires few locks, but constrains the coding style quite a bit, whereas the latter is going to incur a bit of overhead (a lock for every slot call, disconnect, etc.).
I was thinking there would be one lock object per signal object.
This is the most intuitive way, and probably the right way.
I'm not sure I understand the other option you mention. Could you explain a bit?
Sure. I'm a bit concerned about the trackable issue, because deletion of a trackable object can mean that lots of connections (from many different signals) can be disconnected at once. So we could have the scenario where: Signal A has connections to slots a and b, both of which reference trackable object o. Signal B has connections to slots c and d, both of which reference trackable object o. We could be executing in Signal A and in Signal B simultaneously; then, slot c or d deletes object o (causing slots a, b, c, d to be disconnected). It's okay in Signal A--Signals was design for this--but can we be sure that Signal B won't fail? I think we can't, but we might be able to get close if we had locks in each "connection" object, and locked all connections involved with a slot before calling it. This is massive overhead, though, so I think I should just drop the notion.
Regardless, locking behavior should be made a policy. That way, users who are not interested in paying for thread-safety are accomodated.
Agreed. I was actually just about to prototype a threading policy for Function, as a warm-up for Signals.
I've been giving this issue some thought, and I think I can see a way to make this thread-safe. Here's how:
1) signal::operator() grabs the lock 2) signal::operator() passes the slot iterators to combiner::operator() 3) slot_iterator::operator* does the following: i) copies the slot into a local
Might not actually need this step, because the slot itself cannot be deleted at this point.
ii) releases the lock iii) calls the slot through the local iv) reaquires the lock.
Ah, clever! Good for asynchronous slot invocation as well, if we ever care about that.
4) slot_iterator::operator++ skips any nulled-out slots (note that it always executes when the lock is held)
... and this part is already implemented.
5) combiner::operator() returns and signal::operator() releases the lock.
Now, you must be careful in signal::connect() and signal::disconnect() to do nothing that would invalidate the slot iterators. Suppose they were std::list::iterators, and that signal::disconnect() merely nulled out slots instead of removing them from the list. Then this would work. But it needs to be documented that slot_iterators are only valid for the duration of combiner::operator(). (That is, you can't copy them into a global and use them after the lock has been released.)
The good news is that all of this special work to avoid invalidating slot iterators has already been done to deal with the recursive deletion case. I think it is documented that slot_call_iterators are only valid within the combiner, but I could have omitted that.
What do you think?
Sounds very, very promising. Doug

Douglas Paul Gregor wrote:
We could be executing in Signal A and in Signal B simultaneously; then, slot c or d deletes object o (causing slots a, b, c, d to be disconnected). It's okay in Signal A--Signals was design for this--but can we be sure that Signal B won't fail?
I think I see the problem, but I don't know enough about trackable objects to say for sure whether it can be fixed. But it seems to be that in step (3i) below (which copies the slot into a local variable while the lock is still held) that the local variable should be able to keep a trackable object alive long enough to complete the slot call, even if it has been "deleted" in another thread. At the very least it should be possible to implement it such that a zombie trackable object can be detected so that calling a zombie slot has well-defined behavior.
3) slot_iterator::operator* does the following: i) copies the slot into a local
Might not actually need this step, because the slot itself cannot be deleted at this point.
I don't think you can skip this step. Consider the next two steps:
ii) releases the lock iii) calls the slot through the local
Imagine another thread nulls out the slot between (ii) and (iii). If you access the slot directly instead of through a local, you're in for trouble. And as I mentioned above, the local variable might be able to do some lifetime management for trackable objects.
What do you think?
Sounds very, very promising.
Excellent! -- Eric Niebler Boost Consulting www.boost-consulting.com

On Wed, 5 May 2004, Eric Niebler wrote:
We could be executing in Signal A and in Signal B simultaneously; then, slot c or d deletes object o (causing slots a, b, c, d to be disconnected). It's okay in Signal A--Signals was design for this--but can we be sure that Signal B won't fail?
I think I see the problem, but I don't know enough about trackable objects to say for sure whether it can be fixed. But it seems to be that in step (3i) below (which copies the slot into a local variable while the lock is still held) that the local variable should be able to keep a trackable object alive long enough to complete the slot call, even if it has been "deleted" in another thread.
It's really a user error we're trying to prevent. The user has said "delete o" when another thread is using "o" within a slot. The only way to stop the deletion is to acquire a lock in the destructor of a trackable object and wait until the call is done, but that's playing with fire :)
At the very least it should be possible to implement it such that a zombie trackable object can be detected so that calling a zombie slot has well-defined behavior.
That might be doable. I haven't much thought about the locking strategy for trackable objects, although it's probably just a trivial mutex around the list of connections.
3) slot_iterator::operator* does the following: i) copies the slot into a local
Might not actually need this step, because the slot itself cannot be deleted at this point.
I don't think you can skip this step. Consider the next two steps:
ii) releases the lock iii) calls the slot through the local
Imagine another thread nulls out the slot between (ii) and (iii). If you access the slot directly instead of through a local, you're in for trouble. And as I mentioned above, the local variable might be able to do some lifetime management for trackable objects.
It depends on what "nulls out" means. Within the current Signals architecture, the slot itself is unaffected by the disconnect operation, which touches only the associated connection... it's like having a pair<bool, slot>, so we can set the bool false (meaning: "don't use this slot") without affecting someone that's already calling through that slot. Doug

Douglas Paul Gregor <gregod@cs.rpi.edu> writes:
On Wed, 5 May 2004, Eric Niebler wrote:
We could be executing in Signal A and in Signal B simultaneously; then, slot c or d deletes object o (causing slots a, b, c, d to be disconnected). It's okay in Signal A--Signals was design for this--but can we be sure that Signal B won't fail?
I think I see the problem, but I don't know enough about trackable objects to say for sure whether it can be fixed. But it seems to be that in step (3i) below (which copies the slot into a local variable while the lock is still held) that the local variable should be able to keep a trackable object alive long enough to complete the slot call, even if it has been "deleted" in another thread.
It's really a user error we're trying to prevent. The user has said "delete o" when another thread is using "o" within a slot. The only way to stop the deletion is to acquire a lock in the destructor of a trackable object and wait until the call is done, but that's playing with fire :)
Standard prohibition applies: you can read any object from multiple threads, but if a thread is going to write it, no other thread can touch it for any purpose. I don't think you should try to "solve" this for Signals. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
participants (4)
-
David Abrahams
-
Douglas Gregor
-
Douglas Paul Gregor
-
Eric Niebler