[thread 1.48] Multiple interrupt/timed_join leads to deadlock

Gaetano Mendola

4 Dec 2012 4 Dec '12

6:32 p.m.

Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior. This is the backtrace of it: http://pastebin.com/xArHgHXf The deadlock is rare so sometime you need to wait a bit. I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join. Is the code not valid or a real bug? Regards Gaetano Mendola ////////////////////// CODE ////////////////////// #include <iostream> #include <boost/thread.hpp> struct Runner { void operator()() { const boost::posix_time::time_duration mySeconds = boost::posix_time::seconds(100); boost::this_thread::sleep(mySeconds); } }; struct Interrupter { Interrupter(boost::thread& aThread, boost::barrier& aBarrier) : theThread(aThread), theSeconds(boost::posix_time::seconds(1)), theBarrier(aBarrier) { } void operator()() { theBarrier.wait(); for (int i=0; i<1000;++i) { theThread.interrupt(); theThread.timed_join(theSeconds); } } boost::thread& theThread; boost::posix_time::time_duration theSeconds; boost::barrier& theBarrier; }; int main() { for (size_t i = 0; i < 1000000; ++i) { Runner myRunner; boost::thread myRunnerThread(myRunner); boost::barrier myBarrier(2); Interrupter myInterrupter1(myRunnerThread, myBarrier); Interrupter myInterrupter2(myRunnerThread, myBarrier); boost::thread myInterrupterThread1(myInterrupter1); boost::thread myInterrupterThread2(myInterrupter2); myInterrupterThread1.join(); myInterrupterThread2.join(); myRunnerThread.join(); std::cout << "."; std::cout.flush(); } }

Show replies by date

Anthony Williams

5 Dec 5 Dec

8:16 a.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

On 04/12/12 18:32, Gaetano Mendola wrote:

...

Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...

Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

...

void operator()() { theBarrier.wait(); for (int i=0; i<1000;++i) { theThread.interrupt(); theThread.timed_join(theSeconds); } }

Anthony -- Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/ just::thread C++11 thread library http://www.stdthread.co.uk Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Gaetano Mendola

11:33 a.m.

On 05/12/2012 09.16, Anthony Williams wrote:

...

On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. In order to cope with a bug in 1.40 (an interrupt to a thread could have been lost) I have implemented my own ThreadGroup: ThreadGroup::interrupt_all() { for_each_thread( boost::thread::interrupt(); if ( boost::thread::timed_join() ) { move_to_next_thread } ) } along with the fact that boost::thread_group doesn't provide a method "join_any" with the semantic to issue an interrupt_all if any of the threads terminate I have implemented join_any this way: ThreadGroup::join_any() { while(true) { for_each_thread( if ( boost::thread::timed_join() ) { interrupt_all(); } else { move_to_next_thread } ) } } This has working well for 2 years now. Upgrading to 1.48 I'm experiencing dead locks and core dumps. The backtrace shows that a timed_join crashes if somehow the thread terminates at the same time. Given the fact in the 1.48 documentation there is nothing written about the fact I can not call a timed_join concurrently with the interrupt and the fact there is specified no precondition on the interrupt method I did suppose the above code should have been armless using the 1.48. I can try to remove the code issuing an interrupt until the timed_join doesn't exit successfully, thrusting that after an interrupt a boost::thread exits if is or reach an interruption point but I'm not quite convinced that this will solve for sure the deadlocks/crashes. I will remove the "redundant" code from my ThreadGroup and I will run my regression tests, I'll be back as soon I have some hints. Regards Gaetano Mendola

Anthony Williams

12:39 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

On 05/12/12 11:33, Gaetano Mendola wrote:

...

On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation.

The thread object itself is not thread-safe --- a given thread object should have a single owner. Only one thread can perform an operation on that thread object at a time. If you wish to call member functions on it from multiple threads then they need synchronizing.

...

The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread?

Yes, you cannot interrupt (or do ANYTHING at all to) a joined thread. After joining with a thread the thread object is no longer associated with the thread --- the thread itself has terminated and all resources are cleaned up. To interrupt a thread you call interrupt(). Just don't call it concurrently with anything else.

...

There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost.

True. This is not necessary. A simple if(t.joinable()) t.interrupt() will suffice, since no other thread can be legally accessing your thread object.

...

In order to cope with a bug in 1.40 (an interrupt to a thread could have been lost) I have implemented my own ThreadGroup:

ThreadGroup::interrupt_all() { for_each_thread( boost::thread::interrupt(); if ( boost::thread::timed_join() ) { move_to_next_thread } ) }

along with the fact that boost::thread_group doesn't provide a method "join_any" with the semantic to issue an interrupt_all if any of the threads terminate I have implemented join_any this way:

ThreadGroup::join_any() { while(true) { for_each_thread( if ( boost::thread::timed_join() ) { interrupt_all(); } else { move_to_next_thread } ) } }

This has working well for 2 years now. Upgrading to 1.48 I'm experiencing dead locks and core dumps. The backtrace shows that a timed_join crashes if somehow the thread terminates at the same time. Given the fact in the 1.48 documentation there is nothing written about the fact I can not call a timed_join concurrently with the interrupt and the fact there is specified no precondition on the interrupt method I did suppose the above code should have been armless using the 1.48.

If a concurrent interrupt() and timed_join() call worked then that was a bonus. It is not guaranteed. Just because a thread object manages a thread does not mean it is itself thread-safe. Anthony -- Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/ just::thread C++11 thread library http://www.stdthread.co.uk Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Gaetano Mendola

1:13 p.m.

On 05/12/2012 13.39, Anthony Williams wrote:

...

On 05/12/12 11:33, Gaetano Mendola wrote:

...
On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation.

The thread object itself is not thread-safe --- a given thread object should have a single owner. Only one thread can perform an operation on that thread object at a time. If you wish to call member functions on it from multiple threads then they need synchronizing.

This is exactly what I was looking for in the documentation but I haven't find any trace of it. Now all my observations make sense. Regards Gaetano Mendola

Vicente J. Botet Escriba

12:42 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 12:33, Gaetano Mendola a écrit :

...

On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a thread safely.

Vicente

Gaetano Mendola

1:45 p.m.

On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 12:33, Gaetano Mendola a écrit :

...
On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?). Googling for it comes out exactly this question on stackoverflow: http://stackoverflow.com/questions/9292747/is-boostthread-thread-safe and the answer was: "And it appears that yes, it is thread-safe." I have added now the answer stating the opposite. Regards Gaetano Mendola

Vicente Botet

3:29 p.m.

Gaetano Mendola-3 wrote

...

On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 12:33, Gaetano Mendola a écrit :

...
On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

...

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until); template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time); Preconditions: this->get_id()!=boost::this_thread::get_id() Postconditions: If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. " Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined. Best, Vicente -- View this message in context: http://boost.2283326.n4.nabble.com/thread-1-48-Multiple-interrupt-timed-join... Sent from the Boost - Dev mailing list archive at Nabble.com.

Gaetano Mendola

3:59 p.m.

On 05/12/2012 16.29, Vicente Botet wrote:

...

Gaetano Mendola-3 wrote

...
On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 12:33, Gaetano Mendola a écrit :

...
On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote:

...
Hi all, I was investigating a rare deadlock when issuing an interrupt and a timed_join in parallel. I come out with the the following code showing the behavior.

The deadlock is rare so sometime you need to wait a bit.

I couldn't try it with boost 1.52 because the code is invalid due the precondition of "thread joinable" when issuing the timed_join.

That's a hint.

...
Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Anthony Williams

5:08 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

On 05/12/12 15:59, Gaetano Mendola wrote:

...

On 05/12/2012 16.29, Vicente Botet wrote:

...
template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join. Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour. Anthony -- Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/ just::thread C++11 thread library http://www.stdthread.co.uk Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Gaetano Mendola

5:39 p.m.

On 05/12/2012 18.08, Anthony Williams wrote:

...

On 05/12/12 15:59, Gaetano Mendola wrote:

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join.

Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour.

Then I'm not reading well the precondition. this->get_id(): the thread id of the instance on which I'm calling the boost::thread::interrupt(). boost::this_thread::get_id(): gives the caller thread id. from documentation I'm reading that if an instance of boost::thread doesn't refer to a thread of execution then it returns a default-constructed boost::thread::id that represents Not-a-Thread. This means that in case of an already joined thread this->get_id() returns a default-constructed boost::thread::id, how can that be equal to current (caller) thread id? I'm reading the precondition as: "you can not time_join your self" Indeed looking at 1.52 code in the interrupt method I can read: if (this_thread::get_id() == get_id()) { boost::throw_exception( thread_resource_error( system::errc::resource_deadlock_would_occur, "boost thread: trying joining itself")); } **boost thread: trying joining itself** I'm sure I'm still missing something. Regards Gaetano Mendola

Vicente J. Botet Escriba

6:57 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 18:39, Gaetano Mendola a écrit :

...

On 05/12/2012 18.08, Anthony Williams wrote:

...
On 05/12/12 15:59, Gaetano Mendola wrote:

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join.

Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour.

Then I'm not reading well the precondition.

this->get_id(): the thread id of the instance on which I'm calling the boost::thread::interrupt().

boost::this_thread::get_id(): gives the caller thread id.

from documentation I'm reading that if an instance of boost::thread doesn't refer to a thread of execution then it returns a default-constructed boost::thread::id that represents Not-a-Thread.

This means that in case of an already joined thread this->get_id() returns a default-constructed boost::thread::id, how can that be equal to current (caller) thread id?

I'm reading the precondition as: "you can not time_join your self"

Indeed looking at 1.52 code in the interrupt method I can read:

if (this_thread::get_id() == get_id()) { boost::throw_exception( thread_resource_error( system::errc::resource_deadlock_would_occur, "boost thread: trying joining itself")); }

**boost thread: trying joining itself**

I'm sure I'm still missing something.

My bad. You are right the pre-conditions in 1.48 is not the correct one. I have updated the documentation for join() since 1.50 and since 1.52 for all the other join functions. In particular Member function |timed_join()| <http://www.boost.org/doc/libs/1_52_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join> bool timed_join(const system_time& wait_until); template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time); Preconditions: the thread is joinable. So you are right there is a bug up to 1.51, with a fix on 1.52. Sorry for the troubles this issue could caused you. Best, Vcente

Gaetano Mendola

8:05 p.m.

On 05/12/2012 19.57, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 18:39, Gaetano Mendola a écrit :

...
On 05/12/2012 18.08, Anthony Williams wrote:

...
On 05/12/12 15:59, Gaetano Mendola wrote:

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join.

Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour.

Then I'm not reading well the precondition.

this->get_id(): the thread id of the instance on which I'm calling the boost::thread::interrupt().

boost::this_thread::get_id(): gives the caller thread id.

from documentation I'm reading that if an instance of boost::thread doesn't refer to a thread of execution then it returns a default-constructed boost::thread::id that represents Not-a-Thread.

This means that in case of an already joined thread this->get_id() returns a default-constructed boost::thread::id, how can that be equal to current (caller) thread id?

I'm reading the precondition as: "you can not time_join your self"

Indeed looking at 1.52 code in the interrupt method I can read:

if (this_thread::get_id() == get_id()) { boost::throw_exception( thread_resource_error( system::errc::resource_deadlock_would_occur, "boost thread: trying joining itself")); }

**boost thread: trying joining itself**

I'm sure I'm still missing something.

My bad. You are right the pre-conditions in 1.48 is not the correct one. I have updated the documentation for join() since 1.50 and since 1.52 for all the other join functions. In particular

Member function |timed_join()|

<http://www.boost.org/doc/libs/1_52_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join>

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

the thread is joinable.

So you are right there is a bug up to 1.51, with a fix on 1.52.

Sorry for the troubles this issue could caused you.

I believe boost project deserve a wiki documentation site. Gaetano Mendola -- http://cpp-today.blogspot.it/

Vicente J. Botet Escriba

6 Dec 6 Dec

10:31 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 21:05, Gaetano Mendola a écrit :

...

On 05/12/2012 19.57, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:39, Gaetano Mendola a écrit :

...
On 05/12/2012 18.08, Anthony Williams wrote:

...
On 05/12/12 15:59, Gaetano Mendola wrote:

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join.

Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour.

Then I'm not reading well the precondition.

this->get_id(): the thread id of the instance on which I'm calling the boost::thread::interrupt().

boost::this_thread::get_id(): gives the caller thread id.

from documentation I'm reading that if an instance of boost::thread doesn't refer to a thread of execution then it returns a default-constructed boost::thread::id that represents Not-a-Thread.

This means that in case of an already joined thread this->get_id() returns a default-constructed boost::thread::id, how can that be equal to current (caller) thread id?

I'm reading the precondition as: "you can not time_join your self"

Indeed looking at 1.52 code in the interrupt method I can read:

if (this_thread::get_id() == get_id()) { boost::throw_exception( thread_resource_error( system::errc::resource_deadlock_would_occur, "boost thread: trying joining itself")); }

**boost thread: trying joining itself**

I'm sure I'm still missing something.

My bad. You are right the pre-conditions in 1.48 is not the correct one. I have updated the documentation for join() since 1.50 and since 1.52 for all the other join functions. In particular

Member function |timed_join()|

<http://www.boost.org/doc/libs/1_52_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join>

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

the thread is joinable.

So you are right there is a bug up to 1.51, with a fix on 1.52.

Sorry for the troubles this issue could caused you.

I believe boost project deserve a wiki documentation site.

And your suggestion is? Vicente

Gaetano Mendola

7 Dec 7 Dec

12:11 a.m.

On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 21:05, Gaetano Mendola a écrit :

...
On 05/12/2012 19.57, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:39, Gaetano Mendola a écrit :

...
On 05/12/2012 18.08, Anthony Williams wrote:

...
On 05/12/12 15:59, Gaetano Mendola wrote:

...
On 05/12/2012 16.29, Vicente Botet wrote: > template<typename TimeDuration> > bool timed_join(TimeDuration const& rel_time); > > Preconditions: > > this->get_id()!=boost::this_thread::get_id() > > Postconditions: > > If *this refers to a thread of execution on entry, and > timed_join > returns true, that thread of execution has completed, and *this no > longer > refers to any thread of execution. If this call to timed_join > returns > false, > *this is unchanged. > " > > Your second call doesn't satisfy the pre-conditions, so that the > outcome of > this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Your code had two bugs. Firstly, it called interrupt and timed_join on the same thread object from multiple threads. Secondly, it called interrupt and timed_join in a loop without ever checking the result of timed_join.

Vicente's quote from the docs addresses the second issue: if timed_join returns true then looping round to call interrupt and timed_join again is now a precondition violation, and thus undefined behaviour.

Then I'm not reading well the precondition.

this->get_id(): the thread id of the instance on which I'm calling the boost::thread::interrupt().

boost::this_thread::get_id(): gives the caller thread id.

from documentation I'm reading that if an instance of boost::thread doesn't refer to a thread of execution then it returns a default-constructed boost::thread::id that represents Not-a-Thread.

This means that in case of an already joined thread this->get_id() returns a default-constructed boost::thread::id, how can that be equal to current (caller) thread id?

I'm reading the precondition as: "you can not time_join your self"

Indeed looking at 1.52 code in the interrupt method I can read:

if (this_thread::get_id() == get_id()) { boost::throw_exception( thread_resource_error( system::errc::resource_deadlock_would_occur, "boost thread: trying joining itself")); }

**boost thread: trying joining itself**

I'm sure I'm still missing something.

My bad. You are right the pre-conditions in 1.48 is not the correct one. I have updated the documentation for join() since 1.50 and since 1.52 for all the other join functions. In particular

Member function |timed_join()|

<http://www.boost.org/doc/libs/1_52_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join>

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

the thread is joinable.

So you are right there is a bug up to 1.51, with a fix on 1.52.

Sorry for the troubles this issue could caused you.

I believe boost project deserve a wiki documentation site.

And your suggestion is?

If I had time and resources I would have put up a boost doc site wiki style so it can be easily modified by almost everyone without bashing doc maintainers for even little adjustments. As said most of the problems I'm seeing is a "RTFM" problem but some time is even a WTFM related issue. I understand that is not easy to maintain documentation in shape especialy when you have so many "active" boost version around. Consider that last Ubuntu LTS deploys by default boost 1.46 and not all IT department do accept to install libraries that are not part of the distribution (we have to face it). To add salt to the wound boost doesn't back-port bugs corrections (AFAIK) so various Linux distributions are left with boost versions containing annoying bugs. Don't take me wrong, no project is bug-less, but at least a "warning" on documentation can save people time. As example we used then 1.40 boost 2 years ago (shipped with the previous Ubunt LTS version) and it had the bug of sporadic boost::thread::interrupt not working, we spend 3 days to figure it out, I would have put a "warning" in the 1.40 documentation about the boost::thread::interrupt missing. Regards Gaetano Mendola -- http://cpp-today.blogspot.it/

AntonioM

6 Dec 6 Dec

4:02 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Hi Vicente,

...

[...] template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

the thread is joinable.

could you please, then, summarize what is the correct way of coping with a single thread structure from multiple threads of execution in boost 1.48? Besides locking it for synchronizing access do you think it would be safe to access the thread methods like this? (Please don't elaborate on the specific code choices, I put there some almost non-sense code, just to better illustrate my question) { end_condition = false while(not end_condition) { <get the to_be_joined_thread lock> if (to_be_joined_thread->joinable()) { result = to_be_joined_thread->timed_join(<1 second>); if (result) { end_condition = true; } } else { end_condition = true; } } } Suppose at the first round it can't join the thread, after 1 second it times out, it goes back to the beginning and tries again. Between the lock release and the new lock acquisition another thread runs the same code and it manages to join the to_be_joined_thread. When the first thread gets the lock again, what is it going to find with the joinable() call? I would expect a joinable() == false, 'cause the thread has been already joined by someone else. Am I right? A.

Vicente J. Botet Escriba

5 Dec 5 Dec

5:35 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 16:59, Gaetano Mendola a écrit :

...

On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...
...
On 05/12/2012 09.16, Anthony Williams wrote:

...
On 04/12/12 18:32, Gaetano Mendola wrote: > Hi all, > I was investigating a rare deadlock when issuing an interrupt and > a timed_join in parallel. I come out with the the following code > showing the behavior. > > The deadlock is rare so sometime you need to wait a bit. > > I couldn't try it with boost 1.52 because the code is invalid > due the precondition of "thread joinable" when issuing the > timed_join.

That's a hint.

> Is the code not valid or a real bug?

The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a

Le 05/12/12 12:33, Gaetano Mendola a écrit : thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi, no function is considered been thread-safe until it is stated explicitly. Vicente

Gaetano Mendola

5:49 p.m.

On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...
...
On 05/12/2012 09.16, Anthony Williams wrote: > On 04/12/12 18:32, Gaetano Mendola wrote: >> Hi all, >> I was investigating a rare deadlock when issuing an interrupt and >> a timed_join in parallel. I come out with the the following code >> showing the behavior. >> >> The deadlock is rare so sometime you need to wait a bit. >> >> I couldn't try it with boost 1.52 because the code is invalid >> due the precondition of "thread joinable" when issuing the >> timed_join. > > That's a hint. > >> Is the code not valid or a real bug? > > The code is invalid: you keep trying to interrupt and join even > after > the thread has been joined! Once the thread has been joined, the > thread > handle is no longer valid, and you should exit the loop.

I haven't seen this statement in the documentation. The loop was meant to exploit exactly this, then you are confirming that interrupting a joined thread is not valid. How do I safely interrupt then a thread? There is no "atomic" check_joinable_then_interrupt, whatching at the interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a

Le 05/12/12 12:33, Gaetano Mendola a écrit : thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe. So even thread_group is not thread safe? The implementation is full thread safe. Is that just implementation detail and we have to not rely on the fact is thread safe? Regards Gateano Mendola -- http://cpp-today.blogspot.it/

Vicente J. Botet Escriba

6:44 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 18:49, Gaetano Mendola a écrit :

...

On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
On 05/12/2012 13.42, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 12:33, Gaetano Mendola a écrit : > On 05/12/2012 09.16, Anthony Williams wrote: >> On 04/12/12 18:32, Gaetano Mendola wrote: >>> Hi all, >>> I was investigating a rare deadlock when issuing an interrupt and >>> a timed_join in parallel. I come out with the the following code >>> showing the behavior. >>> >>> The deadlock is rare so sometime you need to wait a bit. >>> >>> I couldn't try it with boost 1.52 because the code is invalid >>> due the precondition of "thread joinable" when issuing the >>> timed_join. >> >> That's a hint. >> >>> Is the code not valid or a real bug? >> >> The code is invalid: you keep trying to interrupt and join even >> after >> the thread has been joined! Once the thread has been joined, the >> thread >> handle is no longer valid, and you should exit the loop. > > I haven't seen this statement in the documentation. > The loop was meant to exploit exactly this, then you are confirming > that interrupting a joined thread is not valid. How do I safely > interrupt then a thread? > There is no "atomic" check_joinable_then_interrupt, whatching at > the > interrupt code it seems that the check is done inside. I'm lost. Boost.Thread and std::thread are designed so that there is only one owner of the thread. That is only one thread can join/interrupt a thread safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

Vicente

Gaetano Mendola

8:02 p.m.

On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: > Le 05/12/12 12:33, Gaetano Mendola a écrit : >> On 05/12/2012 09.16, Anthony Williams wrote: >>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>> Hi all, >>>> I was investigating a rare deadlock when issuing an interrupt and >>>> a timed_join in parallel. I come out with the the following code >>>> showing the behavior. >>>> >>>> The deadlock is rare so sometime you need to wait a bit. >>>> >>>> I couldn't try it with boost 1.52 because the code is invalid >>>> due the precondition of "thread joinable" when issuing the >>>> timed_join. >>> >>> That's a hint. >>> >>>> Is the code not valid or a real bug? >>> >>> The code is invalid: you keep trying to interrupt and join even >>> after >>> the thread has been joined! Once the thread has been joined, the >>> thread >>> handle is no longer valid, and you should exit the loop. >> >> I haven't seen this statement in the documentation. >> The loop was meant to exploit exactly this, then you are confirming >> that interrupting a joined thread is not valid. How do I safely >> interrupt then a thread? >> There is no "atomic" check_joinable_then_interrupt, whatching at >> the >> interrupt code it seems that the check is done inside. I'm lost. > Boost.Thread and std::thread are designed so that there is only one > owner of the thread. That is only one thread can join/interrupt a > thread > safely.

Unless I have totally missed it the documentation doesn't mention anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers. the public methods: create_thread add_thread remove_thread join_all interrupt_all size are all protecting the internal status and then all the threads in the list with a common mutex. It appears completely thread safe to me. Regards Gaetano Mendola --

Gaetano Mendola

6 Dec 6 Dec

5:32 p.m.

On 05/12/2012 21.02, Gaetano Mendola wrote:

...

On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote > On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: >> Le 05/12/12 12:33, Gaetano Mendola a écrit : >>> On 05/12/2012 09.16, Anthony Williams wrote: >>>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>>> Hi all, >>>>> I was investigating a rare deadlock when issuing an interrupt >>>>> and >>>>> a timed_join in parallel. I come out with the the following code >>>>> showing the behavior. >>>>> >>>>> The deadlock is rare so sometime you need to wait a bit. >>>>> >>>>> I couldn't try it with boost 1.52 because the code is invalid >>>>> due the precondition of "thread joinable" when issuing the >>>>> timed_join. >>>> >>>> That's a hint. >>>> >>>>> Is the code not valid or a real bug? >>>> >>>> The code is invalid: you keep trying to interrupt and join even >>>> after >>>> the thread has been joined! Once the thread has been joined, the >>>> thread >>>> handle is no longer valid, and you should exit the loop. >>> >>> I haven't seen this statement in the documentation. >>> The loop was meant to exploit exactly this, then you are >>> confirming >>> that interrupting a joined thread is not valid. How do I safely >>> interrupt then a thread? >>> There is no "atomic" check_joinable_then_interrupt, whatching at >>> the >>> interrupt code it seems that the check is done inside. I'm lost. >> Boost.Thread and std::thread are designed so that there is only one >> owner of the thread. That is only one thread can join/interrupt a >> thread >> safely. > > Unless I have totally missed it the documentation doesn't mention > anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers.

the public methods:

create_thread add_thread remove_thread join_all interrupt_all size

are all protecting the internal status and then all the threads in the list with a common mutex.

It appears completely thread safe to me.

Watching at it the problem is not the thread safety but the fact that is possible to issue two join_all and the second call becomes an "every one guess" operation. I would not say it's not thread safe. Regards Gaetano Mendola

Vicente J. Botet Escriba

10:31 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 05/12/12 21:02, Gaetano Mendola a écrit :

...

On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote > On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: >> Le 05/12/12 12:33, Gaetano Mendola a écrit : >>> On 05/12/2012 09.16, Anthony Williams wrote: >>>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>>> Hi all, >>>>> I was investigating a rare deadlock when issuing an >>>>> interrupt and >>>>> a timed_join in parallel. I come out with the the following >>>>> code >>>>> showing the behavior. >>>>> >>>>> The deadlock is rare so sometime you need to wait a bit. >>>>> >>>>> I couldn't try it with boost 1.52 because the code is invalid >>>>> due the precondition of "thread joinable" when issuing the >>>>> timed_join. >>>> >>>> That's a hint. >>>> >>>>> Is the code not valid or a real bug? >>>> >>>> The code is invalid: you keep trying to interrupt and join even >>>> after >>>> the thread has been joined! Once the thread has been joined, the >>>> thread >>>> handle is no longer valid, and you should exit the loop. >>> >>> I haven't seen this statement in the documentation. >>> The loop was meant to exploit exactly this, then you are >>> confirming >>> that interrupting a joined thread is not valid. How do I safely >>> interrupt then a thread? >>> There is no "atomic" check_joinable_then_interrupt, whatching at >>> the >>> interrupt code it seems that the check is done inside. I'm lost. >> Boost.Thread and std::thread are designed so that there is only >> one >> owner of the thread. That is only one thread can join/interrupt a >> thread >> safely. > > Unless I have totally missed it the documentation doesn't mention > anything about thread safety (would that be an hint about it?).

From the 1.48 documentation "Member function timed_join()

bool timed_join(const system_time& wait_until);

template<typename TimeDuration> bool timed_join(TimeDuration const& rel_time);

Preconditions:

this->get_id()!=boost::this_thread::get_id()

Postconditions:

If *this refers to a thread of execution on entry, and timed_join returns true, that thread of execution has completed, and *this no longer refers to any thread of execution. If this call to timed_join returns false, *this is unchanged. "

Your second call doesn't satisfy the pre-conditions, so that the outcome of this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers.

the public methods:

create_thread add_thread remove_thread join_all interrupt_all size

are all protecting the internal status and then all the threads in the list with a common mutex.

It appears completely thread safe to me. You are right. I didn't see the implementation. Anthony, I don't know what was the intent of the initial design. Do you have an idea? What do you suggest? update the documentation?

Best, Vicente

Gaetano Mendola

7 Dec 7 Dec

12:23 a.m.

On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:

...

Le 05/12/12 21:02, Gaetano Mendola a écrit :

...
On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit :

...
On 05/12/2012 16.29, Vicente Botet wrote: > Gaetano Mendola-3 wrote >> On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: >>> Le 05/12/12 12:33, Gaetano Mendola a écrit : >>>> On 05/12/2012 09.16, Anthony Williams wrote: >>>>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>>>> Hi all, >>>>>> I was investigating a rare deadlock when issuing an >>>>>> interrupt and >>>>>> a timed_join in parallel. I come out with the the following >>>>>> code >>>>>> showing the behavior. >>>>>> >>>>>> The deadlock is rare so sometime you need to wait a bit. >>>>>> >>>>>> I couldn't try it with boost 1.52 because the code is invalid >>>>>> due the precondition of "thread joinable" when issuing the >>>>>> timed_join. >>>>> >>>>> That's a hint. >>>>> >>>>>> Is the code not valid or a real bug? >>>>> >>>>> The code is invalid: you keep trying to interrupt and join even >>>>> after >>>>> the thread has been joined! Once the thread has been joined, the >>>>> thread >>>>> handle is no longer valid, and you should exit the loop. >>>> >>>> I haven't seen this statement in the documentation. >>>> The loop was meant to exploit exactly this, then you are >>>> confirming >>>> that interrupting a joined thread is not valid. How do I safely >>>> interrupt then a thread? >>>> There is no "atomic" check_joinable_then_interrupt, whatching at >>>> the >>>> interrupt code it seems that the check is done inside. I'm lost. >>> Boost.Thread and std::thread are designed so that there is only >>> one >>> owner of the thread. That is only one thread can join/interrupt a >>> thread >>> safely. >> >> Unless I have totally missed it the documentation doesn't mention >> anything about thread safety (would that be an hint about it?). > > From the 1.48 documentation > "Member function timed_join() > > bool timed_join(const system_time& wait_until); > > template<typename TimeDuration> > bool timed_join(TimeDuration const& rel_time); > > Preconditions: > > this->get_id()!=boost::this_thread::get_id() > > Postconditions: > > If *this refers to a thread of execution on entry, and > timed_join > returns true, that thread of execution has completed, and *this no > longer > refers to any thread of execution. If this call to timed_join > returns > false, > *this is unchanged. > " > > Your second call doesn't satisfy the pre-conditions, so that the > outcome of > this second call is undefined.

That precondition tests that your are not interrupting yourself doesn't say anything about thread safety. Am I missing something ?

Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers.

the public methods:

create_thread add_thread remove_thread join_all interrupt_all size

are all protecting the internal status and then all the threads in the list with a common mutex.

It appears completely thread safe to me. You are right. I didn't see the implementation. Anthony, I don't know what was the intent of the initial design. Do you have an idea? What do you suggest? update the documentation?

Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted. I also believe the create_thread returning a boost::thread pointers is very dangerous, indeed doing so the boost::thread_group doesn't have anymore the ownership of it, it would be better if create_thread would have returned a thread id instead. Sure the programmer can shoot himself in the foot in other more spectacular way but at least having a thread group interface obbeying the principle of least astonishment can be a plus. Regards Gaetano Mendola -- http://cpp-today.blogspot.it/

Vicente J. Botet Escriba

6:03 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 07/12/12 01:23, Gaetano Mendola a écrit :

...

On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 21:02, Gaetano Mendola a écrit :

...
On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 16:59, Gaetano Mendola a écrit : > On 05/12/2012 16.29, Vicente Botet wrote: >> Gaetano Mendola-3 wrote >>> On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: >>>> Le 05/12/12 12:33, Gaetano Mendola a écrit : >>>>> On 05/12/2012 09.16, Anthony Williams wrote: >>>>>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>>>>> Hi all, >>>>>>> I was investigating a rare deadlock when issuing an >>>>>>> interrupt and >>>>>>> a timed_join in parallel. I come out with the the following >>>>>>> code >>>>>>> showing the behavior. >>>>>>> >>>>>>> The deadlock is rare so sometime you need to wait a bit. >>>>>>> >>>>>>> I couldn't try it with boost 1.52 because the code is invalid >>>>>>> due the precondition of "thread joinable" when issuing the >>>>>>> timed_join. >>>>>> >>>>>> That's a hint. >>>>>> >>>>>>> Is the code not valid or a real bug? >>>>>> >>>>>> The code is invalid: you keep trying to interrupt and join >>>>>> even >>>>>> after >>>>>> the thread has been joined! Once the thread has been >>>>>> joined, the >>>>>> thread >>>>>> handle is no longer valid, and you should exit the loop. >>>>> >>>>> I haven't seen this statement in the documentation. >>>>> The loop was meant to exploit exactly this, then you are >>>>> confirming >>>>> that interrupting a joined thread is not valid. How do I safely >>>>> interrupt then a thread? >>>>> There is no "atomic" check_joinable_then_interrupt, >>>>> whatching at >>>>> the >>>>> interrupt code it seems that the check is done inside. I'm >>>>> lost. >>>> Boost.Thread and std::thread are designed so that there is only >>>> one >>>> owner of the thread. That is only one thread can >>>> join/interrupt a >>>> thread >>>> safely. >>> >>> Unless I have totally missed it the documentation doesn't mention >>> anything about thread safety (would that be an hint about it?). >> >> From the 1.48 documentation >> "Member function timed_join() >> >> bool timed_join(const system_time& wait_until); >> >> template<typename TimeDuration> >> bool timed_join(TimeDuration const& rel_time); >> >> Preconditions: >> >> this->get_id()!=boost::this_thread::get_id() >> >> Postconditions: >> >> If *this refers to a thread of execution on entry, and >> timed_join >> returns true, that thread of execution has completed, and *this no >> longer >> refers to any thread of execution. If this call to timed_join >> returns >> false, >> *this is unchanged. >> " >> >> Your second call doesn't satisfy the pre-conditions, so that the >> outcome of >> this second call is undefined. > > That precondition tests that your are not interrupting yourself > doesn't say anything about thread safety. Am I missing something ? > Hi,

no function is considered been thread-safe until it is stated explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers.

the public methods:

create_thread add_thread remove_thread join_all interrupt_all size

are all protecting the internal status and then all the threads in the list with a common mutex.

It appears completely thread safe to me. You are right. I didn't see the implementation. Anthony, I don't know what was the intent of the initial design. Do you have an idea? What do you suggest? update the documentation?

Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable https://svn.boost.org/trac/boost/ticket/7669: thread_group::join_all() should catch resource_deadlock_would_occur

...

I also believe the create_thread returning a boost::thread pointers is very dangerous, indeed doing so the boost::thread_group doesn't have anymore the ownership of it, it would be better if create_thread would have returned a thread id instead. I don't want to invest to much on the thread_group class, as for me this class was designed before threads were movable and a complete redesign should be done. You can of course create a Trac ticket requesting this new feature.

Sure the programmer can shoot himself in the foot in other more spectacular way but at least having a thread group interface obbeying the principle of least astonishment can be a plus. Best, Vicente

Gaetano Mendola

7:26 p.m.

On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:

...

Le 07/12/12 01:23, Gaetano Mendola a écrit :

...
On 06/12/2012 23.31, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 21:02, Gaetano Mendola a écrit :

...
On 05/12/2012 19.44, Vicente J. Botet Escriba wrote:

...
Le 05/12/12 18:49, Gaetano Mendola a écrit :

...
On 05/12/2012 18.35, Vicente J. Botet Escriba wrote: > Le 05/12/12 16:59, Gaetano Mendola a écrit : >> On 05/12/2012 16.29, Vicente Botet wrote: >>> Gaetano Mendola-3 wrote >>>> On 05/12/2012 13.42, Vicente J. Botet Escriba wrote: >>>>> Le 05/12/12 12:33, Gaetano Mendola a écrit : >>>>>> On 05/12/2012 09.16, Anthony Williams wrote: >>>>>>> On 04/12/12 18:32, Gaetano Mendola wrote: >>>>>>>> Hi all, >>>>>>>> I was investigating a rare deadlock when issuing an >>>>>>>> interrupt and >>>>>>>> a timed_join in parallel. I come out with the the following >>>>>>>> code >>>>>>>> showing the behavior. >>>>>>>> >>>>>>>> The deadlock is rare so sometime you need to wait a bit. >>>>>>>> >>>>>>>> I couldn't try it with boost 1.52 because the code is invalid >>>>>>>> due the precondition of "thread joinable" when issuing the >>>>>>>> timed_join. >>>>>>> >>>>>>> That's a hint. >>>>>>> >>>>>>>> Is the code not valid or a real bug? >>>>>>> >>>>>>> The code is invalid: you keep trying to interrupt and join >>>>>>> even >>>>>>> after >>>>>>> the thread has been joined! Once the thread has been >>>>>>> joined, the >>>>>>> thread >>>>>>> handle is no longer valid, and you should exit the loop. >>>>>> >>>>>> I haven't seen this statement in the documentation. >>>>>> The loop was meant to exploit exactly this, then you are >>>>>> confirming >>>>>> that interrupting a joined thread is not valid. How do I safely >>>>>> interrupt then a thread? >>>>>> There is no "atomic" check_joinable_then_interrupt, >>>>>> whatching at >>>>>> the >>>>>> interrupt code it seems that the check is done inside. I'm >>>>>> lost. >>>>> Boost.Thread and std::thread are designed so that there is only >>>>> one >>>>> owner of the thread. That is only one thread can >>>>> join/interrupt a >>>>> thread >>>>> safely. >>>> >>>> Unless I have totally missed it the documentation doesn't mention >>>> anything about thread safety (would that be an hint about it?). >>> >>> From the 1.48 documentation >>> "Member function timed_join() >>> >>> bool timed_join(const system_time& wait_until); >>> >>> template<typename TimeDuration> >>> bool timed_join(TimeDuration const& rel_time); >>> >>> Preconditions: >>> >>> this->get_id()!=boost::this_thread::get_id() >>> >>> Postconditions: >>> >>> If *this refers to a thread of execution on entry, and >>> timed_join >>> returns true, that thread of execution has completed, and *this no >>> longer >>> refers to any thread of execution. If this call to timed_join >>> returns >>> false, >>> *this is unchanged. >>> " >>> >>> Your second call doesn't satisfy the pre-conditions, so that the >>> outcome of >>> this second call is undefined. >> >> That precondition tests that your are not interrupting yourself >> doesn't say anything about thread safety. Am I missing something ? >> > Hi, > > no function is considered been thread-safe until it is stated > explicitly.

You are completely right, it was my own fault considering it thread safe.

So even thread_group is not thread safe? The implementation is full thread safe. How it is thread-safe? Only one thread can call to the thread_group functions at a time.

The implementation is straight. The boost::thread_group keeps a std::list of boost::threads pointers.

the public methods:

create_thread add_thread remove_thread join_all interrupt_all size

are all protecting the internal status and then all the threads in the list with a common mutex.

It appears completely thread safe to me. You are right. I didn't see the implementation. Anthony, I don't know what was the intent of the initial design. Do you have an idea? What do you suggest? update the documentation?

Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets

https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition there, indeed a thread can detach itself between the call to joinable and the join at that point even if the check was made when the join is issued the precondition doesn't hold anymore. Basically that is the question I'm asking: how can we be sure that a thread is joinable when we call the join on it, for sure checking the "joinable" has some problems. I have solved this in my thread group implementation with a flag saying if the thread was already joined or not and this works only because our threads do not detach them self so the thread group is the only responsible for the status change.

...

https://svn.boost.org/trac/boost/ticket/7669: thread_group::join_all() should catch resource_deadlock_would_occur

That is another story about the exception safety of thread_group, and as you can see it's not exception safe.

...

...
I also believe the create_thread returning a boost::thread pointers is very dangerous, indeed doing so the boost::thread_group doesn't have anymore the ownership of it, it would be better if create_thread would have returned a thread id instead. I don't want to invest to much on the thread_group class, as for me this class was designed before threads were movable and a complete redesign should be done. You can of course create a Trac ticket requesting this new feature.

Due the limitation of this class I have wrote my own thread group with also the method join_any having the semantic to interrupt all threads if any of the threads terminate. I believe seeing the current status of thread_group class the best to do is or fix it for good or deprecate it.

...

...
Sure the programmer can shoot himself in the foot in other more spectacular way but at least having a thread group interface obbeying the principle of least astonishment can be a plus. Best, Vicente

Regards Gaetano Mendola

Vicente J. Botet Escriba

7:53 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 07/12/12 20:26, Gaetano Mendola a écrit :

...

On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 01:23, Gaetano Mendola a écrit :

...
Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets

https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition there, indeed a thread can detach itself between the call to joinable and the join at that point even if the check was made when the join is issued the precondition doesn't hold anymore.

Basically that is the question I'm asking: how can we be sure that a thread is joinable when we call the join on it, for sure checking the "joinable" has some problems. I have solved this in my thread group implementation with a flag saying if the thread was already joined or not and this works only because our threads do not detach them self so the thread group is the only responsible for the status change.

From my point of view a thread in a thread_group is owned by the thread group. Any direct operation on a thread belonging to the thread group falls in undefined behavior. From the documentation Member function create_thread() template<typename F> thread* create_thread(F threadfunc); Effects: Create a new boost::thread object as-if by new thread(threadfunc) and add it to the group. Postcondition: this->size() is increased by one, the new thread is running. Returns: A pointer to the new boost::thread object. Member function add_thread() void add_thread(thread* thrd); Precondition: The expression delete thrd is well-formed and will not result in undefined behaviour. Effects: Take ownership of the boost::thread object pointed to by thrd and add it to the group. Postcondition: this->size() is increased by one.

...

...
https://svn.boost.org/trac/boost/ticket/7669: thread_group::join_all() should catch resource_deadlock_would_occur

That is another story about the exception safety of thread_group, and as you can see it's not exception safe. Please, let me know where there is an issue and I'll try to fix it.

...
...
I also believe the create_thread returning a boost::thread pointers is very dangerous, indeed doing so the boost::thread_group doesn't have anymore the ownership of it, it would be better if create_thread would have returned a thread id instead. I don't want to invest to much on the thread_group class, as for me this class was designed before threads were movable and a complete redesign should be done. You can of course create a Trac ticket requesting this new feature.

Due the limitation of this class I have wrote my own thread group with also the method join_any having the semantic to interrupt all threads if any of the threads terminate. I believe seeing the current status of thread_group class the best to do is or fix it for good or deprecate it. I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

Best, Vicente

Gaetano Mendola

8:19 p.m.

On 07/12/2012 20.53, Vicente J. Botet Escriba wrote:

...

Le 07/12/12 20:26, Gaetano Mendola a écrit :

...
On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 01:23, Gaetano Mendola a écrit :

...
Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets

https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition there, indeed a thread can detach itself between the call to joinable and the join at that point even if the check was made when the join is issued the precondition doesn't hold anymore.

Basically that is the question I'm asking: how can we be sure that a thread is joinable when we call the join on it, for sure checking the "joinable" has some problems. I have solved this in my thread group implementation with a flag saying if the thread was already joined or not and this works only because our threads do not detach them self so the thread group is the only responsible for the status change.

From my point of view a thread in a thread_group is owned by the thread group. Any direct operation on a thread belonging to the thread group falls in undefined behavior.

I do full agree, that's why the thread_group::create_thread has to return void or at most the thread id. What can the user do with a pointer to the create thread? All members are not safe to be called apart two static functions. I believe not returning the thread pointer (that can not be touched anyway) would improve thread_group interface.

...

I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

I should rework our thread group to adapt it to new 1.52 interface. Regards Gaetano Mendola

Vicente J. Botet Escriba

9:24 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 07/12/12 21:19, Gaetano Mendola a écrit :

...

On 07/12/2012 20.53, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 20:26, Gaetano Mendola a écrit :

...
On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 01:23, Gaetano Mendola a écrit :

...
Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets

https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition there, indeed a thread can detach itself between the call to joinable and the join at that point even if the check was made when the join is issued the precondition doesn't hold anymore.

Basically that is the question I'm asking: how can we be sure that a thread is joinable when we call the join on it, for sure checking the "joinable" has some problems. I have solved this in my thread group implementation with a flag saying if the thread was already joined or not and this works only because our threads do not detach them self so the thread group is the only responsible for the status change.

From my point of view a thread in a thread_group is owned by the thread group. Any direct operation on a thread belonging to the thread group falls in undefined behavior.

I do full agree, that's why the thread_group::create_thread has to return void or at most the thread id. yes a thread::id would be a better option.

...

What can the user do with a pointer to the create thread? All members are not safe to be called apart two static functions. You can remove a thread from a thread_group :) I believe not returning the thread pointer (that can not be touched anyway) would improve thread_group interface.

...
I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

I should rework our thread group to adapt it to new 1.52 interface. Great. It will be a pleasure to review it.

Best, Vicente

Gaetano Mendola

11 Dec 11 Dec

10:46 a.m.

On 07/12/2012 22.24, Vicente J. Botet Escriba wrote:

...

Le 07/12/12 21:19, Gaetano Mendola a écrit :

...
On 07/12/2012 20.53, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 20:26, Gaetano Mendola a écrit :

...
On 07/12/2012 19.03, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 01:23, Gaetano Mendola a écrit :

...
Even the implementation can be improved, even if it's thread safe it doesn't avoid to issue a join to an already joined thread, or for the matter to interrupt a thread already joined/interrupted.

I don't know if you comment has something to be with these two tickets

https://svn.boost.org/trac/boost/ticket/7668: thread_group::join_all() should check whether its threads are joinable

Intersting, I saw the fix and as you can see it has a race condition there, indeed a thread can detach itself between the call to joinable and the join at that point even if the check was made when the join is issued the precondition doesn't hold anymore.

Basically that is the question I'm asking: how can we be sure that a thread is joinable when we call the join on it, for sure checking the "joinable" has some problems. I have solved this in my thread group implementation with a flag saying if the thread was already joined or not and this works only because our threads do not detach them self so the thread group is the only responsible for the status change.

From my point of view a thread in a thread_group is owned by the thread group. Any direct operation on a thread belonging to the thread group falls in undefined behavior.

I do full agree, that's why the thread_group::create_thread has to return void or at most the thread id. yes a thread::id would be a better option.

...
What can the user do with a pointer to the create thread? All members are not safe to be called apart two static functions. You can remove a thread from a thread_group :) I believe not returning the thread pointer (that can not be touched anyway) would improve thread_group interface.

...
I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

I should rework our thread group to adapt it to new 1.52 interface. Great. It will be a pleasure to review it.

Attached my first proposal, I don't know if this is the correct way to submit it. I have named it Thread_Group (capitalized) to not conflict with old thread_group implementation. 1) Thread group is now thread safe, it can be used concurrently by multiple threads 2) thread_group now mantains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread 3) Added join_any with the semantic to wait of any of the threads to terminate. 3) create_thread now returns a thread::id *this is a break change*, what we can do now is to leave the create_thread as it is (returning a thread pointer) but returning not a real thread but a class inherited by thread allowing only the get_id method on it (other methods should be implemented with throw("method not callable"). 4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it. Regards Gaetano Mendola

Vicente Botet

3:13 p.m.

Gaetano Mendola-3 wrote

...

On 07/12/2012 22.24, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 21:19, Gaetano Mendola a écrit :

...
On 07/12/2012 20.53, Vicente J. Botet Escriba wrote:

...
...
...
I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

I should rework our thread group to adapt it to new 1.52 interface. Great. It will be a pleasure to review it.

Attached my first proposal, I don't know if this is the correct way to submit it.

It depends on whether you want to submit an independent library. This has the advantage that you don't have to be backward compatible.

...

I have named it Thread_Group (capitalized) to not conflict with old thread_group implementation.

1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

...

2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

...

3) Added join_any with the semantic to wait of any of the threads to terminate.

I don't like the implementation polling on all the threads to see if one of them has ended. Maybe the use of at_thread_exit could help. If all the threads of a thread group are created with the create_thread function, an alternative is to use a decorator of the user function that can signal when the thread has been finished. This behavior is quite close to futures.

...

3) create_thread now returns a thread::id *this is a break change*, what we can do now is to leave the create_thread as it is (returning a thread pointer) but returning not a real thread but a class inherited by thread allowing only the get_id method on it (other methods should be implemented with throw("method not callable").

Do you really need to remove threads? I will even go for don't returning anything? I will remove also the add_thread function.

...

4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern? Why you have needed to change the implementation of join_all? It seems more complex now. As I said I'm for deprecating thread_group in Boost.Thread. The implementation you are proposing has not changed my mind. I would prefer an approach where data structures (containers) and algorithms (join, interrupt, ...) are separated, thread-safety is not mandatory, ... Best, Vicente -- View this message in context: http://boost.2283326.n4.nabble.com/thread-1-48-Multiple-interrupt-timed-join... Sent from the Boost - Dev mailing list archive at Nabble.com.

Gaetano Mendola

6:12 p.m.

On 11/12/2012 16.13, Vicente Botet wrote:

...

Gaetano Mendola-3 wrote

...
On 07/12/2012 22.24, Vicente J. Botet Escriba wrote:

...
Le 07/12/12 21:19, Gaetano Mendola a écrit :

...
On 07/12/2012 20.53, Vicente J. Botet Escriba wrote:

...
...
...
I'm for deprecating it, as I posted already sometime ago, but of course I'm for fixing the bugs I could be introduced by recent modification in other parts of the library. Anyway, before deprecating it, we need an alternative. Maybe you have already a concrete proposal and you can submit it to Boost.

I should rework our thread group to adapt it to new 1.52 interface. Great. It will be a pleasure to review it.

Attached my first proposal, I don't know if this is the correct way to submit it.

It depends on whether you want to submit an independent library. This has the advantage that you don't have to be backward compatible.

...
I have named it Thread_Group (capitalized) to not conflict with old thread_group implementation.

1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

It can manage without pestering the developers the fact that one entity spawns a batch of threads and then wait for the completion waiting on the join() while another entity (an user interface as example) can stop the whole process if it's taking too much time. Otherwise as soon someone performs a boost::thread_group::join then nothing can be done from outside to stop the process. It seems a natural use to me.

...

...
2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

This is true if the thread_group does not permits to be used by multiple threads interrupting/joining.

...

...
3) Added join_any with the semantic to wait of any of the threads to terminate.

I don't like the implementation polling on all the threads to see if one of them has ended. Maybe the use of at_thread_exit could help. If all the threads of a thread group are created with the create_thread function, an alternative is to use a decorator of the user function that can signal when the thread has been finished. This behavior is quite close to futures.

I agree with it.

...

...
3) create_thread now returns a thread::id *this is a break change*, what we can do now is to leave the create_thread as it is (returning a thread pointer) but returning not a real thread but a class inherited by thread allowing only the get_id method on it (other methods should be implemented with throw("method not callable").

Do you really need to remove threads? I will even go for don't returning anything?

I will remove also the add_thread function.

I agree as well.

...

...
4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern?

Sure, if a thread performs a closed loop: a) lock_mutex; b) timed_join c) unlock_mutex d) goto a and another thread is doing: a) lock_mutex; b) interrupt c) unlock_mutex then it goes in starvationm we have observed this (even in a deterministic way), then I had to make the two interrupt/timed_join on the thread handler fair each other. Our platform is a linux platform with a 3.2.0 kernel.

...

Why you have needed to change the implementation of join_all? It seems more complex now.

Because if you want to issue an interrupt while someone is joining a thread you need to have a join timed in order to periodically release the mutex protecting the thread.

...

As I said I'm for deprecating thread_group in Boost.Thread. The implementation you are proposing has not changed my mind.

Fine with me, leaving it as it is now is worst than

...

I would prefer an approach where data structures (containers) and algorithms (join, interrupt, ...) are separated, thread-safety is not mandatory, ...

I agreed with you about the fact if not explicitly written a library is not meant to be thread safe, this rule doesn't fit a thread library. I suggest to explicitly write, even in bold, the fact that thread_group and thread classes are not thread safe. Regards Gaetano Mendola

Vicente J. Botet Escriba

8:23 p.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

...

On 11/12/2012 16.13, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

It can manage without pestering the developers the fact that one entity spawns a batch of threads and then wait for the completion waiting on the join() while another entity (an user interface as example) can stop the whole process if it's taking too much time. Otherwise as soon someone performs a boost::thread_group::join then nothing can be done from outside to stop the process. It seems a natural use to me. OK I think I understand your use case. Here it is an alternative that don't use any mutex to protect the group of thread. I will choose and owner of all these threads, insert them on a container. Only this thread is able to join/interrupt the threads. I will use some way to transfer the request from the user interface

Le 11/12/12 19:12, Gaetano Mendola a écrit : thread to the owner that this is taking too much time (using atomic?). The owner will try to join each thread using try_join_until with the desired expiration time. If the thread is joined the thread is removed from the container. If there is a timeout the owner will check the protected state 'take_too_much_time' and will interrupt all the other threads and then join all of them. As you can see the contention is reduced. Note that this is a specific behavior that can not be added to the thread_group class. I will be for the addition of an algorithm/free function that try to join the threads on a container/range during a given duration or until an expiration time (removing the joined threads).

...

...
...
2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

This is true if the thread_group does not permits to be used by multiple threads interrupting/joining. I understand now why you did this way. But I will not do that.

...
...
4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern?

Sure, if a thread performs a closed loop:

a) lock_mutex; b) timed_join c) unlock_mutex d) goto a

and another thread is doing:

a) lock_mutex; b) interrupt c) unlock_mutex

then it goes in starvationm we have observed this (even in a deterministic way), then I had to make the two interrupt/timed_join on the thread handler fair each other. Our platform is a linux platform with a 3.2.0 kernel.

IMO, only one thread should join/interrupt all the threads. This avoid all these issues.

...

...
Why you have needed to change the implementation of join_all? It seems more complex now.

Because if you want to issue an interrupt while someone is joining a thread you need to have a join timed in order to periodically release the mutex protecting the thread. I see why now.

...
As I said I'm for deprecating thread_group in Boost.Thread. The implementation you are proposing has not changed my mind.

Fine with me, leaving it as it is now is worst than The class thread_group is not a big one. You are free to propose to Boost whatever you consider is good for the Boost community.

...
I would prefer an approach where data structures (containers) and algorithms (join, interrupt, ...) are separated, thread-safety is not mandatory, ...

I agreed with you about the fact if not explicitly written a library is not meant to be thread safe, this rule doesn't fit a thread library. This is your opinion and I respect it even if I don't share it. I suggest to explicitly write, even in bold, the fact that thread_group and thread classes are not thread safe. I believed that we agreed that the thread group was thread-safe! Of course the thread class is not thread-safe. I could add something like all the functions in this library are not thread-safe until stated explicitly, as I did it for Boost.Chrono since the beginning.

Best, Vicente

Gaetano Mendola

11:32 p.m.

On 11/12/2012 21.23, Vicente J. Botet Escriba wrote:

...

...
On 11/12/2012 16.13, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

It can manage without pestering the developers the fact that one entity spawns a batch of threads and then wait for the completion waiting on the join() while another entity (an user interface as example) can stop the whole process if it's taking too much time. Otherwise as soon someone performs a boost::thread_group::join then nothing can be done from outside to stop the process. It seems a natural use to me. OK I think I understand your use case. Here it is an alternative that don't use any mutex to protect the group of thread. I will choose and owner of all these threads, insert them on a container. Only this thread is able to join/interrupt the threads. I will use some way to transfer the request from the user interface

Le 11/12/12 19:12, Gaetano Mendola a écrit : thread to the owner that this is taking too much time (using atomic?). The owner will try to join each thread using try_join_until with the desired expiration time. If the thread is joined the thread is removed from the container. If there is a timeout the owner will check the protected state 'take_too_much_time' and will interrupt all the other threads and then join all of them. As you can see the contention is reduced.

Note that this is a specific behavior that can not be added to the thread_group class. I will be for the addition of an algorithm/free function that try to join the threads on a container/range during a given duration or until an expiration time (removing the joined threads).

I don't know who the boost thread maintainer is and how/who decides if a design is good to be implemented or it work the other way around?

...

...
...
...
2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

This is true if the thread_group does not permits to be used by multiple threads interrupting/joining. I understand now why you did this way. But I will not do that.

Then the maintainer is you?

...

...
...
...
4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern?

Sure, if a thread performs a closed loop:

a) lock_mutex; b) timed_join c) unlock_mutex d) goto a

and another thread is doing:

a) lock_mutex; b) interrupt c) unlock_mutex

then it goes in starvationm we have observed this (even in a deterministic way), then I had to make the two interrupt/timed_join on the thread handler fair each other. Our platform is a linux platform with a 3.2.0 kernel.

IMO, only one thread should join/interrupt all the threads. This avoid all these issues.

Avoid the issues at thread group level, but those issues will be present into an upper layer.

...

...
...
As I said I'm for deprecating thread_group in Boost.Thread. The implementation you are proposing has not changed my mind.

Fine with me, leaving it as it is now is worst than The class thread_group is not a big one. You are free to propose to Boost whatever you consider is good for the Boost community.

I'm proposing to decide what thread_group has to do because as it is now it have design flaws.

...

...
...
I would prefer an approach where data structures (containers) and algorithms (join, interrupt, ...) are separated, thread-safety is not mandatory, ...

I agreed with you about the fact if not explicitly written a library is not meant to be thread safe, this rule doesn't fit a thread library. This is your opinion and I respect it even if I don't share it. I suggest to explicitly write, even in bold, the fact that thread_group and thread classes are not thread safe. I believed that we agreed that the thread group was thread-safe! Of course the thread class is not thread-safe. I could add something like all the functions in this library are not thread-safe until stated explicitly, as I did it for Boost.Chrono since the beginning.

From an academic pure point of view the thread group is thread safe because it protects his internal list so it can be used by multiple threads, but to the other side if used (as it is now) from multiple threads then the best you can get is "everyone guess" joining a batch of already joined threads or for the matter to interrupt an already joined group. It seems it was coded when it was not a problem for a thread to be joined even if joined/interrupted. Regards Gaetano Mendola

Vicente J. Botet Escriba

12 Dec 12 Dec

7:33 a.m.

New subject: [thread 1.48] Multiple interrupt/timed_join leads to deadlock

Le 12/12/12 00:32, Gaetano Mendola a écrit :

...

On 11/12/2012 21.23, Vicente J. Botet Escriba wrote:

...
...
On 11/12/2012 16.13, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

It can manage without pestering the developers the fact that one entity spawns a batch of threads and then wait for the completion waiting on the join() while another entity (an user interface as example) can stop the whole process if it's taking too much time. Otherwise as soon someone performs a boost::thread_group::join then nothing can be done from outside to stop the process. It seems a natural use to me. OK I think I understand your use case. Here it is an alternative that don't use any mutex to protect the group of thread. I will choose and owner of all these threads, insert them on a container. Only this thread is able to join/interrupt the threads. I will use some way to transfer the request from the user interface

Le 11/12/12 19:12, Gaetano Mendola a écrit : thread to the owner that this is taking too much time (using atomic?). The owner will try to join each thread using try_join_until with the desired expiration time. If the thread is joined the thread is removed from the container. If there is a timeout the owner will check the protected state 'take_too_much_time' and will interrupt all the other threads and then join all of them. As you can see the contention is reduced.

Note that this is a specific behavior that can not be added to the thread_group class. I will be for the addition of an algorithm/free function that try to join the threads on a container/range during a given duration or until an expiration time (removing the joined threads).

I don't know who the boost thread maintainer is and how/who decides if a design is good to be implemented or it work the other way around? Here I was talking about my alternative solution.

...
...
...
...
2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

This is true if the thread_group does not permits to be used by multiple threads interrupting/joining. I understand now why you did this way. But I will not do that.

Then the maintainer is you? Here I was talking as a user, that is, that I will not use the design of your application. And yes, I'm the maintainer with Anthony Williams that is the principal author.

...
...
...
...
4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern?

Sure, if a thread performs a closed loop:

<snip> then it goes in starvationm we have observed this (even in a deterministic way), then I had to make the two interrupt/timed_join on the thread handler fair each other. Our platform is a linux platform with a 3.2.0 kernel.

IMO, only one thread should join/interrupt all the threads. This avoid all these issues.

Avoid the issues at thread group level, but those issues will be present into an upper layer. Maybe. Have you identified these issues on the design I have proposed above?

...
...
...
As I said I'm for deprecating thread_group in Boost.Thread. The implementation you are proposing has not changed my mind.

Fine with me, leaving it as it is now is worst than The class thread_group is not a big one. You are free to propose to Boost whatever you consider is good for the Boost community.

I'm proposing to decide what thread_group has to do because as it is now it have design flaws. I agree completely with the design flaw. This thread_group class should be a specific user class so that it can adapt it to its specific needs.

...
...
...
I would prefer an approach where data structures (containers) and algorithms (join, interrupt, ...) are separated, thread-safety is not mandatory, ...

I agreed with you about the fact if not explicitly written a library is not meant to be thread safe, this rule doesn't fit a thread library. This is your opinion and I respect it even if I don't share it. I suggest to explicitly write, even in bold, the fact that thread_group and thread classes are not thread safe. I believed that we agreed that the thread group was thread-safe! Of course the thread class is not thread-safe. I could add something like all the functions in this library are not thread-safe until stated explicitly, as I did it for Boost.Chrono since the beginning.

From an academic pure point of view the thread group is thread safe because it protects his internal list so it can be used by multiple threads, but to the other side if used (as it is now) from multiple threads then the best you can get is "everyone guess" joining a batch of already joined threads or for the matter to interrupt an already joined group. If the current design doesn't respond to your expectations you should use an alternative. It seems it was coded when it was not a problem for a thread to be joined even if joined/interrupted. You are right, the last change in thread had some some undesirable impacts on thread_group. IMO, the two fixes I reported in this thread resolve the issues. Please let me know if this is not the case.

Best, Vicente

Gaetano Mendola

8:19 a.m.

On 12/12/2012 08.33, Vicente J. Botet Escriba wrote:

...

Le 12/12/12 00:32, Gaetano Mendola a écrit :

...
On 11/12/2012 21.23, Vicente J. Botet Escriba wrote:

...
...
On 11/12/2012 16.13, Vicente Botet wrote:

...
Gaetano Mendola-3 wrote

...
1) Thread group is now thread safe, it can be used concurrently by multiple threads

Why a thread group should be inherently thread-safe? It seems to me that having a thread container is already useful.

It can manage without pestering the developers the fact that one entity spawns a batch of threads and then wait for the completion waiting on the join() while another entity (an user interface as example) can stop the whole process if it's taking too much time. Otherwise as soon someone performs a boost::thread_group::join then nothing can be done from outside to stop the process. It seems a natural use to me. OK I think I understand your use case. Here it is an alternative that don't use any mutex to protect the group of thread. I will choose and owner of all these threads, insert them on a container. Only this thread is able to join/interrupt the threads. I will use some way to transfer the request from the user interface

Le 11/12/12 19:12, Gaetano Mendola a écrit : thread to the owner that this is taking too much time (using atomic?). The owner will try to join each thread using try_join_until with the desired expiration time. If the thread is joined the thread is removed from the container. If there is a timeout the owner will check the protected state 'take_too_much_time' and will interrupt all the other threads and then join all of them. As you can see the contention is reduced.

Note that this is a specific behavior that can not be added to the thread_group class. I will be for the addition of an algorithm/free function that try to join the threads on a container/range during a given duration or until an expiration time (removing the joined threads).

I don't know who the boost thread maintainer is and how/who decides if a design is good to be implemented or it work the other way around? Here I was talking about my alternative solution.

...
...
...
...
2) thread_group now maintains a list of handlers with the responsibility to: -) Avoid join and interrupts to be called concurrently on a thread -) Avoid to call join on a joined thread -) Avoid to call interrupt on a joined/interrupt thread

IMO, all the threads in a thread_group are owned by the group, and use move semantics, no need to use pointer to threads. As a consequence there is no need for the handler/wrapper.

This is true if the thread_group does not permits to be used by multiple threads interrupting/joining. I understand now why you did this way. But I will not do that.

Then the maintainer is you? Here I was talking as a user, that is, that I will not use the design of your application. And yes, I'm the maintainer with Anthony Williams that is the principal author.

...
...
...
...
4) Due the fact mutex are not fair a thread issuing an interrupt_all most likely will go in starvation if a thread is issuing a join_all (especialy if the group contains a single thread). I can work at it.

Could you clarify your concern?

Sure, if a thread performs a closed loop:

<snip> then it goes in starvationm we have observed this (even in a deterministic way), then I had to make the two interrupt/timed_join on the thread handler fair each other. Our platform is a linux platform with a 3.2.0 kernel.

IMO, only one thread should join/interrupt all the threads. This avoid all these issues.

Avoid the issues at thread group level, but those issues will be present into an upper layer. Maybe. Have you identified these issues on the design I have proposed above?

Of course not if thread_group is treated as not thread safe indeed you avoid all those issues at once.

...

...
It seems it was coded when it was not a problem for a thread to be joined even if joined/interrupted. You are right, the last change in thread had some some undesirable impacts on thread_group. IMO, the two fixes I reported in this thread resolve the issues. Please let me know if this is not the case.

It solves the issue yes. A consideration I can do is to completely remove the mutex protection inside the thread_group because it's useless not being a class meant to be used by multiple threads, also due the fact it is an only header class the inspection of implementation it gives false expectations. Regards Gaetano Mendola

4609

Age (days ago)

4617

Last active (days ago)

List overview

Download

34 comments

5 participants

participants (5)

Anthony Williams
AntonioM
Gaetano Mendola
Vicente Botet
Vicente J. Botet Escriba