boost::thread::join causes assertion (pthreads)

Thorsten Froehlich

21 Mar 2005 21 Mar '05

1:01 a.m.

Hi, I noticed that when using thread::join and the thread has already completed, at least on Mac OS X using pthreads, pthread_join returns EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug? Thorsten

Show replies by date

Kevin Wheatley

21 Mar 21 Mar

9:16 a.m.

Thorsten Froehlich wrote:

...

I noticed that when using thread::join and the thread has already completed, at least on Mac OS X using pthreads, pthread_join returns EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug?

I think this is intentional as the same occurs for all 3 platforms Threads supports (personally used on Win32 and Linux). I'd prefer to see the return codes from the various platforms turned into a platform independent error code but I'm not sure what the various possibilities are and if they are small enough to be usefully consolidatable into true platform indepenant codes. You can of cause compile in release mode with the assertions turned off, but its a pain when debugging. I tried asking about this on the developer list but I guess people are busy (isn't everybody :-) or ignoring me (paranoid!). Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Michael Glassford

2:46 p.m.

Kevin Wheatley wrote:

...

I tried asking about this on the developer list but I guess people are busy (isn't everybody :-) or ignoring me (paranoid!).

I haven't been able to answer all Boost.Threads questions, though I do try to bookmark them to look at later. Unfortunately I lost many bookmarks when my machine died recently, and I don't even remember seeing the message you refer to here. Could you point me to it? Thanks, Mike

Kevin Wheatley

4:16 p.m.

Michael Glassford wrote:

...

Kevin Wheatley wrote:

...
I tried asking about this on the developer list but I guess people are busy (isn't everybody :-) or ignoring me (paranoid!).

I haven't been able to answer all Boost.Threads questions, though I do try to bookmark them to look at later. Unfortunately I lost many bookmarks when my machine died recently, and I don't even remember seeing the message you refer to here. Could you point me to it?

David E. Konerding DSD staff

7:52 p.m.

Kevin Wheatley wrote:

...

Thorsten Froehlich wrote:

...
I noticed that when using thread::join and the thread has already completed, at least on Mac OS X using pthreads, pthread_join returns EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug?

I think this is intentional as the same occurs for all 3 platforms Threads supports (personally used on Win32 and Linux). I'd prefer to see the return codes from the various platforms turned into a platform independent error code but I'm not sure what the various possibilities are and if they are small enough to be usefully consolidatable into true platform indepenant codes. You can of cause compile in release mode with the assertions turned off, but its a pain when debugging.

I tried asking about this on the developer list but I guess people are busy (isn't everybody :-) or ignoring me (paranoid!).

Kevin

The following is not strictly Boost specific, but might be helpful: Mac OS X is much more sensitive to thread programming errors than Linux (at least Red Hat 9; there could be other platforms with more compliant and useful pthreads implementations) . We discovered a threading bug in our code only after attemping to port to Mac OS X and getting various thread errors. In each case, we found out that it was a real thread bug in our code (for exmaple, attemping to lock a locked mutex, or unlocking an unlocked mutex) and Linux simply passed through that line of code with no error messages, while OS X crashed immediately with a helpful error message. ACtually, now that I read the man page (pthread_mutex_lock)again, I see that LinuxThreads (and maybe NPTL) uses "fast" mutex kinds by default, which don't check for certain error conditions. In the end, we were able to reproduce the problem using valgrind on Linux IA32. Dave

Thorsten Froehlich

8:48 p.m.

Kevin Wheatley wrote:

...

I think this is intentional as the same occurs for all 3 platforms Threads supports (personally used on Win32 and Linux). I'd prefer to see the return codes from the various platforms turned into a platform independent error code but I'm not sure what the various possibilities are and if they are small enough to be usefully consolidatable into true platform indepenant codes. You can of cause compile in release mode with the assertions turned off, but its a pain when debugging.

Well, I would already be happy to not get an assertion on legal code. For now I removed those assertions as they provide very little value at all.

...

I tried asking about this on the developer list but I guess people are busy (isn't everybody :-) or ignoring me (paranoid!).

Good question, I have noticed that not a whole lot is happening with boost threads lately, but plenty of features that were once announced never made it. Not to mention the documentation seems to be free of any examples whatsoever. It would be really great if someone with time could pick up boost thread development and documentation again, it certainly is a valuable part of boost that could use a lot more attention. Thorsten

Michael Glassford

9:47 p.m.

Thorsten Froehlich wrote:

...

Good question, I have noticed that not a whole lot is happening with boost threads lately,

True. I've been too busy to do much with Boost.Threads, and while it's possible that I may find more time, I can't promise it. Every time I've hoped I would have more time it hasn't worked out that way. I've had some off-list discussions with Roland, and he has proposed a couple of ideas, but we haven't gone beyond that.

...

but plenty of features that were once announced never made it.

A lot of possible features were discussed, but I don't think any were announced other than a general plan to complete features started by William Kempf in the thread_dev branch and move them to the main branch. What particular features did you have in mind?

...

Not to mention the documentation seems to be free of any examples whatsoever.

There have been a lot of complaints about the documentation, but no volunteers to help fix it.

...

It would be really great if someone with time could pick up boost thread development and documentation again,

If someone with time wants to help with or even take over Boost.Threads development, I'd be glad to talk about it (and offer what help I can). Maybe Roland?

...

it certainly is a valuable part of boost that could use a lot more attention.

I agree. Mike

Michael Glassford

29 Mar 29 Mar

4:24 p.m.

Michael Glassford wrote:

...

Thorsten Froehlich wrote:

...
Good question, I have noticed that not a whole lot is happening with boost threads lately,

True. I've been too busy to do much with Boost.Threads, and while it's possible that I may find more time, I can't promise it. Every time I've hoped I would have more time it hasn't worked out that way.

[snip]

...

If someone with time wants to help with or even take over Boost.Threads development, I'd be glad to talk about it (and offer what help I can). Maybe Roland?

[snip] As it turns out, I may have more time in about a month. I'm losing my job, which is one of the things that has kept me so busy, and hope to be able to spend some time on Boost.Threads during/after the transition. Still no promises, of course, because I don't yet know what will happen, but its a possibility. Mike

Ben Hutchings

21 Mar 21 Mar

2:43 p.m.

Thorsten Froehlich wrote:

...

Hi,

I noticed that when using thread::join and the thread has already completed,

I'm not sure whether you mean that the thread has exited or that it has also been joined.

...

at least on Mac OS X using pthreads, pthread_join returns EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug?

Which part of this behaviour are you questioning? The reasons why pthread_join may fail all represent bugs rather than unpredictable errors, hence the assertion. Joining a thread makes the thread object an empty shell which has no further use. Detaching a thread also makes it non-joinable and makes the thread object almost useless. If your program attempts to join a non-joinable thread, that's the bug. Otherwise, it is conceivable that you have found a bug in MacOS X or in Boost.Threads, but I'd be surprised. Ben.

Thorsten Froehlich

8:30 p.m.

Ben Hutchings wrote:

...

I'm not sure whether you mean that the thread has exited or that it has also been joined.

The thread ended normally as the thread's function returned. There does not appear to be any way to detect this in boost::thread.

...

...
EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug?

Which part of this behaviour are you questioning?

That when I try to join a thread that is no longer running, which cannot be reliably determined by the given thread class interface (well, it cannot at all in fact), then getting an assertion upon try to join that thread is a bug.

...

The reasons why pthread_join may fail all represent bugs rather than unpredictable errors, hence the assertion.

Sorry, this is incorrect, see below.

...

Joining a thread makes the thread object an empty shell which has no further use. Detaching a thread also makes it non-joinable and makes the thread object almost useless. If your program attempts to join a non-joinable thread, that's the bug. Otherwise, it is conceivable that you have found a bug in MacOS X or in Boost.Threads, but I'd be surprised.

I am perfectly aware what happens in the code. My code is certainly correct and doing nothing invalid at all - the problem is with boost thread behavior or interface (well, lack of it) to prevent the assertion, which would only be possible by checking if a thread terminated already, and checking for this in a reliable manner. Boost threads does not provide that facility. Thus, an assertion is incorrect at that point in the boost thread join code. Thorsten

Ben Hutchings

9:29 p.m.

Thorsten Froehlich wrote:

...

Ben Hutchings wrote:

...
I'm not sure whether you mean that the thread has exited or that it has also been joined.

The thread ended normally as the thread's function returned. There does not appear to be any way to detect this in boost::thread.

Right.

...

...
...
EINVAL, thus triggering the assertion res == 0. Is this intentional or a bug?

Which part of this behaviour are you questioning?

That when I try to join a thread that is no longer running, which cannot be reliably determined by the given thread class interface (well, it cannot at all in fact), then getting an assertion upon try to join that thread is a bug.

If the thread is supposed to be joinable, then of course this is a bug, yes.

...

...
The reasons why pthread_join may fail all represent bugs rather than unpredictable errors, hence the assertion.

Sorry, this is incorrect, see below.

I think you misunderstand me. I don't mean that a failure of pthread_join necessarily indicates a bug in your program, but that it indicates a bug at some level. The specification doesn't provide for any error conditions like lack of resources.

...

...
Joining a thread makes the

...
thread object an empty shell which has no further use. Detaching a thread also makes it non-joinable and makes the thread object almost useless. If your program attempts to join a non-joinable thread, that's the bug. Otherwise, it is conceivable that you have found a bug in MacOS X or in Boost.Threads, but I'd be surprised.

I am perfectly aware what happens in the code. My code is certainly correct and doing nothing invalid at all - the problem is with boost thread behavior or interface (well, lack of it) to prevent the assertion, which would only be possible by checking if a thread terminated already, and checking for this in a reliable manner. Boost threads does not provide that facility. Thus, an assertion is incorrect at that point in the boost thread join code.

If a thread is joinable then pthread_join must succeed whether it is called before or after the thread exits. Similarly for boost::thread::join. Since you apparently don't understand the semantics of joining, perhaps you have in fact made an error. Why don't you post an example that shows the behaviour you're reporting? Ben.

Thorsten Froehlich

10:52 p.m.

Ben Hutchings wrote:

...

Since you apparently don't understand the semantics of joining, perhaps you have in fact made an error.

I can assure you my co-workers and I perfectly understand joining.

...

Why don't you post an example that shows the behaviour you're reporting?

As others have already understood the problem, there does not seem to be a need to do so. Thorsten

Kevin Wheatley

22 Mar 22 Mar

9:28 a.m.

Thorsten Froehlich wrote:

...

...
Why don't you post an example that shows the behaviour you're reporting?

As others have already understood the problem, there does not seem to be a need to do so.

something like: #include <boost/thread/thread.hpp> #include <iostream> using namespace std; using namespace boost; void foo(void) { // So it doesn't really matter what goes on here but // something causes the routine to exit... int i = 10000; while (true) { // in my case this 'signal' is a special terminate job // as we all know thread cancelation is 'hard' if (--i == 0) break; } // Once we leave here we have no idea when the join call will occur } int main(int argc, char** argv) { thread fooThread(&foo); // Now do something to cause a delay here... // ignore the fact that input/output may not be 'thread safe' as this is not the point!! std::cout << "press the 'any' key..." << std:endl; char dummy; std::cin >> dummy; // assert may fail during join as the foo thread may have exited, but how do we know this? fooThread.join(); return 0; } So I could have made the thread detatched, but then consider the case when using thread_group, you've got multiple threads doing work and you want them to stop for some reason, how do you do that safely? My atificial delays above actually occur in operating systems Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Vladimir Prus

11:23 a.m.

Kevin Wheatley wrote:

...

Thorsten Froehlich wrote:

...
...
Why don't you post an example that shows the behaviour you're reporting?

As others have already understood the problem, there does not seem to be a need to do so.

something like:

#include <boost/thread/thread.hpp> #include <iostream>

.........

...

int main(int argc, char** argv) { thread fooThread(&foo);

I can't get your program to fail.

...

// Now do something to cause a delay here...

// ignore the fact that input/output may not be 'thread safe' as this is not the point!! std::cout << "press the 'any' key..." << std:endl;

And the "std:endl" makes me wonder how did you get this program to compile ;-) Can you please post a program that: 1. Compilers 2. Fails (at least sometimes) 3. Specify version of Boost, compiler, OS version. That's the only chance for the problem to be fixed. If I can get it fail on my Linux box, I can look into it. - Volodya

Peter Dimov

1:03 p.m.

Vladimir Prus wrote:

...

That's the only chance for the problem to be fixed. If I can get it fail on my Linux box, I can look into it.

Isn't this thread about an OS X-specific problem? How can you reproduce that on Linux?

Vladimir Prus

1:16 p.m.

New subject: boost::thread::join causes assertion (pthreads)

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
That's the only chance for the problem to be fixed. If I can get it fail on my Linux box, I can look into it.

Isn't this thread about an OS X-specific problem?

It started this way. However, I'm not sure what system Kevin has in mind OS X in mind, or that the problem he has is indeed OS X specific.

...

How can you reproduce that on Linux?

If Kevin's problem is OS-X specific, I cannot. But somebody else might reproduce it, once the example is compilable. - Volodya

Kevin Wheatley

2:14 p.m.

New subject: boost::thread::join causes assertion(pthreads)

Vladimir Prus wrote:

...

Peter Dimov wrote:

...
Vladimir Prus wrote:

...
That's the only chance for the problem to be fixed. If I can get it fail on my Linux box, I can look into it.

Isn't this thread about an OS X-specific problem?

It started this way. However, I'm not sure what system Kevin has in mind OS X in mind, or that the problem he has is indeed OS X specific.

There appears to be some confusion here.. so I'll try help. My issue is that that in general thread::join() has asserts in it that appear to be triggering (in my case I get them mostly on Windows via thread_group::join_all) that are hiding/revealing the fact that the return code from the various platforms supported by Boost.Threads are being discarded (hopefullt because it prevents the errors from occuring in theory) Thorsten appears to have similar concerns on Mac OS X (pthreads) where the pthread_join() returns EINVAL, now I don't know about Mac OS X but I belive that indicates the following: from IRIX (similar wording under Linux): [EINVAL] The thread identified by thread is not joinable (it is detached). or perhaps from the Open Group: [EINVAL] The implementation has detected that the value specified by thread does not refer to a joinable thread. On inspection it does not appear that you can create a detached thread using Boost.Threads on pthreads, so from what Thorsten then says: <quote> That when I try to join a thread that is no longer running, which cannot be reliably determined by the given thread class interface (well, it cannot at all in fact), then getting an assertion upon try to join that thread is a bug. </quote> and my reading of pthreads, this suggests join() being called on a thread more than once or a bug in Mac OS X (Ben Hutchings messages suggest the same conclusion) On Windows I'm getting the: res = WaitForSingleObject(reinterpret_cast<HANDLE>(m_thread), INFINITE); assert(res == WAIT_OBJECT_0); res = CloseHandle(reinterpret_cast<HANDLE>(m_thread)); assert(res); res == WAIT_OBJECT_0 assert failing, but I have no idea why because I'm not told it failed due to the void thread::join(). I guess we both have similar issues but not necessarily for the same reason. Interestingly, the threads return code from thread_proxy() (passed to pthread_create) is always 0, and is discarded in the pthread_join() so we are only talking about the thread platforms own return codes. As far as my bit of code goes I should have been clearer, I thought it might help with the problem Thorsten had so you can forget it as far as my problem is concerned... Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Vladimir Prus

3:09 p.m.

New subject: boost::thread::join causes assertion(pthreads)

Kevin Wheatley wrote:

...

On Windows I'm getting the:

res = WaitForSingleObject(reinterpret_cast<HANDLE>(m_thread), INFINITE); assert(res == WAIT_OBJECT_0); res = CloseHandle(reinterpret_cast<HANDLE>(m_thread)); assert(res);

res == WAIT_OBJECT_0 assert failing, but I have no idea why because I'm not told it failed due to the void thread::join().

I don't yet know if thread::join should return something or throw an exception, but reporting the error in a better way is desirable. On Linux, assert can be replaced with "perror/abort" combination, and on windows something like that should be also possible ("FormatMessage"?). Anybody objects if I make the change for Linux? - Volodya

Peter Dimov

3:35 p.m.

New subject: boost::thread::join causes assertion(pthreads)

Vladimir Prus wrote:

...

Kevin Wheatley wrote:

...
On Windows I'm getting the:

res = WaitForSingleObject(reinterpret_cast<HANDLE>(m_thread), INFINITE); assert(res == WAIT_OBJECT_0); res = CloseHandle(reinterpret_cast<HANDLE>(m_thread)); assert(res);

res == WAIT_OBJECT_0 assert failing, but I have no idea why because I'm not told it failed due to the void thread::join().

I don't yet know if thread::join should return something or throw an exception, but reporting the error in a better way is desirable. On Linux, assert can be replaced with "perror/abort" combination, and on windows something like that should be also possible ("FormatMessage"?).

Anybody objects if I make the change for Linux?

Yes. assert is not equivalent to perror/abort. The two have different meanings and are not interchangeable.

Vladimir Prus

3:54 p.m.

New subject: boost::thread::join causes assertion(pthreads)

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
Kevin Wheatley wrote:

...
On Windows I'm getting the:

res = WaitForSingleObject(reinterpret_cast<HANDLE>(m_thread), INFINITE); assert(res == WAIT_OBJECT_0); res = CloseHandle(reinterpret_cast<HANDLE>(m_thread)); assert(res);

res == WAIT_OBJECT_0 assert failing, but I have no idea why because I'm not told it failed due to the void thread::join().

I don't yet know if thread::join should return something or throw an exception, but reporting the error in a better way is desirable. On Linux, assert can be replaced with "perror/abort" combination, and on windows something like that should be also possible ("FormatMessage"?).

Anybody objects if I make the change for Linux?

Yes. assert is not equivalent to perror/abort. The two have different meanings and are not interchangeable.

Care to elaborate why the change cannot be made in this specific case? As far as I'm concerned, there's no practical difference. The file and line number can be added to 'perror' argument, and gdb will stop just fine both on abort and assert. - Volodya

Peter Dimov

4:45 p.m.

New subject: boost::thread::join causes assertion(pthreads)

Vladimir Prus wrote:

...

...
Yes. assert is not equivalent to perror/abort. The two have different meanings and are not interchangeable.

Care to elaborate why the change cannot be made in this specific case?

assert(expr) means a precondition failure or an internal error, depending on expr. perror/abort means... perror/abort. If the problem you are trying to fix is that assert() does not give enough information about the bug - and I haven't seen this particular problem reported as a defect - the proper way to fix it is to define an user-overrideable callback.

Vladimir Prus

23 Mar 23 Mar

7:01 a.m.

New subject: boost::thread::join causes assertion(pthreads)

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
...
Yes. assert is not equivalent to perror/abort. The two have different meanings and are not interchangeable.

Care to elaborate why the change cannot be made in this specific case?

assert(expr) means a precondition failure or an internal error, depending on expr.

perror/abort means... perror/abort.

Ah.. you're talking about the meaning attached to the code by the programmer. I was more talking that the effect will be almost the same. Well, it's also possible to write if (!result) { perror(...); assert(...); }

...

If the problem you are trying to fix is that assert() does not give enough information about the bug - and I haven't seen this particular problem reported as a defect

I interpreted Kevin's post this way -- Boost.Tread fails and does not produce any useful diagnostic.

...

- the proper way to fix it is to define an user-overrideable callback.

I think you're overgeneralising. If assert prints an error message and exits, then what's the problem with printing one extra diagnostic message before assert? - Volodya

Peter Dimov

11:17 a.m.

New subject: boost::thread::join causesassertion(pthreads)

Vladimir Prus wrote:

...

Peter Dimov wrote:

...
If the problem you are trying to fix is that assert() does not give enough information about the bug - and I haven't seen this particular problem reported as a defect

I interpreted Kevin's post this way -- Boost.Tread fails and does not produce any useful diagnostic.

The proper way to fix that is to make Boost.Threads not fail, right? That's what invariant/"assertion" asserts are for - to catch bugs. They aren't an error reporting mechanism.

...

...
- the proper way to fix it is to define an user-overrideable callback.

I think you're overgeneralising. If assert prints an error message and exits, then what's the problem with printing one extra diagnostic message before assert?

None (if you are sure that this is what assert does, of course - this is platform specific), except that the asserts should not fire. So spending time and effort to make something happen when the asserts fire is not productive.

Kevin Wheatley

12:22 p.m.

New subject: boost::thread::joincausesassertion(pthreads)

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
I interpreted Kevin's post this way -- Boost.Tread fails and does not produce any useful diagnostic.

The proper way to fix that is to make Boost.Threads not fail, right? That's what invariant/"assertion" asserts are for - to catch bugs. They aren't an error reporting mechanism.

So, what I don't know is 100% if the bug is in Boost.Threads, OS, Compiler, my application, etc, etc. What I do know is that some of my problem stems from the fact that Boost.Threads is not just a header only library, it has a library aspect to it. If everything was perfect in threads then the assertion does not help me debug my application code (assuming it is in there that the problem lies) as it discards the platforms' reasons for the error, it only states that some pre-condition assumed by the code is broken. If threads was header only then I could come up with a macro controlled mechanism for instance to enable extra output in my debugging build, this would be different to the Debug and Release build, but wouldn't require any tinkering with anything other than the files in question. For this particular application I'd like 3 states: release - no asserts, no debugging code, no nothing. debug - asserts, no debugging code. debugging - asserts, extra debugging code (Some people like to leave assertions enabled in release code, some don't) currently the built libraries only cover the first 2 cases. Getting consensus on how to deal with this in general is probably not straight forward, but maybe threads being what they are (hard to program, easy to get wrong, not always repeatable) threads could have a special case need. Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Martin Pasdzierny

3:35 p.m.

New subject: boost::thread::joincausesassertion(pthreads)

Hello Thread experts (hi Peter ;-), I know that keeping concepts well ordered instead of headless hacking is an important virtue. But I think an "error reporting mechanism" is nothing else but ONE tool to "catch bugs" ? In my case the boost::thread-asserts occured about 2-7 days after starting the program with ca. 50 threads. They occured only in productive environments with real loads. I spent some days to write separate programs reproducing the behaviour with minimized code: no success. ANY hint would have been helpful as a starting point to find out the bug causing the asserts. In this situations theoretic discussions are not really welcome ... One certain bug caused somebody to start this discussion. Finally it will be not of interest wether it was a bug in his code or inside boost::thread or inside the OS code. I think it would be a good idea to allow the application programmers to choose the error/assertion policy. Thanks in advance. Martin. Kevin Wheatley wrote:

...

Peter Dimov wrote:

...
Vladimir Prus wrote:

...
I interpreted Kevin's post this way -- Boost.Tread fails and does not produce any useful diagnostic.

The proper way to fix that is to make Boost.Threads not fail, right? That's what invariant/"assertion" asserts are for - to catch bugs. They aren't an error reporting mechanism.

So,

what I don't know is 100% if the bug is in Boost.Threads, OS, Compiler, my application, etc, etc.

What I do know is that some of my problem stems from the fact that Boost.Threads is not just a header only library, it has a library aspect to it. If everything was perfect in threads then the assertion does not help me debug my application code (assuming it is in there that the problem lies) as it discards the platforms' reasons for the error, it only states that some pre-condition assumed by the code is broken.

If threads was header only then I could come up with a macro controlled mechanism for instance to enable extra output in my debugging build, this would be different to the Debug and Release build, but wouldn't require any tinkering with anything other than the files in question.

For this particular application I'd like 3 states:

release - no asserts, no debugging code, no nothing. debug - asserts, no debugging code. debugging - asserts, extra debugging code

(Some people like to leave assertions enabled in release code, some don't)

currently the built libraries only cover the first 2 cases. Getting consensus on how to deal with this in general is probably not straight forward, but maybe threads being what they are (hard to program, easy to get wrong, not always repeatable) threads could have a special case need.

Kevin

-- Martin Pasdzierny, <martin@pasdzierny.de>

Vladimir Prus

12:47 p.m.

New subject: boost::thread::join causesassertion(pthreads)

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
Peter Dimov wrote:

...
If the problem you are trying to fix is that assert() does not give enough information about the bug - and I haven't seen this particular problem reported as a defect

I interpreted Kevin's post this way -- Boost.Tread fails and does not produce any useful diagnostic.

The proper way to fix that is to make Boost.Threads not fail, right? That's what invariant/"assertion" asserts are for - to catch bugs. They aren't an error reporting mechanism.

They are not. Specifically, "assert" is not a error reporting mechanism. But still it prints the file name and line number, which helps to debug the bugs that assert catches.

...

...
...
- the proper way to fix it is to define an user-overrideable callback.

I think you're overgeneralising. If assert prints an error message and exits, then what's the problem with printing one extra diagnostic message before assert?

None (if you are sure that this is what assert does, of course - this is platform specific), except that the asserts should not fire. So spending time and effort to make something happen when the asserts fire is not productive.

I disagree. Here, two people report that they got assertion failures. Good, but we have no idea why. If there were additional diagnostic, we'd knew the exact system error which was reported. - Volodya

Peter Dimov

1:55 p.m.

New subject: boost::thread::joincausesassertion(pthreads)

Vladimir Prus wrote:

...

Peter Dimov wrote:

...
None (if you are sure that this is what assert does, of course - this is platform specific), except that the asserts should not fire. So spending time and effort to make something happen when the asserts fire is not productive.

I disagree. Here, two people report that they got assertion failures. Good, but we have no idea why. If there were additional diagnostic, we'd knew the exact system error which was reported.

Hm. Thorsten Froehlich reports that on Mac OS X pthread_join on a thread that has completed returns EINVAL. He is given a lecture on pthread_join. Nobody with Mac OS X access bothers to investigate, at least not visibly. Kevin Wheatley reports that the assert( res == WAIT_OBJECT_0 ) is failing for him on Windows. Printing res (WAIT_FAILED) or GetLastError() ("invalid handle" would be my bet) won't give us much information as to whether the assert is failing due to double thread::join, thread::join on a default-constructed thread, or an internal error in Boost.Threads. Your response is to add perror on Linux. :-) However I do see your point. In these situations I usually replace assert( r == 0 ); with assert( r != EINVAL ); assert( r != ESRCH ); assert( r != EDEADLK ); assert( r == 0 ); I don't claim that it's better than perror/abort when this is what assert does, print a message to stderr and call abort, but adding platform-specific error reporting sections to every assert would probably be a slippery slope.

Kevin Wheatley

3:01 p.m.

New subject: boost::thread::joincausesassertion(pthreads)

Peter Dimov wrote:

...

Kevin Wheatley reports that the assert( res == WAIT_OBJECT_0 ) is failing for him on Windows. Printing res (WAIT_FAILED) or GetLastError() ("invalid handle" would be my bet) won't give us much information as to whether the assert is failing due to double thread::join, thread::join on a default-constructed thread, or an internal error in Boost.Threads.

To me the most obvious thing missing is something like: void thread::join() { assert(m_joinable); int res = 0; rest of function.... at the start of the function, that should catch some of the programming errors across all platforms and at least says that the assumption is that join is only valid on joinable threads (those non-default constructed, not already joined threads). At least I think we could all agree on that :-) but then does this mean there should be a bool thread::isJoinable() const { return m_joinable; } function too? this may lead to lots of "if (thrd.isJoinable()) thrd.join();" which I'm personally not too happy of else where in threads there are somethings which suggest code more like: void thread::join() { assert(m_joinable); if (!m_joinable) return; int res = 0; rest of function.... for instance this is effectively what goes on in thread_group::remove_thread() withthe if clause after the assert, in that function there is a comment which even indicates that it is potentially unclear what the appropriate level of error this should be. Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Ben Hutchings

6:39 p.m.

Kevin Wheatley wrote:

...

Peter Dimov wrote:

...
Kevin Wheatley reports that the assert( res == WAIT_OBJECT_0 ) is failing for him on Windows. Printing res (WAIT_FAILED) or GetLastError() ("invalid handle" would be my bet) won't give us much information as to whether the assert is failing due to double thread::join, thread::join on a default-constructed thread, or an internal error in Boost.Threads.

To me the most obvious thing missing is something like:

void thread::join() { assert(m_joinable); int res = 0;

rest of function....

Yes, that could make the cause of errors clearer and would do a much better job of detecting them than the assertion about the result of the system return value. The ID or handle value in a non-joinable thread object could be reused by the time of the erroneous join, so that the error would not be detected by the system join function. <snip>

...

but then does this mean there should be a

bool thread::isJoinable() const { return m_joinable; }

function too? <snip>

This seems somewhat reasonable. However, currently there are two significantly different reasons for thread objects to be non-joinable - either the thread is default-constructed and is just an ID holder or the thread has already been joined. These should be distinguishable. I'm convinced that there should be entirely separate types for thread IDs and thread starter/joiner objects and don't understand why they were given the same type. Ben.

Peter Dimov

6:49 p.m.

Ben Hutchings wrote:

...

This seems somewhat reasonable. However, currently there are two significantly different reasons for thread objects to be non-joinable - either the thread is default-constructed and is just an ID holder or the thread has already been joined. These should be distinguishable. I'm convinced that there should be entirely separate types for thread IDs and thread starter/joiner objects and don't understand why they were given the same type.

For various historical reasons, most of them no longer relevant. :-) FWIW, my position on the issue has always been that all (Boost) threads should be joinable.

Caleb Epstein

7:33 p.m.

On Wed, 23 Mar 2005 20:49:02 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...

Ben Hutchings wrote:

...
This seems somewhat reasonable. However, currently there are two significantly different reasons for thread objects to be non-joinable - either the thread is default-constructed and is just an ID holder or the thread has already been joined. These should be distinguishable. I'm convinced that there should be entirely separate types for thread IDs and thread starter/joiner objects and don't understand why they were given the same type.

For various historical reasons, most of them no longer relevant. :-)

FWIW, my position on the issue has always been that all (Boost) threads should be joinable.

Even if the underlying implementation thread is not (e.g. it has been pthread_detach'd or the equivalent)? Or are you suggesting that a join on a non-implementation-joinable thread should just succeed silently? -- Caleb Epstein caleb dot epstein at gmail dot com

Peter Dimov

11:29 p.m.

Caleb Epstein wrote:

...

On Wed, 23 Mar 2005 20:49:02 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...
Ben Hutchings wrote:

...
This seems somewhat reasonable. However, currently there are two significantly different reasons for thread objects to be non-joinable - either the thread is default-constructed and is just an ID holder or the thread has already been joined. These should be distinguishable. I'm convinced that there should be entirely separate types for thread IDs and thread starter/joiner objects and don't understand why they were given the same type.

For various historical reasons, most of them no longer relevant. :-)

FWIW, my position on the issue has always been that all (Boost) threads should be joinable.

Even if the underlying implementation thread is not (e.g. it has been pthread_detach'd or the equivalent)? Or are you suggesting that a join on a non-implementation-joinable thread should just succeed silently?

Succeed silently, yes. That's what join means: wait for the thread to finish. If it has already finished, join should just return immediately. Not only that, it should be safe to call it from multiple threads at the same time, which should result in exactly one call to pthread_join. pthread_join is join+destructor rolled into one, but we have "real" destructors in C++, so we don't need to emulate its behavior so strictly.

Caleb Epstein

24 Mar 24 Mar

2:25 a.m.

On Thu, 24 Mar 2005 01:29:57 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...

Caleb Epstein wrote:

...
Even if the underlying implementation thread is not (e.g. it has been pthread_detach'd or the equivalent)? Or are you suggesting that a join on a non-implementation-joinable thread should just succeed silently?

Succeed silently, yes. That's what join means: wait for the thread to finish. If it has already finished, join should just return immediately. Not only that, it should be safe to call it from multiple threads at the same time, which should result in exactly one call to pthread_join.

But what of the case of a detached thread that is still running? These are not joinable, and having join succeed for them would be misleading to the user, no? Or does a user who tries to join a detached thread get what s/he deserves?

...

pthread_join is join+destructor rolled into one, but we have "real" destructors in C++, so we don't need to emulate its behavior so strictly.

Yes, but I'd expect that if I successfully join a thread, that the thread would have well and truly stopped executing. I don't think there is any way to do that w/a detached pthread. -- Caleb Epstein caleb dot epstein at gmail dot com

Peter Dimov

1:30 p.m.

Caleb Epstein wrote:

...

On Thu, 24 Mar 2005 01:29:57 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...
Caleb Epstein wrote:

...
Even if the underlying implementation thread is not (e.g. it has been pthread_detach'd or the equivalent)? Or are you suggesting that a join on a non-implementation-joinable thread should just succeed silently?

Succeed silently, yes. That's what join means: wait for the thread to finish. If it has already finished, join should just return immediately. Not only that, it should be safe to call it from multiple threads at the same time, which should result in exactly one call to pthread_join.

But what of the case of a detached thread that is still running?

A Boost thread is never detached. This is a POSIX-only thing to compensate lack of destructors in C. You either have a way to call join (in which case no pthread_detach should have been called), or you don't (in which case it doesn't matter whether pthread_detach has been called or not).

Caleb Epstein

2:57 p.m.

On Thu, 24 Mar 2005 15:30:32 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...

A Boost thread is never detached. This is a POSIX-only thing to compensate lack of destructors in C. You either have a way to call join (in which case no pthread_detach should have been called), or you don't (in which case it doesn't matter whether pthread_detach has been called or not).

My mistake. For some reason I thought it was possible to create non-joinable threads. -- Caleb Epstein caleb dot epstein at gmail dot com

Ben Hutchings

5:03 p.m.

Caleb Epstein wrote:

...

On Thu, 24 Mar 2005 15:30:32 +0200, Peter Dimov <pdimov@mmltd.net> wrote:

...
A Boost thread is never detached. This is a POSIX-only thing to compensate lack of destructors in C. You either have a way to call join (in which case no pthread_detach should have been called), or you don't (in which case it doesn't matter whether pthread_detach has been called or not).

My mistake. For some reason I thought it was possible to create non-joinable threads.

Just in case it's not clear: it *is* possible to create detached threads. However they cannot have boost::thread objects associated with them (except perhaps as ID holders) because detachment is achieved by destroying the associated boost::thread object. Ben.

Michael Glassford

29 Mar 29 Mar

4:25 p.m.

Peter Dimov wrote:

...

Ben Hutchings wrote:

...
This seems somewhat reasonable. However, currently there are two significantly different reasons for thread objects to be non-joinable - either the thread is default-constructed and is just an ID holder or the thread has already been joined. These should be distinguishable. I'm convinced that there should be entirely separate types for thread IDs and thread starter/joiner objects and don't understand why they were given the same type.

For various historical reasons, most of them no longer relevant. :-)

If you remember, could you detail what these reasons were and why they're no longer relevant? I'm starting to think about what a Boost.Threads rewrite would look like in the event that I get more time (more about that in another post). Of course I want do deal with open issues as well.

...

FWIW, my position on the issue has always been that all (Boost) threads should be joinable.

Thanks, Mike

Kevin Wheatley

24 Mar 24 Mar

9:45 a.m.

Ben Hutchings wrote:

...

Kevin Wheatley wrote:

...
To me the most obvious thing missing is something like:

void thread::join() { assert(m_joinable); int res = 0;

rest of function....

Yes, that could make the cause of errors clearer and would do a much better job of detecting them than the assertion about the result of the system return value. The ID or handle value in a non-joinable thread object could be reused by the time of the erroneous join, so that the error would not be detected by the system join function.

as a note, if this were to happen, it would suggest that a mutex is needed for the duration of the thread::join() function to avoid what then becomes a race condition relating to the m_joinable boolean under the condition that multiple joins() are called by different threads at the same time. This of course depends on the outcome of the question: is it an error to do so? In all the programs I have written I've had a single thread have the responsibility of cleaning up any threads it creates, so for me the lock is fine and it is and error to call join multiple times. What do others think? On the subject of detatched threads, you have to create these youreself outside of the Boost.Threads code as far as I can see. In which case you are on your own anyway. is there a way to make detached threads portable ? I don't know enough about all the platforms to know if it is even possible to do so natively! Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

Ben Hutchings

5:39 p.m.

Kevin Wheatley wrote:

...

Ben Hutchings wrote:

...
Kevin Wheatley wrote:

...
To me the most obvious thing missing is something like:

void thread::join() { assert(m_joinable); int res = 0;

rest of function....

Yes, that could make the cause of errors clearer and would do a much better job of detecting them than the assertion about the result of the system return value. The ID or handle value in a non-joinable thread object could be reused by the time of the erroneous join, so that the error would not be detected by the system join function.

as a note, if this were to happen, it would suggest that a mutex is needed for the duration of the thread::join() function to avoid what then becomes a race condition relating to the m_joinable boolean under the condition that multiple joins() are called by different threads at the same time.

Only one thread is expected to "own" another, so boost::thread does not provide thread-safety for its own instances. It already has member functions other than the constructor that modify state and are not thread-safe. If a client needs shared ownership then it should be easy enough to provide this in a wrapper, whereas if boost::thread was itself thread-safe it would add unavoidable overhead that few clients need.

...

On the subject of detatched threads, you have to create these youreself outside of the Boost.Threads code as far as I can see. <snip>

No, you just have to destroy the boost::thread without joining. Ben.

Michael Glassford

29 Mar 29 Mar

4:28 p.m.

New subject: boost::thread::joincausesassertion(pthreads)

Kevin Wheatley wrote: [snip]

...

To me the most obvious thing missing is something like:

void thread::join() { assert(m_joinable); int res = 0;

rest of function....

Done. Mike

Ben Hutchings

23 Mar 23 Mar

5:50 p.m.

Peter Dimov wrote:

...

Vladimir Prus wrote:

...
Peter Dimov wrote:

...
None (if you are sure that this is what assert does, of course - this is platform specific), except that the asserts should not fire. So spending time and effort to make something happen when the asserts fire is not productive.

I disagree. Here, two people report that they got assertion failures. Good, but we have no idea why. If there were additional diagnostic, we'd knew the exact system error which was reported.

Hm.

Thorsten Froehlich reports that on Mac OS X pthread_join on a thread that has completed returns EINVAL. He is given a lecture on pthread_join. Nobody with Mac OS X access bothers to investigate, at least not visibly. <snip>

Well, if anyone would care to compile and run the following program on Mac OS X using pthreads, we can at least tell whether there is a consistent failure: // BEGIN #include <boost/bind.hpp> #include <boost/thread/thread.hpp> #include <boost/thread/xtime.hpp> #include <iostream> #include <ostream> void my_sleep(int sec) { boost::xtime stop; boost::xtime_get(&stop, boost::TIME_UTC); stop.sec += sec; boost::thread::sleep(stop); } void do_nothing() {} int main() { boost::thread thread1(do_nothing); my_sleep(1); std::cout << "Joining exited thread\n"; thread1.join(); boost::thread thread2(boost::bind(my_sleep, 5)); std::cout << "Joining running thread\n"; thread2.join(); } // END It works for me on Windows XP (compiled with VC++ 7.1) and Linux 2.6 (compiled with g++ 3.3). Ben.

Matthew Peltzer

24 Mar 24 Mar

5:12 a.m.

...

Well, if anyone would care to compile and run the following program on Mac OS X using pthreads, we can at least tell whether there is a consistent failure:

...

g++ -v Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs Thread model: posix gcc version 3.3 20030304 (Apple Computer, Inc. build 1671) g++ -I/sw/include/boost-1_31 -L/sw/lib -lboost_thread-1_31 -o thread thread.cpp In file included from /sw/include/boost-1_31/boost/type_traits/is_arithmetic.hpp:13, from /sw/include/boost-1_31/boost/type_traits/arithmetic_traits.hpp:14, from /sw/include/boost-1_31/boost/function/function_base.hpp:19, from /sw/include/boost-1_31/boost/function/detail/prologue.hpp:16, from /sw/include/boost-1_31/boost/function.hpp:22, from /sw/include/boost-1_31/boost/thread/thread.hpp:20, from thread.cpp:2: /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: use of `long double' type; its size may change in a future release /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: (Long double usage is reported only once for each file. /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: To disable

Here's what it looks it looks like with boost1_31 from fink on OS X 10.3.8: this warning, use -Wno-long-double.)

...

./thread Joining exited thread Joining running thread

...not sure if i'm "using pthreads" explicitly, but it all looks good to me. -- Matthew Peltzer -- goochrules@gmail.com

Ben Hutchings

5:53 p.m.

Matthew Peltzer wrote:

...

...
Well, if anyone would care to compile and run the following program on Mac OS X using pthreads, we can at least tell whether there is a consistent failure:

Here's what it looks it looks like with boost1_31 from fink on OS X 10.3.8:

...
g++ -v

Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs Thread model: posix gcc version 3.3 20030304 (Apple Computer, Inc. build 1671)

...
g++ -I/sw/include/boost-1_31 -L/sw/lib -lboost_thread-1_31 -o thread thread.cpp

It's not clear to me from this whether you linked to a debug version of the thread library. In a non-debug version the assertions are disabled, so that wouldn't provide a useful test.

...

In file included from /sw/include/boost-1_31/boost/type_traits/is_arithmetic.hpp:13, from /sw/include/boost-1_31/boost/type_traits/arithmetic_traits.hpp:14, from /sw/include/boost-1_31/boost/function/function_base.hpp:19, from /sw/include/boost-1_31/boost/function/detail/prologue.hpp:16, from /sw/include/boost-1_31/boost/function.hpp:22, from /sw/include/boost-1_31/boost/thread/thread.hpp:20, from thread.cpp:2: /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: use of `long double' type; its size may change in a future release /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: (Long double usage is reported only once for each file. /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: To disable this warning, use -Wno-long-double.)

...
./thread

Joining exited thread Joining running thread

...not sure if i'm "using pthreads" explicitly, but it all looks good to me.

"Thread model: posix" seems to imply that g++ will generate code suitable for pthreads, though most ports require a command-line option to ensure that it generates thread-safe code and uses thread-safe libraries. Perhaps the Darwin port always operates in thread-safe mode. Ben.

Matthew Peltzer

25 Mar 25 Mar

12:41 a.m.

On Thu, 24 Mar 2005 17:53:49 +0000, Ben Hutchings <ben.hutchings@businesswebsoftware.com> wrote:

...

Matthew Peltzer wrote:

...
...
Well, if anyone would care to compile and run the following program on Mac OS X using pthreads, we can at least tell whether there is a consistent failure:

Here's what it looks it looks like with boost1_31 from fink on OS X 10.3.8:

...
g++ -v

Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs Thread model: posix gcc version 3.3 20030304 (Apple Computer, Inc. build 1671)

...
g++ -I/sw/include/boost-1_31 -L/sw/lib -lboost_thread-1_31 -o thread thread.cpp

It's not clear to me from this whether you linked to a debug version of the thread library. In a non-debug version the assertions are disabled, so that wouldn't provide a useful test.

...

g++ -Wall -O0 -DDEBUG -I/sw/include/boost-1_31 -L/sw/lib -lboost_thread-d-1_31 -o thread thread.cpp In file included from /sw/include/boost-1_31/boost/type_traits/is_arithmetic.hpp:13, from /sw/include/boost-1_31/boost/type_traits/arithmetic_traits.hpp:14, from /sw/include/boost-1_31/boost/function/function_base.hpp:19, from /sw/include/boost-1_31/boost/function/detail/prologue.hpp:16, from /sw/include/boost-1_31/boost/function.hpp:22, from /sw/include/boost-1_31/boost/thread/thread.hpp:20, from thread.cpp:2: /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: use of `long double' type; its size may change in a future release /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: (Long double usage is reported only once for each file. /sw/include/boost-1_31/boost/type_traits/is_float.hpp:21: warning: To disable

one more time then: this warning, use -Wno-long-double.)

...

./thread Joining exited thread Joining running thread

...

"Thread model: posix" seems to imply that g++ will generate code suitable for pthreads, though most ports require a command-line option to ensure that it generates thread-safe code and uses thread-safe libraries. Perhaps the Darwin port always operates in thread-safe mode.

I don't have any answers here. -- Matthew Peltzer -- goochrules@gmail.com

Ben Hutchings

22 Mar 22 Mar

2:54 p.m.

Thorsten Froehlich wrote:

...

Ben Hutchings wrote:

...
Since you apparently don't understand the semantics of joining, perhaps you have in fact made an error.

I can assure you my co-workers and I perfectly understand joining.

So you understand that it is possible to join a (joinable) thread after it has exited, and pthread_join will not fail in this case.

...

...
Why don't you post an example that shows the behaviour you're reporting?

As others have already understood the problem, there does not seem to be a need to do so.

Removing the assertions merely hides whatever the problem is. Please post some code so that the problem can be properly investigated and fixed. Ben.

Peter Dimov

3:34 p.m.

Ben Hutchings wrote:

...

Thorsten Froehlich wrote:

...
Ben Hutchings wrote:

...
Since you apparently don't understand the semantics of joining, perhaps you have in fact made an error.

I can assure you my co-workers and I perfectly understand joining.

So you understand that it is possible to join a (joinable) thread after it has exited, and pthread_join will not fail in this case.

Thorsten's point is that it fails on Mac OS X.

Ben Hutchings

5:02 p.m.

Peter Dimov wrote:

...

Ben Hutchings wrote:

...
Thorsten Froehlich wrote:

...
Ben Hutchings wrote:

...
Since you apparently don't understand the semantics of joining, perhaps you have in fact made an error.

I can assure you my co-workers and I perfectly understand joining.

So you understand that it is possible to join a (joinable) thread after it has exited, and pthread_join will not fail in this case.

Thorsten's point is that it fails on Mac OS X.

Well, it does in his program, which none of has have seen. Ben.

7437

Age (days ago)

7445

Last active (days ago)

List overview

Download

46 comments

10 participants

participants (10)

Ben Hutchings
Caleb Epstein
David E. Konerding DSD staff
Kevin Wheatley
Martin Pasdzierny
Matthew Peltzer
Michael Glassford
Peter Dimov
Thorsten Froehlich
Vladimir Prus