[thread] patch for tss.cpp (win32 + static linking support)

Hi, I believe I've found a bug in tss.cpp, for details please ref http://article.gmane.org/gmane.comp.lib.boost.user/26591 . The immediate fix is straightforward, it just involves resetting the native tss value after deleting the pointed-to data at thread exit. Patch is attached to this message. The changes only affects the Win32 platform and I've run the supplied tests under libs/thread/test to verify that it doesn't break anything obvious. As the bug involves dereferencing a dangling pointer and isn't detected most of the time, I'd really appreciate if the patch could be accepted as soon as possible. I'm awaiting Roland Schwartz to "bless" the patch. If he agrees, would it be possible to accept this for 1.34? / Johan

Johan Nilsson wrote:
Hi,
I believe I've found a bug in tss.cpp, for details please ref http://article.gmane.org/gmane.comp.lib.boost.user/26591 .
The immediate fix is straightforward, it just involves resetting the native tss value after deleting the pointed-to data at thread exit. Patch is attached to this message. The changes only affects the Win32 platform and I've run the supplied tests under libs/thread/test to verify that it doesn't break anything obvious.
As the bug involves dereferencing a dangling pointer and isn't detected most of the time, I'd really appreciate if the patch could be accepted as soon as possible. I'm awaiting Roland Schwartz to "bless" the patch. If he agrees, would it be possible to accept this for 1.34?
Ping, anyone? / Johan

"Johan Nilsson" <r.johan.nilsson@gmail.com> writes:
I believe I've found a bug in tss.cpp, for details please ref http://article.gmane.org/gmane.comp.lib.boost.user/26591 .
The immediate fix is straightforward, it just involves resetting the native tss value after deleting the pointed-to data at thread exit. Patch is attached to this message. The changes only affects the Win32 platform and I've run the supplied tests under libs/thread/test to verify that it doesn't break anything obvious.
As the bug involves dereferencing a dangling pointer and isn't detected most of the time, I'd really appreciate if the patch could be accepted as soon as possible. I'm awaiting Roland Schwartz to "bless" the patch. If he agrees, would it be possible to accept this for 1.34?
Your patch calls TlsSetValue after calling cleanup_slots. cleanup_slots may call TlsFree, in which case this is not a safe thing to do. A correct fix would require further investigation. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Anthony Williams wrote:
"Johan Nilsson" <r.johan.nilsson@gmail.com> writes:
I believe I've found a bug in tss.cpp, for details please ref http://article.gmane.org/gmane.comp.lib.boost.user/26591 .
The immediate fix is straightforward, it just involves resetting the native tss value after deleting the pointed-to data at thread exit. Patch is attached to this message. The changes only affects the Win32 platform and I've run the supplied tests under libs/thread/test to verify that it doesn't break anything obvious.
As the bug involves dereferencing a dangling pointer and isn't detected most of the time, I'd really appreciate if the patch could be accepted as soon as possible. I'm awaiting Roland Schwartz to "bless" the patch. If he agrees, would it be possible to accept this for 1.34?
Your patch calls TlsSetValue after calling cleanup_slots. cleanup_slots may call TlsFree, in which case this is not a safe thing to do. A correct fix would require further investigation.
[Someone's listening, great] Yes, you are correct. TlsFree will be called by tss_data_dec_use (which is in turn is called by cleanup_slots) when the global reference count reaches 0. - One solution would be to statically initialize tss_data_native_key to TLS_OUT_OF_INDEXES, and reset to that value from tss_data_dec_use in the above case. The code in tss_thread_exit could then check for a valid tls index before calling TlsSetValue. There shouldn't be a race condition checking/manipulating the native tls index as it should exist only one thread using tss data at this point ... or? Am I missing something again? - A second alternative would be to remove the TlsFree call altogether, as the application is shutting down anyway. A bit dirty perhaps. Do you have any suggestions for a better fix? If it would be possible to have tss_thread_exit called at the correct time for the main thread, the problems would probably be nonexistent. Still, explicitly (de-)initializing the globals doesn't sound like a bad idea. / Johan

"Johan Nilsson" <r.johan.nilsson@gmail.com> writes:
Anthony Williams wrote:
"Johan Nilsson" <r.johan.nilsson@gmail.com> writes:
I believe I've found a bug in tss.cpp, for details please ref http://article.gmane.org/gmane.comp.lib.boost.user/26591 .
The immediate fix is straightforward, it just involves resetting the native tss value after deleting the pointed-to data at thread exit. Patch is attached to this message. The changes only affects the Win32 platform and I've run the supplied tests under libs/thread/test to verify that it doesn't break anything obvious.
As the bug involves dereferencing a dangling pointer and isn't detected most of the time, I'd really appreciate if the patch could be accepted as soon as possible. I'm awaiting Roland Schwartz to "bless" the patch. If he agrees, would it be possible to accept this for 1.34?
Your patch calls TlsSetValue after calling cleanup_slots. cleanup_slots may call TlsFree, in which case this is not a safe thing to do. A correct fix would require further investigation.
- One solution would be to statically initialize tss_data_native_key to TLS_OUT_OF_INDEXES, and reset to that value from tss_data_dec_use in the above case. The code in tss_thread_exit could then check for a valid tls index before calling TlsSetValue.
There shouldn't be a race condition checking/manipulating the native tls index as it should exist only one thread using tss data at this point ... or? Am I missing something again?
There isn't a race condition per-se, but there is no logic to reinitialize tss_data_native_key once it has been freed, so that would have to be added.
- A second alternative would be to remove the TlsFree call altogether, as the application is shutting down anyway. A bit dirty perhaps.
Possible, yes --- the code to call TlsFree is relatively new. Dirty, yes. I think the TlsFree call should be moved to on_process_exit, but that's not a trivial change. You'd still need to call TlsSetValue to clear the entry.
Do you have any suggestions for a better fix? If it would be possible to have tss_thread_exit called at the correct time for the main thread, the problems would probably be nonexistent.
Are you using Boost.Thread as a DLL or a LIB file? If you're using it as a lib file, then the code in tss_pe.cpp controls when the exit stuff is called. It is possible to make on_thread_exit be called correctly for the main thread, by calling it on process exit, but the problem here is that at this point in the process shutdown, things like new and delete don't necessarily work correctly, and the runtime has already been shut down --- the TSS code would need to be changed to use HeapAlloc/HeapFree rather than new and delete, and then there's the problem of what the destructors for the TSS data do: they can't necessarily use any library functions either, and that's not enforceable or readily explainable. Having global destructors and atexit functions use TSS is a bad idea; this is just a manifestation of that. Sorry, but I don't think this can be fixed for 1.34.0. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Anthony Williams wrote:
"Johan Nilsson" <r.johan.nilsson@gmail.com> writes:
[snip]
- One solution would be to statically initialize tss_data_native_key to TLS_OUT_OF_INDEXES, and reset to that value from tss_data_dec_use in the above case. The code in tss_thread_exit could then check for a valid tls index before calling TlsSetValue.
There shouldn't be a race condition checking/manipulating the native tls index as it should exist only one thread using tss data at this point ... or? Am I missing something again?
There isn't a race condition per-se, but there is no logic to reinitialize tss_data_native_key once it has been freed, so that would have to be added.
I've already been bitten by that problem in the current implementation, so this wouldn't be a regression. See: http://article.gmane.org/gmane.comp.lib.boost.devel/155056
- A second alternative would be to remove the TlsFree call altogether, as the application is shutting down anyway. A bit dirty perhaps.
Possible, yes --- the code to call TlsFree is relatively new. Dirty, yes. I think the TlsFree call should be moved to on_process_exit, but that's not a trivial change. You'd still need to call TlsSetValue to clear the entry.
Sure.
Do you have any suggestions for a better fix? If it would be possible to have tss_thread_exit called at the correct time for the main thread, the problems would probably be nonexistent.
Are you using Boost.Thread as a DLL or a LIB file?
Static library (hint, see subject line :-).
If you're using it as a lib file, then the code in tss_pe.cpp controls when the exit stuff is called.
Yes, I know. I'll probably do some experimenting with this when and if I get some extra time.
It is possible to make on_thread_exit be called correctly for the main thread, by calling it on process exit, but the problem here is that at this point in the process shutdown, things like new and delete don't necessarily work correctly, and the runtime has already been shut down --- the TSS code would need to be changed to use HeapAlloc/HeapFree rather than new and delete, and then there's the problem of what the destructors for the TSS data do: they can't necessarily use any library functions either, and that's not enforceable or readily explainable.
Agreed. The RTL should always be in a valid state when the TSS data destructors are run, as long as non-pod types can be stored in TSS.
Having global destructors and atexit functions use TSS is a bad idea; this is just a manifestation of that.
Being able to use TSS in this way is nice and transparent from a user's point of view. I assume that you mean implementation-wise?
Sorry, but I don't think this can be fixed for 1.34.0.
No, I've given up this already as 1.34 is already in beta. Perhaps 1.34.1 or 1.35. I can workaround the problem by patching the local boost installation. I'm surprised that no one else seem to have experienced the problem, though. / Johan
participants (3)
-
Anthony Williams
-
Johan Nilsson
-
Roland Schwarz