Windows MSVC thread exit handler for statically linked Boost.Thread

Following is an implementation of, and some comments on, a statically linked Boost.Thread thread exit handler for MSVC. By the way, I apologize for taking so long to cough this up after posting teasing remarks about this months ago. Programming is presently a hobby and volunteer work for me, unfortunately.

First, the handler itself:

---msvc_exit_handler.cpp---
// copyright disclaimed. ABSOLUTELY NO WARRANTY.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

#define BOOST_THREAD_USE_LIB
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>

extern "C" BOOST_THREAD_DECL void on_process_exit(void);
extern "C" BOOST_THREAD_DECL void on_thread_exit(void);

namespace {

// Force a TLS directory to be generated even when static TLS is not used.
extern "C" int _tls_used;
int dummy() { return _tls_used; }

// Report thread and process detach events.
void NTAPI tls_callback (PVOID, DWORD Reason, PVOID)
{
    if(Reason == DLL_THREAD_DETACH)
        on_thread_exit();
    else if(Reason == DLL_PROCESS_DETACH)
    {
        on_thread_exit();
        on_process_exit();
    }
}

// Add callback to the TLS callback list in TLS directory.
// (32-bit only: the callback address is stored as a DWORD.)
#pragma data_seg(push, old_seg)
#pragma data_seg(".CRT$XLB")
DWORD tls_callback_ptr = (DWORD)tls_callback;
#pragma data_seg(pop, old_seg)

} // namespace

// Indicate that TSS cleanup is implemented.
extern "C" void tss_cleanup_implemented(void) { }
---EOF---

And a simple testcase:

---tss.cpp---
/* Uncomment the following to use _beginthread() to create threads instead
 * of CreateThread.
 */
// #define USE_BEGINTHREAD

#include <iomanip>
#include <iostream>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <process.h>

#define BOOST_THREAD_USE_LIB
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>

class myclass
{
public:
    myclass();
    ~myclass();
};

myclass::myclass()
{
    std::cerr << "init: " << GetCurrentThreadId() << '\n';
}

myclass::~myclass()
{
    std::cerr << "fini: " << GetCurrentThreadId() << '\n';
}

boost::thread_specific_ptr<myclass> value;

#ifdef USE_BEGINTHREAD
void thread_proc(void *)
{
    value.reset(new myclass);
}
#else // USE_BEGINTHREAD
DWORD WINAPI thread_proc(LPVOID)
{
    value.reset(new myclass);
    return 0;
}
#endif // USE_BEGINTHREAD

int main(int argc, char *argv[])
{
    value.reset(new myclass);
    for (int i=0; i<2; ++i)
    {
#ifdef USE_BEGINTHREAD
        _beginthread(thread_proc, 0, 0);
#else
        DWORD id;
        CreateThread(0, 0, thread_proc, 0, 0, &id);
#endif
    }
    // Give the two worker threads time to run and exit.
    Sleep(100);
}
---EOF---

I tested this with the Microsoft Visual C++ Toolkit 2003 on Windows XP SP1, as follows:

C:\aaronwl\cs\boost\tls\handler>cl /EHa /MT /I"C:\aaronwl\cs\boost\cvs\boost" tss.cpp msvc_exit_handler.cpp
...
C:\aaronwl\cs\boost\tls\handler>tss
init: 2976
init: 4080
fini: 4080
init: 4040
fini: 4040
fini: 2976

Note that thread termination is correctly reported for the main thread and for both extra threads.

My guess is that this mechanism will work correctly on all versions of i386 Windows since Windows 95 and Windows NT 3.51. However, I have not tested it exhaustively due to lack of access. The PECOFF specification requires this mechanism to work, so I'd expect any compliant x86 PE loader to support it.

It is also my guess that this method of generating a thread termination notification will work on any version of MSVC that generates "32-bit" Windows executables. Only part of this compiler support, as implemented, is mandated by PECOFF.
While apparently unused, completely undocumented, and largely unknown, the TLS callback section support is very similar to the general CRT global initialization and finalization routines. Those are also undocumented, but they most likely can never be removed, as that would break backward object compatibility. I would be surprised if this did not work on a future MSVC version.

This method of catching thread termination shares a very serious flaw with the DllMain() method: the thread's PTD (the CRT's per-thread data) has already been destroyed by the time this code runs. As you can see, everything appears to work fine despite this; however, this is only by chance. In some cases, I think the code will silently do the wrong thing. For example, output might be lost.

There was some suggestion about re-creating the PTD temporarily. This sounds good to me, except that I cannot find a reasonable way to do it, as the PTD manipulation functions are not exported by the MSVCRT runtime DLL. It might be possible to create a PTD manually, but this seems very risky to me.

There was also mention of some sort of runtime library floating point hook. I was unable to figure out anything about this. However, if this is possible, and the PTD is still valid when this hook is called, that method is probably superior.

As far as I can tell, FlsCallback() offers no advantage with regard to the PTD issue. It is also called "too late."

Another problem with this method is that it gets destructor order wrong. For example, in the case of termination of the main thread, global destructors will be called before the TSS destructors are. This also could lead to silent misbehavior. It is likely that a better method for handling termination of the main thread could be found, such as a global destructor.

Note that, with this method, it doesn't particularly matter how the thread was created. There will be no PTD regardless of whether the thread was created with _beginthread or not.
I am working on code for the MinGW runtime that should allow GCC on Windows to handle TSS destructors perfectly. I don't know anything about any of the other Windows compilers with regard to this issue.

In summary, I believe this is the best shot so far at implementing thread exit handlers when Boost.Thread is statically linked, despite its serious problems. It is possible that a way to mitigate these problems, or a superior alternative, could be found.

Aaron W. LaFramboise

Aaron W. LaFramboise wrote:
Following is an implementation of, and some comments on, a statically linked Boost.Thread thread exit handler for MSVC.
Wow, great. It works on MSVC71, but not on MSVC6. I tried the following very simple test code, compiled with each of the CRT variants /MD, /MDd, /MT and /MTd. What is the meaning of data_seg(".CRT$XLB")?

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

DWORD WINAPI f(void *)
{
    Sleep(500);
    return 0;
}

// Report thread and process detach events.
void NTAPI tls_callback(PVOID, DWORD Reason, PVOID)
{
    if(Reason == DLL_THREAD_DETACH)
        Beep(440, 100);
    else if(Reason == DLL_PROCESS_DETACH)
        Beep(880, 50);
}

int main(int argc, char* argv[])
{
    DWORD id;
    HANDLE h = CreateThread(NULL, 0, &f, 0, 0, &id);
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    Sleep(300);
    return 0;
}

#pragma optimize ("", off)
namespace {
// "__declspec(thread)" is documented in MSDN to create .tls section
extern __declspec(thread) int dummy_ = 1;
int dummy() {return dummy_;}
}
#pragma optimize ("", on)

#pragma data_seg(".CRT$XLB")
DWORD callback_ptr = (DWORD)tls_callback;

Bronek Kozicki wrote:
Aaron W. LaFramboise wrote:
Following is an implementation of, and some comments on, a statically linked Boost.Thread thread exit handler for MSVC.
Wow, great. It works on MSVC71, but not on MSVC6. I tried the following very simple test code, compiled with each of the CRT variants /MD, /MDd, /MT and /MTd. What is the meaning of data_seg(".CRT$XLB")?
How unfortunate. I think I have an 'introductory version' of MSVC6 somewhere around here. I will look at it and see if this is something that can be fixed.

That pragma causes data to temporarily be put into a specific section. Sections whose names are identical up to a $ character are merged consecutively at link time, and the characters after the $ determine the order in which they are merged. (I believe this grouping rule is covered by the "Grouped Sections" part of the PECOFF specification.) The name .CRT$XLB, in this case, has a special but, as far as I know, entirely undocumented meaning:

- The X appears to be some arbitrary identifier.
- The L means 'TLS callback section.' Other options here are I, for C initializers, C, for C++ global constructors, and P and T for termination routines (the former is for "pre-termination").
- The final B is an ordering priority that can be anything from A to Z. It looked like it would be a bad idea to use A, because the CRT appears to use .CRT$XLA and .CRT$XLZ itself to mark the start and end of the callback list.

This general .CRT mechanism was present in MSVC6, and as far as I know no new static TLS features were added in MSVC7.1. So it seems likely to me that this same TLS callback mechanism would be available in MSVC6. I'm going to check.

Aaron W. LaFramboise
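To make the `$` ordering rule concrete, here is a small portable model of it. This is a simulation, not the MSVC linker. In the model, the text before `$` selects the image section, and contributions are laid out sorted by the suffix after `$`, keeping link order for equal suffixes. The `__xl_a`/`__xl_z` start and end markers are an assumption about the CRT's internals. `Contribution` and `layout` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One object-file section contribution, e.g. ".CRT$XLB" holding "tls_callback_ptr".
struct Contribution {
    std::string section;  // full object-file section name
    std::string symbol;   // what the contribution contains
};

// Returns the symbols merged into image section `image_name`, in layout order.
// "NAME$SUFFIX" is merged into "NAME"; contributions are sorted by SUFFIX.
std::vector<std::string> layout(const std::string& image_name,
                                std::vector<Contribution> parts) {
    std::vector<std::pair<std::string, std::string>> chosen;  // (suffix, symbol)
    for (const Contribution& c : parts) {
        std::string::size_type dollar = c.section.find('$');
        std::string base = c.section.substr(0, dollar);  // whole name if no '$'
        std::string suffix =
            dollar == std::string::npos ? "" : c.section.substr(dollar + 1);
        if (base == image_name) chosen.emplace_back(suffix, c.symbol);
    }
    // stable_sort keeps the input (link) order for equal suffixes.
    std::stable_sort(chosen.begin(), chosen.end(),
                     [](const std::pair<std::string, std::string>& a,
                        const std::pair<std::string, std::string>& b) {
                         return a.first < b.first;
                     });
    std::vector<std::string> out;
    for (const auto& p : chosen) out.push_back(p.second);
    return out;
}
```

With inputs like those in the test, the callback pointer placed in `.CRT$XLB` lands strictly between the two markers. In this model, any letter from B to Y is therefore safe, while A or Z could end up on the wrong side of a marker depending on link order.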

"Aaron W. LaFramboise" wrote
Bronek Kozicki wrote: [cut]
Since we are sort of messing with PEF and considering a separate DLL for catching thread exit notifications (it won't work with threads forcefully terminated with TerminateThread Win32 API) perhaps hooking Win32 API calls ExitThread and TerminateThread would be one solution? Tony

Tony Juricic wrote:
Since we are sort of messing with PEF and considering a separate DLL for catching thread exit notifications (it won't work with threads forcefully terminated with TerminateThread Win32 API) perhaps hooking Win32 API calls ExitThread and TerminateThread would be one solution?
I've given this some thought, but to be honest, I don't see any way this could be done reliably and without making some serious compromises:

- I think it might be possible for threads to die some other way. I don't think anything makes any guarantee that ExitThread and TerminateThread are the only ways a thread may exit. For example, a thread may be caused to exit by some lower-level API (perhaps a direct access to NTDLL).

- Hooking itself is problematic. I don't think that modifying kernel32's in-memory EAT (export address table) is going to catch all possible ways ExitThread might be called. Another proposed strategy is to directly alter the first few bytes of the function the EAT refers to. This is also not generally correct: a jmp is typically five bytes, and some other jump might target the middle of the instructions it overwrites. It is also a system-global change on Win9x systems. Besides these problems, modifying in-memory DLL images in any manner is undesirable. It will trigger copy-on-write (COW) semantics and possibly cause significant increases in memory usage.

- Hooking may interfere with proper operation of debuggers. Hooking is a great strategy for a debugger, or for programs designed to act as debuggers, but I think it is far too risky a technique to rely on for normal application programs.

The advantage you mention (besides possibly having a valid PTD at dtor time) is that we can execute dtors for threads killed by TerminateThread. I don't know if this is a good idea. In the general case, there is no particular guarantee about what context the thread will be in when you call TerminateThread. In particular, any and all thread-local data structures may be in an inconsistent state. Executing dtors in this situation would be wrong, and would lead to undesirable and unpredictable actions.

Aaron W. LaFramboise
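For readers unfamiliar with the five-byte patch mentioned in the second point, the sketch below illustrates it. The patch is an x86 `jmp rel32`: opcode `0xE9` followed by a 32-bit displacement measured from the end of the jump. The sketch encodes such a patch and checks the hazard Aaron describes. If any existing branch targets bytes 1 through 4 of the patched region, that branch lands in the middle of the new instruction. This is a portable illustration, not a hooking library. `encode_jmp_rel32`, `patch_is_safe` and the branch-target list are hypothetical; finding such targets in real code is the hard part.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Encode "jmp rel32" placed at address `from`, jumping to `to`.
// The displacement is relative to the next instruction (from + 5), little-endian.
std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    std::uint32_t rel = to - (from + 5);  // unsigned arithmetic wraps modulo 2^32
    return {0xE9,
            static_cast<std::uint8_t>(rel & 0xFF),
            static_cast<std::uint8_t>((rel >> 8) & 0xFF),
            static_cast<std::uint8_t>((rel >> 16) & 0xFF),
            static_cast<std::uint8_t>((rel >> 24) & 0xFF)};
}

// Overwriting [entry, entry + 5) is only safe if no other code branches into
// the interior of that range; a branch to `entry` itself still works.
bool patch_is_safe(std::uint32_t entry, const std::vector<std::uint32_t>& branch_targets) {
    for (std::uint32_t t : branch_targets)
        if (t > entry && t < entry + 5) return false;
    return true;
}
```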

"Aaron W. LaFramboise" wrote:
... -I think it might be possible for threads to die some other way. I don't think anything makes any guarantee that ExitThread and TerminateThread are the only ways a thread may exit. For example, a thread may be caused to exit by some lower-level API (perhaps a direct access to NTDLL). ...
Yes. ExitProcess, for example, will terminate all misbehaving threads. By misbehaving I mean anything from a thread procedure as dumb as:

while(true) ;

to complex thread procedures using locks, mutexes and so on.

I share your opinions about hooking API calls. To summarize my own understanding (or misunderstanding), let me try to classify the problem issues:

1) TSS slot cleanup.
1a) intentional leak of a 'hidden' native slot
1b) cleanup of TSS slots reserved by boost::thread_specific_ptr<class T>

2) cleanup of objects stored in TSS slots, e.g. semi-sloppy thread procedures that forget reset, like:

{
    if (ptr.get() == 0)
        ptr.reset(new Object());
    .... code
    // forgotten ptr.reset();
}

3) Thread (and process/main thread) exit notifications required for cleanup
3a) semi-sloppy thread procedures will get TSS and objects cleaned up
3b) misbehaving thread procedures leaking everything, including TSS

Tony
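Case 3a above (a semi-sloppy thread that forgets `ptr.reset()`) is exactly what a working thread-exit hook is for. For comparison only, standard C++11 `thread_local` gives the same guarantee portably. The sketch below uses `thread_local std::unique_ptr` as a stand-in for `boost::thread_specific_ptr`. This is an analogy, not Boost's implementation. It counts how many per-thread objects are destroyed after threads that never reset their pointer have exited. `Object`, `destroyed` and `run_sloppy_threads` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

std::atomic<int> destroyed{0};  // number of Object destructors that have run

struct Object {
    ~Object() { ++destroyed; }
};

// Per-thread slot, standing in for a thread_specific_ptr<Object>.
thread_local std::unique_ptr<Object> ptr;

void sloppy_thread_proc() {
    if (ptr.get() == nullptr) ptr.reset(new Object());
    // .... code
    // forgotten: ptr.reset();
}

// Starts n threads running sloppy_thread_proc, joins them, and returns how
// many Objects were destroyed by the threads' own exit processing.
int run_sloppy_threads(int n) {
    destroyed = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(sloppy_thread_proc);
    for (std::thread& t : threads) t.join();  // thread_local dtors finish before join returns
    return destroyed.load();
}
```

Case 3b has no analogue here: a thread that never exits, or is killed outright, never runs its thread-exit processing at all.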

Tony Juricic wrote:
Since we are sort of messing with PEF and considering a separate DLL for catching thread exit notifications (it won't work with threads forcefully terminated with TerminateThread Win32 API) perhaps hooking Win32 API calls ExitThread and TerminateThread would be one solution?
I do not think so.

1. We are not "messing" with PE. The TLS callback procedure is well documented in the PE file format; see section 6.7.2 of the document "Microsoft Portable Executable and Common Object File Format Specification" (available at http://www.openwatcom.org/ftp/devel/docs/pecoff.pdf and in other places). It's also well handled by the Windows NT line of OSes (and possibly older versions of Windows too).

2. We are actually "messing" with the compiler (and linker), trying to force it to put the right data in the executable file. So far we have succeeded only with MSVC71. "Messing" with compilers is something that members of the Boost community are already pretty familiar with. We may also seek other *supported* ways to attach our function to the module loader list.

3. TerminateThread is something that I believe we do not need to handle. This function does not really free anything, not even TLS or the stack. Thus freeing anything in higher-level code (like Boost) would be just silly. It's supposed to be called only when handling SEH exceptions or logic errors. It is also used when terminating a thread from outside, e.g. when the thread is in an "undefined" ("while(1);") state. Trying to free its resources is similar to "recovering" from undefined behaviour (you simply can't do it well). If we really want to handle thread termination, RegisterWaitForSingleObject should suffice.

4. ExitThread and TerminateThread are both defined in kernel32.dll. These functions do use some lower-level functionality from ntdll.dll. However, ExitThread is the only Windows function that fully implements exiting from a thread: it notifies loaded modules, frees TLS, frees FLS, frees the stack and notifies the scheduler. There are no other functions in ntdll.dll that could be used "instead" to cleanly exit from a thread in the Win32 subsystem.

5. Microsoft has already published a method for hooking into system functions; it's called "Detours" and is documented here: http://research.microsoft.com/sn/detours/ . However, I strongly believe we should not use it. It's useful only for a very limited set of purposes (tracing, debugging, logging, system diagnostics), and for our users it would be a no-no in a production environment.

B.

Aaron W. LaFramboise wrote:
Following is an implementation of, and some comments on, a statically linked Boost.Thread thread exit handler for MSVC.
By the way, I apologize for taking so long to cough this up after posting teasing remarks about this months ago. Programming is presently a hobby and volunteer work for me, unfortunately.
Thanks! I've been looking forward to you posting this, and it appears to work beautifully for VC++ 7.1 (as others have already commented). Do I have your permission to include it in Boost.Threads under the Boost license (with your name on it, of course)? Mike

Michael Glassford wrote:
Thanks! I've been looking forward to you posting this, and it appears to work beautifully for VC++ 7.1 (as others have already commented). Do I have your permission to include it in Boost.Threads under the Boost license (with your name on it, of course)?
Yes, you and anyone else have perpetual permission to offer it under any license you choose. I'm working on a way to do this under VC6. Please let me know if there's any other way I can be of assistance with regard to this.

Aaron W. LaFramboise

Aaron W. LaFramboise wrote:
Yes, you and anyone else have perpetual permission to offer it under any license you choose.
Thank you. It's in, and will be in the upcoming release. A lot of people will be happy, including me! I look forward to the GCC version as well, though I haven't yet used GCC myself.
I'm working on a way to do this under VC6. Please let me know if there's any other way I can be of assistance with regard to this.
By the way, in your posting, you had this code:

// Report thread and process detach events.
void NTAPI tls_callback (PVOID, DWORD Reason, PVOID)
{
    if(Reason == DLL_THREAD_DETACH)
        on_thread_exit();
    else if(Reason == DLL_PROCESS_DETACH)
    {
        on_thread_exit();
        on_process_exit();
    }
}

If I understand what DLL_PROCESS_DETACH means (the thread is being notified that it is being detached because the process is exiting), the call to on_process_exit() is incorrect. on_process_exit() is meant to be called when the process actually exits (it calls TlsFree()). The most likely place to call it is at the end of main() or after the main thread exits.

Mike

Michael Glassford wrote:
By the way, in your posting, you had this code:
// Report thread and process detach events.
void NTAPI tls_callback (PVOID, DWORD Reason, PVOID)
{
    if(Reason == DLL_THREAD_DETACH)
        on_thread_exit();
    else if(Reason == DLL_PROCESS_DETACH)
    {
        on_thread_exit();
        on_process_exit();
    }
}
If I understand what DLL_PROCESS_DETACH means (the thread is being notified that it is being detached because the process is exiting), the call to on_process_exit() is incorrect; on_process_exit() is meant to be called when the process actually exits (it calls TlsFree()). The most likely place to call it is at the end of main() or after the main thread exits.
I sent my last message too soon, before mentioning that I'm aware that the incorrect code was originally copied from my code. In thinking about it, I also realized something about the case of Boost.Threads being built as a DLL. There isn't a convenient place to call on_process_exit() from, and the name of the function is misleading, since the function needs to be called when the DLL is being unloaded, not necessarily when the process is exiting. So I'm considering the following changes:

1) Eliminating the on_process_enter() function, which is unnecessary and has a misleading name, and the on_process_exit() function, which is difficult to call correctly and also has a misleading name.
2) Adding a new on_thread_enter() function.
3) Implementing an attached-thread-count scheme: on_thread_enter() increments the count, on_thread_exit() decrements it, and when it reaches zero on_thread_exit() calls TlsFree().

The DllMain() or tls_callback() functions call on_thread_enter() for both the DLL_PROCESS_ATTACH and the DLL_THREAD_ATTACH messages, and they call on_thread_exit for both the DLL_PROCESS_DETACH and DLL_THREAD_DETACH messages.

This scheme seems a little cleaner and so far seems to be working OK. Any thoughts about this?

Mike
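The attached-thread-count scheme described above can be sketched portably as follows. This is a minimal model, not the Boost.Threads code. `free_tls_slot()` is a hypothetical stand-in for the `TlsFree()` call. The counter uses `std::atomic` because attach and detach notifications can arrive on different threads concurrently.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for TlsFree(): here it only records that it ran.
int tls_free_calls = 0;
void free_tls_slot() { ++tls_free_calls; }

std::atomic<long> attached_threads{0};

// Called for DLL_PROCESS_ATTACH and DLL_THREAD_ATTACH.
void on_thread_enter() { ++attached_threads; }

// Called for DLL_PROCESS_DETACH and DLL_THREAD_DETACH. Whichever call brings
// the count to zero releases the slot, and only that call does so.
void on_thread_exit() {
    // (per-thread TSS cleanup for the exiting thread would run here)
    if (--attached_threads == 0) free_tls_slot();
}
```

If any one `on_thread_exit()` call never happens, for example for a thread killed with `TerminateThread`, the count stays above zero and the slot is never freed.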

Michael Glassford wrote:
I sent my last message too soon, before mentioning that I'm aware that the incorrect code was originally copied from my code. In thinking about it, I also realized something about the case of Boost.Threads being built as a DLL. There isn't a convenient place to call on_process_exit() from, and the name of the function is misleading, since the function needs to be called when the DLL is being unloaded, not necessarily when the process is exiting.
So I'm considering the following changes:
1) Eliminating the on_process_enter() function, which is unnecessary and has a misleading name, and the on_process_exit() function, which is difficult to call correctly and also has a misleading name.
2) Adding a new on_thread_enter() function.
3) Implementing an attached-thread-count scheme: on_thread_enter() increments the count, on_thread_exit() decrements it, and when it reaches zero on_thread_exit() calls TlsFree().
The DllMain() or tls_callback() functions call on_thread_enter() for both the DLL_PROCESS_ATTACH and the DLL_THREAD_ATTACH messages, and they call on_thread_exit for both the DLL_PROCESS_DETACH and DLL_THREAD_DETACH messages.
This scheme seems a little cleaner and so far seems to be working OK. Any thoughts about this?
Yes I think reference counting seems like a good idea here. The only concern I have (I don't know how important this is) is that when a thread is killed with TerminateThread, I think neither TLS callbacks nor DllMain()s will be called. This would cause the TLS count never to reach zero, and TlsFree never to be called. These threads might not even be associated with the Boost TSS at all. For example, another third party library in some DLL might be using TerminateThread, and the application programmer may not even know about this. It may then be surprising to him when the reference count never reaches zero for apparently no good reason.
Aaron W. LaFramboise

"Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> writes: <snip several complete quoted messages>
The DllMain() or tls_callback() functions call on_thread_enter() for both the DLL_PROCESS_ATTACH and the DLL_THREAD_ATTACH messages, and they call on_thread_exit for both the DLL_PROCESS_DETACH and DLL_THREAD_DETACH messages.
This scheme seems a little cleaner and so far seems to be working OK. Any thoughts about this?
Yes I think reference counting seems like a good idea here.
Please limit the amount of quoted text in messages. http://www.boost-consulting.com/boost/more/discussion_policy.htm -- Dave Abrahams Boost Moderator

Aaron W. LaFramboise wrote: [snip previous conversation]
The DllMain() or tls_callback() functions call on_thread_enter() for both the DLL_PROCESS_ATTACH and the DLL_THREAD_ATTACH messages, and they call on_thread_exit for both the DLL_PROCESS_DETACH and DLL_THREAD_DETACH messages.
This scheme seems a little cleaner and so far seems to be working OK. Any thoughts about this?
Yes I think reference counting seems like a good idea here.
The only concern I have (I don't know how important this is) is that when a thread is killed with TerminateThread, I think neither TLS callbacks nor DllMain()s will be called. This would cause the TLS count never to reach zero, and TlsFree never to be called.
Good point. Two comments: 1) If you're calling TerminateThread, you're probably already leaking; maybe one more leak won't matter. And...
These threads might not even be associated with the Boost TSS at all. For example, another third party library in some DLL might be using TerminateThread, and the application programmer may not even know about this. It may then be surprising to him when the reference count never reaches zero for apparently no good reason.
2) I'm only reference counting threads on which Boost.Threads TSS is actually used, so this shouldn't happen.

Mike
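Mike's second point (only counting threads that actually use Boost.Threads TSS) can be modelled as follows. A thread is registered lazily, on its first TSS access, and exit notifications for threads that never registered are ignored. This is a sketch under that assumption. `tss_thread_registry`, `note_tss_use`, `notify_thread_exit` and the integer thread ids are hypothetical. A mutex-protected set is just one simple way to track membership.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

class tss_thread_registry {
public:
    // Call on a thread's first thread_specific_ptr access.
    void note_tss_use(unsigned long thread_id) {
        std::lock_guard<std::mutex> lock(m_);
        users_.insert(thread_id);  // no-op if already registered
    }

    // Call from the thread-exit hook for every exiting thread.
    // Returns true if this exit released the last TSS user (time to TlsFree).
    bool notify_thread_exit(unsigned long thread_id) {
        std::lock_guard<std::mutex> lock(m_);
        if (users_.erase(thread_id) == 0) return false;  // never used TSS
        return users_.empty();
    }

    std::size_t users() const {
        std::lock_guard<std::mutex> lock(m_);
        return users_.size();
    }

private:
    mutable std::mutex m_;
    std::set<unsigned long> users_;
};
```

In this model, a third-party thread that never touches TSS cannot keep the count above zero, however it ends.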

Michael Glassford wrote:
Thank you. It's in, and will be in the upcoming release. A lot of people will be happy, including me! I look forward to the GCC version as well, though I haven't yet used GCC myself.
Make sure you look at Roland's code that gets MSVC6 right by doing a fixup that fixes a problem in the TLS support code. That fixup should only be done on MSVC6 though. For GCC/MinGW, I'm actually adding an API that gets all of this right: an atexit variant that knows about module unloading and thread destruction. Ideally all compilers would have such an API to make these sorts of problems easy. :)
If I understand what DLL_PROCESS_DETACH means (the thread is being notified that it is being detached because the process is exiting), the call to on_process_exit() is incorrect; on_process_exit() is meant to be called when the process actually exits (it calls TlsFree()). The most likely place to call it is at the end of main() or after the main thread exits.
Well, as near as I can tell, when the TLS routine is called with DLL_PROCESS_DETACH, this will be the absolute last user code executing in the process, including DLL destructors. As far as I know, once you return from DLL_PROCESS_DETACH in the TLS routine, no more user code will run. Aaron W. LaFramboise

Aaron W. LaFramboise wrote:
Make sure you look at Roland's code that gets MSVC6 right by doing a fixup that fixes a problem in the TLS support code. That fixup should only be done on MSVC6 though.
I definitely will. I was under the impression that it was still under construction, but if it's ready, I'm interested.
For GCC/MinGW, I'm actually adding an API that gets all of this right: an atexit variant that knows about module unloading and thread destruction. Ideally all compilers would have such an API to make these sorts of problems easy. :)
If I understand what DLL_PROCESS_DETACH means (the thread is being notified that it is being detached because the process is exiting), the call to on_process_exit() is incorrect; on_process_exit() is meant to be called when the process actually exits (it calls TlsFree()). The most likely place to call it is at the end of main() or after the main thread exits.
Well, as near as I can tell, when the TLS routine is called with DLL_PROCESS_DETACH, this will be the absolute last user code executing in the process, including DLL destructors. As far as I know, once you return from DLL_PROCESS_DETACH in the TLS routine, no more user code will run.
Is it called only for the last thread to detach, or can it be called for more than one thread? Does it work the same for tls_callback and DllMain? Mike

On Fri, 30 Jul 2004 23:34:54 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
There was also mention of some sort of runtime library floating point hook. I was unable to figure out anything about this. However, if this is possible, and the PTD is still valid when this hook is called, that method is probably superior.
This is indeed the case. The tiddata structure is still valid at the time this "hook" is being called. Below I am posting a variant of a statically linked thread cleanup. I also think I got the dtor order right. However, the implementation requires that the C runtime is linked in. (I consider this no serious restriction, since there are only rare cases where one does not need the CRT, especially when using Boost.)

// file: tss.c, linked statically to your application; define BOOST_THREAD_USE_LIB globally
#include <stdlib.h>

typedef void (__cdecl *_PVFV)(void);

extern void on_process_enter(void);
extern void on_process_exit(void);
extern void on_thread_exit(void);

static void on_process_init(void);
static void on_process_term(void);
static void on_thread_term(void);
static void on_mainthread_term(void);

/* following is taken from Codeguru: */
/* http://www.codeguru.com/Cpp/misc/misc/threadsprocesses/article.php/c6945__2/ */

/* we get control after all global ctors */
#pragma data_seg(".CRT$XCU")
static _PVFV pinit = on_process_init;
#pragma data_seg()

#pragma data_seg(".CRT$XTU")
static _PVFV pterm = on_process_term;
#pragma data_seg()

/* This is the FP hook */
static _PVFV _FPmttermOrig = 0;
extern _PVFV _FPmtterm;

static void on_process_init(void)
{
    /* hook the main thread exit, will run before global dtors */
    /* but will not run when 'quick' exiting the library! */
    /* this is the same behaviour as for global dtors */
    atexit(on_mainthread_term);

    /* hook the normal threads exit */
    _FPmttermOrig = _FPmtterm;
    _FPmtterm = on_thread_term;

    on_process_enter();
}

static void on_mainthread_term(void)
{
    on_thread_exit();
}

static void on_process_term(void)
{
    on_process_exit();
}

static void on_thread_term(void)
{
    on_thread_exit();
    /* chain to the original termination handler code */
    if (_FPmttermOrig != 0)
        (*_FPmttermOrig)();
}

void tss_cleanup_implemented(void) {}

Some comments: While the FP hook works fine, I am not yet sure whether any FP calls might alter the pointer. This would definitely be very bad and surely has to be investigated more thoroughly. (Any FP gurus out there?) Since this trick is not documented in the user-facing part (I took the information from the CRT sources), there is also a chance that this "feature" will not be available at some point in the future. But as far as I can see, the behaviour is the same in MSVC6 as in MSVC7. (Sidebar: .CRT$XLB does not work in MSVC6. I am afraid this has to do with the CRT, not with the OS loader calling the TLS callbacks. Did anyone try the PE TLS callback without the CRT linked in?) The above approach will work for any threads that have been created by _beginthread or _beginthreadex.
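The hook-and-chain idea above (save the old pointer, install your own handler, call the saved one when done) can be modeled portably. This is a minimal sketch, assuming a plain global function pointer named term_hook as a hypothetical stand-in for _FPmtterm; it does not touch the real CRT, and every name in it is illustrative:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*term_fn)(void);

/* Hypothetical stand-in for the CRT's undocumented _FPmtterm pointer:
   the "runtime" below calls whatever it points to when a thread ends. */
static term_fn term_hook = NULL;

/* Records which handlers ran, and in which order. */
static char call_log[8];
static size_t call_count = 0;

static void runtime_fp_term(void)   { call_log[call_count++] = 'R'; } /* original CRT handler */
static void boost_thread_exit(void) { call_log[call_count++] = 'B'; } /* plays on_thread_exit() */

static term_fn saved_hook = NULL;   /* plays _FPmttermOrig */

/* Replacement handler: run the TSS cleanup, then chain to the original. */
static void on_thread_term(void)
{
    boost_thread_exit();
    if (saved_hook != NULL)
        saved_hook();
}

/* Install once at startup (the post does this from a .CRT$XCU initializer). */
static void install_hook(void)
{
    /* Installing twice would save our own handler and make it recurse. */
    assert(term_hook != on_thread_term);
    saved_hook = term_hook;
    term_hook = on_thread_term;
}

/* What the runtime would do at thread exit. */
static void simulate_thread_end(void)
{
    if (term_hook != NULL)
        term_hook();
}
```

With term_hook initially pointing at runtime_fp_term, calling install_hook() and then simulate_thread_end() runs the cleanup first and then chains to the original handler, so the runtime's own termination work still happens.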
As far as I can tell, FlsCallback() offers no advantage with regards to the ptd issue. It is also called "too late."
As already discussed, the "too late" argument is actually not a very important one at all. The tiddata structure is created dynamically by the CRT when needed. The worst that can happen is that (only when certain CRT functions are used in the destructor) the thread exits with a memory leak.
Another problem with this method is that it gets destructor order wrong. For example, in the case of termination of the main thread, global destructors will be called before the TSS destructors are. This also could lead to silent misbehavior. It is likely that a better method for handling termination of the main thread could be found, such as a global destructor.
I think my proposal addresses this problem in the correct way. The only problem arises when the main thread exits (destroying the thread_specific_ptr) while other threads are still running. However, I consider that incorrect program behaviour anyway.
Note that, with this method, it doesn't particularly matter how the thread was created. There will be no ptd regardless of whether the thread was created with _beginthread or not.
This (unfortunately?) isn't the case with my solution, but relying on _beginthread or _beginthreadex having been called is a comparatively mild requirement. BTW: While perhaps not the cleanest solution, it might also be possible to 'clean up' the thread's tiddata from within DLL_THREAD_DETACH? Of course, this would require fiddling around even more with internal CRT structures. Roland

On Fri, 30 Jul 2004 23:34:54 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
It is likely that a better method for handling termination of the main thread could be found, such as a global destructor.
I didn't answer this one correctly in my first post, so here goes: I am using the C library's atexit function for termination in the main thread. Below I am posting a variation on my previous proposal, which builds on the "piggy-back" DLL proposal I posted some time ago. This version does not need to fiddle around with the FP termination and works on foreign threads too. However, the same caveats Aaron already pointed out apply. While I agree that Aaron's solution is much cleaner and more elegant (and should be preferred where possible), I don't think it can be made to work on MSVC6. I am afraid the ".CRT$XLB" segment is not recognized as a TLS callback pointer table by linkers prior to 7.1. Since the Boost TSS code now has the process/thread-specific parts factored out very nicely (thanks to Mike?), it was easy to port my piggy-back DLL to support it. My concerns about locking are also gone, since the piggy-back DLL does exactly what the other solutions do.

// file: tssdll.c, linked statically to your application; define BOOST_THREAD_USE_LIB globally
#include <stdlib.h>
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

/* the boost handlers */
extern void on_process_enter(void);
extern void on_process_exit(void);
extern void on_thread_exit(void);

/* locally defined handlers */
static void on_process_init(void);
static void on_process_term(void);
static void on_thread_term(void);
static void on_mainthread_term(void);

typedef void (__cdecl *_PVFV)(void);

/* following is taken from Codeguru: */
/* http://www.codeguru.com/Cpp/misc/misc/threadsprocesses/article.php/c6945__2/ */

/* we get control after all global ctors */
#pragma data_seg(".CRT$XCU")
static _PVFV pinit = on_process_init;
#pragma data_seg()

#pragma data_seg(".CRT$XTU")
static _PVFV pterm = on_process_term;
#pragma data_seg()

/* these are the module and filename of the temporary DLL */
static HMODULE hm = NULL;
static LPTSTR fn = NULL;

/* This is the piggy-back DLL hook */
/*
 * The
embedded 'dllmain_dll' image has been generated from the commented out small * DLL stub file with the command: (VC6.0) * NB.: it does not depend on CRT in any way. * cl /W3 /O1 DllMain.c /link /out:DllMain.dll /dll /align:64 /ignore:4108 /nodefaultlib /machine:I386 */ #if 0 __declspec(dllexport) void (__cdecl *thread_detach)(void); void __cdecl dummy(void) {}; int __stdcall _DllMainCRTStartup( void * hDllHandle, unsigned long dwReason, void* lpreserved ) { switch (dwReason) { case 1: thread_detach = &dummy; break; case 3: thread_detach(); break; } return 1; } #endif /* the following could be put in a data segment that is discarded after * startup. If anyone knows about such a thing, please let me know. */ static const unsigned char dllmain_dll[] = /* 960 */ {0x4D,0x5A,0x90,0x00,0x03,0x00,0x00,0x00,0x04,0x00,0x00,0x00,0xFF,0xFF,0x00 ,0x00,0xB8,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x40,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0xC0,0x00,0x00,0x00,0x0E,0x1F,0xBA,0x0E,0x00,0xB4,0x09,0xCD,0x21,0xB8,0x01 ,0x4C,0xCD,0x21,0x54,0x68,0x69,0x73,0x20,0x70,0x72,0x6F,0x67,0x72,0x61,0x6D ,0x20,0x63,0x61,0x6E,0x6E,0x6F,0x74,0x20,0x62,0x65,0x20,0x72,0x75,0x6E,0x20 ,0x69,0x6E,0x20,0x44,0x4F,0x53,0x20,0x6D,0x6F,0x64,0x65,0x2E,0x0D,0x0D,0x0A ,0x24,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x6F,0xDD,0x05,0xDB,0x2B,0xBC,0x6B ,0x88,0x2B,0xBC,0x6B,0x88,0x2B,0xBC,0x6B,0x88,0x2D,0x9F,0x61,0x88,0x2A,0xBC ,0x6B,0x88,0xD4,0x9C,0x6F,0x88,0x2A,0xBC,0x6B,0x88,0x52,0x69,0x63,0x68,0x2B ,0xBC,0x6B,0x88,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x50,0x45,0x00 ,0x00,0x4C,0x01,0x04,0x00,0xCF,0xC9,0xC8,0x3F,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0xE0,0x00,0x0E,0x21,0x0B,0x01,0x06,0x00,0x40,0x00,0x00,0x00,0x00 ,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x81,0x02,0x00,0x00,0x80,0x02,0x00,0x00 
,0xC0,0x02,0x00,0x00,0x00,0x00,0x00,0x10,0x40,0x00,0x00,0x00,0x40,0x00,0x00 ,0x00,0x04,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0xC0,0x03,0x00,0x00,0x80,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x02 ,0x00,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x10,0x00,0x00,0x00,0x00,0x10,0x00 ,0x00,0x10,0x00,0x00,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0xC0,0x02,0x00 ,0x00,0x4C,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0x03,0x00,0x00,0x10,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x2E,0x74,0x65,0x78,0x74,0x00,0x00,0x00,0x24,0x00 ,0x00,0x00,0x80,0x02,0x00,0x00,0x40,0x00,0x00,0x00,0x80,0x02,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x20,0x00,0x00,0x60 ,0x2E,0x72,0x64,0x61,0x74,0x61,0x00,0x00,0x4C,0x00,0x00,0x00,0xC0,0x02,0x00 ,0x00,0x80,0x00,0x00,0x00,0xC0,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x40,0x00,0x00,0x40,0x2E,0x64,0x61,0x74,0x61 ,0x00,0x00,0x00,0x04,0x00,0x00,0x00,0x40,0x03,0x00,0x00,0x40,0x00,0x00,0x00 ,0x40,0x03,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x40,0x00,0x00,0xC0,0x2E,0x72,0x65,0x6C,0x6F,0x63,0x00,0x00,0x10,0x00 ,0x00,0x00,0x80,0x03,0x00,0x00,0x40,0x00,0x00,0x00,0x80,0x03,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x40,0x00,0x00,0x42 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 
,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xC3,0x8B,0x44,0x24,0x08 ,0x48,0x74,0x0C,0x48,0x48,0x75,0x12,0xFF,0x15,0x40,0x03,0x00,0x10,0xEB,0x0A ,0xC7,0x05,0x40,0x03,0x00,0x10,0x80,0x02,0x00,0x10,0x6A,0x01,0x58,0xC2,0x0C ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0xCF,0xC9,0xC8,0x3F,0x00,0x00,0x00,0x00,0xF2,0x02,0x00,0x00 ,0x01,0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x01,0x00,0x00,0x00,0xE8,0x02,0x00 ,0x00,0xEC,0x02,0x00,0x00,0xF0,0x02,0x00,0x00,0x40,0x03,0x00,0x00,0xFE,0x02 ,0x00,0x00,0x00,0x00,0x44,0x6C,0x6C,0x4D,0x61,0x69,0x6E,0x2E,0x64,0x6C,0x6C ,0x00,0x74,0x68,0x72,0x65,0x61,0x64,0x5F,0x64,0x65,0x74,0x61,0x63,0x68,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x10,0x00,0x00,0x00,0x8E,0x32,0x96,0x32,0x9A,0x32,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}; static void on_process_init(void) { DWORD dw; LPTSTR path; HANDLE hf; /* hook the main thread exit, will run before global dtors */ /* but will not run when 'quick' exiting the library! 
*/
    /* this is the same behaviour as for global dtors */
    atexit(on_mainthread_term);

    /* hook the normal threads exit */
    /* we do this by dynamically writing out a small DLL stub and loading it */
    /* during process startup */
    dw = GetTempPath(0, NULL);
    path = (LPTSTR)malloc(dw * sizeof(TCHAR));
    GetTempPath(dw, path);
    fn = (LPTSTR)malloc((dw + 32) * sizeof(TCHAR));
    GetTempFileName(path, TEXT("AET"), 0, fn);
    free(path);
    hf = CreateFile(fn, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                    FILE_ATTRIBUTE_TEMPORARY, NULL);
    if (INVALID_HANDLE_VALUE != hf) {
        WriteFile(hf, dllmain_dll, sizeof(dllmain_dll), &dw, NULL); /* don't forget to adjust the size! */
        CloseHandle(hf);
        hm = LoadLibrary(fn);
        if (NULL != hm) {
            *(_PVFV*)GetProcAddress(hm, "thread_detach") = on_thread_term;
        } else {
            DeleteFile(fn);
            free(fn);
            fn = 0;
        }
    } else {
        free(fn);
        fn = 0;
    }
    /* do the boost stuff */
    on_process_enter();
}

static void on_mainthread_term(void)
{
    on_thread_exit();
}

static void on_thread_term(void)
{
    on_thread_exit();
}

static void on_process_term(void)
{
    on_process_exit();
    /* the DLL may have failed to load; fn is 0 in that case */
    if (hm != NULL)
        FreeLibrary(hm);
    if (fn != NULL) {
        DeleteFile(fn);
        free(fn);
    }
}

void tss_cleanup_implemented(void) {}
---EOF---

In summary, there are now three variations on the exit-thread theme available (actually four, when counting user-supplied calls to the process/thread init/term hooks). I hope this is enough to reintroduce the static linking feature in the next version :-) Roland

Here is a version working on MSVC6. It uses the definition of the IMAGE_TLS_DIRECTORY structure as defined in the Platform SDK installed together with MSVC6 (later on it was updated to contain just 6 DWORDs). I received this code from Holger Grund, tweaked it to run with old Platform SDK headers, added Beeps and a second thread, and finally tested it with MSVC6. I think that the definition of _tls_used (from Holger) is something new and useful :> B.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

typedef unsigned long ULONG_PTR;

namespace StaticTlsSection
{
void NTAPI tls_thread_callback(void*, DWORD dwReason, void*)
{
    if (dwReason == DLL_THREAD_DETACH)
        Beep(440, 100);
    else if (dwReason == DLL_PROCESS_DETACH)
        Beep(880, 50);
}

ULONG tls_index;
const char data_template[1] = { 0 };
//const void (NTAPI *callback_array[1])(
PIMAGE_TLS_CALLBACK callback_array[2] = { &tls_thread_callback, NULL };

extern "C" const IMAGE_TLS_DIRECTORY _tls_used;
const IMAGE_TLS_DIRECTORY _tls_used =
{
    reinterpret_cast<DWORD>(&data_template),
    reinterpret_cast<DWORD>((&data_template) + 1),
    reinterpret_cast<PDWORD>(&tls_index),
    reinterpret_cast<PIMAGE_TLS_CALLBACK *>(&callback_array),
    0,
    0
};
}

DWORD WINAPI f(void *)
{
    Sleep(500);
    return 0;
}

int main(int argc, char* argv[])
{
    DWORD id;
    HANDLE h = CreateThread(NULL, 0, &f, 0, 0, &id);
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}

"Bronek Kozicki" wrote
I think that definition of _tls_used (from Holger) is something new and useful :>
That's great! The _tls_used variable appears to be absolutely required. To explain: I was testing an ordinary static library (not using any Boost code) with VC7.1 and forcing creation of the .tls section with __declspec(thread), as in this example:
<code>
void NTAPI tls_callback (PVOID, DWORD Reason, PVOID)
{
    if(Reason == DLL_THREAD_DETACH)
    {
        on_thread_exit();
    }
    else if(Reason == DLL_PROCESS_DETACH)
    {
        on_thread_exit();
        on_process_exit();
    }
}

#pragma data_seg(push, old_seg)
#pragma data_seg(".CRT$XLB")
DWORD tls_callback_ptr = (DWORD)tls_callback;
#pragma data_seg(pop, old_seg)

__declspec(thread) int ithread;

... some other library code
</code>
I was surprised to find that the tls_callback code was not invoked at all! Tony

Tony Juricic wrote:
I think that definition of _tls_used (from Holger) is something new and useful :>
That's great!
_tls_used variable appears to be absolutely required. To explain, I was
What I understand (it's quite a blurry picture right now) is that if you define one of: a) int _tls_index or b) __declspec(thread) some_variable, either of them will result in two things:
* section .tls will be created
* symbol __tls_used will be created and put in the .tls section
b) is actually documented in MSDN. The trouble is that if you do a) or b), the symbol __tls_used will be defined by the compiler, and I do not know how to put the address of your callback function there (using MSVC6). It seems that MSVC6 just does not support appending anything to the .tls section. However, you may define this symbol ("extern "C" const DWORD _tls_used[6]; const DWORD _tls_used[6] = ...") yourself in the executable (but not in the .tls section), and if you build this symbol the right way (as demonstrated by Holger and specified in section 6.7.1 of the PE/COFF specification), your callback procedure will be recognized by the operating system and called at the right time. I think that this does not conform to the PE/COFF specification (section 6.7 explicitly specifies the section name .tls), but it works, even on MSVC6 (but not under MinGW, hmmm ...). B.
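For reference, the six-DWORD layout mentioned above is the 32-bit TLS directory from the PE/COFF specification. Below is a portable sketch of that layout, assuming 32-bit virtual addresses (PE32). The field names follow winnt.h's IMAGE_TLS_DIRECTORY32; tls_directory32 and the other type names are hypothetical, and this is not a replacement for the real header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PE32 TLS directory, i.e. the shape _tls_used must have. All address
   fields are virtual addresses, hence 32 bits wide in a PE32 image. */
typedef struct {
    uint32_t StartAddressOfRawData; /* start of the TLS data template */
    uint32_t EndAddressOfRawData;   /* end of the template (exclusive) */
    uint32_t AddressOfIndex;        /* where the loader stores the TLS index */
    uint32_t AddressOfCallBacks;    /* null-terminated callback pointer array */
    uint32_t SizeOfZeroFill;        /* zero bytes appended after the template */
    uint32_t Characteristics;       /* reserved; later specs use bits for alignment */
} tls_directory32;

/* The same storage viewed as the plain six-DWORD array. */
typedef uint32_t tls_directory32_words[6];

/* Compile-time size check: the array size goes negative if the layout is off. */
typedef char tls_directory32_is_24_bytes[sizeof(tls_directory32) == 24 ? 1 : -1];
```

AddressOfCallBacks is the field that has to end up pointing at a null-terminated array of callbacks; the remaining fields describe the per-thread data template.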

On Sun, 01 Aug 2004 22:31:22 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
However, you may define this symbol ("extern "C" const DWORD _tls_used[6]; const DWORD _tls_used[6] = ...") yourself in executable (but not in .tls section) and if you build this symbol right way (as demonstrated by Holger and specified in section 6.7.1 of PE/COFF specification) your callback procedure will be recognized by operating system and called at the right time. I think that it does not conform to PE/COFF specification (section 6.7 explicitly specify section name .tls) but it works, even on MSVC6 (but not under MinGW, hmmm ...).
Just tried this at home. My MSVC 6.0 compiler does not do the trick. :-( Could it be that running on W2K is the problem? Roland

"Bronek Kozicki" wrote:
What I understand (it's quite a blurry picture right now) is that if you define one of: a) int _tls_index or b) __declspec(thread) some_variable
any of them will result in two things: * section .tls will be created * symbol __tls_used will be created and put in .tls section
The automatic creation of _tls_used does not happen in my tests. To be more precise, I was considering Matt Pietrek's 'Under The Hood' article from 1999, which states: "...These functions are prototyped as IMAGE_TLS_CALLBACK functions, which are defined in WINNT.H. Not coincidentally, the TLS initialization callbacks happen to look just like a DllMain function. For what it's worth, when using __declspec(thread) variables, Visual C++ emits data that causes this routine to be invoked. However, no actual callbacks are currently defined by the runtime library, so the array of function pointers is a single NULL entry." However, in my case that does not seem to be true for a static library (in contrast to an implicitly linked DLL) created with Visual Studio 2003 (VC++ 7.1), all in Debug mode. I looked at: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html... but it does not mention any specific issues with static libraries. Tony

Bronek Kozicki <brok <at> rubikon.pl> writes:
Hi Bronek, I hope you don't mind following up in here as I don't have access to the private groups right now anyway.
_tls_used variable appears to be absolutely required. To explain, I was
What I understand (it's quite a blurry picture right now) is that if you define one of: a) int _tls_index or b) __declspec(thread) some_variable
I believe things are a little bit different. Most of it is just guessing, though. But anyway: essentially, if a symbol named "__tls_used" is present, the VC6 linker will emit a TLS directory entry pointing at it. VC 7 introduced __declspec(thread), which uses this functionality. Therefore the 7.0+ CRT provides the __tls_used structure, referencing the data strictly between .CRT$XLA and .CRT$XLZ. This would result in a static TLS section being emitted for every PE image. Therefore the VC7 linker introduced a check for __tls_index and the ".tls" section. __declspec(thread) introduces a data item in .tls. Accessing such a variable introduces references to _tls_array and _tls_index. So if you use the VC7+ linker, you need a ".tls" section and __tls_used. With a VC6 linker the latter should be sufficient. With GNU ld a version script might help. GCC also supports something like __attribute__((section)), which might work with PE images, too. -hg
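The access path described above can be modeled without Windows. Each thread has an array of per-module TLS blocks, _tls_index selects this module's block, and the variable's offset within .tls selects the variable. A minimal sketch: sim_thread, tls_var and the sizes are hypothetical. Real compiled code finds the array through the thread environment block via the FS segment register (with _tls_array naming the TEB offset) rather than through a struct:

```c
#include <assert.h>
#include <string.h>

#define MAX_MODULES 4   /* hypothetical: modules with a .tls section */
#define TLS_INTS    2   /* hypothetical: this module's .tls holds two ints */

/* One simulated thread: its per-module TLS block pointers plus storage. */
typedef struct {
    int *tls_array[MAX_MODULES];
    int  storage[MAX_MODULES][TLS_INTS];
} sim_thread;

/* Slot assigned to this module by the loader (via AddressOfIndex). */
static unsigned tls_index = 1;

/* Give a new simulated thread zeroed blocks, one per module. */
static void sim_thread_start(sim_thread *t)
{
    memset(t, 0, sizeof *t);
    for (int m = 0; m < MAX_MODULES; ++m)
        t->tls_array[m] = t->storage[m];
}

/* What compiled code does for a __declspec(thread) variable: take the
   thread's array, index it with _tls_index, add the variable's offset. */
static int *tls_var(sim_thread *t, unsigned offset)
{
    assert(tls_index < MAX_MODULES && offset < TLS_INTS);
    return t->tls_array[tls_index] + offset;
}
```

Writing through one thread's view leaves another thread's copy untouched, which is the whole point of the extra indirection through _tls_index.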

Holger Grund wrote:
I hope you don't mind following up in here as I don't have access to the private groups right now anyway.
I wonder how many MVPs are in boost :>
So if you use VC7+ linker you need ".tls" section and __tls_used. With a VC6 linker the latter should be sufficient.
I think that in MSVC6 things are a little more complicated. But anyway, Roland posted a nice solution here; I'm testing it now :) B.

Bronek Kozicki wrote:
Holger Grund wrote:
I hope you don't mind following up in here as I don't have access to the private groups right now anyway.
I wonder how many MVPs are in boost :>
At least 4, that I know of - I'm sure it's more than that though. [A little OT - sorry!] -cd

On Sun, 01 Aug 2004 19:10:32 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Here is version working on MSVC6. It uses definition of IMAGE_TLS_DIRECTORY structure as defined in Platform SDK installed together with MSVC6 (later on it has been updated to contain just 6 DWORDs). I received this code from Holger Grund, tweaked it to run with old Platform SDK headers, added Beeps and second thread and finally tested with MSVC6. I think that definition of _tls_used (from Holger) is something new and useful :>
This sounds very promising! But unfortunately I cannot get it to work on my MSVC 6.0. BTW.: I think there is a typo in your post:
reinterpret_cast<PDWORD> ( &tls_index ), reinterpret_cast<PIMAGE_TLS_CALLBACK *> ( &callback_array ),
shouldn't they read: reinterpret_cast<DWORD> ( &tls_index ), reinterpret_cast<DWORD> ( &callback_array ), instead? However, the callback never gets called for me. I tried linking with the multithreaded debug version of the CRT. Roland

Roland wrote:
BTW.: I think there is a typo in your post:
reinterpret_cast<PDWORD> ( &tls_index ), reinterpret_cast<PIMAGE_TLS_CALLBACK *> ( &callback_array ),
shouldn't they read: reinterpret_cast<DWORD> ( &tls_index ), reinterpret_cast<DWORD> ( &callback_array ),
instead?
It's just a question of which winnt.h (indirectly through windows.h) you are using. I'm using MSVC6 with original includes, possibly you have newer Platform SDK. This whole structure could be just replaced by "extern "C" DWORD _tls_used[6]; DWORD _tls_used[6] = ..." in order to avoid conflicts in the future.
However the callback never gets called for me. I tried to link with Multithreaded debug version of CRT.
I think that's because this structure is not placed in the .tls section. On my computer it's executed, possibly because I have a different OS (Windows Server 2003). We need to find a way to put the callback address in the .tls section using older compilers. B.

Bronek Kozicki wrote:
I think that's because this structure is not placed in the .tls section. On my computer it's executed, possibly because I have a different OS (Windows Server 2003). We need to find a way to put the callback address in the .tls section using older compilers.
As far as I know, the callback list does not go in .tls, but in .rdata (or I guess .data). .tls is only for initialized TLS data (I suppose it is like a TLS version of .data). Aaron W. LaFramboise
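The comparison with .data can be made concrete. For each new thread, the loader copies the initialized template delimited by StartAddressOfRawData and EndAddressOfRawData, then appends SizeOfZeroFill zero bytes. A minimal sketch of that step, assuming heap allocation for the block (the real loader manages this memory itself); make_tls_block is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build one thread's TLS block from the directory's template description:
   copy [raw_start, raw_end), then append size_of_zero_fill zero bytes.
   Returns a malloc'd block (caller frees) and its size via *out_size. */
static unsigned char *make_tls_block(const unsigned char *raw_start,
                                     const unsigned char *raw_end,
                                     size_t size_of_zero_fill,
                                     size_t *out_size)
{
    size_t init_size, total;
    unsigned char *block;

    assert(raw_start <= raw_end);
    init_size = (size_t)(raw_end - raw_start);
    total = init_size + size_of_zero_fill;

    block = malloc(total > 0 ? total : 1);
    if (block == NULL)
        return NULL;
    memcpy(block, raw_start, init_size);              /* like .data */
    memset(block + init_size, 0, size_of_zero_fill);  /* like .bss  */
    *out_size = total;
    return block;
}
```

The callback list plays no part in this copy, which fits the point above that it lives outside .tls.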

Aaron W. LaFramboise wrote:
As far as I know, the callback list does not go in .tls, but in .rdata (or I guess .data). .tls is only for initialized TLS data (I suppose it is like a TLS version of .data).
OK, but if you create __tls_used by hand, the .tls section is not created at all. That might be the reason why on some OSes the callback is not executed (however, on Windows Server 2003 it is). B.

Bronek Kozicki wrote:
Here is version working on MSVC6. It uses definition of IMAGE_TLS_DIRECTORY structure as defined in Platform SDK installed together with MSVC6 (later on it has been updated to contain just 6 DWORDs). I received this code from Holger Grund, tweaked it to run with old Platform SDK headers, added Beeps and second thread and finally tested with MSVC6. I think that definition of _tls_used (from Holger) is something new and useful :>
Just as a FYI, I now have a copy of MSVC6, and am working on this. MSVC6 does, in fact, have the necessary support, but there is a bug (I had noticed this before, and this was one of the reasons I wasn't able to offer more information a few months ago, and I had entirely forgotten about it. Oops.). Fortunately, the bug is in the runtime library, not in the linker or anything else. I'm trying to find an easy work around for this. I'll report back within 24 hours. As a last resort, the TLS directory (that pesky _tls_used symbol) can be redefined, but this is hackish, and on the edge of what I'd consider acceptable, and would need to be tested quite extensively to make sure it is compatible. For example, the suggestions so far for overriding the TLS directory will break __declspec(thread) and will probably cause random redefined symbol link failures in some situations. Aaron W. LaFramboise

"Aaron W. LaFramboise" wrote:
As a last resort, the TLS directory (that pesky _tls_used symbol) can be redefined, but this is hackish, and on the edge of what I'd consider acceptable, and would need to be tested quite extensively to make sure it is compatible. For example, the suggestions so far for overriding the TLS directory will break __declspec(thread) and will probably cause random redefined symbol link failures in some situations.
Here are some (hopefully) more specific details about the problems with the static library, using VC++ 7.1, Debug compilation, linking with the static multithreaded CRT.

1) The main program code does not really matter. In my case it is a dumb Win32 console exe that uses no Boost code, has no extra threads (except for the main process thread), and whose sole purpose is to call the foo() function in a static library and exit.

2) Besides implementing the foo() function called from main, the static library uses no Boost code, and the only requirement is to get:

void NTAPI tls_callback (PVOID, DWORD Reason, PVOID)
{
    ... whatever, some code where you can set a breakpoint
}

invoked when the main program thread exits.

So far I have found only two ways to hit the breakpoint set in tls_callback:

A) Put this code in the static library:

extern "C" int _tls_used;
int dummy() { return _tls_used; }

OR

B) Put this:

__declspec(thread) int ithread;

You don't have to create a thread, use the ithread variable, or do anything else in your static library code except implement foo() and tls_callback(). Either A or B will cause tls_callback to be called (the breakpoint will be hit) with VC++ 7.1, but if and only if the following is also part of the static library code:

#pragma data_seg(push, old_seg)
#pragma data_seg(".CRT$XLB")
DWORD tls_callback_ptr = (DWORD)tls_callback;
#pragma data_seg(pop, old_seg)

So, unless I made some gross mistake, it seems quite lame to me that Microsoft would document the TLS callback data in the PE format but not provide any documented way to define and declare TLS callbacks. Tony

Tony Juricic wrote:
Either A or B will cause tls_callback to be called (the breakpoint will be hit) with VC++ 7.1, but if and only if the following is also part of the static library code:
#pragma data_seg(push, old_seg) #pragma data_seg(".CRT$XLB")
Yes, it seems that this is the only way MSVC71 can append the address of your callback to __tls_used. The problem is that it does not work with MSVC6 :<< B.

On Sun, 01 Aug 2004 16:41:18 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
Just as a FYI, I now have a copy of MSVC6, and am working on this.
MSVC6 does, in fact, have the necessary support, but there is a bug (I had noticed this before, and this was one of the reasons I wasn't able to offer more information a few months ago, and I had entirely forgotten about it. Oops.). Fortunately, the bug is in the runtime library, not in the linker or anything else.
Yes, the bug is that the TLS handlers must be in a contiguous area between the __xl_a and __xl_z symbols. I fixed this by running a small piece of code during startup (in the __xi_a .. __xi_z area). Finally, I wrapped everything up into a small C file that can either be bound to Boost or be linked with the user application. Despite now having everything in a single file, I think Boost still should not give up the possibility of letting user code call the process/thread startup/termination hooks directly; there might always be some code that needs this. Thanks to Aaron, we now have a TLS solution that can handle any thread creation mechanism while still residing in a statically bound library. The tsstls.c file follows. To test it, compile your application with BOOST_THREAD_USE_LIB defined.
/* Boost Software License - Version 1.0 - August 17th, 2003 Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following: The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT.
IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. This piece of code is a result of the work of: Aaron W.LaFramboise, who showed how to implement TLS-callback Michael Glassford, who factored out the startup code Bronek Kozicki, who showed me, that it is not harmful to access the CRT after thread end Roland Schwarz, who did the writing, runtime initialization (.CRTXxx), correct dtor behaviour and broken MSVC 6 fix 08.02.2004 */ #include <stdlib.h> #define WIN32_LEAN_AND_MEAN #include <windows.h> typedef void (__cdecl *_PVFV)(void); typedef void (NTAPI* _TLSCB)(HINSTANCE, DWORD, PVOID); /* some symbols for connection to the runtime environment */ extern IMAGE_TLS_DIRECTORY _tls_used; /* the tls directory (located in .rdata segment) */ extern _TLSCB __xl_a[], __xl_z[]; /* tls initializers */ /* the boost tss startup interface */ extern void on_process_enter(void); extern void on_process_exit(void); extern void on_thread_exit(void); /* some forward declarations */ static void on_tls_prepare(void); static void on_process_init(void); static void NTAPI on_thread_callback(HINSTANCE, DWORD, PVOID); /* The .CRT$Xxx information is taken from Codeguru: */ /* http://www.codeguru.com/Cpp/misc/misc/threadsprocesses/article.php/c6945__2/ */ /* The tls glue code is to be run first */ /* I don't think it is necessary to run it */ /* at .CRT$XIB level, since we are only */ /* interested in thread detachement. But */ /* this could be changed easily if required. 
*/
#pragma data_seg(".CRT$XIU")
static _PVFV p_tls_prepare = on_tls_prepare;
#pragma data_seg()

/* we need to get control after all global ctors */
#pragma data_seg(".CRT$XCU")
static _PVFV p_process_init = on_process_init;
#pragma data_seg()

/* this is the TLS callback */
#pragma data_seg(".CRT$XLB")
_TLSCB p_thread_callback = on_thread_callback;
#pragma data_seg()

/* we will run the termination late */
#pragma data_seg(".CRT$XTU")
static _PVFV p_process_exit = on_process_exit;
#pragma data_seg()

static void on_tls_prepare(void)
{
    _TLSCB* pfbegin;
    _TLSCB* pfend;
    _TLSCB* pfdst;

    pfbegin = __xl_a;
    pfend = __xl_z;

    /* the following line has an important side effect: */
    /* if the TLS directory is not already there, it will */
    /* be created by the linker. (_tls_used) */
    pfdst = (_TLSCB*)_tls_used.AddressOfCallBacks;

    /* the following loop will merge the address pointers */
    /* into a contiguous area, since the tlssup code seems */
    /* to require this (at least on MSVC 6) */
    while (pfbegin < pfend)
    {
        if (*pfbegin != 0)
        {
            *pfdst = *pfbegin;
            ++pfdst;
        }
        ++pfbegin;
    }
}

static void on_process_init(void)
{
    /* This hooks the main thread exit. It will run the */
    /* termination before global dtors, but will not be run */
    /* when 'quick' exiting the library! However, this is the */
    /* standard behaviour for all global dtors anyways. */
    atexit(on_thread_exit);

    /* hand over to boost */
    on_process_enter();
}

void NTAPI on_thread_callback(HINSTANCE h, DWORD dwReason, PVOID pv)
{
    if (dwReason == DLL_THREAD_DETACH)
        on_thread_exit();
}

void tss_cleanup_implemented(void) {};
----EOF---
Roland

Oops, a small (yet not very harmful) bug in the tls_prepare loop: I forgot to write the end marker.

    while (pfbegin < pfend)
    {
        if (*pfbegin != 0)
        {
            *pfdst = *pfbegin;
            ++pfdst;
        }
        ++pfbegin;
    }
    *pfdst = 0; /* write the end marker of course! */

Roland
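The corrected on_tls_prepare loop can be exercised outside MSVC by modeling the .CRT$XLA..XLZ range as an ordinary array. A minimal sketch: compact_tls_callbacks, run_tls_callbacks and count_call are hypothetical names, the array stands in for the linker-merged section, and run_tls_callbacks imitates the loader walking the list up to the first null:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*tls_cb)(void);

/* Copy every non-null entry of [begin, end) down to dst and null-terminate.
   dst may alias the start of the range: the write position never passes
   the read position. dst needs room for (callback count + 1) entries. */
static size_t compact_tls_callbacks(tls_cb *begin, tls_cb *end, tls_cb *dst)
{
    size_t n = 0;
    assert(begin <= end);
    for (; begin < end; ++begin)
        if (*begin != NULL)
            dst[n++] = *begin;
    dst[n] = NULL;   /* the end marker added in the follow-up */
    return n;
}

/* Loader-like walk: call entries until the first null pointer. */
static void run_tls_callbacks(tls_cb const *list)
{
    while (*list != NULL)
        (*list++)();
}

static int calls = 0;
static void count_call(void) { ++calls; }   /* sample callback */
```

This also shows why the missing marker was "not very harmful": with the usual layout of one callback, [null][cb][null], compacting without the marker leaves [cb][cb][null], so the callback would simply run twice per detach.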

Roland wrote:
On Sun, 01 Aug 2004 16:41:18 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
Just as a FYI, I now have a copy of MSVC6, and am working on this.
MSVC6 does, in fact, have the necessary support, but there is a bug (I had noticed this before, and this was one of the reasons I wasn't able to offer more information a few months ago, and I had entirely forgotten about it. Oops.). Fortunately, the bug is in the runtime library, not in the linker or anything else.
Yes, the bug is that the TLS handlers must be in a contiguous area between the __xl_a and __xl_z symbols. I fixed this by running a small piece of code during startup (in the __xi_a .. __xi_z area).
I think that the problem is something else. The linker sorts everything correctly and puts it into a contiguous section. The problem is apparently that no one ever used the __xl_a code (until 7.1... what does this mean?) and so never noticed it's broken. The linker merges the sections like this:

.CRT$XLA
___xl_a:
        .long 0                  ; provided by tlssup.obj in the runtime
.CRT$XL?                         ; B through Y
        pointer_to_tls_callback
___xl_z:
.CRT$XLZ
        .long 0                  ; provided by tlssup.obj also

A relocation is generated that assigns ___xl_a to the TLS callback field of the TLS directory. The storage referred to by ___xl_z null-terminates the list, as specified by PECOFF. The trouble is that we actually want the callback field to point to ___xl_a + 4, not at ___xl_a itself, which is zero. The tlssup.obj that is part of MSVC6's runtime libraries gets this wrong, and so the TLS callback list pointed to by the TLS directory looks like this:

[null pointer][user-specified callback][null pointer]

In other words, whatever bit of the PE loader is responsible for calling the TLS callbacks hits that first null, thinks (correctly) that it is the end of the list, and never calls any of the callbacks. Apparently someone noticed this, and fixed it for MSVC7.1. If you try to use the MSVC6 tlssup.obj with MSVC7.1, you'll get the same broken behavior. (You can't do the reverse because the objects aren't backwards compatible.) In any case, the runtime fixup you mention appears to fix this, although it might be doing more work than it needs to (you just need to replace that first zero with something valid). I must admit I am slightly concerned about modifying a PE image at runtime to make it correct, for the same reason I am concerned with hooking in production code. It seems a little hackish, and it seems like it might cause surprising behavior.
The alternative is to provide an implementation of tlssup.obj that isn't broken, but this is also slightly hackish (although it does at least produce an image that is correct with no runtime fixups needed). I was hoping there might be some sort of way to tweak something or other to make the real MSVC6 tlssup.obj behave correctly, but there does not seem to be any way other than doing some sort of runtime fixup, or flat-out replacing the whole object. In any case, no sort of runtime fixup should be done on anything other than MSVC6, since later versions seem to get it right. On these versions, I think we really should be marking the callbacks const and using bss_seg rather than data_seg. This matches the behavior of the rest of the native TLS support, and I think is more likely to work in general. Also, on an unrelated point, is there any reason to use the .CRT$XC section directly rather than use a global class? They're really the same thing, but the entire .CRT section is undocumented, and not very well known. It seems unnecessary to depend upon that interface if there is no particular gain from using it over the well-defined interface. Aaron W. LaFramboise
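Aaron's explanation of the MSVC6 failure can be modelled in a few lines. This sketch assumes only what the PE/COFF specification says about the callback array, namely that it is terminated by a null pointer. It again uses a simplified callback type, and `run_tls_callbacks` is a hypothetical stand-in for the loader, not a Windows API. Starting the walk at __xl_a, as the MSVC6 tlssup.obj does, runs nothing. Starting one entry later (___xl_a + 4 on x86) runs the user callback.

```cpp
#include <cstddef>

// Simplified stand-in for PIMAGE_TLS_CALLBACK.
using tls_callback = void (*)(int reason);

// Model of the loader's walk: call each entry until the first null
// pointer, which marks the end of the list. Returns how many ran.
std::size_t run_tls_callbacks(const tls_callback* list, int reason) {
    std::size_t n = 0;
    for (; *list != nullptr; ++list, ++n) {
        (*list)(reason);
    }
    return n;
}
```

With the layout [null][callback][null], the first call in the test below returns immediately: the leading zero from __xl_a is taken as the terminator.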

On Sun, 01 Aug 2004 21:28:30 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
The trouble is that we actually want the callback field to point to ___xl_a + 4, not at ___xl_a itself, which is zero. The tlssup.obj that is part of MSVC6's runtime libraries gets this wrong, and so the TLS callback list pointed to by the TLS directory looks like this:
[null pointer][user-specified callback][null pointer]
In other words, whatever bit of the PE loader responsible for calling the TLS callbacks hits that first null, thinks (correctly) that it is the end of the list, and never calls any of the callbacks.
Unfortunately this does not seem to be enough. Even when the first entry is not zero (I tried setting it to a dummy stub), my callback allocated via .CRT$XLB does not get called; there are still a lot of zeroes in between. It seems as if the linker has a minimum size when emitting data segments.
In any case, the runtime fixup you mention appears to fix this, although it might be doing more work than it needs to (you just need to replace that first zero with something valid). I must admit I am slightly concerned about modifying a PE image at runtime to make it correct, for the same reason I am concerned with hooking in production code.
Hmm. Do I really modify the PE image? I am just modifying data that lies in the data segment. Isn't this ok? What are your concerns? And then: even Microsoft relies on modifying read-only segments in memory. I learned this at some point when I was interested in those "first-chance exceptions" that you sometimes see in the debugger but that never get through to your program. They are used to create a copy of the RO memory on the fly, so that it can be written. And yes, this has been seen in production code. But again, I don't think we are doing anything similarly ugly here. What I would be concerned about is that someone else may already have taken a reference to an item I am moving away. This, however, could easily be solved by providing a non-null no-op callback in place of the zeroes. But thread shutdown would take longer, of course.
It seems a little hackish, and it seems like it might cause surprising behavior. The alternative is to provide an implementation of tlssup.obj that isn't broken, but this is also slightly hackish (although it does at least produce an image that is correct with no runtime fixups needed).
I was hoping there might be some sort of way to tweak something or other to make the real MSVC6 tlssup.obj behave correctly, but there does not seem to be any way other than doing some sort of runtime fixup, or flat-out replacing the whole object.
Reading your post again, I am no longer sure if the linker really is doing the expected thing here. As it is now, holes of zeroes are normal, but the PE directory requires a contiguous list, you say? The .CRT$XIC startup code deals with the very same problem for the startup code in the following way:

    while ( pfa < pfz ) {
        if ( *pfa != 0 )
            (**pfa)();
        ++pfa;
    }

Doing the same in the tlssup code replacement would do a lot of unnecessary looping on every thread startup/shutdown. I cannot believe that this is intended (or desirable).
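For comparison, here is the shape of the hole-tolerant walk Roland quotes, as a portable sketch. `call_init_table` and `crt_init_fn` are hypothetical names, and the CRT's own routine is not reproduced here. A PE loader stops the TLS callback walk at the first zero entry. This walk instead treats a zero entry as a hole to skip. That is why it copes with gaps, and also why it must visit every slot between the bounds each time it runs.

```cpp
// Simplified type for entries in a .CRT$XI* / .CRT$XC* style table.
using crt_init_fn = void (*)();

// Call every non-null entry in [begin, end). Zero entries are holes
// (e.g. padding the linker may leave between contributions) and are
// skipped rather than treated as the end of the table.
void call_init_table(const crt_init_fn* begin, const crt_init_fn* end) {
    for (; begin < end; ++begin) {
        if (*begin != nullptr) {
            (*begin)();
        }
    }
}
```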
In any case, no sort of runtime fixup should be done on anything other than MSVC6, since later versions seem to get it right.
Agreed. This callback was never used in MSVC6, so it is a hack however you view it. Obviously we are the first ever to use it.
Also, on an unrelated point, is there any reason to use the .CRT$XC section directly rather than use a global class? They're really the same thing, but the entire .CRT section is undocumented, and not very well known. It seems unnecessary to depend upon that interface if there is no particular gain from using it over the well-defined interface.
Yes. We need to run after the last global c-tor has finished to be sure that all our thread_specific_ptr ctors have been called. This is because I rely on the well documented behaviour of the atexit function, which I use at this time to schedule the main-thread exit. This in turn causes it to run before any of the global dtors (e.g. thread_specific_ptr) get to run. This solves the wrong dtor ordering problem. If you know another well documented way to achieve this, I would prefer it, of course. BTW: the execution order of global ctors across translation units is (to my knowledge) not specified by the C++ standard. And then I think the method used is in no way less "documented" and "reliable" than the TLS callback. (Did you check out my link to Codeguru?) And then, the CRT relies on it to such an extent that it is unlikely to change (without being replaced by a more capable means). To summarize: I think you have found the second least hackish solution of all, using the TLS callback. My original proposal (using the piggyback DLL) uses _only_ documented APIs for implementation. But it looks ugly. I would vote for using the TLS callback, but commenting out the fixup code when used with MSVC7. And then: should there be any unforeseeable problems in the future, we can always revert to the piggyback DLL solution without (noticeable) effect for the user of the library. What do you think? Roland

Roland wrote:
On Sun, 01 Aug 2004 21:28:30 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
The trouble is that we actually want the callback field to point to ___xl_a + 4, not at ___xl_a itself, which is zero. The tlssup.obj that is part of MSVC6's runtime libraries gets this wrong, and so the TLS callback list pointed to by the TLS directory looks like this:
[null pointer][user-specified callback][null pointer]
In other words, whatever bit of the PE loader responsible for calling the TLS callbacks hits that first null, thinks (correctly) that it is the end of the list, and never calls any of the callbacks.
Unfortunately this does not seem to be enough. Even when the first entry is not zero (I tried setting it to a dummy stub), my callback allocated via .CRT$XLB does not get called; there are still a lot of zeroes in between. It seems as if the linker has a minimum size when emitting data segments.
This is odd. This doesn't happen for me. The only problem I have is the single leading 4-byte zero caused by the improper tlssup.obj code. As I mentioned, I can swap the correct and incorrect tlssup.obj's out, and see the problem come and go. I am not sure what PECOFF says, but it is the expected and well-known behavior that all sections with a $ in them are sorted and merged before being linked. I have no idea how zeroes could come to be in the middle.
In any case, the runtime fixup you mention appears to fix this, although it might be doing more work than it needs to (you just need to replace that first zero with something valid). I must admit I am slightly concerned about modifying a PE image at runtime to make it correct, for the same reason I am concerned with hooking in production code.
Hmm. Do I really modify the PE image? I am just modifying data that lies in the data segment. Isn't this ok? What are your concerns?
You are right; this sort of modification is probably not that big of a deal. I am just concerned that, since this data forms part of the PE format, a debugger or loader might rely on it not being altered at runtime.
Also, on an unrelated point, is there any reason to use the .CRT$XC section directly rather than use a global class? They're really the same thing, but the entire .CRT section is undocumented, and not very well known. It seems unnecessary to depend upon that interface if there is no particular gain from using it over the well-defined interface.
Yes. We need to run after the last global c-tor has finished to be sure that all our thread_specific_ptr ctors have been called. This is because I rely on the well documented behaviour of the atexit function, which I use at this time to schedule the main-thread exit. This in turn causes it to run before any of the global dtors (e.g. thread_specific_ptr) get to run. This solves the wrong dtor ordering problem.
Yes, you are right here. This does seem to be the best way to get this right.
And then: should there be any unforeseeable problems in the future, we can always revert to the piggyback DLL solution without (noticeable) effect for the user of the library.
I suppose once it's in CVS and a few testers run it, we'll see if anything major breaks. But I wonder if any of the testers are running, e.g., Windows 95. A lot of developers still care about users on Windows 95, but probably none are still using it. That OS is almost a decade old now, and is older than the copy of the PECOFF specification that I have. It would be good to have a confirmation that this works there. Another thing I am curious about is the case where Boost.Thread is statically linked to user code in a DLL. Will this TLS callback code still work there? Aaron W. LaFramboise
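The merge rule Aaron relies on is the one the PE/COFF specification gives for grouped sections. The linker drops the "$" and everything after it to find the image section. It then orders the contributions to that section lexically by their full object-section name. The small sketch below models just that ordering, using hypothetical helper names and ignoring alignment padding and the order of the image sections themselves. It shows why anything placed in .CRT$XLB ends up between __xl_a in .CRT$XLA and __xl_z in .CRT$XLZ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Image section an object section contributes to: the name up to the '$'.
std::string image_section(const std::string& obj_section) {
    return obj_section.substr(0, obj_section.find('$'));
}

// Order object-section contributions: grouped by image section (groups
// are simply sorted by name here, an arbitrary choice for the sketch),
// and lexically by full object-section name within each group.
std::vector<std::string> order_contributions(std::vector<std::string> names) {
    std::sort(names.begin(), names.end(),
              [](const std::string& a, const std::string& b) {
                  const std::string ga = image_section(a);
                  const std::string gb = image_section(b);
                  if (ga != gb) return ga < gb;
                  return a < b;
              });
    return names;
}
```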

"Roland" wrote :
Yes, the bug is that the TLS handlers must be in a contiguous area between the __xl_a and __xl_z symbols. I fixed this by running a small piece of code during startup (in the __xi_a .. __xi_z area).
I admit I wondered what this post:

<quote>
From: Jonathan Wilson (jonwil_at_tpgi.com.au)
Date: Mon Aug 18 2003 - 22:01:21 CDT

This is information I have figured out and found out about how Thread Local Storage (that is, variables declared with __declspec(thread)) works in Win32 from the perspective of the exe file (as opposed to the perspective of the kernel).
I am posting this to the ReactOS list since it needs support for this in its kernel code.
I am posting this to the WINE list since it also should support this from a kernel point of view. Plus it should (ideally) support __declspec(thread) in WineGCC at some point, if that is possible.
I am posting this to the MingW list since MingW needs __declspec(thread) support.

First, TLS in Visual C++ relies on:
1. the __declspec(thread) keyword & some special stuff the compiler does when accessing TLS variables,
2. the .tls segment in the PE file,
3. the IMAGE_TLS_DIRECTORY32 structure (pointed at by a field in the PE header and stored in the read only data segment) and the things it points at (specifically the TlsStart, TlsEnd and TlsIndex variables); TlsIndex is stored in the read/write data segment,
4. the tlssup.obj file (which is inside the Visual C++ runtime library .lib files like libc.lib and msvcrt.lib), and
5. a linker feature that will combine segments in a special way if they are named right (for example, .tls will be written first, then anything labeled .tls$, then .tls$zzz. All of them will be combined into one segment labeled .tls).

Firstly, when you declare a variable as __declspec(thread), the compiler accesses it in a special way.
It takes the current value in the TlsIndex variable. Then it takes the value stored at offset 2C in the TEB (which contains the pointer to the TLS data area for that thread); it then does TlsPointer + TlsIndex * 4. This is what it uses when it reads and writes Thread Local Storage variables. The first variable seems to be stored at (this address) + 4, then + 8 and so on (at (this address) is the _tls_start variable, as explained below).
Also, when you declare a __declspec(thread) variable, it's put into the obj file inside the .tls$ segment.
Most of the magic happens inside the linker. First, let's explain tlssup.obj. This contains an item called __tls_used that resides in the read only data segment, which then becomes the TLS directory pointed at by the PE header. This points at __tls_start, which is in the .tls segment, __tls_end, which is in the .tls$zzz segment, and __tls_index, which is in the read write data segment. There are also ___xl_a and ___xl_z; ___xl_a is pointed to by the thread callbacks pointer.
So, what we have in the resulting exe file is:
the __tls_used variable in the read only data segment,
the __tls_index variable in the read write data segment,
and the .tls segment containing:
    __tls_start
    //all the user declared __declspec(thread) variables
    __tls_end
And we also have ___xl_a, followed by pointers to the thread callback functions, followed by ___xl_z.
So, basically, to implement TLS in compilers, we need to:
1. make __declspec(thread) point at the thread local storage and get the compiler to access these variables correctly,
2. make the variables all end up in the correct place in the exe file, with the TLS directory pointing at the start and end of the list,
3. write the needed support code for the RTL (as needed),
4. make the callbacks (if needed) get generated & output to the exe file properly, and
5. make sure that the PE header points to the TLS directory.
If anyone has anything to add or wants more details, do reply. As for kernel-mode support in ReactOS or whatever, that seems easy enough to implement; anyone wanna have a go?
</quote>

really means in terms of implementation without Roland's code. However, I still find it hard to accept that all this hoopla for protecting a programmer who forgot to call thread_specific_ptr::reset() in his thread procedure is a serious programming issue worthy of such attention and so many lines of code.
Tony
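The access sequence in the quoted post can be modelled portably: a per-thread pointer array, indexed by the module's TLS index, plus the variable's offset in the .tls block. This is only a model under stated assumptions. `fake_teb` and `tls_variable_address` are hypothetical names, and on x86 the real pointer array is reached through offset 0x2C of the TEB rather than through a struct member. The point is that the same index and offset resolve to a different address on each thread.

```cpp
#include <cstddef>

// Model of the one TEB field the post relies on (hypothetical type;
// on x86 the real field sits at offset 0x2C of the TEB).
struct fake_teb {
    void** thread_local_storage_pointer;  // one TLS block per module, by index
};

// Address of a __declspec(thread) variable: pick this module's block
// for the current thread via its TLS index, then add the variable's
// offset inside the .tls template.
void* tls_variable_address(const fake_teb& teb, unsigned tls_index,
                           std::size_t offset) {
    char* block = static_cast<char*>(teb.thread_local_storage_pointer[tls_index]);
    return block + offset;
}
```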

Roland wrote:
Yes, the bug is that the TLS handlers must be in a contiguous area between the __xl_a and __xl_z symbols. I fixed this by running a small piece of code during startup (in the __xi_a .. __xi_z area).
Hello Roland, did you test this code with MSVC6? I'm trying it now (MSVC6 SP6, cl.exe version 12.00.8804), with no results :-( B.

Roland wrote:
On Sun, 01 Aug 2004 16:41:18 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
Just as a FYI, I now have a copy of MSVC6, and am working on this.
MSVC6 does, in fact, have the necessary support, but there is a bug (I had noticed this before, and this was one of the reasons I wasn't able to offer more information a few months ago, and I had entirely forgotten about it. Oops.). Fortunately, the bug is in the runtime library, not in the linker or anything else.
Yes, the bug is that the TLS handlers must be in a contiguous area between the __xl_a and __xl_z symbols. I fixed this by running a small piece of code during startup (in the __xi_a .. __xi_z area).
Finally I wrapped everything up into a small C file that either can be bound to boost or be linked with the user application. Despite now having everything in a single file, I think boost still should not give away the possibility of letting the user code call the process/thread startup/termination hooks directly. There always might be some code that needs this.
I'm not planning to remove them.
Thanks to Aaron, we now have a TLS solution that can handle any thread creation mechanism while still residing in a statically bound library.
The tsstls.c file follows. To test it, compile your application with BOOST_THREAD_USE_LIB defined.
[code snipped]
I must be missing something. I added the code you posted into the Boost.Threads static library and built it--no problem. But then when I link with that library, I get:

boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "struct _IMAGE_TLS_DIRECTORY32 _tls_used" (?_tls_used@@3U_IMAGE_TLS_DIRECTORY32@@A)
boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "void (__stdcall** __xl_z)(struct HINSTANCE__ *,unsigned long,void *)" (?__xl_z@@3PAP6GXPAUHINSTANCE__@@KPAX@ZA)
boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "void (__stdcall** __xl_a)(struct HINSTANCE__ *,unsigned long,void *)" (?__xl_a@@3PAP6GXPAUHINSTANCE__@@KPAX@ZA)
Debug6/boost_test.exe : fatal error LNK1120: 3 unresolved externals

In the code you posted, the missing symbols are all declared as extern; what do I need to do to get their actual definitions linked in? Mike

On Mon, 02 Aug 2004 23:07:38 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
I must be missing something. I added the code you posted into Boost.Threads static library and built it--no problem. But then when I link with that library, I get:
boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "struct _IMAGE_TLS_DIRECTORY32 _tls_used" (?_tls_used@@3U_IMAGE_TLS_DIRECTORY32@@A)
boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "void (__stdcall** __xl_z)(struct HINSTANCE__ *,unsigned long,void *)" (?__xl_z@@3PAP6GXPAUHINSTANCE__@@KPAX@ZA)
boost_cvs6sd.lib(threadmon.obj) : error LNK2001: unresolved external symbol "void (__stdcall** __xl_a)(struct HINSTANCE__ *,unsigned long,void *)" (?__xl_a@@3PAP6GXPAUHINSTANCE__@@KPAX@ZA)
Debug6/boost_test.exe : fatal error LNK1120: 3 unresolved externals
In the code you posted, the missing symbols are all declared as extern; what do I need to do to get their actual definitions linked in?
So far I have linked the file statically with my test programs. I will look into this. Could you please send me a copy of your current threadmon, so I can test it with my setup. Roland mailto:roland.schwarz@chello.at

On Mon, 02 Aug 2004 23:07:38 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
In the code you posted, the missing symbols are all declared as extern; what do I need to do to get their actual definitions linked in?
This is very strange. The symbols are all defined by the CRT. The _tls_used is the 'magic' symbol that seems to be created by the linker itself when needed. As far as we know, it is used to create the automatic TLS directory for PE. Nevertheless, I tried to modify my local copy of the threadmon and it worked. Here is what I did, in file threadmon.cpp:

#if defined(BOOST_MSVC) && (BOOST_MSVC >= 1310) //1310 == VC++ 7.1
    //As currently defined, the following is known
    //to work only for VC++ 7.1.
    //It is known not to work with VC 6.
    #include <libs/thread/src/pe_tls.ipp>
#elif defined(BOOST_MSVC) && (BOOST_MSVC >= 1200) //1200 == VC++ 6.0
    #include <libs/thread/src/pe6_tls.ipp>
#endif

The file pe6_tls.ipp is my original post, wrapped for C bindings:

#include <stdlib.h>
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
extern "C" {
... original posted file ...
};

Then I run bjam -s"TOOLS=msvc" --prefix=C:\ install. I verified that the correct version is getting called by stepping into the source file with the debugger. I hope this issue can be resolved. BTW: I am currently testing whether there are any problems when linking Boost.Threads statically to a user DLL, and whether there are any drawbacks/interferences with the compiler's __declspec(thread), as was mentioned by Aaron. Roland

On Wed, 04 Aug 2004 09:48:00 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Roland wrote:
#elif defined(BOOST_MSVC) && (BOOST_MSVC >= 1200) //1200 == VC++ 6.0
#include <libs/thread/src/pe6_tls.ipp>
What version of MSVC6 (service pack ?) you are using?
I was using a service pack before SP6, where it appeared to work. Now I have installed SP6 and it does not work any more, the same as in your observations. I will try to find out what is going on. Roland

On Wed, 4 Aug 2004 11:10:29 +0200 (W. Europe Daylight Time) Roland <roland.schwarz@chello.at> wrote:
On Wed, 04 Aug 2004 09:48:00 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Roland wrote:
#elif defined(BOOST_MSVC) && (BOOST_MSVC >= 1200) //1200 == VC++ 6.0
#include <libs/thread/src/pe6_tls.ipp>
What version of MSVC6 (service pack ?) you are using?
I was using a service pack before SP6, where it appeared to work. Now I have installed SP6 and it does not work any more, the same as in your observations.
I will try to find out what is going on.
So sorry, I was a little bit too fast. It still does work with MSVC6 SP6. I made a mistake in my test code which prevented the thread from dying, and consequently I got no callback. So I am still wondering why it does not work for you. I am running on a W2K machine. Could this be the reason? At least the callbacks come directly from within NTDLL.DLL. Roland

On Wed, 4 Aug 2004 08:56:42 +0200 (W. Europe Daylight Time) Roland <roland.schwarz@chello.at> wrote:
BTW.: I am currently testing if there are any problems when linking the boost threads statically to a user DLL,
My findings so far: Building a user DLL that is statically linked to Boost is, surprisingly, disallowed. I get the message:

#error: "Mixing a dll boost library with a static runtime is a really bad idea..."

Can anyone explain, please, why this should be bad? This simply prohibits shrink-wrapped user DLL code with embedded Boost. Are there any requirements of other parts of Boost that I am not aware of? Then: of course linking with the dynamic version is possible, as it was before. Just one observation: the current implementation also suffers from the wrong dtor order problem that Aaron mentioned. In the case of threadmon this is not harmful, thanks to a side effect of the thread_specific_ptr dtor calling delete on any still-bound objects, but I consider it somewhat 'unclean'. In DLL_PROCESS_DETACH, on_thread_exit is called for the main thread:

case DLL_PROCESS_DETACH:
    {
        on_thread_exit(); // this does nothing 'really useful'
        on_process_exit();
        break;
    }

which is a no-op in our case at best. If anyone ever tries to provide a user-level atexitthread (similar to atexit), it will fail badly because of its different semantics compared to atexit. atexit handlers normally run before any global dtors. This is not so in our case, because on_thread_exit is being called even after CRT deinitialization! At that point I think no user-level code should be called. This could be cleaned up by a similar approach as in the static version, where atexit is scheduled after the global ctors have finished. Roland

Building a user DLL that is statically linked to boost surprisingly is disallowed. I get the message: #error: "Mixing a dll boost library with a static runtime is a really bad idea..."
You will get that error if you are trying to use a Boost lib as a DLL when the runtime is statically linked... and that really is a bad idea, as the calling code and the lib will each have different runtimes. If you statically link Boost to a DLL that doesn't expose Boost to the outside world, then that should be fine, and should not produce the above message. John.

On Wed, 4 Aug 2004 11:02:05 +0100 John Maddock <john@johnmaddock.co.uk> wrote:
.... If you statically link Boost to a DLL that doesn't expose Boost to the outside world then that should be fine, and should not produce the above message.
This is exactly the case. I do not expose any of Boost, but I do get the message. But thank you, your information was helpful. I forgot to define BOOST_THREAD_USE_LIB before trying to statically link Boost into my DLL. Now it compiles. Roland

Roland wrote:
In the code you posted, the missing symbols are all declared as extern; what do I need to do to get their actual definitions linked in?
This is very strange. The symbols are all defined by the CRT. The _tls_used is
I think that Michael made the same mistake as I did: forgot to place these symbols in extern "C". The attached project compiles fine and runs on my computer. However, if I build your code as a separate static library, the TLS callback is not called :-( B.

Bronek Kozicki wrote:
Roland wrote:
In the code you posted, the missing symbols are all declared as extern; what do I need to do to get their actual definitions linked in?
This is very strange. The symbols are all defined by the CRT. The _tls_used is
I think that Michael made the same mistake as I did: forgot to place these symbols in extern "C".
Thanks, that did turn out to be the problem. It works fine, now, which is really good news. Mike

On Wed, 04 Aug 2004 08:51:35 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Thanks, that did turn out to be the problem. It works fine, now, which is really good news.
Fine. I am currently trying to unify both versions (7 and 6). This also seems to be necessary because Aaron's code, while paving the way, still has some minor problems (also on VC7): While it is very interesting that mainCRTStartup is not the very first function of the executable that receives control, this may be problematic in our case. I doubt that the threading library is prepared to be the very first in town (not even the EXE's entry point has been called yet!), so I think it is much safer to initialize everything after the runtime system is up, but before main. BTW: I found the following in the documentation of the CRT:

;***
;defsects.inc - defines sections.
;
;       Copyright (c) 1989-1997, Microsoft Corporation. All rights reserved.
;
;Purpose:
;       This file defines sections for the C and C++ libs.
;
;       NOTE: As needed, special "CRT" sections can be added into the existing
;       init/term tables.  These will be for our use only -- users who put
;       stuff in here do so at their own risk.
;
;******************************************************************************

and then

;   XIA   Begin C Initializer Sections
;   XIC   Microsoft Reserved
;   XIU   User
;   XIZ   End C Initializer Sections

and so on. I believe using the XnU sections may therefore be regarded as rather safe usage. Also, I would like to propose the following: instead of threadmon.cpp including the pe_tls.ipp file, we should simply leave the static part empty. Then we put into the library a pe_tls.c file which exports

void on_process_enter(void);
void on_process_exit(void);
void on_thread_exit(void);

and

void tss_cleanup_implemented(void);

When the user code provides its own implementation of these functions, she overrides the Boost.Thread library versions. This will not only increase compatibility with the current status but also provide a default implementation. What do you think about this? I will try this out and post something when ready.
Also, I will be a little more conservative about the meaning of _tls_used, as Aaron pointed out. It is not really necessary to access any of its fields directly. BTW: did you ever consider providing the user-level function atthreadexit that W. Kempf seems to have planned? Roland

Roland wrote:
On Wed, 04 Aug 2004 08:51:35 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Thanks, that did turn out to be the problem. It works fine, now, which is really good news.
Fine.
I am currently trying to unify both versions (7 and 6). This also seems to be necessary because Aaron's code, while paving the way, still has some minor problems (also on VC7):
While it is very interesting that mainCRTStartup is not the very first function of the executable that receives control, this may be problematic in our case.
I doubt that the threading library is prepared to be the very first in town (not even the EXE's entry point has been called yet!), so I think it is much safer to initialize everything after the runtime system is up, but before main.
Having experimented a bit and thought a bit, here's my take on how this could work:
1) Functions on_process_enter() and on_thread_enter() are unnecessary.
2) DllMain() and tls_callback() are needed only to call on_thread_exit().
3) Function at_thread_exit() schedules a tss cleanup task that needs to be run for a thread that uses tss. The first time it is executed it also calls:

    atexit(on_process_exit);
    atexit(on_thread_exit);

to schedule cleanup for the main thread before destructors of global objects are run and to schedule on_process_exit() (which calls TlsFree()).
4) In addition, to provide some cleanup support for compilers that don't yet have tls_callback() support, the thread class continues to call on_thread_exit() after the thread function exits for threads created by Boost.Threads.
This should make unification of the VC++ 7.1 and 6 versions easier; the only difference would be the startup code needed to call on_tls_prepare(), I believe. Comments? Also, I was wondering: has anyone tried this on VC++ 7.0? Mike

On Wed, 04 Aug 2004 10:50:47 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Having experimented a bit and thought a bit, here's my take on how this could work:
1) Functions on_process_enter() and on_thread_enter() are unnecessary.
I am not convinced about on_process_enter. This could be the place to allocate the slot without the need for any locking. (and the call to atexit)
2) DllMain() and tls_callback() are needed only to call on_thread_exit().
I agree on that.
3) Function at_thread_exit() schedules a tss cleanup task that needs to be run for a thread that uses tss. The first time it is executed it also calls:
atexit(on_process_exit); atexit(on_thread_exit);
to schedule cleanup for the main thread before destructors of global objects are run and to schedule on_process_exit() (which calls TlsFree()).
I have some reservations about how you will make sure that the first call is in the context of the main thread. If not, the call to atexit might be erroneous. (Is atexit really multithread safe?) This is why I schedule the call to atexit within .CRT$XCU. (BTW, the initialization of the slot could very well also be done from this place; the choice to do it earlier was only to have the slot well before any constructors were run.)
4) In addition, to provide some cleanup support for compilers that don't yet have tls_callback() support, the thread class continues to call on_thread_exit() after the thread function exits for threads created by Boost.Threads.
Yes, this is very reasonable. I would even suggest that this be the default behaviour, and that the callback be used only when the thread is foreign. However, this will require some (thread-specific) flag that the tls_callback implementation can interrogate to find out whether Boost has already called on_thread_exit. (Or see the comment further below.)
This should make unification of the VC++ 7.1 and 6 versions easier; the only difference would be the startup code needed to call on_tls_prepare(), I believe.
I think your original semantics of the three:
void on_process_enter(void);
void on_process_exit(void);
void on_thread_exit(void);
is already perfect:
int main()
{
    on_process_enter();
    ...
    on_process_exit();
}
and calling on_thread_exit from the end of the thread proc. I personally would like to see the tls implementation exposing only these and relying on the compiler-specific files to do the rest. I can even think of on_thread_exit having an internal one-shot flag which simply prevents it from running multiple times in case, e.g., Boost.Thread and the callback (or user code) call it multiple times. What do you think? BTW: Unification is not so hard at all, given the current version. I am just dealing with the different semantics of the initializers. Newer ones simply have a return value which may abort initialization prematurely. But this is rather trivial to solve.
Comments?
Also, I was wondering: has anyone tried this on VC++ 7.0?
Not yet. But I still have a version around on my machine. What still puzzles me is that there were reports that the solution does not work with __declspec(thread). I will dive into this issue (tomorrow), and also into static binding to a user DLL, which has yet to be tested. Roland

Below I am posting my version that worked successfully with MSVC6 (SP6 and below) and MSVC7.1. I put this file (pe_tls.cpp) in the src directory and added it to the Jamfile. Both compilers did well, and testing was successful. User code still needs BOOST_THREAD_USE_LIB. Could you please check whether I set the conditions correctly so things won't break for other compilers? File pe_tls.cpp follows:
---pe_tls.cpp---
/*
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:
The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This piece of code is a result of the work of:
  Aaron W. LaFramboise, who showed how to implement the TLS callback
  Michael Glassford, who factored out the startup code
  Bronek Kozicki, who showed me that it is not harmful to access the CRT after thread end
  Roland Schwarz, who did the writing, runtime initialization (.CRT$Xxx), correct dtor behaviour and the broken MSVC 6 fix
08.02.2004
*/
#include <boost/thread/detail/config.hpp>
#if defined(BOOST_HAS_WINTHREADS) && defined(BOOST_THREAD_BUILD_LIB) && defined(BOOST_MSVC)
/* although this file is essentially C we need to compile it as CPP because */
/* the config.hpp assumes this */
extern "C" {
#include <stdlib.h>
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#if (BOOST_MSVC < 1310) // 1310 == VC++ 7.1; version 7.0 has yet to be evaluated!
typedef void (__cdecl *_PVFV)(void);
#define INIRETSUCCESS
#define PVAPI void
#else
typedef int (__cdecl *_PVFV)(void);
#define INIRETSUCCESS 0
#define PVAPI int
#endif
typedef void (NTAPI* _TLSCB)(HINSTANCE, DWORD, PVOID);
/* some symbols for connection to the runtime environment */
extern DWORD _tls_used;            /* the tls directory (located in .rdata segment) */
extern _TLSCB __xl_a[], __xl_z[];  /* tls initializers */
/* the boost tss startup interface */
extern void on_process_enter(void);
extern void on_process_exit(void);
extern void on_thread_exit(void);
/* some forward declarations */
static PVAPI on_tls_prepare(void);
static PVAPI on_process_init(void);
static PVAPI on_process_term(void);
static void NTAPI on_thread_callback(HINSTANCE, DWORD, PVOID);
/* The .CRT$Xxx information is taken from Codeguru: */
/* http://www.codeguru.com/Cpp/misc/misc/threadsprocesses/article.php/c6945__2/ */
/* The tls glue code is to be run first. */
/* I don't think it is necessary to run it */
/* at .CRT$XIB level, since we are only */
/* interested in thread detachment. But */
/* this could be changed easily if required. */
#pragma data_seg(".CRT$XIU")
static _PVFV p_tls_prepare = on_tls_prepare;
#pragma data_seg()
/* we need to get control after all global ctors */
#pragma data_seg(".CRT$XCU")
static _PVFV p_process_init = on_process_init;
#pragma data_seg()
/* this is the TLS callback */
#pragma data_seg(".CRT$XLB")
_TLSCB p_thread_callback = on_thread_callback;
#pragma data_seg()
/* we will run the termination late */
#pragma data_seg(".CRT$XTU")
static _PVFV p_process_term = on_process_term;
#pragma data_seg()
static PVAPI on_tls_prepare(void)
{
    /* the following line has an important side effect: */
    /* if the TLS directory is not already there, it will */
    /* be created by the linker. (_tls_used) */
    /* volatile should prevent the optimizer from removing the reference */
    DWORD volatile dw = _tls_used;
#if (BOOST_MSVC < 1310) // 1310 == VC++ 7.1
    _TLSCB* pfbegin;
    _TLSCB* pfend;
    _TLSCB* pfdst;
    pfbegin = __xl_a;
    pfend = __xl_z;
    pfdst = pfbegin;
    /*pfdst = (_TLSCB*)_tls_used.AddressOfCallBacks; */
    /* the following loop will merge the address pointers */
    /* into a contiguous area, since the tlssup code seems */
    /* to require this (at least on MSVC 6) */
    while (pfbegin < pfend)
    {
        if (*pfbegin != 0)
        {
            *pfdst = *pfbegin;
            ++pfdst;
        }
        ++pfbegin;
    }
    *pfdst = 0;
#endif
    return INIRETSUCCESS;
}
static PVAPI on_process_init(void)
{
    /* This hooks the main thread exit. It will run the */
    /* termination before global dtors, but will not be run */
    /* when 'quick' exiting the library! However, this is the */
    /* standard behaviour for all global dtors anyways. */
    atexit(on_thread_exit);
    /* hand over to boost */
    on_process_enter();
    return INIRETSUCCESS;
}
static PVAPI on_process_term(void)
{
    on_process_exit();
    return INIRETSUCCESS;
}
void NTAPI on_thread_callback(HINSTANCE h, DWORD dwReason, PVOID pv)
{
    if (dwReason == DLL_THREAD_DETACH)
        on_thread_exit();
}
void tss_cleanup_implemented(void) {}
} // extern "C"
#endif // BOOST_HAS_WINTHREADS
---EOF---
Roland

I thought I had sent this yesterday, but it appears that I didn't. Though it's a little late, here it is anyway. Roland wrote:
On Wed, 04 Aug 2004 10:50:47 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Having experimented a bit and thought a bit, here's my take on how this could work:
1) Functions on_process_enter() and on_thread_enter() are unnecessary.
I am not convinced about on_process_enter. This could be the place to allocate the slot without the need for any locking. (and the call to atexit)
I'm not worried about locking. Locking is needed for at_thread_exit(), since it can be called from multiple threads simultaneously. Since locking will already be necessary there, it makes sense to use it for the on_process_*() and on_thread_*() functions, especially since they are also available for users to call and we can't guarantee they will be called in a thread-safe manner, either.
2) DllMain() and tls_callback() are needed only to call on_thread_exit().
I agree on that.
3) Function at_thread_exit() schedules a tss cleanup task that needs to be run for a thread that uses tss. The first time it is executed it also calls:
atexit(on_process_exit); atexit(on_thread_exit);
to schedule cleanup for the main thread before destructors of global objects are run and to schedule on_process_exit() (which calls TlsFree()).
I have some reservations about how you will make sure that the first call is in the context of the main thread. If not, the call to atexit might be erroneous. (Is atexit really multithread safe?)
Good point. I'll make that modification.
This is why I schedule the call to atexit within .CRT$XCU. (BTW, the initialization of the slot could very well also be done from this place; the choice to do it earlier was only to have the slot well before any constructors were run.)
I don't think that's a big problem either if locking is assumed. It makes as much sense to allocate the slot when it is first used; that way if Boost.Threads tss is never used a tls slot is never allocated.
4) In addition, to provide some cleanup support for compilers that don't yet have tls_callback() support, the thread class continues to call on_thread_exit() after the thread function exits for threads created by Boost.Threads.
Yes, this is very reasonable. I would even suggest that this be the default behaviour, and that the callback be used only when the thread is foreign. However, this will require some (thread-specific) flag that the tls_callback implementation can interrogate to find out whether Boost has already called on_thread_exit. (Or see the comment further below.)
I'm assuming that on_thread_exit can be called multiple times per thread (as it can with the current implementation). The thread-specific flag is the tls slot's value: if it's zero, no cleanup is needed.
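Mike's point that the slot's own value can double as the "already cleaned up" flag, together with allocating the slot lazily on first use, can be modelled portably. This is a sketch under stated assumptions, not Boost.Threads code: a `thread_local` pointer stands in for the value stored with TlsSetValue, a function-local static stands in for a one-time, lock-protected TlsAlloc, and every name in `tss_model` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

namespace tss_model {

using CleanupList = std::vector<std::function<void()>>;

int slot_allocations = 0;                        // counts the simulated "TlsAlloc"
thread_local CleanupList* slot_value = nullptr;  // per-thread "slot value"; null = nothing to do

// Lazily "allocate" the slot on first use. Initialization of a function-local
// static is thread-safe in C++11, which plays the role of the lock.
void ensure_slot() {
    static const int slot = ++slot_allocations;
    (void)slot;
}

// Schedule a cleanup task for the calling thread.
void at_thread_exit(std::function<void()> task) {
    ensure_slot();
    if (slot_value == nullptr)
        slot_value = new CleanupList;
    slot_value->push_back(std::move(task));
}

// Safe to call any number of times per thread: a null slot value means
// nothing was scheduled or cleanup has already run.
void on_thread_exit() {
    CleanupList* list = slot_value;
    if (list == nullptr)
        return;
    slot_value = nullptr;  // clear before running tasks so re-entry is a no-op
    for (auto& task : *list)
        task();            // run in registration order
    delete list;
}

} // namespace tss_model
```

Calling `on_thread_exit()` once from the thread class and again from the TLS callback then runs each task exactly once, without a separate one-shot flag.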
This should make unification of the VC++ 7.1 and 6 versions easier; the only difference would be the startup code needed to call on_tls_prepare(), I believe.
I think your original semantics of the three:
void on_process_enter(void);
void on_process_exit(void);
void on_thread_exit(void);
is already perfect:
int main()
{
    on_process_enter();
    ...
    on_process_exit();
}
and calling from the end of the thread proc: on_thread_exit.
I personally would like to see the tls implementation exposing only these and relying on the compiler-specific files to do the rest. I can even think of on_thread_exit having an internal one-shot flag which simply prevents it from running multiple times in case, e.g., Boost.Thread and the callback (or user code) call it multiple times.
As I said, the tls slot itself essentially does this.
What do you think?
BTW.: Unification is not so hard at all, given the current version. I am just dealing with the different semantics of the initializers. Newer ones simply have a return value which may abort initialization prematurely. But this is rather trivial to solve for.
OK.
Comments?
Also, I was wondering: has anyone tried this on VC++ 7.0?
Not yet. But I still have a version around on my machine. What still puzzles me is that there were reports that the solution does not work with __declspec(thread). I will dive into this issue (tomorrow), and also into static binding to a user DLL, which has yet to be tested.
Let me know what you find out about both of these. Thanks, Mike

Michael Glassford wrote:
I think that Michael made the same mistake as I did: he forgot to place these symbols in extern "C".
Thanks, that did turn out to be the problem. It works fine, now, which is really good news.
What happens if you compile the whole solution into a separate .lib (static library), then link this library into some other project using TLS and threads? This did not work for me (under MSVC6)... B.

On Wed, 4 Aug 2004 15:33:36 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Michael Glassford wrote:
I think that Michael made the same mistake as I did: he forgot to place these symbols in extern "C".
Thanks, that did turn out to be the problem. It works fine, now, which is really good news.
What happens if you compile the whole solution into a separate .lib (static library), then link this library into some other project using TLS and threads? This did not work for me (under MSVC6)...
What did you observe? Roland

Roland wrote:
What happens if you compile the whole solution into a separate .lib (static library), then link this library into some other project using TLS and threads? This did not work for me (under MSVC6)...
What did you observe?
Callbacks are not called. B.

"Bronek Kozicki" <brok@rubikon.pl> wrote in message news:01d401c47a2f$ecd414b0$3000000a@integral.int...
Roland wrote:
What happens if you compile the whole solution into a separate .lib (static library), then link this library into some other project using TLS and threads? This did not work for me (under MSVC6)...
What did you observe?
Callbacks are not called.
Excuse me for jumping into the middle of the conversation. If this is the case, could it be that the linker discards the symbols as they are not obviously referenced by the .exe code? Could you try to force the librarian to reference those symbols (under VC7.1 it's the /INCLUDE: option for the lib - don't have access to VC6 at the moment). // Johan

Johan Nilsson wrote:
If this is the case, could it be that the linker discards the symbols as they are not obviously referenced by the .exe code? Could you try to force the librarian to reference those symbols (under VC7.1 it's the /INCLUDE: option for the lib - don't have access to VC6 at the moment).
Bingo! I've added to test project: #pragma comment(linker, "/include:_p_thread_callback") and it works now. I'm attaching MSVC6 project for others to try it :) Last thing that is working not quite as expected is that if I link to dynamic runtime CRT (/MD or /MDd), on_process_exit is not called. I will look into it later. B.

On Thu, 05 Aug 2004 10:07:47 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Johan Nilsson wrote:
If this is the case, could it be that the linker discards the symbols as they are not obviously referenced by the .exe code? Could you try to force the librarian to reference those symbols (under VC7.1 it's the /INCLUDE: option for the lib - don't have access to VC6 at the moment).
Bingo! I've added to test project: #pragma comment(linker, "/include:_p_thread_callback") and it works now. I'm attaching MSVC6 project for others to try it :)
This can be done more easily. The problem is that you are using this bit of code slightly off topic. However, as you have already discovered, you somehow need to convince the linker to link in the lib. Since you are not referencing any exported symbols, the lib is not linked in. Without #pragma trickery this can be done in the following way:
extern "C" void tss_cleanup_implemented(void);
int main(int argc, char* argv[])
{
    tss_cleanup_implemented(); // just a dummy call to link the lib in
    printf("entering main\n");
    DWORD id;
    HANDLE h = CreateThread(NULL, 0, &f, 0, 0, &id);
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    _dummy = 0;
    printf("leaving main\n");
    return _dummy;
}
Last thing that is working not quite as expected is that if I link to dynamic runtime CRT (/MD or /MDd), on_process_exit is not called. I will look into it later.
In the meantime I have done all these sorts of tests. I would be glad if you could confirm the results. I will post the details in a separate mail. However, you will need to tweak your local copy of Boost, since it seems Mike has not yet had time to check everything into CVS. Roland

On Thu, 5 Aug 2004 10:43:05 +0200 (W. Europe Daylight Time) Roland <roland.schwarz@chello.at> wrote:
Last thing that is working not quite as expected is that if I link to dynamic runtime CRT (/MD or /MDd), on_process_exit is not called. I will look into it later.
Just being curious: how does _RTLDLL get set? I experimented with
#ifdef _RTLDLL
#error "DLL RT"
#else
#error "STATIC RT"
#endif
No matter which runtime I select, it never seems to be defined. Roland
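One likely explanation is that `_RTLDLL` is Borland C++'s macro, while MSVC signals the DLL runtime by defining `_DLL` under /MD and /MDd. The probe below is a hedged sketch along the lines of Roland's experiment, testing `_DLL` instead and returning a string rather than failing the build; `crt_linkage` is a hypothetical helper, and on a non-MSVC compiler it simply says so.

```cpp
#include <cassert>

// Report which C runtime this translation unit is compiled against.
// Assumption: MSVC defines _DLL for /MD and /MDd; the static runtimes
// (/ML, /MT and their debug variants) leave it undefined.
const char* crt_linkage() {
#if defined(_MSC_VER) && defined(_DLL)
    return "MSVC, DLL runtime (/MD or /MDd)";
#elif defined(_MSC_VER)
    return "MSVC, static runtime";
#else
    return "not MSVC";
#endif
}
```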

"Roland" <roland.schwarz@chello.at> wrote in message news:20040805083842.XVAN21832.viefep17-int.chello.at@speedsnail...
On Thu, 05 Aug 2004 10:07:47 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
[...]
This can be done more easily. The problem is that you are using this bit of code slightly off topic. However, as you have already discovered, you somehow need to convince the linker to link in the lib. Since you are not referencing any exported symbols, the lib is not linked in.
Without #pragma trickery this can be done in the following way:
extern "C" void tss_cleanup_implemented(void);
int main(int argc, char* argv[])
{
    tss_cleanup_implemented(); // just a dummy call to link the lib in
    printf("entering main\n");
    DWORD id;
    HANDLE h = CreateThread(NULL, 0, &f, 0, 0, &id);
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    _dummy = 0;
    printf("leaving main\n");
    return _dummy;
}
From the end-user point of view I'd still prefer the "/include" method, if possible. BTW, you guys have performed some great work here. I'm just a bit surprised that none of the MS lurkers has had anything (technical) to say about this. // Johan

Roland wrote:
This can be done more easily. The problem is that you are using this bit of code slightly off topic
That's true. I'm used to testing small pieces of code (I call them "solutions", but it has nothing to do with the Visual Studio thing) separately from Boost, in order to allow anybody to see how they work without needing to grab the Boost tarball. That's actually also for my own convenience, because my INCLUDE and LIB paths are set to the last release, not the development version of Boost. B.

Roland wrote:
...since it seems Mike did not yet have time to check everything into the CVS.
Sorry, I did not. I did a lot of cleaning and rearranging and now have to get things working again. I haven't had enough time to do it yet, though I hope to soon. Mike

On Thu, 05 Aug 2004 10:15:27 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Roland wrote:
...since it seems Mike did not yet have time to check everything into the CVS.
Sorry, I did not. I did a lot of cleaning and rearranging and now have to get things working again. I haven't had enough time to do it yet, though I hope to soon.
No need to apologize. You are doing a very good job anyway. Just in case my latest post slipped through: I posted a complete boostified version that already compiles in the bjam environment under the new subject: [boost] [boost.thread] Static Library support for Windows
Thanks, Roland

Roland wrote: I just noticed that in the code you posted, you wrote:
static void on_process_init(void)
{
    /* This hooks the main thread exit. It will run the */
    /* termination before global dtors, but will not be run */
    /* when 'quick' exiting the library! However, this is the */
    /* standard behaviour for all global dtors anyways. */
    atexit(on_thread_exit);
    /* hand over to boost */
    on_process_enter();
}
Just to make certain, should that read:
    atexit(on_process_exit);
              ^^^^^^^
or was the point to call on_thread_exit() sooner for the main thread? If that is the case, should both
    atexit(on_thread_exit);
    atexit(on_process_exit);
be included, in your opinion? Mike

On Wed, 04 Aug 2004 09:25:43 -0400 Michael Glassford <glassfordm@hotmail.com> wrote:
Just to make certain, should that read:
    atexit(on_process_exit);
              ^^^^^^^
No. on_thread_exit is correct.
or was the point to call on_thread_exit() sooner for the main thread? If that is the case, should both
    atexit(on_thread_exit);
    atexit(on_process_exit);
While this could be done here, I took a different route which I think is a little cleaner. atexit will not run when the user does a quick exit of the program, which deliberately omits calls to the global destructors. This is standard behaviour of the library. So we take part in this, because the main purpose of on_thread_exit is to call the destructors of the TSS globals. On the other hand, we don't want to miss on_process_exit, so I am deferring it to the section which will run even when the user is quick-exiting. If you think this is not reasonable, it should be changed, of course. Roland

Aaron W. LaFramboise wrote:
it is compatible. For example, the suggestions so far for overriding the TLS directory will break __declspec(thread) and will probably cause random redefined symbol link failures in some situations.
That's true. It would be much better to be able to insert a callback into the __tls_used created by the compiler instead of creating our own __tls_used. B.

On Mon, 02 Aug 2004 08:46:32 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Aaron W. LaFramboise wrote:
it is compatible. For example, the suggestions so far for overriding the TLS directory will break __declspec(thread) and will probably cause random redefined symbol link failures in some situations.
That's true. It would be much better to be able to insert a callback into the __tls_used created by the compiler instead of creating our own __tls_used
I have not yet tested this, but are we overriding _tls_used when simply referencing it? We only rely on the linker's ability to define it when needed. Roland

Roland wrote:
I have not yet tested this, but are we overriding _tls_used when simply referencing it? We only rely on the linker's ability to define it when needed.
We are not. But Holger Grund proposed a solution that worked on MSVC6 (at least on Windows Server 2003), and that solution was just defining the symbol _tls_used instead of asking the compiler to create it. The EXE did not contain a .tls section at all in this case. B.

On Wed, 04 Aug 2004 08:56:51 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
Roland wrote:
I have not yet tested this, but are we overriding _tls_used when simply referencing it? We only rely on the linker's ability to define it when needed.
We are not. But Holger Grund proposed a solution that worked on MSVC6 (at least on Windows Server 2003), and that solution was just defining the symbol _tls_used instead of asking the compiler to create it. The EXE did not contain a .tls section at all in this case.
I am a little afraid that, by defining _tls_used ourselves, we could mess up the compiler's ability to handle __declspec(thread) correctly. Have you done any testing on this so far? Roland

Roland wrote:
I am a little afraid that, by defining _tls_used ourselves, we could mess up the compiler's ability to handle __declspec(thread) correctly. Have you done any testing on this so far?
Yes, with the results you are expecting: a linker error due to symbol redefinition :/ Thus I think we should not do it, and should stay with the _tls_used defined by the compiler. B.

Roland wrote:
On Wed, 04 Aug 2004 08:56:51 +0200 Bronek Kozicki <brok@rubikon.pl> wrote:
We are not. But Holger Grund proposed a solution that worked on MSVC6 (at least on Windows Server 2003), and that solution was just defining the symbol _tls_used instead of asking the compiler to create it. The EXE did not contain a .tls section at all in this case.
I am a little afraid that, by defining _tls_used ourselves, we could mess up the compiler's ability to handle __declspec(thread) correctly. Have you done any testing on this so far?
To clarify the situation from my perspective: The PECOFF spec simply says that (on x86) _tls_used will be defined if static TLS is being used. What it doesn't say is that _tls_used is actually the TLS directory itself. The system is set up so that absolutely no special linker magic is needed for static TLS, with the sole exception that if the linker notices the _tls_used symbol, it will cause the TLS directory pointer to refer to it. _tls_used is an object of the TLS directory type, usually defined in .rdata. On Microsoft implementations, it is provided by tlssup.obj within the runtime library, which uses normal features of linkers and PE to arrange for the TLS directory to contain the necessary pointers. tlssup.obj also provides a few other external TLS symbols. The MSVC6 problem is that there's a trivial bug in tlssup.obj. (Roland also mentions a linker problem, but I have been unable to reproduce this myself.) If you duplicate the symbols in tlssup.obj exactly, with the exception of the bug fix, it makes everything work perfectly. (If the duplication is not exact, it will break __declspec(thread), cause link-time errors, or other unexpected things.) In summary, duplicating tlssup.obj (which is trivial, especially because the Microsoft tlssup.obj has debugging information) and providing the bug fix for MSVC6 forms an alternative solution to the MSVC6 problem, as opposed to Roland's runtime fixup. I have no particular technical preference either way; neither solution is particularly clean, but neither seems particularly harmful if done right. Aaron W. LaFramboise

On Wed, 04 Aug 2004 03:19:17 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
In summary, duplicating tlssup.obj (which is trivial, especially because the Microsoft tlssup.obj has debugging information) and providing the bug fix for MSVC6 forms an alternate solution to the MSVC6 problem, opposed to Roland's runtime fixup.
Could you please send me a copy of your tlssup.c code? Unfortunately this piece of code is missing from the CRT source that is shipped with the compiler. mailto: roland.schwarz@chello.at I agree with your description of the tls functions, with one minor remark. It seems as if the linker is used to 'assemble' the .CRT$XLx segments. Good. From your description it follows that the linker always keeps these segments as small as possible, i.e. it does not append any zeroes in between the segments. However, this is not what I am observing. For me it appears as if each segment has a minimum size, which is preset with zeroes. When pointers are then filled in, of course there are zeroes between these segments. Could it be that you are using some linker switches that cause the segments to be trimmed? Roland

Roland wrote:
Could you please send me a copy of your tlssup.c code? Unfortunately this piece of code is missing from the CRT source that is shipped with the compiler. mailto: roland.schwarz@chello.at
Give me about 12 hours to produce this in a reasonable form. I have not yet written it, but have little doubt that this is trivial.
It seems as if the linker is used to 'assemble' the .CRT$XLx segments. Good. From your description it follows that the linker always keeps these segments as small as possible, i.e. it does not append any zeroes in between the segments.
However, this is not what I am observing. For me it appears as if each segment has a minimum size, which is preset with zeroes. When pointers are then filled in, of course there are zeroes between these segments.
Could it be that you are using some linker switches that cause the segments to be trimmed?
I have not invoked the linker directly, only through CL. I have verified the behavior I mentioned earlier using a PE dumper and a hex editor. There are some options for specifying alignment of sections, but note this is for linker output, not input. Input sections are aligned according to the section symbol, which for .tls$ in my test code for both MSVC6 and MSVC7.1 is 2**2, on dword boundaries. If somehow the alignment was being set by the compiler/assembler to something higher than this, this could account for zeroes. But nothing I have suggests that this is happening. It's also possible that the compiler (or its assembler machinery) is padding variables for some reason, but I don't know why that would be happening in this case. If you have some specific test case where this happens for you, let me know and I'll see exactly what output it gives me. Aaron W. LaFramboise

On Wed, 04 Aug 2004 16:25:35 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
There are some options for specifying alignment of sections, but note this is for linker output, not input. Input sections are aligned according to the section symbol, which for .tls$ in my test code for both MSVC6 and MSVC7.1 is 2**2, on dword boundaries. If somehow the alignment was being set by the compiler/assembler to something higher than this, this could account for zeroes. But nothing I have suggests that this is happening.
It's also possible that the compiler (or its assembler machinery) is padding variables for some reason, but I don't know why that would be happening in this case.
If you have some specific test case where this happens for you, let me know and I'll see exactly what output it gives me.
After having unified your code and mine and having compared MSVC6 and 7, I came to the following observations so far:
1) MSVC6: The linker creates a named data segment, when requested by the object, that has a minimum size of 260 bytes. Then it assembles the pointer(s) into this segment. As a result, the sequence
#pragma data_seg(".CRT$XLB")
static _TLSCB pinit1 = mycb1fn;
#pragma data_seg()
#pragma data_seg(".CRT$XLC")
static _TLSCB pinit2 = mycb2fn;
#pragma data_seg()
will end up as initialized memory:
__xl_a    : 00 00 00 00
__xl_a+256: 00 00 00 00
adr1      : [mycb1fn]
adr1+4    : 00 00 00 00
....        00 00 00 00
adr1+256  : 00 00 00 00
adr1+260  : [mycb2fn]
adr1+264  : 00 00 00 00
....        00 00 00 00
adr1+516  : 00 00 00 00
__xl_z    : 00 00 00 00
The __xl_a and __xl_z variables that mark the start and end of the callback area are located at the start of segments .CRT$XLA and .CRT$XLZ respectively. In addition, the tlsdir.AddressOfCallBacks member points to __xl_a. The net result is that the linker creates an inappropriate PE callback directory. Consequently I am trying to reorder the data. (Sidebar: since I simply move pointers around, I think this will not mess with relocation issues.)
2) MSVC7: Things are very similar, with the exception that tlsdir.AddressOfCallBacks is initialized to __xl_a+4 AND the linker has a minimum segment size of 4 for .CRT$XLx segments. (It does not show this behaviour for e.g. .CRT$XIx segments.) The net result is that it creates a valid PE callback table.
So I think my approach for the fixup is rather conservative. Please note that it should be possible to run the fixup code even multiple times without any negative effect, so this scheme also allows for use in DLLs. To get more confidence in these observations I grepped the LINKER.EXE binary. You can find the strings .CRT$XL and _tls_used explicitly in there. This makes me rather confident that the linker is the culprit that gets things wrong.
Even if we succeed in changing the minimum size to 4 bytes, I still see no way to make AddressOfCallBacks point to the correct entry.
Roland
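The fixup code itself is not reproduced in this thread. The following is a minimal, portable sketch of the reordering idea under stated assumptions: the MSVC6 layout reported above (each .CRT$XLx contribution padded to 260 bytes, i.e. 65 four-byte slots) and AddressOfCallBacks pointing at __xl_a. Pointer slots are simulated with an array of function pointers so it runs anywhere; `compact_callbacks`, `run_callbacks`, `simulate_msvc6` and the slot constants are hypothetical names, not Boost or CRT APIs. Compaction moves every non-null entry to the front in order, which is why running it twice is harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for PIMAGE_TLS_CALLBACK; the real callback is
// void NTAPI f(PVOID DllHandle, DWORD Reason, PVOID Reserved).
typedef void (*tls_callback_t)(unsigned long reason);

const unsigned long kThreadDetach = 3;  // numeric value of DLL_THREAD_DETACH

// Assumed MSVC6 layout: every .CRT$XLx contribution padded to 260 bytes
// (65 pointer slots), linked in the order XLA, XLB, XLC, XLZ.
const std::size_t kSlotsPerSegment = 260 / 4;      // 65
const std::size_t kXlA = 0;                        // __xl_a
const std::size_t kXlB = kXlA + kSlotsPerSegment;  // adr1       (slot 65)
const std::size_t kXlC = kXlB + kSlotsPerSegment;  // adr1 + 260 (slot 130)
const std::size_t kXlZ = kXlC + kSlotsPerSegment;  // __xl_z     (slot 195)

static int g_calls = 0;
void mycb1fn(unsigned long) { ++g_calls; std::puts("mycb1fn"); }
void mycb2fn(unsigned long) { ++g_calls; std::puts("mycb2fn"); }

// Move every non-null entry in [first, last) to the front, preserving order,
// and null the rest. On an already compact range this changes nothing.
void compact_callbacks(tls_callback_t* first, tls_callback_t* last) {
    tls_callback_t* out = first;
    for (tls_callback_t* in = first; in != last; ++in)
        if (*in) *out++ = *in;
    while (out != last) *out++ = 0;
}

// Mimic the loader: invoke entries from `start` until the first null pointer.
void run_callbacks(tls_callback_t* start, unsigned long reason) {
    for (; *start; ++start) (*start)(reason);
}

// Reports, via the out-parameters, how many callbacks the simulated loader
// runs before and after the fixup.
void simulate_msvc6(int* before, int* after) {
    tls_callback_t table[kXlZ + 1] = {};  // zero-filled, __xl_z included
    table[kXlB] = mycb1fn;
    table[kXlC] = mycb2fn;
    tls_callback_t* address_of_callbacks = &table[kXlA];  // MSVC6: points at __xl_a

    g_calls = 0;
    run_callbacks(address_of_callbacks, kThreadDetach);  // slot 0 is null: nothing runs
    *before = g_calls;

    compact_callbacks(address_of_callbacks, &table[kXlZ]);
    compact_callbacks(address_of_callbacks, &table[kXlZ]);  // second pass is a no-op
    assert(table[kXlZ] == 0);                               // terminator untouched

    g_calls = 0;
    run_callbacks(address_of_callbacks, kThreadDetach);  // mycb1fn, then mycb2fn
    *after = g_calls;
}
```

Compacting in place (rather than copying into a separate table) keeps the fixup free of allocation, which matters because it would run before most of the runtime can be relied upon.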

Roland wrote:
After having unified your and my code and having compared MSVC6 and 7 I came to the following observations so far:
1) MSVC6: The linker creates a named data segment when requested by the object that has a minimum size of 260 Bytes. Then it assembles the pointer(s) into this segment. In the result the sequence:
I positively cannot account for the behavior you are seeing; it does not happen for me. Perhaps it was a bug in an earlier version of MSVC6. For reference, CL reports version 12.00.8168 and LINK reports version 6.00.8168.

In any case, if the following source file is added to the compile line (to replace the broken tlssup.obj in the runtime library), TLS callbacks work correctly on my copy of MSVC6, just as they do on MSVC7.1:

---msvc_tls.cpp---
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

namespace {
// Bounds of the static TLS template; the .tls$ subsections sort by name.
#pragma const_seg(".tls")
extern const LONGLONG _tls_start = 0;
#pragma const_seg(".tls$ZZZ")
extern const LONGLONG _tls_end = 0;
#pragma const_seg()
} // namespace

extern "C" {
// Null sentinels bracketing the .CRT$XL* callback pointers.
#pragma data_seg(".CRT$XLA")
PIMAGE_TLS_CALLBACK __xl_a = 0;
#pragma data_seg(".CRT$XLZ")
PIMAGE_TLS_CALLBACK __xl_z = 0;
#pragma data_seg()

DWORD _tls_index;

extern const IMAGE_TLS_DIRECTORY _tls_used = {
    reinterpret_cast<DWORD>(&_tls_start), // StartAddressOfRawData
    reinterpret_cast<DWORD>(&_tls_end),   // EndAddressOfRawData
    &_tls_index,                          // AddressOfIndex
    &__xl_a + 1,                          // AddressOfCallBacks: skip the null __xl_a
    0,                                    // SizeOfZeroFill
    0                                     // Characteristics
};
} // extern "C"
---EOF---

If this does not work for you, then there must be some other problem, such as the linker bug you mention, that does not occur on my MSVC6 installation.

At any rate, I think your suggested implementation is the way to go, as it gets everything right and works everywhere. :)

Aaron W. LaFramboise
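The `&__xl_a + 1` above matters because the loader treats AddressOfCallBacks as a pointer to a null-terminated array of callbacks, and __xl_a is itself a zero entry. This portable simulation, assuming the packed layout seen on MSVC7.1 (one pointer per .CRT$XLx contribution), compares starting the walk at __xl_a with starting one slot past it. Plain function pointers stand in for PIMAGE_TLS_CALLBACK, and every name other than __xl_a and __xl_z is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PIMAGE_TLS_CALLBACK; the real signature is
// void NTAPI f(PVOID DllHandle, DWORD Reason, PVOID Reserved).
typedef void (*tls_callback_t)(unsigned long reason);

static std::vector<unsigned long> g_seen;  // reasons delivered to tls_callback
void tls_callback(unsigned long reason) { g_seen.push_back(reason); }

// What the loader does with AddressOfCallBacks: call each entry in turn
// until it reaches a null pointer. Returns the number of callbacks invoked.
std::size_t walk(const tls_callback_t* p, unsigned long reason) {
    std::size_t n = 0;
    for (; *p; ++p, ++n) (*p)(reason);
    return n;
}

// Assumed packed layout: __xl_a, the .CRT$XLB entry, __xl_z.
void compare_start_addresses(std::size_t* from_xl_a, std::size_t* from_xl_a_plus_1) {
    tls_callback_t section[3] = { 0, tls_callback, 0 };
    const unsigned long kProcessDetach = 0;  // numeric value of DLL_PROCESS_DETACH
    *from_xl_a = walk(&section[0], kProcessDetach);             // __xl_a is null: stops at once
    *from_xl_a_plus_1 = walk(&section[0] + 1, kProcessDetach);  // runs tls_callback, stops at __xl_z
}
```

Starting at __xl_a runs nothing, which is the symptom of a directory that points at the sentinel instead of past it.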

On Thu, 05 Aug 2004 12:43:11 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
I positively cannot account for the behavior you are seeing. This does not happen for me. Perhaps it was a bug on an earlier version of MSVC6. For reference, CL reports version 12.00.8168 and LINK reports version 6.00.8168.
Not wanting to be pedantic, but this is very strange, and perhaps the current situation can still be improved once we find the reason for it. My versions (obtained with the --help switch) are CL 12.00.8804 and LINK 6.00.8447, both from the June Service Pack 6, and both appear to be newer than yours.
At any rate, I think your suggested implementation is the way to go as it gets everything right and works everywhere. :)
I have not yet tried your code (thank you, btw), but I am afraid that it will not entirely address the problem I am seeing, since I see 256 bytes between the start of .CRT$XLA and the start of .CRT$XLB (where the callback lives). Yes, and if the compiler does get it right, the reordering code simply reduces to a no-op, even when not commented out. However, this is only true when AddressOfCallBacks points to __xl_a. Perhaps I should go back to using the "real" starting address instead of __xl_a, although this relies on the assumption that _tls_used really is the TLS directory. I tried to avoid this at your suggestion.
Roland
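Roland's caveat can be made concrete. Assuming the packed MSVC7 layout, where AddressOfCallBacks is __xl_a+4 (one slot past __xl_a), compacting from __xl_a moves the first callback into the __xl_a slot, which the loader never reads; compacting from the address the directory actually holds leaves an already compact table untouched. This is a portable simulation with hypothetical names (`run_after_fixup`, `cb1`, `cb2`); obtaining the real start address in a program would require the assumption Roland mentions, that _tls_used really is the TLS directory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for PIMAGE_TLS_CALLBACK.
typedef void (*tls_callback_t)(unsigned long reason);

const unsigned long kThreadDetach = 3;  // numeric value of DLL_THREAD_DETACH

static int g_cb1 = 0, g_cb2 = 0;
void cb1(unsigned long) { ++g_cb1; }
void cb2(unsigned long) { ++g_cb2; }

// Same idempotent compaction as a reordering fixup would use.
void compact_callbacks(tls_callback_t* first, tls_callback_t* last) {
    tls_callback_t* out = first;
    for (tls_callback_t* in = first; in != last; ++in)
        if (*in) *out++ = *in;
    while (out != last) *out++ = 0;
}

// Loader-style walk: call entries until the first null pointer.
void run_callbacks(tls_callback_t* start, unsigned long reason) {
    for (; *start; ++start) (*start)(reason);
}

// MSVC7-style packed layout: __xl_a, XLB, XLC, __xl_z; the loader starts at slot 1.
// Reports how often each callback ran after compacting from the chosen start.
void run_after_fixup(bool compact_from_xl_a, int* cb1_calls, int* cb2_calls) {
    tls_callback_t table[4] = { 0, cb1, cb2, 0 };
    tls_callback_t* xl_a = &table[0];
    tls_callback_t* xl_z = &table[3];
    tls_callback_t* address_of_callbacks = xl_a + 1;  // __xl_a + 4 bytes on x86

    compact_callbacks(compact_from_xl_a ? xl_a : address_of_callbacks, xl_z);

    g_cb1 = g_cb2 = 0;
    run_callbacks(address_of_callbacks, kThreadDetach);
    *cb1_calls = g_cb1;
    *cb2_calls = g_cb2;
}
```

With `compact_from_xl_a == true`, cb1 lands in slot 0 and only cb2 runs; with `false`, both run. Anchoring the compaction at the loader's real starting address is what keeps the fixup a true no-op on a correct link.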

Roland wrote:
On Thu, 05 Aug 2004 12:43:11 -0500 "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com> wrote:
I positively cannot account for the behavior you are seeing. This does not happen for me. Perhaps it was a bug on an earlier version of MSVC6. For reference, CL reports version 12.00.8168 and LINK reports version 6.00.8168.
Not wanting to be pedantic, but this is very strange, and perhaps the current situation still can be improved when we find out the reason for this.
If you can compose an exact testcase where this happens for you, I will run it here and see if I get the same behavior. Perhaps we are doing something different from each other somehow.
I did not yet try your code (thank you btw), but I am afraid, that this will not address the problem I am seeing entirely since I see 256 bytes between start of .CRT$XLA and start of .CRT$XLB (where the callback lives).
Probably not.
Yes, and if the compiler does get it right the reordering code simply reduces to a no operation even when not commented out. However this is only true when AddressOfCallBacks is pointing to __xl_a. Perhaps I should again use the "real" starting address instead of __xl_a although this implies the assumption that _tls_used is really the TLS directory. I tried to avoid this on behalf of your suggestion.
The code I have presented here is my reimplementation of what is in MSVC6's tlssup.obj. If we were presented with the actual source of tlssup.obj, the two sources would be quite similar (I hope :) ). Aaron W. LaFramboise

Aaron W. LaFramboise wrote:
Following is an implementation of, and some comments on, a staticly linked Boost.Thread thread exit handler for MSVC.
[snip rest of message]

Here's another question: calls to DllMain are serialized (i.e., only one thread at a time is allowed to call DllMain); is the same true for the TLS callback? If not, the implementation of on_thread_exit() will have to be a lot more careful to be thread-safe.
Mike

Michael Glassford wrote:
thread at a time is allowed to call DllMain); is the same true for the tls callback? If not, the implementation of on_thread_exit() will have to be a lot more careful to be thread-safe.
Yes. TLS callbacks are serialized under the same lock that serializes DllMain calls, and are invoked just after all DllMain functions have been called. At least, this is how it works on Windows 2000 and Windows Server 2003.
B.
participants (10)
- Aaron W. LaFramboise
- Bronek Kozicki
- Carl Daniel
- David Abrahams
- Holger Grund
- Johan Nilsson
- John Maddock
- Michael Glassford
- Roland
- Tony Juricic