[thread] TSS cleanup

Hello,
I would like to know whether the storage created for boost::thread_specific_ptr in the win32 TSS code is freed after that boost::thread_specific_ptr is destroyed, or whether it stays allocated until the end of the process. For example:
for(int i = very_big_number;i--;i > 0) boost::thread_specific_ptr<int> p(new int);
To my knowledge, the pointer returned by new in this case is freed. But is the int* that is created to store the pointer to the new int also freed, or will this program just get bigger and bigger?
I'm using Boost 1.36. Boost 1.34 did get bigger and bigger, so I modified it to use a map. That made the lookup more expensive (log N), but it made it manageable to use Spirit 1.8, which creates and destroys a lot of boost::thread_specific_ptr instances.
Thanks in advance,
-- Felipe Magno de Almeida

"Felipe Magno de Almeida" <felipe.m.almeida@gmail.com> writes:
I would like to know whether the storage created for boost::thread_specific_ptr in the win32 TSS code is freed after that boost::thread_specific_ptr is destroyed, or whether it stays allocated until the end of the process.
Storage is allocated on a per-thread basis, and freed when the thread exits.
For example:
for(int i = very_big_number;i--;i > 0) boost::thread_specific_ptr<int> p(new int);
To my knowledge, the pointer returned by new in this case is freed. But is the int* that is created to store the pointer to the new int also freed, or will this program just get bigger and bigger?
As written this code is fine: it will reuse the same slot since p will have the same address each time through, and it uses the address as the key. It is undefined behaviour if other threads can still try and access a given thread_specific_ptr (including having data stored associated with it) after it has been destroyed.
If you create and destroy thread_specific_ptr instances all over the place then in general this will increase the list for each thread that accesses the tsp. It shouldn't be too hard to have the TSS entry be removed from the list when it is no longer needed. I'll try and get round to sorting it for Boost 1.37. If you raise a trac ticket I'll be more likely to remember.
Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
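To make the behaviour described above concrete, here is a minimal sketch of a per-thread list keyed by the address of the owning object. It is a toy model under stated assumptions, not the Boost.Thread source: the names tss_node, toy_thread_data, find, set and size are all hypothetical. It shows why setting a value under the same address reuses one node, while each distinct address adds a node that stays until the per-thread data is torn down.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: one thread's TSS list, keyed by the address of the
// owning object. All names here are hypothetical, not Boost's.
struct tss_node {
    const void* key;
    void* value;
    tss_node* next;
};

struct toy_thread_data {
    tss_node* head = nullptr;

    // Models "freed when the thread exits": every node is released here.
    ~toy_thread_data() {
        while (head) {
            tss_node* n = head;
            head = head->next;
            delete n;
        }
    }

    // Linear search for the node owned by `key`.
    tss_node* find(const void* key) const {
        for (tss_node* n = head; n; n = n->next)
            if (n->key == key) return n;
        return nullptr;
    }

    // Reuse the node if the key is already present, otherwise prepend one.
    void set(const void* key, void* value) {
        if (tss_node* n = find(key)) {
            n->value = value;
            return;
        }
        head = new tss_node{key, value, head};
    }

    // Number of nodes currently held for this thread.
    std::size_t size() const {
        std::size_t count = 0;
        for (tss_node* n = head; n; n = n->next) ++count;
        return count;
    }
};
```

In this model, a loop that constructs its object at the same address on every iteration keeps size() at 1, whereas objects created at many different addresses make the list, and every lookup, grow.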

On Mon, Sep 22, 2008 at 1:05 PM, Anthony Williams <anthony.ajw@gmail.com> wrote:
As written this code is fine: it will reuse the same slot since p will have the same address each time through, and it uses the address as the key. It is undefined behaviour if other threads can still try and access a given thread_specific_ptr (including having data stored associated with it) after it has been destroyed.
Well, so for Spirit it would indeed only get bigger, since it will hardly ever reuse slots. I see the TSS lookup is linear as well.
If you create and destroy thread_specific_ptr instances all over the place then in general this will increase the list for each thread that accesses the tsp. It shouldn't be too hard to have the TSS entry be removed from the list when it is no longer needed. I'll try and get round to sorting it for Boost 1.37. If you raise a trac ticket I'll be more likely to remember.
Would this work? It seems to fix my problem:
Index: boost/thread/tss.hpp
===================================================================
--- boost/thread/tss.hpp (revision 46411)
+++ boost/thread/tss.hpp (working copy)
@@ -22,6 +22,7 @@
             virtual void operator()(void* data)=0;
         };
 
+        BOOST_THREAD_DECL void remove_tss_data(void const* key,boost::shared_ptr<tss_cleanup_function> func,void* tss_data,bool cleanup_existing);
         BOOST_THREAD_DECL void set_tss_data(void const* key,boost::shared_ptr<tss_cleanup_function> func,void* tss_data,bool cleanup_existing);
         BOOST_THREAD_DECL void* get_tss_data(void const* key);
     }
@@ -74,6 +75,7 @@
         ~thread_specific_ptr()
         {
             reset();
+            detail::remove_tss_data(this,cleanup,0,true);
         }
 
         T* get() const
Index: libs/thread/src/win32/thread.cpp
===================================================================
--- libs/thread/src/win32/thread.cpp (revision 46411)
+++ libs/thread/src/win32/thread.cpp (working copy)
@@ -541,6 +541,40 @@
             }
             return NULL;
         }
+
+        void remove_tss_data(void const* key,boost::shared_ptr<tss_cleanup_function> func,void* tss_data,bool cleanup_existing)
+        {
+            tss_cleanup_implemented(); // if anyone uses TSS, we need the cleanup linked in
+            detail::thread_data_base* const current_thread_data(get_current_thread_data());
+            if(current_thread_data)
+            {
+                detail::tss_data_node* current_node=current_thread_data->tss_data
+                    , *previous_node = 0;
+                while(current_node)
+                {
+                    if(current_node->key==key)
+                    {
+                        // found
+                        if(previous_node)
+                        {
+                            current_node = current_node->next;
+                            heap_delete<tss_data_node>(previous_node->next);
+                            previous_node->next = current_node;
+                        }
+                        else
+                        {
+                            previous_node = current_node;
+                            current_node = current_node->next;
+                            heap_delete<tss_data_node>(previous_node);
+                            current_thread_data->tss_data = current_node;
+                        }
+                        break;
+                    }
+                    previous_node = current_node;
+                    current_node=current_node->next;
+                }
+            }
+        }
 
         void set_tss_data(void const* key,boost::shared_ptr<tss_cleanup_function> func,void* tss_data,bool cleanup_existing)
         {
Thanks, -- Felipe Magno de Almeida
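For readers following the patch, the unlink step it performs can also be written with a pointer to the link being examined, which removes the separate head and interior cases. This is only an illustrative sketch on a hypothetical node type and a hypothetical remove_key function; it uses plain new/delete rather than the library's heap helpers and is not the code that went into Boost.

```cpp
#include <cassert>

// Hypothetical stand-in for a TSS list node.
struct node {
    const void* key;
    node* next;
};

// Unlink and delete the first node whose key matches.
// Returns true if a node was removed, false if the key was not found.
bool remove_key(node*& head, const void* key) {
    // `link` points at the pointer that leads to the current node:
    // first at `head`, then at each node's `next` member.
    for (node** link = &head; *link; link = &(*link)->next) {
        if ((*link)->key == key) {
            node* doomed = *link;
            *link = doomed->next;   // same statement handles head and interior nodes
            delete doomed;
            return true;
        }
    }
    return false;
}
```

The search is still linear in the number of entries, so this addresses the growth of the list but not the cost of each lookup.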

"Felipe Magno de Almeida" <felipe.m.almeida@gmail.com> writes:
Would this work? It seems to fix my problem:
It looks like it at first glance.
Anthony

----- Original Message -----
From: "Anthony Williams" <anthony.ajw@gmail.com>
To: <boost@lists.boost.org>
Sent: Monday, September 22, 2008 6:05 PM
Subject: Re: [boost] [thread] TSS cleanup
As written this code is fine: it will reuse the same slot since p will have the same address each time through, and it uses the address as the key. It is undefined behaviour if other threads can still try and access a given thread_specific_ptr (including having data stored associated with it) after it has been destroyed.
Hi,
I didn't know that the key was the address of the variable. As pthread_getspecific ensures constant complexity, I expected the same for boost::thread_specific_ptr, but if the key is the address this does not seem possible.
So what is the complexity of boost::thread_specific_ptr<T>::get()? Looking at the code, we see that it is O(N). IMHO, constant complexity is one of the major requirements of such a feature. I suppose you had some good reason not to use an index key instead of the address.
In addition, the release and reset functions each incur two lookups on the set of thread-specific pointers:
    T* release()
    {
        T* const temp=get();
        detail::set_tss_data(this,boost::shared_ptr<detail::tss_cleanup_function>(),0,false);
        return temp;
    }
    void reset(T* new_value=0)
    {
        T* const current_value=get();
        if(current_value!=new_value)
        {
            detail::set_tss_data(this,cleanup,new_value,true);
        }
    }
If set_tss_data returned the old value, these two functions could be written with reduced complexity, like this:
    T* release()
    {
        return detail::set_tss_data(this,boost::shared_ptr<detail::tss_cleanup_function>(),0,false);
    }
    void reset(T* new_value=0)
    {
        detail::set_tss_data(this,cleanup,new_value,true);
    }
Does the C++0x thread_local feature require constant complexity for getting and setting the value?
It is a pity that we cannot use thread_local (or the equivalent), when the compiler provides it, to define boost::thread_specific_ptr. Or maybe the preprocessor can help?
    #define BOOST_THREAD_LOCAL(T,name,init_value) \
    class BOOST_THREAD_LOCAL##name { \
    private: \
        thread_local static T value init_value; \
    public: \
        T* get() const { return &value; } \
    } name;
In the meantime, it would be great if you added the nature of the key, the complexity, and the rationale for this design decision to the documentation.
Thanks,
Vicente

"vicente.botet" <vicente.botet@wanadoo.fr> writes:
I didn't know that the key was the address of the variable. As pthread_getspecific ensures constant complexity, I expected the same for boost::thread_specific_ptr, but if the key is the address this does not seem possible.
pthread_getspecific makes no guarantees about its complexity: http://www.opengroup.org/onlinepubs/009695399/functions/pthread_setspecific....
So what is the complexity of boost::thread_specific_ptr<T>::get()? Looking at the code, we see that it is O(N). IMHO, constant complexity is one of the major requirements of such a feature. I suppose you had some good reason not to use an index key instead of the address.
The set of thread_specific_ptr values accessed by a given thread cannot be known when the thread is launched. Consequently you cannot know which set of indices will be used. Use of indices would require a sparse vector. I intend to upgrade the data structure to a map at some point, which would therefore have faster lookup.
In addition, the release and reset functions each incur two lookups on the set of thread-specific pointers.
True.
If set_tss_data returned the old value, these two functions could be written with reduced complexity, like this:
    T* release()
    {
        return detail::set_tss_data(this,boost::shared_ptr<detail::tss_cleanup_function>(),0,false);
    }
    void reset(T* new_value=0)
    {
        detail::set_tss_data(this,cleanup,new_value,true);
    }
Yes, it would.
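The single-lookup idea can be sketched as follows, assuming a store whose setter hands back the previous value. Everything here is hypothetical (toy_tss_store, exchange, toy_ptr), std::map merely stands in for whatever structure the library uses, and the store is a plain object rather than real per-thread state, so this illustrates the shape of the change and nothing more.

```cpp
#include <cassert>
#include <map>

// Hypothetical store; a real design would keep one of these per thread.
class toy_tss_store {
    std::map<const void*, void*> slots_;
public:
    // Store new_value under key and return the previous value
    // (nullptr if there was none), using a single lookup.
    void* exchange(const void* key, void* new_value) {
        void*& slot = slots_[key];   // inserts a nullptr entry if key is absent
        void* old = slot;
        slot = new_value;
        return old;
    }

    // Read-only lookup; returns nullptr for unknown keys.
    void* get(const void* key) const {
        auto it = slots_.find(key);
        return it == slots_.end() ? nullptr : it->second;
    }
};

// Hypothetical smart pointer keyed by its own address.
template <typename T>
class toy_ptr {
    toy_tss_store& store_;
public:
    explicit toy_ptr(toy_tss_store& s) : store_(s) {}
    toy_ptr(const toy_ptr&) = delete;
    toy_ptr& operator=(const toy_ptr&) = delete;
    ~toy_ptr() { reset(); }

    T* get() const { return static_cast<T*>(store_.get(this)); }

    // One lookup: clear the slot and give ownership to the caller.
    T* release() { return static_cast<T*>(store_.exchange(this, nullptr)); }

    // One lookup: install the new value, then delete the old one if different.
    void reset(T* new_value = nullptr) {
        T* old = static_cast<T*>(store_.exchange(this, new_value));
        if (old != new_value) delete old;
    }
};
```

Note that the casts from void* back to T* are needed in this sketch; the version quoted above would need the same if set_tss_data returned void*.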
Does the C++0x thread_local feature require constant complexity for getting and setting the value?
No, the C++0x thread_local keyword doesn't offer any complexity guarantees.
It is a pity that we cannot use thread_local (or the equivalent), when the compiler provides it, to define boost::thread_specific_ptr. Or maybe the preprocessor can help?
    #define BOOST_THREAD_LOCAL(T,name,init_value) \
    class BOOST_THREAD_LOCAL##name { \
    private: \
        thread_local static T value init_value; \
    public: \
        T* get() const { return &value; } \
    } name;
We'll have to see what the semantics are when compilers eventually implement this feature. I am particularly concerned with how it relates to dynamic libraries. Certainly, MSVC has issues with __declspec(thread) in DLLs.
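On a compiler that does implement the thread_local keyword as it was later standardised in C++11, the idea behind the macro can be sketched without the preprocessor. This is an assumption-laden illustration: toy_thread_local, counter_tag and set_on_other_thread are hypothetical names, and the sketch says nothing about the dynamic-library concerns raised above.

```cpp
#include <cassert>
#include <thread>

// Hypothetical wrapper: each (T, Tag) pair names one thread-local object.
template <typename T, typename Tag>
struct toy_thread_local {
    T* get() const {
        thread_local T value{};   // constructed once per thread, on first use
        return &value;
    }
};

struct counter_tag {};            // distinguishes this variable from other ints
toy_thread_local<int, counter_tag> counter;

// Write v to a second thread's copy and return what that thread read back.
int set_on_other_thread(int v) {
    int seen = 0;
    std::thread t([&seen, v] {
        *counter.get() = v;       // touches only the new thread's copy
        seen = *counter.get();
    });
    t.join();
    return seen;
}
```

Each thread sees its own value, so a write made on the helper thread does not change what the calling thread reads through counter.get().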
In the meantime, it would be great if you added the nature of the key, the complexity, and the rationale for this design decision to the documentation.
Add a trac ticket to remind me and I'll update the docs when I have time.
Anthony

Anthony Williams wrote:
"vicente.botet" <vicente.botet@wanadoo.fr> writes:
So what is the complexity of boost::thread_specific_ptr<T>::get()? Looking at the code, we see that it is O(N). IMHO, constant complexity is one of the major requirements of such a feature. I suppose you had some good reason not to use an index key instead of the address.
The set of thread_specific_ptr values accessed by a given thread cannot be known when the thread is launched. Consequently you cannot know which set of indices will be used. Use of indices would require a sparse vector. I intend to upgrade the data structure to a map at some point, which would therefore have faster lookup.
Why not use the sparse vector then? I understand that POSIX gives no guarantees, but reading the TlsGetValue documentation on MSDN I get a strong impression that the lookup is O(1) there. I would expect the same on most POSIX platforms, even though the specification does not require it. And I would really like the Boost solution to provide similar performance, at least where the OS provides it.

----- Original Message -----
From: "Anthony Williams" <anthony.ajw@gmail.com>
To: <boost@lists.boost.org>
Sent: Tuesday, September 23, 2008 10:46 AM
Subject: Re: [boost] [thread] TSS complexity
"vicente.botet" <vicente.botet@wanadoo.fr> writes:
I didn't know that the key was the address of the variable. As pthread_getspecific ensures constant complexity, I expected the same for boost::thread_specific_ptr, but if the key is the address this does not seem possible.
pthread_getspecific makes no guarantees about its complexity:
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_setspecific....
OK, there is no mention at all of the complexity, but we can see in the rationale that it has been designed to favor speed and simplicity over error reporting:
"Performance and ease-of-use of pthread_getspecific() are critical for functions that rely on maintaining state in thread-specific data. Since no errors are required to be detected by it, and since the only error that could be detected is the use of an invalid key, the function to pthread_getspecific() has been designed to favor speed and simplicity over error reporting."
Do you know of an implementation that does not provide constant complexity, e.g. one with linear complexity? Do you think that you will use the pthread_specific functions to store the thread-specific data used by the Boost.Thread library on such a platform?
So what is the complexity of boost::thread_specific_ptr<T>::get()? Looking at the code, we see that it is O(N). IMHO, constant complexity is one of the major requirements of such a feature. I suppose you had some good reason not to use an index key instead of the address.
The set of thread_specific_ptr values accessed by a given thread cannot be known when the thread is launched. Consequently you cannot know which set of indices will be used.
Correct, we cannot know in advance how many keys will be in use.
Use of indices would require a sparse vector.
Right.
I intend to upgrade the data structure to a map at some point, which would therefore have faster lookup.
Great! I have one more suggestion: why not take the best of each approach:
- direct access when free key indices are available,
- no limit on the number of keys.
We can have a key that is a variant of an index and a variable address, so that:
* When a new key must be created, the library returns an index variant if there are free indices, and an address variant otherwise.
* The key -> pointer mapping is decomposed into a fixed-size vector and a map.
* The get/set functions use the vector or the map depending on the variant.
The only drawback I have identified is that the key is bigger, but we don't have too many keys in a process, so this is a minor drawback.
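The hybrid scheme outlined above could look roughly like this. It is only a sketch under stated assumptions: toy_key, toy_key_allocator, toy_thread_slots and kFastSlots are hypothetical names, the allocator is not thread-safe, freed indices are never recycled, and the table size of 4 is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical key: either a small index into a fixed table, or an address.
struct toy_key {
    bool is_index;
    std::size_t index;      // meaningful when is_index is true
    const void* address;    // meaningful when is_index is false
};

constexpr std::size_t kFastSlots = 4;   // arbitrary size for the sketch

// Process-wide source of keys (not thread-safe, no index recycling).
class toy_key_allocator {
    std::size_t next_ = 0;
public:
    // Hand out an index key while indices remain, else fall back to the address.
    toy_key make_key(const void* owner) {
        if (next_ < kFastSlots) return toy_key{true, next_++, nullptr};
        return toy_key{false, 0, owner};
    }
};

// Key -> pointer mapping; a real design would keep one instance per thread.
class toy_thread_slots {
    std::array<void*, kFastSlots> fast_{};     // direct access, O(1)
    std::map<const void*, void*> overflow_;    // unlimited keys, O(log N)
public:
    void set(const toy_key& k, void* v) {
        if (k.is_index) fast_[k.index] = v;
        else overflow_[k.address] = v;
    }
    void* get(const toy_key& k) const {
        if (k.is_index) return fast_[k.index];
        auto it = overflow_.find(k.address);
        return it == overflow_.end() ? nullptr : it->second;
    }
};
```

The first kFastSlots keys get direct access, and any further keys fall back to the map, so the number of keys stays unbounded at the cost of a larger key.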
In the meantime, it would be great if you added the nature of the key, the complexity, and the rationale for this design decision to the documentation.
Add a trac ticket to remind me and I'll update the docs when I have time.
Done.
Best,
Vicente
participants (4)
- Andrey Semashev
- Anthony Williams
- Felipe Magno de Almeida
- vicente.botet