
On 30.03.2009, at 14:08, Anthony Williams wrote:
Oliver Abert writes:
Thanks for alerting me to this thread, Peter.
Oliver Abert writes:
On 29.03.2009, at 19:36, Peter Dimov wrote:
Oliver Abert:
Hi Everyone,
I am using Boost Threads (1.38) as my threading library, and I use thread_specific_ptr to store a small amount of data per thread (currently about 5 different pointer values per thread). Functionally everything works fine, but I have a performance problem on Mac OS X: on Linux the code runs 10 times faster than on Mac OS. If I use pthreads directly on Mac OS, the performance is identical to the Linux version. Both versions run on the same machine, using 8 threads each.
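For comparison, here is a minimal sketch of what a plain-pthreads version of per-thread storage might look like, assuming the key is created once, eagerly, before any worker starts, so the hot path is a single pthread_getspecific with no pthread_once. The post does not show the actual pthreads code; PerThreadData, create_key_eagerly, current_data and run_workers are hypothetical names used only for illustration.

```cpp
#include <pthread.h>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread payload standing in for the handful of
// pointers mentioned above.
struct PerThreadData {
    int thread_index;
};

// Key created once, eagerly, before any worker thread starts.
static pthread_key_t g_key;

// Destructor run by pthreads at thread exit for non-null values.
static void destroy_data(void* p) {
    delete static_cast<PerThreadData*>(p);
}

void create_key_eagerly() {
    int rc = pthread_key_create(&g_key, &destroy_data);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is set
}

// Hot-path accessor: a single pthread_getspecific, no pthread_once.
inline PerThreadData* current_data() {
    return static_cast<PerThreadData*>(pthread_getspecific(g_key));
}

// Each worker stores its own data, then reads it back many times.
// Returns true if every thread always saw its own value.
bool run_workers(int num_threads, int iterations) {
    std::vector<std::thread> threads;
    std::vector<char> ok(num_threads, 0);  // one slot per thread, no sharing
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([i, iterations, &ok] {
            pthread_setspecific(g_key, new PerThreadData{i});
            bool all_match = true;
            for (int k = 0; k < iterations; ++k)
                all_match = all_match && current_data()->thread_index == i;
            ok[i] = all_match ? 1 : 0;
        });
    }
    for (auto& t : threads) t.join();
    for (char c : ok)
        if (!c) return false;
    return true;
}
```

Call create_key_eagerly() once (for example at the top of main()) before starting any worker; after that, each thread only pays for pthread_setspecific once and pthread_getspecific on every access.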
What does your profiler say?
About 80% of the time is spent in __spin_lock, which in turn was called by pthread_once. If I use only one thread (instead of 8), the percentage goes down to 2.5%, which is still a bit much for my taste.
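To check whether pthread_once by itself is expensive under contention on a given platform, independent of Boost, a small timing harness like the following sketch could be used. It is not from the thread; time_pthread_once and init_once are hypothetical names, and no particular numbers are implied for either OS.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Counts how many times the init routine actually ran (should be 1).
static std::atomic<int> g_init_runs{0};
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

static void init_once() { g_init_runs.fetch_add(1); }

// Calls pthread_once `calls_per_thread` times from each of `num_threads`
// threads and returns the elapsed wall-clock time in milliseconds.
double time_pthread_once(int num_threads, long calls_per_thread) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([calls_per_thread] {
            for (long k = 0; k < calls_per_thread; ++k)
                pthread_once(&g_once, init_once);  // already initialized: fast path only
        });
    }
    for (auto& t : threads) t.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Each thread does the same amount of work, so on a machine with at least 8 cores the 8-thread run would ideally take roughly as long as the 1-thread run; a much larger time suggests the already-initialized path of pthread_once is serializing the threads.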
pthread_once is called by the thread_specific_ptr code to ensure that the TLS key it uses has been allocated and is valid. It's a real pain if that is too slow.
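The lazy-key pattern described here can be sketched roughly as below: every access first goes through pthread_once, so callers never need to create the key themselves, but every access also pays for the pthread_once call. This is a simplified illustration assuming a single global key holding an int, not Boost's actual thread_specific_ptr code; lazy_get, lazy_reset and create_key are hypothetical names.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

namespace {
pthread_once_t g_key_once = PTHREAD_ONCE_INIT;
pthread_key_t g_key;
std::atomic<int> g_key_creations{0};  // how often create_key() really ran

// Called by pthreads at thread exit for each non-null value.
void destroy_value(void* p) { delete static_cast<int*>(p); }

void create_key() {
    pthread_key_create(&g_key, &destroy_value);
    g_key_creations.fetch_add(1);
}
}  // namespace

// Every access goes through pthread_once first, so the key is guaranteed
// to exist no matter which thread calls first.
int* lazy_get() {
    pthread_once(&g_key_once, &create_key);
    return static_cast<int*>(pthread_getspecific(g_key));
}

// Replaces this thread's value; lazy_get() also ensures the key exists.
void lazy_reset(int* p) {
    delete lazy_get();
    pthread_setspecific(g_key, p);
}
```

The design choice is convenience (no explicit initialization step) traded against a pthread_once call on every access, which is cheap only if the platform's already-initialized path is cheap.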
Yes, I understand that so far, but there seems to be a more serious problem. Is it possible that there is some unintended mutex lock? It seems like exactly that is happening. Maybe it is related to the static variables, which might get mutex-protected automatically? I heard there is a bug in Apple's gcc 4.0.1 regarding statics, but this morning I also tried the Intel 11.0 compiler, with the same disappointing results. What makes me wonder is that the same code runs just fine on Linux.
Some more background information: the problem is definitely caused by calls to get() on the thread_specific_ptr. I am using it in a relatively hot section of my code. Profiling is not very helpful, because there are a bunch of unknown libraries between my call and the pthread_once call (and yes, I also used a debug build of Boost), so I have no clue what is happening in between.
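One general way to reduce the cost of a per-thread lookup in a hot section (not something proposed in this thread so far) is to fetch the pointer once per call and reuse it inside the loop. The sketch below assumes a pthreads-based accessor; Traverser, current_traverser, process_naive and process_hoisted are hypothetical stand-ins for RenderThread::hierarchyTraverser() and its callers.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-thread HierarchyTraverser.
struct Traverser {
    long visits = 0;
    void visit() { ++visits; }
};

namespace {
pthread_once_t g_once = PTHREAD_ONCE_INIT;
pthread_key_t g_key;
long g_lookups = 0;  // counts accessor calls; single-threaded demo only

void destroy(void* p) { delete static_cast<Traverser*>(p); }
void make_key() { pthread_key_create(&g_key, &destroy); }
}  // namespace

// Accessor that pays for pthread_once + pthread_getspecific on every call,
// creating this thread's Traverser on first use.
Traverser* current_traverser() {
    ++g_lookups;
    pthread_once(&g_once, &make_key);
    auto* t = static_cast<Traverser*>(pthread_getspecific(g_key));
    if (!t) {
        t = new Traverser;
        pthread_setspecific(g_key, t);
    }
    return t;
}

// Before: one per-thread lookup per item in the hot loop.
void process_naive(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        current_traverser()->visit();
}

// After: one lookup per call; the pointer stays valid for the lifetime
// of the calling thread as long as nobody replaces the stored value.
void process_hoisted(std::size_t n) {
    Traverser* t = current_traverser();
    for (std::size_t i = 0; i < n; ++i)
        t->visit();
}
```

This does not fix a slow pthread_once; it only reduces how often the hot section reaches it.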
Could you show the code that accesses the thread_specific_ptr?
Okay, the call is simply:
HierarchyTraverser *ht = RenderThread::hierarchyTraverser();
(there is nothing Boost-related before or after that call)
and that function is:
inline HierarchyTraverser* RenderThread::hierarchyTraverser()
{
#ifdef BOOST
    return reinterpret_cast
Anthony
-- 
Author of C++ Concurrency in Action | http://www.manning.com/williams
just::thread C++0x thread library | http://www.stdthread.co.uk
Just Software Solutions Ltd | http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users