
James Sutherland wrote:
I have been testing thread performance on Linux and Mac. My Linux system has two dual-core processors and my Mac has one dual-core processor. Both are Intel chips.
For the code snippet given below, the execution time should ideally decrease as the number of threads increases. However, the opposite trend is observed. For example, compiling with -O3 on my Linux desktop produces the following timings:

  1 thread:  0.66 sec
  2 threads: 0.9 sec
  3 threads: 1.2 sec
  4 threads: 1.4 sec
I do not have a lot of experience with threads, and was wondering whether this result surprises anyone.
Hi James, Quoting your code out of order:
  for( int itask=0; itask<nTasks; ++itask ){
    boost::thread_group threads;
    for( int i=0; i<nThreads; ++i ){
      threads.create_thread( MyStruct(itask++ + 100) );
    }
    threads.join_all();
  }
Did you really want the ++itask in the first for()? Isn't it being incremented enough in the create_thread line?
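If the intent is for each created thread to get its own consecutive tag, then the increment belongs in only one place. A rough sketch of one way to write that (purely an illustration of the control flow; it assumes the same MyStruct, nTasks and nThreads as in your code, with boost/thread.hpp included):

  int itask = 0;
  while( itask < nTasks ){
    boost::thread_group threads;
    // launch up to nThreads workers, one task (and one tag) per thread
    for( int i=0; i<nThreads && itask<nTasks; ++i, ++itask ){
      threads.create_thread( MyStruct(itask + 100) );
    }
    threads.join_all();
  }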
  struct MyStruct
  {
    explicit MyStruct(const int i) : tag(i) {}
    void operator()() const
    {
      const int n = 100;
      std::vector<int> nums(n,0);
      for( int j=0; j<1000000; ++j )
        for( int i=0; i<n; ++i )
          nums[i] = i+tag;
    }
  private:
    int tag;
  };
So sizeof(MyStruct)==sizeof(int) [for the tag].

Now, if you were creating the MyStruct objects like this:

  MyStruct my_structs[n];

then I would say that they are all sharing a cache line, and that cache line is being fought over by the different processors when they read tag, and that you should add some padding. But you're not; you're passing a temporary MyStruct to create_thread, which presumably stores a copy of it.

How does boost::thread_group store the functors that are passed to it? If it is storing them in some sort of array or vector, then that could still be the problem - and it could be fixed by adding padding inside boost.thread, or by copying the functor onto the new thread's stack.

Also, I would imagine that the compiler would keep tag in a register. What happens if you declare it as const?

I suggest that you try adding some padding and see what happens.

Phil.
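P.S. If you do try the padding experiment, here is a rough sketch of what I have in mind (purely illustrative - it assumes a 64-byte cache line and says nothing about how boost.thread actually stores its functors):

  #include <vector>

  // Hypothetical padded version of MyStruct: the pad member pushes
  // sizeof(MyStructPadded) up to an assumed 64-byte cache line, so that
  // copies stored contiguously (e.g. in a vector) cannot share a line.
  struct MyStructPadded
  {
    explicit MyStructPadded(const int i) : tag(i) {}
    void operator()() const
    {
      const int n = 100;
      std::vector<int> nums(n,0);
      for( int j=0; j<1000000; ++j )
        for( int i=0; i<n; ++i )
          nums[i] = i+tag;
    }
  private:
    int tag;
    char pad[64 - sizeof(int)];  // padding, assuming a 64-byte cache line
  };

It would be passed to create_thread exactly as before; if the timings change, that would point strongly at false sharing.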