
Phil,

Thank you for your suggestions - I am hunting in the dark here...
for( int itask=0; itask<nTasks; ++itask ){
    boost::thread_group threads;
    for( int i=0; i<nThreads; ++i ){
        threads.create_thread( MyStruct(itask++ + 100) );
    }
    threads.join_all();
}
Did you really want the ++itask in the first for()? Isn't it being incremented enough in the create_thread line?
This was intentional. This highly contrived example creates a fixed amount of work (nTasks) and divides them up among nThreads threads. As a thread takes a task, I increment itask. Strange, I know...
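To make the index arithmetic concrete, here is a minimal sketch that replays the loop without any threads and records the tag each thread would receive, batch by batch. The helper name traceBatches is hypothetical and not part of the original program. Tracing it by hand suggests that the outer ++itask skips one index per batch, and that the inner loop does not check itask against nTasks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original program): replays the loop's
// index arithmetic and returns the tags handed out in each batch.
std::vector<std::vector<int>> traceBatches(int nTasks, int nThreads) {
    assert(nThreads > 0);
    std::vector<std::vector<int>> batches;
    for (int itask = 0; itask < nTasks; ++itask) {  // outer increment, as in the loop above
        std::vector<int> tags;
        for (int i = 0; i < nThreads; ++i) {
            tags.push_back(itask++ + 100);          // same expression passed to MyStruct
        }
        batches.push_back(tags);                    // threads.join_all() would run here
    }
    return batches;
}
```

With nTasks=10 and nThreads=4 this yields two batches, tags 100-103 and 105-108, so indices 4 and 9 are never handed to a thread. If exactly nTasks units of work are wanted, that is worth double-checking.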
struct MyStruct {
    explicit MyStruct(const int i) : tag(i) {}
    void operator()() const {
        const int n = 100;
        std::vector<int> nums(n,0);
        for( int j=0; j<1000000; ++j )
            for( int i=0; i<n; ++i )
                nums[i] = i+tag;
    }
private:
    int tag;
};
So sizeof(MyStruct)==sizeof(int) [for the tag]. Now, if you were creating the MyStruct objects like this:
MyStruct my_structs[n];
then I would say that they are all sharing a cache line, and that cache line is being fought over by the different processors when they read tag, and that you should add some padding. But you're not; you're passing a temporary MyStruct to create_thread which presumably stores a copy of it. How does boost::thread_group store the functors that are passed to it? If it is storing them in some sort of array or vector then that could still be the problem - and it could be fixed by adding padding inside boost.thread, or by copying the functor onto the new thread's stack.
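For reference, the padding idea can be sketched as follows. This is only an illustration under stated assumptions: a 64-byte cache line (typical on x86, but not guaranteed) and a C++11 compiler for alignas. The names kCacheLine, PlainTag, PaddedTag and elementsPerCacheLine are made up for the example and have nothing to do with how boost.thread stores its functors.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines; adjust for the target hardware.
constexpr std::size_t kCacheLine = 64;

// Same layout as MyStruct's data: adjacent array elements are sizeof(int) apart,
// so several of them share one cache line.
struct PlainTag {
    int tag;
};

// Aligning the type to the cache-line size also rounds sizeof up to that size,
// so each array element occupies a cache line of its own.
struct alignas(kCacheLine) PaddedTag {
    int tag;
};

static_assert(sizeof(PaddedTag) == kCacheLine, "one element per cache line");
static_assert(alignof(PaddedTag) == kCacheLine, "elements start on a line boundary");

// How many consecutive array elements of T fit in one cache line.
template <typename T>
constexpr std::size_t elementsPerCacheLine() {
    return sizeof(T) >= kCacheLine ? 1 : kCacheLine / sizeof(T);
}
```

An array such as PaddedTag my_structs[n] then places each tag on a separate line, at the cost of kCacheLine bytes per element. Whether that helps depends on the objects actually sitting next to each other in memory, which is the open question about thread_group.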
I tried adding a member double pad[9999] (uninitialized) to MyStruct to increase its size. This had no effect on performance. I think that boost::thread stores the functors by copy on each individual thread, not on the boost::thread_group object.
Also, I would imagine that the compiler would keep tag in a register. What happens if you declare it as const?
I changed it to const and there was no effect on the performance...

James