
No. If you also increment itask in the outer loop, you end up running fewer tasks when nThreads is smaller. That said, what I said earlier about your program creating more threads than you expect was wrong (I misunderstood your program).
Also, your program appears to report the CPU time spent by the program, not the real (wall-clock) time it took. Here is what I see after fixing the itask increment:

    $ time ./a.out
    Executing 12 tasks using 1 threads.
    time: 9.64

    real 0m9.668s
    user 0m9.641s
    sys 0m0.004s

    $ time ./a.out
    Executing 12 tasks using 2 threads.
    time: 9.91

    real 0m4.991s
    user 0m9.909s
    sys 0m0.012s

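To make the CPU-time versus wall-clock distinction concrete, here is a minimal sketch that times the same batched loop with both kinds of clock. It is not the original program: it uses std::thread instead of boost::thread_group, and busy_task, Timing and run_batches are hypothetical names made up for the illustration. It also assumes a POSIX-like system, where std::clock reports CPU time summed over all threads of the process.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type: both measurements plus a count of finished tasks.
struct Timing {
    double cpu_seconds;   // from std::clock (process CPU time on POSIX)
    double wall_seconds;  // from std::chrono::steady_clock (elapsed real time)
    int tasks_run;
};

// Hypothetical stand-in for MyStruct: burns some CPU, then records completion.
void busy_task(int seed, std::atomic<int>& done) {
    volatile double x = seed;  // volatile keeps the loop from being optimized away
    for (int i = 0; i < 2000000; ++i) {
        x = x * 1.0000001 + 1.0;
    }
    ++done;
}

// Runs nTasks tasks in batches of at most nThreads threads, joining each batch
// before starting the next one, and measures the whole run with both clocks.
Timing run_batches(int nTasks, int nThreads) {
    assert(nThreads > 0);
    std::atomic<int> done{0};

    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    int itask = 0;
    while (itask < nTasks) {
        std::vector<std::thread> threads;
        // Cap the last batch so we never start more than nTasks tasks in total.
        const int batch = std::min(nThreads, nTasks - itask);
        for (int i = 0; i < batch; ++i) {
            threads.emplace_back(busy_task, itask++ + 100, std::ref(done));
        }
        for (auto& t : threads) {
            t.join();  // plays the role of thread_group::join_all()
        }
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing result;
    result.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    result.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    result.tasks_run = done.load();
    return result;
}
```

Calling run_batches(12, 1) and run_batches(12, 2) from main and printing both fields should show the same pattern as the time output above on a multi-core machine: cpu_seconds stays roughly constant, like "user", while wall_seconds drops as threads are added, like "real". On platforms where std::clock measures elapsed time rather than CPU time, the two numbers will not diverge this way.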
You are right - here is the modified loop structure:

    int itask=0;
    while( itask<nTasks ){
        boost::thread_group threads;
        for( int i=0; i<nThreads; ++i ){
            threads.create_thread( MyStruct(itask++ + 100) );
        }
        threads.join_all();
    }

Any thoughts on what is chewing up the extra time then? Isn't the total time the correct measure here? Is the difference between the "real" time and the "system" time due to the join_all?

James