Hello Boost-Users,
I have a strange problem and also a theory about its cause. I do not really understand the reason
and hope that somebody can explain it.
I am using Boost.Thread to execute some algorithms in parallel. I will post the source code
in detail if required.
I have some work to do that is the same for different data. Concretely: I compute some forces
on a deformation model of geometrical edges. For n available processors I create n-1 threads,
so with, say, 10,000 edges and 4 processors, 3 x 2500 edges are processed by Boost threads and 2500
are processed by the main thread. Because this is a work crew, I join the threads and am then
finished.
All this is done in Win32 using Visual Studio Express and Boost.
At the moment I use a wrapper struct:
struct Wrappy
{
    // Process `count` consecutive EdgeProcessor elements starting at `array`.
    void operator()( EdgeProcessor * array, int count)
    {
        for (int i = 0; i < count; i++) array[i].compute();
    }
};
My main application creates an EdgeProcessor array of 10,000 elements. In the boost::thread
constructor I pass a pointer to the first element the thread should compute, and count is 2500
for every thread. As you can see, everything is straightforward, and it has worked fine in a lot of situations.
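To make the setup concrete, here is a minimal sketch of the splitting described above. It uses std::thread instead of boost::thread so that it compiles without Boost (the constructor is used in the same way), and the EdgeProcessor shown is only a hypothetical stand-in for the real class from the DLL. The function name process_in_parallel is likewise made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real EdgeProcessor that lives in the DLL.
struct EdgeProcessor {
    double force = 0.0;
    void compute() { force += 1.0; }  // placeholder for the real work
};

// Same role as Wrappy in the post: process one contiguous slice.
struct Wrappy {
    void operator()(EdgeProcessor* array, int count) const {
        for (int i = 0; i < count; ++i) array[i].compute();
    }
};

// Split `edges` into `parts` contiguous chunks. parts-1 chunks go to
// worker threads, the calling thread processes the last chunk itself
// (including any remainder), then all workers are joined.
void process_in_parallel(std::vector<EdgeProcessor>& edges, int parts) {
    assert(parts >= 1);
    const int total = static_cast<int>(edges.size());
    const int chunk = total / parts;

    std::vector<std::thread> workers;
    for (int t = 0; t < parts - 1; ++t) {
        workers.emplace_back(Wrappy(), edges.data() + t * chunk, chunk);
    }

    const int start = (parts - 1) * chunk;
    Wrappy()(edges.data() + start, total - start);

    for (std::thread& w : workers) w.join();
}
```

With 10,000 elements and parts = 4 this gives three worker threads and the calling thread, each handling 2500 elements, which matches the numbers above.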
Now, EdgeProcessor is a class compiled in a separate DLL (Multithreaded DLL) doing a lot of
calls, some of them recursive. Just basic C++ calls, no std containers in use.
If I process all elements in the main application without threads, it takes 0.07 sec and I have 100%
usage of one CPU (I have an i5, so 25% in total).
If I use threads, it takes 0.19 sec, more than twice the time.
If I use just one boost::thread, with no processing in the main app, it again takes 0.07 sec.
If I use all threads, all four CPUs are at 100% usage. They are all working together but need more
than twice the time. Yes, I am sure that every thread is working on just 2500 elements.
If more than one thread is calling the EdgeProcessor, the task runs very slowly.
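For reference, timings like the ones above can be taken with a small wall-clock helper along these lines. This is only a sketch of one way to measure, not necessarily how the numbers in this post were obtained, and the name seconds_for is made up for the illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it never jumps backwards.
double seconds_for(const std::function<void()>& work) {
    assert(static_cast<bool>(work));  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the single-threaded loop and the threaded version (thread creation and join included) in the same helper keeps the two measurements comparable.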
The data for the EdgeProcessor is, by the way, independent, so every EdgeProcessor has its own data and
there are no overlaps or synchronisation at all.
I assume that there is a really big overhead because of the DLL. Maybe access to the class cannot
be done in parallel, or not as fast as in the statically linked case. Maybe there is a hidden synchronisation.
Do you have an explanation for this?
I use the same approach in other parts of the application without problems. The only differences
are a) the use of a class instance from within a DLL and b) compute() results in a couple of recursive calls
(traversing a tree).
I have experience with this, but this behavior is strange. I am sorry if this is a little too Win32 / DLL / Visual Studio
specific, but I am not sure where else to ask.
Thanks for your ideas!
Simon