Hi,

In my previous mail, I was asking about compiler optimization dependencies when creating a parallel version of the following code:

    DO J = 1, 100
      DO I = 1, 100
        DO K = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO

As you know, the above code is 2D matrix multiplication. I tested it at various compiler optimization levels: -O0, -O1, -O2 and -O3. The compiler optimizations caused no problems.

I used pthreads to convert the serial code to a parallel version. The serial version took 72 sec to execute, and the parallel version with 4 threads took around 16 sec. Please have a look at the attachment for the complete working source code.

My question now concerns how I have used the threads, as shown in the pseudocode below:

    int main()
    {
        create_pthreads(assign_thread_ID, call the function);
        join_threads(thread_ID);
        destroy_threads();
    }

    function_called_by_each_thread(thread_ID)
    {
        all_computations;
    }

All the thread documentation I have read ends up with some pattern like this. Is this the correct way to approach creating parallel algorithms with Boost threads? Please clarify whether there are any alternative approaches for achieving parallelism using threads.

PS: Please have a look at the code for further details.

Regards,
NAVEEN | Mobile: 832-720-2393 | about.me http://about.me/naveen.namashivayam
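
For reference, here is a minimal sketch of the create-and-join pattern from the pseudocode above, applied to the matrix multiplication loop. It uses C++11 std::thread rather than pthreads or boost::thread. The names Matrix, multiply_rows and parallel_matmul, the row-major flat storage, and the split of C into row blocks are all illustrative assumptions; this is not the attached code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Row-major n x n matrix stored in a flat vector (illustrative layout).
using Matrix = std::vector<double>;

// Work done by one thread: rows [row_begin, row_end) of C += A * B.
// Each thread writes a disjoint set of rows of C, so no locking is needed.
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t n, std::size_t row_begin, std::size_t row_end)
{
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

// Create the threads, hand each one a block of rows, then join them all.
void parallel_matmul(const Matrix& a, const Matrix& b, Matrix& c,
                     std::size_t n, std::size_t num_threads)
{
    assert(num_threads > 0);
    // Rows per thread, rounded up so every row is covered.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end)
            break;  // more threads requested than rows available
        workers.emplace_back(multiply_rows, std::cref(a), std::cref(b),
                             std::ref(c), n, begin, end);
    }

    // Wait for every block to finish before C is read.
    for (std::thread& w : workers)
        w.join();
}
```

A call such as parallel_matmul(a, b, c, 100, 4) would correspond to the 4-thread run on the 100 x 100 case, with c zero-initialized beforehand. boost::thread provides a similar construct-with-a-callable and join() interface, so the same structure should carry over with few changes.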