GSoC Ideas: Optimizing Boost libraries for multi-core machines

Hello everybody, As I said in my last e-mails, I'm interested in Boost projects for GSoC. I have now written a proposal and submitted it to GSoC, but I would really appreciate it if anyone commented on my idea (I can still change anything wrong or strange). Well, I'm proposing to parallelize one or more libraries in Boost, using OpenMP to do this. Basically, I'll choose a library, review its source code, identify and parallelize the bottlenecks, and document the strategy and results. These steps will be done iteratively during the project. Any comment is welcome. Best,
--
Cristianno Martins
Master's student in Computer Science
State University of Campinas
cristianno.martins@students.ic.unicamp.br
skype: cristiannomartins
g-Talk: cristiannomartins
msn: cristiannomartins@hotmail.com (rarely used)

On Apr 2, 2008, at 10:48 AM, Cristianno Martins wrote:
As I said in my last e-mails, I'm interested in Boost projects for GSoC. I have now written a proposal and submitted it to GSoC, but I would really appreciate it if anyone commented on my idea (I can still change anything wrong or strange). Well, I'm proposing to parallelize one or more libraries in Boost, using OpenMP to do this. Basically, I'll choose a library, review its source code, identify and parallelize the bottlenecks, and document the strategy and results. These steps will be done iteratively during the project.
That is certainly an interesting project. When you submit your GSoC application, please identify exactly which library you plan to parallelize with OpenMP, and which parts of that library you will parallelize. - Doug

Just a quick question: wouldn't TBB be the better choice instead of OpenMP? Christian On Wed, Apr 2, 2008 at 10:59 AM, Doug Gregor <dgregor@osl.iu.edu> wrote:
That is certainly an interesting project. When you submit your GSoC application, please identify exactly which library you plan to parallelize with OpenMP, and which parts of that library you will parallelize.
- Doug

On Apr 2, 2008, at 11:05 AM, Christian Henning wrote:
Just a quick question: wouldn't TBB be the better choice instead of OpenMP?
I can see good reasons for both options. TBB will make it easier to implement task-oriented parallel algorithms, but it brings with it a dependency on a non-Boost library. OpenMP is reasonably good at data parallelism (and can require fewer changes to algorithms where it applies), and depends on features available in many compilers, but it isn't as widely applicable as TBB. I think your choice will be dictated mainly by the library you choose, and by how you think you will need to parallelize that library. - Doug
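To illustrate the contrast (a sketch only; the function names and the elementwise f are invented, and the lambda form assumes present-day C++ -- the 2008-era TBB API used a hand-written body class):

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>

    // OpenMP: one pragma on the existing loop; few source changes
    // where plain data parallelism applies.
    void apply_omp(double* out, const double* in, int n, double (*f)(double))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            out[i] = f(in[i]);
    }

    // TBB: the loop body becomes a callable invoked over subranges;
    // more flexible for task-oriented algorithms, but a non-Boost dependency.
    void apply_tbb(double* out, const double* in, int n, double (*f)(double))
    {
        tbb::parallel_for(tbb::blocked_range<int>(0, n),
            [=](const tbb::blocked_range<int>& r) {
                for (int i = r.begin(); i != r.end(); ++i)
                    out[i] = f(in[i]);
            });
    }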

In my experience, there are very few instances where parallelism can be usefully concealed behind a library interface. OpenMP has very high overhead and will help only for very long-running functions -- at least under Microsoft's compiler on x86 and x64 platforms. Algorithms that are likely to be applied to very large datasets could have the OMP pragmas inserted optionally, but they would need to be protected by #ifdef logic (or given distinct names), because otherwise the overhead will destroy programs that make more frequent calls on smaller datasets. In VS2005, a parallel-for construct with a literal 0 enabling-flag parameter adds all the overhead of one that is enabled; you can't use the enable logic in the OMP syntax to do effective data-size algorithm selection without replicating the code.

Identifying interfaces that can be usefully altered to more readily permit applications to exploit parallelism may be harder, but it is a lot more likely to pay off. Also consider that, in an application that is already parallelized, there are no extra cores for the library to use internally. On a smaller scale, adding "vectorized" and/or "streaming producer-consumer" interfaces for selected libraries may help a lot by encouraging use of vector instruction/execution units, unrolling of loops, and improving instruction and data locality.

I must also repeat the age-old, time-tested, capital-T Truth about optimization: if you do something that is not suggested by and validated against careful analysis of realistic use cases, you are wasting your time. I strongly advise you not to start hacking without solid data. Gathering real-world use cases into a "library" of performance-oriented application code and datasets would be, in my opinion, a pretty good summer's work. Produce a final report that others can later mine for performance-improvement opportunities. Wrap the use-case library up as a performance regression test suite; plumb the tests into the Boost automated testing infrastructure.

Another good use for a library of smaller use cases is as fodder for the "profile-guided optimization" offered by many modern compilers. If you still have time left over, you could add support for PGO to Boost.Build. For some chips, notably Itanium, PGO makes a very noticeable difference. It's hard to see how that would help with header-only libs, though, except by giving application programmers an example to go by.
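A minimal sketch of the guarding described above, with a hypothetical build flag (MYLIB_USE_OPENMP) and cutoff, both invented for illustration; the real threshold has to come from profiling, and the serial path stays pragma-free so small, frequent calls pay no OpenMP overhead:

    #include <cstddef>

    double expensive(double);                    // some per-element work
    static const std::size_t CUTOFF = 100000;    // hypothetical; set from profiling

    void transform(double* x, const double* y, std::size_t n)
    {
    #ifdef MYLIB_USE_OPENMP                      // hypothetical build flag
        if (n >= CUTOFF) {
            #pragma omp parallel for
            for (long i = 0; i < (long)n; ++i)   // OpenMP 2.x wants a signed index
                x[i] = expensive(y[i]);
            return;
        }
    #endif
        for (std::size_t i = 0; i < n; ++i)      // serial path: zero OpenMP overhead
            x[i] = expensive(y[i]);
    }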

On Wed, Apr 02, 2008 at 12:21:43PM -0500, Stephen Nuchia wrote:
In my experience, there are very few instances where parallelism can be usefully concealed behind a library interface. OpenMP has very high overhead and will help only for very long-running functions -- at least under Microsoft's compiler on x86 and x64 platforms. Algorithms that are likely to be applied to very large datasets could have the OMP pragmas inserted optionally but they would need to be protected by #ifdef logic (or given distinct names) because otherwise the overhead
Does #ifdef around #pragma omp indeed work? At least, defining a macro like #define PARALLEL #pragma omp and then using PARALLEL does not work.
will destroy programs that make more frequent calls on smaller datasets.
I must also repeat the age-old, time-tested, capital-T Truth about optimization: if you do something that is not suggested by and validated against careful analysis of realistic use cases, you are wasting your time. I strongly advise you not to start hacking without solid data.
The compression algorithms (zip, ...) (part of the streams library?) would be a very good candidate. I once tested a parallel bzip2 algorithm and it scales really well. Jens
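For reference, the trick behind that parallel bzip2 (pbzip2) is to cut the input into independent blocks and compress each block on its own core. A rough sketch of the same scheme using zlib's compress(), assuming a container format that stores the blocks separately (error handling omitted):

    #include <algorithm>
    #include <cstddef>
    #include <vector>
    #include <zlib.h>

    // Compress fixed-size blocks independently; each iteration touches
    // disjoint data, so the loop parallelizes trivially.
    std::vector<std::vector<unsigned char> >
    compress_blocks(const unsigned char* data, std::size_t total, std::size_t block)
    {
        const long nblocks = (long)((total + block - 1) / block);
        std::vector<std::vector<unsigned char> > out(nblocks);
        #pragma omp parallel for
        for (long b = 0; b < nblocks; ++b) {
            const std::size_t off = (std::size_t)b * block;
            const uLong len = (uLong)std::min(block, total - off);
            uLongf dlen = compressBound(len);
            out[b].resize(dlen);
            compress(&out[b][0], &dlen, data + off, len);
            out[b].resize(dlen);                 // shrink to the actual size
        }
        return out;
    }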

Does #ifdef around #pragma omp indeed work? At least, defining a macro like #define PARALLEL #pragma omp and then using PARALLEL does not work.
I think the above may clash with either the definition or an implementation of #pragma; I can't recall ever successfully using a macro that expands to any directive. But that could be because I haven't tried it in the last twenty years :-)

    #if PARALLEL
    #pragma omp ...
    #endif
    for ( Sagans of rows of data )
        do something_expensive()

should "work", but not in a way that is likely to lead to a good general-purpose library interface.
The compression algorithms (zip, ...) (part of the streams library?) would be a very good candidate. I once tested a parallel bzip2 algorithm and it scales really well.
Compression and crypto (and linear algebra, and image manipulation) are already available as threaded, micro-optimized, runtime-selected routines in the performance libraries from Intel and AMD; probably everybody else too. Wrapping them in Boost interfaces with fallback implementations would be nice; it would encourage use of the best available implementations by programs with portability requirements.

For my purposes, the ability to set up a "pipeline" like, say, serialization -> compression -> file I/O without having to code both sides of each relationship would be nice. It would also provide useful parallelism at a point in the application where the user is waiting for the program in many cases, and, if generic enough, it could do so without requiring multithreading of the core algorithms of each of the filters.

I wrote a hard-realtime system for Schlumberger back in the dark ages that had a multithreaded, typed data-stream engine at its core. It was in C but designed with object-oriented concepts, and it would convert to a template library (plus a compiled back end) very nicely. The coding discipline required to implement a source, sink, or filter was fairly rigid, but the result was that the initialization script could paste together very elaborate, very high performance signal-processing pipelines just by listing the names and parameters of the participants. We'd have to get permission from Schlumberger if we wanted to reuse any of my work products: anybody have a contact there?

Note that, for programs working on very large datasets, any non-streaming interface can become a problem because it precludes efficient use of the storage hierarchy. If I must stream my data into an array before I pass it to the compression routine, I've already lost: no matter how fast it is, it can't make up for the memory traffic I've already wasted compared with cache-to-cache producer/consumer transfers. Have a look at: http://developer.amd.com/TechnicalArticles/Articles/Pages/CrazyFastDataSharing.aspx

A good producer/consumer pattern implementation framework may already exist in Boost or elsewhere, but if it does I haven't stumbled across it yet.
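For the one-directional case, Boost.Iostreams already composes pipelines in roughly this spirit. A minimal sketch chaining compression into file output, so bytes stream through the filter chain instead of accumulating in a whole-dataset buffer first (the gzip filter requires linking against zlib):

    #include <ios>
    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/filter/gzip.hpp>
    #include <boost/iostreams/device/file.hpp>

    namespace io = boost::iostreams;

    int main()
    {
        io::filtering_ostream out;
        out.push(io::gzip_compressor());                // filter stage
        out.push(io::file_sink("data.gz",
                 std::ios_base::out | std::ios_base::binary)); // terminal sink
        out << "bytes flow through the compressor as they are written\n";
    }   // destructor flushes and closes the chain in order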

On Wed, Apr 2, 2008 at 11:37 PM, Stephen Nuchia <snuchia@statsoft.com> wrote:
Does #ifdef around #pragma omp indeed work? At least, defining a macro like #define PARALLEL #pragma omp and then using PARALLEL does not work.
I think the above may clash with either the definition or an implementation of #pragma; I can't recall ever successfully using a macro that expands to any directive. But that could be because I haven't tried it in the last twenty years :-)
I do not think that there is any way for a macro to expand to a directive, except that C99 has _Pragma, which can be used in macros. I do not know how many C++ compilers support it, nor whether it actually works with OpenMP. -- gpd
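For reference, this is the _Pragma form in question; the guard macro name is illustrative (_OPENMP is predefined by OpenMP-enabled compilers):

    // C99 operator form: a macro can expand to _Pragma, unlike #pragma.
    #if defined(_OPENMP)
    #  define MYLIB_OMP_PARALLEL_FOR _Pragma("omp parallel for")
    #else
    #  define MYLIB_OMP_PARALLEL_FOR
    #endif

    void scale(double* x, int n)
    {
        MYLIB_OMP_PARALLEL_FOR
        for (int i = 0; i < n; ++i)
            x[i] *= 2.0;
    }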

Giovanni Piero Deretta wrote:
I do not think that there is any way for a macro to expand to a directive.
You may want to pose that question to Paul Mensonides. He may have a way to get the desired effect with Boost.PP. Jeff Flinn

On Apr 3, 2008, at 8:36 AM, Jeff Flinn wrote:
You may want to pose that question to Paul Mensonides. He may have a way to get the desired effect with Boost.PP.
C99 has _Pragma; otherwise, it isn't possible. - Doug

Doug Gregor wrote:
C99 has _Pragma; otherwise, it isn't possible.
I haven't tried this, but how about the following (albeit not pretty):

    #include pragma_omp_parallel
    {
    #include pragma_omp_for
        for(int i = 1; i < size; ++i)
            x[i] = (y[i-1] + y[i+1])/2;
    }

where the pragma_omp_* macros expand to file names whose contents contain #pragma omp ***? Just a thought. Jeff Flinn
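That would lean on a computed include: each macro has to expand to a header name, and each (hypothetical) header would hold just its pragma, along these lines:

    // omp_parallel.h contains the single line:   #pragma omp parallel
    // omp_for.h      contains the single line:   #pragma omp for
    #define pragma_omp_parallel "omp_parallel.h"
    #define pragma_omp_for      "omp_for.h"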
participants (7)
- Christian Henning
- Cristianno Martins
- Doug Gregor
- Giovanni Piero Deretta
- Jeff Flinn
- Jens Seidel
- Stephen Nuchia