
Does #ifdef around #pragma omp actually work? At least #define PARALLEL #pragma omp and then using PARALLEL does not work.
I suspect the above clashes with how #pragma is defined or implemented; I can't recall ever successfully using a macro that expands to any directive. But that could be because I haven't tried it in the last twenty years :-) Guarding the pragma itself, though:

    #if PARALLEL
    #pragma omp ...
    #endif
    For ( Sagans of rows of data )
        do something_expensive()

should "work", but not in a way that is likely to lead to a good general-purpose library interface.
The compression algorithms (zip, ...) (part of the streams library?) would be a very good candidate. I once tested a parallel bzip2 algorithm and it scales really well.
Compression and crypto (and linear algebra, and image manipulation) are already available as threaded, micro-optimized, runtime-selected routines in the performance libraries from Intel and AMD; probably everybody else too. Wrapping them in boost interfaces with fallback implementations would be nice; it would encourage use of the best available implementations by programs with portability requirements.

For my purposes, the ability to set up a "pipeline" like, say, serialization -> compression -> file i/o without having to code both sides of each relationship would be nice. It would also provide useful parallelism at a point in the application where, in many cases, the user is waiting for the program; and, if generic enough, it could do so without requiring multithreading of the core algorithms of each of the filters.

I wrote a hard-realtime system for Schlumberger back in the dark ages that had a multithreaded, typed data stream engine at its core. It was in C but designed with object-oriented concepts and would convert to a template library (+ compiled back end) very nicely. The coding discipline required to implement a source, sink, or filter was fairly rigid, but the result was that the initialization script could paste together very elaborate, very high performance signal processing pipelines just by listing the names and parameters of the participants. We'd have to get permission from Schlumberger if we wanted to reuse any of my work products: anybody have a contact there?

Note that, for programs working on very large datasets, any non-streaming interface can become a problem because it precludes efficient use of the storage hierarchy. If I must stream my data into an array before I pass it to the compression routine, I've already lost: no matter how fast the routine is, it can't make up for the memory traffic I've already wasted compared with cache-to-cache producer/consumer transfers.
Have a look at: http://developer.amd.com/TechnicalArticles/Articles/Pages/CrazyFastDataSharing.aspx

A good producer / consumer pattern implementation framework may already exist in boost or elsewhere, but if it does I haven't stumbled across it yet.