
David A. Greene wrote:
> Ahem: http://www.cray.com

Two points:
1/ Not everyone has access to a Cray-like machine. Parallelization tools for CotS machines are not to be neglected, and on this front lots of things need to be done.
2/ vector supercomputer != SIMD-enabled processor, even if the former may include the latter.
> Auto-parallelization has been around since at least the '80s in production machines. I'm sure it was around even earlier than that.
What do you call auto-parallelization? Are you telling me that, nowadays, I can take *any* source code written in C or C++ or whatever, compile it with some compiler specifying --parallel, and automagically get a parallel version of the code? If so, you'll have to send a memo to at least a dozen research teams (including mine) all over the world so they can stop working on this problem and move on to something else. Should I also assume that each time a new architecture comes out, those compilers already know the best way to generate code for it? I beg to differ, but automatic parallelization is far from "done". Then again, just look at the problem of writing SIMD code: explain why we still get better performance for simple code when writing SIMD code by hand than when letting gcc auto-vectorize it.
> Perhaps your SIMD library could invent convenient ways to express those idioms in a machine-independent way.
Well, considering the question was first about how to structure the group of libraries I'm proposing, I apologize for not having taken the time to express all the features of those libraries. Moreover, even with a simple example, the fact that the library hides the differences between SSE2, SSE3, SSSE3, SSE4, Altivec, SPU-VMX and the forthcoming AVX is a feature on its own. Oh, and as specified in the former mail, the DSL takes care of optimizing fused operations, so things like FMA are detected and replaced by the proper intrinsic when possible. Same with reductions like min/max, and operations like b*c-a or SAD on SSEx.
> Your simple SIMD expression example isn't terribly compelling. Any competent compiler should be able to vectorize a scalar loop that implements it.
Well, sorry then to have given a simple example.

> What would be compelling is a library to express things like the Cell's scratchpad. Libraries to do data staging would be interesting because more and more processors are going to add these kinds of local memory.

I don't see what you have in mind. Do you mean something like Hierarchically Tiled Arrays? Or some Cell-based development library? If the latter, I don't think Boost is the best home for it. As for HTA, lots of implementations already exist, and guess what: they just do the parallelization themselves instead of letting the computer do it.

Anyway, we'll be able to discuss the library itself and its features when a proper thread for it starts.

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35