
Joel Falcou <joel.falcou@gmail.com> writes:
On 11/06/11 11:17, David A. Greene wrote:
Mathias Gaunard <mathias.gaunard@ens-lyon.org> writes:
Making data parallelism simpler is the goal of NT2. And we do that by removing loops and pointers entirely.
First off, I want to apologize for sparking some emotions. That was not my intent. I am deeply sorry for not expressing myself well.
We all fell into some blatant miscommunication there, I guess ;)
NT2 sounds very interesting! Does it generate the loops given calls into generic code?
Basically yes: you express container-based, semantic-driven code using a Matlab-like syntax (+ more in cases where Matlab doesn't provide anything suitable), and the various evaluation points generate loop nests with properties derived from information carried by the container type and its settings (storage order, data-sharing status, etc.).
This is super-cool! Anything to help the programmer restructure code (or generate the loops correctly in the first place) is a huge win.
The evaluation is then done by forwarding the expression to a hierarchical layer of architecture-dependent meta-programs that, at each step, strip the expression of its important high-level semantic information and help generate the proper piece of code.
You mean machine intrinsics here, yes? This is where I think the compiler might often do better. If the compiler is good. :) It's a little odd that "important" information would be stripped. I know this is not a discussion of NT2, but for the curious, can you explain this? Thanks!
I assume the rest of the discussion concerns a program written with the correct algorithm in terms of complexity, right?
By correct algorithm, you mean an algorithm structured to expose data parallelism? If so, yes, I think that's right.
- Programmer tries to run the compiler on it, examines code
- Code sometimes (maybe most of the time) executes poorly
- If not, done
Yes.
- Programmer restructures loop nest to expose parallelism
  - Try compiler directives first, if available (tell compiler which loops to interchange, where to cache block, blocking factors, which loops to collapse, etc.)
  - Otherwise, hand-restructure (ouch!)
If compilers allow for such information to be carried, yes.
Right. Many don't, and in those cases boost.simd is a great alternative.
- Programmer tries compiler again on restructured loop nest
- Code may execute poorly
- If not, done
Yes
- Programmer adds directives to tell the compiler which loops to vectorize, which to leave scalar, etc.
- Code may still execute poorly
- If not, done
Again, provided such a compiler is available on said platform.
Of course.
- Programmer uses boost.simd to write vector code at a higher level than provided compiler intrinsics
Yes and using a proper range based interface instead of a mess of for loops.
Yep!
Does that seem like a reasonable use case?
Yes. What we failed to clarify is that for a large share of people, the compilers available on their systems provide no way to do steps #2 and #3. And for these people, what they see is a world in which they are on their own dealing with this.
Oh absolutely. But I think that such people should be aware that code generated by boost.simd may not be the best for their hardware implementation IF they have access to a good compiler later. In those cases, though, I suppose replacing, say, pack<float> with float everywhere should get most of the original scalar code back. There may yet be a little more cleanup to do but isn't that the case with _every_ HPC code? :) :) :) -Dave