GSoC SIMD project

Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject. Would it be acceptable to propose a GSoC project within Boost which could involve porting the library to new architectures, writing more tests, documentation, and examples, playing with it to implement fast vectorized functions for common tasks, etc.?

On Mon, Feb 28, 2011 at 4:32 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Mathias, may I ask for links to source/examples/docs? Best regards, Christoph

On 28/02/2011 17:05, Christoph Heindl wrote:
On Mon, Feb 28, 2011 at 4:32 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Mathias,
may I ask for links to source/examples/docs?
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd, but it's not quite user-friendly yet (no docs, etc.). Announcements will be made when docs and tutorials are available. Some good tutorials will be given at BoostCon ;).

On 28 February 2011 18:30, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 28/02/2011 17:05, Christoph Heindl wrote:
On Mon, Feb 28, 2011 at 4:32 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Mathias,
may I ask for links to source/examples/docs?
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
Announcements will be made when docs and tutorials are available. Some good tutorials will be made at Boostcon ;).
Then wouldn't it be a bit late for a GSoC project? At least for this year...
Out of curiosity, what would be the requirements to apply to this particular project as a student? A *good* knowledge of SIMD? On all architectures supporting it?

On 28/02/11 18:54, Mathieu - wrote:
Then wouldn't it be a bit late for a GSoC project? At least for this year...
I have to check. Docs have to be written for internal use at the moment anyway.
Out of curiosity, what would be the requirements to apply to this particular project as a student? A *good* knowledge of SIMD? On all architectures supporting it?
I don't think so. We have comprehensive SSE support except for corners of SSE4 and SSSE3. AltiVec is more primitive and also easier to program.

On Mon, Feb 28, 2011 at 6:30 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
thanks! Christoph

Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
Announcements will be made when docs and tutorials are available. Some good tutorials will be made at Boostcon ;).
That's great.
I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
I ended up with Proto used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance). The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Best regards,
Antoine.

Does this support runtime selection?
Also, every time I look for something in this project, I have to go through about 15 header files daisy-chained before I find something close to what I'm looking for.
Dan
On 02/28/2011 04:19 PM, Antoine de Maricourt wrote:
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
Announcements will be made when docs and tutorials are available. Some good tutorials will be made at Boostcon ;).
That's great.
I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
I ended up with Proto used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance). The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Best regards,
Antoine.

On 01/03/2011 00:36, Dan Weber wrote:
Does this support runtime selection?
No.
Also, every time I look for something in this project, I have to go through about 15 header files daisy-chained before I find something close to what I'm looking for.
Yes, that's a problem I hope to get rid of for the boostification.

On 01/03/11 00:36, Dan Weber wrote:
Does this support runtime selection?
Also, every time I look for something in this project, I have to go through about 15 header files daisy-chained before I find something close to what I'm looking for.
Yeah, this is something of a concern. However, it helped us stay easily extensible by chaining the SSE variants one into the other without hassle and without huge #ifdef/#else/#endif chains.
Any proposal/patches/comments welcome.

On 01/03/11 00:19, Antoine de Maricourt wrote:
I ended up with Proto used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance). The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Yes.

On 01/03/2011 00:19, Antoine de Maricourt wrote:
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
Announcements will be made when docs and tutorials are available. Some good tutorials will be made at Boostcon ;).
That's great.
I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
We've decided that math functions (trigonometric, exponential, etc.) wouldn't be in the SIMD library but only in NT2. While they use the SIMD library, those don't have any platform-specific code anyway (at least I believe so; I haven't seen the code for all of them).
I ended up with Proto used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance).
Doing that kind of thing is only possible when you have a substantial amount of code, such as a whole function or a code kernel. That's out of the scope of the SIMD library, which only aims at providing a portable and efficient set of SIMD operations, and at recognizing certain combination patterns so they can be mapped to optimized primitives. You could, however, compile a bunch of code at runtime using the library to achieve the desired effect.
The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Yes, NT2 can do that (or will be able to, rather, since that code is not in the public repository yet), but it works at a much higher level of abstraction than the SIMD component that we're proposing for Boost (it works in terms of series of operations on multidimensional tables of arbitrary size, while the SIMD library only works with SIMD registers of, for example, 128 or 256 bits). The two GPU backends (OpenCL and CUDA) are not released to the public yet because we're still considering commercial ventures with these. The OpenCL backend generates and compiles code at runtime from a Proto expression; the CUDA one does something smarter.

I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
We've decided that math functions (trigonometric, exponential, etc.) wouldn't be in the SIMD library but only in NT2.
While they use the SIMD library, those don't have any platform-specific code anyway (at least I believe so; I haven't seen the code for all of them).
Seems to be a reasonable choice.
I ended up with Proto used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance).
Doing that kind of thing is only possible when you have a substantial amount of code, such as a whole function or a code kernel.
Yes, that's what I will do. You write functions that look like kernels. They are embedded inside C++ code, compiled by a regular C++ compiler, and when you execute those functions, they issue dedicated source code to be compiled again and executed in the target runtime's environment.
That's out of the scope of the SIMD library, which only aims at providing a portable and efficient set of SIMD operations, and recognize certain combination patterns to map them to the optimized primitives.
You could, however, compile a bunch of code at runtime using the library to achieve the desired effect.
Does "could" mean that the library will provide support to do it in a straightforward way? Or does it mean it's theoretically possible but at great expense, because that's not one of the library's use cases?
The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Yes, NT2 can do that (or will be able to, rather, since that code is not in the public repository yet), but it works at a much higher level of abstraction than the SIMD component that we're proposing for Boost (it works in terms of series of operations on multidimensional tables of arbitrary size, while the SIMD library only works with SIMD registers of, for example, 128 or 256 bits).
OK, I think my use case needs more low-level, fine-grained control, but I need to know more about what NT2 will be able to do. I'll wait for the docs and friendly examples.
The two GPU backends (OpenCL and CUDA) are not released to the public yet because we're still considering commercial ventures with these.
Does this mean there is a possibility the GPU backends will not be available as open source, or only under a restricted license?
The OpenCL backend generates and compiles code at runtime from a Proto expression; the CUDA one does something smarter.
I'd like to hear about the smarter things! But I guess you won't say that much if you have commercial plans...
Best regards,
Antoine.

On 01/03/11 23:05, Antoine de Maricourt wrote:
I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
We've decided that math functions (trigonometric, exponential, etc.) wouldn't be in the SIMD library but only in NT2.
While they use the SIMD library, those don't have any platform-specific code anyway (at least I believe so; I haven't seen the code for all of them).
Seems to be a reasonable choice.
Most trigonometric code is built on top of extension-agnostic operators. The question is really how much we should put inside Boost.SIMD. We literally have hundreds of them, and some have quite uncanny dependencies on low-level IEEE bit-fiddling functions.
Yes, that's what I will do. You write functions that look like kernels. They are embedded inside C++ code, compiled by a regular C++ compiler, and when you execute those functions, they issue dedicated source code to be compiled again and executed in the target runtime's environment.
NT2 is more focused on compile-time code generation in every situation where it can be.
Does "could" mean that the library will provide support to do it in a straightforward way? Or does it mean it's theoretically possible but at great expense, because that's not one of the library's use cases?
We have such a feature planned, to accommodate some embedded architectures where a C++ compiler is nonexistent.
OK, I think my use case needs more low-level, fine-grained control, but I need to know more about what NT2 will be able to do. I'll wait for the docs and friendly examples.
We'll push hard to have some; we really mean it :)
Does this mean there is a possibility the GPU backends will not be available as open source, or only under a restricted license?
Depends on a lot of things, some external to us, some depending on what's left in the design.

Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Would it be acceptable to propose a GSoC project within Boost which could involve porting the library to new architectures, writing more tests, documentation, and examples, playing with it to implement fast vectorized functions for common tasks, etc.?
I think that would be a great project. Why don't you add a short description to the project page, here: https://svn.boost.org/trac/boost/wiki/SoC2011
Andrew

Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Out of curiosity: do you support any 'advanced' SIMD operations, like libm functions for SIMD vectors and the like? I have a small generic SIMD library [1], which implements a subset of libm in a generic way. It might be an interesting GSoC project to add some kind of SIMD math support, if it is not in there yet... The functionality is not trivial to implement, but gives some considerable speedup...
cheers, tim
[1] http://tim.klingt.org/git?p=nova-simd.git;a=summary
--
tim@klingt.org http://tim.klingt.org

On 28/02/11 20:06, Tim Blechmann wrote:
Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Out of curiosity: do you support any 'advanced' SIMD operations, like libm functions for SIMD vectors and the like? I have a small generic SIMD library [1], which implements a subset of libm in a generic way. It might be an interesting GSoC project to add some kind of SIMD math support, if it is not in there yet... The functionality is not trivial to implement, but gives some considerable speedup...
We have libm and more; just extracting the list of functions from the trigonometric toolbox gives us this: https://github.com/MetaScale/nt2/tree/master/include/nt2/toolbox/trigonometr...
For info, our sin/cos is like 9 cycles/value (~40 cycles per vector of floats) with 1-2 ulp of precision. We also have a degraded fast sin/cos that only works on the trigonometric circle and yields 2.7 cycles/value. Other speedups are roughly the same.
And here is our complete toolbox list so far: https://github.com/MetaScale/nt2/tree/master/include/nt2/toolbox

Joel Falcou and I have a SIMD library in the works that we plan to submit to Boost by May; we've also submitted a boostcon talk on the subject.
Out of curiosity: do you support any 'advanced' SIMD operations, like libm functions for SIMD vectors and the like? I have a small generic SIMD library [1], which implements a subset of libm in a generic way. It might be an interesting GSoC project to add some kind of SIMD math support, if it is not in there yet... The functionality is not trivial to implement, but gives some considerable speedup...
We have libm and more; just extracting the list of functions from the trigonometric toolbox gives us this:
Very cool! I will probably target Boost.SIMD as a backend then :D
--
tim@klingt.org http://tim.klingt.org
participants (8)
- Andrew Sutton
- Antoine de Maricourt
- Christoph Heindl
- Dan Weber
- Joel Falcou
- Mathias Gaunard
- Mathieu -
- Tim Blechmann