
On 01/03/2011 00:19, Antoine de Maricourt wrote:
Source is at <http://github.com/MetaScale/nt2/> in include/nt2/sdk/simd but it's not quite user-friendly yet (no docs etc.).
Announcements will be made when docs and tutorials are available. Some good tutorials will be made at Boostcon ;).
That's great.
I have been working for two or three months now on a low-level core library, but with possibly different goals. I was not that much interested in developing a comprehensive math library, but mostly in being able to target GPUs.
We've decided that math functions (trigonometric, exponential, etc.) wouldn't be in the SIMD library but only in NT2. While they use the SIMD library, those don't have any platform-specific code anyway (at least I believe so; I haven't seen the code of all of them).
I came up with a design where Proto is used as a front end driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU code using Intel SSE intrinsics, for instance).
Doing that kind of thing is only possible when you have a substantial amount of code, such as a whole function or a code kernel. That's out of the scope of the SIMD library, which only aims at providing a portable and efficient set of SIMD operations, and at recognizing certain combination patterns in order to map them to optimized primitives. You could, however, compile a bunch of code at runtime using the library to achieve the desired effect.
The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I spent on it.
So, is NT2 able to target GPUs, and to take into account the GPU programming model?
Yes, NT2 can do that (or will be able to, rather, since that code is not in the public repository yet), but it works at a much higher level of abstraction than the SIMD component we're proposing for Boost: it operates in terms of series of operations on multidimensional tables of arbitrary size, while the SIMD library only works with SIMD registers of, for example, 128 or 256 bits. The two GPU backends (OpenCL and CUDA) are not released to the public yet because we're still considering commercial ventures around them. The OpenCL backend generates and compiles code at runtime from a Proto expression; the CUDA one does something smarter.