
I have been working for two or three months now on a low-level core library, though with possibly different goals. I was not so much interested in developing a comprehensive math library as in being able to target GPUs.
We've decided that math functions (trigonometric, exponential, etc.) wouldn't be in the SIMD library but only in NT2.
While they use the SIMD library, they don't contain any platform-specific code anyway (at least I believe so; I haven't seen the code of all of them).
Seems to be a reasonable choice.
I came up with proto being used as a front end, driving a back end that generates GPU-specific source code (OpenCL, a vendor-specific language such as AMD CAL IL, or even CPU instructions using Intel SSE intrinsics, for instance).
Doing that kind of thing is only possible when you have a substantial amount of code, such as a whole function or a code kernel.
Yes, that's what I will do. You write functions that look like kernels. They are embedded inside C++ code and compiled by a regular C++ compiler, and when you execute those functions, they emit dedicated source code which is compiled again and executed in the target runtime's environment.
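To make that concrete, here is a minimal sketch of the general idea (purely illustrative, not the code discussed here): a Boost.Proto evaluation context that walks an expression tree at runtime and emits C-like source text instead of computing a value. The arg terminal and the generated syntax are assumptions made up for the example.

    #include <boost/proto/proto.hpp>
    #include <iostream>
    #include <string>

    namespace proto = boost::proto;

    // A terminal standing for a named kernel argument.
    struct arg { std::string name; };

    // Evaluation context that pretty-prints the expression as source code.
    struct codegen_ctx : proto::callable_context<codegen_ctx const>
    {
        typedef std::string result_type;

        std::string operator()(proto::tag::terminal, arg const& a) const
        {
            return a.name;
        }

        template<typename L, typename R>
        std::string operator()(proto::tag::plus, L const& l, R const& r) const
        {
            return "(" + proto::eval(l, *this) + " + " + proto::eval(r, *this) + ")";
        }

        template<typename L, typename R>
        std::string operator()(proto::tag::multiplies, L const& l, R const& r) const
        {
            return "(" + proto::eval(l, *this) + " * " + proto::eval(r, *this) + ")";
        }
    };

    int main()
    {
        proto::terminal<arg>::type a = {{"a[i]"}};
        proto::terminal<arg>::type b = {{"b[i]"}};
        proto::terminal<arg>::type c = {{"c[i]"}};

        // The C++ compiler only builds the expression tree; turning it into
        // target-language source happens when the program runs.
        codegen_ctx ctx;
        std::cout << "r[i] = " << proto::eval(a * b + c, ctx) << ";\n";
        // prints: r[i] = ((a[i] * b[i]) + c[i]);
    }

A real back end would of course emit a complete kernel and hand the resulting string to the target runtime's compiler.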
That's outside the scope of the SIMD library, which only aims at providing a portable and efficient set of SIMD operations, and at recognizing certain combination patterns in order to map them to optimized primitives.
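As an illustration of what mapping combination patterns to optimized primitives means, here is a small hand-written sketch (not the library's actual API, which detects the a * b + c pattern through its expression templates): a pack wrapper whose fused operation compiles to a single instruction where the hardware provides one, and to a multiply followed by an add otherwise.

    #include <xmmintrin.h>   // SSE intrinsics
    #ifdef __FMA__
    #include <immintrin.h>   // FMA intrinsics, when available
    #endif

    // Illustrative 4-float pack over a 128-bit SSE register.
    struct pack4f
    {
        __m128 v;
        pack4f(__m128 x) : v(x) {}
    };

    inline pack4f operator*(pack4f a, pack4f b) { return _mm_mul_ps(a.v, b.v); }
    inline pack4f operator+(pack4f a, pack4f b) { return _mm_add_ps(a.v, b.v); }

    // The combination a * b + c maps to one fused multiply-add where the
    // target supports it, and to separate multiply and add otherwise.
    inline pack4f fma(pack4f a, pack4f b, pack4f c)
    {
    #ifdef __FMA__
        return _mm_fmadd_ps(a.v, b.v, c.v);
    #else
        return a * b + c;
    #endif
    }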
You could, however, compile a bunch of code at runtime using the library to achieve the desired effect.
Does "could" mean that the library will provide support to do it in a straightforward way? Or does it mean it's theoriticaly possible at great expense, because that's not a library's use case?
The generated code is then compiled again and run inside the GPU runtime environment. However, this is probably very simple-minded compared to NT2, given the time I have spent on it.
So, is NT2 able to target GPUs? And to take into account the GPU programming model?
Yes, NT2 can do that (or will be able to, rather, since that code is not in the public repository yet), but it works at a much higher level of abstraction than the SIMD component that we're proposing for Boost: it works in terms of series of operations on multidimensional tables of arbitrary size, while the SIMD library only works with SIMD registers of, for example, 128 or 256 bits.
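To illustrate the difference in abstraction level (the names below are illustrative, not the actual NT2 or SIMD APIs): at the register level the caller owns the loop and the tail handling, whereas a table-level library expresses the whole operation and is free to decide how, and on which device, to run it.

    #include <xmmintrin.h>
    #include <cstddef>

    // Register-level view: process four floats per 128-bit register,
    // with an explicit scalar tail for the remainder.
    void saxpy_sse(float const* x, float const* y, float* r, std::size_t n)
    {
        __m128 two = _mm_set1_ps(2.0f);
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
        {
            __m128 vx = _mm_loadu_ps(x + i);
            __m128 vy = _mm_loadu_ps(y + i);
            _mm_storeu_ps(r + i, _mm_add_ps(_mm_mul_ps(two, vx), vy));
        }
        for (; i < n; ++i)
            r[i] = 2.0f * x[i] + y[i];
    }

    // Table-level view (pseudocode in the spirit of NT2): the library owns
    // the loop and can map it to SIMD code or to a GPU back end.
    //   table<float> r = 2.0f * x + y;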
OK, I think my use case needs more low-level, fine-grained control, but I need to know more about what NT2 will be able to do. I'll wait for the documentation and friendly examples.
The two GPU backends (OpenCL and CUDA) are not released to the public yet because we're still considering commercial ventures with these.
Does this mean there is a possibility the GPU backends will not be available as open source, or only under a restricted licence?
The OpenCL backend generates and compiles code at runtime from a proto expression; the CUDA one does something smarter.
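For readers unfamiliar with that path, the runtime-compilation step such a back end relies on is the standard OpenCL host API, sketched below with error handling and device selection omitted; the kernel entry-point name is a placeholder.

    #include <CL/cl.h>
    #include <string>

    // Compile generated source text on the fly and obtain a kernel handle.
    // The source string is what a proto-based front end would have produced
    // from the expression at runtime.
    cl_kernel build_generated_kernel(cl_context ctx, cl_device_id dev,
                                     std::string const& source)
    {
        char const* text = source.c_str();
        size_t      len  = source.size();

        cl_program prog = clCreateProgramWithSource(ctx, 1, &text, &len, 0);
        clBuildProgram(prog, 1, &dev, "", 0, 0);

        // "generated_kernel" is a placeholder entry-point name.
        return clCreateKernel(prog, "generated_kernel", 0);
    }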
I'd like to hear about the smarter things! But I guess you won't say much if you have commercial plans...
Best regards,
Antoine.