
On 09/18/2012 06:28 PM, Kyle Lutz wrote:
*** Why not target CUDA and/or support multiple back-ends? ***
CUDA and OpenCL are two very different technologies. OpenCL works by compiling C99 kernel source at run-time into kernel objects which can then be executed on the GPU. CUDA, on the other hand, compiles its kernels ahead of time with a special compiler (nvcc), which produces binaries that can be executed on the GPU.
The company I work at has technology to generate both CUDA (at compile time) and OpenCL (at run time) kernels from expression templates. At the moment we support element-wise operations, global and partial reductions across all dimensions, as well as partial scans across all dimensions. Element-wise function combinations can be fused into a single reduction or scan kernel. Everything is automatically streamed and retrieved as needed, and data is cached on the device when possible, with a runtime deciding how much memory and how many compute resources to allocate for each computation depending on the device capabilities. Therefore, I do not think supporting both CUDA and OpenCL is an intractable problem. People want CUDA for a simple reason: CUDA is still faster than equivalent OpenCL on NVIDIA hardware. I think, however, that automatic kernel generation is a problem of its own, and should be clearly separated from the distribution and memory-handling logic.
This is the reason why I wrote the Boost.Compute lambda library. Basically, it takes C++ lambda expressions (e.g. _1 * sqrt(_1) + 4) and transforms them into C99 source code fragments (e.g. "input[i] * sqrt(input[i]) + 4") which are then passed to the Boost.Compute STL-style algorithms for execution. While not perfect, it allows the user to write code closer to C++ that can still be executed through OpenCL.
From your description, it looks like you've reinvented the wheel there, causing needless limitations and interoperability problems for users. The same could have been done by serializing arbitrary Proto transforms to C99, with extension points for custom tags. With CUDA you would have hit the problem that the Proto functions are not marked __device__, but with OpenCL it doesn't matter.