
On 09/18/2012 08:20 PM, Eric Niebler wrote:
On 9/18/2012 11:00 AM, Mathias Gaunard wrote:
This is the reason why I wrote the Boost.Compute lambda library. Basically, it takes C++ lambda expressions (e.g. _1 * sqrt(_1) + 4) and transforms them into C99 source code fragments (e.g. "input[i] * sqrt(input[i]) + 4") which are then passed to the Boost.Compute STL-style algorithms for execution. While not perfect, it allows the user to write code closer to C++ that can still be executed through OpenCL.
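For reference, this is roughly what the user-facing side looks like; a minimal sketch against Boost.Compute's public API (the device/context/queue setup, the compute::lambda::_1 placeholder, and the header names are taken from the library as later shipped, so details may differ from the version under discussion):

    #include <vector>
    #include <boost/compute/core.hpp>
    #include <boost/compute/lambda.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/transform.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    int main()
    {
        using compute::lambda::_1;

        // set up an OpenCL device, context and command queue
        compute::device gpu = compute::system::default_device();
        compute::context context(gpu);
        compute::command_queue queue(context, gpu);

        std::vector<float> host_input(256, 2.0f);
        compute::vector<float> input(host_input.begin(), host_input.end(), queue);
        compute::vector<float> output(input.size(), context);

        // the placeholder expression is translated to an OpenCL C
        // fragment and compiled into a kernel at run time
        compute::transform(input.begin(), input.end(), output.begin(),
                           _1 * sqrt(_1) + 4, queue);

        std::vector<float> host_output(input.size());
        compute::copy(output.begin(), output.end(), host_output.begin(), queue);
        return 0;
    }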
From your description, it looks like you've reinvented the wheel there, causing needless limitations and interoperability problems for users.
It could have just been done by serializing arbitrary Proto expressions to C99, with extension points for custom tags.
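To illustrate the idea, here is a minimal sketch that uses Proto's callable_context to serialize an expression tree to a C99 fragment instead of evaluating it (the placeholder type, the c99_context name, and the "input[i]" rendering are invented for illustration; custom tags such as sqrt would get their own overloads, which are exactly the extension points meant above):

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <boost/proto/proto.hpp>

    namespace proto = boost::proto;

    // hypothetical placeholder terminal standing for "the current element"
    struct placeholder {};
    proto::terminal<placeholder>::type const _1 = {{}};

    // An evaluation context that produces C99 source text rather than a value.
    struct c99_context : proto::callable_context<c99_context const>
    {
        typedef std::string result_type;

        std::string operator()(proto::tag::terminal, placeholder) const
        {
            return "input[i]";
        }

        std::string operator()(proto::tag::terminal, int value) const
        {
            std::ostringstream out;
            out << value;
            return out.str();
        }

        template<typename Left, typename Right>
        std::string operator()(proto::tag::plus, Left const &l, Right const &r) const
        {
            return "(" + proto::eval(l, *this) + " + " + proto::eval(r, *this) + ")";
        }

        template<typename Left, typename Right>
        std::string operator()(proto::tag::multiplies, Left const &l, Right const &r) const
        {
            return "(" + proto::eval(l, *this) + " * " + proto::eval(r, *this) + ")";
        }
    };

    int main()
    {
        c99_context ctx;
        // prints "((input[i] * input[i]) + 4)"
        std::cout << proto::eval(_1 * _1 + 4, ctx) << std::endl;
        return 0;
    }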
With CUDA, you'd actually have hit the problem that the Proto functions are not marked __device__, but with OpenCL it doesn't matter.
Mathias, could you say more about what is needed to make Proto CUDA-friendly? I'm not familiar with CUDA.
In CUDA, the same function can exist on the host (the CPU), on the device (the GPU), or on both. By default, unfortunately, it exists only on the host. So to be able to call certain functions from a kernel (which runs on the device), those functions need to be marked __device__ or __host__ __device__. There is the macro BOOST_GPU_ENABLED for this. I haven't tried this in a long time, though, and I wonder how it interacts with __attribute__((always_inline)).
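To make the host/device distinction concrete, here is a minimal CUDA sketch (transform_element and transform_kernel are hypothetical names). BOOST_GPU_ENABLED comes from Boost.Config: it expands to __host__ __device__ when __CUDACC__ is defined, and to nothing otherwise:

    #include <math.h>
    #include <boost/config.hpp>  // defines BOOST_GPU_ENABLED

    // Without this marker the function would be host-only under nvcc,
    // and the kernel below could not call it.
    BOOST_GPU_ENABLED
    inline float transform_element(float x)
    {
        return x * sqrtf(x) + 4.0f;
    }

    __global__ void transform_kernel(float const *input, float *output, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            output[i] = transform_element(input[i]);
    }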