
On 9/18/2012 11:42 AM, Manjunath Kudlur wrote:
With CUDA, you'd actually have hit the problem that the Proto functions are not marked __device__, but with OpenCL it doesn't matter.
Mathias, could you say more about what is needed to make Proto CUDA-friendly? I'm not familiar with CUDA.
The thing needed to make Proto more CUDA-friendly is the same thing that is needed to make it AMP-friendly, in case you are familiar with C++ AMP [1]. Basically, you have to intrusively annotate every Proto function with the "__host__ __device__" annotation (restrict(cpu, amp) in the case of C++ AMP).
*Every* function in Proto? Or just the ones that build Proto expressions? Or evaluate them? Or some other subset?

There's precedent for this. Someone submitted a patch adding BOOST_FORCEINLINE to the important functions that need to be inlined for optimal evaluation performance. Perhaps those same functions would be a good place to start for adding something like a BOOST_PROTO_GPU_ENABLED macro.

I would be perfectly willing to accept a patch. Would anybody care to submit one?

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
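
To make the macro idea above concrete, here is a minimal sketch, not a patch against Proto. BOOST_PROTO_GPU_ENABLED is only the name floated in this thread, and plus_eval is a hypothetical stand-in for a Proto function object; neither exists in Boost as written here. The sketch assumes the compiler defines __CUDACC__ when building CUDA sources (nvcc does) and expands the macro to nothing everywhere else.

```cpp
// Hypothetical macro, named after the suggestion in this thread.
// Under nvcc it marks a function as callable from both host and device;
// under any other compiler it expands to nothing.
#if defined(__CUDACC__)
#  define BOOST_PROTO_GPU_ENABLED __host__ __device__
#else
#  define BOOST_PROTO_GPU_ENABLED
#endif

// Illustrative stand-in for a Proto-style function object (not real Proto code).
struct plus_eval
{
    template<typename L, typename R>
    BOOST_PROTO_GPU_ENABLED
    L operator()(L const& l, R const& r) const
    {
        return l + r;  // result is converted to the left operand's type
    }
};
```

With a plain host compiler the macro vanishes, so existing users see no change. C++ AMP would probably need a second macro, because its restrict(...) clause goes after the parameter list rather than in front of the declaration.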