On Sun, Dec 28, 2014 at 4:46 PM, Gruenke,Matt wrote:
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Kyle Lutz
Sent: Sunday, December 28, 2014 14:42
To: boost@lists.boost.org List
Subject: Re: [boost] [compute] review
On Sun, Dec 28, 2014 at 1:54 AM, Gruenke,Matt wrote:
I agree with other comments made about synchronization. The design should be more explicit about what's asynchronous,
Like I mentioned before, there is only one mechanism for asynchrony in Boost.Compute: the command queue abstraction provided by OpenCL. Operations are enqueued for execution on the compute device, and this occurs asynchronously with respect to code executing on the host. The exceptions to this rule are functions which interact directly with host memory; these are blocking by default and offer explicit "_async()" versions (in order to prevent potential race conditions on host memory).
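For example, the difference looks roughly like this (a minimal sketch using copy()/copy_async(); check the documentation for the exact signatures and headers):

    #include <vector>
    #include <boost/compute.hpp>

    namespace compute = boost::compute;

    int main()
    {
        compute::device gpu = compute::system::default_device();
        compute::context ctx(gpu);
        compute::command_queue queue(ctx, gpu);

        std::vector<float> host(1024, 1.0f);
        compute::vector<float> device_vec(1024, ctx);

        // touches host memory, so it blocks until the transfer completes
        compute::copy(host.begin(), host.end(), device_vec.begin(), queue);

        // explicit "_async()" version: returns immediately with a future
        auto f = compute::copy_async(host.begin(), host.end(),
                                     device_vec.begin(), queue);
        // ... other host work ...
        f.wait(); // 'host' must stay alive until the transfer finishes
        return 0;
    }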
Regarding synchronization, I'm also a bit concerned about the performance impact of synchronizing on all copies to host memory. Overuse of synchronization can easily result in performance deterioration. On this point, I think it might be worth limiting the host memory usable with algorithms to containers that synchronize implicitly upon actual use (or destruction) of the results. Give users the choice between that and performing explicit copies to raw types.
To be clear, not all copies to host memory are synchronous. Boost.Compute allows both synchronous and asynchronous memory transfers between the host and device.
My understanding, based on comments you've made to other reviewers, is that functions like boost::compute::transform() are asynchronous when the result is on the device, but block when the result is on the host. This is what I'm concerned about. Is it true?
Yes, this is correct. In general, algorithms like transform() are asynchronous when the input/output ranges are both on the device and synchronous when one of the ranges is on the host. I'll work on better ways to allow asynchrony in the latter case. One of my current ideas is to add asynchronous memory-mapping support to the mapped_view class [1], which could then be used with any of the algorithms in an asynchronous fashion.
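To illustrate the current behavior (a rough sketch; exact headers and signatures per the documentation):

    #include <vector>
    #include <boost/compute.hpp>

    namespace compute = boost::compute;

    int main()
    {
        compute::device gpu = compute::system::default_device();
        compute::context ctx(gpu);
        compute::command_queue queue(ctx, gpu);

        compute::vector<float> in(1 << 20, ctx);
        compute::vector<float> out(1 << 20, ctx);
        compute::fill(in.begin(), in.end(), 2.0f, queue);

        // device -> device: the kernel is enqueued and transform() returns
        // without waiting for it to complete
        compute::transform(in.begin(), in.end(), out.begin(),
                           compute::sqrt<float>(), queue);
        // ... host work can overlap with the kernel here ...
        queue.finish(); // explicit synchronization point

        // host output range: this call blocks until the result is available
        std::vector<float> host_out(out.size());
        compute::copy(out.begin(), out.end(), host_out.begin(), queue);

        // mapped_view wraps existing host memory as a device-accessible
        // buffer, so algorithms treat it like a device range; the
        // asynchronous mapping mentioned above is still only an idea
        std::vector<float> host_in(1 << 20, 3.0f);
        compute::mapped_view<float> view(host_in.data(), host_in.size(), ctx);
        compute::transform(view.begin(), view.end(), out.begin(),
                           compute::sqrt<float>(), queue);
        queue.finish();
        return 0;
    }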
Also, I agree with Thomas M that it'd be useful for operations to return events.
All asynchronous operations in the command queue class do return events. One of his comments was to also return events from the synchronous methods for consistency, and I am working on adding this.
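For example (modulo the exact signatures, which should be checked against the command_queue documentation):

    #include <vector>
    #include <boost/compute.hpp>

    namespace compute = boost::compute;

    int main()
    {
        compute::device gpu = compute::system::default_device();
        compute::context ctx(gpu);
        compute::command_queue queue(ctx, gpu);

        compute::vector<float> data(4096, ctx);
        compute::fill(data.begin(), data.end(), 1.0f, queue);

        // the asynchronous enqueue_* methods return an event which can be
        // waited on, or passed in a wait_list to order later commands
        std::vector<float> host(4096);
        compute::event e = queue.enqueue_read_buffer_async(
            data.get_buffer(), 0, 4096 * sizeof(float), host.data());

        // ... unrelated host-side work ...
        e.wait(); // block until the read has completed
        return 0;
    }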
Well, what I had in mind was events for higher-order operations, like boost::compute::transform().
Yes, I would also like to have higher-level support for chaining together algorithms asynchronously. However, designing a generic and generally useful API for this is a complex task and may take some time (I've shied away from just adding an extra "_async()" version of every algorithm API, as I think it could be done better and more extensibly). Any ideas/proposals for this would be great to hear.

-kyle

[1] http://kylelutz.github.io/compute/boost/compute/mapped_view.html
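For concreteness, here is one very rough, hypothetical shape such an interface could take; none of these names exist in Boost.Compute today, and this is only a sketch for discussion:

    // hypothetical: an "_async" algorithm overload that accepts a wait_list
    // and returns a future carrying the completion event
    template<class InputIterator, class OutputIterator, class UnaryFunction>
    future<OutputIterator>
    transform_async(InputIterator first, InputIterator last,
                    OutputIterator result, UnaryFunction op,
                    command_queue &queue,
                    const wait_list &events = wait_list());

    // usage: chain two kernels and a read back to the host without
    // blocking in between
    auto f1 = transform_async(in.begin(), in.end(), tmp.begin(),
                              sqrt<float>(), queue);
    auto f2 = transform_async(tmp.begin(), tmp.end(), out.begin(),
                              sqrt<float>(), queue,
                              wait_list(f1.get_event()));
    auto f3 = copy_async(out.begin(), out.end(), host.begin(), queue);
    f3.wait();

Note that with a single in-order command queue the operations already execute in submission order, so explicit event chaining mainly matters for out-of-order queues or work spread across multiple queues.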