On Tue, Dec 16, 2014 at 11:20 AM, Sebastian Schaetz wrote:
Hi,
Here is my review of Boost.Compute:
Thanks for the review! I left my responses in-line below. Let me know if I missed anything.
1. What is your evaluation of the design?
The library is based upon OpenCL, a cross-platform cross-device open standard that abstracts access to and provides a programming model for many-core vector co-processors such as GPUs. These co-processors are usually referred to as "devices".
The library provides a wrapper layer around the OpenCL C interface. It skips the standard OpenCL C++ wrapper, which I don't consider a problem because, except for destructors, that wrapper adds no value. In my opinion Khronos should adopt Boost.Compute as their C++ layer for OpenCL.
Boost.Compute provides compatibility with the OpenCL C interface through conversion operators that decay Boost.Compute types to their OpenCL C equivalents. This can be quite useful.
On top of this wrapper, Boost.Compute provides three core components:
* types to interact with and issue commands to devices: these follow OpenCL concepts but are not necessary if defaults are used
* means of managing memory (allocate, copy) on devices: this component contains also asynchronous operations which I consider essential in a library that deals with co-processors
* a collection of parallel primitives and meta-functions with an STL interface: this component contains powerful iterators that combine containers and algorithms to implement more complex algorithms efficiently
One thing I'm not clear about is how asynchrony is handled. Command queues are exposed, and issuing commands to different queues is one way to express concurrency. At the same time, copy_async returns a future, which is another way of exposing concurrency.
Solving the challenges of asynchronous and concurrent operations is out of scope for Boost.Compute; it is a difficult topic that has not been fully solved for C++ in general either. But at least the documentation should be more explicit about which commands are executed when, which commands are synchronous, which are asynchronous, and what role the command_queue plays in this regard.
Yeah, I should have documented this better. As a general rule, the Boost.Compute algorithms execute asynchronously with respect to the host. The algorithms operate by queuing up operations (e.g. kernel launches) to be executed on the device via the command queue (which is handled by OpenCL). So, for example, executing the "transform()" algorithm on a vector on the device will occur in parallel with any further code run on the host (at least until another OpenCL call introduces a synchronization point between the host and device, e.g. "clFinish()").

The exception to this rule is that any algorithm which reads or modifies host memory (such as the "copy()" algorithm with host and device iterators) will block until the operation is complete. I chose to implement Boost.Compute this way in order to eliminate any potential race conditions between the device writing to host memory and the host code using that same memory without synchronizing. This is the reason I introduced the "copy_async()" algorithm, which makes its asynchronous nature explicit and requires users to synchronize themselves (via wait() on the returned future or finish() on the command queue) before attempting to read the modified memory.

I am still looking to improve Boost.Compute in this area and am also paying close attention to new techniques (such as those in the Concurrency TS). Any ideas/proposals/thoughts on this would be greatly appreciated.
2. What is your evaluation of the implementation?
I did not evaluate the implementation in detail but looked at a few of the tricks Boost.Compute uses to generate kernels. The implementation of this part of the library is good and instructive.
3. What is your evaluation of the documentation?
Boost.Compute documentation is of excellent quality. The recent addition of performance data is helpful. I could not find any documentation about fancy iterators; this should probably be added. Also, it would be great if my questions regarding asynchrony/concurrency could be addressed in the documentation.
Addressed above. I'll work on updating the documentation to explain this more thoroughly.
4. What is your evaluation of the potential usefulness of the library?
Boost.Compute is extremely useful. With this library a developer familiar with the STL can utilize the processing power of GPUs without any knowledge of vector co-processor programming. The documentation shows that for large vector sizes, some Boost.Compute algorithms outperform the STL by an order of magnitude.
5. Did you try to use the library? With what compiler? Did you have any problems?
I ran the unit tests on an 8x GeForce Titan system without any problems, and on an ARM Mali GPU with some unit tests failing. I'll be working with the library author to fix the problems in these unit tests. I used gcc 4.8.2 for the tests on both GeForce and Mali.
Thanks for testing these! Hopefully it'll be quick to fix the issues on the Mali GPU.
6. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I reviewed the library a few months ago in-depth and reread the documentation for this review as well as ran some unit tests.
7. Are you knowledgeable about the problem domain?
My job involves working with both CUDA and OpenCL. Furthermore, I am the author of the Aura library [0], a similar, albeit lower-level, library for accelerator programming.
8. Do you think the library should be accepted as a Boost library?
I think the library should be accepted into Boost. The interface is simple and easy to understand for non-experts and the benefits of using this library can be significant.
I'd like to add that Boost.Compute represents one level of abstraction for accelerator programming. I'd like the Boost community to keep an open mind when it comes to different levels of abstraction, either lower (e.g. my Aura library) or higher (e.g. VexCL). Libraries with different levels of abstraction can coexist, be compatible with one another, or even build upon one another.
Thanks! And I also look forward to having a larger ecosystem of GPU/accelerator programming libraries. -kyle