Re: [boost] [compute] review

1 Jan 2015


      Le 30/12/14 20:08, Kyle Lutz a écrit :
...
On Tue, Dec 30, 2014 at 7:15 AM, Vicente J. Botet Escriba
<vicente.botet@wanadoo.fr> wrote:
...
Hi,
I have no time to do a complete review. Anyway here it is my minimal review
*1. What is your evaluation of the design?**
***
I would prefer that the algorithms take as first parameter the queue (which
could be abstracted to an asynchronous executor).
Any particular reason to prefer the first argument rather than the
last? Currently, by using the last argument to specify the command
queue, we can use a default argument of  "system::default_queue()".
For the common case of only wanting to use the default compute device
on the system, all algorithms will work correctly and the user never
has to bother with finding a device, setting up a context, creating a
command queue and passing it to the algorithm function.
One of the reasons is to allow for algorithms that take a variadic 
number of arguments. I don't know if you have one on your library, but I 
find this a uniform pattern.
Second, I consider that the user must be explicit about which queue 
(devices) is used as AFAIK this is an optimization library.
...
The performances let think that there is no interest in using the library
for containers having less than 10^4/10^5 elements.
In general yes, GPUs/Accelerators are ill-suited for small problem
sizes (unless the computation performed on each element is quite
expensive). Sorting 1000 integers will nearly always be faster on the
host rather than the device.
...
Even if this is considered a low level library, the documentation should
show how a library writer could make use of the provided interface to
provide a higher one.
I agree, there should be more documentation for this. Could you let me
know what sort of higher-level interface you'd like to see
demonstrated?
The one of the standard Parallel TS :), or a pipeline framework.
...
I would prefer that the library make use of std::future or boost::future
class instead of adding an additional one.
Do the author tried to make use of the existing futures? if yes, which
problems were found?
The implementation of the boost::compute::future type is fairly
different than that of the std::/boost::future types. Essentially,
there is no operation akin to "promise.set_value()" for
boost::compute::future. Instead, it is a simple wrapper around an
OpenCL event object which can be used to wait for an operation on the
device to complete (via calling the blocking clWaitForEvents()
function from the OpenCL API). Please let me know if there are ways to
better integrate with boost::future.
I understand this seems to don't match with the std/future/promise 
design. Note that the futures returned by std::async, 
std::make_ready_future, future<>::then, when_all, when_any, don't show 
neither an associated promise (as it is hidden).
I don't know yet how this could be done, but I find it an important use 
case to be solved.

I'm not (by principle) against a multiplication of future types, but 
this means that we would need to duplicate the whole future interface 
(then continuations, when_all, when _any, ...).
...
...
I'm missing a range interface to most of the algorithms.
The algorithms API is meant to mimic the STL algorithms API. Adding a
range-based interface is definitely possible, but is a non-trivial
amount of work. Would you be interested in contributing this?
Agreed this is a lot of work. No I have not time to contribute to this. 
I was just wondering if an range interface, couldn't make it possible to 
compose the programs executed by the whole pipeline and make the cost of 
copies less significant.
...
...
I suspect that as this is a C++03 library it doesn't take advantages of some
interesting C++11+/C++14 features.
It does take advantage of some C++11 features (e.g. variadic
functions) where possible. I would be very open to integrating more
C++11/14 features where appropriate.
...
It is not clear which operations are synchronous, which ones could block,
...
Could you be more specific about what operations you are referring to?
All the functions of the core classes must be explicit about 
thread-safety (synchronization).
I suspect that almost all the algorithm block, and no parallelism 
(pipeline) is possible. Could you confirm?
...
I know there is already some documentation regarding this for the
command_queue class as well as for the copy()/copy_async() functions.
I'll work on adding more.
Great.
...
In order to get the native OpenCl handle I suggest to name the function
get_native_handle instead of get.
*2. What is your evaluation of the implementation?*
I've not see at the implementation.
*3. What is your evaluation of the documentation?*
The documentation lacks of some core tutorials. I was unable to have an idea
of what a context is, what can be done with a command_queue, what is a
Kernel, a pipe, ...? Does it mean that the user must know OpenCL?
The low-level API is a wrapper around OpenCL and so having some
background knowledge there helps. I'll work on adding more
documentation for these classes and how they fit together.
Thanks.
...
The reference documentation should follow he C++ standard way, including
pre-conditions, effects, synchronization, throws clauses.
Fully agree, I'll continue to work on this.
Great.
...
I would appreciate a description of the algorithms used, how the parallelism
is used? is this completely managed by OpenCL or do the library something
else?
I'm not sure if I fully understand what you're asking here but I'll
try to answer it.
OpenCL provides an API for executing code on compute devices (e.g.
GPUs), Boost.Compute provides algorithms which, when called, use the
OpenCL API to create programs, compile them for the device, and
execute them.
Let me know if I can better explain this.
I have take a look at the OpenCL documentation and I understand it better.
Could it be possible to have a roughly pseudo code of the program that 
each one of the algorithms executes on the device?
...
...
*4. What is your evaluation of the potential usefulness of the library?**
***
This is essentially an optimization library. It should be quite useful, but
the performances results don't probe it.
Could you clarify what you mean by "the performances results don't
probe it"? Boost.Compute provides a number of benchmark programs which
can be used to measure and evaluate the performance of the library on
a particular system. Results from this are also published on the
"Performance" page [1] in the documentation.
What a meant is the performances are often worst except when the size of 
the container is high enough. The scope of the library could be to be 
better only for these big containers, but I'm missing an interface that 
makes a choice between the Compute or the STL one.
...
...
*8. Do you think the library should be accepted as a Boost library? **
*
As I have not taken too much time I'm unable to say that this library should
not be included in Boost. So I vote to include it conditionally, subject to:
     * I would need some performances that probe the library is better than
using standard STL containers in most of the cases.
Boost.Compute will never be universally better than the STL, and it
was never intended to be. CPUs and GPUs make fundamentally different
trade-offs which lead to very different performance characteristics.
GPUs support performing computation on large amounts of data with very
high throughput (in some cases 10x-100x faster than a CPU). The
down-side to this is a larger overhead to get the GPU running as well
as the cost of transferring data to device memory. Boost.Compute and
the STL are different tools which each suit their own different
use-cases. I don't think Boost.Compute should be required to supplant
the STL in order to be an acceptable library.
I suggest you to make more evident the goal and scope of the library in 
the introduction of the library. Please, replace my requirement by this 
one.
...
...
* The documentation should be completed with more tutorials and a better
reference documentation.
Fully agree, I'll definitely continue to work on improving the documentation.
Thanks for the review. Let me know if I can explain anything more clearly.
I will comeback if I find some time to read the documentation carefully.

Good luck,
Vicente