Hi,
not a proper review, just some comments from a library user based on studying
documentation and writing a couple of simple programs using Boost.Compute on
Intel's and NVIDIA's OpenCL implementations (an i7 CPU and a GTX 970 GPU
respectively). I used MSVC 2013 for the tests.
Some of these comments I've shared previously on this list, but I'm repeating
them here for completeness.
I have some background in heterogeneous development and OpenCL in particular,
but I'm no expert. I found Boost.Compute easy to pick up; it provides a clean
C++ layer with a good mix of low- and high-level APIs.
Comments in no particular order:
1) As a library user who's no C++ guru, I find the error messages produced
when passing incorrect parameters quite confusing.
For example, the first time I tried Boost.Compute I managed not to pass the
output iterator parameter to transform(), so the lambda expression ended up
in that position instead. This resulted in several very confusing errors inside
result_of.hpp, with no errors anywhere near my code or Boost.Compute's.
It would be nice if compilation could fail in a more informative manner.
2) I did miss async versions of the algorithms, so that it's possible to chain
together multiple calls. Even though all the data sits on the compute device,
the overhead of waiting for each operation to finish before queuing the next can
make the compute gains completely irrelevant.
3) I think relevant calls should have a non-throwing form returning an error
code, à la Boost.Asio.
4) Threading has been mentioned in this list. At the very least the
documentation should be updated to clearly state the thread safety of each
call/class, à la Boost.Asio. Beyond that I would like to be able to share a
context between several threads, each of which could contain its own queue and
device.
5) If thread safety or similar important features require a compiled part, I
wouldn't mind Boost.Compute being non-header-only. Though if possible making it
optional like other Boost libraries do would be great.
6) The tests, and preferably the performance tests as well, should have some way
of testing all devices across all platforms. Currently they seem to test only
the default device on the default platform. This is insufficient IMHO.
7) I'd like some way of defining user functions either as lambda expressions,
raw OpenCL-C code via make_function_from_source or both, which can be used in
lambda expressions for say transform(). Something like
function<float(float)> sqr = _1 * _1;
transform(inp.begin(), inp.end(), out.begin(), sqr(2 * _1 + 1), queue);
On Sun, Dec 28, 2014 at 1:40 PM, Asbjørn wrote:
> Hi,
> not a proper review, just some comments from a library user based on studying documentation and writing a couple of simple programs using Boost.Compute on Intel's and NVIDIA's OpenCL implementations (an i7 CPU and a GTX 970 GPU respectively). I used MSVC 2013 for the tests.
> Some of these comments I've shared previously on this list, but I'm repeating them here for completeness.
> I have some background in heterogeneous development and OpenCL in particular, but I'm no expert. I found Boost.Compute easy to pick up; it provides a clean C++ layer with a good mix of low- and high-level APIs.
> Comments in no particular order:
> 1) As a library user who's no C++ guru, I find the error messages produced when passing incorrect parameters quite confusing.
> For example, the first time I tried Boost.Compute I managed not to pass the output iterator parameter to transform(), so the lambda expression ended up in that position instead. This resulted in several very confusing errors inside result_of.hpp, with no errors anywhere near my code or Boost.Compute's.
> It would be nice if compilation could fail in a more informative manner.
Fully agree. There are many places where more static_asserts could be added to help improve compile error messages. I'll work on this.
> 2) I did miss async versions of the algorithms, so that it's possible to chain together multiple calls. Even though all the data sits on the compute device, the overhead of waiting for each operation to finish before queuing the next can make the compute gains completely irrelevant.
Can you let me know what chain of functions you're calling? Many algorithms should already execute asynchronously and provide the behavior you expect.
> 3) I think relevant calls should have a non-throwing form returning an error code, à la Boost.Asio.
This could be implemented, but would be a large amount of work (essentially doubling the size of the public API). Can you let me know more about your use-case and why the current exception-based API is not suitable?
> 4) Threading has been mentioned on this list. At the very least the documentation should be updated to clearly state the thread safety of each call/class, à la Boost.Asio. Beyond that I would like to be able to share a context between several threads, each of which could contain its own queue and device.
I'll add more documentation for this. You should already be able to share a context between multiple threads; let me know if you run into any issues with this.
> 5) If thread safety or similar important features require a compiled part, I wouldn't mind Boost.Compute being non-header-only. Though if possible, making it optional like other Boost libraries do would be great.
I have been considering this more, but it would be a large change (and would not be backwards-compatible for existing users).
> 6) The tests, and preferably the performance tests as well, should have some way of testing all devices across all platforms. Currently they seem to test only the default device on the default platform. This is insufficient IMHO.
I have wanted to improve the testing infrastructure to run all tests for all devices automatically; I just don't know the best way to use Boost.Test to accomplish this. For now I manually run the tests for each device (usually by setting BOOST_COMPUTE_DEFAULT_DEVICE and then running ctest).
> 7) I'd like some way of defining user functions either as lambda expressions, raw OpenCL-C code via make_function_from_source, or both, which can be used in lambda expressions for, say, transform(). Something like
> function<float(float)> sqr = _1 * _1;
> transform(inp.begin(), inp.end(), out.begin(), sqr(2 * _1 + 1), queue);
This is definitely something I want to support, just haven't had the time to implement it yet. There is an issue open already for it [1].
> Overall I think Boost.Compute can be useful right now for some use-cases, and it has good potential to be significantly more useful with a bit more work.
Thanks. I'll definitely continue working to fix these issues (as well as others that came up during the review).
-kyle
[1] https://github.com/kylelutz/compute/issues/59
On 29.12.2014 01:42, Kyle Lutz wrote:
> On Sun, Dec 28, 2014 at 1:40 PM, Asbjørn wrote:
>> 2) I did miss async versions of the algorithms, so that it's possible to chain together multiple calls. Even though all the data sits on the compute device, the overhead of waiting for each operation to finish before queuing the next can make the compute gains completely irrelevant.
> Can you let me know what chain of functions you're calling? Many algorithms should already execute asynchronously and provide the behavior you expect.
Seems I had missed this crucial part. Given that copy() is synchronous and has a separate async version, it's easy to forget that the other operations are enqueued without blocking. For example, the reference page of transform()[1] makes no mention of it being asynchronous. It makes sense given the default in-order command queue execution, but I think it should be more explicit, to make it harder to forget late at night :)
>> 3) I think relevant calls should have a non-throwing form returning an error code, à la Boost.Asio.
> This could be implemented, but would be a large amount of work (essentially doubling the size of the public API). Can you let me know more about your use-case and why the current exception-based API is not suitable?
Fair enough. From experience, the main errors one can handle sensibly are insufficient memory for a buffer (one can try smaller buffers, a reduced dataset or an alternate algorithm) and kernels failing to compile/run due to a lack of registers or similar resources (one can try an alternate algorithm/kernel).
For example, using a single large buffer may be significantly faster, but if the maximum buffer size is small, one can switch to a ping-pong algorithm.
[1] http://kylelutz.github.io/compute/boost/compute/transform.html
Cheers
- Asbjørn
On Tue, Dec 30, 2014 at 4:12 AM, Asbjørn wrote:
> On 29.12.2014 01:42, Kyle Lutz wrote:
>> On Sun, Dec 28, 2014 at 1:40 PM, Asbjørn wrote:
>>> 2) I did miss async versions of the algorithms, so that it's possible to chain together multiple calls. Even though all the data sits on the compute device, the overhead of waiting for each operation to finish before queuing the next can make the compute gains completely irrelevant.
>> Can you let me know what chain of functions you're calling? Many algorithms should already execute asynchronously and provide the behavior you expect.
> Seems I had missed this crucial part. Given that copy() is synchronous and has a separate async version, it's easy to forget that the other operations are enqueued without blocking. For example, the reference page of transform()[1] makes no mention of it being asynchronous. It makes sense given the default in-order command queue execution, but I think it should be more explicit, to make it harder to forget late at night :)
I agree, I'll work on documenting this behavior better.
>>> 3) I think relevant calls should have a non-throwing form returning an error code, à la Boost.Asio.
>> This could be implemented, but would be a large amount of work (essentially doubling the size of the public API). Can you let me know more about your use-case and why the current exception-based API is not suitable?
> Fair enough. From experience, the main errors one can handle sensibly are insufficient memory for a buffer (one can try smaller buffers, a reduced dataset or an alternate algorithm) and kernels failing to compile/run due to a lack of registers or similar resources (one can try an alternate algorithm/kernel).
> For example, using a single large buffer may be significantly faster, but if the maximum buffer size is small, one can switch to a ping-pong algorithm.
Yeah, I would be very interested in exploring error handling strategies like these.
-kyle
Asbjørn wrote:
> Comments in no particular order:
> 1) As a library user who's no C++ guru, I find the error messages produced when passing incorrect parameters quite confusing.
> For example, the first time I tried Boost.Compute I managed not to pass the output iterator parameter to transform(), so the lambda expression ended up in that position instead. This resulted in several very confusing errors inside result_of.hpp, with no errors anywhere near my code or Boost.Compute's.
> It would be nice if compilation could fail in a more informative manner.
This is an annoying and recurring problem in much of the Boost library documentation. The solution has been well known for many years. Here is more information:
http://rrsd.com/blincubator.com/advice_concepts/
Robert Ramey
--
View this message in context: http://boost.2283326.n4.nabble.com/compute-Review-comments-tp4670330p4670364...
Sent from the Boost - Dev mailing list archive at Nabble.com.
> This is an annoying and recurring problem in much of the Boost library documentation. The solution has been well known for many years. Here is more information:
> http://rrsd.com/blincubator.com/advice_concepts/
> Robert Ramey
+1. This is very useful advice, especially for library writers.
Paul
participants (4)
- Asbjørn
- Kyle Lutz
- pfultz2
- Robert Ramey