On Sun, Dec 28, 2014 at 7:26 PM, John Bytheway wrote:
Here's my review.
1. What is your evaluation of the design?
The design of building on OpenCL seems sound, for the reasons Kyle has explained. The fact that it has real users is very encouraging.
I don't really like the name (I was unable to guess what the library was about from the name), but it's not worth changing at this point.
I am worried about the compile-time configuration with macros. I would hope that BOOST_COMPUTE_HAVE_THREAD_LOCAL could be read from Boost.Config. For the two features which require linking against other Boost libraries, my instinct would be to have them on by default, rather than off by default (that should be easier if and when Boost.Compute is part of the Boost distribution).
I agree and I would very much prefer to have these thread-safe features on by default and will work towards improving this. Also, I looked for a configuration macro to enable support for C++11 "thread_local" storage in Boost.Config, but could not find one. If/when one is added I'll update Boost.Compute to use it. And like you said, many of these changes become easier if Boost.Compute becomes a part of the standard Boost distribution.
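To illustrate, here is a hedged sketch of what such detection could look like, assuming Boost.Config gains a thread_local feature macro following its usual "defect macro" naming convention (BOOST_NO_CXX11_THREAD_LOCAL is the name it would most likely take; it did not exist as of Boost 1.57):

```cpp
// Hypothetical sketch only: BOOST_NO_CXX11_THREAD_LOCAL is assumed here,
// named after the Boost.Config convention of BOOST_NO_CXX11_* defect macros.
#include <boost/config.hpp>

#if !defined(BOOST_NO_CXX11_THREAD_LOCAL)
   // the compiler supports C++11 thread_local storage
#  define BOOST_COMPUTE_HAVE_THREAD_LOCAL
#endif
```

This would let the library pick up thread-safety support automatically instead of requiring users to define the macro themselves.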
It would be very nice to have range versions of all the algorithms. I find myself using the boost::range algorithms much more than the std:: ones these days.
I fully agree and also I'm very much looking forward to having range algorithms standardized in C++17.
Similarly, it would be good to provide the safer 4-iterator versions of equal, is_permutation, and mismatch (and any I've forgotten), as were added in C++14.
Fully agree. This shouldn't be too much work.
Some minor things:
- Consider renaming boost::compute::lambda to be consistent with one of the other placeholder-containing namespaces from std or Boost. (You can keep the old name as an alias for backwards-compatibility).
Yeah, I'll work on unifying the placeholders.
- Why the default of .25 as the bernoulli_distribution parameter?
Looks like an oversight, I'll get it updated to use 0.5 just as std::bernoulli_distribution does.
- Did you consider using Boost.Predef for your version macro?
Currently Boost.Compute remains compatible with Boost version 1.48 which didn't include Boost.Predef (added in 1.55 I believe). If Boost.Compute is accepted into the standard Boost distribution, updating it to use this should be no problem. (as well as many other Boost features added since 1.48).
2. What is your evaluation of the implementation?
Only looked at a couple of things while trying to understand the docs.
3. What is your evaluation of the documentation?
Overall, not too bad.
More tutorial would be good. In particular, something using some of the standard algorithms.
I'll definitely work on this more. Any ideas of specific tutorials/examples you'd like to see would be very helpful.
Some concerns:
- All the header links off http://kylelutz.github.io/compute/boost_compute/reference.html are broken.
- All or most of the things in "See also" sections look like they ought to be links but aren't.
- For all your function parameter documentation, the list of parameters seems to be sorted alphabetically by parameter name rather than in the order they are passed to the function. This is really weird...
These are all, to the best of my knowledge, bugs in the Boost documentation toolchain. I reported these issues a while back on the boost-doc mailing list here [1] but never found a resolution. I would really like to get these to work, but am not knowledgeable enough about the Boost documentation pipeline (specifically the doxygen->quickbook infrastructure) to fix it. If anyone has any ideas or could help with this I would be very appreciative.
- As others have mentioned, some kind of complexity estimate or promise for the algorithms is very important. In particular it would be good to understand how much parallelism they take advantage of and how they ought to scale with device compute capability.
Will do.
- The warning about accumulate being slow for floats should be much more prominent.
Will do.
- http://kylelutz.github.io/compute/boost/compute/copy.html mentions copying from e.g. a std::list to a compute::vector. It would be nice if it gave some kind of indication as to how efficient we can expect such an operation to be. In particular, I guess it ought to do some kind of buffering on the host?
Currently it will just buffer the entire range into a std::vector<T> and then copy that to the device. If this is a bottleneck it could also be improved to copy the data in smaller batches. Let me know if you encounter issues with this.
- The way I normally copy into vectors is call v.reserve(...) and then copy to std::back_inserter(v). I tried that with a compute::vector and unsurprisingly it was very inefficient. I see you already warned about this in the push_back docs, but perhaps it should also be highlighted in the documentation of compute::copy.
Yeah, this idiom is not well suited to copying into device memory. You should prefer to perform bulk operations using a single call to boost::compute::copy() (which internally will call the OpenCL function for copying large chunks of memory).
- Is there no version of inclusive/exclusive_scan or iota using an arbitrary associative operator?
These are not currently supported. I will work on adding an implementation for these and documenting them.
- http://kylelutz.github.io/compute/boost/compute/includes.html it's unclear whether the subrange has to be contiguous.
It should have the same semantics as std::includes, which, IIRC, does not require the subrange to be contiguous.
- min_element and max_element should cross-reference minmax_element.
Will do.
- What source of randomness does random_shuffle use?
Currently it simply calls std::random_shuffle to generate the shuffled index sequence, then uses boost::compute::scatter() to actually re-arrange the values on the device. However, I am planning to deprecate it and instead provide an implementation of the C++11 shuffle() algorithm, which will use the random number generator provided by the user.
- Why does reduce return its result through an OutputIterator?
In order to allow the algorithm to store the result in device memory and avoid a host-device synchronization point (which would be necessary if it simply returned the reduced result).
- http://kylelutz.github.io/compute/boost/compute/remove.html Does this really remove things or use std::remove semantics? You should clarify and document the return value. (Same with remove_if, unique)
Works just the same as std::remove. I'll add more documentation for these.
- I can see how sort_by_key might be useful, and I'm slightly surprised that there's no stable_sort_by_key (which seems like it would be more useful than stable_sort). Any particular reason?
Just not enough time to implement it yet. There is a parallel merge-sort in the works which would be very suitable for implementing a stable_sort_by_key(). Hopefully this should be ready soon.
- The boost/compute/types headers docs seem to be broken or missing. In particular I couldn't find reference documentation for float4_ et al.
Yeah, I wasn't sure of the best way to document all of these (IIRC, ten different built-in scalar types times four different vector versions of each). I'll add some information with an overview of the provided fundamental types.
- I was interested to see allocators, but the docs seem uninformative: http://kylelutz.github.io/compute/boost/compute/pinned_allocator.html
I'll definitely update this.
- Given the confusion with std::future, you might want to clarify whether compute::future's destructor blocks.
It does not, I'll update the documentation for this.
- Also, http://kylelutz.github.io/compute/BOOST_COMPUTE_CLOSURE.html doesn't explain what format the lists (arguments and capture) should be in. Preprocessor functions often take lists in other formats, such as a Boost.PP Sequence http://www.boost.org/doc/libs/1_57_0/libs/preprocessor/doc/data/sequences.ht..., and your example only has one-element lists, so it's not enough to clarify this point. (To be clear: I'm not suggesting you should use Boost.PP sequences, merely that you should make it clear in the docs whether you are).
Will do.
- http://kylelutz.github.io/compute/BOOST_COMPUTE_CL_CALLBACK.html is empty, as is http://kylelutz.github.io/compute/BOOST_COMPUTE_MAX_ARITY.html
Will fix.
- http://kylelutz.github.io/compute/boost/compute/valarray.html - presumably compute::valarray has overloaded arithmetic operators? Are they mentioned anywhere?
Actually they aren't currently implemented. I'll open an issue for this and also document them once they are added.
- http://kylelutz.github.io/compute/boost/compute/vector.html the unusual behaviour of vector's size_t constructor (not initializing the elements) should be warned more prominently.
Will do.
- http://kylelutz.github.io/compute/boost/compute/device.html the documentation of the enum type in the synopsis seems to be messed up. Similarly memory_object.
Thanks, I'll look into this more.
- http://kylelutz.github.io/compute/boost/compute/function.html I'm puzzled by this interface. What's the name for? Would one ever use this constructor rather than e.g. make_function_from_source?
There are a few legitimate use-cases. One is for specifying the names of pre-existing or built-in OpenCL functions.
- http://kylelutz.github.io/compute/boost/compute/field.html My reading of this is that when you get the type wrong you are guaranteed some kind of meaningful error report, and not just undefined behaviour; is that right? Perhaps you should clarify.
- I can't guess what atomic_min and atomic_max do.
They just forward to the corresponding OpenCL functions [2]. I'll update the documentation to be more clear about these.
- The right arrow ('next') link from http://kylelutz.github.io/compute/boost/compute/atomic_xor.html is broken (looks like the docs for the placeholders don't work).
Hmm, I'll look into this more.
- The iterators should state what iterator category they have.
Currently Boost.Compute doesn't really have different iterator categories; all of its iterators are random-access. I'll add this information to the documentation.
- I can't guess what svm_ptr is for. Later I discovered svm_alloc and svm_free. Is there an RAII way to manage these things?
These are for managing shared virtual memory (SVM) objects, which were added in OpenCL 2.0. Currently there are no RAII wrappers around them and they behave like malloc()/free(). I have plans to add an svm_allocator which will provide an allocator for SVM objects that can be used with the Boost.Compute containers (and possibly also standard STL containers).
- What does default_device do if there are no devices? Do the various environment variables mentioned have any kind of precedence? (Probably linking to the corresponding OpenCL docs is sufficient for such questions). Do they have to be exact matches? Are they case-sensitive?
There is a precedence, but it is not currently explained in the documentation. In most cases only one of the default-device variables is meant to be used. They currently match substrings and are case-sensitive. I'll work on improving the documentation for these and add some example usages.
- What does find_device do if the requested device does not exist?
Currently it will return a null (default-constructed) device object. I have considered updating this (and the default_device() function) to throw an exception when no valid device is found so that returned device objects are always valid.
- Can you give examples for type_definition? My guess was that it did the thing which type_name later turned out to do...
Yeah, type_definition<T> returns a string with the full definition for the type, while type_name<T> returns just the name. So for something like:

    struct Foo { int id; float mass; };
    BOOST_COMPUTE_ADAPT_STRUCT(Foo, Foo, (id, mass));

the following strings will be returned:

    type_name<Foo>()       == "Foo"
    type_definition<Foo>() == "typedef struct { int id; float mass; } Foo;"

I'll update the documentation for type_definition<T> with a better example.
- BOOST_COMPUTE_ADAPT_STRUCT should state that it must be used at global scope (i.e. not within a function or other namespace).
Will do.
- Congratulations for getting compile-time detection of internal padding into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the docs that this will be detected, so people don't worry too much.
Will do. I also hope in the future to transparently support padded structs as well and to be able to correctly copy them to/from the device without issue.
- It looks like the performance tests don't include the time for copying data to (and, where applicable, from) the device. You should mention this on the Performance page.
I'll add this, though it is very device/system specific. Also, there is a performance benchmark available in Boost.Compute ("perf_copy_to_device") which could be used to measure this on your own system.
- You should probably add C++AMP to your things compared with in the FAQ.
Will do.
4. What is your evaluation of the potential usefulness of the library?
High. I've wanted something like this in the past and not found it.
5. Did you try to use the library? With what compiler? Did you have any problems?
I used it with g++ 4.8.2 in C++1y mode. Some quick tests seemed to work OK, and indeed I was able to sort some large vectors of scalars faster using boost::compute::sort than std::sort, which is nice.
I was pleased to see that 64-bit integers were handled fine.
6. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Couple of hours reading and trying simple examples.
7. Are you knowledgeable about the problem domain?
I have done a couple of small projects with CUDA, and am generally knowledgeable, but with no previous OpenCL experience.
8. Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
Yes. OpenCL seems to have fairly broad acceptance, but is rather difficult to use safely and with C++-isms. This library fills a useful niche. In an ideal world it will eventually be rendered obsolete by things like parallelism entering the standard library, but I doubt that will happen soon if at all. This provides a good solution now, and we shouldn't dismiss it simply because something sparkly is approaching, which may or may not turn out to be awesome and widely adopted.
This library has clearly been a lot of work. Congratulations on making it this far, I hope the review is not too stressful :).
Thanks for the review! Your feedback has been very helpful!

-kyle

[1] http://lists.boost.org/boost-docs/2013/07/5077.php
[2] https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/atomic_min.html