On Tue, Dec 30, 2014 at 6:23 PM, John Bytheway wrote:
On 2014-12-29 02:01, Kyle Lutz wrote:
On Sun, Dec 28, 2014 at 7:26 PM, John Bytheway wrote:
It would be very nice to have range versions of all the algorithms. I find myself using the boost::range algorithms much more than the std:: ones these days.
I fully agree, and I'm also very much looking forward to having range algorithms standardized in C++17.
Do you mean that you are planning to add them to Boost.Compute?
I would be interested in having range algorithms in Boost.Compute but this would be a large amount of work and is not currently very high in my list of priorities. If anyone is interested in working on these please let me know.
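For anyone who wants to experiment, range overloads can be layered on top of the existing iterator-based algorithms; a minimal sketch (the wrapper namespace and overload are purely illustrative, not part of the library):

    #include <boost/range/begin.hpp>
    #include <boost/range/end.hpp>
    #include <boost/compute/command_queue.hpp>
    #include <boost/compute/algorithm/sort.hpp>

    // illustrative range wrapper, not part of Boost.Compute
    namespace compute_ranges {

    template<class Range>
    inline void sort(Range &range, boost::compute::command_queue &queue)
    {
        boost::compute::sort(boost::begin(range), boost::end(range), queue);
    }

    } // namespace compute_ranges

Most of the other algorithms could be wrapped the same way, so it is largely mechanical work.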
3. What is your evaluation of the documentation?
Overall, not too bad.
More tutorial would be good. In particular, something using some of the standard algorithms.
I'll definitely work on this more. Any ideas of specific tutorials/examples you'd like to see would be very helpful.
Something using an algorithm which moves elements around might be nice. Maybe sort or partition...
Will do. Also, if you're not aware, there are quite a few example applications in the "example" directory [1]. I'll also work on making these more prominent in the documentation.
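A minimal tutorial along those lines might look roughly like this (a sketch using the default device; error handling omitted):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>
    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/sort.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // get the default device and set up a context and command queue
        compute::device device = compute::system::default_device();
        compute::context context(device);
        compute::command_queue queue(context, device);

        // generate random data on the host
        std::vector<float> host_vector(1000000);
        std::generate(host_vector.begin(), host_vector.end(), rand);

        // create a vector on the device and transfer the data to it
        compute::vector<float> device_vector(host_vector.size(), context);
        compute::copy(host_vector.begin(), host_vector.end(),
                      device_vector.begin(), queue);

        // sort the data on the device
        compute::sort(device_vector.begin(), device_vector.end(), queue);

        // copy the sorted values back to the host
        compute::copy(device_vector.begin(), device_vector.end(),
                      host_vector.begin(), queue);

        return 0;
    }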
Some concerns:
- All the header links off http://kylelutz.github.io/compute/boost_compute/reference.html are broken.
- All or most of the things in "See also" sections look like they ought to be links but aren't.
- For all your function parameter documentation, the list of parameters seems to be sorted alphabetically by parameter name rather than in the order they are passed to the function. This is really weird...
These are all, to the best of my knowledge, bugs in the Boost documentation toolchain. I reported these issues a while back on the boost-doc mailing list here [1] but never found a resolution. I would really like to get these to work, but am not knowledgeable enough about the Boost documentation pipeline (specifically the doxygen->quickbook infrastructure) to fix it. If anyone has any ideas or could help with this I would be very appreciative.
Shame :(. Hope you get some responses here.
Yeah, me too.
- http://kylelutz.github.io/compute/boost/compute/copy.html mentions copying from e.g. a std::list to a compute::vector. It would be nice if it gave some kind of indication as to how efficient we can expect such an operation to be. In particular, I guess it ought to do some kind of buffering on the host?
Currently it will just buffer the entire range into a std::vector<T> and then copy that to the device. If this is a bottleneck it could also be improved to copy the data in smaller batches. Let me know if you encounter issues with this.
Does this also work with copy_async? I presume not, because there would be troubles with the lifetime of the temporary vector...
Yeah, as mentioned in one of the replies, copy_async() currently only works with contiguous ranges. I have ideas on how to make this work more generally but I haven't had time to implement it.
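Concretely, the current behaviour is roughly as follows (a sketch; container sizes and values are arbitrary):

    #include <list>
    #include <vector>
    #include <boost/compute/system.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void copy_examples()
    {
        compute::command_queue &queue = compute::system::default_queue();

        // non-contiguous host range: copy() buffers it on the host internally
        std::list<int> host_list(1000, 42);
        compute::vector<int> device_vec(host_list.size());
        compute::copy(host_list.begin(), host_list.end(),
                      device_vec.begin(), queue);   // blocking

        // contiguous host range: copy_async() returns a future immediately
        std::vector<int> host_vec(1000, 7);
        auto f = compute::copy_async(host_vec.begin(), host_vec.end(),
                                     device_vec.begin(), queue);
        // ... other host work ...
        f.wait();   // ensure the transfer finished before touching host_vec
    }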
- The way I normally copy into vectors is call v.reserve(...) and then copy to std::back_inserter(v). I tried that with a compute::vector and unsurprisingly it was very inefficient. I see you already warned about this in the push_back docs, but perhaps it should also be highlighted in the documentation of compute::copy.
Yeah, this idiom is not well suited to copying into device memory. You should prefer to do "bulky" operations using a single call to boost::compute::copy() (which internally will call the OpenCL function for copying large chunks of memory).
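The two idioms side by side, assuming the default context and queue (a sketch):

    #include <algorithm>
    #include <iterator>
    #include <vector>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void fill_device_vector(const std::vector<float> &host_data)
    {
        // slow: one small host-to-device transfer per element
        compute::vector<float> v1;
        v1.reserve(host_data.size());
        std::copy(host_data.begin(), host_data.end(), std::back_inserter(v1));

        // fast: a single bulk transfer of the whole block
        compute::vector<float> v2(host_data.size());
        compute::copy(host_data.begin(), host_data.end(), v2.begin());
    }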
- http://kylelutz.github.io/compute/boost/compute/includes.html it's unclear whether the subrange has to be contiguous.
It should have the same semantics as std::includes, which, IIRC, does not require the subrange to be contiguous.
Then perhaps rephrasing would make it clearer. The wording on http://en.cppreference.com/w/cpp/algorithm/includes seems clear.
- What source of randomness does random_shuffle use?
Currently it simply calls std::random_shuffle to generate the shuffled index sequence and then uses boost::compute::scatter() to actually re-arrange the values on the device. However, I am planning on deprecating it and providing an implementation of the C++11 shuffle() algorithm instead, which will use the random number generator provided by the user.
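Conceptually, the current approach is something like this sketch (not the actual implementation):

    #include <vector>
    #include <numeric>       // std::iota
    #include <algorithm>     // std::random_shuffle
    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/scatter.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void shuffle_on_device(compute::vector<float> &values,
                           compute::context &context,
                           compute::command_queue &queue)
    {
        // build a shuffled index sequence on the host
        std::vector<unsigned int> indices(values.size());
        std::iota(indices.begin(), indices.end(), 0);
        std::random_shuffle(indices.begin(), indices.end());

        // transfer the index map to the device
        compute::vector<unsigned int> map(indices.size(), context);
        compute::copy(indices.begin(), indices.end(), map.begin(), queue);

        // scatter the values into a temporary vector according to the map
        compute::vector<float> tmp(values.size(), context);
        compute::scatter(values.begin(), values.end(), map.begin(),
                         tmp.begin(), queue);

        // copy the shuffled values back into place
        compute::copy(tmp.begin(), tmp.end(), values.begin(), queue);
    }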
Sounds good.
- Why does reduce return its result through an OutputIterator?
In order to allow the algorithm to store the result in device memory and avoid a host-device synchronization point (which would be necessary if it simply returned the reduced result).
That makes sense, but seems inconsistent with accumulate, and to a lesser extent with the algorithms that return bool (binary_search, equal, etc.). Are they all introducing a synchronization point? What about the ones returning iterators?
Yeah, algorithms like accumulate() which return the resulting value on the host are synchronization points. Others like transform() which return an iterator do not because the returned iterator is not actually the "result" of the operation and can be calculated and returned before the operation is actually complete.
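To make the distinction concrete, a sketch (assuming a context and queue are already set up):

    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/accumulate.hpp>
    #include <boost/compute/algorithm/reduce.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void reduce_vs_accumulate(compute::vector<float> &data,
                              compute::context &context,
                              compute::command_queue &queue)
    {
        // reduce(): the sum is written to device memory through the output
        // iterator, so no host-device synchronization is needed here
        compute::vector<float> sum(1, context);
        compute::reduce(data.begin(), data.end(), sum.begin(), queue);

        // accumulate(): the sum is returned to the host, so this call blocks
        // until the computation and the transfer of the result are done
        float host_sum = compute::accumulate(data.begin(), data.end(), 0.0f, queue);
        (void)host_sum;
    }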
- The boost/compute/types headers docs seem to be broken or missing. In particular I couldn't find reference documentation for float4_ et al.
Yeah, I wasn't sure of the best way to document all of these (IIRC, ten different built-in scalar types times four different vector versions of each). I'll add some information with an overview of the provided fundamental types.
At least something documenting the supported means of initialization and element access would be good :).
Will do.
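Something along these lines would probably cover the common cases (a sketch; it assumes the usual scalar constructor and operator[] on float4_):

    #include <iostream>
    #include <boost/compute/types/fundamental.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // construct a four-component vector from individual scalars
        compute::float4_ v(1.0f, 2.0f, 3.0f, 4.0f);

        // element access via operator[]
        v[3] = 5.0f;
        std::cout << v[0] << ", " << v[3] << std::endl;

        return 0;
    }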
- http://kylelutz.github.io/compute/boost/compute/function.html I'm puzzled by this interface. What's the name for? Would one ever use this constructor rather than e.g. make_function_from_source?
There are a few legitimate use-cases. One is for specifying the names of pre-existing or built-in functions (e.g. constructing a function object with the name "sqrt" would call the built-in sqrt() function).
So constructing a function by name points it at some existing named function. That makes sense, but might be unexpected, so it should probably be mentioned in the constructor documentation.
Will do.
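For reference, the two routes side by side (a sketch; the plus_one function is just an example):

    #include <boost/compute/core.hpp>
    #include <boost/compute/function.hpp>
    #include <boost/compute/algorithm/transform.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void transform_examples(compute::vector<float> &v,
                            compute::command_queue &queue)
    {
        // wrap the built-in OpenCL sqrt() function by name
        compute::function<float(float)> sqrt_fn("sqrt");
        compute::transform(v.begin(), v.end(), v.begin(), sqrt_fn, queue);

        // define a new function from OpenCL C source
        compute::function<float(float)> plus_one =
            compute::make_function_from_source<float(float)>(
                "plus_one",
                "float plus_one(float x) { return x + 1.0f; }"
            );
        compute::transform(v.begin(), v.end(), v.begin(), plus_one, queue);
    }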
- http://kylelutz.github.io/compute/boost/compute/field.html My reading of this is that when you get the type wrong you are guaranteed some kind of meaningful error report, and not just undefined behaviour; is that right? Perhaps you should clarify.
Any comment here?
Yeah, I'll work on improving the documentation for this. I've added an issue for it (#384).
- The iterators should state what iterator category they have.
Currently Boost.Compute doesn't really have different iterator categories; all iterators are random-access. I'll add this information to the documentation.
Now that you mention this I realize I was confused by what function_input_iterator does. I had assumed it was an InputIterator which supported stateful functions, like http://www.boost.org/doc/libs/1_57_0/libs/iterator/doc/function_input_iterat.... But I guess in fact it has no guarantees about order of evaluation. You might want to clarify this for folks accustomed to Boost.Iterator.
Will do.
- What does default_device do if there are no devices? Do the various environment variables mentioned have any kind of precedence? (Probably linking to the corresponding OpenCL docs is sufficient for such questions). Do they have to be exact matches? Are they case-sensitive?
There is a precedence, but it is currently not explained in the documentation. In most cases only one of the default device variables is meant to be used. They currently match substrings and are case-sensitive. I'll work on improving the documentation for these and give some example usages.
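For example, something like this could go in the docs (using BOOST_COMPUTE_DEFAULT_DEVICE as an example; the full list of variables and their precedence would be spelled out alongside it):

    #include <iostream>
    #include <boost/compute/core.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // e.g. run as:  BOOST_COMPUTE_DEFAULT_DEVICE=Tesla ./app
        // (the value is matched as a case-sensitive substring of the device name)
        compute::device device = compute::system::default_device();
        std::cout << "default device: " << device.name() << std::endl;

        return 0;
    }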
- What does find_device do if the requested device does not exist?
Currently it will return a null (default-constructed) device object. I have considered updating this (and the default_device() function) to throw an exception when no valid device is found so that returned device objects are always valid.
I think that would be better. In particular, I experienced what happens if the default device does not initialize properly (in my case because of a missing kernel module) and that was an exception. Missing device should probably be handled similarly.
Yeah, I'm becoming more convinced of this. I'll probably make that change soon. I've added an issue for this (#403).
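Until then, callers can guard against the null device themselves; one way (a sketch, checking the underlying OpenCL id):

    #include <string>
    #include <stdexcept>
    #include <boost/compute/core.hpp>

    namespace compute = boost::compute;

    compute::device require_device(const std::string &name)
    {
        compute::device device = compute::system::find_device(name);

        // a default-constructed device wraps a null cl_device_id
        if(device.id() == 0){
            throw std::runtime_error("no device named '" + name + "'");
        }
        return device;
    }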
- Congratulations for getting compile-time detection of internal padding into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the docs that this will be detected, so people don't worry too much.
Will do. I also hope to transparently support padded structs in the future and to be able to correctly copy them to/from the device without issue.
That would be nice.
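Even a short adapted-struct example in the docs would help, something like this sketch (an unpadded struct, so the compile-time check passes):

    #include <boost/compute/types/struct.hpp>
    #include <boost/compute/container/vector.hpp>

    // a plain host struct with no internal padding
    struct Particle
    {
        float x;
        float y;
        float velocity;
    };

    // make Particle usable in Boost.Compute kernels and containers;
    // a struct with internal padding would be rejected at compile time
    BOOST_COMPUTE_ADAPT_STRUCT(Particle, Particle, (x, y, velocity))

After that, compute::vector<Particle> and the copy algorithms can be used with it directly.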
- It looks like the performance tests don't include the time for copying data to (and, where applicable, from) the device. You should mention this on the Performance page.
I'll add this, though it is very device/system specific. Also, there is a performance benchmark available in Boost.Compute ("perf_copy_to_device") which could be used to measure this on your own system.
I think I was not clear. I mean that, for example, the sort benchmark doesn't include the time taken to copy data before and after the sort. The text on the performance page should explain this, so people can interpret the times properly.
Ahh, I see now. I'll add this information.
-kyle
[1] https://github.com/kylelutz/compute/tree/master/example