On Tue, Dec 30, 2014 at 6:23 PM, John Bytheway wrote:
On 2014-12-29 02:01, Kyle Lutz wrote:
On Sun, Dec 28, 2014 at 7:26 PM, John Bytheway wrote:
It would be very nice to have range versions of all the algorithms. I find myself using the boost::range algorithms much more than the std:: ones these days.
I fully agree, and I'm also very much looking forward to having range algorithms standardized in C++17.
Do you mean that you are planning to add them to Boost.Compute?
I would be interested in having range algorithms in Boost.Compute but this would be a large amount of work and is not currently very high in my list of priorities. If anyone is interested in working on these please let me know.
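For anyone who wants to experiment, range overloads can be layered on top of the existing iterator-based algorithms; a minimal sketch (the wrapper namespace and overload are purely illustrative, not part of the library):

    #include <boost/range/begin.hpp>
    #include <boost/range/end.hpp>
    #include <boost/compute/command_queue.hpp>
    #include <boost/compute/algorithm/sort.hpp>

    // illustrative range wrapper, not part of Boost.Compute
    namespace compute_ranges {

    template<class Range>
    inline void sort(Range &range, boost::compute::command_queue &queue)
    {
        boost::compute::sort(boost::begin(range), boost::end(range), queue);
    }

    } // namespace compute_ranges

Most of the other algorithms could be wrapped the same way, so it is largely mechanical work.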
3. What is your evaluation of the documentation?
Overall, not too bad.
More tutorial would be good. In particular, something using some of the standard algorithms.
I'll definitely work on this more. Any ideas of specific tutorials/examples you'd like to see would be very helpful.
Something using an algorithm which moves elements around might be nice. Maybe sort or partition...
Will do. Also, if you're not aware, there are quite a few example applications in the "example" directory [1]. I'll also work on making these more prominent in the documentation.
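A minimal tutorial along those lines might look roughly like this (a sketch using the default device; error handling omitted):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>
    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/sort.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // get the default device and set up a context and command queue
        compute::device device = compute::system::default_device();
        compute::context context(device);
        compute::command_queue queue(context, device);

        // generate random data on the host
        std::vector<float> host_vector(1000000);
        std::generate(host_vector.begin(), host_vector.end(), rand);

        // create a vector on the device and transfer the data to it
        compute::vector<float> device_vector(host_vector.size(), context);
        compute::copy(host_vector.begin(), host_vector.end(),
                      device_vector.begin(), queue);

        // sort the data on the device
        compute::sort(device_vector.begin(), device_vector.end(), queue);

        // copy the sorted values back to the host
        compute::copy(device_vector.begin(), device_vector.end(),
                      host_vector.begin(), queue);

        return 0;
    }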
Some concerns:
- All the header links off http://kylelutz.github.io/compute/boost_compute/reference.html are broken.
- All or most of the things in "See also" sections look like they ought to be links but aren't.
- For all your function parameter documentation, the list of parameters seems to be sorted alphabetically by parameter name rather than in the order they are passed to the function. This is really weird...
These are all, to the best of my knowledge, bugs in the Boost documentation toolchain. I reported these issues a while back on the boost-doc mailing list here [1] but never found a resolution. I would really like to get these to work, but am not knowledgeable enough about the Boost documentation pipeline (specifically the doxygen->quickbook infrastructure) to fix it. If anyone has any ideas or could help with this I would be very appreciative.
Shame :(. Hope you get some responses here.
Yeah, me too.
- http://kylelutz.github.io/compute/boost/compute/copy.html mentions copying from e.g. a std::list to a compute::vector. It would be nice if it gave some kind of indication as to how efficient we can expect such an operation to be. In particular, I guess it ought to do some kind of buffering on the host?
Currently it will just buffer the entire range into a std::vector<T> and then copy that to the device. If this is a bottleneck it could also be improved to copy the data in smaller batches. Let me know if you encounter issues with this.
Does this also work with copy_async? I presume not, because there would be troubles with the lifetime of the temporary vector...
Yeah, as mentioned in one of the replies, copy_async() currently only works with contiguous ranges. I have ideas on how to make this work more generally but I haven't had time to implement it.
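Concretely, the current behaviour is roughly as follows (a sketch; container sizes and values are arbitrary):

    #include <list>
    #include <vector>
    #include <boost/compute/system.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void copy_examples()
    {
        compute::command_queue &queue = compute::system::default_queue();

        // non-contiguous host range: copy() buffers it on the host internally
        std::list<int> host_list(1000, 42);
        compute::vector<int> device_vec(host_list.size());
        compute::copy(host_list.begin(), host_list.end(),
                      device_vec.begin(), queue);   // blocking

        // contiguous host range: copy_async() returns a future immediately
        std::vector<int> host_vec(1000, 7);
        auto f = compute::copy_async(host_vec.begin(), host_vec.end(),
                                     device_vec.begin(), queue);
        // ... other host work ...
        f.wait();   // ensure the transfer finished before touching host_vec
    }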
- The way I normally copy into vectors is call v.reserve(...) and then copy to std::back_inserter(v). I tried that with a compute::vector and unsurprisingly it was very inefficient. I see you already warned about this in the push_back docs, but perhaps it should also be highlighted in the documentation of compute::copy.
Yeah, this idiom is not well suited to copying into device memory. You should prefer to do "bulky" operations using a single call to boost::compute::copy() (which internally will call the OpenCL function for copying large chunks of memory).
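The two idioms side by side, assuming the default context and queue (a sketch):

    #include <algorithm>
    #include <iterator>
    #include <vector>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void fill_device_vector(const std::vector<float> &host_data)
    {
        // slow: one small host-to-device transfer per element
        compute::vector<float> v1;
        v1.reserve(host_data.size());
        std::copy(host_data.begin(), host_data.end(), std::back_inserter(v1));

        // fast: a single bulk transfer of the whole block
        compute::vector<float> v2(host_data.size());
        compute::copy(host_data.begin(), host_data.end(), v2.begin());
    }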
- http://kylelutz.github.io/compute/boost/compute/includes.html it's unclear whether the subrange has to be contiguous.
It should have the same semantics as std::includes, which, IIRC, does not require the subrange to be contiguous.
Then perhaps rephrasing would make it clearer. The wording on http://en.cppreference.com/w/cpp/algorithm/includes seems clear.
- What source of randomness does random_shuffle use?
Currently it simply calls std::random_shuffle to generate the shuffled index sequence and then uses boost::compute::scatter() to actually re-arrange the values on the device. However, I am planning on deprecating it and providing an implementation of the C++11 shuffle() algorithm instead, which will use the random number generator provided by the user.
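Conceptually, the current approach is something like this sketch (not the actual implementation):

    #include <vector>
    #include <numeric>       // std::iota
    #include <algorithm>     // std::random_shuffle
    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/scatter.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void shuffle_on_device(compute::vector<float> &values,
                           compute::context &context,
                           compute::command_queue &queue)
    {
        // build a shuffled index sequence on the host
        std::vector<unsigned int> indices(values.size());
        std::iota(indices.begin(), indices.end(), 0);
        std::random_shuffle(indices.begin(), indices.end());

        // transfer the index map to the device
        compute::vector<unsigned int> map(indices.size(), context);
        compute::copy(indices.begin(), indices.end(), map.begin(), queue);

        // scatter the values into a temporary vector according to the map
        compute::vector<float> tmp(values.size(), context);
        compute::scatter(values.begin(), values.end(), map.begin(),
                         tmp.begin(), queue);

        // copy the shuffled values back into place
        compute::copy(tmp.begin(), tmp.end(), values.begin(), queue);
    }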
Sounds good.
- Why does reduce return its result through an OutputIterator?
In order to allow the algorithm to store the result in device memory and avoid a host-device synchronization point (which would be necessary if it simply returned the reduced result).
That makes sense, but seems inconsistent with accumulate, and to a lesser extent with the algorithms that return bool (binary_search, equal, etc.). Are they all introducing a synchronization point? What about the ones returning iterators?
Yeah, algorithms like accumulate() which return the resulting value on the host are synchronization points. Others like transform() which return an iterator do not because the returned iterator is not actually the "result" of the operation and can be calculated and returned before the operation is actually complete.
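To make the distinction concrete, a sketch (assuming a context and queue are already set up):

    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/accumulate.hpp>
    #include <boost/compute/algorithm/reduce.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void reduce_vs_accumulate(compute::vector<float> &data,
                              compute::context &context,
                              compute::command_queue &queue)
    {
        // reduce(): the sum is written to device memory through the output
        // iterator, so no host-device synchronization is needed here
        compute::vector<float> sum(1, context);
        compute::reduce(data.begin(), data.end(), sum.begin(), queue);

        // accumulate(): the sum is returned to the host, so this call blocks
        // until the computation and the transfer of the result are done
        float host_sum = compute::accumulate(data.begin(), data.end(), 0.0f, queue);
        (void)host_sum;
    }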
- The boost/compute/types headers docs seem to be broken or missing. In particular I couldn't find reference documentation for float4_ et al.
Yeah, I wasn't sure of the best way to document all of these (IIRC, ten different built-in scalar types times four different vector versions of each). I'll add some information with an overview of the provided fundamental types.
At least something documenting the supported means of initialization and element access would be good :).
Will do.
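Something along these lines would probably cover the common cases (a sketch; it assumes the usual scalar constructor and operator[] on float4_):

    #include <iostream>
    #include <boost/compute/types/fundamental.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // construct a four-component vector from individual scalars
        compute::float4_ v(1.0f, 2.0f, 3.0f, 4.0f);

        // element access via operator[]
        v[3] = 5.0f;
        std::cout << v[0] << ", " << v[3] << std::endl;

        return 0;
    }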
- http://kylelutz.github.io/compute/boost/compute/function.html I'm puzzled by this interface. What's the name for? Would one ever use this constructor rather than e.g. make_function_from_source?
There are a few legitimate use-cases. One is for specifying the names of pre-existing or built-in functions (e.g. constructing a function object with the name "sqrt" would call the built-in sqrt() function).
So constructing a function by name points it at some existing named function. That makes sense, but might be unexpected, so it should probably be mentioned in the constructor documentation.
Will do.
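For reference, the two routes side by side (a sketch; the plus_one function is just an example):

    #include <boost/compute/core.hpp>
    #include <boost/compute/function.hpp>
    #include <boost/compute/algorithm/transform.hpp>
    #include <boost/compute/container/vector.hpp>

    namespace compute = boost::compute;

    void transform_examples(compute::vector<float> &v,
                            compute::command_queue &queue)
    {
        // wrap the built-in OpenCL sqrt() function by name
        compute::function<float(float)> sqrt_fn("sqrt");
        compute::transform(v.begin(), v.end(), v.begin(), sqrt_fn, queue);

        // define a new function from OpenCL C source
        compute::function<float(float)> plus_one =
            compute::make_function_from_source<float(float)>(
                "plus_one",
                "float plus_one(float x) { return x + 1.0f; }"
            );
        compute::transform(v.begin(), v.end(), v.begin(), plus_one, queue);
    }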
- http://kylelutz.github.io/compute/boost/compute/field.html My reading of this is that when you get the type wrong you are guaranteed some kind of meaningful error report, and not just undefined behaviour; is that right? Perhaps you should clarify.
Any comment here?
Yeah, I'll work on improving the documentation for this. I've added an issue for it (#384).
- The iterators should state what iterator category they have.
Currently Boost.Compute doesn't really have different iterator categories; all iterators are random-access. I'll add this information to the documentation.
Now that you mention this I realize I was confused by what function_input_iterator does. I had assumed it was an InputIterator which supported stateful functions, like http://www.boost.org/doc/libs/1_57_0/libs/iterator/doc/function_input_iterat.... But I guess in fact it has no guarantees about order of evaluation. You might want to clarify this for folks accustomed to Boost.Iterator.
Will do.
- What does default_device do if there are no devices? Do the various environment variables mentioned have any kind of precedence? (Probably linking to the corresponding OpenCL docs is sufficient for such questions). Do they have to be exact matches? Are they case-sensitive?
There is a precedence, but it is currently not explained in the documentation. In most cases only one of the default device variables is meant to be used. They currently match substrings and are case-sensitive. I'll work on improving the documentation for these and give some example usages.
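For example, something like this could go in the docs (using BOOST_COMPUTE_DEFAULT_DEVICE as an example; the full list of variables and their precedence would be spelled out alongside it):

    #include <iostream>
    #include <boost/compute/core.hpp>

    namespace compute = boost::compute;

    int main()
    {
        // e.g. run as:  BOOST_COMPUTE_DEFAULT_DEVICE=Tesla ./app
        // (the value is matched as a case-sensitive substring of the device name)
        compute::device device = compute::system::default_device();
        std::cout << "default device: " << device.name() << std::endl;

        return 0;
    }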
- What does find_device do if the requested device does not exist?
Currently it will return a null (default-constructed) device object. I have considered updating this (and the default_device() function) to throw an exception when no valid device is found so that returned device objects are always valid.
I think that would be better. In particular, I experienced what happens if the default device does not initialize properly (in my case because of a missing kernel module) and that was an exception. Missing device should probably be handled similarly.
Yeah, I'm becoming more convinced of this. I'll probably make that change soon. I've added an issue for this (#403).
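Until then, callers can guard against the null device themselves; one way (a sketch, checking the underlying OpenCL id):

    #include <string>
    #include <stdexcept>
    #include <boost/compute/core.hpp>

    namespace compute = boost::compute;

    compute::device require_device(const std::string &name)
    {
        compute::device device = compute::system::find_device(name);

        // a default-constructed device wraps a null cl_device_id
        if(device.id() == 0){
            throw std::runtime_error("no device named '" + name + "'");
        }
        return device;
    }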
- Congratulations for getting compile-time detection of internal padding into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the docs that this will be detected, so people don't worry too much.
Will do. I also hope to transparently support padded structs in the future and to be able to correctly copy them to/from the device without issue.
That would be nice.
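Even a short adapted-struct example in the docs would help, something like this sketch (an unpadded struct, so the compile-time check passes):

    #include <boost/compute/types/struct.hpp>
    #include <boost/compute/container/vector.hpp>

    // a plain host struct with no internal padding
    struct Particle
    {
        float x;
        float y;
        float velocity;
    };

    // make Particle usable in Boost.Compute kernels and containers;
    // a struct with internal padding would be rejected at compile time
    BOOST_COMPUTE_ADAPT_STRUCT(Particle, Particle, (x, y, velocity))

After that, compute::vector<Particle> and the copy algorithms can be used with it directly.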
- It looks like the performance tests don't include the time for copying data to (and, where applicable, from) the device. You should mention this on the Performance page.
I'll add this, though it is very device/system specific. Also, there is a performance benchmark available in Boost.Compute ("perf_copy_to_device") which could be used to measure this on your own system.
I think I was not clear. I mean that, for example, the sort benchmark doesn't include the time taken to copy data before and after the sort. The text on the performance page should explain this, so people can interpret the times properly.
Ahh, I see now. I'll add this information.
-kyle
[1] https://github.com/kylelutz/compute/tree/master/example