On Tue, Dec 23, 2014 at 9:02 AM, Mathias Gaunard wrote:
Hi,
While reading through the code of Boost.Compute to see what it does and how it does it, I often found that the library's approach of putting all OpenCL kernels inside of strings was an annoying limitation that made it quite difficult to reason about them, much less debug or maintain them. This has a negative effect on the effort needed to contribute to the library.
While yes, it does make developing Boost.Compute itself a bit more complex, it also gives us much greater flexibility. For instance, we can dynamically build programs at run-time by combining algorithmic skeletons (such as reduce or scan) with custom user-defined reduction functions, producing kernels optimized for the actual platform that executes the code (which may in fact be dramatically different hardware than where Boost.Compute itself was compiled). It also allows us to automatically tune algorithm parameters for the hardware present at run-time (and to execute current algorithms as efficiently as possible on future hardware platforms by re-tuning and scaling up parameters, all without any recompilation). Furthermore, it allows us to generate fully specialized kernels at run-time based on dynamic input or user configuration (imagine user-created filter pipelines in Photoshop or custom database queries in PGSQL). I think this added complexity is well worth the cost, and it fits naturally with OpenCL's JIT-like programming model.
Has separate compilation been considered? Put the OpenCL code into .cl files, and let the build system do whatever is needed to transform them into a form that can be executed.
Compiling programs to binaries and then later loading them from disk is supported by Boost.Compute (and is in fact used to implement the offline kernel caching infrastructure). However, for the reasons I mentioned before, this mode is not used exclusively in Boost.Compute, and the algorithms are mainly implemented in terms of the run-time program creation and compilation model. Another concern is that Boost.Compute is a header-only library and doesn't control the build system or how the library will be loaded. This limits our ability to pre-compile certain programs and "install" them for later use by the library.

That said, I am very interested in exploring methods for integrating OpenCL source files built by the build tool-chain and making loading and executing them seamless with the rest of Boost.Compute. One approach I have for this is an "extern_function<>" class which works like "boost::compute::function<>", but instead of being specified with a string at run-time, its object code is loaded from a pre-compiled OpenCL binary on disk. I've also been exploring a clang-plugin-based approach to simplify embedding OpenCL code in C++ and using it together with the Boost.Compute algorithms.

There is certainly room for improvement, and I'd be very happy to collaborate with anyone interested in this sort of work.

-kyle