On Tue, Dec 23, 2014 at 5:46 PM, Mathias Gaunard
On 23/12/2014 20:21, Kyle Lutz wrote:
While yes, it does make developing Boost.Compute itself a bit more complex, it also gives us much greater flexibility.
For instance, we can dynamically build programs at run-time by combining algorithmic skeletons (such as reduce or scan) with custom user-defined reduction functions, producing kernels optimized for the platform that actually executes the code (which may be dramatically different hardware than where Boost.Compute itself was compiled). It also lets us automatically tune algorithm parameters for the hardware present at run-time (and to execute current algorithms as efficiently as possible on future hardware platforms by re-tuning and scaling up parameters, all without any recompilation). Finally, it allows us to generate fully specialized kernels at run-time based on dynamic input or user configuration (imagine user-created filter pipelines in Photoshop or custom database queries in PGSQL).
I think this added complexity is well worth the cost and this fits naturally with OpenCL's JIT-like programming model.
I could see that from the code, yes. But nothing should prevent doing that while still writing the original OpenCL source code (or skeletons/templates) in separate files rather than C strings.
Has separate compilation been considered? Put the OpenCL code into .cl files, and let the build system do whatever is needed to transform them into a form that can be executed.
Compiling programs to binaries and then later loading them from disk is supported by Boost.Compute (and is in fact used to implement the offline kernel caching infrastructure). However, for the reasons I mentioned before, this mode is not used exclusively in Boost.Compute and the algorithms are mainly implemented in terms of the run-time program creation and compilation model.
I didn't necessarily mean compiling OpenCL to SPIR (if that's indeed what you mean by binary).
Well, to be clear, OpenCL provides two mechanisms for creating programs, one from source strings with clCreateProgramWithSource() and one from binary blobs with clCreateProgramWithBinary(). Binaries are either in a vendor-specific format, or in the SPIR form for platforms that support it (which essentially attempts to be a "vendor-neutral" binary representation).
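As a rough illustration of why cached binaries must be keyed per platform (a vendor binary built for one device/driver is not valid for another), here is a toy cache keyed on both platform and source. All names are invented for the sketch; a map stands in for the on-disk store and a tagged string stands in for the vendor binary, so no real OpenCL calls are made.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy sketch of an offline kernel cache. The real infrastructure
// persists vendor binaries on disk; this in-memory map just shows
// the keying and hit/miss logic.
struct binary_cache {
    std::map<std::size_t, std::string> store; // key -> "binary"

    static std::size_t key_for(const std::string& platform,
                               const std::string& source) {
        // Key on platform *and* source: reusing a binary across
        // different devices or drivers would be invalid.
        return std::hash<std::string>{}(platform + "\n" + source);
    }

    // Return a cached binary, or "build" (here: fake) and cache it.
    std::string get_or_build(const std::string& platform,
                             const std::string& source) {
        std::size_t k = key_for(platform, source);
        auto it = store.find(k);
        if (it != store.end())
            return it->second;                 // cache hit: skip compile
        std::string binary = "BIN:" + source;  // stand-in for a real build
        store[k] = binary;
        return binary;
    }
};
```

In the real library the cache hit path would feed the stored blob to clCreateProgramWithBinary(), while a miss falls back to clCreateProgramWithSource() plus a build, then stores the result.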
You could just make the build system automatically generate the C string from a .cl file, for example.
Boost.Compute has no "build system", it is merely a set of header files. If, in the future, we move away from a header-only implementation, we could certainly do something like this.
Another concern is that Boost.Compute is a header-only library and controls neither the build system nor how the library will be loaded. This limits our ability to pre-compile certain programs and "install" them for later use by the library.
As it is, you're probably getting some bloat for the sole reason that you're getting a copy of all your strings in every TU, in particular the radix sort kernel. It makes more sense for it to be a library IMHO.
There is a tendency for people to prefer header-only designs because they simplify deployment (no separately built library with compatible settings is needed), but I do not think someone should go header-only for that reason alone.
These are good points. In the future we may move away from a header-only implementation if it proves to be too big of a hindrance.
That said, I am very interested in exploring methods for integrating OpenCL source files built by the build tool-chain, making loading and executing them seamless with the rest of Boost.Compute. One approach I have for this is an "extern_function<>" class which works like "boost::compute::function<>", but instead of being specified with a string at run-time, its object code is loaded from a pre-compiled OpenCL binary on disk. I've also been exploring a clang-plugin-based approach to simplify embedding OpenCL code in C++ and using it together with the Boost.Compute algorithms.
I do not know what you have in mind with your clang development, but I assumed your library was sticking to oldish standard OpenCL for compatibility with a wide variety of devices and older toolchains.
There are already some compiler projects that can generate hybrid CPU and GPU code from a single source, turning functions into GPU kernels as needed: C++AMP does it, CUDA does it too somewhat, and now there is SYCL, a recent addition to the OpenCL standards that was presented at SC14, which should become the best solution for this.
Yes, I am well aware of these projects. However, one of my goals for Boost.Compute was to provide a GPGPU library which requires no special compiler or compiler extensions (as CUDA, C++AMP, SYCL, OpenACC, etc. all do). My aim is to provide a portable parallel programming library in C++ which supports the widest range of available platforms and compilers, and I feel OpenCL fills this role very well (also see the "Why OpenCL?" section [1] in the documentation for more on my rationale for this choice). -kyle [1] http://kylelutz.github.io/compute/boost_compute/design.html#boost_compute.de...