Gruenke, Matt
This is my biggest misgiving, by far. In the very near future, I expect developers will opt for either SYCL (https://www.khronos.org/opencl/sycl) or Bolt (https://github.com/HSA-Libraries/Bolt). SYCL provides a modern, standard C++11 wrapper around OpenCL, with better concurrency control and support for integrating kernels inline. Bolt provides many of the same higher-level abstractions found in Boost.Compute, but with forthcoming support for HSA.
Bolt relies on an OpenCL extension called the "OpenCL Static C++ Kernel Language Extension". To my knowledge, only AMD has bothered to implement it. In my opinion, C++ AMP is a more promising proposal than SYCL, and developers are opting for C++ AMP today. But both SYCL and C++ AMP are higher-level tools and have the disadvantages of any higher-level library compared to a lower-level one. In addition, they need a custom compiler or compiler extensions, which further fragments the accelerator/co-processor field. I think Boost.Compute does the right thing here: identify the lowest common denominator, OpenCL, and build a library on top of it that anyone can use on any platform, provided a standard C++ compiler is available and the OpenCL library is implemented. Then build whatever fancy thing you want on top of that.
To have the kind of lasting relevance and broad applicability to which all Boost libraries should aspire, I think Boost.Compute should be architected to support multiple backends. Though OpenCL support is currently ascendant, it's far from universal and is already flagging on some platforms (not least Nvidia). And HSA provides a foundation on which alternatives are actively being built. Most importantly, there exist multitudes of multi-core and multiprocessor systems which lack OpenCL support. It would be eminently useful to support these with backends such as a thread pool, OpenMP, etc. And backends could be added to support new technologies as they mature.
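For concreteness, here is a rough sketch of what "multiple backends" could mean at the algorithm level. It is not Boost.Compute code: `serial_backend`, `thread_backend` and `backend_transform` are hypothetical names invented for illustration, and the threaded version simply splits the range into contiguous chunks using plain `std::thread`. It assumes element types whose distinct elements can be written concurrently (so not `std::vector<bool>`).

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical backend tags (assumptions for illustration only).
struct serial_backend {};

struct thread_backend {
    explicit thread_backend(unsigned w = 4) : workers(w) {}
    unsigned workers;  // number of chunks/threads to use
};

// Serial backend: forwards to std::transform.
template <class T, class F>
void backend_transform(serial_backend, const std::vector<T>& in,
                       std::vector<T>& out, F f) {
    out.resize(in.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
}

// Thread backend: one thread per contiguous chunk of the input.
template <class T, class F>
void backend_transform(const thread_backend& b, const std::vector<T>& in,
                       std::vector<T>& out, F f) {
    out.resize(in.size());
    const std::size_t n = in.size();
    const std::size_t workers = std::max<std::size_t>(1, b.workers);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceil(n / workers)

    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back([&in, &out, f, begin, end] {
            std::transform(in.begin() + begin, in.begin() + end,
                           out.begin() + begin, f);
        });
    }
    for (auto& t : pool) t.join();  // wait for every chunk to finish
}
```

The same call site works with either tag, e.g. `backend_transform(thread_backend(3), in, out, f)`; an OpenCL or OpenMP backend would, under this sketch, be just another overload.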
OpenCL is supposed to be the abstraction layer that does all that, remember? That is, it is supposed to support multi-core, multi-processor and many-core vector co-processors. Asking Boost.Compute to support threading and OpenMP is asking it to do the job of OpenCL library implementers. To play the heretic for the sake of argument: why stop at single nodes, then? Why not add an MPI layer on top of the OpenMP/threading layer you ask Boost.Compute to support? I urge you not to open this can of worms.