
This is a call for interest in a GPU computing library for Boost. I have been working on the library in my spare time for the past month or so and it’s reached a point where I am ready for feedback. Below is a brief overview, some notes on the design, and a small example of the library, which I’ve named Boost.Compute.

--- Overview ---

* C++ library for general-purpose computing on GPUs/Accelerators
* Based on OpenCL (Open Computing Language)
* Header-only implementation
* API inspired by the STL, Boost and Thrust
* Boost dependencies: Config, Utility, Iterator, Exception, Preprocessor, TypeTraits, StaticAssert, MPL, Proto

--- Design ---

OpenCL is a framework for writing programs that run on parallel computing devices such as GPUs and multi-core CPUs. The OpenCL language is based on C99 with a few extensions to simplify writing parallel and vector-based code.

More background: http://en.wikipedia.org/wiki/OpenCL

The core of the Boost Compute library is a thin C++ wrapper over the OpenCL C API. It provides classes for creating and managing various OpenCL entities such as contexts, buffers, devices and kernels. These classes are written in a style consistent with Boost and the C++ standard library.

Written on top of the core library is a partial implementation of the C++ STL which includes common containers (e.g. vector<T>, array<T, N>) and algorithms (e.g. copy, find_if, sort) along with a few extensions (e.g. scatter, exclusive_scan, flat_set<T>). The aim of Boost.Compute’s STL API is to provide a familiar interface to developers who want to write new code, or port existing code, to run on GPU devices.

It also features a few “fancy” iterators inspired by the Boost.Iterator library such as transform_iterator<>, counting_iterator<>, and permutation_iterator<>.

Furthermore, a lambda expression library was written using Boost.Proto which allows mathematical expressions to be defined at the call site of an algorithm and then executed on the GPU.
For example, to multiply each element in a vector by the square root of itself and then add four:

    transform(v.begin(), v.end(), v.begin(), _1 * sqrt(_1) + 4);

--- Example ---

Below is a small example of using the Boost.Compute API to sort a vector of int values:

    // create vector of random values on the host
    std::vector<int> host_vector(10000);
    std::generate(host_vector.begin(), host_vector.end(), rand);

    // create a compute context for the default gpu device
    boost::compute::context gpu_context =
        boost::compute::default_gpu_context();

    // create a vector on the gpu
    boost::compute::vector<int> device_vector(gpu_context);

    // transfer the values to the device
    device_vector = host_vector;

    // sort the values on the device
    boost::compute::sort(device_vector.begin(), device_vector.end());

    // transfer the sorted values back to the host
    boost::compute::copy(device_vector.begin(),
                         device_vector.end(),
                         host_vector.begin());

--- Conclusion ---

The Boost Compute library provides a useful, intuitive, and familiar interface for running high-performance parallel code on GPU devices. Incorporating Boost.Compute into the Boost libraries would make GPU computing readily accessible to a large number of C++ developers.

All comments and feedback are welcome and greatly appreciated.

Thanks,
Kyle
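
--- Note: host-side reference for the lambda example ---

For readers who want to check what the transform example in the Design section computes, here is a minimal host-only sketch of the same arithmetic (x * sqrt(x) + 4) using just the C++ standard library. It is not Boost.Compute code and nothing in it runs on a GPU. The names expr_reference and transform_reference are hypothetical helpers introduced only for this illustration, and float is an assumed element type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Host-only reference for the expression _1 * sqrt(_1) + 4.
// Hypothetical helper: uses only the C++ standard library.
inline float expr_reference(float x)
{
    assert(x >= 0.0f); // sqrt of a negative value would yield NaN
    return x * std::sqrt(x) + 4.0f;
}

// Applies the expression to every element in place, mirroring the
// shape of transform(v.begin(), v.end(), v.begin(), ...) on the host.
inline void transform_reference(std::vector<float>& v)
{
    std::transform(v.begin(), v.end(), v.begin(), expr_reference);
}
```

A reference like this could be used to compare host results against values copied back from a device, for instance by running both on the same small input and comparing element by element within a floating-point tolerance.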