[boost] Interest in a GPU computing library

18 Sep 2012

This is a call for interest in a GPU computing library for Boost. I
have been working on the library in my spare time for the past month
or so, and it has reached a point where I am ready for feedback. Below
are a brief overview, some notes on the design, and a small example
of the library, which I’ve named Boost.Compute.

--- Overview ---

* C++ library for general-purpose computing on GPUs/Accelerators
* Based on OpenCL (Open Computing Language)
* Header-only implementation
* API inspired by the STL, Boost and Thrust
* Boost dependencies: Config, Utility, Iterator, Exception,
Preprocessor, TypeTraits, StaticAssert, MPL, Proto

--- Design ---

OpenCL is a framework for writing programs that run on parallel
computing devices such as GPUs and multi-core CPUs. The OpenCL
language is based on C99 with a few extensions to simplify writing
parallel and vector-based code. More background:
http://en.wikipedia.org/wiki/OpenCL.

The core of the Boost.Compute library is a thin C++ wrapper over the
OpenCL C API. It provides classes for creating and managing various
OpenCL entities such as contexts, buffers, devices and kernels. These
classes are written in a style consistent with Boost and the C++
standard library.
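
To illustrate the kind of ownership handling a thin wrapper like this
takes care of, here is a minimal sketch. It is not Boost.Compute's
actual code. The OpenCL C API manages objects by reference count (for
example clRetainContext and clReleaseContext); because calling those
needs an OpenCL installation, the sketch uses a stand-in handle. The
names fake_handle, fake_retain, fake_release and handle_wrapper are
hypothetical.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a reference-counted C API object (hypothetical; in OpenCL
// this role is played by handles such as cl_context).
struct fake_handle { int refcount; };

inline void fake_retain(fake_handle* h)  { ++h->refcount; }
inline void fake_release(fake_handle* h) { --h->refcount; }

// Thin wrapper: construction and copying retain, destruction releases,
// so the C++ object lifetime drives the C reference count.
class handle_wrapper
{
public:
    explicit handle_wrapper(fake_handle* h) : m_handle(h) { fake_retain(m_handle); }
    handle_wrapper(const handle_wrapper& other) : m_handle(other.m_handle) { fake_retain(m_handle); }
    // Copy-and-swap: the by-value parameter holds the new reference, and
    // its destructor releases the old one.
    handle_wrapper& operator=(handle_wrapper other) { std::swap(m_handle, other.m_handle); return *this; }
    ~handle_wrapper() { fake_release(m_handle); }

    fake_handle* get() const { return m_handle; }

private:
    fake_handle* m_handle;
};

// Returns the reference count observed while two wrappers share one handle.
inline int refcount_while_shared(fake_handle& h)
{
    handle_wrapper a(&h);
    handle_wrapper b(a); // the copy retains a second reference
    return h.refcount;
}
```

With this pattern the user never calls retain or release directly;
both references are dropped when the wrappers go out of scope.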

Written on top of the core library is a partial implementation of the
C++ STL which includes common containers (e.g. vector<T>, array<T, N>)
and algorithms (e.g. copy, find_if, sort) along with a few extensions
(e.g. scatter, exclusive_scan, flat_set<T>).
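
For readers unfamiliar with the extension algorithms, here is a
host-side sketch of what exclusive_scan and scatter compute. It
assumes the semantics follow the Thrust algorithms of the same names
(an assumption, not something stated above), and the function names
exclusive_scan_ref and scatter_ref are hypothetical. It runs on the
CPU only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] is init plus the sum of in[0..i-1],
// so out[0] is `init`.
std::vector<int> exclusive_scan_ref(const std::vector<int>& in, int init = 0)
{
    std::vector<int> out(in.size());
    int running = init;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Scatter: writes in[i] to out[map[i]]. Assumes map has at least
// in.size() entries and that each one is a valid index below out_size.
std::vector<int> scatter_ref(const std::vector<int>& in,
                             const std::vector<std::size_t>& map,
                             std::size_t out_size)
{
    std::vector<int> out(out_size, 0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[map[i]] = in[i];
    }
    return out;
}
```

For example, scanning {1, 2, 3, 4} gives {0, 1, 3, 6}, and scattering
{10, 20, 30} with the map {2, 0, 1} gives {20, 30, 10}.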

The aim of Boost.Compute’s STL API is to provide a familiar interface
to developers wanting to easily write new code or port existing code
to run on GPU devices. It also features a few “fancy” iterators
inspired by the Boost.Iterator library such as transform_iterator<>,
counting_iterator<>, and permutation_iterator<>.
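
As a rough illustration of the "fancy" iterator idea, the sketch below
is a minimal host-side counting iterator written with the standard
library only. It is not the Boost.Compute or Boost.Iterator
implementation; counting_iter and make_range are hypothetical names.
The point is that dereferencing produces the value on the fly, so no
input array has to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Minimal host-side counting iterator (hypothetical, input-iterator only):
// dereferencing yields the current counter value, so no storage is needed.
class counting_iter
{
public:
    typedef std::input_iterator_tag iterator_category;
    typedef int value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const int* pointer;
    typedef int reference;

    explicit counting_iter(int value) : m_value(value) {}

    int operator*() const { return m_value; }
    counting_iter& operator++() { ++m_value; return *this; }
    counting_iter operator++(int) { counting_iter tmp(*this); ++m_value; return tmp; }
    bool operator==(const counting_iter& other) const { return m_value == other.m_value; }
    bool operator!=(const counting_iter& other) const { return m_value != other.m_value; }

private:
    int m_value;
};

// Builds a vector holding first, first+1, ..., last-1 without a source array.
std::vector<int> make_range(int first, int last)
{
    return std::vector<int>(counting_iter(first), counting_iter(last));
}
```

The same iterator can feed any standard algorithm, for example
std::accumulate(counting_iter(1), counting_iter(5), 0) sums 1 through 4.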

In addition, a lambda expression library written with Boost.Proto
allows mathematical expressions to be defined at the call site of an
algorithm and then executed on the GPU. For example, to multiply each
element in a vector by the square root of itself and then add four:

transform(v.begin(), v.end(), v.begin(), _1 * sqrt(_1) + 4);
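
For comparison, the same computation on the host can be sketched with
std::transform and a C++11 lambda. This is only a CPU reference for
what the expression computes, assuming float elements;
transform_reference is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Host-side equivalent of `_1 * sqrt(_1) + 4`, applied in place.
void transform_reference(std::vector<float>& v)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [](float x) { return x * std::sqrt(x) + 4.0f; });
}
```

Applied to {0, 4, 9} this yields {4, 12, 31}.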

--- Example ---

Below is a small example of using the Boost.Compute API to sort a
vector of int values:

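// standard headers used by the host-side code
#include <algorithm>
#include <cstdlib>
#include <vector>

// create vector of random values on the host
std::vector<int> host_vector(10000);
std::generate(host_vector.begin(), host_vector.end(), std::rand);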

// create a compute context for the default gpu device
boost::compute::context gpu_context = boost::compute::default_gpu_context();

// create a vector on the gpu
boost::compute::vector<int> device_vector(gpu_context);

// transfer the values to the device
device_vector = host_vector;

// sort the values on the device
boost::compute::sort(device_vector.begin(), device_vector.end());

// transfer the sorted values back to the host
boost::compute::copy(device_vector.begin(), device_vector.end(),
                     host_vector.begin());
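
When trying an example like this, it can help to check the device
result against a host sort. Below is a sketch using only the standard
library; matches_host_sort is a hypothetical helper. Since the example
overwrites host_vector with the sorted values, the helper expects a
copy of the unsorted input taken before the transfer back.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if `result` is exactly what std::sort produces from `input`.
// `input` is taken by value so the caller's copy is left untouched.
bool matches_host_sort(std::vector<int> input, const std::vector<int>& result)
{
    std::sort(input.begin(), input.end());
    return input == result;
}
```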

--- Conclusion ---

The Boost.Compute library provides a useful, intuitive, and familiar
interface for running high-performance parallel code on GPU devices.
Incorporating Boost.Compute into the Boost libraries would make GPU
computing readily accessible to a large number of C++ developers.

All comments and feedback are welcome and greatly appreciated.

Thanks,
Kyle

Kyle Lutz