Hi,

I have no experience with OpenCL or GPU computing in general, so bear with me if my questions sound silly. I have a few questions regarding Boost.Compute:

1. When you define a kernel (e.g. with the BOOST_COMPUTE_FUNCTION macro), is the kernel body supposed to be C? Can it reference global (namespace-scope) objects and other functions? Other kernels?

2. When is the kernel compiled and uploaded to the device? Is it possible to cache and reuse a compiled kernel?

3. Why is the library not thread-safe by default? I'd say we're long past single-threaded systems now, and having to define the config macro every time is a nuisance.

4. Is it possible to upload the data to be processed to the device's local memory directly from a user-provided buffer, without first copying it into a boost::compute::vector? And likewise for downloading results. What I'd like to do is move some of the data processing to the GPU while the rest runs on the CPU (possibly with other libraries), avoiding excessive copying.

5. Is it possible to pass buffers in device-local memory between different processes (on the CPU) without downloading/uploading the data to/from CPU memory?

6. Is it possible to discover device capabilities, e.g. the amount of local memory (total/used/free), the number of execution units, and the vendor and device name?

Thanks.