
[Repost: the first one doesn't appear to have made it through.]

Just trying to refocus this thread on what I meant it to be about.

We have a SIMD abstraction layer that we would like to eventually submit to Boost. For appropriate values of N, simd::pack<T, N> represents a SIMD register, which lets you perform the same operation on N elements in parallel with a single CPU instruction. pack<T, N> provides all the same operators as T, and can also detect a sequence of operations that exists on the CPU as a single instruction. It falls back to a loop if the architecture has no SIMD register of size sizeof(T)*N bytes. simd::pack<T, N> is also a fusion sequence and a range. The library also provides a series of useful functions, such as summing the elements of a pack or reordering them.

What I would like to know is how people think we could integrate this system with iterators and ranges, so that existing algorithms could be adapted to process N elements at a time instead of one, and thereby get a potential speed gain.

As I said, we currently have an iterator adapter that adapts a range of properly aligned Ts, which are also contiguous at least in chunks of N elements, into a range of pack<T, N>s. Are there other utilities we could provide to help with the usage of SIMD? I was thinking of also supporting the adaptation of non-aligned ranges by padding them with some values, but that cannot be done efficiently with the standard iterator model, which led to some discussion about an alternative iterator model that seems to be going nowhere.

Any suggestions of features or tools the SIMD library could provide to make the developer's life easier would be appreciated.

With regard to alignment, we also provide a memory allocator that aligns correctly, and functions to get the next or previous aligned address from a given address.
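
To make the discussion a bit more concrete, here is a rough sketch of the kind of thing described above. It is not the library's actual code. Everything in the `sketch` namespace (`pack`, `load`, `sum`, `next_aligned`, `previous_aligned`, `accumulate_packed`) is a hypothetical stand-in invented for illustration, and the `pack` shown is only the scalar loop fallback, with no real SIMD instructions. The accumulate function handles a non-aligned range by peeling a scalar head and tail at the algorithm level, which is one common workaround and not necessarily what a padded iterator adapter would do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-in for simd::pack<T, N>: scalar loop fallback only.
template <typename T, std::size_t N>
struct pack {
    std::array<T, N> v{};  // zero-initialised elements

    // Load N contiguous elements starting at p.
    static pack load(const T* p) {
        pack r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = p[i];
        return r;
    }

    // Element-wise addition; a real pack would map this to one instruction.
    friend pack operator+(pack a, const pack& b) {
        for (std::size_t i = 0; i < N; ++i) a.v[i] += b.v[i];
        return a;
    }
};

// Horizontal sum of all elements of a pack.
template <typename T, std::size_t N>
T sum(const pack<T, N>& p) {
    T s{};
    for (T x : p.v) s += x;
    return s;
}

// Smallest multiple of `alignment` (assumed a power of two) that is >= addr.
inline std::uintptr_t next_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}

// Largest multiple of `alignment` (assumed a power of two) that is <= addr.
inline std::uintptr_t previous_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// Sum [first, last): scalar head until a pack boundary, packs of N in the
// middle, scalar tail for the leftover elements.
template <std::size_t N, typename T>
T accumulate_packed(const T* first, const T* last) {
    const std::size_t bytes = sizeof(T) * N;
    T total{};
    while (first != last &&
           reinterpret_cast<std::uintptr_t>(first) % bytes != 0) {
        total += *first++;
    }
    pack<T, N> acc;
    for (; static_cast<std::size_t>(last - first) >= N; first += N) {
        // Every packed load starts on a pack boundary at this point.
        assert(reinterpret_cast<std::uintptr_t>(first) % bytes == 0);
        acc = acc + pack<T, N>::load(first);
    }
    total += sum(acc);
    for (; first != last; ++first) total += *first;
    return total;
}

}  // namespace sketch
```

A call would look like `sketch::accumulate_packed<4>(v.data(), v.data() + v.size())` for a `std::vector<int> v`. Note that for floating-point T the packed version adds the elements in a different order than a plain loop, so the result can differ slightly in the last bits.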