
My other concern is that while it should be fairly straightforward to provide efficient SIMD instruction sequences for addition, subtraction, multiplication, (refined) division, and related fused-accumulate ops, many of the more "interesting" SIMD algorithms are highly sensitive to swizzling & permutation, prefetching, built-in conversion operations, etc.
A SIMD library exposing just (for instance) the Ring or Field algebraic operations does have its uses; however, I feel that in most "real world" scenarios the unexposable details needed for high-performance coding will mean that the library is only used for toy applications. That is, unless you think you can also define platform-independent forwarding to swizzle/permute/conditional-lane-usage/etc.
We did our homework, I can reassure you. We provide more than just operators on these packs, and we use Proto to detect sequences of fused operations and replace them before evaluation. As for swizzling and permutation, we have several candidate interfaces and are trying to settle on one. Conversions are done through a simd::cast<T> operator. As for prefetching, the current solution is to provide a generic prefetch function one can use. In NT2, those prefetches are estimated and inserted during array evaluation, but here we are dealing with a lower level.
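To make the "generic prefetch function" idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the library's actual interface: the names prefetch_read and sum_with_prefetch, and the default prefetch distance, are made up for illustration. It only assumes the compiler intrinsics __builtin_prefetch (GCC/Clang) and _mm_prefetch (MSVC on x86), and falls back to a no-op elsewhere.

```cpp
#include <cstddef>
#include <vector>
#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_IX86) || defined(_M_X64))
#include <xmmintrin.h>
#endif

// Hypothetical helper (not a Boost.SIMD function): hint that the cache line
// containing p will be read soon. A prefetch never faults and never changes
// program results; it is purely a performance hint.
inline void prefetch_read(const void* p) noexcept {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0 /* read */, 3 /* keep in all cache levels */);
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    _mm_prefetch(static_cast<const char*>(p), _MM_HINT_T0);
#else
    (void)p;  // no known intrinsic on this platform: do nothing
#endif
}

// Illustrative use: sum an array while prefetching `distance` elements ahead.
// The distance is an assumption; a good value depends on the hardware.
inline float sum_with_prefetch(const std::vector<float>& v,
                               std::size_t distance = 64) {
    float acc = 0.0f;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            prefetch_read(&v[i + distance]);
        }
        acc += v[i];
    }
    return acc;
}
```

The point of the wrapper is that user code calls a single function, and the platform-specific intrinsic stays hidden behind it.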
Another (particularly ugly) use of SIMD is to allow per-object aligned storage & load of structures into/out of binary array blobs. This would basically look like an allocator for an object that uses the SIMD extensions for loading/storing data structures as binary blobs into and out of arrays. Have you considered an interface for this sort of usage?
I fail to understand your use case. We already have some SIMD-compatible allocators, but what are you referring to?
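For reference, a "SIMD-compatible allocator" usually means one that returns storage aligned to the vector register width. The following is a minimal C++17 sketch of that idea, not the allocator shipped with the library: the names aligned_allocator and is_aligned and the 32-byte default alignment are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator (illustration only): every allocation is aligned
// to `Alignment` bytes, so aligned SIMD loads/stores on the buffer are valid.
template <class T, std::size_t Alignment = 32>
struct aligned_allocator {
    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    static_assert(Alignment >= alignof(T),
                  "Alignment must not be weaker than the type's own");

    using value_type = T;

    // Needed explicitly because of the non-type template parameter.
    template <class U>
    struct rebind { using other = aligned_allocator<U, Alignment>; };

    aligned_allocator() noexcept = default;
    template <class U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // C++17 aligned operator new; throws std::bad_alloc on failure.
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// Stateless allocators of the same alignment are interchangeable.
template <class T, class U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&,
                const aligned_allocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&,
                const aligned_allocator<U, A>&) noexcept { return false; }

// Helper: check whether a pointer sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) noexcept {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this, std::vector<float, aligned_allocator<float, 32>> gives a contiguous buffer whose data() pointer is 32-byte aligned. Loading or storing whole structures as binary blobs, as asked about above, would still need a separate interface on top of such storage.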