Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives

19 Oct 2010

      At Mon, 18 Oct 2010 22:43:25 +0200,
Sebastian Redl wrote:
...
On 18.10.2010, at 16:34, David Abrahams wrote:
...
No, you need different code for iterating the end bits.  Look
carefully at the hierarchical_fill implementation near the end of the
paper.
Segmented iterators as described in the paper have two limitations, as I see it:
1) The value type of the flat iterator and the local iterator have
to be the same.
Which is totally fine in this case
...
2) Segments must be uniform.
(2) means that you can't have one segment have one type with one
kind of iterator, and the next another type with another kind of
iterator. Case in point, joint_iterator over a fusion sequence of
containers.
Also totally fine in this case.  If you want to operate on a
heterogeneous sequence, you'd need fusion-y segmented iterators of
course, so you'd make the obvious corresponding extensions of fusion
iterator concepts.  But you don't need it for SIMD, AFAICT
...
You can't represent those with a segmented iterator, because the
segments are different container types and thus you cannot have a
segment iterator, since it wouldn't have a single value type.  It
occurs to me that there probably would be a way to bring
compile-time fixed-size heterogenous iterators (well, fusion
sequences) into a reusable concept world here too,
They're already in that world, IIUC.
...
but that's a lot of additional work on top of segmented iterators.
(2) also means that the local iterators for all segments are the
same, and therefore that the final value type of every segment has
to be the same.
Also not a problem.
...
(1) is not immediately obvious. The segmented_iterator_traits do not
actually make this requirement. However, it is a consequence of the
intended use of segmented iterators: transparent efficiency increase
for the standard algorithms. fill() is a case in point. The flat
version looks like this:
template <typename ForwardIterator, typename T>
void fill(ForwardIterator first, ForwardIterator last, const T &v) {
  for (; first != last; ++first) {
    *first = v;
  }
}
Requirements: ForwardIterator is a forward iterator, and T is
assignable to ForwardIterator::reference.
If ForwardIterator was additionally a segmented iterator, the
complicated hierarchical fill should be transparently invoked. This
comes down to this (implementation switching omitted):
template <typename SegIterator, typename T>
void fill(SegIterator first, SegIterator last, const T &v) {
  typedef segmented_iterator_traits<SegIterator> Seg;
  if (Seg::segment(first) == Seg::segment(last)) {
    fill(Seg::local(first), Seg::local(last), v);
  } else {
    // omitted
  }
}
But that nested fill has a different ForwardIterator, and thus
introduces a new requirement that bubbles outward: T must be
assignable to LocalIterator::reference.
So the only way that segmented iterators stay a transparent
optimization (that is, can be used in the standard algorithms
without introducing additional requirements on the caller) is if the
LocalIterator's reference guarantees that anything assignable to
ForwardIterator's reference is also assignable to itself, which is
pretty much the same as saying that the two have to be the same. Not
quite, but almost.
I don't understand what you're talking about in that section above,
but I think you're making things too complicated because you're
fixated on the idea of a heterogeneous sequence.  Here's a simple
extension to the hierarchical algorithm formulation that I think could
accomodate SIMD.  The key change is the segment_fill function, which
gives us an additional customization point with which to capture SIMD
segments

  template<class SegIter,class T>
  void fill(SegIter first, SegIter last, const T& x, true_type)
  {

      typedef segmented_iterator_traits<SegIter> traits;

      typename traits::segment_iterator sfirst = traits::segment(first);
      typename traits::segment_iterator slast = traits::segment(last);

      if(sfirst==slast)
          fill(traits::local(first),traits::local(last), x);
      else 
      {
          fill(traits::local(first), traits::end(sfirst), x);

          for(++sfirst; sfirst!=slast; ++sfirst)
              segment_fill(sfirst, x, traits());

          fill(traits::begin(sfirst), traits::local(last), x);
      }
  }

  template <class SegmentForwardIter, class T, class Traits>
  void segment_fill(
      SegmentForwardIterator s, T const& x, Traits traits)
  {
      fill(traits.begin(s), traits.end(s), x);
  }

  template <class ForwardIter,class T>
  void
  fill(ForwardIter first,ForwardIter last,const T&x, false_type)
  {
      for(; first!=last; ++first)
          *first = x;
  }

  template <class Iter,class T>
  inline void
  fill(Iter first,Iter last,const T&x)
  {
      typedef segmented_iterator_traits<Iter> Traits;

      fill(first,last,x,
           typename Traits::is_segmented_iterator());
  }

  template <class U, class T, class Traits>
  void segment_fill(
      my_SIMD_iterator<U> s, T const& x, Traits traits)
  {
      // Do SIMD stuff on an entire segment
  }

Naturally, there are some things you'd want to rework about this.  For
example, you'd probably want to rethink the way the segmented traits
are formulated and how dispatching to segment_fill worked.  But
Austern's paper gives the bones of a solution, I think.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com