Re: [boost] [SPAM (Bayesian)] - Re: lifetime of ranges vs. iterators - Bayesian Filter detected spam

1 Sep 2008

      On Mon, Sep 1, 2008 at 4:08 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
...
...
Need to think about it. Could you give a practical example of an
iterator adapter needing three iterators?
I never needed them in practice. But I would forward this to Dave. He likes the
complete generality of factored iterators, I think. I think it is a cleaner concept than
saying the common data must be representable as a range, but I cannot really back
this up with practical examples. The wrapping/unwrapping (see below) troubles me in > either case.
Sure, full generality is always good, but if ranges cover 90% of the
cases, maybe it is good enough.
...
...
You would still have to allow for storing the wrapped iterator in the
wrapper if the wrapped iterator does not support refactoring. I do not
think there is a way around it.
But in my proposal (actually, yours), all iterators would be "lean", and each iterator
would have access to its range, all the way up the iterator/range stack. So each
iterator would itself only wrap other lean iterators. Storage bloat for stack size N
would only be O(N) for the indirections, rather than O(2^N).
You can't assume that all iterators are lean. In fact most won't.
Unless you drop compatibility with existing iterators, which is a
no-no.

My proposal was aimed at making ranges as lean as possible, not
necessarily iterators.
Anyways, as long as all parts of the stack are 'lean' ranges, you
should have only a 2x storage bloat when using iterators.
...
...
Also, I have this (completely unfounded) feeling that an iterator that
relies on accessing common data in the owning range might be harder to
optimize.
...
I think that ranges should be used for long term storage, so if the
iterator size itself is not optimal shouldn't be a big problem.
Without any kind of storage optimization like your preferred-range templates or
Dave's factored-iterators, the storage bloat would be _factor_ 2^N, so ever handling > completely unwrapped iterators just does not scale.
So it comes down to:
a) with storage optimization, at every iterator access, unwrap/wrap the iterator out of their storage:
[Dave]
  void increment() {
     m_itBase=decompose( ++compose( m_itBase, m_commondata ) ).first; // second is common_data again
  }
or
[Giovanni]
  void increment() {
     m_rng=TRange( ++m_rng.begin(), m_rng.end() );
  }
(Make that '++' 'boost::next()').

They are pretty equivalent. I think that my idea requires less
scafolding: you probably are going to make a specific wrapper range
anyway, so it is a mater of specializing associated_range. On the
other hand, once you have compose and decompose, you proably do not
need a specific wrapper. So, I do not know. I would have to try.

I do not think that the above looks that bad. In fact I think that
compilers should be able to optimize that wrapping and unwrapping
pretty well (even more with move semantics).

Here is a more or less complete (forward)  filter_iterator and
associated range (given appropriate definitions of front and rest)
[note: more or less pseudocode]

template<class Iter, class F>
struct filter_iterator : iterator_facade<blah,blah...> {
 filter_iterat(Iter b, iter e, F f ) :
   m_range(b,e), m_f(f) {}
private:
  void increment() {
      while(!m_f(front(m_range))
            m_range = rest(m_range);
  }

  reference dereference() const {
     // problem: front(x) is defined as *begin(x);
     // what if the reference returned by
     // begin(m_range) is invalidated when the
     // iterator goes out of scope? Yes, it
     // happens and I got bitten at least once.
     // We need a trait.
     return front(m_range);
  }

  bool equal(filter_iterator const&rhs) {
         return begin(rhs.m_range) == begin(m_range) &&
                    end(rhs.m_range) == end(m_range);
  }

   associated_range<Iter>::type m_range;
   F m_f; // use compressed pair here
};

template<class R, class F>
struct filter_range {
  typedef filter_iterator<range_iterator<R>::type, R> iterator;

  iterator begin() const{
       return iterator(begin(m_range), end(m_range), m_f);
  }

  iterator end() const{
       return iterator(end(m_range), end(m_range), m_f);
  }

   R m_range;
   F m_f;  // use compressed pair
};

// specialization
template<class Iter, class F>
class associated_range<filter_iterator<Iter, F> >  {
   typedef filter_range<associater_range<Iter>::type, F> type;
};

This should handle nested filter iterators (assuming no further
reductions) or other iterators with specialized nested_ranges quite
nicely.
...
or
b) have the iterator carry an indirection to the outside data.
Given that the iterators coming out of a) are decomposed recursively by the next
layer of iterators (that is, ItBase::operator++ inside the expression does the same
thing again, all the way up), I would think that a) is actually harder to optimize than b),
compilers are quite good at eliminating all compile time recursive
functions, so I would say it doesn't matter.
...
and the effect of no optimization would be much more dramatic.
this is always the case with code with high abstraction (10x is what I
usually see).

-- 
gpd

Re: [boost] [SPAM (Bayesian)] - Re: lifetime of ranges vs. iterators - Bayesian Filter detected spam

Giovanni Piero Deretta