
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage:
e.g., for the purposes of search? Just trying to get a feel for what you're describing.
Near-duplicate elimination, and document clustering in general. Most clustering algorithms work on representations of documents as points in a very high-dimensional space.
OK.
Most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end.
By 'take' you mean a destructive read?
What do you mean by 'destructive'?
Never mind; it was a stab in the dark, and you explain what you mean below.
'take(range, n)' returns a range adaptor that iterates over the first 'n' elements of 'range'. This way I do not have to decide how much of the original document (which could in theory be of unlimited length) I need until the end of the pipeline (most importantly, I want to set the final length *after* filtering).
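Something along these lines (a minimal sketch, not the exact code I'm using; the names are just illustrative):

    #include <iterator>
    #include <utility>

    // Sketch of a 'take' iterator adaptor: it stops either after n
    // increments or when the underlying iterator reaches its own end,
    // whichever comes first.  That is what lets the "how many elements do
    // I keep?" decision move to the very end of a pipeline over a
    // potentially unbounded document.
    template <class Iterator>
    class take_iterator
    {
        typedef std::iterator_traits<Iterator> traits;
    public:
        typedef std::forward_iterator_tag iterator_category;
        typedef typename traits::value_type      value_type;
        typedef typename traits::difference_type difference_type;
        typedef typename traits::pointer         pointer;
        typedef typename traits::reference       reference;

        take_iterator() : base_(), n_(0) {}
        take_iterator(Iterator base, difference_type n) : base_(base), n_(n) {}

        reference operator*() const { return *base_; }
        take_iterator& operator++() { ++base_; --n_; return *this; }
        take_iterator operator++(int) { take_iterator tmp(*this); ++*this; return tmp; }

        friend bool operator==(take_iterator const& a, take_iterator const& b)
        {
            // Equal when both counters are exhausted, or when the
            // underlying positions coincide (e.g. the source ran out
            // before n elements were produced).
            return (a.n_ == 0 && b.n_ == 0) || a.base_ == b.base_;
        }
        friend bool operator!=(take_iterator const& a, take_iterator const& b)
        { return !(a == b); }

    private:
        Iterator base_;
        difference_type n_;
    };

    // take(first, last, n): the resulting range visits at most n elements.
    template <class Iterator>
    std::pair<take_iterator<Iterator>, take_iterator<Iterator> >
    take(Iterator first, Iterator last,
         typename std::iterator_traits<Iterator>::difference_type n)
    {
        return std::make_pair(take_iterator<Iterator>(first, n),
                              take_iterator<Iterator>(last, 0));
    }

With that in place, the last stage of a pipeline is just take(filtered_first, filtered_last, n), and nothing upstream needs to know how many elements will actually be consumed.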
OK.
<snip>
Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead.
Okay, so... is it worth breaking up the iterator abstraction for this purpose?
Do you think we are breaking the iterator abstraction? How? Or have I misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator abstraction gets broken down into smaller bits for the purpose of storage, complicating some code [like a generic range, for example ;-)], and it imposes an extra coding duty on writers of high-quality iterator adaptors. I'm asking if it's worth it.
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
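To make sure we're talking about the same thing, here's roughly what I understand the factoring to buy us, sketched for one concrete adaptor (the names here are mine, not an actual proposal):

    #include <boost/iterator/filter_iterator.hpp>

    // A filter_iterator<Pred, Iter> carries the predicate and the end of
    // the underlying sequence in *every* copy; a "factored" range stores
    // them once and rebuilds full, indirection-free iterators on demand.
    // The iterators themselves stay fat; only the stored range shrinks.
    template <class Pred, class Iter>
    class factored_filter_range
    {
    public:
        typedef boost::filter_iterator<Pred, Iter> iterator;

        factored_filter_range(Pred pred, Iter first, Iter last)
            : pred_(pred), first_(first), last_(last) {}

        // Recombine the shared state (pred_, last_) with the per-iterator
        // state (the current position) into ordinary fat iterators.
        iterator begin() const { return iterator(pred_, first_, last_); }
        iterator end()   const { return iterator(pred_, last_,  last_); }

    private:
        Pred pred_;   // stored once instead of once per iterator
        Iter first_;
        Iter last_;
    };

Generalizing that so an arbitrary stack of adaptors can be split into shared and per-iterator parts is where the extra duty on iterator authors comes in, and that's the cost I'm asking about.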
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
Yes.
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer big enough for the average iterator size.
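The storage side would be something along these lines (a very rough sketch; the 100-byte figure and all the names are placeholders):

    #include <new>
    #include <cstddef>

    // Small wrapped iterators live in an inline buffer, larger ones go to
    // the heap.  The type-erasure/dispatch machinery of a real
    // any_iterator (and alignment, destruction, copying of the stored
    // object) is omitted here.
    class iterator_storage
    {
        static std::size_t const buffer_size = 100;  // tuned to the "average" iterator size

    public:
        iterator_storage() : heap_(0) {}

        template <class Iterator>
        Iterator* store(Iterator const& it)
        {
            if (sizeof(Iterator) <= buffer_size)
                return new (buffer_) Iterator(it);  // small: placement-new into the buffer
            Iterator* p = new Iterator(it);         // big: ordinary heap allocation
            heap_ = p;
            return p;
        }

        // A real implementation would also remember how to destroy/copy
        // whatever it stored (e.g. via a vtable or function pointers).

    private:
        char  buffer_[buffer_size];
        void* heap_;
    };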
Oh, that's not the small object optimization. The SOO is only done by compilers: it means passing small objects in registers instead of on the stack. http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/323704.... IIUC you're talking about the too-specifically-named "small string optimization." I don't understand why you would implement any_iterator yourself when there are so many extant implementations, but maybe it's none of my business.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
[1] 100 bytes for SBO might seem like a lot, but I've used a custom string with a very large SBO buffer (over 100 bytes), happily copied it around, and performance was much better than with standard std::string (which, admittedly, did reference counting but no SBO). I guess that the size of the buffer depends a lot on what you are going to do with it.
Yes indeed.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com