
On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage:
e.g., for the purposes of search? Just trying to get a feel for what you're describing.
Near-duplicate elimination and document clustering in general. Most clustering algorithms work on representations of documents as points in a very high-dimensional space.
Most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end.
By 'take' you mean a destructive read?
What do you mean by 'destructive'? 'take(range, n)' returns a range adaptor that iterates over the first 'n' elements of 'range'. This way I do not have to decide how much of the original document (which theoretically could be of unlimited length) I need until the end of the pipeline (most importantly, I want to set the final length *after* filtering). <snip>
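Just to make the shape of such a pipeline concrete, here is a minimal sketch using the standard C++20 views; the lower-casing, stop-word and hashing lambdas are made-up stand-ins for the real stages, not the code actually being discussed, and 'take' is std::views::take applied last so the upstream stages are only evaluated for the elements that are actually consumed:

    #include <cctype>
    #include <functional>
    #include <iostream>
    #include <ranges>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> tokens = {"The", "quick", "Brown", "fox", "the"};

        // Hypothetical pipeline: normalize -> filter -> hash, with take() at the end.
        // Nothing upstream is evaluated beyond what the final take() requests.
        auto features =
            tokens
            | std::views::transform([](std::string s) {            // "normalization"
                  for (auto& c : s)
                      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
                  return s;
              })
            | std::views::filter([](const std::string& s) {        // "filtering" (drop stop words)
                  return s != "the";
              })
            | std::views::transform([](const std::string& s) {     // "hashing" into feature ids
                  return std::hash<std::string>{}(s);
              })
            | std::views::take(2);                                  // decide the final length last

        for (auto h : features)
            std::cout << h << '\n';
    }

The point of the example is only the ordering: the length is fixed after filtering, exactly as described above.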
Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead.
Okay, so... is it worth breaking up the iterator abstraction for this purpose?
Do you think we are breaking the iterator abstraction? How? Or have I misunderstood the question? We are simply trying to come up with a simple and clean trick to keep iterator size under control.
A nice thing about lazy ranges is that they are useful for controlling the peak memory usage of the application (in fact, if you are careful, you can do very little dynamic memory allocation).
Yes.
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
Yes.
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer big enough for the average iterator size. Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].

[1] 100 bytes for SBO might seem like a lot, but I've used a custom string with a very large SBO buffer (over 100 bytes), happily copied it around, and performance was much better than with the standard std::string (which, admittedly, did reference counting but no SBO). I guess that the size of the buffer depends a lot on what you are going to do with it.

-- gpd
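As an illustration of the kind of buffer being discussed, here is a minimal, cut-down sketch of a type-erased input iterator with a small-buffer optimization. The class name, the interface, and the 128-byte default are assumptions made up for the example (roughly matching the "around a hundred bytes" figure above), not the actual any_iterator being planned:

    #include <cstddef>
    #include <iostream>
    #include <new>
    #include <utility>
    #include <vector>

    // Cut-down, input-iterator-only type erasure with a small-buffer
    // optimization: wrapped iterators up to Size bytes are stored in-place,
    // larger ones fall back to the heap.
    template <class Value, std::size_t Size = 128>
    class any_input_iterator {
        struct concept_t {
            virtual ~concept_t() = default;
            virtual Value deref() const = 0;
            virtual void increment() = 0;
            virtual concept_t* clone_into(void* buf, std::size_t cap) const = 0;
        };
        template <class It>
        struct model_t final : concept_t {
            It it;
            explicit model_t(It i) : it(std::move(i)) {}
            Value deref() const override { return *it; }
            void increment() override { ++it; }
            concept_t* clone_into(void* buf, std::size_t cap) const override {
                if (sizeof(model_t) <= cap)
                    return ::new (buf) model_t(it);   // fits in the SBO buffer
                return new model_t(it);               // too big: heap allocate
            }
        };

        alignas(std::max_align_t) unsigned char buffer_[Size];
        concept_t* impl_ = nullptr;

        bool in_buffer() const {
            return static_cast<const void*>(impl_) == static_cast<const void*>(buffer_);
        }
        void destroy() {
            if (!impl_) return;
            if (in_buffer()) impl_->~concept_t();
            else delete impl_;
            impl_ = nullptr;
        }

    public:
        template <class It>
        explicit any_input_iterator(It it) {
            if (sizeof(model_t<It>) <= Size)
                impl_ = ::new (buffer_) model_t<It>(std::move(it));
            else
                impl_ = new model_t<It>(std::move(it));
        }
        any_input_iterator(const any_input_iterator& other)
            : impl_(other.impl_ ? other.impl_->clone_into(buffer_, Size) : nullptr) {}
        any_input_iterator& operator=(const any_input_iterator&) = delete; // kept minimal
        ~any_input_iterator() { destroy(); }

        Value operator*() const { return impl_->deref(); }
        any_input_iterator& operator++() { impl_->increment(); return *this; }
    };

    int main() {
        std::vector<int> v = {1, 2, 3};
        any_input_iterator<int> it(v.begin());   // a vector iterator easily fits in the buffer
        std::cout << *it << '\n';                // 1
        ++it;
        std::cout << *it << '\n';                // 2
    }

The sketch leaves out equality, traversal categories, and assignment; the only point is the sizeof check that decides between placement new into the in-object buffer and a heap allocation, which is what keeps copies of small wrapped iterators allocation-free.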