Re: [boost] [string] proposal

21 Jan 2011

On Sat, Jan 22, 2011 at 1:51 AM, Dave Abrahams <dave@boostpro.com> wrote:
...
At Sat, 22 Jan 2011 01:14:38 +0800,
Dean Michael Berris wrote:
...
...
...
4. Looks like a real STL container except the iterator type is smarter
than your average iterator.
Encoding is a matter of external interpretation and I think should not
be part of a string's interface. You can have wrappers that interpret
a string as a UTF-* string.
What does it iterate over?  chars?  code points?  characters?
Something else?
I can see basically a way of saying what you want when you want to get
an iterator from it -- by default though a call to '.begin()' will
return an iterator over characters (just so you don't break compatibility
with std::string).
Then you mean an iterator over chars, not characters.
Yeah, over chars. :)
...
...
The iterator can store a reference to the original string and when
advanced, can do the appropriate interpretation of the string in
context. If you wanted a code point iterator, you'd get the code point
iterator. If you wanted a character based on a certain encoding then
you can have a special iterator for that. An iterator would also know
whether it was out of bounds.
This allows people to write code that deals with code points,
characters (based on the encoding), and raw data if absolutely
necessary.
Hmm, I'm just not sure whether these are useful.  The iterators to be
supplied (if any) should IMO be dictated by the needs of real
algorithms.
I thought about it a little more too, and there should be a way of
just crafting the appropriate iterator from the outside -- much like
how the current Iterators library allows you to create different kinds
of iterators.
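
To make the idea concrete, here is a minimal sketch of an adapter built
outside the string class. It keeps a pointer to the underlying
std::string, decodes UTF-8 as it advances, and can tell when it has run
off the end. The names code_point_cursor and code_points are
hypothetical, not part of any proposal, and the decoder assumes
well-formed UTF-8 (no validation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical adapter: walks a std::string as UTF-8 and yields code points.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
class code_point_cursor {
public:
    explicit code_point_cursor(const std::string& s) : str_(&s), pos_(0) {}

    // The cursor refers to the original string, so it can check bounds itself.
    bool at_end() const { return pos_ >= str_->size(); }

    // Decode the code point that starts at the current position.
    char32_t current() const {
        assert(!at_end());
        const unsigned char lead = static_cast<unsigned char>((*str_)[pos_]);
        const std::size_t len = length_of(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t i = 1; i < len && pos_ + i < str_->size(); ++i) {
            const unsigned char next =
                static_cast<unsigned char>((*str_)[pos_ + i]);
            cp = (cp << 6) | (next & 0x3F);
        }
        return cp;
    }

    // Step over the whole encoded sequence, not just one char.
    void advance() {
        assert(!at_end());
        pos_ += length_of(static_cast<unsigned char>((*str_)[pos_]));
    }

private:
    // Sequence length implied by a UTF-8 lead byte.
    static std::size_t length_of(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        if ((lead >> 3) == 0x1E) return 4;
        return 1;  // stray continuation or invalid byte: step a single byte
    }

    const std::string* str_;
    std::size_t pos_;
};

// Convenience helper: collect every code point of a UTF-8 string.
std::vector<char32_t> code_points(const std::string& s) {
    std::vector<char32_t> out;
    for (code_point_cursor c(s); !c.at_end(); c.advance()) {
        out.push_back(c.current());
    }
    return out;
}
```

The string itself stays a plain sequence of chars; the interpretation
lives entirely in the adapter, so a different encoding would only need
a different cursor type.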

Algorithms that deal with text, such as rendering characters in a GUI,
need to iterate over code points or glyphs. Typesetting algorithms need
the same kind of traversal. Instance counting (building a histogram of
character counts, for example for compression) also needs access to the
individual "elements" of a given text. In the pre-Unicode days this was
just a simple table of 256 characters; unfortunately it's gotten a lot
more complex than that ;).
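
As a rough illustration of the histogram case, the sketch below counts
code points rather than bytes, so the fixed-size table becomes a map.
The function name code_point_histogram is made up for this example, and
the inline decoder assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: counts how often each code point occurs in a
// UTF-8 encoded string. Assumption: the input is well-formed UTF-8.
std::map<char32_t, std::size_t> code_point_histogram(const std::string& text) {
    std::map<char32_t, std::size_t> counts;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;
        char32_t cp = lead;
        // Work out the sequence length and the lead byte's payload bits.
        if ((lead >> 5) == 0x06)      { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }
        // Fold in six payload bits from each continuation byte.
        for (std::size_t k = 1; k < len && i + k < text.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(text[i + k]) & 0x3F);
        }
        ++counts[cp];
        i += len;
    }
    return counts;
}
```

Counting glyphs or user-perceived characters would need more than this,
since one of those can span several code points.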

-- 
Dean Michael Berris
about.me/deanberris