
On Wed, 19 Jan 2011 00:00:59 +0100 Robert Kawulak <robert.kawulak@gmail.com> wrote:
From: Artyom
OK, let's think: what do you need iterators for? Accessing "characters"? If so, you are most likely doing something terribly wrong, as you ignore the fact that codepoint != character.
I would say such an iterator is wrong by design unless you are developing a Unicode algorithm that operates on code points.
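To make the "codepoint != character" point concrete, here is a minimal sketch, assuming well-formed UTF-8 input. The function name count_code_points is hypothetical and not part of any library discussed in this thread. It shows that "e" followed by a combining acute accent is one user-perceived character but two code points and three storage units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by counting every
// byte that is not a continuation byte (10xxxxxx).
// Assumption: the input is well-formed UTF-8; nothing is validated.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

inline void example_codepoint_vs_character() {
    // "e" + U+0301 COMBINING ACUTE ACCENT: displayed as a single character.
    const std::string decomposed = "e\xCC\x81";
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE: the precomposed form.
    const std::string precomposed = "\xC3\xA9";

    assert(decomposed.size() == 3);               // storage units (bytes)
    assert(count_code_points(decomposed) == 2);   // code points
    assert(precomposed.size() == 2);
    assert(count_code_points(precomposed) == 1);
}
```

Both strings display as the same single character, yet neither the byte count nor the code-point count agrees between them, which is why iterating by code point does not amount to iterating by character.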
Now wouldn't it be nice if ascii_t (or whatever it's called) and utf*_t string classes had 3 kinds of iterators:
- storage iterator (char, wchar_t etc.),
- codepoint iterator,
- character iterator.
The current iterators fall under the storage iterator category, but code-point iterators are easily possible. Character iterators may require help from a full-fledged Unicode library (I don't yet know whether there's a simple way to determine which code points are combining ones; I doubt there is), but they should be doable too.
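As a rough illustration of why a code-point iterator is feasible on top of storage iterators, here is a minimal sketch over UTF-8 held in a std::string. Everything in it (codepoint_iterator, codepoints_begin, codepoints_end) is a hypothetical name invented for this example, not the interface of the string classes under discussion. It assumes well-formed UTF-8 and does no validation. It is tagged as an input iterator because operator* returns the decoded value rather than a reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only iterator that walks UTF-8 storage one code point
// at a time. Assumption: the underlying bytes are well-formed UTF-8.
class codepoint_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit codepoint_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decodes the code point that starts at the current storage position.
    char32_t operator*() const {
        auto it = pos_;
        const unsigned char lead = static_cast<unsigned char>(*it);
        const int len = sequence_length(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? lead : static_cast<char32_t>(lead & (0x7F >> len));
        for (int i = 1; i < len; ++i) {
            ++it;
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);
        }
        return cp;
    }

    // Advances past every storage unit of the current code point.
    codepoint_iterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }

    codepoint_iterator operator++(int) {
        codepoint_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const codepoint_iterator& a, const codepoint_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const codepoint_iterator& a, const codepoint_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;           // 0xxxxxxx
        if ((lead >> 5) == 0x6) return 2;    // 110xxxxx
        if ((lead >> 4) == 0xE) return 3;    // 1110xxxx
        return 4;                            // 11110xxx
    }

    std::string::const_iterator pos_;
};

inline codepoint_iterator codepoints_begin(const std::string& s) {
    return codepoint_iterator(s.begin());
}

inline codepoint_iterator codepoints_end(const std::string& s) {
    return codepoint_iterator(s.end());
}

inline void example_codepoint_iteration() {
    const std::string text = "na\xC3\xAFve";  // "naive" with U+00EF in the middle
    // Six storage units, five code points.
    assert(text.size() == 6);
    assert(std::distance(codepoints_begin(text), codepoints_end(text)) == 5);
    // Standard algorithms work at the code-point level.
    auto hit = std::find(codepoints_begin(text), codepoints_end(text), char32_t(0xEF));
    assert(hit != codepoints_end(text));
}
```

A character iterator would sit one level higher and would need grapheme-cluster data from a Unicode library to group a base code point with the combining marks that follow it; this sketch does not attempt that.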
You could then reuse many existing algorithms to perform operations on a level that is sufficient in a given situation [...] I don't know Unicode quirks enough to tell how useful this interface would be, but it seems interesting.
And intriguing. When I get back to the Unicode string classes, I'll look into adding such iterators.
--
Chad Nelson
Oak Circle Software, Inc.