
From: Artyom

Ok, let's think: what do you need iterators for? Accessing "characters"? If so, you are most likely doing something terribly wrong, as you ignore the fact that codepoint != character.
I would say such an iterator is wrong by design unless you are developing a Unicode algorithm that operates on code points.

Now wouldn't it be nice if ascii_t (or whatever it's called) and the utf*_t string classes had 3 kinds of iterators:

- storage iterator (char, wchar_t etc.),
- codepoint iterator,
- character iterator.

You could then reuse many existing algorithms to perform operations at whatever level is sufficient in a given situation, like:

- bitwise copy:
  std::copy(utf8_1.storage_begin(), utf8_1.storage_end(), utf8_2.storage_begin())
- check if utf32 is a substring of utf8, codepoint-wise:
  std::search(utf8.codepoint_begin(), utf8.codepoint_end(), utf32.codepoint_begin(), utf32.codepoint_end())
- character-wise copy of ascii_t to UTF-16, considering the codepage of the ascii object:
  utf16_t utf16(ascii.character_begin(), ascii.character_end())
- count codepoints:
  std::distance(utf8.codepoint_begin(), utf8.codepoint_end())
- count characters:
  std::distance(utf8.character_begin(), utf8.character_end())
- get the 5th codepoint:
  *std::next(utf8.codepoint_begin(), 4)

I don't know Unicode quirks well enough to tell how useful this interface would be, but it seems interesting. What do you think?

Best regards,
Robert
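
For illustration, here is a minimal sketch of what the codepoint level of such an interface might look like over a plain std::string holding UTF-8. Everything in it is hypothetical: the names codepoint_iterator, codepoint_begin, codepoint_end, count_codepoints and contains_codepoints are made up for this example and do not come from any existing library. It assumes the input is well-formed UTF-8 and does no validation. The character level is deliberately left out, since it would need grapheme cluster segmentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical codepoint iterator over UTF-8 storage (illustration only).
// Assumes well-formed UTF-8; malformed input is not detected.
class codepoint_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // decoded on the fly, so returned by value

    codepoint_iterator() = default;
    explicit codepoint_iterator(const char* p) : p_(p) {}

    char32_t operator*() const {
        assert(p_ != nullptr);
        const unsigned char lead = static_cast<unsigned char>(*p_);
        const std::size_t n = length(lead);
        if (n == 1) return lead;
        // Keep only the payload bits of the lead byte, then append
        // 6 payload bits from each continuation byte.
        char32_t cp = lead & (0xFFu >> (n + 1));
        for (std::size_t i = 1; i < n; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3Fu);
        return cp;
    }

    codepoint_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    codepoint_iterator operator++(int) {
        codepoint_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const codepoint_iterator& a, const codepoint_iterator& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const codepoint_iterator& a, const codepoint_iterator& b) {
        return a.p_ != b.p_;
    }

private:
    // Number of storage units (bytes) in the sequence starting with `lead`.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        return 4;
    }
    const char* p_ = nullptr;
};

// Free functions standing in for the proposed member functions.
inline codepoint_iterator codepoint_begin(const std::string& utf8) {
    return codepoint_iterator(utf8.data());
}
inline codepoint_iterator codepoint_end(const std::string& utf8) {
    return codepoint_iterator(utf8.data() + utf8.size());
}

// Count codepoints with a stock algorithm.
inline std::ptrdiff_t count_codepoints(const std::string& utf8) {
    return std::distance(codepoint_begin(utf8), codepoint_end(utf8));
}

// Codepoint-wise substring check of a UTF-32 needle in a UTF-8 haystack.
inline bool contains_codepoints(const std::string& utf8, const std::u32string& utf32) {
    return std::search(codepoint_begin(utf8), codepoint_end(utf8),
                       utf32.begin(), utf32.end()) != codepoint_end(utf8);
}
```

Take the string "caf" "e\xCC\x81", which is "café" spelled with a combining acute accent. It has 6 storage units and count_codepoints gives 5, although a reader sees 4 characters. *std::next(codepoint_begin(s), 4) yields U+0301, the combining accent on its own, which is the codepoint != character problem raised in the quoted message.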