
On Thu, 21 Oct 2004 20:31:24 +0200, Erik Wien <wien@start.no> wrote:
> The best solution would be to never append single code units, but instead code points. The += operator would determine how many code units are required for the given code point.
I fully agree with you on that; I was considering what should happen if the user appended something invalid (e.g., an isolated surrogate). Sorry for any confusion caused. I made a second mistake in mixing up the two levels in an unclear way.

I very much like Peter's suggestion of using free functions that convert invalid values to valid ones. Using that, I suggest:

unicode::codepoint_string should throw when an invalid codepoint is appended to it (e.g., an isolated surrogate). unicode::correct_codepoint() should convert an invalid codepoint into U+FFFD, and could be used to "safely" insert codepoints.

    char32_t correct_codepoint (char32_t);

unicode::string should take a unicode::character for appending. A unicode::character object may be constructed from a single codepoint, which will be its base character. If this codepoint is invalid, the constructor should throw. If the codepoint is a combining mark, it should also throw. unicode::correct() should convert an invalid codepoint into U+FFFD, and if it is given a combining mark, it should use U+0020 SPACE as the base character.

    character correct (char32_t);

Regards,
Rogier
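
For illustration, here is a minimal sketch of the codepoint level of the proposal above. It is only a sketch under assumptions: the helper is_valid_codepoint(), the choice of std::invalid_argument as the exception type, and the std::vector<char32_t> storage are all hypothetical and not part of the proposal, and "invalid" is taken to mean a surrogate or a value above U+10FFFF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace unicode {

// Hypothetical helper: treats surrogates (U+D800..U+DFFF) and values
// above U+10FFFF as invalid. The proposal does not define this function.
inline bool is_valid_codepoint(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Converts an invalid codepoint into U+FFFD; valid ones pass through.
inline char32_t correct_codepoint(char32_t c) {
    return is_valid_codepoint(c) ? c : char32_t(0xFFFD);
}

// Sketch of a codepoint-level string whose += throws on invalid input.
// Storing raw char32_t values is an assumption made for brevity.
class codepoint_string {
public:
    codepoint_string& operator+=(char32_t c) {
        if (!is_valid_codepoint(c))
            throw std::invalid_argument("invalid codepoint");  // assumed exception type
        data_.push_back(c);
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    char32_t operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<char32_t> data_;
};

} // namespace unicode
```

With this, s += c throws for an isolated surrogate, while s += unicode::correct_codepoint(c) always succeeds and appends U+FFFD in place of the invalid value. The unicode::character level would follow the same pattern, but it needs Unicode property data to detect combining marks, so it is left out here.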