Re: [boost] Re: Re: Any interest in adding unicode support to boost?

21 Oct 2004

On Thu, 21 Oct 2004 20:31:24 +0200, Erik Wien <wien@start.no> wrote:
...
> The best solution would be to never append single code units, but instead
> code points. The += operator would determine how many code units are required
> for the given code point.

I fully agree with you on that; I was considering what should happen
if the user appended something invalid (e.g., an isolated surrogate).
Sorry for any confusion caused.
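
To make the code-point-level append concrete, here is a minimal sketch
assuming a UTF-16 backing store. The names is_scalar_value, utf16_length
and append_utf16 are hypothetical and not part of any proposed interface;
they only illustrate how an operator+= might decide between one and two
code units and reject an isolated surrogate.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true for Unicode scalar values, i.e. anything up to
// U+10FFFF that is not in the surrogate range U+D800..U+DFFF.
inline bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Number of UTF-16 code units needed to encode one code point.
inline std::size_t utf16_length(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Append one code point to a UTF-16 buffer, roughly what an operator+=
// taking a code point could do internally. Throws on invalid input.
inline void append_utf16(std::u16string& out, char32_t cp) {
    if (!is_scalar_value(cp))
        throw std::invalid_argument("invalid code point");
    if (utf16_length(cp) == 1) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // leaves a 20-bit value to split over two surrogates
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low
    }
}
```

The caller never sees code units here: it passes a code point and the
buffer grows by one or two units as needed.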

I also made a second mistake: I mixed up the two levels (code points
and characters) without making clear which one I meant.
I very much like Peter's suggestion of using free functions that convert
invalid values to valid ones. Building on that, I suggest:

unicode::codepoint_string should throw when an invalid codepoint is
appended to it (e.g., an isolated surrogate).
unicode::correct_codepoint() should convert an invalid codepoint into
U+FFFD, and could be used to "safely" insert codepoints.

char32_t correct_codepoint (char32_t);
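
A minimal sketch of how this could look, assuming "invalid" means a
surrogate or a value above U+10FFFF. The namespace sketch_codepoint, the
helper is_valid_codepoint, the codepoints() accessor and the choice of
std::invalid_argument are all assumptions for illustration; only
codepoint_string and correct_codepoint come from the suggestion above.

```cpp
#include <stdexcept>
#include <string>

namespace sketch_codepoint {

// Hypothetical validity test: rejects surrogates and out-of-range values.
inline bool is_valid_codepoint(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Maps any invalid value to U+FFFD REPLACEMENT CHARACTER and leaves
// valid code points unchanged.
inline char32_t correct_codepoint(char32_t cp) {
    return is_valid_codepoint(cp) ? cp : static_cast<char32_t>(0xFFFD);
}

// Strict container: appending an invalid code point throws.
class codepoint_string {
public:
    codepoint_string& operator+=(char32_t cp) {
        if (!is_valid_codepoint(cp))
            throw std::invalid_argument("invalid code point");
        data_.push_back(cp);
        return *this;
    }
    const std::u32string& codepoints() const { return data_; }

private:
    std::u32string data_;
};

}  // namespace sketch_codepoint
```

With this split, s += cp is strict, and s += correct_codepoint(cp) is
the explicit opt-in for lenient behaviour.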

unicode::string should take a unicode::character for appending. A
unicode::character object may be constructed with a single codepoint,
which will be its base character. If this codepoint is invalid, it
should throw. If the codepoint is a combining mark, it should also
throw.
unicode::correct() should convert an invalid codepoint into U+FFFD,
and if it is given a combining mark, it should attach that mark to
U+0020 SPACE as the base character.

character correct (char32_t);
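
A minimal sketch of the character level, under loud assumptions: the
combining-mark test below only covers the Combining Diacritical Marks
block (U+0300..U+036F), whereas a real implementation would consult the
Unicode character database. The namespace sketch_character, the members
add_mark(), base() and marks(), and the exception type are hypothetical;
only character and correct() come from the suggestion above.

```cpp
#include <stdexcept>
#include <string>

namespace sketch_character {

// Hypothetical validity test: rejects surrogates and out-of-range values.
inline bool is_valid_codepoint(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Simplifying assumption: only U+0300..U+036F counts as a combining mark.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// A base code point followed by zero or more combining marks.
class character {
public:
    explicit character(char32_t base) : base_(base) {
        if (!is_valid_codepoint(base))
            throw std::invalid_argument("invalid code point");
        if (is_combining_mark(base))
            throw std::invalid_argument("combining mark used as base");
    }
    character& add_mark(char32_t mark) {
        if (!is_combining_mark(mark))
            throw std::invalid_argument("not a combining mark");
        marks_.push_back(mark);
        return *this;
    }
    char32_t base() const { return base_; }
    const std::u32string& marks() const { return marks_; }

private:
    char32_t base_;
    std::u32string marks_;
};

// Lenient construction: invalid input becomes U+FFFD, and a lone
// combining mark is attached to U+0020 SPACE.
inline character correct(char32_t cp) {
    if (!is_valid_codepoint(cp))
        return character(static_cast<char32_t>(0xFFFD));
    if (is_combining_mark(cp)) {
        character c(U' ');
        c.add_mark(cp);
        return c;
    }
    return character(cp);
}

}  // namespace sketch_character
```

As on the code point level, the constructor stays strict and correct()
is the explicit way to accept arbitrary input.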

Regards,
Rogier

Rogier van Dalen