Re: [boost] Re: Re: Any interest in adding unicode support to boost?

22 Oct 2004

      On Fri, 22 Oct 2004 12:46:00 -0400 (EDT), Rob Stewart <stewart@sig.com> wrote:
...
From: Rogier van Dalen <rogiervd@gmail.com>
...
unicode::string should take a unicode::character for appending. A
unicode::character object may be constructed with a single codepoint,
which will be its base character. If this codepoint is invalid, it
should throw. If the codepoint is a combining mark, it should also
throw.
unicode::correct() should convert an invalid codepoint into U+FFFD,
and if it is given a combining mark, it should use U+0020 SPACE as the
base character.

Why not have unicode::character's ctor invoke unicode::correct()?

unicode::correct() replaces every encoding error in the input with a
replacement character. This loses information, and the loss is not
recoverable. The combining-mark workaround is only slightly better. When
I proposed a policy I called it workaround_encoding_error; maybe we
need a better name than "correct".

I agree with Peter Dimov, however, that the default should be to throw
rather than to throw away information and pretend nothing happened.
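
To make the distinction concrete, here is a rough sketch of the two
behaviours. Nothing here is an existing or proposed interface: the
namespace, the helper functions and the member names are all made up for
illustration, and is_combining_mark() only checks one block instead of
consulting the Unicode Character Database.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {  // hypothetical namespace, not a real Boost library

using codepoint = std::uint32_t;

// Surrogates and values above U+10FFFF are not valid codepoints here.
inline bool is_valid(codepoint cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is recognised. Real code would use the character database.
inline bool is_combining_mark(codepoint cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

class character {
public:
    // Default behaviour: refuse bad input instead of silently repairing it.
    explicit character(codepoint base) : base_(base) {
        if (!is_valid(base))
            throw std::invalid_argument("invalid codepoint");
        if (is_combining_mark(base))
            throw std::invalid_argument("combining mark used as base character");
    }

    // Attach a combining mark to the base character.
    void add_mark(codepoint mark) {
        if (!is_combining_mark(mark))
            throw std::invalid_argument("not a combining mark");
        marks_.push_back(mark);
    }

    codepoint base() const { return base_; }
    const std::vector<codepoint>& marks() const { return marks_; }

private:
    codepoint base_;
    std::vector<codepoint> marks_;
};

// Opt-in workaround: the caller explicitly asks for lossy repair.
inline character correct(codepoint cp) {
    if (!is_valid(cp))
        return character(0xFFFD);      // REPLACEMENT CHARACTER
    if (is_combining_mark(cp)) {
        character c(0x0020);           // SPACE as the base character
        c.add_mark(cp);
        return c;
    }
    return character(cp);
}

}  // namespace sketch
```

With this split, character(0xD800) throws, while correct(0xD800) yields
U+FFFD, and the caller who wants the lossy version has to say so.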

Regards,
Rogier