Re: [boost] Re: Any interest in adding unicode support to boost?

23 Oct 2004

On Fri, 22 Oct 2004 14:49:46 -0400, Beman Dawes <bdawes@acm.org> wrote:
> ...
> At 01:10 PM 10/22/2004, Miro Jurisic wrote:
>> ...
>> boost::fs, as far as I understand it, ran into the problem that it was
>> impossible to sidestep the invariant.
>
> No, rather that the error check was on by default. Some people want it off
> as the default.
>
> As far as Unicode strings are concerned, the question is a little
> different. Is it well-defined behavior to create a string that does not
> meet the Unicode invariants? If so, can ordinary operations break
> invariants, or is such dangerous activity restricted to "experts only"
> functions?

I guess you could say that all ordinary operations may take place on
three different levels: code units, codepoints, and characters.
Appending a code unit to a sequence of code units may make it
uninterpretable as a codepoint sequence. Appending a codepoint may make
it uninterpretable as a sequence of characters. The problem, I think, is
not the operations themselves but the level they operate on.
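
A minimal sketch of the first two levels. It is written in modern C++ because `char16_t` and `char32_t` arrived with C++11, well after this thread. `is_well_formed_utf16` and `approx_character_count` are hypothetical helpers, not Boost APIs. Appending a lone high surrogate breaks the code unit sequence. Appending U+0301 COMBINING ACUTE ACCENT keeps the code points valid but changes the last character instead of adding one.

```cpp
#include <cstddef>
#include <string>

// Code-unit level: a UTF-16 sequence is well formed if every high surrogate
// (0xD800..0xDBFF) is immediately followed by a low surrogate
// (0xDC00..0xDFFF) and no low surrogate appears on its own.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {               // high surrogate
            if (i + 1 == s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return false;                           // no low surrogate follows
            ++i;                                        // skip the paired low surrogate
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;                               // unpaired low surrogate
        }
    }
    return true;
}

// Code-point level: U+0300..U+036F ("Combining Diacritical Marks") attach to
// the preceding code point, so they are not counted as new characters.
// This is only a crude approximation; real character (grapheme cluster)
// boundaries are defined by Unicode Standard Annex #29.
std::size_t approx_character_count(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s)
        if (cp < 0x0300 || cp > 0x036F) ++n;
    return n;
}
```

For example, `u"caf\u00E9"` is well formed, but appending `char16_t(0xD83D)` on its own is not. `U"cafe"` followed by `U'\u0301'` holds five code points, which the helper counts as four characters.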

I have not yet found examples where a non-const code unit or codepoint
sequence is needed, except for input. I think initialising from a pair
of iterators over a code unit sequence (say, a UTF-8 encoded file), as
shown by Peter Dimov, would be just right.
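
A minimal sketch of the iterator-pair idea. It assumes a hypothetical `unicode_text` class. This is not Peter Dimov's code, which is not reproduced in this message. The class decodes UTF-8 code units into code points and throws `std::invalid_argument` on ill-formed input. Because it exposes only const members, the invariant is checked once, at construction, and nothing can break it afterwards.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical read-only Unicode string built from iterators over UTF-8 bytes.
class unicode_text {
public:
    template <class InputIt>
    unicode_text(InputIt first, InputIt last) {
        const std::string units(first, last);   // copy the code units once
        for (std::size_t i = 0; i < units.size();) {
            unsigned char c = static_cast<unsigned char>(units[i]);
            // Sequence length from the lead byte; 0 means an invalid lead byte.
            std::size_t len = c < 0x80 ? 1 : (c & 0xE0) == 0xC0 ? 2
                            : (c & 0xF0) == 0xE0 ? 3 : (c & 0xF8) == 0xF0 ? 4 : 0;
            if (len == 0 || units.size() - i < len)
                throw std::invalid_argument("ill-formed UTF-8: bad lead byte or truncated");
            char32_t cp = len == 1 ? c : c & (0x7F >> len);   // payload bits of lead byte
            for (std::size_t k = 1; k < len; ++k) {
                unsigned char cc = static_cast<unsigned char>(units[i + k]);
                if ((cc & 0xC0) != 0x80)
                    throw std::invalid_argument("ill-formed UTF-8: bad continuation byte");
                cp = (cp << 6) | (cc & 0x3F);
            }
            // Reject overlong forms, surrogates and values above U+10FFFF.
            static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
            if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                throw std::invalid_argument("ill-formed UTF-8: overlong, surrogate or out of range");
            code_points_.push_back(cp);
            i += len;
        }
    }

    // Non-mutating access only.
    std::size_t size() const { return code_points_.size(); }
    char32_t operator[](std::size_t i) const { return code_points_[i]; }
    const std::u32string& code_points() const { return code_points_; }

private:
    std::u32string code_points_;
};
```

To read a UTF-8 file, pass `std::istreambuf_iterator<char>` objects (from `<iterator>`). Braces avoid the "most vexing parse": `unicode_text t{std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>()};`. Storing decoded code points is only one choice. A real design might keep the UTF-8 code units and decode while iterating.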

(You can always make your own UTF-8 sequence and put it into a Unicode
string, of course, but this will probably mean copying the data.)

(For output, non-mutating access to the code units may be provided,
for example to UTF-16 code units if you want to interface with Win32
API functions.)
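
One way to produce that read-only UTF-16 view is a hypothetical `to_utf16` helper. It also uses C++11 types and is not an existing Boost function. It encodes code points as UTF-16 code units and splits anything above U+FFFF into a surrogate pair.

```cpp
#include <stdexcept>
#include <string>

// Encode code points as UTF-16 code units. Values above U+FFFF become a
// surrogate pair. Surrogate code points and values above U+10FFFF are not
// Unicode scalar values and are rejected.
std::u16string to_utf16(const std::u32string& code_points) {
    std::u16string out;
    out.reserve(code_points.size());
    for (char32_t cp : code_points) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("to_utf16: not a Unicode scalar value");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));        // fits in one code unit
        } else {
            char32_t v = cp - 0x10000;                       // 20-bit offset
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

On Windows `wchar_t` is 16 bits wide. The result can therefore be copied into a `std::wstring` and passed to the wide-character Win32 functions through `c_str()`. A string class could expose this conversion as a const member, perhaps caching the encoded form.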

Rogier

Rogier van Dalen