
On Fri, 22 Oct 2004 14:49:46 -0400, Beman Dawes <bdawes@acm.org> wrote:
At 01:10 PM 10/22/2004, Miro Jurisic wrote:
boost::fs, as far as I understand it, ran into the problem that it was impossible to sidestep the invariant.
No, rather that the error check was on by default. Some people want it off as the default.
As far as Unicode strings are concerned, the question is a little different. Is it well defined behavior to create a string that does not meet the Unicode invariants? If so, can ordinary operations break invariants, or is such dangerous activity restricted to "experts only" functions?
I guess you could say that all ordinary operations may take place on three different levels. Appending a code unit to the sequence of code units may make it uninterpretable as a codepoint sequence. Appending a codepoint may make it uninterpretable as a sequence of characters. The problem I think is not the operations, but rather the level they operate on.

I have not yet found examples where a non-const code unit or codepoint sequence is needed, except for input. I think initialising from a code unit sequence (say, a UTF-8 encoded file) from two iterators, as shown by Peter Dimov, would be just right. (You can always make your own UTF-8 sequence and put it into a Unicode string, of course, but this will probably mean copying the data.)

(For output, non-mutating access to the code units may be provided, for example to UTF-16 code units if you want to interface with Win32 API functions.)

Rogier
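
A minimal sketch of the interface described above, to make the levels concrete. This is not an existing Boost class: the names unicode_string, append and code_units are hypothetical, and UTF-8 storage is an assumption made for brevity. The code units are checked once on input through two iterators, mutation is only offered at the codepoint level, and the code units are exposed read-only. The character level is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class for illustration only; not part of Boost or the standard library.
class unicode_string {
public:
    // Input: initialise from a range of UTF-8 code units, rejecting malformed data.
    template <class InputIt>
    unicode_string(InputIt first, InputIt last) : units_(first, last) {
        if (!is_valid_utf8(units_))
            throw std::invalid_argument("malformed UTF-8");
    }

    // Codepoint-level mutation: the value is encoded here, so callers never
    // get the chance to append a stray code unit.
    void append(char32_t cp) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {
            push(cp);
        } else if (cp < 0x800) {
            push(0xC0 | (cp >> 6));
            push(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            push(0xE0 | (cp >> 12));
            push(0x80 | ((cp >> 6) & 0x3F));
            push(0x80 | (cp & 0x3F));
        } else {
            push(0xF0 | (cp >> 18));
            push(0x80 | ((cp >> 12) & 0x3F));
            push(0x80 | ((cp >> 6) & 0x3F));
            push(0x80 | (cp & 0x3F));
        }
        // Debug-only check that the invariant still holds (linear time).
        assert(is_valid_utf8(units_));
    }

    // Output: non-mutating access to the code units.
    const std::vector<unsigned char>& code_units() const { return units_; }

private:
    void push(char32_t unit) { units_.push_back(static_cast<unsigned char>(unit)); }

    // Accepts only well-formed UTF-8: correct lengths, valid continuation
    // bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
    static bool is_valid_utf8(const std::vector<unsigned char>& u) {
        std::size_t i = 0;
        const std::size_t n = u.size();
        while (i < n) {
            const unsigned char b = u[i];
            std::size_t len;
            char32_t cp;
            char32_t min;  // smallest codepoint allowed for this length
            if (b < 0x80)                { len = 1; cp = b;        min = 0; }
            else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
            else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
            else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
            else return false;           // stray continuation or invalid lead byte
            if (len > n - i) return false;  // truncated sequence
            for (std::size_t k = 1; k < len; ++k) {
                if ((u[i + k] & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (u[i + k] & 0x3F);
            }
            if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                return false;
            i += len;
        }
        return true;
    }

    std::vector<unsigned char> units_;
};
```

With this shape the invariant can only be violated at construction time, where it is checked. Whether that check should be on by default is exactly the question raised for boost::fs.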