
On 9/27/07, Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
> I think as others have said, in practice a fixed-width encoding really gains you very little or nothing at all. Needing random access to code points is, I think, an extremely rare operation.
I know, but it'd be easy to put together a fixed-width encoded basic_string, and we could use that as a basis for building a code conversion framework, at least as a proof-of-concept. Of course, that assumes that we'd be using basic_string for fixed-width strings, which isn't necessarily the case.

> UCS-2 is bogus and should not be used at all. Conceivably UCS-4 is legitimate but in practice not likely to be used by anyone. Still, it is probably important to support it.
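To make the fixed-width idea concrete, here is a rough sketch of the kind of conversion I have in mind: decoding UTF-8 into a fixed-width UTF-32 basic_string. This is only an illustration, not a proposed interface. The name decode_utf8 is made up, and I'm assuming a compiler that provides char32_t and std::u32string (C++11 or later); without those, some other 32-bit character type with suitable traits would be needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any existing library): decode UTF-8
// into a fixed-width UTF-32 string, one element per code point.
// Rejects truncated sequences, bad continuation bytes, overlong forms,
// surrogates, and values above U+10FFFF.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i < len)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Smallest code point that legitimately needs `len` bytes.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Once the text is in this form, indexing by code point is just operator[] on the result, which is the one thing the fixed-width representation buys you.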
Are there any situations where UCS-2 is actually needed (deprecated libraries, for instance)? If not, then I agree that we can eliminate it.

> I don't think the issues of a mutable UTF-8/UTF-16 representation are very different from the issues of a mutable UTF-32 representation. In practice, in handling non-ASCII text, all searching and replacement will be in terms of substrings (likely single or sequences of grapheme clusters).
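For what it's worth, substring-based search and replacement on a UTF-8 string can be sketched without any per-character access at all. The helper below is hypothetical (replace_all is a made-up name) and assumes that all three arguments are already valid UTF-8. Because UTF-8 is self-synchronizing, a byte-wise match of a valid needle in valid text always falls on code point boundaries. It does nothing about normalization or grapheme-cluster boundaries, so the caller would still have to pick sensible substrings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to`,
// composing a new string rather than mutating `text` in place.
// Assumes all three arguments are valid UTF-8.
std::string replace_all(const std::string& text,
                        const std::string& from,
                        const std::string& to) {
    if (from.empty()) return text;  // nothing sensible to search for

    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t hit = text.find(from, pos);
        if (hit == std::string::npos) break;
        out.append(text, pos, hit - pos);  // copy the unmatched stretch
        out += to;                         // replacement may differ in byte length
        pos = hit + from.size();
    }
    out.append(text, pos, std::string::npos);  // copy the tail
    return out;
}
```

The replacement can be longer or shorter than the match in bytes, which is exactly the case that makes in-place mutation through single-element access awkward for variable-width encodings.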
I suppose it depends on how we allow UTF-8/UTF-16 strings to be modified. Direct (mutable) character access through operator[] would be bad, but substrings would be better. Depending on the situation, it may be better to use a stringstream to compose a new string from the old. I'd have to think about it some more.

- James