
On Wed, 16 Mar 2005 18:13:36 +0100, Erik Wien <wien@start.no> wrote:
Not entirely, but certainly less than optimal. basic_string (and the iostreams) make assumptions that don't necessarily apply to Unicode text. One of them is that strings can be represented as a sequence of equally sized characters. Unicode can be represented that way, but that would mean you'd have to use 32 bits per character to be able to represent all the code points assigned in the Unicode standard. In most cases, that is way too much overhead for a string, and usually also a waste, since Unicode code points rarely require more than 16 bits to be encoded. You could of course implement Unicode for 16-bit characters in basic_string, but that would require that the user know about things like surrogate pairs, and also know how to correctly handle them. An unlikely scenario.
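To make the surrogate pair point concrete, here is a minimal sketch of what a user of a 16-bit basic_string would have to get right by hand. The helper names append_utf16 and count_code_points are hypothetical and not taken from the library under discussion; std::u16string and char32_t stand in for a 16-bit basic_string and a code point type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-16 string.
// Code points above U+FFFF do not fit in one 16-bit unit and are split
// into a high surrogate followed by a low surrogate.
inline void append_utf16(std::u16string& out, char32_t cp) {
    // Precondition: a valid scalar value (in range, not itself a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, 10 per surrogate
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Hypothetical helper: count code points, assuming well-formed UTF-16.
// Every code point contributes exactly one unit that is not a low surrogate.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        if (u < 0xDC00 || u > 0xDFFF) ++n;  // skip low surrogates
    }
    return n;
}
```

With this, a string holding 'A' followed by U+1D11E has size() == 3 but only two code points, which is exactly the mismatch between basic_string's view and the character view.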
Looking at the code, it seems to duplicate a lot of what basic_string does. AFAIK, though I haven't looked that closely at Unicode, you have two ways of viewing the string: as a string of UTF-* elements(?), or as a string of characters. The former has the same properties as basic_string; the latter doesn't. It seems to me, then, that a possible design would be to make it a basic_string and provide special iterators etc. that view the string as characters. This would require the iterator to hold a reference to the basic_string in order to support assignment. Maybe it would require a whole wrapper class around basic_string to provide the required functionality.

Rakshasa
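
A rough sketch of the iterator idea, assuming UTF-16 storage in a std::u16string and well-formed input. The names code_point_iterator, cp_begin and cp_end are made up for illustration and are not part of the code under discussion. The sketch is read-only; it already holds a pointer to the string, which is what assignment would need, since replacing one character with another can change the number of 16-bit elements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical read-only iterator that walks a UTF-16 basic_string one
// code point at a time. Assumes well-formed UTF-16 (no unpaired surrogates).
class code_point_iterator {
public:
    code_point_iterator(const std::u16string& s, std::size_t pos)
        : str_(&s), pos_(pos) {}

    // Decode the code point starting at the current position.
    char32_t operator*() const {
        assert(pos_ < str_->size());
        char32_t hi = (*str_)[pos_];
        if (hi >= 0xD800 && hi <= 0xDBFF) {  // high surrogate: pair follows
            assert(pos_ + 1 < str_->size());
            char32_t lo = (*str_)[pos_ + 1];
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
        return hi;
    }

    // Advance by one code point, i.e. one or two 16-bit elements.
    code_point_iterator& operator++() {
        char16_t u = (*str_)[pos_];
        pos_ += (u >= 0xD800 && u <= 0xDBFF) ? 2 : 1;
        return *this;
    }

    bool operator==(const code_point_iterator& o) const {
        return str_ == o.str_ && pos_ == o.pos_;
    }
    bool operator!=(const code_point_iterator& o) const { return !(*this == o); }

private:
    const std::u16string* str_;  // the underlying basic_string being viewed
    std::size_t pos_;            // index of the current 16-bit element
};

inline code_point_iterator cp_begin(const std::u16string& s) {
    return code_point_iterator(s, 0);
}
inline code_point_iterator cp_end(const std::u16string& s) {
    return code_point_iterator(s, s.size());
}
```

The string itself keeps all the usual basic_string operations on elements, while a loop from cp_begin(s) to cp_end(s) sees characters. A writable version would probably need a proxy object or a wrapper class, as suggested above.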