Re: [boost] Strings tagged with their character set

27 Sep 2007

James Porter wrote:
> ...
> Actually, UTF-32 (equivalently UCS-4) *is* fixed-width (as of the
> Unicode 5.0.0 standard). Page 31 of the standard (chapter 2) says:
> "UTF-32 is the simplest Unicode encoding form. Each Unicode code point
> is represented directly by a single 32-bit code unit. Because of this,
> UTF-32 has a one-to-one relationship between encoded character and code
> unit; it is a fixed-width character encoding form."

UTF-32 is a fixed-width encoding of Unicode, but Unicode itself is a
"variable-width character set", what with combining characters.
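
To make the distinction concrete, here is a rough sketch (not a proposal for the library). It stores "é" in UTF-32 two ways: precomposed as U+00E9, and decomposed as "e" followed by U+0301 COMBINING ACUTE ACCENT. The names is_combining_mark and count_base_characters are made up for illustration, and the combining test is a deliberate simplification that only looks at the U+0300..U+036F block; real text segmentation is considerably more involved.

```cpp
#include <cstddef>
#include <string>

// Simplifying assumption: only U+0300..U+036F (the Combining Diacritical
// Marks block) is treated as combining. Unicode has many more combining
// characters than this, so this is illustrative only.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts code points that are not combining marks, as a crude stand-in
// for "characters as the user perceives them".
std::size_t count_base_characters(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) {
        if (!is_combining_mark(cp)) {
            ++n;
        }
    }
    return n;
}

// Two UTF-32 spellings of the same user-perceived character.
const std::u32string precomposed = U"\u00E9";   // one code point
const std::u32string decomposed  = U"e\u0301";  // base letter + combining mark
```

Here precomposed.size() is 1 and decomposed.size() is 2, while count_base_characters gives 1 for both. So one code unit per code point holds, but one code unit per perceived character does not.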

Whether this is the business of a core string layer in C++ is a
different question.

Sebastian Redl