
27 Sep 2007, 10:54 a.m.
James Porter wrote:
Actually, UTF-32 (equivalently UCS-4) *is* fixed-width (as of the Unicode 5.0.0 standard). Page 31 of the standard (chapter 2) says:
"UTF-32 is the simplest Unicode encoding form. Each Unicode code point is represented directly by a single 32-bit code unit. Because of this, UTF-32 has a one-to-one relationship between encoded character and code unit; it is a fixed-width character encoding form."
UTF-32 is a fixed-width encoding of Unicode, but Unicode itself is a "variable-width character set", what with combining characters: a single user-perceived character may span several code points. Whether this is the business of a core string layer in C++ is a different question.

Sebastian Redl
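
To make the distinction concrete, here is a minimal sketch, assuming a C++11 or later compiler (`char32_t` and `std::u32string` postdate this thread). The helper names `is_combining_diacritic`, `approx_visible_chars`, and `demo` are hypothetical, and the check covers only the Combining Diacritical Marks block (U+0300 to U+036F); real segmentation into user-perceived characters follows the Unicode grapheme cluster rules and normally needs a library such as ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true only for code points in the Combining
// Diacritical Marks block (U+0300..U+036F). This is a deliberate
// simplification; many other combining characters exist.
bool is_combining_diacritic(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical helper: rough count of user-perceived characters,
// obtained by skipping the combining marks recognized above.
std::size_t approx_visible_chars(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t c : s) {
        if (!is_combining_diacritic(c)) {
            ++n;
        }
    }
    return n;
}

// Both strings render as "é", yet they differ in code point count.
void demo() {
    const std::u32string precomposed = U"\u00E9";   // U+00E9, one code point
    const std::u32string decomposed  = U"e\u0301";  // 'e' + U+0301 combining acute

    // In UTF-32, size() counts code units, which equals code points.
    assert(precomposed.size() == 1);
    assert(decomposed.size() == 2);

    // Under the simplified rule, both are one visible character.
    assert(approx_visible_chars(precomposed) == 1);
    assert(approx_visible_chars(decomposed) == 1);
}
```

In other words, indexing a UTF-32 string by position gives constant-time access to code points, but not to what a reader would call characters.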