Re: [boost] [string] proposal

22 Jan 2011

On 01/21/2011 09:50 AM, Beman Dawes wrote:
...
... elision by patrick ....
IMO, any serious Unicode string proposal has to address UTF-8 strings,
UTF-16 strings, UTF-32 strings, and probably UTF strings where the
particular UTF encoding is established at runtime. Applications that
deal with Asian languages, do a lot of random access, or would pay a
performance or storage penalty will demand more than just UTF-8
strings. There might be other variants, too, such as a BMP-string. If
a Unicode string library provides a strong design framework that is
clearly articulated, then an initial implementation would only have to
provide the most needed types; UTF-8 and UTF-16/BMP.
I really doubt any proposal will get taken very seriously if it only
supports one of the UTF encodings.
+1, with the caveat that many consider UTF-8 and UTF-32 to be the most
needed types, with UTF-16 considered evil.  (Seems to be a
Windows/non-Windows split.  I like them all ;)  So all three (four if you
want to differentiate between fixed-width UTF-16/BMP, which is really
UCS-2, and the full UTF-16) would be needed to avoid people saying that
it doesn't fill their needs, so why did we bother.  A UTF string whose
encoding is established at run time would carry a lot of extra code.
Wouldn't a programmer know at compile time which encoding he wanted to
use internally?
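
To make the compile-time vs. run-time question concrete, here is a rough
sketch of the two shapes.  None of these names (utf8_tag, basic_ustring,
any_ustring, ...) come from Boost or from any proposal on this list; they
are made up for illustration, and the sketch assumes a compiler with
char16_t/char32_t.  It only counts code units and bytes, nothing more.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags: each one only names its code unit type.
struct utf8_tag  { typedef char     unit_type; };
struct utf16_tag { typedef char16_t unit_type; };
struct utf32_tag { typedef char32_t unit_type; };

// Encoding fixed at compile time: the tag selects the storage type,
// so no encoding check is needed when the string is used.
template <typename Encoding>
class basic_ustring {
public:
    typedef typename Encoding::unit_type unit_type;

    explicit basic_ustring(const std::basic_string<unit_type>& units)
        : units_(units) {}

    // Number of code units (not code points).
    std::size_t unit_count() const { return units_.size(); }

    // Storage used by the code units, in bytes.
    std::size_t byte_count() const {
        return units_.size() * sizeof(unit_type);
    }

private:
    std::basic_string<unit_type> units_;
};

// Encoding chosen at run time: every operation has to branch on the
// stored encoding, which is where the extra code comes from.
enum class encoding { utf8, utf16, utf32 };

class any_ustring {
public:
    any_ustring(encoding e, const std::string& bytes)
        : enc_(e), bytes_(bytes) {}

    // Number of code units, derived from the raw byte length.
    std::size_t unit_count() const {
        switch (enc_) {
            case encoding::utf8:  return bytes_.size();
            case encoding::utf16: return bytes_.size() / 2;
            case encoding::utf32: return bytes_.size() / 4;
        }
        return 0;
    }

private:
    encoding    enc_;
    std::string bytes_;  // raw bytes; their meaning depends on enc_
};
```

With something like this, basic_ustring<utf16_tag> and
basic_ustring<utf8_tag> are distinct types checked by the compiler, while
any_ustring pays for a switch in every member function.  The same two
CJK characters take 4 bytes as UTF-16 and 6 bytes as UTF-8, which is the
storage penalty Beman mentions.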

Patrick

p.s. Nice quick description of the differences between, and history of,
UCS-2, UCS-4, UTF-8, UTF-16 and UTF-32 at
http://en.wikipedia.org/wiki/Universal_Character_Set

Patrick Horgan