
15 May 2009, 2:12 p.m.
Scott McMurray wrote:
I really think UTF-8 should be the recommended one, since it forces people to remember that it's no longer one unit, one "character".
Even in Beman Dawes's talk (http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/07/...) where slide 11 mentions UTF-32 and remembers that UTF-16 can still take 2 encoding units per codepoint, slide 13 says that UTF-16 is "desired" where "random access critical".
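I don't plan on supporting random access for UTF-16. UTF-16 is still faster to decode than UTF-8, because UTF-8 decoding is more complex: a code point takes one to four code units in UTF-8, versus one or two in UTF-16. With only two cases (a single code unit or a surrogate pair), the UTF-16 branch is easy to optimize by treating the single-unit case as likely and the surrogate-pair case as unlikely.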
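
To make the comparison concrete, here is a rough sketch of the two decoders. The names decode_utf16, decode_utf8 and UNLIKELY are hypothetical, made up for this illustration and not taken from any library. It assumes well-formed input and does no validation, and the branch hint relies on the GCC/Clang __builtin_expect extension, falling back to a plain condition elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical branch-hint macro: uses the GCC/Clang builtin when available.
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

// Decode one code point from well-formed UTF-16 and advance i.
// Only two cases: a single code unit, or a surrogate pair.
inline char32_t decode_utf16(const char16_t* s, std::size_t& i) {
    std::uint32_t u = s[i++];
    if (UNLIKELY(u >= 0xD800 && u <= 0xDBFF)) {   // high surrogate: rare path
        std::uint32_t lo = s[i++];                // assumed to be a low surrogate
        return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }
    return u;                                     // common path: BMP code point
}

// Decode one code point from well-formed UTF-8 and advance i.
// Four cases, selected by the lead byte.
inline char32_t decode_utf8(const unsigned char* s, std::size_t& i) {
    std::uint32_t b = s[i++];
    if (b < 0x80) return b;                       // 1 byte: ASCII
    if (b < 0xE0) {                               // 2 bytes
        std::uint32_t c = (b & 0x1F) << 6;
        c |= (s[i++] & 0x3F);
        return c;
    }
    if (b < 0xF0) {                               // 3 bytes
        std::uint32_t c = (b & 0x0F) << 12;
        c |= (s[i++] & 0x3Fu) << 6;
        c |= (s[i++] & 0x3Fu);
        return c;
    }
    std::uint32_t c = (b & 0x07) << 18;           // 4 bytes
    c |= (s[i++] & 0x3Fu) << 12;
    c |= (s[i++] & 0x3Fu) << 6;
    c |= (s[i++] & 0x3Fu);
    return c;
}
```

Neither function gives random access: both walk the string one code point at a time. The difference is just that the UTF-16 loop has a single, rarely taken branch, while the UTF-8 loop has to pick among four paths for every code point outside ASCII.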