
On 25/04/2011 21:50, Ryou Ezoe wrote:
On Tue, Apr 26, 2011 at 3:55 AM, Artyom <artyomtnk@yahoo.com> wrote:
From: Ryou Ezoe <boostcpp@gmail.com>
Sorting by code point is not the best solution, but at least it is consistent as long as we use one encoding.
No, it is not. The same code points sort differently depending on which UCS encoding form you use:
UTF-8 and UTF-32 orders are consistent, i.e.
for all code points a, b: utf8(a) < utf8(b) iff utf32(a) < utf32(b)
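However, this does not hold for UTF-16, where code points outside the BMP sort differently: they are encoded with surrogates (0xD800-0xDFFF), which compare below BMP code units in the range 0xE000-0xFFFF. i.e.
It may be that codepoint(a) > codepoint(b) but UTF-16(a) sorts before UTF-16(b).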
What do you mean? No matter which UTF you use, the code point is the same. You can't compare UTF-8 strings by comparing each octet.
Actually, you can, and for efficiency you should do the comparison at the octet level.