
20 Oct
2004
1:09 p.m.
So unicode::string<unicode::codepoint_string<std::string> > would be a UTF8-encoded string that is manipulated using its characters.
Encoded characters or abstract characters? (See section 2.4 of the Unicode Standard for definitions.)
I mean a base character together with its combining characters. I don't think this is the same as an "abstract character", is it? My plan was to decompose all characters in unicode::string, which makes manipulating diacritics easier.

Correct me if I'm wrong, but your example of finding "ü" in a string would come down to finding the codepoint sequence "U+0075 U+0308" and checking that it is not followed by another combining character. Still pretty trivial.

Regards,
Rogier
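The search described above could look roughly like the sketch below. It makes several assumptions: it works on a plain std::u32string of already-decomposed codepoints rather than on the proposed unicode::string, it needs a C++11 compiler, and the names is_combining and find_character are hypothetical. The combining-character test is deliberately incomplete; a real implementation would look up character properties in the Unicode Character Database.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, deliberately incomplete: it only recognises the
// Combining Diacritical Marks block (U+0300..U+036F). A real implementation
// would consult the Unicode Character Database instead.
bool is_combining(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical search: find the first occurrence of `pattern` (a base
// character followed by its combining characters, already decomposed) in
// `text` such that the match is not followed by another combining character.
// Returns the codepoint index of the match, or std::u32string::npos.
std::size_t find_character(const std::u32string& text,
                           const std::u32string& pattern) {
    assert(!pattern.empty());
    std::size_t pos = text.find(pattern);
    while (pos != std::u32string::npos) {
        const std::size_t end = pos + pattern.size();
        // Accept the match only if it ends at the end of the text or is
        // followed by something that is not a combining character.
        if (end == text.size() || !is_combining(text[end])) {
            return pos;
        }
        // Otherwise the match is only a prefix of a longer character,
        // so keep searching from the next codepoint.
        pos = text.find(pattern, pos + 1);
    }
    return std::u32string::npos;
}
```

With the pattern U"u\u0308", a search in U"fu\u0308r" matches at index 1, while a search in U"u\u0308\u0301x" finds nothing, because there the diaeresis is followed by a combining acute accent and the sequence forms a different character.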