
20 Oct 2004, 5:41 p.m.
"Miro Jurisic" <macdev@meeroh.org> wrote in message news:macdev-
My plan was to decompose all characters in unicode::string. This makes manipulation of diacritics easier. Correct me if I'm wrong, but your example of finding "ü" in a string would come down to finding the codepoint sequence "U+0075 U+0308" and checking that it is not followed by another combining character, which is still pretty trivial.
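A minimal sketch of the search described above, using Python's `unicodedata` as a stand-in for the C++ `unicode::string` under discussion. The helper name `find_decomposed` is hypothetical. It decomposes both strings, finds "U+0075 U+0308", and rejects a match that is followed by a further combining mark (general category M). The returned index refers to the decomposed (NFD) string, not the original.

```python
import unicodedata


def find_decomposed(haystack: str, needle: str) -> int:
    """Hypothetical helper: index of the first whole-character match of
    needle in the NFD form of haystack, or -1 if there is none."""
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        # A following combining mark would attach to the matched character,
        # so the match would only be part of a larger grapheme.
        if end == len(h) or not unicodedata.category(h[end]).startswith("M"):
            return start
        start = h.find(n, start + 1)
    return -1
```

For example, `find_decomposed("M\u00fcller", "\u00fc")` finds "u\u0308" at index 1 of "Mu\u0308ller", while "u\u0308\u0301" (u, diaeresis, acute) yields -1 because an acute accent follows.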
You have to not only decompose them but also put them in canonical order for that to work.
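To illustrate why the order matters: "u + U+0308 (diaeresis) + U+0323 (dot below)" and "u + U+0323 + U+0308" are canonically equivalent but differ codepoint by codepoint. The sketch below assumes its input is already decomposed and applies only the canonical-ordering step, stably sorting each run of nonzero-combining-class characters by that class. `canonical_order` is a hypothetical name; in practice Python's `unicodedata.normalize("NFD", ...)` performs this step for you.

```python
import unicodedata


def canonical_order(s: str) -> str:
    """Hypothetical helper: stably sort each run of characters with a
    nonzero canonical combining class by that class. Assumes s is
    already fully decomposed."""
    out: list[str] = []
    run: list[str] = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)
```

Dot below has combining class 220 and diaeresis 230, so both orderings become "u\u0323\u0308". Because Python's `sorted` is stable, marks with equal classes (such as two accents above) keep their relative order, which is significant.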
You could also do a canonical composition after the decomposition (Normalization Form C, NFC). Either way, this is not something you would want to do on every assignment of a string, but only when it is needed (e.g. on comparison).
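A minimal sketch of normalizing only at comparison time, again using Python's `unicodedata` rather than any C++ API. The function name `canonically_equal` and its `form` parameter are hypothetical. Stored strings keep whatever form they were assigned in; both operands are normalized (NFC by default, or NFD) just before the comparison.

```python
import unicodedata


def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings for canonical equivalence
    by normalizing both only when the comparison happens."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

With this, precomposed "\u00fc" and decomposed "u\u0308" compare equal under either form, even though a plain `==` on the raw strings returns `False`. The trade-off is repeated normalization cost on every comparison; caching a normalized copy would be one way to amortize it.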