
Eric Niebler wrote:
Erik Wien wrote:
Ultimately, I feel that normalization (which involves canonical decomposition) of Unicode strings should be hidden from the user completely and performed automatically by the library wherever it is needed (for example, on a call to the == operator). I think that solution would be satisfactory for most users, as the normalization process is somewhat intricate and really not something users should be forced to understand.
Are we at all on the same page now?
No. "Normalization" doesn't always mean canonical decomposition. There are several canonical forms, some of which *require* the use of composite characters. In fact, the XML standard requires such a canonical form. A Unicode library cannot hide the issue of canonicalization from the user, because users will care which canonical form is being used.
Why? If I want to compare two strings, I don't really care which normalized form is used. - Volodya
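
To make the two positions concrete, here is a minimal sketch using Python's standard `unicodedata` module (not the library under discussion; the helper name `canonically_equal` is made up for illustration). It suggests that the result of an equality test does not depend on which canonical form is chosen, as long as both operands use the same one, while the stored code point sequence does depend on it, which is presumably what matters for output such as XML.

```python
import unicodedata


def canonically_equal(a: str, b: str, form: str = "NFD") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# The raw code point sequences differ, so a naive comparison reports inequality.
naive_equal = composed == decomposed

# Normalizing both sides to the same canonical form gives the same answer
# whether that form is decomposed (NFD) or composed (NFC).
equal_nfd = canonically_equal(composed, decomposed, "NFD")
equal_nfc = canonically_equal(composed, decomposed, "NFC")

# The representation itself is form-dependent: NFC composes, NFD decomposes.
nfc_length = len(unicodedata.normalize("NFC", decomposed))
nfd_length = len(unicodedata.normalize("NFD", composed))

print(naive_equal, equal_nfd, equal_nfc, nfc_length, nfd_length)
```

So for comparison alone the choice of form can stay an implementation detail, but anything that exposes or serializes the string makes the choice visible to the user.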