
Jeremy Maitin-Shepard <jbms@attbi.com> writes:
> What I am saying is that operations such as "convert to uppercase" on Unicode strings are locale-independent, and thus such operations need not and should not be part of the locale interface.

In which case you are wrong. The SpecialCasing.txt file from the Unicode data file set identifies locale-specific case conversions, such as:

    # Turkish and Azeri

    # I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
    # The following rules handle those cases.

    # Remove spurious dot above small i's when lowercasing, if there are no more
    # accents above:
    0307; ; 0307; 0307; tr AFTER_i NOT_MORE_ABOVE # COMBINING DOT ABOVE
    0307; ; 0307; 0307; az AFTER_i NOT_MORE_ABOVE # COMBINING DOT ABOVE

    # Fix case pairs
    0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
    0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
    0049; 0131; 0049; 0049; az; # LATIN CAPITAL LETTER I
    0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

In fact, as the sample shows, not only is case conversion locale-dependent, it is context-dependent too: the conversion of a character can depend on the preceding characters.

Anthony
-- 
Anthony Williams
Senior Software Engineer, Beran Instruments Ltd.
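
To make the "Fix case pairs" entries concrete, here is a minimal C++ sketch. Each data line has the fields code; lower; title; upper; condition, so in "tr" and "az" the capital I (0049) lowercases to dotless i (0131) and the small i (0069) uppercases to I-dot (0130). The function names and the `lang` parameter are hypothetical, invented for this illustration and not part of any library. The sketch handles only ASCII letters plus those two pairs, and it deliberately leaves out the context-sensitive 0307 rule.

```cpp
#include <string>

// Hypothetical helper: true for the two languages named in the excerpt.
bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

// Uppercase one code point: ASCII rules, plus 0069 -> 0130 for tr/az.
char32_t upper_sketch(char32_t c, const std::string& lang) {
    if (c == U'i' && is_turkic(lang)) return U'\u0130';  // LATIN CAPITAL LETTER I WITH DOT ABOVE
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - U'a' + U'A');
    return c;
}

// Lowercase one code point: ASCII rules, plus 0049 -> 0131 for tr/az.
char32_t lower_sketch(char32_t c, const std::string& lang) {
    if (c == U'I' && is_turkic(lang)) return U'\u0131';  // LATIN SMALL LETTER DOTLESS I
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c - U'A' + U'a');
    return c;
}

// String versions: a per-character mapping, so no context rules are applied.
std::u32string to_upper_sketch(const std::u32string& s, const std::string& lang) {
    std::u32string out;
    for (char32_t c : s) out.push_back(upper_sketch(c, lang));
    return out;
}

std::u32string to_lower_sketch(const std::u32string& s, const std::string& lang) {
    std::u32string out;
    for (char32_t c : s) out.push_back(lower_sketch(c, lang));
    return out;
}
```

With this sketch, to_upper_sketch(U"i", "tr") and to_upper_sketch(U"i", "en") return different strings for the same input, which is the point: the result cannot be computed without knowing the language. A real implementation would also need the conditional entries, which require looking at neighbouring characters rather than mapping one code point at a time.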