[boost] Re: Boost Unicode support ideas

14 Apr 2004

Jeremy Maitin-Shepard <jbms@attbi.com> writes:
...
> What I am saying is that operations such as "convert to uppercase" on
> Unicode strings are locale-independent, and thus such operations need not
> and should not be part of the locale interface.

In which case you are wrong. The SpecialCasing.txt file from the Unicode
Character Database identifies locale-specific case conversions, such as:

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

# Remove spurious dot above small i's when lowercasing, if there are no more
# accents above:

0307; ; 0307; 0307; tr AFTER_i NOT_MORE_ABOVE; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az AFTER_i NOT_MORE_ABOVE; # COMBINING DOT ABOVE

# Fix case pairs

0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

0049; 0131; 0049; 0049; az; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

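In fact, as the sample shows, case conversion is not only locale-dependent
but context-dependent too: the COMBINING DOT ABOVE rule applies only after
an i (AFTER_i) and only when no further accents above follow
(NOT_MORE_ABOVE), so the conversion of a character depends on its
neighbours.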
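
To make the locale dependence concrete, here is a minimal C++ sketch of the
"Fix case pairs" rows above. Everything in it is my own illustration, not an
existing or proposed Boost interface: the function names, the plain
language-tag string and the ASCII-only fallback are all assumptions. It also
deliberately leaves out the context-sensitive COMBINING DOT ABOVE rule, which
cannot be expressed as a one-code-point-at-a-time mapping.

```cpp
#include <cassert>
#include <string>

// Code points taken from the SpecialCasing.txt rows quoted above.
constexpr char32_t kCapitalI      = 0x0049; // LATIN CAPITAL LETTER I
constexpr char32_t kSmallI        = 0x0069; // LATIN SMALL LETTER I
constexpr char32_t kCapitalIDot   = 0x0130; // LATIN CAPITAL LETTER I WITH DOT ABOVE
constexpr char32_t kSmallDotlessI = 0x0131; // LATIN SMALL LETTER DOTLESS I

// Hypothetical helper: true for the two language tags in the quoted rows.
bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

// Lowercase one code point. Only the I/i pairs and ASCII are handled;
// a real implementation would consult the full Unicode tables.
char32_t lower_code_point(char32_t c, const std::string& lang) {
    if (c == kCapitalI) return is_turkic(lang) ? kSmallDotlessI : kSmallI;
    if (c == kCapitalIDot) return kSmallI; // simple mapping in UnicodeData.txt
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + (U'a' - U'A'));
    return c;
}

// Uppercase one code point, with the same restrictions as above.
char32_t upper_code_point(char32_t c, const std::string& lang) {
    if (c == kSmallI) return is_turkic(lang) ? kCapitalIDot : kCapitalI;
    if (c == kSmallDotlessI) return kCapitalI; // simple mapping in UnicodeData.txt
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - (U'a' - U'A'));
    return c;
}

std::u32string to_lower(const std::u32string& s, const std::string& lang) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) out.push_back(lower_code_point(c, lang));
    // One-to-one mappings keep the length; context-sensitive rules such as
    // the COMBINING DOT ABOVE removal would not, and are not handled here.
    assert(out.size() == s.size());
    return out;
}

std::u32string to_upper(const std::u32string& s, const std::string& lang) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) out.push_back(upper_code_point(c, lang));
    assert(out.size() == s.size());
    return out;
}
```

With this sketch, to_upper(U"istanbul", "tr") begins with U+0130, whereas
to_upper(U"istanbul", "en") begins with U+0049. The same input gives a
different answer depending on the language argument, so the operation cannot
be specified without locale information of some kind.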

Anthony
-- 
Anthony Williams
Senior Software Engineer, Beran Instruments Ltd.