Re: [boost] [regex] Boost.Regex and TR1 fundamentally broken?

24 Jun 2005

      ...
The most pressing point for level 1 support is section 1.5 Caseless
Matching: "Supported, note that at this level, case transformations are
1:1, many to many case folding operations are not supported (for example
"ß" to "SS")."
I forgot to mention: this is part of a larger digraph problem. In some
languages more than one character may collate as a single unit. In some
cases Unicode provides predefined ligatures for these, but it doesn't do
so for every case combination of every ligature.
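To make the 1:1 restriction concrete, here is a small sketch in Python. It was chosen only because its standard library exposes both behaviours; it is not Boost.Regex. Python's `re` module with `re.IGNORECASE` compares character by character. `str.upper()` and `str.casefold()`, by contrast, apply Unicode full case mapping and folding, where the single character "ß" becomes two characters:

```python
import re

word = "straße"

# Full case mapping / folding can change the length of the string:
# the single character U+00DF becomes two characters.
print(word.upper())      # STRASSE
print(word.casefold())   # strasse

# A per-character (1:1) caseless match cannot pair one "ß" with "SS",
# so this full match fails.
print(re.fullmatch("straße", "STRASSE", re.IGNORECASE))  # None
```

The failing `fullmatch` is exactly the "ß" to "SS" case the quoted conformance note says is unsupported.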

Boost.Regex supports things like [[.ae.]-[.ll.]] (match anything that
collates in the range "ae" to "ll"), and currently this should work
reasonably well in case-insensitive mode as well (it fails where a
many-to-one case transformation is required). Also, since there is no way
to tell which digraphs (if any) are supported by the current locale,
expressions such as [a-z] will only ever match one character, and never
match, say, "ae", even if the current locale does regard "ae" as a single
unit. I believe this is the only sensible option, particularly as in many
cases whether the next two characters are regarded as a digraph depends
upon the meaning of the word (which is to say you need a dictionary to work
it out, as Martin Bonner pointed out).
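A toy model may help show why the digraph question matters for bracket ranges. The sketch below is a minimal illustration under stated assumptions, not how Boost.Regex is implemented. Plain code-point order stands in for a locale's collation order. `DIGRAPHS`, `next_element` and `match_range` are hypothetical names. The engine consumes a two-character collating element only if that element appears in a known digraph set:

```python
# Toy model only: code-point order stands in for locale collation, and
# DIGRAPHS is a hypothetical set of multi-character collating elements.
# Real locales offer no portable way to enumerate such digraphs.
DIGRAPHS = {"ae", "ll"}


def next_element(text, pos, digraphs):
    """Return the collating element starting at pos: a known digraph if
    one begins here, otherwise a single character."""
    pair = text[pos:pos + 2]
    if len(pair) == 2 and pair in digraphs:
        return pair
    return text[pos]


def match_range(text, pos, lo, hi, digraphs):
    """Try to match the bracket range [lo-hi] at pos. Return the position
    just after the consumed element, or None if nothing in range is there."""
    if pos >= len(text):
        return None
    elem = next_element(text, pos, digraphs)
    if lo <= elem <= hi:
        return pos + len(elem)
    return None


# Like [[.ae.]-[.ll.]] with "ae" known as a unit: two characters consumed.
print(match_range("aeon", 0, "ae", "ll", DIGRAPHS))  # 2
# Like [a-z] when no digraphs are known: only one character consumed.
print(match_range("aeon", 0, "a", "z", set()))       # 1
```

The model picks a digraph whenever the two characters appear in the set. As noted above, real text can need word-level knowledge to decide that, which is why restricting [a-z] to single characters is the conservative choice.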

Re ICU: it appears to use case folding (converting everything to a
case-insensitive form) for caseless comparisons. I would assume their regex
component does the same, but I haven't had a chance to try it out.
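The post only guesses at what ICU's regex component does, so the following is not ICU code. It is a minimal Python sketch of the general fold-then-compare idea, with `str.casefold()` (Unicode full case folding) standing in for the case-insensitive form. `caseless_equal` and `caseless_contains` are hypothetical helper names:

```python
def caseless_equal(a, b):
    # Convert both strings to a case-insensitive (fully case-folded) form,
    # then compare the folded forms, instead of comparing char by char.
    return a.casefold() == b.casefold()


def caseless_contains(needle, haystack):
    # Substring search performed on the folded forms.
    return needle.casefold() in haystack.casefold()


print(caseless_equal("MASSE", "Maße"))                      # True
print(caseless_contains("STRASSE", "Die Straße ist lang"))  # True

# Caveat: folding can change string length, so offsets found in the folded
# text do not map directly back to positions in the original text.
print(len("Straße"), len("Straße".casefold()))              # 6 7
```

Folding handles the many-to-many cases that a 1:1 comparison cannot. The price, for a regex engine, is that match positions must be mapped back from the folded text to the original.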

John.


John Maddock