
Sebastian Redl wrote:
Eric Niebler wrote:
Since I wrote the above, I have fixed the performance problem with BMH and case-insensitive matches by extending the regex traits class with a function that returns all the case-folded equivalents of a character. This resulted in a significant performance improvement for case-insensitive matches.
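
To make the idea concrete, here is a minimal sketch of a case-insensitive Boyer-Moore-Horspool search that builds its skip table from the case equivalents of each pattern character. It is not Xpressive's actual code: case_equivalents() and bmh_find_icase() are hypothetical names, and the sketch assumes single-byte characters with one-to-one case mappings taken from <cctype>.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the traits function described above: returns
// every character that compares equal to `c` when case is ignored.
// Assumption: single-byte characters, one-to-one mappings from <cctype>.
std::vector<unsigned char> case_equivalents(unsigned char c) {
    std::vector<unsigned char> out{c};
    const unsigned char lo = static_cast<unsigned char>(std::tolower(c));
    const unsigned char up = static_cast<unsigned char>(std::toupper(c));
    if (lo != c) out.push_back(lo);
    if (up != c && up != lo) out.push_back(up);
    return out;
}

// Case-insensitive Boyer-Moore-Horspool search.
// Returns the index of the first match, or std::string::npos if none.
std::size_t bmh_find_icase(const std::string& text, const std::string& pat) {
    const std::size_t n = text.size();
    const std::size_t m = pat.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    // Default shift is the full pattern length. Each pattern character
    // except the last registers a shorter shift for all of its case
    // equivalents, so the text character needs no folding at lookup time.
    std::array<std::size_t, 256> skip;
    skip.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        for (unsigned char e : case_equivalents(static_cast<unsigned char>(pat[i]))) {
            skip[e] = m - 1 - i;
        }
    }

    auto fold = [](char ch) {
        return std::tolower(static_cast<unsigned char>(ch));
    };

    for (std::size_t pos = 0; pos + m <= n;) {
        // Compare the current window against the pattern, right to left.
        std::size_t j = m;
        while (j > 0 && fold(text[pos + j - 1]) == fold(pat[j - 1])) --j;
        if (j == 0) return pos;
        // Shift by the entry for the text character under the pattern's end.
        pos += skip[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

Registering every equivalent in the skip table up front keeps the shift lookup to a single array access per window. Because the sketch only knows one-to-one mappings, it cannot match one character against two.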
How does that work with multiple character case mappings, like the German ß -> SS (the sharp s does not exist in upper case)?
It doesn't. :-P Xpressive aims for "Basic Unicode Support," as defined by Unicode TR18 (http://www.unicode.org/reports/tr18/):

  Some caseless matches may match one character against two: for example, U+00DF "ß" matches the two characters "SS". And case matching may vary by locale. However, because many implementations are not set up to handle this, at Level 1 only simple case matches are necessary.

So correct handling of German ß -> SS is only necessary for "Extended Unicode Support," which would be nice but is a more distant goal. Sadly and AFAICT, TR1.Regex doesn't even make accommodation for Basic Unicode Support, since it doesn't provide syntax for character set subtraction and intersection.

In short, it's a problem, but there are bigger fish to fry. If you need a regex engine that can handle this today, try ICU.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com