
John Maddock wrote:
No. We're only talking about case folding -- specifically the mappings found in http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.
Well, maybe you are, but the regex traits classes were always intended to allow for other forms of equivalence as well.
OK
Here's my suggestion. We add to the traits class two functions:
bool in_range(Char from, Char to, Char ch);
What use is this one, or are you allowing equivalents other than case folding now <wink>? If so then I approve :-)
I dunno. I threw it in for completeness, but I don't think any implementation besides: return from <= ch && ch <= to; makes sense. You don't want to do any character translations or fancy equivalence stuff here. Consider what happens if translate(from) > translate(to).
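To make the translate(from) > translate(to) case concrete, here is a minimal sketch. It assumes an ASCII execution character set, and both functions are hypothetical free-function stand-ins for the proposed traits members (translate_nocase here is just tolower), not anything taken from TR1:

```cpp
#include <cassert>
#include <cctype>

// Hypothetical stand-in for the proposed traits member: a plain ordinal
// comparison with no translation of any kind.
template <class Char>
bool in_range(Char from, Char to, Char ch)
{
    return from <= ch && ch <= to;
}

// Hypothetical stand-in for translate_nocase: ASCII lower-casing only.
inline char translate_nocase(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```

With the range [Z-a], in_range('Z', 'a', '_') holds because 'Z' (90) <= '_' (95) <= 'a' (97). If the endpoints are translated first, from becomes 'z' (122) and to stays 'a' (97), so the range is inverted and nothing can fall inside it.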
bool in_range_nocase(Char from, Char to, Char ch);
OK, but see below.
We define the behavior of the regex engine in terms of these functions, but we don't require their use. In particular, for narrow character sets, implementers would be free to use a std::bitset<256>, enumerate the char range [from, to], call translate_nocase on each char, and set the appropriate bit in the bitset. Matching happens by calling translate_nocase on the input char and seeing if its bit is set in the bitset. That gives the same behavior.
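The bitset approach described above could look roughly like this. It is only a sketch: translate_nocase is again a hypothetical tolower-based stand-in for the traits member, and the helper names make_nocase_set and matches_nocase are invented for illustration:

```cpp
#include <bitset>
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical stand-in for the traits' translate_nocase ("C" locale tolower).
inline char translate_nocase(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Map a (possibly signed) char onto a bit position in [0, 255].
inline std::size_t bit_index(char c)
{
    return static_cast<unsigned char>(c);
}

// Compile time of the regex: enumerate [from, to], fold each char,
// and set the bit of the folded char.
inline std::bitset<256> make_nocase_set(char from, char to)
{
    std::bitset<256> bits;
    const int first = static_cast<unsigned char>(from);
    const int last = static_cast<unsigned char>(to);
    for (int c = first; c <= last; ++c)
        bits.set(bit_index(translate_nocase(static_cast<char>(c))));
    return bits;
}

// Match time: fold the input char and test its bit.
inline bool matches_nocase(const std::bitset<256>& bits, char ch)
{
    return bits.test(bit_index(translate_nocase(ch)));
}
```

For example, a set built from [a-f] accepts both 'c' and 'C' and rejects 'g' and 'G', without in_range_nocase ever being called at match time.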
I don't like traits class APIs that may or may not be called: what happens if a user-defined traits class is provided that alters the behavior of in_range, but not translate? The side effects produced by these APIs are clearly visible.
As I suggest above, I don't think in_range should depend on translate. Your point is still valid, though, but the optimization is too important to ignore. We could standardize a specialization of regex_traits<char> (like the specialization of char_traits<char>) for which the behavior is known. Or more generally, we could require that, for all regex traits for which 1==sizeof(char_type), in_range_nocase give the same results as the algorithm described above.
I agree Unicode support is clearly desirable: however, on a point of procedure, I believe it's too late to change this for TR1; changes for C++0x are clearly still possible though. Whatever, we need to file this as a DR.
Agreed. How does one file a DR? On comp.std.c++? Do you want to do the honors, or should I?
The most pressing point for level 1 support is section 1.5 Caseless Matching: "Supported, note that at this level, case transformations are 1:1, many to many case folding operations are not supported (for example "ß" to "SS"). "
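As an illustration of the 1:1 restriction quoted above, here is a sketch of a simple (one code point to one code point) fold. The function name simple_fold is hypothetical and the table covers only a handful of entries; a real one would be generated from the C and S rows of CaseFolding.txt:

```cpp
#include <cassert>

// Hypothetical 1:1 case fold over a tiny excerpt of the simple mappings.
// Code points without an entry fold to themselves.
inline char32_t simple_fold(char32_t c)
{
    if (c >= U'A' && c <= U'Z')
        return static_cast<char32_t>(c + 32); // ASCII upper to lower
    switch (c)
    {
    case 0x00C4: return 0x00E4; // A with diaeresis: upper to lower
    case 0x03A3: return 0x03C3; // capital sigma to small sigma
    case 0x03C2: return 0x03C3; // final sigma to small sigma
    default:     return c;
    }
}
```

Under this kind of mapping U+00DF ("ß") folds to itself, since its only folding to "ss" is a one-to-many (full) mapping, which Level 1 does not require.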
The way I read this, a 1:1 mapping is all that is needed for Level 1 support. So we don't have to worry about "ß" to "SS" unless we are shooting for Level 2 or 3, which IMO we should. But that's a radical change from TR1 regex. Let's fix what we got first.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com