
1) Do what ICU does and enumerate every character in the range [x-y] and convert it to its case-folded equivalent. The trouble is it's pathologically slow for very large ranges. Of course if you use a very large range you get exactly what you deserve!
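To make the cost of option 1 concrete, here is a minimal sketch of the enumerate-and-fold idea. It is not ICU's code: `simple_fold`, `fold_range` and `matches_nocase` are hypothetical names, and `simple_fold` only handles ASCII where a real implementation would call a Unicode-aware folding function such as ICU's `u_foldCase`. It assumes lo <= hi.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a real case-folding function; ASCII only.
inline char32_t simple_fold(char32_t c) {
    return (c >= U'A' && c <= U'Z')
        ? static_cast<char32_t>(c + (U'a' - U'A'))
        : c;
}

// Expand the range [lo-hi] into the set of folded code points.
// The loop visits every code point once, so the work grows
// linearly with the size of the range. Assumes lo <= hi.
inline std::set<char32_t> fold_range(char32_t lo, char32_t hi) {
    std::set<char32_t> folded;
    for (char32_t c = lo; ; ++c) {
        folded.insert(simple_fold(c));
        if (c == hi) break;  // test before incrementing, so hi itself is included
    }
    return folded;
}

// A character matches case-insensitively if its folded form is in the set.
inline bool matches_nocase(const std::set<char32_t>& folded, char32_t c) {
    return folded.count(simple_fold(c)) != 0;
}
```

With this sketch, [A-C] expands to three entries and matches both 'b' and 'B'. A range spanning most of Unicode would mean roughly a million iterations just to build the set, which is the slowness complained about above.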
Please, no.
:-)
4) Punt the decision to the traits type :-) For xpressive, I added an in_range_nocase(Char a, Char b) member to the traits concept. By default the traits provided by xpressive do *not* do proper case folding. They just use toupper and tolower, and are documented as such. An ambitious person can write their own trait to do proper Unicode case folding and get the right behavior.
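For illustration, a trait along these lines might look like the sketch below. This is not xpressive's actual traits class: `simple_nocase_traits` is a hypothetical name, and the third parameter (the character being tested) is an assumption about how the range check receives its input. It shows the "just use toupper and tolower" behaviour described above.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical traits type; only the range-checking members are shown.
struct simple_nocase_traits {
    typedef char char_type;

    // Plain, case-sensitive range test.
    static bool in_range(char_type first, char_type last, char_type ch) {
        return first <= ch && ch <= last;
    }

    // Case-insensitive range test: try the character as given, then its
    // upper-case and lower-case forms. No real case folding is attempted.
    static bool in_range_nocase(char_type first, char_type last, char_type ch) {
        // toupper/tolower require a value representable as unsigned char.
        unsigned char u = static_cast<unsigned char>(ch);
        return in_range(first, last, ch)
            || in_range(first, last, static_cast<char_type>(std::toupper(u)))
            || in_range(first, last, static_cast<char_type>(std::tolower(u)));
    }
};
```

Here `in_range_nocase('a', 'f', 'C')` succeeds because the lower-case form 'c' falls in the range, while 'G' is rejected. A Unicode-aware trait would swap the toupper/tolower calls for whatever folding it supports.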
Right, but the question is: is it actually *possible* to do proper Unicode case folding with this interface?
Inconsistent? Yes. Weird? No. The simple rule is: when you're about to start repeating a group, that capture and all captures within are first set to undefined. (ECMA-262 15.10.2.5.) Now that I look more closely, I see that ECMA is stricter about setting captures to undefined in these situations than Perl is, and xpressive is non-compliant in this area, too. <sigh>
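The rule can be seen in a toy model. The sketch below is a hand-rolled matcher for one hypothetical pattern, (?:(a)|(b))*, not code from any regex library; `capture`, `result` and `match_ab_star` are names made up for the example. The flag switches between clearing the captures before each iteration (the ECMA rule as described above) and leaving earlier captures in place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A capture is either undefined (matched == false) or holds some text.
struct capture {
    bool matched = false;
    std::string text;
};

struct result {
    capture a;  // group 1: (a)
    capture b;  // group 2: (b)
};

// Toy greedy matcher for the hypothetical pattern (?:(a)|(b))*,
// anchored at the start of the input.
inline result match_ab_star(const std::string& input, bool reset_each_iteration) {
    result r;
    for (std::size_t i = 0; i < input.size(); ++i) {
        char ch = input[i];
        if (ch != 'a' && ch != 'b') break;  // the repeat cannot continue

        // About to run the group again, so set every capture
        // inside it back to undefined first.
        if (reset_each_iteration) r = result();

        capture& hit = (ch == 'a') ? r.a : r.b;
        hit.matched = true;
        hit.text.assign(1, ch);
    }
    return r;
}
```

On the input "ab" with the reset enabled, group 1 ends up undefined and group 2 holds "b". With the reset disabled, group 1 still holds the "a" from the first iteration.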
Oh shucks, I see it now: incidentally, the behaviour described is consistent; doing what they do will work for all alternatives and repeats just as well (since an alternative must be within a repeat if its captures are going to need clearing). It's going to be a pain to implement though :-/ Thanks for the pointer, John.