
Also, I may have found another issue, closely related to the one under discussion. It regards case-insensitive matching of named character classes. The regex_traits<> provides two functions for working with named char classes: lookup_classname and isctype. To match a char class such as [[:alpha:]], you pass "alpha" to lookup_classname and get a bitmask. Later, you pass a char and the bitmask to isctype and get a bool yes/no answer.
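For reference, the two-step lookup described above can be sketched as follows. This is only an illustration: it uses std::regex_traits<char> as a stand-in for the traits class under discussion (an assumption on my part), and in_named_class is a made-up helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: resolve a class name to a bitmask, then test one char.
bool in_named_class(const std::string& name, char c)
{
    std::regex_traits<char> traits;

    // Step 1 (pattern compile time): "alpha" -> bitmask.
    std::regex_traits<char>::char_class_type mask =
        traits.lookup_classname(name.begin(), name.end());

    // Step 2 (match time): char + bitmask -> yes/no.
    return traits.isctype(c, mask);
}
```

In a real engine the mask would be computed once when the pattern is compiled and stored, so only the isctype call runs per character.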
But how does case-insensitivity work in this scenario? Suppose we're doing a case-insensitive match on [[:lower:]]. It should behave as if it were [[:lower:][:upper:]], right? But there doesn't seem to be enough smarts in the regex_traits interface to do this.
I've always thought that a case-insensitive match for [[:lower:]] was an abomination, frankly, but here's how I currently handle it: if the final bitmask contains all of the bits of the mask returned by lookup_classname("lower"), or all of the bits of the mask returned by lookup_classname("upper"), then I OR the mask with the result of lookup_classname("alpha").
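A rough sketch of that workaround, again using std::regex_traits<char> as a stand-in (an assumption; the real code lives inside the regex engine). The names lookup_class and widen_for_icase are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

typedef std::regex_traits<char> traits_type;
typedef traits_type::char_class_type mask_type;

// Hypothetical convenience wrapper around lookup_classname.
mask_type lookup_class(const traits_type& t, const std::string& name)
{
    return t.lookup_classname(name.begin(), name.end());
}

// If the mask fully contains the "lower" bits or the "upper" bits,
// OR in the "alpha" bits so that letters of either case are accepted.
mask_type widen_for_icase(const traits_type& t, mask_type m)
{
    const mask_type lower = lookup_class(t, "lower");
    const mask_type upper = lookup_class(t, "upper");

    if ((m & lower) == lower || (m & upper) == upper)
        m |= lookup_class(t, "alpha");

    return m;
}
```

Note that the engine can only do this for class names it already knows about ("lower", "upper"); it has no way to ask the traits class the same question about a class it has never heard of.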
Imagine I write a traits class which recognizes [[:fubar:]], and the "fubar" char class happens to be case-sensitive. How is the regex engine to know that? And how should it do a case-insensitive match of a character against the [[:fubar:]] char class? John, can you confirm this is a legitimate problem?
OK, user defined classes may be an issue (see below).
I see two options:
1) Add a bool icase parameter to lookup_classname. Then, lookup_classname( "upper", true ) will know to return lower|upper instead of just upper.
2) Add an isctype_nocase function.
I prefer (1) because the extra computation happens at the time the pattern is compiled rather than when it is executed.
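To make the compile-time versus match-time point concrete, here is a minimal sketch of option (1). It relies on the fact that std::regex_traits<char>::lookup_classname accepts a third bool icase argument; treating that as equivalent to the change proposed here is my assumption. compiled_class is a hypothetical stand-in for whatever node the engine builds for [[:name:]].

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for a compiled [[:name:]] node.
struct compiled_class
{
    std::regex_traits<char> traits;
    std::regex_traits<char>::char_class_type mask;

    // Pattern compile time: the traits class decides, once, what the
    // case-insensitive version of the named class is.
    compiled_class(const std::string& name, bool icase)
        : traits()
        , mask(traits.lookup_classname(name.begin(), name.end(), icase))
    {
    }

    // Match time: one plain isctype call, no case handling needed.
    bool matches(char c) const
    {
        return traits.isctype(c, mask);
    }
};
```

With this shape, compiled_class("upper", true) accepts both 'A' and 'a', while compiled_class("upper", false) accepts only 'A', and a user-defined traits class is free to answer differently for its own class names.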
If we're going to change this, then (1) is definitely preferable; it's quite a small change after all. In fact, I suspect this may be a real bug in the current Boost.Regex Unicode support: a case-insensitive match of [[:Ll:]] will only match lower-case letters. Although, frankly, which of the other L* categories it should match is an open question: should it match Lo or Lm, for example? Head swimmingly yours, John.