Interesting! Can you confirm that re2c is not handling backreferences? That is, after a match, is there a way to access what the Nth group matched? Also, do you think you could send around the code that re2c is generating for this expression?
This regex newbie doesn't know what it means to "back-reference." My usage so far has been recognizing one type of expression at a time, such as ((Sunday|Sun)|(Monday|Mon) ...etc.(Saturday|Sat)), or a ZipCode written as ##### or #####-####. It seems to get partway through the longer possibility and then "settle for" the shorter abbreviation. That probably isn't back-referencing. ZipCode 00000-000 is "almost" #####-####, but isn't, so it recognizes #####. MondX is "almost" Monday, but isn't, so it recognizes Mon, which is also group=2. It is relatively straightforward to get the result of a single match and do what you want with it ... I don't have experience with untangling something more complicated, like a multi-piece date/time: MMM DDD, YYYY hh:mm:ss [ap].m
I'm guessing that re2c is generating a DFA. Boost.Regex and Xpressive generate NFAs because DFAs aren't suited to doing backreferences[*]. I've considered adding DFA support to Xpressive and using DFAs for those regexes that don't need the full power of NFAs. Clearly, the performance win would be worth the trouble. This would not be a trivial undertaking, however.
[*] Technically, DFAs only have a problem with patterns such as "(.)\\1"; that is, when the result of the backreference is used within the pattern itself.
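To make the distinction in the footnote concrete, here is a minimal sketch using std::regex (chosen only so the example is self-contained; it is not Boost.Regex, Xpressive, or re2c). The helper names nth_group and has_doubled_char are made up for illustration. The first function reads what the Nth group captured after the match has finished; the second uses a backreference inside the pattern itself, which is the "(.)\\1" case.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns what group `n` captured when `pattern`
// is found somewhere in `text`, or an empty string if there is no match.
// This is plain capturing: the group is only read AFTER the match is done.
std::string nth_group(const std::string& pattern,
                      const std::string& text,
                      std::size_t n) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re) || n >= m.size()) {
        return "";
    }
    return m[n].str();
}

// Hypothetical helper: true when `text` contains the same character twice
// in a row. The \1 refers back to group 1 WHILE matching, which is the
// kind of pattern the footnote says a DFA has a problem with.
bool has_doubled_char(const std::string& text) {
    static const std::regex re(R"((.)\1)");
    return std::regex_search(text, re);
}
```

Calling nth_group with the pattern (\d{5})-(\d{4}), the text "ZipCode 12345-6789", and n = 2 should hand back the four-digit part, while has_doubled_char("hello") should be true because of the "ll".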
Re2c generates VERY gnarly code, full of gotos and labels ... but the compiler is happy and the optimizer seems to straighten everything out into fast object code. Re2c is apparently part of PHP.

    yy4:
        yych = *++YYCURSOR;
        goto yy3;
    yy5:
        yych = *++YYCURSOR;
        if(yych <= '/') goto yy6;
        if(yych <= '9') goto yy7;
    yy6:
        YYCURSOR = YYMARKER;
        switch(yyaccept){
        case 1: goto yy10;
        case 0: goto yy3;
        }
    yy7:
        yych = *++YYCURSOR;
        if(yych <= '/') goto yy6;
        if(yych >= ':') goto yy6;
        yych = *++YYCURSOR;
        if(yych <= '/') goto yy6;
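For readers who find the generated code hard to follow, the sketch below is a hand-written approximation of the same idea. It is not re2c output. It assumes the ZipCode pattern mentioned earlier in the thread (##### or #####-####) and uses a made-up function name, match_zip_prefix. The variable marker plays roughly the role that YYMARKER appears to play in the excerpt: it remembers the end of the last accepted match, so that a failed attempt at the longer form can fall back to the shorter one.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written sketch, NOT re2c output. Returns the length of the longest
// prefix of `s` that matches  #####  or  #####-####, or 0 if neither does.
std::size_t match_zip_prefix(const std::string& s) {
    std::size_t cursor = 0;  // current read position
    std::size_t marker = 0;  // end of the last accepted (shorter) match

    auto is_digit_at = [&s](std::size_t i) {
        return i < s.size() &&
               std::isdigit(static_cast<unsigned char>(s[i])) != 0;
    };

    // Five digits are mandatory; failing here means no match at all.
    for (int i = 0; i < 5; ++i) {
        if (!is_digit_at(cursor)) return 0;
        ++cursor;
    }
    marker = cursor;  // "#####" is already an accepting state; remember it

    // Try to extend the match to "#####-####".
    if (cursor >= s.size() || s[cursor] != '-') return marker;
    ++cursor;
    for (int i = 0; i < 4; ++i) {
        if (!is_digit_at(cursor)) return marker;  // settle for the short form
        ++cursor;
    }
    return cursor;  // the longer form matched completely
}
```

With this sketch, the input 00000-000 gets as far as the third digit after the hyphen, runs out of input, and returns to the marker, which is why only the five-digit form is recognized.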