regexp: New problem with trailing - in character class
Hi,

I've been using Boost regexp library forever, and it's great. But I just grabbed 1.33.0 (trying to get it to compile on 64-bit Windows), and I'm getting an error I never got before: it refuses to allow this regular expression:

    [a-z-]

which is supposed to mean "a lower case letter or a dash". Previous versions of Boost regexp allowed this, and grep allows it, and version 1.33.0 does allow this one:

    [-a-z]

Is this a deliberate restriction? It's causing me lots of problems, because my application has a lot of existing regular expressions which use this syntax.

This occurs both on x64 with VC8, and on a pretty standard (and old) 32-bit Linux system.

I've hacked around it horribly for now by adding this to the top of regcompA (I use the A posix interface exclusively):

    // HACK by GMF to fix problem that [a-z-] is not accepted as valid, but [-a-z] is.
    // Fix it by moving the dash.
    char *p = (char*) ptr;
    while (*p) {
      printf("p: %c\n", *p);
      if (*p == '\\')
        p++;
      else if (*p == '[') {
        char *q = p+1;
        while (*q && (*q != ']')) {
          printf("*q: %c\n", *q);
          q++;
        }
        if (*q == ']') {
          q--;
          if (*q == '-') {
            memmove(p+2, p+1, (q-p)-1);
            *(p+1) = '-';
            p = q;
          }
        }
      }
      p++;
    }

but that's very nasty, and probably doesn't work properly anyway, and I'd sure like to get that out of my production code.

Help!

Greg
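For anyone who needs a similar stopgap, the dash-moving idea above can be sketched as a standalone function that works on a copy of the pattern instead of patching it in place. This is only a sketch under stated assumptions: `move_trailing_dash` is a hypothetical name, not part of Boost, and it assumes POSIX rules inside brackets (backslash is an ordinary character there, and a leading `]` is a literal). It deliberately leaves a set alone whenever moving the dash might change its meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Regex. Returns a copy of `pattern`
// in which a bracket expression ending in a literal '-' has that dash moved
// to the front: "[a-z-]" becomes "[-a-z]".
// Assumption: POSIX bracket rules (backslash is ordinary inside a set).
std::string move_trailing_dash(const std::string& pattern)
{
    std::string out = pattern;
    std::size_t i = 0;
    while (i < out.size()) {
        if (out[i] == '\\') { i += 2; continue; }   // escaped char outside a set
        if (out[i] != '[')  { ++i;    continue; }

        std::size_t first = i + 1;                  // index of the first set member
        if (first < out.size() && out[first] == '^') ++first;   // keep negation first
        const bool leading_bracket = first < out.size() && out[first] == ']';

        // Find the ']' that closes the set, skipping [:class:], [.coll.], [=equiv=].
        std::size_t close = leading_bracket ? first + 1 : first;
        while (close < out.size() && out[close] != ']') {
            if (out[close] == '[' && close + 1 < out.size() &&
                (out[close + 1] == ':' || out[close + 1] == '.' || out[close + 1] == '=')) {
                std::string terminator(1, out[close + 1]);
                terminator += ']';
                const std::size_t end = out.find(terminator, close + 2);
                if (end == std::string::npos) return out;   // malformed: stop rewriting
                close = end + 2;
            } else {
                ++close;
            }
        }
        if (close >= out.size()) return out;        // unterminated set: stop rewriting

        const std::size_t dash = close - 1;
        const bool movable =
            dash > first &&                // there are members before the dash
            out[dash] == '-' &&
            out[dash - 1] != '-' &&        // "[+--]" is a range that ends in '-'
            out[dash - 1] != '\\' &&       // could be an escaped dash in Perl syntax
            !leading_bracket;              // "[]a-]" must keep ']' first
        if (movable) {
            out.erase(dash, 1);
            out.insert(first, 1, '-');
            assert(out.size() == pattern.size());   // the rewrite only reorders
        }
        i = close + 1;                     // continue after the closing ']'
    }
    return out;
}
```

Calling `move_trailing_dash("[a-z-]")` yields `[-a-z]`, while patterns such as `[-a-z]` or `[+--]` come back unchanged. Returning a new string avoids casting away constness on the caller's pattern, which the in-place version has to do.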
I've been using Boost regexp library forever, and it's great. But I just grabbed 1.33.0 (trying to get it to compile on 64-bit Windows), and I'm getting an error I never got before: it refuses to allow this regular expression:
[a-z-]
which is supposed to mean "a lower case letter or a dash". Previous versions of Boost regexp allowed this, and grep allows it, and version 1.33.0 does allow this one:
[-a-z]
Is this a deliberate restriction? It's causing me lots of problems, because my application has a lot of existing regular expressions which use this syntax.
Confirmed as a bug, here's the patch going into cvs for 1.33.1:

    Index: boost/regex/v4/basic_regex_parser.hpp
    ===================================================================
    RCS file: /cvsroot/boost/boost/boost/regex/v4/basic_regex_parser.hpp,v
    retrieving revision 1.9.2.4
    diff -u -r1.9.2.4 basic_regex_parser.hpp
    --- boost/regex/v4/basic_regex_parser.hpp 16 Oct 2005 18:12:58 -0000 1.9.2.4
    +++ boost/regex/v4/basic_regex_parser.hpp 31 Oct 2005 11:07:09 -0000
    @@ -1220,6 +1220,17 @@
              char_set.add_range(start_range, end_range);
              if(this->m_traits.syntax_type(*m_position) == regex_constants::syntax_dash)
              {
    +            if(m_end == ++m_position)
    +            {
    +               fail(regex_constants::error_brack, m_position - m_base);
    +               return;
    +            }
    +            if(this->m_traits.syntax_type(*m_position) == regex_constants::syntax_close_set)
    +            {
    +               // trailing - :
    +               --m_position;
    +               return;
    +            }
                 fail(regex_constants::error_range, m_position - m_base);
                 return;
              }

John.
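A small regression check can help confirm that both spellings of the set are accepted and behave the same. The sketch below is an assumption-laden illustration: it uses the system's POSIX `<regex.h>` (the API that grep-style tools build on), not Boost's `regcompA`, and `posix_matches` and `check_trailing_dash` are hypothetical names. The same check could presumably be pointed at Boost's POSIX-compatible interface to verify the fix in 1.33.1.

```cpp
#include <cassert>
#include <regex.h>   // system POSIX regex API (assumption: not Boost's wrapper)

// Hypothetical helper: true when `pattern` compiles as a POSIX extended
// regular expression and matches somewhere in `text`.
bool posix_matches(const char* pattern, const char* text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                          // pattern was rejected
    const int rc = regexec(&re, text, 0, nullptr, 0);
    regfree(&re);
    return rc == 0;
}

// Hypothetical regression check: a dash at the end of a set and a dash at
// the start of a set should both mean "a lower case letter or a dash".
void check_trailing_dash()
{
    assert(posix_matches("^[a-z-]+$", "foo-bar"));
    assert(posix_matches("^[-a-z]+$", "foo-bar"));
    assert(!posix_matches("^[a-z-]+$", "foo_bar"));
}
```

The patterns are anchored with `^` and `$` so that a match means every character of the input belongs to the set.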
participants (2)
-
Greg Ferrar
-
John Maddock