Re: [Boost-Users] Regex++ newbie problems

20 Mar 2003

      ...
I've just started using Regex++ (from boost 1.29.0) and I'm experiencing
some strange behaviour that doesn't seem to be mentioned in the FAQ.
Firstly, I found that [-A-Za-z]+ unexpectedly matched spaces and
punctuation characters, rather than only alphabetic characters and
hyphens as desired.
Reading the documentation, I altered this to [-:alpha:] and
[-:upper::lower] with no effect, so I decided to experiment with adding
^[:space:].
When I finally reached the expression below, I got a core dump where the
expression was declared. The intention of this expression was to strip
and keep the leading and trailing punctuation and spaces, as well as to
extract a word from the middle.
static const boost::regex
Word_expression("([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)");
Is it right that 'bad' expressions should coredump?
boost::regex will throw an exception if you pass it an invalid
expression; you need to catch it, otherwise, yes, your program will core
dump.
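
A minimal sketch of catching the error, assuming a C++11 compiler. It
uses std::regex rather than boost::regex so that it builds without
Boost; std::regex was modelled on Boost.Regex and likewise reports an
invalid pattern by throwing from its constructor, in this case
std::regex_error. The helper name try_compile is hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles, false if the
// regex constructor rejects it. Catching std::regex_error here stops an
// invalid expression from escaping as an uncaught exception, which would
// end in std::terminate (the "core dump" from the original post).
bool try_compile(const std::string& pattern)
{
    try {
        std::regex re(pattern);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error& e) {
        std::cerr << "invalid expression \"" << pattern << "\": "
                  << e.what() << '\n';
        return false;
    }
}
```

With the pattern from the original post, try_compile() prints a
diagnostic and returns false instead of letting the exception reach
std::terminate, while the bracket-corrected form compiles. The exact
e.what() text differs between standard library implementations.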

It's an invalid expression because:

[:punct::space:]* should be [[:punct:][:space:]]*

and

[-:upper::lower:^[:punct::space:]]: you can't nest character classes
like that (in any regular expression language that I know of).
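
A sketch of the corrected expression, again using std::regex in place of
boost::regex. Each POSIX class is written as [:name:] inside an
enclosing bracket expression, and the middle group simply lists the
characters wanted ([-[:upper:][:lower:]]) instead of nesting an
exclusion. The WordParts struct and split_word() function are
hypothetical names for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type holding the three captured pieces.
struct WordParts {
    std::string leading;   // punctuation/space before the word
    std::string word;      // letters and hyphens
    std::string trailing;  // punctuation/space after the word
};

// Applies the corrected expression to the whole of `token`. Returns false
// if `token` is not (punct/space)* word (punct/space)*.
bool split_word(const std::string& token, WordParts& out)
{
    // [[:punct:][:space:]] : two classes inside ONE outer bracket.
    // [-[:upper:][:lower:]] : a leading '-' is taken literally.
    static const std::regex word_expression(
        "([[:punct:][:space:]]*)([-[:upper:][:lower:]]+)([[:punct:][:space:]]*)");
    std::smatch m;
    if (!std::regex_match(token, m, word_expression))
        return false;
    out.leading = m.str(1);
    out.word = m.str(2);
    out.trailing = m.str(3);
    return true;
}
```

For "  (well-known)! " this yields "  (", "well-known" and ")! ", and
"two words" is rejected. Because a hyphen is in both [:punct:] and the
middle group, greedy matching keeps a trailing hyphen (as in "foo-") in
the word group.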
...
And if so, in what way is the above expression bad?
(As an aside, maybe we could catch bad expressions better by replacing
regex strings with overloaded operators, the way streams have superseded
printf.)
I found I still get rogue matches on punctuation and spaces when I use
the manually expanded form below:
You are using the member first of boost::match_results as a
null-terminated string. It is *not* a copy of the matched string, nor a
null-terminated string: it is an iterator into your text. Either use the
sequence [first, second), or call match_results::str() to get a
std::string object.
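
A sketch of the difference, assuming std::regex and std::cmatch in place
of their Boost counterparts (both function names are hypothetical).
Searching "  hello, world" for [-A-Za-z]+ matches "hello", but
m[0].first points at the 'h' inside the original buffer, so reading it
as a C string runs on to the terminating '\0'. That is one way to end up
with apparently rogue matches on punctuation and spaces.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The mistake: m[0].first is an iterator (here a const char*) into the
// searched text, so treating it as a C string reads past the end of the
// match, all the way to the text's terminating '\0'.
std::string wrong_match_text(const char* text, const std::regex& re)
{
    std::cmatch m;
    if (!std::regex_search(text, m, re))
        return std::string();
    return std::string(m[0].first);  // bug kept deliberately for contrast
}

// The fix: copy the half-open range [first, second), or let the
// match_results object make the copy with str().
std::string right_match_text(const char* text, const std::regex& re)
{
    std::cmatch m;
    if (!std::regex_search(text, m, re))
        return std::string();
    std::string from_range(m[0].first, m[0].second);
    assert(from_range == m.str(0));  // both forms give the same copy
    return m.str(0);
}
```

With re constructed from "[-A-Za-z]+", wrong_match_text("  hello, world",
re) returns "hello, world", while right_match_text() returns "hello".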

John.

John Maddock