I've just started using Regex++ (from boost 1.29.0) and I'm experiencing some strangeness that doesn't seem to be mentioned in the FAQ.
Firstly, I found that [-A-Za-z]+ unexpectedly matched spaces and punctuation characters rather than only alphabetic characters and hyphens as desired. Reading the documentation, I altered this to [-:alpha:] & [-:upper::lower] with no effect, so I decided to experiment with adding ^[:space:]. When I finally reached the expression below, I got a coredump at the point where the expression was declared. The intention of this expression was to strip and keep leading and trailing punctuation and spaces, as well as to extract a word from the middle.
static const boost::regex
Word_expression("([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)");
Is it right that 'bad' expressions should coredump?
boost::regex will throw an exception if you pass it an invalid expression; you need to catch it, or else yes, your program will core dump. It's an invalid expression for two reasons: [:punct::space:]* should be [[:punct:][:space:]]*, and in [-:upper::lower:^[:punct::space:]] you can't nest character classes like that (in any regular expression language that I know of).
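To illustrate the two points above, here is a minimal sketch of catching the exception and of the corrected bracket syntax. It makes some assumptions: it uses std::regex from the C++11 standard library as a stand-in for boost::regex so that it builds without Boost (with Boost you would catch whichever exception type your version documents for bad patterns), and the names compiles, WordParts and split_word are made up for this example. The word group simply reuses the [-A-Za-z]+ class from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles, false if the
// regex constructor throws. Catching the exception here is what keeps an
// invalid pattern from terminating the program.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re;  // only the construction matters
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical result type: the three pieces the original expression
// was meant to capture.
struct WordParts {
    bool matched;
    std::string lead;   // leading punctuation and spaces
    std::string word;   // letters and hyphens
    std::string trail;  // trailing punctuation and spaces
};

WordParts split_word(const std::string& text) {
    // Function-local static: constructed on the first call, so a throw
    // could be caught by the caller instead of escaping before main().
    // Each POSIX class gets its own [: :] pair inside one outer bracket.
    static const std::regex word_expression(
        "([[:punct:][:space:]]*)([-A-Za-z]+)([[:punct:][:space:]]*)");

    WordParts parts{false, "", "", ""};
    std::smatch what;
    if (std::regex_match(text, what, word_expression)) {
        parts.matched = true;
        parts.lead = what[1].str();
        parts.word = what[2].str();
        parts.trail = what[3].str();
    }
    return parts;
}
```

Calling split_word(" (hello), ") should give " (" as the lead, "hello" as the word and "), " as the trail, while compiles("(unclosed") reports false instead of aborting. Note that a hyphen also counts as punctuation, so hyphens at the very start of the text end up in the lead group rather than in the word.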
And if so, in what way is the above expression bad? (As an aside, maybe we could catch bad ones better by replacing regex strings with overloaded operators, the way streams have superseded printf.)
I found I still get rogue matches on punctuation and spaces when I use the manually expanded form below:
You are using the member first of boost::match_results as a null-terminated string. It is *not* a copy of the matched string, and it is not null terminated; it is an iterator into your text. Either use the sequence from first to second, or call match_results::str() to get a std::string object.
John.
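A small sketch of the difference may help. As an assumption for illustration it again uses std::regex rather than boost::regex, whose sub-matches expose the same first and second iterators; the function names matched_word and tail_from_first are hypothetical.

```cpp
#include <regex>
#include <string>

// Correct: copy only the characters between first and second.
// This is equivalent to calling str() on the sub-match.
std::string matched_word(const char* text) {
    static const std::regex word("[-A-Za-z]+");
    std::cmatch what;
    if (!std::regex_search(text, what, word)) {
        return std::string();
    }
    return std::string(what[0].first, what[0].second);
}

// The mistake described above: `first` is a pointer into the searched
// text, so reading it as a C string runs on to the end of that text.
std::string tail_from_first(const char* text) {
    static const std::regex word("[-A-Za-z]+");
    std::cmatch what;
    if (!std::regex_search(text, what, word)) {
        return std::string();
    }
    return std::string(what[0].first);  // no end bound: keeps going to '\0'
}
```

For the input "  hello, world", matched_word returns "hello" while tail_from_first returns "hello, world", which is how punctuation and spaces can appear to be part of a match even when the expression itself is fine.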