New subject: [Boost-Users] Regex++ newbie problems

18 Mar 2003

Hi,

I've just started using Regex++ (from Boost 1.29.0) and I'm seeing some
strange behaviour that doesn't seem to be mentioned in the FAQ.

Firstly, I found that [-A-Za-z]+ unexpectedly matched spaces and punctuation
characters, rather than only plain alphabetic characters and hyphens as
desired. After reading the documentation I changed this to [-:alpha:] and
then [-:upper::lower], with no effect. So I decided to experiment with adding
^[:space:]. When I finally reached the expression below, I got a coredump at
the point where the expression was declared. The intention of this expression
was to strip and keep leading and trailing punctuation and spaces, as well as
to extract a word from the middle.

static const boost::regex Word_expression(
    "([:punct::space:]*)"
    "([-:upper::lower:^[:punct::space:]]+)"
    "([:punct::space:]*)");

Is it right that 'bad' expressions should coredump?
And if so, in what way is the above expression bad?
(As an aside, maybe we could catch bad ones better by replacing regex strings
with overloaded operators, the way streams have superseded printf.)
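
For reference, a minimal sketch of how POSIX class names are normally written: the name keeps its own brackets and sits inside an outer bracket expression, e.g. [[:punct:][:space:]]. Written as [:punct::space:], the text is likely to be read as a plain set of literal characters (':', 'p', 'u', ...). The sketch uses std::regex from the standard library as a stand-in for Boost.Regex, which is an assumption made so that it builds without Boost; the helper names compiles and is_punct_or_space are hypothetical. It also shows that a malformed pattern is reported by an exception (std::regex_error here), which ends the program if nothing catches it. Whether Boost 1.29.0 behaves the same way should be checked against its documentation.

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles. With std::regex a malformed pattern
// throws std::regex_error; catching it avoids an abnormal termination.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// True if all of `text` is punctuation and/or whitespace (or empty).
bool is_punct_or_space(const std::string& text) {
    // Each class name carries its own [: :] and sits inside outer brackets.
    static const std::regex re("[[:punct:][:space:]]*");
    return std::regex_match(text, re);
}
```

Under these assumptions, compiles("([a-z]+") returns false because of the unbalanced parenthesis, and is_punct_or_space(",; ") returns true.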

I found I still get rogue matches on punctuation and spaces when I use the
manually expanded form below:

([-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]+)

What is going wrong?
			Regards,
				  Bruce A.

Full source attached.
E.g. try running: ReadWord ",; token-alpha; ,"
     I would like the matches to be:
             what[1] -> ",; "
             what[2] -> "token-alpha"
             what[3] -> "; ,"
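
A sketch of one way to obtain exactly these three pieces, again using std::regex as a stand-in for Boost.Regex (an assumption) and a hypothetical helper named split_word. Two points matter. The class names are written in the nested form [[:punct:][:space:]]. And each sub-match is copied with str(), which covers only the range from first to second. Streaming the first pointer of a C-string match instead prints everything up to the end of the input, which can look like rogue trailing punctuation and spaces.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Splits `text` into leading punctuation/space, a word made of letters and
// hyphens, and trailing punctuation/space. Returns an empty vector when the
// text contains no such word.
std::vector<std::string> split_word(const std::string& text) {
    static const std::regex re(
        "([[:punct:][:space:]]*)"   // group 1: leading punctuation/space
        "([-[:alpha:]]+)"           // group 2: the word itself
        "([[:punct:][:space:]]*)"); // group 3: trailing punctuation/space
    std::smatch what;
    std::vector<std::string> parts;
    if (std::regex_search(text, what, re)) {
        for (std::size_t i = 1; i < what.size(); ++i) {
            // str() copies only the characters of this sub-match.
            parts.push_back(what[i].str());
        }
    }
    return parts;
}
```

Called as split_word(",; token-alpha; ,"), the sketch yields the three strings listed above.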

----cut here----

#include <iostream>
#include <fstream>
#include <string>
#include "boost/regex.hpp"

int main(int argc,const char* const argv[]) {
    int Status = 0;

//   static const boost::regex Word_expression("[a-zA-Z]+");

// causes coredump
//   static const boost::regex Word_expression(
//       "([:punct::space:]*)"
//       "([-:upper::lower:^[:punct::space:]]+)"
//       "([:punct::space:]*)");

    static const boost::regex Word_expression(
        "([:punct::space:]*)"
        "([-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]+)"
        "([:punct::space:]*)");

    // dump arguments
    for(int argNo=0;argNo != argc;argNo++) {
        std::cout << "argNo " << argNo
                  << " = '" << argv[argNo] << "'" << std::endl;

        boost::cmatch what;
        if(regex_search(argv[argNo], what, Word_expression)) {
            std::cout << "Whole = " << what[0].first << std::endl;
            int resultNo = 1;
            while(what[resultNo].matched == true) {
                std::cout << "sub[" << resultNo << "]  = "
                          << "'" << what[resultNo].first << "'" << std::endl;
                resultNo++;
            }
        }
    }

    if (argc <= 1) {
        std::cout << "Usage: ReadWord <filename>..." << std::endl;
        Status = 1;
    } else {
        for(int argNo=1;argNo != argc;argNo++) {
            std::ifstream In(argv[argNo]);
            if (!In) {
                std::cout << "Error: could not open file: "
                          << argv[argNo] << std::endl;
            } else {
                // Test the extraction itself: looping on eof() would print
                // one extra, empty token after the last read fails.
                std::string InputLine;
                while (In >> InputLine) {
                    std::cout << InputLine << std::endl;
                }
            }
        }
    }

    return Status;
} //main

Bruce Adams [TSP Sunbury]