regex problem: removing two elements in a row?

I want to use boost::regex 1.33.1 and vc7.1 on a WinXp-Sp2 computer to remove common words from a file. I'm encountering a problem which seems related to having two of the common words next to each other.
std::string test(" in the beginning god created the heavens and the earth ");
boost::regex reg;
reg.assign("( and | can | did | the | in | a )");
test = boost::regex_replace(test, reg, " ");
cout << test << endl;
The result is: " the beginning god created heavens the earth "
Am I doing something wrong or leaving out a step? Seems like the above ought to work. It catches one of the occurrences of " the " between " created " and " heavens " but not those immediately after the " in " or " and ".

Check your whitespace in the expression: the first match is " in ", which consumes the space after "in". The remaining text then starts "the beginning", with no leading space, so it won't match " the ", and so on for the other adjacent words.
Did you mean to turn on the x-modifier?
John.
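A minimal sketch of why the adjacent words survive, assuming std::regex as a stand-in for boost::regex (the helper name find_space_delimited is hypothetical). Matches never overlap, so the trailing space consumed by one match is no longer available as the leading space of the next.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of the original space-delimited pattern, in order.
// Assumption: std::regex is used here instead of boost::regex.
std::vector<std::string> find_space_delimited(const std::string& text) {
    static const std::regex re("( and | can | did | the | in | a )");
    std::vector<std::string> hits;
    // Each search resumes after the end of the previous match, so a space
    // that closed one match cannot also open the next one.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        hits.push_back(it->str());
    }
    return hits;
}
```

For the test string from the original post this should collect only " in ", " the " and " and ": the " the " that follows " in " and the one that follows " and " have already lost their leading space.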

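ooops ... and thanks for the very prompt help to this regex newbie ... I'm very much at the "trial and error" state of getting regex statements to work.
Would one of the following be the preferred "assignment"?
reg.assign("(\\<and\\>|\\<can\\>|\\<did\\>|\\<the\\>|\\<in\\>|\\<a\\>)");
or
reg.assign("\\<(and|can|did|the|in|a)\\>");
Both seem to work, but perhaps this regex newbie is missing a better alternative.
What is the "x modifier"? I looked up the perl explanation, and the boost::regex documentation, but that seemed related to formatting the statement itself.
Also, thanks for the time and effort you put into boost::regex ... development and ongoing support.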

Lynn Allan wrote:
ooops ... and thanks for the very prompt help to this regex newbie ... I'm very much at the "trial and error" state of getting regex statements to work.
Would one of the following be the preferred "assignment"?
reg.assign("(\\<and\\>|\\<can\\>|\\<did\\>|\\<the\\>|\\<in\\>|\\<a\\>)");
or
reg.assign("\\<(and|can|did|the|in|a)\\>");
Both seem to work, but perhaps this regex newbie is missing a better alternative.
The latter looks more elegant to me. Make the opening parenthesis a (?: if you don't actually want to capture a sub-expression.
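A small sketch of the word-boundary form with a non-capturing group, assuming std::regex (ECMAScript syntax) as a stand-in for boost::regex. ECMAScript has no \< and \> anchors, so the single \b assertion is swapped in for both; the helper name remaining_words is hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Replaces the listed common words with a space and returns what is left,
// split on whitespace.
// Assumption: \b stands in for Boost's \< and \> word anchors.
std::vector<std::string> remaining_words(const std::string& text) {
    // (?: ... ) groups the alternatives without creating a sub-expression.
    static const std::regex re("\\b(?:and|can|did|the|in|a)\\b");
    const std::string stripped = std::regex_replace(text, re, " ");

    std::istringstream in(stripped);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(w);
    }
    return words;
}
```

Because a word-boundary assertion consumes no characters, adjacent common words no longer compete for the space between them. The replacement still leaves runs of spaces behind, which is why the sketch splits on whitespace afterwards.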
Lynn Allan wrote:
What is the "x modifier"? I looked up the perl explanation, and the boost::regex documentation, but that seemed related to formatting the statement itself.
Also, thanks for the time and effort you put into boost::regex ... development and ongoing support.
It causes whitespace in the expression to be ignored. You can turn it on in the flags passed to assign:
reg.assign("\\<(and|can|did|the|in|a)\\>", boost::regex::perl|boost::regex::mod_x);
Or you can embed it in the expression:
"(?x)\\<(and | can | did | the | in | a)\\>" // whitespace ignored
HTH, John.
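To illustrate what "whitespace in the expression is ignored" means, here is a rough emulation rather than the Boost feature itself. It assumes std::regex, which has no counterpart to mod_x, so a hypothetical helper (drop_pattern_whitespace) strips unescaped whitespace from the pattern before it is compiled. The real x-modifier is handled inside the regex engine; this sketch ignores character classes and # comments.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: removes unescaped whitespace from a pattern, which is
// roughly the effect the x-modifier has on the expression text.
std::string drop_pattern_whitespace(const std::string& pattern) {
    std::string out;
    bool escaped = false;
    for (char c : pattern) {
        if (escaped) {            // keep whatever follows a backslash
            out += c;
            escaped = false;
        } else if (c == '\\') {   // start of an escape sequence
            out += c;
            escaped = true;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            out += c;             // ordinary non-space character
        }
    }
    return out;
}

// Compiles a spaced-out pattern after the whitespace has been dropped.
std::regex compile_ignoring_whitespace(const std::string& pattern) {
    return std::regex(drop_pattern_whitespace(pattern));
}
```

With this, a pattern written as "\\b(?: and | can | did | the | in | a )\\b" compiles to the same expression as the compact form, so the spaces serve only as layout.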

participants (2)
- John Maddock
- Lynn Allan