Regex problem: cannot parse terms containing OR
I have been able to parse stuff more complicated than this, but now I am
stuck with something seemingly simpler.
The expression being parsed is the common sequence:
1,2-5,7,8-11
Question 1: Notice my approach. I first match the whole expression, with
"regex_match", to make sure that it is valid (that works great). Next, I
use "regex_iterator" to break down the parts. Is that good practice? Am
I being inefficient/redundant?
Question 2: My code below only extracts "range terms" ("x-y"); for some
reason I cannot extract "number terms".
As a workaround, I can always feed my data like this:
1-1,2-5,7-7,8-11
but, after a lot of tries, would love to learn how to do this properly.
TIA,
-Ramon
-----------------------------------------------------
#include <iostream>
#include <boost/regex.hpp>
On Sat, Sep 26, 2009 at 5:04 PM, Ramon F Herrera
-----------------------------------------------------
#include <iostream>
#include <boost/regex.hpp>

using namespace std;

bool term_callback(const boost::match_results<std::string::const_iterator>& what)
{
    for (unsigned int i = 0; i < what.size(); i++) {
        cout << "what[" << i << "]: " << what[i].str() << endl;
        cout << "---------" << endl;
    }
    return true;
}

int main(int argc, char *argv[])
{
    const char hyphen = '-';
    const char left_paren = '(';
    const char right_paren = ')';
    const char bar = '|';
    const char comma = ',';
    const char star = '*';

    const string number = "[0-9]+";
    const string range = number + hyphen + number;
    const string term = left_paren + number + bar + range + right_paren;
    const string sequence = term + bar + left_paren + term + comma + right_paren + star + term;

    boost::regex expression(sequence);
    boost::regex piece(range);
    boost::cmatch matches;

    char argument[1024];
    strcpy(argument, argv[1]);

    if (!boost::regex_match(argument, matches, expression)) {
        cerr << "There is no match" << endl;
        return 1;
    }

    string text = argument;

    boost::sregex_iterator m1(text.begin(), text.end(), piece);
    boost::sregex_iterator m2;
    for_each(m1, m2, &term_callback);

    return 0;
}

Note that if you want to do something with the values, such as converting them to integers and operating on them, there is a much easier way: use Boost.Spirit2.1 instead of Boost.Regex. Yours is more of a parsing problem than a matching problem; regex is nice for matching, and Spirit2.1 is better for parsing. If you are interested, I or someone else could whip up some code that does the same thing in Spirit2.1, which will run a whole lot faster and be a lot easier to use.
----- Original Message -----
From: "Ramon F Herrera"
Question 1: Notice my approach. I first match the whole expression, with "regex_match", to make sure that it is valid (that works great). Next, I use "regex_iterator" to break down the parts. Is that good practice? Am I being inefficient/redundant?
It depends :-) If it's fast enough it's good enough, and I've often done similar things (it's easy to understand compared to a more "sophisticated" approach too).
Question 2: My code below only extracts "range terms" ("x-y"), for some reason I cannot extract "number terms".
You need something like:
\\d+(?:\\s*-\\s*\\d+)?
to match a "digit or range".
HTH, John.
participants (3): John Maddock, OvermindDL1, Ramon F Herrera