[Regex] Why is my RE not working?... :-)

Given these definitions:

    RE:   " (ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(\\]|\\)|,|;| |\n)";
    Text: "BLAH ([1.2.3.4]) [5.6.7.8] ip=100.100.200.200 BLAH2 (77.48.32.42)\n";

the following code prints:

    IP=5.6.7.8 100.100.200.200 77.48.32.42

I wonder why it skips the first IP, i.e. 1.2.3.4? Is there something wrong in my RE definition above, the code below, or in boost::regex? I'm sure it's just a silly error of mine but I don't see it... :-(

    ...
    bool GetIP(const char* ApszText, char* ARet_szIP)
    {
        // find all IPs in ApszText and return them via ARet_szIP
        // (separate 'em by a blank if multiple found)
        const char* szRe = " (ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(\\]|\\)|,|;| |\n)";
        boost::regex re(szRe);
        string s = ApszText;
        string::const_iterator start = s.begin(), end = s.end();
        boost::match_results<string::const_iterator> what;
        boost::match_flag_type flags = boost::match_default;
        int cFound = 0;
        while (boost::regex_search(start, end, what, re, flags))
        {
            string sIP(what[2].first, what[2].second);
            if (!cFound++)
                ARet_szIP[0] = 0;          // clear only if found any
            else
                strcat(ARet_szIP, " ");
            strcat(ARet_szIP, sIP.c_str());
            start = what[0].second;        // set the next start pos
        }
        return cFound > 0;
    }

    int main()
    {
        const char* szLine = "BLAH ([1.2.3.4]) [5.6.7.8] ip=100.100.200.200 BLAH2 (77.48.32.42)\n";
        char szIP[256] = "";
        bool f = GetIP(szLine, szIP);
        printf("IP=%s\n", szIP);
        return 0;
    }

AMDG

Adem wrote:
> Given these definitions:
> RE: " (ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(\\]|\\)|,|;| |\n)";
> Text: "BLAH ([1.2.3.4]) [5.6.7.8] ip=100.100.200.200 BLAH2 (77.48.32.42)\n";
> the following code prints: IP=5.6.7.8 100.100.200.200 77.48.32.42
> I wonder why it skips the first IP, ie. 1.2.3.4 ? Is there something wrong in my RE definition above, the code below, or in boost::regex? I'm sure it's just a silly error of mine but I don't see it... :-(

The problem is the leading space in the regex.

In Christ,
Steven Watanabe

Steven Watanabe wrote:
> Adem wrote:
>> Given these definitions:
>> RE: " (ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(\\]|\\)|,|;| |\n)";
>> Text: "BLAH ([1.2.3.4]) [5.6.7.8] ip=100.100.200.200 BLAH2 (77.48.32.42)\n";
>> the following code prints: IP=5.6.7.8 100.100.200.200 77.48.32.42
>> I wonder why it skips the first IP, ie. 1.2.3.4 ? Is there something wrong in my RE definition above, the code below, or in boost::regex? I'm sure it's just a silly error of mine but I don't see it... :-(
>
> The problem is the leading space in the regex.

But this seems to be a problem in the regex library, doesn't it? Even the following isn't working:

    RE = "[,; ](ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})[,; \n\\)\\]]";

i.e. a comma, semicolon or blank in front. I also explicitly stated "perl" mode, i.e.:

    const char* szRe = "[,; ](ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})[,; \n\\)\\]]";
    boost::regex re(szRe, boost::regex::perl | boost::regex::icase);

Adem wrote:
>>> I wonder why it skips the first IP, ie. 1.2.3.4 ? Is there something wrong in my RE definition above, the code below, or in boost::regex? I'm sure it's just a silly error of mine but I don't see it... :-(
>>
>> The problem is the leading space in the regex.
>
> But this seems to be a problem in the regex-library, isn't it? Even the following isn't working:
> RE = "[,; ](ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})[,; \n\\)\\]]";

I had to reformat your expression to understand what was going on, but if we write it like this:

    " (ip="
    "|\\("
    "|\\[)"
    "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"
    "(\\]|\\)|,|;| |\n)"

Then it matches an IP that begins with:

* A leading space followed by *either* of: "ip=" or "(" or "["

But the first IP address in your text begins:

    " (["

So there's no way it can match, either in Perl or in Boost.Regex.

HTH, John Maddock.
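The behaviour described above can be reproduced with a small sketch. It uses std::regex instead of boost::regex so that it builds without Boost; that substitution is an assumption of this example, as is the helper name extract_ips, which does not appear in the thread. The pattern itself is copied unchanged from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 2 (the address) from every
// match of the ORIGINAL pattern from the question.
std::vector<std::string> extract_ips(const std::string& text) {
    static const std::regex re(
        " (ip=|\\(|\\[)"                                  // space, then exactly ONE opener
        "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"     // the address itself
        "(\\]|\\)|,|;| |\n)");                            // one closing delimiter
    std::vector<std::string> ips;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        ips.push_back((*it)[2].str());
    }
    return ips;
}
```

With this helper, " [1.2.3.4] " yields one address, while " ([1.2.3.4]) " yields none: after the space and "(" the pattern expects a digit but finds "[".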

Because the first address has parentheses *and* brackets. Your regex requires one or the other, and won't recognize both.

At 05:45 PM 12/27/2008, Adem wrote:
> Given these definitions:
> RE: " (ip=|\\(|\\[)(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})(\\]|\\)|,|;| |\n)";
> Text: "BLAH ([1.2.3.4]) [5.6.7.8] ip=100.100.200.200 BLAH2 (77.48.32.42)\n";
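One possible fix that follows from this diagnosis is to let the opening group repeat, so that "(" and "[" may both precede the address. The sketch below is not taken from the thread: the "+" quantifier, the helper name extract_ips_nested, and the use of std::regex in place of boost::regex are all assumptions of this example.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: same pattern as in the question, except that the
// opening group carries a "+" and may therefore match " ([" as well as
// " (" or " [".
std::vector<std::string> extract_ips_nested(const std::string& text) {
    static const std::regex re(
        " (ip=|\\(|\\[)+"                                 // space, then ONE OR MORE openers
        "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"     // the address itself
        "(\\]|\\)|,|;| |\n)");                            // one closing delimiter
    std::vector<std::string> ips;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        ips.push_back((*it)[2].str());                    // group 2 is still the address
    }
    return ips;
}
```

For the text in the question this returns all four addresses, starting with 1.2.3.4. Only one closing delimiter is consumed per match, so the extra ")" after "1.2.3.4]" is simply skipped by the next search.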
participants (4)
- Adem
- Alan M. Carroll
- John Maddock
- Steven Watanabe