Regex - hyperlink match/submatch problem
Dear forum members:

I’m hoping that someone can copy and compile the code below and tell me what is wrong with my expression. I am trying to write a pattern that produces the following match and submatch:

MATCH: <a href="http://test1.com">TEST #1</a>
SUBMATCH: http://test1.com

// main.cpp
#include <boost/regex.hpp>
#include <iostream>

using namespace std;
using namespace boost;

int main()
{
    // here is the data, please note the minor ‘href=’ differences.
    string sText = "<a href=\"http://test1.com\">TEST #1</a>"
                   "<a href =\"http://test2.com\">TEST #2</a>"
                   "<a href= \"http://test3.com\">TEST #3</a>"
                   "<a href = \"http://test4.com\">TEST #4</a>"
                   "<a href=\"http://test5.com\">TEST #5</a>";

    // the following 4 patterns were my best attempts
    char exp[] = "<a href(.*?)</a>";
                 "<a href\s*=\s*\"(.*?)\"";
                 "<a href=(.*?)</a>";
                 "<a href\s*=\s*\"(.*?)</a>"

    int subs[] = {0,1};
    regex e(exp, regex::normal | regbase::icase);
    sregex_token_iterator i(sText.begin(), sText.end(), e, subs);
    sregex_token_iterator j;

    while(i != j)
    {
        cout << "*******************************" << endl;
        cout << *i++ << endl;
        cout << *i++ << endl;
    }
    return 0;
}
Jeff wrote:
Dear forum members:
I’m hoping that someone can copy and compile the code below and tell me what is wrong with my expression. I am trying to write a pattern that produces the following match and submatch:
MATCH: <a href="http://test1.com">TEST #1</a>
SUBMATCH: http://test1.com
Hi Jeff,

Your code does not compile, but you should check out this website: http://www.cuneytyilmaz.com/prog/jrx/

I use it every time I am playing with regexps. I found that this expression might suit your needs:

<a href\s*=\s*\"(.*?)>(.*?)<\/a>

Note the backslashed slash ("\/") at the end of the expression. Hope this helps.

PS: on the website I gave you, untick all options but "split input".

JD
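For readers who want to try JD's expression from C++ rather than in the web tester, here is a minimal sketch. It assumes std::regex (C++11) in place of Boost.Regex so that it builds without extra libraries; capture_pairs, kPostedPattern and kQuotedPattern are hypothetical names that do not appear in the thread. Raw string literals are used so the pattern can be pasted without doubling any backslashes.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): returns one
// (submatch 1, submatch 2) pair for every anchor that `pattern` matches.
std::vector<std::pair<std::string, std::string>>
capture_pairs(const std::string& text, const std::string& pattern)
{
    // Assumption: std::regex with ECMAScript syntax stands in for boost::regex.
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    std::vector<std::pair<std::string, std::string>> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return out;
}

// JD's expression exactly as posted, written as a raw string literal.
const std::string kPostedPattern = R"re(<a href\s*=\s*\"(.*?)>(.*?)<\/a>)re";

// Assumed variant: submatch 1 stops at the closing quote, so it holds only the URL.
const std::string kQuotedPattern = R"re(<a href\s*=\s*"(.*?)"\s*>(.*?)</a>)re";
```

Calling capture_pairs(sText, kPostedPattern) on Jeff's data should give the link text in submatch 2, but submatch 1 still carries the closing quote because the group runs up to the first ">". The assumed variant is one way to get the bare URL that Jeff asked for.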
Jeff wrote:
Dear forum members:
I’m hoping that someone can copy and compile the code below and tell me what is wrong with my expression. I am trying to write a pattern that produces the following match and submatch:
If you want a set of patterns to be treated as alternatives, then you need to separate them with "|". You also need to double up the escapes, turning \s into \\s: remember that the C++ compiler will strip one \, so you need two of them if you want the regex engine to see one. So simply:

char exp[] = "<a href\\s*=\\s*\"(.*?)</a>";

would do what you want.

John.
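The doubled escapes can be checked with a short sketch along the lines of Jeff's token-iterator loop. It assumes std::regex and std::sregex_token_iterator in place of the Boost types so that it is self-contained; match_and_submatch, kEscapedPattern and kUrlOnlyPattern are hypothetical names introduced for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// The pattern from John's reply: "\\s" in the source reaches the regex engine as \s.
const char kEscapedPattern[] = "<a href\\s*=\\s*\"(.*?)</a>";

// Assumed variant (not from the thread): submatch 1 ends at the closing quote.
const char kUrlOnlyPattern[] = "<a href\\s*=\\s*\"(.*?)\"[^>]*>.*?</a>";

// Hypothetical helper: returns whole match, submatch 1, whole match, submatch 1, ...
// in the same order that Jeff's loop prints them.
std::vector<std::string> match_and_submatch(const std::string& text, const char* pattern)
{
    // Assumption: std::regex stands in for boost::regex here.
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    const std::vector<int> subs{0, 1};  // 0 = whole match, 1 = first capture group
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), re, subs), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

With Jeff's five-link sText, kEscapedPattern should match every anchor despite the spacing differences around "=". Its submatch runs from the opening quote to just before "</a>", so it also contains the link text. If only the URL is wanted, as in Jeff's stated SUBMATCH, the assumed kUrlOnlyPattern closes the quote before the rest of the tag.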
participants (3)
- JD
- Jeff
- John Maddock