[xpressive] Problem accessing repeating embedded pattern.

I am trying to use Xpressive for parsing string-based message buffers. The messages start with a fixed string and end with an "empty" line (similar in form to HTTP). There is an example in the code below. Following a successful search, I'd like to be able to process messages in the buffer with something like this pseudo code:

    for each message
        for each name_value pair
            make_pair(name, value)
    prepend what.suffix() to the next packet

The examples of repeating an embedded regex I found in the .pdf and the examples subdirectory seemed close to what I need, but I have not been able to make them work. Here is the code that is the closest I've been able to get:

    #include <iostream>
    #include <string>
    #include <boost/xpressive/xpressive.hpp>

    using namespace std;
    using namespace boost::xpressive;

    int main(int argc, char **argv)
    {
        std::string buffer =
            "FROGGIE\r\n"
            "Volume = 1\r\n"
            "Other1= 2\r\n"
            "Channel=3\r\n"
            "Other =4\r\n"
            "\r\n"
            "FROGGIE\r\n"
            "Volume = 5\r\n"
            "Other1= 6\r\n"
            "Channel=7\r\n"
            "Other =8\r\n"
            "\r\n"
            "FROGGIE\r\n"
            "Volume = 9\r\n"
            "Other1= 0\r\n"
            "Channel=10\r\n";

        sregex name_value_pair_ =
            (+alnum >> *_s >> "=" >> *_s >> +_d >> *_s >> _ln);
        sregex message_ =
            (*_s >> "FROGGIE" >> _ln >> +name_value_pair_ >> _ln);
        sregex re_ = +(s1= message_);

        smatch what;
        if( regex_search( buffer, what, re_ ) )
        {
            cout << "\n<what.size()>" << what.size() << "</what.size()>"
                 << "\n<what[0]>\n" << what[0] << "</what[0]>"
                 << "\n<what[1]>\n" << what[1] << "</what[1]>"
                 << "\n<what[2]>\n" << what[2] << "</what[2]>"
                 << "\n<what.suffix()>\n" << what.suffix() << "</what.suffix()>"
                 << endl;
        }
    } // main

I am not very accomplished with regular expressions, but what[0] and what.suffix() look correct, and I hoped that what.size() held the number of messages and that what[1] and what[2] accessed each in turn. I clearly did not understand.

    <what.size()>2</what.size()>
    <what[0]>
    FROGGIE
    Volume = 1
    Other1= 2
    Channel=3
    Other =4

    FROGGIE
    Volume = 5
    Other1= 6
    Channel=7
    Other =8

    </what[0]>
    <what[1]>
    FROGGIE
    Volume = 5
    Other1= 6
    Channel=7
    Other =8

    </what[1]>
    <what[2]>
    </what[2]>
    <what.suffix()>
    FROGGIE
    Volume = 9
    Other1= 0
    Channel=10
    </what.suffix()>

OS is Linux Core 3 2.6.9-1.667. Compiler is gcc 3.4.4-linux. Xpressive README.txt (in the .zip) says "xpressive 0.9.8d".

If someone could point me in the right direction, I'd surely appreciate it.

Regards,
Dick Bridges

Dick Bridges wrote:
I am trying to use Xpressive for parsing string-based message buffers.
<snip>

You are confusing a backreference with a nested regex. A backreference (s1, s2, etc.) is accessed positionally from the match results struct like what[1], what[2], etc. In contrast, when you nest a regex in another regex, you end up with nested match results. You can access them with what.nested_results(), which is a collection of nested results, each result corresponding to an invocation of a nested regex (and each of which may contain other nested results, ad infinitum). In your example ...

    sregex message_ =
        (*_s >> "FROGGIE" >> _ln >>
         +(+alnum >> *_s >> "=" >> *_s >> +_d >> *_s >> _ln) >> _ln);
    sregex re_ = +message_;

    smatch what;
    if( regex_search( buffer, what, re_ ) )
    {
        cout << "Count: " << what.nested_results().size() << endl;
        BOOST_FOREACH(smatch const &msg, what.nested_results())
        {
            cout << "Message:\n" << msg[0] << endl;
        }
    }

This displays the three messages one at a time (as long as you add one extra newline at the end of the last message; otherwise your regex only matches the first two messages).

You'll notice that I eliminated your name_value_pair_ regex. That's because it exposed a bug in xpressive. :-P

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
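
The difference between a positional backreference and per-invocation results can be seen in a small sketch. It uses std::regex rather than xpressive (an assumption made only so the example needs nothing beyond the standard library), and the helper names last_capture and all_names are hypothetical. A capture group that sits inside a repetition keeps only the text of its final iteration, which is why what[1] in the original post held just the second message.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns what capture group 1 holds after a single
// match of a repeated group (std::regex stand-in, not xpressive).
std::string last_capture(const std::string& input) {
    // Group 1 sits inside a repetition, much like s1 in `+(s1= message_)`.
    static const std::regex repeated(R"re((?:(\w+)=\d+;)+)re");
    std::smatch what;
    if (!std::regex_search(input, what, repeated)) return std::string();
    return what[1].str();  // only the final iteration's text is kept
}

// Hypothetical helper: collects every name by walking individual matches,
// which plays the role that nested_results() plays in xpressive.
std::vector<std::string> all_names(const std::string& input) {
    static const std::regex single(R"re((\w+)=\d+;)re");
    std::vector<std::string> names;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), single); it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}
```

Calling last_capture on an input with several pairs yields only the last name, while all_names yields one entry per pair. With xpressive itself, what.nested_results() gives that per-invocation view without a second pass.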

Eric Niebler wrote:
Dick Bridges wrote:
I am trying to use Xpressive for parsing string-based message buffers.
<snip>
You are confusing a backreference with a nested regex. A backreference (s1, s2, etc.) is accessed positionally from the match results struct like what[1], what[2], etc.
In contrast, when you nest a regex in another regex, you ended up with nested match results.
<snip>
You'll notice that I eliminated your name_value_pair_ regex. That's because it exposed a bug in xpressive. :-P
Fixed. Please get the latest version of xpressive from the Vault.

Below is a nice little program that will parse your messages and print out the name/value pairs. It also is a nice little demo of backrefs and nested results. I may use this in xpressive's docs, with your permission. I'll definitely add this as a test case.

    #include <string>
    #include <iostream>
    #include <boost/foreach.hpp>
    #include <boost/xpressive/xpressive.hpp>

    using namespace std;
    using namespace boost::xpressive;

    int main()
    {
        std::string buffer =
            "FROGGIE\r\n"
            "Volume = 1\r\n"
            "Other1= 2\r\n"
            "Channel=3\r\n"
            "Other =4\r\n"
            "\r\n"
            "FROGGIE\r\n"
            "Volume = 5\r\n"
            "Other1= 6\r\n"
            "Channel=7\r\n"
            "Other =8\r\n"
            "\r\n"
            "FROGGIE\r\n"
            "Volume = 9\r\n"
            "Other1= 0\r\n"
            "Channel=10\r\n"
            "\r\n";

        mark_tag name(1), value(2);

        sregex name_value_pair_ =
            (name= +alnum) >> *_s >> "=" >> *_s >>
            (value= +_d) >> *_s >> _ln;

        sregex message_ =
            *_s >> "FROGGIE" >> _ln >> +name_value_pair_ >> _ln;

        sregex re_ = +message_;

        smatch what;
        if(regex_search(buffer, what, re_))
        {
            cout << "Msg count: " << what.nested_results().size() << endl;
            BOOST_FOREACH(smatch const &msg, what.nested_results())
            {
                cout << "Message:\n" << msg[0] << endl;
                cout << " name/value pair count: "
                     << msg.nested_results().size() << endl;
                BOOST_FOREACH(smatch const &nvp, msg.nested_results())
                {
                    cout << " name:" << nvp[name]
                         << ", value:" << nvp[value] << endl;
                }
            }
        }
        return 0;
    }

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
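
The last step of the original pseudo code, carrying an unfinished message over to the next packet, is not covered by the program above. A minimal sketch of that step follows. It uses a plain string search instead of xpressive's what.suffix() (an assumption, chosen so the example depends only on the standard library), and split_complete_messages is a hypothetical helper name. It treats "\r\n\r\n" (the end of the last pair line followed by the empty line) as the message terminator.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits `buffer` into complete messages, each ended by
// an empty line, and returns them together with the unterminated tail.
std::pair<std::vector<std::string>, std::string>
split_complete_messages(const std::string& buffer) {
    // Assumption: CRLF line endings, so a message ends with "\r\n\r\n".
    const std::string terminator = "\r\n\r\n";
    std::vector<std::string> messages;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = buffer.find(terminator, start);
        if (end == std::string::npos) break;  // no further complete message
        const std::string::size_type stop = end + terminator.size();
        messages.push_back(buffer.substr(start, stop - start));
        start = stop;
    }
    // Whatever is left plays the role of what.suffix(): keep it and prepend
    // it to the next packet before parsing again.
    return std::make_pair(messages, buffer.substr(start));
}
```

A caller would keep the returned tail in a pending string, append the next packet to it, and run the split (or the xpressive search) again on the combined buffer.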

On Tuesday 03 January 2006 10:43, Eric Niebler wrote:
You'll notice that I eliminated your name_value_pair_ regex. That's because it exposed a bug in xpressive. :-P
Fixed. Please get the latest version of xpressive from the Vault.
I tried to use the xpressive library from the Boost CVS and I ran into some problems. Do I understand the above to mean that the Vault version is a better choice?

regards
Bjørn Roald

Bjørn Roald wrote:
On Tuesday 03 January 2006 10:43, Eric Niebler wrote:
You'll notice that I eliminated your name_value_pair_ regex. That's because it exposed a bug in xpressive. :-P
Fixed. Please get the latest version of xpressive from the Vault.
I tried to use the xpressive library from the Boost CVS and I ran into some problems. Do I understand the above to mean that the Vault version is a better choice?
No, the Vault is pretty much the same as what is in CVS at the moment. What is the problem you're seeing?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

On Tuesday 03 January 2006 22:00, Eric Niebler wrote:
Bjørn Roald wrote:
On Tuesday 03 January 2006 10:43, Eric Niebler wrote:
You'll notice that I eliminated your name_value_pair_ regex. That's because it exposed a bug in xpressive. :-P
Fixed. Please get the latest version of xpressive from the Vault.
I tried to use the xpressive library from the Boost CVS and I ran into some problems. Do I understand the above to mean that the Vault version is a better choice?
No, the Vault is pretty much the same as what is in CVS at the moment. What is the problem you're seeing?
Using static regexes to create simple grammars, I encountered two problems.

1. The example code for the nested grammar in the documentation printed only part of the results shown in the documentation.

2. Creating a more complex grammar with static regexes, I encountered core dumps when attempting to print nested matches. The call stack is listed below. I suspect it is related to my lack of understanding of the static regex features/rules/limitations. I am attempting to use nested regexes in the grammar that match things like C++ comments and string literals; when I remove them it all works better. I have read that regexes in general break if you nest unlimited-length specifiers like + or *. Can that be what causes the core dump? If so, is this a limitation we have to live with? Is there any workaround?

If you would like me to post the code, I can reduce it as much as I can first.

---
Bjørn

Call Stack:

    (gdb) bt
    #0 0x0805648b in std::ostream_iterator<char, char, std::char_traits<char>>::operator= (this=0xbfe80af0, __value=@0x0) at stream_iterator.h:196
    #1 0x0805be0d in std::__copy<char const*, std::ostream_iterator<char, char, std::char_traits<char> > > (__first=0x0, __last=0x8c3bd13 "}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}ddd // ", 'g' <repeats 12 times>, "\nddd // ostekake\nddd // pannekake\n/*\n", 'd' <repeats 12 times>, "\n*/\n/*", 'e' <repeats 15 times>, "*/\n}\n\n", __result={<std::iterator<std::output_iterator_tag,void,void,void,void>> = {<No data fields>}, _M_stream = 0x1c7280, _M_string = 0x0}) at stl_algobase.h:247
    #2 0x08059bf3 in std::__copy_aux2<char const*, std::ostream_iterator<char, char, std::char_traits<char> > > (__first=0x0, __last=0x8c3bd13 "}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}ddd // ", 'g' <repeats 12 times>, "\nddd // ostekake\nddd // pannekake\n/*\n", 'd' <repeats 12 times>, "\n*/\n/*", 'e' <repeats 15 times>, "*/\n}\n\n", __result={<std::iterator<std::output_iterator_tag,void,void,void,void>> = {<No data fields>}, _M_stream = 0x1c7280, _M_string = 0x0}) at stl_algobase.h:273
    #3 0x08056453 in std::__copy_ni2<char const*, std::ostream_iterator<char, char, std::char_traits<char> > > (__first=0x0, __last=0x8c3bd13 "}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}ddd // ", 'g' <repeats 12 times>, "\nddd // ostekake\nddd // pannekake\n/*\n", 'd' <repeats 12 times>, "\n*/\n/*", 'e' <repeats 15 times>, "*/\n}\n\n", __result=Cannot access memory at address 0x0) at stl_algobase.h:308
    #4 0x08054719 in std::__copy_ni1<__gnu_cxx::__normal_iterator<char const*, std::string>, std::ostream_iterator<char, char, std::char_traits<char> > > (__first={_M_current = 0x0}, __last={_M_current = 0x8c3bd13 "}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}ddd // ", 'g' <repeats 12 times>, "\nddd // ostekake\nddd // pannekake\n/*\n", 'd' <repeats 12 times>, "\n*/\n/*", 'e' <repeats 15 times>, "*/\n}\n\n"}, __result=Cannot access memory at address 0x0) at stl_algobase.h:317
    #5 0x080523bb in std::copy<__gnu_cxx::__normal_iterator<char const*, std::string>, std::ostream_iterator<char, char, std::char_traits<char> > > (__first={_M_current = 0x0}, __last={_M_current = 0x8c3bd13 "}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}ddd // ", 'g' <repeats 12 times>, "\nddd // ostekake\nddd // pannekake\n/*\n", 'd' <repeats 12 times>, "\n*/\n/*", 'e' <repeats 15 times>, "*/\n}\n\n"}, __result=Cannot access memory at address 0x0) at stl_algobase.h:358
    #6 0x08050be8 in boost::xpressive::operator<< <__gnu_cxx::__normal_iterator<char const*, std::string>, char, std::char_traits<char> > (sout=@0x1c7280, sub=@0x8c3be70) at sub_match.hpp:139
    #7 0x080524d3 in output_nested_results::operator()<__gnu_cxx::__normal_iterator<char const*, std::string> > (this=0xbfe80d3c, what=@0x8c3bc20) at test1.cpp:28
    #8 0x08050ca6 in std::for_each<std::_List_const_iterator<boost::xpressive::match_results<__gnu_cxx::__normal_iterator<char const*, std::string> > >, output_nested_results> (__first={_M_node = 0x8c3bc18}, __last={_M_node = 0xbfe80e38}, __f={tabs_ = 0}) at stl_algo.h:158
    #9 0x0804ec89 in main () at test1.cpp:94

Bjørn Roald wrote:
On Tuesday 03 January 2006 22:00, Eric Niebler wrote:
Bjørn Roald wrote:
I tried to use the xpressive library from the Boost CVS and I ran into some problems. Do I understand the above to mean that the Vault version is a better choice?
No, the Vault is pretty much the same as what is in CVS at the moment. What is the problem you're seeing?
Using static regexp to create simple grammars I encountered two problems.
1. The example code for nested grammar in the documentation produced only printout of part of the results compared to the documentation.
Do you mean the calculator example? For me, this program:

    std::string str("foo 9*(10+3) bar");
    smatch what;
    sregex group, factor, term, expression;

    group      = '(' >> by_ref(expression) >> ')';
    factor     = +_d | group;
    term       = factor >> *(('*' >> factor) | ('/' >> factor));
    expression = term >> *(('+' >> term) | ('-' >> term));

    if(regex_search(str, what, expression))
    {
        std::cout << what[0] << std::endl;
    }

prints out this:

    9*(10+3)

which is correct, and agrees with the docs. If you mean the regex for matching balanced nested parentheses, I just tried that too and got the expected results.
2. Creating a more complex grammar with static regexp I encountered core dumps attempting to print nested matches. Call stack is listed below.
This sounds an awful lot like the bug I just fixed (and committed to CVS at 2am this morning). Are you sure you have the latest? If you do, please post code that reproduces the error and I'll fix it. Even if the recent changes fixed your problem, if you send the code anyway, I'll add it to xpressive's regression test. (Hint: you should do this.)

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

On Tuesday 03 January 2006 23:19, Eric Niebler wrote:
Bjørn Roald wrote:
1. The example code for nested grammar in the documentation produced only printout of part of the results compared to the documentation.
Do you mean the calculator example? For me, this program:
No, it's the nested parentheses example. I don't have it up now, but I will check it tomorrow after updating my CVS. It's getting late here :-)
2. Creating a more complex grammar with static regexp I encountered core dumps attempting to print nested matches. Call stack is listed below.
This sounds an awful lot like the bug I just fixed (and committed to CVS at 2am this morning). Are you sure you have the latest?
No, I did this testing a few days ago and have not updated my CVS since.
If you do, please post code that reproduces the error and I'll fix it. Even if the recent changes fixed your problem, if you send the code anyway, I'll add it to xpressive's regression test. (Hint: you should do this.)
I will post cleaned-up code tomorrow, at least if the problem does not go away. Thanks.

Bjørn

On Tuesday 03 January 2006 23:42, Bjørn Roald wrote:
On Tuesday 03 January 2006 23:19, Eric Niebler wrote:
If you do, please post code that reproduces the error and I'll fix it. Even if the recent changes fixed your problem, if you send the code anyway, I'll add it to xpressive's regression test. (Hint: you should do this.)
I will post cleaned up code tomorrow. At least if problem does not go away. Thanks.
I had no joy with updating my cvs working area today :-( --- see separate post.

Until I get the latest cvs tested, I am posting the source, which I cleaned up somewhat, in case you want to use it:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <boost/xpressive/xpressive.hpp>

    using namespace boost::xpressive;

    // Displays nested results to std::cout
    struct output_nested_results
    {
        int tabs_;

        output_nested_results( int tabs = 0 )
          : tabs_( tabs )
        {
        }

        template< typename BidiIterT >
        void operator ()( match_results< BidiIterT > const &what ) const
        {
            // output the match
            std::cout << what[0] << '\n'; // comment out this and it works

            // output any nested matches
            std::for_each(
                what.nested_results().begin(),
                what.nested_results().end(),
                output_nested_results( tabs_ + 1 ) );
        }
    };

    int main()
    {
        sregex line_comment = "//" >> -*_ >> _n;

        sregex scoped_block;
        scoped_block = as_xpr('{')
            >> *( keep( +~(set='{','}') ) | by_ref(scoped_block) )
            >> '}';

        sregex boost_namespace_begin =
            as_xpr( "namespace" ) >> +space >> "boost" >> +space >> '{';

        sregex boost_namespace_body =
            *(
                line_comment | // comment out this and it works
                scoped_block |
                space |
                *( keep( +~( set='{', '}' ) ) )
            );

        sregex boost_namespace_end = as_xpr( '}' );

        sregex boost_namespace =
            boost_namespace_begin >> *space >>
            boost_namespace_body >> *space >>
            boost_namespace_end
            ;

        smatch what;
        std::string str =
            "#include <boost/xxxx.hpp>\n"
            "namespace boost {\n"
            "gruble \"grot}..\" .. {dfgdg(){aa} fddd \"dffdfd\\\"dfdfgg\"}{bb}"
            "}\n";

        if( regex_search( str, what, boost_namespace ) )
        {
            // display the whole match
            std::cout << what[0] << '\n'; // can be removed, but only prints first line ???

            // display the nested results
            std::for_each(
                what.nested_results().begin(),
                what.nested_results().end(),
                output_nested_results() );
        }
    }

I put comments on 3 lines which seem to affect the result negatively. My program output is:

    namespace boost {
    namespace boost {
    /bin/bash: line 1: 11082 Segmentation fault bin/gcc/debug/test_bug

regards,
bjorn

Bjørn Roald wrote:
On Tuesday 03 January 2006 23:42, Bjørn Roald wrote:
On Tuesday 03 January 2006 23:19, Eric Niebler wrote:
If you do, please post code that reproduces the error and I'll fix it. Even if the recent changes fixed your problem, if you send the code anyway, I'll add it to xpressive's regression test. (Hint: you should do this.)
I will post cleaned up code tomorrow. At least if problem does not go away. Thanks.
I had no joy with updating my cvs working area today :-( --- see separate post.
Until I get the latest cvs tested I post the source which I cleaned up somewhat, in case you want to use it:
<snip>

Thanks for the code. I just tried it locally and it seems to work fine, so the bug is already fixed. The output I get is:

    namespace boost {
    gruble "grot}
    namespace boost {
    gruble "grot
    }

(Looks like you'll need to be smart about not matching braces that are in string literals.)

The current version of xpressive.zip in the Vault contains the fix you're after. Thanks for the test case -- I'll add it to xpressive's regression test.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

On Thursday 05 January 2006 01:15, Eric Niebler wrote:
(Looks like you'll need to be smart about not matching braces that are in string literals.)
Yeah, you are right, and the same goes for braces in comments. I think those are the main issues to look out for -- that is, if I ignore the possibility of unbalanced braces in macro definitions and conditional compiles, uhhhh ;-( ---- I don't like to think of all that yet.

I have made some attempts at string literals and comments, but I was hampered by the problems posted earlier. I left them out of the code I sent since they did not seem to be needed to reproduce the bug. I will download from the CVS or Vault now and see if I can move forward again. I am hopeful, thanks to your fixes :-)

While we are on the issue of string literals, do you have any suggestion for how best to match string literals with a static regex? In pseudo code:

    sregex string_literal = '"' >> -*_ >> ( quote that is not escaped );

A naive attempt:

    sregex literal_string_end = ~as_xpr( '\\' ) >> '"';
    sregex string_literal = as_xpr( '"' ) >> before( literal_string_end ) >> '"';

I don't know if this will even work as I think it may, since I am a little confused about the meaning of before(...).

regards
Bjørn

Bjørn Roald wrote:
When we discuss the issue of string literals, do you have any suggestion of how to best match string literals static regexp
<untested>

    sregex quoted =
        '"'                            // open quote
        >> keep(                       // turn off backtracking
               *(                      // zero or more ...
                   ~(set= '\\', '"')   // chars that are not quote or escape
                 |                     // or
                   '\\' >> _           // an escaped char
               )
           )
        >> '"';                        // close quote

</untested>

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
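
For readers who want to try the idea without Boost, here is a rough std::regex counterpart of the pattern above. It is a sketch under stated assumptions, not a translation verified against xpressive: the ECMAScript grammar used by std::regex has no equivalent of keep(), so backtracking stays enabled, and "." does not match a newline, whereas xpressive's _ matches any character. The helper name first_string_literal is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first double-quoted literal found in
// `text`, including its quotes, or an empty string if there is none.
std::string first_string_literal(const std::string& text) {
    // "            open quote
    // (?: ... )*   zero or more of:
    //   [^\\"]       a char that is neither backslash nor quote
    //   \\.          a backslash followed by any char (an escaped char)
    // "            close quote
    static const std::regex quoted(R"re("(?:[^\\"]|\\.)*")re");
    std::smatch what;
    return std::regex_search(text, what, quoted) ? what[0].str()
                                                 : std::string();
}
```

Because a backslash is always consumed together with the character after it, an escaped quote stays inside the literal, while a quote preceded by an even number of backslashes closes it.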

On Thursday 05 January 2006 01:15, Eric Niebler wrote:
Thanks for the code. I just tried it locally and it seems to work fine, so the bug is already fixed. The output I get is:
namespace boost { gruble "grot} namespace boost { gruble "grot }
After CVS update I get same results :-)
(Looks like you'll need to be smart about not matching braces that are in string literals.)
I use this for matching string literals:

    sregex escaped_quote = as_xpr( '\\' ) >> '"';
    sregex string_literal = as_xpr( '"' ) >> *( space | escaped_quote | -*_ ) >> '"';

This seems to work; it will eat \" as part of the string. I guess it should only eat the quote if there is an odd number of backslashes before it. I tried:

    sregex escaped_quote = as_xpr( '\\' ) >> '"';
    sregex escaped_backslash = as_xpr( "\\\\" );
    sregex string_literal = as_xpr( '"' ) >> *( space | escaped_backslash | escaped_quote | -*_ ) >> '"';

But it did not work. What are the precedence rules for the sub-elements in *( space | escaped_backslash | escaped_quote | -*_ )?

Next I will try to combine the string literal regex into the grammar so that I don't scan for braces within string literals.

Bjørn
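
On the precedence question, a small sketch may help. In Perl-style backtracking matchers the alternatives of "|" are tried from left to right, and the first one that lets the overall match succeed wins, even when a later alternative would match more text. The example below shows this with std::regex in its default ECMAScript mode. That xpressive orders its alternates the same way is an assumption here, not something confirmed in this thread, and first_match is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text matched by `pattern` at its first
// match in `input`, or an empty string if nothing matches.
std::string first_match(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    std::smatch what;
    return std::regex_search(input, what, re) ? what[0].str() : std::string();
}
```

With the input "ab", the pattern "a|ab" matches only "a" because the left alternative already succeeds, while "ab|a" matches "ab". Under that rule the more specific alternatives (escaped backslash, escaped quote) need to come before any catch-all alternative.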
participants (3): Bjørn Roald, Dick Bridges, Eric Niebler