Spirit Newbie (balanced parentheses)

Hi all,

I'm trying to use Spirit to write a parser for a simple language. The last time I had to do this I was in school (more than 15 years ago). I managed to get something working, but it does not handle "balanced parentheses". I know this is a common problem, but after searching for every possible spelling of "balanced parentheses" I did not find anything (that I could understand).

The files I want to parse are made of ASCII characters. They can contain commands that should be interpreted (replacement). Commands always start with "^!" (command switch).

An example of file content:

    text text text ^!for-each( A )( B ) text text text text

I want to send the "text" strings to std::cout and detect the commands "^!for-each( A )( B )" in order to process them. At the moment I only have one command, "for-each", where:

A : is a query that can contain /'" and balanced ()
B : is the text to be sent to output for each result of the query. This can be a 'script', meaning text + commands, for example:

    aa bb ^!for-each(C)(D) tt yy uu

Therefore B can also contain balanced ().

The rules I wrote do not handle balanced () very well. For example, the following command should be parsed successfully:

    ^!for-each( a/b/c()/e[]/d ) ( item [name] do ^!for-each(sub/text()) ( print() ) )

I need some advice. Thank you.

So here are my rules:

    {
        boost::spirit::chlit<> LPAREN('(');
        boost::spirit::chlit<> RPAREN(')');
        boost::spirit::strlit<> CMDSWITCH("^!");

        script // main rule
            = *( (boost::spirit::anychar_p - CMDSWITCH)
               | command
               );
        command
            = boost::spirit::discard_first_node_d[
                CMDSWITCH >>
                ( for_each
                | boost::spirit::eps_p // for error reporting
                )
              ];
        for_each
            = boost::spirit::discard_first_node_d[
                boost::spirit::as_lower_d["for-each"]
                >> *boost::spirit::space_p >> query
                >> *boost::spirit::space_p >> subscript
              ];
        query
            = boost::spirit::inner_node_d[
                LPAREN >> *(boost::spirit::anychar_p - ( RPAREN )) >> RPAREN
              ];
        subscript
            = boost::spirit::inner_node_d[
                LPAREN >>
                *( (boost::spirit::anychar_p - ( CMDSWITCH | RPAREN ))
                 | command
                 )
                >> RPAREN
              ];
    }

On Fri, Apr 16, 2010 at 11:41 AM, EricB
Perhaps something like this (untested, not currently at home, but should be valid):

    {
        using namespace boost::spirit::qi;    // I am lazy
        using namespace boost::spirit::ascii; // assuming ascii encoding

        script // main rule
            = command
            | char_
            ;
        command
            = "^!" >>
              ( for_each
              | eps // why an eps, why not just fail out?
              )
            ;
        for_each
            = no_case["for-each"] >>
              skip(space)
              [ query >> subscript
              ]
            ;
        query
            = '(' >> raw[stringparen_inner] >> ')'
            ;
        subscript
            = '(' >>
              ( command
              | stringparen_inner // command eats the possible "^!" first, no need to test
              ) >> ')'
            ;
        stringparen_inner
            = ('(' >> stringparen_inner >> ')')
            | ~char_(')')
            ;
    }

Do note, the above is written for the latest version of Spirit, whereas yours was written for the ancient, slower (and more verbose) version. That should handle nested parentheses and all just fine.
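The recursion behind stringparen_inner can also be illustrated without any Spirit at all. What follows is a minimal hand-written sketch in plain C++, not Spirit code: match_group is a hypothetical helper name, and it only shows how a rule that refers to itself copes with nested parentheses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit). Expects s[pos] == '(' and
// returns the index one past the matching ')', or std::string::npos
// when the group is not balanced.
std::size_t match_group(const std::string& s, std::size_t pos) {
    if (pos >= s.size() || s[pos] != '(') return std::string::npos;
    ++pos;                                    // consume the opening '('
    while (pos < s.size()) {
        if (s[pos] == ')') return pos + 1;    // consume ')' and finish
        if (s[pos] == '(') {                  // nested group: recurse
            pos = match_group(s, pos);
            if (pos == std::string::npos) return pos;
        } else {
            ++pos;                            // any other character
        }
    }
    return std::string::npos;                 // input ended before ')'
}
```

Called as match_group(text, 0) on a string that starts with "(", it returns the position just after the matching ")", so text.substr(0, end) is the whole balanced group, inner "()" included.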

Thank you, your example really helped me. I'm using the following as a test case:

    fdqfds dd d ^!for-each ( a/b/c()/e[]/d ) ( item )

Here are the rules:

    {
        boost::spirit::chlit<> LPAREN('(');
        boost::spirit::chlit<> RPAREN(')');
        boost::spirit::strlit<> CMDSWITCH("^!");

        script
            = *( (boost::spirit::anychar_p - CMDSWITCH)
               | command
               );
        /* */
        command
            = boost::spirit::discard_first_node_d[
                CMDSWITCH >>
                ( for_each
                | to_string
                | boost::spirit::eps_p [&calculator::error_log]
                )
              ];
        /* */
        for_each
            = boost::spirit::discard_first_node_d[
                boost::spirit::as_lower_d["for-each"]
                >> *boost::spirit::space_p >> query
                >> *boost::spirit::space_p >> subscript
              ];
        /* */
        query
            = boost::spirit::inner_node_d[
                LPAREN >> *( query_text1 | query_text2 ) >> RPAREN
              ];
        {   // this is to handle balanced () inside query
            query_text1 = boost::spirit::anychar_p - (LPAREN|RPAREN);
            query_text2 = LPAREN >> *( query_text1 | query_text2 ) >> RPAREN;
        }
        subscript
            = boost::spirit::inner_node_d[
                LPAREN >> *( command | subscript1 | subscript2 ) >> RPAREN
              ];
        {   // this is to handle balanced () and command inside subscript
            subscript1 = boost::spirit::anychar_p - (CMDSWITCH|LPAREN|RPAREN);
            subscript2 = LPAREN >> *( subscript1 | subscript2 | command ) >> RPAREN;
        }
        to_string = boost::spirit::as_lower_d["to-string"];
    }

I'm building a tree while parsing and then printing it using boost::spirit::tree_to_xml.

I have 2 additional questions:

1) The for-each statement produces the following AST. The nodes marked with XXX at the end of the line are produced by the "*boost::spirit::space_p" part of the for_each rule. Is there a way (a directive or something else in the grammar) to skip this content so that the marked nodes (XXX) do not appear in the AST? (I have tried discard_node_d and other directives, but nothing works.)

    <parsenode rule="for_eachID">
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode rule="queryID">
            <parsenode> ... </parsenode>
        </parsenode>
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode>                         XXX
            <value>\n</value>               XXX
        </parsenode>                        XXX
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode rule="subscriptID">
            <parsenode> ... </parsenode>
        </parsenode>
    </parsenode>

2) The query rule handles inner balanced parentheses, but for the input b/c()/e it produces the AST below. I would like the AST produced from the inner () to be flattened, that is, I would like the node marked with XXX not to be generated. Is there a directive for this?

    <parsenode> <value>b</value> </parsenode>
    <parsenode> <value>/</value> </parsenode>
    <parsenode> <value>c</value> </parsenode>
    <parsenode>                                   XXX
        <parsenode> <value>(</value> </parsenode>
        <parsenode> <value>)</value> </parsenode>
    </parsenode>                                  XXX
    <parsenode> <value>/</value> </parsenode>
    <parsenode> <value>e</value> </parsenode>

On Sat, Apr 17, 2010 at 12:51 PM, EricB
I have 2 additional questions:

1) The for-each statement produces the following AST. The nodes marked with XXX at the end of the line are produced by the "*boost::spirit::space_p" part of the for_each rule. Is there a way (a directive or something else in the grammar) to skip this content so that the marked nodes (XXX) do not appear in the AST? (I have tried discard_node_d and other directives, but nothing works.)

    <parsenode rule="for_eachID">
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode rule="queryID">
            <parsenode> ... </parsenode>
        </parsenode>
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode>                         XXX
            <value>\n</value>               XXX
        </parsenode>                        XXX
        <parsenode>                         XXX
            <value> </value>                XXX
        </parsenode>                        XXX
        <parsenode rule="subscriptID">
            <parsenode> ... </parsenode>
        </parsenode>
    </parsenode>

It is extremely simple in a more recent Spirit version, the directive
skip(space)[] in my code above handled that fine. You *really* need
to update, the version you are using is far more difficult to use,
slower, and far less capable.
On Sat, Apr 17, 2010 at 12:51 PM, EricB
2) The query rule handles inner balanced parentheses, but for the input b/c()/e it produces the AST below. I would like the AST produced from the inner () to be flattened, that is, I would like the node marked with XXX not to be generated. Is there a directive for this?

    <parsenode> <value>b</value> </parsenode>
    <parsenode> <value>/</value> </parsenode>
    <parsenode> <value>c</value> </parsenode>
    <parsenode>                                   XXX
        <parsenode> <value>(</value> </parsenode>
        <parsenode> <value>)</value> </parsenode>
    </parsenode>                                  XXX
    <parsenode> <value>/</value> </parsenode>
    <parsenode> <value>e</value> </parsenode>

That is what the raw[] directive in my code above is for. You really need to use the newer functionality rather than the old syntax that you are currently using.

Is there any way to manage this with 1.6.4? My parser is part of a bigger project, and I don't think I would be authorized to upgrade Spirit (it might have side effects). Moreover, I think there are issues with the compiler we use (BCC). I will try on my own computer, outside of the project versioning system.

I have tried with latest version of spirit. I get "Compiler not supported." Therefore I have to keep using 1.6.4.

On Sun, Apr 18, 2010 at 3:34 AM, EricB
I have tried with latest version of spirit. I get "Compiler not supported."
Therefore I have to keep using 1.6.4.
Ah, BCC, yes, that compiler has a *ton* of issues, and that is still putting it mildly. Why is your place using such an absolutely broken compiler? Any chance of using GCC/MSVC?

On 19/04/2010 02:01, OvermindDL1 wrote:
Ah, BCC, yes, that compiler has a *ton* of issues, and that is still putting it mildly. Why is your place using such an absolutely broken compiler? Any chance of using GCC/MSVC?
Some parts of the code have been checked to compile with GCC, so the change process is under way. But the software is more than 500K lines of code, so this will be a long process (more than one year, maybe two, I think), and my delivery is scheduled for June. So is there no way to simulate the behavior of the skip and raw directives with 1.6.4?
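One workaround that does not depend on any particular Spirit directive is to clean up the tree after parsing rather than during it. The following is only a sketch in plain C++ under stated assumptions: Node is a hypothetical stand-in for the parse-tree node type (it is not Spirit's own node class), and make_node, drop_blank_leaves and flatten_text are illustrative names. It shows the two effects asked about in questions 1 and 2 above: dropping whitespace-only leaves, and collapsing a subtree into its matched text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse-tree node; not a Spirit type.
struct Node {
    std::string rule;             // rule name, empty for anonymous nodes
    std::string value;            // matched text, used by leaf nodes
    std::vector<Node> children;
};

// Small helper to build a node without children.
Node make_node(const std::string& rule, const std::string& value) {
    Node n;
    n.rule = rule;
    n.value = value;
    return n;
}

// True when the node is a leaf whose text consists only of whitespace.
bool is_blank_leaf(const Node& n) {
    if (!n.children.empty() || n.value.empty()) return false;
    for (std::size_t i = 0; i < n.value.size(); ++i)
        if (!std::isspace(static_cast<unsigned char>(n.value[i]))) return false;
    return true;
}

// Recursively removes whitespace-only leaves from the tree.
void drop_blank_leaves(Node& n) {
    std::vector<Node> kept;
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (is_blank_leaf(n.children[i])) continue;
        drop_blank_leaves(n.children[i]);
        kept.push_back(n.children[i]);
    }
    n.children.swap(kept);
}

// Concatenates the text of all leaves in order, so nested groups
// such as "()" end up inline with their siblings.
std::string flatten_text(const Node& n) {
    if (n.children.empty()) return n.value;
    std::string out;
    for (std::size_t i = 0; i < n.children.size(); ++i)
        out += flatten_text(n.children[i]);
    return out;
}
```

The cost is an extra pass over the tree, but the grammar itself stays untouched, which may matter when the Spirit version cannot be changed.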

EricB wrote:
Is there any way to manage this with 1.6.4?

If you only need to check whether the string contains the same number of '(' and ')', what about:

    int counter = 0;
    rule<> r = ch_p('(')[increment_a(counter)] >> ~ch_p(')') >> ch_p(')')[decrement_a(counter)];
    if (counter != 0) {
        // unbalanced!
    }

L.R.
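The counting idea can be shown on its own, without any parser library. This is a minimal sketch in plain C++ and not Spirit code; parens_balanced is a hypothetical helper name. Note that comparing the totals alone would accept input such as ")(", so the sketch also rejects a ')' that arrives before its '('.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, independent of Spirit. Returns true when every
// '(' has a matching ')' and no ')' appears before its '('.
bool parens_balanced(const std::string& s) {
    int depth = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '(') {
            ++depth;
        } else if (s[i] == ')') {
            if (--depth < 0) return false;   // closing before opening
        }
    }
    return depth == 0;                       // everything opened was closed
}
```

This only validates the input; it does not say where each group starts and ends, which is what a grammar rule for nested parentheses provides.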

On Mon, Apr 19, 2010 at 8:22 AM, Lars Rohwedder
EricB wrote:
Is there any way to manage this with 1.6.4?
If you only need to check whether the string contains the same number of '(' and ')', what about:

    int counter = 0;
    rule<> r = ch_p('(')[increment_a(counter)] >> ~ch_p(')') >> ch_p(')')[decrement_a(counter)];
    if (counter != 0) {
        // unbalanced!
    }

Unnecessary, slower, and error-prone. There is still a 'raw' equivalent in Spirit1 as I recall, but I do not know the syntax, you might be better off to ask on the Spirit mailing list (just be sure to mention that you are using a broken compiler, else you will be inundated with posts saying to stop using the ancient and slow Spirit1 :) ).

On 4/20/2010 1:14 PM, OvermindDL1 wrote:
Unnecessary, slower, and error-prone. There is still a 'raw' equivalent in Spirit1 as I recall, but I do not know the syntax, you
Spirit1 is always "raw" in the sense that it is basically a transduction parser and returns the iterators to the matching range in the input.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel
Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon
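To make the remark about iterators concrete: a Spirit1 semantic action attached to a parser is called with the pair of iterators delimiting the matched input, so the matched text can be copied out directly. The sketch below imitates that in plain C++ without Spirit. The names match_until and collect_text are hypothetical, and match_until is a hand-written stand-in for a parser, not a Spirit API.

```cpp
#include <cassert>
#include <string>

// Hypothetical action in the style of a Spirit1 semantic action: it
// receives the matched iterator range and copies the text out.
struct collect_text {
    std::string& out;
    explicit collect_text(std::string& o) : out(o) {}

    template <typename It>
    void operator()(It first, It last) const { out.assign(first, last); }
};

// Hand-written stand-in for a parser. Matches characters up to (but not
// including) stop, hands the matched range to act, and returns its end.
template <typename It, typename Action>
It match_until(It first, It last, char stop, Action act) {
    It cur = first;
    while (cur != last && *cur != stop) ++cur;
    act(first, cur);
    return cur;
}
```

Because the action sees the whole range at once, the text arrives as one string instead of one node per character.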

I cannot post on gmane.comp.parsers.spirit.general with firebird! Is there a specific way to post there?

On Wed, Apr 21, 2010 at 6:05 AM, EricB
I cannot post on gmane.comp.parsers.spirit.general with firebird! Is there a specific way to post there?
Gmane is a pain, never touch it, use the official Spirit mailing list: https://lists.sourceforge.net/lists/listinfo/spirit-general
participants (4): EricB, Joel de Guzman, Lars Rohwedder, OvermindDL1