Re: [boost] Re: thoughts on a GUI component library

In-Reply-To: <cqcjii$hs0$1@sea.gmane.org> dave@boost-consulting.com (David Abrahams) wrote (abridged):
That's okay for simple things, but for any serious printing work you need to know you're talking to a printer. For example, you may need to embed postscript in your output stream.
Not all printers support postscript, so you'd need to know not just that you were printing, but what kind of printer it was. Although there probably needs to be some kind of abstraction-busting pass-through mechanism, I see it as the exception rather than the rule.

Another issue is transparency. My employer's drawing framework supports semi-transparency in the form of an alpha channel. This is used by us for things like anti-aliasing, and by users for subtle colour blends. Some printers can support transparency directly and some cannot. We found we needed a relatively elaborate architecture to hide that detail from application drawing code.

I'm a bit concerned about the scope of this boost project. Our drawing framework is pretty huge.

-- Dave Harris, Nottingham, UK

I realize that there is of course already the boost::spirit library for parsing, but I have recently released another recursive descent parsing library with a substantially different design. The YARD parser requires grammar productions to be defined as meta-functions and does not use operator overloading. The code base is quite a bit smaller, and I personally find the library easier to use than Spirit (of course I am very biased). The home page is at http://yard-parser.sf.net/ . Would there be any interest in this library for Boost? TIA -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

On Thu, 23 Dec 2004 18:53:54 -0500, christopher diggins <cdiggins@videotron.ca> wrote:
I realize that there is of course already the boost::spirit library for parsing, but I have recently released another recursive descent parsing library with a substantially different design. The YARD parser requires grammar productions to be defined as meta-functions and does not use operator overloading. The code base is quite a bit smaller, and I personally find the library easier to use than Spirit (of course I am very biased). The home page is at http://yard-parser.sf.net/ . Would there be any interest in this library for Boost? TIA
Chris,

I like Spirit and have a production system that parses a particular vendor's data packets using it. Works well. My only real complaint about Spirit is that it is quite a slow parser. This could be improved by having first / follow set kind of things without an interface change. Spirit is quite good at representing the LL grammars you normally deal with when doing RDP.

Do you have an example of two small grammars that might highlight why YARD is better?

Regards, Matt matthurd@acm.org

----- Original Message ----- From: "Matt Hurd" <matt.hurd@gmail.com> To: <boost@lists.boost.org> Sent: Friday, December 24, 2004 1:45 AM Subject: Re: [boost] Interest in a Recursive Descent Parsing Library
On Thu, 23 Dec 2004 18:53:54 -0500, christopher diggins <cdiggins@videotron.ca> wrote:
I realize that there is of course already the boost::spirit library for parsing, but I have recently released another recursive descent parsing library with a substantially different design. The YARD parser requires grammar productions to be defined as meta-functions and does not use operator overloading. The code base is quite a bit smaller, and I personally find the library easier to use than Spirit (of course I am very biased). The home page is at http://yard-parser.sf.net/ . Would there be any interest in this library for Boost? TIA
Chris,
I like Spirit and have a production system that parses a particular vendor's data packets using it. Works well. My only real complaint about Spirit is that it is quite a slow parser. This could be improved by having first / follow set kind of things without an interface change. Spirit is quite good at representing the LL grammars you normally deal with when doing RDP.
Do you have an example of two small grammars that might highlight why YARD is better?
Hi Matt,

I would not say YARD is a better parser than Spirit; it is designed with quite different goals. YARD is designed to be compact and flexible. Concerning "first/follow set kind of things", I am not sure what you mean. I know that the YARD parser makes it very easy to write custom rules, which could possibly allow what you are looking for. In terms of a grammar example, the YARD release contains an XML grammar, but I am not sure precisely how to express an XML grammar in Spirit.

Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

On 12/24/2004 12:13 AM, christopher diggins wrote: [snip]
I would not say YARD is a better parser than Spirit; it is designed with quite different goals. YARD is designed to be compact and flexible. Concerning "first/follow set kind of things", I am not sure what you mean.
They're used to determine the next non-terminal to parse by looking ahead 1 terminal symbol (for an LL(1) parser). Informally, they're defined for each non-terminal in the language. For a non-terminal, N, FIRST(N) is the set of terminals which can start a sentence derived from N. Similarly, FOLLOW(N) is the set of terminals that can follow such a sentence. For a formal definition, see the paper "A Strong LL(k) Parser Generator That Accepts Non-LL Grammars and Generates LL(1) Tables: Extended Abstract", available at http://www.iit.edu/~tc/llk.htm . For an implementation, see: http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/gram... Actually, that's been replaced by one more in line with spirit (more meta-programming). I could upload it to the sandbox if you're interested.
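For readers following along, the informal definitions above can be sketched as a fixed-point computation. This is a toy illustration only, not the sandbox code linked above; the single-character symbol convention (uppercase = non-terminal) and the omission of epsilon productions are my own simplifications:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One production: head non-terminal -> string of symbols.
// Toy convention: uppercase letters are non-terminals, everything
// else is a terminal. Epsilon productions are omitted for brevity.
using Production = std::pair<char, std::string>;

// Iterate to a fixed point: keep propagating "terminal t can begin a
// sentence derived from N" until no FIRST set grows any further.
std::map<char, std::set<char>> compute_first(const std::vector<Production>& g)
{
    std::map<char, std::set<char>> first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Production& p : g) {
            char head = p.second[0];  // leftmost symbol of the body
            std::set<char> add;
            if (std::isupper(static_cast<unsigned char>(head)))
                add = first[head];    // FIRST of a non-terminal
            else
                add.insert(head);     // a terminal begins itself
            for (char t : add)
                if (first[p.first].insert(t).second)
                    changed = true;
        }
    }
    return first;
}
```

For the grammar S -> A | B, A -> aA | a, B -> b, this yields FIRST(S) = {a, b}, which is exactly the table a one-symbol-lookahead dispatch would consult.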

----- Original Message ----- From: "Larry Evans" <cppljevans@cox-internet.com> To: <boost@lists.boost.org> Sent: Friday, December 24, 2004 8:42 AM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
On 12/24/2004 12:13 AM, christopher diggins wrote: [snip]
I would not say YARD is a better parser than Spirit; it is designed with quite different goals. YARD is designed to be compact and flexible. Concerning "first/follow set kind of things", I am not sure what you mean.
They're used to determine the next non-terminal to parse by looking ahead 1 terminal symbol (for an LL(1) parser). [snip]
In YARD, lookahead can easily be done by hand. Consider the following grammar (which is in fact a complete and legal YARD grammar):

struct StartRule : public re_or<RuleAB, RuleCD> { /* empty */ };
struct RuleAB : public re_or<RuleA, RuleB> { /* empty */ };
struct RuleCD : public re_or<RuleC, RuleD> { /* empty */ };
struct RuleA : public re_and<re_plus<MatchChar<'a'> > > { /* empty */ }; // matches a a*
struct RuleB : public re_and<re_plus<MatchChar<'b'> > > { /* empty */ }; // matches b b*
struct RuleC : public re_and<re_plus<MatchChar<'c'> > > { /* empty */ }; // matches c c*
struct RuleD : public re_and<re_plus<MatchChar<'d'> > > { /* empty */ }; // matches d d*

The straightforward way to implement a look-ahead rule in YARD would be to rewrite the StartRule as:

struct StartRule {
    template<typename Elem_T>
    static bool Accept(ParserInputStream<Elem_T>& in) {
        switch (in.GetChar()) {
            case 'a' : return match<RuleA>(in);
            case 'b' : return match<RuleB>(in);
            case 'c' : return match<RuleC>(in);
            case 'd' : return match<RuleD>(in);
            default : return false;
        }
    }
};

This approach requires the programmer to figure out the FIRST(N) table by hand. I have found that there are typically only a couple of performance bottlenecks where lookahead is actually needed, and that generating these tables by hand is easy. Perhaps others' experience is different.

I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.

Christopher Diggins http://www.cdiggins.com
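For readers who want to try the switch-based dispatch without the YARD headers, here is a self-contained mock of the idea. The stream type and match functions below are stand-ins invented for illustration, not YARD's real ParserInputStream interface:

```cpp
#include <cstddef>
#include <string>

// Stand-in for YARD's input stream; only the members the switch
// example needs. The real interface is assumed, not quoted.
struct CharInput {
    std::string data;
    std::size_t pos;
    explicit CharInput(const std::string& s) : data(s), pos(0) {}
    char GetChar() const { return pos < data.size() ? data[pos] : '\0'; }
    bool AtEnd() const { return pos >= data.size(); }
};

// Mimics the grammar's "x x*" rules: consume one or more of C.
template<char C>
bool match_run(CharInput& in)
{
    if (in.GetChar() != C) return false;
    while (in.GetChar() == C) ++in.pos;
    return true;
}

// The hand-rolled FIRST dispatch from the StartRule rewrite above:
// one character of lookahead selects the rule, with no backtracking.
bool start_rule(CharInput& in)
{
    switch (in.GetChar()) {
        case 'a': return match_run<'a'>(in);
        case 'b': return match_run<'b'>(in);
        case 'c': return match_run<'c'>(in);
        case 'd': return match_run<'d'>(in);
        default:  return false;
    }
}
```

With this, `start_rule` accepts runs such as "aaa" or "bb" and rejects anything whose first character is outside the hand-computed FIRST set.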

On 12/24/2004 09:25 AM, christopher diggins wrote: [snip]
I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.
Gotta go on Christmas trip. Back Sunday. I haven't got the parser with deterministic lookahead (I don't think), but I do have the lookahead sets calculated. I'll see if I can gen the parser and upload to sandbox when I get back.

On 12/24/2004 10:32 AM, Larry Evans wrote: [snip]
Gotta go on Christmas trip. Back Sunday. I haven't got the parser with deterministic lookahead (I don't think), but I do have the lookahead sets calculated. I'll see if I can gen the parser and upload to sandbox when I get back.

I've just uploaded to sandbox, libs/grammar_pipeline/test/spirit_proto/main.cpp which calculates the first, follow, and other sets (used for error recovery). I've not gen'ed the parser, but there is a version of the parser based on virtual functions instead of spirit's static functions. The static version of the functions would probably look very similar. If there's any interest, I'll upload this parser with virtual functions. Let me know.
Regards, Larry

christopher diggins wrote:
struct StartRule {
    template<typename Elem_T>
    static bool Accept(ParserInputStream<Elem_T>& in) {
        switch (in.GetChar()) {
            case 'a' : return match<RuleA>(in);
            case 'b' : return match<RuleB>(in);
            case 'c' : return match<RuleC>(in);
            case 'd' : return match<RuleD>(in);
            default : return false;
        }
    }
};
This approach requires the programmer to figure out the FIRST(N) table by hand. I have found that there are typically only a couple of performance bottlenecks where lookahead is actually needed, and that generating these tables by hand is easy. Perhaps others' experience is different.
I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.
See: http://www.boost.org/libs/spirit/doc/switch_parser.html. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

----- Original Message ----- From: "Joel de Guzman" <joel@boost-consulting.com> To: <boost@lists.boost.org> Sent: Friday, December 24, 2004 6:21 PM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
christopher diggins wrote:
struct StartRule {
    template<typename Elem_T>
    static bool Accept(ParserInputStream<Elem_T>& in) {
        switch (in.GetChar()) {
            case 'a' : return match<RuleA>(in);
            case 'b' : return match<RuleB>(in);
            case 'c' : return match<RuleC>(in);
            case 'd' : return match<RuleD>(in);
            default : return false;
        }
    }
};
This approach requires the programmer to figure out the FIRST(N) table by hand. I have found that there are typically only a couple of performance bottlenecks where lookahead is actually needed, and that generating these tables by hand is easy. Perhaps others' experience is different.
I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.
See: http://www.boost.org/libs/spirit/doc/switch_parser.html.
I think this does a good job of illustrating the difference between Spirit and YARD. Spirit would require the following code for the grammar production written above using YARD:

rule<> rule_overall =
    switch_p
    [
        case_p<'a'>(parser_a),
        case_p<'b'>(parser_b),
        case_p<'c'>(parser_c),
        case_p<'d'>(parser_d),
        default_p(parser_default)
    ];

With Spirit there is an arbitrary upper bound on the number of case statements, which defaults to 3(!?). I would think that it is safe to say that the YARD parser is easier to use and has fewer arbitrary constraints, *BUT* does not provide dynamic rule expressions, and is more verbose.

Is this balance of features in a CFG parsing library something that interests anyone in the Boost community?

Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

christopher diggins wrote:
I think this does a good job of illustrating the difference between Spirit and YARD. Spirit would require the following code, for the grammar production written above using YARD:
rule<> rule_overall =
    switch_p
    [
        case_p<'a'>(parser_a),
        case_p<'b'>(parser_b),
        case_p<'c'>(parser_c),
        case_p<'d'>(parser_d),
        default_p(parser_default)
    ];
With Spirit there is an arbitrary upper bound on the number of case statements, which defaults to 3(!?). I would think that it is safe to say that the YARD parser is easier to use, and has fewer arbitrary constraints *BUT* does not provide dynamic rule expressions, and is more verbose.
What's the problem with the default value? Regards Hartmut

christopher diggins wrote:
----- Original Message ----- From: "Joel de Guzman" <joel@boost-consulting.com> To: <boost@lists.boost.org> Sent: Friday, December 24, 2004 6:21 PM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
christopher diggins wrote:
struct StartRule {
    template<typename Elem_T>
    static bool Accept(ParserInputStream<Elem_T>& in) {
        switch (in.GetChar()) {
            case 'a' : return match<RuleA>(in);
            case 'b' : return match<RuleB>(in);
            case 'c' : return match<RuleC>(in);
            case 'd' : return match<RuleD>(in);
            default : return false;
        }
    }
};
This approach requires the programmer to figure out the FIRST(N) table by hand. I have found that there are typically only a couple of performance bottlenecks where lookahead is actually needed, and that generating these tables by hand is easy. Perhaps others' experience is different.
I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.
See: http://www.boost.org/libs/spirit/doc/switch_parser.html.
I think this does a good job of illustrating the difference between Spirit and YARD. Spirit would require the following code, for the grammar production written above using YARD:
rule<> rule_overall =
    switch_p
    [
        case_p<'a'>(parser_a),
        case_p<'b'>(parser_b),
        case_p<'c'>(parser_c),
        case_p<'d'>(parser_d),
        default_p(parser_default)
    ];
With Spirit there is an arbitrary upper bound on the number of case statements, which defaults to 3(!?). I would think that it is safe to say that the YARD parser is easier to use, and has fewer arbitrary constraints *BUT* does not provide dynamic rule expressions, and is more verbose.
Is this balance of features in a CFG parsing library something that interests anyone in the Boost community?
If comparison to your *hand-coded* switch statement is your rationale, then that's unfair. You can also write a hand-coded single token switch parser using spirit's functor parser: http://www.boost.org/libs/spirit/doc/functor_parser.html Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

----- Original Message ----- From: "Joel de Guzman" <joel@boost-consulting.com> To: <boost@lists.boost.org> Sent: Saturday, December 25, 2004 6:43 PM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
christopher diggins wrote:
----- Original Message ----- From: "Joel de Guzman" <joel@boost-consulting.com> To: <boost@lists.boost.org> Sent: Friday, December 24, 2004 6:21 PM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
christopher diggins wrote:
struct StartRule {
    template<typename Elem_T>
    static bool Accept(ParserInputStream<Elem_T>& in) {
        switch (in.GetChar()) {
            case 'a' : return match<RuleA>(in);
            case 'b' : return match<RuleB>(in);
            case 'c' : return match<RuleC>(in);
            case 'd' : return match<RuleD>(in);
            default : return false;
        }
    }
};
This approach requires the programmer to figure out the FIRST(N) table by hand. I have found that there are typically only a couple of performance bottlenecks where lookahead is actually needed, and that generating these tables by hand is easy. Perhaps others' experience is different.
I would be curious how to express the above grammars in Spirit, with and without the hand-rolled lookahead rule.
See: http://www.boost.org/libs/spirit/doc/switch_parser.html.
I think this does a good job of illustrating the difference between Spirit and YARD. Spirit would require the following code, for the grammar production written above using YARD:
rule<> rule_overall =
    switch_p
    [
        case_p<'a'>(parser_a),
        case_p<'b'>(parser_b),
        case_p<'c'>(parser_c),
        case_p<'d'>(parser_d),
        default_p(parser_default)
    ];
With Spirit there is an arbitrary upper bound on the number of case statements, which defaults to 3(!?). I would think that it is safe to say that the YARD parser is easier to use, and has fewer arbitrary constraints *BUT* does not provide dynamic rule expressions, and is more verbose.
Is this balance of features in a CFG parsing library something that interests anyone in the Boost community?
If comparison to your *hand-coded* switch statement is your rationale, then that's unfair. You can also write a hand-coded single token switch parser using spirit's functor parser:
I didn't mean to make an unfair comparison; I simply used the resource you pointed me to (see above). I looked at the functor parser documentation and I honestly can't figure out how to rewrite the above example. Perhaps someone else could do it. That single comparison was only a small part of the basis for my rationale. Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

christopher diggins wrote: <snip an entire post>
I didn't mean to make an unfair comparison; I simply used the resource you pointed me to (see above). I looked at the functor parser documentation and I honestly can't figure out how to rewrite the above example. Perhaps someone else could do it. That single comparison was only a small part of the basis for my rationale.
Please limit the amount of quoted texts in your postings. -- Dave Abrahams Posting Nanny

christopher diggins wrote:
----- Original Message ----- From: "Joel de Guzman"
If comparison to your *hand-coded* switch statement is your rationale, then that's unfair. You can also write a hand-coded single token switch parser using spirit's functor parser:
I didn't mean to make an unfair comparison; I simply used the resource you pointed me to (see above). I looked at the functor parser documentation and I honestly can't figure out how to rewrite the above example. Perhaps someone else could do it. That single comparison was only a small part of the basis for my rationale.
Hi Christopher,

Well, you use it to write your hand-coded switch. Anyway, I would suggest that you stop comparing apples and oranges and just provide a rationale based on the merits of your library alone. IMHO, comparisons to spirit will be meaningless if you are not at all familiar with it. Well, good luck!

Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

----- Original Message ----- From: "Joel de Guzman" <joel@boost-consulting.com> To: <boost@lists.boost.org> Sent: Sunday, December 26, 2004 8:37 PM Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
christopher diggins wrote:
----- Original Message ----- From: "Joel de Guzman"
Hi Christopher,
Well, you use it to write your hand coded switch.
I did get that far ;)
Anyway, I would suggest that you stop comparing apples and oranges and just provide a rationale based on the merits of your library alone.
I didn't mean to compare an apple to an orange.
IMHO, comparisons to spirit will be meaningless if you are not at all familiar with it.
For the record, because I don't want to be misrepresented as someone who has not made the effort to learn Spirit, I did spend 8 to 10 hours studying the documentation, after which I wrote a few examples with Spirit.
Well, good luck!
Thank you, and congratulations on implementing such a successful and powerful tool. Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

christopher diggins wrote:
Anyway, I would suggest that you stop comparing apples and oranges and just provide a rationale based on the merits of your library alone.
I didn't mean to compare an apple to an orange.
IMHO, comparisons to spirit will be meaningless if you are not at all familiar with it.
For the record, because I don't want to be misrepresented as someone who has not made the effort to learn Spirit, I did spend 8 to 10 hours studying the documentation, after which I wrote a few examples with Spirit.
Ehmm... I do not think 8 to 10 hours of study is enough to make a valid comparative analysis. This is also the very reason why I chose not to compare Spirit with YACC, PCCTS, RDP etc. As much as I hate language wars like "hey C# is a lot better than C++", I also dislike library wars. Often, such discussions degenerate into meaningless "hey, X can do this, Y can't". Yeah, right ;-)
Well, good luck!
Thank you, and congratulations on implementing such a successful and powerful tool.
Thanks. Again, good luck! Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

----- Original Message ----- From: "Joel" <joel@boost-consulting.com>
Ehmm... I do not think 8 to 10 hours of study is enough to make a valid comparative analysis.
You are right, it isn't. I apologize, I just didn't want to appear completely uninformed ;-)
This is also the very reason why I chose not to compare Spirit with YACC, PCCTS, RDP etc. As much as I hate language wars like "hey C# is a lot better than C++", I also dislike library wars. Often, such discussions degenerate into meaningless "hey, X can do this, Y can't". Yeah, right ;-)
This is an excellent point. I will attempt to avoid making any such comparisons in the future then. I do believe YARD and Spirit to be completely different tools and that YARD occupies a useful and somewhat smaller niche. YARD was intentionally designed as a CFG parser which supports only statically defined grammar productions and semantic actions, and without any operator overloading. Do you have any suggestions on how I can and should go about presenting a convincing argument for the utility of such a library? I have written extensive documentation at http://yard-parser.sf.net ; the library is freely available in the public domain at http://www.sf.net/projects/yard-parser and comes with a full XML parser. Thanks a lot,

Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

christopher diggins wrote:
----- Original Message ----- From: "Joel" <joel@boost-consulting.com>
actions, and without any operator overloading. Do you have any suggestions on how I can and should go about presenting a convincing argument for the utility of such a library? I have written extensive
Well, to be honest, I'm not sure I can help you because I haven't investigated your library yet enough to form an opinion. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Matt Hurd wrote:
On Thu, 23 Dec 2004 18:53:54 -0500, christopher diggins <cdiggins@videotron.ca> wrote:
I realize that there is of course already the boost::spirit library for parsing, but I have recently released another recursive descent parsing library with a substantially different design. The YARD parser requires grammar productions to be defined as meta-functions and does not use operator overloading. The code base is quite a bit smaller, and I personally find the library easier to use than Spirit (of course I am very biased). The home page at http://yard-parser.sf.net/ . Would there be any interest in this of library for Boost? TIA
Chris,
I like Spirit and have a production system that parses a particular vendors data packets using it. Works well. My only real complaint about Spirit is that it is quite a slow parser. This could be improved by having first / follow set kind of things without an interface change. Spirit is quite good at representing the LL grammars you normal deal with when doing RDP.
FYI, Spirit-2, being developed now, will focus on performance. A limited test case shows a significant increase in speed. And, yes, a deterministic parsing scheme (first/follow) is part of the plan. Actually, you can already take advantage of deterministic parsing using the current switch_p parsers or a technique we call the "Nabialek trick". Surely, more deterministic schemes will be in place. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
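For readers unfamiliar with the "Nabialek trick" mentioned above: the idea is that the token just matched selects, from a symbol table, the parser for what may follow. A minimal plain-C++ mock of that control flow (the names and helpers below are invented for illustration and are not Spirit's actual API):

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// A handler parses whatever may follow its keyword.
using Handler = std::function<bool(std::istringstream&)>;

// Parse "keyword rest...": the keyword just matched picks the
// continuation via one table lookup, with no backtracking among
// the alternatives.
bool parse_command(const std::string& text)
{
    const std::map<std::string, Handler> keywords = {
        { "print", [](std::istringstream& in) {            // print <arg>
              std::string arg;
              return static_cast<bool>(in >> arg);
          } },
        { "quit", [](std::istringstream&) { return true; } },  // quit
    };
    std::istringstream in(text);
    std::string kw;
    if (!(in >> kw)) return false;
    auto it = keywords.find(kw);   // deterministic rule selection
    return it != keywords.end() && it->second(in);
}
```

Because the table can be populated at run time, this also hints at why the trick gives Spirit a degree of dynamic, deterministic parsing today.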

Joel de Guzman wrote:
FYI, Spirit-2, being developed now, will focus on performance. A limited test case shows a significant increase in speed. And, yes, a deterministic parsing scheme (first/follow) is part of the plan. Actually, you can already take advantage of deterministic parsing using the current switch_p parsers or a technique we call the "Nabialek trick". Surely, more deterministic schemes will be in place.
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Joel de Guzman wrote:
FYI, Spirit-2, being developed now, will focus on performance. A limited test case shows a significant increase in speed. And, yes, a deterministic parsing scheme (first/follow) is part of the plan. Actually, you can already take advantage of deterministic parsing using the current switch_p parsers or a technique we call the "Nabialek trick". Surely, more deterministic schemes will be in place.
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic, typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, c++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.

Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
David Abrahams wrote:
Joel de Guzman wrote:
FYI, Spirit-2, being developed now, will focus on performance. A limited test case shows a significant increase in speed. And, yes, a deterministic parsing scheme (first/follow) is part of the plan. Actually, you can already take advantage of deterministic parsing using the current switch_p parsers or a technique we call the "Nabialek trick". Surely, more deterministic schemes will be in place.
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic,
Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :) As I understand "nondeterministic" it just means you're doing backtracking when a parse gets into trouble, unlike in an NFA where you're conceptually moving along all viable paths in parallel. I don't see why the use of first/follow should preclude backtracking. You can always define ambiguous grammars with dynamic failure conditions.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, c++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic,
Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :)
Using first-follow makes a parser deterministic; it can't be both. An LL(1) parser, for instance, doesn't do backtracking. It always has one and only one choice when faced with alternatives. It's roughly like the switch (deterministic) versus the cascading ifs (non-deterministic). However... (see below)
As I understand "nondeterministic" it just means you're doing backtracking when a parse gets into trouble, unlike in an NFA where you're conceptually moving along all viable paths in parallel. I don't see why the use of first/follow should preclude backtracking. You can always define ambiguous grammars with dynamic failure conditions.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, c++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that you mention. LL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first-follow computation will fail (first-follow conflicts).

I'm developing a scheme where you can mix predictive LL(1) with classic spirit's non-determinism. The scheme involves using a predictive front, backed up by a non-predictive fall-back step when ambiguity arises. It's roughly like combining the switch with cascading-ifs. It's still too early to tell whether such a scheme will hold water. AFAIK, no one has done it yet.

An alternative scheme is to let the user switch strategies (possibly using spirit directives). Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.

Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
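The "predictive front, non-predictive fall-back" idea can be sketched in plain C++. This is only an illustration of the control flow under my own simplified assumptions (single-character lookahead, literal-string alternatives), not Spirit-2 code:

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// A parser consumes input at pos; on success it advances pos.
using Parser = std::function<bool(const std::string&, std::size_t&)>;

// Literal-string parser for the demo.
Parser lit(const std::string& s)
{
    return [s](const std::string& in, std::size_t& pos) {
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    };
}

// An alternative annotated with its (hand-supplied) FIRST set.
struct Alt { std::set<char> first; Parser parse; };

// Predictive front: try only alternatives whose FIRST set predicts the
// lookahead, still backtracking among them if they conflict. Fall-back:
// if none succeeds, try the unpredicted rest, cascading-ifs style.
bool parse_mixed(const std::vector<Alt>& alts, const std::string& in,
                 std::size_t& pos)
{
    const char la = pos < in.size() ? in[pos] : '\0';
    for (const Alt& a : alts) {               // predicted alternatives
        if (!a.first.count(la)) continue;
        std::size_t save = pos;
        if (a.parse(in, pos)) return true;
        pos = save;                           // backtrack
    }
    for (const Alt& a : alts) {               // non-predictive fall-back
        if (a.first.count(la)) continue;
        std::size_t save = pos;
        if (a.parse(in, pos)) return true;
        pos = save;
    }
    return false;
}
```

When two alternatives share a FIRST symbol (a classic first-follow conflict), the front simply degenerates into ordered trial with backtracking, which is the mix being described.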

Joel de Guzman wrote:
David Abrahams wrote:
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic,
Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :)
Using first-follow makes a parser deterministic; it can't be both. An LL(1) parser, for instance, doesn't do backtracking.
Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
It always has one and only one choice when faced with alternatives. It's roughly like the switch (deterministic) versus the cascading ifs (non-deterministic).
However... (see below)
As I understand "nondeterministic" it just means you're doing backtracking when a parse gets into trouble, unlike in an NFA where you're conceptually moving along all viable paths in parallel. I don't see why the use of first/follow should preclude backtracking. You can always define ambiguous grammars with dynamic failure conditions.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, c++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that you mention.
Yes, I know that. But then, LL(1) is not synonymous with "recursive descent with first/follow."
LL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first-follow computation will fail (first-follow conflicts).
Hum? The algorithm for computing the first and follow sets of a symbol doesn't have an "otherwise the computation fails" branch, and the definitions of these sets are simple:

first(a) = the set of tokens that can begin strings derived from a.
follow(N) = the set of tokens that can follow N in grammatical sequences.

You can use this information to eliminate hopeless branches of a parse and avoid wasting effort backtracking, without requiring the grammar to be LL(1) or LL(2) or... even to be unambiguous at all.
I'm developing a scheme where you can mix both predictive LL(1) with classic spirit's non-determinism. The scheme involves using a predictive front, backed up by a non-predictive fall-back step when ambiguity arises. It's roughly like combining the switch with cascading-ifs.
Maybe we're talking about the same thing.
It's still too early to tell whether such a scheme will hold water. AFAIK, no one has done it yet.
If you're talking about what I'm describing, I don't think there's anything particularly radical about it, and I'd be very surprised if it hadn't been done before.
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Joel de Guzman wrote:
David Abrahams wrote:
Is first/follow really linked to determinism? It seems to me that you can use first/follow sets to prune unproductive searches in a nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic,
Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :)
Using first-follow makes a parser deterministic, it can't be both. An LL(1) parser, for instance, doesn't do backtracking.
Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
Hmmm... I don't recall mentioning that. Did I?
It always has one and only one choice when faced with alternatives. It's roughly like the switch (deterministic) versus the cascading ifs (non-deterministic).
However... (see below)
As I understand "nondeterministic" it just means you're doing backtracking when a parse gets into trouble, unlike in an NFA where you're conceptually moving along all viable paths in parallel. I don't see why the use of first/follow should preclude backtracking. You can always define ambiguous grammars with dynamic failure conditions.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, C++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that you mention.
Yes, I know that. But then, LL(1) is not synonymous with "recursive descent with first/follow."
Sure. But I don't see the point. What am I missing?
NLL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first follow computation will fail (first follow conflicts).
Hum? The algorithm for computing the first and follow sets of a symbol doesn't have an "otherwise the computation fails" branch, and the definitions of these sets are simple:
first(a) = the set of tokens that can begin strings derived from a.
follow(N) = the set of tokens that can follow N in grammatical sequences.
You can use this information to eliminate hopeless branches of a parse and avoid wasting effort backtracking without requiring the grammar to be LL(1) or LL(2) or... even to be unambiguous at all.
Yes. In theory.
I'm developing a scheme where you can mix both predictive LL(1) with classic spirit's non-determinism. The scheme involves using a predictive front, backed up by a non-predictive fall-back step when ambiguity arises. It's roughly like combining the switch with cascading-ifs.
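The hybrid just described might be sketched like this in modern C++ (a minimal illustration under the simplest possible assumptions: literal alternatives only, and every name invented): a table keyed on the first character plays the deterministic "switch" front, and the candidates sharing that character are then tried in order, cascading-ifs style.

```cpp
#include <map>
#include <string>
#include <vector>

struct Hybrid {
    // first character → the alternatives whose first set contains it
    std::map<char, std::vector<std::string>> table;

    explicit Hybrid(const std::vector<std::string>& alts) {
        for (const std::string& a : alts)
            if (!a.empty()) table[a[0]].push_back(a);
    }

    bool parse(const std::string& input) const {
        if (input.empty()) return false;
        auto it = table.find(input[0]);          // deterministic "switch" front
        if (it == table.end()) return false;     // no alternative can start here
        for (const std::string& a : it->second)  // cascading-ifs fallback
            if (input.compare(0, a.size(), a) == 0) return true;
        return false;
    }
};
```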
Maybe we're talking about the same thing.
Yes! That was what I was trying to point out from the start. However, it's not as simple as it seems.
It's still too early to tell whether such a scheme will hold water. AFAIK, no one has done it yet.
If you're talking about what I'm describing, I don't think there's anything particularly radical about it, and I'd be very surprised if it hadn't been done before.
AFAIK, no. Correct me if I'm wrong.
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering). In a syntactic and semantic predicate, all actions are intentionally inhibited. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
David Abrahams wrote:
Joel de Guzman wrote:
Using first-follow makes a parser deterministic, it can't be both. An LL(1) parser, for instance, doesn't do backtracking.
Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
Hmmm... I don't recall mentioning that. Did I?
No. I'm arguing that the mere use of first/follow does not in and of itself make a parser deterministic.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, C++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that you mention.
Yes, I know that. But then, LL(1) is not synonymous with "recursive descent with first/follow."
Sure. But I don't see the point. What am I missing?
Well, I say "you could use first/follow to prune unproductive parses while still using dynamic failure conditions to get nondeterminism," and you come back with "Classic LL(1) doesn't have those." I assume you're using that as an argument against my statement, but it's a hollow argument because classic LL(1) is only one way to use first/follow with recursive descent.
NLL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first follow computation will fail (first follow conflicts).
Hum? The algorithm for computing the first and follow sets of a symbol doesn't have an "otherwise the computation fails" branch, and the definitions of these sets are simple:
first(a) = the set of tokens that can begin strings derived from a.
follow(N) = the set of tokens that can follow N in grammatical sequences.
You can use this information to eliminate hopeless branches of a parse and avoid wasting effort backtracking without requiring the grammar to be LL(1) or LL(2) or... even to be unambiguous at all.
Yes. In theory.
So "first follow computation will fail" is false. BTW, I don't know what NLL(1) means. Nondeterministic LL(1)? I can't even picture that.
Maybe we're talking about the same thing.
Yes! That was what I was trying to point out from the start. However, it's not as simple as it seems.
Why not?
If you're talking about what I'm describing, I don't think there's anything particularly radical about it, and I'd be very surprised if it hadn't been done before.
AFAIK, no. Correct me if I'm wrong.
I once sent you a comparative study of some NLP parsing strategies. One of those was using first/follow computation and parsing ambiguous grammars. I'm pretty sure they were using backtracking or that they told me that they had thought of adding it.
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering).
It's not a problem if you build a parse tree and run the analysis on that (which is sometimes required for synthesized attributes anyway).
In a syntactic and semantic predicate, all actions are intentionally inhibited.
Have you considered the strategy of http://docs.biostat.wustl.edu/cgi-bin/info2html?(bison.info.gz)GLR%2520Parse... (http://tinyurl.com/694kq) ?? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Joel de Guzman wrote:
David Abrahams wrote:
Joel de Guzman wrote:
Using first-follow makes a parser deterministic, it can't be both. An LL(1) parser, for instance, doesn't do backtracking.
Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
Hmmm... I don't recall mentioning that. Did I?
No. I'm arguing that the mere use of first/follow does not in and of itself make a parser deterministic.
typically with one symbol lookahead (LL(1)). A strictly predictive parser is a lot more restrictive in the grammars that one can use, however. There's no way to make it parse ad hoc grammars like the C pre-processor or, ahem, C++. What I am inclined to do is to mix determinism with classic spirit's non-determinism. Exactly the recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that you mention.
Yes, I know that. But then, LL(1) is not synonymous with "recursive descent with first/follow."
Sure. But I don't see the point. What am I missing?
Well, I say "you could use first/follow to prune unproductive parses while still using dynamic failure conditions to get nondeterminism," and you come back with "Classic LL(1) doesn't have those." I assume you're using that as an argument against my statement, but it's a hollow argument because classic LL(1) is only one way to use first/follow with recursive descent.
Oh. I see. No, I'm not using it as an argument, and no, I am not disputing your statement. That's the very reason I qualified my statement with "classic LL(1)". Actually, we were in agreement (or at least I thought so).
NLL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first follow computation will fail (first follow conflicts).
Hum? The algorithm for computing the first and follow sets of a symbol doesn't have an "otherwise the computation fails" branch, and the definitions of these sets are simple:
first(a) = the set of tokens that can begin strings derived from a.
follow(N) = the set of tokens that can follow N in grammatical sequences.
You can use this information to eliminate hopeless branches of a parse and avoid wasting effort backtracking without requiring the grammar to be LL(1) or LL(2) or... even to be unambiguous at all.
Yes. In theory.
So "first follow computation will fail" is false. BTW, I don't know what NLL(1) means. Nondeterministic LL(1)? I can't even picture that.
No, it is true. BTW, the "NLL" is a typo introduced by your quoting of my post. Here's the original:
Classic LL(1) does not have the "dynamic failure conditions" that you mention. LL(1) grammars cannot have ambiguity. If the grammar is ambiguous, first follow computation will fail (first follow conflicts).
When the grammar is ambiguous, the computation of the first set will fail because the set itself cannot have multiple same entries. You'll have to look ahead more than one token to resolve the ambiguity, and by doing so, the grammar is no longer LL(1).
Maybe we're talking about the same thing.
Yes! That was what I was trying to point out from the start. However, it's not as simple as it seems.
Why not?
Long story. Anyway, the basic problem is that Spirit parsers are dynamic. The computation of the first set, for instance, cannot be done at compile time. Say for example:

r = str_p("apple") | "banana" | "orange";

the first set of r is: ['a', 'b', 'o']. This can be obtained through a call to first(r) which calls first("apple"), first("banana") and first("orange") recursively. The problem is that each alternative can potentially have different types. A production such as:

a | b | c;

can be thought of as a

composite<alternative_operation, tuple<A, B, C> >

where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and 2) the first/follow sets of A, B and C.

One possible solution is to make the parsers truly static. Example:

str<'a','p','p','l','e'>() | str<'b','a','n','a','n','a'>() | str<'o','r','a','n','g','e'>()

In this case, first/follow can be a static operation. But I'm not thrilled about this solution. It has lots of problems too (e.g. the rule's opaque virtual function wall). At the opposite extreme, another possible solution is to use virtual functions and run-time dispatch all over. Hence, the productions can be stored in standard homogeneous containers instead of heterogeneous containers (tuples). I'm also not thrilled about this solution.

Searching for a viable solution to this problem is actually one of the main motivations behind the development of Fusion. So, IOW, I had to develop some basic infrastructure/scaffolding first before I could tackle predictive parsing in Spirit. Hence, it wasn't as easy as it seems.
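The run-time-dispatch end of that design space can be sketched as follows (a hedged illustration with invented class names, not Spirit's actual implementation): with a virtual first() on every parser node, computing the first set of "apple" | "banana" | "orange" is a simple recursion over the alternatives, at the cost of virtual calls and heap-allocated, homogeneous storage.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Runtime-polymorphic parser nodes (invented names).
struct Parser {
    virtual ~Parser() = default;
    virtual std::set<char> first() const = 0;  // computed at runtime, per node
};

// A literal string parser: its first set is its leading character.
struct StrP : Parser {
    std::string s;
    explicit StrP(std::string str) : s(std::move(str)) {}
    std::set<char> first() const override {
        return s.empty() ? std::set<char>{} : std::set<char>{s[0]};
    }
};

// An alternative a | b | c: first(r) recurses into each alternative
// and unions the results.
struct Alt : Parser {
    std::vector<std::unique_ptr<Parser>> alts;
    std::set<char> first() const override {
        std::set<char> out;
        for (const auto& p : alts) {
            std::set<char> f = p->first();
            out.insert(f.begin(), f.end());
        }
        return out;
    }
};
```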
If you're talking about what I'm describing, I don't think there's anything particularly radical about it, and I'd be very surprised if it hadn't been done before.
AFAIK, no. Correct me if I'm wrong.
I once sent you a comparative study of some NLP parsing strategies. One of those was using first/follow computation and parsing ambiguous grammars. I'm pretty sure they were using backtracking or that they told me that they had thought of adding it.
Cool! I'll look into that.
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering).
It's not a problem if you build a parse tree and run the analysis on that (which is sometimes required for synthesized attributes anyway).
Agreed. Yet, for that to succeed, you'll have to build it all the way to the top (start rule). Only then will all correct branches be known and all failed branches be thrown away. For simple parsing tasks, this is expensive. In a lot of cases, I do not want to build a parse tree.
In a syntactic and semantic predicate, all actions are intentionally inhibited.
Have you considered the strategy of http://docs.biostat.wustl.edu/cgi-bin/info2html?(bison.info.gz)GLR%2520Parse... (http://tinyurl.com/694kq) ??
No. At least, not just yet. I'd like to concentrate first on top-down RD approaches. I have a keen eye on recursive ascent, however. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
David Abrahams wrote:
Well, I say "you could use first/follow to prune unproductive parses while still using dynamic failure conditions to get nondeterminism," and you come back with "Classic LL(1) doesn't have those." I assume you're using that as an argument against my statement, but it's a hollow argument because classic LL(1) is only one way to use first/follow with recursive descent.
Oh. I see.
No, I'm not using it as an argument, and no, I am not disputing your statement. That's the very reason I qualified my statement with "classic LL(1)". Actually, we were in agreement (or at least I thought so).
I guess I was confused (and still am, if you say you've been agreeing with me) by this little exchange: Me: Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :) You: Using first-follow makes a parser deterministic, it can't be both. But I'm happy to drop this now. It isn't crucial.
Hum? The algorithm for computing the first and follow sets of a symbol doesn't have an "otherwise the computation fails" branch, and the definitions of these sets are simple:
first(a) = the set of tokens that can begin strings derived from a.
follow(N) = the set of tokens that can follow N in grammatical sequences.
So "first follow computation will fail" is false. BTW, I don't know what NLL(1) means. Nondeterministic LL(1)? I can't even picture that.
No, it is true.
BTW, the "NLL" is a typo introduced by your quoting of my post.
<blush>Sorry!</blush>.
When the grammar is ambiguous, the computation of the first set will fail because the set itself cannot have multiple same entries.
It's hard for me to understand how you reach that conclusion. When an algorithm says to add something to a set that is already in the set, the normal expectation is that the set remains unchanged and the algorithm proceeds, not that it "fails."
You'll have to look ahead more than one token to resolve the ambiguity,
That's an entirely different question than whether the computation of the first and follow sets fails.
and by doing so, the grammar is no longer LL(1).
I've tried to say several times that the properties of LL(1) are irrelevant to the arguments I've been making, and you always seem to eventually agree, but here it is again. What is your point in bringing up LL(1) here?
Maybe we're talking about the same thing.
Yes! That was what I was trying to point out from the start. However, it's not as simple as it seems.
Why not?
Long story. Anyway, the basic problem is that Spirit parsers are dynamic. The computation of the first set, for instance, cannot be done at compile time.
Sure, but I thought you were going to be mostly emphasizing a static subrule model. I guess if you use string literals you'll have this problem no matter what, though I'm not sure the use of string literals should be a foregone conclusion. If you consider parser generators like YACC, they cleanly separate lexical analysis from parsing and the parser itself is never working with string literals.
Say for example:
r = str_p("apple") | "banana" | "orange";
the first set of r is: ['a', 'b', 'o']. This can be obtained through a call to first(r) which calls first("apple"), first("banana") and first("orange") recursively.
Right.
The problem is that each alternative can potentially have different types. A production such as:
a | b | c;
(nit: that's a RHS, not a production. You need a LHS symbol to make it a production).
can be thought of as a
composite<alternative_operation, tuple<A, B, C> >
where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and 2) the first follow sets of A, B and C.
One possible solution is to make the parsers truly static. Example:
str<'a','p','p','l','e'>() | str<'b','a','n','a','n','a'>() | str<'o','r','a','n','g','e'>()
in this case, first/follow can be a static operation. But, I'm not thrilled about this solution. It has lots of problems too (e.g. the rule's opaque virtual function wall).
Again, only an issue when crossing the static/dynamic boundary. In YACC this would be:

enum tokens { apple, banana, orange };

Actually, I rather like

typedef str<'a','p','p','l','e'> apple;

for Spirit. Most grammars don't have billions of tokens. BTW, is

typedef str<'appl','e'> apple;

legal C++? I'm just curious about multi-char literals.
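A sketch of what such a static str<> might look like in modern C++ (the template below is invented for illustration, not Spirit's actual class): because the characters are template arguments, the leading character — and hence the token's first set — is a compile-time constant.

```cpp
#include <string>

// Invented static-string template: each character is a template
// argument, so first_char is available at compile time.
template <char C0, char... Cs>
struct str {
    static constexpr char first_char = C0;             // first set is static
    static std::string text() { return {C0, Cs...}; }  // recover the literal at runtime
};

using apple = str<'a', 'p', 'p', 'l', 'e'>;
```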
At the opposite extreme, another possible solution is to use virtual functions and run-time dispatch all over. Hence, the productions can be stored in standard homogeneous containers instead of heterogenious containers (tuples). I'm also not thrilled about this solution.
Agreed. I don't really see why your previous idea was so bad in the case where you're *not* using runtime dispatch all over.
Searching for a viable solution to this problem is actually one of the main motivations behind the development of Fusion. So, IOW, I had to develop some basic infrastructure/scaffolding first before I could tackle predictive parsing in Spirit. Hence, it wasn't as easy as it seems.
<vague understanding> OK.
I once sent you a comparative study of some NLP parsing strategies. One of those was using first/follow computation and parsing ambiguous grammars. I'm pretty sure they were using backtracking or that they told me that they had thought of adding it.
Cool! I'll look into that.
I doubt you will find any specifics in the material I sent you, though.
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering).
It's not a problem if you build a parse tree and run the analysis on that (which is sometimes required for synthesized attributes anyway).
Sorry, I meant inherited attributes.
Agreed. Yet, for that to succeed, you'll have to build it all the way to the top (start rule). Only then will all correct branches be known and all failed branches be thrown away. For simple parsing tasks, this is expensive. In a lot of cases, I do not want to build a parse tree.
In a syntactic and semantic predicate, all actions are intentionally inhibited.
Have you considered the strategy of http://docs.biostat.wustl.edu/cgi-bin/info2html?(bison.info.gz)GLR%2520Parse... (http://tinyurl.com/694kq) ??
No. At least, not just yet. I'd like to concentrate first on top-down RD approaches. I have a keen eye on recursive ascent, however.
Hm. I didn't *think* their strategy for managing nondeterministic semantic actions was specific to bottom-up parsing. Is it?? Let me also mention that taking throw-away semantic actions is not so bad when: 1. The actions are all stateless and just produce synthesized attributes 2. First/Follow pruning eliminates most parse explorations that later turn out to fail anyway. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
I guess I was confused (and still am, if you say you've been agreeing with me) by this little exchange:
Me: Okay, either I'm "right" and first-follow can be used to prune unproductive parses in a nondeterministic parse *or* using first-follow makes a parser deterministic. It can't be both :)
You: Using first-follow makes a parser deterministic, it can't be both.
But I'm happy to drop this now. It isn't crucial.
Ok, I looked back and re-read the exchanges. Now I understand where we diverged. I should have been more careful. You are right of course. And I agree with you. I think I didn't quite communicate what I meant. Re-reading, I realize there indeed is a lapse in logic there. It should have read: first-follow is used to make a parser deterministic.
When the grammar is ambiguous, the computation of the first set will fail because the set itself cannot have multiple same entries.
It's hard for me to understand how you reach that conclusion. When an algorithm says to add something to a set that is already in the set, the normal expectation is that the set remains unchanged and the algorithm proceeds, not that it "fails."
Another lapse in logic. The computation of the first set alone may not fail. It is the building of the predictive parser from the information that will fail. That was what I meant, sorry.
I've tried to say several times that the properties of LL(1) are irrelevant to the arguments I've been making, and you always seem to eventually agree, but here it is again. What is your point in bringing up LL(1) here?
Ok, let's prune this unproductive branch, backtrack, and move on :-)
Long story. Anyway, the basic problem is that Spirit parsers are dynamic. The computation of the first set, for instance, cannot be done at compile time.
Sure, but I thought you were going to be mostly emphasizing a static subrule model.
Yes. But 1) unless I eliminate all the dynamic rules, this will still be a problem. 2) There are very nice things you can do with dynamic parsing that can't be done with pure static parsing. Unfortunately, I can't solve 1 and I consider 2 very important.
I guess if you use string literals you'll have this problem no matter what, though I'm not sure the use of string literals should be a foregone conclusion. If you consider parser generators like YACC, they cleanly separate lexical analysis from parsing and the parser itself is never working with string literals.
Again, true. However, one of the original intents of Spirit is to be usable in micro parsing tasks. When parsing small grammars like, say, a complex number, you do not want to write a separate front end lexer. Anyway, I do plan to write an optional but separate lexer sometime.
Say for example:
r = str_p("apple") | "banana" | "orange";
the first set of r is: ['a', 'b', 'o']. This can be obtained through a call to first(r) which calls first("apple"), first("banana") and first("orange") recursively.
Right.
The problem is that each alternative can potentially have different types. A production such as:
a | b | c;
(nit: that's a RHS, not a production. You need a LHS symbol to make it a production).
I wonder why I snipped "r = ". It was there when I wrote my email.
can be thought of as a
composite<alternative_operation, tuple<A, B, C> >
where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and 2) the first follow sets of A, B and C.
One possible solution is to make the parsers truly static. Example:
str<'a','p','p','l','e'>() | str<'b','a','n','a','n','a'>() | str<'o','r','a','n','g','e'>()
in this case, first/follow can be a static operation. But, I'm not thrilled about this solution. It has lots of problems too (e.g. the rule's opaque virtual function wall).
Again, only an issue when crossing the static/dynamic boundary.
Inevitable, I guess.
In YACC this would be:
enum tokens { apple, banana, orange };
Actually, I rather like
typedef str<'a','p','p','l','e'> apple;
for Spirit.
It should be an object so we don't have to instantiate it to work with ETs: str<'a','p','p','l','e'> apple;
Most grammars don't have billions of tokens. BTW, is
typedef str<'appl','e'> apple;
legal C++? I'm just curious about multi-char literals.
I would think so, but extracting the individual chars from the multi-char literal would have the endian problem, and the number of bits in the literal would be implementation defined; so, I would assume that 'appl' (assuming 4 chars) would not work on all platforms. AFAICT.
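For the record, the standard does guarantee one thing about a multi-character literal: it has type int. Everything else — its value, and therefore any scheme for unpacking the chars — is implementation-defined, which is exactly the portability problem described above.

```cpp
// 'ab' compiles (typically with a -Wmultichar warning), but its value
// is implementation-defined; only its type, int, is guaranteed.
constexpr auto multi = 'ab';
static_assert(sizeof(multi) == sizeof(int), "multi-char literal has type int");
```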
At the opposite extreme, another possible solution is to use virtual functions and run-time dispatch all over. Hence, the productions can be stored in standard homogeneous containers instead of heterogenious containers (tuples). I'm also not thrilled about this solution.
Agreed. I don't really see why your previous idea was so bad in the case where you're *not* using runtime dispatch all over.
I'll feel very sad to leave dynamic parsers behind. It will be a casualty once we go fully static. I might have a solution but it's still too early to tell if it's any good.
No. At least, not just yet. I'd like to concentrate first on top-bottom RD approaches. I have a keen eye on recursive ascent, however.
Hm. I didn't *think* their strategy for managing nondeterministic semantic actions was specific to bottom-up parsing. Is it??
Not sure. Isn't it? Ok, I'll investigate.
Let me also mention that taking throw-away semantic actions is not so bad when:
1. The actions are all stateless and just produce synthesized attributes
Right!
2. First/Follow pruning eliminates most parse explorations that later turn out to fail anyway.
Yep. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

On 12/25/2004 11:43 PM, Joel de Guzman wrote: [snip]
Long story. Anyway, the basic problem is that Spirit parsers are dynamic. The computation of the first set, for instance, cannot be done at compile time. Say for example:
r = str_p("apple") | "banana" | "orange";
the first set of r is: ['a', 'b', 'o']. This can be obtained through a call to first(r) which calls first("apple"), first("banana") and first("orange") recursively.
The problem is that each alternative can potentially have different types. A production such as:
Could you explain what this means? I'm guessing it means each alternate would have a different synthesized attribute calculated for it.
a | b | c;
That is, a's attribute, A, could be a float, b's attribute, B, could be a string, and c's attribute, C, could be something else. Is that right?
can be thought of as a
composite<alternative_operation, tuple<A, B, C> >
Wouldn't the most "natural" attribute for this be a "named" variant which may contain the same type for two or more different "names"? IOW, instead of the variant currently in boost, the sum type in:

sandbox/boost/indexed_types/sum.hpp

would represent the result of parsing the alternative expression:

a | b | c

and the names could be 1,2,3 (all unsigned) or possibly names, such as elements of an enumeration:

enum alt_abc{ alt_a, alt_b, alt_c};
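In modern C++ (which postdates this thread) the idea can be sketched with std::variant standing in for the sandbox sum type; the enum names the alternatives, and because dispatch is by index rather than by type, two alternatives may share the same type.

```cpp
#include <string>
#include <variant>

enum alt_abc { alt_a, alt_b, alt_c };

// alt_b and alt_c both hold std::string yet remain distinguishable
// by index — the "named" property asked for above.
using abc_result = std::variant<int, std::string, std::string>;

abc_result make_b(std::string s) {
    // construct the alternative named alt_b explicitly by index
    return abc_result(std::in_place_index<alt_b>, std::move(s));
}
```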
where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and
Of course, I understand how this must be only known at runtime.
2) the first follow sets of A, B and C.
Now this could be calculated at compile time if there were no cycles in the productions. For example, the simplest case would be:

A = 'a' A | ''

i.e. a sequence of a's. The cycle, or recursion, is caused by the A on the rhs. Of course, we know the solution in this simple case, but others, I guess, couldn't be solved or would be too difficult to solve. (I'm really guessing just based on my inability to figure out how.) The point is, some could be while others might not be; hence, a more accurate statement would be: 2) some of the first and follow sets of A, B, and C could not be calculated at compile time or would be "too difficult" to calculate at compile time. Regards, Larry

Larry Evans wrote:
On 12/25/2004 11:43 PM, Joel de Guzman wrote:
The problem is that each alternative can potentially have different types. A production such as:
Could you explain what this means? I'm guessing it means each alternate would have a different synthesized attribute calculated for it.
Hi Larry! Ehm... no, the alternatives themselves have different types. Consider the grammar:

r = ch_p('a') | ch_p('b') >> ch_p('c');

The type of the left alternative is chlit<char>. The type of the right alternative is sequence<chlit<char>, chlit<char> >.
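A stripped-down illustration of the point (the templates below are invented stand-ins, not Spirit's actual classes): the expression-template types encode the structure of each alternative, so the two branches of r really are distinct C++ types.

```cpp
#include <type_traits>

// Invented expression-template stand-ins:
template <class CharT> struct chlit { CharT ch; };
template <class L, class R> struct sequence { L left; R right; };
template <class L, class R> struct alternative { L left; R right; };

// r = ch_p('a') | ch_p('b') >> ch_p('c');
using left_t  = chlit<char>;                          // ch_p('a')
using right_t = sequence<chlit<char>, chlit<char>>;   // ch_p('b') >> ch_p('c')
using r_t     = alternative<left_t, right_t>;

static_assert(!std::is_same<left_t, right_t>::value,
              "the two alternatives have distinct types");
```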
a | b | c;
Wouldn't the most "natural" attribute for this be a "named" variant which map contain the same type for two or more different "names"? IOW, instead of the variant currently in boost, the sum type in:
sandbox/boost/indexed_types/sum.hpp
A variant. Yes, that's the plan. But that's a different matter. Anyway. Now that you've mentioned it. Yes, a named variant would be nice!
would represent the result of parsing the alternative expression:
a | b | c
and the names could be 1,2,3 (all unsigned) or possibly names, such as elements of an enumeration:
enum alt_abc{ alt_a, alt_b, alt_c};
Very nice!
where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and
Of course, I understand how this must be only known at runtime.
2) the first and follow sets of A, B and C.
Now this could be calculated at compile time if there were no cycles in the productions. For example, the simplest case would be:
A = 'a' A | ''
i.e. a sequence of a's. The cycle, or recursion, is caused by the A on the rhs. Of course, we know the solution in this simple case, but others, I would guess, could not be solved or would be too difficult to solve. (I'm really guessing, based on my inability to figure out how.) The point is, some could be
That adds another complication. But then my point was that: 1) the dynamic nature of spirit parsers makes it impossible to compute the first/follow sets at compile time. 2) You use this first/follow information to A) manipulate and build a parse table in the form of a tuple with heterogeneous types (not an array or vector of homogeneous types) and B) make a prediction of which alternative to choose in a tuple<alternatives...>; so basically, you'll have a runtime variable N (index) based on the prediction, but being in a tuple, choosing the right alternative requires a static constant N (index) ( i.e. get<N>(alt) instead of alt[N] ). The impedance mismatch between compile time and runtime is what makes this so darned difficult.
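The compile-time/runtime impedance mismatch Joel describes can be sketched in a few lines. The parser types below (lit, seq) are invented stand-ins, not Spirit's classes; the point is only the dispatch: every get&lt;I&gt; is instantiated statically, and the runtime prediction n selects among them.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in "parsers": callables returning match success.
struct lit {                       // matches one literal character
    char c;
    bool operator()(const std::string& in) const {
        return !in.empty() && in[0] == c;
    }
};
struct seq {                       // matches two characters in sequence
    char a, b;
    bool operator()(const std::string& in) const {
        return in.size() >= 2 && in[0] == a && in[1] == b;
    }
};

// Expand get<I> for every index at compile time; select with n at runtime.
template <class Tuple, std::size_t... I>
bool dispatch_impl(const Tuple& alts, std::size_t n, const std::string& in,
                   std::index_sequence<I...>) {
    bool result = false;
    // The fold visits each index; only the branch where n == I parses.
    ((n == I && (result = std::get<I>(alts)(in), true)) || ...);
    return result;
}

template <class... Ps>
bool dispatch(const std::tuple<Ps...>& alts, std::size_t n,
              const std::string& in) {
    return dispatch_impl(alts, n, in, std::index_sequence_for<Ps...>{});
}
```

This is essentially what a parse table over heterogeneous alternatives has to do: the tuple cannot be indexed with a runtime value directly, so every static index is expanded up front and the runtime prediction picks one.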
while others might not be; hence, a more accurate statement would be: 2) some of the first and follow sets of A, B, and C could not be calculated at compile time or would be "too difficult" to calculate at compile time.
Yes! Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
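As a side note for readers, Larry's cyclic example A = 'a' A | '' is exactly the case the textbook fixed-point algorithm handles at runtime: keep growing the FIRST sets until nothing changes. A rough sketch, with an invented grammar encoding (map keys are nonterminals, other strings are terminals, "" is epsilon):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A toy runtime FIRST-set computation, illustrating why cycles in the
// grammar force an iterative (fixed-point) solution rather than a
// simple one-pass one.
using Production = std::vector<std::string>;  // sequence of symbols
using Grammar = std::map<std::string, std::vector<Production>>;

std::map<std::string, std::set<std::string>> first_sets(const Grammar& g) {
    std::map<std::string, std::set<std::string>> first;
    bool changed = true;
    while (changed) {                          // iterate to a fixed point
        changed = false;
        for (const auto& [nt, prods] : g) {
            for (const auto& prod : prods) {
                bool all_epsilon = true;
                for (const auto& sym : prod) {
                    if (sym.empty()) continue;         // epsilon symbol
                    std::set<std::string> add;
                    if (g.count(sym)) {                // nonterminal
                        add = first[sym];
                        add.erase("");
                    } else {
                        add = {sym};                   // terminal
                    }
                    for (const auto& s : add)
                        changed |= first[nt].insert(s).second;
                    // stop unless sym can itself derive epsilon
                    if (!g.count(sym) || !first[sym].count("")) {
                        all_epsilon = false;
                        break;
                    }
                }
                if (all_epsilon)
                    changed |= first[nt].insert("").second;
            }
        }
    }
    return first;
}
```

For the grammar A = 'a' A | '' this converges after two passes to FIRST(A) = { 'a', epsilon }, which is the answer we "know" by inspection in the simple case.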

Joel de Guzman wrote:
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering). In a syntactic and semantic predicate, all actions are intentionally inhibited.
That's up to now the only problem I have had when using Spirit. Are there plans to allow deferring of semantic actions? E.g. like the phrase and character level directives that can be used to switch the parser's behaviour at certain positions within rules; maybe that could be done for semantic actions too, such that all semantic actions are triggered after the rules within the directive returned success. Maybe there are better solutions. Regards, Andreas Pokorny

Andreas Pokorny wrote:
Joel de Guzman wrote:
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering). In a syntactic and semantic predicate, all actions are intentionally inhibited.
That's up to now the only problem I have had when using Spirit. Are there plans to allow deferring of semantic actions?
Yes.
E.g. like the phrase and character level directives that can be used to switch the parser's behaviour at certain positions within rules; maybe that could be done for semantic actions too.
Right.
Such that all semantic actions are triggered after the rules within the directive returned success.
Maybe there are better solutions.
Deferring of semantic actions to be executed later is a possibility. Frank Birbacher worked on it some time ago. I haven't followed up with the latest developments. One thing to note though is that deferred actions and auto syntax/parse tree generation are roughly the same thing. It might be interesting to merge deferred actions with tree generation. To guarantee against false triggering, you'll have to defer the actions all the way up to the start rule. Only then shall all valid paths be known. It is also possible to defer actions only at certain points, but that does not guard against false triggering if the point is itself part of an alternative somewhere up the parser hierarchy. Providing an explicit directive (say, defer_action[p]) is one way of doing it and passes the responsibility of ensuring that it is placed only at deterministic points (i.e. where we are sure that no backtracking will happen) to the client. However, I'm not sure how error prone that will be. It might also be interesting to let the parser determine the deterministic points and trigger all deferred actions accumulated so far when the parse reaches such a point. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
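The defer/rollback/commit bookkeeping can be sketched independently of Spirit. Nothing below is Spirit's API; it only shows the mechanism: actions accumulate in a queue, a failed alternative rolls the queue back to its mark, and a deterministic point commits everything pending.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of a deferred-action queue for a backtracking parser.
class ActionQueue {
public:
    void defer(std::function<void()> action) {
        pending_.push_back(std::move(action));
    }
    std::size_t mark() const { return pending_.size(); }
    void rollback(std::size_t m) { pending_.resize(m); }  // on backtrack
    void commit() {                        // at a deterministic point
        for (auto& a : pending_) a();
        pending_.clear();
    }
private:
    std::vector<std::function<void()>> pending_;
};
```

A parser trying alternatives would take a mark before each attempt, roll back on failure, and commit only once no backtracking can reach the deferred actions, which is exactly the guarantee discussed above.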

Joel de Guzman wrote:
An alternative scheme is to let the user switch strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive by default, and allow the user to use syntactic and semantic predicates with arbitrary lookahead on specific points in the grammar.
Eewww, I smell a kludge ;-)
Not really. There's a big problem with backtracking in general: backtracking and semantic actions do not mix well (e.g. false triggering). In a syntactic and semantic predicate, all actions are intentionally inhibited.
Andreas Pokorny wrote:
That's up to now the only problem I have had when using Spirit. Are there plans to allow deferring of semantic actions?
Yes.
E.g. like the phrase and character level directives that can be used to switch the parser's behaviour at certain positions within rules; maybe that could be done for semantic actions too.
Right.
Such that all semantic actions are triggered after the rules within the directive returned success.
Maybe there are better solutions.
Deferring of semantic actions to be executed later is a possibility. Frank Birbacher worked on it some time ago. I haven't followed up with the latest developments. One thing to note though is that deferred actions and auto syntax/parse tree generation are roughly the same thing. It might be interesting to merge deferred actions with tree generation.
Ah yes, the AST. I once wrote a parser using Spirit that parsed dependency strings for a currently delayed Linux package management system. Due to possible false triggering of semantic actions I had to reiterate over the AST output. At that time I found the AST-generating syntax unintuitive; only with the help of the debugging features was I able to figure out how the tree was built. I would like to see either a more transparent interface or better documentation. Also, I had problems making the tree create user-defined tree nodes. I wanted to have at least three different tree node types generated in the parsing process: one to reflect an if-else clause, one for a boolean expression, and another one for a block of packages. But the tree-generating code only works with a single type. I think it might be possible to define a base tree template node class that knows its descendant, a type list of possible child nodes, and which defines basic tree storing and access operations. Thus the user only has to derive from that structure.
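One conceivable shape for the multi-type tree described here, sketched with std::variant rather than any interface Spirit actually offers; the node types are invented to match the three cases named in the post:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical node payloads matching the three cases from the post.
struct BoolExpr     { std::string expr; };
struct PackageBlock { std::vector<std::string> packages; };
struct IfElse       { };  // children hold condition / then / else

// A single node type whose payload varies; children are owned subtrees.
struct Node {
    std::variant<BoolExpr, PackageBlock, IfElse> data;
    std::vector<std::unique_ptr<Node>> children;
};
```

Visitors (std::visit) then replace the reiteration over a single-typed tree: each payload type gets its own handling without a second parsing pass.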
To guarantee against false triggering, you'll have to defer the actions all the way up to the start rule. Only then shall all valid paths be known. It is also possible to defer actions only at certain points, but that does not guard against false triggering if the point is itself part of an alternative somewhere up the parser hierarchy. Providing an explicit directive (say, defer_action[p]) is one way of doing it and passes the responsibility of ensuring that it is placed only at deterministic points (i.e. where we are sure that no backtracking will happen) to the client. However, I'm not sure how error prone that will be. It might also be interesting to let the parser determine the deterministic points and trigger all deferred actions accumulated so far when the parse reaches such a point.
Sometimes the user knows synchronizing symbols, e.g. ";" in C, which could trigger all delayed actions. But a parser that evaluates such stuff would of course be much nicer. Also, one needs semantic actions within the parsing process to redefine the behaviour of dynamic parsers, which is a really cool feature of Spirit. Regards, Andreas Pokorny

Dave Harris wrote:
In-Reply-To: <cqcjii$hs0$1@sea.gmane.org> dave@boost-consulting.com (David Abrahams) wrote (abridged):
That's okay for simple things, but for any serious printing work you need to know you're talking to a printer. For example, you may need to embed postscript in your output stream.
Not all printers support postscript, so you'd need to know not just that you were printing, but what kind of printer it was.
Not on the platforms I used; there was always a way to send the postscript in such a way as not to interfere with printers that didn't support it.
Although there probably needs to be some kind of abstraction-busting pass-through mechanism, I see it as the exception rather than the rule.
On the platforms where I did this, that mechanism was for the most part built into the OS. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

* Dave Harris <brangdon@cix.compulink.co.uk> [2004-12-23 17:46]:
In-Reply-To: <cqcjii$hs0$1@sea.gmane.org> dave@boost-consulting.com (David Abrahams) wrote (abridged):
That's okay for simple things, but for any serious printing work you need to know you're talking to a printer. For example, you may need to embed postscript in your output stream.
Not all printers support postscript, so you'd need to know not just that you were printing, but what kind of printer it was. Although there probably needs to be some kind of abstraction-busting pass-through mechanism, I see it as the exception rather than the rule.
Another issue is transparency. My employer's drawing framework supports semi-transparency in the form of an alpha channel. This is used by us for things like anti-aliasing, and by users for subtle colour blends. Some printers can support transparency directly and some cannot. We found we needed a relatively elaborate architecture to hide that detail from application drawing code.
I'm a bit concerned about the scope of this boost project. Our drawing framework is pretty huge.
This is an important discussion to have sooner rather than later.
1) Is a GUI project a bad idea?
2) Is Boost the right place for a GUI project?
* No
no.a) Where does this project belong?
* Yes
yes.a) How is a GUI project different from existing Boost projects?
yes.b) Which Boost projects have faced challenges similar to Boost GUI?
yes.c) What resources need to be created to support Boost GUI?
yes.d) What procedures need to be created to support Boost GUI?
I believe that Boost is the right place for a GUI project. I am ready for a new, hybrid approach to GUI development.
Q - 1) Yes. I believe the time is ripe for a small, light-weight XML + CSS renderer to attack the new surge of RSS content on the web. Existing CSS renderers are either too big, poorly documented, or licensed in ways that displease me. When I consider how I'd approach the construction of such a component, I immediately turn to modern C++. I imagine benefits like performance, economy of expression through generics, and conceptual compatibility with underlying OS code.
Q - 2) Yes. If the project is to develop a GUI library using modern C++, then Boost is the right place. Boost has the community. Boost has the knowledge. Developing a GUI library is a complex undertaking. Starting out with a new community slows things down considerably. Building a community takes time. Establishing mailing lists takes time. Setting up a web site takes time. Choosing a name takes time. Upon success of the project, it is likely to need a separate umbrella, but until then, building a community to support a project so complex is, to me, far more daunting than the software itself. -- Alan Gutierrez - alan@engrm.com

Alan Gutierrez wrote:
Q - 2) Yes.
If the project is to develop a GUI library using modern C++, then Boost is the right place. Boost has the community. Boost has the knowledge.
Developing a GUI library is a complex undertaking. Starting out with a new community slows things down considerably.
Building a community takes time. Establishing mailing lists takes time. Setting up a web site takes time. Choosing a name takes time.
Upon success of the project, it is likely to need a separate umbrella, but until then, building a community to support a project so complex is, to me, far more daunting than the software itself.
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella." -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
This directly concerns me because I have a lot at stake with the upcoming Boost Interfaces Library. When it is proposed to Boost and goes through the inevitable evolution, if it doesn't meet my rather stringent guidelines I will be forced to create my own version of the library. Would doing so be odious? It seems to me that if a new project is spawned, then it becomes healthy competition for Boost and promotes technological progress. Christopher Diggins http://www.cdiggins.com http://www.heron-language.com

christopher diggins wrote:
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
This directly concerns me because I have a lot at stake with the upcoming Boost Interfaces Library. When it is proposed to Boost and goes through the inevitable evolution, if it doesn't meet my rather stringent guidelines I will be forced to create my own version of the library. Would doing so be odious? It seems to me that if a new project is spawned, then it becomes healthy competition for Boost and promotes technological progress.
That's fine. In that case Boost still has a project that reflects the needs and input of its membership. That's very different from starting a project here and using our intellectual resources and mailing list bandwidth with a plan or intention to take the project away later. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
Hmm - maybe this concern is a little premature. Perhaps it might be best left until such a library is actually built in accordance with some boost consensus and passes boost review. Robert Ramey

Robert Ramey wrote:
David Abrahams wrote:
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
Hmm - maybe this concern is a little premature. Perhaps it might be best left until such a library is actually built in accordance with some boost consensus and passes boost review.
That would certainly be much too late, especially if the intention is to do the major work of development here and then take it away before even submitting it for review. My concern may be the result of misinterpretation, but if not, we ought to deal with it now. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

At 01:50 PM 12/24/2004, David Abrahams wrote:
Robert Ramey wrote:
David Abrahams wrote:
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
Hmm - maybe this concern is a little premature. Perhaps it might be best left until such a library is actually built in accordance with some boost consensus and passes boost review.
That would certainly be much too late, especially if the intention is to do the major work of development here and then take it away before even submitting it for review.
My concern may be the result of misinterpretation, but if not, we ought to deal with it now.
Yes. We do have successful models from Spirit, Python, and several other libraries of projects that run their own mailing lists and some other resources while continuing to work under the Boost umbrella and continuing to contribute to Boost releases. That's fine, and hopefully what the OP was talking about. But to hijack Boost resources during early development with no plan to actually submit a library for formal review would be totally unacceptable and would lead to a lot of ill-will and hard feelings. --Beman

David Abrahams wrote:
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
FYI, I have no intention of doing this, since Boost has a lot of prewrapped functionality such as build and test systems, as well as a lot of libraries such as date-time and signals that my design currently depends on (and the dependence is growing). Splitting off the GUI libraries would make it hard for new developers to get up to speed, as they would need a version of Boost. Also, in moving the library, you would have the same problem as the original approach of starting out. There are people very knowledgeable in various areas who frequent the Boost mailing lists and may not want to go watch "a separate umbrella", but may have a thought that is important to the development of the GUI libraries. My main concern is keeping the library simple, w.r.t. adoption, while allowing more complex design patterns such as splitters, text/HTML document viewers/editors, and others. I am trying to keep the core functionality as simple and small as possible. Happy Holidays, Fröhliche Weihnachten, Feliz Navidad, Reece

* David Abrahams <dave@boost-consulting.com> [2004-12-24 13:28]:
Alan Gutierrez wrote:
Q - 2) Yes.
If the project is to develop a GUI library using modern C++, then Boost is the right place. Boost has the community. Boost has the knowledge.
Developing a GUI library is a complex undertaking. Starting out with a new community slows things down considerably.
Building a community takes time. Establishing mailing lists takes time. Setting up a web site takes time. Choosing a name takes time.
Upon success of the project, it is likely to need a separate umbrella, but until then, building a community to support a project so complex is, to me, far more daunting than the software itself.
I don't know if this is what you're suggesting, but I can't overemphasize how odious I find the idea of capitalizing on Boost's ready-made community, knowledge, and mailing lists, only to take the successful project away later and run it under "a separate umbrella."
By Jove, no! I do not believe that a project of this sort, upon success, could be severed from its community. Please don't think that I was suggesting that a Boost GUI would survive without the depth of knowledge and collective wisdom of the Boost contributors. For me to suggest such a thing would be not only odious, it would be downright asinine. I know where you are coming from. I ain't one of those. I'm imagining a project like this will create artifacts, like a forms library, or an XML + CSS renderer, or a vector graphics library, that would attract end users who are not at all familiar with the finer points of C++ programming. A GUI library would put screenshots on the Boost web site for the first time, and screenshots attract all sorts of inquiries. I phrased my post to attempt to address, and raise, the concerns of those who don't want to answer "how do I put a check mark next to a menu item?" three times a week. If there were a couple of screenshots of a versatile boost::grid on the Boost web site, you could bank on postings that were no more than "I don't know C++, but tell me how to do that!" I want to see this project develop under Boost, because I think that it would generate an excellent GUI library implementation. I am hoping that the project is not forsaken because it is too broad in scope, since that would invite a new organization to form under the misconception that a GUI library necessarily has a huge code base. The Boost community, I suspect, is going to be particular about keeping the implementation lean. I imagine "a separate umbrella", Boost branded and managed, to handle a new set of end users who arrive in response to a Boost GUI, who are not C++ programmers, but are compelled by what they see to employ C++ in their projects. With Boost Python in place, plus a Boost GUI, Boost might get traffic that is not at all C++ related. I'd hate to see that shut Boost GUI down, or send it off to browner pastures. -- Alan Gutierrez - alan@engrm.com
participants (13)
- Alan Gutierrez
- Andreas Pokorny
- Beman Dawes
- brangdon@cix.compulink.co.uk
- christopher diggins
- David Abrahams
- Hartmut Kaiser
- Joel
- Joel de Guzman
- Larry Evans
- Matt Hurd
- Reece Dunn
- Robert Ramey