[proto]emulation of homomorphisms in hierarchical algebras (was in spirit-devl list

As Eric suggested below, this has been moved from spirit-devel at: http://thread.gmane.org/gmane.comp.parsers.spirit.devel/3200/focus=3211 On 06/25/2007 04:53 PM, Eric Niebler wrote:
(This seems to be drifting off-topic for this Spirit-devel list. Perhaps we should move this discussion to the boost list.)
[snip]
Of course, ET transforms are not purely compile-time, or at least, they don't have to be. They can have a run-time counterpart, too. I've added some examples recently to demonstrate this. Have a look at the CountLeaves transform at http://tinyurl.com/2qme82 for an example that uses std::plus to accumulate a runtime value.
Thanks Eric. I'll have a look.
However, I just thought I'd try and see if the
lookaheaads could be calculated a compile time. That's what I attempted in:
<boost-vault>/Strings - Text Proccesing/gram_lookahead.zip
As the comments indicate, it's not completed. I ran against a problem with the first/follow attributes which is, I think, related to proto's terminals. If you recall, about a week or month ago, there was some discussion about the arity of proto terminals. You decided to change the arity to 0. This was good because it was consistent with the way algebra's and their morphisms are defined. However, in the case of gram_lookahead.zip, this lead to the aforementioned problem with first/follow calculation.
Which problem?
The problem involves my attempt to model the lookahead problem as an algebra problem whose solution involved the following steps. 1) For a grammar expression, e_gram, (a "source" algebra expression) perform a homomorphism, gram2X. to a corresponding target algebra, X, expression, e_X, where X is one of {empty,first,follow}. For example, gram2empty(inp | epsilon) = not_empty & yes_empty where inp is any element in the terminals of the grammar and epsilon is, of course, the epsilon expression representing the empty string. 2) Perform a sequence of reductions, based on "laws" of the target algebra, until a "constant" in the target algebra is reached. For example, some of the "laws" of the empty algebra are: not_empty & e_empty -reduce-> not_empty e_empty & not_empty -reduce-> not_empty where e_empty means any expression in the empty algebra. Based on this, and the example expression in 1): not_empty & yes_empty -reduce-> not_empty And not_empty is a "constant" in the empty algebra; do, the sequence of reductions is complete. Now, IIUC, a homomorphism maps functions with arity, n, in the source to functions with arity n in the target. Thus | in the gram algebra was mapped to & in empty algebra. If n = 0, this means constants in the source are mapped to cconstants in the target. For example, in the above example, inp_0, a constant in gram algebra was mapped to not_empty, a constant in the empty algebra. However, for gram2first, things aren't that simple. The constants in the first algebra are *sets* of constants in the gram algebra. More concretely: 1) homomorphism: gram2first 1.1) constants: gram2first(inp_0) = {inp_0} gram2first(inp_1) = {inp_1} ... gram2first(inp_n) = {inp_n} where n is the number of terminal symbols in the grammar. 1.2) binary |: gram2first(|) = \/ where \/ is set union operator or function. gram2first(e_gram_0 | e_gram_1) = gram2first(e_gram_0) \/ gram2first(e_gram_0) However, {inp_0} and {inp_1} and ... {inp_n} are not really a constants in the first grammar. Instead they are really composite terms or expressions in a multisorted set algebra with the single constant, empty_set, and binary function, insert: insert(empty_set,inp_0) insert(empty_set,inp_1) ... insert(empty_set,inp_n) and this is what lead me to look for something like the hierarchical algebra, which, I guess, is similar to the "hierarchical abstract type" in (6) here: http://tinyurl.com/ypekyd The type, P, in the above tinyurl corresponds to the multisorted set algebra mentioned above, and insert(empty_set,inp_0) would be an example of the "primitive term" mentioned in the tinyurl.
In gram_lookahead.zip, the morphisms
cannot proceed down the expression tree after a node with arity==0 is reached.
Huh, there is nothing "down" after a node with arity 0. That's all you get. :-)
The above definition of homomorphism mapped constants (functions with arith=0) to constants. However, in the case of gram2first, the mapping was from constants to some term in the primitive subalgebra of the target algebra (I don't even know if that's been defined somewhere, but I'd be very surprised if there were'nt some reference somewhere defining the idea). So instead of mapping constants to constants, in proto I'd be mapping arity-0 terms to arity-0 terms; however, the contents of those target arity-0 terms would be the primitive subalgebra terms; hence, there would be a "down" to get at the subalgebra term (in this case, probably implemented as mpl::set<GrammarTerminalNumeral,TermNum>).
However, with the first and follow algebra, the terminal
values are not single values like true or false, but instead multiple values like {ident,value,lpar} (e.g. the first attribute of factor nonterminal in arith_expression grammar).
OK, so your terminal is a 3-tuple. It's still a terminal.
Yes, but also a composite term in the primitive subalgebra. Your probably wondering why I'm so fixated on using the algebra concepts and I'm beginning to wonder myself ;) . However, my original intent was to try and be more formal and possibly gain some insight about how to do it other ways.
So, the reduction of values
in this grammar must continue through the value contained in the proto terminal (which is an instance of mpl::set on grammar terminals, like ident, value, or lpar).
I don't think you need proto to "recurse" into your terminals. I just think you need to write a custom transform that does something appropriate with your terminals.
OK, you're probably right and the attempt at formality is not worth the effort.
This lead me to search for something in
algebra literature to handle this, and that lead to something called "heirarchical albebras" where the arity-0 expressions are actually values in some other algebra (if I'm interpresting things correctly). This sounds like what I was looking for. I would be interesting if the transforms in proto actually were more like heirarchical algebra morphisms. Just something to think about (probably in the distant future ;) ).
Well, that's one approach. In fact, you can approximate that in proto today by wrapping a proto expression in a template, effectively hiding its proto-ness from proto, which would then treat it like any other terminal. You would be responsible for writing the transforms and/or contexts that knew about these special terminals and recursed into them.
OK, now you've inspired me to continue trying. Thanks for the feedback! -regards, Larry

I'm terrible at algebra, but I'll do my best. (Extensive quoting follows, because I'll have trouble following this discussion otherwise.) Larry Evans wrote:
The problem involves my attempt to model the lookahead problem as an algebra problem whose solution involved the following steps.
1) For a grammar expression, e_gram, (a "source" algebra expression) perform a homomorphism, gram2X. to a corresponding target algebra, X, expression, e_X, where X is one of {empty,first,follow}.
For example,
gram2empty(inp | epsilon) = not_empty & yes_empty
where inp is any element in the terminals of the grammar and epsilon is, of course, the epsilon expression representing the empty string.
2) Perform a sequence of reductions, based on "laws" of the target algebra, until a "constant" in the target algebra is reached.
For example, some of the "laws" of the empty algebra are:
not_empty & e_empty -reduce-> not_empty e_empty & not_empty -reduce-> not_empty
where e_empty means any expression in the empty algebra. Based on this, and the example expression in 1):
not_empty & yes_empty -reduce-> not_empty
And not_empty is a "constant" in the empty algebra; do, the sequence of reductions is complete.
I get this. Is this part purely compile time, or are some of these reductions dependent on runtime information? If it's purely compile time, this example is fairly close to what you've just described: http://tinyurl.com/253esm (http://boost-sandbox.sourceforge.net/libs/proto/doc/html/boost_proto/users_g...)
Now, IIUC, a homomorphism maps functions with arity, n, in the source to functions with arity n in the target. Thus | in the gram algebra was mapped to & in empty algebra. If n = 0, this means constants in the source are mapped to cconstants in the target. For example, in the above example, inp_0, a constant in gram algebra was mapped to not_empty, a constant in the empty algebra. However, for gram2first, things aren't that simple. The constants in the first algebra are *sets* of constants in the gram algebra. More concretely:
1) homomorphism: gram2first
1.1) constants:
gram2first(inp_0) = {inp_0} gram2first(inp_1) = {inp_1} ... gram2first(inp_n) = {inp_n}
where n is the number of terminal symbols in the grammar.
1.2) binary |:
gram2first(|) = \/ where \/ is set union operator or function.
gram2first(e_gram_0 | e_gram_1) = gram2first(e_gram_0) \/ gram2first(e_gram_0)
However, {inp_0} and {inp_1} and ... {inp_n} are not really a constants in the first grammar. Instead they are really composite terms or expressions in a multisorted set algebra with the single constant, empty_set, and binary function, insert:
insert(empty_set,inp_0) insert(empty_set,inp_1) ... insert(empty_set,inp_n)
and this is what lead me to look for something like the hierarchical algebra, which, I guess, is similar to the "hierarchical abstract type" in (6) here:
The type, P, in the above tinyurl corresponds to the multisorted set algebra mentioned above, and insert(empty_set,inp_0) would be an example of the "primitive term" mentioned in the tinyurl.
Ooof, I followed that link and I think I hurt myself. But I perhaps get the gist, which is that "first" for a bunch of alternates is a set union of the "first" sets of each alternate. "Input" here is what? A character? A string? A pattern? I suppose in the abstract it could be anything. I don't know what a "mustisorted set algebra" is, but it sounds like complication. Seems fairly simple .. you have a transform that builds a set by folding together the "first" sets of the alternates, and you recurse. Terminals yield a "first" set with one element. proto::transform::fold_tree would be handy for this, and you pass around the set you're building as a visitor parameter to the transform.
In gram_lookahead.zip, the morphisms
cannot proceed down the expression tree after a node with arity==0 is reached.
Huh, there is nothing "down" after a node with arity 0. That's all you get. :-)
The above definition of homomorphism mapped constants (functions with arith=0) to constants. However, in the case of gram2first, the mapping was from constants to some term in the primitive subalgebra of the target algebra (I don't even know if that's been defined somewhere, but I'd be very surprised if there were'nt some reference somewhere defining the idea). So instead of mapping constants to constants, in proto I'd be mapping arity-0 terms to arity-0 terms; however, the contents of those target arity-0 terms would be the primitive subalgebra terms; hence, there would be a "down" to get at the subalgebra term (in this case, probably implemented as mpl::set<GrammarTerminalNumeral,TermNum>).
I think now you lost me. I was thinking the "first" set would be a runtime data structure. But I guess I'm wrong. So let's back up. What does the "first" set represent?
However, with the first and follow algebra, the terminal
values are not single values like true or false, but instead multiple values like {ident,value,lpar} (e.g. the first attribute of factor nonterminal in arith_expression grammar).
OK, so your terminal is a 3-tuple. It's still a terminal.
Yes, but also a composite term in the primitive subalgebra. Your probably wondering why I'm so fixated on using the algebra concepts and I'm beginning to wonder myself ;) . However, my original intent was to try and be more formal and possibly gain some insight about how to do it other ways.
I encourage you to formalize what you're doing with proto. If proto obeys some algebra, or can be used to implement a useful one, I think that would be very interesting. I might even want to write a paper about it with you. :-)
So, the reduction of values
in this grammar must continue through the value contained in the proto terminal (which is an instance of mpl::set on grammar terminals, like ident, value, or lpar).
I don't think you need proto to "recurse" into your terminals. I just think you need to write a custom transform that does something appropriate with your terminals.
OK, you're probably right and the attempt at formality is not worth the effort.
See above.
This lead me to search for something in
algebra literature to handle this, and that lead to something called "heirarchical albebras" where the arity-0 expressions are actually values in some other algebra (if I'm interpresting things correctly). This sounds like what I was looking for. I would be interesting if the transforms in proto actually were more like heirarchical algebra morphisms. Just something to think about (probably in the distant future ;) ).
Well, that's one approach. In fact, you can approximate that in proto today by wrapping a proto expression in a template, effectively hiding its proto-ness from proto, which would then treat it like any other terminal. You would be responsible for writing the transforms and/or contexts that knew about these special terminals and recursed into them.
OK, now you've inspired me to continue trying.
Please do.
Thanks for the feedback!
My pleasure. -- Eric Niebler Boost Consulting www.boost-consulting.com

On 06/27/2007 06:40 PM, Eric Niebler wrote:
I'm terrible at algebra, but I'll do my best. (Extensive quoting follows, because I'll have trouble following this discussion otherwise.)
Larry Evans wrote:
The problem involves my attempt to model the lookahead problem as an algebra problem whose solution involved the following steps.
[snip]
I get this. Is this part purely compile time, or are some of these
Purely compile time. In an earlier version of gram_lookahead.zip (which I still have in a file, morphisms.cpp, if you're interested) I even had an recursive template compile time solution (just for empty attribute) corresponding to the runtime iterative solution implemented by: template < typename ProdList > bool solve ( ProdList& a_productions , unsigned iter=our_size+1 ) in cfg_lookahead_extends.cpp on line 2147. If you look in gram_lookahead.zip(gram_lookahead.cpp), you'll see several: template < class OutEmptySeq > struct apply The OutEmptySeq corresponds to the assumed values of lhs attributes, just like the superclass of the: template < lookahead::numerals_attr AttrNum , class NumeralsInp , class NumeralsOut
struct soln_context in cfg_lookahead.cpp corresponds to the runtime assumed value of the attributes.
reductions dependent on runtime information? If it's purely compile time, this example is fairly close to what you've just described:
(http://boost-sandbox.sourceforge.net/libs/proto/doc/html/boost_proto/users_g...)
Very good! That's a great help.
Now, IIUC, a homomorphism maps functions with arity, n, in the source to functions with arity n in the target.
[snip]
and this is what lead me to look for something like the hierarchical algebra, which, I guess, is similar to the "hierarchical abstract type" in (6) here:
The type, P, in the above tinyurl corresponds to the multisorted set algebra mentioned above, and insert(empty_set,inp_0) would be an example of the "primitive term" mentioned in the tinyurl.
Ooof, I followed that link and I think I hurt myself. But I perhaps get the gist, which is that "first" for a bunch of alternates is a set union of the "first" sets of each alternate.
Yes. The LewiLL1.zip in vault may be easier to follow. Lewi's the 1st author of the book on which LewiLL1.zip is based.
"Input" here is what? A character? A string? A pattern?
Just a terminal symbol. Lewi's book called terminals "inputs" and nonterminals "outputs". This is because, I think, he used the term, "vocabulary" as a generalization of both concepts, terminal and nonterminal symbols.
I suppose in the abstract it could be anything.
Well, just as a terminal symbol represents something abstract. The lexer could represent, e.g. the terminal symbol, mulop, as the character, '*' or something else, like '&' or '&&' or even '>>' or maybe even the successful parse of a whole other language. For example, a terminal symbol, 'stmt' could just be parsed as a simple identifier by a lexer or it could be parsed a a program statement by a statement cfg grammar.
I don't know what a "mustisorted set algebra" is,
Sort means type. A boolean algebra just involves the single boolean sort. See: http://www.artima.com/forums/flat.jsp?forum=106&thread=155960&start=60
but it sounds like complication.
Maybe, even probably. However, any abstraction could be considered a complication because something concrete is easier to understand that something abstract.
Seems fairly simple .. you have a transform that builds a set by folding together the "first" sets of the alternates, and you recurse. Terminals yield a "first" set with one element. proto::transform::fold_tree would be handy for this, and you pass around the set you're building as a visitor parameter to the transform.
I'll try that. Thanks!
In gram_lookahead.zip, the morphisms cannot proceed down the expression tree after a node with arity==0 is reached.
Huh, there is nothing "down" after a node with arity 0. That's all you get. :-)
The above definition of homomorphism mapped constants (functions with arith=0) to constants. However, in the case of gram2first, the mapping was from constants to some term in the primitive subalgebra of the target algebra (I don't even know if that's been defined somewhere, but I'd be very surprised if there were'nt some reference somewhere defining the idea). So instead of mapping constants to constants, in proto I'd be mapping arity-0 terms to arity-0 terms; however, the contents of those target arity-0 terms would be the primitive subalgebra terms; hence, there would be a "down" to get at the subalgebra term (in this case, probably implemented as mpl::set<GrammarTerminalNumeral,TermNum>).
I think now you lost me. I was thinking the "first" set would be a runtime data structure.
Yes, for cfg_lookahead_extends, but no for gram_lookahead. In gram_lookahead, these sets (one for each node in the expression tree), are calculated at compile time. Yeah, I know, it sounds way more complicated than any compiler can handle. However, as mentioned above (see morphisms.cpp), for very simple grammars, at least the empty attribute can be compile-time calculated.
But I guess I'm wrong. So let's back up. What does the "first" set represent?
The set of terminal symbols that can start a string in the language (or "sublanguage") described by a grammar expression. That's essentially repeated in: http://compilers.iecc.com/comparch/article/01-04-079 [snip]
Your probably wondering why I'm so fixated on using the algebra concepts and I'm beginning to wonder myself ;) . However, my original intent was to try and be more formal and possibly gain some insight about how to do it other ways.
I encourage you to formalize what you're doing with proto. If proto obeys some algebra, or can be used to implement a useful one, I think that would be very interesting. I might even want to write a paper about it with you. :-)
I've got some other ideas. For example, just as a grammar can be thought of as a set of equations expressible in proto ( as done by cfg_lookahead_extends) the equations describing the connections in a graph can be expressed if operator * can be somehow restricted to just two elements (i.e. a*b allowed but a*b*c is not), and where the 1st element in the factor pair is some literal and the second element represents an unknown. This could then be used to solve any number of path problems like the shortest path problem, and even DFA generation problem: http://groups.google.com/group/comp.compilers/msg/7d70310bc2c5aa45?dmode=source&hl=en This is where compile-time solution might also be important. After all, solving the following at compile-time: X = A * X + b where X and b are vectors and A is a matrix could be useful ;) I'm also wondering what's the relation between a proto grammar whose top level connective is bitwise_or and boost variant's recursive variant types: http://www.boost.org/doc/html/variant/tutorial.html#variant.tutorial.recursi... [snip]
Well, that's one approach. In fact, you can approximate that in proto today by wrapping a proto expression in a template, effectively hiding its proto-ness from proto, which would then treat it like any other terminal. You would be responsible for writing the transforms and/or contexts that knew about these special terminals and recursed into them.
OOPS. After thinking about it some more and writing the above: (one for each node in the expression tree), I realized restricting the lookahead attributes to the "wrappee's " of "wrapped" proto terminals won't work because the proto composite expressions (e.g. those with proto::tag::bitwise_or) also must have lookahead attributes. Back to drawing board :(
participants (2)
-
Eric Niebler
-
Larry Evans