
On 12/25/2004 11:43 PM, Joel de Guzman wrote: [snip]
Long story. Anyway, the basic problem is that Spirit parsers are dynamic. The computation of the first set, for instance, cannot be done at compile time. Say for example:
r = str_p("apple") | "banana" | "orange";
the first set of r is: ['a', 'b', 'o']. This can be obtained through a call to first(r) which calls first("apple"), first("banana") and first("orange") recursively.
The problem is that each alternative can potentially have different types. A production such as:
Could you explain what this means. I'm guessing it means each alternate would have a different synthesized attribut calculated for it.
a | b | c;
That is, a's attribute, A, could be a float, b's attribute, B, could be a string, and c's attribute, C, could be something else. Is that right?
can be thought of as a
composite<alternative_operation, tuple<A, B, C> >
Wouldn't the most "natural" attribute for this be a "named" variant which map contain the same type for two or more different "names"? IOW, instead of the variant currently in boost, the sum type in: sandbox/boost/indexed_types/sum.hpp would represent the result of parsing the alternative expression: a | b | c and the names could be 1,2,3 (all unsigned) or possibly names, such as elements of an enumeration: enum alt_abc{ alt_a, alt_b, alt_c};
where A, B and C are the types of a, b and c. So, now, when creating the parser, you'll have to make a table of alternates tuple<A, B, C> which dispatches properly based on information known only at runtime: 1) the current input character and
Of course, I understand how this must be only known at runtime.
2) the first follow sets of A, B and C.
Now this could be calculated at compile time if there were no cycles in the productions. For example, the simplest case would be: A = 'a' A | '' i.e. a sequence of a's. The cycle, or recursion, is caused by the A on the rhs. Of course, we know the solution in this simple case, but other's would, I guess, couldn't be solved or would be too difficult to solve. (I'm really guessing just based on my inability to figure out how). The point is, some could be while other's might not be; hence, a more accurate statement would be: 2) some of the first and follow sets of A, B, and C could not be calculated at compile time or would be "too difficult" to calculate at compile time. Regards, Larry