Detlef Meyer-Eltz wrote:
Now you can imagine that it was a shock for me to discover that I had misinterpreted the leftmost-longest rule in the manner I liked. I hadn't stumbled over this error before, because matching two alternatives of the same length seems to be a rare case.
Nod.
void add_symbol(const charT* p, symbol_type s);
I don't know whether there is a chance to write code for such an addition to a lexregex which has already been compiled. Otherwise such a lexregex would have to be compiled in an extra step before use. In that form I could build it on top of the existing regex class.
I think you would have to recompile the whole regex in order to add an arbitrary extension to the expression.
In this context there are two other points I'm interested in:
In my parser generator there is already a preference for literal tokens (they aren't treated as regular expressions but are handled by a ternary tree), and I have a vague idea that, in general, a token should be preferred the more literal it is. In your documentation you mention some experimental non-member comparison operators. What is the idea behind these comparisons? Could they be used to define preferences?
The comparison operators aren't used anywhere to determine matching: they can be used by the user to compare the result to a specific string, for example.
I guess that testing one token after the other would be much more expensive than testing them together. All the more so as my parser generator has a special feature: not only to test for tokens at the current location in the input, but also to look for the next location where one of several tokens occurs. Can you tell me something about the difference in cost?
It's likely to be more expensive, yes: "impossible" branches in the state machine are eliminated quite quickly in the regex internals during matching, whereas the "one expression at a time" approach necessarily tests all the candidates. The difference would depend very much upon the particular expressions, though. HTH, John.