
John Maddock wrote: JM> The problem with separating the parser out, is that it can often benefit JM> from linkage to the state machine format: it needs to know what "kinds" of JM> states are supported: whether the state machine uses a "complex" or JM> "reduced" instruction set This is why I suggested that the better way (I do not claim it's the best way, though) of connecting parser to state machine is by having regex_creator to perform the shift-reduce activity. These actions must be present somewhere in some form in any case, and moving them to regex_creator allows us to get by without creating additional structures (read: trees) if we don't need them explicitly. The following sample creates a tree as the parser calls creator's feature functions. Can your regex compilation process be represented this way? class abstract_regex_creator { public: void charset( const charset & ) { throw exception_unsupported; } void repeat( int, int, bool ) { throw exception_unsupported; } void cat() { throw exception_unsupported; } // ... }; class sample_regex_creator: public abstract_regex_creator { protected: // declare node_stack here public: void charset( const charset &cs ) { // push the charset node onto the stack } void repeat( int lo, int hi, bool lazy ) { // pop the top node N, repeat it (expanding into whatever // subexpression is necessary), and push the resulting node back } void cat() { // pop two nodes, push the cat node back } // and so on; if any regex feature is not supported, // just don't mention it here, so it will throw an exception }; JM> (for example whether to expand +, ?, and {a,b} JM> into just | and * operations: as most DFA implementations require. I JM> haven't looked to see if yours does though). Types of regex elements that are directly supported are defined in "anfa-nodes.h": null/charset leafs; +, +?, ?, ??, () unary operations; concatenation and alternation. * (*?) is expanded into + and ? (+? and ??). {a,b} is not in the source yet; it has to be expanded into a sequence of ?'s. -- Best regards, Vladimir Pozdyayev.