
Vladimir Pozdyayev wrote:
Hello.
Are there any plans to add finite automata-based subengine to the regex library? If there are, I can submit the algorithm/implementation of a modified nondeterministic FA which takes care of * greedy and lazy quantifiers, * eagerness of the alternation operator, * non-longest matches (`best' in the Perl-ish sense with respect to greediness/laziness/eagerness), * substring capturing. All these features being included, the time cost of a match is O(N(E+C)L), where N is the number of states, E is the average epsilon-closure size per state, C is the number of capturing parentheses, and L is the input length. As far as I know that's the lowest order of time complexity in this area (corrections are welcome)---though, of course, deterministic FA would be even faster.
I suppose there's no point in adding another regex library, but being able to guarantee linear time cost in an existing one would be nice.
Interesting. You say it supports substring capturing. Does it also support backreferences? By which I mean, can you refer to the captured substrings from within the pattern itself, as in the perl regex /(.)\1/? -- Eric Niebler Boost Consulting www.boost-consulting.com