[xpressive][review] regex type

What is the rationale for having the regex type bound to the iterator type used when matching/searching ? Thomas

Thomas Witt wrote:
What is the rationale for having the regex type bound to the iterator type used when matching/searching ?
Thomas
Good question. From the regex proposal, John writes: "There is one other restriction worth pointing out; the Boost interface [parameterized on Char instead of Iterator] does not easily allow an implementation to generate self-modifying machine code for the state machine. That would require the state machine to be fixed to a finite number of iterator types, and in practice there appear to be no regex implementations that make this design choice in any case." xpressive is such an implementation. At least, static xpressive is. Since the the full regex expression *and* the iterator type are known at compile time, the compiler is used to generate machine code to match a regex to a data sequence. Each static regex generates its own small custom pattern matching engine. For these regexes, there are no op-codes, jump tables, virtual calls or other indirections. If the regex type were parameterized on Char instead of Iterator, then the compiled form of a regex would have to be Iterator-neutral. You would have to separate the internal representation of the regex from the code that evaluates the regex against a data sequence. This requires an indirection somewhere -- in the case of Boost.Regex, it's op codes and a jump table. (John can correct me if I'm wrong here.) Arrays of function pointers flummox optimizers. In short, the answer is performance. In order to take full advantage of the compile-time information of static regexes, the regex type must be parameterized on Iterator, not Char. In case you're wondering if it matters or not, static regex are ~13% faster than their dynamic equivalents on average (on VC7.1). Perf results here: http://tinyurl.com/dblg9 HTH, -- Eric Niebler Boost Consulting www.boost-consulting.com
participants (2)
-
Eric Niebler
-
Thomas Witt