Re: [boost] [Review] xpressive

16 Sep 2005


      My reply to Darren seems to have been eaten by the GMane monster. 
Resending....


Eric Niebler wrote:
...
Answers inline...
Darren Cook wrote:
...
Do you think the library should be accepted as a Boost library? Yes, but
conditional on having benchmarks showing a worthwhile speed improvement
over boost.regex. (Or alternatively over spirit.) Without that there is
no strong reason to have it in Boost along with both Spirit and 
boost.regex.
There are results of performance benchmarks of static xpressive 
vs. dynamic xpressive vs. Boost.Regex in the Appendix of xpressive's 
documentation. You must have missed it. See:
http://boost-sandbox.sf.net/libs/xpressive/doc/html/xpressive/perf.html
In short, xpressive comes out consistently ahead of Boost.Regex on 
short matches, and roughly on par for longer matches (with wide 
variation). Results are shown for both gcc 3.4 and vc7.1. The 
xpressive download includes the code for the perf test, so you can run 
it yourself, if you like.
...
* user_s_guide.html
 As I read I assumed "sregex" meant static (compile-time) regex. I then
thought compile() must be very clever and wondered why bother with the
alternative ">>" syntax.
 So I think you need to make it clearer on this page that sregex means
std::string regex, and that compile() is for a run-time regex, and the
">>" syntax is for a compile-time regex.
Agreed.
...
* creating_a_regex_object.html
 1. Either the meaning of Perl's /s modifier needs to be defined
clearly, or the difference between "_" and "~_n" needs to be shown with
an example (incidentally none of your examples at examples.html match
strings with carriage-returns).
Agreed. FYI, "_" matches any one character. ~_n matches any character 
that is not '\n'. I also need to describe _ln, which matches a logical 
newline (e.g., "\n" or "\r" or "\r\n" or other line separators), and 
~_ln, which matches any one character that is not a line separator. 
This all needs to be documented better.
...
2. I see I can use icase("Abc") but is there a way to say the whole
regex should be case-insensitive? I.e. the equivalent of:
 "/match something/i"
You can just wrap the whole regex in icase(). I need to show an 
example of that.
...
* grammars_and_nested_matches.html
In the example that starts:
  sregex parentheses;
  parens = '('
should "parens" actually be "parentheses" ?
Yes. My bad.
...
2. In Filtering Nested Results, I wasn't clear what the purpose was. Is
it to show all the name matches before all the id matches? If so,
choosing a less regular example string would help, e.g. with more names
than ids, names following names some of the time, etc.
I'm not at all sure of the utility of the nested results filter, and I 
may just cut it. After matching a regex that contains nested regexes, 
the match_results object contains nested results. Figuring out which 
results correspond to which regex can be difficult. The filter lets 
you see only those results corresponding to a particular nested regex. 
But I've yet to need it in practice. *shrug*
...
3. "See the definition of output_nested_results in the Examples 
section."
  I think that function should be moved to
grammars_and_nested_matches.html; it seemed out of place in 
examples.html.
You're right, it doesn't belong in Examples. But I didn't want to 
clutter the user doc with what is really an implementation detail. 
I'll think about it.
...
* Other
 1. I'd like to see some fuller examples, that show the I/O part as
well. E.g. a full program that takes a list of email addresses on stdin,
one per line, and spits out a list of the illegal ones.
Haha! Have you /seen/ the regex that matches email addresses? It's 5 
pages long. But I get the idea -- examples are important. I'll see 
what I can come up with.
...
2. Benchmarking. I wanted to see the relative speed of compile-time vs.
run-time vs. boost::regex (and ideally vs. PCRE or a scripting language)
on some realistic application.
It's in there.
-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com