[ANN] xpressive 0.9

It's been nearly a year since I first announced my intention to build a new regular expression engine. It has finally reached a point where I am comfortable recommending it for general use.
<< xpressive 0.9 >> http://boost-sandbox.sourceforge.net/libs/xpressive
- What is it?
It is a regular expression engine which allows you to author regexes as expression templates (like Spirit) *or* as strings (like Boost.Regex). When written as an expression template, the regex is syntax-checked at compile time and statically bound for maximal inlining and optimization. (See http://tinyurl.com/6k9p8 for more detail.) Also, regular expressions can nest and call each other recursively, which gives you the power of a context-free grammar. That makes it appropriate for simple parsing tasks. (See http://tinyurl.com/4enn4 for more detail.)
- What's the interface like?
xpressive is a Boost.Regex work-alike. It follows the regex standardization proposal closely, but not too closely! You can read about all the differences in xpressive's documentation. The domain-specific embedded language for xpressive is heavily influenced by Spirit.
- Where can I read more?
xpressive's documentation is online at: http://boost-sandbox.sourceforge.net/libs/xpressive
- Where can I download the code?
You can get the xpressive zip archive at: http://boost-sandbox.sourceforge.net/xpressive.zip It contains the source code and the documentation in PDF format. Alternatively, you can get xpressive directly from the boost-sandbox. Source code is at /boost-sandbox/boost/xpressive, and the regression test and documentation are at /boost-sandbox/libs/xpressive.
Note: this is version 0.9. Check out the "Not Yet Implemented" section in the documentation to see why. I'll be working towards v1.0 in the coming weeks and months, but I am confident at this point that the interface is not likely to change. What's there is stable and solid and ready for (ab)use.
Original announcement of xpressive: http://lists.boost.org/MailArchives/boost/msg55623.php Cheers, -- Eric Niebler Boost Consulting www.boost-consulting.com
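To make the "power of a context-free grammar" remark concrete, here is a minimal sketch in plain C++. It is not xpressive code, and the names match_group and is_balanced_group are invented for illustration. It hand-codes the self-referential rule group := '(' ( non-paren | group )* ')', which matches balanced parentheses, something a classic non-recursive regular expression cannot express. A regex that is allowed to refer to itself can encode the same rule; how xpressive spells that is described in its documentation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: tries to match one balanced group starting at `pos`.
// Returns the index just past the closing ')', or std::string::npos on failure.
// Grammar (self-referential): group := '(' ( non-paren | group )* ')'
inline std::size_t match_group(const std::string& s, std::size_t pos) {
    if (pos >= s.size() || s[pos] != '(') return std::string::npos;
    ++pos;  // consume '('
    while (pos < s.size() && s[pos] != ')') {
        if (s[pos] == '(') {
            // The rule invokes itself here: this is the recursive "call".
            pos = match_group(s, pos);
            if (pos == std::string::npos) return std::string::npos;
        } else {
            ++pos;  // any character other than a parenthesis
        }
    }
    if (pos >= s.size()) return std::string::npos;  // input ended before ')'
    return pos + 1;  // consume ')'
}

// True when the whole string is exactly one balanced group.
inline bool is_balanced_group(const std::string& s) {
    return match_group(s, 0) == s.size();
}
```

Calling is_balanced_group("(a(b)(c))") succeeds, while an input with a missing or extra parenthesis is rejected.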

Is there a speed comparison with http://research.microsoft.com/projects/greta/ ?

Goran Mitrovic wrote:
Is there a speed comparison with http://research.microsoft.com/projects/greta/ ?
I haven't yet benchmarked performance. I plan to soon, and when I do, I'll post the results. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
Goran Mitrovic wrote:
Is there a speed comparison with http://research.microsoft.com/projects/greta/ ?
I have some preliminary performance numbers at: http://tinyurl.com/6gefh
This is a comparison of dynamic xpressive, static xpressive and Boost.Regex, using the latest versions of both libraries in CVS. I'll see what I can do about getting the numbers for GRETA up there, too. These numbers were obtained using gcc 3.3.3 (cygwin) on my old laptop. Why cygwin? Because I'm hitting a buffer overflow trying to compile the tests on VC7.1. :-P Gotta work around that.
The summary:
- xpressive does well on short strings, often beating Boost.Regex by 4x or more on this test.
- Boost.Regex does well on long strings, often beating xpressive by 3x or more on this test.
- static xpressive usually edges out dynamic xpressive, but not by a lot.
I can't say for certain at this point why xpressive is faster on short strings and slower on long ones, but I can guess. I haven't spent much time at this point tuning performance. For xpressive's part, you're seeing the results of brute-force search. No Boyer-Moore or first-and-follow or special-case handling for narrow character sets ... or anything terribly clever at all. This is clearly hurting xpressive on the long strings, but these things make less of a difference on short matches. xpressive is still quite immature in this regard. I'm very happy with the short string performance, but I bet it's highly sensitive to hardware/compiler configuration. YMMV.
The code for this test is checked in at /boost-sandbox/libs/xpressive/perf.
-- Eric Niebler Boost Consulting www.boost-consulting.com
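The difference between brute-force search and Boyer-Moore mentioned above can be sketched with the C++17 standard library. This is an illustration only, not code from xpressive or Boost.Regex; find_brute_force and find_boyer_moore are hypothetical names. The brute-force version tries every start position, while std::boyer_moore_searcher preprocesses the needle so that a mismatch can skip ahead by more than one character, which matters most on long inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Brute force: try every start position and compare character by character.
inline std::size_t find_brute_force(const std::string& text, const std::string& needle) {
    if (needle.size() > text.size()) return std::string::npos;
    for (std::size_t i = 0; i + needle.size() <= text.size(); ++i) {
        if (text.compare(i, needle.size(), needle) == 0) return i;
    }
    return std::string::npos;
}

// Boyer-Moore via the C++17 standard library searcher.
inline std::size_t find_boyer_moore(const std::string& text, const std::string& needle) {
    if (needle.empty()) return 0;  // same convention as the brute-force version
    std::boyer_moore_searcher<std::string::const_iterator> searcher(needle.begin(),
                                                                    needle.end());
    auto it = std::search(text.begin(), text.end(), searcher);
    return it == text.end() ? std::string::npos
                            : static_cast<std::size_t>(it - text.begin());
}
```

Both functions return the same position for the same input; only the amount of work done on the way differs.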

Eric Niebler wrote:
I have some preliminary performance numbers at: http://tinyurl.com/6gefh
These numbers have been updated. In addition, I ran a benchmark using VC7.1, and the results are dramatically different, so I feel compelled to publish them. They're here: http://tinyurl.com/63ypy With VC7.1, the performance gap between xpressive and Boost.Regex is much smaller than with gcc. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler <eric <at> boost-consulting.com> writes:
These numbers have been updated. In addition, I ran a benchmark using VC7.1, and the results are dramatically different, so I feel compelled to publish them. They're here:
With VC7.1, the performance gap between xpressive and Boost.Regex much smaller than with gcc.
OK, thanks, but please take a look at GRETA. Its author claims it is a few times faster than Boost's regex (due to the concept of eliminating virtual function calls).

Stefan Slapeta <stefan <at> slapeta.com> writes:
Goran Mitrovic wrote:
OK, tnx, but please pay attention to the Greta. Author claims to be few times faster (due to the virtual fcalls eliminations concept) than Boost's regex.
This one was _very_ funny. Was this meant ironically?
No, not really. GRETA was never referenced in his articles, nor was a rename mentioned. Also, I tend to remember useful things (the library's name rather than the author's), and GRETA stuck in my memory as something well implemented. Funny? Sure... ;)

Goran Mitrovic wrote:
OK, tnx, but please pay attention to the Greta. Author claims to be few times faster (due to the virtual fcalls eliminations concept) than Boost's regex.
Yep, I wrote GRETA. GRETA /was/ faster than Boost.Regex pre-1.31. Since then, Boost.Regex has been rewritten to be much more efficient. Boost.Regex and GRETA are now pretty close in performance. I wrote GRETA while at Microsoft, and the code is owned by them. I started writing xpressive when I left Microsoft. It is a ground-up reimplementation, not a rename of GRETA. GRETA is no longer being actively maintained. xpressive takes the virt-call-elimination trick to the extreme, using expression templates to eliminate *all* virtuals. -- Eric Niebler Boost Consulting www.boost-consulting.com
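The idea of using expression templates to remove virtual calls can be sketched in a few lines of standard C++. This is a toy illustration under stated assumptions, not xpressive's or GRETA's actual design; every type and function name below is invented. The dynamic version walks a tree of nodes through a virtual match function, while the static version encodes the pattern's structure in its type, so each match call is resolved at compile time and can be inlined.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// --- Dynamic style: the pattern is a tree of nodes reached via virtual calls.
struct DynMatcher {
    virtual ~DynMatcher() = default;
    // Tries to match at `pos`; advances `pos` on success.
    virtual bool match(const std::string& s, std::size_t& pos) const = 0;
};

struct DynLiteral : DynMatcher {
    char ch;
    explicit DynLiteral(char c) : ch(c) {}
    bool match(const std::string& s, std::size_t& pos) const override {
        if (pos < s.size() && s[pos] == ch) { ++pos; return true; }
        return false;
    }
};

struct DynSequence : DynMatcher {
    std::vector<std::unique_ptr<DynMatcher>> parts;
    bool match(const std::string& s, std::size_t& pos) const override {
        std::size_t p = pos;
        for (const auto& part : parts)
            if (!part->match(s, p)) return false;  // one virtual call per node
        pos = p;
        return true;
    }
};

// --- Static style: the pattern's structure is part of its type.
struct Literal {
    char ch;
    bool match(const std::string& s, std::size_t& pos) const {
        if (pos < s.size() && s[pos] == ch) { ++pos; return true; }
        return false;
    }
};

template <class Left, class Right>
struct Sequence {
    Left left;
    Right right;
    bool match(const std::string& s, std::size_t& pos) const {
        std::size_t p = pos;
        // Both calls are resolved at compile time; no virtual dispatch.
        if (left.match(s, p) && right.match(s, p)) { pos = p; return true; }
        return false;
    }
};

// operator>> builds the nested Sequence type from its operands.
template <class Right>
Sequence<Literal, Right> operator>>(Literal l, Right r) { return {l, r}; }
template <class L1, class L2, class Right>
Sequence<Sequence<L1, L2>, Right> operator>>(Sequence<L1, L2> l, Right r) { return {l, r}; }

// Both helpers test whether the whole input is exactly "abc".
inline bool dynamic_matches_abc(const std::string& s) {
    DynSequence seq;
    for (char c : std::string("abc")) seq.parts.push_back(std::make_unique<DynLiteral>(c));
    std::size_t pos = 0;
    return seq.match(s, pos) && pos == s.size();
}

inline bool static_matches_abc(const std::string& s) {
    const auto pattern = Literal{'a'} >> Literal{'b'} >> Literal{'c'};
    std::size_t pos = 0;
    return pattern.match(s, pos) && pos == s.size();
}
```

The two helpers accept and reject the same inputs; the static one simply gives the compiler the whole pattern to optimize as a single expression.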

Hi, Eric: I downloaded GRETA, but I can't compile it because of several link errors. I am using VS.NET 2002 (unmanaged). I just copied the example from the GRETA documentation. Thanks.
"Eric Niebler" <eric@boost-consulting.com> wrote in message news:413CB678.1000700@boost-consulting.com...
Goran Mitrovic wrote:
OK, tnx, but please pay attention to the Greta. Author claims to be few times faster (due to the virtual fcalls eliminations concept) than Boost's regex.
Yep, I wrote GRETA. GRETA /was/ faster than Boost.Regex pre-1.31. Since then, Boost.Regex has been rewritten to be much more efficient. Boost.Regex and GRETA are now pretty close in performance.
I wrote GRETA while at Microsoft, and the code is owned by them. I started writing xpressive when I left Microsoft. It is a ground-up reimplementation, not a rename of GRETA. GRETA is no longer being actively maintained.
xpressive takes the virt-call-elimination trick to the extreme, using expression templates to eliminate *all* virtuals.
-- Eric Niebler Boost Consulting www.boost-consulting.com _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi, Eric: Sorry for asking the same question twice! I know you answered my question about the GRETA link error today, but, quite by accident, when I dragged the messages from bulk mail to my local inbox, they were all deleted by the Hotmail server, so I haven't read your message. Hotmail alerted me that an error had occurred and then deleted all my incoming messages. Sigh! I hope you can reply again. Thanks very much! Sincerely, chaujohnthan onlyinfo at pub.dgnet.gd.cn

Chau Johnthan wrote:
Hi, Eric:
Sorry for asking the same question twice!
I know you have answered me about the GRETA link error today, but very incidentally, when I drag msgs from bulk mail to my local inbox, all msg had been deleted by hotmail server!!! so I haven't read your msg.
Hotmail alert me an error occurred, then delete all my incoming msg!!! sigh!
I wish you can reply again, Thanks very much!
What I said in my private reply to you was:
1) Discussion about GRETA is off-topic here because it is not a part of Boost and never will be.
2) I am no longer the maintainer of GRETA. Presumably you downloaded it from GRETA's website, where it says who the new maintainers are.
And even though you have given virtually no specific information about your problem, I can probably guess that it's because you didn't add GRETA's cpp files to your project.
In general, if you have a question for one person, it's better to send it to that one person instead of a huge distribution list.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Hi, Eric:
1) Discussion about GRETA is off-topic here because it is not a part of Boost and never will be.
So the Boost libraries will not be compatible with GRETA?
2) I am no longer the maintainer of GRETA. Presumably you downloaded it from GRETA's website, where it says who the new maintainers are.
Yes, I know that, but I want to use GRETA with Boost.
And even though you have given virtually no specific information about your problem, I can probably guess that it's because you didn't add GRETA's cpp files to your project.
I just created a win32 project, and replaced _tmain with the sample supplied near the start of the documentation:
    #include <iostream>
    #include <string>
    #include "regexpr2.h"
    using namespace std;
    using namespace regex;
    int main()
    {
        match_results results;
        string str( "The book cost $12.34" );
        rpattern pat( "\\$(\\d+)(\\.(\\d\\d))?" );
        // Match a dollar sign followed by one or more digits,
        // optionally followed by a period and two more digits.
        // The double-escapes are necessary to satisfy the compiler.
        match_results::backref_type br = pat.match( str, results );
        if( br.matched ) {
            cout << "match success!" << endl;
            cout << "price: " << br << endl;
        } else {
            cout << "match failed!" << endl;
        }
        return 0;
    }
and I added regexpr2.cpp & regexpr2.h only.
------ Build started: Project: testreg, Configuration: Debug Win32 ------
Compiling...
stdafx.cpp
Compiling...
regexpr2.cpp
e:\testreg\regexpr2.cpp(59) : warning C4005: 'REGEXPR_H_INLINE' : macro redefinition
e:\testreg\regexpr2.cpp(57) : see previous definition of 'REGEXPR_H_INLINE'
e:\testreg\regexpr2.cpp(64) : error C2006: '#include' : expected a filename, found 'identifier'
e:\testreg\regexpr2.cpp(151) : warning C4005: 'REGEX_DECL_CTYPE' : macro redefinition
e:\testreg\regexpr2.cpp(100) : see previous definition of 'REGEX_DECL_CTYPE'
e:\testreg\regexpr2.cpp(198) : warning C4005: 'REGEX_DEBUG_HEAP' : macro redefinition
e:\testreg\regexpr2.cpp(196) : see previous definition of 'REGEX_DEBUG_HEAP'
e:\testreg\regexpr2.cpp(1883) : warning C4005: 'DECLARE_RECURSIVE_MATCH_ALL' : macro redefinition
e:\testreg\regexpr2.cpp(1777) : see previous definition of 'DECLARE_RECURSIVE_MATCH_ALL'
e:\testreg\regexpr2.cpp(1919) : warning C4005: 'DECLARE_ITERATIVE_MATCH_THIS' : macro redefinition
e:\testreg\regexpr2.cpp(1804) : see previous definition of 'DECLARE_ITERATIVE_MATCH_THIS'
e:\testreg\regexpr2.cpp(1935) : warning C4005: 'DECLARE_ITERATIVE_REMATCH_THIS' : macro redefinition
e:\testreg\regexpr2.cpp(1821) : see previous definition of 'DECLARE_ITERATIVE_REMATCH_THIS'
e:\testreg\regexpr2.cpp(6297) : warning C4005: 'REGEX_TO_INSTANTIATE' : macro redefinition
e:\testreg\regexpr2.cpp(6294) : see previous definition of 'REGEX_TO_INSTANTIATE'
e:\testreg\regexpr2.cpp(6325) : fatal error C1010: unexpected end of file while looking for precompiled header directive
testreg.cpp
Generating Code...
Build log was saved at "file://e:\testreg\Debug\BuildLog.htm"
testreg - 2 error(s), 7 warning(s)
If I add all of the GRETA files, there are a lot of errors as well. I googled it and found only one result, on experts-exchange, where someone said he fixed it by adding some files to his project. But I can't get it to work no matter what I do.
In general, if you have a question for one person, it's better to send it to that one person instead of a huge distribution list.
No offense intended. I replied to you directly, but it failed, so I posted it here again. Thanks for your patience.

hi, Eric: hello? "Eric Niebler" <eric@boost-consulting.com> wrote in message news:413F45AD.7050906@boost-consulting.com...
Chau Johnthan wrote:
Hi, Eric:
Sorry for asking the same question twice!
I know you have answered me about the GRETA link error today, but very incidentally, when I drag msgs from bulk mail to my local inbox, all msg had been deleted by hotmail server!!! so I haven't read your msg.
Hotmail alert me an error occurred, then delete all my incoming msg!!! sigh!
I wish you can reply again, Thanks very much!
What I said in my private reply to you was:
1) Discussion about GRETA is off-topic here because it is not a part of Boost and never will be.
2) I am no longer the maintainer of GRETA. Presumably you downloaded it from GRETA's website, where it says who the new maintainers are.
And even though you have given virtually no specific information about your problem, I can probably guess that it's because you didn't add GRETA's cpp files to your project.
In general, if you have a question for one person, it's better to send it to that one person instead of a huge distribution list.
-- Eric Niebler Boost Consulting www.boost-consulting.com

"Chau Johnthan" <chaujohnthan@hotmail.com> wrote in message news:chnlsm$plr$1@sea.gmane.org...
"Eric Niebler" <eric@boost-consulting.com> wrote in message:
[reordering]
In general, if you have a question for one person, it's better to send it to that one person instead of a huge distribution list.
hi, Eric:
hello?
Hello isn't even a question. Jonathan

Hi, Eric: I am puzzled: why can't GRETA work when using #include "stdafx.h"? What should I do to make it work while keeping a normal wizard-generated win32 project environment? Thanks. johnthan

"Chau Johnthan" <chaujohnthan@hotmail.com> wrote in message:
Hi, Eric:
I am puzzled, why GRETA can't work when using #include "stdafx.h"?
how should i do to make it work, in the mean time to keep a normal wiz-gen win32 project environment?
A lot of people read this list and get annoyed by off-topic posts. Jonathan
"Eric Niebler" <eric@boost-consulting.com> wrote:
What I said in my private reply to you was:
1) Discussion about GRETA is off-topic here because it is not a part of Boost and never will be.
2) I am no longer the maintainer of GRETA. Presumably you downloaded it from GRETA's website, where it says who the new maintainers are.

"Chau Johnthan" <chaujohnthan@hotmail.com> writes:
Hi, Eric:
I am puzzled, why GRETA can't work when using #include "stdafx.h"?
how should i do to make it work, in the mean time to keep a normal wiz-gen win32 project environment?
thanks.
johnthan
Johnthan (or is it Chau?), Please take your GRETA questions to one of the appropriate *current* maintainers of that library. Nobody here right now even knows about it, since Eric is on vacation. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Can xpressive be used as a scanner with Spirit, in the (spirit) of lex + yacc?

Neal D. Becker wrote:
Can xpressive be used as scanner with spirit, in the (spirit) of lex + yacc?
It hasn't yet been tried. But Hartmut once wrote an interfacing layer so that xpressive regexes can be embedded into Spirit rules. The intention was to use xpressive as a lexer for Spirit in Wave, Hartmut's pre-processor. It hasn't been attempted yet, AFAIK. <aside> The future for xpressive and Spirit looks very interesting. Joel and company have been planning Spirit-2, and we plan to use some ideas from xpressive to give Spirit-2 optional exhaustive backtracking semantics. The plan is to bring these two libraries closer together. </aside> -- Eric Niebler Boost Consulting www.boost-consulting.com

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:4137BFDD.1000805@boost-consulting.com...
It's been nearly a year since I first announced my intention to build a new regular expression engine. It has finally reached a point where I am comfortable recommending it for general use.
<< xpressive 0.9 >>
Looks great! A few questions. (Please forgive me if they are answered in the docs; I haven't finished reading them.)
- Have you considered allowing format strings to be bound statically?
- Have you considered using _1, _2, _3 instead of s1, ... for capturing parentheses? You could also use them as placeholders in statically-bound format strings.
- Have you considered overloading operator/ to emulate Perl syntax for substitution? E.g., s/expr/fmt, sub/expr/fmt, or just expr/fmt could return a function object with templated operator which invokes regex_replace.
Just some thoughts ....
Best Regards, Jonathan

Jonathan Turkanis wrote:
"Eric Niebler" <eric@boost-consulting.com>
<< xpressive 0.9 >>
Looks great!
Thanks!
A few questions. (Please forgive me if they are answered in the docs, I haven't finished reading them):
- Have you considered allowing format strings to be bound statically?
That hadn't occurred to me. Certainly possible, but I think the returns would be small. The complexity of doing string replacements is linear, and so is not as sensitive to optimization as pattern matching. And the format string has very simple syntax, so static syntax checking isn't as big of a win here, either.
- Have you considered using _1, _2, _3 instead of s1, ... for capturing parentheses? You could also use them as placeholders in statically-bound format strings.
Yes, I used _1, _2, _3 for a long time. I abandoned it because of name conflicts with the identically-named placeholders from other Boost libraries. I picked s1, s2, ... for two reasons: 1) The "s" in "s1" stands for "sub-match", which is what these things represent. 2) s1 kind of looks like $1, which is the Perl equivalent. That said, I'm open to suggestions for avoiding the name conflicts. I would consider switching back to _1 _2 _3 if the technical problems were overcome and if people liked it better.
- have you considered overloading operator/ to emulate perl syntax for substitution. E.g.,
s/expr/fmt, sub/expr/fmt, or just expr/fmt
could return a function object with templated operator which invokes regex_replace.
A little too cute, IMO.
Just some thoughts ....
Keep 'em coming. -- Eric Niebler Boost Consulting www.boost-consulting.com
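For readers unfamiliar with the format strings and the $1 notation discussed in this message, here is a small sketch using std::regex, the standardized descendant of the regex proposal the announcement refers to. It is a stand-in, not xpressive code, and swap_words is a hypothetical helper. The format string is scanned once from left to right, which is why binding it statically would gain little.

```cpp
#include <regex>
#include <string>

// Swaps each pair of space-separated words. "$1" and "$2" in the format
// string refer to the first and second capture groups, as in Perl.
inline std::string swap_words(const std::string& input) {
    static const std::regex two_words("(\\w+) (\\w+)");
    return std::regex_replace(input, two_words, "$2 $1");
}
```

For example, swap_words("hello world") yields "world hello"; text between matches is copied through unchanged.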

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:413A5044.6020604@boost-consulting.com...
Jonathan Turkanis wrote:
- Have you considered allowing format strings to be bound statically?
That hadn't occurred to me. Certainly possible, but I think the returns would be small. The complexity of doing string replacements is linear, and so is not as sensitive to optimization as pattern matching. And the format string has very simple syntax, so static syntax checking isn't as big of a win here, either.
Makes sense.
- Have you considered using _1, _2, _3 instead of s1, ... for capturing parentheses? You could also use them as placeholders in statically-bound format strings.
Yes, I used _1, _2, _3 for a long time. I abandoned it because of name conflicts with the identically-named placeholders from other boost libraries.
I know this problem well :(
I picked s1, s2, ... for two reasons: 1) The "s" in "s1" stands for "sub-match", which is what these things represent. 2) s1 kind of looks like $1, which is the perl equivalent.
I didn't think of 2). Did you put it in the docs? FWIW, capital 'S' looks more like '$' to me. E.g., '<' >> (S1= +_w) >> '>' >> -*_ >> "</" >> S1 >> '>'
That said, I'm open to suggestions for avoiding the name conflicts. I would consider switching back to _1 _2 _3 if the technical problems were overcome and if people liked it better.
May I assume you have considered reusing the placeholders from boost::bind? There doesn't seem to be much operator overloading involving boost::arg<>.
- have you considered overloading operator/ to emulate perl syntax for substitution. E.g.,
s/expr/fmt, sub/expr/fmt, or just expr/fmt
could return a function object with templated operator which invokes regex_replace.
A little too cute, IMO.
Oh well ;-) Jonathan
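The static expression quoted earlier in this message, '<' >> (S1= +_w) >> '>' >> -*_ >> "</" >> S1 >> '>', captures a tag name and then requires the same name in the closing tag. Assuming the usual reading of the static syntax (+_w as one or more word characters, -*_ as a non-greedy "any characters", and the second S1 as a back-reference), a string-based equivalent can be sketched with std::regex. This is a stand-in for illustration; is_matched_tag_pair is a hypothetical name.

```cpp
#include <regex>
#include <string>

// String form of the pattern: an opening tag, a non-greedy body, and a
// closing tag whose name must equal whatever group 1 captured (\1).
inline bool is_matched_tag_pair(const std::string& input) {
    static const std::regex pattern("<(\\w+)>.*?</\\1>");
    return std::regex_match(input, pattern);
}
```

The string form shows what the placeholder stands for: the first use marks capture group 1, and the second use refers back to it, like $1 or \1 in Perl.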

Jonathan Turkanis wrote:
"Eric Niebler" <eric@boost-consulting.com> wrote:
2) s1 kind of looks like $1, which is the perl equivalent.
I didn't think of 2). Did you put it in the docs?
No. Guess I should.
FWIW, capital 'S' looks more like '$' to me. E.g.,
'<' >> (S1= +_w) >> '>' >> -*_ >> "</" >> S1 >> '>'
This is true, but it runs contrary to Boost's naming conventions. ALL CAPS is for macros.
That said, I'm open to suggestions for avoiding the name conflicts. I would consider switching back to _1 _2 _3 if the technical problems were overcome and if people liked it better.
May I assume you have considered reusing the placeholders from boost::bind? There doesn't seem to be much operator overloading involving boost::arg<>.
I thought of that. Trouble is, xpressive's placeholders need to have an operator=, which must be a member. Besides, I rely on ADL to find xpressive's operators, and bind's placeholders are not in the correct namespace for my purposes. -- Eric Niebler Boost Consulting www.boost-consulting.com
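The two constraints named here can be shown in a toy sketch. Everything in it is hypothetical (the sketch namespace, Expr, Placeholder, lit, and describe_tag_pattern are invented for illustration and are not xpressive's types). C++ only allows operator= to be overloaded as a non-static member function, so a placeholder that supports (s1 = expr) must be a class the regex library itself defines. And a free operator>> is found by argument-dependent lookup only when its operands live in the library's namespace.

```cpp
#include <string>

namespace sketch {

// A stand-in for an expression node: it only records a textual description.
struct Expr {
    std::string text;
};

struct Placeholder {
    int index;
    // operator= must be a member function, so it cannot be bolted onto a
    // placeholder type owned by another library.
    Expr operator=(const Expr& e) const {
        return Expr{"(capture " + std::to_string(index) + ": " + e.text + ")"};
    }
};

// Free operator, found through argument-dependent lookup because Expr is
// declared in namespace sketch.
inline Expr operator>>(const Expr& l, const Expr& r) {
    return Expr{l.text + " " + r.text};
}

inline Expr lit(const std::string& s) { return Expr{"'" + s + "'"}; }

const Placeholder s1{1};

}  // namespace sketch

// Note there is no using-directive: operator>> is located purely by ADL.
inline std::string describe_tag_pattern() {
    sketch::Expr e = sketch::lit("<") >> (sketch::s1 = sketch::lit("tag")) >> sketch::lit(">");
    return e.text;
}
```

The parentheses around the assignment are required because >> binds more tightly than =, which matches how the static regexes in this thread are written.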

"Eric Niebler" <eric@boost-consulting.com> wrote in message news:413BF440.20505@boost-consulting.com...
Jonathan Turkanis wrote:
"Eric Niebler" <eric@boost-consulting.com> wrote:
2) s1 kind of looks like $1, which is the perl equivalent.
I didn't think of 2). Did you put it in the docs?
No. Guess I should.
FWIW, capital 'S' looks more like '$' to me. E.g.,
'<' >> (S1= +_w) >> '>' >> -*_ >> "</" >> S1 >> '>'
This is true, but it runs contrary to Boost's naming conventions. ALL CAPS is for macros.
I was thinking of vecS .. I forgot that 1 is a capital. :-)
That said, I'm open to suggestions for avoiding the name conflicts. I would consider switching back to _1 _2 _3 if the technical problems were overcome and if people liked it better.
May I assume you have considered reusing the placeholders from boost::bind? There doesn't seem to be much operator overloading involving boost::arg<>.
I thought of that. Trouble is, xpressive's placeholders need to have an operator=, which must be a member. Besides, I rely on ADL to find xpressive's operators, and bind's placeholders are not in the correct namespace for my purposes.
This class of placeholders is a big problem. Jonathan

"Eric Niebler" <eric@boost-consulting.com> writes:
Yes, I used _1, _2, _3 for a long time. I abandoned it because of name conflicts with the identically-named placeholders from other boost libraries. I picked s1, s2, ... for two reasons: 1) The "s" in "s1" stands for "sub-match", which is what these things represent. 2) s1 kind of looks like $1, which is the perl equivalent.
That said, I'm open to suggestions for avoiding the name conflicts. I would consider switching back to _1 _2 _3 if the technical problems were overcome and if people liked it better.
FWIW, The bind library placeholders are going to be moved out of the unnamed namespace soon, unless Peter has changed his mind. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
Participants (8): Chau Johnthan, David Abrahams, Eric Niebler, Goran Mitrovic, Jonathan Turkanis, Jurko Gospodnetic, Neal D. Becker, Stefan Slapeta