
Hi All - I'm happy to announce that Boost will be mentoring 9 Google SoC projects this year. They are: Title Student Mentor ----------------------------------------------------------------------------- User-friendly graphs and their measures Andrew Sutton Jeremy Siek FastCGi and SCGI-compatible CGI library Darren Garvey Christopher M Kohlhoff Boost 'Big Integer' proposal Arseny Kapoulkine Jeff Garland Signal Network library Stjepan Rajko Douglas Gregor Evolutionary Computation Library Aaron Garrett Jaakko Järvi Application to extend Boost regex with recursive matching, named sub-expressions, and automata-based matching Hugh Wimberly John Maddock Visualization Of Arrays And STL Containers Jacob R>Voytko Joaquín L. Muñoz Boost.Extension and Reflection Mariano G. Consoni Hartmut Kaiser A Generic Geometric Library Huseyin Akcan Herve Bronnimann You can find the abstracts of these projects online here: http://code.google.com/soc/boost/about.html As with last year, there were many worthy students and projects that ultimately were not selected. So, for all the students that were not selected realize that the competition for a very few slots was immense. My main advice for students that might want to try again next year is to start participating now -- it really helps to hang out on the list, submit patches, and work on ideas well in advance of SoC. I'd like to thank the mentoring team that evaluated the ~220 applications we received this year. It takes a huge effort to go thru these, read, rank, clarify and give feedback. Note that in addition to the mentors above Andreas Huber, Rene Rivera, and Daniel Wallin all helped with evaluations. For those coming to BoostCon I'll be giving a presentation on Friday about Boost and SoC: http://www.boostcon.com/program/sessions#garland-summer-of-code And, if you aren't coming to BoostCon, well, you should ;-) http://www.boostcon.com/registration Jeff

Jeff Garland wrote:
Hi All -
I'm happy to announce that Boost will be mentoring 9 Google SoC projects this year. They are: <snip>
Application to extend Boost regex with recursive matching, named sub-expressions, and automata-based matching Hugh Wimberly John Maddock
<snip> A DFA or Thompson NFA regex backend would be very nice, but I'm a bit dubious of the claim that it can be faster than the backtracking engine in any but the most pathological cases. I've toyed with various approaches, and communicated with Russ Cox after his excellent article on the topic: http://swtch.com/~rsc/regexp/regexp1.html. The trouble is that getting ECMAScript semantics (captures, leftmost biased) saddles these approaches with bookkeeping that makes the Thompson NFA considerably slower in the common case than the dumber-than-a-bag-of-hammers backtracking NFA. I ported Russ' Thompson NFA-with-captures-and-leftmost-bias to C++/Boost, wrote a Spirit-based regex parser, separated the mutable state for thread-safety and did an apples-to-apples shootout against xpressive's backtracking NFA implementation. The result: for common regexes, xpressive was about 30x faster. But it's also true that pathological regexes exist that expose the weaknesses of the backtracking approach. <shrug> I would be happy to make the Thompson NFA engine publicly available. I'm sure it can be improved, but I tend to doubt a 30x speedup is possible. My advise: implement a simple DFA and use it only for patterns without captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead). If you need captures and leftmost-biased semantics, just use the backtracking NFA. Oh, and make the DFA so that I can use it in xpressive, too! :-)
And, if you aren't coming to BoostCon, well, you should ;-)
You bet! See you there! -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
Jeff Garland wrote:
Hi All -
I'm happy to announce that Boost will be mentoring 9 Google SoC projects this year. They are: <snip>
Application to extend Boost regex with recursive matching, named sub-expressions, and automata-based matching Hugh Wimberly John Maddock <snip> [...]
My advise: implement a simple DFA and use it only for patterns without captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead). If you need captures and leftmost-biased semantics, just use the backtracking NFA.
Oh, and make the DFA so that I can use it in xpressive, too! :-)
FWIW, Spirit2 uses lexertl -- a DFA engine by Ben Hanson. Hartmut Kaiser did some benchmarks and found it to be almost as fast as RE2C Hartmut integrated it with Spirit2 and we are very happy with it. There's room for a couple of DFA engines, I guess... http://re2c.org/ http://www.benhanson.net/lexertl.html Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

My advise: implement a simple DFA and use it only for patterns without captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead). If you need captures and leftmost-biased semantics, just use the backtracking NFA.
My intention all along was to see what functionality I could get the DFA to achieve while still being very fast; I hadn't even intended to add look-ahead assertions unless time looked favorable. I'd certainly like to see if I can make captures fast, and from what I've seen, for common regexes without assertions, the DFA really is faster, often considerably. Could you send me the tests you performed? I'd be really interested in it.
Oh, and make the DFA so that I can use it in xpressive, too! :-)
I hadn't looked into xpressive (I'm fairly new to Boost), so I don't know what hurdles that will add, but I'd certainly like to make this functionality as widely available as possible, so thanks for pointing out xpressive. Regards, Hugh Wimberly On 4/12/07, Joel de Guzman <joel@boost-consulting.com> wrote:
Jeff Garland wrote:
Hi All -
I'm happy to announce that Boost will be mentoring 9 Google SoC
Eric Niebler wrote: projects this
year. They are: <snip>
Application to extend Boost regex with recursive matching, named sub-expressions, and automata-based matching Hugh Wimberly John Maddock <snip> [...]
My advise: implement a simple DFA and use it only for patterns without captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead). If you need captures and leftmost-biased semantics, just use the backtracking NFA.
Oh, and make the DFA so that I can use it in xpressive, too! :-)
FWIW, Spirit2 uses lexertl -- a DFA engine by Ben Hanson. Hartmut Kaiser did some benchmarks and found it to be almost as fast as RE2C Hartmut integrated it with Spirit2 and we are very happy with it. There's room for a couple of DFA engines, I guess...
http://re2c.org/ http://www.benhanson.net/lexertl.html
Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hugh Wimberly wrote:
My advise: implement a simple DFA and use it only for patterns without captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead). If you need captures and leftmost-biased semantics, just use the backtracking NFA.
My intention all along was to see what functionality I could get the DFA to achieve while still being very fast; I hadn't even intended to add look-ahead assertions unless time looked favorable. I'd certainly like to see if I can make captures fast, and from what I've seen, for common regexes without assertions, the DFA really is faster, often considerably. Could you send me the tests you performed? I'd be really interested in it.
Oh, my poor addled brain. Too long ago, I promised to make my Boost port of Russ Cox's O(N) Perl regex engine -- a modified Thompson NFA -- available. It does captures and Perl's leftmost biased rule (as opposed to the POSIX leftmost longest rule). As I mentioned, I found the performance to be pretty bad in the common case, but without exponential blow-up in the pathological cases. YMMV. You can find the code in thompson-nfa-perl-regex.cpp at http://www.boost-consulting.com/vault/index.php?directory=Strings%20-%20Text%20Processing& -- Eric Niebler Boost Consulting www.boost-consulting.com

on Wed Apr 11 2007, Jeff Garland <jeff-AT-crystalclearsoftware.com> wrote:
Title Student Mentor ----------------------------------------------------------------------------- User-friendly graphs and their measures Andrew Sutton Jeremy Siek FastCGi and SCGI-compatible CGI library Darren Garvey Christopher M Kohlhoff Boost 'Big Integer' proposal Arseny Kapoulkine Jeff Garland Signal Network library Stjepan Rajko Douglas Gregor Evolutionary Computation Library Aaron Garrett Jaakko Järvi Application to extend Boost regex with recursive matching, named sub-expressions, and automata-based matching Hugh Wimberly John Maddock Visualization Of Arrays And STL Containers Jacob R>Voytko Joaquín L. Muñoz Boost.Extension and Reflection Mariano G. Consoni Hartmut Kaiser A Generic Geometric Library Huseyin Akcan Herve Bronnimann
For anyone who had trouble reading that, Jeff seems to be using 8-space tabs in most places. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com
participants (5)
-
David Abrahams
-
Eric Niebler
-
Hugh Wimberly
-
Jeff Garland
-
Joel de Guzman