[xpressive]: nested regular expressions for dynamic xpressive and basic_regex assignment

Hi Eric,
I recently tried to make recursive/nested regular expressions available for the dynamic regex. It would be cool if you could have a look at that: http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/
To achieve that I extended the compiler traits, which now include a lookup for references to other regular expressions. I saw that static regular expressions used by reference store a weak pointer to the regex impl object. For dynamic regular expressions I had to store a pointer to the regular expression and make the class that groups all expressions noncopyable.
Do you see a nicer solution?
Regards
Andreas Pokorny

(Sorry for the delay, I just got back from a week-long vacation.)
Andreas Pokorny wrote:
Hi Eric, I recently tried to make recursive/nested regular expressions available for the dynamic regex. It would be cool if you could have a look at that: http://svn.boost.org/svn/boost/branches/xpressive/nested_dynamic_regex/
To achieve that I extended the compiler traits, which now include a lookup for references to other regular expressions. I saw that static regular expressions used by reference store a weak pointer to the regex impl object. For dynamic regular expressions I had to store a pointer to the regular expression and make the class that groups all expressions noncopyable.
Do you see a nicer solution?
Hi Andreas. I *also* recently made nested regexes work for dynamic regexes, and I made an announcement about it on this list not too long ago. See here: http://lists.boost.org/Archives/boost/2007/06/122641.php. In particular, see "Dynamic Regex Grammars with Named Regexes", which has the following example:
    sregex_compiler comp;
    sregex rx = comp.compile("^bar(?$RE)baz$");
    comp.compile("(?$RE=)\\d+ \\d+");
The regex_compiler holds a map from names to regex objects. The first call to compile() creates a forward reference to a regex object named "RE" and the second call binds a regex to the name "RE".
Is this the sort of thing you had in mind? (This is only available in the repository -- no public release has this feature yet.)
--
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Hi Eric,
Eric Niebler wrote:
(Sorry for the delay, I just got back from a week-long vacation.)
Andreas Pokorny wrote:
Do you see a nicer solution?
Hi Andreas. I *also* recently made nested regexes work for dynamic regexes, and I made an announcement about it on this list not too long ago. See here: http://lists.boost.org/Archives/boost/2007/06/122641.php.
Oh dear, I switched from my old gmx email account to a Gmail-based one for Boost, and there are still 7k unread Boost messages.
In particular, see "Dynamic Regex Grammars with Named Regexes", which has the following example:
    sregex_compiler comp;
    sregex rx = comp.compile("^bar(?$RE)baz$");
    comp.compile("(?$RE=)\\d+ \\d+");
The regex_compiler holds a map from names to regex objects. The first call to compile() creates a forward reference to a regex object named "RE" and the second call binds a regex to the name "RE".
Is this the sort of thing you had in mind? (This is only available in the repository -- no public release has this feature yet.)
(?$ looks like an assertion. I wanted to use that feature to implement configurable/extensible syntax highlighting within quickbook. What happens if you add another RE regex line, e.g.
    comp.compile("(?$RE=)\\w+ ");
Did you consider a different syntax for specifying named regular expressions, or are (?$=) and (?$) something that exists in other regex engines?
Regards
Andreas Pokorny

Andreas Pokorny wrote:
Hi Eric,
Eric Niebler wrote:
(Sorry for the delay, I just got back from a week-long vacation.)
Andreas Pokorny wrote:
Do you see a nicer solution?
Hi Andreas. I *also* recently made nested regexes work for dynamic regexes, and I made an announcement about it on this list not too long ago. See here: http://lists.boost.org/Archives/boost/2007/06/122641.php.
Oh dear, I switched from my old gmx email account to a Gmail-based one for Boost, and there are still 7k unread Boost messages.
I guess you've got some reading to do. ;-)
In particular, see "Dynamic Regex Grammars with Named Regexes", which has the following example:
    sregex_compiler comp;
    sregex rx = comp.compile("^bar(?$RE)baz$");
    comp.compile("(?$RE=)\\d+ \\d+");
The regex_compiler holds a map from names to regex objects. The first call to compile() creates a forward reference to a regex object named "RE" and the second call binds a regex to the name "RE".
Is this the sort of thing you had in mind? (This is only available in the repository -- no public release has this feature yet.)
(?$ looks like an assertion. I wanted to use that feature to implement configurable/extensible syntax highlighting within quickbook. What happens if you add another RE regex line, e.g.
    comp.compile("(?$RE=)\\w+ ");
You mean add that line after the other two? It should replace the meaning of (?$RE) in all the regexes that refer to $RE. The syntax is something I made up and is evocative of a Perl variable assignment.
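The late-binding behaviour described here (a name table shared by every pattern, forward references, and rebinding that affects all referring patterns) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not how xpressive implements it: `Pattern`, `PatternCompiler`, and `demo_named_patterns` are hypothetical names, references are resolved by expanding text into a `std::regex` at match time, and recursive references are rejected by a depth limit rather than supported.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>

using NameTable = std::map<std::string, std::string>;

// Hypothetical pattern handle: keeps its raw text plus a handle to the
// shared name table, so (?$NAME) is looked up when matching.
class Pattern {
public:
    Pattern(std::string text, std::shared_ptr<const NameTable> names)
        : text_(std::move(text)), names_(std::move(names)) {}

    bool matches(const std::string& input) const {
        return std::regex_match(input, std::regex(expand(text_, 0)));
    }

private:
    // Replace every (?$NAME) with a non-capturing group holding the
    // pattern currently bound to NAME.
    std::string expand(const std::string& text, int depth) const {
        if (depth > 16) throw std::runtime_error("reference nesting too deep");
        std::string out;
        std::size_t pos = 0;
        for (;;) {
            const std::size_t open = text.find("(?$", pos);
            if (open == std::string::npos) break;
            const std::size_t close = text.find(')', open);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated reference");
            const std::string name = text.substr(open + 3, close - open - 3);
            const auto it = names_->find(name);
            if (it == names_->end())
                throw std::runtime_error("unbound name: " + name);
            out += text.substr(pos, open - pos);
            out += "(?:" + expand(it->second, depth + 1) + ")";
            pos = close + 1;
        }
        out += text.substr(pos);
        return out;
    }

    std::string text_;
    std::shared_ptr<const NameTable> names_;
};

// Hypothetical compiler: a leading (?$NAME=) binds the rest of the text to NAME.
class PatternCompiler {
public:
    Pattern compile(const std::string& text) {
        std::string body = text;
        if (text.compare(0, 3, "(?$") == 0) {
            const std::size_t close = text.find(')');
            if (close != std::string::npos && close > 3 && text[close - 1] == '=') {
                const std::string name = text.substr(3, close - 4);
                body = text.substr(close + 1);
                (*names_)[name] = body;  // bind, or rebind, the name
            }
        }
        return Pattern(body, names_);
    }

private:
    std::shared_ptr<NameTable> names_ = std::make_shared<NameTable>();
};

void demo_named_patterns() {
    PatternCompiler comp;
    Pattern rx = comp.compile("^bar(?$RE)baz$");  // forward reference to RE
    comp.compile("(?$RE=)\\d+ \\d+");             // binds RE
    assert(rx.matches("bar12 34baz"));
    comp.compile("(?$RE=)\\w+ ");                 // rebinds RE for rx as well
    assert(rx.matches("barfoo baz"));
    assert(!rx.matches("bar12 34baz"));
}
```

In this sketch an unbound name is only an error when a match is attempted, which is what makes the forward reference in the first compile() call harmless.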
Did you consider a different syntax for specifying named regular expressions, or are (?$=) and (?$) something that exists in other regex engines?
I made that syntax up and I'm not attached to it. I don't know of any prior art in this area, except the design work for Perl 6. I don't recall now what they're using or why I didn't go with it. :-P Can you suggest a better syntax?
--
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Hi,
Eric Niebler wrote:
I made that syntax up and I'm not attached to it. I don't know of any prior art in this area, except the design work for Perl 6. I don't recall now what they're using or why I didn't go with it. :-P Can you suggest a better syntax?
The stuff in the nested_dynamic_regex branch delays the compilation to a call to compile(), which has to happen after the last addition of a (name, regex) tuple. Hence one can refer to regular expressions using plain literals:
    compiler.add_start("somename","^bar(baz)$")
    compiler.add("baz","\\d+ \\d+")
This restricts the regex grammar, because one cannot parse the literal "baz": baz will always be the regular expression "\\d+ \\d+". But as a user you know which words have to be parsed and which can be used as names for expressions. There is of course another disadvantage, because you have to separate bar from baz in the example above.
I just looked at quickbook's C++ grammar and I have to note that it gets really hard to write as a dynamic regex. If you have to write bigger things, a C++ embedded DSL really has advantages.
Regards
Andreas
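The deferred approach can be approximated with the standard library to make the trade-off concrete. The sketch below is an assumption-laden stand-in, not the code in the branch: `DeferredCompiler` and `demo_deferred_compile` are hypothetical names, rule names are substituted textually (longest name first) when compile() is called, and the result is a `std::regex` rather than an xpressive regex.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical compiler that only collects (name, pattern) pairs until compile().
class DeferredCompiler {
public:
    void add(const std::string& name, const std::string& pattern) {
        if (name.empty()) throw std::invalid_argument("empty rule name");
        rules_[name] = pattern;
    }

    void add_start(const std::string& name, const std::string& pattern) {
        add(name, pattern);
        start_ = name;
    }

    // Must be called after the last add(); expands names, then compiles.
    std::regex compile() const {
        const auto it = rules_.find(start_);
        if (it == rules_.end()) throw std::logic_error("no start rule");
        return std::regex(expand(it->second, 0));
    }

private:
    using Rules = std::map<std::string, std::string>;

    // Single left-to-right scan: wherever a rule name occurs as plain text,
    // emit that rule's expanded pattern instead of the literal characters.
    std::string expand(const std::string& text, int depth) const {
        if (depth > 16) throw std::runtime_error("rule nesting too deep");
        std::string out;
        std::size_t pos = 0;
        while (pos < text.size()) {
            const Rules::value_type* best = nullptr;
            for (const auto& rule : rules_) {
                const std::string& name = rule.first;
                if (text.compare(pos, name.size(), name) == 0 &&
                    (best == nullptr || name.size() > best->first.size())) {
                    best = &rule;
                }
            }
            if (best != nullptr) {
                out += "(?:" + expand(best->second, depth + 1) + ")";
                pos += best->first.size();
            } else {
                out += text[pos++];
            }
        }
        return out;
    }

    Rules rules_;
    std::string start_;
};

void demo_deferred_compile() {
    DeferredCompiler compiler;
    compiler.add_start("somename", "^bar(baz)$");
    compiler.add("baz", "\\d+ \\d+");
    const std::regex rx = compiler.compile();
    assert(std::regex_match("bar12 34", rx));
    // The literal text "baz" can no longer be matched: it is a rule name.
    assert(!std::regex_match("barbaz", rx));
}
```

The second assertion shows the restriction mentioned above: once "baz" is a rule name, the start pattern has no way to match the literal word.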
participants (2)
-
Andreas Pokorny
-
Eric Niebler