
Hi,
is there a way to specify matching open and closed parentheses with regular expression syntax? For instance, what if I want a single regular expression that matches (foo), [foo] and {foo}?
Thanks in advance,
Lorenzo
-- Lorenzo Bettini, PhD in Computer Science, DSI, Univ. di Firenze ICQ# lbetto, 16080134 (GNU/Linux User # 158233) HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com BLOGS: http://tronprog.blogspot.com http://longlivemusic.blogspot.com http://www.gnu.org/software/src-highlite http://www.gnu.org/software/gengetopt http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net

Lorenzo Bettini wrote:
Hi
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Probably not what you wanted, but:
[(\[{]foo[)\]}]
sort of does what you want, but if you want to ensure that the brackets match up, then it's down to:
\[foo\]|\(foo\)|\{foo\}
John.
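A minimal sketch of the difference between the two patterns in this reply. It uses std::regex (ECMAScript grammar) instead of Boost.Regex, as an assumption, so that it builds with only the standard library; the names any_brackets, paired_brackets and whole_match are hypothetical. The character-class form also accepts a mismatched pair such as {foo], while the alternation accepts only correctly paired brackets.

```cpp
#include <regex>
#include <string>

// Character-class version: any opener, then "foo", then any closer.
// Nothing ties the closer to the opener, so "{foo]" is accepted too.
const std::regex any_brackets(R"re([(\[{]foo[)\]}])re");

// Alternation version: each opener is spelled out together with its own
// closer, so only correctly paired brackets are accepted.
const std::regex paired_brackets(R"re(\[foo\]|\(foo\)|\{foo\})re");

// True if the whole of s (not just a substring) matches the pattern.
bool whole_match(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}
```

The price of the alternation is that the inner part (here just foo) is written once per bracket type.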

John Maddock wrote:
Lorenzo Bettini wrote:
Hi
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Probably not what you wanted, but:
[(\[{]foo[)\]}]
but this would also match {foo], which would be wrong...
sort of does what you want, but if you want to ensure that the brackets match up, then it's down to:
\[foo\]|\(foo\)|\{foo\}
I see; this leads to a lot of duplicated code (especially if the inner part is big)... Do you happen to know whether there is any specific syntax for matching parentheses in regular expressions (I mean in any regexp framework or library)?
Thanks,
Lorenzo
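One way to avoid copy-and-pasting the inner part is to generate the alternation in code, so the inner pattern is written only once in the source. This is a sketch using std::regex; make_bracketed is a hypothetical helper, not part of any regex library, and the inner pattern [a-z]+ in the usage is only an example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds (?:\(INNER\)|\[INNER\]|\{INNER\}) from a single
// inner pattern, so each bracket type is paired with its own closer while the
// inner part is only typed once.
std::regex make_bracketed(const std::string& inner) {
    const std::string pattern =
        "(?:\\(" + inner + "\\)|\\[" + inner + "\\]|\\{" + inner + "\\})";
    return std::regex(pattern);
}
```

Usage: make_bracketed("[a-z]+") accepts (foo) and {bar} but rejects {foo]. The compiled expression still contains the inner part three times, so any capture groups inside it are duplicated and renumbered; using non-capturing (?:...) groups in the inner pattern avoids that.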

Lorenzo Bettini wrote:
John Maddock wrote:
Lorenzo Bettini wrote:
Hi
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Probably not what you wanted, but:
[(\[{]foo[)\]}]
but this would also match {foo], which would be wrong...
You could use Spirit, of course, using a function that recorded which type of bracket (pedantic point: "(", ")", "[", "]", "{" and "}" are all brackets, but only "(" and ")" are parentheses) was found first and another function that tested whether the close bracket matched the open bracket; this way, you could even check for correctly nested brackets if you wanted to.
The downsides of using Spirit are:
* if you don't already know it, you'll have to learn a fair bit of Spirit in order to do this;
* it will add quite a big chunk of code to your source for what is, after all, a simple parsing operation;
* it will increase the size of your executable considerably;
* it will (possibly) be slower than regex.
I'd suggest seeing if anyone else comes up with a solution using regex, and writing your own simple parser by hand (not using Spirit) if not.
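A hand-written check along the lines suggested here (no Spirit, no regex) might look like the following sketch: one function records which bracket opened, and the check then tests that the closer pairs with it. closer_for and is_bracketed are hypothetical names, and for simplicity the inner text is compared literally rather than as a pattern.

```cpp
#include <string>

// Returns the closing bracket that pairs with `open`, or '\0' if `open`
// is not one of '(', '[' or '{'.
char closer_for(char open) {
    switch (open) {
        case '(': return ')';
        case '[': return ']';
        case '{': return '}';
        default:  return '\0';
    }
}

// Accepts s only if it is: an opening bracket, then exactly `inner`,
// then the closing bracket that pairs with that opener.
bool is_bracketed(const std::string& s, const std::string& inner) {
    if (s.size() != inner.size() + 2) return false;  // also guarantees s is non-empty
    const char close = closer_for(s.front());         // record which bracket opened
    if (close == '\0') return false;
    return s.compare(1, inner.size(), inner) == 0     // literal inner text
        && s.back() == close;                         // closer must pair with opener
}
```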

Paul Giaccone wrote:
Lorenzo Bettini wrote:
John Maddock wrote:
Lorenzo Bettini wrote:
Hi
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Probably not what you wanted, but:
[(\[{]foo[)\]}]
but this would also match {foo], which would be wrong...
You could use Spirit, of course, using a function that recorded which type of bracket (pedantic point: "(", ")", "[", "]", "{" and "}" are all brackets, but only "(" and ")" are parentheses) was found first and
yes, sorry: in Italian they're all called parentheses :-)
I'd suggest seeing if anyone else comes up with a solution using regex, and writing your own simple parser by hand (not using Spirit) if not.
I've heard of Spirit, and I'll probably take a look at it...
Thanks,
Lorenzo

Lorenzo Bettini wrote:
do you happen to know whether there are any specifications for regular expressions for matching parenthesis (I mean in any regexp frameworks or libraries)?
Not really, no, but I have another idea, assuming the x-modifier is on:
(?x) (?: (\() |(\[) |(\{) ) foo (?: (?(1)\) |(?:(?(2)\] |(?:\} )))))
The idea is to use conditional expressions to check which opening bracket matched and then react accordingly. You'll need to check I've got the ('s and )'s matching up 'cos I lost count while typing it in :-(
HTH,
John.
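std::regex (ECMAScript grammar) has no conditional groups such as (?(1)...), so this sketch emulates the same idea in two steps, as an assumption about how one might do it without Boost: a single regex captures the opening and the closing bracket, and ordinary code then checks that they pair up. The inner part is written only once. The function name matches_paired is hypothetical, and the inner part is fixed to foo for illustration.

```cpp
#include <regex>
#include <string>

// Group 1 captures the opener, group 2 the closer; the "which bracket
// opened?" test that a conditional would do is done in code afterwards.
bool matches_paired(const std::string& s) {
    static const std::regex rx(R"re(([(\[{])foo([)\]}]))re");
    std::smatch m;
    if (!std::regex_match(s, m, rx)) return false;
    const char open = m[1].str()[0];
    const char close = m[2].str()[0];
    return (open == '(' && close == ')')
        || (open == '[' && close == ']')
        || (open == '{' && close == '}');
}
```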

John Maddock wrote:
Lorenzo Bettini wrote:
do you happen to know whether there are any specifications for regular expressions for matching parenthesis (I mean in any regexp frameworks or libraries)?
Not really no, but I have another idea, assuming the x-modifier is on:
(?x) (?: (\() |(\[) |(\{) ) foo (?: (?(1)\) |(?:(?(2)\] |(?:\} )))))
The idea is to use conditional expressions to check which opening bracket matched and then react accordingly. You'll need to check I've got the ('s and )'s matching up 'cos I lost count while typing it in :-(
That's actually what I thought I might turn to if there was no regular expression syntax of the kind I was asking for to match parentheses (or brackets) :-) I'll see whether this technique makes my regular expressions harder to read than a simple copy-and-paste solution.
I thought there might be a syntax for it because I read these in the regex documentation, in the section "Character classes that are supported by Unicode Regular Expressions":
Ps Open Punctuation
Pe Close Punctuation
and I thought they might be related (by the way, I didn't actually understand what they're for, since there's no example).
Lorenzo

Lorenzo Bettini wrote:
I'll see whether this technique makes my regular expressions harder to read than a simple copy and paste solution.
I thought there might be a syntax for it because I read in the regex documentation, in the section "Character classes that are supported by Unicode Regular Expressions" these syntaxes
Ps Open Punctuation Pe Close Punctuation
and I thought they might be related (by the way, I actually did not understand what they're for, since there's no example).
[[:Ps:]] matches any opening punctuation;
[[:Pe:]] matches any closing punctuation.
Exactly what that means is defined by the Unicode standard, and these are only supported by u32regex when Boost.Regex is built with ICU support enabled.
John.

John Maddock wrote:
Lorenzo Bettini wrote:
I'll see whether this technique makes my regular expressions harder to read than a simple copy and paste solution.
I thought there might be a syntax for it because I read in the regex documentation, in the section "Character classes that are supported by Unicode Regular Expressions" these syntaxes
Ps Open Punctuation Pe Close Punctuation
and I thought they might be related (by the way, I actually did not understand what they're for, since there's no example).
[[:Ps:]] matches any opening punctuation [[:Pe:]] matches any closing punctuation
Exactly what that means is defined by the Unicode std, and these are only supported by u32regex when Boost.Regex is built with ICU support enabled.
You mean something like ` and '?

Lorenzo Bettini wrote:
Ps Open Punctuation Pe Close Punctuation
and I thought they might be related (by the way, I actually did not understand what they're for, since there's no example).
[[:Ps:]] matches any opening punctuation [[:Pe:]] matches any closing punctuation
Exactly what that means is defined by the Unicode std, and these are only supported by u32regex when Boost.Regex is built with ICU support enabled.
You mean something like ` and '?
No, I did a quick scan of unidata.txt and the only ASCII characters classified as Ps are (, [ and {. The next one that shows up is "TIBETAN MARK GUG RTAGS GYON" whatever that is :-) John.

John Maddock wrote:
Lorenzo Bettini wrote:
Ps Open Punctuation Pe Close Punctuation
and I thought they might be related (by the way, I actually did not understand what they're for, since there's no example).
[[:Ps:]] matches any opening punctuation [[:Pe:]] matches any closing punctuation
Exactly what that means is defined by the Unicode std, and these are only supported by u32regex when Boost.Regex is built with ICU support enabled.
You mean something like ` and '?
No, I did a quick scan of unidata.txt and the only ASCII characters classified as Ps are (, [ and {.
OK, thanks; unfortunately the class is not tied to the corresponding closing bracket.
The next one that shows up is "TIBETAN MARK GUG RTAGS GYON" whatever that is :-)
ah! :-) thanks again
Lorenzo

John Maddock wrote:
Lorenzo Bettini wrote:
do you happen to know whether there are any specifications for regular expressions for matching parenthesis (I mean in any regexp frameworks or libraries)?
Not really no, but I have another idea, assuming the x-modifier is on:
(?x) (?: (\() |(\[) |(\{) ) foo (?: (?(1)\) |(?:(?(2)\] |(?:\} )))))
The idea is to use conditional expressions to check which opening bracket matched and then react accordingly. You'll need to check I've got the ('s and )'s matching up 'cos I lost count while typing it in :-(
You can do something similar with xpressive (an alternative regex engine which will be in Boost 1.34):
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost::xpressive;
    int main() {
        cregex rx = cregex::compile("(\\(()|\\[()|\\{())foo(\\4\\}|\\3\\]|\\2\\))");
        if(regex_match("(foo)", rx)) { std::cout << "match!\n"; }
        if(!regex_match("(foo]", rx)) { std::cout << "no match!\n"; }
    }
For each alternative, you create an empty capture with "()". Then you match that capture again in the corresponding alternative on the other side. Backreferences (even empty ones) only match if their capture participated in the match.
As a static regex, this would look like:
    cregex rx = ('(' >> (s1=nil) | '[' >> (s2=nil) | '{' >> (s3=nil))
        >> "foo"
        >> (s3 >> '}' | s2 >> ']' | s1 >> ')');
This avoids the need to double-escape everything.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
John Maddock wrote:
Lorenzo Bettini wrote:
do you happen to know whether there are any specifications for regular expressions for matching parenthesis (I mean in any regexp frameworks or libraries)?
Not really no, but I have another idea, assuming the x-modifier is on:
(?x) (?: (\() |(\[) |(\{) ) foo (?: (?(1)\) |(?:(?(2)\] |(?:\} )))))
The idea is to use conditional expressions to check which opening bracket matched and then react accordingly. You'll need to check I've got the ('s and )'s matching up 'cos I lost count while typing it in :-(
You can do something similar with xpressive (alternate regex engine which will be in boost 1.34):
cregex rx = cregex::compile( "(\\(()|\\[()|\\{())foo(\\4\\}|\\3\\]|\\2\\))");
if(regex_match("(foo)", rx)) { std::cout << "match!\n"; } if(!regex_match("(foo]", rx)) { std::cout << "no match!\n"; }
For each alternate, you create an empty capture with "()". Then you match that capture again in the balanced alternate on the other side. Backreferences (even empty ones) only match if their capture participated in the match.
Hmm... this looks nice :-) Do you happen to know whether empty captures can be specified in boost::regex? (I can't try it at the moment, but I'll do so asap.)
Thanks,
Lorenzo

Lorenzo wrote:
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Lorenzo, Matching parentheses to arbitrary depth is, like matching palindromes, a textbook example of something that cannot be done with a regular language and instead needs a context-free language. Google for "chomsky hierarchy palindrome" for some background material. Of course, it's possible that some regexp library somewhere has a hack to add this functionality. But fundamentally, you need a more sophisticated parser. Phil.
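To make the arbitrary-depth point concrete: checking nested brackets needs a stack (the extra memory a pushdown automaton has and a finite automaton lacks). A minimal C++ sketch follows; the function name balanced is hypothetical, and characters other than brackets are simply skipped.

```cpp
#include <stack>
#include <string>

// True if every (, [ and { in s is closed by the matching bracket, in the
// right order, to any nesting depth.
bool balanced(const std::string& s) {
    std::stack<char> expected;  // closers still owed, innermost on top
    for (char c : s) {
        switch (c) {
            case '(': expected.push(')'); break;
            case '[': expected.push(']'); break;
            case '{': expected.push('}'); break;
            case ')': case ']': case '}':
                if (expected.empty() || expected.top() != c) return false;
                expected.pop();
                break;
            default: break;  // ignore non-bracket characters
        }
    }
    return expected.empty();  // no unclosed openers left over
}
```

The stack can grow without bound as nesting deepens, which is exactly what a fixed, finite set of regex states cannot track.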

Phil Endecott wrote:
Lorenzo wrote:
is there a way to specify open and closed parenthesis with a regular expression syntax? For instance, if I'd want to specify a single regular expression that matches both (foo) and [foo] and {foo}?
Lorenzo,
Matching parentheses to arbitrary depth is, like matching palindromes, a textbook example of something that cannot be done with a regular language and instead needs a context-free language. Google for "chomsky hierarchy palindrome" for some background material. Of course, it's possible that some regexp library somewhere has a hack to add this functionality. But fundamentally, you need a more sophisticated parser.
Hi Phil,
yes, I know about that, but since regex supports backreferences and conditionals, I thought this might be implemented by a combination of these two mechanisms... indeed, John Maddock replied in this thread with exactly such a manual solution using backreferences and conditionals, so I thought a regular expression syntax might already be provided to match these. I'm using regular expressions in this software, http://www.gnu.org/software/src-highlite, which highlights programs and assumes that the program itself is correct. Thus, I assume that parentheses (and brackets; sorry, in Italian (, [ and { are all called parentheses :-) are balanced, and with greedy regular expressions, nested parentheses are already handled...
Lorenzo
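A quick sketch of what a greedy pattern such as \(.*\) does, using std::regex as an assumption (the highlighter's actual patterns are not shown in this thread): the match starts at the first ( and, because .* is greedy, extends to the last ) on the line. For a single nested group that span is the outer group; with two sibling groups on one line it covers both. greedy_paren_span is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Returns the text matched by the greedy pattern \(.*\): from the first '('
// to the last ')' on the line, or an empty string if there is no match.
std::string greedy_paren_span(const std::string& line) {
    static const std::regex rx(R"re(\(.*\))re");
    std::smatch m;
    return std::regex_search(line, m, rx) ? m.str(0) : std::string();
}
```

For example, on f(a(b)c); it yields (a(b)c), while on g(a) + h(b) it yields (a) + h(b).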
participants (5)
- Eric Niebler
- John Maddock
- Lorenzo Bettini
- Paul Giaccone
- Phil Endecott