[regex] Question about pattern

Hi,
I'm a newbie to regex patterns and need help with a relatively simple task. I use boost::regex with Perl syntax and have to parse user input for a feature similar to the conditional formatting of cells in OpenOffice Calc. The string the user enters looks like this:
=30 [fg:red] [bg:white]
or
<0.0 [fg:none] [bg:transparent]
First the user can enter an optional operator (> < <> = <= >=) followed by an integer or floating-point number. This whole term is optional, so
[fg:red] [bg:white]
is also a valid expression.
The terms for color selection are partly optional, so valid expressions are
[fg:red] or [bg:white] or [fg:red][bg:white] or [bg:white][fg:red]
The keywords for foreground and background colors are predefined by the software (red, green, blue, transparent, none, ...).
I tried starting with a pattern like this (C++ syntax for strings):
^(?:(<|<=|>|>=|=|<>){1}((?:-|\\+)?[0-9]+(?:\\.[0-9]+|\\,[0-9]+)?))?(?:\\s*)(?:(\\[FG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\])|(\\[BG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN)\\])|(\\[FG:(?:BBLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\])(?:\\s*)(\\[BG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN)\\])){1}$
But this does not allow the [fg:] and [bg:] terms to be interchanged, and whenever a new color or style is added to the format class I have to change the pattern in three places. I tried back-references with no success.
Any help is very welcome.
Thanks in advance,
Martin

On Tue, 26 Feb 2008 22:37:19 +0200,
[...]I tried starting with a pattern like (C++ syntax for strings)
^(?:(<|<=|>|>=|=|<>){1}((?:-|\\+)?[0-9]+(?:\\.[0-9]+|\\,[0-9]+)?))?(?:\\s*)(?:(\\[FG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\])|(\\[BG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN)\\])|(\\[FG:(?:BBLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\])(?:\\s*)(\\[BG:(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN)\\])){1}$
But this does not allow the [fg:][bg:] terms to be interchanged and also when a new color or style will be added to the format class I have to change the pattern three times. I tried backward references with no success.
If regular expressions get as big as this, and maybe even more complicated, you might want to use Boost.Spirit instead. Boost.Spirit is not as easy to use as Boost.Regex, but complicated expressions are broken down into something more easily understandable.
Boris

Boris wrote:
If regular expressions get as big as this and maybe even more complicated you might want to use Boost.Spirit instead. Boost.Spirit is not as easy to use as Boost.Regex but complicated expressions are broken down to something more easily understandable.
I second that suggestion. Any time input elements are not merely optional but order-insensitive, I start thinking "parser" rather than "regular expression."
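To make the "parser" idea concrete, here is a minimal hand-written sketch rather than a Boost.Spirit grammar. It uses std::regex only for the small pieces (the condition and a single tag) so that it compiles without Boost; the same patterns should carry over to boost::regex, but that is untested. The names CellFormat, knownColors and parseCellFormat are made up for illustration, the color list is a guess at the application's list, and the rule that at least one tag must be present is an assumption based on the examples in the original post.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <regex>
#include <set>
#include <string>

// Hypothetical result type; field names are illustrative only.
struct CellFormat {
    std::string op;      // "", "<", "<=", ">", ">=", "=" or "<>"
    std::string number;  // numeric text as entered; empty if no condition
    std::map<std::string, std::string> colors;  // "fg"/"bg" -> color name
};

// The one place where the allowed color names live (assumed list).
static const std::set<std::string>& knownColors() {
    static const std::set<std::string> colors = {
        "blue", "red", "green", "black", "yellow",
        "white", "cyan", "none", "transparent"};
    return colors;
}

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true and fills `out` if `input` is a valid format string.
bool parseCellFormat(const std::string& input, CellFormat& out) {
    // Optional operator, then a number. Two-character operators are
    // listed first so that "<=" is not read as "<".
    static const std::regex condition(
        R"(^\s*(<=|>=|<>|<|>|=)?([-+]?[0-9]+(?:[.,][0-9]+)?)\s*)");
    // One [fg:...] or [bg:...] tag at the start of the remaining text.
    static const std::regex tag(R"(^\s*\[(fg|bg):([a-z]+)\]\s*)",
                                std::regex::icase);

    CellFormat result;
    std::string rest = input;
    std::smatch m;

    if (std::regex_search(rest, m, condition)) {
        result.op = m[1].str();
        result.number = m[2].str();
        rest = m.suffix().str();
    }

    // Consume tags in any order; a repeated key or unknown color fails.
    while (!rest.empty()) {
        if (!std::regex_search(rest, m, tag)) return false;
        const std::string key = toLower(m[1].str());
        const std::string color = toLower(m[2].str());
        if (knownColors().count(color) == 0) return false;
        if (!result.colors.emplace(key, color).second) return false;
        rest = m.suffix().str();
    }

    if (result.colors.empty()) return false;  // assumed: one tag required
    out = result;
    return true;
}
```

With this shape, adding a new color means touching only knownColors(), and adding a new key such as a style tag means extending the (fg|bg) alternation once.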

office@mgs.co.at wrote:
[fg:red] [bg:white]
is also a valid expression.
The terms for color selection are partly optional, so valid expressions are
[fg:red] or [bg:white] or [fg:red][bg:white] or [bg:white][fg:red]
The key words for foreground and background colors are predefined by the software (red, green, blue, transparent, none, ...).
For the fg/bg part you could try something like:
"(?:\\[(?:fg|bg):(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\]){1,2}"
But that would then allow things like:
[bg:red][bg:white]
which may not be what you want :-(
How about something like:
"(?:\\[(?(1)(?!\\1))(fg|bg):(?:BLUE|RED|GREEN|BLACK|YELLOW|WHITE|CYAN|NONE)\\]){1,2}"
which says: "if $1 (the (fg|bg) part) has already matched, then assert with a negative lookahead that what follows is not the same as $1". But that only works for *two* alternatives, and I haven't tried this out in code yet either!
HTH, John.
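A variation on the same idea, as a rough sketch: instead of a conditional, put a negative lookahead after each tag that rejects a following tag with the same key. This version uses std::regex (ECMAScript syntax, which has no (?(1)...) conditionals) so that it builds without Boost; whether boost::regex treats the back-reference inside the repeat the same way is untested. The pattern is assembled from one color list, so a new color is added in a single place. The names kColors, colorAlternation, makeTagPattern and isValidTagList are made up for this example, and like the conditional version it relies on there being at most two tags.

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumed color list; in real code this would come from the format class.
static const std::vector<std::string> kColors = {
    "BLUE", "RED", "GREEN", "BLACK", "YELLOW", "WHITE", "CYAN", "NONE"};

// Join the color names into one alternation: "BLUE|RED|...".
std::string colorAlternation() {
    std::string alt;
    for (const std::string& c : kColors) {
        if (!alt.empty()) alt += '|';
        alt += c;
    }
    return alt;
}

// Build the pattern for one or two tags with distinct keys.
std::regex makeTagPattern() {
    const std::string pattern =
        "^(?:\\[(fg|bg):(?:" + colorAlternation() + ")\\]"
        "(?!\\s*\\[\\1:)"  // next tag must not reuse the key just captured
        "\\s*){1,2}$";
    return std::regex(pattern, std::regex::icase);
}

// True if `input` consists of one or two tags with different keys.
bool isValidTagList(const std::string& input) {
    static const std::regex re = makeTagPattern();
    return std::regex_match(input, re);
}
```

The lookahead only inspects the tag immediately following, which is enough while {1,2} caps the list at two tags; with more keys, a check in code after matching is probably easier to maintain.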
Participants (4): Boris, John Maddock, Nat Goodspeed, office@mgs.co.at