Re: [Boost-Users] Can one "nest" regular expressions ?

20 Feb 2003

      ...
static const boost::regex find_imgs_with_alt("
  <\\s*img    Matches <, 0 or more whitespace, IMG
  \\s+src\\s* Matches 1 or more whitespace, SRC, 0 or more whitespace
  =\\s*       Matches = followed by 0 or more whitespace
  \"\\s*      Matches " followed by 0 or more whitespace
  [^\"]*      Matches any number of chars not "
  \\s+        Matches 1 or more whitespace
  [^alt]*     I would like to match anything except the word ALT, but
              the regexp stuff interprets this as anything but 'a',
              'l', or 't'
  alt\\s*=    Matches ALT, 0 or more whitespace, =
  \"([^\"]*)\"  Matches ", anything except a " as a group that I can
              reference, then another "
  [^>]*>      Matches any number of chars not >, followed by a >
  ",
  boost::regbase::normal | boost::regbase::icase);
You could use negative lookahead assertions:

"(?!\<alt\>)*"

matches a sequence of chars that are not "\<alt\>", although this is rather
slow I admit...
...
So what I want to do is make another regular expression which matches
"alt", and in the part that says
[^alt]*
do instead something like
[^@alt]*
where '@' would indicate that 'alt' was the name of another regular
expression, such as
static const boost::regex alt("alt",
  boost::regbase::normal | boost::regbase::icase);
I can see how to do what I want without this: I would get the whole IMG
tag and do a separate regex_search on the match.  But it would be so much
easier if it were possible, especially since it would leave me with fewer
lines of regular expression code to have bugs in.
If this is possible I'd like to know.  Thanks in advance, and I'll post
the regular expressions I end up using here if anyone might find them of
use.
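
The two-step approach mentioned above (match the whole IMG tag, then search
inside the match) might look roughly like this. It is only a sketch: it uses
std::regex instead of Boost.Regex to stay self-contained, the function name
find_img_alts is hypothetical, and it assumes that attribute values are
double-quoted and contain no '>'.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the alt text of every <img ...> tag that has an alt attribute.
// Hypothetical helper, for illustration only.
std::vector<std::string> find_img_alts(const std::string& html)
{
    // Step 1: a whole IMG tag; group 1 is everything after "img" up to '>'.
    static const std::regex img_tag(
        R"re(<\s*img\b([^>]*)>)re",
        std::regex::ECMAScript | std::regex::icase);
    // Step 2: alt="..." inside that tag; group 1 is the alt text.
    static const std::regex alt_attr(
        R"re(\salt\s*=\s*"([^"]*)")re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> alts;
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag), end;
         it != end; ++it)
    {
        const std::string attrs = (*it)[1].str();
        std::smatch m;
        if (std::regex_search(attrs, m, alt_attr))
            alts.push_back(m[1].str());
    }
    return alts;
}
```

Each expression stays short, at the cost of a second search per tag.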
You can't do that right now. The main problem is: how would the library find
an expression called "alt"? Interpreted languages with reflective abilities
(Perl, for example) can do this, but compiled languages can't.

At present I'm in the middle of rewriting the regex matching code (for those
that follow these things it's about 90% done and up to 10x faster than the
current version).  Once I've got that out the door there are a couple of
extensions that I will be able to add:

1) recursive regexes (a regex that can jump to an arbitrary part of its own
state machine).
2) registered/named regexes: you would call boost::regex::register to
register a named regular expression, which can then be called from as many
other regexes as you want (basically it lets one state machine call
another).  There are limitations to be figured out, but I'm actually pretty
excited about this one, and it happens to solve your problem as well.  Or
at least almost: I admit I hadn't thought of referring to negated regexes as
you want to do, and that's actually quite tricky :-(
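
The registration interface in (2) does not exist yet, so nothing below is
Boost.Regex. Until then, a rough stand-in is to compose the pattern text
before compiling it. The sketch assumes a hypothetical @name@ notation and a
hypothetical expand helper. It is plain text substitution, not one state
machine calling another.

```cpp
#include <map>
#include <string>

// Replaces each @name@ token in `pattern` with the text registered under
// `name`, wrapped in a non-capturing group. Unknown names and unpaired '@'
// characters are left untouched. Hypothetical helper, illustration only.
std::string expand(const std::string& pattern,
                   const std::map<std::string, std::string>& fragments)
{
    std::string out;
    std::string::size_type pos = 0;
    while (pos < pattern.size()) {
        const std::string::size_type open = pattern.find('@', pos);
        if (open == std::string::npos) break;
        const std::string::size_type close = pattern.find('@', open + 1);
        if (close == std::string::npos) break;
        out.append(pattern, pos, open - pos);
        const std::string name = pattern.substr(open + 1, close - open - 1);
        const auto it = fragments.find(name);
        if (it != fragments.end()) {
            out += "(?:" + it->second + ")";
            pos = close + 1;
        } else {
            out += '@';       // not a known name: keep the '@' literally
            pos = open + 1;
        }
    }
    out.append(pattern, pos, std::string::npos);
    return out;
}
```

With "alt" registered as \balt\b, expanding (?!@alt@) gives
(?!(?:\balt\b)), which can then be handed to a regex constructor as usual.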

John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/index.htm