Re: Can one "nest" regular expressions ?

20 Feb 2003

      ...
...
static const boost::regex find_imgs_with_alt(
  "<\\s*img"      // Matches <, 0 or more whitespace, IMG
  "\\s+src\\s*"   // Matches 1 or more whitespace, SRC, 0 or more whitespace
  "=\\s*"         // Matches = followed by 0 or more whitespace
  "\"\\s*"        // Matches " followed by 0 or more whitespace
  "[^\"]*"        // Matches any number of chars not "
  "\\s+"          // Matches 1 or more whitespace
  "[^alt]*"       // I would like to match anything except the word ALT, but
                  // the regexp stuff interprets this as anything but 'a',
                  // 'l', or 't'
  "alt\\s*="      // Matches ALT, 0 or more whitespace, =
  "\"([^\"]*)\""  // Matches ", anything except a " as a group that I can
                  // reference, then another "
  "[^>]*>",       // Matches any number of chars not >, followed by a >
  boost::regbase::normal | boost::regbase::icase);
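You could use forward lookahead asserts:
"((?!\<alt\>).)*"
matches a sequence of chars that does not contain "\<alt\>" (the assertion
itself consumes nothing, so it has to be paired with a "." inside the
repeat), although this is rather slow I admit...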
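
For illustration, here is a minimal sketch of how the lookahead suggestion might be applied to the IMG pattern. It assumes std::regex (ECMAScript grammar) as a stand-in for boost::regex, so \b replaces \< and \>, which that grammar does not have. The function name first_img_alt is hypothetical and the pattern is simplified; it is not the expression from the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the alt text of the first <img ... alt="...">
// tag in html, or an empty string if no such tag is found.
std::string first_img_alt(const std::string& html) {
    static const std::regex img_with_alt(
        "<\\s*img\\b"             // <, optional whitespace, IMG
        "(?:(?!\\balt\\b)[^>])*"  // chars other than >, stopping before the word ALT
        "\\balt\\s*=\\s*"         // ALT, optional whitespace, =
        "\"([^\"]*)\""            // quoted value, captured as group 1
        "[^>]*>",                 // rest of the tag
        std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    return std::regex_search(html, m, img_with_alt) ? m[1].str() : std::string();
}
```

The negative lookahead is tested before every consumed character, which is where the cost mentioned above comes from.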
...
So what I want to do is make another regular expression which
matches "alt", and in the part that says
  [^alt]*
do instead something like
  [^@alt]*
where '@' would indicate that 'alt' was the name of another
regular expression, such as
static const boost::regex alt("alt",
  boost::regbase::normal | boost::regbase::icase);
I can see how to do what I want to do without this; I would
get the whole IMG tag and do a separate regexp_search on the
match.  But it seems to make it so much easier if it were
possible, especially leaving me with fewer lines of regular
expression code to have bugs in.

If this is possible I'd like to know.  Thanks in advance, and
I'll post the regular expressions I end up using here if
anyone might find them of use.
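
The two-step fallback described above (match the whole IMG tag, then search inside the match) could look roughly like the following. This is a sketch under assumptions: std::regex stands in for boost::regex, and the function name img_alts and both patterns are illustrative rather than taken from the thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the alt text of every <img> tag that
// carries an alt attribute, in document order.
std::vector<std::string> img_alts(const std::string& html) {
    static const std::regex img_tag(
        "<\\s*img\\b[^>]*>",
        std::regex::ECMAScript | std::regex::icase);
    static const std::regex alt_attr(
        "\\balt\\s*=\\s*\"([^\"]*)\"",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> alts;
    // Step 1: iterate over every complete IMG tag.
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag), end;
         it != end; ++it) {
        const std::string tag = it->str();
        std::smatch m;
        // Step 2: search for the alt attribute inside that one tag only.
        if (std::regex_search(tag, m, alt_attr)) {
            alts.push_back(m[1].str());
        }
    }
    return alts;
}
```

Each expression stays short, at the price of a second search per tag.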
You can't do that right now - the main problem is how would the library find
an expression called "alt"? Interpreted languages with reflective abilities
can do this (Perl for example), but compiled languages can't.
At present I'm in the middle of rewriting the regex matching code (for those
that follow these things it's about 90% done and up to 10x faster than the
current version).  Once I've got that out the door there are a couple of
extensions that I will be able to add:
1) recursive regexes (a regex that can jump to an arbitrary part in its own
state machine).
2) registered/named regexes: you would call boost::regex::register to
register a named regular expression, which can then be called from as many
other regexes as you want (basically it lets one state machine call
another).  There are limitations to be figured out, but I'm actually pretty
excited about this one - and it happens to solve your problem as well - or
at least almost, I admit I hadn't thought of referring to negated regexes as
you want to do, that's actually quite tricky :-(

["John Maddock" <john_maddock@compuserve.com> wrote the reply quoted above in
message news:04b801c2d8db$52ab7140$ce7687d9@1016031671...]

How are you saving 2)? In memory or permanently in a file? If permanently
in a file, how does the end-user reuse named regexes in situations other
than the one in which he created a name for a regular expression? Inquiring
minds want to know <g>.

Named regexes are something I have intermittently thought about for my
Regular Expression Component Library built using Boost Regex++. The
difficulty is the practical decision of how to save named regexes so that
they can be used again in other invocations of the Boost Regex++ library.
However one saves them, it seems the end-user must transport such permanent
storage around with the Regex++ implementation, else the named regexes will
be lost.

Edward Diener