static const boost::regex find_imgs_with_alt("
<\\s*img        Matches <, 0 or many whitespace, IMG
\\s+src\\s*     Matches 1 or more whitespace, SRC, 0 or many whitespace
=\\s*           Matches = followed by 0 or more whitespace
\"\\s*          Matches " followed by 0 or more whitespace
[^\"]*          Matches any number of chars not "
\\s+            Matches 1 or more whitespace
[^alt]*         I would like to match anything except the word ALT, but the regexp stuff interprets this as anything but 'a', 'l', or 't'
alt\\s*=        Matches ALT, 0 or more whitespace, =
\"(^\")\"       Matches ", anything except a " as a group that I can reference, then another "
[^>]*>          Matches any number of chars not >, followed by a >
", boost::regbase::normal | boost::regbase::icase);
You could use forward lookahead asserts: "(?!\<alt\>)*" matches a sequence of chars that are not "\<alt\>", although this is rather slow I admit...
So what I want to do is make another regular expression which matches "alt", and in the part that says
[^alt]*
do instead something like
[^@alt]*
where '@' would indicate that 'alt' was the name of another regular expression, such as
static const boost::regex alt("alt", boost::regbase::normal | boost::regbase::icase);
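For what it's worth, the effect that `[^@alt]*` is after can probably be approximated with a negative lookahead applied one character at a time. What follows is only a sketch under a few assumptions: it uses std::regex (ECMAScript grammar) as a stand-in for boost::regex so that it builds with the standard library alone, the function name first_img_alt is made up for illustration, and it expects src to come before alt, as in the expression above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the alt text of the first <img> tag in `html`
// that has src followed (somewhere later in the tag) by alt, or an empty
// string when there is no such tag.
std::string first_img_alt(const std::string& html) {
    // std::regex is an assumption here; the thread itself uses boost::regex.
    static const std::regex img_with_alt(
        "<\\s*img\\s+src\\s*=\\s*\"[^\"]*\""  // <img src="..."
        "(?:(?!alt)[^>])*"                     // one char at a time, never starting "alt", never '>'
        "alt\\s*=\\s*\"([^\"]*)\""             // alt="..." with the text as group 1
        "[^>]*>",                              // remainder of the tag
        std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    if (std::regex_search(html, m, img_with_alt)) {
        return m[1].str();
    }
    return std::string();
}
```

The `(?:(?!alt)[^>])*` part is the piece standing in for "anything except the word alt": the lookahead is checked before every single character, which is also why this kind of pattern tends to be slow.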
I can see how to do what I want without this: I would get the whole IMG tag and do a separate regexp_search on the match. But it would be so much easier if this were possible, especially since it would leave me with fewer lines of regular expression code to have bugs in.
If this is possible I'd like to know. Thanks in advance, and I'll post the regular expressions I end up using here if anyone might find them of use.
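For reference, the two-step fallback mentioned above (match the whole IMG tag, then search inside that match) might look roughly like this. It is a sketch only: std::regex stands in for boost::regex so the example needs nothing beyond the standard library, and the name img_alt_texts is invented for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the alt text of every <img> tag in `html`
// that carries an alt attribute, regardless of attribute order.
std::vector<std::string> img_alt_texts(const std::string& html) {
    // Step 1 pattern: a complete <img ...> tag.
    static const std::regex img_tag(
        "<\\s*img\\b[^>]*>",
        std::regex::ECMAScript | std::regex::icase);
    // Step 2 pattern: the alt attribute inside one tag, value as group 1.
    static const std::regex alt_attr(
        "\\balt\\s*=\\s*\"([^\"]*)\"",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag); it != end; ++it) {
        const std::string tag = it->str();  // keep the tag alive while searching it
        std::smatch m;
        if (std::regex_search(tag, m, alt_attr)) {
            result.push_back(m[1].str());
        }
    }
    return result;
}
```

This costs a second expression and a loop, but each expression stays small, and the alt attribute no longer has to follow src.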
You can't do that right now - the main problem is how would the library find an expression called "alt"? Interpreted languages with reflective abilities can do this (Perl for example), but compiled languages can't.
At present I'm in the middle of rewriting the regex matching code (for those that follow these things, it's about 90% done and up to 10x faster than the current version). Once I've got that out the door there are a couple of extensions that I will be able to add:
1) recursive regexes (a regex that can jump to an arbitrary part in its own state machine).
2) registered/named regexes: you would call boost::regex::register to register a named regular expression, which can then be called from as many other regexes as you want (basically it lets one state machine call another).
There are limitations to be figured out, but I'm actually excited about this one - and it happens to solve your problem as well - or at least almost, I admit I hadn't thought of referring to negated regexes as you want to do, that's actually quite tricky :-(
John Maddock
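Until something like registered/named regexes exists in the library, a user-side approximation is to keep pattern sources in a table and splice them in textually before compiling. The sketch below is exactly that and nothing more: PatternRegistry, define, expand, compile and the `@{name}` placeholder syntax are all invented for illustration, std::regex stands in for boost::regex, and textual substitution is not the same thing as one state machine calling another.

```cpp
#include <map>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical registry of named pattern sources (not a Boost API).
class PatternRegistry {
public:
    void define(const std::string& name, const std::string& source) {
        patterns_[name] = source;
    }

    // Replaces every @{name} with the registered source wrapped in (?:...).
    // Self-referencing definitions are not detected and would recurse forever.
    std::string expand(const std::string& source) const {
        std::string out;
        std::string::size_type pos = 0;
        while (true) {
            const std::string::size_type open = source.find("@{", pos);
            if (open == std::string::npos) {
                out.append(source, pos, std::string::npos);
                break;
            }
            const std::string::size_type close = source.find('}', open + 2);
            if (close == std::string::npos) {
                throw std::invalid_argument("unterminated @{...} placeholder");
            }
            out.append(source, pos, open - pos);
            const std::string name = source.substr(open + 2, close - open - 2);
            const std::map<std::string, std::string>::const_iterator it = patterns_.find(name);
            if (it == patterns_.end()) {
                throw std::out_of_range("unknown pattern name: " + name);
            }
            out += "(?:" + expand(it->second) + ")";
            pos = close + 1;
        }
        return out;
    }

    // Expands the placeholders and compiles the result case-insensitively.
    std::regex compile(const std::string& source) const {
        return std::regex(expand(source), std::regex::ECMAScript | std::regex::icase);
    }

private:
    std::map<std::string, std::string> patterns_;
};
```

With "alt" registered under the name alt, a source such as `(?:(?!@{alt})[^>])*` expands to `(?:(?!(?:alt))[^>])*`, which is one way to express the negated reference, at the cost of the lookahead being evaluated at every character.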
How are you saving 2)? In memory or permanently in a file? If permanently in a file, how does the end-user reuse named regexes in situations other than the one in which he created the name for a regular expression? Inquiring minds want to know <g>. Named regexes are something I have intermittently thought about for my Regular Expression Component Library built using Boost Regex++. The difficulty is the practical decision of how to save named regexes so that they can be used again in other invocations of the Boost Regex++ library. However one saves them, it seems the end-user must transport such permanent storage around with the Regex++ implementation, else the named regexes will be lost.