static const boost::regex find_imgs_with_alt(
    "<\\s*img"       // Matches <, 0 or more whitespace, IMG
    "\\s+src\\s*"    // Matches 1 or more whitespace, SRC, 0 or more whitespace
    "=\\s*"          // Matches = followed by 0 or more whitespace
    "\"\\s*"         // Matches " followed by 0 or more whitespace
    "[^\"]*"         // Matches any number of chars not "
    "\\s+"           // Matches 1 or more whitespace
    "[^alt]*"        // I would like to match anything except the word ALT, but the
                     // regexp stuff interprets this as anything but 'a', 'l', or 't'
    "alt\\s*="       // Matches ALT, 0 or more whitespace, =
    "\"(^\")\""      // Matches ", anything except a " as a group that I can
                     // reference, then another "
    "[^>]*>",        // Matches any number of chars not >, followed by a >
    boost::regbase::normal | boost::regbase::icase);
You could use forward lookahead asserts:
"(?!\
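A minimal sketch of how a negative lookahead could stand in for the [^alt]* part. It assumes std::regex with the ECMAScript grammar rather than Boost.Regex, so that it builds with only the standard library; the function name find_img_alt and the exact pattern are illustrative, not taken from the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and stores the alt text in `alt_out`
// when `tag` holds an IMG tag whose src attribute is followed by an alt
// attribute. This is a sketch, not a general HTML parser.
bool find_img_alt(const std::string& tag, std::string& alt_out)
{
    static const std::regex img_with_alt(
        // <img src="..."
        R"re(<\s*img\s+src\s*=\s*"[^"]*")re"
        // Consume one char at a time, as long as it does not close the tag
        // and "alt=" does not start at this position (negative lookahead).
        R"re((?:(?!alt\s*=)[^>])*)re"
        // alt="(captured text)", then the rest of the tag up to >
        R"re(alt\s*=\s*"([^"]*)"[^>]*>)re",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    if (!std::regex_search(tag, m, img_with_alt))
        return false;
    alt_out = m[1].str();
    return true;
}
```

The group (?:(?!alt\s*=)[^>])* plays the role that [^alt]* was meant to play: it skips other attributes but refuses to step over the start of "alt=". Calling find_img_alt on a tag without an alt attribute should return false and leave alt_out untouched.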
So what I want to do is make another regular expression which matches "alt", and in the part that says
[^alt]*
do instead something like
[^@alt]*
where '@' would indicate that 'alt' was the name of another regular expression, such as
static const boost::regex alt("alt", boost::regbase::normal | boost::regbase::icase);
I can see how to do what I want to do without this; I would get the whole IMG tag and do a separate regexp_search on the match. But it seems to make it so much easier if it were possible, especially leaving me with fewer lines of regular expression code to have bugs in.
If this is possible I'd like to know. Thanks in advance, and I'll post the regular expressions I end up using here if anyone might find them of use.
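For comparison, the two-step fallback mentioned above (grab the whole IMG tag, then search inside it) might look roughly like this. It is a sketch that assumes std::regex in place of Boost.Regex; collect_img_alts and both patterns are hypothetical names and choices made for the illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the alt text of every IMG tag in `html`
// that carries an alt attribute, in document order.
std::vector<std::string> collect_img_alts(const std::string& html)
{
    // Step 1: a whole IMG tag. Stops at the first '>', so attribute
    // values containing '>' are not handled.
    static const std::regex img_tag(
        R"re(<\s*img\b[^>]*>)re",
        std::regex::ECMAScript | std::regex::icase);
    // Step 2: the alt attribute inside one tag, value captured as group 1.
    static const std::regex alt_attr(
        R"re(\balt\s*=\s*"([^"]*)")re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> alts;
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag), end;
         it != end; ++it) {
        const std::string tag = it->str();  // keep the tag alive for smatch
        std::smatch m;
        if (std::regex_search(tag, m, alt_attr))
            alts.push_back(m[1].str());
    }
    return alts;
}
```

Each pattern stays short and can be checked on its own, at the cost of a second search per tag.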
You can't do that right now. The main problem is: how would the library find an expression called "alt"? Interpreted languages with reflexive abilities can do this (Perl, for example), but compiled languages can't.

At present I'm in the middle of rewriting the regex matching code (for those that follow these things, it's about 90% done and up to 10x faster than the current version). Once I've got that out the door there are a couple of extensions that I will be able to add:

1) Recursive regexes (a regex that can jump to an arbitrary part in its own state machine).
2) Registered/named regexes: you would call boost::regex::register to register a named regular expression, which can then be called from as many other regexes as you want (basically it lets one state machine call another).

There are limitations to be figured out, but I'm actually pretty excited about this one, and it happens to solve your problem as well. Or at least almost: I admit I hadn't thought of referring to negated regexes as you want to do, and that's actually quite tricky :-(

John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/index.htm