data:image/s3,"s3://crabby-images/758ed/758ed636272ddc947a4ce1398eb6dee6f687ebf4" alt=""
That would work for simple cases where the two sub-patterns are separated by ".*", but fail for other non-trivial expressions such>as "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_;
The technique is obviously extendable. I'm doing a similar thing where I have boost regex and greta available as program options but also my own code specialized for certain odd searches. I generally do 1000's of REGEX's against 10-100 strings from 10-500k long each. To make the speed reasonable, I can compile the regex into a simple intermediate form and record which features I will need during searching- if you find a '(" or '|' then you just need to take a slightly different strategy then "omg.wtf". The overhead in doing a parse on a small regex is obviously lower than making a pointless check everywhere along the samples. It also may be worth considering indexing the string ( make something similar to database index of table of contents) if you have a lot of REGEX's with single character matches or can identify the most unique part of the regex and look for that first. If every REGEX has a 4 char exact match requirement, ...ABCD..., there is little point checking for the complicated stuff when it isn't near the "ABCD" part. Mike Marchywka 586 Saint James Walk Marietta GA 30067-7165 404-788-1216 (C)<- leave message 989-348-4796 (P)<- emergency only marchywka@hotmail.com Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka@yahoo.com if no reply here. Thanks. ________________________________ Date: Mon, 11 Feb 2008 18:08:31 -0800 From: gene@nttmcl.com To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Regex] "Truly" incremental search? Renato golin wrote: Eugene M. Kim wrote: I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program. Hi, It may sound silly but what if you pre-process the input splitting it by ".*" and do an AND match for all sub-strings? input = "omg.*wtf" => input[] = "omg", "wtf"; return (regexp_match(input[0]) && regexp_match(input[1])); cheers, --renato PS: of course the code above is pseudo-code and of course you'll take as many input parameters as available and not hard-coded like this. That would work for simple cases where the two sub-patterns are separated by ".*", but fail for other non-trivial expressions such as "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_; Eugene _________________________________________________________________ Need to know the score, the latest news, or you need your HotmailĀ®-get your "fix". http://www.msnmobilefix.com/Default.aspx