Problem with regex syntax in boost::regex 1.38 causes crash with memory exhausted
Hi,
I'm using boost::regex 1.38 and C++ to write a text-file parser.
My code works correctly with most regular expressions, but I have a problem with this expression:
^([^\s]+)\s+\d+\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+\w+\/(\d+)\s+\d+\s+(\w+)\s+(\w+)\:\/\/?([^\:\/\?\s]+):?(\d*)(\S*).+Referer:(.+)\\r.*$
and a file with these lines:
1237084031.249 985 10.0.0.1 TCP_MISS/200 37567 GET http://monsite.information.com/index.mas?epl=00960031UlsNZ0sAVVETVRBeHgsJRz5... - ROUNDROBIN_PARENT/10.10.10.10 text/html [Connection: Close\r\nCache-Control: no-cache,no-store\r\nPragma: No-Cache\r\nAccept: */*\r\nHost: monsite.information.com\r\nUser-Agent: XXXX/X.X (Update: XXXX X.X; UNIX)\r\nProxy-Connection: Keep-Alive\r\n] [HTTP/1.1 200 OK\r\nDate: Sun, 15 Mar 2009 02:27:10 GMT\r\nServer: Oversee Webserver v1.3.18\r\nSet-Cookie: ident=click:0%257csearch:0%257cexitpop:0%257ctoken:vqzyyyprxutxyvqs%257clload:0%257clvisit:1237084030; path=/; expires=Mon, 16-Mar-2009 02:27:10 GMT\r\nSet-Cookie: monsite.com=click:0%257csearch:0%257cexitpop:0%257clload:0%257clvisit:1237084030; path=/; expires=Mon, 16-Mar-2009 02:27:10 GMT\r\nSet-Cookie: Spusr=ac15000c6b3f49bc677e9c0b; path=/; expires=Tue, 15-Mar-11 02:27:10 GMT\r\nCache-control: private, no-cache, must-revalidate\r\nExpires: Mon, 26 Jul 1997 05:00:00 GMT\r\nPragma: no-cache\r\nP3P: policyref="http://monsite.information.com/w3c/p3p.xml", CP="NOI DSP COR ADMa OUR NOR STA"\r\nConnection: close\r\nContent-Type: text/html\r\n\r]
I get this error message:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl
The problem is probably here: (\d*)(\S*).+Referer:(.+), because when I delete this part it works correctly.
I tried this expression and this file with a Perl script and it works correctly, but with Boost it does not.
Can you help me?
This is a deliberate "feature": what's happening is that the complexity of matching the regex has exceeded "safe" expectations. Perl, in contrast, will just keep churning away trying to find a match, even if it takes "forever". In the middle are a few cases where Perl eventually finds a match (albeit with poor performance) and Boost.Regex throws an exception.

The way to fix this is to make the expression more explicit so that less backtracking occurs. Judicious use of independent sub-expressions can help, as can changing your repeats so that each branch in the state machine is mutually exclusive. For example:

(\d*)(\S*).+Referer:(.+)

could be better written as:

(\d+)(\D\S*)\s.*Referer:(.+)

which is not quite the same thing, or:

(\d*)(\D\S*)?\s.*Referer:(.+)

which will do the same thing, but with each branch there is only one choice the machine can make.

HTH, John.
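
As a rough way to check a rewrite like the one above before swapping it in, the captures of the original and rewritten fragments can be compared on a few short sample strings. The sketch below is only an illustration: it uses std::regex (ECMAScript syntax) as a stand-in so that it builds without Boost, on the assumption that these particular constructs (\d, \D, \s, \S and the repeats) behave the same way there. The helper names captures and same_captures and the sample strings are made up for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture groups 1..N if `pattern` matches the
// whole of `text`, or an empty vector if there is no match.
std::vector<std::string> captures(const std::string& pattern,
                                  const std::string& text) {
    const std::regex re(pattern);  // std::regex stand-in, ECMAScript syntax
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(text, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());  // unmatched groups become ""
        }
    }
    return groups;
}

// Tail of the expression from the question.
const std::string kOriginalTail = R"((\d*)(\S*).+Referer:(.+))";
// The second rewrite suggested in the reply.
const std::string kRewrittenTail = R"((\d*)(\D\S*)?\s.*Referer:(.+))";

// True if both fragments produce identical captures for `text`.
bool same_captures(const std::string& text) {
    return captures(kOriginalTail, text) == captures(kRewrittenTail, text);
}
```

For instance, same_captures("8080/index.html foo Referer:abc") compares the port, path and referer groups produced by both fragments on a shortened, invented tail of a log line. Agreement on a handful of samples does not prove the two expressions are equivalent on every input, so it is worth running such a check against real lines from the log file too.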
participants (2)
- coord.admin
- John Maddock