Re: [Boost-users] Regular expression min match length
Thanks for the reply John, I'll try to explain my "use case" in the hope that this feature will be available in the future...
I'm developing a system that detects TCP streams using pattern matching. Here is the pseudo code (in simplified form) of the function I'd like to extend:
protocol_stream decodeStream(tcp_stream)
{
    foreach(registered protocol in the system)
    {
        required_size_to_match = calc_min_regex_len(current_protocol.regex);
        if(tcp_stream.buffer_size <= required_size_to_match)
            check if tcp_stream.buffer matches
        else
            try to read from tcp_stream if data is available until
            and check again if matches
    }
}
Hope you can help since you have "an almost pathological interest in anything that can't be done" :)
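A minimal C++ sketch of the loop above, offered only as an illustration. It assumes the intent is "try a match once at least the minimum number of bytes is buffered, otherwise wait for more data". The names Protocol, Decision, Result and classify are hypothetical, std::regex stands in for Boost.Regex, and min_len is filled in by hand because calc_min_regex_len() is the feature being requested and does not exist.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical description of one registered protocol. min_len is supplied
// by hand, standing in for the requested calc_min_regex_len().
struct Protocol {
    std::string name;
    std::regex  pattern;
    std::size_t min_len;
};

enum class Decision { Matched, NeedMoreData, NoMatch };

struct Result {
    Decision    decision;
    std::string protocol;  // non-empty only when decision == Matched
};

// Tries every protocol against the bytes buffered so far.
Result classify(const std::string& buffer, const std::vector<Protocol>& protocols) {
    bool waiting = false;
    for (const Protocol& p : protocols) {
        if (buffer.size() < p.min_len) {
            // Too short to match this pattern yet; the caller should read more.
            waiting = true;
            continue;
        }
        // match_continuous anchors the match at the start of the buffer.
        if (std::regex_search(buffer, p.pattern,
                              std::regex_constants::match_continuous)) {
            return {Decision::Matched, p.name};
        }
    }
    return {waiting ? Decision::NeedMoreData : Decision::NoMatch, ""};
}
```

A caller would keep appending bytes from the TCP stream and call classify() again while it returns Decision::NeedMoreData.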
Raffaele Romito wrote:
Thanks for the reply John, I'll try to explain my "use case" in the hope that this feature will be available in the future... I'm developing a system that detects TCP streams using pattern matching. Here is the pseudo code (in simplified form) of the function I'd like to extend:

protocol_stream decodeStream(tcp_stream)
{
    foreach(registered protocol in the system)
    {
        required_size_to_match = calc_min_regex_len(current_protocol.regex);
        if(tcp_stream.buffer_size <= required_size_to_match)
            check if tcp_stream.buffer matches
        else
            try to read from tcp_stream if data is available until
            and check again if matches
    }
}

Hope you can help since you have "an almost pathological interest in anything that can't be done" :)

Oh dear, I should have realised that this would come back to haunt me! :-0

I've now realised that in the general case this can't in fact be implemented (think back-references), but can be for at least a subset of regexes. If you're still keen on the feature, can you please file a feature request on the TRAC (http://svn.boost.org/trac) so this doesn't get lost?

Thanks, John.
John Maddock wrote:
Oh dear, I should have realised that this would come back to haunt me! :-0
I've now realised that in the general case this can't in fact be implemented (think back-references), but can be for at least a subset of regexes. If you're still keen on the feature, can you please file a feature request on the TRAC (http://svn.boost.org/trac) so this doesn't get lost?

But it can, and GRETA does this as an optimization. (It won't search for a match when it knows there isn't room for one.) For example:

    (foo|barbaz)\1

The minimum match length is 6. Things get tricky when a backreference refers to an enclosing group, as in (foo\1) (and yes, you can do that, but you really shouldn't), in which case the conservative answer is to say the minimum match length of \1 is 0 and then proceed with the rest of the calculation.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
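
The calculation described above can be sketched in C++ over a small hand-built syntax tree. This is only an illustration under stated assumptions: the Node type and the helpers lit, seq, alt, group, backref, star and min_match_length are hypothetical and are not how GRETA or Boost.Regex represent a pattern. A backreference contributes the minimum length of its group once that group has been fully processed. Otherwise (an enclosing group, a forward reference, or a group that might not take part in the match because it sits under a star or inside one alternative) it contributes 0, the conservative answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical regex AST, for illustration only.
struct Node {
    enum Kind { Literal, Concat, Alternate, Group, BackRef, Star } kind;
    std::string text;                         // Literal: characters to match
    int group = 0;                            // Group / BackRef: group number
    std::vector<std::shared_ptr<Node>> kids;  // child nodes
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Node::Kind kind, std::string text, int group, std::vector<NodePtr> kids) {
    auto n = std::make_shared<Node>();
    n->kind = kind; n->text = std::move(text); n->group = group; n->kids = std::move(kids);
    return n;
}
NodePtr lit(std::string s)          { return make(Node::Literal, std::move(s), 0, {}); }
NodePtr seq(std::vector<NodePtr> k) { return make(Node::Concat, "", 0, std::move(k)); }
NodePtr alt(std::vector<NodePtr> k) { return make(Node::Alternate, "", 0, std::move(k)); }
NodePtr group(int id, NodePtr body) { return make(Node::Group, "", id, {std::move(body)}); }
NodePtr backref(int id)             { return make(Node::BackRef, "", id, {}); }
NodePtr star(NodePtr body)          { return make(Node::Star, "", 0, {std::move(body)}); }

// Minimum lengths of the groups that are complete and certain to participate.
using GroupLens = std::map<int, std::size_t>;

std::size_t min_len(const Node& n, GroupLens& done) {
    switch (n.kind) {
    case Node::Literal:
        return n.text.size();
    case Node::Concat: {
        std::size_t sum = 0;
        for (const NodePtr& k : n.kids) sum += min_len(*k, done);
        return sum;
    }
    case Node::Alternate: {
        if (n.kids.empty()) return 0;
        std::size_t best = std::numeric_limits<std::size_t>::max();
        for (const NodePtr& k : n.kids) {
            GroupLens local = done;  // groups inside one branch stay local to it
            best = std::min(best, min_len(*k, local));
        }
        return best;
    }
    case Node::Group: {
        // The group is recorded only after its body is done, so a backreference
        // inside the body (an enclosing group) still sees "unknown" and yields 0.
        std::size_t len = min_len(*n.kids[0], done);
        done[n.group] = len;
        return len;
    }
    case Node::BackRef: {
        auto it = done.find(n.group);
        return it == done.end() ? 0 : it->second;
    }
    case Node::Star:
        return 0;  // zero repetitions are allowed; inner groups are not recorded
    }
    return 0;
}

std::size_t min_match_length(const Node& root) {
    GroupLens done;
    return min_len(root, done);
}
```

With these helpers, the tree for (foo|barbaz)\1 is seq({group(1, alt({lit("foo"), lit("barbaz")})), backref(1)}) and its minimum is 6: the shorter alternative is 3 characters, and \1 repeats it. The tree for (foo\1) is group(1, seq({lit("foo"), backref(1)})), which yields 3 because \1 is counted as 0.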