Why does regex_search search to the end of the iterator?
Could anyone explain the following behaviour of regex_search (in /usr/include/boost/regex/v3/regex_match.hpp, so Version 3)?

Using regex_search acting on a pair of iterators, with input "aaa bbb ccc\naaaaaaX" and regexp "(aaa)|(bbb)|(.)|(\n)" and the match_continuous flag set.

It seems that regex_search iterates to the end of the sequence using ++iterator rather than stopping once it has found the first match or even the longest match.

Why is this? I guess it is an optimisation to speed up subsequent matches.

Is there a way of avoiding this? I have an application in mind where maybe I don't have the entire sequence, or rather I'd like the match before I have the complete sequence.

thanks,
David

--- details --

Why does regex_search iterate to the end of the sequence in the following situation, with match_continuous set? It seems to me that it could stop as soon as it finds 'aaa' and not iterate further.

    expression.assign("(aaa)|(bbb)|(.)|(\n)");        // Regular expression
    const char * input = "aaa bbb ccc\naaaaaaX";      // Input string
    MyCharIter start = my_begin(input), end = my_end(input);
    boost::match_results<MyCharIter> what;
    boost::regex::flag_type flags =
        boost::match_default | boost::match_not_dot_newline | boost::match_continuous;
    regex_search(start, end, what, expression, flags);

MyCharIter is just a test iterator on the string, with

    MyCharIter& MyCharIter::operator++(){   // prefix ++X
        printf("NT: ++X called bp=%d c= '%c'\n", bp, s[bp]);
        ++bp;
        return *this;
    }

The output is

    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=1 c= 'a'
    NT: ++X called bp=2 c= 'a'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=1 c= 'a'
    NT: ++X called bp=2 c= 'a'
    NT: ++X called bp=3 c= ' '
    NT: ++X called bp=4 c= 'b'
    NT: ++X called bp=5 c= 'b'
    NT: ++X called bp=6 c= 'b'
    NT: ++X called bp=7 c= ' '
    NT: ++X called bp=8 c= 'c'
    NT: ++X called bp=9 c= 'c'
    NT: ++X called bp=10 c= 'c'
    NT: ++X called bp=11 c= ' '
    NT: ++X called bp=12 c= 'a'
    NT: ++X called bp=13 c= 'a'
    NT: ++X called bp=14 c= 'a'
    NT: ++X called bp=15 c= 'a'
    NT: ++X called bp=16 c= 'a'
    NT: ++X called bp=17 c= 'a'
    NT: ++X called bp=18 c= 'X'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=1 c= 'a'
    NT: ++X called bp=2 c= 'a'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=1 c= 'a'
    NT: ++X called bp=2 c= 'a'
    NT: ++X called bp=0 c= 'a'
    NT: ++X called bp=1 c= 'a'
    NT: ++X called bp=2 c= 'a'
    ******** N = 0 Result = aaa
That shouldn't be the case, and I don't see that behaviour here. Note that this is obsolete code anyway; is there any way you can upgrade to 1.32?

If you want me to investigate further, can you:

1) Reproduce the behaviour with 1.32.
2) Let me have a reproducible test case.
3) Check that it's not a call to std::distance that is seeking the iterator to the end (it shouldn't be).

Thanks, John.
Hi,

Thank you for your helpful email. I have downloaded Boost 1.32 from sourceforge and installed it. The behaviour of the program under Boost 1.32 is what I expected and what I wanted, i.e. regex_search no longer seeks to the end of the iterator before returning the first match.

So, problem solved.

I have a test program (132 lines) which demonstrates this problem, and shows that 'almost' the same code produces different results under Boost 1.32 and Boost 1.30. (The almost is ...

    #ifdef BOOST_VERSION_1.30
    boost::regex::flag_type flags =
    #else
    boost::regex_constants::match_flag_type flags =
    #endif

)

If this is of interest I can post the test program to the list.

Thanks, David
OK, good. As long as that has solved the problem, there's not much point trying to figure out what 1.30 was doing, as that code is no longer even in CVS.

John.
participants (2): David McKelvie, John Maddock