
Hi Eric,

--- Eric Niebler <eric@boost-consulting.com> wrote:
> What you are describing, at least in regex terms, is a partial match. Ordinarily, a regex match will give you a "yes" or a "no" answer. With a partial match, you can get a "maybe" if the input sequence is exhausted before the regex state machine has reached its final state.

I was aware of partial matches from perusing the documentation a while back, and I'm not sure that it's exactly the same thing -- please correct me if I'm wrong.

From my reading of the documentation, if I get a partial match and I want to continue to try for a full match, I must buffer all of the data from the beginning of the partial match. This means that the partial_regex_grep example cannot find a substring match longer than 4096 characters, because that is all the data it will buffer. Furthermore, each time I want to retest the input against the expression, it must process the whole input string again. This is not ideal from an efficiency point of view, since I could potentially receive input data one byte at a time.

What I want is a stateful regular expression-based decoder object (since in theory it's just a state machine and can remember its current state). I can feed it more input, which will cause more state transitions, and it will tell me when it reaches a terminal state. I never have to buffer more input than the block just read, because earlier input will have been fully consumed by the decoder.

As I said, this is an area I am very interested in exploring further when time permits (and not just in relation to regular expressions, but also things like Boost.Serialization), but that definitely belongs in its own thread.

Cheers,
Chris
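
P.S. A minimal sketch of the kind of stateful decoder described above, under some loud assumptions: StreamMatcher and feed() are hypothetical names, not part of Boost.Regex, and to keep it short the "expression" is a fixed literal matched with a Knuth-Morris-Pratt failure table rather than a real regular expression. It only illustrates that the matcher can keep its state between blocks, so nothing before the current block needs to be buffered or rescanned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only: not an existing Boost.Regex interface.
// Searches a stream for a fixed literal, remembering its state across blocks.
class StreamMatcher {
public:
    explicit StreamMatcher(std::string pattern)
        : pattern_(std::move(pattern)), fail_(pattern_.size(), 0) {
        assert(!pattern_.empty());
        // fail_[i] is the length of the longest proper prefix of
        // pattern_[0..i] that is also a suffix of it (KMP failure table).
        for (std::size_t i = 1, k = 0; i < pattern_.size(); ++i) {
            while (k > 0 && pattern_[i] != pattern_[k]) k = fail_[k - 1];
            if (pattern_[i] == pattern_[k]) ++k;
            fail_[i] = k;
        }
    }

    // Consume one block of input. Returns true as soon as the terminal state
    // is reached; `consumed` is then the number of bytes of this block used.
    // Returns false (with consumed == block.size()) if more input is needed.
    bool feed(const std::string& block, std::size_t& consumed) {
        for (std::size_t i = 0; i < block.size(); ++i) {
            // On a mismatch, fall back to the longest prefix still viable.
            while (state_ > 0 && block[i] != pattern_[state_]) {
                state_ = fail_[state_ - 1];
            }
            if (block[i] == pattern_[state_]) ++state_;
            if (state_ == pattern_.size()) {
                consumed = i + 1;
                state_ = 0;  // start afresh for the next match
                return true;
            }
        }
        consumed = block.size();
        return false;
    }

private:
    std::string pattern_;
    std::vector<std::size_t> fail_;
    std::size_t state_ = 0;  // number of pattern characters matched so far
};
```

Each block can be discarded as soon as feed() returns, even when it arrives one byte at a time; the only memory carried forward is state_. A real regex-based version would need a richer state (for example, a set of active automaton states), which this sketch does not attempt.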