
Hi Eric,

--- Eric Niebler <eric@boost-consulting.com> wrote:
Understood. Perhaps what is called for is a "pull" iterator that buffers chunks of data at a time, and when the buffer underflows, it fetches another chunk of data. That way, libraries like xpressive and Spirit can keep their iterator-based interface and not worry whether or not "++begin" goes to disk for the next 4Kb, or reads from a socket, or whatever.
Would that address your problem?
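A rough sketch of such a "pull" iterator, for illustration only. Everything here is an assumption rather than an existing Boost interface: `ChunkSource`, `PullIterator`, `make_string_source` and `drain` are hypothetical names, and the source is modelled as a blocking callable that fills a buffer and returns 0 at end of input. The iterator is single-pass and forward-only, and copies share one buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source: writes up to `n` bytes to `dst` and returns the number
// of bytes written, or 0 at end of input. It could wrap a file or a socket.
using ChunkSource = std::function<std::size_t(char*, std::size_t)>;

class PullIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    PullIterator() = default;  // a default-constructed iterator marks end of input

    PullIterator(ChunkSource source, std::size_t chunk_size)
        : state_(std::make_shared<State>(std::move(source), chunk_size)) {
        assert(chunk_size > 0);
        refill();  // fetch the first chunk eagerly
    }

    reference operator*() const { return state_->buffer[state_->pos]; }

    // Advance; when the buffer underflows, pull the next chunk from the source.
    PullIterator& operator++() {
        if (++state_->pos == state_->filled) refill();
        return *this;
    }

    friend bool operator==(const PullIterator& a, const PullIterator& b) {
        return a.state_ == b.state_;
    }
    friend bool operator!=(const PullIterator& a, const PullIterator& b) {
        return !(a == b);
    }

private:
    struct State {
        State(ChunkSource s, std::size_t n) : source(std::move(s)), buffer(n) {}
        ChunkSource source;
        std::vector<char> buffer;
        std::size_t pos = 0;
        std::size_t filled = 0;
    };

    void refill() {
        state_->filled = state_->source(state_->buffer.data(), state_->buffer.size());
        state_->pos = 0;
        if (state_->filled == 0) state_.reset();  // exhausted: become the end marker
    }

    std::shared_ptr<State> state_;
};

// Demo source that hands out a string piece by piece, standing in for real I/O.
ChunkSource make_string_source(std::string data) {
    auto offset = std::make_shared<std::size_t>(0);
    return [data = std::move(data), offset](char* dst, std::size_t n) {
        std::size_t count = std::min(n, data.size() - *offset);
        std::copy_n(data.data() + *offset, count, dst);
        *offset += count;
        return count;
    };
}

// Read everything the iterator yields, using only pre-increment and dereference.
std::string drain(PullIterator it) {
    std::string out;
    for (PullIterator end; it != end; ++it) out.push_back(*it);
    return out;
}
```

For example, `drain(PullIterator(make_string_source("hello world"), 4))` walks the text in 4-byte chunks without the caller ever seeing the chunk boundaries. Note that the call to the source inside `refill()` blocks, and that only pre-increment is provided, so this does not meet the full iterator requirements a real library would expect.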
I'm not sure, but your comments further down suggest probably not. If the read from the socket is to use async I/O, then the regex code has to give up control of the thread.
That's not the case for a backtracking regex engine like Boost.Regex or xpressive. These libraries require bidirectional iterators because they may need to back out state transitions and decrement the iterator to try a different alternative. You'll need to buffer everything read so far, or else write it to a tmp file so you can get it back should you need it.
Ok, I didn't realise it required backtracking. Perhaps xpressive can be wrapped with something that does the buffering from the correct position in the input stream automatically in this case, but...
And the problem of returning a partial match and persisting the current state of the state machine is a hard one. Some implementations maintain their state on the program stack, so returning effectively wipes out all that information.
Does this mean that xpressive stores its state on the program stack?
These implementations would need to somehow serialize the state stored on the program stack, and then de-serialize it in order to begin executing where it left off. Tricky stuff.
This does confirm my feeling that there is a call for an async I/O friendly "regular expression" library, one that:

- Only supports expressions that can be mapped to FSMs without requiring backtracking.
- Does not store any state on the program stack.

I don't believe it needs to be anything like as rich in functionality as xpressive, say, so I'm quite happy to drop support for the "hard" stuff in order to make it async I/O friendly. If that level of functionality is required, a user can always do the processing in two steps, where the second step passes a complete message through something like xpressive.

Cheers,
Chris
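
To make the two requirements above concrete, here is a minimal sketch of a table-driven matcher whose entire state lives in an object rather than on the call stack, so it can be suspended between async reads. None of this comes from an existing library: `Dfa`, `ResumableMatcher`, `Status` and `make_number_line_dfa` are hypothetical names, and the hand-built automaton for `[0-9]+\r\n` simply stands in for whatever an expression compiler would produce. The matcher stops at the first accepting state, which suits delimiter-terminated messages.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kReject = -1;  // marks "no transition" in the table

// Table-driven DFA: table_[state][byte] is the next state, or kReject.
class Dfa {
public:
    explicit Dfa(std::size_t state_count)
        : table_(state_count), accepting_(state_count, false) {
        for (auto& row : table_) row.fill(kReject);
    }

    // Add transitions from `from` to `to` for every byte in [lo, hi].
    void add_transition(int from, unsigned char lo, unsigned char hi, int to) {
        assert(lo <= hi);
        for (int c = lo; c <= hi; ++c) table_[from][c] = to;
    }

    void set_accepting(int state) { accepting_[state] = true; }
    int next(int state, unsigned char c) const { return table_[state][c]; }
    bool accepting(int state) const { return accepting_[state]; }

private:
    std::vector<std::array<int, 256>> table_;
    std::vector<bool> accepting_;
};

enum class Status { NeedMore, Matched, Failed };

// All matching state is held in these members, so feed() can return at any
// chunk boundary and be called again when the next async read completes.
class ResumableMatcher {
public:
    explicit ResumableMatcher(const Dfa& dfa) : dfa_(&dfa) {}

    // Consume bytes from one chunk until the DFA accepts, rejects,
    // or the chunk runs out.
    Status feed(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size && status_ == Status::NeedMore; ++i) {
            state_ = dfa_->next(state_, static_cast<unsigned char>(data[i]));
            ++consumed_;
            if (state_ == kReject) {
                status_ = Status::Failed;
            } else if (dfa_->accepting(state_)) {
                status_ = Status::Matched;
            }
        }
        return status_;
    }

    // Total bytes consumed across all chunks so far.
    std::size_t consumed() const { return consumed_; }

private:
    const Dfa* dfa_;
    int state_ = 0;  // state 0 is the start state by convention here
    std::size_t consumed_ = 0;
    Status status_ = Status::NeedMore;
};

// Hand-built DFA for [0-9]+\r\n (an illustrative pattern only).
Dfa make_number_line_dfa() {
    Dfa dfa(4);
    dfa.add_transition(0, '0', '9', 1);    // first digit
    dfa.add_transition(1, '0', '9', 1);    // further digits
    dfa.add_transition(1, '\r', '\r', 2);  // carriage return
    dfa.add_transition(2, '\n', '\n', 3);  // line feed completes the match
    dfa.set_accepting(3);
    return dfa;
}
```

A read handler would call `feed()` with each chunk as it arrives: `NeedMore` means issue another read, `Matched` means `consumed()` bytes form a complete message that can then be handed to something like xpressive as the second step.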