
On Wed, May 04, 2005 at 12:11:41AM -0500, Aaron W. LaFramboise wrote:
Nathan Myers wrote:
After there's text in the buffer, you can decide if it's enough to merit calling whatever is supposed to extract and operate on it.
It seems that if you already have code that does this by directly examining the buffer, there may be little point in dropping back a level of abstraction and then using operator>>. In particular, in a common case, verifying whether input is complete does most of the work of actually extracting it.
Imagine that you want to use somebody's library that can parse a "{}"-delimited language like JavaScript. It wants its input from an istream& or (more likely, I hope) streambuf*. You can scan the incoming text for matching brackets, and the bracket that matches the first opening bracket delimits a unit. The only real work you're doing is skipping comments and string literals. (Similarly, perhaps, for XML.) The size of the unit to parse is not restricted to the size of your buffer, but there might be a maximum size you care to handle, for security/DoS reasons if nothing else. Then there's Giovanni's example of line delimiters, which may be a more common use. I can recognize newlines, but don't care to reproduce even the code to parse numbers, and don't want to copy each line all over creation just to get the numbers out; I'd rather parse them right from the buffer the text first landed in.
One thing I've never understood is how extractors are supposed to be written when they require reading two or more sub-objects from the input stream. If reading the first part succeeds, but the second part fails, what happens to the chunk of data that was read? And how do we prevent the stream from being in an indeterminate state due to not knowing how much was read? Perhaps the solution to this problem might present new ideas for solutions to the nonblocking extractor problem.
Or vice versa. The old libstdc++ istream used to support indefinitely-large pushback, but that's not really what's wanted. What you need is a way to get a token from the streambuf that lets you seek the stream back to that point, e.g. when you find failbit set. When the token's dtor is called, the streambuf can discard any accumulated state up to the next place it issued a token. Of course that just takes a clever streambuf, and doesn't need any help from the standard library. (Unfortunately streampos can't be that token; no dtor.) A less elegant scheme is possible: you might have a streambuf that saves _everything_, until told to discard whatever came before some point, e.g. the current position when you know you have a good parse. It can pubseek() back to that point and to any point after. Of course, once it has seeked (sought?) back there, you need to know what to do next. I am finding it hard to think of what one might do. Skip to the next start delimiter? You could do that without seeking first. Try a different LR(n) parse-table production? Maybe, but that's a bit obscure.

Nathan Myers
ncm@cantrip.org