Re: [boost] Re: (Another) socket streams library

4 May 2005

On Wed, May 04, 2005 at 12:11:41AM -0500, Aaron W. LaFramboise wrote:
...
Nathan Myers wrote:
...
After there's text in the 
buffer, you can decide if it's enough to merit calling whatever is 
supposed to extract and operate on it.
It seems that if you already have code that does this by directly
examining the buffer, there may be little point in dropping back a level
of abstraction and then using operator>>.  In particular, in a common
case, verifying whether input is complete does most of the work of
actually extracting it.
Imagine that you want to use somebody's library that can parse a 
"{}"-delimited language like JavaScript.  It wants its input from an 
istream& or (more likely, I hope) streambuf*.  You can scan the 
incoming text for matching brackets, and the bracket that matches 
the first opening bracket delimits a unit.  The only real work you're 
doing is skipping comments and string literals.  (Similarly, perhaps, 
for XML.)  The size of the unit to parse is not restricted to the 
size of your buffer, but there might be a maximum size you care to 
handle, for security/DoS reasons if nothing else.
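
The bracket scan described above might look roughly like the sketch below.
find_unit_end() and npos are hypothetical names, not part of any library.
The sketch assumes C++17 (std::string_view), handles only "..." and '...'
literals plus // and /* */ comments, and ignores things like JavaScript
regex literals.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sentinel: "no complete unit in the buffer yet".
constexpr std::size_t npos = static_cast<std::size_t>(-1);

// Hypothetical helper: returns the offset one past the '}' that matches
// the first '{' in buf, or npos if that bracket has not arrived yet.
std::size_t find_unit_end(std::string_view buf) {
    enum State { Code, Str, LineComment, BlockComment };
    State st = Code;
    char quote = 0;   // which quote character opened the current literal
    int depth = 0;    // current bracket nesting depth
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const char c = buf[i];
        const char next = (i + 1 < buf.size()) ? buf[i + 1] : '\0';
        switch (st) {
        case Code:
            if (c == '"' || c == '\'') { st = Str; quote = c; }
            else if (c == '/' && next == '/') { st = LineComment; ++i; }
            else if (c == '/' && next == '*') { st = BlockComment; ++i; }
            else if (c == '{') ++depth;
            else if (c == '}' && depth > 0 && --depth == 0) return i + 1;
            break;
        case Str:
            if (c == '\\') ++i;              // skip the escaped character
            else if (c == quote) st = Code;
            break;
        case LineComment:
            if (c == '\n') st = Code;
            break;
        case BlockComment:
            if (c == '*' && next == '/') { st = Code; ++i; }
            break;
        }
    }
    return npos;
}
```

A caller would rescan as more text arrives, hand the first find_unit_end()
bytes to the parser once the result is not npos, and give up on the
connection if the buffer passes its size limit while the result is still
npos.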

Then there's Giovanni's example of line delimiters, which may be a
more common use.  I can recognize newlines, but don't care to 
reproduce even the code to parse numbers, and don't want to copy each 
line all over creation just to get the numbers out; I'd rather parse
them right from the buffer the text first landed in.
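
A minimal sketch of that idea, assuming the text has landed in a plain char
array: a read-only streambuf is pointed at each complete line in place, and
the ordinary istream extractors do the number parsing.  span_buf and
parse_lines() are hypothetical names made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>
#include <vector>

// Hypothetical read-only streambuf over memory owned by someone else.
// Nothing is copied; the get area simply points into the caller's buffer.
class span_buf : public std::streambuf {
public:
    span_buf(char* first, char* last) { setg(first, first, last); }
};

// Extracts the integers from every complete (newline-terminated) line in
// [buf, buf + len) and appends them to out.  Returns the number of bytes
// consumed; a trailing partial line is left for the next call.
std::size_t parse_lines(char* buf, std::size_t len, std::vector<long>& out) {
    std::size_t consumed = 0;
    for (;;) {
        char* line = buf + consumed;
        char* nl = static_cast<char*>(
            std::memchr(line, '\n', len - consumed));
        if (!nl) break;                  // no complete line left
        span_buf sb(line, nl);           // view of this line only
        std::istream in(&sb);
        long v;
        while (in >> v) out.push_back(v);
        consumed = static_cast<std::size_t>(nl - buf) + 1;
    }
    return consumed;
}
```

The only work done by hand here is finding the newline; the number parsing
stays in the standard library, and a line is never copied out of the buffer
it arrived in.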
...
One thing I've never understood is how extractors are supposed to be
written when they require reading two or more sub-objects from the input
stream.  If reading the first part succeeds, but the second part fails,
what happens to the chunk of data that was read?  And how do we prevent
the stream from being in an indeterminate state due to not knowing how
much was read?  Perhaps the solution to this problem might present new
ideas for solutions to the nonblocking extractor problem.
Or vice versa.  The old libstdc++ istream used to support indefinitely-
large pushback, but that's not really what's wanted.  What you need is 
a way to get a token from the streambuf that lets you seek the stream back 
to that point, e.g. when you find failbit set.  When the token's dtor 
is called, the streambuf can discard any accumulated state up to the 
next place it issued a token.  Of course that just takes a clever 
streambuf, and doesn't need any help from the standard library.  
(Unfortunately streampos can't be that token; no dtor.)  A less elegant 
scheme is possible: you might have a streambuf that saves _everything_,
until told to discard whatever came before some point, e.g. the current 
position when you know you have a good parse.  You can pubseekpos() back 
to that point or to any point after it.
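
A rough sketch of the less elegant scheme, under some simplifying
assumptions: saving_buf, commit(), rewind(), point and read_point() are all
hypothetical names, the buffer reads its source one character at a time,
and plain member functions stand in for real seeking (no seekpos()
override), which keeps the example short.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical streambuf that keeps every character read from src until
// commit() says the text before the current position is no longer needed.
class saving_buf : public std::streambuf {
public:
    explicit saving_buf(std::streambuf* src) : src_(src) {}

    // Discard everything before the current read position.
    void commit() {
        saved_.erase(0, static_cast<std::size_t>(gptr() - eback()));
        reset(0);
    }
    // Go back to the position of the last commit().
    void rewind() { reset(0); }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const int_type c = src_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) return c;
        const std::size_t pos = saved_.size();  // current read offset
        saved_.push_back(traits_type::to_char_type(c));
        reset(pos);                             // string may have moved
        return c;
    }

private:
    void reset(std::size_t pos) {
        char* b = &saved_[0];
        setg(b, b + pos, b + saved_.size());
    }
    std::streambuf* src_;
    std::string saved_;
};

struct point { int x, y; };

// Two-part extraction: if the second part fails, the stream is put back
// where it was before the first part was read.
bool read_point(std::istream& in, saving_buf& sb, point& p) {
    sb.commit();                 // the parse attempt starts here
    point tmp;
    if (in >> tmp.x >> tmp.y) {
        p = tmp;
        sb.commit();             // good parse: drop the saved text
        return true;
    }
    in.clear();                  // clear failbit so reading can continue
    sb.rewind();                 // nothing consumed by the failed attempt
    return false;
}
```

With input "12 abc", read_point() fails on "abc", and the next extraction
from the same stream sees "12" again.  What to do with it at that point is
a separate question.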

Of course, once it has seeked (sought?) back there, you need to know 
what to do next.  I am finding it hard to think of what one might do.  
Skip to the next start delimiter?  You could do that without seeking 
first.  Try a different LR(n) parse-table production?  Maybe, but 
that's a bit obscure.

Nathan Myers
ncm@cantrip.org