
Hi Pedro,

Apologies for any sloppy formatting; mail client woes.

----- Original Message -----
From: <pedro.lamarao@mndfck.org>
To: <boost@lists.boost.org>
Sent: Wednesday, June 15, 2005 12:08 AM
Subject: Re: [boost] [Ann] socketstream library 0.7
This "buffering problem" is the problem that leads people to design protocols with fixed sizes everywhere.
Yes - exactly. Header (incl. length) plus body is a perfectly functional response to a need. But is it the best we can do?
To get to the point: I am currently reading blocks off network connections and presenting them to byte-by-byte lexer/parser routines. These form the structured network messages directly, i.e. the fields are already plucked out.
So which is better? Direct byte-by-byte network-to-message conversion, or multi-pass?
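(In case it helps to see the shape of it, here is a minimal sketch of the block-fed, byte-by-byte approach. The message_parser class is hypothetical, with a toy grammar; a real one would be a state machine plucking fields out as the bytes go by.)

    #include <sys/types.h>
    #include <sys/socket.h>

    class message_parser
    {
    public:
        // Toy rule for the sketch: a message ends at '\n'. A real
        // lexer/parser would recognize fields along the way.
        bool consume(unsigned char byte) { return byte == '\n'; }
    };

    void pump(int socket_descriptor, message_parser& parser)
    {
        unsigned char block[4096];
        for (;;)
        {
            ssize_t n = recv(socket_descriptor, block, sizeof block, 0);
            if (n <= 0)
                break;                  // error or connection closed
            for (ssize_t i = 0; i < n; ++i)
                if (parser.consume(block[i]))
                {
                    // complete, structured message available here;
                    // fields were already plucked out along the way
                }
        }
    }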
> If I understood you correctly, I might rephrase that to myself as: do we read the whole message before parsing, or are we parsing directly from the data source?
Yes. That's a reasonable paraphrasing.
> If we parse directly from the data source, we must analyze byte by byte, and so obtain byte by byte. If we want this, we will want a buffering layer to keep the number of system calls at a reasonable level.
> streambufs provide such a buffering layer, with IO operations suited to lexical analysis at that level: sgetc, snextc, sbumpc.
Yes, that's true. As are many of the points you make about streambufs (I didn't realize they were quite that flexible).
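For example, a CRLF-terminated line (the classic text-protocol token) can be scanned straight off any streambuf with just those members. A sketch, assuming nothing beyond std::streambuf itself:

    #include <streambuf>
    #include <string>

    std::string get_line(std::streambuf& sb)
    {
        std::string line;
        int c = sb.sbumpc();                        // read and advance
        while (c != std::char_traits<char>::eof())
        {
            if (c == '\r' && sb.sgetc() == '\n')    // peek, don't consume
            {
                sb.sbumpc();                        // swallow the '\n'
                break;
            }
            line += static_cast<char>(c);
            c = sb.sbumpc();
        }
        return line;
    }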
> If you remember that streambuf_iterators exist, and imagine a multi_pass iterator (hint, hint), many other interesting things come to mind.
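Indeed; the standard already hands us a single-pass iterator over a streambuf, and (if I take the hint correctly) Spirit's multi_pass wraps exactly this kind of iterator when a backtracking parser needs to re-read. A trivial single-pass example:

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <streambuf>

    // Count the ':' separators in whatever the streambuf yields.
    std::ptrdiff_t count_colons(std::streambuf& sb)
    {
        std::istreambuf_iterator<char> it(&sb);
        std::istreambuf_iterator<char> end;
        return std::count(it, end, ':');
    }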
> If we read the message completely beforehand, we must know how much we have to read, or we must inspect the data source in some way to watch for "end of message".
[snip]
> At this point, we have read the same number of bytes from the data source, in whatever order. But the number of calls made to the IO system service is not the same, and the fixed-size approach is more efficient in this regard.
> Also, the fixed-size approach solves the "buffering problem", since we do no resizing along the way. C++ people, blessed with std::vector, already have a mechanismo to do away with such weirdness; think about how you do it in C.
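(For concreteness, I read the fixed-size scheme you describe as roughly the following. A sketch only, assuming a 4-byte big-endian length prefix.)

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstddef>
    #include <vector>

    // recv() may return short counts, so loop until len bytes arrive.
    bool recv_exact(int fd, unsigned char* p, std::size_t len)
    {
        while (len != 0)
        {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)
                return false;
            p   += n;
            len -= n;
        }
        return true;
    }

    bool read_message(int fd, std::vector<unsigned char>& body)
    {
        unsigned char header[4];            // fixed-size length prefix
        if (!recv_exact(fd, header, sizeof header))
            return false;
        std::size_t length =
              (std::size_t(header[0]) << 24) | (std::size_t(header[1]) << 16)
            | (std::size_t(header[2]) << 8)  |  std::size_t(header[3]);
        body.resize(length);                // sized once, no regrowth
        return length == 0 || recv_exact(fd, &body[0], length);
    }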
Sorry, but there is such a gulf between our approaches that I'm not sure I can say anything to help clarify. As a last response, the best I can do is say this:

1. The difference (in terms of CPU time) between maintaining a counter and inspecting a "current byte", testing it for "end of message", seems minimal. This is stated relatively, i.e. it is far more significant that the bytes sent across the network are being scanned at the receiver more than once. Even maintaining the body counter is a (very low cost...) scan.

2. An approach using lex+parse techniques accepts raw byte blocks as input (convenient) and notifies the user, through some kind of accept/reduce return code, that the message is complete and already "broken apart", i.e. no further scanning is required by higher layers (see the sketch after this list).

3. Lex+parse techniques do not care about block lengths. An accept state or parser reduction can occur anywhere. All the "unget" contortions recently mentioned are not needed. Partial messages are retained on the parser stack and only finally assembled on accept/reduce. This property is much easier to live with than any kind of "fixed-size" approach that I have dealt with so far.
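Here is a sketch of the interface shape point 2 describes; every name is made up for illustration, and the "grammar" is a toy (accept at '\n'):

    #include <cstddef>

    struct parse_result
    {
        enum status { need_more, accepted };
        status      state;
        std::size_t consumed;       // bytes of the block actually used
    };

    class push_parser
    {
    public:
        // Feed an arbitrary block; block boundaries never matter, and
        // partial messages simply stay in the parser's state until an
        // accept/reduce finally assembles them.
        parse_result feed(const unsigned char* block, std::size_t length)
        {
            parse_result r;
            for (std::size_t i = 0; i < length; ++i)
                if (block[i] == '\n')           // accept/reduce point
                {
                    r.state = parse_result::accepted;
                    r.consumed = i + 1;
                    return r;
                }
            r.state = parse_result::need_more;
            r.consumed = length;
            return r;
        }
    };

    // Caller's side: hand over raw blocks as they arrive; higher
    // layers never rescan the bytes.
    void on_block(push_parser& p, const unsigned char* data, std::size_t len)
    {
        while (len != 0)
        {
            parse_result r = p.feed(data, len);
            if (r.state == parse_result::accepted)
            {
                // structured message ready here, already "broken apart"
            }
            data += r.consumed;
            len  -= r.consumed;
        }
    }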
> First: we, unfortunately, can't pass a std::vector to the operating system, so at some point we are allocating fixed-size buffers and passing them to our IO primitives. There is no escape.
Errrrr. Not quite following that. Are you saying that send(socket_descriptor, &vector_buffer[0], vector_buffer.size(), 0) is bad?
> If you are initializing the std::vector with the correct size and giving &*begin() to these primitives, well... why not allocate with new? If you are allocating it with whatever default size and resizing it later, you are losing part of the proposed benefit.
Hmmm. If you are saying this to strengthen your case for streambufs then I understand.
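For the record, the call I had in mind is legal: a vector's storage is contiguous, so &v[0] is a valid buffer pointer whenever the vector is non-empty. A sketch (POSIX send(); partial sends ignored for brevity):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <vector>

    void send_buffer(int socket_descriptor, const std::vector<char>& v)
    {
        if (!v.empty())
            send(socket_descriptor, &v[0], v.size(), 0);
    }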
> When we're about to throw a message to the network, how do we know what size it is? If our message is composed of, say, a string, another string and an int, are we going to call string::size() twice for every message dumped? Is the int representation fixed in size? Is this size enough for MAX_INT?
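(Just to make that bookkeeping concrete, the sender's side under length-prefix framing is roughly the following. The layout is invented for illustration: 4-byte big-endian length, NUL-separated strings, int rendered as text.)

    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <vector>

    void frame(std::vector<char>& out,
               const std::string& a, const std::string& b, long n)
    {
        char digits[32];
        int ilen = std::sprintf(digits, "%ld", n);  // int text: variable size
        std::size_t body = a.size() + 1 + b.size() + 1 + ilen;  // the size() calls
        out.push_back(char((body >> 24) & 0xFF));
        out.push_back(char((body >> 16) & 0xFF));
        out.push_back(char((body >>  8) & 0xFF));
        out.push_back(char( body        & 0xFF));
        out.insert(out.end(), a.begin(), a.end());
        out.push_back('\0');
        out.insert(out.end(), b.begin(), b.end());
        out.push_back('\0');
        out.insert(out.end(), digits, digits + ilen);
    }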
[snip large section]
> If you are on the Internet, you have very few guarantees. It's hell out here, sir.
Yes, you make some very good points. The product I am currently working on is a vipers' nest of the protocols you talk about, and more. There have been some unpleasant suggested uses for protocols such as IMAP4. Trying to build a generic network messaging library that facilitates clear, concise application protocols *and* can cope with the likes of IMAP4 is, I believe, unrealistic.

I didn't know I had a mechanismo until today. Feels great! :-)

Cheers,
Scott