
On Tue, Sep 14, 2004 at 09:42:15PM +1000, Christopher Kohlhoff wrote:
> Hi Aaron,
>
> In your wiki article you ask us to forget about the event handling patterns, however I think the choice is fundamental to the design and use of such a demultiplexing API. In your discussion you seem to imply a reactive model (i.e. "tell me when i can take action without blocking") when you talk about things like an on_file_readable event.
>
> In developing asio I chose to use the proactive model (i.e. "start an operation and tell me when it's done"), aka asynchronous I/O, because I saw it as providing numerous advantages.
>
> One thing I am able to do with a proactive model is provide a simple and consistent way of encapsulating and abstracting complex asynchronous operations. For example, the fundamental mechanism asio provides for asynchronously receiving data on a socket is the async_recv() member function:
>
>   class stream_socket
>   {
>     //...
>     template <typename Handler>
>     void async_recv(void* buf, size_t max_len, Handler h);
>     //...
>   };
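
For illustration only, a call to this interface might look roughly like the following. The handler signature here is just an assumption made for the sake of the example; it is not necessarily what asio really passes to the handler:

  #include <cstddef>   // size_t

  char buf[4096];

  // Hypothetical completion handler; the single size_t parameter is assumed.
  void handle_recv(std::size_t bytes_recvd)
  {
    // decode buf[0..bytes_recvd) here, then start the next async_recv
  }

  void start_receiving(stream_socket& s)
  {
    // "Start an operation and tell me when it's done."
    s.async_recv(buf, sizeof(buf), &handle_recv);
  }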
But this approach is not as flexible as the Reactor model. The reactor model will just read as much as possible from the socket every time (unless there is no more room in the buffer), allowing a decoder routine to decide how much data to consume when decoding it. No problems here. The proactive pattern, however, needs to specify upfront how much data it wants to read, otherwise there is no "I am done" event. In certain protocols that is simply not possible. Assume a TCP socket over a link that cuts the data into pieces of 200 bytes (small, I know, but it's just an example).

Protocol A: data comes in compressed chunks of 4 kb at a time.

Reactive pattern: read(2) is called with a request as big as the buffer (usually big enough), so it will read 200 bytes per call until one has 4 kb in total, at which point decoding can take place.

Proactive pattern: async_recv is called with a request for 4 kb. Internally read(2) will be called in mostly the same way as above, with the exception of the last packet (4 kb = 4 * 1024 = 4096 = 20 * 200 + 96). But I won't bitch about that unnecessarily cutting a natural packet into two :p.

Protocol B: data comes in binary messages that start with a 4 byte field that contains the total length of the message. Let's assume the buffer is empty and the next message turns out to be 300 bytes.

Reactive pattern: read(2) is called with a request as big as the buffer; it will return 200 bytes at first, of which a higher layer will decode the first 4 bytes. The next call to read(2) will again read 200 bytes, at which point there is enough to decode the first message.

Proactive pattern: async_recv is called with a request for 4 bytes. It is not possible to request more because we have no idea how large the message is going to be, and if we requested more than the size of the next message, and there wouldn't be more messages for a while, then we'd stall while there *is* something to decode. It calls read(2) with a size of 4 because it doesn't know if there is more room in the buffer. A higher layer will decode these 4 bytes and then call async_recv with a size of 300. Now read(2) is called with a size of 300, only returning 196 bytes though (there were only 200 bytes available and we already ate 4 bytes of that). Internally it will call read(2) again and, when the next packet comes in, consume only 104 bytes of that 200 byte packet, leaving the rest again in the socket buffer. At this point the message can be decoded. Slightly more inefficient than with the reactor pattern, but I am still not bitching.

Protocol C: a text protocol; the size of the messages is completely unknown - we only know that they end in \r\n, at which point we can start to think about decoding them.

Reactive pattern: read(2) is called with a request as big as the buffer; it will return chunks of 200 bytes until we find the first EOL sequence, at which point there is enough to decode the message.

Proactive pattern: Huh. How much are we going to read? Two bytes at a time? Or is it possible to tell this pattern: read at MOST 4096 bytes (the size of the buffer) but return when read(2) would block and we have at least 1 byte? If that is the way this pattern works, then what is the benefit over the reactor pattern? Because in most cases above, it would just have returned chunks of 200 bytes and the same decoding techniques would have been needed; a rough sketch of that reactor-style loop follows below.
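
To make the "read whatever is there and let the decoder decide" loop concrete, here is a rough sketch (just a sketch, not asio; decode_messages is a placeholder for whatever protocol-specific decoder - A, B or C - sits on top, and error handling is omitted):

  #include <unistd.h>    // read(2)
  #include <cstring>     // std::memmove
  #include <cstddef>

  // Protocol-specific placeholder: eats as many *complete* messages as it can
  // find in buf[0..len) (for protocol C, everything up to the last "\r\n")
  // and returns the number of bytes it consumed.
  size_t decode_messages(char const* buf, size_t len);

  // Called by the reactor when the socket becomes readable.
  void handle_readable(int fd, char* buf, size_t& len, size_t bufsize)
  {
    ssize_t n = read(fd, buf + len, bufsize - len);     // read whatever is there
    if (n <= 0)
      return;                                           // error/EOF handling omitted
    len += static_cast<size_t>(n);
    size_t consumed = decode_messages(buf, len);        // the decoder decides how much to eat
    std::memmove(buf, buf + consumed, len - consumed);  // keep the incomplete tail
    len -= consumed;
  }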
The only "advantage" that one can possibly think of is that the user can provide a buffer with a given size that DOES meet the need of the current protocol, but that assumes one knows the size of the messages in the protocol. But ok: the user controls the buffer (size). But, is that really an advantage? It is NOT namely when we want to use a stream buffer (boost.IOStreams) that never copies data (libcw and its dbstreambuf). Consider this case (you need a fixed font to view this): .----- contigious message block in stream buffer. / [ <-*-> MESSAGE SO FAR| ] ^ ^__ end of allocated memory |__ end of read data so far <-----------> \__ expected size of total (decodable message) Note that the expected size is necessary or else the proactive pattern loses all its benefits. The expected size goes over the edge of the allocated buffer and we don't want to reallocate it because that means copying the whole buffer. Therefore, we (need to) call read(2) with a requested size such that it precisely fills up this is block. Therefore, the expected size becomes meaningless: in this case we (the user!) would have to call async_recv with a buffer pointer just after the 'R' of "..FAR" and a size that cuts the message into two non-contigious parts anyway. Note that above, under 'Protocol C', we concluded that it is possible that async_recv returns after it only read until 'FAR', even while we requested more, because that is necessary or it won't even be *possible* to decode certain protocols. -- Carlo Wood <carlo@alinoe.com>