
On Tue, Sep 14, 2004 at 09:42:15PM +1000, Christopher Kohlhoff wrote:
> Hi Aaron,
>
> In your wiki article you ask us to forget about the event handling patterns, however I think the choice is fundamental to the design and use of such a demultiplexing API. In your discussion you seem to imply a reactive model (i.e. "tell me when i can take action without blocking") when you talk about things like an on_file_readable event.
>
> In developing asio I chose to use the proactive model (i.e. "start an operation and tell me when it's done"), aka asynchronous I/O, because I saw it as providing numerous advantages.
>
> One thing I am able to do with a proactive model is provide a simple and consistent way of encapsulating and abstracting complex asynchronous operations. For example, the fundamental mechanism asio provides for asynchronously receiving data on a socket is the async_recv() member function:
>
>   class stream_socket
>   {
>     //...
>     template <typename Handler>
>     void async_recv(void* buf, size_t max_len, Handler h);
>     //...
>   };
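
For illustration only, a call to this interface might look roughly like the following. The handler signature here is just an assumption made for the sake of the example; it is not necessarily what asio really passes to the handler:

  #include <cstddef>   // size_t

  char buf[4096];

  // Hypothetical completion handler; the single size_t parameter is assumed.
  void handle_recv(std::size_t bytes_recvd)
  {
    // decode buf[0..bytes_recvd) here, then start the next async_recv
  }

  void start_receiving(stream_socket& s)
  {
    // "Start an operation and tell me when it's done."
    s.async_recv(buf, sizeof(buf), &handle_recv);
  }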
But this approach is not as flexible as the Reactor model. The reactor model will just read as much as possible from the socket every time (unless there is no more room in the buffer), allowing a decoder routine to decide how much data to consume when decoding it. No problems here. The proactive pattern, however, needs to specify upfront how much data it wants to read, otherwise there is no "I am done" event. In certain protocols that is simply not possible. Assume a TCP socket over a link that cuts the data into pieces of 200 bytes (small, I know, but it's just an example).

Protocol A: data comes in compressed chunks of 4 kb at a time.

Reactive pattern: read(2) is called with a request as big as the buffer (usually big enough), so it will read 200 bytes per call until one has 4 kb in total, at which point decoding can take place.

Proactive pattern: async_recv is called with a request for 4 kb. Internally read(2) will be called in mostly the same way as above, with the exception of the last packet (4 kb = 4 * 1024 = 4096 = 20 * 200 + 96). But I won't bitch about that unnecessarily cutting a natural packet into two :p.

Protocol B: data comes in binary messages that start with a 4 byte field that contains the total length of the message. Let's assume the buffer is empty and the next message turns out to be 300 bytes.

Reactive pattern: read(2) is called with a request as big as the buffer; it will return 200 bytes at first, of which a higher layer will decode the first 4 bytes. The next call to read(2) will again read 200 bytes, at which point there is enough to decode the first message.

Proactive pattern: async_recv is called with a request for 4 bytes. It is not possible to request more because we have no idea how large the message is going to be, and if we requested more than the size of the next message, and there wouldn't be more messages for a while, then we'd stall while there *is* something to decode. It calls read(2) with a size of 4 because it doesn't know if there is more room in the buffer. A higher layer will decode these 4 bytes and then call async_recv with a size of 300. Now read(2) is called with a size of 300, only returning 196 bytes though (there were only 200 bytes available and we already ate 4 bytes of that). Internally it will call read(2) again and, when the next packet comes in, consume only 104 bytes of that 200 byte packet, leaving the rest again in the socket buffer. At this point the message can be decoded. Slightly more inefficient than with the reactor pattern, but I am still not bitching.

Protocol C: a text protocol; the size of the messages is completely unknown - we only know that they end in \r\n, at which point we can start to think about decoding them.

Reactive pattern: read(2) is called with a request as big as the buffer; it will return chunks of 200 bytes until we find the first EOL sequence, at which point there is enough to decode the message.

Proactive pattern: Huh. How much are we going to read? Two bytes at a time? Or is it possible to tell this pattern: read at MOST 4096 bytes (the size of the buffer) but return when read(2) would block and we have at least 1 byte? If that is the way this pattern works, then what is the benefit over the reactor pattern? Because in most cases above, it would just have returned chunks of 200 bytes and the same decoding techniques would have been needed; a rough sketch of that reactor-style loop follows below.
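
To make the "read whatever is there and let the decoder decide" loop concrete, here is a rough sketch (just a sketch, not asio; decode_messages is a placeholder for whatever protocol-specific decoder - A, B or C - sits on top, and error handling is omitted):

  #include <unistd.h>    // read(2)
  #include <cstring>     // std::memmove
  #include <cstddef>

  // Protocol-specific placeholder: eats as many *complete* messages as it can
  // find in buf[0..len) (for protocol C, everything up to the last "\r\n")
  // and returns the number of bytes it consumed.
  size_t decode_messages(char const* buf, size_t len);

  // Called by the reactor when the socket becomes readable.
  void handle_readable(int fd, char* buf, size_t& len, size_t bufsize)
  {
    ssize_t n = read(fd, buf + len, bufsize - len);     // read whatever is there
    if (n <= 0)
      return;                                           // error/EOF handling omitted
    len += static_cast<size_t>(n);
    size_t consumed = decode_messages(buf, len);        // the decoder decides how much to eat
    std::memmove(buf, buf + consumed, len - consumed);  // keep the incomplete tail
    len -= consumed;
  }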
The only "advantage" that one can possibly think of is that the user can provide a buffer with a given size that DOES meet the need of the current protocol, but that assumes one knows the size of the messages in the protocol. But ok: the user controls the buffer (size). But, is that really an advantage? It is NOT namely when we want to use a stream buffer (boost.IOStreams) that never copies data (libcw and its dbstreambuf). Consider this case (you need a fixed font to view this): .----- contigious message block in stream buffer. / [ <-*-> MESSAGE SO FAR| ] ^ ^__ end of allocated memory |__ end of read data so far <-----------> \__ expected size of total (decodable message) Note that the expected size is necessary or else the proactive pattern loses all its benefits. The expected size goes over the edge of the allocated buffer and we don't want to reallocate it because that means copying the whole buffer. Therefore, we (need to) call read(2) with a requested size such that it precisely fills up this is block. Therefore, the expected size becomes meaningless: in this case we (the user!) would have to call async_recv with a buffer pointer just after the 'R' of "..FAR" and a size that cuts the message into two non-contigious parts anyway. Note that above, under 'Protocol C', we concluded that it is possible that async_recv returns after it only read until 'FAR', even while we requested more, because that is necessary or it won't even be *possible* to decode certain protocols. -- Carlo Wood <carlo@alinoe.com>