[Iostreams] Buffering nonblocking I/O
I find myself wanting to extend Boost.Iostreams to support (at least a particular use case for) nonblocking I/O. I'd like to sketch a couple of concepts, hoping you will improve and generalize the notion. In particular, it's entirely possible that I don't yet understand Iostreams well enough: what I'm describing may well fit into existing Concepts better than I realize. Boost.Iostreams intends to provide more complete support for nonblocking I/O in future [0]. We can hope that a discussion around these ideas could help evolve the library in that direction.

First I want to acknowledge Alexander Nasonov's work from 2003 [1]. I haven't used or even looked at his "iostream-like pipes" library [2] because I'm dubious about registering with Yahoo! before being allowed to download it. (John Torjo seems to have had some trouble too. [3]) If it were posted somewhere more freely accessible, I'd be interested to examine it, because I think there's at least conceptual overlap.

Let's postulate a new Boost.Iostreams Concept to describe an object capable of buffering data between a producer and a consumer. It's not quite a Filter because, as I understand it, control is passed to a Filter at only one end. An InputFilter receives control at the consumer end; it forwards the call to its upstream Source. Conversely, an OutputFilter receives control at the producer end, forwarding the call to its downstream Sink. I'm talking about an object that receives control at both the producer and the consumer end, providing a buffer between them.

My first thought was to call this concept a Pipe because it bears a strong conceptual resemblance to OS pipes. But others in the aforementioned mail thread ([4], [5]) point out that using the word "pipe" in connection with Iostreams (or C++ streams in general) is too easily misinterpreted: one tends to assume a stream interface to a real OS pipe. Other plausible names are "buffer," "synchronization channel" [4] or "message queue." Talking about a "buffer stream" versus a "streambuf" could fairly quickly get confusing. I shy away from any form of "channel" simply because, in our own code base, the word "channel" is already overloaded to mean four or five vastly different things. Of course "message queue" already has another specific meaning too. [6] Until we can settle on a better name, as a placeholder, let's just call this a Queue. Naturally the producer end should model Sink, and the consumer end should model Source.

Why elevate this to a Concept? Why not just provide a class? Because different use cases suggest different implementations -- all of which could be plug-compatible. To the extent I understand it, Alexander's library [1] appears to be targeted at cross-thread message passing. It implicitly handles thread synchronization. He also mentions unlimited vs. limited capacity, that is, a bounded maximum size: when the buffer is full, a producer will block until a consumer has eaten some of the pending data. Unless I went off the rails somewhere, this would be a valid model of the Queue concept.

My use case targets an interactive program built around an event loop. I'm not passing data between different threads, so thread synchronization would be unnecessary overhead. Instead, on every iteration of the main loop I intend to poll a nonblocking source and a nonblocking sink. I want to use a couple of such Queue objects to buffer the data.
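To make the shape of this concrete, here is a rough, minimal sketch of the kind of thing I mean for the single-threaded, unbounded case. The names queue_buffer, queue_sink and queue_source are placeholders I just made up, not existing Boost.Iostreams components: the producer end models Sink, the consumer end models Source, and both share a plain character buffer.

    #include <boost/iostreams/categories.hpp>  // sink_tag, source_tag
    #include <algorithm>
    #include <deque>
    #include <ios>        // std::streamsize
    #include <memory>

    // Shared state: the pending characters plus an end-of-stream flag.
    struct queue_buffer
    {
        std::deque<char> data;
        bool closed = false;
    };

    // Producer end: a Device modeling Sink.
    class queue_sink
    {
    public:
        typedef char char_type;
        typedef boost::iostreams::sink_tag category;

        explicit queue_sink(std::shared_ptr<queue_buffer> q) : q_(q) {}

        std::streamsize write(const char* s, std::streamsize n)
        {
            q_->data.insert(q_->data.end(), s, s + n);  // unbounded: accept everything
            return n;
        }

    private:
        std::shared_ptr<queue_buffer> q_;
    };

    // Consumer end: a Device modeling Source.
    class queue_source
    {
    public:
        typedef char char_type;
        typedef boost::iostreams::source_tag category;

        explicit queue_source(std::shared_ptr<queue_buffer> q) : q_(q) {}

        std::streamsize read(char* s, std::streamsize n)
        {
            if (q_->data.empty())
                return q_->closed ? -1 : 0;  // -1: end of stream; 0: nothing yet
            std::streamsize take = std::min<std::streamsize>(n, q_->data.size());
            std::copy_n(q_->data.begin(), take, s);
            q_->data.erase(q_->data.begin(), q_->data.begin() + take);
            return take;
        }

    private:
        std::shared_ptr<queue_buffer> q_;
    };

Returning 0 from read is meant in the Iostreams nonblocking sense of "no data available just now"; a bounded or thread-synchronized Queue would change the buffer and write, not these two device interfaces.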
The application logic must be able to write data to a std::ostream. (If the producer end of a Queue models Sink, it should be straightforward to construct such an ostream based on the Queue.) Because the Sink underlying an ostream operation must model Blocking, I actually need an unbounded Queue for this. Then each iteration of the main loop will attempt to write all pending data to the nonblocking sink; data actually written can be consumed from the Queue. Unwritten data will be handled by subsequent iterations.

The input side is similar. Each iteration of the main loop will attempt to fill a temporary buffer from the nonblocking source; the data it obtains will be put into the other Queue. (This can be a bounded Queue: given a way to determine how much buffer space remains available, we can limit the size of the nonblocking read.) Eventually there will be "enough" data for the application logic to read from an istream attached to the consumer side of the Queue. That way we can guarantee that the application-level istream read operation can be completely satisfied without blocking. Determining "enough" is of course protocol-specific, as is the means of notifying the application logic. I would want such details to belong to a specific instance of the concept, perhaps a subclass of a provided base class, rather than to the Queue concept itself.

I've been speaking of a nonblocking "source" and "sink" (in lowercase) because, for me, these are actually APR I/O functions. But it occurs to me that we could provide adapters to existing Iostreams Source and Sink objects, using the Iostreams conventions for nonblocking I/O. One need only arrange to poll the adapters periodically. It would also be possible to implement a Queue based on an actual OS pipe.
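To show roughly how I imagine the output side fitting together, here is a hypothetical sketch of the main loop, building on the queue_sink/queue_source sketch above. nonblocking_write is a stand-in for the APR-level (or other OS-level) nonblocking write, not a real API; here it just dumps everything to stdout so the sketch runs.

    #include <boost/iostreams/stream.hpp>
    #include <cstdio>
    #include <memory>
    #include <ostream>
    #include <vector>

    // Stand-in for a nonblocking write: returns however many bytes the sink
    // accepted this time around.
    std::streamsize nonblocking_write(const char* s, std::streamsize n)
    {
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), stdout));
    }

    int main()
    {
        std::shared_ptr<queue_buffer> out_buf = std::make_shared<queue_buffer>();

        // The application writes to a std::ostream backed by the Queue's producer end.
        queue_sink producer(out_buf);
        boost::iostreams::stream<queue_sink> app_out(producer);

        for (int i = 0; i < 3; ++i)   // stand-in for the real event loop
        {
            // ... application logic runs and writes whatever it likes ...
            app_out << "iteration " << i << '\n' << std::flush;

            // Drain as much pending data as the nonblocking sink will take;
            // whatever it refuses stays queued for later iterations.
            if (!out_buf->data.empty())
            {
                std::vector<char> chunk(out_buf->data.begin(), out_buf->data.end());
                std::streamsize written = nonblocking_write(chunk.data(), chunk.size());
                out_buf->data.erase(out_buf->data.begin(), out_buf->data.begin() + written);
            }

            // The input side would mirror this: read what the nonblocking source
            // offers into a second (bounded) Queue, and notify the application
            // once "enough" has accumulated to satisfy an istream read.
        }
    }

This is only meant to illustrate the shape of the thing; the real loop would poll with APR, handle short and failed writes, and run the mirrored input side as well.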
Does this avenue seem worth pursuing? I'm going to build something like this anyway, but (a) I hope you can help me improve the abstractions and (b) it may be worth trying to Boostify the code for possible contribution.

[0] http://www.boost.org/doc/libs/1_48_0/libs/iostreams/doc/guide/asynchronous.h...
[1] http://lists.boost.org/Archives/boost/2003/08/51289.php
[2] http://groups.yahoo.com/group/boost/files/pipes.zip
[3] http://lists.boost.org/Archives/boost/2003/08/51310.php
[4] http://lists.boost.org/Archives/boost/2003/08/51405.php
[5] http://lists.boost.org/Archives/boost/2003/08/51448.php
[6] http://www.amqp.org/
Nat Linden