
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I. The Problem
---------------------------
Standard iostreams do not work well with non-blocking or asynchronous i/o. I would eventually like to extend the library to provide support for non-blocking and async i/o, and when I do so I expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced.
There's a difference between making the concepts work unchanged and making the components of the library work unchanged.
II. The Solution (the easy part)
---------------------------
I believe it will suffice to:
- Provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink, even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.
Reasonable notions.
This is easily achieved for put() and write(), and almost as easily for read():
- Instead of returning void, put() can return a bool indicating whether the given character was successfully written.
Clean enough.
- Instead of returning void, write() can return an integer indicating the number of characters written.
But it also needs to indicate errors.
- Currently, when read returns fewer characters than the requested amount it is treated as an EOF indication. Instead, we can allow read to return the actual number of characters read, and reserve -1 to indicate EOF, since it is not needed as an error indication.
Clean enough.
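To make the three return conventions above concrete, here is a minimal sketch. The types bounded_sink and trickle_source, and the namespace sketch, are hypothetical stand-ins invented for illustration and are not part of the library. The sketch assumes that read() returning 0 means "nothing available right now", and it does not address how write() would report errors, which is the concern raised in the comment above.

```cpp
#include <algorithm>
#include <cstddef>
#include <ios>
#include <string>

namespace sketch {

// Hypothetical sink that can hold at most `capacity` characters in total.
struct bounded_sink {
    std::string data;
    std::size_t capacity;
};

// put(): true if the character was written, false if the sink
// cannot accept it right now (not an error).
bool put(bounded_sink& s, char c) {
    if (s.data.size() >= s.capacity) return false;
    s.data.push_back(c);
    return true;
}

// write(): returns how many of the n characters were actually written.
std::streamsize write(bounded_sink& s, const char* p, std::streamsize n) {
    std::streamsize room =
        static_cast<std::streamsize>(s.capacity - s.data.size());
    std::streamsize count = std::min(n, room);
    s.data.append(p, static_cast<std::size_t>(count));
    return count;
}

// Hypothetical source: `data` is everything it will ever produce,
// `ready` is how many characters are available at this moment.
struct trickle_source {
    std::string data;
    std::size_t pos;
    std::size_t ready;
};

// read(): returns the number of characters read (possibly 0 when
// nothing is available yet), or -1 once the end has been reached.
std::streamsize read(trickle_source& s, char* buf, std::streamsize n) {
    if (s.pos == s.data.size()) return -1;  // EOF
    std::size_t count = std::min(std::min(s.ready, s.data.size() - s.pos),
                                 static_cast<std::size_t>(n));
    s.data.copy(buf, count, s.pos);
    s.pos += count;
    s.ready -= count;
    return static_cast<std::streamsize>(count);
}

} // namespace sketch
```

A caller that gets a short count from write(), or 0 from read(), would simply retry later with the remaining data.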
III. The Solution (the ugly part)
---------------------------
The function get presents more of a challenge. Currently it looks like this (for char_type == char):
    struct my_input_filter : input_filter {
        template<typename Source>
        int get(Source& src);
    };
The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get:
    1. A character is successfully retrieved.
    2. The end of the stream has been reached.
    3. No characters are currently available, but more may be available later.
Right.
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
    template<typename Ch>
    class basic_character {
    public:
        basic_character(Ch c);
        operator Ch () const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };

    typedef basic_character<char>    character;
    typedef basic_character<wchar_t> wcharacter;

    character eof();   // returns an EOF indication
    character fail();  // returns a temporary failure indication.

    wcharacter weof();
    wcharacter wfail();
[Omitted: templated versions of eof and fail]
OK.
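For reference, one way the declaration above could be fleshed out is sketched below. The internal state enum, the private constructor, the make_eof()/make_fail() helpers and the namespace sketch are all assumptions made for illustration; the proposal only gives the public interface, so this should not be read as the intended implementation.

```cpp
namespace sketch {

template<typename Ch>
class basic_character {
public:
    // A successfully retrieved character.
    basic_character(Ch c) : ch_(c), state_(state_good) {}

    // Only meaningful when good() is true.
    operator Ch() const { return ch_; }

    bool good() const { return state_ == state_good; }
    bool eof()  const { return state_ == state_eof; }
    bool fail() const { return state_ == state_fail; }

    // Hypothetical helpers used by the free functions below.
    static basic_character make_eof()  { return basic_character(state_eof); }
    static basic_character make_fail() { return basic_character(state_fail); }

private:
    enum state { state_good, state_eof, state_fail };

    explicit basic_character(state s) : ch_(), state_(s) {}

    Ch    ch_;
    state state_;
};

typedef basic_character<char>    character;
typedef basic_character<wchar_t> wcharacter;

inline character  eof()   { return character::make_eof(); }    // EOF indication
inline character  fail()  { return character::make_fail(); }   // temporary failure
inline wcharacter weof()  { return wcharacter::make_eof(); }
inline wcharacter wfail() { return wcharacter::make_fail(); }

} // namespace sketch
```

Keeping the state separate from the character value means no value of Ch has to be sacrificed as a sentinel, at the cost of a slightly larger return type.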
IV. Examples (feel free to skip)
---------------------------
[snipped async-enabled examples grow from synchronous versions]
V. Problems
----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
The class template is hardly complicated. I can't imagine it would be a show stopper, though it does add some complexity.
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
That's a real issue.
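To give a feel for the extra checking involved, here is a rough sketch of a simple filter written against a three-state return value. It is not one of the snipped examples. The character class is a cut-down stand-in for the proposed basic_character<char>, and test_source, toupper_filter, the free get() and the namespace sketch are hypothetical names invented for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {

// Cut-down stand-in for the proposed basic_character<char>.
class character {
public:
    character(char c) : ch_(c), state_(is_good) {}
    operator char() const { return ch_; }
    bool good() const { return state_ == is_good; }
    bool eof()  const { return state_ == is_eof; }
    bool fail() const { return state_ == is_fail; }
    static character make_eof()  { return character(is_eof); }
    static character make_fail() { return character(is_fail); }
private:
    enum state { is_good, is_eof, is_fail };
    explicit character(state s) : ch_(), state_(s) {}
    char  ch_;
    state state_;
};

// Hypothetical source: hands out characters from `data`, but only
// `ready` of them are available at the moment.
struct test_source {
    std::string data;
    std::size_t pos;
    std::size_t ready;
};

// Free get(): a character, EOF, or "nothing available right now".
character get(test_source& src) {
    if (src.pos == src.data.size()) return character::make_eof();
    if (src.ready == 0) return character::make_fail();
    --src.ready;
    return character(src.data[src.pos++]);
}

// A simple filter: uppercase every character it passes along.
struct toupper_filter {
    template<typename Source>
    character get(Source& src) {
        character c = sketch::get(src);
        if (!c.good())
            return c;  // pass EOF or temporary failure through unchanged
        unsigned char u = static_cast<unsigned char>(static_cast<char>(c));
        return character(static_cast<char>(std::toupper(u)));
    }
};

} // namespace sketch
```

For a one-character-at-a-time filter like this the cost is a single extra test. A filter that needs several characters before it can produce output would presumably also have to remember what it has consumed so far when it sees a temporary failure, which is where the added burden would show up.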
3. Performance. It's possible that the change will have a negative effect on performance. I was planning to implement it and then perform careful measurements, but I have run out of time for this. I think the effect will be slight.
I'd expect the impact to be small, but quantifying it would be helpful.
VI. Benefits
------------------------
A positive side-effect of this change would be that I can rename the filter concepts
    InputFilter  --> PullFilter
    OutputFilter --> PushFilter
and allow both types of filter to be added either to input or to output streams. Filter writers could then choose the filter concept which best expressed the filtering algorithm without worrying whether it will be used for input or output.
Nice.
VII. Alternatives.
1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.
Definitely not a good idea.
2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.
I like this better, but I wonder if there is a unified approach within the library that still keeps things tidy for those writing synchronous code. Could the library components recognize whether a filter was written using char/wchar_t versus basic_character and deal synchronously or asynchronously as a result?

That way, those writing synchronous code can write it using a character type of char/wchar_t and everything to/from that code will be in the simplified, synchronous style you currently offer. However, if the client takes advantage of the advanced, asynchronous capabilities of the library by using basic_character, then the style changes based upon needing to deal with the EAGAIN condition.

To keep the two styles as similar as possible, you might need to alter the current interfaces slightly, but probably not much (pure, abstract speculation; I haven't even tried to validate that assertion).

(The member functions good(), eof(), and fail() on basic_character could be made non-member functions and could be implemented for char and wchar_t. That would help code deal with synchronous code in the asynchronous style.)

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;