
Hi All,

Several important extensions of the Iostreams library have been on hold for over a month while I have tried to resolve the issue described in this message. Unfortunately it didn't get much attention during the review, so I'm hoping I can generate some discussion now.

I'm sorry about the length of this message; I've been stuck on this for a long time and would really appreciate some help.

I. The Problem
---------------------------

Standard iostreams do not work well with non-blocking or asynchronous i/o. I would eventually like to extend the library to provide support for non-blocking and async i/o, and when I do so I expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced. There are several reasons for this:

1. Proper isolation of concepts. A filter represents a rule for transforming character sequences; ideally, how the sequence is accessed should not be relevant. For example, it would be silly to require separate versions of a toupper_filter for blocking and non-blocking i/o, since they would both represent the same simple rule.

2. Maximal code reuse. While it would just be silly to require several versions of a toupper_filter, it would be extremely wasteful to require several versions of more complex components like compression or encryption filters.

3. Reduced complexity of the library. The library already has a large number of concepts; I don't want to double or triple the number of filter concepts when non-blocking and async i/o is introduced.

II. The Solution (the easy part)
---------------------------

I believe it will suffice to:

- Provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink, even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.

This is easily achieved for put() and write(), and almost as easily for read():

- Instead of returning void, put() can return a bool indicating whether the given character was successfully written.
- Instead of returning void, write() can return an integer indicating the number of characters written.
- Currently, when read() returns fewer characters than the requested amount it is treated as an EOF indication. Instead, we can allow read() to return the actual number of characters read, and reserve -1 to indicate EOF, since it is not needed as an error indication.

III. The Solution (the ugly part)
---------------------------

The function get() presents more of a challenge. Currently it looks like this (for char_type == char):

    struct my_input_filter : input_filter {
        template<typename Source>
        int get(Source& src);
    };

The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get():

1. A character is successfully retrieved.
2. The end of the stream has been reached.
3. No characters are currently available, but more may be available later.
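For read(), these three outcomes already fit into the integer return value proposed in section II, so no new type is needed there. A rough sketch of how a caller might tell them apart, assuming a hypothetical in-memory source (chunked_source and drain are made up for illustration and are not part of the library) and assuming that a return value of 0 is what signals "nothing available yet":

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical source, for illustration only. It hands out at most
// `chunk` characters per call and reports "nothing available right now"
// before each successful read, imitating a non-blocking device.
class chunked_source {
public:
    chunked_source(const std::string& data, std::size_t chunk)
        : data_(data), pos_(0), chunk_(chunk), stall_(true) { }

    // Returns the number of characters read, 0 if none are currently
    // available (assumed convention), or -1 at end of stream.
    std::ptrdiff_t read(char* s, std::ptrdiff_t n)
    {
        assert(n > 0);
        if (pos_ == data_.size())
            return -1;                       // EOF
        if (stall_) {
            stall_ = false;
            return 0;                        // try again later
        }
        stall_ = true;
        std::size_t avail = std::min(chunk_, data_.size() - pos_);
        std::size_t count = std::min(avail, static_cast<std::size_t>(n));
        data_.copy(s, count, pos_);          // copy out the next chunk
        pos_ += count;
        return static_cast<std::ptrdiff_t>(count);
    }

private:
    std::string data_;
    std::size_t pos_;
    std::size_t chunk_;
    bool        stall_;
};

// Reads until EOF, counting how often the source had nothing to offer.
// A real caller would wait or yield instead of retrying immediately.
template<typename Source>
std::string drain(Source& src, int& retries)
{
    std::string result;
    char buf[16];
    retries = 0;
    for (;;) {
        std::ptrdiff_t n = src.read(buf, sizeof(buf));
        if (n == -1)
            break;                           // end of stream
        if (n == 0) {
            ++retries;                       // temporarily unavailable
            continue;
        }
        result.append(buf, static_cast<std::size_t>(n));
    }
    return result;
}
```

A short count is no longer mistaken for EOF here; only -1 ends the loop. The single-character get() has no spare return value to play the role of 0, which is why it needs separate treatment.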
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:

    template<typename Ch>
    class basic_character {
    public:
        basic_character(Ch c);
        operator Ch () const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };

    typedef basic_character<char>     character;
    typedef basic_character<wchar_t>  wcharacter;

    character eof();   // returns an EOF indication
    character fail();  // returns a temporary failure indication

    wcharacter weof();
    wcharacter wfail();

    [Omitted: templated versions of eof and fail]

Alternatively, the member functions good, eof and fail could be made non-member functions taking a basic_character.

IV. Examples (feel free to skip)
---------------------------

With these changes, the uncommenting_input_filter (http://tinyurl.com/3ue9r) could be rewritten as follows:

    class uncommenting_input_filter : public input_filter {
    public:
        explicit uncommenting_input_filter(char comment_char = '#')
            : comment_char_(comment_char)
            { }

        template<typename Source>
        character get(Source& src)
        {
            character c = boost::io::get(src);
            // Skip from the comment character to the end of the line.
            if (c.good() && c == comment_char_)
                while (c.good() && c != '\n')
                    c = boost::io::get(src);
            return c;
        }
    private:
        char comment_char_;
    };

Similarly, usenet_filter::get (http://tinyurl.com/6xqvk) could be rewritten:

    template<typename Source>
    character get(Source& src)
    {
        // Handle unfinished business.
        if (eof_)
            return eof();
        if (off_ < current_word_.size())
            return current_word_[off_++];

        // Compute current word.
        current_word_.clear();
        while (true) {
            character c;
            if (!(c = boost::io::get(src)).good()) {
                // EOF or temporary failure.
                if (c.eof())
                    eof_ = true;
                if (current_word_.empty())
                    return c;
                else
                    break;
            } else if (isalpha((unsigned char) c)) {
                current_word_.push_back(c);
            } else {
                // Look up current word in dictionary.
                map_type::iterator it = dictionary_.find(current_word_);
                if (it != dictionary_.end())
                    current_word_ = it->second;
                current_word_.push_back(c);
                off_ = 0;
                break;
            }
        }

        return this->get(src); // Note: current_word_ is not empty.
    }

V. Problems
----------------------------

1. Harder to learn. Currently the function get() and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.

2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get(), but to the other functions as well, since their return values will require more careful examination.

3. Performance. It's possible that the change will have a negative effect on performance. I was planning to implement it and then perform careful measurements, but I have run out of time for this. I think the effect will be slight.

VI. Benefits
------------------------

A positive side-effect of this change would be that I can rename the filter concepts

    InputFilter  --> PullFilter
    OutputFilter --> PushFilter

and allow both types of filter to be added either to input or to output streams. Filter writers could then choose the filter concept which best expresses the filtering algorithm without worrying whether it will be used for input or output.

VII. Alternatives
------------------------

1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.

2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.

If you've made it this far THANK YOU!!! Please let me know your opinion.

Jonathan
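P.S. For anyone who wants to experiment with the idea from section III, here is a minimal self-contained sketch of how basic_character might be implemented and used. Only the public interface comes from the proposal above; the state enum, the make_eof/make_fail helpers, the sketch namespace, stalling_source and this toupper_filter are assumptions made up for illustration, and the source is called through a member get() rather than through boost::io::get to keep the example standalone.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {

template<typename Ch>
class basic_character {
public:
    basic_character(Ch c) : c_(c), state_(is_good) { }
    operator Ch () const { assert(good()); return c_; }
    bool good() const { return state_ == is_good; }
    bool eof()  const { return state_ == is_eof; }
    bool fail() const { return state_ == is_fail; }

    // Assumed helpers; the proposal only names the free functions below.
    static basic_character make_eof()  { return basic_character(is_eof); }
    static basic_character make_fail() { return basic_character(is_fail); }
private:
    enum state { is_good, is_eof, is_fail };
    explicit basic_character(state s) : c_(), state_(s) { }
    Ch    c_;
    state state_;
};

typedef basic_character<char> character;

inline character eof()  { return character::make_eof(); }   // EOF indication
inline character fail() { return character::make_fail(); }  // temporary failure

// Hypothetical source: reports a temporary failure before each character.
class stalling_source {
public:
    explicit stalling_source(const std::string& data)
        : data_(data), pos_(0), stall_(true) { }

    character get()
    {
        if (pos_ == data_.size())
            return eof();
        if (stall_) {
            stall_ = false;
            return fail();
        }
        stall_ = true;
        return data_[pos_++];
    }
private:
    std::string data_;
    std::size_t pos_;
    bool        stall_;
};

// The same filter code serves blocking and non-blocking sources: it
// transforms good characters and passes eof/fail through unchanged.
struct toupper_filter {
    template<typename Source>
    character get(Source& src)
    {
        character c = src.get();
        if (!c.good())
            return c;
        return character(static_cast<char>(std::toupper((unsigned char) c)));
    }
};

} // namespace sketch
```

A caller would loop on get(), stop when the result is eof(), and wait or retry when it is fail(). The filter itself never needs to know which kind of source it is reading from, which is the property I'm after.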