[iostreams] Major interface changes planned -- comments requested

Hi All,

Several important extensions of the Iostreams library have been on hold for over a month while I have tried to resolve the issue described in this message. Unfortunately it didn't get much attention during the review, so I'm hoping I can generate some discussion now.

I'm sorry about the length of this message; I've been stuck on this for a long time and would really appreciate some help.

I. The Problem
---------------------------

Standard iostreams do not work well with non-blocking or asynchronous i/o. I would eventually like to extend the library to provide support for non-blocking and async i/o, and when I do so I expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced. There are several reasons for this:

1. Proper isolation of concepts. A filter represents a rule for transforming character sequences; ideally, how the sequence is accessed should not be relevant. For example, it would be silly to require separate versions of a toupper_filter for blocking and non-blocking i/o, since they would both represent the same simple rule.

2. Maximal code reuse. While it would just be silly to require several versions of a toupper_filter, it would be extremely wasteful to require several versions of more complex components like compression or encryption filters.

3. Reduced complexity of the library. The library already has a large number of concepts; I don't want to double or triple the number of filter concepts when non-blocking and async i/o are introduced.

II. The Solution (the easy part)
---------------------------

I believe it will suffice to:

- Provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink even though no error has occurred.

- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.

This is easily achieved for put() and write(), and almost as easily for read():

- Instead of returning void, put() can return a bool indicating whether the given character was successfully written.

- Instead of returning void, write() can return an integer indicating the number of characters written.

- Currently, when read returns fewer characters than the requested amount it is treated as an EOF indication. Instead, we can allow read to return the actual number of characters read, and reserve -1 to indicate EOF, since it is not needed as an error indication.

III. The Solution (the ugly part)
---------------------------

The function get presents more of a challenge. Currently it looks like this (for char_type == char):

    struct my_input_filter : input_filter {
        template<typename Source>
        int get(Source& src);
    };

The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get:

1. A character is successfully retrieved.
2. The end of the stream has been reached.
3. No characters are currently available, but more may be available later.

My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:

    template<typename Ch>
    class basic_character {
    public:
        basic_character(Ch c);
        operator Ch () const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };

    typedef basic_character<char>    character;
    typedef basic_character<wchar_t> wcharacter;

    character eof();   // returns an EOF indication
    character fail();  // returns a temporary failure indication.

    wcharacter weof();
    wcharacter wfail();

    [Omitted: templated versions of eof and fail]

Alternatively, the member functions good, eof and fail could be made non-member functions taking a basic_character.

IV. Examples (feel free to skip)
---------------------------

With these changes, the uncommenting_input_filter (http://tinyurl.com/3ue9r) could be rewritten as follows:

    class uncommenting_input_filter : public input_filter {
    public:
        explicit uncommenting_input_filter(char comment_char = '#')
            : comment_char_(comment_char)
            { }

        template<typename Source>
        character get(Source& src)
        {
            character c = boost::io::get(src);
            if (c.good() && c == comment_char_)
                while (c.good() && c != '\n')
                    c = boost::io::get(src);
            return c;
        }
    private:
        char comment_char_;
    };

Similarly, usenet_filter::get (http://tinyurl.com/6xqvk) could be rewritten:

    template<typename Source>
    character get(Source& src)
    {
        // Handle unfinished business.
        if (eof_)
            return eof();
        if (off_ < current_word_.size())
            return current_word_[off_++];

        // Compute current word.
        current_word_.clear();
        while (true) {
            character c = boost::io::get(src);
            if (!c.good()) {
                if (c.eof())
                    eof_ = true;
                if (current_word_.empty())
                    return c;
                else
                    break;
            } else if (isalpha((unsigned char) c)) {
                current_word_.push_back(c);
            } else {

                // Look up current word in dictionary.
                map_type::iterator it = dictionary_.find(current_word_);
                if (it != dictionary_.end())
                    current_word_ = it->second;
                current_word_.push_back(c);
                off_ = 0;
                break;
            }
        }

        return this->get(src); // Note: current_word_ is not empty.
    }

V. Problems
----------------------------

1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.

2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.

3. Performance. It's possible that the change will have a negative effect on performance. I was planning to implement it and then perform careful measurements, but I have run out of time for this. I think the effect will be slight.

VI. Benefits
------------------------

A positive side-effect of this change would be that I can rename the filter concepts

    InputFilter  --> PullFilter
    OutputFilter --> PushFilter

and allow both types of filter to be added either to input or to output streams. Filter writers could then choose the filter concept which best expressed the filtering algorithm without worrying whether it will be used for input or output.

VII. Alternatives.

1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.

2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.

If you've made it this far THANK YOU!!! Please let me know your opinion.

Jonathan
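
For readers who want something concrete to poke at, here is a minimal sketch of how the basic_character synopsis in section III might be filled in. It is only an illustration under stated assumptions: the `sketch` namespace, the private `state_type` enum and the `make_eof`/`make_fail` helpers are hypothetical names introduced here, not part of the proposal or of the library.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not part of the library

template<typename Ch>
class basic_character {
public:
    basic_character(Ch c) : ch_(c), state_(is_good) { }

    // Hypothetical named constructors for the two non-character results.
    static basic_character make_eof()  { return basic_character(is_eof); }
    static basic_character make_fail() { return basic_character(is_fail); }

    // Only meaningful when good(); the assertion documents that precondition.
    operator Ch() const { assert(good()); return ch_; }

    bool good() const { return state_ == is_good; }
    bool eof()  const { return state_ == is_eof; }
    bool fail() const { return state_ == is_fail; }

private:
    enum state_type { is_good, is_eof, is_fail };
    explicit basic_character(state_type s) : ch_(), state_(s) { }

    Ch         ch_;
    state_type state_;
};

typedef basic_character<char>    character;
typedef basic_character<wchar_t> wcharacter;

inline character  eof()   { return character::make_eof(); }   // EOF indication
inline character  fail()  { return character::make_fail(); }  // temporary failure
inline wcharacter weof()  { return wcharacter::make_eof(); }
inline wcharacter wfail() { return wcharacter::make_fail(); }

} // namespace sketch
```

With this shape, a filter can test `c.good()` once and then use `c` wherever a plain character is expected, which is the usage pattern the examples in section IV rely on.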

Hi Jonathan,

this is just a quick comment:

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:d062af$h9l$1@sea.gmane.org...
| Hi All,

| III. The Solution (the ugly part)
| ---------------------------
|
| The function get presents more of a challenge. Currently it looks like this (for
| char_type == char):
|
|     struct my_input_filter : input_filter {
|         template<typename Source>
|         int get(Source& src);
|     };
|
| The return type already serves a dual purpose: it can store a character or an
| EOF indication. Unfortunately, with non-blocking or async i/o there are now
| three possible results of a call to get:
|
| 1. A character is successfully retrieved.
| 2. The end of the stream has been reached.
| 3. No characters are currently available, but more may be available later.

Isn't this a job for boost::optional<int>?

-Thorsten

Thorsten Ottosen wrote:
The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get:
1. A character is successfully retrieved.
2. The end of the stream has been reached.
3. No characters are currently available, but more may be available later.
isn't this a job for boost::optional<int> ?
Actually,

    template<typename Source>
    int get(Source&);

should really be

    template<typename Source>
    optional<char> get(Source&);

so maybe I should use optional< optional<char> > ;-)

Seriously, I think the trouble with this suggestion is that EOF and EAGAIN really deserve equal treatment, whereas with optional<int>, the former would be represented as part of the int and the latter would be represented as the absence of an int. If I use optional<int> (which I don't dismiss), there will be lots of tests like

    if (c && c.get() != EOF)

where otherwise you would have

    if (c.good())

Jonathan
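
To make the asymmetry concrete, here is a small sketch of the three-way test under the optional<int> convention. It uses std::optional (C++17) as a stand-in for boost::optional, and the names `maybe_char`, `is_good`, `is_eof`, `would_block` and `to_char` are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstdio>    // EOF
#include <optional>  // std::optional (C++17), standing in for boost::optional

namespace sketch {  // hypothetical namespace

// Assumed convention: an empty optional means "would block", the value EOF
// means end of stream, and any other value is a character.
typedef std::optional<int> maybe_char;

// EOF lives inside the int, so the "good" test needs two steps.
inline bool is_good(const maybe_char& c)     { return c && *c != EOF; }
inline bool is_eof(const maybe_char& c)      { return c && *c == EOF; }

// "Would block" is the absence of a value, a different kind of test.
inline bool would_block(const maybe_char& c) { return !c; }

inline char to_char(const maybe_char& c)
{
    assert(is_good(c));
    return static_cast<char>(*c);
}

} // namespace sketch
```

The two exceptional results end up being detected by different mechanisms, which is the unequal treatment described above; wrapping the tests in helpers like these hides it but does not remove it.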

I think that:

Keeping the blocking io situation simple should take priority over providing non-blocking io.

Alternative 1 is a non-starter.

Alternative 2 is the way to go given that non-blocking filters will work without compromise in blocking situations.

Keith Burton

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jonathan Turkanis
Sent: 03 March 2005 04:04
To: boost@lists.boost.org
Subject: [boost] [iostreams] Major interface changes planned -- comments requested

[snip]

VII. Alternatives.

1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.

2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.

If you've made it this far THANK YOU!!! Please let me know your opinion.

Jonathan

Keith Burton wrote:
I think that : Keeping blocking io situation simple should take priority over providing non-blocking io. Alternative 1 is a non-starter. Alternative 2 is the way to go given that non-blocking filters will work without compromise in blocking situations.
Okay, thanks. Obviously I am very sympathetic to this view, or I would have implemented my "preferred" solution a long time ago.

Jonathan

Jonathan Turkanis wrote: [...]
II. The Solution (the easy part) ---------------------------
I believe it will suffice to:
- provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.
[...]

I haven't looked at the library in detail (but I have developed something similar).

In my opinion, get and put are non-essential functions, syntactic sugar for read and write. put(c) is just write(&c, 1); get() is read(&tmp, 1); return tmp. I wouldn't worry much about the "proper" interface of put and get, and I wouldn't require filters to implement them. A get() returning -1 for EOF and -2 for "no input available" should be good enough for character-at-a-time filters, which are mostly of the form "return xform( get() );".

In fact, a character-by-character filter is never the proper way to do things; the canonical form of the above should be

    int r = read(buffer, size);
    if( r > 0 ) std::transform(buffer, buffer+r, buffer, xform);
    return r;

so I'm not sure whether get/put should ever be used at all.

But as I said, I haven't looked at the library in detail, so I may be missing something.
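
As a rough illustration of the two points above, the sketch below expresses get() as a one-character read() using the -1/-2 convention, next to the buffer-at-a-time form. Everything here is an assumption made for the example: `string_source` is a toy source, and `eof_code`, `again_code`, `read_transformed` and `to_upper` are hypothetical names, not library API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace

const int eof_code   = -1;  // assumed: end of stream
const int again_code = -2;  // assumed: no input available right now

// Toy source over a string. read() returns the number of characters copied,
// or eof_code once the data is exhausted.
class string_source {
public:
    explicit string_source(const std::string& s) : data_(s), pos_(0) { }
    int read(char* s, int n)
    {
        assert(n >= 0);
        if (pos_ == data_.size()) return eof_code;
        int amt = 0;
        while (amt < n && pos_ < data_.size()) s[amt++] = data_[pos_++];
        return amt;
    }
private:
    std::string data_;
    std::size_t pos_;
};

// get() as sugar for a one-character read().
template<typename Source>
int get(Source& src)
{
    char tmp;
    int amt = src.read(&tmp, 1);
    if (amt == 1) return static_cast<unsigned char>(tmp);
    return amt == eof_code ? eof_code : again_code;
}

inline char to_upper(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// The buffer-at-a-time equivalent of "return xform( get() );".
template<typename Source, typename Xform>
int read_transformed(Source& src, char* buffer, int size, Xform xform)
{
    int r = src.read(buffer, size);
    if (r > 0) std::transform(buffer, buffer + r, buffer, xform);
    return r;
}

} // namespace sketch
```

The buffered form only stays this short because the transformation is stateless and one-to-one; filters that drop or insert characters need more bookkeeping.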

Peter Dimov wrote:
I haven't looked at the library in detail (but I have developed something similar).
In my opinion, get and put are non-essential functions, syntactic sugar for read and write. put(c) is just write(&c, 1); get() is read(&tmp, 1); return tmp. I wouldn't worry much about the "proper" interface of put and get, and I wouldn't require filters to implement them. A get() returning -1 for EOF and -2 for "no input available" should be good enough for character-at-a-time filters, which are mostly of the form "return xform( get() );".
In fact, a character-by-character filter is never the proper way to do things; the canonical form of the above should be
int r = read(buffer, size);
if( r > 0 ) std::transform(buffer, buffer+r, buffer, xform);
return r;
so I'm not sure whether get/put should ever be used at all.
Filters with member functions get() and put() are allowed for convenience; users are encouraged to write multi-character filters which implement read() and/or write(). You might think it's completely trivial to transform a filter which implements get() or put() into a filter which implements read() or write(); however, if get() or put() has multiple return statements or calls itself recursively, sometimes it's a bit tricky.

The real need for get/put is as non-member functions used to implement filters. Even if a filter is given a multi-character request to process, often it must still handle characters one at a time, using get/put applied to the next downstream device. As you say, read or write with single character buffers can be used instead, but it's often easier to use get or put.

For example, if I rewrite the following to use read instead of get

    template<typename Source>
    character get(Source& src)
    {
        character c = io::get(src);
        if (c.good() && c == comment_char_)
            while (c.good() && c != '\n')
                c = io::get(src);
        return c;
    }

I end up with something like this (untested):

    template<typename Source>
    character get(Source& src)
    {
        char c;
        streamsize amt;
        if ((amt = io::read(src, &c, 1)) != 1)
            return amt == -1 ? eof() : fail();
        if (c == comment_char_) {
            while (amt == 1 && c != '\n')
                amt = io::read(src, &c, 1);
            if (amt != 1)
                return amt == -1 ? eof() : fail();
        }
        return c;
    }

If I rework it to implement read() instead of get(), I end up with this (untested):

    template<typename Source>
    streamsize read(Source& src, char* s, streamsize n)
    {
        streamsize m = 0;
        while (m < n) {
            char c;
            streamsize amt = io::read(src, &c, 1);
            if (amt != 1)
                return m == 0 && amt == -1 ? -1 : m;
            if (c == comment_char_) {
                while (amt == 1 && c != '\n')
                    amt = io::read(src, &c, 1);
                if (amt != 1)
                    return m == 0 && amt == -1 ? -1 : m;
            }
            s[m++] = c;
        }
        return m;
    }

Perhaps the above can be simplified. The fact that it's not completely trivial makes me reluctant to abolish get() and put().

Jonathan

Jonathan Turkanis wrote: [...]
For example, if I rewrite the following to use read instead of get
    template<typename Source>
    character get(Source& src)
    {
        character c = io::get(src);
        if (c.good() && c == comment_char_)
            while (c.good() && c != '\n')
                c = io::get(src);
        return c;
    }
Yes, you are right. A "proper" in-place read-based filter that implements the above (minus the bug) is much, much harder to write and understand. It will also be much, much faster, but the character version may be fast enough for most uses.

Peter Dimov wrote:
Jonathan Turkanis wrote:
[...]
For example, if I rewrite the following to use read instead of get
    template<typename Source>
    character get(Source& src)
    {
        character c = io::get(src);
        if (c.good() && c == comment_char_)
            while (c.good() && c != '\n')
                c = io::get(src);
        return c;
    }
Yes, you are right. A "proper" in-place read-based filter that implements the above (minus the bug)
Care to share the bug with me? ;-) I know that the comment character is checked twice, but this seems harmless.
is much, much harder to write and understand. It will also be much, much faster,
Actually the distinction between an in-place filter and a filter which produces a modified copy of its input is separate from the get/read question. I'm going to provide optimized treatment for in-place filters (this is one of the planned changes that's been on hold), but not all filters can be represented that way. It's especially useful for filters which only observe their input, such as line- or character-counting filters, or filters which implement an offset view of the downstream device.
but the character version may be fast enough for most uses.
So all things considered, do you think the basic_character abstraction makes the get() function and the InputFilter concept unreasonably complex?

Thanks for your comments!

Jonathan

Jonathan Turkanis wrote:
Peter Dimov wrote:
Jonathan Turkanis wrote:
[...]
For example, if I rewrite the following to use read instead of get
    template<typename Source>
    character get(Source& src)
    {
        character c = io::get(src);
        if (c.good() && c == comment_char_)
            while (c.good() && c != '\n')
                c = io::get(src);
        return c;
    }
Yes, you are right. A "proper" in-place read-based filter that implements the above (minus the bug)
Care to share the bug with me? ;-) I know that the comment character is checked twice, but this seems harmless.
If the first get() returns comment_char_ and a subsequent get() returns EAGAIN before \n is encountered, the next call will return the rest of the comment. You need to remember the "we are in a comment" state. I think.
is much, much harder to write and understand. It will also be much, much faster,
Actually the distinction between an in-place filter and a filter which produces a modified copy of its input is separate from the get/read question. I'm going to provide optimized treatment for in-place filters (this is one of the planned changes that's been on hold), but not all filters can be represented that way. It's especially useful for filters which only observe their input, such as line- or character-counting filters, or filters which implement an offset view of the downstream device.
I think that all character-based filters can be represented as in-place read-based filters.
but the character version may be fast enough for most uses.
So all things considered, do you think the basic_character abstraction makes the get() function and the InputFilter concept unreasonably complex?
I'm not sure. -1/io::eof for EOF and -2/io::again for EAGAIN seem good enough to me. I tried to rewrite your comment skipping filter using -1/-2, and it turned out surprisingly complex. I'm no longer sure that the character version is much easier. As with my previous read-based attempt, it exceeded my capability to write code in an e-mail message. ;-) Time for a contest. "Write the best non-blocking comment skipping filter". :-)

Peter Dimov wrote:
Jonathan Turkanis wrote:
Peter Dimov wrote:
Yes, you are right. A "proper" in-place read-based filter that implements the above (minus the bug)
Care to share the bug with me?
If the first get() returns comment_char_ and a subsequent get() returns EAGAIN before \n is encountered, the next call will return the rest of the comment. You need to remember the "we are in a comment" state. I think.
Right, thanks -- I've been bitten by this before. It's pretty typical of the complications that will be introduced if filters have to be non-blocking. It also brings up another point: none of my current tests would catch something like this :-*
So all things considered, do you think the basic_character abstraction makes the get() function and the InputFilter concept unreasonably complex?
I'm not sure. -1/io::eof for EOF and -2/io::again for EAGAIN seem good enough to me.
The problem with -1/-2 is that I need something that will work for signed characters and character types that are not integral, although the latter should be rare.
I tried to rewrite your comment skipping filter using -1/-2, and it turned out surprisingly complex. I'm no longer sure that the character version is much easier.
Yeah ... I think it may be a general problem with writing non-blocking filters. I guess I'm leaning towards putting non-blocking filters in an advanced section and eliminating the non-blocking versions of get() and put().
As with my previous read-based attempt, it exceeded my capability to write code in an e-mail message. ;-)
Time for a contest. "Write the best non-blocking comment skipping filter". :-)
:-) Jonathan
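
One way a test could catch the lost "in a comment" state is a source that reports "would block" at a chosen position. The sketch below is only an illustration under stated assumptions: it uses the -1/-2 integer convention rather than basic_character, and `stalling_source`, `stateless_uncommenter` and `drain` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace

const int eof_code   = -1;  // assumed: end of stream
const int again_code = -2;  // assumed: no input available right now

// Test source that reports "would block" exactly once, just before
// delivering the character at index stall_at.
class stalling_source {
public:
    stalling_source(const std::string& s, std::size_t stall_at)
        : data_(s), pos_(0), stall_at_(stall_at), stalled_(false) { }
    int get()
    {
        if (pos_ == stall_at_ && !stalled_) { stalled_ = true; return again_code; }
        if (pos_ == data_.size()) return eof_code;
        return static_cast<unsigned char>(data_[pos_++]);
    }
private:
    std::string data_;
    std::size_t pos_, stall_at_;
    bool        stalled_;
};

// Same shape as the stateless comment-skipping get() discussed above.
struct stateless_uncommenter {
    template<typename Source>
    int get(Source& src)
    {
        int c = src.get();
        if (c == '#')
            while (c >= 0 && c != '\n')
                c = src.get();
        return c;
    }
};

// Pull everything through the filter, retrying whenever it would block.
template<typename Filter, typename Source>
std::string drain(Filter& f, Source& src)
{
    std::string out;
    for (;;) {
        int c = f.get(src);
        if (c == eof_code) return out;
        if (c == again_code) continue;  // a real client would wait here
        assert(c >= 0);
        out.push_back(static_cast<char>(c));
    }
}

} // namespace sketch
```

Draining "a#xy\nb" with no stall yields "a\nb", but a stall in the middle of the comment lets the tail of the comment leak through as "ay\nb". Running each filter test once per possible stall position would expose this class of bug.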

Peter Dimov wrote:
Time for a contest. "Write the best non-blocking comment skipping filter". :-)
Here's my entry.

struct filter
{
    char comment_char_;
    bool in_comment_;

    filter(): comment_char_( '#' ), in_comment_( false )
    {
    }

    template< class Source > int get( Source & src )
    {
        int c;

        for( ;; )
        {
            if( in_comment_ )
            {
                for( ;; )
                {
                    c = src.get();

                    if( c == eof || c == eagain ) return c;
                    if( c == '\n' ) break;
                }

                in_comment_ = false;
            }

            c = src.get();

            if( c != comment_char_ ) return c;

            in_comment_ = true;
        }
    }
};

struct filter2
{
    char comment_char_;
    bool in_comment_;

    filter2(): comment_char_( '#' ), in_comment_( false )
    {
    }

    template< class Source > int read( Source & src, char * s, int n )
    {
        int m = src.read( s, n );

        if( m <= 0 ) return m;

        int r = 0; // dest: [s, s + r)

        char const * p = s; // src: [p, p + m)

        for( ;; )
        {
            if( in_comment_ )
            {
                char const * q = static_cast<char const *>( memchr( p, '\n', m ) );

                if( q == 0 )
                {
                    return r;
                }

                m -= q - p + 1;
                p = q + 1;

                in_comment_ = false;
            }

            char const * q = static_cast<char const *>( memchr( p, comment_char_, m ) );

            if( q == 0 )
            {
                memmove( s + r, p, m );
                return r + m;
            }

            memmove( s + r, p, q - p );

            r += q - p;

            m -= q - p + 1;
            p = q + 1;

            in_comment_ = true;
        }
    }
};

Peter Dimov wrote:
Peter Dimov wrote:
Time for a contest. "Write the best non-blocking comment skipping filter". :-)
Here's my entry.
struct filter
{
    char comment_char_;
    bool in_comment_;

    filter(): comment_char_( '#' ), in_comment_( false )
    {
    }

    template< class Source > int get( Source & src )
    {
        int c;

        for( ;; )
        {
            if( in_comment_ )
            {
                for( ;; )
                {
                    c = src.get();

                    if( c == eof || c == eagain ) return c;
                    if( c == '\n' ) break;
                }

                in_comment_ = false;
            }

            c = src.get();

            if( c != comment_char_ ) return c;

            in_comment_ = true;
        }
    }
};
This one is simple enough to put in the tutorial.
struct filter2
{
    char comment_char_;
    bool in_comment_;

    filter2(): comment_char_( '#' ), in_comment_( false )
    {
    }

    template< class Source > int read( Source & src, char * s, int n )
    {
        int m = src.read( s, n );

        if( m <= 0 ) return m;

        int r = 0; // dest: [s, s + r)

        char const * p = s; // src: [p, p + m)

        for( ;; )
        {
            if( in_comment_ )
            {
                char const * q = static_cast<char const *>( memchr( p, '\n', m ) );

                if( q == 0 )
                {
                    return r;
                }

                m -= q - p + 1;
                p = q + 1;

                in_comment_ = false;
            }

            char const * q = static_cast<char const *>( memchr( p, comment_char_, m ) );

            if( q == 0 )
            {
                memmove( s + r, p, m );
                return r + m;
            }

            memmove( s + r, p, q - p );

            r += q - p;

            m -= q - p + 1;
            p = q + 1;

            in_comment_ = true;
        }
    }
};
Pretty sneaky! Try that with a tab-expanding filter ;-) Jonathan
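
The tab-expanding case is awkward for the in-place trick because the output can be longer than the input, so it cannot be compacted inside the caller's buffer. A character-at-a-time version is still manageable if the filter remembers how many spaces it owes. The sketch below assumes the -1/-2 integer convention; `string_source`, `tab_expander` and `expand_all` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace

const int eof_code   = -1;  // assumed: end of stream
const int again_code = -2;  // assumed: no input available right now

// Toy source over a string; never blocks.
class string_source {
public:
    explicit string_source(const std::string& s) : data_(s), pos_(0) { }
    int get()
    {
        if (pos_ == data_.size()) return eof_code;
        return static_cast<unsigned char>(data_[pos_++]);
    }
private:
    std::string data_;
    std::size_t pos_;
};

class tab_expander {
public:
    explicit tab_expander(int tab_size = 8)
        : tab_size_(tab_size), col_(0), pending_(0)
    { assert(tab_size > 0); }

    template<typename Source>
    int get(Source& src)
    {
        // Pay out spaces owed from an earlier tab before reading more input.
        if (pending_ > 0) { --pending_; ++col_; return ' '; }

        int c = src.get();
        if (c < 0) return c;  // EOF or would-block: nothing extra to remember

        if (c == '\t') {
            // Spaces still owed after the one returned now.
            pending_ = tab_size_ - col_ % tab_size_ - 1;
            ++col_;
            return ' ';
        }
        col_ = (c == '\n') ? 0 : col_ + 1;
        return c;
    }
private:
    int tab_size_, col_, pending_;
};

// Convenience driver: run a whole string through the filter.
inline std::string expand_all(const std::string& input, int tab_size)
{
    string_source src(input);
    tab_expander  f(tab_size);
    std::string   out;
    for (;;) {
        int c = f.get(src);
        if (c == eof_code) return out;
        if (c == again_code) continue;  // a real client would wait here
        out.push_back(static_cast<char>(c));
    }
}

} // namespace sketch
```

Because all state lives in `col_` and `pending_`, a would-block result from the source can be passed straight through without losing track of a half-expanded tab.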

From: "Jonathan Turkanis" <technews@kangaroologic.com>
I. The Problem ---------------------------
Standard iostreams do not work well with non-blocking or asynchronous i/o. I would eventually like to extend the library to provide support for non-blocking and async i/o, and when I do so I expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced.
There's a difference between making the concepts work unchanged and making the components of the library work unchanged.
II. The Solution (the easy part) ---------------------------
I believe it will suffice to:
- provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.
Reasonable notions.
This is easily achieved for put() and write(), and almost as easily for read():
- Instead of returning void, put() can return a bool indicating whether the given character was successfully written.
Clean enough.
- Instead of returning void, write() can return an integer indicating the number of characters written.
But it also needs to indicate errors.
- Currently, when read returns fewer characters than the requested amount it is treated as an EOF indication. Instead, we can allow read to return the actual number of characters read, and reserve -1 to indicate EOF, since it is not needed as an error indication.
Clean enough.
III. The Solution (the ugly part) ---------------------------
The function get presents more of a challenge. Currently it looks like this (for char_type == char):
    struct my_input_filter : input_filter {
        template<typename Source>
        int get(Source& src);
    };
The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get:
1. A character is successfully retrieved.
2. The end of the stream has been reached.
3. No characters are currently available, but more may be available later.
Right.
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
    template<typename Ch>
    class basic_character {
    public:
        basic_character(Ch c);
        operator Ch () const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };

    typedef basic_character<char>    character;
    typedef basic_character<wchar_t> wcharacter;

    character eof();   // returns an EOF indication
    character fail();  // returns a temporary failure indication.

    wcharacter weof();
    wcharacter wfail();
[Omitted: templated versions of eof and fail]
OK.
IV. Examples (feel free to skip) ---------------------------
[snipped async-enabled examples grow from synchronous versions]
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
The class template is hardly complicated. I can't imagine it would be a show stopper, though it does add some complexity.
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
That's a real issue.
3. Performance. It's possible that the change will have a negative effect on performance. I was planning to implement it and then perform careful measurements, but I have run out of time for this. I think the effect will be slight.
I'd expect the impact to be small, but quantifying it would be helpful.
VI. Benefits ------------------------
A positive side-effect of this change would be that I can rename the filter concepts
    InputFilter  --> PullFilter
    OutputFilter --> PushFilter
and allow both types of filter to be added either to input or to output streams. Filter writers could then choose the filter concept which best expressed the filtering algorithm without worrying whether it will be used for input or output.
Nice.
VII. Alternatives.
1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.
Definitely not a good idea.
2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.
I like this better, but I wonder if there is a unified approach within the library that still keeps things tidy for those writing synchronous code. Could the library components recognize whether a filter was written using char/wchar_t versus basic_character and deal synchronously or asynchronously as a result?

That way, those writing synchronous code can write it using a character type of char/wchar_t and everything to/from that code will be in the simplified, synchronous style you currently offer. However, if the client takes advantage of the advanced, asynchronous capabilities of the library by using basic_character, then the style changes based upon needing to deal with the EAGAIN condition.

To keep the two styles as similar as possible, you might need to alter the current interfaces slightly, but probably not much (pure, abstract speculation; I haven't even tried to validate that assertion).

(The member functions good(), eof(), and fail() on basic_character could be made non-member functions and could be implemented for char and wchar_t. That would help code deal with synchronous code in the asynchronous style.)

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
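
The parenthetical suggestion can be sketched as a set of overloads, so that code written in the asynchronous style compiles against both plain characters and basic_character. This is only an illustration under stated assumptions: the pared-down `basic_character`, its `make_eof`/`make_fail` helpers and the `is_newline` example are hypothetical, and only the char case is shown.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace

// Pared-down stand-in for the proposed class template.
template<typename Ch>
class basic_character {
public:
    basic_character(Ch c) : ch_(c), state_(0) { }
    static basic_character make_eof()  { return basic_character(Ch(), 1); }
    static basic_character make_fail() { return basic_character(Ch(), 2); }
    operator Ch() const { assert(good()); return ch_; }
    bool good() const { return state_ == 0; }
    bool eof()  const { return state_ == 1; }
    bool fail() const { return state_ == 2; }
private:
    basic_character(Ch c, int state) : ch_(c), state_(state) { }
    Ch  ch_;
    int state_;  // 0 = good, 1 = eof, 2 = temporary failure
};

// A plain char always holds a character, so the answers are constant.
inline bool good(char) { return true; }
inline bool eof(char)  { return false; }
inline bool fail(char) { return false; }

// basic_character forwards to its member functions.
template<typename Ch> bool good(const basic_character<Ch>& c) { return c.good(); }
template<typename Ch> bool eof(const basic_character<Ch>& c)  { return c.eof(); }
template<typename Ch> bool fail(const basic_character<Ch>& c) { return c.fail(); }

// Written once in the asynchronous style; accepts either kind of value.
template<typename C>
bool is_newline(const C& c)
{
    return good(c) && static_cast<char>(c) == '\n';
}

} // namespace sketch
```

Note that a plain char has no way to carry EOF, so a synchronous get() would still need some other channel for it; the overloads only make the checking code uniform.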

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced.
There's a difference between making the concepts work unchanged and making the components of the library work unchanged.
True. I can always fix the library-provided components so that they do the right thing, even if I have to rely on magic, i.e., on an incestuous relationship with library internals. If the library is well-received, however, I expect eventually there will be a large body of user-defined filters, and I wouldn't want them to have to be rewritten.
II. The Solution (the easy part) ---------------------------
I believe it will suffice to:
- provide the functions put() and write() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been written to the underlying data sink even though no error has occurred.
- Provide the functions get() and read() (both filter member functions and the free functions with the same names) with a way to indicate that fewer than the requested number of characters have been read from the underlying data source, even though no error has occurred and EOF has not been reached.
Reasonable notions.
This is easily achieved for put() and write(), and almost as easily for read():
- Instead of returning void, put() can return a bool indicating whether the given character was successfully written.
Clean enough.
Okay.
- Instead of returning void, write() can return an integer indicating the number of characters written.
But it also needs to indicate errors.
Errors are indicated with exceptions. (This is another topic that didn't get much attention during review. There are some, notably James Kanze, who insist that well-designed stream buffers should not throw exceptions. My defense of exceptions is here: http://tinyurl.com/59xv6. At the time of the review, I was prepared to switch to error codes, but it's a bit late now.)
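
A short sketch of what this division of labour might look like from the caller's side: write() returns how many characters were accepted, and a genuine error arrives as an exception. The toy `chunked_sink` and the `write_all` helper are hypothetical names invented for this illustration; `std::ios_base::failure` is used here only as a plausible exception type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <string>

namespace sketch {  // hypothetical namespace

// Toy sink that accepts at most `chunk` characters per call. A "broken"
// sink reports the error by throwing rather than through its return value.
class chunked_sink {
public:
    explicit chunked_sink(std::streamsize chunk, bool broken = false)
        : chunk_(chunk), broken_(broken)
    { assert(chunk > 0); }

    std::streamsize write(const char* s, std::streamsize n)
    {
        assert(n >= 0);
        if (broken_) throw std::ios_base::failure("sink is broken");
        std::streamsize amt = std::min(n, chunk_);
        data_.append(s, static_cast<std::size_t>(amt));
        return amt;  // may be fewer than n; that is not an error
    }

    const std::string& str() const { return data_; }
private:
    std::streamsize chunk_;
    bool            broken_;
    std::string     data_;
};

// Keep writing until everything is accepted; returns the number of calls.
// Errors propagate as exceptions, so the loop only has to track progress.
template<typename Sink>
int write_all(Sink& snk, const char* s, std::streamsize n)
{
    int calls = 0;
    while (n > 0) {
        std::streamsize amt = snk.write(s, n);
        s += amt;
        n -= amt;
        ++calls;  // a real non-blocking client would yield when amt == 0
    }
    return calls;
}

} // namespace sketch
```

With errors out of band, a short count has exactly one meaning: try again later with the remainder.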
- Currently, when read returns fewer characters than the requested amount it is treated as an EOF indication. Instead, we can allow read to return the actual number of characters read, and reserve -1 to indicate EOF, since it is not needed as an error indication.
Clean enough.
Good, thanks.
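To make the proposed return conventions concrete, here is a minimal sketch, assuming C++11. The names chunked_source and read_all are made up for illustration and are not part of the library, and using long for the count is also an assumption. It shows read() returning the number of characters actually read, with -1 reserved for EOF, so that a short count no longer means end of stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical source, for illustration only: it hands out at most
// `chunk` characters per call, the way a slow device might.
class chunked_source {
public:
    chunked_source(std::string data, std::size_t chunk)
        : data_(std::move(data)), chunk_(chunk == 0 ? 1 : chunk), pos_(0) {}

    // Returns the number of characters copied into s, or -1 at end of
    // stream. A short count is not an EOF indication.
    long read(char* s, long n) {
        assert(s != nullptr);
        if (pos_ == data_.size()) return -1;
        if (n <= 0) return 0;
        std::size_t amt = std::min(chunk_, static_cast<std::size_t>(n));
        amt = data_.copy(s, amt, pos_);  // copies at most the remaining characters
        pos_ += amt;
        return static_cast<long>(amt);
    }

private:
    std::string data_;
    std::size_t chunk_;
    std::size_t pos_;
};

// Drains a source by looping until read() reports -1.
template<typename Source>
std::string read_all(Source& src) {
    std::string out;
    char buf[8];
    long n;
    while ((n = src.read(buf, sizeof buf)) != -1)
        out.append(buf, static_cast<std::size_t>(n));
    return out;
}
```

With a genuinely non-blocking source, a loop like read_all would have to yield or return to its caller when a call produces no characters instead of spinning; how "nothing available right now" is reported for read() is not settled in this message, so the sketch leaves it out.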
III. The Solution (the ugly part) ---------------------------
The return type already serves a dual purpose: it can store a character or an EOF indication. Unfortunately, with non-blocking or async i/o there are now three possible results of a call to get:
1. A character is successfully retrieved. 2. The end of the stream has been reached. 3. No characters are currently available, but more may be available later.
Right.
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
<snip synopsis of basic_character>
OK.
Great, thanks. This is the part I was worried about most.
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
The class template is hardly complicated. I can't imagine it would be a show stopper, though it does add some complexity.
I'm thinking of describing it as a replacement for traits_type::int_type. Compared to using int_type and its family of helper functions (eq_int_type, eof, not_eof, ...) basic_character is a snap. ;-)
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
That's a real issue.
It's become even more clear in this thread, given my botched uncommenting_filter implementation.
3. Performance. It's possible that the change will have a negative effect on performance. I was planning to implement it and then perform careful measurements, but I have run out of time for this. I think the effect will be slight.
I'd expect the impact to be small, but quantifying it would be helpful.
Agreed.
VI. Benefits ------------------------
A positive side-effect of this change would be that I can rename the filter concepts
InputFilter --> PullFilter OutputFilter --> PushFilter
and allow both types of filter to be added either to input or to output streams. Filter writers could then choose the filter concept which best expressed the filtering algorithm without worrying whether it will be used for input or output.
Nice.
I'm thinking if I adopt alternative 2, below, I can still do this. Adding a blocking PullFilter to an output stream or a blocking PushFilter to an input stream would cause extra copying, which I can warn about in the docs. People who need the highest performance can always write non-blocking filters.
VII. Alternatives.
1. Adopt the convention that read() always blocks until at least one character is available, and that get() always blocks. This would give up much of the advantage of non-blocking and async i/o.
Definitely not a good idea.
Agreed.
2. Add new non-blocking filter concepts, but hide them in the "advanced" section of the library. All the library-provided filters would be non-blocking, and users would be encouraged, but not required, to write non-blocking filters.
I like this better, but I wonder if there is a unified approach within the library that still keeps things tidy for those writing synchronous code. Could the library components recognize whether a filter was written using char/wchar_t versus basic_character and deal synchronously or asynchronously as a result?
It would work by introducing a new category tag non_blocking_tag, and treating user-defined components differently depending on whether their io_category is convertible to non_blocking_tag.
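As a rough sketch of how such a tag could be detected, assuming C++11: only non_blocking_tag and io_category are named in the thread. The other tags, the two example filters, and the helper functions below are placeholders invented for illustration.

```cpp
#include <type_traits>

// Category tags. Only non_blocking_tag is named in the thread; the other
// names stand in for whatever tags the library already defines.
struct input_filter_tag {};
struct non_blocking_tag {};
struct non_blocking_input_filter_tag : input_filter_tag, non_blocking_tag {};

// True when a component's io_category is convertible to non_blocking_tag.
template<typename T>
struct is_non_blocking
    : std::is_convertible<typename T::io_category, non_blocking_tag> {};

// Two hypothetical user-defined filters.
struct legacy_filter       { typedef input_filter_tag io_category; };
struct eagain_aware_filter { typedef non_blocking_input_filter_tag io_category; };

// Tag dispatch: a blocking filter would need some adapter (with the extra
// copying mentioned earlier), a non-blocking one could be used directly.
constexpr bool needs_adapter(std::true_type)  { return false; }
constexpr bool needs_adapter(std::false_type) { return true; }

template<typename Filter>
constexpr bool needs_blocking_adapter(const Filter&) {
    return needs_adapter(typename is_non_blocking<Filter>::type());
}
```

Existing filters keep compiling unchanged under this scheme, because a category that does not derive from non_blocking_tag simply selects the blocking path.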
That way, those writing synchronous code can write it using a character type of char/wchar_t and everything to/from that code will be in the simplified, synchronous style you currently offer. However, if the client takes advantage of the advanced, asynchronous capabilities of the library by using basic_character, then the style changes based upon needing to deal with the EAGAIN condition.
To keep the two styles as similar as possible, you might need to alter the current interfaces slightly, but probably not much (pure, abstract speculation; I haven't even tried to validate that assertion).
If I adopt this solution, I'll be writing non-blocking versions of all the library filters and so may be able to judge whether there is a big stylistic difference, and whether it would be a good idea or even possible to modify the current concepts. My inclination is to say that no modification should be made if I support non-blocking components with a new category tag.
(The member functions good(), eof(), and fail() on basic_character could be made non-member functions and could be implemented for char and wchar_t. That would help code deal with synchronous code in the asynchronous style.)
I guess I could implement

    bool eof(int n) { return n == EOF; }
    bool good(int n) { return n != EOF; }
    bool weof(std::char_traits<wchar_t>::int_type n) { return n == WEOF; }
    bool wgood(std::char_traits<wchar_t>::int_type n) { return n != WEOF; }

I tend to think that n == EOF and n == WEOF are more readable, however.

Thanks for your comments!

Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
expect I will have to introduce some new Device concepts. However, I would like to modify the *current* filter concepts so that they will work unchanged when non-blocking and asynchronous devices are introduced.
There's a difference between making the concepts work unchanged and making the components of the library work unchanged.
True. I can always fix the library-provided components so that they do the right thing, even if I have to rely on magic, i.e., on an incestuous relationship with library internals.
If the library is well-received, however, I expect eventually there will be a large body of user-defined filters, and I wouldn't want them to have to be rewritten.
Given the filters written with sync I/O in mind are quite different from those written for async I/O, is that really an issue?
- Instead of returning void, write() can return an integer indicating the number of characters written.
But it also needs to indicate errors.
Errors are indicated with exceptions.
(This is another topic that didn't get much attention during review. There are some, notably James Kanze, who insist that well-designed stream buffers should not throw exceptions. My defense of exceptions is here: http://tinyurl.com/59xv6. At the time of the review, I was prepared to switch to error codes, but it's a bit late now.)
I apparently missed that during the review. On what basis does James Kanze make his assertion?

So long as EOF and EAGAIN do not result in exceptions, given how commonly they occur, I have no problems with exceptions. However, it sounds like you might be throwing an exception to indicate EOF. If so, I don't like it.

It is not entirely uncommon to read to EOF, do something to a file, and continue reading. For example, we have certain files that we tail throughout the day in our production apps. EOF is a common occurrence because we quickly process the new data appended to the files and reach EOF again. That is, the sink is faster than the source. This behavior would be complicated by EOF throwing an exception. An error indication of EOF, plus the ability to clear that condition and try again, fits this usage better. (Granted, that use of a file is not widespread, but since you offer no choice, such usage is not possible.)

You mentioned in the cited documentation section that you didn't want to complicate things such that users would wonder which of the various ways a given function indicates errors. I sympathize, but isn't that what you're introducing with your basic_character idea?

How about putting a state variable in filters, sources, and sinks to indicate EAGAIN, EOF, and GOOD. Then, functions that only read or write a single character can return a bool, and functions that read or write multiple characters can return a count. If the bool is false or the count is zero, the code can query the state to determine whether EAGAIN or EOF occurred. That keeps the error indication in the return value simple and even leaves room for adding additional error states should that prove necessary. Hard errors can still be signaled via exception.
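A minimal sketch of the state-variable idea, assuming C++11: the names io_state, stalling_source, and drain are invented here for illustration and do not come from the library. get() returns only a bool, and the caller asks the source why a call failed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class io_state { good, eof, would_block };

// Hypothetical test source: serves characters from a string, but the call
// with index `stall_at` (counting from 0) reports would_block instead.
class stalling_source {
public:
    stalling_source(std::string data, std::size_t stall_at)
        : data_(std::move(data)), stall_at_(stall_at),
          pos_(0), calls_(0), state_(io_state::good) {}

    // Returns true and stores a character on success; on false, the
    // reason is available from state().
    bool get(char& c) {
        if (calls_++ == stall_at_) { state_ = io_state::would_block; return false; }
        if (pos_ == data_.size())  { state_ = io_state::eof;         return false; }
        c = data_[pos_++];
        state_ = io_state::good;
        return true;
    }

    io_state state() const { return state_; }

private:
    std::string data_;
    std::size_t stall_at_;
    std::size_t pos_;
    std::size_t calls_;
    io_state state_;
};

// Copies whatever is currently available and returns the state that
// stopped the loop.
inline io_state drain(stalling_source& src, std::string& out) {
    char c;
    while (src.get(c)) out += c;
    assert(src.state() != io_state::good);  // a false return always sets a reason
    return src.state();
}
```

The cost of this design is that every source has to implement a second member, state(), in addition to get().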
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
The class template is hardly complicated. I can't imagine it would be a show stopper, though it does add some complexity.
I'm thinking of describing it as a replacement for traits_type::int_type. Compared to using int_type and its family of helper functions (eq_int_type, eof, not_eof, ...) basic_character is a snap. ;-)
That's a good way to make it palatable to those already familiar with writing streambufs, but it hardly applies to the majority of your library audience, does it?
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
That's a real issue.
It's become even more clear in this thread, given my botched uncommenting_filter implementation.
Keep running notes on the things one can forget or do wrong so your documentation can warn about them.
That way, those writing synchronous code can write it using a character type of char/wchar_t and everything to/from that code will be in the simplified, synchronous style you currently offer. However, if the client takes advantage of the advanced, asynchronous capabilities of the library by using basic_character, then the style changes based upon needing to deal with the EAGAIN condition.
If I adopt this solution, I'll be writing non-blocking versions of all the library filters and so may be able to judge whether there is a big stylistic difference, and whether it would be a good idea or even possible to modify the current concepts.
That sounds good.
(The member functions good(), eof(), and fail() on basic_character could be made non-member functions and could be implemented for char and wchar_t. That would help code deal with synchronous code in the asynchronous style.)
I guess I could implement
    bool eof(int n) { return n == EOF; }
    bool good(int n) { return n != EOF; }
    bool weof(std::char_traits<wchar_t>::int_type n) { return n == WEOF; }
    bool wgood(std::char_traits<wchar_t>::int_type n) { return n != WEOF; }
I tend to think that n == EOF and n == WEOF are more readable, however.
Yes, they are more readable, but if you don't document the implementation of those functions, library users will expect to use them in all situations. Thus, switching to async from sync I/O will be less of a stylistic change. (That also leaves room for you to change the implementation details if need be.)

Of course, if you adopt my state idea, then the return value would simply indicate that things went well or something less than ideal happened. In the latter case, one would query the state of the filter/device to learn what happened. That would obviate your basic_character class and these functions.

One problem with code that returns status information is that folks can forget to check the status. You could return a special type in lieu of bool, and basic_character in lieu of char_type, which require inspecting whether they indicate an error condition. If that query is not done, the destructor can complain. OTOH, you could just say, "don't do that!"

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
If the library is well-received, however, I expect eventually there will be a large body of user-defined filters, and I wouldn't want them to have to be rewritten.
Given the filters written with sync I/O in mind are quite different from those written for async I/O, is that really an issue?
Do you mean that different filtering operations will be desired, or that the implementation will be different? My hope was to make the implementation of non-blocking filters easy enough that I could require all filters to be non-blocking. Maybe my original message was unclear on this point.
(This is another topic that didn't get much attention during review. There are some, notably James Kanze, who insist that well-designed stream buffers should not throw exceptions. My defense of exceptions is here: http://tinyurl.com/59xv6. At the time of the review, I was prepared to switch to error codes, but it's a bit late now.)
I apparently missed that during the review. On what basis does James Kanze make his assertion?
See this message by James Kanze, my reply, his reply and my further reply: http://tinyurl.com/4tkj3. I don't find his argument very convincing on this point. But he is very knowledgeable, so I'm reluctant to dismiss his view.
So long as EOF and EAGAIN do not result in exceptions,
right
given how commonly they occur, I have no problems with exceptions. However, it sounds like you might be throwing an exception to indicate EOF. If so, I don't like it.
How did I give you that impression?
It is not entirely uncommon to read to EOF, do something to a file, and continue reading. For example, we have certain files that we tail throughout the day in our production apps. EOF is a common occurrence because we quickly process the new data appended to the files and reach EOF again. That is, the sink is faster than the source. This behavior would be complicated by EOF throwing an exception. An error indication of EOF, plus the ability to clear that condition and try again, fits this usage better. (Granted, that use of a file is not widespread, but since you offer no choice, such usage is not possible.)

You mentioned in the cited documentation section that you didn't want to complicate things such that users would wonder which of the various ways a given function indicates errors. I sympathize, but isn't that what you're introducing with your basic_character idea?
I'm not treating EOF or EAGAIN as errors. What I'm trying to avoid is having a single error (or any other type of condition, for that matter) represented in more than one way.
How about putting a state variable in filters, sources, and sinks to indicate EAGAIN, EOF, and GOOD. Then, functions that only read or write a single character can return a bool, and functions that read or write multiple characters can return a count. If the bool is false or the count is zero, the code can query the state to determine whether EAGAIN or EOF occurred. That keeps the error indication in the return value simple and even leaves room for adding additional error states should that prove necessary. Hard errors can still be signaled via exception.
This was one of the options I seriously considered. I believe the very first version of the library I posted had this feature. I tend to think it would make life harder for people writing filters and devices to force them to implement an additional function. As things stand, you can often get by with implementing a single function, and it works pretty well. Also, it makes it harder to fit standard streams and stream buffers into the framework: stream buffers don't have a state indicator at all; streams do, but it doesn't map well to good/eof/eagain. I'm willing to be convinced otherwise, however.
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
The class template is hardly complicated. I can't imagine it would be a show stopper, though it does add some complexity.
I'm thinking of describing it as a replacement for traits_type::int_type. Compared to using int_type and its family of helper functions (eq_int_type, eof, not_eof, ...) basic_character is a snap. ;-)
That's a good way to make it palatable to those already familiar with writing streambufs, but it hardly applies to the majority of your library audience, does it?
The current interface has get() return an instance of int_type, so if I keep the interface how it is, I have to say something in the docs about char_traits. But you have a good point.
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
That's a real issue.
It's become even more clear in this thread, given my botched uncommenting_filter implementation.
Keep running notes on the things one can forget or do wrong so your documentation can warn about them.
Good idea. I'll definitely do this; the question is whether it will be in the advanced section or will apply to all filters and devices.
(The member functions good(), eof(), and fail() on basic_character could be made non-member functions and could be implemented for char and wchar_t.
I forgot to say that I like the idea of making them non-members. All I am unsure about is whether they should be implemented for char and wchar_t (actually for int and std::char_traits<wchar_t>::int_type)
That would help code deal with synchronous code in the asynchronous style.)
I guess I could implement
    bool eof(int n) { return n == EOF; }
    bool good(int n) { return n != EOF; }
    bool weof(std::char_traits<wchar_t>::int_type n) { return n == WEOF; }
    bool wgood(std::char_traits<wchar_t>::int_type n) { return n != WEOF; }
I tend to think that n == EOF and n == WEOF are more readable, however.
Yes, they are more readable, but if you don't document the implementation of those functions, library users will expect to use them in all situations. Thus, switching to async from sync I/O will be less of a stylistic change. (That also leaves room for you to change the implementation details if need be.)
Let me make sure we're on the same page: We're assuming I decide to provide separate concepts for non-blocking i/o, and that some of the non-blocking concepts use basic_character. Are you saying that someone who is used to writing non-blocking filters and tries to write a blocking filter may get confused because eof() doesn't work with int? Maybe I could have get() return a basic_character even in the blocking case, and explain that there's no need to test for eagain in blocking i/o. This would eliminate one of the last traces of char_traits in the public interface of the library, which would be good. The other reasonable way to eliminate char_traits would be to have blocking get() return an optional<char> (see my reply to Thorsten); but if I'm using basic_character for non-blocking i/o, it may make sense to use it in both places.
Of course, if you adopt my state idea, then the return value would simply indicate that things went well or something less than ideal happened. In the latter case, one would query the state of the filter/device to learn what happened. That would obviate your basic_character class and these functions.
One problem with code that returns status information is that folks can forget to check the status.
This is a real problem with eagain, since testing may not reveal the error unless eagain happens to occur at the right place in a character sequence.
You could return a special type in lieu of bool, and basic_character in lieu of char_type, which require inspecting whether they indicate an error condition. If that query is not done, the destructor can complain.
Could you elaborate? It sounds like it could lead to very poor performance.
OTOH, you could just say, "don't do that!"
Best Regards, Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com> I just noticed that I didn't reply to this.
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
If the library is well-received, however, I expect eventually there will be a large body of user-defined filters, and I wouldn't want them to have to be rewritten.
Given the filters written with sync I/O in mind are quite different from those written for async I/O, is that really an issue?
Do you mean that different filtering operations will be desired, or that the implementation will be different?
I meant that the implementation would differ.
My hope was to make the implementation of non-blocking filters easy enough that I could require all filters to be non-blocking. Maybe my original message was unclear on this point.
OK.
(This is another topic that didn't get much attention during review. There are some, notably James Kanze, who insist that well-designed stream buffers should not throw exceptions. My defense of exceptions is here: http://tinyurl.com/59xv6. At the time of the review, I was prepared to switch to error codes, but it's a bit late now.)
I apparently missed that during the review. On what basis does James Kanze make his assertion?
See this message by James Kanze, my reply, his reply and my further reply: http://tinyurl.com/4tkj3. I don't find his argument very convincing on this point. But he is very knowledgeable, so I'm reluctant to dismiss his view.
In the end, I think James conceded that exceptions for unusual things were needed. It's the common events that shouldn't be handled by exceptions:
So long as EOF and EAGAIN do not result in exceptions,
right
given how commonly they occur, I have no problems with exceptions. However, it sounds like you might be throwing an exception to indicate EOF. If so, I don't like it.
How did I give you that impression?
You're not, so it isn't important.
It is not entirely uncommon to read to EOF, do something to a file, and continue reading. For example, we have certain files that we tail throughout the day in our production apps. EOF is a common occurrence because we quickly process the new data appended to the files and reach EOF again. That is, the sink is faster than the source. This behavior would be complicated by EOF throwing an exception. An error indication of EOF, plus the ability to clear that condition and try again, fits this usage better. (Granted, that use of a file is not widespread, but since you offer no choice, such usage is not possible.)

You mentioned in the cited documentation section that you didn't want to complicate things such that users would wonder which of the various ways a given function indicates errors. I sympathize, but isn't that what you're introducing with your basic_character idea?
I'm not treating EOF or EAGAIN as errors. What I'm trying to avoid is having a single error (or any other type of condition, for that matter) represented in more than one way.
Good.
How about putting a state variable in filters, sources, and sinks to indicate EAGAIN, EOF, and GOOD. Then, functions that only read or write a single character can return a bool, and functions that read or write multiple characters can return a count. If the bool is false or the count is zero, the code can query the state to determine whether EAGAIN or EOF occurred. That keeps the error indication in the return value simple and even leaves room for adding additional error states should that prove necessary. Hard errors can still be signaled via exception.
This was one of the options I seriously considered. I believe the very first version of the library I posted had this feature. I tend to think it would make life harder for people writing filters and devices to force them to implement an additional function. As things stand, you can often get by with implementing a single function, and it works pretty well. Also, it makes it harder to fit standard streams and stream buffers into the framework: stream buffers don't have a state indicator at all; streams do, but it doesn't map well to good/eof/eagain.
Excellent points. I think your current direction is appropriate in light of these things.
That would help code deal with synchronous code in the asynchronous style.)
I guess I could implement
    bool eof(int n) { return n == EOF; }
    bool good(int n) { return n != EOF; }
    bool weof(std::char_traits<wchar_t>::int_type n) { return n == WEOF; }
    bool wgood(std::char_traits<wchar_t>::int_type n) { return n != WEOF; }
I tend to think that n == EOF and n == WEOF are more readable, however.
Yes, they are more readable, but if you don't document the implementation of those functions, library users will expect to use them in all situations. Thus, switching to async from sync I/O will be less of a stylistic change. (That also leaves room for you to change the implementation details if need be.)
Let me make sure we're on the same page: We're assuming I decide to provide separate concepts for non-blocking i/o, and that some of the non-blocking concepts use basic_character. Are you saying that someone who is used to writing non-blocking filters and tries to write a blocking filter may get confused because eof() doesn't work with int?
Maybe I could have get() return a basic_character even in the blocking case, and explain that there's no need to test for eagain in blocking i/o. This would eliminate one of the last traces of char_traits in the public interface of the library, which would be good. The other reasonable way to eliminate char_traits would be to have blocking get() return an optional<char> (see my reply to Thorsten); but if I'm using basic_character for non-blocking i/o, it may make sense to use it in both places.
I meant that if the library user only knows to write "eof(c)," and doesn't know that "c == EOF" is an alternative spelling because you don't document it, then the user will only ever write "eof(c)" whether they are writing a blocking or non-blocking filter.

Whether blocking filters traffic in char_traits, while non-blocking uses basic_character is a separate question, but I think the answer should be obvious: use basic_character in both places. Then, library users can remain oblivious to char_traits if they don't already know about it.
Of course, if you adopt my state idea, then the return value would simply indicate that things went well or something less than ideal happened. In the latter case, one would query the state of the filter/device to learn what happened. That would obviate your basic_character class and these functions.
One problem with code that returns status information is that folks can forget to check the status.
This is a real problem with eagain, since testing may not reveal the error unless eagain happens to occur at the right place in a character sequence.
You'll need special test sources and sinks that can be configured to produce "would_block" after N calls.
You could return a special type in lieu of bool, and basic_character in lieu of char_type, which require inspecting whether they indicate an error condition. If that query is not done, the destructor can complain.
Could you elaborate? It sounds like it could lead to very poor performance.
I just mean that querying the status sets a flag in the object that the dtor checks. If the flag wasn't set, then the dtor complains. The complaint code could be an assertion or maybe a message printed on stderr. You can even conditionally compile away the checks.

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
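The flag-and-destructor idea described above could look roughly like this, assuming C++11. The names checked_bool, demo_put, unchecked_results, and the SKETCH_CHECK_RESULTS macro are all hypothetical; the counter exists only so the behavior can be observed without aborting.

```cpp
#include <cassert>
#include <cstdio>

// Set to 0 to compile the bookkeeping away.
#ifndef SKETCH_CHECK_RESULTS
#define SKETCH_CHECK_RESULTS 1
#endif

// Counts results destroyed without being inspected (for the demo only).
inline int& unchecked_results() { static int n = 0; return n; }

// Hypothetical stand-in for the bool returned by put().
class checked_bool {
public:
    explicit checked_bool(bool ok) : ok_(ok), inspected_(false) {}
    checked_bool(checked_bool&& other)
        : ok_(other.ok_), inspected_(other.inspected_) {
        other.inspected_ = true;  // responsibility moves with the value
    }
    checked_bool(const checked_bool&) = delete;
    checked_bool& operator=(const checked_bool&) = delete;

    ~checked_bool() {
#if SKETCH_CHECK_RESULTS
        if (!inspected_) {
            ++unchecked_results();
            std::fputs("warning: I/O result was never inspected\n", stderr);
        }
#endif
    }

    // Querying the status marks the result as inspected.
    bool ok() const { inspected_ = true; return ok_; }

private:
    bool ok_;
    mutable bool inspected_;
};

// Hypothetical sink operation that always succeeds.
inline checked_bool demo_put(char) { return checked_bool(true); }

// Drops one result and inspects another; returns how many were flagged.
inline int demo() {
    int before = unchecked_results();
    demo_put('x');                    // result ignored: the destructor complains
    bool ok = demo_put('y').ok();     // result inspected: no complaint
    assert(ok);
    (void)ok;
    return unchecked_results() - before;
}
```

With the macro set to 0 the destructor body is empty, so the remaining overhead is one extra bool per result.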

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
How about putting a state variable in filters, sources, and sinks to indicate EAGAIN, EOF, and GOOD. Then, functions that only read or write a single character can return a bool, and functions that read or write multiple characters can return a count. If the bool is false or the count is zero, the code can query the state to determine whether EAGAIN or EOF occurred. That keeps the error indication in the return value simple and even leaves room for adding additional error states should that prove necessary. Hard errors can still be signaled via exception.
This was one of the options I seriously considered. I believe the very first version of the library I posted had this feature. I tend to think it would make life harder for people writing filters and devices to force them to implement an additional function. As things stand, you can often get by with implementing a single function, and it works pretty well. Also, it makes it harder to fit standard streams and stream buffers into the framework: stream buffers don't have a state indicator at all; streams do, but it doesn't map well to good/eof/eagain.
Excellent points. I think your current direction is appropriate in light of these things.
Good. Thanks.
Yes, they are more readable, but if you don't document the implementation of those functions, library users will expect to use them in all situations. Thus, switching to async from sync I/O will be less of a stylistic change. (That also leaves room for you to change the implementation details if need be.)
I meant that if the library user only knows to write "eof(c)," and doesn't know that "c == EOF" is an alternative spelling because you don't document it, then the user will only ever write "eof(c)" whether they are writing a blocking or non-blocking filter.
Okay, I understand.
Whether blocking filters traffic in char_traits, while non-blocking uses basic_character is a separate question, but I think the answer should be obvious: use basic_character in both places.
I agree.
Then, library users can remain oblivious to char_traits if they don't already know about it.
Or if they already know about it they can start to try to forget about it. ;-)

I've started writing some non-blocking filters, and they don't look so bad, as long as they process one character at a time. So my inclination is to make all filters non-blocking except for a couple of convenience filters which process an entire document at a time. In that case, get will always return basic_character.

One more possibility is this:

    enum eof_type { eof };
    enum would_block_type { would_block };

    template<typename Ch>
    class basic_character {
        basic_character(Ch = Ch());
        basic_character(eof_type);
        basic_character(would_block_type);
        operator Ch () const;
        operator safe_bool () const;
        bool operator==(eof_type) const;
        bool operator!=(eof_type) const;
        bool operator==(would_block_type) const;
        bool operator!=(would_block_type) const;
        // All the other operators we discussed
    };

This would allow the usage:

    if (c == eof) { ... }
    if (c == would_block) { ... }

How do you like this?
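For readers who want to try the synopsis, here is one way it might be fleshed out, assuming C++11. This is a sketch, not the library's implementation: the members are made public, safe_bool is replaced by an explicit operator bool, the implicit operator Ch is replaced by a value() accessor, and "the other operators" are omitted. scripted_source and get_upper are hypothetical helpers added to show a one-character-at-a-time filter rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <utility>
#include <vector>

enum eof_type { eof };
enum would_block_type { would_block };

template<typename Ch>
class basic_character {
public:
    basic_character(Ch c = Ch()) : ch_(c), state_(is_char) {}
    basic_character(eof_type) : ch_(), state_(is_eof) {}
    basic_character(would_block_type) : ch_(), state_(is_would_block) {}

    // Only meaningful when a character was actually retrieved.
    Ch value() const { assert(state_ == is_char); return ch_; }
    explicit operator bool() const { return state_ == is_char; }

    bool operator==(eof_type) const { return state_ == is_eof; }
    bool operator!=(eof_type) const { return state_ != is_eof; }
    bool operator==(would_block_type) const { return state_ == is_would_block; }
    bool operator!=(would_block_type) const { return state_ != is_would_block; }

private:
    enum state { is_char, is_eof, is_would_block };
    Ch ch_;
    state state_;
};

// Hypothetical source that replays a fixed script, then reports eof.
class scripted_source {
public:
    explicit scripted_source(std::vector<basic_character<char> > script)
        : script_(std::move(script)), pos_(0) {}

    basic_character<char> get() {
        if (pos_ == script_.size()) return eof;
        return script_[pos_++];
    }

private:
    std::vector<basic_character<char> > script_;
    std::size_t pos_;
};

// A toupper rule written once; eof and would_block pass through untouched.
template<typename Source>
basic_character<char> get_upper(Source& src) {
    basic_character<char> c = src.get();
    if (!c) return c;
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c.value())));
}
```

One thing the sketch brings out: the enumerator eof and a non-member function eof(c) cannot be declared in the same scope, so offering both spellings would need different names or separate namespaces.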
This is a real problem with eagain, since testing may not reveal the error unless eagain happens to occur at the right place in a character sequence.
You'll need special test sources and sinks that can be configured to produce "would_block" after N calls.
I think I might use file devices which process a random number of characters at a time.
You could return a special type in lieu of bool, and basic_character in lieu of char_type, which require inspecting whether they indicate an error condition. If that query is not done, the destructor can complain.
Could you elaborate? It sounds like it could lead to very poor performance.
I just mean that querying the status sets a flag in the object that the dtor checks. If the flag wasn't set, then the dtor complains. The complaint code could be an assertion or maybe a message printed on stderr. You can even conditionally compile away the checks.
That's what I thought you meant. I think this might be a good idea for a debug mode. Thanks again! Jonathan
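The "destructor complains" idea discussed above might look something like this sketch. The class name `checked_result`, the macro `CHECKED_RESULT_NO_DIAGNOSTICS` and the counter are all assumptions made for the example; nothing here is taken from the library.

```cpp
#include <cstdio>

// Hypothetical debug-mode wrapper for a success flag. Querying ok() marks the
// object as checked; the destructor complains if that never happened.
class checked_result {
public:
    explicit checked_result(bool ok) : ok_(ok), checked_(false) {}

    // Moving hands the obligation to check over to the new object.
    checked_result(checked_result&& other)
        : ok_(other.ok_), checked_(other.checked_) { other.checked_ = true; }

    checked_result(const checked_result&) = delete;
    checked_result& operator=(const checked_result&) = delete;

    bool ok() const { checked_ = true; return ok_; }

    ~checked_result() {
#ifndef CHECKED_RESULT_NO_DIAGNOSTICS   // hypothetical switch to compile the check away
        if (!checked_) {
            ++unchecked_count();
            std::fprintf(stderr, "warning: result destroyed without being checked\n");
        }
#endif
    }

    // Number of results destroyed unchecked so far (only here to make the
    // behaviour observable in a test).
    static int& unchecked_count() { static int n = 0; return n; }

private:
    bool         ok_;
    mutable bool checked_;
};
```

The per-object flag costs one bool and one branch in the destructor, and both disappear when the macro is defined, which is why it seems better suited to a debug mode than to release builds.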

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
Then, library users can remain oblivious to char_traits if they don't already know about it.
Or if they already know about it they can start to try to forget about it. ;-)
:-)
I've started writing some non-blocking filters, and they don't look so bad, as long as they process one character at a time. So my inclination is to make all filters non-blocking except for a couple of convenience filters which process an entire document at a time. In that case, get will always return basic_character.
You gave the caveat, "as long as they process one character at a time." What about the other case? read() and write() return numbers, not characters, right? basic_character doesn't come into play there. Are you talking about something else?
One more possibility is this:
enum eof_type { eof };
enum would_block_type { would_block };
template<typename Ch>
class basic_character {
    basic_character(Ch = Ch());
    basic_character(eof_type);
    basic_character(would_block_type);
    operator Ch () const;
    operator safe_bool () const;
    bool operator==(eof_type) const;
    bool operator!=(eof_type) const;
    bool operator==(would_block_type) const;
    bool operator!=(would_block_type) const;
    // All the other operators we discussed
};
This would allow the usage:
if (c == eof) { ... }
if (c == would_block) { ... }
How do you like this?
Hmmm. You've complicated the interface still more to get that syntax, which is more verbose besides. I don't care for it since I'm happy with the look of this: if (eof(c)) ... if (would_block(c)) ... Still, if there are folks firmly entrenched in the camp that prefers the (in)equality operator for those tests, it is an excellent approach. (If you're going to fatten the interface to provide this, I suggest keeping the non-member functions, too.) -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis":
I've started writing some non-blocking filters, and they don't look so bad, as long as they process one character at a time. So my inclination is to make all filters non-blocking except for a couple of convenience filters which process an entire document at a time. In that case, get will always return basic_character.
You gave the caveat, "as long as they process one character at a time." What about the other case? read() and write() return numbers, not characters, right? basic_character doesn't come into play there. Are you talking about something else?
Oops -- I forgot to finish my overview. Right now I have InputFilter, MulticharInputFilter, OutputFilter and MulticharOutputFilter. On top of these are built one_step_filter (a convenience class) and symmetric_filter_adapter (useful for converting C interfaces).

I'm thinking I'll promote symmetric_filter_adapter to a full-fledged concept SymmetricFilter, and recommend it as the filter concept to use for high-performance applications. I'll get rid of the Multichar filters, because I've found writing non-blocking Multichar filters to be extremely messy; it's just as easy to write a SymmetricFilter in that case.

So there will be three types of non-blocking filters: InputFilter (renamed PullFilter), OutputFilter (renamed PushFilter) and SymmetricFilter. There will also be two kinds of filters for beginners: one_step_filter, in which an entire document is presented in a vector and the filtered version must be appended to a second vector, and stdio_filter, in which the filter reads from std::cin and writes to std::cout.

In the tutorial, I'll analyze each of the current example filters in detail (except that the presidential one will just be called dictionary_filter). I'll start by showing how to implement the algorithm using a stdio_filter, then I'll show how to modify it to implement the more advanced filter concepts.
One more possibility is this:
enum eof_type { eof };
enum would_block_type { would_block };
template<typename Ch>
class basic_character {
    basic_character(Ch = Ch());
    basic_character(eof_type);
    basic_character(would_block_type);
    operator Ch () const;
    operator safe_bool () const;
    bool operator==(eof_type) const;
    bool operator!=(eof_type) const;
    bool operator==(would_block_type) const;
    bool operator!=(would_block_type) const;
    // All the other operators we discussed
};
This would allow the usage:
if (c == eof) { ... }
if (c == would_block) { ... }
How do you like this?
Hmmm. You've complicated the interface still more to get that syntax, which is more verbose besides. I don't care for it since I'm happy with the look of this:
if (eof(c)) ... if (would_block(c)) ...
Okay, I just wanted to get your opinion. I think I like the functions better too.
Still, if there are folks firmly entrenched in the camp that prefers the (in)equality operator for those tests, it is an excellent approach. (If you're going to fatten the interface to provide this, I suggest keeping the non-member functions, too.)
I think it should be one or the other. So I'll use the functions. Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Oops -- I forgot to finish my overview. Right now I have InputFilter, MulticharInputFilter, OutputFilter and MulticharOutputFilter. On top of these are built one_step_filter (a convenience class) and symmetric_filter_adapter (useful for converting C interfaces).
I'm thinking I'll promote symmetric_filter_adapter to a full-fledged concept SymmetricFilter, and recommend it as the filter concept to use for high-performance applications. I'll get rid of the Multichar filters, because I've found writing non-blocking Multichar filters to be extremely messy; it's just as easy to write a SymmetricFilter in that case.
So there will be three types of non-blocking filters: InputFilter (renamed PullFilter), OutputFilter (renamed PushFilter) and SymmetricFilter. There will also be two kinds of filters for beginners: one_step_filter, in which an entire document is presented in a vector and the filtered version must be appended to a second vector, and stdio_filter, in which the filter reads from std::cin and writes to std::cout.
In the tutorial, I'll analyze each of the current example filters in detail (except that the presidential one will just be called dictionary_filter). I'll start by showing how to implement the algorithm using a stdio_filter, then I'll show how to modify it to implement the more advanced filter concepts.
Sounds great. The reduction in the number of concepts is valuable. The tutorial progression is a good approach. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
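To make the "whole document in one vector, result appended to another" idea concrete, here is a minimal sketch. The thread does not give the actual one_step_filter interface, so the member name `filter`, its signature, and the helper `apply_one_step` are assumptions chosen for illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical beginner-style filter: sees the entire input at once and
// appends the transformed document to dest.
struct toupper_one_step_filter {
    void filter(const std::vector<char>& src, std::vector<char>& dest) const {
        dest.reserve(dest.size() + src.size());
        for (char c : src)
            dest.push_back(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
    }
};

// Hypothetical helper that runs such a filter over a std::string.
template<typename Filter>
std::string apply_one_step(const Filter& f, const std::string& text) {
    std::vector<char> src(text.begin(), text.end());
    std::vector<char> dest;
    f.filter(src, dest);
    return std::string(dest.begin(), dest.end());
}
```

A filter written this way never sees EOF or would_block at all, which is presumably what makes it suitable as a first step in a tutorial before moving on to the non-blocking concepts.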

On Wed, 2 Mar 2005 21:03:43 -0700, "Jonathan Turkanis" <technews@kangaroologic.com> said:
Hi All,
Hi. Sorry I'm replying to this thread so late! [ snip ]
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
template<typename Ch>
class basic_character {
public:
    basic_character(Ch c);
    operator Ch () const;
    bool good() const;
    bool eof() const;
    bool fail() const;
};

typedef basic_character<char>    character;
typedef basic_character<wchar_t> wcharacter;

character eof();   // returns an EOF indication
character fail();  // returns a temporary failure indication.

wcharacter weof();
wcharacter wfail();
[Omitted: templated versions of eof and fail]
How about if the character class has a safe conversion to bool which returns (!fail() && !eof()) ? All the filter code I've seen (mostly yours, admittedly :) ) calls 'get' in a while loop; how about instead of checking for 'good' status all the time, as in this code:
class uncommenting_input_filter : public input_filter {
public:
    explicit uncommenting_input_filter(char comment_char = '#')
        : comment_char_(comment_char)
        { }

    template<typename Source>
    character get(Source& src)
    {
        character c = boost::io::get(src);
        if (c.good() && c == comment_char_)
            while (c.good() && c != '\n')
                c = boost::io::get(src);
        return c;
    }
private:
    char comment_char_;
};
make it part of the loop continuation test, where the occurrence of EOF or EAGAIN terminates the loop: (BTW, this example (is meant to) take into account the call-again-after-EAGAIN issue)

[Warning - don't try to compile this...]

struct uncommenting_input_filter : public input_filter {
    explicit uncommenting_input_filter(char comment_char)
        : comment_char_(comment_char), in_comment_(false)
        {}

    template<typename Source>
    character get(Source& src)
    {
        character c;
        if (in_comment_) {
            while (in_comment_ && (c = boost::io::get(src))) {
                // c is not EOF or EAGAIN
                if (c == '\n') {
                    in_comment_ = false;
                }
            }
            if (in_comment_)
                // c is EOF or EAGAIN
                return c;
        }
        if ((c = boost::io::get(src))) {
            // c is not EOF or EAGAIN
            if (c == comment_char_) {
                in_comment_ = true;
                return this->get(src);
            }
        }
        return c;
    }

    char comment_char_;
    bool in_comment_;
};
Similarly, usenet_filter::get (http://tinyurl.com/6xqvk) could be rewritten:
[ snip ]

template<typename Source>
int get(Source& src)
{
    if (current_word_complete_ || eof_) {
        // Return any characters we buffered
        if (current_word_.size()) {
            int next = *current_word_.begin();
            current_word_.erase(current_word_.begin());
            return next;
        } else {
            if (eof_)
                return EOF;
            else
                current_word_complete_ = false;
        }
    }

    character c;
    while ((c = boost::io::get(src))) {
        // c is not EOF or EAGAIN
        if (is_alpha(c)) {
            current_word_.push_back(c);
        } else if (current_word_.size()) {
            map_type::iterator it = dictionary_.find(current_word_);
            if (it != dictionary_.end()) {
                current_word_ = (*it).second;
            }
            current_word_complete_ = true;
            return this->get(src);
        } else
            return c;
    }

    // c is EOF or EAGAIN
    if ((c == EOF) && (current_word_.size())) {
        eof_ = true;
        return this->get(src);
    } else
        return c;
}

In this mode, EOF and EAGAIN handling both disappear unless you're doing something clever like buffering, since in both cases the filter doesn't want to do anything with the character received except return it to the caller.
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
If you rely on the boolean conversion, you often won't need to care whether the character is good(), fail() or otherwise.
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
Actually, moving the algorithm state out of the single 'get' call is the real complication... Now, there may be any number of reasons why this is unworkable - I'm sorry that I haven't had a chance to try the idea on your library before posting...
Please let me know your opinion.
Jonathan
Hope this helps in some way, Matt -- Matthew Vogt mattvogt@warpmail.net

Matthew Vogt wrote:
On Wed, 2 Mar 2005 21:03:43 -0700, "Jonathan Turkanis" <technews@kangaroologic.com> said:
Hi All,
Hi. Sorry I'm replying to this thread so late!
No problem -- I need all the input I can get.
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
<snip synopsis>
How about if the character class has a safe conversion to bool which returns (!fail() && !eof()) ?
I'd really like to do this. In fact, this was my first idea of how it would work. Unfortunately, when I tried implementing it I realized that a safe bool conversion interferes with the conversion to char; only one of the two can be implicit. So I could have a safe bool conversion and require that users explicitly call c.value() (or c.get()) when they want to extract a character.
All the filter code I've seen (mostly yours, admittedly :) ) calls 'get' in a while loop; how about instead of checking for 'good' status all the time, as in this code:
struct uncommenting_input_filter : public input_filter { explicit uncommenting_input_filter(char comment_char) : comment_char_(comment_char), in_comment_(false) {}
template<typename Source> character get(Source& src) { character c; if (in_comment_) { while (in_comment_ && c = boost::io::get(src)) { // c is not EOF or EAGAIN if (c == '\n')
if (c.value() == '\n')
{ in_comment_ = false; } } if (in_comment_) // c is EOF or EAGAIN return c; }
if (c = boost::io::get(src)) { // c is not EOF or EAGAIN if (c == comment_char_)
if (c.value() == comment_char_)
{ in_comment_ = true; return this->get(src); } } return c; } }
I guess it looks okay with c.value(). What do you think?
In this mode, EOF and EAGAIN handling both disappear unless you're doing something clever like buffering, since in both cases the filter doesn't want to do anything with the character received except return it to the caller.
This is a good way to explain how to write non-blocking filters. Correct me if I'm wrong, but I think the same description applies to code which uses good(c) instead of a safe bool conversion.
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
If you rely on the boolean conversion, you often won't need to care whether the character is good(), fail() or otherwise.
You'd be relying on the conversion as a substitute for good, no?
2. Harder to use. Having to check for eof and fail makes writing simple filters, like the above, slightly harder. I'm worried that the effect on more complex filters may be even worse. This applies not just to get, but to the other functions as well, since their return values will require more careful examination.
Actually, moving the algorithm state out of the single 'get' call is the real complication...
You're right. Thanks, Matt!
Matt
Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Matthew Vogt wrote:
On Wed, 2 Mar 2005 21:03:43 -0700, "Jonathan Turkanis" <technews@kangaroologic.com> said:
My preferred solution is to have get() return an instance of a specialization of a class template basic_character which can hold a character, an EOF indication or a temporary failure indication:
<snip synopsis>
How about if the character class has a safe conversion to bool which returns (!fail() && !eof()) ?
I'd really like to do this. In fact, this was my first idea of how it would work. Unfortunately, when I tried implementing it I realized that a safe bool conversion interferes with the conversion to char; only one of the two can be implicit. So I could have a safe bool conversion and require that users explicitly call c.value() (or c.get()) when they want to extract a character.
[snip example showing result of these ideas]
I guess it looks okay with c.value(). What do you think?
I don't like it.
In this mode, EOF and EAGAIN handling both disappear unless you're doing something clever like buffering, since in both cases the filter doesn't want to do anything with the character received except return it to the caller.
This is a good way to explain how to write non-blocking filters. Correct me if I'm wrong, but I think the same description applies to code which uses good(c) instead of a safe bool conversion.
That's the right approach: just code in terms of good(c). You get simplified code without the oddity of asking a basic_character for its character. The real value in the suggestion is that one should ignore fail() and eof() conditions in the filter. That's something you can document and your examples can show the simplified form.
V. Problems ----------------------------
1. Harder to learn. Currently the function get and the concept InputFilter are very easy to explain. I'm afraid having to understand the basic_character template before learning these functions will discourage people from using the library.
If you rely on the boolean conversion, you often won't need to care whether the character is good(), fail() or otherwise.
You'd be relying on the conversion as a substitute for good, no?
Syntactic sugar is nice when it isn't too sweet. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
I guess it looks okay with c.value(). What do you think?
I don't like it.
I don't like it either, but it does have the benefit of simplifying the code by removing all the explicit good() tests. I think explicit testing for good status is something that's going to be very easy to forget, while you're trying to write 'simple' filters that are implemented in get functions which can (effectively) be called asynchronously.
That's the right approach: just code in terms of good(c). You get simplified code without the oddity of asking a basic_character for its character. The real value in the suggestion is that one should ignore fail() and eof() conditions in the filter. That's something you can document and your examples can show the simplified form.
But the 'simplified' form will have (c = get(src) && c.good()) everywhere, and the abstraction of algorithm state will result in more convoluted boolean expressions where testing 'good' will be very easy to omit...
You'd be relying on the conversion as a substitute for good, no?
Syntactic sugar is nice when it isn't too sweet.
Well, if it helps correctness, it isn't really sugar. Matt

Matthew Vogt wrote:
Rob Stewart wrote:
I guess it looks okay with c.value(). What do you think?
I don't like it.
That's the right approach: just code in terms of good(c). You get simplified code without the oddity of asking a basic_character for its character. The real value in the suggestion is that one should ignore fail() and eof() conditions in the filter. That's something you can document and your examples can show the simplified form.
But the 'simplified' form will have (c = get(src) && c.good()) everywhere, and the abstraction of algorithm state will result in more convoluted boolean expressions where testing 'good' will be very easy to omit...
The idiom would be: if (good(c = get(src))) { ... } Jonathan

On Wed, 9 Mar 2005 13:48:42 -0700, "Jonathan Turkanis" <technews@kangaroologic.com> said:
The idiom would be:
if (good(c = get(src))) { ... }
Yep, that's fine with me. Matt -- Matthew Vogt mattvogt@warpmail.net
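Putting the agreed idiom together with the state-outside-get point made earlier in the thread, a non-blocking uncommenting filter might be sketched as below. This is an illustration, not library code: the namespace `demo`, the `character` struct, `string_source` and `filter_all` are hypothetical stand-ins, and the filter keeps the newline that ends a comment, as in the first version quoted above.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace demo {

enum class status { good, eof, would_block };

// Hypothetical stand-in for basic_character<char>.
struct character {
    char   ch;
    status st;
};

inline bool good(character c) { return c.st == status::good; }
inline bool eof(character c)  { return c.st == status::eof; }

// Hypothetical source. If block_every >= 2, every block_every-th call
// reports would_block without consuming any input.
struct string_source {
    std::string data;
    std::size_t block_every;
    std::size_t pos;
    std::size_t calls;
    string_source(std::string d, std::size_t n)
        : data(std::move(d)), block_every(n), pos(0), calls(0) {}
};

inline character get(string_source& src) {
    ++src.calls;
    if (src.block_every >= 2 && src.calls % src.block_every == 0)
        return character{'\0', status::would_block};
    if (src.pos == src.data.size())
        return character{'\0', status::eof};
    return character{src.data[src.pos++], status::good};
}

class uncommenting_input_filter {
public:
    explicit uncommenting_input_filter(char comment_char = '#')
        : comment_char_(comment_char), in_comment_(false) {}

    template<typename Source>
    character get(Source& src) {
        character c;
        while (good(c = demo::get(src))) {
            if (in_comment_) {
                if (c.ch == '\n') {      // newline ends the comment and is kept
                    in_comment_ = false;
                    return c;
                }
            } else if (c.ch == comment_char_) {
                in_comment_ = true;      // remembered across calls
            } else {
                return c;
            }
        }
        return c;  // EOF or would_block: hand it to the caller unchanged
    }

private:
    char comment_char_;
    bool in_comment_;
};

// Pulls a whole text through the filter, simply retrying after would_block.
inline std::string filter_all(const std::string& text, std::size_t block_every) {
    string_source src(text, block_every);
    uncommenting_input_filter f;
    std::string out;
    for (;;) {
        character c = f.get(src);
        if (good(c)) out += c.ch;
        else if (eof(c)) break;
    }
    return out;
}

} // namespace demo
```

Because in_comment_ lives in the filter rather than in a local variable, a would_block in the middle of a comment loses nothing: the next call to get() resumes skipping where the previous one stopped.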

From: Matthew Vogt <mattvogt@warpmail.net>
Rob Stewart wrote:
I guess it looks okay with c.value(). What do you think?
I don't like it.
I don't like it either, but it does have the benefit of simplifying the code by removing all the explicit good() tests. I think explicit testing for good status is something that's going to be very easy to forget, while you're trying to write 'simple' filters that are implemented in get functions which can (effectively) be called asynchronously.
Your code had implicit goodness tests. They, too, can be forgotten, but I understand your point is that this: if (c = get(src)) isn't checking whether c is a non-zero character, but whether it is a good character.
That's the right approach: just code in terms of good(c). You get simplified code without the oddity of asking a basic_character for its character. The real value in the suggestion is that one should ignore fail() and eof() conditions in the filter. That's something you can document and your examples can show the simplified form.
But the 'simplified' form will have (c = get(src) && c.good()) everywhere, and the abstraction of algorithm state will result in more convoluted boolean expressions where testing 'good' will be very easy to omit...
I was thinking of "good(c = get(src))" which isn't convoluted.
You'd be relying on the conversion as a substitute for good, no?
Syntactic sugar is nice when it isn't too sweet.
Well, if it helps correctness, it isn't really sugar.
One thing to remember is that a filter is write once, use many times (modulo bug fixes). Once it works, it works. The syntactic sugar you're suggesting applies in only a few expressions in such a filter, so it isn't much of a win, is it?

Let's review. There are several things one needs to do with a basic_character:

- return it from a function
- test it to determine success
- compare it to a char/wchar_t

Have I missed anything? Given those requirements the class needs:

- value semantics
- safe-bool conversion
- comparisons with char/wchar_t

Therefore, I think this will work:

template <typename Ch>
class basic_character {
public:
    basic_character(Ch const);
    Ch value() const;
    operator unspecified-bool-type() const;
    bool operator ==(Ch const) const;
    bool operator !=(Ch const) const;
    bool operator >(Ch const) const;
    bool operator >=(Ch const) const;
    bool operator <(Ch const) const;
    bool operator <=(Ch const) const;
    friend bool good(basic_character const);
    friend bool fail(basic_character const);
    friend bool eof(basic_character const);
    friend bool would_block(basic_character const);
private:
    Ch ch_;
};

Now you can write both of these:

if (c = get(src))
if (c == comment_char_)

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
Let's review. There are several things one needs to do with a basic_character:
- return it from a function
- test it to determine success
- compare it to a char/wchar_t
Have I missed anything?
See below.
Given those requirements the class needs:
- value semantics
- safe-bool conversion
- comparisons with char/wchar_t
Therefore, I think this will work:
<snip synopsis of basic_character with lots of operator overloading>
Now you can write both of these:
if (c = get(src))
if (c == comment_char_)
Suppose you write a filter which expects ASCII characters. You might want to perform arithmetic operations on characters, e.g.

if (c >= 65 && c < 91) c += 32; // Convert to lowercase.

This may turn out to be pretty common. So we need +, -, +=, -=, too. Also, this ignores named functions which we might want to pass a character to, and the operations that a custom character type might support. I'm not sure if the safe-bool conversion is worth all this trouble. Fortunately, it's not a major design change. I'll soon be writing lots of non-blocking filters, and I can try both versions. Jonathan

Suppose you write a filter which expects ASCII characters. You might want to perform arithmetic operations on characters, e.g.
if (c >= 65 && c < 91) c += 32; // Convert to lowercase.
Surely this is something you would want to discourage?
I'm not sure if the safe-bool conversion is worth all this trouble. Fortunately, it's not a major design change. I'll soon be writing lots of non-blocking filters, and I can try both versions.
Well, that's certainly the best way to know. I hope it works out. Matt -- Matthew Vogt mattvogt@warpmail.net

Matthew Vogt wrote:
Suppose you write a filter which expects ASCII characters. You might want to perform arithmetic operations on characters, e.g.
if (c >= 65 && c < 91) c += 32; // Convert to lowercase.
Surely this is something you would want to discourage?
Why? I think it's usually discouraged because people should generally use locales. But if you're writing, e.g., a base64 encoder, the exact numerical values are important. I think this will be true enough of the time that I want people to be able to write c+= 32. Jonathan
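As a rough illustration of what "c += 32" would require once the implicit conversion to char is given up in favour of a bool test, consider the sketch below. It leans on assumptions: C++11's `explicit operator bool` stands in for the safe-bool idiom under discussion, `make_eof` and `to_lower_ascii` are invented names, and only the operators needed for the example are shown.

```cpp
// Hypothetical sketch: bool test plus just enough operators for ASCII arithmetic.
template<typename Ch>
class basic_character {
public:
    basic_character(Ch c = Ch()) : ch_(c), good_(true) {}

    // Invented factory for a non-good value (EOF here).
    static basic_character make_eof() {
        basic_character c;
        c.good_ = false;
        return c;
    }

    explicit operator bool() const { return good_; }  // true only for a real character
    Ch value() const { return ch_; }

    basic_character& operator+=(int n) { ch_ = static_cast<Ch>(ch_ + n); return *this; }
    basic_character& operator-=(int n) { ch_ = static_cast<Ch>(ch_ - n); return *this; }

    friend bool operator>=(basic_character c, int n) { return c.ch_ >= n; }
    friend bool operator<(basic_character c, int n)  { return c.ch_ < n; }

private:
    Ch   ch_;
    bool good_;
};

typedef basic_character<char> character;

// Lowercases 'A'..'Z' by numeric value; anything else, including EOF,
// is returned untouched.
inline character to_lower_ascii(character c) {
    if (c && c >= 65 && c < 91)
        c += 32;
    return c;
}
```

Each operator that the implicit char conversion used to provide for free has to be written out by hand here, which is the interface growth being weighed in this part of the thread.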

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
Let's review. There are several things one needs to do with a basic_character:
- return it from a function
- test it to determine success
- compare it to a char/wchar_t
Have I missed anything?
See below.
Given those requirements the class needs:
- value semantics
- safe-bool conversion
- comparisons with char/wchar_t
Therefore, I think this will work:
<snip synopsis of basic_character with lots of operator overloading>
Now you can write both of these:
if (c = get(src))
if (c == comment_char_)
Suppose you write a filter which expects ASCII characters. You might want to perform arithmetic operations on characters, e.g.
if (c >= 65 && c < 91) c += 32; // Convert to lowercase.
This may turn out to be pretty common. So we need +, -, +=, -=, too. Also, this
Those are easy enough to add, so my suggestion still works. The unfortunate thing is that this scheme makes basic_character far more complicated, though it can be presented progressively (start with ctor, safe-bool conversion, value(), good(), eof(), fail(), and would_block(); later discuss comparison operators; still later discuss numeric operators).
ignores named functions which we might want to pass a character to, and the operations that a custom character type might support.
There's still the value() member function.
I'm not sure if the safe-bool conversion is worth all this trouble. Fortunately, it's not a major design change. I'll soon be writing lots of non-blocking filters, and I can try both versions.
I'm sorry you have to duplicate your work, but that is a good way to decide. If there's no clear winner, post examples so we can compare them. Whatever you choose, someone's bound to ask why you didn't do it another way. I suggest adding a FAQ while the decision is still fresh. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Suppose you write a filter which expects ASCII characters. You might want to perform arithmetic operations on characters, e.g.
if (c >= 65 && c < 91) c += 32; // Convert to lowercase.
This may turn out to be pretty common. So we need +, -, +=, -=, too. Also, this
Those are easy enough to add, so my suggestion still works.
Correct.
The unfortunate thing is that this scheme makes basic_character far more complicated,
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
though it can be presented progressively (start with ctor, safe-bool conversion, value(), good(), eof(), fail(), and would_block(); later discuss comparison operators; still later discuss numeric operators).
ignores named functions which we might want to pass a character to, and the operations that a custom character type might support.
I'm thinking now it won't be much of a problem, since the conversion to char will likely be the only admissible conversion in such cases: no one will overload a function to take a basic_character::safe_bool, and if the function is a template, the argument will be deduced as basic_character.
There's still the value() member function.
True.
I'm not sure if the safe-bool conversion is worth all this trouble. Fortunately, it's not a major design change. I'll soon be writing lots of non-blocking filters, and I can try both versions.
I'm sorry you have to duplicate your work, but that is a good way to decide. If there's no clear winner, post examples so we can compare them.
I don't think it will be so bad. I'll write the conditional tests using good() instead of the safe-bool conversion, with the hope that it will work for both versions of basic_character. If it does, I can replace occurrences of good() with the safe-bool conversion and throw out the original version of basic_character.
Whatever you choose, someone's bound to ask why you didn't do it another way. I suggest adding a FAQ while the decision is still fresh.
Good idea. Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
The unfortunate thing is that this scheme makes basic_character far more complicated,
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
What's fictional? The operators to which you refer would not be implemented as members, so you can add a section like this to the basic_character interface section:

For numeric and character comparisons, basic_character also has the following operators available, where OP is ==, !=, >, <, >=, and <=:

template <typename Ch> bool operator OP(basic_character<Ch>, Ch);
template <typename Ch> bool operator OP(basic_character<Ch>, int);
template <typename Ch> bool operator OP(Ch, basic_character<Ch>);
template <typename Ch> bool operator OP(int, basic_character<Ch>);

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
The unfortunate thing is that this scheme makes basic_character far more complicated,
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
What's fictional? The operators to which you refer would not be implemented as members,
They might be friends implemented in-class.
so you can add a section like this to the basic_character interface section:
For numeric and character comparisons, basic_character also has the following operators available, where OP is ==, !=, >, <, >=, and <=:
template <typename Ch> bool operator OP(basic_character<Ch>, Ch);
template <typename Ch> bool operator OP(basic_character<Ch>, int);
template <typename Ch> bool operator OP(Ch, basic_character<Ch>);
template <typename Ch> bool operator OP(int, basic_character<Ch>);
That's pretty good -- I like "Op". I also need overloads for + and -, which should return basic_character, I think. And +=/-=. Jonathan
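The "OP" family above, together with a + that returns basic_character, could be generated along these lines. This is a sketch under assumptions: the macro name `SKETCH_COMPARISON` is invented, the class is reduced to a value holder, and the overload set as written would be ambiguous if Ch were int itself.

```cpp
// Hypothetical, reduced basic_character: just enough to hang the operators on.
template<typename Ch>
class basic_character {
public:
    basic_character(Ch c = Ch()) : ch_(c) {}
    Ch value() const { return ch_; }
private:
    Ch ch_;
};

// Generates the four non-member overloads listed above for one operator.
// Assumes Ch is not int, otherwise the Ch and int forms would collide.
#define SKETCH_COMPARISON(OP)                                                  \
    template<typename Ch>                                                      \
    bool operator OP(basic_character<Ch> a, Ch b) { return a.value() OP b; }   \
    template<typename Ch>                                                      \
    bool operator OP(basic_character<Ch> a, int b) { return a.value() OP b; }  \
    template<typename Ch>                                                      \
    bool operator OP(Ch a, basic_character<Ch> b) { return a OP b.value(); }   \
    template<typename Ch>                                                      \
    bool operator OP(int a, basic_character<Ch> b) { return a OP b.value(); }

SKETCH_COMPARISON(==)
SKETCH_COMPARISON(!=)
SKETCH_COMPARISON(<)
SKETCH_COMPARISON(<=)
SKETCH_COMPARISON(>)
SKETCH_COMPARISON(>=)

#undef SKETCH_COMPARISON

// Arithmetic returns a basic_character, as suggested in the reply above.
template<typename Ch>
basic_character<Ch> operator+(basic_character<Ch> c, int n) {
    return basic_character<Ch>(static_cast<Ch>(c.value() + n));
}

template<typename Ch>
basic_character<Ch> operator-(basic_character<Ch> c, int n) {
    return basic_character<Ch>(static_cast<Ch>(c.value() - n));
}
```

Since all of these are free function templates, the class synopsis itself stays short and the operators can be documented once, in the "where OP is ..." style proposed above.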

From: "Jonathan Turkanis" <technews@kangaroologic.com> Date: Thu, 10 Mar 2005 15:20:23 -0700
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
The unfortunate thing is that this scheme makes basic_character far more complicated,
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
What's fictional? The operators to which you refer would not be implemented as members,
They might be friends implemented in-class.
Sure, but they aren't strictly part of basic_character's interface so they don't have to be in the synopsis for the class. Their being implemented as friends in the definition is an implementation detail that doesn't matter for documentation purposes. Thus, I wouldn't call the result of their omission a fictional synopsis. Indeed, the meaning of "synopsis" means you can elide details. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis": Rob Stewart wrote:
From: "Jonathan Turkanis":
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
What's fictional? The operators to which you refer would not be implemented as members,
They might be friends implemented in-class.
Sure, but they aren't strictly part of basic_character's interface so they don't have to be in the synopsis for the class. Their being implemented as friends in the definition is an implementation detail that doesn't matter for documentation purposes.
Thus, I wouldn't call the result of their omission a fictional synopsis. Indeed, the meaning of "synopsis" means you can elide details.
I don't want to get distracted by the issue of whether the fact that an operator is defined in a friend declaration can be considered an implementation detail. My real question is whether I can document the basic_character interface, broadly conceived, as simpler than it really is, and add a note explaining what's missing. I don't want a simple library element to require a huge section of documentation. Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis": Rob Stewart wrote:
From: "Jonathan Turkanis":
Yes, I was hoping to limit the interface to a single conversion operator. I'd hate to see someone just learning the library look up get() in the reference section, click on the return type and be confronted with a monstrous synopsis. Could I present a "fictional" synopsis of basic_character, which doesn't show all the overloads, and include a note explaining the problem?
What's fictional? The operators to which you refer would not be implemented as members,
They might be friends implemented in-class.
Sure, but they aren't strictly part of basic_character's interface so they don't have to be in the synopsis for the class. Their being implemented as friends in the definition is an implementation detail that doesn't matter for documentation purposes.
Thus, I wouldn't call the result of their omission a fictional synopsis. Indeed, the meaning of "synopsis" means you can elide details.
I don't want to get distracted by the issue of whether the fact that an operator is defined in a friend declaration can be considered an implementation detail. My real question is whether I can document the basic_character interface, broadly conceived, as simpler than it really is, and add a note explaining what's missing. I don't want a simple library element to require a huge section of documentation.
I'm not sure you got my point. I don't think you can avoid documenting the full interface of basic_character, including the namespace scope operators. However, you can provide a synopsis of the class that shows only the class members with a following section that discusses other functions that work with basic_character to give it a fuller interface. Thus, when clicking on the return type of get(), one sees a reasonably small class definition and discussion thereof. If one continues reading, one will learn about the namespace scope functions that augment that class' interface. If one doesn't continue reading, one simply returns to the previous page thinking basic_character is a pretty simple class. Indeed, one might write code using only that rudimentary knowledge of basic_character and, following the lead of the existing filters and examples, take advantage of the wider interface and not even notice. Eventually, such a one probably will wonder why certain expressions would work and will investigate the broader interface. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I don't want to get distracted by the issue of whether the fact that an operator is defined in a friend declaration can be considered an implementation detail. My real question is whether I can document the basic_character interface, broadly conceived, as simpler than it really is, and add a note explaining what's missing. I don't want a simple library element to require a huge section of documentation.
I'm not sure you got my point. I don't think you can avoid documenting the full interface of basic_character, including the namespace scope operators.
But if they're defined as friends, there not technically namespace scope operators. I guess I can define them at namespace scope just to avoid this problem.
However, you can provide a synopsis of the class that shows only the class members with a following section that discusses other functions that work with basic_character to give it a fuller interface.
Thus, when clicking on the return type of get(), one sees a reasonably small class definition and discussion thereof.
Okay.
If one continues reading, one will learn about the namespace scope functions that augment that class' interface. If one doesn't continue reading, one simply returns to the previous page thinking basic_character is a pretty simple class.
Indeed, one might write code using only that rudimentary knowledge of basic_character and, following the lead of the existing filters and examples, take advantage of the wider interface and not even notice. Eventually, such a one probably will wonder why certain expressions would work and will investigate the broader interface.
Maybe the reference section for basic_character can start out with some examples. Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I don't want to get distracted by the issue of whether the fact that an operator is defined in a friend declaration can be considered an implementation detail. My real question is whether I can document the basic_character interface, broadly conceived, as simpler than it really is, and add a note explaining what's missing. I don't want a simple library element to require a huge section of documentation.
I'm not sure you got my point. I don't think you can avoid documenting the full interface of basic_character, including the namespace scope operators.
But if they're defined as friends, there not technically namespace scope operators. I guess I can define them at namespace scope just to avoid this problem.
If they are friends defined in a class template, they are, by definition, namespace scope functions. It just happens that they aren't declared/defined until the template is specialized. The fact that you use that implementation technique to make the T specialization of an operator a friend of basic_character<T> doesn't change the nature of the operators. The fact that they are friends is merely an implementation detail that permits their being defined within the class definition in the first place. Let me put it another way:
    namespace boost { namespace iostreams {
        template <typename Ch>
        inline bool operator==(basic_character<Ch> lhs, Ch rhs)
        {
            return lhs.value() == rhs;
        }
    } }
That operator works just fine. It doesn't need to be a friend. It doesn't have to be implemented within the class definition, so it doesn't need to be part of the class synopsis. OTOH, if some of the operators needed to be friends for some reason, that's an implementation detail and you can still document them as == above.
-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I don't want to get distracted by the issue of whether the fact that an operator is defined in a friend declaration can be considered an implementation detail. My real question is whether I can document the basic_character interface, broadly conceived, as simpler than it really is, and add a note explaining what's missing. I don't want a simple library element to require a huge section of documentation.
I'm not sure you got my point. I don't think you can avoid documenting the full interface of basic_character, including the namespace scope operators.
But if they're defined as friends, there
they're
not technically namespace
scope operators. I guess I can define them at namespace scope just to avoid this problem.
If they are friends defined in a class template, they are, by definition, namespace scope functions.
I guess I misspoke; it's true that they are namespace scope operators; however, they do not introduce new names into the namespace, so the operators cannot be explicitly namespace qualified. Therefore users can tell the difference between a friend function defined in class and a function defined outside the class. As a result, if I document them as defined out of class, but implement them in class, the synopsis you suggest would still be fictional. That's why I said that perhaps I should just implement them out of class to avoid complicating the docs. I think this problem is trivial enough that we've already spent too much time on it. ;-) You've already helped me a great deal. I'm hoping I can get your input on some more important questions which will be coming up soon. Thanks again! Jonathan
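The lookup difference described here can be illustrated with a small sketch. The namespace demo, both structs, and friend_lookup_demo are hypothetical names chosen for illustration; non-template types are used to keep it short, but the same lookup rule applies to friends defined inside a class template.

```cpp
#include <cassert>

namespace demo {

// Operator defined in-class as a friend. It is a member of namespace demo,
// but its name is only found through argument-dependent lookup (ADL).
struct in_class {
    char c;
    friend bool operator==(in_class lhs, char rhs) { return lhs.c == rhs; }
};

// Operator declared and defined at namespace scope.
struct out_of_class {
    char c;
};
inline bool operator==(out_of_class lhs, char rhs) { return lhs.c == rhs; }

} // namespace demo

// Returns true when all of the well-formed calls behave as expected.
bool friend_lookup_demo()
{
    demo::in_class a = { 'x' };
    demo::out_of_class b = { 'x' };

    // Ordinary operator syntax finds both operators through ADL.
    bool via_adl = (a == 'x') && (b == 'x');

    // A qualified call works for the operator declared at namespace scope.
    bool qualified = demo::operator==(b, 'x');

    // The equivalent qualified call for the in-class friend would not compile,
    // because qualified lookup does not see a friend that was only declared
    // inside the class:
    // demo::operator==(a, 'x');

    return via_adl && qualified;
}
```

So a user can observe the difference only by naming the operator with explicit qualification; ordinary expressions such as a == 'x' behave identically in both cases.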

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Rob Stewart wrote:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I'm not sure you got my point. I don't think you can avoid documenting the full interface of basic_character, including the namespace scope operators.
But if they're defined as friends, there
they're
Quite.
not technically namespace
scope operators. I guess I can define them at namespace scope just to avoid this problem.
If they are friends defined in a class template, they are, by definition, namespace scope functions.
I guess I misspoke; it's true that they are namespace scope operators; however, they do not introduce new names into the namespace, so the operators cannot be
Huh? 14.5.3/1: "the name shall be an unqualified id that declares (or redeclares) an ordinary (nontemplate) function."
explicitly namespace qualified. Therefore users can tell the difference between a friend function defined in class and a function defined outside the class.
Again, huh? Am I missing something?
As a result, if I document them as defined out of class, but implement them in class, the synopsis you suggest would still be fictional. That's why I said that perhaps I should just implement them out of class to avoid complicating the docs.
I don't think that's the case.
I think this problem is trivial enough that we've already spent too much time on it. ;-)
No doubt.
You've already helped me a great deal. I'm hoping I can get your input on some more important questions which will be coming up soon. Thanks again!
Great! -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Rob Stewart wrote:
Let's review. There are several things one needs to do with a basic_character:
- return it from a function - test it to determine success - compare it to a char/wchar_t
Have I missed anything?
Given those requirements the class needs:
- value semantics - safe-bool conversion - comparisons with char/wchar_t
Therefore, I think this will work:
I just verified that Borland 5.6.4 complains of ambiguity when safe-bool and char conversions are combined. The same is true with a void* conversion instead of safe-bool. It works with a bool conversion instead of safe-bool, but this makes me a bit nervous. I'll have to wait until I have a bunch of filters implemented. Jonathan
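For comparison only: compilers supporting C++11 or later offer explicit conversion operators, which were not available to the compilers discussed in this thread. The following is a sketch under that assumption, not the code Rob proposed and not the library's class; the character type and explicit_bool_demo are hypothetical. It combines a success test with an implicit char conversion without the safe-bool idiom.

```cpp
#include <cassert>

// Hypothetical simplified result type: a character plus a success flag.
class character {
public:
    character() : c_(0), good_(false) {}           // no character available
    character(char c) : c_(c), good_(true) {}      // a valid character

    // Used only in contextual or explicit conversions to bool, e.g. if (c).
    explicit operator bool() const { return good_; }

    // Used for implicit conversions, e.g. comparisons and assignment to char.
    operator char() const { return c_; }
private:
    char c_;
    bool good_;
};

// Returns true when the conversions are selected as the comments describe.
bool explicit_bool_demo()
{
    character good('\0');   // valid character whose value happens to be zero
    character bad;          // failed read

    bool good_tests_true = false;
    if (good) good_tests_true = true;    // operator bool: success, despite '\0'

    bool bad_tests_false = true;
    if (bad) bad_tests_false = false;    // operator bool: failure

    char raw = good;                     // operator char
    return good_tests_true && bad_tests_false
        && raw == '\0' && character('a') == 'a';
}
```

In the if statements the explicit operator bool is preferred because it yields bool exactly, while the char conversion would need a further char-to-bool conversion; in the comparison only the non-explicit char conversion is eligible.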

On Thu, 10 Mar 2005 15:36:40 -0700, "Jonathan Turkanis" <technews@kangaroologic.com> said:
I just verified that Borland 5.6.4 complains of ambiguity when safe-bool and char conversions are combined. The same is true with a void* conversion instead of safe-bool. It works with a bool conversion instead of safe-bool, but this makes me a bit nervous. I'll have to wait until I have a bunch of filters implemented.
I realise this is getting out of hand, but you can get around this with two layers of implicit conversion :) Like this (I know the real code is templated, this is just illustration):
    // return type for boost::io::get
    struct character : public boost::spirit::safe_bool<character>
    {
        // return type for filter::get
        struct value_type
        {
            value_type(const character& src) : c(src.value()) {}
            operator char() const { return c; }
        private:
            char c;
        };
        character(void) {}
        character(char value) : c(value) {}
        char value(void) const { return c; }
        bool operator_bool(void) const { return good(); }
    private:
        char c;
    };
    struct some_filter // details omitted
    {
        template<typename Source>
        character::value_type get(Source& src)
        {
            character c;
            if (c = boost::io::get(src))
            {
                // c is good()
            }
            return c;
        }
    };
    void user_func(void)
    {
        char c = get(some_source);
    }
Obviously, this is getting further and further from the original code, but only in terms of comprehension. The filter code itself remains transparent.
Matt
-- Matthew Vogt mattvogt@warpmail.net

Matthew Vogt wrote:
I realise this is getting out of hand,
indeed :-)
but you can get around this with two layers of implicit conversion :) Like this (I know the real code is templated, this is just illustration):
    // return type for boost::io::get
    struct character : public boost::spirit::safe_bool<character>
    {
        // return type for filter::get
        struct value_type
        {
            value_type(const character& src) : c(src.value()) {}
            operator char() const { return c; }
        private:
            char c;
        };
        character(void) {}
        character(char value) : c(value) {}
        char value(void) const { return c; }
        bool operator_bool(void) const { return good(); }
    private:
        char c;
    };
    struct some_filter // details omitted
    {
        template<typename Source>
        character::value_type get(Source& src)
        {
            character c;
            if (c = boost::io::get(src))
            {
                // c is good()
            }
            return c;
        }
    };
I'm not sure how this works; within some_filter get, do you have to use c.value()?
void user_func(void) { char c = get(some_source); }
Obviously, this is getting further and further from the original code, but only in terms of comprehension. The filter code itself remains transparent.
I'm going to try using char conversion and a safe-bool conversion, except on Borland where the safe-bool conversion will be a plain conversion. If it doesn't work, I'll go back to good(). Thanks for all your suggestions.
Matt
Jonathan

Jonathan Turkanis wrote:
    struct some_filter // details omitted
    {
        template<typename Source>
        character::value_type get(Source& src)
        {
            character c;
            if (c = boost::io::get(src))
            {
                // c is good()
            }
            return c;
        }
    };
I'm not sure how this works; within some_filter get, do you have to use c.value()?
No, provided you give the character class all the operators defined for char comparison, etc.
I'm going to try using char conversion and a safe-bool conversion, except on Borland where the safe-bool conversion will be a plain conversion. If it doesn't work, I'll go back to good().
Thanks for all your suggestions.
Suggestions are cheap :) Matt

From: "Jonathan Turkanis" <technews@kangaroologic.com>
I just verified that Borland 5.6.4 complains of ambiguity when safe-bool and char conversions are combined. The same is true with a void* conversion instead of safe-bool. It works with a bool conversion instead of safe-bool, but this makes me a bit nervous. I'll have to wait until I have a bunch of filters implemented.
I have no idea how old that compiler is or how important it is to your user base. Can you ignore it like so many are now ignoring MSVC 6.x? -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Jonathan Turkanis wrote:
if (c = boost::io::get(src)) { // c is not EOF or EAGAIN if (c == comment_char_)
if (c.value() == comment_char_)
{ in_comment_ = true; return this->get(src); } } return c; } }
I guess it looks okay with c.value(). What do you think?
Can the character class have operator==(char), or does this need to be a template? Matt

Matthew Vogt wrote:
Jonathan Turkanis wrote:
if (c = boost::io::get(src)) { // c is not EOF or EAGAIN if (c == comment_char_)
if (c.value() == comment_char_)
{ in_comment_ = true; return this->get(src); } } return c; } }
I guess it looks okay with c.value(). What do you think?
Can the character class have operator==(char), or does this need to be a template?
To support both the implicit conversion to char and the safe bool conversion we'd need to implement many more operators than operator==, and I think there would still be problems with ambiguity. Jonathan
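The c.value() style discussed above can be sketched as a self-contained comment-stripping filter. Everything here is an assumption made for illustration: the character type with good()/value(), the in-memory string_source, the rule that a comment runs to the end of the line, and the helper strip_comments. The loop replaces the recursive this->get(src) call from the quoted fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: either a character or "nothing available".
class character {
public:
    character() : c_(0), good_(false) {}
    explicit character(char c) : c_(c), good_(true) {}
    bool good() const { return good_; }
    char value() const { return c_; }
private:
    char c_;
    bool good_;
};

// Hypothetical in-memory source; a failed get() here simply means end of input.
class string_source {
public:
    explicit string_source(const std::string& s) : data_(s), pos_(0) {}
    character get()
    {
        return pos_ < data_.size() ? character(data_[pos_++]) : character();
    }
private:
    std::string data_;
    std::size_t pos_;
};

// Drops everything from comment_char up to, but not including, the newline.
class comment_filter {
public:
    explicit comment_filter(char comment_char)
        : comment_char_(comment_char), in_comment_(false) {}

    template <typename Source>
    character get(Source& src)
    {
        for (;;) {
            character c = src.get();
            if (!c.good())
                return c;                      // pass the failure through unchanged
            if (in_comment_) {
                if (c.value() != '\n')
                    continue;                  // still inside the comment
                in_comment_ = false;           // the newline ends the comment
                return c;
            }
            if (c.value() == comment_char_) {  // explicit value() comparison
                in_comment_ = true;
                continue;
            }
            return c;
        }
    }
private:
    char comment_char_;
    bool in_comment_;
};

// Runs the filter over a whole string and collects the surviving characters.
std::string strip_comments(const std::string& text, char comment_char)
{
    string_source src(text);
    comment_filter filter(comment_char);
    std::string out;
    for (character c = filter.get(src); c.good(); c = filter.get(src))
        out += c.value();
    return out;
}
```

With good() and value() as the only accessors, the filter needs no conversion operators or comparison overloads at all, which is the trade-off weighed in the messages above.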
participants (6)
- Jonathan Turkanis
- Keith Burton
- Matthew Vogt
- Peter Dimov
- Rob Stewart
- Thorsten Ottosen