
Daryle, I think this discussion is getting overheated. (See, e.g., the long code excerpts containing 'jon_xxx'.) If I was a bit harsh in my first comments on your library, I'm sorry. I did vote to include a large percentage of it.

On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
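The header-level check might look something like this (an untested sketch; the configuration macros shown are illustrative only, not what the library actually uses):

    #include <boost/static_assert.hpp>

    // Hypothetical guard in the memory-mapped-file header; firing a
    // static assertion on inclusion when the platform is unsupported:
    #if !defined(BOOST_WINDOWS) && !defined(BOOST_HAS_UNISTD_H)
        BOOST_STATIC_ASSERT(false);  // memory-mapped files unsupported here
    #endif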
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on Windows).
All file-systems can simulate a tree/container calculus, so a portable interface can be defined.
Again, Boost.Filesystem doesn't do this.
But memory-mapped files and file descriptors are totally meaningless on some environments; what would the code map to in those cases?
See above.
2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary-oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
I'm complaining that binary I/O should _not_ be treated as a variant of text I/O (which your library assumes).
All I/O is treated as streams of characters. When these streams of characters require special 'textual' interpretation, you can use a newline_filter, for line-ending conversion, or a converter, for code conversion.
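For example, to read a text file with line endings normalized (a sketch; I'm assuming here that newline_filter takes a flag naming the target convention):

    filtering_istream in;
    in.push(newline_filter(newline::posix));  // convert line endings to '\n'
    in.push(file_source("doc.txt"));
    // read text from in.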
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together. E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:
    filtering_istream in;
    in.push(regex_filter(regex("damn"), "darn"));
    in.push(zlib_decompressor());
    in.push(file_source("essay.z"));
    // read from in.
Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
By itself, nothing. But these compression schemes only work with bytes, so you have hidden at least one text <-> binary converter in your code.
(BTW, the file_source above should have been opened in binary mode.) All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and a wide-character regex_filter.

If the above example were disallowed, then in the common case that output is stored in a form which can be directly mapped to the internal character set without code conversion, the user would be forced to insert a do-nothing adapter. The current library trusts users to know when they are dealing with data which must be converted to a wide character type before it can be processed by text-oriented filters.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I've recently noticed that even your documentation for the Resource and Filter concepts admits that they're just like certain C++ or C I/O functions.
You mean when I say, for example, "Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"? The functions boost::io::read, boost::io::write, etc., are indeed generic versions of these familiar functions. I mention the familiar functions as a way to introduce readers to the generic versions. The benefits of generic programming are well known, I hope.
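For instance, reading a block from an arbitrary Source looks almost exactly like the fread call it generalizes (a sketch; I'm assuming a FILE* f and a Source src, e.g. a file_source, are in scope, and that the signature mirrors its stdio counterpart as described above):

    char buf[100];
    std::size_t     n1 = std::fread(buf, 1, 100, f);      // stdio
    std::streamsize n2 = boost::io::read(src, buf, 100);  // any Source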
There are two main reasons to write Sources and Sinks instead of stream buffers:
1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:
    template<typename Ch>
    struct null_buf {
        typedef Ch       char_type;
        typedef sink_tag category;
        void write(const Ch*, std::streamsize) { }
    };
than to write your null_buf, which is 79 lines long.
That's really misleading. The null-sink I have does a lot more. I keep track of how many characters passed through (i.e., a value-added function), and I optimize for single vs. multiple character output.
Okay:

    template<typename Ch>
    class null_buf {
    public:
        typedef Ch       char_type;
        typedef sink_tag category;
        null_buf() : count_(0) { }
        void write(const Ch*, std::streamsize n) { count_ += n; }
        int count() const { return count_; }
    private:
        int count_;
    };

This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
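To use it, just plug it into a facade (an untested sketch):

    streambuf_facade< null_buf<char> > buf;
    std::ostream out(&buf);
    out << "hello, world!";
    out.flush();           // flush first, since the facade buffers by default
    int n = buf->count();  // 13; the resource is reached via operator->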
Also, I'm verbose in my writing style. If I wanted to be compact I could just do:
//========================================================================
template< typename Ch, class Tr = std::char_traits<Ch> >
class basic_nullbuf : public std::basic_streambuf<Ch, Tr> {
protected:
    // These typedefs re-export the base's types for two-phase lookup:
    typedef Tr                    traits_type;
    typedef typename Tr::int_type int_type;

    // Overridden virtual functions
    virtual int_type overflow( int_type c = traits_type::eof() )
        { return traits_type::not_eof( c ); }
};
But that doesn't do what my version, listed above, does.
And for those of you who think that "traits_type" is scary: get over it! Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and WRONG. The whole point of the traits class is so that a character type isn't forced to define those operators. Worse, those operators could exist but be inappropriate. For example, Josuttis' STL book has a string type that implements case-insensitive comparisons with a custom traits type. Using operator== directly would have missed that. Ignoring the policies of the traits type's creator could betray his/her vision of usage.
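For example (a sketch), a generic search must compare through the traits class to respect the traits author's policy:

    #include <string>

    // With Josuttis' case-insensitive traits, Tr::eq('A', 'a') is true
    // even though 'A' == 'a' is false.
    template<typename Ch, typename Tr>
    bool contains(const std::basic_string<Ch, Tr>& s, Ch c)
    {
        typedef typename std::basic_string<Ch, Tr>::const_iterator iter;
        for (iter it = s.begin(); it != s.end(); ++it)
            if (Tr::eq(*it, c))   // NOT: *it == c
                return true;
        return false;
    }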
In early versions of my library, filters and resources had traits types as well as character types. Prompted by remarks of Gennadiy Rozental, I made a careful study and found that traits could be eliminated from the public interface of the filter/resource module of the library without sacrificing generality or correctness, except in the case of the return type of get, which is still std::char_traits<char_type>::int_type. Even this could be eliminated by having get return optional<char> (see the sketch below). For a more ambitious proposal along these lines, see http://tinyurl.com/6r8p2. Of course, filter and resource authors may need to use char_traits to implement member functions read, write, etc. .... But I'm not sure I see where this discussion is going.
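Here is what the optional<char> variation would look like, by the way (a sketch of the idea only; this is *not* the library's current interface):

    #include <boost/optional.hpp>

    struct my_source {
        typedef char       char_type;
        typedef source_tag category;
        // An empty optional signals end-of-sequence, so neither
        // char_traits<char>::int_type nor eof() is needed:
        boost::optional<char> get();
    };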
2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:
    string s;
    boost::io::copy(
        concatenate(
            file_source("file1"),
            file_source("file2"),
            file_source("file3")
        ),
        back_insert_resource(s)
    );
A straw-man? Wouldn't an iterator-based solution have been better? (There are stream(-buffer) iterators, and (string) insert iterators. If the Boost iterator library provides a chaining iterator type, then the standard copying procedure could be used.)
It's tempting to try to do everything using iterators. In fact, Robert Ramey's original suggestion to expand the library to handle filtering suggested that it be based on iterator adapters. (http://lists.boost.org/MailArchives/boost/msg48300.php) The problem with this approach is that it misses the opportunity for many important optimizations that can be made when one is presented with a contiguous buffer full of characters, instead of one character at a time.
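To make the difference concrete (a sketch; 'src' and 'snk' stand for any Source and Sink):

    // Character-at-a-time, which is what an iterator-based design implies:
    int c;
    while ((c = src.get()) != EOF)
        snk.put(c);

    // Buffer-at-a-time; each call can process a whole block at once,
    // using memcpy, tight loops, etc.:
    char buf[4096];
    std::streamsize n;
    while ((n = src.read(buf, 4096)) > 0)
        snk.write(buf, n);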
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
It looked like you changed the interface just to change the interface, not out of any actual need. What about the following (untested) code:
I'm going to ignore the code, which seems sarcastic. (Don't name stuff after me until I'm dead.) Instead, let me quote part of my response to Dietmar Kuehl: Jonathan Wrote:
... The protected virtual interface of basic_streambuf is, IMO, quite strange. The functions have weird names: underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff, etc. -- the functions read, write, and seek are much more intuitive. The specifications of the standard functions are tricky, too. For example, overflow (one of the better-named functions) is specified roughly like this:
virtual int_type overflow(int_type c = traits_type::eof());
"If c is not eof, attempts to insert into the output sequence the result of converting c to a character. If this can't be done, returns eof or throws an exception. Otherwise returns any value other than eof."
Contrast this with
void write(const char_type* s, std::streamsize n);
"Writes the sequence of n characters starting at s to the output sequence, throwing an exception in case of error."
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me. <snip lots of code>
The price is a code size many times larger than the conventional system,
Are you talking about the size of the library or the size of the generated code?
The size of the library.
1. The library is big partly because it contains a lot of special-purpose components, such as compression filters. You don't pay for them if you don't use them.

2. The support for the generic read and write operations is quite lightweight.

3. If you use the library just to define new stream buffer types, then in addition to (2) the main code comes from <boost/io/detail/streambufs/indirect_streambuf.hpp>, which is the generic streambuf implementation, and from <boost/io/detail/adapters/resource_adapter.hpp> and <boost/io/detail/adapters/filter_adapter.hpp>, which are lightweight wrappers that allow indirect_streambuf to interact with filters and resources using a single interface.

4. If you want to chain filters, then in addition to (2) and (3), the main code comes from <boost/io/detail/chain.hpp>, which at 16k is a small price to pay for a flexible filtering framework.
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
... The sample stream-buffer in More-I/O generally had added-value member functions attached, that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. ... The Iostreams framework seems to totally ignore the issue! ...
With a streambuf_facade or stream_facade you can access the underlying resource directly using operators * and ->. E.g.,
    stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
    ...
    if (tcp->input_closed()) {
        ...
    }
Maybe I should stress this more in the documentation. (I imagine some people won't like the use of operators * and -> here, but these can be replaced by a member function such as resource().)
I didn't like the iterator "look" those operations have.
Noted.
Also, is a stream-façade an actual stream?
Yes.
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
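For instance (purely hypothetical; nothing like this exists in the library today), a non-blocking Source might be given its own category and a way to say "no characters available yet" that is distinct from end-of-sequence:

    struct my_nonblocking_source {
        typedef char            char_type;
        typedef nonblocking_tag category;  // hypothetical category tag
        // Returns the number of characters read: 0 means 'would block',
        // -1 means end-of-sequence.
        std::streamsize read(char* s, std::streamsize n);
    };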
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
You have a potential problem: standard C++ I/O is "too hard". But you got the wrong solution: throw away standard I/O's legacy and start over from scratch (but include transition code).
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those have been implemented around the standard framework?
Of course -- with massive code duplication.

Jonathan