
"Daryle Walker" <darylew@hotmail.com> wrote:
On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory-mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
An object that can never be configured to work (for those deficient platforms) isn't very useful.
On those platforms, yes. On supported platforms, it can be very useful.
I know that thread (and rarely file-system) classes have the same potential drawback, but I feel that threads and file systems are more general "computer science concepts" than memory mapped files, and so allowances could be made for the latter class ideas.
Threads and filesystem support are good additions to boost (and would be to the standard) because they are useful, not because they are general "computer science concepts".
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on Windows).
Maybe that should be considered a bug.
It's useful in contexts where thread support can be turned on or off with a command-line switch. It's probably a bad approach on systems which don't support threads at all.
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
Not separating concepts that have notable distinctions is not a service. (That's why I separated regular pointer-based streams from the ones for pointers-to-const in my library. The "savings" in making only one set of class code wasn't worth mixing the semantics of the two stream types.)
What's wrong with this analogy: Saying that a sequence of characters represents 'text' is like saying that a sequence of characters represents a 'picture' (i.e., that it conforms to some image file format specification, such as jpeg, png, etc.) In order to interpret the data properly, the user must know something about its internal structure, and must in general apply an additional layer of software for the content to be usable. In the case of a sequence of characters representing Chinese text, the user must apply code conversion to produce a wide character representation. In the case of a sequence of characters representing a jpeg image, the user must apply a jpeg interpreter to produce an object representing the image size, pixel data, etc. In the first case, it would be naive to expect that sending the raw character sequence to std::cout will print Chinese characters to the console. In the second case, it would be naive to expect that sending the raw character sequence to std::cout will display a jpeg image on the console. So, do we need another family of resource concepts for 'pictures'? <snip history of C and C++ text/binary distinction>
If you're going to start over from scratch with I/O, why not go all the way and finally split-off binary I/O? Stop it from being treated as "text I/O with funny settings".
I'm not starting from scratch. I'm trying to make it easier to use the existing framework. (In the future, the library may be extended beyond the existing framework.)
filtering_istream in;
in.push(regex_filter(regex("damn"), "darn"));
in.push(zlib_decompressor());
in.push(file_source("essay.z"));
// read from in.
All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and wide-character regex_filter.
That's a major implicit assumption.
It's not fundamentally different from the assumption that a sequence of characters contains a gif image.

filtering_istream in;
in.push(gif_to_jpeg());
in.push(file_source("pony.gif"));
// read jpeg data from in.

Trust the programmer.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I have no problems with locales. I was noting that the more features you added to the base classes, the more they looked like the rearrangements of the standard I/O base classes.
Localizability is an optional behavior. Most filters and resources won't implement it. Filters and resources *do not* have to derive from the convenience base classes source, sink, input_filter, etc. Since localizability was so easy to add as a no-op, I gave these base classes no-op implementations of imbue and i/o categories refining localizable_tag. Programmers will rarely use this feature, but it imposes no runtime overhead and very little compile-time overhead, so I don't see any problem.
I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
You mean when I say, for example,
"Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"
?
Yes. But I was thinking more of the equivalent paragraph you gave in the documentation about Resources.
I think I need to change this part of the documentation. Unlike fread, etc, the basic_streambuf member functions can't be assumed to be familiar to most programmers. I should probably use istream::read, istream::write, etc. The reason I didn't is that these functions don't have the right return types, which is not a good reason since neither does streambuf::sputn.
template<typename Ch>
class null_buf {
public:
    typedef Ch char_type;
    typedef sink_tag category;
    null_buf() : count_(0) { }
    void write(const Ch*, std::streamsize n) { count_ += n; }
    int count() const { return count_; }
private:
    int count_;
};
This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
I don't see any buffering. (I guess it'll be in whatever class you hook this up to, like "streambuf_facade".)
Right.
Which version, the first or second?
The second.
(Hopefully the first, since I wrote my code above after the first version, and you wrote the second as a response.) If it's the first, then what is my version missing? (If it's the second, then look at the version of the code under my review before comparing.)
I did. That's how I knew it was 79 lines long. It doesn't provide buffering, as far as I can tell.
The traits type carries the policies for comparing and copying (and EOF issues). Does the user have the option for overriding policies so they're not based on "std::char_traits<Ch>"?
As I said, the only place character traits are used in the public interface of filters and resources is in the return type of get. For this purpose, std::char_traits<Ch>::int_type should always be sufficient. At any rate, I'm considering changing it either to optional<char> or to a class type that can store a char, an eof indicator, or a 'no input available -- try back later' indicator. Then there would be absolutely no use of character traits. If you want to define a streambuf_facade with a custom char_traits type, you can do so using the second template parameter:

template< typename T,
          typename Tr = ...,
          typename Alloc = ...,
          typename Mode = ... >
class streambuf_facade;
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me.
But is it always worth the extra layer of indirection you introduce (when you need to interface with standard-looking I/O)?
The indirection, mostly contained in <boost/io/operations.hpp>, is fairly lightweight. Users never need to look at it. I'm not sure why you're so concerned about it.
[SNIP concerns about total code size (in terms of header text length)]
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
Just the large amount of "detail"-level headers.
Fairly typical for boost, I'm afraid.
[SNIP about forwarding to the base-stream's value-added functions and on the nature of the stream facades.]
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
Whoa.
I just had my "a-ha" moment.
I thought you re-did the interface for streaming concepts just to be arbitrary. But you actually did it because you have issues with the architectural philosophy used by the standard I/O framework, right?! You want to fix the problems with current streaming by re-imagining the architecture (i.e. starting from scratch), and you decided to re-do the interface to match.
As I said above, I don't think I'm redoing it from scratch -- I'm just generalizing a little. Later, I might generalize even more.
I guess one issue is that you're extending functionality through templates, while the standard framework uses virtual member functions.
I don't think virtual functions are an issue. Virtual function calls are only slightly more expensive than ordinary (non-inlined) function calls, and one can't expect all function calls to be inlined when you have a chain of non-trivial filters. One must rely on buffering to mitigate the function call overhead. Since the static types of the filtering streams and stream buffers do not depend on the static types of the filters and resources in the underlying chain, some type of runtime indirection, such as virtual functions, is required. I'm actually taking advantage of the streambuf virtual functions as a feature -- not a liability. If I didn't have basic_streambuf to serve as the 'glue' for filter chains, I'd have to write my own version, probably using virtual functions.
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
I meant that existing libraries would have already chosen to base their I/O around the standard framework, if they had no need to customize the I/O experience.
If the library is accepted -- and becomes widely used -- I expect that developers will want to write sources and sinks instead of stream buffers. Existing stream buffers can be rewritten as sources or sinks fairly easily in many cases.
You have a potential problem: standard C++ I/O is "too hard". But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code).
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
I gave an example (the code you snipped) of how the simplified core interface could be integrated with the standard framework. What are the other difficulties?
I don't understand what's wrong with the way I've done it.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those have been implemented around the standard framework?
Of course -- with massive code duplication.
Duplication where? (My question above assumed that your new architecture never existed and you built your other stuff around the standard framework.)
Right. A lot of typical stream buffer implementation is boilerplate, esp. if buffering is used.
About the Overlap Between Our Contributions
A bunch of people during my I/O review wanted to defer decisions to see your I/O review. I'm not sure that there's a need to pick one or the other, given how they work.
The review managers will sort this out.
I had no intention of redoing the concepts of I/O, so all my sources and sinks extend the standard framework.
You built a whole new framework, hopefully to address problems with the standard framework.
Again, I just wanted to make the standard framework easier to use.
You built your sources and sinks to work with your framework. And you added adaptors so the new-I/O classes can work with std-I/O classes.
It's really the other way around. And the adapters are so thin you could crush them just by leaning against them ;-)
There's no problem with efficiency if new-I/O is used throughout the user's code, since you use a lot of template goodness.

However, if the user needs to interface with std-I/O, at the user end or the final destination end, they will have to take a performance hit, since std-I/O will call virtual functions which you can't remove. (The guy who writes the "xpressive" library seems to have techniques around the problem, but I'm not sure they can be applied here. [I don't know what the techniques are.] The std-I/O virtual call dispatch takes place in the standard stream classes, so the "xpressive" technique can't work if code changes are needed.)

In these mixed cases, using the new framework can be a win if the time spent on the task itself outweighs the time spent in the adaptor code. If the task at hand has a std-I/O interface, doesn't touch the issues that new-I/O was meant to solve, and can be succinctly expressed with std-I/O, then there is no advantage to making and/or using a new-I/O version, since the layer of indirection given by the adaptor class is the bigger bottleneck. (The pointer-based streams are an example of this.)
I think there's a basic misunderstanding here. The adapters generally have no virtual functions and function calls through the adapters are optimized away entirely. (I've confirmed this on several compilers. It should be true for any decent optimizing compiler.) There is currently an inefficiency when you add a standard stream or stream buffer to the end of a filtering stream, as I describe in the message "IOStreams Formal review -- Guide for Reviewers". This will be eliminated entirely if the library is accepted.
The point is that one set of classes doesn't preclude the usage of the other. Each one has situations where it's the better solution.
As far as I can tell, the two valid points you have made, w.r.t. our two contributions, are:

1. Using my library to define a null_buf, pointerbuf or value_buf causes more code to be included. This is a legitimate criticism, but I don't think you've made the case that the amount of code included is so enormous that there should be two versions of the same components in boost.

2. The object code will be slightly larger when using a streambuf_facade. (Actually, I'm not sure you made that point, but I think it's correct.) This can be mitigated somewhat if it turns out to be a problem, but I don't think you have shown yet that it is.

Best Regards,
Jonathan