
Sebastian Redl <sebastian.redl@getdesigned.at> writes:
> A few weeks ago, a discussion that followed the demonstration of the binary_iostream library made me think about the standard C++ I/O and what I would expect from an I/O model.
> Now I have a preliminary design document ready and would like to have some feedback from the Boost community on it.
> The document can be found here: http://windmuehlgasse.getdesigned.at/newio/
> I'd especially like input on the unresolved issues, but all comments are welcome, even if you tell me that what I'm doing is completely pointless and misguided. (At least I'd know not to waste my time with refining and implementing the design. :-))

I am pleased to see you taking an interest in a new I/O library for C++. The existing C++ I/O facilities have always bothered me, but I've never gotten around to trying to write something better. I have a number of comments. They aren't particularly well structured, because I didn't bother to reorganize them after initially just writing down thoughts as they occurred to me.

- I think it is important to look at the Boost.Iostreams architecture and make sure to include or reuse any of its ideas, or even actual code, if possible. One idea from that library to consider is the direct/indirect device distinction.

- Binary transport layer issue: Rename the "binary transport layer" to the "byte transport layer" to make it clear that it is for bytes. Platforms with unusual features, like 9-bit bytes or the inability to handle types smaller than 32 bits, can possibly still implement the interface for a text/character transport layer, perhaps on top of some other lower-level transport that need not be part of the Boost library. Clearly, the text encoding and decoding would have to be done differently on such platforms anyway.

- Asynchronous issue: Asynchronous I/O is extremely useful, but it also requires a very different architecture: something like the asio io_service is needed to manage requests, and a function to call on completion or error must be provided. One issue is that there are very large differences between platforms (Windows and Linux). On Linux, asynchronous I/O via efficient polling for readiness is possible for sockets and pipes using epoll (and somewhat less efficiently using select and poll), but these mechanisms cannot be used for regular files. I think there may be other asynchronous I/O mechanisms on Linux that do support regular files, at least on some filesystems, but they are not easily combined with epoll and the other methods suitable for sockets. Furthermore, even if read and write are asynchronous, open will always be synchronous on Linux.
  It may therefore not be feasible to implement a proper asynchronous I/O interface on Linux. Even on Windows, I believe it may not be possible to get an asynchronous open. Thus, I think I agree that it would be better to avoid including an asynchronous I/O interface in this library, although a bit more thought should probably go into the decision before it is made.

- Seeking: Maybe make multiple mark/reset use the same interface as seeking, for simplicity. Just define that a seeking device has the additional restriction that the mark type is an offset, and that the argument to seek need not be the result of a call to tell. Another issue is whether to standardize the return type of tell, like std::ios_base::streampos in the C++ iostreams library.

- Binary formatting (perhaps "data format" would be a better name?): I think it is important to provide a way to format {uint,int}{8,16,32,64}_t as either little- or big-endian two's complement (and possibly also one's complement). It might be useful to look at the not-yet-official Boost endian library in the Boost Vault. A similar variety of output formats for floating-point types should also be supported. It is also important to offer the most efficient output format as an option (i.e. writing the in-memory representation of the type directly, e.g. via reinterpret_cast). It should probably also be possible to determine at compile time, using the library, what the native format is. It is not clear what to do about platforms whose native format is not any standard format.

- Header vs. precompiled: I think as much as possible should be separately compiled, but I also think that type erasure should not be used in any case where it would significantly compromise performance.

- The "byte" stream and the character stream, while conceptually different, should probably both be considered just "streams" of particular POD types.
  The interfaces will in general be exactly the same as far as reading, writing, seeking, and filtering are concerned.

- Text transport: I don't think this layer should be restricted to Unicode encodings. Rather, a text transport should just be a "stream" of type T, where T might be uint8_t, uint16_t, or uint32_t depending on the character encoding. For full generality, the library should provide facilities for converting between any two of a large list of encodings. (For simplicity, some of these conversions might internally be implemented by first converting to one encoding, like UTF-16, and then converting to the other encoding, if a direct conversion is not specially coded.) I think it is important to require support for a minimal set of encodings, which should include at least all of the common Unicode encodings, and perhaps all of the ISO-8859-* encodings as well, in addition to ASCII.

- Text formatting: For text formatting, I think it would be very useful to look at the IBM ICU library. It may in fact make sense to leave text formatting to a separate library (for example, a Unicode library), since it is somewhat encoding-specific, a huge task in itself, and not very related to this I/O library. As long as the I/O library provides a suitable character stream interface, an arbitrary formatting facility can be used on top of it.

-- 
Jeremy Maitin-Shepard