Fw: [rfc] I/O Library Design

----- Original Message ----- From: "Sebastian Redl" <sebastian.redl@getdesigned.at> To: <boost@lists.boost.org> Sent: Monday, June 18, 2007 3:51 AM Subject: [boost] [rfc] I/O Library Design
[snip]
The document can be found here: http://windmuehlgasse.getdesigned.at/newio/
I'd especially like input on the unresolved issues, but all comments are welcome, even if you tell me that what I'm doing is completely pointless and misguided. (At least I'd know not to waste my time with refining and implementing the design. :-))
Hi Sebastian,
Thanks for the read of your doc. On the basis of that and the quality of the related postings I think your efforts have already paid off.
I made a couple of attempts to write a decent analysis of your design but they quickly became too detailed and not suitable for this mailing list. I suspect that my point of view also needs more work.
Some of your open issues;
* Basic Unit
Small but ugly issue. My feelings are that non-8-bit-byte architectures need to be explicitly chopped out of the scope or a pure abstract "basic unit" (basun? like boson only more elusive :-) needs to be defined in a similar manner to a codepoint name. It's a strategic decision. Personally I would go for the 8-bit-only.
* Async Requests
All I/O is more cleanly considered to be async. A sync model of access can always be implemented over the top.
* Putback
Very contentious. Currently I am swinging towards "no". I have a rule for all of my encodings that each item has "positive termination" or, in language processing terms, "simple accepting states".
* Representation/Endian
I think this issue should be bundled with "parsing". Exactly what the ntohl functions do is a nice simple model for what should be done here.
* Inexact bit counts
Refer to "Basic Unit".
* Buffer types and encodings
Buffers typed to encodings? No. The only thing buffered will be blocks of basic units.
* Interface (I/O streams-like needed?)
Yes, if only because adding the backward compatibility should be easy given the design/impl goes well.
Some general points;
1. Confusion around char, byte, text, binary, encoding and codepoint
For me this has been a bit frustrating (it's been untidy for a long time) and also illuminating (Unicode). For me there are bytes (or basic units) and items of application data. Everything in between is encoding.
2. Inclusion of "endianness" and "representation" in the binary layers.
IIUC you are allowing applications to declare that they will only talk to (e.g.) Motorola-based machines. I suppose this can be justified but from an engineering point of view the strategy implicit in ntohl is more appealing. The subtle drawback of allowing the declaration of endianness and the fact that underlying operations (e.g. network nagling) shaft it anyhow makes it a "no go" for me.
3. Lack of extensibility
While your design doesn't actually preclude this, it also isn't explicit about it being possible, i.e. how would you redo your diagram for an application that is using different encodings over different network connections and to data files.
Cheers, Scott

"Scott Woods" <scott.suzuki@gmail.com> writes:
----- Original Message ----- From: "Sebastian Redl" <sebastian.redl@getdesigned.at> To: <boost@lists.boost.org> Sent: Monday, June 18, 2007 3:51 AM Subject: [boost] [rfc] I/O Library Design
[snip]
The document can be found here: http://windmuehlgasse.getdesigned.at/newio/
I'd especially like input on the unresolved issues, but all comments are welcome, even if you tell me that what I'm doing is completely pointless and misguided. (At least I'd know not to waste my time with refining and implementing the design. :-))
Hi Sebastian,
Thanks for the read of your doc. On the basis of that and the quality of the related postings I think your efforts have already paid off.
I made a couple of attempts to write a decent analysis of your design but they quickly became too detailed and not suitable for this mailing list. I suspect that my point of view also needs more work.
Some of your open issues;
* Basic Unit
Small but ugly issue. My feelings are that non-8-bit-byte architectures need to be explicitly chopped out of the scope or a pure abstract "basic unit" (basun? like boson only more elusive :-) needs to be defined in a similar manner to a codepoint name. It's a strategic decision. Personally I would go for the 8-bit-only.
* Async Requests
All I/O is more cleanly considered to be async. A sync model of access can always be implemented over the top.
In theory, this might be true, but in practice there is a significant difference in interfaces required for synchronous operations and the interfaces required for asynchronous operations: synchronous operations can essentially just block, while asynchronous operations need not only access to something like the io_service from asio, but also each operation needs to be supplied with a completion callback. Using a completion callback is inherently less efficient than merely blocking, and explicitly waiting for completion is also inherently less efficient, and consequently building synchronous operations on top of asynchronous operations would add some significant overhead.
There is also a bigger problem. In many cases asynchronous operations simply aren't supported at all by the underlying device, and would have to be emulated by threads, which would add even more overhead. I suppose one possibility would be for the "asynchronous" operation to just block if the underlying device doesn't support asynchronous operations.
As far as I understand, you are suggesting that every interface be asynchronous, except that a single synchronous layer at the very top could be added, but nothing else would be built on top of the synchronous layer. The issue is that it is somewhat more complicated, or at least more verbose in C++, to program using asynchronous interfaces, and imposing this inconvenience on users even when they ultimately intend to use the synchronous interface seems undesirable. Perhaps much or all of the run-time overhead could be avoided by using templates in certain ways, while still avoiding source code duplication.
I do agree that asynchronous support would be very useful; it just seems very hard to support in practice. It is definitely something that should be considered thoroughly, though.
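To make the interface difference concrete, here is a minimal sketch of a synchronous read layered on an asynchronous one. Every name in it (event_queue, async_source, async_read, sync_read) is made up for illustration and comes neither from the design document nor from asio; event_queue is only a toy stand-in for something like asio's io_service, and it runs completions on the calling thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an io_service: a queue of pending work
// that somebody has to run explicitly.
class event_queue {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Runs one pending task; returns false if nothing was pending.
    bool run_one() {
        if (tasks_.empty()) return false;
        std::function<void()> task = std::move(tasks_.front());
        tasks_.pop_front();
        task();
        return true;
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical in-memory device that offers only an asynchronous read.
class async_source {
public:
    using handler = std::function<void(std::size_t)>;

    async_source(event_queue& queue, std::string data)
        : queue_(queue), data_(std::move(data)) {}

    // Returns immediately. The copy into `out` and the call to `on_done`
    // happen later, when the queue runs the posted task.
    void async_read(char* out, std::size_t n, handler on_done) {
        queue_.post([this, out, n, on_done] {
            std::size_t count = std::min(n, data_.size() - pos_);
            std::copy_n(data_.data() + pos_, count, out);
            pos_ += count;
            on_done(count);
        });
    }

private:
    event_queue& queue_;
    std::string data_;
    std::size_t pos_ = 0;
};

// Synchronous read built on top: start the operation, then drive the
// queue until the completion callback has fired.
std::size_t sync_read(event_queue& queue, async_source& src,
                      char* out, std::size_t n) {
    bool done = false;
    std::size_t result = 0;
    src.async_read(out, n, [&](std::size_t count) { result = count; done = true; });
    while (!done && queue.run_one()) {}
    return result;
}

inline void usage_example() {
    event_queue queue;
    async_source src(queue, "hello world");
    char buf[5];
    std::size_t n = sync_read(queue, src, buf, sizeof buf);
    assert(n == 5 && std::string(buf, n) == "hello");
}
```

Even in this toy version the synchronous caller pays for a queue entry, a std::function and a callback per read, and has to be handed the queue as well as the device, which is the kind of cost and verbosity described above.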
* Putback
Very contentious. Currently I am swinging towards "no". I have a rule for all of my encodings that each item has "positive termination" or, in language processing terms, "simple accepting states".
I actually think this is a facility that should be provided by the I/O library, but it need not be a requirement of the basic stream interface. Rather, it can be implemented as a filter that can be applied to any stream.
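As a rough sketch of what such a filter could look like: the names below (string_source, putback_filter) and the read() signature are assumptions made for the example, not part of the proposed design. The only thing the filter requires of the wrapped stream is a read() that copies up to n units and returns the count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal source with no putback support of its own.
class string_source {
public:
    explicit string_source(std::string data) : data_(std::move(data)) {}

    // Copies up to n units into out and returns how many were copied.
    std::size_t read(char* out, std::size_t n) {
        std::size_t count = std::min(n, data_.size() - pos_);
        std::copy_n(data_.data() + pos_, count, out);
        pos_ += count;
        return count;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Filter that adds putback to any Source offering the read() above.
template <typename Source>
class putback_filter {
public:
    explicit putback_filter(Source source) : source_(std::move(source)) {}

    // Units put back are handed out again, most recent first,
    // before the underlying source is consulted.
    void putback(char c) { pending_.push_back(c); }

    std::size_t read(char* out, std::size_t n) {
        std::size_t count = 0;
        while (count < n && !pending_.empty()) {
            out[count++] = pending_.back();
            pending_.pop_back();
        }
        if (count < n) count += source_.read(out + count, n - count);
        return count;
    }

private:
    Source source_;
    std::vector<char> pending_;
};

inline void usage_example() {
    putback_filter<string_source> in{string_source{"abc"}};
    char c = 0;
    in.read(&c, 1);   // consumes 'a'
    in.putback(c);    // 'a' becomes readable again
    char buf[3];
    assert(in.read(buf, 3) == 3 && std::string(buf, 3) == "abc");
}
```

Because putback lives entirely in the filter, the basic stream interface stays small, and streams that never need putback pay nothing for it.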
* Representation/Endian
I think this issue should be bundled with "parsing". Exactly what the ntohl functions do is a nice simple model for what should be done here.
* Inexact bit counts
Refer to "Basic Unit".
* Buffer types and encodings
Buffers typed to encodings? No. The only thing buffered will be blocks of basic units.
I think he may have actually meant marking streams with particular encodings. Specifically, whether there should be a uint16_t stream marked (either at compile-time or run-time) as containing UTF-16 text (a "text stream"), or should there just be uint16_t streams with no such marking. I agree that it is useful for the same buffering facility to be applicable to both text and data (non-text) streams.
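A compile-time marking could be sketched roughly as follows. All the names (no_encoding, utf16, unit_buffer, stream, is_text_stream) are hypothetical and chosen only for the example. The point is that the encoding is a tag on the stream type, while the buffer is parameterized on the unit type alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

struct no_encoding {};  // tag: plain data stream, units carry no text meaning
struct utf16 {};        // tag: units are UTF-16 code units

// Buffering knows only about units, so the same facility serves
// both text streams and data streams.
template <typename Unit>
class unit_buffer {
public:
    void push(Unit u) { units_.push_back(u); }
    std::size_t size() const { return units_.size(); }

private:
    std::vector<Unit> units_;
};

// A stream of Unit, optionally marked at compile time with an encoding.
template <typename Unit, typename Encoding = no_encoding>
class stream {
public:
    using unit_type = Unit;
    using encoding = Encoding;

    unit_buffer<Unit>& buffer() { return buffer_; }

private:
    unit_buffer<Unit> buffer_;
};

using data_stream16 = stream<std::uint16_t>;         // unmarked uint16_t stream
using text_stream16 = stream<std::uint16_t, utf16>;  // marked as UTF-16 text

// Trait that is true for any stream carrying an encoding tag.
template <typename Stream>
struct is_text_stream
    : std::integral_constant<
          bool, !std::is_same<typename Stream::encoding, no_encoding>::value> {};

static_assert(is_text_stream<text_stream16>::value, "marked as text");
static_assert(!is_text_stream<data_stream16>::value, "plain data");
```

A run-time marking would instead store the encoding as a data member of the stream, at the cost of losing the compile-time check.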
* Interface (I/O streams-like needed?)
Yes, if only because adding the backward compatibility should be easy given the design/impl goes well.
Some general points;
1. Confusion around char, byte, text, binary, encoding and codepoint
For me this has been a bit frustrating (it's been untidy for a long time) and also illuminating (Unicode). For me there are bytes (or basic units) and items of application data. Everything in between is encoding.
I see what you mean by this more so than when responding to your previous post. The fact that the application itself may view certain stream or I/O facilities as merely relating to encoding application data does not preclude the usefulness of certain abstractions within the I/O library, like byte streams or text streams. Furthermore, I think it is common that an application would want to deal directly with a text stream, because raw text is the application data.
2. Inclusion of "endianness" and "representation" in the binary layers.
IIUC you are allowing applications to declare that they will only talk to (e.g.) Motorola-based machines. I suppose this can be justified but from an engineering point of view the strategy implicit in ntohl is more appealing. The subtle drawback of allowing the declaration of endianness and the fact that underlying operations (e.g. network nagling) shaft it anyhow makes it a "no go" for me.
I don't think this is what was intended by supporting endian conversion. I believe it is intended to support a very wide variety of operations relating to endianness conversion and other representation conversion. In particular, it would certainly support converting integer types between big endian and native endian, which is what the ntohl/htonl/ntohs/htons functions in the BSD socket interface do. This facility would be useful, for instance, for decoding UTF-16-BE. It would also, however, support converting between little endian and native endian, or between little endian and big endian.
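For illustration, here is a minimal sketch of such conversions. The function names load_big_endian and load_little_endian are made up for the example and are not from the design document, and the code assumes 8-bit bytes. It assembles the value by shifting, so the result is in native representation whatever the host byte order is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reads an unsigned integer stored most-significant byte first
// (the same direction as ntohs/ntohl) and returns it as a native value.
template <typename UInt>
UInt load_big_endian(const unsigned char* bytes) {
    static_assert(std::is_unsigned<UInt>::value, "unsigned integer types only");
    UInt value = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i)
        value = static_cast<UInt>((value << 8) | bytes[i]);
    return value;
}

// Reads an unsigned integer stored least-significant byte first.
template <typename UInt>
UInt load_little_endian(const unsigned char* bytes) {
    static_assert(std::is_unsigned<UInt>::value, "unsigned integer types only");
    UInt value = 0;
    for (std::size_t i = sizeof(UInt); i-- > 0;)
        value = static_cast<UInt>((value << 8) | bytes[i]);
    return value;
}

inline void usage_example() {
    // "A" followed by the euro sign, encoded as UTF-16-BE.
    const unsigned char text[] = {0x00, 0x41, 0x20, 0xAC};
    assert(load_big_endian<std::uint16_t>(text) == 0x0041);
    assert(load_big_endian<std::uint16_t>(text + 2) == 0x20AC);
}
```

Decoding UTF-16-LE would use load_little_endian in the same way, and a little-endian to big-endian conversion is a little-endian load followed by a big-endian store.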
3. Lack of extensibility
While your design doesn't actually preclude this, it also isn't explicit about it being possible, i.e. how would you redo your diagram for an application that is using different encodings over different network connections and to data files.
I agree that it is important to define the core concepts/interfaces (like "device" or "data stream" or "text stream") that will be points where the I/O library interfaces with external facilities.
--
Jeremy Maitin-Shepard