
Sebastian Redl <sebastian.redl@getdesigned.at> writes:
Jeremy Maitin-Shepard wrote:
- One idea from [Boost.IOStreams] to consider is the direct/indirect device distinction.
I never noticed this distinction before. It seems useful, but there are issues not unlike the AsyncIO issues. Direct devices provide a different interface. A programmer can take advantage of this interface for some purposes, but for most, I fear, the advantages would be lost. Consider:
- A direct device cannot be wrapped by filters that do dynamic data rewriting (such as (de)compression). The random access aspect would be lost.
- A direct device cannot participate in the larger stack without propagating the direct access model throughout the stack. (And this stops at the text level anyway, because the character recoder does dynamic data rewriting.) Propagating another interface means a lot of additional implementation effort and complexity.
Okay. I'm inclined to agree with this.
- Binary transport layer issue:
Platforms with unusual features, like 9-bit bytes or the inability to handle types smaller than 32 bits, can possibly still implement the interface for a text/character transport layer, possibly on top of some other lower-level transport that need not be part of the boost library. Clearly, the text encoding and decoding would have to be done differently anyway.
A good point, but it does mean that the text layer dictates how the binary layer has to work. Not really desirable when pure binary I/O has nothing to do with text I/O.
I'm not sure what you mean by this exactly.
One approach that occurs to me would be to make the binary transport layer use a platform-specific byte type (octets, nonets, whatever) and have the binary formatting layer convert this into data suitable for character coding.
It seems that trying to support unusual architectures at all may be extremely difficult; see my other post. If you can find a clean way to support these unusual architectures, then all the better. But it seems very hard to support e.g. UTF-8 on a platform with 9-bit bytes, or on one that cannot handle types smaller than 32 bits.
- Seeking:
Maybe make multiple mark/reset use the same interface as seeking, for simplicity. Just define that a seeking device has the additional restriction that the mark type is an offset, and the argument to seek need not be the result of a call to tell.
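The unification suggested above can be sketched roughly as follows. This is a hypothetical illustration, not the proposed API: a seekable device whose tell()/seek() double as mark()/reset(), with the mark type being a plain offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a seekable device models the mark/reset interface,
// with the extra guarantee that mark_type is a transparent offset and
// seek() accepts values that never came from a call to tell().
class memory_device {
public:
    using mark_type = std::uint64_t;   // transparent offset

    explicit memory_device(std::vector<char> data) : data_(std::move(data)) {}

    std::size_t read(char* out, std::size_t n) {
        std::size_t avail = data_.size() - pos_;
        if (n > avail) n = avail;
        std::copy(data_.begin() + pos_, data_.begin() + pos_ + n, out);
        pos_ += n;
        return n;
    }

    mark_type tell() const { return pos_; }        // doubles as mark()
    void seek(mark_type where) { pos_ = static_cast<std::size_t>(where); } // doubles as reset()

private:
    std::vector<char> data_;
    std::size_t pos_ = 0;
};
```

A component that needs only mark/reset could then accept this device unchanged, since reset(mark()) is just seek(tell()).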
Another issue is whether to standardize the return type from tell, like std::ios_base::streampos in the C++ iostreams library.
These are incompatible requirements, and the reason I want to keep the interfaces separate. Standardizing the return type of tell() is a good idea, and necessary both for type erasure to work efficiently and for the simple use of arbitrary values in seek(). The type must be transparent.
The return type of mark(), on the other hand, can and should be opaque. This allows for many interesting things to be done. For example: Consider a socket. It has no mark/reset, let alone seeking support. You have a recursive descent parser that requires multiple mark/reset support.
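The socket example above can be sketched as a wrapper that supplies mark/reset on top of a non-seekable source by buffering what has been read. All names here (mark_buffer, the source callback) are illustrative assumptions, not part of any proposed interface:

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch: adds mark/reset to a non-seekable source (e.g. a
// socket) by buffering everything read since the oldest outstanding mark,
// so a recursive descent parser can backtrack.
class mark_buffer {
public:
    using source_fn = std::function<std::size_t(char*, std::size_t)>;
    using mark_type = std::size_t;   // would be opaque in the real design

    explicit mark_buffer(source_fn src) : src_(std::move(src)) {}

    std::size_t read(char* out, std::size_t n) {
        std::size_t got = 0;
        while (got < n) {
            if (pos_ < buf_.size()) {            // replay buffered data
                out[got++] = buf_[pos_++];
            } else {                             // pull fresh data from source
                char c;
                if (src_(&c, 1) == 0) break;     // source exhausted
                buf_.push_back(c);
                ++pos_;
                out[got++] = c;
            }
        }
        return got;
    }

    mark_type mark() const { return pos_; }      // remember current position
    void reset(mark_type m) { pos_ = m; }        // rewind to a mark
    // A real implementation would discard buffered data older than the
    // oldest live mark; this sketch keeps everything for simplicity.

private:
    source_fn src_;
    std::deque<char> buf_;
    std::size_t pos_ = 0;
};
```

Because mark() here is just a buffer position, it cannot be fed to a seek() that expects absolute stream offsets, which is the point of keeping it opaque.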
I see. It still seems that using different names means that a component requiring only mark/reset support cannot use a stream providing seek/tell support without an additional intermediate layer. [snip]
It should probably also be possible to determine using the library at compile time what the native format is.
To what end? If the native format is one of the special predefined ones, it will hopefully be optimized in the platform-aware special implementation (well, I can dream) anyway.
The reason would be for a protocol in which little/big endian is specified as part of the message/data, and a typical implementation would always write in native format (and so it would need to determine which is the native format), but support both formats for reading.
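That use case might look like the following sketch. The detection here is a runtime check for illustration; a library could expose the same answer as a compile-time constant. All names are assumptions, not proposed API:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch: a protocol tags each message with its byte order,
// the writer always emits native order, and the reader handles both.
enum class byte_order { little, big };

inline byte_order native_order() {
    const std::uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // inspect the lowest-addressed octet
    return first == 0x02 ? byte_order::little : byte_order::big;
}

inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00u)
         | ((v << 8) & 0xFF0000u) | (v << 24);
}

// Reader side: convert a value from the order recorded in the message
// header into native order, swapping only when they differ.
inline std::uint32_t to_native(std::uint32_t wire, byte_order wire_order) {
    return wire_order == native_order() ? wire : byteswap32(wire);
}
```

The writer side is the degenerate case: it records native_order() in the header and writes values untouched.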
- Header vs Precompiled:
I think as much should be separately compiled as possible, but I also think that type erasure should not be used in any case where it will significantly compromise performance.
I'm thinking of a system where components are templates on the component they wrap, so as to allow direct calls upwards. I'm thinking of using the common separately-compiled-template-specialization extension of compilers to provide pre-compiled versions of the standard components instantiated with the erasure components. This is very similar to how Spirit works, except that Spirit doesn't provide pre-compiled instantiations. In Spirit, rule is the erasure type, but the various parsers can be directly linked, too.
Ideally, the cost of the virtual function calls would normally be mitigated by calling e.g. read/write with a large number of elements at once, rather than with only a single element.
Then, if the performance is needed, the programmer can hand-craft his chain so that no virtual calls are made, at the cost of compiling his own copy of the components.
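The two composition styles described above can be sketched side by side: a filter templated on what it wraps (direct, inlinable calls) versus a type-erasure boundary (one virtual call per read, amortized by bulk reads). All names are illustrative assumptions:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

struct counting_source {                       // leaf component
    std::size_t read(char* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = static_cast<char>(next_++);
        return n;
    }
    unsigned char next_ = 0;
};

template <class Upstream>                      // filter templated on its upstream
struct xor_filter {
    explicit xor_filter(Upstream up) : up_(std::move(up)) {}
    std::size_t read(char* out, std::size_t n) {
        std::size_t got = up_.read(out, n);    // direct call, can inline
        for (std::size_t i = 0; i < got; ++i) out[i] ^= 0x55;
        return got;
    }
    Upstream up_;
};

struct erased_stream {                         // type-erasure boundary
    struct concept_t {
        virtual ~concept_t() = default;
        virtual std::size_t read(char*, std::size_t) = 0;
    };
    template <class T>
    struct model_t : concept_t {
        explicit model_t(T t) : t_(std::move(t)) {}
        std::size_t read(char* o, std::size_t n) override { return t_.read(o, n); }
        T t_;
    };
    template <class T>
    explicit erased_stream(T t)
        : p_(std::make_unique<model_t<T>>(std::move(t))) {}
    std::size_t read(char* o, std::size_t n) { return p_->read(o, n); }  // one virtual call
    std::unique_ptr<concept_t> p_;
};
```

Reading 4096 bytes through erased_stream costs one virtual dispatch rather than 4096, which is the bulk-call mitigation; a hand-crafted xor_filter<counting_source> chain avoids even that, at the cost of instantiating the templates in the user's own translation unit.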
I'm afraid I don't see a better way of doing this. I'm wide open to suggestions.
- The "byte" stream and the character stream, while conceptually different, should probably both be considered just "streams" of particular POD types.
I have explained in a different post why I don't think this is a good idea.
- Text transport:
I don't think this layer should be restricted to Unicode encodings.
I have no plans of doing so. I just consider all encodings as encodings of the universal character set. An encoding is defined by how it maps the UCS code points onto groups of octets, words, or other primitives.
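As a concrete instance of "an encoding maps UCS code points onto groups of octets", here is a hand-rolled UTF-8 encoder for a single code point (validity checks for surrogates and out-of-range values omitted for brevity):

```cpp
#include <string>

// Illustration: UTF-8 defines how each UCS code point maps to a
// sequence of one to four octets. Other encodings define different
// mappings over (subsets of) the same code point space.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                            // 1 octet: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                    // 2 octets: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                  // 3 octets
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                    // 4 octets
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A legacy encoding such as Latin-1 would be described the same way: a mapping from a 256-code-point subset of the UCS onto single octets.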
Is it in fact the case that all character encodings worth supporting encode only a subset of Unicode? (That is, does there exist no useful encoding that can represent a character Unicode cannot?) In any case, it is not clear why one needs to think of an arbitrary character encoding in terms of Unicode, except when explicitly converting between that encoding and a Unicode encoding. [snip] -- Jeremy Maitin-Shepard