
In ordinary web software there are many more variations on data encoding than just text versus binary. A binary format may be written as raw bytes (of varying endianness), in Base64 for email attachments (RFC 2045), or in Base32 for URLs or form post data (RFC 3548). When encoding to a plain-text format (after encoding into a narrow character set), there may still be escaping that depends on the container: C, JS, XML attributes, elements and CDATA sections, and SQL (which varies by database) all have different escaping rules. That is before mentioning sillier issues like newline representation. Buffering is also an interesting problem, because in some formats buffering events (such as flush, overflow or EOF) must produce output in the stream itself: an explicit end-of-stream marker, the minimum remaining distance, or a difference in distance (such as how many bytes remain until the next chunk). None of these transformations is hard to write, but they are written over and over because the standard streaming operators (be they Java, C++, Perl or printf) provide no straightforward way to inject them. The cost tends to be that serializing an object is written several times over or, worse, gets tied up in a grander object persistence framework.
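To make the "inject the transformations" point concrete, here is a rough sketch of what composable transformations might look like. It is only an illustration: the names (Transform, escape_xml_attribute, to_crlf, chunk_frame, apply_all) are made up for this message and are not part of any existing library, and it works on whole buffers rather than true streams to stay short. The XML escaping assumes double-quoted attribute values, and the chunk framing follows the size-line-then-data shape of HTTP/1.1 chunked transfer coding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical: a transformation maps one whole buffer to an encoded buffer.
using Transform = std::function<std::string(const std::string&)>;

// Container escaping: text placed inside a double-quoted XML attribute.
std::string escape_xml_attribute(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Newline representation: turn bare "\n" into "\r\n", leave "\r\n" alone.
std::string to_crlf(const std::string& in) {
    std::string out;
    char prev = '\0';
    for (char c : in) {
        if (c == '\n' && prev != '\r') out += '\r';
        out += c;
        prev = c;
    }
    return out;
}

// Buffering events made visible in the output: each chunk is preceded by
// its size in hex, and a zero-size chunk marks the explicit end of stream.
std::string chunk_frame(const std::string& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::string out;
    char header[32];
    for (std::size_t pos = 0; pos < in.size(); pos += chunk_size) {
        const std::string piece = in.substr(pos, chunk_size);
        std::snprintf(header, sizeof header, "%zx\r\n", piece.size());
        out += header;
        out += piece;
        out += "\r\n";
    }
    out += "0\r\n\r\n";
    return out;
}

// Apply the transformations in order; the serializer itself never changes.
std::string apply_all(const std::vector<Transform>& pipeline, std::string data) {
    for (const auto& t : pipeline) data = t(data);
    return data;
}
```

With something like this, the object's serialization is written once and the container-specific steps are chosen by whoever builds the pipeline, e.g. { escape_xml_attribute, to_crlf, a lambda calling chunk_frame(s, 4096) }.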
From my limited research, the most complete description of a stream encoding is hidden in the description of HTTP 1.1 entities, which defines a 3-layer model for streaming:
- Buffering events: how to determine how large the stream is (Transfer-Encoding, Content-Length, Trailer headers)
- Transformations: preprocessing required before the stream can be interpreted (Content-Encoding: gzip, deflate; could include byte encodings)
- Type: what class should further interpret the content and, for text entities, the character set encoding (Content-Type)

This is not a complete model, largely because it ignores the issue of interpreting the content, but it seems like a good place to start since it's an intro to the problems of portably streaming data.

John

On 6/17/07, Jeremy Maitin-Shepard <jbms@cmu.edu> wrote:
Sebastian Redl <sebastian.redl@getdesigned.at> writes:
A few weeks ago, a discussion that followed the demonstration of the binary_iostream library made me think about the standard C++ I/O and what I would expect from an I/O model.
The document can be found here: http://windmuehlgasse.getdesigned.at/newio/
- Binary transport layer issue:
Make the "binary transport layer" the "byte transport layer" to make it clear that it is for bytes.
Platforms with unusual features, like 9-bit bytes or an inability to handle types smaller than 32 bits, can possibly still implement the interface for a text/character transport layer, possibly on top of some other lower-level transport that need not be part of the boost library. Clearly, the text encoding and decoding would have to be done differently anyway.