
"John Hayes" <john.martin.hayes@gmail.com> writes:
> While working on ordinary web software, there are actually a lot more variations on data encodings than just text and binary:
It seems fairly logical to me to have the following organization:

- Streams of arbitrary POD types. For instance, you might have uint8_t streams, uint16_t streams, etc.

- A byte stream would be a uint8_t stream.

- A text stream holding utf-16 encoded text would be a uint16_t stream, while a text stream holding utf-8 encoded text would be a uint8_t stream. A text stream holding iso-8859-1 encoded text would also be a uint8_t stream.

There is the issue of whether it is useful to have a special text stream type that is tagged (either at compile-time or at run-time) with the encoding that the data going into or out of it is supposed to be in. How exactly this tagging should be done, and to what extent it would be useful, remains to be explored.

It seems that your various examples of filters/encodings, like BASE-64, URL encoding, CDATA escaping, and C++ string escaping, might well fit into the framework I described in the previous paragraphs. Many of these filters can be viewed as encoding a byte stream as text.

Let me know your thoughts, though.

--
Jeremy Maitin-Shepard
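
To make the organization above a little more concrete, here is a rough sketch of one way it could look. Every name in it (vector_sink, text_sink, the utf8/utf16/iso_8859_1 tags, base64_encode, base64_to_string) is hypothetical and made up for illustration; none of it comes from an existing library. It assumes the compile-time flavor of tagging, and it shows BASE-64 as a filter that consumes a byte stream and writes to a text stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical sketch: a sink for elements of a single POD type.
// uint8_t gives a byte stream, uint16_t a stream of 16-bit units, etc.
template <typename T>
class vector_sink {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "element type must be POD");
public:
    using element_type = T;
    void write(const T* data, std::size_t n) { buf_.insert(buf_.end(), data, data + n); }
    const std::vector<T>& data() const { return buf_; }
private:
    std::vector<T> buf_;
};

// Hypothetical compile-time encoding tags; each names its code unit type.
struct utf8       { using unit_type = std::uint8_t;  };
struct utf16      { using unit_type = std::uint16_t; };
struct iso_8859_1 { using unit_type = std::uint8_t;  };

// A text stream is a POD stream of code units plus an encoding tag.
template <typename Encoding>
class text_sink : public vector_sink<typename Encoding::unit_type> {
public:
    using encoding = Encoding;
};

// BASE-64 viewed as a filter from a byte stream to a text stream.
// Its output is pure ASCII, so the same code units are valid for all
// three encodings tagged above.
template <typename TextSink>
void base64_encode(const std::uint8_t* in, std::size_t n, TextSink& out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    using unit = typename TextSink::element_type;
    for (std::size_t i = 0; i < n; i += 3) {
        const std::size_t rem = n - i;  // bytes left, including in[i]
        std::uint32_t v = std::uint32_t(in[i]) << 16;
        if (rem > 1) v |= std::uint32_t(in[i + 1]) << 8;
        if (rem > 2) v |= std::uint32_t(in[i + 2]);
        const unit quad[4] = {
            unit(tbl[(v >> 18) & 63]),
            unit(tbl[(v >> 12) & 63]),
            rem > 1 ? unit(tbl[(v >> 6) & 63]) : unit('='),
            rem > 2 ? unit(tbl[v & 63]) : unit('=')
        };
        out.write(quad, 4);
    }
}

// Convenience wrapper for trying the filter on a std::string of bytes.
inline std::string base64_to_string(const std::string& bytes) {
    text_sink<utf8> out;
    base64_encode(reinterpret_cast<const std::uint8_t*>(bytes.data()),
                  bytes.size(), out);
    return std::string(out.data().begin(), out.data().end());
}
```

With this sketch, base64_to_string("hello") yields "aGVsbG8=". A run-time tag would instead store the encoding as a data member of the text stream, which trades static checking for the ability to pick the encoding from, say, an HTTP header.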