
I think part of the issue may be the name "binary". A better name may be "byte" I/O or "byte" stream. Originally, the binary transport layer was called the byte transport layer. I decided against this name for the simple reason that, as far as the C++ standard is concerned, a byte is pretty much the same as an unsigned char. Because the exact unit of transport is still an open question (and the current tendency I see is toward using octets, and leaving native bytes to some other mechanism), I didn't want any such implication in the name.

The name "binary" isn't a very good choice either, I admit. In the end, all data is binary. But the distinction between "binary" and "textual" data is important, and not only at the concept level. What I have in mind works something like this:

Binary data is in terms of octets, bytes, primitives, or PODs, whatever. The distinguishing feature of binary data is that each "unit" is meaningful in isolation. It makes sense to fetch a single unit and work on it. It makes sense to jump to an arbitrary position in the stream and interpret the unit there.

Textual data is far more complex. It's a stream of abstract characters, and they don't map cleanly to the underlying representation primitives. A UTF-8 character maps to one, two, three, or four octets, leaving aside the dilemma of combining accents. A UTF-16 character maps to one or two double-octets. It doesn't make sense to fetch a single primitive and work on it, because it may not be a complete character. It doesn't make sense to jump to an arbitrary position, because you might land in the middle of a character. For this reason, the internal character encoding is part of the text stream's type in my model.
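As a concrete illustration of why arbitrary positioning is problematic for text (a sketch only, not part of the proposed library): in UTF-8, a randomly chosen octet offset may land on a continuation octet, and the stream has to scan backwards to find the start of the character it belongs to.

    #include <cstddef>
    #include <iostream>
    #include <string>

    // A UTF-8 continuation octet has the bit pattern 10xxxxxx; it is not
    // meaningful in isolation, unlike a unit of binary data.
    bool is_utf8_continuation(unsigned char octet)
    {
        return (octet & 0xC0) == 0x80;
    }

    // Back up from an arbitrary octet offset to the start of the character
    // containing it; this is extra work a binary stream never needs to do.
    std::size_t utf8_character_start(const std::string& data, std::size_t pos)
    {
        while (pos > 0 && is_utf8_continuation(static_cast<unsigned char>(data[pos])))
            --pos;
        return pos;
    }

    int main()
    {
        std::string text = "a\xC3\xA4z"; // 'a', U+00E4 (two octets), 'z'
        // Offset 2 points into the middle of the two-octet character.
        std::cout << utf8_character_start(text, 2) << '\n'; // prints 1
    }

A binary stream, by contrast, can simply seek to any multiple of its unit size and interpret whatever it finds there.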
Jeremy Maitin-Shepard wrote:
Also, I believe the narrow/wide characters and locales stuff is broken beyond all repair, so I wouldn't recommend doing anything related to that.
I believe that this library will attempt to address and properly handle those issues.
I certainly will, especially as this view seems to be generally agreed on by the posters.
I also think text formatting is a different need from I/O. Indeed, it is often necessary to generate a formatted string which is then given to a GUI toolkit or whatever.
Presumably this would be supported by using the text formatting layer on top of an output text sink backed by an in-memory buffer.
That's the idea. Separating formatting and I/O is necessary to avoid an ugly mess of responsibilities, which is why the text formatting layer is a distinct layer. However, having the formatting built on the I/O interfaces instead of string interfaces allows for greater optimization opportunities.

There is not much you can do when you create a string from format instructions. You have two choices, basically.

One is to first find out how much memory is needed, allocate a buffer, and then format the data. This is the approach MFC's CString::Format takes. The obvious problem is that it does all the work twice. The less obvious problem is that it makes the code considerably more complex: either you find a way to turn sub-formatting (that is, evaluating the format instructions for a single parameter, e.g. turning an int into a string) into a dummy operation that just returns the space needed, or you require the formatting methods to provide an explicit "measure-only" operation. The former is very complex and, depending on the exact way formatting works, may even be impossible (or at least will create many little strings whose lengths are read, only for them to be discarded and later re-created); the latter hurts extensibility.

The other way is to just go ahead and format, re-allocating whenever space runs out. And that's just what the I/O-based method does anyway. However, if your formatting is bound to string objects, every formatting operation has to create a string containing all the formatted parameters. This can be a considerable memory/time overhead compared to formatting that works on the I/O interfaces, where formatted data is sent directly to the underlying device. (For efficiency, of course, there may be a buffer in the device chain. That's fine. The buffer is re-used, not allocated once per formatting operation.)

So yes, the formatting layer will be very distinct (and developed after the other parts are complete), but I really believe that basing it on the I/O interfaces is the best solution. There can, of course, be a convenience interface that simply creates a string from formatting instructions and parameters.

Sebastian Redl
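To illustrate the difference described above, here is a minimal sketch of sink-based formatting with a string-producing convenience function layered on top. The names (string_sink, write_int, format_int) are hypothetical and are not the interface proposed for the library; the point is only that formatted data can go straight to a sink, and that "format to a string" falls out as a special case of formatting to an in-memory sink.

    #include <cstddef>
    #include <cstdio>
    #include <iostream>
    #include <string>

    // Hypothetical sink: anything offering write(data, size). Here it is
    // backed by an in-memory buffer that is grown (and re-used) as needed.
    class string_sink
    {
    public:
        void write(const char* data, std::size_t size) { buffer_.append(data, size); }
        const std::string& str() const { return buffer_; }
    private:
        std::string buffer_;
    };

    // Formatting bound to the sink interface: the formatted parameter is
    // written directly into the device chain, with no per-parameter string.
    template <typename Sink>
    void write_int(Sink& sink, int value)
    {
        char tmp[16];
        int len = std::snprintf(tmp, sizeof tmp, "%d", value);
        sink.write(tmp, static_cast<std::size_t>(len));
    }

    // The convenience interface (a string from formatting instructions) is
    // just sink-based formatting applied to an in-memory sink.
    std::string format_int(int value)
    {
        string_sink sink;
        write_int(sink, value);
        return sink.str();
    }

    int main()
    {
        std::cout << format_int(42) << '\n'; // prints 42
    }

A real formatting layer would of course handle arbitrary types and format specifications; the sketch only shows where the data flows.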