Support for reading/writing binary 'streams' (asio?)

Hi, When reading/writing binary streams, you often need to read/write an (unaligned) integer in big, little or native endian. Is there support for this in Boost? I think iostreams isn't usable as 'zero-copy' is required. What do other Boost developers/users use/recommend? Wouldn't this make a nice addition to the new asio library? Olaf

On 3/28/06, Olaf van der Spek <olafvdspek@gmail.com> wrote: When reading/writing binary streams, you often need to read/write an
(unaligned) integer in big, little or native endian. Is there support for this in Boost?
The closest thing I'm aware of is the portable_binary_oarchive (and iarchive) in the Boost.Serialization examples directory. This has logic to save integers and longs in little-endian format regardless of the hardware architecture, saving a single-byte length followed by the binary data that represents the value. For example, 42 would be saved as \x01\x2a and 1024 would be saved as \x02\x00\x04.
What do other Boost developers/users use/recommend?
Wouldn't this make a nice addition to the new asio library?
Having machine word conversion logic is indeed very handy for network programming. The C library will always (?) provide the primitive hton[ls] and ntoh[ls] routines, but these are not always sufficient in today's world. Many protocols are strictly little-endian, or utilize receiver-makes-right semantics, wherein the receiver of the data converts the data to the correct format for its architecture. General-purpose byte-swapping routines for varying "word" sizes (2, 4, 8, 16) are invaluable when implementing these sorts of protocols.
It's not entirely clear (to me at least) where this sort of logic belongs in Boost, as it is potentially useful in more places than just a networking library. Perhaps this should be just a single header in the boost directory?
-- Caleb Epstein caleb dot epstein at gmail dot com
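[Illustration] The length-prefixed little-endian layout described above (42 as \x01\x2a, 1024 as \x02\x00\x04) can be sketched roughly as follows. This is not the code from portable_binary_oarchive; the function names are hypothetical, only unsigned values are handled, and encoding zero as a lone length byte is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Boost.Serialization: encodes an unsigned
// value as one length byte followed by that many little-endian payload bytes.
std::vector<unsigned char> encode_length_prefixed(std::uint64_t value)
{
    std::vector<unsigned char> out(1, 0);  // out[0] is the length byte, set below
    while (value != 0) {
        // Emit the least significant byte first (little-endian payload).
        out.push_back(static_cast<unsigned char>(value & 0xFF));
        value >>= 8;
    }
    out[0] = static_cast<unsigned char>(out.size() - 1);
    return out;
}

// Inverse of the above. Assumes well-formed input: p[0] <= 8 and at least
// p[0] payload bytes are readable after it.
std::uint64_t decode_length_prefixed(const unsigned char* p)
{
    std::uint64_t value = 0;
    const std::size_t n = p[0];
    for (std::size_t i = 0; i < n; ++i)
        value |= static_cast<std::uint64_t>(p[1 + i]) << (8 * i);
    return value;
}
```

Because the bytes are produced by shifting rather than by reinterpreting memory, the output is the same on big-endian and little-endian hosts.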

On 3/28/06, Caleb Epstein <caleb.epstein@gmail.com> wrote:
On 3/28/06, Olaf van der Spek <olafvdspek@gmail.com> wrote:
When reading/writing binary streams, you often need to read/write an
(unaligned) integer in big, little or native endian. Is there support for this in Boost?
The closest thing I'm aware of is the portable_binary_oarchive (and iarchive) in the Boost.Serialization examples directory. This has logic to save integers and longs in little-endian format regardless of the hardware architecture, saving a single byte length + binary data that represents the value. For example, 42 would be saved as \x01\x2a and 1024 would be saved as \x02\x00\x04
That's not good enough.
What do other Boost developers/users use/recommend?
Wouldn't this make a nice addition to the new asio library?
Having machine word conversion logic is indeed very handy for network programming. The C library will always (?) provide the primitive hton[ls]
That's not standard C (AFAIK) but BSD sockets.
and ntoh[ls] routines, but these are not always sufficient in today's world. Many protocols are strictly little-endian, or utilize receiver-makes-right semantics, wherein the receiver of the data converts the data to the correct format for their architecture. General purpose byte-swapping routines for varying "word" sizes (2, 4, 8, 16) are invaluable
Other sizes (3, 6, others) may also be nice, although those are far less common.
when implementing these sorts of protocols.
It's not entirely clear (to me at least) where this sort of logic belongs in Boost, as it is potentially useful in more places than just a networking library.
True. But it seems asio's future includes other IO too.
Perhaps this should be just a single header in the boost directory?
That would be good. Olaf
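[Illustration] A minimal sketch of what such a single header might contain, assuming C++11 and an 8-bit byte. The names load_big, load_little, store_big and store_little are hypothetical and not an existing Boost interface. The width N is in bytes, so the less common sizes such as 3 and 6 come for free, and access goes through unsigned char one byte at a time, so the buffer needs no particular alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Read an N-byte unsigned integer stored most significant byte first.
template <std::size_t N>
std::uint64_t load_big(const unsigned char* p)
{
    static_assert(N >= 1 && N <= 8, "width must be 1..8 bytes");
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < N; ++i)
        v = (v << 8) | p[i];
    return v;
}

// Read an N-byte unsigned integer stored least significant byte first.
template <std::size_t N>
std::uint64_t load_little(const unsigned char* p)
{
    static_assert(N >= 1 && N <= 8, "width must be 1..8 bytes");
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < N; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}

// Write the low N bytes of v, most significant byte first.
template <std::size_t N>
void store_big(unsigned char* p, std::uint64_t v)
{
    static_assert(N >= 1 && N <= 8, "width must be 1..8 bytes");
    for (std::size_t i = 0; i < N; ++i)
        p[i] = static_cast<unsigned char>((v >> (8 * (N - 1 - i))) & 0xFF);
}

// Write the low N bytes of v, least significant byte first.
template <std::size_t N>
void store_little(unsigned char* p, std::uint64_t v)
{
    static_assert(N >= 1 && N <= 8, "width must be 1..8 bytes");
    for (std::size_t i = 0; i < N; ++i)
        p[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFF);
}
```

These work directly on the caller's buffer, so no intermediate copy of the stream is needed.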

On Tue, 28 Mar 2006 09:29:30 -0500, Caleb Epstein wrote
It's not entirely clear (to me at least) where this sort of logic belongs in Boost, as it is potentially useful in more places than just a networking library. Perhaps this should be just a single header in the boost directory?
I'd suggest that these functions have a strong affinity with the Integer library -- probably could be a single header under that framework: http://www.boost.org/libs/integer/index.html Jeff

On 3/28/06, Caleb Epstein <caleb.epstein@gmail.com> wrote:
On 3/28/06, Olaf van der Spek <olafvdspek@gmail.com> wrote:
When reading/writing binary streams, you often need to read/write an
(unaligned) integer in big, little or native endian. Is there support for this in Boost?
The closest thing I'm aware of is the portable_binary_oarchive (and iarchive) in the Boost.Serialization examples directory. This has logic to save integers and longs in little-endian format regardless of the hardware architecture, saving a single byte length + binary data that represents the value. For example, 42 would be saved as \x01\x2a and 1024 would be saved as \x02\x00\x04
Is the code/function that writes the binary value (without the length) also available in the interface or only in the implementation?

"Olaf van der Spek" <olafvdspek@gmail.com> wrote in message news:b2cc26e40603280426m1c14752do996f84adf0455ab6@mail.gmail.com...
Hi,
When reading/writing binary streams, you often need to read/write an (unaligned) integer in big, little or native endian. Is there support for this in Boost?
Back in 2000 there was considerable Boost discussion of endian integers. Darin Adler and I and some others contributed. We came up with a set of requirements that implied a surprisingly large feature set. 1, 2, 3, 4, 5, 6, 7, and 8 byte integers and unsigned are required. Unaligned for all, aligned for 2, 4, and 8 byte flavors. Both POD and non-POD flavors were requested. Must work correctly regardless of char being signed or unsigned. Must work correctly even when internal and external number of bits differ. No manual configuration. No optimization (supposed optimizations turned out to be pessimizations all too often between CPU's, compilers, compiler versions, or even when a compiler switch was changed). IIRC, almost everyone in the discussion had a set of roll-your-own classes, but none were Boost ready.
Wouldn't this make a nice addition to the new asio library?
The need goes way beyond asio. B-trees and other libraries need such integers if data files are to be portable, and users might well use such integers in their own binary data files. By the way, I've got a binary_file class about ready to propose for the filesystem library. It essentially wraps POSIX open/read/write/seek/close functions in a class. --Beman
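[Illustration] Two of the requirements listed above, correctness regardless of whether char is signed and support for external widths that differ from the internal one, can be sketched for a signed load as follows. The name load_big_signed is hypothetical and this is not code from the 2000 discussion. It assumes C++11, an 8-bit byte, a two's complement external format, and that converting an out-of-range std::uint64_t to std::int64_t wraps modulo 2^64 (implementation-defined before C++20, but the behaviour of mainstream compilers).

```cpp
#include <cstddef>
#include <cstdint>

// Read an N-byte two's complement integer stored most significant byte first
// and widen it to 64 bits.
template <std::size_t N>
std::int64_t load_big_signed(const unsigned char* p)
{
    static_assert(N >= 1 && N <= 8, "width must be 1..8 bytes");
    // Bytes are read as unsigned char, so the result does not depend on
    // whether plain char is signed on this platform.
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < N; ++i)
        v = (v << 8) | p[i];
    // Sign-extend from 8*N bits to 64 bits: flipping the external sign bit
    // and then subtracting it propagates the sign in modular arithmetic.
    const std::uint64_t sign_bit = std::uint64_t(1) << (8 * N - 1);
    v = (v ^ sign_bit) - sign_bit;
    return static_cast<std::int64_t>(v);
}
```

No attempt is made to pick a faster path per width or per CPU, in line with the "no optimization" requirement.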

On 3/28/06, Beman Dawes <bdawes@acm.org> wrote:
"Olaf van der Spek" <olafvdspek@gmail.com> wrote in message news:b2cc26e40603280426m1c14752do996f84adf0455ab6@mail.gmail.com...
Hi,
When reading/writing binary streams, you often need to read/write an (unaligned) integer in big, little or native endian. Is there support for this in Boost?
Back in 2000 there was considerable Boost discussion of endian integers. Darin Adler and I and some others contributed.
We came up with a set of requirements that implied a surprisingly large feature set. 1, 2, 3, 4, 5, 6, 7, and 8 byte integers and unsigned are required. Unaligned for all, aligned for 2, 4, and 8 byte flavors. Both POD
Isn't aligned when you have unaligned just an optimization?
and non-POD flavors were requested.
How does POD apply to this?
Must work correctly regardless of char being signed or unsigned. Must work correctly even when internal and external number of bits differ. No manual configuration. No optimization (supposed optimizations turned out to be pessimizations all too often between CPU's, compilers, compiler versions, or even when a compiler switch was changed).
IIRC, almost everyone in the discussion had a set of roll-your-own classes, but none were Boost ready.
And no work was done after the discussion? That's a shame, because I think it's really needed.

"Olaf van der Spek" <olafvdspek@gmail.com> wrote in message news:b2cc26e40603280821g993fd51s48edf012bd7a0247@mail.gmail.com...
On 3/28/06, Beman Dawes <bdawes@acm.org> wrote:
"Olaf van der Spek" <olafvdspek@gmail.com> wrote in message news:b2cc26e40603280426m1c14752do996f84adf0455ab6@mail.gmail.com...
Hi,
When reading/writing binary streams, you often need to read/write an (unaligned) integer in big, little or native endian. Is there support for this in Boost?
Back in 2000 there was considerable Boost discussion of endian integers. Darin Adler and I and some others contributed.
We came up with a set of requirements that implied a surprisingly large feature set. 1, 2, 3, 4, 5, 6, 7, and 8 byte integers and unsigned are required. Unaligned for all, aligned for 2, 4, and 8 byte flavors. Both POD
Isn't aligned when you have unaligned just an optimization?
Yep. I personally haven't had a need for aligned, but Darin Adler did, and I respect his judgement.
and non-POD flavors were requested.
How does POD apply to this?
In theory, only POD types have strong enough layout guarantees to be portable across I/O operations. In practice, a type that is otherwise POD but has constructors and a destructor is in fact portable, but some people are scared to use such types because some compiler down the road might add some extra cruft.
Must work correctly regardless of char being signed or unsigned. Must work correctly even when internal and external number of bits differ. No manual configuration. No optimization (supposed optimizations turned out to be pessimizations all too often between CPU's, compilers, compiler versions, or even when a compiler switch was changed).
IIRC, almost everyone in the discussion had a set of roll-your-own classes, but none were Boost ready.
And no work was done after the discussion? That's a shame, because I think it's really needed.
I do too, but it takes time and I guess no one had that. I just took a look at the internals of my own roll-your-own classes, and I'd be embarrassed to post them since they are really just retreaded C code. C-style casts and that sort of thing. Originally written in 1985! They are also just holders, without arithmetic operations. A Boost-quality set would provide full arithmetic functionality. Open source software is very user driven. If no users care enough to do it themselves, or fund someone else to do it, it doesn't happen until a developer comes along willing to spend the time/effort.
--Beman
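[Illustration] To make the holder and POD points concrete, here is a minimal sketch of a POD-flavoured holder, assuming C++11. The type big_uint32 is hypothetical; it is neither Beman's classes nor a Boost type. It declares no constructors or destructor, stores its value as raw bytes in big-endian order, and gets most arithmetic through an implicit conversion to std::uint32_t.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical holder: a 4-byte unsigned integer kept in big-endian byte
// order, with alignment 1 so it can sit anywhere in a buffer or record.
struct big_uint32
{
    unsigned char bytes[4];

    // Assigning from a native integer stores it most significant byte first.
    big_uint32& operator=(std::uint32_t v)
    {
        bytes[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
        bytes[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes[3] = static_cast<unsigned char>(v & 0xFF);
        return *this;
    }

    // Implicit conversion back to native; expressions such as x * 2 or
    // x == y then use the built-in operators on the converted value.
    operator std::uint32_t() const
    {
        return (static_cast<std::uint32_t>(bytes[0]) << 24) |
               (static_cast<std::uint32_t>(bytes[1]) << 16) |
               (static_cast<std::uint32_t>(bytes[2]) << 8) |
                static_cast<std::uint32_t>(bytes[3]);
    }

    // Compound assignment has to be written out by hand.
    big_uint32& operator+=(std::uint32_t v) { return *this = *this + v; }
};

// No user-declared constructors or destructor, so the type stays POD.
static_assert(std::is_trivial<big_uint32>::value &&
              std::is_standard_layout<big_uint32>::value,
              "big_uint32 is expected to be POD");
// Expected on mainstream compilers, though not guaranteed by the standard.
static_assert(sizeof(big_uint32) == 4, "no padding expected");
```

A full set would also need the remaining compound operators, increment and decrement, and the signed, little-endian and other-width variants.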

On 3/29/06, Beman Dawes <bdawes@acm.org> wrote:
Isn't aligned when you have unaligned just an optimization?
Yep. I personally haven't had a need for aligned, but Darin Adler did, and I respect his judgement.
and non-POD flavors were requested.
How does POD apply to this?
In theory, only POD types have strong enough layout guarantees to be portable across I/O operations. In practice, a type that is otherwise POD but has constructors and a destructor is in fact portable, but some people are scared to use such types because some compiler down the road might add some extra cruft.
What's a better situation, no library or a non-perfect library?
I do too, but it takes time and I guess no one had that.
I just took a look at the internals of my own roll-your-own classes, and I'd be embarrassed to post them since they are really just retreaded C code. C-style casts and that sort of thing. Originally written in 1985! They are also just holders, without arithmetic operations. A Boost-quality set would provide full arithmetic functionality.
What kind of arithmetic operations would you need?
Open source software is very user driven. If no users care enough to do it themselves, or fund someone else to do it, it doesn't happen until a developer comes along willing to spend the time/effort.
But it doesn't have to be a single individual. I think with some coordination it can be done a lot faster by a group.

Olaf van der Spek <olafvdspek <at> gmail.com> writes:
Hi,
When reading/writing binary streams, you often need to read/write an (unaligned) integer in big, little or native endian. Is there support for this in Boost?
Sometimes it is necessary to send floats and doubles as well between different hosts. In such cases, should a float be sent as is down the wire, relying on IEEE754 for correct representation at destination, or should it be converted to a null-terminated ASCII string, or should it...? It would be really nice if boost.asio or boost.whatever automagically handled such floating-point primitives in the most efficient yet portable manner.
Rune

Sometimes it is necessary to send floats and doubles as well between different hosts.
When dealing with floating point, there are all kinds of issues, including (as already mentioned) IEEE754 or other formats. By the time you have a generalized "marshalling" scheme that handles integers, floats, strings, etc, you've re-invented protocols such as XDR (eXternal Data Representation - see http://en.wikipedia.org/wiki/External_Data_Representation). Maybe there should be a Boost library for encoding and decoding to XDR? There are other possibilities such as SDXF or CDR (used in CORBA).
It would be really nice if boost.asio or boost.whatever automagically handled such floating-point primitives in the most efficient yet portable manner.
Whatever is added to Boost for encoding / marshalling and decoding / unmarshalling should be separate from libraries such as Asio - it can be used by multiple libraries, any time objects / data need to be transported among multiple platforms / systems, whether through a network or file system ... And definitely there should be hooks to Boost.Serialization.
Cliff
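[Illustration] XDR represents a single-precision float as a 4-byte IEEE 754 value, most significant byte first. A minimal sketch of that encoding in C++11 follows. The function names are hypothetical, and the sketch assumes the host float is already IEEE 754 binary32 and that the host stores floats and 32-bit integers in the same byte order (true on mainstream platforms, though not guaranteed).

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes float is IEEE 754 binary32");

// Write f as 4 bytes, most significant byte first.
void store_float_big(unsigned char* out, float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern without aliasing problems
    out[0] = static_cast<unsigned char>((bits >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((bits >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((bits >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(bits & 0xFF);
}

// Read 4 bytes, most significant byte first, back into a float.
float load_float_big(const unsigned char* in)
{
    const std::uint32_t bits = (static_cast<std::uint32_t>(in[0]) << 24) |
                               (static_cast<std::uint32_t>(in[1]) << 16) |
                               (static_cast<std::uint32_t>(in[2]) << 8) |
                                static_cast<std::uint32_t>(in[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

On a host whose float is not IEEE 754 the static_assert fails, and a real conversion of sign, exponent and mantissa would be needed instead.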

There was a recent discussion on this list regarding portable representation of floating point numbers. No one was able to find a portable way to transmit all types of NaNs from one machine architecture/compiler to any other. That is, the effort is stymied even before issues of representation arise.
Robert Ramey
Cliff Green wrote:
Sometimes it is necessary to send floats and doubles as well between different hosts.
When dealing with floating point, there are all kinds of issues, including (as already mentioned) IEEE754 or other formats.
By the time you have a generalized "marshalling" scheme that handles integers, floats, strings, etc, you've re-invented protocols such as XDR (eXternal Data Representation - see http://en.wikipedia.org/wiki/External_Data_Representation). Maybe there should be a Boost library for encoding and decoding to XDR? There are other possibilities such as SDXF or CDR (used in CORBA).
It would be really nice if boost.asio or boost.whatever automagically handled such floating-point primitives in the most efficient yet portable manner.
Whatever is added to Boost for encoding / marshalling and decoding / unmarshalling should be separate from libraries such as Asio - it can be used by multiple libraries, any time objects / data need to be transported among multiple platforms / systems, whether through a network or file system ... And definitely there should be hooks to Boost.Serialization.
Cliff
participants (7): Beman Dawes, Caleb Epstein, Cliff Green, Jeff Garland, Olaf van der Spek, Robert Ramey, Rune