[serialisation]Portable Binary Archives

Hughes, James

26 Jun 2007

2:42 p.m.

Hello all, Probably answered before, but I wondered whether there is a standard implementation for portable binary archives available. I have found a reference here (http://programmicon.com/2007/03/07/the-un-fun-stuff/). I noticed some recent posts on the subject, but haven't found out much info googling. Cheers James This message (including any attachments) contains confidential and/or proprietary information intended only for the addressee. Any unauthorized disclosure, copying, distribution or reliance on the contents of this information is strictly prohibited and may constitute a violation of law. If you are not the intended recipient, please notify the sender immediately by responding to this e-mail, and delete the message from your system. If you have any questions about this e-mail please notify the sender immediately.


Cliff Green

26 Jun

10:57 p.m.

...

Probably answered before, but I wondered whether there is a standard implementation for portable binary archives available.

I'm not quite sure what you mean by "standard implementation" - there are a number of commonly used portable binary approaches or standards. For example, XDR (IETF standard) has been around for quite a while:

http://tools.ietf.org/rfc/rfc4506.txt

as well as SDXF (not sure how much it is used):

http://tools.ietf.org/rfc/rfc3072.txt

There's CDR, used in CORBA and other libraries or frameworks where interoperability is needed:

http://en.wikipedia.org/wiki/Common_Data_Representation

I'm sure there's a gaggle of home-grown approaches (I've written some myself) as well as other industry standard approaches.

If you're talking about portable binary archives for Boost.Serialization, there's been talk, but I'm not sure what the status is - anyone have an implementation? I'd be willing to help / work with someone on it.

Beman submitted a nice set of utilities for endian handling, which could be used (something I've also written many times in the past, although not quite as comprehensive as Beman's). Integral byte swapping is pretty straightforward, but floating point is not - besides the obvious representation issues (e.g. IEEE 754 or not), there are some non-obvious issues dealing with floating point normalization and special values (infinity, etc). For example, I wrote one template function which returned (by value) byte swapped entities, but found it would silently change floating point values depending on the value, platform, compiler version, register usage, etc. It turns out normalization would occur on the "by value" return, changing bits in the byte swapped floating point number.

What use cases and constraints do you envision? There are lots of tradeoffs and design choices already discussed in previous Boost threads. Personally I think portable binary archiving through Boost.Serialization is long overdue.

Cliff
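A minimal sketch of one way to sidestep the by-value problem described in this message, assuming a 32-bit IEEE 754 float: the bytes are swapped on an unsigned integer image of the value, and a byte-swapped value is never held in or returned as a floating point type. The function names are hypothetical and are not taken from Beman's endian utilities or from Boost.Serialization.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit unsigned integer.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Copy the bit image of a float into an integer (assumption: 32-bit float).
inline std::uint32_t float_to_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copy an integer bit image back into a float.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// The swapped image is returned as an integer, so no floating point
// register ever holds the (possibly meaningless) swapped bit pattern.
inline std::uint32_t swapped_float_image(float f) {
    return byte_swap32(float_to_bits(f));
}

// Undo the swap first, then reinterpret the bits as a float.
inline float float_from_swapped_image(std::uint32_t swapped) {
    return bits_to_float(byte_swap32(swapped));
}
```

The design choice here is that only integers cross function boundaries in swapped form; the conversion to float happens once the bytes are back in host order.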

Johan Råde

27 Jun

7:33 a.m.

Cliff Green wrote:

...

...
Probably answered before, but I wondered whether there is a standard implementation for portable binary archives available.

[snip]

...

If you're talking about portable binary archives for Boost.Serialization, there's been talk, but I'm not sure what the status is - anyone have an implementation? I'd be willing to help / work with someone on it.

[snip]

...

Integral byte swapping is pretty straightforward, but floating point is not - besides the obvious representation issues (e.g. IEEE 754 or not), there are some non-obvious issues dealing with floating point normalization and special values (infinity, etc). For example, I wrote one template function which returned (by value) byte swapped entities, but found it would silently change floating point values depending on the value, platform, compiler version, register usage, etc. It turns out normalization would occur on the "by value" return, changing bits in the byte swapped floating point number.

I'd be willing to help out with the floating point issues. Last fall I wrote a library with portable fpclassify, isnan, signbit etc. (It is available in the vault in the Math - Numerics folder.) Several Boosters tested the library with more than 20 different OS / compiler / processor combinations. This experience taught me a lot about portable handling of floating point numbers. --Johan Råde
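As an illustration of the kind of check such a library has to provide, here is a small sketch that classifies a double by inspecting its bit pattern. It assumes double is IEEE 754 binary64; the names are hypothetical and this is not the code from the floating point utilities in the vault.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: double is IEEE 754 binary64 (1 sign, 11 exponent, 52 mantissa bits).
constexpr std::uint64_t kSignMask     = 0x8000000000000000ull;
constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ull;
constexpr std::uint64_t kMantissaMask = 0x000FFFFFFFFFFFFFull;

// Copy the bit image of a double into an integer.
inline std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Build a double from a bit image.
inline double double_from_bits(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// True when the sign bit is set, including for -0.0 and negative NaNs.
inline bool sign_bit_set(double d) {
    return (double_bits(d) & kSignMask) != 0;
}

// NaN: all exponent bits set and a non-zero mantissa.
inline bool is_nan_bits(double d) {
    const std::uint64_t b = double_bits(d);
    return (b & kExponentMask) == kExponentMask && (b & kMantissaMask) != 0;
}

// Infinity: all exponent bits set and a zero mantissa.
inline bool is_inf_bits(double d) {
    const std::uint64_t b = double_bits(d);
    return (b & kExponentMask) == kExponentMask && (b & kMantissaMask) == 0;
}
```

Working on the bit image avoids depending on how a particular compiler or math library treats comparisons with special values, at the cost of hard-wiring one floating point format.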

Robert Ramey

3:44 p.m.

Johan Råde wrote:

...

Cliff Green wrote:

...

Last fall I wrote a library with portable fpclassify, isnan, signbit etc. (It is available in the vault in the Math - Numerics folder.) Several Boosters tested the library with more than 20 different OS / compiler / processor combinations.

This is something that is really needed in boost. Could you possibly:

a) make a small boost library for this which would include i) code (hopefully header only?) ii) tests iii) documentation

and ask for a mini-review.

b) enhance the portable binary archive to take advantage of your code.

c) enhance the serialization documentation to explain that portable binary archive.

I realize that although each of these steps isn't all that large, taken together they do add up to a non-trivial amount of work. To compensate, you'll get:

a) a tiny increment in one man's quest (yours) for immortality.

b) a very nice addition to your resume.

c) the satisfaction of knowing that your work has passed muster with what might be the pickiest group of programmers in the known universe.

d) the satisfaction of knowing that your code is probably a key piece of perhaps thousands of applications.

I know it's not much, but it's the best I can offer.

...

This experience taught me a lot about portable handling of floating point numbers.

Which will bring its own curse. In the course of going through even a mini-review, you'll have to spend a lot of time with people who "know" a lot more than you do. Good Luck. Robert Ramey

...

--Johan Råde

Johan Råde

4:47 p.m.

Robert Ramey wrote:

...

Johan Råde wrote:

...
Cliff Green wrote:

...
Last fall I wrote a library with portable fpclassify, isnan, signbit etc. (It is available in the vault in the Math - Numerics folder.) Several Boosters tested the library with more than 20 different OS / compiler / processor combinations.

This is something that is really needed in boost. Could you possibly:

a) make a small boost library for this which would include i) code (hopefully header only?) ii) tests iii) documentation

and ask for a mini-review.

Dear Robert, The code (header only), tests and docs are done. The library also includes facets for portable handling of infinities and nans in text streams. The library is available in the vault: Math - Numerics / floating_point_utilities_v2.zip. It has been submitted for a review. All that is needed now is a review manager. Please take a look at the library Robert. I'd be happy to hear what you think. --Johan Råde

Johan Råde

4:53 p.m.

Robert Ramey wrote:

...

I realize that although each of these steps isn't all that large, taken together they do add up to a non-trivial amount of work. To compensate, you'll get:

a) a tiny increment in one man's quest (yours) for immortality. b) a very nice addition to your resume. c) the satisfaction of knowing that your work has passed muster with what might be the pickiest group of programmers in the known universe. d) the satisfaction of knowing that your code is probably a key piece of perhaps thousands of applications.

I know it's not much, but it's the best I can offer.

Maybe you could offer me a fix for the bug that makes the serialization library crash when you throw an exception in a load_construct_data function ;-) --Johan Råde

Robert Ramey

6:31 p.m.

Johan Råde wrote:

...

Maybe you could offer me a fix for the bug that makes the serialization library crash when you throw an exception in a load_construct_data function ;-)

LOL - hmmm - I think I looked into this but I don't remember what I did with it. Send me a link to the original complaint and I'll check again. Robert Ramey

...

--Johan Råde

Johan Råde

7:39 p.m.

New subject: [serialisation] load_construct_data exception safety

Robert Ramey wrote:

...

Johan Råde wrote:

...
Maybe you could offer me a fix for the bug that makes the serialization library crash when you throw an exception in a load_construct_data function ;-)

LOL - hmmm - I think I looked into this but I don't remember what I did with it. Send me a link to the original complaint and I'll check again.

Robert Ramey

The original complaint was in a private e-mail that I sent you 2 1/2 years ago. I don't think I will be able to find it. The sender was rade@maths.lth.se. If you don't have the mail, I'd be happy to send you a short example that reproduces the bug. --Johan Råde

Sohail Somani

7:54 p.m.

New subject: [serialisation] load_construct_data exception safety

On Wed, 2007-27-06 at 21:39 +0200, Johan Råde wrote:

...

Robert Ramey wrote:

...
Johan Råde wrote:

...
Maybe you could offer me a fix for the bug that makes the serialization library crash when you throw an exception in a load_construct_data function ;-)

LOL - hmmm - I think I looked into this but I don't remember what I did with it. Send me a link to the original complaint and I'll check again.

Robert Ramey

The original complaint was in a private e-mail that I sent you 2 1/2 years ago. I don't think I will be able to find it. The sender was rade@maths.lth.se. If you don't have the mail, I'd be happy to send you a short example that reproduces the bug.

Perhaps adding a ticket would also be beneficial so it doesn't get lost. http://svn.boost.org I think. Or at least that's where I'm filing feature removals! I mean bugs. Sohail

Robert Ramey

10:01 p.m.

New subject: [serialisation] load_construct_data exception safety

My current copy of the code includes, in the file iserializer.hpp (line # 291), the following code. Doesn't this address the situation?

Robert Ramey

    template<class Archive, class T>
    BOOST_DLLEXPORT void pointer_iserializer<Archive, T>::load_object_ptr(
        basic_iarchive & ar,
        void * & x,
        const unsigned int file_version
    ) const
    {
        Archive & ar_impl = boost::smart_cast_reference<Archive &>(ar);

        auto_ptr_with_deleter<T> ap(heap_allocator<T>::invoke());
        if(NULL == ap.get())
            boost::throw_exception(std::bad_alloc()) ;

        T * t = ap.get();
        x = t;

        // catch exception during load_construct_data so that we don't
        // automatically delete the t which is most likely not fully
        // constructed
        BOOST_TRY {
            // this addresses an obscure situation that occurs when
            // load_constructor de-serializes something through a pointer.
            ar.next_object_pointer(t);
            boost::serialization::load_construct_data_adl<Archive, T>(
                ar_impl,
                t,
                file_version
            );
        }
        BOOST_CATCH(...){
            ap.release();
            BOOST_RETHROW;
        }
        BOOST_CATCH_END

        ar_impl >> boost::serialization::make_nvp(NULL, * t);
        ap.release();
    }

Johan Råde wrote:

...

Robert Ramey wrote:

...
Johan Råde wrote:

...
Maybe you could offer me a fix for the bug that makes the serialization library crash when you throw an exception in a load_construct_data function ;-)

LOL - hmmm - I think I looked into this but I don't remember what I did with it. Send me a link to the original complaint and I'll check again.

Robert Ramey

The original complaint was in a private e-mail that I sent you 2 1/2 years ago. I don't think I will be able to find it. The sender was rade@maths.lth.se. If you don't have the mail, I'd be happy to send you a short example that reproduces the bug.

--Johan Råde

Johan Råde

30 Jun

8:23 a.m.

New subject: [serialisation] load_construct_data exception safety

Robert Ramey wrote:

...

My current copy of the code includes, in the file iserializer.hpp (line # 291), the following code. Doesn't this address the situation?

Robert Ramey

[snip] I ran a simple test case. It fails with Boost 1.34.0. It passes with the new version of pointer_iserializer. Thank you, Johan Råde
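For readers who want the underlying idea without the Boost machinery, here is a standalone sketch, not the library's code. It separates allocation from construction and, if construction throws, never runs the destructor on the half-built object. Note one deliberate difference from the pointer_iserializer code Robert posted: this sketch frees the raw storage before rethrowing. Widget and construct_in_raw_storage are hypothetical names.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical type whose construction can fail.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {
        if (v < 0) throw std::invalid_argument("negative value");
    }
};

// Allocate raw storage, then construct a T in it.
// If the constructor throws, only the raw storage is released; the
// destructor is not run, because no object was ever fully constructed.
template <class T, class Arg>
T* construct_in_raw_storage(const Arg& arg) {
    void* raw = ::operator new(sizeof(T));
    try {
        return ::new (raw) T(arg);
    } catch (...) {
        ::operator delete(raw);
        throw;
    }
}
```

A caller that receives the pointer owns a fully constructed object and can destroy it with delete; a caller that sees the exception has nothing to clean up.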

Cliff Green

27 Jun

4:56 p.m.

...

This is something that is really needed in boost. Could you possibly:

a) make a small boost library for this which would include i) code (hopefully header only?) ii) tests iii) documentation

and ask for a mini-review.

b) enhance the portable binary archive to take advantage of your code.

c) enhance the serialization documentation to explain that portable binary archive.

I realize I need to look at the current portable binary archive example code to better understand the design tradeoffs, but there are a number of possible approaches to consider. To me, it makes sense to have multiple portable binary archives, allowing the app to make the best choice for its need (maybe a good "compromise" binary archive could be the default for Boost.Serialization).

Possible choices (some already discussed):

1. Absolute smallest space - e.g. "continuation" bits, allowing integers to be 1, 2, 3, 4, etc bytes in size, depending on if the value is (respectively) 127 or less, 32767 or less, etc. (I'm pretty sure this has been discussed before). There could even be a "bitstream" packing versus byte-oriented, but the set of users for that might be pretty small (I've written "bitstream" protocols for wireless networks before, and believe me, the compactness is significantly more than the usual binary network protocols).

2. Absolute fastest speed - XDR fits this, as it is not all that space efficient (e.g. bools take up 4 bytes, short ints take up 4 bytes, etc). But it definitely is 4 byte aligned in everything (however, I would guess that the speed differences regarding alignment are much less for modern processors versus when XDR was first defined).

3. Flexibility versus simplicity - e.g. whether object ids, tags, implicit structuring and nesting, etc are directly supported versus just providing support for fundamental types, and (relatively) simple sequences.

4. Floating point support - IEEE 754 formats only versus later standards (and long double), other non-typical floating point formats, and probably a host of other possibilities for binary floating point.

5. Adherence to a commonly used (or even not that commonly used) protocol - XDR, CDR, etc.

Cliff
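A rough sketch of the "continuation bit" idea from choice 1, using one common layout: 7 payload bits per byte, least significant group first, with the high bit of each byte set when more bytes follow. With this particular layout the size cut-offs are 127 for one byte and 16383 for two; other layouts give different thresholds. The function names are hypothetical, and nothing here is taken from an existing Boost archive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v using 7 payload bits per byte; the high bit of each byte
// is the continuation flag (1 = more bytes follow).
inline void put_varint(std::vector<unsigned char>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<unsigned char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<unsigned char>(v));
}

// Decode one value starting at pos and advance pos past it.
inline std::uint64_t get_varint(const std::vector<unsigned char>& in,
                                std::size_t& pos) {
    std::uint64_t v = 0;
    int shift = 0;
    while (true) {
        // Guard against truncated input and over-long encodings.
        assert(pos < in.size() && shift < 64);
        const unsigned char b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return v;
        shift += 7;
    }
}
```

Because the byte order is fixed by the encoding itself, the result does not depend on the endianness of the machine that wrote it.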

Robert Ramey

6:45 p.m.

Well, sounds like that would be 5 new archives and/or derivations. You've got a lot of work to do. Robert Ramey. Cliff Green wrote:

...

Possible choices (some already discussed):

1. Absolute smallest space -

The easiest way to do this is to use a stream buffer which has compression built in. This is included in the Boost.Iostreams library. Using such a stream will compress - almost to the max - any of the boost archive types.

...

2. Absolute fastest speed

I doubt the native binary can be slower than anything else, as all it does is copy the bytes to the stream buffer. Of course it's not portable - but it will be the fastest.

...

3. Flexibility versus simplicity - e.g. whether object ids, tags, implicit structuring and nesting, etc are directly supported versus just providing support for fundamental types, and (relatively) simple sequences.

Without the "extras" it's not going to be compatible with boost serialization. It might be useful in its own right - but it's not going to be the same thing.

...

4. Floating point support - IEEE 754 formats only versus later standards (and long double), other non-typical floating point formats, and probably a host of other possibilities for binary floating point.

Of course all floating point types are handled in a portable way by the current text archives, so this is just a binary archive issue (except for portable NaN and other special values, which is a general i/o problem rather than a serialization problem).
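To illustrate why text is the easy portable route for ordinary finite values, here is a sketch of a decimal round trip using the standard streams. It is only an illustration of the general idea, not a description of how the Boost text archives are implemented; the function names are hypothetical. It prints max_digits10 significant digits, which is enough for a correctly rounding standard library to recover the same double. NaN and infinity are deliberately left out, matching the caveat in the message.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a finite double as decimal text with enough digits to round-trip.
inline std::string double_to_text(double d) {
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    return os.str();
}

// Read a double back from decimal text.
inline double text_to_double(const std::string& s) {
    std::istringstream is(s);
    double d = 0.0;
    is >> d;
    assert(!is.fail());  // the sketch expects well-formed input
    return d;
}
```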

...

5. Adherence to a commonly used (or even not that commonly used) protocol - XDR, CDR, etc.

If someone were really interested in this I would expect that he could make some progress in this area. I'm not sure how useful it would be but who knows? Robert Ramey

Hughes, James

7:35 a.m.

...

-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Cliff Green Sent: 26 June 2007 23:58 To: boost-users@lists.boost.org Subject: Re: [Boost-users] [serialisation]Portable Binary Archives

...
Probably answered before, but I wondered whether there is a standard implementation for portable binary archives available.

I'm not quite sure what you mean by "standard implementation" - there are a number of commonly used portable binary approaches or standards. For example, XDR (IETF standard) has been around for quite a while:

http://tools.ietf.org/rfc/rfc4506.txt

as well as SDXF (not sure how much it is used):

http://tools.ietf.org/rfc/rfc3072.txt

There's CDR, used in CORBA and other libraries or frameworks where interoperability is needed:

http://en.wikipedia.org/wiki/Common_Data_Representation

I'm sure there's a gaggle of home-grown approaches (I've written some myself) as well as other industry standard approaches.

If you're talking about portable binary archives for Boost.Serialization, there's been talk, but I'm not sure what the status is - anyone have an implementation? I'd be willing to help / work with someone on it.

That's what I am looking for. We need to get the file sizes down. We could use text (XML is great for debugging but too big for final release), but again that has its overheads. We generally don't use floating point so that may help us. The link I posted does have a portable implementation which I think uses IEEE 754. I just wondered if there was an official Boost.Serialization version; obviously not!!

<snip>

James

Robert Ramey

3:53 p.m.

Hughes, James wrote:

...

...
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Cliff Green Sent: 26 June 2007 23:58 To: boost-users@lists.boost.org Subject: Re: [Boost-users] [serialisation]Portable Binary Archives

...
Probably answered before, but I wondered whether there is a standard implementation for portable binary archives available.

I'm not quite sure what you mean by "standard implementation" - there are a number of commonly used portable binary approaches or standards. For example, XDR (IETF standard) has been around for quite a while:

http://tools.ietf.org/rfc/rfc4506.txt

When I looked at this it seemed to me that XDR only dealt with primitive types. So it would seem easy to make an XDR archive. Such an archive would have all XDR types, but would have some data members relevant to serialization - like object ids for tracked types. This would still be XDR compatible, but it's not clear that it would still be portable to other languages like, say, FORTRAN. Perhaps it might be possible to make an XDR archive which would not permit things like serialization of pointers.
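For a feel of what the primitive XDR encodings look like, here is a small sketch following RFC 4506: integers and booleans occupy 4 bytes with the most significant byte first, and a string is a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The function names and the Buffer typedef are hypothetical; this is not an archive class and does not cover object ids or tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Buffer;

// XDR unsigned integer: 4 bytes, most significant byte first.
inline void xdr_put_u32(Buffer& out, std::uint32_t v) {
    out.push_back(static_cast<unsigned char>((v >> 24) & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
    out.push_back(static_cast<unsigned char>(v & 0xFF));
}

// XDR signed integer: same layout, two's complement.
inline void xdr_put_int(Buffer& out, std::int32_t v) {
    xdr_put_u32(out, static_cast<std::uint32_t>(v));
}

// XDR boolean: encoded as the integer 0 or 1, so it also takes 4 bytes.
inline void xdr_put_bool(Buffer& out, bool b) {
    xdr_put_u32(out, b ? 1u : 0u);
}

// XDR string: 4-byte length, the bytes, then zero padding to a multiple of 4.
inline void xdr_put_string(Buffer& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);
    xdr_put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    const std::size_t pad = (4 - s.size() % 4) % 4;
    out.insert(out.end(), pad, static_cast<unsigned char>(0));
}
```

Every item ends on a 4-byte boundary, which is the alignment property mentioned earlier in the thread and also the reason small values cost more space than in a packed format.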

...

...
as well as SDXF (not sure how much it is used):

http://tools.ietf.org/rfc/rfc3072.txt

Don't know anything about this.

...

...
There's CDR, used in CORBA and other libraries or frameworks where interoperability is needed:

I made a very cursory examination of CDR and it seems to be an enhancement of XDR. It does have the concept of structures and object tags, so it's conceivable that one might make a CDR archive.

...

...
If you're talking about portable binary archives for Boost.Serialization, there's been talk, but I'm not sure what the status is - anyone have an implementation? I'd be willing to help / work with someone on it.

Talk to Johan Råde.

...

That's what I am looking for. We need to get the file sizes down. We could use text (XML is great for debugging but too big for final release), but again that has its overheads. We generally don't use floating point so that may help us. The link I posted does have a portable implementation which I think uses IEEE 754. I just wondered if there was an official Boost.Serialization version; obviously not!!

I don't think binary archives are going to be smaller than the equivalent text archive. native binary archives are built for speed, text for portability, and xml to satisfy those who feel they need it. Robert Ramey

Cliff Green

4:27 p.m.

...

I don't think binary archives are going to be smaller than the equivalent text archive. native binary archives are built for speed, text for portability, and xml to satisfy those who feel they need it.

I generally agree, but I wouldn't be surprised if binary archives are smaller than text archives for everything but unusual cases. It may not be a significant percentage difference, and of course without some objective test cases I'm just guessing. And of course we need to better define the specifics of a "binary archive" - I'll follow-up with a reply to your next e-mail. Cliff

Scott McMurray

8:08 p.m.

On 27/06/07, Cliff Green <cliffg@codewrangler.net> wrote:

...

...
I don't think binary archives are going to be smaller than the equivalent text archive. native binary archives are built for speed, text for portability, and xml to satisfy those who feel they need it.

I generally agree, but I wouldn't be surprised if binary archives are smaller than text archives for everything but unusual cases. It may not be a significant percentage difference, and of course without some objective test cases I'm just guessing.

FWIW, I'm currently using binary files for some work because the text is huge and slow. For the naive, punning-to-char* approach, my files, for example, go down to 1.2 GiB instead of the 2.1 GiB for text, despite storing slightly more data per entry (quaternion instead of euler angles, to be precise). It also makes my load time drop from a few minutes to a few seconds for the smallest of the files (150 MiB of text), which adds up very quickly.

Now this is nowhere near as general as a proper portable Boost.Serialization-style archive, but I consider it an important usage of a possible binary library, whether that be some sort of boost.endian (to just make what I'm doing portable), a binary I/O library (as another thread is discussing), or a portable binary archive Boost.Serialization library (which obviously couldn't get as much space reduction, but the speed boost should still be considerable).

~ Scott McMurray
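One way the punning-to-char* approach could be made independent of host byte order is to emit each value's bit image in a fixed order with shifts. The sketch below does that for a record of four floats. It assumes a 32-bit IEEE 754 float on every machine involved; the Quaternion struct and function names are hypothetical and only loosely modelled on the data described in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record type.
struct Quaternion { float w, x, y, z; };

// Append a float as 4 bytes, least significant byte first,
// regardless of the byte order of the host.
inline void put_f32_le(std::vector<unsigned char>& out, float f) {
    static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
}

// Read a float written by put_f32_le and advance pos by 4.
inline float get_f32_le(const std::vector<unsigned char>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t bits = 0;
    for (int shift = 0; shift < 32; shift += 8)
        bits |= static_cast<std::uint32_t>(in[pos++]) << shift;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Write the four components in a fixed order: 16 bytes per record.
inline void put_quaternion(std::vector<unsigned char>& out, const Quaternion& q) {
    put_f32_le(out, q.w);
    put_f32_le(out, q.x);
    put_f32_le(out, q.y);
    put_f32_le(out, q.z);
}

// Read the four components back in the same order.
inline Quaternion get_quaternion(const std::vector<unsigned char>& in,
                                 std::size_t& pos) {
    Quaternion q;
    q.w = get_f32_le(in, pos);
    q.x = get_f32_le(in, pos);
    q.y = get_f32_le(in, pos);
    q.z = get_f32_le(in, pos);
    return q;
}
```

Writing field by field also avoids depending on struct padding, which a raw cast of the whole record would bake into the file.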

Cliff Green

8:41 p.m.

"Scott McMurray" <me22.ca+boost@gmail.com> wrote:

...

FWIW, I'm currently using binary files for some work because the text is huge and slow.

... down to 1.2 GiB instead of the 2.1 GiB for text ... It also makes my load time drop from a few minutes to a few seconds

This has been my (somewhat limited application domain) experience, as well. Almost all of the distributed systems I've worked on that used binary message data passing did so for both processing speed and compactness of the data. Most of the system architects understood the "brittleness" and drawbacks of the design (especially when floating point values were needed), but were willing to make the tradeoff. I'd love to be able to remove at least some of the code design and portability drawbacks for applications needing to pass (or read / write) binary data. Cliff

Pfligersdorffer, Christian

25 Jul

3:40 p.m.

So, is there anything going on in this cause? I really need floating point types so I'd love to see some progress in that. I am also willing to do some coding if need be.

Regards,

--
Christian Pfligersdorffer
Software Engineering
www.eos.info
_______________________________________________________________

Cliff Green on Wednesday, 27. June 2007 22:41:

"Scott McMurray" <me22.ca+boost@gmail.com> wrote:

...

FWIW, I'm currently using binary files for some work because the text is huge and slow.

... down to 1.2 GiB instead of the 2.1 GiB for text ... It also makes my load time drop from a few minutes to a few seconds

Robert Ramey

5:14 p.m.

I am hopeful that progress will soon be made in this area. This would accomplish:

a) fixing noted bugs in the portable binary archive and making it pass the serialization torture tests that the other archives are subjected to.

b) subjecting it (along with other archives) to profiling, at least with gcc compilers.

I wasn't planning on addressing floating point. If someone wants to do this I would certainly be happy to lend my constructive criticism to the effort. The main issues are:

a) portable subset of NaNs etc. It seems to me this is ready, in that this code has been done by Johan Råde, so that would seem OK.

b) portable encoding of floating point - a very attractive proposal was submitted some time ago (with code !!!) by Ralf W Grosse-Kunstleve; search the mailing list for this.

So feel free to get busy.

Robert Ramey

Pfligersdorffer, Christian wrote:

...

So, is there anything going on in this cause? I really need floating point types so I'd love to see some progress in that. I am also willing to do some coding if need be.

Regards,

"Scott McMurray" <me22.ca+boost@gmail.com> wrote:

...
FWIW, I'm currently using binary files for some work because the text is huge and slow.

... down to 1.2 GiB instead of the 2.1 GiB for text ... It also makes my load time drop from a few minutes to a few seconds

This has been my (somewhat limited application domain) experience, as well. Almost all of the distributed systems I've worked on that used binary message data passing did so for both processing speed and compactness of the data. Most of the system architects understood the "brittleness" and drawbacks of the design (especially when floating point values were needed), but were willing to make the tradeoff.

I'd love to be able to remove at least some of the code design and portability drawbacks for applications needing to pass (or read / write) binary data.

Cliff _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Hughes, James

27 Jun

4:36 p.m.

...

...
That's what I am looking for. We need to get the file sizes down. We could use text (XML is great for debugging but too big for final release), but again that has its overheads. We generally don't use floating point so that may help us. The link I posted does have a portable implementation which I think uses IEEE 754. I just wondered if there was an official Boost.Serialization version; obviously not!!

I don't think binary archives are going to be smaller than the equivalent text archive. native binary archives are built for speed, text for portability, and xml to satisfy those who feel they need it.

Robert Ramey

Thanks for that comment on sizes Robert - will bring that to the table when we next talk about serialisation here!!!!

James

Robert Ramey

6:29 p.m.

Hughes, James wrote:

...

Thanks for that comment on sizes Robert - will bring that to the table when we next talk about serialisation here!!!!

Just to clarify. Suppose you have an archive with a lot of 4-byte integers whose values are small numbers - one or two digits. The text archive would represent this as:

1 24 14 2 7 ...

averaging 3 bytes per integer, while the native binary archive would represent the values as 4 bytes per integer. I don't remember the floating point format offhand. So whether the text or binary is larger or smaller would depend on the specific case. I doubt that there is a huge difference either way.

Robert Ramey
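The comparison in this message can be checked with a little arithmetic. The sketch below counts bytes under two simplifying assumptions that are not claims about the actual archive formats: text costs the decimal digits of each value plus one separator character, and binary costs a fixed 4 bytes per value. The function names are hypothetical. For the five sample values above this gives 12 bytes of text against 20 bytes of binary, while large values tip the balance the other way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bytes needed when each value is written as decimal text
// followed by one separator character (assumption of this sketch).
inline std::size_t text_size(const std::vector<std::int32_t>& values) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        n += std::to_string(values[i]).size() + 1;
    return n;
}

// Bytes needed when each value is written as a raw 4-byte integer.
inline std::size_t binary_size(const std::vector<std::int32_t>& values) {
    return values.size() * sizeof(std::int32_t);
}
```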
