Re: [Boost-users] [serialisation]Portable Binary Archives
boost-users-bounces@lists.boost.org wrote:
If you're talking about portable binary archives for Boost.Serialization, there's been talk, but I'm not sure what the status is - anyone have an implementation? I'd be willing to help / work with someone on it.
That's what I am looking for. We need to get the file sizes down. We could use text (XML is great for debugging but too big for final release), but again that has its overheads. We generally don't use floating point, so that may help us. The link I posted does have a portable implementation which I think uses IEEE 754. I just wondered if there was an official Boost.Serialization version; obviously not!!
Hello James,

there _is_ a portable binary archive provided with the examples but, as Robert Ramey points out in the code documentation, it is not complete as it lacks the floating point conversion. But if Johan is willing to fill this gap I am perfectly confident that we could have a comprehensive implementation in a few weeks. That is, if the issue that Dan Leibovich found in the example can be resolved.

I follow the portable archive discussion with great interest since I would very much like to use it too.

PS @ Cliff: Very interesting post! Do you know about when and if the endian library will enter the official boost release? I would find that useful too.

Regards,
-- Christian Pfligersdorffer Software Engineering EOS GmbH
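On the file-size concern: one common way to keep a portable binary format small is to store integers with only as many bytes as they actually need, in a fixed byte order. The following is a minimal sketch of that idea in standard C++. The function names and the exact layout (one signed length byte, negated for negative values, followed by the magnitude in little-endian order) are assumptions for illustration; this is not claimed to be the byte format of the portable binary archive example shipped with Boost.Serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: append v as a signed length byte followed by the
// magnitude in little-endian order, using only as many bytes as needed.
// A length of 0 encodes the value 0; a negative length marks a negative value.
void put_compact(std::vector<unsigned char>& out, std::int64_t v)
{
    // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
    std::uint64_t mag = v < 0 ? 0 - static_cast<std::uint64_t>(v)
                              : static_cast<std::uint64_t>(v);
    unsigned char bytes[8];
    signed char n = 0;
    while (mag != 0) {
        bytes[n++] = static_cast<unsigned char>(mag & 0xFF);
        mag >>= 8;
    }
    out.push_back(static_cast<unsigned char>(v < 0 ? -n : n));
    out.insert(out.end(), bytes, bytes + n);
}

// Hypothetical inverse of put_compact: reads one value starting at pos and
// advances pos past it. Throws if the buffer is truncated or malformed.
std::int64_t get_compact(const std::vector<unsigned char>& in, std::size_t& pos)
{
    if (pos >= in.size()) throw std::runtime_error("truncated input");
    int n = static_cast<signed char>(in[pos++]);
    const bool negative = n < 0;
    if (negative) n = -n;
    if (n > 8 || in.size() - pos < static_cast<std::size_t>(n))
        throw std::runtime_error("malformed input");
    std::uint64_t mag = 0;
    for (int i = n - 1; i >= 0; --i)   // most significant byte is stored last
        mag = (mag << 8) | in[pos + i];
    pos += n;
    // Negate in unsigned arithmetic, then convert back to the signed type.
    return negative ? static_cast<std::int64_t>(0 - mag)
                    : static_cast<std::int64_t>(mag);
}
```

With this layout a small value such as 300 costs three bytes instead of eight, and the byte sequence is the same whatever the host's own endianness.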
Pfligersdorffer, Christian wrote:
there _is_ a portable binary archive provided with the examples but as Robert Ramey points out in the code documentation, it is not complete as it lacks the floating point conversion. But if Johan is willing to fill this gap I am perfectly confident that we could have a comprehensive implementation in a few weeks. That is, if the issue that Dan Leibovich found in the example can be resolved.
Here is a rough sketch of how to deal with some of the issues:

1. Only support platforms with IEEE 754 compliant float and double. However, allow platforms with no denorms, no infinity and no NaN.
2. Preserve the sign bit of zero, infinity and NaN when serializing and deserializing.
3. Do not preserve the exact bit pattern of NaN. (That would be pointless, because a bit pattern that is a quiet NaN on one platform may be a signaling NaN on another platform.)
4. Replace denorms by 0 when deserializing denorms on platforms with no denorms.
5. Throw an exception when deserializing infinity and NaN on platforms with no infinity and NaN.
6. Do not support long double at all. There exists a jungle of different long double formats.

Any comments on this?

--Johan Råde
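Rules 1 to 3 of this proposal can be sketched in standard C++ roughly as follows. The function names are hypothetical, and the byte order (most significant byte first) is an arbitrary choice for the illustration. Rule 3 is satisfied here by writing every NaN as one canonical quiet-NaN pattern with its sign kept, which is only one possible way to meet it. Note that the sketch checks `std::numeric_limits<double>::is_iec559`, which is stricter than rule 1: it implies that infinity and NaN exist, so platforms rule 1 would still accept are rejected at compile time. Rule 4 (flushing denorms) is not implemented, and long double (rule 6) is not handled at all.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Rule 1 (stricter form): only IEEE 754 binary64 doubles are supported.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Hypothetical helper: encode d as 8 bytes, most significant byte first.
// Rule 2: the sign bit of zero, infinity and NaN is kept as-is.
// Rule 3: any NaN is replaced by the canonical quiet NaN with the same sign,
// so the platform-specific payload bits are deliberately not transported.
void put_double(unsigned char out[8], double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    if (std::isnan(d))
        bits = (bits & 0x8000000000000000ULL) | 0x7FF8000000000000ULL;
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(bits & 0xFF);
        bits >>= 8;
    }
}

// Hypothetical inverse of put_double.
double get_double(const unsigned char in[8])
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    // Rule 5 cannot arise: is_iec559 guarantees infinity and NaN exist.
    // Rule 4 (replacing a denorm by a zero of the same sign on a platform
    // without denorms) would need an extra check here and is omitted.
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

Going through `std::memcpy` rather than a pointer cast keeps the bit reinterpretation well defined, and the shift loops make the byte order independent of the host.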
On 27/06/07, Pfligersdorffer, Christian
PS @ Cliff: Very interesting post! Do you know about when and if the endian library will enter the official boost release? I would find that useful too.
It looks as though it hasn't been proposed for review: http://www.boost.org/more/formal_review_schedule.html I don't know whether it requires one, however, or if it can simply become part of Boost.Integer without one. I do remember participating in quite a bit of discussion about it, with many interesting ideas/additions/alternatives raised, so a mini-review (or something) would probably happen. ~ Scott McMurray
... endian library
It looks as though it hasn't been proposed for review: http://www.boost.org/more/formal_review_schedule.html
... quite a bit of discussion about it, with many interesting ideas/additions/alternatives raised, so a mini-review (or something) would probably happen.
It's been a while (almost a year?) since Beman uploaded his endian utility. Beman, is endian-06.zip in the vault the latest?

Is the first step in creating a portable binary archive collecting and summarizing the discussions from a year ago and performing a mini-review to make the endian utility part of Boost? This would provide endian utilities for integral types. The floating point utilities from Johan (which I haven't looked at yet) would need a place to reside (they could reside in the serialization archive, but it seems natural to have them as a separate utility).

I'm going to look at the example serialization code. One concern I have (at this point an uneducated concern, since so far I've only casually read the serialization docs) is that I'd like a way to use a binary archive that doesn't have the "metadata" that is normally provided in serialization archives. I want to be able to read / write or send / receive buffers of packed binary data where I have complete control over every byte; I don't want type ids, version numbers, pointer sharing semantics, etc. (unless I explicitly put them in my code). I realize this is an orthogonal concern to the portable binary mechanics, but I wouldn't be surprised if the users and applications wanting compact and efficient binary archives overlap significantly with the users and apps wanting control of every byte in the stream.

Cliff
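Regarding an endian utility for integral types: whatever interface such a library ends up with, the core idea can be sketched with shifts, which produce the same byte sequence on big- and little-endian hosts, so no compile-time endianness detection is needed. The names `store_big` and `load_big` are hypothetical and are not the endian library's API; the sketch also assumes 8-bit bytes. Signed values can be passed through the unsigned type of the same width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: write value into out[0 .. sizeof(T)) in big-endian
// (network) order, least significant byte last.
template <class T>
void store_big(unsigned char* out, T value)
{
    static_assert(std::is_unsigned<T>::value, "use the unsigned type of the same width");
    for (std::size_t i = sizeof(T); i-- > 0; ) {
        out[i] = static_cast<unsigned char>(value & 0xFF);
        value = static_cast<T>(value >> 8);
    }
}

// Hypothetical inverse of store_big: read sizeof(T) big-endian bytes.
template <class T>
T load_big(const unsigned char* in)
{
    static_assert(std::is_unsigned<T>::value, "use the unsigned type of the same width");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | in[i]);
    return value;
}
```

For example, `store_big<std::uint32_t>(buf, 0x11223344u)` fills `buf` with 0x11, 0x22, 0x33, 0x44 on any host.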
Cliff Green wrote:
that I'd like a way to use a binary archive that doesn't have the "metadata" that is normally provided in serialization archives - I want to be able to read / write or send / receive buffers of packed binary data where I have complete control over every byte - I don't want type ids, version numbers, pointer sharing semantics, etc (unless I explicitly put them in my code). I realize this is an orthogonal concern to the portable binary mechanics, but I wouldn't be surprised if the users and applications wanting compact and efficient binary archives overlap significantly with the users and apps wanting control of every byte in the stream.
One of the stated goals/requirements of the serialization library is that it handle any collection of data structures that can be expressed in C++. I couldn't figure out how to do this without object tags. Having said that, object tags are emitted only when the serialized data structures require it. Versioning is optional and can be suppressed. However, doing so always turns out to be a bad idea.

So I believe that your requirements conflict in a fundamental way with those explicitly stated for the serialization library. I suspect that it's not a good solution for you.

Robert Ramey
Cliff
So I believe that your requirements conflict in a fundamental way with those explicitly stated for the serialization library. I suspect that it's not a good solution for you.
That's very possible. Although never explicitly stated in the Boost.S11N docs, your e-mail statements imply that B.S11N is not usable unless both "sides" (reader / writer, sender / receiver) are using B.S11N. Otherwise the non-B.S11N side must know the intimate format and protocol of the metadata added by B.S11N, and would be subject to changes in the internal protocol, pretty much making it a non-usable maintenance headache. I have no problem with this constraint on B.S11N, although it does limit its applicability (while providing more capabilities for the "save and restore object state" type of applications). I now better understand why some of the networking people have been discussing a completely separate "marshalling / unmarshalling" library to handle that need (rather than considering how to integrate with B.S11N).

Much of my day job is designing and implementing distributed processing systems between heterogeneous systems, where any or all of the following can vary: platform, OS, compiler, endianness, language, and third-party libraries. I was hoping that an application could write one (B.S11N) serialize function and instantiate it with a variety of (B.S11N) archives that would allow control of the object marshalling and unmarshalling, to the degree that an application could interface with another application not using B.S11N. The capabilities allowed for this kind of usage are obviously more limited than what B.S11N allows for object serialization and re-instantiation.

Looking at the B.Asio example code using B.S11N, there's an assumption that both sides of the network pipe are using B.S11N. The text archive string is wrapped with a header (basically the size of the string), following a very typical networking approach (read the fixed size header, which contains the size of the rest of the message, then read the rest of the message until everything arrives). All of the "message data" contents are opaque to the network / ASIO plumbing of the example code, and the actual object serialization and deserialization is left up to the B.S11N archive code.

I welcome any corrections to my assumptions, characterizations, or implications ... :)

Cliff
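The framing described here (a fixed-size header carrying the length of the body, followed by the body itself) does not depend on Boost.Asio. A minimal sketch using only the standard library follows; the 8-character hexadecimal header and the names `frame` and `body_length` are assumptions made for this illustration, not taken from the Asio example code. The body is treated as an opaque, already-serialized string, just as the plumbing in that example treats the archive text.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed header size: 8 hexadecimal digits, enough for bodies up to 0xFFFFFFFF bytes.
constexpr std::size_t header_length = 8;

// Hypothetical sender-side helper: prefix an already-serialized body with its length.
std::string frame(const std::string& body)
{
    if (body.size() > 0xFFFFFFFFu) throw std::length_error("body too large");
    std::ostringstream header;
    header << std::setw(static_cast<int>(header_length)) << std::setfill('0')
           << std::hex << body.size();
    return header.str() + body;
}

// Hypothetical receiver-side helper: once header_length bytes have arrived,
// decode how many more bytes must be read to obtain the whole body.
std::size_t body_length(const std::string& header)
{
    if (header.size() != header_length) throw std::invalid_argument("bad header size");
    std::istringstream is(header);
    std::size_t n = 0;
    // Require that the whole header parsed as hex digits (eof reached).
    if (!(is >> std::hex >> n) || !is.eof()) throw std::invalid_argument("bad header");
    return n;
}
```

A receiver reads exactly `header_length` bytes, calls `body_length`, then keeps reading until that many body bytes have arrived; only then is the body handed to whatever archive (or non-Boost parser) the other side uses.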
On Wed, 2007-27-06 at 16:30 -0400, Cliff Green wrote:
Looking at the B.Asio example code using B.S11N, there's an assumption that both sides of the network pipe are using B.S11N
Hi Cliff :) I think this is true, but it is wholly dependent on the archive. I may be wrong, but I believe this to be true. In any case, for all networked applications, both ends need to fully agree on a protocol whether or not they are using the same s11n library. With a JSON archive, this might work. Theoretically.

Sohail
Hey Sohail!
Sohail Somani
... In any case, for all networked applications, both ends need to fully agree on a protocol whether or not they are using the same s11n library.
Definitely true - but that's true of anything using any form of serialization (or marshalling or other appropriate term).

Let me clarify or summarize my thoughts in this e-mail thread. Here's one of the statements Robert makes in the B.S11N rationale: "This library will be useful in other contexts besides implementing persistence. The most obvious case is that of marshalling data for transmission to another system."

Since that fits exactly in the area I typically work in, and I'm starting to do some B.Asio work, and it's germane to other current Boost discussions (binary I/O streams, RPC libraries, various networking projects), I'm trying to see whether / how B.S11N fits in. If both sides use B.S11N with the same archive, it's going to work just fine (given an appropriate network header / wrapper, as demonstrated by Chris K in B.Asio). But if only one side uses B.S11N, what are the implications?

What I'd like is something similar to (simple example):

    #include <string>
    #include <vector>

    // Gps and Dir are application types defined elsewhere.
    struct Track {
        Gps loc;                    // two floats
        Dir dir;                    // int
        std::string trackId;
        std::vector<int> iffData;
    };

    template<class Archive>
    void serialize(Archive & ar, Track & t, const unsigned int version)
    {
        ar & t.loc;
        ar & t.dir;
        ar & t.trackId;
        ar & t.iffData;
    }

Other app domain classes would / could have more complicated serialization semantics. Now the archive should / could be selectable by different layers of the app / framework, and not ever disturb the code above, so for example you might send / receive using:

1. B.S11N text archive (other side has same)
2. B.S11N XML archive (other side has same)
3. A binary archive (other side does not use Boost)
4. A text archive (other side does not use Boost)

I would hope the network "on the wire" protocol could be implemented through a B.S11N archive. The above example code could be easily sent as XDR or CDR (for binary), or a simple text protocol with the other side able to directly extract the data into a similar struct.
The full range of B.S11N capabilities is typically not implemented in most "low level, on the wire" protocols, so constraints would be present. But it would allow a nice separation of application domain serialization code from the mechanics of the serialization archives.

Maybe another way of stating my thoughts: as I'm just now learning the basics of B.S11N, I'm trying to understand the mapping of the app domain serialization logic to the underlying B.S11N archive code, and the associated implications. If B.S11N is not the best general-purpose library for my needs, I want to understand why, and what a good general-purpose design would be. In particular, application domain code should not know or care about the underlying protocols used for networking or file I/O.

Cliff
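To make the idea concrete: the `serialize` function template only requires an `Archive` type with a suitable `operator&`, so an archive that writes nothing but the field values in a fixed byte order can be written without Boost at all. The sketch below is hypothetical throughout: `raw_oarchive`, the simplified `Gps` and `Dir` definitions, and the wire layout (big-endian 32-bit integers and IEEE 754 floats, with strings and vectors prefixed by a 32-bit count, loosely XDR-like but without XDR's 4-byte padding) are all assumptions. It is not a Boost.Serialization archive: it emits no class ids, versions or object tracking, so it cannot handle pointers or the other cases Robert describes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// Simplified, hypothetical stand-ins for the application types in the example.
struct Gps { float lat; float lon; };
enum Dir : std::int32_t { North, East, South, West };

struct Track {
    Gps loc;
    Dir dir;
    std::string trackId;
    std::vector<int> iffData;
};

// Hypothetical metadata-free output archive: each value becomes only its bytes.
class raw_oarchive {
public:
    std::vector<unsigned char> bytes;

    raw_oarchive& operator&(std::int32_t v) { put32(static_cast<std::uint32_t>(v)); return *this; }
    raw_oarchive& operator&(float f) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);   // IEEE 754 bit pattern
        put32(bits);
        return *this;
    }
    raw_oarchive& operator&(Dir d) { return *this & static_cast<std::int32_t>(d); }
    raw_oarchive& operator&(const Gps& g) { return *this & g.lat & g.lon; }
    raw_oarchive& operator&(const std::string& s) {
        put32(static_cast<std::uint32_t>(s.size()));   // count, then raw characters
        bytes.insert(bytes.end(), s.begin(), s.end());
        return *this;
    }
    raw_oarchive& operator&(const std::vector<int>& v) {
        put32(static_cast<std::uint32_t>(v.size()));   // count, then elements
        for (int x : v) *this & static_cast<std::int32_t>(x);
        return *this;
    }

private:
    void put32(std::uint32_t v) {   // big-endian, independent of host byte order
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<unsigned char>(v >> shift));
    }
};

// The serialize function from the message, unchanged; only the Archive differs.
template <class Archive>
void serialize(Archive& ar, Track& t, const unsigned int /*version*/)
{
    ar & t.loc;
    ar & t.dir;
    ar & t.trackId;
    ar & t.iffData;
}
```

A matching input archive would read the same fields back in the same order. Because the byte layout is fixed and documented, a peer that does not use Boost (or C++) can decode it, which is exactly the trade-off being discussed: full control of every byte at the cost of the tracking and versioning features.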
Note that Boost has reviewed and accepted an MPI library in which much effort was expended to address these issues along with others.

Of course Boost Serialization has never been proposed nor promoted as the solution to every problem. The stated goals preclude that in any case. It's a good solution if you have the problem it's designed to address. It has been a good solution for lots of problems that were not originally contemplated. The fundamental goal was/is to serialize anything that can be represented in C++. This would generally preclude guaranteeing a format which could be read by something like Visual Basic, for example. It is what it is rather than something else.

Robert Ramey
participants (6)
- Cliff Green
- Johan Råde
- Pfligersdorffer, Christian
- Robert Ramey
- Scott McMurray
- Sohail Somani