[Serialization] binary archive 1.75 times larger than source data

Hi,

So I've got most of my serialization working now; the only problem is that it's still largely inefficient. The resulting binary file is 1.75 times larger than the source XML file which I parse from. This is probably due in large part to the fact that I still track and version everything by default.

What I'd love to do, however, is to specify on an archive-by-archive basis whether I want tracking / versioning. In fact, isn't this what one would normally want to do? When you specify tracking on a class-by-class basis, you have no idea whether there will be multiple pointers to the same object of that type. I found no_tracking through a reference from 2006 on this very mailing list; however, it doesn't seem to affect archive size at least (I haven't checked deserialization timings for it).

When I take a look at the deserialization, the binary archive seems to store identifiers for types as their plain string versions. This also seems largely inefficient; is there a reason why this can't just be a hash?

Also, are binary archives streaming, or do they wait until the entire archive is loaded?

Basically I'm wondering what would be the fastest possible archive one could create if one doesn't care about portability.

// Sebastian Karlsson

Sebastian.Karlsson@mmorpgs.org wrote:
Hi,
So I've got most of my serialization working now; the only problem is that it's still largely inefficient. The resulting binary file is 1.75 times larger than the source XML file which I parse from.
This is probably due in large part to the fact that I still track and version everything by default.
Maybe, maybe not.
What I'd love to do, however, is to specify on an archive-by-archive basis whether I want tracking / versioning. In fact, isn't this what one would normally want to do? When you specify tracking on a class-by-class basis, you have no idea whether there will be multiple pointers to the same object of that type. I found no_tracking through a reference from 2006 on this very mailing list; however, it doesn't seem to affect archive size at least (I haven't checked deserialization timings for it).
At one time I considered implementing this as a runtime flag. I eventually concluded that this was not a good idea, as it would load down everyone's code with the weight of a seldom-used feature. I concluded that it would be better implemented as a template parameter - if at all. Note that archives created with such a flag would be "write only", as tracking is necessary to properly restore pointers.
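To illustrate why an untracked archive would be "write only" for pointers, here is a minimal sketch of the idea. It is not Boost.Serialization's actual format or code; `Node`, `Token`, `save` and `load` are hypothetical names invented for this example. With tracking on, the second pointer to an object is stored as a back-reference; with tracking off, the object is simply written again, and loading produces two unrelated copies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct Node { int value; };

// One archive entry: either a freshly written object (payload = its value)
// or a back-reference to an earlier object (payload = its object id).
struct Token { bool is_new; int payload; };

// Hypothetical saver. With track == true, each distinct address is written
// once and later occurrences become back-references.
std::vector<Token> save(const std::vector<const Node*>& ptrs, bool track) {
    std::vector<Token> out;
    std::unordered_map<const Node*, int> seen;  // address -> object id
    for (const Node* p : ptrs) {
        auto it = track ? seen.find(p) : seen.end();
        if (it != seen.end()) {
            out.push_back({false, it->second});
        } else {
            if (track) seen.emplace(p, static_cast<int>(seen.size()));
            out.push_back({true, p->value});
        }
    }
    return out;
}

// Hypothetical loader. Object ids are assigned in order of appearance,
// mirroring the saver.
std::vector<std::shared_ptr<Node>> load(const std::vector<Token>& tokens) {
    std::vector<std::shared_ptr<Node>> objects;  // indexed by object id
    std::vector<std::shared_ptr<Node>> result;
    for (const Token& t : tokens) {
        if (t.is_new) {
            objects.push_back(std::make_shared<Node>(Node{t.payload}));
            result.push_back(objects.back());
        } else {
            const auto id = static_cast<std::size_t>(t.payload);
            assert(id < objects.size());  // back-reference must point backwards
            result.push_back(objects[id]);
        }
    }
    return result;
}
```

Saving two pointers to the same `Node` with `track == true` and loading them back yields two pointers to one object; with `track == false` the loader has no way to know they were aliases, so it returns two distinct objects holding equal values.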
When I take a look at the deserialization, the binary archive seems to store identifiers for types as their plain string versions. This also seems largely inefficient; is there a reason why this can't just be a hash?
I presume you're referring to exported types. If you don't like this, you can just "pre-register" these types with ar.register_type<T>(). This will assign a small integer to the type which is valid for just this archive.
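The idea of a small per-archive integer standing in for a type name can be sketched as follows. This is an illustration only, not the library's implementation: `TypeRegistry` and its members are hypothetical names. In Boost.Serialization itself the documented archive member is `register_type`, and the ids depend on registration order, so saving and loading code should register the same types in the same sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-archive registry that hands out small sequential ids.
class TypeRegistry {
public:
    // Returns the id for `name`, assigning the next free id on first use.
    std::uint16_t register_type(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const auto id = static_cast<std::uint16_t>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Reverse lookup, as a loader would need to pick the concrete type.
    const std::string& name_of(std::uint16_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::uint16_t> ids_;
    std::vector<std::string> names_;
};
```

Because the ids are only meaningful relative to one registry, an archive written with one registration order cannot be read back with a different one. That is the trade-off against string identifiers, which are larger but self-describing.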
Also, are binary archives streaming, or do they wait until the entire archive is loaded?
Streaming. The only storage used is for one data item at a time, the same as for all archives.
Basically I'm wondering what would be the fastest possible archive one could create if one doesn't care about portability.
binary_archive

Note that the binary archive stores just the raw bits. So if you have a huge number of 0 values on a 32-bit machine, you'll be saving 4 bytes for each data item. In a text rendering, this would be just 2 bytes ('0' + space). Hence, there is no reason to believe that the binary archive will always be the smallest. If you're concerned about I/O time, you could use a streambuf with data compression added on - but that is outside the scope of the serialization library.

Robert Ramey
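The size argument above can be checked with a small standalone sketch. It does not use the serialization library; `encode_binary`, `encode_text` and `demo_zero_heavy_data` are hypothetical helpers that mimic "raw bits" versus "decimal digits plus a space" for 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Raw-bits encoding: every value costs sizeof(std::int32_t) bytes,
// regardless of its magnitude.
std::string encode_binary(const std::vector<std::int32_t>& values) {
    std::ostringstream out(std::ios::out | std::ios::binary);
    for (std::int32_t v : values) {
        out.write(reinterpret_cast<const char*>(&v), sizeof v);
    }
    return out.str();
}

// Text encoding: decimal digits followed by a single space separator.
std::string encode_text(const std::vector<std::int32_t>& values) {
    std::ostringstream out;
    for (std::int32_t v : values) {
        out << v << ' ';
    }
    return out.str();
}

// Usage: 1000 zeros cost 4 bytes each as raw bits but 2 bytes each as text.
void demo_zero_heavy_data() {
    const std::vector<std::int32_t> zeros(1000, 0);
    assert(encode_binary(zeros).size() == 4000);
    assert(encode_text(zeros).size() == 2000);
}
```

The comparison flips for large values: a ten-digit number takes 11 bytes as text but still 4 bytes as raw bits, so which encoding is smaller depends on the data.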
// Sebastian Karlsson
participants (2): Robert Ramey, Sebastian.Karlsson@mmorpgs.org