[serialization] Binary archive STL container template specialization?

Greetings everyone,

I need to stream data over a very small data connection (think cell phones and GPRS). The type of data will vary, but it will typically contain strings, numbers, and short groups of these basic types. The applications on both ends of the pipe will be written in C++, so I've loosely decided on the boost::serialization library, as it eliminates virtually all of the code I would otherwise have had to write by hand. (Awesome!)

I've made up a bunch of test archives, and I'd like to get some feedback on a possible optimization, or at least specialization, of the code that streams out STL collections. The current serialization method for STL containers saves the size of the container followed by each item inside the container. This code is also used for std::string, since it behaves like an STL container of characters. The downside is that each string uses a minimum of 8 bytes (a 64-bit integer) plus the string payload.

Proposal: Write out a single byte that indicates the number of elements to follow. If the number of elements is 255 or more, write out a single escape byte 0xFF, followed by a size_type holding the actual count. Reading follows the same pattern in reverse: read a single byte; if it is 0xFF, read a size_type, otherwise the byte itself is the count.

A simple example from my problem domain: I have a list of three-letter bin locations in a warehouse, and each bin contains a quantity of a specific item. I will have the following data to send:

DER: 427
ALU: 582
COM: 821
TER: 991
FLO: 0
TER: 298
ALP: 332
PED: 773

Using the boost serialization framework, that data becomes 160 bytes (8 for the container size, 8 + length for each string, and 8 for each integer). Using my proposal, the data size drops to 97 bytes (1 for the container size, 1 + length for each string, and 8 for each integer), a reduction of nearly 40%. I theorize that most serialized strings and collections have fewer than 255 items or characters (especially in my problem domain), and that this technique will save us many bytes on the wire over time.
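A minimal C++ sketch of the proposed count encoding, for illustration only. The helper names (write_count, read_count, encode_bins, decode_bins, fixed_width_size) are hypothetical and not part of Boost.Serialization. It assumes the escaped count is written as a fixed 8-byte little-endian value and that each quantity stays an 8-byte integer, which is how the 160- and 97-byte figures above are counted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers sketching the proposed container-size encoding;
// none of these names come from Boost.Serialization.
constexpr std::uint8_t kEscape = 0xFF;

using Bin = std::pair<std::string, std::int64_t>;  // bin name, quantity

// Appends v as 8 little-endian bytes.
void write_u64(std::vector<std::uint8_t>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Reads 8 little-endian bytes starting at pos and advances pos.
std::uint64_t read_u64(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos + 8 > in.size()) throw std::runtime_error("truncated input");
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(in[pos++]) << (8 * i);
    return v;
}

// Counts below 255 take one byte; larger counts take the escape byte
// followed by the full 8-byte count (9 bytes in total).
void write_count(std::vector<std::uint8_t>& out, std::uint64_t n) {
    if (n < kEscape) {
        out.push_back(static_cast<std::uint8_t>(n));
    } else {
        out.push_back(kEscape);
        write_u64(out, n);
    }
}

// Reverse of write_count: one byte, or the escape byte plus 8 more.
std::uint64_t read_count(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos >= in.size()) throw std::runtime_error("truncated input");
    const std::uint8_t first = in[pos++];
    return first == kEscape ? read_u64(in, pos) : first;
}

std::vector<std::uint8_t> encode_bins(const std::vector<Bin>& bins) {
    std::vector<std::uint8_t> out;
    write_count(out, bins.size());
    for (const Bin& b : bins) {
        write_count(out, b.first.size());
        out.insert(out.end(), b.first.begin(), b.first.end());
        write_u64(out, static_cast<std::uint64_t>(b.second));  // quantity stays 8 bytes
    }
    return out;
}

std::vector<Bin> decode_bins(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    std::vector<Bin> bins(read_count(in, pos));
    for (Bin& b : bins) {
        const std::uint64_t len = read_count(in, pos);
        if (pos + len > in.size()) throw std::runtime_error("truncated input");
        b.first.assign(reinterpret_cast<const char*>(in.data()) + pos, len);
        pos += len;
        b.second = static_cast<std::int64_t>(read_u64(in, pos));
    }
    return bins;
}

// Size of the same data with 8-byte counts, as described in the post.
std::size_t fixed_width_size(const std::vector<Bin>& bins) {
    std::size_t total = 8;  // container size
    for (const Bin& b : bins)
        total += 8 + b.first.size() + 8;  // string size + characters + quantity
    return total;
}
```

With the eight bins from the example, fixed_width_size reports 160 bytes and encode_bins produces 97, matching the arithmetic above. A count of 255 or more costs 9 bytes instead of 8, so the escape loses only one byte in the rare large case.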
A) Do you think this is a reasonable addition/modification to the serialization library?

B) Is there any way to add this functionality to the serialization library without breaking existing archives? I see a call to get_library_version in the code, but I'm not sure what the purpose of this statement is.

Anyone?

Thanks,
Eric

I think you should think bigger. How about compressing the whole stream in a general way? Why not encrypt it while you're at it? Oh, and avoid writing code altogether. I bet you think I'm being sarcastic. Well, I'm not - I'm serious. I would investigate using the serialization library with a binary archive (or maybe a text archive as an experiment) in combination with the iostreams library. Serialization should work with ANY stream, and the iostreams library includes zlib compression. (I don't know about encryption.) So in theory, you should be able to get your data stream shrunk by 50% and free encryption - without writing any code at all. My 2 cents.
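To illustrate the "any stream" point using only the standard library: the writer below targets a plain std::ostream, and a pass-through stream buffer is layered between it and the real destination. counting_buf and write_record are hypothetical names used for this sketch. In Boost.Iostreams the same layering is done with a filtering_ostream that has a filter such as zlib_compressor pushed in front of the device.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical pass-through stream buffer: forwards every byte to a
// downstream buffer and counts how many went by. A stand-in for the
// filter idea, not the Boost.Iostreams API.
class counting_buf : public std::streambuf {
public:
    explicit counting_buf(std::streambuf* downstream) : down_(downstream) {}
    std::size_t count() const { return count_; }

protected:
    // No put area is set, so single characters arrive here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        if (traits_type::eq_int_type(down_->sputc(traits_type::to_char_type(ch)),
                                     traits_type::eof()))
            return traits_type::eof();
        ++count_;
        return ch;
    }

    // Bulk writes are forwarded in one call.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        const std::streamsize written = down_->sputn(s, n);
        count_ += static_cast<std::size_t>(written);
        return written;
    }

    int sync() override { return down_->pubsync(); }

private:
    std::streambuf* down_;
    std::size_t count_ = 0;
};

// A writer that knows nothing about where its bytes end up.
void write_record(std::ostream& os, const std::string& bin, int qty) {
    os << bin << ':' << qty << '\n';
}
```

For example, wrapping a std::ostringstream's rdbuf() in a counting_buf and handing a std::ostream over that buffer to write_record(out, "DER", 427) leaves "DER:427\n" in the string stream and a count of 8. Because write_record only sees a std::ostream&, swapping the counter for a compressing or checksumming layer would not change the writer.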
> A) Do you think this is a reasonable addition/modification to the serialization library?
Not in my opinion. However, it might be a good idea for an "archive adaptor" which enhances the behavior of another archive.
> B) Is there any way to add this functionality to the serialization library without breaking existing archives? I see a call to get_library_version in the code, but I'm not sure what the purpose of this statement is.
The library includes class versioning. For those classes which didn't include versioning, there is a library version which is used for the same purpose.
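On question B, a version number recorded in the stream is what lets a reader accept both the old and the new layout. The sketch below is a standalone stand-in, not Boost's mechanism: it assumes a hypothetical one-byte header holding a format version, where version 1 means 8-byte counts and version 2 means the one-byte count with a 0xFF escape. In Boost.Serialization itself, a class's serialize(Archive&, const unsigned int version) receives the version declared with BOOST_CLASS_VERSION, and the archive's get_library_version() serves the same purpose for code without class versioning.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical format versions; not Boost's real library version numbers.
enum class format_version : std::uint8_t { fixed_count = 1, compact_count = 2 };

// Reads 8 little-endian bytes starting at pos and advances pos.
std::uint64_t read_u64_le(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos + 8 > in.size()) throw std::runtime_error("truncated input");
    std::uint64_t n = 0;
    for (int i = 0; i < 8; ++i)
        n |= static_cast<std::uint64_t>(in[pos++]) << (8 * i);
    return n;
}

// Assumed stream layout: one header byte holding the version, then data.
format_version read_header(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos >= in.size()) throw std::runtime_error("missing header");
    const std::uint8_t v = in[pos++];
    if (v != 1 && v != 2) throw std::runtime_error("unknown format version");
    return static_cast<format_version>(v);
}

// The reader branches on the recorded version, so streams written
// before the change (version 1) still load.
std::uint64_t read_count(const std::vector<std::uint8_t>& in, std::size_t& pos,
                         format_version v) {
    if (v == format_version::fixed_count) return read_u64_le(in, pos);
    if (pos >= in.size()) throw std::runtime_error("truncated input");
    const std::uint8_t first = in[pos++];
    return first == 0xFF ? read_u64_le(in, pos) : first;
}
```

The writer always emits the newest version, while the reader keeps the old branch around; that asymmetry is what avoids breaking existing archives.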
> Anyone?
Robert Ramey

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey
Sent: 21 May 2008 03:12
To: boost@lists.boost.org
Subject: Re: [boost] [serialization] Binary archive STL container template specialization?
> I think you should think bigger.
> How about compressing the whole stream in a general way?
> Why not encrypt it while you're at it?
> Oh, and avoid writing code altogether.
> I bet you think I'm being sarcastic.
> Well, I'm not - I'm serious.
> I would investigate using the serialization library with a binary archive (or maybe a text archive as an experiment) in combination with the iostreams library. Serialization should work with ANY stream, and the iostreams library includes zlib compression. (I don't know about encryption.) So in theory, you should be able to get your data stream shrunk by 50% and free encryption - without writing any code at all.
Another advantage to Robert's suggestion is that you probably get some error detection thrown in - for very good measure. If you are building a system for controlling warehouse picking, you must surely be considering some error detection/correction?

Of course, anything with iostreams is complicated but http://www.boost.org/doc/libs/1_35_0/libs/iostreams/doc/index.html gives you a good start.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

> Another advantage to Robert's suggestion is that you probably get some error detection thrown in - for very good measure.
> If you are building a system for controlling warehouse picking, you must surely be considering some error detection/correction?
Error detection is in my scope of work; correction, however, is not. That is to say, I need to know when a transmission error has occurred (common over slow radio links in a factory), but I do not need to do anything fancy to reconstruct the missing data; I can just request a retransmit.
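Since only detection plus a retransmit request is needed, a checksum per message is one common approach. The sketch below appends a standard CRC-32 (reflected polynomial 0xEDB88320) to each payload; the frame layout and the helper names make_frame and check_frame are hypothetical. Boost.CRC's boost::crc_32_type computes the same checksum, and the gzip and zlib formats carry their own checksums (CRC-32 and Adler-32 respectively), which is one source of the error detection Paul mentions.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Standard CRC-32 (reflected, polynomial 0xEDB88320), bit by bit for brevity.
std::uint32_t crc32(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical framing: payload followed by its CRC-32, little-endian.
std::vector<std::uint8_t> make_frame(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> frame(payload);
    const std::uint32_t crc = crc32(payload.data(), payload.size());
    for (int i = 0; i < 4; ++i)
        frame.push_back(static_cast<std::uint8_t>(crc >> (8 * i)));
    return frame;
}

// Returns the payload if the checksum matches; std::nullopt means the
// receiver should ask the sender to retransmit.
std::optional<std::vector<std::uint8_t>> check_frame(const std::vector<std::uint8_t>& frame) {
    if (frame.size() < 4) return std::nullopt;
    const std::size_t n = frame.size() - 4;
    std::uint32_t stored = 0;
    for (int i = 0; i < 4; ++i)
        stored |= static_cast<std::uint32_t>(frame[n + i]) << (8 * i);
    if (stored != crc32(frame.data(), n)) return std::nullopt;
    return std::vector<std::uint8_t>(frame.begin(),
                                     frame.begin() + static_cast<std::ptrdiff_t>(n));
}
```

A CRC detects every single-bit error and every burst error up to 32 bits long, which suits the detect-and-retransmit approach; it does not repair anything by itself.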
> Of course, anything with iostreams is complicated but
> http://www.boost.org/doc/libs/1_35_0/libs/iostreams/doc/index.html
> gives you a good start.
I've pored over the iostreams library quite a bit, and I love the concept of buffered sources and sinks. I haven't quite figured out how to tie iostreams into asio yet, but I'm working on it. ;)

Eric

Hi!

Eric Hill wrote:
> I've pored over the iostreams library quite a bit, and I love the concept of buffered sources and sinks. I haven't quite figured out how to tie iostreams into asio yet, but I'm working on it. ;)
Well, this will help, I guess: http://www.boost.org/doc/libs/1_35_0/doc/html/boost_asio/reference/ip__tcp/i...

Frank

> Well, this will help, I guess: http://www.boost.org/doc/libs/1_35_0/doc/html/boost_asio/reference/ip__tcp/i...

Well duh, I glossed right over that class. Thanks!

Eric

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Eric Hill
Sent: 21 May 2008 21:10
To: boost@lists.boost.org
Subject: Re: [boost] [serialization] Binary archive STL container template specialization?
>> Another advantage to Robert's suggestion is that you probably get some error detection thrown in - for very good measure.
>> If you are building a system for controlling warehouse picking, you must surely be considering some error detection/correction?
> Error detection is in my scope of work; correction, however, is not. That is to say, I need to know when a transmission error has occurred (common over slow radio links in a factory), but I do not need to do anything fancy to reconstruct the missing data; I can just request a retransmit.
Re-transmit sounds like error correction to me ;-)

Paul

> How about compressing the whole stream in a general way?
> Why not encrypt it while you're at it?
> Oh, and avoid writing code altogether.
> I bet you think I'm being sarcastic.
> Well, I'm not - I'm serious.
> I would investigate using the serialization library with a binary archive (or maybe a text archive as an experiment) in combination with the iostreams library. Serialization should work with ANY stream, and the iostreams library includes zlib compression. (I don't know about encryption.) So in theory, you should be able to get your data stream shrunk by 50% and free encryption - without writing any code at all.
Absolutely, compression will be a consideration, but doesn't it make sense to optimize the container encoding before compression? Lots of null bytes should compress very well, but having no null bytes at all will compress even better... :) There's also the matter of compression and decompression time to take into account. The tiny handheld computers that this will be deployed on are very constrained in terms of CPU and battery life, and the additional compression and decompression may shave some battery life off the device; we need all we can get. Additionally, getting zlib and bzip2 to compile on Windows didn't work "out of the box" for me, and I haven't taken the time to fight it yet, so compression simply wasn't available for my initial testing.
participants (4)
- Eric Hill
- Frank Birbacher
- Paul A Bristow
- Robert Ramey