
Greetings everyone,

I need to stream data over a very low-bandwidth connection (think cell phones and GPRS). The type of data will vary, but it will typically contain strings, numbers, and short groups of these basic types. The applications on both ends of the pipe will be written in C++, so I've loosely decided on the boost::serialization library, as it eliminates virtually all of the code I would otherwise have had to write by hand. (Awesome!)

I've made up a bunch of test archives, and I'd like to get some feedback on a possible optimization, or at least specialization, of the code that streams out STL collections.

The current serialization code for STL containers saves the size of the container followed by each item inside the container. The same code is used for std::string, since it behaves like an STL container of characters. The downside is that each string costs a minimum of 8 bytes (a 64-bit integer) for the count, plus the string payload.

Proposal: Write out a single byte that indicates the number of elements to follow. If the number of elements is 255 or more, write out a single byte 0xFF, followed by a size_type holding the actual count. Reading mirrors this: read a single byte; if the byte is 0xFF, read a size_type, otherwise the byte is the count.

Simple example from my problem domain: If I have a list of three-letter bin locations in a warehouse, and each bin contains a quantity of a specific item, I will have the following data to send:

    DER: 427
    ALU: 582
    COM: 821
    TER: 991
    FLO: 0
    TER: 298
    ALP: 332
    PED: 773

Using the boost serialization framework, that data becomes 160 bytes (8 for the container size, 8 + length for each string, and 8 for each integer). Using my proposal, the data size drops to 97 bytes, a reduction of about 39% (97 bytes is roughly 61% of the original size).

I theorize that many serialized strings and collections hold fewer than 255 items or characters (especially in my problem domain), and that this technique will save many on-the-wire bytes over time.
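To make the proposal concrete, here is a rough sketch of the count encoding applied to the bin example. The names write_count, read_count, encode_bins, and Bin are made up for illustration; none of this is boost::serialization API. The sketch assumes that full-width counts and quantities are written as 8-byte integers in host byte order, which is what my byte counts above imply.

```cpp
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only; not part of Boost.
// A leading byte of 0xFF means "a full-width count follows".
constexpr unsigned char kEscape = 0xFF;

// Write a count as one byte when it is below 255; otherwise write the
// escape byte followed by the full 8-byte count (host byte order).
void write_count(std::ostream& os, std::uint64_t n) {
    if (n < kEscape) {
        os.put(static_cast<char>(static_cast<unsigned char>(n)));
    } else {
        os.put(static_cast<char>(kEscape));
        os.write(reinterpret_cast<const char*>(&n), sizeof n);
    }
}

// Read a count written by write_count.
std::uint64_t read_count(std::istream& is) {
    const int first = is.get();
    if (first == std::char_traits<char>::eof()) {
        throw std::runtime_error("unexpected end of stream");
    }
    if (first != kEscape) {
        return static_cast<std::uint64_t>(first);
    }
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    if (!is) {
        throw std::runtime_error("truncated count");
    }
    return n;
}

// One warehouse bin: a location code and a quantity.
using Bin = std::pair<std::string, std::uint64_t>;

// Encode the list as: count, then for each bin a length-prefixed
// string followed by an 8-byte quantity.
std::string encode_bins(const std::vector<Bin>& bins) {
    std::ostringstream os(std::ios::binary);
    write_count(os, bins.size());
    for (const Bin& b : bins) {
        write_count(os, b.first.size());
        os.write(b.first.data(),
                 static_cast<std::streamsize>(b.first.size()));
        os.write(reinterpret_cast<const char*>(&b.second),
                 sizeof b.second);
    }
    return os.str();
}
```

For the eight bins listed above, encode_bins produces 1 + 8 x (1 + 3 + 8) = 97 bytes, which matches the figure I quoted. A count of 255 or more costs 9 bytes in this sketch (one more than today), so the scheme only pays off when most counts are small.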

A) Do you think this is a reasonable addition/modification to the serialization library?

B) Is there any way to add this functionality to the serialization library without breaking existing archives? I see a call to get_library_version in the code, but I'm not sure what its purpose is. Anyone?

Thanks,
Eric