[Boost-users] Thoughts & questions on adding serialization

13 Aug 2008

      I was thinking about adding serialization to some types I've been
working on in the sandbox.  First I tried to recall how Mr. Ramey
said serialization can be tested.  I couldn't find the specific post
I was thinking of, but other posts that turned up gave me the answer.
Reading those posts prompted me to ask more questions.

I could reduce the classes I'm working with to:

//=============================================
class computer;

class context
{
public:
     typedef boost::array  value_type;

     context();  // use auto copy-ctr, copy-=, dtr

     void        operator ()( bool );  // consumer
     bool        operator ==( context const &o ) const;  // equals
     bool        operator !=( context const &o ) const;  // not-equals
     value_type  operator ()() const;  // producer

private:
     friend class computer;

     boost::uint_fast64_t            length;
     boost::array  buffer;
     boost::array         queue;

     template < class Archive >
     void  serialize( Archive &ar, const unsigned int version );
};

class computer
     : public convenience_methods_base<context>
{
     // An object of type "context" is incorporated in this object
     // due to the base class.  A mutable/const pair of non-static
     // member functions named "context()" gives access to the inner
     // context object.

public:
     typedef context::value_type  value_type;

     // Put various access member functions here that forward to the
     // internals of the "context" type, which work because of the
     // friend declaration.

private:
     template < class Archive >
     void  serialize( Archive &ar, const unsigned int version );
};
//=============================================

I initially planned to have serialization functions for these two
classes, the "convenience_methods_base" base class template, plus two
other class templates (a base class and a support class) that
"convenience_methods_base" uses.  But the e-mail search I mentioned
found a thread from May 2007 (on the main Boost list) that suggested
that the serialization of a non-primitive should match the user's
external representation of the type, and not the type's particular
internal structure.  So I decided to keep the serialization protocol
just for the two public-facing classes, "context" and "computer."

I figured that the "computer" object can be serialized like:

//=============================================
template < class Archive >
inline void  computer::serialize( Archive &ar, const unsigned int version )
{ ar & boost::serialization::make_nvp("context", this->context()); }
//=============================================

Which leaves how "context" objects are serialized.  After thinking  
about it for hours, I decided to just whip out something quick &  
dirty and refine it later.  So:

//=============================================
template < class Archive >
inline void  context::serialize( Archive &ar, const unsigned int version )
{
     ar & BOOST_SERIALIZATION_NVP( length )
        & BOOST_SERIALIZATION_NVP( buffer )
        & BOOST_SERIALIZATION_NVP( queue );
}
//=============================================

would give a final serialization, in my test file, of:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>1</length>
		<buffer class_id="2" tracking_level="0" version="0">
			<elems>
				<count>4</count>
				<item>1732584193</item>
				<item>4023233417</item>
				<item>2562383102</item>
				<item>271733878</item>
			</elems>
		</buffer>
		<queue class_id="3" tracking_level="0" version="0">
			<elems>
				<count>512</count>
				<item>1</item>
				<item>0</item>
<!-- I'll spare you, and the mail server, 509 more "<item>0</item>" lines -->
				<item>0</item>
			</elems>
		</queue>
	</context>
</test>

//=============================================

Now I started refining, keeping the principle of not leaking
implementation details in mind.  The problem here is the array
counts, which I don't need since they'll never change.  The first one
I can fix by writing each element separately:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>1</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail class_id="2" tracking_level="0" version="0">
			<elems>
				<count>512</count>
				<item>1</item>
				<item>0</item>
<!-- 509 more "<item>0</item>" lines -->
				<item>0</item>
			</elems>
		</message-tail>
	</context>
</test>

//=============================================

I've always wanted to use something like a base-64 string encoding of  
the bit array, because it's cool and it'd save space.  I added  
conversion functions to/from the bit array and a std::string, and  
then (de)serialized the string.  I also had to separate "serialize"  
into "save" and "load" since conversion is complementary, not  
identical.  So now I have:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>1</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail>g</message-tail>
	</context>
</test>

//=============================================
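
The message-tail value above is consistent with packing the queued bits
six at a time, first bit in the high position, with the last group
zero-padded: a single 1 bit becomes 100000 = 32 = 'g'.  Here is a minimal
sketch of that save-side conversion.  It is not the code from the sandbox:
the function name, the use of std::array instead of boost::array, and
the URL-safe base-64 alphabet are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Assumed alphabet: URL-safe base-64 ("-" and "_" as the last two symbols).
static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Stand-in for the 512-entry bit queue; std::array is used here only to
// keep the sketch self-contained.
typedef std::array<bool, 512> bit_queue;

// Hypothetical helper: encode the first "count" bits (count <= 512) as text.
// Each character carries six bits, earliest bit in the highest position;
// a trailing partial group is padded with zero bits.
std::string bits_to_text(const bit_queue& bits, std::size_t count)
{
    std::string out;
    for (std::size_t i = 0; i < count; i += 6) {
        unsigned sextet = 0;
        for (std::size_t j = 0; j < 6; ++j) {
            sextet <<= 1;
            if (i + j < count && bits[i + j])
                sextet |= 1u;
        }
        out += alphabet[sextet];
    }
    return out;
}
```

With only bit 0 set and count == 1, the function returns "g"; with no
bits queued it returns an empty string.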

Then I added tests for: exactly 6 bits (i.e. one base-64 letter); a  
sextet (actually two) and a partial sextet together; filling a queue  
to capacity (actually one short of that since a full queue  
automatically activates a turnover); and going past capacity  
resulting in a new hash buffer and an empty message-tail.

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>1</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail>g</message-tail>
	</context>
</test>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>6</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail>q</message-tail>
	</context>
</test>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>14</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail>qQg</message-tail>
	</context>
</test>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>511</length>
		<buffer-A>1732584193</buffer-A>
		<buffer-B>4023233417</buffer-B>
		<buffer-C>2562383102</buffer-C>
		<buffer-D>271733878</buffer-D>
		<message-tail>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_AAAAAAAAAAH__________g</message-tail>
	</context>
</test>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

<test class_id="0" tracking_level="0" version="0">
	<context class_id="1" tracking_level="0" version="0">
		<length>512</length>
		<buffer-A>2631642121</buffer-A>
		<buffer-B>80961853</buffer-B>
		<buffer-C>4033330630</buffer-C>
		<buffer-D>497373075</buffer-D>
		<message-tail></message-tail>
	</context>
</test>

//=============================================
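
The message-tail lengths in these outputs track the "length" field: each
tail needs ceil((length mod 512) / 6) characters, which gives 1, 1, 3, 86
and 0 for the five cases.  A minimal sketch of that arithmetic in integer
form, assuming the 512-bit queue and 6 bits per character seen above (the
function name is made up for illustration):

```cpp
#include <cstdint>

// Hypothetical helper: number of base-64 characters needed for the bits
// still queued, assuming a 512-bit block and 6 bits per character.
inline unsigned tail_characters(std::uint_fast64_t length)
{
    const unsigned queued = static_cast<unsigned>(length % 512u);
    return (queued + 5u) / 6u;  // integer equivalent of ceil(queued / 6.0)
}
```

The integer form avoids the round trip through double and gives the same
result for every queue size from 0 to 511.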

If you want to see the actual work, look at revision/change-set  
#48131 in Boost's Subversion set-up.  Now to the actual questions:

1. If there's only one sub-object, base or member, that has any  
significant data, could someone call the "serialize" member function  
of that sub-object directly in the wrapping class's "serialize"?   
(This assumes that friendship is set up.)  This would make the  
wrapping class look identical to the sub-object's class, right?  Is  
this a good idea?

2.  Before actually trying to serialize a string, I was worried that
the string's serialization would include a length count.  This would
be unnecessary because the object's "length" attribute already
implies the length of the string (int( ceil( double( length % 512 ) /
6.0 ) )).  Here, we see that the string's length isn't explicitly
included in the XML archive, so I have no worries.  But what about
non-XML archives?  Will the string's length be directly serialized,
wasting space?  If so, how can I fix that?

3.  Having to add std::string to support serialization makes my class
header heavier.  My class uses fixed-sized arrays, so is there any way
that I can avoid allocating a string?  For writing out, could I set
up a char-array with the encoding and write that out?  For reading
in, can I read the string in piecemeal to a char-array, just in case
someone added more characters than required?  My converter currently
ignores illegal characters and stops when enough legal characters
have been read.  If what I ask is possible, would the reading routine
have to seek to the end of the entry so further serialization isn't
messed up?
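
To make question 3 concrete, here is a rough sketch of the load-side
conversion done one character at a time, with no std::string involved.
It follows the behavior described above (skip illegal characters, stop
once enough legal ones have been read) and returns the position where it
stopped, so a caller can tell whether unread characters remain.  Whether
an archive can hand over its characters this way is exactly the open
question; the function name, the std::array type, and the alphabet are
assumptions for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

const std::size_t queue_bits = 512;  // queue capacity seen in the XML output

// Assumed alphabet: URL-safe base-64.
const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: read characters from [first, last) until "count"
// bits (count <= 512) have been recovered or the input runs out.
// Characters outside the alphabet are skipped.  Returns the iterator just
// past the last character consumed.
template <class InputIt>
InputIt decode_tail(InputIt first, InputIt last, std::size_t count,
                    std::array<bool, queue_bits>& bits)
{
    const char* const end = alphabet + 64;
    std::size_t filled = 0;
    while (filled < count && first != last) {
        const char c = *first;
        ++first;
        const char* const p = std::find(alphabet, end, c);
        if (p == end)
            continue;  // illegal character: ignore it
        const unsigned sextet = static_cast<unsigned>(p - alphabet);
        // Unpack high bit first; padding bits past "count" are dropped.
        for (int j = 5; j >= 0 && filled < count; --j)
            bits[filled++] = ((sextet >> j) & 1u) != 0;
    }
    return first;
}
```

Because it only needs input iterators, the same routine would work on a
char array or on std::istreambuf_iterator<char> over a stream.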

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com