[serialization] string serialization

Hi, in basic_binary_iprimitive.ipp (and I suppose other files) the size of strings is serialized as std::size_t and not as serialization::collection_size_type. Is that intentional? Thanks, Nikolay Mladenov

Nope, it's just a fact. Robert Ramey Nikolay Mladenov wrote:
Hi,
In basic_binary_iprimitive.ipp (and I suppose other files) the size of strings is serialized as std::size_t and not as serialization::collection_size_type. Is that intentional?
Thanks,
Nikolay Mladenov _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

How should I understand this? as "Nope, it should be fixed", or "Nope, deal with it" or is it simply "get lost"? This "simple" fact causes incompatibility problems between 64 and 32 bit archives. Nikolay On Tue, Mar 3, 2009 at 11:22 AM, Robert Ramey <ramey@rrsd.com> wrote:
Nope, it's just a fact.
Robert Ramey
Nikolay Mladenov wrote:
Hi,
I the basic_binary_iprimitive.ipp (and I suppose other files) the size of the strings is serialized as std::size_t and not as serialization::collection_size_type. Is that intentional?
Thanks,
Nikolay Mladenov

Nope, it's not intentional. I don't remember why it is the way it is. Collections evolved to become more efficient, and in the process collection_size_type came into being. Since std::string was already primitive and had a special implementation, there hasn't been any motivation to mess with it. I'm not even that enthusiastic about collection_size_type. It turns out that each std collection has its own size_type, so this makes things even more confusing. It's the way it is because there was no obvious better choice. Robert Ramey Nikolay Mladenov wrote:
How should I understand this?
as "Nope, it should be fixed", or "Nope, deal with it"
or is it simply "get lost"?
This "simple" fact causes incompatibility problems between 64 and 32 bit archives.
Nikolay
On Tue, Mar 3, 2009 at 11:22 AM, Robert Ramey <ramey@rrsd.com> wrote:
Nope, it's just a fact.
Robert Ramey
Nikolay Mladenov wrote:
Hi,
In basic_binary_iprimitive.ipp (and I suppose other files) the size of strings is serialized as std::size_t and not as serialization::collection_size_type. Is that intentional?
Thanks,
Nikolay Mladenov

Robert Ramey wrote:
Nope, it's not intentional.
I don't remember why it is the way it is. Collections evolved to become more efficient, and in the process collection_size_type came into being. Since std::string was already primitive and had a special implementation, there hasn't been any motivation to mess with it. I'm not even that enthusiastic about collection_size_type. It turns out that each std collection has its own size_type, so this makes things even more confusing.
It's the way it is because there was no obvious better choice.
I think it is a reasonable choice. The more control serialization gives me over how things are stored, the less likely it is that a change in the serialization library will make it impossible for me to maintain an archive that is binary compatible with previous versions of serialization. (In our case, the previous version is v1.33.1, and in this case one needs special handling for strings anyhow.) For other containers, the fact that serialization maps the varying size_types of std collections to one type simplifies things. If I don't like it, I can supply my own serialization routine for the container in question. In the general case (as has already been discussed ad nauseam), I'd argue that 'portable' has a context-dependent meaning (e.g. are ieee754 floats 'portable' for this use case? Is little/big endianness required?) and it is up to the user to build an archive that implements whatever 'portable' is to them.

Regarding the complaint about strings. For our (perhaps narrow) definition of portable:

* 32/64 bit intel platforms, linux and osx, which implies little-endian archives
* floats stored on disk in ieee754 format
* serializable types required to use stdint typedefs where types vary across the platforms we run on (for us this basically means 'no plain longs', but when/if 128 bit platforms come around, we may need to go back and change our 'ints' to int32_t)
* binary compatible with archives from a venerable custom binary archive built on top of 1.33.1.

We can get away with just the following in our portable_binary_iarchive_impl:

    template<class Archive, class Elem, class Tr>
    class portable_binary_iarchive_impl :
        public basic_binary_iprimitive<Archive, Elem, Tr>,
        public basic_binary_iarchive<Archive>
    {
        // ...
        void load_override(std::string& s, BOOST_PFTO int) {
            uint32_t l;
            this->load(l);
            s.resize(l);
            this->load_binary(&(s[0]), l);
        }
        void load_override(class_name_type& t, BOOST_PFTO int) {
            std::string cn;
            cn.reserve(BOOST_SERIALIZATION_MAX_KEY_SIZE);
            this->load_override(cn, 0);
            if(cn.size() > (BOOST_SERIALIZATION_MAX_KEY_SIZE - 1))
                throw_exception(archive_exception(invalid_class_name));
            std::memcpy(t, cn.data(), cn.size());
            // borland tweak
            t.t[cn.size()] = '\0';
        }
        void load_override(boost::serialization::collection_size_type& t, BOOST_PFTO int) {
            uint32_t l;
            this->load(l);
            t = l;
        }
    };

-t

Considering your portability context (mine is even narrower): 1. The binary_archives already serialize boost::serialization::collection_size_type as unsigned int (uint32_t would still be better). 2. All this code will be unnecessary if the string serialization is not using "plain longs". On Wed, Mar 4, 2009 at 8:24 AM, troy d. straszheim <troy@resophonic.com> wrote:
Robert Ramey wrote:
Nope, it's not intentional.
I don't remember why it is the way it is. Collections evolved to become more efficient, and in the process collection_size_type came into being. Since std::string was already primitive and had a special implementation, there hasn't been any motivation to mess with it. I'm not even that enthusiastic about collection_size_type. It turns out that each std collection has its own size_type, so this makes things even more confusing.
It's the way it is because there was no obvious better choice.
I think it is a reasonable choice. The more control serialization gives me over how things are stored, the less likely it is that a change in the serialization library will make it impossible for me to maintain an archive that is binary compatible with previous versions of serialization. (In our case, the previous version is v1.33.1, and in this case one needs special handling for strings anyhow.) For other containers, the fact that serialization maps the varying size_types of std collections to one type simplifies things. If I don't like it, I can supply my own serialization routine for the container in question. In the general case (as has already been discussed ad nauseam), I'd argue that 'portable' has a context-dependent meaning (e.g. are ieee754 floats 'portable' for this use case? Is little/big endianness required?) and it is up to the user to build an archive that implements whatever 'portable' is to them.
Regarding the complaint about strings. For our (perhaps narrow) definition of portable:
* 32/64 bit intel platforms, linux and osx, which implies little-endian archives
* floats stored on disk in ieee754 format
* serializable types required to use stdint typedefs where types vary across the platforms we run on (for us this basically means 'no plain longs', but when/if 128 bit platforms come around, we may need to go back and change our 'ints' to int32_t)
* binary compatible with archives from a venerable custom binary archive built on top of 1.33.1.
We can get away with just the following in our portable_binary_iarchive_impl:

    template<class Archive, class Elem, class Tr>
    class portable_binary_iarchive_impl :
        public basic_binary_iprimitive<Archive, Elem, Tr>,
        public basic_binary_iarchive<Archive>
    {
        // ...
        void load_override(std::string& s, BOOST_PFTO int) {
            uint32_t l;
            this->load(l);
            s.resize(l);
            this->load_binary(&(s[0]), l);
        }
        void load_override(class_name_type& t, BOOST_PFTO int) {
            std::string cn;
            cn.reserve(BOOST_SERIALIZATION_MAX_KEY_SIZE);
            this->load_override(cn, 0);
            if(cn.size() > (BOOST_SERIALIZATION_MAX_KEY_SIZE - 1))
                throw_exception(archive_exception(invalid_class_name));
            std::memcpy(t, cn.data(), cn.size());
            // borland tweak
            t.t[cn.size()] = '\0';
        }
        void load_override(boost::serialization::collection_size_type& t, BOOST_PFTO int) {
            uint32_t l;
            this->load(l);
            t = l;
        }
    };
-t

Nikolay Mladenov wrote:
Considering your portability context (mine is even narrower): 1. The binary_archives already serialize boost::serialization::collection_size_type as unsigned int (uint32_t would still be better). 2. All this code will be unnecessary if the string serialization is not using "plain longs".
Okay... I understand the complaint about the size of strings not being handled like the size of everything else, and agree that it would be nice to be consistent. Because, well, consistency is nice. I was also thrown by the special handling for std::string and had to spend some time in the debugger, serializing things to disk and going through the archives with hexdump. :/ C'est la vie.

But since the binary_archives don't make any claim about portability, it would seem that you should serialize all sizes as std::size_t (often 'plain long'), even if the std library's containers' size_types aren't consistent (which I wasn't aware of until Robert pointed it out. still haven't checked.). I'd have to have a look: with a plain binary_archive, is it not possible to save a std::vector with greater than std::numeric_limits<uint32_t>::max() elements on a platform where std::vector<T>::size_type is uint64_t?

If you buy that, then you still need to construct a portable_binary_archive and handle container_size_type consistently across architectures (e.g. convert to uint64_t before/after storing/loading, checking for overflow on platforms where the in-memory container_size_type is smaller than the on-disk container_size_type). Then the difference is only that one currently requires an extra override for std::string. -t

On Wed, Mar 4, 2009 at 1:43 PM, troy d. straszheim <troy@resophonic.com> wrote:
Nikolay Mladenov wrote:
Considering your portability context (mine is even narrower): 1. The binary_archives already serialize boost::serialization::collection_size_type as unsigned int (uint32_t would still be better). 2. All this code will be unnecessary if the string serialization is not using "plain longs".
Okay... I understand the complaint about the size of strings not being handled like the size of everything else, and agree that it would be nice to be consistent. Because, well, consistency is nice. I was also thrown by the special handling for std::string and had to spend some time in the debugger, serializing things to disk and going through the archives with hexdump. :/ C'est la vie.
But since the binary_archives don't make any claim about portability, it would seem that you should serialize all sizes as std::size_t (often 'plain long'), even if the std library's containers' size_types aren't consistent (which I wasn't aware of until Robert pointed it out. still haven't checked.). I'd have to have a look: with a plain binary_archive, is it not possible to save a std::vector with greater than std::numeric_limits<uint32_t>::max() elements on a platform where std::vector<T>::size_type is uint64_t?
correct, it is not possible.
If you buy that, then you still need to construct a portable_binary_archive and handle container_size_type consistently across architectures (e.g. convert to uint64_t before/after storing/loading, checking for overflow on platforms where the in-memory container_size_type is smaller than the on-disk container_size_type). Then the difference is only that one currently requires an extra override for std::string.
As far as I can tell this is already done by the binary_archives. Nikolay

The current portable_binary_archive handles mismatched sizes of all integer types. If the loading platform can't represent the integer saved, an exception is thrown at runtime. Nikolay Mladenov wrote:
If you buy that, then you still need to construct a portable_binary_archive and handle container_size_type consistently across architectures (e.g. convert to uint64_t before/after storing/loading, checking for overflow on platforms where the in-memory container_size_type is smaller than the on-disk container_size_type). Then the difference is only that one currently requires an extra override for std::string.
As far as I can tell this is already done by the binary_archives.
Nikolay

On Wed, Mar 4, 2009 at 3:03 PM, Robert Ramey <ramey@rrsd.com> wrote:
The current portable_binary_archive handles mismatched sizes of all integer types. If the loading platform can't represent the integer saved, an exception is thrown at runtime.
Yes, but all optimizations are disabled. And since I don't need to handle endianness, this seems too costly. I have taken care not to have mismatched integer sizes in the objects I am serializing, but the binary_i(o)_archives are still forcing the size_t on me. That is all. I understand, though, that changing the binary_i(o)_archive string serialization to use collection_size_type will break all existing 64-bit binary archives... Nikolay Mladenov
Nikolay Mladenov wrote:
If you buy that, then you still need to construct a portable_binary_archive and handle container_size_type consistently across architectures (e.g. convert to uint64_t before/after storing/loading, checking for overflow on platforms where the in-memory container_size_type is smaller than the on-disk container_size_type). Then the difference is only that one currently requires an extra override for std::string.
As far as I can tell this is already done by the binary_archives.
Nikolay

At 3:58 PM -0500 3/4/09, Nikolay Mladenov wrote:
On Wed, Mar 4, 2009 at 3:03 PM, Robert Ramey <ramey@rrsd.com> wrote:
The current portable_binary_archive handles mismatched sizes of all integer types. If the loading platform can't represent the integer saved, an exception is thrown at runtime.
Yes, but all optimizations are disabled. And since I don't need to handle endianness, this seems too costly. I have taken care not to have mismatched integer sizes in the objects I am serializing, but the binary_i(o)_archives are still forcing the size_t on me.
If you have specialized requirements that aren't acceptably met by any of the existing archive types, you can try writing a specialized archive pair that meets your requirements. How different your requirements are from those of existing archive types will affect how much new code you need to write vs how much help you can get from the existing infrastructure. I haven't actually looked into this specific question in detail, but I suspect it isn't too hard to define a variant of the binary archives that always uses (for example) uint32_t for at least some specific set of container sizes.

Nikolay Mladenov wrote:
On Wed, Mar 4, 2009 at 1:43 PM, troy d. straszheim <troy@resophonic.com> wrote:
But since the binary_archives don't make any claim about portability, it would seem that you should serialize all sizes as std::size_t (often 'plain long'), even if the std library's containers' size_types aren't consistent (which I wasn't aware of until Robert pointed it out. still haven't checked.). I'd have to have a look: with a plain binary_archive, is it not possible to save a std::vector with greater than std::numeric_limits<uint32_t>::max() elements on a platform where std::vector<T>::size_type is uint64_t?
correct, it is not possible.
So it is. This looks good to me:

    namespace boost {
    namespace serialization {
        BOOST_STRONG_TYPEDEF(std::size_t, collection_size_type)
    } } // end namespace boost::serialization

But this looks bad:

    void save_override(const serialization::collection_size_type & t, int){
        // for backward compatibility, 64 bit integer or variable length
        // integer would be preferred
        unsigned int x = t.t;  // :(
        * this->This() << x;
    }
If you buy that, then you still need to construct a portable_binary_archive and handle container_size_type consistently across architectures (e.g. convert to uint64_t before/after storing/loading, checking for overflow on platforms where the in-memory container_size_type is smaller than the on-disk container_size_type). Then the difference is only that one currently requires an extra override for std::string.
As far as I can tell this is already done by the binary_archives.
Yeah apparently so. I see what you're getting at: for certain scenarios the binary archives are almost portable. Again IMV that is only accidentally the case, shouldn't be that way, and one shouldn't depend on it. Anyhow, attached is the portable binary i/o archive we've been using, for the scenario mentioned. Hack out the bits that refer to very old boost versions (for which there is a different archive implementation)... I think this saves things the way you want them. HTH... -t
participants (4)
- Kim Barrett
- Nikolay Mladenov
- Robert Ramey
- troy d. straszheim