[serialization] class versioning changes in boost 1.42

Hi,
It seems that the serialization library shipped with boost 1.42 changed the way class versions are handled. The version_type, which used to be an unsigned int, is now a uint_least16_t, although I found no mention of that in the release notes.
The problem is that we are using a date-based versioning scheme for our classes (for example, a developer making a change in some class on 2010-02-22 would increment the class version to 20100222). This scheme worked great for a couple of years, and helped us fix our backward compatibility bugs after the fact.
Unfortunately, these big version numbers do not fit in 16-bit integers. As a consequence, the version string output in xml archives with the latest boost-serialization for this example would be 46206 instead of 20100222, breaking our reloading code.
May I ask why this change was made? Would it be possible to revert it for the next boost release? If not, does anyone have suggestions for how we could fix our code while maintaining backward compatibility? (We have old archives in the wild generated with various releases of our code, which we need to be able to reload.)
Thanks for any help,
David.
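The truncation reported above can be reproduced with plain integer arithmetic. This is a minimal sketch, assuming the narrowing behaves like an ordinary conversion to a 16-bit unsigned integer; the helper name truncate_to_16_bits is hypothetical and not part of Boost.Serialization.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mimics storing a 32-bit class version in a
// 16-bit field. Conversion to an unsigned type keeps the value
// modulo 2^16, so the high bits are silently dropped.
inline std::uint16_t truncate_to_16_bits(std::uint32_t version) {
    return static_cast<std::uint16_t>(version);
}
```

Since 20100222 = 306 * 65536 + 46206, truncate_to_16_bits(20100222) gives 46206, which matches the value seen in the XML archive.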

David Raulo wrote:
Hi,
It seems that the serialization library shipped with boost 1.42 changed the way class versions are handled. The version_type which used to be an unsigned int is now an uint_least16_t, although I found no mention of that in the release notes.
The problem is that we are using a date-based versioning scheme for our classes (for example a developer making a change in some class on 2010-02-22 would increment the class version to 20100222). This scheme worked great for a couple of years, and helped us fix our backward compatibility bugs after the fact.
Unfortunately, these big version numbers do not fit in 16-bit integers. As a consequence, the version string output in xml archives with the latest boost-serialization for this example would be 46206 instead of 20100222, breaking our reloading code.
May I ask why this change was made? Would it be possible to revert it for the next boost release? If not, does anyone have suggestions for how we could fix our code while maintaining backward compatibility? (We have old archives in the wild generated with various releases of our code, which we need to be able to reload.)
The change was made to make the system more robust and portable. Other changes were made to suppress warnings which indicated potential problems. Your system presumes that an int is 32 bits. This assumption would make otherwise portable archives not portable to C implementations whose int is 16 bits. The version # was always assumed to be exactly that. I didn't anticipate that this might be overloaded with some other data - like a date. Had this occurred to me, I would have advised against doing this in the documentation.
I realise that this is of cold comfort to you. Here are some ideas.
a) You could tweak your copy of the library to use a 32 bit integer.
b) In the longer run, you could decline to use the versioning built into the system and substitute your own based on dates. This would entail tweaking all your serialization implementations to save your "date version" as part of the data.
If I thought about it, I suppose I could come up with some more ideas.
BTW, I had experimented with an 8 bit version #, since it was (and is) inconceivable to me that one could modify a schema more than 255 times and still find it useful. But this created a problem with text archives outputting an 8 bit integer as a single (non-printable) character. Actually, now that you've brought this to my attention, it would be possible and maybe better to implement version as a special type rather than a STRONGTYPEDEF, which would give me what I really want - which is an 8 bit version for binary archives and readable text for text archives.
Sorry I can't be of more help. Moral of the story is "Overloading data leads to unanticipated consequences".
Robert Ramey
Thanks for any help,
David.
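Suggestion (b) in the reply above, saving a "date version" as part of the data, could look roughly like the following. This is only a sketch under stated assumptions: it uses plain standard streams instead of Boost archives, and the class Sample, the constant kSampleDateVersion and the member added on 2010-02-22 are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical class used only for illustration.
struct Sample {
    int x = 0;
    int z = 0;  // member assumed to have been added on 2010-02-22
};

// Date stamp of the current layout. It is an ordinary data field,
// not the library's class version, so its width is under our control.
const std::uint32_t kSampleDateVersion = 20100222;

// Write the date stamp first, then the members.
inline void save(std::ostream& os, const Sample& s) {
    os << kSampleDateVersion << ' ' << s.x << ' ' << s.z << '\n';
}

// Read the date stamp back and branch on it, the way load() would
// otherwise branch on the class version. Returns false on a read error.
inline bool load(std::istream& is, Sample& s) {
    std::uint32_t date_version = 0;
    if (!(is >> date_version >> s.x)) return false;
    if (date_version >= 20100222) {
        if (!(is >> s.z)) return false;
    } else {
        s.z = 0;  // older layout: member absent, fall back to a default
    }
    return true;
}
```

Data written before such a field existed would not contain the date stamp, so this approach only covers archives produced after the switch.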

On Mon, 22 Feb 2010 09:57:58 -0800
"Robert Ramey"
The change was made to make the system more robust and portable. Other changes were made to suppress warnings which indicated potential problems.
I understand that ints as such are not portable.
Your system presumes that an int is 32 bits.
Well, in this day and age, we assumed it was at least 32 bits. An unfortunate mistake...
This assumption would make otherwise portable archives not portable to C implementations whose int is 16 bits. The version # was always assumed to be exactly that.
Are such platforms so widespread nowadays? And using boost libraries? Besides, I would assume making version_type a typedef for uint32_t instead would work everywhere, even if it induces some overhead on 16-bit CPUs?
I didn't anticipate that this might be overloaded with some other data - like a date. Had this occurred to me, I would have advised against doing this in the documentation.
I would not call our scheme "overloading". We still use these as version numbers, increasing with each new alteration of the classes, independently for each class. Instead of dates, we could have used code-wide version numbers, a la subversion. All we needed was numbers representing a single point in time globally for all of our classes. Using this kind of numbering has several maintenance advantages, or so I think.
For example, our early releases did not use versioning at all, since this simply was not a requirement initially. So we have old archives specifying 0 for the version of all their classes. A couple of releases later, we introduced backward compatibility at our clients' demand (some of them clung to old releases just to be able to reuse their old archives). To this end, our load() methods contain code similar to this:
    template<class Archive>
    void load(Archive& ar, SomeClass& o, const unsigned int v)
    {
        unsigned int version = v;
        if (version == 0)
            version = archive_creator_release_version;
        if (version >= 20080517 && version <= 20080903)
            ...
    }
where archive_creator_release_version is the date at which we released the software that generated the archive we're trying to reload (this is saved at the beginning of every archive). To write the compatibility reloading code, we only need to look at our code repository history, and add some of these special cases to our load() methods.
If we had used "classical" version numbers, this would have been very difficult to do (at the very least, we would need to store a map of the versions used for each class for each of our releases, and use that when encountering an archive with a 0 version). And a single mistake in a release would have been a lot more difficult to fix afterwards. What do you do when you discover that there are in fact archives which used a variant of SomeClass between version 4 and 5? Anyway, using the release date as the default version for all classes when reading back old archives proved very useful.
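The fallback described in this message can be isolated into two small functions. This is a sketch only: effective_version and in_2008_window are hypothetical names, and the window bounds are simply the two dates quoted in the message; what the real code does inside that branch is not shown in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Archives written before versioning was introduced report 0 for every
// class, so the release date of the program that wrote the archive is
// substituted for the missing class version.
inline std::uint32_t effective_version(std::uint32_t class_version,
                                       std::uint32_t archive_creator_release_version) {
    return class_version == 0 ? archive_creator_release_version : class_version;
}

// True when the version falls inside the date window quoted above.
inline bool in_2008_window(std::uint32_t version) {
    return version >= 20080517 && version <= 20080903;
}
```

With an unversioned archive written by a release dated 20080601, effective_version(0, 20080601) falls inside the window, while an explicit class version of 20100222 does not.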
There are other cases were using dates as version helped maintain our code, but this message is too long already, and I'm not sure I explained the first case clearly enough ...
I realise that this is of cold comfort to you. Here are some ideas.
a) You could tweak your copy of the library to use a 32 bit integer.
Doable, but inducing difficult constraints. We'd really prefer that our clients be able to link to both official boost releases, and our libraries at the same time. We link dynamically to boost for that reason.
b) in the longer run, you could decline to use the versioning built into the system and substitute your own based on dates. This would entail tweaking all your serialization implementations to save your "date version" as part of the data.
That would be an awful lot of code rewriting, and unfortunately would not help with reading back old archives.
Sorry I can't be of more help. Moral of the story is "Overloading data leads to unanticipated consequences"
I did not realize the unsigned int version was intended to be 8 bits, or I would not have done that. Too late now. Hopefully there is another way out we did not think of yet... Thanks for your response in any case, -- david

Hi Robert,
It's not all that clear from the documentation what one can expect of the versioning scheme in Boost.Serialization. It would seem (and correct me if I'm mistaken):
(1) Archives created with *older* versions of Boost are not guaranteed to be readable in *newer* versions of Boost (as the OP has discovered).
(2) Conversely, there is no mechanism by which archives created with a *newer* Boost version can be guaranteed to be readable by a specific *older* version of Boost.
Consequently:
(3) It is not generally possible to write a program capable of reading archives that have been written by multiple distinct versions of itself.
(4) It is not generally possible to write a program capable of writing archives that will be readable by multiple distinct versions of itself.
Regardless of whether one thinks these issues are serious or not, I think they should at least be prominently documented, as the potential for pain is quite high. Essentially, the current versioning scheme can make it prohibitively difficult to ever upgrade the version of Boost that a particular codebase is using.
Regards,
Jarl.

Jarl Lindrud wrote:
Hi Robert,
It's not all that clear from the documentation what one can expect of the versioning scheme in Boost.Serialization. It would seem (and correct me if I'm mistaken):
(1) Archives created with *older* versions of Boost are not guaranteed to be readable in *newer* versions of Boost (as the OP has discovered).
The intention is that archives created by older versions of the library and previous versions of the application are guaranteed to be readable by later versions of both. If this does in fact occur, I would consider it a mistake.
(2) Conversely, there is no mechanism by which archives created with a *newer* Boost version can be guaranteed to be readable by a specific *older* version of Boost.
This is true. I don't see any way of making such a guarantee and I don't see any utility in being able to do this.
Consequently:
(3) It is not generally possible to write a program capable of reading archives that have been written by multiple distinct versions of itself.
Since (1) is not true, this does not follow.
(4) It is not generally possible to write a program capable of writing archives that will be readable by multiple distinct versions of itself.
Same as above.
Regardless of whether one thinks these issues are serious or not, I think they should at least be prominently documented, as the potential for pain is quite high. Essentially, the current versioning scheme can make it prohibitively difficult to ever upgrade the version of Boost that a particular codebase is using.
The case we're referring to has come up due to the presumption of a 32 bit version number. I made the version # with a certain expectation of how it would be used - and it got used in a different way. And I failed to include extra code to enforce that expectation. When I added such code later, you got caught.
Was it a mistake on my part to not include such code until now? Maybe. Was it a mistake to not make this expectation explicit in the documentation? Maybe. Was it a mistake to presume that a version # would be able to hold up to 4 G versions? Maybe.
Does this help? Maybe - but probably not. I spent a little bit of time thinking about this and nothing immediate came to mind. I'll look at it some more when I have some time. You might want to add a Trac item and let me know if you come up with some other ideas. I'm not unsympathetic, but honestly, I can't foresee everything - though I try.
Robert Ramey

On 25 February 2010 00:50, Robert Ramey
Jarl Lindrud wrote:
The case we're referring to has come up due to the presumption of a 32 bit version number. I made the version # with a certain expectation of how it would be used - and it got used in a different way. And I failed to include extra code to enforce that expectation. When I added such code later, you got caught.
Was it a mistake on my part to not include such code until now? Maybe. Was it a mistake to not make this expectation explicit in the documentation? Maybe. Was it a mistake to presume that a version # would be able to hold up to 4 G versions? Maybe.
It would be nice if the version number could be more flexible. Currently for an SQL database, I use a major-minor versioning scheme for the schema. I have two versions of the program in active use across multiple sites, all working with their own database: a "stable" version of the program, and a "beta" version.
The Stable program uses (e.g.) database schema version 3.23
The Beta program uses (e.g.) database schema version 4.5
The Beta version was forked from Stable around db-schema-version 3.20, and then Stable was put in maintenance mode. At this point, Beta's schema version changed to 4.0 and it was modified as features were added. Stable required a few bug fixes, so 3.20 became 3.21, 3.22 etc. Two parallel versioning paths. 3.21 would've come at the time that Beta had already reached 4.3. 3.22 and 3.23 were released while Beta was on version 4.4, so later I released a Beta that could upgrade from 3.23 to 4.5.
When the site upgrades to Beta, the Beta's upgrade system knows how to upgrade from the various stable versions, e.g.
3.20 --> 4.0, 4.1, 4.2, 4.3, 4.4, 4.5
3.21 --> 4.4, 4.5
3.22 --> 4.5
3.23 --> 4.5
This has worked well. It allows me to upgrade my schemas in parallel as required, and I don't have to shoe-horn it into a single-numbering scheme... which I guess would look something like this:
3.20 == version 50
51
52
etc
version 4.0 == version 100
So now there are about 50 spare upgrade slots for Stable before I run out and hit where Beta began. This is an arbitrary limitation; I could've made it 10 slots and then really shot myself in the foot if I ended up needing 11.
So could we look into freeing up the versioning scheme? It doesn't have to be a number... it just needs to be:
a) put into an XML tag, e.g. version="abc"
b) portable in a binary sense
c) support the < and == operators
d) ?? is that it?
With a bit of work, even compound versioning numbers like my major/minor are possible. What do you think?
Paul
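A compound version that only needs < and ==, as proposed above, might be sketched like this. Everything here is hypothetical illustration rather than a proposed Boost interface: the SchemaVersion and UpgradeRule types, the member names, and the can_upgrade helper, whose table is copied from the upgrade paths listed in the message.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical compound version: major.minor, as in "3.23" or "4.5".
struct SchemaVersion {
    unsigned major_part;
    unsigned minor_part;
};

inline bool operator==(const SchemaVersion& a, const SchemaVersion& b) {
    return std::tie(a.major_part, a.minor_part) ==
           std::tie(b.major_part, b.minor_part);
}

// Lexicographic order: compare the major part first, then the minor part.
inline bool operator<(const SchemaVersion& a, const SchemaVersion& b) {
    return std::tie(a.major_part, a.minor_part) <
           std::tie(b.major_part, b.minor_part);
}

// One row of the upgrade table: a Stable schema and the earliest Beta
// schema that knows how to upgrade from it.
struct UpgradeRule {
    SchemaVersion from;
    SchemaVersion earliest_target;
};

// True when schema `from` can be upgraded to schema `to`.
inline bool can_upgrade(const SchemaVersion& from, const SchemaVersion& to) {
    static const UpgradeRule rules[] = {
        {{3, 20}, {4, 0}},
        {{3, 21}, {4, 4}},
        {{3, 22}, {4, 5}},
        {{3, 23}, {4, 5}},
    };
    for (const UpgradeRule& rule : rules) {
        if (rule.from == from) return !(to < rule.earliest_target);
    }
    return false;  // unknown source schema
}
```

The two operators are enough to express the parallel Stable and Beta paths without reserving numeric slots: can_upgrade uses == to find the source schema and < to reject targets that predate the fix.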

Robert Ramey
(1) Archives created with *older* versions of Boost are not guaranteed to be readable in *newer* versions of Boost (as the OP has discovered).
The intention is that archives created by older versions of the library and previous versions of the application are guaranteed to be readable by later versions of both. If this does in fact occur, I would consider it a mistake.
In that case, the issue David encountered is unambiguously a bug in Boost.Serialization, and should be fixed in a future version of Boost. Out of curiosity, do you store the Boost version number, or something equivalent, when creating an archive? If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
(2) Conversely, there is no mechanism by which archives created with a *newer* Boost version can be guaranteed to be readable by a specific *older* version of Boost.
This is true. I don't see any way of making such a guarantee and I don't see any utility in being able to do this.
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Consequently:
(3) It is not generally possible to write a program capable of reading archives that have been written by multiple distinct versions of itself.
Since (1) is not true, this does not follow.
Your response to David seemed to be essentially "too bad, maybe you can find a way around it yourself", so I can't see that (1) is being taken very seriously.
(4) It is not generally possible to write a program capable of writing archives that will be readable by multiple distinct versions of itself.
Same as above.
Not sure what you mean...
I'm not unsympathetic, but honestly, I can't foresee everything - though I try.
Of course... The point is that with a robust versioning scheme in place, archive format changes can be implemented without breaking older software. Regards, Jarl.

Jarl Lindrud wrote:
Robert Ramey
writes: (1) Archives created with *older* versions of Boost are not guaranteed to be readable in *newer* versions of Boost (as the OP has discovered).
The intention is that archives created by older versions of the library and previous versions of the application are guaranteed to be readable by later versions of both. If this does in fact occur, I would consider it a mistake.
In that case, the issue David encountered is unambiguously a bug in Boost.Serialization, and should be fixed in a future version of Boost.
The version # was envisioned as a small integer. All the examples tests and demos used this. The problem comes about because it was unanticipated that someone would like to include actual data (ie a date in this case) in a version #. Note that this is the first time in 9 years that this has come up. So I think it's a little much to characterize it as a bug in the library. It would better be called an unanticipated usage of the version #.
Out of curiosity, do you store the Boost version number, or something equivalent, when creating an archive?
No.
If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
This is described in the documentation. The version # is maintained on a class basis and is completely independent of any other number such as program or boost version. A little reflection should make it clear why it pretty much has to be this way.
(2) Conversely, there is no mechanism by which archives created with a *newer* Boost version can be guaranteed to be readable by a specific *older* version of Boost.
This is true. I don't see any way of making such a guarantee and I don't see any utility in being able to do this.
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Again, a little reflection will make it clear that an older version of a program can't anticipate changes in a subsequent version. I'm sorry - it's just logically not possible. Think about it.
Consequently:
(3) It is not generally possible to write a program capable of reading archives that have been written by multiple distinct versions of itself.
Since (1) is not true, this does not follow.
Your response to David seemed to be essentially "too bad, maybe you can find a way around it yourself", so I can't see that (1) is being taken very seriously.
If I had an easy answer, honestly I would share it. Really. I don't. Sorry.
Of course... The point is that with a robust versioning scheme in place, archive format changes can be implemented without breaking older software.
A robust (and efficient) versioning scheme has been in place since the beginning. It was never designed to be able to hold extra data. It's unfortunate that I didn't trap such an unintended usage. I try really hard - but I haven't been able to trap every case where something is used in a way that doesn't occur to me. Robert Ramey.

Robert Ramey wrote:
In that case, the issue David encountered is unambiguously a bug in Boost.Serialization, and should be fixed in a future version of Boost.
The version # was envisioned as a small integer. All the examples tests and demos used this. The problem comes about because it was unanticipated that someone would like to include actual data (ie a date in this case) in a version #. Note that this is the first time in 9 years that this has come up.
Well, I assume that most folks who use Boost.Serialization don't post here to report what version scheme they have used. So, I suggest you don't make broad conclusions based on the number of reports.
So I think it's a little much to characterize it as a bug in the library. It would better be called an unanticipated usage of the version #.
Out of curiosity, do you store the Boost version number, or something equivalent, when creating an archive?
No.
If you have changed the number of bytes used to store the class version, and you do *not* store the boost version number in the archive, then how can 1.42 read an archive created by 1.41 -- even assuming the classes being serialized did not change themselves?
If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
This is described in the documentation. The version # is maintained on a class basis and is completely independent of any other number such as program or boost version. A little reflection should make it clear why it pretty much has to be this way.
Unless you promise and document that format used by boost.serialize will never change, it seems like you also have to include the version number for the archive format itself.
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Again, a little reflection will make it clear that an older version of a program can't anticipate changes in a subsequent version. I'm sorry - it's just logically not possible. Think about it.
Assuming the user classes are not changed, why can't a program built with 1.41 read an archive created by 1.42? - Volodya

Vladimir Prus wrote:
Robert Ramey wrote:
In that case, the issue David encountered is unambiguously a bug in Boost.Serialization, and should be fixed in a future version of Boost.
The version # was envisioned as a small integer. All the examples tests and demos used this. The problem comes about because it was unanticipated that someone would like to include actual data (ie a date in this case) in a version #. Note that this is the first time in 9 years that this has come up.
Well, I assume that most folks who use Boost.Serialization don't post here to report what version scheme they have used. So, I suggest you don't make broad conclusions based on the number of reports.
I would expect that people running across problems with the scheme would post here. So I think it's correct that this hasn't been a problem up until now.
If you have changed the number of bytes used to store the class version, and you do *not* store the boost version number in the archive, then how can 1.42 read an archive created by 1.41 -- even assuming the classes being serialized did not change themselves?
This should be clear from reading the documentation. If it's not, we can enhance the documentation. It's very simple. Each class is assigned a version # starting with 0. When a new member is added to the class the version # is changed with BOOST_CLASS_VERSION(name, #). The signature for loading is:
    void load(Archive & ar, T & t, const unsigned int version){
        ar >> m_x;
        ...
        if(version >= 1)
            ar >> m_z;
    }
In addition, there is a serialization library version returned with get_library_version. This is used internally by the library to address changes in serialization of primitives and other types for which class information is not kept in the archive. I believe that this version # is now up to 4. The class version # is the version # of the class - NOT boost, not the application, not anything else.
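The class-version pattern described in this reply can be tried without Boost. The sketch below is a stand-in, not the library's implementation: it reads from a plain std::istream instead of an archive, the caller supplies the version that would normally be recorded in the archive, and the struct T with members m_x and m_z is hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical class: m_z is assumed to have been added in class version 1.
struct T {
    int m_x = 0;
    int m_z = 0;
};

// Stand-in for load(): `version` is the class version that was current
// when the data was written, starting at 0.
inline void load(std::istream& ar, T& t, const unsigned int version) {
    ar >> t.m_x;
    if (version >= 1) {
        ar >> t.m_z;  // only present in data written at version 1 or later
    }
}
```

Data written at version 0 contains only m_x, so m_z keeps its default; data written at version 1 supplies both members.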
If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
In this case, David overloaded the version # in a way I never anticipated. Their usage of the version number presumed a 32 bit integer. binary archives only use 16 bits. So their change would make it impossible for them to use binary archives. This issue doesn't show up in text archives since a variable length string is used for integers. I made the change to detect exactly this type of unintended usage which would make serializations non-portable. Obviously I did this a few years too late.
This is described in the documentation. The version # is maintained on a class basis and is completely independent of any other number such as program or boost version. A little reflection should make it clear why it pretty much has to be this way.
Unless you promise and document that format used by boost.serialize will never change, it seems like you also have to include the version number for the archive format itself.
it's in the documentation as get_library_version(). So far I don't think any user has ever had to call this function. We do call it in the implementation of serialization for collections. I think the need for this arose in the implementation of fast array serialization which made a "shortcut" through the normal procedure.
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Again, a little reflection will make it clear that an older version of a program can't anticipate changes in a subsequent version. I'm sorry - it's just logically not possible. Think about it.
Assuming the user classes are not changed, why can't a program built with 1.41 read an archive created by 1.42?
It can. In this particular case, the situation is that a program built with 1.42 cannot read an archive created by 1.41 Robert Ramey

On Thu, 25 Feb 2010 08:49:50 -0800
"Robert Ramey"
Vladimir Prus wrote:
Robert Ramey wrote:
Well, I assume that most folks who use Boost.Serialization don't post here to report what version scheme they have used. So, I suggest you don't make broad conclusions based on the number of reports.
I would expect that people running across problems with the scheme would post here. So I think it's correct that this hasn't been a problem up until now.
Maybe we are the only ones to use big version numbers, maybe not. In the latter case these users will only get bitten when they switch to boost 1.42. So it may be a little early to judge.
The signature for loading is:
void load(Archive & ar, T & t, const unsigned int version){
Maybe it would be a good idea to change this signature to reflect the actual contract on version numbers, so new users can not miss it.
in addition, there is a serialization library version returned with get_library_version. This is used internally by the library to address changes in serialization of primitives and other types for which class information is not kept in the archive. I believe that this version # is now up to 4.
Did this number change with boost 1.42? If yes I could use it to detect these archives and work around my problem.
Their usage of the version number presumed a 32 bit integer. binary archives only use 16 bits. So their change would make it impossible for them to use binary archives.
Actually, we did encounter this problem with binary archives early on, and we found that the binary archive stored the class version as 8 bits (not 16). I guess I should have posted about this then. But this was not a regression (AFAICT binary archives always behaved like this), and it made sense that in a binary format you may want to save every last byte. So we switched to XML, and did not report about it here since we assumed we were safe. I understand now that you did not anticipate our usage.
About binary archives, I just did a test, and with boost 1.42 we now get an exception:
/opt/boost-1_42_0/include/boost/archive/basic_binary_oarchive.hpp:83: void boost::archive::basic_binary_oarchive<Archive>::save_override(const boost::archive::version_type&, int) [with Archive = boost::archive::binary_oarchive]: Assertion `t.t <= boost::integer_traits<unsigned char>::const_max' failed.
and this is while saving an object with version 300 to a binary archive (100 works as expected). With an xml archive, and a version number of 100000, no exception, and the version is truncated to 16 bits.
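The two behaviours reported here, a hard failure above 255 for binary archives and silent truncation to 16 bits for XML, differ only in whether the range is checked before narrowing. A minimal sketch of a checked narrowing follows; checked_version is a hypothetical helper, and it throws an exception where the library, per the message above, uses an assertion.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow a class version to a smaller unsigned type,
// throwing instead of silently truncating when the value does not fit.
template <class Narrow>
Narrow checked_version(std::uint32_t version) {
    const std::uint32_t limit =
        static_cast<std::uint32_t>(std::numeric_limits<Narrow>::max());
    if (version > limit) {
        throw std::out_of_range("class version does not fit in target type");
    }
    return static_cast<Narrow>(version);
}
```

With this helper, checked_version<std::uint8_t>(100) succeeds, while checked_version<std::uint8_t>(300) and checked_version<std::uint16_t>(100000) both throw rather than storing a wrapped value.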
Assuming the user classes are not changed, why can't a program built with 1.41 read an archive created by 1.42?
It can.
Not in my case. That's how I initially discovered the problem. The class version properties in XML archives generated by 1.42 were truncated to 16 bits, and so could not be read back (with either 1.41 or 1.42).
In this particular case, the situation is that a program built with 1.42 cannot read an archive created by 1.41
yes, and I understand that.
BTW there is still something I do not understand here.
Here is an archive generated with boost 1.42 :
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>

David Raulo wrote:
On Thu, 25 Feb 2010 08:49:50 -0800 "Robert Ramey"
wrote: The signature for loading is:
void load(Archive & ar, T & t, const unsigned int version){
Maybe it would be a good idea to change this signature to reflect the actual contract on version numbers, so new users can not miss it.
This appeals to me very much. I would like to change it to
    template<class Archive>
    void load(Archive &ar, T &t, const uint_least8_t version){ }
But this would provoke a large number of warnings and another huge chorus of complaints. So I would probably not do this.
in addition, there is a serialization library version returned with get_library_version. This is used internally by the library to address changes in serialization of primitives and other types for which class information is not kept in the archive. I believe that this version # is now up to 4.
Did this number change with boost 1.42? If yes I could use it to detect these archives and work around my problem.
It didn't. But we could change it for boost 1.43 if that would help. I think it's only at 4 so we have 251 to go. At current rate of change, that should last about 500 more years.
Their usage of the version number presumed a 32 bit integer. binary archives only use 16 bits. So their change would make it impossible for them to use binary archives.
Actually, we did encounter this problem with binary archive early on, and we found that binary archive stored the class version as 8 bits (not 16).
Hmmm perhaps I mis-remembered.
I guess I should have posted about this then. But this was not a regression (AFAICT binary archives always behaved like this), and it made sense that in a binary format you may want to save every last byte. So we switched to XML, and did not report about it here since we assumed we were safe. I understand now that you did not anticipate our usage.
Also, it has always been a fundamental goal of the library that serialization not be tied to a particular type of archive. That is, any serialization functions which work with one type of archive should work with any other. This explains why serialization of wstring is a part of archive_?text even though some have considered this a mistake. Leaving a max version of 8 (or 16) bits in the binary_archive while using 32 bits in the text archives means that you can't now use binary archives. It never occurred to me that anyone would want to couple these two concepts - archive and serialization.
About binary archives, I just did a test, and with boost 1.42 we now get an exception:
/opt/boost-1_42_0/include/boost/archive/basic_binary_oarchive.hpp:83: void boost::archive::basic_binary_oarchive<Archive>::save_override(const boost::archive::version_type&, int) [with Archive = boost::archive::binary_oarchive]: Assertion `t.t <= boost::integer_traits<unsigned char>::const_max' failed.
and this is while saving an object with version 300 to a binary archive (100 works as expected). With an xml archive, and a version number of 100000, no exception, and the version is truncated to 16 bits.
well, that's inconsistent of course. I'll look into it.
Assuming the user classes are not changed, why can't a program built with 1.41 read an archive created by 1.42?
It can.
Not in my case. That's how I initially discovered the problem. The class version properties in XML archives generated by 1.42 were truncated to 16 bits, and so could not be read back (with either 1.41 or 1.42).
OK - it can - except in your case.
In this particular case, the situation is that a program built with 1.42 cannot read an archive created by 1.41
yes, and I understand that.
I realise that. But I was answering a person who didn't understand that.
BTW there is still something I do not understand here. Here is an archive generated with boost 1.42 :
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<vec class_id="0" tracking_level="0" version="0">
  <count>2</count>
  <item_version>100000</item_version>
  <item class_id="1" tracking_level="1" version="34464" object_id="_0">
    <value>1</value>
  </item>
  <item object_id="_1">
    <value>1</value>
  </item>
</vec>
This is an STL vector of objects having BOOST_CLASS_VERSION(100000). The class version is truncated, but the "original" number appears as the item_version just above. How did that happen?
lol - here is how that happened. In this case we have two version #s floating around. One is the version # of the container while the other is the version # of the container's item. Since the vector serialization has the capability to "shortcut" the normal process of doing the serialization item by item, we need to create a data item called "item_version". This is just like any other data item so it could have any type. Of course it should have had the same type as the internal version type - but apparently this was an oversight.
Is there any way I can access this item_version ?
Look into the code for serialization of collections. I doubt there is an easy way to get at it.
Thanks for your help,
Sorry I haven't been able to be more helpful. I'm still considering this. You could fix the problem by tweaking the library to permit reading of old archives and re-serializing them under the new system. I realize that this would be a pain but it would work. Also it's not clear that you need to upgrade at all. The changes in the library for the last couple of versions are mostly related to doing thread-safe serialization of types defined in DLLs. This is a whole other can of worms. If you're not doing this, I don't see that you'll gain much by using the more recent library.
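One way to picture the re-serialization route is a per-class lookup table that translates each date-stamped version found in an old archive into a small ordinal before the object is saved again. This is only a sketch under assumptions: `ordinal_for_date_version` is a hypothetical helper, not part of Boost.Serialization, and every table entry except 20100222 is invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical migration helper (not a Boost.Serialization API).
// Maps a date-stamped class version read from an old archive to a small
// ordinal that fits in 8 bits. Only 20100222 comes from the thread; the
// other dates are placeholders.
std::uint_least8_t ordinal_for_date_version(std::uint_least32_t stored) {
    static const std::map<std::uint_least32_t, std::uint_least8_t> history = {
        {20080115u, 1}, {20090630u, 2}, {20100222u, 3}};
    const auto it = history.find(stored);
    if (it == history.end()) {
        // An unknown stamp means the table is incomplete; fail loudly.
        throw std::out_of_range("unknown date-based class version");
    }
    return it->second;
}
```

Because the dates increase over time, the ordinals keep the same ordering, so existing "if version >= X" logic can be translated mechanically.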
David.

The version # was envisioned as a small integer. All the examples, tests and demos used this. The problem comes about because it was unanticipated that someone would like to include actual data (i.e. a date in this case) in a version #. Note that this is the first time in 9 years that this has come up. So I think it's a little much to characterize it as a bug in the library. It would better be called an unanticipated usage of the version #.
IIUC, in 1.41.0 and earlier, the version number was an int. In 1.42.0, it is now 16 bits, which is a breaking change on just about every platform. The responsibility of dealing with this archive format change surely lies with Boost.Serialization itself? Or do Boost.Serialization users need to know that archives they write are not necessarily readable by later versions? I can't see much middle ground here - either you're backwards compatible, or you're not.
If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
This is described in the documentation. The version # is maintained on a class basis and is completely independent of any other number such as program or boost version. A little reflection should make it clear why it pretty much has to be this way.
I'm talking about changes within Boost.Serialization itself, not changes to user-defined types. The 32-bit-to-16-bit change that triggered this discussion, is a good example. How will Boost.Serialization in the future, know whether to read a 16 or 32 bit version number, from an archive? If it always reads a 16 bit version number, then you've broken compatibility with all pre-1.42.0 archives. If it always reads a 32 bit version number, then you've broken compatibility with 1.42.0. To deal with this, you really need to know which version of Boost was used to create the archive.
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Again, a little reflection will make it clear that an older version of a program can't anticipate changes in a subsequent version. I'm sorry - it's just logically not possible. Think about it.
Do you realize that e.g. Microsoft Word 2007 can be instructed to save files in such a way that they can be loaded with Word 2003? What is logically impossible about that?
Your response to David seemed to be essentially "too bad, maybe you can find a way around it yourself", so I can't see that (1) is being taken very seriously.
If I had an easy answer, honestly I would share it. Really. I don't. Sorry.
Fair enough, but then it should be stated clearly in the documentation: "Archives created by one version of Boost.Serialization are *not* guaranteed to be readable by subsequent versions of Boost.Serialization.".
Of course... The point is that with a robust versioning scheme in place, archive format changes can be implemented without breaking older software.
A robust (and efficient) versioning scheme has been in place since the beginning. It was never designed to be able to hold extra data. It's unfortunate that I didn't trap such an unintended usage. I try really hard - but I haven't been able to trap every case where something is used in a way that doesn't occur to me.
How can you call it robust? It is evidently not providing compatibility in either the backwards, or the forwards, direction. Regards, Jarl.

Jarl Lindrud wrote:
The version # was envisioned as a small integer. All the examples, tests and demos used this. The problem comes about because it was unanticipated that someone would like to include actual data (i.e. a date in this case) in a version #. Note that this is the first time in 9 years that this has come up. So I think it's a little much to characterize it as a bug in the library. It would better be called an unanticipated usage of the version #.
IIUC, in 1.41.0 and earlier, the version number was an int. In 1.42.0, it is now 16 bits, which is a breaking change on just about every platform.
The version # has always been 16 bits. The binary archive has always stored 16 bits for the version #. The code used an int - whose size varies from 16 to 64 bits depending on the platform. Text archives convert the int to a string and this conversion doesn't trap when the number passes 16 bits.
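The difference between "convert the int to a string" and a conversion that traps can be sketched as follows. This is not the library's code; `write_version_unchecked` and `write_version_checked` are hypothetical names, and the 16-bit limit is taken from the statement above.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Unchecked: whatever the integer holds is written as text, so an
// oversized version such as 20100222 goes into the archive unchanged.
std::string write_version_unchecked(unsigned int version) {
    std::ostringstream os;
    os << version;
    return os.str();
}

// Checked: refuse any value that a 16-bit field could not hold.
std::string write_version_checked(unsigned int version) {
    if (version > 0xFFFFu) {
        throw std::range_error("class version exceeds 16 bits");
    }
    return write_version_unchecked(version);
}
```

With the unchecked writer, a text archive silently accepts the large value; with the checked one, the same call fails at save time instead of producing an archive that cannot be read back.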
The responsibility of dealing with this archive format change surely lies with Boost.Serialization itself?
There is no format change in the library.
Or do Boost.Serialization users need to know that archives they write are not necessarily readable by later versions?
Hmmm - storing a 32 bit integer in a value saved as a 16 bit value (binary_archive) is not a good idea. I recognize that it was not obvious when one did that and that it could work in some cases - such as this user's. That's exactly what the level 4 warning was telling me. So I fixed the code to suppress the warning! And here we are.
I can't see much middle ground here - either you're backwards compatible, or you're not.
lol - no question about that.
If not, how do you make changes to the archive format (e.g. the change David found in 1.42.0) without breaking old archives?
This is described in the documentation. The version # is maintained on a class basis and is completely independent of any other number such as program or boost version. A little reflection should make it clear why it pretty much has to be this way.
I'm talking about changes within Boost.Serialization itself, not changes to user-defined types. The 32-bit-to-16-bit change that triggered this discussion, is a good example. How will Boost.Serialization in the future, know whether to read a 16 or 32 bit version number, from an archive?
In this particular case, the situation is not that bad. This particular code has only been tested with text archives. (It would break immediately with binary ones). So the only issue is what size the version # should be read into. Even here it's a specific case as on a machine with a 16 bit int, the user's code would have already failed. I'm still thinking about this, but I can see that reading the version # into an int rather than an int_least16_t would solve his problem - though it wouldn't address the other issues I've mentioned. I'll consider this for version 1.43. This would permit him to load old archives. 1.42 will trap when a version # exceeds 16 bits. I wouldn't expect this to change though. So the problem of how to use the version # will have to be dealt with.
If it always reads a 16 bit version number, then you've broken compatibility with all pre-1.42.0 archives. If it always reads a 32 bit version number, then you've broken compatibility with 1.42.0.
To deal with this, you really need to know which version of Boost was used to create the archive.
There is a mechanism for addressing these kinds of issues - it's the library version # as described in the documentation. So far, that # is up to 4.
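How a library version # stored in the archive could let a reader choose the accepted width can be sketched like this. Everything here is an assumption for illustration: `parse_class_version` is a hypothetical function, and `kFirstNarrowLibraryVersion = 5` is an invented threshold, not a value taken from Boost.Serialization.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Invented threshold: the first library version # whose archives are
// assumed to contain class versions of at most 16 bits.
constexpr unsigned kFirstNarrowLibraryVersion = 5;

// Parse a class version from its text form, accepting up to 32 bits for
// archives written by older library versions and 16 bits otherwise.
std::uint_least32_t parse_class_version(const std::string& text,
                                        unsigned library_version) {
    const unsigned long value = std::stoul(text);  // throws on non-numeric text
    const unsigned long limit =
        (library_version < kFirstNarrowLibraryVersion) ? 0xFFFFFFFFul : 0xFFFFul;
    if (value > limit) {
        throw std::range_error("class version too large for this archive format");
    }
    return static_cast<std::uint_least32_t>(value);
}
```

Under these assumptions, `parse_class_version("20100222", 4)` returns the full date stamp, while the same text tagged with library version 5 is rejected.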
How would an application ever be able to exchange data with older, deployed, versions of itself, without this capability?
Again, a little reflection will make it clear that an older version of a program can't anticipate changes in a subsequent version. I'm sorry - it's just logically not possible. Think about it.
Do you realize that e.g. Microsoft Word 2007 can be instructed to save files in such a way that they can be loaded with Word 2003? What is logically impossible about that?
Can Microsoft Word 2003 load files created with Microsoft Word 2007? That is what we're talking about here. The question of being able to create previous versions has been discussed. In fact, there is a section of the documentation in which this is discussed as a possible extension. It wouldn't be all that hard to implement - but no one has shown any interest in doing it.
Your response to David seemed to be essentially "too bad, maybe you can find a way around it yourself", so I can't see that (1) is being taken very seriously.
If I had an easy answer, honestly I would share it. Really. I don't. Sorry.
Fair enough, but then it should be stated clearly in the documentation: "Archives created by one version of Boost.Serialization are *not* guaranteed to be readable by subsequent versions of Boost.Serialization.".
Hmmm - I might be willing to say a) that the intention is to make such a guarantee b) and every effort has been made to that end c) and that every attempt has been made to anticipate the usage of the library d) and that the library has been in usage for many years e) and that versioning is a widely used facility f) that has had very few problems from users g) and that continual efforts are being made to make that guarantee stronger h) but that it's possible that there is something I haven't anticipated which will create a problem. But I suppose that goes without saying.
Of course... The point is that with a robust versioning scheme in place, archive format changes can be implemented without breaking older software.
A robust (and efficient) versioning scheme has been in place since the beginning. It was never designed to be able to hold extra data. It's unfortunate that I didn't trap such an unintended usage. I try really hard - but I haven't been able to trap every case where something is used in a way that doesn't occur to me.
How can you call it robust? It is evidently not providing compatibility in either the backwards, or the forwards, direction.
Honestly, I can't help but wonder if you've read the documentation or used the library. Robert Ramey

The version # has always been 16 bits. The binary archive has always stored 16 bits for the version #. The code used an int - whose size varies from 16 to 64 bits depending on the platform. Text archives convert the int to a string and this conversion doesn't trap when the number passes 16 bits.
I see.
To deal with this, you really need to know which version of Boost was used to create the archive.
There is a mechanism for addressing these kinds of issues - it's the library version # as described in the documentation. So far, that # is up to 4.
Thanks, that's what I was after.
Do you realize that e.g. Microsoft Word 2007 can be instructed to save files in such a way that they can be loaded with Word 2003? What is logically impossible about that?
Can Microsoft Word 2003 load files created with Microsoft Word 2007? That is what we're talking about here.
Of course it can. Try it yourself (Word 2007 -> File -> Save As -> select the relevant file type). This capability is fundamentally important for many applications, and there's nothing magic or logically impossible about it. Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
Hmmm - I might be willing to say a) that the intention is to make such a guarantee b) and every effort has been made to that end c) and that every attempt has been made to anticipate the usage of the library d) and that the library has been in usage for many years e) and that versioning is a widely used facility f) that has had very few problems from users g) and that continual efforts are being made to make that guarantee stronger h) but that it's possible that there is something I haven't anticipated which will create a problem.
I'm confused. You indicate here that backwards compatibility is an *intention*, and yet the Boost.Serialization documentation indicates that backwards compatibility is a *guarantee* (Contents -> To Do -> Back Versioning -> "... Currently, the library permits one make programs that are guarenteed the ability to load archives with classes of a previous version..."). For end users of Boost.Serialization there is a big difference between a guarantee and an intention.
* If backwards compatibility is a guarantee, then the breakage David came across is a regression, and would need to be fixed in a future version of Boost.
* If backwards compatibility is a (best-effort) intention, then you can indeed take a pass on this breakage, and leave David to deal with it himself. But of course then the documentation should state clearly that backwards compatibility is an intention, not a guarantee.
So which one is it?
Regards, Jarl.

Jarl Lindrud wrote:
Can Microsoft Word 2003 load files created with Microsoft Word 2007? That is what we're talking about here.
Of course it can. Try it yourself (Word 2007 -> File -> Save As -> select the relevant file type). This capability is fundamentally important for many applications, and there's nothing magic or logically impossible about it.
Please reread my sentence above.
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation.
Hmmm - I might be willing to say a) that the intention is to make such a guarantee b) and every effort has been made to that end c) and that every attempt has been made to anticipate the usage of the library d) and that the library has been in usage for many years e) and that versioning is a widely used facility f) that has had very few problems from users g) and that continual efforts are being made to make that guarantee stronger h) but that it's possible that there is something I haven't anticipated which will create a problem.
I'm confused. You indicate here that backwards compatibility is an *intention*, and yet the Boost.Serialization documentation indicates that backwards compatibility is a *guarantee* (Contents -> To Do -> Back Versioning -> "... Currently, the library permits one make programs that are guarenteed the ability to load archives with classes of a previous version...").
This is all true. The only restriction is that a class version number has to fit into 8 bits. This leaves room for 255 versions for each class. In this particular case, the version "number" was loaded with an 8-digit date. It was never the intention of the library to support this as it is unnecessary and would make some archives bigger. It never crossed my mind that someone would want to do such a thing. That is the problem - and the only problem here.
For end users of Boost.Serialization there is a big difference between a guarantee and an intention.
well it's tested every day. I could say that it's a guarantee as long as the facility is used as expected. But
* If backwards compatibility is a guarantee, then the breakage David came across is a regression, and would need to be fixed in a future version of Boost.
It's only a guarantee if one follows the rules. In fact he did get a compile error when he tried to use a binary_archive which in fact checks the size of the version number passed. The problem came up only because I added similar checking to the code for text-based archives.
* If backwards compatibility is a (best-effort) intention, then you can indeed take a pass on this breakage, and leave David to deal with it himself. But of course then the documentation should state clearly that backwards compatibility is an intention, not a guarantee.
lol - I write 30,000 lines of code with the intention of making no errors. I cannot guarantee that I have made no errors. Robert Ramey

On Fri, 26 Feb 2010 21:44:04 -0800, "Robert Ramey" wrote:
Jarl Lindrud wrote:
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation.
I think he did, but you did not understand his point. In order to allow newer clients to talk to older servers, you need forward compatibility in addition to backward compatibility (that is to say, the newer client needs to save archives in the format understood by the older server). Boost serialization only tries to provide the latter. Google protocol buffers do offer forward compatibility, but have little to do with serialization. http://en.wikipedia.org/wiki/Forward_compatibility Anyway, this is getting very OT.
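The idea of a newer program saving in the format an older peer understands can be sketched without any library. This is a toy illustration, not how Boost.Serialization works: the `Contact` type, its `email` field "added in format 2", and both functions are hypothetical, and fields are assumed to contain no spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record: `email` is assumed to have been added in format 2.
struct Contact {
    std::string name;
    std::string email;
};

// Writer side: emit only the fields that exist in the requested format,
// so a peer that only knows format 1 can still read the result.
std::string save_contact(const Contact& c, unsigned target_format) {
    std::ostringstream os;
    os << target_format << ' ' << c.name;
    if (target_format >= 2) {
        os << ' ' << c.email;
    }
    return os.str();
}

// Reader side: backward compatibility, driven by the stored format number.
Contact load_contact(const std::string& data) {
    std::istringstream is(data);
    unsigned format = 0;
    Contact c;
    is >> format >> c.name;
    if (format >= 2) {
        is >> c.email;  // data in format 1 simply has no email
    }
    return c;
}
```

Here the reader handles the backward direction, while the `target_format` argument on the writer is what the older, already deployed server would need.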
I'm confused. You indicate here that backwards compatibility is an *intention*, and yet the Boost.Serialization documentation indicates that backwards compatibility is a *guarantee* (Contents -> To Do -> Back Versioning -> "... Currently, the library permits one make programs that are guarenteed the ability to load archives with classes of a previous version...").
This is all true. The only restriction is that [...] It was never the intention of the library to support this [...]
Again you are missing his point. Sure, you could not anticipate this, but this is still a regression. An accidental, unanticipated one. I am myself a software library writer. When I make such a guarantee, this is a promise to my users that if it ever breaks (accidents happen), I fix it. If I do not make any such claim, then my users are to keep the pieces. But of course this must be clearly understood by all parties, and put prominently in the documentation. This is just being honest. I would perfectly understand that you drop the claim of offering any guarantee of backward compatibility, just a best-effort. This is, after all, free software. You volunteered an enormous quantity of time writing this impressive library for free, and for that I'm grateful.
It's only a guarantee if one follows the rules.
Your rules aren't written anywhere... I honestly do not know what other implicit rules I unintentionally broke by just using the library. In other words, it is now obvious to me I actually have no guarantee. We are only suggesting that you make that clear in the documentation, so new users can assess the risk they are taking, and balance that against the many benefits of using boost serialization.
In fact he did get a compile error when he tried to use a binary_archive which in fact checks the size of the version number passed.
no no no. This behavior is new. Before boost 1.42 I never got compile errors, and in fact I had no way of knowing what I was doing was "against the rules".
This is becoming ridiculous. Can we please get back to the actual issue? Here are the facts:
- boost_version used to be an unsigned int, which for the vast majority of your users was 32 bits at least;
- boost 1.42 changed that to 16 bits;
- in your mind it was always 8 bits and you were just trying to enforce this in a more explicit way;
- this is not mentioned anywhere in the documentation or in the API.
- you have at least one user who lost backward archive compatibility because of this.
See below a patch against svn. Can we please discuss its advantages and drawbacks?
- what good does it do: obviously in my case, restore backward compatibility.
- it gives more flexibility to the versioning scheme. Two such useful schemes were described previously in the discussion, where classes still have increasing integer versions, but are not possible with 8 bits storage.
- what downsides does applying this patch have? Maybe incurring a slight overhead on 16-bit platforms? If true, can this be actually measured?
- Would this patch cause any regression? Break any user code which was working fine before? Break user archive backward compatibility?
Thanks, David.
Index: boost/archive/basic_archive.hpp
===================================================================
--- boost/archive/basic_archive.hpp (revision 59943)
+++ boost/archive/basic_archive.hpp (working copy)
@@ -37,7 +37,7 @@
 } /* boost */ \
 /**/
 
-BOOST_ARCHIVE_STRONG_TYPEDEF(uint_least16_t, version_type)
+BOOST_ARCHIVE_STRONG_TYPEDEF(uint_least32_t, version_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_optional_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_reference_type)

David Raulo wrote:
This is becoming ridiculous. Can we please get back to the actual issue? Here are the facts :
- boost_version used to be an unsigned int, which for the vast majority of your users was 32 bits at least;
I believe that the C++ standard permits an int to be 16 bits. That in itself would be an indicator that assuming 32 bits might be an issue.
- boost 1.42 changed that to 16 bits;
- in your mind it was always 8 bits and you were just trying to enforce this in a more explicit way;
This is true. Basically, I saw it as an oversight that it had not trapped the usage of a larger integer and I was trying to rectify that. Had I had the foresight to do this 9 years ago, this problem would not have come up. Now it can't come up in the future.
- this is not mentioned anywhere in the documentation or in the API.
- you have at least one user who lost backward archive compatibility because of this.
See below a patch against svn. Can we please discuss its advantages and drawbacks?
- what good does it do: obviously in my case, restore backward compatibility.
Why can't you just patch your own copy?
- it gives more flexibility to the versioning scheme. Two such useful schemes were described previously in the discussion, where classes still have increasing integer versions, but are not possible with 8 bits storage.
- what downsides does applying this patch have? Maybe incurring a slight overhead on 16-bit platforms? If true, can this be actually measured?
- Would this patch cause any regression? Break any user code which was working fine before? Break user archive backward compatibility?
Would this not break compatibility with binary_?archive ? Currently binary archive stores the version as a 16 bit integer. Maybe it wouldn't but it's another thing that would have to be considered. Even if it didn't, this would break the "guarantee" that any serialization which works for one archive class is guaranteed to work with any other one. I thought you already ran into this when you tried to use a binary_archive. Robert Ramey
Index: boost/archive/basic_archive.hpp
===================================================================
--- boost/archive/basic_archive.hpp (revision 59943)
+++ boost/archive/basic_archive.hpp (working copy)
@@ -37,7 +37,7 @@
 } /* boost */ \
 /**/
 
-BOOST_ARCHIVE_STRONG_TYPEDEF(uint_least16_t, version_type)
+BOOST_ARCHIVE_STRONG_TYPEDEF(uint_least32_t, version_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_optional_type)
 BOOST_ARCHIVE_STRONG_TYPEDEF(int_least16_t, class_id_reference_type)

On Sat, 27 Feb 2010 10:17:19 -0800, "Robert Ramey" wrote:
David Raulo wrote:
- boost_version used to be an unsigned int, which for the vast majority of your users was 32 bits at least;
I believe that the C++ standard permits an int to be 16 bits. That in itself would be an indicator that assuming 32 bits might be an issue.
which is why I proposed using uint_least32_t, instead of going back to unsigned int. Out of curiosity, do you know of a c++ compiler on some boost-supported platform which is using 16 bits ints? Some beefy microcontroller perhaps? Or is this concern about 16-bits ints purely hypothetical?
See below a patch against svn. Can we please discuss its advantages and drawbacks?
- what good does it do: obviously in my case, restore backward compatibility.
Why can't you just patch your own copy?
I might do that, or something completely different, depending on what I learn here. As I said earlier, maintaining our own variant of boost is inconvenient to say the least. We may as well completely abandon the idea of reloading old archives, and restrict our usage of serialisation to same-dll, same-boost cases. For long-lived models and guarantees of backward compatibility, I'm starting to believe we should do something else entirely.
Try to see it this way: you seem to be implying that using uint_least32_t now would somehow increase risks of future problems, but at the same time you suggest that we do just that. Now I'm wondering what those risks really are, and if we should take them. There is a second risk factor that I must evaluate before starting to maintain a fork, which looks a lot more real to me: if the backward compat is broken again in the future, will we be able to work around it and continue to synchronize with official boost? Or would all that have been in vain? This is boost-level c++ we are talking about here, so I can definitely envision us not having the resources to do that in the long term.
But for the sake of the argument, let's continue evaluating the downsides of that patch, which prevent it from being applied to official boost. Yes, I understood long ago that you won't, but with all that time spent arguing, you have not yet shown a single concrete scenario where applying that patch would create any problem for you or any user. See below.
- Would this patch cause any regression? Break any user code which was working fine before? Break user archive backward compatibility?
Would this not break compatibility with binary_?archive ?
No, it would not. Binary would continue to save only 8 bits, AFAICT.
Currently binary archive stores the version as a 16 bit integer. Maybe it wouldn't but it's another thing that would have to be considered. Even if it didn't, this would break the "guarantee" that any serialization which works for one archive class is guaranteed to work with any other one.
It does not do that either, because boost 1.42 actually restricts text archives to 16 bits and binary ones to 8. I do understand that it is an oversight, and that 1.43 will put everyone on 8-bits, equal footing :) The fact remains that this "guarantee" was never there before, is still not there, and so the patch does not incur any regression here either.
I thought you already ran into this when you tried to use a binary_archive.
Yes, and that was no regression, so not a problem, as I said. And everyone expects such limits on binary archives, because of their very nature. XML, not so much.
Besides, binary and xml are not equal anyway wrt serialization. Imagine this scenario: a user becomes accustomed to the fact that xml archives, the only kind she tests at first, are portable between platforms. Her application becomes dependent on that portability. Then at one point she discovers that binary archives, which she wanted to use, are not portable. She is now stuck with XML. Strangely, you are not suggesting that we make xml non-portable, so that users do not "couple the concepts of archive and serialization", are you? ;-)
And finally: if I were to complete the patch with code to enable 32 bits versions for binary archives too, that would take care of your only argument against the first patch so far, yes? Would you accept it then? Or is there another rule that I missed?
-- david

David Raulo wrote:
On Sat, 27 Feb 2010 10:17:19 -0800, "Robert Ramey" wrote:
David Raulo wrote:
- boost_version used to be an unsigned int, which for the vast majority of your users was 32 bits at least;
I believe that the C++ standard permits an int to be 16 bits. That in itself would be an indicator that assuming 32 bits might be an issue.
which is why I proposed using uint_least32_t, instead of going back to unsigned int.
which is good thinking. what I really want to do is to use uint_least8_t to document and enforce the current design.
Out of curiosity, do you know of a c++ compiler on some boost-supported platform which is using 16 bits ints? Some beefy microcontroller perhaps? Or is this concern about 16-bits ints purely hypothetical?
When faced with writing something like the serialization library one can take two approaches:
a) select the group of compilers that one is interested in and make sure everything works for all members of that group.
b) code to the C++ standard. Then introduce work-arounds for those compilers which fail to support the standard in some way.
I've chosen b) because it results in much more portable code and is less work to implement and maintain. In short, it's scalable. One only needs to consider differences between each compiler and the standard rather than an ever larger group of compilers. There are other reasons to support doing it this way but these are enough for me. One thing that is a bad idea, and this case illustrates it, is to make assumptions about the implementation which are not explicit. That is - if an integer has to be able to contain 32 bits for the program to work, one should use a type which indicates and guarantees that.
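Making the width assumption explicit, as suggested above, could look like the following sketch. The alias `date_version_t` and the helper `can_hold` are hypothetical names; only standard `<cstdint>` and `<limits>` facilities are used.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical alias: a version type that must be able to hold date
// stamps such as 20100222, so it names a type with at least 32 bits.
using date_version_t = std::uint_least32_t;

// The requirement is checked at compile time instead of being assumed.
static_assert(std::numeric_limits<date_version_t>::digits >= 32,
              "date-based versions need at least 32 value bits");

// Hypothetical helper: true if `value` is representable in unsigned type T.
template <class T>
constexpr bool can_hold(unsigned long long value) {
    return value <= static_cast<unsigned long long>(std::numeric_limits<T>::max());
}

static_assert(can_hold<date_version_t>(20100222ull),
              "the date stamp from the thread fits in the chosen type");
```

A plain `unsigned int` would compile on a 16-bit platform and fail only at run time, whereas the `static_assert` turns a wrong assumption into a build error.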
Try to see it that way: you seem to be implying that using uint_least32_t now would somehow increase risks of future problems,
Actually, it would create problems immediately. In fact, you have one right now. You haven't come upon it because you only use a type of archive where the problem doesn't show up.
but at the same time you suggest that we do just that.
Well, your case is special now.
Now I'm wondering what those risks really are, and if we should take them.
I suppose the risk is that there is another ambiguous issue inside the library that no one knows about which could later become a problem. You'll have to weigh the risk of alternatives - writing your own code, using a different library, etc. - to determine if they're less risky.
But for the sake of the argument, let's continue evaluating the downsides of that patch, which prevent it from being applied to official boost. Yes, I understood long ago that you won't, but with all that time spent arguing, you have not yet shown a single concrete scenario where applying that patch would create you or any user any problem. See below.
Your situation is a concrete case. If you want to use binary_archive to make the process faster, you can't. Your program will fail.
- Would this patch cause any regression? Break any user code which was working fine before? Break user archive backward compatibility?
The fact remains that this "guarantee" was never there before, is still not there, and so the patch does not incur any regression here either.
Thinking about it some more, I'll look into the possibility of permitting the loading of larger version #s from older text archives. I see now that's what you have in mind. I would likely implement it in a different way but I think I can make it work. I'll have to think about it.
Besides, binary and xml are not equal anyway wrt serialization. Imagine this scenario: a user becomes accustomed to the fact that xml archives, the only kind she tests at first, are portable between platforms. Her application becomes dependent on that portability.
Then at one point she discovers that binary archives, which she wanted to use, are not portable. She is now stuck with XML. Strangely, you are not suggesting that we make xml non-portable, so that users do not "couple the concepts of archive and serialization", are you? ;-)
Not that it matters - but this is totally not understandable to me.
And finally: if I were to complete the patch with code to enable 32 bits versions for binary archives too, that would take care of your only argument against the first patch so far, yes? Would you accept it then? Or is there another rule that I missed?
Actually, it would be easy to "upgrade" the version # to 32 bits. But this would only encourage more mis-usage of the version # and make all binary_archives a little bit bigger. Give me some time to think about it some more. Robert Ramey

Robert Ramey writes:
Jarl Lindrud wrote:
Can Microsoft 2003 word load files created with Microsoft word 2007? That is what we're talking about here.
Of course it can. Try it yourself (Word 2007 -> File -> Save As -> select the relevant file type). This capability is fundamentally important for many applications, and there's nothing magic or logically impossible about it.
Please reread my sentence above.
Let me try to make this more clear to you. * Word 2007 can save files in a format that will *not* be readable by Word 2003. * Word 2007 can *also* save files in a format that *will* be readable by Word 2003. With Boost.Serialization I *cannot* make a program that produces archives that are guaranteed readable by programs that I've already built and deployed, because those earlier programs may well use an earlier version of Boost. And as you yourself acknowledged, in regard to point 2 in my very first post, archives created with newer Boost versions are not guaranteed to be readable by older Boost versions. In fact, you stated that you don't even see any utility in such a capability, and then later that it is somehow logically impossible. Can you at least see that there are serious real world applications (Word, for starters) that have this capability as a fundamental requirement? Boost.Serialization currently does not support this kind of versioning. Fair enough. But let's not pretend that it's some academic feature that no-one is interested in.
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation.
If the previously deployed servers use Boost 1.35.0, say, and my new client uses Boost 1.42.0, how can you claim that the new client will be able to send valid messages to the older servers? The Boost.Serialization documentation explicitly states that such compatibility is *not* supported (To Do -> Back Versioning). You even said so in earlier posts. What am I missing?
It's only a guarantee if one follows the rules. In fact he did get
So there are unknown rules one should follow... Are there more unknown rules? Is the end user responsible for knowing about these unknown rules?
lol - I write 30,000 lines of code with the intention of making no errors. I cannot guarantee that I have made no errors.
Car makers know that their cars are not perfect. When they offer you a guarantee, it doesn't mean "the car will never break down". It means, "if the car breaks down, we will fix it.". If Boost.Serialization's guarantee of backwards compatibility only applies under "anticipated usage" (as defined by yourself), then I guess that should be documented, along with your definition of "anticipated usage". Regards, Jarl.

Jarl Lindrud wrote:
Robert Ramey writes:
Jarl Lindrud wrote:
Can Microsoft 2003 word load files created with Microsoft word 2007? That is what we're talking about here.
Of course it can. Try it yourself (Word 2007 -> File -> Save As -> select the relevant file type). This capability is fundamentally important for many applications, and there's nothing magic or logically impossible about it.
Please reread my sentence above.
Let me try to make this more clear to you.
* Word 2007 can save files in a format that will *not* be readable by Word 2003. * Word 2007 can *also* save files in a format that *will* be readable by Word 2003.
LOL - Microsoft 2003 cannot load files created by Microsoft word 2007 unless they are specifically saved with that compatibility in mind. The ability to create a file compatible with some previous version is not supported by the library. The question has come up, and a cursory examination showed that it wouldn't be too hard to do. This is mentioned at the end of the documentation as ideas for future projects. But no one has had enough interest to work on this.
With Boost.Serialization I *cannot* make a program that produces archives that are guaranteed readable by programs that I've already built and deployed, because those earlier programs may well use an earlier version of Boost.
You can if you've used a class version number which fits in 8 bits. Have you saved files with version # greater than 255?
And as you yourself acknowledged, in regard to point 2 in my very first post, archives created with newer Boost versions are not guaranteed to be readable by older Boost versions.
This is of course not true. One thing is Boost version. Class versions are an entirely different thing which has absolutely nothing to do with Boost version. I know I've said it before, but I can't believe you actually understand the issues here.
In fact, you stated that you don't even see any utility in such a capability, and then later that it is somehow logically impossible.
I'm talking about class versions here.
Can you at least see that there are serious real world applications (Word, for starters) that have this capability as a fundamental requirement?
Word can't do this. No program can. Microsoft word 2003 cannot read a file saved in Microsoft word 2007 format.
Boost.Serialization currently does not support this kind of versioning. Fair enough. But let's not pretend that it's some academic feature that no-one is interested in.
LOL
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
It can and it does.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation.
If the previously deployed servers use Boost 1.35.0, say, and my new client uses Boost 1.42.0, how can you claim that the new client will be able to send valid messages to the older servers? The Boost.Serialization documentation explicitly states that such compatibility is *not* supported (To Do -> Back Versioning). You even said so in earlier posts.
LOL
What am I missing?
You're totally lost here. I think I see it now. And this has in fact come up before. You've somehow got the idea that class versioning is somehow related to boost version #. The boost version # refers to the version of the serialization library code. The class version refers to the version of the class in one's own program. These are totally unrelated concepts whose only commonality is the word "version". The current boost serialization library can read archives created 10 years ago. The only requirement is that if the user's code has added members, then the loading of those members has to be conditional on a class version number. The library assumes that this class version number is a small integer.
It's only a guarantee if one follows the rules. In fact he did get
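A minimal sketch of what "conditional on a class version number" means in practice. It deliberately does not use Boost: Person, kPersonVersion, save and load are hypothetical names, and the whitespace-separated text format is an assumption made only to keep the example self-contained. In the real library the stored class version is handed to the class's serialize function instead of being read by hand.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user class: 'nickname' was added in class version 1.
// Both members are assumed to be single words in this toy format.
struct Person {
    std::string name;
    std::string nickname;  // absent from archives written at class version 0
};

// Current class version, written in front of the members when saving.
const unsigned int kPersonVersion = 1;

void save(std::ostream& os, const Person& p) {
    os << kPersonVersion << ' ' << p.name << ' ' << p.nickname << '\n';
}

// Loading reads the stored class version first and only then decides
// which members the archive actually contains.
Person load(std::istream& is) {
    Person p;
    unsigned int version = 0;
    is >> version >> p.name;
    if (version >= 1) {
        is >> p.nickname;      // member exists in the archive
    } else {
        p.nickname = p.name;   // default chosen for old archives
    }
    return p;
}
```

With this shape, an archive containing "0 David" still loads in the current program, which is the backward direction discussed here. Nothing in it helps a program built before 'nickname' existed to read a version 1 archive.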
So there are unknown rules one should follow... Are there more unknown rules? Is the end user responsible for knowing about these unknown rules?
I'd have to go back and check the documentation to see what it says about version numbers. The examples all show small version numbers and the function signature uses "int" which the C++ standard permits to be at least as small as 16 bits. Also the binary_archive breaks if one tries to use a 32 bit integer. So it never occurred to me that someone might try to store a 32 bit number for the class version. In the latest iteration, we decided to compile boost at the highest warning level. Eliminating warnings resulted in including code to trap the storage of version # that are too big. So in the future these rules are better enforced.
lol - I write 30,000 lines of code with the intention of making no errors. I cannot guarantee that I have made no errors.
Car makers know that their cars are not perfect. When they offer you a guarantee, it doesn't mean "the car will never break down". It means, "if the car breaks down, we will fix it.".
There is a very simple fix - one line of code that can fix this user's code. He can include it in his own version of the library and will have no problems loading his older files. This fix basically backs out the recent change and would prevent detection of problems of this nature in the future. That's why I don't want to do it.
If Boost.Serialization's guarantee of backwards compatibility only applies under "anticipated usage" (as defined by yourself), then I guess that should be documented, along with your definition of "anticipated usage".
I'll add language to the documentation so that it's clear one should not use a number larger than 255 as a class version number. And I'll enhance the code to trap on violations of this rule in the future. Robert Ramey
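A minimal sketch of what trapping such a violation could look like, as opposed to silently truncating the number. The function checked_class_version and the constant kMaxClassVersion are hypothetical and only mirror the limit of 255 mentioned above; this is not the check the library actually performs.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical limit, taken from the 255 mentioned in the message above.
const unsigned long kMaxClassVersion = 255;

// Convert a class version to the narrow stored type, refusing values that
// would otherwise be truncated without any warning.
unsigned char checked_class_version(unsigned long v) {
    if (v > kMaxClassVersion) {
        throw std::out_of_range("class version does not fit in 8 bits");
    }
    return static_cast<unsigned char>(v);
}
```

A date-based value such as 20100222 would then fail loudly at save time instead of producing an archive with a different version number in it.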

LOL - Microsoft 2003 cannot load files created by Microsoft word 2007 unless they are specifically saved with that compatibility in mind.
Exactly, glad we've got that sorted. This kind of compatibility is of great utility, and there's nothing "logically impossible" about it.
And as you yourself acknowledged, in regard to point 2 in my very first post, archives created with newer Boost versions are not guaranteed to be readable by older Boost versions.
This is of course not true. One thing is Boost version. Class
You're contradicting yourself... In your reply to my first post you stated the following: ------------------------------------------------------------------------
(2) Conversely, there is no mechanism by which archives created with a *newer* Boost version can be guaranteed to be readable by a specific *older* version of Boost.
This is true. I don't see any way of making such a guarantee and I don't see any utility in being able to do this. ------------------------------------------------------------------------ But just now, when I repeated point 2, you stated that "This is of course not true"... Which way do you want it?
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
It can and it does.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation.
If the previously deployed servers use Boost 1.35.0, say, and my new client uses Boost 1.42.0, how can you claim that the new client will be able to send valid messages to the older servers? The Boost.Serialization documentation explicitly states that such compatibility is *not* supported (To Do -> Back Versioning). You even said so in earlier posts.
LOL
What am I missing?
You're totally lost here. I think I see it now. And this has in fact
Robert, as you've stated previously: --------------------------------------------------------------
Networked applications need this capability as well. E.g. a newly developed client needs to be able to communicate with any number of older, deployed, servers.
This capability is and has always been part of the serialization library. It is included in the documentation, demos and tests. Please read the documentation. -------------------------------------------------------------- Perhaps you can explain, then, how my 1.42.0-based client will be able to send intelligible messages to my 1.35.0-based server? Even if there have been no changes to my classes, there have been changes to Boost.Serialization archive formats between 1.35.0 and 1.42.0. Are you saying that 1.42.0 can "tweak" the archives so that they will be readable by 1.35.0? If not, how is the 1.35.0-based server ever going to be able to receive messages from the 1.42.0-based client?
I'll add language to the documentation so that it's clear one should not use a number larger than 255 as a class version number. And I'll enhance the code to trap on violations of this rule in the future.
What about the "guarantee" of backwards compatibility? Do you still feel that is an accurate description? Regards, Jarl.

Jarl Lindrud wrote:
LOL - Microsoft 2003 cannot load files created by Microsoft word 2007 unless they are specifically saved with that compatibility in mind.
Exactly, glad we've got that sorted. This kind of compatibility is of great utility, and there's nothing "logically impossible" about it.
It's not impossible, but it's pretty hard in general, and not easy to automate. If you're using class-level (or instance-level) versioning, your old archive has classes X1..Xn with versions v1..vn, and your new archive has classes X1..Xm with versions v1'..vm'. First you need to keep track of all these classes in your source code, and there may well be hundreds or thousands, not all of them present in any one document. Then you need to remember the version number for each that went into the old documents. Then, at save time, you need to pass the library the whole (v1..vn) list, so that it would be able to pass you vi when it calls your saver for Xi. Finally, you have to deal with new classes that were added after the old app shipped. Laborious. If, on the other hand, you use a single archive-level version number, this creates a central point that every class change needs to touch, and distributed development becomes a nightmare. Better versioning schemes may be possible, but nothing comes to mind.

It's not impossible, but it's pretty hard in general, and not easy to automate. If you're using class-level (or instance-level) versioning, your old archive has classes X1..Xn with versions v1..vn, and your new archive has classes X1..Xm with versions v1'..vm'. First you need to keep track of all these classes in your source code, and there may well be hundreds or thousands, not all of them present in any one document. Then you need to remember the version number for each that went into the old documents. Then, at save time, you need to pass the library the whole (v1..vn) list, so that it would be able to pass you vi when it calls your saver for Xi. Finally, you have to deal with new classes that were added after the old app shipped. Laborious.
Indeed - I wouldn't want to use class level versioning to implement forward compatibility.
If, on the other hand, you use a single archive-level version number, this creates a central point that every class change needs to touch, and distributed development becomes a nightmare.
I've actually implemented this, where I work. We have a large distributed application, where forward (and backward) compatibility are essential requirements. It's by no means automated, and there are some thorny issues when development proceeds on separate branches. Basically you only want to make schema changes on the trunk of your codebase. Assuming relatively few releases, you can collect a whole batch of changes into one version number change. I'm not sure what you mean by distributed development?
Better versioning schemes may be possible, but nothing comes to mind.
Protocol Buffers was Google's way of dealing with these issues. Regards, Jarl.

Jarl Lindrud wrote:
If, on the other hand, you use a single archive-level version number, this creates a central point that every class change needs to touch, and distributed development becomes a nightmare.
I've actually implemented this, where I work. We have a large distributed application, where forward (and backward) compatibility are essential requirements.
It's by no means automated, and there are some thorny issues when development proceeds on separate branches. Basically you only want to make schema changes on the trunk of your codebase. Assuming relatively few releases, you can collect a whole batch of changes into one version number change.
I'm not sure what you mean by distributed development?
I mean independent component development by multiple uncoordinated teams. For example, if you use boost::X in the project, its serialization support can't follow your global versioning scheme. It works if you control all pieces.

Peter Dimov writes:
Jarl Lindrud wrote:
If, on the other hand, you use a single archive-level version number, this creates a central point that every class change needs to touch, and distributed development becomes a nightmare.
I've actually implemented this, where I work. We have a large distributed application, where forward (and backward) compatibility are essential requirements.
It's by no means automated, and there are some thorny issues when development proceeds on separate branches. Basically you only want to make schema changes on the trunk of your codebase. Assuming relatively few releases, you can collect a whole batch of changes into one version number change.
I'm not sure what you mean by distributed development?
I mean independent component development by multiple uncoordinated teams. For example, if you use boost::X in the project, its serialization support can't follow your global versioning scheme. It works if you control all pieces.
You need a version number for each component. If your program creates archives with objects from components C1, C2, and C3, then each one of those components should maintain a version number of its own. When your program wants to create archives for consumption by older versions of the program, it would have to supply four version numbers (3 component versions plus your own application version). That seems fairly manageable to me, much more so than class level versioning.
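A minimal sketch of saving for an older reader with one version number per component. Everything here is hypothetical: the ComponentVersions map, the "geometry" component, the Point class and its text format are assumptions for illustration, and Boost.Serialization offers no such facility according to the discussion above.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Component name -> version of that component the target reader understands.
typedef std::map<std::string, unsigned int> ComponentVersions;

// Hypothetical class owned by the "geometry" component; 'z' was added in
// geometry version 2.
struct Point {
    int x;
    int y;
    int z;
};

// Write only the members that the target version of the component knows.
// std::map::at throws std::out_of_range if the component is not listed.
std::string save_point(const Point& p, const ComponentVersions& target) {
    const unsigned int v = target.at("geometry");
    std::ostringstream os;
    os << p.x << ' ' << p.y;
    if (v >= 2) {
        os << ' ' << p.z;  // newer readers also get the added member
    }
    return os.str();
}
```

A program that wants to talk to a deployed peer would fill the map with the versions that peer was built with, for example geometry at version 1, and every saver looks up only its own component. The application keeps one entry per component rather than one per class.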

Jarl Lindrud wrote:
Even if there have been no changes to my classes, there have been changes to Boost.Serialization archive formats between 1.35.0 and 1.42.0.
I am not aware of any such changes. It's possible that there might be a few special internal types which could be a problem. I think I've trapped that situation, but as I write this, I don't remember.
Are you saying that 1.42.0 can "tweak" the archives so that they will be readable by 1.35.0? If not, how is the 1.35.0-based server ever going to be able to receive messages from the 1.42.0-based client?
No such tweaking should be necessary. Archive format is unchanged between boost versions. The only thing that changes are the changes in users' classes. The user has to make special provision for such changes.
I'll add language to the documentation so that it's clear one should not use a number larger than 255 as a class version number. And I'll enhance the code to trap on violations of this rule in the future.
What about the "guarantee" of backwards compatibility? Do you still feel that is an accurate description?
I do. With proper usage of the library, users can be confident that current programs can read archives created by the very earliest version of the library. This particular case is the first time it has been reported that an archive made by a previous version of the library fails to load. Robert Ramey

Even if there have been no changes to my classes, there have been changes to Boost.Serialization archive formats between 1.35.0 and 1.42.0.
I am not aware of any such changes. It's possible that there might be a few special internal types which could be a problem. I think I've trapped that situation, but as I write this, I don't remember.
Are you saying that 1.42.0 can "tweak" the archives so that they will be readable by 1.35.0? If not, how is the 1.35.0-based server ever going to be able to receive messages from the 1.42.0-based client?
No such tweaking should be necessary. Archive format is unchanged between boost versions. The only thing that changes are the changes in users' classes. The user has to make special provision for such changes.
There is definitely a breaking archive format change in basic_binary_iarchive.hpp:110 (1.42.0) - collection sizes serialized as std::size_t rather than unsigned int. Subversion tells me this change was made 4 August 2009, so I presume it was released in Boost 1.41.0. I can't see any mention in the release notes. When you test against older versions of Boost.Serialization, do you check that archives can be loaded in both directions, i.e. old version can load new archives and new version can load old archives? Even so, this particular format change would go unnoticed unless you test with something like 64 bit Visual C++. Maybe there could be a section in the documentation listing, for each release, any archive format changes that are made. That would be valuable to users who are wondering whether they can safely upgrade.
I'll add language to the documentation so that it's clear one should not use a number larger than 255 as a class version number. And I'll enhance the code to trap on violations of this rule in the future.
What about the "guarantee" of backwards compatibility? Do you still feel that is an accurate description?
I do.
With proper usage of the library, users can be confident that current programs can read archives created by the very earliest version of the library. This particular case is the first time it has been reported that an archive made by a previous version of the library fails to load.
Maybe a checklist in the documentation, regarding "proper usage", so one knows whether or not to expect backwards compatibility? Anyway, your call. Thanks for the discussion. Regards, Jarl.

Jarl Lindrud wrote:
There is definitely a breaking archive format change in basic_binary_iarchive.hpp:110 (1.42.0) - collection sizes serialized as std::size_t rather than unsigned int.
Subversion tells me this change was made 4 August 2009, so I presume it was released in Boost 1.41.0. I can't see any mention in the release notes.
When you test against older versions of Boost.Serialization, do you check that archives can be loaded in both directions, i.e. old version can load new archives and new version can load old archives?
No - but it would be a good thing to test. Setting up such testing would entail a lot of work though. And testing the serialization library already consumes more resources than testing any other library. If someone wanted to make tests for this purpose, I would consider adding them to the test suite.
Even so, this particular format change would go unnoticed unless you test with something like 64 bit Visual C++.
Hmmm - This change - changing unsigned int to std::size_t - doesn't seem to be an error to me. Binary archives are by definition not portable across platforms. That is, a binary archive created on a 32 bit machine cannot be read on a 64 bit machine. Since std::size_t is the same size as unsigned int on all platforms, I don't think this should ever create an error. But I will say that this particular case is really a case of getting lucky. Now that I think about it, I should have made the type conditional on library version - even though I don't think it makes a difference in this case.
Maybe there could be a section in the documentation listing, for each release, any archive format changes that are made. That would be valuable to users who are wondering whether they can safely upgrade.
If I knew about them, I would fix them before release.
Maybe a checklist in the documentation, regarding "proper usage", so one knows whether or not to expect backwards compatibility?
I expect backward compatibility to be preserved. It is an explicit goal of the library. As far as I know, this is the first case where such compatibility has been broken. Robert Ramey

Robert Ramey writes:
Hmmm - This change - changing unsigned int to std::size_t - doesn't seem to be an error to me. Binary archives are by definition not portable across platforms. That is, a binary archive created on a 32 bit machine cannot be read on a 64 bit machine. Since std::size_t is the same size as unsigned int on all platforms, I don't think this should ever create an error.
64 bit Visual C++ has size_t as 64 bits, and unsigned int as 32 bits ...

On 1 Mar 2010, at 04:39, Jarl Lindrud wrote:
Robert Ramey writes:
Hmmm - This change - changing unsigned int to std::size_t - doesn't seem to be an error to me. Binary archives are by definition not portable across platforms. That is, a binary archive created on a 32 bit machine cannot be read on a 64 bit machine. Since std::size_t is the same size as unsigned int on all platforms, I don't think this should ever create an error.
64 bit Visual C++ has size_t as 64 bits, and unsigned int as 32 bits ...
So does 64-bit Mac OS X (the default in Snow Leopard) Chris

Robert Ramey wrote:
Jarl Lindrud wrote:
There is definitely a breaking archive format change in basic_binary_iarchive.hpp:110 (1.42.0) - collection sizes serialized as std::size_t rather than unsigned int.
The code in question is:

    void load_override(serialization::collection_size_type & t, int){
        if (this->get_library_version() < 6) {
            unsigned int x=0;
            * this->This() >> x;
            t = serialization::collection_size_type(x);
        }
        else {
            std::size_t x=0;
            * this->This() >> x;
            t = serialization::collection_size_type(x);
        }
    }

This indicates that if the serialization library version is less than 6, then read the collection count as an unsigned int from the file. Otherwise read the collection count as an std::size_t from the file. Thus, backward compatibility is maintained through this change. Robert Ramey

Robert Ramey writes:
Robert Ramey wrote:
Jarl Lindrud wrote:
There is definitely a breaking archive format change in basic_binary_iarchive.hpp:110 (1.42.0) - collection sizes serialized as std::size_t rather than unsigned int.
The code in question is:
    void load_override(serialization::collection_size_type & t, int){
        if (this->get_library_version() < 6) {
            unsigned int x=0;
            * this->This() >> x;
            t = serialization::collection_size_type(x);
        }
        else {
            std::size_t x=0;
            * this->This() >> x;
            t = serialization::collection_size_type(x);
        }
    }
This indicates that if the serialization library version is less than 6, then read the collection count as an unsigned int from the file. Otherwise read the collection count as an std::size_t from the file. Thus, backward compatibility is maintained through this change.
Sure. The point being made is that, contrary to your assumption, it is generally not possible for 1.35.0 based programs to read archives produced by 1.42.0 based programs. As the version number is 6, one might suspect that this isn't the only format change that's occurred. Regards, Jarl.
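A minimal sketch of the asymmetry being argued here, with a toy byte buffer standing in for a binary archive. All names (Buffer, put, get, write_archive, new_reader, old_reader) are hypothetical. Fixed 32-bit and 64-bit integers stand in for unsigned int and std::size_t on a 64-bit platform, which is an assumption, and the old reader is assumed to perform no check on the stored library version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> Buffer;

// Append the raw bytes of 'value', roughly as a native binary archive would.
template <class T>
void put(Buffer& buf, T value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a T from the raw bytes at 'pos' and advance 'pos'.
template <class T>
T get(const Buffer& buf, std::size_t& pos) {
    T value;
    std::memcpy(&value, &buf[pos], sizeof(T));
    pos += sizeof(T);
    return value;
}

struct Loaded {
    std::uint64_t count;  // collection size as the reader understood it
    std::uint32_t first;  // first element as the reader understood it
};

// Format versions below 6 store a 32-bit count, later ones a 64-bit count,
// followed in both cases by one 32-bit element.
Buffer write_archive(unsigned format_version, std::uint32_t count, std::uint32_t first) {
    Buffer buf;
    if (format_version < 6) put<std::uint32_t>(buf, count);
    else                    put<std::uint64_t>(buf, count);
    put<std::uint32_t>(buf, first);
    return buf;
}

// New reader: branches on the format version, so it loads both layouts.
Loaded new_reader(const Buffer& buf, unsigned format_version) {
    std::size_t pos = 0;
    Loaded r;
    if (format_version < 6) r.count = get<std::uint32_t>(buf, pos);
    else                    r.count = get<std::uint64_t>(buf, pos);
    r.first = get<std::uint32_t>(buf, pos);
    return r;
}

// Old reader: built before the change, it only knows the 32-bit layout.
Loaded old_reader(const Buffer& buf) {
    std::size_t pos = 0;
    Loaded r;
    r.count = get<std::uint32_t>(buf, pos);
    r.first = get<std::uint32_t>(buf, pos);
    return r;
}
```

The new reader recovers the same values from an archive written at format version 5 or 6. The old reader applied to a version 6 archive consumes four bytes too few for the count, so either the count or the element that follows comes out wrong, depending on byte order.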
participants (7): Christopher Jefferson, David Raulo, Jarl Lindrud, Paul, Peter Dimov, Robert Ramey, Vladimir Prus