serialization & registration questions

Jerry

21 Mar 2008

1:58 p.m.

Is there a direct way to check if a class is registered correctly? And why would this differ between binary and XML archives? My binary serialization problem persists (sorry!) when using fast_binary_archive, though the same code works with the XML serializer. Also, I thought I'd try an archive implemented from scratch according to the Boost docs to see what happens. Is the implementation section of the archive class reference up to date with 1.34.1? I was getting hard-to-decipher template error messages using VC++ 7.1 SP1. Thanks.

Robert Ramey

21 Mar

5:15 p.m.

Note that results should be the same with all types of archives. This is a fundamental goal of the library. So if you can make an example which fails with binary archive but passes with another one - that would be of interest.

Robert Ramey

Jerry

8 p.m.

...

Note that results should be the same with all types of archives. This is a fundamental goal of the library. So if you can make an example which fails with binary archive but passes with another one - that would be of interest.

Thanks. I think I may be able to do this. What's the protocol here: assemble a minimal test example and post it to the list? Does the list accept attachments? Jerry

Sohail Somani

9:12 p.m.

On Fri, 21 Mar 2008 20:00:12 +0000, Jerry wrote:

...

I think I may be able to do this. What's the protocol here: assemble a minimal test example and post it to the list? Does the list accept attachments?

I think attachments work fine... -- Sohail Somani http://uint32t.blogspot.com

Jerry

23 Mar

2:33 p.m.

Robert,

...

Note that results should be the same with all types of archives. This is a fundamental goal of the library. So if you can make an example which fails with binary archive but passes with another one - that would be of interest.

It is a header ordering issue. I instrumented the insert and tfind functions in basic_serializer_map.cpp so that they printed some debug output, e.g.:

    basic_serializer_map.cpp(53) : this: 00575BB0: Registering CDiagramButton -> true
    basic_serializer_map.cpp(82) : this: 00575BB0: tfind() CDiagramButton -> true
    basic_serializer_map.cpp(82) : this: 00575BF4: tfind() CDiagramButton -> false

i.e. the first case was registering with one map, and the second case was failing to find the type in another map. The failing case above had the following #include order:

    #include <boost/archive/xml_iarchive.hpp>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/export.hpp> // <-- wrong! move it
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include "test_classes.h"

I now realise that <boost/serialization/export.hpp> needs to follow all of the archive headers. The #include order determines which failure will occur; this might be worth adding to the comments in the code. Is there any way to detect this problem at compile/link time?

Thanks.
Jerry.

François Mauger

4:12 p.m.

New subject: serialization & several text archive within the same files...

Hi, I have a large amount of data to store/load from files. I use the boost::serialization library to do it; it has very nice features. I use version 1.33:

...

...
...
libboost-serialization-dev 1.33.1-9ubuntu3.1

Checking different strategies to play with my data and i/o archives, I met the following problem: if I save several text_oarchives within the same output file (a trick to break the side effects of memory tracking), deserialization fails because there is no separator between successive text archives.

I have to explicitly add a 'std::endl' to the output stream to make it run.

This problem does not appear with XML archives, because the closing </tag> unambiguously marks the end of each archive. I did not check binary archives, but I guess there will be no problem.

In my view, a mandatory 'white' character should be added as the last byte of a text output archive (when the destructor is invoked?). This would make it more coherent (symmetric!) with XML/binary archives. This is only a suggestion: I cannot imagine all the side effects such a strategy could imply.

A sample demo file is attached.

Thanks for your attention.

frc

--
Francois Mauger
Laboratoire de Physique Corpusculaire de Caen et Universite de Caen
ENSICAEN - 6, Boulevard du Marechal Juin, 14050 CAEN Cedex, FRANCE
e-mail: mauger@lpccaen.in2p3.fr
tel.: (0/+33) 2 31 45 25 12
fax: (0/+33) 2 31 45 25 49

Robert Ramey

9:11 p.m.

New subject: serialization & several text archive within the same files...

If you don't like tracking, you can turn it off for the types you want.

To add your own characters between archives, try something like this:

    #include <fstream>
    #include <boost/archive/text_oarchive.hpp>

    {
        std::ofstream os("filename");
        {
            boost::archive::text_oarchive oa(os);
            oa << ...;
        } // close archive - leaves stream open
        os << '\n'; // or whatever
        {
            boost::archive::text_oarchive oa(os); // another archive in the same stream
            oa << ...;
        } // close archive - leaves stream open
        os << '\n'; // or whatever
    } // closes output stream

Robert Ramey

François Mauger

24 Mar

9:12 a.m.

New subject: serialization & several text archive within the same files...

Hi Robert and all,

...

...
...
On Sun, March 23, 2008 10:11 pm, Robert Ramey wrote: if you don't like tracking - you can turn it off for the types you want.

see below.

...

to add your own characters between archives try something like this:

This is what I have done, of course, to make it run. But I consider that such behavior is the responsibility of the core library, not the user (me); that's my point. As I use a 'serialization manager' class that picks a txt/xml/bin archive depending on the file extension provided by the users of my library, I would find it more, say, 'elegant' not to treat text archives differently from xml/bin archives. But maybe it is a purist issue, and maybe not really relevant. Of course, this is not a problem if I add this 'blank' character by hand, so it is fine for me.

-- next point:

About memory tracking: after a few investigations into my needs, I find it very powerful. It only implies some care while using it. As I serialize nested objects through pointers in my library, tracking is very useful because it handles the links properly.

The only problem I met is the following:
- I have, say, 1000000 records (instances of a class) to write to the archive.
- Each record makes heavy use of std::vector or std::list as internal members (subrecords with dynamically allocated memory).
- Each record (with its internal subrecords) is more than 1000 bytes long, so the total storage for my data set is about 1 GB in a single output file!
- Moreover, each record has pointer members, so tracking is activated and I NEED it.

Now, when I write all the records in a loop over my archive, I must not reuse the same memory addresses (as explained in the sample programs and docs of boost::serialization, I cannot use a temporary record instance in the loop). So I have to keep the whole set of records (and dynamically allocated subrecords) in the machine's RAM at the same time before storing them in the archive (1 GB!). Typically I use a std::list for this. It implies running on a machine with 1 GB of available RAM, which cannot be guaranteed for all my users, even at our computing center...

On the other hand, during the loop there is no way to erase (or shrink) previously written records while saving the current one. Of course this would save running memory, but the freed memory could be 'reused' at some point (a rather random process that depends on the system), and the tracking mechanism would then misinterpret new data reached through my pointers as already-serialized objects. The output would then be corrupted.

This is what I call a "long-range memory tracking effect":
- Within a single record (short range), memory tracking is fine, because it maintains connections between 'subrecords' through pointers in a very nice and storage-saving way. Moreover, deserialization works perfectly, without duplicate records/subrecords or memory leaks.
- For successive records (long range), I have no need for pointers linking objects (records and the subrecords in them), so tracking is useless, but it is still active in the same program!

This leads to collisions between the memory addresses of different records. This is a corruption case, unless, as I explained above, all records (and internal subrecords) are kept in memory until the end of the serialization process... and then I need a machine with 1 GB of RAM!

The only way I have found in my programs to break this long-range effect is to use one archive per record. It then seems that memory tracking is confined to the current archive, which is exactly what I want.

Finally, what would help my "per-record" serialization approach for a large data set is a kind of "memory tracking reset" function that could be invoked while looping over the archive. I have no idea whether it is possible to implement this, or whether other people (Robert first!) would find it useful.

I hope my point has been understood. At least, if you can confirm that the tracking mechanism is confined to a single archive, I can use this strategy of multiple archives per file.

Thanks a lot for your attention, advice and constructive criticism. And many, many thanks to Robert for this very nice and elegant serialization library. This is really great and useful work!

frc

Robert Ramey

5:09 p.m.

New subject: serialization & several text archive within the same files...

François Mauger wrote:

...

Hi Robert and all,

...
...
...
On Sun, March 23, 2008 10:11 pm, Robert Ramey wrote: if you don't like tracking - you can turn it off for the types you want.

see below.

...
to add your own characters between archives try something like this:

This is what I have done, of course, to make it run. But I consider that such behavior is the responsibility of the core library, not the user (me); that's my point.

A very bad idea in my opinion. Now the user can embed serialized data wherever he wants. The library needs to be smaller rather than larger.

...

As I use a 'serialization manager' class that picks a txt/xml/bin archive depending on the file extension provided by the users of my library, I would find it more, say, 'elegant' not to treat text archives differently from xml/bin archives. But maybe it is a purist issue, and maybe not really relevant. Of course, this is not a problem if I add this 'blank' character by hand, so it is fine for me.

As it will be for all users with their own special requirements.

A couple of points.

a) First, you've got a special situation. This requires special effort to understand the library in more detail than normal. Thankfully, you've made the investment in this effort and have been able to exploit the library to help address your situation. I recognize that this is not easy.

b) You've given a very clear explanation of your situation, the reasons for it, the problems in addressing it and what you've done about it. This is even harder and I appreciate your efforts.

c) I see your method of using one archive per instance within a single file as a very practical and viable solution. I don't see it as overly inefficient given that your class instances are so large to begin with.

In thinking about this, I can see how a change in the library might help, but I don't think it would be more efficient than what you're doing now.

One idea you might want to look into is the concept of an "archive helper", which attaches special behavior to an archive. The current trunk includes archives which have an "archive helper" attached to handle the special requirements of shared_ptr, which does not model the concept of "serializable". Such a helper could be used to implement your own custom tracking behavior (which is what the helper for shared_ptr does).

Another idea would be to use BOOST_STRONG_TYPEDEF to create a "wrapper" around types for which you want to turn off tracking for certain instances. This technique may or may not be useful to you.

Good job and good luck.

Robert Ramey

Pfligersdorffer, Christian

25 Mar

1:21 p.m.

New subject: serialization & several text archive within the same files...

Francois Mauger wrote:

...

In my view, a mandatory 'white' character should be added as the last byte of a text output archive (when the destructor is invoked?). This would make it more coherent (symmetric!) with XML/binary archives.

Hello everybody, as a frequent user of the serialization library who also deals with consecutive archives in one stream, I agree with Francois on this point. I would also consider it natural for the text_oarchive to end with a line break, for the same reasons: readability and symmetry with the other archive types. Regards, -- Christian Pfligersdorffer Software Engineering http://www.eos.info

Sohail Somani

23 Mar

4:18 p.m.

On Sun, 23 Mar 2008 14:33:40 +0000, Jerry wrote:

...

Robert,

...
Note that results should be the same with all types of archives. This is a fundamental goal of the library. So if you can make an example which fails with binary archive but passes with another one - that would be of interest.

It is a header ordering issue.

Yep, this is outlined in the doc that I linked you to earlier. [snip]

...

I now realise that <boost/serialization/export.hpp> needs to follow all of the archive headers. The #include order determines which failure will occur; this might be worth adding to the comments in the code. Is there any way to detect this problem at compile/link time?

One thing I really like about the serialization library is that there are lots of asserts with lots of comments telling you what went wrong and how to fix it. Perhaps a gentle reminder about the header-order issue would have been appropriate. The development version of Boost is supposed to have removed the need for header ordering but I think the kinks are still being worked out: http://article.gmane.org/gmane.comp.lib.boost.devel/172425 -- Sohail Somani http://uint32t.blogspot.com

Robert Ramey

9:13 p.m.

Sohail Somani wrote:

...

On Sun, 23 Mar 2008 14:33:40 +0000, Jerry wrote:

...

One thing I really like about the serialization library is that there are lots of asserts with lots of comments telling you what went wrong and how to fix it. Perhaps a gentle reminder about the header-order issue would have been appropriate.

If we can fix the problem - then there will be no issue here.

...

The development version of Boost is supposed to have removed the need for header ordering but I think the kinks are still being worked out:

http://article.gmane.org/gmane.comp.lib.boost.devel/172425

Exactly - so at this point, this has to be considered a bug - fixes welcome. Robert Ramey

Sohail Somani

8:20 p.m.

On Sun, 23 Mar 2008 13:13:52 -0800, Robert Ramey wrote:

...

Sohail Somani wrote:

...
On Sun, 23 Mar 2008 14:33:40 +0000, Jerry wrote:

...
One thing I really like about the serialization library is that there are lots of asserts with lots of comments telling you what went wrong and how to fix it. Perhaps a gentle reminder about the header-order issue would have been appropriate.

If we can fix the problem - then there will be no issue here.

Yep, that's why I said "would have been appropriate" as it should not be necessary anymore.

...

...
The development version of Boost is supposed to have removed the need for header ordering but I think the kinks are still being worked out:

http://article.gmane.org/gmane.comp.lib.boost.devel/172425

Exactly - so at this point, this has to be considered a bug - fixes welcome.

Indeed. I would say this is a blocking issue for this compiler and the serialization library. I think a lot of people rely on export.hpp working properly. Just MHO of course ;-) -- Sohail Somani http://uint32t.blogspot.com

12 comments

5 participants

participants (5)

François Mauger
Jerry
Pfligersdorffer, Christian
Robert Ramey
Sohail Somani