Serialization: huge file size, binary archives
Hello,

I'm using the Boost Serialization library to store my data structure, and I want to use the binary archive type by default:

    boost::archive::binary_oarchive(ostream &s)  // saving
    boost::archive::binary_iarchive(istream &s)  // loading

But I noticed that these files can be very large compared to the stored data. I got a binary archive of around 1.5 GByte, yet after compressing it only ~200 MByte are left (!). It seems there is a lot of 'overhead' or 'redundant' data (I see a lot of zero bytes when I look at the file with a hex editor).

I tried the gzip filter of the Iostreams library, but I want to avoid that in production because of the increased runtime.

Some information about my data structure (maybe helpful):
- it uses a lot of pointers
- it uses a lot of std::vector

Has anybody run into the same problem? Is there a way to decrease the archive size while storing the same amount of data? What could be a solution? Writing my own archive class, optimized for my data structure?

Thanks in advance,
Sascha
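[Editor's note: for context, a minimal sketch of the save/load pattern described above, assuming a hypothetical Record type with a std::vector member and a pointer member; the type, field names, and file name are illustrative, not the poster's actual code.]

    #include <fstream>
    #include <vector>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/serialization/vector.hpp>

    // Hypothetical data structure, for illustration only.
    struct Record {
        std::vector<double> samples;
        Record* next;                 // pointers are tracked and deep-saved by the library

        Record() : next(0) {}

        template<class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & samples;
            ar & next;
        }
    };

    int main() {
        Record r;
        r.samples.assign(1000, 0.0);

        {   // saving
            std::ofstream ofs("data.bin", std::ios::binary);
            boost::archive::binary_oarchive oa(ofs);
            const Record& cr = r;     // saving through a const reference satisfies the
            oa << cr;                 // library's object-tracking check for tracked types
        }
        {   // loading
            std::ifstream ifs("data.bin", std::ios::binary);
            boost::archive::binary_iarchive ia(ifs);
            Record loaded;
            ia >> loaded;
        }
        return 0;
    }

If compression turns out to be acceptable after all, a boost::iostreams::gzip_compressor can be pushed onto a filtering_ostream placed in front of the archive; the serialization code itself stays unchanged, at the cost of extra CPU time.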
I'm not aware of serialization causing such a problem. You might investigate std::vector resize(), etc., to see if the vector really has a lot of null data.

Robert Ramey
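[Editor's note: one way to follow this suggestion is to report, before writing the archive, how many elements of each vector are actually zero; a large count hints that the vector was resized bigger than the data it really holds. A rough sketch, in which the element type and helper name are assumptions.]

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Count how much of a vector is zero-valued; large counts suggest the
    // vector was resized larger than the data it actually carries.
    template<class T>
    void report_zero_fill(const std::vector<T>& v, const char* name) {
        std::size_t zeros = std::count(v.begin(), v.end(), T());
        std::cout << name << ": size=" << v.size()
                  << " zero-valued=" << zeros << '\n';
    }

    int main() {
        std::vector<double> samples(1000000);   // accidentally default-filled with zeros
        report_zero_fill(samples, "samples");
        return 0;
    }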
Hello,

First of all, thanks for the quick reply.

I tried std::vector resize(), but the problem still exists. I also removed some redundant data from my data structure (data which can be regenerated by a postprocessing step after reading the archive). I still have about 1.0 GByte.

Is there a document somewhere that describes the 'overhead' data? Would it be helpful if I sent you a generated archive? (With other data sets I can generate uncompressed archives of around 1 MByte.) Would somebody have a look at it?

Another thing: I get a lot of compile warnings about unused variables within the serialization library. It would be nice if these could be fixed in the next release.

Thanks in advance.

Sascha
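[Editor's note: one way to keep such regenerable data out of the archive is to split save and load and rebuild the derived member in a post-load step. A minimal sketch, assuming a hypothetical type in which 'sum' can be recomputed from 'raw'.]

    #include <boost/serialization/split_member.hpp>
    #include <boost/serialization/vector.hpp>
    #include <numeric>
    #include <vector>

    // Hypothetical type: 'raw' is stored, 'sum' can be recomputed after loading.
    struct Samples {
        std::vector<double> raw;
        double sum;                 // derived, not worth archiving

        void rebuild() { sum = std::accumulate(raw.begin(), raw.end(), 0.0); }

        template<class Archive>
        void save(Archive& ar, const unsigned int /*version*/) const {
            ar & raw;               // only the raw data goes into the archive
        }
        template<class Archive>
        void load(Archive& ar, const unsigned int /*version*/) {
            ar & raw;
            rebuild();              // regenerate the derived data after reading
        }
        BOOST_SERIALIZATION_SPLIT_MEMBER()
    };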
There is no information written into a binary archive which is not absolutely necessary; that is, there is no redundant information. If your archives are "too" big, there must be some mistake. I would suggest that you output (part of) the archive using the text or XML format so you can see what is actually being written and how it differs from what you expect.

Robert Ramey
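[Editor's note: following this suggestion, the same object can be written through a text archive to see exactly what ends up in the file; a small sketch in which the helper name is illustrative.]

    #include <fstream>
    #include <boost/archive/text_oarchive.hpp>

    // Write any serializable object to a human-readable text archive so its
    // contents can be compared with what you expect to be stored.
    template<class T>
    void dump_for_inspection(const T& obj, const char* filename) {
        std::ofstream ofs(filename);
        boost::archive::text_oarchive oa(ofs);
        oa << obj;                 // uses the same serialize() code as the binary archive
    }

An xml_oarchive can be used the same way, but it additionally requires every serialized value, including the top-level object, to be wrapped in BOOST_SERIALIZATION_NVP (or make_nvp) so that each value gets an element name.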
Thanks for the support; now I see the root cause of my huge archive files. I have reached very good performance now (both runtime and file size).

Best regards,
Sascha
I'm curious, what was it?

Bruno
Hi Bruno,

I made a more detailed investigation of my data structure and of the examples I used, and came to the conclusion that the size was actually correct.

Sascha
participants (3)
- Bruno Martínez
- Robert Ramey
- Sascha Ochsenknecht