Reducing the size of an .obj file that does serialization

Hi,
I have a parent class which has around 200 subclasses. When serializing the entire hierarchy, I have a lot of base class pointers and STL containers of base class pointers. From the documentation, I found that to serialize polymorphic pointers I have to do a BOOST_EXPORT of all the relevant subclasses. But after doing that, I find my .obj file to be around 37MB. I am using the text archive in my code. When I tried the polymorphic_text_archive, the size only went down by 3MB. Can someone help me with how I should go about reducing the .obj file size?
Some of my questions are:
a) One approach I could think of is to create a virtual function in the hierarchy which just takes the Archive and does the serialization. This would help me, because I would not need the BOOST_EXPORT, which increases the size of the .obj file.
b) Why is BOOST_EXPORT required for polymorphic_text_archive? Since it uses virtual functions instead of templates anyway, why does it need the derived class definition?
Thanks, Gokul.

a) What compiler/platform are you using?
b) 37 MB suggests that you're including the debug symbols. Assuming this is correct, how large is it when compiling for release? Does the size of the debug version really matter?
c) I'm doubtful that BOOST_EXPORT has this large an impact. How large was the executable before you exported the derived classes?
Gokulakannan Somasundaram wrote:
b) Why is BOOST_EXPORT required for polymorphic_text_archive? Since it uses virtual functions instead of templates anyway, why does it need the derived class definition?
Assuming that one is not using the no_rtti type info system, the usage of BOOST_EXPORT should be unrelated to the usage of polymorphic_?archives.
Thanks, Gokul
Robert Ramey
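To illustrate why exporting is tied to the type information rather than to the archive class, here is a minimal sketch in plain C++. It is not Boost's implementation: `Shape`, `Circle`, `Square`, `registry`, `register_type` and `create` are hypothetical names invented for this example. The idea it shows is that loading through a base class pointer only has a stored class name to go on, so something must have registered that name together with code that knows the derived type. In Boost.Serialization that registration is what the export macro (spelled BOOST_CLASS_EXPORT in the library headers) takes care of.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class standing in for the root of the hierarchy.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string key() const = 0;  // name that would be written on save
};

struct Circle : Shape {
    std::string key() const override { return "Circle"; }
};

struct Square : Shape {
    std::string key() const override { return "Square"; }
};

// Maps a stored class name to a function that constructs that class.
using Factory = std::function<std::unique_ptr<Shape>()>;

std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;
    return r;
}

// Registration ties a string key to code that knows the derived type.
// Nothing here depends on which archive format is in use.
template <class T>
void register_type(const std::string& name) {
    assert(!name.empty());
    registry()[name] = [] { return std::unique_ptr<Shape>(new T()); };
}

// Loading through a base pointer: only the name is known at this point.
// Returns nullptr when the name was never registered.
std::unique_ptr<Shape> create(const std::string& name) {
    auto it = registry().find(name);
    return it == registry().end() ? nullptr : it->second();
}
```

In this sketch, calling `register_type<Circle>("Circle")` once makes `create("Circle")` work, while `create("Square")` keeps returning a null pointer until `Square` is registered as well, whatever archive type is used.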

Hi Robert, thanks for the reply.
a) What compiler/platform are you using?
MSVC/Windows
b) 37 MB suggests that you're including the debug symbols. Assuming this is correct, how large is it when compiling for release? Does the size of the debug version really matter?
Yes, it is in debug mode, and the size in debug mode doesn't matter. In release mode, it came down to 21MB. When I removed the BOOST_EXPORT, the size in debug mode came down to 23MB.
c) I'm doubtful that BOOST_EXPORT has this large an impact. How large was the executable before you exported the derived classes?
I haven't built the executable with serialization yet. I will let you know within a day or two.
Please let me know if you need any other details. Thanks, Gokul.

My tests don't show the serialization consuming a disproportionate amount of memory. But I haven't made an application which serializes 200 classes either.
One interesting idea is to have the linker generate a *.map file (in release mode) to show the size of the modules being created.
On a project this size, I would probably create it as a group of libraries just to make build and code management easier. If I build those libraries as DLLs, then I can really see where the code is going. However, this requires a little more care in organizing code to be sure that only code actually used is loaded. This is sort of a pain - even more so with the serialization library - but it makes a much better final product.
Combining this with the polymorphic archive should result in (almost) no code duplication. However, doing this requires special care in separating declarations from definitions.
Robert Ramey
Gokulakannan Somasundaram wrote:
b) 37 MB suggests that you're including the debug symbols. Assuming this is correct, how large is it when compiling for release. Does the size of the debug version really matter?
Yes, it is in debug mode, and the size in debug mode doesn't matter. In release mode, it came down to 21MB. When I removed the BOOST_EXPORT, the size in debug mode came down to 23MB.
And what does the release mode version come down to?
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

In Release mode, without the BOOST_EXPORT it's around 14MB. We have decided
to keep all of them in one DLL. Similarly, we have other DLLs (on Windows, and
.so files on Linux) for other areas of the product. Could you please explain more
about separating declarations from definitions? We have written the class
declarations in .h files, the functions to be inlined in .inl files, and the other
functions in .cpp files.
So the code organization is definitely very clean. Hence I decided to put
all the serialization code into a separate .cpp file. Is this sufficient to
use polymorphic_text_archive? The serialization is on a
non-critical path, so the cost of virtual function calls is fine for us.
I will try out the .map file idea which you suggested. Thanks, and please
let me know if there are any other suggestions.
Thanks,
Gokul.
On Wed, Jan 6, 2010 at 1:51 AM, Robert Ramey wrote:
My tests don't show the serialization consuming a disproportionate amount of memory.
But I haven't made an application which serializes 200 classes either.
One interesting idea is to have the linker generate a *.map file (in release mode) to show the size of modules being created.
On a project this size, I would probably create it as a group of libraries just to make build and code management easier. If I build those libraries as DLLs, then I can really see where the code is going. However, this requires a little more care in organizing code to be sure that only code actually used is loaded. This is sort of a pain - even more so with the serialization library - but it makes a much better final product.
Combining this with the polymorphic archive should result in (almost) no code duplication. However, doing this requires special care in separating declarations from definitions.
Robert Ramey

Gokulakannan Somasundaram wrote:
In Release mode, without the BOOST_EXPORT it's around 14MB. We have decided to keep all of them in one DLL. Similarly we have other DLLs (in windows and .so for linux) for other areas of the product. Can you please brief me more on separating declarations from definitions. We have written the class declarations in .h and the functions to be inlined in .inl and the other functions in .cpp.
Sounds like you're using the right approach. Basically it boils down to replacing inline definitions in header files with just declarations and implementing the definitions in cpp and ?ipp files.
Your raw numbers are what bother me. I'm reading, for compilations in release mode:
23 MB for serialization using export
14 MB for serialization not using export
200 classes
I'm assuming the above MB are just the parts related to serialization and don't include the whole app. On a per-class basis, this boils down to
115 KB for serialization using export
70 KB for serialization not using export
which seems waaaaaaaayy out of line to me. The tests in the library don't consume anywhere near that. You should look more carefully at this, perhaps making some smaller tests to see where all this "extra" code is coming from. It might well be that your DLLs are generating code for ALL the archives that the library supports.
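The per-class estimate above is a plain division, sketched here with the sizes reported in this thread. The helper name `kb_per_class` is made up for the example, and it assumes 1 MB = 1000 KB, as the round figures above do. Note that, going by the earlier message, 23 MB was the debug size without export and the release size with export was 21 MB, which works out to 105 KB per class; that is still in the same range.

```cpp
#include <cassert>

// Object-file size per class in KB, assuming 1 MB = 1000 KB.
// total_mb and class_count are the figures quoted in the thread.
inline int kb_per_class(int total_mb, int class_count) {
    assert(class_count > 0);
    return total_mb * 1000 / class_count;
}

// kb_per_class(23, 200)  -> the "using export" figure quoted above
// kb_per_class(14, 200)  -> release build without export
// kb_per_class(21, 200)  -> release build with export, as first reported
```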
So the code organization is definitely very clean. Hence i decided to put all the serialization code into a separate .cpp file. Is this sufficient to use polymorphic_text_archive? The serialization actually comes in the non-critical path and hence virtual function call is fine for us.
Though it's described in the documentation, here's the short version.
a) Your serialization *.cpp files include ONLY polymorphic_?archive.hpp files. Code is generated against ONLY that interface.
b) Your mainline creates an instance of polymorphic_xml_?archive (or text or whatever). This is cast to one of its base classes - polymorphic_?archive.
c) Serialization occurs as normal.
d) Much memory is saved as the serialization code is only instantiated for polymorphic_?archive.
e) Serialization takes measurably more time. Maybe on the order of 2x? This could be improved upon by making some implementation changes but it's not a current priority.
Look into the demos and tests to see examples of how this is done.
Robert Ramey
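The shape of steps a) to d) can be sketched without Boost at all. The following is a simplified stand-in, not the library's code: `OutArchive`, `TextOutArchive`, `Employee`, `serialize` and `to_text` are hypothetical names, and the text format is invented for the example. What it tries to show is that the serialization function is compiled once against an abstract interface, and only the mainline knows which concrete archive is in use, at the price of a virtual call per saved value.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// --- would live in a header: the abstract archive interface ---
// Hypothetical stand-in for an interface such as polymorphic_oarchive.
class OutArchive {
public:
    virtual ~OutArchive() = default;
    virtual void save(int value) = 0;
    virtual void save(const std::string& value) = 0;
};

// --- would live in a serialization .cpp: sees ONLY the interface ---
struct Employee {
    std::string name;
    int id;
};

// Compiled once against OutArchive, however many concrete archive
// formats exist elsewhere in the program.
void serialize(OutArchive& ar, const Employee& e) {
    ar.save(e.name);
    ar.save(e.id);
}

// --- would live in the mainline: the one concrete archive ---
class TextOutArchive : public OutArchive {
public:
    explicit TextOutArchive(std::ostream& os) : os_(os) {}
    void save(int value) override { os_ << value << ' '; }
    void save(const std::string& value) override {
        // Length prefix so that strings containing spaces stay readable.
        os_ << value.size() << ' ' << value << ' ';
    }
private:
    std::ostream& os_;
};

// Creates the concrete archive and hands it on through its base class.
std::string to_text(const Employee& e) {
    std::ostringstream os;
    TextOutArchive text(os);
    OutArchive& ar = text;  // step b): use it through the base interface
    serialize(ar, e);       // step c): serialization proceeds as normal
    assert(os.good());      // the in-memory stream is not expected to fail
    return os.str();
}
```

Adding a second format in this sketch would mean one more class derived from `OutArchive`; `serialize` would not need to be recompiled or instantiated again, which is the saving described in d).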
Participants (2):
- Gokulakannan Somasundaram
- Robert Ramey