boost::serialization compilation times

We are a team that develops a C/C++/Fortran/Binary compiler project called ROSE, and we want to serialize its Abstract Syntax Tree (AST). The AST has 436 different custom class types and uses maps, sets, vectors, lists and hash maps. These classes have about 1229 different variables. For convenience we wanted to serialize the AST using boost::serialization, but we have run into some compiler performance problems. Currently, with GCC 4.1.2:
- it takes 48 minutes to compile and link a program that takes seconds to compile without serialization
- the resulting binary grows from <1 MB to 122 MB
- the compilation uses 2 GB of RAM
- the object file contains 98075 symbols with serialization and 41 without

If the templates for loading the AST are not instantiated, the templates for saving the AST alone take:
- 19 minutes to compile and link
- the resulting binary grows from <1 MB to 77 MB
- the compilation uses 1.1 GB of RAM

We have our own custom serialization mechanism that is not that easy to support, but it does not visibly increa

My machine is a 2.66 GHz quad-core Xeon 5355 with 16 GB RAM and 2 TB of striped storage.

Is there any trick I can use to reduce the compilation time? With a 48-minute compilation, boost::serialization is unfortunately not an option for us, although it is a great tool at runtime.

thanks, Andreas

On Thu, Jul 10, 2008 at 1:06 AM, Andreas Sæbjørnsen
Hi Andreas,

You might want to move all serialization definitions out of the headers and explicitly instantiate the functions for the archives you use. This saves a lot of instantiations when recompiling, as long as you don't have to recompile almost everything all the time.
-- Felipe Magno de Almeida

Here is what I recommend:
a) Try a release build. This should drop the binary down to a reasonable size. If it doesn't, that would be interesting to know.
b) Don't use inline code for serialization definitions.
c) Separate the serialization code into separate modules so they only need to be compiled when the class definition changes.
d) Consider going as far as one module per class.
e) Consider making a library of all the modules containing your serialization code. That way, even if you make a new application, the serialization code won't have to be recompiled.
f) Consider using a polymorphic archive (in a library as above). That way, one compilation will serve ALL archive classes.
g) Review demo_pimpl.cpp for an example of how to do this.
The result of the above will be that only the modules whose header has changed need to be recompiled. That is, each compilation will take just as long, but it should be necessary far less frequently. This approach is useful for all large applications.
You might not like this idea, but you might want to consider using MS VC as the compiler. It's much faster than GCC, and making your code conformant with both takes very little effort.
Robert Ramey
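Suggestion (f) trades template instantiation for virtual dispatch. Boost.Serialization ships `boost::archive::polymorphic_oarchive` and `polymorphic_iarchive` for this purpose; the sketch below does not use them, but hand-rolls the same idea with hypothetical classes (`PolyOArchive`, `TextArchive`, `XmlLikeArchive`, `Node`) so it compiles with only the standard library. Because `Node::serialize` takes the abstract interface rather than a template parameter, it is compiled once and works with every archive derived from that interface.

```cpp
// Sketch only: PolyOArchive and its subclasses are hypothetical, not Boost's
// polymorphic_oarchive API. The point is that Node::serialize is not a template.
#include <sstream>
#include <string>

// Abstract archive interface: one virtual save() per primitive type.
class PolyOArchive {
public:
    virtual ~PolyOArchive() = default;
    virtual void save(int value) = 0;
    virtual void save(const std::string& value) = 0;
};

// Concrete formats can live in a library and be added without touching Node.
class TextArchive : public PolyOArchive {
public:
    std::ostringstream out;
    void save(int value) override { out << value << ' '; }
    // Length-prefix strings so they could be read back unambiguously.
    void save(const std::string& value) override {
        out << value.size() << ':' << value << ' ';
    }
};

class XmlLikeArchive : public PolyOArchive {
public:
    std::ostringstream out;
    void save(int value) override { out << "<int>" << value << "</int>"; }
    void save(const std::string& value) override { out << "<str>" << value << "</str>"; }
};

// An ordinary member function: compiled once, usable with every archive
// derived from PolyOArchive.
struct Node {
    int id = 0;
    std::string name;
    void serialize(PolyOArchive& ar) const {
        ar.save(id);
        ar.save(name);
    }
};
```

The cost is one virtual call per saved primitive, which is often small compared with the I/O itself; the gain is that adding an archive format compiles only that archive class, not another copy of every serialize function.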

Thank you very much for the suggestions on how to reduce the need for recompilation. I will definitely look into that. We are currently using ccache to cache compilation results and avoid unnecessary recompilation.
So, as far as I understand, there is no way to improve the time of a first (clean) compilation other than changing the compiler? GCC really has trouble instantiating all the templates behind those 98000 symbols in the object file. Unfortunately, MS VC is not an option for us since most of our developers work on Linux.
It is also a problem for us that the compilation takes 2 GB of RAM:
people are unlikely to be able to compile our tool on an older machine
or laptop, and we have to support those environments as well (but we
could of course make the serialization optional, so it is not a big
problem).
thanks,
Andreas
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

We have about 200 different classes we serialize, and I'm guessing tens of
terabytes of this data on disk. Lots of these classes have been through
many versions and have big nasty serialization methods. We ran into the
recompile-time problem early and have been doing the following with pretty
good results:
/* serialization.h */
#define SERIALIZABLE(T) \
template void T::serialize(boost::archive::portable_binary_oarchive&, unsigned); \
template void T::serialize(boost::archive::portable_binary_iarchive&, unsigned); \
template void T::serialize(boost::archive::xml_oarchive&, unsigned);
/* C.h (for each class C) */
#include <boost/serialization/access.hpp>
class C
{
  friend class boost::serialization::access; // lets Boost call the private serialize()
  template <typename Ar> void serialize(Ar&, unsigned); // declaration only
};
/* C.cpp */
#include
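The message above breaks off after the `#include` line. As a hedged, self-contained sketch of the same pattern (not the poster's actual code), the block below defines the serialize bodies out of line and uses a `SERIALIZABLE` macro to emit the explicit instantiations. `OutArchive` and `InArchive` are hypothetical stand-ins for the real archive types so the example needs only the standard library; `Label` also shows how the `version` argument lets a serialize method keep reading older data. In Boost, the version a class writes is declared with `BOOST_CLASS_VERSION`, and on load the version passed in is the one stored in the archive.

```cpp
// Sketch only: OutArchive/InArchive are hypothetical stand-ins for archive
// types such as portable_binary_oarchive/iarchive or xml_oarchive.
#include <sstream>
#include <string>

// ---- serialization.h (hypothetical toy archives + the macro) ----
struct OutArchive {
    std::ostringstream os;
    // Saving: write each value followed by a space.
    template <typename T> OutArchive& operator&(const T& v) { os << v << ' '; return *this; }
};

struct InArchive {
    std::istringstream is;
    explicit InArchive(const std::string& data) : is(data) {}
    // Loading: read each value back in the same order.
    template <typename T> InArchive& operator&(T& v) { is >> v; return *this; }
};

// Emits one explicit instantiation per archive type the project supports.
#define SERIALIZABLE(T)                                \
    template void T::serialize(OutArchive&, unsigned); \
    template void T::serialize(InArchive&, unsigned);

// ---- Point.h / Label.h: declarations only ----
class Point {
public:
    int x = 0, y = 0;
    template <typename Ar> void serialize(Ar& ar, unsigned version);
};

class Label {
public:
    int id = 0;
    std::string text;  // member added in version 1 of the class
    template <typename Ar> void serialize(Ar& ar, unsigned version);
};

// ---- Point.cpp: definition plus its instantiations ----
template <typename Ar> void Point::serialize(Ar& ar, unsigned /*version*/) {
    ar & x;
    ar & y;
}
SERIALIZABLE(Point)

// ---- Label.cpp: older (version 0) data has no text field ----
template <typename Ar> void Label::serialize(Ar& ar, unsigned version) {
    ar & id;
    if (version >= 1) {
        ar & text;
    }
}
SERIALIZABLE(Label)
```

In a real build each `SERIALIZABLE(...)` line sits in that class's own .cpp file, so only that file is recompiled when the class changes, while other translation units see just the declaration.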

I wanted to give you an update on the effect of the changes you
suggested. I turned off debugging so that no debug symbols are
generated, and that
- reduced the binary to 17 MB when compiling with "-O3"
- reduced the compilation time to 17 minutes
- reduced the number of symbols in the object file to 46707
- binary becomes 29 MB
I also used
BOOST_CLASS_IMPLEMENTATION(pixel,
boost::serialization::object_serializable);
That does not seem to reduce compilation time. Most of the symbols are
weak objects of the form
0000000000000000 V _ZTVN5boost7archive6detail19pointer_iserializerINS0_15binary_iarchiveE12SgUpcThreadsEE
or weak symbols with default values
0000000000000000 W _ZNK5boost7archive6detail11oserializerINS0_15binary_oarchiveE19SgAsmTypeDoubleWordE14is_polymorphicEv
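To see which templates these symbols belong to, the names can be demangled. From the command line, `nm -C` or `c++filt` does this; the sketch below does the same in code with `abi::__cxa_demangle` from `<cxxabi.h>`, which is part of the C++ ABI support shipped with GCC and Clang and is not available with MS VC. The helper name `demangle` is our own.

```cpp
// Demangles one symbol name using the Itanium C++ ABI helper that GCC and
// Clang provide in <cxxabi.h>. Not available with MSVC.
#include <cxxabi.h>
#include <cstdlib>
#include <string>

std::string demangle(const std::string& mangled) {
    int status = 0;
    // Passing a null buffer asks __cxa_demangle to malloc the result.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    // status == 0 means success; otherwise fall back to the original name.
    std::string result = (status == 0 && raw != nullptr) ? std::string(raw) : mangled;
    std::free(raw);  // free(nullptr) is a no-op
    return result;
}
```

Applied to the first symbol above, it should yield the vtable of `boost::archive::detail::pointer_iserializer<boost::archive::binary_iarchive, SgUpcThreads>`, i.e. one serializer instantiation per (archive type, AST class) pair.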
But 17 minutes and 1.2 GB of compiler memory usage is still not great.
Are there any other common techniques for reducing compilation
time with template metaprogramming that you are aware of?
thanks,
Andreas

Andreas Sæbjørnsen wrote:
Is there any trick I can use to reduce the compilation time? When the compilation time is 48 minutes boost::serialization is unfortunately not an option for us although it is a great tool at runtime.
Have you tried precompiled headers? http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
Participants (5): Andreas Sæbjørnsen, Felipe Magno de Almeida, gchen, Robert Ramey, troy d. straszheim