Boost.Serialization [For MPI] Crashes
Hi all,

I have been using the serialization library, since it is the way to go for communicating whole classes in Boost.MPI. Previously I didn't have any problems, but I am now seeing a strange one. I have successfully run all my code on SDSC's Teragrid cluster (a cluster of Itanium processors, running intel-linux), but when I tried to do the same thing on NCSA's cluster (which is similarly composed of Itaniums with the same OS), I can't build the serialization library successfully. It says:

intel-linux.link.dll bin.v2/libs/serialization/build/intel-linux/release/libboost_serialization-il-1_35.so.1.35.0
OBJREAD Error: Could not create mapping for "bin.v2/libs/serialization/build/intel-linux/release/basic_oarchive.o".
icpc: error: problem during multi-file optimization compilation (code 1)

Both clusters use the same kernel (2.4.21), the same compiler suite (icc 9.1), the same architecture, and the same mpicxx (/usr/local/apps/mpich-gm-1.2.6..14b-intel-r2/bin/mpicxx), so that was a little frustrating. However, the build still creates the multithreaded library (libboost_serialization-il-mt-1_35) even after the error, so I gave it a try. Compilation is flawless, but at runtime main() is not even called [it seg faults immediately]. When I debug it, I get the following backtrace:

This GDB was configured as "ia64-suse-linux"...
(gdb) run
Starting program: /home/ac/aydinb/ParSPGEMM/testpar
Program received signal SIGSEGV, Segmentation fault.
0x6000000000011ef0 in typeinfo for boost::serialization::detail::extended_type_info_typeid_0 ()
(gdb) bt
#0 0x6000000000011ef0 in typeinfo for boost::serialization::detail::extended_type_info_typeid_0 ()
#1 0x4000000000103610 in boost::serialization::detail::extended_type_info_typeid_0::less_than(boost::serialization::extended_type_info const&) const (this=0x6000000000018650, rhs=@0x6000000000018910) at libs/serialization/src/extended_type_info_typeid.cpp:22
(gdb)

I said fine, maybe the library was broken due to errors in make / make install. Since we have the same settings, I copied the libraries from SDSC (which work perfectly). Same error, same lines from gdb :(

Any insight into why this might be the case?

Thanks,
-- Aydin
On 10 Feb 2008, at 02:21, Aydin Buluc wrote:
Hi all, I have been using the serialization library since it is the way to go for communicating whole classes in Boost.MPI. Previously, I didn't have any problems.
However, I now experience a strange problem. I have successfully run all my code on SDSC's Teragrid cluster (a cluster of Itanium processors, running intel-linux), but when I tried to do the same thing on NCSA's cluster (which is similarly composed of Itaniums with the same OS), I can't build the serialization library successfully. It says:
intel-linux.link.dll bin.v2/libs/serialization/build/intel-linux/release/libboost_serialization-il-1_35.so.1.35.0
OBJREAD Error: Could not create mapping for "bin.v2/libs/serialization/build/intel-linux/release/basic_oarchive.o".
icpc: error: problem during multi-file optimization compilation (code 1)
Both clusters use the same kernel (2.4.21), the same compiler suite (icc 9.1), the same architecture, and the same mpicxx (/usr/local/apps/mpich-gm-1.2.6..14b-intel-r2/bin/mpicxx), so that was a little frustrating. However, the build still creates the multithreaded library (libboost_serialization-il-mt-1_35) even after the error, so I gave it a try. Compilation is flawless, but at runtime main() is not even called [it seg faults immediately]. When I debug it, I get the following backtrace:
This GDB was configured as "ia64-suse-linux"...
(gdb) run
Starting program: /home/ac/aydinb/ParSPGEMM/testpar
Program received signal SIGSEGV, Segmentation fault.
0x6000000000011ef0 in typeinfo for boost::serialization::detail::extended_type_info_typeid_0 ()
(gdb) bt
#0 0x6000000000011ef0 in typeinfo for boost::serialization::detail::extended_type_info_typeid_0 ()
#1 0x4000000000103610 in boost::serialization::detail::extended_type_info_typeid_0::less_than(boost::serialization::extended_type_info const&) const (this=0x6000000000018650, rhs=@0x6000000000018910) at libs/serialization/src/extended_type_info_typeid.cpp:22
(gdb)
I said fine. Maybe the library was broken due to errors in make / make install. Since we have the same settings, I copied the libraries from SDSC (which works perfectly). Same error, same lines from the gdb :(
Any insight why this might be the case?
Have you checked whether the serialization library works on its own on those machines? Can you try writing the data types you want to send via MPI into a binary archive and see whether a similar problem appears? This looks like a problem with the serialization library.

Matthias
participants (2)
- Aydin Buluc
- Matthias Troyer