[serialization] de-serialization performance issue

I have encountered a performance issue with serialization, when used with multi-index followed by a vector<size_t>. I have a reduced program below, which reproduces the issue (using boost 1.44.0).

I basically have a class with a multi-index member, and a vector<size_t> member. It serializes OK, but deserialization's performance appears to be O(n^2). It seems to be getting bogged down in basic_iarchive_impl::reset_object_address(), when deserializing the vector. Any thoughts?

Thanks,
Brad

-------------------------------------------
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/key_extractors.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>

#include <boost/serialization/access.hpp>
#include <boost/serialization/assume_abstract.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/version.hpp>

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

struct DummyItem
{
    std::size_t ordered_val_;

    DummyItem(const std::size_t ordered_val) : ordered_val_(ordered_val) {}
    DummyItem() : ordered_val_(0) {}

    /**
     * Boost serialization hook
     */
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & boost::serialization::make_nvp("ordered_val", ordered_val_);
    }
};
BOOST_CLASS_VERSION(DummyItem,1)

namespace bmi = boost::multi_index;

class MultiIndexDummy
{
public:
    struct ordered_val_tag {};

    typedef bmi::multi_index_container<
        DummyItem,
        bmi::indexed_by<
            bmi::ordered_non_unique<
                bmi::tag<ordered_val_tag>,
                BOOST_MULTI_INDEX_MEMBER(DummyItem,std::size_t,ordered_val_)
            >
        >
    > MultiIndexDb;

    typedef MultiIndexDb::index<ordered_val_tag>::type OrderedVal2Index;

    bool insert(DummyItem &item)
    {
        bool rc = false;
        OrderedVal2Index &str_idx = db_.get<ordered_val_tag>();
        OrderedVal2Index::iterator str_iter = str_idx.find(item.ordered_val_);
        if (str_idx.end() == str_iter) {
            std::pair<OrderedVal2Index::iterator,bool> insert_res = str_idx.insert(item);
            if (insert_res.second) {
                rc = true;
            }
        }
        return rc;
    }

    MultiIndexDb db_;
    std::vector<std::size_t> vector_ints_;

    /**
     * Boost serialization hook
     */
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        std::cout << "SERIALIZE db_\n";
        ar & boost::serialization::make_nvp("db", db_);
        std::cout << "SERIALIZE vector_ints_\n";
        ar & boost::serialization::make_nvp("vector_ints", vector_ints_); // <- THIS IS SLOW DURING INPUT
    }
};
BOOST_CLASS_VERSION(MultiIndexDummy, 1);

int main(int argc, char* argv[])
{
    const std::string filename("persisted_data.dat");

    std::cout << "Start\n";
    {
        MultiIndexDummy mi_map;
        for (int i=0; i<100000; i++) {
            DummyItem item(i);
            mi_map.insert(item);
            mi_map.vector_ints_.push_back(i);
        }

        std::cout << "persist the data\n";
        std::ofstream out_file(filename.c_str());
        boost::archive::text_oarchive oa(out_file);
        oa.register_type(static_cast<MultiIndexDummy*>(NULL));
        oa << boost::serialization::make_nvp("id", mi_map);
    }

    std::cout << "Restore\n";
    {
        MultiIndexDummy mi_map;
        std::ifstream in_file(filename.c_str());
        boost::archive::text_iarchive ia(in_file);
        ia >> mi_map; // <- THIS IS SLOW
    }

    std::cout << "Done\n";
    return 0;
}
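One way to check the "appears to be O(n^2)" observation is to time the restore step at two input sizes, n and 2n, and estimate the growth exponent from the ratio. The helper below is only a sketch: the name growth_exponent and the idea of doubling the loop count in the test program are assumptions for illustration, not part of the original report.

```cpp
#include <cassert>
#include <cmath>

// Estimate k in "time grows like n^k" from two timings taken at input
// sizes n and 2n. A result near 1 suggests linear growth, near 2 quadratic.
// Hypothetical helper; the timings themselves must be measured separately.
double growth_exponent(double seconds_at_n, double seconds_at_2n)
{
    assert(seconds_at_n > 0.0 && seconds_at_2n > 0.0);
    // log base 2 of the ratio, written without std::log2 so it also
    // compiles with pre-C++11 compilers.
    return std::log(seconds_at_2n / seconds_at_n) / std::log(2.0);
}
```

For example, if restoring 50000 items took 1 second and restoring 100000 took 4 seconds, the estimate would be 2, which is consistent with quadratic behaviour. Single measurements are noisy, so several runs per size would give a more trustworthy estimate.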

Brad Higgins wrote:
I have encountered a performance issue with serialization, when used with multi-index followed by a vector<size_t>. I have a reduced program below, which reproduces the issue (using boost 1.44.0).
I basically have a class with a multi-index member, and a vector<size_t> member. It serializes OK, but deserialization's performance appears to be O(n^2). It seems to be getting bogged down in basic_iarchive_impl::reset_object_address(), when deserializing the vector. Any thoughts?
    /**
     * Boost serialization hook
     */
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        std::cout << "SERIALIZE db_\n";
        ar & boost::serialization::make_nvp("db", db_);
        std::cout << "SERIALIZE vector_ints_\n";
        // ar & boost::serialization::make_nvp("vector_ints",
        //                                     vector_ints_); // <- THIS IS SLOW DURING INPUT
        FOREACH(size_t x, vector_ints_) ar & x;
    }
};
What happens if you make the change above? That is, if you substitute your own loop for the boost::serialization implementation?
Robert Ramey
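A hand-written loop along the lines suggested above might look like the following sketch. It writes the element count first and then each element, and on input sizes the vector once and reads every element in place. The names save_ints, load_ints, round_trip and ToyArchive are hypothetical; ToyArchive is a stand-in used only so the sketch runs without Boost. The two function templates rely on nothing but operator<< and operator>>, so they would presumably accept Boost's archive classes as well, but that is an assumption not tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// Hypothetical stand-in for an archive, NOT a Boost class: values are
// stored as whitespace-separated text in a string stream.
struct ToyArchive
{
    std::stringstream stream;

    template<class T>
    ToyArchive & operator<<(const T & value)
    {
        stream << value << ' ';
        return *this;
    }

    template<class T>
    ToyArchive & operator>>(T & value)
    {
        stream >> value;
        return *this;
    }
};

// Write the element count, then each element one at a time.
template<class Archive>
void save_ints(Archive & ar, const std::vector<std::size_t> & values)
{
    const std::size_t count = values.size();
    ar << count;
    for (std::size_t i = 0; i < count; ++i) {
        ar << values[i];
    }
}

// Read the count, size the vector once, then read each element in place.
template<class Archive>
void load_ints(Archive & ar, std::vector<std::size_t> & values)
{
    std::size_t count = 0;
    ar >> count;
    values.resize(count);
    for (std::size_t i = 0; i < count; ++i) {
        ar >> values[i];
    }
}

// Save a vector into a ToyArchive and load it back from the same archive.
std::vector<std::size_t> round_trip(const std::vector<std::size_t> & input)
{
    ToyArchive ar;
    save_ints(ar, input);
    std::vector<std::size_t> output;
    load_ints(ar, output);
    assert(output.size() == input.size());
    return output;
}
```

Inside a Boost-serialized class, the same two loops would normally live in separate save and load member functions rather than in a single serialize function, because the input side has to learn the element count before it can loop.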

If I roll my own, I would not call reset_object_address(), and it would work much faster. The issue is that, for each element in the vector, the default deserialization code is calling reset_object_address(). Due to the deserialization of the large multi-index, reset_object_address() iterates through a list that is very large.

I'll spend more time reading the code, but is there a way to avoid the big hit in reset_object_address(), or avoid the call to it altogether?

Thanks,
Brad

On Jan 20, 2011, at 1:34 AM, Robert Ramey wrote:
Brad Higgins wrote:
I have encountered a performance issue with serialization, when used with multi-index followed by a vector<size_t>. I have a reduced program below, which reproduces the issue (using boost 1.44.0).
I basically have a class with a multi-index member, and a vector<size_t> member. It serializes OK, but deserialization's performance appears to be O(n^2). It seems to be getting bogged down in basic_iarchive_impl::reset_object_address(), when deserializing the vector. Any thoughts?
    /**
     * Boost serialization hook
     */
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        std::cout << "SERIALIZE db_\n";
        ar & boost::serialization::make_nvp("db", db_);
        std::cout << "SERIALIZE vector_ints_\n";
        // ar & boost::serialization::make_nvp("vector_ints",
        //                                     vector_ints_); // <- THIS IS SLOW DURING INPUT
        FOREACH(size_t x, vector_ints_) ar & x;
    }
};
What happens if you make the change above? That is, if you substitute your own loop for the boost::serialization implementation?
Robert Ramey
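Brad's explanation, that every vector element triggers a walk over a list made large by the multi-index load, can be illustrated with a toy cost model. This is a sketch only and does not reproduce Boost's code: the function scan_steps and the assumption that each element load visits the whole list exactly once are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, NOT Boost's implementation. 'tracked' stands for the list
// that is scanned on every call; its size is assumed to come from the
// earlier multi-index load. Returns the total number of list entries
// visited while loading 'vector_elements' elements, assuming one full
// scan per element.
std::size_t scan_steps(std::size_t tracked_entries, std::size_t vector_elements)
{
    std::vector<int> tracked(tracked_entries, 0);
    std::size_t steps = 0;
    for (std::size_t i = 0; i < vector_elements; ++i) {
        // One full pass over the list for each element loaded.
        for (std::size_t j = 0; j < tracked.size(); ++j) {
            ++steps;
        }
    }
    assert(steps == tracked_entries * vector_elements);
    return steps;
}
```

Under this model the work is the product of the two sizes, so doubling both quadruples it. With both at 100000, as in the test program, the model predicts 10^10 visits, which would be consistent with the quadratic slowdown reported at the top of the thread.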