[serialization] Runtime overhead of serialization archives

Hi all,

I am currently trying to measure the runtime overhead of Boost.Serialization compared to plain C-style serialization. The goal is to send objects from one process to another process on the same machine. As my state machines are written with boost::msm, I want to deliver types. For the IPC I used interprocess::message_queue.

I wrote a template function for the test and added a trait to plug in the test functions. I tested on the same machine, once with Linux Debian stretch x64 and once with Win 10 x64 / MSVC 2015.

What really astonishes me is that the measured times using boost::serialization are so high compared to the "c-style: id + data" method. In the c-style method I used an FNV hash of the type name as the ID.

All tests were done on an Intel Core i7 2670QM CPU. All results are in seconds. I sent/received 100000 objects over a message_queue.

Boost 1.61.0            Linux x86_64 / gcc 6.1.1    Win 10 x64 / MSVC 2015
Boost XML Send          2.220753                    8.255834
Boost XML Receive       3.208353                    10.14462
Boost Text Send         2.024946                    8.578654
Boost Text Receive      3.207359                    10.704126
Boost Binary Send       2.018026                    8.363865
Boost Binary Receive    3.17984                     11.201501
Cstyle Send             0.13566                     0.056814
Cstyle Receive          0.087906                    0.058706
Char Send               0.071683                    0.013965
Char Receive            0.062119                    0.012631

To measure the real overhead of the message passing, I made a test that just sends 100000 plain chars over the "wire" (the Char rows above).

There are two strange things:

a) Serialization with Boost seems to be about 16 times slower than the plain c-style method, and the receive side seems to be about 30 times slower. I think I am doing something wrong. The tests were compiled in release mode.

b) On the same hardware, the Windows implementation is much slower than the Linux one, by about a factor of 3. But with the c-style method it turns around: Linux is slower than Windows.

Does anyone have an idea what the issue is here?
I added the code at the end of this text.

Best regards
Georg

// -----------------------------------------------------
// Code
// -----------------------------------------------------
#define BOOST_TEST_MODULE first_tests
#include <boost/test/unit_test.hpp>

#include <boost/interprocess/ipc/message_queue.hpp> // send all data through this
#include <boost/timer/timer.hpp>                    // measure the used time
#include <iostream>                                 // for output

// STL Archive + Stuff
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/unique_ptr.hpp>

// include headers that implement a archive in xml format
#include <boost/archive/archive_exception.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>

#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>

#include <memory>
#include <stdint.h>
#include <typeinfo>
#include <vector>

using namespace boost::interprocess;

static const int test_count = 100000; // send this many structs

// the test structure
struct ev_test
{
    long a = 1;
    unsigned long b = 2;
};

// ----------------------------------------------------------------------------
// a packet on the wire
// ----------------------------------------------------------------------------
using packet = std::vector<char>;

//-----------------------------------------------------------------------------
// Type carrier and its support
// ----------------------------------------------------------------------------
namespace
{
class carrier_visitor_base;

class carrier_base // the base in the queue
{
public:
    using ptr = std::unique_ptr<carrier_base>;
    virtual ~carrier_base() {}
    virtual void accept(carrier_visitor_base *p_visitor) = 0;

    template <class Archive>
    void serialize(Archive &ar, const unsigned int version)
    {
    }
};

template <typename T>
class carrier;

class carrier_visitor_base
{
public:
    virtual ~carrier_visitor_base() {}
    virtual void handle(carrier<ev_test> *p_evt) = 0;
    virtual void handle(carrier<char> *p_evt) = 0;
};

template <typename T>
class carrier : public carrier_base // the specific carrier
{
public:
    explicit carrier() : m_data() {}
    explicit carrier(const T &data) : m_data(data) {}
    virtual void accept(carrier_visitor_base *p_visitor) override
    {
        p_visitor->handle(this);
    }
    T &data() { return m_data; }

private:
    T m_data;
};
} // ns anon

// ----------------------------------------------------------------------------
// the traits for the chartest
// ----------------------------------------------------------------------------
struct char_test
{
    static const char *name() { return "char_test 'c': "; }
    static size_t msg_size() { return sizeof(char); }
    static char get_data() { return 'c'; }

    template <typename T>
    static packet to_wire(T data)
    {
        packet ret;
        ret.push_back(get_data());
        return ret;
    }

    static carrier_base::ptr from_wire(const packet &data)
    {
        auto p_data = new carrier<char>();
        p_data->data() = data[0];
        return std::unique_ptr<carrier<char>>(p_data);
    }
};

// ----------------------------------------------------------------------------
// the traits for the boost xml serialization test
// ----------------------------------------------------------------------------

// ----------------------------------------------------------------------------
// external serialization function
// ----------------------------------------------------------------------------
namespace boost
{
namespace serialization
{
// serialization function for ev_test
template <class Archive>
inline void serialize(Archive &ar, ev_test &t, const unsigned int version)
{
    ar &BOOST_SERIALIZATION_NVP(t.a);
    ar &BOOST_SERIALIZATION_NVP(t.b);
}

// serialization function for carrier<T>
template <class Archive, typename T>
void serialize(Archive &ar, carrier<T> &t, const unsigned int version)
{
    ar &boost::serialization::make_nvp(
        "carrier_base", boost::serialization::base_object<carrier_base>(t));
    // BOOST_SERIALIZATION_BASE_OBJECT_NVP(a);
    auto &data = t.data();
    ar &BOOST_SERIALIZATION_NVP(data);
}
}
}

// we must export all carrier
BOOST_SERIALIZATION_SHARED_PTR(carrier<ev_test>)
BOOST_CLASS_EXPORT(carrier<ev_test>)

struct boost_xml_trait
{
    static const char *name() { return "boost_xml_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_text_trait
{
    static const char *name() { return "boost_text_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_binary_trait
{
    static const char *name() { return "boost_binary_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

template <typename archive_trait>
struct boost_test
{
    static const char *name() { return archive_trait::name(); }
    static size_t msg_size() { return 600; }

    // throws boost::archive::archive_exception
    template <typename T>
    static packet to_wire(T data)
    {
        using namespace boost::iostreams;
        using T1 = typename std::remove_cv<T>::type;
        using BT = typename std::remove_reference<T1>::type;

        carrier_base::ptr p_carrier = std::make_unique<carrier<BT>>(data);
        packet p;
        {
            back_insert_device<packet> sink{ p };
            stream<back_insert_device<packet>> os{ sink };
            typename archive_trait::oarchive oa(os);
            oa << BOOST_SERIALIZATION_NVP(p_carrier);
        }
        return p;
    }

    // throws boost::archive::archive_exception
    static carrier_base::ptr from_wire(const packet &data)
    {
        using namespace boost::iostreams;
        carrier_base::ptr p_carrier;
        boost::iostreams::array_source source{ data.data(), data.size() };
        stream<array_source> is{ source };
        typename archive_trait::iarchive ia(is);
        // this takes the most time
        ia >> BOOST_SERIALIZATION_NVP(p_carrier);
        return p_carrier;
    }
};

// ----------------------------------------------------------------------------
// the traits for the cstyle serialization test
// ----------------------------------------------------------------------------

// ----------------------------------------------------------------------------
// cstyle serialization with fnv type identifier
// ----------------------------------------------------------------------------
struct cstyle_test
{
    static const char *name() { return "cstyle_test: "; }
    static size_t msg_size() { return 1000; }

    // Fowler-Noll-Vo hash function
    // https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
    struct fnv
    {
        const static uint64_t base = 0xcbf29ce484222325;
        const static uint64_t prime = 0x00000100000001b3;

        const static uint64_t next(uint64_t current, size_t char_code)
        {
            return (current ^ char_code) * prime;
        }

        const static uint64_t value(const char *cptr, uint64_t interim = base)
        {
            return (*cptr == '\0')
                       ? interim
                       : value(cptr + 1,
                               next(interim, static_cast<size_t>(*cptr)));
        }
    };

    template <typename T>
    static packet to_wire(T data)
    {
        packet buffer(msg_size()); // to be fair, send the same amount of bytes
        const uint64_t id = fnv::value(typeid(T).name());
        size_t pos = 0;
        assert(msg_size() >= sizeof(uint64_t) + sizeof(T));
        // buffer.resize(buffer.size() + sizeof(uint64_t) + sizeof(T));
        std::memcpy(buffer.data() + pos, &id, sizeof(uint64_t));
        pos += sizeof(uint64_t);
        std::memcpy(buffer.data() + pos, &data, sizeof(T));
        pos += sizeof(T);
        return buffer;
    }

    static carrier_base::ptr from_wire(const packet &data)
    {
        // get the id
        size_t pos = 0;
        uint64_t id;
        std::memcpy(&id, data.data() + pos, sizeof(uint64_t));
        pos += sizeof(uint64_t);

        // create the specific pointer
        if (id == fnv::value(typeid(ev_test).name()))
        {
            assert(data.size() >= sizeof(uint64_t) + sizeof(ev_test));
            auto p_evt = new carrier<ev_test>();
            std::memcpy(&p_evt->data(), data.data() + pos, sizeof(ev_test));
            return carrier_base::ptr(p_evt);
        }
        throw std::runtime_error("deserialize error");
    }
};

template <typename trait>
void do_test()
{
    class runtime : public carrier_visitor_base
    {
    public:
        virtual void handle(carrier<ev_test> *p_evt) override
        {
            i += p_evt->data().a;
        }
        virtual void handle(carrier<char> *p_evt) override
        {
            i += p_evt->data();
        }
        int i = 0;
    };

    try
    {
        std::cout << "Sending " << trait::name() << test_count << std::endl;

        // Erase previous message queue
        message_queue::remove("message_queue");

        // Create a message_queue.
        message_queue mq(create_only         // only create
                         , "message_queue"   // name
                         , test_count        // max message number
                         , trait::msg_size() // max message size
                         );

        // send ev_tests
        boost::timer::auto_cpu_timer t;
        for (int i = 0; i < test_count; ++i)
        {
            auto buffer = trait::to_wire(ev_test());
            mq.send(buffer.data(), buffer.size(), 0);
        }
    }
    catch (const interprocess_exception &ex)
    {
        message_queue::remove("message_queue");
        std::cout << ex.what() << std::endl;
        BOOST_CHECK(true == false);
    }

    runtime rt;
    try
    {
        std::cout << "Receiving " << trait::name() << test_count << std::endl;

        // Open a message queue.
        message_queue mq(open_only         // only create
                         , "message_queue" // name
                         );

        message_queue::size_type recvd_size;
        packet buffer(trait::msg_size());
        unsigned int priority;

        boost::timer::auto_cpu_timer t;
        for (int i = 0; i < test_count; ++i)
        {
            // receive the raw data
            mq.receive(buffer.data(), buffer.size(), recvd_size, priority);

            // deserialize
            auto p_carrier = trait::from_wire(buffer);

            // handle it to the protocol
            p_carrier->accept(&rt);
        }
    }
    catch (const interprocess_exception &ex)
    {
        message_queue::remove("message_queue");
        std::cout << ex.what() << std::endl;
        BOOST_CHECK(true == false);
    }
    message_queue::remove("message_queue");
    std::cout << "RT Counter: " << rt.i << std::endl << std::endl;
}

BOOST_AUTO_TEST_CASE(carrier_boost_xml_ev_test)
{
    do_test<boost_test<boost_xml_trait>>();
}

BOOST_AUTO_TEST_CASE(carrier_boost_text_ev_test)
{
    do_test<boost_test<boost_text_trait>>();
}

BOOST_AUTO_TEST_CASE(carrier_boost_binary_ev_test)
{
    do_test<boost_test<boost_binary_trait>>();
}

BOOST_AUTO_TEST_CASE(carrier_cstyle_ev_test)
{
    do_test<cstyle_test>();
}

BOOST_AUTO_TEST_CASE(carrier_char_test)
{
    do_test<char_test>();
}

On 21.09.2016 at 10:14, georg@schorsch-tech.de wrote:
struct boost_xml_trait { static const char *name() { return "boost_xml_test: ev_test: "; }
typedef boost::archive::xml_oarchive oarchive; typedef boost::archive::xml_iarchive iarchive; };
struct boost_text_trait { static const char *name() { return "boost_text_test: ev_test: "; }
typedef boost::archive::xml_oarchive oarchive; typedef boost::archive::xml_iarchive iarchive; };
struct boost_binary_trait { static const char *name() { return "boost_binary_test: ev_test: "; }
typedef boost::archive::xml_oarchive oarchive; typedef boost::archive::xml_iarchive iarchive; };
I already figured out that I had accidentally used XML archives in all three tests. Now I can see bigger differences, but it is still much slower than cstyle. Here are the results on Linux x86_64, gcc 6.1.1:

Running 5 test cases...
Sending boost_xml_test: ev_test: 100000
 1.979675s wall, 1.950000s user + 0.030000s system = 1.980000s CPU (100.0%)
Receiving boost_xml_test: ev_test: 100000
 3.253286s wall, 3.250000s user + 0.010000s system = 3.260000s CPU (100.2%)
RT Counter: 100000

Sending boost_text_test: ev_test: 100000
 1.667762s wall, 1.600000s user + 0.060000s system = 1.660000s CPU (99.5%)
Receiving boost_text_test: ev_test: 100000
 1.477573s wall, 1.480000s user + 0.000000s system = 1.480000s CPU (100.2%)
RT Counter: 100000

Sending boost_binary_test: ev_test: 100000
 1.303905s wall, 1.240000s user + 0.070000s system = 1.310000s CPU (100.5%)
Receiving boost_binary_test: ev_test: 100000
 1.132586s wall, 1.130000s user + 0.000000s system = 1.130000s CPU (99.8%)
RT Counter: 100000

Sending cstyle_test: 100000
 0.119564s wall, 0.070000s user + 0.050000s system = 0.120000s CPU (100.4%)
Receiving cstyle_test: 100000
 0.081580s wall, 0.080000s user + 0.000000s system = 0.080000s CPU (98.1%)
RT Counter: 100000

Sending char_test 'c': 100000
 0.125667s wall, 0.090000s user + 0.040000s system = 0.130000s CPU (103.4%)
Receiving char_test 'c': 100000
 0.086719s wall, 0.080000s user + 0.010000s system = 0.090000s CPU (103.8%)
RT Counter: 9900000

*** No errors detected

--
pgp key: 0x702C5BFC
Fingerprint: 267F DC06 7F96 3375 969A 9EE6 8E37 7CF4 702C 5BFC

Hi,
I already figured out that I had accidentally used XML archives in all three tests. Now I can see bigger differences, but it is still much slower than cstyle.
Also try not to create a stream and an archive in every iteration, but reuse them instead. The streams especially can be quite expensive to create.
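The stream half of that suggestion can be sketched with the standard library alone. This is only an illustration under assumptions: `format_record` and `format_many` are hypothetical names, a `std::ostringstream` stands in for the boost::iostreams stream used in the test, and whether the archive object itself can be reused between messages is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: formats one record into a caller-owned stream, so the
// stream (and whatever setup its constructor does) is paid for only once.
std::string format_record(std::ostringstream &os, long a, unsigned long b)
{
    os.str(std::string()); // drop the contents of the previous record
    os.clear();            // reset any error flags left by earlier use
    os << a << ' ' << b;
    return os.str();
}

// Formats `count` records and returns the total number of bytes produced.
std::size_t format_many(int count)
{
    assert(count >= 0);
    std::ostringstream os; // constructed once, outside the loop
    std::size_t total = 0;
    for (int i = 0; i < count; ++i)
        total += format_record(os, i, 2UL).size();
    return total;
}
```

The per-iteration variant would declare the `std::ostringstream` inside the loop body instead; comparing the two in the same benchmark harness shows how much of the cost is stream construction.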

On 21.09.2016 at 19:36, Bjorn Reese wrote:
On 09/21/2016 06:35 PM, Georg Gast wrote:
Now I can see bigger differences, but it is still much slower than cstyle.
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)

After watching "CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!" [1] I used the google/benchmark [2] library on Linux to measure more precisely.

[1] https://www.youtube.com/watch?v=nXaxk27zwlk
[2] https://github.com/google/benchmark

This seems to be a much better way to judge the performance. I let each test run for at least 10 seconds.

-----------------------------------------------------
Linux x64 gcc 6.1.1

Benchmark           Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml            16073      16072     872818
to_wire_text           14413      14409     997151
to_wire_binary         10384      10520    1268116
to_wire_cstyle           218        218   63405797
from_wire_xml          32202      32209     434783
from_wire_text         13322      13320    1023392
from_wire_binary        9906       9906    1402806
from_wire_cstyle         210        210   66666667
-----------------------------------------------------

-----------------------------------------------------
Win 10 x64 MSVC 2015

Benchmark           Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml            84145      84027     173308
to_wire_text           54691      54751     250279
to_wire_binary         44086      44028     315493
to_wire_cstyle           110        110  126197183
from_wire_xml          97023      96801     143820
from_wire_text         51315      51250     273171
from_wire_binary       43359      43408     320000
from_wire_cstyle         103        103  135757576
-----------------------------------------------------

My opinion: gcc is better at optimizing. This must be the reason why Windows is slower with the archives.

-----------------------------------------------------
The code
-----------------------------------------------------
static void to_wire_xml(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(ev_test());
    }
}
BENCHMARK(to_wire_xml);

static void to_wire_text(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        boost_test<boost_text_trait>::to_wire(ev_test());
    }
}
BENCHMARK(to_wire_text);

static void to_wire_binary(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        boost_test<boost_binary_trait>::to_wire(ev_test());
    }
}
BENCHMARK(to_wire_binary);

static void to_wire_cstyle(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        cstyle_test::to_wire(ev_test());
    }
}
BENCHMARK(to_wire_cstyle);

static void from_wire_xml(benchmark::State& state)
{
    auto buffer = boost_test<boost_xml_trait>::to_wire(ev_test());
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::from_wire(buffer);
    }
}
BENCHMARK(from_wire_xml);

static void from_wire_text(benchmark::State& state)
{
    auto buffer = boost_test<boost_text_trait>::to_wire(ev_test());
    while (state.KeepRunning())
    {
        boost_test<boost_text_trait>::from_wire(buffer);
    }
}
BENCHMARK(from_wire_text);

static void from_wire_binary(benchmark::State& state)
{
    auto buffer = boost_test<boost_binary_trait>::to_wire(ev_test());
    while (state.KeepRunning())
    {
        boost_test<boost_binary_trait>::from_wire(buffer);
    }
}
BENCHMARK(from_wire_binary);

static void from_wire_cstyle(benchmark::State& state)
{
    auto buffer = cstyle_test::to_wire(ev_test());
    while (state.KeepRunning())
    {
        cstyle_test::from_wire(buffer);
    }
}
BENCHMARK(from_wire_cstyle);

On 09/21/2016 08:35 PM, Georg Gast wrote:
On 21.09.2016 at 19:36, Bjorn Reese wrote:
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)
I am not sure how to interpret your response. My statement was not a casual observation about your tests, but the main explanation for the difference in performance. That is one of the reasons why my own archives, unlike the ones that are part of Boost.Serialization, are constructed to serialize directly to/from other container types such as arrays, std::string, and std::vector.
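To make the contrast concrete, here is a minimal sketch of what "serializing directly to a container" can look like. It is not Bjorn's archive implementation, which is not shown in this thread; `vector_writer` and `vector_reader` are hypothetical names, and the byte layout is host-specific, so it is only suitable for the same-machine case discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical writer: appends the raw bytes of a value to a vector, with no
// iostream (and therefore no locale or stream-buffer machinery) in between.
class vector_writer
{
public:
    explicit vector_writer(std::vector<char> &out) : m_out(out) {}

    template <typename T>
    vector_writer &operator<<(const T &value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be copied bytewise");
        const char *p = reinterpret_cast<const char *>(&value);
        m_out.insert(m_out.end(), p, p + sizeof(T));
        return *this;
    }

private:
    std::vector<char> &m_out;
};

// Hypothetical reader: copies values back out of the vector in order.
class vector_reader
{
public:
    explicit vector_reader(const std::vector<char> &in) : m_in(in), m_pos(0) {}

    template <typename T>
    vector_reader &operator>>(T &value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be copied bytewise");
        assert(m_pos + sizeof(T) <= m_in.size()); // enough bytes left
        std::memcpy(&value, m_in.data() + m_pos, sizeof(T));
        m_pos += sizeof(T);
        return *this;
    }

private:
    const std::vector<char> &m_in;
    std::size_t m_pos;
};
```

Used as `vector_writer w(buf); w << a << b;`, this keeps the archive-like streaming syntax while each field costs roughly one memcpy-sized copy.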

Hello Bjorn,

I just wanted to make clear that I know that the C-style approach uses memcpy. I would like to use Boost.Serialization for its advantages over the C-style approach. In fact, I set up this test to see what the runtime costs are compared to a C memcpy. I use Boost.Serialization a lot, but so far not on a time-critical path.

In my source I use the boost iostreams array_source/sink for the streams, to serialize into/from a vector of chars (my packet typedef).

Could you please elaborate on what is different in your archive?

Thanks!

On 21 September 2016 at 23:31:09 CEST, Bjorn Reese <breese@mail1.stofanet.dk> wrote:
On 09/21/2016 08:35 PM, Georg Gast wrote:
On 21.09.2016 at 19:36, Bjorn Reese wrote:
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)
I am not sure how to interpret your response. My statement was not a casual observation about your tests, but the main explanation for the difference in performance.
That is one of the reasons why my own archives, unlike the ones that are part of Boost.Serialization, are constructed to serialize directly to/from other container types such as arrays, std::string, and std::vector.

On 09/21/2016 08:35 PM, Georg Gast wrote:
On 21.09.2016 at 19:36, Bjorn Reese wrote:
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)
I am not sure how to interpret your response. My statement was not a casual observation about your tests, but the main explanation for the difference in performance.
That is one of the reasons why my own archives, unlike the ones that are part of Boost.Serialization, are constructed to serialize directly to/from other container types such as arrays, std::string, and std::vector.
I just found out one issue on Windows:

static void to_wire_xml(benchmark::State& state)
{
    //std::locale::global(std::locale("C"));
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(ev_test());
    }
}

If I enable the commented-out line, the cost goes down by half. With the Windows profiler I found out that the construction of the locale takes that much time.

Without the global locale set:

Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         78066      77177       7479
from_wire_xml       95638      95949       7479

With the global locale set:

09/22/16 08:32:49
Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         41399      41302      16619
from_wire_xml       52841      52844      11218

That's amazing! That's the level of the Linux implementation.

One riddle is solved :)
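The finding amounts to installing the "C" locale once, before any stream is constructed. A minimal standard-library sketch of that step follows; the two function names are hypothetical, and the thread does not establish why this helps on MSVC or whether other platforms benefit.

```cpp
#include <cassert>
#include <locale>
#include <sstream>

// Installs the "C" locale as the global locale. Streams constructed
// afterwards copy the global locale, so this is meant to be called once at
// start-up rather than inside the measured loop.
void install_c_locale()
{
    std::locale::global(std::locale("C"));
}

// Returns true if a freshly constructed stream carries the "C" locale.
bool new_stream_uses_c_locale()
{
    std::ostringstream os;
    return os.getloc() == std::locale("C");
}
```

Calling `install_c_locale()` at the top of `main` (or of each benchmark function, as in the snippet above) keeps the setting out of the timed region.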

On 22.09.2016 at 08:35, georg@schorsch-tech.de wrote:
I just found out one issue on Windows:
static void to_wire_xml(benchmark::State& state) { //std::locale::global(std::locale("C"));
while (state.KeepRunning()) { boost_test<boost_xml_trait>::to_wire(ev_test()); } }
If I enable the commented-out line, the cost goes down by half. With the Windows profiler I found out that the construction of the locale takes that much time.
Without the global locale set:
Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         78066      77177       7479
from_wire_xml       95638      95949       7479
With the global locale set:
09/22/16 08:32:49
Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         41399      41302      16619
from_wire_xml       52841      52844      11218
That's amazing! That's the level of the Linux implementation.
One riddle is solved :)
After fiddling around on Linux with clang 3.8 and the gcc optimizer options, I got down to this with gcc and -O3:

Benchmark           Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml            11174      11178     381818
to_wire_text            5148       5149     820313
to_wire_binary          3327       3330    1141304
to_wire_cstyle            63         63   65217391
from_wire_xml          27170      27183     155096
from_wire_text          5371       5370     783582
from_wire_binary        3226       3228    1296296
from_wire_cstyle          45         45   93750000

These results look very nice. Less than 6 µs to serialize/deserialize a structure to a portable text archive seems very nice :)

Now the difference compared to Windows is pretty big again.

After fiddling around on Linux with clang 3.8 and the gcc optimizer options, I got down to this with gcc and -O3:
Benchmark           Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml            11174      11178     381818
to_wire_text            5148       5149     820313
to_wire_binary          3327       3330    1141304
to_wire_cstyle            63         63   65217391
from_wire_xml          27170      27183     155096
from_wire_text          5371       5370     783582
from_wire_binary        3226       3228    1296296
from_wire_cstyle          45         45   93750000
These results look very nice. Less than 6 µs to serialize/deserialize a structure to a portable text archive seems very nice :)
Now the difference compared to Windows is pretty big again.
For what it's worth, in tests I've done in the past, binary serialization using boost.serialization and other similar systems was not this big of a difference compared to memcpy. I was seeing maybe a 5x to 10x difference compared to memcpy (yours is 50x). Of course, this depends on a lot of factors, like how much data this is because it would determine if you are memory bound or not, but I am wondering if your cstyle tests are actually being completely optimized away. Have you examined the disassembly? If you find the code is being optimized away, Google Benchmark has a handy "benchmark::DoNotOptimize" function to help keep the optimizer from throwing away the side effects of a particular address. -- chris
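With Google Benchmark the fix is to pass the result of each iteration to `benchmark::DoNotOptimize`. The sketch below shows the underlying idea using only the standard library, as a stand-in and not as Google's implementation: the result is folded into a `volatile` sink, which the compiler must treat as an observable side effect. `make_packet`, `checksum`, `run_loop` and `g_sink` are hypothetical names, and the checksum adds its own cost to every iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A write to a volatile object is observable behaviour, so the value written
// here, and the work needed to compute it, cannot be discarded.
volatile unsigned g_sink = 0;

// Builds an "id + data" packet in the style of the cstyle test.
std::vector<char> make_packet(std::uint64_t id, long payload)
{
    std::vector<char> buffer(sizeof(id) + sizeof(payload));
    std::memcpy(buffer.data(), &id, sizeof(id));
    std::memcpy(buffer.data() + sizeof(id), &payload, sizeof(payload));
    return buffer;
}

// Sums all bytes so that every byte of the packet feeds into the result.
unsigned checksum(const std::vector<char> &packet)
{
    unsigned sum = 0;
    for (char c : packet)
        sum += static_cast<unsigned char>(c);
    return sum;
}

// Runs the measured body `iterations` times and returns the last checksum.
unsigned run_loop(int iterations)
{
    assert(iterations >= 0);
    unsigned last = 0;
    for (int i = 0; i < iterations; ++i)
    {
        last = checksum(make_packet(42, i));
        g_sink = last; // keeps the packet construction alive
    }
    return last;
}
```

Examining the disassembly, as suggested above, remains the way to confirm what the optimizer actually kept.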

On 22.09.2016 at 22:38, Chris Glover wrote:
For what it's worth, in tests I've done in the past, binary serialization using boost.serialization and other similar systems was not this big of a difference compared to memcpy. I was seeing maybe a 5x to 10x difference compared to memcpy (yours is 50x).
Of course, this depends on a lot of factors, like how much data this is because it would determine if you are memory bound or not, but I am wondering if your cstyle tests are actually being completely optimized away. Have you examined the disassembly? If you find the code is being optimized away, Google Benchmark has a handy "benchmark::DoNotOptimize" function to help keep the optimizer from throwing away the side effects of a particular address.
Thanks for that hint. I changed the test and set up a range for the data size. Once the size exceeded about 2 MB, the cstyle approach was only half as fast as the boost::binary_archive. That surprised me, and I decided not to go down that path (cstyle), because I really like to use the library.

As I set up the ranges for the sizes, I got a lot of results. One thing is notable: the processed bytes/sec. Once the data size grows beyond 512 bytes, the processing speed settles at the maximum rate (for the textual archives). It seems each kind of archive has its own limit. The XML archive, as it is the most verbose, has the lowest speed.

The binary_archive seems to "cheat" on this test: 7 GB/s. I guess the data just lies entirely in the cache.

The tests were done with gcc 6.1.1 / boost 1.61.0 and -O3 optimization.

For documentation purposes, I add my results on Linux and the current code here.

Georg

Benchmark                Time(ns)     CPU(ns) Iterations    Bandwidth
-------------------------------------------------------------------
to_wire_xml/8               15797       15840      31818  493.211kB/s
to_wire_xml/64              37456       37385      18617   1.6326MB/s
to_wire_xml/512            211649      211188       3182  2.31207MB/s
to_wire_xml/4k            1639611     1639344        427  2.38281MB/s
to_wire_xml/32k          13641742    13647059         51  2.28987MB/s
to_wire_xml/256k        106978476   107333333          6  2.32919MB/s
to_wire_xml/2M          870869606   872000000          1  2.29358MB/s
to_wire_xml/4M         1819503270  1816000000          1  2.20264MB/s
from_wire_xml/8             31584       31600      22152  247.232kB/s
from_wire_xml/64            56806       56640      12500  1103.46kB/s
from_wire_xml/512          240413      239021       2778  2.04284MB/s
from_wire_xml/4k          1742682     1739558        407  2.24554MB/s
from_wire_xml/32k        14104072    14122449         49  2.21279MB/s
from_wire_xml/256k      113079335   113142857          7   2.2096MB/s
from_wire_xml/2M        846656504   844000000          1  2.36967MB/s
from_wire_xml/8M       3387609285  3388000000          1  2.36128MB/s
to_wire_text/8               6204        6181     109375  1.23442MB/s
to_wire_text/64              9197        9200      76087  6.63426MB/s
to_wire_text/512            31154       31095      23026  15.7027MB/s
to_wire_text/4k            201879      200892       3365  19.4446MB/s
to_wire_text/32k          1624883     1620609        427  19.2829MB/s
to_wire_text/256k        12647559    12654545         55  19.7557MB/s
to_wire_text/2M         100406115   100000000          7       20MB/s
to_wire_text/4M         216889302   216000000          3  18.5185MB/s
from_wire_text/8             6283        6256     102941  1.21953MB/s
from_wire_text/64            9104        9095      76087  6.71096MB/s
from_wire_text/512          33779       33810      20349  14.4419MB/s
from_wire_text/4k          224963      225219       2966  17.3442MB/s
from_wire_text/32k        1759826     1757895        380  17.7769MB/s
from_wire_text/256k      14159723    14122449         49  17.7023MB/s
from_wire_text/2M       112441804   112666667          6  17.7515MB/s
from_wire_text/4M       224818542   225333333          3  17.7515MB/s
to_wire_binary/8             4257        4256     163551  1.79281MB/s
to_wire_binary/64            4405        4394     162037  13.8904MB/s
to_wire_binary/512           4324        4325     159091  112.909MB/s
to_wire_binary/4k            5180        5200     134615    751.2MB/s
to_wire_binary/32k          11714       11657      58333  2.61791GB/s
to_wire_binary/256k         74599       74693       9211  3.26857GB/s
to_wire_binary/2M         1160753     1159520        583  1.68443GB/s
to_wire_binary/4M         2583586     2578755        273  1.51478GB/s
from_wire_binary/8           3509        3500     201149  2.17989MB/s
from_wire_binary/64          3476        3480     201149  17.5388MB/s
from_wire_binary/512         3601        3598     192308  135.694MB/s
from_wire_binary/4k          3833        3840     182292  1017.25MB/s
from_wire_binary/32k         6697        6683     102941  4.56615GB/s
from_wire_binary/256k       33168       33201      21084  7.35352GB/s
from_wire_binary/2M        268648      268842       2574  7.26495GB/s
from_wire_binary/4M        820816      821128        833  4.75717GB/s

<code>
// STL Archive + Stuff
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/binary_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/split_free.hpp>
#include <boost/serialization/unique_ptr.hpp>

// include headers that implement a archives in xml/text/binary format
#include <boost/archive/archive_exception.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>

// IO stream for the to/from wire functions
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>

#include <memory>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// the step interval for the benchmarks
static const int range_mult = 4;
static const int range_max_step = 20;

// the test structure
struct ev_test
{
    ev_test(size_t s = 0)
    {
        m_data.resize(s);
        for (auto &c : m_data)
            c = 1;
    }
    std::vector<uint8_t> m_data;
};

//-----------------------------------------------------------------------------
// Type carrier and its support
//----------------------------------------------------------------------------
namespace net
{
using packet = std::vector<char>; // a packet on the wire

class carrier_visitor_base;

class carrier_base // the base in the queue
{
public:
    using ptr = std::unique_ptr<carrier_base>;
    virtual ~carrier_base() {}
    virtual void accept(carrier_visitor_base *p_visitor) = 0;
};

template <typename T>
class carrier;

class carrier_visitor_base
{
public:
    virtual ~carrier_visitor_base() {}
    virtual void handle(carrier<ev_test> *p_evt) = 0;
    virtual void handle(carrier<char> *p_evt) = 0;
    virtual void handle(carrier<int> *p_evt) = 0;
};

template <typename T>
class carrier : public carrier_base // the specific carrier
{
public:
    explicit carrier() : m_data() {}
    explicit carrier(const T &data) : m_data(data) {}
    virtual void accept(carrier_visitor_base *p_visitor) override
    {
        p_visitor->handle(this);
    }
    T &data() { return m_data; }

private:
    T m_data;
};
} // ns net

//----------------------------------------------------------------------------
// external serialization function
//----------------------------------------------------------------------------
BOOST_SERIALIZATION_SPLIT_FREE(ev_test)

namespace boost
{
namespace serialization
{
// serialization function for carrier_base
template <class Archive>
void serialize(Archive &ar, net::carrier_base &t, const unsigned int version)
{
}

// serialization function for net::carrier<T>
template <class Archive, typename T>
void serialize(Archive &ar, net::carrier<T> &t, const unsigned int version)
{
    ar &boost::serialization::make_nvp(
        "carrier_base",
        boost::serialization::base_object<net::carrier_base>(t));
    auto &data = t.data();
    ar &BOOST_SERIALIZATION_NVP(data);
}

// serialization function for ev_test
template <class Archive>
inline void save(Archive &ar, const ev_test &t, const unsigned int version)
{
    size_t size = t.m_data.size();
    ar &BOOST_SERIALIZATION_NVP(size);
    ar &boost::serialization::make_nvp(
        "m_data",
        boost::serialization::make_array(t.m_data.data(), t.m_data.size()));
}

template <class Archive>
inline void load(Archive &ar, ev_test &t, const unsigned int version)
{
    size_t size = 0;
    ar &BOOST_SERIALIZATION_NVP(size);
    t.m_data.resize(size);
    ar &boost::serialization::make_nvp(
        "m_data",
        boost::serialization::make_array(t.m_data.data(), t.m_data.size()));
}
}
}

// we must export all carrier
BOOST_SERIALIZATION_SHARED_PTR(net::carrier<ev_test>)
BOOST_CLASS_EXPORT(net::carrier<ev_test>)

//----------------------------------------------------------------------------
// the traits for the boost serialization tests
//----------------------------------------------------------------------------
struct boost_xml_trait
{
    static const char *name() { return "boost_xml_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_text_trait
{
    static const char *name() { return "boost_text_test: ev_test: "; }
    typedef boost::archive::text_oarchive oarchive;
    typedef boost::archive::text_iarchive iarchive;
};

struct boost_binary_trait
{
    static const char *name() { return "boost_binary_test: ev_test: "; }
    typedef boost::archive::binary_oarchive oarchive;
    typedef boost::archive::binary_iarchive iarchive;
};

template <typename archive_trait>
struct boost_test
{
    static const char *name() { return archive_trait::name(); }
    static size_t msg_size() { return 600; }

    // throws boost::archive::archive_exception
    template <typename T>
    static net::packet to_wire(const T &data)
    {
        using namespace boost::iostreams;
        using T1 = typename std::remove_cv<T>::type;
        using BT = typename std::remove_reference<T1>::type;

        net::carrier_base::ptr p_carrier =
            std::make_unique<net::carrier<BT>>(data);
        net::packet p;
        p.reserve(msg_size());
        {
            back_insert_device<net::packet> sink(p);
            stream<back_insert_device<net::packet>> os{sink};
            typename archive_trait::oarchive oa(os);
            oa << BOOST_SERIALIZATION_NVP(p_carrier);
        }
        return p;
    }

    // throws boost::archive::archive_exception
    static net::carrier_base::ptr from_wire(const net::packet &data)
    {
        using namespace boost::iostreams;
        array_source source{data.data(), data.size()};
        stream<array_source> is{source};
        net::carrier_base::ptr p_carrier;
        typename archive_trait::iarchive ia(is);
        // this takes the most time
        ia >> BOOST_SERIALIZATION_NVP(p_carrier);
        return p_carrier;
    }
};

//----------------------------------------------------------------------------
// XML
//----------------------------------------------------------------------------
static void to_wire_xml(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    ev_test data(state.range_x());
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(data);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(to_wire_xml)->Range(8, range_mult << range_max_step);

static void from_wire_xml(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    auto buffer =
        boost_test<boost_xml_trait>::to_wire(ev_test(state.range_x()));
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::from_wire(buffer);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(from_wire_xml)->Range(8, 8 << range_max_step);

//----------------------------------------------------------------------------
// Text //---------------------------------------------------------------------------- static void to_wire_text(benchmark::State &state) { std::locale::global(std::locale("C")); ev_test data(state.range_x()); while (state.KeepRunning()) { boost_test<boost_text_trait>::to_wire(data); } state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * state.range_x()); } BENCHMARK(to_wire_text)->Range(8, range_mult << range_max_step); static void from_wire_text(benchmark::State &state) { std::locale::global(std::locale("C")); auto buffer = boost_test<boost_text_trait>::to_wire(ev_test(state.range_x())); while (state.KeepRunning()) { boost_test<boost_text_trait>::from_wire(buffer); } state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * state.range_x()); } BENCHMARK(from_wire_text)->Range(8, range_mult << range_max_step); //---------------------------------------------------------------------------- // Binary //---------------------------------------------------------------------------- static void to_wire_binary(benchmark::State &state) { std::locale::global(std::locale("C")); ev_test data(state.range_x()); while (state.KeepRunning()) { boost_test<boost_binary_trait>::to_wire(data); } state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * state.range_x()); } BENCHMARK(to_wire_binary)->Range(8, range_mult << range_max_step); static void from_wire_binary(benchmark::State &state) { std::locale::global(std::locale("C")); auto buffer = boost_test<boost_binary_trait>::to_wire(ev_test(state.range_x())); while (state.KeepRunning()) { boost_test<boost_binary_trait>::from_wire(buffer); } state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * state.range_x()); } BENCHMARK(from_wire_binary)->Range(8, range_mult << range_max_step); BENCHMARK_MAIN(); </code> -- pgp key: 0x702C5BFC Fingerprint: 267F DC06 7F96 3375 969A 9EE6 8E37 7CF4 702C 5BFC
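A note on the benchmark sizes: the Range(8, range_mult << range_max_step) calls above register payload sizes from 8 bytes up to 4 MiB, and from_wire_xml uses 8 << range_max_step, i.e. 8 MiB. The sketch below reproduces how those sizes are probably spaced. It is not Google Benchmark code: range_steps is a hypothetical helper, and both the default multiplier of 8 and the "powers of the multiplier, plus both end points" rule are assumptions based on my reading of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Google Benchmark: lists the argument
// values that Range(lo, hi) is assumed to register for multiplier `mult`.
std::vector<std::int64_t> range_steps(std::int64_t lo, std::int64_t hi,
                                      std::int64_t mult = 8)
{
    assert(lo > 0 && hi >= lo && mult > 1);
    std::vector<std::int64_t> steps;
    steps.push_back(lo); // the lower bound always comes first
    // powers of `mult` that lie strictly between lo and hi
    for (std::int64_t i = 1; i < hi; i *= mult)
    {
        if (i > lo)
            steps.push_back(i);
    }
    if (hi != lo)
        steps.push_back(hi); // the upper bound is always included
    return steps;
}
```

Under these assumptions, range_steps(8, 4 << 20) gives 8, 64, 512, 4096, 32768, 262144, 2097152, 4194304, and range_steps(8, 8 << 20) ends in 2097152, 8388608 with no 4 MiB step in between.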

On 22.09.2016 at 22:38, Chris Glover wrote:
For what it's worth, in tests I've done in the past, binary serialization using boost.serialization and other similar systems was not this big of a difference compared to memcpy. I was seeing maybe a 5x to 10x difference compared to memcpy (yours is 50x).
Of course, this depends on a lot of factors, like how much data this is because it would determine if you are memory bound or not, but I am wondering if your cstyle tests are actually being completely optimized away. Have you examined the disassembly? If you find the code is being optimized away, Google Benchmark has a handy "benchmark::DoNotOptimize" function to help keep the optimizer from throwing away the side effects of a particular address.
Hi,

I have prepared a comparison of the same code on Linux and Windows (MSVC). There is a huge gap between Linux and Windows. I am not sure whether I should post the images to this mailing list. AFAIK it is considered bad behaviour to post non-text content to this list, so I added a link instead.

https://www.schorsch-tech.de/doku.php?id=c:boost_serializationwindows

Does anyone have an idea why this is the case? Looking at these diagrams, I can't explain what is happening here. It is the same machine, on bare metal (no VM).

Georg
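For readers who want to reproduce the "c-style: id + data" baseline that the memcpy comparison refers to, here is a minimal sketch. It is not the code from the original test. The names fnv1a, to_wire_cstyle, from_wire_cstyle, ev_small and round_trip_checksum are made up for illustration, and the 32-bit FNV-1a variant is an assumption (the first post only says "FNV"). The checksum at the end is one simple way to make every decoded value contribute to an observable result, which makes it harder for the optimizer to discard the loop.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

using packet = std::vector<char>;

// 32-bit FNV-1a hash of a type name (assumed variant).
inline std::uint32_t fnv1a(const std::string &s)
{
    std::uint32_t h = 2166136261u; // FNV offset basis
    for (unsigned char c : s)
    {
        h ^= c;
        h *= 16777619u; // FNV prime
    }
    return h;
}

// Layout on the wire: [4-byte id][raw bytes of the value].
template <typename T>
packet to_wire_cstyle(std::uint32_t id, const T &value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    packet p(sizeof(id) + sizeof(T));
    std::memcpy(p.data(), &id, sizeof(id));
    std::memcpy(p.data() + sizeof(id), &value, sizeof(T));
    return p;
}

template <typename T>
T from_wire_cstyle(std::uint32_t expected_id, const packet &p)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(p.size() == sizeof(std::uint32_t) + sizeof(T));
    std::uint32_t id = 0;
    std::memcpy(&id, p.data(), sizeof(id));
    assert(id == expected_id); // a real receiver would dispatch on the id
    T value;
    std::memcpy(&value, p.data() + sizeof(id), sizeof(T));
    return value;
}

// Same shape as the ev_test struct from the first post.
struct ev_small
{
    long a = 1;
    unsigned long b = 2;
};

// Round-trips `count` objects and returns a checksum over all decoded
// values, so the work has a visible side effect.
inline unsigned long round_trip_checksum(int count)
{
    const std::uint32_t id = fnv1a("ev_small");
    unsigned long sum = 0;
    for (int i = 0; i < count; ++i)
    {
        ev_small in;
        in.a = i;
        const ev_small out =
            from_wire_cstyle<ev_small>(id, to_wire_cstyle(id, in));
        sum += static_cast<unsigned long>(out.a) + out.b;
    }
    return sum;
}
```

Printing or otherwise consuming the value returned by round_trip_checksum keeps the loop alive in a hand-written timing harness; inside Google Benchmark, benchmark::DoNotOptimize serves the same purpose.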

I guess you are still comparing the release version to the debug version. I've run your code, and this is what I got:

Win7 x64, VS2015 Update3, Release, x64

Run on (8 X 3392 MHz CPU s)
09/25/16 07:36:53
Benchmark                      Time            CPU  Iterations
------------------------------------------------------------
to_wire_xml/8              39564 ns       39980 ns   17949  195.409kB/s
to_wire_xml/64             98915 ns       98035 ns    7479  637.527kB/s
to_wire_xml/512           583376 ns      583961 ns    1122  856.222kB/s
to_wire_xml/4k           4494721 ns     4428415 ns     155  903.258kB/s
to_wire_xml/32k         35621888 ns    35100225 ns      20  911.675kB/s
to_wire_xml/256k       285296564 ns   280801800 ns       2  911.675kB/s
to_wire_xml/2M        2294702350 ns  2293214700 ns       1  893.069kB/s
to_wire_xml/4M        4596100456 ns  4586429400 ns       1  893.069kB/s
from_wire_xml/8            44701 ns       44361 ns   15473  176.11kB/s
from_wire_xml/64           97779 ns       98035 ns    7479  637.527kB/s
from_wire_xml/512         523956 ns      530403 ns    1000  942.679kB/s
from_wire_xml/4k         3919513 ns     3877482 ns     173  1031.6kB/s
from_wire_xml/32k       30990532 ns    31200200 ns      22  1025.63kB/s
from_wire_xml/256k     248254367 ns   249601600 ns       3  1025.63kB/s
from_wire_xml/2M      1990579271 ns  1981212700 ns       1  1033.71kB/s
from_wire_xml/8M      7927240207 ns  7924850800 ns       1  1033.71kB/s
to_wire_text/8             13381 ns       13142 ns   49857  594.483kB/s
to_wire_text/64            31969 ns       31985 ns   22436  1.90827MB/s
to_wire_text/512          180335 ns      179751 ns    4079  2.71643MB/s
to_wire_text/4k          1363654 ns     1375560 ns     499  2.83975MB/s
to_wire_text/32k        10990438 ns    10968820 ns      64  2.84898MB/s
to_wire_text/256k       86883137 ns    88400567 ns       9  2.82804MB/s
to_wire_text/2M        696132001 ns   686404400 ns       1  2.91373MB/s
to_wire_text/4M       1398212634 ns  1388408900 ns       1  2.881MB/s
from_wire_text/8           11158 ns       11195 ns   64102  697.873kB/s
from_wire_text/64          25274 ns       25588 ns   28045  2.38534MB/s
from_wire_text/512        138245 ns      137666 ns    4986  3.54685MB/s
from_wire_text/4k        1047166 ns     1046497 ns     641  3.73269MB/s
from_wire_text/32k       8304279 ns     8320053 ns      90  3.75599MB/s
from_wire_text/256k     66510527 ns    66654973 ns      11  3.75066MB/s
from_wire_text/2M      533393808 ns   530403400 ns       1  3.77071MB/s
from_wire_text/4M     1055956857 ns  1060806800 ns       1  3.77071MB/s
to_wire_binary/8            5444 ns        5460 ns  100000  1.39732MB/s
to_wire_binary/64           5411 ns        5424 ns  112179  11.2538MB/s
to_wire_binary/512          5523 ns        5563 ns  112179  87.7797MB/s
to_wire_binary/4k           5966 ns        5980 ns  112179  653.244MB/s
to_wire_binary/32k         28940 ns       29412 ns   24929  1062.5MB/s
to_wire_binary/256k       251626 ns      250358 ns    2804  998.569MB/s
to_wire_binary/2M        2548630 ns     2540925 ns     264  787.115MB/s
to_wire_binary/4M        6361041 ns     6407184 ns     112  624.299MB/s
from_wire_binary/8          5363 ns        5284 ns  112179  1.44375MB/s
from_wire_binary/64         5371 ns        5460 ns  100000  11.1785MB/s
from_wire_binary/512        5386 ns        5460 ns  100000  89.4282MB/s
from_wire_binary/4k         5483 ns        5424 ns  112179  720.244MB/s
from_wire_binary/32k        7685 ns        7649 ns   89743  3.98998GB/s
from_wire_binary/256k      25332 ns       25588 ns   28045  9.54136GB/s
from_wire_binary/2M       620654 ns      625672 ns    1122  3.12164GB/s
from_wire_binary/4M      1306333 ns     1306960 ns     561  2.98881GB/s

-----Original Message-----
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Georg Gast
Sent: Sunday, September 25, 2016 6:42 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [serialization] Runtime overhead of serialization archives

On 22.09.2016 at 22:38, Chris Glover wrote:
For what it's worth, in tests I've done in the past, binary serialization using boost.serialization and other similar systems was not this big of a difference compared to memcpy. I was seeing maybe a 5x to 10x difference compared to memcpy (yours is 50x).
Of course, this depends on a lot of factors, like how much data this is because it would determine if you are memory bound or not, but I am wondering if your cstyle tests are actually being completely optimized away. Have you examined the disassembly? If you find the code is being optimized away, Google Benchmark has a handy "benchmark::DoNotOptimize" function to help keep the optimizer from throwing away the side effects of a particular address.
Hi,

I have prepared a comparison of the same code on Linux and Windows (MSVC). There is a huge gap between Linux and Windows. I am not sure whether I should post the images to this mailing list. AFAIK it is considered bad behaviour to post non-text content to this list, so I added a link instead.

https://www.schorsch-tech.de/doku.php?id=c:boost_serializationwindows

Does anyone have an idea why this is the case? Looking at these diagrams, I can't explain what is happening here. It is the same machine, on bare metal (no VM).

Georg

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
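A note on reading the last column of the benchmark output in this thread: the figures appear to be the payload size divided by the CPU time per iteration, expressed in binary units (so "MB/s" is really MiB/s). That interpretation is inferred from the numbers themselves and is not stated anywhere in the thread. The helper throughput_mib_per_s below is hypothetical and only redoes that arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes handled per iteration divided by the CPU time
// of one iteration, reported in MiB/s (1 MiB = 1048576 bytes).
inline double throughput_mib_per_s(std::int64_t bytes_per_iter,
                                   double cpu_ns_per_iter)
{
    assert(bytes_per_iter >= 0 && cpu_ns_per_iter > 0.0);
    const double bytes_per_second =
        static_cast<double>(bytes_per_iter) / (cpu_ns_per_iter * 1e-9);
    return bytes_per_second / (1024.0 * 1024.0);
}
```

For example, the to_wire_binary/4k row (4096 bytes, 5980 ns CPU) comes out at roughly 653.2 MiB/s, close to the reported 653.244MB/s. The small difference is expected because the printed nanosecond values are rounded.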

On 25.09.2016 at 07:05, Ernest Zaslavsky wrote:
I guess you are still comparing the release version to the debug version. I've run your code, and this is what I got:
Win7 x64, VS2015 Update3, Release, x64
Dear Ernest,

Thanks for running my code :)

Have you run my code from the post of 23.9.16 19:41 (make_array)? In your result the XML archive settles at about 2.9 MB/sec (from wire) and the text archive at about 3.8 MB/sec (from wire). My guess from these values is that you tested the version with make_array in the serialization functions.

I noticed this DEBUG thing too and fixed it in the meantime. On my desktop workstation, make_binary_object improved the performance a lot.

Here is the current version of the source from my homepage.

Georg

<code>
// STL Archive + Stuff
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/binary_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/split_free.hpp>
#include <boost/serialization/unique_ptr.hpp>

// include headers that implement archives in xml/text/binary format
#include <boost/archive/archive_exception.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>

// IO stream for the to/from wire functions
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>

#include <cstdint>
#include <locale>      // std::locale
#include <memory>
#include <type_traits> // std::remove_cv, std::remove_reference
#include <vector>

#include <benchmark/benchmark.h>

// the step interval for the benchmarks
static const int range_mult = 4;
static const int range_max_step = 20;

// the test structure
struct ev_test
{
    ev_test(size_t s = 0)
    {
        m_data.resize(s);
        for (auto &c : m_data)
            c = 1;
    }
    std::vector<uint8_t> m_data;
};

//-----------------------------------------------------------------------------
// Type carrier and its support
//----------------------------------------------------------------------------
namespace net
{
using packet = std::vector<char>; // a packet on the wire

class carrier_visitor_base;

class carrier_base // the base in the queue
{
public:
    using ptr = std::unique_ptr<carrier_base>;
    virtual ~carrier_base() {}
    virtual void accept(carrier_visitor_base *p_visitor) = 0;
};

template <typename T>
class carrier;

class carrier_visitor_base
{
public:
    virtual ~carrier_visitor_base() {}
    virtual void handle(carrier<ev_test> *p_evt) = 0;
    virtual void handle(carrier<char> *p_evt) = 0;
    virtual void handle(carrier<int> *p_evt) = 0;
};

template <typename T>
class carrier : public carrier_base // the specific carrier
{
public:
    explicit carrier() : m_data() {}
    explicit carrier(const T &data) : m_data(data) {}
    virtual void accept(carrier_visitor_base *p_visitor) override
    {
        p_visitor->handle(this);
    }
    T &data() { return m_data; }

private:
    T m_data;
};
} // ns net

//----------------------------------------------------------------------------
// external serialization function
//----------------------------------------------------------------------------
BOOST_SERIALIZATION_SPLIT_FREE(ev_test)

namespace boost
{
namespace serialization
{
// serialization function for carrier_base
template <class Archive>
void serialize(Archive &ar, net::carrier_base &t, const unsigned int version)
{
}

// serialization function for net::carrier<T>
template <class Archive, typename T>
void serialize(Archive &ar, net::carrier<T> &t, const unsigned int version)
{
    ar &boost::serialization::make_nvp(
        "carrier_base",
        boost::serialization::base_object<net::carrier_base>(t));
    auto &data = t.data();
    ar &BOOST_SERIALIZATION_NVP(data);
}

// serialization function for ev_test
template <class Archive>
inline void save(Archive &ar, const ev_test &t, const unsigned int version)
{
    size_t size = t.m_data.size();
    ar &BOOST_SERIALIZATION_NVP(size);
    ar &boost::serialization::make_nvp(
        "m_data",
        boost::serialization::make_binary_object(
            const_cast<uint8_t *>(t.m_data.data()), t.m_data.size()));
}

template <class Archive>
inline void load(Archive &ar, ev_test &t, const unsigned int version)
{
    size_t size = 0;
    ar &BOOST_SERIALIZATION_NVP(size);
    t.m_data.resize(size);
    ar &boost::serialization::make_nvp(
        "m_data",
        boost::serialization::make_binary_object(t.m_data.data(),
                                                 t.m_data.size()));
}
}
}

// we must export all carrier
BOOST_SERIALIZATION_SHARED_PTR(net::carrier<ev_test>)
BOOST_CLASS_EXPORT(net::carrier<ev_test>)

//----------------------------------------------------------------------------
// the traits for the boost serialization tests
//----------------------------------------------------------------------------
struct boost_xml_trait
{
    static const char *name() { return "boost_xml_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_text_trait
{
    static const char *name() { return "boost_text_test: ev_test: "; }
    typedef boost::archive::text_oarchive oarchive;
    typedef boost::archive::text_iarchive iarchive;
};

struct boost_binary_trait
{
    static const char *name() { return "boost_binary_test: ev_test: "; }
    typedef boost::archive::binary_oarchive oarchive;
    typedef boost::archive::binary_iarchive iarchive;
};

template <typename archive_trait>
struct boost_test
{
    static const char *name() { return archive_trait::name(); }
    static size_t msg_size() { return 600; }

    // throws boost::archive::archive_exception
    template <typename T>
    static net::packet to_wire(const T &data)
    {
        using namespace boost::iostreams;
        using T1 = typename std::remove_cv<T>::type;
        using BT = typename std::remove_reference<T1>::type;

        net::carrier_base::ptr p_carrier =
            std::make_unique<net::carrier<BT>>(data);

        net::packet p;
        p.reserve(msg_size());
        {
            back_insert_device<net::packet> sink(p);
            stream<back_insert_device<net::packet>> os{sink};
            typename archive_trait::oarchive oa(os);
            oa << BOOST_SERIALIZATION_NVP(p_carrier);
        }
        return p;
    }

    // throws boost::archive::archive_exception
    static net::carrier_base::ptr from_wire(const net::packet &data)
    {
        using namespace boost::iostreams;
        array_source source{data.data(), data.size()};
        stream<array_source> is{source};

        net::carrier_base::ptr p_carrier;
        typename archive_trait::iarchive ia(is); // this takes the most time
        ia >> BOOST_SERIALIZATION_NVP(p_carrier);
        return p_carrier;
    }
};

//----------------------------------------------------------------------------
// XML
//----------------------------------------------------------------------------
static void to_wire_xml(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    ev_test data(state.range_x());
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(data);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(to_wire_xml)->Range(8, range_mult << range_max_step);

static void from_wire_xml(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    auto buffer = boost_test<boost_xml_trait>::to_wire(ev_test(state.range_x()));
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::from_wire(buffer);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(from_wire_xml)->Range(8, range_mult << range_max_step);

//----------------------------------------------------------------------------
// Text
//----------------------------------------------------------------------------
static void to_wire_text(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    ev_test data(state.range_x());
    while (state.KeepRunning())
    {
        boost_test<boost_text_trait>::to_wire(data);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(to_wire_text)->Range(8, range_mult << range_max_step);

static void from_wire_text(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    auto buffer = boost_test<boost_text_trait>::to_wire(ev_test(state.range_x()));
    while (state.KeepRunning())
    {
        boost_test<boost_text_trait>::from_wire(buffer);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(from_wire_text)->Range(8, range_mult << range_max_step);

//----------------------------------------------------------------------------
// Binary
//----------------------------------------------------------------------------
static void to_wire_binary(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    ev_test data(state.range_x());
    while (state.KeepRunning())
    {
        boost_test<boost_binary_trait>::to_wire(data);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(to_wire_binary)->Range(8, range_mult << range_max_step);

static void from_wire_binary(benchmark::State &state)
{
    std::locale::global(std::locale("C"));
    auto buffer = boost_test<boost_binary_trait>::to_wire(ev_test(state.range_x()));
    while (state.KeepRunning())
    {
        boost_test<boost_binary_trait>::from_wire(buffer);
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            state.range_x());
}
BENCHMARK(from_wire_binary)->Range(8, range_mult << range_max_step);

BENCHMARK_MAIN();
</code>
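Why make_binary_object can help the text and XML archives so much: as far as I understand it, make_array on a text-based archive writes every element as its own formatted number, while a binary object is written as one base64 blob. The sketch below only imitates the two strategies with the standard library. It is not Boost code, the function names encode_per_element and encode_base64 are made up, and the exact on-wire format of the Boost archives (separators, XML tags, line breaks, padding) is assumed to differ from this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Per-element strategy: every byte is formatted as a decimal token through
// an ostream, followed by a separator.
inline std::string encode_per_element(const std::vector<std::uint8_t> &data)
{
    std::ostringstream os;
    for (std::uint8_t b : data)
        os << static_cast<unsigned>(b) << ' ';
    return os.str();
}

// Bulk strategy: the buffer is treated as one blob and base64-encoded,
// three input bytes at a time, without any stream formatting per element.
inline std::string encode_base64(const std::vector<std::uint8_t> &data)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 2 < data.size(); i += 3)
    {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
    }
    if (i < data.size()) // one or two trailing bytes, padded with '='
    {
        const bool two = (i + 1 < data.size());
        std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        if (two)
            n |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += two ? tbl[(n >> 6) & 63] : '=';
        out += '=';
    }
    assert(out.size() == (data.size() + 2) / 3 * 4);
    return out;
}
```

For the all-ones payload used by ev_test, three bytes become the six characters of three separately formatted tokens in the first case and the four characters of a single base64 group in the second. The saving in per-element stream calls is presumably what matters most for the timings.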

have you run my code from the post at 23.9.16 19:41 (make_array)? Yep
And here are the results for your latest code. It looks like it is doing quite well.

Run on (8 X 3392 MHz CPU s)
09/25/16 11:50:14
Benchmark                      Time            CPU  Iterations
------------------------------------------------------------
to_wire_xml/8              30685 ns       30594 ns   22436  255.361kB/s
to_wire_xml/64             33922 ns       34315 ns   21367  1.77868MB/s
to_wire_xml/512            59209 ns       59797 ns   11218  8.16563MB/s
to_wire_xml/4k            253341 ns      255922 ns    2804  15.2635MB/s
to_wire_xml/32k          1831131 ns     1793594 ns     374  17.4231MB/s
to_wire_xml/256k        14493601 ns    14352092 ns      50  17.4191MB/s
to_wire_xml/2M         116950902 ns   117000750 ns       6  17.0939MB/s
to_wire_xml/4M         237862928 ns   234001500 ns       3  17.0939MB/s
from_wire_xml/8            38244 ns       38242 ns   17949  204.291kB/s
from_wire_xml/64           42733 ns       42831 ns   16026  1.42503MB/s
from_wire_xml/512          80459 ns       81703 ns    8974  5.97628MB/s
from_wire_xml/4k          362877 ns      359818 ns    1951  10.8562MB/s
from_wire_xml/32k        2637410 ns     2600017 ns     264  12.0192MB/s
from_wire_xml/256k      20917089 ns    20962634 ns      32  11.926MB/s
from_wire_xml/2M       166986547 ns   163801050 ns       4  12.2099MB/s
from_wire_xml/4M       334816819 ns   335402150 ns       2  11.926MB/s
to_wire_text/8             12291 ns       12412 ns   64102  629.454kB/s
to_wire_text/64            15434 ns       15645 ns   44872  3.90136MB/s
to_wire_text/512           39425 ns       39773 ns   17258  12.2767MB/s
to_wire_text/4k           231714 ns      233668 ns    2804  16.7171MB/s
to_wire_text/32k         1831865 ns     1835306 ns     408  17.0271MB/s
to_wire_text/256k       14413837 ns    14213424 ns      45  17.589MB/s
to_wire_text/2M        115013254 ns   114400733 ns       6  17.4824MB/s
to_wire_text/4M        235687258 ns   234001500 ns       3  17.0939MB/s
from_wire_text/8           11461 ns       11403 ns   56089  685.104kB/s
from_wire_text/64          16117 ns       15992 ns   44872  3.81654MB/s
from_wire_text/512         51763 ns       53040 ns   10000  9.20585MB/s
from_wire_text/4k         333699 ns      336473 ns    2040  11.6094MB/s
from_wire_text/32k       2607880 ns     2600017 ns     264  12.0192MB/s
from_wire_text/256k     20869215 ns    20948706 ns      35  11.9339MB/s
from_wire_text/2M      167760690 ns   167701075 ns       4  11.926MB/s
from_wire_text/4M      334471631 ns   335402150 ns       2  11.926MB/s
to_wire_binary/8            5625 ns        5616 ns  100000  1.3585MB/s
to_wire_binary/64           5634 ns        5616 ns  100000  10.868MB/s
to_wire_binary/512          5747 ns        5702 ns  112179  85.6388MB/s
to_wire_binary/4k           6130 ns        6119 ns  112179  638.398MB/s
to_wire_binary/32k         12251 ns       12273 ns   57200  2.4866GB/s
to_wire_binary/256k       251085 ns      250358 ns    2804  998.569MB/s
to_wire_binary/2M        2504876 ns     2507159 ns     280  797.716MB/s
to_wire_binary/4M        6222666 ns     6267897 ns     112  638.173MB/s
from_wire_binary/8          5194 ns        5304 ns  100000  1.43841MB/s
from_wire_binary/64         5277 ns        5284 ns  112179  11.55MB/s
from_wire_binary/512        5235 ns        5145 ns  112179  94.897MB/s
from_wire_binary/4k         5354 ns        5424 ns  112179  720.244MB/s
from_wire_binary/32k        7479 ns        7475 ns   89743  4.08277GB/s
from_wire_binary/256k      24681 ns       24475 ns   28045  9.97506GB/s
from_wire_binary/2M       631601 ns      626091 ns     897  3.11955GB/s
from_wire_binary/4M      1309594 ns     1313034 ns     499  2.97498GB/s

-----Original Message-----
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Georg Gast
Sent: Sunday, September 25, 2016 11:38 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [serialization] Runtime overhead of serialization archives

On 25.09.2016 at 07:05, Ernest Zaslavsky wrote:
I guess you are still comparing the release version to the debug version. I've run your code, and this is what I got:
Win7 x64, VS2015 Update3, Release, x64

Dear Ernest,

Thanks for running my code :)

Have you run my code from the post of 23.9.16 19:41 (make_array)? In your result the XML archive settles at about 2.9 MB/sec (from wire) and the text archive at about 3.8 MB/sec (from wire). My guess from these values is that you tested the version with make_array in the serialization functions.

I noticed this DEBUG thing too and fixed it in the meantime. On my desktop workstation, make_binary_object improved the performance a lot.

Here is the current version of the source from my homepage.

Georg

On 25.09.2016 at 10:56, Ernest Zaslavsky wrote:
Have you run my code from the post of 23.9.16 19:41 (make_array)? Yep.
And here are the results for your latest code. Looks like it is doing quite well.
Dear Ernest, That result is in the same range as my current Windows results. My main question is: why is there such a big difference from the Linux one? My XML archives on Linux settle at 50 MB/s, and the text archives are in nearly the same range. See the attached graphs. This is what I can't explain .... Georg

My XML archives on Linux settle at 50 MB/s. The text archives are in nearly the same range too.
Oh, now I see, Windows is much slower in XML. Well, it looks like a streams issue; streams were never known for great performance (at least on Windows). That's why you can't use boost::lexical_cast if you are performance oriented. See the attached screenshot. It all comes down to put/peek/ignore etc. I guess this is worth opening a Microsoft Connect issue :)

Without global set locale:
Benchmark        Time(ns)  CPU(ns)  Iterations
----------------------------------------------
to_wire_xml         78066    77177        7479
from_wire_xml       95638    95949        7479
With global set locale:
09/22/16 08:32:49
Benchmark        Time(ns)  CPU(ns)  Iterations
----------------------------------------------
to_wire_xml         41399    41302       16619
from_wire_xml       52841    52844       11218
That's amazing! That's the level of the Linux implementation.
One riddle is solved :)
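For readers who want to reproduce the locale effect outside the benchmark: the posted benchmark functions call std::locale::global(std::locale("C")) before creating any stream. Below is a minimal sketch of the same idea using only the standard library. The helper names use_classic_locale and format_value are made up for this illustration, and the thread does not establish why the "C" locale is faster on Windows, so treat this as a way to experiment rather than an explanation.

```cpp
#include <cassert>   // only needed if you assert on the results
#include <locale>
#include <sstream>
#include <string>

// Pin the process-wide locale to the classic "C" locale.
// Streams copy the global locale when they are constructed, so this
// has to happen before the streams used for serialization are created.
// Returns the name of the locale that is global afterwards.
std::string use_classic_locale() {
    std::locale::global(std::locale::classic());
    return std::locale().name();
}

// Format a number through an ostringstream, i.e. through the same
// iostream machinery the text and XML archives sit on top of.
std::string format_value(long value) {
    std::ostringstream os;  // picks up the current global locale
    os << value;
    return os.str();
}
```

Calling use_classic_locale() once at program start and then timing format_value in a loop, with and without that call, is one way to check whether a given toolchain shows the same gap.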
After fiddling around on Linux with clang 3.8 and the gcc optimizer options, I got down to this, with gcc and -O3:
Benchmark           Time(ns)  CPU(ns)  Iterations
-------------------------------------------------
to_wire_xml            11174    11178      381818
to_wire_text            5148     5149      820313
to_wire_binary          3327     3330     1141304
to_wire_cstyle            63       63    65217391
from_wire_xml          27170    27183      155096
from_wire_text          5371     5370      783582
from_wire_binary        3226     3228     1296296
from_wire_cstyle          45       45    93750000
These results look very nice. Less than 6 µs to serialize or deserialize a structure with a portable text archive seems very nice :)
Now the difference compared to Windows is pretty big again .....
Now I installed msys2 on Win10 and compiled boost/benchmark and the test suite with gcc 5.3.0, and I got these results:
Run on (8 X 2195 MHz CPU s)
2016-09-23 00:00:29
***WARNING*** Library was built as DEBUG. Timings may be affected.
Benchmark           Time(ns)  CPU(ns)  Iterations
-------------------------------------------------
to_wire_xml            17304    17274       40698
to_wire_text           10769    10917       74468
to_wire_binary          7244     7310       89744
to_wire_cstyle           168      165     4069767
from_wire_xml          75807    75725        8861
from_wire_text         10217    10273       74468
from_wire_binary        7254     7308      111111
from_wire_cstyle         166      165     4069767
The compiler or its support libraries are definitely an issue.
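The code behind the to_wire_cstyle and from_wire_cstyle rows is not shown in this part of the thread. As a rough sketch of what an "id + data" scheme might look like: the first post mentions an FNV hash of the type name as the id. The 32-bit FNV-1a variant, the ev_small struct and the function names below are assumptions made for illustration, not the original code.

```cpp
#include <cassert>   // only needed if you assert on the results
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

using packet = std::vector<char>;

// 32-bit FNV-1a over a string (standard offset basis and prime).
inline std::uint32_t fnv1a(const std::string &s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical plain event type, similar in spirit to ev_test.
struct ev_small {
    long a = 1;
    unsigned long b = 2;
};

// Copy the id followed by the raw object bytes into a packet.
// Only valid between processes that share the same ABI and layout.
template <typename T>
packet to_wire_cstyle(std::uint32_t id, const T &data) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy needs a trivially copyable type");
    packet p(sizeof(id) + sizeof(T));
    std::memcpy(p.data(), &id, sizeof(id));
    std::memcpy(p.data() + sizeof(id), &data, sizeof(T));
    return p;
}

// Check size and id, then copy the payload back out.
// Returns false if the packet does not carry the expected type.
template <typename T>
bool from_wire_cstyle(const packet &p, std::uint32_t expected_id, T &out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy needs a trivially copyable type");
    if (p.size() != sizeof(std::uint32_t) + sizeof(T)) return false;
    std::uint32_t id = 0;
    std::memcpy(&id, p.data(), sizeof(id));
    if (id != expected_id) return false;
    std::memcpy(&out, p.data() + sizeof(id), sizeof(T));
    return true;
}
```

A scheme like this does little more than two memcpy calls per message, which is consistent with the tens of nanoseconds reported above, but it gives up everything the archives provide: portability across platforms, versioning, and support for non-trivial types.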

Run on (8 X 2195 MHz CPU s) 2016-09-23 00:00:29 ***WARNING*** Library was built as DEBUG. Timings may be affected.
Looks like you are comparing a release build with a debug build, hence the gap.

On 09/22/2016 06:25 AM, Georg Gast wrote:
Could you please elaborate what is different in your archive?
The Boost.Serialization archives use iostreams as a generic mechanism for inputting and outputting data. This comes with an added performance cost, as you have discovered. My output archives use a buffer interface class along with buffer traits to determine how to write data to a given container. These buffer traits are described at: http://breese.github.io/trial/protocol/trial_protocol/buffer.html There are specializations for the most common standard container types, so you can pass a std::string (or std::vector or std::ostream) directly in the constructor. My input archives simply take a string view as input. I have not had the need for anything else.
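To make the buffer-traits idea more concrete, here is a small sketch of how a writer can be decoupled from iostreams through a traits class. It is not the Trial.Protocol API: buffer_traits, simple_writer and put are hypothetical names chosen for this example, and the real library's interface is described at the link above.

```cpp
#include <cassert>   // only needed if you assert on the results
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical trait: how to append raw bytes to a given buffer type.
template <typename Buffer>
struct buffer_traits;

template <>
struct buffer_traits<std::string> {
    static void write(std::string &b, const char *d, std::size_t n) {
        b.append(d, n);
    }
};

template <>
struct buffer_traits<std::vector<char>> {
    static void write(std::vector<char> &b, const char *d, std::size_t n) {
        b.insert(b.end(), d, d + n);
    }
};

// A stream can still be used as a target, but only when asked for.
template <>
struct buffer_traits<std::ostream> {
    static void write(std::ostream &b, const char *d, std::size_t n) {
        b.write(d, static_cast<std::streamsize>(n));
    }
};

// Hypothetical writer: takes the container in its constructor and
// forwards every write through the trait, with no stream in between.
template <typename Buffer>
class simple_writer {
public:
    explicit simple_writer(Buffer &b) : m_buffer(b) {}
    void put(const std::string &text) {
        buffer_traits<Buffer>::write(m_buffer, text.data(), text.size());
    }

private:
    Buffer &m_buffer;
};
```

With this shape, writing into a std::string or std::vector<char> is a direct append, and the iostream cost is only paid when the target actually is a std::ostream.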
participants (6)
- Bjorn Reese
- Chris Glover
- Ernest Zaslavsky
- Georg Gast
- georg@schorsch-tech.de
- Oswin Krause