Performance optimization in Boost using std::vector<>

Hello everybody,
I have a question about performance optimization with Boost.MPI. I found the performance page at http://www.boost.org/doc/libs/1_41_0/doc/html/mpi/performance.html and am trying to figure out which curve on the graph there represents the communication of std::vector<int> and std::vector<double>. Is communication using std::vector<int> and std::vector<double> optimized (is_mpi_datatype) or not?
I use the "boost_mpi" and "boost_serialization" libraries and include the header <boost/serialization/vector.hpp> in my code. I then send std::vector<int> and std::vector<double> directly using world.send(...) and world.recv(...) calls. I fill the vector with some values (for example, ten values) and get the same ten values on the other side of the processor boundary. This works, but I want to improve communication performance.
The tutorial at http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html says, under the section "User-defined data types", that "Fixed data types can be optimized for transmission using the is_mpi_datatype type trait." I also studied the information at http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.performance_optimizations. In addition, http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/wrappers.html#arrays shows that std::vector<> is optimized for serialization.
I am now confused about whether sending std::vector<> like this is good for performance or not. What better methods are available? Is something like http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.skeleton_and_content a good option?
Best Regards,
Salman Arshad
--
View this message in context: http://boost.2283326.n4.nabble.com/Performance-optimization-in-Boost-using-std-vector-tp4672196.html
Sent from the Boost - Users mailing list archive at Nabble.com.

My code looks similar to the example below, but I send really big vectors.
    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>
    #include <iostream>
    #include <vector>

    namespace mpi = boost::mpi;

    int main() {
        mpi::environment env;
        mpi::communicator world;
        std::vector<int> my_vector;
        if (world.rank() == 0) {
            // Rank 0 fills the vector and sends it to rank 1 with tag 0.
            my_vector.push_back(17);
            my_vector.push_back(38);
            world.send(1, 0, my_vector);
        } else if (world.rank() == 1) {
            // Only rank 1 receives; any further ranks would otherwise block forever.
            world.recv(0, 0, my_vector);
        }
        return 0;
    }
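The distinction the question turns on can be illustrated without MPI installed. The sketch below uses a hypothetical stand-in trait, is_fixed_layout, that mimics the idea behind boost::mpi::is_mpi_datatype: arithmetic element types qualify, while a run-time sized container such as std::vector does not, even when its elements do. The trait name, its definition, and the helper function are assumptions made for illustration, not Boost's implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the idea behind boost::mpi::is_mpi_datatype:
// a type qualifies if its layout is fixed and known at compile time.
// (Assumption for illustration; Boost.MPI defines its own trait.)
template <typename T>
struct is_fixed_layout : std::is_arithmetic<T> {};

// A std::vector owns heap memory whose size is only known at run time,
// so the container itself does not qualify, whatever its element type.
template <typename T, typename A>
struct is_fixed_layout<std::vector<T, A>> : std::false_type {};

// True when the elements of a container could be sent as one contiguous
// block of a fixed-layout type once the element count is known.
template <typename Container>
constexpr bool elements_are_fixed_layout() {
    return is_fixed_layout<typename Container::value_type>::value;
}
```

Under this reading, a std::vector<double> is not itself a fixed data type, but its contents are a contiguous run of one, which is why the element count and the payload end up being handled separately.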

There is a known performance problem with serializing a std::vector over MPI. Basically, it prevents you from ever reaching the performance of C. The problem is on the receive side. When you receive a vector whose size you don't know, the receive side has to:
- get the number of elements of the vector
- resize the vector (which initializes the elements)
- receive the elements into the vector's data (which initializes the elements a second time)
The C version of the idiom:
- gets the number of elements
- reserves (as opposed to resizes) the memory for the elements
- receives the elements into that memory (initializing the elements once).
This might make a small or a large performance difference, so profile! However, if you decide to use std::vector as your API, you basically cannot change this later: even if you were to use the C idiom, at some point you would have to copy into a std::vector.
A more C++ "alternative" to the C idiom that offers the same performance would be to use a std::unique_ptr<T[]> plus a size.
If you can have a custom vector type, consider adding an "unsafe_change_size(std::size_t new_size)" member function with "assert(new_size <= capacity)", together with a custom allocator that leaves default-constructed elements uninitialized. Rust's Vec<T> type has this (the unsafe set_len method), and it proves useful for providing a zero-cost abstraction around a C array that is also dynamically resizable.
Would I do it if I needed std::vector as the abstraction? No, I would live with the choice and never try to get as fast as C. Reserve memory for your receive buffers at the beginning of the program and keep them around (reuse them) to prevent memory allocation during send/receive operations.
On Wednesday, February 11, 2015 at 3:13:52 PM UTC+1, saloo wrote:
> Hello everybody,
> I have a question related to performance optimization using Boost. [...]
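The double initialization described in this reply can be avoided while still using std::vector by supplying an allocator whose construct() default-initializes instead of value-initializes. The following is a minimal sketch of that idiom; the names default_init_allocator, uninit_vector and fake_receive are made up for this example, the "receive" is simulated with a plain copy, and the resulting vector type is not interchangeable with std::vector<T> in existing APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Allocator adaptor: resize(n) on a vector using it default-initializes new
// elements, which for int/double means they are left uninitialized.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;
public:
    template <typename U>
    struct rebind {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // Called by resize(n): default-initialize (no zeroing for trivial types).
    template <typename U>
    void construct(U* p) {
        ::new (static_cast<void*>(p)) U;
    }

    // All other constructions are forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;

// Stand-in for a receive: sizes the buffer, then overwrites every element.
inline uninit_vector<double> fake_receive(const double* src, std::size_t n) {
    uninit_vector<double> buf;
    buf.resize(n);  // allocates, but does not zero the elements
    for (std::size_t i = 0; i < n; ++i) buf[i] = src[i];
    return buf;
}
```

Whether skipping the first initialization is measurable depends on the message sizes involved, so it is worth profiling before adopting a non-standard vector type.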

Hi,
not sure if the OP needs std::vector but... I'd recommend boost::container::vector, which has a dedicated constructor [1] and resize() [2] method tagged with a default_init_t argument, both of which default-initialize the values in the vector. For primitives this basically means they are left uninitialized, so there's no overhead when the vector is about to be filled with real data.
WBR,
Adam Romanek
[1] http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#id...
[2] http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#id...
On 12.02.2015 09:42, Gonzalo BG wrote:
> There is a known performance problem with serializing a std::vector over MPI. Basically, this prevents you from ever reaching the performance of C. [...]

Thanks Gonzalo for the detailed explanation.
So what I understand is that I should change my code to the following:
    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>
    #include <iostream>
    #include <vector>

    namespace mpi = boost::mpi;

    int main() {
        mpi::environment env;
        mpi::communicator world;
        std::vector<int> my_vector;
        if (world.rank() == 0) {
            my_vector.push_back(17);
            my_vector.push_back(38);
            world.send(1, 0, my_vector);
        } else if (world.rank() == 1) {
            std::vector<int> my_vector2;
            // Reserve capacity up front for the two expected elements.
            my_vector2.reserve(2);
            world.recv(0, 0, my_vector2);
        }
        return 0;
    }
What is the best option in Boost to achieve good performance? I saw in boost/serialization/vector.hpp that there is an optimized version which keeps track of the size and uses the make_array serialization wrapper. How can I force Boost to use the optimized version for serializing? Below is the code from boost/serialization/vector.hpp:
    // the optimized versions
    template<class Archive, class U, class Allocator>
    inline void save(
        Archive & ar,
        const std::vector<U, Allocator> &t,
        const unsigned int /* file_version */,
        mpl::true_
    ){
        const collection_size_type count(t.size());
        ar << BOOST_SERIALIZATION_NVP(count);
        if (!t.empty())
            ar << make_array(detail::get_data(t),t.size());
    }

    template<class Archive, class U, class Allocator>
    inline void load(
        Archive & ar,
        std::vector<U, Allocator> &t,
        const unsigned int /* file_version */,
        mpl::true_
    ){
        collection_size_type count(t.size());
        ar >> BOOST_SERIALIZATION_NVP(count);
        t.resize(count);
        unsigned int item_version=0;
        if(BOOST_SERIALIZATION_VECTOR_VERSIONED(ar.get_library_version())) {
            ar >> BOOST_SERIALIZATION_NVP(item_version);
        }
        if (!t.empty())
            ar >> make_array(detail::get_data(t),t.size());
    }
Or should I skip Boost completely and use basic MPI commands to send the vector as an MPI derived data type? In that case I should keep in mind what you said about std::unique_ptr and Vec<T>, and also about reserving memory in the receive buffer at the beginning of the program and reusing it to prevent memory allocation during send/receive.
How can I reach a good-performance solution using Boost?
Thanks
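Two of the suggestions from the earlier reply, a std::unique_ptr<T[]> plus a size and a receive buffer that is allocated once and then reused, can be combined as sketched below. The class name ReceiveBuffer, its members, and simulated_receive are hypothetical; the receive itself is simulated with a copy, where a real program would pass data() and the element count to its MPI receive call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical reusable receive buffer: owns uninitialized storage and only
// reallocates when an incoming message is larger than anything seen so far.
template <typename T>
class ReceiveBuffer {
public:
    explicit ReceiveBuffer(std::size_t capacity)
        : data_(new T[capacity]), capacity_(capacity), size_(0) {}

    // Make room for n elements; existing contents are not preserved.
    void prepare(std::size_t n) {
        if (n > capacity_) {
            data_.reset(new T[n]);  // new T[n] default-initializes: no zeroing for double
            capacity_ = n;
        }
        size_ = n;
    }

    T* data() { return data_.get(); }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    std::unique_ptr<T[]> data_;
    std::size_t capacity_;
    std::size_t size_;
};

// Stand-in for a two-step receive: first the element count, then the payload.
inline void simulated_receive(ReceiveBuffer<double>& buf, const double* src, std::size_t n) {
    buf.prepare(n);
    std::copy(src, src + n, buf.data());
}
```

Sizing the buffer for the largest expected message at program start means prepare() never allocates inside the communication loop.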

Hello
> There is a known performance problem with serializing a std::vector over MPI. Basically, this prevents you from ever reaching the performance of C. The problem is on the receive side. When you receive a vector, if you don't know the size, the receive side has to:
> - get the number of elements of the vector
> - resize the vector (which initializes elements)
> - receive the elements in the vector data (reinitialize the elements)
> The C version of the idiom:
> - gets the number of elements
> - reserves (as opposed to resize) the memory for the elements
> - receive the element in the vector (initialize elements once).
> This might make a small or a large performance difference, profile!
According to the attached program, there seems to be a much larger performance problem than initializing vector elements. The program first sends a vector of doubles using plain MPI, then sends another identical vector with boost::mpi, and prints how long each step took in seconds. Note that boost::mpi also sends two messages for run-time sized containers.
For vectors of 1e6 items the program prints (the MPI rank is the first number):
mpi
0 resize: 0.0126891, send: 0.00988925, recv: 0
1 resize: 0.0131643, send: 0, recv: 0.00955247
boost::mpi
0 resize: 0.0096425, send: 0.279135, recv: 0
1 resize: 0, send: 0, recv: 0.295702
For vectors of 1e7 items:
mpi
0 resize: 0.0974027, send: 0.0538886, recv: 0
1 resize: 0.105708, send: 0, recv: 0.0456324
boost::mpi
0 resize: 0.0517177, send: 2.70333, recv: 0
1 resize: 0, send: 0, recv: 2.82339
And for vectors of 5e7 items:
mpi
0 resize: 0.590099, send: 0.226269, recv: 0
1 resize: 0.440719, send: 0, recv: 0.375706
boost::mpi
0 resize: 0.198448, send: 13.5335, recv: 0
1 resize: 0, send: 0, recv: 14.0518
The boost::mpi version is always at least 10 times slower. It also seems to run out of memory at a smaller number of items, implying that unnecessary copies of the data are created somewhere. Based on experience with more complex programs (e.g. http://dx.doi.org/10.1016/j.jastp.2014.08.012) I wouldn't recommend boost::mpi for high performance computing. Or, in case this is user error, high performance is at least easier to get with pure MPI...
I used boost-1.57.0, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) and mpirun (Open MPI) 1.6.5.
Ilja
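The slowdown can be read off the posted timings by dividing the boost::mpi time by the plain MPI time for the same operation. The sketch below does that arithmetic for the send on rank 0 and the recv on rank 1; the struct and function names are made up for this example, and the numbers are copied from the output in the message.

```cpp
#include <cassert>
#include <cstddef>

// One row per vector size, timings in seconds copied from the posted output.
struct Timing {
    double items;
    double mpi_send, mpi_recv;      // plain MPI: rank 0 send, rank 1 recv
    double boost_send, boost_recv;  // boost::mpi: rank 0 send, rank 1 recv
};

const Timing timings[] = {
    {1e6, 0.00988925, 0.00955247, 0.279135, 0.295702},
    {1e7, 0.0538886, 0.0456324, 2.70333, 2.82339},
    {5e7, 0.226269, 0.375706, 13.5335, 14.0518},
};
const std::size_t timing_count = sizeof(timings) / sizeof(timings[0]);

inline double send_slowdown(const Timing& t) { return t.boost_send / t.mpi_send; }
inline double recv_slowdown(const Timing& t) { return t.boost_recv / t.mpi_recv; }

// Smallest slowdown factor over all rows and both directions.
inline double min_slowdown() {
    double m = send_slowdown(timings[0]);
    for (std::size_t i = 0; i < timing_count; ++i) {
        if (send_slowdown(timings[i]) < m) m = send_slowdown(timings[i]);
        if (recv_slowdown(timings[i]) < m) m = recv_slowdown(timings[i]);
    }
    return m;
}
```

These ratios leave out the resize column, which the two variants report differently (the boost::mpi receiver shows no separate resize time).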
participants (4)
- Adam Romanek
- Gonzalo BG
- Ilja Honkonen
- saloo