Performance optimization in Boost using std::vector<>
Hello everybody,

I have a question related to performance optimization using Boost. I found this link http://www.boost.org/doc/libs/1_41_0/doc/html/mpi/performance.html and I am trying to figure out which curve (on the graph in the link) represents the communication of std::vector<int> and std::vector<double>. Is communication using std::vector<int> and std::vector<double> optimized (is_mpi_datatype) or not?

I use the "boost_mpi" and "boost_serialization" libraries and include the header "#include" in my code. Then I send std::vector<int> and std::vector<double> directly using "world.send(...)" and "world.recv(...)" calls. I fill the vector with some values (for example, ten values) and I get the same ten values on the other side of the processor boundary. This works, but I want to improve communication performance; my code looks similar to the example below, but I send really big vectors. I found in this link http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html under the section "User-defined data types" that "Fixed data types can be optimized for transmission using the is_mpi_datatype type trait." I also studied the information on http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.performa....

Also, this link http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/wrappers.html#ar... shows that std::vector<> is optimized for serialization. I am now confused whether sending std::vector<> like this is good for performance or not. What other, better methods are available? Is something like this http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.skeleton... a good option?

#include

Best Regards,
Salman Arshad

--
View this message in context: http://boost.2283326.n4.nabble.com/Performance-optimization-in-Boost-using-s...
Sent from the Boost - Users mailing list archive at Nabble.com.
There is a known performance problem with serializing a std::vector over MPI. Basically, this prevents you from ever reaching the performance of C.

The problem is on the receive side. When you receive a vector whose size you don't know, the receive side has to:
- get the number of elements of the vector
- resize the vector (which initializes the elements)
- receive the elements into the vector's data (reinitializing the elements)

The C version of the idiom:
- gets the number of elements
- reserves (as opposed to resizes) the memory for the elements
- receives the elements into the buffer (initializing each element once)

This might make a small or a large performance difference: profile! However, if you decide to use std::vector in your API, you basically cannot change this later, since even if you were to use the C idiom, at some point you would have to copy into a std::vector.

A more C++ "alternative" to the C idiom that offers the same performance would be to use a std::unique_ptr + a size. If you can use a custom vector type, consider adding an "unsafe_change_size(std::size_t new_size)" member function (asserting new_size < capacity) and a custom allocator that doesn't default-construct elements. Rust's Vec<T> type has this (the unsafe get_mut_len), and it proves useful for providing a zero-cost abstraction over a C array that is also dynamically resizable.

Would I do it if I needed std::vector as the abstraction? No, I would live with the choice and never try to get as fast as C. Reserve memory in your receive buffers at the beginning of the program and keep them around (reuse them) to prevent memory allocation during send/receive operations.
Hi,

Not sure if the OP needs std::vector, but... I'd recommend boost::container::vector, which has a dedicated constructor [1] and resize() method [2] tagged with a default_init_t argument, both of which default-initialize the values in the vector. For primitives this basically means it leaves them uninitialized, so there is no overhead when the vector is about to be filled with real data anyway.

WBR,
Adam Romanek

[1] http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#id...
[2] http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#id...

On 12.02.2015 09:42, Gonzalo BG wrote:
> The problem is on the receive side. When you receive a vector, if you don't know the size, the receive side has to:
> - get the number of elements of the vector
> - resize the vector (which initializes elements)
> - receive the elements in the vector data (reinitialize the elements)
> [snip]
> A more C++ "alternative" to the C idiom that offers the same performance would be to use a std::unique_ptr + a size. If you can use a custom vector type, consider adding an "unsafe_change_size(std::size_t new_size)" member function (asserting new_size < capacity) and a custom allocator that doesn't default-construct elements. Rust's Vec<T> type has this (the unsafe get_mut_len), and it proves useful for providing a zero-cost abstraction over a C array that is also dynamically resizable.
>
> Would I do it if I needed std::vector as the abstraction? No, I would live with the choice and never try to get as fast as C. Reserve memory in your receive buffers at the beginning of the program and keep them around (reuse them) to prevent memory allocation during send/receive operations.
On Wednesday, February 11, 2015 at 3:13:52 PM UTC+1, saloo wrote:
> [snip]
Thanks, Gonzalo, for the detailed explanation.

So what I understand is that I should change my Boost code to something like the following:

#include
Hello
> There is a known performance problem with serializing a std::vector over MPI. [snip] This might make a small or a large performance difference, profile!
According to the attached program there seems to be a much larger performance problem than initializing vector elements. The program first sends a vector of doubles using MPI, then sends another identical vector with boost::mpi, and prints how long these took in seconds. Note that boost::mpi also sends two messages for run-time sized containers.

For vectors of 1e6 items the program prints (the MPI rank is the first number):

mpi
0 resize: 0.0126891, send: 0.00988925, recv: 0
1 resize: 0.0131643, send: 0, recv: 0.00955247
boost::mpi
0 resize: 0.0096425, send: 0.279135, recv: 0
1 resize: 0, send: 0, recv: 0.295702

For vectors of 1e7 items:

mpi
0 resize: 0.0974027, send: 0.0538886, recv: 0
1 resize: 0.105708, send: 0, recv: 0.0456324
boost::mpi
0 resize: 0.0517177, send: 2.70333, recv: 0
1 resize: 0, send: 0, recv: 2.82339

And for vectors of 5e7 items:

mpi
0 resize: 0.590099, send: 0.226269, recv: 0
1 resize: 0.440719, send: 0, recv: 0.375706
boost::mpi
0 resize: 0.198448, send: 13.5335, recv: 0
1 resize: 0, send: 0, recv: 14.0518

The boost::mpi version is always at least 10 times slower. It also seems to run out of memory with a smaller number of items, implying that unnecessary copies of the data are created somewhere. Based on experience with more complex programs (e.g. http://dx.doi.org/10.1016/j.jastp.2014.08.012) I wouldn't recommend boost::mpi for high performance computing. Or, in case of user error, at least high performance is easier to get with pure MPI...

I used boost-1.57.0, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) and mpirun (Open MPI) 1.6.5.

Ilja
participants (4)
- Adam Romanek
- Gonzalo BG
- Ilja Honkonen
- saloo