
At Thu, 26 Aug 2010 16:05:41 -0800, Robert Ramey wrote:
Dave Abrahams wrote:
If one were using heterogeneous machines, I could understand the usage of MPI types. But as I understand it, the MPI serialization presumes that the machines are binary compatible.
You're mistaken.
So I'm just not seeing this.
Start reading here: http://www.boost.org/doc/libs/1_39_0/doc/html/mpi/tutorial.html#mpi.skeleton... and all will be revealed
lol - I've read that several times.
I always wonder, when you write that, whether you're physically laughing out loud. That's OK; don't spoil the mystique ;-)
I just never found it to be very revealing. The word skeleton seemed pretty suggestive. It's still not clear to me how such a thing can work between heterogeneous machines. For example, if I have an array of 2 byte integers and they each need to get transformed one by one into a 4 byte integer because that's the closest MPI data type,
I think you don't understand what MPI datatypes do. They are not like an API that you have to conform to.
I don't see how the fact that the "shape" doesn't change helps you.
Well, let me try to explain all this; I think it is important stuff and not generally as well understood as it should be. And since Boost.MPI and Boost.Serialization are so closely related, I think it's especially important that *you* understand. I hope Matthias will correct me if I get anything wrong.

The first thing to understand is at the hardware level. Network cards have a fixed-size buffer. Sending anything over the network involves getting the data into that buffer; if the buffer fills up, packets are waiting to go out, and you can't put anything further into it until they do. One mission of MPI (the non-Boost variety) is to provide a portable high-level API for sending out these messages. Therefore, MPI deals with the low-level stuff and has (and needs) direct access to the network buffer *but code written on top of MPI does not* <== note emphasis. <<<=== No, really, note it!

Now let's assume a heterogeneous environment. In that case, you can't just "blit the bytes;" somebody needs to figure out how to encode data for transmission and decode it upon receipt so that it has the same meaning on both ends. In particular, bytes may need to be permuted to account for endianness. Let's further assume a system that transmits in little-endian form, so before sending from a big-endian machine one needs to swap bytes (all the other possible schemes have the same consequences---by choosing one we can work with specific examples).

The code doing the permutation has to know about the data structure, rather than operating on the data as raw bytes only. For example, if the data structure is a sequence of 32-bit integers, you need to reverse each group of 4 consecutive bytes in place. If it's a sequence of 16-bit integers, you need to reverse pairs of bytes in place, and if it's a sequence of chars, you don't need to do anything.

Now suppose MPI only knew about byte sequences --- sort of like files only know about byte sequences. What would we do?
We'd use something like Boost.Serialization to handle the translation. We'd serialize our source data into a portable representation that could be passed to MPI, which would then copy the bytes into the network buffer as needed. That's two copies of every byte: one copy to serialize, and another copy into the network buffer.

However, we have MPI datatypes and type maps, which describe the structure of data to be transmitted. Take a quick look at http://ww2.cs.mu.oz.au/498/notes/node20.html, and note that an MPI type map is a sequence of (type, byte offset) pairs. If there's padding in your data structure, the type map captures that fact so padding bytes aren't sent. So we can tell MPI about the data structure and let MPI put the serialized representation *directly* into the network buffer. That's one copy of every byte.

Of course, making this efficient depends on sending the same data structure multiple times. MPI type maps are on the order of the same size as the data structure they describe, so creating one might cost roughly the same as a copy. So using the same "shape" over and over again is important.

Unfortunately, an MPI type map is a pain to create, and even more painful to maintain as data structures change. So, typically, type maps get created for a very few structs (essentially) and more complex structures are typically sent with a series of MPI calls (some of which use type maps).

The genius of Boost.MPI is in three realizations:

1. You can represent all the values in an arbitrarily complex non-contiguous data structure with a single type map. It's like a struct that extends from the data's minimum address to its maximum, probably with *lots* of padding.

2. When you serialize a complex data structure, the archive sees the type and address of every datum.

3. You can treat addresses as byte offsets (from address zero).
So Boost.MPI has an Archive type that creates an MPI type map by treating addresses as offsets and translating fundamental C++ types into MPI datatypes. This involves no actual data copying. Then it uses the type map to send the "giant struct beginning at address zero." This avoids an expensive intermediate serialization phase before MPI actually gets its paws on your data.

HTH, and I hope I got all that right :-)

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com