
Hi,

In light of the performance questions, let me summarize some details of how the proposed Boost.MPI library sends data.

If an object is sent for which an MPI_Datatype exists, then communication is done using that MPI_Datatype. This applies both to the built-in primitive types and to custom MPI_Datatypes for "POD-like" types for which an MPI_Datatype can be constructed. For each such type, the MPI_Datatype is built *once* during execution of the program, using the serialization library. If a fixed-size array of such a type is sent, the send is again done using that MPI_Datatype. Thus for these two cases we get optimal performance and much simplified usage, since creating MPI_Datatypes is made much easier than in plain MPI.

For all other types (variable-sized vectors, linked lists, trees, ...) the data structure is serialized into a buffer by the serialization library using MPI_Pack. Again, MPI_Datatypes are used wherever they exist, and contiguous arrays of homogeneous types for which MPI_Datatypes exist are packed with a single MPI_Pack call (using a new optimization in Boost.Serialization). At the end, the buffer is sent using MPI_Send. Note that while MPI_Pack calls do incur an overhead, we are talking about sending complex data structures for which no corresponding MPI call exists; any program written directly against MPI would also need to serialize such a data structure into a buffer first.

To counter the mentioned overhead, there is the "skeleton & content" mechanism for cases where a data structure needs to be sent multiple times with different "contents" while the "skeleton" of the data structure (the sizes of arrays, the values of pointers, ...) remains unchanged. In that case only the structural information (sizes, types, pointers) is serialized using MPI_Pack and sent once, so that the receiving side can create an identical data structure into which to receive the data. Afterwards an MPI_Datatype for the "contents" (the data members) of the data structure is created, and the content is sent using this custom MPI_Datatype, which again gives optimal performance.

It seems to me that the simplicity of the interface does a good job of hiding these optimizations from the user. If anyone knows of a further optimization trick that could be used, please post it to the list.

Matthias
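
P.S. For concreteness, here is a rough sketch of what the mechanisms above look like in user code. I am using the names from the Boost.MPI documentation (boost/mpi.hpp, BOOST_IS_MPI_DATATYPE, skeleton, get_content); the example type "particle", the message tags and the loop counts are purely illustrative, and the exact names in the proposal may differ slightly.

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp> // serialization support for std::vector
#include <vector>
namespace mpi = boost::mpi;

// A "POD-like" user type: fixed layout, no pointers.
struct particle {
  double x, y, z;
  template <class Archive>
  void serialize(Archive& ar, const unsigned int /*version*/) {
    ar & x & y & z;
  }
};

// Declare that an MPI_Datatype can be built for this type; the datatype is
// then constructed once, from the serialize() function above.
BOOST_IS_MPI_DATATYPE(particle)

int main(int argc, char* argv[]) {
  mpi::environment env(argc, argv);
  mpi::communicator world;

  if (world.rank() == 0) {
    particle p = {1.0, 2.0, 3.0};
    world.send(1, 0, p);                   // sent using the custom MPI_Datatype

    std::vector<particle> v(100);
    world.send(1, 1, v);                   // variable-sized: serialized into a buffer, then sent

    // skeleton & content: send the structure once, the data many times
    world.send(1, 2, mpi::skeleton(v));
    mpi::content c = mpi::get_content(v);
    for (int step = 0; step < 10; ++step) {
      // ... update the values in v (sizes unchanged) ...
      world.send(1, 3, c);                 // sent using the MPI_Datatype built for the content
    }
  } else if (world.rank() == 1) {
    particle p;
    world.recv(0, 0, p);

    std::vector<particle> v;
    world.recv(0, 1, v);

    world.recv(0, 2, mpi::skeleton(v));    // rebuilds an identically shaped vector
    mpi::content c = mpi::get_content(v);
    for (int step = 0; step < 10; ++step)
      world.recv(0, 3, c);                 // data received directly into v
  }
  return 0;
}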