
I think this approach would have similar performance to your swap() and swap_in_place(). Tomorrow night, I'll make some measurements.
The differences are greater than I expected. I did have to write special copy routines for the type-based approach, because std::copy can't copy a buffer onto itself. So I added special copy functions to endian.hpp that do allow copy-in-place.

Here are the results, on a 64-bit Wintel platform. I could not get the endian-swap functions to compile with gcc 4.3 on Ubuntu.

terry

======== CONVERT IN PLACE ========

The benchmark program generates a homogeneous data file with 2^20 4-byte unsigned, big-endian integers. The array is read into memory as one big blob and then converted in place to machine endianness (little-endian). The result is memcmp'ed against the expected array to verify it. The reading, converting, and verifying were repeated 1000 times.

    Swap Based: 9 seconds
    Type Based: 13 seconds

When the data file on disk was little-endian, both approaches came in at around 6 seconds.

--- swap based ---

    for (int trial = 0; trial != 1000; ++trial) {
        {
            ifstream input("array.dat", ios::binary);
            // read the raw big-endian file image straight into array2
            input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
            // byte-swap every element in place
            swap_in_place<big_to_machine>(array2.begin(), array2.end());
        }
        assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
    }

--- type based ---

    for (int trial = 0; trial != 1000; ++trial) {
        {
            ifstream input("array.dat", ios::binary);
            input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
            // view the same storage as an array of big-endian typed elements
            disk_array& src = reinterpret_cast<disk_array&>(array2);
            // copy-in-place: each element is converted as it is copied onto itself
            interface::copy(src.begin(), src.end(), array2.begin());
        }
        assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
    }

======== CONVERT & COPY ========

The benchmark program still generates the same big, homogeneous data file, and the array is still read into memory as one big blob. This time, however, the converted values are written to a second array rather than in place. The result is still memcmp'ed, and the cycle is again repeated 1000 times.

    Swap Based: 18 seconds
    Type Based: 14 seconds

When the data file on disk was little-endian, both programs took about 9 seconds.

--- swap based ---

    for (int trial = 0; trial != 1000; ++trial) {
        {
            ifstream input("array.dat", ios::binary);
            // read the raw big-endian file image into a scratch array
            input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
            array_type::const_iterator src = tmp_array.begin();
            array_type::const_iterator end = tmp_array.end();
            array_type::iterator dst = array2.begin();
            // convert each element while copying it into array2
            for ( ; src != end; ++src, ++dst)
                *dst = swap<big_to_machine>(*src);
        }
        assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
    }

--- type based ---

    for (int trial = 0; trial != 1000; ++trial) {
        {
            ifstream input("array.dat", ios::binary);
            input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
            // elements of tmp_array convert to machine order as they are copied
            interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
        }
        assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
    }
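
For anyone without the endian.hpp from this thread, here is a minimal, self-contained sketch of the two strategies being compared, assuming 4-byte unsigned big-endian data as in the benchmark. All names here (byte_swap32, swap_in_place_big_to_machine, big_u32, convert_copy, convert_in_place) are hypothetical stand-ins, not the real library's API. The swap-based version loads raw values and reverses their bytes. The type-based version stores a big-endian type that converts on read. The type-based in-place conversion works on a raw byte buffer with memcpy, rather than reinterpret_cast'ing the native array, so that each element is fully read before its slot is overwritten without aliasing one type through another.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: runtime check of the host byte order.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the four bytes of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Swap-based, in place: values were loaded raw from a big-endian file,
// so reverse each one unless the host is already big-endian.
template <class It>
void swap_in_place_big_to_machine(It first, It last) {
    if (!host_is_little_endian()) return;  // nothing to do on a big-endian host
    for (; first != last; ++first) *first = byte_swap32(*first);
}

// Type-based: a hypothetical big-endian storage type whose conversion
// to std::uint32_t assembles the value from its bytes on any host.
struct big_u32 {
    unsigned char bytes[4];
    operator std::uint32_t() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
               (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    }
};
static_assert(sizeof(big_u32) == 4, "big_u32 must be exactly 4 bytes");

// Type-based convert & copy: each element converts as it is read.
template <class InIt, class OutIt>
OutIt convert_copy(InIt first, InIt last, OutIt dst) {
    for (; first != last; ++first, ++dst) *dst = static_cast<std::uint32_t>(*first);
    return dst;
}

// Type-based convert in place over a raw byte buffer holding `count`
// big-endian values: each value is read and converted into a temporary
// before its 4-byte slot is overwritten in native order.
inline void convert_in_place(unsigned char* buf, std::size_t count) {
    assert(buf != nullptr || count == 0);
    for (std::size_t i = 0; i != count; ++i) {
        big_u32 stored;
        std::memcpy(&stored, buf + 4 * i, 4);
        const std::uint32_t value = stored;   // read and convert first
        std::memcpy(buf + 4 * i, &value, 4);  // then overwrite the same slot
    }
}
```

The read-then-write order inside convert_in_place is the property that a plain std::copy does not promise when the source and destination are the same buffer.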