
----- Original Message -----
From: "Tomas Puverle" <tomas.puverle@morganstanley.com>
Newsgroups: gmane.comp.lib.boost.devel
To: <boost@lists.boost.org>
Sent: Friday, May 28, 2010 11:05 AM
Subject: Re: [boost::endian] Request for comments/interest
Thanks Dave,
2) To copy or not to copy. <snip>
Dave brings up an important example which I'd like to expand on a little:
Suppose your application generates a large amount of data which may need to be endian-swapped.
For the sake of argument, say I've just generated a 10GB array that contains some market data, which I want to send in little-endian format to some external device.
In the case of the typed interface, in order to send this data, I would have to construct a new 10GB array of little32_t and then copy the data from the host array to the destination array.
Since IP packets cannot be 10GB, I submit that you're going to have to break your 10GB array down into messages. Then you're going to copy portions of the 10GB array into those messages and send them. In the type-based approach, the message may indeed contain an array: boost::array<endian<little, uint32_t>, MaxFragmentSize> buffer; You copy fragments of the 10GB array into it before sending, and then on the receiving side, copy them out. The user on either side of the interface can extract the data from the fields without knowing the endianness of the field or the endianness of the machine he's working on. He doesn't have to know to call a swap function. He just extracts the data using the standard copy algorithm. The conversion happens automatically through implicit conversions. One copy into each message. One copy out. What could be better than that?
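To make the "one copy in, one copy out" idea concrete, here is a minimal sketch. The class little_u32 is a hypothetical stand-in for endian<little, uint32_t> and is not the API of any Boost library; pack_fragment, unpack_fragment, and the value chosen for MaxFragmentSize are likewise assumptions made only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for endian<little, uint32_t>; not a real Boost type.
class little_u32 {
public:
    little_u32() : bytes_{} {}
    // Implicit conversion from a host value: least significant byte first.
    little_u32(std::uint32_t v) {
        for (std::size_t i = 0; i < 4; ++i)
            bytes_[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }
    // Implicit conversion back to a host value, whatever the host byte order.
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes_[i]) << (8 * i);
        return v;
    }
    const unsigned char* data() const { return bytes_; }
private:
    unsigned char bytes_[4];
};

constexpr std::size_t MaxFragmentSize = 4;  // illustrative value only
using fragment = std::array<little_u32, MaxFragmentSize>;

// Sender side: one copy of host data into the message buffer.
fragment pack_fragment(const std::vector<std::uint32_t>& host, std::size_t offset) {
    assert(offset + MaxFragmentSize <= host.size());
    fragment buffer;
    const auto first = host.begin() + static_cast<std::ptrdiff_t>(offset);
    std::copy(first, first + static_cast<std::ptrdiff_t>(MaxFragmentSize), buffer.begin());
    return buffer;
}

// Receiver side: one copy out; the conversion happens in operator uint32_t.
std::vector<std::uint32_t> unpack_fragment(const fragment& buffer) {
    return std::vector<std::uint32_t>(buffer.begin(), buffer.end());
}
```

Neither pack_fragment nor unpack_fragment mentions byte order or calls a swap function; the conversions inside little_u32 do that work as each element is assigned or read.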
This has several problems: 1) It is relying on the fact that the typed class can be exactly overlaid over the space required by the underlying type. This is an implementation detail but a concern nonetheless, especially if, for example, you start packing your members for space efficiency.
In the example I posted, on non-native machines, an object "T" is represented inside of endian<endian_t, T> as "char storage[sizeof(T)]". Provided that the compiler provides some kind of "packed" directive (all the compilers that I use do), field alignment isn't an issue. Doesn't swap_in_place<>() make the same assumption of overlaying types?
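The overlay assumption can at least be checked at compile time rather than taken on faith. The sketch below assumes a hypothetical big_field<T> that stores its value as an array of unsigned char, in the spirit of the "char storage[sizeof(T)]" representation described above; wire_header and parse_header are invented for the example. Because every member is a byte array, typical compilers give the struct an alignment of 1 and add no padding, and the static_asserts fail the build if that ever stops being true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical field type: T held as big-endian bytes in a char array.
template <typename T>
class big_field {
    static_assert(std::is_unsigned<T>::value, "sketch handles unsigned integers only");
public:
    big_field() = default;
    big_field(T v) {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            storage_[i] = static_cast<unsigned char>(v >> (8 * (sizeof(T) - 1 - i)));
    }
    // Reads the bytes most significant first, whatever the host byte order.
    operator T() const {
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | storage_[i]);
        return v;
    }
private:
    unsigned char storage_[sizeof(T)];
};

// Hypothetical message header made only of byte-array-backed fields.
struct wire_header {
    big_field<std::uint16_t> length;
    big_field<std::uint32_t> sequence;
};

// Verify the layout assumptions instead of relying on them silently.
static_assert(sizeof(wire_header) == 6, "unexpected padding in wire_header");
static_assert(alignof(wire_header) == 1, "wire_header should need no alignment");
static_assert(std::is_trivially_copyable<wire_header>::value,
              "memcpy below requires a trivially copyable type");

// Load a header from raw received bytes without any aliasing casts.
wire_header parse_header(const unsigned char* bytes, std::size_t size) {
    assert(size >= sizeof(wire_header));
    wire_header h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}
```

The memcpy is used here instead of casting the receive buffer to wire_header* only to keep the sketch clear of aliasing questions; it is a choice made for this example, not a claim about how either library handles the overlay.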
2) The copy always happens, even if the data doesn't need to change because it's already in the correct "external" format. This is useless work: not only does it use one CPU to do nothing 10 billion times, it also unnecessarily taxes the memory interfaces, potentially affecting other CPUs/threads (and more, but I hope this is enough of an illustration).
In the message-based interfaces that I am used to, you must always copy some data structures into a message before you send it. After all, if you're using byte-streams, then endianness doesn't really apply. There is always at least one copy into the message. The typed interface only requires one copy of data into each message. In both techniques you have to copy the information out of the message, if you use it, at least one time. The problem with the swapping mechanism is that the swap requires a read and a write of every location before you even use the data, whether you actually read the fields or not. And/or the user has to remember whether he/she has already swapped each field. Since messages are often passed from one protocol layer to the next, usually written by different authors, I shudder to think of the integration experience. The typed method requires one read from each memory location no matter what the endianness is. (Unfortunately, in the case of poorly optimizing compilers, the read on non-native machines may actually make two copies.) The only efficiency issue with the typed interface is that non-native-endianness values are read out in reverse order, byte by byte, whereas native-endian fields can be read out of the message more efficiently using word-sized, aligned data transfers.
swap_in_place<>(r), where r is a range (or swap_in_place<>(begin,end), which is provided for convenience), will be zero cost if no work needs to be done, while having the same complexity as the above (but only!) if swapping is required. With the swap_in_place<>() approach, you only pay for what you need (to borrow from the C++ mantra).
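For comparison, the behaviour described for swap_in_place<>() can be sketched roughly as follows. The name swap_in_place_sketch, its signature, the byte_order enum, and the runtime host_order() probe are all assumptions for illustration; the proposed library's real interface may differ, and a real implementation would presumably decide at compile time so that the no-op case costs nothing at all.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <type_traits>
#include <vector>

enum class byte_order { little, big };

// Detect the host byte order by inspecting the first byte of a known value.
inline byte_order host_order() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1 ? byte_order::little : byte_order::big;
}

// Hypothetical signature, not the proposed library's actual interface.
template <typename Iter>
void swap_in_place_sketch(Iter first, Iter last, byte_order target) {
    using value_type = typename std::iterator_traits<Iter>::value_type;
    static_assert(std::is_integral<value_type>::value,
                  "sketch handles plain integer elements only");
    if (host_order() == target)
        return;  // data is already in the target order: nothing is touched
    for (; first != last; ++first) {
        // Reverse the bytes of each element where it sits.
        unsigned char* p = reinterpret_cast<unsigned char*>(&*first);
        std::reverse(p, p + sizeof(value_type));
    }
}
```

When the orders differ, every element is read and written once before anything is sent or used, which is the cost the typed approach defers to the point where a field is actually read.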
With the typed approach you only pay for the message fields that you read. No extra work is required on native-endian machines. I think the typed approach actually fits the "only pay for what you use" mantra better. I get the impression that I'm missing something.

If you're game, I'd like to consider a real-world use-case that uses multiple endians and has different protocol layers. That is, one over-the-wire packet has several layers of headers, possibly with a different endianness than the user payload they contain. This is common on PCs, which often have big-endian IP headers followed by a little-endian user payload. The whole packet is read in from a socket at once into a data buffer owned by a unique_ptr, so the message is not copied from layer to layer. I work on proprietary, non-internet networks, so I'm not sure which protocol headers we should use for a use-case. In my wireless applications, the headers are usually padded to an integral number of bytes, but fields within the headers are sometimes not byte-aligned.

We're only considering byte-ordering here, too. An equally important part of the endian problem for me is the bit-ordering. For this I use a similar technique for portable bitfields: bitfield<endian_t, w1, w2, w3, w4, w5, ...>. I'm not sure yet how your swapping technique would affect that. If we can find the time, I think our discussions would benefit from a concrete example to measure against.

BTW, I like the interface design of your library and the way you use macros and iterators to ease the swappability of classes, including inheritance. I'm arguing against swapping, though, because I've been using the type-based method (but not Beman's exactly) successfully for a long time. I'm very biased. :o)

terry