Re: [boost] [boost::endian] Request for comments/interest

I wanted to throw a few things into the discussion:

1) Typed vs untyped. We have had various discussions about whether the network-endian object should be a different type from the machine-endian one. I have a pretty strong opinion on this: I think you should work with the same type. There are several reasons for this:

a. Performance when network endian is the same as machine endian - I'll discuss this more later.

b. When you are reading a complex packet of data off the network, a common use case is to reinterpret_cast a block of data into a struct. When you use different types for network and machine endian, you end up with two structs - one to read the data off the network, and one to copy the data into. This can turn into a pretty severe maintenance burden.

c. Given that you tend to convert to machine endian at the boundaries, you can treat the untyped interface in much the same way as a boost::shared_ptr style interface. You call swap_in_place on the return value of the function that creates the initial struct, and therefore you never have access to the network-endian version anywhere except your most basic read function. As such, I think the type safety of network-endian data is a bit of a red herring. It is no different from creating a raw pointer and then, half a dozen lines of code later, constructing a boost::shared_ptr from it. It is against the principle of the interface and would create some evil bugs, but it hasn't been formally prohibited by the interface.

d. Finally, I would definitely want an endian interface that operates at a high level of abstraction - i.e. at the struct or container level, not at the built-in type level.

2) To copy or not to copy. I have a very big issue with any interface that enforces a copy. If I'm writing something to live on a memory-limited embedded device, I absolutely want to be able to endian-swap in place. Secondly, we should definitely not accept an interface that copies just on the grounds that the most common operation is from a big-endian network to a little-endian machine, which does always require mutation. The platforms boost runs on include plenty of big-endian machine types, and I have come across numerous cases where (for performance reasons in a largely little-endian machine environment) data has been left little-endian on the network. Having a do-nothing (not even a copy) case for when machine and network endianness match is very important. If you are reading a big stream of data from the network, not copying anything is a very important case, and key for writing high-performance production code.

3) A final thought: I am much keener to use template arguments to decide which endian swap to use than to encode them explicitly in function template names. It makes endian-swapping code in a big code base easier to spot, and makes the interface of the endian swapper itself more logical. When I change from big to little endian, I am tweaking a parameter of the endian swapper, not calling a wholly different function. I think from a design perspective, a template parameter makes most sense (a sketch follows at the end of this message).

Overall, I like Tom's interface, but I have been using it for quite a while (as a disclaimer).

Dave
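To illustrate what I mean in point 3, here is a rough sketch I just typed up (not the actual library code - the conversion-tag names and the hard-coded little-endian assumption are just placeholders for this example):

#include <cstdint>

enum conversion { big_to_machine, little_to_machine, machine_to_big, machine_to_little };

// Which conversions are identity operations?  A real implementation would
// derive this from the build platform; here I simply assume a little-endian host.
template <conversion C> struct is_identity                    { static const bool value = false; };
template <>             struct is_identity<little_to_machine> { static const bool value = true;  };
template <>             struct is_identity<machine_to_little> { static const bool value = true;  };

inline std::uint32_t byte_reverse(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// The conversion is a template argument: changing big <-> little means
// changing one parameter, not calling a differently named function.
template <conversion C>
inline std::uint32_t swap(std::uint32_t x)
{
    return is_identity<C>::value ? x : byte_reverse(x);
}

So reading a big-endian field becomes swap<big_to_machine>(x), and switching that field to little-endian later is a one-token change to swap<little_to_machine>(x), which also compiles down to nothing on a matching host.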

Thanks Dave,
2) To copy or not to copy. <snip>
Dave brings up an important example which I'd like to expand on a little: Suppose your application generates a large amount of data which may need to be endian-swapped. For the sake of argument, say I've just generated a 10GB array that contains some market data, which I want to send in little-endian format to some external device.

In the case of the typed interface, in order to send this data, I would have to construct a new 10GB array of little32_t and then copy the data from the host array to the destination array. This has several problems:

1) It is relying on the fact that the typed class can be exactly overlaid over the space required by the underlying type. This is an implementation detail but a concern nonetheless, especially if, for example, you start packing your members for space efficiency.

2) The copy always happens, even if the data doesn't need to change, since it's already in the correct "external" format. This is useless work - not only does it use one CPU to do nothing 10 billion times, it also unnecessarily taxes the memory interfaces, potentially affecting other CPUs/threads (and more, but I hope this is enough of an illustration).

swap_in_place<>(r) where r is a range (or swap_in_place<>(begin,end), which is provided for convenience) will be zero cost if no work needs to be done, while having the same complexity as the above (but only!) if swapping is required. With the swap_in_place<>() approach, you only pay for what you need (to borrow from the C++ mantra).

Tom
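To make the "zero cost" part concrete, here is a deliberately simplified sketch of the idea (not the library's actual implementation - in the real code the "needs swap" decision comes from the source/target endianness rather than a bare bool):

#include <algorithm>
#include <cstdint>
#include <vector>

inline std::uint32_t byte_reverse32(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// When NeedsSwap is false the body is dead code: nothing is read, written,
// copied or allocated - the 10GB stays exactly where it is.
template <bool NeedsSwap, typename Iter>
inline void swap_in_place(Iter first, Iter last)
{
    if (!NeedsSwap)
        return;
    for (; first != last; ++first)
        *first = byte_reverse32(*first);
}

int main()
{
    std::vector<std::uint32_t> market_data(1 << 20, 0x11223344u);
    swap_in_place<false>(market_data.begin(), market_data.end()); // already in wire order: free
    swap_in_place<true>(market_data.begin(), market_data.end());  // actually reorders bytes, in place
}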

----- Original Message ----- From: "Tomas Puverle" <tomas.puverle@morganstanley.com> To: <boost@lists.boost.org> Sent: Friday, May 28, 2010 6:05 PM Subject: Re: [boost] [boost::endian] Request for comments/interest
<snip>
swap_in_place<>(r) where r is a range (or swap_in_place<>(begin,end), which is provided for convenience) will be zero cost if no work needs to be done, while having the same complexity as the above (but only!) if swapping is required. With the swap_in_place<>() approach, you only pay for what you need (to borrow from the C++ mantra)
This is not completely true, as your implementation will at least visit the instance.

I have been thinking more on how to convert safely using typed endian without a copy, and I think I have found a hint. On top of Beman's typed classes, we can define a function that will conditionally do the conversion if the endianness differs:

template <typename E1, typename E2, typename T> endian<E1, T>& convert_to(endian<E2, T> &e1, endian<E1, T>& e2); // [1]

template <typename E, typename T> endian<E, T>& convert_to(endian<E, T> &e1, endian<E, T>& e2); // [2]

convert_to [1] will do the conversion into the e2 parameter and return a reference to e2. convert_to [2] will do nothing and return a reference to e1.

In order to extend this to other types we can define a metafunction that gives the endianness of a type. The result could be big, little or mixed. The metafunction can be defined for some predefined types and for fusion sequences, so any structure that has a fusion adaptor defined will have the metafunction defined.

The convert_to function for types with the same endianness will behave like [2], so there is no need for a copy, nor for visiting the structure. If the endianness is different then we need to do the conversion where needed, and the copy. This needs some metaprogramming but it seems reasonable, at least to me.

This does not yet solve the problem of allocating two different structures and the need to keep the two structures synchronized :(.

Best,
Vicente
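To make the idea a bit more concrete, here is a rough, self-contained sketch of the two overloads with a made-up minimal endian<> wrapper and a hypothetical byte_reverse helper (not working library code, just the dispatch shape):

#include <algorithm>
#include <cstdint>
#include <iostream>

enum endian_t { big, little };

// Minimal stand-in for the typed wrapper: the value is stored in E byte order.
template <endian_t E, typename T>
struct endian { T raw; };

// Hypothetical helper: reverse the byte order of a trivially copyable value.
template <typename T>
T byte_reverse(T v)
{
    unsigned char* p = reinterpret_cast<unsigned char*>(&v);
    std::reverse(p, p + sizeof(T));
    return v;
}

// [1] Endianness differs: fill e2 from e1, converting, and return a reference to e2.
template <endian_t E1, endian_t E2, typename T>
endian<E1, T>& convert_to(endian<E2, T>& e1, endian<E1, T>& e2)
{
    e2.raw = byte_reverse(e1.raw);
    return e2;
}

// [2] Same endianness: nothing to convert, nothing to copy - just return a reference to e1.
template <endian_t E, typename T>
endian<E, T>& convert_to(endian<E, T>& e1, endian<E, T>& /*e2*/)
{
    return e1;
}

int main()
{
    endian<big, std::uint32_t>    wire = { 0x11223344u };
    endian<little, std::uint32_t> little_buf;
    endian<big, std::uint32_t>    big_buf;

    endian<little, std::uint32_t>& as_little = convert_to(wire, little_buf); // overload [1]: converts
    endian<big, std::uint32_t>&    as_big    = convert_to(wire, big_buf);    // overload [2]: aliases wire

    std::cout << std::hex << as_little.raw << " " << as_big.raw << "\n";      // 44332211 11223344
}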

At Fri, 28 May 2010 19:16:41 +0200, vicente.botet wrote: <snip entire foregoing message>
swap_in_place<>(r) where r is a range (or swap_in_place<>(begin,end), which is provided for convenience) will be zero cost if no work needs to be done, while having the same complexity as the above (but only!) if swapping is required. With the swap_in_place<>() approach, you only pay for what you need (to borrow from the C++ mantra)
This is not completely true, as your implementation will at least visit the instance. …
http://www.boost.org/community/policy.html#effective

Thank you.

--
Dave Abrahams
Boost Moderator

template <typename E1, typename E2, typename T> endian<E1, T>& convert_to(endian<E2, T> &e1, endian<E1, T>& e2); // [1]
template <typename E, typename T> endian<E, T>& convert_to(endian<E, T> &e1, endian<E, T>& e2); // [2]
convert_to [1] will do the conversion into the e2 parameter and return a reference to e2. convert_to [2] will do nothing and return a reference to e1.
In order to extend this to other types we can define a metafunction that gives the endianness of a type. The result could be big, little or mixed. The metafunction can be defined for some predefined types and for fusion sequences, so any structure that has a fusion adaptor defined will have the metafunction defined.
The convert_to function for types with the same endianness will behave like [2], so there is no need for a copy, nor for visiting the structure. If the endianness is different then we need to do the conversion where needed, and the copy.
This needs some metaprogramming but it seems reasonable, at least to me.
This does not yet solve the problem of allocating two different structures and the need to keep the two structures synchronized :(.
In the simple, type-based endian implementation that I attached to a previous email in this thread, you can just write:

int main() {
  struct MyClass { char data[12]; };

  endian<little, MyClass> lil_1;
  endian<little, MyClass> lil_2;
  endian<big, MyClass> big_1;

  lil_1 = lil_2;
  big_1 = lil_1;
  lil_2 = big_1;
} // main

The conversions are automatic and implicit. The end-user may not even be aware of the endianness of lil_1, lil_2, or big_1. Each assignment requires a copy though. But I can't think of a use-case that wouldn't require at least one copy anyhow.

terry
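For anyone who didn't pull down the attachment, the shape of the wrapper is roughly as follows. This is a cut-down sketch from memory, assuming a little-endian host and trivially copyable T, not the actual attached code:

#include <algorithm>
#include <cstdint>

enum endian_t { big, little };

template <endian_t E, typename T>
class endian
{
public:
    endian() {}
    endian(const T& host)                          { store(host); }

    // Construction/assignment from a differently ordered endian<> reorders bytes.
    template <endian_t E2>
    endian(const endian<E2, T>& other)             { store(T(other)); }

    template <endian_t E2>
    endian& operator=(const endian<E2, T>& other)  { store(T(other)); return *this; }

    operator T() const                             { return load(); }

private:
    void store(T host)
    {
        unsigned char* p = reinterpret_cast<unsigned char*>(&host);
        if (E == big) std::reverse(p, p + sizeof(T));   // little-endian host assumed
        std::copy(p, p + sizeof(T), bytes_);
    }
    T load() const
    {
        unsigned char tmp[sizeof(T)];
        std::copy(bytes_, bytes_ + sizeof(T), tmp);
        if (E == big) std::reverse(tmp, tmp + sizeof(T));
        T host;
        std::copy(tmp, tmp + sizeof(T), reinterpret_cast<unsigned char*>(&host));
        return host;
    }
    unsigned char bytes_[sizeof(T)];
};

With that, lil_1 = lil_2 copies bytes as-is, while big_1 = lil_1 goes through the converting assignment.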

----- Original Message ----- From: "Terry Golubiewski" <tjgolubi@netins.net> To: <boost@lists.boost.org> Sent: Friday, May 28, 2010 10:46 PM Subject: Re: [boost] [boost::endian] Request for comments/interest
In order to extend this to other types we can define a metafunction that gives the endianness of a type. The result could be big, little or mixed. The metafunction can be defined for some predefined types and for fusion sequences, so any structure that has a fusion adaptor defined will have the metafunction defined.
The convert_to function for types with the same endianness will behave like [2], so there is no need for a copy, nor for visiting the structure. If the endianness is different then we need to do the conversion where needed, and the copy.
In the simple, type-based endian implementation that I attached to a previous email in this thread, you can just write:
int main() { struct MyClass { char data[12]; };
endian<little, MyClass> lil_1; endian<little, MyClass> lil_2; endian<big, MyClass> big_1;
lil_1 = lil_2; big_1 = lil_1; lil_2 = big_1;
} // main
Maybe, but what about more heterogeneous structures?
The conversions are automatic and implicit. The end-user may not even be aware of the endianness of lil_1, lil_2, or big_1. Each assignment requires a copy though. But I can't think of a use-case that wouldn't require at least one copy anyhow.
I was trying to prove that the endian "type safe" approach can also avoid copies (with respect to the "untyped" approach) when they are not necessary, while continuing to be type safe.

Vicente

----- Original Message ----- From: "vicente.botet" <vicente.botet@wanadoo.fr> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Friday, May 28, 2010 5:21 PM Subject: Re: [boost::endian] Request for comments/interest
int main() { struct MyClass { char data[12]; };
endian<little, MyClass> lil_1; endian<little, MyClass> lil_2; endian<big, MyClass> big_1;
lil_1 = lil_2; big_1 = lil_1; lil_2 = big_1;
} // main
Maybe, but what about more heterogeneous structures?
Ok, here's an actual example that compiles on VC9. (Notice how I worked in "chrono")

terry

#include "endian.hpp"
#include <boost/units/systems/si.hpp>
#include <boost/cstdint.hpp>
#include <std0x/chrono.h>

using namespace boost;
using namespace boost::interface;
using namespace boost::units;
using namespace std0x;
using namespace std0x::chrono;

template<endian_t, int w1, int w2=0, int w3=0, int w4=0, int w5=0>
struct bitfield {
  char placeholder[(w1 + w2 + w3 + w4 + w5 + 7)/8];
}; // bitfield

namespace internet {

#pragma pack(push, 1)

struct IpHeader {
  bitfield<big, 4, 4> version_headerLength;
  enum { version, headerLength };
  bitfield<big, 3, 1, 1, 1, 1> differentiated_services;
  enum { precedence, low_delay, high_thruput, high_reliability, minimize_cost };
  endian<big, uint16_t> total_length;
  endian<big, uint16_t> identification;
  bitfield<big, 1, 1, 1, 13> flags_frag;
  enum { reserved, dont_frag, more_frag, frag_offset };
  endian<big, uint8_t> time_to_live;
  endian<big, uint8_t> protocol;
  endian<big, uint16_t> header_checksum;
  endian<big, uint32_t> source_address;
  endian<big, uint32_t> destination_address;
}; // IpHeader

struct UdpHeader {
  endian<big, uint16_t> source_port;
  endian<big, uint16_t> destination_port;
  endian<big, uint16_t> length;
  endian<big, uint16_t> checksum;
}; // UdpHeader

#pragma pack(pop)

} // internet

#pragma pack(push, 1)

struct UserMessage {
  endian<little, time_point<system_clock, duration<int64_t, nano> > > timestamp;
  endian<little, uint32_t> aircraft_id;
  struct Position {
    endian<little, quantity<si::length, int32_t> > x;
    endian<little, quantity<si::length, int32_t> > y;
    endian<little, quantity<si::length, int32_t> > z;
  } position;
  struct Attitude {
    endian<little, quantity<si::plane_angle, int8_t> > heading;
    endian<little, quantity<si::plane_angle, int8_t> > pitch;
    endian<little, quantity<si::plane_angle, int8_t> > roll;
  } attitude;
}; // UserMessage

struct Packet {
  internet::IpHeader ipHeader;
  internet::UdpHeader udpHeader;
  UserMessage userMessage;
}; // Packet

#pragma pack(pop)

int main() {
  Packet packet;
  packet.ipHeader.source_address = 0x12345678;
  packet.udpHeader.destination_port = 1234;
  packet.userMessage.timestamp = system_clock::now();
  packet.userMessage.position.y = 17 * si::meter;
} // main

Terry,
template<endian_t, int w1, int w2=0, int w3=0, int w4=0, int w5=0> struct bitfield { char placeholder[(w1 + w2 + w3 + w4 + w5 + 7)/8]; }; // bitfield
By the way, you could always plug into the facility for swapping user defined types for the above.
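I won't reproduce the actual hook here, but the general shape I have in mind is the usual ADL customization point - roughly like this sketch (illustration only, with made-up names, not the library's real extension mechanism):

#include <cstdint>

// The swapping machinery makes an unqualified call to reverse_bytes(x), so a
// user-defined type opts in by providing an overload in its own namespace.

inline void reverse_bytes(std::uint16_t& x) { x = static_cast<std::uint16_t>((x >> 8) | (x << 8)); }

namespace user {

struct bitfield16 { std::uint16_t storage; };

inline void reverse_bytes(bitfield16& b) { ::reverse_bytes(b.storage); }

} // namespace user

template <typename T>
void swap_udt(T& value)
{
    reverse_bytes(value);   // unqualified: ADL finds user::reverse_bytes for UDTs
}

int main()
{
    user::bitfield16 b = { 0x1234 };
    swap_udt(b);                       // ends up calling user::reverse_bytes
    return b.storage == 0x3412 ? 0 : 1;
}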
struct UserMessage { endian<little, time_point<system_clock, duration<int64_t, nano> > > timestamp;
I was under the impression that Beman's library doesn't support any types besides integers? This definitely looks like a user defined type to me.
struct Position { endian<little, quantity<si::length, int32_t> > x; endian<little, quantity<si::length, int32_t> > y; endian<little, quantity<si::length, int32_t> > z; } position;
Again, UDTs?
struct Packet { internet::IpHeader ipHeader; internet::UdpHeader udpHeader; UserMessage userMessage; }; // Packet #pragma pack(pop)
int main() { Packet packet; packet.ipHeader.source_address = 0x12345678; packet.udpHeader.destination_port = 1234; packet.userMessage.timestamp = system_clock::now(); packet.userMessage.position.y = 17 * si::meter; } // main
I definitely see the elegance of this code and as I said before, I am not opposed to implementing the typed interface. Having said that, I do have several concerns with how this gets actually read from/sent to a device and additionally, as mentioned in another post, I think this "neat" code may be a little opaque for someone not familiar with it. Tom

Tomas, ----- Original Message ----- From: "Tomas Puverle" <Tomas.Puverle@morganstanley.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Sunday, May 30, 2010 10:27 AM Subject: Re: [boost::endian] Request for comments/interest
struct UserMessage { endian<little, time_point<system_clock, duration<int64_t, nano> > > timestamp;
I was under the impression that Beman's library doesn't support any types besides integers? This definitely looks like a user defined type to me.
I'm not referring to Beman's endian. I'm referring to "mine" that I posted in an attachment in this thread on May 27. It is still basically Beman's approach with all the integer dependence removed. It provides endian<endian_t, T> where T could be any over-the-wire-sendable type (i.e. no pointers, virtual members, etc).
struct Position { endian<little, quantity<si::length, int32_t> > x; endian<little, quantity<si::length, int32_t> > y; endian<little, quantity<si::length, int32_t> > z; } position;
Again, UDTs?
Yep!
struct Packet { internet::IpHeader ipHeader; internet::UdpHeader udpHeader; UserMessage userMessage; }; // Packet #pragma pack(pop)
int main() { Packet packet; packet.ipHeader.source_address = 0x12345678; packet.udpHeader.destination_port = 1234; packet.userMessage.timestamp = system_clock::now(); packet.userMessage.position.y = 17 * si::meter; } // main
I definitely see the elegance of this code and as I said before, I am not opposed to implementing the typed interface. Having said that, I do have several concerns with how this gets actually read from/sent to a device and additionally, as mentioned in another post, I think this "neat" code may be a little opaque for someone not familiar with it.
I think the "opaqueness" is a major feature of this approach. The user of the higher-level classes has no idea about the underlying endianness of the message fields or if they've been byte-swapped yet or not. It just works. And if the systems engineers later decided to change some of the endianness of some of the message fields, the message definition code would change, but none of the code that uses the messages would have to change at all. Please let's address these device interface read/write concerns with a concrete example. For the example above, on the receiving end, to determine the time delay between sender and receiver, the following code could be used (assuming system_clock's are synced). char buf[BIG_ENOUGH]; size_t bytesRead = device.read(buf, sizeof(buf)); assert(bytesRead >= sizeof(Packet)); const Packet& packet = *reinterpret_cast<const Packet*>(buf); system_clock::duration delay = system_clock::now() - packet.userMessage.timestamp; On the sending end... Packet packet; // load the packet with stuff packet.userMessage.timestamp = system_clock::now(); device.send(&packet, sizeof(packet)); Of course, I'm just showing one field for simplicity. Setting/getting the other fields would be done similarly. How would this example best be done with your library? terry

Terry Golubiewski wrote:
Tomas Puverle wrote:
I definitely see the elegance of this code and as I said before, I am not opposed to implementing the typed interface. Having said that, I do have several concerns with how this gets actually read from/sent to a device and additionally, as mentioned in another post, I think this "neat" code may be a little opaque for someone not familiar with it.
I think the "opaqueness" is a major feature of this approach.
Opacity is good when you want high level abstract behavior, but shouldn't preclude the (potentially) higher efficiency low level API.
The user of the higher-level classes has no idea about the underlying endianness of the message fields or if they've been byte-swapped yet or not.
Either the typed endian values store an extra bit to determine whether the external to host swapping has already occurred, in which case they cannot be written out as is, or the swapping must occur *each time* the value is read. I presume the latter to be the case. When the external and host order match, this isn't a big deal, but imagine code that ignorantly loops over such objects repeatedly reading values when the external and host don't match. The overhead can be significant. That's where the opacity can be problematic and the user of such a library must be warned clearly about this potential.
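To make that concrete, here is a contrived stand-in (not anyone's actual endian class, just a type that converts on every read, as on a little-endian host reading big-endian data):

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct big_uint32
{
    std::uint32_t raw;                       // stored in external (big-endian) order
    operator std::uint32_t() const           // pays for a byte reversal on *each* read
    {
        return (raw >> 24) | ((raw >> 8) & 0x0000FF00u)
             | ((raw << 8) & 0x00FF0000u) | (raw << 24);
    }
};

int main()
{
    std::vector<big_uint32> prices(1024);
    for (std::size_t i = 0; i != prices.size(); ++i)
        prices[i].raw = static_cast<std::uint32_t>(i);

    // Object-based: 1000 passes over the data means 1000 hidden swaps per element.
    std::uint64_t total = 0;
    for (int pass = 0; pass != 1000; ++pass)
        for (std::size_t i = 0; i != prices.size(); ++i)
            total += prices[i];

    // A function-based alternative would swap once up front (e.g. an in-place
    // conversion over the whole vector) and then loop over plain integers.
    std::cout << total << "\n";              // keep the work from being optimized away
}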
It just works.
Yes, but it just works inefficiently in some use cases, unless I missed something.
And if the systems engineers later decided to change some of the endianness of some of the message fields, the message definition code would change, but none of the code that uses the messages would have to change at all.
I see the value of that, but presuming that the code that reads such a message from the external device is encapsulated, the reading and swapping likely is done together once per message, so changes to endianness are localized without the special types. There is danger of writing code in which reading and swapping is not localized, so it would be appropriate to alert users to the value of doing so.

_____
Rob Stewart                          robert.stewart@sig.com
Software Engineer, Core Software     using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

Opacity is good when you want high level abstract behavior, but shouldn't preclude the (potentially) higher efficiency low level API.
Completely agree. I do think, however, that Terry has presented some compelling use cases and I'd like to re-iterate that based on the feedback I've received, it seems inevitable that I'll have to implement the higher-level "typed" interface before I can consider submitting the library for formal review.
Yes, but it just works inefficiently in some use cases, unless I missed something.
Yes - I am about to reply to one of Terry's posts where we'll discuss this further.
I see the value of that, but presuming that the code that reads such a message from the external device is encapsulated, the reading and swapping likely is done together once per message, so changes to endianness are localized without the special types. There is danger of writing code in which reading and swapping is not localized, so it would be appropriate to alert users to the value of doing so.
Thank you for so succinctly re-iterating the point that Dave Handley and I have been trying to make. I think it is important to realize that the "encapsulation" that you get by using the typed interface will not be particularly good if the types just get used throughout the application rather than being contained in the I/O part of the code. In my view, it almost amounts to just a false sense of security. Tom

struct Packet { internet::IpHeader ipHeader; internet::UdpHeader udpHeader; UserMessage userMessage; }; // Packet
I do see one problem with this - you have defined your struct with a specified endianness. However, if you wanted to send it to a different destination which requires a different endianness, you would have to duplicate your data structure. The more I think about it, the more it's clear to me that this represents an example of the wrong separation of concerns (no flame intended - simply a software engineering term): In the typed approach, the endianness becomes the property of the message. However, I think endianness should be the property of the destination, no?
char buf[BIG_ENOUGH]; <snip>
Terry, I like your example as it does indeed seem to make simple things simple. But other than my comment above, could you consider the following (simple) scenario:

- read a matrix from disk and for the sake of argument, let's say it's in a big-endian format
- read a number representing the number of its dimensions
- based on the number of elements, swap each element to the machine's representation

Please note that this isn't based on any existing code I have and I just typed it in, so I expect there are syntax errors :)

void * data = ...;
read(fh, data, size);

MatrixHeader * mh = static_cast<MatrixHeader*>(data);
std::size_t nDims = swap<big_to_machine>(mh->nDims);
std::size_t nElem = std::accumulate(
    make_iterator<big_to_machine>(mh->dims),
    make_iterator<big_to_machine>(mh->dims + nDims),
    std::size_t(1),
    std::multiplies<std::size_t>());

//this is the part I am most interested in how you'd solve using the typed
//approach. Ignore the potential alignment issue.
double * dataBeg = reinterpret_cast<double*>(
    static_cast<unsigned char*>(data) + sizeof(MatrixHeader));
double * dataEnd = dataBeg + nElem;
swap_in_place<big_to_machine>(dataBeg, dataEnd);

//process dataBeg
...

I can't help but think that the typed approach will have to be O(N) in both little and big-endian scenarios.

Tom

Aha! ----- Original Message ----- From: "Tomas Puverle" <tomas.puverle@morganstanley.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Tuesday, June 01, 2010 11:01 AM Subject: Re: [boost::endian] Request for comments/interest
struct Packet { internet::IpHeader ipHeader; internet::UdpHeader udpHeader; UserMessage userMessage; }; // Packet
I do see one problem with this - you have defined your struct with a specified endianness.
Yes, the over the wire endianness is always (in my world) specified independently and does not depend on the source or destination endianness. Both endpoints use the same header file (ideally) or each side constructs their own. If it gets more complicated than this, then we use CORBA.
However, if you wanted to send it to a different destination which requires a different endianness, you would have to duplicate your data structure.
Yes, this design assumes compile-time knowledge of endianness. So, the author cannot specify the endianness at runtime; it's a template argument.
The more I think about it, the more it's clear to me that this represents an example of the wrong separation of concerns (no flame intended - simply a software engineering term):
In the typed approach, the endianness becomes the property of the message.
I think this is the appropriate separation of concerns. The endianness is a property of how the data in the message is represented.
However, I think endianness should be the property of the destination, no?
I definitely disagree, because the code should not know the endianness of the destination. There could be several destinations, possibly with differing endianness, for the same message. Also, if a new board gets added that has a different endianness, I should only have to change the code on that board. No other boards should be affected.
char buf[BIG_ENOUGH]; <snip>
Terry, I like your example as it does indeed seem to make simple things simple.
For the kind of work that I do, yes.
- read a matrix from disk and for the sake of argument, let's say it's in a big-endian format - read a number representing the number of its dimensions - based on the number of elements, swap each element to the machine's representation
//this is the part I am most interested in how you'd solve using the typed
//approach. Ignore the potential alignment issue.
double * dataBeg = reinterpret_cast<double*>(
    static_cast<unsigned char*>(data) + sizeof(MatrixHeader));
double * dataEnd = dataBeg + nElem;
swap_in_place<big_to_machine>(dataBeg, dataEnd);
//process dataBeg ...
I can't help but think that the typed approach will have to be O(N) in both little and big-endian scenarios.
AHA! Yes. If done this way, reading in the entire array and then swapping it in one step, then one would need an endian_swapper, like yours to make it a no-op in the native case.

However, if I had to solve this problem with my tools, I wouldn't read the whole array in at once. I would only read in the header, but not the data at first. Unfortunately, I can't remember how to do this with boost::multiarray. If we just allocate a really big vector for the data, then I would read each element into the data portion like this:

vector<double> dataPortion;
dataPortion.reserve(nElem);
for (int i=0; i!=nElem; ++i) {
  endian<big, double> x;
  read(fh, &x, sizeof(x));
  dataPortion.push_back(double(x));
}

This requires an extra copy into the temporary x, which cannot be a register, before storing the native version into the vector. I see your point now for the in-place case. Thank you!

Assuming I read the whole array into memory at once, as you did, then I might try this (assuming aliasing rules don't really matter). To address the in-place case with a typed interface, I would make a templated copy function:

template<endian_t E1, class T, endian_t E2>
inline endian<E2,T>* copy(const endian<E1, T>* first, const endian<E1,T>* last, endian<E2,T>* dest)
{
  if (E1 == E2 && first == dest)
    return dest + (last - first);
  while (first != last)
    *dest++ = *first++;
  return dest;
}

(I don't recall how to generalize the parameters as iterators. Help anyone? A sketch follows at the end of this message.)

... then to swap in place ...

endian<big, double>* src = reinterpret_cast<endian<big, double>*>(mh + 1);
double* dst = reinterpret_cast<double*>(src);
(void) copy(src, src+nElem, dst); // Almost a no-op in the native-endian case.

... but I'm not sure if this is legal from an aliasing standpoint. I think this approach would have similar performance to your swap() and swap_in_place(). Tomorrow night, I'll make some measurements.

terry
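Here is the kind of generalization I was fishing for - a rough, untested sketch that deduces the value types through std::iterator_traits, so plain pointers and container iterators both work (it assumes the endian<> assignment does the actual conversion, as above):

#include <iterator>
#include <boost/type_traits/is_same.hpp>

template <class InIter, class OutIter>
OutIter endian_copy(InIter first, InIter last, OutIter dest)
{
    typedef typename std::iterator_traits<InIter>::value_type  src_type;
    typedef typename std::iterator_traits<OutIter>::value_type dst_type;

    // Same representation and same location: nothing to convert or copy.
    if (boost::is_same<src_type, dst_type>::value &&
        first != last &&
        static_cast<const void*>(&*first) == static_cast<const void*>(&*dest))
    {
        std::advance(dest, std::distance(first, last));
        return dest;
    }

    for (; first != last; ++first, ++dest)
        *dest = *first;     // endian<> assignment converts when the orders differ
    return dest;
}

With this, the in-place call would be endian_copy(src, src + nElem, dst), and the conversion (or no-op) falls out of the value types the iterators carry.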

I'm sorry. The copy-in-place example, which read...

endian<big, double>* src = reinterpret_cast<endian<big, double>*>(mh + 1);
double* dst = reinterpret_cast<double*>(src);
(void) copy(src, src+nElem, dst); // Almost a no-op in the native-endian case.

... should have read ...

endian<big, double>* src = reinterpret_cast<endian<big, double>*>(mh + 1);
endian<native, double>* dst = reinterpret_cast<endian<native, double>*>(src);
(void) copy(src, src+nElem, dst); // Almost a no-op in the native-endian case.

That is, for the copy() function to work, the destination must be in the form of an endian<> rather than a raw double pointer:

double* dst ==> endian<native, double>* dst

Sorry for any confusion.

terry

Terry Golubiewski wrote:
Tomas Puverle wrote:
struct Packet { internet::IpHeader ipHeader; internet::UdpHeader udpHeader; UserMessage userMessage; }; // Packet
I do see one problem with this - you have defined your struct with a specified endianness.
Yes, the over the wire endianness is always (in my world) specified independently and does not depend on the source or destination endianness. Both endpoints use the same header file (ideally) or each side constructs their own. If it gets more complicated than this, then we use CORBA.
However, if you wanted to send it to a different destination which requires a different endianness, you would have to duplicate your data structure.
Yes, this design assumes compile-time knowledge of endianness. So, the author cannot specify the endianness at runtime; it's a template argument.
The more I think about it, the more it's clear to me that this represents an example of the wrong separation of concerns (no flame intended - simply a software engineering term):
In the typed approach, the endianness becomes the property of the message.
I think this is the appropriate separation of concerns. The endianness is a property of how the data in the message is represented.
Disagree.
However, I think endianness should be the property of the destination, no?
I definitely disagree, because the code should not know the endianness of the destination. There could be several destinations, possibly with differing endianness, for the same message. Also, if a new board gets added that has a different endianness, I should only have to change the code on that board. No other boards should be affected.
You seem to be making a distinction between the communication/transfer layer's endianness and that of the destination. That is not the distinction I make nor, I think, that Tomas is making. For me, there are two endiannesses that matter: the application's and the external, communication/transfer layer's. If the app must manipulate or inspect the data, then its endianness is, of necessity, host order. The means to transfer the data elsewhere dictates whether and to which endianness the application's order must be changed. The apps consuming that data after it leaves the first application are of no concern. (Though one might choose the external format to avoid any swapping where possible.)

Once the data reaches another application, presumably the destination in your wording, that application must deal with the communication layer's endianness and decide whether to adjust it. That swapping is wholly independent of the swapping done by the first app.

What Tomas noted is that the application's order should not change with the external endianness unless the app does nothing with the data but transfer it. In that case, the app won't use the endian library at all. OTOH, if the app does inspect or manipulate the data, then the endianness will be swapped upon importation into the app and, possibly, again on the exportation. The swapping happens at the boundaries of the application. Within the application, such data should always be in host order.
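Schematically, the discipline I'm describing looks like this (names invented, the bodies elided - the only point is where the swapping lives):

#include <cstdint>
#include <vector>

struct Quote { std::uint32_t id; std::uint32_t price; };   // host order everywhere inside the app

// The only two places that know the wire is (say) big-endian:
std::vector<Quote> import_quotes(/* source */)
{
    std::vector<Quote> quotes;
    // read raw bytes, then swap big -> host once, right here
    return quotes;
}

void export_quotes(const std::vector<Quote>& quotes /*, destination */)
{
    // swap host -> big (or copy-convert) once, right here
    (void)quotes;
}

int main()
{
    std::vector<Quote> quotes = import_quotes();
    // ... everything in between sees plain host-order values ...
    export_quotes(quotes);
}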
- read a matrix from disk and for the sake of argument, let's say it's in a big-endian format - read a number representing the number of its dimensions - based on the number of elements, swap each element to the machine's representation
//this is the part I am most interested in how you'd solve
//using the typed approach. Ignore the potential alignment
//issue.
double * dataBeg = reinterpret_cast<double*>(
    static_cast<unsigned char*>(data) + sizeof(MatrixHeader));
double * dataEnd = dataBeg + nElem;
swap_in_place<big_to_machine>(dataBeg, dataEnd);
//process dataBeg ...
I can't help but think that the typed approach will have to be O(N) in both little and big-endian scenarios.
AHA! Yes. If done this way, reading in the entire array and then swapping it in one step, then one would need an endian_swapper, like yours to make it a no-op in the native case. However, if I had to solve this problem with my tools, I wouldn't read the whole array in at once. I would only read in the header, but not the data at first.
That's just what Tomas did in his code. [snip code showing element-by-element copying of data]
I think this approach would have similar performance to your swap() and swap_in_place(). Tomorrow night, I'll make some measurements.
Your code could be as efficient as using Tomas' swap(), but not swap_in_place(), I think. Measurements are good.

_____
Rob Stewart

You seem to be making a distinction between the communication/transfer layer's endianness and that of the destination. <snip>
Glad to see you agree.
Your code could be as efficient as using Tomas' swap(), but not swap_in_place(), I think. Measurements are good.
I am looking forward to seeing these. Additionally, just pulling data through the CPU without doing anything to it might invalidate the caches of other processors, cause memory bus bottlenecks, etc. - effects that might be hard to measure.

Rob Stewart wrote:
You seem to be making a distinction between the communication/transfer layer's endianness and that of the destination. That is not the distinction I make nor, I think, that Tomas is making. For me, there are two endiannesses that matter: the application's and the external, communication/transfer layer's. If the app must manipulate or inspect the data, then its endianness is, of necessity, host order. The means to transfer the data elsewhere dictates whether and to which endianness the application's order must be changed. The apps consuming that data after it leaves the first application are of no concern. (Though one might choose the external format to avoid any swapping where possible.)
Once the data reaches another application, presumably the destination in your wording, that application must deal with the communication layer's endianness and decide whether to adjust it. That swapping is wholly independent of the swapping done by the first app.
What Tomas noted is that the application's order should not change with the external endianness unless the app does nothing with the data but transfer it. In that case, the app won't use the endian library at all. OTOH, if the app does inspect or manipulate the data, then the endianness will be swapped upon importation into the app and, possibly, again on the exportation. The swapping happens at the boundaries of the application. Within the application, such data should always be in host order.
I agree with you that the over-the-wire or physical medium format is fixed and the apps must adjust. I do agree that the host must convert the data into the native format to operate on it. I do not agree that the entire message should be swapped-in-place unless performance is critical.
Your code could be as efficient as using Tomas' swap(), but not swap_in_place(), I think. Measurements are good.
Good prediction! The type-based code is more efficient than Tomas' swap() but less efficient than swap_in_place. Both approaches are equally efficient for the same-endian case. terry

Terry Golubiewski wrote:
Rob Stewart wrote:
I do not agree that the entire message should be swapped-in-place unless performance is critical.
I stated as much in a previous post by saying I could imagine some preferring to use the object-based approach until performance was known to be an issue and then switching to the function-based approach.
Your code could be as efficient as using Tomas' swap(), but not swap_in_place(), I think. Measurements are good.
Good prediction! The type-based code is more efficient than Tomas' swap() but less efficient than swap_in_place.
I think you'll find the test, as crafted, misled you. Swap will prove at least as efficient when corrected.

_____
Rob Stewart

Your code could be as efficient as using Tomas' swap(), but not swap_in_place(), I think. Measurements are good.
Good prediction! The type-based code is more efficient than Tomas' swap() but less efficient than swap_in_place.
That is surprising. swap() should have no overhead compared to the endian object types. Then again, it's very early code, and I've done no optimization yet. It could be something as simple as an "inline" missing.
Both approaches are equally efficient for the same-endian case.
Do you mean endian object and swap_in_place or endian object and swap? Tom

That is surprising. swap() should have no overhead compared to the endian object types. Then again, it's very early code, and I've done no optimization yet. It could be something as simple as an "inline" missing.
Both approaches are equally efficient for the same-endian case.
Do you mean endian object and swap_in_place or endian object and swap?
I didn't see a missing inline. I think I'm beginning to understand what is going on. The type-based endian approach suffers in the swap-in-place case, because it's designed to be an efficient copier. My implementation of type-based endian requires a copy to a temporary, which must be in memory (not a register), before rewriting the data to the same location in converted form. std::swap() must do this too, but it can use a register for the temporary. This (I think) explains the performance advantage swapping shows for in-place conversion. However, the swap approach suffers when doing a copy, because it has to read the data, then write it (in-place), and then read it again, before writing it to the destination, if you don't want to modify the original data. terry

Aha!
Sounds good! :)
In the typed approach, the endianness becomes the property of the message.
I think this is the appropriate separation of concerns. The endianness is a property of how the data in the message is represented.
Ok, I don't want to belabor this point, as you are the expert in your domain, so I'm just going to say I am not convinced and leave it at that.
I see your point now for the in-place case. Thank you!
Great! Code always helps me, glad we were able to get on the same page.
I think this approach would have similar performance to your swap() and swap_in_place(). Tomorrow night, I'll make some measurements.
That would certainly be useful. However, I do contend that there are other effects, too, which will be hard to measure, which I discussed in another one of my emails. Tom

I think this approach would have similar performance to your swap() and swap_in_place(). Tomorrow night, I'll make some measurements.
The differences are greater than I expected. I did have to write special copy routines for the typed approach, because std::copy can't copy a buffer to itself. So, I added special copy functions to endian.hpp that do allow copy-in-place. Here are the results, on a 64-bit Wintel platform. I could not get the endian-swap functions to compile with gcc 4.3 and Ubuntu.

terry

===== CONVERT IN PLACE ========

The benchmark program generates a homogeneous data file with 2^20 4-byte unsigned, big-endian integers. The array is read into memory as one big blob, and then is converted in place to machine-endian (little). The result is memcmp'ed to verify the expected result. The reading-in, converting, and verifying was repeated 1000 times.

Swap Based: 9 seconds
Type Based: 13 seconds

When the disk-data-file was in little endian, both approaches came in at around 6 seconds.

--- swap-based ---

for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
    swap_in_place<big_to_machine>(array2.begin(), array2.end());
  }
  assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
}

--- type based ---

for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
    disk_array& src = reinterpret_cast<disk_array&>(array2);
    interface::copy(src.begin(), src.end(), array2.begin());
  }
  assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
}

======== CONVERT & COPY =========

The benchmark program still generates the same big, homogeneous data file. The array is still read into memory as one big blob. But this time, the conversion is copied to another array, i.e. not in place. The result is still memcmp'ed. Still repeated 1000 times.

Swap Based: 18 seconds
Type Based: 14 seconds

When the disk-data-file was in little endian format both programs took about 9 seconds.

--- swap based ---

for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
    array_type::const_iterator src = tmp_array.begin();
    array_type::const_iterator end = tmp_array.end();
    array_type::iterator dst = array2.begin();
    for ( ; src != end; ++src, ++dst)
      *dst = swap<little_to_machine>(*src);
  }
  assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
}

--- typed based ---

for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
    interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
  }
  assert(memcmp(&array1, &array2, sizeof(array_type)) == 0);
}

Terry Golubiewski wrote:

I have some problems with your tests.
===== CONVERT IN PLACE ========
The benchmark program generates a homogeneous data file with 2^20 4-byte unsigned, big-endian integers. The array is read into memory as one big blob, and then is converted in place to machine-endian (little). The result is memcmp'ed to verify the expected result. The reading-in, converting, and verifying was repeated 1000 times.
Swap Based: 9 seconds Type Based: 13 seconds
When the disk-data-file was in little endian, both approaches came in at around 6 seconds.
--- swap-based --- for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
Reading the data into memory shouldn't be part of the timed code. If you apply the swap-in-place logic an odd number of times, the result will be host order.
swap_in_place<big_to_machine>(array2.begin(), array2.end()); } assert(memcmp(&array1, &array2, sizeof(array_type)) == 0); }
--- type based ---
for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&array2), sizeof(array2));
As above.
disk_array& src = reinterpret_cast<disk_array&>(array2); interface::copy(src.begin(), src.end(), array2.begin());
Two lines instead of one. I find this less desirable. I imagine you could make your copy function more helpful.
} assert(memcmp(&array1, &array2, sizeof(array_type)) == 0); }
======== CONVERT & COPY =========
The benchmark program still generates the same big, homogeneous data file. The array is still read into memory as one big blob. But this time, the conversion is copied to another array, i.e. not in place. The result is still memcmp'ed. Still repeated 1000 times.
Swap Based: 18 seconds Type Based 14 seconds
When the disk-data-file was in little endian format both programs took about 9 seconds.
--- swap based --- for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array)); array_type::const_iterator src = tmp_array.begin(); array_type::const_iterator end = tmp_array.end(); array_type::iterator dst = array2.begin(); for ( ; src != end; ++src, ++dst) *dst = swap<little_to_machine>(*src);
s/little_to_machine/big_to_machine/?
} assert(memcmp(&array1, &array2, sizeof(array_type)) == 0); }
--- typed based --- for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array)); interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
} assert(memcmp(&array1, &array2, sizeof(array_type)) == 0); }
I don't see any code reading the resulting values. That unfairly taints the tests in favor of the object-based approach. If the underlying data is big endian, then the object-based approach implies swapping on every access to the data. Reading each value once would be the optimal use case for the object-based approach. Reading multiple times would clearly favor the function-based approach.

_____
Rob Stewart

Robert Stewart wrote...
I have some problems with your tests.
Me too, I didn't get the results I expected. :o)
Reading the data into memory shouldn't be part of the timed code. If you apply the swap-in-place logic an odd number of times, the result will be host order.
The test application is reading a big-endian (not-native) file into memory from a disk file. The reading disk into memory code is exactly the same for both approaches.

The application might be a video decompression library where the data was compressed and stored on a big-endian machine. The receiving machine is little-endian though (in my case) and should decode the large "coefficient" file into memory in native format. Because once in memory, each coefficient will be accessed multiple times. This is the "killer-app" that favors in-place-swapping.

The disk file remains in big-endian format, and each time is converted to little-endian. I just do it a lot of times so that I get a long measurement interval, so interference from other threads cancels out. The deviation of the results was around 0.1 seconds. Reading data from a file may also help to avoid cache issues.

I wasn't trying to measure how fast each approach is, just how to measure their relative speed using a specific application where swapping was expected to have an advantage.
disk_array& src = reinterpret_cast<disk_array&>(array2); interface::copy(src.begin(), src.end(), array2.begin());
Two lines instead of one. I find this less desirable. I imagine you could make your copy function more helpful.
I could make it more helpful; but then it would be called swap_in_place and would be implemented just like Tomas'!
for ( ; src != end; ++src, ++dst) *dst = swap<little_to_machine>(*src);
s/little_to_machine/big_to_machine/?
I did the tests in both big-endian --> little-endian and little-endian --> little-endian. Unfortunately, I forgot to change the swap<arg> back to big_to_machine when I cut & pasted. Sorry about the confusion.
--- typed based --- for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array)); interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
array2 is just a plain array<uint32_t, SIZE>. tmp_array is an array<endian<big, uint32_t>, SIZE>. So the conversion/copy happens in interface::copy.
I don't see any code reading the resulting values. That unfairly taints the tests in favor of the object-based approach. If the underlying data is big endian, then the object-based approach implies swapping on every access to the data. Reading each value once would be the optimal use case for the object-based approach. Reading multiple times would clearly favor the function-based approach.
Both tests convert the disk-file into native machine format, so there is no reason to re-read the data once it has been converted, other than to verify its correctness. terry

Terry Golubiewski wrote:
Robert Stewart wrote...
Reading the data into memory shouldn't be part of the timed code. If you apply the swap-in-place logic an odd number of times, the result will be host order.
The test application is reading a big-endian (not-native) file into memory from a disk file. The reading disk into memory code is exactly the same for both approaches.
It is the same in both cases, but there could be plenty of variation in that larger body of code you actually timed that swamps what you wanted to measure.
The application might be a video decompression library where the data was compressed and stored on a big-endian machine. The receiving machine is little-endian though (in my case) and should decode the large "coefficient" file into memory in native format.
I'm not arguing against reading data from disk, just against timing that part of the code.
Because once in memory, each coefficient will be accessed multiple times. This is the "killer-app" that favors in-place-swapping.
Sure, but you didn't read the data multiple times.
Reading data from a file may also help to avoid cache issues.
That may be true, yet hardly useful when trying to compare the performance of the two approaches.
I wasn't trying to measure how fast each approach is, just how to measure their relative speed using a specific application where swapping was expected to have an advantage.
To do that, you must measure the speed of each approach.
disk_array& src = reinterpret_cast<disk_array&>(array2); interface::copy(src.begin(), src.end(), array2.begin());
Two lines instead of one. I find this less desirable. I imagine you could make your copy function more helpful.
I could make it more helpful; but then it would be called swap_in_place and would be implemented just like Tomas'!
Yours would still be in terms of the endian types.
--- typed based --- for (int trial=0; trial != 1000; ++trial) { { ifstream input("array.dat", ios::binary); input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array)); interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
array2 is just a plain array<uint32_t, SIZE>. tmp_array is an array<endian<big, uint32_t>, SIZE>. So the conversion/copy happens in interface::copy.
I don't think that's right. The copy is done in interface::copy(), but the conversion (swap) doesn't occur until you read from the endian objects, right? My point is that you need to walk the array summing the values, or something, to read each value at least once. Otherwise, your code merely copies the data as read from disk to the endian objects but doesn't swap any bytes, and yet you compare that to function-based code that does swap. (Don't forget to write the sum -- or whatever result you compute -- to stdout, for example, to avoid the logic's being optimized out of your test.)

_____
Rob Stewart

Robert Stewart wrote:
I could make it more helpful; but then it would be called swap_in_place and would be implemented just like Tomas'!
Yours would still be in terms of the endian types.
What interface would you suggest?
--- typed based ---
for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
    interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
array2 is just a plain array<uint32_t, SIZE>. tmp_array is an array<endian<big, uint32_t>, SIZE>. So the conversion/copy happens in interface::copy.
I don't think that's right. The copy is done in interface::copy(), but the conversion (swap) doesn't occur until you read from the endian objects, right?
Not right. The conversion happens inside of interface::copy which is performing a swap-in-place in this example. After the call to interface::copy, array2 is in native-endian format. Both tests convert the disk file to native-endian format in memory, so no other comparison is really necessary after the conversion, other than the memcmp to verify that the conversion worked.
My point is that you need to walk the array summing the values, or something, to read each value at least once. Otherwise, your code merely copies the data as read from disk to the endian objects but doesn't swap any bytes, and yet you compare that to function-based code that does swap.
I think the memcmp does that. terry
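For concreteness (interface::copy itself isn't posted in this thread), a rough, purely illustrative sketch of that kind of conversion/copy -- read big-endian storage, write native values, with the conversion happening during the copy -- could look like this; the names are mine, not the library's:

#include <stdint.h>
#include <cstddef>

// Illustrative stand-in only: read 32-bit big-endian values from raw
// storage and write native uint32_t values to the destination.
inline uint32_t load_big32(const unsigned char* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16)
         | (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

inline void convert_copy_big32(const unsigned char* src, std::size_t count, uint32_t* dst)
{
    for (std::size_t i = 0; i != count; ++i)
        dst[i] = load_big32(src + 4 * i);   // conversion and copy in one pass
}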

Terry Golubiewski wrote:
Robert Stewart wrote:
Terry Golubiewski wrote:
Robert Stewart wrote:
Terry Golubiewski wrote:
I could make it more helpful; but then it would be called swap_in_place and would be implemented just like Tomas'!
Yours would still be in terms of the endian types.
What interface would you suggest?
One that takes the raw pointer directly to avoid the reinterpret_cast in user code.
--- typed based ---
for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
    interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
array2 is just a plain array<uint32_t, SIZE>. tmp_array is an array<endian<big, uint32_t>, SIZE>. So the conversion/copy happens in interface::copy.
I don't think that's right. The copy is done in interface::copy(), but the conversion (swap) doesn't occur until you read from the endian objects, right?
Not right. The conversion happens inside of interface::copy which is performing a swap-in-place in this example. After the call to interface::copy, array2 is in native-endian format.
OK, I misunderstood what you were doing in interface::copy(). _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

--- typed based ---
for (int trial=0; trial != 1000; ++trial) {
  {
    ifstream input("array.dat", ios::binary);
    input.read(reinterpret_cast<char*>(&tmp_array), sizeof(tmp_array));
    interface::copy(tmp_array.begin(), tmp_array.end(), array2.begin());
Does this actually swap anything? Doesn't this just copy the data to unswapped objects that *would* swap on access? That's hardly a fair comparison.
array2 is just a plain array<uint32_t, SIZE>. tmp_array is an array<endian<big, uint32_t>, SIZE>. So the conversion/copy happens in interface::copy.
... The conversion happens inside of interface::copy which is performing a *swap-in-place* in this example. After the call to interface::copy, array2 is in native-endian format.
Correction: the above example does not do a conversion-in-place. It does a conversion/copy. Sorry for any confusion. terry

First of all, *thank you* for going into all this trouble and writing the code. However, ...
I have some problems with your tests.
I was going to make the same points as Robert, but he beat me to it. Additionally (I don't think this was mentioned), I don't think the verification should be part of the test, either.
When the disk-data-file was in little endian, both approaches came in at around 6 seconds.
This, I think, you'll find is the overhead of reading the file & verification. In the little endian data case, swap_in_place<>() does nothing! :)
Swap Based: 18 seconds Type Based 14 seconds
This is unexpected...
When the disk-data-file was in little endian format both programs took about 9 seconds.
But this is as I'd expect - copying should take the same amount of time for both. Thanks again. If there is a performance discrepancy between swap (soon to be endian_cast<>) and the object-based approach, I will make sure to fix it, of course. Tom

Tomas Puverle wrote:
First of all, *thank you* for going into all this trouble and writing the code.
I needed to write some code to understand your use-case. Benchmarks are always criticized, especially hastily created ones. However, it's important to code something because "best" always implicitly implies a "for what". Doing this has enhanced my awareness of endianness issues (and I thought I already knew everything). I'm considering adding an endian_convert_in_place<endian_t, T>(T& x) function to my library that will be implemented using std::swap. So *thank you* for raising my awareness, and hopefully others are monitoring this thread as well.
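For what it's worth, a minimal sketch of such an in-place conversion built on std::swap might look like the following; this is just my guess at the shape of it (ignoring the endian_t parameter for brevity), not Terry's actual code, and a real version would presumably compile to a no-op when the requested order already matches the machine's:

#include <algorithm>  // std::swap
#include <cstddef>

// Reverse the bytes of x in place by swapping pairs from the outside in.
template <class T>
void endian_convert_in_place(T& x)
{
    unsigned char* p = reinterpret_cast<unsigned char*>(&x);
    for (std::size_t i = 0, j = sizeof(T) - 1; i < j; ++i, --j)
        std::swap(p[i], p[j]);
}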
I have some problems with your tests.
I was going to make the same points as Robert, but he beat me to it. Additionally (I don't think this was mentioned), I don't think the verification should be part of the test, either.
As Robert pointed out, it should do something with the result, otherwise the whole thing could be optimized away by a clever compiler. Plus, the assert has alerted me to over-zealous optimizations. It's the same overhead for both tests, so it shouldn't affect the results.
When the disk-data-file was in little endian, both approaches came in at around 6 seconds.
Both approaches seem to do nothing. That's as it should be. I'm just using cygwin's "time" command to evaluate the results. So the times given represent program initialization/deinitialization; generating and writing the data file; opening, reading and closing the datafile for each iteration; and verifying that the conversion was correct for each iteration using memcmp. I'm not trying to measure how fast each approach is in absolute terms. Just given a common test app, how do the resulting times compare. The overhead should be the same in both cases.
This, I think, you'll find is the overhead of reading the file & verification. In the little endian data case, swap_in_place<>() does nothing! :)
Both approaches seem to do nothing in the same-endian case.
Swap Based: 18 seconds Type Based 14 seconds
This is unexpected...
Swap-then-copy is less efficient than just a reverse copy (in the general case). These differences are more significant than I expected. If we allow 9 seconds for the disk i/o and memcmp, then the overhead is approximately Type-based: 36% overhead (14-9)/14 Swap-based: 50% overhead (18-9)/18.
Thanks again. If there is a performance discrepancy between swap (soon to be endian_cast<>) and the object-based approach, I will make sure to fix it, of course.
If you implement endian_cast<> using a reverse-copy instead of swapping, you will see a performance improvement. terry

On Thu, Jun 3, 2010 at 9:38 AM, Terry Golubiewski <tjgolubi@netins.net> wrote:
I'm not trying to measure how fast each approach is in absolute terms.
I think the rest of us want as close to absolute measurements as possible.
Just given a common test app, how do the resulting times compare. The overhead should be the same in both cases.
Since you have included initialization, file read, etc. the resulting times are not very interesting. You have basically diminished the performance differences, skewing the results. Jon

Jon wrote:
Terry wrote:
I'm not trying to measure how fast each approach is in absolute terms.
I think the rest of us want as close to absolute measurements as possible.
Just given a common test app, how do the resulting times compare. The overhead should be the same in both cases.
Since you have included initialization, file read, etc. the resulting times are not very interesting. You have basically diminished the performance differences, skewing the results.
Please submit something more absolute and interesting. I'd like to see it. terry

On Thu, Jun 3, 2010 at 11:01 AM, Terry Golubiewski <tjgolubi@netins.net> wrote:
Please submit something more absolute and interesting. I'd like to see it.
I'm busy at work today. But you can just grab the time in microseconds prior to starting the endian operations, and again when it's done. Print out t2-t1. The reason this is important: If it takes 1 second for my code to run, and yours takes 2 seconds, but we add 10 minutes to the total time, then it looks like our code performs equally well. In reality, mine was twice as fast for the operations in question. Jon

----- Original Message ----- From: "Terry Golubiewski" <tjgolubi@netins.net> To: <boost@lists.boost.org> Sent: Thursday, June 03, 2010 5:38 PM Subject: Re: [boost] [boost::endian] Request for comments/interest
Tomas Puverle wrote:
Swap Based: 18 seconds Type Based 14 seconds
This is unexpected...
Swap-then-copy is less efficient than just a reverse copy (in the general case). These differences are more significant than I expected. If we allow 9 seconds for the disk i/o and memcmp, then the overhead is approximately
Type-based: 36% overhead (14-9)/14 Swap-based: 50% overhead (18-9)/18.
Being a little more conservative: if we allow 6 seconds for the disk i/o and memcmp, then Type-based: (14-6)=8 and Swap-based: (18-6)=12. I.e., Swap-based spends 50% more time than Type-based.
Thanks again. If there is a performance discrepancy between swap (soon to be endian_cast<>) and the object-based approach, I will make sure to fix it, of course.
If you implement endian_cast<> using a reverse-copy instead of swapping, you will see a performance improvement.
You are right. Swap should make 50% more copies as a temporary is needed. This is in line with the figures. Vicente
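To spell out that count (a toy illustration of the argument, not benchmark code): reversing 4 bytes into a separate destination is 4 byte moves, while reversing them in place via std::swap is 2 swaps of 3 moves each, i.e. 6 moves:

#include <algorithm>  // std::swap

// Reverse copy: 4 byte moves.
inline void reverse_copy4(const unsigned char* src, unsigned char* dst)
{
    dst[0] = src[3]; dst[1] = src[2]; dst[2] = src[1]; dst[3] = src[0];
}

// In-place reversal: 2 swaps, each going through a temporary (3 moves),
// i.e. 6 moves -- 50% more than the reverse copy.
inline void swap_in_place4(unsigned char* p)
{
    std::swap(p[0], p[3]);
    std::swap(p[1], p[2]);
}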

----- Original Message ----- From: "Terry Golubiewski" <tjgolubi@netins.net> To: <boost@lists.boost.org> Sent: Saturday, May 29, 2010 5:55 AM Subject: Re: [boost] [boost::endian] Request for comments/interest
----- Original Message ----- From: "vicente.botet" <vicente.botet@wanadoo.fr> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Friday, May 28, 2010 5:21 PM Subject: Re: [boost::endian] Request for comments/interest
int main() { struct MyClass { char data[12]; };
endian<little, MyClass> lil_1; endian<little, MyClass> lil_2; endian<big, MyClass> big_1;
lil_1 = lil_2; big_1 = lil_1; lil_2 = big_1;
} // main
Maybe, but what about more heterogeneous structures?
Ok, here's an actual example that compiles on VC9. (Notice how I worked in "chrono")
terry
template<endian_t, int w1, int w2=0, int w3=0, int w4=0, int w5=0>
struct bitfield {
  char placeholder[(w1 + w2 + w3 + w4 + w5 + 7)/8];
}; // bitfield
namespace internet {
#pragma pack(push ,1)
struct IpHeader {
  bitfield<big, 4, 4> version_headerLength;
  enum { version, headerLength };
  bitfield<big, 3, 1, 1, 1, 1> differentiated_services;
  enum { precedence, low_delay, high_thruput, high_reliability, minimize_cost };
Hi, We are working on a bitfield library and I'm curious to contrast your approach and ours. I'm interested in seeing the complete code and/or documentation for the bitfield part, if available. Could you provide it? Thanks, Vicente

Vicente,
This is not completely true, as your implementation will at least visit the instance.
As Dave already pointed out, it *actually* is completely true. In the case when nothing is supposed to happen, nothing happens - please refer to the implementation. If you still have doubts afterwards, we can discuss it further. Tom

----- Original Message ----- From: "Tomas Puverle" <Tomas.Puverle@morganstanley.com> To: <boost@lists.boost.org> Sent: Sunday, May 30, 2010 2:40 AM Subject: Re: [boost] [boost::endian] Request for comments/interest
Vicente,
This is not completely true, as your implementation will at least visit the instance.
As Dave already pointed out, it *actually* is completely true. In the case when nothing is supposed to happen, nothing happens - please refer to the implementation. If you still have doubts afterwards, we can discuss it further.
Oh, I see. In fact your structures are formatted with a single endianness for all the fields. This is usually the case, but in the general case you will need to format some fields in big endian and others in little endian. Messages such as the one posted by Terry, which mixes little and big endian, cannot be formatted by your library without visiting each field; some of them will be a no-op while others will be a swap-op. This was the reason I said: "This is not completely true, as your implementation will at least visit the instance". If all the fields of your message have a uniform endianness, then you don't need to visit the structure and your swap in place can do nothing if the endianness matches. Best, Vicente

----- Original Message ----- From: "Tomas Puverle" <tomas.puverle@morganstanley.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Friday, May 28, 2010 11:05 AM Subject: Re: [boost::endian] Request for comments/interest
Thanks Dave,
2) To copy or not to copy. <snip>
Dave brings up an important example which I'd like to expand on a little:
Suppose your application generates a large amount of data which may need to be endian-swapped.
For the sake of argument, say I've just generated an 10GB array that contains some market data, which I want to send in little-endian format to some external device.
In the case of the typed interface, in order to send this data, I would have to construct a new 10GB array of little32_t and then copy the data from the host array to the destination array.
Since IP packets cannot be 10GB, I submit that you're going to have to break your 10GB array down into messages. Then you're going to copy portions of the 10GB array into those messages and send them. In the type-based approach the message may indeed contain an array.

boost::array<endian<little, uint32_t>, MaxFragmentSize> buffer;

That you copy fragments of the 10GB array into before sending, and then on the receiving side, copy them out. The user on either side of the interface can extract the data from the fields without knowing the endianness of the field or the endianness of the machine he's working on. He doesn't have to know to call a swap function. He just extracts the data using the standard copy algorithm. The conversion happens automatically by implicit conversions. One copy into each message. One copy out. What could be better than that?
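To make the "one copy in, implicit conversion" point concrete, here is a self-contained toy version of such an object-based field; it is a stand-in written for illustration (fixed to big-endian storage for brevity), not the class under discussion:

#include <algorithm>
#include <cstddef>
#include <stdint.h>

// Stores a 32-bit value as big-endian bytes and converts implicitly
// to/from native uint32_t.
class be_uint32
{
    unsigned char b_[4];
public:
    be_uint32() { b_[0] = b_[1] = b_[2] = b_[3] = 0; }
    be_uint32(uint32_t v) { *this = v; }
    be_uint32& operator=(uint32_t v)
    {
        b_[0] = (unsigned char)(v >> 24);
        b_[1] = (unsigned char)(v >> 16);
        b_[2] = (unsigned char)(v >> 8);
        b_[3] = (unsigned char)(v);
        return *this;
    }
    operator uint32_t() const
    {
        return (uint32_t(b_[0]) << 24) | (uint32_t(b_[1]) << 16)
             | (uint32_t(b_[2]) << 8)  |  uint32_t(b_[3]);
    }
};

// Filling a message buffer: each assignment converts host order to
// big-endian on the fly -- one copy in, no explicit swap call.
void fill_message(const uint32_t* host, std::size_t n, be_uint32* msg)
{
    std::copy(host, host + n, msg);
}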
This has several problems: 1) It is relying on the fact that the typed class can be exactly overlaid over the space required by the underlying type. This is an implementation detail but a concern nonetheless, especially if, for example, you start packing your members for space efficiency.
In the example I posted, on non-native machines, an object "T" is represented inside of endian<endian_t, T> as "char storage[sizeof T]". Provided that the compiler provides some kind of "packed" directive (all that I use do), then field alignment isn't an issue. Doesn't swap_in_place<>() make the same assumption of overlaying types?
2) The copy always happens, even if the data doesn't need to change, since it's already in the correct "external" format. This is useless work - not only does it use one CPU to do nothing 10 billion times, it also unnecessarily taxes the memory interfaces, potentially affecting other CPUs/threads (and more, but I hope this is enough of an illustration)
In the message-based interfaces that I am used to, one always must copy some data structures into a message before you send it. After all, if you're using byte-streams, then endianness doesn't really apply. There is always at least one copy into the message. The typed-interface only requires one copy of data into each message. In both techniques you have to copy the information out of the message, if you use it, at least one time.

The problem with the swapping mechanism is that the swap requires a write and a read from every location, before you even read it, whether you actually read the fields or not. And/or, the user has to remember whether he/she has already swapped each field. Since messages are often passed from one protocol layer to the next, usually written by different authors, I shudder to think of the integration experience.

The typed method requires one read from each memory location no matter what the endianness is. (Unfortunately, in the case of poorly optimizing compilers, the read on non-native machines may actually make two copies.) The only efficiency issue with the typed interface is that non-native-endianness values are read out in reverse order byte-by-byte, where the native endian fields can be read out of the message more efficiently using word-sized and aligned data transfers.
swap_in_place<>(r) where r is a range (or swap_in_place<>(begin,end), which is provided for convenience) will be zero cost if no work needs to be done, while having the same complexity as the above (but only!) if swapping is required. With the swap_in_place<>() approach, you only pay for what you need (to borrow from the C++ mantra)
With the typed-approach you only pay for the message fields that you read. No extra work is required on native-endian machines. I think the typed-approach actually fits the "only pay for what you use" mantra better.

I get the impression that I'm missing something. If you're game, I'd like to consider a real-world use-case that uses multiple endians and has different protocol layers. That is, one over-the-wire packet has several layers of headers, possibly with a different endianness than the contained user payload. This is common on PC's, which often have big-endian IP headers and then have a little-endian user payload. The whole packet is read in from a socket at once into a data buffer owned by a unique_ptr, so the message is not copied from layer-to-layer. I work on proprietary, non-internet networks, so I'm not sure which protocol headers we should use for a use-case. In my wireless applications, the headers are usually padded to an integral number of bytes, but fields within the headers are sometimes not byte-aligned.

We're only considering byte-ordering here too. An equally important part of the endian problem for me, is the bit-ordering. For this I use a similar technique for portable bitfields

bitfield<endian_t, w1, w2, w3, w4, w5, ...>

I'm not sure yet how your swapping technique would affect that. If we can find the time, I think our discussions would benefit from a concrete example to measure against.

BTW, I like the interface design of your library and the way you use macros and iterators to ease the swappability of classes, including inheritance. I'm arguing against swapping though because I've been using the type-based method (but not Beman's exactly) successfully for a long time. I'm very biased. :o). terry

Terry,
Since IP packets cannot be 10GB, I submit that you're going to have to break your 10GB array down into messages.
Thank you for your continued feedback. You have raised some interesting points and issues. Please see my comments inline. First of all, note that I was careful to say "send the data to an external device" but I believe that you are thinking about the problem purely from the point of view of networking, at least your message seems to imply so in this case. I am not going to have to break my message into packets. And *even* if the message needs to be broken into packets, it will not be done by me, but by the OS. I will just call write()/WriteFile() or whatever with the data I have available. I am not going to break it up into packets ahead of time.
boost::array<endian<little, uint32_t>, MaxFragmentSize> buffer;
That you copy fragments of the 10GB array into before sending, and then on the receiving side, copy them out. The user on either side of the interface can extract the data from the fields without knowing the endianness of the field or the endianness of the machine he's working on. He doesn't have to know to call a swap function. He just extracts the data using the standard copy algorithm. The conversion happens automatically by implicit conversions. One copy into each message. One copy out. What could be better than that?
Sorry for the overquote but I wanted to make sure we didn't lose the context in this particular case. Better than 1 copy in/out is 0 copies in/out. Here's how: I memory map a file. Now the data is in memory. Alternatively, I allocate a buffer and read from a disk or from a network. Note that the OS needs to make a copy to get the data into the user-space buffer. At this point, ideally, I should be able to start using the data. If I understand your suggestion correctly, you would, at this point, construct a collection of endian types in place in this buffer, allocate a new buffer and copy the data out to it, during which the swapping would happen. If this is correct, I think there are several problems with this approach:
- this may not seem relevant but I think this is really ugly and much less maintainable than the functional approach.
- I can do a swap_in_place<>() on the original buffer. 0 copies. 0 work in the case when the endianness is already correct.
- On the other hand, you have to allocate a new buffer, placement new all the endian types, perform the copy. Cost: Allocation + at least 2N operations in either case, not to mention the other bad side effects related to unnecessary work which I already detailed in another post.
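For concreteness, a rough sketch of that zero-copy path under POSIX assumptions (mmap, plain 32-bit data, error handling omitted); the in-place reversal loop merely stands in for what swap_in_place<>() would do here and is not the library's implementation:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>
#include <cstddef>

uint32_t* map_and_fix(const char* path, std::size_t count, bool data_is_big_endian)
{
    int fd = open(path, O_RDWR);
    void* p = mmap(0, count * sizeof(uint32_t),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);

    uint32_t* v = static_cast<uint32_t*>(p);

    // Detect the host order at run time; a real library would know this
    // at compile time and make the matching case a genuine no-op.
    const uint32_t one = 1;
    const bool host_is_big = *reinterpret_cast<const unsigned char*>(&one) == 0;

    if (data_is_big_endian != host_is_big)      // only touch the data if needed
        for (std::size_t i = 0; i != count; ++i) {
            uint32_t x = v[i];
            v[i] = (x >> 24) | ((x >> 8) & 0xff00u)
                 | ((x << 8) & 0xff0000u) | (x << 24);
        }
    return v;
}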
then field alignment isn't an issue.
Correct, but it may affect the quality of the code the compiler can generate. I believe my approach doesn't suffer from this problem.
Doesn't swap_in_place<>() make the same assumption of overlaying types?
No, since the type just gets written back to the same type and location. The only assumption swap_in_place<>() makes is that a swapped type is again representable in the original type. And yes, I will give you that this is a non-trivial assumption, as, as others have pointed out, this may not be valid for floating point values or even pointers on some machines.
In the message-based interfaces that I am used to, one always must copy some data structures into a message before you send it.
But, as I pointed out, we are not just talking about network protocols.
In both techniques you have to copy the information out of the message, if you use it, at least one time. The problem with the swapping mechanism is that the swap requires a write and a read from every location,
This is not necessarily true. It may be the case with swap_in_place but not necessarily with swap<>(). However, while I have agreed with you that some people might find the endian types useful, I have to take exception to your claim above, that you always need to swap everything. That is simply not true! I can, just like you, do the following:

int i = swap<big_to_machine>(s.i);

Actually, I personally find this code rather readable and in many respects, I find it more instructive than the following:

int i = s.i;

where s.i happens to be an endian type. I would go as far as to argue that my code is much more self-documenting and would lead to fewer surprises for a programmer not familiar with your code.
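As a toy sketch of that functional style (assuming a little-endian host for simplicity; the tag names are mine, not the proposed interface -- on a big-endian host the big_to_machine overload would instead be the identity):

#include <stdint.h>

struct big_to_machine {};
struct little_to_machine {};

inline uint32_t swap_bytes(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}

// Order differs from the (assumed little-endian) host: reverse the bytes.
inline uint32_t do_swap(big_to_machine, uint32_t x)    { return swap_bytes(x); }
// Order already matches the host: a no-op.
inline uint32_t do_swap(little_to_machine, uint32_t x) { return x; }

template <class Direction>
inline uint32_t swap(uint32_t x) { return do_swap(Direction(), x); }

// usage, analogous to the line quoted above:
//   uint32_t i = swap<big_to_machine>(s_i);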
With the typed-approach you only pay for the message fields that you read.
And equally with the functional approach.
No extra work is required on native-endian machines.
But I think I've demonstrated that there is actually a significant amount of extra work required even on native-endian machines.
I think the typed-approach actually fits the "only pay for what you use" mantra better.
Disagree.
I get the impression that I'm missing something. If you're game, I'd like to consider a real-world use-case that uses multiple endians and has different protocol layers.
Of course. I like the idea of actual use cases.
We're only considering byte-ordering here too. An equally important part of the endian problem for me, is the bit-ordering. For this I use a similar technique for portable bitfields
bitfield<endian_t, w1, w2, w3, w4, w5, ...>
I am not sure what the above means, sorry.
I'm arguing against swapping though because I've been using the type-based method (but not Beman's exactly) successfully for a long time. I'm very biased. :o).
This has been very useful. Thank you. Tom

- I can do a swap_in_place<>() on the original buffer. 0 copies. 0 work in the case when the endianness is already correct. - On the other hand, you have to allocate a new buffer, placement new all the endian types, perform the copy. Cost: Allocation + at least 2N operations in either case, not to mention the other bad side effects related to unnecessary work which I already detailed in another post.
OK. I hadn't been considering the memory mapped use-case. But I still feel confident that the typed approach doesn't suffer any extra overhead on same-endian machines. Would you please give me a realistic concrete example, so that I can code it up and measure the number of copies that my approach uses. I am really "stuck" on the network messaging paradigm, so an example will help me.
We're only considering byte-ordering here too. An equally important part of the endian problem for me, is the bit-ordering. For this I use a similar technique for portable bitfields
bitfield<endian_t, w1, w2, w3, w4, w5, ...>
I am not sure what the above means, sorry.
It's somewhat off-topic, and I didn't post the code for this class, because it's complicated, but bitfield<big, 3, 6, -21, 9> means four consecutive unsigned integer bit fields with widths of 3, 6, 21, and 9 bits, respectively, 39 bits occupying 5 bytes, with 1 bit of zero as the least-significant-bit of the last octet. The first 3 bits correspond to the most-significant bits of the first octet. The sign on -21 means that that field is a signed integral field, while the others are unsigned. bitfield<little, 3, 6, -21, 9> means almost the same thing, but the first 3 bits are the least significant bits of the first octet, and the single pad bit of zero at the end would be the most-significant bit of the final octet. My point was that I don't think of endianness as being a "swapping" problem, but an "ordering" problem. The bit-endianness of messages makes this clear for me, since swapping doesn't work.

It's somewhat off-topic, and I didn't post the code for this class, because it's complicated, but
bitfield<big, 3, 6, -21, 9>
means four consecutive unsigned integer bit fields with widths of 3, 6, 21, and 9 bits, respectively,
Presumably, based on your explanation, you mean 3 unsigned bit fields and one signed one. Seems neat. One question though - is this supposed to swap the "bitfields" (i.e. is this a generalized endian bit swapper) or is the intent to have a bitfield which then gets treated as one "larger" type and gets swapped byte-wise?
My point was that I don't think of endianess as being a "swapping" problem, but an "ordering" problem. The bit-endianness of messages makes this clear for me, since swapping doesn't work.
I would be interested in knowing what you'd think about implementing the swapping using the user defined type endian swapping facility in my library. At first sight, it seems like it should be possible. Tom

----- Original Message ----- From: "Tomas Puverle" <Tomas.Puverle@morganstanley.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Sunday, May 30, 2010 10:33 AM Subject: Re: [boost::endian] Request for comments/interest
It's somewhat off-topic, and I didn't post the code for this class, because it's complicated, but
bitfield<big, 3, 6, -21, 9>
means four consecutive unsigned integer bit fields with widths of 3, 6, 21, and 9 bits, respectively,
Presumably, based on your explanation, you mean 3 unsigned bit fields and one signed one.
Yes, that's what I meant. Sorry.
One question though - is this supposed to swap the "bitfields" (i.e. is this a generalized endian bit swapper) or is the intent to have a bitfield which then gets treated as one "larger" type and gets swapped byte-wise?
This is a generalized bit swapper. One need not swap the "larger" type. The bitfield is implemented internally as a char array, large enough to hold all the bits. In the above example 3+6+21+9 = 39 bits = 5 bytes (octets).
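As a quick compile-time sanity check of that arithmetic, using the same (width sum + 7) / 8 rounding as the placeholder array in the bitfield template posted earlier:

enum { example_bits  = 3 + 6 + 21 + 9 };         // 39 bits
enum { example_bytes = (example_bits + 7) / 8 }; // 5 octets, leaving 1 pad bit

// C++03-style static assertion: fails to compile unless the size is 5.
typedef char check_size[example_bytes == 5 ? 1 : -1];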
My point was that I don't think of endianess as being a "swapping" problem, but an "ordering" problem. The bit-endianness of messages makes this clear for me, since swapping doesn't work.
I would be interested in knowing what you'd think about implementing the swapping using the user defined type endian swapping facility in my library. At first sight, it seems like it should be possible.
Let's do this! terry

Sorry for my late replies. I have been offline for a bit. On Fri, May 28, 2010 at 2:27 PM, Terry Golubiewski <tjgolubi@netins.net> wrote:
Since IP packets cannot be 10GB, I submit that you're going to have to break your 10GB array down into messages. Then you're going to copy portions of the 10GB array into those messages and send them. In the type-based approach the message may indeed contain an array.
As Tomas already pointed out, explicitly breaking a blob of data into chunks for network transfer is superfluous in streaming protocols such as TCP. The protocol will take care of this more efficiently at the network and/or transport layer than I can at the application layer. Further, if I break my data blob into k segments and insert my own message header as a preamble to each segment, then I unnecessarily add k*sizeof(my_header) to the stream.
In the message-based interfaces that I am used to, one always must copy some data structures into a message before you send it.
As Tomas also pointed out, this is not a marshaling library. However, a marshaling library could make good use of an "endian" library such as this. In addition to my network needs, I also have at least two image file formats that have the option of storing the arbitrarily sized data on disk in big- or little-endian format. A library that allows me to slurp up the image data into memory and swap in place if needed is terribly useful to me. Jon

----- Original Message ----- From: "Jonathan Franklin" <franklin.jonathan@gmail.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Tuesday, June 01, 2010 4:41 PM Subject: Re: [boost::endian] Request for comments/interest Sorry for my late replies. I have been offline for a bit. On Fri, May 28, 2010 at 2:27 PM, Terry Golubiewski <tjgolubi@netins.net> wrote:
Since IP packets cannot be 10GB, I submit that you're going to have to break your 10GB array down into messages. Then you're going to copy portions of the 10GB array into those messages and send them. In the type-based approach the message may indeed contain an array.
As Tomas already pointed out, explicitly breaking a blob of data into chunks for network transfer is superfluous in streaming protocols such as TCP. The protocol will take care of this more efficiently at the network and/or transport layer than I can at the application layer. Further, if I break my data blob into k segments and insert my own message header as a preamble to each segment, then I unnecessarily add k*sizeof(my_header) to the stream.
In the message-based interfaces that I am used to, one always must copy some data structures into a message before you send it.
As Tomas also pointed out, this is not a marshaling library. However, a marshaling library could make good use of an "endian" library such as this. In addition to my network needs, I also have at least two image file formats that have the option of storing the arbitrarily sized data on disk in big- or little-endian format. A library that allows me to slurp up the image data into memory and swap in place if needed is terribly useful to me. Jon

In another post, I finally understood this use-case. Again (being the non-swap advocate) to address this, I would suggest...

template<endian_t E1, class T, endian_t E2>
endian<E2, T>* copy(const endian<E1, T>* first, const endian<E1, T>* last, endian<E2, T>* dest) {
  if (E1 == E2 && first == dest)
    return dest + distance(first, last);
  while (first != last)
    *dest++ = *first++;
  return dest;
}

Assuming you just read a TCP stream of big-endian T's into memory at a location 'char* buf', then you could endian-convert the T's, if necessary, like this...

const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf);
endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);

Would this do what you want? terry

Terry Golubiewski wrote:
Jonathan Franklin wrote:
Please clean up Outlook's quoting and attribution messes.
In addition to my network needs, I also have at least two image file formats that have the option of storing the arbitrarily sized data on disk in big- or little-endian format. A library that allows me to slurp up the image data into memory and swap in place if needed is terribly useful to me.
In another post, I finally understood this use-case. Again (being the non-swap advocate) to address this, I would suggest...
template<endian_t E1, class T, endian_t E2>
endian<E2, T>* copy(const endian<E1, T>* first, const endian<E1, T>* last, endian<E2, T>* dest) {
  if (E1 == E2 && first == dest)
    return dest + distance(first, last);
  while (first != last)
    *dest++ = *first++;
  return dest;
}
I assume you'd actually dispatch through a class template's static member function in order to get compile time detection of E1 == E2 via partial specialization. The loop might well be as efficient as Tomas' swap_in_place().
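Something along the lines Rob suggests -- a class template whose partial specialization picks the "no byte work" path at compile time -- might look like this; it is illustrative only and constrained to uint32_t so it stays self-contained (reverse_bytes and the other names are mine):

#include <algorithm>
#include <stdint.h>

enum order { order_big, order_little };

inline uint32_t reverse_bytes(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}

template <order From, order To>
struct copy32_impl                       // orders differ: convert as we copy
{
    static uint32_t* run(const uint32_t* first, const uint32_t* last, uint32_t* dest)
    {
        for (; first != last; ++first, ++dest)
            *dest = reverse_bytes(*first);
        return dest;
    }
};

template <order Same>
struct copy32_impl<Same, Same>           // orders match: plain copy or nothing
{
    static uint32_t* run(const uint32_t* first, const uint32_t* last, uint32_t* dest)
    {
        if (first == dest)               // already in place: no work at all
            return dest + (last - first);
        return std::copy(first, last, dest);
    }
};

template <order From, order To>
uint32_t* endian_copy32(const uint32_t* first, const uint32_t* last, uint32_t* dest)
{
    return copy32_impl<From, To>::run(first, last, dest);
}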
Assuming you just read a TCP stream of big-endian T's into memory at a location 'char* buf', then you could endian-convert the T's, if necessary, like this...
const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf); endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);
A call to swap_in_place() avoids the ugly reinterpret_casts and looks far simpler. Thus far, I'm not convinced that there is any value in the endian types, but keep trying! _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

----- Original Message ----- From: "Stewart, Robert" <Robert.Stewart@sig.com> To: <boost@lists.boost.org> Sent: Wednesday, June 02, 2010 1:24 PM Subject: Re: [boost] [boost::endian] Request for comments/interest
Terry Golubiewski wrote:
Assuming you just read a TCP stream of big-endian T's into memory at a location 'char* buf', then you could endian-convert the T's, if necessary, like this...
const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf); endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);
A call to swap_in_place() avoids the ugly reinterpret_casts and looks far simpler.
You need a cast to see a void* as a T*. This was already the case with Tom's code:
void * data = ...; read(fh, data, size); MatrixHeader * mh = static_cast<MatrixHeader*>(data);
Thus far, I'm not convinced that there is any value in the endian types, but keep trying!
If I'm not wrong, swap_in_place can be defined on top of the endian types as follows

template <endian_t E, typename T>
void swap_in_place(T* data) {
  const endian<E, T>* src = reinterpret_cast<const endian<E, T>*>(data);
  endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(data);
  copy(src, src+numTs, dst);
}

used as follows

swap_in_place<big>(data);

This swap_in_place is a no-op if native is big. Best, Vicente

vicente.botet wrote:
Rob Stewart wrote:
Terry Golubiewski wrote:
Assuming you just read a TCP stream of big-endian T's into memory at a location 'char* buf', then you could endian-convert the T's, if necessary, like this...
const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf); endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);
A call to swap_in_place() avoids the ugly reinterpret_casts and looks far simpler.
You need a cast to see a void* as a T*. This was already the case with Tom's code:
void * data = ...; read(fh, data, size); MatrixHeader * mh = static_cast<MatrixHeader*>(data);
Quite right, but a static_cast suffices, it is to a simpler type, and there is only one.
Thus far, I'm not convinced that there is any value in the endian types, but keep trying!
If I'm not wrong, swap_in_place can be defined on top of the endian types as follows
template <endian_t E, typename T>
void swap_in_place(T* data) {
  const endian<E, T>* src = reinterpret_cast<const endian<E, T>*>(data);
  endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(data);
  copy(src, src+numTs, dst);
}
used as follows
swap_in_place<big>(data);
This swap_in_place is a no-op if native is big.
I'll accept that the casts are zero cost in this case, and copy() can be written better than Terry showed previously, but the extra complexity may thwart optimization, so performance testing or an examination of emitted object code would be necessary to verify your assertion. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

swap_in_place<big>(data);
This swap_in_place is a no-op if native is big.
A no-op in what respect? I (might) give you that there will be no observable state change. But from a runtime perspective, I don't agree. However, there is a much bigger problem with all of the examples using copy, please see point 25.2.1 bullet 3 below from the C++ standard:

25.2.1 Copy [lib.alg.copy]
template<class InputIterator, class OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result);
1 Effects: Copies elements in the range [first, last) into the range [result, result + (last - first)) starting from first and proceeding to last. For each non-negative integer n < (last-first), performs *(result + n) = *(first + n).
2 Returns: result + (last - first).
*3 Requires: result shall not be in the range [first, last).*

Unless I am missing something, I don't see why you are implementing swap_in_place *on top* of the endian types. That just seems backwards to me (other than being illegal). Is there a fundamental design issue you can see by implementing the endian types on top of the functional primitives? Tom

However, there is a much bigger problem with all of the examples using copy, please see point 25.2.1 bullet 3 below from the C++ standard:
25.2.1 Copy [lib.alg.copy]
template<class InputIterator, class OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result);
1 Effects: Copies elements in the range [first, last) into the range [result, result + (last - first)) starting from first and proceeding to last. For each non-negative integer n < (last-first), performs *(result + n) = *(first + n).
2 Returns: result + (last - first).
*3 Requires: result shall not be in the range [first, last).*
Unless I am missing something, I don't see why you are implementing swap_in_place *on top* of the endian types. That just seems backwards to me (other than being illegal)
Is there a fundamental design issue you can see by implementing the endian types on top of the functional primitives?
Tom
I'm implementing swap-in-place on top of endian types to compare that approach to yours. I would **never** do that in practice (unless I really need to ;o) ). I had to make special versions of copy that allowed the destination to be the same as the source. Yes, I think swapping the bytes underneath an object in place is a very bad design practice. You can get some performance gain by doing it though, as I recently found out. I was unable to match the swap-in-place performance using my typed approach. I'm not really sure why not though. But my experiments suggest that if you try to implement a typed-endian interface on top of a swapping one, then you suffer an even worse performance penalty. terry
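One way to get a copy that tolerates destination == source -- a sketch of the idea, not Terry's actual code -- is std::transform, which, unlike std::copy, explicitly allows the result iterator to equal the source iterator:

#include <algorithm>
#include <cstddef>
#include <stdint.h>

inline uint32_t reverse_bytes(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}

// Convert a run of 32-bit values in place; each element is read before
// it is overwritten, so aliasing the output with the input is fine.
void convert_in_place(uint32_t* first, uint32_t* last)
{
    std::transform(first, last, first, reverse_bytes);
}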

const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf);
endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);
A call to swap_in_place() avoids the ugly reinterpret_casts and looks far simpler.
Thus far, I'm not convinced that there is any value in the endian types, but keep trying!
swap_in_place still needs a reinterpret_cast too to convert from the 'char * buf' assembled from the hypothetical TCP stream. What bugs me is that 'buf' doesn't point to a 'T' yet. It points to a half-baked 'T'. Only after one calls swap_in_place() is the object really a 'T'. The reinterpret_casts are valuable here because they show that the programmer is doing a potentially dangerous thing -- changing the memory underlying a structure in place! terry

Terry Golubiewski wrote:
Rob Stewart wrote:
Please, people, don't drop attributions and keep them simple.
const endian<big, T>* src = reinterpret_cast<const endian<big, T>*>(buf); endian<native, T>* dst = reinterpret_cast<endian<native, T>*>(buf);
(void) copy(src, src+numTs, dst);
A call to swap_in_place() avoids the ugly reinterpret_casts and looks far simpler.
swap_in_place still needs a reinterpret_cast too to convert from the 'char * buf' assembled from the hypothetical TCP stream.
Not quite. As discussed in a separate post, a static_cast is sufficient and there's only one cast needed. That's still less ugly. (Since buf is a void * in the snipped code, static_cast would work for your casts, too.)
What bugs me is that 'buf' doesn't point to a 'T' yet. It points to a half-baked 'T'. Only after one calls swap_in_place() is the object really a 'T'.
That would apply to floating point, certainly. However, for integer types, your statement is false. Before the swap, they are still integers, they just don't have the desired value. Even in the object-based approach, the object isn't a proper T until you access it and the swap occurs. The implicit swap veneer just gives you the impression that it is already in the desired form.
The reinterpret_casts are valuable here because they show that the programmer is doing a potentially dangerous thing -- changing the memory underlying a structure in place!
Sure, but why make things uglier than necessary? _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

Terry wrote:
swap_in_place still needs a reinterpret_cast too to convert from the 'char * buf' assembled from the hypothetical TCP stream.
Robert Stewart wrote:
Not quite. As discussed in a separate post, a static_cast is sufficient and there's only one cast needed. That's still less ugly. (Since buf is a void * in the snipped code, static_cast would work for your casts, too.)
'buf' must be a pointer to "something". The void* is just masking what is really analogous to a reinterpret_cast. void* is just as ugly as reinterpret_cast. Maybe more so.
What bugs me is that 'buf' doesn't point to a 'T' yet. It points to a half-baked 'T'. Only after one calls swap_in_place() is the object really a 'T'.
That would apply to floating point, certainly. However, for integer types, your statement is false. Before the swap, they are still integers, they just don't have the desired value.
I disagree. Physically, they are compatible, but logically, they are different. I think the logical view matters more.
Even in the object-based approach, the object isn't a proper T until you access it and the swap occurs. The implicit swap veneer just gives you the impression that it is already in the desired form.
An endian<X, T> is not the same as a T. Even endian<native, T> must be converted to a T before you can operate on it. The veneer is important, because it works like a T, without actually being a T. I think that subtle distinction is important and is the primary motivation for type-based endian.

I object to your use of the word "swap" though. My type-based approach does a reverse_copy, not a swap. This distinction is the fundamental topic of this discussion. When T is not integral, the distinction becomes more important. Although I have found endian<big, double> to be useful in the past, I wouldn't recommend that anyone actually do that (though I wouldn't prohibit it either). For production systems, I would prefer that messages contain something like...

#pragma pack(push, 1)
template<endian_t E>
class ieee754::binary64 {
  // storage in ieee754 format.
public:
  binary64();                    // 0.0 in ieee format
  binary64(double x);            // converts from native double to ieee format
  operator double() const;       // converts ieee format to native double
  binary64& operator=(double x); // convert native double to ieee format
};

struct Message {
  typedef interface::ieee754::binary64<big> Double;
  Double x, y, z;
};
#pragma pack(pop)

terry

Terry Golubiewski wrote:
Robert Stewart wrote:
Terry Golubiewski wrote:
(Since buf is a void * in the snipped code, static_cast would work for your casts, too.)
'buf' must be a pointer to "something". The void* is just masking what is really analogous to a reinterpret_cast. void* is just as ugly as reinterpret_cast. Maybe more so.
Data read from disk is untyped. Many APIs use void *, others char *. Either way, there's no useful type information. If you have a void *, you can use static_cast to another pointer type. If you have a char *, you must use reinterpret_cast to another pointer type (or two static_casts).
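For illustration, the three forms side by side (MatrixHeader is borrowed from the snippet quoted earlier in the thread, with its fields made up here; the wrapper function exists only to give the casts somewhere to live):

#include <stdint.h>

struct MatrixHeader { uint32_t rows, cols; };

void cast_examples(void* vbuf, char* cbuf)
{
    // void* -> T*: a static_cast is enough.
    MatrixHeader* a = static_cast<MatrixHeader*>(vbuf);

    // char* -> T*: reinterpret_cast ...
    MatrixHeader* b = reinterpret_cast<MatrixHeader*>(cbuf);

    // ... or two static_casts via void*.
    MatrixHeader* c = static_cast<MatrixHeader*>(static_cast<void*>(cbuf));

    (void)a; (void)b; (void)c;
}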
What bugs me is that 'buf' doesn't point to a 'T' yet. It points to a half-baked 'T'. Only after one calls swap_in_place() is the object really a 'T'.
That would apply to floating point, certainly. However, for integer types, your statement is false. Before the swap, they are still integers, they just don't have the desired value.
I disagree. Physically, they are compatible, but logically, they are different.
We're in violent agreement.
I think the logical view matters more.
I can't argue that point, but I don't think it signifies. In either case, the data read from disk is not usable until endianness has been addressed. Once it has, all remaining logic works with the appropriate type.
Even in the object-based approach, the object isn't a proper T until you access it and the swap occurs. The implicit swap veneer just gives you the impression that it is already in the desired form.
An endian<X, T> is not the same as a T. Even endian<native,T> must be converted to a T before you can operate on it. The veneer is important, because it works like a T, without actually being a T. I think that subtle distinction is important and is the primary motivation for type-based endian.
You've made a reasonable case. "Object-based" please.
I object to your use of the word "swap" though. My type-based approach does a reverse_copy, not a swap. This distinction is the fundamental topic of this discussion.
Perhaps you posted code that I was expected to have perused to know that your code did a reverse copy rather than a swap, but I didn't see such code or notice that it did so. However, a reverse copy is not always correct for changing endianness, though certainly so for changes between big and little endianness. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

Robert Stewart wrote:
However, a reverse copy is not always correct for changing endianness, though certainly so for changes between big and little endianness.
I agree, just as swapping is not always correct either. terry

----- Original Message ----- From: "Terry Golubiewski" <tjgolubi@netins.net> To: <boost@lists.boost.org> Sent: Thursday, June 03, 2010 5:50 PM Subject: Re: [boost] [boost::endian] Request for comments/interest
Robert Stewart wrote:
However, a reverse copy is not always correct for changing endianness, though certainly so for changes between big and little endianness.
I agree, just as swapping is not always correct either.
+1 Vicente

"Dave Handley" wrote:
a. Performance when network endian is the same as machine endian - I'll discuss more on this later.
Studies have been performed comparing "receiver makes right" designs (where swapping is only performed by the receiver, if needed) with others. The cost of swapping on modern processors is pretty minimal compared with the design overhead and complexity, so "receiver makes right" is often not justified.
b. When you are reading a complex packet of data off the network, a common use case would be to reinterpret_cast a block of data into a struct. When you are using different types for network and machine endian you end up having 2 structs - one to read the data off the network, and one to copy the data into. This could turn into a pretty severe maintenance burden.
And (as I just wrote in a different e-mail), you do know the brittleness and maintenance burden of "casting structs" and sending them over the network (or writing to file)? I've seen developers waste days trying to figure out subtle problems of alignment and mismatched message sizes. Not that I'm making a particular argument for or against a specific endian library design, just pointing out that any safe and portable marshalling scheme requires a transformation from the "wire model" to the "internal / memory / object model" somewhere (i.e., a transformation that includes a copy somewhere).
d. Finally, I would definitely want an endian interface that operated at a high level of abstraction - ie at the struct or container level, not at the built-in type level.
Definitely - while a Boost endian library might not be (and shouldn't be) a full marshalling design, it should nicely support one.
2) To copy or not to copy. I have a very big issue with any interface that enforces a copy. If I'm writing something to live on a memory limited embedded device, I absolutely want to be able to endian swap in place.
I agree that a Boost endian library should not require a copy, for the use cases where C-style low-level memory handling ("my struct matches the byte stream") is appropriate. But it should also encourage safer, more portable, and higher-level abstractions where robustness and safety are more important. Cliff

On 28 May 2010 13:44, Cliff Green <cliffg@codewrangler.net> wrote:
Studies have been performed comparing "receiver makes right" designs (where swapping is only performed by the receiver, if needed) with others. The cost of swapping on modern processors is pretty minimal compared with the design overhead and complexity, so "receiver makes right" is often not justified.
I understood it more as "we use little-endian for our network format, since most of our machines are little-endian, but still want to allow things to work on the couple of big-endian machines", not as "receiver makes right".

Cliff Green wrote:
And (as I just wrote in a different e-mail), you do know the brittleness and maintenance burden of "casting structs" and sending them over the network (or writing to file)? I've seen developers waste days trying to figure out subtle problems of alignment and mismatched message sizes.
One can just as easily specify a field with an endian type of the wrong endianness as the right one, so using the endian types doesn't preclude mistakes. However, I concede that writing procedural code to swap each field, including newly added fields, is more brittle. Presuming that the function approach is more efficient, I can imagine writing code with the endian types and, when performance becomes an issue, switching to normal types plus swapping function calls. Having the choice is valuable. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com
participants (10)
- Cliff Green
- Dave Handley
- David Abrahams
- Jonathan Franklin
- Scott McMurray
- Stewart, Robert
- Terry Golubiewski
- Tomas Puverle
- Tomas Puverle
- vicente.botet