[boost::endian] Request for comments/interest

Tomas Puverle

26 May 2010

4:37 p.m.

I wrote an endian library on the flight back from BoostCon, but it has taken a little time to clear the submission with the firm. I realize that Beman's endian swapper library is already in the review queue, but I hope my approach is different and interesting enough to warrant consideration.

It's located here in the vault (Utilities/endian-puverle.zip):
http://www.boostpro.com/vault/index.php?action=downloadfile&filename=endian-puverle.zip&directory=Utilities&

The highlights of the library are the following:

- Very simple interface: swap<machine_to_little/big>(), swap_in_place<machine_to_little/big>()
- Support for built-in and user-defined data types and ranges thereof. E.g. you can swap ranges of structs containing/deriving from other structs.
- endian::iterator, which will iterate across a range swapping values as necessary. It works with any swappable type.

The library is described/documented/tested in the file main.cpp at the root of the zip archive.

Unfinished items:

- Right now, the library only includes the "vanilla" swapper, meaning it doesn't take advantage of special instructions on architectures which natively support endian conversion. I didn't include this in order to prevent cluttering the code with #ifdefs. This functionality would, however, be part of the final library.
- Ideally, the metafunction is_mutable_range<> should be moved out of endian::detail and become part of Boost.Range or Boost.TypeTraits.

I have donned my bulletproof vest and I'm ready for any feedback you may have. :)

Tom
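To make the interface described in the post concrete, here is a minimal sketch of what a tag-driven swap<>() / swap_in_place<>() pair could look like. It is not the code from the zip archive: the namespace endian_sketch, the helpers host_is_little_endian(), needs_swap() and reverse_bytes(), and the run-time host detection are all assumptions made for illustration. Only the tag and function names come from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

namespace endian_sketch {

// Hypothetical tag types named after the ones mentioned in the post.
struct machine_to_little {};
struct machine_to_big {};

// Detect the host byte order by looking at the first byte of a known value.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Each tag knows whether the host representation has to be reversed.
inline bool needs_swap(machine_to_little) { return !host_is_little_endian(); }
inline bool needs_swap(machine_to_big) { return host_is_little_endian(); }

// Reverse the bytes of a trivially copyable value via memcpy.
template <typename T>
T reverse_bytes(T value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

// Copying form: returns the converted value and leaves the input alone.
template <typename Tag, typename T>
T swap(const T& value) {
    return needs_swap(Tag()) ? reverse_bytes(value) : value;
}

// In-place form: converts the object it is given.
template <typename Tag, typename T>
void swap_in_place(T& value) {
    if (needs_swap(Tag())) value = reverse_bytes(value);
}

}  // namespace endian_sketch
```

With this sketch, endian_sketch::swap<endian_sketch::machine_to_big>(v) returns a copy of v in big-endian byte order, while endian_sketch::swap_in_place<endian_sketch::machine_to_little>(v) rewrites v itself. A real library would presumably settle the host order at compile time rather than at run time.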

Stewart, Robert

26 May

5:07 p.m.

Tomas Puverle wrote:

...

I wrote an endian library on the flight back from boostcon but it has taken a little time to clear the submission with the firm.

I recall our discussion of the inherent inefficiencies of Beman's approach -- that all operations on a boost::endian type potentially require swapping -- whereas the typical use case is to use native endianness until ready to read/write using wire format. It is only when ready to do I/O that there is a need to account for endianness.

...

I realize that Beman's endian swapper library is already in the review queue but I hope my approach is different and interesting enough to warrant consideration.

Alternatives are good.

...

The highlights of the library are the following: - Very simple interface: swap<machine_to_little/big>(), swap_in_place<machine_to_little/big>()

In my version of such functionality, I have byte_order::to_host() and byte_order::to_network() function templates. Those names are more in keeping with the C functions htons, ntohs, etc. I have versions that return a swapped copy of the input and those that take a non-const reference to a destination type, which means both types can be deduced and checked. Mine doesn't require the input and output type to be the same, but they must meet certain criteria checked via Boost.Concept Checking: they must be the same size, have the same signedness, and be of a supported size (currently up to 64 bits).

    namespace byte_order
    {
        template <class T, class U> void to_host(T &, U);
        template <class T, class U> T to_host(U);
    }
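The two to_host() signatures above can be sketched as follows. This is only an illustration under stated assumptions: the checks that Rob describes as Boost.Concept Checking are replaced here by C++11 static_assert, the wire format is taken to be big endian (as with ntohs), and the namespace byte_order_sketch and the helper host_is_big_endian() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace byte_order_sketch {

// Hypothetical helper: true when the most significant byte is stored first.
inline bool host_is_big_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0;
}

// Destination-reference form: both T and U are deduced and checked.
template <class T, class U>
void to_host(T& dest, U wire) {
    static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
                  "integral types only in this sketch");
    static_assert(sizeof(T) == sizeof(U), "types must be the same size");
    static_assert(std::is_signed<T>::value == std::is_signed<U>::value,
                  "types must have the same signedness");
    static_assert(sizeof(T) <= 8, "sizes up to 64 bits are supported");

    unsigned char bytes[sizeof(U)];
    std::memcpy(bytes, &wire, sizeof(U));
    // Network order is big endian, so only little-endian hosts reverse.
    if (!host_is_big_endian()) std::reverse(bytes, bytes + sizeof(U));
    std::memcpy(&dest, bytes, sizeof(T));
}

// Copy-returning form: T is given explicitly, U is deduced.
template <class T, class U>
T to_host(U wire) {
    T dest;
    to_host(dest, wire);
    return dest;
}

}  // namespace byte_order_sketch
```

A call such as byte_order_sketch::to_host<std::uint32_t>(wire) reads like a cast, while byte_order_sketch::to_host(out, wire) lets the compiler reject mismatched sizes or signedness at the call site.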

...

- Support for built-in and user-defined data types and ranges thereof. E.g. you can swap ranges of structs containing/deriving from other structs

I haven't looked at the implementation you've selected, but there are several ways to do the swapping: casting among pointer types, straight casting, union-based punning, and memcpy(). The optimal choice for a particular data size and platform varies. The pointer-casting approach can lead to type punning problems for GCC without -fno-strict-aliasing. Whether to choose one approach for simplicity or use performance testing to find all of the special cases and select the approach -- including platform intrinsics -- to maximize performance, is the choice you must make.
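Two of the approaches listed above can be compared side by side. The sketch below shows a shift-and-mask reversal and a memcpy-based reversal for a 32-bit value; neither goes through a pointer cast to an unrelated type, so neither depends on -fno-strict-aliasing. The namespace and function names are hypothetical, and no claim is made about which one is faster on a given platform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

namespace swap_sketch {

// Arithmetic approach: move each byte to its mirrored position.
inline std::uint32_t reverse_by_shifts(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) |
           ((v & 0xFF000000u) >> 24);
}

// memcpy approach: copy out to a byte buffer, reverse it, copy back.
inline std::uint32_t reverse_by_memcpy(std::uint32_t v) {
    unsigned char bytes[sizeof(v)];
    std::memcpy(bytes, &v, sizeof(v));
    std::reverse(bytes, bytes + sizeof(v));
    std::memcpy(&v, bytes, sizeof(v));
    return v;
}

}  // namespace swap_sketch
```

Both functions give the same result for every input, so a library can pick between them (or a platform intrinsic) per size and platform without changing its interface.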

...

- endian::iterator, which will iterate across a range swapping values as necessary. It works with any swappable type.

Nice idea!

...

The library is described/documented/tested in the file main.cpp at the root of the zip archive.

I'll try to have a look. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Tomas Puverle

7:43 p.m.

Hi Rob,

Thanks for the positive comments.

...

I have versions that return a swapped copy of the input and those that take a non-const reference to a destination

That is precisely the purpose for swap<>() and swap_in_place<>()

...

...
- endian::iterator, which will iterate across a range swapping values as necessary. It works with any swappable type.

Nice idea!

I find it very useful when reading memory-mapped binary files containing aligned homogeneous data. Also, if the data is already in the order you need, the iterator factory simply returns the original iterator, meaning it's zero cost in that case. I look forward to any further questions/comments you may have. Tom
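The zero-cost property described above can be sketched like this. It is not the endian::iterator from the archive: swapping_iterator, endian_iterator_for and the NeedsSwap flag are hypothetical names, and the sketch assumes the caller already knows at compile time whether the data order differs from the host order (a real library would presumably derive that from platform macros).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <type_traits>

namespace iter_sketch {

template <typename T>
T reverse_bytes(T value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

// Input-iterator adaptor that byte-reverses each element as it is read.
template <typename Iter>
class swapping_iterator {
public:
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    typedef typename std::iterator_traits<Iter>::difference_type difference_type;
    typedef value_type reference;  // dereferencing yields a converted copy
    typedef void pointer;
    typedef std::input_iterator_tag iterator_category;

    explicit swapping_iterator(Iter it) : it_(it) {}

    value_type operator*() const { return reverse_bytes(*it_); }
    swapping_iterator& operator++() { ++it_; return *this; }
    swapping_iterator operator++(int) { swapping_iterator old(*this); ++it_; return old; }

    friend bool operator==(const swapping_iterator& a, const swapping_iterator& b) { return a.it_ == b.it_; }
    friend bool operator!=(const swapping_iterator& a, const swapping_iterator& b) { return a.it_ != b.it_; }

private:
    Iter it_;
};

// Factory: wrap the iterator only when a swap is actually needed.
template <bool NeedsSwap, typename Iter>
struct endian_iterator_for {
    typedef swapping_iterator<Iter> type;
    static type make(Iter it) { return type(it); }
};

// No swap needed: hand back the original iterator type unchanged.
template <typename Iter>
struct endian_iterator_for<false, Iter> {
    typedef Iter type;
    static type make(Iter it) { return it; }
};

}  // namespace iter_sketch
```

When NeedsSwap is false, endian_iterator_for<false, Iter>::type is Iter itself, so algorithms see the raw pointer or iterator and no wrapper code is generated.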

Stewart, Robert

7:46 p.m.

Tomas Puverle wrote:

...

...
I have versions that return a swapped copy of the input and those that take a non-const reference to a destination

That is precisely the purpose for swap<>() and swap_in_place<>()

Granted, but why not overload versus introduce a long name like "swap_in_place?"

Tomas Puverle

8:16 p.m.

...

Granted, but why not overload versus introduce a long name like "swap_in_place?"

My reasoning:

1) You cannot overload based on const vs non-const ref, so these two functions wouldn't work:

    template<typename T> void swap(T & t);
    template<typename T> T swap(const T & t);

2) Even if you could, I think the name swap_in_place<>() better expresses the intent of the function. However, I could rename it to "swap_loc" or something similar if that's preferable to users.

Tom

Stephan T. Lavavej

9:26 p.m.

[Tomas Puverle]

...

You cannot overload based on const vs non-const ref

Of course you can.

C:\Temp>type meow.cpp
#include <iostream>
#include <ostream>
#include <string>
using namespace std;

template <typename T> void meow(T&) {
    cout << "modifiable!" << endl;
}

template <typename T> T meow(const T& t) {
    cout << "const!" << endl;
    return t;
}

string modifiable_rvalue() {
    return "purr";
}

// For demonstration purposes only.
// Functions shouldn't return by const value.
const string const_rvalue() {
    return "hiss";
}

int main() {
    int m = 1701;
    const int c = 1729;

    meow(m);
    meow(c);

    string x = "cat";
    const string y = "kitty";

    meow(x);
    meow(y);

    meow(modifiable_rvalue());
    meow(const_rvalue());
}

C:\Temp>cl /EHsc /nologo /W4 meow.cpp
meow.cpp

C:\Temp>meow
modifiable!
const!
modifiable!
const!
const!
const!

Stephan T. Lavavej
Visual C++ Libraries Developer

Tomas Puverle

27 May

1:30 a.m.

Hey Stephan,

...

Of course you can.

I know that's what I said but it's not what I meant :)

My point was about the fact that if you have two functions with the following signatures

    template<typename T> void swap(T & t);
    template<typename T> T swap(const T & t);

when the user has this

    int i;

then there is no way to specify whether or not he wants the copying or the swap-in-place version (ok, I can think of a way but I think it's way too convoluted)

    swap(i); //what does this mean?

I was just trying to explain why I chose to have a swap_in_place<>() rather than rely on overloading. Sorry for any confusion.

Giovanni Piero Deretta

9:55 a.m.

On Thu, May 27, 2010 at 2:30 AM, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote: [...]

...

My point was about the fact that if you have two functions with the following signatures

template<typename T> void swap(T & t);

template<typename T> T swap(const T & t);

when the user has this

int i;

then there is no way to specify whether or not he wants the copying or the swap-in-place version (ok, I can think of a way but I think it's way too convoluted)

Static casting?

...

swap(i); //what does this mean?

I was just trying to explain why I chose to have a swap_in_place<>() rather than rely on overloading.

The biggest reason not to do that, IMHO, is that overloads should always have the same behavior.

-- gpd
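The point under discussion can be made concrete with a small sketch. With both overloads present, a non-const lvalue always selects the T& (in-place) version, and the copying version is only reachable by adding const at the call site, for example with the static_cast suggested above. The name swap_bytes and the namespace overload_sketch are hypothetical; the bodies simply reverse the object's bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

namespace overload_sketch {

// In-place overload: chosen for any non-const lvalue argument.
template <typename T>
void swap_bytes(T& value) {
    unsigned char* p = reinterpret_cast<unsigned char*>(&value);
    std::reverse(p, p + sizeof(T));
}

// Copying overload: chosen for const lvalues and for rvalues.
template <typename T>
T swap_bytes(const T& value) {
    T copy = value;
    unsigned char* p = reinterpret_cast<unsigned char*>(&copy);
    std::reverse(p, p + sizeof(T));
    return copy;
}

}  // namespace overload_sketch
```

Given std::uint16_t i, the call overload_sketch::swap_bytes(i) modifies i and returns nothing, whereas overload_sketch::swap_bytes(static_cast<const std::uint16_t&>(i)) leaves i alone and returns the reversed copy. The same name therefore does two different things depending on constness, which is the objection raised above to relying on overloading here.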

Stewart, Robert

1:10 p.m.

Tomas Puverle wrote:

...

...
Granted, but why not overload versus introduce a long name like "swap_in_place?"

My reasoning:

1) You cannot overload based on const vs non-const ref, so these two functions wouldn't work:

template<typename T> void swap(T & t);

template<typename T> T swap(const T & t);

2) Even if you could, I think the name swap_in_place<>() better expresses the intent of the function. However, I could rename it to "swap_loc" or something similar if that's preferable to users.

The name is ugly, but that's not quite the point I was making, mostly because I was blinded by my own interface when looking at yours. Allow me to show again what I have and see how the ideas can be merged:

    template <class T, class U> void to_host(T &, U);
    template <class T, class U> T to_host(U);

I don't provide in-place byte swapping as you can see, and that's an interesting use case I hadn't considered. That interface provides for the following use cases:

    {
        int32_t wire(/* from wire */);
        int i;
        to_host(i, wire);
    }
    {
        int32_t wire(/* from wire */);
        int i(to_host<int>(wire));
    }

In the latter case, the function looks like a new-style cast. Arguably, that could be named with "_cast" to make that connotation clearer.

You and others have noted that wire format isn't always big endian; I raised that issue at BoostCon. My current interface is modeled after ntohs() and similar functions, so it assumes big endian wire format. Your inclusion of a tag to indicate the direction of the endianness change, if any, accounts for that, obviously, but looks ugly to me. Here's another take on the interface:

    template <class T, class U> void from_big_endian(T &, U);
    template <class T> void from_big_endian(T &);
    template <class T, class U> void to_big_endian(T &, U);
    template <class T> void to_big_endian(T &);
    template <class T, class U> void from_little_endian(T &, U);
    template <class T> void from_little_endian(T &);
    template <class T, class U> void to_little_endian(T &, U);
    template <class T> void to_little_endian(T &);
    template <class T, class U> T big_endian_cast(U);
    template <class T, class U> T little_endian_cast(U);

Obviously, I've embedded the direction and ordering in the function template names rather than in a template argument, but that looks more readable to me. Of course, the direction is relative to the host order. The casts and the copy-swap functions swap U and return T so long as T is big enough. That means static checks for relative sizes and signedness.

Obviously, the to_*_endian<>() functions correspond, in part, to your swap_in_place<>() and the casts to your swap<>(). Naming them *_cast is much nicer. The two-argument overloads are safer than the casts because they eliminate the chance for truncation:

    int32_t wire(/* from wire */);
    short s(big_endian_cast<int32_t>(wire));

Compilers can be configured to warn about such things, of course, but programmers are lazy and can ignore warnings. If that were written as follows, the compiler can verify the types:

    int32_t wire(/* from wire */);
    short s;
    to_big_endian(s, wire); // ERROR: destination too small

That API supports these use cases:

    {
        int32_t wire(/* from wire */);
        long l;
        to_big_endian(l, wire); /// OK if sizeof(long) >= sizeof(int32_t)
    }
    {
        int32_t value(/* from wire */);
        to_big_endian(value);
    }
    {
        int32_t wire(/* from wire */);
        int i(little_endian_cast<int>(wire)); /// OK if sizeof(int) >= sizeof(int32_t)
    }

Kim Barrett

2:32 p.m.

On May 27, 2010, at 9:10 AM, Stewart, Robert wrote:

...

Here's another take on the interface:

template <class T, class U> void from_big_endian(T &, U);

[... snip several more ...] template <class T, class U> T little_endian_cast(U);

Obviously, I've embedded the direction and ordering in the function template names rather than in a template argument, but that looks more readable to me.

What about various flavors of mixed / middle endian? Some of them actually do appear in the wild, which leads me to prefer the template argument approach to a potential proliferation of function names. The template argument approach also might be easier to work with in generic code.

Stewart, Robert

2:50 p.m.

Kim Barrett wrote:

...

On May 27, 2010, at 9:10 AM, Stewart, Robert wrote:

...
Here's another take on the interface:

template <class T, class U> void from_big_endian(T &, U);

[... snip several more ...] template <class T, class U> T little_endian_cast(U);

Obviously, I've embedded the direction and ordering in the function template names rather than in a template argument, but that looks more readable to me.

What about various flavors of mixed / middle endian? Some of them actually do appear in the wild, which leads me to prefer the template argument approach to a potential proliferation of function names. The template argument approach also might be easier to work with in generic code.

I've never seen those, so I wasn't accounting for them. They couldn't be addressed by the library directly because they differ based upon data size and its context. Indeed, I can't think how that could be addressed except by some structure traversing functionality which would need to know the rules to apply to the fields in a given structure for a given mixed endianness.

If these functions are to apply to adapted structs, then encoding the endianness in the names is a problem. If they only apply to simpler types, then it isn't. Still, the argument about usefulness within generic code is strong as I can imagine an algorithm that is given endianness as a template parameter. It isn't clear that directionality should be parameterized.

That leads to this:

    enum endianness { big_endian, little_endian };

    template <endianness E, class T, class U> void from(T &, U);
    template <endianness E, class T> void from(T &);
    template <endianness E, class T, class U> void to(T &, U);
    template <endianness E, class T> void to(T &);
    template <class T, class U> T big_endian_cast(U);
    template <class T, class U> T little_endian_cast(U);

The usage would change accordingly:

    {
        int32_t wire(/* from wire */);
        long l;
        to<big_endian>(l, wire); /// OK if sizeof(long) >= sizeof(int32_t)
    }
    {
        int32_t value(/* from wire */);
        to<big_endian>(value);
    }
    {
        int32_t wire(/* from wire */);
        int i(little_endian_cast<int>(wire)); /// OK if sizeof(int) >= sizeof(int32_t)
    }

IOW, the spelling changes from "{from,to}_{big,little}_endian" to "{from,to}<{big,little}_endian>."
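A minimal sketch of the enum-parameterized spelling might look like the following. It covers only from<E>() in both forms and the in-place to<E>(); the static checks (integral types, destination at least as large as the source, same signedness) follow the rules stated earlier in the thread, but the namespace order_sketch, the helper host_order() and the run-time host detection are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace order_sketch {

enum endianness { big_endian, little_endian };

// Hypothetical helper: report the byte order of the machine running the code.
inline endianness host_order() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? little_endian : big_endian;
}

// One-argument form: convert `value` in place from order E to host order.
template <endianness E, class T>
void from(T& value) {
    static_assert(std::is_integral<T>::value, "integral types only in this sketch");
    if (E != host_order()) {
        unsigned char* p = reinterpret_cast<unsigned char*>(&value);
        std::reverse(p, p + sizeof(T));
    }
}

// Byte reversal is its own inverse, so host-to-E is the same operation.
template <endianness E, class T>
void to(T& value) {
    from<E>(value);
}

// Two-argument form: swap the source, then widen into the destination.
template <endianness E, class T, class U>
void from(T& dest, U source) {
    static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
                  "integral types only in this sketch");
    static_assert(sizeof(T) >= sizeof(U), "destination too small");
    static_assert(std::is_signed<T>::value == std::is_signed<U>::value,
                  "types must have the same signedness");
    from<E>(source);  // one argument, so this picks the in-place overload
    dest = source;
}

}  // namespace order_sketch
```

Because the order is a template argument, generic code can forward it, for example a reader parameterized on endianness E can call order_sketch::from<E>(field) for each field.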

Kim Barrett

4:18 p.m.

On May 27, 2010, at 10:50 AM, Stewart, Robert wrote:

...

It isn't clear that directionality should be parameterized.

That leads to this:

enum endianness { big_endian, little_endian };

template <endianness E, class T, class U> void from(T &, U);

template <endianness E, class T> void from(T &);

[...] IOW, the spelling changes from "{from,to}_{big,little}_endian" to "{from,to}<{big,little}_endian>."

Yes, something like that was what I had in mind.

Tomas Puverle

30 May

12:26 a.m.

...

...
[...] IOW, the spelling changes from "{from,to}_{big,little}_endian" to "{from,to}<{big,little}_endian>."

Yes, something like that was what I had in mind.

Kim,

You certainly have the functionality to go from/to the host endianness to/from any other, but how do you write code which needs to go from a specific endianness to another specific endianness? It seems that would be much easier in my interface than with the one above. Am I misunderstanding your point?

Tom

Stewart, Robert

1 Jun

12:14 p.m.

Tomas Puverle wrote:

...

Kim wrote:

...
Rob Stewart wrote:

...
...
IOW, the spelling changes from "{from,to}_{big,little}_endian" to "{from,to}<{big,little}_endian>."

Yes, something like that was what I had in mind.

You certainly have the functionality to go from/to the host endianness to/from any other, but how do you write code which needs to go from a specific endianness to another specific endianness? It seems that would be much easier in my interface than with the one above.

I've never had the need to convert data from one specific endianness to another, though I can imagine some sort of conversion or translation utility doing so. Note that it seems rare indeed to have a utility that does nothing but endianness conversion rather than needing to do something to the data during the conversion.

There are several scenarios for converting X-to-Y endianness:

    # | Source | Destination | Host
    --+--------+-------------+-----
    1 | B      | B           | B
    2 | B      | B           | L
    3 | B      | L           | B
    4 | B      | L           | L
    5 | L      | B           | B
    6 | L      | B           | L
    7 | L      | L           | B
    8 | L      | L           | L

Assuming the API shown above, then cases 1 and 8 require no swapping at all, cases 3, 4, 5, and 6 require one swap, while cases 2 and 7 require two swaps, assuming that the utility swaps on read and again on write. Such work cannot be avoided if the utility does anything to the data during the conversion.

Assuming an API like Tomas suggested, the utility can do all conversions with at most one swap. That efficiency is only possible, however, if the utility does nothing to the data during the conversion.

Now, one must consider the likelihood of needing to do such conversions. I noted that I've never had the need to do so and would, generalizing from the specific, suppose such a need is rare. Consequently, I'd conclude that while my API is less efficient in two of the cases, its simplicity makes it better for most, and certainly the most common, cases. Does your experience differ?

Supposing that need is important, supporting my suggested interface doesn't preclude a more generalized X-to-Y swapping which ignores the host order altogether.
Thus:

    enum host_relative_byte_order { big_endian, little_endian };

    template <host_relative_byte_order, class T> T from(T);
    template <host_relative_byte_order, class T, class U> void from(T &, U);
    template <host_relative_byte_order, class T> T to(T);
    template <host_relative_byte_order, class T, class U> void to(T &, U);

    enum byte_order { big_to_little_endian, little_to_big_endian };

    template <byte_order, class T> T swap(T);
    template <byte_order, class T, class U> void swap(T &, U);

Another approach for the latter:

    template <
        host_relative_byte_order From
        , host_relative_byte_order To
        , class T
    >
    T swap(T);

    template <
        host_relative_byte_order From
        , host_relative_byte_order To
        , class T
        , class U
    >
    void swap(T &, U);

(In that case, I'd rename host_relative_byte_order to just byte_order.)

That leaves the choice to the library user. I presume from<>() and to<>() would be used more often.
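The swap counts in the eight-row table in this message follow from a simple rule, which can be written down directly. In the sketch below, swaps_via_host() models a utility built on from<>()/to<>() that swaps on read when the source differs from the host and again on write when the destination differs from the host, and swaps_direct() models a direct X-to-Y conversion. The names are hypothetical.

```cpp
#include <cassert>

namespace count_sketch {

enum order { big, little };

// from<>/to<> style: convert to host order on read, then to the
// destination order on write; each step costs a swap only if it differs.
inline int swaps_via_host(order source, order destination, order host) {
    return (source != host ? 1 : 0) + (destination != host ? 1 : 0);
}

// Direct source-to-destination conversion: the host order is irrelevant.
inline int swaps_direct(order source, order destination) {
    return source != destination ? 1 : 0;
}

}  // namespace count_sketch
```

For case 2 (big to big on a little-endian host) swaps_via_host() gives 2 while swaps_direct() gives 0, which is the inefficiency conceded above. For cases 3 through 6 both give 1.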

Tomas Puverle

3:11 p.m.

...

Note that it seems rare indeed to have a utility that does nothing but endianness conversion rather than needing to do something to the data during the conversion.

I think what you describe is pretty close to what I had in mind. And while the utility which does nothing but endian conversion does seem a bit far fetched (perhaps some sort of network bridge which translates between two different network layers?), I think a far more likely scenario is a program that examines and modifies a portion of the data while performing "regularization" on the rest of it. I certainly didn't want to preclude that use case.

...

Assuming an API like Tomas suggested, the utility can do all conversions with at most one swap. That efficiency is only possible, however, if the utility does nothing to the data during the conversion.

Agreed. Still, there seems no good reason to preclude it. There are many great programs I can't even begin to dream about.

...

enum host_relative_byte_order { big_endian, little_endian };

I prefer to use types rather than enums for the tags.

...

template <host_relative_byte_order, class T> T from(T); <snip code>

I just want to make sure - your only issue right now is with the function names, is that correct? Please don't take this the wrong way, as this is just a matter of taste, but I find your interface worse than mine. It seems that you've gone from the two functions to seven or eight. As for the type conversions, I don't see why you need the endian swapper to do them. Would you be perhaps able to share a use case for this?

...

template < host_relative_byte_order From , host_relative_byte_order To , class T

...
T swap(T);

Finally, I have tried and rejected the approach which includes the "from" and "to" endianness. The main reason is that I've found that I, myself, and other developers using the library, had trouble remembering which parameter was "from" and which was the "to", leading to bugs. Hence the "machine_to_big" tag, which is completely unambiguous.

Thanks,

Tom

Stewart, Robert

3:40 p.m.

Tomas Puverle wrote:

...

Rob Stewart wrote:

Don't drop attributions. Things can get confusing otherwise.

...

...
Note that it seems rare indeed to have a utility that does nothing but endianness conversion rather than needing to do something to the data during the conversion.

I think what you describe is pretty close to what I had in mind.

And while the utility which does nothing but endian conversion does seem a bit far fetched (perhaps some sort of network bridge which translates between two different network layers?), I think a far more likely scenario is a program that examines and modifies a portion of the data while performing "regularization" on the rest of it.

I certainly didn't want to preclude that use case.

An API with only from and to functions doesn't preclude it, but it does make it less efficient.

...

...
Assuming an API like Tomas suggested, the utility can do all conversions with at most one swap. That efficiency is only possible, however, if the utility does nothing to the data during the conversion.

Agreed. Still, there seems no good reason to preclude it.

No argument.

...

...
enum host_relative_byte_order { big_endian, little_endian };

I prefer to use types to enums for the tags.

I have no problem with either.

...

...
template <host_relative_byte_order, class T> T from(T); <snip code>

I just want to make sure - your only issue right now is with the function names, is that correct?

No. I object to forcing specifying both the source and target endianness in the interface when the usual need is between the host and the external endianness. With your syntax, the two are always specified in the call swap<source_to_target>(). With mine, the words "from" and "to" are relative to the host order, so you get from<big_endian> and to<little_endian>.

...

Please don't take this the wrong way, as this is just a matter of taste, but I find your interface worse than mine. It seems that you've gone from the two functions to seven or eight.

There's no reason for me to be troubled by your thinking your interface better than mine. After all, I'm suggesting the reverse!

Indeed, I'm suggesting a larger interface because I'm including your interface and adding more. The result is three categories of functions, ignoring overloads:

    from<X>()
    to<X>()
    swap<X_to_Y>()

That's not complicated or confusing, is it?

...

As for the type conversions, I don't see why you need the endian swapper to do them. Would you be perhaps able to share a use case for this?

There are various cases in which the external format uses a type that isn't what the class that will ultimately consume the data prefers -- perhaps for compatibility or sizing reasons. If the client's type is large enough to handle the value, but different from, the external type, the client must swap, check the range, and then copy. I like doing that in one place. Clearly, there must be a copy to effect that, so such functionality can be layered atop the single-type functions, but I think they are highly useful nonetheless and should be considered for inclusion in such a general purpose library.
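The "swap, check the range, and then copy" sequence described above could be layered on top of a single-type swap roughly as follows. This is a sketch under assumptions: the wire format is taken to be big endian, a failed range check is reported by throwing std::range_error, and the names convert_sketch, host_is_little_endian() and from_big_endian_checked() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <type_traits>

namespace convert_sketch {

inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Read a big-endian value of external type U into client type T.
template <class T, class U>
T from_big_endian_checked(U wire) {
    static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
                  "integral types only in this sketch");
    static_assert(std::is_signed<T>::value == std::is_signed<U>::value,
                  "types must have the same signedness");

    // 1) Swap the external value into host order.
    if (host_is_little_endian()) {
        unsigned char* p = reinterpret_cast<unsigned char*>(&wire);
        std::reverse(p, p + sizeof(U));
    }

    // 2) Check the range in a type wide enough to hold both T and U.
    typedef typename std::common_type<T, U>::type wide;
    const wide value = wire;
    if (value < static_cast<wide>(std::numeric_limits<T>::min()) ||
        value > static_cast<wide>(std::numeric_limits<T>::max())) {
        throw std::range_error("wire value does not fit in the destination type");
    }

    // 3) Copy into the client's type.
    return static_cast<T>(value);
}

}  // namespace convert_sketch
```

Keeping the three steps in one function means every call site gets the range check, instead of each client repeating it by hand after a same-type swap.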

...

...
template < host_relative_byte_order From , host_relative_byte_order To , class T

...
T swap(T);

Finally, I have tried and rejected the approach which includes the "from" and "to" endianness. The main reason is that I've found that I, myself, and other developers using the library, had trouble remembering which parameter was "from" and which was the "to", leading to bugs. Hence the "machine_to_big" tag, which is completely unambiguous.

I agree that is a better approach. You might consider adding the reverse choices for completeness: "Y_from_X."

Tomas Puverle

4:07 p.m.

...

An API with only from and to functions doesn't preclude it, but it does make it less efficient.

That was the intent of my response.

...

Indeed, I'm suggesting a larger interface because I'm including your interface and adding more. The result is three categories of functions, ignoring overloads:

from<X>() to<X>() swap<X_to_Y>()

That's not complicated or confusing, is it?

Ok. I was on the verge of yielding to the "to" and "from" syntax in my previous post. That would satisfy your concerns, correct?

...

I agree that is a better approach. You might consider adding the reverse choices for completeness: "Y_from_X."

Sure, will do. Again, thanks for the feedback. Tom

Stewart, Robert

4:15 p.m.

Tomas Puverle wrote:

...

Ok. I was on the verge of yielding to the "to" and "from" syntax in my previous post.

That would satisfy your concerns, correct?

Yes, unless there's something that has escaped my notice or recollection, of course. I would like to see s/machine/host/ in your tag names for brevity, though.

Tomas Puverle

4:40 p.m.

...

I would like to see s/machine/host/ in your tag names for brevity, though.

Already on my list. Thanks.

Cliff Green

27 May 27 May

6:43 p.m.

New subject: [boost::endian] Request for comments/interest (floating point endian handling)

For generic endian conversions, keep in mind the restrictions for floating point types (and other types with similar properties). It might be worthwhile putting in an "is_integral" check in the generic function implementations. I see a number of projects where I work that blithely endian swap and pass around binary floating point values in distributed applications, not understanding the brittleness of this design. For example, in the following:

...

template <class T, class U> T big_endian_cast(U);

(and similar little_endian_cast function)

If T and U are floating point types, then:

double val1 = someVal; // little endian, for example
// ...
double val2 = little_endian_cast<double>(big_endian_cast<double>(val1));

For many values, val1 != val2. (For all integral types in the above casting functions, all possible values will have val1 == val2.)

This is due to floating point normalization and other non-obvious operations that might be silently applied to the internal bits when values are moved to / from registers. And, even though IEEE 754 is pretty much the standard floating point implementation on modern platforms, that shouldn't be assumed.

I can't remember if Beman's endian facilities addressed floating point swapping, but any accepted Boost endian facility needs to address non-integral types in some form or fashion (either by restricting, or properly implementing).

I faintly remember some discussions on this subject in the Serialization portable binary archive, or maybe in the Math floating point utilities, but a quick glance at 1.43 documentation doesn't provide specific code for endianness handling in either library (I may have missed it).

Cliff
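One way to sidestep the round-trip problem described above is to never hold the swapped bits in a floating point object at all. The following is only a sketch under stated assumptions: it assumes a 64-bit double, and the names byte_swap64, swapped_bits_of and double_from_swapped_bits are hypothetical, not part of any library discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 64-bit unsigned integer using shifts only.
inline std::uint64_t byte_swap64(std::uint64_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (r << 8) | (v & 0xFFu);
        v >>= 8;
    }
    return r;
}

// Hypothetical helper: copy the bits of a double into an integer and swap
// them there, so the swapped pattern is never interpreted as a double.
inline std::uint64_t swapped_bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return byte_swap64(bits);
}

// Hypothetical inverse: swap back as an integer, then copy into a double.
inline double double_from_swapped_bits(std::uint64_t swapped) {
    const std::uint64_t bits = byte_swap64(swapped);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Usage: the round trip holds for any non-NaN value
// (a NaN never compares equal to itself).
inline void round_trip_check(double d) {
    assert(double_from_swapped_bits(swapped_bits_of(d)) == d);
}
```

Because the swapped pattern only ever lives in a std::uint64_t, no floating point load or store can alter it between the two swaps.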

vicente.botet

8:41 p.m.

New subject: [boost::endian] Request for comments/interest (floatingpoint endian handling)

----- Original Message ----- From: "Cliff Green" <cliffg@codewrangler.net> To: <boost@lists.boost.org> Sent: Thursday, May 27, 2010 8:43 PM Subject: Re: [boost] [boost::endian] Request for comments/interest (floatingpoint endian handling)

...

For generic endian conversions, keep in mind the restrictions for floating point types (and other types with similar properties). It might be worthwhile putting in an "is_integral" check in the generic function implementations.

...

I can't remember if Beman's endian facilities addressed floating point swapping, but any accepted Boost endian facility needs to address non-integral types in some form or fashion (either by restricting, or properly implementing).

I faintly remember some discussions on this subject in the Serialization portable binary archive, or maybe in the Math floating point utilities, but a quick glance at 1.43 documentation doesn't provide specific code for endianness handling in either library (I may have missed it).

Beman's endian library is restricted to integer types. Vicente

Tomas Puverle

28 May 28 May

3:05 p.m.

New subject: [boost] [boost::endian] Request for comments/interest (floating point endian handling)

...

double val1 = someVal; // little endian, for example
// ...
double val2 = little_endian_cast<double>(big_endian_cast<double>(val1));

For many values, val1 != val2; (For all integral types in the above casting functions, all possible values will have val1 == val2.)

This is a good point. And this is not as unusual as you may think, even on IEEE 754 machines. For example, on some compilers on Intel machines, sizeof(long double) comes in at 12 bytes (even though long double is only 80 bits), essentially meaning that the last two bytes are garbage when the data is in memory.

However, I am not completely sure that this falls within the realm of an endian library. If, as in your example, the program cannot move the data from its internal registers to memory without modifying it, what would you suggest the endian library should do?

Cliff Green

4:40 p.m.

New subject: [boost::endian] Request for comments/interest (floating point endian handling)

Tomas Puverle wrote:

...

This is a good point. And this is not as unusual as you may think, even on IEEE 754 machines. For example, on some compilers on Intel machines, sizeof(long double) comes in at 12 bytes (even though long double is only 80 bits), essentially meaning that the last two bytes are garbage when the data is in memory.

However, I am not completely sure that this falls within the realm of an endian library. If, as in your example, the program cannot move the data from its internal registers to memory without modifying it, what would you suggest the endian library should do?

I was mainly pointing out the function that Robert Stewart was proposing or using as an example:

...

template <class T, class U> T big_endian_cast(U);

I did the same thing many years ago (when writing an endian swapping utility), and noticed that when returning by value, the fp normalization would occur, causing silent "changing of the internal bits". Floating point values could be "in-place" swapped, but this was the only safe usage (i.e. after the "in-place" swap, the values would have to be sent across the network or written to disk, etc). The brittleness with swapping fp values is a subtle and non-obvious problem.

The best endian solution would be one that includes safe and portable binary floating point approaches, although I'm not familiar enough with the possibilities to know how much effort that entails. At the least, any endian facilities need to either disallow or cause warnings to occur (really BIG and nasty warnings would be best), when used with fp values.

I ended up writing endian facilities that were, in effect, "take this stream of bytes, and return a value correctly endian swapped (and safe) for use by the app". For example, the function above was something like:

template <typename T> T extract_and_swap_from_stream(const char*);

It would be used as:

val = extract_and_swap_from_stream<int32_t>(buf);

This was project specific (as part of a general marshalling library), and has drawbacks (e.g. it assumed certain endian orderings in the incoming stream).

Cliff
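A function with that signature might look roughly like the sketch below. This is not Cliff's actual code, which was not posted. It assumes the incoming stream is big-endian and that T is an integral type; it builds the value with shifts, so it does not depend on the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical reconstruction: read sizeof(T) bytes from a big-endian
// stream and assemble them into a host-order value of type T.
template <typename T>
T extract_and_swap_from_stream(const char* buf) {
    static_assert(std::is_integral<T>::value,
                  "this sketch handles integral types only");
    assert(buf != nullptr);
    typedef typename std::make_unsigned<T>::type U;
    U acc = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Most significant byte arrives first in a big-endian stream.
        acc = static_cast<U>((acc << 8) | static_cast<unsigned char>(buf[i]));
    }
    // Conversion to a signed T relies on the usual two's complement wrap.
    return static_cast<T>(acc);
}
```

Since the value is assembled arithmetically rather than by reinterpreting memory, the same code works unchanged on little-endian and big-endian hosts.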

Stewart, Robert

1 Jun 1 Jun

1:03 p.m.

New subject: [boost::endian] Request for comments/interest (floating point endian handling)

Cliff Green wrote:

...

Tomas Puverle wrote:

...
...
This is a good point. And this is not as unusual as you may think, even on IEEE 754 machines. For example, on some compilers on Intel machines, sizeof(long double) comes in at 12 bytes (even though long double is only 80 bits), essentially meaning that the last two bytes are garbage when the data is in memory.

what would you suggest the endian library should do?

I was mainly pointing out the function that Robert Stewart was proposing or using as an example:

...
template <class T, class U> T big_endian_cast(U);

I did the same thing many years ago (when writing an endian swapping utility), and noticed that when returning by value, the fp normalization would occur, causing silent "changing of the internal bits". Floating point values could be "in-place" swapped, but this was the only safe usage (I.e. after the "in-place" swap, the values would have to be sent across the network or written to disk, etc). The brittleness with swapping fp values is a subtle and non-obvious problem.

I hadn't heard of this problem before, but I'm glad to know about it now. Do you know whether zeroing the remaining bits solves the problem? If so, specializing for FP versus integer types would allow handling the difference.

...

The best endian solution would be one that includes safe and portable binary floating point approaches, although I'm not familiar enough with the possibilities to know how much effort that entails. At the least, any endian facilities need to either disallow or cause warnings to occur (really BIG and nasty warnings would be best), when used with fp values.

I agree that FP should be disabled if not supported correctly, but I'd rather find the correct solution and use it for FP types.

...

I ended up writing endian facilities that were, in effect, "take this stream of bytes, and return a value correctly endian swapped (and safe) for use by the app". For example, the function above was something like:

template <typename T> T extract_and_swap_from_stream(const char*);

It would be used as:

val = extract_and_swap_from_stream<int32_t>(buf);

That's an interesting facility. It could be built atop the approaches we've been discussing.

Tomas Puverle

3:21 p.m.

New subject: [boost] [boost::endian] Request for comments/interest (floating point endian handling)

...

I hadn't heard of this problem before, but I'm glad to know about it now. Do you know whether zeroing the remaining bits solves the problem? If so, specializing for FP versus integer types would allow handling the difference.

There are two problems I can think of off the top of my head. The first one I've already described in an earlier post; it concerns the "long double" data type on different compilers.

The second would concern NaN values. I have not encountered this myself (and don't have the IEEE standard handy, so can't check if this is even a valid concern) but I can imagine that a swapped floating point number could result in a NaN value. There are several valid NaN representations and it doesn't seem unreasonable that a CPU could "normalize" all NaNs to just a single one. Of course, swapping this normalized NaN could result in a completely different number.

On a similar note, CPUs can handle denormalized numbers in different ways, sometimes just rounding them down to zero. Again, this has the potential to break in the same way as described above.

Honestly, it's hard to say how to try and prevent this - since it's so architecture and compiler dependent - other than through a large collection of test cases.
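The NaN concern can be illustrated without touching the floating point unit at all, by inspecting bit patterns as integers. This sketch assumes the IEEE 754 binary64 layout (11-bit exponent, 52-bit fraction); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 64-bit unsigned integer.
inline std::uint64_t byte_swap64(std::uint64_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (r << 8) | (v & 0xFFu);
        v >>= 8;
    }
    return r;
}

// True when the bits, read as IEEE 754 binary64, encode a NaN:
// exponent field all ones and a non-zero fraction.
inline bool is_nan_pattern(std::uint64_t bits) {
    const std::uint64_t exponent_mask = 0x7FF0000000000000ULL;
    const std::uint64_t fraction_mask = 0x000FFFFFFFFFFFFFULL;
    return (bits & exponent_mask) == exponent_mask &&
           (bits & fraction_mask) != 0;
}

// Usage: an ordinary (tiny, positive, normal) double whose byte-swapped
// image happens to be a NaN pattern.
inline void nan_pattern_example() {
    const std::uint64_t ordinary = 0x010000000000F07FULL;
    assert(!is_nan_pattern(ordinary));
    assert(is_nan_pattern(byte_swap64(ordinary)));
}
```

Whether a given CPU actually rewrites such a pattern when it passes through a floating point register is architecture dependent, as noted above; the sketch only shows that ordinary values can map to NaN encodings once swapped.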

...

I agree that FP should be disabled if not supported correctly, but I'd rather find the correct solution and use it for FP types.

Yes, agreed. Tom

Cliff Green

3:51 p.m.

New subject: [boost::endian] Request for comments/interest (floating point endian handling)

...

...
I did the same thing many years ago (when writing an endian swapping utility), and noticed that when returning by value, the fp normalization would occur, causing silent "changing of the internal bits". Floating point values could be "in-place" swapped, but this was the only safe usage (I.e. after the "in-place" swap, the values would have to be sent across the network or written to disk, etc). The brittleness with swapping fp values is a subtle and non-obvious problem.

I hadn't heard of this problem before, but I'm glad to know about it now. Do you know whether zeroing the remaining bits solves the problem? If so, specializing for FP versus integer types would allow handling the difference.

For integral types, all bit patterns are valid numbers, so swapping bytes always creates a valid number. Performing a swap twice always ends up with the same original number (i.e. for all values in integral type T, swap(swap(T)) == T, no matter if the values are read / accessed between the swaps).

With floating point types, various bit patterns mean different things, including signed and unsigned zero, NAN, infinity, etc. IEEE 754 has "sticky bits" to capture "inexact", "overflow", "underflow", "div by 0" and "invalid" states. A FP processor instruction will look at the bit pattern and do things with the value, including silently changing bits. For example, from http://en.wikibooks.org/wiki/Floating_Point/Normalization:

"We say that the floating point number is normalized if the fraction is at least 1/b, where b is the base. In other words, the mantissa would be too large to fit if it were multiplied by the base. Non-normalized numbers are sometimes called denormal; they contain less precision than the representation normally can hold. If the number is not normalized, then you can subtract 1 from the exponent while multiplying the mantissa by the base, and get another floating point number with the same value. Normalization consists of doing this repeatedly until the number is normalized. Two distinct normalized floating point numbers cannot be equal in value."

Once a fp number is byte swapped, the only safe way to treat it is as a char array (byte buffer). Anything else (e.g. returning it by value from a function, or just reading / accessing it as a fp value) may cause normalization or other fp operations to kick in on certain values. It's a pernicious problem, since bits are silently changed for only certain values, and swap(swap(T)) no longer holds true for all values.

I'm not familiar enough with "safe fp byte swapping" techniques to compare or recommend them (obviously, converting to / from a text representation will work, with the usual rounding and accuracy constraints). Since the whole point of endian / byte swapping utilities is to allow binary values to be serialized / IO'ed (network, disk, etc), without having to convert to / from text, there might be ways to grab the various portions (exponent, mantissa, etc) and treat them as integral values (including byte swapping). This would be format specific (e.g. IEEE 754), and would entail querying the fp format (C++ standard is agnostic wrt fp formats).

In code I've seen where fp values are byte swapped, it's always "in-place" swapping, and it's just luck that there's no code "in between the swaps" that might cause normalization (or other bit changing) to occur. For Boost quality libraries, I would always vote against code that silently fails with what appears to be typical, normal usage. That's why I brought up the point about disallowing or explicitly supporting fp types in endian / byte swapping libraries.

Cliff
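The idea of splitting a floating point value into integral portions can be sketched with std::frexp and std::ldexp, which avoids inspecting the bit layout directly. This is an illustration only: it assumes a binary double with a 53-bit mantissa and finite input, and portable_double, decompose and recompose are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Two plain integers that together describe a finite double. Each one can
// be byte swapped like any other integer.
struct portable_double {
    std::int64_t mantissa;  // signed mantissa scaled by 2^53
    std::int32_t exponent;  // power of two
};

inline portable_double decompose(double d) {
    static_assert(std::numeric_limits<double>::radix == 2 &&
                  std::numeric_limits<double>::digits == 53,
                  "sketch assumes a binary double with a 53-bit mantissa");
    assert(std::isfinite(d));  // NaN and infinity need separate handling
    int exp = 0;
    const double frac = std::frexp(d, &exp);  // d == frac * 2^exp, |frac| in [0.5, 1)
    portable_double p;
    p.mantissa = static_cast<std::int64_t>(std::ldexp(frac, 53));  // exact integer
    p.exponent = exp;
    return p;
}

inline double recompose(const portable_double& p) {
    return std::ldexp(static_cast<double>(p.mantissa), p.exponent - 53);
}
```

The price is a larger encoding than the eight bytes of the original double, plus the arithmetic on each conversion; a format-specific approach that extracts the IEEE 754 fields directly would be more compact.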

Tomas Puverle

7:23 p.m.

New subject: [boost] [boost::endian] Request for comments/interest (floating point endian handling)

...

In code I've seen where fp values are byte swapped, it's always "in-place" swapping, and it's just luck that there's no code "in between the swaps" that might cause normalization (or other bit changing) to occur. For Boost quality libraries, I would always vote against code that silently fails with what appears to be typical, normal usage. That's why I brought up the point about disallowing or explicitly supporting fp types in endian / byte swapping libraries.

I think a reasonable solution to this problem would be the following:

1) Restrict the swap<>() function for floating point types (and pointers, for that matter, as they potentially have the same problems). (Who might want to send a pointer to a device? I don't know, but I don't see a reason to disallow it.)

The swapping itself is not a problem - it is done at the byte level, so floating point registers don't get involved. The only issue is the return value from swap<>(), which could be in a register and could introduce the above issues. However, this is only a problem when swapping FROM the machine representation to the "device" representation. It does not exist in the opposite direction. So we can always allow swap_in_place<>(), from<>() (as suggested by Rob) and a restricted form of swap<>() (only the {little/big}_to_host versions). I think that should cover it.

2) Another possibility would be to return an opaque type which contains some number of bytes representing the swapped data in the case of the "host_to_<whatever>" scenario. Actually, I sort of like this idea, as it makes it clear the data is now completely opaque and shouldn't be used any longer, other than to write it.

Tom
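The opaque-type idea could be sketched as follows. The class name opaque_swapped and its members are hypothetical, and for brevity the sketch always reverses the bytes; a real version would reverse only when the host and target byte orders differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical opaque holder for byte-reversed data. It exposes raw bytes
// for writing to a device but never the swapped value as a T.
template <typename T>
class opaque_swapped {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied bytewise");
public:
    explicit opaque_swapped(const T& value) {
        std::memcpy(bytes_.data(), &value, sizeof(T));
        std::reverse(bytes_.begin(), bytes_.end());
    }
    const unsigned char* data() const { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

    // Reverse the bytes again and copy them back into a T.
    T restore() const {
        std::array<unsigned char, sizeof(T)> tmp = bytes_;
        std::reverse(tmp.begin(), tmp.end());
        T value;
        std::memcpy(&value, tmp.data(), sizeof(T));
        return value;
    }
private:
    std::array<unsigned char, sizeof(T)> bytes_;
};

// Usage: a double survives the trip because its swapped image is only
// ever stored as bytes.
inline void opaque_example() {
    assert(opaque_swapped<double>(1.5).restore() == 1.5);
}
```

Since the swapped data is stored as unsigned char, it never passes through a floating point register, which is the property the opaque type is meant to guarantee.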

Terry Golubiewski

26 May 26 May

10:35 p.m.

With Beman's approach there is no extra overhead if the native type == endian type. If you use Beman's endian only for over-the-wire messages and then extract the endian-aware types in/out to native types, then I think Beman's efficiency is very good. Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result. I have also extended Beman's general approach to floating point numbers with very little effort.

What inefficiencies in Beman's approach should be corrected?

Also, network-byte-order isn't the only "correct" over-the-wire byte order. Some of us still believe that little-endian is better. (Don't take the bait). ;o) Since the endian issue is larger than network-byte-ordering, I would suggest removing any such references.

terry

----- Original Message ----- From: "Stewart, Robert" <Robert.Stewart@sig.com> Newsgroups: gmane.comp.lib.boost.devel To: <boost@lists.boost.org> Sent: Wednesday, May 26, 2010 12:07 PM Subject: Re: [boost::endian] Request for comments/interest

...

Tomas Puverle wrote:

...
I wrote an endian library on the flight back from boostcon but it has taken a little time to clear the submission with the firm.

I recall our discussion of the inherent inefficiencies of Beman's approach -- that all operations on a boost::endian type potentially require swapping -- whereas the typical use case is to use native endianness until ready to read/write using wire format. It is only when ready to do I/O that there is a need to account for endianness.

...
I realize that Beman's endian swapper library is already in the review queue but I hope my approach is different and interesting enough to warrant consideration.

Alternatives are good.

...
The highlights of the library are the following: - Very simple interface: swap<machine_to_little/big>(), swap_in_place<machine_to_little/big>()

In my version of such functionality, I have byte_order::to_host() and byte_order::to_network() function templates. Those names are more in keeping with the C functions htons, ntohs, etc. I have versions that return a swapped copy of the input and those that take a non-const reference to a destination type, which means both types can be deduced and checked. Mine doesn't require the input and output type to be the same, but they must meet certain criteria checked via Boost.Concept Checking: they must be the same size, have the same signedness, and be of a supported size (currently up to 64 bits).

namespace byte_order {

template <class T, class U> void to_host(T &, U);

template <class T, class U> T to_host(U);

}
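The criteria listed above (same size, same signedness, supported size) could be expressed with C++11 static_assert in place of Boost.Concept Checking. The sketch below is not Rob's implementation; the namespace byte_order_sketch and the runtime host-order probe are assumptions made for illustration, and "network" is taken to mean big-endian.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace byte_order_sketch {

// Runtime probe of the host byte order (a real library would decide this
// at compile time).
inline bool host_is_big_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0;
}

// Return a host-order copy of a big-endian ("network") value.
template <class T, class U>
T to_host(U network_value) {
    static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
                  "integral types only");
    static_assert(sizeof(T) == sizeof(U), "types must be the same size");
    static_assert(std::is_signed<T>::value == std::is_signed<U>::value,
                  "types must have the same signedness");
    static_assert(sizeof(T) <= 8, "sizes up to 64 bits are supported");
    unsigned char bytes[sizeof(U)];
    std::memcpy(bytes, &network_value, sizeof(U));
    if (!host_is_big_endian()) {
        std::reverse(bytes, bytes + sizeof(U));
    }
    T result;
    std::memcpy(&result, bytes, sizeof(T));
    return result;
}

// Overload taking the destination by reference, so both types are deduced.
template <class T, class U>
void to_host(T& dest, U network_value) {
    dest = to_host<T>(network_value);
}

// Usage: converting twice gives back the original value on any host.
inline void to_host_example() {
    const std::uint16_t v = 0xABCD;
    assert(to_host<std::uint16_t>(to_host<std::uint16_t>(v)) == v);
}

}  // namespace byte_order_sketch
```

A mismatched pair such as to_host<std::int32_t>(std::uint32_t(1)) is rejected at compile time by the signedness check.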

...
- Support for built-in and user-defined data types and ranges thereof. E.g. you can swap ranges of structs containing/deriving from other structs

I haven't looked at the implementation you've selected, but there are several ways to do the swapping: casting among pointer types, straight casting, union-based punning, and memcpy(). The optimal choice for a particular data size and platform varies. The pointer-casting approach can lead to type punning problems for GCC without -fno-strict-aliasing. Whether to choose one approach for simplicity or use performance testing to find all of the special cases and select the approach -- including platform intrinsics -- to maximize performance, is the choice you must make.

...
- endian::iterator, which will iterate across a range swapping values as necessary. It works with any swappable type.

Nice idea!

...
The library is described/documented/tested in the file main.cpp at the root of the zip archive.

I'll try to have a look.


Tomas Puverle

27 May 27 May

2:11 a.m.

Hi Terry,

...

With Beman's approach there is no extra overhead if the native type == endian type.

As is the case with my library, as you'd expect.

...

If you use Beman's endian only for over-the-wire messages and then extract the endian-aware types in/out to native types, then I think Beman's efficiency is very good. Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result. I have also extended Beman's general approach to floating point numbers with very little effort.

What inefficiencies in Beman's approach should be corrected?

Note that I didn't criticize Beman's approach in my original email - I would just prefer to say that my approach is different. Here are some of the differences in our philosophies:

1) Intrusive vs non-intrusive: Endian swapping can be added/retrofitted to any type using my library, even if you can't modify the original source code. If I am not mistaken, Beman's library requires the data types which need to be swapped to be written using his endian types as members.

2) In your usage scenario, you actually require TWO types: One to deal with the swapping and then your "native" type where you save your swapped state. This now means you have to maintain two separate structs/classes when only one is needed.

3) The fact that the endian types provide operators gives the impression that it's ok to operate on them. I don't agree with that design choice, as I think the operation of endian swapping and operating on the data should be divorced from each other. A user less experienced than you may end up not realizing what the hidden costs are and use the swappable types throughout the application, paying unnecessary overheads. I would prefer for the library to be hard to misuse in that way; borrowing a phrase from python, "explicit is better than implicit".

This may be simply my opinion but I believe that forcing the user to do the explicit swap<>() leads to better separation of concerns for the application.
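Point 1 can be illustrated with a free-function overload added outside the type being swapped. This is only a sketch of the general technique, not the mechanism Tomas' library actually uses; legacy::header, reverse_bytes and reversed are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace legacy {
// Stands in for a type from code that cannot be modified.
struct header {
    std::uint16_t id;
    std::uint32_t length;
};
}  // namespace legacy

namespace endian_sketch {

// Generic byte reversal for integral types.
template <class T>
void reverse_bytes(T& value) {
    static_assert(std::is_integral<T>::value, "integral types only");
    unsigned char* p = reinterpret_cast<unsigned char*>(&value);
    std::reverse(p, p + sizeof(T));
}

// Non-intrusive support for legacy::header: swap member by member, so any
// padding between the members is left alone.
inline void reverse_bytes(legacy::header& h) {
    reverse_bytes(h.id);
    reverse_bytes(h.length);
}

// Convenience wrapper returning a reversed copy.
inline legacy::header reversed(legacy::header h) {
    reverse_bytes(h);
    return h;
}

// Usage: reversing twice restores the original members.
inline void header_example() {
    const legacy::header h = {0x0102, 0x01020304u};
    assert(reversed(reversed(h)).id == h.id);
    assert(reversed(reversed(h)).length == h.length);
}

}  // namespace endian_sketch
```

Reversing the whole struct as one block of bytes would also reorder the members and move padding into them, which is why the overload works field by field.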

...

Also, network-byte-order isn't the only "correct" over-the-wire byte order. Some of us still believe that little-endian is better. (Don't take the bait). ;o)

I code for several different platforms, some of which are little and some of which are big endian so I am completely agnostic to this issue and so is my library. (I am not sure whether there was something I said that made you think otherwise). There are four types of swap in the library:

- little_to_machine
- big_to_machine
- machine_to_big
- machine_to_little

and each one of these can be used with swap<>(), swap_in_place<>() and the endian iterator. I hope you see that I don't prescribe any "network" order - it really is not my place to do so.
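A tag-based interface of this shape might be sketched as below. The tag names come from the message above, but everything else (the namespace, the runtime byte-order probe, the restriction to integral types) is an assumption for illustration and not the code in the vault archive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace endian_sketch {

// The four conversion tags.
struct little_to_machine {};
struct big_to_machine {};
struct machine_to_big {};
struct machine_to_little {};

// Runtime probe of the machine byte order.
inline bool machine_is_little() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Whether the external (non-machine) side of each conversion is little-endian.
inline bool external_is_little(little_to_machine) { return true; }
inline bool external_is_little(machine_to_little) { return true; }
inline bool external_is_little(big_to_machine) { return false; }
inline bool external_is_little(machine_to_big) { return false; }

template <class Tag, class T>
void swap_in_place(T& value) {
    static_assert(std::is_integral<T>::value, "integral types only");
    if (external_is_little(Tag()) == machine_is_little()) {
        return;  // same byte order on both sides: nothing to do
    }
    unsigned char* p = reinterpret_cast<unsigned char*>(&value);
    std::reverse(p, p + sizeof(T));
}

template <class Tag, class T>
T swap(T value) {
    swap_in_place<Tag>(value);
    return value;
}

// Usage: converting out and back yields the original value on any machine.
inline void tag_example() {
    const std::uint32_t v = 0xDEADBEEFu;
    assert(swap<big_to_machine>(swap<machine_to_big>(v)) == v);
}

}  // namespace endian_sketch
```

On any given machine exactly one of machine_to_big and machine_to_little is a no-op, which is why neither order is privileged by the interface.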

...

Since the endian issue is larger than network-byte-ordering, I would suggest removing any such references.

Again, I am not sure what you're referring to but if I'm missing something, please point it out. Thanks! Tom

Terry Golubiewski

6:12 a.m.

I was responding to some of the examples in Robert Stewart's reply, which I snipped unfortunately. I hadn't even looked at your library yet. I see that most of my comments do not apply to your code.

Often, when I receive a "message" over-the-wire, I only want to examine some fields of the message and ignore the rest. Different sections of code process different parts of received messages, so it would be dangerous to do in-place swapping (for my way of doing things).

Beman's approach has gradually become more integer-centric over time, with alignment support and supporting odd-sized integers. I believe that the endian problem is primarily an interface problem. Here is a simplified version of Beman's approach that is quite simple and just wraps a user-type T, so you would just write

struct MyMessage {
  endian<big, MyType> big_field;
  endian<little, MyType> lil_field;
  endian<native, MyType> scary_field;
}; // Message

in your over-the-wire messages and then

MyType x = msg.big_field;
msg.lil_field = x;

Here's the (draft, possibly naive) code for this approach.

terry

namespace boost { namespace interface {

struct uninitialized_t { };
static const uninitialized_t uninitialized;

enum endian_t { little, big };

#ifdef BOOST_BIG_ENDIAN
static const endian_t native = big;
#else
static const endian_t native = little;
#endif

namespace detail {

template<endian_t E, class T, std::size_t L>
inline void Retrieve(T* value, const array<uint8_t, L>& bytes);

template<endian_t E, class T, std::size_t L>
inline void Store(array<uint8_t, L>& bytes, const T& value);

} // detail

#pragma pack(push, 1)

template<endian_t E, class T>
class endian {
public:
  static const endian_t endian_type = E;
  typedef T value_type;
private:
  static const std::size_t storage_size = sizeof(T);
  array<uint8_t, storage_size> m_storage;
public:
  endian() { detail::Store<endian_type>(m_storage, T()); }
  endian(uninitialized_t) { }
  endian(const value_type& x) { detail::Store<endian_type>(m_storage, x); }
  value_type value() const {
    value_type rval;
    detail::Retrieve<endian_type>(&rval, m_storage);
    return rval;
  } // value
  operator value_type() const { return value(); }
  endian& operator=(const value_type& rhs) {
    detail::Store<endian_type>(m_storage, rhs);
    return *this;
  } // operator=
}; // endian

// Specialize for native endian types.
template<class T>
class endian<native, T> {
public:
  static const endian_t endian_type = native;
  typedef T value_type;
private:
  T m_value;
public:
  endian() : m_value() { }
  endian(uninitialized_t) { }
  endian(const value_type& x) : m_value(x) { }
  value_type value() const { return m_value; }
  operator value_type() const { return value(); }
  endian& operator=(const value_type& rhs) { m_value = rhs; return *this; }
}; // endian

#pragma pack(pop)

} } // boost::interface

Stewart, Robert

1:22 p.m.

Terry Golubiewski wrote: [please don't top post]

...

With Beman's approach there is no extra overhead if the native type == endian type.

In many cases, write format is big endian and the host is little, so that assertion is small comfort.

...

If you use Beman's endian only for over-the-wire messages and then extract the endian-aware types in/out to native types, then I think Beman's efficiency is very good.

That involves copying all of the data. In many cases, the extra data copying is too expensive.

...

Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result.

That the operations are provided for types that look and act like built-ins suggests that the operations are just as efficient. In the big/wire, little/host scenario, simple looking expressions can be rife with inefficiency. Expression templates could be employed to reduce those inefficiencies, of course, but as you note, when one is going to do much with the values, it is better to swap once and then work with the natively ordered value. Tomas' and my APIs are geared toward the latter approach. Beman's design tends to encourage the former. Nevertheless, I understand that less demanding applications would benefit from the safety of Beman's approach, so both are worthwhile.

...

Also, network-byte-order isn't the only "correct" over-the-wire byte order. Some of us still believe that little-endian is better. (Don't take the bait). ;o)

I absolutely agree that local conventions can make that work well. The fact is, however, that the Internet is based upon big-endian ordering and that is a strong influence on many applications.

...

Since the endian issue is larger than network-byte-ordering, I would suggest removing any such references.

Those were only in the (existing) API that I posted illustrating an alternative to Tomas' API, as I'm sure you're now aware.

vicente.botet

1:51 p.m.

----- Original Message ----- From: "Stewart, Robert" <Robert.Stewart@sig.com> To: <boost@lists.boost.org> Sent: Thursday, May 27, 2010 3:22 PM Subject: Re: [boost] [boost::endian] Request for comments/interest

...

Terry Golubiewski wrote:

[please don't top post]

...
With Beman's approach there is no extra overhead if the native type == endian type.

In many cases, write format is big endian and the host is little, so that assertion is small comfort.

What the OP intended to say is that when we convert from the same endianness there is no extra overhead. But even this is not true, as Beman's implementation will do a copy since there are no in-place operations, as you bellow.

...

...
If you use Beman's endian only for over-the-wire messages and then extract the endian-aware types in/out to native types, then I think Beman's efficiency is very good.

That involves copying all of the data. In many cases, the extra data copying is too expensive.

...
Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result.

That the operations are provided for types that look and act like built-ins suggests that the operations are just as efficient. In the big/wire, little/host scenario, simple looking expressions can be rife with inefficiency. Expression templates could be employed to reduce those inefficiencies, of course, but as you note, when one is going to do much with the values, it is better to swap once and then work with the natively ordered value.

We can remove all the operations from Beman's typed approach, so there is no risk of overusing them. Without the operations, Beman's classes can be used only to convert from some endianness to the native one and vice versa, and one then operates on the native types.

...

Tomas' and my APIs are geared toward the latter approach. Beman's design tends to encourage the former. Nevertheless, I understand that less demanding applications would benefit from the safety of Beman's approach, so both are worthwhile.

IMO the single case where your and Tomas' approach can be more efficient is when you use in_place conversions, as you are able to do nothing when the endianness is the same. The other case, i.e., conversion from one variable to another, is better handled by Beman's approach, as it is type safe and should not be less efficient than yours.

...

...
Also, network-byte-order isn't the only "correct" over-the-wire byte order. Some of us still believe that little-endian is better. (Don't take the bait). ;o)

I absolutely agree that local conventions can make that work well. The fact is, however, that the Internet is based upon big-endian ordering and that is a strong influence on many applications.

...
Since the endian issue is larger than network-byte-ordering, I would suggest removing any such references.

Those were only in the (existing) API that I posted illustrating an alternative to Tomas' API, as I'm sure you're now aware.

I agree that the interface must be network neutral; only endianness is the concern. Best, Vicente

Stewart, Robert

2:04 p.m.

vicente.botet wrote:

...

Rob Stewart wrote:

...
Terry Golubiewski wrote:

...
With Beman's approach there is no extra overhead if the native type == endian type.

In many cases, write format is big endian and the host is little, so that assertion is small comfort.

What the OP intend to say is that when we convert from the same endian there is no extra overload.

I understood that. My point was that the swapping scenario is very common, so the overhead will also be common.

...

But even this is not true, as Beman's implementation will do a copy as there is no in-place operations, as you bellow.

I didn't realize I was bellowing about anything and I wasn't the one to introduce the idea of in-place swapping. Nevertheless, copying built-in types is trivial and should lend itself readily to optimization in many cases, so the difference should be small, if not zero.

...

...
...
Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result.

That the operations are provided for types that look and act like built-ins suggests that the operations are just as efficient. In the big/wire, little/host scenario, simple looking expressions can be rife with inefficiency. Expression templates could be employed to reduce those inefficiencies, of course, but as you note, when one is going to do much with the values, it is better to swap once and then work with the natively ordered value.

We can remove all the operations of the Beman's typed approach, so no risk to over use of. Without no operations, the Beman's classes must be used just to convert from some endiannes to the native and vice-versa, and then operate on the native types.

At that point, however, the functional approach seems more straightforward.

...

...
Tomas' and my APIs are geared toward the latter approach. Beman's design tends to encourage the former. Nevertheless, I understand that less demanding applications would benefit from the safety of Beman's approach, so both are worthwhile.

IMO the single case when your and Tomas approach can be more efficient is when you use in_place conversions, as you are able to do nothing when the endianness is the same. The other case, i.e., conversion from one variable to another, is better handled by Beman's approach, as it is type safe and is should not be less efficient than yours.

If you remove the operators from Beman's types, then its a question of syntax and, possibly, safety. However, I don't see any type safety problems in my original or hybrid APIs. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

vicente.botet

2:20 p.m.

----- Original Message ----- From: "Stewart, Robert" <Robert.Stewart@sig.com> To: <boost@lists.boost.org> Sent: Thursday, May 27, 2010 4:04 PM Subject: Re: [boost] [boost::endian] Request for comments/interest

...

vicente.botet wrote:

...
Rob Stewart wrote:

...
Terry Golubiewski wrote:

...
With Beman's approach there is no extra overhead if the native type == endian type.

In many cases, write format is big endian and the host is little, so that assertion is small comfort.

What the OP intend to say is that when we convert from the same endian there is no extra overload.

I understood that. My point was that the swapping scenario is very common, so the overhead will also be common.

...
But even this is not true, as Beman's implementation will do a copy as there is no in-place operations, as you bellow.

I didn't realize I was bellowing about anything and I wasn't the one to introduce the idea of in-place swapping. Nevertheless, copying built-in types is trivial and should lend itself readily to optimization in many cases, so the difference should be small, if not zero.

...
...
...
Beman's approach does provide several operators for endian-swappable types, but I just don't use those operations, preferring to extract to native, perform operations, and then store the result.

That the operations are provided for types that look and act like built-ins suggests that the operations are just as efficient. In the big/wire, little/host scenario, simple looking expressions can be rife with inefficiency. Expression templates could be employed to reduce those inefficiencies, of course, but as you note, when one is going to do much with the values, it is better to swap once and then work with the natively ordered value.

We can remove all the operations of the Beman's typed approach, so no risk to over use of. Without no operations, the Beman's classes must be used just to convert from some endiannes to the native and vice-versa, and then operate on the native types.

At that point, however, the functional approach seems more straightforward.

...
...
Tomas' and my APIs are geared toward the latter approach. Beman's design tends to encourage the former. Nevertheless, I understand that less demanding applications would benefit from the safety of Beman's approach, so both are worthwhile.

IMO the single case when your and Tomas approach can be more efficient is when you use in_place conversions, as you are able to do nothing when the endianness is the same. The other case, i.e., conversion from one variable to another, is better handled by Beman's approach, as it is type safe and is should not be less efficient than yours.

If you remove the operators from Beman's types, then its a question of syntax and, possibly, safety. However, I don't see any type safety problems in my original or hybrid APIs.

The problem is that we don't know the endianness of the types int32_t or long. It depends on the operations your code has performed. E.g.

    int32_t wire(/* from wire */);
    long l1;
    long l2;
    to_big_endian(l1, l2);
    to_big_endian(l2, wire);

With the typed endians this is not possible.

    big32_t wire(/* from wire */);
    long l1;
    long l2;
    l2=l1;   // no endian change as both are native
    wire=l2; // conversion to big from native.

Best, Vicente
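To make the distinction above concrete, here is a minimal sketch of a typed big-endian integer. The class name `big32` and its members are hypothetical stand-ins for a type such as big32_t; this is not Beman's implementation, only an illustration of how putting the byte order in the type lets the compiler track which values are in wire format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a typed big-endian integer such as big32_t.
// Illustration only; this is not Beman's implementation.
class big32 {
public:
    big32() : bytes_{0, 0, 0, 0} {}

    // Converting from a native value stores the most significant byte first.
    big32(std::uint32_t native) { store(native); }

    big32& operator=(std::uint32_t native) {
        store(native);
        return *this;
    }

    // Reading always yields a native value, whatever the host byte order is.
    operator std::uint32_t() const {
        return (static_cast<std::uint32_t>(bytes_[0]) << 24) |
               (static_cast<std::uint32_t>(bytes_[1]) << 16) |
               (static_cast<std::uint32_t>(bytes_[2]) << 8) |
               static_cast<std::uint32_t>(bytes_[3]);
    }

    // Raw bytes for I/O; the layout is identical on every host.
    const unsigned char* data() const { return bytes_; }

private:
    void store(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }

    unsigned char bytes_[4];
};

void example() {
    std::uint32_t l1 = 0x01020304u;
    std::uint32_t l2 = l1;  // native to native: no conversion
    big32 wire;
    wire = l2;              // native to big: the type forces the conversion
    assert(wire.data()[0] == 0x01 && wire.data()[3] == 0x04);

    std::uint32_t back = wire;  // big to native
    assert(back == l1);
}
```

Because `wire` and the native variables have different types, there is no way to apply a "to big endian" step twice, or to forget it, without the code looking visibly wrong.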

Stewart, Robert

2:29 p.m.

vicente.botet wrote:

...

Rob Stewart wrote:

...
I don't see any type safety problems in my original or hybrid APIs.

The problem is that we don't know the endiannes of the types int32_t or long. This depends on the operation you have performed on your code. E.g.

Endianness is not part of the type, so referring to type safety misled me.

...

int32_t wire(/* from wire */); long l1; long l2; to_big_endian(l1, l2); to_big_endian(l2, wire);

With the typed endians this is not possible. big32_t wire(/* from wire */); long l1; long l2; l2=l1; // no endian change as both are native wire=l2; // conversion to big from native.

That's a sensible concern and reason to provide such types (without the operators) along with the less safe functions. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

Gottlob Frege

28 May 28 May

6:03 a.m.

I'm just commenting for emphasis, no disagreements. Below... On Thu, May 27, 2010 at 10:29 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:

...

vicente.botet wrote:

...
Rob Stewart wrote:

...
I don't see any type safety problems in my original or hybrid APIs.

The problem is that we don't know the endiannes of the types int32_t or long. This depends on the operation you have performed on your code. E.g.

Endianness is not part of the type, so referring to type safety mislead me.

...
int32_t wire(/* from wire */); long l1; long l2; to_big_endian(l1, l2); to_big_endian(l2, wire);

Yes, the point here is that 'wire' isn't really an int, is it? At least not a useful one (at least when the machine's order doesn't match the wire). Thus they should be distinct types.

...

...
With the typed endians this is not possible. big32_t wire(/* from wire */); long l1; long l2; l2=l1; // no endian change as both are native wire=l2; // conversion to big from native.

That's a sensible concern and reason to provide such types (without the operators) along with the less safe functions.

Yep, I think it would be great if we could always use big32_t, etc. But you know there will be times when you are handed predefined objects with ints (but not really) in them, etc. So less safe functions, including in-place functions, will be needed. And I also agree that the operators should be avoided. Tony

Mateusz Loskot

27 May 27 May

2:23 p.m.

On 27/05/10 15:04, Stewart, Robert wrote:

...

...
But even this is not true, as Beman's implementation will do a copy as there is no in-place operations, as you bellow.

I didn't realize I was bellowing about anything and I wasn't the one to introduce the idea of in-place swapping. Nevertheless, copying built-in types is trivial and should lend itself readily to optimization in many cases, so the difference should be small, if not zero.

If I may ask, do you use "Beman's implementation" to refer to copying with unrolled loops? I used the loops approach to craft endian-aware binary streams IO carrying geometry objects, it was for Boost.Geometry, some time ago: http://github.com/mloskot/workshop/blob/master/endian/endian.hpp Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

Tomas Puverle

28 May 28 May

3:11 p.m.

...

If I may ask, do you use "Beman's implementation" to refer to copying with unrolled loops?

Mateusz, we are currently avoiding discussing "library implementation" issues and rather concentrating on the interfaces and their effects on "application implementation" issues. So no, we're currently not even talking about unrolling loops ;) Tom

Mateusz Loskot

29 May 29 May

1:40 p.m.

On Fri, 2010-05-28 at 15:11 +0000, Tomas Puverle wrote:

...

...
If I may ask, do you use "Beman's implementation" to refer to copying with unrolled loops?

Mateusz,

we are currently avoiding discussing "library implementation" issues and rather concentrating on the interfaces and their effects on "application implementation" issues.

So no, we're currently not even talking about unrolling loops ;)

Tom, I had not been paying close attention, and I noticed that only after I posted my question :-) Thanks for the clarification. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

Terry Golubiewski

27 May 27 May

7:01 p.m.

...

At that point, however, the functional approach seems more straightforward.

We sometimes deal with messages with mixed endianness. A functional approach means the programmer has to "know" the endianness of the data he wants to swap. With the endian<endian_t, T> approach, the message can contain fields with varying endianness and the structure is self-documenting.

    struct MyStruct { ... };

    struct Msg {
      endian<little, uint32_t> x;
      endian<big, uint16_t> y;
      endian<big, MyStruct> z;
      endian<little, double> w;
    };

I attached a working (?) program that demonstrates this and an assembly listing showing the optimized results. terry
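As a rough illustration of the self-documenting message idea, the sketch below defines a hypothetical `endian_field<order, T>` template. It is not the attached program and it only handles unsigned integer types (no structs or doubles); the names `order`, `endian_field` and this cut-down `Msg` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class order { little, big };

// Hypothetical field wrapper: stores an unsigned integer T in a fixed byte
// order, independent of the host. Not the implementation from the thread.
template <order O, typename T>
class endian_field {
public:
    endian_field() : bytes_() {}
    endian_field(T value) { assign(value); }

    endian_field& operator=(T value) {
        assign(value);
        return *this;
    }

    // Rebuild the native value from the stored bytes.
    operator T() const {
        T value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            value = static_cast<T>(
                value | (static_cast<T>(bytes_[index(i)]) << (8 * i)));
        return value;
    }

    const unsigned char* data() const { return bytes_; }

private:
    // Storage position of the i-th least significant byte.
    static std::size_t index(std::size_t i) {
        return O == order::little ? i : sizeof(T) - 1 - i;
    }

    void assign(T value) {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            bytes_[index(i)] = static_cast<unsigned char>(value >> (8 * i));
    }

    unsigned char bytes_[sizeof(T)];
};

// The declaration itself records which field uses which byte order.
struct Msg {
    endian_field<order::little, std::uint32_t> x;
    endian_field<order::big, std::uint16_t> y;
};

void example() {
    Msg m;
    m.x = 0x11223344u;
    m.y = std::uint16_t(0xABCD);
    assert(m.x.data()[0] == 0x44);  // little: least significant byte first
    assert(m.y.data()[0] == 0xAB);  // big: most significant byte first

    std::uint32_t x = m.x;
    std::uint16_t y = m.y;
    assert(x == 0x11223344u && y == 0xABCDu);
}
```

The caller never names a swap direction; the field type carries it, which is the point being made about mixed-endian messages.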

Scott McMurray

28 May 28 May

3:11 p.m.

On 27 May 2010 15:01, Terry Golubiewski <tjgolubi@netins.net> wrote:

...

With the endian<endian_t, T> approach, the message can contain fields with varying endianess and the structure is self-documenting.

+1 Most uses I've seen involve different wire and memory models anyways, since the latter can change while the former is fixed, so I'm not convinced that the copy is always as terrible as has been implied. I'd certainly never keep a BMP header structure in memory for very long, for instance. Also, the wire structures cannot maintain invariants, so having a separate class is often useful even if its contents are essentially identical to the wire version.

Cliff Green

4:57 p.m.

"Scott McMurray" wrote:

...

Most uses I've seen involve different wire and memory models anyways, since the latter can change while the former is fixed, so I'm not convinced that the copy is always as terrible as has been implied. I'd certainly never keep a BMP header structure in memory for very long, for instance.

Exactly! I'm constantly irritated at work where the "wire model" is defined as:

Take this struct as defined in a header, cast it to char* or void* and blast it across the network (or write to disk). Do the converse on the other side. Swap as needed.

Every time I point out the dangers in that approach, the response I get is "well, I made sure everything is aligned on four byte boundaries, and I know how to swap bytes". And then, a developer wastes three days debugging why a "bool" value doesn't get handled correctly, or the padding at the end of the struct causes different sizes to be transmitted across the network (or written to disk), or an esoteric compiler aligns the values in the struct differently than expected, or a 64-bit compile causes different padding than the 32-bit compile.

All of my "home grown" marshalling designs require a copy somewhere - usually values are extracted from a byte stream into app safe structures (incoming) or appended to a byte stream (endian swapped as needed) for outgoing needs.

So my point is: In most general marshalling / serialization designs where portable binary values are required, the transform step (from "wire" to "object / memory" model) requires a copy anyway.

The low-level endian facilities should not require an extra copy, but trying to minimize all copying and constraining the endian library design might be misguided. Cliff
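The "extract from a byte stream, append to a byte stream" design described above might look roughly like the following sketch. The `Reading` struct, the seven-byte big-endian wire layout and the function names are all assumptions invented for illustration; the point is only that the struct's in-memory layout, padding and `bool` representation never reach the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Application-side structure; its in-memory layout is never sent as-is.
struct Reading {
    std::uint32_t id;
    std::uint16_t value;
    bool valid;
};

// Assumed wire format for this sketch: id (4 bytes, big-endian),
// value (2 bytes, big-endian), valid (1 byte, 0 or 1). No padding.
const std::size_t kWireSize = 7;

// Outgoing: append each field to the byte stream in wire order.
std::vector<unsigned char> encode(const Reading& r) {
    std::vector<unsigned char> out;
    out.reserve(kWireSize);
    out.push_back(static_cast<unsigned char>(r.id >> 24));
    out.push_back(static_cast<unsigned char>(r.id >> 16));
    out.push_back(static_cast<unsigned char>(r.id >> 8));
    out.push_back(static_cast<unsigned char>(r.id));
    out.push_back(static_cast<unsigned char>(r.value >> 8));
    out.push_back(static_cast<unsigned char>(r.value));
    out.push_back(r.valid ? 1 : 0);
    return out;
}

// Incoming: extract each field from the byte stream into the app structure.
Reading decode(const std::vector<unsigned char>& in) {
    if (in.size() != kWireSize)
        throw std::runtime_error("unexpected message length");
    Reading r;
    r.id = (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
    r.value = static_cast<std::uint16_t>((in[4] << 8) | in[5]);
    r.valid = in[6] != 0;
    return r;
}

void example() {
    Reading sent = {0x01020304u, 0x0506u, true};
    std::vector<unsigned char> wire = encode(sent);
    assert(wire.size() == kWireSize);
    assert(wire[0] == 0x01 && wire[5] == 0x06 && wire[6] == 1);

    Reading received = decode(wire);
    assert(received.id == sent.id);
    assert(received.value == sent.value);
    assert(received.valid);
}
```

This approach copies every field once, which is the cost being debated in the thread; in exchange the wire size is the same for 32-bit and 64-bit builds.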

Tomas Puverle

30 May 30 May

2:01 p.m.

...

Exactly! I'm constantly irritated at work where the "wire model" is defined as:

Take this struct as defined in a header, cast it to char* or void* and blast it across the network (or write to disk). Do the converse on the other side. Swap as needed. Every time I point out the dangers in that approach, the response I get is "well, I made sure everything is aligned on four byte boundaries, and I know how to swap bytes".

I don't see what you find so irritating. First of all, as I've pointed out in another post, endianness is not a problem restricted to sending data over a network. Second, at some point you will have to have a raw data structure which will be sent to a device. That is where endianness comes into play. Am I missing something?

...

All of my "home grown" marshalling designs require a copy somewhere -

Let's not get side tracked, please. I am not proposing a marshalling library but an endian one.

...

usually values are extracted from a byte stream into app safe structures (incoming) or appended to a byte stream (endian swapped as needed) for outgoing needs.

Doesn't this contradict your point from the beginning of the message? Sorry but I just feel confused now.

...

In most general marshalling / serialization designs where portable binary values are required, the transform step (from "wire" to "object / memory" model) requires a copy anyway.

Again, please let me repeat that I cannot say anything about a marshalling solution as I'm not proposing one. However, endian swapping can, in many cases, be done with 0 copying. Can you please, at least in pseudocode, describe how you would go from "raw data" to application data? Suppose I've just done the following:

    void * buffer = <allocate memory>;
    read(fh, buffer, size_of_buffer);
    //now what?

That may help me understand your use case better.
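For readers wondering what "0 copying" can mean in practice, here is a small sketch of converting a freshly read buffer in place. It assumes the buffer holds big-endian 32-bit values; the function names are hypothetical and this is not the interface of the library under discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Detect the host byte order at run time.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert a buffer of big-endian 32-bit values to host order in place.
// Nothing is copied; on a big-endian host the buffer is left untouched.
void big_to_host_in_place(unsigned char* buffer, std::size_t size_in_bytes) {
    assert(size_in_bytes % 4 == 0);
    if (!host_is_little_endian())
        return;
    for (std::size_t i = 0; i < size_in_bytes; i += 4)
        std::reverse(buffer + i, buffer + i + 4);
}

// Read one host-order 32-bit value without alignment assumptions.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

void example() {
    // Pretend these eight bytes were just filled by read(fh, buffer, size).
    unsigned char buffer[8] = {0x00, 0x00, 0x00, 0x01, 0x12, 0x34, 0x56, 0x78};
    big_to_host_in_place(buffer, sizeof buffer);
    assert(load_u32(buffer) == 1u);
    assert(load_u32(buffer + 4) == 0x12345678u);
}
```

When the data and the host agree on byte order, the conversion is a no-op, which is the case where an in-place interface can beat any copying interface.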

...

The low-level endian facilities should not require an extra copy, but trying to minimize all copying and constraining the endian library design might be misguided.

I don't feel like I'm doing that. I have provided two interfaces - a copying and a non-copying one without compromising the design in any way. If you've looked at the code, can you perhaps be more specific about what you have in mind? Cheers, Tom

Tomas Puverle

28 May 28 May

3:21 p.m.

Terry, thanks for posting your code.

...

...
At that point, however, the functional approach seems more straightforward.

We sometimes deal with messages with mixed endianness. A functional approach means the programmer has to "know" which endianess the data he wants to swap is.

Well, the programmer needs to know this in any case but in your usage scenario, without additional information about what you are doing, I would tend to agree that the information may be better encapsulated the way you did it. As I said previously, I am open to implementing what you need as a thin layer on top of the "raw" swap capability. That would satisfy your needs, correct? Tom

Rob Riggs

12:36 a.m.

On 05/26/2010 11:07 AM, Stewart, Robert wrote:

...

Tomas Puverle wrote:

...
The highlights of the library are the following: - Very simple interface: swap<machine_to_little/big>(), swap_in_place<machine_to_little/big>()

In my version of such functionality, I have byte_order::to_host() and byte_order::to_network() function templates. Those names are more in keeping with the C functions htons, ntohs, etc.

+1 on the naming. No matter what the interface morphs into, "network" should be a valid synonym for "big_endian" for the reason stated.

Tomas Puverle

3:30 p.m.

...

...
In my version of such functionality, I have byte_order::to_host() and byte_order::to_network() function templates. Those names are more in keeping with the C functions htons, ntohs, etc.

+1 on the naming. No matter what the interface morphs into, "network" should be a valid synonym for "big_endian" for the reason stated.

Hi Rob, I am not sure what your comment refers to exactly. Earlier in the thread we have already established that for many users, the "network" order may mean different things, and hence the reason why my library doesn't prescribe one in the first place. Are you saying that you would like it to do so? Or are you saying you would like to have another typedef, "machine_to_network"? I *could* add that with no problems, of course, BUT like I said, that typedef may mean different things to different people. I would almost prefer for this to be an application specific define, like this

    //MyApp.h
    //
    //in our world, "network order" means little endian
    typedef endian::machine_to_little hton;

and then use it later as

    swap<hton>(myType)

Does that make sense to you? Tom
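A self-contained sketch of how such an application-level typedef could work is shown below. Everything in the `sketch` namespace is hypothetical and written only for this example; the tag names mirror the ones mentioned in the thread, but the real library's implementation may differ considerably.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical; not the library under discussion

struct machine_to_little {};
struct machine_to_big {};

inline bool host_is_little() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Return a copy of v with its bytes reversed.
template <typename T>
T reversed(T v) {
    unsigned char b[sizeof(T)];
    std::memcpy(b, &v, sizeof(T));
    std::reverse(b, b + sizeof(T));
    std::memcpy(&v, b, sizeof(T));
    return v;
}

// A swap is needed only when the target order differs from the host order.
inline bool needs_swap(machine_to_little) { return !host_is_little(); }
inline bool needs_swap(machine_to_big) { return host_is_little(); }

template <typename Direction, typename T>
T swap(T value) {
    return needs_swap(Direction()) ? reversed(value) : value;
}

}  // namespace sketch

// MyApp.h: in this application, "network order" means little endian.
typedef sketch::machine_to_little hton;

void example() {
    std::uint32_t value = 0x01020304u;
    std::uint32_t wire = sketch::swap<hton>(value);

    // Whatever the host is, the bytes in memory are now little-endian.
    unsigned char bytes[4];
    std::memcpy(bytes, &wire, sizeof wire);
    assert(bytes[0] == 0x04 && bytes[3] == 0x01);
}
```

The core facility stays free of the word "network", while each application decides in one place what its own wire order is.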

Rob Riggs

29 May 29 May

3:29 a.m.

On 05/28/2010 09:30 AM, Tomas Puverle wrote:

...

...
...
In my version of such functionality, I have byte_order::to_host() and

byte_order::to_network()

...
function templates. Those names are more in keeping with the C functions

htons, ntohs, etc.

...
...
+1 on the naming. No matter what the interface morphs into, "network" should be a valid synonym for "big_endian" for the reason stated.

Hi Rob,

I am not sure what your comment refers to exactly.

Earlier in the thread we have already established that for many users, the "network" order may mean different things, and hence the reason why my library doesn't prescribe one in the first place.

Hi Tomas, The term "network byte order" means the same to everyone who has had to learn what endianness is. Why would one want to ignore that the term "network byte order" has real meaning in an endianness library? That there are legitimate use cases for sending little-endian data over the network is, to me, irrelevant.

...

Are you saying that you would like it to do so? Or are you saying you would like to have another typedef, "machine_to_network"?

Acknowledging "network byte order" in the interface does not prescribe anything. What I am suggesting is that it should be a consistent synonym for "big endian". Virtually all IETF binary protocols specify /network byte order/ and everyone who reads the RFCs knows what that means. One need only google 'rfc "network byte order"' to get a feel for its pervasiveness. Ignoring it completely seems silly.

...

I *could* add that with no problems, of course, BUT like I said, that typedef may mean different things to different people. I would almost prefer for this to be an application specific define

But it's not application-specific. It's a de facto world-wide standard. Regards, Rob

Terry Golubiewski

4:02 a.m.

...

The term "network byte order" means the same to everyone who has had to learn what endianness is. Why would one want ignore that the term "network byte order" has real meaning in an endianness library? That there are legitimate use cases for sending little-endian data over the network is, to me, irrelevant.

...
Are you saying that you would like it to do so? Or are you saying you would like to have another typedef, "machine_to_network"?

Acknowledging "network byte order" in the interface does not prescribe anything. I am suggesting is that it should be a consistent synonym for "big endian". Virtually all IETF binary protocols specify /network byte order/ and everyone who reads the RFCs knows what that means. One need only google 'rfc "network byte order"' to get a feel for its pervasiveness. Ignoring it completely seems silly.

...
I *could* add that with no problems, of course, BUT like I said, that typedef may mean different things to different people. I would almost prefer for this to be an application specific define

But it's not application-specific. It's a de facto world-wide standard.

There are other networks besides the internet, some that don't use IP and are not big-endian by default. It would probably make sense to have an "internet" namespace that has all that stuff predefined somewhere. Endianness is a bigger concept than the internet, especially for embedded software that uses several dissimilar processors, but might not have any "network" interface. BTW there are other endiannesses besides "big" and "little" too, but I haven't seen any processors with "middle-endian" like that since the PDP (if that was the one). Endianness is not just byte-swapping either. A big-endian, floating point representation and a little-endian representation of the same kind of floating point number may not be simple byte-reflections of each other. terry
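To illustrate why "endianness" is more than reversing bytes, the sketch below encodes a 32-bit value in the layout commonly described as middle-endian for the PDP-11: two 16-bit words with the most significant word first, each word stored low byte first. The function names are hypothetical, and the layout is stated here as an assumption about that convention rather than something taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Store v as two 16-bit words, high word first, each word little-endian
// (the layout commonly called "middle-endian" or PDP-endian).
void store_pdp32(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(v >> 16);  // high word, low byte
    out[1] = static_cast<unsigned char>(v >> 24);  // high word, high byte
    out[2] = static_cast<unsigned char>(v);        // low word, low byte
    out[3] = static_cast<unsigned char>(v >> 8);   // low word, high byte
}

// Rebuild the native value from the middle-endian layout.
std::uint32_t load_pdp32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[1]) << 24) |
           (static_cast<std::uint32_t>(in[0]) << 16) |
           (static_cast<std::uint32_t>(in[3]) << 8) |
           static_cast<std::uint32_t>(in[2]);
}

void example() {
    unsigned char bytes[4];
    store_pdp32(0x0A0B0C0Du, bytes);
    // Neither 0A 0B 0C 0D (big) nor 0D 0C 0B 0A (little).
    assert(bytes[0] == 0x0B && bytes[1] == 0x0A);
    assert(bytes[2] == 0x0D && bytes[3] == 0x0C);
    assert(load_pdp32(bytes) == 0x0A0B0C0Du);
}
```

A library built purely on "reverse the bytes if the orders differ" cannot express this layout, which is one argument for keeping the conversion policy separate from the byte-reversal primitive.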

Michael Caisse

4:28 p.m.

Terry Golubiewski wrote:

...

...
The term "network byte order" means the same to everyone who has had to learn what endianness is. Why would one want ignore that the term "network byte order" has real meaning in an endianness library? That there are legitimate use cases for sending little-endian data over the network is, to me, irrelevant.

...
Are you saying that you would like it to do so? Or are you saying you would like to have another typedef, "machine_to_network"?

Acknowledging "network byte order" in the interface does not prescribe anything. I am suggesting is that it should be a consistent synonym for "big endian". Virtually all IETF binary protocols specify /network byte order/ and everyone who reads the RFCs knows what that means. One need only google 'rfc "network byte order"' to get a feel for its pervasiveness. Ignoring it completely seems silly.

...
I *could* add that with no problems, of course, BUT like I said, that typedef may mean different things to different people. I would almost prefer for this to be an application specific define

But it's not application-specific. It's a de facto world-wide standard.

There are other networks besides the internet, some that don't use IP and are not big-endian by default. It would probably make sense to have an "internet" namespace that has all that stuff predefined somewhere. Endianess is a bigger concept than the internet, especially for embedded software that uses several dissimilar processors, but might not have any "network" interface. BTW there are other endiannesses besides "bit" and "little" too, but I haven't seen any processors with "middle-endian" like that since the PDP (if that was the one). Endianess is not just byte-swapping either. A big-endian, floating point representation and a little-endian representation of the same kind of floating point number may not be simple byte-reflections of each other.

terry

Tom / Terry - I don't think the point being made is that there are not different endian formats on networks but that the term "network byte order" actually means something. It is not a wishful or ambiguous description it is actually a real term with real meaning. Thanks to the Berkeley API that many of us grew up with "host to network" also has real meaning. I'm not suggesting that the term "network" should even appear in an endian library. I would prefer the term to only exist in some domain specific namespace; however, ignoring well over 20-years of terminology history isn't going to make things clearer in the interface. Can we keep the term network out of the main interface and simply add it to namespaces as Terry has suggested? Tom, I'm looking forward to reading more about your library. Best regards - michael -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com

Tomas Puverle

30 May 30 May

2:23 p.m.

Michael,

...

It is not a wishful or ambiguous description it is actually a real term with real meaning. Thanks to the Berkeley API that many of us grew up with "host to network" also has real meaning.

I don't think we were disputing it at all; we were trying to...

...

I'm not suggesting that the term "network" should even appear in an endian library.

...make the same point as you have here.

...

I would prefer the term to only exist in some domain specific namespace;

Completely agree. Hence my reference to having the typedef as part of an "application" as opposed to the core part of the endian library.

...

however, ignoring well over 20-years of terminology history isn't going to make things clearer in the interface.

I am not - I was simply pointing out that it's not a network library. This typedef would perhaps be better placed in ASIO, even though I still think even that is somewhat contentious.

...

Can we keep the term network out of the main interface and simply add it to namespaces as Terry has suggested?

Do you have a suggestion? I am not sure that adding anything about networking to the endian library is the correct separation of concerns but I am open to ideas.

...

Tom, I'm looking forward to reading more about your library.

Thank you. And I look forward to any additional comments you may have. Tom

Jonathan Franklin

1 Jun 1 Jun

9:15 p.m.

On Sat, May 29, 2010 at 10:28 AM, Michael Caisse <boost@objectmodelingdesigns.com> wrote:

...

I don't think the point being made is that there are not different endian formats on networks but that the term "network byte order" actually means something. It is not a wishful or ambiguous description it is actually a real term with real meaning. Thanks to the Berkeley API that many of us grew up with "host to network" also has real meaning.

Yup.

...

I'm not suggesting that the term "network" should even appear in an endian library. I would prefer the term to only exist in some domain specific namespace; however, ignoring well over 20-years of terminology history isn't going to make things clearer in the interface.

+1 for not including "network" in the interface to avoid future discussions such as these. Jon

Tomas Puverle

30 May 30 May

2:13 p.m.

Rob,

...

The term "network byte order" means the same to everyone who has had to learn what endianness is. Why would one want ignore that the term "network byte order" has real meaning in an endianness library?

Because this is an endian library and not a network library. Like others on this thread, I think you are making the assumption that the main use for "endian" swapping is to do with networking. Endian swapping is a more general utility and hence my resistance to including terms such as "network byte order" within it.

...

That there are legitimate use cases for sending little-endian data over the network is, to me, irrelevant.

I am sorry but I am having trouble interpreting this sentence. I presume you didn't intend for it to be inflammatory - could you perhaps reword it so I can understand what you are trying to say?

...

Acknowledging "network byte order" in the interface does not prescribe anything. I am suggesting is that it should be a consistent synonym for "big endian".

Then the answer is "no" for the reason I spelled out above. It would be inappropriate for a low-level library such as an endian swapper to dictate application policies. Tom

vicente.botet

26 May 26 May

9:55 p.m.

Hi Tom, how are you. ----- Original Message ----- From: "Tomas Puverle" <tomas.puverle@morganstanley.com> To: <boost@lists.boost.org> Sent: Wednesday, May 26, 2010 6:37 PM Subject: [boost] [boost::endian] Request for comments/interest

...

I wrote an endian library on the flight back from boostcon but it has taken a little time to clear the submission with the firm.

I realize that Beman's endian swapper library is already in the review queue but I hope my approach is different and interesting enough to warrant consideration. It's located here in the vault (Utilities/endian-puverle.zip):

http://www.boostpro.com/vault/index.php?action=downloadfile&filename=endian-puverle.zip&directory=Utilities&

The highlights of the library are the following: - Very simple interface: swap<machine_to_little/big>(), swap_in_place<machine_to_little/big>() - Support for built-in and user-defined data types and ranges thereof. E.g. you can swap ranges of structs containing/deriving from other structs - endian::iterator, which will iterate across a range swapping values as necessary. It works with any swappable type.

The library is described/documented/tested in the file main.cpp at the root of the zip archive.

Unfinished items:
- Right now, the library only includes the "vanilla" swapper, meaning it doesn't take advantage of special instructions on architectures which natively support endian conversion. I didn't include this in order to prevent cluttering the code with #ifdefs. This functionality would, however, be part of the final library.
- Ideally, the metafunction is_mutable_range<> should be moved out of endian::detail and become part of Boost.Range or Boost.TypeTraits.

I have donned my bulletproof vest and I'm ready for any feedback you may have. :)

One of the advantages of Beman's Endian library is that the endianness of the stored data is on the type. I think that this is a must, as very often the hidden endian bugs are due to this fact.

I proposed to Beman that the functionality of endian be split into two levels. The first level takes care only of the endian format and how to convert from one endianness to the other. The second defines the endian integer types. The reason for the split is that the cost of arithmetic operations that perform two conversions each time is too high for a lot of applications, so it would be better to disallow those operations. Of course the user can simply avoid them with the current design, but a mistake is always possible.

    int a=0xABCD;
    int b=endian::swap<machine_to_big>(a);

is done with Boost.Endian as follows

    native32_t a=0xABCDEF;
    big32_t b=a;

or

    int a=0xABCDEF;
    big32_t b=a;

* I like your endian::swap idea of applying swap to structures. I have not yet seen what BOOST_ENDIAN_UDT(Base, (i)(d)) does, but maybe something like this could be done:

    struct Base { int i; short s; };
    struct BaseBig { big32_t i; big16_t s; };

We can see the structures as fusion sequences using the STRUCT_ADAPTOR. It should not be too complicated to define a conversion function that uses the fusion sequence views of both structures and handles the endianness:

    Base b = {1234, 123};
    BaseBig bb = convert<BaseBig>(b);

* My experience with swap in place is that it is very dangerous, as you can swap in order to send a message, forget that the variable is left in the wrong format, and then perform arithmetic operations on it.

    int a=0xABCD;
    endian::swap_in_place<machine_to_big>(a);
    ... later on

Which is the format of 'a'? Usually the developer will think that it is native, but ...

* Maybe your design could perform better and allows in-place changes (which is risky), but if I had to start a project I would use Beman's library, if the performance were acceptable, which I think it is if we don't use arithmetic on endian types.

To summarize, there will always be people who prefer one design or the other. Just 2 cts from someone who has had a lot of endian issues.

Best,
Vicente
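To make the "endianness on the type" idea concrete, here is a minimal sketch of what such a type could look like. It is not Boost.Endian's actual big32_t; the name big32_sketch and its data() accessor are assumptions made up for this illustration. The value is always stored most-significant byte first, and arithmetic can only happen after converting back to a native integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only: not the real big32_t from Beman's library.
// The stored representation is always big endian, whatever the host order is.
class big32_sketch {
public:
    big32_sketch() : bytes_() {}

    // Implicit on purpose, to mirror "big32_t b = a;" from the post above.
    big32_sketch(std::uint32_t native) { store(native); }

    // Converts back to a native value. Arithmetic is performed on the
    // returned integer, never on the stored bytes.
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) |
               (std::uint32_t(bytes_[1]) << 16) |
               (std::uint32_t(bytes_[2]) << 8)  |
                std::uint32_t(bytes_[3]);
    }

    // Raw access for I/O: these are the bytes that would be written out.
    const unsigned char* data() const { return bytes_; }

private:
    // Shifts work on values, so this is independent of host endianness.
    void store(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }

    unsigned char bytes_[4];
};
```

With a type like this the question "which format is this variable in?" is answered by its declaration, at the price of a conversion on every load and store.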

Tomas Puverle

27 May 27 May

2:26 a.m.

...

Hi Tom, how are you.

I am good. I hope you had a good flight back from boostcon. I ended up being stuck in Denver because I missed my last connection.

...

One of the advantages of Beman's Endian library is that the endianness of the stored data is on the type. I think that this is a must, as very often the hidden endian bugs are due to this fact.

There may be some overlap here with the post I just sent in reply to Terry's questions. Rather than repeating myself, do you mind if I refer you to that post?

...

I proposed to Beman to split the functionality of endian into two levels. The first level takes care just of endian format and how to convert from one endian to the other. The second defines integer endian types. <snip more...>

Actually, I have toyed with the idea and I am not at all opposed to it. But I thought I'd start small first.

...

* My experience with swap in place is that it is very dangerous as you can swap in order to send a message and <snip>

At the end of the day, doing endian swapping is inherently a "dangerous" operation, to use your words. I don't think we can always stop the programmer from doing something silly... ;)

...

To summarize, there will always be people who prefer one design or the other.

I agree. Thank you for the feedback. Tom

Terry Golubiewski

29 May 29 May

1:24 p.m.

I was just looking at your implementation. It looks like it will only swap things with even numbers of entities. For example, would swap_in_place<int24_t>(an_int24_t) work? sizeof(int24_t) == 3

terry

Tomas Puverle

30 May 30 May

1:22 a.m.

...

I was just looking at your implementation. It looks like it will only swap things with even-numbers of entities. For example, would swap_in_place<int24_t>(an_int24_t) work? sizeof(int24_t) == 3

You are correct that I haven't provided the overloads for the non-power-of-two sizes. This, however, is not a limitation of the design but simply the product of my desire to keep the amount of code to a minimum in the RFC version of the code. It is simply a question of providing the template specializations.
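As a rough illustration of why odd sizes need not be a design problem, a size-agnostic byte reversal could look like the sketch below. This is not code from the library in the vault: int24_sketch and reverse_bytes_in_place are hypothetical names, and the function reverses unconditionally rather than applying a machine_to_big or machine_to_little policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 3-byte integer; the int24_t mentioned in the
// thread is not shown, so this layout is an assumption.
struct int24_sketch {
    unsigned char bytes[3];
};

// Reverses the object representation of a trivially copyable T in place.
// It works for any sizeof(T), including odd sizes such as 3.
template <typename T>
void reverse_bytes_in_place(T& value) {
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));   // copy out to avoid aliasing issues
    std::reverse(buf, buf + sizeof(T));    // the middle byte of an odd size stays put
    std::memcpy(&value, buf, sizeof(T));
}
```

A real implementation would presumably dispatch on size to pick faster specializations where they exist and fall back to something like this loop otherwise.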
