[endian] Comments on Endian 0.5

Hello,
I started a thread on Boost-users about the proposed endian library, but it's probably more suitable on the dev list. Here's a link to the thread in the archives: <http://lists.boost.org/boost-users/2006/07/20887.php>
I see two issues with the library as of 0.5. First, the endian classes cannot be used in unions. And second, the classes cannot be used in variable argument lists. I'm attaching a patch to help resolve both of these issues.
In order to be used in a union, a class must not have any constructors. However, removing constructors breaks the following code:
    big4_t x = 42;
My solution was to provide a static init() method to help in these situations:
    big4_t x = big4_t::init(42);
The second problem shows up using varargs. For example, this will not compile:
    printf("x = %d\n", x);
The solution I came up with is to use operator() to get the native value:
    printf("x = %d\n", x());
In my patch, I've also added a check_unions() function to endian_test.cpp that tests the sizes of unions.
Thanks,
-Dave
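Dave's two fixes (a static init() factory in place of the converting constructor, and operator() to recover the native value for varargs) can be sketched on a simplified, hypothetical stand-in for big4_t. The byte-array layout and member names are assumptions, not the 0.5 library's code; the point is that with no user-declared constructors the type stays trivial and standard-layout, so it can be a union member.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified big4_t: four bytes, most significant first.
// It declares no constructors, so it remains a POD.
struct big4_t {
    unsigned char bytes[4];

    // Assignment from the native type works without any constructor.
    big4_t& operator=(std::int32_t v) {
        std::uint32_t u = static_cast<std::uint32_t>(v);
        bytes[0] = static_cast<unsigned char>(u >> 24);
        bytes[1] = static_cast<unsigned char>(u >> 16);
        bytes[2] = static_cast<unsigned char>(u >> 8);
        bytes[3] = static_cast<unsigned char>(u);
        return *this;
    }

    // Factory standing in for the removed converting constructor.
    static big4_t init(std::int32_t v) {
        big4_t r;
        r = v;
        return r;
    }

    // operator() yields the native value, e.g. for printf("%d", x()).
    std::int32_t operator()() const {
        std::uint32_t u = (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
                          (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
        return static_cast<std::int32_t>(u);
    }
};

static_assert(std::is_trivial<big4_t>::value && std::is_standard_layout<big4_t>::value,
              "no constructors, so big4_t is a POD");

// Union membership compiles because big4_t is a POD.
union big4_overlay {
    big4_t big;
    unsigned char raw[4];
};
```

Usage would read `big4_t x = big4_t::init(42); std::printf("x = %d\n", x());`, which assumes `std::int32_t` is `int` on the target, as it is on mainstream platforms.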

Dave Dribin wrote :
I see two issues with the library as of 0.5. First, the endian classes cannot be used in unions. And, second the classes cannot be used in variable argument lists.
How is that a problem? Those are old C features that should be avoided. Using an ugly interface just to make it a POD type is a bad idea IMO.

On Jul 20, 2006, at 4:17 PM, loufoque wrote:
Dave Dribin wrote :
I see two issues with the library as of 0.5. First, the endian classes cannot be used in unions. And, second the classes cannot be used in variable argument lists.
How is that a problem? Those are old C features that should be avoided.
Maybe I'm misunderstanding the use case of the endian library, but I can see these classes being used to read and write binary file formats and network messages. Sometimes these things use unions. It's not necessarily an old C feature; it's genuinely useful to have two fields map to the same memory location for some protocols or file formats. The use case I gave on Boost-users was that I had to port code that reads and writes Apple's HFS+ file system. Apple was kind enough to provide structures (and unions) to access the low-level data: <http://darwinsource.opendarwin.org/10.4.6.ppc/xnu-792.6.70/bsd/hfs/hfs_format.h> Of course, when I ported code that used this header from PowerPC to Intel, the byte order was wrong. I basically wrote my own C++ classes, very similar to the Boost ones, to hide the necessary byte-swapping ugliness. With these classes, I was able to replace all uses of u_int32_t with ubig4_t in the header file. Then all my code that accessed these structures was instantly portable to Intel with very few code changes. It was pretty remarkable, actually.
Using an ugly interface just to make it a POD type is a bad idea IMO.
What's ugly? The ::init()? It's not *that* ugly, and it makes the library more useful. What's the use case for requiring elegant initializers? I don't see many people doing something like:
    big4_t x = 42;
What good is a single big-endian variable on the stack? I see far more use cases for these classes in structs and unions, and then writing them out to a file descriptor.
-Dave

Dave Dribin wrote :
Sometimes these things use unions. It's not necessarily an old C feature.
Have you looked at Boost.Variant? It allows similar semantic functionality.
What's ugly?
It's an illegitimate use of a factory. According to the RAII idiom, the resource should be acquired through its constructor. However, I admit I don't know how that fits with (un)serialization of binary files.

On Jul 21, 2006, at 10:51 AM, loufoque wrote:
Dave Dribin wrote :
Sometimes these things use unions. It's not necessarily an old C feature.
Have you looked at Boost.Variant?
I hadn't until just now.
It allows similar semantic functionality.
Eh... it may be similar semantically, but it doesn't fit the same use cases. First, this doesn't work:
    boost::variant< big4_t, little4_t > u(42);
You get a 3,300-character error message, the gist of which is:
    error: call of overloaded 'initialize(void*, const int&)' is ambiguous
Second, variant provides no guarantee of memory layout. This:
    boost::variant< big4_t, uint32_t > u(42);
    std::cout << sizeof(u) << std::endl;
prints 8. The whole point of a union is to have two structure members share the same memory location. This union (assuming you could actually use these in unions) has a sizeof of 4:
    union { big4_t big; little4_t little; } big_or_little;
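The layout difference Dave describes can be checked at compile time. This sketch uses hypothetical, minimal big4_t and little4_t structs (plain byte arrays, no constructors) and substitutes C++17 `std::variant` for `boost::variant`, since both are discriminated unions. The exact variant size is implementation-specific, but in practice it exceeds 4 because the active-alternative index is stored next to the payload.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical minimal holders: four raw bytes each, no constructors.
struct big4_t    { unsigned char b[4]; };
struct little4_t { unsigned char b[4]; };

// Decode helpers (assumed for illustration).
inline std::uint32_t value(const big4_t& x) {
    return (std::uint32_t(x.b[0]) << 24) | (std::uint32_t(x.b[1]) << 16) |
           (std::uint32_t(x.b[2]) << 8) | std::uint32_t(x.b[3]);
}
inline std::uint32_t value(const little4_t& x) {
    return (std::uint32_t(x.b[3]) << 24) | (std::uint32_t(x.b[2]) << 16) |
           (std::uint32_t(x.b[1]) << 8) | std::uint32_t(x.b[0]);
}

// Both members overlay the same four bytes. They share a common initial
// sequence (unsigned char[4]), so reading the inactive member's bytes is allowed.
union big_or_little {
    big4_t big;
    little4_t little;
};
static_assert(sizeof(big_or_little) == 4, "members share storage");

// A discriminated union also records which alternative is active,
// so in practice it cannot fit in four bytes.
static_assert(sizeof(std::variant<big4_t, std::uint32_t>) > 4, "variant stores an index too");
```

The same four bytes read as 0x12345678 through `big` and as 0x78563412 through `little`, which is the "different overlays on one piece of memory" behavior that a variant does not offer.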
It's an illegitimate use of a factory. According to the RAII idiom, the resource should be acquired through its constructor.
Well, I guess that's a matter of opinion. We're not really acquiring any resources here, so it seems okay to break the idiom. I think it's a happy-medium workaround for a restriction of the C++ language. Remember that the whole point of these endian classes is to abstract away physical memory and give you complete control over the alignment and endianness of binary data. You *want* to use these in structures because you can guarantee the padding and endianness of the underlying memory. They're not your typical C++ abstractions; they're necessarily low-level. Which is why, in my opinion, they should be allowed in unions. A union's sole purpose is to provide different overlays on top of memory.
Just my $.02,
-Dave
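Dave's point about guaranteed padding rests on the unaligned endian types being stored as byte arrays, which have alignment 1. A minimal sketch with hypothetical big2_t/big4_t holders and an assumed on-disk record layout:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical unaligned holders: byte arrays have alignment 1,
// so the compiler has no reason to pad around them.
struct big2_t { unsigned char b[2]; };
struct big4_t { unsigned char b[4]; };

// Assumed on-disk record: a 2-byte tag immediately followed by a 4-byte length.
struct record {
    big2_t tag;
    big4_t length;
};

static_assert(alignof(record) == 1, "byte-array members need no alignment");
static_assert(offsetof(record, length) == 2, "length follows tag with no gap");
static_assert(sizeof(record) == 6, "no trailing padding either");

// The native equivalent is usually padded: on common ABIs 'length' sits at
// offset 4 and sizeof is 8, but the standard does not promise either value.
struct native_record {
    std::uint16_t tag;
    std::uint32_t length;
};
```

The static_asserts document the layout the file format expects, so a mismatch shows up at compile time rather than as corrupted data.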

On Jul 20, 2006, at 4:17 PM, loufoque wrote:
[snip]
What's ugly? The ::init()? It's not *that* ugly. And it makes the library more useful. What's the use case for requiring elegant initializers? I don't see many people doing something like:
big4_t x = 42;
What about the use of endian objects where one would expect raw integral objects? Such use includes
    template<typename DestT, typename SourceT> DestT make_double(SourceT val) { return 2 * val; }
    big4_t doubleVal = make_double<big4_t>(21);
The generic code need not even deal with arithmetic at all, such as in
    vector<int> int_vect; ... vector<big4_t> big4_vect(int_vect.begin(), int_vect.end());
Since we do not have the luxury of two user-defined conversions in the chain from 'int' to 'endian<...>', which would allow us to write an intermediate type with both an 'int' constructor and an 'endian' operator, we have to choose: unions (and other pure POD scenarios) or conversion from raw integral objects. I think a wise decision was made in choosing the latter.
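Both of David Bergman's examples rely on an implicit conversion from `int`. A hedged sketch with a hypothetical big4_t that keeps a converting constructor (plus a conversion back to the native type) shows `make_double` and the vector range constructor compiling; the class layout is assumed, not Endian 0.5's.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical big4_t that keeps a converting constructor.
class big4_t {
public:
    big4_t() = default;
    big4_t(std::int32_t v) {                 // int -> big4_t, used implicitly
        std::uint32_t u = static_cast<std::uint32_t>(v);
        for (int i = 3; i >= 0; --i) {       // least significant byte goes last
            b_[i] = static_cast<unsigned char>(u);
            u >>= 8;
        }
    }
    operator std::int32_t() const {          // big4_t -> int
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) u = (u << 8) | b_[i];
        return static_cast<std::int32_t>(u);
    }
private:
    unsigned char b_[4];
};

// Generic code written for plain integers: 'return 2 * val' needs int -> DestT.
template <typename DestT, typename SourceT>
DestT make_double(SourceT val) { return 2 * val; }

// The range constructor builds each element from an int via big4_t(std::int32_t).
inline std::vector<big4_t> to_big(const std::vector<int>& ints) {
    return std::vector<big4_t>(ints.begin(), ints.end());
}
```

Without the converting constructor, neither `make_double<big4_t>(21)` nor the range construction would compile, which is the trade-off David describes.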
What good is a single big-endian variable on the stack? I see far more use cases for these classes in structs and unions, and then writing them out to a file descriptor.
But we also want it to be interchangeable - as far as possible - with integers in generic code (whether templates or "copy-and-pasting".) /David

On Jul 21, 2006, at 4:14 PM, David Bergman wrote:
What about the use of endian objects where one would expect raw integral objects? Such use includes
template<typename DestT, typename SourceT> DestT make_double(SourceT val) { return 2 * val; }
big4_t doubleVal = make_double<big4_t>(21);
This doesn't seem to be a common use case for binary data structures used in I/O.
The generic code need not even deal with arithmetic at all, such as in
vector<int> int_vect; ... vector<big4_t> big4_vect(int_vect.begin(), int_vect.end());
And this has a workaround by using a for loop.
Since we do not have the luxury of two user-defined conversions in the chain from 'int' to 'endian<...>', which would allow us to write an intermediate type with both an 'int' constructor and an 'endian' operator, we have to choose: unions (and other pure POD scenarios) or conversion from raw integral objects. I think a wise decision was made in choosing the latter.
I guess I disagree. With an assignment operator, most of the conversions from raw integral types are covered, and you can still use 'em in unions. If you still feel this way, would it be at least possible to add a #define to disable the constructors? This would allow the user of the library to choose what is important to them, rather than the library writer. We obviously can't support all use cases with the current limitations of the C++ language.
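Dave's fallback request can be sketched as a preprocessor switch around the constructors while the assignment operator stays unconditional. The macro name `ENDIAN_NO_CONSTRUCTORS` and the class layout are hypothetical. With the macro defined, big4_t has no user-declared constructors and can be a union member; without it, `big4_t x = 42;` keeps working.

```cpp
#include <cstdint>

// Hypothetical opt-out switch (macro name assumed): define it before this
// point to get a constructor-free, union-friendly big4_t.
// #define ENDIAN_NO_CONSTRUCTORS

struct big4_t {
#ifndef ENDIAN_NO_CONSTRUCTORS
    big4_t() {}                               // user-provided, as in C++03-era code
    big4_t(std::int32_t v) { *this = v; }     // enables  big4_t x = 42;
#endif
    // Assignment is always available, so most conversions from int still work.
    big4_t& operator=(std::int32_t v) {
        std::uint32_t u = static_cast<std::uint32_t>(v);
        for (int i = 3; i >= 0; --i) {
            b[i] = static_cast<unsigned char>(u);
            u >>= 8;
        }
        return *this;
    }
    std::int32_t value() const {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) u = (u << 8) | b[i];
        return static_cast<std::int32_t>(u);
    }
    unsigned char b[4];
};
```

One design caveat: a macro that changes a class definition must be set identically in every translation unit, or the program violates the one-definition rule.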
What good is a single big-endian variable on the stack? I see far more use cases for these classes in structs and unions, and then writing them out to a file descriptor.
But we also want it to be interchangeable - as far as possible - with integers in generic code (whether templates or "copy-and-pasting".)
Yes, I would agree. As far as possible, but not at the expense of legitimate and useful use cases. An endian class library that could be used in unions would have saved me a good chunk of time on a recent client project. I keep rereading the documentation, and nowhere does it say that interchangeability with integer types is a key requirement. It talks about providing "integer-like byte-holder binary types with explicit control over byte order, value type, size, and alignment." It says "use cases almost always involve I/O, either via files or network connections." Unions are not uncommon in these situations. In fact, the example at the end gives a structure representing a file format, and it is not at all uncommon for structures that represent file formats to require unions. One of the FAQ entries even asks why operators are provided at all, since they aren't really the prime feature of these classes. Basically, they are nice to have because they make code more readable. I agree with that, so long as reducing clutter doesn't inhibit real-world use cases. Removing constructors makes only a small fraction of use cases more wordy. Well, if I haven't convinced you by this point, hopefully you'll at least consider using a #define to disable constructors for those people who *do* find a real-world use for such classes in unions.
-Dave

Dave Dribin <dave-ml@dribin.org> writes:
[snip]
Well, if I haven't convinced you by this point, hopefully you'll at least consider using a #define to disable constructors for those people who *do* find a real-world use for such classes in unions.
Perhaps the solution is to support both use cases by having a separate type for each of the two use cases. -- Jeremy Maitin-Shepard

David Dribin wrote:
On Jul 21, 2006, at 4:14 PM, David Bergman wrote:
What about the use of endian objects where one would expect raw integral objects? Such use includes
template<typename DestT, typename SourceT> DestT make_double(SourceT val) { return 2 * val; }
big4_t doubleVal = make_double<big4_t>(21);
This doesn't seem to be a common use case for binary data structures used in I/O.
Ok, but *before* being output? The object might very well pass through some generic code.
The generic code need not even deal with arithmetic at all, such as in
vector<int> int_vect; ... vector<big4_t> big4_vect(int_vect.begin(), int_vect.end());
And this has a workaround by using a for loop.
I don't understand. One would need to use "::init", right? That is not exactly compatible with regular integer types, or with most other types. I was referring to generic code which we cannot easily change, as well.
Since we do not have the luxury of two user-defined conversions in the chain from 'int' to 'endian<...>', which would allow us to write an intermediate type with both an 'int' constructor and an 'endian' operator, we have to choose: unions (and other pure POD scenarios) or conversion from raw integral objects. I think a wise decision was made in choosing the latter.
I guess I disagree. With an assignment operator, most of the conversions from raw integral types are covered, and you can still use 'em in unions. If you still feel this way, would it be at least possible to add a #define to disable the constructors?
Yes, maybe.
This would allow the user of the library to choose what is important to them, rather than the library writer. We obviously can't support all use cases with the current limitations of the C++ language.
You are right. [snip]
Well, if I haven't convinced you by this point, hopefully you'll at least consider using a #define to disable constructors for those people who *do* find a real-world use for such classes in unions.
I would probably not consider it, but it is not my choice :-)
What I *would have* considered, though, is to have two templates:
    template<...> struct pod_endian { ... no constructor but all the endian logic ... };
and
    template<...> struct endian : pod_endian { ... constructors and not much else ... };
And have a conversion operator to get an endian from a pod_endian.
Additionally, I would probably use my embed_type proposal and get:
    template<.., typename IntType, ..> struct endian : embed_type<IntType, endian> { ... };
using proper policies.
/David
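The two-template split can be sketched concretely for a single case (4-byte, big-endian, unaligned). The names pod_big4 and big4 are hypothetical stand-ins for pod_endian and endian. Where David suggests a conversion operator, this sketch uses a converting constructor in the derived type instead, so the POD base never needs to know about its derived class.

```cpp
#include <cstdint>

// All of the byte-order logic lives in the constructor-free POD type.
struct pod_big4 {
    unsigned char b[4];

    pod_big4& operator=(std::int32_t v) {
        std::uint32_t u = static_cast<std::uint32_t>(v);
        for (int i = 3; i >= 0; --i) {
            b[i] = static_cast<unsigned char>(u);
            u >>= 8;
        }
        return *this;
    }
    std::int32_t value() const {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) u = (u << 8) | b[i];
        return static_cast<std::int32_t>(u);
    }
};

// The derived type adds constructors "and not much else".
struct big4 : pod_big4 {
    big4() = default;
    big4(std::int32_t v) { pod_big4::operator=(v); }
    big4(const pod_big4& p) : pod_big4(p) {}   // get an endian from a pod_endian
    using pod_big4::operator=;                 // keep assignment from int visible
};

// Only the POD form goes into unions.
union overlay {
    pod_big4 big;
    unsigned char raw[4];
};
```

Generic code uses `big4`, unions and raw file structures use `pod_big4`, and values move between the two by copy.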

I wrote:
What I *would have* considered, though, is to have two templates:
template<...> struct pod_endian { ... no constructor but all the endian logic ... }
and
template<...> struct endian : pod_endian { ... constructors and not much else ... }
And have conversion operator to get an endian from a pod_endian.
... or have one template with an extra flag for "create constructor" (much like a policy...) and use a template specialization:
    // Here, 'pod_endian' contains the actual logic, but can be
    // hidden (put in anonymous namespace...)
    template<endianness E, typename RawT, std::size_t n_bytes, alignment A, bool createConstructor = true>
    struct endian : pod_endian<E, RawT, n_bytes, A> {
        endian(RawT num) : m_value(num) {}
    };
    template<..., typename RawT>
    struct endian<..., RawT, false> : pod_endian<E, RawT, n_bytes, A> {
        // no constructor, and actually nothing else either :-)
    };
This way, we can use either one:
    union MakingMrDribinHappy {
        endian<..., false> mySpecificInt;
        int somethingElse;
    };
    template<typename T> void makingMrDawesHappy(T t) { .... }
    makingMrDawesHappy(endian<..., true>(42));
One would of course have to either create named instantiations for both these kinds or parameterize them, as
    template<bool useConstructor = true> struct big4_t : endian<big, int_least32_t, 4, useConstructor> {};
The use is not as nice:
    big4_t<> myTemplateFriendlyInt;
    big4_t<false> myUnionFriendlyInt;
Ugh :-(
/David
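David's flag-plus-specialization idea can be written out in compilable form if the endianness and alignment parameters are dropped (big-endian and unaligned only, an assumption made for brevity). The names pod_big, big, big4_t and big4_pod_t are hypothetical. Because a base-class member cannot appear in the derived constructor's mem-initializer list, the constructor here assigns through the base's operator= instead.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical base holding the byte-order logic for an N-byte big-endian T.
template <typename T, std::size_t N>
struct pod_big {
    unsigned char b[N];

    pod_big& operator=(T v) {
        std::uint64_t u = static_cast<std::uint64_t>(v);
        for (std::size_t i = N; i-- > 0;) {   // least significant byte ends up in b[N-1]
            b[i] = static_cast<unsigned char>(u);
            u >>= 8;
        }
        return *this;
    }
    T value() const {
        std::uint64_t u = 0;
        for (std::size_t i = 0; i < N; ++i) u = (u << 8) | b[i];
        return static_cast<T>(u);
    }
};

// Primary template: the flag defaults to true and adds a converting constructor.
template <typename T, std::size_t N, bool CreateConstructor = true>
struct big : pod_big<T, N> {
    big() = default;
    big(T v) { pod_big<T, N>::operator=(v); }
    using pod_big<T, N>::operator=;
};

// Partial specialization for false: no constructors, and nothing else either.
template <typename T, std::size_t N>
struct big<T, N, false> : pod_big<T, N> {
    using pod_big<T, N>::operator=;
};

// C++11 alias declarations give each flavor a plain name instead of big4_t<false>.
using big4_t     = big<std::int32_t, 4>;
using big4_pod_t = big<std::int32_t, 4, false>;
```

The aliases address the "Ugh": users write `big4_t` or `big4_pod_t` and never see the flag.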

On Jul 25, 2006, at 10:56 PM, David Bergman wrote:
What I *would have* considered, though, is to have two templates:
template<...> struct pod_endian { ... no constructor but all the endian logic ... }
and
template<...> struct endian : pod_endian { ... constructors and not much else ... }
And have conversion operator to get an endian from a pod_endian.
I actually like that. It would double the number of typedefs, but I think it's a decent compromise. Maybe use a naming convention like xxx_pod_t for the POD types, e.g. big32_pod_t?
Additionally, I would probably use my embed_type proposal and get:
template<.., typename IntType, ..> struct endian : embed_type<IntType, endian> { ... }
using proper policies.
I'm not familiar with this. Does that make it a non-POD type? -Dave

On 7/27/06 6:50 PM, "loufoque" <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
Dave Dribin wrote :
And this has a workaround by using a for loop.
Not really. std::vector, like many other components of the C++ standard library, requires its element type to be CopyConstructible. And if endian types are, they're not PODs anymore.
Why would being copy-constructible forbid POD-ness? Obviously, adding a copy constructor to a class/struct/union would disqualify it from being POD. But that's not the only way to define copying. The automatic copy constructor does _not_ cancel POD qualification, since it is a dumb bitwise copy for PODs, and it should still count as copy-constructible as far as the STL is concerned. (If an explicit copy-construction operation were actually required, then the concept would be broken. It's not like you can improperly use a constructor by getting its member-function address.)
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
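Daryle's distinction can be checked with type traits (which postdate this thread, arriving with C++11). A hypothetical constructor-free byte holder has only the implicitly declared, trivial copy constructor, so it is both a POD and CopyConstructible, and std::vector accepts it:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical constructor-free holder. Its copy constructor is implicitly
// declared and trivial (a bytewise copy), so the type is still a POD.
struct big4_pod_t {
    unsigned char b[4];
};

static_assert(std::is_trivial<big4_pod_t>::value &&
              std::is_standard_layout<big4_pod_t>::value, "still a POD");
static_assert(std::is_copy_constructible<big4_pod_t>::value, "and CopyConstructible");

// So standard containers accept it, e.g. n copies of one value.
inline std::vector<big4_pod_t> make_copies(const big4_pod_t& x, std::size_t n) {
    return std::vector<big4_pod_t>(n, x);
}
```

What the endian types of 0.5 lack for POD status is the absence of the *converting* constructor, not copyability.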

Daryle Walker wrote :
Why would being copy-constructible forbid POD-ness? Obviously, adding a copy constructor to a class/struct/union would disqualify it from being POD. But that's not the only way to define copying. The automatic copy constructor does _not_ cancel POD qualification, since it is a dumb bitwise copy for PODs, and it should still count as copy-constructible as far as the STL is concerned. (If an explicit copy-construction operation were actually required, then the concept would be broken. It's not like you can improperly use a constructor by getting its member-function address.)
As I clearly said, I was talking about the endian types. And they require a custom constructor.

"Dave Dribin" <dave-ml@dribin.org> wrote in message news:F773AA4D-E17B-4FB2-B872-116B17E7715B@dribin.org...
Hello,
I started a thread on Boost-users about the proposed endian library, but it's probably more suitable on the dev list. Here's a link to the thread in the archives:
<http://lists.boost.org/boost-users/2006/07/20887.php>
I see two issues with the library as of 0.5. First, the endian classes cannot be used in unions.
Yep, that's a problem. But it goes beyond union requirements: the classes need to be changed to be PODs. The primary use is I/O, and that means they need to be memcpy-able, which means they must be PODs. I know some have argued that both POD and non-POD versions should be supplied, but I don't want to depend on undefined behavior, which is what would happen with a non-POD endian class. Constructors can be added later if the committee decides to relax the POD requirements.
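Beman's memcpy requirement can be illustrated with a hypothetical header struct built from a constructor-free ubig4_t. In C++03 the guarantee that an object's bytes may be copied with memcpy was stated for POD types; C++11 restated it for trivially copyable types, which is what the trait below checks. The header layout is an assumption for illustration only.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical unsigned big-endian holder with no constructors.
struct ubig4_t {
    unsigned char b[4];
    ubig4_t& operator=(std::uint32_t v) {
        for (int i = 3; i >= 0; --i) {
            b[i] = static_cast<unsigned char>(v);
            v >>= 8;
        }
        return *this;
    }
};

// Assumed file-header layout, for illustration only.
struct file_header {
    ubig4_t magic;
    ubig4_t length;
};

// memcpy of an object's bytes is only guaranteed for trivially copyable types.
static_assert(std::is_trivially_copyable<file_header>::value, "safe to memcpy");

// Copy the header's bytes into an output buffer, e.g. before write().
inline void serialize(const file_header& h, unsigned char* out) {
    std::memcpy(out, &h, sizeof h);
}
```

The buffer comes out in big-endian order on any host, because the conversion happened when the fields were assigned.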
And, second the classes cannot be used in variable argument lists. I'm attaching a patch to help resolve both of these issues.
In order to be used in a union, a class must not have any constructors. However, removing constructors breaks the following code:
big4_t x = 42;
My solution was to provide a static init() method to help in these situations:
big4_t x = big4_t::init(42);
Interesting. Of course you could also have written it like this: big4_t x; x = 42;
The second problem shows up using varargs. For example, this will not compile:
printf("x = %d\n", x);
The solution I came up with is to use operator() to get the native value:
printf("x = %d\n", x());
My old implementation also did that, but over time I came to feel it was too cryptic. I think I'd prefer to call the function value().
In my patch, I've also added a check_unions() function to endian_test.cpp that tests the sizes of unions.
Thanks for spotting these problems. I feel stupid for not thinking the POD issue through before this. My old interface was a POD; I guess I got seduced by the elegance of Darin's interface and by wishful thinking about changing C++ to allow PODs to have constructors (except copy constructors).
--Beman
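Beman's wish was largely granted by C++11: under its rules a class may declare a converting constructor and still be trivial and standard-layout (the C++11 definition of POD), provided its default constructor is defaulted and its copy operations are left implicit. A minimal sketch with a hypothetical big4_t:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical big4_t written to C++11 rules: it has a converting constructor,
// yet its default constructor is defaulted and copying stays implicit.
struct big4_t {
    big4_t() = default;                     // trivial default constructor
    big4_t(std::int32_t v) {                // enables  big4_t x = 42;
        std::uint32_t u = static_cast<std::uint32_t>(v);
        for (int i = 3; i >= 0; --i) {
            b[i] = static_cast<unsigned char>(u);
            u >>= 8;
        }
    }
    std::int32_t value() const {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) u = (u << 8) | b[i];
        return static_cast<std::int32_t>(u);
    }
    unsigned char b[4];
};

static_assert(std::is_trivial<big4_t>::value, "converting constructor does not break triviality");
static_assert(std::is_standard_layout<big4_t>::value, "layout stays C-compatible");

// Accepted as a union member without any special handling.
union overlay {
    big4_t big;
    std::uint32_t raw;
};
```

Under C++03 the same class would not be allowed in a union, because declaring any constructor required a user-written, non-trivial default constructor.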
participants (7): Beman Dawes, Daryle Walker, Dave Dribin, David Bergman, David Bergman, Jeremy Maitin-Shepard, loufoque