Endian library - request for comments - Boost

Beman Dawes

31 May 2006

2:44 p.m.

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

The docs can be read at http://mysite.verizon.net/~beman/endian.html

Comments welcome!

--Beman

Matias Capeletto

31 May

3:01 p.m.

I'd like to see it in Boost; I have had some trouble in the past switching between machine architectures...

I am not very fond of the interface; I prefer something like:

integer<endian::big,8> bn;
integer<endian::little> ln; // default second parameter

What do you think?

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

The docs can be read at http://mysite.verizon.net/~beman/endian.html

Comments welcome!

--Beman

Beman Dawes

4:08 p.m.

"Matias Capeletto" <matias.capeletto@gmail.com> wrote in message news:e9b043a10605310801t2cdff7f3u1fac405dbab735cb@mail.gmail.com...

...

I like to see it in boost, I have had some troubles in the past switching between machine architectures... I am not very fond of the interface, i prefer something like:

integer<endian::big,8> bn; integer<endian::little> ln; // default second parameter

What do you think?

See my response to Scott McMurray. Thanks, --Beman

me22

3:17 p.m.

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

Comments welcome!

Very interesting library. I've written something before with similar functionality, but using a rather different approach.

I haven't tried it yet, but here are a few comments from looking at the code and documentation:

The link to http://mysite.verizon.net/example/example1.cpp is broken.

Why not expose the template implementation and remove the typedefs? Presumably Boost.Integer-style template manipulations could be used to find the appropriate storage types.

I think the names for the types might be a little bit too cute. "bin" makes me think binary, not signed big-endian. That being said, I'm not sure I have any good name that isn't far too verbose. unsigned_littleendian_aligned<4> is admittedly starting to push convenience, though bigendian<5> isn't too bad.

Some sort of size method would be nice so that

out.write( reinterpret_cast<char*>(&big5), 5 );

could instead be

out.write( reinterpret_cast<char*>(&big5), big5.size() );

to avoid the magic constant.

I agree that operator<< cannot be used for binary output, but reinterpret_cast-and-write "smells" to me. What about using operator< or operator<=? They have similar precedence to << and >>, and are visually similar enough that the connection might be decently intuitive. Something involving & could also be nice, mirroring Serialization, but I don't know of a good way of using anything with & in a nice way with a bidirectional stream without going the full Serialization framework route.

~ Scott McMurray

Beman Dawes

4:07 p.m.

"me22" <me22.ca@gmail.com> wrote in message news:fa28b9250605310817l2d71b783j9e92bd247619d313@mail.gmail.com...

...

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...
Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

Comments welcome!

Very interesting library. I've written something before with similar functionality, but using a rather different approach.

I haven't tried it yet, but here are a few comments from looking at the code and documentation:

The link to http://mysite.verizon.net/example/example1.cpp is broken

I only put the main html file up - if you unpack the .zip file, the link will hopefully work OK.

...

Why not expose the template implementation and remove the typedefs? Presumably Boost.Integer-style template manipulations could be used to find the appropriate storage types.

The rationale for typedefs is that when an application uses endian classes, they are often used very heavily. Think hundreds, thousands, or even more uses in an organization. So, the names should be short, and all developers should use the same names. The rationale for hiding the templates was that some implementations may prefer to provide hand-coded implementations of each type, either to speed compiles or to achieve platform specific optimizations. (I don't recommend the latter because of a lot of past experience where optimizations became pessimizations when something as simple as a compiler switch changed.) That said, I guess there could be an argument to expose the templates for those who prefer them.

...

I think the names for the types might be a little bit too cute. "bin" makes me think binary, not signed big-endian. That being said, I'm not sure I have any good name that isn't far too verbose. unsigned_littleendian_aligned<4> is admittedly starting to push convenience, though bigendian<5> isn't too bad.

I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

...

Some sort of size method would be nice so that out.write( reinterpret_cast<char*>(&big5), 5 ); could instead be out.write( reinterpret_cast<char*>(&big5), big5.size() ); to avoid the magic constant.

I guess I shouldn't have used that example, since it isn't realistic. In the real world, the endian classes are almost always used in classes or structs, and the I/O is usually Unix or fstream level rather than iostreams. A more typical use might be something like:

struct header_record
{
  bun3 version;
  bun1 rev;
  bun5 nrecs;
  ..... blah blah blah
};

Thanks for the comments,

--Beman
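A minimal sketch of how such a record could lay out in memory, assuming the unaligned types are implemented as plain byte arrays. The big_unsigned template and its typedefs below are hypothetical stand-ins written for illustration, not the code from endian.zip, and the size of 9 relies on the compiler giving a byte-array class an alignment of 1, which mainstream compilers do but the standard does not promise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an unaligned big-endian unsigned type.
// Storage is a byte array, so the type has no alignment requirement
// beyond that of unsigned char.
template <std::size_t N>
class big_unsigned
{
    static_assert(N >= 1 && N <= 8, "1 to 8 bytes supported");
    unsigned char bytes_[N];   // most significant byte first

public:
    big_unsigned(std::uint64_t v = 0)
    {
        // Fill from the last byte backwards, peeling off the low 8 bits.
        for (std::size_t i = N; i-- > 0; v >>= 8)
            bytes_[i] = static_cast<unsigned char>(v & 0xFF);
    }

    operator std::uint64_t() const
    {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < N; ++i)
            v = (v << 8) | bytes_[i];
        return v;
    }
};

// Short names in the style used in the thread (assumed, for illustration).
typedef big_unsigned<1> bun1;
typedef big_unsigned<3> bun3;
typedef big_unsigned<5> bun5;

struct header_record
{
    bun3 version;
    bun1 rev;
    bun5 nrecs;
};
```

With this layout the whole record can be read or written as one block of sizeof(header_record) bytes, and each field is converted only when it is actually touched.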

Rene Rivera

4:40 p.m.

Beman Dawes wrote:

...

"me22" <me22.ca@gmail.com> wrote in message news:fa28b9250605310817l2d71b783j9e92bd247619d313@mail.gmail.com...

...
On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...

That said, I guess there could be an argument to expose the templates for those who prefer them.

Or, if those templates are sufficiently generic, exposing them for the case for different kinds of endianness. In those rare cases when you are emulating some strange hardware :-)

...

...
I think the names for the types might be a little bit too cute. "bin" makes me think binary, not signed big-endian. That being said, I'm not sure I have any good name that isn't far too verbose. unsigned_littleendian_aligned<4> is admittedly starting to push convenience, though bigendian<5> isn't too bad.

I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

Since name suggestions are up... My main problem with the bin*, bun* names is that you have to train yourself to know what they mean. It might be easy after a while but they don't say anything to me initially (even with the explanatory chart). My suggestion would be to stick close to the existing cstdint types:

int_be8_t
int_le8_t
uint_be8_t
uint_le8_t

etc.

Of course that means using bit sizes instead of byte sizes. But I find the bit sizes more natural anyway :-)

...

A more typical use might be something like:

struct header_record { bun3 version; bun1 rev; bun5 nrecs; ..... blah blah blah };

One question I have is... How common is it to need all the various arithmetic operators in these types? All my use cases of endianness handling are solved by a single specialized byteswap() function. And hence I see the utility of having the type do it for me, but not the utility of the operators.

--
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com
-- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo
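For comparison, a sketch of the kind of single byte-swapping function described above. The name byteswap32 and the 32-bit width are assumptions made for illustration; the actual function referred to in the message is not shown in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverse the byte order of a 32-bit unsigned value.
// Written with shifts and masks so it does not depend on host endianness.
inline std::uint32_t byteswap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

With this approach the caller has to apply the swap at each I/O boundary, and only when the host order differs from the file order; an endian type moves that decision into the type itself.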

Jeff Flinn

6:22 p.m.

Rene Rivera wrote:

...

Beman Dawes wrote:

...
"me22" <me22.ca@gmail.com> wrote in message news:fa28b9250605310817l2d71b783j9e92bd247619d313@mail.gmail.com...

...
On 5/31/06, Beman Dawes <bdawes@acm.org> wrote: ... I think the names for the types might be a little bit too cute. "bin" makes me think binary, not signed big-endian. That being said, I'm not sure I have any good name that isn't far too verbose. unsigned_littleendian_aligned<4> is admittedly starting to push convenience, though bigendian<5> isn't too bad.

I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

Since name suggestions are up... My main problem with the bin*, bun* names is that you have to train yourself to know what they mean. It might be easy after a while but they don't say anything to me initially (even with the explanatory chart). My suggestion would be to stick close to the existing cstdint types:

int_be8_t int_le8_t uint_be8_t uint_le8_t

etc.

Of course that means using bit sizes instead of byte sizes. But I find the bit sizes more natural anyway :-)

This convention occurred to me as well, as I was reading through the thread. Needless to say I like this naming convention; add another vote.

Jeff Flinn

Beman Dawes

6:44 p.m.

Rene Rivera wrote:

...

Beman Dawes wrote:

...
"me22" <me22.ca@gmail.com> wrote in message news:fa28b9250605310817l2d71b783j9e92bd247619d313@mail.gmail.com...

...
On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...
That said, I guess there could be an argument to expose the templates for those who prefer them.

Or, if those templates are sufficiently generic, exposing them for the case for different kinds of endianness. In those rare cases when you are emulating some strange hardware :-)

...
...
I think the names for the types might be a little bit too cute. "bin" makes me think binary, not signed big-endian. That being said, I'm not sure I have any good name that isn't far too verbose. unsigned_littleendian_aligned<4> is admittedly starting to push convenience, though bigendian<5> isn't too bad. I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

Since name suggestions are up... My main problem with the bin*, bun* names is that you have to train yourself to know what they mean. It might be easy after a while but they don't say anything to me initially (even with the explanatory chart). My suggestion would be to stick close to the existing cstdint types:

int_be8_t int_le8_t uint_be8_t uint_le8_t

etc.

or: int_b8_t int_l8_t uint_b8_t uint_l8_t or: int_b1_t int_l1_t uint_b1_t uint_l1_t

...

Of course that means using bit sizes instead of byte sizes. But I find the bit sizes more natural anyway :-)

I tried it both ways over the years. The problem with using bit sizes is that a programmer often counts these things, and always in terms of bytes. Bytes are just more convenient than bits. That's because in the Geographic applications I work on, every additional byte gets multiplied by 50 or 100 million (because that is how many records it takes to represent the US and Canada, without even mentioning the rest of the world.) So a designer thinks in terms of bytes, not eight bit multiples. Also, working in terms of bytes seems to signal to readers that something special is going on - int_b32_t is more likely to be mistakenly viewed as just another typedef for an int32_t than bin4 or bin4_t.

...

...
A more typical use might be something like:

struct header_record { bun3 version; bun1 rev; bun5 nrecs; ..... blah blah blah };

One question I have is... How common is it to need all the various arithmetic operators in these types? All my use cases of endianness handling are solved by a single specialized byteswap() function. And hence I see the utility of having the type do it for me, but not the utility of the operators.

My original classes didn't support arithmetic operations. But Darin Adler and several others argued that it was much easier if all were supported. I went back and reviewed old code and found they were right. Providing a full set of operations reduced program clutter and made for code both easier to write and to read. I became a believer. Thanks for the comments, --Beman

me22

7:38 p.m.

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...

That's because in the Geographic applications I work on, every additional byte gets multiplied by 50 or 100 million (because that is how many records it takes to represent the US and Canada, without even mentioning the rest of the world.) So a designer thinks in terms of bytes, not eight bit multiples.

My original classes didn't support arithmetic operations. But Darin Adler and several others argued that it was much easier if all were supported. I went back and reviewed old code and found they were right. Providing a full set of operations reduced program clutter and made for code both easier to write and to read. I became a believer.

It sounds like the main advantage you're getting here is the reduced space requirement. Can you elaborate why the endianness is important for you here?

I think that it might be most useful were the code adapted to become an addition to Boost.Integer, so that one could use, for example, boost::int_t<24>::exact and get a type requiring only 3 bytes of storage, assuming CHAR_BIT==8. ( Obviously with no guarantee of a specific endianness or that it would be a fundamental type and perhaps STATIC_ASSERT'ing for bit lengths not a multiple of CHAR_BIT. ) It would be another option for a space/speed trade-off as with ::fast vs ::least.

On 5/31/06, Cliff Green <cliffg@codewrangler.net> wrote:

...

I've never had a need for code that performs operations on endian types, other than as "placeholders" for performing I/O with them (whether over a network, or disk I/O).

Agreed. I'm still not convinced that these types are a better solution than a simple input/output wrapper allowing something along the lines of

long x;
infile >> big_endian<3>(x);
x *= 13;
outfile << little_endian<3>(x);

~ Scott McMurray
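One way such a wrapper might be written, as a rough sketch only. The endian_io struct, the order enum, and the choice to zero-extend values on input are all assumptions made for illustration; the message above gives only the calling syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

enum class order { big, little };

// Hypothetical wrapper: remembers which variable to read or write,
// in which byte order, and in how many bytes.
template <order O, std::size_t N>
struct endian_io { long& value; };

template <std::size_t N>
endian_io<order::big, N> big_endian(long& v) { return {v}; }

template <std::size_t N>
endian_io<order::little, N> little_endian(long& v) { return {v}; }

// Read N raw bytes and assemble them, most significant byte first.
// The result is zero-extended; no sign extension is attempted.
template <order O, std::size_t N>
std::istream& operator>>(std::istream& is, endian_io<O, N> w)
{
    unsigned char buf[N];
    if (is.read(reinterpret_cast<char*>(buf), N)) {
        unsigned long v = 0;
        for (std::size_t i = 0; i < N; ++i) {
            std::size_t k = (O == order::big) ? i : N - 1 - i;
            v = (v << 8) | buf[k];
        }
        w.value = static_cast<long>(v);
    }
    return is;
}

// Split the low N bytes of the value and write them in the chosen order.
template <order O, std::size_t N>
std::ostream& operator<<(std::ostream& os, endian_io<O, N> w)
{
    unsigned char buf[N];
    unsigned long v = static_cast<unsigned long>(w.value);
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t k = (O == order::big) ? N - 1 - i : i;
        buf[k] = static_cast<unsigned char>(v & 0xFF);
        v >>= 8;
    }
    return os.write(reinterpret_cast<const char*>(buf), N);
}
```

In this design the variable stays a native long throughout, so all arithmetic is ordinary; the cost is that every read and write site has to name the size and byte order again.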

Beman Dawes

2 Jun

2:51 p.m.

"me22" <me22.ca@gmail.com> wrote in message news:fa28b9250605311238t323effai49e2d275e8376eff@mail.gmail.com...

...

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...
That's because in the Geographic applications I work on, every additional byte gets multiplied by 50 or 100 million (because that is how many records it takes to represent the US and Canada, without even mentioning the rest of the world.) So a designer thinks in terms of bytes, not eight bit multiples.

My original classes didn't support arithmetic operations. But Darin Adler and several others argued that it was much easier if all were supported. I went back and reviewed old code and found they were right. Providing a full set of operations reduced program clutter and made for code both easier to write and to read. I became a believer.

It sounds like the main advantage you're getting here is the reduced space requirement. Can you elaborate why the endianness is important for you here?

If endianness isn't important, these are the wrong classes to use. Both the space saving and portability (i.e. endianness) are important in my apps, but the portability is by far the strongest motivation for these classes. I've reworked the docs and example to hopefully make that clearer. In reworking the example, I used part of a file format that is a de facto standard in the GIS industry. It contains both big and little endian integers. (Don't ask - I have no idea why it was designed that way - I wasn't the designer.)

...

I think that it might be most useful were the code adapted to become an addition to Boost.Integer, so that one could use, for example, boost::int_t<24>::exact and get a type requiring only 3 bytes of storage, assuming CHAR_BIT==8. ( Obviously with no guarantee of a specific endianness or that it would be a fundamental type and perhaps STATIC_ASSERT'ing for bit lengths not a multiple of CHAR_BIT. ) It would be another option for a space/speed trade-off as with ::fast vs ::least.

I've exposed the integer_cover_operators template and moved it to a separate header to make it generally available. I like your suggestion to make it part of Boost.Integer - I hadn't thought of that. Aside: Would it also work for floating-point types? Once we have Concepts, the compiler will tell us in an instant!

...

On 5/31/06, Cliff Green <cliffg@codewrangler.net> wrote:

...
I've never had a need for code that performs operations on endian types, other than as "placeholders" for performing I/O with them (whether over a network, or disk I/O).

Agreed.

Let's say you have to increment a variable in a record. It is very convenient to be able to write:

++record.foo;

Rather than:

int temp( record.foo );
++temp;
record.foo = temp;

Now I know that automatic conversions in some "placeholder" style endian classes make them pretty convenient, but there is no additional cost to providing the arithmetic operations. If you don't need/want/like them, don't use them.
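A sketch of what providing the operators could look like, assuming a 4-byte big-endian unsigned field. The class big_u32 and the record struct are hypothetical; the library's own integer_cover_operators template is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-byte big-endian unsigned field with a few operators.
class big_u32
{
    unsigned char b_[4];   // most significant byte first

    std::uint32_t load() const
    {
        return (static_cast<std::uint32_t>(b_[0]) << 24) |
               (static_cast<std::uint32_t>(b_[1]) << 16) |
               (static_cast<std::uint32_t>(b_[2]) << 8)  |
                static_cast<std::uint32_t>(b_[3]);
    }

    void store(std::uint32_t v)
    {
        b_[0] = static_cast<unsigned char>(v >> 24);
        b_[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
        b_[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
        b_[3] = static_cast<unsigned char>(v & 0xFF);
    }

public:
    big_u32(std::uint32_t v = 0) { store(v); }
    operator std::uint32_t() const { return load(); }

    // Each operator converts to native, computes, and converts back,
    // so the caller never needs an explicit temporary.
    big_u32& operator++() { store(load() + 1); return *this; }
    big_u32& operator+=(std::uint32_t rhs) { store(load() + rhs); return *this; }
};

struct record { big_u32 foo; };
```

The temporary still exists inside operator++, but it is written once in the class instead of at every call site.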

...

I'm still not convinced that these types are a better solution than a simple input/output wrapper allowing something along the lines of long x; infile >> big_endian<3>(x); x *= 13; outfile << little_endian<3>(x);

That's fine for small examples, but gets old for large production programs. The experience reports from Darin and others who have been using endian classes with full arithmetic ops have also been convincing. Thanks, --Beman

me22

7:21 p.m.

On 6/2/06, Beman Dawes <bdawes@acm.org> wrote:

...

"Cliff Green" <cliffg@codewrangler.net> wrote in message

...
But I'm curious as to the rationale [...] Sure. The usual application for me is a record (or page) oriented disk file. The app works with a record, but often only touches a portion of that record. The rest remains unchanged. Doing a byte-reversing copy of the entire record into a buffer would kill performance.

Thanks, I was wondering about that as well. On 5/31/06, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...

What I'm suggesting is to provide a library for integer types of various sizes/alignments. This already partly exists in Boost.Integer (AFAIK, support for alignment is missing there). So Boost.Integer needs to be expanded for this.

On 6/2/06, Beman Dawes <bdawes@acm.org> wrote:

...

I've exposed the integer_cover_operators template and moved it to a separate header to make it generally available. I like your suggestion to make it part of Boost.Integer - I hadn't thought of that.

I've attached a basic exact.hpp header. I took the endian.hpp header and cut it down to something that might make a nice addition to Boost.Integer regardless of the outcome of the endian discussion. It's basically the unaligned_bytes idea, but done in host endianness. Usage is exact_int_t<N> or exact_uint_t<N>. ( It BOOST_STATIC_ASSERT's if N is not a multiple of numeric_limits<unsigned char>::digits. )

It might be possible to make them faster by type-punning the internal storage to a bitfield, but I don't think that's legal. I didn't use integral types instead of the char array where the size matches, nor bitfields internally, because that resulted (in g++ 3.4.6) in extra padding or alignment requirements being added to the struct, which would presumably be unacceptable.

Perhaps exact and aligned_exact could be added to uint_t and int_t? As a typedef to void or a fancy template trick to have it not be provided where impossible, presumably. Additionally, though this is slightly off-topic, why do uint_t and int_t not use __int64_t or long long or similar when available? That currently prevents using N>32 on x86 with the attached code.

It might also be a nice template argument to some big_endian<> and little_endian<> wrapper templates, if that design is used. I find the appearance reasonably elegant:

boost::little_endian< short >
boost::big_endian< boost::uint16_t >
boost::little_endian< boost::uint_t<24>::exact >
boost::little_endian< boost::int_t<32>::aligned_exact >

I particularly like the separation between sizing and endianness, though it's probably too verbose for your tastes.

Hoping that that was more productive than my last few posts,

~ Scott McMurray

Marsh J. Ray

1 Jun

2:22 a.m.

Beman Dawes wrote:

...

Rene Rivera wrote:

...
int_be8_t int_le8_t uint_be8_t uint_le8_t

I like it.

...

int_b1_t int_l1_t uint_b1_t uint_l1_t

I'd vote to keep the 'e'. On many fonts/displays the lowercase 'L' and digit one are nearly indistinguishable. While this probably isn't an overriding concern in the naming of most identifiers, consider how commonly-used the 'l1' or 'l16' versions would be. Also, 'll' is likely to be used as a prefix for "long long".

...

I tried it both ways over the years. The problem with using bit sizes is that a programmer often counts these things, and always in terms of bytes. Bytes are just more convenient than bits. <> Also, working in terms of bytes seems to signal to readers that something special is going on - int_b32_t is more likely to be mistakenly viewed as just another typedef for an int32_t than bin4 or bin4_t.

I think the bit-count nomenclature is currently far more widely used, and specifically by boost/cstdint.hpp. Consider the confusion that would occur with int8_t being a char while int_l8_t is a long long!

- Marsh

me22

31 May

6:02 p.m.

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...

The rationale for hiding the templates was that some implementations may prefer to provide hand-coded implementations of each type, either to speed compiles or to achieve platform specific optimizations. (I don't recommend the latter because of a lot of past experience where optimizations became pessimizations when something as simple as a compiler switch changed.)

With templates, specialisation could be used to provide hand-coded implementations, if desired. I fail to see how these would drastically speed compilation while remaining header-only, and if an implementation file becomes acceptable, explicit instantiation could be easily used for the templates that are needed.

...

I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

* Use the naming conventions of the C++ Standard Library (See Naming conventions rationale):
  o Names (except as noted below) should be all lowercase, with words separated by underscores.
  o Acronyms should be treated as ordinary names (e.g. xml_parser instead of XML_parser).
  o Template parameter names begin with an uppercase letter.
  o Macro (gasp!) names all uppercase and begin with BOOST_.
* Choose meaningful names - explicit is better than implicit, and readability counts. There is a strong preference for clear and descriptive names, even if lengthy.

~ http://boost.org/more/lib_guide.htm#Naming

...

The rationale for typedefs is that when an application uses endian classes, they are often used very heavily. Think hundreds, thousands, or even more uses in an organization. So, the names should be short, and all developers should use the same names.

struct header_record { bun3 version; bun1 rev; bun5 nrecs; ..... blah blah blah };

In a struct like that, copy-pasting would mean that slightly longer names wouldn't be much of an inconvenience:

struct header_record
{
  unsigned_bigendian<3> version;
  unsigned_bigendian<1> revision;
  unsigned_bigendian<5> record_count;
};

and I think the readability gain would quickly make up for it. That being said, Rene Rivera's suggestions might be a nice middle-ground.

Rene also raises an interesting point about the necessity of operations on the types. How often does the internal byte ordering matter when not doing IO?

...

I guess I shouldn't have used that example, since it isn't realistic. In the real world, the endian classes are almost always used in classes or structs, and the I/O is usually Unix or fstream level rather than iostreams.

Sorry, but can you explain the difference between "fstream level" and "iostreams"?

Also, how do you usually deal with padding?

struct S
{
  boost::bin4 a;
  boost::bun4a b;
  boost::bin5 c, d;
};

g++ 3.4.6, for example, gives sizeof(S) as 20.

Thanks,
Scott McMurray
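A sketch of where the extra two bytes likely come from, assuming the aligned type wraps a native 32-bit integer while the unaligned types are byte arrays. The three *_like structs are hypothetical stand-ins, and the figures in the comments assume alignof(std::uint32_t) is 4, as on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the library types in the question.
struct bin4_like  { unsigned char bytes[4]; };   // unaligned: alignment 1
struct bun4a_like { std::uint32_t value; };      // aligned: alignment of uint32_t
struct bin5_like  { unsigned char bytes[5]; };   // unaligned: alignment 1

struct S
{
    bin4_like  a;   // offset 0, 4 bytes
    bun4a_like b;   // offset 4, 4 bytes
    bin5_like  c;   // offset 8, 5 bytes
    bin5_like  d;   // offset 13, 5 bytes, ends at 18
};                  // size is rounded up to a multiple of alignof(S)
```

The members occupy 18 bytes, but the one aligned member gives S as a whole an alignment of 4, so the compiler adds 2 bytes of tail padding to reach 20. A struct built only from the unaligned stand-ins would typically have no such padding.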

Cliff Green

7:05 p.m.

I've dealt with (and written) endian utilities for many years, since almost my full career has involved writing software for heterogeneous distributed systems. I definitely think Boost needs a library to ease and simplify byte endian issues.

But I'm curious as to the rationale for these endian integer types - I've always written utilities to transform native (or user-defined) types into a serializable binary format - including the option for big-endian, little-endian (or in one project, for "receiver makes right" endianness). I've never had a need for code that performs operations on endian types, other than as "placeholders" for performing I/O with them (whether over a network, or disk I/O). Are there application use cases where the application never transforms into / out of native / UDT's, and always uses these (endian) types instead?

Specifically, I've written (templatized generic) utilities that take integers and append them to a character (byte) buffer in a specified endian format (and the converse transformation). (Non-integral types are also supported, with lots of caveats and warnings - floating point type swapping is full of pitfalls, and any endian / byte swapping on a UDT has the expected restrictions - POD single element, no pointers or references, etc.) I would expect these types of utilities to be used in some form of serialization design (e.g. with Boost.Serialization).

Anyway, I'm just trying to expand my awareness of various endian related designs.

Cliff

Beman Dawes

2 Jun

3:16 p.m.

"Cliff Green" <cliffg@codewrangler.net> wrote in message news:web-41870932@bk2.webmaillogin.com...

...

I've dealt with (and written) endian utilities for many years, since almost my full career has involved writing software for heterogeneous distributed systems. I definitely think Boost needs a library to ease and simplify byte endian issues.

But I'm curious as to the rationale for these endian integer types - I've always written utilities to transform native (or user-defined) types into a serializable binary format - including the option for big-endian, little-endian (or in one project, for "receiver makes right" endianness). I've never had a need for code that performs operations on endian types, other than as "placeholders" for performing I/O with them (whether over a network, or disk I/O). Are there application use cases where the application never transforms into / out of native / UDT's, and always uses these (endian) types instead?

Sure. The usual application for me is a record (or page) oriented disk file. The app works with a record, but often only touches a portion of that record. The rest remains unchanged. Doing a byte-reversing copy of the entire record into a buffer would kill performance. Perhaps the best-known example of this is a page in a B-Tree. While the app using the B-tree may possibly work with native or UDT types, the B-tree structural information on the page always uses the endian types. There is no reason to ever convert out of the endian types.
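To illustrate the point about touching only part of a page, here is a by-hand sketch that updates one big-endian field inside a raw page buffer. The page layout (a 2-byte key count at offset 4) and the function names are invented for illustration and do not describe any real B-tree format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a 2-byte big-endian key count stored at offset 4.
const std::size_t key_count_offset = 4;

inline std::uint16_t load_big16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline void store_big16(unsigned char* p, std::uint16_t v)
{
    p[0] = static_cast<unsigned char>(v >> 8);
    p[1] = static_cast<unsigned char>(v & 0xFF);
}

// Increment the key count in place. Only the two bytes of that field
// are read and written; the rest of the page is never converted.
inline void add_key(unsigned char* page)
{
    unsigned char* field = page + key_count_offset;
    store_big16(field, static_cast<std::uint16_t>(load_big16(field) + 1));
}
```

An endian type performs the same two-byte load and store behind a member access, so the page can be read from disk, modified, and written back without a conversion pass over the whole buffer.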

...

Specifically, I've written (templatized generic) utilities that take integers and append them to a character (byte) buffer in a specified endian format (and the converse transformation). (Non-integral types are also supported, with lots of caveats and warnings - floating point type swapping is full of pitfalls, and any endian / byte swapping on a UDT has the expected restrictions - POD single element, no pointers or references, etc.) I would expect these types of utilities to be used in some form of serialization design (e.g. with Boost.Serialization).

Yes, but those are a somewhat different set of needs.

...

Anyway, I'm just trying to expand my awareness of various endian related designs.

Yes, they are lots of fun. In industrial database applications there is also a lot of money riding on getting endianness right. You have to be able to build files in one environment, and sell them to folks in other environments without the buyers ever realizing you aren't building with their favorite platform.

--Beman

Beman Dawes

31 May

7:48 p.m.

me22 wrote:

...

On 5/31/06, Beman Dawes <bdawes@acm.org> wrote:

...
The rationale for hiding the templates was that some implementations may prefer to provide hand-coded implementations of each type, either to speed compiles or to achieve platform specific optimizations. (I don't recommend the latter because of a lot of past experience where optimizations became pessimizations when something as simple as a compiler switch changed.)

With templates, specialisation could be used to provide hand-coded implementations, if desired.

Yes.

...

I fail to see how these would drastically speed compilation while remaining header-only, and if an implementation file becomes acceptable, explicit instantiation could be easily used for the templates that are needed.

Essentially, the implementation hand-generates the same code as would be generated by template instantiation. Bulky, but not difficult. All inline, to avoid ODR violations. At one time avoiding templates, particularly recursive templates, was an issue. Probably not much of an issue today.

...

...
I'm often amazed at the clever names Boosters suggest, so I think it is worthwhile to speculate a bit about better names. But the everyday use typedefs really do need to be short and memorable. I've been using the "bin2", "bun3", etc. since 1984 or so, with several hundred programmers now using them all the time, and never had a request to change the names.

* Use the naming conventions of the C++ Standard Library (See Naming conventions rationale):
  o Names (except as noted below) should be all lowercase, with words separated by underscores.
  o Acronyms should be treated as ordinary names (e.g. xml_parser instead of XML_parser).
  o Template parameter names begin with an uppercase letter.
  o Macro (gasp!) names all uppercase and begin with BOOST_.
* Choose meaningful names - explicit is better than implicit, and readability counts. There is a strong preference for clear and descriptive names, even if lengthy.

~ http://boost.org/more/lib_guide.htm#Naming

...
The rationale for typedefs is that when an application uses endian classes, they are often used very heavily. Think hundreds, thousands, or even more uses in an organization. So, the names should be short, and all developers should use the same names.

    struct header_record {
      bun3 version;
      bun1 rev;
      bun5 nrecs;
      ..... blah blah blah
    };

In a struct like that, copy-pasting would mean that slightly longer names wouldn't be much of an inconvenience:

    struct header_record {
      unsigned_bigendian<3> version;
      unsigned_bigendian<1> revision;
      unsigned_bigendian<5> record_count;
    };

and I think the readability gain would quickly make up for it. That being said, Rene Rivera's suggestions might be a nice middle-ground.

Yes, I'd prefer something more along the lines of Rene's suggestions. If the template were exposed, people would always be able to write out the names in full if desired.

...

Rene also raises an interesting point about the necessity of operations on the types. How often does the internal byte ordering matter when not doing IO?

Never, AFAIK. But the point is that you often read a record containing values of these types, inspect, modify, and then write them back out. So for any operations not provided, you have to convert to the underlying type, perform the computation, then convert back. Even if the conversion is done automatically, you still have to explicitly provide a temporary. Or write ugly casts. Also, there isn't any real cost to providing full arithmetic operations. The functions are never instantiated if they are not used.
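The read, inspect, modify, write-back pattern described above can be sketched as follows. The type `ubig3` and its members are assumptions made for illustration and are not the proposed library's interface. The compound operators are thin wrappers around "convert to the cover type, compute, convert back", which is exactly the work a caller would otherwise do by hand.

```cpp
#include <cstdint>

// Hypothetical 3-byte unsigned big-endian integer (illustrative only).
class ubig3 {
public:
    ubig3(std::uint32_t v = 0) { assign(v); }

    // Conversion to the native cover type.
    operator std::uint32_t() const {
        return (std::uint32_t(b_[0]) << 16) | (std::uint32_t(b_[1]) << 8) | b_[2];
    }

    // Compound operators: convert, compute, convert back.
    ubig3& operator+=(std::uint32_t rhs) {
        assign(std::uint32_t(*this) + rhs);
        return *this;
    }
    ubig3& operator++() { return *this += 1; }

private:
    void assign(std::uint32_t v) {
        b_[0] = static_cast<unsigned char>((v >> 16) & 0xFF);
        b_[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        b_[2] = static_cast<unsigned char>(v & 0xFF);
    }
    unsigned char b_[3];
};

struct header_record {
    ubig3 version;
    ubig3 nrecs;
};

// Inspect and modify a record in place; no temporaries or casts at the call site.
inline void add_records(header_record& h, std::uint32_t n) {
    if (h.version >= 2)
        h.nrecs += n;
}
```

Without `operator+=`, the body of `add_records` would need something like `h.nrecs = h.nrecs + n;`, and operations such as `++h.nrecs` would not compile at all.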

...

...
I guess I shouldn't have used that example, since it isn't realistic. In the real world, the endian classes are almost always used in classes or structs, and the I/O is usually Unix or fstream level rather than iostreams.

Sorry, but can you explain the difference between "fstream level" and "iostreams"?

Also, how do you usually deal with padding?

    struct S { boost::bin4 a; boost::bun4a b; boost::bin5 c, d; };

g++ 3.4.6, for example, gives sizeof(S) as 20.

That's correct, and other compilers will come up with the same result. Here is a little test program:

    #include <iostream>
    #include <cstddef>
    #include <boost/endian.hpp>

    struct S { boost::bin4 a; boost::bun4a b; boost::bin5 c, d; };

    int main()
    {
      std::cout << sizeof(S) << " "
                << offsetof(S,a) << " "
                << offsetof(S,b) << " "
                << offsetof(S,c) << " "
                << offsetof(S,d) << std::endl;
      return 0;
    }

It outputs 20 0 4 8 13. In other words, 2 padding bytes are added after d. That ensures that each element of an array is correctly aligned. The result would be the same if b was an int, assuming 32-bit ints. When a struct (or class) has any aligned member, the whole struct gets aligned accordingly. Using endian.hpp doesn't alter that.

Now change b from bun4a to bun4, and re-run: 18 0 4 8 13. Because no members are aligned, the struct is not padded.

Thanks, --Beman
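The same experiment can be approximated without the library. In this sketch the stand-in types are assumptions: a char array plays the role of an unaligned type and a `std::int32_t` member plays the role of the aligned `bun4a`. It uses C++11 `static_assert` and `alignof`, which postdate this thread, and the asserted sizes are what typical ABIs with a 4-byte-aligned `std::int32_t` produce.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the library types (assumptions, not the real definitions).
struct unaligned4 { char bytes[4]; };
struct unaligned5 { char bytes[5]; };
struct aligned4   { std::int32_t value; };

struct S_aligned   { unaligned4 a; aligned4   b; unaligned5 c, d; };
struct S_unaligned { unaligned4 a; unaligned4 b; unaligned5 c, d; };

// Round a size up to the next multiple of an alignment.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// With only byte-aligned members there is no padding: 4 + 4 + 5 + 5 bytes.
static_assert(sizeof(S_unaligned) == 18, "no padding expected");

// One aligned member aligns the whole struct, so the size is rounded up
// (to 20 when std::int32_t has 4-byte alignment).
static_assert(sizeof(S_aligned) == round_up(18, alignof(std::int32_t)),
              "tail padding expected");

// The member offsets are the same in both layouts.
static_assert(offsetof(S_aligned, d) == 13 && offsetof(S_unaligned, d) == 13,
              "d starts at byte 13");
```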

Christopher Kohlhoff

1 Jun

1:20 a.m.

Hi Beman, Beman Dawes <bdawes@acm.org> wrote:

...

That said, I guess there could be an argument to expose the templates for those who prefer them.

Here's an argument for exposing templates: it allows the template parameter to propagate up into my own types. I'm thinking something like:

    template <typename Endianness>
    struct my_message
    {
      integer<4, Endianness> foo;
      integer<2, Endianness> bar;
      ...
    };

This permits me to write one function template to handle my_message structures in programs that need to deal with both little- and big-endian at runtime (I'm thinking "receiver makes right" protocols). With this scheme there should probably also be a type for native endianness so we could write:

    my_message<endian::big> msg1;
    my_message<endian::little> msg2;
    my_message<endian::native> msg3;

etc. In my opinion a similar argument applies to exposing the alignment as a template parameter.

Cheers, Chris
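A compilable sketch of this idea follows. Everything in it is hypothetical: the `integer<N, Endianness>` template, the `endian::big` and `endian::little` tags, and the helper `total` are written here only to show the parameter propagating upward. `endian::native` is left out because it needs platform detection.

```cpp
#include <cstddef>
#include <cstdint>

namespace endian {
    struct big {};
    struct little {};
}

// Hypothetical unaligned N-byte unsigned integer with a selectable byte order.
template <std::size_t N, typename Endianness>
class integer;

template <std::size_t N>
class integer<N, endian::big> {
public:
    integer(std::uint64_t v = 0) {
        for (std::size_t i = N; i-- > 0; v >>= 8)
            b_[i] = static_cast<unsigned char>(v & 0xFF);
    }
    operator std::uint64_t() const {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < N; ++i) v = (v << 8) | b_[i];
        return v;
    }
    const unsigned char* data() const { return b_; }
private:
    unsigned char b_[N];  // most significant byte first
};

template <std::size_t N>
class integer<N, endian::little> {
public:
    integer(std::uint64_t v = 0) {
        for (std::size_t i = 0; i < N; ++i, v >>= 8)
            b_[i] = static_cast<unsigned char>(v & 0xFF);
    }
    operator std::uint64_t() const {
        std::uint64_t v = 0;
        for (std::size_t i = N; i-- > 0; ) v = (v << 8) | b_[i];
        return v;
    }
    const unsigned char* data() const { return b_; }
private:
    unsigned char b_[N];  // least significant byte first
};

// The endianness parameter propagates up into the user's own type.
template <typename Endianness>
struct my_message {
    integer<4, Endianness> foo;
    integer<2, Endianness> bar;
};

// One function template handles messages of either byte order.
template <typename Endianness>
std::uint64_t total(const my_message<Endianness>& m) {
    return static_cast<std::uint64_t>(m.foo) + static_cast<std::uint64_t>(m.bar);
}
```

A "receiver makes right" program would inspect a byte-order flag at runtime and then call `total` on either a `my_message<endian::big>` or a `my_message<endian::little>`.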

Beman Dawes

2 Jun

3:24 p.m.

"Christopher Kohlhoff" <chris@kohlhoff.com> wrote in message news:20060601012008.72917.qmail@web32609.mail.mud.yahoo.com...

...

Hi Beman,

Beman Dawes <bdawes@acm.org> wrote:

...
That said, I guess there could be an argument to expose the templates for those who prefer them.

Here's an argument for exposing templates:...

I am going to expose the templates - I've made the code changes and am working on docs now.

...

it allows the template parameter to propagate up into my own types. I'm thinking something like:

    template <typename Endianness>
    struct my_message
    {
      integer<4, Endianness> foo;
      integer<2, Endianness> bar;
      ...
    };

This permits me to write one function template to handle my_message structures in programs that need to deal with both little- and big-endian at runtime (I'm thinking "receiver makes right" protocols).

With this scheme there should probably also be a type for native endianness so we could write:

    my_message<endian::big> msg1;
    my_message<endian::little> msg2;
    my_message<endian::native> msg3;

Interesting.

...

etc. In my opinion a similar argument applies to exposing the alignment as a template parameter.

Yeah. Those are separate classes now. Probably not too hard to wrap in a generative class template that takes endianness and alignment as template arguments and then just instantiates one of the current classes. Anyone like to contribute such an animal? --Beman

Beman Dawes

5:24 p.m.

"Beman Dawes" <bdawes@acm.org> wrote in message news:e5pl7g$gd5$1@sea.gmane.org...

...

...
    my_message<endian::big> msg1;
    my_message<endian::little> msg2;
    my_message<endian::native> msg3;

Interesting.

...
etc. In my opinion a similar argument applies to exposing the alignment as a template parameter.

Yeah. Those are separate classes now. Probably not too hard to wrap in a generative class template that takes endianness and alignment as template arguments and then just instantiates one of the current classes.

Anyone like to contribute such an animal?

Never mind. Such a wrapper is trivial. I'll include it myself. --Beman
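One way such a generative wrapper could look is sketched below. All names are assumptions: the four placeholder classes stand in for the separately written classes, and `endian_integer` is a hypothetical selector that maps an endianness tag and an alignment tag to one of them through partial specialisation. The `static_assert` and `std::is_same` checks use C++11.

```cpp
#include <cstddef>
#include <type_traits>

namespace endian    { struct big {}; struct little {}; }
namespace alignment { struct unaligned {}; struct aligned {}; }

// Empty placeholders for the four existing, separately written classes.
template <std::size_t N> struct big_unaligned    {};
template <std::size_t N> struct big_aligned      {};
template <std::size_t N> struct little_unaligned {};
template <std::size_t N> struct little_aligned   {};

// Primary template is left undefined, so unsupported combinations fail to compile.
template <typename Endianness, typename Alignment, std::size_t N>
struct endian_integer;

// Each partial specialisation just names one of the existing classes.
template <std::size_t N>
struct endian_integer<endian::big, alignment::unaligned, N> {
    typedef big_unaligned<N> type;
};
template <std::size_t N>
struct endian_integer<endian::big, alignment::aligned, N> {
    typedef big_aligned<N> type;
};
template <std::size_t N>
struct endian_integer<endian::little, alignment::unaligned, N> {
    typedef little_unaligned<N> type;
};
template <std::size_t N>
struct endian_integer<endian::little, alignment::aligned, N> {
    typedef little_aligned<N> type;
};

static_assert(std::is_same<endian_integer<endian::little, alignment::aligned, 4>::type,
                           little_aligned<4> >::value,
              "the wrapper selects the matching class");
```

User code can then carry both tags as template parameters, for example `typename endian_integer<E, A, 4>::type field;` inside a struct templated on `E` and `A`.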

John Maddock

31 May

5:30 p.m.

Beman Dawes wrote:

...

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

The docs can be read at http://mysite.verizon.net/~beman/endian.html

Very interesting! I've only read the docs but it looks like a very good - and nice and simple - idea. John.

Yuval Ronen

10:26 p.m.

Beman Dawes wrote:

...

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

I think an Endian library would be most useful in Boost. However, I have some doubts about this proposal. It seems to me that it tries to solve two problems at once, and it doesn't look good. The first issue is providing integer types of various sizes/alignments. The second issue is Endianness. Wouldn't it be better to separate those two?

If we want to provide integer types of various sizes/alignments, then why shouldn't they be allowed to have native Endianness? And the complementary question - if we want to provide types with a specific Endianness, then why can't we do it with regular types such as int or long?

What I'm suggesting is to provide a library for integer types of various sizes/alignments. This already partly exists in Boost.Integer (AFAIK, support for alignment is missing there). So Boost.Integer needs to be expanded for this. Then, there's also a need for Endian classes which are templated with some integer type. Something like:

    boost::big_endian<int> x;
    boost::little_endian<boost::uint32_t> y;
    boost::big_endian<boost::signed_7_bytes_aligned> z;

This way, the full scope of possibilities is provided.
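A minimal sketch of the wrapper half of this suggestion, assuming a hypothetical `big_endian<T>` that works only for built-in integer types (it does not handle a 7-byte type, which would need the Boost.Integer extension described above). It keeps the size and alignment of `T` and changes only the byte order of the stored representation. It relies on C++11 (`alignas`, `std::make_unsigned`), and the conversion of an out-of-range unsigned value back to a signed `T` is assumed to wrap, as it does on two's complement implementations.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper: stores a T with its most significant byte first.
template <typename T>
class big_endian {
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    typedef typename std::make_unsigned<T>::type U;  // shifts are done unsigned
public:
    big_endian(T v = T()) {
        U u = static_cast<U>(v);
        for (std::size_t i = sizeof(T); i-- > 0; u >>= 8)
            bytes_[i] = static_cast<unsigned char>(u & 0xFF);
    }
    operator T() const {
        U u = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            u = static_cast<U>((u << 8) | bytes_[i]);
        return static_cast<T>(u);
    }
    const unsigned char* data() const { return bytes_; }
private:
    alignas(T) unsigned char bytes_[sizeof(T)];  // same size and alignment as T
};
```

With this shape, `big_endian<int> x(-2);` and `big_endian<std::uint32_t> y(0x11223344u);` behave like the underlying types in expressions while their bytes are laid out big-endian in memory.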

...

The docs can be read at http://mysite.verizon.net/~beman/endian.html

Just a tiny thing: a sentence describing how to convert to/from native integers would be very helpful. I know I could find it in the code if I bothered to look, and the code sample also implies there's an implicit conversion from native types, but what about the other way? In any case, I think a short explanatory sentence is a good idea. Yuval

Beman Dawes

2 Jun

3:30 p.m.

"Yuval Ronen" <ronen_yuval@yahoo.com> wrote in message news:e5l56m$vse$1@sea.gmane.org...

...

Beman Dawes wrote:

...
Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

I think an Endian library would be most useful in Boost. However, I have some doubts about this proposal. It seems to me that it tries to solve two problems at once, and it doesn't look good. The first issue is providing integer types of various sizes/alignments. The second issue is Endianness. Wouldn't it be better to separate those two?

If we want to provide integer types of various sizes/alignments, then why shouldn't they be allowed to have native Endianness? And the complementary question - if we want to provide types with a specific Endianness, then why can't we do it with regular types such as int or long?

See Chris Kohlhoff's posting, and my response to him.

...

What I'm suggesting is to provide a library for integer types of various sizes/alignments. This already partly exists in Boost.Integer (AFAIK, support for alignment is missing there). So Boost.Integer needs to be expanded for this.

Again, see Chris' post and my response. That is pretty much what he was suggesting IIUC.

...

Then, there's also a need for Endian classes which are templated with some integer type. Something like:

    boost::big_endian<int> x;
    boost::little_endian<boost::uint32_t> y;
    boost::big_endian<boost::signed_7_bytes_aligned> z;

This way, the full scope of possibilities is provided.

...
The docs can be read at http://mysite.verizon.net/~beman/endian.html

Just a tiny thing: a sentence describing how to convert to/from native integers would be very helpful. I know I could find it in the code if I bothered to look, and the code sample also implies there's an implicit conversion from native types, but what about the other way? In any case, I think a short explanatory sentence is a good idea.

I've expanded the docs. Will post later today. Yes, there are conversions to and from the cover type. Thanks, --Beman

Thorsten Ottosen

3 Jun

12:05 p.m.

Beman Dawes wrote:

...

Back in 1999 and 2000, there was discussion of a Boost Endian library to provide portable big and little endian integer types. I've dusted off some old code from Darin Adler, gotten his permission to use the new Boost license, and have put together a library.

A zip file of the whole library is available at http://mysite.verizon.net/~beman/endian.zip

The docs can be read at http://mysite.verizon.net/~beman/endian.html

Comments welcome!

This is more a couple of questions than comments.

1. Have I misunderstood alignment completely when I thought that a single data member was not guaranteed to be aligned, but an array was? (Your aligned types use a data member, your unaligned use a char array.)

2. Why is a char array used, and not, say, a T[1] array?

-Thorsten

Sebastian Redl

9:28 p.m.

Thorsten Ottosen wrote:

...

1. Have I misunderstood alignment completely when I thought that a single data member was not guaranteed to be aligned, but an array was? (Your aligned types use a data member, your unaligned use a char array.)

Every variable is aligned according to its own alignment requirements. Typically, a variable needs to be aligned on a boundary of its own size, i.e. a byte type must be byte-aligned, a word type word-aligned and so on. An array is always aligned according to its underlying type. Thus, to get a simple aligned type, just use a member of that type. To get an unaligned type, you need to allocate storage that is byte-aligned, and the easiest way of doing this is using a char[sizeof(T)]. I hope that answers the question as well. Sebastian Redl

Beman Dawes

10:08 p.m.

On 6/3/06, Sebastian Redl <sebastian.redl@getdesigned.at> wrote:

...

Thorsten Ottosen wrote:

...
1. Have I misunderstood alignment completely when I thought that a single data member was not guaranteed to be aligned, but an array was? (Your aligned types use a data member, your unaligned use a char array.)

Every variable is aligned according to its own alignment requirements. Typically, a variable needs to be aligned on a boundary of its own size, i.e. a byte type must be byte-aligned, a word type word-aligned and so on. An array is always aligned according to its underlying type. Thus, to get a simple aligned type, just use a member of that type. To get an unaligned type, you need to allocate storage that is byte-aligned, and the easiest way of doing this is using a char[sizeof(T)]. I hope that answers the question as well.

Yes, that's correct. One minor note, though, is that the unaligned storage is actually char[num_bytes]. That's because sizeof(T) == num_bytes if the cover type is the same size as the endian type, but for, say, a 7 byte endian number the cover type is typically 8 bytes, so char[sizeof(T)] would be one byte too large. --Beman
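The size difference described here can be checked directly. In this sketch `exact_storage` and `cover_sized_storage` are hypothetical names, `std::uint64_t` is assumed to be the cover type for a 7-byte number, and the checks use C++11 `static_assert`.

```cpp
#include <cstddef>
#include <cstdint>

// Storage sized by the number of bytes actually stored: char[num_bytes].
template <typename T, std::size_t NumBytes>
struct exact_storage {
    static_assert(NumBytes <= sizeof(T), "cover type must hold NumBytes bytes");
    char bytes[NumBytes];
};

// Storage sized by the cover type instead: char[sizeof(T)].
template <typename T>
struct cover_sized_storage {
    char bytes[sizeof(T)];
};

// A 7-byte number with an 8-byte cover type.
static_assert(sizeof(exact_storage<std::uint64_t, 7>) == 7, "seven bytes stored");
static_assert(sizeof(cover_sized_storage<std::uint64_t>) == 8, "one byte too large");

// When the cover type is the same size as the endian type, both agree.
static_assert(sizeof(exact_storage<std::uint32_t, 4>) ==
              sizeof(cover_sized_storage<std::uint32_t>),
              "identical when sizes match");
```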

Beman Dawes

10:24 p.m.

On 6/3/06, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...

This is more a couple of questions than comments.

1. Have I misunderstood alignment completely when I thought that a single data member was not guaranteed to be aligned, but an array was?

An array of some type gets the same alignment as a scalar of that type.

...

(your aligned types use a data member, your unaligned use a char array).

2. why is a char array used, and not, say, a T[1] array?

T[1] is given the same alignment by the compiler as T. Also, T is the wrong size except for the case where sizeof(T) == num_bytes. --Beman
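The alignment point can be stated as compile-time checks. This is a sketch with hypothetical struct names, using C++11 `alignof`. The array-type check follows from the language rule that `alignof` of an array type is the alignment of its element type. The struct-level checks reflect what typical ABIs do rather than a guarantee of the standard.

```cpp
#include <cstdint>

struct member_storage { std::int32_t value; };                   // a plain data member
struct array_storage  { std::int32_t value[1]; };                // a one-element array of T
struct byte_storage   { char bytes[sizeof(std::int32_t)]; };     // raw bytes

// An array of T has the same alignment as T itself.
static_assert(alignof(std::int32_t[1]) == alignof(std::int32_t),
              "T[1] is aligned like T");

// So wrapping T[1] in a struct aligns the struct just like wrapping T.
static_assert(alignof(array_storage) == alignof(member_storage),
              "no difference between T and T[1] members");

// Only the char array gives byte alignment (on typical ABIs).
static_assert(alignof(byte_storage) == 1, "char storage is byte-aligned");
```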

25 comments

12 participants

participants (12)

Beman Dawes
Christopher Kohlhoff
Cliff Green
Jeff Flinn
John Maddock
Marsh J. Ray
Matias Capeletto
me22
Rene Rivera
Sebastian Redl
Thorsten Ottosen
Yuval Ronen