[endian] Refresh based on comments received

Beman Dawes

4 Jun 2006

8:34 p.m.

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

Changes include:

* Templates are exposed.

* The four outermost templates have been folded into a single endian class template.

* The integer_cover_operators class template has been moved into a separate header file, boost/integer_cover_operators.hpp. No docs yet, but functionality is pretty obvious from the header.

* Provision has been made for native endianness. Not implemented yet.

* More explicit names have been given to the forty-four typedefs. A careful explanation of the naming rationale has been added to the docs.

* The docs have been expanded.

* The example program has been rewritten and is much more realistic.

* I haven't looked at Scott McMurray's exact.hpp header yet, so that may kick off yet more changes to come.

--Beman

Eric Friedman

4 Jun

11:35 p.m.

Beman,

I haven't looked at the implementation, but I did look through the documentation and caught up with this conversation thread today. For the most part, I like the new iteration quite a bit. Two comments:

First, the new names are a lot better than the old bin/bun names. But I think the aligned types could use spelling out, i.e., aligned_big2_t instead of abig2_t. Unlike unsigned types, where it is standard to abbreviate them to names like uint, aligned types are not so common.

Second, when I first read the docs I objected to the arithmetic operations. Then I read the following statement you made on June 2nd:

"Let's say you have to increment a variable in a record. It is very convenient to be able to write:

    ++record.foo

Rather than:

    int temp( record.foo);
    ++temp;
    record.foo = temp;

Now I know that automatic conversions in some "placeholder" style endian classes make them pretty convenient, but there is no additional cost to providing the arithmetic operations. If you don't need/want/like them, don't use them."

This argument convinced me. I think this, or something like it, should appear in the docs instead of the more hand-wavy "Providing a full set of operations reduces program clutter and makes code both easier to write and to read."

Eric

Beman Dawes wrote:

...

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

Changes include:

* Templates are exposed.

* The four outermost templates have been folded into a single endian class template.

* The integer_cover_operators class template has been moved into a separate header file, boost/integer_cover_operators.hpp. No docs yet, but functionality is pretty obvious from the header.

* Provision has been made for native endianness. Not implemented yet.

* More explicit names have been given to the forty-four typedefs. A careful explanation of the naming rationale has been added to the docs.

* The docs have been expanded.

* The example program has been rewritten and is much more realistic.

* I haven't looked at Scott McMurray's exact.hpp header yet, so that may kick off yet more changes to come.

--Beman
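The convenience argument quoted in Eric's reply can be illustrated with a minimal sketch. The class below is a hypothetical stand-in for an unaligned big-endian holder, not the library's actual implementation; the names big32_sketch and record_sketch are assumptions made for this example, and 8-bit bytes are assumed.

```cpp
#include <cstdint>

// Hypothetical 4-byte big-endian holder (assumes CHAR_BIT == 8).
class big32_sketch {
public:
    big32_sketch(std::uint32_t v = 0) { store(v); }

    // Conversion back to a native value, most significant byte first.
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) |
               (std::uint32_t(bytes_[1]) << 16) |
               (std::uint32_t(bytes_[2]) << 8)  |
                std::uint32_t(bytes_[3]);
    }

    // Pre-increment written once here, so callers do not need a temporary.
    big32_sketch& operator++() {
        store(static_cast<std::uint32_t>(*this) + 1);
        return *this;
    }

private:
    void store(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
        bytes_[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes_[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes_[3] = static_cast<unsigned char>(v & 0xFF);
    }

    unsigned char bytes_[4];
};

// A record field of this type can be incremented in place: ++record.foo;
struct record_sketch {
    big32_sketch foo;
};
```

Without operator++, the caller would have to spell out the load, increment, and store steps by hand, as in the quoted three-line version.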

Christopher Kohlhoff

5 Jun

1:11 a.m.

Hi Beman, Beman Dawes <bdawes@acm.org> wrote:

...

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The interface looks pretty good to me, except that I'm a little concerned that the endianness enum names are introduced directly into the boost namespace. In particular 'big', 'little' and 'native' seem like common names. Perhaps it could be:

    namespace endianness { enum type { big, ... }; }

Cheers, Chris
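A minimal sketch of the namespace-scoped enum suggested above. The enclosing namespace sketch_ns, the is_big trait, and the unrelated constant big are hypothetical names added only to show that the enumerators no longer collide with other uses of common words.

```cpp
namespace sketch_ns {

// The enumerators live in their own namespace, so 'big', 'little' and
// 'native' are not injected into the enclosing namespace.
namespace endianness {
    enum type { big, little, native };
}

// An unrelated entity can reuse the common name without a clash.
const int big = 1000;

// Templates name the enumerators with explicit qualification.
template <endianness::type E>
struct is_big {
    static const bool value = (E == endianness::big);
};

}  // namespace sketch_ns
```

The cost is more verbose spelling at the point of use, for example endianness::big instead of big.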

Beman Dawes

2:06 a.m.

On 6/4/06, Christopher Kohlhoff <chris@kohlhoff.com> wrote:

...

Hi Beman,

Beman Dawes <bdawes@acm.org> wrote:

...
A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The interface looks pretty good to me, except that I'm a little concerned that the endianness enum names are introduced directly into the boost namespace. In particular 'big', 'little' and 'native' seem like common names. Perhaps it could be:

namespace endianness { enum type { big, ... }; }

Good point. I'm not sure about your solution yet (not for any technical concerns, but just cause it's past my bedtime and my mind is already half asleep). Thanks, --Beman

me22

1:44 a.m.

On 6/4/06, Beman Dawes <bdawes@acm.org> wrote:

...

* Provision has been made for native endianness. Not implemented yet.

I've implemented it in the attached header and added the corresponding typedefs. It currently memcpy's the bytes around. I don't know whether it would be preferable to use BOOST_LITTLE_ENDIAN and call (load|store)_(big|little)_endian instead.

...

* More explicit names have been given to the forty-four typedefs. A careful explanation of the naming rationale has been added to the docs.

Much nicer, and different enough from stdint for big8_t to not look immediately like a char. I would also, as Eric suggests, prefer aligned_ as the prefix instead of just a.

...

* I haven't looked at Scott McMurray's exact.hpp header yet, so that may kick off yet more changes to come.

It's basically just the native endian types, just in a different interface. With those in yours, it's not terribly important. I would like to keep a similar interface as a possibility and think it's possible to do so without changing the current one at all (only adding to it):

    boost::endian< boost::big, boost::uint_t<24> > x;

Unfortunately, I haven't gotten any of my attempts at specialisation to work. Here's what I consider the most promising failed attempt, in the hopes that someone can spot the problem:

    template <endianness E, std::size_t n_bits, std::size_t n_bytes>
    class endian< E, boost::int_t<n_bits>, n_bytes >
      : integer_cover_operators< endian< E, boost::int_t<n_bits>, n_bytes >,
                                 typename boost::int_t<n_bits>::least >
    {
        BOOST_STATIC_ASSERT( n_bits % CHAR_BIT == 0 );
      public:
        typedef typename boost::int_t<n_bits>::least value_type;
        endian() {}
        endian(value_type i) : value(i) {}
        operator value_type() { return value; }
      private:
        endian< E, value_type, n_bits/CHAR_BIT > value;
    };

    template <endianness E, std::size_t n_bits, std::size_t n_bytes>
    class endian< E, boost::uint_t<n_bits>, n_bytes >
      : integer_cover_operators< endian< E, boost::uint_t<n_bits>, n_bytes >,
                                 typename boost::uint_t<n_bits>::least >
    {
        BOOST_STATIC_ASSERT( n_bits % CHAR_BIT == 0 );
      public:
        typedef typename boost::uint_t<n_bits>::least value_type;
        endian() {}
        endian(value_type i) : value(i) {}
        operator value_type() { return value; }
      private:
        endian< E, value_type, n_bits/CHAR_BIT > value;
    };

The other change I made to the header was to add STATIC_ASSERTs in the various templates. I could see someone using endian<little, int, 3>, which would be very confusing if they ever ported to a platform with 16-bit ints.

One other question: Will the library be using "bytes" as 8 bits or as the same size as a char? ( I understand there are many people in various C++ IRC channels with toasters that have 13-bit chars :P )

~ Scott
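The kind of compile-time size check Scott describes could look roughly like the sketch below. It uses C++11 static_assert rather than the BOOST_STATIC_ASSERT macro of the period, and byte_holder_sketch is a hypothetical template, not the header he attached.

```cpp
#include <cstddef>

// Hypothetical holder: stores n_bytes bytes of a value of type T.
template <typename T, std::size_t n_bytes>
class byte_holder_sketch {
    // Fails to compile if T cannot represent n_bytes bytes, e.g. a
    // 3-byte holder over a 2-byte int on a 16-bit platform.
    static_assert(n_bytes <= sizeof(T),
                  "value type is too small for the requested byte count");

public:
    static const std::size_t size = n_bytes;

    const unsigned char* data() const { return bytes_; }

private:
    unsigned char bytes_[n_bytes];
};
```

With this check, something like byte_holder_sketch<int, 3> compiles where int has at least three bytes and is rejected at compile time where it does not, instead of silently truncating.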

Gennaro Prota

12:59 p.m.

On Sun, 04 Jun 2006 16:34:38 -0400, Beman Dawes <bdawes@acm.org> wrote:

...

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

Hi Beman,

I had a quick look and hope to do a more careful analysis in the next days. Nonetheless I have some not very useful comments:

* as far as I know (I have been absent from the list for long time, so please correct me if this has changed again) the license reference text we use now doesn't contain "use, modification, and distribution", despite what http://www.boost.org/more/lib_guide.htm says. The adopted version is the one reported at http://www.boost.org/more/license_info.html

* there is no guarantee that an unsigned char has 8 bits, and C++ programmers usually identify (in accordance with the standard terminology) "byte" with "unsigned char". So either the various "<< 8" have to be changed to something more portable or the interface should use a different name than unsigned char; of course numeric_limits<> offers everything you need, but you might also consider my more fine-grained type traits in the Yahoo! files section: http://lists.boost.org/Archives/boost/2003/03/45411.php

* I fear I'm missing something but does unrolled_byte_loops really "unroll" loops at compile time? It seems to me it just uses (run-time) recursion.

* integer_cover_operators initial section reports "integer_operations.hpp", presumably as filename, and seem to have many superfluous includes. The guard macro name also seems inconsistent. More importantly, is it intentional that stream input and output only consider ostream and istream (no wide versions, no templates, etc.)?

* (minor) the example omits fclose()

* though the Wikipedia article seems to be, at the time I'm writing, in a decent state, it might easily degrade in the future (I've experienced this myself; no ranting :)); OTOH we can't include it into the boost files, due to the Wikipedia license. It might be worth writing something ourselves, at least in the long run, or link to a specific version of it, with a word of caution that any newer versions have not been verified by the boost members.

Cheers, --Gennaro.

Beman Dawes

8 Jun

4:51 p.m.

Gennaro Prota wrote:

...

On Sun, 04 Jun 2006 16:34:38 -0400, Beman Dawes <bdawes@acm.org> wrote:

...
A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

Hi Beman,

I had a quick look and hope to do a more careful analysis in the next days. Nonetheless I have some not very useful comments:

* as far as I know (I have been absent from the list for long time, so please correct me if this has changed again) the license reference text we use now doesn't contain "use, modification, and distribution", despite what http://www.boost.org/more/lib_guide.htm says. The adopted version is the one reported at http://www.boost.org/more/license_info.html

Fixed. lib_guide.htm corrected, too.

...

* there is no guarantee that an unsigned char has 8 bits...

The C and C++ standards specify char, signed char, and unsigned char all have exactly 8 bits, AFAIK.

...

* I fear I'm missing something but does unrolled_byte_loops really "unroll" loops at compile time? It seems to me it just uses (run-time) recursion.

The depth of recursion is controlled at compile time. The runtime calls are presumably optimized away by inlining. You would have to look at the generated code and/or run some tests to see if there is any abstraction penalty. In any case, that is an implementation detail, as indicated by being in namespace detail. It can be replaced with something else if need be.
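For readers unfamiliar with the technique under discussion, here is a minimal sketch of recursion whose depth is fixed at compile time. It is not the library's unrolled_byte_loops; the namespace sketch_detail and the load_big template are hypothetical, and 8-bit bytes are assumed.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch_detail {

// Primary template: consume one byte, then recurse with N - 1.
// N is a template argument, so the recursion depth is known to the compiler.
template <std::size_t N>
struct load_big {
    static std::uint32_t apply(const unsigned char* p, std::uint32_t acc) {
        return load_big<N - 1>::apply(p + 1, (acc << 8) | *p);
    }
};

// Terminating specialization: no bytes left, return the accumulated value.
template <>
struct load_big<0> {
    static std::uint32_t apply(const unsigned char*, std::uint32_t acc) {
        return acc;
    }
};

}  // namespace sketch_detail
```

Each instantiation makes an ordinary function call, so whether the result is truly "unrolled" depends on the optimizer inlining the chain, which is the point Beman makes about checking the generated code.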

...

* integer_cover_operators initial section reports "integer_operations.hpp", presumably as filename,

Fixed.

...

and seem to have many superfluous includes.

Fixed.

...

The guard macro name also seems inconsistent.

Fixed.

...

More importantly, is it intentional that stream input and output only consider ostream and istream (no wide versions, no templates, etc.)?

Yep, that needs to be fixed. I suspect the current version reflects compiler problems circa 1999. I've added a "TODO" comment to the code so it won't be forgotten in case I don't get to it right away.

...

* (minor) the example omits fclose()

It was omitted for brevity. I've now added it.

...

* though the Wikipedia article seems to be, at the time I'm writing, in a decent state, it might easily degrade in the future (I've experience this myself; no ranting :)); OTOH we can't include it into the boost files, due to the Wikipedia license. It might be worth writing something ourselves, at least in the long run, or link to a specific version of it, with a word of caution that any newer versions have not been verified by the boost members.

Yes, reference to the Wikipedia in docs is something we need to discuss. The articles are often extremely well written and authoritative, and it saves a lot of work (and then later maintenance) to link to them. But as you say, we have no way to know if an article might degrade in the future. My current feeling is that the advantages of linking to a well-written Wikipedia entry outweigh the disadvantages, but it is an open question that other Boosters need to think about. Thanks for the comments and corrections! --Beman

Kim Barrett

5:29 p.m.

At 12:51 PM -0400 6/8/06, Beman Dawes wrote:

...

Gennaro Prota wrote:

...
* there is no guarantee that an unsigned char has 8 bits...

The C and C++ standards specify char, signed char, and unsigned char all have exactly 8 bits, AFAIK.

CHAR_BIT is defined to be *at least* 8 bits. No guarantee that it is *exactly* 8 bits. This is not just a historical artifact to support strange ancient processors with odd addressing unit sizes either. There are modern C/C++ implementations for modern DSP processors where, for example, sizeof(char) == sizeof(int) == 1, and CHAR_BIT is 16 or 32 (or perhaps even 64, though I haven't actually run across that last case myself). Of course, the vast majority of even purportedly portable code ignores this fact, because it can be a real PITA to deal with, usually for little or no benefit.

Gennaro Prota

6:05 p.m.

On Thu, 8 Jun 2006 13:29:02 -0400, Kim Barrett <kab@irobot.com> wrote:

...

At 12:51 PM -0400 6/8/06, Beman Dawes wrote:

...
Gennaro Prota wrote:

...
* there is no guarantee that an unsigned char has 8 bits...

The C and C++ standards specify char, signed char, and unsigned char all have exactly 8 bits, AFAIK.

CHAR_BIT is defined to be *at least* 8 bits. No guarantee that it is *exactly* 8 bits.

Right.

...

This is not just a historical artifact to support strange ancient processors with odd addressing unit sizes either. There are modern C/C++ implementations for modern DSP processors where, for example, sizeof(char) == sizeof(int) == 1, and CHAR_BIT is 16 or 32 (or perhaps even 64, though I haven't actually run across that last case myself).

Historically there have been implementations for machines with a 36-bit word size, where CHAR_BIT == 9 was chosen. This way, four chars were packed into one machine word, and a pointer to char actually consisted of a pointer to a machine word plus an offset (0, 1, 2, 3). Of course a pointer to int just consisted of a machine pointer. This is basically the reason why the standard allows sizeof(char *) > sizeof(int *)

...

Of course, the vast majority of even purportedly portable code ignores this fact, because it can be a real PITA to deal with, usually for little or no benefit.

It actually depends on the context. In some cases it is difficult, in some others it's just a matter of not hardcoding a constant. FWIW, dynamic_bitset<>::count() also works on platforms where CHAR_BIT > 8, by selecting a different implementation at compile time. Incidentally, it also takes into account the possibility of padding bits in the representation of integer types; do you know of any implementation that has these?

--Gennaro.

Beman Dawes

9 Jun

5:59 p.m.

Kim Barrett wrote:

...

At 12:51 PM -0400 6/8/06, Beman Dawes wrote:

...
Gennaro Prota wrote:

...
* there is no guarantee that an unsigned char has 8 bits...

The C and C++ standards specify char, signed char, and unsigned char all have exactly 8 bits, AFAIK.

CHAR_BIT is defined to be *at least* 8 bits. No guarantee that it is *exactly* 8 bits.

Oops! Sorry, you are correct.

...

This is not just a historical artifact to support strange ancient processors with odd addressing unit sizes either. There are modern C/C++ implementations for modern DSP processors where, for example, sizeof(char) == sizeof(int) == 1, and CHAR_BIT is 16 or 32 (or perhaps even 64, though I haven't actually run across that last case myself).

I'll have to give that some thought. The primary purpose of endian byte-holders is data interchange between platforms, so I'm not sure endians are of any use if the hardware doesn't support data interchange. OTOH, if the platform does support data interchange at least for certain sizes, then endians have a role and need to work correctly (or yield a noisy failure) and be named and specified accordingly.

...

Of course, the vast majority of even purportedly portable code ignores this fact, because it can be a real PITA to deal with, usually for little or no benefit.

Yes, understood. Let's say sizeof(char) == sizeof(int) == 1 and CHAR_BIT is 32. My initial reaction is that it is possible to write an implementation that will work for the 4-byte and 8-byte endians, but not the other sizes. However, the Boost implementation won't work, and that's just the way the cookie crumbles. I'll add compile time tests so that the Boost implementation will yield a noisy rather than silent failure if CHAR_BIT != 8. Thanks! --Beman
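A compile-time test of the kind Beman mentions might be sketched as follows. This uses C++11 static_assert; code of the period would more likely use BOOST_STATIC_ASSERT, and the helper bits_in_bytes is a hypothetical name added for illustration.

```cpp
#include <climits>

// Refuse to compile on platforms where a byte is not exactly 8 bits,
// so the failure is noisy rather than silent.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// With the check in place, byte counts and bit counts convert via CHAR_BIT
// instead of a hardcoded 8.
inline unsigned bits_in_bytes(unsigned n_bytes) {
    return n_bytes * CHAR_BIT;
}
```

On a DSP implementation where CHAR_BIT is 16 or 32, as described by Kim Barrett, the translation unit would be rejected with the message above.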

Gennaro Prota

8 Jun

6:23 p.m.

On Thu, 08 Jun 2006 12:51:09 -0400, Beman Dawes <bdawes@acm.org> wrote:

...

Gennaro Prota wrote:

...
[...] * I fear I'm missing something but does unrolled_byte_loops really "unroll" loops at compile time? It seems to me it just uses (run-time) recursion.

The depth of recursion is controlled at compile time. The runtime calls are presumably optimized away by inlining. You would have to look at the generated code and/or run some tests to see if there is any abstraction penalty.

In any case, that is an implementation detail, as indicated by being in namespace detail. It can be replaced with something else if need be.

Yes, absolutely. This was more a "terminology" question, in fact.

...

[...]

Thanks for the comments and corrections!

It's a pleasure. I hope to have further comments in the next days, when I'll try using the library for a real application I have in mind. --Gennaro.

Yuval Ronen

5 Jun

7:59 p.m.

Beman Dawes wrote:

...

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

I see that enum endianness have a 'native' option. This looks a bit weird to me. If I want to use native endianness, why should I use a class named 'endian'? Very unintuitive... What I'm suggesting is exactly what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness). These types could then be wrapped in a big_endian or little_endian class, if the arbitrary native endianness is not desired.

Maybe this time I did a better job of convincing...

Other stuff:

- I think that using bits numbering is better than bytes, because a) uniformity with the types in <cstdint> is *very* important, IMO and b) as some noted, the size of a char is not necessarily 8 bits (so help me God if I understand why this is more useful than harmful), so bits numbering is less ambiguous than bytes (and maybe this is the reason why it was chosen to be used in <cstdint>). Actually, it just occurred to me that if portability between different platforms (with different CHAR_BIT) is our main concern here, then it *must* be bits, isn't it?

- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'.

I hope this post didn't sound too criticizing. That's not what I meant. I only wanted to thank you a lot for putting the effort of writing it, and present my humble comments.

Yuval

me22

8:37 p.m.

On 6/5/06, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...

I see that enum endianness have a 'native' option. This looks a bit weird to me. If I want to use native endianness, why should I use a class named 'endian'? Very unintuitive... What I'm suggesting is exactly what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness). These types could then be wrapped in a big_endian or little_endian class, if the arbitrary native endianness is not desired.

I like that approach as well. Unaligned native-endian arbitrary-sized integer types is what my exact.hpp header from the previous thread implemented. I'll try and implement the big_endian and little_endian wrappers for comparison purposes.

...

- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

I think it could be common to have many of these types in a header one after another and not wanting any padding between the members, so unaligned should remain the default. Additionally, aligned can only be provided when there are fundamental integral types of the requested size, so having them be special in some way is probably good.

...

- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'.

I'm unsure whether or not the separation would be advantageous, but for readability I'd much prefer an aligned/unaligned enum over true/false. ~ Scott McMurray

Yuval Ronen

6 Jun

6 a.m.

me22 wrote:

...

On 6/5/06, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...
I see that enum endianness have a 'native' option. This looks a bit weird to me. If I want to use native endianness, why should I use a class named 'endian'? Very unintuitive... What I'm suggesting is exactly what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness). These types could then be wrapped in a big_endian or little_endian class, if the arbitrary native endianness is not desired.

I like that approach as well.

Unaligned native-endian arbitrary-sized integer types is what my exact.hpp header from the previous thread implemented. I'll try and implement the big_endian and little_endian wrappers for comparison purposes.

If this approach is accepted, then it means that we can debate about the arbitrary-sized/arbitrary-aligned types without any relation to the endianness, as the xxx_endian are just wrappers that are relevant only when they are relevant. Boost.Integer then needs to be extended to support unaligned types (it already has aligned types). I'm repeating all this (without saying anything new) because it's relevant to my answer to the next paragraph...

...

...
- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

I think it could be common to have many of these types in a header one after another and not wanting any padding between the members, so unaligned should remain the default.

Additionally, aligned can only be provided when there are fundamental integral types of the requested size, so having them be special in some way is probably good.

... so Boost.Integer's int32_t is aligned, period. If we want to create a new unaligned 32-bit type, it must be named something else. The non-prefixed names are already taken by the aligned types in <cstdint>... Of course we can come up with a whole new system of naming, just to bypass the names already taken, but I think that (according to the previous paragraph) the right way to go is extending Boost.Integer with unaligned types, because that's where it should go...

...

...
- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'.

I'm unsure whether or not the separation would be advantageous, but for readability I'd much prefer an aligned/unaligned enum over true/false.

No problem for me with an aligned/unaligned enum. Yuval

Christopher Kohlhoff

1:42 a.m.

Hi Yuval, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...

I see that enum endianness have a 'native' option. This looks a bit weird to me. If I want to use native endianness, why should I use a class named 'endian'?

You might be writing code that cares about endianness, and need to use native endianness in some places, and big- or little-endian integers in others. One example is receiver makes right protocols, where the message is always transmitted using the sender's endianness. When a message is received it could be either big- or little-endian.

...

- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

I had a think about this too, and came to the opposite conclusion that unaligned is the sensible default. The purpose of these classes is manipulating records for I/O, not mimicking in memory structures. IMHO defining a structure that uses the aligned endian types (slightly) increases code fragility due to the effect of compiler alignment options on the layout, and so should not be given equal billing. Since the stated motivation for supporting aligned types is performance, you'd have specific reasons for choosing them and be more likely to be careful about using them correctly. Cheers, Chris
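The layout concern behind this argument can be seen in a small sketch. Both structs below are hypothetical; the first uses byte arrays as stand-ins for unaligned endian holders, the second uses ordinary aligned integers.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a record built from unaligned holders: every member has
// alignment 1, so on common ABIs the fields are laid out back to back.
struct unaligned_record_sketch {
    unsigned char kind[1];
    unsigned char length[4];
    unsigned char flags[2];
};

// The same fields as aligned native integers: the compiler may insert
// padding before 'length' and after 'flags', and the exact amount can
// change with packing options.
struct aligned_record_sketch {
    std::uint8_t  kind;
    std::uint32_t length;
    std::uint16_t flags;
};
```

On typical platforms the first struct matches a 7-byte wire format directly, while the size and member offsets of the second depend on the compiler's alignment settings, which is the fragility Chris refers to.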

Yuval Ronen

6:12 a.m.

Christopher Kohlhoff wrote:

...

...
I see that enum endianness have a 'native' option. This looks a bit weird to me. If I want to use native endianness, why should I use a class named 'endian'?

You might be writing code that cares about endianness, and need to use native endianness in some places, and big- or little-endian integers in others.

One example is receiver makes right protocols, where the message is always transmitted using the sender's endianness. When a message is received it could be either big- or little-endian.

I failed to understand. In a 'receiver makes right' protocol, as you describe it, the sender sends the data in its native endianness. I assume it probably also sends the type of its native endianness, big or little, otherwise the receiver won't be able to know it. Then, the receiver reads the endianness, and then reads all data according to it (which means run-time selection of endianness, hopefully possible using boost::variant). All of this is great, but I can't see the relevance to the existence of a 'native' option in the endianness enum, sorry. Can you elaborate, please? Yuval

Christopher Kohlhoff

6:33 a.m.

Hi Yuval, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...

In a 'receiver makes right' protocol, as you describe it, the sender sends the data in its native endianness. I assume it probably also sends the type of its native endianness, big or little, otherwise the receiver won't be able to know it.

Yes, this information may be transmitted when the connection is first established or in a message header.

...

Then, the receiver reads the endianness, and then reads all data according to it (which means run-time selection of endianness, hopefully possible using boost::variant).

Yep, although I'm thinking generic code using templates, where the template parameter is based on the endianness.

...

All of this is great, but I can't see the relevance to the existence of a 'native' option in the endianness enum, sorry. Can you elaborate, please?

It lets me define my message structures once to cater for all types of endianness. E.g. a very simple example:

    template <endianness E>
    struct message
    {
        endian<E, short, 2> first;
        endian<E, int, 4> second;
        endian<E, int, 4> third;
        ...
    };

Sender may write:

    message<native> m;
    m.first = ...
    write(sock, buffer(&m, sizeof(m)));

Receiver may write:

    if (peer_is_big_endian)
    {
        message<big> m;
        read(sock, buffer(&m, sizeof(m)));
        process_message(m);
    }
    else
    {
        message<little> m;
        read(sock, buffer(&m, sizeof(m)));
        process_message(m);
    }

And both sender and receiver may make use of template functions for processing message structures:

    template <endianness E>
    void process_message(message<E>& m) { ... }

Cheers, Chris
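A self-contained approximation of Chris's message example is sketched below. It is not the library's endian template: endian_sketch, order_sketch, and message_sketch are hypothetical names, only big and little orders are modelled (a 'native' order would need platform detection), unsigned field types are used to keep the conversions simple, 8-bit bytes are assumed, and the socket calls are left out.

```cpp
#include <cstddef>
#include <cstdint>

enum order_sketch { big_order, little_order };

// Hypothetical N-byte holder that stores T in the byte order E.
template <order_sketch E, typename T, std::size_t N>
class endian_sketch {
public:
    endian_sketch(T v = 0) {
        // i counts bytes from least significant to most significant.
        for (std::size_t i = 0; i < N; ++i) {
            std::size_t at = (E == big_order) ? N - 1 - i : i;
            bytes_[at] = static_cast<unsigned char>(
                (static_cast<unsigned long long>(v) >> (8 * i)) & 0xFF);
        }
    }

    operator T() const {
        // Rebuild the value from most significant byte to least.
        unsigned long long v = 0;
        for (std::size_t i = 0; i < N; ++i) {
            std::size_t at = (E == big_order) ? i : N - 1 - i;
            v = (v << 8) | bytes_[at];
        }
        return static_cast<T>(v);
    }

private:
    unsigned char bytes_[N];
};

// One message definition serves every byte order.
template <order_sketch E>
struct message_sketch {
    endian_sketch<E, std::uint16_t, 2> first;
    endian_sketch<E, std::uint32_t, 4> second;
};

// Generic processing code, written once for all orders.
template <order_sketch E>
std::uint32_t process_message(const message_sketch<E>& m) {
    return static_cast<std::uint32_t>(m.first) +
           static_cast<std::uint32_t>(m.second);
}
```

A receiver would choose message_sketch<big_order> or message_sketch<little_order> at run time based on what the peer announced, then call the same process_message template in either branch.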

Yuval Ronen

9:02 p.m.

Christopher Kohlhoff wrote:

...

It lets me define my message structures once to cater for all types of endianness. E.g. a very simple example:

    template <endianness E>
    struct message
    {
        endian<E, short, 2> first;
        endian<E, int, 4> second;
        endian<E, int, 4> third;
        ...
    };

Sender may write:

    message<native> m;
    m.first = ...
    write(sock, buffer(&m, sizeof(m)));

Receiver may write:

    if (peer_is_big_endian)
    {
        message<big> m;
        read(sock, buffer(&m, sizeof(m)));
        process_message(m);
    }
    else
    {
        message<little> m;
        read(sock, buffer(&m, sizeof(m)));
        process_message(m);
    }

And both sender and receiver may make use of template functions for processing message structures:

    template <endianness E>
    void process_message(message<E>& m) { ... }

According to your example, the process_message function seems to be used only by the receiver, and this is actually logical, so I can't see the benefit of 'native' there. On the other hand, the definition of struct message, can benefit from it, I agree. So to provide a solution for the struct, I suggest to add a 'type selector' to the suggestion I described (too many times) in this thread. This selector would be defined as follows:

    template <endianness E, typename T>
    struct endian_selector
    {
        typedef ... type;
    }

where:

    endian_selector<big   , T>::type is big_endian<T>
    endian_selector<little, T>::type is little_endian<T>
    endian_selector<native, T>::type is T

Yes, there is a 'native' option in here, you convinced me. I think this allows the struct message to be defined elegantly as in your example.
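One way the proposed selector could be written is with partial specialization, sketched below. The big_endian and little_endian wrappers here are empty placeholders that do no byte reordering, and the namespace sketch_selector is an assumption of this example; only the mapping from enumerator to type is shown.

```cpp
#include <type_traits>

namespace sketch_selector {

enum endianness { big, little, native };

// Placeholder wrappers; real ones would reorder bytes on load and store.
template <typename T> struct big_endian    { T raw; };
template <typename T> struct little_endian { T raw; };

// Primary template left undefined; each order gets a partial specialization.
template <endianness E, typename T> struct endian_selector;

template <typename T> struct endian_selector<big, T>    { typedef big_endian<T> type; };
template <typename T> struct endian_selector<little, T> { typedef little_endian<T> type; };
template <typename T> struct endian_selector<native, T> { typedef T type; };

// The message struct from the thread, defined once via the selector.
template <endianness E>
struct message {
    typename endian_selector<E, short>::type first;
    typename endian_selector<E, int>::type   second;
};

// 'native' collapses to the plain type, as proposed.
static_assert(std::is_same<endian_selector<native, int>::type, int>::value,
              "native selects T itself");
static_assert(std::is_same<endian_selector<big, int>::type, big_endian<int> >::value,
              "big selects big_endian<T>");

}  // namespace sketch_selector
```

With this arrangement message<native> holds plain short and int members, so the native case carries no wrapper at all.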

me22

9:21 p.m.

On 6/6/06, Yuval Ronen <ronen_yuval@yahoo.com> wrote:

...

Christopher Kohlhoff wrote:

...
template <endianness E> struct message { endian<E, short, 2> first; endian<E, int, 4> second; endian<E, int, 4> third; ... };

On the other hand, the definition of struct message, can benefit from it, I agree. So to provide a solution for the struct, I suggest to add a 'type selector' to the suggestion I described (too many times) in this thread.

How about template template parameters?

template <template <typename> class E>
struct message
{
  E< exact_int_t<2> > first;
  E< exact_int_t<4> > second;
  E< exact_int_t<4> > third;
  ...
};

Though that might unnecessarily restrict the possible compilers.

~ Scott McMurray

Beman Dawes

8 Jun

6:17 p.m.

Yuval Ronen wrote:

...

Beman Dawes wrote:

...
A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

... What I'm suggesting is exactly what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness).

What I'm missing is the motivation. Other than for endian I/O, I'm not able to visualize any need for integers of various sizes/alignments beyond those already provided by <cstdint>. In any case, such types would seem to fit better into an integer library than a library providing endian byte-holders.

...

These types could then be wrapped in a big_endian or little_endian class, if the arbitrary native endianness is not desired.

Maybe this time I did a better job of convincing...

I need motivating use cases or applications rather than just hearing that "there should be [such types]". Other stuff:

...

- I think that using bits numbering is better than bytes, because a) uniformity with the types in <cstdint> is *very* important, IMO and b) as some noted, the size of a char is not necessarily 8 bits (so help me God if I understand why this is more useful than harmful), so bits numbering is less ambiguous than bytes (and maybe this is the reason why it was chosen to be used in <cstdint>).

<cstdint> is about integers, where the number of bits is critical, even if not exactly a certain number of bytes. <boost/endian.hpp> is about endian byte-holders, where the number of bytes is critical, even if not exactly matching the architecture's integer number of bits.

...

Actually, it just occurred to me that if portability between different platforms (with different CHAR_BITS) is our main concern here, then it *must* be bits, isn't it?

CHAR_BITS is fixed at 8. It never varies. It really sounds like your concern is applications involving integers, and an endian class is the wrong tool to solve your problem. Is that possibly the case?

...

- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

Unaligned (including the very common sub-cases of aligned by happenstance, careful placement, or padding) covers the vast majority of the uses in my experience. Forced alignment is a (somewhat dangerous) optimization that I would not recommend except to endian experts who understand the risks involved.

...

- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'.

My initial implementation did have an additional template argument, taking an enum: enum alignment { unaligned, aligned }; But having an additional argument meant that defaulting didn't work well. It is nice to be able to default the lengths for aligned.

...

I hope this post didn't sound too criticizing. That's not what I meant. I only wanted to thank you a lot for putting the effort of writing it, and present my humble comments.

Not at all. The whole point of posting preliminary versions is to get feedback. Thanks, --Beman

Yuval Ronen

9 Jun

2:45 p.m.

Beman Dawes wrote:

...

...
what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness).

What I'm missing is the motivation. Other than for endian I/O, I'm not able to visualize any need for integers of various sizes/alignments beyond those already provided by <cstdint>.

They are needed for the exact same reason you wrote class endian in the first place. You described your motivation as dealing with large files containing records. These records could contain integer types, and you wanted to be portable, and therefore store them in a declared endianness, rather than an unknown native endianness. You also wanted to be very economical with storage requirements, so you used weird-sized, unaligned integers. That's perfectly fine. Now let's just take this exact example, and remove the need for portability. If I write code that I *know* will run on a homogeneous set of platforms, and I want to save the performance penalty imposed by the non-native endianness, then I'd like to use native endianness. The need for weird-sized unaligned integers to save space didn't disappear. It's still there. Hence the need for these Integer types for various sizes/alignments. Bottom line is that I believe the need for these integer types exists (for space efficiency, or other reasons) even without the endianness specifications, and the latter should be built around them, and not interleaved with them.

...

In any case, such types would seem to fit better into an integer library than a library providing endian byte-holders.

Absolutely, that's what I was saying. These types should reside in Boost.Integer, and Boost.Endian should just accept them (and others) as template parameters.

...

...
- I think that using bits numbering is better than bytes, because a) uniformity with the types in <cstdint> is *very* important, IMO and b) as some noted, the size of a char is not necessarily 8 bits (so help me God if I understand why this is more useful than harmful), so bits numbering is less ambiguous than bytes (and maybe this is the reason why it was chosen to be used in <cstdint>).

<cstdint> is about integers, where the number of bits is critical, even if not exactly a certain number of bytes.

<boost/endian.hpp> is about endian byte-holders, where the number of bytes is critical, even if not exactly matching the architecture's integer number of bits.

boost/endian is not about integers? How can it be not? The *only* area where endianness is relevant is with integers. A buffer of bytes has no, and doesn't need any, endianness. That's why I think an *integer* type is the parameter to the endian classes. It seems we agree on that, because your code does exactly this - passes integer types to the endian class.

...

...
Actually, it just occurred to me that if portability between different platforms (with different CHAR_BITS) is our main concern here, then it *must* be bits, isn't it?

CHAR_BITS is fixed at 8. It never varies.

I'm certainly not a standard expert, but several posters in this thread, and in the Boost.Asio review thread, claimed that CHAR_BITS can be larger than 8. I had no knowledge of my own here, so I relied on it. If this is wrong, then I am wrong as well. My apologies for that.

...

It really sounds like your concern is applications involving integers, and an endian class is the wrong tool to solve your problem. Is that possibly the case?

My concern is applications involving integers, that's correct, but I see no contradiction between this and the endian topic. As I explained above, I believe concern in applications involving integers is in fact a pre-requisite for dealing with endianness.

...

...
- Is aligned more common than unaligned, or vice-versa? It sounds logical to me, that since the POD integers types (int and friends) are aligned, it should also be the 'default' behavior of any class mimicking them, including of course, the endian class. The conclusion is that instead of prefixing 'a' or 'aligned_' to the aligned types, the unaligned types should get a prefix ('unaligned_'?).

Unaligned (including the very common sub-cases of aligned by happenstance, careful placement, or padding) covers the vast majority of the uses in my experience. Forced alignment is a (somewhat dangerous) optimization that I would not recommend except to endian experts who understand the risks involved.

Let me understand, are you saying that using an int somewhere is "aligned by happenstance", and therefore considered "unaligned"?

...

...
- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'.

My initial implementation did have an additional template argument, taking an enum:

enum alignment { unaligned, aligned };

Looks excellent.

...

But having an additional argument meant that defaulting didn't work well. It is nice to be able to default the lengths for aligned.

I have to admit that I don't understand how adding the 'enum alignment' as a first or second template argument (before or after the 'endianness' argument) caused any problems with the default length argument. Sounds harmless to me.

...

Thanks,

--Beman

My pleasure, Yuval

Beman Dawes

10 Jun

1:18 p.m.

Yuval Ronen wrote:

...

Beman Dawes wrote:

...
...
what I suggested before (and obviously failed to convince): There should be a set of Integer types for various sizes/alignments, which could be used without any relation to endianness (which probably means native endianness, just as using a simple 'int' or 'uint32_t' means native endianness). What I'm missing is the motivation. Other than for endian I/O, I'm not able to visualize any need for integers of various sizes/alignments beyond those already provided by <cstdint>.

They are needed for the exact same reason you wrote class endian in the first place.

You described your motivation as dealing with large files containing records. These records could contain integer types, and you wanted to be portable, and therefore store them in a declared endianness, rather than an unknown native endianness. You also wanted to be very economical with storage requirements, so you used weird-sized, unaligned integers.

That's perfectly fine. Now let's just take this exact example, and remove the need for portability. If I write code that I *know* will run on a homogeneous set of platforms, and I want to save the performance penalty imposed by the non-native endianness, then I'd like to use native endianness. The need for weird-sized unaligned integers to save space didn't disappear. It's still there. Hence the need for these Integer types for various sizes/alignments.

OK, "need for weird-sized unaligned integers to save space" is a valid motivation, although I would be surprised if usage was widespread.

...

Bottom line, is that I believe the need for these integer types exists (for space efficiency, or other reasons) even without the endianness specifications, and the latter should be built around them, and not interleaved with them.

The fundamental characteristic of these types is that they are stored as a sequence of bytes. Since a sequence of bytes has to have some ordering (big, little, native, whatever), I don't see how you could efficiently have types containing a sequence of bytes, and only later layer endianness on top of them. To do so would seem to imply later byte-swapping. In other words, if there was an unaligned any-size <= 8 integer library, presumably it would use native endianness. That implies byte-by-byte copying. Then changing endianness would involve byte swapping, doubling the cost compared to the current approach.

...

...
In any case, such types would seem to fit better into an integer library than a library providing endian byte-holders.

Absolutely, that's what I was saying. These types should reside in Boost.Integer, and Boost.Endian should just accept them (and others) as template parameters.

I can see that as an argument for making endian a part of Boost.Integer rather than a separate library.

...

...
...
- I think that using bits numbering is better than bytes, because a) uniformity with the types in <cstdint> is *very* important, IMO and b) as some noted, the size of a char is not necessarily 8 bits (so help me God if I understand why this is more useful than harmful), so bits numbering is less ambiguous than bytes (and maybe this is the reason why it was chosen to be used in <cstdint>). <cstdint> is about integers, where the number of bits is critical, even if not exactly a certain number of bytes.

<boost/endian.hpp> is about endian byte-holders, where the number of bytes is critical, even if not exactly matching the architecture's integer number of bits.

boost/endian is not about integers? How can it be not? The *only* area where endianness is relevant is with integers.

I've seen other numeric types (decimal, floats) where endianness was an issue, but did not include them in this proposal because I personally have no experience with such types and no need for such types.

...

A buffer of bytes has no, and doesn't need any, endianness.

...

That's why I think an *integer* type is the parameter to the endian classes. It seems we agree on that, because your code does exactly this - passes integer types to the endian class.

...
...
Actually, it just occurred to me that if portability between different platforms (with different CHAR_BITS) is our main concern here, then it *must* be bits, isn't it? CHAR_BITS is fixed at 8. It never varies.

I'm certainly not a standard expert, but several posters in this thread, and in the Boost.Asio review thread, claimed that CHAR_BITS can be larger than 8. I had no knowledge of my own here, so I relied on it. If this is wrong, then I am wrong as well. My apologies for that.

No, I was the one that was wrong. Sorry.
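For reference, the macro in <climits> is spelled CHAR_BIT, and the C and C++ standards require it to be at least 8 rather than exactly 8. A small sketch of how a byte-counted interface could still report its width in bits; bits_in_bytes is a hypothetical helper, not part of the proposed library, and the code assumes C++11.

```cpp
#include <climits>
#include <cstddef>

// The standards guarantee a lower bound only; some platforms use wider chars.
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

// Hypothetical helper: width in bits of an n_bytes-wide byte-holder here.
constexpr std::size_t bits_in_bytes(std::size_t n_bytes) {
    return n_bytes * CHAR_BIT;
}

// On a platform with 8-bit chars, a 3-byte holder is 24 bits wide.
static_assert(CHAR_BIT != 8 || bits_in_bytes(3) == 24, "3 bytes of 8 bits");
```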

...

...

Let me understand, are you saying that using an int somewhere is "aligned by happenstance", and therefore considered "unaligned"?

Let me give an example:

struct foo { big3_t v1; big3_t v2; big2_t v3; };

Now by happenstance v3 has an offset modulo 2 of 0. But in the applications I work with, it would be a design mistake to change it to an aligned_big2_t.

That's because foo's may get embedded in larger structs like this:

struct bar { big3_t x1; foo x2; };

It is very important for these apps that no padding be inserted after x1. That's why v3 isn't logically considered aligned, even though it happens to have an offset modulo 2 of 0.
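The padding effect can be checked with a small sketch. The types below are stand-ins built from plain byte arrays, not the library's real big3_t, big2_t, or aligned_big2_t, and the foo_aligned and bar_aligned names are hypothetical. The sizes asserted are what mainstream ABIs produce; the standard does not strictly guarantee them. C++11 is assumed for alignas and static_assert.

```cpp
#include <cstddef>

// Stand-in byte-holders: alignment 1, so they never force padding.
template <std::size_t N>
struct unaligned_bytes { unsigned char b[N]; };

typedef unaligned_bytes<3> big3_t;
typedef unaligned_bytes<2> big2_t;

struct foo { big3_t v1; big3_t v2; big2_t v3; };
struct bar { big3_t x1; foo x2; };

static_assert(offsetof(foo, v3) == 6, "v3 lands on an even offset by happenstance");
static_assert(sizeof(foo) == 8, "3 + 3 + 2 bytes, no padding");
static_assert(offsetof(bar, x2) == 3, "foo starts right after x1");
static_assert(sizeof(bar) == 11, "no padding anywhere in bar");

// Stand-in for an aligned 2-byte holder: same size, alignment 2.
struct aligned_big2_t { alignas(2) unsigned char b[2]; };

struct foo_aligned { big3_t v1; big3_t v2; aligned_big2_t v3; };
struct bar_aligned { big3_t x1; foo_aligned x2; };

// foo_aligned inherits alignment 2, so a padding byte appears after x1.
static_assert(sizeof(foo_aligned) == 8, "same size as foo in isolation");
static_assert(offsetof(bar_aligned, x2) == 4, "one padding byte after x1");
static_assert(sizeof(bar_aligned) == 12, "the record grew by one byte");
```

The aligned variant looks harmless inside foo alone; the cost only shows up once foo is embedded at an odd offset.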

...

...
...
- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'. My initial implementation did have an additional template argument, taking an enum:

enum alignment { unaligned, aligned };

Looks excellent.

...
But having an additional argument meant that defaulting didn't work well. It is nice to be able to default the lengths for aligned.

I have to admit that I don't understand how adding the 'enum alignment' as a first or second template argument (before or after the 'endianness' argument) caused any problems with the default length argument. Sounds harmless to me.

There are two defaults we might like:

(1) alignment defaults to unaligned.
(2) num_bytes defaults to sizeof(T)

If the alignment parameter precedes the num_bytes parameter, an alignment default doesn't work if a num_bytes argument is present.

If the num_bytes parameter precedes the alignment parameter, a num_bytes default doesn't work if an alignment argument is present.

That's why I'm in favor of adding named arguments to the language.

In this particular case, (2) is less important than (1), so I guess we could sacrifice (2) and have a separate alignment parameter, placed last so it could be defaulted to unaligned. I'm undecided.

Thanks for all the comments,

--Beman
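The defaulting problem can be seen by writing out both parameter orders. This is only a sketch: endian_bytes_first and endian_align_first are hypothetical names, and the class bodies just record their arguments. C++11 is assumed.

```cpp
#include <cstddef>

enum endianness { big, little };
enum alignment { unaligned, aligned };

// Ordering 1: num_bytes precedes alignment.
template <endianness E, typename T,
          std::size_t NumBytes = sizeof(T), alignment A = unaligned>
struct endian_bytes_first {
    static constexpr std::size_t num_bytes = NumBytes;
    static constexpr alignment align = A;
};

// Ordering 2: alignment precedes num_bytes.
template <endianness E, typename T,
          alignment A = unaligned, std::size_t NumBytes = sizeof(T)>
struct endian_align_first {
    static constexpr std::size_t num_bytes = NumBytes;
    static constexpr alignment align = A;
};

// Ordering 1: the alignment default survives an explicit num_bytes ...
static_assert(endian_bytes_first<big, int, 3>::align == unaligned, "");
// ... but asking for aligned means spelling out the num_bytes default by hand.
static_assert(endian_bytes_first<big, int, sizeof(int), aligned>::num_bytes
                  == sizeof(int), "");

// Ordering 2: the num_bytes default survives an explicit alignment ...
static_assert(endian_align_first<big, int, aligned>::num_bytes == sizeof(int), "");
// ... but asking for 3 bytes means spelling out the alignment default by hand.
static_assert(endian_align_first<big, int, unaligned, 3>::num_bytes == 3, "");
```

Either order works; the choice only decides which default has to be repeated explicitly when the other argument is supplied.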

Yuval Ronen

7:20 p.m.

...

...
You described your motivation as dealing with large files containing records. These records could contain integer types, and you wanted to be portable, and therefore store them in a declared endianness, rather than an unknown native endianness. You also wanted to be very economical with storage requirements, so you used weird-sized, unaligned integers.

That's perfectly fine. Now let's just take this exact example, and remove the need for portability. If I write code that I *know* will run on a homogeneous set of platforms, and I want to save the performance penalty imposed by the non-native endianness, then I'd like to use native endianness. The need for weird-sized unaligned integers to save space didn't disappear. It's still there. Hence the need for these Integer types for various sizes/alignments.

OK, "need for weird-sized unaligned integers to save space" is a valid motivation, although I would be surprised if usage was widespread.

I'm not sure usage of these types (without endianness) would be any less widespread than usage with endianness, but I should really know when to quit... :-)

...

...
Bottom line, is that I believe the need for these integer types exists (for space efficiency, or other reasons) even without the endianness specifications, and the latter should be built around them, and not interleaved with them.

The fundamental characteristic of these types is that they are stored as a sequence of bytes. Since a sequence of bytes has to have some ordering (big, little, native, whatever), I don't see how you could efficiently have types containing a sequence of bytes, and only later layer endianness on top of them. To do so would seem to imply later byte-swapping.

In other words, if there was an unaligned any-size <= 8 integer library, presumably it would use native endianness. That implies byte-by-byte copying. Then changing endianness would involve byte swapping, doubling the cost compared to the current approach.

When is this "byte-by-byte copying" supposed to happen? When constructing the object? Performing I/O? Performing arithmetic operations? Any implementation of an endian class, my approach or yours, needs to do some byte swapping and copying at one of those stages, at least. As far as I understand, your implementation does it when performing arithmetic operations, but I might have misunderstood your code. My idea of how to write it would do the swapping and copying during construction and I/O rather than during arithmetic, but I think it would also be possible to do it so that arithmetic operations would suffer instead. Which is better? I don't know, but I don't think you can skip it altogether...

...

...
...
In any case, such types would seem to fit better into an integer library than a library providing endian byte-holders. Absolutely, that's what I was saying. These types should reside in Boost.Integer, and Boost.Endian should just accept them (and others) as template parameters.

I can see that as an argument for making endian a part of Boost.Integer rather than a separate library.

If I assume for a moment that we agree on everything else (which we don't :-) ), then I won't mind either way.

...

...
...
...
- I think that using bits numbering is better than bytes, because a) uniformity with the types in <cstdint> is *very* important, IMO and b) as some noted, the size of a char is not necessarily 8 bits (so help me God if I understand why this is more useful than harmful), so bits numbering is less ambiguous than bytes (and maybe this is the reason why it was chosen to be used in <cstdint>). <cstdint> is about integers, where the number of bits is critical, even if not exactly a certain number of bytes.

<boost/endian.hpp> is about endian byte-holders, where the number of bytes is critical, even if not exactly matching the architecture's integer number of bits. boost/endian is not about integers? How can it be not? The *only* area where endianness is relevant is with integers.

I've seen other numeric types (decimal, floats) where endianness was an issue, but did not include them in this proposal because I personally have no experience with such types and no need for such types.

So it seems we agree that those types are out of scope here. But this is drifting from my original point. The point was that both <cstdint> and the endian class(es) deal with integers, not with "byte-holders". Byte holders, or IOW buffers, don't have anything to do with endianness, only integers do. So this was the rationale for using bits instead of bytes, to be consistent with <cstdint>.

...

...
...
...
Actually, it just occurred to me that if portability between different platforms (with different CHAR_BITS) is our main concern here, then it *must* be bits, isn't it? CHAR_BITS is fixed at 8. It never varies. I'm certainly not a standard expert, but several posters in this thread, and in the Boost.Asio review thread, claimed that CHAR_BITS can be larger than 8. I had no knowledge of my own here, so I relied on it. If this is wrong, then I am wrong as well. My apologies for that.

No, I was the one that was wrong. Sorry.

Which brings me back to my original post - if such platforms with CHAR_BITS != 8 are to be supported (which is not certain), then I think counting bits is an absolute must...

...

...
...

Let me understand, are you saying that using an int somewhere is "aligned by happenstance", and therefore considered "unaligned"?

Let me give an example:

struct foo { big3_t v1; big3_t v2; big2_t v3; };

Now by happenstance v3 has an offset modulo 2 of 0. But in the applications I work with, it would be a design mistake to change it to an aligned_big2_t.

That's because foo's may get embedded in larger structs like this:

struct bar { big3_t x1; foo x2; };

It is very important for these apps that no padding be inserted after x1. That's why v3 isn't logically considered aligned, even though it happens to have an offset modulo 2 of 0.

I completely agree with your example, but am still not convinced. The reason is probably because we disagree on the more fundamental issue of whether to separate the endian class from size/alignment or not. So there is no reason to pursue this further.

...

...
...
...
- Having an enum with values such as 'big', 'aligned_big', 'little', 'aligned_little', etc, just cries for separation. The enum should have only 'big' and 'little', and the endian template can accept one more template argument - 'bool aligned'. My initial implementation did have an additional template argument, taking an enum:

enum alignment { unaligned, aligned }; Looks excellent.

...
But having an additional argument meant that defaulting didn't work well. It is nice to be able to default the lengths for aligned. I have to admit that I don't understand how adding the 'enum alignment' as a first or second template argument (before or after the 'endianness' argument) caused any problems with the default length argument. Sounds harmless to me.

There are two defaults we might like:

(1) alignment defaults to unaligned. (2) num_bytes defaults to sizeof(T)

If the alignment parameter precedes the num_bytes parameter, an alignment default doesn't work if a num_bytes argument is present.

If the num_bytes parameter precedes the alignment parameter, a num_bytes default doesn't work if an alignment argument is present.

That's why I'm in favor of adding named arguments to the language.

In this particular case, (2) is less important than (1), so I guess we could sacrifice (2) and have a separate alignment parameter, placed last so it could be defaulted to unaligned. I'm undecided.

Your current implementation actually sacrifices (1) in favor of (2), and in this case there should also be no problem adding the alignment parameter (without a default, which was just sacrificed). It seems that either preferring (1) over (2), or vice-versa, allows adding the alignment parameter.

...

Thanks for all the comments,

You're welcome.

me22

8:36 p.m.

On 6/10/06, Beman Dawes <bdawes@acm.org> wrote:

...

There are two defaults we might like:

(1) alignment defaults to unaligned. (2) num_bytes defaults to sizeof(T)

[ Snip lots more discussion ]

I've been working on Yuval's idea, and in doing so came up with a different interface that I'll describe here, as it deals with the issues mentioned here in a different way. (This is not the exact one I've implemented; I've adjusted some things to fit closer with what's been discussed here.)

I've basically inverted some of the things. Instead of having num_bytes default to sizeof(T), why not have T default to some type based on num_bytes? Boost.Integer already provides an elegant interface for doing this. This requires an extra template parameter to specify signedness, but I don't think that's a problem. For one, it means that the T parameter is unneeded, so it's not increasing the number of parameters anyways. It means that the type can default to signed, "as the ints do", and the signed and unsigned keywords are valid arguments to a type template parameter, which I think is a fairly elegant interface.

As for alignment, might there ever be a need to have alignment at something other than num_bytes? If the storage is done in an anonymous union of the char array and some type where sizeof(Align_T)==align_bytes, then you can allow attempted alignment at any number of bytes. Allowing aligned and unaligned as arguments would also be easy. If unaligned has the value 1, then defaulting to unaligned makes the Align_T a char, which has no effect on the alignment. aligned could be given a value of 0, for which the implementation would pick T as the Align_T.

Examples:

little-endian, 3 bytes, signed, unaligned: endian<little,3>
little-endian, 5 bytes, unsigned, unaligned: endian<little,5,unsigned>
big-endian, 4 bytes, signed, 4-byte aligned: endian<big,4,signed,aligned>
big-endian, 2 bytes, unsigned, 4-byte aligned: endian<big,2,unsigned,4>

This would, however, require modifying (or implementing a clone of) Boost.Integer to add support for long long or __int64_t or similar, where available, but I think that would be a worthwhile project anyways.

...

Thanks for all the comments,

Thanks for bringing it up in the first place, ~ Scott McMurray
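A rough sketch of how the parameter list described above might be declared, checking only the type-level properties of the four examples. Several things here are assumptions rather than anything posted: least_int is a hypothetical stand-in for the Boost.Integer lookup, alignment is requested with C++11 alignas instead of the anonymous union that was described, the value_type, alignment, and data() members are invented for the sketch, and no byte-order conversion is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum endianness { big, little };

// As described: unaligned is 1 (byte alignment), aligned is 0 (use the value type's).
constexpr std::size_t aligned = 0;
constexpr std::size_t unaligned = 1;

// Hypothetical stand-in for Boost.Integer: smallest signed type holding Bytes bytes.
template <std::size_t Bytes>
struct least_int {
    static_assert(Bytes >= 1 && Bytes <= 8, "1 to 8 bytes supported");
    typedef typename std::conditional<(Bytes <= 1), std::int8_t,
            typename std::conditional<(Bytes <= 2), std::int16_t,
            typename std::conditional<(Bytes <= 4), std::int32_t,
                                      std::int64_t>::type>::type>::type type;
};

template <endianness E, std::size_t Bytes,
          typename Sign = signed, std::size_t Align = unaligned>
class endian {
    typedef typename least_int<Bytes>::type signed_type;
public:
    // 'signed' and 'unsigned' are ordinary types (signed int, unsigned int),
    // so is_signed can tell which one was passed.
    typedef typename std::conditional<
        std::is_signed<Sign>::value, signed_type,
        typename std::make_unsigned<signed_type>::type>::type value_type;

    static constexpr std::size_t alignment =
        (Align == aligned) ? alignof(value_type) : Align;

    const unsigned char* data() const { return bytes_; }
private:
    alignas(alignment) unsigned char bytes_[Bytes];
};

// The four examples from the post.
static_assert(std::is_same<endian<little, 3>::value_type, std::int32_t>::value, "");
static_assert(sizeof(endian<little, 3>) == 3, "unaligned: no padding");
static_assert(std::is_same<endian<little, 5, unsigned>::value_type,
                           std::uint64_t>::value, "");
static_assert(alignof(endian<big, 4, signed, aligned>) == alignof(std::int32_t), "");
static_assert(alignof(endian<big, 2, unsigned, 4>) == 4, "explicit 4-byte alignment");
```

Note that the last example occupies 4 bytes even though it stores only 2, since a type's size is always a multiple of its alignment.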

Tomas Puverle

7 Jun

4:13 p.m.

...

A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

I was very interested in this library/thread but unfortunately didn't have the time to read the old thread and this one until now. I'd like to put up a few suggestions for consideration:

1) There is some mention of supporting custom swapping routines. This is fairly important, as the 3 architectures that I use the most (x86, x86-64 and sparc) all support some hardware primitives to perform endian swapping. x86's have a bswap instruction. sparc allows you to load/store memory operands as either little or big endian. Other architectures I occasionally touch have similar capabilities. I would like to be able to take advantage of this.

2) Could the library include functionality to make it more useful with other types? This suggestion is not a replacement for what you are doing, it's more like a supplement. It's useful when you need to do a fair amount of processing on your data after you read it in but before you write it back to the stream/storage etc. This is how I've implemented similar things in the past:

template<typename T> struct endian_swapper { static void swap_in_place(T&); };

The class is specialised for builtin types and arrays, where the member does the obvious thing. There is also an inline template free function with the same signature as the member, which just does "return endian_swapper<T>::swap_in_place(t);"

Now consider a struct, e.g. some sort of network message:

struct A { char k; int i; float x; };

BOOST_IMPLEMENT_ENDIAN(A, (k)(i)(x));

The macro expands to a specialisation of endian_swapper<A> whose swap_in_place just calls the free function swap_in_place on each member:

template<> struct endian_swapper<A> { static void swap_in_place(A& t_) { ::swap_in_place(t_.*(&A::k)); ::swap_in_place(t_.*(&A::i)); ::swap_in_place(t_.*(&A::x)); } };

You can see how this also recursively works for structs/arrays inside other structs/arrays.

Final piece is the free function swap, which gives you back a swapped copy of the original:

template<class T> T swap(const T & t_) { T temp(t_); swap_in_place(temp); return temp; }

I just tried to outline the general idea here but I hope I managed to get the point across. It would be even nicer if it could also understand the native/little/big endian semantics you've already discussed but I thought trying to work that into the examples would just make them too complicated.

Thanks for the good work.

Tom
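For readers who want to try the idea, here is a self-contained sketch of the scheme outlined above. It is an illustration under assumptions, not the posted implementation: the primary template simply reverses an object's bytes, the specialisation for A is written by hand where BOOST_IMPLEMENT_ENDIAN would generate it, the copying function is named swapped (a hypothetical name, chosen to avoid colliding with std::swap), and swapper_demo is only a usage check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Primary template: reverse the object's bytes. Intended for built-in
// arithmetic types; structs need a member-by-member specialisation.
template <typename T>
struct endian_swapper {
    static void swap_in_place(T& t) {
        unsigned char* p = reinterpret_cast<unsigned char*>(&t);
        std::reverse(p, p + sizeof(T));
    }
};

// Arrays: swap each element, never the array as a whole.
template <typename T, std::size_t N>
struct endian_swapper<T[N]> {
    static void swap_in_place(T (&a)[N]) {
        for (std::size_t i = 0; i < N; ++i)
            endian_swapper<T>::swap_in_place(a[i]);
    }
};

// Free function with the same signature; forwards to the class template.
template <typename T>
inline void swap_in_place(T& t) { endian_swapper<T>::swap_in_place(t); }

struct A { char k; std::int32_t i; float x; };

// What the macro is described as generating. The calls are qualified with ::
// because the member named swap_in_place would otherwise hide the free function.
template <>
struct endian_swapper<A> {
    static void swap_in_place(A& a) {
        ::swap_in_place(a.k);
        ::swap_in_place(a.i);
        ::swap_in_place(a.x);
    }
};

// Returns a byte-swapped copy and leaves the original untouched.
template <typename T>
T swapped(const T& t) {
    T temp(t);
    swap_in_place(temp);
    return temp;
}

inline void swapper_demo() {
    A a = {'k', 0x01020304, 1.0f};
    A b = swapped(a);
    assert(b.k == 'k');            // a single byte is unchanged
    assert(b.i == 0x04030201);     // four bytes are reversed
    assert(a.i == 0x01020304);     // the source is not modified
}
```

Swapping member by member matters: reversing the bytes of the whole struct would also reorder the members and move any padding.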

Beman Dawes

12 Jun

7:38 p.m.

Tomas Puverle wrote:

...

...
A refresh of the .zip file for the Endian library, based on comments received so far, is available at http://mysite.verizon.net/~beman/endian-0.2.zip

The docs are online at http://mysite.verizon.net/~beman/endian-0.2/libs/endian/index.html

I was very interested in this library/thread but unfortunately didn't have the time to read the old thread and this one until now.

I'd like to put up a few suggestions for consideration:

1) There is some mention of supporting custom swapping routines. This is fairly important, as the 3 architectures that I use the most (x86, x86-64 and sparc) all support some hardware primitives to perform endian swapping. x86's have a bswap instruction. sparc allows you to load/store memory operands as either little or big endian. Other architectures I occasionally touch have similar capabilities. I would like to be able to take advantage of this.

In theory it should be possible to provide assembly language optimizations for specific platforms. In practice..., well, we will have to see what surfaces. The first step is to finalize the interface, docs, etc., and have a formal review. Once that is complete, it would be great if platform-specific assembler optimizations were contributed, particularly when timing tests show considerable speedups. You might want to check the assembly language output from your favorite compiler. I have seen a compiler that recognized byte-swaps and generated code that used a byte-swap instruction.
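As an illustration of the kind of code an optimizer may recognize, here is a portable 32-bit byte swap written with shifts and masks. byte_swap32 is a hypothetical name, and whether a given compiler reduces it to a single byte-swap instruction is something to confirm from its assembly output rather than assume. C++11 is assumed for constexpr.

```cpp
#include <cstdint>

// Portable byte reversal: move each of the four bytes to its mirrored position.
constexpr std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

static_assert(byte_swap32(0x01020304u) == 0x04030201u, "bytes are reversed");
static_assert(byte_swap32(byte_swap32(0xDEADBEEFu)) == 0xDEADBEEFu,
              "swapping twice restores the value");
```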

...

2) Could the library include functionality to make it more useful with other types? This suggestion is not a replacement for what you are doing, it's more like a supplement. It's useful when you need to do a fair amount of processing on your data after you read it in but before you write it back to the stream/storage etc. This is how I've implemented similar things in the past:

template<typename T> struct endian_swapper { static void swap_in_place(T&); };

The class is specialised for builtin types and arrays, where the member does the obvious thing. There is also an inline template free function with the same signature as the member, which just does "return endian_swapper<T>::swap_in_place(t);"

Now consider a struct, e.g. some sort of network message:

struct A { char k; int i; float x; };

BOOST_IMPLEMENT_ENDIAN(A, (k)(i)(x));

The macro expands to a specialisation of endian_swapper<A> with swap_in_place just calls the free function swap_in_place:

template<> struct endian_swapper<A> { static void swap_in_place(A& t_) { ::swap_in_place(t_.*(&A::k)); ::swap_in_place(t_.*(&A::i)); ::swap_in_place(t_.*(&A::x)); } };

You can see how this also recursively works for structs/arrays inside other structs/arrays.

Final piece is the free function swap, which gives you back a swapped copy of the original:

template<class T> T swap(const T & t_) { T temp(t_); swap_in_place(temp); return temp; }

I just tried to outline the general idea here but I hope I managed to get the point across.

It would be even nicer if it could also understand the native/little/big endian semantics you've already discussed but I thought trying to work that into the examples would just make them too complicated.

I'm not averse to a more general endian swapper, but that is beyond my needs. (I'm basically trying to get endian types into Boost so I can use them in a B-tree library, and don't want to spend the whole summer just talking about endian possibilities.)

...

Thanks for the good work.

Well, thank you for the comments! --Beman

25 comments

8 participants

participants (8)

Beman Dawes
Christopher Kohlhoff
Eric Friedman
Gennaro Prota
Kim Barrett
me22
Tomas Puverle
Yuval Ronen