[integer] Promotion of endian library code from vault

Neil Mayhew

22 May 2008

2:53 p.m.

On 2006-07-22 12:21:01, Beman Dawes wrote:

...

[boost] [integer] endian-0.6 posted w/ names changed endian-0.6.zip has been posted in the integer directory of the file vault at http://boost-consulting.com/vault/

This code is still in the vault, but I'd like to see it in the main part of boost. What needs to be done to make this happen?

The endian library is exactly what I've been looking for, and I would really like to use it in my work, but I can't justify that to my colleagues if it's not an official part of boost. I would hate to become yet another person that reinvents this particular wheel.

However, I notice there was a lot of discussion <http://lists.boost.org/Archives/boost/2006/07/108125.php> about PODs vs constructors, because of unions, so maybe not enough consensus was reached? There were some interesting proposals but no agreed-upon solution. In which case, I suspect that people will need to copy the code and each hack it to suit their own purposes, which is not what boost is all about.

I have two use cases from my own work: TrueType fonts, and Mac HFS+ file systems. Both of these are big-endian file formats that need to be read into memory and processed, so (FWIW) I would support the POD approach. Currently, I can't use the code as-is because of the need to use the types in unions.

References:
http://lists.boost.org/Archives/boost/2000/03/2495.php
http://lists.boost.org/Archives/boost/2006/05/105621.php
http://lists.boost.org/Archives/boost/2006/06/105695.php
http://lists.boost.org/Archives/boost/2006/06/105779.php
http://lists.boost.org/Archives/boost/2006/06/105968.php
http://lists.boost.org/Archives/boost/2006/06/106655.php
http://lists.boost.org/Archives/boost/2006/07/108125.php
http://lists.boost.org/Archives/boost/2006/07/108202.php

--Neil

Roland Schwarz

23 May

11:05 a.m.

Neil Mayhew wrote:

...

This code is still in the vault, but I'd like to see it in the main part of boost. What needs to be done to make this happen?

The endian library is exactly what I've been looking for, and I would really like to use it in my work, but I can't justify that to my colleagues if it's not an official part of boost.

This is an interesting library indeed! Thank you for having this brought up on the list again.

However I have some concerns about usefulness when it comes to compiler independence and platform independence.

One of the goals of such a library to be useful (for me at least) would be to be able to create compiler/platform independent binary files.

I can see two problems here:

1) struct layout. The standard gives no provisions for struct layout. So e.g. for

struct foo { big8_t a; big32_t b; };

one cannot predict the alignment of the members. (Or am I wrong in this respect?)

2) The standard makes no provision for object representation, so writing out a bit pattern (which essentially is object representation) cannot be guaranteed to be read in on another platform. (Irrespective of endianness.) The only way to do this is to map object values to a stream of chars. One way to implement this mapping is to convert to ASCII representation (this is what the standard lib provides), but I believe this is not the only mapping possible. I can imagine a mapping to char values that is less computing intensive and will resemble binary a little closer.

In an attempt to solve issue 1) I came up with something like:

struct foo {
    foo(char* begin, char* end)
        : x(begin)
        , y(begin)
        , z(begin)
    {}

    field<int, bigint32_t, 0> x;
    field<short int, bigint16_t, 32> y;
    field<short int, bigint16_t, 48> z;
};

The foo class can be instantiated on a sequence of chars, and the field proxies perform conversion onto this sequence during access operations. The last template parameter controls offset (a bit offset in my case).

The bottom line: For the endian lib to go into boost I really would want to require it being able to produce platform independent binary data.

Regards,
--
Roland Schwarz (aka. speedsnail)
mailto:roland.schwarz@chello.at
http://www.blackspace.at
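A minimal sketch of the field-proxy idea described above, under stated assumptions: the name be_field, the use of byte offsets (the message uses bit offsets) and the restriction to unsigned values are all hypothetical simplifications, and 8-bit chars are assumed. The proxy never relies on struct layout or object representation; it only reads and writes individual bytes of an external buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical proxy: an unsigned big-endian integer of `Bytes` bytes,
// located `ByteOffset` bytes into a caller-supplied buffer.
template <typename T, std::size_t Bytes, std::size_t ByteOffset>
class be_field {
public:
    explicit be_field(unsigned char* base) : p_(base + ByteOffset) {}

    // Decode on read: most significant byte comes first in the buffer.
    operator T() const {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < Bytes; ++i)
            v = (v << 8) | p_[i];
        return static_cast<T>(v);
    }

    // Encode on write: store the low `Bytes` bytes, most significant first.
    be_field& operator=(T value) {
        std::uint64_t v = static_cast<std::uint64_t>(value);
        for (std::size_t i = Bytes; i-- > 0;) {
            p_[i] = static_cast<unsigned char>(v & 0xFF);
            v >>= 8;
        }
        return *this;
    }

private:
    unsigned char* p_;  // points into the external buffer, not owned
};

// A view over 8 bytes: a 32-bit field followed by two 16-bit fields.
struct foo_view {
    explicit foo_view(unsigned char* begin) : x(begin), y(begin), z(begin) {}

    be_field<std::uint32_t, 4, 0> x;
    be_field<std::uint16_t, 2, 4> y;
    be_field<std::uint16_t, 2, 6> z;
};
```

Because the view holds pointers rather than data, sizeof(foo_view) says nothing about the external format; the format is fixed entirely by the offsets in the template arguments.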

dizzy

11:46 a.m.

On Friday 23 May 2008 14:05:56 Roland Schwarz wrote:

...

Neil Mayhew wrote:

...
This code is still in the vault, but I'd like to see it in the main part of boost. What needs to be done to make this happen?

The endian library is exactly what I've been looking for, and I would really like to use it in my work, but I can't justify that to my colleagues if it's not an official part of boost.

This is an interesting library indeed! Thank you for having this brought up on the list again.

However I have some concerns about usefulness when it comes to compiler independence and platform independence.

One of the goals of such a library to be useful (for me at least) would be to be able to create compiler/platform independent binary files.

I can see two problems here:

1) struct layout. The standard gives no provisions for struct layout. So e.g. for

struct foo { big8_t a; big32_t b; };

one cannot predict the alignment of the members. (Or am I wrong in this respect?)

Correct, which is why protocol binary structures are never mapped directly in memory (you can with some compiler extensions but you won't gain anything since I/O is the bottleneck in such cases and not memory copy). Instead a serialization approach should solve such issues.

...

2) The standard makes no provision for object representation, so writing out a bit pattern (which essentially is object representation) cannot be guaranteed to be read in on another platform. (Irrespective of endianness.) The only way to do this is to map object values to a stream of chars. One way to implement this mapping is to convert to ASCII representation (this is what the standard lib provides), but I believe this is not the only mapping possible. I can imagine a mapping to char values that is less computing intensive and will resemble binary a little closer.

In an attempt to solve issue 1) I came up with something like:

struct foo {
    foo(char* begin, char* end)
        : x(begin)
        , y(begin)
        , z(begin)
    {}

    field<int, bigint32_t, 0> x;
    field<short int, bigint16_t, 32> y;
    field<short int, bigint16_t, 48> z;
};

That kinda looks like reinventing boost.serialization although with a different API (I did something similar in my code). I'm not sure if boost.serialization allows read/write on different platforms right now, if not, something could be added to it to do so (a new kind of archive type maybe).

--
Mihai RUSU
Email: dizzy@roedu.net
"Linux is obsolete" -- AST

Roland Schwarz

12:19 p.m.

dizzy wrote:

...

On Friday 23 May 2008 14:05:56 Roland Schwarz wrote:

...
one cannot predict the alignment of the members. (Or am I wrong in this respect?)

Correct, which is why protocol binary structures are never mapped directly in memory (you can with some compiler extensions but you won't gain anything since I/O is the bottleneck in such cases and not memory copy). Instead a serialization approach should solve such issues.

It seems I was wrong with my assumptions about endian.hpp: The proposed lib actually does have an unaligned type, i.e. it actually maps the types to char bytes[..] arrays. So the question remains what the standard has to say about alignment of

struct foo { char bytes_1 [3]; char bytes_2 [2]; char bytes_3 [1]; char bytes_4 [4]; };

Will such a struct be equivalent to char bytes[3+2+1+4] ?

Hmm, and foo isn't necessarily a POD. Writing this out with binary write... what is guaranteed? I fear not much.
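The question can at least be checked empirically on a given compiler. The sketch below is only a check, not a guarantee: the helper name foo_is_packed is hypothetical, and a conforming compiler is allowed to insert padding even though mainstream ones are not expected to for members whose alignment requirement is 1.

```cpp
#include <cassert>
#include <cstddef>

struct foo {
    char bytes_1[3];
    char bytes_2[2];
    char bytes_3[1];
    char bytes_4[4];
};

// Returns true when this compiler laid the members out back to back,
// with no padding between them or after the last one.
inline bool foo_is_packed() {
    return sizeof(foo) == 3 + 2 + 1 + 4
        && offsetof(foo, bytes_1) == 0
        && offsetof(foo, bytes_2) == 3
        && offsetof(foo, bytes_3) == 5
        && offsetof(foo, bytes_4) == 6;
}
```

Turning the same conditions into compile-time assertions would make a build fail early on any platform where the assumption does not hold.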

...

That kinda looks like reinventing boost.serialization although with a different API (I did something similar in my code).

Indeed there are similarities. But there are differences as well.

1) I wanted a way to control layout on a per struct basis. I wanted to be able to go as low as bit position.

2) boost.serialization is fine when I want to make my in memory classes persistent, but it is of little help for the decoding of binary protocol packages. (I do not claim it is not possible.)

3) boost.serialization solves two orthogonal problems with a single implementation. One problem is mapping native types to portable types (partial overlap with standard lib << operators), the other is writing out a tree of objects (serialization). I was aiming only at the first problem, and so does endian.hpp I guess. Perhaps serialization could be refactored to allow more control over the layout within the serialize functions?

--
Roland Schwarz (aka. speedsnail)
mailto:roland.schwarz@chello.at
http://www.blackspace.at

Neil Mayhew

1:28 p.m.

On 5/23/08 6:19 AM, Roland Schwarz wrote:

...

The proposed lib actually does have an unaligned type, i.e. it actually maps the types to char bytes[..] arrays. So the question remains what the standard has to say about alignment of

struct foo { char bytes_1 [3]; char bytes_2 [2]; char bytes_3 [1]; char bytes_4 [4]; };

Will such a struct be equivalent to char bytes[3+2+1+4] ?

I am not a language lawyer, but my understanding is that struct members will be aligned according to their needs, and a type that contains only chars will need no special alignment. The endian library therefore takes care of alignment down to the byte level.

...

1) I wanted a way to control layout on a per struct basis. I wanted to be able to go as low as bit position.

I think it is very unlikely that a portable library would be able to get control over bit positions. At least, not in a purely data-declarative way. However, you could use a boost endian type as the underlying storage and add methods to access bit ranges within that. For example, the addition of a general bit-range access method to the endian template would help here, one which takes a bit offset and a length, and you would then add access methods to the combined struct that call the bit-range one on individual endian fields.

...

Hmm, and foo isn't necessarily a POD. Writing this out with binary write... what is guaranteed? I fear not much.

Provided the alignment and length of endian fields is guaranteed (which I believe it is) then the only other thing that could get in the way is "hidden" content in the combined struct. However, provided there are no virtual methods, what you see is what you get. So I think binary read/write will always work, and it seems to me that this is the main point of endian. I'm not sure what the standard says, but it would work with any of the compilers I have used in the past 10 years or so. --Neil
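A rough illustration of the binary read/write claim, assuming 8-bit bytes and a compiler that adds no padding to structs made only of byte arrays. The types big16 and big32 here are hypothetical stand-ins, not the interface of the endian library in the vault.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical unaligned big-endian types: storage is a plain byte array.
struct big16 {
    unsigned char bytes[2];
    void set(std::uint16_t v) {
        bytes[0] = static_cast<unsigned char>(v >> 8);
        bytes[1] = static_cast<unsigned char>(v & 0xFF);
    }
    std::uint16_t get() const {
        return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
    }
};

struct big32 {
    unsigned char bytes[4];
    void set(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
        bytes[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes[3] = static_cast<unsigned char>(v & 0xFF);
    }
    std::uint32_t get() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16)
             | (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    }
};

// No virtual functions and no members other than byte arrays.
struct record {
    big16 tag;
    big32 length;
};

// Write the in-memory bytes of the struct unchanged.
inline std::string to_bytes(const record& r) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
    return out.str();
}

// Read the bytes straight back into a struct.
inline record from_bytes(const std::string& s) {
    record r = {};
    std::istringstream in(s);
    in.read(reinterpret_cast<char*>(&r), sizeof r);
    return r;
}
```

The byte order on the wire is decided by set() and get(), so the host's native endianness never enters the picture; only the no-padding assumption does.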

dizzy

1:36 p.m.

On Friday 23 May 2008 15:19:32 Roland Schwarz wrote:

...

...
That kinda looks like reinventing boost.serialization although with a different API (I did something similar in my code).

Indeed there are similarities. But there are differences as well.

1) I wanted a way to control layout on a per struct basis. I wanted to be able to go as low as bit position.

2) boost.serialization is fine when I want to make my in memory classes persistent, but it is of little help for the decoding of binary protocol packages. (I do not claim it is not possible.)

3) boost.serialization solves two orthogonal problems with a single implementation. One problem is mapping native types to portable types (partial overlap with standard lib << operators), the other is writing out a tree of objects (serialization). I was aiming only at the first problem, and so does endian.hpp I guess. Perhaps serialization could be refactored to allow more control over the layout within the serialize functions?

Correct, I agree, that's why I guess I have written my own portable serialization library (although I never thought too much about the reasons, those listed by you do apply in my case). I have something similar to what you said, basically something like this (I'm showing only the integral case because it is the basis of the framework; I also have utility serialization code for strings and such, but those are based on the integral case too):

integral<ValueSignedness, ValueBits, Serializator<ByteSize, TypeSize, ExternSignedness, Endianess> > i;

ValueSignedness can be "signed" or "unsigned" (I abuse these 2 integral types to use them as "tags"). ValueBits is the number of bits needed to form the value (the minimum number). Serializator is a concept which basically means having member functions for serialization and deserialization according to the representation given by the template arguments:

- ByteSize is the size in bits of a single byte for the external representation
- TypeSize is the size in external representation bytes of the integral type as it will exist in the external representation (like you can specify you want an external integral type of 2 bytes of 16 bits each)
- ExternSignedness can be "unsigned" (used as a tag), "signed_1scompl" (signed with one's complement), "signed_2ndcompl" (signed with two's complement)
- Endianess can be "little_endian" or "big_endian", which are also 2 tag types (I overlook the mixed endian and such cases, I do not need to support them).

The code will take care of performing all needed conversions.

One nice feature I like about my framework is that it is completely compile time configured, which means that it will compile code to perform certain conversions ONLY if such conversions are required. For example, if you compile the code on a big-endian platform and you output in a big-endian form then it will just copy the data, unless other conversions are required by the ByteSize/TypeSize parameters. In general I've implemented about 4 optimized ways to perform the conversions, covering 4 special cases of what the platform has and what has been requested; if no such optimization matches, a generic slower algorithm is used.

--
Mihai RUSU
Email: dizzy@roedu.net
"Linux is obsolete" -- AST
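A much reduced sketch of the tag-type technique described above. It is not the author's framework: the names byte_index and serializer are hypothetical, only the endianness parameter is modelled, external bytes are assumed to be 8 bits wide, and only the generic (slower) byte-by-byte path is shown, without the optimized native-copy cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Empty tag types select the byte order at compile time.
struct little_endian {};
struct big_endian {};

// Maps "i-th least significant byte" to its position in an n-byte field.
template <typename Endianness> struct byte_index;

template <> struct byte_index<big_endian> {
    static std::size_t at(std::size_t i, std::size_t n) { return n - 1 - i; }
};

template <> struct byte_index<little_endian> {
    static std::size_t at(std::size_t i, std::size_t) { return i; }
};

// Generic path: works regardless of the host's native byte order.
template <std::size_t Bytes, typename Endianness>
struct serializer {
    static_assert(Bytes >= 1 && Bytes <= 8, "value must fit in 64 bits");

    static void store(std::uint64_t value, unsigned char* out) {
        for (std::size_t i = 0; i < Bytes; ++i) {
            out[byte_index<Endianness>::at(i, Bytes)] =
                static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        }
    }

    static std::uint64_t load(const unsigned char* in) {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < Bytes; ++i) {
            v |= static_cast<std::uint64_t>(
                     in[byte_index<Endianness>::at(i, Bytes)]) << (8 * i);
        }
        return v;
    }
};
```

Since the tag is a template argument, the index calculation is resolved per instantiation; a fuller design could add specializations that reduce to a plain copy when the requested order matches the host.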

Giovanni Piero Deretta

1:45 p.m.

On Fri, May 23, 2008 at 1:46 PM, dizzy <dizzy@roedu.net> wrote:

...

On Friday 23 May 2008 14:05:56 Roland Schwarz wrote:

...
Neil Mayhew wrote:

...
This code is still in the vault, but I'd like to see it in the main part of boost. What needs to be done to make this happen?

The endian library is exactly what I've been looking for, and I would really like to use it in my work, but I can't justify that to my colleagues if it's not an official part of boost.

This is an interesting library indeed! Thank you for having this brought up on the list again.

However I have some concerns about usefulness when it comes to compiler independence and platform independence.

One of the goals of such a library to be useful (for me at least) would be to be able to create compiler/platform independent binary files.

I can see two problems here:

1) struct layout. The standard gives no provisions for struct layout. So e.g. for

struct foo { big8_t a; big32_t b; };

one cannot predict the alignment of the members. (Or am I wrong in this respect?)

Correct, which is why protocol binary structures are never mapped directly in memory (you can with some compiler extensions but you won't gain anything since I/O is the bottleneck in such cases and not memory copy). Instead a serialization approach should solve such issues.

With disk I/O this is certainly true, on the other hand, high speed LAN networks might actually be faster than (uncached) memory accesses (think for example 10G ethernet). Zero copy I/O is certainly a useful property. -- gpd

Neil Mayhew

3:27 p.m.

On 23/05/08 07:45 AM Giovanni Piero Deretta wrote:

...

On Fri, May 23, 2008 at 1:46 PM, dizzy <dizzy@roedu.net> wrote:

...
... protocol binary structures are never mapped directly in memory (you can with some compiler extensions but you won't gain anything since I/O is the bottleneck in such cases and not memory copy). Instead a serialization approach should solve such issues.

With disk I/O this is certainly true, on the other hand, high speed LAN networks might actually be faster than (uncached) memory accesses (think for example 10G ethernet). Zero copy I/O is certainly a useful property.

Also, protocols aren't the only use case. For example, dealing with large binary files may involve memory-mapped disk i/o, in which case mapping structures directly and accurately in memory is essential. Endian works very well for this, and I don't see a need for special serialization. --Neil

Phil Endecott

4:32 p.m.

dizzy wrote:

...

On Friday 23 May 2008 14:05:56 Roland Schwarz wrote:

...
The standard gives no provisions for struct layout. So e.g. for

struct foo { big8_t a; big32_t b; };

one cannot predict the alignment of the members. (Or am I wrong in this respect?)

Correct, which is why protocol binary structures are never mapped directly in memory (you can with some compiler extensions but you won't gain anything since I/O is the bottleneck in such cases and not memory copy). Instead a serialization approach should solve such issues.

Never say never :-)

I find that mmap()ing binary files into memory has the following significant advantage compared to read()ing them in: if the file is large and memory is tight, then read-in data must be swapped out i.e. written to disk. In contrast, read-only mmap()ed pages can simply be discarded from RAM. Even if the data is never actually swapped out, unless the OS over-commits swap space, disk will be reserved for this data. So mmap() has a performance benefit on all memory-constrained systems and also a disk (i.e. flash) space benefit on embedded systems.

So while I'm generally quite pedantic about standards-compliance issues, struct layout is an area where I'm prepared to assume that the compiler does the "obvious" thing.

In this sort of code I tend to use integers with fixed sizes (i.e. int32_t rather than int) so that the data files are more likely to be portable between 32- and 64-bit systems. Being able to declare their endianness using something like this library would be useful too. But in practice, I'm nearly always doing this in an environment where the file is always going to be read by the exact same application binary.

Phil.

Beman Dawes

24 May

11:34 a.m.

Roland Schwarz wrote:

...

... The bottom line: For the endian lib to go into boost I really would want to require it being able to produce platform independent binary data.

Yes, of course. That's the most important requirement.

While the interface in endian.hpp is modern C++, the C implementation of the underlying struct has been in use since about 1984 on a very wide variety of platforms. Data files have been exchanged without problems between those platforms since then. Others have reported successful use of the same technique.

The standards committees understand the importance of preserving layout compatibility for such structs.

--Beman

Beman Dawes

23 May

1:11 p.m.

Neil Mayhew wrote:

...

On 2006-07-22 12:21:01, Beman Dawes wrote:

...
[boost] [integer] endian-0.6 posted w/ names changed endian-0.6.zip has been posted in the integer directory of the file vault at http://boost-consulting.com/vault/

This code is still in the vault, but I'd like to see it in the main part of boost. What needs to be done to make this happen?

Here's what has happened:

In working on endian, I became convinced that the C++ standard's POD specification was a serious impediment and needed major revision. The standards committee agreed, and so C++0x will include a major relaxation on the requirements for PODs. Among other changes, base classes and constructors are permitted (under certain conditions). See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm

That work led Lawrence Crowl to come up with the idea of deleted and defaulted functions for C++0x, which works synergistically with the POD changes. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2346.htm

A third C++0x feature, constexpr, may also be useful for endian. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2235.pdf

So the next step is to revisit the current endian design, and apply C++0x features where useful. Some of these features are starting to become available in compilers, so they can be tested. #ifdefs will be needed, of course, so that the endian stuff will still work with C++03 compilers.

I'm buried with other commitments, so if someone else wants to help with endian it might speed things up quite a bit.

Thanks,
--Beman
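To make the interaction concrete, here is a small sketch written against the rules as they were eventually published in C++11, which is an assumption relative to the drafts cited above. The class big16 and the union cell are hypothetical and are not the endian library's interface. A defaulted default constructor keeps the type trivial even though another constructor is present, so the type can still be a union member.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical big-endian 16-bit type with constructors and private data.
class big16 {
public:
    big16() = default;  // defaulted, so the default constructor stays trivial

    explicit big16(std::uint16_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 8);
        bytes_[1] = static_cast<unsigned char>(v & 0xFF);
    }

    std::uint16_t value() const {
        return static_cast<std::uint16_t>((bytes_[0] << 8) | bytes_[1]);
    }

private:
    unsigned char bytes_[2];
};

// Under the relaxed rules the class is both trivial and standard-layout.
static_assert(std::is_trivial<big16>::value, "big16 should be trivial");
static_assert(std::is_standard_layout<big16>::value,
              "big16 should be standard-layout");

// The type can therefore be a union member without restrictions.
union cell {
    big16 be;
    unsigned char raw[2];
};
```

Under C++03 the same class would not be a POD because of the user-declared constructor and the private member, which is the impediment described in the message.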

Neil Mayhew

1:37 p.m.

On 5/23/08 7:11 AM, Beman Dawes wrote:

...

In working on endian, I became convinced that the C++ standard's POD specification was a serious impediment and needed major revision. The standards committee agreed, and so C++0x will include a major relaxation on the requirements for PODs... So the next step is to revisit the current endian design, and apply C++0x features where useful.

Thanks for this clarification. The standard changes sound great, and it makes sense to take advantage of these in endian, where possible.

...

Some of these features are starting to become available in compilers, so they can be tested.

However, I think it will be a while before I myself am able to use a compiler with these features, whereas I'd like to use endian asap. How about we get the current implementation into boost more or less as-is, and then work on improving it with new compiler features as these become available?

...

I'm buried with other commitments, so if someone else wants to help with endian it might speed things up quite a bit.

What kind of help did you have in mind? Is there anything that needs to be done that doesn't involve using new compiler features? The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method. --Neil

Beman Dawes

24 May

11:45 a.m.

Neil Mayhew wrote:

...

On 5/23/08 7:11 AM, Beman Dawes wrote:

...
In working on endian, I became convinced that the C++ standard's POD specification was a serious impediment and needed major revision. The standards committee agreed, and so C++0x will include a major relaxation on the requirements for PODs... So the next step is to revisit the current endian design, and apply C++0x features where useful.

Thanks for this clarification. The standard changes sound great, and it makes sense to take advantage of these in endian, where possible.

...
Some of these features are starting to become available in compilers, so they can be tested.

However, I think it will be a while before I myself am able to use a compiler with these features, whereas I'd like to use endian asap. How about we get the current implementation into boost more or less as-is, and then work on improving it with new compiler features as these become available?

...
I'm buried with other commitments, so if someone else wants to help with endian it might speed things up quite a bit.

What kind of help did you have in mind?

I just took a look at the current state of the library, and it seems to be in pretty good shape. So the main help would be careful review and commenting on a revised version. I'll try to get that out in a week or so.

...

Is there anything that needs to be done that doesn't involve using new compiler features?

Yes. The two enums need to be moved into subnamespaces to simulate C++0x scoped enums.

Questions:

* For efficiency, should an operator= taking a value_type argument be added? (I can't remember why there isn't one.)

* Should there be a conversion to bool?

* Should there be a way (#ifdef? separate class?) to provide an interoperable version that meets C++03 POD requirements, so endian objects can be put in unions? Is that what you mean by (1) below?
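The subnamespace idiom mentioned in the first sentence can be sketched as follows. The enumerator names and the helper order_name are assumptions made for illustration; the message does not list the actual enums.

```cpp
#include <cassert>
#include <cstring>

// C++03 idiom: wrapping each enum in its own namespace keeps the
// enumerators out of the enclosing scope, as a scoped enum would.
namespace endianness { enum type { big, little, native }; }
namespace alignment  { enum type { unaligned, aligned }; }

// Hypothetical helper showing how the qualified names read at a use site.
inline const char* order_name(endianness::type e) {
    switch (e) {
        case endianness::big:    return "big";
        case endianness::little: return "little";
        default:                 return "native";
    }
}
```

Unlike a real scoped enum, the enumerators still convert implicitly to int, so this only simulates the scoping part of the feature.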

...

The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method.

Do you have a proposed design for bit-range access? What is the use case or motivation for bit-range access? Thanks, --Beman

Scott McMurray

3:23 p.m.

On Sat, May 24, 2008 at 7:45 AM, Beman Dawes <bdawes@acm.org> wrote:

...

Neil Mayhew wrote:

...
The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method.

Do you have a proposed design for bit-range access?

What is the use case or motivation for bit-range access?

Perhaps something based off of Erlang's Bit Syntax? http://www.erlang.org/doc/reference_manual/expressions.html#6.16

Armstrong's Erlang book gives MPEG headers as an example of bit range access, among other things. So something like

typedef bit_sequence<
    uint_t<11>, uint_t<2>, uint_t<2>, uint_t<1>,
    uint_t<4>, uint_t<2>, uint_t<1>, uint_t<9>
> MPEG_header;

Or if you're writing a linker to generate EXEs for Windows (another example from Armstrong's book),

typedef bit_sequence<
    little_endian< uint_t<32> >, // Characteristics
    little_endian< uint_t<32> >, // TimeDateStamp
    little_endian< uint_t<16> >, // MajorVersion
    little_endian< uint_t<16> >, // MinorVersion
    little_endian< uint_t<16> >, // NumberOfNamedEntries
    little_endian< uint_t<16> >  // NumberOfIdEntries
> image_resource_directory;

Be fun to be able to get everything as a tuple, too, to be able to assign to a tie, pattern-matching style.

That interface works better for the bit ranges, though, I suppose, since little_endian< uint_t<32> > as a boost.endian type on its own would be far more convenient name-wise than having to get it through a tuple-style interface.

Just thinking out loud,
~ Scott

vicente.botet

4:05 p.m.

----- Original Message ----- From: "Beman Dawes" <bdawes@acm.org> To: <boost@lists.boost.org> Sent: Saturday, May 24, 2008 1:45 PM Subject: Re: [boost] [integer] Promotion of endian library code from vault

...

Neil Mayhew wrote:

...
The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method.

Do you have a proposed design for bit-range access?

What about the bitfield library? The one on the Vault could be a good base. Vicente

Neil Mayhew

26 May

8:31 p.m.

On 24/05/08 05:45 AM Beman Dawes wrote:

...

Neil Mayhew wrote:

...
The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method.

* Should there be a way (#ifdef? separate class?) to provide an interoperable version that meets C++03 POD requirements, so endian objects can be put in unions? Is that what you mean by (1)?

Yes, this is what I meant. I was thinking that the most elegant solution would be to have a base class that contains everything except the constructors, and a derived class that adds the constructors (I think this was in a post in 2006). The typedefs to big32_t etc. would then be #if'd to correspond to the POD or non-POD classes as desired. I think this is better than putting the #if's around the constructors themselves.
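A minimal sketch of the base/derived split described above. All names (big32_pod, big32, the macro ENDIAN_POD_TYPEDEFS, the union length_or_raw) are hypothetical, and the sketch assumes that a derived class adding no data members has the same size as its base, which holds on mainstream compilers.

```cpp
#include <cassert>
#include <cstdint>

// Base: public data and no constructors, so it meets the C++03 POD rules
// and may be placed in a union.
struct big32_pod {
    unsigned char bytes[4];

    void assign(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
        bytes[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes[2] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes[3] = static_cast<unsigned char>(v & 0xFF);
    }

    std::uint32_t value() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16)
             | (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    }
};

// Derived: adds the constructors and nothing else.
struct big32 : big32_pod {
    big32() { assign(0); }
    big32(std::uint32_t v) { assign(v); }
};

// The public typedef selects one of the two; the macro name is made up.
#if defined(ENDIAN_POD_TYPEDEFS)
typedef big32_pod big32_t;
#else
typedef big32 big32_t;
#endif

// Only the constructor-free base can be a C++03 union member.
union length_or_raw {
    big32_pod length;
    unsigned char raw[4];
};
```

A value of the derived type can be assigned to the base member (it is sliced, which loses nothing here because the derived class adds no state).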

...

Do you have a proposed design for bit-range access?

I was thinking of adding methods to read and write the given bits within the value:

value_type endian::bits(std::size_t offset, std::size_t length) const;
value_type endian::bits(std::size_t offset, std::size_t length, value_type value);

Making these template methods with offset and length being template arguments might allow for more compile-time unrolling, but since these methods just do shifting and masking, the compiler will probably do all the needed optimization anyway.

A better design might be to specialize Bitfield to accept endians, although I haven't studied Bitfield enough yet to see how well this would work.

...

What is the use case or motivation for bit-range access?

A packed binary structure like the following. The numbers are lengths in bits, and the total length is 24 bits (ie 3 bytes):

+------------+-----------+------------+
| field_a:6  | field_b:4 | field_c:14 |
+------------+-----------+------------+

This may look contrived, but I have seen plenty of things similar to this in protocols and data files.

As part of a bigger struct, this would be implemented using endian::bits as follows:

struct packet {
    ubig24_t abc;

    uint_least32_t field_a() const { return abc.bits(0, 6); }
    uint_least32_t field_b() const { return abc.bits(6, 4); }
    uint_least32_t field_c() const { return abc.bits(10, 14); }

    uint_least32_t field_a(uint_least32_t v) { return abc.bits(0, 6, v); }
    uint_least32_t field_b(uint_least32_t v) { return abc.bits(6, 4, v); }
    uint_least32_t field_c(uint_least32_t v) { return abc.bits(10, 14, v); }
};

If the design used Bitfield instead, there would be one access method for both read and write, which would return by value a Bitfield on abc. However, the method would need to have both const and non-const versions, returning a const and non-const Bitfield respectively. Unfortunately, the Bitfield can't be made a part of the struct because it would take up space, breaking the mapping of the memory. So there would be an inconsistency with the syntax for accessing other (non-bit) fields within the structure, ie with and without parens, but I can live with that.

--Neil
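One way the proposed bits() methods could behave, as a sketch only. The type ubig24 is a hypothetical stand-in for ubig24_t, and two details are assumptions the message does not settle: offsets are counted from the most significant bit of the 24-bit value, and the writing overload returns the value actually stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ubig24_t: three bytes, most significant first.
struct ubig24 {
    unsigned char bytes[3];

    std::uint32_t value() const {
        return (std::uint32_t(bytes[0]) << 16) | (std::uint32_t(bytes[1]) << 8)
             | std::uint32_t(bytes[2]);
    }

    void value(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes[2] = static_cast<unsigned char>(v & 0xFF);
    }

    // Read `length` bits starting `offset` bits below the top bit.
    std::uint32_t bits(std::size_t offset, std::size_t length) const {
        assert(length >= 1 && offset + length <= 24);
        const std::size_t shift = 24 - offset - length;
        const std::uint32_t mask = (std::uint32_t(1) << length) - 1;
        return (value() >> shift) & mask;
    }

    // Replace the same bit range with `v`, leaving other bits untouched.
    std::uint32_t bits(std::size_t offset, std::size_t length, std::uint32_t v) {
        assert(length >= 1 && offset + length <= 24);
        const std::size_t shift = 24 - offset - length;
        const std::uint32_t mask = (std::uint32_t(1) << length) - 1;
        value((value() & ~(mask << shift)) | ((v & mask) << shift));
        return v & mask;
    }
};

struct packet {
    ubig24 abc;

    std::uint32_t field_a() const { return abc.bits(0, 6); }
    std::uint32_t field_b() const { return abc.bits(6, 4); }
    std::uint32_t field_c() const { return abc.bits(10, 14); }

    std::uint32_t field_a(std::uint32_t v) { return abc.bits(0, 6, v); }
    std::uint32_t field_b(std::uint32_t v) { return abc.bits(6, 4, v); }
    std::uint32_t field_c(std::uint32_t v) { return abc.bits(10, 14, v); }
};
```

With field_a = 0x2A, field_b = 0x9 and field_c = 0x1234, the three stored bytes are AA 52 34, matching the left-to-right order of the diagram.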

Beman Dawes

28 May

10:25 p.m.

Neil Mayhew wrote:

...

On 24/05/08 05:45 AM Beman Dawes wrote:

...
Neil Mayhew wrote:

...
The areas that I am interested in seeing more work done on are (1) making constructors "conditional" and (2) adding a bit-range access method.

* Should there be a way (#ifdef? separate class?) to provide an interoperable version that meets C++03 POD requirements, so endian objects can be put in unions? Is that what you mean by (1)?

Yes, this is what I meant. I was thinking that the most elegant solution would be to have a base class that contains everything except the constructors, and a derived class that adds the constructors (I think this was in a post in 2006).

There are some issues with the base class approach. For it to be a POD, it can't have constructors, base classes, or private members. No base classes means no arithmetic operations unless they are 100% supplied by the class itself. No private members means the non-POD version would have to use private inheritance and then forward all operations to the private operations. Messy, but I can't think of any other approach. It might be better to just have two separate types, endian and old_endian. (old_endian isn't very clear as a name, but calling it pod_endian becomes invalid the moment C++0x ships.)

...

The typedefs to big32_t etc. would then be #if'd to correspond to the POD or non-POD classes as desired. I think this is better than putting the #if's around the constructors themselves.

Hummm... Seems too obscure. Better to have two sets of typedefs, with the C++03 set prefixed by old_ (or whatever better name anyone can come up with.)

...

...
Do you have a proposed design for bit-range access?

I was thinking of adding methods to read and write the given bits within the value:

value_type endian::bits(std::size_t offset, std::size_t length) const;
value_type endian::bits(std::size_t offset, std::size_t length, value_type value);

Making these template methods with offset and length being template arguments might allow for more compile-time unrolling, but since these methods just do shifting and masking, the compiler will probably do all the needed optimization anyway.

A better design might be to specialize Bitfield to accept endians, although I haven't studied Bitfield enough yet to see how well this would work.

...
What is the use case or motivation for bit-range access?

A packed binary structure like the following. The numbers are lengths in bits, and the total length is 24 bits (ie 3 bytes):

+------------+-----------+------------+
| field_a:6  | field_b:4 | field_c:14 |
+------------+-----------+------------+

This may look contrived, but I have seen plenty of things similar to this in protocols and data files.

As part of a bigger struct, this would be implemented using endian::bits as follows:

    struct packet {
        ubig24_t abc;

        uint_least32_t field_a() const { return abc.bits(0, 6); }
        uint_least32_t field_b() const { return abc.bits(6, 4); }
        uint_least32_t field_c() const { return abc.bits(10, 14); }

        uint_least32_t field_a(uint_least32_t v) { return abc.bits(0, 6, v); }
        uint_least32_t field_b(uint_least32_t v) { return abc.bits(6, 4, v); }
        uint_least32_t field_c(uint_least32_t v) { return abc.bits(10, 14, v); }
    };

If the design used Bitfield instead, there would be one access method for both read and write, which would return by value a Bitfield on abc. However, the method would need to have both const and non-const versions, returning a const and non-const Bitfield respectively. Unfortunately, the Bitfield can't be made a part of the struct because it would take up space, breaking the mapping of the memory. So there would be an inconsistency with the syntax for accessing other (non-bit) fields within the structure, ie with and without parens, but I can live with that.

Interesting. I need to think about that some more. Thanks, --Beman

Neil Mayhew

11:50 p.m.

On 28/05/08 04:25 PM Beman Dawes wrote:

...

Neil Mayhew wrote:

...
... have a base class that contains everything except the constructors, and a derived class that adds the constructors

There are some issues with the base class approach. For it to be a POD, it can't have constructors, base classes, or private members.

Strictly speaking, yes, although we are only concerned about memory layout here, so the class doesn't have to be a true POD. The other concern is for the class to be able to live inside a union, and I think the constructors are the only thing stopping that. (I #if'd the constructors and I was then able to use endian in a union.)

...

It might be better to just have two separate types, endian and old_endian. (old_endian isn't very clear as a name, but calling it pod_endian becomes invalid the moment C++0x ships.)

...
The typedefs to big32_t etc. would then be #if'd to correspond to the POD or non-POD classes as desired. I think this is better than putting the #if's around the constructors themselves.

Hummm... Seems too obscure. Better to have two sets of typedefs, with the C++03 set prefixed by old_ (or whatever better name anyone can come up with.)

I don't like "old_", at least not if it has to appear in my code! :-)

I would like my client code to remain essentially the same when I upgrade to a C++0x compiler. This means having the same type names, which is why I suggested conditional typedefs. I thought this was cleaner than putting the conditionals inside the endian class itself. However, it doesn't make a lot of difference.

Perhaps the simplest and best solution is therefore:

    class endian< ... > : cover_operators< ... >
    {
    public:
    #if defined(CXX_0X) || !defined(ENDIANS_IN_UNIONS)
        endian() {}
        endian(T val) { ... }
    #endif

--Neil

Neil Mayhew

29 May

7:54 p.m.

On 28/05/08 05:50 PM Neil Mayhew wrote:

...

Perhaps the simplest and best solution is therefore:

    class endian< ... > : cover_operators< ... >
    {
    public:
    #if defined(CXX_0X) || !defined(ENDIANS_IN_UNIONS)
        endian() {}
        endian(T val) { ... }
    #endif

I just discovered that an operator=(T) is needed as well:

    endian& operator=(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); return *this; }

Without this, and without the endian(T val) constructor, a lot of things just don't work - for example, the binary arithmetic operators.

This makes me think that there should have been an operator= all along. For a start, this is just good practice: anywhere there's a constructor initializing from a particular type, there should usually also be an assignment operator taking the same type.

Second, I think the generated code for binary operators must have been suboptimal, since it seems that the computed result (a native integer) of adding two endians was being used to construct a temporary endian which was then copy-assigned into the actual result. At least, that's my interpretation of the compilation errors I was getting before I put operator= in. To test this, take out the constructors and do:

    big32_t a, b;
    int32_t i;
    i = a + b;

operator+(endian, endian) is implemented using cover_operators::operator+=, which is defined as x = +x + y, hence uses assignment of the result of adding two native integers converted from the respective endians. If you're assigning the result to a native type anyway, then returning an endian is inefficient, even with operator= defined. I'm not sure how to fix this without abandoning the use of boost::operators. In fact, I am beginning to wonder whether the traditional approach to binary operators, which returns the same type as its two arguments and is implemented by calling +=, is actually appropriate for endian.

--Neil
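One alternative to the traditional "operator+ built on operator+=" pattern can be sketched as follows. This is only an illustration under stated assumptions: `demo_big32` is a hypothetical class, not the library's endian template, and it deliberately defines no binary operator+. Both operands then convert to the native type and the built-in addition yields a native integer, so no temporary endian object is created when the result is assigned to a native variable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit big-endian holder; illustrative only.
class demo_big32 {
public:
    // Assignment from the native type stores the bytes most significant first.
    demo_big32& operator=(std::uint32_t v) {
        for (int i = 3; i >= 0; --i) {
            bytes_[i] = static_cast<unsigned char>(v & 0xFF);
            v >>= 8;
        }
        return *this;
    }

    // Conversion to the native type reassembles the value.
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | bytes_[i];
        return v;
    }

    // Compound assignment still returns the endian type, as expected for +=.
    demo_big32& operator+=(std::uint32_t rhs) {
        return *this = static_cast<std::uint32_t>(*this) + rhs;
    }

private:
    unsigned char bytes_[4];
};

// No operator+(demo_big32, demo_big32) exists: both operands convert to
// std::uint32_t and the built-in + produces a native result.
inline std::uint32_t sum_native(const demo_big32& a, const demo_big32& b) {
    return a + b;
}
```

The trade-off is that `a + b` no longer has the endian type, so code that relied on the result being an endian would need an explicit assignment into one.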

Beman Dawes

30 May

12:49 a.m.

Neil Mayhew wrote:

...

On 28/05/08 05:50 PM Neil Mayhew wrote:

...
Perhaps the simplest and best solution is therefore:

    class endian< ... > : cover_operators< ... >
    {
    public:
    #if defined(CXX_0X) || !defined(ENDIANS_IN_UNIONS)
        endian() {}
        endian(T val) { ... }
    #endif

I just discovered that an operator=(T) is needed as well:

    endian& operator=(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); return *this; }

Without this, and without the endian(T val) constructor, a lot of things just don't work - for example, the binary arithmetic operators.

This makes me think that there should have been an operator= all along. For a start, this is just good practice: anywhere there's a constructor initializing from a particular type, there should usually also be an assignment operator taking the same type.

Did you get a chance to look at the new version in the vault? See the message I posted two or three days ago: http://tinyurl.com/6xejo5 It provides operator=(T), for the reasons you identified.

...

Second, I think the generated code for binary operators must have been suboptimal, since it seems that the computed result (a native integer) of adding two endians was being used to construct a temporary endian which was then copy-assigned into the actual result. At least, that's my interpretation of the compilation errors I was getting before I put operator= in. To test this, take out the constructors and do:

    big32_t a, b;
    int32_t i;
    i = a + b;

operator+(endian, endian) is implemented using cover_operators::operator+=, which is defined as x = +x + y, hence uses assignment of the result of adding two native integers converted from the respective endians. If you're assigning the result to a native type anyway, then returning an endian is inefficient, even with operator= defined. I'm not sure how to fix this, without abandoning the use of boost::operators. In fact, I am beginning to wonder whether the traditional approach to binary operators, which returns the same type as its two arguments, and implements by calling +=, is not actually appropriate for endian.

I've got similar concerns, plus the impact that using boost::operators has on PODness under the current standard.

--Beman

Neil Mayhew

3:24 a.m.

On 2008-05-29 18:49, Beman Dawes wrote:

...

Neil Mayhew wrote:

...
I just discovered that an operator=(T) is needed as well:

Did you get a chance to look at the new version in the vault?... It provides operator=(T), for the reasons you identified.

My apologies. I did look at it, but didn't notice the addition of operator=, and also didn't notice that I still had the older (0.6) version in my working directory. It might be a good idea, when you post new versions of things to the vault, to keep one or two older versions around, to make it easier for people to see what's changed. I did still have 0.6 around, but it was on another machine and I didn't bother to copy it over to diff against it. --Neil

Beman Dawes

12:04 p.m.

Neil Mayhew wrote:

...

On 2008-05-29 18:49, Beman Dawes wrote:

...
Neil Mayhew wrote:

...
I just discovered that an operator=(T) is needed as well:

Did you get a chance to look at the new version in the vault?... It provides operator=(T), for the reasons you identified.

My apologies. I did look at it, but didn't notice the addition of operator=, and also didn't notice that I still had the older (0.6) version in my working directory.

It might be a good idea, when you post new versions of things to the vault, to keep one or two older versions around, to make it easier for people to see what's changed. I did still have 0.6 around, but it was on another machine and I didn't bother to copy it over to diff against it.

I really should put it into the subversion sandbox. I'll try to do that later today. --Beman

vicente.botet

27 Sep

7:51 a.m.

----- Original Message ----- From: "Beman Dawes" <bdawes@acm.org> To: <boost@lists.boost.org> Sent: Friday, May 30, 2008 2:04 PM Subject: Re: [boost] [integer] Promotion of endian library code from vault

...

I really should put it into the subversion sandbox. I'll try to do that later today.

Hi Beman, there is a copy/paste bug in the sandbox.

    # ifdef BOOST_BIG_ENDIAN // BUG !!!!!!!!!!

must be

    # ifdef BOOST_LITTLE_ENDIAN

isn't it?

Vicente

    template <typename T, std::size_t n_bits>
    class endian< endianness::little, T, n_bits, alignment::aligned >
      : cover_operators< endian< endianness::little, T, n_bits, alignment::aligned >, T >
    {
        BOOST_STATIC_ASSERT( (n_bits/8)*8 == n_bits );
        BOOST_STATIC_ASSERT( sizeof(T) == n_bits/8 );
      public:
        typedef T value_type;
    #   ifndef BOOST_ENDIAN_NO_CTORS
        endian() BOOST_ENDIAN_DEFAULT_CONSTRUCT
    #     ifdef BOOST_BIG_ENDIAN // BUG !!!!!!!!!!
        endian(T val) : m_value(val) { }
    #     else
        explicit endian(T val) { detail::store_little_endian<T, sizeof(T)>(&m_value, val); }
    #     endif
    #   endif
    #   ifdef BOOST_LITTLE_ENDIAN
        endian & operator=(T val) { m_value = val; return *this; }
        operator T() const { return m_value; }
    #   else
        endian & operator=(T val) { detail::store_little_endian<T, sizeof(T)>(&m_value, val); return *this; }
        operator T() const { return detail::load_little_endian<T, sizeof(T)>(&m_value); }
    #   endif
      private:
        T m_value;
    };

Beman Dawes

1 Oct

3:33 p.m.

vicente.botet wrote:

...

----- Original Message ----- From: "Beman Dawes" <bdawes@acm.org> To: <boost@lists.boost.org> Sent: Friday, May 30, 2008 2:04 PM Subject: Re: [boost] [integer] Promotion of endian library code from vault

...
I really should put it into the subversion sandbox. I'll try to do that later today.

Hi Beman, there is a copy/paste bug on the sandbox.

# ifdef BOOST_BIG_ENDIAN // BUG !!!!!!!!!!

must be # ifdef BOOST_LITTLE_ENDIAN

isn't it?

Nice catch! Fixed. This would have been detected by the regression tests if they were run on a big endian machine. --Beman
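For illustration, here is a sketch of the kind of byte-level check that would expose such a bug when run on a big-endian host. The helpers below are hypothetical, written with shifts so that they do not depend on the host's byte order; they are modelled on the names `store_little_endian` and `load_little_endian` seen in the quoted code but are not the library's implementation. Since the faulty branch is only compiled on big-endian hosts, a check like this cannot catch it on a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Store the low n_bytes of an unsigned value, least significant byte first.
template <typename T, std::size_t n_bytes>
void demo_store_little_endian(void* dest, T value) {
    assert(dest != nullptr);
    unsigned char* p = static_cast<unsigned char*>(dest);
    for (std::size_t i = 0; i < n_bytes; ++i) {
        p[i] = static_cast<unsigned char>(value & 0xFF);
        value = static_cast<T>(value >> 8);
    }
}

// Load n_bytes stored least significant byte first into an unsigned value.
template <typename T, std::size_t n_bytes>
T demo_load_little_endian(const void* src) {
    assert(src != nullptr);
    const unsigned char* p = static_cast<const unsigned char*>(src);
    T result = 0;
    for (std::size_t i = n_bytes; i > 0; --i)
        result = static_cast<T>((result << 8) | p[i - 1]);
    return result;
}

// Inspect the stored bytes directly instead of only round-tripping the value:
// a round trip alone can pass even when the in-memory layout is wrong.
inline bool demo_little_endian_layout_ok() {
    unsigned char raw[4] = {0, 0, 0, 0};
    demo_store_little_endian<std::uint32_t, 4>(raw, 0x01020304u);
    return raw[0] == 0x04 && raw[1] == 0x03 && raw[2] == 0x02 && raw[3] == 0x01 &&
           demo_load_little_endian<std::uint32_t, 4>(raw) == 0x01020304u;
}
```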

Beman Dawes

30 May

1:51 a.m.

Neil Mayhew wrote:

...

On 28/05/08 04:25 PM Beman Dawes wrote:

...
Neil Mayhew wrote:

...
... have a base class that contains everything except the constructors, and a derived class that adds the constructors

There are some issues with the base class approach. For it to be a POD, it can't have constructors, base classes, or private members.

Strictly speaking, yes, although we are only concerned about memory layout here, so the class doesn't have to be a true POD. The other concern is for the class to be able to live inside a union, and I think the constructors are the only thing stopping that. (I #if'd the constructors and I was then able to use endian in a union.)

I was wrong above - you are correct that just the constructors have to be removed. Private members and base classes are OK.

...

...
It might be better to just have two separate types, endian and old_endian. (old_endian isn't very clear as a name, but calling it pod_endian becomes invalid the moment C++0x ships.)

...
The typedefs to big32_t etc. would then be #if'd to correspond to the POD or non-POD classes as desired. I think this is better than putting the #if's around the constructors themselves.

Hummm... Seems too obscure. Better to have two sets of typedefs, with the C++03 set prefixed by old_ (or whatever better name anyone can come up with.)

I don't like "old_", at least not if it has to appear in my code! :-)

I would like my client code to remain essentially the same when I upgrade to a C++0x compiler. This means having the same type names, which is why I suggested conditional typedefs. I thought this was cleaner than putting the conditionals inside the endian class itself. However, it doesn't make a lot of difference.

Perhaps the simplest and best solution is therefore:

    class endian< ... > : cover_operators< ... >
    {
    public:
    #if defined(CXX_0X) || !defined(ENDIANS_IN_UNIONS)
        endian() {}
        endian(T val) { ... }
    #endif

That would work. The macro names need tweaks. Maybe

    !defined(BOOST_NO_RELAXED_PODS) || !defined(BOOST_ENDIANS_IN_UNIONS)

or

    !(defined(BOOST_NO_RELAXED_PODS) && defined(BOOST_ENDIANS_IN_UNIONS))

It's been a long day so that'll need checking in the morning :-)

Thanks for the ideas,

--Beman
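The effect of such a switch can be sketched with a small stand-alone type. Everything here is an assumption made for illustration: `DEMO_ENDIAN_NO_CTORS` is a hypothetical macro standing in for whichever names are finally chosen, and `demo_big16` is not the library's class. With the constructors compiled out, the type has no user-declared constructors and can be a union member under C++03 rules; operator=(T) is not a copy assignment operator, so it does not interfere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch; stands in for the macro names discussed above.
#define DEMO_ENDIAN_NO_CTORS 1

class demo_big16 {
public:
#ifndef DEMO_ENDIAN_NO_CTORS
    // Compiled only when union membership is not required.
    demo_big16() {}
    demo_big16(std::uint16_t v) { *this = v; }
#endif
    // Assignment from the native type is always available.
    demo_big16& operator=(std::uint16_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 8);
        bytes_[1] = static_cast<unsigned char>(v & 0xFF);
        return *this;
    }
    operator std::uint16_t() const {
        return static_cast<std::uint16_t>((bytes_[0] << 8) | bytes_[1]);
    }
private:
    unsigned char bytes_[2];
};

// Legal because demo_big16 has no user-declared constructors.
union demo_payload {
    demo_big16 as_big;
    std::uint16_t as_native;
};

inline std::uint16_t demo_union_roundtrip(std::uint16_t v) {
    demo_payload p = {};   // value-initializes the first member, as_big
    p.as_big = v;          // uses operator=(std::uint16_t)
    assert(sizeof(p.as_big) == 2);
    return p.as_big;
}
```

Note that the two conditions Beman lists are equivalent by De Morgan's law, so either spelling would guard the constructors in the same way.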

Neil Mayhew

3:53 a.m.

On 2008-05-29 19:51, Beman Dawes wrote:

...

...
...
There are some issues with the base class approach. For it to be a POD, it can't have constructors, base classes, or private members.

I was wrong above ... just the constructors have to be removed. Private members and base classes are OK.

Does that mean the FAQ entry "Are endian types POD's?" in the documentation needs to be changed? Or do you just mean what's said in the next FAQ entry, that "this problem has never been observed in a real compiler"?

If we do implement the suggestion of a switch to turn off the definition of constructors, there will of course need to be some adjustments to the documentation. I'm still thinking about what those changes might usefully be.

BTW, if you are making changes to the documentation, I have a very minor suggestion. In "What are the implications of C++03 endian types not being POD's?", it would make for easier reading if you made it clearer that there are two points being made; e.g. "Also, compilers aren't required to align or lay out storage in portable ways for non-POD types, although in practice this problem has never been observed in a real compiler."

--Neil

Neil Mayhew

28 May

11:59 p.m.

On 26/05/08 02:31 PM Neil Mayhew wrote:

...

On 24/05/08 05:45 AM Beman Dawes wrote:

...
Do you have a proposed design for bit-range access?

I was thinking of adding methods to read and write the given bits within the value:

    value_type endian::bits(std::size_t offset, std::size_t length) const;
    value_type endian::bits(std::size_t offset, std::size_t length, value_type value);

    struct packet {
        ubig24_t abc;

        uint_least32_t field_a() const { return abc.bits(0, 6); }
        uint_least32_t field_b() const { return abc.bits(6, 4); }
        uint_least32_t field_c() const { return abc.bits(10, 14); }
        ...
    };

On second thoughts, I've realised this would be easier to read, and safer, if the bits() methods used start and end rather than offset and length. Then the values in the methods would match up, and it would be more obvious if there were gaps or overlaps:

    return abc.bits(0, 6);
    ...
    return abc.bits(6, 10);
    ...
    return abc.bits(10, 24);
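A sketch of the start/end form, assuming half-open ranges [start, end) counted from the most significant bit of a 24-bit value. `demo_bits` and `demo_packet` are hypothetical names, and the packet holds a native copy of the 24-bit field purely to keep the example self-contained. Making the bounds template arguments lets the range be checked at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Extract bits [Start, End) of a 24-bit value, counted from the most significant bit.
template <std::size_t Start, std::size_t End>
std::uint32_t demo_bits(std::uint32_t value24) {
    static_assert(Start < End && End <= 24, "range must lie within 24 bits");
    const std::uint32_t mask = (std::uint32_t(1) << (End - Start)) - 1;
    return (value24 >> (24 - End)) & mask;
}

struct demo_packet {
    std::uint32_t abc;  // native copy of the 24-bit field, for illustration only

    // Each end matches the next start, so a gap or overlap is easy to spot.
    std::uint32_t field_a() const { return demo_bits<0, 6>(abc); }
    std::uint32_t field_b() const { return demo_bits<6, 10>(abc); }
    std::uint32_t field_c() const { return demo_bits<10, 24>(abc); }
};

// Small usage check: 0xAA5234 packs a = 0x2A, b = 0x9, c = 0x1234.
inline void demo_packet_check() {
    const demo_packet p = {0xAA5234u};
    assert(p.field_a() == 0x2Au);
    assert(p.field_b() == 0x9u);
    assert(p.field_c() == 0x1234u);
}
```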

...

A better design might be to specialize Bitfield to accept endians...

If the design used Bitfield instead, there would be one access method for both read and write, which would return by value a Bitfield on abc.

Something like this:

    Bitfield<0, 6, big32_t> packet::field_a() { return Bitfield<0, 6, big32_t>(abc); }

You could then write:

    packet p;
    p.field_a() = 42;
    int x = p.field_a();

--Neil

vicente.botet

25 Sep

11:23 p.m.

----- Original Message ----- From: "Neil Mayhew" <neil_mayhew@users.sourceforge.net> To: <boost@lists.boost.org> Sent: Thursday, May 29, 2008 1:59 AM Subject: Re: [boost] [integer] Promotion of endian library code from vault

...

On 26/05/08 02:31 PM Neil Mayhew wrote:

...
On 24/05/08 05:45 AM Beman Dawes wrote:

...
Do you have a proposed design for bit-range access?

I was thinking of adding methods to read and write the given bits within the value:

    value_type endian::bits(std::size_t offset, std::size_t length) const;
    value_type endian::bits(std::size_t offset, std::size_t length, value_type value);

On second thoughts, I've realised this would be easier to read, and safer, if the bits() methods used start and end rather than offset and length. Then the values in the methods would match up, and it would be more obvious if there were gaps or overlaps:

    return abc.bits(0, 6);
    ...
    return abc.bits(6, 10);
    ...
    return abc.bits(10, 24);

Yes, I think this is easier to read and write.

...

...
A better design might be to specialize Bitfield to accept endians...

I don't think that bitfield needs to be modified to accept endians. The support is already a parameter.

...

...
If the design used Bitfield instead, there would be one access method for both read and write, which would return by value a Bitfield on abc.

Something like this:

Bitfield<0, 6, big32_t> packet::field_a() { return Bitfield<0, 6, big32_t>(abc); };

Hi,

There is a problem with a single accessor: it does not work for a const packet.

If we want to mimic whatever we can do with C/C++ bitfields, do not forget that the library should take care of bitfields whose integer size and signedness differ from those of the underlying storage, so the resulting type must be added as a parameter (or should the default conversions work?):

    typedef bitfield<int_least8_t, 0, 6, big32_t> field_a_type;

In addition, the bitfield class can provide two types, reference and value, which the user can use as follows:

    field_a_type::reference field_a() { return field_a_type::reference(abc); }
    field_a_type::value::value_type field_a() const { return field_a_type::value(abc); }

We can define a macro that makes bitfield easier to use by generating the typedef and the accessor definitions:

    BOOST_BITFIELD(int_least8_t, field_a, 0, 6, big32_t);

Compare this to the C/C++ bitfield

    int_least8_t field_a:6

Vicente
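The reference/value split can be sketched roughly as below. This is not the Bitfield library's interface: `demo_bitfield_ref` and `demo_packet2` are hypothetical, the storage is a native 32-bit word rather than an endian type, bits are counted from the least significant bit, and signed field types (with their sign extension) are left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical proxy for bits [First, First + Width) of a 32-bit word,
// counted from the least significant bit. R is an unsigned result type.
template <typename R, std::size_t First, std::size_t Width>
class demo_bitfield_ref {
    static_assert(Width > 0 && Width < 32 && First + Width <= 32,
                  "field must fit in the 32-bit storage");
public:
    explicit demo_bitfield_ref(std::uint32_t& storage) : storage_(storage) {}

    // Read-only extraction, usable without a proxy object (const access).
    static R get(std::uint32_t storage) {
        return static_cast<R>((storage >> First) & mask());
    }

    // Writing through the proxy updates only the bits of this field.
    demo_bitfield_ref& operator=(R value) {
        storage_ = (storage_ & ~(mask() << First)) |
                   ((static_cast<std::uint32_t>(value) & mask()) << First);
        return *this;
    }

    operator R() const { return get(storage_); }

private:
    static std::uint32_t mask() { return (std::uint32_t(1) << Width) - 1; }
    std::uint32_t& storage_;
};

struct demo_packet2 {
    std::uint32_t abc;

    typedef demo_bitfield_ref<std::uint8_t, 0, 6> field_a_ref;

    // Non-const access returns an assignable proxy.
    field_a_ref field_a() { return field_a_ref(abc); }
    // Const access returns a plain value, so it works on a const packet.
    std::uint8_t field_a() const { return field_a_ref::get(abc); }
};
```

With this shape, `p.field_a() = 42;` and `int x = p.field_a();` both work on a non-const packet, and `cp.field_a()` works on a const one.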

Roland Schwarz

27 May

7:19 a.m.

Beman Dawes wrote:

...

So the main help would be careful review and commenting on a revised version. I'll try to get that out in a week or so.

I use version 0.8. Questions:

1) Would it be possible to add a conversion operator to char*? At least for unaligned types this looks useful to me, since it would avoid ugly casts in cases where conversion is safe.

2) I'd like to be able to initialize a struct containing endian<> types by making it a reference to some buffer. E.g.

    char buf[1234];
    my_struct& ms(buf);

What would be needed to make this possible? (I'd expect this only to work if the entire struct consists of unaligned char[].) Is this feasible at all?

3) The name endian does not capture the library's "real" purpose. At least I was not aware of it although I was searching for it. I'd suggest something like ptype.hpp for _P_ortable _TYPES_.

Regards, speedsnail

Neil Mayhew

1:44 p.m.

On 5/27/08 1:19 AM, Roland Schwarz wrote:

...

1) Would it be possible to add a conversion operator to char* ? At least for unaligned types.

2) I'd like to be able to initialize a struct containing endian<> types by making it a reference to some buffer.

I think perhaps you've misunderstood the purpose of endian. The idea is to be able to define a structure like this:

    struct example_t { big8_t a; big24_t b; big32_t c; };

    example_t data;

You read it from a file or a socket with read(fd, &data, sizeof(data)) and then use data.a, data.b and data.c as if they were built-in integer types.

Reading into a character buffer and then "mapping" a structure on top of it could be done with reinterpret_cast<example_t*>(buffer), although this is not the preferred approach. The idea is not to take individual endian integers and force them to a particular position within a buffer, but to let the whole struct do the work of computing the offsets. Of course, that can still be done with reinterpret_cast, but it's not how the library is designed to work.
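The "let the struct compute the offsets" idea can be illustrated with a self-contained sketch. The types below are hypothetical stand-ins (unsigned only, at most four bytes) rather than the library's `big8_t`, `big24_t` and `big32_t`, and the bytes are copied in with `std::memcpy` instead of being read from a file descriptor, which also sidesteps the aliasing questions that a `reinterpret_cast` over a char buffer raises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical unsigned big-endian integer of N bytes (N <= 4).
template <std::size_t N>
struct demo_ubig {
    static_assert(N >= 1 && N <= 4, "sketch supports 1 to 4 bytes");
    unsigned char bytes[N];

    // Reassemble the native value, most significant byte first.
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < N; ++i) v = (v << 8) | bytes[i];
        return v;
    }
};

// Mirrors example_t above: 8-bit, 24-bit and 32-bit big-endian fields.
struct demo_example_t {
    demo_ubig<1> a;
    demo_ubig<3> b;
    demo_ubig<4> c;
};

// Every member is a char array, so no padding is expected; checked here.
static_assert(sizeof(demo_example_t) == 8, "layout assumed to have no padding");

// Copy raw bytes into the struct; the member offsets do the rest.
inline demo_example_t demo_parse(const unsigned char* buffer) {
    assert(buffer != nullptr);
    demo_example_t data;
    std::memcpy(&data, buffer, sizeof(data));
    return data;
}
```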

...

3) The name endian does not capture the library's "real" purpose. At least I was not aware of it although I was searching for it. I'd suggest something like ptype.hpp for _P_ortable _TYPES_.

You have a point here, although the original purpose of the library is to provide a clean and safe way of accessing data that does not use native endianness. The capability to handle unaligned and odd-sized integers goes naturally with this, so it makes sense to me to keep the name. It was what I searched for when I was looking for something like this. I was trying to handle big-endian data on a little-endian platform.

I would say the purpose is NOT to provide portable types, but to be able to handle non-portable types that are forced on us by legacy and platform-specific data formats. Endian *can* be used to create portable binary files, but users of endian should be careful to consider the issues before making use of endian too heavily for new formats. A truly portable format like XML is in general much preferable to creating new binary data formats.

My only concern with the name endian is that there is already an endian.hpp elsewhere in boost and I think it would be easy for people to miss this endian, and never discover its functionality.

--Neil

Hansi

2:41 p.m.

It would also be great if it were possible to use lexical_cast directly with the endian library!

Neil Mayhew wrote:

...

On 5/27/08 1:19 AM, Roland Schwarz wrote:

...
1) Would it be possible to add a conversion operator to char* ? At least for unaligned types.

2) I'd like to be able to initialize a struct containing endian<> types by making it a reference to some buffer.

I think perhaps you've misunderstood the purpose of endian. The idea is to be able to define a structure like this:

struct example_t { big8_t a; big24_t b; big32_t c; };

example_t data;

You read it from a file or a socket with read(fd, &data, sizeof(data)) and then use data.a, data.b and data.c as if they were built-in integer types.

Reading into a character buffer and then "mapping" a structure on top of it could be done with reinterpret_cast<example_t*>(buffer), although this is not the preferred approach. The idea is not to take individual endian integers and force them to a particular position within a buffer, but to let the whole struct do the work of computing the offsets. Of course, that can be still done with reinterpret_cast, but it's not how the library is designed to work.

...
3) The name endian does not capture the library's "real" purpose. At least I was not aware of it although I was searching for it. I'd suggest something like ptype.hpp for _P_ortable _TYPES_.

You have a point here, although the original purpose of the library is to provide a clean and safe way of accessing data that does not use native endianness. The capability to handle unaligned and odd-sized integers goes naturally with this, so it makes sense to me to keep the name. It was what I searched for when I was looking for something like this. I was trying to handle big-endian data on a little-endian platform.

I would say the purpose is NOT to provide portable types, but to be able to handle non-portable types that are forced on us by legacy and platform-specific data formats. Endian *can* be used to create portable binary files, but users of endian should be careful to consider the issues before making use of endian too heavily for new formats. A truly portable format like XML is in general much preferable to creating new binary data formats.

My only concern with the name endian is that there is already an endian.hpp elsewhere in boost and I think it would be easy for people to miss this endian, and never discover its functionality.

--Neil

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Scott McMurray

28 May

4:04 a.m.

On Tue, May 27, 2008 at 10:41 AM, Hansi <hansipet@web.de> wrote:

...

It would be also great if it is possible to directly use lexical_cast with the endian library!

I don't follow. What would you want to use lexical_cast for?

Hansi

5:33 a.m.

Sorry for the bad explanation... I would use it only to convert to strings and back.

Scott McMurray wrote:

...

On Tue, May 27, 2008 at 10:41 AM, Hansi <hansipet@web.de> wrote:

...
It would be also great if it is possible to directly use lexical_cast with the endian library!

I don't follow.

What would you want to use lexical_cast for?

Roland Schwarz

5:58 a.m.

Neil Mayhew wrote:

...

I think perhaps you've misunderstood the purpose of endian. The idea is to be able to define a structure like this:

struct example_t { big8_t a; big24_t b; big32_t c; };

example_t data;

You read it from a file or a socket with read(fd, &data, sizeof(data)) and then use data.a, data.b and data.c as if they were built-in integer types.

Reading into a character buffer and then "mapping" a structure on top of it could be done with reinterpret_cast<example_t*>(buffer), although this is not the preferred approach.

I don't think I misunderstood the purpose. It is just that in my code I ended up exactly as you said: I have a buffer in terms of char, i.e. char buffer[]. This buffer is not from a stream, file or socket. It comes from a fifo in my application. Then I want to make use of the buffer as:

    if (type == id_foo) {
        foo& f(*reinterpret_cast<foo*>(&buffer[0]));
        ... do something with foo ...
    }

And I think this is not only ugly but also unnecessary, since foo already might be expressed in terms of char[]. If not, e.g. because foo is not made of unaligned data, I'd prefer a compiler error instead.

Btw. read(&foo, sizeof(foo)) will also need to do the equivalent of a reinterpret_cast internally, won't it?

I just think that endian is a fairly general idea that should not be limited to stream, file and socket.

Yes, of course I can always do the reinterpret_cast, but if we can do better, why shouldn't we?

An alternative I can think of would be to not make the bytes[] private.

-- Roland Schwarz (aka. speedsnail), mailto:roland.schwarz@chello.at, http://www.blackspace.at

Roland Schwarz

9:33 a.m.

Just checked: iostream::write can't write to a void*. This means:

    struct A_t { ulittle32_t a; ulittle32_t b; } A;
    std::fstream f("test.bin", ios::out|ios::binary);
    f.write(&A, sizeof(A_t));

won't compile. And I think one should be able to expect this to work for an endian lib.

Or am I missing something here?

-- Roland Schwarz (aka. speedsnail), mailto:roland.schwarz@chello.at, http://www.blackspace.at

Beman Dawes

10:09 p.m.

Roland Schwarz wrote:

...

Just checked: iostream::write can't write to a void*

Actually, it's expecting a const char*.

...

This means:

struct A_t { ulittle32_t a; ulittle32_t b; } A;

std::fstream f("test.bin", ios::out|ios::binary);

f.write(&A, sizeof(A_t));

won't compile. And I think one should be able to expect this to work for an endian lib.

Or am I missing something here?

A reinterpret_cast is needed for a struct. It doesn't matter what is inside the struct. --beman
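A sketch of the cast Beman describes, using a `std::ostringstream` in place of the file stream so the result can be inspected. `demo_ulittle32`, `make_ulittle32` and `demo_A_t` are hypothetical stand-ins for the library types in the quoted snippet; `std::ostream::write` itself takes a `const char*` and a `std::streamsize`.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an unsigned 32-bit little-endian type.
struct demo_ulittle32 {
    unsigned char bytes[4];
};

// Build the stand-in from a native value, least significant byte first.
inline demo_ulittle32 make_ulittle32(std::uint32_t v) {
    demo_ulittle32 r;
    for (int i = 0; i < 4; ++i) {
        r.bytes[i] = static_cast<unsigned char>(v & 0xFF);
        v >>= 8;
    }
    return r;
}

struct demo_A_t {
    demo_ulittle32 a;
    demo_ulittle32 b;
};

// write() takes const char*, so a struct needs a reinterpret_cast
// regardless of what its members are.
inline void demo_write(std::ostream& out, const demo_A_t& value) {
    out.write(reinterpret_cast<const char*>(&value),
              static_cast<std::streamsize>(sizeof(value)));
}

// Serialize into a string so the bytes can be examined.
inline std::string demo_serialize(const demo_A_t& value) {
    std::ostringstream out(std::ios::out | std::ios::binary);
    demo_write(out, value);
    assert(out.good());
    return out.str();
}
```

Wrapping the cast in one small function such as `demo_write` keeps it out of the calling code, which may address part of the objection that the cast is ugly.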

Scott McMurray

29 May

1:33 a.m.

On Wed, May 28, 2008 at 1:58 AM, Roland Schwarz <roland.schwarz@chello.at> wrote:

...

I have a buffer in terms of char, i.e. char buffer[]. This buffer is not from a stream, file or socket. It comes from a fifo in my application. Then I want to make use of the buffer as:

    if (type == id_foo) {
        foo& f(*reinterpret_cast<foo*>(&buffer[0]));
        ... do something with foo ...
    }

And I think this is not only ugly but also unnecessary, since foo already might be expressed in terms of char[]. If not, e.g. because foo is not made of unaligned data, I'd prefer a compiler error instead.

I think there's a strong possibility that this violates strict aliasing requirements.

...

Btw. read(&foo, sizeof(foo)) also will need to do an equivalent of reinterpret cast internally. Doesn't it?

I just think that endian is a fairly general idea that should not be limited to stream, file and socket.

Is there a use for it outside of I/O though? I can't think of one. Perhaps what's really needed is a "better" (for some criteria I can't elaborate) I/O system...

...

Yes of course I always can do the reinterpret_cast, but if we can do better, why shouldn't we?

An alternative I can think of would be to not make the bytes[] private.

It has to be public, to be POD.

Neil Mayhew

5:54 a.m.

On 2008-05-27 23:58, Roland Schwarz wrote:

...

I have buffer in terms of char, i.e. char buffer[]. This buffer is not from stream, file or socket. It comes from a fifo in my application.

If you are using a FIFO, then I assume the data is originating on the same machine. In which case, there should be no need to use endian, as you can arrange that the endianness, and even the alignment, is native to the machine.

...

Then I want to make use of the buffer as:

if (type == id_foo) { foo& f(*reinterpret_cast<foo*>(&buffer[0])); ... do something with foo ... }

And I think this is not only ugly but also unnecessary, since foo already might be expressed in terms of char[]. If not, e.g. because foo is not made of unaligned data, I'd prefer a compiler error instead.

Yes of course I always can do the reinterpret_cast, but if we can do better, why shouldn't we?

I think reinterpret_cast is the honest thing to do, because that is actually what you are doing, reinterpreting a memory location. Just because an endian is defined in terms of chars internally, that's not actually what it is. Also, people reading your code will find it easier to understand what is going on if they see the reinterpret_cast, since this sort of poking around in memory, based on a type id, conventionally uses reinterpret_cast. It may be a little long-winded, but it's not necessarily ugly, just explicit.

Maybe your use case isn't really suited to endian, and you should instead define a few functions like the load_big_endian that's in the detail section of endian.hpp. Then you could write:

    if (type == id_foo) {
        u_int32_t f(load_big_endian<u_int32_t, 24>(&buffer[0]));
        ...
    }

Another possibility would be to put this f in a union:

    struct data {
        int8_t type;
        union {
            ubig24_t f;
            ubig32_t g;
            my_type h;
            // etc.
        };
    };

However, I realise this may not fit your data. Have you looked at boost serialization? That might fit your needs better. Endian is designed for when a block of data is fetched in one go, and you want to map a complex structure directly over it, without fiddling with pointers and offsets.
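The free-function route can be sketched as follows. `demo_load_big_endian` is a hypothetical helper, not the one in endian.hpp; its second template argument is taken here to be a byte count, an assumption modelled on the `store_big_endian<T, n_bits/8>` call quoted earlier in the thread. The record layout (one type byte followed by a 24-bit or 32-bit payload) and the `id_bar` tag are likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load n_bytes stored most significant byte first into an unsigned value.
template <typename T, std::size_t n_bytes>
T demo_load_big_endian(const void* src) {
    static_assert(n_bytes >= 1 && n_bytes <= sizeof(T), "value must fit in T");
    const unsigned char* p = static_cast<const unsigned char*>(src);
    T result = 0;
    for (std::size_t i = 0; i < n_bytes; ++i)
        result = static_cast<T>((result << 8) | p[i]);
    return result;
}

// Hypothetical type tags for the records held in the buffer.
enum demo_type_id { id_foo = 1, id_bar = 2 };

// Decode one record: a type byte, then a 24-bit (id_foo) or
// 32-bit (id_bar) big-endian payload. No cast to a struct is needed.
inline std::uint32_t demo_decode(const unsigned char* buffer) {
    switch (buffer[0]) {
        case id_foo:
            return demo_load_big_endian<std::uint32_t, 3>(buffer + 1);
        case id_bar:
            return demo_load_big_endian<std::uint32_t, 4>(buffer + 1);
        default:
            assert(false && "unknown type id");
            return 0;
    }
}
```

Because the bytes are read one at a time through an unsigned char pointer, this form has no alignment or aliasing requirements on the buffer.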

...

Btw. read(&foo, sizeof(foo)) also will need to do an equivalent of reinterpret cast internally. Doesn't it?

I meant the POSIX read which has a void* parameter for the buffer. fstream's read uses char*, and so would need a reinterpret_cast. --Neil

participants (9)

Beman Dawes
dizzy
Giovanni Piero Deretta
Hansi
Neil Mayhew
Phil Endecott
Roland Schwarz
Scott McMurray
vicente.botet