Access to unaligned (packed) structs

I am searching for a library / component / pattern that will help me with the following problem: I have a memory structure in raw memory (received from a measurement instrument), that is unaligned in its data members. Now I want to have an access interface to this data that comes as close as possible to access of a struct. E.g.
struct foo { char a; int b; };
Although my raw data "char* pdata" has an 8bit followed by a 32bit I want to be able to do something like:
foo data(pdata);
and then
char ch = data.a; int n = data.b;
Does anyone know about such a library? (This should work compiler/platform independent of course.)
Thank you, Roland aka speedsnail

On Thu, 11 Oct 2007 08:40:21 +0200, Roland Schwarz <roland.schwarz@chello.at> wrote:
I am searching for a library / component / pattern that will help me with the following problem:
I have a memory structure in raw memory (received from a measurement instrument), that is unaligned in its data members.
Now I want to have an access interface to this data that comes as close as possible to access of a struct. E.g.
struct foo { char a; int b; }
Although my raw data "char* pdata" has an 8bit followed by a 32bit I want to be able to do something like:
foo data(pdata);
and then
char ch = data.a; int n = data.b;
Does anyone know about such a library? (This should work compiler/platform independent of course.)
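class foo {
private:
  unsigned char internals[5];
public:
  char a() const { return static_cast<char>(internals[0]); }
  int b() const {
    // Keep the line that matches the byte order of the data; drop the other.
    // Big-endian data (most significant byte first):
    unsigned int n = ((unsigned)internals[1]<<24) | ((unsigned)internals[2]<<16) | ((unsigned)internals[3]<<8) | internals[4];
    // Little-endian data (least significant byte first):
    // unsigned int n = ((unsigned)internals[4]<<24) | ((unsigned)internals[3]<<16) | ((unsigned)internals[2]<<8) | internals[1];
    return static_cast<int>(n);
  }
};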
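The byte-by-byte assembly used in the class above can be pulled out into two small free functions, one per byte order. This is only a sketch: the names read_be32 and read_le32 are made up for illustration and do not come from any library mentioned in the thread, and it assumes 8-bit bytes.

```cpp
#include <cstdint>

// Hypothetical helpers: build a 32-bit value from four consecutive bytes.
// No int pointer is ever formed, so the alignment of p does not matter.

// Big-endian: p[0] is the most significant byte.
inline std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Little-endian: p[0] is the least significant byte.
inline std::uint32_t read_le32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[3]) << 24) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8)  |
            static_cast<std::uint32_t>(p[0]);
}
```

With a 5-byte record such as { 'A', 0x01, 0x02, 0x03, 0x04 }, the 32-bit field starts at offset 1, so it would be read as read_be32(raw + 1) or read_le32(raw + 1) depending on what the instrument sends.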

Zara wrote:
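class foo {
private:
  unsigned char internals[5];
public:
  char a() const { return static_cast<char>(internals[0]); }
  int b() const {
    // Keep the line that matches the byte order of the data; drop the other.
    // Big-endian data (most significant byte first):
    unsigned int n = ((unsigned)internals[1]<<24) | ((unsigned)internals[2]<<16) | ((unsigned)internals[3]<<8) | internals[4];
    // Little-endian data (least significant byte first):
    // unsigned int n = ((unsigned)internals[4]<<24) | ((unsigned)internals[3]<<16) | ((unsigned)internals[2]<<8) | internals[1];
    return static_cast<int>(n);
  }
};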
Thank you for the answer, but the solution is only close to what I requested. The access syntax in your solution is
data.a()
while I want
data.a
I.e. a has to be an object in its own right that has a type conversion operator for the access.
My problem, as far as I can currently see, boils down to another question. I have
struct bar { };
and
struct foo { bar a; };
How can I give bar access to its "container"? Perhaps like so:
struct foo { foo() { a.setref(this); } bar a; };
But this implies a runtime cost, which I want to avoid. Another solution:
struct foo { foo() : a(this) {} bar a; };
While I am not sure whether this avoids the runtime cost, I am also unsure whether this is valid C++ at all. At least the Microsoft compiler emits a warning.
Roland aka speedsnail

Zara wrote:
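class foo {
private:
  unsigned char internals[5];
public:
  char a() const { return static_cast<char>(internals[0]); }
  int b() const {
    // Keep the line that matches the byte order of the data; drop the other.
    // Big-endian data (most significant byte first):
    unsigned int n = ((unsigned)internals[1]<<24) | ((unsigned)internals[2]<<16) | ((unsigned)internals[3]<<8) | internals[4];
    // Little-endian data (least significant byte first):
    // unsigned int n = ((unsigned)internals[4]<<24) | ((unsigned)internals[3]<<16) | ((unsigned)internals[2]<<8) | internals[1];
    return static_cast<int>(n);
  }
};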
Hmm, I tried the following:
template<class T> struct packed { unsigned char data[sizeof(T)]; operator T&() const { return static_cast<T>(*data); } };
struct foobar { packed<char> a; packed<int> b; };
which seems to work when I try it. But is this standard-conformant usage? (I need this only for PODs.)
Roland aka speedsnail

On Thu, 11 Oct 2007 13:43:47 +0200, Roland Schwarz <roland.schwarz@chello.at> wrote:
Zara wrote:
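class foo {
private:
  unsigned char internals[5];
public:
  char a() const { return static_cast<char>(internals[0]); }
  int b() const {
    // Keep the line that matches the byte order of the data; drop the other.
    // Big-endian data (most significant byte first):
    unsigned int n = ((unsigned)internals[1]<<24) | ((unsigned)internals[2]<<16) | ((unsigned)internals[3]<<8) | internals[4];
    // Little-endian data (least significant byte first):
    // unsigned int n = ((unsigned)internals[4]<<24) | ((unsigned)internals[3]<<16) | ((unsigned)internals[2]<<8) | internals[1];
    return static_cast<int>(n);
  }
};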
Hmm, I tried the following:
template<class T> struct packed { unsigned char data[sizeof(T)]; operator T&() const { return static_cast<T>(*data); } }
struct foobar { packed<char> a; packed<int> b; }
which seems to work when I try it. But is this standard-conformant usage? (I need this only for PODs.)
Roland aka speedsnail
It may not work for PODs. It depends on the data alignment chosen by the compiler and on the alignment capabilities of the processor. You are invoking the dreaded devils of Undefined Behaviour!
Best regards, Zara

First, there was a typo: it should read
operator T() const
not
operator T&() const
Zara wrote:
It may not work for PODs. It depends on the data alignment chosen by the compiler and on the alignment capabilities of the processor.
You are invoking the dreaded devils of Undefined Behaviour!
Of course I want to avoid this, but could you please be more specific? Is it the
static_cast<T>(*data)
that will cause trouble? If so, why? Would it be better to use something like
operator T() const {
  T t = 0;
  for (unsigned n=0; n<sizeof(T); ++n) { t |= ((T)data[n])<<(8*n); }
  return t;
}
Roland aka speedsnail

On Thu, 11 Oct 2007 14:19:34 +0200, Roland Schwarz <roland.schwarz@chello.at> wrote:
First there was a typo: it should read operator T() const not operator T&() const
Zara wrote:
It may not work for PODs. It depends on the data alignment chosen by the compiler and on the alignment capabilities of the processor.
You are invoking the dreaded devils of Undefined Behaviour!
Of course I want to avoid this, but could you please be more specific?
Is it the static_cast<T>(*data)
that will cause troubles?
If so why?
Because not all processors can access words that are not aligned on their natural word boundary. Such an access could result in some type of exception (depending on processor, OS, compiler...). OTOH, if you compile with 4-byte alignment, the cast *may* strip the lower bits of the pointer and give you rubbish instead of your value. This is UB, and should never be invoked.
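To make the alignment point concrete, here is a small sketch of a check that tells whether an address is suitably aligned for a type T. The name is_aligned_for is made up for illustration. It assumes the usual case where converting a pointer to std::uintptr_t yields the plain numeric address, which the standard leaves implementation-defined.

```cpp
#include <cstdint>

// Hypothetical helper: true if p sits on a boundary suitable for a T.
// Assumption: the pointer-to-integer conversion gives the numeric address.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

In a record made of one char followed by one int, the int starts at offset 1, so on a platform where int needs 4-byte alignment the check fails for that field, and dereferencing an int* to it is exactly the undefined behaviour described above.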
Would it be better to use something like
operator T() const { T t = 0; for (unsigned n=0; n<sizeof(T); ++n) { t |= ((T)data[n])<<(8*n); } return t; }
Yes, certainly. Just check whether the endianness is the right one, or whether you must calculate the total the other way round.
Best regards, Zara
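A slightly more defensive variant of that loop is sketched below. It accumulates in the unsigned counterpart of T, so that shifting never touches a sign bit, and converts to T only at the end. The name load_le is hypothetical; the sketch assumes 8-bit bytes, little-endian data, and an integral T other than bool. For the other byte order the index would run the other way round.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: assemble an integral T from sizeof(T) bytes,
// least significant byte first. Works for any alignment of data.
template <class T>
T load_le(const unsigned char* data) {
    static_assert(std::is_integral<T>::value &&
                  !std::is_same<T, bool>::value,
                  "load_le is only sketched for integer types");
    using U = typename std::make_unsigned<T>::type;
    U u = 0;
    for (std::size_t n = 0; n < sizeof(T); ++n) {
        // Widen each byte to U before shifting it into place.
        u |= static_cast<U>(static_cast<U>(data[n]) << (8 * n));
    }
    // For signed T and values above the maximum of T, this conversion is
    // implementation-defined before C++20 (two's complement in practice).
    return static_cast<T>(u);
}
```

Inside packed<T>, the conversion operator could then simply return load_le<T>(data).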

Zara wrote:
Yes, certainly. Just check whether the endianness is the right one, or whether you must calculate the total the other way round.
Thank you. I have to add: the example I posted was bogus. I had only compiled it and never instantiated the class. When I did, I saw that the compiler complained anyway. To make it work I had to use:
template<class T> struct packed { unsigned char data[sizeof(T)]; operator T&() { return *reinterpret_cast<T*>(data); } };
And then I looked into the standard, 5.2.10 clause 7:
A pointer to an object can be explicitly converted to a pointer to an object of different type.
I think this is my case. But of course:
..., the result of such a pointer conversion is unspecified.
And this, I guess, is what you meant. So I will stick with the latter, explicit code.
So one question is left. Given
unsigned char mem[] = {1,2,3,4,5};
struct foo { packed<char> a; packed<int> b; };
how do I map my struct to mem while avoiding surprises?
foo& f((foo&)(*mem));
I guess this is bad again, for the very same reasons of unspecified behaviour, true? At least I cannot use a static_cast instead of the (foo&) cast.
Thanks again, Roland aka speedsnail

Roland Schwarz wrote:
How do I map my struct to mem avoiding surprise:
foo& f((foo&)(*mem));
I guess this is bad again, as for the very same reasons of unspecified behaviour, true? At least I cannot use a static_cast instead of (foo&) cast.
Although I am not terribly skilled in standards interpretation, I think the following should be safe:
union { unsigned char mem[5]; foo s; };
Now I should be able to access this as mem[0], mem[1], ... from the writing side, e.g. from a function that fills my buffer, and later access it with s.a, s.b. I would be glad if someone more skilled than me could confirm my belief.
Roland aka speedsnail

Roland Schwarz wrote:
Although I am not terribly skilled in standards interpretation, I think the following should be safe:
union { unsigned char mem[5]; foo s; };
Now I should be able to access this as mem[0], mem[1], ... from the writing side, e.g. from a function that fills my buffer, and later access it with s.a, s.b.
Absolutely not. The only circumstance in which you may access a member of a union that is not the one that has been most recently set is when you access common, structurally conformant members of structs, i.e.
struct foo { int i; float f; };
struct bar { int one; float two; };
union foobar { foo f; bar b; };
foobar fb;
fb.f.i = 0;
fb.f.f = 1.1;
cout << fb.b.one; // Yep, valid.
The moment the types are not exactly equal (and two consecutive chars are not the same as a char[2]) you're no longer allowed to assume anything.
Sebastian Redl

Sebastian Redl wrote:
Absolutely not. The only circumstance in which you may access a member of a union that is not the one that has been most recently set is when you access common, structurally conformant members of structs, i.e.
Hmm, you may have missed the details of my example, so I repeat the essentials:
struct foo1 { unsigned data[4]; };
struct foo2 { unsigned data[2]; };
struct bar { foo1 a; foo2 b; };
union { unsigned data[6]; bar b; };
Are those types not "layout compatible", as the standard requires for the access I intend? I mean, foo1 and foo2 are PODs. So the question boils down to whether foo1 and unsigned data[4] are layout compatible, doesn't it?
Roland aka speedsnail

Roland Schwarz wrote:
Hmm, you may have missed the details of my example, so I repeat the essentials:
struct foo1 { unsigned data[4]; };
struct foo2 { unsigned data[2]; };
struct bar { foo1 a; foo2 b; };
union { unsigned data[6]; bar b; };
Are those types not "layout compatible", as the standard requires for the access I intend? I mean, foo1 and foo2 are PODs. So the question boils down to whether foo1 and unsigned data[4] are layout compatible, doesn't it?
The answer is still no. The standard says (9.2/16):
If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This is the only case where access through a different member is allowed.
unsigned data[6] is not a POD-struct, even though it is a POD. Furthermore, you definitely can't access foo2's data, because there might be padding between foo1 and foo2. (Not likely that any compiler does that, but they're allowed to.) Sebastian Redl

Sebastian Redl wrote:
The answer is still no. The standard says (9.2/16):
Ok, I think I got the point. I think the situation is even worse than I initially thought. Even in the following case, where T is a POD-object type:
struct U { T t; };
and when sizeof(T) == sizeof(U), I am allowed to:
T t; U u1; U u2;
memcpy(&t,&u1,sizeof(T)); memcpy(&u2,&t,sizeof(T));
and rely on u1 == u2, but may not rely on t == u1 or t == u2, because there are no guarantees that U and T share the same value representation. Correct?
Roland aka speedsnail

Roland Schwarz wrote:
So even in the following case where T is a POD-object type:
struct U { T t; };
and when sizeof(T) == sizeof(U)
I am allowed to:
T t; U u1; U u2;
memcpy(&t,&u1,sizeof(T)); memcpy(&u2,&t,sizeof(T));
and rely on u1 == u2, but may not rely on t == u1 or t == u2, because there are no guarantees that U and T share the same value representation.
Correct?
Yes, I think this is correct. Sebastian Redl
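The part that is guaranteed, copying the bytes of a POD out to a char buffer and back into an object of the same type, can be sketched as follows. The names to_bytes and from_bytes are made up for illustration; the static_assert uses std::is_trivially_copyable (C++11) as the modern stand-in for "POD" in this context.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy the object representation of t into dst.
// dst may have any alignment; it must have room for sizeof(T) bytes.
template <class T>
void to_bytes(const T& t, unsigned char* dst) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy round trips are only guaranteed for such types");
    std::memcpy(dst, &t, sizeof(T));
}

// Hypothetical helper: rebuild a T from sizeof(T) bytes at src.
// The bytes must have come from an object of the same type T.
template <class T>
T from_bytes(const unsigned char* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy round trips are only guaranteed for such types");
    T t;
    std::memcpy(&t, src, sizeof(T));
    return t;
}
```

Note that the round trip goes T to bytes to T. Copying the bytes of a U into a T, as in the t == u1 case discussed above, is the step for which no guarantee was found.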

On 10/11/07 01:40, Roland Schwarz wrote:
I am searching for a library / component / pattern that will help me with the following problem:
I have a memory structure in raw memory (received from a measurement instrument), that is unaligned in its data members.
Now I want to have an access interface to this data that comes as close as possible to access of a struct. E.g.
struct foo { char a; int b; }
Although my raw data "char* pdata" has an 8bit followed by a 32bit I want to be able to do something like:
foo data(pdata);
and then
char ch = data.a; int n = data.b;
Does anyone know about such a library? (This should work compiler/platform independent of course.)
There's something related described here: http://archives.free.net.ph/message/20070514.193240.ada2f1bb.en.html
Unfortunately, it's specific to Gregor's g++ variadic template compiler. OTOH, maybe the variadic version of the tuple and packed tuple described in the post could be converted back into a non-variadic version as in current fusion or mpl. Unfortunately that would involve adapting all the boost preprocessor logic in fusion or mpl's tuple to emulate Gregor's variadic templates for this particular case, but that's just the reverse of what was done to produce the mpl-vt.zip in <boost-vault>/variadic_templates. So, maybe it wouldn't be too much trouble.

Hi Roland, Roland Schwarz wrote:
I am searching for a library / component / pattern that will help me with the following problem:
I have a memory structure in raw memory (received from a measurement instrument), that is unaligned in its data members.
Now I want to have an access interface to this data that comes as close as possible to access of a struct.
I have had to do this for USB descriptors. In my case, the hardware (ARM) does not support unaligned accesses. There is also a question of endianness to worry about. Do you need read-only or read-write access?
(This should work compiler/platform independent of course.)
It's not difficult to write NON compiler-independent code that does the right thing, e.g. for gcc:
struct foo { char a; int b; } __attribute__ ((packed));
This will do the right thing. In particular, when you read from foo.b on my ARM system the compiler will generate a series of four byte-read instructions and combine the results. However, you have to be very careful what you do with such a field. In particular, you mustn't take the address of foo.b:
void f(int* p) { ... }
f(&foo.b);
Here, the code inside f will assume that p is aligned; it will NOT generate the byte reads. If you're lucky you'll get a bus error; if you're unlucky you'll just get the wrong answer. (I spent a long time debugging something like this, largely because gcc does not give any warning when you take the address of an unaligned field.) On x86, however, it probably will work since the hardware supports unaligned accesses, though I believe that the compiler is still allowed to assume that pointers are aligned, even on x86.
I believe that other compilers have ways to declare a struct as packed with very similar semantics. So you could probably write a library with some per-compiler macros to make this multi-compiler, but not compiler-independent.
If you really want standards-compliant compiler-independent code: Quoting from a later message:
template<class T> struct packed { unsigned char data[sizeof(T)]; operator T&() const { return static_cast<T>(*data); } }
I think you mean something like
operator T() const { return *reinterpret_cast<const T*>(data); }
don't you? If that's what you mean, the answer is that no, it won't work (but will "probably" work on x86). You need to memcpy into an aligned T:
operator T() const { T t; memcpy(&t,data,sizeof(T)); return t; }
If you need it to be writeable you can add an operator= and a constructor taking a T that memcpy the other way. My experience is that g++ will optimise away the memcpy in this sort of code and you'll end up with something close to optimal. I don't know about other compilers (and would love to know).
But I think the problem is that when you build your struct containing these packed<T> objects:
struct foobar { packed<char> a; packed<int> b; };
I don't think (but am not sure) that you can assume that there is no padding between a and b. Can someone confirm? I have a vague recollection that the compiler has some freedom in this case, because it changed at some point for gcc on ARM, which resulted in binary incompatibility.
I hope that is of some use. Good luck!
Phil.
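Putting the memcpy suggestion together, a minimal sketch might look like the following. The load_foobar function is a hypothetical addition for illustration. The static_assert does not resolve the padding question; it only turns a surprising layout into a compile error on whatever compiler is used. The memcpy keeps the native byte order of the machine, so data from an instrument with the other byte order would still need swapping.

```cpp
#include <cstring>
#include <type_traits>

template <class T>
struct packed {
    static_assert(std::is_trivially_copyable<T>::value,
                  "packed<T> is only sketched for memcpy-able types");
    unsigned char data[sizeof(T)];

    // Read: copy the bytes into a properly aligned T.
    operator T() const { T t; std::memcpy(&t, data, sizeof(T)); return t; }

    // Write: copy the bytes of an aligned T back into the buffer.
    packed& operator=(const T& t) {
        std::memcpy(data, &t, sizeof(T));
        return *this;
    }
};

struct foobar { packed<char> a; packed<int> b; };

// Fails to compile if the compiler inserted padding anywhere in foobar.
static_assert(sizeof(foobar) == sizeof(char) + sizeof(int),
              "unexpected padding in foobar");

// Hypothetical helper: copy a raw record into a foobar instead of
// casting the buffer pointer to foobar*.
inline foobar load_foobar(const unsigned char* raw) {
    foobar f;
    std::memcpy(&f, raw, sizeof f);
    return f;
}
```

After foobar f = load_foobar(pdata); the access syntax is the one asked for at the start of the thread: char ch = f.a; int n = f.b;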
participants (5)
- Larry Evans
- Phil Endecott
- Roland Schwarz
- Sebastian Redl
- Zara