[endain_ext] Beman's Integer.Endian extensions to work with endian unaware data

Hi,

I have implemented a prototype that allows in-place endian conversions of endian-unaware data, relative to an endian point of view (domain). The point of view is represented by a map from the native data types to an endian tree, which has MPL sequences as nodes and whose leaves are the endianness of the integer types.

    namespace endian {
        struct big {};
        struct little {};
    #ifdef BOOST_BIG_ENDIAN
        typedef big native;
    #elif defined(BOOST_LITTLE_ENDIAN)
        typedef little native;
    #else
    #error "Endian not detected."
    #endif
    }

Two functions to convert any data in place are provided:

    template <typename Domain, typename T> void convert_from(T& r);
    template <typename Domain, typename T> void convert_to(T& r);

The default implementation works for Fusion sequences, so the user will need to adapt the structures so that they are viewed as Fusion sequences, or overload the functions.

Next follows an example that defines 3 structures and a point of view (domain) that sees one of them as big, the second as little, and the third as a mix of big and little.
    using namespace boost;

    // native structs definition
    namespace X {
        struct big_c { uint32_t a; uint16_t b; };
        struct little_c { int32_t a; int16_t b; };
        struct mixed_c { big_c a; little_c b; };
    }

    // definition of a point of view (domain)
    struct network {};

    // mapping of native types to an endian tree for the network point of view (domain)
    namespace boost { namespace endian {
        template <> struct domain_map <network, X::big_c> {
            typedef mpl::vector<big,big> type;
        };
        template <> struct domain_map <network, X::little_c> {
            typedef mpl::vector<little,little> type;
        };
    }}

    // view of the structures as fusion sequences
    BOOST_FUSION_ADAPT_STRUCT(X::big_c, (uint32_t, a) (uint16_t, b) )
    BOOST_FUSION_ADAPT_STRUCT(X::little_c, (int32_t, a) (int16_t, b) )
    BOOST_FUSION_ADAPT_STRUCT(X::mixed_c, (X::big_c, a) (X::little_c, b) )

    int main( int argc, char * argv[] ) {
        X::mixed_c m;
        // fill with data from the 'network' domain (emulating the reception from the domain 'network')
        m.a.a=0x01020304;
        m.a.b=0x0A0B;
        m.b.a=0x04030201;
        m.b.b=0x0B0A;
        std::cout << std::hex << m.a.a << std::endl;
        std::cout << std::hex << m.a.b << std::endl;
        std::cout << std::hex << m.b.a << std::endl;
        std::cout << std::hex << m.b.b << std::endl;

        // convert from the network to the native domain
        endian::convert_from<network>(m);
        std::cout << std::hex << m.a.a << std::endl;
        std::cout << std::hex << m.a.b << std::endl;
        std::cout << std::hex << m.b.a << std::endl;
        std::cout << std::hex << m.b.b << std::endl;

        // work with the native variable
        // ...
        // convert from the native to the network domain
        endian::convert_from<network>(m);
        std::cout << std::hex << m.a.a << std::endl;
        std::cout << std::hex << m.a.b << std::endl;
        std::cout << std::hex << m.b.a << std::endl;
        std::cout << std::hex << m.b.b << std::endl;
        return 0;
    }

The result is:

    1020304
    a0b
    4030201
    b0a
    1020304
    a0b
    4030201
    b0a
    1020304
    a0b
    4030201
    b0a

The library could define three predefined maps for the domains endian::native, endian::big and endian::little, which map every leaf to the corresponding endian type.

I have not seen the generated assembler, but the design is there to do nothing when the in-place conversion concerns the same endianness.

The library is based on Beman's.
* I have split the endian class into two levels:
  - endian_pack
  - endian
* I have replaced the enum class endianness with an endianness tag (see above).
* I have added an endian_view class that is a reference to a native type viewed with a specific endianness (a kind of endian_cast).
* And last, I have added on top the requested 'in place conversion' functions.

The last two features work with endian anaware data. I could add pure conversions if needed on endian anaware data.

The names used for functions and classes, and the namespace to put them under, are yet to be determined.

All the sources can be found in the sandbox under http://svn.boost.org/svn/boost/sandbox/endian_ext. Documentation will come later :(.

As always, comments are welcome :)
_____________________
Vicente Juan Botet Escribá
http://viboes.blogspot.com/
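For readers who want to see the core idea without Boost, here is a minimal sketch of an in-place conversion that does nothing when the source endianness matches the host. It is not the sandbox code: the namespace sketch, the helpers host_is_little, is_little and reverse_bytes, and the run-time byte-order check (used here instead of BOOST_BIG_ENDIAN / BOOST_LITTLE_ENDIAN so that the snippet is self-contained) are all assumptions made for illustration, and only a single integer is handled rather than a Fusion sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

namespace sketch {  // hypothetical namespace, not the sandbox library

struct big {};
struct little {};

// Detect the host byte order by inspecting the first byte of the value 1.
inline bool host_is_little() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

inline bool is_little(big) { return false; }
inline bool is_little(little) { return true; }

// Reverse the bytes of an integer in place.
template <typename T>
void reverse_bytes(T& value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T) / 2; ++i)
        std::swap(bytes[i], bytes[sizeof(T) - 1 - i]);
    std::memcpy(&value, bytes, sizeof(T));
}

// Convert from the endianness named by the tag to host order.
// Nothing is done when the two byte orders already agree.
template <typename Endian, typename T>
void convert_from(T& value) {
    if (is_little(Endian()) != host_is_little())
        reverse_bytes(value);
}

// Convert from host order to the endianness named by the tag.
// A byte swap is its own inverse, so the same operation applies.
template <typename Endian, typename T>
void convert_to(T& value) {
    convert_from<Endian>(value);
}

}  // namespace sketch
```

With this sketch, a uint32_t filled from the wire bytes 01 02 03 04 and passed to sketch::convert_from<sketch::big> holds 0x01020304 on any host, and sketch::convert_to<sketch::big> restores the wire layout.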

vicente.botet wrote:
I have implemented a prototype that allows in-place endian conversions of endian-unaware data, relative to an endian point of view (domain). The point of view is represented by a map from the native data types to an endian tree, which has MPL sequences as nodes and whose leaves are the endianness of the integer types.
I have no idea what that means. (Having now examined the example you provided below, I have a better idea, but as expressed it doesn't tell me much.)
Two functions to convert any data in place are provided:
template <typename Domain, typename T> void convert_from(T& r);
template <typename Domain, typename T> void convert_to(T& r);
The default implementation works for Fusion sequences, so the user will need to adapt the structures so that they are viewed as Fusion sequences, or overload the functions.
OK. That much seems reasonable.
Next follows an example that defines 3 structures and a point of view (domain) that sees one of them as big, the second as little, and the third as a mix of big and little.
using namespace boost; // native structs definition namespace X { struct big_c { uint32_t a; uint16_t b; }; struct little_c { int32_t a; int16_t b; }; struct mixed_c { big_c a; little_c b; }; }
// definition of a point of view (domain) struct network {};
// mapping of native types to an endian tree for the network point of view (domain) namespace boost { namespace endian { template <> struct domain_map <network, X::big_c> { typedef mpl::vector<big,big> type; }; template <> struct domain_map <network, X::little_c> { typedef mpl::vector<little,little> type; };
}}
I have no idea what that means. (Having now studied the rest of the example, I understand the purpose for the above, but that doesn't mean I really understand it.)
// view of the structures as fusion sequences BOOST_FUSION_ADAPT_STRUCT(X::big_c, (uint32_t, a) (uint16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::little_c, (int32_t, a) (int16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::mixed_c, (X::big_c, a) (X::little_c, b) )
The type duplication is unfortunate, but that was necessary in Beman's approach, too.
int main( int argc, char * argv[] ) {
X::mixed_c m; // fill with data from the 'network' domain (emulating the reception from the domain 'network') m.a.a=0x01020304; m.a.b=0x0A0B; m.b.a=0x04030201; m.b.b=0x0B0A;
This is just putting raw data, of assumed endianness, into the structures, right?
// convert from the network to the native domain endian::convert_from<network>(m);
Here you have declared that the current data in m is in network order and requested that it be converted to host order, right? That's no different, other than implementation details, than what Tomas originally provided, IIUC.
// convert from the native to the network domain endian::convert_from<network>(m);
^^^^ to?
The library could define three predefined maps for the domains endian::native, endian::big and endian::little, which map every leaf to the corresponding endian type.
That's an appropriate step, given the commonality of those options.
I have not seen the generated assembler, but the design is there to do nothing when the in-place conversion concerns the same endianness.
That's a necessity.
The library is based on Beman's.
* I have split the endian class into two levels:
  - endian_pack
  - endian
* I have replaced the enum class endianness with an endianness tag (see above).
Tomas already noted that he didn't like the use of enums, so you're on the same page there.
* I have added an endian_view class that is a reference to a native type viewed with a specific endianness (a kind of endian_cast).
That's not unlike Beman's approach in that every access does the conversion, right? Doing that with a view is nice.
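To make the "view" idea concrete, here is a minimal sketch of what such a class could look like. It is not the sandbox implementation: the class layout, the load and store helpers, and the restriction to unsigned integer types are assumptions made for illustration. The view holds a reference to a native integer and converts on every read and write, so the referenced storage always stays in the stated byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical namespace, not the sandbox library

struct big {};
struct little {};

// Assemble an unsigned integer from bytes stored most significant first.
template <typename T>
T load(const unsigned char* p, big) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v = static_cast<T>((v << 8) | p[i]);
    return v;
}

// Assemble an unsigned integer from bytes stored least significant first.
template <typename T>
T load(const unsigned char* p, little) {
    T v = 0;
    for (std::size_t i = sizeof(T); i > 0; --i)
        v = static_cast<T>((v << 8) | p[i - 1]);
    return v;
}

// Write an unsigned integer most significant byte first.
template <typename T>
void store(unsigned char* p, T v, big) {
    for (std::size_t i = sizeof(T); i > 0; --i) {
        p[i - 1] = static_cast<unsigned char>(v & 0xFF);
        v = static_cast<T>(v >> 8);
    }
}

// Write an unsigned integer least significant byte first.
template <typename T>
void store(unsigned char* p, T v, little) {
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        p[i] = static_cast<unsigned char>(v & 0xFF);
        v = static_cast<T>(v >> 8);
    }
}

// Hypothetical endian_view: refers to a native unsigned integer whose bytes
// are laid out in the order named by Endian, and converts on each access.
template <typename Endian, typename T>
class endian_view {
public:
    explicit endian_view(T& ref) : ref_(ref) {}

    // Read: interpret the referenced bytes in the Endian order.
    operator T() const {
        unsigned char bytes[sizeof(T)];
        std::memcpy(bytes, &ref_, sizeof(T));
        return load<T>(bytes, Endian());
    }

    // Write: store the host value into the referenced bytes in the Endian order.
    endian_view& operator=(T value) {
        unsigned char bytes[sizeof(T)];
        store(bytes, value, Endian());
        std::memcpy(&ref_, bytes, sizeof(T));
        return *this;
    }

private:
    T& ref_;
};

}  // namespace sketch
```

Because the conversion happens at each access, this style suits code that touches only a few fields of a message; converting the whole structure in place is likely the better fit when most fields are used repeatedly.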
* And last, I have added on top the requested 'in place conversion' functions.
The last two features work with endian anaware data. I could add pure conversions if needed on endian anaware data.
I presume by "anaware" you mean "unaware." What do you mean by "pure conversions?" Do you mean the lower level, function-based interface?

If I understand your proposal, you're trying to avoid the need to define a duplicate structure, just to express the source endianness, by instead applying a convert_to/from with a mapping describing the endianness of each field (or all fields?) of the data and the desired order, to convert the data. Is that right?

Obviously, the scaffolding required by your approach must be compared against the parallel definition to see which is clearer, more functional, and performs better.

I'd like to see Tomas provide the same behavior using his approach and then we can compare them directly. We'll also want to examine the performance differences, if any.

_____
Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

----- Original Message -----
From: "Stewart, Robert" <Robert.Stewart@sig.com>
To: <boost@lists.boost.org>
Sent: Friday, June 11, 2010 1:13 PM
Subject: Re: [boost] [endain_ext] Beman's Integer.Endian extensions to work with endian unaware data
vicente.botet wrote:
I have implemented a prototype that allows in-place endian conversions of endian-unaware data, relative to an endian point of view (domain). The point of view is represented by a map from the native data types to an endian tree, which has MPL sequences as nodes and whose leaves are the endianness of the integer types.
I have no idea what that means. (Having now examined the example you provided below, I have a better idea, but as expressed it doesn't tell me much.)
This was not my intention. I'll try to do better next time.
// definition of a point of view (domain)
struct network {};
// mapping of native types to an endian tree for the network point of view (domain) namespace boost { namespace endian { template <> struct domain_map <network, X::big_c> { typedef mpl::vector<big,big> type; }; template <> struct domain_map <network, X::little_c> { typedef mpl::vector<little,little> type; };
}}
I have no idea what that means. (Having now studied the rest of the example, I understand the purpose for the above, but that doesn't mean I really understand it.)
I think that the use of the word "network" was not adequate. Just replace "network" by "ifA". This means that, relative to the interface "ifA", the structure X::big_c has its two fields in big-endian format, and the structure X::little_c has its two fields in little-endian format. Note that the integers used are not endian aware.
// view of the structures as fusion sequences BOOST_FUSION_ADAPT_STRUCT(X::big_c, (uint32_t, a) (uint16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::little_c, (int32_t, a) (int16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::mixed_c, (X::big_c, a) (X::little_c, b) )
The type duplication is unfortunate, but that was necessary in Beman's approach, too.
You mean Tom's approach? The difference is that Tom used a different macro, so an application that needs to view the structure as a Fusion sequence and also to define Tom's swap function has to declare both.
int main( int argc, char * argv[] ) {
X::mixed_c m; // fill with data from the 'network' domain (emulating the reception from the domain 'network') m.a.a=0x01020304; m.a.b=0x0A0B; m.b.a=0x04030201; m.b.b=0x0B0A;
This is just putting raw data, of assumed endianness, into the structures, right?
Yes. Sorry if the code comment was not clear.
// convert from the network to the native domain endian::convert_from<network>(m);
Here you have declared that the current data in m is in network order and requested that it be converted to host order, right?
Just a precision. Convert from *the point of view given by the domain map "network" (or "ifA")* to the native endian. I repeat: the use of the "network" point of view led some of you to think that it was big-endian, but it was not big-endian in all cases.
That's no different, other than implementation details, than what Tomas originally provided, IIUC.
Tom's swap_in_place forces all the fields of the structure to be interpreted with the same endianness.
// convert from the native to the network domain endian::convert_from<network>(m);
^^^^ to?
Yes.
The library could define three predefined maps for the domains endian::native, endian::big and endian::little, which map every leaf to the corresponding endian type.
That's an appropriate step, given the commonality of those options.
I have not seen the generated assembler, but the design is there to do nothing when the in-place conversion concerns the same endianness.
That's a necessity.
I have verified and the generated code is really nothing. Here is the code generated (gcc-4.4 -O3) to do the conversion from big to little + the initialization of the variable.

    movl $67305985, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+3
    movl %eax, %edx
    shrl $8, %edx
    movb %dl, __ZN12_GLOBAL__N_11mE+2
    movl %eax, %edx
    shrl $16, %edx
    movb %dl, __ZN12_GLOBAL__N_11mE+1
    shrl $24, %eax
    movb %al, __ZN12_GLOBAL__N_11mE
    movl $2826, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+5
    shrl $8, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+4

instead of, without the conversion:

    movl $67305985, __ZN12_GLOBAL__N_11mE
    movw $2826, __ZN12_GLOBAL__N_11mE+4

I don't understand this code, but IMO the code has been completely inlined. This code corresponds to the in-place conversion of the big-endian part, that is, an integer and a short.
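As a reading aid for the listing, the two immediate constants are 67305985 = 0x04030201 and 2826 = 0x0B0A, and the movb/shrl pairs write the low byte of each value to the highest offset, which is a byte reversal done one byte at a time. The following sketch mirrors that store sequence in C++; the function names store_reversed32 and store_reversed16 are hypothetical and are not part of the library or of the compiler output.

```cpp
#include <cassert>
#include <cstdint>

// The decimal immediates in the listing, checked against their hex forms.
static_assert(67305985u == 0x04030201u, "first immediate is 0x04030201");
static_assert(2826u == 0x0B0Au, "second immediate is 0x0B0A");

// Mirror of the 32-bit part: low byte goes to offset +3, high byte to offset +0.
inline void store_reversed32(unsigned char* dst, std::uint32_t value) {
    dst[3] = static_cast<unsigned char>(value);        // movb %al, m+3
    dst[2] = static_cast<unsigned char>(value >> 8);   // shrl $8,  movb %dl, m+2
    dst[1] = static_cast<unsigned char>(value >> 16);  // shrl $16, movb %dl, m+1
    dst[0] = static_cast<unsigned char>(value >> 24);  // shrl $24, movb %al, m
}

// Mirror of the 16-bit part: low byte goes to offset +1, high byte to offset +0.
inline void store_reversed16(unsigned char* dst, std::uint16_t value) {
    dst[1] = static_cast<unsigned char>(value);        // movb %al, m+5
    dst[0] = static_cast<unsigned char>(value >> 8);   // shrl $8, movb %al, m+4
}
```

Applied to a six-byte buffer with the two constants above, this leaves the bytes 04 03 02 01 0B 0A in memory, whereas the two-instruction version without the conversion stores the same values in the host's own byte order.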
The library is based on Beman's.
* I have split the endian class into two levels:
  - endian_pack
  - endian
* I have replaced the enum class endianness with an endianness tag (see above).
Tomas already noted that he didn't like the use of enums, so you're on the same page there.
* I have added an endian_view class that is a reference to a native type viewed with a specific endianness (a kind of endian_cast).
That's not unlike Beman's approach in that every access does the conversion, right? Doing that with a view is nice.
* And last, I have added on top the requested 'in place conversion' functions.
The last two features work with endian anaware data. I could add pure conversions if needed on endian anaware data.
I presume by "anaware" you mean "unaware."
Yes
What do you mean by "pure conversions?" Do you mean the lower level, function-based interface?
Yes. Not in-place.
If I understand your proposal, you're trying to avoid the need to define a duplicate structure, just to express the source endianness,
I'm trying to respond to the conversion in-place need expressed in this list.
by instead applying a convert_to/from with a mapping describing the endianness of each field (or all fields?) of the data and the desired order, to convert the data. Is that right?
Yes.
Obviously, the scaffolding required by your approach must be compared against the parallel definition to see which is clearer, more functional, and performs better.
The single thing I have added is the map of the types to the endianness.
I'd like to see Tomas provide the same behavior using his approach and then we can compare them directly.
Me too. The advantage of reusing Beman's library is that we already have aligned/unaligned endian-aware types, with and without arithmetic operations, as well as in-place and functional conversion for aligned endian-unaware types.
We'll also want to examine the performance differences, if any.
Maybe the drawback will be the performance, who knows?

Best,
Vicente

P.S. I have updated the sandbox with a bugfix.

vicente.botet wrote:
Rob Stewart wrote:
vicente.botet wrote:
// definition of a point of view (domain) struct network {};
// mapping of native types to an endian tree for the network point of view (domain) namespace boost { namespace endian { template <> struct domain_map <network, X::big_c> { typedef mpl::vector<big,big> type; }; template <> struct domain_map <network, X::little_c> { typedef mpl::vector<little,little> type; };
}}
I have no idea what that means. (Having now studied the rest of the example, I understand the purpose for the above, but that doesn't mean I really understand it.)
I think that the use of the word "network" was not adequate. Just replace "network" by "ifA". This means that, relative to the interface "ifA", the structure X::big_c has its two fields in big-endian format, and the structure X::little_c has its two fields in little-endian format. Note that the integers used are not endian aware.
The MPL vectors are, IIUC, a sequential list of the endiannesses of each field in the domain_map's second template parameter, right? I don't quite get the "relative to the interface" part. It would seem more reasonable to declare how to convert each field's endianness to the first template parameter's endianness or else to simply declare each field's endianness (more below).
// view of the structures as fusion sequences BOOST_FUSION_ADAPT_STRUCT(X::big_c, (uint32_t, a) (uint16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::little_c, (int32_t, a) (int16_t, b) ) BOOST_FUSION_ADAPT_STRUCT(X::mixed_c, (X::big_c, a) (X::little_c, b) )
The type duplication is unfortunate, but that was necessary in Beman's approach, too.
You mean Tom's approach?
No, I did mean Beman's approach, which necessitates declaring a duplicate structure of endian types when you need or are given a native type structure (such as an OS or RTL structure) for an API. (struct tm, for example, would require a duplicate, endian-type-based structure in Beman's approach.)
The difference is that Tom used a different macro, so an application that needs to view the structure as a Fusion sequence and also to define Tom's swap function has to declare both.
I've totally lost track of how Tomas' worked, hence my wishing to see the same example in both approaches.
X::mixed_c m; // fill with data from the 'network' domain (emulating the reception from the domain 'network') m.a.a=0x01020304; m.a.b=0x0A0B; m.b.a=0x04030201; m.b.b=0x0B0A;
This is just putting raw data, of assumed endianness, into the structures, right?
Yes. Sorry if the code comment was not clear.
I was just trying to confirm what I understood you to write and the meaning of that part of the code, so there was no confusion as we progressed through the example.
// convert from the network to the native domain endian::convert_from<network>(m);
Here you have declared that the current data in m is in network order and requested that it be converted to host order, right?
Just a precision. Convert from *the point of view given by the domain map "network" (or "ifA")* to the native endian. I repeat: the use of the "network" point of view led some of you to think that it was big-endian, but it was not big-endian in all cases.
So, this code, combined with the domain_map specialization above says that, m.a is big endian relative to "network" (or "ifA" if you had written endian::convert_from<ifA>(m)) and that "network" somehow implies a mapping to host order. I don't quite get the point of that relativity.

If your domain_map were, instead, an endianness_map that merely noted the endianness of each field, then the foregoing would make more sense. I mean something like this:

    template <> struct endianness_map<X::big_c> {
        typedef mpl::vector<big,big> type;
    };

That would enable the following invocation, unless I'm mistaken:

    endian::convert_from(m);

Here, "convert_from" implies from the endianness of "m" which is described by the endianness_map<X::big_c> specialization, to host order (implicit in the name). At that point, it seems that "to_host" would be clearer:

    endian::to_host(m);

Isn't that possible and more straightforward?

It would be nice if the BOOST_FUSION_ADAPT_STRUCT functionality could be reused to get the following syntax:

    BOOST_ENDIAN_ADAPT_STRUCT(X::big_c, (uint32_t, big, a) (uint16_t, big, b) )
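The following is a minimal sketch of how an endianness_map keyed only on the type could drive a to_host call. It is only an illustration of the suggestion, not code from either library: the for_each_field member is a hand-written stand-in for what the Fusion adaptation and the MPL vector would provide, and the names sketch, swap_if_foreign, host_is_little and from_host are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

namespace X {
struct big_c { std::uint32_t a; std::uint16_t b; };
}

namespace sketch {  // hypothetical namespace

struct big {};
struct little {};

// Detect the host byte order by inspecting the first byte of the value 1.
inline bool host_is_little() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

inline bool is_little(big) { return false; }
inline bool is_little(little) { return true; }

// Visitor that reverses a field's bytes only when its declared endianness
// differs from the host's.
struct swap_if_foreign {
    template <typename T, typename Endian>
    void operator()(T& field, Endian tag) const {
        if (is_little(tag) == host_is_little()) return;
        unsigned char bytes[sizeof(T)];
        std::memcpy(bytes, &field, sizeof(T));
        for (std::size_t i = 0; i < sizeof(T) / 2; ++i)
            std::swap(bytes[i], bytes[sizeof(T) - 1 - i]);
        std::memcpy(&field, bytes, sizeof(T));
    }
};

// One endianness description per C++ type; primary template left undefined.
template <typename T> struct endianness_map;

// Both fields of X::big_c are declared big endian.
template <> struct endianness_map<X::big_c> {
    template <typename F>
    static void for_each_field(X::big_c& v, F f) {
        f(v.a, big());
        f(v.b, big());
    }
};

// Convert every field from its declared endianness to host order.
template <typename T>
void to_host(T& v) {
    endianness_map<T>::for_each_field(v, swap_if_foreign());
}

// Convert every field from host order back to its declared endianness.
template <typename T>
void from_host(T& v) {
    endianness_map<T>::for_each_field(v, swap_if_foreign());
}

}  // namespace sketch
```

The call site needs no domain argument, which is the simplification being proposed; the cost is that each C++ type can carry only one endianness description.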
That's no different, other than implementation details, than what Tomas originally provided, IIUC.
Tom's swap_in_place forces all the fields of the structure to be interpreted with the same endianness.
Ah, right.
Here is the code generated (gcc-4.4 -O3) to do the conversion from big to little + the initialization of the variable.

    movl $67305985, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+3
    movl %eax, %edx
    shrl $8, %edx
    movb %dl, __ZN12_GLOBAL__N_11mE+2
    movl %eax, %edx
    shrl $16, %edx
    movb %dl, __ZN12_GLOBAL__N_11mE+1
    shrl $24, %eax
    movb %al, __ZN12_GLOBAL__N_11mE
    movl $2826, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+5
    shrl $8, %eax
    movb %al, __ZN12_GLOBAL__N_11mE+4

instead of, without the conversion:

    movl $67305985, __ZN12_GLOBAL__N_11mE
    movw $2826, __ZN12_GLOBAL__N_11mE+4

I don't understand this code, but IMO the code has been completely inlined.
I'm not certain whether that is optimal, but I don't know what the assembly should be for the necessary operations in this case. Perhaps generating the same conversion using an intrinsic and comparing the generated assembly would help.

_____
Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com

From: "Stewart, Robert" <Robert.Stewart@sig.com>
To: <boost@lists.boost.org>
Sent: Monday, June 14, 2010 1:42 PM
Subject: Re: [boost] [endain_ext] Beman's Integer.Endian extensions to work with endian unaware data
vicente.botet wrote:
Rob Stewart wrote:
vicente.botet wrote:
The MPL vectors are, IIUC, a sequential list of the endiannesses of each field in the domain_map's second template parameter, right?
Yes, it can be. But note also that when you have embedded structures the MPL vector can contain an endianness or an MPL vector of endiannesses, i.e. it forms a tree.
I don't quite get the "relative to the interface" part. It would seem more reasonable to declare how to convert each field's endianness to the first template parameter's endianness or else to simply declare each field's endianness (more below).
Maybe I'm overgeneralizing, but I don't think so. The same C++ type can have different endianness maps relative to two interfaces. For example, a relay receives a message from one normalized interface, changes something specific, and relays it through another internal interface. The endianness used for the normalized and the internal interfaces need not be the same. The domain map is used to manage this.
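A minimal sketch of the relay scenario, to show why the map is keyed on both the domain and the type. Everything here is an assumption made for illustration: the domains normalized and internal, the structure app::message, the use of std::tuple in place of an MPL vector, and the hand-written field traversal in place of the Fusion adaptation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>
#include <utility>

namespace app {
struct message { std::uint32_t id; std::uint16_t len; };  // hypothetical message
}

namespace sketch {  // hypothetical namespace

struct big {};
struct little {};
struct normalized {};  // hypothetical domain: the external, normalized interface
struct internal {};    // hypothetical domain: the internal interface

// Detect the host byte order by inspecting the first byte of the value 1.
inline bool host_is_little() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

inline bool is_little(big) { return false; }
inline bool is_little(little) { return true; }

// Reverse a field's bytes only when its endianness differs from the host's.
template <typename T, typename Endian>
void swap_if_foreign(T& field, Endian tag) {
    if (is_little(tag) == host_is_little()) return;
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &field, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T) / 2; ++i)
        std::swap(bytes[i], bytes[sizeof(T) - 1 - i]);
    std::memcpy(&field, bytes, sizeof(T));
}

// The same C++ type gets one endianness list per domain.
template <typename Domain, typename T> struct domain_map;
template <> struct domain_map<normalized, app::message> {
    typedef std::tuple<big, big> type;     // id and len are big endian
};
template <> struct domain_map<internal, app::message> {
    typedef std::tuple<little, big> type;  // id is little endian, len is big endian
};

// Convert each field from the domain's endianness to host order.
template <typename Domain>
void convert_from(app::message& m) {
    typedef typename domain_map<Domain, app::message>::type tags;
    swap_if_foreign(m.id, typename std::tuple_element<0, tags>::type());
    swap_if_foreign(m.len, typename std::tuple_element<1, tags>::type());
}

// Convert each field from host order to the domain's endianness.
template <typename Domain>
void convert_to(app::message& m) {
    convert_from<Domain>(m);  // a byte swap is its own inverse
}

}  // namespace sketch
```

A relay would call sketch::convert_from<sketch::normalized>(m), modify m in host order, and then call sketch::convert_to<sketch::internal>(m), all on the same app::message object and without a second structure definition.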
The type duplication is unfortunate, but that was necessary in Beman's approach, too.
You mean Tom's approach?
No, I did mean Beman's approach, which necessitates declaring a duplicate structure of endian types when you need or are given a native type structure (such as an OS or RTL structure) for an API. (struct tm, for example, would require a duplicate, endian-type-based structure in Beman's approach.)
This could be the case, but it is not always the case, as I have tried to explain in other posts, since not all applications need to transform the message to the native format. Some will just copy some data to local variables and specific contexts that don't share the same structure at all.
The difference is that Tom used a different macro, so an application that needs to view the structure as a Fusion sequence and also to define Tom's swap function has to declare both.
I've totally lost track of how Tomas' worked, hence my wishing to see the same example in both approaches.
You can jump to the first post, which includes Tom's approach to handling swap in place for structures.
// convert from the network to the native domain endian::convert_from<network>(m);
Here you have declared that the current data in m is in network order and requested that it be converted to host order, right?
Just a precision. Convert from *the point of view given by the domain map "network" (or "ifA")* to the native endian. I repeat: the use of the "network" point of view led some of you to think that it was big-endian, but it was not big-endian in all cases.
So, this code, combined with the domain_map specialization above says that, m.a is big endian relative to "network" (or "ifA" if you had written endian::convert_from<ifA>(m)) and that "network" somehow implies a mapping to host order. I don't quite get the point of that relativity.
See above (two interfaces)
If your domain_map were, instead, an endianness_map that merely noted the endianness of each field, then the foregoing would make more sense. I mean something like this:
template <> struct endianness_map<X::big_c> { typedef mpl::vector<big,big> type; };
This will always be possible if a C++ structure can have just one endianness, but this will force you to define two structures with the same native format but different endianness, and I think that you want to avoid this duplication.
That would enable the following invocation, unless I'm mistaken:
endian::convert_from(m);
Here, "convert_from" implies from the endianness of "m" which is described by the endianness_map<X::big_c> specialization, to host order (implicit in the name). At that point, it seems that "to_host" would be clearer:
endian::to_host(m);
Isn't that possible and more straightforward?
This could be possible, as long as we restrict ourselves to one endianness per C++ type.
It would be nice if the BOOST_FUSION_ADAPT_STRUCT functionality could be reused to get the following syntax:
BOOST_ENDIAN_ADAPT_STRUCT(X::big_c, (uint32_t, big, a) (uint16_t, big, b) )
Yes, if a single endianness is the common case, this could be useful.
That's no different, other than implementation details, than what Tomas originally provided, IIUC.
Tom's swap_in_place forces all the fields of the structure to be interpreted with the same endianness.
Ah, right.
Here is the code generated (gcc-4.4 -O3) to do the conversion from big to little + the initialization of the variable.
I don't understand this code, but IMO the code has been completely inlined.
I have made the test using htons/l and the generated assembler is identical :)

    integer::convert_from<network>(m.a);
    //~ m.a.a=htonl(m.a.a);
    //~ m.a.b=htons(m.a.b);

Best,
Vicente
participants (2)
-
Stewart, Robert
-
vicente.botet