Re: [boost] [boost::endian] Request for comments/interest

30 May 2010

Terry,
...
Since IP packets cannot be 10GB, I submit that you're going to have to break 
your 10GB array down into messages.
Thank you for your continued feedback.  You have raised some interesting 
points and issues.  Please see my comments inline.

First of all, note that I was careful to say "send the data to an external 
device", but I believe that you are thinking about the problem purely from the 
point of view of networking; at least, your message seems to imply so in this 
case.

I am not going to have to break my message into packets.  And *even* if the 
message needs to be broken into packets, it will not be done by me, but by the 
OS.  I will just call write()/WriteFile() or whatever with the data I have 
available.  I am not going to break it up into packets ahead of time.
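The following is a minimal POSIX sketch of handing an entire buffer to the OS. `write_all` is a hypothetical helper, not part of any library. Its only loop handles partial writes, which `write()` is allowed to return. It does not split the data into packets; any packetization happens below this call. On Windows, `WriteFile()` would play the same role.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>  // POSIX write(), read(), pipe(), close()

// Hypothetical helper: give the whole buffer to the OS.
// write() may accept fewer bytes than requested (pipes, sockets), so we
// simply continue from where it stopped; we never pre-split into packets.
// Returns true once every byte has been accepted, false on a real error.
bool write_all(int fd, const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    const char* p = static_cast<const char*>(data);
    while (len > 0) {
        ssize_t n = ::write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // genuine I/O error
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```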
...
boost::array<endian<little, uint32_t>, MaxFragmentSize> buffer;
That you copy fragments of the 10GB array into before sending, and then, on 
the receiving side, copy them out.
The user on either side of the interface can extract the data from the 
fields without knowing the endianness of the field or the endianness of the 
machine he's working on.
He doesn't have to know to call a swap function.  He just extracts the data 
using the standard copy algorithm.  The conversion happens automatically by 
implicit conversions.
One copy into each message.  One copy out.  What could be better than that?
Sorry for the overquote but I wanted to make sure we didn't lose the context 
in this particular case.
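The typed approach quoted above can be sketched roughly as follows, under some assumptions:

- `little_u32` is a hypothetical stand-in for `endian<little, uint32_t>`, not the actual Boost.Endian type.
- `std::array` replaces `boost::array`.
- `MaxFragmentSize` is set to an arbitrary small value.

Each `std::copy` is the single copy in or out. The byte-order conversion happens inside the element type's assignment and conversion operators.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for endian<little, uint32_t>: always stores its
// value as four little-endian bytes, whatever the host byte order.
class little_u32 {
    unsigned char bytes_[4];
public:
    little_u32() : bytes_{0, 0, 0, 0} {}
    little_u32(std::uint32_t v) { *this = v; }
    little_u32& operator=(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            bytes_[i] = static_cast<unsigned char>(v >> (8 * i));  // LSB first
        return *this;
    }
    // Implicit conversion back to a native integer.
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes_[i]) << (8 * i);
        return v;
    }
};

constexpr std::size_t MaxFragmentSize = 4;  // hypothetical fragment length
using fragment = std::array<little_u32, MaxFragmentSize>;

// Sender: one copy in; each element is converted as it is assigned.
void pack(const std::uint32_t* src, fragment& out) {
    std::copy(src, src + MaxFragmentSize, out.begin());
}

// Receiver: one copy out; implicit conversion back to native uint32_t.
void unpack(const fragment& in, std::uint32_t* dst) {
    std::copy(in.begin(), in.end(), dst);
}
```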

Better than 1 copy in/out is 0 copies in/out.  Here's how:

I memory map a file.  Now the data is in memory.
Alternatively, I allocate a buffer and read from a disk or from a network.  
Note that the OS needs to make a copy to get the data into the user-space 
buffer.  At this point, ideally, I should be able to start using the data.

If I understand your suggestion correctly, you would, at this point, construct 
a collection of endian types in place in this buffer, allocate a new buffer 
and copy the data out to it, during which the swapping would happen.

If this is correct, I think there are several problems with this approach:
 - this may not seem relevant, but I think this is really ugly and much less 
maintainable than the functional approach.
 - I can do a swap_in_place<>() on the original buffer.  0 copies.  0 work in 
the case when the endianness is already correct.  
 - On the other hand, you have to allocate a new buffer, placement-new all the 
endian types and perform the copy.  Cost: an allocation plus at least 2N 
operations in either case, not to mention the other bad side effects of 
unnecessary work, which I already detailed in another post.
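The zero-copy alternative can be sketched as follows. It assumes the big-endian data has already been read or memory-mapped into a suitably aligned array of 32-bit words. `swap_in_place_big_to_machine` is a hypothetical, non-template stand-in for the `swap_in_place<>()` discussed here. The runtime endianness probe could be replaced by `std::endian` in C++20. On a big-endian host the function returns immediately, so no work is done when the byte order is already correct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Runtime host-endianness check (C++20 code could use std::endian instead).
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical swap_in_place for big-endian 32-bit words: converts the
// existing buffer to host order without allocating or copying elsewhere.
inline void swap_in_place_big_to_machine(std::uint32_t* words, std::size_t n) {
    assert(words != nullptr || n == 0);
    if (!host_is_little_endian()) return;  // already in host order: 0 work
    for (std::size_t i = 0; i < n; ++i)
        words[i] = byteswap32(words[i]);
}
```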
...
then field alignment isn't an issue.
Correct, but it may affect the quality of the code the compiler can generate.  
I believe my approach doesn't suffer from this problem.
...
Doesn't swap_in_place<>() make the same assumption of overlaying types?
No, since the value just gets written back to the same type and location.  
The only assumption swap_in_place<>() makes is that a swapped value is again 
representable in the original type.  And yes, I will grant you that this is a 
non-trivial assumption: as others have pointed out, it may not hold for 
floating-point values or even pointers on some machines.
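Where the swapped value may not be representable, a common workaround is to reverse the bytes in an unsigned integer. The result is reinterpreted as a float only at the end, via `memcpy`. The reason: a byte-reversed float pattern can, for example, be a signaling NaN, which may not survive a trip through a floating-point register unchanged on some platforms.

A minimal sketch, assuming 32-bit IEEE-754 floats whose byte order matches that of `uint32_t` on the host. That holds on common platforms but is an assumption here. `load_big_endian_float` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 float");

// Hypothetical helper: decode a big-endian float from raw wire bytes.
// The byte order is resolved on an integer, never on a float object.
inline float load_big_endian_float(const unsigned char* p) {
    assert(p != nullptr);
    const std::uint32_t bits = (std::uint32_t(p[0]) << 24) |
                               (std::uint32_t(p[1]) << 16) |
                               (std::uint32_t(p[2]) << 8) |
                               std::uint32_t(p[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret only the final bits
    return f;
}
```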
...
In the message-based interfaces that I am used to, one always must copy some 
data structures into a message before you send it.
But, as I pointed out, we are not just talking about network protocols.
...
In both techniques you have to copy the information out of the message, if 
you use it, at least one time.  The problem with the swapping mechanism is 
that the swap requires a write and a read from every location,
This is not necessarily true.
It may be the case with swap_in_place<>(), but not necessarily with swap<>().

However, while I have agreed with you that some people might find the endian 
types useful, I have to take exception to your claim above that you always 
need to swap everything.  That is simply not true!  I can, just like you, do 
the following:

int i = swap<big_to_machine>(s.i);

Actually, I personally find this code rather readable and, in many respects, 
more instructive than the following:

int i = s.i;

where s.i happens to be an endian type.  I would go as far as to argue that my 
code is much more self-documenting and would lead to fewer surprises for a 
programmer not familiar with your code.
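For comparison, the two ways of reading a single field can be put side by side. `big_to_machine` and `big_u32` are hypothetical stand-ins for `swap<big_to_machine>()` and a big-endian endian type. The field is `std::uint32_t` rather than `int` to keep the byte manipulation well defined. In both versions only the field actually read is converted; the difference is whether the conversion is spelled out at the call site or hidden in an implicit operator.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical functional helper in the role of swap<big_to_machine>():
// interprets a uint32_t whose bytes came straight from a big-endian buffer.
inline std::uint32_t big_to_machine(std::uint32_t raw) {
    unsigned char b[4];
    std::memcpy(b, &raw, 4);  // the bytes exactly as they sit in memory
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Hypothetical typed field in the spirit of an endian<big, uint32_t> type.
struct big_u32 {
    std::uint32_t raw;  // stored in big-endian byte order
    operator std::uint32_t() const { return big_to_machine(raw); }
};

struct msg_functional { std::uint32_t i; };  // byte order implied by protocol
struct msg_typed      { big_u32 i; };        // byte order carried by the type

// Functional style: the conversion is visible at the point of use.
inline std::uint32_t read_functional(const msg_functional& s) {
    return big_to_machine(s.i);
}

// Typed style: the conversion happens in the implicit operator.
inline std::uint32_t read_typed(const msg_typed& s) {
    return s.i;
}
```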
...
With the typed-approach you only pay for the message fields that you read.
And equally with the functional approach.
...
No extra work is required on native-endian machines.
But I think I've demonstrated that there is actually a significant amount of 
extra work required even on native-endian machines.
...
I think the typed-approach actually fits the "only pay for what you use" 
mantra better.
Disagree.
...
I get the impression that I'm missing something.  If you're game, I'd like 
to consider a real-world use-case that uses multiple endians and has 
different protocol layers.
Of course.  I like the idea of actual use cases.
...
We're only considering byte-ordering here too.  An equally important part of 
the endian problem for me is the bit-ordering.  For this I use a similar 
technique for portable bitfields:
bitfield<endian_t, w1, w2, w3, w4, w5, ...>
I am not sure what the above means, sorry.
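The `bitfield<endian_t, w1, w2, ...>` template is not defined anywhere in this thread. The sketch below does not attempt to reproduce it. It only illustrates the underlying bit-ordering problem: extracting fields numbered from the most significant bit, as many wire formats specify, using shifts and masks instead of native C++ bit-fields, whose layout is implementation-defined. `get_bits_msb_first` is a hypothetical name, and the word is assumed to be in host order already.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract `width` bits starting `offset` bits from the
// most significant end of a 32-bit word that is already in host order.
// Shifts and masks act on values, so the result does not depend on how a
// compiler would lay out native bit-fields.
inline std::uint32_t get_bits_msb_first(std::uint32_t word, unsigned offset,
                                        unsigned width) {
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    const unsigned shift = 32 - offset - width;
    const std::uint32_t mask =
        (width == 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    return (word >> shift) & mask;
}
```

For example, in the first word of an IPv4 header, 0x45000054, the version is the first 4 bits (4), the header length the next 4 bits (5), and the total length the last 16 bits (0x54).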
...
I'm arguing against swapping, though, because I've been using the type-based 
method (but not Beman's exactly) successfully for a long time.  I'm very 
biased.  :o).
This has been very useful.  Thank you. 

Tom

Tomas Puverle