[asio] skipping data in tcp stream

Hi! We have a TCP connection from which we read messages. Each message consists of a fixed-length header followed by a variable-length body. The header contains the body length and a CRC used to check that the header is not corrupted. Reading is done in two asynchronous steps: first the header is read, then memory for the body is allocated and the body is read. My questions:
1) If the header is corrupted (the header CRC says the body length could be wrong), then as I understand it we must skip the whole body and read the next message. How do I skip it? async_read_until seemed to be the solution, but the manual says it may read surplus data into the streambuf. That data would be hard to deal with, because I read directly from the socket into the allocated memory (as described above).
2) Can data in a TCP stream actually be corrupted? I guess yes, so should the header contain the body length plus CRCs for both the body and the header? Is such a solution overkill?
3) Am I making some global design mistake?
Thank you!

1) If the header is corrupted (the header CRC says the body length could be wrong), then as I understand it we must skip the whole body and read the next message. How do I skip it? async_read_until seemed to be the solution, but the manual says it may read surplus data into the streambuf. That data would be hard to deal with, because I read directly from the socket into the allocated memory (as described above).
How do you know the number of bytes you have to skip? If you determine that your header is corrupted, the length info can't be trusted any more. You'd have to employ some framing method for your packets so you can reliably detect the next packet start. In this case you shouldn't skip a given number of bytes, but rather skip until a new frame start can be detected.
2) Can data in a TCP stream actually be corrupted? I guess yes, so should the header contain the body length plus CRCs for both the body and the header? Is such a solution overkill?
It's theoretically possible, but very unlikely. Note that TCP/IP employs checksums itself and should retransmit any packet which is detected as faulty. IIRC it's only a 16-bit checksum, so if you threw completely mangled data at it, statistically one in every 65536 packets would make it through the checksum. Note that you'd need a very unreliable medium for that to be a concern, one that would make normal communication next to impossible.
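To make the "only 16 bits" point concrete, here is a minimal sketch of the one's-complement checksum described in RFC 1071, which is the algorithm TCP uses. The function name internet_checksum is made up for this illustration, and the sketch sums only the bytes it is given (real TCP also covers a pseudo-header), so treat it as an illustration rather than a drop-in implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit one's-complement checksum in the style of RFC 1071.
// Assumption: only `data` is summed; the TCP pseudo-header is left out.
inline std::uint16_t internet_checksum(const std::vector<std::uint8_t>& data) {
    std::uint64_t sum = 0;
    std::size_t i = 0;
    // Add the data as a sequence of big-endian 16-bit words.
    for (; i + 1 < data.size(); i += 2)
        sum += (std::uint64_t(data[i]) << 8) | data[i + 1];
    // An odd trailing byte is treated as if padded with a zero byte.
    if (i < data.size())
        sum += std::uint64_t(data[i]) << 8;
    // Fold the carries back into the low 16 bits.
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}
```

Because the checksum is a plain sum, some corruptions are invisible to it: swapping two aligned 16-bit words in the input, for example, leaves the result unchanged.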
3) Am I making some global design mistake?
Before you delve too deeply into checksum protection for your data, you should check what TCP/IP already has to offer. Do you have an analysis of your expected error pattern (bit errors, dropped bytes, erased bits, burst errors)? What's the acceptable error rate of the data you transfer? What's the error rate of your TCP/IP channel?

How do you know the number of bytes you have to skip? If you determine that your header is corrupted, the length info can't be trusted any more.
You'd have to employ some framing method for your packets so you can reliably detect the next packet start. In this case you shouldn't skip a given number of bytes, but rather skip until a new frame start can be detected.
I was thinking about it and got an idea: I will read data until a byte with value 0 is detected, and then try to read a header. If that header is corrupted (we are probably in the middle of a message body), I try again and again until a valid header is read.
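A rough sketch of that resynchronisation idea, working on bytes that have already been received into a buffer. The 4-byte header layout, the header_check function (a stand-in for a real header CRC) and all names below are assumptions made up for this example; they are not from the actual protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 4-byte header layout, chosen only for this sketch:
//   [0] 0x00 start marker, [1..2] body length (big-endian), [3] header check.
constexpr std::size_t kHeaderSize = 4;
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Stand-in for a real header CRC; a real protocol would use e.g. CRC-8 or CRC-16.
inline std::uint8_t header_check(std::uint8_t len_hi, std::uint8_t len_lo) {
    return static_cast<std::uint8_t>(len_hi ^ len_lo ^ 0x5A);
}

// Returns the offset of the first header at or after `from` whose check byte
// validates, or kNotFound if the buffer holds no complete valid header yet.
inline std::size_t find_next_header(const std::vector<std::uint8_t>& buf,
                                    std::size_t from) {
    for (std::size_t i = from; i + kHeaderSize <= buf.size(); ++i) {
        if (buf[i] != 0x00)
            continue;  // not a start marker
        if (buf[i + 3] == header_check(buf[i + 1], buf[i + 2]))
            return i;  // marker followed by a header that validates
        // Otherwise the 0 byte was probably body data: keep scanning.
    }
    return kNotFound;
}
```

Note that a 0 byte followed by bytes that happen to pass the check can also occur inside a body, so a weak header check gives false positives. The stronger the header check, the more reliable the resynchronisation.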
It's theoretically possible, but very unlikely. Note that TCP/IP employs checksums itself and should retransmit any packet which is detected as faulty. IIRC it's only a 16-bit checksum, so if you threw completely mangled data at it, statistically one in every 65536 packets would make it through the checksum. Note that you'd need a very unreliable medium for that to be a concern, one that would make normal communication next to impossible.
Before you delve too deeply into checksum protection for your data, you should check what TCP/IP already has to offer. Do you have an analysis of your expected error pattern (bit errors, dropped bytes, erased bits, burst errors)? What's the acceptable error rate of the data you transfer? What's the error rate of your TCP/IP channel?
I guess the connections will be very different: standard wired connections, GPRS, 3G, Wi-Fi... The aim is to provide maximum reliability at minimal cost. I try to count every byte, so I am also wondering whether I need to implement my own additional checks for packet corruption. I'm also thinking about using UDP: as I understand it, I will get lower reliability, but I will not need to skip any data, because each message is delivered separately. Nevertheless, Rudolf, thank you very much :) One more question: can boost::asio::async_read return without filling the provided buffer completely? I guess only in case of an error, which would be set in the boost::system::error_code parameter.

I was thinking about it and got an idea: I will read data until a byte with value 0 is detected, and then try to read a header. If that header is corrupted (we are probably in the middle of a message body), I try again and again until a valid header is read.
This sounds like a protocol with message framing like HDLC.
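For comparison, here is a minimal sketch of HDLC-like byte stuffing as used in asynchronous framing (RFC 1662): the flag byte 0x7E delimits frames, and any flag or escape byte inside the payload is replaced by 0x7D followed by the byte XOR 0x20. The function names are made up for this illustration. The point is that the delimiter can then never appear inside a frame, so a receiver that has lost sync only needs to wait for the next flag.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFlag = 0x7E;  // frame delimiter
constexpr std::uint8_t kEsc  = 0x7D;  // escape byte
constexpr std::uint8_t kXor  = 0x20;  // applied to an escaped byte

using Bytes = std::vector<std::uint8_t>;

// Wraps a payload in flags, escaping any flag/escape bytes it contains.
inline Bytes stuff_frame(const Bytes& payload) {
    Bytes out;
    out.push_back(kFlag);
    for (std::uint8_t b : payload) {
        if (b == kFlag || b == kEsc) {
            out.push_back(kEsc);
            out.push_back(static_cast<std::uint8_t>(b ^ kXor));
        } else {
            out.push_back(b);
        }
    }
    out.push_back(kFlag);
    return out;
}

// Recovers the payload from one stuffed frame (flags are simply dropped).
inline Bytes unstuff_frame(const Bytes& framed) {
    Bytes out;
    bool escaped = false;
    for (std::uint8_t b : framed) {
        if (b == kFlag) continue;  // delimiters carry no payload
        if (escaped) {
            out.push_back(static_cast<std::uint8_t>(b ^ kXor));
            escaped = false;
        } else if (b == kEsc) {
            escaped = true;
        } else {
            out.push_back(b);
        }
    }
    return out;
}
```

The cost is a variable overhead: every payload byte equal to 0x7E or 0x7D takes two bytes on the wire.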
I guess the connections will be very different: standard wired connections, GPRS, 3G, Wi-Fi...
But note that apart from TCP/IP capabilities some of these transfer media have their own error detection or correction schemes in their underlying layers.
The aim is to provide maximum reliability at minimal cost. I try to count every byte, so I am also wondering whether I need to implement my own additional checks for packet corruption.
Do you have any practical tests which show excessive error rates for one of these channels (apart from dropped packets)? If these are indeed your channels, I would expect very low error rates. Chances are a decent checksum over your whole packet would do the job. If a packet turns out to be corrupt, you may as well drop the connection and start from scratch. If it's imperative that you maintain the connection, packet framing might be inevitable. If you expect rare bit errors, a simple FEC (forward error correction) scheme might be the solution, since it avoids a lot of protocol overhead for data retransmits.
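As one example of "a decent checksum over your whole packet", here is a sketch using the standard CRC-32 (the IEEE 802.3 polynomial, in its reflected form 0xEDB88320) appended as a 4-byte trailer. The trailer layout and the names crc32_ieee, seal and is_intact are assumptions for this illustration only. The bitwise loop favours brevity over speed, and a table-driven version is the usual choice in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (IEEE 802.3), reflected polynomial 0xEDB88320.
inline std::uint32_t crc32_ieee(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical packet layout: payload followed by its CRC-32, big-endian.
inline std::vector<std::uint8_t> seal(std::vector<std::uint8_t> payload) {
    const std::uint32_t c = crc32_ieee(payload.data(), payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        payload.push_back(static_cast<std::uint8_t>(c >> shift));
    return payload;
}

// Recomputes the CRC over the payload part and compares it with the trailer.
inline bool is_intact(const std::vector<std::uint8_t>& packet) {
    if (packet.size() < 4) return false;
    const std::size_t n = packet.size() - 4;
    std::uint32_t stored = 0;
    for (std::size_t i = 0; i < 4; ++i)
        stored = (stored << 8) | packet[n + i];
    return stored == crc32_ieee(packet.data(), n);
}
```

With this layout the receiver calls is_intact on each complete packet and, as suggested above, can simply drop the connection when it returns false.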
Also I'm thinking about using UDP - as I understand, I will have lower reliability, but I will not need to skip any data - each message is delivered separately.
UDP introduces a number of additional hurdles which TCP/IP handles for you: correct ordering of packets, handling of dropped packets, simple bit error detection. Be sure you understand these implications before you drop TCP/IP.

One more question: can boost::asio::async_read return without filling the provided buffer completely? I guess only in case of an error, which would be set in the boost::system::error_code parameter.
async_read has 4 overloads: http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_rea... Two of them take a completion condition parameter: http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_rea... http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_rea... Without a completion condition, the operation completes only when the supplied buffer is full or an error occurs.
Participants (3): Igor R, Roman Shmelev, Rudolf Leitgeb