Re: [Boost-users] [asio] Socket closed notification?

15 Dec 2009

On Tue, Dec 15, 2009 at 10:25 AM, Scott Gifford
<sgifford@suspectclass.com> wrote:
...
> Jonathan Franklin <franklin.jonathan@gmail.com> writes:
> Sure, I agree completely that attempting a read is necessary, but IME
> it is not sufficient.  You must additionally send data, either with an
> OS write or TCP keepalive, to detect a completely unresponsive peer
> (i.e. one which has fallen off the network).  The only way to detect
> an unresponsive peer is via a timeout, and with no data to send, there
> is nothing to time out.
>
> I also agree that write() won't always return an error, but it should
> attempt to send data, which will cause the TCP layer to wait for an
> acknowledgement of that data.  If that times out, the TCP layer should
> detect an error on the socket, and a subsequent call to read() should
> return an error.

I think we're agreeing, but not being clear enough for each other, or the OP.

The OP is interested in detecting a closed (or crashed) remote socket.
There are 2 scenarios to consider:

1. The OP has no control over the application protocol, and there is
no application-level ping or ACK mechanism built-in.  In this case,
the application cannot send any data outside of the "normal" operation
(e.g. can't actively try to detect whether the remote host is still
there).  The application must rely on read() returning EOF or an
error once the local TCP stack learns that the connection is gone
(e.g. from the remote system, the TCP keep-alive mechanism, an ICMP
message from an intermediate router, etc).  If the remote host is
hard-down (blue screened, cable cut, etc), and there is no TCP
keep-alive, then you're pretty much hosed.

The only other possibility would be to add an application-level
timeout to the read, e.g. reset your timer each time you read data,
and kill the socket when the timeout occurs.  However, this may not be
an option for your use case.

2. There is an application-level ping/ACK mechanism available (the OP
may need to add it).  In this case, the "ping" is sent to the
hard-down remote host.  The first write() call will not fail, since it
only hands the data to the local TCP stack, and it may take many
write() calls to generate a failure.  However, as soon as the TCP
stack times out the send (right about when the writes will begin to
fail), the pending read() call will return immediately with an error
or EOF.

In neither case can one rely on write() failing.  In case 2, one *can*
rely on read() eventually failing or returning EOF.  In the worst
case, case 1 will never detect the downed remote host.  However, if
the application does send data in case 1 as part of "normal"
operation, that will eventually surface the failure in read(), though
not necessarily in write().
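
For the case 1 worst case, TCP keep-alive is what gives the stack
something to time out when the application itself has nothing to send.
A minimal sketch of turning it on, using plain POSIX setsockopt()
rather than Asio (Asio exposes the same option as
boost::asio::socket_base::keep_alive).  The helper names
enable_keepalive and keepalive_enabled are made up for illustration,
and the probe timing is left at the OS defaults, which are commonly
around two hours of idle time before the first probe:

```cpp
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: turn on TCP keep-alive probes for an existing
// TCP socket.  Returns true on success.  Probe intervals stay at the
// OS defaults; tuning them is platform-specific and not shown here.
bool enable_keepalive(int fd) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0;
}

// Hypothetical helper: report whether SO_KEEPALIVE is currently set.
// Returns false if the option is off or cannot be queried.
bool keepalive_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}
```

With this enabled, a hard-down peer should eventually show up as an
error from read(), but only after the keep-alive timers expire.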

I prefer timing out "inactive" connections to sending "heart-beat"
messages, when possible.

Jon