[asio] subtle bug in win_iocp_io_service?

First, I have to apologize for cross-posting, but I really do not know which list is the correct one for this question. This is a follow-up to my first attempt, which had the subject "Stopping active open connections".

I know it would be ideal if I were able to post a code snippet that clearly shows the bug I suspect, but unfortunately I am not able to do so. The "bug" seems to be so time-dependent that I cannot manage to isolate it from my code. Instead I am submitting my reasoning and a fix that worked in my case. I am doing this in the hope that someone (Chris, possibly you?) can confirm that it really is a bug, or give me a hint about what I might be doing wrong.

1) What is happening:
In code similar to the http server3 example I have one client connection open (waiting for a read) and one pending incoming connection. Shutting down the server (Ctrl-C) shows inconsistent behaviour:
*) Either everything works well, or
*) the open connection's socket gets closed, but the connection object's dtor does not get called, or
*) there is an access violation, with an odd stack frame (somewhere from biolsp.dll ?!)

2) What change made it work for me?
In the file win_iocp_io_service.hpp, on line 94 in the function shutdown_service, I changed the timeout value from zero to 10 msec.

To me the problem looks as if it is not possible to reliably find out the number of outstanding operations. Since 10 msec is as arbitrary a number as 0, I guess the proposed change is not a solution but only masks the error in a different manner. This would mean that it would be necessary to record the outstanding operations in a separate list and call the destructors on all those that have not been retrieved by GetQueuedCompletionStatus.

In the hope that someone cares,
Roland aka speedsnail

Roland Schwarz wrote:
1) What is happening: In code similar to the http server3 example I have one client connection open (waiting for a read) and one pending incoming connection.
Shutting down the server (Ctrl-C) shows inconsistent behaviour:
*) Either everything works well, or
*) the open connection's socket gets closed, but the connection object's dtor does not get called, or
*) there is an access violation, with an odd stack frame (somewhere from biolsp.dll ?!)
2) What change made it work for me? In the file win_iocp_io_service.hpp on line 94 in function shutdown_service I changed the timeout value from zero to 10 msec.
To me the problem looks as if it was not possible to reliably find out the number of outstanding operations. Since 10 msec is as arbitrary a number as 0 I guess the proposed change is not a solution but only masking the error in a different manner.
This would mean that it would be necessary to record the outstanding operations in a separate list and call the destructors on all that have not been retrieved by GetQueuedCompletionStatus.
If this is the problem, maintaining a counter of outstanding operations should be sufficient. Can you please add some debugging code to confirm whether the number of items dequeued using GetQueuedCompletionStatus differs when you use the 10 msec timeout? Thanks.
Cheers,
Chris

Christopher Kohlhoff wrote:
Can you please add some debugging code to confirm whether the number of items dequeued using GetQueuedCompletionStatus differs when you use the 10 msec timeout? Thanks.
Hmm, I am not exactly sure what you mean. In my code the dtors of my connection objects emit a message when they are called. Since the connection objects use the shared_from_this trick, each one is bound to an overlapped structure, and its dtor is invoked when that overlapped structure is deleted. So I am pretty sure I can see when the call to destruct the overlapped is missing. On the other hand, I can see that the socket is closed properly, since the client receives the shutdown request. I would be glad if you could give me another hint about which kind of debug message would be helpful to you.

Perhaps some (mathematical) reasoning can also help: if the socket is closed, which property guarantees that GetQueuedCompletionStatus, invoked with a zero timeout, will see the associated overlapped for sure? Or, to put it another way: why not wait indefinitely when we are sure there must be something pending? Why a zero timeout?

Regards,
Roland
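
For readers following along, the shared_from_this arrangement described in this message can be sketched roughly as below. This is only a portable illustration, not Roland's code and not asio's internals: `connection`, `pending_op`, `make_read_handler` and `dtor_calls_after_shutdown` are hypothetical names, and `pending_op` merely stands in for an overlapped operation that owns its handler. It shows why a missing dtor message suggests that a pending operation was never destroyed.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for a connection object.
struct connection : std::enable_shared_from_this<connection> {
    explicit connection(int& dtor_calls) : dtor_calls_(dtor_calls) {}
    ~connection() { ++dtor_calls_; }  // Roland's code emits a message here

    // Returns a handler that keeps this connection alive, the way the
    // completion handler of a pending read would.
    std::function<void()> make_read_handler() {
        auto self = shared_from_this();
        return [self] { /* would process the received data here */ };
    }

    int& dtor_calls_;
};

// Hypothetical stand-in for an overlapped operation owning its handler.
struct pending_op {
    std::function<void()> handler;
};

// Returns how many connection dtors have run by the time shutdown is
// "finished", depending on whether the pending operation was destroyed.
int dtor_calls_after_shutdown(bool destroy_pending_op) {
    int dtor_calls = 0;
    int observed = 0;
    {
        auto op = std::make_unique<pending_op>();
        {
            auto conn = std::make_shared<connection>(dtor_calls);
            op->handler = conn->make_read_handler();
        }  // only the handler keeps the connection alive from here on
        assert(dtor_calls == 0);
        if (destroy_pending_op) {
            op.reset();  // handler destroyed, so the connection dtor runs
        }
        observed = dtor_calls;
    }  // op is released here in either case, so the sketch does not leak
    return observed;
}
```

Calling `dtor_calls_after_shutdown(true)` reports one dtor call, while `dtor_calls_after_shutdown(false)` reports none, which corresponds to the symptom of a closed socket without a dtor message.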

Roland Schwarz wrote:
In my code the dtors of my connection objects emit a message when they are called. Since the connection objects use the shared_from_this trick, each one is bound to an overlapped structure, and its dtor is invoked when that overlapped structure is deleted.
So I am pretty sure I can see when the call to destruct the overlapped is missing.
On the other hand I can see that the socket is closed properly since the client receives the shutdown request.
I would be glad if you could give me another hint which kind of debug message would be helpful to you.
Don't worry, what you describe above already sounds like sufficient evidence.
Perhaps some (mathematical) reasoning can also help: if the socket is closed, which property guarantees that GetQueuedCompletionStatus, invoked with a zero timeout, will see the associated overlapped for sure? Or, to put it another way: why not wait indefinitely when we are sure there must be something pending? Why a zero timeout?
That's exactly what I meant by "maintaining a counter of outstanding operations", and I've just checked in such a change. Please let me know how it goes. Cheers, Chris
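
The idea of "maintaining a counter of outstanding operations" can be sketched roughly as follows. This is a portable simulation under stated assumptions, not the change that was checked in and not asio's win_iocp_io_service: it makes no Win32 calls, and `completion_queue`, `work_started`, `post_completion`, `drain_without_waiting`, `drain_until_idle` and `simulate_shutdown` are all hypothetical names. The late thread only models the possibility that a completion is queued after the shutdown code has already polled once.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Portable stand-in for a completion port (hypothetical, no Win32 calls).
class completion_queue {
public:
    // Record that an asynchronous operation has been started.
    void work_started() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++outstanding_;
    }

    // Queue a completion, possibly from another thread.
    void post_completion(int op_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_.push_back(op_id);
        }
        cond_.notify_one();
    }

    // Zero-timeout style poll: takes only what is already queued.
    std::size_t drain_without_waiting() {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t n = ready_.size();
        assert(n <= outstanding_);
        ready_.clear();
        outstanding_ -= n;
        return n;
    }

    // Counter-based drain: keep waiting until every started operation
    // has been dequeued, however late its completion arrives.
    std::size_t drain_until_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        std::size_t n = 0;
        while (outstanding_ > 0) {
            cond_.wait(lock, [this] { return !ready_.empty(); });
            ready_.pop_front();  // real code would destroy the operation here
            --outstanding_;
            ++n;
        }
        return n;
    }

    std::size_t outstanding() {
        std::lock_guard<std::mutex> lock(mutex_);
        return outstanding_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<int> ready_;
    std::size_t outstanding_ = 0;
};

struct shutdown_result {
    std::size_t polled;   // dequeued by the zero-timeout poll
    std::size_t drained;  // dequeued by the counter-based drain
    std::size_t left;     // operations still outstanding afterwards
};

shutdown_result simulate_shutdown() {
    completion_queue q;
    q.work_started();      // e.g. the pending read
    q.work_started();      // e.g. the pending accept
    q.post_completion(1);  // one completion is already queued

    shutdown_result r{};
    r.polled = q.drain_without_waiting();  // misses the second operation

    // The second completion shows up only after the poll has returned.
    std::thread late([&q] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        q.post_completion(2);
    });
    r.drained = q.drain_until_idle();
    late.join();
    r.left = q.outstanding();
    return r;
}
```

In this simulation the poll dequeues one completion and leaves one operation outstanding, and the counter-based drain then picks up the late one. The design point is that the loop terminates on the counter reaching zero rather than on a timeout, so neither 0 nor 10 msec has to be chosen.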