Concurrency and session termination in boost::asio
Hello,

I'm experimenting with boost::asio as the basis for communications and event handling for a network server. The server will support many clients and should scale well to machines with multiple processors/cores, so I'd like to use thread pools to execute requests. I've done this by calling io_service.run() from multiple threads, which seems to be the normal way to do it.

Each client session always has an asynchronous read operation waiting for new commands. If the server has a notification for a client, it will send it by starting an asynchronous write on that session. Notifications can come at any time, not just in response to a request, so it is common for a session to simultaneously have outstanding asynchronous read and write requests. Sessions are terminated when the socket is closed or when the client sends a quit command.

I'm trying to make sure that my handling of session termination is threadsafe. It seems that the normal way to handle this is to detect an error or end-of-file in an asynchronous event handler, then delete the object to end the session. Because my server is using a thread pool and waiting for both read and write events, it is possible for a read and a write callback to be running concurrently on different CPUs. In that case, if one of those callbacks detects an error and deletes the object, the other will continue running using an invalid object. In a simple echo server I wrote to learn about boost::asio, I can see these invalid accesses when running under valgrind with a large number of clients, so unless I have made an error in my server, this appears to be a real problem.

I rewrote it to have write errors handled by closing the session, which causes the read callback to run, notice the session has been closed, and delete the object. This seemed to solve the problem, but it still seems there is a potential problem if the read callback finishes deleting the object while the write callback is running.

Is there a better way to handle session termination? Can I use io_service::strand to prevent this by having the readers and writers for the same session running in the same strand?

Thanks for any thoughts!

----Scott.
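For concreteness, the thread-pool setup described above usually amounts to just a few lines like the following (a minimal sketch; the run_service helper and the choice of worker count are illustrative, not taken from this post):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <cstddef>
#include <vector>

// Plain wrapper so boost::bind does not have to choose between the
// overloads of io_service::run().
void run_service(boost::asio::io_service* io_service)
{
  io_service->run();
}

int main()
{
  boost::asio::io_service io_service;

  // Keeps run() from returning while there is no outstanding work;
  // a real server would destroy this at shutdown.
  boost::asio::io_service::work work(io_service);

  // ... create an acceptor here and start the first async_accept ...

  // One worker thread per CPU. Every thread pulls completion handlers
  // off the same io_service, so two handlers for the same session can
  // run at the same time on different CPUs.
  std::size_t num_threads = boost::thread::hardware_concurrency();
  if (num_threads == 0)
    num_threads = 2;

  std::vector<boost::shared_ptr<boost::thread> > threads;
  for (std::size_t i = 0; i < num_threads; ++i)
    threads.push_back(boost::shared_ptr<boost::thread>(
        new boost::thread(boost::bind(&run_service, &io_service))));

  for (std::size_t i = 0; i < threads.size(); ++i)
    threads[i]->join();

  return 0;
}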
On Wed, 05 Nov 2008 20:39:47 +0100, Scott Gifford wrote:
[...]I rewrote it to have write errors handled by closing the session, which causes the read callback to run, notice the session has been closed, and delete the object. This seemed to solve the problem, but it still seems there is a potential problem if the read callback finishes deleting the object while the write callback is running.
Is there a better way to handle session termination?
See the various Boost.Asio examples where classes with asynchronous handlers are typically derived from boost::enable_shared_from_this. Boris
Boris wrote:
On Wed, 05 Nov 2008 20:39:47 +0100, Scott Gifford wrote:
[...]I rewrote it to have write errors handled by closing the session, which causes the read callback to run, notice the session has been closed, and delete the object. This seemed to solve the problem, but it still seems there is a potential problem if the read callback finishes deleting the object while the write callback is running.
Is there a better way to handle session termination?
See the various Boost.Asio examples where classes with asynchronous handlers are typically derived from boost::enable_shared_from_this.
Ah, thanks! I had noticed they were using shared_ptr<>s, but not that they were passing them to boost::bind; very clever.

So I have changed my server's session class to inherit from enable_shared_from_this and changed the calls to boost::bind to use the shared this pointer. Now, as long as there is a registered callback for a session object, it will never be destroyed. If I don't hold any external references to the session object, then once all the callbacks complete without starting any other asynchronous calls, the object will be destroyed after the last callback completes, in the destructor of the shared_ptr. If the socket fails or is closed, all of the callbacks will be called with a failure code, ensuring that the session is eventually destroyed.

Since my server is always reading from clients, I can just make sure there is always a registered callback for network reads. I can do this by starting an asynchronous read when the session object is created, then making sure that when each read completes I start a new one. That will ensure that a reference to my session stays alive, and to shut down the connection the read callback will simply not start another asynchronous read.

Does all of that sound about right?

One related question: in my server, on certain errors or a client request, I was calling the close() method of a tcp::socket. Watching that in the debugger, it ends up running boost::asio::detail::reactive_socket_service::close from boost/asio/detail/reactive_socket_service.hpp. The close method calls is_open() to determine whether the socket has already been closed, and if not, asks the OS to close it.

It looks to me like this is not done in a threadsafe way. I can't see a mutex being held here, so two threads that call close() simultaneously could both find that is_open() returned true, then both close the socket. With unfortunate timing, by the time the second one runs, that socket's file descriptor could have been reused, and the wrong fd would be closed.

Am I missing something here? Or is close() not meant to be threadsafe?

Thanks!

----Scott.
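Roughly, the pattern described above (and used throughout the Boost.Asio examples) looks like the sketch below; the member names are chosen here for illustration:

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>
#include <cstddef>

class session : public boost::enable_shared_from_this<session>
{
public:
  explicit session(boost::asio::io_service& io_service)
    : socket_(io_service)
  {
  }

  boost::asio::ip::tcp::socket& socket() { return socket_; }

  // Called (through a shared_ptr) once the accept completes.
  void start() { start_read(); }

private:
  void start_read()
  {
    // Binding shared_from_this() means the io_service holds a reference
    // to the session for as long as this read is outstanding.
    socket_.async_read_some(boost::asio::buffer(buffer_),
        boost::bind(&session::handle_read, shared_from_this(),
          boost::asio::placeholders::error,
          boost::asio::placeholders::bytes_transferred));
  }

  void handle_read(const boost::system::error_code& error,
      std::size_t /*bytes_transferred*/)
  {
    if (error)
      return; // no new operation is started, so the last shared_ptr
              // goes away and the session destroys itself

    // ... handle the command, possibly start an async_write ...

    start_read(); // always keep a read outstanding
  }

  boost::asio::ip::tcp::socket socket_;
  char buffer_[4096];
};

The accepting side then creates each session as a boost::shared_ptr<session> and hands session->socket() to async_accept, so the only long-lived references are the ones bound into outstanding handlers.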
Does all of that sound about right?
Yes, that's exactly the way it's done in the examples.
The close method calls is_open() to determine whether the socket has already been closed, and if not, asks the OS to close it.
It looks to me like this is not done in a threadsafe way.
Right, you should post closing to the io_service thread, like this:

void my_connection::close()
{
  io_service_.post(boost::bind(&my_connection::do_close, shared_from_this()));
}

void my_connection::do_close()
{
  boost::system::error_code err;
  socket_.close(err);
}
"Igor R"
Does all of that sound about right?
Yes, that's exactly the way it's done in the examples.
Great, thanks!
The close method calls is_open() to determine whether the socket has already been closed, and if not, asks the OS to close it.
It looks to me like this is not done in a threadsafe way.
Right, you should post closing to the io_service thread, like this:
Ah, I see, thanks for the sample code! I'm not sure I understand exactly what makes this threadsafe, though. It seems like if my reader callback and writer callback are running simultaneously, and both decide to close the connection, they will post two close requests to io_service. If those requests are scheduled on two different worker threads, they could execute simultaneously, and still cause problems.
void my_connection::do_close()
{
  boost::system::error_code err;
  socket_.close(err);
}
Maybe I should just hold a lock in do_close? That seems like it would solve the problem. Thanks again for the help! -----Scott.
If those requests are scheduled on two different worker threads, they could execute simultaneously, and still cause problems.
Sorry, I forgot that you run one io_service in several threads... Why wouldn't you scale your application using an "io_service per CPU" approach, rather than "thread per CPU"? Like this: http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio/examples.html#boost...
"Igor R"
If those requests are scheduled on two different worker threads, they could execute simultaneously, and still cause problems.
Sorry, I forgot that you run one io_service in several threads... Why wouldn't you scale your application using an "io_service per CPU" approach, rather than "thread per CPU"?
At first glance, thread per CPU seemed simpler. Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening. But certainly I can change that.
Like this:
http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio/examples.html#boost...
Thanks, I had looked at that briefly; I will take a closer look.

So the advantage of io_service per CPU is that, with only one run() thread on each io_service, requests are serialized on the run() thread. Any callbacks are queued on that io_service, and any client sessions attached to it are guaranteed that only one callback will be called at a time. The reference material for "strands" calls this running the io_service in an implicit strand.

If the client session is cancel()ed or close()d, does that clear out the callback queue of anything related to that session? Or do I need to be aware of the possibility that the session is now close()d somehow? For example, if I have two read callbacks queued, and the first causes the connection to close, do I need to worry that if the second tries to write back to the connection, the wrong thing could happen? Or will boost::asio protect against that?

Also, should I be able to get the same per-session request serialization by using strands?

Thanks again!

-----Scott.
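For concreteness, the io_service-per-CPU arrangement discussed here (modelled loosely on the HTTP server 2 example) comes down to something like the following sketch; the class name and the round-robin assignment are illustrative:

#include <boost/asio.hpp>
#include <boost/shared_ptr.hpp>
#include <cstddef>
#include <vector>

// One io_service per CPU, each run() by exactly one thread, so every
// handler dispatched through a given io_service runs in an implicit
// strand: no two handlers for a session pinned to it can overlap.
class io_service_pool
{
public:
  explicit io_service_pool(std::size_t pool_size)
    : next_(0)
  {
    for (std::size_t i = 0; i < pool_size; ++i)
    {
      io_service_ptr io_service(new boost::asio::io_service);
      work_ptr work(new boost::asio::io_service::work(*io_service));
      io_services_.push_back(io_service);
      work_.push_back(work);
    }
  }

  // Hand out io_services round-robin; each new session is pinned to one.
  boost::asio::io_service& get_io_service()
  {
    boost::asio::io_service& io_service = *io_services_[next_];
    next_ = (next_ + 1) % io_services_.size();
    return io_service;
  }

private:
  typedef boost::shared_ptr<boost::asio::io_service> io_service_ptr;
  typedef boost::shared_ptr<boost::asio::io_service::work> work_ptr;

  std::vector<io_service_ptr> io_services_;
  std::vector<work_ptr> work_;
  std::size_t next_;
};

Each pooled io_service then gets exactly one thread calling run(), and every accepted connection is given a socket constructed from get_io_service(), so all of its callbacks are serialized without explicit locking.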
On Thu, 06 Nov 2008 17:38:39 +0100, Scott Gifford wrote:
"Igor R"
writes: If those requests are scheduled on two different worker threads, they could execute simultaneously, and still cause problems.
Sorry, I forgot that you run one io_service in several threads... Why wouldn't you scale your application using an "io_service per CPU" approach, rather than "thread per CPU"?
At first glance, thread per CPU seemed simpler. Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening. But certainly I can change that.
See http://thread.gmane.org/gmane.comp.lib.boost.asio.user/1300 for a (small) discussion about the various designs. I wouldn't change the design of an application though just because of difficulties in closing a socket. Boris
Boris wrote:
On Thu, 06 Nov 2008 17:38:39 +0100, Scott Gifford wrote:
[...]
At first glance, thread per CPU seemed simpler. Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening. But certainly I can change that.
See http://thread.gmane.org/gmane.comp.lib.boost.asio.user/1300 for a (small) discussion about the various designs.
Thanks Boris, I appreciate the link!
I wouldn't change the design of an application though just because of difficulties in closing a socket.
Right now my app is just a few hundred lines of code for testing, mostly cribbed from the boost examples, so changing the design this early is no big deal. :-) ----Scott.
Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening.
Well, I don't know what your application does, but you can consider extending io_service-per-CPU to io_service-per-"logical unit", where inside a "logical unit" parallelization is not essential. Of course, if you've got lots of such "units", the overhead might be unacceptable. Anyway, all these tricks aim to simplify the design by minimizing the multithreading mess. After all, if you see that it just complicates your design, you can always prefer spreading locks in your code :).
If the client session is cancel()ed or close()d, does that clear out the callback queue of anything related to that session? Or do I need to be aware of the possibility that the session is now close()d somehow?
If you close() a socket, the callback queue is certainly *not* cleared, because such clearing would violate a very important guarantee provided by asio: every async request ends up calling its handler (possibly with some error passed to it). The number of landings should equal the number of takeoffs - that's what allows us, in particular, to use that automatic lifetime management by binding a shared_ptr to the handler.
For example, if I have two read callbacks queued, and the first causes the connection to close, do I need to worry that if the second tries to write back to the connection, the wrong thing could happen?
Nothing wrong can happen; the further I/O will just fail gracefully (and its handler will be called as well).
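Concretely, that means a handler can treat close() like any other failure; continuing the session sketch from earlier in the thread, a handler written like this (illustrative) still gets its "landing" and releases the shared_ptr bound into it:

void session::handle_read(const boost::system::error_code& error,
    std::size_t /*bytes_transferred*/)
{
  if (error)
  {
    // Covers boost::asio::error::operation_aborted (the socket was
    // close()d or cancel()ed) as well as genuine I/O errors. By not
    // starting another operation here, the shared_ptr bound to this
    // handler is released and the session can be destroyed.
    return;
  }

  // ... normal processing; any further async operation on a closed
  // socket simply completes with an error and lands here again ...
}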
"Igor R"
Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening.
[...]
Anyway, all these tricks aim to simplify the design by minimizing the multithreading mess. After all, if you see that it just complicates your design, you can always prefer spreading locks in your code :).
Yeah, I don't mind using locks, but I'm not exactly sure which boost::asio operations require locks for concurrent access and which do not. Maybe I'm trying to make things more complicated than they are, and it's simply the case that *all* concurrent access to the same boost::asio object requires a lock to be held?

I had been assuming that it's safe to make concurrent calls to a tcp::socket's async_read() and async_write() methods; is that a correct assumption? Or do I need to hold a mutex whenever I'm calling any method on a tcp::socket? Looking more closely at the documentation, it looks like that's the case.

The only example I found with a thread pool was HTTP server example 3, and HTTP never needs to deal with simultaneous reads and writes. If anybody knows of any examples floating around that do simultaneous reads and writes with a thread pool, I would very much appreciate a link.
If the client session is cancel()ed or close()d, does that clear out the callback queue of anything related to that session? Or do I need to be aware of the possibility that the session is now close()d somehow?
If you close() a socket, the callback queue is certainly *not* cleared, because such clearing would violate a very important guarantee provided by asio: every async request ends up calling its handler (possibly with some error passed to it). The number of landings should equal the number of takeoffs - that's what allows us, in particular, to use that automatic lifetime management by binding a shared_ptr to the handler.
Ah, I see, makes perfect sense, thanks for the explanation! Thanks again for all your help, both Igor and Boris! -----Scott.
Or do I need to hold a mutex whenever I'm calling any method on a tcp::socket? Looking more closely at the documentation, it looks like that's the case.
Sure, the reference of ip::tcp::socket says:

"Thread Safety
Distinct objects: Safe.
Shared objects: Unsafe."
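A common way to serialize access to a shared socket with a single io_service and a pool of run() threads is an io_service::strand per session: initiate operations from within the strand and wrap every handler for that session in it, and the read and write paths can never overlap. A rough sketch with illustrative names (it deliberately ignores the outgoing-message queue a real server would need so that only one async_write is outstanding at a time):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>
#include <cstddef>
#include <string>

class session : public boost::enable_shared_from_this<session>
{
public:
  explicit session(boost::asio::io_service& io_service)
    : strand_(io_service),
      socket_(io_service)
  {
  }

  boost::asio::ip::tcp::socket& socket() { return socket_; }

  void start() { start_read(); }

  // May be called from any thread: hop onto the strand before
  // touching socket_.
  void deliver(const std::string& notification)
  {
    strand_.post(boost::bind(&session::do_write, shared_from_this(),
        notification));
  }

private:
  void start_read()
  {
    socket_.async_read_some(boost::asio::buffer(read_buf_),
        strand_.wrap(boost::bind(&session::handle_read, shared_from_this(),
          boost::asio::placeholders::error,
          boost::asio::placeholders::bytes_transferred)));
  }

  void handle_read(const boost::system::error_code& error,
      std::size_t /*bytes_transferred*/)
  {
    if (error)
      return;
    // ... parse the command ...
    start_read();
  }

  void do_write(const std::string& notification)
  {
    // Runs on the strand, so it cannot interleave with handle_read.
    // (A real server would queue notifications here.)
    write_buf_ = notification;
    boost::asio::async_write(socket_, boost::asio::buffer(write_buf_),
        strand_.wrap(boost::bind(&session::handle_write, shared_from_this(),
          boost::asio::placeholders::error)));
  }

  void handle_write(const boost::system::error_code& /*error*/)
  {
    // Both read and write handlers go through strand_, so they never
    // run concurrently even with many threads calling io_service::run().
  }

  boost::asio::io_service::strand strand_;
  boost::asio::ip::tcp::socket socket_;
  char read_buf_[4096];
  std::string write_buf_;
};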
Thanks again for all your help, both Igor and Boris!
you're welcome :)