Also I was planning on having my read callback handle some commands that may be slow, such as database queries, and I didn't want all other clients in the same io_service to block while that's happening.
Well, I don't know what your application does, but you could consider extending io_service-per-CPU to io_service-per-"logical unit", where parallelization inside a "logical unit" is not essential. Of course, if you have lots of such "units", the overhead might be unacceptable. Anyway, all these tricks aim to simplify the design by minimizing the multithreading mess. After all, if you find that it only complicates your design, you can always fall back to spreading locks through your code :).
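For illustration, a minimal sketch of what io_service-per-"logical unit" could look like (the logical_unit name and the unit count are just placeholders, not anything from your code): each unit owns one io_service and one thread, so handlers within a unit never run concurrently and need no locks, while separate units stay independent.

#include <boost/asio.hpp>
#include <memory>
#include <thread>
#include <vector>

struct logical_unit {                 // hypothetical name
    boost::asio::io_service io;
    std::unique_ptr<boost::asio::io_service::work> work;
    std::thread thread;

    logical_unit()
        : work(new boost::asio::io_service::work(io)),   // keep run() from returning early
          thread([this]() { io.run(); }) {}              // one thread per unit

    ~logical_unit() {
        work.reset();   // let run() return once pending handlers have been delivered
        thread.join();
    }
};

int main() {
    // A handful of independent units; sessions that must not stall each other
    // (e.g. ones doing slow database work) would be placed in different units.
    std::vector<std::unique_ptr<logical_unit>> units;
    for (int i = 0; i < 4; ++i)
        units.emplace_back(new logical_unit);

    // A new session's socket would be constructed on units[n % units.size()]->io.
}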
If the client session is cancel()ed or close()d, does that clear out the callback queue of anything related to that session? Or do I need to be aware of the possibility that the session has since been close()d somehow?
If you close() a socket, the callback queue is certainly *not* cleared, because such clearing would violate a very important guarantee provided by asio: every async request ends up calling its handler (possibly with an error passed to it). The number of landings should equal the number of takeoffs - that's what allows us, in particular, to use automatic lifetime management by binding a shared_ptr to the handler.
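To make the lifetime point concrete, here is a minimal sketch (not your code; class and member names are placeholders) of that usual pattern: a shared_ptr to the session is bound into every handler, so the session stays alive until each outstanding operation has "landed", even if close() is called in the meantime.

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>

class session : public boost::enable_shared_from_this<session> {
public:
    explicit session(boost::asio::io_service& io) : socket_(io) {}

    boost::asio::ip::tcp::socket& socket() { return socket_; }

    void start() {
        // Takeoff: the bound shared_ptr keeps this session alive until the
        // handler below (the landing) has run, with or without an error.
        socket_.async_read_some(boost::asio::buffer(buf_),
            boost::bind(&session::on_read, shared_from_this(),
                        boost::asio::placeholders::error,
                        boost::asio::placeholders::bytes_transferred));
    }

private:
    void on_read(const boost::system::error_code& ec, std::size_t /*n*/) {
        if (ec) return;   // e.g. operation_aborted after close(); the last reference may drop here
        start();          // otherwise keep reading
    }

    boost::asio::ip::tcp::socket socket_;
    char buf_[1024];
};

int main() {
    boost::asio::io_service io;
    boost::shared_ptr<session> s(new session(io));
    // s->socket() would normally be accepted or connected first; then:
    s->start();
    io.run();   // returns once every pending handler has been delivered
}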
For example, if I have two read callbacks queued, and the first causes the connection to close, do I need to worry that the wrong thing could happen if the second tries to write back to the connection?
Nothing wrong can happen; any further I/O will simply fail gracefully (and its handler will be called as well).
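To see that guarantee in isolation, here is a tiny standalone demo (not your code): a write queued against a socket that has already been close()d still gets its completion handler, just with an error code instead of success.

#include <boost/asio.hpp>
#include <iostream>

int main() {
    boost::asio::io_service io;
    boost::asio::ip::tcp::socket sock(io);

    sock.open(boost::asio::ip::tcp::v4());
    sock.close();   // as if an earlier callback had decided to drop the connection

    const char msg[] = "hello";
    boost::asio::async_write(sock, boost::asio::buffer(msg),
        [](const boost::system::error_code& ec, std::size_t n) {
            // The handler is still invoked: one landing per takeoff.
            std::cout << "write handler: " << ec.message()
                      << " (" << n << " bytes)\n";   // e.g. "Bad file descriptor (0 bytes)"
        });

    io.run();   // delivers the queued handler, then returns
}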