
Sergei Politov <spolitov <at> gmail.com> writes:
[interprocess] message_queue hangs when another process dies
Suppose we have 2 processes: one sends messages to a queue, the other reads them. When the reading process dies (for instance using End process) during message_queue.receive, the other process hangs in send.
I've run into an apparently old problem with message_queue (see post above from two years ago) and I am wondering if there isn't a fairly simple solution. It wouldn't be perfect but would be far better behavior than we have now. Please let me know if this looks like a good idea for inclusion into the interprocess library.

The problem is that the message_queue send operation will block forever trying to send to a process that has been abnormally terminated. The send is trying to do an interprocess_condition::notify_one call. Inside interprocess_condition::notify it executes the statement "m_enter_mut.lock()". This mutex is holding back the send call from completing because the dead process still has ownership.

My solutions to the problem lie within the interprocess_condition class, as this is really the source of the problem.

Solution 1: Fixed timeout notify
--------------------------------

Change the mutex lock call in interprocess_condition::notify to a timed_lock call using a fixed timeout value. This feature could be enabled/disabled and the timeout value configured through preprocessor symbols.

Replace:

    inline void interprocess_condition::notify(boost::uint32_t command)
    {
       m_enter_mut.lock();

With:

    inline void interprocess_condition::notify(boost::uint32_t command)
    {
    #ifdef ENABLE_BOOST_INTERPROCESS_TIMEOUT
       boost::posix_time::ptime expires =
          boost::posix_time::microsec_clock::universal_time() +
          boost::posix_time::milliseconds(BOOST_INTERPROCESS_TIMEOUT_MS);
       if (!m_enter_mut.timed_lock(expires))
          throw timeout_exception();
    #else
       m_enter_mut.lock();
    #endif

This allows an exception to be thrown if it waits too long at the mutex. This may be adequate for most applications; I don't see a good reason for this to block for very long.

This change will of course affect anything using interprocess_condition, which could be seen as a good thing or a bad thing. Good in that anything using it, like message_queue for instance, will immediately get improved functionality.
The message_queue send will now throw an interprocess_timeout exception on the send without any code changes! However, it may be seen as a bad thing because the thrown exception may be unexpected behavior (although not expecting exceptions is not a wise thing).

Solution 2: Notify with timeout
-------------------------------

We introduce an interprocess_condition::notify overload that specifies how long to wait for the notification to complete.

Add:

    inline void interprocess_condition::notify(
       boost::uint32_t command,
       const boost::posix_time::ptime &abs_time)
    {
       if (!m_enter_mut.timed_lock(abs_time))
          throw timeout_exception();

This solution is much the same as the first but introduces new methods to accomplish the functionality. The advantages of this approach are that the caller controls the timeout value and that existing functionality is not changed. The disadvantage is that software wanting this feature would need to be rewritten. For example, the message_queue send and try_send functions could take additional timeout values.

One issue I have with this solution: why would I want to use the old notify API? It seems the new methods would deprecate the old ones and create mild confusion.

Actually, looking closely at the message_queue API, this presents some challenges:

    // We can add the timeout here, no problem.
    void send(
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time);  // <-- new timeout value

    // The nature of this method is not to block,
    // so adding a timeout value here is counterintuitive.
    // But this is exactly what we need to do because it blocks
    // in our exceptional case.
    bool try_send(
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time);  // <-- new timeout value

    // Here we probably need to use the existing timeout for
    // the timed_notify call.
    bool timed_send(
       const void *buffer,
       std::size_t buffer_size,
       unsigned int priority,
       const boost::posix_time::ptime& abs_time);  // <-- existing timeout value

I was originally thinking this was the best solution, but now, after looking at the details, solution 1 is looking more appealing.

Solution 3: Try notify
----------------------

This solution pushes the waiting code back to the caller. The advantage is that this solution does not block at any time, but its usage will be more complicated.

Add:

    inline bool interprocess_condition::try_notify(
       boost::uint32_t command)
    {
       if (!m_enter_mut.try_lock())
          return false;

Not sure I am loving this solution, but it could be used to create a better behaved message_queue::try_send. One that would be difficult to use too, and I'm afraid not very popular (i.e. try_send returning false because it can't acquire the mutex right away).
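
For anyone who wants to play with the timed and try variants without patching Boost, here is a rough single-process sketch of the idea. It is not the interprocess code: std::timed_mutex stands in for the interprocess mutex, a second thread that keeps the lock stands in for the dead process, and the names condition_sketch, notify_timed_out and run_while_held are made up for illustration. The timeout_exception type is likewise just a local placeholder.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>

// Placeholder exception type (an assumption, not a Boost type).
struct timeout_exception : std::runtime_error {
    timeout_exception() : std::runtime_error("timed out waiting for m_enter_mut") {}
};

// Hypothetical stand-in for interprocess_condition; only the entry mutex
// is modelled, since that is where send gets stuck.
struct condition_sketch {
    std::timed_mutex m_enter_mut;
    int notifications = 0;  // counts notifies that got past the mutex

    // Timed variant: give up and throw instead of blocking forever.
    void notify(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        if (!m_enter_mut.try_lock_for(timeout))
            throw timeout_exception();
        std::lock_guard<std::timed_mutex> guard(m_enter_mut, std::adopt_lock);
        ++notifications;
    }

    // Try variant: never blocks, reports failure to the caller.
    bool try_notify() {
        if (!m_enter_mut.try_lock())
            return false;
        std::lock_guard<std::timed_mutex> guard(m_enter_mut, std::adopt_lock);
        ++notifications;
        return true;
    }
};

// Returns true if the timed notify gave up with timeout_exception.
bool notify_timed_out(condition_sketch& cond, std::chrono::milliseconds timeout) {
    try {
        cond.notify(timeout);
        return false;
    } catch (const timeout_exception&) {
        return true;
    }
}

// Runs attempt() while another thread owns m_enter_mut, which mimics a
// peer that died holding the mutex. attempt must not throw.
template <typename F>
bool run_while_held(condition_sketch& cond, F attempt) {
    std::promise<void> locked;   // fulfilled once the holder owns the mutex
    std::promise<void> release;  // fulfilled when the holder may let go
    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> guard(cond.m_enter_mut);
        locked.set_value();
        release.get_future().wait();
    });
    locked.get_future().wait();
    bool result = attempt();
    release.set_value();
    holder.join();
    return result;
}
```

With something like this, a caller such as send could catch the exception (or check the return value of try_notify) and decide for itself that the receiver is gone, rather than hanging.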