[boost] [interprocess] More robust message_queue and interprocess_condition?

14 Apr 2011

      Sergei Politov <spolitov <at> gmail.com> writes:
...
[interprocess] message_queue hangs when another process dies
Suppose we have 2 processes, one sends messages to queue, another reads
them.
  When reading process dies (for instance using End process) during
message_queue.receive the another process hangs in send.

I've run into an apparently old problem with message_queue (see post above from 
two years ago) and I am wondering if there isn't a fairly simple solution. It 
wouldn't be perfect but would be far better behavior than we have now.

Please let me know if this looks like a good idea for inclusion into the 
interprocess library.

The problem is that the message_queue send operation blocks forever when trying 
to send to a process that has terminated abnormally. The send is trying to do 
an interprocess_condition::notify_one call. Inside 
interprocess_condition::notify it executes the statement: "m_enter_mut.lock()". 
This mutex keeps the send call from completing because the dead process still 
has ownership of it.

My solutions to the problem lie within the interprocess_condition class, as 
this is really the source of the problem. 

Solution 1: Fixed timeout notify
--------------------------------

Change the mutex lock call in interprocess_condition::notify to a timed_lock 
call using a fixed timeout value. This feature could be enabled/disabled and 
the timeout value configured through use of preprocessor symbols.

Replace:

  inline void interprocess_condition::notify(boost::uint32_t command)
  {
      m_enter_mut.lock();

With:

  inline void interprocess_condition::notify(boost::uint32_t command)
  {
  #ifdef ENABLE_BOOST_INTERPROCESS_TIMEOUT
      boost::posix_time::ptime expires
        = boost::posix_time::microsec_clock::universal_time() +
          boost::posix_time::milliseconds(BOOST_INTERPROCESS_TIMEOUT_MS);
      if (!m_enter_mut.timed_lock(expires))
        throw timeout_exception();
  #else
      m_enter_mut.lock();
  #endif

This allows an exception to be thrown if it waits too long at the mutex. This 
may be adequate for most applications; I don't see a good reason for this to 
block for very long. This change will of course affect anything using 
interprocess_condition, which could be seen as a good thing or a bad thing. 
Good in that anything using it, like message_queue for instance, will 
immediately get improved functionality. The message_queue send will now throw 
a timeout_exception on the send without any code changes! However, 
it may be seen as a bad thing because the thrown exception may be unexpected 
behavior (although not expecting exceptions is not a wise thing).
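
For illustration, here is a minimal, self-contained sketch of the fixed-timeout 
idea. It is not the Boost code: flag_mutex is a hypothetical stand-in for the 
interprocess mutex (a bare lock word with no owner tracking, so a holder that 
dies simply leaves it set), timeout_exception is a hypothetical type, 
std::chrono::steady_clock replaces the ptime arithmetic, and the 20 ms value is 
arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical configuration symbols, named after the ones in the proposal.
#define ENABLE_BOOST_INTERPROCESS_TIMEOUT
#define BOOST_INTERPROCESS_TIMEOUT_MS 20

// Hypothetical exception type; the proposal does not define it.
struct timeout_exception : std::runtime_error {
    timeout_exception() : std::runtime_error("timed out locking mutex") {}
};

// Stand-in for the interprocess mutex: a lock word with no owner tracking.
class flag_mutex {
    std::atomic<bool> locked_{false};
public:
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void lock() { while (!try_lock()) std::this_thread::yield(); }
    void unlock() {
        assert(locked_.load());  // debug check: must currently be held
        locked_.store(false, std::memory_order_release);
    }
    // Keep trying until the absolute time 'expires' has passed.
    bool timed_lock(std::chrono::steady_clock::time_point expires) {
        while (!try_lock()) {
            if (std::chrono::steady_clock::now() >= expires) return false;
            std::this_thread::yield();
        }
        return true;
    }
};

// Mirrors the first lines of the proposed notify().
inline void enter_notify(flag_mutex& m_enter_mut)
{
#ifdef ENABLE_BOOST_INTERPROCESS_TIMEOUT
    auto expires = std::chrono::steady_clock::now()
                 + std::chrono::milliseconds(BOOST_INTERPROCESS_TIMEOUT_MS);
    if (!m_enter_mut.timed_lock(expires))
        throw timeout_exception();
#else
    m_enter_mut.lock();
#endif
}

// Returns true if enter_notify() throws when the lock is never released.
inline bool abandoned_lock_times_out()
{
    flag_mutex mut;
    mut.lock();  // simulate a peer that died while holding the lock
    try {
        enter_notify(mut);
    } catch (const timeout_exception&) {
        return true;
    }
    return false;
}
```

With the symbol undefined, the same call would spin forever on the abandoned 
lock, which is the hang described above.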

Solution 2: Notify with timeout  
-------------------------------

We introduce an interprocess_condition::notify that specifies the time to wait 
for notification to complete.

Add:

  inline void interprocess_condition::notify(
      boost::uint32_t command,
      const boost::posix_time::ptime &abs_time)
  {
     if (!m_enter_mut.timed_lock(abs_time))
       throw timeout_exception();

This solution is much the same as the first but introduces new methods to 
accomplish the functionality. The advantages of this approach would be control 
of the timeout value and that existing functionality would not be changed. 
The disadvantage would be that software wanting this feature would need to be 
rewritten. For example, the message_queue send & try_send functions could have 
additional timeout values. One issue I am having with this solution is: why 
would I want to use the old notify API? It seems the new methods would 
deprecate the old ones and create mild confusion.
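
As a rough sketch of how the two overloads could sit side by side, assuming a 
simplified class: condition_sketch, abandon_lock and the notification counter 
are hypothetical names, the command parameter is omitted, and a 
std::chrono::steady_clock time point stands in for the ptime argument.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

using deadline_t = std::chrono::steady_clock::time_point;

// Hypothetical exception type; the proposal does not define it.
struct timeout_exception : std::runtime_error {
    timeout_exception() : std::runtime_error("timed out in notify") {}
};

class condition_sketch {
    std::atomic<bool> enter_locked_{false};  // stand-in for m_enter_mut
    int notifications_ = 0;                  // stand-in for the real notify work

    bool try_enter() { return !enter_locked_.exchange(true, std::memory_order_acquire); }
    void leave() {
        assert(enter_locked_.load());        // debug check: must currently be held
        enter_locked_.store(false, std::memory_order_release);
    }
public:
    // Existing behaviour: waits as long as it takes.
    void notify() {
        while (!try_enter()) std::this_thread::yield();
        ++notifications_;
        leave();
    }
    // Proposed overload: gives up at the caller's absolute deadline.
    void notify(deadline_t abs_time) {
        while (!try_enter()) {
            if (std::chrono::steady_clock::now() >= abs_time)
                throw timeout_exception();
            std::this_thread::yield();
        }
        ++notifications_;
        leave();
    }
    int notifications() const { return notifications_; }
    // Demo hook: simulate a peer that died inside the critical section.
    void abandon_lock() { enter_locked_.store(true); }
};

// Returns true if both overloads work on a healthy lock and only the
// deadline overload escapes from an abandoned one.
inline bool deadline_overload_demo()
{
    using namespace std::chrono;
    condition_sketch cond;
    cond.notify();                                   // old API, unchanged
    cond.notify(steady_clock::now() + seconds(1));   // new API, lock is free
    cond.abandon_lock();
    try {
        cond.notify(steady_clock::now() + milliseconds(20));
    } catch (const timeout_exception&) {
        return cond.notifications() == 2;
    }
    return false;
}
```

The caller chooses the deadline per call, but every call site that wants the 
protection has to switch to the new overload.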

Actually, looking closely at the message_queue API, this approach presents some 
challenges:

   // We can add the timeout here, no problem.
   void send (
       const void *buffer,
       std::size_t buffer_size, 
       unsigned int priority,
       const boost::posix_time::ptime& abs_time); // <-- new timeout value

   // The nature of this method is not to block,
   // so adding a timeout value here is counterintuitive.
   // But this is exactly what we need to do because it blocks
   // in our exceptional case.
   bool try_send(
       const void *buffer,
       std::size_t buffer_size, 
       unsigned int priority,
       const boost::posix_time::ptime& abs_time); // <-- new timeout value

   // Here we probably need to use the existing timeout for
   // the notify call that takes a timeout.
   bool timed_send(
       const void *buffer,
       std::size_t buffer_size, 
       unsigned int priority,  
       const boost::posix_time::ptime& abs_time); // <-- existing timeout value

I was originally thinking this was the best solution, but now after looking at 
the details, solution 1 is looking more appealing.

Solution 3: Try notify
----------------------

This solution pushes the waiting code back to the caller. The advantage is that 
this solution does not block at any time, but its usage will be more 
complicated.

Add:

  inline bool interprocess_condition::try_notify(
      boost::uint32_t command)
  {
     if (!m_enter_mut.try_lock())
       return false;

I'm not sure I am loving this solution, but it could be used to create a 
better-behaved message_queue::try_send. It would be difficult to use too, and 
I'm afraid not very popular (i.e. try_send returning false because it can't 
acquire the mutex right away).
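
To show what "pushing the waiting back to the caller" might look like, here is 
a small sketch under the same kind of assumptions: condition_sketch, 
abandon_lock and notify_with_retries are hypothetical names, the command 
parameter is omitted, and the retry policy (a bounded number of attempts with 
a short pause) is just one possible choice.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

class condition_sketch {
    std::atomic<bool> enter_locked_{false};  // stand-in for m_enter_mut
    int notifications_ = 0;                  // stand-in for the real notify work
public:
    // Never blocks: returns false if the lock cannot be taken right now.
    bool try_notify() {
        if (enter_locked_.exchange(true, std::memory_order_acquire))
            return false;
        ++notifications_;
        enter_locked_.store(false, std::memory_order_release);
        return true;
    }
    int notifications() const { return notifications_; }
    // Demo hook: simulate a peer that died inside the critical section.
    void abandon_lock() { enter_locked_.store(true); }
};

// Caller-side policy: try up to max_attempts times, pausing between tries.
inline bool notify_with_retries(condition_sketch& cond,
                                int max_attempts,
                                std::chrono::milliseconds pause)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (cond.try_notify())
            return true;
        if (attempt < max_attempts)
            std::this_thread::sleep_for(pause);
    }
    return false;  // lock was busy (or abandoned) on every attempt
}
```

A false result cannot distinguish a briefly busy lock from an abandoned one, 
which is part of why this variant is awkward to use.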

Ross MacGregor