[Boost-users] [interprocess] Sharing data in a peer-to-peer fashion

5 Mar 2010

Hello,

I'm working on what seems to be a fairly interesting problem, and I'm 
hoping any Interprocess experts can lend some advice.  After reading 
all of the documentation for Interprocess, it seems that all of the 
examples work with a "one parent, many children" type model.  The 
problem that I am having is that I am working with a "many children, no 
parent" model.  My processes are spawned by the web server (and 
FastCGI), so there is no easy way to modify that code to manage the 
interprocess data.

At least one instance of my program is meant to stay alive forever, so 
I'm not very concerned about deleting the interprocess data at any time. 
  My problem, however, is the mutexes, and the stale locks that result if 
one of the processes crashes or exits abnormally.  It seems that once an 
instance of my application crashes while holding the lock, no other 
instance can ever lock the shared mutex, because the lock is never released.

I've thought of a few different ideas to solve this problem:

  * Using an interprocess shared_ptr for the mutex and data, with a 
custom deleter to remove all instances once the last application is 
exiting and the use count is zero.  However, this suffers from the same 
problem: it seems the use count is never decremented when the 
application exits abnormally (for example, kill -9, crash, or Ctrl+C).

  * Using a "heart beat" type system to keep track of processes.  My 
idea was to do something like this: Keep an interprocess associative 
array of process IDs mapped to last heartbeat time, with each process 
updating its own heartbeat.  At the same time, every other process 
(since there is no one parent process) must keep track of the 
heartbeats, and remove processes from the array that have not sent 
a heartbeat in a while. The problem is this: what about the mutex 
for the associative array? And the locks that another application might 
have on it?  I wind up back at the point that I started at.  The stale 
locks are destroyed if the mutex is deleted (via 
boost::interprocess::named_mutex::remove), and I can detect the stale 
lock pretty reliably with a timed try lock (wait 30 seconds or so to 
acquire the lock; if it can never be acquired then there is obviously a 
problem).  But what of thread safety, since I now have to delete the 
lock and re-create it, where other processes might wind up doing the 
same thing? And what if those other processes are also trying to obtain 
a lock on the mutex at the same time I'm deleting and recreating it?
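
To make the heartbeat idea concrete, here is a minimal sketch of just the 
bookkeeping, in plain standard C++.  Everything in it is an assumption for 
illustration: the names HeartbeatTable, beat and prune_stale are 
hypothetical, the table is an ordinary std::map rather than a container 
living in shared memory, and the timestamps are passed in by the caller on 
the assumption that all processes read a clock they can compare.  It does 
not address the locking question at all; the table would still need some 
form of protection when shared between processes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>

using Clock = std::chrono::steady_clock;
using Pid = int;  // stand-in for the platform's process ID type (assumption)

// Hypothetical heartbeat table: process ID -> time of last heartbeat.
// In the scenario above this would live in shared memory; a plain
// std::map is used here only to show the logic.
using HeartbeatTable = std::map<Pid, Clock::time_point>;

// Record (or refresh) the heartbeat of process `self`.
void beat(HeartbeatTable& table, Pid self, Clock::time_point now) {
    table[self] = now;
}

// Remove every entry whose last heartbeat is older than `timeout`.
// Returns the number of entries removed.
std::size_t prune_stale(HeartbeatTable& table,
                        Clock::time_point now,
                        Clock::duration timeout) {
    assert(timeout > Clock::duration::zero());
    std::size_t removed = 0;
    for (auto it = table.begin(); it != table.end();) {
        if (now - it->second > timeout) {
            it = table.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Each process would call beat() with its own ID on a timer, and any process 
could call prune_stale() with a timeout somewhat longer than the heartbeat 
interval to drop entries for peers that appear to have died.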

Those are pretty much the only ideas I've had.  Does anybody have 
anything better?  Maybe some of you have tackled a similar problem?

Thanks!

Brett Gmoser