
Hello,

I'm working on what seems to be a fairly interesting problem, and I'm looking for any Interprocess experts who can lend some advice.

After reading all of the documentation for Interprocess, it seems that all of the examples work with a "one parent, many children" type of model. My problem is that I am working with a "many children, no parent" model. My processes are spawned by the web server (and FastCGI), so there is no easy way to modify that code to manage the interprocess data. At least one instance of my program is meant to stay alive forever, so I'm not very concerned about deleting the interprocess data at any time.

My problem, however, is the mutexes and the stale locks that result if one of the processes crashes or exits abnormally. It seems that once an instance of my application crashes while holding the shared mutex, no other instance can ever lock it, because the lock held by the crashed process is never released.

I've thought of a few different ideas to solve this problem:

* Using an interprocess shared_ptr for the mutex and data, with a custom deleter to remove all instances once the last application is exiting and the use count is zero. However, this suffers from the same problem: it seems the use count is never decremented when the application exits abnormally (for example, kill -9, a crash, or CTRL+C).

* Using a "heartbeat" type of system to keep track of processes. My idea was to keep an interprocess associative array of process IDs mapped to last heartbeat time, with each process updating its own heartbeat. At the same time, every other process (since there is no single parent process) must keep track of the heartbeats and remove processes from the array which have not sent a heartbeat in a while. The problem is this: what about the mutex for the associative array, and the locks that another application might hold on it? I wind up back at the point where I started.
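To make the heartbeat idea concrete, the bookkeeping could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain std::map stands in for the interprocess associative array, the names HeartbeatTable, beat, and prune_stale are hypothetical, and no locking is shown, which is exactly the unsolved part.

```cpp
#include <chrono>
#include <cstddef>
#include <map>

// Hypothetical names throughout. A plain std::map stands in for the
// interprocess container, and the table is NOT protected by any mutex.
using Clock = std::chrono::system_clock;
using ProcessId = int;

struct HeartbeatTable {
    std::map<ProcessId, Clock::time_point> last_beat;

    // Called periodically by each process for its own ID.
    void beat(ProcessId pid, Clock::time_point now) {
        last_beat[pid] = now;
    }

    // Called by any process: drop entries whose last heartbeat is
    // older than max_age. Returns how many entries were removed.
    std::size_t prune_stale(Clock::time_point now, Clock::duration max_age) {
        std::size_t removed = 0;
        for (auto it = last_beat.begin(); it != last_beat.end();) {
            if (now - it->second > max_age) {
                it = last_beat.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }
};
```

The current time is passed in as a parameter so the pruning rule can be exercised without waiting. In a real setup, both beat and prune_stale would have to run under the shared mutex, which brings back the stale-lock question.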
The stale locks are destroyed if the mutex is removed (via boost::interprocess::named_mutex::remove), and I can detect a stale lock pretty reliably with a timed lock (wait 30 seconds or so to acquire the lock; if it can never be acquired, then there is obviously a problem). But what of thread safety, since I now have to remove the mutex and re-create it, and other processes might wind up doing the same thing? And what if those other processes are also trying to obtain a lock on the mutex at the same time I'm removing and re-creating it?

Those are pretty much the only ideas I've had. Does anybody have anything better? Maybe some of you have tackled a similar problem?

Thanks!