[interprocess] leaked named mutexes

I recently discovered that a process can very easily leave a named mutex dangling. Consider the following:

    #include <cstdlib>
    #include <iostream>
    #include <boost/interprocess/sync/named_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>

    namespace ip = boost::interprocess;

    char const *name = "mynamedmutex";

    int main(int argc, char *argv[])
    {
        ip::named_mutex mtx(ip::open_or_create, name);
        std::cout << "acquiring named mutex" << std::endl;
        ip::scoped_lock<ip::named_mutex> lock(mtx);
        std::cout << "acquired" << std::endl;
        exit(EXIT_FAILURE); // whoops
    }

On my Linux box, this runs fine the first time, but the second time it hangs waiting to acquire the mutex. I have to manually delete the semaphore in /dev/shm/.

This will happen whenever the process exits without calling destructors of locals; for instance:

- std::exit
- std::quick_exit
- std::abort
- std::terminate
- a crash
- an assert failure
- an unhandled exception
- etc.

I find I can handle *some* of this by registering a terminate handler and an exit handler (and, on C++11, a quick_exit handler) that calls boost::interprocess::named_mutex::remove. This raises a few questions, though...

- Is there a better way?
- Is it safe to `remove` the same named mutex multiple times?
- Does this clean up only this process's use of the named_mutex, or does it nuke it from the system, even if another process is using it? (The docs suggest the latter, which is not what I want, is it?)

Thanks,

--
Eric Niebler
Boost.org
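[Editor's note: for concreteness, the handler registration described above might look something like the sketch below. This is a reconstruction, not code from the thread; it assumes that calling named_mutex::remove on an already-removed mutex merely reports failure (question 2 above), and remove erases the system-wide name, which is exactly the concern in question 3.]

    #include <cstdlib>     // std::atexit, std::at_quick_exit (C++11)
    #include <exception>   // std::set_terminate
    #include <boost/interprocess/sync/named_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>

    namespace ip = boost::interprocess;
    char const *name = "mynamedmutex";

    void cleanup()
    {
        // Assumed benign if the mutex is already gone; remove() simply
        // returns false in that case.
        ip::named_mutex::remove(name);
    }

    void on_terminate()
    {
        cleanup();
        std::abort();
    }

    int main()
    {
        std::atexit(cleanup);             // covers std::exit / normal return
        std::at_quick_exit(cleanup);      // covers std::quick_exit (C++11)
        std::set_terminate(on_terminate); // covers std::terminate and
                                          // unhandled exceptions
        ip::named_mutex mtx(ip::open_or_create, name);
        ip::scoped_lock<ip::named_mutex> lock(mtx);
        std::exit(EXIT_FAILURE); // now cleaned up by the atexit handler
    }

Crashes, std::abort, and assert failures are still not covered, which is presumably why the first question asks for a better way.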

On Mar 6, 2013, at 3:51 PM, Eric Niebler wrote:
I recently discovered that a process can very easily leave a named mutex dangling. [...]
On my Linux box, this runs fine the first time, but the second time it hangs waiting to acquire the mutex. I have to manually delete the semaphore in /dev/shm/.
Hi Eric,

If ipcs lists your named entity, ipcrm should remove it. I'm not sure how boost::ip objects are created, so these commands may not help you.

-- Noel

On 2013-03-06 23:51, Eric Niebler wrote:
On my Linux box, this runs fine the first time, but the second time it hangs waiting to acquire the mutex. I have to manually delete the semaphore in /dev/shm/.
I do not know the specifics of named_mutex, but this is a general problem with Unix shared memory, semaphores, and message queues. The basic problem is that these IPC facilities are designed to be used between processes, and sometimes you actually want them to survive a crash, so it is difficult to garbage-collect them.
- Does this clean up only this process's use of the named_mutex, or does it nuke it from the system, even if another process is using it? (The docs suggest the latter, which is not what I want, is it?)
Probably the latter. The workaround that I have been using is to get the PID of the last process that has accessed the IPC, and check if it is still running. If not, I remove the IPC. I was doing this with a script, so I used the ipcs -p, ps, and ipcrm commands.
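[Editor's note: the same liveness check can be done in-process. A rough sketch, assuming the last owner recorded its PID somewhere readable (a pid file, say; that plumbing is not shown), and noting that a check-then-remove sequence like this is inherently racy if a new owner can appear in between.]

    #include <signal.h>      // kill
    #include <sys/types.h>   // pid_t
    #include <cerrno>
    #include <boost/interprocess/sync/named_mutex.hpp>

    namespace ip = boost::interprocess;

    bool process_alive(pid_t pid)
    {
        // Signal 0 performs error checking only; ESRCH means the process
        // no longer exists (EPERM still means it exists).
        return ::kill(pid, 0) == 0 || errno != ESRCH;
    }

    void remove_if_stale(char const *mutex_name, pid_t last_owner)
    {
        if (!process_alive(last_owner))
            ip::named_mutex::remove(mutex_name); // owner died; reclaim the name
    }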

On Mar 6, 2013, at 5:51 PM, Eric Niebler wrote:
I recently discovered that a process can very easily leave a named mutex dangling. Consider the following:
To deal with a mutex that was held by a thread whose process has died, one needs to use a "robust" mutex. There's a POSIX mutex construction attribute for making robust mutexes, and it's supported on Linux starting circa 2.6.18(?). I think (some versions of?) Windows provide this mechanism in the native mutexes too. There's a little protocol around mutex lock attempts, where an error return code indicates the earlier owner died, so that as part of lock acquisition you've now also acquired responsibility for dealing with any cleanup.

Dealing with robust mutexes is tricky. With exception safety one relies on no-throw operations as basic primitives. The nearest cognate in the robust mutex / cross-process world is (true, not emulated) atomic operations. You can probably guess what that does to complexity.
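[Editor's note: for reference, the raw POSIX protocol being described looks roughly like this. A sketch only: the mutex would really live in shared memory (shm_open/mmap), and recover_shared_state() is a made-up stand-in for whatever repair the application needs.]

    #include <pthread.h>
    #include <cerrno>

    void init_robust_mutex(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    void recover_shared_state(); // hypothetical application-level repair

    bool lock_robust(pthread_mutex_t *m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            // We hold the lock, but the previous owner died while holding
            // it: repair the protected state, then mark the mutex
            // consistent so later lockers see a normal mutex.
            recover_shared_state();
            pthread_mutex_consistent(m);
            return true;
        }
        return rc == 0; // ENOTRECOVERABLE and other errors land here
    }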

On 7 March 2013 17:50, Kim Barrett wrote:
To deal with a mutex that was held by a thread whose process has died, one needs to use a "robust" mutex. [...] Dealing with robust mutexes is tricky.
Tricky indeed. I've been trying to work out how to map robust mutexes to the C++11 Mutex concepts, and have so far decided that calling robust_mutex.lock() or robust_mutex.try_lock() should throw an exception (with errc::state_not_recoverable) if the owner has died, even though C++11 says try_lock() is non-throwing.

Handling the EOWNERDEAD case has to be requested explicitly by the user by passing a special value of type robust_t:

    auto result = robust_mutex.lock(robust);
    if (result == robust_lock_result::locked)
    {
        // got the lock
    }
    else // result == robust_lock_result::inconsistent
    {
        // we have the lock, but state is unknown:
        // attempt to recover state and then either
        robust_mutex.recover();
        // or
        robust_mutex.unlock(); // mark mutex as unusable
    }

Has anyone else looked at trying to fit robust mutexes into the C++11 concepts or into Boost? I'd be interested in working with anyone looking into it.
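[Editor's note: the snippet above leaves the supporting declarations implicit. A hypothetical skeleton of what it seems to assume; these types are a proposal in this thread, not anything in Boost or the standard.]

    #include <system_error>

    struct robust_t {};                  // tag requesting EOWNERDEAD handling
    constexpr robust_t robust{};

    enum class robust_lock_result
    {
        locked,        // normal acquisition
        inconsistent   // acquired, but the previous owner died
    };

    class robust_mutex
    {
    public:
        void lock();                        // throws std::system_error with
        bool try_lock();                    //   errc::state_not_recoverable
                                            //   if the owner has died
        robust_lock_result lock(robust_t);  // explicit EOWNERDEAD handling
        void recover();                     // mark state consistent again
        void unlock();                      // without recover(), marks the
                                            //   mutex unusable
        // ...
    };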

On Mar 7, 2013, at 1:21 PM, Jonathan Wakely wrote:
Tricky indeed, I've been trying to work out how to map robust mutexes to the C++11 Mutex concepts, and have so far decided that calling robust_mutex.lock() or robust_mutex.try_lock() should throw an exception (with errc::state_not_recoverable) if the owner has died, even though C++11 says try_lock() is non-throwing.
I think that demonstrates that robust mutexes are not models of the C++11 Mutex concepts.
Handling the EOWNERDEAD case has to be requested explicitly by the user by passing a special value of type robust_t:
My approach to using robust mutexes does not attempt to treat them as models of the C++11 Mutex concept. Instead, there are now locking operations associated with them which *require* a handler function that gets invoked on EOWNERDEAD.

This approach filtered up to condition variables too; one can't use C++11 / Boost.Thread condition variables with these robust mutexes, because of the requirement for a handler for EOWNERDEAD.

I thought about providing C++11 Mutex-like operations to allow these robust mutexes to be used like ordinary mutexes, but ultimately decided there was no real use case for that, since the whole point of a robust mutex is to support EOWNERDEAD handling.
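[Editor's note: a sketch of what such a handler-required locking operation might look like. This is hypothetical, not the actual code being described; it assumes a robust pthread_mutex_t set up as in the earlier sketch.]

    #include <pthread.h>
    #include <cerrno>

    template <typename RecoveryHandler>
    bool lock_with_recovery(pthread_mutex_t &m, RecoveryHandler on_owner_death)
    {
        int rc = pthread_mutex_lock(&m);
        if (rc == EOWNERDEAD) {
            // The handler must repair the protected state; only then is
            // the mutex marked consistent. There is deliberately no
            // overload without a handler: EOWNERDEAD cannot be ignored.
            on_owner_death();
            pthread_mutex_consistent(&m);
            return true;
        }
        return rc == 0;
    }

Callers with nothing to repair still have to say so explicitly, e.g. lock_with_recovery(m, []{});.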
participants (5)
- Belcourt, Kenneth
- Bjorn Reese
- Eric Niebler
- Jonathan Wakely
- Kim Barrett