[Interprocess] hang locking p_hdr->m_mutex

I've got my machine (Win 7) into a state where using a message queue
results in a hang. More specifically, calls to:
scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); hang forever.

El 20/08/2011 0:54, David Byron escribió:
I've got my machine (Win 7) into a state where using a message queue results in a hang. More specifically, calls to:
scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); hang forever.
As a bit more background, there's a message queue that holds up to 5 x 1024 byte messages, and I believe it currently has 2. I'm having a hard time verifying that because calls to get_num_msg() hang on calls like the one above, but that's what I see when I look in c:/ProgramData/boost_interprocess/20110816195406.550541/CliIPC.
I have a feeling that what happened is that some process that was using the message queue got terminated while holding the p_hdr->m_mutex. Does anyone have any suggestions for recovering from this?
I'm using version 1.47.
For now, the only thing you can do is use a timed lock. It's not easy to support robust mutexes when using emulated process-shared resources. Ion

On 8/19/2011 4:43 PM, Ion Gaztañaga wrote:
El 20/08/2011 0:54, David Byron escribió:
I've got my machine (Win 7) into a state where using a message queue results in a hang. More specifically, calls to:
scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); hang forever.
For now, the only thing you can do is use a timed lock. It's not easy to support robust mutexes when using emulated process-shared resources.
Except that the above code is in message_queue.hpp even in try_send or
timed_send. It's in get_num_msg() for example. So short of modifying
message_queue.hpp I'm not sure what to do.
One thing that came to mind is always calling message_queue::remove on
startup. I think that gets past this problem, but may introduce some
other complications if the two processes that use the message queue
start and end at undefined times in an undefined order. I'm still
trying to test this, but perhaps you can help by answering this question:
If a process is waiting for the mutex like this:
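scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); what happens if another process removes the message queue?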

El 20/08/2011 1:57, David Byron escribió:
Except that the above code is in message_queue.hpp even in try_send or timed_send. It's in get_num_msg() for example. So short of modifying message_queue.hpp I'm not sure what to do.
Sorry, I didn't understand you. In those cases, the only solution would be to configure at compile time (macro or whatever) a maximum lock time and throw an exception just to notify that a deadlock might be occurring. Detecting the death of a mutex owner is not an easy task without kernel support.
One thing that came to mind is always calling message_queue::remove on startup. I think that gets past this problem, but may introduce some other complications if the two processes that use the message queue start and end at undefined times in an undefined order. I'm still trying to test this, but perhaps you can help by answering this question:
If a process is waiting for the mutex like this:
scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); what happens if another process removes the message queue?
The queue is removed (in windows, the name of the queue is changed so that no other connection succeeds, and the queue is marked to be erased when the last handle is closed). Ion

On 8/19/2011 11:21 PM, Ion Gaztañaga wrote:
El 20/08/2011 1:57, David Byron escribió:
Except that the above code is in message_queue.hpp even in try_send or timed_send. It's in get_num_msg() for example. So short of modifying message_queue.hpp I'm not sure what to do.
Sorry, I didn't understand you. In those cases, the only solution would be to configure at compile time (macro or whatever) a maximum lock time and throw an exception just to notify that a deadlock might be occurring. Detecting the death of a mutex owner is not an easy task without kernel support.
Instead of using a persistent integer on windows, how about CreateMutex? Then if the mutex holder dies, the other side learns about it because WaitForSingleObject returns WAIT_ABANDONED. But then it's not so important to learn that the other side is dead...just that we've got the mutex and it's OK to proceed.
At the moment I see two implementations in interprocess/sync/interprocess_mutex.hpp -- a posix one and emulation/mutex.hpp. My understanding is that a process waiting on a posix interprocess_mutex wakes up (having taken ownership) if the owner process dies. Is that right? Adding an implementation for windows that behaves that way feels like a big bonus -- way better than hanging. Is something like this feasible? I'm fairly new to boost so I have a feeling creating a patch is going to be pretty slow going, but I'll take a crack at it if it would help.
Barring this, I'm struggling to come up with a safe way to use boost::interprocess::message_queue on windows (or any platform that uses the emulation interprocess_mutex really). If anyone has any suggestions, I'd love to hear them.
One thing that came to mind is always calling message_queue::remove on startup. I think that gets past this problem, but may introduce some other complications if the two processes that use the message queue start and end at undefined times in an undefined order. I'm still trying to test this, but perhaps you can help by answering this question:
If a process is waiting for the mutex like this:
scoped_lock<interprocess_mutex> lock(p_hdr->m_mutex); what happens if another process removes the message queue?
The queue is removed (in windows, the name of the queue is changed so that no other connection succeeds, and the queue is marked to be erased when the last handle is closed).
That's useful information, but what I'm also curious about is whether the process waiting for the mutex wakes up or not. Using the emulated interprocess_mutex, I'm not sure. Because the waiting process still likely has a handle, I guess it hangs forever, yes? Thanks for your help. -DB

El 25/08/2011 20:53, David Byron escribió:
Instead of using a persistent integer on windows, how about CreateMutex? Then if the mutex holder dies, the other side learns about it because WaitForSingleObject returns WAIT_ABANDONED. But then it's not so important to learn that the other side is dead...just that we've got the mutex and it's OK to proceed.
The key problem is object lifetime. In windows, when the last attached process dies or closes the handle, the interprocess mechanism is automatically destroyed. In Unix, it's more like a file, and an explicit unlink must be done. That's the real problem of portable semantics between windows and unix.
My understanding is that a process waiting on a posix interprocess_mutex wakes up (having taken ownership) if the owner process dies. Is that right? Adding an implementation for windows that behaves that way feels like a big bonus -- way better than hanging.
No. There is an option, robust mutexes, not implemented by all systems, that establishes a protocol for handling abandoned mutexes: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_...
The queue is removed (in windows, the name of the queue is changed so that no other connection succeeds, and the queue is marked to be erased when the last handle is closed).
That's useful information, but what I'm also curious about is whether the process waiting for the mutex wakes up or not. Using the emulated interprocess_mutex, I'm not sure. Because the waiting process still likely has a handle, I guess it hangs forever, yes?
It will hang, yes, just like in a posix system without robust mutex support. Adding robust mutex support is on my to-do list, along with exploring some emulation for windows, but I guess that would require a huge amount of work, because I don't know of any portable runtime that has achieved this. Best, Ion
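For readers unfamiliar with the robust mutex protocol mentioned above, here is a minimal sketch of how it behaves on a system that supports it (for example Linux with glibc). It is only an illustration, not Boost.Interprocess code: the function name lock_after_owner_death is hypothetical, and the "dead owner" is a thread in the same process rather than a crashed process. A real cross-process setup would also need the mutex placed in shared memory with the PTHREAD_PROCESS_SHARED attribute.

```cpp
#include <pthread.h>
#include <cerrno>

namespace {

pthread_mutex_t g_mutex;

// Thread body: take the mutex and terminate without ever unlocking it.
void* lock_and_exit(void*) {
    pthread_mutex_lock(&g_mutex);
    return nullptr;  // the owner "dies" while still holding the mutex
}

}  // namespace

// Hypothetical helper: returns the result of locking a robust mutex
// whose previous owner terminated while holding it.
int lock_after_owner_death() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // Request the robust protocol; not every system implements it.
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_t owner;
    pthread_create(&owner, nullptr, lock_and_exit, nullptr);
    pthread_join(owner, nullptr);  // the owner is gone, the mutex is still locked

    // Instead of hanging, the lock succeeds and reports the dead owner.
    int rc = pthread_mutex_lock(&g_mutex);
    if (rc == EOWNERDEAD) {
        // The protected data may be inconsistent; repair it here, then
        // mark the mutex usable again.
        pthread_mutex_consistent(&g_mutex);
    }
    pthread_mutex_unlock(&g_mutex);
    pthread_mutex_destroy(&g_mutex);
    return rc;
}
```

The caller that receives EOWNERDEAD owns the mutex but is responsible for deciding whether the shared state can be trusted. Without the robust attribute, the same lock call would block forever, which is the hang discussed in this thread.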

On 8/25/2011 9:56 PM, Ion Gaztañaga wrote:
El 25/08/2011 20:53, David Byron escribió:
Instead of using a persistent integer on windows, how about CreateMutex? Then if the mutex holder dies, the other side learns about it because WaitForSingleObject returns WAIT_ABANDONED. But then it's not so important to learn that the other side is dead...just that we've got the mutex and it's OK to proceed.
The key problem is object lifetime. In windows, when the last attached process dies or closes the handle, the interprocess mechanism is automatically destroyed. In Unix, it's more like a file, and an explicit unlink must be done. That's the real problem of portable semantics between windows and unix.
I see. So I guess I repeat my earlier comment/question (slightly modified): I'm struggling to come up with a safe way to use boost::interprocess::message_queue. If anyone has any suggestions, I'd love to hear them.
The queue is removed (in windows, the name of the queue is changed so that no other connection succeeds, and the queue is marked to be erased when the last handle is closed).
That's useful information, but what I'm also curious about is whether the process waiting for the mutex wakes up or not. Using the emulated interprocess_mutex, I'm not sure. Because the waiting process still likely has a handle, I guess it hangs forever, yes?
It will hang, yes, just like in a posix system without robust mutex support. Adding robust mutex support is on my to-do list, along with exploring some emulation for windows, but I guess that would require a huge amount of work, because I don't know of any portable runtime that has achieved this.
If CreateMutex behaves the "right way" on windows, does it make sense to have the behavior differ across platforms? Do you agree that using CreateMutex instead of the emulation mutex would prevent hangs? Are there other downsides I haven't considered? -DB

El 26/08/2011 14:40, David Byron escribió:
If CreateMutex behaves the "right way" on windows, does it make sense to have the behavior differ across platforms?
Portability is the most important goal for Interprocess :( And CreateMutex needs a name; you can't construct a named mutex in shared memory. The two are different beasts. Best, Ion

On 8/26/2011 9:56 AM, Ion Gaztañaga wrote:
El 26/08/2011 14:40, David Byron escribió:
If CreateMutex behaves the "right way" on windows, does it make sense to have the behavior differ across platforms?
Portability is the most important goal for Interprocess :(
Makes sense. I'm not hell bent on changing it. I'd love to use it just as it is. I just can't figure out how to do it safely given that a process might die while holding the interprocess_mutex. I could easily be missing something. If someone would tell me if that's the case, I'd be eternally grateful.
And CreateMutex needs a name; you can't construct a named mutex in shared memory. The two are different beasts.
From http://msdn.microsoft.com/en-us/library/ms682411%28v=vs.85%29.aspx: "Multiple processes can have handles of the same mutex object, enabling use of the object for interprocess synchronization." and then: "A process can specify a named mutex in a call to the OpenMutex or CreateMutex function to retrieve a handle to the mutex object." The name of the message queue seems OK, perhaps beginning with "Global\" on some versions of windows. So I still think windows mutexes would work. -DB

El 27/08/2011 1:25, David Byron escribió:
On 8/26/2011 9:56 AM, Ion Gaztañaga wrote:
El 26/08/2011 14:40, David Byron escribió:
If CreateMutex behaves the "right way" on windows, does it make sense to have the behavior differ across platforms?
Portability is the most important goal for Interprocess :(
Makes sense. I'm not hell bent on changing it. I'd love to use it just as it is. I just can't figure out how to do it safely given that a process might die while holding the interprocess_mutex. I could easily be missing something. If someone would tell me if that's the case, I'd be eternally grateful.
I'm integrating a patch kindly sent by Ross MacGregor that activates a timeout when locking, if a define is set. When you can't lock a mutex for a time longer than X milliseconds, a special exception is thrown. Then you should erase that resource (message queue or whatever) as it is likely to be corrupted by the crashing process. I hope we can put this in Boost 1.48.
And CreateMutex needs a name; you can't construct a named mutex in shared memory. The two are different beasts.
From http://msdn.microsoft.com/en-us/library/ms682411%28v=vs.85%29.aspx:
"Multiple processes can have handles of the same mutex object, enabling use of the object for interprocess synchronization."
and then:
"A process can specify a named mutex in a call to the OpenMutex or CreateMutex function to retrieve a handle to the mutex object."
The name of the message queue seems OK, perhaps beginning with "Global\" on some versions of windows.
So I still think windows mutexes would work.
I repeat, you still have lifetime issues so we can't maintain message queue lifetime semantics and use simple CreateMutex. Best, Ion
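The timeout-then-exception idea described in this message can be sketched roughly as follows. This is not the patch itself, whose details are not shown in the thread. It is an in-process illustration using std::timed_mutex as a stand-in for an interprocess mutex, and the names lock_timeout, with_timed_lock, and times_out_while_owner_is_stuck are all hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical exception signalling that the lock owner may have died.
struct lock_timeout : std::runtime_error {
    lock_timeout() : std::runtime_error("lock timed out; owner may have died") {}
};

// Run fn under the mutex, but give up after max_wait instead of hanging.
template <class Fn>
auto with_timed_lock(std::timed_mutex& m, std::chrono::milliseconds max_wait,
                     Fn fn) -> decltype(fn()) {
    std::unique_lock<std::timed_mutex> guard(m, max_wait);  // uses try_lock_for
    if (!guard.owns_lock()) {
        throw lock_timeout();
    }
    return fn();
}

// Demonstration: a second thread holds the mutex (simulating a stuck owner)
// while the caller waits with a short timeout.
bool times_out_while_owner_is_stuck() {
    std::timed_mutex m;
    std::mutex gate_mutex;
    std::condition_variable gate;
    bool locked = false;
    bool release = false;

    std::thread owner([&] {
        std::lock_guard<std::timed_mutex> hold(m);
        { std::lock_guard<std::mutex> g(gate_mutex); locked = true; }
        gate.notify_all();
        std::unique_lock<std::mutex> g(gate_mutex);
        gate.wait(g, [&] { return release; });  // keep holding m until released
    });

    {   // Wait until the owner thread really holds the mutex.
        std::unique_lock<std::mutex> g(gate_mutex);
        gate.wait(g, [&] { return locked; });
    }

    bool timed_out = false;
    try {
        with_timed_lock(m, std::chrono::milliseconds(50), [] { return 0; });
    } catch (const lock_timeout&) {
        // In the interprocess case, this is where the resource would be
        // removed and recreated, since it is likely to be corrupted.
        timed_out = true;
    }

    { std::lock_guard<std::mutex> g(gate_mutex); release = true; }
    gate.notify_all();
    owner.join();
    return timed_out;
}
```

A timeout cannot distinguish a dead owner from a merely slow one, so the maximum wait has to be chosen well above any legitimate hold time.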

On 8/28/2011 9:08 AM, Ion Gaztañaga wrote:
I'm integrating a patch kindly sent by Ross MacGregor that activates a timeout when locking, if a define is set. When you can't lock a mutex for a time longer than X milliseconds, a special exception is thrown. Then you should erase that resource (message queue or whatever) as it is likely to be corrupted by the crashing process.
I hope we can put this in Boost 1.48.
That would be great.
I repeat, you still have lifetime issues so we can't maintain message queue lifetime semantics and use simple CreateMutex.
I agree with you. It's finally sinking in that the mechanism I'm really looking for is a message queue with process lifetime, and it would be fine for me to use windows_shared_memory only if that's easier. It does seem easier to use since there's less of a distinction between the two processes using the queue: both can open_or_create and neither one needs to remove. And I don't need to build anything into my protocol to deal with stale messages left over from previous processes. If anyone has code for something like this, I'd love to see it.
Even assuming an interprocess_mutex implemented with CreateMutex, I'm not sure what the corresponding changes need to be in interprocess_condition, or whether this is even the right way to go in the first place. Thanks much. -DB
participants (2): David Byron, Ion Gaztañaga