[interprocess] native Windows cond_var + mutex

Hi,

Attached is a patch to implement interprocess_condition and interprocess_mutex natively on Windows. Following are some detailed notes. I hope this isn't too awkward of a time for a submission like this - I realize there's a lot of 1.48 activity going on.

One more note about the patch - IIRC, some patch utils won't create directories. This patch is intended to be applied to boost/interprocess and relies on a new directory boost/interprocess/sync/win32.

Background

I started out by looking at http://www.cs.wustl.edu/~schmidt/win32-cv-1.html. This appears to be from the late '90s and is the basis for ACE's cond_var implementation. The solution in section 3.4 is arguably the most correct implementation. However, this solution relies on the atomicity of SignalObjectAndWait(). SOAW() may have been atomic on the unicore systems common at the time, but from what I've read it is not atomic on multicore (it prevents context switches on a single core but ignores concurrent threads on other cores), and the win32 documentation backs this up. While the paper argues that this leads to unfairness, I don't think the solution is correct without this atomicity.

Wait/signal/wait problem

One problem that I came across is what I'll call the wait/signal/wait problem. To set the scene: thread 1 is waiting on a condition and thread 2 has just completed cond_signal. Without an atomic SOAW(), thread 1 may release the external mutex without obtaining the associated semaphore. Another thread, thread 3, may then obtain the external lock, find the cond_var signaled, and behave accordingly. I believe this is incorrect: a waiter which obtains the external mutex after the signaler has exited signal() should not find the cond_var signaled. This situation is unlikely, but dismissing it makes assumptions about the scheduler that I am not comfortable with. If there's a flaw in my reasoning, hopefully someone will point it out.

This solution

This solution is structurally somewhat similar to the paper's solution. One point to note is that notify_one()/signal() blocks on an event in the same manner as notify_all()/broadcast(). There may be a solution which does not block the signaler while wake-processing is occurring, but I saw additional complexity on that path and decided to forgo it for the time being. I hope the solution is correct but have only testing to validate it. I haven't used any formal methods to back it up. Maybe I'll construct a Petri net sometime. That's always .. fun. I throw this solution on the mercy of the court.

External mutex

I have yet to see production examples of cond_var use where signalers do not hold the external mutex while they signal, but this use case is allowed by POSIX. My original solution (if it is in fact a solution) relies on some particular external mutex being held by all waiters and signalers for some particular condition variable. I recently added two mutexes to cover the case where signalers do not hold it, but I have only minimal testing experience with that recent addition. Use of these new locks is demarcated with WIN32_POSIX_SEMANTICS.

Interprocess vs. Intraprocess

I'm not sure if there is support for process-specific synchronization objects in boost.interprocess, but if support is present or were added at some future time, this could help optimize things for win32. At the moment win32 interprocess mutexes rely on assigning random names to mutexes (only named mutexes can cross process boundaries in win32), and all mutex operations have to look up the mutex by this name. Interprocess seems to have been designed primarily with pthreads in mind, and things get a little awkward on Windows in this regard. Similarly, it would be nice to be able to use the native condition variables that win32 added for Vista+, but these can't be used across processes according to the documentation, and so can't be used AFAICT for interprocess_condition. Correspondingly, an intraprocess_mutex would be able to forgo the win32 mutex name lookups.

Miscellaneous

Not all interprocess tests currently pass. There are some additional variations of mutex, like the upgradable/sharable ones, that I haven't looked at in detail. I'm hoping that this patch goes far enough that it'll have some momentum. I'm happy to either complete this work in the future or help whoever undertakes the task, but in the short term I may not be able to justify much of my time on it to my employer.

I realized a little too late that I didn't keep formatting entirely consistent with the existing code. Apologies. Hopefully it's only the opening block curlies.

Thanks.
Dan
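
The wait/signal/wait interleaving described in the notes above can be replayed deterministically with a toy model. This is only a sketch under assumptions: `ToyCondVar`, its `waiters` and `sem` counters, and the step functions are hypothetical stand-ins for a semaphore-based condition variable, not code from the patch or from the paper, and the "threads" are simply labelled steps executed in a fixed order.

```cpp
// Toy, single-threaded replay of the wait/signal/wait interleaving.
// All names are hypothetical; no real Win32 calls are made.
struct ToyCondVar {
    int waiters = 0;  // threads that have registered as waiting
    int sem = 0;      // pending wakeups (models a semaphore count)
};

// First half of a non-atomic wait: register as a waiter and release
// the external mutex, without yet blocking on the semaphore.
void begin_wait(ToyCondVar& cv) { ++cv.waiters; }

// Second half of wait: try to consume a wakeup.
// Returns false when the calling thread would have to block.
bool try_finish_wait(ToyCondVar& cv) {
    if (cv.sem == 0) return false;
    --cv.sem;
    --cv.waiters;
    return true;
}

// signal(): post one wakeup if anybody is registered as waiting.
void signal_one(ToyCondVar& cv) {
    if (cv.waiters > 0) ++cv.sem;
}

struct Outcome {
    bool thread1_woke;
    bool thread3_woke;
};

Outcome replay_wait_signal_wait() {
    ToyCondVar cv;
    begin_wait(cv);                 // thread 1: mutex released, semaphore not yet awaited
    signal_one(cv);                 // thread 2: signals and returns
    begin_wait(cv);                 // thread 3: starts waiting after signal() completed
    bool t3 = try_finish_wait(cv);  // thread 3 consumes the wakeup posted for thread 1
    bool t1 = try_finish_wait(cv);  // thread 1 finds nothing left and would block
    return Outcome{t1, t3};
}
```

In this model `replay_wait_signal_wait()` reports that thread 3 woke and thread 1 did not, which is the outcome the notes argue is incorrect.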

On 27/10/2011 14:24, Dan Brown wrote:
> Hi,
> Attached is a patch to implement interprocess_condition and interprocess_mutex natively on Windows. Following are some detailed notes. I hope this isn't too awkward of a time for a submission like this - I realize there's a lot of 1.48 activity going on. One more note about the patch - IIRC, some patch utils won't create directories. This patch is intended to be applied to boost/interprocess and relies on a new directory boost/interprocess/sync/win32.

You can't put HANDLEs as members of classes shared in memory between processes, as they are void pointers that are only useful to the process that creates them. Using native Windows synchronization for process-shared synchronization primitives with POSIX lifetime semantics is not easy; no project like APR or Cygwin has achieved this AFAIK.

Ion
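
The point about handles being meaningful only in the creating process can be illustrated with a toy model. This is a sketch under assumptions, not Win32 code: `KernelObject`, `ProcessHandleTable`, and `handle_resolves` are hypothetical names, and each simulated process simply owns a private table from handle value to object.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a kernel object; not a real Win32 type.
struct KernelObject {
    std::string name;
};

// Each simulated process owns a private table from handle value to object.
class ProcessHandleTable {
public:
    std::uintptr_t open(std::shared_ptr<KernelObject> obj) {
        std::uintptr_t h = next_;
        next_ += 4;
        table_[h] = std::move(obj);
        return h;
    }
    // Returns nullptr when the value is not a handle in *this* process.
    std::shared_ptr<KernelObject> resolve(std::uintptr_t h) const {
        auto it = table_.find(h);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::uintptr_t next_ = 4;
    std::map<std::uintptr_t, std::shared_ptr<KernelObject>> table_;
};

// Process A stores its handle value in "shared memory"; the value is then
// resolved either by A itself or by a second process B.
bool handle_resolves(bool same_process) {
    ProcessHandleTable a, b;
    auto mutex_obj = std::make_shared<KernelObject>(KernelObject{"mutex"});
    std::uintptr_t shared_slot = a.open(mutex_obj);  // written by A
    const ProcessHandleTable& reader = same_process ? a : b;
    return reader.resolve(shared_slot) != nullptr;
}
```

In the model, the stored value resolves in the process that opened it and resolves to nothing in the other one, which is why a raw handle member inside a shared-memory object is not usable by a second process.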

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Ion Gaztañaga
Sent: Friday, October 28, 2011 1:42 PM
To: boost@lists.boost.org
Subject: Re: [boost] [interprocess] native Windows cond_var + mutex

> On 27/10/2011 14:24, Dan Brown wrote:
>> Hi,
>> Attached is a patch to implement interprocess_condition and interprocess_mutex natively on Windows. Following are some detailed notes. I hope this isn't too awkward of a time for a submission like this - I realize there's a lot of 1.48 activity going on. One more note about the patch - IIRC, some patch utils won't create directories. This patch is intended to be applied to boost/interprocess and relies on a new directory boost/interprocess/sync/win32.
> You can't put HANDLEs as members of classes shared in memory between processes, as they are void pointers that are only useful to the process that creates them. Using native Windows synchronization for process-shared synchronization primitives with POSIX lifetime semantics is not easy; no project like APR or Cygwin has achieved this AFAIK.

Yes and no. You're correct in that I overlooked the use of handles for interprocess_condition. I believe, though, that solving this for the interprocess_condition event handles is the same as the solution I use for win32/interprocess_mutex. Named mutexes (and events) can be used cross-process. It's a fairly straightforward fix unless I'm missing something. I'll get you an updated patch shortly. Thanks for the feedback.

Dan

> Yes and no. You're correct in that I overlooked the use of handles for interprocess_condition. I believe, though, that solving this for the interprocess_condition event handles is the same as the solution I use for win32/interprocess_mutex. Named mutexes (and events) can be used cross-process. It's a fairly straightforward fix unless I'm missing something. I'll get you an updated patch shortly.

For each lock you are creating handles without closing them (leaks), and connecting by name for each lock is also pretty inefficient. CloseHandle in the destructor is useless: no one is going to call it, as users will just remove the shared memory. Mutex names are local and can't be seen by other users/services... Believe me, using native calls to emulate process-shared pthreads is not that easy.

Ion
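
The per-lock handle leak can be pictured with a small model. This is a sketch under assumptions: `FakeKernel`, `ScopedHandle`, and the `lock_*` functions are hypothetical and only count open handles; they are not the patch's code and make no Win32 calls.

```cpp
#include <cstddef>

// Hypothetical stand-in for opening and closing a named kernel object.
struct FakeKernel {
    std::size_t open_handles = 0;
    int open_by_name() { ++open_handles; return 1; }
    void close(int) { --open_handles; }
};

// Leaky variant: a new handle is opened on every lock and never closed.
void lock_leaky(FakeKernel& k) {
    int h = k.open_by_name();
    (void)h;  // the handle would be waited on here, then forgotten
}

// RAII guard that closes the handle when the locking call returns.
class ScopedHandle {
public:
    explicit ScopedHandle(FakeKernel& k) : k_(k), h_(k.open_by_name()) {}
    ~ScopedHandle() { k_.close(h_); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

private:
    FakeKernel& k_;
    int h_;
};

void lock_scoped(FakeKernel& k) {
    ScopedHandle h(k);  // the handle would be waited on here
}

// Number of handles still open after n lock operations.
std::size_t open_handles_after(std::size_t n, bool scoped) {
    FakeKernel k;
    for (std::size_t i = 0; i < n; ++i) {
        if (scoped) lock_scoped(k); else lock_leaky(k);
    }
    return k.open_handles;
}
```

Closing inside the locking call removes the leak in this model, but it does nothing about the cost of the by-name lookup on every lock, and it does not depend on a destructor that may never run when users simply remove the shared memory.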

>> Yes and no. You're correct in that I overlooked the use of handles for interprocess_condition. I believe, though, that solving this for the interprocess_condition event handles is the same as the solution I use for win32/interprocess_mutex. Named mutexes (and events) can be used cross-process. It's a fairly straightforward fix unless I'm missing something. I'll get you an updated patch shortly.
> For each lock you are creating handles without closing them (leaks) and

Yes, lock() and wait() have this leak. I will address this in a new patch.

> connecting by name for each lock is also pretty inefficient.

Agreed. However, the trade-off is substantially reduced latency, since currently waking on a signaled sync object on win32 takes, on average, at least half a millisecond.

> CloseHandle in the destructor is useless, as no one is going to call it as users will just remove shared memory.

I'm not sure I understand this point, and perhaps I don't understand the intended usage model for interprocess mutex/condvar well enough. Do you mean that an expected valid use of sync objects is to construct them in shared memory but not destroy them - only delete their underlying storage? Is this explicit or implicit in the sync object documentation? Or am I misunderstanding your point?

> Mutex names are local and can't be seen by other users/services....

Wouldn't prepending "Global\" to the names of the win32 synchronization objects address this scoping issue?

> Believe me, using native calls to emulate process-shared pthreads is not that easy.

I'm open to that possibility, but I'm still not convinced that this solution is as fundamentally off the mark as you seem to be suggesting. I will try to provide a new patch soon that addresses at least some of the issues you've raised to support that position. In my mind this also re-raises the question of whether strictly intraprocess synchronization objects could be useful, since these are, if not more common than strictly inter-process sync objects, at least very commonly used on Windows and I believe other platforms as well (ACE provides these, for example). This is, in fact, precisely what I was hoping to use interprocess for in my own cross-platform project. But perhaps a library called "interprocess" is not the right home for these?

On 29/10/2011 12:42, Dan Brown wrote:
> I'm open to that possibility, but I'm still not convinced that this solution is as fundamentally off the mark as you seem to be suggesting. I will try to provide a new patch soon that addresses at least some of the issues you've raised to support that position. In my mind this also re-raises the question of whether strictly intraprocess synchronization objects could be useful, since these are, if not more common than strictly inter-process sync objects, at least very commonly used on Windows and I believe other platforms as well (ACE provides these, for example). This is, in fact, precisely what I was hoping to use interprocess for in my own cross-platform project. But perhaps a library called "interprocess" is not the right home for these?

Interprocess tries to be portable, and that means choosing a model for lifetime, etc. The chosen model is POSIX. Windows has essentially *named* process synchronization mechanisms, but their lifetime semantics are radically different from the POSIX ones, and that's a problem (although solvable, as I think Cygwin manages this). Several emulation layers need services, daemons, etc., a model Interprocess and Boost don't follow.

ACE does not support mutexes placed in shared memory (_POSIX_THREAD_PROCESS_SHARED). Cygwin does not support it. APR does not support it.

I've reviewed several libraries to find a better solution than the current spinlock-based code, but still have not found a good one. I'm not saying that it does not exist, I'm just saying that it is not as simple as it might seem. Interprocess has a solution, not very efficient but portable.

Ion

>> I'm open to that possibility, but I'm still not convinced that this solution is as fundamentally off the mark as you seem to be suggesting. I will try to provide a new patch soon that addresses at least some of the issues you've raised to support that position. In my mind this also re-raises the question of whether strictly intraprocess synchronization objects could be useful, since these are, if not more common than strictly inter-process sync objects, at least very commonly used on Windows and I believe other platforms as well (ACE provides these, for example). This is, in fact, precisely what I was hoping to use interprocess for in my own cross-platform project. But perhaps a library called "interprocess" is not the right home for these?
> Interprocess tries to be portable, and that means choosing a model for lifetime, etc. The chosen model is POSIX. Windows has essentially *named* process synchronization mechanisms, but their lifetime semantics are radically different from the POSIX ones, and that's a problem (although solvable, as I think Cygwin manages this). Several emulation layers need services, daemons, etc., a model Interprocess and Boost don't follow.
> ACE does not support mutexes placed in shared memory (_POSIX_THREAD_PROCESS_SHARED). Cygwin does not support it. APR does not support it.

This was actually my intended point, though maybe I didn't make it clearly. The fact that these libraries (ACE, etc.) provide intraprocess sync objects but not interprocess sync objects is evidence that intraprocess sync objects meet many people's needs. In fact, I find that I need sync objects within a process much more often than I need them between processes. If the boost.interprocess sync usage model cannot be more efficiently implemented on win32, either by finding a way to implement the current model, by relaxing the model's requirements (see my next question below), or by providing strictly intraprocess versions of sync objects, then the sync objects are unsuitable for my current purposes. I don't think my requirements are unusual in this respect either. My own needs, together with the fact that other libraries like ACE provide this cross-platform functionality, convince me that Boost would benefit from the addition of similar intraprocess sync objects.

> I've reviewed several libraries to find a better solution than the current spinlock-based code, but still have not found a good one. I'm not saying that it does not exist, I'm just saying that it is not as simple as it might seem. Interprocess has a solution, not very efficient but portable.

To this point in the discussion, the only issue I can see as fundamentally preventing the proposed implementation from working (assuming the addition of the named-event fixes I alluded to earlier) is the named-object "lifetime semantics" issue that you mentioned, but I'm not sure I understand it fully. It would be helpful for me if you could clarify the difference in sync object lifetime semantics that makes cross-process named win32 sync objects unsuitable for supporting the boost.interprocess version of pthread sync object semantics.

On 30/10/2011 1:48, Dan Brown wrote:
> This was actually my intended point, though maybe I didn't make it clearly. The fact that these libraries (ACE, etc.) provide intraprocess sync objects but not interprocess sync objects is evidence that intraprocess sync objects meet many people's needs. In fact, I find that I need sync objects within a process much more often than I need them between processes.

Interprocess is about cross-process synchronization; if you need intra-process synchronization, Boost.Thread is the natural way. In Interprocess you can customize the mutexes used by some classes for intra-process use via templates (see http://www.boost.org/libs/dic/html/interprocess/customizing_interprocess.htm... and the class reference for details).
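
The idea of selecting the mutex through a template parameter can be sketched with the standard library alone. This is an illustration of the general technique, not Boost.Interprocess's actual customization interface: `NullMutex`, `Counter`, and `count_to` are hypothetical names, and `std::mutex` stands in for an intra-process mutex.

```cpp
#include <mutex>

// Hypothetical no-op mutex for code that is known to be single-threaded.
struct NullMutex {
    void lock() {}
    void unlock() {}
};

// A class whose locking policy is chosen by the caller at compile time.
template <class Mutex>
class Counter {
public:
    int increment() {
        std::lock_guard<Mutex> guard(mutex_);  // works with any lock()/unlock() type
        return ++value_;
    }

private:
    Mutex mutex_;
    int value_ = 0;
};

// Increment a fresh counter n times and return the last value seen.
template <class Mutex>
int count_to(int n) {
    Counter<Mutex> counter;
    int last = 0;
    for (int i = 0; i < n; ++i) last = counter.increment();
    return last;
}
```

Both `count_to<std::mutex>(3)` and `count_to<NullMutex>(3)` return 3; only the locking cost differs, which is the kind of trade a template parameter lets the user make.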
> If the boost.interprocess sync usage model cannot be more efficiently implemented on win32, either by finding a way to implement the current model, by relaxing the model's requirements (see my next question below), or by providing strictly intraprocess versions of sync objects, then the sync objects are unsuitable for my current purposes.

What are your requirements? If you plan to build portable intra-process synchronization objects, I think the natural move is to propose them for Boost.Thread. If you plan to build inter-process mechanisms, then they should fit Interprocess's requirements.

> To this point in the discussion, the only issue I can see as fundamentally preventing the proposed implementation from working (assuming the addition of the named-event fixes I alluded to earlier) is the named-object "lifetime semantics" issue that you mentioned, but I'm not sure I understand it fully. It would be helpful for me if you could clarify the difference in sync object lifetime semantics that makes cross-process named win32 sync objects unsuitable for supporting the boost.interprocess version of pthread sync object semantics.

It's a bit long to explain, but on Windows named resources are reference-counted and are destroyed when the last attached process dies or detaches. Windows has no unnamed process-shared synchronization primitives; unnamed ones are intra-process only. In POSIX, named resources are file-like (they live until they are explicitly removed), and unnamed ones can be intra-process or interprocess (PTHREAD_PROCESS_SHARED). Unnamed interprocess resources must be constructed in shared memory or memory-mapped files; you can use them, unmap them, remap them, and continue using them. For more details I recommend:

UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications
http://www.kohala.com/start/unpv22e/unpv22e.html

Best,
Ion

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Ion Gaztañaga
Sent: Sunday, October 30, 2011 5:37 AM
To: boost@lists.boost.org
Subject: Re: [boost] [interprocess] native Windows cond_var + mutex

> On 30/10/2011 1:48, Dan Brown wrote:
>> This was actually my intended point, though maybe I didn't make it clearly. The fact that these libraries (ACE, etc.) provide intraprocess sync objects but not interprocess sync objects is evidence that intraprocess sync objects meet many people's needs. In fact, I find that I need sync objects within a process much more often than I need them between processes.
> Interprocess is about cross-process synchronization; if you need intra-process synchronization, Boost.Thread is the natural way. In Interprocess you can customize the mutexes used by some classes for intra-process use via templates (see http://www.boost.org/libs/dic/html/interprocess/customizing_interprocess.html and the class reference for details).

The fact that Boost.Thread provides sync objects is an embarrassing oversight on my part, and it solves my immediate problem. Thanks for the pointer. Continued curiosity below...

> It's a bit long to explain, but on Windows named resources are reference-counted and are destroyed when the last attached process dies or detaches. Windows has no unnamed process-shared synchronization primitives; unnamed ones are intra-process only.
> In POSIX, named resources are file-like (they live until they are explicitly removed), and unnamed ones can be intra-process or interprocess (PTHREAD_PROCESS_SHARED). Unnamed interprocess resources must be constructed in shared memory or memory-mapped files; you can use them, unmap them, remap them, and continue using them. For more details I recommend:
> UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications

Thanks for the explanation. The mutex implementation I provided intends to provide "unnamed-like" cross-process mutex semantics by assigning random names. The same can be done with the internal events in interprocess_condition to make them properly interprocess. I realize this presents the corner case of name collision with user-named objects, but this seems unlikely enough not to be a major consideration.

Now that I understand the lifetime issue that applies to (file-like) named objects a little better, I'm still unsure of why it matters. Why does it matter (other than for performance) if, on Windows, the object happens to be deleted when the last reference goes away? The "explicit destroy" operation is then a no-op on Windows. When the object is referenced again, it is recreated. There must be some use case that is not occurring to me right now.

This is a little bit academic for me now, but I remain curious about this case. If a mutex is destroyed in the forest and no one is around to lock it, does it make a sound? ;-)
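
The random-name idea can be sketched as follows. This is a minimal illustration under assumptions: `make_object_name`, `make_object_name_from_seed`, the `bip_` tag, and the choice of 128 random bits are all hypothetical and are not taken from the patch; the `Global\` prefix is the one raised earlier in the thread. The generated string would still have to be stored in the shared object so that every process can open the same named mutex or event.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Builds a name such as "Global\bip_<32 hex digits>" from 128 random bits.
// The prefix and digit count are illustrative choices only.
std::string make_object_name(std::mt19937_64& rng) {
    static const char hex[] = "0123456789abcdef";
    std::string name = "Global\\bip_";
    for (int word = 0; word < 2; ++word) {
        std::uint64_t bits = rng();
        for (int digit = 0; digit < 16; ++digit) {
            name += hex[bits & 0xF];
            bits >>= 4;
        }
    }
    return name;
}

// Deterministic helper, convenient for tests; real code would seed the
// generator from std::random_device instead of a fixed value.
std::string make_object_name_from_seed(std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    return make_object_name(rng);
}
```

With enough random bits, a collision with a user-chosen name becomes very unlikely, although, as noted above, it is never strictly impossible.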

On 30/10/2011 15:34, Dan Brown wrote:
> Now that I understand the lifetime issue that applies to (file-like) named objects a little better, I'm still unsure of why it matters. Why does it matter (other than for performance) if, on Windows, the object happens to be deleted when the last reference goes away? The "explicit destroy" operation is then a no-op on Windows. When the object is referenced again, it is recreated. There must be some use case that is not occurring to me right now.

Yes, because in UNIX a process could create a named semaphore / shared memory (which is a named resource), increase / write it, exit, and then another process could get those counts/memory. It's a widely used pattern in some environments. On Windows, when the creator exits, as it was the only attached process, the mutex/memory would be automatically destroyed and no new process could get access to it.

>> Now that I understand the lifetime issue that applies to (file-like) named objects a little better, I'm still unsure of why it matters. Why does it matter (other than for performance) if, on Windows, the object happens to be deleted when the last reference goes away? The "explicit destroy" operation is then a no-op on Windows. When the object is referenced again, it is recreated. There must be some use case that is not occurring to me right now.
> Yes, because in UNIX a process could create a named semaphore / shared memory (which is a named resource), increase / write it, exit, and then another process could get those counts/memory. It's a widely used pattern in some environments. On Windows, when the creator exits, as it was the only attached process, the mutex/memory would be automatically destroyed and no new process could get access to it.

So this would be an issue for objects that have initial state that is specified only at creation time, not at acquisition time, e.g. the initial semaphore count. Can't mutexes be created transparently and atomically? I think the same would be true of condition variables. So when the last process exits and destroys the mutex or cond_var, the next process to acquire it creates it. There can be logical persistence for some objects via their name even if there isn't physical persistence.

On 30/10/2011 21:23, Dan Brown wrote:
> So this would be an issue for objects that have initial state that is specified only at creation time, not at acquisition time, e.g. the initial semaphore count. Can't mutexes be created transparently and atomically? I think the same would be true of condition variables. So when the last process exits and destroys the mutex or cond_var, the next process to acquire it creates it. There can be logical persistence for some objects via their name even if there isn't physical persistence.

No. I can create shared memory, write anything into it, and unmap it. On Windows it would be destroyed. In UNIX another process could read anything the first process has written.

I could use a semaphore to count arrived packets, increasing the semaphore count for each packet, and then exit the producer process. After that, another process could know how many packets arrived. On Windows the semaphore would be destroyed and the consumer wouldn't get any information.

The same goes for a message queue (Windows has no native message queue, but you could build one with shmem and a named mutex/condition var). The producer can fill the message queue and exit. The consumer takes messages. On Windows the message queue would be destroyed and the data lost.
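
The packet-counting scenario can be replayed with a toy model of the two lifetime rules. This is only a sketch under assumptions: `NamedSemaphore`, `Namespace`, and `packets_seen_by_consumer` are hypothetical names, "processes" are just calls to `attach` and `detach`, and no operating-system object is involved.

```cpp
#include <map>
#include <string>

// Toy named semaphore: a count plus the number of attached "processes".
struct NamedSemaphore {
    int count = 0;
    int attached = 0;
};

// Toy object namespace with a selectable lifetime rule.
class Namespace {
public:
    explicit Namespace(bool destroy_on_last_detach)
        : destroy_on_last_detach_(destroy_on_last_detach) {}

    // Open-or-create semantics: a missing object is created with count 0.
    void attach(const std::string& name) { ++objects_[name].attached; }
    void post(const std::string& name) { ++objects_.at(name).count; }
    int count(const std::string& name) const { return objects_.at(name).count; }

    void detach(const std::string& name) {
        auto it = objects_.find(name);
        if (it == objects_.end()) return;
        // Reference-counted rule: the object dies with its last user.
        // File-like rule: it stays until explicitly removed (not modelled).
        if (--it->second.attached == 0 && destroy_on_last_detach_) objects_.erase(it);
    }

private:
    bool destroy_on_last_detach_;
    std::map<std::string, NamedSemaphore> objects_;
};

// The producer posts once per packet and exits; a consumer starts afterwards.
int packets_seen_by_consumer(bool destroy_on_last_detach, int packets) {
    Namespace ns(destroy_on_last_detach);
    ns.attach("packets");  // producer starts
    for (int i = 0; i < packets; ++i) ns.post("packets");
    ns.detach("packets");  // producer exits
    ns.attach("packets");  // consumer starts later
    int seen = ns.count("packets");
    ns.detach("packets");
    return seen;
}
```

Under the reference-counted rule the consumer sees a fresh semaphore with a count of 0, while under the file-like rule it sees every packet the producer counted.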

> On 30/10/2011 21:23, Dan Brown wrote:
>> So this would be an issue for objects that have initial state that is specified only at creation time, not at acquisition time, e.g. the initial semaphore count. Can't mutexes be created transparently and atomically? I think the same would be true of condition variables. So when the last process exits and destroys the mutex or cond_var, the next process to acquire it creates it. There can be logical persistence for some objects via their name even if there isn't physical persistence.
> No. I can create shared memory, write anything into it, and unmap it. On Windows it would be destroyed. In UNIX another process could read anything the first process has written.
> I could use a semaphore to count arrived packets, increasing the semaphore count for each packet, and then exit the producer process. After that, another process could know how many packets arrived. On Windows the semaphore would be destroyed and the consumer wouldn't get any information.
> The same goes for a message queue (Windows has no native message queue, but you could build one with shmem and a named mutex/condition var). The producer can fill the message queue and exit. The consumer takes messages. On Windows the message queue would be destroyed and the data lost.

Agreed. I was referring to the implementation of mutexes and condition vars. I believe those could be implemented in the manner I've suggested because they don't have the state issues that you've pointed out apply to shared memory, semaphores, and other objects. That is, unless there's some interaction of those objects with the other parts of interprocess that I'm not taking into account.

2011/10/30 Ion Gaztañaga <igaztanaga@gmail.com>:
> On 30/10/2011 21:23, Dan Brown wrote:
>> [...]. There can be logical persistence for some objects via their name even if there isn't physical persistence.
> No. I can create shared memory, write anything into it, and unmap it. On Windows it would be destroyed. In UNIX another process could read anything the first process has written.

I've worked on an app where this was worked around by a little spawned "tray" app that referenced the "persistent" shared-memory segments, and this was achieved using Windows-specific COM interfaces (since this is a Windows-specific work-around, that sounds reasonable). Once registered, the little tray app would be re-spawned automatically if a) the user logged out, b) the machine rebooted (Windows Update, anyone...), or c) the user explicitly exited the tray app. It's of course not file-like in its persistence, but quite adequate and often sufficient.

I'm sure you're aware of this type of work-around, so I guess my question is why this is not included out-of-the-box in Boost.Interprocess?

One can also easily imagine service-based (i.e. daemon-based) work-arounds, but I guess these have the disadvantage of requiring more permissions to install, with the benefit of surviving a logoff. (I usually have Local Admin rights, so maybe just the COM registration requires some rights to write the registry...)

I think I also remember discussions on this list about Windows-kernel-based ways to keep the SHM segments around.

These are all work-arounds, but having any one of them in Boost.Interprocess would be "nice"! :)

Thanks,
--DD
participants (3)
- Dan Brown
- Dominique Devienne
- Ion Gaztañaga