[boost] [interprocess] Semaphore cleanup after crash

When using semaphores to synchronize separate processes, everything works fine when each process exits nicely (closing its semaphores before exit). But things get really messy when a process might crash. I am unable to figure out how to recover from such a crash, which leaves semaphores in an inconsistent state.
If a semaphore is not in use (open) by any process, then in my application I can safely 'remove' it and start afresh. Is there some way to find out whether any process is using a semaphore at a given time, so that I can call 'remove'?
When I just add a 'remove' on process start, this works great on Windows (as remove just fails if another process has the semaphore open), but on Linux sem_unlink is used, which deletes the semaphore even if it's in use.
What is the general practice when it comes to cleaning up semaphores after process crashes? Maybe some way to ensure that 'post' and 'close' are always called even when the application has otherwise crashed? Is there some way to use boost's Windows-style semaphores on Linux instead of the native posix style?
I tried looking, and many have asked this question (in the context of recovering from posix semaphores, which are used by boost on Linux), but I couldn't find any answers. Lars had asked this here also, almost a year ago, but there were no answers in that thread either. This seems like a basic issue, but I am totally lost on how to even approach it.
Sachin Garg

On Tue, Jul 29, 2008 at 02:53:01PM +0530, Sachin Garg wrote:
[snip]
Hi Sachin
ipcs -s -p will show a list of semaphores and the associated pids of the process which created them.
Using the pids obtained from the above, you can check the process table to check whether the process is still alive.
e.g. (N.B. I use the -m, rather than the -s option to ipcs for illustration, since I have no semaphores, but do have shared memory used).
bob@spain:~$ ipcs -m -p
------ Shared Memory Creator/Last-op --------
shmid      owner      cpid       lpid
327680     bob        7192       7238
360449     bob        7233       7284
393218     bob        7273       7187
425987     bob        7235       7187
458756     bob        7279       7187
491525     bob        7235       7187
524294     bob        7284       29961
557063     bob        7227       7187
589832     bob        7227       7187
622601     bob        7287       7187
655370     bob        7309       7187
688139     bob        7347       7187
720908     bob        16117      16124
753677     bob        16117      16124
bob@spain:~$ ps ax | grep 7192
 7192 tty2     Sl     0:04 /usr/bin/gnome-session
19670 pts/3    R+     0:00 grep --colour=auto 7192
bob@spain:~$
A little perl script could be written to do this.
Bob
--
To make tax forms true they should read "Income Owed Us" and "Incommode You".

On Tue, Jul 29, 2008 at 9:25 PM, Bob Wilkinson
ipcs -s -p will show a list of semaphores and the associated pids of the process which created them.
Using the pids obtained from the above, you can check the process table to check whether the process is still alive?
e.g. (N.B. I use the -m, rather than the -s option to ipcs for illustration, since I have no semaphores, but do have shared memory used).
bob@spain:~$ ipcs -m -p [snip]
A little perl script could be written to do this.
I have a named semaphore, so can I do all this only for that one named semaphore? And can this be done programmatically, without relying on external executables?
If it's possible (I hope it is), would it make sense to add such a smart_remove to boost interprocess? The idea behind using boost was to make my code portable and hide the platform-related intricacies. Doing all this myself squarely defeats that purpose (but for now, I would still love to know the solution, if any :-).
Thanks,
Sachin Garg
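[Editorial sketch] The "is that pid still alive" half of Bob's suggestion can be done without spawning ps. A minimal sketch, assuming a POSIX system; process_alive is a hypothetical helper name, not part of Boost. Note the limits: it says nothing about which process has a semaphore open, a zombie still counts as alive, and pids get reused. Also, ipcs reports System V IPC objects, so this does not by itself locate a POSIX named semaphore.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns true if a process with this pid exists.
// Signal 0 makes kill() perform only the existence and permission
// checks; no signal is actually delivered.
bool process_alive(pid_t pid) {
    assert(pid > 0);  // pid <= 0 would address process groups instead
    if (kill(pid, 0) == 0) {
        return true;
    }
    // EPERM: the process exists but we may not signal it.
    // ESRCH: there is no such process.
    return errno == EPERM;
}
```

A caller would feed in the pid it recorded somewhere itself (for example in shared memory next to the semaphore), since the pid is not stored with a POSIX named semaphore.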

Sachin Garg wrote:
If a semaphore is not in-use (open) by any process, in this case (in my application) I can safely 'remove' it and start afresh. Is there some way to find out if any process is using a semaphore at a time so that I can call 'remove'?
Interprocess is modeled after posix primitives, so there is no way to know if someone is attached. Think about this as if the semaphore were a file. What would you do if two processes were communicating through a file and one of them crashed? I think you should have some keepalive mechanism to detect that a process has died and recreate the ipc mechanisms on failure.
When I just add a 'remove' on process start this works great on windows (as remove just fails if another process has the semaphore open), but on linux sem_unlink is used which has the behavior of deleting it even if its in use.
The same problem happens with std::remove(const char *filename): the Windows version fails if the file is in use, but the Unix version calls unlink and removes the file from the filesystem without failing, while attached processes still write to that phantom file. This is a difference I don't know how to solve.
What is the general practice when it comes to cleaning up semaphores after process crashes? Maybe some way to ensure that 'post' and 'close' are always called even when application has otherwise crashed? Is there some way to use boost's windows style semaphores on linux instead of native posix style?
I tried looking and many have asked this question (in context of recovering from posix semaphores, which are used by boost on linux), but I couldn't find any answers. Lars had asked this here also, almost an year ago but no answers in that thread either. This seems like a basic issue but am totally lost on how to even approach it.
I see no general solution. You can't register cleanup actions for when a process crashes (well, the OS can, but user code can't). If anyone has any idea about this, I would be glad to hear it.
Regards,
Ion
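[Editorial sketch] The "phantom file" behaviour Ion describes for std::remove can be observed directly. A minimal sketch, assuming a POSIX system; remove_succeeds_while_open is a hypothetical helper name and the path passed in is whatever scratch file the caller chooses. On a system with Windows-style semantics the remove call would be expected to fail instead.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: creates a scratch file, removes it while it is
// still open, and reports whether (a) the removal succeeded, (b) the
// name is gone, and (c) the open descriptor is still writable.
bool remove_succeeds_while_open(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) {
        return false;
    }
    bool removed = (std::remove(path) == 0);     // unlink()s the name
    bool name_gone = (access(path, F_OK) != 0);  // no directory entry left
    const char msg[] = "still writable";
    bool wrote = (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg);
    close(fd);            // only now is the storage actually released
    if (!removed) {
        unlink(path);     // best-effort cleanup of the scratch file
    }
    return removed && name_gone && wrote;
}
```

sem_unlink and shm_unlink follow the same model: the name goes away immediately, and the object lives on until the last process closes it.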

On Tue, Jul 29, 2008 at 10:32 PM, Ion Gaztañaga
Sachin Garg wrote:
If a semaphore is not in-use (open) by any process, in this case (in my application) I can safely 'remove' it and start afresh. Is there some way to find out if any process is using a semaphore at a time so that I can call 'remove'?
Interprocess is modeled after posix primitives, so there is no way to know if someone is attached. Think about this as if the semaphore were a file. What would you do if two processes were communicating through a file and one of them crashed? I think you should have some keepalive mechanism to detect that a process has died and recreate the ipc mechanisms on failure.
Yep, I understand this is the posix way of removing everything, be it semaphores or other stuff. By keepalive do you mean having an umbrella process to take care of recovering from such crashes? Or is it some other standard mechanism that I am not aware of?
When I just add a 'remove' on process start this works great on windows (as remove just fails if another process has the semaphore open), but on linux sem_unlink is used which has the behavior of deleting it even if its in use.
This same problem happens with std::remove(const char *filename) (windows version fails if the file is in use but unix version calls unlink and removes that file from the filesystem without failing while attached processes still write to that phantom file) but this is a difference I don't know how to solve.
Yep. I tried forcing the use of interprocess' cygwin and windows implementations of named_semaphore on linux (just for experimenting), as these are done differently. The windows one fails to compile, and the cygwin implementation doesn't help either, as it uses shm_unlink, which works the same as sem_unlink, the posix way :-)
What is the general practice when it comes to cleaning up semaphores after process crashes? Maybe some way to ensure that 'post' and 'close' are always called even when application has otherwise crashed? Is there some way to use boost's windows style semaphores on linux instead of native posix style?
I tried looking and many have asked this question (in context of recovering from posix semaphores, which are used by boost on linux), but I couldn't find any answers. Lars had asked this here also, almost an year ago but no answers in that thread either. This seems like a basic issue but am totally lost on how to even approach it.
In general I see no general solution. You can't register cleanup actions when a process crashes (well, the OS can, but not the user code). If anyone has any idea about this, I would be glad to hear it.
Does the method discussed with Bob (in the same thread) make sense? That is, to do programmatically what he proposes doing with commands.
I am not aware of system calls for this, but it seems possible (ipcs does this 'somehow') to find which process last used a semaphore; then it can be checked whether that process id is still alive, and only if it is not do we call sem_unlink. All this can be abstracted in boost as a smart_remove or a safe_remove, the idea being to sem_unlink only when no other process is using the semaphore.
If it doesn't look like something of much general value (though I think it would be), I would at least like to do this in my code, so any pointers to the relevant system calls would be really helpful.
Thanks for all the great work done in interprocess.
Sachin Garg
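[Editorial sketch] One way to approximate "remove only when no other process is using it" is to let the kernel track users through an advisory lock on a companion file, because such locks are dropped when the holder dies, crash or not. This is not something proposed in the thread and not a Boost facility; it is a minimal sketch assuming a local filesystem on Linux, and open_guard, hold_shared and try_exclusive are hypothetical names.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Opens (creating if needed) the lock file that accompanies the
// semaphore. Returns the descriptor, or -1 on error.
int open_guard(const char* lock_path) {
    assert(lock_path != nullptr);
    return open(lock_path, O_CREAT | O_RDWR, 0600);
}

// Takes a shared lock, held for as long as the process uses the
// semaphore. The kernel releases it when the descriptor is closed,
// which includes abnormal termination of the process.
bool hold_shared(int fd) {
    assert(fd >= 0);
    return flock(fd, LOCK_SH) == 0;
}

// Non-blocking attempt at an exclusive lock. Success means that no
// other open descriptor holds any lock on the file, i.e. no live user.
bool try_exclusive(int fd) {
    assert(fd >= 0);
    return flock(fd, LOCK_EX | LOCK_NB) == 0;
}
```

A start-up sequence could then be: open_guard; if try_exclusive succeeds, call named_semaphore::remove for the name; call hold_shared; and only after that open or create the semaphore. Opening only under the shared lock matters because flock does not guarantee that converting an exclusive lock to a shared one is atomic.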

On Tue, Jul 29, 2008 at 11:33 PM, Sachin Garg
wrote: [snip]
ps. I figured something can be done using semctl/semget etc., but they need the semaphore set's id as a parameter. I haven't yet figured out how to find that id for a posix named semaphore.
Sachin Garg
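[Editorial sketch] On the earlier question of making sure a 'post' happens even after a crash: the System V calls mentioned here do offer kernel help for that, through the SEM_UNDO flag of semop. System V semaphores are a separate facility from POSIX named semaphores, so this does not apply to an existing named_semaphore. A minimal sketch, assuming Linux; value_after_child_crash is a hypothetical demo function.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// On Linux the caller has to define this union for semctl().
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

// Creates a private System V semaphore with the given value, lets a
// child take one unit with SEM_UNDO and then die without releasing it,
// and returns the value the parent sees afterwards (-1 on error).
int value_after_child_crash(int initial) {
    assert(initial >= 1);  // the child must not block on its wait
    int id = semget(IPC_PRIVATE, 1, 0600);
    if (id < 0) return -1;
    semun arg;
    arg.val = initial;
    pid_t child = (semctl(id, 0, SETVAL, arg) == 0) ? fork() : -1;
    if (child < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    if (child == 0) {
        struct sembuf take;
        take.sem_num = 0;
        take.sem_op = -1;          // "wait": decrement by one
        take.sem_flg = SEM_UNDO;   // kernel reverts this when we die
        semop(id, &take, 1);
        kill(getpid(), SIGKILL);   // simulate a crash, no cleanup runs
        _exit(1);                  // not reached
    }
    int status = 0;
    waitpid(child, &status, 0);
    int value = semctl(id, 0, GETVAL);
    semctl(id, 0, IPC_RMID);       // remove the semaphore set
    return value;
}
```

With the undo applied at process exit, the parent would be expected to read back the initial value even though the child never posted.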

Ion,
I was wondering what your take on this is. Is it something that
can/should be added to boost, or would you prefer that I just hack it
into my code only?
Sachin Garg
On Wed, Jul 30, 2008 at 12:50 AM, Sachin Garg
wrote: [snip]

Sachin Garg wrote:
Ion,
I was wondering whats your take on this. Is it something that can/should be added to boost or would you prefer that I just hack it in my code only?
Sachin Garg
My opinion is that there is no solution without kernel help. The original Interprocess library (Shmem) emulated Windows behaviour on Unix, and it was a nightmare to get consistent behaviour. This was changed in Interprocess. The relationship between System V and POSIX resources is quite obscure, and I don't see a proper way to solve this.
The same problem exists with files, and I hadn't seen any clue on how to make Unix and Windows behavior identical until today:
http://mg.to/2004/09/30/file_share_delete-in-shell-extension
According to this, adding FILE_SHARE_DELETE to the shared memory emulation functions would allow UNIX-like behavior for Windows files. I haven't had time to test this. This would not solve your problem, though, because you want Windows behavior (failure when the resource is in use) on UNIX.
Regards,
Ion

Ion Gaztañaga wrote:
The same problem with files and I haven't seen any clue to make Unix and Windows behavior identical until today:
http://mg.to/2004/09/30/file_share_delete-in-shell-extension
According to this, adding FILE_SHARE_DELETE to the shared memory emulation functions would allow, UNIX-like behavior for Windows files. I haven't had time to test this.
I've just checked this, but it does not behave like unix. If you specify FILE_SHARE_DELETE, DeleteFile returns success even when the file is in use (but the file is still there in Explorer), and opening the file after deletion fails. However, if you try to create another file with the same name, this also fails. So you can't just call "remove" and recreate the file with the same name. That's a pity.
Regards,
Ion

On Sat, Aug 2, 2008 at 1:48 PM, Ion Gaztañaga
wrote: [snip]
If you are looking at making the behavior identical, I would much prefer the windows way over the posix way. Not a preference towards any platform, just that the windows way seems to make more sense. Or maybe two 'removes': one which works as it does now, and the other the smart_remove, in case someone out there does prefer the posix way.
Of course, I don't have any answer as to how to get either done. I have been trying to hack Bob's solution into my code, as it can work at least for me. But it's painful, as the internal implementation of named_semaphore is different on win/lin/mac, so both lin and mac will need separate hacks, and then this is something that will need to be carefully examined again every time boost is updated, as internal implementations may change in future.
Sometimes I just wish things were easier :-)
Sachin Garg

Microsoft has a lot of faults, but when comparing the Windows API to the LINUX API I'd take Windows any day.
I remember having to clean up shared memory segments every now and then under some UNIX OS -- I guess it was SOLARIS.
I also remember that mmap()/munmap() (on SOLARIS) did not behave in the manner I would prefer as a C++ programmer, since multiple calls to mmap() could be undone by a single call to munmap(). I hope that this problem was considered when writing the memory mapped io features of boost. I guess you would need some static container of all pointers returned by mmap() and some reference count indicating how often the matching pointer was returned by mmap():
static std::map

Well, that at least confirms I am not the only one banging my head on
the wall due to this :-)
I don't think the reference count solution can work here, as process
crashes can leave the reference count invalid. Any other possible solution
you might have implemented?
Sachin Garg
On Wed, Jul 30, 2008 at 2:30 AM,
Microsoft has a lot of faults but when comparing Windows API to the LINUX API I take windows every day.
I remember having to clean up shared memory segments by and then under some UNIX OS -- I guess it was SOLARIS.
I also remember that mmap()/munmap() (on SOLARIS) did not behave in a manner, which I would prefer as a C++ programmer, since multiple calls to mmap() could be undone by a single call to munmap(). I hope that this problem was considered when writing the memory mapped io features of boost. I guess you would need some static container of all pointers returned by mmap() and some reference count indicating how often the matching pointer was returned by mmap():
static std::map s_sMmap2RefCount;
I also remember the amount of code I wrote to hack around the UNIX feature of killing processes which write to a dead pipe.
I also remember the amount of code I wrote to get the errno from execvp() into the process which called fork().
I also remember having to write code in C instead of C++, since the code was supposed to be linked into a shared library which was intended to be dlopen-ed by some third party executable, which in turn may or may not be loading the correct C++ library -- consider that UNIX knows only a single namespace for all symbols in a process.
I also remember that I could not write C++ code in a post-C++-Exception-Handling style, since the matching compiler did not implement C++ Exception Handling correctly for a couple of years after this feature was already working on Windows and OS/2. The UNIX compiler called destructors for memory locations for which no constructor had been called and, vice versa, forgot to call destructors for initialized temporary objects.
I hate the UNIX API because I'm a C++ programmer.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (4)
- Bob Wilkinson
- Ion Gaztañaga
- peter_foelsche@agilent.com
- Sachin Garg