
On Sun, Apr 13, 2008 at 10:46:42AM +0200, Ion Gaztañaga wrote:
Ok. You need to register all processes attached to one particular segment somewhere. This imposes some reliability problems, because a process might crash while doing a task unrelated to the segment.
Yes. I used a separate "bootstrap [shm] segment" to hold all global bookkeeping data. Actually, for this particular purpose, you don't even need it: you can use the SHM segment itself to hold a list of the processes attached to it. (Unfortunately, there's no POSIX API to get the list of processes attached to a particular SHM segment, most probably because such information is volatile and potentially already worthless by the time you get to use it.)

Regarding reliability: a process can crash at any time for any cause. Introducing error-free SHM grow code (however it's implemented) will neither make the program crash more often nor introduce a new failure mode. What _can_ happen, though, is that a process crashes and remains registered as having the segment attached. The same problem also occurs when handling SIGSEGV to do the remapping. Since a dead process may be replaced by an unrelated process with the same PID, sending an asynchronous notification to that PID may do unpredictable things, most likely terminate it [the default action for most signals].

So the reason for handling SIGSEGV and other fatal signals would NOT be to remap segments, BUT to deregister the process from the SHM manager before terminating it. This is again only a half-solution, because the process may be terminated by other signals that it doesn't handle, and certainly by SIGKILL, which can't be caught.

A potential solution would be for all cooperating processes to share a common parent controller. When a child dies, it remains in the zombie state; since the parent is coded NOT to call wait() (and not to exit until all children have exited, as reported by SIGCHLD), the dead child's PID cannot be reused. The parent controller can then deregister the process from its SHM segments, and finally wait() for it once everything has been cleaned up. Where to stop (at which level of complexity) depends on the needs, but the solution _can_ be made very reliable and portable.
(An extremely simple solution that doesn't require a controller process and won't kill random processes: just run the cooperating processes under a dedicated user ID.)
An asynchronous notification via signal does not carry enough context (the sigval passed to sigqueue() only stores an int or a void*) to notify which
That would be enough with a bootstrap segment that contains a list of all SHM segments managed by Boost. Then you just send the offset/pointer into this segment.
And if that does not discourage you from implementing this, there is not much you can do inside a signal handler. You can see the list of async-signal-safe functions here:
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_04.html#t...
This means that you can't call mmap() from a signal handler: according to POSIX, you can't remap memory asynchronously. It's possible that some OSs support it.
Good point. You still have two choices, one of which you ignored:
1. Send a message through a POSIX message queue with SIGEV_THREAD notification (see mq_notify()).
2. Have a dedicated signal and a dedicated thread in each process to catch it (see sigwait()). [All other threads shall block this signal.]
If remapping is possible, a more correct and robust mechanism could be
Correct according to which specification?
catching SIGSEGV from processes that have not updated their memory mappings and remapping using a global growable segment list stored in a singleton (this has problems when using DLLs). Less interprocess communication means more reliability.
Far from it: that very same URL says the following: "The behavior of a process is undefined after it returns normally from a signal-catching function for a [XSI] SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), [RTS] sigqueue(), or raise()." This applies to what you have just proposed. Furthermore, this avenue will lead you into a mess of platform-specific code: please see GNU libsigsegv.

Anyway, a line has to be drawn somewhere: perfection is the worst enemy of good enough. Why should a library ensure its correct operation when the client program breaks its preconditions? It's a tradeoff between being clean with stronger preconditions (my approach) and relying on undefined behavior with weaker preconditions (trying to compensate for broken programs).

A question: let's say that you have a situation like this:

  [SHM segment][unmapped memory]
                     ^
                     X

A program generates SIGSEGV at address X. How are you going to design a *robust* mechanism that can distinguish the following two cases:
- a true SIGSEGV (access through a corrupt pointer)
- a SIGSEGV that should grow the segment?

Note that there's a race condition: a program might make a truly invalid access through a corrupt pointer, but by the time you've looked up the address and found the nearest SHM segment, *another* process might have already grown the segment. Thus, instead of the process being terminated, you will grow the faulting process's SHM mapping and let it go berserk over valid data. Protecting the signal handler with a mutex/semaphore isn't enough: you'd need a way to *atomically* enter the signal handler and acquire a mutex/semaphore.