[interprocess] Fault recovery in managed mapped file

I have found that managed mapped file can get stuck in a spinlock if the file is not closed and fully flushed to disk. For example, if power is pulled from the computer while a segment is open and before the first page in the segment has been committed. In this case it is common for a journaled filesystem to preserve the fact that the file was created, but to lose its contents, so the file now appears zeroed out. I have observed this behaviour on Linux systems running ext4, for example.

This is the loop where we get stuck, in managed_open_or_create_impl::priv_open_or_create:

   while(value == InitializingSegment || value == UninitializedSegment){
      detail::thread_yield();
      value = detail::atomic_read32(patomic_word);
   }

At this point I have opened the file in open_only mode, and *patomic_word is 0 (UninitializedSegment). The code appears to be waiting for some other process or thread to initialize the segment, but in fact there is no such process.

Some possible solutions:

If, at this point, we have opened a file rather than created it, why wait for an UninitializedSegment to change state? If the segment is uninitialized here, simply throw an error. Make it the caller's responsibility to ensure the segment is created and initialized before it is opened in read-only mode.

Perhaps, if you want to allow multiple processes to open and create simultaneously without any additional synchronization mechanism, you could accomplish that by adding a count of open mappings into the shared segment. If the reference count is 1 at this point, don't attempt this spinlock, because the state of the file is never going to change; instead, throw if *patomic_word != InitializedSegment.

-- KEVIN ARUNSKI

On 17/11/2010 15:14, Kevin Arunski wrote:
> I have found that managed mapped file can get stuck in a spinlock if the file is not closed and fully flushed to disk. For example, if power is pulled from the computer while a segment is open and before the first page in the segment has been committed. In this case it is common for a journaled filesystem to preserve the fact that the file was created, but to lose its contents, so the file now appears zeroed out. I have observed this behaviour on Linux systems running ext4, for example.
>
> Some possible solutions:
>
> If, at this point, we have opened a file rather than created it, why wait for an UninitializedSegment to change state? If the segment is uninitialized here, simply throw an error. Make it the caller's responsibility to ensure the segment is created and initialized before it is opened in read-only mode.

The reason is to support simultaneous open and create, as you indicate below.

> Perhaps, if you want to allow multiple processes to open and create simultaneously without any additional synchronization mechanism, you could accomplish that by adding a count of open mappings into the shared segment. If the reference count is 1 at this point, don't attempt this spinlock, because the state of the file is never going to change; instead, throw if *patomic_word != InitializedSegment.

A count does not work, because if a process dies you end up with a wrong count. If you need to commit the first page to guard against power failures, call flush() just after creating the managed segment. In any case, trying to use a mapped file after a hard shutdown admits no sensible recovery: you don't know which parts of the file the OS has committed, and the internal data structures might be completely corrupted.

Best,
Ion

On Nov 18, 2010, at 3:07 PM, Ion Gaztañaga wrote:
> On 17/11/2010 15:14, Kevin Arunski wrote:
>> I have found that managed mapped file can get stuck in a spinlock if the file is not closed and fully flushed to disk. For example, if power is pulled from the computer while a segment is open and before the first page in the segment has been committed. In this case it is common for a journaled filesystem to preserve the fact that the file was created, but to lose its contents, so the file now appears zeroed out. I have observed this behaviour on Linux systems running ext4, for example.
>>
>> Some possible solutions:
>>
>> If, at this point, we have opened a file rather than created it, why wait for an UninitializedSegment to change state? If the segment is uninitialized here, simply throw an error. Make it the caller's responsibility to ensure the segment is created and initialized before it is opened in read-only mode.
>
> The reason is to support simultaneous open and create, as you indicate below.
>
>> Perhaps, if you want to allow multiple processes to open and create simultaneously without any additional synchronization mechanism, you could accomplish that by adding a count of open mappings into the shared segment. If the reference count is 1 at this point, don't attempt this spinlock, because the state of the file is never going to change; instead, throw if *patomic_word != InitializedSegment.
>
> A count does not work, because if a process dies you end up with a wrong count. If you need to commit the first page to guard against power failures, call flush() just after creating the managed segment.
Understood. I have been using flush() to commit managed file segments, and indeed that works fine. The problem comes when the crash occurs between opening the file and calling flush(). I could move the flush() earlier in the process to reduce the chance of this situation, but even if that allows the open to proceed, how much can I tell about the file, given that changes were made after the flush? If, for example, I wanted to set a dirty flag within the segment itself, wouldn't I run the risk that the allocation structures within the segment are corrupt, leaving me unable to find the offset of my flag?
> Anyway, trying to use a mapped file after a hard shutdown has no sensible recovery; you don't know which parts of the file the OS has committed, and the internal data structure might be absolutely corrupted.
Indeed, I do not want to use the corrupted file at all, but I have no way to tell whether the file is corrupted or OK. If I try to open the segment read-only and examine it, I get stuck in the loop with no way to detect the failure. This is the problem I am seeking a solution to. From looking at the code it appears that if, for whatever reason, the first 32 bits of the file are 0 and the file is opened read-only, then I am stuck.

I was able to solve the issue for my purposes with this change:

   diff -r boostb/interprocess/detail/managed_open_or_create_impl.hpp boosta/interprocess/detail/managed_open_or_create_impl.hpp
   353c353,358
   <    while(value == InitializingSegment || value == UninitializedSegment){
   ---
   >    if (value == UninitializedSegment) {
   >       throw interprocess_exception(error_info(corrupted_error));
   >    }
   >    while(value == InitializingSegment){
But, as you can see, if the user intends to use simultaneous open-and-create as a synchronization mechanism, it will fail. This is OK for me because I already have synchronization elsewhere in my code that prevents that scenario. Perhaps, rather than spinning indefinitely, there could be a timeout or some other limit on how long the open function will wait for the file to become initialized? I assume from the fact that you chose a spinlock that you didn't intend for the user to wait indefinitely.

KEVIN ARUNSKI
participants (2)
- Ion Gaztañaga
- Kevin Arunski