On Tue, 21 Mar 2017 23:08:30 -0400, Vinnie Falco via Boost wrote:
On Tue, Mar 21, 2017 at 10:50 PM, Lee Clagett via Boost wrote:
[...snip...]
The pathological case is power loss, which cannot be unit tested easily (and probably isn't possible).
I think this can be unit tested, and I believe that NuDB's unit test covers the case of power loss. I think we can agree that power loss on a read is uninteresting (since it can't corrupt data). The unit test models a power loss as a fatal error during a write. The test exercises all possible fatal errors using an incremental approach (I alluded to this in my previous message).
The database template requires a File type to instantiate:

    template<class Hasher, class File>
    class basic_store;

NuDB comes with posix_file and win32_file. The unit test uses its own fail_file which wraps the native file and generates a simulated error on the Nth write: https://github.com/vinniefalco/NuDB/blob/master/extras/nudb/test/fail_file.h...
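To make the mechanism concrete, the idea behind fail_file is roughly the following (a simplified sketch, not the actual code; the member names and the exact File-concept signatures are my approximations):

    #include <cstddef>
    #include <cstdint>
    #include <system_error>

    // Counts writes and reports failure once the Nth one is reached.
    class fail_counter
    {
        std::size_t target_ = 0; // fail on this write number (0 = never)
        std::size_t count_ = 0;  // writes seen so far
    public:
        void reset(std::size_t n) { target_ = n; count_ = 0; }
        bool fail() { return target_ != 0 && ++count_ >= target_; }
    };

    // Wraps a native File and injects a simulated fatal error on writes.
    template<class File>
    class fail_file
    {
        File f_;
        fail_counter* c_;
    public:
        explicit fail_file(fail_counter& c) : c_(&c) {}

        void write(std::uint64_t offset, void const* buffer,
                   std::size_t bytes, std::error_code& ec)
        {
            if(c_->fail())
            {
                // Model power loss: the write never happens and the
                // caller sees a fatal error.
                ec = std::make_error_code(std::errc::io_error);
                return;
            }
            f_.write(offset, buffer, bytes, ec);
        }

        // read(), sync(), trunc(), etc. forward to f_ unchanged.
    };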
The test implementation tries to insert a bunch of records into a new database using a fail_file, incrementing the counter as it goes. On a failure it tries to recover (which can also result in a failed write). After the recover fails, it verifies that the database is still consistent. You can see the test implementation: https://github.com/vinniefalco/NuDB/blob/master/test/recover.cpp#L125
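The control flow of that test, as I understand it, is a loop like the skeleton below. The three helpers are stand-ins for the real insert/recover/verify steps in recover.cpp; they are not NuDB calls:

    #include <cstddef>
    #include <stdexcept>

    bool insert_batch(std::size_t n);  // insert records, failing on the n-th write;
                                       // returns true if every write completed
    void try_recover(std::size_t n);   // run recovery, which may also hit write n
    bool database_is_consistent();     // reopen and verify all invariants

    void recover_test()
    {
        for(std::size_t n = 1; ; ++n)
        {
            bool completed = insert_batch(n);
            if(!completed)
                try_recover(n);
            if(!database_is_consistent())
                throw std::runtime_error("inconsistent after simulated power loss");
            if(completed)
                break; // failure point n is past the last write: every point covered
        }
    }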
I *think* this test proves that the implementation maintains consistency on any power loss. I would very much like anyone who is interested to tell me whether this approach works.
The other responses in this thread reiterated what I thought could occur - there can be corruption "races" in the window between a write call and completion of the file sync. Writing the blocks to the log file is superfluous, because that write spans multiple sectors and there is no mechanism to detect a partial write after a power failure, so the information cannot be read back reliably. If those writes were omitted, the presence of the log file header alone indicates whether a commit failed to complete, and that header should fit in a single sector. The key and data file sizes in the header indicate what portion at the end of the key and data files should be ignored.

If the log file header were moved to the beginning of the key file, then after the records are appended and sync'ed the header could be overwritten in place with the new file lengths. The key file would not need to be opened with the `_POSIX_SYNCHRONIZED_IO` flag, and the fsync after writing the header could be omitted (which means a commit also gets "relaxed" disk flushing). This should improve commit performance.

At this point I'm sure Niall will tell me that writing <512 bytes to the beginning of the file is also not an atomic operation - even though that is exactly what the SQLite documentation assumes. The power-loss recovery algorithm gets tougher without that assumption.
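To spell out the commit sequence I have in mind (a sketch of the proposal above, not NuDB's current format; the header layout and file-descriptor plumbing are only illustrative):

    #include <cstdint>
    #include <unistd.h>   // pwrite, fsync

    // Lives in the first sector of the key file; must stay under 512 bytes.
    struct commit_header
    {
        std::uint64_t key_file_size;  // valid bytes in the key file
        std::uint64_t dat_file_size;  // valid bytes in the data file
    };

    // kfd/dfd are POSIX descriptors for the key and data files. The new
    // records have already been appended past the old sizes.
    bool commit(int kfd, int dfd, commit_header const& new_sizes)
    {
        // 1. Make the appended records durable before publishing them.
        if(fsync(dfd) != 0 || fsync(kfd) != 0)
            return false;

        // 2. Publish them by overwriting the single-sector header in place.
        //    No fsync afterwards ("relaxed" flushing); correctness rests on
        //    the SQLite-style assumption that a <512-byte write to one
        //    sector is atomic, which is exactly the point in question.
        return pwrite(kfd, &new_sizes, sizeof(new_sizes), 0)
            == static_cast<ssize_t>(sizeof(new_sizes));
    }

On recovery, anything in the key and data files past the sizes recorded in the header is simply ignored.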
- The key-file is append-only, so NuDB has multiple blocks of sorted records. So hopefully the insertions do not create lots of blocks, or the algorithm becomes a linear search.
There's no sorting going on here; the key file is an on-disk hash table where each bucket holds a good-sized number of keys (so that a single read from the key file does the most work).
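In code terms that lookup is roughly the following (my sketch of the scheme as described, not NuDB's actual fetch; bucket, read_bucket and read_spill are hypothetical stand-ins for block-sized reads from the files):

    #include <cstdint>
    #include <vector>

    struct entry  { std::uint64_t hash, offset, size; };
    struct bucket
    {
        std::vector<entry> entries; // all keys hashing to this bucket
        std::uint64_t spill = 0;    // 0, or location of an overflow record
    };

    // One bucket read answers the common case; only an overflowing bucket
    // chases the spill chain (the linked list mentioned below).
    template<class Store>
    bool fetch(Store& store, std::uint64_t h, std::uint64_t nbuckets, entry& out)
    {
        bucket b = store.read_bucket(h % nbuckets); // single key-file read
        for(;;)
        {
            for(auto const& e : b.entries)  // scan (or binary-search) the bucket
                if(e.hash == h)             // candidate; caller confirms the key bytes
                {
                    out = e;
                    return true;
                }
            if(b.spill == 0)
                return false;               // no overflow: not found
            b = store.read_spill(b.spill);  // worst case: follow spill records
        }
    }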
I jumped into the internal fetch function, which searches sorted entries within a single bucket and follows a linked list of spills. Reading the README first would've made it clear that there was more to the implementation. So the worst-case performance is a linked-list traversal when hashes collide. Was performance the primary reason for choosing the default hash implementation?

Lee