
On Sat, 25 Mar 2017 16:22:50 -0400
Vinnie Falco via Boost
On Sat, Mar 25, 2017 at 4:01 PM, Lee Clagett via Boost
wrote: The other responses to this thread reiterated what I thought could occur - there should be corruption "races" from a write call to file sync completion.
NuDB makes the same assumptions regarding the underlying file system capabilities as SQLite. In particular, if there are two calls to fsync in a row, it assumes that the first fsync will complete before the second one starts. And that upon return from a successful call to fsync, the data has been written.
I think SQLite makes more stringent assumptions - that between the write and the sync the file metadata and sectors can be written in any order. And one further - that a single sector could be partially written but only sequentially forwards or backwards. This last assumption sounds like a legacy assumption from spinning disks. I have been (poorly) suggesting that the order of the new buckets in the log and log file header be swapped. Writing the header which fits in a single sector is _more_ likely to be all-or-nothing than the bucket records. At least that's my understanding of this insane situation. The recover method in NuDB appears to be assuming that if it can read the entire bucket in the log, that its valid. Using the header to "post" completed changes seems like it has a smaller window for failure. If the header was not written correctly, truncate.
When there is a power loss or device failure, it is possible that recent insertions are lost. The library only guarantees that there will be no corruption. Specifically, any insertions which happen after a commit, might be rolled back if the recover process is invoked. Since the commit process runs every second, not much will be lost.
Writing the blocks to the log file are superfluous because it is writing to multiple sectors and there is no mechanism to detect a partial write after power failure.
Hmm, I don't think there's anything superfluous in this library. The log file is a "rollback file." It contains blocks from the key file in the state they were in before being modified. During the commit phase, nothing in the key file is modified until all of the blocks intended to be modified are first backed up to the log file. If the power goes out while these blocks are written to the log file, there is no loss.
I've proven twice in this thread I cannot read code quickly. Crap. I thought the key file was append-only like the data file, which is not possible. No idea why I thought this, other than it would make the implementation easier. I think the only way to get rid of the log file in this design is via a COW filesystem (but I have no experience with those).
I jumped into the internal fetch function which was sorted within a single bucket and had a linked list of spills. Reading the README first would've made it clear that there was more to the implementation.
The documentation for NuDB needs work! I can only vouch for the maturity of the source code, not the documentation :)
So the worst case performance is a link-list if a hash collision.
Right, although the creation parameters are tuned such that less than 1% of buckets have 1 spill record, and 0% of buckets have 2 or more spill records.
Was the primary decision for the default hash implementation performance?
If you're talking about xxhasher, it was chosen for being the best balance of performance, good distribution properties, and decent security. NuDB was designed to handle adversarial inputs since most envisioned use-cases insert data from untrusted sources / the network.
Yes, I asked because I was curious about the security analysis. The only security analysis I could find quickly was the SMHasher test, and it got the same score as `murmurhash 3a`. Murmurhash 3 was shown to vulnerable to collisions, even with a random seed. I could not find the significance of the 'a' (i.e. `3a`) listed in the table provided by xxHash. Lee