
On Tue, Mar 21, 2017 at 10:50 PM, Lee Clagett via Boost <boost@lists.boost.org> wrote:
> Have you compared this to LMDB?
In our application (rippled), LMDB performs quite poorly, even compared to LevelDB, which itself performed poorly compared to NuDB. It's true that the rippled workload is quite niche.
> The pathological case is power loss, which cannot be unit tested easily (and probably isn't possible).
I think this can be unit tested, and I believe that NuDB's unit test covers the case of power loss. I think we can agree that power loss on a read is uninteresting, since it can't corrupt data. The unit test models a power loss as a fatal error during a write, and it exercises all possible fatal errors using an incremental approach (I alluded to this in my previous message).

The database template requires a File type to instantiate:

    template<class Hasher, class File>
    class basic_store;

NuDB comes with posix_file and win32_file. The unit test uses its own fail_file, which wraps the native file and generates a simulated error on the Nth write:

https://github.com/vinniefalco/NuDB/blob/master/extras/nudb/test/fail_file.h...

The test implementation tries to insert a bunch of records into a new database using a fail_file, incrementing the counter as it goes. On a failure it tries to recover (which can itself result in a failed write). After the recovery attempt, it verifies that the database is still consistent. You can see the test implementation here:

https://github.com/vinniefalco/NuDB/blob/master/test/recover.cpp#L125

I *think* this test proves that the implementation maintains consistency on any power loss. I would very much like anyone who is interested to tell me whether this approach works.
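To make the shape of that loop concrete, here is a rough sketch. fail_counter, insert_many, recover, and verify below are illustrative stand-ins for the types and calls in NuDB's test sources, not its actual API:

    #include <cstddef>
    #include <stdexcept>

    // Thrown when the write counter hits zero, modeling a power
    // loss in the middle of a write.
    struct simulated_failure : std::runtime_error
    {
        simulated_failure()
            : std::runtime_error("simulated failure")
        {
        }
    };

    // Counts writes; the Nth write throws.
    class fail_counter
    {
        std::size_t n_;
    public:
        explicit fail_counter(std::size_t n) : n_(n) {}

        void on_write()
        {
            if(n_ > 0 && --n_ == 0)
                throw simulated_failure{};
        }
    };

    // For each N: insert until the Nth write "fails", attempt
    // recovery (which may itself hit a later simulated failure),
    // then verify that the database is still consistent.
    template<class Database>
    bool test_all_failures(std::size_t max_writes)
    {
        for(std::size_t n = 1; n <= max_writes; ++n)
        {
            fail_counter fc{n};
            try
            {
                Database db{fc};   // I/O goes through a fail_file using fc
                db.insert_many();  // throws when the Nth write occurs
            }
            catch(simulated_failure const&)
            {
                try
                {
                    Database::recover(fc);
                }
                catch(simulated_failure const&)
                {
                    // recovery was also "interrupted"
                }
            }
            if(! Database::verify())
                return false;      // inconsistent: the test fails
        }
        return true;
    }

Because n runs over every write, every possible point of interruption gets exercised, which is why the test can claim coverage of any power loss.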
> - The key-file is append only so NuDB has multiple blocks of sorted records. So hopefully the insertions do not cause lots of blocks or the algorithm becomes a linear search.
There's no sorting going on here; the key file is an on-disk hash table, where each bucket holds a good-sized number of keys (so that a single read from the key file does the most work).
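Roughly, the layout looks like this (a sketch with made-up field names and sizes, not NuDB's actual on-disk format):

    #include <cstddef>
    #include <cstdint>

    // One bucket is sized to fill a block so that a lookup costs a
    // single key file read. 4096 here is an assumption, not NuDB's value.
    std::size_t const block_size = 4096;

    struct bucket_entry
    {
        std::uint64_t offset; // where the record lives in the data file
        std::uint64_t size;   // size of the record's value
        std::uint64_t hash;   // hash of the key
    };

    std::size_t const entries_per_bucket =
        block_size / sizeof(bucket_entry);

    // Lookup: bucket_index = hash(key) % bucket_count, read that one
    // bucket from the key file, then scan its entries in memory for a
    // matching hash. No sorted runs, no multi-block search.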
> This is surprisingly unlikely to scale to the IOPS Niall mentioned. The reader lock requires exclusive cacheline access to synchronize with the writer, which forces the core to wait for a response from other cores/processors.
That's possible. I haven't benchmarked that scenario, and I'm not knowledgeable about optimizing for memory access / cache performance. Thanks
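For reference, the effect Lee describes can be seen with a small standalone snippet (illustrative only, not NuDB code): even a shared lock performs an atomic read-modify-write on the mutex's lock word, so each acquisition needs momentary exclusive ownership of that cache line, and concurrent readers end up bouncing the line between cores:

    #include <atomic>
    #include <cstdint>
    #include <shared_mutex>
    #include <thread>
    #include <vector>

    std::shared_mutex m;                  // lock word shared by all readers
    std::atomic<std::uint64_t> reads{0};

    void reader()
    {
        for(int i = 0; i < 1000000; ++i)
        {
            // Acquiring even a *shared* lock is an atomic RMW on m's
            // internal state, requiring exclusive cache line ownership
            // while the RMW executes.
            std::shared_lock<std::shared_mutex> lock(m);
            reads.fetch_add(1, std::memory_order_relaxed);
        }
    }

    int main()
    {
        std::vector<std::thread> threads;
        for(int i = 0; i < 8; ++i)        // more readers -> more bouncing
            threads.emplace_back(reader);
        for(auto& t : threads)
            t.join();
    }

Timing this with one reader versus eight would show per-reader throughput falling as the cores contend for that one line.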