
fsync() performs pathologically badly on copy-on-write filing systems
The library is not designed for exotic file systems like the one you describe. It's meant for simple commodity hardware and operating systems, such as what you might find on a bare-metal Amazon Web Services instance. There is no need for a copy-on-write file system, as long as the invariants are met (that fsyncs aren't reordered).
Except, as has already been established, fsyncs *are* reordered with respect to their retrievability after power loss. So your invariant is not met.

For the record, ZFS is hardly an exotic file system. All my servers run ZFS on Linux because my Linux distro (Proxmox) defaults to ZFS. My FreeBSD install on my laptop runs ZFS because that's also the default filing system for FreeBSD. Additionally, ext4 can be mounted in COW mode via "data=journal". So none of this is exotic; it's merely commonplace outside where you've been using NuDB to date. As I said to you at the very beginning of all this, your database is aimed at a very limited use case. If it entered Boost, you'd find people doing all sorts of crazy stuff with it, and running it on ZFS would be very mild compared to what some would do.
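For reference, the ext4 "data=journal" mode mentioned above is just a mount option; the device and mount-point names below are placeholders, not anything from this thread:

```shell
# Mount an ext4 volume with full data journalling (data=journal).
# /dev/sdb1 and /mnt/data are placeholder names for illustration only.
mount -o data=journal /dev/sdb1 /mnt/data

# Or persistently, as an /etc/fstab entry:
# /dev/sdb1  /mnt/data  ext4  data=journal  0 2
```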
In which case you did not make a great choice.
Much, much better would be Blake2b: 2 cycles/byte, cryptographically secure, and the expected time to find a collision exceeds the life of the universe.
Hmm, no, it seems that you are the one who "did not make a great choice." The requirements of Hasher do not include cryptographic security. Blake2b is a cryptographically secure hash function which computes digests of up to 64 bytes, while xxhasher is a non-cryptographically-secure hash function which computes a 64-bit digest. NuDB requires a Hasher more like std::hash and less like SHA-1.
I already explained in my reply to Lee why one very good approach to portably achieving durability is to use an acyclic graph of chained cryptographic hashes to maintain a secure history of time, exactly as git or Mercurial do in fact. If you're not using cryptographically strong hashing, it's highly unlikely your database can be durable.
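A minimal sketch of that chained-hash idea (the function names and record layout here are my own illustration, not NuDB's or AFIO's API), using Python's hashlib.blake2b for brevity. Each record's digest covers the previous digest, so any torn, missing, or reordered write breaks every later link and is detectable on open:

```python
import hashlib

GENESIS = b"\x00" * 32  # placeholder root digest for an empty history

def chain_hash(prev_digest: bytes, payload: bytes) -> bytes:
    """Digest of this record, bound to the entire history before it."""
    h = hashlib.blake2b(digest_size=32)
    h.update(prev_digest)
    h.update(payload)
    return h.digest()

def build_chain(records):
    """Return one digest per record payload, each chained to the last."""
    digests, prev = [], GENESIS
    for payload in records:
        prev = chain_hash(prev, payload)
        digests.append(prev)
    return digests

def verify_chain(records, digests):
    """Recompute the chain; a tampered or reordered record fails here."""
    prev = GENESIS
    for payload, expected in zip(records, digests):
        prev = chain_hash(prev, payload)
        if prev != expected:
            return False
    return True

records = [b"put k1=v1", b"put k2=v2", b"del k1"]
digests = build_chain(records)
assert verify_chain(records, digests)
# Reordering two records is detected, precisely because each hash
# incorporates the digest of everything written before it:
assert not verify_chain([records[1], records[0], records[2]], digests)
```

This is the same structural trick a git commit graph uses: verifying the newest digest transitively verifies the whole history.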
Blake2b can achieve almost 1Gb/s while xxhash can achieve 110Gb/s.
Your maths are seriously out. Your hash function (which runs per thread) only needs to be as fast as your storage device is at a queue depth of 1. So, taking a top-of-the-range NVMe SSD, the Samsung 960 Pro: it can write at QD1 about 50k IOPS. That's around 200MB/sec/thread. Blake2b runs at about 1GB/sec, so it should comfortably fit, with a bit of room to spare, on a single thread. Obviously, more threads get you more queue depth, and performance should rise linearly until you run out of CPUs.

Note I mention you only cryptographically hash on write. I wouldn't suggest it for lookups and reads, except on first database open. For lookups and reads I'd strongly recommend SpookyHash v2 over xxhash (you'll find a header-only edition of SpookyHash v2 in AFIO v2). Spooky was designed by a renowned crypto hash expert, unlike the MurmurHash-derived xxhash, which definitely was not. Spooky is fast on all architectures, not just Intel x64. Spooky uses the same internal mechanism as a cryptographically strong hash, just with fewer rounds for performance. Spooky is as DoS-resistant as SipHash, and has been empirically proven collision-resistant to 2^72 inputs, whereas xxhash will collide at best by 2^32 inputs. And the 128-bit hash Spooky creates fits perfectly into a single SSE or NEON register, making working with them single-cycle, which is exactly how AFIO works with them via its uint128 type.

So for a content-addressable database like yours, please use SpookyHash v2, even if you XOR down into 64 bits. And if you decide to stick with 64-bit hashes, you need to document that, by the birthday bound, a collision becomes statistically likely by around 2^32 (about 4 billion) inserted items, and is possible long before that.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
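The QD1 throughput figure and the 64-bit birthday bound above can be checked with a few lines of arithmetic. The 4KiB write size is my assumption for the IOPS-to-bandwidth conversion (it isn't stated in the thread), and the collision probability uses the standard birthday approximation p ≈ 1 − exp(−n²/2N):

```python
import math

# QD1 write bandwidth: 50k IOPS at an assumed 4KiB per write.
iops = 50_000
write_size = 4096  # bytes per I/O; my assumption, not from the thread
bandwidth_mb = iops * write_size / 1e6
print(f"QD1 bandwidth: ~{bandwidth_mb:.0f} MB/s")  # ~205 MB/s

# Birthday bound for a 64-bit hash with N = 2^64 possible digests:
# p(collision among n items) ~ 1 - exp(-n^2 / 2N).
N = 2.0 ** 64
def collision_probability(n: float) -> float:
    return 1.0 - math.exp(-n * n / (2.0 * N))

print(f"p at 2^32 inserts: {collision_probability(2.0 ** 32):.3f}")
print(f"p at 2^30 inserts: {collision_probability(2.0 ** 30):.4f}")
```

At 2^32 (about 4 billion) inserts the collision probability is already roughly 39%, which is why a 64-bit Hasher needs that caveat documented, even though a collision only becomes certain at 2^64 + 1 items.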