
Your hash function (which runs per thread) only needs to be as fast as your storage device at a queue depth of 1. So, taking a top-of-the-range NVMe SSD, the Samsung 960 Pro, it can write at QD1 about 50k IOPS. That's around 200MB/sec/thread. Blake2b runs at about 1GB/sec, so it should fit comfortably on a single thread with a bit of room to spare.
I'm probably being dense, but I don't follow your argument. NuDB hashes the value to use as a key, from what I understood. If you have a large value, the write will be much faster than with random 4K IOPS, no?
For reference, on my i7 4770k running Win 8.1, a 1TB 960 Pro has about 1700MB/s sequential write speed and 150MB/s 4K random write, both at QD1.
Obviously enough, if you write 16KB to storage, the OS will issue 4x 4KB writes (why 4KB? That's the PCIe DMA maximum). Your 960 Pro will very likely issue all four of those in parallel, so the 16KB write will complete in the same time as a single 4KB block. Taking just your figures above, dividing 1700 by 150 yields about 11, so in your test and/or your filing system and/or your OS, sequential writes are being issued on average eleven at a time. If you are using the kernel page cache, all of that is taken care of for you. NuDB uses the kernel page cache, so all of that is out of your control.

But when I was designing AFIO, I started from the premise that many if not most users would be running with O_DIRECT|O_SYNC and doing their own multiplexing of 4KB block i/o onto storage. So the async backend AFIO v2 uses is specifically a non-multithreaded implementation: per thread you run an afio::io_service, and you issue an i/o; while that completes you prepare the next i/o; upon completion you issue the next i/o, and so on. You keep your thread always writing data, and when not writing, preparing the next data to write.

The challenge then becomes how much work to pack onto each kernel thread, and that's where calculating hash overheads etc. comes in. All I was saying earlier was that modern CPUs are fast enough to do crypto-strength hashing on i/o even to the very fastest SSDs available right now. If you want durability, you won't have to give up write performance to achieve it, so long as your CPU is as top end as your SSD.

If you don't want durability, then none of this matters. Crypto hashes are very expensive, and a non-crypto hash like SpookyHash is pretty amazing for its performance. So use SpookyHash, but don't claim durability unless you've implemented a hash chain of history where one can wind arbitrarily backwards and forwards in time, so that the hashes check one another's correctness and a collision in one would be spotted by another.
Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/