
On 28/03/2017 14:42, Vinnie Falco via Boost wrote:
> On Tue, Mar 28, 2017 at 9:27 AM, Niall Douglas via Boost wrote:
>> I reckon in a few hundred lines of code I can meet or beat NuDB
>
> Is there a repository link where I can try out your code?
AFIO v2 is a collection of primitive filesystem algorithms. The idea is that it's a toolkit for quickly mashing together custom file system programs like your NuDB with minimum time and effort, so there is nothing in there which looks like NuDB. Docs, such as they are, are at https://ned14.github.io/boost.afio/. Github repo is obvious. And as I mentioned, the CI has been reporting it doesn't build for some months now, so I doubt you can try anything until after the ACCU conference.

If I were going to write something like NuDB in a few hundred lines, then for the value store file I'd make it sparse and do atomic append as you do, and use hole punching to delete and free storage. I'd create a shared memory mapped view of the key file, and I'd use something like https://github.com/ned14/boost-lite/blob/master/include/algorithm/open_hash_... for the indexing algorithm in that memory mapped key file. It has a policy class called atomic_linear_memory_policy which puts a spinlock on each hash table item, and so is concurrency safe (the PS below sketches the idea). That would be my first attempt, and it would be easy.

But I think there is a better algorithm in there, one based on a single very large sparse file, 16 exabytes long, which is itself one giant open addressed hash table. With a very high quality hash, you simply open hash address to the nearest 4Kb block, so your maximum key + value is 4Kb, with larger values spilled into a separate largevalue file (the PPS below sketches the addressing). Small value cluster packing might be worth implementing as well if values are much smaller than 4Kb: the idea is that you pack nearby items into the same 4Kb extent where possible, and use a linear scan to find them inside the 4Kb block.

I think the second algorithm would be much faster than yours, but that is pure guesswork on my part, on the basis that it does one i/o per read or write as it's a single file. It might easily be much slower. Either way, the first algorithm is basically the same as yours, but simplified down to the bare essentials. It should perform as well as yours, and likely better due to having no write log file etc.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
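PS: to make the per-slot spinlock idea concrete, here is a minimal sketch. All the names in it (slot, slot_guard, put) are mine for illustration, not the actual boost-lite open_hash_index API, and the static array stands in for what would really be a shared memory map of the key file:

// Per-slot spinlock over an open addressed table, in the spirit of
// atomic_linear_memory_policy. Illustrative names, not the real API.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct slot
{
  std::atomic<uint32_t> lock{0};  // 0 = unlocked, 1 = locked
  uint32_t hash{0};               // 0 reserved to mean "empty slot"
  uint64_t value_offset{0};       // where the value lives in the store file
  uint64_t value_length{0};
};

// In real use this would be a shared memory mapped view of the key file,
// so every process sees (and locks) the same slots.
static slot table[4096];

struct slot_guard  // RAII spinlock on a single slot
{
  slot &s;
  explicit slot_guard(slot &s_) : s(s_)
  {
    uint32_t expected = 0;
    while(!s.lock.compare_exchange_weak(expected, 1, std::memory_order_acquire))
      expected = 0;  // somebody else holds it, spin
  }
  ~slot_guard() { s.lock.store(0, std::memory_order_release); }
};

// Linear probe from hash onwards, locking one slot at a time. Concurrency
// safe because a slot is only claimed or updated while its spinlock is held.
// Callers must never use a hash of 0, as that marks an empty slot.
bool put(uint32_t hash, uint64_t offset, uint64_t length)
{
  for(size_t n = 0; n < 4096; n++)
  {
    slot &s = table[(hash + n) % 4096];
    slot_guard g(s);
    if(s.hash == 0 || s.hash == hash)  // empty, or updating this key
    {
      s.hash = hash;
      s.value_offset = offset;
      s.value_length = length;
      return true;
    }
    // collision with a different key: guard releases, probe the next slot
  }
  return false;  // table full
}

int main()
{
  put(0x9e3779b9, 0, 512);  // value for this key: 512 bytes at offset 0
  std::printf("stored\n");
  return 0;
}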
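PPS: and the heart of the second algorithm, hashing a key straight to a 4Kb aligned offset in the one giant sparse file, with hole punching as the delete path. Again only a sketch under stated assumptions: FNV-1a stands in for the "very high quality hash", and the calls are Linux flavoured (fallocate hole punching needs _GNU_SOURCE, which g++ defines by default). Error handling elided:

// One giant sparse file as an open addressed hash table: one i/o per
// read or write. Sketch only, not how I'd ship it.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

static uint64_t hash_key(const void *data, size_t len)
{
  // 64-bit FNV-1a, purely as a placeholder hash
  const unsigned char *p = static_cast<const unsigned char *>(data);
  uint64_t h = 14695981039346656037ULL;
  for(size_t n = 0; n < len; n++) { h ^= p[n]; h *= 1099511628211ULL; }
  return h;
}

// Hash a key straight to its 4Kb aligned block. Masking to 63 bits keeps
// the offset positive for off_t; real filesystems cap usable file sizes
// well below 16 exabytes anyway.
static off_t block_offset(const void *key, size_t keylen)
{
  return static_cast<off_t>(hash_key(key, keylen) & 0x7ffffffffffff000ULL);
}

int main()
{
  int fd = open("store.db", O_RDWR | O_CREAT, 0644);  // sparse on demand
  const char key[] = "some key";
  off_t where = block_offset(key, sizeof(key) - 1);

  char block[4096] = {};
  std::memcpy(block, "some value", 10);     // key + value record, <= 4Kb
  pwrite(fd, block, sizeof(block), where);  // write = one i/o

  char readback[4096];
  pread(fd, readback, sizeof(readback), where);  // read = one i/o

  // delete = punch the block back out, returning the storage to the fs
  fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, where, 4096);

  close(fd);
  return 0;
}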