
I am unsure what benefit it confers over the much safer choices of SQLite or UnQLite, both of which are battle-tested and written by database experts to ensure that other people's data is never, ever lost.
SQLite is painfully slow compared to NuDB. UnQLite supports features not present in NuDB such as cursor iteration, moving, and deleting. The tradeoff for not supporting these features is that NuDB can make better claims about its performance.
You are correct that NuDB is a single-purpose design, and so will always be faster. However, SQLite configured into "I don't care about data loss" mode is pretty swift. "Painfully slow" is not a correct term to use except for its default, durability-preserving mode.
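For concreteness, the "I don't care about data loss" mode I mean is reached with pragmas along these lines (one possible combination; others exist):

```sql
-- Trade durability for speed: never fsync, and keep the rollback
-- journal in memory (journal_mode = OFF disables it entirely)
PRAGMA synchronous = OFF;
PRAGMA journal_mode = MEMORY;
```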
When I reviewed NuDB some time ago, I felt that your implementation would be highly susceptible to data loss in a wide set of circumstances.
NuDB makes some assumptions about the underlying file system. In particular, that when a call to fsync() returns, the data is assured to be reliably written. If this invariant is met, there is no data loss.
In fact, one of NuDB's unit tests exhaustively verifies that the database recovery process performs correctly after an I/O failure. The unit test works by instantiating a database with a custom File template parameter. This custom file implementation simulates an I/O failure on the Nth operation. The test starts with N=1 and runs until the simulated failure occurs, then increments N and re-runs, repeating until a run completes with no failure. At each iteration, the test verifies that the database recovery process left the database in a consistent state. You can see this code here: https://github.com/vinniefalco/NuDB/blob/master/test/recover.cpp#L125
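To make the shape of that test concrete, here is a minimal standalone sketch of the same idea: a file wrapper that fails on the Nth operation, driven in an exhaustive loop. The names and the five-write workload are hypothetical, not NuDB's actual File concept:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical file wrapper (not NuDB's actual File concept): every I/O
// operation is counted, and the fail_on'th operation throws.
struct failing_file {
    int fail_on;    // 1-based index of the operation that fails
    int count = 0;
    explicit failing_file(int n) : fail_on(n) {}
    void write(int value, std::vector<int>& disk) {
        if (++count == fail_on)
            throw std::runtime_error("simulated I/O failure");
        disk.push_back(value);
    }
};

// One scripted workload of five writes; returns true if it completed
// without hitting the injected failure.
bool run_workload(int fail_on, std::vector<int>& disk) {
    failing_file f(fail_on);
    try {
        for (int v = 0; v < 5; ++v)
            f.write(v, disk);
    } catch (std::runtime_error const&) {
        return false;  // a real test would now run recovery on `disk`
    }
    return true;
}

// The exhaustive loop: N = 1, 2, 3, ... until a run completes cleanly.
// After each aborted run, verify the surviving state is still consistent
// (here: a clean prefix of what was written).
int first_n_without_failure() {
    for (int n = 1;; ++n) {
        std::vector<int> disk;
        bool ok = run_workload(n, disk);
        for (std::size_t i = 0; i < disk.size(); ++i)
            assert(disk[i] == static_cast<int>(i));
        if (ok)
            return n;
    }
}
```

With a five-operation workload, runs with N=1 through N=5 each abort and get their state checked; N=6 is the first clean run.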
I don't doubt that you think this database is vulnerable to data loss under some condition. However, I would appreciate specifics so that I can verify under what conditions your claim is correct, or if it is correct at all.
The single biggest thing which struck me when reading your implementation was that you make very strong assumptions about your storage, and that generally leads to corner-case data loss. I'm plucking this straight from memory, as it was too long ago that I examined your source, but a classic mistake is to assume this is sequentially consistent:

    int keyfd, storefd;
    write(storefd, data);
    fsync(storefd);
    write(keyfd, key);
    fsync(keyfd);

Here the programmer writes the value being stored, persists it, then writes the key to the newly stored value, and persists that. Most programmers unfamiliar with filing systems will assume that the fsync to storefd cannot happen after the fsync to keyfd. They are wrong; that is a permitted reorder. fsyncs are only guaranteed to be sequentially consistent *on the same file descriptor*, not across different file descriptors. Technically speaking, you could even dup() the same fd to the same inode, and fsync is not required to be sequentially consistent between those two fds, though most major filing systems do implement that because the locking is kept at inode level.

I can't remember if NuDB makes that sort of mistake, but the single biggest thing which struck me was your assumption that storage is reliable and predictable, and that the syscalls you wrote will be executed in the order you wrote them. None of those are true except in a very limited set of circumstances, i.e. power never goes off, storage never fails, ECC RAM is present, bitflips never occur, you are on a very recent Linux kernel, your ext4 filing system hasn't been mounted with unusual flags, etc.

You also rely heavily on the kernel page caching architecture to give your performance. That's a good move, as most people trying to invent their own caching architecture on top of O_DIRECT do a far worse job. However, to really get IOPS out of the storage just around the corner, where hitting 1M 4K IOPS is achievable, you can't avoid O_DIRECT.
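Rendered as compilable POSIX code, the pattern in question looks like this (a hypothetical illustration, not NuDB source; paths and names are invented):

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical illustration of the pattern discussed above (not NuDB code):
// write the value to one file, persist it, then write the key referencing
// it to another file, and persist that. The caveat is that durability
// ordering across *different* file descriptors is weaker than many
// programmers assume.
bool store_value_then_key(const char* storepath, const char* keypath,
                          const char* data, const char* key)
{
    int storefd = ::open(storepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int keyfd = ::open(keypath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (storefd < 0 || keyfd < 0)
        return false;

    bool ok =
        ::write(storefd, data, std::strlen(data)) >= 0  // 1. write the value
        && ::fsync(storefd) == 0                        // 2. persist the value
        && ::write(keyfd, key, std::strlen(key)) >= 0   // 3. write the key
        && ::fsync(keyfd) == 0;                         // 4. persist the key

    ::close(storefd);
    ::close(keyfd);
    return ok;
}
```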
I also felt it wouldn't scale well to the > 1M IOPS storage just around the corner.
While it hasn't yet been tested on these non-existent devices, I am optimistic, since NuDB was specifically designed for very intense fetch workloads with light concurrent insertions. Fetches can happen concurrently, so it should be possible to saturate the read throughput of a device. See: https://github.com/vinniefalco/NuDB/blob/master/include/nudb/impl/basic_stor...
The Macbook Pro I am typing on right now hits 300k 4KB IOPS just fine. Some of the big-RAM-drive systems on which I've tested AFIO were busting past 2M IOPS. These are not non-existent devices; they are just currently high end, and rapidly heading towards the consumer.
I would suspect it is being used in a highly restricted use case where corner cases don't present.
Maybe. I'll counter by saying, there are only three operations possible on an open database: insert, fetch, and close.
I was more referring to the use environment. For example, atomic appending to a file is super slow on ZFS. It may be fast on ext4, and there you should use atomic appends, but on other filing systems you need to set the extent to maximum and use lazy allocation via an atomically updated, manually kept watermark.

There's lots of this stuff; it's why AFIO v2 has a concept of "storage profiles", which keep a precanned mix of AFIO filing system algorithms for some storage combo. That way, if you load your key-value store on ZFS on Linux, the storage profile says to use algorithms A, B and D. If you load it on NTFS on Windows, use algorithms B, C and F instead. AFIO provides many algorithms, all accessible via an abstracted base class, so things using them don't need to care.
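The storage-profile idea can be sketched as follows. This is a hypothetical illustration of the dispatch shape only, not AFIO's actual API; the class names, the filesystem strings, and the two append strategies are invented for the example:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch (not AFIO's actual API): algorithms live behind an
// abstracted base class, and a profile maps a filing system to the
// strategy that is safe and fast there.
struct append_algorithm {
    virtual ~append_algorithm() = default;
    virtual std::string name() const = 0;
};
struct atomic_append : append_algorithm {
    std::string name() const override { return "atomic-append"; }
};
struct watermark_append : append_algorithm {
    std::string name() const override { return "preallocate+watermark"; }
};

// A storage profile picks the append strategy for a given filing system:
// atomic appends where they are fast (e.g. ext4); elsewhere, preallocate
// the extent and track the end of data via a manually kept watermark.
std::unique_ptr<append_algorithm> profile_for(std::string const& fs) {
    if (fs == "ext4")
        return std::make_unique<atomic_append>();
    return std::make_unique<watermark_append>();
}
```

Code using the store only ever sees the `append_algorithm` interface, so the per-filesystem choice stays hidden behind the profile.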
If you are up for getting this to a state ready to submit to Boost, I'd welcome it.
Cool! The docs just need a bit of work, the library code is more than ready.
Thanks for taking the time to go through it!
I appreciate I'm full of nothing but criticism, but I wanted to give you some idea of how a peer review would go. I do want to say thank you for writing your library, and for intending to bring it to Boost.

It may well be that the community will accept NuDB if it very substantially reduces the guarantees it offers to essentially none, i.e. this is a fast key-value, insert-only store with no guarantees. But I will mention that if I don't need to provide any guarantees, I believe I could roll something with AFIO v2, even in its current very alpha state, that will match or beat NuDB in less than a hundred lines of code and about one week of work. I even reckon I can achieve key-value deletion if users are willing to accept a reasonable ceiling on total item count of maybe a few million items, and the use of a recent extents-based filing system.

I don't wish to do your work down, but a key-value store with no guarantees really is that easy to implement with AFIO v2, right now. It's the guarantees, and achieving consistent performance across environments, which are hard.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/