
If you implemented NuDB as a simple data file plus a memory mapped key file, and always atomically appended transactions to the data file when inserting items, then after power loss you could check whether the key file mentions extents that are not possible given the size of the data file. You can then rebuild the key file simply by replaying through the data file, being careful to ignore any truncated final append.
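
A minimal sketch of that replay-based rebuild, under an assumed record layout of a fixed header followed by the value bytes. The header, `key_index` and `rebuild_key_file` below are hypothetical, not NuDB's actual format:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>

#pragma pack(push, 1)
struct record_header
{
    uint64_t key_hash;    // identifies the key this record belongs to
    uint64_t value_size;  // bytes of value data following the header
};
#pragma pack(pop)

// key hash -> (offset of value in data file, size of value)
using key_index = std::unordered_map<uint64_t, std::pair<uint64_t, uint64_t>>;

key_index rebuild_key_file(const std::string &data_file_path)
{
    key_index index;
    std::ifstream in(data_file_path, std::ios::binary);
    if (!in)
        return index;
    in.seekg(0, std::ios::end);
    const uint64_t file_size = static_cast<uint64_t>(in.tellg());
    in.seekg(0);

    uint64_t offset = 0;
    record_header hdr{};
    while (offset + sizeof(hdr) <= file_size)
    {
        if (!in.read(reinterpret_cast<char *>(&hdr), sizeof(hdr)))
            break;
        const uint64_t value_offset = offset + sizeof(hdr);
        // A truncated final append claims more bytes than actually exist:
        // ignore it and stop.
        if (value_offset + hdr.value_size > file_size)
            break;
        // Later appends for the same key simply overwrite earlier entries.
        index[hdr.key_hash] = {value_offset, hdr.value_size};
        in.seekg(static_cast<std::streamoff>(hdr.value_size), std::ios::cur);
        offset = value_offset + hdr.value_size;
    }
    return index;
}
```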
I think this is trading an atomic `truncate(0)` assumption for an atomic multi-block overwrite assumption. So this seems more likely to produce a torn write that is hard to notice.
Implied in my proposal was that every record appended to the data file would be hashed. You need this because atomic append does not send those appends to storage in order, so appends could be torn.
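
For illustration, a sketch of what that per-record hash could look like. FNV-1a is used purely as a stand-in for whatever hash the real format would pick, and the header layout is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

inline uint64_t fnv1a(const void *data, size_t len,
                      uint64_t seed = 14695981039346656037ULL)
{
    const auto *p = static_cast<const unsigned char *>(data);
    uint64_t h = seed;
    for (size_t i = 0; i < len; ++i)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

#pragma pack(push, 1)
struct hashed_record_header
{
    uint64_t key_hash;
    uint64_t value_size;
    uint64_t record_hash;  // hash over key_hash, value_size and the value bytes
};
#pragma pack(pop)

// During replay, recompute and compare. A mismatch means the append was
// torn (landed partially or out of order), so the record is discarded.
inline bool record_intact(const hashed_record_header &hdr, const void *value)
{
    uint64_t h = fnv1a(&hdr.key_hash, sizeof(hdr.key_hash));
    h = fnv1a(&hdr.value_size, sizeof(hdr.value_size), h);
    h = fnv1a(value, static_cast<size_t>(hdr.value_size), h);
    return h == hdr.record_hash;
}
```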
That would be a reasonable power loss recovery algorithm. It is a little slow to recover large databases, but safe, reliable and predictable, and it would only run on a badly closed database. You can also turn off fsync entirely and let the atomic appends land on storage in an order probably close to the append order. It ought to be quicker than NuDB by a fair bit: far fewer i/o ops and a simpler design.
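
A sketch of the append path this relies on, assuming POSIX and illustrative names: the data file is opened with O_APPEND and a whole record (header plus value) is handed to the kernel in one writev(), so the append offset is chosen atomically and the record occupies a single contiguous extent; no fsync() is issued here:

```cpp
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstddef>

int open_data_file(const char *path)
{
    return ::open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
}

bool append_record(int fd, const void *header, size_t header_len,
                   const void *value, size_t value_len)
{
    iovec iov[2] = {
        {const_cast<void *>(header), header_len},
        {const_cast<void *>(value), value_len},
    };
    // O_APPEND makes the kernel pick the offset atomically for the whole
    // writev; a short write (power loss, full disk) is what the per-record
    // hash catches later.
    const ssize_t written = ::writev(fd, iov, 2);
    return written == static_cast<ssize_t>(header_len + value_len);
}
```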
How would it notice that a bucket was partially overwritten though? Wouldn't it have to _always_ inspect the entire key file?
My memory mapped key file would be something like:

```cpp
struct key_file
{
  std::atomic<off_t> data_file_extent, data_file_written;
  // map key to offset into data file and size of value stored
  open_hash_index<key_type, std::pair<off_t, off_t>> keys;
};
```

When opening the database for use, you'd first check that data_file_extent matches the actual size of the data file. You'd then check that every key refers to an extent before data_file_extent. If anything doesn't match up, you need to rebuild.

The data_file_written atomic, by the way, is atomically updated when an append completes, whereas data_file_extent is atomically updated before an append begins. This keeps a correct window of extents to check for torn writes if O_SYNC is working. But what if O_SYNC isn't working? I'd solve that by writing a timestamp with each transaction record written to the data file, so you parse backwards from whatever survived the power loss at the end of the file by, say, one minute's worth of writes. Most sane filing systems will have written out dirty data within a minute.

That would be my automatic "load the database" implementation; it's a reasonable balance of performance and checks. Of course one could and should provide a "scan the entire database for consistency and repair" routine too.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
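
A rough sketch of that automatic open-time check, assuming the key_file struct above; the function name, return codes and everything else here are purely illustrative, not an actual NuDB API:

```cpp
#include <atomic>
#include <sys/types.h>

enum class open_check { ok, rebuild_needed, verify_torn_window };

template <class KeyFile>
open_check check_on_open(const KeyFile &kf, off_t actual_data_file_size)
{
    const off_t extent  = kf.data_file_extent.load(std::memory_order_acquire);
    const off_t written = kf.data_file_written.load(std::memory_order_acquire);

    // 1. data_file_extent must match the real size of the data file.
    if (extent != actual_data_file_size)
        return open_check::rebuild_needed;

    // 2. Every key must refer to an extent wholly before data_file_extent.
    for (const auto &kv : kf.keys)  // kv.second = {offset, value size}
    {
        if (kv.second.first + kv.second.second > extent)
            return open_check::rebuild_needed;
    }

    // 3. Appends in flight at power loss lie in [data_file_written,
    //    data_file_extent): re-verify those records' hashes, and if O_SYNC
    //    cannot be trusted, extend the backwards scan by roughly one
    //    minute's worth of timestamped records.
    if (written < extent)
        return open_check::verify_torn_window;

    return open_check::ok;
}
```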