
On 2012-12-20 16:45, Beman Dawes wrote:
2) Data compression for pages (smaller file sizes, less memory usage; ordered data compresses very well)
I'm strongly against that. The full rationale for not doing compression is a lengthy research paper, but the bottom line is what Rudolf Bayer said so many years ago with regard to prefix and suffix key compression: the increased complexity and reduced reliability make compression very unattractive.
The problems of compression are closely related to the problems of variable length data. If the application is willing to tolerate sequential (rather than binary) search at the page level, or is willing to tolerate an indexed organization on pages, or even (gasp!) an additional disk access per comparison, these problems aren't necessarily showstoppers. But if some applications won't tolerate even a dispatch (either a virtual function or a hand-rolled dispatch) to select the approach being employed, then the only choice I can see is to provide essentially multiple sets of classes, and that gets complex and messy.
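To make the dispatch point concrete, here is a minimal sketch of what selecting a page-level search strategy through a virtual function might look like. The names (`page_search`, `binary_search`, `sequential_search`) are illustrative only, not from any proposed library; the cost being debated is the indirect call per lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical page-search policies: fixed-size entries permit binary
// search, while variable-length (e.g. compressed) entries force a
// sequential scan. The virtual call is the "dispatch" under discussion.
struct page_search {
    virtual ~page_search() = default;
    // Returns the index of key in the page, or -1 if absent.
    virtual int find(const std::vector<std::uint32_t>& keys,
                     std::uint32_t key) const = 0;
};

struct binary_search : page_search {
    int find(const std::vector<std::uint32_t>& keys,
             std::uint32_t key) const override {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        return (it != keys.end() && *it == key)
                   ? static_cast<int>(it - keys.begin())
                   : -1;
    }
};

struct sequential_search : page_search {
    int find(const std::vector<std::uint32_t>& keys,
             std::uint32_t key) const override {
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == key) return static_cast<int>(i);
        return -1;
    }
};
```

Applications unwilling to pay for the virtual call would instead need the strategy baked in at compile time, which is where the "multiple sets of classes" complexity comes from.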
At what point do you expect serialization to be executed? Do pages need to be kept in a serialized state in memory? Is it wrong to, say, represent a page with a std::map in memory, and serialize to a binary page when writing? In the latter case, compression seems to be less difficult.
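The suggestion above can be sketched as follows. This is a hedged illustration, not a proposed design: a page lives in memory as a `std::map` and is flattened to a binary buffer only at write time, which is the point where compression could be applied. The fixed-width key/value layout and the `serialize`/`deserialize` names are assumptions for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical in-memory page: a sorted map of fixed-width keys and
// values, flattened to a binary buffer only when written to disk.
using page = std::map<std::uint32_t, std::uint32_t>;

std::vector<char> serialize(const page& p) {
    std::vector<char> buf;
    buf.reserve(p.size() * 8);  // 4 bytes key + 4 bytes value per entry
    for (const auto& kv : p) {
        const char* k = reinterpret_cast<const char*>(&kv.first);
        const char* v = reinterpret_cast<const char*>(&kv.second);
        buf.insert(buf.end(), k, k + 4);
        buf.insert(buf.end(), v, v + 4);
    }
    return buf;  // a compression pass could be applied here
}

page deserialize(const std::vector<char>& buf) {
    page p;
    for (std::size_t i = 0; i + 8 <= buf.size(); i += 8) {
        std::uint32_t k, v;
        std::memcpy(&k, buf.data() + i, 4);
        std::memcpy(&v, buf.data() + i + 4, 4);
        p[k] = v;
    }
    return p;
}
```

The trade-off, of course, is that every page load now pays a deserialization cost and the in-memory representation no longer matches the on-disk one.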
3) Ability for user to provide custom read/write mutexes (fake mutex, interprocess mutex, std::mutex)
There is a spectrum of needs. I've seen various designs that are optimal for various points in that spectrum. Can you point to any design that is optimal across the spectrum from single thread, single process, single machine, on up through multi-thread, multi-process, multi-machine?
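For concreteness, the pluggable-mutex idea in 3) is usually expressed as a policy template parameter. A minimal sketch, with illustrative names (`null_mutex`, `btree_stub`) that are not from any proposed library: a no-op mutex for the single-thread end of the spectrum, `std::mutex` for multi-thread, and in principle an interprocess mutex slotting in the same way.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical "fake mutex" satisfying BasicLockable: single-threaded
// users pay nothing, since lock()/unlock() compile to no-ops.
struct null_mutex {
    void lock() {}
    void unlock() {}
};

// Container stub parameterized on the locking policy.
template <class Mutex = null_mutex>
class btree_stub {
public:
    void insert(int key) {
        std::lock_guard<Mutex> guard(mutex_);  // no-op for null_mutex
        (void)key;  // real insertion logic omitted
        ++size_;
    }
    int size() const { return size_; }

private:
    Mutex mutex_;
    int size_ = 0;
};
```

The open question Beman raises still stands: a single policy parameter handles the thread dimension, but nothing in this shape addresses multi-process or multi-machine coordination.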
MVCC for b-trees comes close. See, e.g., http://guide.couchdb.org/draft/btree.html

Cheers,
Rutger