
On Fri, Dec 21, 2012 at 2:26 AM, Rutger ter Borg <rutger@terborg.net> wrote:
On 2012-12-20 16:45, Beman Dawes wrote:
2) Data compression for pages (smaller file sizes, lower memory usage; ordered data compresses very well)
I'm strongly against that. The full rationale for not doing compression would fill a lengthy research paper, but the bottom line is what Rudolf Bayer said so many years ago with regard to prefix and suffix key compression: the increased complexity and reduced reliability make compression very unattractive.
The problems of compression are closely related to the problems of variable length data. If the application is willing to tolerate sequential (rather than binary) search at the page level, or is willing to tolerate an indexed organization on pages, or even (gasp!) an additional disk access per comparison, these problems aren't necessarily showstoppers. But if some applications won't tolerate even a dispatch (either a virtual function or a hand-rolled dispatch) to select the approach being employed, then the only choice I can see is to provide essentially multiple sets of classes, and that gets complex and messy.
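The "indexed organization on pages" option Beman mentions can be pictured with a slotted-page sketch. This is a hypothetical layout, not the actual Boost btree format: variable-length keys are packed into a fixed-size buffer, and a separate slot array of offsets (kept in key order) preserves binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical slotted page for variable-length keys: key bytes are packed
// at the front of a fixed buffer; a sorted array of offsets ("slots") lets
// lookups binary-search even though entries have no fixed width.
struct SlottedPage {
    static constexpr std::size_t page_size = 4096;
    char buf[page_size];
    std::vector<std::uint16_t> slots;  // offsets, kept sorted by key
    std::size_t used = 0;

    bool insert(const std::string& key) {
        if (used + key.size() + 1 > page_size) return false;  // page full
        std::memcpy(buf + used, key.c_str(), key.size() + 1); // incl. '\0'
        // Insert the new offset so the slot array stays key-ordered.
        auto pos = std::lower_bound(slots.begin(), slots.end(), key,
            [this](std::uint16_t off, const std::string& k) {
                return std::strcmp(buf + off, k.c_str()) < 0;
            });
        slots.insert(pos, static_cast<std::uint16_t>(used));
        used += key.size() + 1;
        return true;
    }

    bool contains(const std::string& key) const {
        auto pos = std::lower_bound(slots.begin(), slots.end(), key,
            [this](std::uint16_t off, const std::string& k) {
                return std::strcmp(buf + off, k.c_str()) < 0;
            });
        return pos != slots.end() && key == buf + *pos;
    }
};
```

The trade-off Beman describes shows up directly here: the slot indirection restores O(log n) search, but insertion now shifts slot entries, and the "same number of keys per page" invariant is gone.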
At what point do you expect serialization to be executed? Do pages need to be kept in a serialized state in memory? Is it wrong to, say, represent a page with a std::map in memory, and serialize to a binary page when writing? In the latter case, compression seems to be less difficult.
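Rutger's suggestion can be sketched roughly as follows. The names `serialize_page`/`deserialize_page` and the length-prefixed layout are hypothetical, purely for illustration: the in-memory page is a std::map, and only at flush time is it flattened to bytes. A compressor such as zlib could then run over the resulting buffer before it reaches disk.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Flatten a map-backed page to a length-prefixed byte buffer at write time.
std::vector<char> serialize_page(const std::map<std::string, std::string>& page) {
    std::vector<char> out;
    auto put32 = [&out](std::uint32_t v) {
        const char* p = reinterpret_cast<const char*>(&v);
        out.insert(out.end(), p, p + 4);
    };
    put32(static_cast<std::uint32_t>(page.size()));
    for (const auto& kv : page) {
        put32(static_cast<std::uint32_t>(kv.first.size()));
        out.insert(out.end(), kv.first.begin(), kv.first.end());
        put32(static_cast<std::uint32_t>(kv.second.size()));
        out.insert(out.end(), kv.second.begin(), kv.second.end());
    }
    return out;  // a compressor (e.g. zlib) could be applied to `out` here
}

// Rebuild the in-memory map from a serialized page at read time.
std::map<std::string, std::string> deserialize_page(const std::vector<char>& in) {
    std::map<std::string, std::string> page;
    std::size_t pos = 0;
    auto get32 = [&in, &pos]() {
        std::uint32_t v;
        std::memcpy(&v, in.data() + pos, 4);
        pos += 4;
        return v;
    };
    std::uint32_t n = get32();
    for (std::uint32_t i = 0; i < n; ++i) {
        std::uint32_t klen = get32();
        std::string key(in.data() + pos, klen); pos += klen;
        std::uint32_t vlen = get32();
        std::string val(in.data() + pos, vlen); pos += vlen;
        page.emplace(std::move(key), std::move(val));
    }
    return page;
}
```

The cost of this design is exactly the point under debate: every read must deserialize a page before use, which gives up the zero-copy property of a blittable format.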
It's been some time since I've taken a look at Beman's btree code, so if I'm wrong about something please correct me.

There is no support for variable-length keys at the moment. The code is structured to always have the same number of keys in one page; compression would throw this off, adding a good bit of complexity. Also, I believe the pages are kept in a blittable format (i.e. you can mmap a page and use it right away without any deserializing), so there's also the relatively minor complexity of changing that.

I do see a use for compression, and I'd definitely like the option. Not necessarily prefix/suffix folding, but general (e.g. zlib) page compression could come in handy on phones and SD cards, or over a network -- scenarios where I/O is particularly expensive and CPU is plentiful.

--
Cory Nelson
http://int64.org
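[Editorial note: the blittable property Cory describes can be illustrated with a sketch. The layout below is hypothetical, not the actual Boost btree page format: every field is a plain fixed-width type, so the raw bytes read from disk (or mmap'd) are the usable page, with no deserialization step.]

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical blittable page: fixed-width fields only, no pointers,
// no virtuals, so the on-disk bytes are directly usable in memory.
struct Entry {
    std::uint64_t key;
    std::uint64_t value;
};

struct Page {
    std::uint32_t level;   // 0 = leaf
    std::uint32_t count;   // number of entries in use
    Entry entries[255];    // fixed capacity, fixed entry width
};

static_assert(std::is_trivially_copyable<Page>::value,
              "Page must stay blittable");

// Simulate a disk round trip: raw bytes out, usable Page back, no parsing.
Page roundtrip(const Page& p) {
    char disk[sizeof(Page)];
    std::memcpy(disk, &p, sizeof(Page));  // "write" to storage
    Page q;
    std::memcpy(&q, disk, sizeof(Page));  // "read" back, ready to use
    return q;
}
```

A zlib-style page compressor would break exactly this property: the stored bytes would no longer be the in-memory representation, forcing a decompress-on-read step and variable on-disk page sizes.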