
2009/6/11 Zachary Turner <divisortheory@gmail.com>:
I will take a look at the SparseBitVector. It does seem quite similar just from how you described it: a map<offset, bitset<128> >. The first thing I wonder, though (without having read the link yet), is why does the value need to be an entire bitset? Perhaps they are also trying to be efficient in the case where the bitset is "almost" sparse, but there are occasional sequences of interleaved 0s and 1s. That's an interesting idea I hadn't thought of. In any case, it does seem like there is some potential for overlap, so I'll read the link in a little bit.
Well, by the time you have a map<intptr_t, T>, the size of a node is at least 3 pointers plus the space for T, so for alignment reasons you might as well store at least a pointer's worth of bits in it, since it's free to do so. And then it's quite tempting to store more, since tripling the number of bits per node costs at most 50% extra memory the first time, and can actually reduce memory use in some cases. On some platforms, with the way new is implemented, using the same amount of storage for the bits as for the allocator and node overhead lets you store something like 8 pointers' worth of bits per node.
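To make the size arithmetic concrete, here is a rough C++ sketch of what such a node might look like. The struct and field names are my own illustration, not LLVM's actual SparseBitVector layout:

    #include <cstdint>

    // Hypothetical node for a map<offset, bits>-style compressed bitset,
    // shown only to illustrate the size arithmetic above.
    struct SparseBitNode {
      SparseBitNode *Prev;    // list/tree bookkeeping
      SparseBitNode *Next;
      std::uintptr_t Index;   // which chunk of the bit space this node covers
      std::uint64_t  Bits[2]; // 128 bits of payload
    };
    // On a typical 64-bit target this is 5 machine words: 3 words of
    // overhead and 2 words of bits. A node holding only a single word of
    // bits would be 4 words, so widening the payload from 64 to 128 bits
    // costs 25% more memory per node while doubling the bits stored; the
    // same reasoning puts 3 words of payload (tripling the bits) at no
    // more than 50% extra.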
The two usage scenarios I can give you off the top of my head are the one I'm using it for, and one the Windows kernel uses the same data structure for. I use it to store a filesystem allocation bitmap. For every block on the filesystem (where the block size can be 1KB, 4KB, etc. depending on user settings), there is one bit saying whether the block is allocated or free. If you work out the math, for a 1TB drive an uncompressed bitset can end up taking 33MB. On the other hand, that much storage capacity often suggests a RAID array, which is likely to be defragmented regularly, and most people tend to run some sort of defragmentation fairly often anyway. Even if that were not the case, typical filesystem allocation patterns, even on heavily fragmented drives, tend to have long runs of 1s or 0s.
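For what it's worth, here is the arithmetic behind that 33MB figure, assuming 4KB blocks (the block size is my assumption for the example):

    #include <cstdint>
    #include <iostream>

    int main() {
      // One bit per block in an uncompressed allocation bitmap.
      const std::uint64_t DriveBytes  = 1ULL << 40;             // 1 TB
      const std::uint64_t BlockBytes  = 4096;                   // 4 KB blocks (assumed)
      const std::uint64_t Blocks      = DriveBytes / BlockBytes; // 2^28 blocks
      const std::uint64_t BitmapBytes = Blocks / 8;              // one bit each
      std::cout << BitmapBytes / (1024 * 1024) << " MiB\n";      // prints 32 (~33.5 MB)
    }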
Ah, I see. The block allocator will already be trying pretty hard to avoid the pathological case for the range version. Smart. LLVM's version uses a linked list, keeping an iterator to an internal element so adjacent requests can be O(1). I wonder whether an intrusive::splay_tree might make an interesting backend for compressed_bitset for similar reasons? Not that this helps in any way, but the whole "storing a run of one-bits as one element" idea reminds me of segmented binary numbers, which I first saw in Okasaki's thesis. ~ Scott
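In case it's useful, a minimal sketch of how LLVM's SparseBitVector gets used; the point about adjacent requests is that it caches an iterator to the last element it touched, so repeated sets/tests near the same index don't re-walk the list:

    #include "llvm/ADT/SparseBitVector.h"

    void example() {
      llvm::SparseBitVector<> Allocated; // default: 128 bits per element

      // Sets that land in the same (or a neighbouring) element reuse the
      // cached element iterator, so runs of adjacent indices are cheap.
      for (unsigned Block = 1000; Block < 1100; ++Block)
        Allocated.set(Block);

      bool InUse = Allocated.test(1050); // true
      Allocated.reset(1050);             // mark the block free again
      (void)InUse;
    }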