
I know there's over a zillion bitset implementations, but often times I have found myself in need of a bitset that is efficient (space-wise) at storing millions, billions, or even trillions (no joke) of bits. For this I have developed an in-house class which stores bits in a structure vaguely reminiscent of a "T-Tree" (http://en.wikipedia.org/wiki/T_tree). You can think of it basically as an std::map<boost::uint64_t, boost::uint64_t> where <key, value> represents the *interval* of bits [key, key+value-1]. So it's basically run-length encoded, with logarithmic lookups instead of linear lookups. Obviously the worst case scenario is where bits alternate 10101... throughout the entire bitset, and this structure will be terrible in that case. But where there are frequent runs of 0 and 1 bits, this gives orders of magnitude savings in space. Since a map is sorted on key, it's still a valid binary search tree and one can find the node for the k'th bit by map::lower_bound(k) and quickly check whether the bit is set by testing whether k is in [iter->first , iter->first+iter->second). The most difficult part is insertion / deletion since it involves splitting (or merging) intervals and rebalancing but the complexity is still no worse than that of a normal rebalancing avl tree. I almost exclusively use this structure in a way that interval nodes represent sequences of 1 bits, but it can just as easily be used the other way around, where intervals represent sequences of 0 bits. In fact now that I think about it probably doesn't even matter. Performing bitwise operations on such a structure is very fast. All you need to do is expand / reduce the intervals appropriately. Lookup / assignment is obviously not constant time but of course that is the tradeoff. Forward / reverse iteration is still very fast however as you only need to take the non-constant time hit once on begin(). It is also very fast to iterate only 1 bits or only 0 bits for the same reason, which can be quite slow with normal bitsets. The implementation I currently have is very unsuitable for boost for many reasons, but if there is enough interest I will (slowly) work on adapting this to be more generic / usable by a wider audience, of course with suggestions and input from the community. Zach