
Hi Zach, 2009/11/3 Zachary Turner <divisortheory@gmail.com>:
On Mon, Nov 2, 2009 at 11:25 AM, Joachim Faulhaber <afojgo@googlemail.com>wrote:
I'd like to ask Zachary Turner to comment on the use case. Can this be useful in your field?
This is interesting and at some point I might download the library and play with it. Is there any easy way for me to determine the memory usage of an interval_map / interval_set?
interval_map / interval_set are implemented via std::map / std::set. So for every node carrying an interval / interval-value-pair you have three pointers (left,right,parent) and a color bit. The interval itself needs 2*sizeof(domain_type) plus one byte for the border information. Finally a map node has to provide space for the value of it's codomain type. The number of interval or interval-value nodes is returned by the function interval_count() or iterative_size() on all interval containers.
It's hard to say whether this will offer an advantage in my use case because being that I'm interested in primarily a) which blocks on a filesystem are allocated, and b) which blocks on a filesystem have changed since the last time I checked, it depends heavily on the usage patterns of the user.
The which-blocks-are-allocated bitset would be more likely to *not* benefit from this type of additional compression, or at least not significantly, because it would be very rare on a disk to have a significantly high concentration of alternating bits,
as I have already mentioned in this thread, the periodic bit compression is a nice byproduct but should not be overemphasized. Clearly we won't find periodic patterns very often in file allocation tables ;)
or extremely short runs.
... hmm, I thought, due to fragmentation there could be more narrow sequences of changing block patterns that could profit from bitcompression ...
The which-blocks-have-changed-since-last-time bitset is a little more uncertain. Obviously file access patterns aren't truly random, but let's just say in the worst case scenario of completely random disk access, and hence completely random bits being flipped from 0 to 1, how would this affect the memory usage of this template? I feel like this is kind of a difficult situation to analyze so I'm not looking for an exact O() storage complexity or anything, but just any insight you have to offer. My current solution compresses only intervals and thus also suffers in the case of random access, but I wonder if this solution would allow me to save space.
It would allow to save space only, if in addition to interval compression your blocks-have-changed-patterns allow for additional bit compression: Clusters of changed-blocks within not too big ranges, that can be compressed into 'local' bitsets. In the worst case scenario of randomly and sparsely distributed bits 'large_bitset' has no advantage over an itl::interval_set or your compressed_bitset implementation (if I understood it right).
What kind of iteration support is provided? I often need to iterate through each bit that is set to 1, where the value_type of the iterator is the 0-based index of the bit in the bitset. So for example if I have a bitset representing the bit string 0110110101100 I would want, or at least I would want to be able to easily create, an iterator that returns 1, 2, 4, 5, 7, 9, 10.
mini::large_bitset started as a non trivial example on the usefulness of itl::interval_maps. So this implementation http://www.herold-faulhaber.de/boost_itl/doc/libs/itl/doc/html/boost_itl/pro... aims to be very *mini*mal. Also the more elaborated version 'interval_bitset' https://svn.boost.org/svn/boost/sandbox/itl/boost/itl_xt/interval_bitset.hpp is a first shot and not yet very much field tested (therefore not part of the core library submitted for review). With itl::interval_bitset you can currently iterate over intervals only. I am generally reluctant to implement element iterators for interval containers, because this would be IMO a permanent invitation for inefficient usage. Apart from that, it wouldn't be difficult to provide such an iterator for itl::interval_bitset.
The other main important thing would be the ability to do in-place bitwise |, &, ~.
There are, even in mini::large_bitset, three different overloads for bitwise operators (except for ~) that are performing inplace, if the inserted range is already in the container. As a useful addition I think about adding operators o= to combine whole (unit-) bitsets at a given offset interval_bitset& interval_bitset::operator o= (const std::pair<domain_type, bitset_type>& bits_at);
Advancing an iterator also needs to have constant-time complexity for me, ...
that would be no problem (apart from the reluctance problem ;)
... and the bitwise operators should probably be no greater than n log(n).
currently O(log(n)) for bits (elements) O(log(n)) for unit bitsets O(n) for intervals of elements (interval defined bitset) O(m log(n+m)) for interval_bitsets of iterative_size n and m Thank you for all your comments and suggestions! Cheers Joachim