Is someone interested in container CompressedVector, a vector-like storage for integer types

Hi! Assume you want an interface as for vector<unsigned int>, but know that elements have only N significant bits, say 17. In this case the container will save 46% memory compared to vector<unsigned int>. CompressedVector internally stores elements as a dynamic bitset (vector<size_t>) and uses bitshift ops to put/get significant bits into one or two size_t elements. It's similar to vector<bool> and dynamic_bitset, and has the same drawback, it's not a proper STL container. Anyway, the container is simple, efficient and quite fast. It will work best for very long arrays of indexes. Since its storage is continuous, it can also be used to speed up IO (load/store to disk, network transfers) acting as a very fast compression.

Alexander Bulovyatov wrote:
Hi!
Assume you want an interface as for vector<unsigned int>, but know that elements have only N significant bits, say 17. In this case the container will save 46% memory compared to vector<unsigned int>.
CompressedVector internally stores elements as a dynamic bitset (vector<size_t>) and uses bitshift ops to put/get significant bits into one or two size_t elements. It's similar to vector<bool> and dynamic_bitset, and has the same drawback, it's not a proper STL container.
Anyway, the container is simple, efficient and quite fast. It will work best for very long arrays of indexes. Since its storage is continuous, it can also be used to speed up IO (load/store to disk, network transfers) acting as a very fast compression.
Hi, Alejandro Cabrera needed something like that for counting bloom filters (GSOC project). He included a private implementation taking in account his needs. Do you think that you could provide an implementation that is optimized for (2, 4 bits)? Best, Vicente -- View this message in context: http://boost.2283326.n4.nabble.com/Is-someone-interested-in-container-Compre... Sent from the Boost - Dev mailing list archive at Nabble.com.

Well, I don't see any special optimization for 2, 4 bits. They will work faster than, say 7, because a compiler replaces mul / div by power of 2 with << >> ops. -- View this message in context: http://boost.2283326.n4.nabble.com/Is-someone-interested-in-container-Compre... Sent from the Boost - Dev mailing list archive at Nabble.com.

Alexander Bulovyatov wrote:
Well, I don't see any special optimization for 2, 4 bits. They will work faster than, say 7, because a compiler replaces mul / div by power of 2 with << >> ops.
Hin In this cases you are sure the bits will be always on the same word, so there is no need to take care of managing multiple words for theese cases. Best, Vicente -- View this message in context: http://boost.2283326.n4.nabble.com/Is-someone-interested-in-container-Compre... Sent from the Boost - Dev mailing list archive at Nabble.com.

You're right. Although, the second word isn't always needed, so we check a condition before considering it. OK, an additional condition N != pow of 2 can be added. It's static, so it'll be optimized by a compiler, then we save one IF. In my opinion, the most tricky part is to provide efficient iterators and packed ops as insert(interval), push_back(interval). -- View this message in context: http://boost.2283326.n4.nabble.com/Is-someone-interested-in-container-Compre... Sent from the Boost - Dev mailing list archive at Nabble.com.
participants (2)
-
Alexander Bulovyatov
-
Vicente Botet