optimizing use of dynamic_bitset

Hi Folks, Having gotten a big chunk of code to work, I'm now spending a few days making it fast. So far, I've removed enough overhead to drop our runtime by a factor of 4, and a big part of that was replacing some other data structures with dynamic_bitset. Profiling data now indicates that one of the more expensive operations is spending most of its time calling dynamic_bitset<>::test a bajillion times. My question is this: is dynamic_bitset::test generally thought to be the fastest way to access a specific bit in a dynamic_bitset? Are there other ways to determine if bit n is set? (I assume that anything using dynamic_bitset::operator[] will be slower, since it has to set up a proxy, but have not tested this.) Any thoughts are appreciated. Thanks! -- Dave Steffen, Ph.D. Disobey this command! Software Engineer IV - Douglas Hofstadter Numerica Corporation dg@steffen a@t numerica d@ot us (remove @'s to email me)

On Thu, Jun 14, 2007 at 10:36:18AM -0600, Dave Steffen wrote:
Profiling data now indicates that one of the more expensive operations is spending most of its time calling dynamic_bitset<>::test a bajillion times. My question is this: is dynamic_bitset::test generally thought to be the fastest way to access a specific bit in a dynamic_bitset? Are there other ways to determine if bit n is set?
In boost/dynamic_bitset/dynamic_bitset.hpp I find the following definition template <typename Block, typename Allocator> bool dynamic_bitset<Block, Allocator>::m_unchecked_test(size_type pos) const { return (m_bits[block_index(pos)] & bit_mask(pos)) != 0; } This can be further optimized by using processor-specific instructions (inline asm); eg. x86 CPUs have the BT (bit test) instruction which tests a particular bit. The nice thing about it is that it works over arbitrarily long bit strings in memory. Thus, a memory reference, AND and runtime computation of bitmask (at least 3 instructions) can be boiled down to a single instruction. I doubt that compiler optimizers are smart enough to recognize the above pattern and convert it automatically to CPU-specific code. You can always check by disassembling the generated code.
participants (2)
-
Dave Steffen
-
Zeljko Vrba