
Andreas Klein wrote:
Hello.
Sorry for the multiple posts; hopefully I got it right now.
-----------------------------------------------------
Dear Boost developers,
I checked the implementation of boost::dynamic_bitset::count
and found that it uses a table look-up.
This is not optimal.
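For reference, a byte-wise table look-up for a single 32-bit word amounts to roughly the following (only an illustrative sketch, not the actual dynamic_bitset source):

    #include <array>
    #include <cstdint>

    // Look-up table: byte_popcount[b] holds the number of set bits in byte b.
    static const std::array<unsigned char, 256> byte_popcount = [] {
        std::array<unsigned char, 256> t{};
        for (unsigned i = 1; i < 256; ++i)
            t[i] = static_cast<unsigned char>((i & 1) + t[i >> 1]);
        return t;
    }();

    // Table look-up population count of one 32-bit word: one access per byte.
    inline unsigned popcount32_table(std::uint32_t x)
    {
        return byte_popcount[ x        & 0xFFu]
             + byte_popcount[(x >> 8)  & 0xFFu]
             + byte_popcount[(x >> 16) & 0xFFu]
             + byte_popcount[(x >> 24) & 0xFFu];
    }

The table costs 256 bytes and one memory access per byte, which is what a divide-and-conquer count avoids.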
First, on 64-bit machines divide-and-conquer algorithms are faster than a table look-up and even save memory.
Second, if the bitset contains several words, there are population-count algorithms, for both 32-bit and 64-bit machines, that are faster than the simple loop over all words that Boost uses.
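To make the comparison concrete, here is a sketch of a divide-and-conquer count of one 64-bit word, together with the plain word-by-word loop that all the percentages below are measured against (again only an illustration, not our actual code):

    #include <cstddef>
    #include <cstdint>

    // Divide-and-conquer ("SWAR") population count of one 64-bit word:
    // add neighbouring bits into 2-bit fields, then 4-bit fields, then
    // sum the eight byte counts with a single multiplication.  No table.
    inline unsigned popcount64(std::uint64_t x)
    {
        x -= (x >> 1) & 0x5555555555555555ULL;
        x  = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
        x  = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
        return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
    }

    // The "simple loop" baseline: count every word separately and add up.
    std::size_t count_simple(const std::uint64_t* v, std::size_t n)
    {
        std::size_t total = 0;
        for (std::size_t i = 0; i < n; ++i)
            total += popcount64(v[i]);
        return total;
    }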
Together with Y. Edel I developed a new version, which saves 60% compared with the simple algorithm. We have put the code online (http://cage.ugent.be/~klein/popc.html). A disadvantage of our implementation is that we rely on aggressive compiler optimization (-O3).
If you are looking for a compromise between speed and code complexity, the first iteration of the Harley-Seal method (25% faster than the simple loop) would be a candidate.
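Roughly, the idea of that first iteration is the following (an illustrative sketch assuming 64-bit words, not the exact code from our web page):

    #include <cstddef>
    #include <cstdint>

    // Single-word population count (divide-and-conquer, as sketched above).
    inline unsigned popcount64(std::uint64_t x)
    {
        x -= (x >> 1) & 0x5555555555555555ULL;
        x  = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
        x  = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
        return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
    }

    // Carry-save adder:
    //   popcount(a) + popcount(b) + popcount(c)
    //     == popcount(sum) + 2 * popcount(carry)
    inline void csa(std::uint64_t& carry, std::uint64_t& sum,
                    std::uint64_t a, std::uint64_t b, std::uint64_t c)
    {
        std::uint64_t u = a ^ b;
        carry = (a & b) | (u & c);
        sum   = u ^ c;
    }

    // First Harley-Seal iteration: feed two new words plus the running
    // "ones" accumulator through one CSA, so only one full popcount
    // (of the carries) is needed per pair of words instead of two.
    std::size_t count_harley_seal_1(const std::uint64_t* v, std::size_t n)
    {
        std::size_t total = 0;
        std::uint64_t ones = 0;
        std::size_t i = 0;
        for (; i + 2 <= n; i += 2) {
            std::uint64_t twos;
            csa(twos, ones, ones, v[i], v[i + 1]);
            total += 2 * popcount64(twos);   // bits in `twos` have weight 2
        }
        total += popcount64(ones);
        for (; i < n; ++i)                   // odd trailing word: plain count
            total += popcount64(v[i]);
        return total;
    }

Each further iteration chains another carry-save adder level (fours, eights, ...), which is where the larger savings come from.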
You can find this and other population-count algorithms on our homepage, too.
If there is interest, I can add one of the advanced algorithms to Boost.
Best wishes,
Andreas Klein
How do these compare with the bit count algorithms at:
http://graphics.stanford.edu/~seander/bithacks.html

Jeff

Jeff Flinn wrote:
How do these compare with the bit count algorithms at:
These are simple algorithms that work word-wise. In our test program we have many variations of such algorithms. In short: for 32-bit machines, table look-up is the best algorithm for a single word, and for 64-bit machines one should use a divide-and-conquer method like some of the ones on that page. But if the bitset is much larger than one word, these algorithms are not optimal. You can save time by using an algorithm that works on several words in parallel.

With our method we save about 60% of the time for bitsets with more than 250 words. The Harley-Seal method saves 25%-50% of the time, depending on the iteration, and needs only a few words (about 10-30). All algorithms have a fallback, so they do not waste time if they see only one word.

Best wishes,
Andreas Klein
participants (2)
- Andreas Klein
- Jeff Flinn