Population count procedure in dynamic_set
Hello! The library dynamic_bitset contains a slow population count procedure: https://github.com/boostorg/dynamic_bitset/blob/340822f979371973fe4a84363623... There are several better, faster and (more or less) portable algorithms. Daniel Lemire, Nathan Kurz and I work on a library which provides fastest possible implementations for CPU having AVX2 instruction set, you can check out its current state: https://github.com/CountOnes/hamming_weight. I also did an extensive comparison of different algorithms on different machines: https://github.com/WojciechMula/sse-popcount, one example https://github.com/WojciechMula/sse-popcount/blob/master/results/skylake/sky... As I mentioned these algorithms are different, the fastest use SIMD instructions, other depends on a CPU instruction. However, some approaches are portable, machine-independent, like this https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-bit-parallel... It's possible to provide better solution at small cost, a lot of work has already been done. Would you be interested in including such boosted procedures as part of dynamic_bitset (or a new library)? Popcount itself is used in number of fields: I've found it in DNA matching (Jaccard index), computer vision, databases, and Daniel recently reported application in machine learning. best regards Wojciech
On 09/28/2016 03:12 AM, Wojciech Muła wrote:
Hello!
The library dynamic_bitset contains a slow population count procedure: https://github.com/boostorg/dynamic_bitset/blob/340822f979371973fe4a84363623...
There are several better, faster and (more or less) portable algorithms. Daniel Lemire, Nathan Kurz and I work on a library which provides fastest possible implementations for CPU having AVX2 instruction set, you can check out its current state: https://github.com/CountOnes/hamming_weight. I also did an extensive comparison of different algorithms on different machines: https://github.com/WojciechMula/sse-popcount, one example https://github.com/WojciechMula/sse-popcount/blob/master/results/skylake/sky...
As I mentioned these algorithms are different, the fastest use SIMD instructions, other depends on a CPU instruction. However, some approaches are portable, machine-independent, like this https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-bit-parallel...
It's possible to provide better solution at small cost, a lot of work has already been done. Would you be interested in including such boosted procedures as part of dynamic_bitset (or a new library)?
Yes. If there is a faster method, which is portable, Boost is certainly interested in it. If you provide a PR I will certainly look at it.
Popcount itself is used in number of fields: I've found it in DNA matching (Jaccard index), computer vision, databases, and Daniel recently reported application in machine learning.
best regards Wojciech
participants (2)
-
Edward Diener
-
Wojciech Muła