set() of dynamic_bitset is too slow.

Dear list, I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset. for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time? Many thanks in advance. Bo

Bo Peng wrote:
Dear list,
I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset.
for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time?
Hi,
you code looks very suspect. Are you aware it actually yields m_N bits
rather than m_N blocks? Also, I'll assume the dynamic_bitset is sized in
advance. Sticking to the problem, there are many alternative codings; in
particular you could use the iterator-based version of append, after
creating a little iterator class for the random integers (please, ignore
from_block_range() :-)). If that's not suitable for you might try this:
typedef UINT block_type;
boost::dynamic_bitset

typedef UINT block_type; boost::dynamic_bitset
succ; /* no resizing needed */ // (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { block_type b = /*random...*/; succ.append(b); }
Let us know if it worked for you.
I should have read the manuals more closely! I did not notice that
append is appending a block of bits, while push_back appends only one
bit.
Your code works fine, but I worry about the size of block so I use
dynamic_bitset<>::block_type and bits_per_block. The code is now more
complicated but runs *much* faster!
succ.clear();
size_t numblock = m_N/BitSet::bits_per_block;
vectorBitSet::block_type first_blocks(numblock);
for(size_t i=0; i

succ.clear(); size_t numblock = m_N/BitSet::bits_per_block; vectorBitSet::block_type first_blocks(numblock); for(size_t i=0; i
> i) & 0x1) succ.set(numblock*BitSet::bits_per_block+i); }
When I think again about this, the special treatment of the last block
is unnecessary, and because succ is already allocated, the following
code is clearer:
vectorBitSet::block_type blocks(succ.num_blocks());
for(size_t i=0; i

On Sun, 4 Jun 2006 00:37:28 -0500, "Bo Peng"
succ.clear(); size_t numblock = m_N/BitSet::bits_per_block; vectorBitSet::block_type first_blocks(numblock); for(size_t i=0; i
> i) & 0x1) succ.set(numblock*BitSet::bits_per_block+i); } When I think again about this, the special treatment of the last block is unnecessary, and because succ is already allocated, the following code is clearer:
vectorBitSet::block_type blocks(succ.num_blocks()); for(size_t i=0; i
I guess more efficiency can only be achieved through block_begin() and block_end() functions so that I can write to the blocks directly. Is this possible?
No, that is intentionally not possible. But you can construct the
dynamic_bitset object directly from a suitable iterator range (no need
to use an intermediate vector). Just to illustrate the point, this is
an example iterator based on boost::iterator_facade. Use it with
caution as I wrote it off the top of my head without verifying if it
really satisfies all iterator requirements:
#include <cstddef>
#include "boost/iterator/iterator_facade.hpp"
#include "boost/dynamic_bitset.hpp"
template<typename v>
class rng_iterator_impl : public boost::iterator_facade<
rng_iterator_impl<v>,
v,
boost::random_access_traversal_tag
>
{
std::size_t count;
mutable v val;
public:
rng_iterator_impl(): count(0), val(0) {}
explicit rng_iterator_impl(v c): count(c), val(0) { }
private:
friend class boost::iterator_core_access;
value_type& dereference() const { return val=generate(); }
bool equal(rng_iterator_impl const& rhs) const {
return count == rhs.count;
}
void increment() { ++count; }
void decrement() { --count; }
void advance(difference_type n) { count+=n; }
difference_type distance_to(rng_iterator_impl const & r) const {
return r.count - count;
}
v generate() const { /* call you generating function here */ }
};
typedef boost::dynamic_bitset<> my_bitset_type;
typedef rng_iterator_impl

No, that is intentionally not possible. But you can construct the dynamic_bitset object directly from a suitable iterator range (no need to use an intermediate vector). Just to illustrate the point, this is an example iterator based on boost::iterator_facade.
I think I get the idea of this piece of code (after reading the
iterator document), but it does not really fit my needs. If you are
interested, I can explain to you the whole scenario:
In a large simulation program, I need to simulate successes at
different rates, for many times. For example, I may need to simulate a
sequence of 0/1 with probability [0.00001, 0.5, 0.003, 0.001] for
10000 times. This 10000-times process may be repeated as well. Doing
randUniform(0,1) < p 40000 times is very slow.
I then build a class that wraps around a
vector

On Wed, 7 Jun 2006 14:36:15 -0500, "Bo Peng"
No, that is intentionally not possible. But you can construct the dynamic_bitset object directly from a suitable iterator range (no need to use an intermediate vector). Just to illustrate the point, this is an example iterator based on boost::iterator_facade.
I think I get the idea of this piece of code (after reading the iterator document), but it does not really fit my needs. If you are interested, I can explain to you the whole scenario:
Sure, I'm here to help, in as far as I can.
In a large simulation program, I need to simulate successes at different rates, for many times. For example, I may need to simulate a sequence of 0/1 with probability [0.00001, 0.5, 0.003, 0.001] for 10000 times. This 10000-times process may be repeated as well. Doing randUniform(0,1) < p 40000 times is very slow.
I then build a class that wraps around a vector
, which, for example, has 4 dynamic_bitset of length 10000. Since 0.5 and very small p are comon, I use special algorithms to handle them and store the results in this bit matrix. The results will be retrieved as size 4 vectors. The bit matrix has fixed size, and will be refreshed when needed. Efficiency needs to be improved in two parts:
1. set random bits when p=0.5. This has been largely solved but I need to use a temporary vector to stored generated random number, and transfer them into the dynamic_bitset. Your iterator allows me to create a bitvector, but I basically need to write to an existing vector. A built-in block iterator would be helpful in this case.
You if can leave with the chance that from_block_range could be removed in later version of the library then you can use that one. However, keep in mind that a) it doesn't resize the bitset: it's your responsibility to ensure that you won't copy "too many blocks" into it b) it assumes the bitset size is a multiple of the block_width; for instance, if on your platform block_width = 32, the bitset will actually have (1 + [10000/32]) * 32 = 313 * 32 = 10016 bits. To be safe (and avoid library asserts) you should first size the bitset to 10016, then resize it to 10000 *after* calling from_block_range. A potentially slower alternative, with the same caveats about the actual bitset size, but with no need for an intermediate vector in the client code, would be: std::swap(succ_matrix[i], (your_bitset_type(it, it + 313))); // shrink succ_matrix[i] to 10000 elements Differently from the from_block_range solution, this won't save you from unnecessary dynamic allocations. Furthermore, due to an oversight on our part, it's unlikely that the correct constructor will be picked up for my_bitset_type(it, it + 313), though everything might work, for instance, with VC 7.1. This will be fixed as soon as possible.
2. retrieve succ from each column of this matrix. Curently, I am doing something like
vector<int> succ(4); .... for(size_t i=0; i<4; ++i) succ[i] = succ_matrix[i][cur]; // process succ cur ++ // get another row
I'm confused by this. Does succ_matrix[i][cur] yields a bool or an int? And in the latter case, what value does it actually return?
The [cur] operation is slow since its location in dynamic_bitset needs to be calculated each time. It would be better if I can do something like:
vector<int> succ(4); for(size_t i=0; i<4; ++i) succ[i] = *(++succ_iterator[i]);
where the location is stored in iterators and need not to be re-calculated.
So, as you can see, I basically need two levels (bit and block) of iterators to access existing bitsets.
The reason why dynamic_bitset does not expose block-level iterators is that I see blocks as an implementation detail. I'm not even convinced there should be a template parameter for them. The reason why it doesn't expose bit-level iterators is that they wouldn't integrate with the standard library anyway (see the section about why dynamic_bitset is not a container, in the docs). But given that Jeremy's new iterator categories have been approved for TR1 I'm going to implement them. *Nonetheless* incrementing/decrementing iterators still requires some computation. Maybe you are thinking that such iterators could be implemented by storing a pointer or reference to a bitset and an index, but would not be conforming. If you reply to my questions above I can try giving further help. Cheers, --Gennaro.

However, keep in mind that a) it doesn't resize the bitset: it's your responsibility to ensure that you won't copy "too many blocks" into it
Now, I am writing num_blocks()-1 blocks, and treat the last block separately. I think I can just use bit sizes that are multiples of block size, although the last few bits will not be used. In this way, the algorithm will be cleaner and faster, and boost/dynamic_bitset will not complain.
2. retrieve succ from each column of this matrix. Curently, I am doing something like
vector<int> succ(4); .... for(size_t i=0; i<4; ++i) succ[i] = succ_matrix[i][cur]; // process succ cur ++ // get another row
I'm confused by this. Does succ_matrix[i][cur] yields a bool or an int? And in the latter case, what value does it actually return?
I was using dynamic_bitset to get results (succ) but the performance was bad. vector<int> is much faster. I can afford to use int since succ is usually small.
The reason why dynamic_bitset does not expose block-level iterators is that I see blocks as an implementation detail. I'm not even convinced there should be a template parameter for them.
I have checked my random number generator. It will produce full range of integer values (0 - 2^32-1). As a result, I use unsigned long as the template parameter of dynamic_bitset. At least for my usage, this template parameter is useful.
*Nonetheless* incrementing/decrementing iterators still requires some computation. Maybe you are thinking that such iterators could be implemented by storing a pointer or reference to a bitset and an index, but would not be conforming.
You mean *(++ptr) will be as slow as succ[++idx]? I would be really disappointed. Bo

On Fri, 2 Jun 2006 17:03:11 -0500, "Bo Peng"
Dear list,
I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset.
for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time?
Hi,
you code looks suspect to me. Are you aware it actually yields m_N
bits rather than m_N blocks? Also, I'll assume the dynamic_bitset is
sized in advance. Sticking to the problem, there are many alternative
codings; in particular you could use the iterator-based version of
append, after creating a little iterator class for the random integers
(please, ignore from_block_range() :-)). If that's not suitable for
you might try this:
typedef UINT block_type;
boost::dynamic_bitset

On Sat, 03 Jun 2006 16:03:57 +0200, Gennaro Prota
[...] you might try this:
typedef UINT block_type; boost::dynamic_bitset
succ; /* no resizing needed */ // (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { [...]
Of course, it should be "block", not "block + 1", here and in the other reply I gave at first. Sorry for the oversight. --Gennaro.
participants (2)
-
Bo Peng
-
Gennaro Prota