set() of dynamic_bitset is too slow.

newer
Serialization of valarray, Problem...

older
Help to BUILD AND INSTALL Boost...

Bo Peng

2 Jun 2006 2 Jun '06

10:03 p.m.

Dear list, I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset. for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time? Many thanks in advance. Bo

Show replies by date

Gennaro Prota

3 Jun 3 Jun

12:59 p.m.

Bo Peng wrote:

...

Dear list,

I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset.

for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time?

Hi, you code looks very suspect. Are you aware it actually yields m_N bits rather than m_N blocks? Also, I'll assume the dynamic_bitset is sized in advance. Sticking to the problem, there are many alternative codings; in particular you could use the iterator-based version of append, after creating a little iterator class for the random integers (please, ignore from_block_range() :-)). If that's not suitable for you might try this: typedef UINT block_type; boost::dynamic_bitset<block_type> succ; /* no resizing needed */ // (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { block_type b = /*random...*/; succ.append(b); } Let us know if it worked for you. Cheers, Gennaro.

Bo Peng

4 Jun 4 Jun

4:37 a.m.

...

typedef UINT block_type; boost::dynamic_bitset<block_type> succ; /* no resizing needed */

// (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { block_type b = /*random...*/; succ.append(b); }

Let us know if it worked for you.

I should have read the manuals more closely! I did not notice that append is appending a block of bits, while push_back appends only one bit. Your code works fine, but I worry about the size of block so I use dynamic_bitset<>::block_type and bits_per_block. The code is now more complicated but runs *much* faster! succ.clear(); size_t numblock = m_N/BitSet::bits_per_block; vector<BitSet::block_type> first_blocks(numblock); for(size_t i=0; i<numblock; ++i) first_blocks[i] = rng().randInt(~BitSet::block_type(0)); succ.append(first_blocks.begin(), first_blocks.end()); // last block BitSet::block_type last_block = rng().randInt(~BitSet::block_type(0)); for(size_t i=0; i < m_N - numblock*BitSet::bits_per_block; ++i) { if((last_block >> i) & 0x1) succ.set(numblock*BitSet::bits_per_block+i); } Of course BitSet is a typedef of dynamic_bitset<>. Many thanks for your help. Bo

Bo Peng

5:37 a.m.

...

succ.clear(); size_t numblock = m_N/BitSet::bits_per_block; vector<BitSet::block_type> first_blocks(numblock); for(size_t i=0; i<numblock; ++i) first_blocks[i] = rng().randInt(~BitSet::block_type(0)); succ.append(first_blocks.begin(), first_blocks.end()); // last block BitSet::block_type last_block = rng().randInt(~BitSet::block_type(0)); for(size_t i=0; i < m_N - numblock*BitSet::bits_per_block; ++i) { if((last_block >> i) & 0x1) succ.set(numblock*BitSet::bits_per_block+i); }

When I think again about this, the special treatment of the last block is unnecessary, and because succ is already allocated, the following code is clearer: vector<BitSet::block_type> blocks(succ.num_blocks()); for(size_t i=0; i<blocks.size(); ++i) blocks[i] = rng().randInt(~BitSet::block_type(0)); from_block_range(blocks.begin(), blocks.end(), succ); I guess more efficiency can only be achieved through block_begin() and block_end() functions so that I can write to the blocks directly. Is this possible? Bo

Gennaro Prota

5:42 p.m.

On Sun, 4 Jun 2006 00:37:28 -0500, "Bo Peng" <ben.bob@gmail.com> wrote:

...

...
succ.clear(); size_t numblock = m_N/BitSet::bits_per_block; vector<BitSet::block_type> first_blocks(numblock); for(size_t i=0; i<numblock; ++i) first_blocks[i] = rng().randInt(~BitSet::block_type(0)); succ.append(first_blocks.begin(), first_blocks.end()); // last block BitSet::block_type last_block = rng().randInt(~BitSet::block_type(0)); for(size_t i=0; i < m_N - numblock*BitSet::bits_per_block; ++i) { if((last_block >> i) & 0x1) succ.set(numblock*BitSet::bits_per_block+i); }

When I think again about this, the special treatment of the last block is unnecessary, and because succ is already allocated, the following code is clearer:

vector<BitSet::block_type> blocks(succ.num_blocks()); for(size_t i=0; i<blocks.size(); ++i) blocks[i] = rng().randInt(~BitSet::block_type(0)); from_block_range(blocks.begin(), blocks.end(), succ);

I guess more efficiency can only be achieved through block_begin() and block_end() functions so that I can write to the blocks directly. Is this possible?

No, that is intentionally not possible. But you can construct the dynamic_bitset object directly from a suitable iterator range (no need to use an intermediate vector). Just to illustrate the point, this is an example iterator based on boost::iterator_facade. Use it with caution as I wrote it off the top of my head without verifying if it really satisfies all iterator requirements: #include <cstddef> #include "boost/iterator/iterator_facade.hpp" #include "boost/dynamic_bitset.hpp" template<typename v> class rng_iterator_impl : public boost::iterator_facade< rng_iterator_impl<v>, v, boost::random_access_traversal_tag > { std::size_t count; mutable v val; public: rng_iterator_impl(): count(0), val(0) {} explicit rng_iterator_impl(v c): count(c), val(0) { } private: friend class boost::iterator_core_access; value_type& dereference() const { return val=generate(); } bool equal(rng_iterator_impl const& rhs) const { return count == rhs.count; } void increment() { ++count; } void decrement() { --count; } void advance(difference_type n) { count+=n; } difference_type distance_to(rng_iterator_impl const & r) const { return r.count - count; } v generate() const { /* call you generating function here */ } }; typedef boost::dynamic_bitset<> my_bitset_type; typedef rng_iterator_impl<my_bitset_type::block_type> rng_iterator; Now, you should be able to do: int main() { const int how_many = 8; rng_iterator it; my_bitset_type b(it, it + how_many); } If the last line in main() gives you problem please replace it with: my_bitset_type b; b.append(it, it + how_many); Cheers. --Gennaro.

Bo Peng

7 Jun 7 Jun

7:36 p.m.

...

No, that is intentionally not possible. But you can construct the dynamic_bitset object directly from a suitable iterator range (no need to use an intermediate vector). Just to illustrate the point, this is an example iterator based on boost::iterator_facade.

I think I get the idea of this piece of code (after reading the iterator document), but it does not really fit my needs. If you are interested, I can explain to you the whole scenario: In a large simulation program, I need to simulate successes at different rates, for many times. For example, I may need to simulate a sequence of 0/1 with probability [0.00001, 0.5, 0.003, 0.001] for 10000 times. This 10000-times process may be repeated as well. Doing randUniform(0,1) < p 40000 times is very slow. I then build a class that wraps around a vector<dynamic_bitset<unsigned long> >, which, for example, has 4 dynamic_bitset of length 10000. Since 0.5 and very small p are comon, I use special algorithms to handle them and store the results in this bit matrix. The results will be retrieved as size 4 vectors. The bit matrix has fixed size, and will be refreshed when needed. Efficiency needs to be improved in two parts: 1. set random bits when p=0.5. This has been largely solved but I need to use a temporary vector to stored generated random number, and transfer them into the dynamic_bitset. Your iterator allows me to create a bitvector, but I basically need to write to an existing vector. A built-in block iterator would be helpful in this case. 2. retrieve succ from each column of this matrix. Curently, I am doing something like vector<int> succ(4); .... for(size_t i=0; i<4; ++i) succ[i] = succ_matrix[i][cur]; // process succ cur ++ // get another row The [cur] operation is slow since its location in dynamic_bitset needs to be calculated each time. It would be better if I can do something like: vector<int> succ(4); for(size_t i=0; i<4; ++i) succ[i] = *(++succ_iterator[i]); where the location is stored in iterators and need not to be re-calculated. So, as you can see, I basically need two levels (bit and block) of iterators to access existing bitsets. Cheers, Bo

Gennaro Prota

8 Jun 8 Jun

5:52 p.m.

On Wed, 7 Jun 2006 14:36:15 -0500, "Bo Peng" <ben.bob@gmail.com> wrote:

...

...
No, that is intentionally not possible. But you can construct the dynamic_bitset object directly from a suitable iterator range (no need to use an intermediate vector). Just to illustrate the point, this is an example iterator based on boost::iterator_facade.

I think I get the idea of this piece of code (after reading the iterator document), but it does not really fit my needs. If you are interested, I can explain to you the whole scenario:

Sure, I'm here to help, in as far as I can.

...

In a large simulation program, I need to simulate successes at different rates, for many times. For example, I may need to simulate a sequence of 0/1 with probability [0.00001, 0.5, 0.003, 0.001] for 10000 times. This 10000-times process may be repeated as well. Doing randUniform(0,1) < p 40000 times is very slow.

I then build a class that wraps around a vector<dynamic_bitset<unsigned long> >, which, for example, has 4 dynamic_bitset of length 10000. Since 0.5 and very small p are comon, I use special algorithms to handle them and store the results in this bit matrix. The results will be retrieved as size 4 vectors. The bit matrix has fixed size, and will be refreshed when needed.

Efficiency needs to be improved in two parts:

1. set random bits when p=0.5. This has been largely solved but I need to use a temporary vector to stored generated random number, and transfer them into the dynamic_bitset. Your iterator allows me to create a bitvector, but I basically need to write to an existing vector. A built-in block iterator would be helpful in this case.

You if can leave with the chance that from_block_range could be removed in later version of the library then you can use that one. However, keep in mind that a) it doesn't resize the bitset: it's your responsibility to ensure that you won't copy "too many blocks" into it b) it assumes the bitset size is a multiple of the block_width; for instance, if on your platform block_width = 32, the bitset will actually have (1 + [10000/32]) * 32 = 313 * 32 = 10016 bits. To be safe (and avoid library asserts) you should first size the bitset to 10016, then resize it to 10000 *after* calling from_block_range. A potentially slower alternative, with the same caveats about the actual bitset size, but with no need for an intermediate vector in the client code, would be: std::swap(succ_matrix[i], (your_bitset_type(it, it + 313))); // shrink succ_matrix[i] to 10000 elements Differently from the from_block_range solution, this won't save you from unnecessary dynamic allocations. Furthermore, due to an oversight on our part, it's unlikely that the correct constructor will be picked up for my_bitset_type(it, it + 313), though everything might work, for instance, with VC 7.1. This will be fixed as soon as possible.

...

2. retrieve succ from each column of this matrix. Curently, I am doing something like

vector<int> succ(4); .... for(size_t i=0; i<4; ++i) succ[i] = succ_matrix[i][cur]; // process succ cur ++ // get another row

I'm confused by this. Does succ_matrix[i][cur] yields a bool or an int? And in the latter case, what value does it actually return?

...

The [cur] operation is slow since its location in dynamic_bitset needs to be calculated each time. It would be better if I can do something like:

vector<int> succ(4); for(size_t i=0; i<4; ++i) succ[i] = *(++succ_iterator[i]);

where the location is stored in iterators and need not to be re-calculated.

So, as you can see, I basically need two levels (bit and block) of iterators to access existing bitsets.

The reason why dynamic_bitset does not expose block-level iterators is that I see blocks as an implementation detail. I'm not even convinced there should be a template parameter for them. The reason why it doesn't expose bit-level iterators is that they wouldn't integrate with the standard library anyway (see the section about why dynamic_bitset is not a container, in the docs). But given that Jeremy's new iterator categories have been approved for TR1 I'm going to implement them. *Nonetheless* incrementing/decrementing iterators still requires some computation. Maybe you are thinking that such iterators could be implemented by storing a pointer or reference to a bitset and an index, but would not be conforming. If you reply to my questions above I can try giving further help. Cheers, --Gennaro.

Bo Peng

9 Jun 9 Jun

3:09 a.m.

...

However, keep in mind that a) it doesn't resize the bitset: it's your responsibility to ensure that you won't copy "too many blocks" into it

Now, I am writing num_blocks()-1 blocks, and treat the last block separately. I think I can just use bit sizes that are multiples of block size, although the last few bits will not be used. In this way, the algorithm will be cleaner and faster, and boost/dynamic_bitset will not complain.

...

...
2. retrieve succ from each column of this matrix. Curently, I am doing something like

vector<int> succ(4); .... for(size_t i=0; i<4; ++i) succ[i] = succ_matrix[i][cur]; // process succ cur ++ // get another row

I'm confused by this. Does succ_matrix[i][cur] yields a bool or an int? And in the latter case, what value does it actually return?

I was using dynamic_bitset to get results (succ) but the performance was bad. vector<int> is much faster. I can afford to use int since succ is usually small.

...

The reason why dynamic_bitset does not expose block-level iterators is that I see blocks as an implementation detail. I'm not even convinced there should be a template parameter for them.

I have checked my random number generator. It will produce full range of integer values (0 - 2^32-1). As a result, I use unsigned long as the template parameter of dynamic_bitset. At least for my usage, this template parameter is useful.

...

*Nonetheless* incrementing/decrementing iterators still requires some computation. Maybe you are thinking that such iterators could be implemented by storing a pointer or reference to a bitset and an index, but would not be conforming.

You mean *(++ptr) will be as slow as succ[++idx]? I would be really disappointed. Bo

Gennaro Prota

3 Jun 3 Jun

2:03 p.m.

On Fri, 2 Jun 2006 17:03:11 -0500, "Bo Peng" <ben.bob@gmail.com> wrote:

...

Dear list,

I am using dynamic_bitset to store a sequence of random 0/1. The program generates repeatedly 32bit random numbers and assign the result to a dynamic_bitset.

for(UINT i=0; i < m_N; ++i) { if(bit == block) // generate a block { ri = rng().randInt(m_max); bit = 0; } if((ri >> bit) & 0x1) succ.set(i); bit++; } When I profile this program, I surprisingly find out that succ.set(i) uses >50% of the execution time. Is there anyway I can assign block by block to dynamic_bitset, or assign sequentially so the position of i does not have to be calculated each time?

Hi, you code looks suspect to me. Are you aware it actually yields m_N bits rather than m_N blocks? Also, I'll assume the dynamic_bitset is sized in advance. Sticking to the problem, there are many alternative codings; in particular you could use the iterator-based version of append, after creating a little iterator class for the random integers (please, ignore from_block_range() :-)). If that's not suitable for you might try this: typedef UINT block_type; boost::dynamic_bitset<block_type> succ; /* no resizing needed */ // (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { block_type b = /*random...*/; succ.append(b); } Please, let us know if this helped. Cheers, Gennaro. -- Genny.

Gennaro Prota

4 Jun 4 Jun

5:47 p.m.

On Sat, 03 Jun 2006 16:03:57 +0200, Gennaro Prota <gennaro_prota@yahoo.com> wrote:

...

[...] you might try this:

typedef UINT block_type; boost::dynamic_bitset<block_type> succ; /* no resizing needed */

// (assuming m_N is a multiple of the UINT width, i.e. 32) for(UINT i=1; i <= m_N / (block + 1); ++i) { [...]

Of course, it should be "block", not "block + 1", here and in the other reply I gave at first. Sorry for the oversight. --Gennaro.

6678

Age (days ago)

6685

Last active (days ago)

List overview

Download

9 comments

2 participants

participants (2)

Bo Peng
Gennaro Prota