[dynamic_bitset] Comparison operators without assertions
Hi all.

Looking through the boost::dynamic_bitset code, I found an interesting inconsistency between its operator== and operator<. Namely, the first works well with bitsets of different sizes, while the second triggers an assertion:

boost::dynamic_bitset<> a(std::string("111")), b(std::string("1110"));
a == b; // false
a < b;  // Assertion `a.size() == b.size()' failed.

This means that we cannot reliably use dynamic_bitset within ordered containers like std::set or std::map.

Since the standard library has distinct concepts of equality and equivalence (see [1]), it's fine to have different comparison behaviors in dynamic_bitset. However, as far as I can judge, equivalence is a broader concept than equality, so we shouldn't restrict it to bitsets of the same size only.

So, the questions:

1. May I remove that assertion? (It won't be a breaking change, after all.)
2. How about doing the same thing for the bitwise operators, too? Instead of triggering an assertion if the sizes don't match, it may be worth generalizing the operators' definitions. I can see several possible schemes, but not all of them are consistent enough.
3. Shouldn't BOOST_ASSERT be used instead of the one from assert.h?

――― Pavel K.

[1]: http://en.cppreference.com/w/cpp/concept/EqualityComparable
On 4/27/2017 1:02 PM, Pavel Kretov via Boost wrote:
Hi all.
Looking through the boost::dynamic_bitset code, I found an interesting inconsistency between its operator== and operator<. Namely, the first works well with bitsets of different sizes, while the second triggers an assertion:
boost::dynamic_bitset<> a(std::string("111")), b(std::string("1110"));
a == b; // false
a < b;  // Assertion `a.size() == b.size()' failed.
This means that we cannot reliably use dynamic bitset within ordered containers like std::set or std::map.
Since the standard library has distinct concepts of equality and equivalence (see [1]), it's fine to have different comparison behaviors in dynamic_bitset. However, as far as I can judge, equivalence is a broader concept than equality, so we shouldn't restrict it to bitsets of the same size only.
So, the questions:
1. May I remove that assertion? (It won't be a breaking change, after all).
2. How about doing the same thing for the bitwise operators, too? Instead of triggering an assertion if the sizes don't match, it may be worth generalizing the operators' definitions. I can see several possible schemes, but not all of them are consistent enough.
Are you suggesting that the <, >, <=, and >= operators always return 'false' if the sizes do not match, rather than assert?
3. Shouldn't BOOST_ASSERT be used instead of the one from assert.h?
Possibly. I think the intention was always to assert, whereas BOOST_ASSERT lets the assertion be overridden. I do agree with you that the current way dynamic_bitset works with the assertion should have been documented.
――― Pavel K.
[1]: http://en.cppreference.com/w/cpp/concept/EqualityComparable
Hi Edward and Matt! Thank you for your replies. Let me comment on them both in one (maybe overly long) message.

Edward Diener wrote:
Are you suggesting that the <, >, <=, and >= operators always return 'false' if the sizes do not match, rather than assert?
Definitely not. I'm suggesting we discuss the rules for meaningful and useful comparison of dynamic bitsets of different lengths. Returning 'false' whenever the sizes differ would not only make 'operator <' inappropriate for std::set or std::map, but would also force many STL algorithms to treat such objects as equivalent, since equivalence is defined in the standard library as:

equiv(a, b) = !(a<b) && !(b<a)

We need a more elaborate solution.
Shouldn't BOOST_ASSERT be used instead of the one from assert.h?
Possibly. I think the intention was always to assert, whereas BOOST_ASSERT lets the assertion be overridden. I do agree with you that the current way dynamic_bitset works with the assertion should have been documented.
I think these assertions should be eliminated, not documented. We can get rid of them without breaking even a bit of backward compatibility. Would you be happy if innocent-looking code like

std::string("c++") < "boost"

called std::terminate just because the class authors didn't want to handle all possible inputs for some reason? Less-than comparison must either be completely and safely defined, or not defined at all. That's how the math works.

If we cannot reliably define a total ordering, we can probably return std::optional<bool> or even boost::tribool from 'operator <', or, at the very worst, add an exception to the throw()-list. Terminating the whole application immediately is *the worst possible* option, since it is not the expected behavior for a comparison operator.

Matt Calabrese wrote:
I agree that such comparisons should be valid. We should represent a total order, just like standard containers do (assuming their elements do, of course). I'm actually rather surprised that this isn't already supported.
At the moment, I can see two completely different, mutually exclusive total-order generalizations:

1. Mimic lexicographic string comparison: compare a[0] with b[0]; if they're equal, compare a[1] with b[1], etc. This is also the way std::vector<T> works if T is less-than comparable (including the special case of std::vector<bool>).

2. Mimic little-endian (or big-endian) numeric comparison:

-bit-:  (9)    (0)
    a:  0000001101
    b:     0001101
    c:    00010000

a<b: false
b<a: false
equiv(a,b): true
a<c: true
b<c: true

I strongly prefer the former approach, since the latter makes equivalence (!(a<b) && !(b<a)) badly different from equality (a==b). Equality already works in the lexicographic way; there is little reason to introduce an incompatible ordering scheme for 'operator <', unless we're going to generalize the bitwise operators to mimic numbers, too.

It must also be noted that std::bitset avoids defining 'operator <' for some reason. I think they did it in order not to give users false expectations about the ordering. The very name 'bitset' is the source of confusion. I think if we could rename boost::dynamic_bitset to simply boost::bitstring, nobody would have second thoughts about how the class does its comparisons.
I'm not sure, though I agree that it's definitely worth investigating with further discussion on the list and some example use-cases. I intuitively suspect it's a reasonable generalization to act as though implicit 0s were present for the shorter operand, based on an interpretation as a binary value. The side on which to pad should be consistent with the lexicographical ordering. Even this generalization is somewhat debatable though, since extraneous 0s contribute to value in a dynamic bitset.
If we continue thinking about dynamic_bitset as a bitstring (or, more generally, a bitlist), it would be consistent to define the bitwise operators in terms of the 'zip' function commonly found in functional programming languages (some pseudo-Haskell code follows):

(&) :: [Bit] -> [Bit] -> [Bit]
a & b = zipWith (&) a b

Usually, 'zip' combines lists until one of them runs out (see [1] and [2], though). Hence, applying 'operator &' to bitsets of different sizes should create a new bitset with the length of the shorter one. The great feature of this definition is that it naturally obeys De Morgan's laws:

(~a & ~b) == ~(a | b)
(~a | ~b) == ~(a & b)

I also tried different padding schemes and had no luck keeping them consistent with De Morgan's laws. If anyone is interested, I can post my generalization attempts on the list.

With best wishes and hope that my suboptimal language skills won't totally ruin my arguments,

――― Pavel K.

[1]: https://softwareengineering.stackexchange.com/q/274983 (Why does 'zip' ignore the dangling tail of the collection?)
[2]: http://stackoverflow.com/a/8513803 (TL;DR: Boost's zip_iterator doesn't behave like Haskell's zip).
On 5/2/2017 1:47 PM, Pavel Kretov via Boost wrote:
Hi Edward and Matt! Thank you for your replies. Let me comment on them both in one (maybe overly long) message.
Edward Diener wrote:
Are you suggesting that the <, >, <=, and >= operators always return 'false' if the sizes do not match, rather than assert?
Definitely not. I'm suggesting we discuss the rules for meaningful and useful comparison of dynamic bitsets of different lengths. Returning 'false' whenever the sizes differ would not only make 'operator <' inappropriate for std::set or std::map, but would also force many STL algorithms to treat such objects as equivalent, since equivalence is defined in the standard library as:
equiv(a, b) = !(a<b) && !(b<a)
We need a more elaborate solution.
Shouldn't BOOST_ASSERT be used instead of the one from assert.h?
Possibly. I think the intention was always to assert, whereas BOOST_ASSERT lets the assertion be overridden. I do agree with you that the current way dynamic_bitset works with the assertion should have been documented.
I think these assertions should be eliminated, not documented. We can get rid of them without breaking even a bit of backward compatibility. Would you be happy if innocent-looking code like
std::string("c++") < "boost"
called std::terminate just because the class authors didn't want to handle all possible inputs for some reason? Less-than comparison must either be completely and safely defined, or not defined at all. That's how the math works.
If we cannot reliably define a total ordering, we can probably return std::optional<bool> or even boost::tribool from 'operator <', or, at the very worst, add an exception to the throw()-list. Terminating the whole application immediately is *the worst possible* option, since it is not the expected behavior for a comparison operator.
I do not see the point of having <, <=, >, or >= for dynamic bitsets. I was not involved in the original discussion about what this was supposed to mean for dynamic_bitset, but I would rather they had not been implemented at all. Given that they were implemented, and I have no idea when that functionality would ever be used, I have no great stake in how they should work when the bitsets are of different sizes. I do agree with your point that using an assert when the bitsets are of different sizes is a very poor solution.
Matt Calabrese wrote:
I agree that such comparisons should be valid. We should represent a total order, just like standard containers do (assuming their elements do, of course). I'm actually rather surprised that this isn't already supported.
At the moment, I can see two completely different, mutually exclusive total-order generalizations:
1. Mimic lexicographic string comparison: compare a[0] with b[0]; if they're equal, compare a[1] with b[1], etc. This is also the way std::vector<T> works if T is less-than comparable (including the special case of std::vector<bool>).
This seems the most logical method.
2. Mimic little-endian (or big-endian) numeric comparison:

-bit-:  (9)    (0)
    a:  0000001101
    b:     0001101
    c:    00010000

a<b: false
b<a: false
equiv(a,b): true
a<c: true
b<c: true
I strongly prefer the former approach, since the latter makes equivalence (!(a<b) && !(b<a)) badly different from equality (a==b). Equality already works in the lexicographic way; there is little reason to introduce an incompatible ordering scheme for 'operator <', unless we're going to generalize the bitwise operators to mimic numbers, too.
It must also be noted that std::bitset avoids defining 'operator <' for some reason. I think they did it in order not to give users false expectations about the ordering. The very name 'bitset' is the source of confusion. I think if we could rename boost::dynamic_bitset to simply boost::bitstring, nobody would have second thoughts about how the class does its comparisons.
I do not think that bitsets are bitstrings in any way. I admit I am baffled as to how, other than wanting to know whether two bitsets are equal or unequal, any sort of lesser-or-greater comparison makes sense at all. But given that it is already implemented in dynamic_bitset, I am certainly not against a change which would work with bitsets of different sizes.
I'm not sure, though I agree that it's definitely worth investigating with further discussion on the list and some example use-cases. I intuitively suspect it's a reasonable generalization to act as though implicit 0s were present for the shorter operand, based on an interpretation as a binary value. The side on which to pad should be consistent with the lexicographical ordering. Even this generalization is somewhat debatable though, since extraneous 0s contribute to value in a dynamic bitset.
If we continue thinking about dynamic_bitset as a bitstring (or, more generally, bitlist), it would be consistent to define bitwise operators in terms of 'zip' function, commonly found in a number of functional programming languages (some pseudo-Haskell code follows):
(&) :: [Bit] -> [Bit] -> [Bit]
a & b = zipWith (&) a b
Would you please explain this in C++ rather than in Haskell?
Usually, 'zip' combines lists until one of them runs out (see [1] and [2], though). Hence, applying 'operator &' to bitsets of different sizes should create a new bitset with the length of the shorter one. The great feature of this definition is that it naturally obeys De Morgan's laws:
(~a & ~b) == ~(a | b)
(~a | ~b) == ~(a & b)
I also tried different padding schemes and had no luck keeping them consistent with De Morgan's laws. If anyone is interested, I can post my generalization attempts on the list.
With best wishes and hope that my suboptimal language skills won't totally ruin my arguments,
Your English language skills are fine. If you create a PR I will look at it.
――― Pavel K.
[1]: https://softwareengineering.stackexchange.com/q/274983 (Why does 'zip' ignore the dangling tail of the collection?)
[2]: http://stackoverflow.com/a/8513803 (TL;DR: Boost's zip_iterator doesn't behave like Haskell's zip).
On Wed, May 3, 2017 at 4:47 PM, Edward Diener via Boost <boost@lists.boost.org> wrote:
I do not see the point of having <, <=, >, or >= for dynamic bitsets. I was not involved in the original discussion about what this was supposed to mean for dynamic_bitset, but I would rather they had not been implemented at all. Given that they were implemented, and I have no idea when that functionality would ever be used, I have no great stake in how they should work when the bitsets are of different sizes. I do agree with your point that using an assert when the bitsets are of different sizes is a very poor solution.
There are at least some advocates for defining these all of the time to represent a default order. If you don't fall into that camp, there is the existing precedent that standard containers provide such operations, although they are technically only considered "optional requirements" for the standard container concept, even though that's a bit of an oxymoron.

On Wed, May 3, 2017 at 4:47 PM, Edward Diener via Boost <boost@lists.boost.org> wrote:
I admit I am baffled as to how, other than wanting to know whether two bitsets are equal or unequal, any sort of lesser-or-greater comparison makes sense at all. But given that it is already implemented in dynamic_bitset, I am certainly not against a change which would work with bitsets of different sizes.
Having an ordering is a common requirement for generic algorithms/data structures. Stepanov defines Regular types as including a default order. Since we don't (yet) have a standard way to specify some kind of default order that is different from the relational operators, that usually means overloading <, >, <=, and >=. I personally tend to feel that because a given type has multiple valid orderings (unless it's a monostate type), it's a questionable requirement. An ordering can always be specified at a call site. Still, existing practice is to simply define the relational operators, and I don't think it hurts.

--
-Matt Calabrese
On 5/9/2017 2:41 PM, Matt Calabrese via Boost wrote:
On Wed, May 3, 2017 at 4:47 PM, Edward Diener via Boost <boost@lists.boost.org> wrote:
I do not see the point of having <, <=, >, or >= for dynamic bitsets. I was not involved in the original discussion about what this was supposed to mean for dynamic_bitset, but I would rather they had not been implemented at all. Given that they were implemented, and I have no idea when that functionality would ever be used, I have no great stake in how they should work when the bitsets are of different sizes. I do agree with your point that using an assert when the bitsets are of different sizes is a very poor solution.
There are at least some advocates for defining these all of the time to represent a default order. If you don't fall into that camp, there is the existing precedent that standard containers provide such operations, although they are technically only considered "optional requirements" for the standard container concept, even though that's a bit of an oxymoron.
On Wed, May 3, 2017 at 4:47 PM, Edward Diener via Boost <boost@lists.boost.org> wrote:
I admit I am baffled as to how, other than wanting to know whether two bitsets are equal or unequal, any sort of lesser-or-greater comparison makes sense at all. But given that it is already implemented in dynamic_bitset, I am certainly not against a change which would work with bitsets of different sizes.
Having an ordering is a common requirement for generic algorithms/data structures. Stepanov defines Regular types as including a default order. Since we don't (yet) have a standard way to specify some kind of default order that is different from the relational operators, that usually means overloading <, >, <=, and >=. I personally tend to feel that because a given type has multiple valid orderings (unless it's a monostate type), it's a questionable requirement. An ordering can always be specified at a call site. Still, existing practice is to simply define the relational operators, and I don't think it hurts.
I am more pragmatic in this case. Ordering is great when it means something. I just don't see that ordering dynamic_bitsets means anything as far as the uses of a bitset are concerned. Given that there is an ordering, I agree with the OP that an assertion, when bitsets are of different sizes and an ordering operator is used, is not the right way that dynamic_bitset should have been designed. But what should be done I do not know, and am perfectly willing to defer to others on this, as I am just a maintainer and not the original developer.
Edward wrote:
Given that there is an ordering I agree with the OP that an assertion, when bitsets are a different size and an ordering operator is used, is not the right way that dynamic_bitset should have been designed. But what should be done I do not know and am perfectly willing to defer to others on this, as I am just a maintainer and not the original developer.
I don't think there is a truly natural ordering, because you'd want, say, '00' not to be equal to '0', but under any natural ordering they'd map to the same integer, 0.

So you have two options:
- consider length first and, if the lengths are equal, consider the bit pattern using the ordering you have now;
- consider common bits first (i.e. given lengths M, N, consider the last min(N,M) bits of each); if they are equal, consider the length (longer is greater).

Either of these would be compatible with the equal-size ordering that is currently present. I would imagine the latter is better because it implies '00' < '1', which is less surprising than the former ('00' > '1').
On 5/10/2017 8:28 PM, Peter Bartlett via Boost wrote:
Edward wrote:
Given that there is an ordering I agree with the OP that an assertion, when bitsets are a different size and an ordering operator is used, is not the right way that dynamic_bitset should have been designed. But what should be done I do not know and am perfectly willing to defer to others on this, as I am just a maintainer and not the original developer.
I don't think there is a truly natural ordering, because you'd want, say, '00' not to be equal to '0', but under any natural ordering they'd map to the same integer, 0.
So you have two options:
- consider length first and, if the lengths are equal, consider the bit pattern using the ordering you have now;
- consider common bits first (i.e. given lengths M, N, consider the last min(N,M) bits of each); if they are equal, consider the length (longer is greater).
I do not understand the difference above. Currently, when the sizes are equal, the algorithm compares the bits from right to left; otherwise it asserts. Your second choice above would do the same when the sizes are equal, but would always specify the longer length as greater, rather than assert, if the common-length bits are equal. I agree with how you suggest changing it, but between the first and second choices above, when the sizes are equal, I see no difference. Or do you mean to suggest that your second choice would compare the last min(N,M) bits from left to right to determine ordering, rather than the current right to left?
Either of these would be compatible with the equal-size ordering that is currently present. I would imagine the latter is better because it implies '00' < '1', which is less surprising than the former ('00' > '1').
On 11 May 2017 at 04:15, Edward Diener via Boost <boost@lists.boost.org> wrote:
On 5/10/2017 8:28 PM, Peter Bartlett via Boost wrote:

Or do you mean to suggest that your second choice would compare the last min(N,M) bits from left to right to determine ordering rather than the current right to left?
Seems to me that comparing bitsets of different sizes makes little 'sense', but, having said that, having an order (any order) on bitsets of different sizes is the right thing to have, as it allows for putting those bitsets in std::(multi_)map/std::(multi_)set. The choice of left/right or right/left should, I think, be guided by the cheapest way to compare them, as the 'most significant (to the user) bits' might be on the left, on the right, in the middle, or even on both ends. I would go for memcmp().

degski

--
"*Ihre sogenannte Religion wirkt bloß wie ein Opiat reizend, betäubend, Schmerzen aus Schwäche stillend.*" - Novalis 1798
On 5/11/2017 12:24 AM, degski via Boost wrote:
On 11 May 2017 at 04:15, Edward Diener via Boost <boost@lists.boost.org> wrote:
On 5/10/2017 8:28 PM, Peter Bartlett via Boost wrote:

Or do you mean to suggest that your second choice would compare the last min(N,M) bits from left to right to determine ordering rather than the current right to left?
Seems to me that comparing bitsets of different sizes makes little 'sense', but, having said that, having an order (any order) on bitsets of different sizes is the right thing to have, as it allows for putting those bitsets in std::(multi_)map/std::(multi_)set. The choice of left/right or right/left should, I think, be guided by the cheapest way to compare them, as the 'most significant (to the user) bits' might be on the left, on the right, in the middle, or even on both ends. I would go for memcmp().
There is no reason to change the right-to-left order, as the most significant bit is on the right. Furthermore, it would be foolish to break backward compatibility on a whim. I will probably keep the same order but just add that the longer size, the rest being equal, will always be considered greater than the shorter size. This will eliminate the assert and, hopefully, will not break any backward compatibility, as the assert itself should never have been expected in normal code. The only other possibility that I can imagine, if the assert is eliminated, is to throw some exception if the sizes are not equal, which at least has the possibility of keeping the program running if the exception is caught.
degski
On 11 May 2017 at 08:11, Edward Diener via Boost <boost@lists.boost.org> wrote:
There is no reason to change the right-to-left order as the most significant bit is on the right.
A bitset is not a number; 'most significant' depends on usage and is in the eye of the beholder.
Furthermore it would be foolish to break backward compatibility on a whim.
Yes.

I will probably keep the same order but just add that the longer size, the rest being equal, will always be considered greater than the shorter size.
One could introduce a comparison-policy, the default preserving the 'current' behaviour.
The only other possibility that I can imagine, if the assert is eliminated, is to throw some exception if the sizes are not equal, which at least has the possibility of keeping the program running if the exception is caught.
An exception seems completely wrong to me, as there is a valid use case for comparing bitsets of differing lengths. Exceptions should be used for exceptional situations, i.e. not here.

degski
On 5/11/2017 1:34 AM, degski via Boost wrote:
On 11 May 2017 at 08:11, Edward Diener via Boost <boost@lists.boost.org> wrote:
There is no reason to change the right-to-left order as the most significant bit is on the right.
A bitset is not a number; 'most significant' depends on usage and is in the eye of the beholder.
Furthermore it would be foolish to break backward compatibility on a whim.
Yes.
I will probably keep the same order but just add that the longer size, the rest being equal, will always be considered greater than the shorter size.
One could introduce a comparison-policy, the default preserving the 'current' behaviour.
The only other possibility that I can imagine, if the assert is eliminated, is to throw some exception if the sizes are not equal, which at least has the possibility of keeping the program running if the exception is caught.
An exception seems completely wrong to me, as there is a valid use case for comparing bitsets of differing lengths. Exceptions should be used for exceptional situations, i.e. not here.
What I have ended up doing is to keep the exact same behavior as before when the bitsets are of equal size. When the bitsets are of unequal size, if either size is 0, that bitset is less than the other one. If neither size is 0, I convert both bitsets to an unsigned long using the already-provided member function to_ulong() and use those values for the comparison. I have tested this out, adding some tests for different-sized bitsets.

I have not pushed this to 'develop' yet. I wanted to post my solution here so that others could chime in if they wished. I found that it was completely wrong to do bit-by-bit comparisons when the bitsets were not the same size. No matter how I decided to do it, the result was never correct. If anyone doubts this, they can try it for themselves.
degski
On 15 May 2017 at 23:50, Edward Diener via Boost <boost@lists.boost.org> wrote:
What I have ended up doing is to keep the exact same behavior as before when the bitsets are of equal size. When the bitsets are of unequal size, if either size is 0 that bitset is less than the other one. If neither size is 0 I convert both bitsets to an unsigned long using the already provided member function to_ulong() and use those values for the comparison.
A throwing function returning a ulong seems wrong and restrictive (32 bits on Windows). If my bitset were only 32 bits long, I wouldn't use a dynamic_bitset<> in the first place.

I found that it was completely wrong to do bit-by-bit comparisons when the bitsets were not the same size.
Size/length could be the deciding factor in this case.

degski
On 5/16/2017 2:49 AM, degski via Boost wrote:
On 15 May 2017 at 23:50, Edward Diener via Boost <boost@lists.boost.org> wrote:
What I have ended up doing is to keep the exact same behavior as before when the bitsets are of equal size. When the bitsets are of unequal size, if either size is 0, that bitset is less than the other one. If neither size is 0, I convert both bitsets to an unsigned long using the already-provided member function to_ulong() and use those values for the comparison.
A throwing function returning a ulong seems wrong and restrictive (32 bits on Windows). If my bitset were only 32 bits long, I wouldn't use a dynamic_bitset<> in the first place.
There is no other way to do it. BTW ulong is 64-bits on any 64-bit OS.
I found that it was completely wrong to do bit-by-bit comparisons when the bitsets were not the same size.
Size/length could be the deciding factor in this case.
Feel free to suggest an algorithm which works doing bit-by-bit comparisons when the number of bits are different.
degski
On 17/05/2017 05:01, Edward Diener wrote:
There is no other way to do it. BTW ulong is 64-bits on any 64-bit OS.
"unsigned long" is 32 bits on Windows, even in 64-bit builds.
Feel free to suggest an algorithm which works doing bit-by-bit comparisons when the number of bits are different.
What's wrong with comparing bit-by-bit up to the common length, then if still equal, whichever is longer is greater? Sure, that doesn't necessarily produce the same answer as an integral comparison, but that's not a useful comparison for a bitset anyway.
On 5/16/2017 7:33 PM, Gavin Lambert via Boost wrote:
On 17/05/2017 05:01, Edward Diener wrote:
There is no other way to do it. BTW ulong is 64-bits on any 64-bit OS.
"unsigned long" is 32 bits on Windows, even in 64-bit builds.
Feel free to suggest an algorithm which works doing bit-by-bit comparisons when the number of bits are different.
What's wrong with comparing bit-by-bit up to the common length, then if still equal, whichever is longer is greater?
It depends on what you consider the bits comprising the common length. I am assuming you are considering them to be the lower-order bits, so that with 1010001 and 111 the bits comprising the common length are 001 and 111. Because if you mean the high-order bits, so that the bits comprising the common length are 101 and 111, that makes absolutely no sense to me, since by that 1010001 < 111. However, even if you mean the low-order bits, then by your suggestion 1010001 is still < 111, which again makes no sense. I think, whatever the algorithm, the greater value has to be greater than the lesser value.

An alternative algorithm I have been considering for when the bit sizes are unequal is:

a) The bits comprising the common length are the low-order bits.
b) If the bitset with the larger bit size has any 1 bit outside the bits comprising the common length, the smaller bit size is always < the larger bit size.
c) Else do a bit-by-bit comparison of the bits comprising the common length, going from the higher-order bit to the lower-order bit of just those bits.

This algorithm would guarantee that the greater value is always > the lesser value, without the to_ulong limitation of the number of bits in an unsigned long.
Sure, that doesn't necessarily produce the same answer as an integral comparison, but that's not a useful comparison for a bitset anyway.
I disagree. As far as ordering goes, I think it is the only always-reliable comparison.
Edward Diener wrote:
An alternative algorithm I have been considering for when the bit sizes are unequal is:
a) The bits comprising the common length are the low-order bits.
b) If the bitset with the larger bit size has any 1 bit outside the bits comprising the common length, the smaller bit size is always < the larger bit size.
c) Else do a bit-by-bit comparison of the bits comprising the common length, going from the higher-order bit to the lower-order bit of just those bits.
Could also add

d) if still equal, the larger bit size > the smaller bit size.

This makes 000111 > 111 instead of equal, so that equality (which I presume checks sizes) and equivalence match.
On 5/16/2017 8:46 PM, Peter Dimov via Boost wrote:
Edward Diener wrote:
An alternative algorithm I have been considering for when the bit sizes are unequal is:
a) The bits comprising the common length are the low-order bits.

b) If the bitset with the larger bit size has any 1 bit outside the bits comprising the common length, the bitset with the smaller bit size is always < the one with the larger bit size.

c) Otherwise, do a bit-by-bit comparison of the bits comprising the common length, going from the higher-order bit to the lower-order bit of just those bits.
Could also add
d) if still equal, the larger bit size > the smaller bit size.
This makes 000111 > 111 instead of equal, so that equality (which I presume checks sizes) and equivalence match.
Agreed.
On 17 May 2017 at 07:01, Edward Diener via Boost <boost@lists.boost.org> wrote:
On 5/16/2017 8:46 PM, Peter Dimov via Boost wrote:
Edward Diener wrote:
An alternative algorithm I have been considering for when the bit sizes are unequal is:
a) The bits comprising the common length are the low order bits.
The fact that bits are part of the common length does not mean that these bits have anything in common (semantically). It seems nonsensical to me (as Edward D. seemed to think originally), not to say overly expensive.
d) if still equal, the larger bit size > the smaller bit size.
This makes 000111 > 111 instead of equal, so that equality (which I presume checks sizes) and equivalence match.
This means you can just as well start with the length comparison first. The algorithm:

1. Compare lengths: shorter < longer.

2. If the lengths are equal, compare bitwise (in some way, e.g. memcmp()), but ALL the bits.

Any use of to_ulong() as suggested, or e.g. hashing, will not result in a strict weak ordering, i.e. it's useless.

degski

--
"*Ihre sogenannte Religion wirkt bloß wie ein Opiat reizend, betäubend, Schmerzen aus Schwäche stillend.*" - Novalis 1798
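The length-first ordering proposed above can be sketched as follows (std::vector<bool> as a stand-in for dynamic_bitset; the function name is mine):

```cpp
#include <cstddef>
#include <vector>

// Sketch: lengths decide first; only equal-length bitsets are compared
// bit by bit (highest index first here, any fixed direction works).
bool less_length_first(const std::vector<bool>& a,
                       const std::vector<bool>& b)
{
    // 1. Shorter < longer, regardless of contents.
    if (a.size() != b.size())
        return a.size() < b.size();

    // 2. Equal lengths: compare all the bits.
    for (std::size_t i = a.size(); i-- > 0; )
        if (a[i] != b[i])
            return b[i];
    return false;
}
```

Because lengths are compared first and the bitwise pass covers every bit, this yields a strict weak ordering with no common-length ambiguity.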
On 17 May 2017, at 07:54, degski via Boost <boost@lists.boost.org> wrote:
On 17 May 2017 at 07:01, Edward Diener via Boost <boost@lists.boost.org> wrote:
On 5/16/2017 8:46 PM, Peter Dimov via Boost wrote:
Edward Diener wrote:
An alternative algorithm I have been considering for when the bit sizes are unequal is:
a) The bits comprising the common length are the low order bits.
The fact that bits are part of the common length does not mean that these bits have anything in common (semantically). It seems nonsensical to me (as Edward D. seemed to think originally), not to say overly expensive.
d) if still equal, the larger bit size > the smaller bit size.
This makes 000111 > 111 instead of equal, so that equality (which I presume checks sizes) and equivalence match.
This means you can just as well start with the length comparison first. The Algorithm:
1. Compare length: shorter < longer. 2. If equal lengths, compare bitwise (in some way, f.e. memcmp()), but ALL the bits.
Any use of ulong() as suggested, or f.e. hashing, will not result in a strict weak ordering, i.e. it's useless.
The documentation already states that the ordering is lexicographic - there's no fuss or choice to be had here. The ordering MUST be the same as for std::string, except that the alphabet is now two characters (0/1) instead of 256. I hope no one thinks it is hard to compare std::strings of non-equal lengths...!

NB: there are other member functions that might be affected by this issue (the is_subset_of group). Do they demand equal sizes too? The docs don't impose a precondition that they do...
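The documented lexicographic ordering is exactly what std::lexicographical_compare gives over the bit sequence; a minimal sketch using std::string over the {0,1} alphabet as a stand-in for the bit range:

```cpp
#include <algorithm>
#include <string>

// Sketch: lexicographic ordering of bit sequences, identical in behavior
// to std::string comparison restricted to the characters '0' and '1'.
bool bits_less(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}
```

Note that a proper prefix compares less than the longer sequence ("111" < "1110"), which handles non-equal lengths with no assertion needed.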
On 17 May 2017 at 10:07, Pete Bartlett via Boost <boost@lists.boost.org> wrote:
I hope no one thinks it is hard to compare std::strings of non-equal lengths...!
Well... consider:

std::cout << ( std::string ( "0001" ) > std::string ( "000" ) ) << '\n';
std::cout << ( std::string ( "1110" ) > std::string ( "111" ) ) << '\n';
std::cout << ( std::string ( "0101" ) > std::string ( "000" ) ) << '\n';
std::cout << ( std::string ( "1010" ) < std::string ( "111" ) ) << '\n';
std::cout << ( std::string ( "000111" ) < std::string ( "111" ) ) << '\n';
std::cout << ( std::string ( "111000" ) > std::string ( "111" ) ) << '\n';

But I guess, problem solved: http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare - and comparison of boost::dynamic_bitset<> should neither throw nor assert.

degski
On 5/17/2017 3:07 AM, Pete Bartlett via Boost wrote:
On 17 May 2017, at 07:54, degski via Boost <boost@lists.boost.org> wrote:
On 17 May 2017 at 07:01, Edward Diener via Boost <boost@lists.boost.org> wrote:
On 5/16/2017 8:46 PM, Peter Dimov via Boost wrote:
Edward Diener wrote:
An alternative algorithm I have been considering for when the bit sizes are unequal is:
a) The bits comprising the common length are the low order bits.
The fact that bits are part of the common length does not mean that these bits have anything in common (semantically). It seems nonsensical to me (as Edward D. seemed to think originally), not to say overly expensive.
d) if still equal, the larger bit size > the smaller bit size.
This makes 000111 > 111 instead of equal, so that equality (which I presume checks sizes) and equivalence match.
This means you can just as well start with the length comparison first. The Algorithm:
1. Compare length: shorter < longer. 2. If equal lengths, compare bitwise (in some way, f.e. memcmp()), but ALL the bits.
Any use of ulong() as suggested, or f.e. hashing, will not result in a strict weak ordering, i.e. it's useless.
The documentation already states that the ordering is lexicographic - there's no fuss or choice to be had here. The ordering MUST be the same as for std::string except the alphabet is now two characters (0/1) instead of 256 chars.
I hope no one thinks it is hard to compare std::strings of non-equal lengths...!
I implemented it with a lexicographic compare and pushed it to develop after testing it. I also added more tests.

I honestly still do not see the value of having an ordering for bitsets, and I do not know the initial reasoning for why it was implemented. But since it was, there might as well be an ordering for bitsets of different sizes.
Nb there are other member functions that might be affected by this issue (the is_subset_of group). Do they demand equal sizes too? The docs don't impose a pre-condition that they do...
I did not take a look at any of this.
On 17/05/2017 12:39, Edward Diener wrote:
On 5/16/2017 7:33 PM, Gavin Lambert wrote:
What's wrong with comparing bit-by-bit up to the common length, then if still equal, whichever is longer is greater?
It depends on what you consider the bits comprising the common length ? I am assuming you are considering the bits comprising the common length as the lower order bits comprising the common length. So that with
1010001 and 111
the bits comprising the common length are 001 and 111.
I'm not thinking of it as a bitstring at all. Whichever bit is bit [0] is compared first. It doesn't matter whether this is rendered first or last as a bitstring. (Although yes, I would expect it to be rendered last, because I've been biased by little-bit-endian numbering; note that isn't universal, however; albeit nearly so, even on big-endian-integer architectures, due to base numbering.)
However even if you mean the low order bits comprising the common length then by your suggestion 1010001 is still < 111, which again makes no sense.
Why does that make no sense? This is not a number and it is not a bitstring; it is an arbitrary set of bits. As long as the ordering is consistent (which this is, in all cases) it doesn't have to match how it would be ordered as a bitstring or integer.

Some alternate, completely different possible orderings:

1. Render it as a bitstring first and then use string comparison. This seems needlessly perf-hungry (unless internal storage were already a bitstring, but that seems needlessly storage-hungry). This also biases the comparison to the non-[0] end, which might be good or bad depending on usage.

2. Compare each 32-bit (or whatever window size is convenient) group of bits as an unsigned integer (zero-padding if smaller than the window size), starting at whichever one holds [0] and continuing as long as both sets contributed at least one "real" bit to the window; if any comparison returns unequal, that's the result; if all comparisons return equal, then finally compare the actual sizes of both sets for the final result. Advantage: fast. Disadvantage: the ordering might change if the window size changes (e.g. for different platforms, or future Boost versions).
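Option 2 above might be sketched roughly like this. It is only an illustration of the windowed idea, under the assumptions that each element of the vectors already packs 32 bits (lowest-indexed bits first) and that unused high bits in the last word are zero:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: compare 32-bit windows starting from the end holding bit [0];
// if every compared window is equal, the actual bit counts decide.
bool less_windowed(const std::vector<std::uint32_t>& a, std::size_t a_bits,
                   const std::vector<std::uint32_t>& b, std::size_t b_bits)
{
    const std::size_t common_words = std::min(a.size(), b.size());
    for (std::size_t w = 0; w < common_words; ++w)
        if (a[w] != b[w])
            return a[w] < b[w];   // unsigned comparison of the window
    return a_bits < b_bits;       // all compared windows equal: size decides
}
```

As noted above, the resulting order is consistent but arbitrary with respect to any bitstring or integer interpretation, and it would change with the window size.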
Mere moments ago, quoth I:
1. Render it as a bitstring first and then use string comparison. This seems needlessly perf-hungry (unless internal storage were already as a bitstring, but that seems needlessly storage-hungry). This also biases the comparison to the non-[0] end, which might be good or bad depending on usage.
Variations include zero-padding whichever one is smaller, or not.
2. Compare each 32-bit (or whatever window size is convenient) group of bits as an unsigned integer (zero-padding if smaller than the window size), starting at whichever one holds [0] and continuing as long as both sets contributed at least one "real" bit to the window; if any comparison returns unequal, that's the result; if all comparisons return equal, then finally compare the actual sizes of both sets for the final result. Advantage: fast. Disadvantage: the ordering might change if the window size changes (eg. for different platforms, or future Boost versions).
Another variation is to start from the non-[0] end. This would probably get you a more "natural" bitstring-like ordering and should be less dependent on the window size. It might still result in different ordering on big-endian vs. little-endian platforms, though that depends on how internal storage is structured vs. indexing.

Comparing the actual sizes at the end, if everything else is equal, seems like an important step though: 0000 > 00.
On 16 May 2017 at 20:01, Edward Diener via Boost <boost@lists.boost.org> wrote:

There is no other way to do it. BTW ulong is 64 bits on any 64-bit OS.
This remark by a seasoned expert shows how dangerous the use of the type long really is.

degski
On Apr 27, 2017 13:36, "Pavel Kretov via Boost" <boost@lists.boost.org> wrote:

Hi all.

Looking through boost::dynamic_bitset code, I found an interesting inconsistency between its operator== and operator<. Namely, the first works well with bitsets of different size, while the second triggers an assertion:

boost::dynamic_bitset<> a(std::string("111")), b(std::string("1110"));
a == b; // false
a < b;  // Assertion `a.size() == b.size()' failed.

This means that we cannot reliably use dynamic bitset within ordered containers like std::set or std::map.

Since the standard library has distinct concepts of equality and equivalence (see [1]), it's fine to have different comparison behaviors in dynamic_bitset. However, as far as I can judge, equivalence is a metaphysically broader concept than equality; we shouldn't restrict it to bitsets of the same size only.

So, the questions:

1. May I remove that assertion? (It won't be a breaking change, after all.)

I agree that such comparisons should be valid. We should represent a total order, just like the standard containers do (assuming their elements do, of course). I'm actually rather surprised that this isn't already supported.

2. How about doing the same thing to the bitwise operators, too? Instead of triggering an assertion if sizes don't match, it may be worth generalizing the operator definitions. I can see several possible schemes, but not all of them are consistent enough.

I'm not sure, though I agree that it's definitely worth investigating with further discussion on the list and some example use-cases. I intuitively suspect it's a reasonable generalization to act as though implicit 0s were present for the shorter operand, based on an interpretation as a binary value. The side on which to pad should be consistent with the lexicographical ordering. Even this generalization is somewhat debatable though, since extraneous 0s contribute to value in a dynamic bitset.
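The "implicit 0s" generalization for bitwise operators could look roughly like this sketch, assuming the shorter operand is padded with zeros at the high-order end (std::vector<bool> stands in for dynamic_bitset, bit [0] low order; the function name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Sketch: bitwise OR where the shorter operand is treated as if it had
// implicit high-order 0 bits, so mismatched sizes need not assert.
std::vector<bool> or_padded(std::vector<bool> a, const std::vector<bool>& b)
{
    if (a.size() < b.size())
        a.resize(b.size(), false);   // implicit high-order zeros
    for (std::size_t i = 0; i < b.size(); ++i)
        a[i] = a[i] | b[i];
    return a;
}
```

Whether the result's size should be the larger of the two inputs (as here) is one of the design questions raised above, since extra 0 bits still contribute to a dynamic bitset's value.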
participants (8)
-
degski
-
Edward Diener
-
Gavin Lambert
-
Matt Calabrese
-
Pavel Kretov
-
Pete Bartlett
-
Peter Bartlett
-
Peter Dimov