[integer] Type-safe and bounded integers with compile-time checking

Hi all, There are some integer libraries out there that improve on the fundamental integers; Boost.Integer, ACCU BoundedInt, part of The Boost Constrained Value library, and probably many other. They add the ability to specify an integer based on a range, they add bounds checking at runtime, or a combination of both. What I have not found yet though is a library that checks bounds at compile time. Me personally I would like to have the compiler check as much as possible. For instance, the compiler would complain about an assignment if the target's range is not a superset of the source's, unless an explicit conversion is made. Does anyone know of a library supporting this? If there is no such library I hope there is an interest in one and that this post will start a discussion about how it best can be done. The motivation for this kind of library is that its usage helps show the intent of the developer, and it can point out faults that otherwise even an extensive test suite might miss. That it forces the developer to thoroughly think about value ranges in computations I also regard as a point in favour. What about the type-safe part in the subject of this post? Even though most mistakes will be caught by the bounds checking, I believe there are situations where one wants to clearly state that an integer represents one specific concept and nothing else, for instance a numeric identifier. So I would like this kind of library to also support a kind of type-safety. I am not sure that that is the correct term for it, but what I mean is that when one declares a bounded integer it should be possible to state that it can not mix with other bounded integers regardless if the bounds are compatible. As I have been thinking about this for years now I of course have a lot of ideas on how to implement this. As a starting point for a discussion I would like to present some basic parts of what I have come up with. The integer type is declared as follows: template< typename Range, typename Tune = fast > struct integer; It takes a range type instead of taking the bounds directly. It also has a tuning parameter which is used when selecting the fundamental integer type to use for storage. The idea of having a separate range type is that it can be used for other types, see the array type below. The range type then: template< intmax_t Lower, intmax_t Upper, typename TypesafetyTag = defaultTypesafety > struct range_c; The bounds is no surprise, but here I add the type-safety part. I am not quit sure about this approach because the bounds and the type-safety mechanisms are orthogonal, but it works together quite well in practice I believe. Maybe it should not be named "range" but something else. The type-safety mechanism is fairly simple; two integers must have the same tag to be part of the same expression (unless a type-safety conversion is used). This means that they cannot be mixed with fundamental integers either. The default type-safety tag is special in that it allows for a mix with fundamental integers except that there is no implicit conversion or assignment to fundamental integers (this would otherwise provide a loophole). In practice fundamental integers do not work that well with the static bounded integers though, because most often the range is too big. Another type that seems fairly obvious is a static array (similar to Boost.Array): template< typename Range > struct array; The array of course is just as large as needed for the range. 
As subscript parameter it only accepts bounded integers with compatible ranges, i.e. with subset bounds and an equal type-safety tag. Contrary to what is normal in C++, the array is not necessarily zero-based; it starts at the lower bound of its range.

There is much more of course, for instance that the range type should take types instead of integers as bounds, but this post is getting too long already. As said before, I hope this is going to be the start of a fruitful discussion.

Sincerely,
Leif Linderstam
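A minimal sketch of how the compile-time superset check on assignment could look, reusing the range_c and integer names from the post; the storage selection is a placeholder, the Tune parameter is omitted, and none of this is the author's actual implementation:

    #include <cstdint>
    #include <type_traits>

    struct defaultTypesafety {};

    template< std::intmax_t Lower, std::intmax_t Upper,
              typename TypesafetyTag = defaultTypesafety >
    struct range_c
    {
        static const std::intmax_t lower = Lower;
        static const std::intmax_t upper = Upper;
        typedef TypesafetyTag tag;
    };

    // Placeholder: a real library would select the narrowest fundamental
    // type (honouring the Tune parameter); intmax_t always works.
    template< typename Range >
    struct storage { typedef std::intmax_t type; };

    template< typename Range >
    struct integer
    {
        typename storage<Range>::type value;

        integer() : value(Range::lower) {}

        // Assignment compiles only if the target range is a superset of
        // the source range and the type-safety tags match.
        template< typename OtherRange >
        integer& operator=(integer<OtherRange> const& other)
        {
            static_assert(OtherRange::lower >= Range::lower &&
                          OtherRange::upper <= Range::upper,
                          "target range is not a superset of the source range");
            static_assert(std::is_same<typename OtherRange::tag,
                                       typename Range::tag>::value,
                          "type-safety tags differ");
            value = other.value;
            return *this;
        }
    };

    // integer< range_c<0, 100>  > narrow;
    // integer< range_c<0, 1000> > wide;
    // wide = narrow;    // OK: [0,100] is a subset of [0,1000]
    // narrow = wide;    // compile-time error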

On 09/04/2011 09:16 PM, Leif Linderstam wrote:
What I have not found yet, though, is a library that checks bounds at compile time. Personally, I would like the compiler to check as much as possible. For instance, the compiler would complain about an assignment if the target's range is not a superset of the source's, unless an explicit conversion is made.
Does anyone know of a library supporting this?
How would addition and multiplication work?

Mathias Gaunard wrote 2011-09-05 11:30:
On 09/04/2011 09:16 PM, Leif Linderstam wrote:
What I have not found yet, though, is a library that checks bounds at compile time. Personally, I would like the compiler to check as much as possible. For instance, the compiler would complain about an assignment if the target's range is not a superset of the source's, unless an explicit conversion is made.
Does anyone know of a library supporting this?
How would addition and multiplication work?
They produce new ranges, e.g. for the addition A + B the range of the result is [A_low + B_low, A_high + B_high], where A_low is the lower bound of A's range, and so on. The result of all operators must have a valid range, which indeed has some fundamental implications.

First, all bit operators are ruled out, because even if it is at all possible to compute a new range for them, the result is probably not that interesting. The bit pattern of an integer is really just a representation, one of potentially many, although in practice it is probably hard to find anything but two's complement today. For bit patterns, use a bit pattern type.

Second, assignment of a value back to one of the operands, i.e. A = A + B, will invariably require the use of an explicit range conversion. This means that the compound assignments and increment/decrement operators, if supported at all, must do an implicit range conversion. Range conversions, explicit or implicit, must do a dynamic range check, so these operations will not be statically checked.

Sincerely,
Leif Linderstam
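Building on the earlier range_c/integer sketch, the addition operator could propagate the bounds exactly as described above; this is illustrative only and reuses the sketch's public value member:

    // A + B yields a value of a new type whose range is
    // [A_low + B_low, A_high + B_high]; the sum therefore always fits.
    template< typename RangeA, typename RangeB >
    integer< range_c< RangeA::lower + RangeB::lower,
                      RangeA::upper + RangeB::upper,
                      typename RangeA::tag > >
    operator+(integer<RangeA> const& a, integer<RangeB> const& b)
    {
        static_assert(std::is_same<typename RangeA::tag,
                                   typename RangeB::tag>::value,
                      "operands have different type-safety tags");

        integer< range_c< RangeA::lower + RangeB::lower,
                          RangeA::upper + RangeB::upper,
                          typename RangeA::tag > > result;
        result.value = a.value + b.value;
        return result;
    }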

On 05/09/2011 14:29, Leif Linderstam wrote:
They produce new ranges, e.g. for the addition A + B the range of the result is [ A_low + B_low, A_high + B_high ] where A_low is the lower bound of A's range, and so on.
Ok, but that would very quickly lead to very big ranges. What happens when that range goes beyond the range supported by the underlying int type?

Mathias Gaunard wrote 2011-09-05 16:57:
Ok, but that would very quickly lead to very big ranges. What happens when that range goes beyond the range supported by the underlying int type?
The range of the result is computed, and this can be used to select a proper underlying type for the result. So although the two ranges A: [0, 200] and B: [1, 100] both fit into unsigned chars, the result of A + B will be put into a short or an int. In the general case it is even a bit more complicated than that: to prevent overflow, a suitable internal type for the computation must be chosen by the library.

Agreed, with multiplication we soon need more than the widest fundamental type, but in most cases you would run into an overflow with the ordinary integers as well, only you might not be aware of the risk. My first go at this was that if the result range cannot be represented by any fundamental int type, a compile-time error is issued. My latest thought, though, is that the library should instead allow for multi-sized integers, which of course touches on the current work of Christopher Kormanyos. In the original post I said that the range type should accept types as bounds instead of integers; this is the reason. With types there we can specify really big ranges.

Sincerely,
Leif Linderstam
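A sketch of the kind of storage selection described here, with invented names (select_storage is not from the post); a real library would also honour the Tune parameter and multi-sized integers:

    #include <cstdint>
    #include <limits>
    #include <type_traits>

    // Pick the narrowest fundamental type whose own range covers [Lower, Upper].
    template< std::intmax_t Lower, std::intmax_t Upper >
    struct select_storage
    {
        template< typename T >
        struct fits
        {
            static const bool value =
                Lower >= static_cast<std::intmax_t>(std::numeric_limits<T>::min()) &&
                Upper <= static_cast<std::intmax_t>(std::numeric_limits<T>::max());
        };

        typedef typename std::conditional< fits<std::uint8_t>::value,  std::uint8_t,
                typename std::conditional< fits<std::int8_t>::value,   std::int8_t,
                typename std::conditional< fits<std::uint16_t>::value, std::uint16_t,
                typename std::conditional< fits<std::int16_t>::value,  std::int16_t,
                typename std::conditional< fits<std::int32_t>::value,  std::int32_t,
                                           std::intmax_t
                >::type >::type >::type >::type >::type type;
    };

    // With ranges A = [0, 200] and B = [1, 100]:
    //   select_storage<0, 200>::type -> uint8_t
    //   select_storage<1, 100>::type -> uint8_t
    //   select_storage<1, 300>::type -> uint16_t   (range of A + B)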

On 05/09/2011 21:27, Leif Linderstam wrote:
My first go at this was that if the result range cannot be represented by any fundamental int type a compile time error is issued. My latest thoughts though are that the library should instead allow for multi-sized integers, which then of course touches on the current work of Christopher Kormanyos. In the original post I said that the range type should accept types as bounds instead of integers; this is the reason. With types there we can specify really big ranges.
A typed approach prevents iterative programming somewhat, since each operation would yield an object of a different type. You can't write

    a = 0;
    while (some_cond)
        a += something;

You have to write it as a fold. If it's just to ensure overflow doesn't happen, wouldn't it be better to make all of the checking happen at runtime, and disable it in NDEBUG builds?
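For comparison, the runtime-only alternative suggested here could be as simple as the following sketch; checked_int and its bounds are invented names, and the assert compiles away in NDEBUG builds:

    #include <cassert>

    // Runtime-checked bounded integer: every update asserts that the value
    // stays inside [Lower, Upper]; under NDEBUG the checks disappear.
    template< long Lower, long Upper >
    class checked_int
    {
    public:
        checked_int(long v = Lower) : value_(v) { assert(v >= Lower && v <= Upper); }

        checked_int& operator+=(long rhs)
        {
            value_ += rhs;
            assert(value_ >= Lower && value_ <= Upper);
            return *this;
        }

        long get() const { return value_; }

    private:
        long value_;
    };

    // checked_int<0, 1000> a(0);
    // while (some_cond) a += something;   // checked at runtime, free in NDEBUG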

Mathias Gaunard wrote 2011-09-06 10:30:
On 05/09/2011 21:27, Leif Linderstam wrote:
My first go at this was that if the result range cannot be represented by any fundamental int type a compile time error is issued. My latest thoughts though are that the library should instead allow for multi-sized integers, which then of course touches on the current work of Christopher Kormanyos. In the original post I said that the range type should accept types as bounds instead of integers; this is the reason. With types there we can specify really big ranges.
A typed approach prevents iterative programming somewhat, since each operation would yield an object of a different type.
I lost you a bit there. The typed approach for ranges is there to support compile-time constants that are larger than 64 bits. But it is true that the operators +, -, *, /, %, and even unary - all return an object of a new type (regardless of whether the bounds of the range type are specified with ints or with types).
You can't write a = 0; while(some_cond) a += something;
Although in my first thoughts on this I did not include +=, -=, etc., I now believe that those operators must be supported, otherwise this kind of library would be too far from normal C++. And indeed, the only way to implement them is to do range checking at runtime. (I personally prefer to keep those checks in production code as well, but that should perhaps be configurable.)

However, this means that the following:

    a += something;

is not the same as:

    a = a + something;

which actually would not compile. Instead it is the same as:

    a = range_conversion<range_of_a>(a + something);

Sincerely,
Leif Linderstam
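In terms of the earlier sketch, the range_conversion used here would be the one place with a dynamic check; the following is illustrative, and range_of_a above simply stands for a's own range type:

    #include <stdexcept>

    // Dynamic narrowing back into TargetRange; everything else stays
    // statically checked.  Whether it should throw, assert or saturate
    // is a separate design question.
    template< typename TargetRange, typename SourceRange >
    integer<TargetRange> range_conversion(integer<SourceRange> const& src)
    {
        if (src.value < TargetRange::lower || src.value > TargetRange::upper)
            throw std::range_error("value outside target range");

        integer<TargetRange> result;
        result.value = src.value;
        return result;
    }

    // With this helper,
    //     a += something;
    // can be implemented as
    //     a = range_conversion< range_of_a >(a + something);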

On 09/06/2011 08:42 PM, Leif Linderstam wrote:
However, this means that the following:
a += something;
is not the same as:
a = a + something;
I meant them to be the same in my message.
which actually would not compile.
Which was the problem I was highlighting.

Mathias Gaunard wrote 2011-09-07 00:05:
On 09/06/2011 08:42 PM, Leif Linderstam wrote:
However, this means that the following:
a += something;
is not the same as:
a = a + something;
I meant them to be the same in my message.
I gathered that much; I was just trying to point out that I never meant them to be the same. I now realize that it is very confusing if they are not the same, and that the approach taken by Ben Robinson is much more reasonable. Thank you for your comments!

Sincerely,
Leif Linderstam

Leif Linderstam,

I am planning on submitting a policy-based integer class, which I have already implemented, that shares many of the features you have requested. I currently call it MetaBoundedInt, and the design philosophy was to provide both an overflow-checked and a range-checked integer, which leverages as much compile-time information as possible to maximize performance, as well as to communicate overflow/range errors at compile time when possible.

The handling of overflow/range errors is policy based, and allows a single policy to assert, throw, saturate, wrap, etc. when the bounds (either overflow or out-of-range) are exceeded. The binary math operators +, -, *, /, % are all implemented, and perform the policy-based behavior. Also, when assigning one type of integer to another, overflow detection is implemented correctly by analyzing the value, to detect if it will fit into the smaller lhs data type and specified range.

Leveraging the compile-time information, certain comparisons are completely eliminated to maximize performance, such as checking the range if the bounds are the max/min of the data type, which is already checked during overflow detection. Also, if the data type min/max of the rhs is greater than or equal to the min/max of the lhs, no overflow detection is required, and therefore it is not performed. There are many more specializations which eliminate needless computation.

Usage looks something like this:

    mbi<throwing_policy, int8_t> var1(0U);        // Detects only overflow
    mbi<throwing_policy, int8_t, -8, 8> var2(0U); // Detects overflow and range

Now use the value just like any other value, except that if you overflow the data type, or exceed the range, the policy determines the behavior. As you requested, Leif, the throwing_policy can be replaced in a release build with an ignore_policy, so that these additional checks can be eliminated for maximum release build performance.

I was going to wait to submit this until after my Singularity submission, which is currently on the review calendar, but given your interest query, I am responding now. Does MetaBoundedInt (mbi) pique anybody's interest?

Thank you,
Ben Robinson, Ph.D.

On Mon, Sep 5, 2011 at 7:57 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 05/09/2011 14:29, Leif Linderstam wrote:
They produce new ranges, e.g. for the addition A + B the range of the
result is [ A_low + B_low, A_high + B_high ] where A_low is the lower bound of A's range, and so on.
Ok, but that would very quickly lead to very big ranges. What happens when that range goes beyond the range supported by the underlying int type?
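To make the policy idea concrete, here is a rough sketch along the lines of the usage shown above. It is not the actual MetaBoundedInt code; all names besides the mbi-style usage in the comments are invented for illustration:

    #include <cstdint>
    #include <limits>
    #include <stdexcept>

    // Illustrative policies: what to do when a value leaves the bounds.
    struct throwing_policy
    {
        template< typename T >
        static T handle(T /*value*/, T lower, T /*upper*/)
        {
            throw std::range_error("bounded integer out of range");
            return lower;   // never reached
        }
    };

    struct saturate_policy
    {
        // Only called for out-of-range values, so clamp to the nearer bound.
        template< typename T >
        static T handle(T value, T lower, T upper)
        {
            return value < lower ? lower : upper;
        }
    };

    // Policy-based bounded integer, loosely modelled on mbi<Policy, T, Lower, Upper>.
    template< typename Policy, typename T,
              std::intmax_t Lower = static_cast<std::intmax_t>(std::numeric_limits<T>::min()),
              std::intmax_t Upper = static_cast<std::intmax_t>(std::numeric_limits<T>::max()) >
    class bounded
    {
    public:
        explicit bounded(std::intmax_t v) : value_(check(v)) {}

        bounded& operator+=(std::intmax_t rhs)
        {
            value_ = check(static_cast<std::intmax_t>(value_) + rhs);
            return *this;
        }

        T get() const { return value_; }

    private:
        static T check(std::intmax_t v)
        {
            if (v < Lower || v > Upper)
                return static_cast<T>(Policy::handle(v, Lower, Upper));
            return static_cast<T>(v);
        }

        T value_;
    };

    // bounded<throwing_policy, std::int8_t>        var1(0);  // overflow check only
    // bounded<throwing_policy, std::int8_t, -8, 8> var2(0);  // overflow and range check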

Ben Robinson wrote 2011-09-05 21:57:
... I currently call it MetaBoundedInt, and the design philosophy was to provide both an overflow, and ranged checked integer, which leverages as much compile time information as possible to maximize performance, as well as communicate overflow/range errors at compile time when possible.
That seems to be more in line with how, for instance, Ada does it, which probably means that it is a more reasonable approach than mine. That said, part of the reason for me to start looking into this was some sloppy use of integers within a project I joined, where a fairly strict environment would have helped.
mbi<throwing_policy, int8_t, -8, 8> var2(0U); // Detects overflow and range
Why not go all the way and select an underlying int based on the given range?
As you requested Leif, the throwing_policy can be replaced in a release build with an ignore_policy, so that these additional checks can be eliminated for maximum release build performance.
Then I was not clear enough. If a check cannot be made at compile-time I want it to be done at runtime. I do want this type of library to be as efficient as possible for better acceptance among developers, but my main reason for wanting it to do static checks as much as possible is to detect faults as early as possible, and to make it harder to write that sloppy code. Sincerely, Leif Linderstam

On 6 Sep 2011, at 19:33, Leif Linderstam wrote:
Ben Robinson wrote 2011-09-05 21:57:
... I currently call it MetaBoundedInt, and the design philosophy was to provide both an overflow, and ranged checked integer, which leverages as much compile time information as possible to maximize performance, as well as communicate overflow/range errors at compile time when possible.
That seems to be more in line with how, for instance, Ada does it, which probably means that it is a more reasonable approach than mine. That said, part of the reason for me to start looking into this was some sloppy use of integers within a project I joined, where a fairly strict environment would have helped.
mbi<throwing_policy, int8_t, -8, 8> var2(0U); // Detects overflow and range
Why not go all the way and select an underlying int based on the given range?
As you requested Leif, the throwing_policy can be replaced in a release build with an ignore_policy, so that these additional checks can be eliminated for maximum release build performance.
Then I was not clear enough. If a check cannot be made at compile-time I want it to be done at runtime. I do want this type of library to be as efficient as possible for better acceptance among developers, but my main reason for wanting it to do static checks as much as possible is to detect faults as early as possible, and to make it harder to write that sloppy code.
Could you give an example of a non-trivial algorithm where this would help? I am having difficulty imagining something where bounds can be usefully detected at compile time. Most mathematical algorithms involve ifs and loops, which would not seem compatible with your technique. If you want something to work off, consider a simple example like:

    int scalar_product(vector<int> v1, vector<int> v2)
    {
        int sum = 0;
        for (int i = 0; i < v1.size(); ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

Chris

Christopher Jefferson wrote 2011-09-06 20:38:
Could you give an example of a non-trivial algorithm where this would help? I am having difficulty imagining something where bounds can be usefully detected at compile time.
No, the uses I saw all involved just trivial computations, like multiplying a time in seconds by the number of ticks per second. I also cannot see a case where the result of a non-trivial algorithm can have its bounds decided statically, certainly not any iterative algorithm. I was thinking that a range conversion operation with run-time checking would fill this gap, but the approach that Ben Robinson has taken with his MetaBoundedInt seems much more fruitful and reasonable.

Sincerely,
Leif Linderstam
participants (4)
- Ben Robinson
- Christopher Jefferson
- Leif Linderstam
- Mathias Gaunard