On 12/12/15 9:11 AM, Phil Endecott wrote:
Hi Robert,
Robert Ramey wrote:
creating types which carry their ranges around with them through expressions.
I'm curious to know how well this works in practice, i.e. how much of the time do you need run-time checking in real programs.
That makes two of us.
Here are a few examples that have occurred to me:
1. Arguments to functions
void f(safe<int> i) {
    int8_t j = i; // Needs a run-time check unless we know i < 128
}

correct.
f(3); // Run-time check happens, even though value is known.
The constructor of safe<int> doesn't do a runtime check when it's being initialized from an int. The assignment to int8_t is what requires the runtime check.
Even though the body of f is visible at the point of the call,
In general this is not true. f could well be compiled separately.
I don't think an approach based on types can avoid the run-time check unless the function args are templates:

template<typename RANGE_TYPE>
void f(RANGE_TYPE i) {
    int8_t j = i;
}

f(3);
here is a simpler way:

void f(safe<int> i) {
    int j = i; // No run-time check needed - any safe<int> value fits in an int
}

also

void f(safe<int> i) {
    auto j = i; // j is safe<int>; no run-time check needed
}
Here you can presumably arrange things so that the type of i inside f includes the range information and the run-time check can be avoided. But this does rely on the body of f being visible at the point of the call.
nope - it relies upon picking types which are large enough and carry their range information with them. The "automatic" promotion policy can do this automatically.
The verbosity of the function declaration should improve a bit with concepts, but it's not really a drop-in replacement.
This is true - used as a drop-in replacement, one will get more runtime checking.
Related to this is function return types:
safe<int> incr(safe<int> i) {
    return i + 1;
}

int8_t j = incr(42); // Run-time check or not?
No runtime check when the safe<int> argument is constructed on the stack. Yes, a runtime check when calculating the result of i + 1, since that addition could overflow. No runtime check when returning the result: C++ expression rules mean that the result of i + 1 is an int, and no check is required to construct a safe<int> from an int. In fact, probably zero overhead due to NRVO. Yes, a runtime check on the assignment int8_t j = incr(42). Unless incr is made constexpr, at which point everything moves to compile time: zero runtime checking - actually zero execution for the whole thing!
Maybe auto return types help a bit but what about
return (i%2==0) ? i/2 : 3*i+1;
what about it?
2. Merging constraints
Example:
void g(int j);
void f(safe<int> i) {
    for (int a = 0; a < 1000000; ++a) {
        g(i + a);
    }
}
Does that generate a million run-time checks?
yep (note: on machines where int is 16 bits, a itself will overflow)
If my expectation is that i will be "small", most likely I would be happy with just one check at the start of f:
assert(i < maxint - 1000000);
of course. But maybe you'd consider the following equivalent:

void f(safe_integer_range<0, maxint> i) {
    safe_integer_range<0, 1000000> a;
    for (a = 0; a < 1000000; ++a) {
        g(i + a);
    }
}

If maxint is less than 2^32 - 1000000 then no runtime checking will occur in the function. Note how this differs from the usage of assert: with assert we check during the debug build, but when we ship we throw away the seat belts. With safe types, in this case, we get zero runtime overhead and we're guaranteed not to crash at runtime. At this point we're beyond the "drop-in" replacement stage. But we're also getting safety and performance guarantees we've never been able to get before.
3. Implied constraints:
vector<S> v;
f(v); // fills v
safe<size_t> bytes = v.size() * sizeof(S);

I know that the multiplication can't overflow, because if v.size() was large enough that it would overflow then I would have run out of memory already.
Hmmm - I don't actually know that for a fact. I've always found the usage of size type confusing. I don't really know that it's not possible to allocate a vector whose total space occupies more than 2^32 bytes (or is it 2^64).
But in any case, since construction safe
So as I say, I'm curious to hear whether you've applied your safe<int> to any real applications and measured how many of the potential run-time checks are avoidable.
Only my tests and examples. I'm hoping someone will create some performance tests.
My fear is that it could be fewer than you hope, and that the checks remain inside loops rather than being hoisted outside them.
We'll just have to see.
Another reservation about this is the question of what should happen when an overflow is detected. Exceptions are not great, because you have to worry about what you'll do when you get one.
The library has the concept of an exception policy. Throwing an exception is the default, but you can select a different one or make your own. It's not hard, but you really have to spend time thinking about what you want to do.

One thing you might want to look at is the usage of the "automatic" type promotion policy (another policy - the default is "native"). When using this policy, types returned from binary expressions are almost always large enough to hold the expression results - so we're back to zero runtime checking.

Another thing you might look into is safe_..._range, which does the opposite: it attaches range information to your variables at compile time so that runtime checking can be skipped when it's known not to be necessary.

Then there is the fact that all this is constexpr friendly, which can have a huge impact on performance.
One ambitious idea is to find a way to fall back to arbitrary precision arithmetic:
template<typename INT>
void f(INT i) {
    INT i2 = i * i;
    INT j = (i2 + 4096) << 2;
    g(j % 7);
}
try {
    f<int>(n);
} catch (overflow&) {
    f<bigint>(n); // some arbitrary-precision integer type
}

How much syntactic sugar can we hide that in?
This is too ambitious for me. But you could do it with your own custom promotion policy. The requirements for such a policy are in the documentation, and you can look at how the "automatic" promotion policy does it. It specifies the return types from binary operations depending on the types of the operands, in such a way as to minimize the need for runtime checking. If you make your own policy and name it "phil" then using it would be:
template<typename T>
using safe_t = safe<T, phil>;
I recently experimented with LLVM's thread safety checking (see http://clang.llvm.org/docs/ThreadSafetyAnalysis.html ); it lets you annotate variables to indicate that locks must be held when they are accessed:
mutex m;
int i GUARDED_BY(m);
I've been impressed by that, and I wonder if attributes defining expected ranges and static analysis might be a more practical way to find potential overflows.
using safe_..._range(min, max) seems equivalent to me.

I'm now trying to make the pièce de résistance work: an error trapping policy which uses static_assert, so that for things like embedded systems one could get a compile time error anywhere something could possibly overflow - no runtime checking and no exception handling required. I'm having difficulty with this as we speak.

My expectation is to see the library used in the following manner:

a) some safety critical program has a bug which is suspected to be addressable by safe types
b) in desperation, someone just changes all the int, ... types via an alias to their safe equivalents
c) the program is compiled and run - and a bug is found. Hooray.
d) the alias is redefined back to the built-in equivalents
e) but wait - some idiot consults a lawyer. He says: well, now you know how to guarantee that the program has no bugs (remember, he's a lawyer), so you'd better make sure you include this facility in case we get sued. We have to be able to prove we did everything humanly possible to avoid bugs.
f) someone complains that the program is going to be too slow. But under pressure from the authorities they produce a benchmark, and it IS slower - though not by as much as was feared.
g) so now someone says: what would it take to make it faster? At this point some changes are made so that we're not doing as much shrinking of types. Perhaps the library's "automatic" policy is used, which expands intermediate and auto variables to be big enough to hold the results. Perhaps safe_..._range is used to specify real ranges. This eliminates the performance bottleneck. But we're no longer using safe types as "drop-in" replacements - we've created new types selected to (almost) never trap.

It's just a speculative scenario, but it's not unbelievable. I hope it can be seen that I've spent considerable time thinking about this. Now I'm afraid to get on an airplane or get an MRI. It's that bad.

Robert Ramey