Re: [boost] [math distributions] where to check for validity of distribution variables?

22 Nov 2008

      Thijs van den Berg wrote:
...
...
...
That's what the existing distributions do.  In fact we could omit
most of the subsequent parameter checking code if we could figure
out whether the error handlers will throw or not on error (in fact
we *can* get this information at compile time and make the
subsequent checks a no-op if we know that the constructor would
have thrown on error... we just ran out of time on that refinement).
I don't understand this; it's down to my lack of knowledge on
this... If you ensure that the parameters get checked in the
constructor, why would that check *not* throw an error when needed?
Correct: the constructor might not throw if the parameters are invalid *and* 
the current policy for handling domain errors is something other than 
throwing an exception.  Of course exception throwing is the default, and 
highly recommended, but there are some situations where exceptions aren't 
allowed, and returning a NaN from any function that uses the (invalid) 
distribution is the correct thing to do.  In fact custom error handlers can 
return a *user-defined error-value*, which should be propagated back to the 
caller of the non-member functions if the parameters to the distribution are 
invalid.
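To illustrate the point, here is a minimal sketch with two hypothetical error policies (throwing_policy and nan_policy, loosely modelled on the idea behind Boost.Math's domain-error policies but not the real Boost types) and a hypothetical toy_laplace class. With the non-throwing policy the constructor lets an invalid scale through, so the non-member pdf() has to check again and return the policy's error value:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical policies (not Boost.Math's): one throws, one returns NaN.
struct throwing_policy {
    static double domain_error(const char* msg) {
        throw std::domain_error(msg);
    }
};
struct nan_policy {
    static double domain_error(const char*) {
        return std::numeric_limits<double>::quiet_NaN();
    }
};

// Hypothetical toy Laplace distribution. The constructor reports bad
// parameters through the policy, so with nan_policy it does NOT throw
// and the object ends up holding an invalid scale.
template <class Policy>
class toy_laplace {
public:
    toy_laplace(double location, double scale)
        : m_location(location), m_scale(scale) {
        if (!std::isfinite(location) || !(scale > 0) || !std::isfinite(scale))
            Policy::domain_error("toy_laplace: invalid parameters");
    }
    double location() const { return m_location; }
    double scale() const { return m_scale; }
private:
    double m_location, m_scale;
};

// Non-member pdf: must re-check, because a non-throwing policy may have
// allowed an invalid distribution to be constructed.
template <class Policy>
double pdf(const toy_laplace<Policy>& d, double x) {
    double b = d.scale();
    if (!std::isfinite(d.location()) || !(b > 0) || !std::isfinite(b))
        return Policy::domain_error("pdf: invalid parameters");
    return std::exp(-std::fabs(x - d.location()) / b) / (2 * b);
}
```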

The reference for error handling policies is here: 
http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too..., 
but it's best to read the tutorial 
http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too... 
first, as that gives an end-user perspective.
...
...
Compile time might be tricky depending on the complexity of the
parameter validation code, but simple range checks on the parameters
could be done at compile time. What mechanism are you thinking of
for compile-time checking, e.g. that scale > 0?
Ah, I don't mean compile-time checking of parameters; I mean:

If the current policy in effect (which *is* known at compile time) mandates 
throwing on a domain error, then we know for sure that the constructor would 
have thrown if the parameters were invalid.  In that case *only* can we omit 
checking the parameters again in the body of the non-member functions, as we 
know they must be OK.
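A sketch of that idea in C++17, assuming each policy exposes a hypothetical compile-time flag throws_on_domain_error (Boost.Math's actual policy machinery is not reproduced here). When the flag is true, the re-check in the non-member function is discarded by if constexpr; otherwise it stays in:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical policies: each one states at compile time whether its
// domain-error handler throws.
struct throwing_policy {
    static constexpr bool throws_on_domain_error = true;
    static double domain_error(const char* msg) { throw std::domain_error(msg); }
};
struct nan_policy {
    static constexpr bool throws_on_domain_error = false;
    static double domain_error(const char*) {
        return std::numeric_limits<double>::quiet_NaN();
    }
};

inline bool params_ok(double location, double scale) {
    return std::isfinite(location) && std::isfinite(scale) && scale > 0;
}

// Hypothetical toy Laplace distribution: the constructor always validates.
template <class Policy>
class toy_laplace {
public:
    toy_laplace(double location, double scale)
        : m_location(location), m_scale(scale) {
        if (!params_ok(location, scale))
            Policy::domain_error("toy_laplace: invalid parameters");
    }
    double location() const { return m_location; }
    double scale() const { return m_scale; }
private:
    double m_location, m_scale;
};

template <class Policy>
double cdf(const toy_laplace<Policy>& d, double x) {
    // A throwing policy guarantees the constructor rejected bad parameters,
    // so this branch is discarded at compile time; otherwise re-check.
    if constexpr (!Policy::throws_on_domain_error) {
        if (!params_ok(d.location(), d.scale()))
            return Policy::domain_error("cdf: invalid parameters");
    }
    double t = (x - d.location()) / d.scale();
    return t < 0 ? 0.5 * std::exp(t) : 1 - 0.5 * std::exp(-t);
}
```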
...
...
...
...
...
I'll work out the parameter idea in the Laplace distribution
code...
OK good.
John,
I got a bit of Laplace code to share!
Cool :-)
...
...
I still need to test the numerical results, but it compiles without
errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Go ahead and commit to the sandbox version of Boost.Math.  If you let me know 
when you think it's release ready (or not), I'll know whether it's OK for 
that addition to be merged to the Trunk.
...
...
I have 3 ideas in the code I'd like to discuss.
* a public member function "check_parameters" in the distribution class
Looks fine, but I would make it const so that it can be called on 
const-qualified distributions.  If check_parameters needs to cache/change 
something, then that can always be declared mutable as a last resort.
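A minimal sketch of a const check_parameters(), using a hypothetical toy_laplace class that throws std::domain_error directly instead of consulting an error-handling policy. Because the member is const, it can be called from functions that only hold a const reference; no mutable members are needed here since nothing is cached:

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical distribution class; policies are omitted for brevity.
class toy_laplace {
public:
    toy_laplace(double location, double scale)
        : m_location(location), m_scale(scale) {
        check_parameters();  // throws on invalid input
    }
    // const, so it can be called on const-qualified distributions
    void check_parameters() const {
        if (!std::isfinite(m_location))
            throw std::domain_error("location must be finite");
        if (!(m_scale > 0) || !std::isfinite(m_scale))
            throw std::domain_error("scale must be finite and > 0");
    }
    double location() const { return m_location; }
    double scale() const { return m_scale; }
private:
    double m_location, m_scale;
};

// A non-member function taking a const reference can still call it.
// The variance of a Laplace distribution is 2 * scale^2.
inline double variance(const toy_laplace& d) {
    d.check_parameters();
    return 2 * d.scale() * d.scale();
}
```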
...
...
* a public member function operator() that allows run-time changing of
dist parameters. I know that's a big change... I myself could use
something like this. E.g. I have some code that calibrates a stochastic
model based on time series data & stores the estimated distribution
parameters in a file. Another program will read the distribution
parameters from that file, create distribution objects, and do
probability calculations with them. I can only do that when I can set
the distribution parameters at *runtime*.
*If* we support changing the parameters then IMO it shouldn't be an 
operator(): that's reserved for function-like objects, and that's not what 
we have here.

The thing is, there are some distributions where the valid range of one 
parameter may depend upon the values of others, so I'm not so keen on 
setting one parameter at a time (although it could clearly be done in this 
case).  So what's wrong with:

mydist d(1, 2);
// do something
d = mydist(3, 4);
// do something else

Currently all the distributions are assignable and cheap to copy; is that 
likely to change?  We could insist that all distributions are cheap to copy, 
by using the PIMPL technique and copy-on-write for distros with lots of 
data.  Otherwise, let's add a reset() member function that sets all the 
parameters at once.
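A minimal sketch of the reset() alternative, using a hypothetical toy_laplace class in place of mydist and throwing directly rather than going through an error-handling policy (both assumptions for illustration). reset() validates the whole parameter set before touching any member, so parameters whose valid ranges depend on each other can be checked together, and a failed reset leaves the object unchanged:

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical two-parameter distribution standing in for "mydist".
class toy_laplace {
public:
    toy_laplace(double location = 0, double scale = 1) { reset(location, scale); }

    // Sets *all* parameters at once. Validation happens before any member
    // is modified, so the object is unchanged if an exception is thrown.
    void reset(double location, double scale) {
        if (!std::isfinite(location) || !(scale > 0) || !std::isfinite(scale))
            throw std::domain_error("toy_laplace: invalid parameters");
        m_location = location;
        m_scale = scale;
    }
    double location() const { return m_location; }
    double scale() const { return m_scale; }
private:
    double m_location = 0, m_scale = 1;
};
```

For a cheap-to-copy type like this, d.reset(3, 4) and d = toy_laplace(3, 4) do the same thing; reset() would only start to pay off if copying a distribution became expensive.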
...
...
* no more checking for distribution parameters in the non-member
functions. Checking is only done when the distribution parameters get
set or get changed. But as I said before, I don't have a good grasp of
the subtle issues with that. You said "if we could figure out whether
the error handlers will throw or not", implying that there are
complexities with this.
Yep: see above.
...
...
At the moment, I just have the code. If you think the code is OK, then
how would I go about documentation & testing? Do you have some
structure in place for that? I've seen quite a lot of code in the
sandbox/math, ...concept etc...
The best thing is to look at the tests for the other distributions as examples. 
We try and obtain independent test data for all the distributions - even if 
it's of limited precision - to sanity check our implementations.

In this case, since we're trivially calling std lib functions, there 
shouldn't be any need to generate high precision test data for accuracy 
testing, just make sure you test all the corner cases, and error handling.
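As an illustration of the corner cases and error handling worth covering, here is a framework-free sketch using plain assert against a hypothetical toy_laplace and cdf (real tests would exercise the Boost.Math class through the existing test framework instead). Treating a NaN argument as a domain error is an assumption of this sketch:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for the distribution under test.
struct toy_laplace {
    double location, scale;
    toy_laplace(double l, double s) : location(l), scale(s) {
        if (!std::isfinite(l) || !(s > 0) || !std::isfinite(s))
            throw std::domain_error("invalid parameters");
    }
};

inline double cdf(const toy_laplace& d, double x) {
    if (std::isnan(x)) throw std::domain_error("x is NaN");  // assumption
    double t = (x - d.location) / d.scale;
    return t < 0 ? 0.5 * std::exp(t) : 1 - 0.5 * std::exp(-t);
}

// Corner cases and error handling, checked with plain assert.
inline void test_corner_cases() {
    const double inf = std::numeric_limits<double>::infinity();
    const double nan = std::numeric_limits<double>::quiet_NaN();
    toy_laplace d(0, 1);
    assert(cdf(d, -inf) == 0);  // 0.5 * exp(-inf) == 0
    assert(cdf(d, inf) == 1);   // 1 - 0.5 * exp(-inf) == 1
    assert(cdf(d, 0) == 0.5);   // the median equals the location
    // symmetry about the location: F(-x) + F(x) == 1
    assert(std::fabs(cdf(d, -2) + cdf(d, 2) - 1) < 1e-15);

    // every invalid scale must be rejected
    const double bad_scales[] = {0.0, -1.0, inf, nan};
    for (double s : bad_scales) {
        bool threw = false;
        try { toy_laplace bad(0, s); } catch (const std::domain_error&) { threw = true; }
        assert(threw);
    }
    bool threw = false;
    try { cdf(d, nan); } catch (const std::domain_error&) { threw = true; }
    assert(threw);
}
```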

For the docs, taking something like the docs for the normal or exponential 
distribution as a starting point should get you going.

Re the code:

PDF: looks like the sign of the value passed to exp() is the wrong way 
around (could be wrong about that).  Sign in CDF might be suspect too.
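For comparison, these are the standard textbook Laplace density and distribution function with location μ and scale b > 0 (not taken from the attached code). The exponent in the PDF is never positive, which gives a quick check on its sign:

```latex
f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left(-\frac{|x - \mu|}{b}\right)

F(x \mid \mu, b) =
\begin{cases}
  \tfrac{1}{2} \exp\!\left(\frac{x - \mu}{b}\right),      & x < \mu \\
  1 - \tfrac{1}{2} \exp\!\left(-\frac{x - \mu}{b}\right), & x \ge \mu
\end{cases}
```

In both branches of F the exponent is also non-positive, so exp() cannot overflow for finite x.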

CDF: 1-exp(x) should probably be -expm1(x) for accuracy.
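The accuracy issue behind that suggestion, sketched in C++ with hypothetical helper names: when x is close to zero, exp(x) rounds to a number very near 1 and the subtraction cancels most of the significant digits, whereas std::expm1 (C++11, <cmath>) computes exp(x) - 1 accurately even for tiny x:

```cpp
#include <cassert>
#include <cmath>

// Naive form: loses most significant digits to cancellation for small |x|.
inline double one_minus_exp_naive(double x) { return 1 - std::exp(x); }

// Accurate form: 1 - exp(x) == -(exp(x) - 1) == -expm1(x).
inline double one_minus_exp_accurate(double x) { return -std::expm1(x); }
```

For x = 1e-10 the naive version is already wrong in roughly the ninth significant digit, while the expm1 version is correct to full double precision.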

Quantile: not sure about the formulae here, will look again when I have more 
time.
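For reference when checking the quantile, a sketch of the standard textbook inverse of the Laplace CDF for location mu and scale b (not necessarily the formulae in the attached code), together with the matching CDF so a round trip can be checked; the function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Textbook Laplace quantile:
//   Q(p) = mu + b * log(2p)       for p <  1/2
//   Q(p) = mu - b * log(2 - 2p)   for p >= 1/2
// p == 0 gives -infinity and p == 1 gives +infinity.
inline double laplace_quantile(double mu, double b, double p) {
    if (!(b > 0) || !(p >= 0 && p <= 1))
        throw std::domain_error("laplace_quantile: invalid argument");
    return p < 0.5 ? mu + b * std::log(2 * p)
                   : mu - b * std::log(2 - 2 * p);
}

// Matching CDF, used to check that cdf(quantile(p)) == p.
inline double laplace_cdf(double mu, double b, double x) {
    double t = (x - mu) / b;
    return t < 0 ? 0.5 * std::exp(t) : 1 - 0.5 * std::exp(-t);
}
```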

HTH, John.

John Maddock