[math distributions] where to check for validity of distribution variables?

I was playing around with implementing the (simple) Laplace distribution to get some feeling for the "math/distributions" code and concepts. A lot of the non-member functions for statistical distributions (like pdf, cdf) make a local copy of the distribution's internal variables and validate those local copies. An alternative would be to move the checking into a member function of the distribution.

Instead of

// CREATE LOCAL COPY OF DISTRIBUTION VARIABLES
RealType sd = dist.standard_deviation();
RealType mean = dist.mean();
...
// VALIDATE THOSE LOCAL VARIABLES
RealType result;
if(false == detail::check_scale(function, sd, &result, Policy()))
{
   return result;
}
if(false == detail::check_location(function, mean, &result, Policy()))
{
   return result;
}

do it like this:

// VALIDATE DISTRIBUTION
RealType result;
if(false == dist.check())
{
   return result;
}
// CREATE LOCAL COPY OF VALID DISTRIBUTION VARIABLES
RealType sd = dist.standard_deviation();
RealType mean = dist.mean();

I see two benefits:
* the code is more compact this way, and code that is repeated in most of the non-member functions is centralized (in the distribution);
* for multivariate distributions the checking can be expensive (the covariance matrix of a multivariate normal needs to be positive semi-definite). We could do the checking only once in the constructor & cache the result.

What do you think? We might turn "having valid parameters" into a property of *all* distributions. As an alternative, we might add a non-member function bool valid<distributionType>..., but that wouldn't allow for caching validation in e.g. a constructor. (A rough sketch of the member-function idea follows at the end of this mail.)

In general (but in the scope of the constructs used in math/distributions & its non-member functions): what are the arguments for placing code in either member or non-member functions?
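To make the first idea concrete, here is roughly the shape I have in mind (a sketch only: the class layout and accessor names are illustrative, not the existing Boost.Math interface, and it is assumed to live inside namespace boost::math next to the other distributions; passing &result into check() keeps the policy-selected error value flowing back to the caller):

template <class RealType = double, class Policy = policies::policy<> >
class laplace_distribution
{
public:
   laplace_distribution(RealType location = 0, RealType scale = 1)
      : m_location(location), m_scale(scale) {}

   RealType location() const { return m_location; }
   RealType scale() const    { return m_scale; }

   // Centralized parameter validation; a later refinement could cache the result.
   bool check(const char* function, RealType* result) const
   {
      return detail::check_scale(function, m_scale, result, Policy())
          && detail::check_location(function, m_location, result, Policy());
   }

private:
   RealType m_location;
   RealType m_scale;
};

template <class RealType, class Policy>
RealType pdf(const laplace_distribution<RealType, Policy>& dist, const RealType& x)
{
   // VALIDATE DISTRIBUTION (one call instead of repeating the checks here)
   RealType result = 0;
   if(false == dist.check("pdf", &result))
      return result;                       // error value chosen by the Policy
   // CREATE LOCAL COPY OF VALID DISTRIBUTION VARIABLES
   RealType scale = dist.scale();
   RealType location = dist.location();
   using std::exp; using std::fabs;
   return exp(-fabs(x - location) / scale) / (2 * scale);
}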

Thijs van den Berg wrote:
I was playing around with implementing the (simple) Laplace distribution to get some feeling with "math/distributions" code and concepts.
A lot of non-member functions for statistical distributions (like pdf, cdf) make a local copy of the distribution's internal variables, and validate those local copies. An alternative would be to move the checking to a member function of the distribution.
instead of
// CREATE LOCAL COPY OF DISTRIBUTION VARIABLES
RealType sd = dist.standard_deviation();
RealType mean = dist.mean();
...
// VALIDATE THOSE LOCAL VARIABLES
RealType result;
if(false == detail::check_scale(function, sd, &result, Policy()))
{
   return result;
}
if(false == detail::check_location(function, mean, &result, Policy()))
{
   return result;
}
do it like this
// VALIDATE DISTRIBUTION
RealType result;
if(false == dist.check())
{
   return result;
}
// CREATE LOCAL COPY OF VALID DISTRIBUTION VARIABLES
RealType sd = dist.standard_deviation();
RealType mean = dist.mean();
I see two benefits: * the code is more compact this way and repeated code in most of the non member functions is centralized (into the distribution)
Nod.
* for multivariate distributions, the checking can be expensive (a covariance matrix of a multivariate normal needs to be semi positive definite). We could do the checking only once in the constructor & cache the result.
Nod.
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
In general (but in the scope of the constructs used in math/distributions & its non member functions): what are the arguments for placing code in either member or non-member functions ?
For implementation details, use whatever works best, for interfaces non-members that operate uniformly on a range of types seem to work best. John.

What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that. I'll work out the parameter idea in the Laplace distribution code...
In general (but in the scope of the constructs used in math/distributions & its non member functions): what are the arguments for placing code in either member or non-member functions ?
For implementation details, use whatever works best, for interfaces non-members that operate uniformly on a range of types seem to work best.
John.
That's very pragmatic! I like that, thanks.

Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I'll work out the parameter idea in the Laplace distribution code...
OK good.
In general (but in the scope of the constructs used in math/distributions & its non member functions): what are the arguments for placing code in either member or non-member functions ?
For implementation details, use whatever works best, for interfaces non-members that operate uniformly on a range of types seem to work best.
John.
That's very pragmatic! I like that, thanks.
We believe in pragmatism round here :-) John.

John Maddock wrote:
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this; it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed? Compile time might be tricky depending on the complexity of the parameter validation code, but a simple range check on the parameters could be done at compile time. What mechanism are you thinking of for compile-time checking, e.g. that scale > 0?
I'll work out the parameter idea in the Laplace distribution code...
OK good.
John, I got a bit of Laplace code to share! I still need to test the numerical results, but it compiles without errors/warnings, and it throws errors when parameters are invalid. Do I need to put the code somewhere? I've attached it to this mail...

I have 3 ideas in the code I'd like to discuss:
* a public member function "check_parameters" in the distribution class;
* a public member function operator() that allows run-time changing of dist parameters. I know that's a big change... I myself could use something like this. E.g. I have some code that calibrates a stochastic model based on time series data & stores the estimated distribution parameters in a file. Another program will read the distribution parameters from that file, create distribution objects, and do probability calculations with them. I can only do that when I can set the distribution parameters at *run time*;
* no more checking of distribution parameters in the non-member functions. Checking is only done when the distribution parameters get set or changed. But as said before, I have no good grasp of the subtle issues with that. You said "if we could figure out whether the error handlers will throw or not", implying that there are complexities with this.

At the moment, I just have the code. If you think the code is OK, then how would I go about documentation & testing? Do you have some structure in place for that? I've seen quite some code in the sandbox/math, ...concept etc... Cheers, Thijs

Thijs van den Berg wrote:
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Correct, the constructor might not throw if the parameters are invalid *and* the current policy for handling domain errors is something other than throwing an exception. Of course exception throwing is the default, and highly recommended, but there are some situations where exceptions aren't allowed, and returning a NaN from a function that uses the distribution is the correct thing to do. In fact custom error handlers can return a *user-defined error value*, which should be propagated back to the caller of the non-member functions if the parameters of the distribution are invalid. The reference for error handling policies is here: http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too..., but best to read the tutorial http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too... first, as that gives an end-user perspective.
Compile time might be tricky depending on the complicity of the parameter validation code, but simple range check on the parameters could be done compile time. What mechanism are your thinking about regarding compile checking, e.g. that scale>0?
Ah, I don't mean compile time checking of parameters, I mean: If the current policy in effect (which *is* known at compile time), mandates throwing on a domain error, then we know for sure that the constructor would have thrown if the parameters were invalid. In that case *only* we can omit checking the parameters again in the body of the non-member functions as we know they must be OK.
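Something along these lines, perhaps (a sketch only: the nested typedef name for the domain-error policy is from memory and would need checking against the policy headers before relying on it; the check_* calls are as in the existing code):

#include <boost/type_traits/is_same.hpp>

template <class RealType, class Policy>
RealType pdf(const laplace_distribution<RealType, Policy>& dist, const RealType& x)
{
   using namespace boost::math::policies;
   // Known at compile time: does this policy throw on domain errors?
   static const bool constructor_throws = boost::is_same<
      typename Policy::domain_error_type,
      domain_error<throw_on_error> >::value;

   RealType result = 0;
   if(!constructor_throws)
   {
      // The constructor may have let bad parameters through, so re-check here.
      if(false == detail::check_scale("pdf", dist.scale(), &result, Policy()))
         return result;
      if(false == detail::check_location("pdf", dist.location(), &result, Policy()))
         return result;
   }
   // ... go on and compute the pdf knowing the parameters are OK ...
   using std::exp; using std::fabs;
   return exp(-fabs(x - dist.location()) / dist.scale()) / (2 * dist.scale());
}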
I'll work out the parameter idea in the Laplace distribution code...
OK good.
John, I got a bit of Laplace code to share!
Cool :-)
I still need to test the numerical results, but I compiles without errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Go ahead and commit to the sandbox version of Boost.Math; if you let me know when you think it's release ready (or not), then I'll know when it's OK for that addition to be merged to the Trunk.
I have 3 idea's in the code I'd like to discuss. * a public member function "check_parameters" in the distribution class
Looks fine, but I would make it const so that it can be called on const-qualified distributions. If check_parameters needs to cache/change something, then that can always be declared mutable as a last resort.
* a public member function operator() that allows run-time changing of dist parameters. I know that's a big change... I myself could use something like this. E.g. I have some code that calibrates a stochastic model based on time series data & stores the estimated distribution parameters in a file. Another program will read the distribution parameters from that file, crate distributions objects, and do probability calculations with that. I can only do that when I can set the distribution parameters *runtime*.
*If* we support changing the parameters then IMO it shouldn't be an operator(): that's reserved for function-like objects, and that's not what we have here. The thing is, there are some distributions where the valid range of one parameter may depend upon the values of others, so I'm not so keen on setting one parameter at a time (although it could clearly be done in this case). So what's wrong with:

mydist d(1, 2);
// do something
d = mydist(3, 4);
// do something else

Currently all the distributions are assignable and cheap to copy; is that likely to change? We could insist that all distributions are cheap to copy, by using the PIMPL technique and copy-on-write for distros with lots of data. Otherwise let's add a reset() member function to set all the parameters.
* no more checking for distribution parameters in the non-member functions. Checking is only done when the distribution parameters get set or get changed. But as said before, I have no good grasp on the subtle issues with that. You said "if we could figure out whether the error handlers will throw or not", implying that there are complexities with this.
Yep: see above.
At the moment, I just have the code. It you think the code is ok, then how would I go about with documentation & testing? Do you have some structure in place for that? I've seen quite some code in the sandbox/math, ...concept etc...
The best thing is to look at the tests for the other distributions as examples. We try to obtain independent test data for all the distributions - even if it's of limited precision - to sanity check our implementations. In this case, since we're trivially calling std lib functions, there shouldn't be any need to generate high-precision test data for accuracy testing; just make sure you test all the corner cases, and the error handling. For the docs, if you take something like the docs for the normal or exponential as a starting point, that should get you going?

Re the code:
PDF: looks like the sign of the value passed to exp() is the wrong way around (could be wrong about that). The sign in the CDF might be suspect too.
CDF: 1-exp(x) should probably be -expm1(x) for accuracy.
Quantile: not sure about the formulae here, will look again when I have more time.

HTH, John.
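P.S. The expm1 point in general terms (a sketch; exactly where it applies in the Laplace formulae depends on how your branches are written):

#include <boost/math/special_functions/expm1.hpp>

// For small |u|, exp(u) is close to 1, so computing 1 - exp(u) directly
// cancels catastrophically, while -expm1(u) keeps full precision
// (expm1(u) computes exp(u) - 1 accurately).
template <class RealType>
RealType one_minus_exp(RealType u)
{
   return -boost::math::expm1(u);   // == 1 - exp(u), but accurate for u near 0
}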

John Maddock wrote:
Thijs van den Berg wrote:
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Correct, the constructor might not throw if the parameters are invalid, *and* the current policy for handling domain errors is something other than throwing an exception. Of course exception throwing is the default, and highly recomended, but there are some situations where exceptions aren't allowed, and returning a NaN when a function that uses the distribution is the correct thing to do. In fact custom error handlers can return a *user-defined error-value* which should be propagated back to the caller of the non-member functions if the parameters to the distribution are invalid.
The reference for error handing policies is here: http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too..., but best to read the tutorial http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too... first as that gives an end user perspective.
Compile time might be tricky depending on the complicity of the parameter validation code, but simple range check on the parameters could be done compile time. What mechanism are your thinking about regarding compile checking, e.g. that scale>0?
Ah, I don't mean compile time checking of parameters, I mean:
If the current policy in effect (which *is* known at compile time), mandates throwing on a domain error, then we know for sure that the constructor would have thrown if the parameters were invalid. In that case *only* we can omit checking the parameters again in the body of the non-member functions as we know they must be OK.
very clear, thanks!
I'll work out the parameter idea in the Laplace distribution code...
OK good.
John, I got a bit of Laplace code to share!
Cool :-)
I still need to test the numerical results, but I compiles without errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Go ahead and commit to the sandbox version of Boost.Math, if you let me know when you think it's release ready (or not), and I'll know it's OK for that addition to be merged to the Trunk then.
OK, I'll fix the code a bit more (especially the numerical validation). When it has some quality I'll put it in the sandbox.
I have 3 idea's in the code I'd like to discuss. * a public member function "check_parameters" in the distribution class
Looks fine, but I would make it const so that it can be called on const-qualified distributions. If check_parameters needs to cache/change something, then that can always be declared mutable as a last resort.
Hahaha, the compiler pointed that out! The non-member functions get passed a "const distribution", so it needs to be const indeed!

I think a good place for a validation caching mechanism could be the constructor. The constructor can then set a private bool, and that can be accessed via something like "bool is_valid() const {..};". This would imply a small change in error reporting. E.g. pdf(dist, x) will have two checks: 1) is dist valid? 2) is x valid? The first check can throw a domain_error telling that the distribution has (some combination of) invalid parameters. The throwing can be done in the pdf function or in the is_valid() member function. If we put it in the is_valid() member function, then it should maybe be called check_validity(), to make it clearer that it's doing some action. That's probably the best interface. check_validity() can either do a check on all parameters, or look up the cached checking result that was stored during the initial checking done in the constructor.
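Something like this, roughly (a sketch of the caching idea only; the names are illustrative, not code from the library):

template <class RealType = double, class Policy = policies::policy<> >
class some_distribution
{
public:
   some_distribution(RealType location, RealType scale)
      : m_location(location), m_scale(scale)
   {
      // Do the (possibly expensive) checks once; with a throwing policy
      // the check_* calls below have already thrown on bad input.
      RealType dummy = 0;
      m_valid = detail::check_scale("some_distribution", scale, &dummy, Policy())
             && detail::check_location("some_distribution", location, &dummy, Policy());
   }

   bool check_validity() const { return m_valid; }   // cached, so cheap to call

   RealType location() const { return m_location; }
   RealType scale() const    { return m_scale; }

private:
   RealType m_location;
   RealType m_scale;
   bool     m_valid;   // set once in the constructor
};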
* a public member function operator() that allows run-time changing of dist parameters. I know that's a big change... I myself could use something like this. E.g. I have some code that calibrates a stochastic model based on time series data & stores the estimated distribution parameters in a file. Another program will read the distribution parameters from that file, crate distributions objects, and do probability calculations with that. I can only do that when I can set the distribution parameters *runtime*.
*If* we support changing the parameters then IMO it shouldn't be an operator(): that's reserved for function like objects, and that's not what we have here.
The thing is, there are some distributions where the valid range of one parameter may depend upon the values of others, so I'm not so keen on setting one parameter at a time (although it could clearly be done in this case). So what's wrong with:
mydist d(1, 2);
// do something
d = mydist(3, 4);
// do something else
Currently all the distributions are assignable and cheap to copy, is that likely to change? We could insist that all distributions are cheap to copy, by using the PIMPL technique and copy-on-write for distros with lots of data. Otherwise let's add a reset() member function to set all the parameters.
That was indeed a useless idea, don't know what I was thinking! :) I was too involved (in other code) with template parameters, and was thinking of mydist<3, 4>() instead of mydist(3, 4).
* no more checking for distribution parameters in the non-member functions. Checking is only done when the distribution parameters get set or get changed. But as said before, I have no good grasp on the subtle issues with that. You said "if we could figure out whether the error handlers will throw or not", implying that there are complexities with this.
Yep: see above.
At the moment, I just have the code. It you think the code is ok, then how would I go about with documentation & testing? Do you have some structure in place for that? I've seen quite some code in the sandbox/math, ...concept etc...
The best thing is to see the tests for the other distributions as examples. We try and obtain independent test data for all the distributions - even if it's of limited precision - to sanity check our implementations.
In this case, since we're trivially calling std lib functions, there shouldn't be any need to generate high precision test data for accuracy testing, just make sure you test all the corner cases, and error handling.
Yes I will.
For the docs, if you take something like the docs for the normal or exponential as a starting point that should get you going?
Re the code:
PDF: looks like the sign of the value passed to exp() is the wrong way around (could be wrong about that). Sign in CDF might be suspect too.
CDF: 1-exp(x) should probably -expm1(x) for accuracy.
Quantile: not sure about the formulae here, will look again when I have more time.
John, I need to validate the code before you can waste your time on it. I'm currently collecting benchmark values & writing a test file. That should get rid of all the bugs. Paul gave me some good help with going that way (with the test)
HTH, John.
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

Thijs van den Berg wrote:
Hahaha the compiler pointed that out! The non-member function get a "const distribution" passed so it needs to const indeed! I think a good place for a validation caching mechanism could be in the constructor. The constructor can then set a private bool, and that can be accessed ala "RealType is_valid() const {..};"
This would imply a small change of error reporting. E.g. pdf(dist, x) will have two checks 1) is dist valid? 2) is x valid?
The first check can throw a domain_error telling that the distribution has (some combination) of invalid parameters. The throwing can be done in the pdf function or in the is_valid() member function.. If we put it in the is_valid() member function, then it should maybe be called check_validity(), to make it clearer that it's doing some action. That's probably the best interface. check_validity() can either do a check on all parameters,or lookup a the cached checking result that was stored during the initial checking done in the constructor.
Don't forget that the policy may be for domain errors to return a custom error value - or take some other action - so you would have to store the RealType result of the error checks on the parameters as well as the true/false result.
John, I need to validate the code before you can waste your time on it. I'm currently collecting benchmark values & writing a test file. That should get rid of all the bugs. Paul gave me some good help with going that way (with the test)
Nod. John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
John Maddock wrote:
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Often you just want to return a NaN, infinity or a 'best guess'. So John devised the rather complicated - but very useful - policies. Most important, they are needed to provide the C++ Standard library C-style error behaviour.

enum error_policy_type
{
   throw_on_error = 0,  // throw an exception.
   errno_on_error = 1,  // set ::errno & return 0, NaN, infinity or best guess.
   ignore_error   = 2,  // return 0, NaN, infinity or best guess.
   user_error     = 3   // call a user-defined error handler.
};
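For instance (a sketch; I haven't checked this against the 1.37 headers), selecting the 'ignore' behaviour for a normal distribution looks roughly like this:

#include <boost/math/distributions/normal.hpp>
#include <iostream>

int main()
{
   namespace pol = boost::math::policies;
   // Domain errors are ignored: functions return a NaN instead of throwing.
   typedef pol::policy<pol::domain_error<pol::ignore_error> > quiet_policy;

   boost::math::normal_distribution<double, quiet_policy> n(0.0, -1.0); // invalid sd, no throw
   std::cout << pdf(n, 0.5) << std::endl;   // prints nan rather than throwing
   return 0;
}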
Compile time might be tricky depending on the complicity of the parameter validation code, but simple range check on the parameters could be done compile time. What mechanism are your thinking about regarding compile checking, e.g. that scale>0?
The complexity of the policy options makes it much simpler to do a run-time check. You'd save a tiny bit of run time - but probably pay in compile time? Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
John Maddock wrote:
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Often you just want to return a NaN, infinity or a 'best guess'.
So John devised the rather complicated - but very useful - policies.
Most important they are needed to provide the C++ Standard library C-style error behaviour.
enum error_policy_type
{
   throw_on_error = 0,  // throw an exception.
   errno_on_error = 1,  // set ::errno & return 0, NaN, infinity or best guess.
   ignore_error   = 2,  // return 0, NaN, infinity or best guess.
   user_error     = 3   // call a user-defined error handler.
};
Hi Paul, thanks for the info! I'll have to delve into those concepts a bit more, I see.

Regarding the checking in non-member functions for the validity of the distribution: would it be possible for the distribution constructor to fail at run time (before the distribution parameters can be validated)? Would it be safe for me to assume that
* if a distribution validates its parameters in the constructor,
* and the constructor doesn't throw an error,
then
* there is no need to check the distribution parameters anymore after construction, e.g. in a non-member function?
If so, then I would send in new distributions with only checks in the constructor.

Another option is to check parameters (and throw errors) in the distribution parameter access member functions like "RealType location() const { return m_location; }". A possible drawback there is that sometimes a *combination* of parameters is valid or not.
Compile time might be tricky depending on the complexity of the parameter validation code, but a simple range check on the parameters could be done at compile time. What mechanism are you thinking of for compile-time checking, e.g. that scale > 0?
The complexity of policy options make it much simpler to do a run-time check.
You'd save a tiny bit on run-time - but probably pay in compile time?
Paul
I think the same about that: run time is good enough, and even unavoidable if you would allow distribution parameters to be set at run time. Btw, why isn't that implemented (allowing distribution parameters to be set at run time)? Lack of implementation time (postponed to future versions), or is it a design choice? Cheers, Thijs
--- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

Thijs van den Berg wrote:
Would it be safe for me to assume that * if a distribution validate its parameters in the constructor * if the constructor doesn't throw an error then * there is no need to check the distribution parameters anymore after construction, e.g. in a non-member function.
No, the parameters may be invalid but the current policy may dictate returning an error *value*; in such cases the constructor has to defer signalling the error until some other function is called that tries to use the distribution and which returns a value. John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 23 November 2008 20:11 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
John Maddock wrote:
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Often you just want to return a NaN, infinity or a 'best guess'.
So John devised the rather complicated - but very useful - policies.
Most important they are needed to provide the C++ Standard library C-style error behaviour.
enum error_policy_type
{
   throw_on_error = 0,  // throw an exception.
   errno_on_error = 1,  // set ::errno & return 0, NaN, infinity or best guess.
   ignore_error   = 2,  // return 0, NaN, infinity or best guess.
   user_error     = 3   // call a user-defined error handler.
};
Hi Paul, thanks for the info! I'll have to delve into those concepts a bit more, I see. Regarding the checking in non-member functions for the validity of the distribution: would it be possible for the distribution constructor to fail at run time (before the distribution parameters can be validated)? Would it be safe for me to assume that * if a distribution validates its parameters in the constructor * and the constructor doesn't throw an error then * there is no need to check the distribution parameters anymore after construction, e.g. in a non-member function? If so, then I would send in new distributions with only checks in the constructor.
Another option is to check parameters (and throw errors) in the distribution parameter access member functions like "RealType location() const { return m_location; }". A possible drawback there is that sometimes a *combination* of parameters is valid or not.
As I recall, because the chosen policy for the constructor might not cause it to throw, we decided on 'belt and braces' repeated checks, even if it proved redundant (because the check is cheap). If there are other combinations that might cause trouble, this means this is even more sensible.
Compile time might be tricky depending on the complexity of the parameter validation code, but a simple range check on the parameters could be done at compile time. What mechanism are you thinking of for compile-time checking, e.g. that scale > 0?
The complexity of policy options make it much simpler to do a run-time check.
You'd save a tiny bit on run-time - but probably pay in compile time?
Paul
I think the same about that: run time is good enough, and even unavoidable if you would allow distribution parameters to be set at run time. Btw, why isn't that implemented (allowing distribution parameters to be set at run time)? Lack of implementation time (postponed to future versions), or is it a design choice?
As I recall, construction (and destruction) is cheap (compared to a cdf, pdf etc.), so it is simplest to make users construct a new distribution. Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 23 November 2008 20:11 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
John Maddock wrote:
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
I don't understand this, it has to do with my lack of knowledge on this... If you ensure that the parameters get checked in the constructor, why would that check *not* throw an error when needed?
Often you just want to return a NaN, infinity or a 'best guess'.
So John devised the rather complicated - but very useful - policies.
Most important they are needed to provide the C++ Standard library C-style error behaviour.
enum error_policy_type
{
   throw_on_error = 0,  // throw an exception.
   errno_on_error = 1,  // set ::errno & return 0, NaN, infinity or best guess.
   ignore_error   = 2,  // return 0, NaN, infinity or best guess.
   user_error     = 3   // call a user-defined error handler.
};
Hi Paul, thanks for the info! I'll have to delve into those concepts a bit more, I see. Regarding the checking in non-member functions for the validity of the distribution: would it be possible for the distribution constructor to fail at run time (before the distribution parameters can be validated)? Would it be safe for me to assume that * if a distribution validates its parameters in the constructor * and the constructor doesn't throw an error then * there is no need to check the distribution parameters anymore after construction, e.g. in a non-member function? If so, then I would send in new distributions with only checks in the constructor.
Another option is to check parameters (and throw errors) in the distribution parameter access member functions like "RealType location() const { return m_location; }". A possible drawback there is that sometimes a *combination* of parameters is valid or not.
As I recall, because the chosen policy for the constructor might not cause it to throw, we decided on 'belt and braces' repeated checks, even if it proved redundant (because the check is cheap). If there are other combinations that might cause trouble, this means this is even more sensible.
Ah! You're saying you can have one type of policy for the distribution (constructor) and another policy type in some non-member function like pdf. That explains the things I'm seeing!

A final question regarding the error checking is this: suppose a distribution has a couple of valid and invalid parameters, e.g. normal(2,0), which has a valid mean=2 and an invalid std=0. Formally that would make the distribution object invalid... There are at least two possible views on what to do in the non-member functions:
1) Make *all* of them return NaN because the distribution is invalid. This is a mathematical interpretation.
2) (current implementation) Try to give an answer when possible; this is a "can we calculate the result?" interpretation. In this case we can calculate the mean (it's 2), but we can't calculate the pdf because that would give a divide by zero.

I'm asking this because I'd like to stick to your approach in new code, and *not* because I want to discuss a preference for either of the two... :)
Compile time might be tricky depending on the complexity of the parameter validation code, but a simple range check on the parameters could be done at compile time. What mechanism are you thinking of for compile-time checking, e.g. that scale > 0?
The complexity of policy options make it much simpler to do a run-time check.
You'd save a tiny bit on run-time - but probably pay in compile time?
Paul
I think the same about that: run time is good enough, and even unavoidable if you would allow distribution parameters to be set at run time. Btw, why isn't that implemented (allowing distribution parameters to be set at run time)? Lack of implementation time (postponed to future versions), or is it a design choice?
As I recall, construction (and destruction) is cheap (compared to a cdf, pdf etc) , it is simplest to make users construct a new distribution.
Yes, I agree, that's a good way to handle the problem I described within the current interface! Works fine!
Paul
--- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

Thijs van den Berg wrote:
Ah! You're saying you can have one type of policy for the distribution (constructor) and another policy type in some non-member function like pdf. That explains the things I'm seeing!
Nope, the policy that the distribution has applies to all the non-member functions as well, it's just that the *error handlers may not throw, but return an error value instead*.
A final question regarding the error checking is this: Suppose a distribution has a couple of valid an invalid parameters. E.g. normal(2,0), whith has a valid mean=2 and invalid std=0. Formally that would make the distribution object invalid... There are at least two possible view on what to do with non-member fuctions. 1) Make *all* of them return NaN because the distribution in invalid. This is a mathematical interpretation or 2) (current implementation) try to give an answer when possible, this is a "can we calculate the result?" interpretation. In this case we can calculate the mean (it's 2), but we can't calculate the pdf because that would give an divide by zero.
I'm asking this because I'd like to stick to you're approach with new code, and *not* because I want to discuss a preference for any of the two... :)
Ah, OK, in that case, let's continue with the status quo :-) John.

John Maddock wrote:
Thijs van den Berg wrote:
ah! you're saying you van have one type of policy for the distribution (construtor) and another policy type in some non-member function like pdf. that explains the things I'm seeing!
Nope, the policy that the distribution has applies to all the non-member functions as well, it's just that the *error handlers may not throw, but return an error value instead*.
A final question regarding the error checking is this: Suppose a distribution has a couple of valid an invalid parameters. E.g. normal(2,0), whith has a valid mean=2 and invalid std=0. Formally that would make the distribution object invalid... There are at least two possible view on what to do with non-member fuctions. 1) Make *all* of them return NaN because the distribution in invalid. This is a mathematical interpretation or 2) (current implementation) try to give an answer when possible, this is a "can we calculate the result?" interpretation. In this case we can calculate the mean (it's 2), but we can't calculate the pdf because that would give an divide by zero.
I'm asking this because I'd like to stick to you're approach with new code, and *not* because I want to discuss a preference for any of the two... :)
Ah, OK, in that case, let's continue with the status quo :-)
John.
Very good! Non-member functions will:
* only check the distribution parameters they need for their calculation (and no other parameters);
* return a sensible value if all those parameters are OK;
* not throw an error if other parameters have rendered the distribution mathematically invalid.
In that case, I don't see a really big use for an is_valid() or check_parameters() member function (for now).
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

Paul A. Bristow wrote:
Often you just want to return a NaN, infinity or a 'best guess'.
So John devised the rather complicated - but very useful - policies.
Most important they are needed to provide the C++ Standard library C-style error behaviour.
Hi Paul, John,

I'm trying to get a better feel for the error checking/validation behaviour you want in the dist lib. If I understand the objective, then I'll know better what type of code to write. If we have the following code that constructs a faulty distribution and uses it in follow-up steps, what would be the wanted behaviour regarding error handling?

1: normal N(0,0); // a normal distribution with invalid std
2: double y = pdf(N, 0.3);
3: double k = kurtosis(N);

Case 1: strict, formal
1: throw an error "invalid std"
2: throw an error "invalid dist passed to pdf"
3: throw an error "invalid dist passed to kurtosis"

Case 2: minimalistic
1: throw an error "invalid std"
2: *no* checking of the dist, divide-by-zero error
3: return "0"; all valid normal distributions have a kurtosis of 0. We don't even have to look at the distribution details.

Case 3: error tolerant, but preventing numerical errors
1: throw an error "invalid std"
2: throw an error "invalid dist passed to pdf"
3: return "0"; all valid normal distributions have a kurtosis of 0. We don't even have to look at the distribution details.

Cheers, thijs -- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

Thijs van den Berg wrote:
I we have the following code that constructs a faulty distribution and uses that in follow-up steps, what would be wanted behavior regarding error handling?
1: normal N(0,0); // a normal distribution with invalid std
2: double y = pdf(N, 0.3);
3: double k = kurtosis(N);
IMO, we should probably follow this one (but might not where the result is the same constant for all parameter values). However, as noted previously an error might not trigger an exception: we might be required to return a specific error value instead. John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
I got a bit of Laplace code to share! I still need to test the numerical results, but I compiles without errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Suggest you commit to sandbox Boost_sandbox\math_toolkit\boost\math\distributions
At the moment, I just have the code.
It you think the code is ok,
Looks plausible but I've not checked in detail.
then how would I go about with documentation & testing?
Do you have some structure in place for that?
I've seen quite some code in the sandbox/math, ...concept etc...
You need a test suite - follow the examples for similar distributions? Don't forget to say where you got the test values from (Wolfram, Matlab...) in comments. We've tried to use a variety of sources to reduce the risk of mistakes. I have MathCAD but it doesn't calculate Laplace I think.

I find using the MSVC IDE helpful for testing the test, but then you can add to the test jamfile.v2

"Run test_laplace.cpp"

Good idea to cut your teeth on a simple distribution before trying to run with a multivariate one ;-)

HTH Paul

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
I got a bit of Laplace code to share! I still need to test the numerical results, but it compiles without errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Suggest you commit to sandbox
Boost_sandbox\math_toolkit\boost\math\distributions
At the moment, I just have the code.
It you think the code is ok,
Looks plausible but I've not checked in detail.
then how would I go about with documentation & testing?
Do you have some structure in place for that?
I've seen quite some code in the sandbox/math, ...concept etc...
You need a test suite - follow the examples for similar distributions?
OK, I'll build a test suite & search for reference material (for the test values). Once I have that ready, and the tests show valid results, I'll add it to the sandbox. It might take a couple of days.
Don't forget to say where you got the test values from (Wolfram, Matlab...) in comments. We've tried to use a variety of sources to reduce the risk of mistakes. I have MathCAD but it doesn't calculate Laplace I think.
I find using MSVC IDE helpful for testing the test, but then you can add to the test jamfile.v2
"Run test_laplace.cpp"
Good idea to cut your teeth on a simple distribution before trying to run with a multivariate one ;-)
HTH
Paul
Yes, it's a great way to get an introduction to how things work here at Boost, as well as to the design of the math/distribution package! Thanks for the pointers, oops I mean references!

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 22 November 2008 14:48 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
I got a bit of Laplace code to share! I still need to test the numerical results, but it compiles without errors/warnings, and it throws errors when parameters are invalid.
Do I need to put the code somewhere? I've attached it to this mail...
Suggest you commit to sandbox
Boost_sandbox\math_toolkit\boost\math\distributions
At the moment, I just have the code.
It you think the code is ok,
Looks plausible but I've not checked in detail.
then how would I go about with documentation & testing?
Do you have some structure in place for that?
I've seen quite some code in the sandbox/math, ...concept etc...
You need a test suite - follow the examples for similar distributions?
Don't forget to say where you got the test values from (Wolfram, Matlab...) in comments. We've tried to use a variety of sources to reduce the risk of mistakes. I have MathCAD but it doesn't calculate Laplace I think.
I find using MSVC IDE helpful for testing the test, but then you can add to the test jamfile.v2
Hi Paul, I'm trying to set that up with VC++ 7.1 but am having a hard time compiling the unit test libs. I use a "console project", single-threaded, but it keeps nagging about the unit_test binary lib:

fatal error LNK1104: cannot open file 'libboost_unit_test_framework-vc71-sgd-1_37.lib'

and I don't seem to be able to build *that* one with bjam... I've tried

bjam --build-type=complete libs/test

but no luck so far... Do you have any quick tips? I've also tried the full header-include version of the unit test, and that also had issues. What's your approach/method/configuration for running unit tests on MSVC? Do you use the binaries?
"Run test_laplace.cpp"
Good idea to cut your teeth on a simple distribution before trying to run with a multivariate one ;-)
HTH
Paul
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 27 November 2008 13:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] Laplace distribution
I'm trying to set that up with VC++ 7.1 but am having a hard time compiling the unit test libs. I use a "console project", single-threaded, but it keeps nagging about the unit_test binary lib: fatal error LNK1104: cannot open file 'libboost_unit_test_framework-vc71-sgd-1_37.lib' and I don't seem to be able to build *that* one with bjam... I've tried
bjam --build-type=complete libs/test
but no luck so far... Do you have any quick tips?
I have a healthy loathing of bjam. The syntax is weird, and the docs never seem to tell you what you NEED to know... Does this help? http://www.nabble.com/Need-help-building-boost-on-Windows-XP-td19654083.html

bjam.exe toolset=msvc --with-test --with-filesystem --with-program_options \
--with-iostreams --with-threads --prefix=C:\Developer\Toolkits\Boost \
variant=release,debug threading=multi link=static runtime-link=shared install

link=static looks like what you need?
I've also tried tried the full header include version of the unit test, and that also had issues..
What's your approach/method/configuration on running unit test on MSVC?
I use the included version with the IDE, but the binary library with bjam for the full tests. It's a bitch - until you get set up. HTH Paul

Thijs van den Berg wrote:
I'm trying to set that up with C++ 7.1 but am having a hard time compiling the unit test lib's. I use a ''console projects", single threaded, but it keeps naggin about the binary lib unit_test fatal error LNK1104: cannot open file 'libboost_unit_test_framework-vc71-sgd-1_37.lib' and I don't seem to be able to build *that* one with bjam.... I've tried
bjam --build-type=complete libs/test
That would be:

bjam --build-type=complete --with-test toolset=msvc-7.1

HTH, John.

John Maddock, Paul Bristow,

Regarding unit testing: I got the unit testing working! Pfff... binaries, paths, compiler switches, post-build settings... Thanks to both of you! I copied a test case from "\libs\math\test" as a start and changed it into checking the Laplace numerics, but had some (final) minor problems. I solved those by changing
int test_main(int, char* [])
into
BOOST_AUTO_TEST_CASE( test1 )
hope that's ok...
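Roughly, the skeleton now looks like this (a sketch from memory: the values below are placeholders for the standard Laplace, not verified reference data, and the header path for the new distribution is assumed):

#define BOOST_TEST_MODULE test_laplace
#include <boost/test/included/unit_test.hpp>       // header-only variant; the linked lib works too
#include <boost/math/distributions/laplace.hpp>    // the new distribution (assumed path)
#include <limits>
#include <stdexcept>

BOOST_AUTO_TEST_CASE( test1 )
{
   using boost::math::laplace_distribution;

   laplace_distribution<double> L(0.0, 1.0);       // location 0, scale 1
   double tol = 10 * std::numeric_limits<double>::epsilon();

   // For the standard Laplace: pdf(0) = 1/(2*scale) = 0.5 and cdf(0) = 0.5
   BOOST_CHECK_CLOSE_FRACTION(pdf(L, 0.0), 0.5, tol);
   BOOST_CHECK_CLOSE_FRACTION(cdf(L, 0.0), 0.5, tol);
   // round trip: quantile(cdf(x)) == x
   BOOST_CHECK_CLOSE_FRACTION(quantile(L, cdf(L, 1.5)), 1.5, tol);
   // invalid scale must trigger the error handler (throws by default)
   BOOST_CHECK_THROW(laplace_distribution<double>(0.0, -1.0), std::domain_error);
}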
Anyway, I'll be able to add some validated code to the sandbox soon. After that... multi-dim Gaussians! So far it's been a great learning experience. Cheers, Thijs -- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel.+31 (0)6 2411 0061 Fax.+31 (0)15 285 1984 thijs@sitmo.com <mailto:thijs@sitmo.com> - www.sitmo.com <http://www.sitmo.com>

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 27 November 2008 20:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] Laplace distribution
I solved those by changing
int test_main(int, char* []) into BOOST_AUTO_TEST_CASE( test1 ) hope that's ok...
More than - an improvement - using the latest recommended way of starting Boost.Test
Anyway, I'll be able to add some validated code to the sandbox soon.
Ping when committed.

PS Take care to follow the other distribution tests as examples - there are several pits (into which I have fallen, serially). Since I think the formulae only use built-in functions like exp, they should be quite accurate and you should be able to set the tolerance to a few eps even when round tripping.

Don't forget too that (for full portability) constants should be declared as long double,

static_cast<RealType>(0.53958342416056554201085167134004L)

(unless they are 'exactly representable' like 0.5, 0.25...)
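In context that looks something like this (a sketch; the value is just the example above, not a constant from any real implementation):

template <class RealType>
RealType some_helper_constant()
{
   // Written as a long double literal, then cast down to the working type,
   // so float, double and long double all get their best approximation.
   static const RealType c =
      static_cast<RealType>(0.53958342416056554201085167134004L);
   return c;
}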
After that ...multi-dim-Gaussian's!
Now the real fun will begin! Paul

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 27 November 2008 20:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] Laplace distribution
I solved those by changing
int test_main(int, char* [])
into
BOOST_AUTO_TEST_CASE( test1 )
hope that's ok...
More than - an improvement - using the latest recommended way of starting Boost.Test
Anyway, I'll be able to add some validated code to the sandbox soon.
Ping when committed.
I will
PS Take care to follow other distribution tests as examples - there are several of pits (into which I have fallen, serially)
I now have some specific tests for the pdf and cdf values, but also some generic tests like
quantile(cdf(x)) == x
hazard(x) = pdf(x)/(1-cdf(x))
I still need to do the error-triggering checks and find more reference values. I'll take a good look at the other tests to look for the other pitfalls.
Since I think the formulae only use built-in functions like exp, they should be quite accurate and you should be able to set the tolerance as a few eps even when round tripping.
Don't forget too that (for full portability) constants should be declared as long double literals and cast, e.g.
static_cast<RealType>(0.53958342416056554201085167134004L)
(unless they are 'exactly representable' like 0.5, 0.25...)
good point!
After that ...multi-dim-Gaussian's!
Now the real fun will begin!
indeed... the matrix interface will be the difficult part. Oh boy... I'm thinking that it would be nice to have:
* default matrix storage & default matrix operators - that way it works stand-alone without the hassle of dependencies
* abstraction via bindings - give users the option to use uBLAS, ATLAS, LAPACK etc. instead of the simple internal matrix operators
but that design needs quite a bit of thinking...
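One possible shape for that abstraction, just to make the idea concrete (every name below is invented for this sketch; nothing like it exists in the library):

#include <boost/math/policies/policy.hpp>

// Sketch only: parameterise a multivariate distribution on a linear-algebra
// backend, so a small built-in type is the default and uBLAS/ATLAS/LAPACK
// bindings can be swapped in.
template <class RealType>
struct simple_matrix_backend
{
    // minimal storage plus the few operations the distribution needs
    // (Cholesky factorisation, triangular solves, quadratic forms, ...)
};

template <class RealType = double,
          class Backend  = simple_matrix_backend<RealType>,
          class Policy   = boost::math::policies::policy<> >
class multivariate_normal_distribution
{
public:
    typedef RealType value_type;
    // The constructor would take the mean vector and covariance matrix in the
    // Backend's types, validate positive semi-definiteness once, and cache the
    // factorisation - the caching idea discussed earlier in the thread.
};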
Paul

Paul, John, I've done a first commit to the sandbox!
1) sandbox\math_toolkit\boost\math\distributions\laplace.cpp
2) sandbox\math_toolkit\libs\math\test\test_laplace.cpp
..more later! (docs, equations, charts, discussion)
The unit test file was (is) quite a lot of work, ..there are *so many* things to consider/check. How about this: we could write some generic tests based on the properties of distributions...
General relations:
1) quantile(cdf(x)) == x
2) hazard(x) = pdf(x)/(1-cdf(x))
3) pdf(x, location, scale) = pdf( (x-location)/scale, 0, 1)/scale
4) cdf(x, location, scale) = cdf( (x-location)/scale, 0, 1)
5) cdf(complement(N,x)) = cdf(N(-x))
6) quantile(complement(N,p)) = quantile(N(-x,1-p))
perhaps some automatic checking (for all distributions) of error throwing:
7) support <-> cdf, pdf
8) quantile <-> p=0, p=1
And some generic tests for distributions with specific properties. Symmetric distributions:
pdf(x) = pdf(-x)
cdf(x) = 1 - cdf(-x)
etc. We could write template functions for that, that get passed a set of 'x' values, etc.
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel. +31 (0)6 2411 0061 Fax. +31 (0)15 285 1984 thijs@sitmo.com - www.sitmo.com
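One of those template functions might look roughly like this (a sketch only; the helper name and structure are mine, not existing test code, and it assumes the x values supplied are nonzero and strictly inside the support of a distribution symmetric about zero):

#include <boost/test/unit_test.hpp>
#include <vector>

// Reusable property check: the quantile/cdf round trip (relation 1) plus the
// symmetric-distribution identities, at a caller-supplied tolerance fraction.
template <class Distribution, class RealType>
void check_symmetric_about_zero(const Distribution& dist,
                                const std::vector<RealType>& xs,
                                RealType tol)
{
    for (std::size_t i = 0; i < xs.size(); ++i)
    {
        RealType x = xs[i];
        BOOST_CHECK_CLOSE_FRACTION(quantile(dist, cdf(dist, x)), x, tol); // 1) round trip
        BOOST_CHECK_CLOSE_FRACTION(pdf(dist, x), pdf(dist, -x), tol);     // pdf symmetry
        BOOST_CHECK_CLOSE_FRACTION(cdf(dist, x), 1 - cdf(dist, -x), tol); // cdf symmetry
    }
}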

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 28 November 2008 13:58 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] Laplace distribution
I've done a first commit to the sandbox!
1) sandbox\math_toolkit\boost\math\distributions\laplace.cpp 2) sandbox\math_toolkit\libs\math\test\test_laplace.cpp
..more later! (doc's, equations, charts, discussion)
The unit test file was (is) quite a lot of work, ..there are *so many* things to consider/check.
;-))
How about this: We could write some generic tests based on the properties of distributions...
General relations: 1) quantile(cdf(x)) == x 2) hazard(x) = pdf(x)/(1-cdf(x)) 3) pdf(x,location,scale) = pdf( (x-location)/scale, 0, 1)/scale 4) cdf(x,location,scale) = cdf( (x-location)/scale, 0, 1) 5) cdf(complement(N,x)) = cdf(N(-x)) 6) quantile(complement(N,p)) = quantile(N(-x,1-p))
perhaps some automatic checking (for all distributions) of error throwing 7) support <-> cdf, pdf 8) quantile <-> p=0, p=1
And some generic tests for distributions with specific properties. Symmetric distributions: pdf(x) = pdf(-x) cdf(x) = 1-cdf(-x)
etc. We could write template functions for that, that get passed a set of 'x' values, etc.
That would indeed be neat - but our tests just grew like Topsy ;-) And it would be quite a lot of work to change now. I also worry that the acceptable tolerances vary widely, so you would have to keep passing these as parameters anyway. We are also (still) a bit schizophrenic about whether to allow infinity as a parameter, so a uniform test to check throws would be difficult.
I note you haven't tried to deal with the long double case. Even if your system, like my MSVC, only does double == long double 64-bit reals, we should have the hooks in place for systems that do proper long doubles. This rather messy code does this, if appropriate:
#ifndef BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
  test_spots(0.0L); // Test long double.
#if !BOOST_WORKAROUND(__BORLANDC__, BOOST_TESTED_AT(0x0582))
  test_spots(boost::math::concepts::real_concept(0.)); // Test real concept.
#endif
#else
  std::cout << "<note>The long double tests have been disabled on this platform "
    "either because the long double overloads of the usual math functions are "
    "not available at all, or because they are too inaccurate for these tests "
    "to pass.</note>" << std::endl;
#endif
At least a few really accurate values could be calculated using the published formula using a 100 decimal digit calculator? But round tripping should test well and I don't expect any trouble for higher precision types (in the unlikely event that someone wants them - no accounting for some people's taste). (I wonder if we can get rid of the Borland test yet?)
You might also check that the convenience typedefs work, like this?
// Check that we can generate the lognormal distribution using the two convenience methods:
boost::math::lognormal myf1(1., 2); // Using typedef
lognormal_distribution<> myf2(1., 2); // Using default RealType double.
http://mathworld.wolfram.com/LaplaceDistribution.html is a good link for the doc? Weisstein, Eric W. "Laplace Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/LaplaceDistribution.html
<aside> (This looks rather amusing: http://demonstrations.wolfram.com/SampleVersusTheoreticalDistribution/). </aside>
Looking good to me. Paul
PS You can write 2., 1., 0.5, 0.25... without the L because they can be exactly represented as float, double, long double (unlike 0.1, 0.01...): static_cast<RealType>(0.5), which saves some typing and clutter.
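On the convenience-typedef point above, the analogous check for the new distribution would presumably be (assuming laplace.hpp follows the same convention and eventually adds a boost::math::laplace typedef - it may not yet):

#include <boost/math/distributions/laplace.hpp> // sandbox header, assumed

void check_laplace_convenience_typedefs()
{
    boost::math::laplace l1(0., 2.);                // using the convenience typedef
    boost::math::laplace_distribution<> l2(0., 2.); // using default RealType double
    (void)l1; (void)l2;
}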

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 28 November 2008 13:58 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] Laplace distribution
I've done a first commit to the sandbox!
1) sandbox\math_toolkit\boost\math\distributions\laplace.cpp 2) sandbox\math_toolkit\libs\math\test\test_laplace.cpp
..more later! (doc's, equations, charts, discussion)
The unit test file was (is) quite a lot of work, ..there are *so many*
things to consider/check.
;-))
How about this: We could write some generic test based on the properties of
distributions...
General relations: 1) quantile(cdf(x)) == x 2) hazard(x) = pdf(x)/(1-cdf(x)) 3) pdf(x,location,scale) = pdf( (x-location)/scale, 0, 1)/scale 4) cdf(x,location,scale) = cdf( (x-location)/scale, 0, 1) 5) cdf(complement(N,x)) = cdf(N(-x)) 6) quantile(complement(N,p)) = quantile(N(-x,1-p))
perhaps some automatic checking (for all distribution) of error throwing 7) support <-> cdf, pdf 8) quantile <-> p=0, p=1
And some generic test for distributions with specific properties Symmetric distributions: pdf(x) = pdf(-x) cdf(x) = 1-cdf(-x)
etc. we could write template functions for that, that get passed a set of 'x'
values etc
That would indeed be neat - but our tests just grew like Topsy ;-)
And it would be quite a lot of work to change now.
I also worry that the acceptable tolerances vary widely, so you would have to keep passing these as parameters anyway.
We also (still are) a bit schizophrenic about whether to allow infinity as parameter, so a uniform test to check throw would be difficult.
great idea. We could make an orthogonal unit test, testing all distributions for this specific case
I note you haven't tried to deal with the long double case. Even if your system like my MSVC only does double == long double 64 bit reals, we should have the hooks in place for systems that do proper long doubles.
This rather messy code does this, if appropriate.
#ifndef BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
  test_spots(0.0L); // Test long double.
#if !BOOST_WORKAROUND(__BORLANDC__, BOOST_TESTED_AT(0x0582))
  test_spots(boost::math::concepts::real_concept(0.)); // Test real concept.
#endif
#else
  std::cout << "<note>The long double tests have been disabled on this platform "
    "either because the long double overloads of the usual math functions are "
    "not available at all, or because they are too inaccurate for these tests "
    "to pass.</note>" << std::endl;
#endif
thanks! I'll include this.
At least a few really accurate values could be calculated using the published formula using a 100 decimal digit calculator? But round tripping should test well and I don't expect any trouble for higher precision types (in the unlikely event that someone want them - no accounting for some peoples taste).
I found it quite difficult to find reference material. Enough references to equations, but very little reference to numerical values. The only benchmark tool I have that implements Laplace is GNU Octave. I found that Mathematica also has the Laplace distribution. Some high-precision numerical values generated by Mathematica would be very welcome!
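For what it's worth, the closed form also makes a few hand-checkable reference values easy: cdf(x) = 1 - exp(-(x - mu)/b)/2 for x >= mu, so with mu = 0, b = 1 the value at x = 1 is 1 - e^-1/2 ≈ 0.81606027941427884. A sketch of such a spot check, assuming the sandbox header:

#include <boost/math/distributions/laplace.hpp> // sandbox header, assumed
#include <boost/test/unit_test.hpp>
#include <cmath>
#include <limits>

BOOST_AUTO_TEST_CASE(laplace_closed_form_spot)
{
    boost::math::laplace_distribution<double> d(0.0, 1.0);
    double reference = 1.0 - 0.5 * std::exp(-1.0); // 0.81606027941427884... from the closed form
    BOOST_CHECK_CLOSE_FRACTION(cdf(d, 1.0), reference, 5 * std::numeric_limits<double>::epsilon());
}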
(I wonder if we can get rid of the Borland test yet?)
You might also check the convenience typedef works like this?
// Check that can generate lognormal distribution using the two convenience methods: boost::math::lognormal myf1(1., 2); // Using typedef lognormal_distribution<> myf2(1., 2); // Using default RealType double.
Yes, those should also be checked.
http://mathworld.wolfram.com/LaplaceDistribution.html is a good link for the doc? Weisstein, Eric W. "Laplace Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/LaplaceDistribution.html
<aside> (This looks rather amusing http://demonstrations.wolfram.com/SampleVersusTheoreticalDistribution/). </aside>
Looking good to me.
Paul
PS You can write
2., 1., 0.5, 0.25... without the L because they can be exactly represented as float, double, long double (unlike 0.1, 0.01... )
nice!
static_cast<RealType>(0.5)
which saves some typing and clutter.

Paul A. Bristow wrote:
We could write some generic tests based on the properties of distributions...
General relations: 1) quantile(cdf(x)) == x 2) hazard(x) = pdf(x)/(1-cdf(x)) 3) pdf(x,location,scale) = pdf( (x-location)/scale, 0, 1)/scale 4) cdf(x,location,scale) = cdf( (x-location)/scale, 0, 1) 5) cdf(complement(N,x)) = cdf(N(-x)) 6) quantile(complement(N,p)) = quantile(N(-x,1-p))
perhaps some automatic checking (for all distribution) of error throwing 7) support <-> cdf, pdf 8) quantile <-> p=0, p=1
And some generic test for distributions with specific properties Symmetric distributions: pdf(x) = pdf(-x) cdf(x) = 1-cdf(-x)
etc. we could write template functions for that, that get passed a set of 'x'
values etc
That would indeed be neat - but our tests just grew like Topsy ;-)
And it would be quite a lot of work to change now.
Yes, whether to standardize testing for all depends on the number of developers. If the number grows, then at some point it might be a good idea to document a list of tests that are needed. Currently I'm doing fine iterating & growing the tests based on feedback from you two. A friend of mine has written a Python script to automatically generate 'binding' headers from LAPACK source. Perhaps at some point it would be doable to automate a lot of that work, but the relevance is low. Cheers, Thijs

Paul, John, Having looked at parts of the code in math/distributions, I have some questions regarding the coding:
1) Assignments: shouldn't we replace occurrences like RealType scale = dist.scale(); with RealType scale(dist.scale()); ?
2) The range() and support() non-member functions return "*const* pair<>", all others return *non*-const RealType. Shouldn't all of them be const?
3) This one will have quite a bit of impact... In common_error_handling.hpp, in the function inline bool check_XXX(const char* function, RealType const& prob, RealType* result, const Policy& pol), shouldn't we replace RealType* result with RealType& result (and adjust all the calls) to ensure that result has a valid address?
Cheers, Thijs
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :)
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel. +31 (0)6 2411 0061 Fax. +31 (0)15 285 1984 thijs@sitmo.com - www.sitmo.com

AMDG Thijs van den Berg wrote:
1) Assignments: shouldn't we replace occurrences like RealType scale = dist.scale(); with RealType scale(dist.scale()); ?
The two forms mean exactly the same thing--copy construction, not assignment. In Christ, Steven Watanabe

Thijs van den Berg wrote:
Paul, John,
Having looked at parts of the code in math:distributions, I have some questions regarding coding in
1) Assignments: shouldn't we replace occurrences like RealType scale = dist.scale(); with RealType scale(dist.scale()); ?
They are the same thing.
2) The range() and support() non-member function return "*const* pair<>", all others return *non*-const RealType, Shouldn't all of them be const?
Does it actually make any difference given that they return the result by value? In any case the range and support functions are actually documented as: template<class RealType, class Policy> std::pair<RealType, RealType> support(const Distribution-Type<RealType, Policy>& dist); So I'm not sure where the extra "const" came from in the code you're seeing: you'd probably need a compiler that supports C++0x rvalue-references to detect the difference, and even then I'm not sure what the utility would be.
3) This one will have quite a bit of impact... In common_error_handling.hpp, in the function inline bool check_XXX(const char* function, RealType const& prob, RealType* result, const Policy& pol), shouldn't we replace RealType* result with RealType& result (and adjust all the calls) to ensure that result has a valid address?
Maybe :-) If we're designing a public interface then yes for sure, but as an implementation detail it doesn't really make that much difference.
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :)
Well I've never needed it ;-) Cheers, John.

John Maddock wrote:
Thijs van den Berg wrote:
Paul, John,
Having looked at parts of the code in math:distributions, I have some questions regarding coding in
1) Assignments: shouldn't we replace occurrences like RealType scale = dist.scale(); with RealType scale(dist.scale()); ?
They are the same thing.
if that's true, then the first notation is more readable
2) The range() and support() non-member function return "*const* pair<>", all others return *non*-const RealType, Shouldn't all of them be const?
Does it actually make any difference given that they return the result by value? In any case the range and support functions are actually documented as:
template<class RealType, class Policy> std::pair<RealType, RealType> support(const Distribution-Type<RealType, Policy>& dist);
So I'm not sure where the extra "const" came from in the code you're seeing: you'd probably need a compiler that supports C++0x rvalue-references to detect the difference, and even then I'm not sure what the utility would be.
It was just a question. It would be a good idea then to remove the const and make it in sync with the docs. I could do a regexp search :)
3) This one will have quite a bit of impact... In common_error_handling.hpp, in the function inline bool check_XXX(const char* function, RealType const& prob, RealType* result, const Policy& pol), shouldn't we replace RealType* result with RealType& result (and adjust all the calls) to ensure that result has a valid address?
Maybe :-)
If we're designing a public interface then yes for sure, but as an implementation detail it doesn't really make that much difference.
I agree, if it's not public and it works, it's low priority. I would prefer to work on more interesting stuff myself.
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :)
Well I've never needed it ;-) Cheers, John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 29 November 2008 09:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions]
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :)
I'm entirely ignorant of other uses this has apart from this http://www.apmaths.uwo.ca/~rcorless/frames/PAPERS/LambertW/knuth&me.jpg (And some very pretty 3D colorplots ;-) http://en.wikipedia.org/wiki/Lambert_W_function gives some clues: "The Lambert W function cannot be expressed in terms of elementary functions. It is useful in combinatorics, for instance in the enumeration of trees. It can be used to solve various equations involving exponentials and also occurs in the solution of time-delayed differential equations, such as y'(t) = a y(t − 1)." But I'm sure you are right - Knuth will be very, very disappointed. Anyone with nothing better to do fancy implementing this? :-)) Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com
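For anyone tempted, a minimal real-valued sketch of the principal branch W0 via Halley's iteration on w*exp(w) = x (illustration only: not Boost.Math code, crude initial guess, no handling of the branch point at -1/e, no complex arguments):

#include <cmath>

double lambert_w0(double x) // principal branch, x > -1/e
{
    double w = (x < 1.0) ? x : std::log(x); // crude starting guess
    for (int i = 0; i < 50; ++i)
    {
        double e  = std::exp(w);
        double f  = w * e - x;               // residual of w*e^w = x
        double w1 = w + 1.0;
        double d  = f / (e * w1 - (w + 2.0) * f / (2.0 * w1)); // Halley step
        w -= d;
        if (std::fabs(d) <= 1e-15 * (1.0 + std::fabs(w)))
            break;
    }
    return w;
}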

Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 29 November 2008 09:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions]
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :)
I'm entirely ignorant of other uses this has apart from this
http://www.apmaths.uwo.ca/~rcorless/frames/PAPERS/LambertW/knuth&me.jpg
(And some very pretty 3D colorplots ;-)
http://en.wikipedia.org/wiki/Lambert_W_function
gives some clues:
"The Lambert W function cannot be expressed in terms of elementary functions. It is useful in combinatorics, for instance in the enumeration of trees. It can be used to solve various equations involving exponentials and also occurs in the solution of time-delayed differential equations, such as y'(t) = a y(t − 1)."
But I'm sure you are right - Knuth will be very, very disappointed.
Anyone with nothing better to do fancy implementing this? :-))
Paul
Well, I was thinking about implementing it using this ref http://www.whim.org/nebula/math/lambertw.html but I'm not sure about the complex number version (which is very important, because it's needed for the colorplots!). Another thing is that the multivariate Gaussian has higher priority on my list. It's something I actually need to do quite soon (in- or outside boost), plus the fact that adding a new function is *way* more work than just implementing a simple formula. Essentially Laplace is nothing more than
template <class T> T pdf_laplace(const T& x, const T& a, const T& b) { return exp(-abs(x - a)/b)/b/2; }
so I thought, I can do that in 10 minutes! ..but I ended up writing an additional 800 lines. aagghh. A little thing I could try to do is decompose the unit test into a distribution-invariant part (which is reusable) and a Laplace-specific part. How about common_test.hpp, as an analogy to common_error_handling.hpp? That would make adding a test for another distribution as simple as adding a single line. Your idea on testing all non-member functions for the way they handle infinity could e.g. be done like this:
// The centralized part. Testing various values is a bit configurable with a couple of booleans
template <class DistributionType>
void distribution_check_throw_on_inf(const DistributionType& dist, bool x_neg_inf, bool x_pos_inf, ...)
{
   typedef typename DistributionType::value_type RealType;
   if (x_neg_inf) // check that an error gets thrown on x == -inf
   {
      BOOST_CHECK_THROW(pdf(dist, -std::numeric_limits<RealType>::infinity()), std::domain_error);
      BOOST_CHECK_THROW(cdf(dist, -std::numeric_limits<RealType>::infinity()), std::domain_error);
      ... other non-member functions...
   }
   if (x_pos_inf)
   {
      ...
   }
   ... other tests ..., e.g. p=0, p=1 in quantile
}
And then have a test that uses it like this:
BOOST_AUTO_TEST_CASE( dist_test )
{
   distribution_check_throw_on_inf(normal_distribution<float>(0,1), true, true, ..
   distribution_check_throw_on_inf(normal_distribution<double>(0,1), true, true, ..
   distribution_check_throw_on_inf(normal_distribution<long double>(0,1), true, true, ..
}
We could even make a single centralized test for *all* distributions! An author of a new distribution will have to add a couple of lines to this:
BOOST_AUTO_TEST_CASE( dist_throw_test )
{
   distribution_check_throw_on_inf(normal_distribution<float>(0,1), true, true, ..
   distribution_check_throw_on_inf(normal_distribution<double>(0,1), true, true, ..
   ...
   distribution_check_throw_on_inf(laplace_distribution<float>(0,1), true, true, ..
   ...
}
Something like this would make me much more confident that all tests are performed on all distributions (nothing forgotten). What do you think? It also makes the burden of thinking about all possible cases much less, as the test cases are centralized...
-- SITMO Quantitative Financial Consultancy - Software Development M.A. (Thijs) van den Berg Tel. +31 (0)6 2411 0061 Fax. +31 (0)15 285 1984 thijs@sitmo.com - www.sitmo.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 29 November 2008 14:15 To: boost@lists.boost.org Subject: Re: [boost] [math distributions]
Paul A. Bristow wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Thijs van den Berg Sent: 29 November 2008 09:59 To: boost@lists.boost.org Subject: Re: [boost] [math distributions]
PS, I see no LambertW function in math::special_function. I'm sure Knuth is going to be very upset! :) But I'm sure you are right - Knuth will be very, very disappointed.
Anyone with nothing better to do fancy implementing this? :-))
Paul
well I was thinking about implementing it using this ref
http://www.whim.org/nebula/math/lambertw.html
but I'm not sure about the complex number version (which is very important, because it's needed for the colorplots!)
A Friday afternoon project? ;-)
Another thing is that the multivariate Gaussian has higher priority on my list. It's something I actually need to do quite soon (in- or out-side boost), plus the fact that adding a new function is *way* more work than just implementing a simple formula.
Essentially Laplace is nothing more than
template <class T> T pdf_laplace(const T& x, const T& a, const T& b) { return exp(-abs(x - a)/b)/b/2; }
so I thought, I can do that in 10 minutes! ..but I ended up writing an additional 800 lines. aagghh. A little thing I could try to do is decompose the unit test into a distribution-invariant part (which is reusable) and a Laplace-specific part. How about common_test.hpp, as an analogy to common_error_handling.hpp? That would make adding a test for another distribution as simple as adding a single line. Your idea on testing all non-member functions for the way they handle infinity could e.g. be done like this:
// The centralized part. Testing various values, is a bit configurable with a couple of boolean
template <class DistributionType> void distribution_check_throw_on_inf(const DistributionType& dist, bool x_neg_inf, bool x_pos_inf, ...) { typedef typename DistributionType::value_type RealType;
if (x_neg_inf) //check if an error gets thrown on x == -inf { BOOST_CHECK_THROW(pdf(dist, -std::numeric_limits<RealType>::infinity()), std::domain_error); BOOST_CHECK_THROW(cdf(dist, -std::numeric_limits<RealType>::infinity()), std::domain_error);
... other non-member functions...
}
if (x_pos_inf) { ... }
... other tests ..., e.g. p=0, p=1 in quantile }
And then have a test that uses it like this:
BOOST_AUTO_TEST_CASE( dist_test ) { distribution_check_throw_on_inf(normal<float>(0,1), true, true, .. distribution_check_throw_on_inf(normal<double>(0,1), true, true, .. distribution_check_throw_on_inf(normal<long double>(0,1), true, true, .. }
We could even make a single centralized test for *all* distributions! An author of a new distribution will have to add a couple of lines to this
BOOST_AUTO_TEST_CASE( dist_throw_test ) { distribution_check_throw_on_inf(normal<float>(0,1), true, true, .. distribution_check_throw_on_inf(normal<double>(0,1), true, true, .. ...
distribution_check_throw_on_inf(laplace<float>(0,1), true, true, .. ... }
something like this would make me much more confident that all tests are performed on all distribution (nothing forgotten). What do you think? It also makes the burden of thinking about all possible cases much less, as the test cases are centralized...
Well, if we had known then what we know now, we might well have done it this clearly better way - but having done most of the distributions, I fear it is easier to keep using what we have as boilerplate. And the tolerances are too variable to handle this way easily - Laplace is rather accurate because it only uses exp. And the multivariate will be pretty different too. Reminds me of a quote: "Why spend five years of your life automating something that you can do in a day or two?" :-) Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock Sent: 22 November 2008 09:55 To: boost@lists.boost.org Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
Thijs van den Berg wrote:
What do you think? We might turn "having valid parameters" into a property of *all* distribution. As an alternative, we might add a non member function bool valid<distributionType... but that wouldn't allow for caching validation in e.g. a constructor
Sounds fine to me.
That's great! What's your opinion on the fact that you can only set parameters in the constructor? E.g. the normal distribution does a parameter check in the constructor, and those parameters can't change after that.
That's what the existing distributions do. In fact we could omit most of the subsequent parameter checking code if we could figure out whether the error handlers will throw or not on error (in fact we *can* get this information at compile time and make the subsequent checks a no-op if we know that the constructor would have thrown on error... we just ran out of time on that refinement).
As I recall, we figured that the time it saved, compared with the mega-whirring that would follow, was negligible. What was important was to catch dud parameters, so checking, even if redundant, was worth it. Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com
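A rough sketch of the refinement John describes above - selecting a no-op parameter check at compile time when the constructor is known to have thrown on bad input (the trait and its wiring are invented for illustration; the real policy machinery would be more involved):

// Hypothetical: the flag would be derived from the active error-handling
// policy; when the constructor is guaranteed to throw on bad parameters,
// the per-call re-check collapses to a no-op.
template <bool ConstructorThrows>
struct scale_check
{
    template <class RealType>
    static bool ok(RealType scale) { return scale > 0; } // runtime check still needed
};

template <>
struct scale_check<true>
{
    template <class RealType>
    static bool ok(RealType) { return true; } // constructor already validated: no-op
};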
participants (4):
- John Maddock
- Paul A. Bristow
- Steven Watanabe
- Thijs van den Berg