[math toolkit] Review results

I'd like to congratulate John Maddock, Paul Bristow, and their contributors on putting together an outstanding submission to Boost. It is rare to have complete unanimity among votes, but in this case there were no dissenters to accepting either the special functions or the statistical distributions portions of the library, so I'm glad to announce that it has been accepted for inclusion in Boost.

Thanks to the many participants in the review: Gottlob Frege, Guillaume Melquiond, Johan Råde, Arnaldur Gylfason, John Phillips, Mark Van De Vyver, Stephan Tolksdorf, Hubert Holin, Seweryn Habdank-Wojewodski, Kevin Lynch, Leonaldo Peralta, Jeff Garland, and Stefan Seefeld (apologies to anyone I might have missed in this list).

Here is a brief summary of the relatively few major issues that were raised during the course of the review:

1) Error handling is an issue that arises because this library is at the frontier between traditional numerical libraries and a (hopefully) new breed of such libraries that utilize modern programming techniques and facilities. Jeff Garland suggested that the default behavior should be to throw an exception, with errno being an option enabled by macro. It would also be nice to have more granular control over which instances throw exceptions and which do not (so, for example, a user could choose to ignore denormals). It was also suggested that additional, more transparent exceptions be provided for cases such as failure to converge, rather than reusing tools::logic_error.

2) Jeff Garland also pointed out, rightly in my opinion, that attempts to use statistical functions that do not exist for a distribution should fail to compile rather than leading to a runtime error (e.g. the mean of a Cauchy distribution). A reasonable method of implementing this should be devised, or a strong argument provided for why it is not feasible or desirable.

3) Arnaldur Gylfason did some nice accuracy and performance testing vs. R.
It was noted that the performance of quantile functions in this library was significantly worse than R's, unlike that of non-quantile functions. He also pointed out that, for discrete distributions, the current behavior of returning fractional results for quantiles may not be the expected one. These issues should be addressed and documented.

4) Hubert Holin suggested the possibility of using a policy parameter to choose between speed and accuracy. Another interesting possibility would be to allow the user to specify the desired precision either at compile time or at runtime.

5) It appears that this library will actually become two "sublibraries" within the Boost.Math library. Currently all code lives in the boost::math namespace; I would like to at least see a discussion of the possibility of having boost::math::special_functions, boost::math::statistics, and, perhaps, boost::math::statistics::distributions namespaces. As more functionality gets added to boost::math, collisions will become more likely, so some thought given now to logical partitioning may save pain later.

During the review a number of typos and minor issues of documentation were raised, which the authors will need to address if they haven't already. In particular, John Phillips asked that the rationale behind using rational (vs. polynomial) approximations be clarified, that the derivation of coefficients be documented, and that the list of special functions in the documentation be expanded to encompass all functions actually implemented in the library. Stephan Tolksdorf requested a table listing standard statistical tests and the corresponding library implementations, and enumerated a number of other documentation issues.

Overall, the library was very well received, and the authors are to be commended on the tremendous amount of care and effort devoted to its preparation.

Matthias Schabel
Review Manager

On Wed, 02 May 2007 20:30:01 +0200, Matthias Schabel <boost@schabel-family.org> wrote:
5) It appears that this library will actually become two "sublibraries" within the Boost.Math library.
I agree but, IMO, this should be achieved mostly by using paths/directories, exactly as already happens in J. Maddock & co.'s library.
Currently all code lives in the boost::math namespace; I would like to at least see a discussion of the possibility of having boost::math::special_functions, boost::math::statistics, and, perhaps, boost::math::statistics::distributions namespaces - as more functionality gets added to boost::math, collisions will become more likely, so some thought given now to logical partitioning may save pain later.
I respectfully disagree. I find this subdivision too pedantic. IMHO only the boost::math::statistics namespace could be needed; under this namespace, distributions are sufficiently contextualized. As for special functions, they are general enough, as mathematical objects, that they should be placed directly under the boost::math namespace.

Congratulations to the authors on their good and well-documented work.

Marco
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Matthias Schabel
Sent: 02 May 2007 19:30
To: boost@lists.boost.org; boost-announce; boost-users
Subject: [boost] [math toolkit] Review results
2) Jeff Garland also pointed out, rightly in my opinion, that attempts to use statistical functions that do not exist for a distribution should fail to compile rather than leading to a runtime error (e.g. mean of a Cauchy distribution). A reasonable method of implementing this should be devised or a strong argument for why it is not feasible/desirable provided.
I have begun looking at this. The current code is:

template <class RealType>
inline RealType mean(const cauchy_distribution<RealType>&)
{ // There is no mean:
   return tools::domain_error<RealType>(
      BOOST_CURRENT_FUNCTION,
      "The Cauchy distribution does not have a mean: "
      "the only possible return value is %1%.",
      std::numeric_limits<RealType>::quiet_NaN());
}

This fails at RUN time thus:

Autorun "i:\boost\libs\math\test\math_test\debug\test_cauchy_mean.exe"
Running 1 test case...
unknown location(0): fatal error in "test_main_caller( argc, argv )": std::domain_error: Error in function double __cdecl boost::math::mean<double>(const class boost::math::cauchy_distribution<double> &): The Cauchy distribution does not have a mean: the only possible return value is 1.#QNAN.

which is pretty clear, but only at runtime.

If we want it to fail at compile time, IMO it MUST be an option because there are occasions when it is vital to be able to treat all distributions alike. A typical case is our C# .NET (cue hisses and boos) 'applet' to calculate distribution parameters - for any distribution chosen from a dropdown. (This useful program will be posted somewhere soon.)

So to provide an option, a macro seems sensible, perhaps BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED?

BOOST_STATIC_ASSERT(false); has been one way of providing a compile-time trap and fails with a rather long message, although clicking on the first line goes straight to the source code above, so it soon becomes pretty clear.
"I:\Boost-sandbox\math_toolkit\boost/math/distributions/cauchy.hpp(218) : error C2027: use of undefined type 'boost::STATIC_ASSERTION_FAILURE<x>' with [ x=false ]
.\test_cauchy_mean.cpp(700) : see reference to function template instantiation 'RealType boost::math::mean<double>(const boost::math::cauchy_distribution<RealType> &)' being compiled with [ RealType=double ]"

For MSVC at least one can use the #error directive thus:

#ifdef BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED
#  ifdef BOOST_MSVC
#    error Mean of the Cauchy distribution is undefined!
#  else
     BOOST_STATIC_ASSERT(false);
#  endif
#endif

and this produces the rather more explicit message:

I:\Boost-sandbox\math_toolkit\boost/math/distributions/cauchy.hpp(216) : fatal error C1189: #error : Mean of the Cauchy distribution is undefined!

So some questions:

1 Does this macro mechanism seem OK? Is the macro name OK?
2 And should the portable BOOST_STATIC_ASSERT be used?
3 Should the #error directive be used?
4 What compilers do/don't support the #error directive?
5 Should the default behaviour (macro NOT defined) remain a runtime failure, as at present?

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Paul A Bristow wrote:
If we want it to fail at compile time, IMO it MUST be an option because there are occasions when it is vital to be able to treat all distributions alike. A typical case is our C# .NET (cue hisses and boos) 'applet' to calculate distribution parameters - for any distribution chosen from a dropdown. (This useful program will be posted somewhere soon.)
Two other options: we could remove the definitions of accessor functions that are undefined for that distribution, and then add our own private "catch all" versions when we need them. Or we could add the "catch all" versions to a separate namespace, and users could bring them into scope when needed.

Do we have customised (as in meaningful and distribution-specific) error messages for the current functions, though? If so we should try to keep them.
So to provide an option, a macro seems sensible, perhaps BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED?
The consensus seems to be for making compiler errors the default, so maybe BOOST_MATH_DEFINE_ALL_ACCESSORS to make them available regardless of whether they are mathematically defined for the distribution.
BOOST_STATIC_ASSERT(false); has been one way of providing a compile time trap and fails with a rather long message, although clicking on the 1st line goes straight to the source code above, so it soon becomes pretty clear.
It would have to be BOOST_STATIC_ASSERT(sizeof(T) == 0) as some compilers realise that BOOST_STATIC_ASSERT(false) will always fail and issue a diagnostic irrespective of whether the template is instantiated.
For MSVC at least one can use the #error directive thus:
#ifdef BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED
#  ifdef BOOST_MSVC
#    error Mean of the Cauchy distribution is undefined!
#  else
     BOOST_STATIC_ASSERT(false);
#  endif
#endif
and this produces the rather more explicit message:
I:\Boost-sandbox\math_toolkit\boost/math/distributions/cauchy.hpp(216) : fatal error C1189: #error : Mean of the Cauchy distribution is undefined!
Surely that just gives an error whenever you include the header?

Or we could simply not provide the definitions of these functions at all: but then the error messages might be rather cryptic?

John.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
Sent: 16 May 2007 18:36
To: boost@lists.boost.org
Subject: Re: [boost] [math toolkit] Review results
Paul A Bristow wrote:
If we want it to fail at compile time, IMO it MUST be an option because there are occasions when it is vital to be able to treat all distributions alike. A typical case is our C# .NET (cue hisses and boos) 'applet' to calculate distribution parameters - for any distribution chosen from a dropdown. (This useful program will be posted somewhere soon.)
Two other options: we could remove the definitions of accessor functions that are undefined for that distribution, and then add our own private "catch all" versions when we need them. Or we could add the "catch all" versions to a separate namespace, and users could bring them into scope when needed.
These seem more complicated to me, unless I am misunderstanding again. There are only a few *undefined* accessors (mean, etc.).
Do we have customised (as in meaningful and distribution specific) error messages for the current functions though? If so we should try and keep them.
The compiler outputs:

I:\Boost-sandbox\math_toolkit\boost/math/distributions/cauchy.hpp(218) : error C2027: use of undefined type 'boost::STATIC_ASSERTION_FAILURE<x>' with [ x=false ]
.\test_cauchy_mean.cpp(47) : see reference to function template instantiation 'RealType boost::math::mean<double>(const boost::math::cauchy_distribution<RealType> &)' being compiled with [ RealType=double ]

but if you click on cauchy.hpp line 218 you see the pretty self-explanatory code below:

template <class RealType>
inline RealType mean(const cauchy_distribution<RealType>&)
{ // There is no mean:
#ifdef BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED
#  ifndef BOOST_MSVC
#    error Mean of the Cauchy distribution is undefined!
#  else
     BOOST_STATIC_ASSERT(sizeof(RealType) == 0);
#  endif
#endif
   return tools::domain_error<RealType>(
      BOOST_CURRENT_FUNCTION,
      "The Cauchy distribution does not have a mean: "
      "the only possible return value is %1%.",
      std::numeric_limits<RealType>::quiet_NaN());
}
So to provide an option, a macro seems sensible, perhaps BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED?
The consensus seems to be for making compiler errors the default, so maybe BOOST_MATH_DEFINE_ALL_ACCESSORS to make them available regardless of whether they are mathematically defined for the distribution.
OK - I assume this is just the inverse of BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED. Are we *sure* this is the consensus? IMO providing NaN may be more useful?
BOOST_STATIC_ASSERT(false); has been one way of providing a compile time trap and fails with a rather long message, although clicking on the 1st line goes straight to the source code above, so it soon becomes pretty clear.
It would have to be BOOST_STATIC_ASSERT(sizeof(T) == 0) as some compilers realise that BOOST_STATIC_ASSERT(false) will always fail and issue a diagnostic irrespective of whether the template is instantiated.
OK - but I hope compilers don't get even smarter.
For MSVC at least one can use the #error directive thus:
#ifdef BOOST_MATH_COMPILE_FAIL_IF_UNDEFINED
#  ifdef BOOST_MSVC
#    error Mean of the Cauchy distribution is undefined!
#  else
     BOOST_STATIC_ASSERT(false);
#  endif
#endif
and this produces the rather more explicit message:
I:\Boost-sandbox\math_toolkit\boost/math/distributions/cauchy.hpp(216) : fatal error C1189: #error : Mean of the Cauchy distribution is undefined!
Surely that just gives an error whenever you include the header?
Seems to work for MSVC 8.0.
Or we could simply not provide the definitions of these functions at all: but then the error messages might be rather cryptic?
A definite step backwards, IMO.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Matthias Schabel
Sent: 02 May 2007 19:30
To: boost@lists.boost.org; boost-announce; boost-users
Subject: [boost] [math toolkit] Review results
1) Error handling is an issue that arises because this library is at the frontier between traditional numerical libraries and a (hopefully) new breed of such libraries that utilize modern programming techniques and facilities. Jeff Garland suggested that the default behavior should be to throw an exception, with errno being an option enabled by macro.
Given a macro, #define BOOST_MATH_THROW_ON_ALL_ERRORS, to switch all the error macros 'on', it would seem easy enough to switch to throwing C++ exceptions throughout. I am unclear why the default should be one way or the other, and it's a hassle to change it, so I propose to leave it as is.
It would also be nice to have more granular control over which instances throw exceptions and which do not (so, for example, a user could choose to ignore denormals).
\tools\error_handling.hpp appears to me to provide control of, for example, denormals through BOOST_MATH_THROW_ON_DENORM_ERROR. If BOOST_MATH_THROW_ON_DENORM_ERROR is not defined, then errno is NOT set, so the condition is effectively ignored. We don't have a std exception type for denormalisation, so underflow_error is used instead.
It was also suggested that additional, more transparent exceptions be provided for cases such as failure to converge rather than reusing tools::logic_error.
The standard exception classes seem rather ill-defined (ill-conceived, even?) to me, and I can't see a standard exception that fits the bill for 'failed to converge'. "Failed to converge" seems a run-time error, so would std::range_error be the least unsuitable (neither underflow_error nor overflow_error seems right)? Or does anyone have a better suggestion?

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com
participants (4)

- John Maddock
- Marco
- Matthias Schabel
- Paul A Bristow