[boost][math toolkit] Review results
I'd like to congratulate John Maddock, Paul Bristow, and their contributors on putting together an outstanding submission to Boost. It is rare to have complete unanimity among votes, but in this case there were no dissenters to accepting either the special functions or the statistical distributions portions of the library, so I'm glad to announce that it has been accepted for inclusion in Boost.

Thanks to the many participants in the review: Gottlob Frege, Guillaume Melquiond, Johan Råde, Arnaldur Gylfason, John Phillips, Mark Van De Vyver, Stephan Tolksdorf, Hubert Holin, Seweryn Habdank-Wojewodski, Kevin Lynch, Leonaldo Peralta, Jeff Garland, and Stefan Seefeld (apologies to anyone I might have missed in this list).

Here is a brief summary of the relatively few major issues that were raised during the course of the review:

1) Error handling is an issue that arises because this library is at the frontier between traditional numerical libraries and a (hopefully) new breed of such libraries that utilize modern programming techniques and facilities. Jeff Garland suggested that the default behavior should be to throw an exception, with errno being an option enabled by macro. It would also be nice to have more granular control over which instances throw exceptions and which do not (so, for example, a user could choose to ignore denormals). It was also suggested that additional, more transparent exceptions be provided for cases such as failure to converge rather than reusing tools::logic_error.

2) Jeff Garland also pointed out, rightly in my opinion, that attempts to use statistical functions that do not exist for a distribution should fail to compile rather than leading to a runtime error (e.g. mean of a Cauchy distribution). A reasonable method of implementing this should be devised or a strong argument for why it is not feasible/desirable provided.

3) Arnaldur Gylfason did some nice accuracy and performance testing vs. R. It was noted that the performance of quantile functions in this library was significantly worse than R, unlike non-quantile functions. He also pointed out that, for discrete distributions, the current behavior of returning fractional results for quantiles may not be the expected one. These issues should be addressed and documented.

4) Hubert Holin suggested the possibility of using a policy parameter to choose between speed and accuracy. Another interesting possibility would be to allow the user to specify the desired precision either at compile time or at runtime.

5) It appears that this library will actually become two "sublibraries" within the Boost.Math library. Currently all code lives in the boost::math namespace; I would like to at least see a discussion of the possibility of having boost::math::special_functions, boost::math::statistics, and, perhaps, boost::math::statistics::distributions namespaces - as more functionality gets added to boost::math, collisions will become more likely, so some thought given now to logical partitioning may save pain later.

During the review a number of typos and minor issues of documentation were raised, which the authors will need to address if they haven't already. In particular, John Phillips asked that the rationale behind using rational (vs. polynomial) approximations be clarified, that the derivation of the coefficients be documented, and that the list of special functions in the documentation be expanded to encompass all functions actually implemented in the library. Stephan Tolksdorf requested a table listing standard statistical tests and the corresponding library implementations, and enumerated a number of other documentation issues.

Overall, the library was very well received, and the authors are to be commended on the tremendous amount of care and effort devoted to its preparation.

Matthias Schabel
Review Manager
Matthias Schabel wrote:
Many thanks for the summary, Matthias.
1) Error handling is an issue that arises because this library is at the frontier between traditional numerical libraries and a (hopefully) new breed of such libraries that utilize modern programming techniques and facilities. Jeff Garland suggested that the default behavior should be to throw an exception, with errno being an option enabled by macro. It would also be nice to have more granular control over which instances throw exceptions and which do not (so, for example, a user could choose to ignore denormals). It was also suggested that additional, more transparent exceptions be provided for cases such as failure to converge rather than reusing tools::logic_error.
I'll raise a separate thread about this at some point, and see if there's a consensus on what folks want.
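For the sake of discussion, here is a minimal sketch (in modern C++, for brevity) of what per-category control via a compile-time policy might look like. Every name below - error_policy, checked_log, and so on - is hypothetical and not part of the library as submitted:

    #include <cmath>
    #include <limits>
    #include <stdexcept>

    namespace sketch {

    enum class on_error { throw_exception, return_nan };

    // Hypothetical policy: one choice per error category.
    template <on_error DomainError = on_error::throw_exception>
    struct error_policy {
        static constexpr on_error domain_error = DomainError;
    };

    // A toy wrapper showing how a function could consult the policy.
    template <class Policy = error_policy<>>
    double checked_log(double x, Policy = Policy()) {
        if (x <= 0) {
            if (Policy::domain_error == on_error::throw_exception)
                throw std::domain_error("checked_log: argument must be positive");
            return std::numeric_limits<double>::quiet_NaN();
        }
        return std::log(x);
    }

    } // namespace sketch

    int main() {
        sketch::checked_log(2.0);              // default: would throw on a domain error
        sketch::checked_log(-1.0,              // opt out at the call site: returns NaN instead
            sketch::error_policy<sketch::on_error::return_nan>());
    }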
2) Jeff Garland also pointed out, rightly in my opinion, that attempts to use statistical functions that do not exist for a distribution should fail to compile rather than leading to a runtime error (e.g. mean of a Cauchy distribution). A reasonable method of implementing this should be devised or a strong argument for why it is not feasible/desirable provided.
Will do.
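As a sketch of the simplest approach: the accessor can simply not be provided for distributions that lack the property, so misuse fails at compile time. The stand-in types below are illustrative, not the library's classes:

    // mean() is only defined for distributions that actually have a mean.
    struct normal_dist { double mu, sigma; };
    struct cauchy_dist { double x0, gamma; };

    inline double mean(const normal_dist& d) { return d.mu; }   // normal has a mean
    // (no mean() overload for cauchy_dist)

    int main() {
        normal_dist n{0.0, 1.0};
        double m = mean(n);        // fine
        // cauchy_dist c{0.0, 1.0};
        // double bad = mean(c);   // error: no matching function for call to 'mean'
        (void)m;
    }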
3) Arnaldur Gylfason did some nice accuracy and performance testing vs. R. It was noted that the performance of quantile functions in this library was significantly worse than R, unlike non-quantile functions. He also pointed out that, for discrete distributions, the current behavior of returning fractional results for quantiles may not be the expected one. These issues should be addressed and documented.
I'll look into those; I think there were only a couple that were much slower. It could be that R has better initial guesses for the numeric search. I'll try and find out what R does.
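On the discrete-quantile point, the convention users tend to expect is the smallest integer k with CDF(k) >= p. This toy binomial example illustrates that convention only - it is not the library's algorithm, which uses a numeric search:

    #include <cmath>
    #include <iostream>

    // Binomial probability mass function via lgamma, to avoid overflow in the
    // binomial coefficient. Assumes 0 < p < 1.
    double binomial_pdf(int k, int n, double p) {
        double c = std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
        return std::exp(c + k * std::log(p) + (n - k) * std::log(1.0 - p));
    }

    // Smallest integer k such that P(X <= k) >= prob.
    int binomial_quantile(double prob, int n, double p) {
        double cdf = 0.0;
        for (int k = 0; k <= n; ++k) {
            cdf += binomial_pdf(k, n, p);
            if (cdf >= prob) return k;
        }
        return n;
    }

    int main() {
        std::cout << binomial_quantile(0.5, 10, 0.3) << "\n";   // prints 3
    }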
4) Hubert Holin suggested the possibility of use of a policy parameter to choose between speed and accuracy. Another interesting possibility would be to allow the user to specify the desired precision either at compile or runtime.
That's quite a bit of work, but I'll certainly document it under the issues list. It seems we may have a request to allow several extra parameters to the functions at some point in the future:
* To control whether they throw or not.
* To control accuracy.
* To possibly return an error estimate.
All present difficult pros and cons as to whether they should be compile-time or runtime parameters. Plus a lot of work! I'll try and think about this some more, but in the meantime ideas (preferably with demo code!) would be welcome.
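In the spirit of that request for demo code, here is one rough shape such an extra parameter could take: a defaulted trailing argument that fixes a target accuracy at compile time and can report a crude error estimate at runtime. A sketch only, not a proposed interface, and every name in it is made up:

    #include <cmath>

    // Compile-time accuracy request, expressed as decimal digits.
    template <int Digits>
    struct target_precision { static constexpr int digits = Digits; };

    // Toy series for exp(x) that stops once terms drop below the requested
    // tolerance, and can report the size of the last term as a crude error estimate.
    template <class Precision = target_precision<15>>
    double exp_series(double x, double* err = nullptr, Precision = Precision()) {
        const double tol = std::pow(10.0, -Precision::digits);
        double term = 1.0, sum = 1.0;
        for (int n = 1; n < 1000 && std::fabs(term) > tol * std::fabs(sum); ++n) {
            term *= x / n;
            sum += term;
        }
        if (err) *err = std::fabs(term);   // rough estimate from the last term added
        return sum;
    }

    int main() {
        double err = 0.0;
        double fast = exp_series<target_precision<6> >(1.0, &err);   // lower accuracy, fewer terms
        double full = exp_series(1.0, &err);                          // default accuracy
        (void)fast; (void)full;
    }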
5) It appears that this library will actually become two "sublibraries" within the Boost.Math library. Currently all code lives in the boost::math namespace; I would like to at least see a discussion of the possibility of having boost::math::special_functions, boost::math::statistics, and, perhaps, boost::math::statistics::distributions namespaces - as more functionality gets added to boost::math, collisions will become more likely, so some thought given now to logical partitioning may save pain later.
I'll start a separate discussion thread about that at some point; it's something Paul and I have discussed before and come to no firm conclusion on :-(
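Purely for illustration, the partitioning suggested in point 5 would look something like this (declarations only; the submitted library keeps everything directly in boost::math):

    namespace boost { namespace math {

    namespace special_functions {
        double tgamma(double x);                               // special functions would live here
    }

    namespace statistics { namespace distributions {
        template <class RealType> class normal_distribution;   // distributions would live here
    }}

    }} // namespace boost::math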
During the review a number of typos and minor issues of documentation were raised, which the authors will need to address if they haven't already. In particular, John Phillips asked that the rationale behind using rational (vs. polynomial) approximations be clarified, that the derivation of the coefficients be documented, and that the list of special functions in the documentation be expanded to encompass all functions actually implemented in the library. Stephan Tolksdorf requested a table listing standard statistical tests and the corresponding library implementations, and enumerated a number of other documentation issues.
Will fix. Cheers, John.
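As an aside on the first documentation request above: a rational approximation is simply the ratio of two polynomials, each typically evaluated by Horner's rule. A minimal sketch, with placeholder coefficient arrays and names that are not from the library:

    #include <cstddef>

    // Evaluate a polynomial by Horner's rule; coefficients ordered from the
    // constant term upward.
    double eval_polynomial(const double* c, std::size_t n, double x) {
        double result = c[n - 1];
        for (std::size_t i = n - 1; i > 0; --i)
            result = result * x + c[i - 1];
        return result;
    }

    // A rational approximation is the ratio of two such polynomials.
    double eval_rational(const double* num, const double* den, std::size_t n, double x) {
        return eval_polynomial(num, n, x) / eval_polynomial(den, n, x);
    }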