
To better motivate the need for the Boost Probability library, I have updated the documentation, which is accessible at http://biology.nmsu.edu/software/probability/. Although this constitutes a new release, the only change is to the documentation. As a result, the contents of v0.2.2 in the Boost Vault still reflect exactly the most recent release, and I have not uploaded a new copy.

The new motivational example is taken from the problem of ascertaining the long-term trend of global climate. One database used to assess this is available from the NOAA National Climatic Data Center (http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/index.php). It contains monthly data for thousands of stations worldwide, in many cases spanning decades. Today's version, for example, contains 590,543 records of mean temperature. A typical likelihood calculation evaluating a model of climate would involve a product of likelihoods across all of these records, almost certainly yielding a result on the order of 10^-600,000 or less. Such numbers cannot be handled using typical floating-point representations, so specialized solutions of some form are required.

The natural method is to accumulate the sum of the logarithms of the likelihoods, rather than the product of the likelihoods, across the dataset (a small sketch of this appears after my signature). This keeps the values within suitable bounds, but it requires keeping track of the fact that different kinds of values (probabilities, likelihoods, and log likelihoods) are being used throughout a typical program. If these are all represented using native types, such as double, it is easy to lose track of the fact that they have different semantics.

A real solution to this problem would include modules that take care of calculating the probability of each individual data record and modules that take care of accumulating that information across the records. The problem is complex enough that each of these responsibilities would realistically be divided across many units, and it would not be unreasonable to expect development to be divided among many programmers. In such situations it is all too easy to lose track of which semantics apply to a specific value when the only information available in the code is the data type (e.g., double), which provides little help, plus some (perhaps untrustworthy) comments that may or may not be read and in any case cannot affect the compiler.

Using the Probability library, one can encode the exact semantics in the type system in a way that lends itself to generic programming (the second sketch below illustrates the general idea). The resulting clarity, safety, and maintainability are retained regardless of how large the code base becomes and how the operations are distributed across modules and/or programmers.

Because of these features, I feel that this library makes a significant contribution to solving a well-defined set of problems that occur in certain types of scientific programming and modeling. I hope you will take a serious look at its capabilities and provide me with further feedback. I am especially interested in improving the portability of the code and need testers with access to compilers other than g++.

I look forward to your comments, suggestions, and general discussion. Thank you.

Cheers,
Brook
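
P.S. To make the discussion above concrete, here is a first sketch showing the underflow problem and the usual log-space workaround, using only native doubles. Everything in it (Record, likelihood, the normal density, mu, sigma) is a placeholder I made up for illustration; none of it is part of the library.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Record { double mean_temp; };   // stand-in for one monthly record

    // A normal density as a stand-in for a real climate model.
    double likelihood(const Record& r, double mu, double sigma)
    {
        const double pi = 3.14159265358979323846;
        const double z  = (r.mean_temp - mu) / sigma;
        return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
    }

    // Naive product: underflows to 0.0 long before 590,543 records.
    double product_likelihood(const std::vector<Record>& data,
                              double mu, double sigma)
    {
        double prod = 1.0;
        for (std::size_t i = 0; i < data.size(); ++i)
            prod *= likelihood(data[i], mu, sigma);
        return prod;
    }

    // Sum of logs: on the order of -10^6 for a dataset of the size
    // described above, comfortably within the range of a double.
    double log_likelihood(const std::vector<Record>& data,
                          double mu, double sigma)
    {
        double sum = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i)
            sum += std::log(likelihood(data[i], mu, sigma));
        return sum;
    }

Note that both functions return a plain double, so nothing stops a caller from mixing the two quantities up; the compiler sees no difference between them.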
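
The second sketch shows the flavor of encoding those semantics in the type system. To be clear, this is not the Probability library's actual interface; it is only a minimal, strong-typedef-style illustration of why distinct types help.

    #include <cmath>

    // Hypothetical wrappers, invented for this example only.
    struct Likelihood {
        double value;
        explicit Likelihood(double v) : value(v) {}
    };

    struct LogLikelihood {
        double value;
        explicit LogLikelihood(double v) : value(v) {}
    };

    // Moving into log space is an explicit, named operation...
    inline LogLikelihood log(Likelihood p)
    { return LogLikelihood(std::log(p.value)); }

    // ...and multiplying likelihoods becomes adding log likelihoods.
    inline LogLikelihood operator+(LogLikelihood a, LogLikelihood b)
    { return LogLikelihood(a.value + b.value); }

    void example()
    {
        LogLikelihood total(0.0);
        total = total + log(Likelihood(0.37));   // fine
        // total = total + 0.37;                 // compile error: raw double
        // total = total + Likelihood(0.37);     // compile error: take the log first
    }

With plain doubles, both commented-out lines would compile silently and corrupt the result; with distinct types, the compiler enforces the semantics no matter how many modules or programmers are involved. The library generalizes this idea; the sketch is only meant to convey the motivation.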