On Fri, Sep 4, 2015 at 7:43 AM, Rajaditya Mukherjee < rajaditya.mukherjee@gmail.com> wrote:
Hi Matt,
I am imagining such a library would be greatly useful for numerical scientists and statisticians for precise computing. Can you give us a toy example of how it works ?
I can't post the code without releasing it but I can briefly describe how it works. The general input-range version works by accumulating a mixed number that represents the mean at any given iteration. The mixed number has a whole number part that is of the value_type of the range, and the fractional part is represented as a numerator that is of the iterator's difference_type and a denominator that is of the iterator's difference type. The initial "value" of this state is {0, 0, 0}, which is indeterminate. At any given iteration, you are guaranteed that the fractional part's denominator is exactly equal to the number of elements that have been iterated over, the fractional part's numerator is in the range [0,denominator), and the whole number is the whole number part of the mean. In other words, the fractional part is never reduced. By having a state represented in this manner you are able to communicate back to the user both the exact average as well as the number of elements in the range (since the number of elements is equal to the denominator's value). This guarantee is also useful as we advance the state. The reason the result is 3 values representing a mixed number and not 2 values simply representing a rational number is because a rational number consisting of 2 values of either the difference type or the value type cannot be guaranteed to be able to represent the exact mean without possibility of overflow. With 3 values of appropriate type, you can guarantee that the mean can be represented since the whole number part will always be bounded by the smallest and largest value in the input range (and remember the whole part's type is the value_type of the range). The components of the fractional part use the difference_type of the range since I always represent the fractional part as N/D where D is the number of elements seen so far and N is in the range [0, D). In this manner, the exact mean is guaranteed to able to be represented at any given iteration, and the accumulation function can advance the state in a manner that will never overflow. You use it like any STL algorithm: ////////// auto result = mean(std::begin(range), std::end(range)); ////////// At the moment I do not have an implementation that works piecemeal as is akin to Boost.Accumulators, however since the implementation of the algorithm really is simply a call to std::accumulate, it should be very simple to expose such a form, and I imagine that it would be especially useful. Some further details on how the state is manipulated with each iteration: At any given point, we have the whole number part, the numerator of the fractional part, and the denominator of the fractional part (remember that the denominator is guaranteed to always be the size of the range up to this point). When advancing to the next state, I do an operation that has the same effect as taking a single, rational version of the mixed number N/D, changing this value to N/(D+1), and then adding in V/(D+1), where V is the next value in the range (remember that here D is the current number of elements that we've iterated over). This isn't directly what I do since I'm dealing with a mixed number instead of a rational number, but it's the same overall effect and is done in a way that cannot overflow. The only thing that is really hairy at this point is that there is mixed-mode arithmetic involved (the value_type and the difference_type need to be operands to arithmetic functions). At the moment I handle the mixed-mode arithmetic by explicitly casting, but ultimately it should probably be a customization point of the algorithm, especially if this is to be useful with user-defined, integer-like value types. -- -Matt Calabrese