statistics: exact median, first and third quartile
Hello all,

I was looking into whether Boost supports basic statistics calculated from a container of values:
- Boost.Accumulators supports these statistics, but only incrementally. That is nice, but now we want exact values, since the containers are small (i.e. fewer than 100 elements).
- Boost.Math: only supports distributions.

Did I overlook something, or is this just not supported?
AMDG

On 02/26/2013 07:29 AM, gast128 wrote:
> Did I overlook something or is this just not supported?

This is called std::nth_element.

In Christ,
Steven Watanabe
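A minimal sketch of what that pointer amounts to for an odd number of elements, assuming the values sit in a std::vector<double> (the name odd_median is illustrative, not from the thread). std::nth_element only partially reorders the range, which is cheaper than a full sort:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Exact median of an odd-sized sample via std::nth_element.
// Takes the vector by value because nth_element reorders its input.
double odd_median(std::vector<double> v) {
    assert(v.size() % 2 == 1);                  // odd, hence also non-empty
    const auto mid = v.begin() + v.size() / 2;  // position of the middle element
    // Afterwards *mid holds the value a full sort would put there; elements
    // before it are not greater, elements after it are not smaller
    // (neither side is itself sorted).
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}
```

For example, odd_median({7.0, 1.0, 5.0, 3.0, 9.0}) yields 5.0. The even-sized case needs a second order statistic, which is what the rest of the thread is about.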
Steven Watanabe wrote:
> On 02/26/2013 07:29 AM, gast128 wrote:
>> Did I overlook something or is this just not supported?
>
> This is called std::nth_element.
Wouldn't it still be nice to have this as part of the accumulator framework? How about the mode? Would that be worth adding to boost::accumulators?
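For a small container the mode is also easy to compute exactly. The sketch below is a plain counting approach that does not use Boost.Accumulators; the helper name mode and its tie-breaking rule (the smallest of the most frequent values wins) are assumptions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: most frequent value in a non-empty sample.
// std::map iterates keys in ascending order and only a strictly larger
// count replaces the current best, so ties go to the smallest value.
template <typename T>
T mode(const std::vector<T>& v) {
    assert(!v.empty());
    std::map<T, std::size_t> counts;
    for (const T& x : v) ++counts[x];  // frequency of each distinct value
    auto best = counts.begin();
    for (auto it = counts.begin(); it != counts.end(); ++it)
        if (it->second > best->second) best = it;
    return best->first;
}
```

For continuous measurements the mode is rarely meaningful without binning, so a sketch like this mainly suits discrete data.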
Steven Watanabe writes:
> This is called std::nth_element.
Thanks, but are you sure? The median is defined as the middle element when the number of elements is odd, but as the average of the two middle elements when it is even. For quartiles (or quantiles in general) it's even worse: there are multiple definitions. I even found an article listing 11 definitions of the quartile: http://www.amstat.org/publications/jse/v14n3/langford.html. Still, it would have been better if Boost had done the thinking for me :)
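To make the ambiguity concrete, here is a sketch of one of the competing quantile definitions: linear interpolation between order statistics, known as Hyndman and Fan's type 7 and used as the default by R's quantile(). The function name quantile_type7 is hypothetical, and other definitions (including several from the linked article) return different quartiles for the same data. At p = 0.5 it reproduces the median rule above, averaging the two middle elements when the size is even:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile by linear interpolation between adjacent order statistics
// ("type 7"). This is only ONE of several definitions in use.
double quantile_type7(std::vector<double> v, double p) {
    assert(!v.empty() && p >= 0.0 && p <= 1.0);
    std::sort(v.begin(), v.end());         // small containers: a full sort is fine
    const double h = (v.size() - 1) * p;   // fractional 0-based rank
    const auto lo = static_cast<std::size_t>(std::floor(h));
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    return v[lo] + (h - lo) * (v[hi] - v[lo]);  // interpolate between neighbours
}
```

For the values 1, 2, 3, 4 this gives a first quartile of 1.75, a median of 2.5 and a third quartile of 3.25.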
gast128 wrote:
> Steven Watanabe writes:
>> This is called std::nth_element.
>
> Thanks, but are you sure? The median is defined as the middle element when the number of elements is odd, but as the average of the two middle elements when it is even.

So? For a collection c of size 2n, you would use:

nth_element( begin(c), begin(c) + n - 1, end(c) ); // c[n - 1] is now in its sorted position; c[n..2n) holds the upper half
nth_element( begin(c) + n, begin(c) + n, end(c) ); // the smallest element of the upper half moves to c[n]
auto median = (c[n - 1] + c[n]) / 2.0;             // 2.0 avoids integer truncation

> For quartiles (or quantiles in general) it's even worse: there are multiple definitions. I even found an article listing 11 definitions of the quartile: http://www.amstat.org/publications/jse/v14n3/langford.html. Still, it would have been better if Boost had done the thinking for me :)
Definitely!
> So? For a collection c of size 2n, you would use:
>
> nth_element( begin(c), begin(c) + n - 1, end(c) ); // c[n - 1] is now in its sorted position; c[n..2n) holds the upper half
> nth_element( begin(c) + n, begin(c) + n, end(c) ); // the smallest element of the upper half moves to c[n]
> auto median = (c[n - 1] + c[n]) / 2.0;             // 2.0 avoids integer truncation

Yes, however nth_element has some overhead, so calling it twice is not advisable. Of course you can work around that (e.g. with std::partial_sort), but the best thing would be:

auto med = boost::median(itBegin, itEnd); // or std::median
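A sketch of the kind of one-call helper wished for above, assuming random-access iterators. median here is a hypothetical free function, not an existing Boost or standard library API. It calls std::nth_element once and, for an even size, finds the lower middle element with std::max_element over the lower half, which nth_element has already separated out; the whole thing is linear on average:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: exact median of [first, last) with a single
// std::nth_element call. Reorders the range, like nth_element itself.
template <typename RandomIt>
double median(RandomIt first, RandomIt last) {
    const auto n = std::distance(first, last);
    assert(n > 0);
    const RandomIt mid = first + n / 2;
    std::nth_element(first, mid, last);  // *mid is the (upper) middle element
    if (n % 2 == 1)
        return static_cast<double>(*mid);
    // Even size: [first, mid) now holds the n/2 smallest values, so the
    // lower middle element is simply the largest of them.
    const auto lower = *std::max_element(first, mid);
    return (static_cast<double>(lower) + static_cast<double>(*mid)) / 2.0;
}
```

For example, for {4, 1, 3, 2} it returns 2.5, and for {7, 1, 5} it returns 5. Because it reorders its input, pass a copy if the original order matters.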
participants (3)
- gast128
- James Hirschorn
- Steven Watanabe