New subject: Countertree

19 Nov 2012

      On 17 November 2012 19:44, Francisco Jos? Tapia <fjtapia@gmail.com> wrote:
...
2012/11/16 Vadim Stadnik <vadimstdk@gmail.com>
...
On Sat, Nov 17, 2012 at 4:25 AM, Leo Goodstadt <leo.goodstadt@llew.org.uk wrote:
...
...
Project location ( zip file with code and documentation) :
https://dl.dropbox.com/u/8437476/works/countertree_code_doc.zip
Quick view of documentation with code download :
https://dl.dropbox.com/u/8437476/works/countertree/index.html
This library is an implementation of a binary red-black counter tree.
This tree have an additional counter in each leaf. This permit the access to
the elements by the position, like in a vector. It is a random access
container with random access iterators
Among other things, countertree would allow the boost Statistical
Accumulators Library to support rolling median and rolling quantiles
efficiently. This would be very useful.
I had previously had to hack the tree class in the Policy-Based Data
Structures part of the gnu c++ library.
(see http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_order_statistics_node... )
As far as I know, this support is not yet available in countertree project,
although this is possible in principle.
About the compatibility with the boost Statistical Accumulators Library, I
didn't thought about it, but it would be useful. I will try to implement,
but I need to examine in deep for to provide you a well founded opinion.
1) For counter trees, the median is just the N/2-th value in the
counter tree, where K is the size of the tree.
The Nth-percentile can be calculated similarly.
A rolling median with a fixed size window requires there to be an
insertion and deletion for each (most!) new value, so I am guessing
it should run in O(N log K) time, where N =  size of data, K = size of window

2) The alternative is to use two multisets corresponding to values
smaller and larger than the current median (or the Nth percentile).
This should also have O(N log K) complexity.

3) With the special case where the data consists of integers in a
fixed range, we can use a segment tree where the complexity would be
O(N log R) where R = the size of the fixed range of data values. This
can result in considerable speed ups.

The best bit about counter trees are their flexibility, we can
calculate multiple Nth-percentiles at the same time (e.g. all the
quartiles). As to the actual  rather than theoretical speed, I think
we need bench-marking....

Leo

Re: [boost] Countertree

Leo Goodstadt

Vadim Stadnik

tags

participants (2)