New subject: Countertree

20 Nov 2012

      ...
...
On Mon, Nov 19, 2012 at 9:02 PM, Leo Goodstadt <leo.goodstadt@llew.org.uk>wrote:
1) For counter trees, the median is just the N/2-th value in the
counter tree, where K is the size of the tree.
The Nth-percentile can be calculated similarly.
A rolling median with a fixed size window requires there to be an
insertion and deletion for each (most!) new value, so I am guessing
it should run in O(N log K) time, where N =  size of data, K = size of
window
IIUC, it is better to avoid modifying original data (size N), the rolling
window of size K can be represented by a multiset based on an augmented
On Tue, 20 Nov 2012, Vadim Stadnik <vadimstdk@gmail.com> wrote:
tree. This method guarantees complexity of obtaining 1st median value for
O(K log K) operations, which is the cost of construction of the rolling
window, every subsequent median value can be obtained for O(log K)
operations, when the window rolls. This approach should be more efficient.
Oops. Careless editing made me mix up my Ks and Ns, hopeless confusing everyone.

So given a large data set of size N and we want to calculate medians
across a rolling window
of size K, we only need to create a counter tree of size K (the window size).
The median is the K/2 value in the tree.

As you point out, the entire operation runs in O(N log K) time.

All sensible approaches to calculating medians I can think of take O(N
log K) time. The question comes down to what the constants are.

The great advantage of a counter tree is that we are not limited to
medians (the 50%ile value). We can collect for example, the values at
25%, 50%, and, 75%, and so on, at only marginally extra cost. I
presume that most of the cost lies in balancing the tree after
insertions/deletions and updating the counters.

The only shortcut is the trivial optimisation that if the value
entering the window is the same as the one being removed at the other
end, then the median obviously stays the same.

Leo

Re: [boost] Countertree

Leo Goodstadt

Andrii Sydorchuk

Francisco José Tapia

Andrii Sydorchuk

Francisco José Tapia

tags

participants (3)