Re: [boost] Countertree

On 17 November 2012 19:44, Francisco Jos? Tapia <fjtapia@gmail.com> wrote:
2012/11/16 Vadim Stadnik <vadimstdk@gmail.com>
On Sat, Nov 17, 2012 at 4:25 AM, Leo Goodstadt <leo.goodstadt@llew.org.uk wrote:
Project location ( zip file with code and documentation) : https://dl.dropbox.com/u/8437476/works/countertree_code_doc.zip
Quick view of documentation with code download : https://dl.dropbox.com/u/8437476/works/countertree/index.html
This library is an implementation of a binary red-black counter tree. This tree have an additional counter in each leaf. This permit the access to the elements by the position, like in a vector. It is a random access container with random access iterators
Among other things, countertree would allow the boost Statistical Accumulators Library to support rolling median and rolling quantiles efficiently. This would be very useful.
I had previously had to hack the tree class in the Policy-Based Data Structures part of the gnu c++ library. (see http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_order_statistics_node... ) As far as I know, this support is not yet available in countertree project, although this is possible in principle.
About the compatibility with the boost Statistical Accumulators Library, I didn't thought about it, but it would be useful. I will try to implement, but I need to examine in deep for to provide you a well founded opinion.
1) For counter trees, the median is just the N/2-th value in the counter tree, where K is the size of the tree. The Nth-percentile can be calculated similarly. A rolling median with a fixed size window requires there to be an insertion and deletion for each (most!) new value, so I am guessing it should run in O(N log K) time, where N = size of data, K = size of window 2) The alternative is to use two multisets corresponding to values smaller and larger than the current median (or the Nth percentile). This should also have O(N log K) complexity. 3) With the special case where the data consists of integers in a fixed range, we can use a segment tree where the complexity would be O(N log R) where R = the size of the fixed range of data values. This can result in considerable speed ups. The best bit about counter trees are their flexibility, we can calculate multiple Nth-percentiles at the same time (e.g. all the quartiles). As to the actual rather than theoretical speed, I think we need bench-marking.... Leo

1) For counter trees, the median is just the N/2-th value in the counter tree, where K is the size of the tree. The Nth-percentile can be calculated similarly. A rolling median with a fixed size window requires there to be an insertion and deletion for each (most!) new value, so I am guessing it should run in O(N log K) time, where N = size of data, K = size of window
IIUC, it is better to avoid modifying original data (size N), the rolling window of size K can be represented by a multiset based on an augmented
On Mon, Nov 19, 2012 at 9:02 PM, Leo Goodstadt <leo.goodstadt@llew.org.uk>wrote: tree. This method guarantees complexity of obtaining 1st median value for O(K log K) operations, which is the cost of construction of the rolling window, every subsequent median value can be obtained for O(log K) operations, when the window rolls. This approach should be more efficient. Regards, Vadim Stadnik
participants (2)
-
Leo Goodstadt
-
Vadim Stadnik