multi-index:Q - Space and time efficiency
Hi, I have a struct with 2 members as follows: typedef struct time_series { time_t _time; double _value; } time_series_t; The data I get is sorted by time as it is a time series. I would like to maintain another index sorted by value. Since I need to keep removing the oldest records, I need the time based order and since I need to compute percentile, I need to sort it based on value. Would using multi-index save space (and time in insertion and removal) compared to having 2 vectors with different sort order? This is related to my implementation of percentile using boost::accumulators. I could not use the 'tail' or other equivalents as they expect window size (number of data) in advance and I will not know that. Hence, I need a container that can grow. -dhruva Unlimited freedom, unlimited storage. Get it now, on http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html/
dhruva escribió:
Hi, I have a struct with 2 members as follows:
typedef struct time_series { time_t _time; double _value; } time_series_t;
The data I get is sorted by time as it is a time series. I would like to maintain another index sorted by value. Since I need to keep removing the oldest records, I need the time based order and since I need to compute percentile, I need to sort it based on value. Would using multi-index save space (and time in insertion and removal) compared to having 2 vectors with different sort order?
The space taken per element by the Boost.MultiIndex-based solution would be: 6p+sizeof(time_series_t) where p is the size of a pointer, 4 bytes in a typical 32-bit architecture. (See http://www.boost.org/libs/multi_index/doc/performance.html for details.) As for having two vectors, one with the time_series_t objects themselves, the second with pointers to the elements of the first one, the space per element is c(p+sizeof(time_series_t)) where c is the capacity() to size() ratio for the vectors, which oscillates (unless explicitly controlled by the user) between 1 and some lib-specific upper bound (2 or 1.5 in most cases). So, if we have for instance p=4, sizeof(time_series_t)=16, c=1.25 that would yield 40 bytes per element for scenario 1 and 25 bytes per element for scenario 2. Your mileage may vary depending on your particular environment. As for speed, this much depends on your usage patterns: how often you sort the vectors, prune them, etc. In particular, take into account that removing elements from a position in a vector other than the tail is expensive, which might lead you to use (the more space-consuming) std::queues instead. I guess your best option is to prototype both scenarios and profile them. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
----- Original Message ----
From: "joaquin@tid.es"
As for speed, this much depends on your usage patterns: how often you sort the vectors, prune them, etc. In particular, take into account that removing elements from a position in a vector other than the tail is expensive, which might lead you to use (the more space-consuming) std::queues instead. I guess your best option is to prototype both scenarios and profile them.
Thank you for the detailed explanation. To have a sorted data (1 container), I am suing priority_queue (based on vector) and the other container storing the incoming data (which is sorted based on time) is stored in dequeue. I will take your suggestion of having both implementations and profiling them. -dhruva Check out the all-new face of Yahoo! India. Go to http://in.yahoo.com/
participants (2)
-
dhruva
-
joaquin@tid.es