Linear circular buffer

Hi, The boost library contains a circular buffer template by Jan Gaspar. It seems well written and STL robust, but I believe the design has some performance shortcomings in many practical usages. Every time I have used a circular buffer it has been to drag a window over a huge or real time dataset. If the dataset is non real-time and small enough to be contained in memory, there is no practical need for a circular buffer, since the window data pointer can just be dragged in memory along the full length of the dataset. In many algorithms using a dragged window, you would need to process the full length of the window for each time the window is moved. E.g. convolving signal data. Thus, you read from the window a lot more than you write. To maximize performance of your window reading and to utilize external libraries, you need linear access to the data. While Gaspar's circular buffer provide a linearize method, it requires a full reorganization of the contained data. This yields an O(N*M) efficiency, just for dragging the window (size N) over the data (size M). For internal use, the circular buffer provides iterators, but each iteration requires an internal buffer check, in order to wrap around the buffer border. Since the nature of a circular buffer in most practical scenarios (at least that I can think of) is to be involved in the processing of large amount of data, read performance is important. If you allow the internal buffer to allocate more memory than the size of the circular buffer, you can reduce the number of times you need to reorganize the internal buffer to make it linear readable. If you allocate N items more than the circular buffer size, you only need to reorganize every N+1 pushes to the window. Thus, allocating additional 100 items will reduce the reorganization overhead to 1%. I have written to Jan Gaspar, who recognizes that the buffer is not efficient for linear use, but he hasn't got the time anymore to add that feature. Is a circular buffer variant with linear access at a low maintenance cost something that would have public interest? Soren

Hi Soren, On Thu, Jun 25, 2009 at 4:35 AM, Soren Holstebroe<holstebroe@gmail.com> wrote:
The boost library contains a circular buffer template by Jan Gaspar. It seems well written and STL robust, but I believe the design has some performance shortcomings in many practical usages.
I use it just fine in my high performance applications where I maintain the most recent N timings from API calls. Although iteration through the dataset may be a little painful, N is not large enough for me to notice (100 and 1000) the pain. The calls run in sub-microsecond time and the maintenance of automatically evicting older entries gives me (the user) a good guarantee that the data I'm getting is consistent.
Every time I have used a circular buffer it has been to drag a window over a huge or real time dataset. If the dataset is non real-time and small enough to be contained in memory, there is no practical need for a circular buffer, since the window data pointer can just be dragged in memory along the full length of the dataset.
That makes two of us. :)
In many algorithms using a dragged window, you would need to process the full length of the window for each time the window is moved. E.g. convolving signal data. Thus, you read from the window a lot more than you write. To maximize performance of your window reading and to utilize external libraries, you need linear access to the data. While Gaspar's circular buffer provide a linearize method, it requires a full reorganization of the contained data. This yields an O(N*M) efficiency, just for dragging the window (size N) over the data (size M). For internal use, the circular buffer provides iterators, but each iteration requires an internal buffer check, in order to wrap around the buffer border. Since the nature of a circular buffer in most practical scenarios (at least that I can think of) is to be involved in the processing of large amount of data, read performance is important.
I agree about read performance, but I don't think you ever need to call 'linearize' if you don't intend to build a vector out of your circular buffer. Iterating through the container is O(N) where N is the size of the container, and the check for borders is O(1); I can imagine that you can use the modulo (%) function given that you know the initial offset of the "first" entry in the buffer, and N being the size of the buffer -- this doesn't require a check for borders then because any increment will always be trimmed and offset to within the buffer.
If you allow the internal buffer to allocate more memory than the size of the circular buffer, you can reduce the number of times you need to reorganize the internal buffer to make it linear readable. If you allocate N items more than the circular buffer size, you only need to reorganize every N+1 pushes to the window. Thus, allocating additional 100 items will reduce the reorganization overhead to 1%.
How about space efficiency then? I (the user) expect to have just 100 elements' worth of memory used when I allocate 100 elements' worth of a circular buffer at construction time (same goes for the STL vector).
I have written to Jan Gaspar, who recognizes that the buffer is not efficient for linear use, but he hasn't got the time anymore to add that feature.
Is a circular buffer variant with linear access at a low maintenance cost something that would have public interest?
Linear access time is nice especially if you intend to always just go through the whole data structure everytime -- having said this I don't think I'd necessarily want to pay for extra space usage if I can absolutely avoid it. How do you intend to implement the internal buffer then without having to use more memory than necessary to store the elements? -- Dean Michael Berris | Software Engineer, Friendster, Inc. blog.cplusplus-soup.com | twitter.com/mikhailberis | linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis | deanberris.com
participants (2)
-
Dean Michael Berris
-
Soren Holstebroe