Proposal: Vector-like container, which takes O(log(N)) for insert/erase, feedback

Hi, everybody! The most important feature of my container is performance. However, other features will appear later. https://github.com/alexkupri/array/blob/master/description/description.txt Today I compared my container with rope. My container outperformed it 10 times. link, I added verbose explanation under "ROPE TEST" section. Shortly speaking, four reasons exist: cache, pages, balancing and malloc/free.

Hi Alexander, Please respond one email at a time, otherwise it's hard to follow the discussion. See http://www.boost.org/community/policy.html. Alexander Kuprijanov wrote:
Again, the times you measure are too small to reasonably tell which one is faster. You could perform each test 10k more times to keep the results around 1s. Instead of comparing it with a different implementation I guess it would be sufficient to contain only 2 children in the internal nodes. <snip>
I agree with Lars. Your container shouldn't be seen as a replacement. It has different properties, hence it may be useful in some situations but not all of them. And I strongly dissagree with the statement that your container should be the first choice. std::vector<> will outperform it in most of the real-life use-cases. Also, IMHO the most important feature of std::list<> is insert() and erase() not invalidating other iterators. AFAIU in your container iterators may be invalid after insert()? Regards, Adam

On Apr 5, 2014, at 4:59 PM, Adam Wulkiewicz <adam.wulkiewicz@gmail.com> wrote:
I agree with Lars. Your container shouldn't be seen as a replacement. It has different properties, hence it may be useful in some situations but not all of them.
..
AFAIU in your container iterators may be invalid after insert...
I think it would benefit the discussion if we would lists it's properties about access and O() and compare it with STL containers. If it a class on its own then one can expect more variant in the future like the hash_map version of a map.

On Sun, Apr 6, 2014 at 12:45 AM, Alexander Kuprijanov <alexkupri@gmail.com>wrote:
Thank you for the figure! IIUC, this data structure is a variant of an augmented B+ tree, since it stores elements of a data set in external nodes only and provides access to the elements through a B-tree, which implements augmenting with the counts of elements. It is a combination of two data structures, which is one of the main features of B+ trees. This data structure is quite similar to augmented B+ trees implemented in the library "STL Extensions" submitted for Boost review https://github.com/vstadnik/stl_ext_adv_review. For comparison, please have a look at Figure 1 in the section "Design and implementation of B+ trees" that shows an example of a string representation. This data structures is a combinations of a segmented array and a dynamically allocated B-tree. The difference is that this data structure is a level-linked B+ trees. It has additional links so that nodes at every internal level of the tree form a linked list. The main advantage of the level-linked variant of B+ trees is that with multiple augmenting they can support not only fast access and update operations on sequences of elements, but also fast sequential summation algorithms, which have logarithmic running time. The first suggestion is to implement a container that provides a union of interfaces of STL sequences and can be used as a replacement for std::vector, std::list and std::deque. In the library "STL extensions" this container is named std_ext_adv::sequence. In addition to this, the data structure that you developed can be used to store sorted elements. Thus, it can support interfaces of STL containers std::set, std::multiset, std::map and std::multimap. Did you consider the development of these associative containers? I would suggest implementing all of these STL containers without fast summation algorithms for the first version. The users will be able to take advantage of interchangeable containers and choose the best one for a specific application. Regards, Vadim Stadnik

Vadim Stadnik <vadimstdk <at> gmail.com> writes:
Hi, Vadim! I've compared current performance of our containers. Mine outperforms yours roughly by 1.5-2 times in random single insert, delete and random access. However, yours outperforms mine up to 3 times on large arrays in iterative access. Additionally, yours keeps additional pointer per each element. This seems me quite unnecessary and ineffective. Yours takes roughly 2 times more memory. I understood for myself, how to implement iterators, so that they will be relatively fast and compatible with std::vector interface. This will be my priority #1, because everybody wants stl compatibility. Best regards, A.K.

On Tue, Apr 8, 2014 at 11:07 PM, Alex Kupriianov <alexkupri@gmail.com>wrote:
Thank you for the comparison of performance. The performance of an algorithm is normally better in a native data structure than in its STL variant unless the latter can take advantage of a fast allocator. When you implement STL compliant containers your data structures can be compared with many other data structures: linked lists, red-black trees, arrays, vectors, and other augmented trees. The current implementation of augmented B+ trees that you tested is quite general. At this stage I would like to avoid the problem of premature optimization. The use of space and performance of specific algorithms can be improved in this variant of B+ trees. This task requires specification of priorities. Regards, Vadim Stadnik

Alexander Kuprijanov <alexkupri <at> gmail.com> writes:
Hi again, boost community! After some time of hesitating, I re-implemented the container. Shortly speaking, the container has interface like vector, but performs insertions/deletions much faster (random access, however, a bit slower). It takes O(log(N)) for insertions, deletions, random access, near O(1) for iterative access. It performs random insertions/deletions faster than vector in absolute values (not only in O() terms) on all range of container size. The things I've changed: 1) renamed the container; 2) made its interface compatible with vector; 3) added some useful functions, such as split and concatenate, which do not move all elements and require only O(log(N)) time. I hope the container will be useful extension. Code: https://github.com/alexkupri/array Manual: http://alexkupri.github.com/array/

On Thu, Jul 10, 2014 at 4:44 AM, Aleksandr Kupriianov <alexkupri@gmail.com> wrote:
Hi Aleksandr, IMO, the data structure and container that you developed are quite useful. Any data structure or container that outperforms std::vector<T> is important in practice, since std::vector<T> is the default and the most frequently used standard container. Linear computational complexity of its update operations can significantly affect performance of user algorithms. This is why I suggest keep improving your code and project. The first problem is that your files do not include Boost license. If you are planning to submit your project for Boost review please check this list of requirements: http://www.boost.org/development/requirements.html As for documentation, it is necessary to explain benefits of your data structure and container. It would be helpful to provide examples of applications in which the new container can replace std::vector<T> with significant performance gain. You can use already available examples discussed in my and other similar projects, but it would be even more impressive if you add some new examples. It is also highly desirable to include a section that explains the design of your data structure. One or two figures that illustrate the method of augmenting would help every interested programmer to better understand your achievements. Thank you for comparing performance with my containers. A bit later I will test your container and will write more comments. Regards, Vadim Stadnik
participants (7)
-
Adam Wulkiewicz
-
Aleksandr Kupriianov
-
Alex Kupriianov
-
Alexander Kuprijanov
-
Jeremy Murphy
-
Thijs (M.A.) van den Berg
-
Vadim Stadnik