Re: [boost] Review managers wanted for augmented data structures

8:08 PM, "Vadim Stadnik" wrote:
Is there an alternative of not integrating advanced data structures into STL?
Why do they need to be put "into STL"? Why can't they exist in addition to it, or alongside it?
Augmented trees are non-standard due to different performance guarantees.
They're non-standard because they're not in the standard.
A more reasonable and practical approach to STL would be to provide a wide set of interchangeable containers and data structures with different performance guarantees and allow programmers to make decisions about the most suitable containers for specific applications.
That's exactly what the STL is. The STL is about iterators and algorithms and concepts, not a fixed set of predefined containers. You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al. If your containers don't meet the complexity requirements of an Associative Container as defined in the standard that's fine, just don't claim it is an Associative Container and there shouldn't be a problem, surely. I don't understand why you think there are objections to non-standard containers. The containers in the standard are not suitable for all situations and are not the only choices, if there's a more suitable non-standard container then using it makes sense. This doesn't put the STL "under pressure", it proves its value, because existing algorithms work with both standard and non-standard containers if they model the required concepts.

On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:
[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.
The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time. Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse. For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time. Existing algorithms are tailored for the existing sequence containers. In order to integrate the use of these new containers ---even as non-standard containers--- we need the standard library to at least acknowledge the possibility of their existence by declaring a new iterator category. Best regards, Martín Knoblauch Revuelta

On Sun, Mar 3, 2013 at 7:38 PM, Martin Knoblauch Revuelta < mkrevuelta@gmail.com> wrote:
On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:
[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.
The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time.
Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse.
For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time.
The main point is that asymptotically this is much more efficient than linear search.
Existing algorithms are tailored for the existing sequence containers. In order to integrate the use of these new containers ---even as non-standard containers--- we need the standard library to at least acknowledge the possibility of their existence by declaring a new iterator category.
I am not sure that adding a new category of an iterator will be helpful, since this can affect the generality of design and user algorithms. To some degree it is equivalent to the current C++ specification of computational complexities. This approach will make acceptable one class of augmented trees, but at the same time it will create problems for integration of other potentially useful data structures. For example, suppose there is a new data structure that supports random access to elements with O(log(log N)) running time. Should we again introduce a new iterator category? Regards, Vadim Stadnik

On Sun, Mar 3, 2013 at 4:47 PM, Vadim Stadnik <vadimstdk@gmail.com> wrote:
On Sun, Mar 3, 2013 at 7:38 PM, Martin Knoblauch Revuelta < mkrevuelta@gmail.com> wrote:
On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:
[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.
The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time.
Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse.
For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time.
The main point is that asymptotically this is much more efficient than linear search.
Do you mean that you would declare them RandomAccess iterators? We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.
[..] I am not sure that adding a new category of an iterator will be helpful, since this can affect the generality of design and user algorithms. To some degree it is equivalent to the current C++ specification of computational complexities. This approach will make acceptable one class of augmented trees, but at the same time it will create problems for integration of other potentially useful data structures.
Yes, I perfectly understand that introducing a new iterator category is not as simple as editing a few lines. That's why I see this as the mayor obstacle for the integration of augmented trees with the style of the C++ Standard Library.
For example, suppose there is a new data structure that supports random access to elements with O(log(log N)) running time. Should we again introduce a new iterator category?
Well, for millions of elements, this means about "4 times slower than O(1)"... so... maybe? Best regards, Martín Knoblauch Revuelta

Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:
Do you mean that you would declare them RandomAccess iterators?
We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.
isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category? how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called? the standard doesn't require random access iterator ops to be constant time.

On Mon, Mar 4, 2013 at 7:35 PM, Stefan Strasser <strasser@uni-bremen.de>wrote:
Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:
Do you mean that you would declare them RandomAccess iterators?
We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.
isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category?
how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called?
the standard doesn't require random access iterator ops to be constant time.
Unfortunately, there is a problem of iterator categories with augmented trees. The C++ standard specifies computational complexity for iterator operations, although they are not shown in tables, since all of them are required to be constant time. In both C++03 and C++11 this is formulated in the section 24.1.8. IMO, the problem is rather theoretical than practical. First of all, in systems with virtual memory management the value of "constant time" can vary by almost two factors for array based and dynamically allocated data structures. This is the reason why in some cases dumb algorithms using std::vector outperform smart algorithms using trees. Second, STL specification ignores the main theoretical parameter of user algorithms - the total computational cost. This aspect has been discussed in the very first message of this thread. Third, augmented trees easily bypass these C++ restrictions, since they provide wider sets of efficient operations and can replace standard containers through existing interfaces. Also, I do not consider adding new category of iterators useful for the reason that it works against generality by increasing the number of types. The generalization or abstraction method reduces all the specific types to one "ideal type". The generalization also means more unified and coherent interfaces for various types that support the interface of this "ideal type". Regards, Vadim Stadnik

On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely <jwakely.boost@kayari.org>wrote:
8:08 PM, "Vadim Stadnik" wrote:
Is there an alternative of not integrating advanced data structures into STL?
Why do they need to be put "into STL"?
Why can't they exist in addition to it, or alongside it?
In principle, this is a good option, since the class of advanced data structures is very wide (for library design can be regarded as unlimited) and it will be impossible to specify all data structures that support the same interfaces, but have minor differences in topology and data scheme in order to improve efficiency of specific container operations and user algorithms. The problems with this option is that current STL does not offer concepts of <sequence> or <set> that could be used in algorithms with "third party" data structures.
Augmented trees are non-standard due to different performance guarantees.
They're non-standard because they're not in the standard.
A more reasonable and practical approach to STL would be to provide a wide set of interchangeable containers and data structures with different performance guarantees and allow programmers to make decisions about the most suitable containers for specific applications.
That's exactly what the STL is. The STL is about iterators and algorithms and concepts, not a fixed set of predefined containers.
I agree that STL greatly improved representation of math concepts compared to all previous libraries. Nevertheless, it is well known that its containers are not completely interchangeable. First of all, this concerns the concept of <sequence>, which is represented in STL (C++03) by three types of containers with different interfaces: std::vector, std::list and std::deque. The next level of generalization is achieved with augmented trees that efficiently support the union of interfaces of all of these types of STL containers.
You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.
I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,
If your containers don't meet the complexity requirements of an Associative Container as defined in the standard that's fine, just don't claim it is an Associative Container and there shouldn't be a problem, surely.
The case of associative containers is particularly illustrative. Basic search trees that currently support these types of containers are acceptable. However, the augmented variants of these trees that offer improved access to "n-th element" are not acceptable. Why should we avoid calling such containers associative when they use augmented trees (equivalent to improved trees) instead of basic trees? Just because they offer more efficient operations? Yes, that's correct! This conclusion follows from the C++ standard specification of computational complexities for single operations. This is an illustration how STL becomes unfriendly to more useful data structures.
I don't understand why you think there are objections to non-standard containers. The containers in the standard are not suitable for all situations and are not the only choices, if there's a more suitable non-standard container then using it makes sense. This doesn't put the STL "under pressure", it proves its value, because existing algorithms work with both standard and non-standard containers if they model the required concepts.
IMO, STL is fundamentally very strong ! First of all because of excellent interfaces. The problem is that it is not friendly to data structures that can make this library even better: improve efficiency of user algorithms and generalization of math concepts. The advantages of augmented data structures become obvious when they are tested in solutions to real problems. For everyone who is yet skeptical about usefulness of these data structures I would suggest to solve the following classical Josephus problem: http://en.wikipedia.org/wiki/Josephus_problem The performance test using two operations std::advance() and container.insert()) discussed in http://dl.dropbox.com/u/51041088/stl_ext_adv_review/doc/doc/boost_comparison... https://github.com/vstadnik/stl_ext_adv_review/blob/master/doc/doc/boost_comparison.html<https://github.com/vstadnik/stl_ext_adv_review> is a variant of algorithm for this type of problems with the difference that elements are not erased, but inserted into a container. Every STL and Boost container shows quadratic O(NxN) running time, whereas containers using augmented data structures have O(N logN) running time. Regards, Vadim Stadnik

On 3 March 2013 11:03, Vadim Stadnik wrote:
On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely wrote:
You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.
I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,
No that's the opposite of what it says. Try replacing "just" with "only" if that helps. What I said is that extending the STL is acceptable, and I believe it was intended by the designers.

On Sun, Mar 3, 2013 at 9:56 PM, Jonathan Wakely <jwakely.boost@kayari.org>wrote:
On 3 March 2013 11:03, Vadim Stadnik wrote:
On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely wrote:
You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.
I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,
No that's the opposite of what it says. Try replacing "just" with "only" if that helps.
What I said is that extending the STL is acceptable, and I believe it was intended by the designers.
Thank you, This is important clarification. Regards, Vadim Stadnik
participants (4)
-
Jonathan Wakely
-
Martin Knoblauch Revuelta
-
Stefan Strasser
-
Vadim Stadnik