Re: [boost] Review managers wanted for augmented data structures

newer
[fusion] What do we do with nil ?

older
[gil] undocumented/unintended...

Jonathan Wakely

2 Mar 2013 2 Mar '13

10:22 p.m.

8:08 PM, "Vadim Stadnik" wrote:

...

Is there an alternative of not integrating advanced data structures into STL?

Why do they need to be put "into STL"? Why can't they exist in addition to it, or alongside it?

...

Augmented trees are non-standard due to different performance guarantees.

They're non-standard because they're not in the standard.

...

A more reasonable and practical approach to STL would be to provide a wide set of interchangeable containers and data structures with different performance guarantees and allow programmers to make decisions about the most suitable containers for specific applications.

That's exactly what the STL is. The STL is about iterators and algorithms and concepts, not a fixed set of predefined containers. You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al. If your containers don't meet the complexity requirements of an Associative Container as defined in the standard that's fine, just don't claim it is an Associative Container and there shouldn't be a problem, surely. I don't understand why you think there are objections to non-standard containers. The containers in the standard are not suitable for all situations and are not the only choices, if there's a more suitable non-standard container then using it makes sense. This doesn't put the STL "under pressure", it proves its value, because existing algorithms work with both standard and non-standard containers if they model the required concepts.

Show replies by date

Martin Knoblauch Revuelta

3 Mar 3 Mar

8:38 a.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:

...

[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.

The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time. Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse. For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time. Existing algorithms are tailored for the existing sequence containers. In order to integrate the use of these new containers ---even as non-standard containers--- we need the standard library to at least acknowledge the possibility of their existence by declaring a new iterator category. Best regards, Martín Knoblauch Revuelta

Vadim Stadnik

11:47 a.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 3, 2013 at 7:38 PM, Martin Knoblauch Revuelta < mkrevuelta@gmail.com> wrote:

...

On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:

...
[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.

The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time.

Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse.

For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time.

The main point is that asymptotically this is much more efficient than linear search.

...

Existing algorithms are tailored for the existing sequence containers. In order to integrate the use of these new containers ---even as non-standard containers--- we need the standard library to at least acknowledge the possibility of their existence by declaring a new iterator category.

I am not sure that adding a new category of an iterator will be helpful, since this can affect the generality of design and user algorithms. To some degree it is equivalent to the current C++ specification of computational complexities. This approach will make acceptable one class of augmented trees, but at the same time it will create problems for integration of other potentially useful data structures. For example, suppose there is a new data structure that supports random access to elements with O(log(log N)) running time. Should we again introduce a new iterator category? Regards, Vadim Stadnik

Martin Knoblauch Revuelta

4 Mar 4 Mar

7:48 a.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 3, 2013 at 4:47 PM, Vadim Stadnik <vadimstdk@gmail.com> wrote:

...

On Sun, Mar 3, 2013 at 7:38 PM, Martin Knoblauch Revuelta < mkrevuelta@gmail.com> wrote:

...
On Sun, Mar 3, 2013 at 3:22 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:

...
[..] existing algorithms work with both standard and non-standard containers if they model the required concepts.

The containers we are proposing here are sequence containers. The computational complexity of their operations are somehow in between those of std::list and std::vector. They provide random access in O(log N) time _and_ insertion/removal in O(log N) time.

Therefore, their iterators are _not_ as random-access as those of std::vector. If we declare them random-access, existing algorithms will perform poorly on them. And if we declare them sequential-access (Bidirectional), it will be worse.

For example, if the contents of the iterator are sorted, we can make a binary search in O(log N) time, since the implementation is based on a self balanced tree. If we declare the iterators random-access, the existing binary_search algorithm will treat the sequence like a vector, and it will take O(log N log N) time.

The main point is that asymptotically this is much more efficient than linear search.

Do you mean that you would declare them RandomAccess iterators? We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.

...

...
[..] I am not sure that adding a new category of an iterator will be helpful, since this can affect the generality of design and user algorithms. To some degree it is equivalent to the current C++ specification of computational complexities. This approach will make acceptable one class of augmented trees, but at the same time it will create problems for integration of other potentially useful data structures.

Yes, I perfectly understand that introducing a new iterator category is not as simple as editing a few lines. That's why I see this as the mayor obstacle for the integration of augmented trees with the style of the C++ Standard Library.

...

For example, suppose there is a new data structure that supports random access to elements with O(log(log N)) running time. Should we again introduce a new iterator category?

Well, for millions of elements, this means about "4 times slower than O(1)"... so... maybe? Best regards, Martín Knoblauch Revuelta

Stefan Strasser

8:35 a.m.

New subject: Review managers wanted for augmented data structures

Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:

...

Do you mean that you would declare them RandomAccess iterators?

We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.

isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category? how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called? the standard doesn't require random access iterator ops to be constant time.

Vadim Stadnik

12:50 p.m.

New subject: Review managers wanted for augmented data structures

On Mon, Mar 4, 2013 at 7:35 PM, Stefan Strasser <strasser@uni-bremen.de>wrote:

...

Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:

Do you mean that you would declare them RandomAccess iterators?

...
We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.

isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category?

how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called?

the standard doesn't require random access iterator ops to be constant time.

Unfortunately, there is a problem of iterator categories with augmented trees. The C++ standard specifies computational complexity for iterator operations, although they are not shown in tables, since all of them are required to be constant time. In both C++03 and C++11 this is formulated in the section 24.1.8. IMO, the problem is rather theoretical than practical. First of all, in systems with virtual memory management the value of "constant time" can vary by almost two factors for array based and dynamically allocated data structures. This is the reason why in some cases dumb algorithms using std::vector outperform smart algorithms using trees. Second, STL specification ignores the main theoretical parameter of user algorithms - the total computational cost. This aspect has been discussed in the very first message of this thread. Third, augmented trees easily bypass these C++ restrictions, since they provide wider sets of efficient operations and can replace standard containers through existing interfaces. Also, I do not consider adding new category of iterators useful for the reason that it works against generality by increasing the number of types. The generalization or abstraction method reduces all the specific types to one "ideal type". The generalization also means more unified and coherent interfaces for various types that support the interface of this "ideal type". Regards, Vadim Stadnik

Dave Abrahams

31 Mar 31 Mar

3:37 a.m.

New subject: Review managers wanted for augmented data structures

on Mon Mar 04 2013, Vadim Stadnik <vadimstdk-AT-gmail.com> wrote:

...

On Mon, Mar 4, 2013 at 7:35 PM, Stefan Strasser <strasser@uni-bremen.de>wrote:

...
Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:

Do you mean that you would declare them RandomAccess iterators?

...
We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.

isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category?

how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called?

the standard doesn't require random access iterator ops to be constant time.

Unfortunately, there is a problem of iterator categories with augmented trees. The C++ standard specifies computational complexity for iterator operations, although they are not shown in tables, since all of them are required to be constant time. In both C++03 and C++11 this is formulated in the section 24.1.8.

Are you sure these iterators don't fall under the *amortized* constant time requirements, just like those of the ordered associative containers? -- Dave Abrahams

Vadim Stadnik

1:22 p.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 31, 2013 at 1:37 PM, Dave Abrahams <dave@boostpro.com> wrote:

...

on Mon Mar 04 2013, Vadim Stadnik <vadimstdk-AT-gmail.com> wrote:

...
On Mon, Mar 4, 2013 at 7:35 PM, Stefan Strasser <strasser@uni-bremen.de wrote:

...
Am 04.03.2013 08:48, schrieb Martin Knoblauch Revuelta:

Do you mean that you would declare them RandomAccess iterators?

...
We can expect to have containers with millions of elements. For that size, O(log N) is about 20 times faster than O(log N log N). That makes a big difference.

isn't that a reason for providing optimized specializations of some STL algorithms, instead of introducing another iterator category?

how would a new iterator category even help? as long as your iterator is a forward iterator, the user is allowed to call std::binary_search on your iterator, which uses the same algorithm on any iterator category. (only functions used by binary_search like std::advance are specialized on the category) where in all this would your tree-optimized binary_search code be called?

the standard doesn't require random access iterator ops to be constant time.

Unfortunately, there is a problem of iterator categories with augmented trees. The C++ standard specifies computational complexity for iterator operations, although they are not shown in tables, since all of them are required to be constant time. In both C++03 and C++11 this is formulated in the section 24.1.8.

Are you sure these iterators don't fall under the *amortized* constant time requirements, just like those of the ordered associative containers?

Yes, I am sure. The algorithms that implement random access to container elements by index (also often named 'rank') are efficient search algorithms using counters of elements stored in tree nodes. These algorithms are similar to algorithms searching an element by a key that support functions of associative containers, such as c.lower_bound(key) and c.upper_bound(key). In the general case, both kinds of search algorithms on trees have the same logarithmic running time. However, the augmented array based B+ trees have an important advantage over red-black trees. They offer constant time access to the nearest container elements in the worst case against constant amortized time. Thus, these new data structure can provide the efficiency of sequential processing close to that of std::vector. Regards, Vadim Stadnik

Dave Abrahams

2:40 p.m.

New subject: Review managers wanted for augmented data structures

on Sun Mar 31 2013, Vadim Stadnik <vadimstdk-AT-gmail.com> wrote:

...

The algorithms that implement random access to container elements by index (also often named 'rank') are efficient search algorithms using counters of elements stored in tree nodes. These algorithms are similar to algorithms searching an element by a key that support functions of associative containers, such as c.lower_bound(key) and c.upper_bound(key). In the general case, both kinds of search algorithms on trees have the same logarithmic running time. However, the augmented array based B+ trees have an important advantage over red-black trees. They offer constant time access to the nearest container elements in the worst case against constant amortized time. Thus, these new data structure can provide the efficiency of sequential processing close to that of std::vector.

Ah. Seems like we don't have appropriate concepts to characterize these things. What do you say about the big-O when the dominating factor will always have such a small constant that it becomes insignificant? It feels like a cop-out, but I guess I'd go with random_access. -- Dave Abrahams

Vadim Stadnik

9:45 p.m.

New subject: Review managers wanted for augmented data structures

On Mon, Apr 1, 2013 at 12:40 AM, Dave Abrahams <dave@boostpro.com> wrote:

...

on Sun Mar 31 2013, Vadim Stadnik <vadimstdk-AT-gmail.com> wrote:

...
The algorithms that implement random access to container elements by index (also often named 'rank') are efficient search algorithms using counters of elements stored in tree nodes. These algorithms are similar to algorithms searching an element by a key that support functions of associative containers, such as c.lower_bound(key) and c.upper_bound(key). In the general case, both kinds of search algorithms on trees have the same logarithmic running time. However, the augmented array based B+ trees have an important advantage over red-black trees. They offer constant time access to the nearest container elements in the worst case against constant amortized time. Thus, these new data structure can provide the efficiency of sequential processing close to that of std::vector.

Ah. Seems like we don't have appropriate concepts to characterize these things. What do you say about the big-O when the dominating factor will always have such a small constant that it becomes insignificant? It feels like a cop-out, but I guess I'd go with random_access.

Asymptotically, most of operators of these new iterators have O(1) cost, this group includes increment, decrement and comparison operators. However, a few operators {+= , -=, +, -, [ ]} have O(log N) cost. Quite obviously, such iterators are much closer to random access iterators than to bidirectional. In addition to this, without std::random_access_iterator_tag user algorithms can not take full advantage of the efficient algorithms offered by augmented data structures. The running time can increase quite significantly from O(log N) to O(N), since standard library algorithms would use the subset of operators of bidirectional iterators only. Regards, Vadim Stadnik

Vadim Stadnik

3 Mar 3 Mar

11:03 a.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely <jwakely.boost@kayari.org>wrote:

...

8:08 PM, "Vadim Stadnik" wrote:

...
Is there an alternative of not integrating advanced data structures into STL?

Why do they need to be put "into STL"?

Why can't they exist in addition to it, or alongside it?

In principle, this is a good option, since the class of advanced data structures is very wide (for library design can be regarded as unlimited) and it will be impossible to specify all data structures that support the same interfaces, but have minor differences in topology and data scheme in order to improve efficiency of specific container operations and user algorithms. The problems with this option is that current STL does not offer concepts of <sequence> or <set> that could be used in algorithms with "third party" data structures.

...

...
Augmented trees are non-standard due to different performance guarantees.

They're non-standard because they're not in the standard.

...
A more reasonable and practical approach to STL would be to provide a wide set of interchangeable containers and data structures with different performance guarantees and allow programmers to make decisions about the most suitable containers for specific applications.

That's exactly what the STL is. The STL is about iterators and algorithms and concepts, not a fixed set of predefined containers.

I agree that STL greatly improved representation of math concepts compared to all previous libraries. Nevertheless, it is well known that its containers are not completely interchangeable. First of all, this concerns the concept of <sequence>, which is represented in STL (C++03) by three types of containers with different interfaces: std::vector, std::list and std::deque. The next level of generalization is achieved with augmented trees that efficiently support the union of interfaces of all of these types of STL containers.

...

You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.

I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,

...

If your containers don't meet the complexity requirements of an Associative Container as defined in the standard that's fine, just don't claim it is an Associative Container and there shouldn't be a problem, surely.

The case of associative containers is particularly illustrative. Basic search trees that currently support these types of containers are acceptable. However, the augmented variants of these trees that offer improved access to "n-th element" are not acceptable. Why should we avoid calling such containers associative when they use augmented trees (equivalent to improved trees) instead of basic trees? Just because they offer more efficient operations? Yes, that's correct! This conclusion follows from the C++ standard specification of computational complexities for single operations. This is an illustration how STL becomes unfriendly to more useful data structures.

...

I don't understand why you think there are objections to non-standard containers. The containers in the standard are not suitable for all situations and are not the only choices, if there's a more suitable non-standard container then using it makes sense. This doesn't put the STL "under pressure", it proves its value, because existing algorithms work with both standard and non-standard containers if they model the required concepts.

IMO, STL is fundamentally very strong ! First of all because of excellent interfaces. The problem is that it is not friendly to data structures that can make this library even better: improve efficiency of user algorithms and generalization of math concepts. The advantages of augmented data structures become obvious when they are tested in solutions to real problems. For everyone who is yet skeptical about usefulness of these data structures I would suggest to solve the following classical Josephus problem: http://en.wikipedia.org/wiki/Josephus_problem The performance test using two operations std::advance() and container.insert()) discussed in http://dl.dropbox.com/u/51041088/stl_ext_adv_review/doc/doc/boost_comparison... https://github.com/vstadnik/stl_ext_adv_review/blob/master/doc/doc/boost_comparison.html<https://github.com/vstadnik/stl_ext_adv_review> is a variant of algorithm for this type of problems with the difference that elements are not erased, but inserted into a container. Every STL and Boost container shows quadratic O(NxN) running time, whereas containers using augmented data structures have O(N logN) running time. Regards, Vadim Stadnik

Jonathan Wakely

11:56 a.m.

New subject: Review managers wanted for augmented data structures

On 3 March 2013 11:03, Vadim Stadnik wrote:

...

On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely wrote:

...
You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.

I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,

No that's the opposite of what it says. Try replacing "just" with "only" if that helps. What I said is that extending the STL is acceptable, and I believe it was intended by the designers.

Vadim Stadnik

12:29 p.m.

New subject: Review managers wanted for augmented data structures

On Sun, Mar 3, 2013 at 9:56 PM, Jonathan Wakely <jwakely.boost@kayari.org>wrote:

...

On 3 March 2013 11:03, Vadim Stadnik wrote:

...
On Sun, Mar 3, 2013 at 9:22 AM, Jonathan Wakely wrote:

...
You've claimed the STL makes your containers "illegal" which is an odd claim that I can't agree with. Extending the STL with alternative containers is not just acceptable, I'm pretty sure it was intended by Stepanov et al.

I do not understand this statement. Because of ".. NOT just acceptable", it sounds like Stepanov et al. are responsible for STL being non-extendable,

No that's the opposite of what it says. Try replacing "just" with "only" if that helps.

What I said is that extending the STL is acceptable, and I believe it was intended by the designers.

Thank you, This is important clarification. Regards, Vadim Stadnik

4499

Age (days ago)

4528

Last active (days ago)

List overview

Download

8 comments

4 participants

participants (4)

Dave Abrahams
Jonathan Wakely
Martin Knoblauch Revuelta
Stefan Strasser
Vadim Stadnik