[proposal] Sequence container with O(log n) random access _and_ O(log n) insertion/removal (wherever)

- How about a sequence container (like a vector) with the advantage of logarithmic time insertion/removal at any point of the sequence? The disadvantage is that random access takes logarithmic time too (instead of the usual constant time random access of a vector). Here it is: http://sourceforge.net/projects/avl-array It is implemented with a tree, but it is _not_ a map. The 'position' of elements in avl_array are always natural numbers (0, 1, 2... up to size()-1). When an element is inserted/removed, it changes automatically the position of all elements after it in the sequence (in logarithmic time!). A simple experiment realized consisted in populating a container by inserting random numbers in random positions, and then removing numbers from random positions, showing the value of the last one for checking. In this test AVL Arrays completely outperform list, vector and deque. The test code and the results are in sourceforge too. Version 1.1 includes a new experimental feature that I call "Non-Proportional Sequence View". In the normal sequence, every element occupies a position of size 1. Additionally, elements can be viewed as a sequence where the 'width' of every element can be changed (in O(log n) time), being the position of every element the sum of the widths of all previous elements in the sequence. Negative widths are forbidden, but zero widths are valid. The type used for the width is a template parameter, so any class supporting some operators can be used, not only unsigned or double. I've tried to adapt it to the boost requirements, but I'm sure there's still a lot of things to change. If you are interested, please take a look at it, and tell me your opinion. Regards, Martin Knoblauch Revuelta Universidad de Alcala

Martin Knoblauch Revuelta wrote:
How about a sequence container (like a vector) with the advantage of logarithmic time insertion/removal at any point of the sequence? The disadvantage is that random access takes logarithmic time too (instead of the usual constant time random access of a vector).
I was wondering how long it would take this to make it from the usenet groups to here :-) I've mentioned this type of container here in the past, under the name of "rank_tree". Very soon now I will have the version I wrote, and use, ported to the framework that Bernhard produced for the SoC tree project. You can look at the draft TR <https://boost-consulting.com:8443/trac/soc/browser/boost/soc/2006/tree/trunk/libs/tree/doc/html/d2102.html?format=raw> that mentions it, and other trees. I don't have time to look at your implementation right now. But perhaps you could comment on how you think your implementation might fit within the above. And of course if there's something we haven't thought about. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Martin Knoblauch Revuelta wrote:
How about a sequence container (like a vector) with the advantage of logarithmic time insertion/removal at any point of the sequence? The disadvantage is that random access takes logarithmic time too (instead of the usual constant time random access of a vector).
Sorry if these where already mentioned, but I just got around to looking at the implementation: * There's no need for the m_next and m_prev fields. You can do inorder, preorder, postorder, and others without them. * I suspect you don't need the m_height either. But I don't know what you are using it for yet. * In my rank tree implementation the only extra field per node is the count, what you call width (as separate node and total). If I remember correctly in CLRS they describe the possibility of the count not directly reflecting the subtree+1 size to allow for sparse trees. I suspect that you can remove the need for the different m_node_width and m_total_width, using one only, by deducing the m_node_width from "count - left_count - right_count". * Overall you need to make use of STD utilities more. I think someone already mentioned something to this effect. Another example is using the iterator tags and other meta information. This is important to help out type specific algorithms. * Also putting implementations out-of-line would help in reading what the various interfaces do. I also dislike the use of abbreviations and single letter symbols, as it makes it harder to read. But this is mostly a matter of taste :-) * I don't see how not having a static allocator prevents moving nodes across equally typed containers. * I get the impression that there's is way more code than needed to implement this data structure. * You should use the allocator types instead of making types out of the value type directly. * Not sure why you would want the various unsigned/int/long vs size_type variations of methods. Having those extra methods just confuses the interface for users and introduces extra use cases to account for in the implementation. * operator=(o) can't be O(max(N,M)). You have to both construct the new elements and destroy the existing elements which makes it O(M+N), and keeping exceptions in mind you do a swap in between :-) * You have a variety of C style casts, and you happen to be casting "this". That's frowned upon, and truthfully I can't see why you do it. * Algorithms rarely have a place in STL containers. You might want to move the various binary_search, sort, etc. methods to functions. Preferably having them use the same interface as the STD versions. * I would make the value sorted aspect a separate container, possibly as an adaptor to the rank container but possibly also some other form of tree. Sorted ordering is orthogonal to rank statistics. * You seem to be using a sentinel node as a hybrid root and tail node but making the avl_array itself that sentinel node. To say the least this is a bit strange to me. Why not have just a root pointer to a real root node? Now for naming... I don't think I'll ever be able to refer to this as anything other than a rank tree, even with the addition of a sparse version. Obviously you named it avl for the type of balancing you are doing. As for some of the other suggested names they fall into two categories: structure (log_list, llist, larray, chain, random_list, rlist, ralist, ra_sequence) and category (os_tree, order statistics dictionary, and the variety of synonyms for sequence). The category terms fail to account for implementations of other exemplars from that category. This is the reason why STL doesn't have a "container" class... I know it's an exaggerated example :-) The STL tends to stick closely to using implementation related terms which would fit more closely with using structure terms. Programmers find this intuitive because they learn data structures and algorithm, and hence they can use the names as an indication of what to expect from the implementation of the containers. Unfortunately this rules out all the proffered terms as they suggest implementations other than an augmented binary tree. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
Martin Knoblauch Revuelta wrote:
How about a sequence container (like a vector) with the advantage of logarithmic time insertion/removal at any point of the sequence? The disadvantage is that random access takes logarithmic time too (instead of the usual constant time random access of a vector).
Sorry if these where already mentioned, but I just got around to looking at the implementation:
* There's no need for the m_next and m_prev fields. You can do inorder, preorder, postorder, and others without them.
Yes, I could, but operators ++ and -- would become O(log n). Many operations benefit from the posibility of fast iteration through m_next and m_prev. For example: begin() and end() are O(1).
* I suspect you don't need the m_height either. But I don't know what you are using it for yet.
For balance. IMO, the logic of re-balancing becomes simpler by having m_height instead of the usual {-1,0,+1}.
* In my rank tree implementation the only extra field per node is the count, what you call width (as separate node and total). If I remember correctly in CLRS they describe the possibility of the count not directly reflecting the subtree+1 size to allow for sparse trees. I suspect that you can remove the need for the different m_node_width and m_total_width, using one only, by deducing the m_node_width from "count - left_count - right_count".
The width is based on the same concept as count, but they are not related at all. Note that W might be float, for example. I might deduce m_total_width from m_node_width by recursing through the whole subtree every time. Obviously, this would spoil the whole thing.
* Overall you need to make use of STD utilities more. I think someone already mentioned something to this effect. Another example is using the iterator tags and other meta information. This is important to help out type specific algorithms.
I'll study this.
* Also putting implementations out-of-line would help in reading what the various interfaces do.
I agree. I'll do so.
I also dislike the use of abbreviations and single letter symbols, as it makes it harder to read. But this is mostly a matter of taste :-)
Yes, we might discuss this for ages. I try not to abuse them, using longer names on variables that will live longer, or will be passed by reference. Though, IMHO long names turn complex expessions unreadable as operators get buried under lots of letters.
* I don't see how not having a static allocator prevents moving nodes across equally typed containers.
It shouldn't, but it depends on how correct is the allocator class. I read in the documentation of SGI that allocators must be fully static, allowing deallocation of something by a different object of that one used for allocation (perhaps even destructed in the meantime). Is this true for _all_ std and boost allocators too?
* I get the impression that there's is way more code than needed to implement this data structure.
I'm trying to refactorize some things, but it's a large interface... ;-)
* You should use the allocator types instead of making types out of the value type directly.
I don't understand. Do you mean the rebind trick? I thought this was the standard way.
* Not sure why you would want the various unsigned/int/long vs size_type variations of methods. Having those extra methods just confuses the interface for users and introduces extra use cases to account for in the implementation.
It helps avoiding ambiguities when T is an unsigned/int/long. This is how other container implementations solve it.
* operator=(o) can't be O(max(N,M)). You have to both construct the new elements and destroy the existing elements which makes it O(M+N)
Right.
and keeping exceptions in mind you do a swap in between :-)
Exceptions? Swap? Currently, it looks like this: clear (); // delete old contents construct_nodes_list (..); // copy (T constr. might throw) build_known_size_tree (..); // still here? ok, build the tree Do you mean I should do it this way instead? construct_nodes_list (..); // copy (T constr. might throw) clear (); // still here? ok, delete old contents build_known_size_tree (..); // and build the tree
* You have a variety of C style casts, and you happen to be casting "this". That's frowned upon, and truthfully I can't see why you do it.
I'll change them to C++ casts. With this trick, neither iterators nor tree nodes need to contain a pointer to the avl_array object where they "belong", while the interface allows working with iterators without requiring references to the corresponding avl_array objects.
* Algorithms rarely have a place in STL containers. You might want to move the various binary_search, sort, etc. methods to functions. Preferably having them use the same interface as the STD versions.
I'll study this too.
* I would make the value sorted aspect a separate container, possibly as an adaptor to the rank container but possibly also some other form of tree. Sorted ordering is orthogonal to rank statistics.
Why? I wouldn't like to restrict the hybrid use of a container object. I mean, using it as a sorted container for a while, then using it as unsorted for a while, then sorting it again, maybe with a different criterion, and so on... Note that iterators would remain valid through the whole process...
* You seem to be using a sentinel node as a hybrid root and tail node but making the avl_array itself that sentinel node. To say the least this is a bit strange to me.
In version 1.0, the sentinel was a member of the avl_array. I had to use some ugly non-portable macros to get the avl_array from the sentinel address. It's much better with inheritance.
Why not have just a root pointer to a real root node?
An embedded sentinel (aggregated, or inherited) is IMHO far more consistent with the idea of an "end" position. Additionally, it simplifies the implementation.
Now for naming...
I don't think I'll ever be able to refer to this as anything other than a rank tree, even with the addition of a sparse version. Obviously you named it avl for the type of balancing you are doing. As for some of the other suggested names they fall into two categories: structure (log_list, llist, larray, chain, random_list, rlist, ralist, ra_sequence) and category (os_tree, order statistics dictionary, and the variety of synonyms for sequence).
The category terms fail to account for implementations of other exemplars from that category. This is the reason why STL doesn't have a "container" class... I know it's an exaggerated example :-)
The STL tends to stick closely to using implementation related terms which would fit more closely with using structure terms. Programmers find this intuitive because they learn data structures and algorithm, and hence they can use the names as an indication of what to expect from the implementation of the containers. Unfortunately this rules out all the proffered terms as they suggest implementations other than an augmented binary tree.
I would be pleased with anyone of the suggested names, including rank tree if you don't mind. Thanks a lot for your suggestions. Best regards, Martin

On 10/31/06, Martin Knoblauch Revuelta <mkrevuelta@gmail.com> wrote:
The width is based on the same concept as count, but they are not related at all. Note that W might be float, for example.
I might deduce m_total_width from m_node_width by recursing through the whole subtree every time. Obviously, this would spoil the whole thing.
I don't entirely understand the benifit of your 'non-proportional sequence view'. What use cases can you think of where this would be beneficial to client code? Right now it seems like it adds unjustified bloat to the library.
* I get the impression that there's is way more code than needed to implement this data structure.
I'm trying to refactorize some things, but it's a large interface... ;-)
Large interface == monolithic design. As much as possible, I would try to stick to just implementing member functions that are already member functions of other container types. Everything else should be a free function, perhaps in an optional utility header of some kind.
* Not sure why you would want the various unsigned/int/long vs size_type variations of methods. Having those extra methods just confuses the interface for users and introduces extra use cases to account for in the implementation.
It helps avoiding ambiguities when T is an unsigned/int/long. This is how other container implementations solve it.
What ambiguities? Implicit conversions exist between built in types, so if there is no algorithmic difference I don't understand this justification.
and keeping exceptions in mind you do a swap in between :-)
Exceptions? Swap?
He is referring to a pattern where you would create a temporary <sequence name here> object, build up that object, and then swap the two sequences (just swap their root pointer values). This would ensure that if you can't finish building the tree then the nodes you added will be released automatically. Essentially, commit/rollback without a try/catch.
* You have a variety of C style casts, and you happen to be casting "this". That's frowned upon, and truthfully I can't see why you do it.
I'll change them to C++ casts.
With this trick, neither iterators nor tree nodes need to contain a pointer to the avl_array object where they "belong", while the interface allows working with iterators without requiring references to the corresponding avl_array objects.
I would say that slightly more space overhead would be preferable to tricky and less straight forward code. Although I'm not sure why either the nodes or the iterators should be aware of the container they are a part of to begin with.
* Algorithms rarely have a place in STL containers. You might want to move the various binary_search, sort, etc. methods to functions. Preferably having them use the same interface as the STD versions.
I'll study this too.
One of the most important features of the stl is that it separates algorithms from the container types they operate on. Most of these algorithms already exist in the stl. What is your justification that they need to be overridden for your container in the first place?
* I would make the value sorted aspect a separate container, possibly as an adaptor to the rank container but possibly also some other form of tree. Sorted ordering is orthogonal to rank statistics.
Why? I wouldn't like to restrict the hybrid use of a container object. I mean, using it as a sorted container for a while, then using it as unsorted for a while, then sorting it again, maybe with a different criterion, and so on... Note that iterators would remain valid through the whole process...
Because it desimplifies the interface, causing bloat and making client code potentially pay for features it doesn't want/need. I would prefer a separate container type (perhaps priority_<sequence_name_here>) which would behave more like a stack or queue. It could allow pushing and popping values into sorted order and forbid arbitrary location inserts by the user. Supporting both random location inserts by the user and sorted inserts doesn't make sense. The container no longer has control over its invariants. -Jason

On 10/31/06, Jason Hise <0xchaos@gmail.com> wrote:
I don't entirely understand the benifit of your 'non-proportional sequence view'. What use cases can you think of where this would be beneficial to client code?
The best example I've thought of is a text editor. You might want to access the lines of a text file indexing by long-line (or paragraph) number, or by broken-line number. Every long-line would be stored in an element, and node_width would be the number of broken-lines required for it. Note that this can be extended to word processors, internet browsers, e-book readers or, in general, any program displaying a series of things that might extend significantly in one dimension. The width might be the height in pixels of the lines (of text, or whatever). In fact, this idea is already in use: struct _GtkTextBTreeNode { //... int num_lines; /* Total number of lines (leaves) in * the subtree rooted here. */ int num_chars; /* Number of chars below here */ //... }; It can also be used for saving space by having a fixed circular buffer of, say, 32 small elements in each node of the tree (though, iterators would loose some good validity properties). For the next version I have designed an interface that allows having more than one NPSV width, and indexing separately by each one of them.
Right now it seems like it adds unjustified bloat to the library.
The empty_number class, used as default value for the template parameter W, contains no data, so it should be optimized away by any compiler.
I'm trying to refactorize some things, but it's a large interface... ;-)
Large interface == monolithic design. As much as possible, I would try to stick to just implementing member functions that are already member functions of other container types. Everything else should be a free function, perhaps in an optional utility header of some kind.
IMHO, the main things that make it large are supporting vector+list interfaces, and having four different types of iterators: var/const x straight/reverse. By the way, somebody asked me if I could implement splice methods. I am doing it, but I'm calling them move, since splice is a specific name referring how this is done with lists, and remarking that it takes O(1) time, which will be false here. What do you think?
* [..] various unsigned/int/long vs size_type variations of methods. [..]
It helps avoiding ambiguities when T is an unsigned/int/long. This is how other container implementations solve it.
What ambiguities? [..]
For example, if you only provide the unsigned version of avl_array (unsigned/int/long n, const T & t), then this code: avl_array<int> A(4,3); will be understood by the compiler as a call to: template <class IT> avl_array (IT from, IT to); which is, of course, wrong. All these methods are inline, so there should be no difference after compilation. I simply applied the same pattern wherever I suspected that there might be any ambiguity. That said, if concepts help avoiding these ambiguities, I'll be happy to remove the extra versions. I find them simply painful.
and keeping exceptions in mind you do a swap in between :-) [..] He is referring to a pattern where you would create a temporary <sequence name here> object, build up that object, and then swap the two sequences (just swap their root pointer values). This would ensure that if you can't finish building the tree then the nodes you added will be released automatically. Essentially, commit/rollback without a try/catch.
Sounds good :-)
I would say that slightly more space overhead would be preferable to tricky and less straight forward code.
The key here is the word "slightly". I find that having a pointer to the container in every node takes a lot of space. So does having it in iterators; the user might be cross-referencing different data structures, using lots of iterators.
Although I'm not sure why either the nodes or the iterators should be aware of the container they are a part of to begin with.
Good point. I think that, in some operations, specifying the container is both redundant and disgusting. Redundancy is error prone. If you are specifying a location with an iterator, why should you specify the container too. Specially when it is not necessary at all, or it can be deduced from the iterator reasonably fast. In non-static methods where an iterator is required, an assert checks the consistency of the call. This helps in debug processes. assert (owner(it.ptr)==this); // it must point into this arr. See this other assert in the operator for iterators subtraction: // Again, we can mix const and var. template<class X,class Y> // while calculating distances int operator- (const avl_array_rev_iter<T,A,W,X,Y> & it) const { my_array * a, * b; int m, n; m = my_array::position_of_node (ptr, a, false); n = my_array::position_of_node (it_ptr(it), b, false); assert (a==b); // Inter-array distance has no sense return n-m; } Some methods have been made static just to provide enhaced flexibility. For example: static iterator insert (const iterator & it, const T & t); (Now I'm writing all this, i note that i forgot to make static the method erase(iterator). Some other members might be made static too...) Other methods, like static swap(iterator,iterator) have their functionality extended (swapping nodes between different containers) without requiring any additional parameter. It would be absurd to require specification of only one of the involved containers. The same thing happens with all versions of move() (coming soon; see comments about splice above). I think that this provides flexibility and ease of use. Whether the "static-where-possible" approach is kept or not, reaching the container from iterators is definitely useful. If kept, for not requiring references to it. If not, for checking consistency. Of course, the end node might be outside the container class, or it might be there as an aggregated member (instead of being inherited). But what would be the difference? Unless an extra T object (or at least the corresponding amount of memory) was used, the end node would need to be of a different class, and then the same casts (C or C++ style) would be required. And, by the way, casting a pointer from a base class to a derived class is perfectly legal and safe (regarded that it is actually pointing to an object of the derived class), isn't it?
One of the most important features of the stl is that it separates algorithms from the container types they operate on. Most of these algorithms already exist in the stl. What is your justification that they need to be overridden for your container in the first place?
Because I bet that these are optimized for either O(1) random access or O(1) insertion/removal. The new container doesn't fit any of those, and it would perform suboptimal if treated by algorithms based on those assumptions. Is there a sort specialization for sorted trees? I mean, creating a tree from scratch and populating it with an insert_sorted method. Even if it exists, note the optimization I am using: instead of extracting the nodes one by one from the source tree (with the corresponding rebalancing), I detach the list from it, and traverse it. The complexity will remain the same, but this trick will save a lot of time. This can only be done from inside the class or from a friend function. The same is for stable_sort(). The methods reverse, merge and unique are all O(N) (well, merge is O(M+N) ;). I can't imagine how this could be achieved by an existing algorithm, using only the public interface, and maintaining the validity of all iterators. The methods insert_sorted() and binary_search() would take O(sqr(log n)) if implemented from the outside.
* I would make the value sorted aspect a separate container, [..]
Why? I wouldn't like to restrict the hybrid use of a container object. [..]
Because it desimplifies the interface, causing bloat and making client code potentially pay for features it doesn't want/need.
This container is a template. Unused methods don't even get compiled. If you mean the m_oldpos member in avl_array_node_tree_fields... well, I agree. It's not so bad, I guess... If I knew a way to cleanly wipe it out when stable_sort is not used...
[..] Supporting both random location inserts by the user and sorted inserts doesn't make sense.
I think it might make sense. Think of the tables of a word processor, or the rows of a spreadsheet: the final user can freely insert/extract, and there's a sort command in the menu. This command is often used consecutively for different ordering criterions.
The container no longer has control over its invariants.
Regarding a particular order, true :-( Though, the user has (just like with vector and list) :-) Best regards, Martin

On 10/31/06, Martin Knoblauch Revuelta <mkrevuelta@gmail.com> wrote:
On 10/31/06, Jason Hise <0xchaos@gmail.com> wrote:
I don't entirely understand the benifit of your 'non-proportional sequence view'. What use cases can you think of where this would be beneficial to client code?
The best example I've thought of is a text editor [...] For the next version I have designed an interface that allows having more than one NPSV width, and indexing separately by each one of them.
Wouldn't it make more sense to have a container of containers if you want to do this? Or am I misunderstanding something?
Right now it seems like it adds unjustified bloat to the library.
The empty_number class, used as default value for the template parameter W, contains no data, so it should be optimized away by any compiler.
I was referring more to lines of code and elegance of implementation.
By the way, somebody asked me if I could implement splice methods. I am doing it, but I'm calling them move, since splice is a specific name referring how this is done with lists, and remarking that it takes O(1) time, which will be false here. What do you think?
I think that the member function should still be called splice for the sake of conforming to a common interface.
Although I'm not sure why either the nodes or the iterators should be aware of the container they are a part of to begin with.
Good point.
I think that, in some operations, specifying the container is both redundant and disgusting. Redundancy is error prone. If you are specifying a location with an iterator, why should you specify the container too. Specially when it is not necessary at all, or it can be deduced from the iterator reasonably fast.
Algorithms should not need to know to which container the iterators belong. Iterators are meant to be redirectable pointers to locations in a container, and nothing more. It seems like you are trying to make iterators capable of so much magic that one never needs to refer to the container. But the container is the place where the interface is supposed to be.
In non-static methods where an iterator is required, an assert checks the consistency of the call. This helps in debug processes.
assert (owner(it.ptr)==this); // it must point into this arr.
That's very kind of you, but I don't think this check is your responsibility. Especially if it results in more overhead being required.
See this other assert in the operator for iterators subtraction: [..]
Same comment.
Some methods have been made static just to provide enhaced flexibility. For example:
static iterator insert (const iterator & it, const T & t);
(Now I'm writing all this, i note that i forgot to make static the method erase(iterator). Some other members might be made static too...)
Making this function static is the equivalent of saying that an iterator is a micro-container capable of the equivalent of a push_back. Although this may be true, it should not be exposed like this. Conceptually the functionality is at the wrong level. Values are inserted into containers, or erased from them. An iterator only specifies the location.
Other methods, like static swap(iterator,iterator) have their functionality extended (swapping nodes between different containers) without requiring any additional parameter. It would be absurd to require specification of only one of the involved containers. The same thing happens with all versions of move() (coming soon; see comments about splice above).
One would expect swapping iterators to swap what they point to, not to which containers their values belong. Containers normally expect to contain a sequence of value types which are capable of being swapped themselves easily, therefore negating the need for this function. A swap should be provided at the container level, but providing a specialized one at the iterator level doesn't make sense.
I think that this provides flexibility and ease of use.
Whether the "static-where-possible" approach is kept or not, reaching the container from iterators is definitely useful. If kept, for not requiring references to it. If not, for checking consistency.
Static where possible is in my opinion a bad idea. The container should provide member functions which modify or access the container, and everything else should be moved out into free floating utility functions. This is what a separation between containers and algorithms means, and it is what the stl is built on.
One of the most important features of the stl is that it separates algorithms from the container types they operate on. Most of these algorithms already exist in the stl. What is your justification that they need to be overridden for your container in the first place?
Because I bet that these are optimized for either O(1) random access or O(1) insertion/removal. The new container doesn't fit any of those, and it would perform suboptimal if treated by algorithms based on those assumptions.
Is there a sort specialization for sorted trees? I mean, creating a tree from scratch and populating it with an insert_sorted method. Even if it exists, note the optimization I am using: instead of extracting the nodes one by one from the source tree (with the corresponding rebalancing), I detach the list from it, and traverse it. The complexity will remain the same, but this trick will save a lot of time. This can only be done from inside the class or from a friend function. The same is for stable_sort().
You are trying to make one container that is supposed to look like a sequence do absolutely everything a tree can do, and I think this is a terrible mistake. If people need a data structure that can do everything a tree can do, they should use something that provides an interface to a tree. Not something that provides an interface to a sequence.
* I would make the value sorted aspect a separate container, [..]
Why? I wouldn't like to restrict the hybrid use of a container object. [..]
Because it desimplifies the interface, causing bloat and making client code potentially pay for features it doesn't want/need.
This container is a template. Unused methods don't even get compiled.
Again, I meant it bloats the interface. Not necessarilly the compiled code size or algorithmic complexity. In general, a container either needs to stay sorted or it doesn't. This is why there is a separate priority queue class, instead of an insert_sorted method in each sequence container.
[..] Supporting both random location inserts by the user and sorted inserts doesn't make sense.
I think it might make sense. Think of the tables of a word processor, or the rows of a spreadsheet: the final user can freely insert/extract, and there's a sort command in the menu. This command is often used consecutively for different ordering criterions.
Unless the document is a million pages long, I doubt calling the standard sort algorithm would really result in that much of a performance hit. You keep using the text editor as justification for the extra features. Wouldn't it be better for the text editor to use a tree interface for its storage in the first place, rather than a sequence interface? Summary: I think that it is absolutely vital that you consider this to be a sequence library, rather than a tree that happens to use stl constructs. Trees and the algorithms that operate on them are separate animals, and should be dealt with in their own library. Right now I am seeing a dangerous degree of feature bloat resulting directly from your desire to build on the optimizations possible because of the tree nature of the implementation. -Jason

On 11/1/06, Jason Hise <0xchaos@gmail.com> wrote:
[..] Right now I am seeing a dangerous degree of feature bloat resulting directly from your desire to build on the optimizations possible because of the tree nature of the implementation.
Obviously, I can't impose any undesired extra feature. It was never my intention. Sorry if it looked like that. I was just trying to convince you. Of course, I will be pleased to prepare a version with only the features approved by the general opinion. I have aleady made a lot of changes following your advise. The mess of pointer conversions has been simplified, and I can change it more if nobody likes how it is right now. The "static where possible" approach will soon disappear unless somebody expresses a desire to keep it. I've realized that the stable_sort can be done by the user (using sort) with some effort, but with the same algorithmic complexity, so I'll probably remove m_oldpos and stable_sort(). I'll put the algorithms out of the class and I'll organize it all in modules, so that everybody can see how it woult look without algorithms and/or without NPSV. Thanks again, Martin

Martin Knoblauch Revuelta wrote:
On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
* There's no need for the m_next and m_prev fields. You can do inorder, preorder, postorder, and others without them.
Yes, I could, but operators ++ and -- would become O(log n).
I beg to differ. All those traversals can be implemented O(1) per iteration. This is the case for all binary trees.
Many operations benefit from the posibility of fast iteration through m_next and m_prev. For example: begin() and end() are O(1).
If you want constant time for begin() and end() you could store references to them in the container and update them as needed over time.
* I suspect you don't need the m_height either. But I don't know what you are using it for yet.
For balance. IMO, the logic of re-balancing becomes simpler by having m_height instead of the usual {-1,0,+1}.
Right, AVL balancing... In my version I implemented a balancing that uses the count/width directly obviating the need for extra coloring information.
* In my rank tree implementation the only extra field per node is the count, what you call width (as separate node and total). If I remember correctly in CLRS they describe the possibility of the count not directly reflecting the subtree+1 size to allow for sparse trees. I suspect that you can remove the need for the different m_node_width and m_total_width, using one only, by deducing the m_node_width from "count - left_count - right_count".
The width is based on the same concept as count, but they are not related at all. Note that W might be float, for example.
I don't see how making it a float invalidates the calculation as it's the rank statistic invariant. Are you not using that invariant? If not, what is your invariant?
I might deduce m_total_width from m_node_width by recursing through the whole subtree every time. Obviously, this would spoil the whole thing.
Hm, I guess you aren't using the rank invariant then.
* I don't see how not having a static allocator prevents moving nodes across equally typed containers.
It shouldn't, but it depends on how correct is the allocator class. I read in the documentation of SGI that allocators must be fully static, allowing deallocation of something by a different object of that one used for allocation (perhaps even destructed in the meantime). Is this true for _all_ std and boost allocators too?
The pertinent passage is 20.1.5p4: Implementations of containers described in this International Standard are permitted to assume that their Allocator template parameter meets the following two additional requirements beyond those in Table 32. — All instances of a given allocator type are required to be interchangeable and always compare equal to each other.
* You should use the allocator types instead of making types out of the value type directly.
I don't understand. Do you mean the rebind trick? I thought this was the standard way.
I meant that instead of: typedef T & reference; typedef const T & const_reference; You use: typedef typename A::reference reference; typedef typename A::const_reference const_reference;
* You have a variety of C style casts, and you happen to be casting "this". That's frowned upon, and truthfully I can't see why you do it.
I'll change them to C++ casts.
Hopefully you mean static_cast<>. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera ha escrito:
Martin Knoblauch Revuelta wrote:
On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
* There's no need for the m_next and m_prev fields. You can do inorder, preorder, postorder, and others without them.
Yes, I could, but operators ++ and -- would become O(log n).
I beg to differ. All those traversals can be implemented O(1) per iteration. This is the case for all binary trees.
Umm, it is *amortized* O(1), as allowed by 24.1/8. Without amortization being taken into account, ++ and -- have a worst case complexity O(log n), if I'm not missing something. Although not explicitly stated in the standard, the amortization here has to be done for a complete traversal from begin to end. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquín Mª López Muñoz wrote:
Rene Rivera ha escrito:
Martin Knoblauch Revuelta wrote:
On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
* There's no need for the m_next and m_prev fields. You can do inorder, preorder, postorder, and others without them. Yes, I could, but operators ++ and -- would become O(log n). I beg to differ. All those traversals can be implemented O(1) per iteration. This is the case for all binary trees.
Umm, it is *amortized* O(1), as allowed by 24.1/8. Without amortization being taken into account, ++ and -- have a worst case complexity O(log n), if I'm not missing something. Although not explicitly stated in the standard, the amortization here has to be done for a complete traversal from begin to end.
Yes, sorry, I meant amortized time :-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
Martin Knoblauch Revuelta wrote:
On 10/31/06, Rene Rivera <grafikrobot@gmail.com> wrote:
If you want constant time for begin() and end() you could store references to them in the container and update them as needed over time.
I do. They are in the m_prev and m_next pointers of the dummy node. Additionally, the implementation is simpler this way.
Right, AVL balancing... In my version I implemented a balancing that uses the count/width directly obviating the need for extra coloring information.
Are you mathematically sure that this ensures the AVL balance constraints and doesn't require unnecessary rotations? Interesting. This calls for an experiment (I'm a skeptic, as you see ;-)
[..] Hm, I guess you aren't using the rank invariant then.
I'm not sure of what you mean with "rank invariant" but I guess I'm using it for the count, the index operator, iterators... but not for the NPSV thing. That's why I call it "Non-Proportional...".
* I don't see how not having a static allocator prevents moving nodes across equally typed containers. [..] The pertinent passage is 20.1.5p4:
Implementations of containers described in this International Standard are permitted to assume that their Allocator template parameter meets the following two additional requirements beyond those in Table 32.
— All instances of a given allocator type are required to be interchangeable and always compare equal to each other.
Great! Then I will get rid of the static allocator.
* You should use the allocator types instead of making types out of the value type directly. [..] I meant that instead of:
typedef T & reference; typedef const T & const_reference;
You use:
typedef typename A::reference reference; typedef typename A::const_reference const_reference;
Ok. Thanks ;-)
* You have a variety of C style casts, and you happen to be casting "this". That's frowned upon, and truthfully I can't see why you do it.
I'll change them to C++ casts.
Hopefully you mean static_cast<>.
Of course. The m_parent member of the dummy node is also used: it stores a NULL pointer. Only dummy nodes have this pointer set to NULL. By checking this, I can ensure that the cast will be ok. Thanks, Martin
participants (4)
-
Jason Hise
-
Joaquín Mª López Muñoz
-
Martin Knoblauch Revuelta
-
Rene Rivera