
On 10/31/06, Jason Hise <0xchaos@gmail.com> wrote:
I don't entirely understand the benifit of your 'non-proportional sequence view'. What use cases can you think of where this would be beneficial to client code?
The best example I've thought of is a text editor. You might want to access the lines of a text file indexing by long-line (or paragraph) number, or by broken-line number. Every long-line would be stored in an element, and node_width would be the number of broken-lines required for it. Note that this can be extended to word processors, internet browsers, e-book readers or, in general, any program displaying a series of things that might extend significantly in one dimension. The width might be the height in pixels of the lines (of text, or whatever). In fact, this idea is already in use: struct _GtkTextBTreeNode { //... int num_lines; /* Total number of lines (leaves) in * the subtree rooted here. */ int num_chars; /* Number of chars below here */ //... }; It can also be used for saving space by having a fixed circular buffer of, say, 32 small elements in each node of the tree (though, iterators would loose some good validity properties). For the next version I have designed an interface that allows having more than one NPSV width, and indexing separately by each one of them.
Right now it seems like it adds unjustified bloat to the library.
The empty_number class, used as default value for the template parameter W, contains no data, so it should be optimized away by any compiler.
I'm trying to refactorize some things, but it's a large interface... ;-)
Large interface == monolithic design. As much as possible, I would try to stick to just implementing member functions that are already member functions of other container types. Everything else should be a free function, perhaps in an optional utility header of some kind.
IMHO, the main things that make it large are supporting vector+list interfaces, and having four different types of iterators: var/const x straight/reverse. By the way, somebody asked me if I could implement splice methods. I am doing it, but I'm calling them move, since splice is a specific name referring how this is done with lists, and remarking that it takes O(1) time, which will be false here. What do you think?
* [..] various unsigned/int/long vs size_type variations of methods. [..]
It helps avoiding ambiguities when T is an unsigned/int/long. This is how other container implementations solve it.
What ambiguities? [..]
For example, if you only provide the unsigned version of avl_array (unsigned/int/long n, const T & t), then this code: avl_array<int> A(4,3); will be understood by the compiler as a call to: template <class IT> avl_array (IT from, IT to); which is, of course, wrong. All these methods are inline, so there should be no difference after compilation. I simply applied the same pattern wherever I suspected that there might be any ambiguity. That said, if concepts help avoiding these ambiguities, I'll be happy to remove the extra versions. I find them simply painful.
and keeping exceptions in mind you do a swap in between :-) [..] He is referring to a pattern where you would create a temporary <sequence name here> object, build up that object, and then swap the two sequences (just swap their root pointer values). This would ensure that if you can't finish building the tree then the nodes you added will be released automatically. Essentially, commit/rollback without a try/catch.
Sounds good :-)
I would say that slightly more space overhead would be preferable to tricky and less straight forward code.
The key here is the word "slightly". I find that having a pointer to the container in every node takes a lot of space. So does having it in iterators; the user might be cross-referencing different data structures, using lots of iterators.
Although I'm not sure why either the nodes or the iterators should be aware of the container they are a part of to begin with.
Good point. I think that, in some operations, specifying the container is both redundant and disgusting. Redundancy is error prone. If you are specifying a location with an iterator, why should you specify the container too. Specially when it is not necessary at all, or it can be deduced from the iterator reasonably fast. In non-static methods where an iterator is required, an assert checks the consistency of the call. This helps in debug processes. assert (owner(it.ptr)==this); // it must point into this arr. See this other assert in the operator for iterators subtraction: // Again, we can mix const and var. template<class X,class Y> // while calculating distances int operator- (const avl_array_rev_iter<T,A,W,X,Y> & it) const { my_array * a, * b; int m, n; m = my_array::position_of_node (ptr, a, false); n = my_array::position_of_node (it_ptr(it), b, false); assert (a==b); // Inter-array distance has no sense return n-m; } Some methods have been made static just to provide enhaced flexibility. For example: static iterator insert (const iterator & it, const T & t); (Now I'm writing all this, i note that i forgot to make static the method erase(iterator). Some other members might be made static too...) Other methods, like static swap(iterator,iterator) have their functionality extended (swapping nodes between different containers) without requiring any additional parameter. It would be absurd to require specification of only one of the involved containers. The same thing happens with all versions of move() (coming soon; see comments about splice above). I think that this provides flexibility and ease of use. Whether the "static-where-possible" approach is kept or not, reaching the container from iterators is definitely useful. If kept, for not requiring references to it. If not, for checking consistency. Of course, the end node might be outside the container class, or it might be there as an aggregated member (instead of being inherited). But what would be the difference? Unless an extra T object (or at least the corresponding amount of memory) was used, the end node would need to be of a different class, and then the same casts (C or C++ style) would be required. And, by the way, casting a pointer from a base class to a derived class is perfectly legal and safe (regarded that it is actually pointing to an object of the derived class), isn't it?
One of the most important features of the stl is that it separates algorithms from the container types they operate on. Most of these algorithms already exist in the stl. What is your justification that they need to be overridden for your container in the first place?
Because I bet that these are optimized for either O(1) random access or O(1) insertion/removal. The new container doesn't fit any of those, and it would perform suboptimal if treated by algorithms based on those assumptions. Is there a sort specialization for sorted trees? I mean, creating a tree from scratch and populating it with an insert_sorted method. Even if it exists, note the optimization I am using: instead of extracting the nodes one by one from the source tree (with the corresponding rebalancing), I detach the list from it, and traverse it. The complexity will remain the same, but this trick will save a lot of time. This can only be done from inside the class or from a friend function. The same is for stable_sort(). The methods reverse, merge and unique are all O(N) (well, merge is O(M+N) ;). I can't imagine how this could be achieved by an existing algorithm, using only the public interface, and maintaining the validity of all iterators. The methods insert_sorted() and binary_search() would take O(sqr(log n)) if implemented from the outside.
* I would make the value sorted aspect a separate container, [..]
Why? I wouldn't like to restrict the hybrid use of a container object. [..]
Because it desimplifies the interface, causing bloat and making client code potentially pay for features it doesn't want/need.
This container is a template. Unused methods don't even get compiled. If you mean the m_oldpos member in avl_array_node_tree_fields... well, I agree. It's not so bad, I guess... If I knew a way to cleanly wipe it out when stable_sort is not used...
[..] Supporting both random location inserts by the user and sorted inserts doesn't make sense.
I think it might make sense. Think of the tables of a word processor, or the rows of a spreadsheet: the final user can freely insert/extract, and there's a sort command in the menu. This command is often used consecutively for different ordering criterions.
The container no longer has control over its invariants.
Regarding a particular order, true :-( Though, the user has (just like with vector and list) :-) Best regards, Martin