Re: [boost] [proposal] Sequence container with O(log n) random access _and_ O(log n) insertion/removal (wherever)

31 Oct 2006

      On 10/31/06, Jason Hise <0xchaos@gmail.com> wrote:
...
I don't entirely understand the benifit of your 'non-proportional
sequence view'.  What use cases can you think of where this would be
beneficial to client code?
The best example I've thought of is a text editor. You might want to
access the lines of a text file indexing by long-line (or paragraph)
number, or by broken-line number. Every long-line would be stored in
an element, and node_width would be the number of broken-lines
required for it.

Note that this can be extended to word processors, internet browsers,
e-book readers or, in general, any program displaying a series of
things that might extend significantly in one dimension. The width
might be the height in pixels of the lines (of text, or whatever).

In fact, this idea is already in use:

struct _GtkTextBTreeNode {
 //...
 int num_lines;             /* Total number of lines (leaves) in
                             * the subtree rooted here. */
 int num_chars;             /* Number of chars below here */
 //...
};

It can also be used for saving space by having a fixed circular buffer
of, say, 32 small elements in each node of the tree (though, iterators
would loose some good validity properties).

For the next version I have designed an interface that allows having
more than one NPSV width, and indexing separately by each one of them.
...
Right now it seems like it adds
unjustified bloat to the library.
The empty_number class, used as default value for the template
parameter W, contains no data, so it should be optimized away by any
compiler.
...
...
I'm trying to refactorize some things, but it's a large interface... ;-)
Large interface == monolithic design.  As much as possible, I would
try to stick to just implementing member functions that are already
member functions of other container types.  Everything else should be
a free function, perhaps in an optional utility header of some kind.
IMHO, the main things that make it large are supporting vector+list
interfaces, and having four different types of iterators: var/const x
straight/reverse.

By the way, somebody asked me if I could implement splice methods. I
am doing it, but I'm calling them move, since splice is a specific
name referring how this is done with lists, and remarking that it
takes O(1) time, which will be false here. What do you think?
...
...
...
* [..] various unsigned/int/long vs size_type variations of methods. [..]
It helps avoiding ambiguities when T is an unsigned/int/long. This is
how other container implementations solve it.
What ambiguities? [..]
For example, if you only provide the unsigned version of  avl_array
(unsigned/int/long n, const T & t), then this code:

  avl_array<int> A(4,3);

will be understood by the compiler as a call to:

    template <class IT>
    avl_array (IT from, IT to);

which is, of course, wrong.

All these methods are inline, so there should be no difference after
compilation. I simply applied the same pattern wherever I suspected
that there might be any ambiguity.

That said, if concepts help avoiding these ambiguities, I'll be happy
to remove the extra versions. I find them simply painful.
...
...
...
and keeping exceptions in mind you do a swap in between :-)
[..]
He is referring to a pattern where you would create a temporary
<sequence name here> object, build up that object, and then swap the
two sequences (just swap their root pointer values).  This would
ensure that if you can't finish building the tree then the nodes you
added will be released automatically.  Essentially, commit/rollback
without a try/catch.
Sounds good  :-)
...
I would say that slightly more space overhead would be preferable to
tricky and less straight forward code.
The key here is the word "slightly". I find that having a pointer to
the container in every node takes a lot of space. So does having it in
iterators; the user might be cross-referencing different data
structures, using lots of iterators.
...
Although I'm not sure why
either the nodes or the iterators should be aware of the container
they are a part of to begin with.
Good point.

I think that, in some operations, specifying the container is both
redundant and disgusting. Redundancy is error prone. If you are
specifying a location with an iterator, why should you specify the
container too. Specially when it is not necessary at all, or it can be
deduced from the iterator reasonably fast.

In non-static methods where an iterator is required, an assert checks
the consistency of the call. This helps in debug processes.

  assert (owner(it.ptr)==this);  // it must point into this arr.

See this other assert in the operator for iterators subtraction:

                               // Again, we can mix const and var.
    template<class X,class Y>  // while calculating distances
    int operator-
        (const avl_array_rev_iter<T,A,W,X,Y> & it) const
    {
      my_array * a, * b;
      int m, n;

      m = my_array::position_of_node (ptr, a, false);
      n = my_array::position_of_node (it_ptr(it), b, false);

      assert (a==b); // Inter-array distance has no sense

      return n-m;
    }

Some methods have been made static just to provide enhaced
flexibility. For example:

  static iterator insert (const iterator & it, const T & t);

(Now I'm writing all this, i note that i forgot to make static the
method erase(iterator). Some other members might be made static
too...)

Other methods, like static swap(iterator,iterator) have their
functionality extended (swapping nodes between different containers)
without requiring any additional parameter. It would be absurd to
require specification of  only one of the involved containers. The
same thing happens with all versions of move() (coming soon; see
comments about splice above).

I think that this provides flexibility and ease of use.

Whether the "static-where-possible" approach is kept or not, reaching
the container from iterators is definitely useful. If kept, for not
requiring references to it. If not, for checking consistency.

Of course, the end node might be outside the container class, or it
might be there as an aggregated member (instead of being inherited).
But what would be the difference? Unless an extra T object (or at
least the corresponding amount of memory) was used, the end node would
need to be of a different class, and then the same casts (C or C++
style) would be required. And, by the way, casting a pointer from a
base class to a derived class is perfectly legal and safe (regarded
that it is actually pointing to an object of the derived class), isn't
it?
...
One of the most important features of the stl is that it separates
algorithms from the container types they operate on.  Most of these
algorithms already exist in the stl.  What is your justification that
they need to be overridden for your container in the first place?
Because I bet that these are optimized for either O(1) random access
or O(1) insertion/removal. The new container doesn't fit any of those,
and it would perform suboptimal if treated by algorithms based on
those assumptions.

Is there a sort specialization for sorted trees? I mean, creating a
tree from scratch and populating it with an insert_sorted method. Even
if it exists, note the optimization I am using: instead of extracting
the nodes one by one from the source tree (with the corresponding
rebalancing), I detach the list from it, and traverse it. The
complexity will remain the same, but this trick will save a lot of
time. This can only be done from inside the class or from a friend
function. The same is for stable_sort().

The methods reverse, merge and unique are all O(N) (well, merge is
O(M+N) ;). I can't imagine how this could be achieved by an existing
algorithm, using only the public interface, and maintaining the
validity of all iterators.

The methods insert_sorted() and binary_search() would take O(sqr(log
n)) if implemented from the outside.
...
...
...
* I would make the value sorted aspect a separate container, [..]
Why? I wouldn't like to restrict the hybrid use of a container object.
[..]
Because it desimplifies the interface, causing bloat and making client
code potentially pay for features it doesn't want/need.
This container is a template. Unused methods don't even get compiled.

If you mean the m_oldpos member in avl_array_node_tree_fields... well,
I agree. It's not so bad, I guess... If I knew a way to cleanly wipe
it out when stable_sort is not used...
...
[..]
Supporting both random location inserts by the user and sorted inserts
doesn't make sense.
I think it might make sense. Think of the tables of a word processor,
or the rows of a spreadsheet: the final user can freely
insert/extract, and there's a sort command in the menu. This command
is often used consecutively for different ordering criterions.
...
The container no longer has control over its
invariants.
Regarding a particular order, true   :-(
Though, the user has (just like with vector and list)   :-)

Best regards,

Martin

Re: [boost] [proposal] Sequence container with O(log n) random access _and_ O(log n) insertion/removal (wherever)

Martin Knoblauch Revuelta