[boost] Re: IndexedSet Review

29 Mar 2004

      "Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message
news:406802F9.4990B35C@tid.es...
...
[...]
Ummm... If I followed your approach I would be incurring two
extra copies which is precisely what modify() is meant to avoid.
For non-destructive updating you can use update(), that's the
idea. If I'm not making myself clear please tell me so.
Yeah, I'm not understanding where the extra copies would come
from.  As far as I can tell, it's possible to reorder the operations
in modify() without increasing the cost of execution except by a
few instructions.  This is my understanding of how modify() works
now:

mod(element); // changes the element immediately and in-place
unlink(element); // takes the node out of the index
link2(element); // puts it back in at the correct spot

This is what I'm suggesting:

i = find(new_element); // look to see if modify() can even succeed
if (*i == new_element) return;
mod(element); // now change the node, since modify() will succeed
unlike(element); // same as before
link2(element, i); // reinsert, but use i as a hint (which will be correct)

I don't see any copies being made at all (except possibly of an
iterator, which should hardly be prohibitively expensive).  What is
new is that you need to be able to look at the changed value without
actually changing anything.  That's why I suggested a change to
the Modifier function object:

class change_name
{
public:
    change_name(std::string const& new_name) : name_(new_name) { }
    std::string const& value(void) const { return name_; }
    void operator()(employee& em) const { em.name_ = name_; }
private:
    std::string name_;
};

So find(element) is something like lower_bound(mod.value()).
The complexity shouldn't change because in both cases, you have
to perform 2 log n searches through the index to remove and then
relink the node.  The only difference is when you perform those
searches.  I'm saying take the second insert search and move it
to the beginning of the operation, so that you know if you can
succeed before you unlink the node.  If I'm missing something,
let me know.
...
[...]
...
Some of the lines are > 80 columns (e.g.: the copyright notice).
Is this strictly required? If so I'll go thru the code and make the
changes. If not, I'd rather leave it like that.
I don't think it's strictly required, but a lot of people complain
about long lines.
...
<off topic> It'd be nice if Boost.Config provided a (possibly
automatically generated) reference table showing which
compilers do have a specific defect macro.</off topic>
That would be great, but as with many things, complaints are
usually satisfied by the complainer. ;)
...
[...]
Great! The score is 9-0 so far.
I think it's the review manager's job to keep track of that, but I
guess a little enthusiasm doesn't hurt. ;)
...
[...]
Well, I stated previously that I'll do what people agree on. I don't
know if the final decision should be made by Pavel as the review
manager.  IMHO, qualified identifiers will be so long anyway that
having an additional ::container:: won't hurt much.
It won't hurt, but if it won't help, why should it be there?  We can add
gratuitous namespace levels all day long, but that doesn't mean
we should.
...
[...]
The main problem with "Set" slipping into the name of the
container is that, for instance, the following
indexed_set<int,index_list<sequenced<> > >
does not have anything to do with sets (it resembles a std::list of
constant ints.) For this reason I like Multicontainer better (or
something along this line.)
Perhaps, but "multicontainer" sounds *too* generic.  I would expect
something with that name to support any number of back-ends, like
vector, deque, etc.  Actually, come to think of it, that might not be
such a bad idea.  However, it wouldn't really fit the implementation
of the library, because the container is a node-based one, and
vector wouldn't play as nicely as the others.  But if you could not
only created computed indices, but also *virtual indices*, then you
could create an associative vector!  However, this might exceed
the scope of the library.  By a virtual index, I mean that the vector
itself would remain sorted, and you would have an index that
performs a binary search without actually storing any additional
node information.  That sounds like a big mess, though.

The reason I think that "Set" is ok is that the STL "Set" is only
vaguely related to mathematical sets.  The only property of
essence that is related to mathematical sets is the uniqueness
property, and even that is violated by the multiset.  The vagarities
of computational performance necessitate that the STL notion
of Set be stricter than the mathematical notion for it to be usable.
In the same way, I think that IndexedSet need not strictly be even
an STL Set for similar reasons.  It supports the "essential
properties" of uniqueness and intrinsic ordering, and adds
additional concepts which cater to computational practicalities.
It's not *merely* a container, but a special kind of container.  It's
not truly a generic container, so I think it would benefit from a
more specific name.  It's primarily associative, and that's the
essence of the STL Set, which is why I think your container
ultimately *is* a set, even though a degenerate configuration
may not be.
...
<off topic>Reviewers have showed little or no interest in
sequenced indices.  Seems weird to me, as they can be used
to construct a seemingly useful list-like container with fast
lookup.</off topic>
This is probably because people who need the fast lookup usually
don't need the original insert order, so the list aspect of it has
limited usefulness.  However, I can think of a few use cases where
it might be handy, so I suspect that really, this is a feature waiting
to be discovered.
...
As for the Composite Sorted Associative Container concept, we
have the same problem. Technically, the requirements imposed
to current or future index types by indexed_set are much weaker
than being some sort of associative container. I cannot even state
global complexity bounds (please check the reference, page
"Index reference", section "Complexity signature").
Ok, so if the concept is not a refinement of Associative Container,
then create a new concept!
...
I agree with you a concept section would be highly desirable, but
here I definitely need some help. What you request is not trivial.
Then let's take a first stab at it.  What you have is similar to an
Associative Container, but is fundamentally different.  The way in
which it is different is that there is more than one way to access
the data.  And, as you say, it is like multiple containers in one.  So
I think the most fundamental concept here is Composite Container.
The reason we don't want to use "Multiple" is that "Multiple
Associative Container" means "multiset" and "multimap" (not my
favorite terminology, but nobody asked me when they were naming
them ;).

Let's rip off SGI for a minute:

    Description

    A Composite Container is a variable-sized Container that
    supports retrieval of elements (values) based on one or more
    keys. It supports insertion and removal of elements, but differs
    from an Associative Container in that the complexity of the
    operations depends on the type of key indices used.

    [insert additional discussion here]

    Refinement of

    Forward Container

Now, here we have a tricky issue.  By default, IndexedSet *is*
default constructible, but we can't say that Composite Container
should be, because IndexedSet might *not* be default c'tible if
one of the key types are not.  The STL solved this by simply
implicitly requiring key comparators to be default c'tible, but
perhaps this requirement is not justified.  Anyway, let's continue:

    Associated types

    [this part is lengthy, but I think you know what to fill in here]

    Notation

    [this part should be obvious as well]

    Definitions

    [just rip off SGI some more ;)]

    Valid expressions

    [this is where the interesting stuff goes]

    Expression semantics

    [this is more of the meat of the concept]

    Complexity guarantees

    [just put your conditional complexity analyses here]

In particular, I think you should note that both ordered and non-
ordered indices are possible, and give separate sets of
guarantees for each, noting how the complexity is bounded from
below by the slowest index type.  I don't see how you can do much
better than that, but I think at least that much is required.

    Invariants

    [you probably know what to put here, for the most part]

Models

    * indexed_set (of course)
    * bimap (if anyone makes it ;)

That's a start anyway, and if you fill in as much of that as you can,
I think others will be able to help you fill in the rest.  It also might
help if you define other concepts first, like Ordered Unique Index,
etc.  I know it's a lot of work, but I think it's worthwhile (especially
for people wanting to extend the library, i.e.: by defining custom
index types).
...
[...]
My Latin (I speak Spanish) heritage makes me favor "indices"
to "indexes" (and "minuscule" to "miniscule", etc.) The Google
metric could be biased by the fact that "indexes" is not only the
plural of "index" but also the 3rd singular formal of the Present
tense of the verb "index".
Good call!  That was some sloppy reasoning on my part. ;)
...
[...]
The nth_ prefix has some tradition in C++, why not use it here
too?
I guess there is "nth_element()".  I've never had need for it, so
I didn't even notice it.

Dave

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

[boost] Re: IndexedSet Review

David B. Held