[boost] IndexedSet Review

29 Mar 2004

      *** What is your evaluation of the design?

Overall, the design seems sound.  I didn't look at every aspect
of the design in my review, but the usage seems straightforward,
and as the end-goal, I think it passes.

It's too bad that you have to specify the type of the member when
using member<>, but I don't see a way to infer it without having
measurable runtime overhead.

The range() member is very clever and is a good use case for
Boost.Lambda, IMO.

It seems to me that modify() need not have the destructive policy
that it does.  It appears that modify() erases the node from the
index, then reinserts it into the appropriate place.  However, it seems
to me that modify() could first look to see if the modified node
already exists, save the iterator to the correct insertion point, and
then only unlink the node if the modification would be succesful.

So the strategy would be like this:

1) Change the modifier function object to require a value() member
    that gives the new value to be used.
2) Instead of calling mod() right away, call something like
    lower_bound(), using mod.value().  Save the iterator if there isn't
    already an element there that conflicts.
3) Perform the erase as usual, and then do mod(), link2(), etc.

Wow, Boost.Lambda looks like a home run, were it not for the
fact that directly assigning to members through the set interface
is a serious violation of encapsulation. ;)

The safe mode and invariant checking are also nice features.
Perhaps they could be a guideline or recommended feature for
Boost libraries.

One thing that I think should be added (though not affecting my
vote) is ranked indexes.  So just as an index might be unique or
non-unique, it can be ranked or not ranked.  Ranking gives an
index an O(log n) ordinal search.  That is, you can search for the
ith item according to a certain index in O(log n) time, given an
ordered index (obviously, the sequenced indexes would have an
O(n) search time).  This does change the structure of the nodes,
and necessarily makes them bigger for ranked indexes.  So it is
not a trivial change.  However, I am sure that it is doable and would
add value to the library.

*** What is your evaluation of the implementation?

The implementation seems fairly clean, modulo the regular
portability hacks.  The coding style is tighter than I prefer (w.r.t
whitespace) but doesn't affect my opinion of the code itself.  It
seems that some care has been taken to consider performance
issues without necessarily providing the fastest implementation
possible.  While the use of MPL, Lambda, and Tuples makes
the implementation a little fat, this is the kind of compile-time
cost that I think is justified by the run-time and code expressivity
benefits.

Some of the lines are > 80 columns (e.g.: the copyright notice).

I don't like the C-style comments, but I don't expect to be modifying
the source, so it's not really an issue.

I think that BOOST_NO_MEMBER_TEMPLATES should be
deprecated.  I don't think there's a single supported compiler in
the Boost compiler configs that defines this any more.  I think
checking against BOOST_MSVC6_MEMBER_TEMPLATES is
sufficient, but it would be nice to not even have to check against
that.

I do believe the parts that come from other code need to retain the
original copyright.  The parts that were inspired by, but not directly
derived from, another implementation probably don't need a
copyright notice, but should probably give a mention in the copyright,
at the least as a matter of professional courtesy.

*** What is your evaluation of the documentation?

The documentation is very good, except that it could use some very
minor editing (which does not affect the clarity or understandability
of the documentation).

Here is a somewhat minor nitpick.  In the tutorial, the index_list is
specified as so:

    (type of index)<[(tag)[,(key extractor)[,(comparison predicate)]]]>
    (type of index)<[(key extractor)[,(comparison predicate)]]>

I would change it to thus:

    (unique | non_unique)<[(tag)[,(key extractor)[,(comparison
predicate)]]]>
    (unique | non_unique)<[(key extractor)[,(comparison predicate)]]>

It would be nice to be able to read through the documentation like
a manual.  Thus, next links on each page would be appropriate.

On the performance page, you state that the tests were performed on
a system with 260 KB "RAM memory", running W2K.  I'm absolutely
certain that is not correct. ;)  "256 MB RAM" sounds a lot more likely.

Something that I think should be added to the documentation is a
concept page that lists the main concepts in IndexedSet and a concept
interface a la the SGI STL reference (www.sgi.com/tech/stl).

*** What is your evaluation of the potential usefulness of the library?

I think the library is very useful and fills an important gap in the STL.
I think it would be "complete" if it were to add ranked index support.
Then it would be a total replacement of containers like VB's
Collection.  In particular, I think the key functors alone make it
useful, let alone multiple indexes and the like.

*** Did you try to use the library?  With what compiler?
 Did you have any problems?

I did try very briefly.  However, I got blocked by BJam, and didn't
take the time to d/l the latest Boost to fix it (assuming it does).  I
will try to get BJam working later, and if I succeed before the
review is over, I will at least run the tests and create a few small
test programs myself.

*** How much effort did you put into your evaluation? A glance? A
quick reading? In-depth study?

I read all the documentation except for the reference section, and
I spent some time looking over random parts of the code.

*** Are you knowledgeable about the problem domain?

Yes.  I've had need for similar containers and have hand-rolled
some solutions.  The scope of IndexedSet is impressive without
being too feature-bloated.

*** Do you think the library should be accepted as a Boost library?

Yes, most definitely.

*** Naming issues:

Here's my take on names, but should not be construed as affecting
my vote for inclusion...

Namespace: I think too many namespace levels are not helpful.
A separate 'container' namespace seems a bit superfluous to me.
There is no danger in name clashes within Boost between container
libraries and other kinds of libraries that I can see, and I don't see
that it provides a significant documentation benefit.  I think additional
namespace depth is most appropriate for collections of algorithms,
where a proliferation of names is more likely to lead to name
collisions.

Library name: I don't think IndexedSet is so bad, but an alternative
that I like is CompositeSet.  Then there could be a Composite
Sorted Associative Container concept which would support the
notion of containers with multiple keys.  Then CompositeSet
would be a model of Composite Sorted Associative Container.
A synonym for Sorted Associative Container could be Simple
Sorted Associative Container or Trivial Sorted Associative
Container.

Indexes: According to the American Heritage Dictionary, Fourth
Edition, as channeled by www.dictionary.com, "indexes" is
preferred to "indices".  Another spelling metric I like to use is
Google.  According to Google, "indexes" returns 5.8 million
hits, while "indices" returns 5.5 million hits.  So that's only a
5% difference, but it's statistically significant. ;)  I personally
think the preferred spelling should be used throughout the
documentation, but that's more of a nitpick than a criticism.

Anyway, I prefer "ordered index" to "regular index".  As far as
index types go, I prefer "ordered_unique" to "set_like", due to
consistency.  I think Ordered Index should be a concept for
this library.

Key extraction: Leave it in the library.

nth_index_type: I don't like the word "nth", personally.  I suggest
either of the following naming schemes:

nth_index_type -> ordinal_index
index_type -> named_index | tagged_index

nth_index_type -> index_byval | index_bynum
index_type -> index_byname | index_bytag

Generalize appropriately to the iterator types.

update(): Actually, I do prefer "replace()" over "update()", because
it does more accurately reflect the operation.

As far as header organization goes, I would think of it this way:
What are the major features and concepts of the library, and
how will they be used separately or together?  Organize the
headers in a way that allows users to include feature sets as
modules.  If you have to, create dummy headers that just include
sets of headers that are logically dependent so that users can
activate a certain feature by including that header.  Also,
consider the cost of each header.  It is worth the extra effort
to isolate expensive headers than cheap ones.  Other than
that, I don't have any particular suggestions.

Dave

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.581 / Virus Database: 368 - Release Date: 2/10/2004

[boost] IndexedSet Review

David B. Held