Re: [boost] IndexedSet Review

29 Mar 2004

      Hello David, thanks for the effort of reviewing indexed_set
(and seems you made a thorough review indeed!)

"David B. Held" ha escrito:
...
*** What is your evaluation of the design?
[...]
...
It's too bad that you have to specify the type of the member when
using member<>, but I don't see a way to infer it without having
measurable runtime overhead.
This is a recurrent request, but afaik noone has come up with a
solution.
...
The range() member is very clever and is a good use case for
Boost.Lambda, IMO.
Credit here goes entirely to Pavel, who pushed me to include this
memfun --I was reluctant to add non-standard features.
...
It seems to me that modify() need not have the destructive policy
that it does.  It appears that modify() erases the node from the
index, then reinserts it into the appropriate place.  However, it seems
to me that modify() could first look to see if the modified node
already exists, save the iterator to the correct insertion point, and
then only unlink the node if the modification would be succesful.
So the strategy would be like this:
1) Change the modifier function object to require a value() member
    that gives the new value to be used.
2) Instead of calling mod() right away, call something like
    lower_bound(), using mod.value().  Save the iterator if there isn't
    already an element there that conflicts.
3) Perform the erase as usual, and then do mod(), link2(), etc.
Ummm... If I followed your approach I would be incurring two
extra copies which is precisely what modify() is meant to avoid.
For non-destructive updating you can use update(), that's the
idea. If I'm not making myself clear please tell me so.
...
One thing that I think should be added (though not affecting my
vote) is ranked indexes.  So just as an index might be unique or
non-unique, it can be ranked or not ranked.  Ranking gives an
index an O(log n) ordinal search.  That is, you can search for the
ith item according to a certain index in O(log n) time, given an
ordered index (obviously, the sequenced indexes would have an
O(n) search time).  This does change the structure of the nodes,
and necessarily makes them bigger for ranked indexes.  So it is
not a trivial change.  However, I am sure that it is doable and would
add value to the library.
Yep. The idea of ranked containers came up recently, as introduced
by Peter Palotas. I'll add this to the future work section. The most
wanted new features seem to be (in this order) serialization,
hashed indices and ranked indices.
...
*** What is your evaluation of the implementation?
[...]
...
Some of the lines are > 80 columns (e.g.: the copyright notice).
Is this strictly required? If so I'll go thru the code and make the changes.
If not, I'd rather leave it like that.
...
I think that BOOST_NO_MEMBER_TEMPLATES should be
deprecated.  I don't think there's a single supported compiler in
the Boost compiler configs that defines this any more.  I think
checking against BOOST_MSVC6_MEMBER_TEMPLATES is
sufficient, but it would be nice to not even have to check against
that.
<off topic> It'd be nice if Boost.Config provided a (possibly
automatically generated) reference table showing which compilers
do have a specific defect macro.</off topic>

Would it be OK to you if we delayed the remotion of this macro till
the next dev cycle? I fear I could be inadvertently missing an
otherwise supportable compiler.
...
I do believe the parts that come from other code need to retain the
original copyright.  The parts that were inspired by, but not directly
derived from, another implementation probably don't need a
copyright notice, but should probably give a mention in the copyright,
at the least as a matter of professional courtesy.
No problem. Apart from stl_set.h I borrowed nothing from other
sources. Will add the (c) where appropriate.
...
*** What is your evaluation of the documentation?
[...]
...
Here is a somewhat minor nitpick.  In the tutorial, the index_list is
specified as so:
(type of index)<[(tag)[,(key extractor)[,(comparison predicate)]]]>
    (type of index)<[(key extractor)[,(comparison predicate)]]>
I would change it to thus:
(unique | non_unique)<[(tag)[,(key extractor)[,(comparison
predicate)]]]>
    (unique | non_unique)<[(key extractor)[,(comparison predicate)]]>
Yes, why not. Consider it changed.
...
It would be nice to be able to read through the documentation like
a manual.  Thus, next links on each page would be appropriate.
On the performance page, you state that the tests were performed on
a system with 260 KB "RAM memory", running W2K.  I'm absolutely
certain that is not correct. ;)  "256 MB RAM" sounds a lot more likely.
You're absolutely correct. Fixed. Thanks.
...
Something that I think should be added to the documentation is a
concept page that lists the main concepts in IndexedSet and a concept
interface a la the SGI STL reference (www.sgi.com/tech/stl).
More on this later down the post.
...
*** What is your evaluation of the potential usefulness of the library?
[...]
...
*** Did you try to use the library?  With what compiler?
[...]
...
*** How much effort did you put into your evaluation? A glance? A
quick reading? In-depth study?
[...]
...
*** Are you knowledgeable about the problem domain?
[...]
...
*** Do you think the library should be accepted as a Boost library?
Yes, most definitely.
Great! The score is 9-0 so far.
...
*** Naming issues:
Here's my take on names, but should not be construed as affecting
my vote for inclusion...
Namespace: I think too many namespace levels are not helpful.
A separate 'container' namespace seems a bit superfluous to me.
There is no danger in name clashes within Boost between container
libraries and other kinds of libraries that I can see, and I don't see
that it provides a significant documentation benefit.  I think additional
namespace depth is most appropriate for collections of algorithms,
where a proliferation of names is more likely to lead to name
collisions.
Well, I stated previously that I'll do what people agree on. I don't
know if the final decision should be made by Pavel as the review manager.
IMHO, qualified identifiers will be so long anyway that having an
additional ::container:: won't hurt much.
...
Library name: I don't think IndexedSet is so bad, but an alternative
that I like is CompositeSet.  Then there could be a Composite
Sorted Associative Container concept which would support the
notion of containers with multiple keys.  Then CompositeSet
would be a model of Composite Sorted Associative Container.
A synonym for Sorted Associative Container could be Simple
Sorted Associative Container or Trivial Sorted Associative
Container.
The main problem with "Set" slipping into the name of the container
is that, for instance, the following

indexed_set<int,index_list<sequenced<> > >

does not have anything to do with sets (it resembles a std::list of
constant ints.) For this reason I like Multicontainer better (or something
along this line.)

<off topic>Reviewers have showed little or no interest in sequenced indices.
Seems weird to me, as they can be used to construct a seemingly useful
list-like container with fast lookup.</off topic>

As for the Composite Sorted Associative Container concept, we have the
same problem. Technically, the requirements imposed to current or
future index types by indexed_set are much weaker than being some
sort of associative container. I cannot even state global complexity
bounds (please check the reference, page "Index reference", section
"Complexity signature").
I agree with you a concept section would be highly desirable, but here
I definitely need some help. What you request is not trivial.
...
Indexes: According to the American Heritage Dictionary, Fourth
Edition, as channeled by www.dictionary.com, "indexes" is
preferred to "indices".  Another spelling metric I like to use is
Google.  According to Google, "indexes" returns 5.8 million
hits, while "indices" returns 5.5 million hits.  So that's only a
5% difference, but it's statistically significant. ;)  I personally
think the preferred spelling should be used throughout the
documentation, but that's more of a nitpick than a criticism.
My Latin (I speak Spanish) heritage makes me favor "indices"
to "indexes" (and "minuscule" to "miniscule", etc.)
The Google metric could be biased by the fact that "indexes" is
not only the plural of "index" but also the 3rd singular formal
of the Present tense of the verb "index".
May people allow me to retain "indices"? If only as a small
concession to those of use who studied and love the
olde Latin language.
...
Anyway, I prefer "ordered index" to "regular index".  As far as
index types go, I prefer "ordered_unique" to "set_like", due to
consistency.  I think Ordered Index should be a concept for
this library.
Others do also like ordered_unique etc. I do too.
...
nth_index_type: I don't like the word "nth", personally.  I suggest
either of the following naming schemes:
nth_index_type -> ordinal_index
index_type -> named_index | tagged_index
nth_index_type -> index_byval | index_bynum
index_type -> index_byname | index_bytag
Generalize appropriately to the iterator types.
I recently proposed the following

template<int N> struct nth_index;
template<typename Tag> struct index;
template<int N> struct nth_index_iterator;
template<int N> struct nth_index_const_iterator;
template<typename Tag> struct index_iterator;
template<typename Tag> struct index_const_iterator;

which seems to me good enough. Don't you like it?
The nth_ prefix has some tradition in C++, why not use it here
too?
...
update(): Actually, I do prefer "replace()" over "update()", because
it does more accurately reflect the operation.
Me too.
...
As far as header organization goes, I would think of it this way:
What are the major features and concepts of the library, and
how will they be used separately or together?  Organize the
headers in a way that allows users to include feature sets as
modules.  If you have to, create dummy headers that just include
sets of headers that are logically dependent so that users can
activate a certain feature by including that header.  Also,
consider the cost of each header.  It is worth the extra effort
to isolate expensive headers than cheap ones.  Other than
that, I don't have any particular suggestions.
Well, I'm gathering all the feedback from the reviewers wrt to
naming and header organization and will try to come up with some
consistent proposal today or tomorrow.

Thank you again for taking the effort to review the library. I hope
you can find it useful in the future. (Here goes a challenge: what
about writing a multiply indexed map on top of indexed_set?
You're the map expert here at Boost ;) Please check the Future
work section for a preliminary discussion on the subject of
indexed maps.)

Best,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

Re: [boost] IndexedSet Review

Joaquín Mª López Muñoz