Re: [boost] [review] Review of Unordered library begins today December 7

10 Dec 2007

      First of all, this is a fine implementation.  I'm just curious why,  
since most compilers now provide a tr1 implementation, if it is just  
historical or if this implementation claims to be better than some  
other existing TR1 implementations.  But with the purpose of fitting  
Boost.Tr1, this will fulfill the purpose.  And great care was taken  
to be standard compliant.  I vote for accept.

I tried to spend a few hours looking at the code and reading the  
documentation.  This is what I find below.  I haven't had the time to  
even run an example or test the code, though.  I limited myself only  
to examination.

Overall, the focus and hard-work (esp. with exception safety and its  
testing) is impressive.  It seems you've taken care of a lot of  
standardese aspects.  Kudos.

Definition of operator==:
---------------------------------

One small issue I see, is that you should define operator==.  I  
searched a bit, and found Howard's statement (on http:// 
www.velocityreviews.com/forums/t291501-stdset-versus- 
tr1unorderedset.html):
...
I unsuccessfully tried to get an operator== into tr1::unordered_*.  
Assuming a good hash function such an operation can have the  
intended semantics and O(N) complexity."
I find it unfortunate that it's not defined in the standard.  Howard  
gives the same implementation I would have given, which works in O(N)  
time.  I found also the following bogus note in the Working Draft:
...
10 Unordered associative containers are not required to support the  
expressions a == b or a != b. [ Note: This is because the container  
requirements define operator equality in terms of equality of  
ranges. Since the elements of an unordered associative container  
appear in an arbitrary order, range equality is not a useful  
operation. — end note]
As if they couldn't specify in the Table "Additional requirements" a  
different post-condition, or alternately, state the post-condition in  
terms of range only in the Tables for sequence and ordered  
associative containers, and put in a different post-condition for  
unordered containers.  Howard also states:
...
The final decision was to let implementors provide it as an  
extension, but not to require it.
Daniel, seems to me that you'd have the liberty to implement it, and  
if you chose to not name it operator==, that'd be fine too. It would  
be a boost extension.  But it really irks me that an unordered  
container is considered not to have a value (can't have a value if  
you can't compare them) when there's no good reason not to provide  
operator==.

Comments on the code:
--------------------------------

I really appreciate the care you've taken wrt exception safety.

The prime number list is too short, it roughly doubles every time.    
On the other hand, I understand that it wouldn't do that have a  
*much* larger list, as this static array is included in every  
translation unit.  One possibility for a much finer one would be to  
make unordered an object library, included in the build.  I  
understand if you want to keep it "include-only", but could you at  
least enlarge the list to have an increase of 30% instead of 100%?

In the next_prime and prev_prime functions, please do not use the  
hard constant 28.  Use instead
    static const int num_primes = sizeof prime_list / sizeof  
*prime_list;

What exactly is the use of pair_cast?  There must be a legitimate  
reason that I don't quite understand.  The pair template conversion  
constructor should suffice, no?

Re: swap:  I really think you should be following the committee's  
current advice (I'm looking at the code of hash+table_impl.hpp, lines  
1194 -- 1219), sooner rather than later.  But my opinion is biased.   
At least you state precisely what you're doing.  This is not a  
criticism of your code.  But such change might break your client's  
code, if the standard later codifies its current decision.  It seems  
that changing the semantics of swap before unordered is accepted and  
released with Boost, would save some grief later.

There is some convention in boost that members begin with m_... or  
s_...  but I don't think it's very important.  Your system (the  
trailing underscore) is fine.  Yet, these things show in a debugger.

Minor comments on the documentation:
------------------------------------------------------

In buckets.html:

*  The sentence "The + 1 is required because the container is allowed  
to resize when the load factor is equal to the maximum load factor"  
is unclear to me, although I think I figured out the intended  
meaning.  Resize is most certainly wrong here (size is not changed),  
you meant rehash?  I'd rephrase  "The + 1 guarantees there is no  
invalidation; without it, reallocation could occur if the number of  
bucket exactly divides the target size, since the container is  
allowed to resize when the load factor is equal to the maximum load  
factor."

In hash_equality.html:

* This is not quite true: "An example implementation of FNV-1, and  
some other hash functions are supplied in the examples directory."   
Only FNV is provided...  After all, these unordered containers are  
only useful if you have a good hash function to go with it.  You  
should definitely mention the Boost.Hash library, which provides  
hashes for these unordered containers.  Nevertheless boost.hash does  
not give you tradeoff between speed and good hashing, which is why  
other more specialized hashes could be provided.  Besides FNV-1, I  
found Robert Jenkins' (http://www.burtleburtle.net/bob/hash/ 
evahash.html) quite good.

* I am always reminded of Matt Austern's column (http://lafstern.org/ 
matt/col2_new.pdf) for the example you give.   I went back to get  
most o the points, and then read your implementation.  Unfortunately,  
your functor implementation falls prey to the default pointed out in  
Matt's article: std::toupper depends on the global locale, so that if  
I call setlocale between two calls to your functor, I might get  
different results.  Be sure to check out Matt's column.

*  the declaration
     boost::unordered_multiset<point, std::equal_to<point>,  
point_hash>  points;
looks wrong (order of template args).

* I wished you put links to boost.hash.

In comparison.html:

* The spelling parametrise is quite unusual (although correct, very  
British!) but in any case incompatible with the C++ Working Draft,  
which uses parameterize.  I think you should adopt the most common  
spelling.  Same thing with amortized.

* "ordering comparison: No equivalent. No idea why."  A bit of humor  
doesn't hurt :)

* "Can be compared using the ==, !=, <, <=, >, >= operators: No  
comparison operators are defined."  That's a sticky point I mentioned  
above.

* Complexity guarantees:  You should state (recall?) what n, N, etc.  
stand for.  Esp since you use size() also.

* "Inserting a range of N elements: Average case O(N), worst case O(N  
* 'size()')" single quotes and font around size()

* "Find: logarithmic    Average case: O(N), Worst case: O(size())"  I  
think you meant "Average case: O(1)"

In rationale.hpp:

* "For example, an it..." : delete "an"

* "loosing the efficiency": losing efficiency. Spelling in other  
places too

* "Swap: I followed Howard Hinnant's advice and implemented option  
3.":  Huh?  Were are these options 1, 2, and 3 described?  You should  
give a link to http://www.open-std.org/JTC1/SC22/WG21/docs/papers/ 
2004/n1599.html.

* reference pages:  the constructor that takes arguments (size_type =  
impl-defined, ..) should be declared explicit, for all four  
containers.  The code is already doing the right thing.

--
Hervé Brönnimann
hervebronnimann@mac.com