
Daniel: First, thanks for your reply. And sorry if I came across as overly critical. I think your library is excellent. The point of the review process is to try to make it even better :) In fact, if we could make the C++ standard better in the process, I'd be delighted. As I've said in my review, but to be clear I'll say once again, none of these items under discussion below are contingent to my (enthusiastic) vote in favor of your library. They are merely for your consideration. On Dec 10, 2007, at 3:54 PM, Daniel James wrote:
I find it unfortunate that it's not defined in the standard. Howard gives the same implementation I would have given, which works in O(N) time.
And only works for unordered_set. How would you define equality for unordered_multimap? Do you compare the mapped values, does the order of the values matter? If it doesn't then the comparison becomes more expensive, if it does then it's a bit odd for an 'unordered' container (although this implementation does preserve insertion order, if it didn't then things would be worse). Is it okay to define equality in terms of the equality predicate for keys, but operator== for values?
It works equally well with a little tweak if you require equality of count(key) for the multiset. I didn't think about map/multimap, thanks for pointing this out. But Daniel, what exactly does operator== for std::multimap do??? In fact, because insertion in std::multimap inserts at the end of the range, the value of std::multimap depends on the insertion order. You wouldn't be doing anything different with unordered_multimap. And as you mention, your implementation does already preserve insertion order, so you wouldn't have any problem to supply these extensions, and they'd be well- defined. Isn't it better if you can provide stronger guarantees than the current draft standard, just as you already do for exceptions? Now the standard should really define what happens when inserting equivalent keys in unordered_multimap, just as it does for multimap, but we're not fixing the standard. I'll raise that point on another mailing list if I get around it. As I understand, it depends on whether you use singly- or doubly-linked lists for chaining, and I buy your argument as to why use doubly-linked lists for multiset. I think you made good points there. As for the equality comparison vs predicate, the same again is true for operator== for the standard associative containers, defined as a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin() [sic] (no, *I* didn't forget the final paren, it's not on page 671 either). Nowhere in that makes use of the "equivalence" implied by operator<, which is used for uniqueness in set/map and equivalence in multiset/multimap. So by defining equality in terms of the equality predicate for keys in order to check for the uniqueness property, but operator== for both keys and values in the implementation of operator== for unordered_map/multimap, you wouldn't exactly break new ground. Or am I missing some subtle implications or other clause in the standard?
I found also the following bogus note in the Working Draft:
10 Unordered associative containers are not required to support the expressions a == b or a != b. [ Note: This is because the container requirements define operator equality in terms of equality of ranges. Since the elements of an unordered associative container appear in an arbitrary order, range equality is not a useful operation. — end note]
That's not bogus at all. An equality operator for unordered containers would have completely different semantics to all other containers (especially when using custom comparisons, comparing associative containers can have surprising results).
Why not? As I argue above, the same is already true of the ordered associative containers. I don't think you'd be providing anything more surprising than already present in the standard. As for surprising semantics, a set and multiset have well-defined notions of value (in a mathematical sense), i.e., a subset from a ground set of key, which is exactly what operator== compares. For map and multimap, the value is a function from a ground set of keys to a value for map, sequence of values for multimap (sequence means that order matters). The operational definition in terms of range is simply an algorithm for comparing those values. Exactly the same is true for the values of unordered containers; operator== as Howard and I've defined them precisely perform this comparison of value. I find nothing surprising in that. The fact that operator== is defined in terms of ranges for sequences and ordered associative containers, and is defined/implemented differently for unordered containers, is not surprising, it's necessary. Again, I believe the standard could be improved on this, and wouldn't your library be the natural vector to provide a reference implementation for a proposal to define operator== for unordered containers?
"Right now, I'm trying to limit myself to obvious design decisions for extensions so I don't really want to go into this area.
Fair enough. You gave good points. I hope my counter-arguments are equally relevant, but you're of course free to not implement them. Still, as boost explores possibilities for later standardization, it could be hoped that you would provide an implementation that provides prior art for future discussion in the committee. Whether you or someone else (perhaps me, I really would like to) would make the proposal, is for the future. Still, I think there's only one sensible definition for unordered_set, and unordered_multiset, and the multiset version isn't trivial either. Would be nice if at least those werre in the library, perhaps with a named function instead of an operator. Would it be a problem if you at least provided those?
The prime number list is too short, it roughly doubles every time. On the other hand, I understand that it wouldn't do that have a *much* larger list, as this static array is included in every translation unit.
I think it can actually cause an ODR violation so I probably should change that anyway. David Abrahams recently posted a way to avoid this by including the value in a template, so I'll try that.
In theory, yes. In practice, linkers don't really enforce it, I hear. Having it as a template doesn't solve the basic problem, which is the replication of identical data in many translation units. In fact, it might exacerbate it by having the same table defined multiple times in the same translation unit! Where I work, we're trying to curb executable sizes (which can be enormous, like you wouldn't believe). This is precisely what would get a frown at the code review.
How much of a difference would it make? The memory requirements for the array of buckets is typically smaller than the memory required for the nodes (unless you have a small number of nodes or increase the maximum load factor). T
You're correct, unless people like to have very large maps of small types (unordered_map<int, some_type*> with 10M elements) where it starts to really matter. On the other end, it also matters if a user has lots of small maps. I think it can really be a problem, in fact I have been in that situation myself (having to fix the prime number table of our implementation of hash table, by request of one user with performance-critical needs). So here I actually speak from practical experience. I believe the best thing, is to move the table to the .cpp, but I also I totally understanding the appeal of header-only libraries. Would you consider a configuration flag, whereby unless explicitly defined, your table is in the header, but for a user which controls its own site installation, a table can be generated into a .cpp and compiled into a library to be linked into programs using Boost.Unordered? I actually did write such a piece of code, it isn't hard, and you can specify the step (30% or 100%) and the range of the desired table. If you include this code into the example section and make the flag public, your users can choose what they prefer based on their requirements. Then again, if they're that sophisticated, maybe they can do it themselves.
Re: swap: I really think you should be following the committee's current advice (I'm looking at the code of hash_table_impl.hpp, lines 1194 -- 1219), sooner rather than later. But my opinion is biased.
Why is your opinion biased?
My boss is John Lakos, and my colleague Pablo Halpern. They both have strong opinions on allocators and value-semantic types. In fact, they're arguing exactly about these issues with the committee (N2436, N2387, N1850 ). I've been converted myself :)
That works both ways, as the committee could change its mind. Since we don't have concepts, following their current decision means slow swaps. Which isn't that bad since it's only for the rare occasions when the allocators aren't equal. I haven't got a strong preference here either, but this was discussed before and the general consensus seemed to be for what I've done.
What I hear from Pablo and John is that they're getting good support around their proposal (ok, they're biased too :)... Maybe you're right to wait and see what the committee will do. Hopefully, won't take long as C++0x is around the horizon and there'll be two meetings in Jan. and June. Somehow, I didn't think that Pablo's proposal (N2436) required concepts, but required traits instead. But I'm not intimately familiar. -- Hervé Brönnimann hervebronnimann@mac.com