
In-Reply-To: <d0kr8r$gl6$1@sea.gmane.org> nesotto@cs.auc.dk (Thorsten Ottosen) wrote (abridged):
It's a pleasure to announce that the review of Daniel James' hash functions library starts today (8th of March) and ends the 12th of March.
I see I have missed the deadline. Oh, well. Several queries have been raised during the review, some of which have merit. Firstly, I think we have some agreement that the signature: size_t hash_range( It first, It last ); is wrong and there should be some way to provide an initial seed. I am not yet decided on what the best signature should be; I am tempted towards a 3rd argument with a default value, but I can also see some merit in mimicking hash_combine() by making the seed the first argument. Secondly, the proposed implementation for hashing pointers produces undefined behaviour because it subtracts pointers not in the same array. We have some agreement that using reinterpret_cast would be better. Thirdly, I think there is also a legitimate query over whether the default hash_range should provide better quality hashes for arrays of bytes, eg along the lines discussed in: http://www.azillionmonkeys.com/qed/hash.html With boost interface is more important than implementation, but apparently implementations of the hash library will not be allowed to vary. My impression is that the proposed implementation of hash_combine: seed ^= hash_value(v) + (seed << 6) + (seed >> 2); was designed to combine 8-bit character values; do we have any evidence it also performs well to combine 32-bit values? Perhaps hash_value(v) would be better treated as an array of 4 bytes? Fourthly, there seems to be a design philosophy behind the current proposal which is not stated explicitly in the documentation. It apparently favours some notion of safety over speed and hash quality, where "safety" means that different objects should have the same hash values if they look at all alike to someone possibly. For example, the proposal has no specialisation for chars at all. I can't tell if that is an oversight, or whether it is a design aim that: char chars[] = { 1, 2, 3, 4 }; int ints[] = { 1, 2, 3, 4 }; assert( hash_range( chars, chars+4 ) == hash_range( ints, ints+4 ) ); I gather it is a design aim that hash_value( pair<int,int> ) match hash_range( int[2] ). Presumably if hash_range takes a seed argument, this means hash_value must too, else how can it match it? I am not sure about all this philosophy. I had kinda hoped that a boost hashing library would offer good quality industrial-strength hashing; that it would save me having to do my own research. Regardless, I think the aims and guarantees of the proposed library should be documented more explicitly. In conclusion, I feel that if we given only 5 days to review a proposal, that proposal needs to be spot on, uncontroversial, with all issues nailed accurately in advance. I don't think that is the case here. Although I think all the problems could be resolved, I don't think we have enough time to do that in the given review period. If I have to decide now, I'd say: This should not be accepted as a Boost library. Stock questions: I have not tried to compile the code. I have looked at the documentation, the source, and followed some of the references. I have raised design questions here and followed the discussion. Total time is a few hours. I am not an expert on hashing, but I have implemented my own hashing library. I have encountered practical problems due to different objects having the same hash values. -- Dave Harris, Nottingham, UK