[review results] hash functions accepted

Dear all, I'm pleased to announce that Daniel James' hash-functions library is accepted into boost. The review comments were generally positive, a few bugs were fixed and in general we had a solid discussion in favor of the current design. Congratulations to James! I would like to thank all who partcipated: Dave Harris Peter Dimov John Maddock Joaquin M. Chris Jefferson Caleb Epstein Topher Cooper David Abrahams Alberto Barbati Tobias Swinger Jaap Suter Rob Stewart Bronek Kozicki Below is a list of issue that was resolved; let us have a discussion about any of them if I have misunderstood something. Also, feel free to mention something if it was forgotten by me. Issues and solutions: --------------------- 1. for those who don't want to include the whole std-lib, individual headers can be put in hash/std/XX.hpp (but see 3) 2. update docs to reflect that ADL is used. 3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default 4. avoid too many zero hash values by defaulting to something other than 0 5. change implementation of hash values for pointers so undefined behavior is gone (and consider adding x + (x >> 3) ) 6. the design aim was predictibility and to minimize collision; for the latter part, I would like to know if similar values hash to similar buckets or not. If very different objects end up in the same buskets, wouldn't equality be much faster rejected? Should this be a concern? 7. the hash_ptr<> suggestion is better done with general indirected function 8. hash_range should be overloaded to take an initial seed best regards Thorsten, review manager -- Thorsten Ottosen ---------------------------- www.dezide.com www.cs.aau.dk/index2.php?content=Research/bss www.boost.org www.open-std.org/JTC1/SC22/WG21/

Thorsten Ottosen wrote:
3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default
I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range? If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template. The proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.

Peter Dimov wrote:
Thorsten Ottosen wrote:
3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default
I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range? If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template. The proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.
I think Thorsten is referring in part to some off list emails, and in part to when I wrote (about removing the includes for all the standard containers):
Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different sequences).
I forgot to mention that there's no way to tell what 'looks' like a container at all. Joaquín replied:
Too broad IMHO.
So, I think we all rejected the idea. It would be nice if there was a way to do something like that, but there isn't. Daniel

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1napl$8pu$1@sea.gmane.org... Peter Dimov wrote:
Thorsten Ottosen wrote:
3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default
I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range?
yes. | If
so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template.
why should anything *special* monopolize the primary template? Why not something *general*? | The
proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.
|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't. that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much. What are the obstacles? The unordered containers can define their own version. Btw, would you please describe why the unordered containers would "break"? -Thorsten

Thorsten Ottosen wrote:
"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1napl$8pu$1@sea.gmane.org... Peter Dimov wrote:
I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range?
yes.
If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template.
why should anything *special* monopolize the primary template? Why not something *general*?
There can be at most one primary template (and for ADL customization points, there should be none). When you pick your favorite type category and put that in a primary template, this makes this type category special with respect to the primitive. It means that the primitive operates on this specific category of types. hash_value is not a range primitive. It is a value primitive.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:016401c52e5e$0e08abe0$6501a8c0@pdimov2... | Thorsten Ottosen wrote: | > "Daniel James" <daniel@calamity.org.uk> wrote in message | > news:d1napl$8pu$1@sea.gmane.org... | > Peter Dimov wrote: | >> I don't remember this being discussed during the review, could you | >> please explain it in more detail? Do you mean that hash_value should | >> have a primary template that assumes that its argument is a range? | > | > yes. | > | >> If | >> so, I strongly oppose it. Ranges aren't special from the point of | >> view of hash_value, they shouldn't monopolize the primary template. | > | > why should anything *special* monopolize the primary template? Why not | > something *general*? | | There can be at most one primary template (and for ADL customization points, | there should be none). why? In boost.range this is done and it is very convenient that the primary template works with all containers. | When you pick your favorite type category and put | that in a primary template, this makes this type category special with | respect to the primitive. speciel == works | It means that the primitive operates on this | specific category of types. this is your interpretation | hash_value is not a range primitive. It is a value primitive. is vector< sub_range<string> >::value_type a value or a range? -Thorsten

Thorsten Ottosen wrote:
|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't.
that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much.
I was just recapping what conversation there was. But it does make three people opposed to the idea.
What are the obstacles? The unordered containers can define their own version.
But there could be other containers which also cause problems which we can't anticipate. Or other types which have a sequence, but don't define equality in terms of that sequence.
Btw, would you please describe why the unordered containers would "break"?
It's possible for two unordered containers to contain exactly the same elements but in a different order. So they would get different hash values for hash_range, but are considered equal (although equality isn't usually defined anyway). Daniel

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1nff6$ph6$1@sea.gmane.org... Thorsten Ottosen wrote:
that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much.
I was just recapping what conversation there was. But it does make three people opposed to the idea.
>>>>>>>>>
all decision should, if possible, be based on arguments not on personal opinions
What are the obstacles? The unordered containers can define their own version.
But there could be other containers which also cause problems which we can't anticipate. Or other types which have a sequence, but don't define equality in terms of that sequence.
>>>>>>>>>
The rule of thumb is that you must make sure objects comparing equal has the same hash-value; what the hash-value is is not a problem AFAICT.
Btw, would you please describe why the unordered containers would "break"?
It's possible for two unordered containers to contain exactly the same elements but in a different order. So they would get different hash values for hash_range, but are considered equal (although equality isn't usually defined anyway).
>>>>>>>>>>
if equality isn't defined, then it makes little sense to use them as index in a hash-based container. In fact, it won't compile. -Thorsten

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1pmd5$tl2$1@sea.gmane.org... | Thorsten Ottosen wrote: | | > if equality isn't defined, then it makes little sense to use them as index in | > a hash-based | > container. In fact, it won't compile. | | But equality isn't prohibited. So an implementation could supply it. with what purpose? Even so, it must be far easier to write an empty function with a static assert for those cases that should not use the default than to do the opposite. | Or | a non-standard hash container could. Does anyone supply it? If so, with what semantics? Even so, does the extreamely rare situation, that a hash-container is used as index in a hash-container, justify that we should not use a generic default ? -Thorsten

Thorsten Ottosen wrote:
Even so, does the extreamely rare situation, that a hash-container is used as index in a hash-container, justify that we should not use a generic default ?
But that was just a specific example of a general point - that we can't define a hash function for an object when we don't know what its equality function is. Another example: struct foo { int identity_; std::vector<int> values_; std::vector<int>::const_iterator begin() const { return values_.begin(); } std::vector<int>::const_iterator end() { return values_.end(); } friend bool operator==(foo const& x) const { return identity_ == x.identity_; } // ... };

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1pooj$7ia$1@sea.gmane.org... | Thorsten Ottosen wrote: | | > Even so, does the extreamely rare situation, that a hash-container is used as | > index in a hash-container, | > justify that we should not use a generic default ? | | But that was just a specific example of a general point - that we can't | define a hash function for an object when we don't know what its | equality function is. | | Another example: My point has been that it is up to the programmer to make sure that equal objects has the same hash value. Your point is that it could be the wrong hash value. I don't see why we should guard against this; the user must always make sure that equal objects have the same hash value; consider just what happens when the user adds a new data member to the class and updates equality but forgets to update the hash value or vice versa. So I think that we are just getting a false sense of security out of leaving the default empty. -Thorsten

Thorsten Ottosen wrote:
My point has been that it is up to the programmer to make sure that equal objects has the same hash value.
But the programmer might be oblivious to the hash library (remember that it's used by default in both Boost.MultiIndex and my unordered containers). The object might have been written by one person, and then inserted into a hash table by another who, on seeing that it compiles, thinks that everything has been sorted out in advance.
Your point is that it could be the wrong hash value.
I don't see why we should guard against this; the user must always make sure that equal objects have the same hash value; consider just what happens when the user adds a new data member to the class and updates equality but forgets to update the hash value or vice versa.
When someone writes a custom hash function they're assuming responsibility for its upkeep. But that only happens when they write it. Daniel

Thorsten Ottosen wrote:
So I think that we are just getting a false sense of security out of leaving the default empty.
OK, let's leave equivalence out of it. What is the default for an arbitrary object x of user-defined type X? You claim that it's hash_range( x.begin(), x.end() ), but this is obviously incorrect in the vast majority of cases. I claim that there is no default.

Daniel James <daniel <at> calamity.org.uk> writes:
Thorsten Ottosen wrote:
|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't.
that Joachin don't like the idea is ok, it's just not an arguemnt in favor
of
much.
From my point of view, this is just a maintenance issue: if there's a stdlib (none that I know, anyway) with extra
Of course :) My concern about the idea is that we can do better by simply fowrard declaring the STL containers: namespace std{ template<typename T,typename A> class vector; } template<typename T,typename A> std::size_t hash_value(const std::vector<T,A>& x) { ... } etc. The problem with this approach is that it is allowed that stdlib implementors use extra template parameters. parameters we can sync the forward declarations for this particular case and be happy. This is meant to avoid the inclusion of std headers for STL containers, since they're extra overhead when their corresponding hash overloads aren't to be used. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquín M LópezMuñoz wrote:
The problem with this approach is that it is allowed that stdlib implementors use extra template parameters.
They are not, see: http://open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#94

Peter Dimov <pdimov <at> mmltd.net> writes:
Joaquín M LópezMuñoz wrote:
The problem with this approach is that it is allowed that stdlib implementors use extra template parameters.
They are not, see:
So much the better! Is there any obstacle, then, to forward declare STL containers so as to define their corresponding hash_value overloads, instead of including them? In the linked page it is said [quote] "I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components" Is this really so? Why cannot I forward declare a stdlib component? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:
"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"
Is this really so? Why cannot I forward declare a stdlib component?
Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal. jon

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323165335.GA52151@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | | > "I can't think of any way that this extension could | > break a conforming program, considering that users | > are not permitted to forward-declare standard | > library components" | > | > Is this really so? Why cannot I forward declare a | > stdlib component? | | Opening namespace std for anything except specialisations of standard | library templates (with user-defined types as parameters) is illegal. yes, but why? -Thorsten

On Wed, Mar 23, 2005 at 06:15:14PM +0100, Thorsten Ottosen wrote:
"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323165335.GA52151@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | | > "I can't think of any way that this extension could | > break a conforming program, considering that users | > are not permitted to forward-declare standard | > library components" | > | > Is this really so? Why cannot I forward declare a | > stdlib component? | | Opening namespace std for anything except specialisations of standard | library templates (with user-defined types as parameters) is illegal.
yes, but why?
Because the Standard says so! :-) I think that's the rule that the DR refers to when it says "users are not permitted ...", whether it's possible/advisable in practice is a different matter. Boost already opens namespace std for various reasons (e.g declaring C library names in std::) so forward decls would be no worse than the current situation - technically illegal, but harmless and even necessary in some cases. I can't see any real compiler ever rejecting the code, only a really strict validation test suite of some kind. jon

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323180511.GA54619@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 06:15:14PM +0100, Thorsten Ottosen wrote: | | > "Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message | > news:20050323165335.GA52151@compsoc.man.ac.uk... | > | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | > | | > | > "I can't think of any way that this extension could | > | > break a conforming program, considering that users | > | > are not permitted to forward-declare standard | > | > library components" | > | > | > | > Is this really so? Why cannot I forward declare a | > | > stdlib component? | > | | > | Opening namespace std for anything except specialisations of standard | > | library templates (with user-defined types as parameters) is illegal. | > | > yes, but why? | | Because the Standard says so! :-) yes, I know that...I was interesting in examples of *how* we can break programs with forward declarations. -Thorsten

Jonathan Wakely wrote:
On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:
"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"
Is this really so? Why cannot I forward declare a stdlib component?
Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal.
OK, but that has never stopped us (Boost) from doing it before. config/suffix.hpp defines std::min and std::max for non-compliant std libs. lambda/detail/operator_return_type_traits.hpp forward-declares std::complex<>. I don't see anything particularly wrong with that, except that perhaps forward-declarations of std types should be moved into the config sub-project to better handle the variety in different std libs. -- Eric Niebler Boost Consulting www.boost-consulting.com

On Wed, Mar 23, 2005 at 10:01:21AM -0800, Eric Niebler wrote:
Jonathan Wakely wrote:
On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:
"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"
Is this really so? Why cannot I forward declare a stdlib component?
Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal.
OK, but that has never stopped us (Boost) from doing it before. config/suffix.hpp defines std::min and std::max for non-compliant std libs. lambda/detail/operator_return_type_traits.hpp forward-declares std::complex<>. I don't see anything particularly wrong with that, except that perhaps forward-declarations of std types should be moved into the config sub-project to better handle the variety in different std libs.
I was never suggesting Boost shouldn't do it, (I also pointed out that it's done already for BOOST_NO_STDC_NAMESPACE,) I was just giving a likely rationale for the comment in the DR, which is what I thought Joaquin was asking about. The fact that you can get away with it safely in practice doesn't mean it's legal. I think I'll quit while I'm behind :) jon

Thorsten Ottosen wrote:
I would like to thank all who partcipated:
Thank you from me as well, and thanks to Thorsten for managing the review.
Issues and solutions: ---------------------
1. for those who don't want to include the whole std-lib, individual headers can be put in hash/std/XX.hpp (but see 3)
This should probably be a sub-directory of functional, as hash lives at boost/functional/hash.hpp.
4. avoid too many zero hash values by defaulting to something other than 0
I think the preferred solution was to add a constant to hash_combine (which seems to work well). Although, it might be good to have an 'initial_seed' constant.
5. change implementation of hash values for pointers so undefined behavior is gone (and consider adding x + (x >> 3) )
I'm going to do that.
6. the design aim was predictibility and to minimize collision; for the latter part, I would like to know if similar values hash to similar buckets or not. If very different objects end up in the same buskets, wouldn't equality be much faster rejected? Should this be a concern?
Generally, values that collide should not be similar, (which is why collisions of sequences of zeros was a problem), although I'm sure that there are still exceptions to that rule. Especially because it's quite hard to define what 'similar' means in general. Daniel
participants (7)
-
Daniel James
-
Eric Niebler
-
Joaquin M Lopez Munoz
-
Joaquín M López Muñoz
-
Jonathan Wakely
-
Peter Dimov
-
Thorsten Ottosen