[review results] hash functions accepted

Thorsten Ottosen

21 Mar 2005 21 Mar '05

3:48 p.m.

Dear all, I'm pleased to announce that Daniel James' hash-functions library is accepted into boost. The review comments were generally positive, a few bugs were fixed and in general we had a solid discussion in favor of the current design. Congratulations to James! I would like to thank all who partcipated: Dave Harris Peter Dimov John Maddock Joaquin M. Chris Jefferson Caleb Epstein Topher Cooper David Abrahams Alberto Barbati Tobias Swinger Jaap Suter Rob Stewart Bronek Kozicki Below is a list of issue that was resolved; let us have a discussion about any of them if I have misunderstood something. Also, feel free to mention something if it was forgotten by me. Issues and solutions: --------------------- 1. for those who don't want to include the whole std-lib, individual headers can be put in hash/std/XX.hpp (but see 3) 2. update docs to reflect that ADL is used. 3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default 4. avoid too many zero hash values by defaulting to something other than 0 5. change implementation of hash values for pointers so undefined behavior is gone (and consider adding x + (x >> 3) ) 6. the design aim was predictibility and to minimize collision; for the latter part, I would like to know if similar values hash to similar buckets or not. If very different objects end up in the same buskets, wouldn't equality be much faster rejected? Should this be a concern? 7. the hash_ptr<> suggestion is better done with general indirected function 8. hash_range should be overloaded to take an initial seed best regards Thorsten, review manager -- Thorsten Ottosen ---------------------------- www.dezide.com www.cs.aau.dk/index2.php?content=Research/bss www.boost.org www.open-std.org/JTC1/SC22/WG21/

Show replies by date

Peter Dimov

21 Mar 21 Mar

7:16 p.m.

Thorsten Ottosen wrote:

...

3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default

I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range? If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template. The proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.

Daniel James

8:30 p.m.

Peter Dimov wrote:

...

Thorsten Ottosen wrote:

...
3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default

I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range? If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template. The proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.

I think Thorsten is referring in part to some off list emails, and in part to when I wrote (about removing the includes for all the standard containers):

...

Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different sequences).

I forgot to mention that there's no way to tell what 'looks' like a container at all. Joaquín replied:

...

Too broad IMHO.

So, I think we all rejected the idea. It would be nice if there was a way to do something like that, but there isn't. Daniel

Thorsten Ottosen

9:18 p.m.

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1napl$8pu$1@sea.gmane.org... Peter Dimov wrote:

...

Thorsten Ottosen wrote:

...
3. consider if the primary template should not take 1 range parameter and call hash_range; unordered containers can then be explicitly overloaded when they become available. this would also have the great effect that subranges like sub_range<string> "just works" by default

I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range?

yes. | If

...

so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template.

why should anything *special* monopolize the primary template? Why not something *general*? | The

...

proper way to make hash_value work with sub_range<T>, as with any other type, is for sub_range to define a hash_value overload as an inline friend or in its namespace.

|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't. that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much. What are the obstacles? The unordered containers can define their own version. Btw, would you please describe why the unordered containers would "break"? -Thorsten

Peter Dimov

9:36 p.m.

Thorsten Ottosen wrote:

...

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1napl$8pu$1@sea.gmane.org... Peter Dimov wrote:

...
I don't remember this being discussed during the review, could you please explain it in more detail? Do you mean that hash_value should have a primary template that assumes that its argument is a range?

yes.

...
If so, I strongly oppose it. Ranges aren't special from the point of view of hash_value, they shouldn't monopolize the primary template.

why should anything *special* monopolize the primary template? Why not something *general*?

There can be at most one primary template (and for ADL customization points, there should be none). When you pick your favorite type category and put that in a primary template, this makes this type category special with respect to the primitive. It means that the primitive operates on this specific category of types. hash_value is not a range primitive. It is a value primitive.

Thorsten Ottosen

22 Mar 22 Mar

12:42 p.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:016401c52e5e$0e08abe0$6501a8c0@pdimov2... | Thorsten Ottosen wrote: | > "Daniel James" <daniel@calamity.org.uk> wrote in message | > news:d1napl$8pu$1@sea.gmane.org... | > Peter Dimov wrote: | >> I don't remember this being discussed during the review, could you | >> please explain it in more detail? Do you mean that hash_value should | >> have a primary template that assumes that its argument is a range? | > | > yes. | > | >> If | >> so, I strongly oppose it. Ranges aren't special from the point of | >> view of hash_value, they shouldn't monopolize the primary template. | > | > why should anything *special* monopolize the primary template? Why not | > something *general*? | | There can be at most one primary template (and for ADL customization points, | there should be none). why? In boost.range this is done and it is very convenient that the primary template works with all containers. | When you pick your favorite type category and put | that in a primary template, this makes this type category special with | respect to the primitive. speciel == works | It means that the primitive operates on this | specific category of types. this is your interpretation | hash_value is not a range primitive. It is a value primitive. is vector< sub_range<string> >::value_type a value or a range? -Thorsten

Daniel James

21 Mar 21 Mar

9:50 p.m.

Thorsten Ottosen wrote:

...

|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't.

that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much.

I was just recapping what conversation there was. But it does make three people opposed to the idea.

...

What are the obstacles? The unordered containers can define their own version.

But there could be other containers which also cause problems which we can't anticipate. Or other types which have a sequence, but don't define equality in terms of that sequence.

...

Btw, would you please describe why the unordered containers would "break"?

It's possible for two unordered containers to contain exactly the same elements but in a different order. So they would get different hash values for hash_range, but are considered equal (although equality isn't usually defined anyway). Daniel

Thorsten Ottosen

22 Mar 22 Mar

12:39 p.m.

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1nff6$ph6$1@sea.gmane.org... Thorsten Ottosen wrote:

...

that Joachin don't like the idea is ok, it's just not an arguemnt in favor of much.

I was just recapping what conversation there was. But it does make three people opposed to the idea.

...

...
...
...
...
...
>>>>>>>>>

all decision should, if possible, be based on arguments not on personal opinions

...

What are the obstacles? The unordered containers can define their own version.

But there could be other containers which also cause problems which we can't anticipate. Or other types which have a sequence, but don't define equality in terms of that sequence.

...

...
...
...
...
...
>>>>>>>>>

The rule of thumb is that you must make sure objects comparing equal has the same hash-value; what the hash-value is is not a problem AFAICT.

...

Btw, would you please describe why the unordered containers would "break"?

...

...
...
...
...
...
>>>>>>>>>>

if equality isn't defined, then it makes little sense to use them as index in a hash-based container. In fact, it won't compile. -Thorsten

Daniel James

6:01 p.m.

Thorsten Ottosen wrote:

...

if equality isn't defined, then it makes little sense to use them as index in a hash-based container. In fact, it won't compile.

But equality isn't prohibited. So an implementation could supply it. Or a non-standard hash container could.

Thorsten Ottosen

6:31 p.m.

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1pmd5$tl2$1@sea.gmane.org... | Thorsten Ottosen wrote: | | > if equality isn't defined, then it makes little sense to use them as index in | > a hash-based | > container. In fact, it won't compile. | | But equality isn't prohibited. So an implementation could supply it. with what purpose? Even so, it must be far easier to write an empty function with a static assert for those cases that should not use the default than to do the opposite. | Or | a non-standard hash container could. Does anyone supply it? If so, with what semantics? Even so, does the extreamely rare situation, that a hash-container is used as index in a hash-container, justify that we should not use a generic default ? -Thorsten

Daniel James

6:41 p.m.

Thorsten Ottosen wrote:

...

Even so, does the extreamely rare situation, that a hash-container is used as index in a hash-container, justify that we should not use a generic default ?

But that was just a specific example of a general point - that we can't define a hash function for an object when we don't know what its equality function is. Another example: struct foo { int identity_; std::vector<int> values_; std::vector<int>::const_iterator begin() const { return values_.begin(); } std::vector<int>::const_iterator end() { return values_.end(); } friend bool operator==(foo const& x) const { return identity_ == x.identity_; } // ... };

Thorsten Ottosen

7:04 p.m.

"Daniel James" <daniel@calamity.org.uk> wrote in message news:d1pooj$7ia$1@sea.gmane.org... | Thorsten Ottosen wrote: | | > Even so, does the extreamely rare situation, that a hash-container is used as | > index in a hash-container, | > justify that we should not use a generic default ? | | But that was just a specific example of a general point - that we can't | define a hash function for an object when we don't know what its | equality function is. | | Another example: My point has been that it is up to the programmer to make sure that equal objects has the same hash value. Your point is that it could be the wrong hash value. I don't see why we should guard against this; the user must always make sure that equal objects have the same hash value; consider just what happens when the user adds a new data member to the class and updates equality but forgets to update the hash value or vice versa. So I think that we are just getting a false sense of security out of leaving the default empty. -Thorsten

Daniel James

7:38 p.m.

Thorsten Ottosen wrote:

...

My point has been that it is up to the programmer to make sure that equal objects has the same hash value.

But the programmer might be oblivious to the hash library (remember that it's used by default in both Boost.MultiIndex and my unordered containers). The object might have been written by one person, and then inserted into a hash table by another who, on seeing that it compiles, thinks that everything has been sorted out in advance.

...

Your point is that it could be the wrong hash value.

I don't see why we should guard against this; the user must always make sure that equal objects have the same hash value; consider just what happens when the user adds a new data member to the class and updates equality but forgets to update the hash value or vice versa.

When someone writes a custom hash function they're assuming responsibility for its upkeep. But that only happens when they write it. Daniel

Peter Dimov

8 p.m.

Thorsten Ottosen wrote:

...

So I think that we are just getting a false sense of security out of leaving the default empty.

OK, let's leave equivalence out of it. What is the default for an arbitrary object x of user-defined type X? You claim that it's hash_range( x.begin(), x.end() ), but this is obviously incorrect in the vast majority of cases. I claim that there is no default.

Joaquín M López Muñoz

3:33 p.m.

Daniel James <daniel <at> calamity.org.uk> writes:

...

Thorsten Ottosen wrote:

...
|I think Thorsten is referring in part to some off list emails, and in |part to when I wrote (about removing the includes for all the standard |containers): | |> Another possible solution is to just use Boost.Range on anything that 'looks' like a container. But this will break on unordered containers (as equivalent containers can have different |sequences). | |I forgot to mention that there's no way to tell what 'looks' like a |container at all. Joaquín replied: | |> Too broad IMHO. | |So, I think we all rejected the idea. It would be nice if there was a |way to do something like that, but there isn't.

that Joachin don't like the idea is ok, it's just not an arguemnt in favor

...

...
much.

...

From my point of view, this is just a maintenance issue: if there's a stdlib (none that I know, anyway) with extra

Of course :) My concern about the idea is that we can do better by simply fowrard declaring the STL containers: namespace std{ template<typename T,typename A> class vector; } template<typename T,typename A> std::size_t hash_value(const std::vector<T,A>& x) { ... } etc. The problem with this approach is that it is allowed that stdlib implementors use extra template parameters. parameters we can sync the forward declarations for this particular case and be happy. This is meant to avoid the inclusion of std headers for STL containers, since they're extra overhead when their corresponding hash overloads aren't to be used. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Peter Dimov

4:37 p.m.

Joaquín M LópezMuñoz wrote:

...

The problem with this approach is that it is allowed that stdlib implementors use extra template parameters.

They are not, see: http://open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#94

Joaquin M Lopez Munoz

23 Mar 23 Mar

4:06 p.m.

Peter Dimov <pdimov <at> mmltd.net> writes:

...

Joaquín M LópezMuñoz wrote:

...
The problem with this approach is that it is allowed that stdlib implementors use extra template parameters.

They are not, see:

http://open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#94

So much the better! Is there any obstacle, then, to forward declare STL containers so as to define their corresponding hash_value overloads, instead of including them? In the linked page it is said [quote] "I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components" Is this really so? Why cannot I forward declare a stdlib component? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Jonathan Wakely

4:53 p.m.

On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:

...

"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"

Is this really so? Why cannot I forward declare a stdlib component?

Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal. jon

Thorsten Ottosen

5:15 p.m.

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323165335.GA52151@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | | > "I can't think of any way that this extension could | > break a conforming program, considering that users | > are not permitted to forward-declare standard | > library components" | > | > Is this really so? Why cannot I forward declare a | > stdlib component? | | Opening namespace std for anything except specialisations of standard | library templates (with user-defined types as parameters) is illegal. yes, but why? -Thorsten

Jonathan Wakely

6:05 p.m.

On Wed, Mar 23, 2005 at 06:15:14PM +0100, Thorsten Ottosen wrote:

...

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323165335.GA52151@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | | > "I can't think of any way that this extension could | > break a conforming program, considering that users | > are not permitted to forward-declare standard | > library components" | > | > Is this really so? Why cannot I forward declare a | > stdlib component? | | Opening namespace std for anything except specialisations of standard | library templates (with user-defined types as parameters) is illegal.

yes, but why?

Because the Standard says so! :-) I think that's the rule that the DR refers to when it says "users are not permitted ...", whether it's possible/advisable in practice is a different matter. Boost already opens namespace std for various reasons (e.g declaring C library names in std::) so forward decls would be no worse than the current situation - technically illegal, but harmless and even necessary in some cases. I can't see any real compiler ever rejecting the code, only a really strict validation test suite of some kind. jon

Thorsten Ottosen

6:35 p.m.

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050323180511.GA54619@compsoc.man.ac.uk... | On Wed, Mar 23, 2005 at 06:15:14PM +0100, Thorsten Ottosen wrote: | | > "Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message | > news:20050323165335.GA52151@compsoc.man.ac.uk... | > | On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote: | > | | > | > "I can't think of any way that this extension could | > | > break a conforming program, considering that users | > | > are not permitted to forward-declare standard | > | > library components" | > | > | > | > Is this really so? Why cannot I forward declare a | > | > stdlib component? | > | | > | Opening namespace std for anything except specialisations of standard | > | library templates (with user-defined types as parameters) is illegal. | > | > yes, but why? | | Because the Standard says so! :-) yes, I know that...I was interesting in examples of *how* we can break programs with forward declarations. -Thorsten

Eric Niebler

6:01 p.m.

Jonathan Wakely wrote:

...

On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:

...
"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"

Is this really so? Why cannot I forward declare a stdlib component?

Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal.

OK, but that has never stopped us (Boost) from doing it before. config/suffix.hpp defines std::min and std::max for non-compliant std libs. lambda/detail/operator_return_type_traits.hpp forward-declares std::complex<>. I don't see anything particularly wrong with that, except that perhaps forward-declarations of std types should be moved into the config sub-project to better handle the variety in different std libs. -- Eric Niebler Boost Consulting www.boost-consulting.com

Jonathan Wakely

24 Mar 24 Mar

10:10 a.m.

On Wed, Mar 23, 2005 at 10:01:21AM -0800, Eric Niebler wrote:

...

Jonathan Wakely wrote:

...
On Wed, Mar 23, 2005 at 04:06:09PM +0000, Joaquin M Lopez Munoz wrote:

...
"I can't think of any way that this extension could break a conforming program, considering that users are not permitted to forward-declare standard library components"

Is this really so? Why cannot I forward declare a stdlib component?

Opening namespace std for anything except specialisations of standard library templates (with user-defined types as parameters) is illegal.

OK, but that has never stopped us (Boost) from doing it before. config/suffix.hpp defines std::min and std::max for non-compliant std libs. lambda/detail/operator_return_type_traits.hpp forward-declares std::complex<>. I don't see anything particularly wrong with that, except that perhaps forward-declarations of std types should be moved into the config sub-project to better handle the variety in different std libs.

I was never suggesting Boost shouldn't do it, (I also pointed out that it's done already for BOOST_NO_STDC_NAMESPACE,) I was just giving a likely rationale for the comment in the DR, which is what I thought Joaquin was asking about. The fact that you can get away with it safely in practice doesn't mean it's legal. I think I'll quit while I'm behind :) jon

Daniel James

21 Mar 21 Mar

8:20 p.m.

Thorsten Ottosen wrote:

...

I would like to thank all who partcipated:

Thank you from me as well, and thanks to Thorsten for managing the review.

...

Issues and solutions: ---------------------

1. for those who don't want to include the whole std-lib, individual headers can be put in hash/std/XX.hpp (but see 3)

This should probably be a sub-directory of functional, as hash lives at boost/functional/hash.hpp.

...

4. avoid too many zero hash values by defaulting to something other than 0

I think the preferred solution was to add a constant to hash_combine (which seems to work well). Although, it might be good to have an 'initial_seed' constant.

...

5. change implementation of hash values for pointers so undefined behavior is gone (and consider adding x + (x >> 3) )

I'm going to do that.

...

6. the design aim was predictibility and to minimize collision; for the latter part, I would like to know if similar values hash to similar buckets or not. If very different objects end up in the same buskets, wouldn't equality be much faster rejected? Should this be a concern?

Generally, values that collide should not be similar, (which is why collisions of sequences of zeros was a problem), although I'm sure that there are still exceptions to that rule. Especially because it's quite hard to define what 'similar' means in general. Daniel

7387

Age (days ago)

7390

Last active (days ago)

List overview

Download

23 comments

7 participants

participants (7)

Daniel James
Eric Niebler
Joaquin M Lopez Munoz
Joaquín M López Muñoz
Jonathan Wakely
Peter Dimov
Thorsten Ottosen