Status of Hash Collections?

newer
[program_options] parse_config_file

Jeff Garland

28 Oct 2006 28 Oct '06

3:53 a.m.

Yesterday I got to see the first part of Scott Meyer's overview of TR1 and Boost. Of course, he pointed out that we don't yet have the hash collections in Boost. Looking over what's in the vault, it looks like it's coming along well...seems like it's pretty much ready for review? Since I see that I asked this in July does this count as nagging ;-) Jeff

Show replies by date

John Maddock

28 Oct 28 Oct

9:01 a.m.

Jeff Garland wrote:

...

Yesterday I got to see the first part of Scott Meyer's overview of TR1 and Boost. Of course, he pointed out that we don't yet have the hash collections in Boost. Looking over what's in the vault, it looks like it's coming along well...seems like it's pretty much ready for review? Since I see that I asked this in July does this count as nagging ;-)

Very slow nagging :-) I agree we really should get some hashed containers into Boost, heaven only knows we've been talking about them long enough ! John.

Aristid Breitkreuz

12:40 p.m.

Am Samstag, den 28.10.2006, 10:01 +0100 schrieb John Maddock:

...

I agree we really should get some hashed containers into Boost, heaven only knows we've been talking about them long enough !

What about multi_index? They have support for hashing. But quite probably stand-alone hashed containers would be good to for two reasons: 1. TR1 compliance 2. Possibly the multi_index implementation or interface has some serious drawbacks Am I mistaken? Aristid

"JOAQUIN LOPEZ MU?Z"

4:07 p.m.

Hello Aristid, ----- Mensaje original ----- De: Aristid Breitkreuz <aribrei@arcor.de> Fecha: Sábado, Octubre 28, 2006 2:41 pm Asunto: Re: [boost] Status of Hash Collections? Para: boost@lists.boost.org

...

What about multi_index? They have support for hashing.

But quite probably stand-alone hashed containers would be good to for two reasons: 1. TR1 compliance 2. Possibly the multi_index implementation or interface has some serious drawbacks

Am I mistaken?

Well, I too support the inclusion of standalone TR1-compliant unordered containers into Boost, but since you mention Boost.MultiIndex hashed indices let me explain how they stand in comparison with unordered_[multi]set. The instantiation multi_index_container< Value, indexed_by< hashed_[non_]unique<identity<Value>,Hash,Pred>

...

, Alloc

completely replicates the interface and complexity bounds of unordered_[multi]set<Value,Hash,Pred,Alloc> except that insert(const value&) returns a pair<iterator,bool> both for hashed_unique and hashed_non_unique indices, while unordered_multiset::insert returns an iterator. Due to design constraints, B.MI hashed indices provide the following guarantees that are *not* mandated for TR1 unordered containers: * Iterator validity is preserved in any case during insertion or rehashing: TR1 allows for iterator invalidation when a rehash (implicit or explicit) is performed. * Erasing an element or range of elements via iterators does not throw ever, as the internal hash function and equality predicate objects are not actually invoked. * rehash provides the strong exception safety guarantee unconditionally. TR1 only warrants it if the internal hash function and equality predicate objects do not throw. These stronger guarantees might result in some performance penalties in the area of iterator increment, erase and rehash operations, though my measurings lead me to think that the actual impact is low or negligible. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Aristid Breitkreuz

6:29 p.m.

Hello Joaquin, Thanks for the insight! Am Samstag, den 28.10.2006, 18:07 +0200 schrieb "JOAQUIN LOPEZ MU?Z":

...

Hello Aristid,

Well, I too support the inclusion of standalone TR1-compliant unordered containers into Boost, but since you mention Boost.MultiIndex hashed indices let me explain how they stand in comparison with unordered_[multi]set. The instantiation

multi_index_container< Value, indexed_by< hashed_[non_]unique<identity<Value>,Hash,Pred>

...
, Alloc

completely replicates the interface and complexity bounds of

unordered_[multi]set<Value,Hash,Pred,Alloc>

except that insert(const value&) returns a pair<iterator,bool> both for hashed_unique and hashed_non_unique indices, while unordered_multiset::insert returns an iterator.

Will hashed_non_unique ever return a false second there?

...

Due to design constraints, B.MI hashed indices provide the following guarantees that are *not* mandated for TR1 unordered containers:

* Iterator validity is preserved in any case during insertion or rehashing: TR1 allows for iterator invalidation when a rehash (implicit or explicit) is performed. * Erasing an element or range of elements via iterators does not throw ever, as the internal hash function and equality predicate objects are not actually invoked. * rehash provides the strong exception safety guarantee unconditionally. TR1 only warrants it if the internal hash function and equality predicate objects do not throw.

These stronger guarantees might result in some performance penalties in the area of iterator increment, erase and rehash operations, though my measurings lead me to think that the actual impact is low or negligible.

Does this imply that implementing TR1 unordered containers on top of B.MI would not impose a serious performance drawback? The interface is nearly the same so writing a TR1 compliant wrapper should be ... doable, right? And it should be done if it shall be part of 1.35 :-). Aristid Breitkreuz PS: I hope 1.34 will come out soon now. (Who doesn't?)

"JOAQUIN LOPEZ MU?Z"

6:53 p.m.

----- Mensaje original ----- De: Aristid Breitkreuz <aribrei@arcor.de> Fecha: Sábado, Octubre 28, 2006 8:30 pm Asunto: Re: [boost] Status of Hash Collections? Para: boost@lists.boost.org [...]

...

Am Samstag, den 28.10.2006, 18:07 +0200 schrieb "JOAQUIN LOPEZ MU?Z": [...]

...
multi_index_container< Value, indexed_by< hashed_[non_]unique<identity<Value>,Hash,Pred>

...
, Alloc

completely replicates the interface and complexity bounds of

unordered_[multi]set<Value,Hash,Pred,Alloc>

except that insert(const value&) returns a pair<iterator,bool> both for hashed_unique and hashed_non_unique indices, while unordered_multiset::insert returns an iterator.

Will hashed_non_unique ever return a false second there?

It can return a false if there are additional indices with uniqueness constraints, hence the pair<iterator,bool> signature.

...

...
Due to design constraints, B.MI hashed indices provide the following guarantees that are *not* mandated for TR1 unordered containers: [...] These stronger guarantees might result in some performance penalties in the area of iterator increment, erase and rehash operations, though my measurings lead me to think that the actual impact is low or negligible.

Does this imply that implementing TR1 unordered containers on top of B.MI would not impose a serious performance drawback? The interface is nearly the same so writing a TR1 compliant wrapper should be ... doable,right?

It's certainly doable, but I don't see any reason to follow this approach given that Daniel James' standalone implementation available at the vault looks like is completely finished AFAICS. We only need that Daniel or someone else write the docs and request a formal review.

...

And it should be done if it shall be part of 1.35 :-).

Aristid Breitkreuz

PS: I hope 1.34 will come out soon now. (Who doesn't?)

We're close, the regression count is only 47 :) Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Daniel James

31 Oct 31 Oct

11:06 p.m.

Jeff Garland wrote:

...

Yesterday I got to see the first part of Scott Meyer's overview of TR1 and Boost. Of course, he pointed out that we don't yet have the hash collections in Boost. Looking over what's in the vault, it looks like it's coming along well...seems like it's pretty much ready for review? Since I see that I asked this in July does this count as nagging ;-)

Sorry about the slow reply, I haven't been checking my email often recently. I just uploaded a new version, which fixes a bug and removes some unnecessary junk from the implementation. Looking at my to do list, it's mainly just to add more tests and examples, which probably don't strictly need to be done before a review. There are also a few minor issues with the standard to be sorted out. There is one problem that might be more serious. I'm not sure if Allocator::address or the allocator comparison operators are allowed to throw an exception. If they can, the implementation doesn't meet the exception requirements. Actually, if the allocator comparison operators can throw then it might not be possible to meet the swap exception requirements for most of the standard containers. So maybe I'm missing something. I haven't looked into this in much detail yet. Daniel

John Maddock

1 Nov 1 Nov

10:21 a.m.

Daniel James wrote:

...

Jeff Garland wrote:

...
Yesterday I got to see the first part of Scott Meyer's overview of TR1 and Boost. Of course, he pointed out that we don't yet have the hash collections in Boost. Looking over what's in the vault, it looks like it's coming along well...seems like it's pretty much ready for review? Since I see that I asked this in July does this count as nagging ;-)

Sorry about the slow reply, I haven't been checking my email often recently.

I just uploaded a new version, which fixes a bug and removes some unnecessary junk from the implementation.

Looking at my to do list, it's mainly just to add more tests and examples, which probably don't strictly need to be done before a review. There are also a few minor issues with the standard to be sorted out.

There is one problem that might be more serious. I'm not sure if Allocator::address or the allocator comparison operators are allowed to throw an exception. If they can, the implementation doesn't meet the exception requirements. Actually, if the allocator comparison operators can throw then it might not be possible to meet the swap exception requirements for most of the standard containers. So maybe I'm missing something. I haven't looked into this in much detail yet.

I don't know about address, but you are allowed to assume that all allocators compare equal. Does that help? John.

Daniel James

10:54 a.m.

On 01/11/06, John Maddock <john@johnmaddock.co.uk> wrote:

...

I don't know about address, but you are allowed to assume that all allocators compare equal. Does that help?

I've been trying to fully support allocators in order to support allocators such as the one from Ion's shared memory library. I realise that most implementations don't do this so it's probably not vital. And it's still an open issue how swap should be implemented: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#431 Allocator::address is actually the more important issue, as I use it to destruct nodes. If it can throw I really should change the implementation so that it doesn't. While I'd prefer not to, it's not too bad. I'll have to remove an optimisation for iteration. Also, I'm constructing the value in nodes separately from the pointers, in order to avoid an extra temporary, but I won't be able to destruct them separately. It feels wrong to destruct them in a different manner to their construction, but I don't think it'll cause any problems in practise. Daniel

John Maddock

1:22 p.m.

Daniel James wrote:

...

On 01/11/06, John Maddock <john@johnmaddock.co.uk> wrote:

...
I don't know about address, but you are allowed to assume that all allocators compare equal. Does that help?

I've been trying to fully support allocators in order to support allocators such as the one from Ion's shared memory library. I realise that most implementations don't do this so it's probably not vital. And it's still an open issue how swap should be implemented:

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#431

Allocator::address is actually the more important issue, as I use it to destruct nodes. If it can throw I really should change the implementation so that it doesn't. While I'd prefer not to, it's not too bad. I'll have to remove an optimisation for iteration. Also, I'm constructing the value in nodes separately from the pointers, in order to avoid an extra temporary, but I won't be able to destruct them separately. It feels wrong to destruct them in a different manner to their construction, but I don't think it'll cause any problems in practise.

The thing is I can't really imagine a situation where it would throw: or are you thinking of on-disk data structures maybe? My gut feeling would be to document any limitations and see how it flies: you've already gone beyond the requirements of the std, so I guess you could mark the use of non-equal allocators as experimental or something. Basically it sounds like you need some real-world feedback to decide how best to proceed? Getting the containers in Boost would be one way to get that experience: as long as the more experimental aspects that go over and beyond what TR1 requires are documented I don't see a great problem. Don't forget that someone may come up with either a killer suggestion, or a killer-gotcha during review that changes things anyway :-) HTH, John.

Daniel James

8:54 p.m.

John Maddock wrote:

...

The thing is I can't really imagine a situation where it would throw: or are you thinking of on-disk data structures maybe?

My gut feeling would be to document any limitations and see how it flies: you've already gone beyond the requirements of the std, so I guess you could mark the use of non-equal allocators as experimental or something.

Basically it sounds like you need some real-world feedback to decide how best to proceed? Getting the containers in Boost would be one way to get that experience: as long as the more experimental aspects that go over and beyond what TR1 requires are documented I don't see a great problem. Don't forget that someone may come up with either a killer suggestion, or a killer-gotcha during review that changes things anyway :-) You're probably right. I think I've assumed a few things beyond what is stated in the allocator requirements - mainly because it would be impractical not to. I don't think it's ever stated that Allocator::pointer's operations are no throw, but really they must be.

The main reason for my concerns was that for deallocate it's explicitly said that it does not throw exceptions. But nothing is said for any other expressions. But then, from a quick look at the requirements, the meaning of Allocator::address doesn't seem to be defined at all. thanks, Daniel

Ion Gaztañaga

4:36 p.m.

John Maddock wrote:

...

Daniel James wrote:

...
There is one problem that might be more serious. I'm not sure if Allocator::address or the allocator comparison operators are allowed to throw an exception. If they can, the implementation doesn't meet the exception requirements. Actually, if the allocator comparison operators can throw then it might not be possible to meet the swap exception requirements for most of the standard containers. So maybe I'm missing something. I haven't looked into this in much detail yet.

I don't know about address, but you are allowed to assume that all allocators compare equal. Does that help?

The standard allows ignoring equality and also allows ignoring pointer, reference and other typedefs but I would like to make boost unordered containers compatible with Boost.Interprocess shared memory allocators and avoid writing my own unordered container. My bet is to use the option 3 for Issue 431: http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2004/n1599.html If comparing allocators throws, swap would throw, so this would invalidate one of the best arguments of this option. I sincerely would assume that comparing allocators does not throw, to obtain a no-throw swap. Apart from this, the old implementation of Boost unordered supposed that the hash function and the predicate can throw, something that I found really nasty. If I remember correctly, there was a double-buffer trick in the old implementation, that incremented the size of the hash. According to TR1: 6.3.1.1 Exception safety guarantees [tr.unord.req.except] 1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the container’s Hash or Pred object (if any). 2 For unordered associative containers, if an exception is thrown by any operation other than the container’s hash function from within an insert() function inserting a single element, the insert() function has no effect. 3 For unordered associative containers, no swap function throws an exception unless that exception is thrown by the copy constructor or copy assignment operator of the container’s Hash or Pred object (if any). 4 For unordered associative containers, if an exception is thrown from within a rehash() function other than by the container’s hash function or comparison function, the rehash() function has no effect. I haven't implemented these containers myself, but aren't these exception guarantees an overhead when implementing these containers? When swapping, if the hash function's copy-constructor does not throw but the predicate's copy constructor throws what's the state of the half-constructed container? The only way to obtain strong guarantee would be to have an array of two has functions and use an index or member pointer that we could rollback. This complicates the implementation a lot and it might have an space overhead. Is realistic to have both a throwing pred and hash function? I have never used a comparison function or has that throws, is there any real-life example of throwing hash and predicate? Regards, Ion

Daniel James

8:37 p.m.

Ion Gaztañaga wrote:

...

My bet is to use the option 3 for Issue 431:

http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2004/n1599.html

If comparing allocators throws, swap would throw, so this would invalidate one of the best arguments of this option. I sincerely would assume that comparing allocators does not throw, to obtain a no-throw swap.

I think option 3 is best as well. I don't think this is actually a problem. The guarantee becomes something like, "swap is no-throw, as long as comparing and swapping allocators are no-throw operations." And really, I expect that allocators that throws in these circumstances are very rare, if they exist at all. The problem is the swap requirements:

...

3 For unordered associative containers, no swap function throws an exception unless that exception is thrown by the copy constructor or copy assignment operator of the container’s Hash or Pred object (if any).

Something probably should be added about allocators.

...

Apart from this, the old implementation of Boost unordered supposed that the hash function and the predicate can throw, something that I found really nasty. If I remember correctly, there was a double-buffer trick in the old implementation, that incremented the size of the hash.

It's still there. Yes, hashes and predicates that can throw are a bit weird. But the standard specifically allows them. My implementation is never going to be the most efficient hash table. At the moment, the main object is certainly larger than it needs to be - I'm not even using EBO. But typically this won't be too when you consider the space used by the nodes and it will improve in the future. I'd quite like to have the first release support older compilers, and then the second discard them and improve the implementation. I expect several people would prefer me to skip the first part of that plan.

...

I haven't implemented these containers myself, but aren't these exception guarantees an overhead when implementing these containers?

It's not too bad. There's the double buffer that you mentioned, and insert could be simplified if it didn't need to meet the requirements.

...

When swapping, if the hash function's copy-constructor does not throw but the predicate's copy constructor throws what's the state of the half-constructed container? The only way to obtain strong guarantee would be to have an array of two has functions and use an index or member pointer that we could rollback.

Yep, I've also got a similar problem with allocators. I have to choose either to store a single allocator - which causes problems in the destructor as I'll have to construct an extra allocator (at least two allocators are needed in the destructor - one for buckets and one for nodes) or I have to store multiple allocators leading to problems when swapping. Actually, I haven't dealt with that yet - I'm currently expecting a no-throw swap.

...

This complicates the implementation a lot and it might have an space overhead. Is realistic to have both a throwing pred and hash function? I have never used a comparison function or has that throws, is there any real-life example of throwing hash and predicate?

Not that I know of. But the standard feels the need to specify that they can. Normally I wouldn't care about these things, but it seems to me that when implementing a standard container it's worth being overly cautious. Daniel

Daniel James

2 Nov 2 Nov

7:22 p.m.

Daniel James wrote:

...

I just uploaded a new version, which fixes a bug and removes some unnecessary junk from the implementation.

It looks like the upload failed, and it was still the old version in the vault. So I tried again and the new version is there now. You can find it at: http://www.boost-consulting.com/vault/index.php?directory=Containers Daniel

6818

Age (days ago)

6823

Last active (days ago)

List overview

Download

13 comments

6 participants

participants (6)

"JOAQUIN LOPEZ MU?Z"
Aristid Breitkreuz
Daniel James
Ion Gaztañaga
Jeff Garland
John Maddock