[review] Review of Unordered library begins today December 7

Ion Gaztañaga

7 Dec 2007 7 Dec '07

10:05 a.m.

Hi all, The formal review of the Unordered library, proposed by Daniel James, begins today : *Description*: An implementation of the unordered containers specified in TR1, with most of the changes from the recent draft standards. *Author*: Daniel James *Download*: http://www.boost-consulting.com/vault/index.php?action=downloadfile&filename=unordered.zip&directory=Containers The documentation is in the zip file. I've put documentation online here: http://igaztanaga.drivehq.com/unordered http://tinyurl.com/yw8wjp What to include in Review Comments ================================== Your comments may be brief or lengthy, but basically the Review Manager needs your evaluation of the library. If you identify problems along the way, please note if they are minor, serious, or showstoppers. Here are some questions you might want to answer in your review: * What is your evaluation of the design? * What is your evaluation of the implementation? * What is your evaluation of the documentation? * What is your evaluation of the potential usefulness of the library? * Did you try to use the library? With what compiler? Did you have any problems? * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? * Are you knowledgeable about the problem domain? And finally, every review should answer this question: * Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. Unordered containers are likely to become one of the most used Boost libraries so I encourage you to review the library in order to see if Unordered fulfills all your expectations. Ion Gaztañaga - Review Manager -

Show replies by date

Thorsten Ottosen

7 Dec 7 Dec

2:55 p.m.

New subject: [review] Review of Unordered library begins today December 7

Ion Gaztañaga skrev:

...

Hi all, And finally, every review should answer this question:

* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

I vote to accept this library. The documentation is very nice. One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html) "The containers hash or predicate function can throw exceptions from erase" Under what cicumstances can/should/would I construct a ahsh or predicate that throws? I mean, the hash computes an integer, and comparisons reads data to determine its answer. This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw? -Thorsten

Ion Gaztañaga

7 p.m.

Thorsten Ottosen wrote:

...

I vote to accept this library. The documentation is very nice.

Thanks for the vote.

...

One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

Under what cicumstances can/should/would I construct a ahsh or predicate that throws? I mean, the hash computes an integer, and comparisons reads data to determine its answer.

This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw?

The same could be said about Predicate for ordered containers like std::set/map. I haven't seen a practical case where the comparison could throw. But standard library experts should answer this better than me. Certainly, Daniel has tried to achieve strong exception guarantees and the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG.

...

-Thorsten

Regards, Ion

Daniel James

8 Dec 8 Dec

10:11 a.m.

Ion Gaztañaga wrote:

...

Thorsten Ottosen wrote:

...
I vote to accept this library. The documentation is very nice.

Thanks for the vote.

Thank you from me as well.

...

...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

Under what cicumstances can/should/would I construct a ahsh or predicate that throws? I mean, the hash computes an integer, and comparisons reads data to determine its answer.

This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw?

The same could be said about Predicate for ordered containers like std::set/map. I haven't seen a practical case where the comparison could throw. But standard library experts should answer this better than me.

One example where they could throw is if the hash function used a numeric type which throws an exception when it overflows (which does suggest a buggy hash function but there you go). Another is that the data required to evaluate the hash function or check for equality is expensive and is lazily evaluated. For example, if you had a class which represents file, and based your equality test on calculating the SHA-1 signature then you might only calculate it when required - which could be a call to the hash or equality function.

...

Certainly, Daniel has tried to achieve strong exception guarantees and the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG.

Dealing with a call to comparison and hash isn't that bad - the implementation of the insert functions is a little intricate because of the exception requirements. The main problem is if the copy constructor, assignment or swap calls. (Ion knows all of this, it's for the benefit of everyone else). If either objects throw an exception while swapping or assigning a container it could leave the hash and comparison in an unknown state which would make the object invalid. To avoid this I store two of each, and a pointer to the current functions. And manipulation is done on the ones that aren't in use and when finished the pointer is switched to the new functions, making the operation atomic. The main cost is the memory use of the main container object. Instead of containing 2 small objects, it contains 4 small objects and a pointer - on 32-bit x86 this typically means an extra 6 bytes, although if the hash or equality object has members it can be more. I don't see this as a significant problem because the memory required for the container's buckets and nodes is typically much larger. Without doing something like this, I don't think the container could use a boost::function for its hash or comparison object and meet the current standard's requirements. The requirements could be weakened to say that after such an exception the only thing you could with the container is destruct it, but that would be inconsistent with the rest of the standard library. Daniel

Thorsten Ottosen

9 Dec 9 Dec

4:34 p.m.

Daniel James skrev:

...

Ion Gaztañaga wrote:

...
Thorsten Ottosen wrote:

...

...
...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

...

...
...
This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw?

The same could be said about Predicate for ordered containers like std::set/map. I haven't seen a practical case where the comparison could throw. But standard library experts should answer this better than me.

Well, the table at http://igaztanaga.drivehq.com/unordered/unordered/comparison.html says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

...

One example where they could throw is if the hash function used a numeric type which throws an exception when it overflows (which does suggest a buggy hash function but there you go).

Well, I think it is wrong to allow for incorrect hash-functions. We can't really protect against that anyway.

...

Another is that the data required to evaluate the hash function or check for equality is expensive and is lazily evaluated. For example, if you had a class which represents file, and based your equality test on calculating the SHA-1 signature then you might only calculate it when required - which could be a call to the hash or equality function.

yes, but would the lazy evaluation be done when inserting the object in the first place, thus not leaving it any lazy evaluation when calling erase? Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee? Does erase give the strong guarantee otherwise? Since erase(const Key&) for ordered containers cannot throw, so I guess it is implicitly required that Pred(x,y) must not throw? If so, I don't see why operator== is any different. -Thorsten

Daniel James

5:13 p.m.

New subject: [review] Review of Unordered library begins today December 7

On 09/12/2007, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...

Well, the table at

http://igaztanaga.drivehq.com/unordered/unordered/comparison.html

says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

comp.std.c++ would be the place to discuss logical inconsistencies in the standard.

...

...
Another is that the data required to evaluate the hash function or check for equality is expensive and is lazily evaluated. [....]

yes, but would the lazy evaluation be done when inserting the object in the first place, thus not leaving it any lazy evaluation when calling erase?

Maybe the data is cached, not memoized and is no longer present during the erase. This might happen if it's expensive to store. Anyway, I'm just trying to provide hypothetical examples of hash functions which throw - not justify the decision. Having said that, it would be bizarre to allow the hash function to throw in insert and not in erase. How would it know?

...

Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee?

Because that's what the standard says... Perhaps I'm being pedantic in my reading, the intent might have been that the hash function is never called during those overloads of erase. This implementation actually does provide the no throw guarantee. But Jeremy Maitin-Shepard's version (on which it is based) didn't because the iterators didn't store the bucket, and required the hash to be calculated to work out which bucket the elements to be deleted are in. Which I think the standard allows.

...

Does erase give the strong guarantee otherwise?

Exceptions are only allowed from the hash or predicate objects. So no throw if they are no throw.

...

Since erase(const Key&) for ordered containers cannot throw, so I guess it is implicitly required that Pred(x,y) must not throw? If so, I don't see why operator== is any different.

Because the different parts of the standard were written at different times by different people? I can only speculate. Daniel

Thorsten Ottosen

10 Dec 10 Dec

2:40 p.m.

Daniel James skrev:

...

On 09/12/2007, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
Well, the table at

http://igaztanaga.drivehq.com/unordered/unordered/comparison.html

says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

comp.std.c++ would be the place to discuss logical inconsistencies in the standard.

Right, but we are allowed to investigate potential problems if we see them :-)

...

...
Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee?

Because that's what the standard says... Perhaps I'm being pedantic in my reading, the intent might have been that the hash function is never called during those overloads of erase.

This implementation actually does provide the no throw guarantee. But Jeremy Maitin-Shepard's version (on which it is based) didn't because the iterators didn't store the bucket, and required the hash to be calculated to work out which bucket the elements to be deleted are in. Which I think the standard allows.

Ok, I think you right the standard allows that. But this is also a bit irritating, because I cannot portable take advantage of your stronger guarantee. feThis again begs the question of how much the stronger guarantee affects performance?

...

...
Does erase give the strong guarantee otherwise?

Exceptions are only allowed from the hash or predicate objects. So no throw if they are no throw.

yes, but my question was: *if an exception is thrown*, then what? I assume you give the basic guarantee? -Thorsten

Daniel James

4:19 p.m.

New subject: [review] Review of Unordered library begins today December 7

On 10/12/2007, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...

Daniel James skrev:

...
This implementation actually does provide the no throw guarantee. But Jeremy Maitin-Shepard's version (on which it is based) didn't because the iterators didn't store the bucket, and required the hash to be calculated to work out which bucket the elements to be deleted are in. Which I think the standard allows.

Ok, I think you right the standard allows that. But this is also a bit irritating, because I cannot portable take advantage of your stronger guarantee. feThis again begs the question of how much the stronger guarantee affects performance?

I don't think it affects performance at all, I think if it was required it would reduce the implementation choices. For example, if a container could rehash without invalidating iterators then I think it can currently do that when erase is called, but not if the stronger guarantee is required and hash can throw. Also, I made a mistake in the documentation, I said that a call to Hash or Pred can throw. But re-reading the standard, it looks like it doesn't have to be a call, it can be any member of the function objects.

...

yes, but my question was: *if an exception is thrown*, then what? I assume you give the basic guarantee?

Sorry, I misunderstood. This implementation gives the strong guarantee, the standard gives the basic guarantee. Daniel

David Abrahams

11 Dec 11 Dec

4:03 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Mon Dec 10 2007, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:

...

On 10/12/2007, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
Daniel James skrev:

...
This implementation actually does provide the no throw guarantee. But Jeremy Maitin-Shepard's version (on which it is based) didn't because the iterators didn't store the bucket, and required the hash to be calculated to work out which bucket the elements to be deleted are in. Which I think the standard allows.

Ok, I think you right the standard allows that. But this is also a bit irritating, because I cannot portable take advantage of your stronger guarantee. feThis again begs the question of how much the stronger guarantee affects performance?

I don't think it affects performance at all, I think if it was required it would reduce the implementation choices. For example, if a container could rehash without invalidating iterators then I think it can currently do that when erase is called, but not if the stronger guarantee is required and hash can throw.

Also, I made a mistake in the documentation, I said that a call to Hash or Pred can throw. But re-reading the standard, it looks like it doesn't have to be a call, it can be any member of the function objects.

...
yes, but my question was: *if an exception is thrown*, then what? I assume you give the basic guarantee?

Sorry, I misunderstood. This implementation gives the strong guarantee, the standard gives the basic guarantee.

I hope this implementation isn't sacrificing performance to get that strong guarantee. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Daniel James

6:49 p.m.

David Abrahams wrote:

...

I hope this implementation isn't sacrificing performance to get that strong guarantee.

Not at all. The strong guarantee is a natural consequence of the data structures that I'm using. Daniel

David Abrahams

12 Dec 12 Dec

12:45 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Tue Dec 11 2007, Daniel James <daniel_james-AT-fmail.co.uk> wrote:

...

David Abrahams wrote:

...
I hope this implementation isn't sacrificing performance to get that strong guarantee.

Not at all. The strong guarantee is a natural consequence of the data structures that I'm using.

:) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Ion Gaztañaga

9 Dec 9 Dec

9:58 p.m.

New subject: [review] Review of Unordered library begins today December 7

Thorsten Ottosen wrote:

...

Well, the table at

http://igaztanaga.drivehq.com/unordered/unordered/comparison.html

says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

You are right. The draft says in 23.1 (Container requirements): 10 Unless otherwise specified (see 23.2.2.3 and 23.2.5.4) all container types defined in this clause meet the following additional requirements: [...] — no erase(), pop_back() or pop_front() function throws an exception. so the predicate for associative container should not throw, because erase(const key_type &k) can't throw. On the other hand, this seems strange. I understand making erase(iterator) no-throw, but I find strange making erase(const key_type &)don't think the standard wanted to forbid exceptions from the predicate. Clearly unordered containers are different: 23.1.3.1 Exception safety guarantees [unord.req.except] 1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the container’s Hash or Pred object (if any). Why this difference? I hope someone knows the answer ;-) Regards, Ion

David Abrahams

11 Dec 11 Dec

4:05 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Sun Dec 09 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...

Thorsten Ottosen wrote:

...
Well, the table at

http://igaztanaga.drivehq.com/unordered/unordered/comparison.html

says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

You are right. The draft says in 23.1 (Container requirements):

10 Unless otherwise specified (see 23.2.2.3 and 23.2.5.4) all ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ container types defined in this clause meet the following additional requirements:

[...]

— no erase(), pop_back() or pop_front() function throws an exception.

So it is "otherwise specified" for vector and deque.

...

so the predicate for associative container should not throw, because erase(const key_type &k) can't throw.

They're unrelated.

...

On the other hand, this seems strange. I understand making erase(iterator) no-throw, but I find strange making erase(const key_type &)don't think the standard wanted to forbid exceptions from the predicate. Clearly unordered containers are different:

23.1.3.1 Exception safety guarantees [unord.req.except]

1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the container’s Hash or Pred object (if any).

Why this difference? I hope someone knows the answer ;-)

Because unordered associative containers may rehash when erasing. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Joaquín Mª López Muñoz

5:17 p.m.

New subject: [review] Review of Unordered library begins today December7

David Abrahams ha escrito:

...

on Sun Dec 09 2007, Ion GaztaÃ±aga <igaztanaga-AT-gmail.com> wrote:

...
23.1.3.1 Exception safety guarantees [unord.req.except]

1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the containerâs Hash or Pred object (if any).

Why this difference? I hope someone knows the answer ;-)

Because unordered associative containers may rehash when erasing.

I think unordered containers are *not* allowed to rehash when erasing: see N2461, 23.1.3/8 "...Rehashing invalidates iterators, changes ordering between elements,..." and 23.1.3/12 "...The erase members shall invalidate only iterators and references to the erased elements." so the conclusion is that erase can't rehash (incidentally, the implications of rehashing when erasing a range would be difficult to cope with). The rationale IMHO behind allowing erase to let Hash or Pred throw is to allow implementers to write iterators without bucket tracking --the bucket is computed on the fly when needed, for instance when erasing. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Daniel James

7 p.m.

New subject: [review] Review of Unordered library begins today December7

Joaquín Mª López Muñoz wrote:

...

I think unordered containers are *not* allowed to rehash when erasing: see N2461, 23.1.3/8

"...Rehashing invalidates iterators, changes ordering between elements,..."

and 23.1.3/12

"...The erase members shall invalidate only iterators and references to the erased elements."

so the conclusion is that erase can't rehash (incidentally, the implications of rehashing when erasing a range would be difficult to cope with).

If a container implemented a rehash which doesn't invalidate iterators (which is certainly possible, if unlikely) then could it rehash? (Of course, such a container would have to maintain the order of the elements). Daniel

"JOAQUIN LOPEZ MU?Z"

7:25 p.m.

New subject: [review] Review of Unordered library begins today December7

----- Mensaje original ----- De: Daniel James <daniel_james@fmail.co.uk> Fecha: Martes, Diciembre 11, 2007 8:01 pm Asunto: Re: [boost] [review] Review of Unordered library begins today December7 Para: boost@lists.boost.org

...

Joaquín Mª López Muñoz wrote:

...
I think unordered containers are *not* allowed to rehash when erasing:> see N2461, 23.1.3/8

"...Rehashing invalidates iterators, changes ordering between elements,..."> and 23.1.3/12

"...The erase members shall invalidate only iterators and references> to the erased elements."

so the conclusion is that erase can't rehash (incidentally, the implications of rehashing when erasing a range would be difficult to cope with).

If a container implemented a rehash which doesn't invalidate iterators (which is certainly possible, if unlikely) then could it rehash? (Of course, such a container would have to maintain the order of the elements).

I don't think so: even if you manage to implement an iterator-conscious, order-preserving, sensible rehashing mechanism, getting new memory can throw std::bad_alloc, which is not allowed (only throwing from Hash or Pred permitted)--unless you catch such exceptions internally... but this is rather artificial. My opinion is that the writers of this piece of the C++0x standard didn't consider rehashing as a permissible operation when erasing. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Daniel James

12 Dec 12 Dec

10:23 p.m.

New subject: [review] Review of Unordered library begins today December7

On 11/12/2007, "JOAQUIN LOPEZ MU?Z" <joaquin@tid.es> wrote:

...

I don't think so: even if you manage to implement an iterator-conscious, order-preserving, sensible rehashing mechanism, getting new memory can throw std::bad_alloc, which is not allowed (only throwing from Hash or Pred permitted)--unless you catch such exceptions internally... but this is rather artificial. My opinion is that the writers of this piece of the C++0x standard didn't consider rehashing as a permissible operation when erasing.

If they don't they should say so explicitly as it's easy to implement a hash table that meets the requirements. For example the buckets could be an array of pointers into a list of nodes that are ordered by the hash function. Rehash then has strong exception safety and doesn't disrupt the order of the elements. Memory use will be the same, performance will be fine, perhaps a little slower to ensure the hash quality is good enough, and there will be notable gains since elements are never lost due to rehashing, iterators are never invalided by inserts, memory use can be reduced when a large number of elements are erased and iteration will be fast. And if that doesn't work, who knows what might be devised in the next ten years? Daniel

David Abrahams

6:06 p.m.

New subject: [review] Review of Unordered library begins today December7

on Tue Dec 11 2007, Joaquín Mª López Muñoz <joaquin-AT-tid.es> wrote:

...

David Abrahams ha escrito:

...
on Sun Dec 09 2007, Ion GaztaÃ±aga <igaztanaga-AT-gmail.com> wrote:

...
23.1.3.1 Exception safety guarantees [unord.req.except]

1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the containerâ€™s Hash or Pred object (if any).

Why this difference? I hope someone knows the answer ;-)

Because unordered associative containers may rehash when erasing.

I think unordered containers are *not* allowed to rehash when erasing: see N2461, 23.1.3/8

"...Rehashing invalidates iterators, changes ordering between elements,..."

and 23.1.3/12

"...The erase members shall invalidate only iterators and references to the erased elements."

so the conclusion is that erase can't rehash

Sorry, how do you reach that conclusion? It says that erase can invalidate iterators, and the only thing it says that rehashing invalidates is iterators. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Eric Niebler

6:14 p.m.

New subject: [review] Review of Unordered library begins today December7

David Abrahams wrote:

...

on Tue Dec 11 2007, Joaquín Mª López Muñoz <joaquin-AT-tid.es> wrote:

...
David Abrahams ha escrito:

...
on Sun Dec 09 2007, Ion GaztaÃ±aga <igaztanaga-AT-gmail.com> wrote:

...
23.1.3.1 Exception safety guarantees [unord.req.except]

1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the containerâ€™s Hash or Pred object (if any).

Why this difference? I hope someone knows the answer ;-) Because unordered associative containers may rehash when erasing. I think unordered containers are *not* allowed to rehash when erasing: see N2461, 23.1.3/8

"...Rehashing invalidates iterators, changes ordering between elements,..."

and 23.1.3/12

"...The erase members shall invalidate only iterators and references to the erased elements."

so the conclusion is that erase can't rehash

Sorry, how do you reach that conclusion? It says that erase can invalidate iterators, and the only thing it says that rehashing invalidates is iterators.

If I'm understanding correctly, erase can only invalidate iterators (to the erased elements*, but rehashing invalidates iterators to all elements. Therefor, erase cannot rehash. Did I get that right? -- Eric Niebler Boost Consulting www.boost-consulting.com

Joaquín Mª López Muñoz

6:24 p.m.

New subject: [review] Review of Unordered library begins today December7

Eric Niebler ha escrito:

...

David Abrahams wrote:

...
on Tue Dec 11 2007, JoaquÃn MÂª LÃ³pez MuÃ±oz <joaquin-AT-tid.es> wrote:

...
David Abrahams ha escrito:

...
on Sun Dec 09 2007, Ion GaztaÃÂ±aga <igaztanaga-AT-gmail.com> wrote:

...
23.1.3.1 Exception safety guarantees [unord.req.except]

1 For unordered associative containers, no clear() function throws an exception. No erase() function throws an exception unless that exception is thrown by the containerÃ¢â¬â¢s Hash or Pred object (if any).

Why this difference? I hope someone knows the answer ;-) Because unordered associative containers may rehash when erasing. I think unordered containers are *not* allowed to rehash when erasing: see N2461, 23.1.3/8

"...Rehashing invalidates iterators, changes ordering between elements,..."

and 23.1.3/12

"...The erase members shall invalidate only iterators and references to the erased elements."

so the conclusion is that erase can't rehash

Sorry, how do you reach that conclusion? It says that erase can invalidate iterators, and the only thing it says that rehashing invalidates is iterators.

If I'm understanding correctly, erase can only invalidate iterators (to the erased elements*, but rehashing invalidates iterators to all elements. Therefor, erase cannot rehash. Did I get that right?

This is exactly how I'm interpreting the standard (draft), yep. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Ion Gaztañaga

11 Dec 11 Dec

8:26 p.m.

David Abrahams wrote:

...

...
so the predicate for associative container should not throw, because erase(const key_type &k) can't throw.

They're unrelated.

So the exception specification for that erase version is missing. I thought that erase(const key_type &) was allowed to throw, but "no erase(), pop_back() or pop_front() function throws an exception" seems quite generic. If "erase()" means "erase(iterator)" I think it would be nice to clarify it and explicitly state if the Comparison function is allowed to throw. Just my 2 cents, Ion

David Abrahams

12 Dec 12 Dec

1:25 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Tue Dec 11 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...

David Abrahams wrote:

...
...
so the predicate for associative container should not throw, because erase(const key_type &k) can't throw.

They're unrelated.

So the exception specification for that erase version is missing. I thought that erase(const key_type &) was allowed to throw, but "no erase(), pop_back() or pop_front() function throws an exception" seems quite generic. If "erase()" means "erase(iterator)" I think it would be nice to clarify it and explicitly state if the Comparison function is allowed to throw.

Just my 2 cents,

I think you're right. This looks like a defect to me. Howard: my suggested resolution is to say, for erase(key_type const&), no exception is thrown other than by the comparison function. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Howard Hinnant

5:08 p.m.

New subject: [review] Review of Unordered library begins today December 7

On Dec 12, 2007, at 8:25 AM, David Abrahams wrote:

...

on Tue Dec 11 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...
David Abrahams wrote:

...
...
so the predicate for associative container should not throw, because erase(const key_type &k) can't throw.

They're unrelated.

So the exception specification for that erase version is missing. I thought that erase(const key_type &) was allowed to throw, but "no erase(), pop_back() or pop_front() function throws an exception" seems quite generic. If "erase()" means "erase(iterator)" I think it would be nice to clarify it and explicitly state if the Comparison function is allowed to throw.

Just my 2 cents,

I think you're right. This looks like a defect to me. Howard: my suggested resolution is to say, for erase(key_type const&), no exception is thrown other than by the comparison function.

I can open an issue, but I'm not understanding why the "Unless otherwise specified" in [container.requirements]/10 doesn't give us what we want. [unord.req.except]/11 "otherwise specifies":

...

No erase() function throws an exception unless that exception is thrown by the container’s Hash or Pred object (if any).

This really seems NAD to me. -Howard

David Abrahams

11 Dec 11 Dec

4:02 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Sun Dec 09 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...

Daniel James skrev:

...
Ion Gaztañaga wrote:

...
Thorsten Ottosen wrote:

...
...
...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

...
...
...
This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw?

The same could be said about Predicate for ordered containers like std::set/map. I haven't seen a practical case where the comparison could throw. But standard library experts should answer this better than me.

Well, the table at

http://igaztanaga.drivehq.com/unordered/unordered/comparison.html

says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

No, the ordered containers don't do any predicate evaluation in erase().

...

...
Another is that the data required to evaluate the hash function or check for equality is expensive and is lazily evaluated. For example, if you had a class which represents file, and based your equality test on calculating the SHA-1 signature then you might only calculate it when required - which could be a call to the hash or equality function.

yes, but would the lazy evaluation be done when inserting the object in the first place, thus not leaving it any lazy evaluation when calling erase?

It might be cached data that is later thrown out, or stored on disk.

...

Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee?

Why should it? Vector/deque erase don't.

...

Does erase give the strong guarantee otherwise?

Since erase(const Key&) for ordered containers cannot throw, so I guess it is implicitly required that Pred(x,y) must not throw?

Nope, see above. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Thorsten Ottosen

4:48 p.m.

David Abrahams skrev:

...

on Sun Dec 09 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...

...
says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

No, the ordered containers don't do any predicate evaluation in erase().

How can you implement std::set<T>::erase( const T& ) then, if you are not going to find the equivalent elements by evaluating Pred(x,y)?

...

...
Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee?

Why should it? Vector/deque erase don't.

Because it is a node based container.

...

...
Does erase give the strong guarantee otherwise?

Since erase(const Key&) for ordered containers cannot throw, so I guess it is implicitly required that Pred(x,y) must not throw?

Nope, see above.

Still don't get it. -Thorsten

David Abrahams

12 Dec 12 Dec

12:57 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Tue Dec 11 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...

David Abrahams skrev:

...
on Sun Dec 09 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...
...
says that erase() does not throw for ordered containers (which is correct). Anyway, this means any argument used to assume that predicate evaluation is a no-throw should be the same for unordered containers.

No, the ordered containers don't do any predicate evaluation in erase().

How can you implement

std::set<T>::erase( const T& )

then, if you are not going to find the equivalent elements by evaluating Pred(x,y)?

Sorry, I forgot about that one; I was just thinking of erase(iterator1, iterator2) and erase(iterator)

...

...
...
Why does erase( const_iterator ) (and range version), not provide the no-throw guarantee?

Why should it? Vector/deque erase don't.

Because it is a node based container.

Yeah, but it might rehash.

...

...
...
Does erase give the strong guarantee otherwise?

Since erase(const Key&) for ordered containers cannot throw, so I guess it is implicitly required that Pred(x,y) must not throw?

Nope, see above.

Still don't get it.

Hmm, I need to go back and check the draft standard. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams

11 Dec 11 Dec

4 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Fri Dec 07 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...

Thorsten Ottosen wrote:

...
I vote to accept this library. The documentation is very nice.

Thanks for the vote.

...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

Under what cicumstances can/should/would I construct a ahsh or predicate that throws? I mean, the hash computes an integer, and comparisons reads data to determine its answer.

This migth be what the standard requires, but it certainly seems non-intuitive. Why not just require that neither hash nor predicates can throw?

The same could be said about Predicate for ordered containers like std::set/map. I haven't seen a practical case where the comparison could throw. But standard library experts should answer this better than me.

Certainly, Daniel has tried to achieve strong exception guarantees and the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG.

When specifying requirements in the standard, we (the LWG) don't like to constrain implementations or users if possible. I don't see any reason to forbid a throwing comparison or hash. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Thorsten Ottosen

4:45 p.m.

David Abrahams skrev:

...

on Fri Dec 07 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...
Thorsten Ottosen wrote:

...

...
...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

...

...
Certainly, Daniel has tried to achieve strong exception guarantees and the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG.

When specifying requirements in the standard, we (the LWG) don't like to constrain implementations or users if possible. I don't see any reason to forbid a throwing comparison or hash.

Right, but this is often a double-edged sword. Think of problems with allowing std::list<T>::size() to be O(n). What is flexibility for the implementer is often a synonym for non-portability for the user. As a library implementer I have to be conservative w.r.t. the exceotion-safety guarantees given which means the additional guarantees given by some implementations cannot be taken advantage of. Of course, I might take advantage of them in non-generic code, but then at the expense of being non-portable. -Thorsten

David Abrahams

12 Dec 12 Dec

12:44 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Tue Dec 11 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...

David Abrahams skrev:

...
on Fri Dec 07 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...
Thorsten Ottosen wrote:

...
...
...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase"

...
...
Certainly, Daniel has tried to achieve strong exception guarantees and the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG.

When specifying requirements in the standard, we (the LWG) don't like to constrain implementations or users if possible. I don't see any reason to forbid a throwing comparison or hash.

Right, but this is often a double-edged sword. Think of problems with allowing std::list<T>::size() to be O(n).

Or allowing ptr_container<T> to not be assignable and copy-constructible?

...

What is flexibility for the implementer is often a synonym for non-portability for the user. As a library implementer I have to be conservative w.r.t. the exceotion-safety guarantees given which means the additional guarantees given by some implementations cannot be taken advantage of. Of course, I might take advantage of them in non-generic code, but then at the expense of being non-portable.

I'm aware of all the tradeoffs. It's a balancing act, as ever. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Thorsten Ottosen

2:47 p.m.

David Abrahams skrev:

...

on Tue Dec 11 2007, Thorsten Ottosen <thorsten.ottosen-AT-dezide.com> wrote:

...
David Abrahams skrev:

...
on Fri Dec 07 2007, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:

...
...
One thing puzzled me though: (see http://igaztanaga.drivehq.com/unordered/unordered/comparison.html)

"The containers hash or predicate function can throw exceptions from erase" Certainly, Daniel has tried to achieve strong exception guarantees and

Thorsten Ottosen wrote: the implementation becomes quite complicated if comparison/hash throws. Double buffering and other tricks are needed. I think this is a very good question both for boosters and people from the LWG. When specifying requirements in the standard, we (the LWG) don't like to constrain implementations or users if possible. I don't see any reason to forbid a throwing comparison or hash. Right, but this is often a double-edged sword. Think of problems with allowing std::list<T>::size() to be O(n).

Or allowing ptr_container<T> to not be assignable and copy-constructible?

yes. FWIW, they are in 1.35. -Thorsten

Mathias Gaunard

9 Dec 9 Dec

11:37 p.m.

New subject: [review] Review of Unordered library begins today December 7

Ion Gaztañaga wrote:

...

The formal review of the Unordered library, proposed by Daniel James, begins today :

Just a few simple remarks. Why not add support for moving and in-place construction?

Daniel James

10 Dec 10 Dec

12:36 a.m.

New subject: [review] Review of Unordered library begins today December 7

On 09/12/2007, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:

...

Just a few simple remarks. Why not add support for moving and in-place construction?

Short answer: If accepted, it's pretty likely that they will be supported, but maybe not in the first release. Long answer: The main reason for not supporting move is that boost doesn't have a move library. Recently Adobe's and Boost.Thread's move libraries have been discussed on this list so I'll look into them. Boost.Thread's is probably more appealing since it's already in Boost, but I'd need to review it first. There's no support in the standard yet, but I don't think that's a problem. In place construction has been added to the standard, but only recently and I haven't had a chance to catch up. There are some problems since existing allocators aren't going to support it and std::pair will be missing the extensions to support constructing its members in place. It's possible to use a replacement class, or perform some sort of hack (i.e. constructing the members yourself instead of using std::pair's constructor). Allocators are more of a problem, but perhaps some compromise could be made. Some people (including Ion) think that the allocator shouldn't be used for construction and destruction. This was proposed, and then opposed: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2257.html http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2339.htm If I don't use construct and destroy there wouldn't be a problem here (and the implementation would perhaps be a little simpler). If I do, then maybe I could provide a default allocator that supports it and use some sort of marker in the allocator to indicate if it supports in place allocation. (IIRC there was proposal to add version information to allocators, although I haven't heard about that for a while, and it's too late for me to look it up right now). Daniel

David Abrahams

11 Dec 11 Dec

4:06 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Sun Dec 09 2007, "Daniel James" <daniel_james-AT-fmail.co.uk> wrote:

...

The main reason for not supporting move is that boost doesn't have a move library. Recently Adobe's and Boost.Thread's move libraries have been discussed on this list so I'll look into them. Boost.Thread's is probably more appealing since it's already in Boost, but I'd need to review it first.

I think the Boost.Thread approach only works for movable-but-noncopyable objects. For "regular types" (see stepanov's definition) you need Adobe's version. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Daniel James

6:58 p.m.

David Abrahams wrote:

...

I think the Boost.Thread approach only works for movable-but-noncopyable objects. For "regular types" (see stepanov's definition) you need Adobe's version.

Would we have to ask them to change the licence to do that?

David Abrahams

12 Dec 12 Dec

12:45 p.m.

New subject: [review] Review of Unordered library begins today December 7

on Tue Dec 11 2007, Daniel James <daniel_james-AT-fmail.co.uk> wrote:

...

David Abrahams wrote:

...
I think the Boost.Thread approach only works for movable-but-noncopyable objects. For "regular types" (see stepanov's definition) you need Adobe's version.

Would we have to ask them to change the licence to do that?

We have permission from Sean Parent at Adobe to "Boostify" any ASL code we want. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Hervé Brönnimann

10 Dec 10 Dec

2:42 a.m.

New subject: [review] Review of Unordered library begins today December 7

First of all, this is a fine implementation. I'm just curious why, since most compilers now provide a tr1 implementation, if it is just historical or if this implementation claims to be better than some other existing TR1 implementations. But with the purpose of fitting Boost.Tr1, this will fulfill the purpose. And great care was taken to be standard compliant. I vote for accept. I tried to spend a few hours looking at the code and reading the documentation. This is what I find below. I haven't had the time to even run an example or test the code, though. I limited myself only to examination. Overall, the focus and hard-work (esp. with exception safety and its testing) is impressive. It seems you've taken care of a lot of standardese aspects. Kudos. Definition of operator==: --------------------------------- One small issue I see, is that you should define operator==. I searched a bit, and found Howard's statement (on http:// www.velocityreviews.com/forums/t291501-stdset-versus- tr1unorderedset.html):

...

I unsuccessfully tried to get an operator== into tr1::unordered_*. Assuming a good hash function such an operation can have the intended semantics and O(N) complexity."

I find it unfortunate that it's not defined in the standard. Howard gives the same implementation I would have given, which works in O(N) time. I found also the following bogus note in the Working Draft:

...

10 Unordered associative containers are not required to support the expressions a == b or a != b. [ Note: This is because the container requirements define operator equality in terms of equality of ranges. Since the elements of an unordered associative container appear in an arbitrary order, range equality is not a useful operation. — end note]

As if they couldn't specify in the Table "Additional requirements" a different post-condition, or alternately, state the post-condition in terms of range only in the Tables for sequence and ordered associative containers, and put in a different post-condition for unordered containers. Howard also states:

...

The final decision was to let implementors provide it as an extension, but not to require it.

Daniel, seems to me that you'd have the liberty to implement it, and if you chose to not name it operator==, that'd be fine too. It would be a boost extension. But it really irks me that an unordered container is considered not to have a value (can't have a value if you can't compare them) when there's no good reason not to provide operator==. Comments on the code: -------------------------------- I really appreciate the care you've taken wrt exception safety. The prime number list is too short, it roughly doubles every time. On the other hand, I understand that it wouldn't do that have a *much* larger list, as this static array is included in every translation unit. One possibility for a much finer one would be to make unordered an object library, included in the build. I understand if you want to keep it "include-only", but could you at least enlarge the list to have an increase of 30% instead of 100%? In the next_prime and prev_prime functions, please do not use the hard constant 28. Use instead static const int num_primes = sizeof prime_list / sizeof *prime_list; What exactly is the use of pair_cast? There must be a legitimate reason that I don't quite understand. The pair template conversion constructor should suffice, no? Re: swap: I really think you should be following the committee's current advice (I'm looking at the code of hash+table_impl.hpp, lines 1194 -- 1219), sooner rather than later. But my opinion is biased. At least you state precisely what you're doing. This is not a criticism of your code. But such change might break your client's code, if the standard later codifies its current decision. It seems that changing the semantics of swap before unordered is accepted and released with Boost, would save some grief later. There is some convention in boost that members begin with m_... or s_... but I don't think it's very important. Your system (the trailing underscore) is fine. Yet, these things show in a debugger. Minor comments on the documentation: ------------------------------------------------------ In buckets.html: * The sentence "The + 1 is required because the container is allowed to resize when the load factor is equal to the maximum load factor" is unclear to me, although I think I figured out the intended meaning. Resize is most certainly wrong here (size is not changed), you meant rehash? I'd rephrase "The + 1 guarantees there is no invalidation; without it, reallocation could occur if the number of bucket exactly divides the target size, since the container is allowed to resize when the load factor is equal to the maximum load factor." In hash_equality.html: * This is not quite true: "An example implementation of FNV-1, and some other hash functions are supplied in the examples directory." Only FNV is provided... After all, these unordered containers are only useful if you have a good hash function to go with it. You should definitely mention the Boost.Hash library, which provides hashes for these unordered containers. Nevertheless boost.hash does not give you tradeoff between speed and good hashing, which is why other more specialized hashes could be provided. Besides FNV-1, I found Robert Jenkins' (http://www.burtleburtle.net/bob/hash/ evahash.html) quite good. * I am always reminded of Matt Austern's column (http://lafstern.org/ matt/col2_new.pdf) for the example you give. I went back to get most o the points, and then read your implementation. Unfortunately, your functor implementation falls prey to the default pointed out in Matt's article: std::toupper depends on the global locale, so that if I call setlocale between two calls to your functor, I might get different results. Be sure to check out Matt's column. * the declaration boost::unordered_multiset<point, std::equal_to<point>, point_hash> points; looks wrong (order of template args). * I wished you put links to boost.hash. In comparison.html: * The spelling parametrise is quite unusual (although correct, very British!) but in any case incompatible with the C++ Working Draft, which uses parameterize. I think you should adopt the most common spelling. Same thing with amortized. * "ordering comparison: No equivalent. No idea why." A bit of humor doesn't hurt :) * "Can be compared using the ==, !=, <, <=, >, >= operators: No comparison operators are defined." That's a sticky point I mentioned above. * Complexity guarantees: You should state (recall?) what n, N, etc. stand for. Esp since you use size() also. * "Inserting a range of N elements: Average case O(N), worst case O(N * 'size()')" single quotes and font around size() * "Find: logarithmic Average case: O(N), Worst case: O(size())" I think you meant "Average case: O(1)" In rationale.hpp: * "For example, an it..." : delete "an" * "loosing the efficiency": losing efficiency. Spelling in other places too * "Swap: I followed Howard Hinnant's advice and implemented option 3.": Huh? Were are these options 1, 2, and 3 described? You should give a link to http://www.open-std.org/JTC1/SC22/WG21/docs/papers/ 2004/n1599.html. * reference pages: the constructor that takes arguments (size_type = impl-defined, ..) should be declared explicit, for all four containers. The code is already doing the right thing. -- Hervé Brönnimann hervebronnimann@mac.com

Daniel James

8:54 p.m.

New subject: [review] Review of Unordered library begins today December 7

Thanks for the review, there's a lot to act on. On 10/12/2007, Hervé Brönnimann <hervebronnimann@mac.com> wrote:

...

Definition of operator==: ---------------------------------

[snip]

...

I find it unfortunate that it's not defined in the standard. Howard gives the same implementation I would have given, which works in O(N) time.

And only works for unordered_set. How would you define equality for unordered_multimap? Do you compare the mapped values, does the order of the values matter? If it doesn't then the comparison becomes more expensive, if it does then it's a bit odd for an 'unordered' container (although this implementation does preserve insertion order, if it didn't then things would be worse). Is it okay to define equality in terms of the equality predicate for keys, but operator== for values?

...

I found also the following bogus note in the Working Draft:

...
10 Unordered associative containers are not required to support the expressions a == b or a != b. [ Note: This is because the container requirements define operator equality in terms of equality of ranges. Since the elements of an unordered associative container appear in an arbitrary order, range equality is not a useful operation. — end note]

That's not bogus at all. An equality operator for unordered containers would have completely different semantics to all other containers (especially when using custom comparisons, comparing associative containers can have surprising results).

...

Daniel, seems to me that you'd have the liberty to implement it, and if you chose to not name it operator==, that'd be fine too. It would be a boost extension. But it really irks me that an unordered container is considered not to have a value (can't have a value if you can't compare them) when there's no good reason not to provide operator==.

It isn't true that you can't compare them. In fact, you can implement your own comparison that would be pretty much as efficient as any that I provided. Right now, I'm trying to limit myself to obvious design decisions for extensions so I don't really want to go into this area.

...

The prime number list is too short, it roughly doubles every time. On the other hand, I understand that it wouldn't do that have a *much* larger list, as this static array is included in every translation unit.

I think it can actually cause an ODR violation so I probably should change that anyway. David Abrahams recently posted a way to avoid this by including the value in a template, so I'll try that.

...

One possibility for a much finer one would be to make unordered an object library, included in the build. I understand if you want to keep it "include-only", but could you at least enlarge the list to have an increase of 30% instead of 100%?

How much of a difference would it make? The memory requirements for the array of buckets is typically smaller than the memory required for the nodes (unless you have a small number of nodes or increase the maximum load factor). The primes that I'm using apparently give better results because they are as far away as possible from powers of two (I'm pretty sceptical about that though). But I don't really have a strong opinion one way or the other, so if no one disagrees, I'll add more primes.

...

In the next_prime and prev_prime functions, please do not use the hard constant 28. Use instead static const int num_primes = sizeof prime_list / sizeof *prime_list;

Okay.

...

What exactly is the use of pair_cast? There must be a legitimate reason that I don't quite understand. The pair template conversion constructor should suffice, no?

It works around some of the less standard compliant STL implementations.

...

Re: swap: I really think you should be following the committee's current advice (I'm looking at the code of hash+table_impl.hpp, lines 1194 -- 1219), sooner rather than later. But my opinion is biased.

Why is your opinion biased?

...

At least you state precisely what you're doing. This is not a criticism of your code. But such change might break your client's code, if the standard later codifies its current decision. It seems that changing the semantics of swap before unordered is accepted and released with Boost, would save some grief later.

That works both ways, as the committee could change its mind. Since we don't have concepts, following their current decision means slow swaps. Which isn't that bad since it's only for the rare occasions when the allocators aren't equal. I haven't got a strong preference here either, but this was discussed before and the general consensus seemed to be for what I've done.

...

There is some convention in boost that members begin with m_... or s_... but I don't think it's very important. Your system (the trailing underscore) is fine. Yet, these things show in a debugger.

If there's a convention, it's not followed widely. There are a lot of boost libraries that use a trailing underscore for members.

...

Minor comments on the documentation: ------------------------------------------------------

I'll act on all of these, a few weak excuses below.

...

In hash_equality.html:

* This is not quite true: "An example implementation of FNV-1, and some other hash functions are supplied in the examples directory." Only FNV is provided... After all, these unordered containers are only useful if you have a good hash function to go with it. You should definitely mention the Boost.Hash library, which provides hashes for these unordered containers. Nevertheless boost.hash does not give you tradeoff between speed and good hashing, which is why other more specialized hashes could be provided. Besides FNV-1, I found Robert Jenkins' (http://www.burtleburtle.net/bob/hash/ evahash.html) quite good.

I wrote that because I had actually implemented some other hash functions but then I didn't include them because I was worried about copyright. But Bob Jenkins's hash functions are public domain, so I'm not sure why I was worried about them. I'll add something based on his functions. And I'll check on some of the other ones I was considering.

...

* I am always reminded of Matt Austern's column (http://lafstern.org/ matt/col2_new.pdf) for the example you give. I went back to get most o the points, and then read your implementation. Unfortunately, your functor implementation falls prey to the default pointed out in Matt's article: std::toupper depends on the global locale, so that if I call setlocale between two calls to your functor, I might get different results. Be sure to check out Matt's column.

Good point. The example is actually a simplified version of 'libs/unordered/examples/case_insensitive.hpp' which doesn't have this problem (well, it wouldn't if it didn't have a bug which is fixed in subverison). I probably simplified it too far.

...

* "Swap: I followed Howard Hinnant's advice and implemented option 3.": Huh? Were are these options 1, 2, and 3 described? You should give a link to http://www.open-std.org/JTC1/SC22/WG21/docs/papers/ 2004/n1599.html.

There is a link but it's cunningly hidden in the heading. I never noticed that the boost documentation stylesheet obscures it. I'll add visible links. Daniel

Hervé Brönnimann

11 Dec 11 Dec

2:26 a.m.

New subject: [review] Review of Unordered library begins today December 7

Daniel: First, thanks for your reply. And sorry if I came across as overly critical. I think your library is excellent. The point of the review process is to try to make it even better :) In fact, if we could make the C++ standard better in the process, I'd be delighted. As I've said in my review, but to be clear I'll say once again, none of these items under discussion below are contingent to my (enthusiastic) vote in favor of your library. They are merely for your consideration. On Dec 10, 2007, at 3:54 PM, Daniel James wrote:

...

...
I find it unfortunate that it's not defined in the standard. Howard gives the same implementation I would have given, which works in O(N) time.

And only works for unordered_set. How would you define equality for unordered_multimap? Do you compare the mapped values, does the order of the values matter? If it doesn't then the comparison becomes more expensive, if it does then it's a bit odd for an 'unordered' container (although this implementation does preserve insertion order, if it didn't then things would be worse). Is it okay to define equality in terms of the equality predicate for keys, but operator== for values?

It works equally well with a little tweak if you require equality of count(key) for the multiset. I didn't think about map/multimap, thanks for pointing this out. But Daniel, what exactly does operator== for std::multimap do??? In fact, because insertion in std::multimap inserts at the end of the range, the value of std::multimap depends on the insertion order. You wouldn't be doing anything different with unordered_multimap. And as you mention, your implementation does already preserve insertion order, so you wouldn't have any problem to supply these extensions, and they'd be well- defined. Isn't it better if you can provide stronger guarantees than the current draft standard, just as you already do for exceptions? Now the standard should really define what happens when inserting equivalent keys in unordered_multimap, just as it does for multimap, but we're not fixing the standard. I'll raise that point on another mailing list if I get around it. As I understand, it depends on whether you use singly- or doubly-linked lists for chaining, and I buy your argument as to why use doubly-linked lists for multiset. I think you made good points there. As for the equality comparison vs predicate, the same again is true for operator== for the standard associative containers, defined as a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin() [sic] (no, *I* didn't forget the final paren, it's not on page 671 either). Nowhere in that makes use of the "equivalence" implied by operator<, which is used for uniqueness in set/map and equivalence in multiset/multimap. So by defining equality in terms of the equality predicate for keys in order to check for the uniqueness property, but operator== for both keys and values in the implementation of operator== for unordered_map/multimap, you wouldn't exactly break new ground. Or am I missing some subtle implications or other clause in the standard?

...

...
I found also the following bogus note in the Working Draft:

...
10 Unordered associative containers are not required to support the expressions a == b or a != b. [ Note: This is because the container requirements define operator equality in terms of equality of ranges. Since the elements of an unordered associative container appear in an arbitrary order, range equality is not a useful operation. — end note]

That's not bogus at all. An equality operator for unordered containers would have completely different semantics to all other containers (especially when using custom comparisons, comparing associative containers can have surprising results).

Why not? As I argue above, the same is already true of the ordered associative containers. I don't think you'd be providing anything more surprising than already present in the standard. As for surprising semantics, a set and multiset have well-defined notions of value (in a mathematical sense), i.e., a subset from a ground set of key, which is exactly what operator== compares. For map and multimap, the value is a function from a ground set of keys to a value for map, sequence of values for multimap (sequence means that order matters). The operational definition in terms of range is simply an algorithm for comparing those values. Exactly the same is true for the values of unordered containers; operator== as Howard and I've defined them precisely perform this comparison of value. I find nothing surprising in that. The fact that operator== is defined in terms of ranges for sequences and ordered associative containers, and is defined/implemented differently for unordered containers, is not surprising, it's necessary. Again, I believe the standard could be improved on this, and wouldn't your library be the natural vector to provide a reference implementation for a proposal to define operator== for unordered containers?

...

"Right now, I'm trying to limit myself to obvious design decisions for extensions so I don't really want to go into this area.

Fair enough. You gave good points. I hope my counter-arguments are equally relevant, but you're of course free to not implement them. Still, as boost explores possibilities for later standardization, it could be hoped that you would provide an implementation that provides prior art for future discussion in the committee. Whether you or someone else (perhaps me, I really would like to) would make the proposal, is for the future. Still, I think there's only one sensible definition for unordered_set, and unordered_multiset, and the multiset version isn't trivial either. Would be nice if at least those werre in the library, perhaps with a named function instead of an operator. Would it be a problem if you at least provided those?

...

...
The prime number list is too short, it roughly doubles every time. On the other hand, I understand that it wouldn't do that have a *much* larger list, as this static array is included in every translation unit.

...

I think it can actually cause an ODR violation so I probably should change that anyway. David Abrahams recently posted a way to avoid this by including the value in a template, so I'll try that.

In theory, yes. In practice, linkers don't really enforce it, I hear. Having it as a template doesn't solve the basic problem, which is the replication of identical data in many translation units. In fact, it might exacerbate it by having the same table defined multiple times in the same translation unit! Where I work, we're trying to curb executable sizes (which can be enormous, like you wouldn't believe). This is precisely what would get a frown at the code review.

...

How much of a difference would it make? The memory requirements for the array of buckets is typically smaller than the memory required for the nodes (unless you have a small number of nodes or increase the maximum load factor). T

You're correct, unless people like to have very large maps of small types (unordered_map<int, some_type*> with 10M elements) where it starts to really matter. On the other end, it also matters if a user has lots of small maps. I think it can really be a problem, in fact I have been in that situation myself (having to fix the prime number table of our implementation of hash table, by request of one user with performance-critical needs). So here I actually speak from practical experience. I believe the best thing, is to move the table to the .cpp, but I also I totally understanding the appeal of header-only libraries. Would you consider a configuration flag, whereby unless explicitly defined, your table is in the header, but for a user which controls its own site installation, a table can be generated into a .cpp and compiled into a library to be linked into programs using Boost.Unordered? I actually did write such a piece of code, it isn't hard, and you can specify the step (30% or 100%) and the range of the desired table. If you include this code into the example section and make the flag public, your users can choose what they prefer based on their requirements. Then again, if they're that sophisticated, maybe they can do it themselves.

...

...
Re: swap: I really think you should be following the committee's current advice (I'm looking at the code of hash_table_impl.hpp, lines 1194 -- 1219), sooner rather than later. But my opinion is biased.

Why is your opinion biased?

My boss is John Lakos, and my colleague Pablo Halpern. They both have strong opinions on allocators and value-semantic types. In fact, they're arguing exactly about these issues with the committee (N2436, N2387, N1850 ). I've been converted myself :)

...

That works both ways, as the committee could change its mind. Since we don't have concepts, following their current decision means slow swaps. Which isn't that bad since it's only for the rare occasions when the allocators aren't equal. I haven't got a strong preference here either, but this was discussed before and the general consensus seemed to be for what I've done.

What I hear from Pablo and John is that they're getting good support around their proposal (ok, they're biased too :)... Maybe you're right to wait and see what the committee will do. Hopefully, won't take long as C++0x is around the horizon and there'll be two meetings in Jan. and June. Somehow, I didn't think that Pablo's proposal (N2436) required concepts, but required traits instead. But I'm not intimately familiar. -- Hervé Brönnimann hervebronnimann@mac.com

Daniel James

8:24 p.m.

New subject: [review] Review of Unordered library begins today December 7

Hervé Brönnimann wrote:

...

As for the equality comparison vs predicate, the same again is true for operator== for the standard associative containers, defined as a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin() [sic] (no, *I* didn't forget the final paren, it's not on page 671 either). Nowhere in that makes use of the "equivalence" implied by operator<, which is used for uniqueness in set/map and equivalence in multiset/multimap. So by defining equality in terms of the equality predicate for keys in order to check for the uniqueness property, but operator== for both keys and values in the implementation of operator== for unordered_map/multimap, you wouldn't exactly break new ground.

That's a very compelling argument. When using a custom comparison object, operator== can already give weird results for std::map, so why shouldn't it for std::unordered_map? There's a problem if the containers' function objects aren't equal, but I guess the solution is just not to use operator== in that very unusual case.

...

I believe the best thing, is to move the table to the .cpp, but I also I totally understanding the appeal of header-only libraries. Would you consider a configuration flag, whereby unless explicitly defined, your table is in the header, but for a user which controls its own site installation, a table can be generated into a .cpp and compiled into a library to be linked into programs using Boost.Unordered? I actually did write such a piece of code, it isn't hard, and you can specify the step (30% or 100%) and the range of the desired table. If you include this code into the example section and make the flag public, your users can choose what they prefer based on their requirements.

An optionally separately compiled part is fine. I think it's a good idea. As for specifying the step, I'm not sure about that. It might be better just to go for a fixed 30% step.

...

What I hear from Pablo and John is that they're getting good support around their proposal (ok, they're biased too :)... Maybe you're right to wait and see what the committee will do. Hopefully, won't take long as C++0x is around the horizon and there'll be two meetings in Jan. and June. Somehow, I didn't think that Pablo's proposal (N2436) required concepts, but required traits instead. But I'm not intimately familiar.

Okay, if no one disagrees I'll change the swap implementation. Daniel

Peter Dimov

11:15 p.m.

New subject: [review] Review of Unordered library begins today December 7

Daniel James:

...

Okay, if no one disagrees I'll change the swap implementation.

Nobody (in the committee or otherwise) has any idea yet what swap will end up doing, so for now, you probably should do what you think is best.

Daniel James

12 Dec 12 Dec

10:24 p.m.

New subject: [review] Review of Unordered library begins today December 7

On 11/12/2007, Peter Dimov <pdimov@pdimov.com> wrote:

...

Nobody (in the committee or otherwise) has any idea yet what swap will end up doing, so for now, you probably should do what you think is best.

Well, I don't have any real experience of using custom allocators so I'd prefer to follow the advice of others who do. Daniel

John Torjo

1:14 p.m.

...

Here are some questions you might want to answer in your review:

* What is your evaluation of the design?

Looks good to me.

...

* What is your evaluation of the implementation?

Looks good, haven't looked much into the detail/ folder

...

* What is your evaluation of the documentation?

Good.

...

* What is your evaluation of the potential usefulness of the library?

Extremely useful.

...

* Did you try to use the library? With what compiler? Did you have any problems?

Haven't tried

...

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

Quick reading.

...

* Are you knowledgeable about the problem domain?

I've used hash_map implementations in the past.

...

And finally, every review should answer this question:

* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

Yes, definitely! Best, John -- http://John.Torjo.com -- C++ expert ... call me only if you want things done right

Jamie Allsop

19 Dec 19 Dec

3:04 a.m.

New subject: [review] Review of Unordered library begins today December 7

First apologies for the belated review and its brevity - I really had hoped I'd have time to be more thorough, but here goes. First some comments on the docs (nits/suggestions really). The documentation is really very good. --- Introduction 2nd para - "For a hash table you need an equality function and a hash function for the key." I got the impression you were contrasting this with requirements of the ordered containers requiring less than comparable values. If that is so then you might want to use something like, "In contrast a hash table ..." of "For a hash table you _only_ need ..." as you are using this to point out an advantage of hash tables. 3rd para - "So the C++ Standard Library Technical Report introduced ..." I think needs re-worded slightly. 'So' presumably refers to the advantages of the hashed containers. In this case it is abrupt and not very specific. Perhaps you could relate the start of the sentence to the advantages you oultine more directly. For example, "With this in mind ...", or simply start, "The C++ Library Technical Report has therefore introduced ..." Last para - "There are other differences, which will be detailed later." If there are one or two specific places where these are enumerated it would be nice to directly reference the sections. --- Data Structure 2nd para - A reference to the "Equality Predicates and Hash Functions" would be nice, if only to indicate to the reader that there is more detailed information. You could perhaps hyperlink "hash function" to that section. 3rd para - "If at a later date the container wants to find an element in the container it just has to apply the same process to the element's key to discover which bucket it is in." This rather long sentence could do with one or more commas to make reading it easier for example, "If at a later date the container wants to find an element in the container, it just has to apply the same process to the element's key to discover which bucket it is in." Better yet, try to split it into two sentences (if that is possible). 4th para - /comparison/comparisons/ --- Controlling the number of buckets 1st para - "...performance to get worse." Should this be "...performance to worsen/degrade." 2nd para - "First you can specify the minimum number of buckets in the constructor, and later, by calling rehash." Ok, so you mean that you can specify a minimum number of buckets when you construct your container, after construction you can also influence the bucket count by calling rehash? The "... ,and later, ..." threw me the first time I read it. 3rd para - "The other method is the max_load_factor member function." Ok, now I am confused... are there three methods? Or are the two methods 1 - specify the minimum number of buckets in the constructor 2 - the max_load_factor member function ...and calling rehash is relevant to both. I think this and the previous paragraph need to be made more coherent. 6th para - "Note: rehash's argument is the number of buckets..." Ok, now I think I am seeing what you want to say in the 2nd paragraph -- the number of buckets can be specified either during construction or when rehash is called? "The + 1 is required because the container is allowed to resize when the load factor is equal to the maximum load factor." For example? It's late here and I'm not sure what you mean (but I guess it will be obvious in the morning :) ) --- Equality Predicates and Hash Functions This section has been well discussed in previous posts. However on reading it I thought that an earlier explicit reference to Boost.Hash might be appropriate, spelling out the relationship between the two libraries. --- One thing that would be nice are some performance details, perhaps comparisons against some other library implementations using the same has function. Just a thought. --- The rest of the docs go into a variety of details about specific topics and is all very good. Ion Gaztañaga wrote:

...

* What is your evaluation of the design?

Well thought out and closely tracking TR1.

...

* What is your evaluation of the implementation?

I haven't really looked at it - if I get a chance I will do so.

...

* What is your evaluation of the documentation?

Excellent - see above.

...

* What is your evaluation of the potential usefulness of the library?

We will be looking to use this library in critical areas of our codebase where performance is critical

...

* Did you try to use the library? With what compiler?

Not yet.

...

Did you have any problems?

N/A

...

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I read the docs a few times and looked at other references, plus reviewed the previous reviewers comments.

...

* Are you knowledgeable about the problem domain?

So so. I haven't done a lot of work with hash containers before (I haven't needed to). I DO need to make heavy use of them now so this library is very timely. I've unfortunately not been able to get time to replace our existing implementation with this one and do a performance comparison.

...

And finally, every review should answer this question:

* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

This is a much needed library in boost. I vote for it to be accepted.

...

Unordered containers are likely to become one of the most used Boost libraries so I encourage you to review the library in order to see if Unordered fulfills all your expectations.

I think this appears to be an excellent start. Performance will be an issue for me, but as I am in a position to do the needed comparisons myself (but don't have time yet) I can appreciate how more performance information could be very subjective to different peoples' needs. Jamie

...

Ion Gaztañaga - Review Manager - _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Daniel James

9:52 p.m.

New subject: [review] Review of Unordered library begins today December 7

On 19/12/2007, Jamie Allsop <ja11sop@yahoo.co.uk> wrote:

...

First apologies for the belated review and its brevity - I really had hoped I'd have time to be more thorough, but here goes.

Thanks for reviewing the library - suggestions are always welcome, especially help with improving the documentation. I'll look into your suggestions soon (they all look worthwhile).

...

One thing that would be nice are some performance details, perhaps comparisons against some other library implementations using the same has function. Just a thought.

A problem is that some implementations require better quality hash functions because they use a power of 2 for the number of buckets. So it's probably worth trying a some different hash functions.

...

I haven't done a lot of work with hash containers before (I haven't needed to). I DO need to make heavy use of them now so this library is very timely. I've unfortunately not been able to get time to replace our existing implementation with this one and do a performance comparison.

I did some informal benchmarking a while ago and found the gcc version to be a little faster (sorry, I didn't record the figures). I think this was partly because it doesn't fully support allocators. I'm a bit weary of publishing artificial speed comparisons so some serious real life comparisons would be very useful. I'll see if I can dig out my benchmarks, they used boost headers as test data so they weren't completely unrealistic. Daniel

6434

Age (days ago)

6446

Last active (days ago)

List overview

Download

43 comments

13 participants

participants (13)

"JOAQUIN LOPEZ MU?Z"
Daniel James
David Abrahams
Eric Niebler
Hervé Brönnimann
Howard Hinnant
Ion Gaztañaga
Jamie Allsop
Joaquín Mª López Muñoz
John Torjo
Mathias Gaunard
Peter Dimov
Thorsten Ottosen