[tokenmap] updated perfect hash container in sandbox

Changes: * To eliminate a possibility of the algo for finding free slots becoming slower as the load factor increases, the container now keeps track of open slots in unused nodes. This guarantees constant time insertion of new elements at a slight bookkeeping overhead. Consequently, the load-factor has been removed. Pointed by Daniel James. * More STL compliant (iterators, size(), empty(), etc)

could you please provide some additional rationale for this container? I don't understand the advantage to allocating values without any type of container and using the pointers as keys. as I understand the tokenmap-container / my assumptions: - insert()ing into the container requires an allocation - the key values are unknown to the user of the container - the key type must (internally) be integral so what is the advantage to simply allocating the mapped_type and using the returned pointer as the key? in container-form: class container{ typedef ... mapped_type; typedef mapped_type *key_type; ... key_type insert(mapped_type &v){ return new mapped_type(v); } } (to make the container iteratable(if that`s a word) a linked list would have to be used, but a tokenmap also seems to do that) what am I missing?

http://svn.boost.org/svn/boost/sandbox/tokenmap/libs/tokenmap/doc/html/token... On Wed, Apr 21, 2010 at 11:12 AM, <strasser@uni-bremen.de> wrote:
could you please provide some additional rationale for this container? I don't understand the advantage to allocating values without any type of container and using the pointers as keys.
what am I missing?

Zitat von Slawomir Lisznianski <sl@paramay.com>:
http://svn.boost.org/svn/boost/sandbox/tokenmap/libs/tokenmap/doc/html/token...
I have read the rationale (i.e. example use cases), that's why I was asking for additional rationale. I think I have figured it out from the source code now, but I meant something like: what is the purpose of the container? that it can tell valid keys from invalid keys. (-> index into vector) that keys can not be guessed. (-> randomized key part) that it can be iterated in linear time to size() instead of capacity().(-> linked list of nodes) why is the mapped_type stored by pointer to allocated object instead of by value? "One immediate consequence of using this simple strategy is that a user cannot provide her own tokens" I think the "randomized part" could be provided by the user, if the key is a std::pair<index_part, user_part>. btw, a similar technique (but without the iterating and the randomizing part) is employed in my implementation of thread_specific_ptr: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=tss.hpp&directory=& that would be another use case of your container if it is configurable enough in the final version. Regards,

strasser@uni-bremen.de wrote:
why is the mapped_type stored by pointer to allocated object instead of by value?
next release (in the works) renames tokenmap to ptr_tokenmap, and a new tokenmap comes in its place with by-value semantics.
I think the "randomized part" could be provided by the user, if the key is a std::pair<index_part, user_part>.
Correct. Next release delegate to an external randomizer. Presumably users will plug Boost Random Number.
btw, a similar technique (but without the iterating and the randomizing part) is employed in my implementation of thread_specific_ptr: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=tss.hpp&directory=&
Interesting... thanks for the hint.
that would be another use case of your container if it is configurable enough in the final version.
It shall be. Also, thanks for bullet-point examples of analogous operations. It summarized it all well. -sl

Zitat von Slawomir Lisznianski <sl@paramay.com>:
strasser@uni-bremen.de wrote:
why is the mapped_type stored by pointer to allocated object instead of by value?
next release (in the works) renames tokenmap to ptr_tokenmap, and a new tokenmap comes in its place with by-value semantics.
I think the "randomized part" could be provided by the user, if the key is a std::pair<index_part, user_part>.
Correct. Next release delegate to an external randomizer. Presumably users will plug Boost Random Number.
I think the this could be a container adapter instead of part of the container, just like std::priority_queue is to std::vector. adapter::key_type would be std::pair<container::key_type,randomized_part> you could use the adapter for any container, and easily use tokenmap without the randomized key part, without using something like boost::empty as the randomized part (e.g. for thread_specific_ptr).

Slawomir Lisznianski skrev:
Changes:
* To eliminate a possibility of the algo for finding free slots becoming slower as the load factor increases, the container now keeps track of open slots in unused nodes. This guarantees constant time insertion of new elements at a slight bookkeeping overhead. Consequently, the load-factor has been removed. Pointed by Daniel James.
* More STL compliant (iterators, size(), empty(), etc)
If there is no reason that performance cannot be compared to boost::unordered_map, then please also perform this comparison. Consider also to reorder your template arguments to make the interface as close to normal maps as possible. -Thorsten

Thorsten Ottosen wrote:
If there is no reason that performance cannot be compared to boost::unordered_map, then please also perform this comparison.
will do..
Consider also to reorder your template arguments to make the interface as close to normal maps as possible.
The intention was that the token-type, being limited to an integral type, would default to an int which would work for vast majority of cases. So the user could simply write: tokenmap<my_class> fast_map; instead of: tokenmap<int, my_class> fast_map; The second may give an impression that the key can be anything other, whereas in reality it's quite limiting. I don't feel strong about either method though. -sl

Slawomir Lisznianski skrev:
Thorsten Ottosen wrote: Consider also to reorder your template arguments to make the interface
as close to normal maps as possible.
The intention was that the token-type, being limited to an integral type, would default to an int which would work for vast majority of cases. So the user could simply write:
tokenmap<my_class> fast_map;
instead of:
tokenmap<int, my_class> fast_map;
The second may give an impression that the key can be anything other, whereas in reality it's quite limiting.
I don't feel strong about either method though.
Is suggest that you use BOOST_STATIC_ASSERT and a type_trait to make the requirement checked and explicit. And to make the syntax mimic that of normal maps. -Thorsten

On 2010-04-24, Slawomir Lisznianski <sl@paramay.com> wrote:
Thorsten Ottosen wrote:
If there is no reason that performance cannot be compared to boost::unordered_map, then please also perform this comparison.
will do..
In order for the comparison to be more fair, somehow the time that `boost::unordered_map` took to compute the hashes of `key_type` objects must be "subtracted out". Looking through the documentation (http://www.boost.org/doc/libs/1_42_0/doc/html/boost/unordered_map.html), it does not appear that `boost::unordered_map` provides "expert" find routines that take hashes, but perhaps it would be easy to add one for the purpose of the comparison: `xfind` takes a hash `h` and returns an `iterator` to the element with a key that hashes to `h`, or `end()` if no such element is found. Such a function could even be added to `boost::unordered_map`.
participants (4)
-
Daniel Trebbien
-
Slawomir Lisznianski
-
strasser@uni-bremen.de
-
Thorsten Ottosen