Re: [Boost-users] Question about proper container/algorithm to use for fast dictionary lookup

18 Sep 2005

      Thanks for the reply,

Your point is well taken.  Well if that's so then why does boost::spirit
implement a TST?

And just to get a bit more clarity here, at the site for the TST their
data set looks much like a pretty random collection of words (a typical
dictionary), so I guess the question is which container works best for
implementing a) a very small dictionary of under 100 terms that are
comprised of the Unicode invariant characters (about 60 characters)
which is even a subset of ASCII and b) a larger dictionary of thousands
of terms comprised of the entire Unicode BMP set of characters? 

Thanks,

Elisha
...
...
...
2) Would its performance be much better than a general purpose
associative container?
Performance-wise, it should be about the same as a hash_map.
Memory-wise, a trie should be much better.
AFAIK there is nothing resembling a trie in Boost, but there are
...
...
free implementations available.
Spirit has a TST as a internal class. It is used in
boost::spirit::symbols
class.
For information about TST see
<http://www.cs.princeton.edu/%7Ers/strings/>
or a recent copy of Sedgewick's "Algorithms" book.
Memory-wise a trie could be much worse than a hash_map, it depends on
...
data set.  If enough of your keys are prefixes of one another then
memory
usage is not that bad. However if your keys are unique or the end of
other
the
the
...
key is the common part then an excessive amount of memory can be used.
Since for a TST a typical node will be at least 16 bytes on a 32bit
machine.  That is 16 bytes per character, not per string.
Nathan E. Moore
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [Boost-users] Question about proper container/algorithm to use for fast dictionary lookup

Elisha Berns