
Hi, I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon? Regards, Chintan Rao H

On Thu, Mar 20, 2008 at 9:26 PM, chintan rao <chintanraoh@gmail.com> wrote:
I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon?
I think there might be a trie-based symbol table in spirit, but no stand-alone one that I know of. I was considering submitting a proposal for something like that for SoC, but it looks like I'm going to get a real job instead :P

John Maddock wrote:
chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon?
No it's one of those things we're currently missing.
I think the boost spirit symbol class is implemented as a trie, albeit incomplete... you can add items, and search for them, but not remove them. IIRC, there was discussion on the spirit mailing list regarding a more generalized version. Jeff Flinn

On Mon, Mar 24, 2008 at 4:44 PM, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
John Maddock wrote:
chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon?
No it's one of those things we're currently missing.
I think the boost spirit symbol class is implemented as a trie, albeit
incomplete... you can add items, and search for them, but not remove them. IIRC, there was discussion on the spirit mailing list regarding a more generalized version.
I had looked into. Seems like and old discussion. It dated back to 2005.... I had no idea how to follow it up. Do they use a generalized version,for which they used a TST according to the thread link given by Martin Wille in reply to my post ) or do they do they version of tries specific to strings. Did they implement a Patricia Trie? Anyway, generally speaking ........ A generalized version will have to use Ternary Search Tree otherwise direct array look for the child node's address become too large. Trie for strings itself is space in efficient. One has to use PATRICIA tries to really optimize it . There are other techniques like DAWG which do make the tries somewhat space efficient... Jeff Flinn
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

chintan rao wrote:
On Mon, Mar 24, 2008 at 4:44 PM, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
John Maddock wrote:
chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon? No it's one of those things we're currently missing. I think the boost spirit symbol class is implemented as a trie, albeit
incomplete... you can add items, and search for them, but not remove them. IIRC, there was discussion on the spirit mailing list regarding a more generalized version.
I had looked into. Seems like and old discussion. It dated back to 2005.... I had no idea how to follow it up. Do they use a generalized version,for which they used a TST according to the thread link given by Martin Wille in reply to my post ) or do they do they version of tries specific to strings.
Spirit isn't limited to strings and the TST implementation is templated on the "character" type. It should work with any sequential key. Did they implement
a Patricia Trie?
No. Spirit's TST node class contains only an instance of a single CharT.
Anyway, generally speaking ........ A generalized version will have to use Ternary Search Tree otherwise direct array look for the child node's address become too large. Trie for strings itself is space in efficient. One has to use PATRICIA tries to really optimize it . There are other techniques like DAWG which do make the tries somewhat space efficient...
"optimize" implies an optimization goal. As with other data structures, you can't optimize them all in a single implementation. What optimization goal(s) do you have in mind? Regards, m

Hi, On Mon, Mar 24, 2008 at 10:55 PM, Martin Wille <mw8329@yahoo.com.au> wrote:
chintan rao wrote:
On Mon, Mar 24, 2008 at 4:44 PM, Jeff Flinn < TriumphSprint2000@hotmail.com> wrote:
John Maddock wrote:
chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost. If not is it going to implemented soon? No it's one of those things we're currently missing. I think the boost spirit symbol class is implemented as a trie, albeit
incomplete... you can add items, and search for them, but not remove them. IIRC, there was discussion on the spirit mailing list regarding a more generalized version.
I had looked into. Seems like and old discussion. It dated back to 2005.... I had no idea how to follow it up. Do they use a generalized version,for which they used a TST according to the thread link given by Martin Wille in reply to my post ) or do they do they version of tries specific to strings.
Spirit isn't limited to strings and the TST implementation is templated on the "character" type. It should work with any sequential key.
Did they implement
a Patricia Trie?
No. Spirit's TST node class contains only an instance of a single CharT.
Anyway, generally speaking ........ A generalized version will have to use Ternary Search Tree otherwise direct array look for the child node's address become too large. Trie for strings itself is space in efficient. One has to use PATRICIA tries to really optimize it . There are other techniques like DAWG which do make the tries somewhat space efficient...
"optimize" implies an optimization goal. As with other data structures, you can't optimize them all in a single implementation. What optimization goal(s) do you have in mind?
PATRICIA Tries optimize space. I don't know about how much it improves the time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map. DAWG does retain the time efficiency of a normal trie and also optimizes space by merging common subtrees. But it again depends on how one decides to implement it.. Ie using a TST or no TST. But there is an added time required to build a DAWG from a trie. ( one internet source reported it as "160,000 words in about 2 minutes" ) Regards, Chintan Rao H

On Mon, Mar 24, 2008 at 9:30 PM, chintan rao <chintanraoh@gmail.com> wrote:
PATRICIA Tries optimize space. I don't know about how much it improves the time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map.
(I use PATRICIA to mean joining chains of nodes with only one outbound link. Whether that's right not no, I don't know.) I expect the time efficiciency improvement isn't significant. It's probably similar to the difference between iterating list and vector, in that the memory locality is better. It's essential, as far as I'm concerned, though, to keep the node space overhead from getting excessive, particularly since it does not worsen the time costs. And that "person in #boost" was likely me. I got roughly a 2x speed up for 2x the memory cost, in a non-generic, proof-of-concept tst-patricia-trie.

On Tue, Mar 25, 2008 at 7:29 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Mon, Mar 24, 2008 at 9:30 PM, chintan rao <chintanraoh@gmail.com> wrote:
PATRICIA Tries optimize space. I don't know about how much it improves
the
time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map.
(I use PATRICIA to mean joining chains of nodes with only one outbound link. Whether that's right not no, I don't know.)
I expect the time efficiciency improvement isn't significant. It's probably similar to the difference between iterating list and vector, in that the memory locality is better. It's essential, as far as I'm concerned, though, to keep the node space overhead from getting excessive, particularly since it does not worsen the time costs.
And that "person in #boost" was likely me. I got roughly a 2x speed up for 2x the memory cost, in a non-generic, proof-of-concept tst-patricia-trie.
Must have been you. You had spoken about tst patricia trie. :).
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi, Finally some bad discovery was made......... http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html There seems to be a PATRICIA trie implementation in libstdc++ in 4.3 and 4.2implementations Please suggest modifications to my application. Regards, Chintan Rao H On Tue, Mar 25, 2008 at 8:41 AM, chintan rao <chintanraoh@gmail.com> wrote:
On Tue, Mar 25, 2008 at 7:29 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Mon, Mar 24, 2008 at 9:30 PM, chintan rao <chintanraoh@gmail.com> wrote:
PATRICIA Tries optimize space. I don't know about how much it
improves the
time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map.
(I use PATRICIA to mean joining chains of nodes with only one outbound link. Whether that's right not no, I don't know.)
I expect the time efficiciency improvement isn't significant. It's probably similar to the difference between iterating list and vector, in that the memory locality is better. It's essential, as far as I'm concerned, though, to keep the node space overhead from getting excessive, particularly since it does not worsen the time costs.
And that "person in #boost" was likely me. I got roughly a 2x speed up for 2x the memory cost, in a non-generic, proof-of-concept tst-patricia-trie.
Must have been you. You had spoken about tst patricia trie. :).
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Tue, Mar 25, 2008 at 5:45 PM, chintan rao <chintanraoh@gmail.com> wrote:
Hi, Finally some bad discovery was made.........
http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html
There seems to be a PATRICIA trie implementation in libstdc++ in 4.3 and 4.2 implementations Please suggest modifications to my application.
Is there some way this can still sustain?
Regards, Chintan Rao H
On Tue, Mar 25, 2008 at 8:41 AM, chintan rao <chintanraoh@gmail.com> wrote:
On Tue, Mar 25, 2008 at 7:29 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Mon, Mar 24, 2008 at 9:30 PM, chintan rao <chintanraoh@gmail.com> wrote:
PATRICIA Tries optimize space. I don't know about how much it
improves the
time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map.
(I use PATRICIA to mean joining chains of nodes with only one outbound link. Whether that's right not no, I don't know.)
I expect the time efficiciency improvement isn't significant. It's probably similar to the difference between iterating list and vector, in that the memory locality is better. It's essential, as far as I'm concerned, though, to keep the node space overhead from getting excessive, particularly since it does not worsen the time costs.
And that "person in #boost" was likely me. I got roughly a 2x speed up for 2x the memory cost, in a non-generic, proof-of-concept tst-patricia-trie.
Must have been you. You had spoken about tst patricia trie. :).
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi, Sorry took libstdc++ as a standard.. my mistake :(. Regards, Chintan Rao H On Tue, Mar 25, 2008 at 6:28 PM, chintan rao <chintanraoh@gmail.com> wrote:
On Tue, Mar 25, 2008 at 5:45 PM, chintan rao <chintanraoh@gmail.com> wrote:
Hi, Finally some bad discovery was made.........
http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html
There seems to be a PATRICIA trie implementation in libstdc++ in 4.3 and 4.2 implementations Please suggest modifications to my application.
Is there some way this can still sustain?
Regards, Chintan Rao H
On Tue, Mar 25, 2008 at 8:41 AM, chintan rao <chintanraoh@gmail.com> wrote:
On Tue, Mar 25, 2008 at 7:29 AM, Scott McMurray < me22.ca+boost@gmail.com> wrote:
On Mon, Mar 24, 2008 at 9:30 PM, chintan rao <chintanraoh@gmail.com> wrote:
PATRICIA Tries optimize space. I don't know about how much it
improves the
time efficiency over normal tries as one could be using TST in it to conserve space. But as far i remember a person #boost channel reported some thing like 2x speed over std::map.
(I use PATRICIA to mean joining chains of nodes with only one outbound link. Whether that's right not no, I don't know.)
I expect the time efficiciency improvement isn't significant. It's probably similar to the difference between iterating list and vector, in that the memory locality is better. It's essential, as far as I'm concerned, though, to keep the node space overhead from getting excessive, particularly since it does not worsen the time costs.
And that "person in #boost" was likely me. I got roughly a 2x speed up for 2x the memory cost, in a non-generic, proof-of-concept tst-patricia-trie.
Must have been you. You had spoken about tst patricia trie. :).
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost.
There's a TST implementation. TSTs are related to tries. However, the implementation is well hidden in Spirit (inside the symbol table implementation) and it only has a subset of the desirable interface. Maybe, you're interested in that TST implementation.
If not is it going to implemented soon?
There has been an undertaking to separate the TST code from Spirit into a library of its own and to complete the interface. That effort apparently is stalled for quite some time now. I don't know how much progress the effort made. I suspect it hit the magical 90%-done barrier. I suggest you search the Spirit developer mailing list for relevant messages if you're interested. Starting points could be 1. http://tinyurl.com/37gcgp or 2. http://tinyurl.com/2e3jzn HTH, m 1) http://sourceforge.net/mailarchive/forum.php?thread_name=2084b47d05090905432484f5ca%40mail.gmail.com&forum_name=spirit-devel 2) http://sourceforge.net/mailarchive/message.php?msg_name=cs431t%24duo%241%40s...

Hi, I was thinking of implementing trie and various variation of tries, namely Ternary trees, Suffix trees and DAWG(Directed Acyclic Word Graphs) , for Google Soc. I was thinking something like this This is the abstract of the interface for trie Interface: trie<type_2> something; trie[string key]=type2 value; // O(key.size()); value trie::delete(string key); // O(key.size()); trie::iterator trie::find(string key); // O(key.size())' int trie::size() // O(1) time. Gives the total no of elements in the trie trie::iterator trie::begin() //gives the beginning ie the lexicographically least key :) trie::iterator trie::end() //gives a iterator which is same as x=x+trie.size(); string trie::get_key(iterator x) gets the key pointed to by iterator x; trie::iterator x; //worst case O(height of tree) x++; // traverses the tree left to right and goes to the "next" node x--; // traverses the tree right to left and goes to "prev" node *x // referrences the value of the key x is pointing to. trie.delete(x) // deletes the key pointed to by x; trie.count_prefix(string key) // counts the no of keys with prefix key Examples: trie::interator x; trie<vector<int> > t; t["hello"].push_back(9); t["eat"].push_back(10); x=t.find("hello"); x--; //or --x; x now points to "eat" since eat comes before "hello" lexicographically; cout<<(*x)[0]<<endl; x++;//or ++x, x now points to "hello"; x=t.end(); // or x++ will point to t.end(). do { x--; int size=(*x).size(); cout<<t.get_key(x)<<endl; for(int j=0;j<size;j++) cout<<(*x)[j]<<" "; cout<<endl; } while(x!=t.begin()); Please suggest changes to the interface and other things. Regards, Chintan Rao H On Fri, Mar 21, 2008 at 8:00 PM, Martin Wille <mw8329@yahoo.com.au> wrote:
chintan rao wrote:
Hi, I would like to know if there is Trie implementation in Boost.
There's a TST implementation. TSTs are related to tries. However, the implementation is well hidden in Spirit (inside the symbol table implementation) and it only has a subset of the desirable interface.
Maybe, you're interested in that TST implementation.
If not is it going to implemented soon?
There has been an undertaking to separate the TST code from Spirit into a library of its own and to complete the interface. That effort apparently is stalled for quite some time now.
I don't know how much progress the effort made. I suspect it hit the magical 90%-done barrier.
Would it be a good idea to start TST all this from scratch and implement a general thing for Boost?
I suggest you search the Spirit developer mailing list for relevant messages if you're interested. Starting points could be 1. http://tinyurl.com/37gcgp or 2. http://tinyurl.com/2e3jzn
HTH, m
1)
2)
http://sourceforge.net/mailarchive/message.php?msg_name=cs431t%24duo%241%40s...
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

chintan rao wrote:
Please suggest changes to the interface and other things.
You will want to read through the previous tree library design and code work that Bernhard did for GSoC 2006: <http://svn.boost.org/trac/boost/browser/sandbox/SOC/2006/tree/trunk> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html> -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org (msn) - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim,yahoo,skype,efnet,gmail

On Fri, Mar 21, 2008 at 1:08 PM, chintan rao <chintanraoh@gmail.com> wrote:
Please suggest changes to the interface and other things.
I notice you've been using strings for everything. It might be interesting to use arbitrary boost ranges, so that you have a container_(map|set) kind of thing.

On Fri, Mar 21, 2008 at 11:27 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Fri, Mar 21, 2008 at 1:08 PM, chintan rao <chintanraoh@gmail.com> wrote:
Please suggest changes to the interface and other things.
I notice you've been using strings for everything. It might be interesting to use arbitrary boost ranges, so that you have a container_(map|set) kind of thing.
I wrote this only for strings. Another thing what can be done is this can implemented for all ranges using ternary trees but time complexity of such algorithms will be higher. With strings the algorithm becomes faster that hash_map especially when there are collisions. This can particularly be useful in Dictionaries, Auto completion , and and efficient replacement for hash_map and map. With DAWG the space complexity of the algorithms become much much lower. Therefore i think these will good additions into boost library.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Fri, Mar 21, 2008 at 11:27 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Fri, Mar 21, 2008 at 1:08 PM, chintan rao <chintanraoh@gmail.com> wrote:
Please suggest changes to the interface and other things.
I notice you've been using strings for everything. It might be interesting to use arbitrary boost ranges, so that you have a container_(map|set) kind of thing.
First todo this as efficiently as string tries one can have min and max value for key[x] { by default key is string }. Key[x] must be increment-able (should support key[x]++ between min and max). and there must be a map function, similar to hash_function in hash_map, which will map each key[x] to consecutive integers. For example, when key[x] is unsigned char, its by default mapped from 0 to 255. and key[x] can be incremented :) Am i correct?
__________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

chintan rao wrote:
I was thinking of implementing trie and various variation of tries, namely Ternary trees, Suffix trees and DAWG(Directed Acyclic Word Graphs) , for Google Soc.
That's interesting. Do you have a motivating application? I last thought about tries when Boost.Flyweight was being discussed. Could your tries be used there? At one time I was considering writing an Apache log file analysis program. This would have used some sort of trie or similar structure to store the URLs (the aim is to be able to analyse very large logs in RAM). At the time I was considering PATRICIA tries. I also have some code that stores large numbers of very similar filenames: /photos/christmas_2007/img_3277.jpg /photos/christmas_2007/img_3278.jpg .... These are currently stored in a binary file which is mmap()ed. The time to read the file is non-trivial and is bad for program start-up time. So perhaps I need a trie that can be stored in the mmap()ed file, i.e. not using pointers, like the Boost.Interprocess containers do.
I was thinking something like this
This is the abstract of the interface for trie Interface: trie<type_2> something; trie[string key]=type2 value; // O(key.size()); value trie::delete(string key); // O(key.size()); trie::iterator trie::find(string key); // O(key.size())' int trie::size() // O(1) time. Gives the total no of elements in the trie trie::iterator trie::begin() //gives the beginning ie the lexicographically least key :) trie::iterator trie::end() //gives a iterator which is same as x=x+trie.size(); string trie::get_key(iterator x) gets the key pointed to by iterator x;
trie::iterator x; //worst case O(height of tree) x++; // traverses the tree left to right and goes to the "next" node x--; // traverses the tree right to left and goes to "prev" node
*x // referrences the value of the key x is pointing to. trie.delete(x) // deletes the key pointed to by x;
trie.count_prefix(string key) // counts the no of keys with prefix key
Examples: trie::interator x; trie<vector<int> > t; t["hello"].push_back(9); t["eat"].push_back(10);
x=t.find("hello"); x--; //or --x; x now points to "eat" since eat comes before "hello" lexicographically; cout<<(*x)[0]<<endl;
x++;//or ++x, x now points to "hello";
x=t.end(); // or x++ will point to t.end().
do { x--; int size=(*x).size(); cout<<t.get_key(x)<<endl; for(int j=0;j<size;j++) cout<<(*x)[j]<<" "; cout<<endl; } while(x!=t.begin());
Please suggest changes to the interface and other things.
I think you should be able to make it (almost) identical to std::map's interface. You should certainly use std::map as your starting point, and only deviate from it when you have a good reason to do so. Regards, Phil.

Hi, On Sun, Mar 23, 2008 at 5:37 AM, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:
chintan rao wrote:
I was thinking of implementing trie and various variation of tries, namely Ternary trees, Suffix trees and DAWG(Directed Acyclic Word Graphs) , for Google Soc.
That's interesting. Do you have a motivating application?
Till now i was gathering ideas here at the mailing list and other places. Will have to prepare a final application.
I last thought about tries when Boost.Flyweight was being discussed. Could your tries be used there?
I have not looked into Boost.Flyweight but I suppose they can be used :). At one time I was considering writing an Apache log file analysis
program. This would have used some sort of trie or similar structure to store the URLs (the aim is to be able to analyse very large logs in RAM). At the time I was considering PATRICIA tries.
Some people had suggested PATRACIA tries. I looked into the algorithm and added it to my todo list. Actually even the DAWG idea was suggested by someone else too :). I also looked in Digital Search Trees :).
I also have some code that stores large numbers of very similar filenames:
/photos/christmas_2007/img_3277.jpg /photos/christmas_2007/img_3278.jpg ....
These are currently stored in a binary file which is mmap()ed. The time to read the file is non-trivial and is bad for program start-up time. So perhaps I need a trie that can be stored in the mmap()ed file, i.e. not using pointers, like the Boost.Interprocess containers do.
I was thinking something like this
This is the abstract of the interface for trie Interface: trie<type_2> something; trie[string key]=type2 value; // O(key.size()); value trie::delete(string key); // O(key.size()); trie::iterator trie::find(string key); // O(key.size())' int trie::size() // O(1) time. Gives the total no of elements in the trie trie::iterator trie::begin() //gives the beginning ie the lexicographically least key :) trie::iterator trie::end() //gives a iterator which is same as x=x+trie.size(); string trie::get_key(iterator x) gets the key pointed to by iterator x;
trie::iterator x; //worst case O(height of tree) x++; // traverses the tree left to right and goes to the "next" node x--; // traverses the tree right to left and goes to "prev" node
*x // referrences the value of the key x is pointing to. trie.delete(x) // deletes the key pointed to by x;
trie.count_prefix(string key) // counts the no of keys with prefix key
Examples: trie::interator x; trie<vector<int> > t; t["hello"].push_back(9); t["eat"].push_back(10);
x=t.find("hello"); x--; //or --x; x now points to "eat" since eat comes before "hello" lexicographically; cout<<(*x)[0]<<endl;
x++;//or ++x, x now points to "hello";
x=t.end(); // or x++ will point to t.end().
do { x--; int size=(*x).size(); cout<<t.get_key(x)<<endl; for(int j=0;j<size;j++) cout<<(*x)[j]<<" "; cout<<endl; } while(x!=t.begin());
Please suggest changes to the interface and other things.
I think you should be able to make it (almost) identical to std::map's interface. You should certainly use std::map as your starting point, and only deviate from it when you have a good reason to do so.
Should be easily possible :).
Regards, Phil.
Regards, Chintan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (7)
-
chintan rao
-
Jeff Flinn
-
John Maddock
-
Martin Wille
-
Phil Endecott
-
Rene Rivera
-
Scott McMurray