
Hi, On Sun, Mar 23, 2008 at 5:37 AM, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:
chintan rao wrote:
I was thinking of implementing trie and various variation of tries, namely Ternary trees, Suffix trees and DAWG(Directed Acyclic Word Graphs) , for Google Soc.
That's interesting. Do you have a motivating application?
Till now i was gathering ideas here at the mailing list and other places. Will have to prepare a final application.
I last thought about tries when Boost.Flyweight was being discussed. Could your tries be used there?
I have not looked into Boost.Flyweight but I suppose they can be used :). At one time I was considering writing an Apache log file analysis
program. This would have used some sort of trie or similar structure to store the URLs (the aim is to be able to analyse very large logs in RAM). At the time I was considering PATRICIA tries.
Some people had suggested PATRACIA tries. I looked into the algorithm and added it to my todo list. Actually even the DAWG idea was suggested by someone else too :). I also looked in Digital Search Trees :).
I also have some code that stores large numbers of very similar filenames:
/photos/christmas_2007/img_3277.jpg /photos/christmas_2007/img_3278.jpg ....
These are currently stored in a binary file which is mmap()ed. The time to read the file is non-trivial and is bad for program start-up time. So perhaps I need a trie that can be stored in the mmap()ed file, i.e. not using pointers, like the Boost.Interprocess containers do.
I was thinking something like this
This is the abstract of the interface for trie Interface: trie<type_2> something; trie[string key]=type2 value; // O(key.size()); value trie::delete(string key); // O(key.size()); trie::iterator trie::find(string key); // O(key.size())' int trie::size() // O(1) time. Gives the total no of elements in the trie trie::iterator trie::begin() //gives the beginning ie the lexicographically least key :) trie::iterator trie::end() //gives a iterator which is same as x=x+trie.size(); string trie::get_key(iterator x) gets the key pointed to by iterator x;
trie::iterator x; //worst case O(height of tree) x++; // traverses the tree left to right and goes to the "next" node x--; // traverses the tree right to left and goes to "prev" node
*x // referrences the value of the key x is pointing to. trie.delete(x) // deletes the key pointed to by x;
trie.count_prefix(string key) // counts the no of keys with prefix key
Examples: trie::interator x; trie<vector<int> > t; t["hello"].push_back(9); t["eat"].push_back(10);
x=t.find("hello"); x--; //or --x; x now points to "eat" since eat comes before "hello" lexicographically; cout<<(*x)[0]<<endl;
x++;//or ++x, x now points to "hello";
x=t.end(); // or x++ will point to t.end().
do { x--; int size=(*x).size(); cout<<t.get_key(x)<<endl; for(int j=0;j<size;j++) cout<<(*x)[j]<<" "; cout<<endl; } while(x!=t.begin());
Please suggest changes to the interface and other things.
I think you should be able to make it (almost) identical to std::map's interface. You should certainly use std::map as your starting point, and only deviate from it when you have a good reason to do so.
Should be easily possible :).
Regards, Phil.
Regards, Chintan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost