[gsoc] Proposal for Generic Trie, Radix Tree, and Suffix Array Data Structures

Hello, I am interested in writing a library consisting of generic implementations of several common data structures and algorithms primarily useful in text-processing. I believe these would be useful additions to Boost and would like to work on them for GSoC. Before I submit my proposal, I would like to hear whether or not the community agrees that this is a good idea, and I would appreciate any suggestions or comments offered. My plan is to implement a trie / prefix tree data structure, STL set/map-like classes implemented over the trie, a radix tree, a suffix tree implemented over the radix tree, and a suffix array. These would be instantiable over arbitrary sequence types, and use iterator traits to determine the contained type. I am a student at Kent State University, working on an M.S. in Computer Science in a combined undergraduate/graduate program. I have a particular interest in computational linguistics, and I will admit that am especially interested in this project because of how it could be applied to my own field, but I believe that these types would be of general utility to the programming community as a whole. Thank you for your time, Chris Wagner

Chris Wagner skrev:
Hello,
I am interested in writing a library consisting of generic implementations of several common data structures and algorithms primarily useful in text-processing. I believe these would be useful additions to Boost and would like to work on them for GSoC. Before I submit my proposal, I would like to hear whether or not the community agrees that this is a good idea, and I would appreciate any suggestions or comments offered.
I don't think there is much more to say than: go for it! :-) And you're right, it could be a really usable library. -Thorsten

Chris Wagner-11 wrote:
I am interested in writing a library consisting of generic implementations of several common data structures and algorithms primarily useful in text-processing. I believe these would be useful additions to Boost and would like to work on them for GSoC. Before I submit my proposal, I would like to hear whether or not the community agrees that this is a good idea, and I would appreciate any suggestions or comments offered.
My plan is to implement a trie / prefix tree data structure, STL set/map-like classes implemented over the trie, a radix tree, a suffix tree implemented over the radix tree, and a suffix array. These would be instantiable over arbitrary sequence types, and use iterator traits to determine the contained type.
Look at http://code.google.com/p/patl/ PATL . I think that is very similar to what you want to do. -- View this message in context: http://www.nabble.com/-gsoc--Proposal-for-Generic-Trie%2C-Radix-Tree%2C-and-... Sent from the Boost - Dev mailing list archive at Nabble.com.

Hi, I think you should build on what was done in last year's GSoC by Chintan Rao. I recall he did a lot of work but the final status was not published. regards jose On Fri, May 1, 2009 at 10:30 PM, Roman Klyujkov <uxnuxn@gmail.com> wrote:
Chris Wagner-11 wrote:
I am interested in writing a library consisting of generic implementations of several common data structures and algorithms primarily useful in text-processing. I believe these would be useful additions to Boost and would like to work on them for GSoC. Before I submit my proposal, I would like to hear whether or not the community agrees that this is a good idea, and I would appreciate any suggestions or comments offered.
My plan is to implement a trie / prefix tree data structure, STL set/map-like classes implemented over the trie, a radix tree, a suffix tree implemented over the radix tree, and a suffix array. These would be instantiable over arbitrary sequence types, and use iterator traits to determine the contained type.
Look at http://code.google.com/p/patl/ PATL . I think that is very similar to what you want to do. -- View this message in context: http://www.nabble.com/-gsoc--Proposal-for-Generic-Trie%2C-Radix-Tree%2C-and-... Sent from the Boost - Dev mailing list archive at Nabble.com.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi, Can you keep me informed about what you do? Regards, Chintan Rao H

Hello all, Sorry about the delay in reply -- I *am* reading this thread, honest. The last few weeks have been... interesting... for me, but I should be able to reply to things within a reasonable time-frame now. Roman, I will definitely take a look at the PATL, since it does seem very similar to what I am doing. Chintan, I remember meeting you on IRC, and I will try to keep you updated on what I am doing either there or via email. Thank you for your suggestions so far. - Chris
participants (5)
-
Chintan Rao H
-
Chris Wagner
-
Jose
-
Roman Klyujkov
-
Thorsten Ottosen