
Just a small follow-up. I was caught up in exam week and could not do anything constructive for a while. I did catch up with my advisor, who specializes in AI, and she was also of the opinion that a small set of tools, properly implemented, should keep me busy through the summer. I am preparing my proposal and should submit it in a while. I want to know if I have a good chance of being selected. Any advice at this stage would be awesome. I am looking at tagging, chunking, tokenizing and parsing, stemming, and stop-word removal, as suggested. I will be using the O'Reilly NLTK book as a model reference. Any other good reference sources would be helpful!
On 29 March 2011 06:53, Sarma Tangirala <tvssarma.omega9@gmail.com> wrote:
Hello,
On 27 March 2011 18:26, Andrew Sutton <asutton.list@gmail.com> wrote:
Hi,
I have worked with Boost on my projects but haven't really thought of using C++ as a language for NLP.
The NLP work I have done has been in Python and Java, for their built-in string methods.
I think this would be an interesting project, but doing it correctly would require *way* more than 3 months of effort. If you were interested in starting to work on an NLP support library, you should focus on designing a small set of tools (WordNet support, stopword removal support, stemmers, etc.).
I do realize it will take a lot more than a summer's worth of effort, but as you pointed out, a small library with a couple of basic tools could be an excellent starting point.
I don't know if this is inexperience or ignorance, but does C++ work well for NLP?
There's no reason it should not be a great choice for NLP applications. It might be worth pointing out that some of the performance critical components of the Python NLTK (Natural Language Toolkit) are written in C... and we all know that C++ is a better C than C :)
I have worked with the Python NLTK and absolutely loved it. I did not know that the critical components were written in C/C++. But I must admit, I haven't seen a final application written entirely in C/C++. I am a moderately good programmer, and I think the reason a lot of people prefer Python is the simplicity of the code, or, as one forum user put it, avoiding "the syntactical fluff and non-abstraction" that goes with C++.
Also, I was looking at some C++ code that used boost/tokenizer.hpp to tokenize some text, and it looked a bit scary.
Welcome to Boost. The learning curve can be a bit steep, but don't let that scare you away.
Andrew
Haha. Thanks for the welcome!
I do realize the complexity involved in a project such as Boost, but as a noob, I was in total awe! :)
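For what it's worth, two of the basic tools mentioned above (tokenizing and stop-word removal) need not look scary in C++. Here is a minimal sketch using only the standard library rather than Boost.Tokenizer. It assumes ASCII text and treats every non-alphanumeric character as a word boundary; the names tokenize and remove_stop_words are hypothetical, made up for illustration, and not part of Boost or any existing library.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase word tokens.
// Assumption: ASCII input; any non-alphanumeric character ends a token.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    // Flush the last token if the text did not end with a separator.
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Hypothetical helper: keep only the tokens that are not in stop_words.
// Assumption: stop_words holds lowercase entries, matching tokenize() output.
std::vector<std::string> remove_stop_words(const std::vector<std::string>& tokens,
                                           const std::set<std::string>& stop_words) {
    std::vector<std::string> kept;
    std::copy_if(tokens.begin(), tokens.end(), std::back_inserter(kept),
                 [&stop_words](const std::string& token) {
                     return stop_words.count(token) == 0;
                 });
    return kept;
}
```

With the stop words "the" and "on", running the text "The cat sat on the mat." through tokenize and then remove_stop_words leaves the tokens cat, sat, mat. A real library would also need Unicode handling and a proper stop-word list, which this sketch leaves out.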
--
Regards, Sarma Tangirala, Junior - Class of 2012, Department of Information Science and Technology, College of Engineering Guindy - Anna University