
Hello,

On 27 March 2011 18:26, Andrew Sutton <asutton.list@gmail.com> wrote:
>> Hi,
>> I have worked with Boost on my projects but haven't really thought of using C++ as a language for NLP.
>> The NLP that I have done is on Python and Java, for their built-in string methods.
> I think this would be an interesting project, but doing it correctly would require *way* more than 3 months of effort. If you were interested in starting to work on an NLP support library, you should focus on designing a small set of tools (WordNet support, stopword removal support, stemmers, etc.),
I do realize it will take a lot more than a summer's worth of effort, but as you pointed out, a small library with a couple of basic tools could be an excellent starting point.
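
To make "a couple of basic tools" a little more concrete, here is a minimal sketch of what tokenization plus stopword removal might look like in standard C++ alone. The function names tokenize and remove_stopwords are hypothetical and chosen only for illustration; they are not part of Boost or of any existing library, and a real design would also need to handle Unicode, contractions, and language-specific rules.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: split text into lowercase alphabetic tokens.
// Every non-letter character (spaces, digits, punctuation) acts as a separator.
// ASCII only; this is an assumption made to keep the sketch short.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush the final token
    }
    return tokens;
}

// Hypothetical helper: keep only the tokens that are not in the stopword set.
std::vector<std::string> remove_stopwords(
    const std::vector<std::string>& tokens,
    const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> kept;
    for (const auto& token : tokens) {
        if (stopwords.count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

Calling remove_stopwords(tokenize(text), stopwords) with a caller-supplied stopword set chains the two steps. Keeping the stopword list as a parameter rather than hard-coding it is one possible design choice, since useful lists differ by language and application.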
>> I don't know if this is inexperience or ignorance, but does C++ work well for NLP?
> There's no reason it should not be a great choice for NLP applications. It might be worth pointing out that some of the performance critical components of the Python NLTK (Natural Language Toolkit) are written in C... and we all know that C++ is a better C than C :)
I have worked with the Python NLTK and absolutely loved it. I did not know that its performance-critical components were written in C. But I must admit, I haven't seen a final application written entirely in C/C++. I am a moderately good programmer, and I think the reason a lot of people prefer Python is the simplicity of the code, as opposed to what one forum user called "the syntactical fluff and non-abstraction" that goes with C++.
>> Also, I was looking at some C++ code using boost/tokenizer.hpp that
>> tokenized some text and it looked a bit scary.
> Welcome to Boost. The learning curve can be a bit steep, but don't let that scare you away.
> Andrew
Haha. Thanks for the welcome! I do realize the complexity involved in a project such as Boost, but for a noob, I was in total awe! :)

-- 
Regards,
Sarma Tangirala,
Junior - Class of 2012,
Department of Information Science and Technology,
College of Engineering Guindy - Anna University