
Hi,
I have worked with Boost on my projects but haven't really thought of using C++ as a language for NLP.
The NLP work I have done has been in Python and Java, because of their built-in string methods.
I think this would be an interesting project, but doing it correctly would require *way* more than 3 months of effort. If you are interested in starting work on an NLP support library, you should focus on designing a small set of tools (WordNet support, stopword removal, stemmers, etc.).
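As a rough illustration of the kind of small, self-contained tool meant here, the sketch below splits text into lowercase word tokens and drops stopwords using only the standard library. The function names (`tokenize`, `remove_stopwords`) and the tiny stopword set in the usage note are hypothetical, chosen for illustration only; a real library would load a proper stopword list and handle Unicode.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (illustration only): split text into lowercase tokens
// made of ASCII letters, digits, and apostrophes. Any other character acts
// as a separator. Not Unicode-aware.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '\'') {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);  // a separator ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush a token that runs to end of input
    }
    return tokens;
}

// Hypothetical helper (illustration only): keep only the tokens that are
// not in the given stopword set, preserving their original order.
std::vector<std::string> remove_stopwords(
        const std::vector<std::string>& tokens,
        const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> kept;
    for (const auto& t : tokens) {
        if (stopwords.count(t) == 0) {
            kept.push_back(t);
        }
    }
    return kept;
}
```

For example, `tokenize("The cat sat on the mat.")` yields `the cat sat on the mat`, and filtering that with the stopword set `{"the", "on"}` leaves `cat sat mat`. Keeping tokenizing and filtering as separate functions means each piece can later be swapped out (for example, for a real stemmer or a Unicode-aware tokenizer) without touching the other.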
I don't know if this is inexperience or ignorance, but does C++ work well for NLP?
There's no reason it shouldn't be a great choice for NLP applications. It might be worth pointing out that some of the performance-critical components of the Python NLTK (Natural Language Toolkit) are written in C... and we all know that C++ is a better C than C :)
Also, I was looking at some C++ code that used boost/tokenizer.hpp to tokenize some text, and it looked a bit scary.
Welcome to Boost. The learning curve can be a bit steep, but don't let that scare you away.
Andrew