
ajay gopalakrishnan wrote:
I work mainly in machine learning and data mining, and this job mainly requires me to write very efficient and fast code for numerical processing programs, as well as do a lot of data preprocessing, especially of text. It would be great if you could point me to some good packages in Boost for the following tasks: packages that are very good and don't have a very steep learning curve.
* Text parsing: a package that lets me do something like what sed and awk can do.
Take a look at Boost.Regex and Boost.Spirit.
* Linear Algebra - Eigen analysis, matrix operations, Matrix decomposition etc.
There is Boost.uBLAS. But frankly, I don't like uBLAS. I use my own generic C++ wrapper around Intel MKL and VecLib.
* Optimization routines - Linear programming, Quadratic Programming etc.
* HTML, XML parsing etc.
Boost does not have much to offer in these areas. Unfortunately, most Boost libraries are written by a single person. The Boost culture does not seem to encourage collaboration, and these areas would require teams of several developers to get something useful done.

----

Also, do take a look at smart pointers, bind, function, filesystem, format and thread. They solve a lot of general software development problems.

You mentioned data mining. If you do statistical analysis of data, then you will need the statistical distributions in Boost.Math.

HTH,
Johan Råde