data:image/s3,"s3://crabby-images/fe2a5/fe2a5013e0c9f36d9cc0ebc50be855feeab562be" alt=""
ajay gopalakrishnan wrote:
I work mainly in machine learning and data mining, and my job requires me to write very efficient, fast code for numerical processing as well as to do a lot of data preprocessing, especially text preprocessing. It would be great if you could point me to some good packages in Boost for the following tasks: packages that are very good and don't have a very steep learning curve.
* Text parsing: a package that lets me do something like what sed and awk can do.
http://cttl.sourceforge.net/ may be of value here.
Take a look at Boost.Regex and Boost.Spirit.
* Linear algebra: eigenanalysis, matrix operations, matrix decomposition, etc.
There is Boost.uBLAS. But frankly, I don't like uBLAS. I use my own generic C++ wrapper around Intel MKL and VecLib.
http://itpp.sourceforge.net/ may help with some of the math, and http://www.osl.iu.edu/research/mtl/ has some matrix and linear algebra material.
* Optimization routines: linear programming, quadratic programming, etc.
* HTML and XML parsing, etc.
http://itpp.sourceforge.net/ may be useful for XML. I've used http://homepage.mac.com/pauljlucas/software/html_tree/ to parse HTML. It has a built-in element iterator, but look at the code to confirm, as there is only a prefix ++ rather than a postfix ++. Some of the code in http://swishplusplus.sourceforge.net/ may be useful for text mining.