Queries of a Boost newbie

Hi all,
I have worked on C++ for a long time in the past, but then I stopped using C++ and moved to other languages like Java sheerly because I was facing portability and repeatability issues with my programs. However, with the advent of Boost, I am very excited to go back to C++ again.
Boost looks like too big a package to go through easily, and getting used to it seems daunting at first. I have been trying to read the documentation and browse the available packages, but it is proving difficult to get started quickly.
I work mainly in Machine Learning & Data Mining and this job mainly requires me to write very efficient and fast code for numerical processing programs as well as do a lot of data, especially text, preprocessing. It would be great if you can point me to some good packages in Boost for the following tasks. Packages that are very good and don't have a very steep learning curve.
- Text Parsing. A package that lets me do something like what Sed and Awk can do.
- Linear Algebra - Eigen analysis, matrix operations, matrix decomposition etc.
- Optimization routines - Linear programming, Quadratic Programming etc.
- HTML, XML parsing etc.
Please do let me know if I am posting in the wrong forum.
Thanks & Regards,
Ajay Gopalakrishnan.

ajay gopalakrishnan wrote:
I work mainly in Machine Learning & Data Mining and this job mainly requires me to write very efficient and fast code for numerical processing programs as well as do a lot of data, especially text, preprocessing. It would be great if you can point me to some good packages in Boost for the following tasks. Packages that are very good and don't have a very steep learning curve.
* Text Parsing. package that lets me do something like what Sed and Awk can do.
Take a look at Boost.Regex and Boost.Spirit.
* Linear Algebra - Eigen analysis, matrix operations, Matrix decomposition etc.
There is Boost.uBLAS. But frankly, I don't like uBLAS. I use my own generic C++ wrapper around Intel MKL and VecLib.
* Optimization routines - Linear programming, Quadratic Programming etc.
* HTML, XML parsing etc.
Boost does not have much to offer in these areas. Unfortunately, most Boost libraries are written by a single person. The Boost culture does not seem to encourage collaboration, and these areas would require teams of several developers to get something useful done.
----
Also, do take a look at smart pointers, bind, function, filesystem, format and thread. They solve a lot of general software development problems.
You mentioned data mining. If you do statistical analysis of data, then you will need the Statistical Distributions part of Boost.Math.
HTH,
Johan Råde

ajay gopalakrishnan wrote:
I work mainly in Machine Learning & Data Mining and this job mainly requires me to write very efficient and fast code for numerical processing programs as well as do a lot of data, especially text, preprocessing. It would be great if you can point me to some good packages in Boost for the following tasks. Packages that are very good and don't have a very steep learning curve.
* Text Parsing. package that lets me do something like what Sed and Awk can do.
http://cttl.sourceforge.net/ may be of value here
Take a look at Boost.Regex and Boost.Spirit.
* Linear Algebra - Eigen analysis, matrix operations, Matrix decomposition etc.
There is Boost.uBLAS. But frankly, I don't like uBLAS. I use my own generic C++ wrapper around Intel MKL and VecLib.
http://itpp.sourceforge.net/ may help with some of the math. See http://www.osl.iu.edu/research/mtl/ for some matrix and linear algebra stuff.
* Optimization routines - Linear programming, Quadratic Programming etc.
* HTML, XML parsing etc.
http://itpp.sourceforge.net/ may be useful for XML. I've used http://homepage.mac.com/pauljlucas/software/html_tree/ to parse HTML. It has a built-in element iterator, but look at the code to confirm how it works, as it provides only one of the prefix and postfix forms of ++. Some stuff in http://swishplusplus.sourceforge.net/ may be useful for text mining.

I have worked on C++ for a long time in the past, but then I stopped using C++ and moved to other languages like Java sheerly because I was facing portability and repeatability issues with my programs. However, with the advent of Boost, I am very excited to go back to C++ again. Ajay Gopalakrishnan.
I started using Boost for the same reason, but I would not abandon higher-level languages either. C++ is great for some things, but for other things it is the wrong tool for the job. I recommend you implement the 'tools' where performance is important in C++, but build your integrated applications in higher-level languages (like Python). You can use Boost.Python or SWIG (which I have used) to expose your C++ code to languages like Python or even Java. These languages currently make XML interfaces, web services, GUIs, etc. extremely simple and easy. Others here may disagree, but this is a pattern I (currently) find works.
-- John Femiani

participants (6)
- ajay gopalakrishnan
- Brad
- Bruno Lalande
- Johan Råde
- John Femiani
- Ray Burkholder