Re: [boost] A possible GSOC 2011 proposal idea

Hi,

First of all, many thanks for the feedback! Much of the discussion has been about whether it is possible for me to accomplish such a task over the summer. The answer is no; a library designed to meet Boost's standards will not be done over the summer. However, a substantial amount of the work will be done over the summer, and it will continue over the next year until it is satisfactory. I will then be responsible for maintaining it for as long as required.

In order for you to understand the scope of this project, it must be made clear that its parts reference each other many times. For example, MILPs and ILPs are solved using exactly the same algorithms, so for all intents and purposes they are the same. Second, solving ILPs using branch and bound or cutting planes requires a strong algorithm for solving their relaxed forms, i.e. their LP relaxations. Third, the bulk of the effort in solving LPs using the revised simplex algorithm is in solving a system of linear equations. This is by far the most difficult part of the entire project and is what I intend to spend the summer doing. For the aforementioned reasons, the topic of my GSOC proposal will be efficiently solving linear systems (LS). After GSOC, I will continue my work and write LP and ILP solvers.

As for the scope of solving LS, there are many different algorithms designed to do just that. In order to limit the scope, I will not consider parallel algorithms or distributed algorithms. If the Boost community thinks that these are reasonable and I should forgo solving LP in favor of solving large (> 100,000x100,000) LS, we can certainly discuss this. The following algorithms are up for consideration:

* LU decomposition for small matrices
* Iterative methods for large matrices, such as conjugate gradient for positive definite matrices

Of course, this will all be integrated into the already existing uBLAS library. There are many issues with getting such a solver to work well in practice, namely reducing fill-in, maintaining numerical stability, and achieving good performance. In any case, a general-purpose LS solver is the first step towards getting an LP solver running. Therefore, it is the focus of my work this summer, and only afterwards will I work on LP-related algorithms.

One last thing: it wasn't my intent for my pre-GSOC submission to be anywhere near the quality required for Boost submission. It was only there to show my support for this project and that I'm fully committed to its success.

Thanks again for your feedback!

Chad
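For concreteness, the conjugate gradient method mentioned in the list above fits in a screenful of code written against uBLAS types, the library the proposal targets. This is a minimal sketch for symmetric positive definite systems; the function name and tolerance are illustrative, not a proposed interface.

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <cmath>

    namespace ublas = boost::numeric::ublas;

    // Solve A x = b for a symmetric positive definite A.
    ublas::vector<double> conjugate_gradient(const ublas::matrix<double>& A,
                                             const ublas::vector<double>& b,
                                             double tol = 1e-10)
    {
        ublas::vector<double> x = ublas::zero_vector<double>(b.size()); // x = 0
        ublas::vector<double> r = b;   // residual r = b - A x = b initially
        ublas::vector<double> p = r;   // initial search direction
        double rs_old = ublas::inner_prod(r, r);

        // Converges in at most n steps in exact arithmetic.
        for (std::size_t k = 0; k < b.size(); ++k) {
            ublas::vector<double> Ap = ublas::prod(A, p);
            double alpha = rs_old / ublas::inner_prod(p, Ap);
            x += alpha * p;            // step along the search direction
            r -= alpha * Ap;           // update the residual
            double rs_new = ublas::inner_prod(r, r);
            if (std::sqrt(rs_new) < tol)
                break;
            p = r + (rs_new / rs_old) * p;  // next A-conjugate direction
            rs_old = rs_new;
        }
        return x;
    }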

On 21/03/2011 12:00, Chad Seibert wrote:
As for the scope of solving LS, there are many different algorithms designed to do just that. In order to limit the scope, I will not consider parallel algorithms or distributed algorithms. If the Boost community thinks that these are reasonable and I should forgo solving LP in favor of solving large (> 100,000x100,000) LS, we can certainly discuss this. The following algorithms are up for consideration:
* LU decomposition for small matrices
* Iterative methods for large matrices, such as conjugate gradient for positive definite matrices
Doesn't LAPACK already do that?

Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.

On Mon, Mar 21, 2011 at 8:39 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 21/03/2011 12:00, Chad Seibert wrote:
As for the scope of solving LS, there are many different algorithms designed to do just that. In order to limit the scope, I will not consider parallel algorithms or distributed algorithms. If the Boost community thinks that these are reasonable and I should forgo solving LP in favor of solving large (> 100,000x100,000) LS, we can certainly discuss this. The following algorithms are up for consideration:
* LU decomposition for small matrices
* Iterative methods for large matrices, such as conjugate gradient for positive definite matrices
Doesn't LAPACK already do that?

On 21/03/2011 14:50, Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
LAPACK is a common interface for which there exist many optimized implementations from various vendors. Those are of very high quality; you are simply never going to beat them in terms of performance, even if you are a numerical computation expert. There are also open-source implementations available with liberal licenses.

OpenOpt, on the other hand, is a large framework, and although it has a liberal license, there is still interest in alternatives. While it is good, it's not quite as good as it could be.

But I don't think rewriting the LAPACK routines you need makes any sense at all. It's really state-of-the-art software.

On 21/03/2011 15:02, Mathias Gaunard wrote:
On 21/03/2011 14:50, Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
LAPACK is a common interface for which there exist many optimized implementations from various vendors. Those are of very high quality; you are simply never going to beat them in terms of performance, even if you are a numerical computation expert. There are also open-source implementations available with liberal licenses.
My bad, I was confused; I meant BLAS there. And LAPACK is the best way to use BLAS routines correctly to solve LS.

On Mon, Mar 21, 2011 at 7:02 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 21/03/2011 14:50, Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
LAPACK is a common interface for which there exist many optimized implementations from various vendors. Those are of very high quality; you are simply never going to beat them in terms of performance, even if you are a numerical computation expert. There are also open-source implementations available with liberal licenses.
OpenOpt, on the other hand, is a large framework, and although it has a liberal license, there is still interest in alternatives. While it is good, it's not quite as good as it could be.
But I don't think rewriting the LAPACK routines you need makes any sense at all. It's really state of the art software.
Typically what you do with a generic library is that you write generic algorithms for arbitrary datatypes and then specialize them to dispatch to LAPACK when the types can be handled directly. That way you can still get a reasonably-performant matrix-of-matrices or matrix-of-rationals and get screamin'-hot performance for floats and doubles.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
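A minimal sketch of the dispatch pattern Dave describes, assuming flat column-major storage in a std::vector; the `solve` name and the Gaussian-elimination fallback are illustrative, while the extern declaration matches the standard LAPACK dgesv interface:

    #include <vector>

    // Fortran LAPACK routine: solve A x = b via LU with partial pivoting.
    extern "C" void dgesv_(const int* n, const int* nrhs, double* a,
                           const int* lda, int* ipiv, double* b,
                           const int* ldb, int* info);

    // Generic fallback for any field-like element type (rationals,
    // intervals, matrices-of-matrices, ...). Textbook Gaussian elimination
    // without pivoting: fine for illustration, not numerically robust.
    // Matrix `a` is n x n, column-major; `b` is overwritten by the solution.
    template <class T>
    void solve(std::vector<T>& a, std::vector<T>& b, int n)
    {
        for (int k = 0; k < n; ++k)
            for (int i = k + 1; i < n; ++i) {
                T m = a[k * n + i] / a[k * n + k];   // multiplier a(i,k)/a(k,k)
                for (int j = k; j < n; ++j)
                    a[j * n + i] -= m * a[j * n + k];
                b[i] -= m * b[k];
            }
        for (int i = n - 1; i >= 0; --i) {           // back substitution
            for (int j = i + 1; j < n; ++j)
                b[i] -= a[j * n + i] * b[j];
            b[i] /= a[i * n + i];
        }
    }

    // Specialization for double: forward to the optimized LAPACK routine.
    template <>
    void solve<double>(std::vector<double>& a, std::vector<double>& b, int n)
    {
        int nrhs = 1, info = 0;
        std::vector<int> ipiv(n);
        dgesv_(&n, &nrhs, &a[0], &n, &ipiv[0], &b[0], &n, &info);
        // a real library would translate a nonzero `info` into an exception
    }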

On 21/03/2011 18:45, Dave Abrahams wrote:
Typically what you do with a generic library is that you write generic algorithms for arbitrary datatypes and then specialize them to dispatch to LAPACK when the types can be handled directly. That way you can still get a reasonably-performant matrix-of-matrices or matrix-of-rationals and get screamin'-hot performance for floats and doubles.
From my experience I found that when efficiency is one of the main concerns, it often worked better the other way around: write the algorithm optimally, and then see how you can generalize it. Also, writing multiple implementations requires time, which is somewhat restricted during a GSoC.

At Tue, 22 Mar 2011 00:12:24 +0100, Mathias Gaunard wrote:
On 21/03/2011 18:45, Dave Abrahams wrote:
Typically what you do with a generic library is that you write generic algorithms for arbitrary datatypes and then specialize them to dispatch to LAPACK when the types can be handled directly. That way you can still get a reasonably-performant matrix-of-matrices or matrix-of-rationals and get screamin'-hot performance for floats and doubles.
From my experience I found that when efficiency is one of the main concerns, it often worked better the other way around: write the algorithm optimally, and then see how you can generalize it.
Well naturally; the generic programming process *starts* with multiple optimal non-general implementations, and lifts them repeatedly to higher levels of abstraction... [http://www.generic-programming.org/about/intro/lifting.php]
also, writing multiple implementations requires time, which is somewhat restricted during a GSoC.
...but as the generic programmer, you don't have to write them yourself; you generally look around and find examples that people smarter than you have written. Of course, that takes time too :-)

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

First of all, thanks for all your input! However, it seems that it would take longer than the GSOC student submission period to decide on an appropriate course of action, so I'm forgoing the project. Again, thank you for your consideration of my work.

I have a new proposal, however. There is no doubt that suffix trees are useful, and I think they would be a great addition to the Boost library. Over the GSOC timeframe, I'll implement a RAM-based suffix tree algorithm such as Ukkonen's algorithm. The nice thing about implementing Ukkonen's algorithm is that it isn't difficult to get a simple implementation running and to speed it up later. Please see the implementation section of "http://en.wikipedia.org/wiki/Suffix_tree" for more information.

If the Boost community feels this is a worthy addition, I'll write up a proposal as soon as I can.

Many thanks for your consideration,
Chad

On Mon, Mar 21, 2011 at 11:35 PM, Dave Abrahams <dave@boostpro.com> wrote:
At Tue, 22 Mar 2011 00:12:24 +0100, Mathias Gaunard wrote:
On 21/03/2011 18:45, Dave Abrahams wrote:
Typically what you do with a generic library is that you write generic algorithms for arbitrary datatypes and then specialize them to dispatch to LAPACK when the types can be handled directly. That way you can still get a reasonably-performant matrix-of-matrices or matrix-of-rationals and get screamin'-hot performance for floats and doubles.
From my experience I found that when efficiency is one of the main concerns, it often worked better the other way around: write the algorithm optimally, and then see how you can generalize it.
Well naturally; the generic programming process *starts* with multiple optimal non-general implementations, and lifts them repeatedly to higher levels of abstraction... [http://www.generic-programming.org/about/intro/lifting.php]
also, writing multiple implementations requires time, which is somewhat restricted during a GSoC.
...but as the generic programmer, you don't have to write them yourself; you generally look around and find examples that people smarter than you have written.
Of course, that takes time too :-)
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
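To make the "simple implementation first" point from the suffix tree proposal concrete: a naive construction that inserts every suffix into a trie is only a screenful of code, and could later be replaced by Ukkonen's linear-time compressed construction. A minimal sketch (this builds a suffix trie, not the compressed tree; it runs in O(n^2), and all names are illustrative):

    #include <map>
    #include <string>
    #include <vector>

    // Nodes live in a vector; each node maps an edge character to a child index.
    struct suffix_trie {
        std::vector<std::map<char, int> > nodes;
        suffix_trie() : nodes(1) {}  // node 0 is the root

        void insert_suffixes(const std::string& text) {
            for (std::size_t i = 0; i < text.size(); ++i) {
                int cur = 0;
                for (std::size_t j = i; j < text.size(); ++j) {
                    std::map<char, int>::const_iterator it =
                        nodes[cur].find(text[j]);
                    if (it == nodes[cur].end()) {
                        int next = static_cast<int>(nodes.size());
                        nodes.push_back(std::map<char, int>());
                        nodes[cur][text[j]] = next;  // re-index after push_back
                        cur = next;
                    } else {
                        cur = it->second;
                    }
                }
            }
        }

        // Every substring of the text is a prefix of some suffix,
        // so substring queries are a walk from the root.
        bool contains(const std::string& pattern) const {
            int cur = 0;
            for (std::size_t i = 0; i < pattern.size(); ++i) {
                std::map<char, int>::const_iterator it =
                    nodes[cur].find(pattern[i]);
                if (it == nodes[cur].end())
                    return false;
                cur = it->second;
            }
            return true;
        }
    };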

Chad Seibert wrote:
I have a new proposal, however. There is no doubt that suffix trees are useful, and I think they would be a great addition to the Boost library. Over the GSOC timeframe, I'll implement a RAM-based suffix tree algorithm such as Ukkonen's algorithm. The nice thing about implementing Ukkonen's algorithm is that it isn't difficult to get a simple implementation running and to speed it up later. Please see the implementation section of "http://en.wikipedia.org/wiki/Suffix_tree" for more information.
I'm sure that a good suffix tree library would be welcomed. I can't say whether the amount of work is sane for a GSoC project, though.

Have you considered suffix arrays? Personally I have found more real-world applications for these, rather than trees. Actually the variant that I use currently is a suffix array that indexes only the start of each word; I'm not sure how practical it would be to make a generic suffix array that could be specialised with a predicate to match in that way.

Do you have a motivating application? Or at least something real to benchmark with? If you do, that makes it much easier to choose between implementation options etc.

Regards, Phil.

(p.s. you might like to change the message subject, if you want this new idea to get noticed...)
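For comparison, the naive form of the suffix array Phil describes is even shorter; this sketch sorts suffix start indices with a string comparator, purely as a correctness baseline (names illustrative):

    #include <algorithm>
    #include <string>
    #include <vector>

    // Orders suffixes of `s` lexicographically by start index.
    struct suffix_less {
        const std::string& s;
        explicit suffix_less(const std::string& str) : s(str) {}
        bool operator()(std::size_t a, std::size_t b) const {
            return s.compare(a, std::string::npos,
                             s, b, std::string::npos) < 0;
        }
    };

    // Naive O(n^2 log n) construction; production code would use a
    // linear-time algorithm. For the word-start variant Phil mentions,
    // populate `sa` with only the indices where a word begins.
    std::vector<std::size_t> build_suffix_array(const std::string& s) {
        std::vector<std::size_t> sa(s.size());
        for (std::size_t i = 0; i < sa.size(); ++i)
            sa[i] = i;
        std::sort(sa.begin(), sa.end(), suffix_less(s));
        return sa;
    }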

Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
I have to disagree here. We have Boost.MPI, which builds on top of existing MPI implementations instead of trying to reinvent MPI. Now the sandbox numeric bindings project (<http://svn.boost.org/svn/boost/sandbox/numeric_bindings>) probably will never be able to compete with Boost.MPI in terms of perfection, but it seems to be quite usable nevertheless.

There are projects like NT2 or Eigen, which try to do better than or differently from BLAS/LAPACK, and have succeeded more or less. However, I don't see what benefit it would bring to try to compete with them during a GSOC project.

Regards,
Thomas

Right then. Perhaps the best alternative is to write a wrapper for common LP and LS solvers, based on the assumption that no combination of our efforts will ever outmatch other open-source implementations. Does that seem a reasonable summer project? Write a wrapper around LAPACK, similar to LAPACK++?

On Mon, Mar 21, 2011 at 9:37 AM, Thomas Klimpel <Thomas.Klimpel@synopsys.com> wrote:
Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
I have to disagree here. We have Boost.MPI, which builds on top of existing MPI implementations instead of trying to reinvent MPI. Now the sandbox numeric bindings project (<http://svn.boost.org/svn/boost/sandbox/numeric_bindings>) probably will never be able to compete with Boost.MPI in terms of perfection, but it seems to be quite usable nevertheless. There are projects like NT2 or Eigen, which try to do better than or differently from BLAS/LAPACK, and have succeeded more or less. However, I don't see what benefit it would bring to try to compete with them during a GSOC project.
Regards, Thomas

Chad Seibert wrote:
Right then. Perhaps the best alternative is to write a wrapper for common LP and LS solvers, based on the assumption that no combination of our efforts will ever outmatch other open-source implementations. Does that seem a reasonable summer project? Write a wrapper around LAPACK, similar to LAPACK++?
A "modern" C++ DSEL for specifying linear constraints would be a very elegant piece of template metaprogramming, but special care would have to be taken for error handling. If you SFINAE guard every interface such that nothing that shouldn't compile ever matches any function your errors would be "no function F at line #". Since we have standard file formats for LP and ILP the first goal could be to just emit a standard format for consumption by an off the shelf solver. Even that modest scope would be very challenging to complete in the GSOC time frame, but we expect a successful student project to continue after the summer and evolve into an active member of the boost community. Regards, Luke

On Mon, Mar 21, 2011 at 3:37 PM, Thomas Klimpel <Thomas.Klimpel@synopsys.com> wrote:
Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
I have to disagree here. We have Boost.MPI, which builds on top of existing MPI implementations instead of trying to reinvent MPI. Now the sandbox numeric bindings project (<http://svn.boost.org/svn/boost/sandbox/numeric_bindings>) probably will never be able to compete with Boost.MPI in terms of perfection, but it seems to be quite usable nevertheless. There are projects like NT2 or Eigen, which try to do better than or differently from BLAS/LAPACK, and have succeeded more or less. However, I don't see what benefit it would bring to try to compete with them during a GSOC project.
+1 for this GSOC proposal!!

I agree with Thomas. Since the main goal is to build an LP library, I would suggest that Chad use an already existing linear algebra library and focus his attention on the implementation of LP algorithms. You can always swap the underlying implementation for a different one later.

As far as linear algebra libraries are concerned, uBLAS currently has an LU(P) solver. As for Boost.NumericBindings, it is a valuable friend for integrating uBLAS (and not only uBLAS) with LAPACK/BLAS/... However:

* Boost.NumericBindings is still not part of Boost. The risk here is that when this GSOC project is reviewed for inclusion in Boost, it could be rejected because it relies on a "beta" library.
* Use of LAPACK/BLAS/... would add a dependency on third-party libraries. Furthermore, such libraries are written in FORTRAN, so it would be better to use a C port.

So, my opinion is to initially rely as much as possible on uBLAS for handling the linear algebra. ;)

Best,
-- Marco
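For reference, the uBLAS LU(P) solver Marco mentions is used roughly like this (these are the actual calls from boost/numeric/ublas/lu.hpp; the wrapper function name is illustrative and error handling is minimal):

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/lu.hpp>

    namespace ublas = boost::numeric::ublas;

    // Solve A x = b in place: A is overwritten by its LU factors and
    // b by the solution. Returns false if a zero pivot is encountered.
    bool lu_solve(ublas::matrix<double>& A, ublas::vector<double>& b)
    {
        ublas::permutation_matrix<std::size_t> pm(A.size1());
        if (ublas::lu_factorize(A, pm) != 0)
            return false;                 // matrix is singular
        ublas::lu_substitute(A, pm, b);   // forward + back substitution
        return true;
    }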

On 21/03/2011 15:37, Thomas Klimpel wrote:
Chad Seibert wrote:
Yes, it does. However, it seems inappropriate to write an LP library for Boost that uses LAPACK to do the linear algebra stuff. If we wanted to go down that road, we could just write a wrapper for an existing LP library instead, like what OpenOpt does.
There are projects like NT2 or Eigen, which try to do better than or differently from BLAS/LAPACK, and have succeeded more or less.
NT2 depends on BLAS/LAPACK, it does not replace it.

Mathias Gaunard wrote:
On 21/03/2011 12:00, Chad Seibert wrote:
If the Boost community thinks that these are reasonable and I should forgo solving LP in favor of solving large (> 100,000x100,000) LS, we can certainly discuss this. The following algorithms are up for consideration:
* LU decomposition for small matrices
* Iterative methods for large matrices, such as conjugate gradient for positive definite matrices
Doesn't LAPACK already do that?
LAPACK certainly does the small matrices part (note how Chad has defined "small"). The iterative methods part for large matrices seems a bit questionable to me in this context, because I see no reason why the matrices should be positive definite, or have any other "guaranteed" property that would assure convergence of iterative methods.

On the other hand, LP matrices often have few non-zero elements, so sparse direct methods could be used when LAPACK is no longer appropriate. I think even an ILU preconditioner for an iterative method would still require methods related to these sparse direct methods. There are libraries for this, like UMFPACK, MUMPS, SuperLU and others.

Sorry for being a bit sloppy here, but being more precise (and verifying that my answers are relatively correct) would cost me more time than I want to spend on this answer right now.

Regards,
Thomas
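On the sparse side, uBLAS at least covers the storage: a compressed matrix keeps memory proportional to the number of nonzeros, while the factorizations themselves (sparse direct or ILU) would still come from libraries like UMFPACK via bindings. A small sketch, with the matrix pattern and sizes chosen only for illustration:

    #include <boost/numeric/ublas/matrix_sparse.hpp>
    #include <boost/numeric/ublas/vector.hpp>
    #include <boost/numeric/ublas/operation.hpp>

    namespace ublas = boost::numeric::ublas;

    int main() {
        // Compressed-row storage: memory grows with the nonzeros,
        // not with n^2, which is what makes very large systems tractable.
        const std::size_t n = 100000;
        ublas::compressed_matrix<double> A(n, n, 3 * n); // ~3 nnz per row

        // A tridiagonal example; inserting in row-major order is fast.
        for (std::size_t i = 0; i < n; ++i) {
            if (i > 0)     A(i, i - 1) = -1.0;
            A(i, i) = 4.0;
            if (i + 1 < n) A(i, i + 1) = -1.0;
        }

        ublas::vector<double> x = ublas::scalar_vector<double>(n, 1.0);
        ublas::vector<double> y(n);
        ublas::axpy_prod(A, x, y, true);  // y = A x, sparse-aware product
        return 0;
    }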
participants (7)
- Chad Seibert
- Dave Abrahams
- Mathias Gaunard
- Phil Endecott
- sguazt
- Simonson, Lucanus J
- Thomas Klimpel