
Hi Dave, hi all,

I read with interest your plans for reviving MTL. It is certainly a shame that development of MTL has been stagnant for so long. It certainly had many interesting design features and I used it for all my work for many years. That said I switched to using and developing uBLAS long ago and it has an excellent design.

On Saturday 05 June 2004 13:53, David Abrahams wrote:
"Neal D. Becker" <ndbecker2@verizon.net> writes:
David Abrahams wrote:
Tom Brinkman <reportbase@yahoo.com> writes:
In short, let's tell the world that this is the premier place for advanced mathematical library development, as I believe that it could be.
Comments?
1. I think oonumerics.org has the jump on us here, though the site seems to be down at this moment.
2. I will be starting work on a rewrite of MTL (the Matrix Template Library) for linear algebra soon.
What do you plan for MTL? How is it different than ublas?
MTL is aimed at linear algebra, whereas IIUC ublas is not.

Well the L and A in uBLAS certainly stand for Linear Algebra! Of course the B stands for Basic and uBLAS's primary aim is to provide the standard set of BLAS functions in a modern C++ environment. Of course as it stands the complete uBLAS library is more than just the BLAS functions and includes some common Linear Algebra algorithms and many useful types. That said I think it is important to separate BLAS functions from domain-specific linear algebra algorithm development. This is something that has proved itself since the seventies.

There's a lot more to what's in the current plan than I can lay out here, but the focus will be on support for different kinds of matrices with combinations of these aspects:

o Shape
o Orientation
o Symmetry
o Sparsity
o Blocking
o Upper / Lower storage
o Unit diagonal

Other than the many forms of blocking (other than banded) uBLAS supports all these in its design. This really is its strength! To a large extent you can even combine these properties where it makes mathematical sense. For example you can wrap up one of a number of sparse matrix types in a symmetric adaptor.
and operations like:
o scalar-vector (vector-scalar) multiplication
o vector addition (and subtraction)
o apply linear operator (left)
o norm
o inner product
o triangular solve

Other than 'apply linear operator' these are all in uBLAS!
with expression templates, and of course, zero abstraction penalty ;-)

Of course uBLAS does this all with ET, but the abstraction penalty may not be zero :-)
Other than the lack of ET in the current MTL, the big difference between the two libraries is the definition of iterators. Neither design seems to be perfect with regard to efficiency. Since uBLAS is already in Boost and has a well established and clean user syntax it would seem strange to ignore it. From the perspective of building further Linear Algebra algorithms it would not be too hard to use the syntax sufficiently portably so that a future MTL with expression templates could be used interchangeably.

Michael

--
___________________________________
Michael Stevens
Systems Engineering
Navigation Systems, Estimation and Bayesian Filtering
http://bayesclasses.sf.net
___________________________________

Michael Stevens <Michael.Stevens@epost.de> writes:
Hi Dave, hi all,
I read with interest your plans for reviving MTL. It is certainly a shame that development of MTL has been stagnant for so long. It certainly had many interesting design features and I used it for all my work for many years. That said I switched to using and developing uBLAS long ago and it has an excellent design.
Jeremy and I have just completed a re-evaluation of uBlas based on what's in uBlas' own CVS repository, having not discovered that until recently (you should have that info on uBlas' Boost page!) We have some major complaints with uBlas' design. The list is long, and the issues run deep enough that we don't believe that uBlas is a suitable foundation for the work we want to do. Here is a partial list of things we take issue with:

Interface Design
----------------

* Not grounded in Generic Programming. The concept taxonomies, to the extent they exist, are weak, and poorly/incorrectly documented. Aspects of the design that should be generic are not (e.g. only certain storage containers are supported, rather than supplying storage concept requirements). No linear algebra concepts (vector space, field, etc.) The library is so out-of-conformance with our expectations for Generic Programming that this one item by itself is probably enough to make it unsuitable for us.

* Redundant specification of element type in matrix/vector storage.

* size1 and size2 should be named num_rows and num_columns or something mnemonic.

* iterator1 and iterator2 should be named column_iterator and row_iterator or something mnemonic.

* prod should be named operator*; this is a linear algebra library after all.

* begin() and end() should never violate O(1) complexity expectations.

* insert(i,x) and erase(i) names used inconsistently with standard library.

* Matrix/Vector concept/class interfaces are way too "fat" and need to be minimized (e.g. rbegin/rend *member* functions should be eliminated).

* The slice interface is wrong; stride should come last and be optional; 2nd argument should be end and not size; then a separate range interface could be eliminated.

* No support for unordered sparse formats -- it can't be made to fit into the uBlas framework.

Implementation
--------------

* Expressions that require temporaries are not supported by uBLAS under release mode.
They are supported under debug mode. For example, the following program compiles under debug mode, but not under release mode.

    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/io.hpp>

    int main () {
        using namespace boost::numeric::ublas;
        matrix<double> m (3, 3);
        vector<double> v (3);
        for (unsigned i = 0; i < std::min (m.size1 (), v.size ()); ++ i) {
            for (unsigned j = 0; j < m.size2 (); ++ j)
                m (i, j) = 3 * i + j;
            v (i) = i;
        }
        std::cout << prod (prod(v, m), m) << std::endl;
    }

The workaround to make it compile under release mode is to explicitly insert the creation of a temporary:

    std::cout << prod (vector<double>(prod(v, m)), m) << std::endl;

There should be no such surprises when moving from debug to release. Debug mode should use expression templates, too, as the differences can cause other surprises.

* Should use iterator_adaptor. There is a ton of boilerplate iterator code in the uBLAS that needs to be deleted.

* Should use enable_if instead of CRTP to implement operators. uBLAS avoids the ambiguity problem by only using operator* for vector-scalar, matrix-scalar ops, but that's only a partial solution. Its expressions can't interact with objects from other libraries (e.g. multi-array) because they require the intrusive CRTP base class.

* Certain operations, especially on sparse matrices and vectors, and when dealing with matrix_row and matrix_column proxies, have the wrong complexity. Functions such as begin() and end() are supposed to be constant time. I see calls to find1 and find2, which look like expensive functions (they each contain a loop).

Testing
-------

* There should be a readme describing the organization of the tests.

* Tests should not print stuff that must be manually inspected for correctness.

* Test programs should instead either complete successfully (with exit code 0) or not (and print why it failed).

* Abstraction penalty tests need to compare the library with tuned fortran BLAS and ATLAS, not naive 'C' implementation.
Documentation
-------------

* In really bad shape. Redundant boilerplate tables make the head spin rather than providing useful information.

* Needs to be user-centered, not implementation-centered.

* Need a good set of tutorials.

Compilers
---------

* Simple example doesn't compile with gcc 3.3. Got it to compile by #if'ing out operator value_type() in vector_expression.hpp.
On Saturday 05 June 2004 13:53, David Abrahams wrote:
What do you plan for MTL? How is it different than ublas?
MTL is aimed at linear algebra, whereas IIUC ublas is not.
Well the L and A in uBLAS certainly stand for Linear Algebra! Of course the B stands for Basic and uBLAS's primary aim is to provide the standard set of BLAS functions in a modern C++ environment. Of course as it stands the complete uBLAS library is more than just the BLAS functions and includes some common Linear Algebra algorithms and many useful types.
Whoops. Of course that's correct.
That said I think it is important to separate BLAS functions from domain-specific linear algebra algorithm development. This is something that has proved itself since the seventies.
There's a lot more to what's in the current plan than I can lay out here, but the focus will be on support for different kinds of matrices with combinations of these aspects
o Shape
o Orientation
o Symmetry
o Sparsity
o Blocking
o Upper / Lower storage
o Unit diagonal
Other than the many forms of blocking (other than banded) uBLAS supports all these in its design.
I believe that a design with really good support for blocking can't be easily grafted onto an existing design that doesn't have it.
This really is its strength! To a large extent you can even combine these properties where it makes mathematical sense. For example you can wrap up one of a number of sparse matrix types in a symmetric adaptor.
This stuff was all present in MTL2 IIRC.
and operations like:
o scalar-vector (vector-scalar) multiplication
o vector addition (and subtraction)
o apply linear operator (left)
o norm
o inner product
o triangular solve

Other than 'apply linear operator' these are all in uBLAS!
with expression templates, and of course, zero abstraction penalty ;-)

Of course uBLAS does this all with ET, but the abstraction penalty may not be zero :-)
Other than the lack of ET in the current MTL, the big difference between the two libraries is the definition of iterators. Neither design seems to be perfect with regard to efficiency.
No, and I have some ideas for addressing that.
Since uBLAS is already in Boost and has a well established and clean user syntax it would seem strange to ignore it.
Yeah, I stopped ignoring it long enough to determine for sure that we should probably ignore it :(.
From the perspective of building further Linear Algebra algorithms it would not be too hard to use the syntax sufficiently portably so that a future MTL with expression templates could be used interchangeably.
We have some problems with the syntax too, as you can see from the above. That said, if the design of MTL makes sparing use of members and instead relies on free functions, you should be able to make uBlas syntax adapters ;-)

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

David Abrahams <dave@boost-consulting.com> wrote:
* prod should be named operator*; this is a linear algebra library after all.
Actually, operator* can be thought of as either an inner product or an outer product. I'm not familiar with the innards of uBLAS, but that is a problem common to most linear algebra libraries.

Regards,
Walter Landry
wlandry@ucsd.edu

Walter Landry <wlandry@ucsd.edu> writes:
David Abrahams <dave@boost-consulting.com> wrote:
* prod should be named operator*; this is a linear algebra library after all.
Actually, operator* can be thought of as either an inner product or an outer product. I'm not familiar with the innards of uBLAS, but that is a problem common to most linear algebra libraries.
IIUC in the domain of linear algebra when mathematicians write AB for 2D matrices A and B, they mean ordinary matrix multiplication (I think inner product) yielding another 2D matrix, and that's what operator* should mean in a linear algebra library. If you want an outer product yielding a 4D matrix I think you're basically outside the domain of linear algebra.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

"David Abrahams" <dave@boost-consulting.com> wrote in message news:uoemxdu50.fsf@boost-consulting.com...
Walter Landry <wlandry@ucsd.edu> writes:
David Abrahams <dave@boost-consulting.com> wrote:
* prod should be named operator*; this is a linear algebra library after all.
Actually, operator* can be thought of as either an inner product or an outer product. I'm not familiar with the innards of uBLAS, but that is a problem common to most linear algebra libraries.
IIUC in the domain of linear algebra when mathematicians write
AB
for 2D matrices A and B, they mean ordinary matrix multiplication (I think inner product) yielding another 2D matrix, and that's what operator* should mean in a linear algebra library. If you want an outer product yielding a 4D matrix I think you're basically outside the domain of linear algebra.
Actually, taking a look at mathworld.wolfram.com and a few other sites, I have reached the tentative conclusion that outer products can only be formed over vectors, not matrices. Matrix multiplication is well-defined and unambiguous, whereas vector multiplication comes in at least four varieties: dot product, cross product (also called the "outer product"), perpendicular dot product, and vector direct product.

Hence, it would make sense that operator*(matrix, matrix) would mean "matrix multiplication". However, operator*(vector, vector) is the problem child. Using the most literal symbolism, it seems that it should mean "dot product". It is unfortunate that there is no operatorX in C++, and that such user-defined operators cannot be created. In this case, I don't see a problem with Mr. Virgilio's suggestion to use operator^ for cross product, except possibly precedence rules and/or the fact that there is such a thing as a "wedge product" which uses roughly the same symbol. However, it is not specific to linear algebra, so it shouldn't be a significant problem.

Dave

Hi David,

On Jul 6, 2004, at 10:24 AM, David B. Held wrote:
Actually, taking a look at mathworld.wolfram.com and a few other sites, I have reached the tentative conclusion that outer products can only be formed over vectors, not matrices. Matrix multiplication is well-defined and unambiguous, whereas vector multiplication comes in at least four varieties: dot product, cross product (also called the "outer product"), perpendicular dot product, and vector direct product.
Thanks for the concise summary!
Hence, it would make sense that operator*(matrix, matrix) would mean "matrix multiplication".
yes.
However, operator*(vector, vector) is the problem child.
Yes, there is no obvious "right choice" for vectors. There are basically two options, both of which are fine by me.

1. Use names such as inner_product (or dot_product), outer_product, etc. This approach is "true" to the mathematics and completely unambiguous.

2. Take the Matlab approach (which also appears in many textbooks), and treat vectors as degenerate matrices: as either row or column vectors. In this case, operator* on these vectors just has the usual meaning it does for matrices. Whether the vectors are row or column vectors determines whether operator* is equivalent to inner or outer product. This approach is more concise and fits with existing practice.

Cheers,
Jeremy

_______________________________________________
Jeremy Siek <jsiek@osl.iu.edu>
http://www.osl.iu.edu/~jsiek
Ph.D. Student, Indiana University Bloomington
C++ Booster (http://www.boost.org)
_______________________________________________

On Tue, Jul 06, 2004 at 10:24:49AM -0500, David B. Held wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:uoemxdu50.fsf@boost-consulting.com...
Walter Landry <wlandry@ucsd.edu> writes:
David Abrahams <dave@boost-consulting.com> wrote:
* prod should be named operator*; this is a linear algebra library after all.
Actually, operator* can be thought of as either an inner product or an outer product. I'm not familiar with the innards of uBLAS, but that is a problem common to most linear algebra libraries.
IIUC in the domain of linear algebra when mathematicians write
AB
for 2D matrices A and B, they mean ordinary matrix multiplication (I think inner product) yielding another 2D matrix, and that's what operator* should mean in a linear algebra library. If you want an outer product yielding a 4D matrix I think you're basically outside the domain of linear algebra.
Actually, taking a look at mathworld.wolfram.com and a few other sites, I have reached the tentative conclusion that outer products can only be formed over vectors, not matrices. Matrix multiplication is well-defined and unambiguous, whereas vector multiplication comes in at least four varieties: dot product, cross product (also called the "outer product"), perpendicular dot product, and vector direct product.
You forgot the "Kronecker product", which is the tensor product explicitly written down for matrices. Let A = (a_{i,j}) be an m by n matrix and B a p by q matrix. Then the Kronecker product A x B is the following block matrix:

            / a_{1,1}B ... a_{1,n}B \
            |     .    .       .    |
    A x B = |     .     .      .    |
            |     .      .     .    |
            \ a_{m,1}B ... a_{m,n}B /

The Kronecker product has its place in linear algebra, but I agree that in a C++ library operator* should be reserved for the "common" matrix product.

Christoph

--
http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/cludwig.html
LiDIA: http://www.informatik.tu-darmstadt.de/TI/LiDIA/Welcome.html

Walter Landry wrote:
David Abrahams <dave@boost-consulting.com> wrote:
* prod should be named operator*; this is a linear algebra library after all.
Actually, operator* can be thought of as either an inner product or an outer product. I'm not familiar with the innards of uBLAS, but that is a problem common to most linear algebra libraries.
Regards,
Walter Landry
wlandry@ucsd.edu
I find that operator^ is an expressive syntax for the outer product. Of course you lose a terse syntax for exponentiation; but perhaps that is not important for the operands involved. There is prior art, even on this list, in Jaap Suter's Geometric Algebra library (http://www.jaapsuter.com/).

Regards,

--
Vince Virgilio

I think Dave's evaluation of uBlas should be closely looked at and we should address the issues mentioned. So I would like to comment so that we can take these issues further and really tackle these problems.

David Abrahams wrote:
Jeremy and I have just completed a re-evaluation of uBlas based on what's in uBlas' own CVS repository, having not discovered that until recently (you should have that info on uBlas' Boost page!) We have some major complaints with uBlas' design. The list is long, and the issues run deep enough that we don't believe that uBlas is a suitable foundation for the work we want to do.

However, I regret that you never provided your feedback before.
Here is a partial list of things we take issue with:
Interface Design
----------------
* Not grounded in Generic Programming. The concept taxonomies, to the extent they exist, are weak, and poorly/incorrectly documented. Aspects of the design that should be generic are not (e.g. only certain storage containers are supported, rather than supplying storage concept requirements). No linear algebra concepts (vector space, field, etc.) The library is so out-of-conformance with our expectations for Generic Programming that this one item by itself is probably enough to make it unsuitable for us.
The problem regarding documentation regularly comes up on the ml. Some people started with an alternative documentation for uBlas. Maybe it's time to merge and rework all uBlas documentation. And indeed documentation is crucial for developing generic concepts.
* Redundant specification of element type in matrix/vector storage.
Could you elaborate ?
* size1 and size2 should be named num_rows and num_columns or something mnemonic
This is just a matter of documentation. AFAIK there is no accepted concept that demands 'num_rows' and 'num_columns'. The '1' and '2' refer to the first and second index, which is also a nice convention IMO.
* iterator1 and iterator2 should be named column_iterator and row_iterator or something mnemonic.
same as above.
* prod should be named operator*; this is a linear algebra library after all.
This also came up a few times and we never agreed how users could clearly distinguish the product and the element-product, so it was decided to use more explicit function names. But the most important thing about operator overloading is being able to use generic algorithms that exploit the given operator. Not having operator* AFAIK never prohibited the reuse of generic LA algorithms.
* begin() and end() should never violate O(1) complexity expectations.
That would indeed be ideal, but why exactly do you insist on O(1) complexity?
* insert(i,x) and erase(i) names used inconsistently with standard library.
agreed. Again this could be solved by generating proper documentation that is also in line with the documentation of the standard library.
* Matrix/Vector concept/class interfaces are way too "fat" and need to be minimized (e.g. rbegin/rend *member* functions should be eliminated).
agreed.
* The slice interface is wrong; stride should come last and be optional; 2nd argument should be end and not size; then a separate range interface could be eliminated.
I'm not convinced slice and range could be united. A range is a special case that can more easily optimised compared to the slice implementation.
* No support for unordered sparse formats -- it can't be made to fit into the uBlas framework.
I'm not qualified to comment here.
Implementation
--------------
* Expressions that require temporaries are not supported by uBLAS under release mode. They are supported under debug mode. For example, the following program compiles under debug mode, but not under release mode.
    #include <boost/numeric/ublas/matrix.hpp>
    #include <boost/numeric/ublas/io.hpp>

    int main () {
        using namespace boost::numeric::ublas;
        matrix<double> m (3, 3);
        vector<double> v (3);
        for (unsigned i = 0; i < std::min (m.size1 (), v.size ()); ++ i) {
            for (unsigned j = 0; j < m.size2 (); ++ j)
                m (i, j) = 3 * i + j;
            v (i) = i;
        }
        std::cout << prod (prod(v, m), m) << std::endl;
    }
The workaround to make it compile under release mode is to explicitly insert the creation of a temporary:
std::cout << prod (vector<double>(prod(v, m)), m) << std::endl;
There should be no such surprises when moving from debug to release. Debug mode should use expression templates, too, as the differences can cause other surprises.
Agreed. The semantics in debug and release mode should be identical.
* Should use iterator_adaptor. There is a ton of boilerplate iterator code in the uBLAS that needs to be deleted.
uBlas originates from before the new iterator_adaptor lib. But indeed we might need to look at using the iterator_adaptor library.
* Should use enable_if instead of CRTP to implement operators. uBLAS avoids the ambiguity problem by only using operator* for vector-scalar, matrix-scalar ops, but that's only a partial solution. Its expressions can't interact with objects from other libraries (e.g. multi-array) because they require the intrusive CRTP base class.
True, but operator* between uBlas classes and multi_arrays is used very frequently.
Testing
-------

* There should be a readme describing the organization of the tests.
Indeed, but this is a problem that can be solved (and must be dealt with).
* Tests should not print stuff that must be manually inspected for correctness.
Does it really bother you?
* Test programs should instead either complete successfully (with exit code 0) or not (and print why it failed).
agreed.
Documentation
-------------
* In really bad shape. Redundant boilerplate tables make the head spin rather than providing useful information.
* Needs to be user-centered, not implementation-centered.
* Need a good set of tutorials.
all agreed. Man, we really need to improve our documentation!

Toon Knapen <toon.knapen@fft.be> writes:
I think Dave's evaluation of uBlas should be closely looked at and we should address the issues mentioned. So I would like to comment so that we can take these issues further and really tackle these problems.
[small request: please leave a blank line between quoted text and your replies. I had to edit this one manually]
David Abrahams wrote:
Jeremy and I have just completed a re-evaluation of uBlas based on what's in uBlas' own CVS repository, having not discovered that until recently (you should have that info on uBlas' Boost page!) We have some major complaints with uBlas' design. The list is long, and the issues run deep enough that we don't believe that uBlas is a suitable foundation for the work we want to do.
I however regret that you never provided your feedback before.
I didn't have the time at the time ;-). I will note that Jeremy tried to make the points about generic programming with the library's original authors and they just didn't seem to "get it".
Here is a partial list of things we take issue with:
Interface Design
----------------
* Not grounded in Generic Programming. The concept taxonomies, to the extent they exist, are weak, and poorly/incorrectly documented. Aspects of the design that should be generic are not (e.g. only certain storage containers are supported, rather than supplying storage concept requirements). No linear algebra concepts (vector space, field, etc.) The library is so out-of-conformance with our expectations for Generic Programming that this one item by itself is probably enough to make it unsuitable for us.
The problem regarding documentation regularly comes up on the ml. Some people started with an alternative documentation for uBlas. Maybe it's time to merge and rework all uBlas documentation. And indeed documentation is crucial for developing generic concepts.
This isn't just about the documentation. Implementations founded on this kind of un-rigorous specification are almost certainly not actually generic. Combined with the other serious points we've raised, it begins to look like a complete rewrite would be needed.
* Redundant specification of element type in matrix/vector storage.
Could you elaborate ?
When you build a matrix type, you pass it a storage parameter that also has knowledge of the matrix element type, and they probably (the lack of rigor in the docs doesn't make it very certain) have to match. A small issue.
* size1 and size2 should be named num_rows and num_columns or something mnemonic
This is just a matter of documentation.
Disagree.
AFAIK there is no accepted concept that demands 'num_rows' and 'num_columns'. The '1' and '2' refer to the first and second index which is also a nice convention IMO.
These are indexing rows and columns of the matrix, so they should say so. In some ways this is an aesthetic issue, but there are some objective arguments that the current design is more confusing. It's possible to interpret "first index" various ways, considering the possibility of row-major and column-major indexing.
* iterator1 and iterator2 should be named column_iterator and row_iterator or something mnemonic.
same as above.
Indeed ;-)
* prod should be named operator*; this is a linear algebra library after all.
This also came up a few times and we never agreed how users could clearly distinguish the product and the element-product, so it was decided to use more explicit function names.
What is an element-product?
But the most important thing about operator-overloading is being able to use generic algorithms that exploit the given operator.
No, that's the most important thing about generic programming (and maybe function overloading). The most important thing about operator overloading is terse, recognizable expressivity.
Not having operator* AFAIK never prohibited the reuse of generic LA algorithms.
IIUC, the set of NxN matrices forms a Ring, as does the set of real numbers. You should be able to use the same multiplication operator to implement any computation that works on Rings.
* begin() and end() should never violate O(1) complexity expectations.
That would indeed be ideal, but why exactly do you insist on O(1) complexity?
Because of the expectation created by the standard. Certainly O(N) is way too much.
* insert(i,x) and erase(i) names used inconsistently with standard library.
agreed. Again this could be solved by generating proper documentation that is also in line with the documentation of the standard library.
Disagree. It's an aesthetic/understandability issue that can't be completely fixed with documentation.
* The slice interface is wrong; stride should come last and be optional; 2nd argument should be end and not size; then a separate range interface could be eliminated.
I'm not convinced slice and range could be united. A range is a special case that can more easily optimised compared to the slice implementation.
Maybe so. The interface to slice is still suboptimal.
* Should use enable_if instead of CRTP to implement operators. uBLAS avoids the ambiguity problem by only using operator* for vector-scalar, matrix-scalar ops, but that's only a partial solution. Its expressions can't interact with objects from other libraries (e.g. multi-array) because they require the intrusive CRTP base class.
True but operator* between uBlas classes and multi_arrays are used very frequently.
I must be misunderstanding something then. Maybe you get away with it because you've used prod() instead of operator* for matrix multiplication. How about operator+?
Testing
-------

* There should be a readme describing the organization of the tests.
Indeed, but this is a problem that can be solved (and must be dealt with).
* Tests should not print stuff that must be manually inspected for correctness.
Does it really bother you?
Absolutely, for the reasons you agree with below.
* Test programs should instead either complete successfully (with exit code 0) or not (and print why it failed).
agreed.
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Hi Toon, On Jul 5, 2004, at 10:54 AM, Toon Knapen wrote:
I think Dave's evaluation of uBlas should be closely looked at and we should address the issues mentioned. So I would like to comment so that we can further these isues and really tackle these problems.
David Abrahams wrote:
Jeremy and I have just completed a re-evaluation of uBlas based on what's in uBlas' own CVS repository, having not discovered that until recently (you should have that info on uBlas' Boost page!) We have some major complaints with uBlas' design. The list is long, and the issues run deep enough that we don't believe that uBlas is a suitable foundation for the work we want to do. I however regret that you never provided your feedback before.
I would have liked to, but was swamped with other work when uBLAS went through review (I was probably writing the BGL book).
Here is a partial list of things we take issue with:
Interface Design
----------------
* Not grounded in Generic Programming. The concept taxonomies, to the extent they exist, are weak, and poorly/incorrectly documented. Aspects of the design that should be generic are not (e.g. only certain storage containers are supported, rather than supplying storage concept requirements). No linear algebra concepts (vector space, field, etc.) The library is so out-of-conformance with our expectations for Generic Programming that this one item by itself is probably enough to make it unsuitable for us.

The problem regarding documentation regularly comes up on the ml. Some people started with an alternative documentation for uBlas. Maybe it's time to merge and rework all uBlas documentation. And indeed documentation is crucial for developing generic concepts.
* Redundant specification of element type in matrix/vector storage.
Could you elaborate ?
* size1 and size2 should be named num_rows and num_columns or something mnemonic
This is just a matter of documentation. AFAIK there is no accepted concept that demands 'num_rows' and 'num_columns'. The '1' and '2' refer to the first and second index which is also a nice convention IMO.
No, there are uBLAS concepts that refer to size1 and size2. For example, the Matrix Expression concept.
* iterator1 and iterator2 should be named column_iterator and row_iterator or something mnemonic.
same as above.
* prod should be named operator*; this is a linear algebra library after all.
This also came up a few times and we never agreed on how users could clearly distinguish the product and the element-product, so it was decided to use more explicit function names.
element-wise products do not play the same central role that products do in linear algebra. Thus, operator* should be used for product, and some other name, like element_prod for element-wise product.
But the most important thing about operator-overloading is being able to use generic algorithms that exploit the given operator. Not having operator* AFAIK never prohibited the reuse of generic LA algorithms.
* begin() and end() should never violate O(1) complexity expectations.
That would indeed be ideal, but why exactly do you insist on O(1) complexity?
Because the C++ Standard requirements for Container say constant time, it's what people expect, and because it follows the STL design style (e.g., std::list has no operator[]).
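To make the cost concrete, here is a sketch (plain std::vector, purely illustrative) of why the constant-time guarantee matters: idiomatic STL-style loops re-evaluate end() on every iteration, so a linear-time end() would silently turn a linear pass into a quadratic one.

```cpp
#include <cassert>
#include <vector>

// Idiomatic STL-style loop: end() is re-evaluated on every iteration.
// The Standard's constant-time guarantee is what keeps this an O(n)
// pass; with a find-based, linear-time end() the same loop would
// silently degrade to O(n^2).
double sum_elements(const std::vector<double>& v) {
    double s = 0.0;
    for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
        s += *it;
    return s;
}
```

Generic code written against the standard Container requirements assumes exactly this cost model, which is why a library that breaks it surprises its users.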
* insert(i,x) and erase(i) names used inconsistently with standard library.
Agreed. Again this could be solved by generating proper documentation that is also in line with the documentation of the standard library.
This is an interface change, so code would need to change too.
* Matrix/Vector concept/class interfaces are way too "fat" and need to be minimized (e.g. rbegin/rend *member* functions should be eliminated).
Agreed.
* The slice interface is wrong; stride should come last and be optional; 2nd argument should be end and not size; then a separate range interface could be eliminated.
I'm not convinced slice and range could be united. A range is a special case that can be more easily optimised than the general slice implementation.
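The proposed shape can be sketched as follows (a hypothetical interface, not uBLAS's actual slice/range classes): with the second argument taken as a past-the-end index rather than a size, and the stride last and defaulting to 1, a "range" is simply a slice with the stride omitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical unified interface (not uBLAS's actual slice/range classes):
// the second argument is a past-the-end index rather than a size, and the
// stride comes last, defaulting to 1 -- so a "range" is just a slice with
// the stride omitted.
struct slice {
    std::size_t start, finish, stride;
    slice(std::size_t b, std::size_t e, std::size_t s = 1)
        : start(b), finish(e), stride(s) {}
    // number of selected indices in [start, finish)
    std::size_t size() const { return (finish - start + stride - 1) / stride; }
    // i-th selected index
    std::size_t operator[](std::size_t i) const { return start + i * stride; }
};
```

Whether the stride-1 case can then still be optimised as aggressively as a dedicated range class is exactly the question raised above.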
* No support for unordered sparse formats -- it can't be made to fit into the uBlas framework.
I'm not qualified to comment here.
Implementation --------------
* Expressions that require temporaries are not supported by uBLAS under release mode. They are supported under debug mode. For example, the following program compiles under debug mode, but not under release mode.
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

int main () {
    using namespace boost::numeric::ublas;
    matrix<double> m (3, 3);
    vector<double> v (3);
    for (unsigned i = 0; i < std::min (m.size1 (), v.size ()); ++i) {
        for (unsigned j = 0; j < m.size2 (); ++j)
            m (i, j) = 3 * i + j;
        v (i) = i;
    }
    std::cout << prod (prod(v, m), m) << std::endl;
}
The workaround to make it compile under release mode is to explicitly insert the creation of a temporary:
std::cout << prod (vector<double>(prod(v, m)), m) << std::endl;
There should be no such surprises when moving from debug to release. Debug mode should use expression templates, too, as the differences can cause other surprises.
Agreed. The semantics in debug and release mode should be identical.
* Should use iterator_adaptor. There is a ton of boilerplate iterator code in uBLAS that needs to be deleted.
uBlas originates from before the new iterator_adaptor lib, but indeed we might need to look at using the iterator_adaptor library.
* Should use enable_if instead of CRTP to implement operators. uBLAS avoids the ambiguity problem by only using operator* for vector-scalar, matrix-scalar ops, but that's only a partial solution. Its expressions can't interact with objects from other libraries (e.g. multi-array) because they require the intrusive CRTP base class.
True, but operator* between uBlas classes and multi_arrays are used very frequently.
Testing ------- * There should be a readme describing the organization of the tests.
Indeed, but this is a problem that can be solved (and must be dealt with).
* Tests should not print stuff that must be manually inspected for correctness.
Does it really bother you ?
Yes, very much. It should bother you too.
* Test programs should instead either complete successfully (with exit code 0) or not (and print why it failed).
agreed.
Documentation -------------
* In really bad shape. Redundant boilerplate tables make the head spin rather than providing useful information.
* Needs to be user-centered, not implementation centered.
* Need a good set of tutorials.
all agreed. Man, we really need to improve our documentation!
_______________________________________________ Jeremy Siek <jsiek@osl.iu.edu> http://www.osl.iu.edu/~jsiek Ph.D. Candidate, Indiana University Bloomington C++ Booster (http://www.boost.org) _______________________________________________

I've been following this thread with some interest, as I have had a number of similar complaints with the design and implementation of uBlas and have been frustrated for years by the lack of a comprehensive and flexible framework for handling multidimensional array data in a manner which enables the average end user to ignore the details of implementation of storage scheme, etc... while providing a consistent interface with sufficient comprehensiveness to handle as many reasonable use cases as possible. It seems to me that a Boost matrix/linear algebra library could very profitably be built on top of a well designed policy-based multidimensional array class.

BTW, I am aware of the existence of Blitz++, which provides a number of interesting capabilities and is to be lauded for its demonstration that C++ multidimensional array performance need not be inferior to FORTRAN. However, it seems like the support of that library is weak at present and there appears to be no effort to drive toward standardization.

A couple of thoughts on such a D-dimensional array class :

1) it should implement as much of the STL one-dimensional container interface as possible so that algorithms which are unconcerned with the order of traversal can use an efficient 1D iterator interface - for example, if I want to replace every element of a matrix with its square, I should be able to do this :

matrix_type mat;
for (matrix_type::iterator it=mat.begin();it!=mat.end();++it)
    *it = (*it)*(*it);

The advantage here is that the algorithm doesn't need to know if the matrix is sparse or dense, as that logic is in the iterator, and applies equally well to higher rank arrays. Indexed element access can just be viewed as a mapping of a D-dimensional index to the corresponding 1D offset.

2) it should support two common syntaxes :

mat[i][j] - zero offset C-style access
mat(i,j) - possibly non-zero offset access

3) implementations for both statically allocated and dynamically allocated memory should be feasible.

4) for genericity (in algorithms which work for various values of D) there should also be an index type Index<D> which may be passed to operator() as an index :

mat(Index<2>(i,j)) - equivalent to mat(i,j)

5) should support various types of slicing : indirection, subranges, masking, etc...

6) should facilitate range checking if desired.

If there is interest, I have written some starting code that I would be happy to share which addresses most of these points to some extent and provides some other nice properties such as allowing transposition of arbitrary indices with no need to reorder the underlying storage (that is, mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves). I have functioning dense, sparse, and diagonal implementations (with a proxy reference type) as well as indirection, subrange, and mask slicing. The design uses a single policy template for the underlying implementation :

Array<T,R,Imp> - T is the value type of elements, R is the rank, and Imp contains the functional implementation details.

I'll see about shaping the code up this weekend and posting it on the Yahoo files section if there's interest....

Matthias
Jeremy and I have just completed a re-evaluation of uBlas based on what's in uBlas' own CVS repository, having not discovered that until recently (you should have that info on uBlas' Boost page!) We have some major complaints with uBlas' design. The list is long, and the issues run deep enough that we don't believe that uBlas is a suitable foundation for the work we want to do.
Here is a partial list of things we take issue with:
Interface Design ----------------
* Not grounded in Generic Programming. The concept taxonomies, to the extent they exist, are weak, and poorly/incorrectly documented. Aspects of the design that should be generic are not (e.g. only certain storage containers are supported, rather than supplying storage concept requirements). No linear algebra concepts (vector space, field, etc.) The library is so out-of-conformance with our expectations for Generic Programming that this one item by itself is probably enough to make it unsuitable for us.
* Redundant specification of element type in matrix/vector storage.
* size1 and size2 should be named num_rows and num_columns or something mnemonic
* iterator1 and iterator2 should be named column_iterator and row_iterator or something mnemonic.
* prod should be named operator*; this is a linear algebra library after all.
* begin() and end() should never violate O(1) complexity expectations.
* insert(i,x) and erase(i) names used inconsistently with standard library.
* Matrix/Vector concept/class interfaces are way too "fat" and need to be minimized (e.g. rbegin/rend *member* functions should be eliminated).
* The slice interface is wrong; stride should come last and be optional; 2nd argument should be end and not size; then a separate range interface could be eliminated.
* No support for unordered sparse formats -- it can't be made to fit into the uBlas framework.
Implementation --------------
* Expressions that require temporaries are not supported by uBLAS under release mode. They are supported under debug mode. For example, the following program compiles under debug mode, but not under release mode.
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

int main () {
    using namespace boost::numeric::ublas;
    matrix<double> m (3, 3);
    vector<double> v (3);
    for (unsigned i = 0; i < std::min (m.size1 (), v.size ()); ++i) {
        for (unsigned j = 0; j < m.size2 (); ++j)
            m (i, j) = 3 * i + j;
        v (i) = i;
    }
    std::cout << prod (prod(v, m), m) << std::endl;
}
The workaround to make it compile under release mode is to explicitly insert the creation of a temporary:
std::cout << prod (vector<double>(prod(v, m)), m) << std::endl;
There should be no such surprises when moving from debug to release. Debug mode should use expression templates, too, as the differences can cause other surprises.
* Should use iterator_adaptor. There is a ton of boilerplate iterator code in the uBLAS that needs to be deleted.
* Should use enable_if instead of CRTP to implement operators. uBLAS avoids the ambiguity problem by only using operator* for vector-scalar, matrix-scalar ops, but that's only a partial solution. Its expressions can't interact with objects from other libraries (e.g. multi-array) because they require the intrusive CRTP base class.
* Certain operations have the wrong complexity, especially on sparse matrices and vectors and when dealing with matrix_row and matrix_column proxies. Functions such as begin() and end() are supposed to be constant time. I see calls to find1 and find2, which look like expensive functions (they each contain a loop).
Testing ------- * There should be a readme describing the organization of the tests.
* Tests should not print stuff that must be manually inspected for correctness.
* Test programs should instead either complete successfully (with exit code 0) or not (and print why it failed).
* Abstraction penalty tests need to compare the library with tuned Fortran BLAS and ATLAS, not a naive 'C' implementation.
Documentation -------------
* In really bad shape. Redundant boilerplate tables make the head spin rather than providing useful information.
* Needs to be user-centered, not implementation centered.
* Need a good set of tutorials.
Compilers ---------
* Simple example doesn't compile with gcc 3.3. Got it to compile by #if'ing out operator value_type() in vector_expression.hpp.
On Saturday 05 June 2004 13:53, David Abrahams wrote:
What do you plan for MTL? How is it different than ublas?
MTL is aimed at linear algebra, whereas IIUC ublas is not.
Well the L and A in uBLAS certainly stand for Linear Algebra! Of course the B stands for Basic, and uBLAS's primary aim is to provide the standard set of BLAS functions in a modern C++ environment. Of course, as it stands the complete uBLAS library is more than just the BLAS functions and includes some common Linear Algebra algorithms and many useful types.
Whoops. Of course that's correct.
That said, I think it is important to separate BLAS functions from domain-specific linear algebra algorithm development. This is something that has proved itself since the seventies.
There's a lot more to what's in the current plan than I can lay out here, but the focus will be on support for different kinds of matrices with combinations of these aspects
o Shape
o Orientation
o Symmetry
o Sparsity
o Blocking
o Upper / Lower storage
o Unit diagonal
Other than the many forms of blocking (other than banded), uBLAS supports all these in its design.
I believe that a design with really good support for blocking can't be easily grafted onto an existing design that doesn't have it.
This really is its strength! To a large extent you can even combine these properties where it makes mathematical sense. For example you can wrap up one of a number of sparse matrix types in a symmetric adaptor.
This stuff was all present in MTL2 IIRC.
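The adaptor idea under discussion can be sketched in a few lines (toy classes, hypothetical names -- not uBLAS's actual symmetric_adaptor or MTL2's machinery): only the lower triangle of the wrapped matrix is ever read, and queries above the diagonal are answered from the mirrored entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy illustration of the adaptor idea (hypothetical classes).  Only the
// lower triangle is stored; a read of (i,j) with j > i is answered from
// (j,i), so any underlying storage -- dense, packed, or sparse -- gains
// symmetry without duplicating elements.
struct lower_dense {
    std::size_t n;
    std::vector<double> d;                       // packed lower triangle
    explicit lower_dense(std::size_t n_) : n(n_), d(n_ * (n_ + 1) / 2) {}
    double& operator()(std::size_t i, std::size_t j)       { return d[i * (i + 1) / 2 + j]; }
    double  operator()(std::size_t i, std::size_t j) const { return d[i * (i + 1) / 2 + j]; }
};

template <class M>
struct symmetric_view {
    const M& m;
    explicit symmetric_view(const M& m_) : m(m_) {}
    double operator()(std::size_t i, std::size_t j) const {
        return j > i ? m(j, i) : m(i, j);        // mirror the upper triangle
    }
};
```

Because the view is parameterized on the wrapped matrix type, the same adaptor composes with dense or sparse storage alike, which is the combination-of-aspects point being made above.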
and operations like:
o scalar-vector (vector-scalar) multiplication
o vector addition (and subtraction)
o apply linear operator (left)
o norm
o inner product
o triangular solve

Other than 'apply linear operator', these are all in uBLAS!
with expression templates, and of course, zero abstraction penalty ;-)
Of course uBLAS does this all with ET, but the abstraction penalty may not be zero :-)
Other than the lack of ET in the current MTL, the big difference between the two libraries is the definition of iterators. Neither design seems to be perfect with regard to efficiency.
No, and I have some ideas for addressing that.
Since uBLAS is already in Boost and has a well established and clean user syntax it would seem strange to ignore it.
Yeah, I stopped ignoring it long enough to determine for sure that we should probably ignore it :(.
From the perspective of building further Linear Algebra algorithms it would not be too hard to use the syntax sufficiently portably so that a future MTL with expression templates could be used interchangeably.
We have some problems with the syntax too, as you can see from the above. That said, if the design of MTL makes sparing use of members and instead relies on free functions, you should be able to make uBlas syntax adapters ;-)
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com
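Since expression templates and their abstraction penalty come up repeatedly in this exchange, here is a deliberately minimal sketch of the technique (illustrative only; both uBLAS and any future MTL are far more elaborate than this): v1 + v2 performs no arithmetic, it merely builds a lightweight proxy, and the whole chain is evaluated in a single pass on assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal expression-template sketch.  v1 + v2 does no arithmetic: it
// returns a lightweight proxy holding references to its operands.
template <class L, class R>
struct add_expr {
    const L& l;
    const R& r;
    add_expr(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct vec {
    std::vector<double> d;
    explicit vec(std::size_t n) : d(n) {}
    double  operator[](std::size_t i) const { return d[i]; }
    double& operator[](std::size_t i)       { return d[i]; }
    std::size_t size() const { return d.size(); }

    // Evaluation happens here: one loop, no intermediate vectors.
    template <class E>
    vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) d[i] = e[i];
        return *this;
    }
};

// Builds the proxy; a chain a + b + c nests proxies at compile time.
template <class L, class R>
add_expr<L, R> operator+(const L& l, const R& r) { return add_expr<L, R>(l, r); }
```

The proxies hold references that are only valid for the duration of the assignment expression, which is one reason real ET libraries wrap far more machinery around this core idea.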
---------------------------------------------------------------------
Matthias Schabel, Ph.D.
Utah Center for Advanced Imaging Research
729 Arapeen Drive, Salt Lake City, UT 84108
801-587-9413 (work) / 801-585-3592 (fax) / 801-706-5760 (cell) / 801-484-0811 (home)
mschabel at ucair med utah edu

On 07/09/2004 04:24 PM, Matthias Schabel wrote: [snip]
A couple of thoughts on such a D-dimensional array class :
1) it should implement as much of the STL one-dimensional container interface as possible so that algorithms which are unconcerned with the order of traversal can use an efficient 1D iterator interface - for example, if I want to replace every element of a matrix with its square, I should be able to do this :
matrix_type mat;
for (matrix_type::iterator it=mat.begin();it!=mat.end();++it) *it = (*it)*(*it);
The advantage here is that the algorithm doesn't need to know if the matrix is sparse or dense, as that logic is in the iterator, and applies equally well to higher rank arrays. Indexed element access can just be viewed as a mapping of a D-dimensional index to the corresponding 1D offset.
2) it should support two common syntaxes :
mat[i][j] - zero offset C-style access mat(i,j) - possibly non-zero offset access
3) implementations for both statically allocated and dynamically allocated memory should be feasible.
4) for genericity (in algorithms which work for various values of D) there should also be an index type Index<D> which may be passed to operator() as an index :
mat(Index<2>(i,j)) - equivalent to mat(i,j)
5) should support various types of slicing : indirection, subranges, masking, etc...
6) should facilitate range checking if desired.
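Point 2 in the list above (supporting both mat(i,j) and mat[i][j]) is commonly implemented with a row proxy; a minimal sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of the dual-syntax idea (hypothetical class): mat(i,j)
// maps the 2D index directly, while mat[i] returns a row proxy whose own
// operator[] supplies the column, giving the C-style mat[i][j] spelling.
class dense_matrix {
    std::size_t rows_, cols_;
    std::vector<double> data_;
public:
    dense_matrix(std::size_t r, std::size_t c) : rows_(r), cols_(c), data_(r * c) {}

    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }

    class row_proxy {
        dense_matrix& m_;
        std::size_t i_;
    public:
        row_proxy(dense_matrix& m, std::size_t i) : m_(m), i_(i) {}
        double& operator[](std::size_t j) { return m_(i_, j); }
    };
    row_proxy operator[](std::size_t i) { return row_proxy(*this, i); }
};
```

The non-zero-offset variant of operator() mentioned in point 2 would simply subtract the base offsets before forming the linear index.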
If there is interest, I have written some starting code that I would be happy to share which addresses most of
I'm interested.
these points to some extent and provides some other nice properties such as allowing transposition of arbitrary indices with no need to reorder the underlying storage (that is mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves). I have
I'd be interested in seeing how this was done, especially in comparison with Timothy Budd's methods in _An Apl Compiler_. I had an implementation of this; however, it was only for dense matrices since that's all Budd had in his book.
functioning dense, sparse, and diagonal implementations (with a proxy reference type) as well as indirection, subrange, and mask slicing. The design uses a single policy template for the underlying implementation :
Array<T,R,Imp> - T is the value type of elements, R is the rank, and Imp contains the functional implementation details.
I'll see about shaping the code up this weekend and posting it on the Yahoo files section if there's interest....
I'd be very interested. I'm thinking about using a sparse matrix where T is symbol_set, where symbol_set is the set of terminal and non-terminal symbols in a grammar. This could be used by spirit in deriving LL(k) parser as described in: http://facweb.cs.depaul.edu/tchristopher/llkppr.pdf

these points to some extent and provides some other nice properties such as allowing transposition of arbitrary indices with no need to reorder the underlying storage (that is mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves). I have
I'd be interested in seeing how this was done, especially in comparison with Timothy Budd's methods in _An Apl Compiler_. I had an implementation of this; however, it was only for dense matrices since that's all Budd had in his book.
What I do is maintain a signature of permutations (essentially the index order): for example, in a 3D C array the signature would be (0,1,2) while a FORTRAN order array would be (2,1,0). An accessor class is responsible for converting between the multidimensional index and a one dimensional offset corresponding to a dense matrix; the allocator then uses this offset to get the element out of storage - if the matrix is dense, the allocator just passes the already computed offset along, while if it is sparse, a map is used to look for the existence of the specified element.

Matthias

On 07/12/2004 12:10 PM, Matthias Schabel wrote:
I'd be interested in seeing how this was done, especially in comparison with Timothy Budd's methods in _An Apl Compiler_. I had an implementation of this; however, it was only for dense matrices since that's all Budd had in his book.
What I do is maintain a signature of permutations (essentially the index order): for example, in a 3D C array the signature would be (0,1,2) while a FORTRAN order array would be (2,1,0). An accessor class is responsible for converting between the multidimensional index and a one dimensional offset corresponding to a dense matrix; the allocator then uses this offset to get the element out of storage - if the matrix is dense, the allocator just passes the already computed offset along, while if it is sparse, a map is used to look for the existence of the specified element.
In _An Apl Compiler_, an "expansion vector", e, is used. The size is the array's rank (what I think you term dimension). The transpose simply permutes (probably what you're doing with index order) the expansion vector (e.g. instead of e0,e1,e2, it would be e2,e1,e0). The expansion vector is just the "scan" of the "shape" in apl terms. I think that's right, or maybe it's the scan of the reverse of the shape. For example if the shape is 2,5,10, then the expansion vector would be 2, 10, 100, and the location of a[i0][i1][i2] would be a0[e0*i0+e1*i1+e2*i2], where a0 is the start of the array.
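The recollection above hedges on the exact form of the expansion vector. For a row-major (C-order) dense layout the per-axis weights work out to suffix products of the shape, and a transpose is indeed just a permutation of those weights. A sketch under that assumption (hypothetical helper names, not Budd's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major stride ("expansion vector") computation: the weight of each
// axis is the product of the extents of all later axes.  Transposing an
// array amounts to permuting this vector, with no data movement.
std::vector<std::size_t> strides(const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> e(shape.size(), 1);
    for (std::size_t k = shape.size(); k-- > 1; )
        e[k - 1] = e[k] * shape[k];
    return e;
}

// Dot product of weights and indices gives the 1D offset from the start.
std::size_t offset(const std::vector<std::size_t>& e,
                   const std::vector<std::size_t>& idx) {
    std::size_t off = 0;
    for (std::size_t k = 0; k < e.size(); ++k)
        off += e[k] * idx[k];
    return off;
}
```

For shape (2,5,10) this gives weights (50,10,1), and a column-major (FORTRAN) layout would use the mirror-image prefix products instead, which is likely the "scan of the reverse of the shape" alternative recalled above.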

Putting my two cents in on the issue of redefining the meaning of "operator,".

Summary: Bad idea.

The redefinition of "operator<<" for the I/O libraries is an entirely different situation. Before that was done, the specific meaning of the operator depended, however simply, on polymorphic resolution of the left operand. Use of the operator for streams -- though frequently confusing at first until it became, effectively, an accepted part of the language -- simply extended this fact rather radically. The complete meaning of the operator was now different rather than simply some of the details, but this is within the continuum of meanings. Now, as before, to understand what the operator meant -- if anything -- you examined the class of the first operand.

The situation is very different for "operator," however. Sans redefinition it has a well defined meaning for every object that (despite the formal situation) is varied (and only in the matter of its return type) only through the *second* operand. Seeing an expression of the form "x, y" there is no interpretive moment equivalent to stopping to think just what it means to "shift an output stream left." As it stands, "x, y" has a clear meaning -- evaluate x then evaluate y and use the latter as the value of the complete expression -- UNLESS someone somewhere has overloaded the operator. Overloading "operator<<" as in the I/O libraries *extends* its meaning. Overloading "operator," redefines its meaning.

Are people aware of the ordering implicit in the *normal* usage of the "," operator? Of course; look at how it is taught. It is described as a sort of "weaker" version of ";" that can be used within expressions without terminating the statement the expression appears in. Do they understand the sequential nature of ";"? Of course.

Do they understand the sequential nature of ","? At least in many cases: if they remember that "," within expressions exists at all, they remember that it acts sort of like ";" and therefore they probably understand its sequential character.

Overloading "operator," is something like overloading "operator==" to mean something entirely unrelated to equality or equivalence -- only worse.

Looking at this from another viewpoint: if there is no standard that allows generic programmers to assume that "operator," acts according to its "default semantics", then they should stop using "operator," except to make use of some specifically overloaded semantics. Otherwise, the generic code is indeterminate (OK, you could use it by being very careful that no value dependent on a generic type was involved, but you would still be simply hoping that no one had added an overload involving one of your non-parameterized types).

I strongly recommend that "operator," not be overloaded. If it is decided that this is absolutely essential, then I recommend that a strong and precise statement first be drafted stating unambiguously when such overloading should be "acceptable".

Topher

Topher Cooper <topher@topherc.net> writes:
Putting my two cents in on the issue of redefining the meaning of "operator,".
Summary: Bad idea.
This posting belongs in a different thread. Please take care to make sure your posts go where they belong. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Larry Evans wrote:
I'd be very interested. I'm thinking about using a sparse matrix where T is symbol_set, where symbol_set is the set of terminal and non-terminal symbols in a grammar. This could be used by spirit in deriving LL(k) parser as described in:
Larry, I've recently experimented on perfect hashing (minimal and non-minimal) where the input is a char or wchar_t or int. What's interesting is that, based on my tests, I exceeded the speed of C++'s switch-case. I did a test on perfect hashing of integers and tested it against a switch in a loop:

int test(int i) {
    switch (i) {
        case 1: return 0;
        case 4: return 1;
        case 9: return 2;
        case 16: return 3;
        case 25: return 4;
        case 36: return 5;
        case 49: return 6;
        default: return -1;
    }
}

vs. a corresponding perfect hash table. I got 6.6 secs (switch) vs. 0.422 secs (perfect hash) on VC7.1. G++'s switch was faster (I got 2.2) but still 4X slower than the perfect hash. The perfect hash generation can be done at runtime using metaprogramming. I based the algorithms on Bob Jenkins' perfect hash stuff: http://burtleburtle.net/bob/hash/perfect.html

I think I've got the makings of an extremely fast LL(1) engine. See attached test.

Regards,
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net
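The flavor of the table-lookup idea can be shown without Jenkins' full algorithm (this sketch is not his method, just an illustration): for the seven keys in the switch above, a single modulus (14) happens to be collision-free, so a lookup is one mod, one table read, and one verifying compare, with no branching cascade.

```cpp
#include <cassert>

// Illustrative perfect-hash lookup for the key set {1,4,9,16,25,36,49}
// (not Bob Jenkins' algorithm): these keys are pairwise distinct mod 14,
// so each occupies a unique slot in a 14-entry table.
int test_hashed(int i) {
    static const int keys[14]   = {-1, 1, 16, -1, 4, -1, -1, 49, 36, 9, -1, 25, -1, -1};
    static const int values[14] = {-1, 0,  3, -1, 1, -1, -1,  6,  5, 2, -1,  4, -1, -1};
    int slot = ((i % 14) + 14) % 14;   // handle negative inputs
    return keys[slot] == i ? values[slot] : -1;
}
```

A real generator searches for a hash function (not necessarily a plain modulus) that is collision-free over the given key set; the lookup cost stays the same.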

Matthias Schabel <boost@schabel-family.org> writes:
I've been following this thread with some interest as I have had a number of similar complaints with the design and implementation of uBlas and have been frustrated for years by the lack of a comprehensive and flexible framework for handling multidimensional array data in a manner which enables the average end user to ignore the details of implementation of storage scheme, etc... while providing a consistent interface with sufficient comprehensiveness to handle as many reasonable use cases as possible. It seems to me that a Boost matrix/linear algebra library could very profitably be built on top of a well designed policy-based multidimensional array class.
Maybe so, but I wouldn't start there. The way any particular class template is defined should be one of the last concerns in the design of any generic framework. It's all about concepts and algorithms.
BTW, I am aware of the existence of Blitz++, which provides a number of interesting capabilities and is to be lauded for its demonstration that C++ multidimensional array performance need not be inferior to FORTRAN. However, it seems like the support of that library is weak at present and there appears to be no effort to drive toward standardization.
A couple of thoughts on such a D-dimensional array class :
1) it should implement as much of the STL one-dimensional container interface as possible
If by that you mean driving towards something based on the STL (reversible) container requirements in tables 65 and 66, then I strongly disagree. Those are bloated concepts and little or no generic code has been written to take advantage of them.
so that algorithms which are unconcerned with the order of traversal can use an efficient 1D iterator interface
That, I agree, is important.
- for example, if I want to replace every element of a matrix with its square, I should be able to do this :
matrix_type mat;
for (matrix_type::iterator it=mat.begin();it!=mat.end();++it) *it = (*it)*(*it);
That's potentially rather inefficient with all those .end() invocations. Anyway, I don't think low-level iteration over elements should be required by the concepts to be exposed as the primary begin() and end() iterators of a matrix, but through some kind of indirection. Since I am predisposed to using a free-function interface:

std::for_each(begin(elements(mat)), end(elements(mat)), _1 = _1*_1);
The advantage here is that the algorithm doesn't need to know if the matrix is sparse or dense, as that logic is in the iterator, and applies equally well to higher rank arrays. Indexed element access can just be viewed as a mapping of a D-dimensional index to the corresponding 1D offset.
There's precedent for that in the MTL already. <snip other details of array interface> I have no problem with any of these details being part of any specific array class. Whether or not they should be part of the library's concept requirements is another question (on which I haven't formed an opinion yet).
If there is interest, I have written some starting code that I would be happy to share which addresses most of these points to some extent and provides some other nice properties such as allowing transposition of arbitrary indices with no need to reorder the underlying storage (that is mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves).
I'm not sure that's a nice property, since it means that the traversal order of the matrix is not a compile-time property.
I have functioning dense, sparse, and diagonal implementations (with a proxy reference type) as well as indirection, subrange, and mask slicing. The design is uses a single policy template for the underlying implementation :
Array<T,R,Imp> - T is the value type of elements, R is the rank, and Imp contains the functional implementation details.
I'll see about shaping the code up this weekend and posting it on the Yahoo files section if there's interest....
Your matrix implementation might be very nice, but from my point of view, using any specific implementation as the basis of a new linear algebra library design is starting from the wrong end of things... so for me, at least, looking at it in detail would be premature. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Maybe so, but I wouldn't start there. The way any particular class template is defined should be one of the last concerns in the design of any generic framework. It's all about concepts and algorithms.
I agree in principle; on the other hand, it is often quite difficult to tease out appropriately complete and orthogonal concepts until one has had significant experience in practice with the problem domain - that requires an actual working implementation against which concepts can be tested.
If by that you mean driving towards something based on the STL (reversible) container requirements in tables 65 and 66, then I strongly disagree. Those are bloated concepts and little or no generic code has been written to take advantage of them.
I'm not familiar with tables 65 and 66; I just think that the interface should support as much of the existing standard for one-dimensional containers as possible as there is existing code which would presumably work with little modification if that is done.
so that algorithms which are unconcerned with the order of traversal can use an efficient 1D iterator interface
That, I agree, is important.
- for example, if I want to replace every element of a matrix with its square, I should be able to do this :
matrix_type mat;
for (matrix_type::iterator it=mat.begin();it!=mat.end();++it) *it = (*it)*(*it);
That's potentially rather inefficient with all those .end() invocations. Anyway, I don't think low-level iteration over elements should be required by the concepts to be exposed as the primary begin() and end() iterators of a matrix, but through some kind of indirection. Since I am predisposed to using a free-function interface:
std::for_each(begin(elements(mat)), end(elements(mat)), _1 = _1*_1);
I'm curious - what's the rationale for preferring begin(elements(mat)) to mat.begin() ? While the syntax you describe looks fine to me, I think it's also important to remember that many numericists for whom this sort of library would be interesting are not interested in advanced coding techniques, per se, and will likely be put off by the need to learn lambda expressions, etc... and will generally prefer not to have to completely rework the logic of all their code to port it effectively. There is a significant contingent of programmers who are clinging to FORTRAN because 1) there is a lot of existing code written in it, 2) they understand the syntax, 3) there is a perception that the added language machinery in C++ leads to performance compromises. If we're hoping to gain broad acceptance, it seems to me that the interface specification may need to be a bit fatter rather than lean to soften the transition.
I have no problem with any of these details being part of any specific array class. Whether or not they should be part of the library's concept requirements is another question (on which I haven't formed an opinion yet).
That's fine - I'm primarily interested in starting a dialogue. There continues to be a proliferation of different vector/matrix/tensor/array libraries, each of which is tailored to some particular domain or another. The existence of a standard interface (or several levels of interfaces supporting various functionality) would go a long way toward facilitating the growth of a coherent and interoperable set of numerical libraries in C++.
of arbitrary indices with no need to reorder the underlying storage (that is mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves).
I'm not sure that's a nice property, since it means that the traversal order of the matrix is not a compile-time property.
This is actually precisely the kind of issue that I'm hoping to side-step by allowing flexible implementation - I imagine different requirements will call for different concepts to be supported. In some applications having static traversal order may not be as important as being able to dynamically reconfigure array access, whereas in others it may be critical. However, for another large class of algorithms it may well be irrelevant and, for those applications, it should be an unspecified implementation detail.
Your matrix implementation might be very nice, but from my point of view, using any specific implementation as the basis of a new linear algebra library design is starting from the wrong end of things... so for me, at least, looking at it in detail would be premature.
I understand your point here and agree that algorithms should only require that their arguments support the relevant concepts. I'm not arguing that my particular implementation should be considered as the basis for a new MTL, just that it would be hugely beneficial if a new linear algebra library was based on a well considered and reasonably mature multidimensional array interface/concept set. I think that it is critically important to decouple the 2D array aspects of a linear algebra library from the actual linear algebra aspects - there are many algorithms which one might want to apply to a matrix which have nothing to do with linear algebra. It would be very nice to see Boost put forth at least the start of a solution to the ongoing problem of proliferating array and matrix libraries.

---------------------------------------------------------------------------------------------------
Matthias Schabel, Ph.D.
Utah Center for Advanced Imaging Research
729 Arapeen Drive
Salt Lake City, UT 84108
801-587-9413 (work)
801-585-3592 (fax)
801-706-5760 (cell)
801-484-0811 (home)
mschabel at ucair med utah edu

Matthias Schabel <boost@schabel-family.org> writes:
Maybe so, but I wouldn't start there. The way any particular class template is defined should be one of the last concerns in the design of any generic framework. It's all about concepts and algorithms.
I agree in principle; on the other hand, it is often quite difficult to tease out appropriately complete and orthogonal concepts until one has had significant experience in practice with the problem domain - that requires an actual working implementation against which concepts can be tested.
If by that you mean driving towards something based on the STL (reversible) container requirements in tables 65 and 66, then I strongly disagree. Those are bloated concepts and little or no generic code has been written to take advantage of them.
I'm not familiar with tables 65 and 66; I just think that the interface should support as much of the existing standard for one-dimensional containers as possible as there is existing code which would presumably work with little modification if that is done.
I doubt it. My claim is that very little code is truly written to the standard container requirements. It's well known that the standard containers are not usually substitutable for one-another (Scott Meyers has written extensively on the subject).
That's potentially rather inefficient with all those .end() invocations. Anyway, I don't think low-level iteration over elements should be required by the concepts to be exposed as the primary begin() and end() iterators of a matrix, but through some kind of indirection. Since I am predisposed to using a free-function interface:
std::for_each(begin(elements(mat)), end(elements(mat)), _1 = _1*_1);
I'm curious - what's the rationale for preferring begin(elements(mat)) to mat.begin() ? While the syntax you describe looks fine to me, I think it's also important to remember that many numericists for whom this sort of library would be interesting are not interested in advanced coding techniques, per se
Do you find built-in arrays to be advanced? You can't call member functions on those, so you need to use non-members in generic code if you want to operate on them.
, and will likely be put off by the need to learn lambda expressions,
There's nothing in what I was describing that requires lambda expressions; using std::for_each drastically simplifies the code and probably makes it much more efficient due to built-in loop unrolling. Here; I'll write it with a for loop if it makes you happy...
for (iterator<Matrix>::type i = begin(elements(mat)), finish = end(elements(mat)); i != finish; ++i) { *i = *i * *i; }
Ugly.
of arbitrary indices with no need to reorder the underlying storage (that is mat.transpose(0,1) causes all subsequent calls to mat(i,j) to provide the transposed element without moving the elements themselves).
I'm not sure that's a nice property, since it means that the traversal order of the matrix is not a compile-time property.
This is actually precisely the kind of issue that I'm hoping to side-step by allowing flexible implementation - I imagine different requirements will call for different concepts to be supported. In some applications having static traversal order may not be as important as being able to dynamically reconfigure array access, whereas in others it may be critical. However, for another large class of algorithms it may well be irrelevant and, for those applications, it should be an unspecified implementation detail.
I'm not convinced that being able to specify the transposition at runtime is a significant advantage over something like transpose<0,1>(mat), which can preserve full compile-time information. Having one interface is simpler than having two.
Your matrix implementation might be very nice, but from my point of view, using any specific implementation as the basis of a new linear algebra library design is starting from the wrong end of things... so for me, at least, looking at it in detail would be premature.
I understand your point here and agree that algorithms should only require that their arguments support the relevant concepts. I'm not arguing that my particular implementation should be considered as the basis for a new MTL, just that it would be hugely beneficial if a new linear algebra library was based on a well considered and reasonably mature multidimensional array interface/concept set.
Of course I think whatever I'm going to do will be well-considered ;-)
I think that it is critically important to decouple the 2D array aspects of a linear algebra library from the actual linear algebra aspects
I don't understand how or why. It's a linear algebra library; why should it be decoupled from linear algebra?
- there are many algorithms which one might want to apply to a matrix which have nothing to do with linear algebra. It would be very nice to see Boost put forth at least the start of a solution to the ongoing problem of proliferating array and matrix libraries.
I'm not sure why that's a problem, but I think that solving it is probably outside the scope of my work. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
participants (12)
- Christoph Ludwig
- David Abrahams
- David B. Held
- Jeremy Graham Siek
- Jeremy Siek
- Joel de Guzman
- Larry Evans
- Matthias Schabel
- Michael Stevens
- Toon Knapen
- Topher Cooper
- Vincent N. Virgilio
- Walter Landry