Probing interest in fixed dimension matrix class

I have just finished writing a fixed dimensionality matrix template (kmatrix<T, Rows, Cols>) which appears to significantly outperform the ublas::matrix. Is there any interest? Below are the results of the benchmarks using Visual C++ 7.1 on an Intel Celeron 1.6 GHz:

Integer matrix multiplication:
2,2 X 2,2 kmatrix 94 msec elapsed ublas 578 msec elapsed
3,3 X 3,3 kmatrix 250 msec elapsed ublas 891 msec elapsed
100,100 X 100,100 kmatrix 593 msec elapsed ublas 1422 msec elapsed
100,1 X 1,100 kmatrix 172 msec elapsed ublas 266 msec elapsed
1,100 X 100,1 kmatrix 625 msec elapsed ublas 1718 msec elapsed

Double matrix multiplication:
2,2 X 2,2 kmatrix 94 msec elapsed ublas 593 msec elapsed
3,3 X 3,3 kmatrix 297 msec elapsed ublas 1235 msec elapsed
100,100 X 100,100 kmatrix 766 msec elapsed ublas 1625 msec elapsed
100,1 X 1,100 kmatrix 203 msec elapsed ublas 328 msec elapsed
1,100 X 100,1 kmatrix 813 msec elapsed ublas 2125 msec elapsed

Christopher Diggins http://www.cdiggins.com
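[A minimal illustrative sketch of what a fixed-dimension matrix template of this shape might look like; this is not the actual kmatrix code (which is linked later in the thread), and the class name fixed_matrix and its layout are assumptions for illustration only.]

    #include <cstddef>

    // Illustrative fixed-dimension matrix: dimensions are template parameters,
    // storage is a contiguous row-major C array, so no dynamic allocation occurs.
    template<typename T, std::size_t Rows, std::size_t Cols>
    struct fixed_matrix {
        T data[Rows * Cols];

        T& operator()(std::size_t r, std::size_t c)             { return data[r * Cols + c]; }
        const T& operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
    };

    // Multiplication: all loop bounds are known at compile time, which gives the
    // compiler the chance to unroll and optimise for each concrete size.
    template<typename T, std::size_t R, std::size_t N, std::size_t C>
    fixed_matrix<T, R, C> operator*(const fixed_matrix<T, R, N>& a,
                                    const fixed_matrix<T, N, C>& b)
    {
        fixed_matrix<T, R, C> result;
        for (std::size_t i = 0; i < R; ++i)
            for (std::size_t j = 0; j < C; ++j) {
                T sum = T();
                for (std::size_t k = 0; k < N; ++k)
                    sum += a(i, k) * b(k, j);
                result(i, j) = sum;
            }
        return result;
    }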

I'd be very interested in that functionality.

Sascha

christopher diggins wrote:
I have just finished writing a fixed dimensionality matrix template (kmatrix<T, Rows, Cols>) which appears to significantly outperform the ublas::matrix. Is there any interest? Below are the results of the benchmarks using Visual C++ 7.1 on an Intel Celeron 1.6 GHZ:
Integer matrix multiplication:
2,2 X 2,2 kmatrix 94 msec elapsed ublas 578 msec elapsed
3,3 X 3,3 kmatrix 250 msec elapsed ublas 891 msec elapsed
100,100 X 100,100 kmatrix 593 msec elapsed ublas 1422 msec elapsed
100,1 X 1,100 kmatrix 172 msec elapsed ublas 266 msec elapsed
1,100 X 100,1 kmatrix 625 msec elapsed ublas 1718 msec elapsed
Double matrix multiplication:
2,2 X 2,2 kmatrix 94 msec elapsed ublas 593 msec elapsed
3,3 X 3,3 kmatrix 297 msec elapsed ublas 1235 msec elapsed
100,100 X 100,100 kmatrix 766 msec elapsed ublas 1625 msec elapsed
100,1 X 1,100 kmatrix 203 msec elapsed ublas 328 msec elapsed
1,100 X 100, 1 kmatrix 813 msec elapsed ublas 2125 msec elapsed
Christopher Diggins http://www.cdiggins.com

Somewhere in the E.U., 02/06/2005

Hello

In article <008e01c566c1$e31850b0$2b792518@heronnest>, christopher diggins <cdiggins@videotron.ca> wrote:
I have just finished writing a fixed dimensionality matrix template (kmatrix<T, Rows, Cols>) which appears to significantly outperform the ublas::matrix. Is there any interest? Below are the results of the benchmarks using Visual C++ 7.1 on an Intel Celeron 1.6 GHZ:
[SNIP]
Christopher Diggins http://www.cdiggins.com
I believe we badly need some de jure standard for C++ incarnations of common mathematical concepts, in particular w.r.t. algebra and (Euclidean) geometry. "Small" matrices are in that group. The fact that they would be fast would be icing on the cake, not the cake itself, as it were.

What is most needed IMHO is a clear and rich set of links between the various elements of that collection (complex <-> plane (vector) rotation; rotation <-> geometric elements of the rotation (when meaningful); 2D, 3D, 4D points; ...). Everybody and his (or her) favorite pet has a homegrown version of that kind of library, more or less naïve and more or less efficient, but what we need is really something which we could present as a standard, and speed (alone) is not sufficient for that to happen. Expressivity and richness of functionality would be better bets. Care must also be taken not to sever the bridges with more general packages such as ublas, of course.

At any rate, many such proposals have been floated here but, AFAIK, none has ever been developed to the point of requesting a review, which I feel is really a shame.

Thanks

Hubert Holin

I have just finished writing a fixed dimensionality matrix template (kmatrix<T, Rows, Cols>) which appears to significantly outperform the ublas::matrix. Is there any interest? Below are the results of the benchmarks using Visual C++ 7.1 on an Intel Celeron 1.6 GHZ:
I believe we badly need some de jure standard for C++ incarnations of common mathematical concepts, in particular w.r.t. algebra and (Euclidean) geometry. "Small" matrices are in that group.
The fact that they would be fast would be icing on the cake, not the cake itself, as it were. What is most needed IMHO is a clear and rich set of links between the various elements of that collection (complex <-> plane (vector) rotation; rotation <-> geometric elements of the rotation (when meaningful); 2D, 3D, 4D points;...).
Whilst I'm happy that a mathematician's/physicist's set of concepts and tools needs hammering out for that field of programming, in many application areas (e.g. gaming, image/signal processing) there is also a need for a fast implementation of 1D, 2D and 3D data structures and associated processing routines that are going to approach the speed of vector C libraries.

Several signal/image processing libraries exist, though often in architecture-optimised, binary-only forms (e.g. the richness of the Intel Performance Primitives library). Others (such as MacSTL at www.pixelglow.com) use C++ techniques and interfaces to low-level, architecture-specific vectorisation that work reasonably well. Personally I prefer non-architecture-specific code that an autovectorising compiler has a good chance of optimising, but defining a 'good chance' with current compilers is sometimes challenging if supporting multiple compilers/platforms.

I believe that a 'vector processing' library should provide a data structure implementation that could maintain interoperability with high-performance C libraries (e.g. contiguous memory, perhaps assuming constant x,y,z strides). I'm not suggesting that would be its only implementation. It should be able to expose raw pointers, with an assumed data layout. I understand that exposing data structure is 'bad', but the fact is that users of such a library will often need to call diverse third-party, architecture-optimised C libraries in cases where absolute performance is essential.

Many C++ vector/matrix library implementations I've seen that hide the underlying structure and support variable-size containers do pay a significant abstraction penalty in traversing the container, or make it difficult for compilers to take advantage of additional optimisations that are possible with fixed-size containers (such as cache manipulation and pre-fetch).

Paul
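[A minimal sketch of the kind of interoperability described above, under my own assumptions about layout; c_library_scale is a purely hypothetical stand-in for any third-party, architecture-optimised C routine.]

    #include <cstddef>

    // Stand-in for a third-party C routine that expects a raw pointer to
    // contiguous data and an element count (hypothetical).
    extern "C" void c_library_scale(double* data, std::size_t count, double factor)
    {
        for (std::size_t i = 0; i < count; ++i)
            data[i] *= factor;
    }

    // Fixed-size 2D buffer with contiguous row-major storage and a constant row
    // stride, so it can be handed directly to C libraries as a plain pointer.
    template<std::size_t Rows, std::size_t Cols>
    struct image_buffer {
        double storage[Rows * Cols];

        double*       data()       { return storage; }
        const double* data() const { return storage; }
        static std::size_t size()   { return Rows * Cols; }
        static std::size_t stride() { return Cols; }   // elements between consecutive rows
    };

    int main()
    {
        image_buffer<48, 64> img = {};
        // Raw pointer plus a known layout makes interoperability trivial.
        c_library_scale(img.data(), img.size(), 0.5);
        return 0;
    }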

----- Original Message -----
From: "Paul Baxter" <pauljbaxter@hotmail.com>
To: <boost@lists.boost.org>
Sent: Thursday, June 02, 2005 10:59 AM
Subject: [boost] Re: Probing interest in fixed dimension matrix class
I have just finished writing a fixed dimensionality matrix template (kmatrix<T, Rows, Cols>) which appears to significantly outperform the ublas::matrix. Is there any interest? Below are the results of the benchmarks using Visual C++ 7.1 on an Intel Celeron 1.6 GHZ:
I believe we badly need some de jure standard for C++ incarnations of common mathematical concepts, in particular w.r.t. algebra and (Euclidean) geometry. "Small" matrices are in that group.
I do not believe the library I wrote has any constraint of smallness.
The fact that they would be fast would be icing on the cake, not the cake itself, as it were. What is most needed IMHO is a clear and rich set of links between the various elements of that collection (complex <-> plane (vector) rotation; rotation <-> geometric elements of the rotation (when meaningful); 2D, 3D, 4D points;...).
Whilst I'm happy that a mathematician's/physicist's set of concepts and tools needs hammering out for that field of programming, in many application areas (e.g. gaming, image/signal processing) there is also a need for a fast implementation of 1D, 2D and 3D data structures and associated processing routines that are going to approach the speed of vector C libraries.
I concur.
Several signal/image processing libraries exist, though often in architecture-optimised, binary-only forms (e.g. the richness of the Intel Performance Primitives library). Others (such as MacSTL at www.pixelglow.com) use C++ techniques and interfaces to low-level, architecture-specific vectorisation that work reasonably well.
Personally I prefer the use of non-architecture specific code that an autovectorising compiler has a good chance of optimising but defining a 'good chance' with current compilers is sometimes challenging if supporting multiple compilers/platforms.
I believe that a 'vector processing' library should provide a data structure implementation that could maintain interoperability with C high-performance libraries (e.g. contiguous memory, perhaps assuming constant x,y,z strides).
That is precisely my implementation approach.
I'm not suggesting that would be its only implementation. It should be able to expose raw pointers, with an assumed data layout. I understand that exposing data structure is 'bad', but the fact is that users of such a library will often need to call diverse third party architecture optimised C libraries in cases where absolute performance is essential.
I don't expose data, but that would be easy enough and I would like to see it done.
Many C++ vector/matrix library implementations I've seen that hide the underlying structure and support variable-size containers do pay a significant abstraction penalty in traversing the container, or make it difficult for compilers to take advantage of additional optimisations that are possible with fixed-size containers (such as cache manipulation and pre-fetch).
I fully agree.

I have just posted my code to the public domain on comp.lang.c++ at http://tinyurl.com/d5oom. Unfortunately I do not have the time at the moment to make it fully boostified, but I would like to help anyone who wants to make a full Boost proposal. Any takers?

-Christopher Diggins

christopher diggins <cdiggins <at> videotron.ca> writes:
I have just posted my code to the public domain on comp.lang.c++ at http://tinyurl.com/d5oom. Unfortunately I do not have the time at the moment to make it fully boostified, but I would like to help anyone who wants to make a full Boost proposal. Any takers?
Could you also post your benchmarks?

Bob

----- Original Message -----
From: "Bob Bell" <belvis@imageworks.com>
To: <boost@lists.boost.org>
Sent: Thursday, June 02, 2005 7:46 PM
Subject: [boost] Re: Probing interest in fixed dimension matrix class
christopher diggins <cdiggins <at> videotron.ca> writes:
I have just posted my code to the public domain on comp.lang.c++ at http://tinyurl.com/d5oom. Unfortunately I do not have the time at the moment to make it fully boostified, but I would like to help anyone who wants to make a full Boost proposal. Any takers?
Could you also post your benchmarks?
Done! The URL is the same. The new benchmarks improve speed for ublas by using noalias(A) = prod(B, C), but the kmatrix speed is still far superior on my machine (Celeron 1.6, yadda yadda yadda).

-- Christopher Diggins http://www.cdiggins.com

christopher diggins wrote:
Done! The url is the same. The new benchmarks improve speed for ublas by using noalias(A) = prod(B, C) but the kmatrix speed is still far superior on my machine. (celeron 1.6, yadda yadda yadda)
Did you try to compare kmatrix with

ublas::matrix<T, row_major, ublas::bounded_array<T, M*N> >

fres

----- Original Message -----
From: "Kresimir Fresl" <fresl@master.grad.hr>
To: <boost@lists.boost.org>
Sent: Friday, June 03, 2005 2:10 AM
Subject: Re: [boost] Re: Probing interest in fixed dimension matrix class
christopher diggins wrote:
Done! The url is the same. The new benchmarks improve speed for ublas by using noalias(A) = prod(B, C) but the kmatrix speed is still far superior on my machine. (celeron 1.6, yadda yadda yadda)
Did you try to compare kmatrix with
ublas::matrix<T, row_major, ublas::bounded_array<T, M*N> >;
fres
Just did; basically the same results:

comparing integer matrices
2,2 X 2,2 kmatrix 63 msec elapsed ublas 156 msec elapsed
3,3 X 3,3 kmatrix 219 msec elapsed ublas 438 msec elapsed
100,100 X 100,100 kmatrix 406 msec elapsed ublas 1187 msec elapsed
100,1 X 1,100 kmatrix 125 msec elapsed ublas 203 msec elapsed
1,100 X 100,1 kmatrix 406 msec elapsed ublas 1172 msec elapsed

comparing double matrices
2,2 X 2,2 kmatrix 78 msec elapsed ublas 172 msec elapsed
3,3 X 3,3 kmatrix 219 msec elapsed ublas 469 msec elapsed
100,100 X 100,100 kmatrix 547 msec elapsed ublas 1390 msec elapsed
100,1 X 1,100 kmatrix 156 msec elapsed ublas 219 msec elapsed
1,100 X 100,1 kmatrix 484 msec elapsed ublas 1375 msec elapsed

- Christopher Diggins
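[For anyone wanting to reproduce the comparison, a minimal sketch of the uBLAS side of such a benchmark, using the bounded_array storage and the noalias()/prod() form mentioned above; the matrix sizes and iteration count are arbitrary placeholders, and the timing harness is omitted.]

    #include <boost/numeric/ublas/matrix.hpp>
    #include <cstddef>

    int main()
    {
        using namespace boost::numeric::ublas;

        const std::size_t M = 3, K = 3, N = 3;

        // Fixed-capacity storage: no heap allocation, capacity known at compile time.
        typedef matrix<double, row_major, bounded_array<double, M * K> > mat_mk;
        typedef matrix<double, row_major, bounded_array<double, K * N> > mat_kn;
        typedef matrix<double, row_major, bounded_array<double, M * N> > mat_mn;

        mat_mk B(M, K);
        mat_kn C(K, N);
        mat_mn A(M, N);

        for (std::size_t i = 0; i < M; ++i)
            for (std::size_t j = 0; j < K; ++j)
                B(i, j) = 1.0 * (i + j);
        for (std::size_t i = 0; i < K; ++i)
            for (std::size_t j = 0; j < N; ++j)
                C(i, j) = 1.0 * (i * j);

        // Repeated multiplication, as a benchmark loop would do; noalias avoids
        // the temporary that a plain A = prod(B, C) assignment would create.
        for (int iter = 0; iter < 100000; ++iter)
            noalias(A) = prod(B, C);

        return 0;
    }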

Paul Baxter <pauljbaxter <at> hotmail.com> writes:
Many C++ vector/matrix library implementations I've seen that hide the underlying structure and support variable-size containers do pay a significant abstraction penalty in traversing the container, or make it difficult for compilers to take advantage of additional optimisations that are possible with fixed-size containers (such as cache manipulation and pre-fetch).
One of the things I've always wanted to see (and have done some work on myself) is a vector/matrix library that separates algorithm from data structure, in much the same way that the standard library separates algorithms from containers. A result of this approach is a high degree of interoperability (e.g., it's possible to do things like multiply matrices from different libraries together, using the generic "multiply" algorithm).

Bob
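[A minimal sketch of that separation, assuming only that each operand type offers rows()/cols() members and operator()(i, j) access, and that the result type exports value_type; this is a hypothetical convention of mine, not an existing Boost concept.]

    #include <cstddef>

    // Generic multiply: works for any two matrix types that expose rows(), cols()
    // and element access via operator()(i, j), and any result type constructible
    // with (rows, cols). The algorithm knows nothing about how operands store data.
    template<typename Result, typename MatA, typename MatB>
    Result multiply(const MatA& a, const MatB& b)
    {
        typedef typename Result::value_type value_type;  // assumed to be exported by the result type
        Result r(a.rows(), b.cols());
        for (std::size_t i = 0; i < a.rows(); ++i)
            for (std::size_t j = 0; j < b.cols(); ++j) {
                value_type sum = value_type();
                for (std::size_t k = 0; k < a.cols(); ++k)
                    sum += a(i, k) * b(k, j);
                r(i, j) = sum;
            }
        return r;
    }

[Matrix types that use a different naming convention, e.g. uBLAS with size1()/size2(), would need only a thin adapter to meet this interface, which is exactly the kind of glue such a library would standardise.]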

"Hubert Holin" <Hubert.Holin@meteo.fr> wrote
The fact that they would be fast would be icing on the cake, not the cake itself, as it were. What is most needed IMHO is a clear and rich set of links between the various elements of that collection (complex <-> plane (vector) rotation; rotation <-> geometric elements of the rotation (when meaningful); 2D, 3D, 4D points;...).
Addressing the issue of points, affine space and so on.

There has been some discussion on the list regarding whether points and vectors should be distinguished as types, with allowed operations on points being restricted to addition of a vector, and subtraction of two points resulting in a vector. However, in some contexts addition and multiplication of points is meaningful, e.g. to find the point 1/3 of the way from the start of line(A, B) one could calculate 2/3 * A + 1/3 * B. It is only in a context such as the above that addition and multiplication of points has meaning. It is possible to construct compile-time structures using e.g. compile-time rational numbers and enforcing conversion of the result to a point only where the rationals sum to 1, however it would restrict the calculations to points where the weights are known at compile time. It is for this reason that I prefer to make no distinction between points and vectors and am happy to represent coordinates as vectors, which allows use of transform matrices. One can call them position-vectors.

--------------

Rotation around an arbitrary axis is an example of a coordinate system transform, which can be represented by means of a matrix. I appreciate that a quaternion may be more efficient, but I think it is limited to only rotation and scaling. Matrix transformations should be implemented first because they are generic.

-----------------

In my experiments with physical quantities I found that it is possible (of course) to make a matrix conform to dimensional analysis. In terms of a 3D matrix for a type of dimension D, the signature for a row-column matrix looks as follows (forgive me for pushing the typeof library):

typedef BOOST_TYPEOF( D() / D() ) dimensionless;
typedef BOOST_TYPEOF( 1 / D() )   reciprocal;
typedef D identity;

The types in RC format look like so:

| dimensionless, dimensionless, dimensionless, reciprocal    |
| dimensionless, dimensionless, dimensionless, reciprocal    |
| dimensionless, dimensionless, dimensionless, reciprocal    |
| D,             D,             D,             dimensionless |

Note that this can actually make the matrix more comprehensible. For instance, if D is a length type, e.g. millimeters, then the three D's in [3,0], [3,1] and [3,2] represent the x, y, z components of a translation in millimeters. Further, it is not possible to accidentally multiply matrices of transposed types.

Similarly, a homogeneous coordinate looks like so:

| dimensionless, dimensionless, dimensionless, reciprocal |

I bring this up because perhaps you should consider this if you wish to make the matrices truly generic. (The dimensionless and reciprocal types are functions of the matrix calculations rather than of a physical quantities library.) OTOH maybe you would need to restrict value-types to numeric types where both these functions are identities.

regards
Andy Little

Andy Little wrote:
e.g. to find the point 1/3 of the way from the start of line(A, B) one could calculate 2/3 * A + 1/3 * B.
I think it is more natural to compute this as A + (1/3)*(B - A), where A and B are points and B - A is a vector. So you're really still just adding a vector to a point to get a new point. There is never a need to add or rescale points.

Deane Yang wrote:
Andy Little wrote:
e.g. to find the point 1/3 of the way from the start of line(A, B) one could calculate 2/3 * A + 1/3 * B.
I think it is more natural to compute this as
A + (1/3)*(B - A),
where A and B are points and B - A is a vector. So you're really still just adding a vector to a point to get a new point. There is never a need to add or rescale points.
Now, to contradict myself, it is useful to rescale and add points in at least one circumstance: when you want the "center of mass" or the "average" of a set of points A1,..., An. In principle, you can do it as B + [(A1-B) + .... + (An - B)]/n but that's a lot of extra subtractions. If I needed this (or a weighted average), I would probably encapsulate it in a weighted average function (which would internally rescale and add the points) and still disallow rescaling and adding points in my library interface. But I guess I now see why someone might just find it more convenient to allow rescaling and adding points.
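[A minimal sketch of such an encapsulated average, computed as B + [(A1-B) + ... + (An-B)]/n so that points are never added or rescaled directly; the Point type is assumed, hypothetically, to support point - point -> vector, vector + vector, vector / scalar, and point + vector -> point.]

    #include <vector>
    #include <cstddef>

    // Average of a non-empty set of points, using only the affine operations
    // listed above: the base point is the first element, and only displacement
    // vectors are ever summed and rescaled.
    template<typename Point>
    Point average(const std::vector<Point>& pts)
    {
        const Point& base = pts.front();
        auto sum = pts[0] - base;                    // zero vector, deduced from point subtraction
        for (std::size_t i = 1; i < pts.size(); ++i)
            sum = sum + (pts[i] - base);
        return base + sum / static_cast<double>(pts.size());
    }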

"Deane Yang" <deane_yang@yahoo.com> wrote
Deane Yang wrote:
Andy Little wrote:
e.g. to find the point 1/3 of the way from the start of line(A, B) one could calculate 2/3 * A + 1/3 * B.
I think it is more natural to compute this as
A + (1/3)*(B - A),
where A and B are points and B - A is a vector. So you're really still just adding a vector to a point to get a new point. There is never a need to add or rescale points.
Now, to contradict myself, it is useful to rescale and add points in at least one circumstance: when you want the "center of mass" or the "average" of a set of points A1,..., An. In principle, you can do it as
B + [(A1-B) + .... + (An - B)]/n
but that's a lot of extra subtractions.
IMO there are a large number of cases, e.g. when interpolating positions on a curve/surface, where multiplication and addition are necessary.
If I needed this (or a weighted average), I would probably encapsulate it in a weighted average function (which would internally rescale and add the points) and still disallow rescaling and adding points in my library interface.
But I guess I now see why someone might just find it more convenient to allow rescaling and adding points.
Thanks for that :-)

I do feel that it is necessary to get the basics thrashed out, as they are the simplest building blocks of a graphics system.

regards
Andy Little

"Deane Yang" <deane_yang@yahoo.com> wrote
Andy Little wrote:
e.g. to find the point 1/3 of the way from the start of line(A, B) one could calculate 2/3 * A + 1/3 * B.
I think it is more natural to compute this as
A + (1/3)*(B - A),
where A and B are points and B - A is a vector. So you're really still just adding a vector to a point to get a new point. There is never a need to add or rescale points.
Ideally the two expressions should be equivalent:

2/3 * A + 1/3 * B == A + (1/3)*(B - A) == (2 * A + B) / 3 etc.

If no multiplication or addition is allowed on points, this is not the case. (Well, it's possible to work it at compile time, but I believe this is too restrictive.) The points implementation is getting in the way of the expression. Expressions will fail for no good reason. OTOH these calculations work correctly if A and B are 'position-vectors'.

regards
Andy Little

Andy Little wrote:
Ideally the two expressions should be equivalent:
2/3 * A + 1/3 * B == A + (1/3)*(B - A) == (2 * A + B) / 3 etc.
If no multiplication or addition is allowed on points, this is not the case. (Well, it's possible to work it at compile time, but I believe this is too restrictive.) The points implementation is getting in the way of the expression. Expressions will fail for no good reason. OTOH these calculations work correctly if A and B are 'position-vectors'.
What is the difference between a "point" and a "position-vector"? Unfortunately, it appears that everyone uses similar words but with different specific meanings.

For me, a "point" is an element in n-dimensional flat affine space. A "vector" is an element in the associated n-dimensional vector space (sometimes called the "tangent space"). So the allowable operations are the standard vector space operations for vectors (vector addition and scalar multiplication) plus

point + vector -> point

This should be enough to do any computation that anybody would ever need. And, unless computational speed is a real issue, I think it's much clearer to always code one's computations in this form. In my mind, the meaning of

A + (1/3)*(B - A)   ("start at A and go 1/3 the distance towards B")

is much more transparent than

(2/3)*A + (1/3)*B   ("rescale A by 2/3; rescale B by 1/3; add")

However, I do acknowledge that for intensive computations you might not want to constrain yourself in this way.
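[A minimal sketch of that type discipline, using hypothetical point2/vector2 types of my own; only the operations listed above compile, so 2/3*A + 1/3*B is rejected at compile time while A + (1.0/3.0)*(B - A) is accepted.]

    // Hypothetical 2D point/vector pair illustrating the affine-space discipline:
    // vectors support addition and scalar multiplication; the only mixed
    // operations are point - point -> vector and point + vector -> point.
    struct vector2 { double x, y; };
    struct point2  { double x, y; };

    inline vector2 operator+(vector2 a, vector2 b) { vector2 r = { a.x + b.x, a.y + b.y }; return r; }
    inline vector2 operator*(double s, vector2 v)  { vector2 r = { s * v.x, s * v.y }; return r; }

    inline vector2 operator-(point2 a, point2 b)   { vector2 r = { a.x - b.x, a.y - b.y }; return r; }
    inline point2  operator+(point2 p, vector2 v)  { point2 r = { p.x + v.x, p.y + v.y }; return r; }

    int main()
    {
        point2 A = { 0.0, 0.0 };
        point2 B = { 3.0, 6.0 };

        // "Start at A and go 1/3 of the distance towards B."
        point2 third = A + (1.0 / 3.0) * (B - A);

        // point2 bad = (2.0 / 3.0) * A + (1.0 / 3.0) * B;  // would not compile: no operator* for points
        (void)third;
        return 0;
    }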

"Deane Yang" <deane_yang@yahoo.com> wrote in message news:d7nich$msv$1@sea.gmane.org...
Andy Little wrote:
Ideally the two expressions should be equivalent:
2/3 * A + 1/3 * B == A + (1/3)*(B - A) == (2 * A + B) / 3 etc.
If no multiplication or addition is allowed on points, this is not the case. (Well, it's possible to work it at compile time, but I believe this is too restrictive.) The points implementation is getting in the way of the expression. Expressions will fail for no good reason. OTOH these calculations work correctly if A and B are 'position-vectors'.
What is the difference between a "point" and a "position-vector"?
I'm certainly no expert on this, but FWIW this is my understanding: a position-vector is a work-around because it is not possible to have adequate C++ control of operations on points (in affine space), as previously explained. A position-vector is just a vector, where one makes the assumption that all vectors are added to the point (0,0) and any result is added to the point (0,0). (I believe addition of point(0,0) to another point is valid, as is multiplication by 1; by suitable equation bending you can arrive at the above result.) Therefore all vector ops are allowed.

regards
Andy Little
participants (8)
- Andy Little
- Bob Bell
- christopher diggins
- Deane Yang
- Hubert Holin
- Kresimir Fresl
- Paul Baxter
- Sascha Seewald