
DE wrote:
* split containers from expressions don't get what you mean by that they are already splitted
With that I meant that every linear algebra library tends to provide its own implementation of vector and matrix types. Sometimes with different access semantics. One could use std::vector for dense vectors, (unordered) map for sparse vectors/matrices. Perhaps what's missing in the standard containers are matrices? In other words, I would prefer using a traits system to access any container. Suppose if only standard containers were to be used, perhaps let(C) = a*A*B + b*C will be enough to inject template expressions? I've also noted that matrix libraries are usually not exhaustive with their matrix classes. There's all kinds of stuff, like symmetric, triangular, diagonal, banded, bidiagonal, tridiagonal, .... Adaptors / math property wrapper come in handy too, if you have another container which happens to be a symmetric matrix or a symmetric matrix which happens to be positive definite. E.g., suppose A is dense, x = solve( A, b ) x = solve( positive_definite( upper( A ) ), b ) in the second case you say to use the upper triangular part, and make that pos def. in my work for writing a python translator from LAPACK's Fortran source base to C++ bindings code, I've come across the specializations available in LAPACK, that might give a clue for an exhaustive list and write down a good set of orthogonal concepts (e.g., storage traits, structure traits, math properties, etc.).
* split algorithms from expressions, i.e., enable pluggable/customisable back-ends into the expression templates actually it's rather trivial specialize a template for concrete expression to your taste and you are done
Although this sounds good, I think in some cases there will be multiple matches of available algorithms. Then the specialization for the expression with the highest numerical complexity should "win" (assuming the gain is highest there). Is this trivial, too? This would be really neat, I think an entire area of research is writing algorithms for special cases.
* achieve performance of FORTRAN-based programs (dominant in HPC) that is the hardest part i suppose but i'm sure it is achievable
Given the previous point, it's trivial to plug-in BLAS and LAPACK, so plus minus some handling of temporaries, we might already be almost there. What I forgot to mention was error bounds and error propagation. For many users this may be more important than speed. Many algorithms are specialized for this purpose only (same speed, but higher precision). Cheers, Rutger