different matrix library?

hi, i'm new to boost so please don't blame me. the question is: do you need a matrix library which allows you to write clear, human-readable code like

  dV = (1/m)*force + transpose(A)*g - cross_product(omega, V)
  //here dV is the derivative of velocity, m is (scalar) mass,
  //A is the transition matrix,
  //g is the gravity acceleration vector,
  //omega is angular velocity and V is the velocity vector itself,
  //force is defined in body space (e.g. aerodynamics and thrust)

that is an expression of newton's 2nd law (or, if you like, the 'change in momentum' law: d(m*V)/dt = d_local(m*V)/dt + (omega)x(m*V) = sum_of(forces); as you can see so far, i'm a math nerd). of course it must be implemented through a lazy evaluation technique (which immediately implies expression templates).

ps: i looked at uBLAS and the design seemed obsolete to me, so i thought there could be a better approach than blindly following the blas design
pps: thanks for reading till here, and forgive me for (possibly) wasting your time
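For illustration, here is a minimal sketch of the expression-template technique the post is asking for (all names here are hypothetical, not any library's real API): the overloaded operators build lightweight nodes instead of computing temporaries, and the whole right-hand side is evaluated in a single loop on assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Minimal expression-template sketch (illustrative only).
struct ExprTag {};  // marks types that participate in lazy expressions

struct Vec : ExprTag {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <class E, class = std::enable_if_t<std::is_base_of_v<ExprTag, E>>>
    Vec& operator=(const E& e) {             // one pass, no temporaries
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

template <class L, class R>
struct Add : ExprTag {                       // lazy node for l + r
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <class E>
struct Scale : ExprTag {                     // lazy node for s * e
    double s; const E& e;
    Scale(double s_, const E& e_) : s(s_), e(e_) {}
    double operator[](std::size_t i) const { return s * e[i]; }
};

template <class L, class R,
          class = std::enable_if_t<std::is_base_of_v<ExprTag, L> &&
                                   std::is_base_of_v<ExprTag, R>>>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

template <class E, class = std::enable_if_t<std::is_base_of_v<ExprTag, E>>>
Scale<E> operator*(double s, const E& e) { return Scale<E>(s, e); }
```

With this in place, `u = 2.0*a + b` compiles to a single element-wise loop; extending the same pattern to `transpose` and `cross_product` nodes yields the dV expression above.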

Frédéric Bron wrote:
What is the issue with uBLAS?

The BLAS interface is far too low level in terms of expressivity and looks syntactically old.
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 11.08.2009 at 12:39 Frederic Bron wrote :
i looked at uBLAS and the design seemed obsolete to me, so i thought there could be a better approach than blindly following the blas design
What is the issue with uBLAS?
in my opinion a piece of software must be targeted at its target users (excuse the pun). i see a generic scientist as the target user: someone who already knows matrix calculus and may not know about blas. for such a person, writing u = A*v + w is much more natural than the blas way of expressing it. i think the user must focus on the design and the intent rather than on how to express the thing in blas terms.
-- Pavel
ps: actually the code is 99% complete (and already working for my own purposes); it just has to be customized for a boost library, and there are minor changes to be made to the implementation

What is the issue with uBLAS?
in my opinion a piece of software must be targeted at its target users (excuse the pun) i see a generic scientist
Hmm, a generic scientist.... There's a thought.
as the target user who already knows matrix calculus and may not know about blas; for such a person writing u = A*v + w is much more natural than the blas way of expressing it
In principle yes - however the efficient implementation of the above is likely to be a lot trickier than it might seem. The 'advantage' of BLAS and other routines as I see it is many many man-years of optimisation and tweaking on various architectures as well as good generic implementations (ATLAS). I'd venture that it's still going to be hard to beat!
i think that user must focus on the design and the intent rather than how to express the thing in blas terms
I am slightly unsure: is your proposal a rewrite of the linear algebra routines, or a wrapper that conceptually maps calls in the following manner?

  yourlib::operator*(foo,bar) -> ublas::prod(foo,bar)

Are you aware of Blitz++ and POOMA? Blitz offers a very (for the mathematical physicist) intuitive tensor-like approach, an example:

  // This expression will set
  //
  //   c_ijk = a_ik * b_kj
  C = A(i,k) * B(k,j);

-ed

on 11.08.2009 at 14:54 Edward Grace wrote :
In principle yes - however the efficient implementation of the above is likely to be a lot trickier than it might seem.

will you, as a user, be concerned about how tricky an implementation is? or will you rather care about how convenient and clear the public interface is?
The 'advantage' of BLAS and other routines as I see it is many many man-years of optimisation and tweaking on various architectures as well as good generic implementations (ATLAS). I'd venture that it's still going to be hard to beat!

the advantage of BLAS, as far as i can see, is there if you take the original fortran implementation and use it as it is (wow, how many 'as's and 'is's). as i understand, that's the implementation you are talking about when mentioning 'man-years'. i hope to make an implementation 'as good as', but one which will exploit all of c++'s advantages.
I am slightly unsure, is your proposal a rewrite of the linear algebra routines or a wrapper that conceptually maps calls in the following manner? yourlib::operator*(foo,bar) -> ublas::prod(foo,bar)

definitely it's not a wrapper, so i guess it's a rewrite in c++ style
Are you aware of Blitz++ and POOMA? Blitz offers a very (for the mathematical physicist) intuitive tensor-like approach, an example:

  //   c_ijk = a_ik * b_kj
  C = A(i,k) * B(k,j);

i'm aware of blitz++ (i peeped at the implementation a little when i wrote my own lib); i don't use it because i don't like the concept
-- Pavel

In principle yes - however the efficient implementation of the above is likely to be a lot trickier than it might seem.

will you, as a user, be concerned about how tricky an implementation is? or will you rather care about how convenient and clear the public interface is?
The latter - naturally! If you can make it extensible so that the back end could (for example) be thread, CUDA, and MPI aware so much the better. ;-)
The 'advantage' of BLAS and other routines as I see it is many many man-years of optimisation and tweaking on various architectures as well as good generic implementations (ATLAS). I'd venture that it's still going to be hard to beat!

the advantage of BLAS, as far as i can see, is there if you take the original fortran implementation and use it as it is (wow, how many 'as's and 'is's). as i understand, that's the implementation you are talking about when mentioning 'man-years'. i hope to make an implementation 'as good as', but one which will exploit all of c++'s advantages.
I look forward to it! Competing with top-end platform-tuned FORTRAN90 compilers is not a challenge for the faint-hearted!
I am slightly unsure, is your proposal a rewrite of the linear algebra routines or a wrapper that conceptually maps calls in the following manner? yourlib::operator*(foo,bar) -> ublas::prod(foo,bar) definitely it's not a wrapper so i guess it's a rewrite in c++ style
Well. You're a braver man than I! Building a templated linear algebra library is a lot of work!
Are you aware of Blitz++ and POOMA? Blitz offers a very (for the mathematical physicist) intuitive tensor-like approach, an example:

  //   c_ijk = a_ik * b_kj
  C = A(i,k) * B(k,j);

i'm aware of blitz++ (i peeped at the implementation a little when i wrote my own lib); i don't use it because i don't like the concept
How so? I don't use Blitz++ as it appears to have stagnated - very unfortunate. The concept seems quite attractive however.

For example Blitz++ style expressiveness need not constrain you to scalars (tensors of order zero), vectors (tensors of order one) or matrices (tensors of order 2) and can allow far more flexibility than simple matrix calculus. For example, viewing _1 etc. as placeholders in a similar manner to boost::bind, being able to write things (I'm ignoring contra- vs. co-variance) like:

  typedef std::complex<double> complex;
  boost::yourlib::tensor<complex,0> epsilon; // Same as complex since it's a scalar.
  boost::yourlib::tensor<complex,1> P,E;
  boost::yourlib::tensor<complex,2> Chi1; // Other template arguments could indicate upper or lower indices.
  boost::yourlib::tensor<complex,3> Chi2;
  boost::yourlib::tensor<complex,4> Chi3;
  //... stuff
  using boost::yourlib::einstein_placeholders;
  P(_1) = Chi1(_1,_2)*E(_2)
        + Chi2(_1,_2,_3)*E(_2)*E(_3)
        + Chi3(_1,_2,_3,_4)*E(_2)*E(_3)*E(_4);

so that the Einstein summation convention was observed (summation over repeated indices),

http://en.wikipedia.org/wiki/Einstein_notation
http://en.wikipedia.org/wiki/Tensor

would be superdoubleplusgood.

Incidentally you may be interested in the Tensor Contraction Engine, http://www.csc.lsu.edu/~gb/TCE/ or the tensor template library http://ttl.yanenko.com/

-ed
------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997
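To make the summation convention concrete: the first (linear) term of the placeholder expression above, P(_1) = Chi1(_1,_2)*E(_2), simply denotes a sum over the repeated index. A plain-loop sketch of that contraction (the `contract` helper and fixed rank are illustrative assumptions, not part of any proposed API):

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using complexd = std::complex<double>;

// P_i = sum_j Chi1_ij * E_j : the repeated index j is summed over
// (Einstein convention); the free index i survives in the result.
template <std::size_t N>
std::array<complexd, N> contract(const std::array<std::array<complexd, N>, N>& chi1,
                                 const std::array<complexd, N>& e) {
    std::array<complexd, N> p{};
    for (std::size_t i = 0; i < N; ++i)       // free index
        for (std::size_t j = 0; j < N; ++j)   // repeated index: summed
            p[i] += chi1[i][j] * e[j];
    return p;
}
```

The higher-order terms (Chi2, Chi3) work the same way with two and three summed indices respectively.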

on 11.08.2009 at 18:16 Edward Grace wrote :

will you, as a user, be concerned about how tricky an implementation is? or will you rather care about how convenient and clear the public interface is?

The latter - naturally! If you can make it extensible so that the back end could (for example) be thread, CUDA, and MPI aware so much the better. ;-)

well, at this moment i cannot imagine a mechanism to transform a particular (no matter how complex, e.g. (a + b*c - d*e*f...)) expression into lib calls; i will think about the possibility of such
i hope to make an implementation 'as good as' but which will exploit all of c++ advantages

I look forward to it! Competing with top-end platform-tuned FORTRAN90 compilers is not a challenge for the faint-hearted!

i'll do my best! but i will need someone's help to express some equations in fortran to compare the performance
Well. You're a braver man than I! Building a templated linear algebra library is a lot of work!

indeed! (but it is very interesting too) and once it is built, why would i just throw it away?
How so? I don't use Blitz++ as it appears to have stagnated - very unfortunate. The concept seems quite attractive however. For example Blitz++ style expressiveness need not constrain you to scalars (tensors of order zero), vectors (tensors of order one) or matrices (tensors of order 2) and can allow far more flexibility than simple matrix calculus. For example, viewing _1 as placeholders in a similar manner to boost::bind, being able to write things (I'm ignoring contra- vs. co-variance) like: [here goes code] so that the Einstein summation convention was observed (summation over repeated indices), would be superdoubleplusgood.

you made my mind boil! i don't use such notation, so i did not even think to implement such a feature. but if you claim this feature is essential... well, it must be implemented indeed. and i don't like those... placeholders
Incidentally you may be interested in the Tensor Contraction Engine, http://www.csc.lsu.edu/~gb/TCE/ or the tensor template library http://ttl.yanenko.com/

i ran through them. TCE is targeted at chemists and fortran programmers, so it's not that interesting; and the latter supplies only statically sized tensors (if i understood right), which is not flexible in general
-- Pavel

i'm aware of blitz++ (i peeped at the implementation a little when i wrote my own lib); i don't use it because i don't like the concept
I take back my comment on the status of Blitz++, http://sourceforge.net/projects/blitz/ it still seems to be going strong, the time-stamps on the web site are somewhat misleading. -ed

Edward Grace wrote:
i'm aware of blitz++ (i peeped at the implementation a little when i wrote my own lib) i don't use it because i don't like the concept
I take back my comment on the status of Blitz++,
http://sourceforge.net/projects/blitz/
it still seems to be going strong, the time-stamps on the web site are somewhat misleading.
-ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Have you looked at Eigen? it seems to fulfill all your requirements...

Have you looked at Eigen? it seems to fulfill all your requirements...
Well, colour me impressed. I was unaware of that library, thanks a lot! Others may like to check out: http://eigen.tuxfamily.org/index.php?title=API_Showcase -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997

i looked at uBLAS and the design seemed obsolete to me, so i thought there could be a better approach than blindly following the blas design
What is the issue with uBLAS?
in my opinion a piece of software must be targeted at its target users (excuse the pun). i see a generic scientist as the target user who already knows matrix calculus and may not know about blas; for such a person writing u = A*v + w is much more natural than the blas way. i think the user must focus on the design and the intent rather than on how to express the thing in blas terms
Please note we're not talking about BLAS (C interface), but uBLAS, which is already a Boost vector/matrix library: http://www.boost.org/doc/libs/1_39_0/libs/numeric/ublas/doc/index.htm

You also need to be very careful with allowing "simple" expressions like

  u = A*v + w

as traditionally these have awful performance due to the temporaries involved.

There is also an ongoing Boost-sister-project to devise a "next generation" matrix library for C++, but I've lost the link for the moment :-(

Cheers, John.
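To see where the temporaries come from, here is a toy vector with naive by-value operators (the `NaiveVec`/`NaiveMat` names and the allocation counter are hypothetical instrumentation, only for illustration): evaluating u = A*v + w allocates a fresh buffer for A*v and another for the sum before u is initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static int g_allocations = 0;  // counts vector buffers created

struct NaiveVec {
    std::vector<double> data;
    explicit NaiveVec(std::size_t n) : data(n, 0.0) { ++g_allocations; }
};

// Naive by-value operator+: materializes the sum in a new buffer.
NaiveVec operator+(const NaiveVec& a, const NaiveVec& b) {
    NaiveVec r(a.data.size());                 // temporary buffer
    for (std::size_t i = 0; i < r.data.size(); ++i)
        r.data[i] = a.data[i] + b.data[i];
    return r;
}

struct NaiveMat {
    std::size_t n;
    std::vector<double> data;                  // row-major n x n
    explicit NaiveMat(std::size_t n_) : n(n_), data(n_ * n_, 0.0) {}
};

// Naive by-value operator*: another temporary for A*v.
NaiveVec operator*(const NaiveMat& m, const NaiveVec& v) {
    NaiveVec r(m.n);
    for (std::size_t i = 0; i < m.n; ++i)
        for (std::size_t j = 0; j < m.n; ++j)
            r.data[i] += m.data[i * m.n + j] * v.data[j];
    return r;
}
```

Expression templates avoid exactly these two intermediate buffers by fusing the whole right-hand side into one loop.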

John Maddock wrote:
You also need to be very careful with allowing "simple" expressions like
u = A*v + w
As traditionally these have awful performance due to the temporaries involved.
Expression templates and DSELs want you to call back ;) This problem has been solved since Blitz++ and is generalizable using Proto.
There is also an ongoing Boost-sister-project to devise a "next generation" matrix library for C++, but I've lost the link for the moment :-(

Mine or someone else's?
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 11.08.2009 at 15:07 John Maddock wrote :

Please note we're not talking about BLAS (C interface), but uBLAS which is already a Boost vector/matrix library here: http://www.boost.org/doc/libs/1_39_0/libs/numeric/ublas/doc/index.htm

i know that, but uBLAS is an attempt to implement a C library (actually Fortran) by means of C++, as far as i can see. that is not good; i propose a pure C++ solution
You also need to be very careful with allowing "simple" expressions like
u = A*v + w
As traditionally these have awful performance due to the temporaries involved.

of course an implementation must eliminate such performance issues; that's exactly what i'm talking about
There is also an ongoing Boost-sister-project to devise a "next generation" matrix library for C++, but I've lost the link for the moment :-(

unfortunately i don't know about that
-- Pavel

DE wrote:
i know that, but uBLAS is an attempt to implement a C library (actually Fortran) by means of C++, as far as i can see; that is not good. i propose a pure C++ solution
well, if you're not afraid to team up, i've been working on such a library (a matrix one, that is) since 2004 and I'll gladly share my work if needed.

on 11.08.2009 at 21:30 joel wrote :

i know that, but uBLAS is an attempt to implement a C library (actually Fortran) by means of C++, as far as i can see; that is not good. i propose a pure C++ solution
well, if you're not afraid to team up, i've been working on such a library (a matrix one, that is) since 2004 and I'll gladly share my work if needed.

i'd be glad to. i'll try to post my code and docs somewhere on friday (or saturday at the latest); there we'll see. for now i propose to prove to those angry men that boost will benefit from such a library
-- Pavel
ps: would you mind posting some of your lib code and examples?

DE wrote:
for now i propose to prove to those angry men that boost will benefit from such a library

Best approach is always: do, show, convince. I was burned with my own Boost.SIMD proposal. It's better to work first THEN show off.
For my own work, see:

http://www.lri.fr/~falcou/pub/falcou-ACIVS-2004.pdf
http://www.springerlink.com/content/l4r4462r25740127/

and the old page at sourceforge: http://nt2.sourceforge.net

I had a bit of discussion on the ML before, with a comparison w.r.t. Eigen
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 11.08.2009 at 21:52 joel wrote :
Best approach is always do, show, convince. I was burned with my own Boost.SIMD proposal. It's better to work first THEN show off.
For my own work, see :
http://www.lri.fr/~falcou/pub/falcou-ACIVS-2004.pdf http://www.springerlink.com/content/l4r4462r25740127/
and the old page at sourceforge : http://nt2.sourceforge.net
I had a bit of discussion on the ML before, with a comparison w.r.t. Eigen
i'll run through those links. for now, listen: as i can see, you are familiar with simd programming. when i researched the expression-template-based implementation of vector operations, i tried to use sse2 through compiler intrinsics (since they are de facto portable), but i did not get any benefit for doubles. i think that's because load/store operations consumed the boost from the actual add_pd's etc., so evaluating something like (pairwise for sse2)

  vec + vec*scalar - vec + element_wise_mul(vec, vec)

yields the same time for the sse2 and plain implementations. however, for floats i got a significant boost (~2, though not 4). so since i prefer doubles i dropped such simd optimizations (but they are still easily possible). i'm interested in what you can say on this
-- Pavel
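A sketch of the experiment being described (the `add_pairwise` helper is a hypothetical stand-in, with a scalar fallback when SSE2 is unavailable): for a plain element-wise add, each `_mm_add_pd` needs two loads and one store, so memory traffic dominates and doubles (two lanes) see little gain, while floats (four lanes) can approach ~2x.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
// Pairwise double adds via SSE2 intrinsics.
void add_pairwise(const double* a, const double* b, double* r, std::size_t n) {
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);           // load: memory traffic
        __m128d vb = _mm_loadu_pd(b + i);           // load
        _mm_storeu_pd(r + i, _mm_add_pd(va, vb));   // one add, one store
    }
    for (; i < n; ++i) r[i] = a[i] + b[i];          // scalar tail
}
#else
// Portable fallback when SSE2 is not available.
void add_pairwise(const double* a, const double* b, double* r, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) r[i] = a[i] + b[i];
}
#endif
```

The ratio of arithmetic to loads/stores only improves for heavier per-element work, which is consistent with the observation that simple expressions gain little.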

DE wrote:
i'll run through those links
If more is needed, feel free to mail me in private ;)
for now, listen: as i can see, you are familiar with simd programming
among others ;)
so when i researched the expression-template-based implementation of vector operations i tried to use sse2 through compiler intrinsics (since they are de facto portable) but i did not get any benefit for doubles ... i'm interested in what you can say on this

For simple operations you can't get more than 20 or 30% speedup, and most of the time you get none. You can however speed up things like transcendental or trigonometric functions by 2 or 3.
Search the archive for my extended SIMD performance chart using the SIMD layer from NT2. We also target multicore using OpenMP (rather trivial) and are starting GPUs this year with a new post-doctoral grant. So I hope to get everything working together to get some complete, be-all end-all matrix library out of that. -- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 11.08.2009 at 22:21 joel wrote :

We also target multicore using OpenMP (rather trivial) and are starting GPUs this year with a new post-doctoral grant. So I hope to get everything working together to get some complete, be-all end-all matrix library out of that.

this cycle is eternal; the next two cycles are AVX and GPGPUs (larrabee)
-- Pavel

DE wrote:
this cycle is eternal : the next two cycles are AVX and GPGPU's (larrabee)
Yeah, that'll keep me busy for the next 10 years then ;)

I hope you will release sooner than in 10 years though :). I'm already eager to see your SIMD stuff.

Pavel, let me second Zoran's recommendation of Eigen2. IMO it far outclasses any other currently released C++ linalg library. It is pure C++, not bindings to BLAS/LAPACK, but already as fast as things like Intel MKL for many operations.

Writing such a library from scratch is a massive undertaking. I would suggest working with Joel or the Eigen guys.

Patrick

On Tue, Aug 11, 2009 at 11:56 AM, joel <joel.falcou@lri.fr> wrote:
DE wrote:
this cycle is eternal : the next two cycles are AVX and GPGPU's (larrabee)
Yeah, that'll keep me busy for the next 10 years then ;)
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

Patrick Mihelich wrote:
I hope you will release sooner than in 10 years though :). I'm already eager to see your SIMD stuff.
I have to "boostify" it, so to speak. Currently it's a bit deep in unrelated stuff. Or maybe I could give a sneak peek at the non-boost version? -- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 12.08.2009 at 0:53 Patrick Mihelich wrote :
Pavel, let me second Zoran's recommendation of Eigen2. IMO it far outclasses any other currently released C++ linalg library. It is pure C++, not bindings to BLAS/LAPACK, but already as fast as things like Intel MKL for many operations.
Writing such a library from scratch is a massive undertaking. I would suggest working with Joel or the Eigen guys.

the thing is, even if a library like the one we are talking about is to be written from scratch, i'm not afraid of that, because i've already done it once and i know exactly how to do it: some copy-paste, some snippets, and voila. but thanks for your concern anyway. in the end i think we will all benefit from (re)designing such a library
-- Pavel
ps: BTW i wrote a very good (i believe) matrix inversion function and wanted to compare it to one from intel (IPP or MKL - i don't remember now). i laughed out loud when intel's routine yielded a stack overflow! i think they used recursion, to my disappointment

Hello, DE wrote:
on 12.08.2009 at 0:53
Patrick Mihelich wrote :
Pavel, let me second Zoran's recommendation of Eigen2. IMO it far outclasses any other currently released C++ linalg library. It is pure C++, not bindings to BLAS/LAPACK, but already as fast as things like Intel MKL for many operations.
Writing such a library from scratch is a massive undertaking. I would suggest working with Joel or the Eigen guys.

the thing is, even if a library like the one we are talking about is to be written from scratch, i'm not afraid of that, because i've already done it once and i know exactly how to do it: some copy-paste, some snippets, and voila. but thanks for your concern anyway. in the end i think we will all benefit from (re)designing such a library
A library similar in spirit to Eigen2 is GETFEM GMM++: http://home.gna.org/getfem/gmm_intro.html It has excellent support for sparse matrices and is also a pure header-based template library (with backends to *PACK and linear system solvers). Sebastian

on 12.08.2009 at 11:40 Sebastian Nowozin wrote :
A library similar in spirit to Eigen2 is GETFEM GMM++: http://home.gna.org/getfem/gmm_intro.html It has excellent support for sparse matrices and is also a pure header-based template library (with backends to *PACK and linear system solvers).
thanks for the link. i finally can say (with proof) that a generic c++ implementation can be comparable to, or even faster than, e.g. the genuine fortran blas implementation. so at least the task (of writing such a lib) turns out to be feasible
-- Pavel

to say a word about threading:

on 11.08.2009 at 22:21 joel wrote :
We also target multicore using openMP (rather trivial) and are starting GPUs this year with a new post-doctoral grant. SO I hope to get everythign working together to get some compelte, be-all end-all matrix library out of that.
i tried to parallelize some operations using openmp but got significant speedup only for matrix-vector and matrix-matrix multiplication. so i came to the thought that multithreading for linalg operations of moderate sizes is even harmful. rather, threading must be done at higher levels: e.g. when monte-carlo simulating, it is trivial to compute subsequent realizations in several threads -> minimum synchronization, minimum complexity, maximum speedup. so the question is: is threading for such a lib needed like air?
-- Pavel
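For reference, the kind of OpenMP parallelization under discussion, sketched for the matrix-vector case (illustrative helper, not a proposed API): each row's dot product is independent, so rows split across threads with no synchronization. Compiled without -fopenmp the pragma is ignored and the loop runs serially, which is also the point: thread startup has a fixed cost, so moderate sizes may see no speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-parallel matrix-vector product, A row-major n x n.
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& x,
                           std::size_t n) {
    std::vector<double> y(n, 0.0);
    #pragma omp parallel for
    for (long long i = 0; i < (long long)n; ++i) {  // signed index for OpenMP 2.x
        double s = 0.0;                             // per-thread accumulator
        for (std::size_t j = 0; j < n; ++j)
            s += A[(std::size_t)i * n + j] * x[j];
        y[(std::size_t)i] = s;
    }
    return y;
}
```

The private accumulator also sidesteps the cache false-sharing issue mentioned below: each thread writes y only once per row.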

i tried to parallelize some operations using openmp but got significant speedup only for matrix-vector and matrix-matrix multiplication

You have to test larger problems than that. I've threaded large image processing algorithms, Monte Carlo simulations etc. and all benefit from threading. Beware also that things like cache false sharing and wrong setup of private variables may screw up OpenMP. For pthreads, see my former paper at EuroPar 08. I added a template layer able to compute when it was useful to parallelize a given expression.
so the question is: is threading for such a lib needed like air?

Yes, but as with all things like this, you need to be able to set up a policy for it.
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

DE wrote:
hi i'm new to boost so please don't blame me
the question is do you need a matrix library which allows you to write clear, human readable code like
dV = (1/m)*force + transpose(A)*g - cross_product(omega, V)
//here dV is the derivative of velocity, m is (scalar) mass,
//A is the transition matrix,
//g is the gravity acceleration vector,
//omega is angular velocity and V is the velocity vector itself,
//force is defined in body space (e.g. aerodynamics and thrust)
that is an expression of newton's 2nd law (or, if you like, the 'change in momentum' law: d(m*V)/dt = d_local(m*V)/dt + (omega)x(m*V) = sum_of(forces); as you can see, i'm a math nerd). of course it must be implemented through a lazy evaluation technique (which immediately implies expression templates)

NT2 is a library of mine which aims at this. It's not complete yet, even if an old prototype is available on sourceforge.
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

DE wrote: [snip]
ps i looked at uBLAS and the design seemed obsolete to me, so i thought there could be a better approach than blindly following the blas design
I've started some work on integrating a high-level interface to BLAS/LAPACK in the Numeric Bindings library (which is in the sandbox, see http://tinyurl.com/otes8m). E.g., the high-level solve function intends to collapse about 40 lapack functions into one c++ function. There's also Karl Meerbergen's GLAS library at http://tinyurl.com/nx7tld. Kind regards, Rutger ter Borg

on 11.08.2009 at 15:18 Rutger ter Borg wrote :
I've started some work on integrating a high-level interface to BLAS/LAPACK in the Numeric Bindings library (which is in the sandbox, see http://tinyurl.com/otes8m). E.g., the high-level solve function intends to collapse about 40 lapack functions into one c++ function.

wrapping blas or uBLAS is not my intent. uBLAS does not support some handy concepts, such as the complexity of expressions, so if you try to wrap uBLAS involving complexity you would probably simply double the code size because too much would have to be wrapped. e.g. (V1+V2) has linear complexity, (M1*V1) has quadratic complexity, (M1*M2) has cubic complexity, etc.
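The "complexity of expressions" idea can be sketched as a compile-time trait (all names hypothetical): each expression node carries the exponent k of its O(n^k) evaluation cost, which an evaluator could consult, e.g. to decide when materializing a temporary is cheaper than re-evaluating a subexpression.

```cpp
#include <cassert>

// Hypothetical compile-time "complexity order": the exponent k in O(n^k)
// for evaluating an expression node on size-n operands.  Element-wise ops
// keep the larger of their operands' orders; products raise it.
struct VecT { static constexpr int order = 1; };
struct MatT { static constexpr int order = 2; };

template <class L, class R>
struct SumT {                                        // V1 + V2: element-wise
    static constexpr int order = (L::order > R::order) ? L::order : R::order;
};

template <class M, class V>
struct MatVecT { static constexpr int order = 2; };  // M1*V1: O(n^2)

template <class A, class B>
struct MatMatT { static constexpr int order = 3; };  // M1*M2: O(n^3)
```

Since the order is a constant of the expression's type, such decisions cost nothing at run time.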
There's also Karl Meerbergen's GLAS library at http://tinyurl.com/nx7tld.

again, as far as i can see, it's a wrapper, not a standalone lib
-- Pavel

DE wrote:
wrapping blas or uBLAS is not my intent. uBLAS does not support some handy concepts, such as the complexity of expressions, so if you try to wrap uBLAS involving complexity you would probably simply double the code size because too much would have to be wrapped. e.g. (V1+V2) has linear complexity, (M1*V1) has quadratic complexity, (M1*M2) has cubic complexity, etc.
I would argue that wrapping around back-ends is the best one could do, given the vendor support they have, and the man-years spent on them. E.g., using expression templates to rewrite a free expression to an optimal set of back-end calls (to BLAS, LAPACK, FFTW, etc.)? Then you would be able to benefit from vendor-optimised back-ends, and the expressiveness of C++.

Cheers, Rutger

on 11.08.2009 at 18:55 Rutger ter Borg wrote :
wrapping blas or uBLAS is not my intent. uBLAS does not support some handy concepts, such as the complexity of expressions, so if you try to wrap uBLAS involving complexity you would probably simply double the code size because too much would have to be wrapped. e.g. (V1+V2) has linear complexity, (M1*V1) has quadratic complexity, (M1*M2) has cubic complexity, etc.
I would argue that wrapping around back-ends is the best one could do, given the vendor support they have, and the man-years spent on them. E.g., using expression templates to rewrite a free expression to an optimal set of back-end calls (to BLAS, LAPACK, FFTW, etc.)? Then you would be able to benefit from vendor-optimised back-ends, and the expressiveness of C++.

speaking of such routines as finding eigenvalues or performing SVD, i agree with you; but a natively implemented blas seems to me better than mapping to library calls. that's what i'm talking about
-- Pavel

DE wrote:
speaking of such routines as finding eigenvalues or performing SVD, i agree with you; but a natively implemented blas seems to me better than mapping to library calls. that's what i'm talking about
I see. Even then, to outperform, say, ATLAS, Intel's MKL, or nVidia's CUBLAS, is extremely challenging. I think this will already hold for a serialized execution model. On top of that, when taking into account that a BLAS (or LAPACK for that matter) can be replaced by parallel and/or distributed execution models (threaded/PBLAS/ScaLAPACK), I would say it's near impossible. Besides, what is nicer than being able to plug in a new GPU and quickly use its power to its full extent? Or, given the dominance and vendor support of the BLAS API, some other future piece of hardware?

Cheers, Rutger

on 11.08.2009 at 21:03 Rutger ter Borg wrote :
speaking of such routines as finding eigenvalues or performing SVD, i agree with you; but a natively implemented blas seems to me better than mapping to library calls. that's what i'm talking about

I see. Even then, to outperform, say, ATLAS, Intel's MKL, or nVidia's CUBLAS, is extremely challenging. I think this will already hold for a serialized execution model. On top of that, when taking into account that a BLAS (or LAPACK for that matter) can be replaced by parallel and/or distributed execution models (threaded/PBLAS/ScaLAPACK), I would say it's near impossible.
Besides, what is nicer than being able to plug in a new GPU and quickly use its power to its full extent? Or, given the dominance and vendor support of the BLAS API, some other future piece of hardware?
if we lived in an ideal world, that would be the very approach. however, the first sentence on the boost homepage states that

Boost provides free peer-reviewed portable C++ source libraries.

i don't think blas implementations such as you describe will ever be portable (and i actually wonder how the threading was made portable??)

i focus on delivering a portable, generic, self-contained solution without dependencies on third-party entities

DE wrote: [snip]
if we lived in an ideal world, that would be the very approach. however, the first sentence on the boost homepage states that

Boost provides free peer-reviewed portable C++ source libraries.

i don't think blas implementations such as you describe will ever be portable (and i actually wonder how the threading was made portable??)
Thanks :-) I'm not sure, but portable may not mean "without dependencies of any other code whatsoever". There's also Boost.Python, Boost.MPI, Boost.Filesystem, Boost.Asio, etc., which all depend on some external API.
i focus on delivering a portable, generic, self-contained solution without dependencies on third-party entities
Interesting. Have you looked at MTL4? Cheers, Rutger

on 11.08.2009 at 22:11 Rutger ter Borg wrote :

Thanks :-) I'm not sure, but portable may not mean "without dependencies of any other code whatsoever". There's also Boost.Python, Boost.MPI, Boost.Filesystem, Boost.Asio, etc., which all depend on some external API.

you are welcome! maybe portable doesn't mean external-dependence-free, but at the least a library must not depend on entities other than standard facilities or the OS. i prefer bundles like e.g. qt: once you get it, you build it and use it and nothing else matters. i don't like it when a lib needs something else in order to use it; rather, a particular lib can deliver a feature of extensibility or integration with 3rd-party software, e.g. doxygen, which is able to draw its own ugly class diagrams but allows you to use the external graphviz tool seamlessly to generate state-of-the-art pics.

Interesting. Have you looked at MTL4?

i swear it looks familiar to me but i don't actually remember. i'll use all those references as a reference (oops! pun again) while polishing my own lib
-- Pavel

DE wrote:
i don't like it when a lib needs something else in order to use it; rather, a particular lib can deliver a feature of extensibility or integration with 3rd-party software

A good practice is to be generic enough so that plugging an external ref into the core lib is easy and can be done as an external toolbox.
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

on 11.08.2009 at 22:42 joel wrote :
i don't like it when a lib needs something else in order to use it; rather, a particular lib can deliver a feature of extensibility or integration with 3rd-party software
A good practice is to be generic enough so that plugging an external ref into the core lib is easy and can be done as an external toolbox.

are you reading minds? that is just what i wanted to say!
-- Pavel

DE wrote:
are you reading minds? that is just what i wanted to say!

I've been down this road for years now, so I start to know my maps :p
-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35
participants (9)
- DE
- Edward Grace
- Frédéric Bron
- joel
- John Maddock
- Patrick Mihelich
- Rutger ter Borg
- Sebastian Nowozin
- Zoran Cvetkov