
Sorry for top-posting without quotes, but I think someone said something about the Intel compiler that applies here. When I said earlier that the Intel compiler was as good as my hand-coded assembler for wavelets, I think I was talking about something I wrote as naive hand-coded for-loops and, IIRC, it was an in-place 2D transform. I'm not sure what that compiler can do with extra copies, and I'm pretty sure it won't volunteer to overwrite your operands :) This can be a big deal on big data sets once you consider cache misses. And if you are doing multiple passes over the same data, blocking (doing many levels on small chunks) can help too.

Also, I think JM said something about gcc not inlining very well. I had to check, as this was always the one thing I assumed a compiler could do right. I did a quick test of my own code, compiling with

    g++ -Wall -O0 -ggdb -S -o junk00g string_test.cpp

(gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)) versus -O3, and it appears, on a quick look, that most of the "calls" went away in the code I care about (although I haven't used gcc much in the past and haven't looked at assembler in a while).
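To make the "naive for-loops, in place" idea concrete, here is a rough sketch of what I mean. It is not my old code and not the ttrans routine quoted below: I am assuming a 5/3-style lifting step with mirrored boundaries and a row-major std::vector, and the names lift_in_place and lift2d_in_place are made up for this mail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One level of a 5/3-style lifting step (an assumption, chosen for
// illustration) applied in place to the strided signal
// x[0], x[stride], ..., x[(n-1)*stride].
// Afterwards even positions hold the smooth part, odd positions the detail.
void lift_in_place(double* x, std::size_t n, std::size_t stride)
{
    if (n < 2) return;
    // Predict: each odd sample becomes its deviation from the mean of its
    // even neighbours (the left neighbour is mirrored at the right edge).
    for (std::size_t i = 1; i < n; i += 2) {
        const double left  = x[(i - 1) * stride];
        const double right = (i + 1 < n) ? x[(i + 1) * stride] : left;
        x[i * stride] -= 0.5 * (left + right);
    }
    // Update: each even sample is corrected with the neighbouring details
    // (mirrored at both edges).
    for (std::size_t i = 0; i < n; i += 2) {
        const double right = (i + 1 < n) ? x[(i + 1) * stride]
                                         : x[(i - 1) * stride];
        const double left  = (i > 0) ? x[(i - 1) * stride] : right;
        x[i * stride] += 0.25 * (left + right);
    }
}

// One 2D level on a row-major rows x cols array: all rows, then all columns.
// No temporaries are allocated; the operands are overwritten.
void lift2d_in_place(std::vector<double>& a, std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    if (rows == 0 || cols == 0) return;
    for (std::size_t r = 0; r < rows; ++r)
        lift_in_place(a.data() + r * cols, cols, 1);
    for (std::size_t c = 0; c < cols; ++c)
        lift_in_place(a.data() + c, rows, cols);
}
```

The column pass walks memory with a stride of cols, which is where the cache misses come from on big arrays; processing a few columns at a time (blocking) is one way to soften that.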
From: maikbeckmann@gmx.de
To: boost-users@lists.boost.org
Date: Thu, 29 Nov 2007 19:54:35 +0100
Subject: Re: [Boost-users] C++ Performance
On Thursday, 29 November 2007 10:34:04, nisha kannookadan wrote:
Ok, I optimized my program (now it's with pass by reference and the resize stuff is out):
void Wavelet::ttrans(matrix& At, int level) {
    matrix cfe1, cfe2, cfo, cfe, c, d;
    int N, s2;

    N = (At.size1()+1)/2;
    s2 = At.size2();
    scalar_matrix zer(N,s2);

    for (int ii = 1; ii <= level; ii++) {
        cfo = subslice(At, 0,2,N, 0,1,s2);
        cfe = subslice(At, 1,2,N-1, 0,1,s2);

A subslice is a lightweight handle for a possibly heavyweight matrix. This line
        cfo = subslice(At, 0,2,N, 0,1,s2);
eliminates the performance gain, since cfo is a full-fledged matrix, so the assignment copies the data.
However, I don't know if it's allowed to apply a subrange to a subslice. Can you provide a full working example plus data? It's very hard to give tips on template libraries without the hints I get from my compiler :)
        c = (cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5);

        zer.resize(N,s2,true);
        cfe1 = zer;
        cfe2 = zer;

        (subrange(cfe1, 0,N-1, 0,s2)).assign(cfe);
        (subrange(cfe2, 1,N, 0,s2)).assign(cfe);
        d = cfo-(cfe1+cfe2)*0.5;

        (subrange(At, 0,N-1, 0,At.size2())).assign(c);
        (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);

        N = N/2;
    }

    cfe1.clear(); cfe2.clear(); cfo.clear(); cfe.clear(); c.clear(); d.clear();
}
But I guessed it's still not good enough, and wanted to work with pointers to avoid copying. The result was the next piece of code, which compiles but terminates when I run it:
void Wavelet::ttrans(matrix& At, int level) {
    matrix cfe1, cfe2, *cfo, *cfe, *c, *d;
PLEASE don't use ublas matrices as pointers! They are not made for this (no virtual destructors, for performance reasons). If you want to avoid copying, always use references.
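A small sketch of the idea, in plain C++ rather than ublas so that it stands on its own. Matrix, RowSlice and scale_even_rows are hypothetical names invented for this illustration, not ublas types. The view only stores a reference to the matrix plus a start and a stride, so writing through it changes the original, while assigning a matrix to another matrix makes a full copy. As far as I know, the corresponding ublas handle is what subslice returns (a matrix_slice), as long as you do not assign it to a matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix, for illustration only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Hypothetical lightweight handle: every `stride`-th row starting at `start`.
// It owns no data; it only refers to the matrix it was created from.
struct RowSlice {
    Matrix& m;
    std::size_t start, stride, count;
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < count);
        return m(start + i * stride, j);
    }
};

// The matrix is taken by reference, so nothing is copied on the call, and
// the slice writes straight into the caller's storage.
void scale_even_rows(Matrix& a, double factor)
{
    RowSlice even{a, 0, 2, (a.rows + 1) / 2};
    for (std::size_t i = 0; i < even.count; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            even(i, j) *= factor;
}
```

Calling scale_even_rows(a, 2.0) on a 3 x 2 matrix doubles rows 0 and 2 of a itself. By contrast, "Matrix copy = a;" duplicates the whole storage, which is the kind of hidden copy the quoted code pays for on every level.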
BTW: There's a ublas mailing list - http://lists.boost.org/mailman/listinfo.cgi/ublas - which is read by all ublas devs and power users. If someone knows how to get the most performance out of your code, they do. And again, you will get the most (useful) feedback if you provide a working example which can be hacked.
Best, -- Maik
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users