Re: [Boost-users] C++ Performance

29 Nov 2007

Sorry for top-posting without quotes, but I think someone mentioned the
Intel compiler, which is relevant here. When I said earlier that the Intel compiler
was as good as my hand-coded assembler for wavelets, I think I was
talking about something I wrote as naive hand-coded for-loops
and, IIRC, it was an in-place 2D transform. I'm not sure what that compiler
can do with extra copies, and I'm pretty sure it won't volunteer to overwrite your
operands :) This can be a big deal on big data sets when considering cache misses.
And, if you are doing multiple passes over the same data, blocking (doing many levels
on a small chunk) can help too.
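
To make the in-place point concrete, here is a minimal sketch of a 1D Haar
lifting step that overwrites its operands instead of building temporaries.
It is not the code discussed in this thread: Haar is swapped in only because
it is the shortest lifting scheme to write down, and the function names
(haar_lift_inplace, haar_unlift_inplace, haar_forward, haar_inverse) are
hypothetical.

```cpp
#include <cstddef>
#include <vector>

// One in-place Haar lifting step over the samples x[0], x[stride], x[2*stride], ...
// Afterwards each "even" sample holds a pairwise average and each "odd" sample
// holds a pairwise difference. No temporary vectors are allocated.
void haar_lift_inplace(std::vector<double>& x, std::size_t stride) {
    for (std::size_t i = 0; i + stride < x.size(); i += 2 * stride) {
        x[i + stride] -= x[i];        // predict: detail = odd - even
        x[i] += 0.5 * x[i + stride];  // update: average = even + detail / 2
    }
}

// Exact inverse of haar_lift_inplace: undo the update, then undo the predict.
void haar_unlift_inplace(std::vector<double>& x, std::size_t stride) {
    for (std::size_t i = 0; i + stride < x.size(); i += 2 * stride) {
        x[i] -= 0.5 * x[i + stride];
        x[i + stride] += x[i];
    }
}

// Multi-level forward transform: each level works on the averages left by the
// previous level, so the stride doubles every time.
void haar_forward(std::vector<double>& x, int levels) {
    std::size_t stride = 1;
    for (int l = 0; l < levels; ++l, stride *= 2) {
        haar_lift_inplace(x, stride);
    }
}

// Multi-level inverse: undo the levels in reverse order.
void haar_inverse(std::vector<double>& x, int levels) {
    for (int l = levels - 1; l >= 0; --l) {
        haar_unlift_inplace(x, std::size_t(1) << l);
    }
}
```

For blocking, the same routines could be run for several levels on one small
piece of the data before moving to the next piece, so that each piece stays
in cache across passes. Whether that pays off depends on the data size and
the machine.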

Also, I think JM said something about gcc not inlining very well. 
I had to check as this was always the one thing I assumed a compiler could
do right. I did a quick test of my own code

 g++ -Wall -O0 -ggdb -S -o junk00g string_test.cpp

gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

versus -O3 and it appears, on a quick look, that most of the "calls" went
away in the code I care about (although I haven't used gcc much in the past
and haven't looked at assembler in a while).
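
A small file along these lines is enough to repeat that kind of check. This
is only a sketch: Grid, at and sum_grid are hypothetical names, not the
string_test.cpp code mentioned above. Compile it once with -O0 -S and once
with -O3 -S, then compare how many call instructions remain inside sum_grid
in the two assembler listings.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical small accessor, the kind of call one hopes the optimizer removes.
struct Grid {
    std::size_t cols;
    std::vector<double> data;

    // Row-major element access; a candidate for inlining.
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Sums the first `rows` rows of the grid by going through the accessor
// on every element.
double sum_grid(const Grid& g, std::size_t rows) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < g.cols; ++c) {
            total += g.at(r, c);
        }
    }
    return total;
}
```

The exact listing depends on the compiler version and flags, so treat the
comparison as a rough indicator rather than a measurement.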
...
From: maikbeckmann@gmx.de
To: boost-users@lists.boost.org
Date: Thu, 29 Nov 2007 19:54:35 +0100
Subject: Re: [Boost-users] C++ Performance
On Thursday 29 November 2007 10:34:04, nisha kannookadan wrote:
...
OK, I optimized my program (now it's with pass by reference and the resize
stuff is out):
void Wavelet::ttrans(matrix& At, int level)
{
    matrix cfe1, cfe2, cfo, cfe, c, d;
    int N,s2;
    N = (At.size1()+1)/2;
    s2 = At.size2();
    scalar_matrix zer(N,s2);
    for (int ii = 1; ii <= level; ii++)
    {
        cfo = subslice(At, 0,2,N, 0,1,s2);
        cfe = subslice(At, 1,2,N-1, 0,1,s2);
A subslice is a lightweight handle for a possibly heavyweight matrix. This line
    cfo = subslice(At, 0,2,N, 0,1,s2);
eliminates the performance gain, since cfo is a full-fledged matrix and the
assignment copies the data into it.
However, I don't know if it's allowed to apply a subrange to a subslice. Can you
send a full working example plus data? It's very hard to give tips on
template libraries without the hints I get from my compiler :)
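
The difference between holding a handle and assigning it to a full matrix
can be sketched without uBLAS at all. Dense and RowSlice below are
hypothetical stand-ins written only for this illustration; they are not the
uBLAS types and do not reproduce the uBLAS interface. The point is just that
the handle stores a reference plus a few integers, while copy() allocates
and fills new storage, which is roughly what happens when a slice is
assigned to a matrix variable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix (a stand-in, not the uBLAS matrix).
struct Dense {
    std::size_t rows, cols;
    std::vector<double> data;

    Dense(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical lightweight handle: `count` rows of `m`, starting at row
// `start` and taking every `step`-th row. It owns no element storage.
struct RowSlice {
    Dense& m;
    std::size_t start, step, count;

    // Element access goes straight through to the underlying matrix.
    double& operator()(std::size_t i, std::size_t j) { return m(start + i * step, j); }

    // Materialize the slice as an independent matrix. This is the expensive
    // part: it allocates count * cols doubles and copies every element.
    Dense copy() const {
        Dense out(count, m.cols);
        for (std::size_t i = 0; i < count; ++i) {
            for (std::size_t j = 0; j < m.cols; ++j) {
                out(i, j) = m(start + i * step, j);
            }
        }
        return out;
    }
};
```

Writing through the handle changes the original matrix, while writing to
the result of copy() does not, which is also why the copy cannot be avoided
once a slice is stored in a plain matrix variable.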
...
        c = (cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5);
        zer.resize(N,s2,true);
        cfe1 = zer;
        cfe2 = zer;
        (subrange(cfe1, 0,N-1, 0,s2)).assign(cfe);
        (subrange(cfe2, 1,N, 0,s2)).assign(cfe);
        d = cfo-(cfe1+cfe2)*0.5;
        (subrange(At, 0,N-1, 0,At.size2())).assign(c);
        (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);
        N = N/2;
    }
    cfe1.clear();
    cfe2.clear();
    cfo.clear();
    cfe.clear();
    c.clear();
    d.clear();
}
But I guessed it's still not good enough, and wanted to work with pointers
to avoid copying... and the result was the next piece of code, which compiles
but terminates when I run it:
void Wavelet::ttrans(matrix& At, int level)
{
    matrix cfe1, cfe2, *cfo, *cfe, *c, *d;
PLEASE don't use ublas matrices as pointers! They are not made for this (no
virtual destructors, for performance reasons). If you want to avoid copying,
always use references.
BTW: There's a ublas mailing list
- http://lists.boost.org/mailman/listinfo.cgi/ublas
which is read by all ublas devs and power users. If anyone knows how to get the
most performance out of your code, they do. And again, you will get the most
(useful) feedback if you provide a working example which can be hacked.
Best,
-- Maik

Mike Marchywka