Hi everybody,

I'm quite a new Boost user and I wonder why my C++ program is so slow. I wrote a program, and I have the same program in Matlab. I expected my C++ version to be much faster, but it's not. I use matrices, vectors, and matrix and vector proxies, do some solving, and compute with prod, element_div, etc. Is Boost in general slower than Matlab? Should I use something else? I'd be really happy about any help.

Regards,
Nisha K

PS: This question was already posted in the developer forum.
It's possible for a program to be very fast even though it's not written in C++. Matlab is heavily optimized for the sorts of programs you appear to be writing (it's not so fast when there's a lot of control-flow logic included). Choose the right tool for the job. -Max Wilson -- "The presentation or 'gift' of the Holy Ghost simply confers upon a man the right to receive at any time, when he is worthy of it and desires it, the power and light of truth of the Holy Ghost, although he may often be left to his own spirit and judgment." --Joseph F. Smith (manual, p. 69)
Nisha,
I'm quite a new Boost user and I wonder why my C++ program is so slow.
I wrote a program, and I have the same program in Matlab. I expected my C++ version to be much faster, but it's not.
I use matrices, vectors, and matrix and vector proxies, do some solving, and compute with prod, element_div, etc.
Is Boost in general slower than Matlab? Should I use something else?
I'd be really happy about any help.
This is unfortunately a very broad question, and the way it is worded will make it difficult to answer. In general, a compiled language such as C++ will be faster than an interpreted language such as Matlab. However, as with all computing languages, it is very possible to use C++ (and/or Boost) improperly or inefficiently.

Perhaps the best way to obtain help would be to post a small code example that exhibits the performance problem you are seeing. It would be good to see the Matlab and C++ equivalents that you are trying to compare.

With C++, compilation settings (compiler flags and defines) make a significant difference. For instance, many additional checks are done on the running code when it is compiled in debug mode. The checks are great for making sure that the code works properly, but they should be disabled for release builds in order to obtain optimal runtime performance. It is generally a good idea to compile with NDEBUG defined when building in release mode.

Additionally, it is always good to get familiar with profiling tools. These tools make it easier to spot performance bottlenecks in your code.

Hope This Helps,
Justin
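Before reaching for a full profiler, it can help to time just the code of interest and compare a debug build against an optimized one. The following is a minimal sketch, not code from the thread: `best_time_ms` and `scale_and_sum` are hypothetical names, and the workload is only a stand-in for the real computation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): runs f() `repeats` times and
// returns the fastest wall-clock time in milliseconds.
template <class F>
double best_time_ms(F f, int repeats = 5) {
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}

// Stand-in workload: scales every element and returns the new sum.
inline double scale_and_sum(std::vector<double>& v, double factor) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
        sum += v[i];
    }
    return sum;
}
```

Taking the best of several runs reduces noise from the rest of the system. Building the same file once without optimization and once with something like `g++ -O2 -DNDEBUG` and comparing the two numbers shows how much of a slowdown comes from build settings alone.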
Quoth nisha kannookadan on Wed, Nov 28, 2007 at 23:31:11 +0000
I'm quite a new Boost user and I wonder why my C++ program is so slow.
I would suggest investigating with the Intel compiler if you can. It is expensive but good. -- yann@kierun.org -= H+ =- www.kierun.org PGP: 009D 7287 C4A7 FD4F 1680 06E4 F751 7006 9DE2 6318
OK, I optimized my program (it now passes by reference, and the resize stuff is out):

void Wavelet::ttrans(matrix& At, int level)
{
    matrix cfe1, cfe2, cfo, cfe, c, d;
    int N,s2;

    N = (At.size1()+1)/2;
    s2 = At.size2();
    scalar_matrix zer(N,s2);

    for (int ii = 1; ii <= level; ii++) {
        cfo = subslice(At, 0,2,N, 0,1,s2);
        cfe = subslice(At, 1,2,N-1, 0,1,s2);
        c = (cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5);

        zer.resize(N,s2,true);
        cfe1 = zer;
        cfe2 = zer;

        (subrange(cfe1, 0,N-1, 0,s2)).assign(cfe);
        (subrange(cfe2, 1,N, 0,s2)).assign(cfe);
        d = cfo-(cfe1+cfe2)*0.5;

        (subrange(At, 0,N-1, 0,At.size2())).assign(c);
        (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);

        N = N/2;
    }

    cfe1.clear(); cfe2.clear(); cfo.clear(); cfe.clear(); c.clear(); d.clear();
}

But I guessed it's still not good enough, and wanted to work with pointers to avoid copying. The result was the next piece of code, which compiles but terminates when I run it:

void Wavelet::ttrans(matrix& At, int level)
{
    matrix cfe1, cfe2, *cfo, *cfe, *c, *d;
    int N,s2;

    N = (At.size1()+1)/2;
    s2 = At.size2();
    scalar_matrix zer(N,s2,0);

    for (int ii = 1; ii <= level; ii++) {
        *cfo = subslice(At, 0,2,N, 0,1,s2);
        *cfe = subslice(At, 1,2,N-1, 0,1,s2);
        *c = (*cfe + (subrange(*cfo, 0,N-1, 0,s2)+subrange(*cfo, 1,N, 0,s2))*0.5);

        zer.resize(N,s2,true);
        cfe1 = zer;
        cfe2 = zer;

        (subrange(cfe1, 0,N-1, 0,s2)).assign(*cfe);
        (subrange(cfe2, 1,N, 0,s2)).assign(*cfe);
        *d = *cfo-(cfe1+cfe2)*0.5;

        (subrange(At, 0,N-1, 0,At.size2())).assign(*c);
        (subrange(At, N-1,2*N-1, 0,At.size2())).assign(*d);

        N = N/2;
        cfe1.clear(); cfe2.clear();
    }
}

Am I on the right track, or can I do much more to optimize? Am I still doing a lot of copying? And can anybody tell me why this is not running? I will check out the Intel compiler.

Thanks.
On Thu, 29 Nov 2007 09:34:04 +0000, nisha kannookadan wrote:
Am I on the right track, or can I do much more to optimize? Am I still doing a lot of copying? And can anybody tell me why this is not running?
My friend, you need to read a book on C++. Specifically, it is obvious that your understanding of pointers is incomplete. I can recommend "The C++ Programming Language" but I have been told that it is not a good recommendation for people new to the language (I used it, so I don't see the big deal!) Don't get caught into trial-and-error programming. It is really the worst method of programming. Even more than not programming! -- Sohail Somani http://uint32t.blogspot.com
On Thursday, 29 November 2007 10:34:04, nisha kannookadan wrote:
Ok, I optimized my program (now its with pass by reference and the resize stuff is out):
void Wavelet::ttrans(matrix& At, int level) { matrix cfe1, cfe2, cfo, cfe, c, d; int N,s2;
N = (At.size1()+1)/2; s2 = At.size2(); scalar_matrix zer(N,s2);
for (int ii = 1; ii <= level; ii++) {
cfo = subslice(At, 0,2,N, 0,1,s2); cfe = subslice(At, 1,2,N-1, 0,1,s2);
A subslice is a lightweight handle onto a possibly heavyweight matrix. The line
cfo = subslice(At, 0,2,N, 0,1,s2);
throws away the performance gain, since cfo is a full-fledged matrix and the assignment copies the sliced data into it.
However, I don't know whether it's allowed to apply a subrange to a subslice. Can you provide a full working example plus data? It's very hard to give tips on template libraries without the hints I get from my compiler :)
c = (cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5);
zer.resize(N,s2,true); cfe1 = zer; cfe2 = zer;
(subrange(cfe1, 0,N-1, 0,s2)).assign(cfe); (subrange(cfe2, 1,N, 0,s2)).assign(cfe); d = cfo-(cfe1+cfe2)*0.5;
(subrange(At, 0,N-1, 0,At.size2())).assign(c); (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);
N = N/2; }
cfe1.clear(); cfe2.clear(); cfo.clear(); cfe.clear(); c.clear(); d.clear();
}
But I guessed it's still not good enough, and wanted to work with pointers to avoid copying. The result was the next piece of code, which compiles but terminates when I run it:
void Wavelet::ttrans(matrix& At, int level) { matrix cfe1, cfe2, *cfo, *cfe, *c, *d;
PLEASE don't use ublas matrices as pointers! They are not made for this (no virtual destructors, for performance reasons). If you want to avoid copying, always use references.

BTW: There's a ublas mailing list, http://lists.boost.org/mailman/listinfo.cgi/ublas, which is read by all the ublas devs and power users. If anyone knows how to get the most performance out of your code, they do. And again, you will get the most (useful) feedback if you provide a working example which can be hacked on.

Best, -- Maik
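Maik's remark about a subslice being a lightweight handle can be illustrated without uBLAS. The sketch below uses a hypothetical `StridedView` type over a `std::vector`; it is only loosely analogous to a uBLAS slice and is not how the library is implemented. It shows the difference between holding a view and assigning that view into fresh storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strided view: stores a pointer, a start, a stride and a
// length, but none of the element data.
struct StridedView {
    double* data;
    std::size_t start, stride, size;
    double& operator[](std::size_t i) const {
        assert(i < size);
        return data[start + i * stride];
    }
};

// Cheap: builds a handle onto every second element of v, beginning at start.
inline StridedView every_other(std::vector<double>& v, std::size_t start) {
    assert(start < v.size());
    const std::size_t n = (v.size() - start + 1) / 2;
    return StridedView{v.data(), start, 2, n};
}

// Expensive by comparison: materializes the viewed elements into new storage,
// which is roughly what assigning a slice to a full matrix amounts to.
inline std::vector<double> copy_of(const StridedView& s) {
    std::vector<double> out(s.size);
    for (std::size_t i = 0; i < s.size; ++i) out[i] = s[i];
    return out;
}
```

Writing through the view changes the underlying vector, while the copy keeps its own values. In the posted `ttrans`, `cfo` and `cfe` are full matrices, so each loop iteration pays for a copy of this second kind.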
On Thu, 29 Nov 2007 19:54:35 +0100, Maik Beckmann wrote:
PLEASE don't use ublas matrices as pointers! They are not made for this (no virtual destructors, for performance reasons). If you want to avoid copying, always use references.
I hope you weren't suggesting to do this:

// probably won't even compile, and will
// hurt you if it does.
vector_slice<T> & s = subslice(...);

-- Sohail Somani http://uint32t.blogspot.com
On Thursday, 29 November 2007 20:05:20, Sohail Somani wrote:
On Thu, 29 Nov 2007 19:54:35 +0100, Maik Beckmann wrote:
PLEASE don't use ublas matrices as pointers! They are not made for this (no virtual destructors, for performance reasons). If you want to avoid copying, always use references.
I hope you weren't suggesting to do this:
// probably won't even compile, and will
// hurt you if it does.
vector_slice<T> & s = subslice(...);
A slice is a lightweight object, copying doesn't hurt. No, the hint isn't related to slices and ranges. It's just:

typedef boost::numeric::ublas::matrix<double> matrix_type;
matrix_type At;
// .... fill At ...
matrix_type& cfo = At;

vs.

matrix_type* cfo = &At;
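The reference-versus-pointer contrast above can be sketched with standard containers only. This is an illustration under assumptions, with `std::vector<double>` standing in for a uBLAS matrix and all function names hypothetical. It also shows the most likely reason the posted pointer version terminates: there, `*cfo = ...` writes through a pointer that was never made to point at an object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Takes the storage by reference: nothing is copied, and the caller must
// supply a real object.
inline void halve_in_place(std::vector<double>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) values[i] *= 0.5;
}

// Pointer version: only valid if p points at a live object.
inline void halve_through_pointer(std::vector<double>* p) {
    assert(p != nullptr);
    halve_in_place(*p);
}

inline std::vector<double> demo() {
    std::vector<double> a{2.0, 4.0};
    std::vector<double>& ref = a;   // alias for a, nothing is copied
    std::vector<double>* ptr = &a;  // pointer initialized to an existing object
    halve_in_place(ref);
    halve_through_pointer(ptr);
    // By contrast, `std::vector<double>* bad; *bad = a;` dereferences an
    // uninitialized pointer, which is undefined behavior and a likely crash.
    return a;
}
```

Neither the reference nor the pointer removes the copy made by an assignment such as `cfo = subslice(...)`; they only avoid copying an object that already exists.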
On Thu, 29 Nov 2007 20:23:50 +0100, Maik Beckmann wrote:
I hope you weren't suggesting to do this:
// probably won't even compile, and will
// hurt you if it does.
vector_slice<T> & s = subslice(...);
A slice is a lightweight object, copying doesn't hurt.
The above wasn't a copy, it was a reference to a temporary (which shouldn't compile!) [snip] In any case, I think the OP needs to read a book or three :-) -- Sohail Somani http://uint32t.blogspot.com
On Thursday, 29 November 2007 20:26:54, Sohail Somani wrote:
On Thu, 29 Nov 2007 20:23:50 +0100, Maik Beckmann wrote:
I hope you weren't suggesting to do this:
// probably won't even compile, and will
// hurt you if it does.
vector_slice<T> & s = subslice(...);
A slice is a lightweight object, copying doesn't hurt.
The above wasn't a copy,
I know
Hi guys,

Thanks for everything. I'll follow your advice: I posted the code on the ublas list, I'm going to read a C++ book, and I'm positive it's going to be fine.

I have now changed the code into something runnable and also have it in Matlab. You will see, it's way slower in C++.

So first the C++:

#include
#include "MVHF.h"
#include
#include
#include
#include
#include
#define DNDEBUG
#include

using namespace std;

void ttrans(matrix& At, int level)
{
    matrix cfe1, cfe2, cfo, cfe, c, d;
    int N,s2;

    N = (At.size1()+1)/2;
    s2 = At.size2();
    zero_matrix zer(N,s2);

    for (int ii = 1; ii <= level; ii++) {
        cfo = subslice(At, 0,2,N, 0,1,s2);
        cfe = subslice(At, 1,2,N-1, 0,1,s2);
        c = cfe + (subrange(cfo, 0,N-1, 0,s2)+subrange(cfo, 1,N, 0,s2))*0.5;

        zer.resize(N,s2,true);
        cfe1 = zer;
        cfe2 = zer;

        (subrange(cfe1, 0,N-1, 0,s2)).assign(cfe);
        (subrange(cfe2, 1,N, 0,s2)).assign(cfe);
        d = cfo-(cfe1+cfe2)*0.5;

        (subrange(At, 0,N-1, 0,At.size2())).assign(c);
        (subrange(At, N-1,2*N-1, 0,At.size2())).assign(d);

        N = N/2;
    }
}

void init(int dom,int llev)
{
    int in_dofs_LA= 511;
    matrix Am_LA, As_LA;
    double mesh_size = ((double)(dom)/(in_dofs_LA+1));

    scalar_matrix zer(in_dofs_LA, in_dofs_LA,0.0);
    Am_LA.resize(in_dofs_LA, in_dofs_LA, false);
    As_LA.resize(in_dofs_LA, in_dofs_LA, false);
    Am_LA.assign(zer);
    As_LA.assign(zer);

    Am_LA(0,0) = 4.0;
    Am_LA(1,0) = 1.0;
    As_LA(0,0) = 2.0;
    As_LA(1,0) = -1.0;

    for (int ii = 1; ii < (in_dofs_LA-1); ii++){
        Am_LA(ii-1,ii) = 1.0;
        Am_LA(ii,ii) = 4.0;
        Am_LA(ii+1,ii) = 1.0;
        As_LA(ii-1,ii) = -1.0;
        As_LA(ii,ii) = 2.0;
        As_LA(ii+1,ii) = -1.0;
    }

    Am_LA(in_dofs_LA-2,in_dofs_LA-1) = 1.0;
    Am_LA(in_dofs_LA-1,in_dofs_LA-1) = 4.0;
    As_LA(in_dofs_LA-2,in_dofs_LA-1) = -1.0;
    As_LA(in_dofs_LA-1,in_dofs_LA-1) = 2.0;

    Am_LA = Am_LA*(mesh_size/6.0);
    As_LA = As_LA*(1.0/mesh_size);

    ttrans(Am_LA,llev);
    Am_LA = trans(Am_LA);
    ttrans(Am_LA,llev);
    Am_LA = trans(Am_LA);

    ttrans(As_LA,llev);
    As_LA = trans(As_LA);
    ttrans(As_LA,llev);
    As_LA = trans(As_LA);
}

int main()
{
    clock_t start, end;
    start = clock();
    init(1,8);
    end = clock();
    std::cout << " " << std::endl;
    std::cout << "Elapsed time is " << (double)(end-start)/CLOCKS_PER_SEC << "." << std::endl;
}

Here is the Matlab version of it:

function test()
tic
R = 1;
L = 8;
n = 2^(L+1)-1;
h = R/(n+1);
e = ones(n,1);
Am = h/6*spdiags([e, 4*e, e], -1:1, n, n);
% compute stiffness matrix
As = 1/h*spdiags([-e, 2*e, -e], -1:1, n, n);
% transform into wavelets
Am = ttrans(ttrans(Am,L)',L)';
As = ttrans(ttrans(As,L)',L)';
toc
return

function cd = ttrans(cl,L)
N = size(cl,1)+1;
N = N/2;
m = size(cl,2);
z = zeros(1,m);
cd = cl;
for i=1:L
    cfo = cd(1:2:2*N-1,:);
    cfe = cd(2:2:2*N-2,:);
    % wavelets
    c = cfe + (cfo(1:end-1,:)+cfo(2:end,:))/2;
    d = cfo - ([cfe;z]+[z;cfe])/2;
    cd(1:2*N-1,:) = [c;d];
    N = N/2;
end
return

But I really have to say, this mailing list is great. I really appreciate all of your replies.

Thanks,
Nisha K
I have now changed the code into something runnable and also have it in Matlab. You will see, it's way slower in C++.
I probably shouldn't ask but did you verify the output was the same in both cases? "diff " would work if you could get the same formats and precisions. FWIW, a decent compiler could probably figure out that certain things have no observable effect and start yanking code that produces such results. Before you give up on C++, try coding a simple loop without the OO stuff and see what you get. In-place wavelet is nice intro and you can use the optimized result for "lifting" to more complicated wavelets.
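As one reading of the "simple loop without the OO stuff" suggestion, here is a sketch of the posted transform for a single column, written with plain loops and one scratch buffer. It follows the Matlab `ttrans` from the thread term by term, but `ttrans_1d` is a hypothetical name, and the function assumes the input length is 2^(L+1)-1 as in the posted test (511 for L = 8). It has not been checked against the poster's real data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One column of the transform: for each level, the samples at even 0-based
// positions ("cfo" in the thread) and odd positions ("cfe") are combined as
//   c[k] = cfe[k] + (cfo[k] + cfo[k+1]) / 2
//   d[k] = cfo[k] - (cfe[k] + cfe[k-1]) / 2, with out-of-range cfe taken as 0
// and [c; d] is written back over the first 2n-1 entries.
inline void ttrans_1d(std::vector<double>& a, int levels) {
    assert(a.size() % 2 == 1);
    std::vector<double> tmp(a.size());      // scratch, allocated once
    std::size_t n = (a.size() + 1) / 2;     // number of "cfo" samples
    for (int lev = 0; lev < levels && n >= 1; ++lev) {
        const std::size_t len = 2 * n - 1;
        for (std::size_t k = 0; k + 1 < n; ++k)
            tmp[k] = a[2 * k + 1] + 0.5 * (a[2 * k] + a[2 * k + 2]);
        for (std::size_t k = 0; k < n; ++k) {
            const double left  = (k > 0)     ? a[2 * k - 1] : 0.0;
            const double right = (k + 1 < n) ? a[2 * k + 1] : 0.0;
            tmp[n - 1 + k] = a[2 * k] - 0.5 * (left + right);
        }
        for (std::size_t i = 0; i < len; ++i) a[i] = tmp[i];
        n /= 2;
    }
}
```

For the input {1, 2, 3} and one level this gives {4, 0, 2}. Compared with the uBLAS version, no matrix temporaries are created inside the loop, which makes it a useful baseline when looking for where the time goes.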
From: nishak44@hotmail.com To: boost-users@lists.boost.org Date: Fri, 30 Nov 2007 17:05:55 +0000 Subject: Re: [Boost-users] C++ Performance
[snip]
Hi Mike,
I probably shouldn't ask but did you verify the output was the same in both cases?
Yes, it is. The program itself is fine. It's just the speed.
"diff " would work if you could get the same formats and precisions.
What is "diff"?
FWIW, a decent compiler could probably figure out that certain things have no observable effect and start yanking code that produces such results.
I guess I've got to change the compiler; too much trouble with Eclipse. Thanks a lot.
Hi,
On 12/1/07, nisha kannookadan wrote:
"diff " would work if you could get the same formats and precisions.
What is "diff" ?
From my understanding:
If you can write the results of your computation into a file, write the output from MATLAB and then from your C++ program. Make sure you follow the same order when writing the data to a file (with the same formatting, such as spaces and newlines). Use a program like GNU diff/kdiff3/windiff on the two output files to find the differences between them. Ideally, you should not see any difference. If you do notice a difference, check whether it is negligible (you have to decide the precision you are expecting).

The idea behind this is: A drive to improve performance _must_ not degrade the correctness of the solution.

-dky -- Contents reflect my personal views only!
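When the two outputs differ only in the last digits, a textual diff reports every line. A numeric comparison with a tolerance is one alternative. The sketch below is an assumption-laden illustration: `max_abs_diff` and `outputs_match` are hypothetical helpers, and both outputs are assumed to be whitespace-separated numbers written in the same order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <istream>
#include <sstream>
#include <string>

// Reads whitespace-separated numbers from two streams and returns the largest
// absolute difference, or -1.0 if one stream holds more numbers than the other.
inline double max_abs_diff(std::istream& a, std::istream& b) {
    double x = 0.0, y = 0.0, worst = 0.0;
    while (true) {
        const bool has_a = static_cast<bool>(a >> x);
        const bool has_b = static_cast<bool>(b >> y);
        if (!has_a && !has_b) return worst;  // both exhausted
        if (has_a != has_b) return -1.0;     // different number of values
        worst = std::max(worst, std::fabs(x - y));
    }
}

// Convenience wrapper for text already held in memory.
inline bool outputs_match(const std::string& a, const std::string& b,
                          double tolerance) {
    assert(tolerance >= 0.0);
    std::istringstream sa(a), sb(b);
    const double d = max_abs_diff(sa, sb);
    return d >= 0.0 && d <= tolerance;
}
```

With files, the same function can be called on two `std::ifstream` objects, one opened on the Matlab output and one on the C++ output. The tolerance is the "precision you are expecting" mentioned above and has to be chosen by the user.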
have to find the differences between them. Ideally, you should not see any difference. If you do notice a difference, check if the difference is negligible (you have to decide the precision you are expecting).
Normally with numerical stuff, diff wouldn't be easy unless you knock down precision, but even then you may often want to at least look at the noise. I think this wavelet transform is invertible and should be "exact" (you can transform integers to integers and invert if you do everything right). In this case, it may be easier to write an inverse wavelet transform and output the numerical differences between the original and recovered sequences.

More to the point, at least verify that the suspect code block is really guilty and try to disassemble it.

Also, as far as speed goes, look at something like FFTW (you can find stuff on Google). They have lots of documentation and discussion posted in various places. Or the link I posted earlier to the Intel site should help. Don't just count instructions, as memory access patterns can be a much bigger deal: think various shades of virtual memory.
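The point about memory access patterns can be made concrete with a small sketch. It assumes a dense matrix stored row by row in one buffer, which is also the default storage order of a uBLAS `matrix`; the `RowMajor` type and both functions are hypothetical and only illustrate the two traversal orders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense matrix stored row by row in one contiguous buffer (row-major).
struct RowMajor {
    std::size_t rows, cols;
    std::vector<double> data;
    RowMajor(std::size_t r, std::size_t c, double v)
        : rows(r), cols(c), data(r * c, v) {}
    double at(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Inner loop walks along a row: consecutive memory addresses.
inline double sum_row_by_row(const RowMajor& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j) s += m.at(i, j);
    return s;
}

// Inner loop walks down a column: each step jumps `cols` elements ahead.
inline double sum_column_by_column(const RowMajor& m) {
    double s = 0.0;
    for (std::size_t j = 0; j < m.cols; ++j)
        for (std::size_t i = 0; i < m.rows; ++i) s += m.at(i, j);
    return s;
}
```

Both functions execute the same number of additions, yet on large matrices the second one typically causes far more cache misses. Which order the posted code actually uses inside uBLAS would have to be measured rather than assumed.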
Date: Tue, 4 Dec 2007 10:19:42 +0530 From: dhruvakm@gmail.com To: boost-users@lists.boost.org Subject: Re: [Boost-users] C++ Performance
[snip]
Normally with numerical stuff, diff wouldn't be easy unless you knock down precision [snip]
I will do that. Thanks.
From my understanding: [snip] The idea behind this is: A drive to improve performance _must_ not degrade the correctness of the solution.
OK, thanks Dhruva.
On Nov 30, 2007, at 12:05 PM, nisha kannookadan wrote:
I changed now the code to something runnable and have it also in matlab, u will see, its way slower in C++.
[snip]
#define DNDEBUG
As Justin told you earlier, the line above should be: #define NDEBUG to eliminate the overhead of bounds-checking, etc. On the compiler command-line, it is "-DNDEBUG", which is the concatenation of the "-D" option, meaning to define a preprocessor symbol, and "NDEBUG", the literal symbol. Regards, -Steve -- Steve Byan Steve.Byan@netapp.com
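The difference between the two macro names can be shown with the standard `assert` alone. In this sketch, `assertions_enabled` and `element_at` are hypothetical helpers; the only facts relied on are that `<cassert>` checks the macro `NDEBUG`, and that a source-level definition has to appear before the header is included to have any effect.

```cpp
#define DNDEBUG  // the name from the posted code: an unrelated macro that no header checks

#include <cassert>
#include <cstddef>
#include <vector>

// Reports whether this translation unit was compiled with NDEBUG defined,
// for example through `g++ -DNDEBUG` on the command line.
inline bool assertions_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// The bounds check below costs time in debug builds and disappears
// completely once NDEBUG is defined before <cassert> is included.
inline double element_at(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}
```

Defining `DNDEBUG` changes nothing here, so `assertions_enabled()` still returns true unless the build itself defines `NDEBUG`. Note also that `-DNDEBUG` by itself does not turn on compiler optimization; that takes a separate flag such as `-O2`.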
Sorry for top-posting without quotes, but I think someone said something about the Intel compiler that applies here. When I said earlier that the Intel compiler was as good as my hand-coded assembler for wavelets, I think I was talking about something I wrote as naive hand-coded for-loops and, IIRC, it was an in-place 2D transform. I'm not sure what that compiler can do with extra copies, and I'm pretty sure it won't volunteer to overwrite your operands :) This can be a big deal on big data sets when considering cache misses. And, if you are doing multiple passes over the same data, blocking (doing many levels on small chunks) can help too.

Also, I think JM said something about gcc not inlining very well. I had to check, as this was always the one thing I assumed a compiler could do right. I did a quick test of my own code with

g++ -Wall -O0 -ggdb -S -o junk00g string_test.cpp

(gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)) versus -O3, and it appears, on a quick look, that most of the "calls" went away in the code I care about (although I haven't used gcc much in the past and haven't looked at assembler in a while).
From: maikbeckmann@gmx.de To: boost-users@lists.boost.org Date: Thu, 29 Nov 2007 19:54:35 +0100 Subject: Re: [Boost-users] C++ Performance
[snip]
I hope you're not all going to kick me out when you see this, but I actually have no real clue about makefiles. The program is huge and I get lost writing the makefile, since there are a few things I don't understand about it: the main file uses a class, which uses other classes, and those classes in turn use other or common classes.

I got a lot of suggestions to add NDEBUG/DNDEBUG etc. on the command line when compiling. As I said, I work with Eclipse and usually just press play. But I saw that it's possible to set up something like a makefile there, so I did that. I guess I can't define NDEBUG etc. without a makefile, right? My makefile has only this line:

g++ -DNDEBUG *.cpp

It's running, but not much faster. Is this totally wrong?

And about the bindings: I actually just need to solve a linear system with a triangular matrix, Hy = s. Is it fine just to do

y = solve(subrange(H, 0,m, 0,m), subrange(s, 0, m),upper_tag());

or should I rather use the bindings? Apart from that, I only use prod so far. So I'm not doing any crazy stuff, and I guess the bindings aren't necessary here. Am I right?

Thanks a lot
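For reference, what a triangular solve of Hy = s with an upper-triangular H has to compute is back substitution. The sketch below is a plain C++ illustration with a hypothetical `solve_upper` function and a row-major `std::vector` as the matrix; it is not the uBLAS or bindings implementation and says nothing about their relative speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves U * y = s for an upper-triangular m x m matrix U stored row by row.
// Works from the last row upwards, so every y[j] needed by row i is known.
inline std::vector<double> solve_upper(const std::vector<double>& U,
                                       const std::vector<double>& s) {
    const std::size_t m = s.size();
    assert(U.size() == m * m);
    std::vector<double> y(m);
    for (std::size_t i = m; i-- > 0;) {
        double acc = s[i];
        for (std::size_t j = i + 1; j < m; ++j) acc -= U[i * m + j] * y[j];
        assert(U[i * m + i] != 0.0);  // a zero diagonal means U is singular
        y[i] = acc / U[i * m + i];
    }
    return y;
}
```

The work grows with the square of m, so for a single small system this step is rarely the bottleneck. Timing it separately is the safest way to decide whether anything more specialized is worth the effort.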
Forget about the makefile; I realized that it was totally wrong. I defined DNDEBUG at the top, like this:

#include
#include
#include
#define DNDEBUG
#include
..

That should be right. :)
Nisha K
Nisha, On Thursday 29 November 2007 06:05:12 nisha kannookadan wrote:
#define DNDEBUG
This line should be the following:

#define NDEBUG

In reality, it is better to handle this flag in the build system. I would suggest looking into CMake (http://cmake.org) for a build system. It makes generating build files much easier. Additionally, I believe that Makefiles generated by CMake automatically set NDEBUG for release builds.

As with everything worth learning, you will need to invest some time in learning CMake. You will be rewarded if you do though.

Justin
participants (9)
- dhruva
- KSpam
- Maik Beckmann
- Maximilian Wilson
- Mike Marchywka
- nisha kannookadan
- Sohail Somani
- Steve Byan
- Yann Golanski