
I would guess memory allocation: if you're multiplying into already allocated arrays and don't have to dynamically allocate memory for the result of the multiplication, then that should be a lot quicker.
No, I do not pre-allocate arrays. Here is the code for the O(N^2) algorithm I'm using:

    double * g(double * p, double * q, size_t len)
    {
        double * out = new double[2*len];
        clear(out, 2*len); // calls memset
        for (size_t i=0; i<len; ++i)
            for (size_t j=0; j<len; ++j)
                out[i+j] += p[i] * q[j];
        return out;
    }

I can't remember if the polynomial class operators are move-optimised either (they should be).
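For comparison, here is a rough sketch of what a pre-allocating variant of g() might look like. The name multiply_into and the caller-owned out buffer are assumptions made for illustration, not part of the existing code; the caller is assumed to provide room for at least 2*len - 1 doubles.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical variant of g(): the caller owns `out`, which must have room
// for at least 2*len - 1 doubles. No allocation happens inside the call.
void multiply_into(const double* p, const double* q, std::size_t len, double* out)
{
    if (len == 0) return;                      // nothing to multiply
    std::fill(out, out + (2 * len - 1), 0.0);  // zero only the coefficients used
    for (std::size_t i = 0; i < len; ++i)
        for (std::size_t j = 0; j < len; ++j)
            out[i + j] += p[i] * q[j];         // schoolbook O(N^2) convolution
}
```

Reusing one output buffer across many calls would take the new[] out of the timed path, which is one way to check whether allocation really accounts for the difference.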
I don't think they are move-optimised. All the functions take lvalue references, and the class has no move constructor either. I guess this class needs a lot of work. -- Lakshay
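As a minimal sketch of what "move-optimised" could mean here: the class below is a hypothetical stand-in, not the actual polynomial class (the name Poly, the coeffs() accessor, and the std::vector storage are all assumptions). Holding the coefficients in a std::vector lets the compiler generate the move constructor and move assignment, so the product returned by operator* is not deep-copied.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the polynomial class discussed above.
class Poly {
public:
    // Take the coefficients by value and move them into place.
    explicit Poly(std::vector<double> c) : coeffs_(std::move(c)) {}

    // No user-declared copy/move/destructor: std::vector supplies them all.
    const std::vector<double>& coeffs() const { return coeffs_; }

    // Returns by value; the result is moved (or elided), never deep-copied.
    friend Poly operator*(const Poly& a, const Poly& b)
    {
        if (a.coeffs_.empty() || b.coeffs_.empty())
            return Poly(std::vector<double>{});
        std::vector<double> out(a.coeffs_.size() + b.coeffs_.size() - 1, 0.0);
        for (std::size_t i = 0; i < a.coeffs_.size(); ++i)
            for (std::size_t j = 0; j < b.coeffs_.size(); ++j)
                out[i + j] += a.coeffs_[i] * b.coeffs_[j];
        return Poly(std::move(out));
    }

private:
    std::vector<double> coeffs_;
};
```

With a raw double* member instead, the move constructor and move assignment would have to be written by hand (steal the pointer, null out the source), which is presumably the work the existing class still needs.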