
On Thu, 2 Apr 2009, Joel Falcou wrote:
SIMD algorithms for double precision seem to be rather hard to get right. It's difficult to match the precision of the scalar reference, because scalar algorithms take advantage of the internal 80-bit floating-point registers, so comparing our implementation against the reference yields errors on the order of 3000 ulp (i.e. 10^-13 RMS instead of 10^-16). ... Discussions welcome.
My understanding is that the problem lies with Intel's 80-bit "internal" precision. I've seen people force a copy out of the FP registers to counteract this, but I forget the full logic behind why. Maybe just to achieve cross-platform repeatability.
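As a rough illustration of the "force a copy out of the FP registers" trick: writing through a volatile double makes the compiler store the value to memory as a 64-bit double, which drops any extra x87 precision. This is only a sketch. The names force_double and sum_rounded are hypothetical, and on targets that already do double arithmetic in SSE2 registers the store changes nothing numerically. Note also that a value rounded first to 80 bits and then to 64 bits can still differ in the last bit from one computed directly in double.

```cpp
#include <cassert>

// Hypothetical helper: the volatile store forces the value out of any
// extended-precision register and rounds it to a 64-bit double.
inline double force_double(double x) {
    volatile double stored = x;
    return stored;
}

// Sum an array, rounding every partial sum to double so that no
// intermediate result is carried in 80-bit precision.
inline double sum_rounded(const double* values, int count) {
    assert(count >= 0);
    double acc = 0.0;
    for (int i = 0; i < count; ++i) {
        acc = force_double(acc + values[i]);
    }
    return acc;
}
```

With GCC, the -ffloat-store and -mfpmath=sse options aim at a similar effect without touching the source.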
For your purposes, it might be best to have "slow, IEEE-compliant" scalar ops for checking results and "fast, Intel-specific" scalars for comparing timings.
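For the checking side, a figure like "3000 ulp" can be measured by counting how many representable doubles lie between the SIMD result and the scalar reference. The following is a minimal sketch, assuming IEEE 754 binary64 doubles; ordered_bits and ulp_distance are names made up for this example, not part of any library discussed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a double to an integer so that adjacent doubles map to adjacent
// integers (assumes IEEE 754 binary64). Both zeros map to 0.
inline std::int64_t ordered_bits(double x) {
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Negative doubles are stored as sign + magnitude; flip them so the
    // integer ordering matches the floating-point ordering.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 if they are equal).
inline std::uint64_t ulp_distance(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Subtract in unsigned arithmetic to avoid signed overflow when the
    // two values have opposite signs and large magnitudes.
    const std::uint64_t ua = static_cast<std::uint64_t>(ia);
    const std::uint64_t ub = static_cast<std::uint64_t>(ib);
    return ia >= ib ? ua - ub : ub - ua;
}
```

A test harness could then call ulp_distance(simd_result, reference) for each input and report the maximum over the range.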
- Daniel

In reply to dherring@ll.mit.edu:
Yes, I agree. I am working with Joel, and this is the problem I am trying to address. It seems to me that we need fast algorithms for SIMD double, because an accurate version would be slower than simply mapping the scalar function over the elements of the SIMD vector. It also seems to me that SIMD is not yet mature for double precision: two elements per vector are too few to expect a big gain from branchless algorithms for math functions...
Jean-thierry