
pavel wrote:
taking into account luke's message about adc_pi, i think that it's a perfect opportunity to abstract multiword addition (with carry) like this:
unsigned add_with_carry(unsigned *dest, const unsigned *src, size_t size);
the returned value is intended to be an overflow flag (read: carry of the result of the most significant word addition)
What you describe is the sort of thing I (and I presume others) had in mind when we suggested that arithmetic algorithms be decoupled from data structures, i.e. you would have template <typename ITER, typename CONST_ITER> carry_type add_with_carry(ITER dest_begin, ITER dest_end, CONST_ITER src_begin, CONST_ITER src_end); with a general-purpose implementation, and then specialise it: carry_type add_with_carry(uint32_t* dest_begin, uint32_t* dest_end, const uint32_t* src_begin, const uint32_t* src_end); and provide an asm implementation for the specialisation. However, this doesn't suit the SIMD instruction that Luke described. That does multiple adds in parallel, and has carry-in and carry-out to each, but those carries are NOT propagated from one adder to its neighbour in that same instruction. So to use that instruction efficiently, say you are adding 4 multi-word numbers together: R = A + B + C + D; You would do (tmp1, carries1) = A + B; (tmp2, carries2) = tmp1 + C + carries1; (tmp3, carries3) = tmp2 + D + carries2; R = tmp3 + carries3; Note that the +s in the first 3 lines use the SIMD instruction Luke described, and the temporaries are incomplete. In the last line, the carries are added in (with the carry potentially propagating across the whole width); this may need one instruction per word. In hardware this is called a "carry save adder"; I'm not sure if the SIMD people use the same terminology. Anyway, the point is that even a multi-word addition abstraction is not specialisable in a way that fully exploits SIMD instructions. Doing that probably needs expression templates. Regards, Phil.