
On Tuesday 15 March 2011 18:45:49 Gruenke, Matt wrote:
template< SignProperty is_signed, int int_bits, //!< Number of integer bits. int frac_bits //!< Number of fractional bits. > class CFixed;
I, too, used a class like this in the past, but experience makes me wish for more flexibility. When modeling hardware operations implemented on signal/image processing algorithms, the following schemes have all seen wide use: 1. signed, int_bits, frac_bits <- your design 2. signed, msb_power_of_2, lsb_power_of_2 3. signed, width, int_bits 4. signed, width, frac_bits Of these, the only one I found to be completely intuitive was the second one. All of the others required non-obvious rules to represent the cases when the MSB is to the right of the binary point or the LSB is much to the left of the binary point. Subsequently, my colleagues and I found that most of the reasoning in choosing appropriate bit-widths is in figuring out the ULP. Our experience led me to propose my design.
The way I structured it was to discard information only on assignment (unless the maximum internal storage had overflowed, of course). While I agree that it's tempting to just use int64 for storage, the size and speed implications of creating large arrays of them provides a strong argument against this approach.
Good idea, but this is a refinement that can wait until later.
Since the most common use of fixed-point arithmetic is for speed, the priority is speed over safety.
In order of decreasing priority: correctness, speed, safety. Safety can be enforced in debug builds.
For example, never assume that overflow checking is performed. Furthermore, operatorns supported by the class are primarily designed to avoid unnecessary arithmetic, while minimizing surprises.
Agreed.
The general policies were:
* No virtual functions! Fixed-point types must be as fast and small as hand-coded fixed-point arthmetic, using built-in types.
* Converting constructor is implicit, since conversions (such as reductions in precision) are so common, in fixed-point arithmetic.
Agreed, but this is not an issue with the 'assigner' template argument.
* add/subtract return the greater of the int_bits and frac_bits (independently) of the operands.
* add/subtract returns signedness of the type with the greater range (choosing unsigned, in the event of a tie).
Disagreed. Follow the integers - operating on a signed always returns signed.
* multiply returns a type in which the int and frac bits are the sum of the corresponding values of the operands. The result is signed, unless both operands are unsigned.
* The return type of divide is equivalent to multiplying the numerator by the inverse of the denominator.
* inverse returns a type with one fewer frac bits than the operand had int bits and one more int bits than the operand had frac bits. This sacrifice of precision is necessary to avoid overflow.
This one is tricky; the behavior you propose is completely unacceptable for signal processing.
I like the idea of abstracting the assigner, but I guess I don't really see what you'd want to vary besides perhaps rounding behavior.
Examples (not exhaustive): Truncation vs rounding (direction - 4 possible ways) Wrapping vs saturation vs symmetric saturation Allowing only values of 2^{k} (used a lot in ASIC loop filters) Regards, Ravi