
Hi,

I'm generalizing my fixed point class and have come across a few design decisions.

1. Return type of additive arithmetic operation between integers and fixed points

Example: a 16-bit integer is added to a 16-bit fixed point with an 8-bit fraction. I see three possible return types: 16-bit integer, 16:8 fixed point, and promoted 32:8 fixed point.

If the return type is a 16-bit integer, the fraction is lost. Clearly not an option. If the return type is 16:8 fixed point, overflow is possible. A return type of 32:8 would eliminate overflow, but promotion quickly becomes problematic for larger integer types; let's say a 32-bit integer is added to a fixed point. In that case we would need a 64-bit integer, which could cripple performance on many systems. Another problem with promotion is that the arithmetic expression would typically be in a context where it is assigned to a fixed point of the type already present in the expression. In that case we have an overflow problem again.

I think I favor keeping the return type in 16:8. The user would then have to use an explicit cast to avoid overflow.

2. Return type of additive arithmetic operation between float and fixed points

Example: fixed_point<int, 16>(10.5) + 5.5f (int is the representing type, 16 is the number of bits reserved for the fraction)

What would you expect the return type to be? I think I favor returning a float. The accuracy of the fixed point is compromised anyway.

3. Return type of additive arithmetic operation between different fixed point types

Example: fixed_point<int, 16>(10.5) + fixed_point<int, 8>(10.5)

As in case 1, there is a choice between losing precision, risking overflow, or promoting (e.g. to fixed_point<boost::int_t<16 + 8>::fast, 16>).

I think I favor simply disallowing implicit arithmetic between different fixed point types. This would require the user to cast either operand.
The problem with that choice is less transparent use of the fixed_point type, since changing a type could break existing code where the previous type was in a valid arithmetic operation with another fixed point type.

4. Similar problems exist for multiplication and division

If a fixed_point is multiplied by an integer, there is a risk of overflow. If a fixed_point is multiplied by another fixed_point, there is a risk of overflow and precision will be lost. If a fixed_point is divided by an integer, there is a risk of loss of precision. If a fixed_point or integer is divided by another fixed_point, there is a risk of overflow and loss of precision.

Soren

On Wednesday 01 July 2009 16:10:37 Soren Holstebroe wrote:
I'm generalizing my fixed point class and have come across a few design decisions.
1. Return type of additive arithmetic operation between integers and fixed points [snip more return type questions]
Given that both options (staying within the original widths and promoting to a larger width) are valid (even crucial) in different contexts, it does not make sense to pick an option that does not let the user decide the meaning. Quoting from the Zen of Python: in the face of ambiguity, refuse the temptation to guess.

My preferred solution is to let the user decide. The return type of any expression has a bitwidth that completely allows for any values of the operands:

  N + N -> N+1
  N * N -> 2N

The user can then assign it or cast it if necessary (for example, to minimize intermediate precision).

Ok, but does this mean that precisions larger than some fixed bound (32-bit or 64-bit) must be provided? No. All that is needed is a compile-time error when instantiating a type with a bit-width larger than 32 or 64. With a little bit of hackery, one can even return a proxy type that allows casting/assigning to an acceptable bit-width but provides a compile error for all other uses.

That said, I am unlikely to use your library given your current interface. Any fixed-point library without policies allowing for a choice between rounding vs. truncation and between saturation vs. wrapping would not meet my needs. Please see the fixed-point classes from SystemC for a reasonable interface (with a rather poor implementation).

Regards,
Ravi

Ravi wrote:
On Wednesday 01 July 2009 16:10:37 Soren Holstebroe wrote:
I'm generalizing my fixed point class and have come across a few design decisions.
1. Return type of additive arithmetic operation between integers and fixed points [snip more return type questions]
Given that both options (staying within the original widths and promoting to a larger width) are valid (even crucial) in different contexts, it does not make sense to pick an option that does not let the user decide the meaning. Quoting from the Zen of Python: in the face of ambiguity, refuse the temptation to guess.
My preferred solution is to let the user decide. The return type of any expression has a bitwidth that completely allows for any values of the operands:

  N + N -> N+1
  N * N -> 2N

The user can then assign it or cast it if necessary (for example, to minimize intermediate precision). Ok, but does this mean that precisions larger than some fixed bound (32-bit or 64-bit) must be provided? No. All that is needed is a compile-time error when instantiating a type with a bit-width larger than 32 or 64. With a little bit of hackery, one can even return a proxy type that allows casting/assigning to an acceptable bit-width but provides a compile error for all other uses.
... I completely agree with the sentiment (explicit is better than implicit), but I would (and have) done it differently:

  N @ N -> N

If you want different behavior, you cast the operands to a wider field first. BTW, if you are interested in policy-based rounding, etc., you might be interested in the code I posted.

On Thursday 02 July 2009 08:25:11 Neal Becker wrote:
I completely agree with the sentiment (explicit is better than implicit), but I would (and have) done it differently:
N @ N -> N.
If you want different, you cast the operands to wider field first.
Too hard to use correctly, in my experience. For example, during DSP algorithm development, I have a function that computes something:

template <typename A = double, typename B = double,
          typename C = double, typename R = double>
R compute_cmag(A a, B b, C c)
{
    return R( (a*(b+c))*(a*(b+c)) );  // grouping necessary for hardware model
}

After algorithm development, during fixed point conversion (for implementation on a chip), I find that the following suffices:

  A = s[2,-3]                                  7 bits
  B = s[4,-2]                                  8 bits
  C = s[4,-3]                                  9 bits
  R = u[7,2] with saturation and truncation    6 bits

where s[a,b] is a signed type holding bits for 2^a, 2^(a-1), ..., 2^b and u[a,b] is the corresponding unsigned type.

In generic code of the form above, I do not want to compute the largest type required for all intermediate computations, since the compiler can do it for me. In fact, I do not even want to know whether the intermediate precision is larger than what can be represented by fixed-point numbers, since I expect the compiler to give me an error if the intermediate precision becomes too large. There are likely to be hundreds of such computations in a model for, say, a demodulator, and I definitely do not want to go through every single one of them trying to compute maximum intermediate precisions (unless the compiler tells me to do so).

Regards,
Ravi

Ravi wrote:
After algorithm development, during fixed point conversion (for implementation on a chip), I find that the following suffices:

  A = s[2,-3]                                  7 bits
  B = s[4,-2]                                  8 bits
  C = s[4,-3]                                  9 bits
  R = u[7,2] with saturation and truncation    6 bits

where s[a,b] is a signed type holding bits for 2^a, 2^(a-1), ..., 2^b and u[a,b] is the corresponding unsigned type.
Let me ask a layman question then: are those ranges complex to evaluate? If not, maybe we can do it at compile time so we know we always generate a representation with sufficient bits.

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35

On Thursday 02 July 2009 09:58:52 joel wrote:
Ravi wrote:
After algorithm development, during fixed point conversion (for implementation on a chip), I find that the following suffices:

  A = s[2,-3]                                  7 bits
  B = s[4,-2]                                  8 bits
  C = s[4,-3]                                  9 bits
  R = u[7,2] with saturation and truncation    6 bits

where s[a,b] is a signed type holding bits for 2^a, 2^(a-1), ..., 2^b and u[a,b] is the corresponding unsigned type.
Let me ask a layman question then: are those ranges complex to evaluate?
Yes, if you are asking about how I go from type 'double' to 's[2,-3]'. In fact, it is probably a black art! Considerable time is spent optimizing the bit-widths of data paths in a custom ASIC, since every extra bit in a multiplier roughly doubles both the silicon area and the power required. Usually, you find that truncating one bit at some point in the signal processing chain seems to have no effect until you reach a point much later in the chain where it kills you; then you must check whether you really need to carry that extra bit, or whether you can compensate for it by allowing extra bits in some intermediate computation.
If not, maybe we can do it at compile time so we know we always generate a representation with sufficient bits.
It'd be worthwhile just to get the sort of type promotion Phil mentioned incorporated into a fixed-point library. However, if you are looking for a research topic, computing the bits required for, say, a simple single-carrier QAM demodulator, given the number of input bits (usually fixed by the choice of ADCs), is probably achievable for someone with a knowledge of theoretical computer science, since the metric to be optimized (bit error rate at some signal-to-noise ratio) is well known, and one could come up with a (computationally infeasible) brute-force approach pretty quickly. Further discussion about this is probably off-topic for this list, but I am happy to discuss it offline.

Regards,
Ravi

My preferred solution is to let the user decide. The return type of any expression has a bitwidth that completely allows for any values of the operands:

  N + N -> N+1
  N * N -> 2N

The user can then assign it or cast it if necessary (for example, to minimize intermediate precision). Ok, but does this mean that precisions larger than some fixed bound (32-bit or 64-bit) must be provided? No. All that is needed is a compile-time error when instantiating a type with a bit-width larger than 32 or 64. With a little bit of hackery, one can even return a proxy type that allows casting/assigning to an acceptable bit-width but provides a compile error for all other uses.
My worry is that an N32 * N32 operation giving an out-of-bits compile error would be really annoying in many scenarios (assuming 64 bit is not allowed or is considered a performance killer). Perhaps the best would be to make the return type a trait, but we would still need a convenient default behavior.

I think it is important to understand all common usage scenarios and how likely they are, in order to judge which behavior has the least surprise. I have used a fixed point template in a couple of scenarios. The simplest is just as a convenient conversion of external data: for example, sound stream input in 32:28 signed fixed point, a DSP filter in floats, and output in 16:16 signed fixed point. In that scenario I didn't use any arithmetic, so the fixed template was mostly for a convenient, easily maintainable notation.

In another scenario I replaced a rather complicated float-based image processing algorithm with a fixed point one to gain speed. In that scenario arithmetic and comparison were used. Some error margin was acceptable, so the most important issue was seamless type replacement without having to add casts and conversions to the existing code base.

In yet another scenario I used fixed points in high performance computer graphics, where a fixed point type was not relevant since I used MMX assembly for maximum performance.

My point is that, while a boost template should be general purpose, some usages are more relevant than others, and in some a general purpose library just doesn't do the job anyway.
That said, I am unlikely to use your library given your current interface. Any fixed-point library without policies allowing for choice in rounding vs. truncation and in saturation vs. wrapping would not meet my needs.
Rounding, etc. is another discussion. I agree that a fixed point type should have optional policies for rounding, but I kept rounding and saturation out of the operator discussion to keep it simple.

Soren

Hi Soren,

Soren Holstebroe wrote:
I'm generalizing my fixed point class and have come across a few design decisions.
1. Return type of additive arithmetic operation between integers and fixed points [etc.]
I had a couple of guidelines that I tried to follow in my implementation:

- Behave similarly to the built-in types, e.g. int + float => float. This helps with the "principle of least surprise" idea.

- Hope that the compiler will do useful optimisations, e.g.

    fixed<32> a, b, c;
    c = a + b;

  Here the temporary result of a+b needs 33 bits, but the expensive-to-compute 33rd bit is discarded in the assignment to c. If we are lucky, the compiler will realise this and not compute it in the first place. I never did any experiments to determine whether this actually happens, though.
Example: a 16-bit integer is added to a 16-bit fixed point with an 8-bit fraction. I see three possible return types: 16-bit integer, 16:8 fixed point, and promoted 32:8 fixed point. If the return type is a 16-bit integer, the fraction is lost. Clearly not an option. If the return type is 16:8 fixed point, overflow is possible. A return type of 32:8 would eliminate overflow, but promotion quickly becomes problematic for larger integer types; let's say a 32-bit integer is added to a fixed point. In that case we would need a 64-bit integer, which could cripple performance on many systems. Another problem with promotion is that the arithmetic expression would typically be in a context where it is assigned to a fixed point of the type already present in the expression. In that case we have an overflow problem again.
I think I favor keeping the return type in 16:8. The user would then have to use an explicit cast to avoid overflow.
I would prefer to keep all the bits.
2. Return type of additive arithmetic operation between float and fixed points
Example: fixed_point<int, 16>(10.5) + 5.5f (int is the representing type, 16 is the number of bits reserved for the fraction). What would you expect the return type to be? I think I favor returning a float. The accuracy of the fixed point is compromised anyway.
Yes, return a float.
3. Return type of additive arithmetic operation between different fixed point types Example: fixed_point<int, 16>(10.5) + fixed_point<int, 8>(10.5) As in case 1, there is a choice between losing precision, risking overflow, or promoting (e.g. to fixed_point<boost::int_t<16 + 8>::fast, 16>)
I think I favor simply disallowing implicit arithmetic between different fixed point types. This would require the user to cast either operand. The problem with that choice is less transparent use of the fixed_point type, since changing a type could break existing code where the previous type was in a valid arithmetic operation with another fixed point type.
No, I would prefer to return a type that keeps all the bits.
4. Similar problems exist for multiplication and division If a fixed_point is multiplied by an integer, there is a risk of overflow. If a fixed_point is multiplied by another fixed_point, there is a risk of overflow and precision will be lost.
You can return a type that has enough bits for the exact result. You can get a problem when you need more than 64 bits though. "Big Ints" (note, fixed-size big ints not variable-size ones) would fix this.
If a fixed_point is divided by an integer there is a risk of loss of precision. If a fixed_point or integer is divided by another fixed_point there is a risk of overflow and loss of precision.
I remember getting to division, and my brain seized up trying to work out how many bits are needed...

As Ravi points out, different applications will have different requirements. And it is hard to implement a library that is totally flexible, because there is nowhere in the syntax of the expression "a+b" to specify all the possible options.

Perhaps what is needed is a "meta-library" that can be instantiated to provide fixed point types with different characteristics. Something like a layer on top of Boost.Operators. Thoughts anyone?

Regards,
Phil.

On Thursday 02 July 2009 09:01:22 Phil Endecott wrote:
As Ravi points out, different applications will have different requirements. And it is hard to implement a library that is totally flexible because there is nowhere in the syntax of the expression "a+b" to specify all the possible options.
Perhaps what is needed is a "meta-library" that can be instantiated to provide fixed point types with different characteristics. Something like a layer on top of Boost.Operators. Thoughts anyone?
Why would that be different from a combination of your two guiding principles? Based on your guideline 1, the return type of A (op) B is one that preserves all bits, where (op) is addition, subtraction or multiplication (ignore division for now). Then, based on your guideline 2, when the result is cast/assigned to some type, that type would hold all the information necessary (truncation, saturation, etc.). The operators themselves need to know only enough to preserve bits.

For division, one may require explicit casting everywhere, since an integer divisor which has prime factors other than 2 will require too many bits for some dividend, and the user must explicitly specify what to do. (The reasoning can be extended to fixed-point non-integers.)

Regards,
Ravi
participants (5)
-
joel
-
Neal Becker
-
Phil Endecott
-
Ravi
-
Soren Holstebroe