Re: [boost] Fixed point integer proposal

For example, does a(24bit) x b(24bit) -> c(24bit), or c(48bit)?
My preference is to be explicit about these issues, meaning 24bit x 24bit -> 24 bit. If you want more, you cast the operands first. Similarly there were some questions on division.
In my template, N bit x N bit equals N bit by default. The number one design criterion was speed. Why would you use fixed point instead of floats if not for speed? I agree that you should be explicit if you want to promote the return type of a multiplication. That said, my template actually works with a promotion framework I wrote, where you can specify the promoted type of different kinds of operations for different types. In the image processing library where the class was used, the promotion framework was set to promote N bit * N bit to 2N bit, but only up to the maximum number of bits natively supported by the CPU architecture. It was combined with an overflow check in debug mode to avoid surprises when the promotion had reached that bit limit. That worked pretty well (and was speedy).
My implementation also uses boost::operators. In addition, I have two versions: one in which the bit-widths are set at compile time, and another in which they are set at runtime (I use the latter to expose to Python).
I can't see a way to support runtime bit widths without sacrificing performance. Soren

Soren Holstebroe wrote:
For example, does a(24bit) x b(24bit) -> c(24bit), or c(48bit)?
My preference is to be explicit about these issues, meaning 24bit x 24bit -> 24 bit. If you want more, you cast the operands first. Similarly there were some questions on division.
In my template, N bit x N bit equals N bit by default. The number one design criterion was speed. Why would you use fixed point instead of floats if not for speed?
For reproducibility. I have several times wanted to not use floating point because its behaviour may change depending on architecture / compiler / compiler options. In some cases fixed point would have been fine, and it would be nice to see it in Boost. The alternative seems to be to roll my own fixed (or floating!) point type, which would be more difficult and error-prone and probably slower. Nevertheless, I think N bit x N bit equals N bit is sensible. John Bytheway

The number one design criterion was speed. Why would you use fixed point instead of floats if not for speed?
For reproducibility.
Yes, exactly. Fixed point doesn't accumulate rounding errors the way floats do. This is very useful in any algorithm where values are accumulated.
I have several times wanted to not use floating point because its behaviour may change depending on architecture / compiler / compiler options.
Non-IEEE compliant floats. That sounds nasty. Soren

Soren Holstebroe wrote:
In my template, N bit x N bit equals N bit by default.
Which of the 2N possible bits do you keep, and which do you discard? I chose to try to behave as closely as possible to how built-in integer types work. If I multiply uint8_t * uint8_t and assign the result to a uint32_t, it works. IIUC the equivalent operation would not work with your implementation, right?
The number one design criterion was speed. Why would you use fixed point instead of floats if not for speed?
In my case, it has been for two reasons:
- To store in 32 bits a value with a precision of 2^-32, not the 2^-24 that you get with a float.
- To allow me to do Z-curve bit manipulations on the values.
Speed is also useful, of course. As I think you are already seeing from the other responses, there are various different applications for fixed point. Your choice is either to try to get a subset of the possible functionality that does what you want approved, or to invest some effort in expanding your library to make everyone happy. Regards, Phil.

Phil Endecott wrote:
Soren Holstebroe wrote:
In my template N bit x N bit equals N bit per default.
Which of the 2N possible bits do you keep, and which do you discard?
I chose to try to behave as closely as possible to how built-in integer types work. If I multiply uint8_t * uint8_t and assign the result to a uint32_t, it works.
Yes, but if you multiply a uint32_t by a uint32_t and assign to a uint64_t, then it doesn't work (assuming int == uint32_t). Do you want a fixed point library to mimic these apparent inconsistencies? It would probably make it less portable, since sizeof(int) isn't the same everywhere. John Bytheway

John Bytheway wrote:
Phil Endecott wrote:
I chose to try to behave as closely as possible to how built-in integer types work. If I multiply uint8_t * uint8_t and assign the result to a uint32_t, it works.
Yes, but if you multiply a uint32_t by a uint32_t and assign to a uint64_t, then it doesn't work (assuming int == uint32_t). Do you want a fixed point library to mimic these apparent inconsistencies? It would probably make it less portable, since sizeof(int) isn't the same everywhere.
Here's another example: say I add two 8-bit values and assign the result to a 16-bit value. Do I want to truncate the result of the addition to 8 bits before the assignment? I found other corner-cases where it's not obvious what to do. Using expression templates makes some of these choices easier (specifically because you know the type being assigned to) but it is much more complex to implement. Phil.
participants (3)
-
John Bytheway
-
Phil Endecott
-
Soren Holstebroe