Some follow-up on my fixed point code, and safe integers

Dear All,

Last week I posted some fixed-point arithmetic code for potential boostification. Thanks for all your comments.

The most common suggestion was that it ought to offer overflow detection, so I have just thrown together some proof-of-concept code to do this. I have added a third template parameter to the fixed<> class which allows you to specify an integer implementation type; it defaults to boost::int_t<>::least as before, giving you no checking and no overhead. If you want to check for overflow you need to supply a checking integer type, and I've implemented a simple safe_int<N> class that can be used for this:

http://svn.chezphil.org/libpbe/trunk/include/safe_int.hh

This is just a proof of concept with some obvious holes, but I think it shows what's possible. It should also be usable on its own in integer applications.

The other main issues that were raised were:

- The alternative approach of using proto to provide something more sophisticated. I would be very interested to see what can be achieved by this approach (and at what cost) compared to my simplistic version.
- The need for <cmath>-like functions. This is rather beyond my ability.
- What implicit and/or explicit conversions should be included?
- My concerns about how to do, for example, a 32x32->64-bit multiply efficiently without actually doing a 64x64->64-bit multiply.

I look forward to hearing your comments. I'm keen to maintain momentum; if people think that something not too different in scope from what I am currently proposing would be acceptable then I will start thinking about the other things that are needed (docs, tests etc).

The fixed<> class itself can be found at:

http://svn.chezphil.org/libpbe/trunk/include/fixed.hh

Regards, Phil.
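As a rough illustration of how the third template parameter might be used, based only on the description above (the exact names and parameter conventions in fixed.hh and safe_int.hh may differ, so treat this as a sketch rather than the real interface):

    #include "fixed.hh"     // Phil's fixed<> class
    #include "safe_int.hh"  // Phil's checking integer type

    // 16 integer bits, 16 fractional bits, default (unchecked) implementation:
    typedef fixed<16, 16> plain_t;

    // Same format, but with an overflow-checking implementation type supplied
    // explicitly (safe_int<N> taking the total number of bits is assumed here):
    typedef fixed<16, 16, safe_int<32> > checked_t;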

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
The other main issues that were raised were:
- The alternative approach of using proto to provide something more sophisticated. I would be very interested to see what can be achieved by this approach (and at what cost) compared to my simplistic version.
I'm still working on this, a bit slowed down by the complexity of the SystemC datatype spec. I'll make the source code available asap, maybe integer first and real fixed point later. I will actually have two separate proto contexts: one in which user expressions are evaluated and one in which internal expressions (overflow and quantization) are evaluated. The former is what one would expect; the latter knows about special terminals like "MSB of the portion discarded by quantization" and many others, with as much as possible done at compile time.
- My concerns about how to do, for example, a 32x32->64-bit multiply efficiently without actually doing a 64x64->64-bit multiply.
I don't see too many ways out. One thing people do is to shift both operands right by 16, do a 16x16->32, and then shift the result left by 32 (e.g. compute (x/2^16 * y/2^16)*2^32). This maintains at least the magnitude of the result, but whether it is acceptable or not depends on the application domain. Btw, X86_64 gives you 64x64->128, I think. But yes, on other architectures you might have the problem. I've seen in your code that you allocate one more bit than requested by the user in order to detect overflow. As you note, this is problematic when you hit the machine operand length. In those cases, besides checking the msbs before and after, you can also use the carry flag. It can be accessed through inline asm, or used implicitly via conditional branches and conditional moves. Regards, Maurizio
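For concreteness, the magnitude-preserving shortcut described above might look something like this (a minimal sketch; the function name is made up, and up to the bottom 32 bits of precision are discarded):

    #include <boost/cstdint.hpp>

    boost::uint64_t approx_mul32x32(boost::uint32_t x, boost::uint32_t y)
    {
        boost::uint32_t p = (x >> 16) * (y >> 16);          // single 16x16->32 multiply
        return static_cast<boost::uint64_t>(p) << 32;       // restore the scale
    }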

Maurizio Vitale wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
- My concerns about how to do, for example, a 32x32->64-bit multiply efficiently without actually doing a 64x64->64-bit multiply.
I don't see too many ways out. One thing people do is to shift both operands right by 16, do a 16x16->32, and then shift the result left by 32 (e.g. compute (x/2^16 * y/2^16)*2^32). This maintains at least the magnitude of the result, but whether it is acceptable or not depends on the application domain.
No, I'm not interested in anything with loss of precision.
Btw, X86_64 gives you 64x64->128, I think.
That's interesting. Is there any way to use that from C++? Of course, it also has a relatively efficient 32x32->64 because it has a single instruction for 64x64->64. So any code for this problem would have to be architecture-specific.

But here is the sort of thing that I am thinking about for machines that only have 32x32->32. In these figures, each letter represents 16 bits. If I want to do 32x32->64, I need to break up the two operands into two 16-bit portions and do this calculation:

        A   B
      x C   D
    ---------
        B*D
      A*D
      C*B
    A*C
    ---------

So that's 4 multiplies. This compares to the naive method where, in C++, I first cast to 64 bits and the compiler generates a 64x64->64 multiply like this:

        A   B   C   D
      x E   F   G   H
    -----------------
                D*H
            C*H
            G*D
        B*H
        C*G
        F*D
    A*H
    B*G
    F*C
    E*D
    -----------------

That uses 10 multiplies. You don't need to do e.g. A*E as it falls off the most significant end of the result. (Have I got that right?) If you are lucky there might be some run-time zero detection that simplifies it a bit (e.g. in gcc this is done in a library function). But even then, detecting it at compile time by explicitly asking for a 32x32->64 multiplication would still be better.

(Is this sort of integer arithmetic of interest to people, other than as the implementation of fixed point? I believe there was a big-int SoC proposal; does that overlap with this?)
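A sketch of the four-multiply scheme above, for unsigned operands (names are illustrative; the only fiddly part is summing the overlapping partial products, which needs a wider accumulator but no further multiplies):

    #include <boost/cstdint.hpp>

    boost::uint64_t mul_32x32_to_64(boost::uint32_t x, boost::uint32_t y)
    {
        const boost::uint32_t a = x >> 16, b = x & 0xFFFF;   // x = A B
        const boost::uint32_t c = y >> 16, d = y & 0xFFFF;   // y = C D

        const boost::uint32_t bd = b * d;                    // four 16x16->32 multiplies
        const boost::uint32_t ad = a * d;
        const boost::uint32_t cb = c * b;
        const boost::uint32_t ac = a * c;

        // Middle partial products overlap, so accumulate them in 64 bits.
        const boost::uint64_t mid = boost::uint64_t(ad) + cb + (bd >> 16);
        return (boost::uint64_t(ac) << 32) + (mid << 16) + (bd & 0xFFFF);
    }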
I've seen in your code that you allocate one more bit than requested by the user in order to detect overflow. As you note, this is problematic when you hit the machine operand length. In those cases, besides checking the msbs before and after, you can also use the carry flag. It can be accessed through inline asm, or used implicitly via conditional branches and conditional moves.
Yes. Storing a 32-bit safe_int in 32 bits is significantly harder than storing a 31-bit one, however you do it. I believe that most of the time when we ask the compiler for an 'int', we don't really care exactly how many bits it has; making a 31-bit value your default safe_int, so that it can use the relatively easy guard-bit implementation, would be a good enough solution. Regards, Phil.
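The guard-bit idea in (over-)simplified form, just as an illustration of why the 31-bit case is easy; this is not the safe_int.hh implementation:

    #include <boost/cstdint.hpp>
    #include <stdexcept>

    // A 31-bit checked value stored in a 32-bit int. Because both operands
    // always fit in 31 bits, a single addition cannot wrap, and overflow is
    // detected simply by checking that the result is back in the 31-bit range.
    class checked_int31 {
        boost::int32_t v;
        static boost::int32_t check(boost::int32_t x) {
            if (x < -(1L << 30) || x >= (1L << 30))
                throw std::overflow_error("checked_int31 overflow");
            return x;
        }
    public:
        explicit checked_int31(boost::int32_t x) : v(check(x)) {}
        friend checked_int31 operator+(checked_int31 a, checked_int31 b) {
            return checked_int31(a.v + b.v);   // cannot wrap; check() restores the invariant
        }
        boost::int32_t value() const { return v; }
    };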

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
Btw, X86_64 gives you 64x64->128, I think.
That's interesting. Is there any way to use that from C++?
Well, not directly, as there's no 128-bit datatype in which to store the result, but the instruction mulq rx (pseudo-syntax) multiplies the 64-bit value in rax by the 64-bit value in rx and stores the result in (rdx,rax). So an inlined function with asm and the appropriate constraints could store the result in two longs. Then there's an imul instruction for signed quantities, where if you are willing to pay for longer opcodes (and I don't know the implications for execution speed) you have more flexibility on where to store the result. It might even be that GCC defines those operations as intrinsics, but I haven't checked this.
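Such an inlined function might look roughly like this (GCC extended asm, x86_64 only; a sketch along the lines described above, not tested code):

    // Unsigned 64x64->128 on x86_64: mulq multiplies rax by its operand and
    // leaves the 128-bit product in rdx:rax.
    inline void mul_64x64_to_128(unsigned long x, unsigned long y,
                                 unsigned long& hi, unsigned long& lo)
    {
        __asm__("mulq %3"
                : "=a" (lo), "=d" (hi)     // outputs: rax -> lo, rdx -> hi
                : "a"  (x),  "rm" (y)      // inputs:  x in rax, y in reg/mem
                : "cc");                   // flags are clobbered
    }

Some GCC versions on 64-bit targets also expose a built-in 128-bit integer type (__int128_t), which would let the compiler generate this without any asm; whether that is available and efficient for a given toolchain would need checking.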
Yes. Storing a 32-bit safe_int in 32 bits is significantly harder than storing a 31-bit one, however you do it. I believe that most of the time when we ask the compiler for an 'int', we don't really care exactly how many bits it has and making a 31-bit value your default safe_int so that it can use the relatively easy guard bit implementation would be a good enough solution.
In general I agree, but my application is modeling hardware, and people would be surprised if the object modeling their 32-bit register ended up being 8 bytes long. So in my implementation I'll take one of the approaches I've outlined and look at the carry and overflow bits. This will make my library a tad more difficult to port, but for now I'm focusing on X86_64 and GCC. Regards, Maurizio

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
If I want to do 32x32->64, I need to break up the two operands into two 16-bit portions and do this calculation:
I just stumbled on this, which might be of interest to you:

http://www.east.isi.edu/~jsuh/publ/sci01-jsuh-rev.pdf

I haven't read it in detail yet. Regards, Maurizio

Phil Endecott wrote:
- My concerns about how to do, for example, a 32x32->64-bit multiply efficiently without actually doing a 64x64->64-bit multiply.
You might take a look at: http://www.devmaster.net/articles/fixed-point-optimizations/ and the discussion in the forums afterwards: http://www.devmaster.net/forums/showthread.php?t=9531 - Michael Marcin

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
Last week I posted some fixed-point arithmetic code for potential boostification. Thanks for all your comments.
- The need for <cmath>-like functions. This is rather beyond my ability.
I've just implemented some 64-bit (35.28, signed) fixed-point math functions: exp, sqrt, sin, cos. I can post the code if you like.
- What implicit and/or explicit conversions should be included?
I've got implicit (and explicit) conversions to/from every built-in arithmetic type.
I look forward to hearing your comments. I'm keen to maintain momentum; if people think that something not too different in scope from what I am currently proposing would be acceptable then I will start thinking about the other things that are needed (docs, tests etc).
I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

I have seen Phil's code, but I wonder if Maurizio is willing to offer a peek at the proto-fied library he is working on as well? I am curious to see the different approaches and also I wonder if either of you are willing to consider non power-of-two scaled values. For instance images are commonly stored using 8bit integers, but with the assumption that 255 is actually 'one'. Also, dollars can apparently be treated as scaled<int64_t, 100> in order to prevent some floating-point errors. If the library included something like scaled<uint8_t, 255>, along with operators and their return types as discussed for fixed points, then I would personally be extremely interested. John

"John Femiani" <JOHN.FEMIANI@asu.edu> writes:
I have seen Phil's code, but I wonder if Maurizio is willing to offer a peak at the proto-fied library he is working on as well?
I've no problems in making my code available as I go, but I'm still working on porting to proto some of the code I did write before proto was widely announced. If I can finish that port I'll post the result as 'work in progress' around the end of next week. That part was using my horrible homegrown expression templates, but had the nice property of working, something not to be underestimated in software development. And I was getting nice vibes from looking at the assembly code produced.

In the meantime I might write up something on the features I'm trying to implement, both for discussion and as a starting point for documenting my code. But I'd like to stress an important point: I'm not immediately working on a Boost library; rather, I want to replace the SystemC datatype implementation with something:

- much shorter (right now the OSCI SystemC implementation weighs in at about 15K lines of code; I want to get as near to zero as possible)
- more efficient (by using expression templates there are more domain-specific optimizations that can be done)

This means that I have to do a few things that are probably of no interest to a wider community. For instance, in SystemC you have a class sc_int<N> that behaves like you'd think, but it is defined to inherit from a class sc_int_base that takes the width as a constructor parameter. This width cannot be changed after construction, so you might question why have it at run time at all. Still, I need to allow bounds to be optionally specifiable at run time. For now I'm doing well: you pay for bounds only when you actually want to specify them at run time; otherwise (with decent EBO) they just disappear.

Similarly, there are things that might be interesting for some people that I'm not implementing immediately (but I leave hooks in the framework). For instance, I'm allowing multiple logical representations for integers, but at the beginning I'll implement only magnitude (for unsigned) and two_complement (for signed). Later, I'll support sign-magnitude and one's complement. I've ruled out, for instance, polarized numbers (e.g. numbers with an offset, as used often in exponents of floating-point values) because it would be a place where the logical representation needs to be parametrized.

Also, for now I'll support access to bitfields inside numbers only to the extent required by SystemC. The right thing to do would be to allow for the specification of a physical layout (a simple application is the definition of endiannesses different from the host machine's). I did it with a previous employer and I know how to design the framework to allow this part to be plugged in later, but it is a big project on its own.
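For what it's worth, the "pay only for run-time bounds" trick usually boils down to the empty-base optimisation, roughly like this (an illustration only, not Maurizio's actual code):

    struct static_bounds { };                  // width fixed at compile time: no storage

    struct dynamic_bounds {                    // width chosen at run time
        int width;
        explicit dynamic_bounds(int w = 0) : width(w) {}
    };

    template<typename Bounds, typename Raw>
    struct number : private Bounds {           // empty base class adds no size
        Raw value;
    };

    // With a decent EBO:
    //   sizeof(number<static_bounds,  long>) == sizeof(long)
    //   sizeof(number<dynamic_bounds, long>) >  sizeof(long)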
I am curious to see the different approaches and also I wonder if either of you are willing to consider non power-of-two scaled values.
Not in the first implementation. Later I'd like to let people specify bounds as a MIN and MAX value, rather than bitwidths, but this wouldn't be what I think you mean here. If you allow for arbitrary scales, you get into what Ada called 'the small'. This is basically the weight of the lsb in your significand. And even then, they required compilers to implement only powers of two. There are nasty things that happen with arithmetic when numbers with arbitrary (and different) smalls are mixed. Just to give an idea: when you add two values with a different binary scaling factor you need to align the decimal point, and you do this by shifting one of the two (you choose which, so that you do not lose bits, when you're allowed to grow the significand). If the 'small' is arbitrary, this alignment becomes messier, and people have written many papers on the resulting error in arithmetic operations. So, no, I do not think I want to go there.
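In the power-of-two case the alignment is just a shift; roughly like this (a sketch that assumes the widened significand still fits in 64 bits — names are illustrative):

    #include <algorithm>
    #include <boost/cstdint.hpp>

    // x has f1 fractional bits, y has f2 fractional bits; the result has
    // max(f1, f2) fractional bits and no bits are lost on either side.
    boost::int64_t add_aligned(boost::int64_t x, int f1,
                               boost::int64_t y, int f2)
    {
        const int f = std::max(f1, f2);
        return (x << (f - f1)) + (y << (f - f2));
    }

With a non-power-of-two 'small', those shifts become multiplications and divisions by the ratio of the two smalls, which is where the rounding-error analysis gets messy.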
For instance images are commonly stored using 8bit integers, but with the assumption that 255 is actually 'one'.
As I said before, this seems easy as long as all the numbers in the expression share the same 'small', but in general it is very difficult. There might be ways to get what you need in this case by allowing the user to specify optional functors that are used to convert to/from builtin types. But it requires further thought. Which is why I should probably post a list of features open for discussion, regardless of what I'll be able to implement right now.
Also dollars can be apparently by treated as scaled<int64_t, 100> in order to prevent some floating point errors. If the library included something like scaled<uint8_t, 255>, along with operators and their return types as discussed for fixed points, then I would personally be extremely interested.
Well, the next C++ and C standards will have numbers with a decimal representation exactly for that. I haven't followed recent developments, but that was the impression I got.

Anthony Williams wrote:
I've just implemented some 64-bit (35.28, signed) fixed-point math functions: exp, sqrt, sin, cos. I can post the code if you like.
Please do.
I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos.
Presumably the tables could be computed at run time once per type. There is also the question of what type the result should have. John Femiani wrote:
I wonder if either of you are willing to consider non power-of-two scaled values. For instance images are commonly stored using 8bit integers, but with the assumption that 255 is actually 'one'.
That's an interesting problem, and one that I have previously had to deal with myself. However, I'm really keen to focus on the core idea of a fixed-point type.

I am a bit concerned that the nature of the Boost process could lead to a sort of "design by committee". If I were to resist all suggestions for "feature creep", would I eventually find my offering rejected for "lack of features"? Would only a "bloated" library be accepted?

When I was integrating my safe_int<> type with fixed<>, I encountered an interesting problem. Recall that fixed has three template parameters giving the number of bits before and after the point and the implementation type. The implementation type defaults to an integer provided by Boost.Integer, however you can provide an alternative such as safe_int. My problem is what type should be used in the result of a function such as:

    template<int XWB, int XFB, typename XIMPL, int YWB, int YFB, typename YIMPL>
    fixed<XWB+YWB, XFB+YFB, ????>
    operator*(fixed<XWB,XFB,XIMPL> x, fixed<YWB,YFB,YIMPL>);

Putting aside the question of what to do when X and Y have different implementation types (e.g. one is overflow checking and one is not), what I need in the return type is something that is "like XIMPL (or YIMPL) but with n bits". For example, if XIMPL is safe_int<XWB+XFB>, the return type should be safe_int<XWB+YWB+XFB+YFB>; if it's boost::int_t<XWB+XFB>::least, the return type should be boost::int_t<XWB+YWB+XFB+YFB>. Any idea how to do that?

Regards, Phil.
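One conventional way to express "like XIMPL but with n bits" is a rebinding trait, specialised per implementation family. A sketch under stated assumptions (this is not from fixed.hh; safe_int<> and fixed<> are only forward-declared here to keep it self-contained):

    #include <boost/integer.hpp>

    template<int N> class safe_int;                       // as in safe_int.hh (assumed)
    template<int WB, int FB, typename Impl> class fixed;  // as in fixed.hh (assumed)

    // Default: unchecked implementations map to a Boost.Integer type of the
    // requested width.
    template<typename Impl, int Bits>
    struct rebind_impl {
        typedef typename boost::int_t<Bits>::least type;
    };

    // Checked implementations stay checked, just wider.
    template<int M, int Bits>
    struct rebind_impl<safe_int<M>, Bits> {
        typedef safe_int<Bits> type;
    };

    template<int XWB, int XFB, typename XIMPL, int YWB, int YFB, typename YIMPL>
    fixed<XWB+YWB, XFB+YFB,
          typename rebind_impl<XIMPL, XWB+YWB+XFB+YFB>::type>
    operator*(fixed<XWB,XFB,XIMPL> x, fixed<YWB,YFB,YIMPL> y);   // body omitted

This ignores the mixed-implementation case (it simply lets XIMPL win), which is the other half of the question.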

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos.
For my library I'm specifying the number of bits in the significand and the position of the decimal point. This allows for some interesting scenarios (needed for my SystemC application). If you call W the width of the significand and S (for scale) the position of the decimal point you get:

- 0 <= S <= W : typical fixed-point numbers (integers when S = 0), with some number of bits in the integer part (possibly 0) and some number of bits in the fractional part (again, possibly 0)
- S > W : the decimal point is at the left of the significand; the missing bits are the zero extension (for unsigned) or sign extension (for signed numbers) of the significand
- S < 0 : the decimal point is to the right of the significand; the missing bits are 0s

A similar effect could be obtained by allowing the number of integer and fractional bits to be negative, but I feel that in that case many of the checks and computations you have to perform in order to do arithmetic are more complex. It wouldn't matter much (still linear stuff), but in my application there are cases where these computations happen at run time, so the simpler the better. A user layer can easily adapt one representation into another (and indeed, mine is not exactly the SystemC one).
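Reading W and S that way, the value represented is significand * 2^-S. A toy conversion just to pin down the three regimes (my own reading of the scheme, not Maurizio's code):

    #include <cmath>
    #include <boost/cstdint.hpp>

    // raw holds a W-bit significand; W is only documentation in this toy.
    template<int W, int S>
    double to_double(boost::int64_t raw)
    {
        return static_cast<double>(raw) * std::ldexp(1.0, -S);
    }

    // W = 8, S = 3  : 3 fractional bits            -> to_double<8,3>(20)  == 2.5
    // W = 8, S = 12 : point left of the significand -> to_double<8,12>(20) == 20.0/4096
    // W = 8, S = -4 : point right of the significand-> to_double<8,-4>(20) == 320.0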
John Femiani wrote:
I wonder if either of you are willing to consider non power-of-two scaled values. For instance images are commonly stored using 8bit integers, but with the assumption that 255 is actually 'one'.
That's an interesting problem, and one that I have previously had to deal with myself. However, I'm really keen to focus on the core idea of a fixed-point type.
I am a bit concerned that the nature of the Boost process could lead to a sort of "design by committee". If I were to resist all suggestions for "feature creep", would I eventually find my offering rejected for "lack of features"? Would only a "bloated" library be accepted?
When I was integrating my safe_int<> type with fixed<>, I encountered an interesting problem. Recall that fixed has three template parameters giving the number of bits before and after the point and the implementation type. The implementation type defaults to an integer provided by Boost.Integer, however you can provide an alternative such as safe_int. My problem is what type should be used in the result of a function such as:
template<int XWB, int XFB, typename XIMPL, int YWB, int YFB, typename YIMPL> fixed<XWB+YWB, XFB+YFB, ????> operator*(fixed<XWB,XFB,XIMPL> x, fixed<YWB,YFB,YIMPL>);
Putting aside the question of what to do when X and Y have different implementation types (e.g. one is overflow checking and one is not), what I need in the return type is something that is "like XIMPL (or YIMPL) but with n bits". For example, if XIMPL is safe_int<XWB+XFB>, the return type should be safe_int<XWB+YWB+XFB+YFB>; if it's boost::int_t<XWB+XFB>::least, the return type should be boost::int_t<XWB+YWB+XFB+YFB>. Any idea how to do that?
I'm doing things a bit differently, so the rest of this may or may not be of use. I completely separate the functionality of storing values from the overflow and quantization behaviour, so all I really have to do in arithmetic operations is to find an appropriate container for the value.

Containers are classified in families. That's user-definable, but for SystemC I'll have: things which are less than 64 bits, things that are larger than 64 bits (but fixed), and things that have infinite precision. The latter two will be implemented with the GNU multiprecision library.

When doing arithmetic, what happens is user-definable. While the natural size of a multiplication would be W1+W2:S1+S2, the user may decide that some families lock in. In this case, if for instance you are in a family that allows up to 64 bits, everything goes as you described and you get the natural size until you hit the 64-bit wall. After that you invoke the overflow handling. It is user-definable whether overflow is managed only upon creation/assignment or for every subexpression. This is again required to allow me to implement SystemC, but it also makes family promotion possible.

I have a metafunction that computes the concrete type used for representing values. Its arguments are:

- S: the desired size in bits
- Families: a description of the classes of types to be used
- Family: the desired family, defaulting to unspecified

Families is an (mpl) vector of family descriptions, each having three components:

- the maximum number of bits (0 means unbounded)
- the raw type to be used (this can be a lambda expression, in which case it is called with Size and Representation (basically signed or unsigned); this is where it blends with boost::(u)int_t)
- whether the representation allows for infinite precision or not

I include the relevant portion of my code that does this at the end. It doesn't really work for you because I assume that families are non-overlapping (kind of; I have the exception of 0 for unlimited) and that the first family found is the one you want to represent your number.

I'm still thinking about how to address the second part of what you achieve by allowing normal ints and safe ints together. At present I'm leaning towards a receiver-decides policy, combined with a user-selectable policy of overflow/quantization on assignment only or also on subexpressions. To clarify, if O_1..O_n are variables that would throw on overflow when something too large is assigned to them, and I_1..I_n are variables that would behave like C++ builtins and ignore overflow, I'm thinking of having:

- I_1 = I_0 + I_1 + I_2 : never throws
- O_1 = I_0 + I_1 : throws on assignment if I_0+I_1 is too large for O_1
- O_1 = O_2 + O_2 : throws on + (because O_1 is the target) if throw-on-subexpression is selected and either no family exists that can hold the sum or the user has disabled family promotion; throws on = as in the previous case if you ever get there
- I_1 = O_1 + I_1 : this is probably more debatable, but I'm presently thinking it should never throw.

I'd appreciate it if people would contribute to the discussion by stating what they'd like for combining overflow and quantization handling.
Regards, Maurizio

---------------------------------------------------
template<int N, typename raw_type, bool bounded_>
struct family {
  typedef mpl::int_<N>         bits;
  typedef raw_type             type;
  typedef mpl::bool_<bounded_> bounded;
};

template<typename Family> struct bits     { typedef typename Family::bits type; };
template<typename Family> struct raw_type { typedef typename Family::type type; };

//using mpl::placeholders::_;
using mpl::placeholders::_1;
using mpl::placeholders::_2;

// Not really infinity, but good enough for a number of bits which is larger
// than anybody will ever need. Hey, it is even more than what fits in 640KB.
// Be happy. Peace, ok?
// FIX-ME why can't we use const_max? 1L<<16 is not even 640KB in size.
//static const long infinity = boost::integer_traits<long>::const_max;
static const long infinity = 1L << 16;

// This is used for the limited precision family, and we assume that
// boost::int_t and boost::uint_t cover sizes up to 64 bits. This is only true
// on 64 bit machines, as (unsigned) long long is not supported yet.
template<typename Size, typename Representation>
struct builtin_numeric {
  typedef typename boost::int_t<Size::value>::fast type;
};

template<typename Size>
struct builtin_numeric<Size, magnitude_c> {
  typedef typename boost::uint_t<Size::value>::fast type;
};

// FIX-ME we actually want to say mpz_class directly when defining families,
// but I haven't found a way for detecting whether something is a binary
// metafunction or a plain type. Till then close your eyes...
template<typename Size, typename Representation>
struct gmp_number {
  typedef mpz_class type;
};

struct unbounded : family<0,        gmp_number<_1,_2>,      false> {};
struct limited   : family<64,       builtin_numeric<_1,_2>, true>  {};
struct unlimited : family<infinity, gmp_number<_1,_2>,      true>  {};

//typedef mpl::vector2<unbounded, limited/*, unlimited*/> default_families;
typedef mpl::vector3<unbounded, limited, unlimited> default_families;

//-- Compute the concrete type we use for representing values.
//
// SIZE is the number of bits in the significand. The sign bit is included in SIZE.
//
// REPRESENTATION is the _logical_ layout of numbers. This includes:
//   - magnitude_c, for unsigned numbers
//   - two_complement_c, for two's complement signed numbers
//   we also define, but do not support for now:
//   - one_complement_c, for ones' complement signed numbers
//   - sign_magnitude_c, for signed numbers represented as sign and magnitude.
//   We do not define biased numbers. It would complicate things (it would be
//   the first representation requiring a parameter, although we could do
//   without, as frequently the bias is derived from the size). If there's a
//   desperate need for these we will reconsider.
//
// FAMILIES is a type describing the allowed families among which to choose.
//
// FAMILY is a tag for specifying the desired family. When left unspecified,
// the first family that allows for SIZE bits with REPRESENTATION is selected.
// If the user explicitly requests a family, it is required that this family
// is actually capable of holding SIZE bits with REPRESENTATION.
//
// This metafunction returns the concrete type as its primary result,
// available as concrete_type::type.
// In addition it defines:
//   family, the family selected
//   bits, the maximum number of bits in the significand, as an mpl::int_<>
template<typename Size, typename Representation, typename Families, typename Family>
struct concrete_type {
  typedef typename mpl::end<Families>::type not_found;
  typedef typename mpl::find_if<Families, mpl::less_equal<Size, bits<_1> > >::type found;

  BOOST_MPL_ASSERT_NOT ((boost::is_same<found, not_found>));

  typedef typename when_unspecified<Family, typename mpl::deref<found> >::type::type family;

  // FIX-ME here we should find a way for detecting whether raw_type returns a
  // binary metafunction or something else. In the first case you want to apply
  // the metafunction to SIZE and REPRESENTATION, otherwise you want to return
  // the type unmodified.
  // FIX-ME it might be useful to have another special case: if type is a pair,
  // consider the first element to define the type for unsigned numbers and the
  // second element of the pair for signed numbers.
  typedef typename mpl::apply<typename raw_type<family>::type, Size, Representation>::type type;
};

template<typename Representation, typename Families, typename Family>
struct concrete_type<run_time, Representation, Families, Family> {
  BOOST_MPL_ASSERT_NOT ((boost::is_same<Family,unspecified>));

  // FIX-ME here we should find a way for detecting whether raw_type returns a
  // binary metafunction or something else. In the first case you want to apply
  // the metafunction to SIZE and REPRESENTATION, otherwise you want to return
  // the type unmodified.
  // FIX-ME it might be useful to have another special case: if type is a pair,
  // consider the first element to define the type for unsigned numbers and the
  // second element of the pair for signed numbers.
  typedef typename mpl::apply<typename raw_type<Family>::type, typename bits<Family>::type, Representation>::type type;
};

Phil Endecott wrote:
I am a bit concerned that the nature of the Boost process could lead to a sort of "design by committee". If I were to resist all suggestions for "feature creep", would I eventually find my offering rejected for "lack of features"? Would only a "bloated" library be accepted?
My $.02: I think you have nothing to fear. I can't think of a Boost library where only a 'bloated library' would be accepted. As a general rule, when I've been a review manager I would only consider an 'it doesn't do this use case' as vital if it's so central that the library interface would really be incomplete or broken without it. So, as long as the library is useful to many without a particular feature, one 'it doesn't do xyz' isn't usually critical to rejecting a library for me. If lots of people are in agreement then it's harder to ignore. Of course review managers are people making judgments, so your mileage may vary.

Anyway, you're free to *not include* features that you don't think are important to the core mission of the library. The hard thing is if lots of other people think a feature is important and you don't.

Jeff

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
Anthony Williams wrote:
I've just implemented some 64-bit (35.28, signed) fixed-point math functions: exp, sqrt, sin, cos. I can post the code if you like.
Please do.
I've attached the code for my fixed point type. It does sin, cos, exp, sqrt, ln, tan, atan, floor, ceil, modf, abs, and the full set of normal arithmetic functions.
I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos.
Presumably the tables could be computed at run time once per type.
Yes, but that would add to the app startup cost, or make the first math function call *very* expensive. The arctan table (for sin and cos) has 30 entries, and the log tables used for exp total 91 entries. 30 calls to atan and 91 calls to log can take a while. If you increase the precision, then you need more entries than that.
There is also the question of what type the result should have.
The same type as the argument seems a sensible start. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos.
Presumably the tables could be computed at run time once per type.
Yes, but that would add to the app startup cost, or make the first math function call *very* expensive. The arctan table (for sin and cos) has 30 entries, and the log tables used for exp total 91 entries. 30 calls to atan and 91 calls to log can take a while.
If you were doing this in a loop, yes, but as a single, one time, startup cost? Even if each call to atan/log takes a microsecond, and you use all 64 different precisions, that's *still* only a few milliseconds on the startup - who is going to care? Um. If you are doing this in an embedded 16-bit processor (which is just the sort of place that fixed point often /is/ useful), that microsecond might be a bit optimistic - and you would really want the constants in ROM not RAM.
If you increase the precision, then you need more entries than that.
-- Martin Bonner Project Leader PI SHURLOK LTD Telephone: +44 1223 441434 / 203894 (direct) Fax: +44 1223 203999 Email: martin.bonner@pi-shurlok.com www.pi-shurlok.com

"Martin Bonner" <Martin.Bonner@pi-shurlok.com> writes:
I like the ability to specify the number of bits either side of the decimal point, but this does get in the way for doing the math functions --- you need a table with magic numbers in at the appropriate precision for exp and sin/cos.
Presumably the tables could be computed at run time once per type.
Yes, but that would add to the app startup cost, or make the first math function call *very* expensive. The arctan table (for sin and cos) has 30 entries, and the log tables used for exp total 91 entries. 30 calls to atan and 91 calls to log can take a while.
If you were doing this in a loop, yes, but as a single, one time, startup cost? Even if each call to atan/log takes a microsecond, and you use all 64 different precisions, that's *still* only a few milliseconds on the startup - who is going to care?
Um. If you are doing this in an embedded 16-bit processor (which is just the sort of place that fixed point often /is/ useful), that microsecond might be a bit optimistic - and you would really want the constants in ROM not RAM.
Exactly. I wrote my fixed point class for code that ran on an ARM processor. The version of exp in the C library (for doubles) takes 600 microseconds on that target. Just the one set of tables would be an extra 60ms at startup, and 64 sets would cause a delay of over 3 seconds --- quite noticeable. For a 16-bit target, I would be surprised if anyone was using 64 bit fixed point, and if you have less precision you need fewer table entries. Then again, I expect 16 bit embedded CPUs to be rather slow in calculating log. Of course, it wouldn't take too much effort to write a program to spew out the tables for each and every possible set of precisions. The generated source files could be then be compiled as normal, and the compiler could dutifully put the values in ROM as desired. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
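The generator Anthony describes could be quite small. A sketch, with the caveat that the atan(2^-i) table is an assumption on my part (it matches the 30-entry count mentioned for a CORDIC-style sin/cos), and the 28 fractional bits are just taken from the 35.28 format in the earlier post:

    #include <cmath>
    #include <iostream>
    #include <boost/cstdint.hpp>

    int main()
    {
        const int fractional_bits = 28;    // e.g. the 35.28 format
        const int entries = 30;
        for (int i = 0; i < entries; ++i) {
            double v = std::atan(std::ldexp(1.0, -i));              // atan(2^-i)
            boost::int64_t fx = static_cast<boost::int64_t>(
                std::floor(v * std::ldexp(1.0, fractional_bits) + 0.5));  // round to fixed point
            std::cout << "0x" << std::hex << fx
                      << ",  // atan(2^-" << std::dec << i << ")\n";
        }
    }

The output would be pasted (or #included) into the generated source file, so the constants end up in ROM like any other compile-time table.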

Anthony Williams wrote:
Of course, it wouldn't take too much effort to write a program to spew out the tables for each and every possible set of precisions. The generated source files could be then be compiled as normal, and the compiler could dutifully put the values in ROM as desired.
These are just integer calculations. Shouldn't we be able to calculate arbitrarily precise tables at compile time using meta-programming? I'd probably put the calculations in a separate lib and explicitly instantiate them for the types/precision I care about, as there would probably be a noticeable hit for such complex meta-programming math, but I'd rather do that than use an external code generator. Thanks, Michael Marcin

"Michael Marcin" <mmarcin@method-solutions.com> writes:
Anthony Williams wrote:
Of course, it wouldn't take too much effort to write a program to spew out the tables for each and every possible set of precisions. The generated source files could be then be compiled as normal, and the compiler could dutifully put the values in ROM as desired.
These are just integer calculations. Shouldn't we be able to calculate arbitrarily precise tables at compile time using meta-programming?
I'd probably put the calculations in a separate lib and explicitly instantiate them for the types/precision I care about, as there would probably be a noticeable hit for such complex meta-programming math, but I'd rather do that than use an external code generator.
If you fancy writing a template metaprogram to calculate atan or logarithms, go ahead! Actually calculating the values from scratch in a reasonable length of time is more than I could manage in a *run-time* solution --- the lookup tables make it several orders of magnitude faster. I think the extra compile time complexity would either make the compiler keel over, or the compile take ridiculously long. Anyway, I was suggesting that the code generator be run exactly once (unless it needed to be modified for some reason, e.g. to change the rounding rules for better overall accuracy, or change the values for which the atan and natural log are calculated), and then the generated code kept in CVS. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Anthony Williams wrote:
If you fancy writing a template metaprogram to calculate atan or logarithms, go ahead! Actually calculating the values from scratch in a reasonable length of time is more than I could manage in a *run-time* solution --- the lookup tables make it several orders of magnitude faster. I think the extra compile time complexity would either make the compiler keel over, or the compile take ridiculously long.
Yes, it is silly to calculate the values from scratch on every call or on every compile. If they could be put in a static library, they would only need to be calculated when you would need to regenerate the tables anyway. The benefit is that I don't need to install yet another tool to regenerate them.
Anyway, I was suggesting that the code generator be run exactly once (unless it needed to be modified for some reason, e.g. to change the rounding rules for better overall accuracy, or change the values for which the atan and natural log are calculated), and then the generated code kept in CVS.
Different situations call for different amounts of resolution in the tables and fixed-point types. It would be hard, if not impossible, to have tables to fit every need in CVS. For it to be usable it needs to be readily configurable to get the "right" behavior for the application, with reasonable defaults to get things up and running when it's not yet time for fine tuning. Thanks, Michael Marcin
participants (8)
- Anthony Williams
- Jeff Garland
- John Femiani
- Martin Bonner
- Maurizio Vitale
- Michael Marcin
- Michael Marcin
- Phil Endecott