
On Sunday 17 August 2008 09:32:40 Zeljko Vrba wrote:
On Sat, Aug 16, 2008 at 11:50:05PM +0300, dizzy wrote:
I do not agree. They generally do their job fine (which is to provide portable support for working with native, unchecked, platform integer types). For any other needs you should probably use another tool.
Well, they do *not* do their job fine: (-1U < 2) == false, which is mathematical nonsense (more on that below).
That does not prove anything. Any good tool has bad use cases. It does not prove that the integer types fail at their intended purpose. You seem to want more than that purpose.
Signed arithmetic overflow is undefined behavior, and some CPUs actually raise an exception on overflow (e.g. MIPS). Every 'a+b' expression, with a and b being signed integers, is potential UB. Some machines (e.g. x86) do not raise an exception but set an overflow flag which may be tested by a single instruction, yet I don't know of a compiler which offers run-time overflow checking as a code-generation option. Portable checks for overflow (relying on bit operations) incur immense overhead (certainly much greater than a single instruction).
Which is exactly why such checks were not introduced in the standard: they are prohibitive or impossible on certain platforms. So? That just means the native integer types inherit the limitations of the low-level CPU arithmetic operations of the many platforms out there. Kinda logical if you ask me, since C++ tries to be defined so that it has fast implementations on those native platforms instead of being strict about its own virtual machine.
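For concreteness, here is roughly what a portable pre-check for signed addition boils down to (a comparison-based variant of the bit-twiddling checks mentioned above; the helper name is made up and is not part of the discussion):

#include <climits>
#include <stdexcept>

// Hypothetical helper: reject the addition before it is performed,
// so the undefined overflow in a + b never actually happens.
int checked_add(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) ||   // a + b would exceed INT_MAX
        (b < 0 && a < INT_MIN - b))     // a + b would go below INT_MIN
        throw std::overflow_error("signed addition would overflow");
    return a + b;
}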
Writing an extra set of parentheses is not visually intrusive or cluttering.
Says who?
Writing static_cast<int>(expr), (int)(expr), or surrounding the offending code with #pragma _is_ cluttering and intrusive.
Come on, if we are to argue about how clumsy it _looks_ to write additional parentheses vs static_cast, I think we have too much time on our hands. There are countless other compromises we make when programming in C++ just because the rest works so well. And no, I don't see that as a flaw in the language, in the sense that one could have all the good things of C++ without the bad things; I see it as a consequence of the good things of C++. Every language feature has side effects, some of which we may not like in certain contexts, but that does not make them a flaw.
Yes, I want my code to be short, concise, and easily readable, in addition to being correct. So shoot me :-)
Well, that's you. C++ was also made to:
- not make you pay for anything you don't need (I think RTTI is an exception there)
- allow writing code that is very close to the native platform
- be source compatible with C as much as possible
I have researched the comp.lang.c++.moderated archives on this topic, and other sources, and found two pieces of advice:
Peter van der Linden in "Expert C Programming: Deep C Secrets" writes:
"Avoid unnecessary complexity by minimizing your use of unsigned types. Specifically, don't use an unsigned type to represent a quantity just because it will never be negative (e.g., "age" or "national_debt")."
A quote of Bjarne Stroustrup: "The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules."
I personally use "unsigned" to mean a value that cannot be negative, and I am always careful when comparing it with signed types and in similar cases. That is in low-level code, of course (and it works in layers: starting at some layer the value becomes unsigned after being properly checked for negative values, and from that point on it is used only as unsigned internally, though the outside world can still send signed values). In high-level code (which I am not writing, since my work focuses on low-level code) I would use something like your proposed library (which proves that C++ not only offers native types that satisfy _my_ needs, it also lets you build something that satisfies _your_ needs, so they are not "flawed").
Yet, all of the STL uses an unsigned type for size_type and the return value of size().
It uses std::allocator::size_type, which is size_t for obvious reasons. Why size_t is unsigned is a question to ask the C people who made it so in C90, I guess :) Either way, it is low-level enough that I don't complain about that choice.
As much as I'd like to use only signed ints, this becomes prohibitive (due to warnings) when mixing them with the STL. And yes, I've been bitten several times in the past by implicit signed -> unsigned conversions in relational comparisons. The sanest thing would be to throw an exception at runtime if one compares a negative integer with some unsigned quantity, instead of getting false for '-1 < 2U', which is mathematical nonsense.
That implies extra code to check for it, thus violating the basic principle of not paying for what you don't need.
signedness of a type *should not* affect its numerical value in mathematical operations.
But it can't do that without runtime cost. There are three options the compiler has for a < b (a being signed, b being unsigned):
- It can, as it does now, convert a to unsigned (which changes the value whenever a is negative) and compare that; you find this wrong.
- It can convert b to signed and compare with a; you may think this is right, but it isn't: exactly as with the option above, there are cases where the conversion of b to signed does not preserve the value (because the unsigned value of b may be bigger than what the signed type of a can represent). Would you still prefer this over the option above? Maybe so, but others will not, since they need exactly the other way around (the current solution).
- It can promote both arguments to a signed integer type that can represent all values of both (say long) and compare that. This may seem fine to you, but it has the following problems. First, it assumes there is a type that can represent both those values; we can clearly imagine that a and b were long and unsigned long to begin with, so that won't work, and even if they are int, there is no guarantee that long can fit them both, since long can have the same size as int (as it does on many platforms). Second, promoting to a type bigger than int and comparing that may prove more expensive than comparing ints (which the standard defines as the natural integer type of the platform, thus the "fastest" one); so we would be violating the principle of not paying for what we don't need, in the case of people who do know that a is not negative and thus a < b won't do anything bad.
Your solution to this problem?
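For what it's worth, a comparison that keeps the mathematical meaning can be written without widening at all, by handling the sign explicitly; the helper below is only a sketch (the name is made up; C++20 later standardized the same idea as std::cmp_less), and it still costs an extra branch, which is exactly the "paying for what you don't need" point above:

bool less_mixed(int a, unsigned b)
{
    if (a < 0)
        return true;                       // any negative value is less than any unsigned value
    return static_cast<unsigned>(a) < b;   // both non-negative: the unsigned compare is exact
}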
I *do* realize that. However, simple wrappers won't fix operators. I like having as many low-overhead sanity checks in my code as possible, but I don't want to litter the code with a bunch of assert()s in front of every arithmetic expression. Should I replace every 'if(a < b)' with 'if(less(a, b))' and provide a bunch of overloaded less() functions?
No, but why would you do that? Can't you know, at certain locations in your program, whether something may be negative or not? That's actually _why_ I use unsigned types: to signal that a value can't be negative from that moment on (i.e. the user code sees an interface taking an "int", but the called function converts it to unsigned after checking the precondition that the value is not negative, and from that point on uses only unsigned, so having unsigned types actually helps you know that the variable can't be negative).

Or, put differently, how do you know a pointer can't be invalid at every point of use in the program? You can't; you can only have invariants based on the code flow up to that point, or, to help a bit, you can store the value in something that moves the precondition earlier (like a reference, which requires dereferencing the pointer to be initialized, so if the pointer is invalid the UB happens right there). It's the same with receiving a user value as int and storing it in an unsigned to be used later as unsigned. The later code sees unsigned (just as the later pointer code sees the reference) and knows it's a non-negative value (just as the later pointer code knows it has a valid reference). It knows this because at the point of conversion from int to unsigned (or from pointer to reference) you performed the check.
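A small illustration of that layering (the function names are made up): the check happens once, at the signed-to-unsigned boundary, and everything behind it sees only unsigned:

#include <stdexcept>

void render_rows(unsigned row_count)   // internal: row_count is known non-negative here
{
    for (unsigned i = 0; i < row_count; ++i) {
        // ... do the actual work ...
    }
}

void render(int row_count)             // public: accepts the raw signed value
{
    if (row_count < 0)
        throw std::invalid_argument("row_count must be non-negative");
    render_rows(static_cast<unsigned>(row_count));   // precondition checked exactly once, here
}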
Oh, the library would allow conversion from integer<32, 0, 4294967295> to integer<16, 0, 65535>, and insert an appropriate run-time range-check.
Sounds fine.
So types integer<N1, L1, H1> and integer<N2, L2, H2> would be compatible if the intersection of [L1,H1] and [L2,H2] is not empty; the resulting type would be coerced to integer<...> (though other bounds are necessary for other operations, e.g., addition), with a run-time check inserted (this makes it possible to maintain rather lax bounds at compile-time).
===
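To make the checked conversion concrete, here is a very rough sketch of what such an integer<bits, low, high> could look like; everything about it (the wide long long storage, the member names) is illustrative only, not the library actually being proposed:

#include <stdexcept>

template<int Bits, long long Low, long long High>
class integer {
public:
    explicit integer(long long v) : value_(check(v)) {}

    // Converting constructor, e.g. integer<32,0,4294967295> to
    // integer<16,0,65535>: allowed, but range-checked at run time.
    template<int B2, long long L2, long long H2>
    integer(const integer<B2, L2, H2>& other) : value_(check(other.value())) {}

    long long value() const { return value_; }

private:
    static long long check(long long v)
    {
        if (v < Low || v > High)
            throw std::range_error("value outside [Low, High]");
        return v;
    }

    long long value_;   // wide storage just to keep the sketch short
};

// integer<32, 0, 4294967295LL> wide(70000);
// integer<16, 0, 65535> narrow(wide);   // throws: 70000 > 65535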
Actually, I think I could simplify my requirements a bit:
- define template class integer<T> with T == char, short, int, long, long long (no unsigned types allowed!)
So you can't know, at some point in the code, whether some value is positive without either an assumption (error prone) or a runtime check (too costly). I think you should allow signalling syntactically that some type takes only positive values; it is often needed to know this for sure about some variable. It won't break your invariants, and it can be made compatible with the rest of your library ideas.
- allow initialization from _any_ integer<T> or primitive type, larger or smaller, signed or unsigned, provided it is in the range of T; throw exception otherwise
Initialization with implicit conversion too? If so, that might become tricky together with the requirement below about conversion to integer types (in general it's tricky to get implicit conversions right).
- allow conversion to _any_ underlying integer type, signed or unsigned, provided the value fits; otherwise throw an exception
- allow mixed arithmetic between mixed integer<T> classes (e.g. integer<int> + integer<char> would be defined) as well as between integer<T> and any primitive type, subject to the conversion rules above
- arithmetic would be checked for overflow
This, as you said yourself, can be overly expensive on some platforms, can it not? But if you need it, sure.
- comparisons between integer<T> and unsigned types are allowed as long as integer<T> is positive or 0; otherwise an exception is thrown
Mathematically speaking, "0" is a positive value ;) On the serious side, why throw? Wouldn't it be better to still try to perform the comparison somehow, and throw only if there is no common underlying type that can represent all their values so they can be compared? This of course comes with associated costs, and if, as you say below, you want a flag that disables all runtime costs, then I understand why you need to throw here in the debug version.
- bit manipulation operators would behave as if T were unsigned; an additional method or function signed_shift(integer<T>) would be provided
But then you should make the requirements stricter. You can't assume a particular negative value encoding (such as 2's complement), so you should not allow bitwise operations to affect more than the bits forming the value, i.e. excluding the sign bit. Otherwise you get (as the C++ standard puts it) an "unspecified value".
- arrays of integer<T> must have the same low-level layout as arrays of T
I'm not sure how this can be accomplished. Sure, you can have the integer value as the single non-static data member, but I don't think the standard says that a struct A { int a; /* any members here except non-static data members */ }; has the same binary layout as a struct A { int a; };. I know it makes sense for it to, but I'm not sure about the letter of the standard on this (a compile-time size check along these lines is sketched after this reply).

Conclusion: it's a nice library. It's something I also suggested to some colleagues of mine when they ran into some of those signed/unsigned warning issues (in their high-level code). But notice that some of your checks may be too expensive, more expensive than someone is willing to pay to get those features, so to make it generic you will need some sort of compile-time configuration, for example with policies.

You say you would be using it only in debug mode, so you think of it just as a form of assert(), but using operator-overloading syntactic sugar to keep ordinary integer arithmetic syntax without assert()s polluting the code everywhere. But you would use the native types in release mode. So then, the C++ integer types are not so flawed after all, since you would use them in release mode? Your only complaint is that they are flawed for not having expensive runtime checks when used in debug mode? :)
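One cheap way to at least pin down the size part of that layout assumption at compile time (this only checks sizeof, which is what matters for array layout in practice; it proves nothing about the letter of the standard, and the wrapper shown is illustrative):

template<typename T>
class wrapped {
    T value_;                     // the only non-static data member
public:
    wrapped(T v = T()) : value_(v) {}
    operator T() const { return value_; }
};

// Classic pre-C++11 compile-time assertion: the array size becomes -1,
// and compilation fails, if the sizes differ on this implementation.
typedef char wrapped_int_has_int_size[sizeof(wrapped<int>) == sizeof(int) ? 1 : -1];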
Example:
integer<int> a;
integer<short> b;
short result = a + b + 7;
This code would convert b to integer<int>, check for overflow before (or after[*]) adding it to a, compute a+b, check for overflow before adding 7, add 7 to the result, check that the result fits into short, done. If any check fails, an exception is thrown.
[*] Overflow checks could be implemented in platform-specific assembler; e.g. x86 sets the overflow flag _after_ addition.
This is the semantics that I'd like to have in debug versions; in the "release" version of the program, everything would behave as ordinary arithmetic on primitive types. I'm studying numeric_cast<>, but it covers only conversions, not the other items listed above.
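One obvious way to get that debug/release split, sketched with the names from this thread (integer<T> is the proposed wrapper, not an existing library; NDEBUG is the same macro that controls assert()):

#ifdef NDEBUG
typedef int           checked_int;   // release build: plain int, no checks, no overhead
#else
typedef integer<int>  checked_int;   // debug build: the checked wrapper with the semantics above
#endif

// checked_int a, b;
// short result = a + b + 7;   // checked in debug, ordinary int arithmetic in release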
There is a lot of potentially dangerous code that can be written in C or C++, and some compilers tell you about it, but that does not make it wrong code. Warnings
The problem is that every 'a+b', with a and b signed, is dangerous (potential UB), unless you either 1) formally *prove* that the expression won't overflow or 2) insert extra run-time checks (which clutter the code). :/
Sure, just as "*p" is potential UB unless you:
- formally prove p is a valid pointer
- insert extra run-time checks (with pointers this is actually worse, since other than the null pointer value you have no way to determine whether a pointer is invalid)
As I said, it's just normal C++: many places of potential danger, but that may be fine.
Something as common as simple addition should at least have an _option_ of *always*, under *all* circumstances having defined behavior.
Portably and without runtime cost, no, it has no such option. That's the reason undefined behavior was introduced in C (and in C++): there are many cases where, for technical and portability reasons, providing defined behavior would be either impossible or too expensive, so the standard says it's undefined behavior.
The integer<> class proposed above is just one of possibilities; but that should have been included in the standard :/
You mean in the standard library? Well, I often work with open source projects, and in the open source community we have a saying when someone complains about something not already being available in software: "if it's not available, it means it's not needed". To a certain degree this applies to the current problem too. Search the Internet: how many std-like hash implementations do you find vs. how many such range-checked integer libraries? That is a good estimate of the need for such a library, IMO. Yes, it is useful, but it is not very, very useful. Plus, the fact that it can be implemented in C++ proves that the C++ integer types are not flawed and allow you to do what you need.

-- 
Dizzy
"Linux is obsolete" -- AST