size_type doubts / integer library..

newer
[[Boost]][[Interprocess]] Defining...

Zeljko Vrba

16 Aug 2008 16 Aug '08

5:32 p.m.

Integer types in C and C++ are a mess. For example, I have made a library where a task is identified by a unique unsigned integer. The extra information about the tasks is stored in a std::vector. When a new task is created, I use the size() method to get the next id, I assign it to the task and then push_back the (pointer to) task structure into the vector. Now, the task structure has also an "unsigned int id" field. In 64-bit mode, sizeof(unsigned) == 4, sizeof(std::vector::size_type) == 8 I get a warnings about type truncation, and obviously, I don't like them. But I like explicit casts and turning off warnings even less. No, I don't want to tie together size_type and task id type (unsigned int). One reason is "aesthetic", another reason is that I don't want the task id type to be larger than necessary (heck, even a 16-bit type would have been enough), because the task IDs will be copied verbatim into another std::vector<unsigned> for further processing (edge lists of a graph). Doubling the size of an integer type shall have bad effects on CPU caches, and I don't want to do it. What to do? Encapsulate into "get_next_id()" function? Have a custom size() function/macro that just casts the result of vector::size and returns it? == Another example: an external library defines its interfaces with signed integer types, I work with unsigned types (why? to avoid even more warnings when comparing task IDs with vector::size() result, as in assert(task->id < tasks.size()), which are abundant in my code). Again, some warnings are unavoidable. What to do to have "clean" code? == Does anyone know about an integer class that lets the user define the number of bits used for storage, lower allowed bound and upper allowed bound for the range? Like: template<int BITS, long low, long high> class Integer; BITS would be allowed to assume a value only equal to the one of the existing integer types (e.g. 8 for char, 16 for short, etc.), and the class would be constrained to hold values in range [low, high] (inclusive). All operations (+-*/<<>><>) would be compile-time checked to make sense. For example, it would be valid to add Integer<32, 0, 8> with Integer<8, 0, 8> and store into Integer<8, 0, 16> (or Integer<8, 0, 17>) result, but NOT into Integer<8, 0, 8> result. The operations would also be checked at run-time to not overflow. Address-of operator would convert to signed or unsigned underlying integer type, depending on the lower bound. Mixing operations with native integer types would be allowed, provided ranges are respected[*]. And, of course, the library should have a "production mode" where the class would turn off all checks. Does anybody know about such a library? Is a unified numeric tower too much to ask for, at least in the form of a library? How do you deal with these issues? [*] If there's no such library in existence, I might be willing to think about a set of "sane" axioms and checks for automatic conversions. And I have "Hacker's Delight" at home, so I can peek into it and get formulas for deriving bounds of arithmetic operations. And yes, is this even easily implementable within C++ template framework; one has to deal with integer overflow.. What happens in template<long A, long B> class blah { const long T = A+B; }; when A+B overflows? Compiler error? Undefined result? Did C++0x do anything to clean up the integer types mess? This is a proposal for extension to the C language: http://www.cert.org/archive/pdf/07tn027.pdf Anything similar available for C++? </rant> Or should I just listen to the devil on my shoulder and turn off the appropriate warnings?

Show replies by date

dizzy

16 Aug 16 Aug

8:50 p.m.

On Saturday 16 August 2008 20:32:59 Zeljko Vrba wrote:

...

Integer types in C and C++ are a mess.

I do not agree. They generally do their job fine (which is provide portable support to work with native, non checked, platform integer types). For any other needs you should probably use another tool.

...

For example, I have made a library where a task is identified by a unique unsigned integer. The extra information about the tasks is stored in a std::vector. When a new task is created, I use the size() method to get the next id, I assign it to the task and then push_back the (pointer to) task structure into the vector. Now, the task structure has also an "unsigned int id" field. In 64-bit mode,

sizeof(unsigned) == 4, sizeof(std::vector::size_type) == 8

I get a warnings about type truncation, and obviously, I don't like them.

It's perfectly defined to assign values from bigger integrals to smaller integrals as long as the value can be represented of course. There is nothing wrong with that. Some compiler warnings can be helpful others not.

...

But I like explicit casts and turning off warnings even less.

You can research how to turn off that warning with your compiler only in specific parts of the code. Obviously the easiest way is explicit conversion.

...

No, I don't want to tie together size_type and task id type (unsigned int).

Of course not, no reason to tie them.

...

One reason is "aesthetic", another reason is that I don't want the task id type to be larger than necessary (heck, even a 16-bit type would have been enough), because the task IDs will be copied verbatim into another std::vector<unsigned> for further processing (edge lists of a graph). Doubling the size of an integer type shall have bad effects on CPU caches, and I don't want to do it.

What to do? Encapsulate into "get_next_id()" function? Have a custom size() function/macro that just casts the result of vector::size and returns it?

Sure. It is very common in a program to need to assign values from types that can represent different values as long as your program logic makes sure those values can be represented at destination, how else can you imagine to do it?

...

Another example: an external library defines its interfaces with signed integer types, I work with unsigned types (why? to avoid even more warnings when comparing task IDs with vector::size() result, as in assert(task->id < tasks.size()), which are abundant in my code).

You realise you can fix just that with a helpful function wrapper and you don't need to change whole interfaces and redesign your code because of a warning of your compiler being too picky with perfectly fine code right?

...

Again, some warnings are unavoidable.

The warnings were ment to be helpful. If they are not, turn them off.

...

What to do to have "clean" code?

Define "clean" code. If by "clean" code you mean code that does not generate any warnings but does not use any casts and still has full warnings enabled I guess you could find cases where the same conversion is performed but no warning is issued tho that sounds really bad, to write your code depending on whatever compiler warning algorithm specifics (also notice that even if the warning is not issued that does not mean the code is safer in any way since the same possibly dangerous conversion is performed).

...

Does anyone know about an integer class that lets the user define the number of bits used for storage, lower allowed bound and upper allowed bound for the range? Like: template<int BITS, long low, long high> class Integer;

BITS would be allowed to assume a value only equal to the one of the existing integer types (e.g. 8 for char, 16 for short, etc.), and the class would be constrained to hold values in range [low, high] (inclusive).

All operations (+-*/<<>><>) would be compile-time checked to make sense. For example, it would be valid to add Integer<32, 0, 8> with Integer<8, 0, 8> and store into Integer<8, 0, 16> (or Integer<8, 0, 17>) result, but NOT into Integer<8, 0, 8> result. The operations would also be checked at run-time to not overflow.

What is nice with C++ is that it offers you the tools to do things. A lot of people wouldn't need all those checks so they need the low level integrals that C++ provides. Other people like you do need them so they need these new integral types. C++ templates, metaprogramming and operator overloading makes it possible to do it.

...

Address-of operator would convert to signed or unsigned underlying integer type, depending on the lower bound.

This I don't understand. What has address-of anything to do with integral values?

...

Mixing operations with native integer types would be allowed, provided ranges are respected[*].

And, of course, the library should have a "production mode" where the class would turn off all checks.

Does anybody know about such a library? Is a unified numeric tower too much to ask for, at least in the form of a library? How do you deal with these issues?

I think it can be done and it's not very complex, it's a good example of the power of C++ IMO. However, notice that SUCH a library would not solve your original descibed problem (it would actually make it from a warning to a compile time error). Supose std::vector<T>::size_type is such a checked integral. Since it's suposed to represent the maximum size possible in memory (actually it's just the size_type of the Allocator argument but I'm talking about the default one) then it would be something like integer<32, 0, 4294967295>. Then, your thread ID is something like integer<16, 0, 65535>. Then the compile time checks you talked about woudl not allow you to assign values from what vector.size() returns to your thread IDs. So what you actually do is get to the same old solution we have now, you convert values from a type being able to represent more to a type with less values after you have made sure the values can be stored. So how exactly does your library solve anything then?

...

[*] If there's no such library in existence, I might be willing to think about a set of "sane" axioms and checks for automatic conversions. And I have "Hacker's Delight" at home, so I can peek into it and get formulas for deriving bounds of arithmetic operations.

And yes, is this even easily implementable within C++ template framework; one has to deal with integer overflow.. What happens in

template<long A, long B> class blah { const long T = A+B; };

when A+B overflows? Compiler error? Undefined result? Did C++0x do anything to clean up the integer types mess?

If it does overflow what would you have liked the standard to say about it?

...

This is a proposal for extension to the C language: http://www.cert.org/archive/pdf/07tn027.pdf

Anything similar available for C++?

Or should I just listen to the devil on my shoulder and turn off the appropriate warnings?

Obviously so if you are doing correct code. By your reasoning one should find a solution for the common if(a = b) warning other than: - turn off the warning - or wrap with another set of paranthesis if((a = b)) (the equivalent of the explicit conversion in your case) There is a lot of possible dangerous code that can be done in C or C++ and some compilers tell you about it but that do not make it wrong code. Warnings were put into compilers to catch on common pitfalls but they generally do not mean the code is wrong. -- Dizzy

Andrea Denzler

9 p.m.

Isn't it easier to define those types? int16, int32, int64 uint16, uint32, uint64 Of course the definition is determined by the platform using #ifdef code but basically we have something like typedef unsigned short uint16; It should be enough to handle most integer cases. Use always those types when you need to know it's size and use explicit casting to other types when needed to avoid the warning from bigger to smaller integer size or from signed to unsigned. Try to reduce that situations in any case. Andrea

Zeljko Vrba

17 Aug 17 Aug

6:32 a.m.

On Sat, Aug 16, 2008 at 11:50:05PM +0300, dizzy wrote:

...

I do not agree. They generally do their job fine (which is provide portable support to work with native, non checked, platform integer types). For any other needs you should probably use another tool.

Well, they do *not* do their job fine: (-1U < 2) == false, which is a mathematical nonsense (more on that below). Signed arithmetic overflow is undefined behavior, and some CPUs actually raise an exception on overflow (e.g. MIPS). Every 'a+b' expression, with a and b being signed integers, is potential UB. Some machines (e.g. x86) do not raise an exception but set an overflow flag which may be tested by a single instruction, yet I don't know of a compiler which offers run-time overflow checking as a code-generation option. Portable checks for overflow (relying on bit operations) incur immense overhead (certainly much greater than a single instruction). [mail rearranged a bit]

...

You can research how to turn off that warning with your compiler only in specific parts of the code. Obviously the easiest way is explicit conversion.

...
Or should I just listen to the devil on my shoulder and turn off the appropriate warnings?

Obviously so if you are doing correct code. By your reasoning one should find a solution for the common if(a = b) warning other than: - turn off the warning - or wrap with another set of paranthesis if((a = b)) (the equivalent of the explicit conversion in your case)

Writing an extra set of parentheses is not visually intrusive or cluttering. Writing static_cast<int>(expr), (int)(expr), or surrounding the offending code with #pragma _is_ cluttering and intrusive. Yes, I want my code to be short, concise, and easily readable, in addition to being correct. So shoot me :-) I have researched comp.lang.c++.moderated archives on this topic, and other sources, and found two advices: Peter van der Linden in "Expert C Programming: Deep C Secrets" writes: "Avoid unnecessary complexity by minimizing your use of unsigned types. Specifically, don't use an unsigned type to represent a quantity just because it will never be negative (e.g., "age" or "national_debt")." A quote of Bjarne Stroustrup: "The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules." Yet, all of the STL uses an unsigned type for size_type and the return value of size(). As much as I'd like to use only signed ints, this becomes prohibitive (due to warnings) when mixing them with STL. An yes, I've been bitten several times in the past by implicit signed -> unsigned conversions in relational comparisons. The most sane thing would be to throw an exception at runtime if one compares a negative integer with some unsigned quantity, instead of getting false for '-1 < 2U', which is a mathematical nonsense. signedness of a type *should not* affect its numerical value in mathematical operations.

...

...
Another example: an external library defines its interfaces with signed integer types, I work with unsigned types (why? to avoid even more warnings when comparing task IDs with vector::size() result, as in assert(task->id < tasks.size()), which are abundant in my code).

You realise you can fix just that with a helpful function wrapper and you don't need to change whole interfaces and redesign your code because of a warning of your compiler being too picky with perfectly fine code right?

I *do* realize that. However, simple wrappers won't fix operators. I like having as many low-overhead sanity checks in my code as possible, but I don't want to litter the code with a bunch of assert()s in front of every arithmetic expression. Should I replace every 'if(a < b)' with 'if(less(a, b))' and provide a bunch of overloaded less() functions?

...

The warnings were ment to be helpful. If they are not, turn them off.

If I'm going to turn off that particular warning, I want to compensate with extensive run-time checking, at least in the "debug" version. There are _good_ reasons that warnings about 64->32 bit truncation or comparisons of different signedness exist.

...

Supose std::vector<T>::size_type is such a checked integral. Since it's suposed to represent the maximum size possible in memory (actually it's just the size_type of the Allocator argument but I'm talking about the default one) then it would be something like integer<32, 0, 4294967295>. Then, your thread ID is something like integer<16, 0, 65535>. Then the compile time checks you talked about woudl not allow you to assign values from what vector.size() returns to your thread IDs. So what you actually do is get to the same old

Oh, the library would allow conversion from integer<32, 0, 4294967295> to integer<16, 0, 65535>, and insert an appropriate run-time range-check. So types integer<N1,L1,H1> and integer<N2, L2, H2> would be compatible if the intersection of [L1,H1] and [L2,H2] is not empty; the resulting type would be coerced to integer<max(N1,N2), min(L1, L2), max(H1, H2)> (though other bounds are necessary for other operations, e.g., addition), with a run-time check inserted (this makes it possible to maintain rather lax bounds at compile-time). === Actually, I think I could simplify my requirements a bit: - define template class integer<T> with T == char, short, int, long, long long (no unsigned types allowed!) - allow initialization from _any_ integer<T> or primitive type, larger or smaller, signed or unsigned, provided it is in the range of T; throw exception otherwise - allow conversion to _any_ underlying integer type, signed or unsigned, provided the value fits; otherwise throw an exception - allow mixed arithmetic between mixed integer<T> classes (e.g. integer<int> + integer<char> would be defined) as well as between integer<T> and any primitive type, subject to the conversion rules above - arithmetic would be checked for overflow - comparisons between integer<T> and unsigned types are allowed as long as integer<T> is positive or 0; otherwise an exception is thrown - bit manipulation operators would behave as if T were unsigned; an additional method or function signed_shift(integer<T>) would be provided - arrays of integer<T> must have the same low-level layout as arrays of T (i think this covers all the requirements) Example: integer<int> a; integer<short> b; short result = a + b + 7; This code would convert b to integer<int>, check for overflow before (or after[*]) adding it to a, compute a+b, check for overflow before adding 7, add 7 to the result, check that the result fits into short, done. If any check fails, an exception is thrown. [*] Overflow checks could be implemented in platform-specific assembler; e.g. x86 sets the overflow flag _after_ addition. This is the semantic that I'd like to have in debug versions; in "release" version of the program, everything would behave as ordinary arithmeic on primitive types. I'm studying numeric_cast<>, but.. it covers only conversions, not other items listed above.

...

There is a lot of possible dangerous code that can be done in C or C++ and some compilers tell you about it but that do not make it wrong code. Warnings

The problem is that every 'a+b', with a and b signed, is dangerous (potential UB), unless you either 1) formally *prove* that the expression won't overflow or 2) insert extra run-time checks (which clutter the code). :/ Something as common as simple addition should at least have an _option_ of *always*, under *all* circumstances having defined behavior. The integer<> class proposed above is just one of possibilities; but that should have been included in the standard :/

Michiel Helvensteijn

5:13 p.m.

Zeljko Vrba wrote:

...

Actually, I think I could simplify my requirements a bit:

I strongly agree with your proposal, and pretty much everything in your post. I would also like the possibility to explicitly state the lower- and upper-bound in the type and the possibility to use a dynamic integer (without any bounds), should the need arise. In my opinion, the ideal solution would be to have only one int type in the language, which would conceptually have no bounds. Convenient sub-types (like uint) would be defined which would enforce type-invariants (like: value >= 0). The programmer may also define his own sub-types. All int-types would be compatible, however, and behave correctly in a mathematical sense. If an operation between two integers overflows, an assertion failure would arise. This is clearly not possible for C++. But I'm designing a programming language where integers work this way. The hope is that the bounds of a plain int may be found (formally proved) at compile-time, so it can be cleanly mapped to a hardware-int without any runtime overhead. If the programmer specifies bounds, they are either proved correct at compile-time or enforced at run-time. In all other cases, a dynamic int is used. This should completely eliminate unexpected runtime overflow. I'm also thinking of making int itself a subtype of a bit-array, which would take care of bit-shift operations. But I'm not really sure how that would work yet for all integers, since I would like to make the behavior hardware independent and predictable. -- Michiel Helvensteijn

Andrea Denzler

5:43 p.m.

Actually the compiler issue a warning if you convert something that will create an overflow (like unsigned int to signed int). When there is no overflow you will get no warning (like unsigned char to signed int). Avoid such warnings writing your own conversion code (see assert below). What else can the compiler do since the value to convert is unknown (unless you have a const). To get an assert on integer overflow at runtime you must write your own integer class, handling the conversion from all integral types (modern CPUs doesn't offer interrupts on integer overflow). You will have an overhead of course, but it can be minimized very well using assembler instructions. The operators < and > works well mathematically. The problem with -1U < 2 is that you are going to have an overflow because you are converting -1 to unsigned. This happens before the operator is applied (it require both value of the same type). -----Messaggio originale----- Da: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] Per conto di Michiel Helvensteijn Inviato: domenica 17 agosto 2008 19.13 A: boost-users@lists.boost.org Oggetto: Re: [Boost-users] size_type doubts / integer library.. Zeljko Vrba wrote:

...

Actually, I think I could simplify my requirements a bit:

Zeljko Vrba

6:31 p.m.

On Sun, Aug 17, 2008 at 07:43:14PM +0200, Andrea Denzler wrote:

...

To get an assert on integer overflow at runtime you must write your own integer class, handling the conversion from all integral types (modern CPUs doesn't offer interrupts on integer overflow). You will have an overhead of course, but it can be minimized very well using assembler instructions.

That's what I proposed. I listed my requirements and asked whether such a library already existed :-)

...

The operators < and > works well mathematically. The problem with -1U < 2 is that you are going to have an overflow because you are converting -1 to unsigned. This happens before the operator is applied (it require both value of the same type).

Well, the first expression was a typo, it should be -1 < 2U. Technically, no overflow happens because signed->unsigned conversion is well-defined (even for negative numbers). I know _why_ it happens, I was complaining on the definition which mathematically makes no sense. == It is ironic that the experts recommend to use signed integers, yet the language definition is backwards since it is biased towards unsigned arithmetic: in three of four possible mixes (s/u,u/s,u/u) the arithmetic is unsigned, and only in the s/s case the arithmetic is signed and behaves according to the usual mathematical definitions. Given the language as it is, the advice should be to use unsigned most of the time, because it is contagious and has always a defined behavior. So one should actually only cast to a signed type only at the moment a signed interpretation is needed. This works nice for 2nd complement which preserves signedness even with unsigned operations (at least across add/sub), though it would probably horribly break with 1st complement or sign-magnitude representation. Blindly multiplying or dividing and reinterpreting the result as signed will break (i.e. give incorrect result) even with 2nd complement representation. <rant>~30 years since C has been invented, ~20 since C++ has been invented, and they still have no sane integer arithmetic defined, not even as an option *shrug*</rant> <sarcasm>I guess the best choice is to use float, at least one knows what to expect.</sarcasm>

Andrea Denzler

7:16 p.m.

I may add that C/C++ have different integer sizes on different platforms adding even more confusion. I understand that a basic int has the size of the processor register, but when I handle and store data values I want to know it's size. When I want a 16 bit integer then I want to use a 16 bit integer because I don't waste space with a 32 bit or have data structures of different sizes. Even worse the size of the wchar_t for i18n. A signed/unsigned compare should always generate a warning, but I just found out it doesn't if you use constant values like -1 < 2U. Funny. signed int a=-1; unsigned int b=2U; bool result = a < b; // as usual I get the signed/unsigned warning Technically if you compare/add/sub two signed/unsigned values of the same byte size then you are having an overflow because signed values doesn't exist in the unsigned realm and you have a double amount of unsigned values. That's why a class (or new standard integer types) handling those confusions is really welcome. Until now I rely on crossplatform integer sizes (uint16, uint32, uint64, etc) and compiler warnings. I think compiler warnings are important because you always know there is something to care about. An overflow assert at runtime can happen or not happen. So ideally we should handle at compile time explicitly all incompatible integral types (signed/unsigned of same size) and have at runtime (at least in debug mode) asserts for any kind of integer overflow (through casting from signed to/from unsigned and basic operations like add, sub, mul, div, etc). imho.... -----Messaggio originale----- Da: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] Per conto di Zeljko Vrba Inviato: domenica 17 agosto 2008 20.31 A: boost-users@lists.boost.org Oggetto: Re: [Boost-users] size_type doubts / integer library.. On Sun, Aug 17, 2008 at 07:43:14PM +0200, Andrea Denzler wrote:

...

To get an assert on integer overflow at runtime you must write your own integer class, handling the conversion from all integral types (modern

CPUs

...

doesn't offer interrupts on integer overflow). You will have an overhead of course, but it can be minimized very well using assembler instructions.

That's what I proposed. I listed my requirements and asked whether such a library already existed :-)

...

The operators < and > works well mathematically. The problem with -1U < 2

...

that you are going to have an overflow because you are converting -1 to unsigned. This happens before the operator is applied (it require both value of the same type).

Zeljko Vrba

18 Aug 18 Aug

6:14 a.m.

On Sun, Aug 17, 2008 at 09:16:29PM +0200, Andrea Denzler wrote:

...

I may add that C/C++ have different integer sizes on different platforms adding even more confusion. I understand that a basic int has the size of the processor register, but when I handle and store data values I want to

"The size of processor register" has become very vague with the arrival of AMD64. Registers are 64-bit, but the default operand size is 32-bit. 64-bit integer arithmetic requires an extra instruction prefix.. So, in a sense, 32-bit integers are still the most efficient integer datatype even on AMD64 (and indeed, "int" is 32-bits). AMD64 is just an example that first comes to mind; there are probably other similar architectures...

...

That's why a class (or new standard integer types) handling those confusions is really welcome. Until now I rely on crossplatform integer sizes (uint16,

OK, I might start to write some code in my free time soon.. If for nothing else, then to experiment with the Boost.Operators library and playing with numeric traits. (and because I like to code in assembler :-)) In the first round, the class will only support x86/AMD64; in the second round I will apply the black magic from "Hacker's Delight" and write it portably.

dizzy

9:34 a.m.

On Sunday 17 August 2008 22:16:29 Andrea Denzler wrote:

...

I may add that C/C++ have different integer sizes on different platforms adding even more confusion.

How do you suggest that C++ offer integer types that are native to the platform then?

...

I understand that a basic int has the size of the processor register, but when I handle and store data values I want to know it's size.

So you have sizeof().

...

When I want a 16 bit integer then I want to use a 16 bit integer because I don't waste space with a 32 bit or have data structures of different sizes.

I don't think bits matter in storage much. Because there is no system interface I am aware of that works with bits (POSIX file I/O, sockets I/O, etc all work with bytes). They all work with bytes, with the native platform byte. And sizeof() tells you the size in bytes. So you got all you need there to know how much space it takes to store that type.

...

Even worse the size of the wchar_t for i18n.

sizeof(wchar_t) works as well.

...

A signed/unsigned compare should always generate a warning, but I just found out it doesn't if you use constant values like -1 < 2U. Funny. signed int a=-1; unsigned int b=2U; bool result = a < b; // as usual I get the signed/unsigned warning

Technically if you compare/add/sub two signed/unsigned values of the same byte size then you are having an overflow because signed values doesn't exist in the unsigned realm and you have a double amount of unsigned values.

Not necessarily true (that unsigned have the corresponding signed type sign bit for value). There can be unsigned types for which the range of values is the same as the signed type (except of course the negative values).

...

That's why a class (or new standard integer types) handling those confusions is really welcome.

For what, for the issues Zeljko described or for the fixed integer size you said? For the former it's technically impossible to have native fast integer types and checked without runtime cost. For the later of course you can have as you can even make classes for fixed integer types (I have something like this in my serialization framework and the code takes fast code paths if it detects at compile time that the current platform matches well the fixed integer sizes that the user asked for).

...

Until now I rely on crossplatform integer sizes (uint16, uint32, uint64, etc) and compiler warnings. I think compiler warnings are important because you always know there is something to care about. An overflow assert at runtime can happen or not happen.

Of course, there is no doubt compile time checks are better than runtime ones (the main reason I like to use C++ and not C, since I have more compile time semantics to express more invariants at compile time using familiar syntax).

...

So ideally we should handle at compile time explicitly all incompatible integral types (signed/unsigned of same size) and have at runtime (at least in debug mode) asserts for any kind of integer overflow (through casting from signed to/from unsigned and basic operations like add, sub, mul, div, etc).

I think his library idea does exactly this. Still, that doesn't mean the C++ native integrals are flawed :) -- Dizzy "Linux is obsolete" -- AST

Andrea Denzler

11:51 a.m.

-----Original Text----- Dizzy wrote:

...

So you have sizeof().

You missed my point! Or you really think I never heard about sizeof? :-) When I define my class/struct it happen that to avoid a waste of space I want to define it with the needed size. For example I want it 16 bit (sorry that I use the bit expression, it's a old habit). The only way to do that is using preprocessor directive (because platform dependent) that create something like int16, int32, int64. With small data I don't care but when I work on a huge amount of data the difference between int16 and int32 is a double amount of used memory. And this matters. wchar_t is horribly defined. Sometimes 1 byte, on windows 2 byte, on most unix platforms 4 byte. Not only the size is different but also the encoding is different! What a wrong choice. You ever worked with Unicode? wchar_t is supposed to help in this but it request you to add a lot of platform dependant code. When I use UTF-8, UTF-16 or UTF-32 I NEED to know the size of the integer value when I define it. char for UTF-8, but today I use platform dependent preprocessor directives for UTF-16/32, would be simpler if C/C++ offer int16, int32, int64 types. Also when I want to store the data on cross platform compatible files I suppose to use always the same byte size for integer values. So using explicitly a int32/int64 data type (and handling manually endianess) is a easy way to handle this. Those types doesnt exist so we all have to define manually with stuff like #if.... + typedef int int32, etc etc.

...

...
Technically if you compare/add/sub two signed/unsigned values of the same byte size then you are having an overflow because signed values doesn't exist in the unsigned realm and you have a double amount of unsigned values. Not necessarily true (that unsigned have the corresponding signed type sign bit for value). There can be unsigned types for which the range of values is the same as the signed type (except of course the negative values).

How can a signed integer type include all the valus of a unsigned integer type if both has the same bit/byte size? Or I don't get your point.

...

For what, for the issues Zeljko described or for the fixed integer size you said? For the former it's technically impossible to have native fast integer types and checked without runtime cost.

Checked integer types are useful for debug builds. The runtime cost is not that heavy, usually on overflow a register flag is set (in some cases even an interrupt) so a simple assembler jmp handle this. But again.. only for debug/special builds since we all know that there is a runtime overhead.

...

For the later of course you can have as you can even make classes for fixed integer types (I have something like this in my serialization framework and the code takes fast code paths if it detects at compile time that the current platform matches well the fixed integer sizes that the user asked for).

You see.. you too admit that you needed to use integer types where you knew it's size, e.g. "fixed integer types". -----Messaggio originale----- Da: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] Per conto di dizzy Inviato: lunedì 18 agosto 2008 11.35 A: boost-users@lists.boost.org Oggetto: Re: [Boost-users] size_type doubts / integer library.. On Sunday 17 August 2008 22:16:29 Andrea Denzler wrote:

...

I may add that C/C++ have different integer sizes on different platforms adding even more confusion.

How do you suggest that C++ offer integer types that are native to the platform then?

...

I understand that a basic int has the size of the processor register, but when I handle and store data values I want to know it's size.

So you have sizeof().

...

When I want a 16 bit integer then I want to use a 16 bit integer because I don't waste space with a 32 bit or have data structures of different sizes.

...

Even worse the size of the wchar_t for i18n.

sizeof(wchar_t) works as well.

...

A signed/unsigned compare should always generate a warning, but I just found out it doesn't if you use constant values like -1 < 2U. Funny. signed int a=-1; unsigned int b=2U; bool result = a < b; // as usual I get the signed/unsigned warning

Technically if you compare/add/sub two signed/unsigned values of the same byte size then you are having an overflow because signed values doesn't exist in the unsigned realm and you have a double amount of unsigned values.

...

That's why a class (or new standard integer types) handling those confusions is really welcome.

...

Until now I rely on crossplatform integer sizes (uint16, uint32, uint64, etc) and compiler warnings. I think compiler warnings are important because you always know there is something to care about. An overflow assert at runtime can happen or not happen.

...

So ideally we should handle at compile time explicitly all incompatible integral types (signed/unsigned of same size) and have at runtime (at least in debug mode) asserts for any kind of integer overflow (through casting from signed to/from unsigned and basic operations like add, sub, mul, div, etc).

I think his library idea does exactly this. Still, that doesn't mean the C++ native integrals are flawed :) -- Dizzy "Linux is obsolete" -- AST _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

dizzy

1:03 p.m.

On Monday 18 August 2008 14:51:53 Andrea Denzler wrote:

...

-----Original Text-----

Dizzy wrote:

...
So you have sizeof().

You missed my point! Or you really think I never heard about sizeof? :-)

When I define my class/struct it happen that to avoid a waste of space I want to define it with the needed size.

You never said anything like that in your previous email. However, you do realize that the standard says very few things about binary layout? So sizeof(struct { char a; int b; }) is usually > sizeof(char) + sizeof(int).

...

For example I want it 16 bit (sorry that I use the bit expression, it's a old habit). The only way to do that is using preprocessor directive (because platform dependent) that create something like int16, int32, int64. With small data I don't care but when I work on a huge amount of data the difference between int16 and int32 is a double amount of used memory. And this matters.

You can also write your own template type that takes bitsize (or value range) and resolves to the smallest native integral able to satisfy the requirement. Are you saying that besides the native platform integer types you would also like something like this to come with the standard library? If so propose it to the committee. If you are saying there should be _only_ such types then I hope you realize that's really not acceptable as many people still need to be able to use C++ with the fastest native type possible.

...

wchar_t is horribly defined. Sometimes 1 byte, on windows 2 byte, on most unix platforms 4 byte. Not only the size is different but also the encoding is different! What a wrong choice.

So you are saying the standard should have decided on a byte size for wchar_t when the byte size in bits depends on the platform? Or you probably mean on a range value. Such a range value would be defined depending on the encoding used. So if the encoding is undefined it makes sense that everything else about wchar_t is (to allow the implementation to have it's own encoding and enough wchar_t representation for it). I see your complain about wchar_t as I see Zeljko's complain about native integers. wchar_t means native platform wide character capable of holding any character the platform may support. Just as char is meant to be used to store the basic character set without any encoding specification (ASCII or not). Just as "int" is meant to be the fastest native platform integer type, without having specified negative value encoding, size (apart from the C90 inherited requirements on the minimum value ranges).

...

You ever worked with Unicode? wchar_t is supposed to help in this but it request you to add a lot of platform dependant code.

No here I think you are wrong. wchar_t is not supposed to help you work with Unicode. That's like saying that "int" is supposed to help you work with integer values stored with 2's complement encoding (or that char was meant to help you work with ASCII characters). Since neither "int" has negative value encoding specified nor does wchar_t it clearly means that they were not meant to help you with that.

...

When I use UTF-8, UTF-16 or UTF-32 I NEED to know the size of the integer value when I define it. char for UTF-8, but today I use platform dependent preprocessor directives for UTF-16/32, would be simpler if C/C++ offer int16, int32, int64 types.

Of course you need to do so since you need value range guarantees for specific types. You can query those with numeric_limits<type>::min()/max(). With some template machinery you can do it at compile time (tho not using ::min()/::max() I just mentioned as in C++03 a function call cannot be used to form a const expression, but you can write specializations for each of the char, short, int, long integer types that then check the range using INT_MAX and such constants). This is generally valid when working with anything that requires fixed representation (file binary formats, network protocols, character encoding, etc). You are saying that besides the native integral types (and native character types) C++ should offer you some fixed integer types (and some fixed character encoding types). C++0x will offer both from what I understand.

...

Also when I want to store the data on cross platform compatible files I suppose to use always the same byte size for integer values. So using explicitly a int32/int64 data type (and handling manually endianess) is a easy way to handle this. Those types doesnt exist so we all have to define manually with stuff like #if.... + typedef int int32, etc etc.

Yes, that's what all people do in their portable layer creating portable types. Notice that just integer size and byte endianess is not all it takes to make it portable. You also have to take into consideration the negative value encoding (the positive one you don't need to as the standard specifies it to be a pure binary system). My serialization library dedicates an important part of it describing these differences, what I do is create something like: integer<signed, 20, integer_serializer<9, 3, high_endian, 1st_complement> > Which means I want a signed (I use "signed", "unsigned" as plain tags) integer type of 20 bits minimum that is serialized/deserialized with a representation of 3 bytes of 9 bits each using high_endianess and 1st_complement negative value encoding. The library decides at compile time to use fast paths of code if it detects that the external representation as certain common attributes with the current machine native representation. I haven't described this to advertise the library just to say that IMO it is quite normal for a low level language like C++ to provide native types of which representation depend largely on the platform. As long as one can still solve his portable types problems with these everything is fine with me. The fact that C++0x adds the C99/POSIX fixed width integer types and also some fixed character encoding types all the better but it's not the end of the world.

...

How can a signed integer type include all the valus of a unsigned integer type if both has the same bit/byte size? Or I don't get your point.

Huh? Where in the standard does it says that the unsigned type has to use all the bits? They have the same size in bytes as reported by sizeof() but that does not mean they can't have padding. I actually asked about this to make sure a couple days ago on c.l.c++ usenet group, see the answers (especially James' one) here: http://groups.google.com/group/comp.lang.c++/browse_thread/thread/e1fe1592fe...

...

Checked integer types are useful for debug builds. The runtime cost is not that heavy, usually on overflow a register flag is set (in some cases even an interrupt) so a simple assembler jmp handle this. But again.. only for debug/special builds since we all know that there is a runtime overhead.

That I completely agree. One of the main reasons I love using stdlib is that I can compile it in debug mode and have all iterators, dereferences checked. But we can't ask from the standard to offer such checked native integers only in debug more or something :)

...

You see.. you too admit that you needed to use integer types where you knew it's size, e.g. "fixed integer types".

Of course there is a need (otherwise they wouldn't be in C++0x). I'm just saying C++ native types are not flawed because of not providing that. They are not the tool for that, plain simple. -- Dizzy "Linux is obsolete" -- AST

Zeljko Vrba

2:48 p.m.

On Mon, Aug 18, 2008 at 04:03:06PM +0300, dizzy wrote:

...

...
Checked integer types are useful for debug builds. The runtime cost is not that heavy, usually on overflow a register flag is set (in some cases even an interrupt) so a simple assembler jmp handle this. But again.. only for debug/special builds since we all know that there is a runtime overhead.

That I completely agree. One of the main reasons I love using stdlib is that I can compile it in debug mode and have all iterators, dereferences checked. But we can't ask from the standard to offer such checked native integers only in debug more or something :)

I wish that the standard defined a set of standard debugging / diagnostic / safety facilities that would behave identically on all platforms. You like having debug STL. I'd like to have a debug / defined version of integers. Why do you think that it's wrong to define those things in the standard as an _option_ ? Do you mind that div() and ldiv() functions are standardized[*]? If not, why would you mind having defined minimal requirements for safe integers, safe iterators, etc? [*] These two functions are specifically provided to provide platform-independent rounding in integer division. So the precedent already exists.

Andrea Denzler

4:15 p.m.

...

You never said anything like that in your previous email. However, you do realize that the standard says very few things about binary layout? So sizeof(struct { char a; int b; }) is usually > sizeof(char) + sizeof(int).

It's more than 20 years that I code in C and Assembler, so yes, I know this. Sometimes it's important to care about the memory usage despite memory alignment performance issues. Using 16bit and not 32bit reduce memory usage a lot.

...

You can also write your own template type that takes bitsize (or value range) and resolves to the smallest native integral able to satisfy the requirement.

Sure, we know all that C/C++ fortunately allows to do that. But if such tasks, like cross platform serialization are so common, why not introducing them as a standard like with the STL.

...

...
wchar_t is horribly defined. Sometimes 1 byte, on windows 2 byte, on most unix platforms 4 byte. Not only the size is different but also the So you are saying the standard should have decided on a byte size for wchar_t

What I mean is that there should be a crossplatform solution for defining all characters. Unicode DO THIS. wchar_t not. So wchar_t is useless for real crossplatform i18n applications. On Windows it even breaks the definition of the wchar_t requiring that one wide char represent all possible characters for that system, that's not true on Windows because for rare combinations you need two wchar_t.

...

No here I think you are wrong. wchar_t is not supposed to help you work with Unicode.

The effect is that wchar_t is confusing and useless for crossplatform i18n. Once you realize this, you just avoid to use wchar_t. Nice for being a standard. I want to create applications running on different platforms without a nightmare.

...

You are saying that besides the native integral types (and native character types) C++ should offer you some fixed integer types (and some fixed character encoding types). C++0x will offer both from what I understand.

Yes, fixed integral types is enough. A Unicode point is just a number. Sure a standard function for decoding/encoding from the current locale to Unicode characters is welcome. Even better if I can write directly a Unicode text to a i/o stream.

...

Yes, that's what all people do in their portable layer creating portable types. Notice that just integer size and byte endianess is not all it takes to make it portable.

Since it's so widely used why not introduce in the standard a common serialization encoding for (fixed size) integral types for all applications? Why everyone must invent hot water again?

...

I haven't described this to advertise the library just to say that IMO it is quite normal for a low level language like C++ to provide native types of which representation depend largely on the platform.

Yes, I know this very well.. in the past performance issues was much more important than today. And 30 years ago issues like crossplatform serialization or i18n didn't exist. But today it's different.

...

Huh? Where in the standard does it says that the unsigned type has to use all the bits?

I want that my code is portable. I have a overflow because on common platforms the unsigned type use all bit's when handling signed/unsigned. Are you telling me to compile only on Unisys MCP to avoid the overflow?

...

Of course there is a need (otherwise they wouldn't be in C++0x). I'm just saying C++ native types are not flawed because of not providing that. They are not the tool for that, plain simple.

The simple fact that -1 < 2U returns false without giving me any warning is a issue. C++ is IMHO the best language in the world, I love it, I loved C since the first day long time ago. But there are issues... there is plenty of room to improve it. Do you really think that a over 30 years old language has no issues today??? The best path is to extend and improve it... so that I benefit new features on existing code.

Daryle Walker

8:53 p.m.

On Aug 18, 2008, at 12:15 PM, Andrea Denzler wrote:

...

...
You can also write your own template type that takes bitsize (or value range) and resolves to the smallest native integral able to satisfy the requirement.

Sure, we know all that C/C++ fortunately allows to do that. But if such tasks, like cross platform serialization are so common, why not introducing them as a standard like with the STL.

Do you guys know that we have class templates with inner typedefs to the built-in integer types that do what you want here? And they're based on either bit-length or maximum/minimum value. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com

dizzy

19 Aug 19 Aug

8:27 a.m.

On Monday 18 August 2008 19:15:50 Andrea Denzler wrote:

...

Sure, we know all that C/C++ fortunately allows to do that. But if such tasks, like cross platform serialization are so common, why not introducing them as a standard like with the STL.

Because there are just too many decisions on how to serialize it (the external portable representation). Decisions that depend on the user testcase and thus the user should decide on them not the standard library (because there are inherent tradeoffs to be made and it's up to the user to do that). What I would like tho from C++ is something that helps in doing serialization no matter how you do it. And first thing that comes in my mind about that is compile time introspection for data members. That alone would make writing serialization frameworks much easier and less error prone to use them. Another thing that might be nice is a skeleton serialization framework, extremely extensible (at compile time), it may come with an implementation already for the lazy people but it should allow for those with special needs the flexibility to have complete control over the external representation of the values. But our discussions are in vain. It's pointless what we say here, it only matters what papers are submitted to the committee (and if any will be approved about this subject that will make it in the next standard after C++0x).

...

What I mean is that there should be a crossplatform solution for defining all characters. Unicode DO THIS. wchar_t not. So wchar_t is useless for real crossplatform i18n applications. On Windows it even breaks the definition of the wchar_t requiring that one wide char represent all possible characters for that system, that's not true on Windows because for rare combinations you need two wchar_t.

Yes wchar_t is useless for portable character encoding. Just as "int" is useless for portable integer value communication. They are however useful to their purpose. If it's true what you say about Windows then it's not conforming (well the letter says "Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1).").

...

Yes, fixed integral types is enough.

You do know however that they are not mandated? A conforming implementation if it does not have a native type without padding for say a int32_t then that implementation will not offer "int32_t" (so your program will not compile which may be what you wanted or not, just saying that it does not cover everything people imagine about these types).

...

A Unicode point is just a number. Sure a standard function for decoding/encoding from the current locale to Unicode characters is welcome. Even better if I can write directly a Unicode text to a i/o stream.

You mean write UTF-8 or UTF-16 or other Unicode encodings?

...

I want that my code is portable. I have a overflow because on common platforms the unsigned type use all bit's when handling signed/unsigned. Are you telling me to compile only on Unisys MCP to avoid the overflow?

I didn't say anything as such, just saying that integral types may have padding.

...

The simple fact that -1 < 2U returns false without giving me any warning is a issue. C++ is IMHO the best language in the world, I love it, I loved C since the first day long time ago. But there are issues... there is plenty of room to improve it. Do you really think that a over 30 years old language has no issues today???

The best path is to extend and improve it... so that I benefit new features on existing code.

Yes there are problems but not so big to say the whole integer C++ feature is flowed. -- Dizzy "Linux is obsolete" -- AST

dizzy

18 Aug 18 Aug

9:24 a.m.

On Sunday 17 August 2008 09:32:40 Zeljko Vrba wrote:

...

On Sat, Aug 16, 2008 at 11:50:05PM +0300, dizzy wrote:

...
I do not agree. They generally do their job fine (which is provide portable support to work with native, non checked, platform integer types). For any other needs you should probably use another tool.

Well, they do *not* do their job fine: (-1U < 2) == false, which is a mathematical nonsense (more on that below).

That does not prove anything. Any good tool has bad use cases. It does not prove integer types do not do their job fine for what was their purpose. You seem to want more than their purpose.

...

Signed arithmetic overflow is undefined behavior, and some CPUs actually raise an exception on overflow (e.g. MIPS). Every 'a+b' expression, with a and b being signed integers, is potential UB. Some machines (e.g. x86) do not raise an exception but set an overflow flag which may be tested by a single instruction, yet I don't know of a compiler which offers run-time overflow checking as a code-generation option. Portable checks for overflow (relying on bit operations) incur immense overhead (certainly much greater than a single instruction).

Which is the reasoning why they were not introduced in the standard since it's prohibitive or impossible in certain platforms. So? That just means the native integer types inherit the many limitations of the low level CPU arithmetic operations of the many platforms out there. Kinda logical to do so if you ask me (since C++ tries to be defined so it has fast implementations on the many native platforms out there instead of trying to be strict on his own virtual machine).

...

Writing an extra set of parentheses is not visually intrusive or cluttering.

Says who?

...

Writing static_cast<int>(expr), (int)(expr), or surrounding the offending code with #pragma _is_ cluttering and intrusive.

Common, if we are to argue about how clumsy it _looks_ writing additional parenthesis vs static_cast I think we have too much time on our hands. There are countless other compromises we do when programming in C++ just because for the rest it works so well (and no I don't see that as a flaw in the language, meaning one could have all the good things of C++ without the bad things, I see that as result of the good things of C++, every language feature has side effects, some which we may not find good in certain contexts but that does not make them a flaw).

...

Yes, I want my code to be short, concise, and easily readable, in addition to being correct. So shoot me :-)

Well that's you. C++ was also made to: - not pay for anything you don't need (I think RTTI is an exception there) - allow writing code that is very close to the native platform - be source compatible with C as much as possible

...

I have researched comp.lang.c++.moderated archives on this topic, and other sources, and found two advices:

Peter van der Linden in "Expert C Programming: Deep C Secrets" writes:

"Avoid unnecessary complexity by minimizing your use of unsigned types. Specifically, don't use an unsigned type to represent a quantity just because it will never be negative (e.g., "age" or "national_debt")."

A quote of Bjarne Stroustrup: "The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules."

I personally use "unsigned" to mean values that cannot be negative and I always careful about comparing them with signed types and such cases. That is, in the low level code of course (it's also on layers, starting at some layer it becomes unsigned after being properly checked for negative values and from that point on it is used only as unsigned internally tho the outside world can send signed values). In the high level code (which I am not writing, since my work focuses on low level code) I would use something like your proposed library (which proves that C++ not only offers native types to satisfy _my_ needs, you can also use them to do something to satisfy _your_ needs thus they are not "flawed").

...

Yet, all of the STL uses an unsigned type for size_type and the return value of size().

It uses std::allocator::size_type which is size_t for obvious reasons. Why size_t is unsigned is a question to be asked for the C people who made it so in C90 I guess :) Either way they are low level enough to not complain about that choice.

...

As much as I'd like to use only signed ints, this becomes prohibitive (due to warnings) when mixing them with STL. An yes, I've been bitten several times in the past by implicit signed -> unsigned conversions in relational comparisons. The most sane thing would be to throw an exception at runtime if one compares a negative integer with some unsigned quantity, instead of getting false for '-1 < 2U', which is a mathematical nonsense.

You imply some code to check for that thus violating the basic principle of not paying for what you don't need.

...

signedness of a type *should not* affect its numerical value in mathematical operations.

But it can't do that without runtime cost. There are 3 options the compiler can do about the a < b (a being signed, b being unsigned): - it can, as it does now, convert a to unsigned (which may result in unspecified value) and compare that; you find it wrong - it can convert b to signed and compare with a; you think this is right but it isn't because it's exactly as the option above, it has cases in which conversion of b to signed results in unspecified value (because the unsigned b value may be bigger than what the signed a can represent); would you still prefer this over the above option? maybe so but others will not since the need exactly the other way (the current solution) - it can promote both arguments to the signed integer type that supports all values of both (say long) and compare that; this may seem fine to you but it has the following problems: - it assumes there is a type that can represent both those values; we can clearly imagine that a and b might have been long and unsigned long in the beginning so that won't work; not to mention there is no guarantee that even if they are int then long can fit them both; long can have the same size of int (as it does so on many platforms) - promoting to a bigger type than int and comparing that may prove more expensive than comparing ints (which the standard defines as being the natural integer type of the platform, thus the "fastest" one); so then we would be violating the principle of paying for what we don't need in cases of those people that they do know a is not negative thus a < b won't do anything bad; Your solution to this problem?

...

I *do* realize that. However, simple wrappers won't fix operators. I like having as many low-overhead sanity checks in my code as possible, but I don't want to litter the code with a bunch of assert()s in front of every arithmetic expression. Should I replace every 'if(a < b)' with 'if(less(a, b))' and provide a bunch of overloaded less() functions?

No but why would you do it? You can't know at certain locations in your program that something may be negative or not? That's actually _why_ I use unsigned types, to signal that it can't be negative from that moment on (ie the user code sees the interface taking an "int" but the called function converts it unsigned after checking for the precondition of being positive value and then from that point on uses only unsigneds so having unsigned types actually helps knowing that variable can't be negative). Or otherwise, how do you know a pointer can't be invalid at every point of use in the program? You can't, you can have invariants based on the code flow up to that point or to help you a bit with this you can have the value stored in something that moves the precondition earlier (like a reference, for which to be initialized you would need to dereference the pointer and if that's invalid the UB happens then). That's the same with receiving a user value as int and storing it in unsigned to be used later as unsigned. The later code sees unsigned (just as the later pointer code sees the reference) and knows it's a positive value (just as the later pointer code knows it's a valid reference). They know this because at the location of converting from int to unsigned (or from pointer to reference) you performed the check.

...

Oh, the library would allow conversion from integer<32, 0, 4294967295> to integer<16, 0, 65535>, and insert an appropriate run-time range-check.

Sounds fine.

...

So types integer<N1,L1,H1> and integer<N2, L2, H2> would be compatible if the intersection of [L1,H1] and [L2,H2] is not empty; the resulting type would be coerced to integer<max(N1,N2), min(L1, L2), max(H1, H2)> (though other bounds are necessary for other operations, e.g., addition), with a run-time check inserted (this makes it possible to maintain rather lax bounds at compile-time).

===

Actually, I think I could simplify my requirements a bit:

- define template class integer<T> with T == char, short, int, long, long long (no unsigned types allowed!)

So you can't know at some point in code if some value is positive without assumption (may be error prone) or runtime check (too costly). I think you should allow to signal syntactically that some type takes only positive values, it is something often needed in code to know this for sure about some variable. It won't break your invariants, it can be made compatible with the rest of your library ideas.

...

- allow initialization from _any_ integer<T> or primitive type, larger or smaller, signed or unsigned, provided it is in the range of T; throw exception otherwise

Initialization with implicit conversion too? If so that might become tricky with the bellow requirement about conversion to integer types (in general it's tricky to get implicit conversions right).

...

- allow conversion to _any_ underlying integer type, signed or unsigned, provided the value fits; otherwise throw an exception - allow mixed arithmetic between mixed integer<T> classes (e.g. integer<int> + integer<char> would be defined) as well as between integer<T> and any primitive type, subject to the conversion rules above - arithmetic would be checked for overflow

This, as you said, can be overly expensive on some platforms or not? But if you need it sure.

...

- comparisons between integer<T> and unsigned types are allowed as long as integer<T> is positive or 0; otherwise an exception is thrown

Mathematically speaking "0" is a positive value ;) On the serious side, why throw? Wouldn't be better to still try to perform them somehow (and only if there is no underlying common type that can represent all their values and be able to compare them throw). This of course comes with associated costs and if as you say below you want a flag that disables all associated runtime costs then I understand why you need to throw here in the debug version.

...

- bit manipulation operators would behave as if T were unsigned; an additional method or function signed_shift(integer<T>) would be provided

But you should make some more strict requirements then. You can't assume a negative value encoding (such as 2's complement) thus you should not allow bitwise operations affecting more than the bits forming the value, ie without the sign bit. Otherwise you (as the C++ standard) get "unspecified value".

...

- arrays of integer<T> must have the same low-level layout as arrays of T

I'm not sure how this can be accomplished. Sure you can have only the integer value as the single non static data field but I don't think the standard says that a struct A { int a; /* any members here except non static data members */ }; has the same binary layout with a struct A { int a; };. I know it makes sense to have but I'm not sure about the letter of the standard on this. Conclusion: it's a nice library. It's something that I also suggested to some colleagues of mine when they ran into some of those signed/unsigned warnings issues (into their high level code). But notice that some of your checks may be too expensive, more expensive than someone is willing to pay to get those features, so then to make it generic you will need to do some sort of compile time configuration, like for example with policies. You say you would be using it only in debug mode so you think of it just as a form of assert() but using the operator overloading syntactic sugar to still have common integer arithmetic syntax without assert()s polluting the code everywhere. But you would use it as native types when in release mode, so then the C++ integer types are not flawed that you would use them in release mode? Your only complain is that they were flawed for not having expensive runtime checks when used in debug mode? :)

...

Example:

integer<int> a; integer<short> b; short result = a + b + 7;

This code would convert b to integer<int>, check for overflow before (or after[*]) adding it to a, compute a+b, check for overflow before adding 7, add 7 to the result, check that the result fits into short, done. If any check fails, an exception is thrown.

[*] Overflow checks could be implemented in platform-specific assembler; e.g. x86 sets the overflow flag _after_ addition.

This is the semantic that I'd like to have in debug versions; in "release" version of the program, everything would behave as ordinary arithmeic on primitive types. I'm studying numeric_cast<>, but.. it covers only conversions, not other items listed above.

...
There is a lot of possible dangerous code that can be done in C or C++ and some compilers tell you about it but that do not make it wrong code. Warnings

The problem is that every 'a+b', with a and b signed, is dangerous (potential UB), unless you either 1) formally *prove* that the expression won't overflow or 2) insert extra run-time checks (which clutter the code). :/

Sure, just as "*p" is potential UB unless you: - formally prove p is a valid pointer - you insert extra run-time checks (with pointers this is actually worse since other than the null pointer value you have no way to determine if the pointer is invalid) As I said, it's just normal C++, many places of potential danger but that might be fine.

...

Something as common as simple addition should at least have an _option_ of *always*, under *all* circumstances having defined behavior.

Portable and without runtime costs, no. It has no such option. That's the reason why undefined behavior has been introduced in C (and in C++). There are many cases where because of technical and portability reasons, to provide a defined behavior would be either impossible or too expensive. So the standard says then it's undefined behavior.

...

The integer<> class proposed above is just one of possibilities; but that should have been included in the standard :/

You mean in the standard library? Well I work often with open source projects, and in the open source community we have a saying when someone complains about something not already available in software: "if it's not available it means it's not needed". To a certain degree this is valid for the current problem too. Search on the Internet, how many std-like hash implementations you find vs how many such range checked integer libraries you find? That is a good estimation of the need of such a library IMO. Yes, it is useful, but it is not very very useful. Plus, because it can be implemented in C++ proves that C++ integer types are not flawed and they allow you to do what you need. -- Dizzy "Linux is obsolete" -- AST

Zeljko Vrba

2:18 p.m.

On Mon, Aug 18, 2008 at 12:24:46PM +0300, dizzy wrote:

...

...
Writing an extra set of parentheses is not visually intrusive or cluttering.

Says who?

I say so. Extra parentheses = 2 chars. Old-style cast = two sets of parentheses + type name. New-style cast = even more clutter.

...

Well that's you. C++ was also made to: - not pay for anything you don't need (I think RTTI is an exception there)

ok, agreed.

...

- allow writing code that is very close to the native platform

Whatever that means. I disagree here because 'a+b' is undefined behavior even on 2's complement machines which do not trap on overflow (even though the result is well defined). Even when I *do* want undetected overflow behavior of signed numbers for some reason, with whatever semantics the underlying CPU provides, I *cannot* get it, because it's UB. (In practice, I *do* get it as long as the code generator is not too nifty with optimizations.) Anyway, we're not going to agree here, so we might just well close this part of the discussion, this is anyway OT.

...

- it can promote both arguments to the signed integer type that supports all values of both (say long) and compare that; this may seem fine to you but it has the following problems: - it assumes there is a type that can represent both those values; we can clearly imagine that a and b might have been long and unsigned long in the beginning so that won't work; not to mention there is no guarantee that even

It can work perfectly, there are only two cases: - a < 0 : true, done, no need for conversion - a >= 0 : ordinary unsigned comparison

...

- promoting to a bigger type than int and comparing that may prove more expensive than comparing ints (which the standard defines as being the natural

You don't need to promote.

...

Your solution to this problem?

See above.

...

program that something may be negative or not? That's actually _why_ I use unsigned types, to signal that it can't be negative from that moment on (ie

It can't be negative, but it can assume a meaningless, huge positive value.

...

Or otherwise, how do you know a pointer can't be invalid at every point of use in the program? You can't, you can have invariants based on the code flow up

I can't, but I rarely do anything but dereferencing on pointers.

...

...
Actually, I think I could simplify my requirements a bit:

- define template class integer<T> with T == char, short, int, long, long long (no unsigned types allowed!)

So you can't know at some point in code if some value is positive without assumption (may be error prone) or runtime check (too costly). I think you should allow to signal syntactically that some type takes only positive values, it is something often needed in code to know this for sure about some

It can be positive, yet still meaningless. Thus, if being "poisitive" is relevant at some point in the code, then also "being in range with some tighter upper bound than MAX_UINT" is also relevant. So the check is needed in any case. As the case is now with C++, "being positive" is a worthless constraint because "a-b" always results in a positive value, regardless of a < b or a > b. This syntactic constraint just gives a false sense of security, imho, at least when it is used the way you have just described (guarantee that something is positive).

...

Initialization with implicit conversion too? If so that might become tricky with the bellow requirement about conversion to integer types (in general it's tricky to get implicit conversions right).

Probably not. I have to think about this.

...

...
- allow conversion to _any_ underlying integer type, signed or unsigned, provided the value fits; otherwise throw an exception - allow mixed arithmetic between mixed integer<T> classes (e.g. integer<int> + integer<char> would be defined) as well as between integer<T> and any primitive type, subject to the conversion rules above - arithmetic would be checked for overflow

This, as you said, can be overly expensive on some platforms or not? But if you need it sure.

With platform-specific inline ASM, it's hardly expensive on any platform (an extra instruction or so). Writing it in portable C++ is expensive, yes.

...

...
- comparisons between integer<T> and unsigned types are allowed as long as integer<T> is positive or 0; otherwise an exception is thrown

Mathematically speaking "0" is a positive value ;) On the serious side, why

No, it's not :-) 0 is nonnegative, but not positive. See here: http://mathworld.wolfram.com/Positive.html http://mathworld.wolfram.com/Nonnegative.html

...

throw? Wouldn't be better to still try to perform them somehow (and only if there is no underlying common type that can represent all their values and be able to compare them throw). This of course comes with associated costs and if as you say below you want a flag that disables all associated runtime costs then I understand why you need to throw here in the debug version.

Yes, this occurred to me too. Should be encapsulated in some policy, so you can choose the behaviour you want.

...

...
- bit manipulation operators would behave as if T were unsigned; an additional method or function signed_shift(integer<T>) would be provided

But you should make some more strict requirements then. You can't assume a negative value encoding (such as 2's complement) thus you should not allow bitwise operations affecting more than the bits forming the value, ie without the sign bit. Otherwise you (as the C++ standard) get "unspecified value".

OK, more food for thought. Though, I primarily target 2nd complement machines. If somebody needs support for other representation, he's welcome to contribute :)

...

...
- arrays of integer<T> must have the same low-level layout as arrays of T

I'm not sure how this can be accomplished. Sure you can have only the integer

struct Integer { int x ; }; Integer a; the standard guarantees that (void*)&a == (void*)&a.x Compiler will hopefully not insert any additional padding, so the result is in practice achieved.

...

features, so then to make it generic you will need to do some sort of compile time configuration, like for example with policies. You say you would be using

yes, i want at least checking/unsafe policy.

...

arithmetic syntax without assert()s polluting the code everywhere. But you would use it as native types when in release mode, so then the C++ integer types are not flawed that you would use them in release mode? Your only complain is that they were flawed for not having expensive runtime checks when used in debug mode? :)

I complained to two things: - that the arithmetic, as it is defined now, can be mathematically nonsensical - that the language definition does not give an option for well-defined arithmetic, either implemented in the compiler or as part of the standard library So yes, integer types suddenly become good enough after the program has been tested extensively enough.

...

...
Something as common as simple addition should at least have an _option_ of *always*, under *all* circumstances having defined behavior.

Portable and without runtime costs, no. It has no such option. That's the

Just because it can be costly, it should be an option. But a standardized one.

...

You mean in the standard library? Well I work often with open source projects,

For example.

...

something not already available in software: "if it's not available it means it's not needed". To a certain degree this is valid for the current problem

Or it might just be that the majority of OSS developers do not realize that such library is needed or why it would be useful.

dizzy

3:03 p.m.

On Monday 18 August 2008 17:18:34 Zeljko Vrba wrote:

...

...
- it can promote both arguments to the signed integer type that supports all values of both (say long) and compare that; this may seem fine to you but it has the following problems: - it assumes there is a type that can represent both those values; we can clearly imagine that a and b might have been long and unsigned long in the beginning so that won't work; not to mention there is no guarantee that even

It can work perfectly, there are only two cases: - a < 0 : true, done, no need for conversion - a >= 0 : ordinary unsigned comparison

Well that still is a too much cost for some people. You just transformed a single CPU comparison of a < b to 2 comparison instructions. People who made sure that "a" is always positive in their program do not want to pay for the price of doing one more comparison for those who didn't.

...

...
program that something may be negative or not? That's actually _why_ I use unsigned types, to signal that it can't be negative from that moment on (ie

It can't be negative, but it can assume a meaningless, huge positive value.

So? It is the fault of the code that converted a negative value to a positive one if it did this and it's not of your matter where you have received unsigned types and working with them. Same as with references, it is the fault of the code that dereferenced an invalid pointer if it gave you an invalid reference but that does not mean your code cannot safely assume your reference is valid.

...

As the case is now with C++, "being positive" is a worthless constraint because "a-b" always results in a positive value,

Because you want it to do more than it does (and other people need it to do). If so you can add your own checks of course.

...

regardless of a < b or a > b. This syntactic constraint just gives a false sense of security, imho, at least when it is used the way you have just described (guarantee that something is positive).

Depends how do you define security, when you receive a reference are you feeling secure enough it points to a valid object or it may come as a result of dereferencing a null pointer and thus do you check the address of references? I hope not! And if not, then why do you trust references but do not trust other syntactic sugar like "unsigned"? Because I meant of using it to express positive value integers as syntactic sugar, obviously anyone can shoot himself in the foot with them if they want (and they should be able to if otherwise means to perform costly checks).

...

...
But you should make some more strict requirements then. You can't assume a negative value encoding (such as 2's complement) thus you should not allow bitwise operations affecting more than the bits forming the value, ie without the sign bit. Otherwise you (as the C++ standard) get "unspecified value".

OK, more food for thought. Though, I primarily target 2nd complement machines. If somebody needs support for other representation, he's welcome to contribute

Just make sure the library does not need redesign to add another negative value encoding and then of course whomever needs it can implement it himself (oh and also that you compile time error when such assumptions are violated).

...

struct Integer { int x ; }; Integer a;

the standard guarantees that (void*)&a == (void*)&a.x

Compiler will hopefully not insert any additional padding, so the result is in practice achieved.

So you make your new Integer class a POD. But then how do you enable implicit conversions (for most of the operator overloads sure you can overload them non-member).

...

- that the arithmetic, as it is defined now, can be mathematically nonsensical

That's how CPUs were made, they don't make much sense in strict mathematical arithmetic but they make a lot of sense in binary mode arithmetic.

...

- that the language definition does not give an option for well-defined arithmetic, either implemented in the compiler or as part of the standard library

Sure it can, the new Integer type you are doing is such an option. Where should the line be drawn between whatever other high level language features some users need and what the standard library provides? -- Dizzy "Linux is obsolete" -- AST

Zeljko Vrba

3:29 p.m.

On Mon, Aug 18, 2008 at 06:03:11PM +0300, dizzy wrote:

...

...
It can work perfectly, there are only two cases: - a < 0 : true, done, no need for conversion - a >= 0 : ordinary unsigned comparison

Well that still is a too much cost for some people. You just transformed a single CPU comparison of a < b to 2 comparison instructions. People who made sure that "a" is always positive in their program do not want to pay for the price of doing one more comparison for those who didn't.

Well, then they get to cast their signed number to unsigned: (unsigned)a < b, problem solved.

...

Depends how do you define security, when you receive a reference are you

Being able to detect invalid input. malloc(-12) is certainly meaningless, malloc(something_large), might be, might be not - where's the border?

...

references? I hope not! And if not, then why do you trust references but do not trust other syntactic sugar like "unsigned"? Because I meant of using it

It's not about trust - it's about being unable to detect invalid input.

...

So you make your new Integer class a POD. But then how do you enable implicit conversions (for most of the operator overloads sure you can overload them non-member).

PODs can have member functions, at least if I interpret this correctly: http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html

...

That's how CPUs were made, they don't make much sense in strict mathematical arithmetic but they make a lot of sense in binary mode arithmetic.

CPUs provide various simple means to detect anomalous conditions. C++ does not.

...

Sure it can, the new Integer type you are doing is such an option. Where should the line be drawn between whatever other high level language features some users need and what the standard library provides?

If there's place for locale...

dizzy

19 Aug 19 Aug

8:02 a.m.

On Monday 18 August 2008 18:29:27 Zeljko Vrba wrote:

...

On Mon, Aug 18, 2008 at 06:03:11PM +0300, dizzy wrote:

...
...
It can work perfectly, there are only two cases: - a < 0 : true, done, no need for conversion - a >= 0 : ordinary unsigned comparison

Well that still is a too much cost for some people. You just transformed a single CPU comparison of a < b to 2 comparison instructions. People who made sure that "a" is always positive in their program do not want to pay for the price of doing one more comparison for those who didn't.

Well, then they get to cast their signed number to unsigned: (unsigned)a < b, problem solved.

Then I'm out of my arguments (for the case of comparison alone, there are other cases where such conversions are also performed where more problems need to be solved than the additional compare operation). If I were to decide how to do it I would have decided as you suggested since it seems the least error prone for the user. I'm sure however that there were other reasons that made it be this way and I don't think they are arbitrary. Maybe someone more knowledgeable can comment. I also do not consider this little feature something so important that because it doesn't do as we see it best then we can say the language integer types are broken.

...

It's not about trust - it's about being unable to detect invalid input.

And how does current C++ makes it impossible to detect invalid input?

...

...
So you make your new Integer class a POD. But then how do you enable implicit conversions (for most of the operator overloads sure you can overload them non-member).

PODs can have member functions, at least if I interpret this correctly: http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html

By that URL: The term POD class types collectively refers to aggregate classes (POD-struct types)... And then it says: The term aggregate refers to an array or class that has none of the following characteristics [§8.5.1, ¶1]: user-declared constructors, And since a ctor used for conversion is such a user declared ctor then you can't have it POD anymore. Either if you don't want implicit conversion with a ctor you still need your own ctors taking integer values or other integer types (with other ranges) and that makes it not a POD anymore right?

...

...
That's how CPUs were made, they don't make much sense in strict mathematical arithmetic but they make a lot of sense in binary mode arithmetic.

CPUs provide various simple means to detect anomalous conditions. C++ does not.

Yes it would have been nice to know portably an arithmetic operation has overflowed that results in fast CPU code checking for some flag. But I am not sure if this is possible for all CPUs that C++ is supposed to run on or how useful would be. -- Dizzy "Linux is obsolete" -- AST

Zeljko Vrba

10:49 a.m.

On Tue, Aug 19, 2008 at 11:02:44AM +0300, dizzy wrote:

...

to be solved than the additional compare operation). If I were to decide how to do it I would have decided as you suggested since it seems the least error prone for the user. I'm sure however that there were other reasons that made

Hooray, we have agreed on at least something! :-)

...

And since a ctor used for conversion is such a user declared ctor then you can't have it POD anymore. Either if you don't want implicit conversion with a ctor you still need your own ctors taking integer values or other integer types (with other ranges) and that makes it not a POD anymore right?

You're right, I want constructors. I have yet to see a case where a user-defined constructor alters the layout of what an otherwise would be a POD-class, but..

...

Yes it would have been nice to know portably an arithmetic operation has overflowed that results in fast CPU code checking for some flag. But I am not sure if this is possible for all CPUs that C++ is supposed to run on or how useful would be.

Such facility can be implemented in software, if hardware does not provide a better way. Efficiency of implementation has nothing to say about it -- not even all CPUs support floating-point in hardware, yet FP is standardized. If you would use a construct like overflows(a+b), you'd pay the price, however large/small it might be, otherwise you wouldn't. Anyway, time to bring this discussion towards the end -- it seems that we don't have a large gap in opinions, after all.

Kim Barrett

18 Aug 18 Aug

10:50 p.m.

At 8:32 AM +0200 8/17/08, Zeljko Vrba wrote:

...

Some machines (e.g. x86) do not raise an exception but set an overflow flag which may be tested by a single instruction, yet I don't know of a compiler which offers run-time overflow checking as a code-generation option.

See gcc's -ftrapv option.

Asif Lodhi

11:03 p.m.

Hi Zeljko, On 8/17/08, Zeljko Vrba <zvrba@ifi.uio.no> wrote:

...

On Sat, Aug 16, 2008 at 11:50:05PM +0300, dizzy wrote:

...
.................................. times in the past by implicit signed -> unsigned conversions in relational comparisons. The most sane thing would be to throw an exception at runtime if one compares a negative integer with some unsigned quantity, instead of getting false for '-1 < 2U', which is a mathematical nonsense. signedness of a type *should not* affect its numerical value in mathematical operations.

I would like to explain explicitly that the int to unsigned int conversion is NOT by accident - it is a "Standard Conversion" and you can find it in Appendix "C" of the book "The C++ Programming Language, 3rd Ed." (TC++PL) by Bjarne Stroustrup. In appendex C of TC++PL, almost all types of conversions are defined, AFAIK. Once you _know_ that it happens then writing something like "-1 < 2U" is entirely your problem to take care of as compiler, IMO, is not likely to issue any warnings because you are explicitly specifying 2 as an unsinged constant by suffixing "U" to it which will automatically convert the -1 to an unsigned bitwise (as a Standard Conversion) because of closeness of the language to the machine which was a huge factor in the language design (you can read Stroustrup's book: "Design and Evolution of C++" to verify that). As you and others have already said earlier, you can always design a new class if you want to make mathematical sense. -Asif

David Abrahams

16 Aug 16 Aug

8:54 p.m.

on Sat Aug 16 2008, Zeljko Vrba <zvrba-AT-ifi.uio.no> wrote:

...

I get a warnings about type truncation, and obviously, I don't like them. But I like explicit casts and turning off warnings even less. No, I don't want to tie together size_type and task id type (unsigned int). One reason is "aesthetic", another reason is that I don't want the task id type to be larger than necessary (heck, even a 16-bit type would have been enough), because the task IDs will be copied verbatim into another std::vector<unsigned> for further processing (edge lists of a graph). Doubling the size of an integer type shall have bad effects on CPU caches, and I don't want to do it.

What to do? Encapsulate into "get_next_id()" function? Have a custom size() function/macro that just casts the result of vector::size and returns it?

Have you looked at boost::numeric_cast? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Zeljko Vrba

17 Aug 17 Aug

5:19 a.m.

On Sat, Aug 16, 2008 at 12:54:24PM -0800, David Abrahams wrote:

...

Have you looked at boost::numeric_cast?

Oh, thanks for the tip! Looking into it right now.

Asif Lodhi

6:09 a.m.

Hi, On Sat, Aug 16, 2008 at 10:32 PM, Zeljko Vrba <zvrba@ifi.uio.no> wrote:

...

Integer types in C and C++ are a mess. For example, I have made a library where a task is identified by a unique unsigned integer. ==

Another example: an external library defines its interfaces with signed integer types, I work with unsigned types (why? to avoid even more ==

Does anyone know about an integer class that lets the user define the number of bits used for storage, lower allowed bound and upper allowed bound for the range? Like: template<int BITS, long low, long high> class Integer;

I would like to refer you to section C.6.1 of "The C++ Programming Language, 3rd Ed." by Bjarne Stroustrup where the integer promotions are described. Elsewhere in the same pages, other conversions are also described. Basically, Stroustrup suggests to use integer types ONLY in the very lowest level code and use higher level types that you defined according to your own problem domain in your design. If you do want to use integer types then stick to plain ints and plain chars - as much as you possibly can. Use unsigned ints or signed char or unsigned char or longs and you might run into portability issues across different compilers/platforms, afaik, when you mix all of these types in calculations, for example. Lastly, a bit-field (if defined in a structure, that is) gets converted to an int if int can appropriately represent it otherwise it gets converted to an unsigned int. To save you time, you can use the "bitset" class of standard C++ for your purpose. -Asif

Daryle Walker

18 Aug 18 Aug

11:05 p.m.

New subject: The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

Have you ever read a thread in a programming news-group, where some newbie asks how to implement some wacky scheme? Several far-out solutions come forth, then flame-wars erupt over the pros & cons. Then finally someone asks what the newbie really needs and gives a completely different solution based on that response. Basically, the newbie was trying to do something in a manner s/he shouldn't even thought of, let alone suppose it was good enough to try implementing. This is one of those times. On Aug 16, 2008, at 1:32 PM, Zeljko Vrba wrote:

...

Integer types in C and C++ are a mess. For example, I have made a library where a task is identified by a unique unsigned integer. The extra information about the tasks is stored in a std::vector. When a new task is created, I use the size() method to get the next id, I assign it to the task and then push_back the (pointer to) task structure into the vector. Now, the task structure has also an "unsigned int id" field. In 64-bit mode,

sizeof(unsigned) == 4, sizeof(std::vector::size_type) == 8

I get a warnings about type truncation, and obviously, I don't like them. But I like explicit casts and turning off warnings even less. No, I don't want to tie together size_type and task id type (unsigned int). One reason is "aesthetic", another reason is that I don't want the task id type to be larger than necessary (heck, even a 16-bit type would have been enough), because the task IDs will be copied verbatim into another std::vector<unsigned> for further processing (edge lists of a graph). Doubling the size of an integer type shall have bad effects on CPU caches, and I don't want to do it.

What to do? Encapsulate into "get_next_id()" function? Have a custom size() function/macro that just casts the result of vector::size and returns it?

Well, using a custom "size" function will shut the compiler up. But the "get_next_id" function is better because you can change the implementation of the ID and external code shouldn't have to change. (The ID is a typedef and not a naked "unsigned," right?) Anyway, does it really matter; this ID generation code is only used during task construction, right? Actually, writing this response is hard. I've read 20+ responses, talking about how much built-in integers "suck." Then I decided to look at the original post again, and something bugged me about it. Why are you using a number to refer to a container element in the first place? Then I realized that you can't use iterators because they're not stable with vector's element adds or removes. Then I wondered, why are you using a vector in the second place? Wouldn't a list be better, so you can add or remove without invalidating iterators, leaving them available to implement your ID type. And you don't seem to need random-access to various task elements. (A deque is unsuitable for the same reason as a vector.) Then I thought, these tasks just store extra information, and have no relation to each other (that you've revealed). So why are you using any kind of container at all? You have no compunctions about using dynamic memory, so just allocate with shared-pointers: //================================================ class task { struct task_data { // whatever... }; typedef boost::shared_ptr<task_data> sp_type; sp_type data_; // Hidden member-wise constructor explicit task( sp_type d ) : data_( d ) {} public: // Constructors of various configurations, possibly including // a default constructor; but use the automatically-defined // copy-constructor and destructor task(/*whatever*/) : data_( new task_data(/*whatever*/) ) {} // Forced copy task clone() const { sp_type result_data( new task_data(*this->data_) ); task result( result_data ); return result; } // Use automatically defined copy-assignment operator bool operator ==( task const &o ) const { //return this->data_ == o.data_; // shallow return *this->data_ == *o.data_; // deep } bool operator !=( task const &o ) const { return !this->operator ==( o ); } // Regular task functionality follows... }; //================================================ (I was going to suggest using Boost's pointer-containers, but then I realized that you really don't need containment at all.) Now you'll pass this class around instead of an integer type. The size may be higher though, two pointers (and an 'int' in debug-mode).

...

==

Another example: an external library defines its interfaces with signed integer types, I work with unsigned types (why? to avoid even more warnings when comparing task IDs with vector::size() result, as in assert(task->id < tasks.size()), which are abundant in my code). Again, some warnings are unavoidable.

What to do to have "clean" code?

You tasks IDs are conceptually opaque, why is any external code wanting to mess with them? The external code shouldn't be doing anything with the IDs besides comparing them with each other (only != and ==, not ordering) and using them as keys to your task functions. This is why the ID's implementation should be hidden in a wrapping class, external code can't mess with them by default; you would have to define all legal interactions. In other words, define the main task functionality in member functions and friends of the "task" class I suggested, and have any ancillary code call that core code. And if any code besides your test-invariant function is doing those asserts, especially functions outside the task class, you're doing your wrong method wrong.

...

==

Does anyone know about an integer class that lets the user define the number of bits used for storage, lower allowed bound and upper allowed bound for the range? Like: template<int BITS, long low, long high> class Integer;

The integer library in Boost has this. The various class templates only support one of your parameters at a time, though. (Either bit- length, maximum, or minimum, not any two or all three.)

...

BITS would be allowed to assume a value only equal to the one of the existing integer types (e.g. 8 for char, 16 for short, etc.), and the class would be constrained to hold values in range [low, high] (inclusive). [SNIP rant with big ideas for integers he doesn't currently need]

You'll have to enforce a constraint range yourself. But there is a numeric-conversion library in Boost to help you there, too.

...

Or should I just listen to the devil on my shoulder and turn off the appropriate warnings?

No, you should rethink your design on why you need integers in the first place. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com

Emil Dotchevski

11:38 p.m.

New subject: The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

On Mon, Aug 18, 2008 at 4:05 PM, Daryle Walker <darylew@hotmail.com> wrote:

...

Have you ever read a thread in a programming news-group, where some newbie asks how to implement some wacky scheme? Several far-out solutions come forth, then flame-wars erupt over the pros & cons. Then finally someone asks what the newbie really needs and gives a completely different solution based on that response. Basically, the newbie was trying to do something in a manner s/he shouldn't even thought of, let alone suppose it was good enough to try implementing. This is one of those times.

On Aug 16, 2008, at 1:32 PM, Zeljko Vrba wrote:

...

Integer types in C and C++ are a mess. For example, I have made a library where a task is identified by a unique unsigned integer. The extra information about the tasks is stored in a std::vector. When a new task is created, I use the size() method to get the next id, I assign it to the task and then push_back the (pointer to) task structure into the vector. Now, the task structure has also an "unsigned int id" field. In 64-bit mode,

sizeof(unsigned) == 4, sizeof(std::vector::size_type) == 8

I get a warnings about type truncation, and obviously, I don't like them. But I like explicit casts and turning off warnings even less.

What are you suggesting, that in 64-bit mode std::vector::size_type shouldn't be 64-bit? Do you not want a warning when you convert from 64-bit to 32-bit type? One solution to your problem is to wrap the vector into another interface which communicates in terms of functions that are specific for your needs. That interface can do the conversions, and you have the added benefit that user code can't use the full spectrum of std::vector functions, which presumably could make it possible to change the underlying container altogether. By the way, you don't know for sure that sizeof(unsigned) is 4. Strictly speaking, you don't even know that the value returned by sizeof() is in 8-bit bytes. Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

Zeljko Vrba

19 Aug 19 Aug

5:55 a.m.

New subject: The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

On Mon, Aug 18, 2008 at 04:38:47PM -0700, Emil Dotchevski wrote:

...

What are you suggesting, that in 64-bit mode std::vector::size_type shouldn't be 64-bit?

I suggest that size_type be a signed type.

...

Do you not want a warning when you convert from 64-bit to 32-bit type?

I do want a warning.

6214

Age (days ago)

6217

Last active (days ago)

List overview

Download

29 comments

9 participants

participants (9)

Andrea Denzler
Asif Lodhi
Daryle Walker
David Abrahams
dizzy
Emil Dotchevski
Kim Barrett
Michiel Helvensteijn
Zeljko Vrba