Re: [Boost-users] size_type doubts / integer library..
Some thoughts:

1. Original problem

Assigning a size to an ID type appears, IMHO, to be completely unrelated to the general integral-types problem. There are two possibilities:

A) You intend to use your ID later to reference some address or offset (which is the same thing) in memory. If that is your intention, then you should stick to the size type (i.e. std::size_t). Problem solved.

B) If you don't intend to ever use your ID to refer to an address in memory, then using a built-in integral type might prove short-sighted. Unless you have very precise knowledge of how many IDs you will need in, say, 5 or maybe 10 years from now, it is dangerous to limit yourself to a specific integral type for such a value. The least I would do is a typedef to hide the true type of that value and prevent anyone from making assumptions about its potential size. Better yet, define a class for this ID. If you do that, you can write your own conversions and mathematical operators exactly the way you want them to be. Plus, you have a central place to fix your conversions if you decide to change your ID class in a couple of years. Problem solved, in a very clean way. (IMHO)

Then again, I might be overestimating the importance and scope of this ID value. But in that case, why is 'unsigned long' the 'correct' type? Regarding this question, see my comments in part 3 of this message.

2. Size types (and difference types)

I disagree with your complaint about size types all being unsigned. The quotation from B. Stroustrup explicitly made an exception for specific cases, and as I understand it he *was* referring to size types. The problem with size types is that they do need to be able to point to every location within the addressable memory space. Unfortunately this means they will need every single bit of the biggest data entity a CPU can handle at any one time. If size types were signed, they would need one extra bit for the sign, effectively doubling the amount of memory such a type would take up. Unfortunately 'difference types' technically should be both signed and able to address every legal address in memory, which means one bit more than the current size types! However, see below...

3. Integral types situation

I agree with your general assessment of the integral-types situation. For several reasons actually, some of them not pointed out yet:

A) Whenever signed and unsigned types are mixed, the compiler implicitly converts some of the values or intermediate results. Unfortunately it is often not obvious which values and terms are being converted, and whether that can lead to problems. Most of the time (but not always, as you have correctly pointed out) the compiler issues warnings in such situations, but many compilers just point to the line, not the exact variable or term.

B) What makes this problem even more difficult is the widespread use of libraries, which may or may not use signed/unsigned values for specific properties, and thus don't even leave it open to the developers to avoid such combinations!

C) Built-in integral types are based on internal representations of numbers. From the perspective of a designer this is just plain wrong! It is a violation of the principle of information hiding! Types shouldn't reveal anything about their internal representation, and after 20+ years of history in object-oriented programming I really don't understand why we are still forced to use such types!
When a developer selects an integral type for a variable, he considers the preconditions and postconditions, and decides the range of values his variable could legally assume. However, he shouldn't be forced to map this range to one of a few predefined ranges, just because these predefined ranges happen to fit the compiler's preferred internal representation. If a variable can assume values in the range [0..999999], then is 'long' the 'correct' type? Or is it 'unsigned long'? My answer is: neither! If the developer chooses either type, others looking at the code might not recognize that certain values outside that range (but well within the limits of the chosen type) will cause problems or might indicate an error elsewhere.

4. Resolution

I like the suggestion of an integral type that is defined as a range of valid values. An additional (optional) parameter could be provided to define policies such as behaviour on overflow, conversion problems and the like. This would be in the spirit of information hiding. (A rough sketch follows at the end of this message.) In the case of size types or difference types, the address space could reasonably be restricted to a range that allows for a sign bit without compromising the maximum internal data size, in most cases anyway. And when it turns out an additional bit is still needed, then so be it! This design would take away the need to think in internal representations, and instead allow developers to concentrate on the true limitations of their variable types.

5. Advantages

A) Developers can set more precise limits on their variables, and on the behaviour when these limits are violated.

B) Predefined behaviour on violations at runtime.

C) Compiler developers can optimize the internally needed storage by clustering mechanisms, depending on whatever memory access and processing capabilities the available cores provide.

D) Compiler developers need to deal with just one basic integral class instead of a dozen or more. Depending on processor architecture this may or may not be very helpful (it might be necessary to redefine the current integral types internally), but at least the use of these types would be more secure and less prone to programming errors. (Of course this implies the annihilation of current integral types, which won't happen for some time, even after a standard basic adaptable integral type is provided.)

So these are my 5 cents, sorry for the lengthy post.

Stefan
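P.S. A minimal sketch of what such a range-based integer with an overflow policy might look like, just to illustrate the idea. The names ranged_int and throw_on_overflow are made up for this example; nothing like this exists in the standard library or Boost as-is:

    #include <stdexcept>

    // Illustrative only: ranged_int and throw_on_overflow are invented names.
    struct throw_on_overflow {
        static long long check(long long v, long long lo, long long hi) {
            if (v < lo || v > hi)
                throw std::out_of_range("value outside declared range");
            return v;
        }
    };

    template <long long Lo, long long Hi, class Policy = throw_on_overflow>
    class ranged_int {
    public:
        explicit ranged_int(long long v) : value_(Policy::check(v, Lo, Hi)) {}

        ranged_int& operator+=(const ranged_int& rhs) {
            value_ = Policy::check(value_ + rhs.value_, Lo, Hi);
            return *this;
        }

        long long get() const { return value_; }

    private:
        long long value_;  // an implementation would be free to pick a narrower representation
    };

    // The declared range documents the contract, not the machine word:
    typedef ranged_int<0, 999999> counter_type;

A usage such as counter_type c(1000000); would then fail loudly at runtime instead of silently accepting an out-of-contract value.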
On Monday 18 August 2008 13:06:08 Lang Stefan wrote:
C) Built-in integral types are based on internal representations of numbers.
Yes, because they are the lowest-level ones.
From the perspective of a designer this is just plain wrong! It is a violation of the principle of information hiding! Types shouldn't reveal anything about their internal representation, and after 20+ years of history in object oriented programming I really don't understand why we are still forced to use such types! When a developer selects an integral type for a variable, he considers the preconditions and postconditions, and decides the range of values his variable could legally assume. However, he shouldn't be forced to map this range to one of a few predefined ranges, just because these predefined ranges happen to fit the compiler's preferred internal representation. If a variable can assume values in the range [0..999999], then is 'long' the 'correct' type? Or is it 'unsigned long'? My answer is: neither!
Depending on platform it may be neither.
If the developer chooses either type, others looking at the code might not recognize that certain values outside that range (but well within the limits of the chosen type) will cause problems or might indicate an error elsewhere.
That's the job of the programmer. You have the native types with platform-dependent ranges. Good. That's a good start (without which you couldn't do anything anyway). Now, if you need integer types defined in terms of ranges, OK, you can do that with templates. Arbitrary-precision arithmetic does not belong in the C++ standard IMO; it sits perfectly fine in third-party library space. There was also a BigNum Boost library proposal for GSoC. After that is done, then maybe everyone will be happy about this.

-- Dizzy "Linux is obsolete" -- AST
On Mon, Aug 18, 2008 at 12:06:08PM +0200, Lang Stefan wrote:
A) You intend to use your ID later to reference some address or offset (which is the same thing) in memory. If that is your intention, then you should stick to the size type (i.e. std::size_t). Problem solved.
Yes, it is used as an index into a vector. And that's what I did (typedef vector<..>::size_type pin_id).
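For reference, a minimal sketch of the two approaches discussed above: the size_type typedef I actually used, and the wrapper class suggested in 1B. The names pin_id and PinId and their interfaces are illustrative assumptions only:

    #include <vector>

    // Option A: tie the ID directly to the container's size type.
    typedef std::vector<int>::size_type pin_id;   // the element type here is only a placeholder

    // Option B: hide the representation behind a small class, so it can be
    // changed later without touching the callers.
    class PinId {
    public:
        explicit PinId(std::vector<int>::size_type index) : index_(index) {}
        std::vector<int>::size_type index() const { return index_; }
    private:
        std::vector<int>::size_type index_;
    };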
cases, and as I understand it he *was* referring to size types. The problem with size types is that they do need to be able to point to every location within the addressable memory space. Unfortunately this
Hm, do they? Correct me if I'm wrong, but I don't think that C++ mandates a flat address space -- it just requires that each individual object has a contiguous memory representation. And there can be a discrepancy between the two, e.g. 80286 protected mode: the largest segment size is 64k, yet the largest amount of addressable memory is 16MB (or even in the range of TB if one allocates all LDTs and plays with swapping). And, oh yes, pointers were 48-bit :-) So, size_type should be able to represent the size of a *single* largest representable object. Why use it for, e.g., the number of elements in a vector?
means they will need every single bit of the biggest data entity a CPU can handle at any one time. If size types were signed, they would need one extra bit for the sign, effectively doubling the amount of memory such a type would take up in memory. Unfortunately 'difference types' technically should be both signed and able to address every legal address in memory - which means one bit more than the current size types! However, see below....
The "problem" could be solved by using largest _signed_ type both for size_type and difference_type. and 286 is not the only architecture where flat address space does not exist -- even today, a similar situation exists on mainframes (IIRC, there's no straightforward way to convert between increasing integers and increasing memory addresses). If you replace "every legal address in memory" with "every legal index", then I agree with your analysis. But long almost[*] serves this purpose. So, even trying to define integer types that are able to span all memory space is doomed to failure. From a practical perspective: - on 32-bit machine, it's unrealistic to expect to be able to have a vector of more than 2^31 elements (unless all you have is a std::vector<char>) - on 64-bit machine, 2^63 is a huuuge number. signedness does not matter [*] A similar border-case already exists in the C standard library: printf() returns an int, but it's legitimate to write a string with more than 2^15 characters even on platforms where sizeof(int) == 2. What should be the return value of printf() ? So I'd advocate signed representation for size_type and difference_type.
So these are my 5 cents, sorry for the lengthy post.
Oh, thanks for the feedback.
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On behalf of Lang Stefan
3. Integral types situation B) What makes this problem even more difficult, is the widespread use of libraries, which may or may not use signed/unsigned values for specific properties, and thus don't even leave it open to the developers to avoid such combinations!
This is the worst part... using a library and handling all those conversions.
C) Built-in integral types are based on internal representations of numbers. From the perspective of a designer this is just plain wrong! It is a violation of the principle of information hiding!
I completely agree with you. The idea behind fixed integer types is to limit memory usage to what is actually needed; otherwise we could all just use 64-bit signed integers, even for a boolean. The old way to define this is by specifying the byte size, but a better approach would be something like an integer specified by its range of valid values.
D) (Of course this implies the annihilation of current integral types which won't happen for some time, even after a standard basic adaptable integral type would be provided.)
Let's say that all existing integral types could become aliases of such a specified range-based integer.
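For example, reusing the hypothetical ranged_int<Lo, Hi> template sketched earlier in the thread (the alias names below are made up):

    typedef ranged_int<-128, 127>     int8_alias;
    typedef ranged_int<0, 255>        uint8_alias;
    typedef ranged_int<-32768, 32767> int16_alias;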
So these are my 5 cents, sorry for the lengthy post.
Thanks for your very nice post :)
participants (4)
- Andrea Denzler
- dizzy
- Lang Stefan
- Zeljko Vrba