[xint] Question about suitability, portability, and "Boostiness"

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Right now, the XInt library uses a pointer to a (privately defined) struct. The struct has several items, including a small array of "quick digits" and a pointer to a dynamically allocated array of digit_t types, both for containing the magnitude of the integer. While trying to find a more efficient way to store things, I came up with this: struct new_data_t { size_t used, allocated; // (other data items) digit_t digits[1]; // Last item in the struct }; It's an old C trick, apparently legitimized in the C99 standard as the "struct hack". For those not familiar with it, the idea is that I'd allocate a single array of digit_t types, of the required size plus sizeof(new_data_t), then just typecast it as a pointer of type new_data_t. Does anyone know if there's any reason not to do this? Maybe some portability or alignment problem with it? - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvD3ZQACgkQp9x9jeZ9/wQLdQCgklN2Xcs0mDAGhsy4n3TX/Bgw JVoAoK+/YEoawXT7Yd0R3++eWLPYZhjr =Jx8O -----END PGP SIGNATURE-----

On 12 April 2010 22:57, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
Right now, the XInt library uses a pointer to a (privately defined) struct. The struct has several items, including a small array of "quick digits" and a pointer to a dynamically allocated array of digit_t types, both for containing the magnitude of the integer.
The most important optimization for me is the SBO. If the magnitude of the int in question is less than 2**30, it should take no dynamic storage at all. (In addition to space gains, this is a big win for exception-safety too.)
It's an old C trick, apparently legitimized in the C99 standard as the "struct hack". For those not familiar with it, the idea is that I'd allocate a single array of digit_t types, of the required size plus sizeof(new_data_t), then just typecast it as a pointer of type new_data_t.
Does anyone know if there's any reason not to do this? Maybe some portability or alignment problem with it?
I believe the official C way is that the last element is an array without a listed length (digits_type digits_[];), so you malloc once for the whole type and use the array being sure not to overrun the space you allocated. I'm not convinced such tricks are officially allowed in C++, given that the new/delete model doesn't offer any way to use them. Since you just have integral types, why not just allocate new digits_type[allocated+2] and know that index 0 is used and index 1 is allocated? I doubt anyone would be too concerned about not being able to store 2**37-bit integers.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/13/2010 12:33 AM, Scott McMurray wrote:
Right now, the XInt library uses a pointer to a (privately defined) struct. The struct has several items, including a small array of "quick digits" and a pointer to a dynamically allocated array of digit_t types, both for containing the magnitude of the integer.
The most important optimization for me is the SBO. If the magnitude of the int in question is less than 2**30, it should take no dynamic storage at all. (In addition to space gains, this is a big win for exception-safety too.)
SBO = Small B??? Optimization? That's easily done. It's what the aforementioned "quick digits" are used for at present. Of course, it means that the space taken up by those digits will be wasted on every integer that's larger than they can hold.
It's an old C trick, apparently legitimized in the C99 standard as the "struct hack". For those not familiar with it, the idea is that I'd allocate a single array of digit_t types, of the required size plus sizeof(new_data_t), then just typecast it as a pointer of type new_data_t.
Does anyone know if there's any reason not to do this? Maybe some portability or alignment problem with it?
I believe the official C way is that the last element is an array without a listed length (digits_type digits_[];),
Yes, if you've got a C99-compatible compiler. The trick has been in use for a lot longer than that though, I first heard about it in the mid-eighties. Using an array of size 1 allows any C or C++ compiler to use it.
so you malloc once for the whole type and use the array being sure not to overrun the space you allocated. I'm not convinced such tricks are officially allowed in C++, given that the new/delete model doesn't offer any way to use them.
They weren't "officially allowed" in C prior to C99 either, but a lot of smart people used them anyway. :-) The new/delete model doesn't rule them out, if you use an array for storage, and the struct you're overlaying on it doesn't require a constructor or destructor. I think you could even use a placement-new to allow for those, but that's beyond what I'd need here.
Since you just have integral types, why not just allocate new digits_type[allocated+2] and know that index 0 is used and index 1 is allocated? I doubt anyone would be too concerned about not being able to store 2**37-bit integers.
I considered that too, but a digit_t could be as small as sixteen bits on some systems (it's defined as half the size of the largest native integer type the system supports). That could overflow in some rare but not-too-contrived cases. For example, since it would also have to be used for the number of copies in the copy-on-write system, I can see someone making more than 64K copies of a particular number, maybe as a default number in a container. I can work around that problem fairly easily if necessary, but the "struct hack" method is cleaner, if there isn't a good reason to avoid it. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvEAH8ACgkQp9x9jeZ9/wRREgCgn0jP40u+vHlOSKOFZovyWpab n90AnRd1TBMHbE4N2obH4THBdHpMQSFU =mKCy -----END PGP SIGNATURE-----

On 13 April 2010 01:26, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
That's easily done. It's what the aforementioned "quick digits" are used for at present. Of course, it means that the space taken up by those digits will be wasted on every integer that's larger than they can hold.
SBO = Small Buffer Optimization You said the "quick digits" are inside the struct to which the xint object holds a pointer, so using them requires dynamic allocation. I picture it something like this: struct xint { union { internal_xint *ptr; intptr_t small; }; xint() : small(1) {} }; Then you only allocate internal_xints on even addresses, so that if (small&1), then the value of the xint is ashr(small, 1). And that way I don't have to worry that a use of a common literal, like xint(3), might throw an exception,

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/13/2010 01:34 AM, Scott McMurray wrote:
You said the "quick digits" are inside the struct to which the xint object holds a pointer, so using them requires dynamic allocation.
I picture it something like this:
struct xint { union { internal_xint *ptr; intptr_t small; }; xint() : small(1) {} };
Then you only allocate internal_xints on even addresses, so that if (small&1), then the value of the xint is ashr(small, 1).
And that way I don't have to worry that a use of a common literal, like xint(3), might throw an exception,
And in exchange, every function in the library that uses the internals of the integer -- and there are several -- would need to have special-case handling for small numbers. Not worth the trade-off. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvEB0YACgkQp9x9jeZ9/wRx5gCfaI6W7GVN5qxE5eSreEcVl9qV I7YAoNT2FeIVJk95S+SBtkF4TXK0rs4R =QNd/ -----END PGP SIGNATURE-----

On 13 April 2010 01:55, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
And in exchange, every function in the library that uses the internals of the integer -- and there are several -- would need to have special-case handling for small numbers. Not worth the trade-off.
And don't they already need it for your "quick digits"? Also, if you're allowing special values like NaNs, you already have the check, and can even store the bit marking the special value in the small variable. if (small & 1) if (small & 2) then special value type (small >> 2) else sbo value, value (small >> 2) else full-length value in *ptr somewhere You've just moved the NaN check before the pointer chase, so it's potentially faster.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/13/2010 02:03 AM, Scott McMurray wrote:
And in exchange, every function in the library that uses the internals of the integer -- and there are several -- would need to have special-case handling for small numbers. Not worth the trade-off.
And don't they already need it for your "quick digits"?
No. There's a "digits" pointer; I just point it to the small QuickDigits array, and only the allocation and deallocation functions need to know anything about its existence. Everything else just uses whatever "digits" points to, either the QuickDigits array or an allocated one. After restructuring the code, there won't be any need for a separate QuickDigits array to prevent a second allocation -- a single allocation will be sufficient. Or even no allocations, if I put a small array in the main integer class itself for small numbers; but again, that would be wasted space for anything larger. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvEDnIACgkQp9x9jeZ9/wTLFQCfV+AXuxqmA+ocH2FloKd4VLBT y8oAoK4Sqn9FPpFt2e7HohnGxfD+IXaP =PX5k -----END PGP SIGNATURE-----

trying to propose something smart i came to the same solution... but what about struct member alignment? does it matter for the struct hack? -- Pavel P.S. if you notice a grammar mistake or weird phrasing in my message please point it out

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/13/2010 11:57 AM, DE wrote:
trying to propose something smart i came to the same solution...
but what about struct member alignment? does it matter for the struct hack?
I don't think that matters. The compiler would set up the struct with the proper alignment, including the first item of the (one-item) array, and sizeof() would return the size with any padding. The items in an array are all packed end-to-end too, so I don't see how this would be any different from a more conventional approach. Except, of course, that it's more efficient. I just wanted to check with the people on this list about the portability of it. Since you guys collectively deal with that more than any other group in the world, I figured you would know if there were any hidden portability problems with the design. If I don't hear about any soon, I'll take it that there aren't any known ones and implement it. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvEph4ACgkQp9x9jeZ9/wRRGQCeNKHmPxtRlQH/vVCCJlf5e83/ gNsAoOM/b8n1M0qKTbOPTx3bbNIzEtpt =8SsY -----END PGP SIGNATURE-----

"Chad Nelson" <chad.thecomfychair@gmail.com> wrote in message news:4BC4A621.3020009@gmail.com...
I just wanted to check with the people on this list about the portability of it. Since you guys collectively deal with that more than any other group in the world, I figured you would know if there were any hidden portability problems with the design. If I don't hear about any soon, I'll take it that there aren't any known ones and implement it.
Considering it is an ancient trick and that any compiler targeting an OS as widespread as Windows has to 'support' it (e.g. BITMAPINFO struct) it is probably safe to assume as 'safe'/portable... OTOH, it would be great if you could separate the core 'big int'/math logic from the storage/allocation logic, for example have: - a base 'math handling' class with functions that all take the actual location and size of the number as parameters - a wrapping template class(es) that can be configured with policies whether to use/work with fixed sized buffers/numbers (thus no memory allocation, thus no exception handling code, thus maximally lean code) or dynamically sized buffers (with or without SBOs, deep or shallow copies, reference counting etc etc...)... -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

on 14.04.2010 at 19:50 Domagoj Saric wrote :
Considering it is an ancient trick and that any compiler targeting an OS as widespread as Windows has to 'support' it (e.g. BITMAPINFO struct) it is probably safe to assume as 'safe'/portable...
OTOH, it would be great if you could separate the core 'big int'/math logic from the storage/allocation logic, for example have: - a base 'math handling' class with functions that all take the actual location and size of the number as parameters
a very good point this way it would be almost trivial to implement an arbitrary precision number as well as fixed precision numbers (e.g. 'xint::fixed<128>') oh, you already wrote that:
- a wrapping template class(es) that can be configured with policies whether to use/work with fixed sized buffers/numbers (thus no memory allocation, thus no exception handling code, thus maximally lean code) or dynamically sized buffers (with or without SBOs, deep or shallow copies, reference counting etc etc...)...
and a policy regulating whether to use ordinary arithmetic or modulo one would be vital e.g. a natural expectation is that a fixed 128 bit integer does modulo (2^128) arithmetic -- Pavel P.S. if you notice a grammar mistake or weird phrasing in my message please point it out

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/14/2010 12:11 PM, DE wrote:
- a wrapping template class(es) that can be configured with policies whether to use/work with fixed sized buffers/numbers (thus no memory allocation, thus no exception handling code, thus maximally lean code) or dynamically sized buffers (with or without SBOs, deep or shallow copies, reference counting etc etc...)...
and a policy regulating whether to use ordinary arithmetic or modulo one would be vital e.g. a natural expectation is that a fixed 128 bit integer does modulo (2^128) arithmetic
I'm not sure what "normal arithmetic" means for a fixed-size integer. Are you referring to using an unlimited-size integer for intermediate results? - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvGRMEACgkQp9x9jeZ9/wT48ACg4q52t2s8xTiWBT3heBf6D0QC tOsAoL0hzT42fxgRV+A+qKsyOpOtka5b =QFKJ -----END PGP SIGNATURE-----

on 15.04.2010 at 2:42 Chad Nelson wrote :
and a policy regulating whether to use ordinary arithmetic or modulo one would be vital e.g. a natural expectation is that a fixed 128 bit integer does modulo (2^128) arithmetic
I'm not sure what "normal arithmetic" means for a fixed-size integer. Are you referring to using an unlimited-size integer for intermediate results?
i meant only that modulo operations should be implicitly used for fixed precision ints, not the general case ones it can be an implementation detail and even may be hidden from users at all i guess it's all up to you -- Pavel P.S. if you notice a grammar mistake or weird phrasing in my message please point it out

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/14/2010 11:50 AM, Domagoj Saric wrote:
I just wanted to check with the people on this list about the portability of it. [...]
Considering it is an ancient trick and that any compiler targeting an OS as widespread as Windows has to 'support' it (e.g. BITMAPINFO struct) it is probably safe to assume as 'safe'/portable...
That's what I figured, when no one spoke up. I implemented it this morning, and I'm happy to say that the library is slightly faster now, as well as more memory-efficient. (I haven't uploaded that change to the Sandbox yet.)
OTOH, it would be great if you could separate the core 'big int'/math logic from the storage/allocation logic, for example have: - a base 'math handling' class with functions that all take the actual location and size of the number as parameters - a wrapping template class(es) that can be configured with policies whether to use/work with fixed sized buffers/numbers (thus no memory allocation, thus no exception handling code, thus maximally lean code) or dynamically sized buffers (with or without SBOs, deep or shallow copies, reference counting etc etc...)...
Unifying everything like that could be interesting, in both the "fascinating" and "difficult" meanings of the word. I've put it on my to-do list, I'll see if I can find a way to do it. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvGQvsACgkQp9x9jeZ9/wRyzgCcDdtl2VFy3WxVWG8x6RIR/syF e6cAoPgmskVhp3n7e+H/P2tXJ7K/AbNM =bSUD -----END PGP SIGNATURE-----

"Chad Nelson" <chad.thecomfychair@gmail.com> wrote in message news:4BC642FF.4040803@gmail.com...
OTOH, it would be great if you could separate the core 'big int'/math logic from the storage/allocation logic, for example have: - a base 'math handling' class with functions that all take the actual location and size of the number as parameters - a wrapping template class(es) that can be configured with policies whether to use/work with fixed sized buffers/numbers (thus no memory allocation, thus no exception handling code, thus maximally lean code) or dynamically sized buffers (with or without SBOs, deep or shallow copies, reference counting etc etc...)...
Unifying everything like that could be interesting, in both the "fascinating" and "difficult" meanings of the word. I've put it on my to-do list, I'll see if I can find a way to do it.
I'm sure boost.devel and comp.lang.c++ would provide a helping hand for most implementation problems ;) In any way, going with the separation of the 'core' and 'storage' logic right from the start would be a good idea IMO, even if you provide only one storage logic/policy in the first run...it will help/force you into a better design (that decouples orthogonal problem domains) and enables easier extensibility and configurability later...;) -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/16/2010 10:25 AM, Domagoj Saric wrote:
Unifying everything like that could be interesting, in both the "fascinating" and "difficult" meanings of the word. I've put it on my to-do list, I'll see if I can find a way to do it.
I'm sure boost.devel and comp.lang.c++ would provide a helping hand for most implementation problems ;)
Oh, no doubt. :-) I think I've figured out a way to do it though. It may be a few days before I have the time to implement it and see how it works.
In any way, going with the separation of the 'core' and 'storage' logic right from the start would be a good idea IMO, even if you provide only one storage logic/policy in the first run...it will help/force you into a better design (that decouples orthogonal problem domains) and enables easier extensibility and configurability later...;)
Far too late to do it "right from the start," I've got a working implementation already (and have since before I introduced myself to this list). It doesn't allow for fixed-size integers yet though. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvI00oACgkQp9x9jeZ9/wTSggCggPn6VGQDAoFV36UjqRcFc5c2 zY4An3wda0L/LvfEkRpT5cjI4kj+FGMb =bLrm -----END PGP SIGNATURE-----
participants (4)
-
Chad Nelson
-
DE
-
Domagoj Saric
-
Scott McMurray