[xint] Fourth release, requesting preliminary review again

The next, and hopefully final, iteration of the Extended Integer library (an arbitrary-length integer math library) is now available in the sandbox (at <https://svn.boost.org/svn/boost/sandbox/xint>) and the Vault (at <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=xint.zip&directory=&>). It incorporates almost all of the suggestions offered on this list. The highlights of this release are:

* It's now a header-only library. It can still be compiled if you wish (I use the compiled version for most of my testing), but that's no longer required.

* The user-visible classes (integer, nothrow_integer, and fixed_integer) are all now templates, with parameters for an allocator, thread-safe operation, and "secure mode" (zeroing all memory before releasing it).

* Thread-safe operation is now the default. It's slightly slower that way, but not much, because the library still uses a copy-on-write system for internal functions (where it's safe, because variables allocated there can never be seen across thread boundaries). Every object is given its own storage before it's returned to the non-library caller.

I've included my testing projects as well, for CodeLite under Linux and Visual Studio under Windows, for convenience. The package also includes a slightly older version of Ion Gaztañaga's preliminary Move library. I haven't tested it with the current version, but it should work with that as well. The Move library is not used by default, and makes only a slight difference to the speed when thread-safety is enabled.

I plan to add a few more examples, but what's there should be sufficient to get people started.

-- Chad Nelson, Oak Circle Software, Inc.

On Tuesday 01 June 2010 16:44:45 Chad Nelson wrote:
The next, and hopefully final, iteration of the Extended Integer library (an arbitrary-length integer math library) is now available in the
Great job! Some comments:

The documentation got a bit harder to read, since it redundantly repeats the template arguments. Can something be done about that? Related to that, maybe you can consider using Boost.Parameter for the template parameters?

I think you should make clear in the documentation what thread-safe means. If it still only means that you can't use copies of xints across threads, then some could still want to use them in multithreaded code.

From a user's point of view, making nothrow just a template argument would make sense.

I consider using size_t for size in bits a little confusing/annoying. Maybe you should typedef size_t as bits_t?

Also, I'm interested in what you think about the performance when using small numbers. I've seen claims that GMP is "only 5% slower" than using plain ints in such cases. I'm thinking that such situations would be quite common in a lot of applications.

data_t seems like quite a bloated class to me.

On 06/02/2010 12:02 PM, Marius Stoica wrote:
The next, and hopefully final, iteration of the Extended Integer library (an arbitrary-length integer math library) is now available in the
Great job!
Thanks!
Some comments
The documentation got a bit harder to read since it redundantly repeats the template arguments. Can something be done about that?
I've thought the same thing, and I don't yet know what to do about it. The only way I can think to fix it is to rewrite all of the functions as non-templates, purely for documentation purposes. I'm not fond of that solution; it means that the documentation would have to be at least somewhat removed from the implementation. The function signatures would have to be duplicated as well, which would increase the chances of them getting out of synch on any changes. I haven't ruled it out, but I'm hoping to find an alternative.
Related to that, maybe you can consider using Boost.Parameter for the template parameters?
Would that make the documentation easier to read? I don't see any other benefit to using it here; there are only three optional parameters, and two of them have the same type (bool) so they couldn't benefit from Boost.Parameter's automatic deduction.
I think you should make clear in the documentation what threadsafe means. If it still only means that you can't use copies of xints across threads then some could still want to use them in multithreaded code.
That's still what it means. I've added further documentation on it, and modified the copy constructors to allow optional thread-safe copying of non-thread-safe objects; I've uploaded those changes to the sandbox now, if you want to see them.
From a user's point of view, making nothrow just a template argument would make sense.
Would it? Though similar, they act very differently in the face of errors. The nothrow version also needs to support the Not-a-Number value, which requires a couple functions that don't make sense for the other types.
I consider using size_t for size in bits a little confusing/annoying. Maybe you should typedef size_t as bits_t?
I'm not sure I understand the complaint. size_t is the standard type for sizes. Wouldn't introducing a new type, which is just an alias for an existing and well-known type, be more confusing to people?
Also, I'm interested in what you think about the performance when using small numbers. I've seen claims that GMP is "only 5% slower" than using plain ints in such cases. I'm thinking that such situations would be quite common in a lot of applications.
I hadn't considered that usage. I'm not sure how the speed compares to plain ints, but I'd be very surprised if it were anything near that fast. To make it faster, I'd have to eliminate some of its abilities. Or add assembly-language functions, which would break portability.
data_t seems like quite a bloated class to me.
It's the central class of the entire library. Reducing its size would just mean moving the complexity somewhere else. Most of its size is due directly to the requirement that it support allocators, which I was told is necessary to be accepted as a Boost library.

-- Chad Nelson

On 02/06/10 20:24, Chad Nelson wrote:
On 06/02/2010 12:02 PM, Marius Stoica wrote:
The documentation got a bit harder to read since it redundantly repeats the template arguments. Can something be done about that?
I've thought the same thing, and I don't yet know what to do about it. The only way I can think to fix it is to rewrite all of the functions as non-templates, purely for documentation purposes.
Could you use a macro which is defined differently in the compilation and documentation-generation passes?
I consider using size_t for size in bits a little confusing/annoying. Maybe you should typedef size_t as bits_t?
I'm not sure I understand the complaint. size_t is the standard type for sizes. Wouldn't introducing a new type, which is just an alias for an existing and well-known type, be more confusing to people?
There is precedent in std::bitset for using size_t to count bits; I think it's fine. John Bytheway

On 06/02/2010 03:44 PM, John Bytheway wrote:
The documentation got a bit harder to read since it redundantly repeats the template arguments. Can something be done about that?
I've thought the same thing, and I don't yet know what to do about it. The only way I can think to fix it is to rewrite all of the functions as non-templates, purely for documentation purposes.
Could you use a macro which is defined differently in the compilation and documentation-generation passes?
It took some experimentation, but that worked. :-D Thanks!

-- Chad Nelson
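For illustration, a minimal sketch of the kind of macro John suggests (the macro names and the abs() signature here are invented, not taken from XInt): the Doxyfile predefines one symbol, so the documentation pass sees short, readable signatures while the compiler still sees the full templates.

    #ifdef XINT_DOXYGEN   // assumed to be set via the Doxyfile's PREDEFINED option
        class integer;                                   // what the docs will show
    #   define XINT_INTEGER_TPL                          /* no template clutter */
    #   define XINT_INTEGER_TYPE integer
    #else
        template <class Alloc, bool Threadsafe, bool Secure> class integer_t;
    #   define XINT_INTEGER_TPL  template <class Alloc, bool Threadsafe, bool Secure>
    #   define XINT_INTEGER_TYPE integer_t<Alloc, Threadsafe, Secure>
    #endif

    // The docs render this as "integer abs(const integer& n)"; the compiler
    // still compiles the real template.
    XINT_INTEGER_TPL
    XINT_INTEGER_TYPE abs(const XINT_INTEGER_TYPE& n);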

On Wednesday 02 June 2010 22:24:40 Chad Nelson wrote:
The documentation got a bit harder to read since it redundantly repeats the template arguments. Can something be done about that?
I've thought the same thing, and I don't yet know what to do about it. The only way I can think to fix it is to rewrite all of the functions as non-templates, purely for documentation purposes.
Not worth the effort... and what would be the purpose of using Doxygen then? I was thinking there might be some Doxygen option to disable this? I would think this is not the only library to have this issue. Maybe make a feature request on the Doxygen bug tracker? Frankly, it's not that big an issue.
Related to that, maybe you can consider using Boost.Parameter for the template parameters?
Would that make the documentation easier to read? I don't see any other benefit to using it here; there are only three optional parameters, and two of them have the same type (bool) so they couldn't benefit from Boost.Parameter's automatic deduction.
Why not? As the Boost.Parameter docs point out, having two bool parameters can get confusing. And I don't like having to specify the allocator every time. Of course, it's just some syntactic sugar.
I think you should make clear in the documentation what threadsafe means. If it still only means that you can't use copies of xints across threads then some could still want to use them in multithreaded code.
That's still what it means. I've added further documentation on it, and modified the copy constructors to allow optional thread-safe copying of non-thread-safe objects; I've uploaded those changes to the sandbox now, if you want to see them.
It seems to be missing threadsafe.html
From a user's point of view, making nothrow just a template argument would make sense.
Would it? Though similar, they act very differently in the face of errors. The nothrow version also needs to support the Not-a-Number value, which requires a couple functions that don't make sense for the other types.
It's just a compile-time option like the other ones.
I consider using size_t for size in bits a little confusing/annoying. Maybe you should typedef size_t as bits_t?
I'm not sure I understand the complaint. size_t is the standard type for sizes. Wouldn't introducing a new type, which is just an alias for an existing and well-known type, be more confusing to people?
It's just that size_t is used to count bytes everywhere. That's how you usually count size.
Also, I'm interested in what you think about the performance when using small numbers. I've seen claims that GMP is "only 5% slower" than using plain ints in such cases. I'm thinking that such situations would be quite common in a lot of applications.
I hadn't considered that usage. I'm not sure how the speed compares to plain ints, but I'd be very surprised if it were anything near that fast. To make it faster, I'd have to eliminate some of its abilities. Or add assembly-language functions, which would break portability.
data_t seems like quite a bloated class to me.
It's the central class of the entire library. Reducing its size would just mean moving the complexity somewhere else. Most of its size is due directly to the requirement that it support allocators, which I was told is necessary to be accepted as a Boost library.
I was thinking of maybe doing something like this:

bitfield flags; union { intmax_t val; data_t* data; }

and use a flag to see if it's using a val or data. That should make sure that small values don't use any allocation and use little memory. Some of the bools in data_t could be put in the bitfield.

At Thu, 3 Jun 2010 00:53:23 +0300, Marius Stoica wrote:
Would that make the documentation easier to read? I don't see any other benefit to using it here; there are only three optional parameters, and two of them have the same type (bool) so they couldn't benefit from Boost.Parameter's automatic deduction.
Why not? As the Boost.Parameter docs point out, having two bool parameters can get confusing. And I don't like having to specify the allocator every time. Of course, it's just some syntactic sugar.
bools are the easiest type in the world to eliminate the “same type” problem with. Just use enums. Then deduced template arguments become beautiful. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
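As a small illustration of Dave's point (the option names here are made up, not XInt's actual parameters): distinct enum types read clearly at the point of use and, unlike two raw bools, could be told apart by Boost.Parameter's deduced-argument machinery.

    #include <memory>

    enum threadsafe_option { not_threadsafe, threadsafe };
    enum secure_option     { not_secure,     secure     };

    template <threadsafe_option TS = threadsafe,
              secure_option     S  = not_secure,
              class Alloc          = std::allocator<unsigned int> >
    class integer_t { /* ... */ };

    typedef integer_t<threadsafe, secure> secure_int;  // clearer than integer_t<true, true>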

On 06/02/2010 06:09 PM, David Abrahams wrote:
Would that make the documentation easier to read? I don't see any other benefit to using it here; there are only three optional parameters, and two of them have the same type (bool) so they couldn't benefit from Boost.Parameter's automatic deduction.
Why not? As the Boost.Parameter docs point out, having two bool parameters can get confusing. And I don't like having to specify the allocator every time. Of course, it's just some syntactic sugar.
bools are the easiest type in the world to eliminate the “same type” problem with. Just use enums. Then deduced template arguments become beautiful.
Ah, yes... thanks. I'm not sure whether there's enough call to use Boost.Parameter for the XInt library, but if there is, that will make it a lot more elegant.

-- Chad Nelson

On 06/02/2010 05:53 PM, Marius Stoica wrote:
The documentation got a bit harder to read since it redundantly repeats the template arguments. Can something be done about that?
I've thought the same thing, and I don't yet know what to do about it. The only way I can think to fix it is to rewrite all of the functions as non-templates, purely for documentation purposes.
Not worth the effort... and what would be the purpose of using Doxygen then? I was thinking there might be some Doxygen option to disable this? I would think this is not the only library to have this issue. Maybe make a feature request on the Doxygen bug tracker? Frankly, it's not that big an issue.
I found a way, thanks to a suggestion from John Bytheway. :-)
Related to that, maybe you can consider using Boost.Parameter for the template parameters?
Would that make the documentation easier to read? I don't see any other benefit to using it here; there are only three optional parameters, and two of them have the same type (bool) so they couldn't benefit from Boost.Parameter's automatic deduction.
Why not? As the Boost.Parameter docs point out, having two bool parameters can get confusing. And I don't like having to specify the allocator every time. Of course, it's just some syntactic sugar.
Since you'll probably use typedefs, it's not something you'll need to deal with often. Easy enough to change it, but I'm not sure I see the need yet.
I think you should make clear in the documentation what threadsafe means. If it still only means that you can't use copies of xints across threads then some could still want to use them in multithreaded code.
That's still what it means. I've added further documentation on it, and modified the copy constructors to allow optional thread-safe copying of non-thread-safe objects; I've uploaded those changes to the sandbox now, if you want to see them.
It seems to be missing threadsafe.html
<sigh> Problem found and fixed -- it's there now, along with the latest changes. Sorry about that.
From a user's point of view, making nothrow just a template argument would make sense.
Would it? Though similar, they act very differently in the face of errors. The nothrow version also needs to support the Not-a-Number value, which requires a couple functions that don't make sense for the other types.
It's just a compile-time option like the other ones.
Not entirely, due to those two extra functions. And logically it's a distinct type.
I consider using size_t for size in bits a little confusing/annoying. Maybe you should typedef size_t as bits_t?
I'm not sure I understand the complaint. size_t is the standard type for sizes. Wouldn't introducing a new type, which is just an alias for an existing and well-known type, be more confusing to people?
It's just that size_t is used to count bytes everywhere. That's how you usually count size.
I still don't really understand the problem, but I've changed it to bitsize_t. That should be close enough that it's intuitively obvious what it's based on.
data_t seems like quite a bloated class to me.
It's the central class of the entire library. Reducing its size would just mean moving the complexity somewhere else. Most of its size is due directly to the requirement that it support allocators, which I was told is necessary to be accepted as a Boost library.
I was thinking of maybe doing something like this:
bitfield flags; union { intmax_t val; data_t* data; }
and use a flag to see if it's using a val or data. That should make sure that small values don't use any allocation and use little memory. Some of the bools in data_t could be put in the bitfield.
That might save a few bytes of memory, but it would make the entire library slower, because it would have to check to see whether to use the data_t object or not before every operation. Memory isn't so precious that I'd consider that a good trade-off.

-- Chad Nelson

On Thursday 03 June 2010 07:30:59 Chad Nelson wrote:
I found a way, thanks to a suggestion from John Bytheway. :-)
That's really awesome :)
Since you'll probably use typedefs, it's not something you'll need to deal with often. Easy enough to change it, but I'm not sure I see the need yet.
Would it? Though similar, they act very differently in the face of errors. The nothrow version also needs to support the Not-a-Number value, which requires a couple functions that don't make sense for the other types.
It's just a compile-time option like the other ones.
Not entirely, due to those two extra functions. And logically it's a distinct type.
The more I think about it, I think you should go for a policy-based design. Right now there is the obvious problem that you don't have a nothrow_fixed_integer. Maybe you can do something like this:

integer_t< throws<false>, size<128> /* 0 means unlimited */, thread_safe<false> > myint;

You could also have some other policies: an overflow policy (become NaN, overflow to negative) and a signed/unsigned policy.
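A rough sketch of what that syntax could look like (the wrapper names are Marius's; the skeleton behind them is invented here and is not how XInt is actually written):

    #include <cstddef>

    template <bool V>        struct throws      { static const bool value = V; };
    template <std::size_t N> struct size        { static const std::size_t value = N; }; // 0 = unlimited
    template <bool V>        struct thread_safe { static const bool value = V; };

    template <class Throws     = throws<true>,
              class Size       = size<0>,
              class ThreadSafe = thread_safe<true> >
    class integer_t { /* ... */ };

    typedef integer_t<throws<false>, size<128>, thread_safe<false> > myint;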
That might save a few bytes of memory, but it would make the entire library slower, because it would have to check to see whether to use the data_t object or not before every operation. Memory isn't so precious that I'd consider that a good trade-off.
Doesn't seem too bad to me, and the advantages are quite serious:
- if you have a lot of small ints you save a lot of memory (I think a data_t would be more than 3x larger)
- for example, you can put them in a vector without having pointers everywhere (less fragmentation)
- you can allocate them on the stack without needing to dynamically allocate memory, which is quite expensive and nondeterministic

For that you will need to:
- add a few instructions to every function (not very much)
- check a flag (not very costly; xoring and comparing are the cheapest ops, and it will have good branch prediction)
- add another pointer (probably the most serious issue, but you already have a 3-pointer indirection -- maybe you can make it so you can keep that?)
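For concreteness, here is roughly the tagged-union layout being argued over (purely hypothetical; XInt's real data_t is not organized this way):

    #include <boost/cstdint.hpp>

    struct data_t;                     // the existing heap-allocated representation

    struct integer_storage {
        bool uses_inline_value;        // the "flag" that selects the union member
        union {
            boost::intmax_t value;     // small magnitudes: no allocation at all
            data_t*         data;      // anything larger spills to the heap
        };
    };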

On 06/03/2010 09:40 AM, Marius Stoica wrote:
I found a way, thanks to a suggestion from John Bytheway. :-)
That's really awesome :)
Yup. I'm happy with it. :-)
It's just a compile-time option like the other ones.
Not entirely, due to those two extra functions. And logically it's a distinct type.
The more I think about it, I think you should go for a policy-based design. Right now there is the obvious problem that you don't have a nothrow_fixed_integer.
I don't see that as a problem. The fixed_integer class is a convenience, because one fellow here requested it; it does nothing that nothrow_integer can't do just as well and nearly as easily.
Maybe you can do something like this:
integer_t< throws<false>, size<128> /* 0 means unlimited */, thread_safe<false> > myint;
You could also have some other policies: an overflow policy (become NaN, overflow to negative) and a signed/unsigned policy.
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function. If you look at the fixed_integer versus integer_t templates, you'll see that they really only differ in the digitmanager_t that they use. I considered combining them, or deriving one from the other, but it seemed cleaner and more straightforward to make them unique types.
That might save a few bytes of memory, but it would make the entire library slower, because it would have to check to see whether to use the data_t object or not before every operation. Memory isn't so precious that I'd consider that a good trade-off.
Doesn't seem too bad to me, and the advantages are quite serious:
- if you have a lot of small ints you save a lot of memory (I think a data_t would be more than 3x larger)
If you have a lot of small ints, you're better off storing them as ints and constructing large integer types from them if and when necessary. The library is explicitly designed for storing and manipulating *large* integers, things that are too big for the built-in types. I'd rather not dilute that by trying to make it all things to all people.
- for example, you can put them in a vector without having pointers everywhere (less fragmentation)
If you need that quality, the fixed_integer type will always allocate exactly the same amount of memory, no more and no less, making it easy to prevent fragmentation. And you can provide your own allocator to get complete control over the memory.
- you can allocate them on the stack without needing to dynamically allocate memory, which is quite expensive and nondeterministic
Not part of the library's design goals. The expense of dynamic memory allocation is dwarfed by the time it takes to process numbers of any reasonable size.
For that you will need to
- add a few instructions to every function (not very much)
- check a flag (not very costly; xoring and comparing are the cheapest ops, and it will have good branch prediction)
- add another pointer (probably the most serious issue, but you already have a 3-pointer indirection -- maybe you can make it so you can keep that?)
The fact that it takes any at all means that it would slow the library down further. Just adding support for user-selected allocators slowed it down by 2.3 to 2.5 percent, something I am not very happy about. I'd need a *very* compelling reason to slow it any further.

-- Chad Nelson

Chad Nelson wrote:
On 06/03/2010 09:40 AM, Marius Stoica wrote:
Maybe you can do something like this:
integer_t< throws<false>, size<128> /* 0 means unlimited */, thread_safe<false> > myint;
You could also have some other policies: an overflow policy (become NaN, overflow to negative) and a signed/unsigned policy.
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
What if the policy class provided the behavior you would otherwise control with a conditional? For those policies in which the behavior is not needed, the inline, static member function would do nothing. For those in which the behavior is needed, the member function would do the work unconditionally. Changing the policy changes the behavior, so no need for a conditional.

-- Rob Stewart, Software Engineer, Susquehanna International Group, LLP
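A minimal sketch of the pattern Rob describes, using the library's "secure mode" zeroing as a stand-in example (the names here are invented, not XInt's): the policy member either does the work unconditionally or is an empty inline stub, so the calling code needs no runtime test.

    #include <cstring>
    #include <cstddef>

    struct wipe_before_free {
        static void wipe(unsigned* p, std::size_t n) { std::memset(p, 0, n * sizeof *p); }
    };
    struct no_wipe {
        static void wipe(unsigned*, std::size_t) {}   // compiles away to nothing
    };

    template <class WipePolicy>
    void release_digits(unsigned* digits, std::size_t count) {
        WipePolicy::wipe(digits, count);   // no if statement; the policy decides
        delete [] digits;
    }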

On 06/03/2010 02:04 PM, Stewart, Robert wrote:
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
What if the policy class provided the behavior you would otherwise control with a conditional? For those policies in which the behavior is not needed, the inline, static member function would do nothing. For those in which the behavior is needed, the member function would do the work unconditionally. Changing the policy changes the behavior, so no need for a conditional.
The above didn't fit with what I thought you guys meant about policy-based stuff. Now that I've corrected that misperception, it looks like it might be a viable answer to at least some of the options. But I'm not sure how to apply it to the throw/nothrow option... I'll experiment with it further and ask if needed.

-- Chad Nelson

On 6/3/2010 9:53 AM, Chad Nelson wrote:
On 06/03/2010 09:40 AM, Marius Stoica wrote: [...]
It's just a compile-time option like the other ones.
Not entirely, due to those two extra functions. And logically it's a distinct type.
The more I think about it, I think you should go for a policy-based design. Right now there is the obvious problem that you don't have a nothrow_fixed_integer.
I don't see that as a problem. The fixed_integer class is a convenience, because one fellow here requested it; it does nothing that nothrow_integer can't do just as well and nearly as easily.
Maybe you can do something like this:
integer_t< throws<false>, size<128> /* 0 means unlimited */, thread_safe<false> > myint;
You could also have some other policies: an overflow policy (become NaN, overflow to negative) and a signed/unsigned policy.
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
If by "an extra if statement" you mean an extra compile-time dispatch, then yes, but there wouldn't (or shouldn't, if done correctly) be any runtime overhead. I can see Marius' point of unifying all the integer classes under a single templated class xint::integer_t, but I imagine it will require quite a bit of refactoring. Whether to do this or not, and actually doing it if it should be done, is not the tallest nail right now (or is it?).
That might save a few bytes of memory, but it would make the entire library slower, because it would have to check to see whether to use the data_t object or not before every operation. Memory isn't so precious that I'd consider that a good trade-off.
Doesn't seem too bad to me, and the advantages are quite serious:
- if you have a lot of small ints you save a lot of memory (I think a data_t would be more than 3x larger)
If you have a lot of small ints, you're better off storing them as ints and constructing large integer types from them if and when necessary. The library is explicitly designed for storing and manipulating *large* integers, things that are too big for the built-in types. I'd rather not dilute that by trying to make it all things to all people.
It would be nice to have the performance (or some significant fraction thereof) of small integers initially, and have it "nicely transition" into arbitrary precision integers automatically. Manipulations with small integers right now are probably dominated by memory management. Again, this behavior might be most appropriate using another policy (the maximum size of an immediate integer before moving it out to dynamically-allocated remote memory). Again, likely not the tallest nail; I would put it on the "to be considered in the future" list.
Not part of the library's design goals. The expense of dynamic memory allocation is dwarfed by the time it takes to process numbers of any reasonable size.
Agreed.
For that you will need to
- add a few instructions to every function (not very much)
- check a flag (not very costly; xoring and comparing are the cheapest ops, and it will have good branch prediction)
- add another pointer (probably the most serious issue, but you already have a 3-pointer indirection -- maybe you can make it so you can keep that?)
The fact that it takes any at all means that it would slow the library down further. Just adding support for user-selected allocators slowed it down by 2.3 to 2.5 percent, something I am not very happy about. I'd need a *very* compelling reason to slow it any further.
Wait...adding support for custom allocators slowed things down over raw new/delete??? That doesn't sound right, though the 2~3% hit might be quality of compiler... You can dispatch (select) between immediate and remote representations via a single function call through a function pointer, so you'll pay a minimum of that, plus any additional logic you need to switch between representations (which can probably be immediate -> remote most of the time). I don't think small-object-optimization should be written off right away, but yeah, I can see where it doesn't fit into your immediate design goals. I'll have to review this incantation of the library when I get some free time. - Jeff

On 06/03/2010 05:28 PM, Jeffrey Lee Hellrung, Jr. wrote:
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
If by "an extra if statement" you mean an extra compile-time dispatch, then yes, but there wouldn't (or shouldn't, if done correctly) be any runtime overhead.
Yes, once I realized that what I thought he meant didn't fit with what you guys were saying, I did some research and realized what he really meant.
I can see Marius' point of unifying all the integer classes under a single templated class xint::integer_t, but I imagine it will require quite a bit of refactoring. Whether to do this or not, and actually doing it if it should be done, is not the tallest nail right now (or is it?).
I've redesigned the whole blasted thing three times already. What's once more? (Yeah, that was a little cynical. I've really got to get back to paying work, I've spent too much time on this already.)
- add another pointer (probably the most serious issue, but you already have a 3-pointer indirection -- maybe you can make it so you can keep that?)
The fact that it takes any at all means that it would slow the library down further. Just adding support for user-selected allocators slowed it down by 2.3 to 2.5 percent, something I am not very happy about. I'd need a *very* compelling reason to slow it any further.
Wait...adding support for custom allocators slowed things down over raw new/delete??? That doesn't sound right, though the 2~3% hit might be quality of compiler... [...]
It's the requirements of having a design that supports allocators, while still working with concrete base classes so that the basic functions don't have to be templates. I don't see any good reason to make duplicates of every used function just because, for example, you want to use three different fixed-bit sizes, which is what would happen if every function was a template, in the design I thought I needed before. With policy-based design, that probably wouldn't be a problem, though I need to study it further and do some experiments to see what's possible with it.
I'll have to review this incantation of the library when I get some free time.
Please do. I'd really like to know exactly what you guys want it to look like before I spend another month redesigning it yet again.

-- Chad Nelson

Chad Nelson wrote:
On 06/03/2010 05:28 PM, Jeffrey Lee Hellrung, Jr. wrote:
If by "an extra if statement" you mean an extra compile-time dispatch, then yes, but there wouldn't (or shouldn't, if done correctly) be any runtime overhead.
Yes, once I realized that what I thought he meant didn't fit with what you guys were saying, I did some research and realized what he really meant.
I can see Marius' point of unifying all the integer classes under a single templated class xint::integer_t, but I imagine it will require quite a bit of refactoring.
I've redesigned the whole blasted thing three times already. What's once more?
The good news is that you've learned a great deal in the process! This is the curse and benefit of developing for Boost.

-- Rob Stewart

On 06/04/2010 07:01 AM, Stewart, Robert wrote:
I can see Marius' point of unifying all the integer classes under a single templated class xint::integer_t, but I imagine it will require quite a bit of refactoring.
I've redesigned the whole blasted thing three times already. What's once more?
The good news is that you've learned a great deal in the process! This is the curse and benefit of developing for Boost.
Yes, I've learned several powerful techniques that I wasn't previously aware of. Though I could wish I'd learned them more privately... nothing like parading your ignorance in public, eh?

On the plus side, it seems that this redesign shouldn't take nearly as long as the last few. The interface system I came up with for the fourth release looks like it can be adapted to policy-based stuff with fairly few changes to most of the code.

On the minus side, it looks like every function will have to be a template, which is something that I tried very hard to avoid in previous implementations. It'll be fast, but if a program uses more than one combination of options, the size will increase greatly. The saving grace, I suppose, is that there shouldn't be much need to use too many option combinations.

-- Chad Nelson

On 4 June 2010 09:10, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
On the minus side, it looks like every function will have to be a template, which is something that I tried very hard to avoid in previous implementations. It'll be fast, but if a program uses more than one combination of options, the size will increase greatly. The saving grace, I suppose, is that there shouldn't be much need to use too many option combinations.
Would it not be possible to still keep the math parts in (internal) non-templated functions? It feels like the duplication should mostly be in the interface parts. For instance, a single addition function could take 2 ranges of pointers and perform all the addition logic independently of the template parameters. Then the various templates would arrange the storage properly, call the function, and translate any errors appropriately (not that there'd be any errors for addition, though I suppose there might be a carry).
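To make the idea concrete, a sketch of such a non-template "kernel" (signatures invented for illustration; XInt's internals are not actually written this way). The templated wrappers would arrange storage and error policy, and every instantiation would share this one compiled routine:

    #include <cstddef>
    #include <limits>

    typedef unsigned int digit_t;   // assumed narrower than unsigned long long

    // Adds the n-digit numbers at a and b into out (out may alias a) and
    // returns the final carry digit.
    digit_t add_digits(digit_t* out, const digit_t* a, const digit_t* b, std::size_t n)
    {
        unsigned long long carry = 0;
        for (std::size_t i = 0; i < n; ++i) {
            carry += static_cast<unsigned long long>(a[i]) + b[i];
            out[i] = static_cast<digit_t>(carry);
            carry >>= std::numeric_limits<digit_t>::digits;
        }
        return static_cast<digit_t>(carry);
    }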

On 06/04/2010 09:15 AM, Scott McMurray wrote:
On the minus side, it looks like every function will have to be a template, which is something that I tried very hard to avoid in previous implementations. It'll be fast, but if a program uses more than one combination of options, the size will increase greatly. The saving grace, I suppose, is that there shouldn't be much need to use too many option combinations.
Would it not be possible to still keep the math parts in (internal) non-templated functions? It feels like the duplication should mostly be in the interface parts.
It's possible for a few of the primitives and more basic functions, but anything that has to allocate a temporary integer_t needs the full-fat version, or it won't know how to allocate and deallocate it properly. Someone using the library for a security program would be put out to discover that temporary variables ignored his stated allocator and Secure preference, for instance.
For instance, a single addition function could take 2 ranges of pointers and perform all the addition logic independently of the template parameters. Then the various templates would arrange the storage properly, call the function, and translate any errors appropriately (not that there'd be any errors for addition, though I suppose there might be a carry).
That would work for addition, subtraction, and multiplication. It definitely wouldn't for division (which is more code than the aforementioned three combined). I'm not sure which of the others it would be possible for offhand, but I suspect it would be less than half of them.

-- Chad Nelson

On Friday, June 04, 2010 3:37 PM, Chad Nelson wrote:
That would work for addition, subtraction, and multiplication. It definitely wouldn't for division (which is more code than the aforementioned three combined). I'm not sure which of the others it would be possible for offhand, but I suspect it would be less than half of them.
This is a good percentage, isn't it? Vicente

On 06/04/2010 10:03 AM, vicente.botet wrote:
That would work for addition, subtraction, and multiplication. It definitely wouldn't for division (which is more code than the aforementioned three combined). I'm not sure which of the others it would be possible for offhand, but I suspect it would be less than half of them.
This is a good percentage, isn't it?
It's better than nothing.

-- Chad Nelson

On Friday 04 June 2010 16:37:13 Chad Nelson wrote:
It's possible for a few of the primitives and more basic functions, but anything that has to allocate a temporary integer_t needs the full-fat version, or it won't know how to allocate and deallocate it properly. Someone using the library for a security program would be put out to discover that temporary variables ignored his stated allocator and Secure preference, for instance.
Isn't it possible to have a pointer to a function that does allocation/deallocation and is instantiated separately? And keep the pointer in data_t?

On 06/04/2010 12:00 PM, Marius Stoica wrote:
It's possible for a few of the primitives and more basic functions, but anything that has to allocate a temporary integer_t needs the full-fat version, or it won't know how to allocate and deallocate it properly. Someone using the library for a security program would be put out to discover that temporary variables ignored his stated allocator and Secure preference, for instance.
Isn't it possible to have a pointer to a function that does allocation/deallocation and is instantiated separately? And keep the pointer in data_t?
I thought about it before, and couldn't see a way. But I was just working on that part, and your message prompted me to reevaluate it... I think I *do* see a way now. I'll see if I can explore it into existence. :-)

-- Chad Nelson
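One way to read the suggestion (hypothetical code, not XInt's): the templated layer instantiates the allocation routines once per allocator type and hands plain function pointers to the non-template internals, which can then create temporaries without being templates themselves.

    #include <cstddef>

    struct data_t {
        unsigned* (*allocate)(std::size_t digits);
        void      (*deallocate)(unsigned* p, std::size_t digits);
        // ...digit storage, length, sign, etc.
    };

    // Assumes a stateless, default-constructible allocator of unsigned.
    template <class Alloc>
    unsigned* allocate_with(std::size_t n) { Alloc a; return a.allocate(n); }

    template <class Alloc>
    void deallocate_with(unsigned* p, std::size_t n) { Alloc a; a.deallocate(p, n); }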

On Thursday 03 June 2010 19:53:04 Chad Nelson wrote:
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
I just realised that nothrow_integer wouldn't compile with -fno-exceptions. I was assuming that was its point. I was thinking maybe instead of using exceptions you could do something like this in the implementation:

if ((n = malloc(...)) == NULL) exception_fun(out_of_memory);

That could be a function that throws with exceptions and handles the error for nothrow.

On 06/04/2010 05:50 AM, Marius Stoica wrote:
I'm not sure that's really an improvement. The current design separates throwing and nonthrowing for a reason; take a look at some of the function implementations and you'll see why. Combining those would mean, at the very least, an extra if statement in every function.
I just realised that nothrow_integer wouldn't compile with -fno-exceptions. I was assuming that was its point.
Nope. See the documentation for the full justification.
I was thinking maybe instead of using exceptions you could do something like this in the implementation:
if ((n = malloc(...)) == NULL) exception_fun(out_of_memory); That could be a function that throws with exceptions and handles the error for nothrow.
That could handle the XInt-generated ones. But since the XInt code has to call out to other code too, that wouldn't solve the entire problem. I don't think C++ allocators, for instance, have any way other than exceptions to handle out-of-memory conditions.

-- Chad Nelson

On Friday 04 June 2010 15:59:24 Chad Nelson wrote:
That could handle the XInt-generated ones. But since the XInt code has to call out to other code too, that wouldn't solve the entire problem. I don't think C++ allocators, for instance, have any way other than exceptions to handle out-of-memory conditions.
Then I could write a nonthrowing allocator myself. And I can use the STL with no exceptions with no problems. Is there any other code that might use exceptions that you're using? Actually, looking at the dependencies, the only one that I think might throw is SmartPtr, which in its documentation says it has functions that never throw and never call throwing functions.

On 06/04/2010 10:40 AM, Marius Stoica wrote:
That could handle the XInt-generated ones. But since the XInt code has to call out to other code too, that wouldn't solve the entire problem. I don't think C++ allocators, for instance, have any way other than exceptions to handle out-of-memory conditions.
Then I could write a nonthrowing allocator myself. And I can use the STL with no exceptions with no problems. Is there any other code that might use exceptions that you're using?
I haven't examined it for that.
Actually looking at the dependencies the only one that i think migth throw is smartptr wich in the documentation says it has functions that never throw and never call to throwing functions.
Then I'll see what I can do with it.

-- Chad Nelson

On 06/04/2010 05:50 AM, Marius Stoica wrote:
I just realised that nothrow_integer wouldn't compile with -fno-exceptions. I was assuming that was its point. I was thinking maybe instead of using exceptions you could do something like this in the implementation:
if ((n = malloc(...)) == NULL) exception_fun(out_of_memory); That could be a function that throws with exceptions and handles the error for nothrow.
I've been wrestling with this, and I've hit a dead end: the no-throw version of exception_fun would have to somehow emulate an exception by somehow unwinding the stack back to the point where I'd normally catch the error. There are only two ways that I can see to do that, without exceptions. The first is to test the return value of *every* call, and if it says there was an error, return the error value from it -- very annoying to code, and I see no way to implement it that wouldn't increase the code size, decrease the speed, or both. The second is to use setjmp/longjmp. That doesn't sound like a good idea, since it apparently doesn't do cleanup stuff like calling destructors. The only solution I see is the way I'm handling the nothrow_integer type already: wrap every top-level call in a try/catch construct. Unless someone can suggest some way to handle stack-unwinding without exceptions, doing it manually, or setjmp/longjmp, I don't see any viable way to change it. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwKcV0ACgkQp9x9jeZ9/wQjJgCdGHYtE8D2Y0Xbq0ydBoVs+X6O RJ4An1D5ISD54KHRKFd1KHgpwstPIAa/ =c5rC -----END PGP SIGNATURE-----

On Saturday 05 June 2010 18:46:41 Chad Nelson wrote:
On 06/04/2010 05:50 AM, Marius Stoica wrote:
I just realised that nothrow_integer wouldn't compile with -fno-exceptions. I was assuming that was its point. I was thinking maybe instead of using exceptions you could do something like this in the implementation:
if ((n = malloc(...)) == NULL) exception_fun(out_of_memory); That could be a function that throws with exceptions and handles the error for nothrow.
I've been wrestling with this, and I've hit a dead end: the no-throw version of exception_fun would have to somehow emulate an exception by somehow unwinding the stack back to the point where I'd normally catch the error.
There are only two ways that I can see to do that, without exceptions. The first is to test the return value of *every* call, and if it says there was an error, return the error value from it -- very annoying to code, and I see no way to implement it that wouldn't increase the code size, decrease the speed, or both.
The second is to use setjmp/longjmp. That doesn't sound like a good idea, since it apparently doesn't do cleanup stuff like calling destructors.
The only solution I see is the way I'm handling the nothrow_integer type already: wrap every top-level call in a try/catch construct. Unless someone can suggest some way to handle stack-unwinding without exceptions, doing it manually, or setjmp/longjmp, I don't see any viable way to change it.
What about having a pointer in the contained classes back to the upper one? I think you can make it be there only for the no-exception classes, by adding it to a no-exception policy class and inheriting it.

On 06/05/2010 03:49 PM, Marius Stoica wrote:
[...] The only solution I see is the way I'm handling the nothrow_integer type already: wrap every top-level call in a try/catch construct. Unless someone can suggest some way to handle stack-unwinding without exceptions, doing it manually, or setjmp/longjmp, I don't see any viable way to change it.
What about having a pointer in the contained classes back to the upper one? I think you can make it be there only for the no-exception classes, by adding it to a no-exception policy class and inheriting it.
I'm not sure what that would accomplish. The problem isn't that the classes can't communicate, it's that there's no way (short of either exceptions or setjmp/longjmp) to short-circuit the flow of execution -- if we run into a problem in function H, then we have to send an error code back to function G, which has to check for it and send it back to function F, and so on, all the way to the point where the code knows what to do about it. That's a lot of redundant error-code checks.

-- Chad Nelson

On 6/5/2010 7:08 PM, Chad Nelson wrote:
On 06/05/2010 03:49 PM, Marius Stoica wrote:
[...] The only solution I see is the way I'm handling the nothrow_integer type already: wrap every top-level call in a try/catch construct. Unless someone can suggest some way to handle stack-unwinding without exceptions, doing it manually, or setjmp/longjmp, I don't see any viable way to change it.
What about having a pointer in the contained classes back to the upper one ? I think you can make it be there only for the no-exception classes by adding it to a no-exception policy class and inheriting it.
I'm not sure what that would accomplish. The problem isn't that the classes can't communicate, it's that there's no way (short of either exceptions or setjmp/longjmp) to short-circuit the flow of execution -- if we run into a problem in function H, then we have to send an error code back to function G, which has to check for it and send it back to function F, and so on, all the way to the point where the code knows what to do about it. That's a lot of redundant error-code checks.
Can you just call a user-defined or implementation-defined error handling function that won't return, e.g., std::abort? Maybe just call throw_exception from Boost.Exception, which must be user-defined if exceptions are disabled. Or do you actually, ultimately, want an error code sent back to the user? I'm afraid I haven't been following the context of this particular sub-thread... - Jeff

On 06/06/2010 12:05 AM, Jeffrey Lee Hellrung, Jr. wrote:
[...] The problem isn't that the classes can't communicate, it's that there's no way (short of either exceptions or setjmp/longjmp) to short-circuit the flow of execution -- if we run into a problem in function H, then we have to send an error code back to function G, which has to check for it and send it back to function F, and so on, all the way to the point where the code knows what to do about it. That's a lot of redundant error-code checks.
Can you just call a user-defined or implementation-defined error handling function that won't return, e.g., std::abort? [...]
Well, that would be one way to deal with the problem. ;-)
Or do you actually, ultimately, want an error code sent back to the user? I'm afraid I haven't been following the context of this particular sub-thread...
That was the idea -- to implement nothrow_integer without using any exceptions at all.

-- Chad Nelson

On 6 Jun 2010, at 05:19, Chad Nelson wrote:
That was the idea -- to implement nothrow_integer without using any exceptions at all.
Many libraries, including libstdc++ (the standard library in g++), implement support for -fno-exceptions by just calling abort() instead of throwing. I would advise just doing that, as from my experience anything else turns into an unreadable mess very quickly. It is also reasonable to make the called function user-definable and allow longjmping out, on the understanding that memory will leak, so users should aim to clean up and exit, not get exceptions repeatedly. Chris

On 06/06/2010 04:14 AM, Christopher Jefferson wrote:
That was the idea -- to implement nothrow_integer without using any exceptions at all.
Many libraries, including libstdc++ (the standard library in g++), implement support for -fno-exceptions by just calling abort() instead of throwing. I would advise just doing that, as from my experience anything else turns into an unreadable mess very quickly. It is also reasonable to make the called function user-definable and allow longjmping out, on the understanding that memory will leak, so users should aim to clean up and exit, not get exceptions repeatedly. Chris
From my testing, that looks like the only reasonable solution. It looks like -fno-exceptions disables throw, try, and catch -- you can't even compile a program containing any of those statements, and putting them in a template class or function that's never instantiated doesn't prevent that. Can I assume (or insist) that BOOST_NO_EXCEPTIONS will be defined if the user wants that behavior?

-- Chad Nelson
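A sketch of what that could look like (the function name is invented; BOOST_NO_EXCEPTIONS itself is the real Boost.Config macro): with exceptions disabled the library never spells out throw/try/catch, and errors fall through to abort(), as libstdc++ does under -fno-exceptions.

    #include <boost/config.hpp>
    #include <cstdlib>

    template <class Exception>
    void raise_error(const Exception& e)
    {
    #ifdef BOOST_NO_EXCEPTIONS
        (void)e;            // nothing sensible left to do without exceptions...
        std::abort();       // ...so follow libstdc++'s lead and abort
    #else
        throw e;
    #endif
    }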

Hi Chad, On Tuesday, 1. June 2010 15:44:45 you wrote:
The highlights of this release are:
* It's now a header-only library. It can still be compiled if you wish (I use the compiled version for most of my testing), but that's no longer required.
Please find a diff removing the dependency on the obsolete lib from test/Jamfile. Test run fine then ;-)) It is cool to have xint header-only.
I've included my testing projects as well, for CodeLite under Linux and Visual Studio under Windows, for convenience.
I prefer Boost.Build ;-))
I plan to add a few more examples, but what's there should be sufficient to get people started.
That will be nice. Yours, Jürgen -- * Dipl.-Math. Jürgen Hunold ! Ingenieurgesellschaft für * voice: ++49 511 262926 57 ! Verkehrs- und Eisenbahnwesen mbH * fax : ++49 511 262926 99 ! Lister Straße 15 * juergen.hunold@ivembh.de ! www.ivembh.de * * Geschäftsführer: ! Sitz des Unternehmens: Hannover * Prof. Dr.-Ing. Thomas Siefer ! Amtsgericht Hannover, HRB 56965 * PD Dr.-Ing. Alfons Radtke !

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/03/2010 02:58 AM, Jürgen Hunold wrote:
* It's now a header-only library. It can still be compiled if you wish (I use the compiled version for most of my testing), but that's no longer required.
Please find a diff removing the dependency on the obsolete lib from test/Jamfile. Test run fine then ;-))
Thanks. I forgot that was there.
It is cool to have xint header-only.
It has certainly simplified the setup.
I've included my testing projects as well, for CodeLite under Linux and Visual Studio under Windows, for convenience.
I prefer Boost.Build ;-))
If Boost.Build had an integrated editor and debugger, I might agree. ;-) The integrated debugger is the main reason that I use IDEs, rather than just a favorite text editor and a build system. It's much easier to go with the flow, and use whatever build system is built into the IDE, than try to force it to work my way. I've tried both. (Also, Boost.Build has a HUGE learning curve. I've used it for a couple years now, on roughly a dozen small projects... most tools get increasingly easier to use, but that one's still a major pain. A powerful and cross-platform pain, but a pain nonetheless.) - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwHzGIACgkQp9x9jeZ9/wQSdQCbB4m6O8tkjGQKafren6XZ1FDA 2WkAnRLCTeVnFGfFPrQA2aMIHUxOY+it =fuGX -----END PGP SIGNATURE-----

on 01.06.2010 at 17:44 Chad Nelson wrote : congrats on the new release! great job indeed! i ran through the docs and partly through the sources and have something to say. is_odd and is_even is_a_good_decision_indeed. in general i like what the interface looks like very much. so far i have a question about fixed_integer: personally my expectation was that fixed integers would not allocate memory on the heap but do everything entirely on the stack. but now it looks like i need to do the following to achieve a fixed-size int: template<size_t size> class my_stack_allocator {/*...*/}; typedef xint::fixed_integer<512, my_stack_allocator<512> > int512; in my opinion this is redundant, since the size of a given fixed integer is known in advance (at compile time) and it is intended NOT to allocate on the heap (as i understand it) and is not even supposed to use an allocator -- Pavel P.S. if you notice a grammar mistake or weird phrasing in my message please point it out
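For what it's worth, a minimal sketch of the fully stack-based layout Pavel seems to have in mind; this is illustration only, and not how XInt's fixed_integer is implemented:

    #include <cstddef>
    #include <cstdint>

    // Sketch only: a fixed-width integer whose magnitude lives inside the
    // object itself, so no allocator is involved at all.
    template <std::size_t Bits>
    struct stack_int {
        static const std::size_t digit_bits = 32;
        static const std::size_t digits = (Bits + digit_bits - 1) / digit_bits;

        bool          negative;
        std::uint32_t magnitude[digits];   // always copied by value
    };

    typedef stack_int<512> int512;         // no my_stack_allocator needed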

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/2010 03:23 PM, DE wrote:
congrats with the new release! great job indeed!
Thanks!
i ran through the docs and partly through the sources and have something to say
is_odd and is_even is_a_good_decision_indeed in general i like what the interface looks like very much
Then you'll be happy to know that I've pretty much managed to keep it for the fifth release. :-)
so far i have a question about fixed_integer: personally my expectation was that fixed integers would not allocate memory on the heap but do everything entirely on the stack
Unfortunately I don't see a way to safely and automatically provide entirely stack-based integers yet.
but now it looks like i need to do the following to achieve a fixed-size int:
template<size_t size> class my_stack_allocator {/*...*/};
typedef xint::fixed_integer<512, my_stack_allocator<512> > int512;
in my opinion this is redundant, since the size of a given fixed integer is known in advance (at compile time) and it is intended NOT to allocate on the heap (as i understand it) and is not even supposed to use an allocator
A stack-based integer's magnitude would always have to be stored directly adjacent to the integer itself on the stack, or risk having one deallocated before the other. That means no sharing of memory (even temporarily) for integers with identical magnitudes, and no moving magnitudes around by pointer-swapping -- every time you copy an integer for any reason, you'd have to copy each and every byte of the magnitude along with it. Swapping two 512-bit integers, just as an example, would mean three copies of 64+ bytes each, rather than a simple pointer swap. I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator. It would probably also require some modifications to the lower levels of the code, to disable all the pointer-swapping stuff that I've built in for speed. That makes it difficult to experiment with it at present. Once the library is finalized, we can play with it and see if we can find a way around that problem. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwQRJcACgkQp9x9jeZ9/wQG0ACg2cseDTQsxKUwxBI5i32eFTLi IBIAn170wepEThdAOyUTn2tcncAEMdmz =sq3F -----END PGP SIGNATURE-----
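To make the trade-off concrete, a rough sketch of the two swap strategies described above (illustration only, not XInt code):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Heap-backed magnitude: swapping two integers exchanges two pointers.
    struct heap_int {
        std::uint32_t* magnitude;   // allocated elsewhere
        std::size_t    length;
    };
    inline void swap(heap_int& a, heap_int& b) {
        std::swap(a.magnitude, b.magnitude);   // O(1), no digits move
        std::swap(a.length, b.length);
    }

    // Stack-backed magnitude: swapping must copy every digit -- the three
    // 64-byte copies mentioned above for a 512-bit value.
    struct stack_int512 {
        std::uint32_t magnitude[16];           // 512 bits inline
    };
    inline void swap(stack_int512& a, stack_int512& b) {
        stack_int512 tmp = a;                  // copy 1
        a = b;                                 // copy 2
        b = tmp;                               // copy 3
    }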

On Thu, Jun 10, 2010 at 2:49 AM, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
On 06/09/2010 03:23 PM, DE wrote:
[...]
Unfortunately I don't see a way to safely and automatically provide entirely stack-based integers yet.
but now it looks like i need to do the following to achieve a fixed-size int:
template<size_t size> class my_stack_allocator {/*...*/};
typedef xint::fixed_integer<512, my_stack_allocator<512> > int512;
in my opinion this is redundant, since the size of a given fixed integer is known in advance (at compile time) and it is intended NOT to allocate on the heap (as i understand it) and is not even supposed to use an allocator
A stack-based integer's magnitude would always have to be stored directly adjacent to the integer itself on the stack, or risk having one deallocated before the other. That means no sharing of memory (even temporarily) for integers with identical magnitudes, and no moving magnitudes around by pointer-swapping -- every time you copy an integer for any reason, you'd have to copy each and every byte of the magnitude along with it. Swapping two 512-bit integers, just as an example, would mean three copies of 64+ bytes each, rather than a simple pointer swap.
I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
In my experience, stack allocated memory easily beat heap allocation even for large buffer sizes. In the past I have used a fixed string type with a ridiculously large size for some specialized application and it outperformed the heap based string type it replaced (I didn't try to manually optimize copies away). The reason is that, first of all, modern CPUs are pretty good at copying stuff around, much better than at chasing pointers; also, stack allocated objects usually give more freedom to the compiler optimizer (more precise aliasing information and no globally visible side effects). Finally, 512 bits is not that much anyway; on a modern x86 processor that's 4 registers or a single cacheline. Did you consider separating the big int data structure itself from the big int algorithms, a la STL? This way anybody can roll their own big int class according to their own need and you need to provide only a default one, optimized for the common case. I haven't read the code or documentation nor followed the discussion in depth, so this might actually already be the case. Just my 0.02€, -- gpd
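As a concrete illustration of the separation Giovanni is suggesting (a sketch only, not XInt's actual structure): a limb-addition algorithm written against iterators, with growth delegated entirely to the output iterator.

    #include <cstdint>

    // Adds two little-endian limb sequences of equal length and writes the
    // result (plus a final carry limb, if any) through an output iterator.
    // Works with vector<uint32_t> and back_inserter, or any other sequence.
    template <class InIt1, class InIt2, class OutIt>
    OutIt add_limbs(InIt1 first1, InIt1 last1, InIt2 first2, OutIt out) {
        std::uint64_t carry = 0;
        for (; first1 != last1; ++first1, ++first2, ++out) {
            std::uint64_t sum = std::uint64_t(*first1) + *first2 + carry;
            *out  = static_cast<std::uint32_t>(sum);
            carry = sum >> 32;
        }
        if (carry) { *out = static_cast<std::uint32_t>(carry); ++out; }
        return out;
    }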

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 08:15 AM, Giovanni Piero Deretta wrote:
[...] I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
[...] Did you consider separating the big int data structure itself from the big int algorithms, a la STL? This way anybody can roll their own big int class according to their own need and you need to provide only a default one, optimized for the common case. [...]
I've tried making the algorithms independent by having them take an array and other information, but they need the freedom to reallocate as necessary (or I'd have to have two functions for each operation). That means passing them the allocator as well, at the very least. You end up with four parameters for every integer you need to pass to a function (bool reference for positive/negative, size_t reference for length, size_t for max-length, and the magnitude array), plus making it a template because allocators aren't based on a standard class. Or wasting time packing and unpacking those same options to and from an intermediate structure of some kind. It was a lot faster and easier to pass it as the already-existing raw integer class, which already has that information plus allocation and other capabilities. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwQ8RwACgkQp9x9jeZ9/wQ27gCg+PKJPzKyY7AZqPpYtUVfUl7G 9IwAoPXB7eU5D3JQtxdA7JNNLDe74gS6 =7k2v -----END PGP SIGNATURE-----

On 6/10/2010 7:05 AM, Chad Nelson wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 06/10/2010 08:15 AM, Giovanni Piero Deretta wrote:
[...] I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
[...] Did you consider separating the big int data structure itself from the big int algorithms, a la STL? This way anybody can roll their own big int class according to their own need and you need to provide only a default one, optimized for the common case. [...]
I've tried making the algorithms independent by having them take an array and other information, but they need the freedom to reallocate as necessary (or I'd have to have two functions for each operation). That means passing them the allocator as well, at the very least. You end up
Why do you need to pass the allocator? You cannot put the responsibility on the client code to provide a pointer to a free chunk of memory (whether heap/free store-allocated or stack-allocated) to write to? The alternative, you say, is "I'd have to have two functions for each operation". Can you elaborate?
with four parameters for every integer you need to pass to a function (bool reference for positive/negative, size_t reference for length, size_t for max-length, and the magnitude array), plus making it a template because allocators aren't based on a standard class. Or wasting time packing and unpacking those same options to and from an intermediate structure of some kind. It was a lot faster and easier to pass it as the already-existing raw integer class, which already has that information plus allocation and other capabilities.
- Jeff

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 10:15 AM, Jeffrey Lee Hellrung, Jr. wrote:
I've tried making the algorithms independent by having them take an array and other information, but they need the freedom to reallocate as necessary (or I'd have to have two functions for each operation). That means passing them the allocator as well, at the very least. You end up
Why do you need to pass the allocator? You cannot put the responsibility on the client code to provide a pointer to a free chunk of memory (whether heap/free store-allocated or stack-allocated) to write to?
I could, but I don't see the point. See below.
The alternative, you say, is "I'd have to have two functions for each operation". Can you elaborate?
One to calculate how much memory would be needed, and (probably) allocate it -- the calculation is different for every operation. The other to do the actual calculation. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwQ+QMACgkQp9x9jeZ9/wTGFwCgp2u1VTkJSQk/IGGpbYRFcxKj yvMAoMo8SW8B2dD7AdfK3TwG0E8NHNtC =kdl/ -----END PGP SIGNATURE-----

On Thu, Jun 10, 2010 at 3:39 PM, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
On 06/10/2010 10:15 AM, Jeffrey Lee Hellrung, Jr. wrote:
I've tried making the algorithms independent by having them take an array and other information, but they need the freedom to reallocate as necessary (or I'd have to have two functions for each operation). That means passing them the allocator as well, at the very least. You end up
Why do you need to pass the allocator? You cannot put the responsibility on the client code to provide a pointer to a free chunk of memory (whether heap/free store-allocated or stack-allocated) to write to?
I could, but I don't see the point. See below.
The alternative, you say, is "I'd have to have two functions for each operation". Can you elaborate?
One to calculate how much memory would be needed, and (probably) allocate it -- the calculation is different for every operation. The other to do the actual calculation.
I do not buy this: as an example, std::copy does not need to know the size of the output sequence nor need to know how to grow it (one can for example use back_inserter for automatic growth). Couldn't xint use a similar strategy? For a heap-allocated big int the output iterator can take care of any reallocation, while for a fixed-size big int, in case of overflow, one could choose between UB, assert or throw. In principle there is no reason for algorithms to deal with memory allocation details. -- gpd
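The analogy, spelled out (nothing here is XInt-specific): std::copy never allocates, yet the destination can still grow, because back_inserter hides the growth behind an output iterator.

    #include <algorithm>
    #include <iterator>
    #include <vector>

    int main() {
        std::vector<int> src(100, 42);
        std::vector<int> dst;                       // empty; grows on demand
        std::copy(src.begin(), src.end(), std::back_inserter(dst));
        return dst.size() == src.size() ? 0 : 1;    // 0 on success
    }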

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 11:58 AM, Giovanni Piero Deretta wrote:
The alternative, you say, is "I'd have to have two functions for each operation". Can you elaborate?
One to calculate how much memory would be needed, and (probably) allocate it -- the calculation is different for every operation. The other to do the actual calculation.
I do not buy this:
Too bad, it's on sale and you're missing a really great deal. ;-)
as an example, std::copy does not need to know the size of the output sequence nor need to know how to grow it (one can for example use back_inserter for automatic growth). Couldn't xint use a similar strategy?
std::copy doesn't need to know how to grow the output sequence, but it *does* need to be passed the back_inserter, which needs to know how to grow it. What's the difference between having a calculation function that takes the components of raw integers and a class to reallocate them, and simply sending in the raw integer objects that already have all the components and know how to reallocate themselves?
For a heap-allocated big int the output iterator can take care of any reallocation, while for a fixed-size big int, in case of overflow, one could choose between UB, assert or throw. In principle there is no reason for algorithms to deal with memory allocation details.
Any algorithm that has to modify the size of a buffer has to deal with memory allocation, one way or another, at some level. The more indirect you make it, the harder it is to get speed out of it. I'm trying to make the fastest pure-C++ large-integer implementation I can. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRGMEACgkQp9x9jeZ9/wQHVwCfY8jCptphscioMWO2ooAcGPu0 eNkAn0EheZAZ+qBoFs8vkutlbllDxfza =SzGM -----END PGP SIGNATURE-----

AMDG Chad Nelson wrote:
as an example, std::copy does not need to know the size of the output sequence nor need to know how to grow it (one can for example use back_inserter for automatic growth). Couldn't xint use a similar strategy?
std::copy doesn't need to know how to grow the output sequence, but it *does* need to be passed the back_inserter, which needs to know how to grow it.
You don't have to use back_inserter with std::copy. Copying into a pre-existing buffer also works. In Christ, Steven Watanabe

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 01:00 PM, Steven Watanabe wrote:
as an example, std::copy does not need to know the size of the output sequence nor need to know how to grow it (one can for example use back_inserter for automatic growth). Couldn't xint use a similar strategy?
std::copy doesn't need to know how to grow the output sequence, but it *does* need to be passed the back_inserter, which needs to know how to grow it.
You don't have to use back_inserter with std::copy. Copying into a pre-existing buffer also works.
We were discussing why he thought the XInt algorithms didn't need to know how to grow their buffers in order to change the sizes of them. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRHUYACgkQp9x9jeZ9/wQsqACfXnvKE5klRKqVBtYm9BJY6HlG eRMAoN4H8kUGOkAuU9Ew8yFBlax27R+n =HIOg -----END PGP SIGNATURE-----

On Thu, Jun 10, 2010 at 5:54 PM, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 06/10/2010 11:58 AM, Giovanni Piero Deretta wrote:
The alternative, you say, is "I'd have to have two functions for each operation". Can you elaborate?
One to calculate how much memory would be needed, and (probably) allocate it -- the calculation is different for every operation. The other to do the actual calculation.
I do not buy this:
Too bad, it's on sale and you're missing a really great deal. ;-)
:) I would actually find some use for a big int class and I hope that yours find its way into boost as it is the most promising so far.
as an example, std::copy does not need to know the size of the output sequence nor need to know how to grow it (one can for example use back_inserter for automatic growth). Couldn't xint use a similar strategy?
std::copy doesn't need to know how to grow the output sequence, but it *does* need to be passed the back_inserter, which needs to know how to grow it.
yes, so?
What's the difference between having a calculation function that takes the components of raw integers and a class to reallocate them, and simply sending in the raw integer objects that already have all the components and know how to reallocate themselves?
Those are not the options. The choice is between having a function that takes one or more generic input ranges and an output range (as most std algos do) and a function that only works on some specific integer type. The advantage would be that the big int algos would work on any sequence of bytes (or whatever smallest unit of computation you use... probably ints?). If I wanted to I could simply use a vector<char> (or deque or whatever) and back_inserter without having to write anything more, and wouldn't need to use your raw integer object. As you have experienced yourself on this list, the bigint class itself has proved to be the most contentious issue, as everybody has his own requirements (move vs reference count, fixed buffer vs reallocation, etc).
For a heap-allocated big int the output iterator can take care of any reallocation, while for a fixed-size big int, in case of overflow, one could choose between UB, assert or throw. In principle there is no reason for algorithms to deal with memory allocation details.
Any algorithm that has to modify the size of a buffer has to deal with memory allocation, one way or another, at some level.
The fact that std::back_inserter deals with memory allocation in no way implies that std::copy (or any other algorithm) needs to deal with memory allocation. BTW, the fact that it doesn't need to does not mean that it can't. It is perfectly acceptable (and in fact desirable) for std::copy to be specialized to treat back_inserter iterators to known containers optimally (for example by explicitly calling reserve).
The more indirect you make it, the harder it is to get speed out of it. I'm trying to make the fastest pure-C++ large-integer implementation I can.
generic programming in C++ is about having the most flexible abstraction with the smallest abstraction penalty. What do you think will prevent you from extracting the highest performance? There are probably many people on the boost mailing list who could give advice. -- gpd
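A rough sketch of the reserve optimization mentioned above; the helper name append_copy is made up for illustration, and a real specialization would live inside the algorithm itself rather than beside it:

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // Knows the destination container, so it can reserve once up front and
    // avoid repeated reallocation during the copy.  Requires forward
    // iterators so std::distance does not consume the input.
    template <class FwdIt, class T, class A>
    void append_copy(FwdIt first, FwdIt last, std::vector<T, A>& dest) {
        dest.reserve(dest.size() + std::distance(first, last));
        std::copy(first, last, std::back_inserter(dest));
    }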

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 01:34 PM, Giovanni Piero Deretta wrote:
I would actually find some use for a big int class and I hope that yours find its way into boost as it is the most promising so far.
I hope so too. :-)
What's the difference between having a calculation function that takes the components of raw integers and a class to reallocate them, and simply sending in the raw integer objects that already have all the components and know how to reallocate themselves?
Those are not the options. The choice is between having a function that takes one or more generic input ranges and an output range (as most std algos) and a function that only works on some specific integer type.
The advantage would be that the big int algos would work on any sequence of bytes (or whatever smallest unit of computation you use... probably ints?).
digit_t, defined as the unsigned integral type that's half the size of the largest integral type the platform supports.
If I wanted to I could simply use a vector<char> (or deque or whatever) and back_inserter without having to write anything more. And wouldn't need to use your raw integer object.
As an alternative, you could just write your own raw integer class, or modify mine. So long as it provides the same interface as the one that I wrote, the algorithms I've provided will work with it -- they're all templates.
As you have experienced yourself on this list, the bigint class itself has proved to be the most contentious issue, as everybody has his own requirements (move vs reference count, fixed buffer vs reallocation, etc).
Yeah, I've noticed. :-(
Any algorithm that has to modify the size of a buffer has to deal with memory allocation, one way or another, at some level.
The fact that std::back_inserter deals with memory allocation in no way implies that std::copy (or any other algorithm) needs to deal with memory allocation. BTW, the fact that it doesn't need to does not mean that it can't. It is perfectly acceptable (and in fact desirable) for std::copy to be specialized to treat back_inserter iterators to known containers optimally (for example by explicitly calling reserve).
Such a specialization would be very close to what I've got -- the algorithms tell the raw integer object to reallocate itself to a certain size, and that's the extent of their dealings with memory allocation. The object is free to ignore the request, or use a different size; the algorithms don't assume anything about it, they just work as best they can with what the object gives them.
The more indirect you make it, the harder it is to get speed out of it. I'm trying to make the fastest pure-C++ large-integer implementation I can.
generic programming in C++ is about having the most flexible abstraction with the smallest abstraction penalty. What do you think will prevent you from extracting the highest performance?
Experience. :-) There's always a cost to adding indirection, be it in run-time performance, longer compile times, higher memory usage, or more work for the developer or user of the library. I'm trying to get the maximum performance for the amount of work it requires to write and maintain the library, while at the same time making it as easy for the end user to use as possible. Minimizing compile times and memory usage are secondary, but still concerns.
There are probably many people on the boost mailing list that could give advice.
I've got advice coming out of both nostrils and an ear already. Some good and some not so good, and it's difficult figuring out who knows what they're talking about sometimes, and which parts of the good advice are even applicable to this library. That said, I've been trying to take as much of the good advice as I can. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRMaIACgkQp9x9jeZ9/wSSSgCfUoP4CMntWmxDgvWKVY/HEA8X npoAnA1sq0DfFe1FteZDmF9VwYhlqWpz =PkKB -----END PGP SIGNATURE-----

At Thu, 10 Jun 2010 13:15:00 +0100, Giovanni Piero Deretta wrote:
I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
In my experience, stack allocated memory easily beat heap allocation even for large buffer sizes. In the past I have used a fixed string type with a ridiculously large size for some specialized application and it outperformed the heap based string type it replaced (I didn't try to manually optimize copies away).
I assume that we're going to be using the infamous “small string optimization” at some point, which amounts to something like boost::variant< unique_ptr<char[]>, boost::array<char, 6> > right? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 05:06 PM, David Abrahams wrote:
I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
In my experience, stack allocated memory easily beat heap allocation even for large buffer sizes. In the past I have used a fixed string type with a ridiculously large size for some specialized application and it outperformed the heap based string type it replaced (I didn't try to manually optimize copies away).
I assume that we're going to be using the infamous “small string optimization” at some point, which amounts to something like
boost::variant< unique_ptr<char[]>, boost::array<char, 6> >
right?
I had something like that in earlier versions of the library (the "quickdigits"). I've eliminated them from more recent versions; I figure I can re-add them later if they prove to be useful. Until then, with all the rewriting I've been doing on it, it was just one more place things could go wrong. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRV7AACgkQp9x9jeZ9/wT8GwCgr7CjwiycgCV8ZmL/mDZmA1i8 KF4Anj+2HYo8lk89lXjKX3CbPLY4hwuT =4N4Q -----END PGP SIGNATURE-----
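For readers who haven't seen the trick, a stripped-down sketch of the small-object idea being discussed (copying and many details omitted; this is not the library's "quickdigits" code):

    #include <cstddef>
    #include <cstdint>

    // Small magnitudes live in an inline buffer, larger ones on the heap.
    // Copy construction/assignment are deliberately omitted for brevity.
    class digit_storage {
        static const std::size_t inline_digits = 2;   // holds a 64-bit value
        union {
            std::uint32_t small[inline_digits];
            std::uint32_t* big;                       // heap array when larger
        };
        std::size_t length;
        bool on_heap;

    public:
        digit_storage() : length(0), on_heap(false) {}
        ~digit_storage() { if (on_heap) delete[] big; }

        std::uint32_t* digits() { return on_heap ? big : small; }

        void grow(std::size_t n) {                    // grow-only sketch
            if (n <= length || n <= inline_digits) {  // inline digits are not
                length = n; return;                   // cleared in this sketch
            }
            std::uint32_t* p = new std::uint32_t[n](); // zero-initialized
            for (std::size_t i = 0; i < length; ++i) p[i] = digits()[i];
            if (on_heap) delete[] big;
            big = p; on_heap = true; length = n;
        }
    };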

on 10.06.2010 at 5:49 Chad Nelson wrote :
A stack-based integer's magnitude would always have to be stored directly adjacent to the integer itself on the stack, or risk having one deallocated before the other. That means no sharing of memory (even temporarily) for integers with identical magnitudes, and no moving magnitudes around by pointer-swapping -- every time you copy an integer for any reason, you'd have to copy each and every byte of the magnitude along with it. Swapping two 512-bit integers, just as an example, would mean three copies of 64+ bytes each, rather than a simple pointer swap.
I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator.
It would probably also require some modifications to the lower levels of the code, to disable all the pointer-swapping stuff that I've built in for speed. That makes it difficult to experiment with it at present. Once the library is finalized, we can play with it and see if we can find a way around that problem. yes, you are right, i can see the rationale behind this now. i guess the cost of allocation compared to the cost of operations is negligible, so in fact a stack version would not gain much. and it seems to me that i now recall a discussion about it
anyway, i think that fixed integer is a valuable part of the lib. one more question arises: many algorithms that use built-in integer modular arithmetic can be generalized to larger ints (PRNGs, hashes etc.), but as i understood from the docs, fixed_integer does not exercise modular arithmetic like the built-in types do. at the same time there already is a modular arithmetic implementation in the lib. it confuses me: what was your intention? -- Pavel P.S. if you notice a grammar mistake or weird phrasing in my message please point it out

DE wrote:
It would probably also require some modifications to the lower levels of the code, to disable all the pointer-swapping stuff that I've built in for speed. That makes it difficult to experiment with it at present. Once the library is finalized, we can play with it and see if we can find a way around that problem. yes, you are right, i can see the rationale behind this now. i guess the cost of allocation compared to the cost of operations is negligible, so in fact a stack version would not gain much. and it seems to me that i now recall a discussion about it
My experience with GMP has been that reusing existing GMP objects instead of declaring new ones was a significant performance enhancement. I don't think it is true that the cost of operations is even greater than the cost of allocation, much less significantly so, in general. It is reasonable, particularly with fixed integer, to expect that the common case will be operations on a number of bits only slightly larger than the built-in integer types support. If I need 65 bits (and I do) I will code it myself rather than pay for an allocation (and I did). It isn't clear to me what fixed integer provides if it allocates on the heap. Why would anyone use it? Regards, Luke
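Roughly the kind of hand-rolled type being described, as a sketch: 65 bits stored as a 64-bit low limb plus a one-bit high limb, with no allocation anywhere. Illustration only, not code from this thread.

    #include <cstdint>

    struct uint65 {
        std::uint64_t lo;
        std::uint64_t hi;   // only the lowest bit is meaningful
    };

    inline uint65 add(uint65 a, uint65 b) {
        uint65 r;
        r.lo = a.lo + b.lo;
        std::uint64_t carry = (r.lo < a.lo) ? 1 : 0;  // unsigned overflow test
        r.hi = (a.hi + b.hi + carry) & 1;             // wrap modulo 2^65
        return r;
    }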

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 02:20 PM, Simonson, Lucanus J wrote:
My experience with GMP has been that reusing existing GMP objects instead of declaring new ones was a significant performance enhancement. I don't think it is true that the cost of operations is even greater than the cost of allocation, much less significantly so, in general.
That depends on the size you're using and what you're doing with it. For a 65-bit integer like you mentioned, doing nothing but addition and subtraction, you're right. For a 4096-bit one using a lot of exponentiation (think RSA encryption), the cost of the operations will *definitely* outweigh the cost of allocation, and by such a large margin that the allocation will be completely unnoticeable. Most examples will fall somewhere between those two extremes.
It is reasonable, particularly with fixed integer, to expect that the common case will be operations on a number of bits only slightly larger than the built in integer types support. If I need 65 bits (and I do) I will code it myself rather than pay for an allocation (and I did).
That's certainly a viable option, if you've got the luxury of the time and interest to do it. Your average developer often doesn't, or values his own time more than he needs efficiency in that part of his program. The library is designed for large integers -- something that's usually noticeably larger than 65 bits -- so it probably won't be optimal for really small integers like that. It'll get the job done though, so I don't consider that a problem.
It isn't clear to me what fixed integer provides if it allocates on the heap. Why would anyone use it?
Because it simplifies his life. I can't predict what problems someone might put it to, but I can imagine a few, such as emulating an old mainframe or minicomputer that had an odd number of bits (I've heard of computers with everything from seven to 36 bits, for instance). It was a requested feature, and it required very little additional effort, so I added it. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRPKgACgkQp9x9jeZ9/wRylQCeP7abF9+Xf1btTmSk2rchcVPi AiwAoMcZUxvPYmt+4StdK1WLwWh2SDaE =Qsoc -----END PGP SIGNATURE-----

Chad Nelson wrote:
My experience with GMP has been that reusing existing GMP objects instead of declaring new ones was a significant performance enhancement. I don't think it is true that the cost of operations is even greater than the cost of allocation, much less significantly so, in general.
That depends on the size you're using and what you're doing with it. For a 65-bit integer like you mentioned, doing nothing but addition and subtraction, you're right. For a 4096-bit one using a lot of exponentiation (think RSA encryption), the cost of the operations will *definitely* outweigh the cost of allocation, and by such a large margin that the allocation will be completely unnoticeable.
Most examples will fall somewhere between those two extremes.
It is reasonable, particularly with fixed integer, to expect that the common case will be operations on a number of bits only slightly larger than the built in integer types support. If I need 65 bits (and I do) I will code it myself rather than pay for an allocation (and I did).
That's certainly a viable option, if you've got the luxury of the time and interest to do it. Your average developer often doesn't, or values his own time more than he needs efficiency in that part of his program.
The library is designed for large integers -- something that's usually noticeably larger than 65 bits -- so it probably won't be optimal for really small integers like that. It'll get the job done though, so I don't consider that a problem.
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code. Obviously no one expects you to special case the hardware at run time and provide inline assembly where it is warranted, however, where do we draw the line and say "past this line we do care about performance." A performance delta of 10X is not unreasonable between this library and GMP with hand crafted SSE assembly code that takes advantage of overflow flags competing with whatever scalar code the compiler manages to produce, so I suppose we can set performance completely aside for now and focus on the interface. However, the issue will certainly come up during review. Regards, Luke

At Thu, 10 Jun 2010 14:45:17 -0700, Simonson, Lucanus J wrote:
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code. Obviously no one expects you to special case the hardware at run time and provide inline assembly where it is warranted,
I do! I mean, it wouldn't be unreasonable. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 05:55 PM, David Abrahams wrote:
Simonson, Lucanus J wrote:
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code. Obviously no one expects you to special case the hardware at run time and provide inline assembly where it is warranted,
I do! I mean, it wouldn't be unreasonable.
Eventually I'd like to code the lower-level operations in assembly, but that isn't going to happen for the first public release. And even after I do, I'd still want to maintain a pure C++ version, for maximum portability. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRZEgACgkQp9x9jeZ9/wTIzwCeLWh0y5XA05D8FD5/qK40kt+k VCkAniFPHBiJD7P0kF13qa1aNlkftdPx =4X9f -----END PGP SIGNATURE-----

At Thu, 10 Jun 2010 18:16:40 -0400, Chad Nelson wrote:
On 06/10/2010 05:55 PM, David Abrahams wrote:
Simonson, Lucanus J wrote:
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code. Obviously no one expects you to special case the hardware at run time and provide inline assembly where it is warranted,
I do! I mean, it wouldn't be unreasonable.
Eventually I'd like to code the lower-level operations in assembly, but that isn't going to happen for the first public release. And even after I do, I'd still want to maintain a pure C++ version, for maximum portability.
'course. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
At Thu, 10 Jun 2010 18:16:40 -0400, Chad Nelson wrote:
On 06/10/2010 05:55 PM, David Abrahams wrote:
Simonson, Lucanus J wrote:
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code. Obviously no one expects you to special case the hardware at run time and provide inline assembly where it is warranted,
I do! I mean, it wouldn't be unreasonable.
Eventually I'd like to code the lower-level operations in assembly, but that isn't going to happen for the first public release. And even after I do, I'd still want to maintain a pure C++ version, for maximum portability.
'course.
you can have both portability and SIMD stuff in clean, easy to read C++. It's not that hard, just long and tiring.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/11/2010 01:53 AM, Joel Falcou wrote:
At Thu, 10 Jun 2010 18:16:40 -0400, Chad Nelson wrote:
Eventually I'd like to code the lower-level operations in assembly, but that isn't going to happen for the first public release. And even after I do, I'd still want to maintain a pure C++ version, for maximum portability.
'course.
you can have both portability and SIMD stuff in clean, easy to read C++. It's not that hard, just long and tiring.
?!! Do you have references on how to do it? If it can be done, I'd very much like to do it for XInt. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwSRU4ACgkQp9x9jeZ9/wSpLQCbB0zqRQvRx7cdt4RBq8Dx7FjG lHEAoNpPdwgfT3ix0lAjYBDUfEs7uS5I =N1pe -----END PGP SIGNATURE-----

Chad Nelson wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 06/11/2010 01:53 AM, Joel Falcou wrote:
At Thu, 10 Jun 2010 18:16:40 -0400, Chad Nelson wrote:
Eventually I'd like to code the lower-level operations in assembly, but that isn't going to happen for the first public release. And even after I do, I'd still want to maintain a pure C++ version, for maximum portability.
'course.
you can have both portability and SIMD stuff in clean, easy to read C++. It's not that hard, just long and tiring.
?!!
Do you have references on how to do it? If it can be done, I'd very much like to do it for XInt.
You can wrap usage of intrinsics with templates and create simd data types in C++ that use SSE when available and emulate it in software when not available. Doing so would be a boost library unto itself, which has been proposed many times and generally not well received because it favors whichever instruction set it chooses to target and is therefore not general. It would also be a maintenance headache because new revisions of the instruction set would require constant revisions of the library to keep up. For complicated reasons that become obvious once you comparatively study several vector instruction sets it is not practical (or really feasible for that matter) to target multiple instruction sets with such a library. You write your algorithms differently to take advantage of different instructions. A compromise is a slippery slope toward writing a cross compiler. There is a school of thought that anything that can be done in the compiler can be done better in a library. If we can't implement a good vectorizing compiler as a compiler it isn't reasonable to think we can do so as a library. What it boils down to is that you have to write a different wrapper for each instruction set and you end up implementing your algorithms multiple times anyway. I can understand using some assembly to access special instructions related to concurrency in threading libraries submitted to boost. It is pretty easy to case out the different target platforms and the number of places you need to do it is small. Writing large numbers of non-trivial algorithms in machine-specific code is another matter altogether. This is sufficiently hard that the best people in the world collaborate to produce GMP that they can share and the hardware vendors typically pitch in with this sort of thing to help themselves help the customer use new hardware features. Even when it is an open source effort the people contributing are often paid. Regards, Luke
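A minimal sketch of the kind of wrapping described here, assuming the compiler-defined __SSE2__ macro: one interface, an SSE body when available, a scalar loop otherwise. Not taken from any existing library.

    #include <cstddef>
    #ifdef __SSE2__
    #include <emmintrin.h>
    #endif

    inline void add_floats(const float* a, const float* b, float* out,
                           std::size_t n) {
        std::size_t i = 0;
    #ifdef __SSE2__
        for (; i + 4 <= n; i += 4) {                  // four floats per step
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    #endif
        for (; i < n; ++i)                            // scalar tail / fallback
            out[i] = a[i] + b[i];
    }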

Simonson, Lucanus J wrote:
You can wrap usage of intrinsics with templates and create simd data types in C++ that use SSE when available and emulate it in software when not available. Doing so would be a boost library unto itself, which has been proposed many times and generally not well received because it favors whichever instruction set it chooses to target and is therefore not general. It would also be a maintenance headache because new revisions of the instruction set would require constant revisions of the library to keep up. For complicated reasons that become obvious once you comparatively study several vector instruction sets it is not practical (or really feasible for that matter) to target multiple instruction sets with such a library. You write your algorithms differently to take advantage of different instructions.
<off topic> Except in fact, after a while you can just see all this as a real generic layer on top of registers. I suggest you get hold of what we presented this year at Boost'Con. With the proper abstraction, we were able to handle all SSE flavors as well as AltiVec. The effort on maintenance is marginal. As an exercise we worked on an XOP binding. It took less than a week to get running. The main idea was to separate the low-level, extension-specific binding from a generic, EDSL-based concept of a "pack of N elements of type T", in which compile-time transforms restructure SIMD expressions toward their natural form for the current extension. The error would be to make it just a Boost.SIMD thing. You need a global overhaul of functions for scalar and vector registers, as scalars are basically vector registers with one element. When you're there, everything flows properly. As for the maintenance, the IS doesn't change. New ISs are made for new processors. At that point either they are completely separate (AltiVec vs SPU AltiVec) or they share an ancestry (SSE2, 3, 4, AVX for example). Both cases are easily manageable. </off topic>

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/11/2010 06:13 PM, joel falcou wrote:
[...] I suggest you get hold of what we presented this year at Boost'Con. With the proper abstraction, we were able to handle all SSE flavors as well as AltiVec. [...]
Thanks. I'll take a look when I've got the time (which won't be in the next few months, unfortunately). - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwSu18ACgkQp9x9jeZ9/wTMKgCgoalKLpNyD3bbQyiwm09TByCY dskAoMl+b6hwkwFIlZyw7RwbsI1jCPSc =/mxH -----END PGP SIGNATURE-----

joel falcou wrote:
Simonson, Lucanus J wrote:
You can wrap usage of intrinsics with templates and create simd data types in C++ that use SSE when available and emulate it in software when not available. Doing so would be a boost library unto itself, which has been proposed many times and generally not well received because it favors whichever instruction set it chooses to target and is therefore not general. It would also be a maintenance headache because new revisions of the instruction set would require constant revisions of the library to keep up. For complicated reasons that become obvious once you comparatively study several vector instruction sets it is not practical (or really feasible for that matter) to target multiple instruction sets with such a library. You write your algorithms differently to take advantage of different instructions.
Except in fact, after a while you can just see all this as a real generic layer on top of registers. I suggest you get hold of what we presented this year at Boost'Con. With the proper abstraction, we were able to handle all SSE flavors as well as AltiVec. The effort on maintenance is marginal. As an exercise we worked on an XOP binding. It took less than a week to get running. The main idea was to separate the low-level, extension-specific binding from a generic, EDSL-based concept of a "pack of N elements of type T", in which compile-time transforms restructure SIMD expressions toward their natural form for the current extension.
The error would be to make it just a Boost.SIMD thing. You need a global overhaul of functions for scalar and vector registers, as scalars are basically vector registers with one element. When you're there, everything flows properly.
As for the maintenance, the IS doesn't change. New ISs are made for new processors. At that point either they are completely separate (AltiVec vs SPU AltiVec) or they share an ancestry (SSE2, 3, 4, AVX for example). Both cases are easily manageable.
Obviously something like valarray can be implemented. It has been part of the standard for years, though most people don't implement it, use it or even know of its existence. There is a difference between that and what I was talking about, which is a DSEL that exposes all of the features of a SIMD instruction set. It is doable for one IS, and probably for one family, but to generalize and unify across SIMD instruction sets I'm more than skeptical. Since most software targets a platform specifically it is almost a moot point, however. There is very little room between people who want to target a wide variety of hardware and care enough about performance to want to use SIMD and similar people who care so much about performance that they want to write all their code by hand in assembly for each target hardware. I expect GMP falls solidly in the second category, for example. There is a big difference between using SSE instructions and getting the most out of them. In general you have to change your algorithm to fit the architecture. That means different algorithms for different architectures. Does your simd wrapper use expression templates to convert a * b + c to fused multiply add? If it doesn't then it is wrong because it is missing an important optimization and if it does it is also wrong since the fused multiply add produces a result that has different accuracy or precision. Does it expose the overflow/carry flags produced by arithmetic? What about predicates? I agree it is pretty obvious to start out from a "pack of N elements of type T", but that's just the start, not the whole enchilada. Please don't misinterpret my skepticism for criticism. I am actually pretty curious to learn more about what you did with SIMD and I'm sorry I missed boostcon this year. Discussions like this work much better in person. I've followed some of your posts in the past about NT2 and uBlas and the recent matrix library submission. Regards, Luke
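To make the a * b + c question concrete, a toy expression-template sketch (nothing to do with NT2's actual implementation): the product is captured as an unevaluated node, and "node + value" is lowered to std::fma, standing in for a hardware fused multiply-add.

    #include <cmath>
    #include <iostream>

    struct Val { double v; };

    struct MulNode {
        Val a, b;
        operator double() const { return a.v * b.v; }  // used when no + follows
    };

    inline MulNode operator*(Val a, Val b) { return MulNode{a, b}; }

    // The pattern "a * b + c" is recognized here and fused.
    inline double operator+(MulNode m, Val c) { return std::fma(m.a.v, m.b.v, c.v); }

    int main() {
        Val a{2.0}, b{3.0}, c{1.0};
        std::cout << (a * b + c) << "\n";              // 7, computed via fma
    }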

Simonson, Lucanus J wrote:
Obviously something like valarray can be implemented. It has been part of the standard for years, though most people don't implement it, use it or even know of its existence. There is a difference between that and what I was talking about, which is a DSEL that exposes all of the features of a SIMD instruction set.
I'm talking about a DSEL for vector registers, not a DSEL of arrays with SIMD behavior.
It is doable for one IS, and probably for one family, but to generalize and unify across SIMD instruction sets I'm more than skeptical.
Well, the thing is that you don't have to. You only generalize the things that make sense, and if you look at it, you'll see you can easily target a large part. Then you can encapsulate the extension-specific behaviors into extension-specific functions.
Since most software targets a platform specifically it is almost a moot point, however. There is very little room between people who want to target a wide variety of hardware and care enough about performance to want to use SIMD and similar people who care so much about performance that they want to write all their code by hand in assembly for each target hardware. I expect GMP falls solidly in the second category, for example.
Why assembly again? C intrinsics are perfectly fine. We're not in 1492 anymore ...
There is a big difference between using SSE instructions and getting the most out of them.
Well, you can look here for some performance figures: we are basically at 80-90% of the maximum theoretical speed-up for single operations. The DSL nature of the whole thing makes composition as fast.
In general you have to change your algorithm to fit the architecture. That means different algorithms for different architectures.
Except here, with our approach, the only thing we give is basic building blocks for SIMD code. When you compose them, we have a set of transforms that remap combinations to the current extension's idioms.
Does your simd wrapper use expression templates to convert a * b + c to fused multiply add?
Of course, it was basically the motivation for making it in the first place. In the proto grammar of our register_<T,N> class, we have a customization point which is specialized per extension and contains a list of such fusing transformations.
Does it expose the overflow/carry flags produced by arithmetic?
Currently no, because our use case doesn't require them. If you have an example for them, I'm all for having a look.
What about predicates?
You mean handling stuff like vec_sel with a proper bit mask?
I agree it is pretty obvious to start out from a "pack of N elements of type T", but that's just the start, not the whole enchilada.
We are fully aware. We didn't say "hey let's make SIMD in C++" overnight, it's six years of work by two people :o
Please don't misinterpret my skepticism for criticism. I am actually pretty curious to learn more about what you did with SIMD and I'm sorry I missed boostcon this year. Discussions like this work much better in person. I've followed some of your posts in the past about NT2 and uBlas and the recent matrix library submission.
No problem. If you want more details we can: 1/ set up a thread out of this one ;) 2/ give you some extra material about our publications on the matter. We're also currently working hard to get it released in the not too distant future so people can actually play with it ;)

joel falcou wrote:
There is a big difference between using SSE instructions and getting the most out of them.
Well, you can look here for some performance figures: we are basically at 80-90% of the maximum theoretical speed-up for single operations. The DSL nature of the whole thing makes composition as fast.
There are people who make their livelihood getting that 10-20% and look down on us who accept less than 100%. For my part I'd rather write 10X more code that runs at 90% of maximum-theoretic, so I'm still on board.
In the proto grammar of our register_<T,N> class, we have a customization point which is specialized per extension and contain a list of such fusing transformation.
Interesting. That's a good use of proto.
Please don't misinterpret my skepticism for criticism. I am actually pretty curious to learn more about what you did with SIMD and I'm sorry I missed boostcon this year. Discussions like this work much better in person. I've followed some of your posts in the past about NT2 and uBlas and the recent matrix library submission.
No problem. If you want more details we can: 1/ set up a thread out of this one ;) 2/ give you some extra material about our publications on the matter
We're also currently working hard to get it released in the not too distant future so people can actually play with it ;)
I opt for 1 and 2. I read the EVE publication, which seems to describe only an optimized valarray-type interface. Do you have a reference for the generic vector register interface you mentioned? I made some minor contributions to writing a DSEL for wrapping intrinsics. My main design input was to push for making values be models of our expression concept so that everything composes well. Since we weren't using proto we had to figure out a lot of things for ourselves that proto would have done for us. Thanks, Luke

Simonson, Lucanus J wrote:
There are people who make their livelihood getting that 10-20% and look down on us who accept less than 100%. For my part I'd rather write 10X more code that runs at 90% of maximum-theoretic, so I'm still on board.
As I said in my boost'con presentation, we target a different demographic: people who don't care about the details and just want to go "faster". See slides 2->4
Interesting. That's a good use of proto
We thought so ;)
I opt for 1 and 2. I read the EVE publication, which seems to describe only an optimized valarray-type interface. Do you have a reference for the generic vector register interface you mentioned?
EVE is pretty old, it predates proto by 2-3 years. Back in the day it was all made by hand, perusing Veldhuizen's seminal work. I don't think we have any publications on the new versions except for the Boost'Con slides. Time to make one, I think, even if I see it as harder to get published since it seems it's not interesting to a lot of people (read: reviewers).
I made some minor contributions to writing a DSEL for wrapping intrinsics. My main design input was to push for making values be models of our expression concept so that everything composes well. Since we weren't using proto we had to figure out a lot of things for ourselves that proto would have done for us.
Proto basically changed our life, from a mess of hand-written ET code to clean code, especially on the specialization front.

joel falcou wrote:
I opt for 1 and 2. I read the EVE publication, which seems to describe only an optimized valarray-type interface. Do you have a reference for the generic vector register interface you mentioned?
EVE is pretty old, it predates proto by 2-3 years. Back in the day it was all made by hand, perusing Veldhuizen's seminal work. I don't think we have any publications on the new versions except for the Boost'Con slides. Time to make one, I think, even if I see it as harder to get published since it seems it's not interesting to a lot of people (read: reviewers).
I am not able to access your boostcon slides because my employer is blocking the filetolink site that the boostcon presentations are served up from. Can you email it to me directly? Thanks, Luke

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 05:45 PM, Simonson, Lucanus J wrote:
[...] The library is designed for large integers -- something that's usually noticeably larger than 65 bits -- so it probably won't be optimal for really small integers like that. It'll get the job done though, so I don't consider that a problem.
There is a sort of odd dynamic here. On the one hand we all care about performance. On the other there is no way to implement this library in C++ to be efficient on modern hardware because the C++ standard doesn't provide access to integer overflow flags and the compilers are next to worthless for vectorizing general code.
That's why I specified that I was aiming for the fastest implementation I could get in pure C++.
Obviously no one expects you to special-case the hardware at run time and provide inline assembly where it is warranted. However, where do we draw the line and say "past this line we do care about performance"?
Everyone will draw the line somewhere different. As the developer of the library, I balance code speed against the difficulty of writing and maintaining the library -- a feature that would give a 2% improvement in speed, but double the time it takes to write or update the library, is definitely off the menu. One that would double operational speed for that price is definitely *on* the menu. There's a lot of room in between those.
A performance delta of 10X is not unreasonable between this library and GMP with hand crafted SSE assembly code that takes advantage of overflow flags competing with whatever scalar code the compiler manages to produce, so I suppose we can set performance completely aside for now and focus on the interface. However, the issue will certainly come up during review.
Any reasonable reviewer will understand that, as XInt is less than six months old and is being developed by a single person, it isn't likely to be anywhere near GMP's performance. Unreasonable reviewers need not apply. ;-) - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRYxgACgkQp9x9jeZ9/wR7UgCfTJQLhfCTMlChJD/o/7//bfYD TckAoPa8/7cHXxYlZXMK+j5G6VnP1V7P =RZe0 -----END PGP SIGNATURE-----
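To make the overflow-flag point above concrete, here is a minimal sketch of how a portable C++ big-integer addition has to reconstruct the carry that the hardware already computes for free. This is not XInt's code; the digit type, vector layout, and function name are assumptions made up for the illustration, and real implementations often use a double-width accumulator instead of the comparison trick shown here.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Add two equal-length little-endian digit vectors; return the final carry.
    // Without access to the CPU carry flag, the carry out of each word has to
    // be recovered arithmetically: an unsigned addition wrapped if and only if
    // the result is smaller than one of its operands.
    std::uint32_t add_digits(std::vector<std::uint32_t>& out,
                             const std::vector<std::uint32_t>& a,
                             const std::vector<std::uint32_t>& b)
    {
        std::uint32_t carry = 0;
        out.resize(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            std::uint32_t sum = a[i] + b[i];     // may wrap modulo 2^32
            std::uint32_t c1  = (sum < a[i]);    // carry out of a[i] + b[i]
            sum += carry;                        // add the carry from the previous word
            std::uint32_t c2  = (sum < carry);   // carry out of that addition
            out[i] = sum;
            carry  = c1 | c2;                    // at most one of c1, c2 can be set
        }
        return carry;
    }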

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 01:50 PM, DE wrote:
[...] I assume you wanted to use stack-based memory for speed, but due to the above, it would likely be slower than the equivalent heap-based integer. Especially with a good caching allocator. [...]
yes, you are right, i can see the rationale behind this now. i guess the cost of allocation compared to the cost of operations is negligible, so in fact the stack version will not gain much. and it seems to me that i recalled the discussion about it
If you're doing anything more than simple addition and subtraction, it probably will be.
anyway i think that fixed integer is a valuable part of the lib. one more question arises: many algorithms that use built-in integer modular arithmetic can be generalized to larger ints (PRNG, hash, etc.)
I'm not sure I understand what you're saying there.
but as i understood from the docs fixed_integer does not exercise modular arithmetic like built-in types do
The library's fixed-size integers act like unsigned ints, as far as modular arithmetic goes. Except, of course, that they're signed as well. That's why I took pains to describe their behavior thoroughly; it's not entirely obvious from knowledge of the built-in types.
at the same time there already is a modular arithmetic implementation in the lib. it confuses me; what was your intention?
To provide a fixed-integer implementation, primarily because you requested it. :-) The fixed-integer stuff that the library provides is somewhat more flexible than just relying on the modular behavior of the built-in types though, because you can specify a number of bits that isn't necessarily an even multiple of a built-in type's size. For example, it allows 56-bit, 112-bit, and 168-bit integers (the key sizes for Triple-DES) on a 32- or 64-bit machine, even though none of those sizes are a multiple of 32 or 64. That said, the version I'm finishing up now *does* take advantage of the built-in types' behavior when the number of bits is an even multiple of the underlying type's size, to squeeze a little extra performance out of it by eliminating unneeded instructions at compile-time. (Since some people will complain about my example: yes, it's contrived. You don't need a large-integer library for Triple-DES, and you could use a larger value and simply cut it down to the required bits. I was just trying to show one place where it might be useful to calculate something with an unusual number of bits.) - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRNyMACgkQp9x9jeZ9/wSdwwCdGbskHqLFeoZ9dpW2NeIB5jLi ZzMAoMWGbfhiQtcberJIujrVH9JRbEaI =EDY1 -----END PGP SIGNATURE-----
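As a toy illustration of the truncation semantics described above (this is not XInt code or its interface), here is what wrapping to a bit count that isn't a multiple of the word size looks like for a single 64-bit word; a fixed-size integer of 112 or 168 bits would apply the same kind of masking to its top word only.

    #include <cstdint>
    #include <iostream>

    // Simulate a 56-bit unsigned value inside a 64-bit word: every result is
    // masked down to the low 56 bits, so arithmetic wraps modulo 2^56 even
    // though 56 is not a multiple of 32 or 64.
    constexpr unsigned      bits = 56;
    constexpr std::uint64_t mask = (std::uint64_t(1) << bits) - 1;

    std::uint64_t add56(std::uint64_t a, std::uint64_t b)
    {
        return (a + b) & mask;
    }

    int main()
    {
        std::uint64_t max56 = mask;                       // largest 56-bit value
        std::cout << std::hex << add56(max56, 1) << '\n'; // wraps around to 0
    }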

on 10.06.2010 at 23:04 Chad Nelson wrote:
yes, you are right, i can see the rationale behind this now. i guess the cost of allocation compared to the cost of operations is negligible, so in fact the stack version will not gain much. and it seems to me that i recalled the discussion about it

If you're doing anything more than simple addition and subtraction, it probably will be.

forgive me for repeating myself, but yes, you did everything right. i recalled that one ought to first measure and only then optimize (first shoot, then ask questions...)
but as i understood from the docs fixed_integer does not exercise modular arithmetic like built-in types do

The library's fixed-size integers act like unsigned ints, as far as modular arithmetic goes. Except, of course, that they're signed as well. That's why I took pains to describe their behavior thoroughly; it's not entirely obvious from knowledge of the built-in types.

it's rather unusual for me. so if i want to go unsigned bigint, i just ignore the sign of fixed integers (in other words, they're all positive), am i right?
one small issue i can think about is 'a - b' where a < b. according to mod. arith., 'a - b' is equivalent to '(mod + a - b) % mod'. is that case preserved by fixed_integer?

i misunderstood the docs and probably poorly expressed my thought, sorry

-- Pavel

P.S. if you notice a grammar mistake or weird phrasing in my message please point it out
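For reference, here is the identity being asked about, written out with a built-in unsigned type. This only shows the behavior the built-in types already guarantee; whether fixed_integer preserves it is exactly the open question.

    #include <cassert>
    #include <cstdint>

    int main()
    {
        std::uint8_t a = 3, b = 5;           // a < b
        std::uint8_t diff = a - b;           // wraps: 3 - 5 becomes 254
        unsigned mod = 256;                  // 2^8, the modulus of uint8_t
        assert(diff == (mod + a - b) % mod); // 254 == (256 + 3 - 5) % 256
    }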

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/10/2010 03:49 PM, DE wrote:
[...] If you're doing anything more than simple addition and subtraction, it probably will be.
forgive me for repeating myself, but yes, you did everything right. i recalled that one ought to first measure and only then optimize (first shoot, then ask questions...)
:-)
The library's fixed-size integers act like unsigned ints, as far as modular arithmetic goes. Except, of course, that they're signed as well. That's why I took pains to describe their behavior thoroughly; it's not entirely obvious from knowledge of the built-in types.
it's rather unusual for me. so if i want to go unsigned bigint, i just ignore the sign of fixed integers (in other words, they're all positive), am i right?
In most cases, that will work, because overflows will be truncated to the proper number of bits. I hadn't considered underflows though, until you mentioned it in your next paragraph.
one small issue i can think about is 'a - b' where a < b. according to mod. arith., 'a - b' is equivalent to '(mod + a - b) % mod'. is that case preserved by fixed_integer?
Not at present. It would be easy enough to add that to the current design though, as yet another template parameter (Signed/Unsigned). I've put that on the to-do list. I'm not sure whether that will make the fifth release (which I hope to have ready tomorrow, if all goes well, so people can play with it over the weekend). If it doesn't, I'll add it as soon after that as I can.
i misunderstood the docs and probably poorly expressed my thought sorry
I'm not sure you did. I hadn't considered how negative numbers would affect that. If that's the cause of the misunderstanding, it was entirely my fault, and I apologize. - -- Chad Nelson Oak Circle Software, Inc. * * * -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwRXbMACgkQp9x9jeZ9/wQj3ACg7xNVheu1v1Y30KtAZA4UWul7 W+kAn1gXiUfkjSJDPEQvbsunXx1gVu7+ =5dYy -----END PGP SIGNATURE-----
participants (16)
- Chad Nelson
- Christopher Jefferson
- David Abrahams
- DE
- Giovanni Piero Deretta
- Jeffrey Lee Hellrung, Jr.
- joel falcou
- Joel Falcou
- John Bytheway
- Jürgen Hunold
- Marius Stoica
- Scott McMurray
- Simonson, Lucanus J
- Steven Watanabe
- Stewart, Robert
- vicente.botet