
Here is my review of XInt. Currently I think the library should not be included in Boost, as a lot of problems, in my opinion, still have to be fixed and the design has to be rethought.

***********************************************************************

* What is your evaluation of the design?

Large integers are a bit like XML and Unicode: lots of people want to write such a library because it would be highly useful and, from afar, it doesn't sound *this* complicated. Alas, it is :/

The premises are OK. We have some large-integer type that has proper value semantics and can be used in a variety of operations and functions. So far, so good. The main culprit is that the current library design is rather hardwired to a choice of implementation the author thought to be OK at some point, but which has yet to be proven better than existing solutions.

The overall design lacks separation between the data and the algorithms, which makes the library difficult to extend: what if I want to take an arbitrary range of integers and look at them as a huge series of bits? What if I don't want the extra memory overhead, so I can mmap some files directly into memory (because reading a 10k-bit integer is a no-no)? Can I pass the integer type to a standard algorithm if I want to iterate over the bytes it contains? Or copy them using boost::range copy?

On the data side, I think being forced to use a given representation is wrong. Ideally, the library should provide a fallback implementation but also provide adaptors so that any kind of data can be looked at as a large integer. The definition of a LargeInteger concept would help define such an adapting interface and make the library more flexible. For example, what if I want to implement LCM?

* What is your evaluation of the implementation?

The implementation is poor for various reasons:

- The author uses COW for performance reasons. Alas, COW has been demonstrated again and again to be of little interest.
Looking at the problem for ten seconds (types holding arbitrarily large amounts of data, to be chained in operations without copies) should have led to an expression-template-based solution: capture a whole chain of operations on values, compute the worst-case number of bits to store, allocate once, then compute in place. This prevents spurious allocations and makes the algorithms easier to write. Additionally, expression templates would help detect expression patterns for which a non-naive algorithm could be applied.

- Virtual inheritance: looking at integer_t shows:

  class integer_t: virtual public ....

  Why? It seems you're trying to implement some policy-based design, but I can't see why virtual inheritance is used. Do you get multiple-definition problems in your mixins without it?

- There are a lot of unwarranted pass-by-value function prototypes, which seriously impacts performance even with COW: pass-by-ref > pass-by-value + COW. Moreover, the "thread safety" of the COW implementation is dubious.

- The headers are obscenely big. The code would be easier to follow without these 2k+ line mammoth files.

* What is your evaluation of the documentation?

The documentation is OK, but it is rather hard to navigate as there are no real links between sections, forcing me to go back and forth between them.

* What is your evaluation of the potential usefulness of the library?

The problem the library tries to solve is a very important one, as shown by the large number of similar projects. Such a library is worthy of inclusion in Boost, but not in its present form and shape.

* Did you try to use the library? With what compiler? Did you have any problems?

I tried some examples from the documentation on gcc 4.5 with no real problems.

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

* Are you knowledgeable about the problem domain?

I've been doing research and development on scientific computing for 9 years.
If the specific problem of large integers is a bit of a fringe domain for me, I have been collaborating on similar designs in some projects.

* Do you think the library should be accepted as a Boost library?

No. In its current form, the library *should not* be accepted into Boost. The performance claims are not backed up (and other reviewers have shown it can be far better), the design lacks genericity, and its implementation is, in my opinion, sub-optimal. To be accepted, the following points should be addressed:

- Use expression templates to handle efficient combinations of operations and potentially detect optimisable patterns. Proto is a shoo-in for this.

- The library entities should be made more generic by providing an extended-integer adaptor facility that lets one substitute a custom integer representation (like having extended_integer<GMP>) for corner cases. This adaptor would provide an interface able to turn such types into types fulfilling the underlying LargeInteger concept the library needs. Being compatible with GMP should even be a goal and a selling point for the library.

- Do real benchmarks and compare against existing state-of-the-art solutions; at least show benchmarks against GMP on realistic use cases.

- The algorithms should be more generic and not tied to the concrete type of large integer used.

Frankly, licensing issues aside, a Proto-based library able to wrap GMP if needed, with a Range-based interface, could already cover 80% of the needs and provide correct speed. Building on top of that to add new algorithms, new back-end representations, and adaptors would fill out the library in a very smooth way.