
On Wednesday, January 22nd, 2025 at 3:20 PM, Alan de Freitas via Boost
Boost.Decimal Review
I want to thank Matt Borland and Chris Kormanyos for their work.
Does this library bring real benefit to C++ developers for real world
use-case?
Yes. It should be helpful in "human-centric calculations". It might be useful in financial applications, although I have some questions about this below.
Do you have an application for this library?
Not at the moment.
Does the API match with current best practices?
My biggest point of disagreement is the single-header design. The documentation says "The entire library should be accessed using the convenience header `<boost/decimal.hpp>`". Although providing a single header as an extra option is common practice for Boost libraries, I don't think it should be a recommendation (with the word "should"), let alone enforced as the only way to consume the library. The rest of the documentation always assumes the user will include `<boost/decimal.hpp>`.

- The sections "cmath support", "cstdlib support", "format support", and so on don't even tell me what header I need to include only that functionality.
- The documentation implies `<boost/decimal.hpp>` includes way more than anyone needs, because even the C++ standard library splits that functionality into different headers while Boost.Decimal doesn't. `<boost/decimal.hpp>` forces users to unnecessarily include a big chunk of the standard library, and this functionality is so diverse that it's very unlikely the user really wants even 10% of what's being included.
- A single-header design is not a way "to make things just work with no effort on the user's part", because it forces the user to consume the library this way rather than simply allowing them to, like all other Boost libraries do.
- If a potential user says "X seconds of compile time is a problem", it's unhelpful to reply with something like "I don't think X seconds of compile time is a problem for you", especially when you consider that it could be 3 seconds per translation unit. Most recent libraries are making an effort to reduce compile times by breaking up headers and avoiding header-only designs. This is a big complaint people have about Boost. The library is already header-only, and making no effort to break up the headers makes no sense to me.
- Telling the user to rely on macros to disable optional functionality doesn't sound good either, because now users have to blacklist rather than whitelist what they want. Blacklisting what you don't want with macros is a huge workaround: it introduces a new convention and blacklisting to solve a design problem that has already been solved.
- Implying this design matches the C++ standard library is also unreasonable, because it pulls in many different headers from the standard library. What would match the STL interface, almost by definition, is one header for cmath support, one header for cstdlib support, one header for charconv support, and so on. Even the documentation is structured by the different standard library headers whose functionality it replicates.
See: https://github.com/cppalliance/decimal/issues/804. There's an active branch `modular` where I am working on resolving the issue.
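For illustration, the standard library analogy the review draws looks like the sketch below; the decimal side of the split is hypothetical and is not the layout of the `modular` branch or of the current library.

```cpp
// The standard library splits each family of functionality into its own
// header, so users include only what they need:
#include <cmath>     // math functions
#include <charconv>  // to_chars / from_chars
#include <cstdlib>   // strtod and friends

// A matching decimal interface would mirror this split (hypothetical):
// one header for the cmath-style functions, one for charconv support,
// one for cstdlib support, and so on, instead of one convenience header.
int main() {}
```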
A minor point:
- The public API probably shouldn't return `detail::uint128`. It can't be considered an implementation detail if the user can pass it around. It could be a "see-below" type, where the user should ignore information about the members but can still use it.
Yes, this change makes sense, and several others have brought it up. I tagged you in an issue to track it since you're the most recent one to mention it.
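For what it's worth, a minimal sketch of the "see-below" idea, with entirely made-up names (nothing here is Boost.Decimal's actual API):

```cpp
// Illustrative only: all names are hypothetical, not Boost.Decimal declarations.
namespace lib {
namespace detail {
// Internal representation; its members are not part of the public contract.
struct uint128 { unsigned long long hi; unsigned long long lo; };
} // namespace detail

// Public "see-below" name: users may hold and pass values of this type,
// but the documentation tells them not to rely on its members.
using uint128_t = detail::uint128;

// Public functions are then documented in terms of the public name
// rather than anything in detail::.
inline uint128_t make_value() { return uint128_t{0, 42}; }
} // namespace lib

int main()
{
    const lib::uint128_t v = lib::make_value(); // the user never spells detail::
    (void)v;
}
```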
Is the documentation helpful and clear?
Yes. The main point I'd like to discuss is how the trade-offs are explained. The documentation says "Decimal floating point numbers avoid this issue by storing the significand in base-10 (decimal).", and the "issue" mentioned in the previous sentence is "representation errors". This seems misleading. It makes it sound like `decimalX` is a variant of a bigint type that avoids all representation errors. But if I understand correctly, decimal numbers avoid a much more specific kind of error: representation errors for values whose denominators' prime factors all divide the base. A decimal type still has the same representation errors as base-2 floating point numbers when 1) the denominator has a prime factor that does not divide the base (base 10 simply covers more denominators than base 2), and 2) the exponent grows, which introduces more and more rounding error in the value. So the final trade-off is fewer representation errors (because the base has more prime factors) at the cost of performance ("ratio to double" between ~30 and ~150 for most operations). I might be wrong about these details, but I still believe that sentence in the documentation is misleading.
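A small sketch of that distinction, assuming `decimal64` and the (significand, exponent) constructor described elsewhere in the documentation; the comments describe expected behaviour, not output taken from the docs:

```cpp
#include <boost/decimal.hpp>
#include <iostream>

int main()
{
    using boost::decimal::decimal64;

    std::cout << std::boolalpha;

    // 1/10 is exact in base 10 but not in base 2.
    const decimal64 d {1, -1};                        // significand 1, exponent -1 -> exactly 0.1
    std::cout << (d + d + d == decimal64{3, -1})      // true: the decimal sums stay exact here
              << '\n';
    std::cout << (0.1 + 0.1 + 0.1 == 0.3)             // false: binary doubles already rounded 0.1
              << '\n';

    // 1/3 is exact in neither base: decimal types still round.
    const decimal64 third = decimal64{1} / decimal64{3};
    std::cout << (third * decimal64{3} == decimal64{1}) // false: the rounding error remains
              << '\n';
}
```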
There's also a technical question I still don't understand about 1) the relationship between decimal numbers and fixed-point real numbers and 2) the impact of this relationship on potential applications. For instance, Bitcoin uses integers (satoshis) to represent "decimal" units of BTC. This representation is just an integer with a fixed, implicit scaling factor/exponent. The same representation can be used for dollars: the internal unit is the cent and the main unit is the dollar. Compared with decimal floating point numbers, the trade-off becomes 1) faster operations (even faster than native base-2 floating point types), 2) no precision errors / exact accuracy for numbers in the predefined scale, and 3) reliable precision regardless of the best exponent to represent the number; at the cost of a more limited range of values that can be represented.
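A minimal sketch of that fixed-point idea with the scale hard-coded to cents; the type and helper names are illustrative, not from any library, and negative amounts are omitted for brevity:

```cpp
#include <cstdint>
#include <iostream>

// Illustrative fixed-point money type: an integer count of cents with an
// implicit scale of 10^-2. All arithmetic is plain integer arithmetic.
struct cents_t
{
    std::int64_t value; // number of cents
};

constexpr cents_t operator+(cents_t a, cents_t b) { return {a.value + b.value}; }
constexpr cents_t operator-(cents_t a, cents_t b) { return {a.value - b.value}; }

// Printing reinterprets the integer with the fixed exponent of -2.
void print(cents_t c)
{
    std::cout << '$' << c.value / 100 << '.'
              << (c.value % 100 < 10 ? "0" : "") << c.value % 100 << '\n';
}

int main()
{
    const cents_t price  {1999}; // $19.99
    const cents_t credit {500};  // $5.00
    print(price - credit);       // $14.99 exactly, no rounding anywhere
}
```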
With that trade-off in mind, my intuition is that decimal floating point numbers are not so useful for financial applications (the only application discussed in the "Examples" section). With 32-bit fixed-point numbers, I could represent more dollars (or cents) or BTC (or satoshis) than will ever exist (the limited range of values is not a problem) while getting the performance and precision benefits of integers.
With a 32-bit unsigned integer counting whole dollars you could represent ~4.3 billion USD. The United States Government spends on the order of 10,000 billion USD a year.
If the number is larger than that maximum, it means either I made a mistake (for instance, in currency conversion) and I should definitely fix it (I can't lose $1,000,000 and ignore it because my best exponent is now Y for some reason; $1,000,000 is still a lot, and I can't move on until this "precision error" is fixed), or this is not a mistake and I'm in a subdomain where the precision errors don't matter anymore because I need those large exponents to represent something else. In that case the extra precision from having more prime factors in the base is not as relevant to me, and base-2 numbers are probably fine (for instance, for statistics about the data).
I might be wrong about all of that, but if that's the case I assume other people will also be wrong about it. The docs could correct these mistakes or, if I'm wrong, provide other examples of useful applications. That's why I think the main application of the library is "human-centric calculations", and it's only useful in other fields when there's some intersection with that, like when reading data in base-10 format that might not fit a fixed-point representation. This discussion could also open possibilities for integer-based representations in this library or other libraries, in the sense that base-10 floating point is, in a way, a fixed-point number with a dynamic exponent.
I think including something like what Reuben is planning, where he synthesizes a variable fixed-point type, could be a useful addition.
Other minor points:
- It seems like most code blocks start with some whitespace in the first line for some reason.
- Constructing a decimal from floating point numbers leads to precision errors, and constructing it from independent integers makes the code difficult to read. Since we can construct values from strings (Literals Support), this could be listed in "Basic Usage". I couldn't figure out how to construct these values from literals without reaching the "Literals Support" section.
That's fair. I have also moved the literals into a literals namespace since the beginning of the review, as several people have mentioned that as well.
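For reference, a hedged sketch of the construction routes the bullet above contrasts; the `from_chars` signature is assumed to mirror `std::from_chars`, as the charconv-support section suggests, and is not copied from the docs:

```cpp
#include <boost/decimal.hpp>

int main()
{
    using boost::decimal::decimal64;

    // 1) From a binary double: the 0.1 below is already rounded to base 2
    //    before the decimal type ever sees it.
    const decimal64 from_double {0.1};

    // 2) From independent integers (significand and exponent): exact,
    //    but harder to read at a glance.
    const decimal64 from_ints {1, -1}; // 1 * 10^-1 == 0.1

    // 3) From a string, mirroring <charconv>: exact and readable.
    decimal64 from_string {};
    const char buf[] = "0.1";
    boost::decimal::from_chars(buf, buf + sizeof(buf) - 1, from_string);

    (void)from_double; (void)from_ints; (void)from_string;
}
```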
- The signatures could contain explicit requires clauses and concepts instead of "Where types `T` and `T2` are integral types" in the exposition.
- "Lastly the sign follows the convention of `signbit`": this could be linked to https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signbit or removed from the beginning of the exposition. The reader might think it is important information for understanding the rest of the exposition and stop reading the documentation to google it. Also, `bool sign` only needs this much explanation because it's unintuitive. If it were called `bool is_negative`, or something like that, there would be nothing to explain.
I'll add an explanation of `signbit` rather than assuming people know it, since it is counterintuitive.
- The "3.2.8 Note" could just be part of the reference documentation. This way the user would already have this information when looking at the function. It would also be useful to explain how that behavior compares with other numeric types. - Typo in "Integeral" - I missed some better explanation about the difference between fast and regular types. Like in Boost.Unordered where they explain the data structures and then the trade-off. - I'm used to the API reference being at the end of the documentation. I was surprised to see there's a lot of exposition after the reference: Examples, Benchmarks, "Design Decisions", ... - "Comparisons" section: "between vec[i] and vec[i + 1]": where the numbers come from in the implementation is irrelevant. Only the number of replicates (20,000,000) is relevant. "This is repeated 5 times to generate stable results" doesn't add entropy to help stability. It just means you have 100,000,000 replicates with a bias. You could just generate 100,000,000 values. It's also not relevant to the experiment that the values are in a vector. In fact, unless there's a technical reason for that, generating the numbers for the tests and discarding them is much easier. In any case, 100,000,000 is way too much for any probability distribution. After much less than that you'll have a stable p-value. - Comparison results: The tables need some explaining so the reader doesn't have to look at each column and header to infer what's happening. It's usual to include something like "The second column means ____. Lower values are better.". It's also common practice to include the standard deviation in the table. - The tables are not sorted by the values. This is particularly confusing because `GCC _Decimal<X>` always beats `decimal32`. This is not discussed
in the document. `GCC_Decimal<X>` isn't even defined and discussed.
I believe Peter Turcan brought up similar points. I'll add a link to the GCC docs at the top explaining what it is.
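To illustrate the "generate and discard" suggestion above, a hedged sketch of a comparison benchmark with no pre-filled vector; the (significand, exponent) constructor is assumed from the documentation, and this is not the library's actual benchmark code:

```cpp
#include <boost/decimal.hpp>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>

int main()
{
    using boost::decimal::decimal64;

    std::mt19937_64 rng {42};
    std::uniform_int_distribution<std::int64_t> sig_dist {-9999999, 9999999};
    std::uniform_int_distribution<int> exp_dist {-5, 5};

    constexpr std::size_t n = 20000000;
    std::size_t hits = 0; // keeps the comparisons from being optimized away

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
    {
        // Generate, compare, and discard: no vector of pre-computed values.
        const decimal64 lhs {sig_dist(rng), exp_dist(rng)};
        const decimal64 rhs {sig_dist(rng), exp_dist(rng)};
        hits += static_cast<std::size_t>(lhs < rhs);
    }
    const auto stop = std::chrono::steady_clock::now();

    std::cout << hits << " comparisons true in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}
```

Note that this folds the RNG cost into the measured loop, which may well be the technical reason the original benchmark stores the values in a vector first.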
- There should be a conclusion about the benchmark results. Did anything change from one platform to the other, etc.? If I read the results correctly, the conclusion is that I should always use `decimalX_fast`, then `GCC _DecimalX` if I need other properties, then `decimalX` if not using GCC.
- If the section "Basic Operations" is about "+, -, *, /", then the first comparison subsection "Comparisons" should be renamed to reflect the operations it compares, ">, >=, <, <=, ==, and !=" (relational operators).
- Both columns in the table carry the same information at different scales. This means all benchmarks for a platform could go in a single bar plot where the x-axis has each number type and, for each number type, there is one bar per operation showing its ratio to double.
Did you try to use it? What problems or surprises did you encounter?
Yes. I compiled the code and ran the examples.
What is your evaluation of the implementation?
It seems like performance is still much worse than other implementations. I don't think this is a reason to reject the library; I just hope the authors will keep an eye on it, implementing improvements over time if possible. Or maybe some other native types or implementations could be wrapped by the library.
Chris and I have been discussing offering a wrapper around the Intel library for those who have it and want to use it. It's the same model that Boost.Multiprecision uses.
Please explicitly state that you either accept or reject the
inclusion of this library into boost.
I recommend accepting the library.
Also please indicate the time & effort spent on the evaluation and give
the reasons for your decision.
I read the documentation and the source code. I compiled the library and ran the examples.
I spent time on and off over the last week. In total, I spent about 2 days evaluating the library.
Thank you for your review and comments. Matt