Dear community (especially the reviewers of boost.histogram),
I am still working on finishing the library based on the excellent reviews I got in September. I cannot make up my mind about a specific detail of the library interface and would like to ask for your input.
Every axis type is required to have a size() method, which returns the number of bins along that axis (without counting extra bins for under- and overflow). Should this method return `unsigned` or `int`?
= The case for `unsigned` =
The size of an axis is non-negative, so the return type should reflect this.
All the STL containers return their size as an unsigned integer. Naturally, boost.histogram should be as consistent as possible with the STL, so you don't have think about exceptions from the general rule.
Whether `size()` returns a signed or unsigned type matters even in times of `auto`, since compilers like to warn about comparisons between signed and unsigned integers. These warnings and the STL have slowly thought people to write indexed loops with unsigned integers, like so:
```
auto h = make_histogram(…); // 1D histogram
for (unsigned i = 0; i < axis.size(); ++i) {
auto x = h[i];
// do something with bin
}
```
We don't want to break this painfully learned habit by letting `size()` return a signed integer, because it would generate a warning in this naive loop.
= The case for `int` =
The type of the bin index returned by an axis is `int`, because it can be negative. An axis is required to return -1 when the axis represents an ordered range and the value fall below the low edge of the first bin. So the index must be signed. The size of an axis is highest index + 1. It would be natural to use the same type for `size()` and for the returned index.
Internally in the library, there are many less-than comparisons between the index and the size of the axis. If the type of the latter is unsigned, the size must be cast to `int` in all these comparisons to avoid compiler warnings, which is adding code clutter in the implementation.
The library provides a range adaptor, called `indexed`, which allows one to iterate over the bins of a histogram with a special proxy, that can return the current bin index and the accumulated value for that bin. The following naive code to ignore under/overflow bins while iterating will generate a warning when `size()` returns `unsigned`
```
auto h = make_histogram(…); // 1D histogram
// iterate over indexed bin range, ignore indices < 0 and indices > size (= underflow/overflow bins)
for (auto x : indexed(h)) {
if (x[0] < 0 || x[0] >= h.axis().size()) // without warning only when x[0] and h.axis().size() both return `int`
continue;
// do something with bin
}
```
People are supposed to use the indexed range to iterate over the bins and not a combination of simple loops like in the first listing, because it scales nicely to multi-dimensional histograms. In multiple dimensions, it is also potentially more efficient. The `indexed` range adaptor automatically moves sequentially in memory, which this is not guaranteed if the index is created by looping manually over several indices, like so
```
for (unsigned i = 0; i < h.axis(0).size(); ++i)
for (unsigned j = 0; j < h.axis(1).size(); ++j)
{
auto x = h.at(i, j); // this may jump around in memory, causing cache misses
}
```
How the linear index is generated from the multi-dimensional index is an implementation detail (although the implementation should probably anticipate these naive loops…)
So, if `indexed` is anyway the default recommended way of iterating over a histogram, then size() should return `int` to be compatible with the index type, which is also `int`, because it makes naive code work better.
Sorry for the long email, I would really appreciate your input.
Best regards,
Hans