December 2018 - Boost - lists.preview.boost.org

Reminder: Boost Master branch will close for release tomorrow (12/5)
by Marshall Clow 05 Dec '18


Calendar at https://www.boost.org/development -- Marshall


[histogram] should some_axis::size() return unsigned or int?
by Hans Dembinski 04 Dec '18


Dear community (especially the reviewers of boost.histogram),

I am still working on finishing the library based on the excellent reviews I got in September. I cannot make up my mind about a specific detail of the library interface and would like to ask for your input.

Every axis type is required to have a size() method, which returns the number of bins along that axis (without counting the extra bins for under- and overflow). Should this method return `unsigned` or `int`?

= The case for `unsigned` =

The size of an axis is non-negative, so the return type should reflect this. All the STL containers return their size as an unsigned integer. Naturally, boost.histogram should be as consistent as possible with the STL, so you don't have to think about exceptions from the general rule.

Whether `size()` returns a signed or unsigned type matters even in times of `auto`, since compilers like to warn about comparisons between signed and unsigned integers. These warnings and the STL have slowly taught people to write indexed loops with unsigned integers, like so:

```
auto h = make_histogram(…); // 1D histogram
for (unsigned i = 0; i < h.axis().size(); ++i) {
  auto x = h[i];
  // do something with bin
}
```

We don't want to break this painfully learned habit by letting `size()` return a signed integer, because it would generate a warning in this naive loop.

= The case for `int` =

The type of the bin index returned by an axis is `int`, because it can be negative. An axis is required to return -1 when the axis represents an ordered range and the value falls below the low edge of the first bin. So the index must be signed. The size of an axis is the highest index + 1. It would be natural to use the same type for `size()` and for the returned index.

Internally in the library, there are many less-than comparisons between the index and the size of the axis. If the type of the latter is unsigned, the size must be cast to `int` in all these comparisons to avoid compiler warnings, which adds code clutter in the implementation.

The library provides a range adaptor, called `indexed`, which allows one to iterate over the bins of a histogram with a special proxy that can return the current bin index and the accumulated value for that bin. The following naive code to ignore under/overflow bins while iterating will generate a warning when `size()` returns `unsigned`:

```
auto h = make_histogram(…); // 1D histogram
// iterate over indexed bin range, ignore indices < 0 and indices >= size
// (= underflow/overflow bins)
for (auto x : indexed(h)) {
  // compiles without warning only when x[0] and h.axis().size() are both `int`
  if (x[0] < 0 || x[0] >= h.axis().size())
    continue;
  // do something with bin
}
```

People are supposed to use the indexed range to iterate over the bins, and not a combination of simple loops like in the first listing, because it scales nicely to multi-dimensional histograms. In multiple dimensions, it is also potentially more efficient. The `indexed` range adaptor automatically moves sequentially in memory, which is not guaranteed if the index is created by looping manually over several indices, like so:

```
for (unsigned i = 0; i < h.axis(0).size(); ++i)
  for (unsigned j = 0; j < h.axis(1).size(); ++j) {
    auto x = h.at(i, j); // this may jump around in memory, causing cache misses
  }
```

How the linear index is generated from the multi-dimensional index is an implementation detail (although the implementation should probably anticipate these naive loops…).

So, if `indexed` is the default recommended way of iterating over a histogram anyway, then size() should return `int` to be compatible with the index type, which is also `int`, because it makes naive code work better.

Sorry for the long email, I would really appreciate your input.

Best regards,
Hans
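The signed/unsigned mismatch discussed in this email can be reproduced without the library. The following is a minimal sketch in standard C++; `below_size_mixed`, `below_size_signed`, and `check_comparisons` are hypothetical helpers made up for illustration and are not part of boost.histogram. It shows why comparing an `int` index of -1 against an `unsigned` size gives a surprising result, which is what the compiler warning is about.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not part of boost.histogram.

// Mixed comparison: in `index < size` with an unsigned `size`, the int is
// converted to unsigned first. The cast below spells out that conversion,
// so -1 turns into a very large value and the comparison is false.
bool below_size_mixed(int index, unsigned size) {
    return static_cast<unsigned>(index) < size;
}

// Same-type comparison: both operands are int, so -1 < size holds as expected.
bool below_size_signed(int index, int size) {
    return index < size;
}

// Small usage example with the underflow index -1 described in the email.
void check_comparisons() {
    const int underflow = -1;
    assert(!below_size_mixed(underflow, 3u));  // surprising: -1 is "not below" 3
    assert(below_size_signed(underflow, 3));   // expected: -1 is below 3
    assert(below_size_mixed(2, 3u));           // non-negative indices agree
    assert(below_size_signed(2, 3));
}
```

With an `unsigned` size, library code that compares indices against the size needs an explicit cast to `int` on the size to get the second behavior, which is the clutter the email refers to.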


Re: [boost] [graph] Tree data structure
by Bjorn Reese 02 Dec '18


On 11/28/18 11:51 AM, Jeremy Murphy via Boost wrote:

> long ago, back in 2002 and 2009, Kasper Peeters proposed adding his tree
> data structure to Boost.

There is also Rene's proposal, which sounds like a useful Boost library:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3700.html


Re: [boost] [histogram] should some_axis::size() return unsigned or int?
by Hans Dembinski 01 Dec '18


Dear Jim,

> On 29. Nov 2018, at 13:31, Jim Pivarski <jpivarski(a)gmail.com> wrote:
>
> It seems to me that if other C++ containers use unsigned integers for size(), you should too, for minimal surprise. The issue you raise about the "which bin?" function returning -1 would be solved in a very-strongly typed language like Haskell as an optional<int>. I know that C++ has optional types now, but I don't know how widely they're used or if there's a significant performance penalty. If this wouldn't look too weird in a C++ program and wouldn't slow it down (or needlessly complicate the code), an optional type would describe your intent more fully than -1.

There is boost::optional, which has the semantics of a pointer and can be used to represent a type that stores a value or not.

```
boost::optional o = some_fickle_function();
if (o) { // optional has a value?
  auto value = *o; // "dereference" to get the value
} else {
  // handle case where value is missing
}
```

This is not a good match here, because -1 does not have the meaning of "value is missing"; it really is the logical index for the virtual bin that spans from -infinity to the lower edge of the first bin in the axis.

Value arrow: -inf -------|-------|-------|-------|-------> +inf
                  bin -1   bin 0   bin 1   bin 2   bin 3

I think representing the underflow bin with -1 and the overflow bin with the value returned by size() is very intuitive and elegant.

Best regards,
Hans
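The index convention described in this reply (-1 for underflow, size() for overflow) can be sketched with a toy axis. This is an assumption-laden illustration, not the boost.histogram implementation: `toy_regular_axis`, its members, and `check_toy_axis` are hypothetical names, and the axis is assumed to split the half-open interval [lo, hi) into equal-width bins.

```cpp
#include <cassert>

// Hypothetical toy axis for illustration; not the boost.histogram implementation.
// It splits [lo, hi) into n equal-width bins. NaN input is not handled.
struct toy_regular_axis {
    int n;      // number of regular bins
    double lo;  // lower edge of bin 0
    double hi;  // upper edge of bin n - 1

    // Number of bins, not counting the underflow and overflow bins.
    int size() const { return n; }

    // Returns -1 for x < lo (underflow bin), size() for x >= hi (overflow bin),
    // and otherwise the index of the regular bin that contains x.
    int index(double x) const {
        if (x < lo) return -1;
        if (x >= hi) return n;
        const double z = (x - lo) / (hi - lo);  // normalized position in [0, 1)
        const int i = static_cast<int>(z * n);
        return i < n ? i : n - 1;  // guard against floating-point rounding
    }
};

// Usage example matching the diagram in the email: three regular bins,
// underflow at index -1, overflow at index 3 == size().
void check_toy_axis() {
    const toy_regular_axis a{3, 0.0, 3.0};
    assert(a.size() == 3);
    assert(a.index(-0.5) == -1);        // below the first edge: underflow bin
    assert(a.index(0.0) == 0);          // lower edge belongs to bin 0
    assert(a.index(1.5) == 1);
    assert(a.index(2.999) == 2);
    assert(a.index(3.0) == a.size());   // upper edge falls into the overflow bin
    assert(a.index(10.0) == a.size());
}
```

In this sketch every input maps to a valid logical bin, so there is no "missing value" case for an optional type to represent, which is the point made in the reply.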
