Formal Review of Proposed Boost.Histogram Library Starts TODAY

Dear Boost Community, The formal review of Hans Dembinski's Histogram library starts TODAY, the September 17th, and lasts until September 26th, unless an extension occurs. =========== What is it? =========== Histogram is a header-only C++11 library, provides multi-dimensional histogram template class for counting, statistics and data exploration. The histogram class interface has been carefully designed to make the library * offer simple and convenient default behaviour * easy to learn and safe to use * convenient to use for one- as well as multi-dimensional analysis * compatible with STL algorithms * dependant only on C++11 standard library and optionally on Boost The author followed the "Don't pay for what you don't use" principle. The histogram class is customisable through policy classes, but the default policies were carefully crafted so that most users do not need to customize anything. The default policies guarantee the histogram to be safe to use as a black box, memory efficient, and very fast, where safety is related to overflow prevention and large values capping. This guarantees is one of unique features of the library in comparison to other implementations. The histogram class comes in two variants, which share a common interface: * static variant is optimised at compile-time to provide maximum performance, at the cost of reduced runtime flexibility and potentially larger executables; * dynamic variant is a bit slower, but configurable at run-time. Both variants of the histogram class provide optional serialization support implemented with Boost.Serialization. The library itself is header-only and the tests as well as examples can be built using both, Boost.Build and CMake. There are no dependencies required other than C++11 compiler. Optional dependencies include the GNU Scientific Library (GSL) and the ROOT framework from CERN which are needed the corresponding speed test programs used for performance comparison. A note on history: The library was first announced to the Boost community in May, 2016 [1] and the author received extensive feedback. Then, after a year of work posted second request for comments in April, 2017 [2]. =================== Getting the library =================== Checkout the `master` branch from the library git repository: https://github.com/HDembinski/histogram/ NOTE: The author follows the library submission process [3]: the master branch remains frozen for the period of the formal review, modifications in response to feedback should go to `develop` branch and big items may be worked and discussed via opening an issue on GitHub. Documentation: http://hdembinski.github.io/histogram/doc/html/ Website: http://hdembinski.github.io/histogram/ ================ Writing a review ================ Please, follow the official Boost formal review procedure [4]. If you feel this is an interesting library, then please submit your review to the Boost developer list (preferably), or the review manager to mateusz (at) loskot (dot) net. Here are some questions you might want to answer in your review: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? And finally, every review should answer this question: - Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. If needed, refer to the Boost Library Requirements and Guidelines [5]. Please, give constructive criticism! [1] https://lists.boost.org/Archives/boost/2016/05/229389.php [2] https://lists.boost.org/Archives/boost/2017/04/234410.php [3] https://www.boost.org/development/submissions.html#Review [4] https://www.boost.org/community/reviews.html [5] https://www.boost.org/development/requirements.html Mateusz Review Manager for the proposed Boost.Histogram -- Mateusz Loskot, http://mateusz.loskot.net

AMDG abstract.html: - "the one- and multi-dimensional case" "cases" should be plural. - "..submit this project to Boost, that's why..." Comma splice - Please don't overuse *bold text* acknowledgments.html - "converting the documentation, and adding Jamfiles" No comma here. motivation.html - "The GNU Scientific Library (GSL) and in the ROOT framework" Remove "in" - "The C implementations of the GSL are elegant..." Are there multiple implementations? - "The implementations are not customizable, you have to..." Comma splice build.html - Is there a reason that CMakeLists.txt is in build/ instead of test/? getting_started.html - "int main(int, char**) {" If you're not using the parameters, `int main()` is a valid signature. - Since you require C++11, you might as well use <random> instead of Boost.Random. The Boost implementation of mt19937 has better performance, but that doesn't really matter here. - "in addition, the factory also accepts iterators over a sequence of axis::any, the polymorphic type that can hold concrete axis types" The relationship between axis::any and axis::any_std (the type actually used) in unclear. guide.html: - "A histogram consists a number of..." "consists /of/ a number of..." - "which provides safe counting, is fast and memory efficient." There are only two elements, so use "and" instead of a comma or say "is fast, and /is/ memory efficient." - "Axes objects with different label do..." -> s/Axes/Axis/, s/label/labels/ - "...sequence of bin indices for each axes in order." s/axes/axis/ - I wonder whether you could handle the issue that h * 2 != h + h, by using h * weight(2) to make it clear that multiplication changes the weights of the samples instead of changing the number of samples. - "...remove some axes and look the equivalent lower..." "...look /at/ the equivalent..." - "so it is not worth keeping that axies around" s/axies/axis/ - "Users can create new axis classes ..., or implemented..." s/implemented/implement/ - "Here is a contrieved example of a..." s/contrieved/contrived/ - "The guarantees ... are only valid, if the default..." No comma. - "... you need know what you are doing" "you need /to/ know" benchmarks.html: - No GSL? - You say uniform and normal distributions, but there's only a single plot. Does this mean that you are putting both uniform and normal variates into a single histogram? rationale.html - "Axis access" Not the most fortuitous word combination. - "The buildin storage types..." s/buildin/builtin/ - It seems that using weighted fills breaks the special overflow handling properties of adaptive_storage. I think that this deserves a distinct note. - "Why is Boost.Histogram not build on top..." s/build/built/ concepts.html: - Concept names use CamelCase, by convention. - It doesn't seem very sane to have the value_type be a reference. Is there any reason for this other than the fact that you specify the exact signature of axis_type::index? - I would prefer to set the size of a storage_type in the constructor instead of using reset, if that's possible. - I don't understand the difference between the two overloads of storage_type::add. reference.html: - Unfortunately forward declarations seem to mess up the doxygen/boostbook toolchain. As a result a number of components are listed in histogram_fwd.hpp I'll look into this. - I'll deal with the reference content along with the code. I will note that the reference documentation seems a bit lacking overall as there are many components without any description. General: - The passive voice is overused. In Christ, Steven Watanabe

Dear Steven, thank you very much for the careful reading of the docs and sorry for the overuse of bold text. Now that people actually notice the project I can stop shouting. I will implement all these wording mistakes that you point out. A few clarifcations below: abstract.html:
- "The C implementations of the GSL are elegant..." Are there multiple implementations?
Ok, one implementation per set, but there are two sets of functions, one set for 1d histograms and one set for 2d histograms. - Is there a reason that CMakeLists.txt is in
build/ instead of test/?
Mateusz moved the CMakeLists.txt into the root directory which is better/more straight forward and follows the example of hana and compute. We could also move it to test as you suggest, but I prefer it in the root directory. I read in some Boost guides (I think it was on the blincubator, but I can't find the source just now and blincubator.com gives me a 403 error today) that all build scripts should reside in the build directory or next to the source. So I put it in "build". - "in addition, the factory also accepts iterators over a sequence of
axis::any, the polymorphic type that can hold concrete axis types" The relationship between axis::any and axis::any_std (the type actually used) in unclear.
True, this is not explained well. axis::any is a closed-set polymorphic type derived from boost::variant and templated on a list of types that the polymorphic type should be able to represent. axis::any_std is a concrete type where the type list is filled with standard axis types which users are expected to use most frequently.
- I wonder whether you could handle the issue that h * 2 != h + h, by using h * weight(2) to make it clear that multiplication changes the weights of the samples instead of changing the number of samples.
If I understand correctly, you would only allow scaling by a weight type, not by an arbitrary number. That makes sense and is more explicit. Sounds like a good idea.
benchmarks.html:
- No GSL?
There are GSL benchmarks for 1D and 2D, where our static variants are faster. The GSL does not provide histograms for more than two dimensions. https://www.gnu.org/software/gsl/manual/html_node/Histograms.html
- You say uniform and normal distributions, but there's only a single plot. Does this mean that you are putting both uniform and normal variates into a single histogram?
Good point, this is a mistake in my plotting script. I originally wanted to plot both results but ended up plotting only one. The benchmarks for uniform and normal data are slightly different, because they trigger the branches in the axis code differently. By design of the benchmark, branch prediction works better for the uniform data.
- "Axis access" Not the most fortuitous word combination.
As long as you don't try to say it out loud... ;) - It seems that using weighted fills breaks the special
overflow handling properties of adaptive_storage. I think that this deserves a distinct note.
Right.
- It doesn't seem very sane to have the value_type be a reference. Is there any reason for this other than the fact that you specify the exact signature of axis_type::index?
This is a mistake, the method is not required to accept a reference, pass-by-value is also possible. How should I write it to indicate that passing by value or reference is possible?
- I would prefer to set the size of a storage_type in the constructor instead of using reset, if that's possible.
I was flip-flopping on that one. A call `value.reset(n)` can be replaced by `value = storage_type(n)`. I don't see a good reason to prefer one over the other, do you?
- I don't understand the difference between the two overloads of storage_type::add.
I should explain better, the whole section about concepts is very brief. The first version of "add" is used when you add two histograms and integral counts are added to integral counts. The second version of add - which accepts the weight type - is used when you increment the counter not by 1, but by a weight. In an earlier stage of this library, the corresponding method was called `increment_by_weight`. The second overload is required because it causes an internal transformation from integral counters into dual counts which track the sum of weights and the sum of weights squared in case of the adaptive_storage (so that a variance estimate can be computed). This transformation is not needed when you simply add two histograms.
- Unfortunately forward declarations seem to mess up the doxygen/boostbook toolchain. As a result a number of components are listed in histogram_fwd.hpp I'll look into this.
Thanks, I am glad for any help on this issue.
- I'll deal with the reference content along with the code. I will note that the reference documentation seems a bit lacking overall as there are many components without any description.
True, just like the concepts it got a bit less love.
- The passive voice is overused.
And in the last paper that I submitted to a journal, they complained about my use of active voice... On Mon, Sep 17, 2018 at 10:58 PM Steven Watanabe via Boost < boost@lists.boost.org> wrote:
AMDG
abstract.html:
- "the one- and multi-dimensional case" "cases" should be plural.
- "..submit this project to Boost, that's why..." Comma splice
- Please don't overuse *bold text*
acknowledgments.html
- "converting the documentation, and adding Jamfiles" No comma here.
motivation.html
- "The GNU Scientific Library (GSL) and in the ROOT framework" Remove "in"
- "The C implementations of the GSL are elegant..." Are there multiple implementations?
- "The implementations are not customizable, you have to..." Comma splice
build.html
- Is there a reason that CMakeLists.txt is in build/ instead of test/?
getting_started.html
- "int main(int, char**) {" If you're not using the parameters, `int main()` is a valid signature.
- Since you require C++11, you might as well use <random> instead of Boost.Random. The Boost implementation of mt19937 has better performance, but that doesn't really matter here.
- "in addition, the factory also accepts iterators over a sequence of axis::any, the polymorphic type that can hold concrete axis types" The relationship between axis::any and axis::any_std (the type actually used) in unclear.
guide.html:
- "A histogram consists a number of..." "consists /of/ a number of..."
- "which provides safe counting, is fast and memory efficient." There are only two elements, so use "and" instead of a comma or say "is fast, and /is/ memory efficient."
- "Axes objects with different label do..." -> s/Axes/Axis/, s/label/labels/
- "...sequence of bin indices for each axes in order." s/axes/axis/
- I wonder whether you could handle the issue that h * 2 != h + h, by using h * weight(2) to make it clear that multiplication changes the weights of the samples instead of changing the number of samples.
- "...remove some axes and look the equivalent lower..." "...look /at/ the equivalent..."
- "so it is not worth keeping that axies around" s/axies/axis/
- "Users can create new axis classes ..., or implemented..." s/implemented/implement/
- "Here is a contrieved example of a..." s/contrieved/contrived/
- "The guarantees ... are only valid, if the default..." No comma.
- "... you need know what you are doing" "you need /to/ know"
benchmarks.html:
- No GSL?
- You say uniform and normal distributions, but there's only a single plot. Does this mean that you are putting both uniform and normal variates into a single histogram?
rationale.html
- "Axis access" Not the most fortuitous word combination.
- "The buildin storage types..." s/buildin/builtin/
- It seems that using weighted fills breaks the special overflow handling properties of adaptive_storage. I think that this deserves a distinct note.
- "Why is Boost.Histogram not build on top..." s/build/built/
concepts.html:
- Concept names use CamelCase, by convention.
- It doesn't seem very sane to have the value_type be a reference. Is there any reason for this other than the fact that you specify the exact signature of axis_type::index?
- I would prefer to set the size of a storage_type in the constructor instead of using reset, if that's possible.
- I don't understand the difference between the two overloads of storage_type::add.
reference.html:
- Unfortunately forward declarations seem to mess up the doxygen/boostbook toolchain. As a result a number of components are listed in histogram_fwd.hpp I'll look into this.
- I'll deal with the reference content along with the code. I will note that the reference documentation seems a bit lacking overall as there are many components without any description.
General:
- The passive voice is overused.
In Christ, Steven Watanabe
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

AMDG On 09/18/2018 02:04 PM, Hans Dembinski via Boost wrote:
<snip lots>
- Is there a reason that CMakeLists.txt is in build/ instead of test/?
Mateusz moved the CMakeLists.txt into the root directory which is better/more straight forward and follows the example of hana and compute. We could also move it to test as you suggest, but I prefer it in the root directory. I read in some Boost guides (I think it was on the blincubator, but I can't find the source just now and blincubator.com gives me a 403 error today) that all build scripts should reside in the build directory or next to the source. So I put it in "build".
That's not the correct interpretation of the guideline (perhaps it needs to be stated more clearly as you're not the first to make that mistake). build/ is specifically for the scripts that build a separately compiled library, which you do not need. Note that the corresponding Jamfile, which does essentially the same thing, lives in test/ and this is required for integration. I would probably put CMakeLists.txt in test/ with an (optional) root CMakeLists.txt that just calls add_subdirectory.
- It doesn't seem very sane to have the value_type be a reference. Is there any reason for this other than the fact that you specify the exact signature of axis_type::index?
This is a mistake, the method is not required to accept a reference, pass-by-value is also possible. How should I write it to indicate that passing by value or reference is possible?
I would probably handle it with an "as if" clause. That is to say, concrete implementations need to handle every usage that is legal for pass-by-value, but do not necessarily need to actually use pass-by-value. (To be strictly formally correct, this may require additional restrictions like forbidding usage via a pointer-to-member-function, which can distinguish the exact argument type.)
- I would prefer to set the size of a storage_type in the constructor instead of using reset, if that's possible.
I was flip-flopping on that one. A call `value.reset(n)` can be replaced by `value = storage_type(n)`. I don't see a good reason to prefer one over the other, do you?
After starting to look at the source, I think reset, as you have it, is better, because it allows the storage_type to carry additional state safely (e.g. an allocator).
- The passive voice is overused.
And in the last paper that I submitted to a journal, they complained about my use of active voice...
I did also notice that you used the second person quite a bit, which gives a somewhat informal, conversational feel. This is fine for tutorials but may not be appropriate in all circumstances. In Christ, Steven Watanabe

On 18. Sep 2018, at 22:54, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
That's not the correct interpretation of the guideline (perhaps it needs to be stated more clearly as you're not the first to make that mistake). build/ is specifically for the scripts that build a separately compiled library, which you do not need. Note that the corresponding Jamfile, which does essentially the same thing, lives in test/ and this is required for integration. I would probably put CMakeLists.txt in test/ with an (optional) root CMakeLists.txt that just calls add_subdirectory.
Ok, Mateusz actually suggested the same thing. It was not a mistake originally. Until Saturday, a separately compiled library was part of histogram, the Python module which has been removed, because we concluded that the Python frontend is better handled as a separate project.

On Tue, 18 Sep 2018 at 22:54, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
On 09/18/2018 02:04 PM, Hans Dembinski via Boost wrote:
<snip lots>
- Is there a reason that CMakeLists.txt is in build/ instead of test/?
Mateusz moved the CMakeLists.txt into the root directory which is better/more straight forward and follows the example of hana and compute. We could also move it to test as you suggest, but I prefer it in the root directory. I read in some Boost guides (I think it was on the blincubator, but I can't find the source just now and blincubator.com gives me a 403 error today) that all build scripts should reside in the build directory or next to the source. So I put it in "build".
That's not the correct interpretation of the guideline (perhaps it needs to be stated more clearly as you're not the first to make that mistake).
I believe it deserves clarification https://www.boost.org/development/requirements.html says - build - Library build files such as a Jamfile, IDE projects, Makefiles, Cmake files, etc. - Required if the library has sources to build. It is unclear if the "sources to build" are meant strictly as the library compilation input sources only, or also auxiliary sources of the library as a project (ie. tests, examples).
I would probably put CMakeLists.txt in test/ with an (optional) root CMakeLists.txt that just calls add_subdirectory.
I tend to put the root CMakeLists.txt - it is convenient to drive build of everything in the library - it indicates if the library is CMake-enabled, without browsing deeper Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

On Wed, 19 Sep 2018 at 13:22, Seth via Boost <boost@lists.boost.org> wrote:
On 17-09-18 22:58, Steven Watanabe via Boost wrote:
- The passive voice is overused.
I spit my coffee there. Thanks
Please, keep sich off-topic messages out of the review threads. Thank you! -- Mateusz Loskot, http://mateusz.loskot.net

On 19-09-18 13:34, Mateusz Loskot via Boost wrote:
Please, keep sich off-topic messages out of the review threads.
Thank you!
I'm sorry. I should have been more to the point. My actual gripe with the message would have been that it presented immaterial feedback in an almost unusable form (a PR would be some much more effective for textual changes) and also seemed to be missing any thoughts on the merit of the library. But, I admit I burnt all my time reading through that, and perhaps the material feedback appeared elsewhere in the thread. For lack of time, I'll just bow out

Hi Seth, I appreciate your understanding. I agree with your comment though. Hopefully, Hans will request some more details. Regards, Mateusz On Wed, 19 Sep 2018 at 14:12, Seth via Boost <boost@lists.boost.org> wrote:
On 19-09-18 13:34, Mateusz Loskot via Boost wrote:
Please, keep sich off-topic messages out of the review threads.
Thank you!
I'm sorry. I should have been more to the point. My actual gripe with the message would have been that it presented immaterial feedback in an almost unusable form (a PR would be some much more effective for textual changes) and also seemed to be missing any thoughts on the merit of the library. But, I admit I burnt all my time reading through that, and perhaps the material feedback appeared elsewhere in the thread. For lack of time, I'll just bow out
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Mateusz Loskot, http://mateusz.loskot.net

AMDG arithmetic_operators.hpp: 16: Returning an rvalue reference is tempting, but it can cause issues in some cases because it prevents the lifetime of the temporary from being extended in certain cases. See temporary-lifetime.cpp. 42: histogram<A, S>&& operator*(histogram<A, S>&& a, const double x) histogram::operator*= uses scale_type. I don't really care whether you use scale_type or double, but please be consistent. If you're hard-coding double here, then there's no point to making the internals more flexible. histogram.hpp: 36: What are the requirements on the Axes parameter? I'm deducing that it must be either a std::tuple or a std::vector containing Axes, but I didn't see this explicitly stated anywhere. 60: histogram& operator=(const histogram<A, S>& rhs) This isn't exception safe unless axes_assign is nothrow--which it isn't. 107: std::size_t dim() const noexcept This can probably be constexpr as well. 207: // precondition: argument sequence must be strictly ascending axis indices This is exactly the sort of information that needs to appear in the documentation. Use \pre. Also it would be nice if this weren't a requirement, and reduce_to could shuffle the axes. Does/should it work in the degenerate case of reducing a histogram to itself? 234: BOOST_ASSERT_MSG(begin == end || Will reduce_to even work if begin == end? If nothing else, I think it will fail in the do/while on L247 Also, you're only checking the end, but the code seems to permit signed integers. (Please specify the restrictions on the iterator's value_type clearly). 261: friend class python_access; I don't see a forward declaration of this. Please note that the semantics of this can be exceedingly strange when there is no prior declaration. See friend.cpp 321: auto axes = typename H::axes_type(s.get_allocator()); axes.reserve(std::distance(begin, end)); while (begin != end) axes.emplace_back(*begin++); This can all be reduced to one line: auto axes = typename H::axes_type(begin, end, s.get_allocator()) histogram_fwd.hpp: N/A iterator.hpp: 27: iterator_over(const iterator_over& o) It's probably worth a comment that the cache is not copied, as I almost suggested defaulting this. 66: void advance(int n) The difference_type is ptrdiff_t. - ForwardIterators and higher must be default constructable. - distance_to is not implemented. literals.hpp: - The standard guarantees that the digits are consecutive, so char2int can be just (C - '0'). ostream_operators.hpp: - You don't actually need ostream. iosfwd is enough, especially since you're only #including histogram_fwd.hpp, and not histogram.hpp. - It looks like the format includes a trailing comma, which seems a little odd when you go to the trouble of not adding a new line for an empty histogram. serialization.hpp: 77: if (Archive::is_loading::value) { S::apply(typename S::destroyer(), s.buffer); } This isn't exception-safe, as creating a new buffer can throw leaving a dangling pointer behind. 128: if (Archive::is_loading::value) { this->~variable(); } Don't call the destructor like this. - This does not support XML archives. - serialize for user defined axis types seems error prone, because labelled_base::serialize is inherited. If the user forgets to add serialize, the code will compile and silently slice the object. weight.hpp: 12: namespace detail { Be careful about exposing types from namespace detail in any way. It allows ADL in client code to look inside your detail namespace. 26: detail::weight_type<T> weight(T&& t) { return {t}; No forwarding? The same goes for sample. axis/any.hpp: 242: explicit operator const base&() const { The name `base` is potentially confusing as there is also a base_type. I suggest qualifying it as `axis::base` axis/base.hpp: - I only see operator==, but not operator!=. axis/interval_view.hpp: 33: return axis_.lower(idx_ + 1); The requirement that an axis needs to provide lower to work with interval_view is not documented. Neither is the exact meaning of lower. In particular, it needs to work with a past-the-end index. The actual implementations seem to work for any index, but I'm not sure that the result is actually meaningful if the index is out-of-bounds. 39: operator== Should two interval_views be considered to be the same or different if they refer to distinct, but equal axes? - Actually, I find interval_view somewhat unconvincing. Why can't it just be a pure value that stores the upper and lower bounds? The value_type is usually going to be a built-in type anyway, not to mention that you haven't documented interval_view sufficiently for it to be used by user-defined axes. axis/iterator.hpp: - You're using the default difference_type = ptrdiff_t, but advance and distance_to use int. - Does a separate reverse_iterator_over have any benefit over std::reverse_iterator? - It seems inconsistent for axes to have rbegin/rend, when histogram itself doesn't. axis/ostream_operators.hpp: 22: Calling the function to_string is somewhat misleading, because what you're actually getting is a string formatted for appending to something else. 26: escape_string doesn't escape `\`. 138: Should this specialize for basic_string rather than just string? axis/types.hpp: - The interface for a Transform isn't documented. 236: static_cast<int>(std::floor(z * base_type::size())) The cast to int might overflow for large values of z. 332,350: this->~variable(); Don't call the destructor in the assignment operator. 435: if (i < 0) { return -std::numeric_limits<value_type>::max(); } Did you mean min? - Is there a reason that you're not just using std::vector for variable and category? 429: const int z = x - min_; Both x and min_ have type value_type, not int. 511+529: this->~category(); Just no. 568: mutable int last_ = 0; mutable is dangerous as it means that non-mutating functions cannot safely be run concurrently without syncronization. (The other place that uses mutable, iterator_over, is safer because iterators should usually not be shared in the first place.) axis/value_view.hpp: - As with interval_view, I don't see any need to keep a reference to the axis. detail/axes.hpp: - Don't use the unnamed namespace in headers. 41: equal &= (tp && *tp == std::get<N::value>(t)); This doesn't short-circuit on equal. 63: std::get<N::value>(t) = static_cast<const T&>(v[N::value]); Using an explicit conversion operator may not be the best choice here. My first reaction was to flag this as potential UB because of the unchecked cast. I had to look through the definition of any to realize that it is in fact checked and can throw. 122: t.resize(sizeof...(Us)); reserve+push_back is slightly better than resize+assign. 220: value *= t.shape(); It might be wise to check for integer overflow here. 521: optional_index call_impl(Tag, const std::tuple<T>& axes, It seems that this specialization prevents using a single element container with a single element static histogram, which seems inconsistent with the documentation and the behavior for dynamic histograms. detail/buffer.hpp: 42: You're assuming that the iterator operations themselves do not throw, which isn't guaranteed. 86: std::is_nothrow_constructible<T, U>() This doesn't match the constructor call. detail/cat.hpp: - unnamed namespace again. ... Skip a few files with no comments. ... storage/adaptive_storage.hpp: 299: b.set(b.template create<T>(optr)); This isn't exception-safe (increaser does it right). storage/operators.hpp: - These generic comparison operators seem really dangerous unless you make detail::requires_storage much more precise. In Christ, Steven Watanabe

Dear Steven, thank you again for the careful reading of the implementation. The bugs will be fixed. I comment on some issues below.
On 25. Sep 2018, at 02:22, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
AMDG
arithmetic_operators.hpp: […] 42: histogram<A, S>&& operator*(histogram<A, S>&& a, const double x) histogram::operator*= uses scale_type. I don't really care whether you use scale_type or double, but please be consistent. If you're hard-coding double here, then there's no point to making the internals more flexible.
scale_type was added recently and I forgot to update operator*
histogram.hpp:
36: What are the requirements on the Axes parameter? I'm deducing that it must be either a std::tuple or a std::vector containing Axes, but I didn't see this explicitly stated anywhere.
It is a half-hidden implementation aspect, also see corresponding discussion with Alex. It should be better documented and the histogram class needs a static_assert for the two allowed configurations. Perhaps it can be completely hidden, I have to think about it more.
207: // precondition: argument sequence must be strictly ascending axis indices This is exactly the sort of information that needs to appear in the documentation. Use \pre. Also it would be nice if this weren't a requirement, and reduce_to could shuffle the axes.
Agreed, this is missing a static_assert. It is a requirement to be consistent with the run-time version in line 231. In case of the latter, shuffling the indices into correct order has a significant run-time cost: a copy and a full iteration over the indices. Both can be avoided if the indices are already sorted.
Does/should it work in the degenerate case of reducing a histogram to itself?
Yes.
234: BOOST_ASSERT_MSG(begin == end || Will reduce_to even work if begin == end?
Good point, the assert should fail if begin == end.
261: friend class python_access; I don't see a forward declaration of this. Please note that the semantics of this can be exceedingly strange when there is no prior declaration. See friend.cpp
Interesting, thanks for the explanation why this should always be forward declared.
128: if (Archive::is_loading::value) { this->~variable(); } Don't call the destructor like this.
How should I call it then?
- serialize for user defined axis types seems error prone, because labelled_base::serialize is inherited. If the user forgets to add serialize, the code will compile and silently slice the object.
How can I avoid this?
weight.hpp:
12: namespace detail { Be careful about exposing types from namespace detail in any way. It allows ADL in client code to look inside your detail namespace.
So better make weight_type and sample_type public?
26: detail::weight_type<T> weight(T&& t) { return {t}; No forwarding? The same goes for sample.
Right, that's bad.
axis/base.hpp:
- I only see operator==, but not operator!=.
Axes types are not required to have operator!=.
axis/interval_view.hpp:
39: operator== Should two interval_views be considered to be the same or different if they refer to distinct, but equal axes?
I don't remember now why I made interval_views equal-comparable. If it is only used by the unit-tests, I should define the equal operator locally in the text. I think there is no need for interval_views to be comparable.
- Actually, I find interval_view somewhat unconvincing. Why can't it just be a pure value that stores the upper and lower bounds? The value_type is usually going to be a built-in type anyway, not to mention that you haven't documented interval_view sufficiently for it to be used by user-defined axes.
Let's say an interval is returned as a std::pair<value_type, value_type>. Every time you call axis[k], axis::lower(k) and axis::lower(k+1) are called to fill both values of the pair. Now assume you want to fill a std::vector with all the bin edges (a realistic use case). You iterate over the axis, get the intervals p, and append p.first to the vector. When you reach the last bin, you also append p.second. All the other p.seconds are filled while iterating over the axis, but not used. For N bins, lower() is called 2 N times while only N+1 times the value is actually used. This is very inefficient. interval_view was my solution to avoid this superfluous calling of lower(). Is the compiler is smart enough to avoid the superfluous filling of values that are not used? I don't know, maybe. If so then I would also prefer the simpler solution. Another nice aspect of interval_view: it is explicitly convertible to the bin index. This allows you to run a range-based for-loop over an axis and query the histogram directly with these instances for (auto bin : histogram.axis(0)) { const auto value = histogram.at(bin); // only works because "bin" knows index }
axis/iterator.hpp:
- Does a separate reverse_iterator_over have any benefit over std::reverse_iterator?
Yes, but I forgot the reason.
- It seems inconsistent for axes to have rbegin/rend, when histogram itself doesn't.
It makes sense to reverse-iterate over an axis, but it makes no sense to reverse-iterate over a histogram. The iteration order over the counters in a histogram is an implementation detail.
332,350: this->~variable(); Don't call the destructor in the assignment operator.
Why not? See similar question above.
- Is there a reason that you're not just using std::vector for variable and category?
Yes, sizeof(detail::buffer<T>) < sizeof(std::vector<T>) and axis types should be as small as possible. We don't need growth, so there no need to store a capacity in addition to a size.
detail/axes.hpp:
521: optional_index call_impl(Tag, const std::tuple<T>& axes, It seems that this specialization prevents using a single element container with a single element static histogram, which seems inconsistent with the documentation and the behavior for dynamic histograms.
Correct, because this case is ambiguous. Let's say you have a 1d-histogram h1 and 2d-histogram h2. Both are dynamic histograms and you fill them. h2(1, 2); // ok h2(std::make_tuple(1, 2)); // ok, one argument must represent two values h1(1); // ok h1(std::make_tuple(1)); // ambiguous! h1 cannot know at compile-time whether it should unpack the tuple or pass it to the axis unchanged. Perhaps you have a weird axis that accepts tuples as value_type. I had to make a decision here and the most reasonable one is to pass single arguments of 1d histograms unchanged to the axis. Best regards, Hans

AMDG On 09/25/2018 01:34 PM, Hans Dembinski wrote:
128: if (Archive::is_loading::value) { this->~variable(); } Don't call the destructor like this.
How should I call it then?
Don't. You should only call the destructor explicitly if you intend to end the object's lifetime. In particular, it should only be used for objects that were constructed with placement new. The code has undefined behavior, because you are manipulating an object that does not exist (because it has been destroyed). Even if you ignore that problem, it's not exception-safe.
- serialize for user defined axis types seems error prone, because labelled_base::serialize is inherited. If the user forgets to add serialize, the code will compile and silently slice the object.
How can I avoid this?
The only simple solution is: don't use inheritance. It's also probably possible to use the non-member version of serialize, somehow. Note: The same issue appears for other members as well, but I don't really consider it a problem, because they aren't optional, and are, therefore, less prone to being forgotten than serialize.
weight.hpp:
12: namespace detail { Be careful about exposing types from namespace detail in any way. It allows ADL in client code to look inside your detail namespace.
So better make weight_type and sample_type public?
Or in their own namespace.
axis/base.hpp:
- I only see operator==, but not operator!=.
Axes types are not required to have operator!=.
That doesn't prevent you providing it, and it is more friendly to do so.
- Actually, I find interval_view somewhat unconvincing. Why can't it just be a pure value that stores the upper and lower bounds? The value_type is usually going to be a built-in type anyway, not to mention that you haven't documented interval_view sufficiently for it to be used by user-defined axes.
Let's say an interval is returned as a std::pair<value_type, value_type>. Every time you call axis[k], axis::lower(k) and axis::lower(k+1) are called to fill both values of the pair.
Now assume you want to fill a std::vector with all the bin edges (a realistic use case). You iterate over the axis, get the intervals p, and append p.first to the vector. When you reach the last bin, you also append p.second. All the other p.seconds are filled while iterating over the axis, but not used. For N bins, lower() is called 2 N times while only N+1 times the value is actually used. This is very inefficient.
interval_view was my solution to avoid this superfluous calling of lower(). Is the compiler is smart enough to avoid the superfluous filling of values that are not used? I don't know, maybe. If so then I would also prefer the simpler solution.
It should be possible. I would certainly assume, unless and until proven otherwise, that the compiler can optimize it as long as no reference or copy of the interval_view escapes from the immediate context.
Another nice aspect of interval_view: it is explicitly convertible to the bin index. This allows you to run a range-based for-loop over an axis and query the histogram directly with these instances
for (auto bin : histogram.axis(0)) { const auto value = histogram.at(bin); // only works because "bin" knows index }
That doesn't strictly require you to maintain a reference to the axis.
- Is there a reason that you're not just using std::vector for variable and category?
Yes, sizeof(detail::buffer<T>) < sizeof(std::vector<T>) and axis types should be as small as possible. We don't need growth, so there no need to store a capacity in addition to a size.
Why do the axis types need to be small? Even if you waste a bit of memory, it should be insignificant compared to the actual data, right? Also, if you're that concerned about memory usage, the label costs at least as much.
detail/axes.hpp:
521: optional_index call_impl(Tag, const std::tuple<T>& axes, It seems that this specialization prevents using a single element container with a single element static histogram, which seems inconsistent with the documentation and the behavior for dynamic histograms.
Correct, because this case is ambiguous. Let's say you have a 1d-histogram h1 and 2d-histogram h2. Both are dynamic histograms and you fill them.
h2(1, 2); // ok h2(std::make_tuple(1, 2)); // ok, one argument must represent two values h1(1); // ok h1(std::make_tuple(1)); // ambiguous!
h1 cannot know at compile-time whether it should unpack the tuple or pass it to the axis unchanged. Perhaps you have a weird axis that accepts tuples as value_type.
I had to make a decision here and the most reasonable one is to pass single arguments of 1d histograms unchanged to the axis.
Shouldn't the same reasoning apply to dynamic histograms as well? This definitely needs to be documented. The documentation says that you can pass a container, without saying that one argument is an exception. In Christ, Steven Watanabe

Hi Steven,
On 25. Sep 2018, at 22:45, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
AMDG
On 09/25/2018 01:34 PM, Hans Dembinski wrote:
128: if (Archive::is_loading::value) { this->~variable(); } Don't call the destructor like this.
How should I call it then?
Don't. You should only call the destructor explicitly if you intend to end the object's lifetime. In particular, it should only be used for objects that were constructed with placement new. The code has undefined behavior, because you are manipulating an object that does not exist (because it has been destroyed). Even if you ignore that problem, it's not exception-safe.
I googled a bit but couldn't find that there is anything special about calling a destructor compared to calling other methods. https://stackoverflow.com/questions/14187006/is-calling-destructor-manually-... I want to deallocate the memory of axis::variable here, so I call the destructor manually, which does that. The alternative is to either repeat the deallocation code in the serialize function (error prone), or move the deallocation code into another method which is then called by the destructor and the serialize function. I don't see the gain unless something special happens when you call destructors.
Yes, sizeof(detail::buffer<T>) < sizeof(std::vector<T>) and axis types should be as small as possible. We don't need growth, so there no need to store a capacity in addition to a size.
Why do the axis types need to be small? Even if you waste a bit of memory, it should be insignificant compared to the actual data, right? Also, if you're that concerned about memory usage, the label costs at least as much.
The histogram iterators over axis types a lot. If they are small they are more likely to fit onto a cache line. When you use a dynamic histogram, the size of axis::any<…> is size of largest bounded axis type plus some bytes for the type index. The largest bounded axis type determines the size of axis::any, so I was generally careful in making the larger axis types not larger than necessary. True, labels have a size cost of several pointers, but having labels is more important than reducing the memory footprint, since they are very useful in any non-trivial program with more than one kind of histogram. I had a special tiny_string implementation at some point with the size of one pointer. It even had small-string-optimisation, you can find it here: https://github.com/HDembinski/stateful_pointer Then I decided that adding a custom string type is overkill and used boost::container::string instead. It is not overkill to use detail::buffer here instead of std::vector, since detail::buffer is anyway used by adaptive_storage. Replacing it here with std::vector does not make detail::buffer obsolete.
detail/axes.hpp:
521: optional_index call_impl(Tag, const std::tuple<T>& axes, It seems that this specialization prevents using a single element container with a single element static histogram, which seems inconsistent with the documentation and the behavior for dynamic histograms.
Correct, because this case is ambiguous. Let's say you have a 1d-histogram h1 and 2d-histogram h2. Both are dynamic histograms and you fill them.
h2(1, 2); // ok h2(std::make_tuple(1, 2)); // ok, one argument must represent two values h1(1); // ok h1(std::make_tuple(1)); // ambiguous!
h1 cannot know at compile-time whether it should unpack the tuple or pass it to the axis unchanged. Perhaps you have a weird axis that accepts tuples as value_type.
I had to make a decision here and the most reasonable one is to pass single arguments of 1d histograms unchanged to the axis.
Shouldn't the same reasoning apply to dynamic histograms as well? This definitely needs to be documented. The documentation says that you can pass a container, without saying that one argument is an exception.
static and dynamic histograms have the exact same behaviour in this regard. Agreed, this has to be documented.

AMDG On 09/25/2018 03:44 PM, Hans Dembinski wrote:
On 25. Sep 2018, at 22:45, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
On 09/25/2018 01:34 PM, Hans Dembinski wrote:
128: if (Archive::is_loading::value) { this->~variable(); } Don't call the destructor like this.
How should I call it then?
Don't. You should only call the destructor explicitly if you intend to end the object's lifetime. In particular, it should only be used for objects that were constructed with placement new. The code has undefined behavior, because you are manipulating an object that does not exist (because it has been destroyed). Even if you ignore that problem, it's not exception-safe.
I googled a bit but couldn't find that there is anything special about calling a destructor compared to calling other methods. https://stackoverflow.com/questions/14187006/is-calling-destructor-manually-...
I want to deallocate the memory of axis::variable here, so I call the destructor manually, which does that. The alternative is to either repeat the deallocation code in the serialize function (error prone), or move the deallocation code into another method which is then called by the destructor and the serialize function. I don't see the gain unless something special happens when you call destructors.
Destructors (along with constructors) are special. "The lifetime of an object of type T ends when ... the destructor call starts" "The properties ascribed to objects throughout this International Standard apply for a given object only during its lifetime" [basic.life] (C++11) And it gets worse. You're not just deallocating the buffer. You're also calling the destructors of all subobjects, including the label. The only thing that saves you from a double free is that you're using small labels that fit in the internal buffer in your tests.
Yes, sizeof(detail::buffer<T>) < sizeof(std::vector<T>) and axis types should be as small as possible. We don't need growth, so there no need to store a capacity in addition to a size.
Why do the axis types need to be small? Even if you waste a bit of memory, it should be insignificant compared to the actual data, right? Also, if you're that concerned about memory usage, the label costs at least as much.
The histogram iterators over axis types a lot. If they are small they are more likely to fit onto a cache line. When you use a dynamic histogram, the size of axis::any<…> is size of largest bounded axis type plus some bytes for the type index. The largest bounded axis type determines the size of axis::any, so I was generally careful in making the larger axis types not larger than necessary.
True, labels have a size cost of several pointers, but having labels is more important than reducing the memory footprint, since they are very useful in any non-trivial program with more than one kind of histogram.
I had a special tiny_string implementation at some point with the size of one pointer. It even had small-string-optimisation, you can find it here: https://github.com/HDembinski/stateful_pointer
Then I decided that adding a custom string type is overkill and used boost::container::string instead.
It is not overkill to use detail::buffer here instead of std::vector, since detail::buffer is anyway used by adaptive_storage. Replacing it here with std::vector does not make detail::buffer obsolete.
My real complaint is not so much overkill as the difficulty of getting manual memory management right, which makes the code more error-prone and harder to understand. I believe that the problems I have pointed out are a sufficient demonstration of this.
detail/axes.hpp:
521: optional_index call_impl(Tag, const std::tuple<T>& axes, It seems that this specialization prevents using a single element container with a single element static histogram, which seems inconsistent with the documentation and the behavior for dynamic histograms.
Correct, because this case is ambiguous. Let's say you have a 1d-histogram h1 and 2d-histogram h2. Both are dynamic histograms and you fill them.
h2(1, 2); // ok h2(std::make_tuple(1, 2)); // ok, one argument must represent two values h1(1); // ok h1(std::make_tuple(1)); // ambiguous!
h1 cannot know at compile-time whether it should unpack the tuple or pass it to the axis unchanged. Perhaps you have a weird axis that accepts tuples as value_type.
I had to make a decision here and the most reasonable one is to pass single arguments of 1d histograms unchanged to the axis.
Shouldn't the same reasoning apply to dynamic histograms as well? This definitely needs to be documented. The documentation says that you can pass a container, without saying that one argument is an exception.
static and dynamic histograms have the exact same behaviour in this regard.
That may be what you intended, but it isn't what the code actually does. I can pass a one-element tuple to a dynamic histogram and it will be unpacked. Similarly, I can pass a (one element) vector to a (one axis) dynamic histogram and it will work. The latter has to be supported as the size cannot be detected at compile time.
Agreed, this has to be documented.
In Christ, Steven Watanabe

On 26. Sep 2018, at 00:42, Steven Watanabe via Boost <boost@lists.boost.org> wrote: Destructors (along with constructors) are special.
"The lifetime of an object of type T ends when ... the destructor call starts" "The properties ascribed to objects throughout this International Standard apply for a given object only during its lifetime" [basic.life] (C++11)
And it gets worse. You're not just deallocating the buffer. You're also calling the destructors of all subobjects, including the label. The only thing that saves you from a double free is that you're using small labels that fit in the internal buffer in your tests.
Sorry, you are right. Additional proof: https://wandbox.org/permlink/CtfJSrGZi3jCAk58 Ok, no calling of destructors.
It is not overkill to use detail::buffer here instead of std::vector, since detail::buffer is anyway used by adaptive_storage. Replacing it here with std::vector does not make detail::buffer obsolete.
My real complaint is not so much overkill as the difficulty of getting manual memory management right, which makes the code more error-prone and harder to understand. I believe that the problems I have pointed out are a sufficient demonstration of this.
Point taken and I learned my lesson.
static and dynamic histograms have the exact same behaviour in this regard.
That may be what you intended, but it isn't what the code actually does. I can pass a one-element tuple to a dynamic histogram and it will be unpacked. Similarly, I can pass a (one element) vector to a (one axis) dynamic histogram and it will work. The latter has to be supported as the size cannot be detected at compile time.
Ok, that's a bug. I thought I unit-tested this case, but I didn't. Will be fixed. https://github.com/HDembinski/histogram/issues/106

On Wed, 26 Sep 2018 at 11:17, Hans Dembinski via Boost < boost@lists.boost.org> wrote:
Ok, no calling of destructors.
You should, iff you use placement new to construct an object in uninitialized memory, that's what Steven started with saying. degski -- *“If something cannot go on forever, it will stop" - Herbert Stein*

On 26. Sep 2018, at 10:41, degski <degski@gmail.com> wrote:
On Wed, 26 Sep 2018 at 11:17, Hans Dembinski via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote: Ok, no calling of destructors.
You should, iff you use placement new to construct an object in uninitialized memory, that's what Steven started with saying.
Right, I got this now. When you have a union type with non-trivial bounded types you are also supposed to call the destructor manually when you switch types, and there it makes sense. My mistake was to believe that calling the destructor manually is like calling any other function. Now it is clear.

On Wed, 26 Sep 2018 at 12:04, Hans Dembinski <hans.dembinski@gmail.com> wrote:
On 26. Sep 2018, at 10:41, degski <degski@gmail.com> wrote:
On Wed, 26 Sep 2018 at 11:17, Hans Dembinski via Boost < boost@lists.boost.org> wrote:
Ok, no calling of destructors.
You should, iff you use placement new to construct an object in uninitialized memory, that's what Steven started with saying.
Right, I got this now. When you have a union type with non-trivial bounded types you are also supposed to call the destructor manually when you switch types, and there it makes sense. My mistake was to believe that calling the destructor manually is like calling any other function. Now it is clear.
No no, something Steven said just before his other remark, "destructors are special functions". Always pay attention to what Steven says, in a very literal sense, except the "Ad Majorem Gloriam Dei", ignore that. degski -- *“If something cannot go on forever, it will stop" - Herbert Stein*

AMDG On 09/25/2018 03:44 PM, Hans Dembinski wrote:
The histogram iterators over axis types a lot. If they are small they are more likely to fit onto a cache line. When you use a dynamic histogram, the size of axis::any<…> is size of largest bounded axis type plus some bytes for the type index. The largest bounded axis type determines the size of axis::any, so I was generally careful in making the larger axis types not larger than necessary.
This actually makes the optimization pointless. regular<pow> -> 3 doubles variable (with buffer) -> 1 pointer variable (with vector) -> 3 pointers In Christ, Steven Watanabe

On 26. Sep 2018, at 17:02, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
On 09/25/2018 03:44 PM, Hans Dembinski wrote:
The histogram iterators over axis types a lot. If they are small they are more likely to fit onto a cache line. When you use a dynamic histogram, the size of axis::any<…> is size of largest bounded axis type plus some bytes for the type index. The largest bounded axis type determines the size of axis::any, so I was generally careful in making the larger axis types not larger than necessary.
This actually makes the optimization pointless. regular<pow> -> 3 doubles variable (with buffer) -> 1 pointer variable (with vector) -> 3 pointers
Fair enough, but the picture changes when the allocator has a non-zero size. Apart from that, aren't you bothered by the fact that a axis::variable (with vector) derived from axis::base stores the size twice, once as an int inside axis::base and again inside the vector, plus the superfluous field of capacity, and an extra instance of the allocator, while the current implementation stores the allocator only once, and does not waste any memory? I thought people here would tear me apart if I present a design that is so wasteful. Twice storing the size could be avoided by lifting the requirement that all axis types have to derive from axis::base, which I am considering. But with a standard vector you cannot avoid the superfluous capacity field and the duplicated allocator.

AMDG I vote to ACCEPT histogram into Boost. Critical issues: - Various exception safety and memory issues must be fixed. - The reference documentation must be completed. Specifically, various preconditions, requirements, and assumptions must be made explicit. Tests (I'm running out of time, so this is a bit incomplete): - Some of run-fail tests would be better off as just testing that the correct exception is thrown. (i.e. BOOST_TEST_THROWS). The ones that assert must force debug builds (<variant>debug) as assertions are a no-op in release builds. - For serialization tests, I think that you should save a text archive permanently. This will allow you to detect cases where the serialization format changes. (Pass the archive on the command line. The `run` rule in b2 has an `args` parameter.) utility.hpp:115: void deallocate(T*& p, std::size_t n) { Don't take the argument by reference. It's not guaranteed to work. axis_test.cpp:489: BOOST_TEST_EQ(db.size(), 5); // axis_type, char, double, long + ??? The ??? is an internal structure created by vector to support checked iterators (specifically it's std::_Container_proxy). This test will fail in release builds. index_mapper_test.cpp: I feel that this test is a bit too specific. The important part is the relationship between linearize/indices_to_index (detail/axes.hpp), index_cache, and index_mapper. The exact mapping doesn't matter from the point of view of program correctness, as long as they are all consistent. In Christ, Steven Watanabe

Dear Steven,
On 27. Sep 2018, at 03:49, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
I vote to ACCEPT histogram into Boost.
thank you for the exceptionally thorough reading of the code and the documentation. I am really impressed how much you found by just reading the code. I believe you invested a significant fraction of your personal time into this review. So I feel I owe you one.
Critical issues: - Various exception safety and memory issues must be fixed. - The reference documentation must be completed. Specifically, various preconditions, requirements, and assumptions must be made explicit.
Tests (I'm running out of time, so this is a bit incomplete):
- Some of run-fail tests would be better off as just testing that the correct exception is thrown. (i.e. BOOST_TEST_THROWS). The ones that assert must force debug builds (<variant>debug) as assertions are a no-op in release builds.
- For serialization tests, I think that you should save a text archive permanently. This will allow you to detect cases where the serialization format changes. (Pass the archive on the command line. The `run` rule in b2 has an `args` parameter.)
utility.hpp:115: void deallocate(T*& p, std::size_t n) { Don't take the argument by reference. It's not guaranteed to work.
axis_test.cpp:489: BOOST_TEST_EQ(db.size(), 5); // axis_type, char, double, long + ??? The ??? is an internal structure created by vector to support checked iterators (specifically it's std::_Container_proxy). This test will fail in release builds.
index_mapper_test.cpp: I feel that this test is a bit too specific. The important part is the relationship between linearize/indices_to_index (detail/axes.hpp), index_cache, and index_mapper. The exact mapping doesn't matter from the point of view of program correctness, as long as they are all consistent.
I will fix these issues and the suggestions all make sense. Best regards, Hans

On 17/09/2018 21:06, Mateusz Loskot wrote:
The formal review of Hans Dembinski's Histogram library starts TODAY, [...] The author followed the "Don't pay for what you don't use" principle. [...] Documentation: http://hdembinski.github.io/histogram/doc/html/
Perhaps this idea falls into that principle, but there seems to be an inherent assumption in the axis classes that a transform will always return the same type as its input. (For example, the min_ and delta_ fields in the regular axis are stored as value_type, as are the input arguments lower and upper.) It seems like it should be trivial to "fix" that, though, by storing these already-transformed values as type transformed_type instead. using transformed_type = decltype(std::declval<transform_type>().forward(std::declval<value_type>())); Or similar. This should in theory allow an arbitrarily complex value_type to be used, provided that a suitable transform can provide bidirectional conversion to a simpler numeric type. On a related note, the method used to store the transform as a base class of the axis bothers me a bit, in particular that it calls the copy constructor to init the base rather than the move constructor, and that the init of min_ and delta_ (for axis::regular) are invoked on the original copy instead of the internal copy which is subsequently used for all other operations. I'm not 100% clear on the standards guarantees for the constructor initialiser list (in particular if a base initialiser is guaranteed to be fully constructed before the arguments to other initialisers are evaluated or not), so it's possible that this is UB if that guarantee is not met, but I would have thought it'd be more sensible to init as: regular(...) : transform_type(std::move(trans) , min_(transform_type::forward(lower)) , ... Such that the same instance is used in all cases. Also: transforms don't seem to be mentioned much in the documentation. I couldn't find any documented requirements to build your own, for example, though I worked it out from looking at the source.

Perhaps this idea falls into that principle, but there seems to be an inherent assumption in the axis classes that a transform will always return the same type as its input.
True, there is an inherent assumption that the transform does not change the value_type and only do a monotonic mapping, like log(x) or pow(x, const). But like you point out, this restriction is not necessary and it is easy to avoid. But just out of curiosity, can you name a use case where the transformed type is different from the value type? I'm not 100% clear on the standards guarantees for the constructor
initialiser list (in particular if a base initialiser is guaranteed to be fully constructed before the arguments to other initialisers are evaluated or not), so it's possible that this is UB if that guarantee is not met, but I would have thought it'd be more sensible to init as:
regular(...) : transform_type(std::move(trans) , min_(transform_type::forward(lower)) , ...
Such that the same instance is used in all cases.
AFAIK, the order in which these statements are executed is not guaranteed, that's why the ctor uses trans instead of transform_type. I am happy to be proven wrong by a language expert. If that's the case, I agree that your suggestion makes more sense. Also: transforms don't seem to be mentioned much in the documentation.
I couldn't find any documented requirements to build your own, for example, though I worked it out from looking at the source.
True, transforms were added late, I simply forgot to document them properly. I added an issue to that effect. Kind regards, Hans On Tue, Sep 18, 2018 at 10:04 AM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
On 17/09/2018 21:06, Mateusz Loskot wrote:
The formal review of Hans Dembinski's Histogram library starts TODAY, [...] The author followed the "Don't pay for what you don't use" principle. [...] Documentation: http://hdembinski.github.io/histogram/doc/html/
Perhaps this idea falls into that principle, but there seems to be an inherent assumption in the axis classes that a transform will always return the same type as its input.
(For example, the min_ and delta_ fields in the regular axis are stored as value_type, as are the input arguments lower and upper.)
It seems like it should be trivial to "fix" that, though, by storing these already-transformed values as type transformed_type instead.
using transformed_type =
decltype(std::declval<transform_type>().forward(std::declval<value_type>()));
Or similar. This should in theory allow an arbitrarily complex value_type to be used, provided that a suitable transform can provide bidirectional conversion to a simpler numeric type.
On a related note, the method used to store the transform as a base class of the axis bothers me a bit, in particular that it calls the copy constructor to init the base rather than the move constructor, and that the init of min_ and delta_ (for axis::regular) are invoked on the original copy instead of the internal copy which is subsequently used for all other operations.
I'm not 100% clear on the standards guarantees for the constructor initialiser list (in particular if a base initialiser is guaranteed to be fully constructed before the arguments to other initialisers are evaluated or not), so it's possible that this is UB if that guarantee is not met, but I would have thought it'd be more sensible to init as:
regular(...) : transform_type(std::move(trans) , min_(transform_type::forward(lower)) , ...
Such that the same instance is used in all cases.
Also: transforms don't seem to be mentioned much in the documentation. I couldn't find any documented requirements to build your own, for example, though I worked it out from looking at the source.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Tue, 18 Sep 2018 at 20:56, Hans Dembinski via Boost <boost@lists.boost.org> wrote:
Also: transforms don't seem to be mentioned much in the documentation.
I couldn't find any documented requirements to build your own, for example, though I worked it out from looking at the source.
True, transforms were added late, I simply forgot to document them properly. I added an issue to that effect.
That is, for the record and easy follow-up https://github.com/HDembinski/histogram/issues/90 Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

AMDG On 09/18/2018 12:55 PM, Hans Dembinski via Boost wrote:
<snip> I'm not 100% clear on the standards guarantees for the constructor initialiser list (in particular if a base initialiser is guaranteed to be fully constructed before the arguments to other initialisers are evaluated or not), so it's possible that this is UB if that guarantee is not met, but I would have thought it'd be more sensible to init as:
regular(...) : transform_type(std::move(trans) , min_(transform_type::forward(lower)) , ...
Such that the same instance is used in all cases.
AFAIK, the order in which these statements are executed is not guaranteed, that's why the ctor uses trans instead of transform_type. I am happy to be proven wrong by a language expert. If that's the case, I agree that your suggestion makes more sense.
Initializers are never interleaved. "In a non-delegating constructor, initialization proceeds in the following order: - ...virtual base classes ... - Then, direct base classes ... - Then, non-static data members ..." [class.base.init] "The initialization performed by each mem-initializer constitutes a full-expression" [class.base.init] "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated" [intro.execution] In Christ, Steven Watanabe

On 19/09/2018 06:55, Hans Dembinski wrote:
True, there is an inherent assumption that the transform does not change the value_type and only do a monotonic mapping, like log(x) or pow(x, const). But like you point out, this restriction is not necessary and it is easy to avoid. But just out of curiosity, can you name a use case where the transformed type is different from the value type?
One trivial case is where the input is integers (or an arbitrary-precision bigint type of some sort) but for whatever reason you want to get a histogram of the log or square root or something else floating-point. Another case might be where you want to use a larger structure as your input and use the transform to extract one particular field. Although this is probably less useful since it's a lossy transform and the inverse wouldn't be able to produce the full original object -- unless perhaps the transform was stateful and preserved its input (which doesn't make much sense as presumably it's non-unique, otherwise you wouldn't be using a histogram), or you don't care about the other fields in the structure when producing the inverse, or you don't need to make any calls which require calculating the inverse.
AFAIK, the order in which these statements are executed is not guaranteed, that's why the ctor uses trans instead of transform_type. I am happy to be proven wrong by a language expert. If that's the case, I agree that your suggestion makes more sense.
It would require a special kind of crazy for a compiler to emit calls to an instance method of a base class before calling the base class constructor. So I'm reasonably positive that it's safe. Steven's already addressed this as well, which seems to agree. Although I must admit a little surprise; I thought I remembered that even though the actual initialisation of members occurs in the defined order, evaluation of arguments to the initialisers might not be. Perhaps this was something tightened in C++11?

AMDG On 09/18/2018 05:17 PM, Gavin Lambert via Boost wrote:
<snip> Steven's already addressed this as well, which seems to agree. Although I must admit a little surprise; I thought I remembered that even though the actual initialisation of members occurs in the defined order, evaluation of arguments to the initialisers might not be. Perhaps this was something tightened in C++11?
It was always that way: "There is a sequence point (1.9) after the initialization of each base and member. The expression-list of a mem-initializer is evaluated as part of the initialization of the corresponding base or member." [C++03] In Christ, Steven Watanabe

On 19/09/2018 11:26, Steven Watanabe wrote:
On 09/18/2018 05:17 PM, Gavin Lambert wrote:
Steven's already addressed this as well, which seems to agree. Although I must admit a little surprise; I thought I remembered that even though the actual initialisation of members occurs in the defined order, evaluation of arguments to the initialisers might not be. Perhaps this was something tightened in C++11?
It was always that way:
"There is a sequence point (1.9) after the initialization of each base and member. The expression-list of a mem-initializer is evaluated as part of the initialization of the corresponding base or member." [C++03]
Correct me if I'm wrong, but that text does not appear in C++98. So it's not quite "always". :) Although C++98 does still require that base constructors are called first [class.base.init#5] and that it is legal to call instance methods in the initialisers for members (but not legal in the base constructor call itself) [class.base.init#8]. So that part has probably always been safe.

On 19. Sep 2018, at 01:17, Gavin Lambert via Boost <boost@lists.boost.org> wrote:
One trivial case is where the input is integers (or an arbitrary-precision bigint type of some sort) but for whatever reason you want to get a histogram of the log or square root or something else floating-point.
Ok, good point. I have another practical use case. I think axis::regular should work with Boost.Units, so you can make an axis that accepts only energies. The transform in this case would strip off the unit to produce a dimensionless scale which is then binned. https://github.com/HDembinski/histogram/issues/92
Another case might be where you want to use a larger structure as your input and use the transform to extract one particular field. Although this is probably less useful since it's a lossy transform and the inverse wouldn't be able to produce the full original object -- unless perhaps the transform was stateful and preserved its input (which doesn't make much sense as presumably it's non-unique, otherwise you wouldn't be using a histogram), or you don't care about the other fields in the structure when producing the inverse, or you don't need to make any calls which require calculating the inverse.
AFAIK, the order in which these statements are executed is not guaranteed, that's why the ctor uses trans instead of transform_type. I am happy to be proven wrong by a language expert. If that's the case, I agree that your suggestion makes more sense.
It would require a special kind of crazy for a compiler to emit calls to an instance method of a base class before calling the base class constructor. So I'm reasonably positive that it's safe.
Steven's already addressed this as well, which seems to agree. Although I must admit a little surprise; I thought I remembered that even though the actual initialisation of members occurs in the defined order, evaluation of arguments to the initialisers might not be. Perhaps this was something tightened in C++11?
I think my mistake was the following. When you call a function or a constructor, the order in which the arguments are evaluated is not guaranteed by the standard. But this here is a different case.

Here is my input. I hope it is helpful. Kind regards, Alex -- I like the histogram library because it offers a straightforward solution to a common problem. In my experience it has never been a complex problem (because of the size and complexity of my data), but still it is nice to have a clean solution, avoid duplication, and avoid pitfalls related to under/overflows, and open/closed intervals.
From that perspective, I would really like the library to be as intuitive and robust as possible.
Based on that I have a couple of questions / remarks *Axis_type concept (should probably be AxisType)* Why must axis_type derive from labelled_base and iterator_mixin? My understanding is that concepts are ideally specified in terms of usage and behavior, rather than implementation. So for iterator_mixin, would it not be better to just require that the following works (with the usual semantics) auto i = axis.begin(); auto i = axis.end(); auto i = axis.rbegin(); auto i = axis.rend(); You need to document what iterator concept you adhere to, e.g. I would promise/require OutputIterator with additional requirement of multipass. Likewise for labelled base can you just require: boost::string_view str = axis.label(); axis.label(str); auto a = axis.get_allocator(); // I don't really see the point of this to be honest and for base: int a = size(); int b = shape(); int c = uoflow(); shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information. As a user implementing my own axis type, I would prefer just implementing those ten member functions over deriving from the two base classes. It would allow me to make an axis-type that is independent of boost / boost::histogram, and avoid complex inheritance. And, if I wanted, I could still derive from the base classes. Further to those, you would just have auto b = axis[i];// where b is of type axis::bin_type auto i = axis.index(v); // where v is type axis::value_type For these you need to document how the under- and overflow bins are numbered For the AxisType concept definition to be complete, you must also define a BinType concept for the bin_type and a SortableValue concept (I suppose) for the value_type. Why did you chose to make the label a part of the axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc. Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis. Why is it necessary for axes to be comparable? And is is not asking for trouble to also compare the axis title (considering lower/upper case typos, etc.) *Not documenting the histogram class* I could not find the documentation/reference for the histogram class. I suppose this is on purpose because there are two implementations that have a common interface? In that case I would document the common interface as a concept too. The user should then in my opinion also not use the histogram class directly, but strictly rely on your factory functions that create Histogram (the concept, not the class) objects. In the tutorial there is the following part to create histogram by copying or moving axes from a std::vector of axes. auto h1 = make_dynamic_histogram(v.begin(), v.end()); // copying axes auto h2 = bh::histogram<decltype(v)>(std::move<v>); // moving axes Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user. Perhaps this could be avoided by providing factory functions, that take a Range of axes instead of pairs of iterators. e.g. template<typename AxisRange> make_dynamic_histogram_from_range(AxisRange&& axes); template<typename Storage, typename AxisRange> make_dynamic_histogram_from_range(Storage&& storage, AxisRange&& axes); as this is Perfect Forwarding, you can automatically decide whether to move or copy. Furthermore, the user is not limited to providing axes in a std::vector *Not having an overflow bin for Circular axes* How can this type of axes deal with NaN input? *Fill Histogram* The functional style example seems overly complicated and apparently (according to your comments) depends on smartness of the compiler, would the following not be preferred? auto h2 = make_static_histogram( bh::axis::regular<>(8, 0, 4), bh::axis::regular<>(10, 0, 5)); for(auto&& v : input_data) h2(v); I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram"). *Access bin counts* It seems unfortunate that the methods to access the indices of a bin are methods of the iterator rather than the value. This would preclude most uses of range-based for-loops with histograms. I see that to change this could be tricky, depending on the iterator concept you wish to support. Perhaps it is an option to provide separate Ranges (Views) for histograms that iterate only over values; over values and variances, over indices and values and over indices, values and variances. You could have something like this: for(auto&& v : h) // alternatively: for(auto&& v : indices_and_values_view(h)) { std::cout << v.idx(0) << ' ' << v.idx(1) << ' ' << v.value() << std::endl; } Instead of: for (auto it = h.begin(); it != h.end(); ++it) { std::cout << it.idx(0) << ' ' << it.idx(1) << ' ' << it->value() << std::endl; } *Functionality* The documentation says that automated bin boundaries are not an option, because there is no consensus on how to do this. While I understand you would not want to take this on, I would think a library that includes a number of methods to do this with the possibility to add your own would be very valuable and lack of consensus is no real argument. For instance, GIS packages do this to create legends for geographical maps, and I would like to have this capability at my fingertips. It would be useful to be able to aggregate bins with an intuitive interface, e.g. if I have the population age distribution in 1-year intervals, aggregate it to 5-year intervals. This would be a generalization of the reduce_to function. I like that you included the option to combine storages. Have you also considered the option to subtract storages / samples? This would make histogram more suitable for moving window type analysis. *Conclusion* I vote for inclusion in Boost because it offers fundamental and very useful functionality. - What is your evaluation of the design? -- More can be done to simplify concepts -- More can be done to simplify the interface, especially to support use with range-based for-loops. -- I think the AxisType concept would be better without the label - What is your evaluation of the implementation? -- Didn't look in detail. To me it seemed more complex than necessary, but perhaps that is where the good performance comes from. - What is your evaluation of the documentation? -- Good, could always be better. It is enough to get started. Some important parts are still missing - What is your evaluation of the potential usefulness of the library? -- I would use it - Did you try to use the library? With what compiler? -- No - Did you have any problems? -- No - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? -- A careful reading, and checking in code where I did not understand docs - Are you knowledgeable about the problem domain? -- Not really *Spitting of coffee* The documentation is the first impression people will get from a library. It is incredibly generous of reviewers to pick through the language and give detailed general and specific advice. Avoiding the passive tense where possible is good and useful advice.

*** Please ignore the previous email, I started writing an answer, but then had to stop. Somehow that email was send accidentally.*** Dear Alex, thank you very much for the review! Let me answer some comments.
On 20. Sep 2018, at 16:51, a.hagen-zanker--- via Boost <boost@lists.boost.org> wrote: Based on that I have a couple of questions / remarks
*Axis_type concept (should probably be AxisType)*
Ok (Steven also commented on this).
Why must axis_type derive from labelled_base and iterator_mixin?
Good point, the label and the functionality provided by the iterator_mixin can be made optional. https://github.com/HDembinski/histogram/issues/93
My understanding is that concepts are ideally specified in terms of usage and behavior, rather than implementation. So for iterator_mixin, would it not be better to just require that the following works (with the usual semantics)
Ideally yes, "Although practicality beats purity." I will come back to this below.
auto i = axis.begin(); auto i = axis.end(); auto i = axis.rbegin(); auto i = axis.rend();
You need to document what iterator concept you adhere to, e.g. I would promise/require OutputIterator with additional requirement of multipass.
I need to check, but I think these iterators can be made optional.
Likewise for labelled base can you just require:
boost::string_view str = axis.label(); axis.label(str); auto a = axis.get_allocator(); // I don't really see the point of this to be honest
axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.
and for base:
int a = size(); int b = shape(); int c = uoflow();
shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.
Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ? I see the point with uoflow(). It should return the enum instead of an int. An implementation detail is leaking here.
As a user implementing my own axis type, I would prefer just implementing those ten member functions over deriving from the two base classes. It would allow me to make an axis-type that is independent of boost / boost::histogram, and avoid complex inheritance. And, if I wanted, I could still derive from the base classes.
The only requirement I want to keep is that users have to derive from axis::base. I don't see how this limits anyone in their freedom to implement custom axis types, and it ensures that there is a standard layout in memory for the basic numbers that an axis has to provide to the histogram host. This enables optimisations that matter for the dynamic version of the histogram, where the axis types are stored in a variant-like type. If all axis types derive from axis::base, then I can write a visitor that static_casts to axis::base to query these numbers. We checked that gcc then removes the visitation code, because it repeatedly does the same thing. As compilers become better and better, this optimisation will also be applied by other compilers. In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.
Further to those, you would just have
auto b = axis[i];// where b is of type axis::bin_type auto i = axis.index(v); // where v is type axis::value_type
For these you need to document how the under- and overflow bins are numbered
Agreed, this will become part of the specs.
For the AxisType concept definition to be complete, you must also define a BinType concept for the bin_type and a SortableValue concept (I suppose) for the value_type.
Yes to BinType concept, while the SortableValue concept is only required for some axis types. The axis::category only requires the value_type to be equal-comparable and not sortable.
Why did you chose to make the label a part of the axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc.
I don't know what you mean by "std::vector, std::map". I agree with you that you need to be able to make a custom axis without a label if you wish.
Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis.
All the internal axis types have a builtin label, which is a design decision based on the experience that labels are generally useful to a large audience. Once you can write your own axis types that do not use labels, I think this point is settled, right?
Why is it necessary for axes to be comparable? And is is not asking for trouble to also compare the axis title (considering lower/upper case typos, etc.)
Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?
*Not documenting the histogram class*
I could not find the documentation/reference for the histogram class. I suppose this is on purpose because there are two implementations that have a common interface? In that case I would document the common interface as a concept too.
The histogram template is not included in the reference, but that is a mistake. It should be there, it is not an implementation detail (reference generator puts *unspecified*, but that's wrong). Perhaps someone with more experience with the doxygen scripts in Boost can help? I merged the two implementations into one templated class a while ago, which is much prettier, and updated the docs. Since you found an obsolete reference to two implementations, could you please point me to where you read that?
In the tutorial there is the following part to create histogram by copying or moving axes from a std::vector of axes.
auto h1 = make_dynamic_histogram(v.begin(), v.end()); // copying axes auto h2 = bh::histogram<decltype(v)>(std::move<v>); // moving axes
Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.
It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<…>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.
Perhaps this could be avoided by providing factory functions, that take a Range of axes instead of pairs of iterators. e.g.
template<typename AxisRange> make_dynamic_histogram_from_range(AxisRange&& axes); template<typename Storage, typename AxisRange> make_dynamic_histogram_from_range(Storage&& storage, AxisRange&& axes);
Optional support for ranges or std::span can be added at a later point, this is just syntactic sugar. https://github.com/HDembinski/histogram/issues/94
as this is Perfect Forwarding, you can automatically decide whether to move or copy. Furthermore, the user is not limited to providing axes in a std::vector
I see no need for the user to use anything other std::vector or std::tuple to store axis types. The implementation could be made more flexible in that regard, but there is simply nothing to gain by doing so, only an increase in code complexity.
*Not having an overflow bin for Circular axes*
How can this type of axes deal with NaN input?
Good point! Passing NaN to a circular axis is currently UB. There are two solutions. 1) Allow an optional overflow bin to count NaNs (off by default) 2) Fail an assert when NaN is passed I prefer option 1. https://github.com/HDembinski/histogram/issues/95
*Fill Histogram*
The functional style example seems overly complicated and apparently (according to your comments) depends on smartness of the compiler, would the following not be preferred?
auto h2 = make_static_histogram( bh::axis::regular<>(8, 0, 4), bh::axis::regular<>(10, 0, 5)); for(auto&& v : input_data) h2(v);
I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").
I am not sure what you mean, but I think you want me to change the example? I would not use functional style myself for the same reasons you mention, but some people requested this and so I put an example in the guide to show it is possible and discuss the caveats. I think this is a good solution to the documentation problem.
*Access bin counts*
It seems unfortunate that the methods to access the indices of a bin are methods of the iterator rather than the value. This would preclude most uses of range-based for-loops with histograms. I see that to change this could be tricky, depending on the iterator concept you wish to support. Perhaps it is an option to provide separate Ranges (Views) for histograms that iterate only over values; over values and variances, over indices and values and over indices, values and variances.
You could have something like this:
for(auto&& v : h) // alternatively: for(auto&& v : indices_and_values_view(h)) { std::cout << v.idx(0) << ' ' << v.idx(1) << ' ' << v.value() << std::endl; }
Instead of:
for (auto it = h.begin(); it != h.end(); ++it) { std::cout << it.idx(0) << ' ' << it.idx(1) << ' ' << it->value() << std::endl; }
I also like range-based for-loops, and I considered this solution, but then decided against it. Providing these views adds complexity to the interface and the implementation just to enable a bit of syntactic sugar. The current iterator provides all extra information you may want to query when you iterate over a multi-dimensional histogram, but also behaves like a primitive iterator over the counter values, so that simple algorithms like std::accumulate work. STL algorithms do not work anyway with this data structure when you have to do something more complex with the content, because they were not designed to handle multi-dimensional data. But somehow I guess users will not be happy unless they can use range-based for-loops, so I will reconsider. https://github.com/HDembinski/histogram/issues/96
*Functionality*
The documentation says that automated bin boundaries are not an option, because there is no consensus on how to do this. While I understand you would not want to take this on, I would think a library that includes a number of methods to do this with the possibility to add your own would be very valuable and lack of consensus is no real argument. For instance, GIS packages do this to create legends for geographical maps, and I would like to have this capability at my fingertips.
Any form of automated bin boundary computation that requires multi-pass on the data will not be handled by the library. I am currently considering simple forms of automated bin boundary computation that can be implemented in single-pass, like automatically adding bins to a regular axis when a value falls outside the current range.
It would be useful to be able to aggregate bins with an intuitive interface, e.g. if I have the population age distribution in 1-year intervals, aggregate it to 5-year intervals. This would be a generalization of the reduce_to function.
I plan to add a support library with common algorithms. "reduce_to" will become a free function and move there, and an option to aggregate bins like you suggest will also be added, called "rebin". Perhaps these two can/should be combined. https://github.com/HDembinski/histogram/issues/98
I like that you included the option to combine storages. Have you also considered the option to subtract storages / samples? This would make histogram more suitable for moving window type analysis.
Subtracting histograms is currently not implemented, because I originally wanted to maintain that bin entries are always non-negative. But it should be optionally available if the storage supports negative entries. https://github.com/HDembinski/histogram/issues/99
*Conclusion*
I vote for inclusion in Boost because it offers fundamental and very useful functionality.
Thanks!
- What is your evaluation of the implementation? -- Didn't look in detail. To me it seemed more complex than necessary, but perhaps that is where the good performance comes from.
If you have specific suggestions on how to simplify things, let me know. In general, the more use and corner cases the library has to support, the more complex its implementation gets. Best regards, Hans

-----Original Message----- From: Hans Dembinski [mailto:hans.dembinski@gmail.com]
Ideally yes, "Although practicality beats purity." I will come back to this below.
Perhaps not in a Boost library
axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.
I think it is best to make the concepts as simple as possible. "optional" to me means that it is not part of the concept.
shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.
Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ?
Do you really need this function? I thought that with size(), has_overflow() and has_underflow() you would convey all information you need.
In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.
OK, will take your word for it. I would be on the purity over practicality side here.
Why did you chose to make the label a part of the axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc.
I don't know what you mean by "std::vector, std::map". I agree with you that you need to be able to make a custom axis without a label if you wish.
I think it is a purity vs practicality issue. A label is not essential to an axis, so I would rather not require it. You say in practice we always have a label, which also makes sense.
Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis.
All the internal axis types have a builtin label, which is a design decision based on the experience that labels are generally useful to a large audience. Once you can write your own axis types that do not use labels, I think this point is settled, right?
Yes
Why is it necessary for axes to be comparable? And is is not asking for trouble to also compare the axis title (considering lower/upper case typos, etc.)
Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?
When these strings are provided in runtime, it is easy for the user to make a mistake. For instance trying to add a "Distance bands count" to a "Distance band count". Also having labels seems to invite programming styles where users keep poor track of their axes, and use the label as a kind of handle.
I merged the two implementations into one templated class a while ago, which is much prettier, and updated the docs. Since you found an obsolete reference to two implementations, could you please point me to where you read that?
http://hdembinski.github.io/histogram/doc/html/histogram/abstract.html mentions "two variants with a common interface", which I interpreted as two implementations.
Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a >>vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.
It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<…>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.
To me these seem to be implementation details. Why would the user need to know how the axes are stored by the histogram.
I see no need for the user to use anything other std::vector or std::tuple to store axis types. The implementation could be made more flexible in that regard, but there is simply nothing to gain by doing so, only an increase in code complexity.
Makes sense. But it depends on your definition of complexity. To me the rationale would be, the user needs to supply a range of axes so I ask for a range of axes. Your rationale seems to be, the histogram uses a vector of axes so the user supplies a vector of axes. I don't think it is a big issue, I would just (slightly) prefer to not require the user to use a vector.
I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").
I am not sure what you mean, but I think you want me to change the example?
Yes, you could use a lambda to have a solution that does not depend on smartness of compiler. Or you could provide a free function to actually do what the example does. Now you seem to say, that a functional style might work, which is not very conclusive. auto h = make_and_fill_histogram(axes, data); // a free function you provide or auto h = lambda(data); // a lambda given in the example

On 21. Sep 2018, at 13:37, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.
I think it is best to make the concepts as simple as possible. "optional" to me means that it is not part of the concept.
A concept can have optional traits, e.g. the Allocator concept in the std library has optional traits. http://www.enseignement.polytechnique.fr/informatique/INF478/docs/Cpp/en/cpp...
shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.
Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ?
Do you really need this function? I thought that with size(), has_overflow() and has_underflow() you would convey all information you need.
shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know.
Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?
When these strings are provided in runtime, it is easy for the user to make a mistake. For instance trying to add a "Distance bands count" to a "Distance band count".
Firstly, it is better if code breaks in obvious ways than in mysterious ways. If you have a typo in your code like this, this is a bug. It is nice if the library catches that bug for you. And think about the alternative that you are suggesting. For sure, it is better to catch a typo early than to silently add two histograms which binned "age in years" and "yearly income in $1000" without throwing any error. You may think the program works correctly and then ruin a business. Secondly, spellings mistakes are unlikely. When do you add histograms? When you have generated several histograms in parallel and then you want to add them all up. The individual histograms will be generated by the very same code, so no spelling mistakes.
http://hdembinski.github.io/histogram/doc/html/histogram/abstract.html mentions "two variants with a common interface", which I interpreted as two implementations.
Ah, ok. That wording is alright. There is only one templated histogram class now, while in the past there were two.
Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a >>vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.
It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<…>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.
To me these seem to be implementation details. Why would the user need to know how the axes are stored by the histogram.
This is explained in detail in the guide. You are basically saying: why do I need to make a choice between std::array, std::vector, and std::tuple. They are all containers, how they work is an implementation detail. When you can use compile-time configured histograms, they are much better, but sometimes you can't. Therefore, the library also supports histograms which are configured at run-time. Under the hood, these two implementations are very different. For the most part, the difference is hidden from you. You just either use make_static_histogram or make_dynamic_histogram and that's it. I cannot hide the difference completely, so I need to inform the user about it.
I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").
I am not sure what you mean, but I think you want me to change the example?
Yes, you could use a lambda to have a solution that does not depend on smartness of compiler. Or you could provide a free function to actually do what the example does. Now you seem to say, that a functional style might work, which is not very conclusive.
auto h = make_and_fill_histogram(axes, data); // a free function you provide
I am against adding this for the sake of maintaining a minimal sufficient interface. I don't think it is a common realistic use case. Often you need histograms when you cannot hold the original data in memory all at once. Therefore, usually you don't have a container with all the data at hand. This is syntactic sugar, so if users actually request it, it can be added later. In any case, the whole point of the example was to show that histogram is compatible with certain STL algorithms, although there are some caveats. The STL algorithms in question often assume that the functor is cheap to copy. A histogram is not cheap to copy, only cheap to move. I cannot change the way how the STL algorithms are implemented, all I can do is inform my users that they can use these algorithms if they wish (and some explicitly requested this), but that it may not be the most performant use of the library. It is better to give this information than not.

shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know
I suppose total_size() or full_size() is more intuitive, but of course not so elegant to have both size() and total_size(). [...]
Secondly, spellings mistakes are unlikely. When do you add histograms? When you have generated several histograms in parallel and then you want to add them all up. The individual histograms will be generated by the very same code, so no spelling mistakes.
No need to check for identical labels either then. [...]
This is explained in detail in the guide. You are basically saying: why do I need to make a choice between std::array, std::vector, and std::tuple. They are all containers, how they work is an implementation detail. When you can use compile-time configured histograms, they are much better, but sometimes you can't. Therefore, the library also supports histograms which are configured at run-time. Under the hood, these two implementations are very different. For the most part, the difference is hidden from you. You just either use make_static_histogram or make_dynamic_histogram and that's it. I cannot hide the difference completely, so I need to inform the user about it.
I didn't object to make_static_histogram() / make_dynamic_histogram(), as you explain these reflect a choice the user has to make. I did object to: auto h = histogram< vector<axis_any> > { move(my_axis _vector)}; which requires more detailed knowledge about the histogram class.

On 24. Sep 2018, at 13:24, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know
I suppose total_size() or full_size() is more intuitive, but of course not so elegant to have both size() and total_size().
total_size() is less ambiguous than shape(), so I am half-convinced. I would like to have one-word method instead of two-word method.
[...]
Secondly, spellings mistakes are unlikely. When do you add histograms? When you have generated several histograms in parallel and then you want to add them all up. The individual histograms will be generated by the very same code, so no spelling mistakes.
No need to check for identical labels either then.
Even if you reject the second of my two arguments, the first one still stands, but I also don't agree with you here. What if you have two programs which generate histograms which differ only in labels and then write them to files. Then you have another program to add histograms from filenames that you provide on the cmd-line. Not an artificial scenario, but one that I encountered. Checking labels prevents the second program from accidentally adding logically-different kinds of histograms. If you are worried about the run-time cost of comparing labels, it is negligible if your axes types have empty labels.
[...]
This is explained in detail in the guide. You are basically saying: why do I need to make a choice between std::array, std::vector, and std::tuple. They are all containers, how they work is an implementation detail. When you can use compile-time configured histograms, they are much better, but sometimes you can't. Therefore, the library also supports histograms which are configured at run-time. Under the hood, these two implementations are very different. For the most part, the difference is hidden from you. You just either use make_static_histogram or make_dynamic_histogram and that's it. I cannot hide the difference completely, so I need to inform the user about it.
I didn't object to make_static_histogram() / make_dynamic_histogram(), as you explain these reflect a choice the user has to make.
I did object to:
auto h = histogram< vector<axis_any> > { move(my_axis _vector)};
Ok, very well. Perhaps then there should be a make_histogram(…) function that accepts any container of axis types and creates a static histogram if the argument allows it, and a dynamic histogram otherwise.

On Mon, 24 Sep 2018 at 16:19, Hans Dembinski via Boost <boost@lists.boost.org> wrote:
On 24. Sep 2018, at 13:24, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know
I suppose total_size() or full_size() is more intuitive, but of course not so elegant to have both size() and total_size().
total_size() is less ambiguous than shape(), so I am half-convinced. I would like to have one-word method instead of two-word method.
I'm not great in naming things, but perhaps: size() shape -> capacity() // as total size including optionally enabled space uoflow() -> tails() Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

On 24. Sep 2018, at 18:33, Mateusz Loskot via Boost <boost@lists.boost.org> wrote:
On Mon, 24 Sep 2018 at 16:19, Hans Dembinski via Boost <boost@lists.boost.org> wrote:
On 24. Sep 2018, at 13:24, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
shape() is more concise than `size() + has_overflow() + has_underflow()`, and the doxy string explains what it returns. If you have a better name, let me know
I suppose total_size() or full_size() is more intuitive, but of course not so elegant to have both size() and total_size().
total_size() is less ambiguous than shape(), so I am half-convinced. I would like to have one-word method instead of two-word method.
I'm not great in naming things, but perhaps:
size() shape -> capacity() // as total size including optionally enabled space
I was also thinking of capacity(), but it may be raising the wrong association? In a sense, it fits: capacity() in a std::vector tells you how much memory for how many items has been actually allocated, while the number of items that you requested is size() items. But people should not think that the existence of capacity() implies that the axis can grow more bins beyond the initial configuration. Wider feedback from the list on this would be very much appreciated.
uoflow() -> tails()
Currently, over/underflow is configured with the enum class uoflow_type, so it makes sense to call the accessor uoflow(). If this things is called tails(), then we should also have an enum class tails_type. I am not 100 % happy with "uoflow", but I am not sure if "tails" should replace it.

On 25/09/2018 09:00, Hans Dembinski wrote:
total_size() is less ambiguous than shape(), so I am half-convinced. I would like to have one-word method instead of two-word method.
I'm not great in naming things, but perhaps:
size() shape -> capacity() // as total size including optionally enabled space
I was also thinking of capacity(), but it may be raising the wrong association? In a sense, it fits: capacity() in a std::vector tells you how much memory for how many items has been actually allocated, while the number of items that you requested is size() items. But people should not think that the existence of capacity() implies that the axis can grow more bins beyond the initial configuration.
Wider feedback from the list on this would be very much appreciated.
How about "extent()"? It still conveys a size-like noun, and has association with "physical size" in a filesystem context, but without the connotations of "capacity()" in a container context.
uoflow() -> tails()
Currently, over/underflow is configured with the enum class uoflow_type, so it makes sense to call the accessor uoflow(). If this things is called tails(), then we should also have an enum class tails_type. I am not 100 % happy with "uoflow", but I am not sure if "tails" should replace it.
Agreed that consistency is important. I don't like "tails"; I'm neutral on "uoflow" (perhaps a shade against due to difficulty to type). I believe someone else suggested "excess" or "extra". Perhaps that might be more suitable?

Hi Gavin,
On 25. Sep 2018, at 02:44, Gavin Lambert via Boost <boost@lists.boost.org> wrote:
How about "extent()"?
It still conveys a size-like noun, and has association with "physical size" in a filesystem context, but without the connotations of "capacity()" in a container context. […] Agreed that consistency is important. I don't like "tails"; I'm neutral on "uoflow" (perhaps a shade against due to difficulty to type).
I believe someone else suggested "excess" or "extra". Perhaps that might be more suitable?
Thank you for the input! I like "extent()" as a replacement for "shape()". That's my favourite so far. I slightly prefer "extra" over "excess". On the one hand, "excess" sounds more specific than "extra". On the other hand, "excess" sounds like we could use this method to access the value of the overflow/underflow bins, but the method merely queries whether overflow/underflow bins exist. Maybe that's just because of the phonetic similarity between "excess" and "access". "extra" is nice, because it is about extra bins. Here is another alternative for "uoflow()". The method is simply called options() and options_type is an enum class that holds or-able bit-masks enum class options_type { underflow_bin = 1, overflow_bin = 2 // more options could be added in the future }; This is straight-forward and future-safe. Hans

AMDG On 09/21/2018 04:23 AM, Hans Dembinski via Boost wrote:
<snip> In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.
I feel that I should point out that axis::any does not meet this requirement. In Christ, Steven Watanabe

On 21. Sep 2018, at 23:21, Steven Watanabe via Boost <boost@lists.boost.org> wrote:
On 09/21/2018 04:23 AM, Hans Dembinski via Boost wrote:
<snip> In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.
I feel that I should point out that axis::any does not meet this requirement.
There is no need to store an axis::any inside an axis::any, therefore axis::any does not need to comply to all the AxisType concepts. Best regards, Hans

Hi Hans, I felt a bit lame for haven given a review without actually trying the library. So I decided to give it a try. I just ran the very first example on the opening page (http://hdembinski.github.io/histogram/index.html) My output is different from the given example though, as it seems that nothing is counted: bin 0 x in [-1, -0.5): 0 +/- 0 bin 1 x in [-0.5, -1.11022e-16): 0 +/- 0 bin 2 x in [-1.11022e-16, 0.5): 0 +/- 0 bin 3 x in [0.5, 1): 0 +/- 0 bin 4 x in [1, 1.5): 0 +/- 0 bin 5 x in [1.5, 2): 0 +/- 0 bin 6 x in [2, inf): 0 +/- 0 bin -1 x in [-inf, -1): 0 +/- 0 Any clue as to what happened? I have VS 15.7.4, Boost version 1.67, and compile with C++17 turned on. Then I looked at "getting_started_listing_01.cpp" and try there I did get the same output as given in comments. But, there I have difficulty understanding the logic of the example. What happens to the four samples that are added to the histogram in line 18 and 19? Is this the same issue? Thanks, Alex

Hi Alex,
On 25. Sep 2018, at 16:31, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
I just ran the very first example on the opening page (http://hdembinski.github.io/histogram/index.html)
My output is different from the given example though, as it seems that nothing is counted:
bin 0 x in [-1, -0.5): 0 +/- 0 bin 1 x in [-0.5, -1.11022e-16): 0 +/- 0 bin 2 x in [-1.11022e-16, 0.5): 0 +/- 0 bin 3 x in [0.5, 1): 0 +/- 0 bin 4 x in [1, 1.5): 0 +/- 0 bin 5 x in [1.5, 2): 0 +/- 0 bin 6 x in [2, inf): 0 +/- 0 bin -1 x in [-inf, -1): 0 +/- 0
Any clue as to what happened? I have VS 15.7.4, Boost version 1.67, and compile with C++17 turned on.
thanks for catching this, the example in the README.md and in the Getting Started section is wrong. std::for_each makes a copy of the functor, calls operator() on all items in the iterator range and finally returns the functor. The return value is ignored in the example, therefore it is like the fill never happened. Best regards, Hans

On Tue, 25 Sep 2018 at 22:53, Hans Dembinski via Boost < boost@lists.boost.org> wrote:
thanks for catching this, the example in the README.md and in the Getting Started section is wrong.
As this issue caught my attention, I decided to read the readme.md. I think you should remove the word modern in "modern C++11 design". I don't doubt you designed it well, but 1. C++11 is not modern and 2. today's modern is tomorrow's old (Boost.Histogram might not even make in into 1.69, which will push it to 2019), just stick with "C++11 design", that's precise and clear. But, elaborating the above, are there any benefits [low hanging fruit] to be had from C++14 and/or C++17. One I can think of, which would have caught the example problem (at compile time), is the use of the [[nodiscard]] attribute. As you can see, live here, it's a biggie. Then there are static_assert's, extended constexpr (more compile time stuff), noexcept and direct-list-initialization. Structured bindings is maybe something as well [the buckets]. If you strive to make it "modern" (but please don't talk about it), you should actually make it modern and conditionally add C++17 features. It is current Boost practice that the library maintainer can do whatever he likes (summer-of-love-style), which I think is an abberation, but it seems to me (and this was discussed in that rather heated discussion we had some weeks ago), C++11 is (should be) the minimum standard for new library contributions. For the rest I would say, it looks [to me] useful (to many) and indeed looks (relatively) simple (as defined per Boost practice, i.e. nothing is ever really simple) to use. Robert has, rightfully so, dwelled upon this in the past. Just one last thing, it is (really) good that this example issue was caught by that/the poster as beginners/first-time-users will be stumped by this kind of thing and will do a RYO instead. No review, but I vote to include! Thanks, Hans (as it looks like it will be included from what I read so far). Have a nice day, degski -- *“If something cannot go on forever, it will stop" - Herbert Stein*

On Wed, 26 Sep 2018 at 07:40, degski via Boost <boost@lists.boost.org> wrote:
On Tue, 25 Sep 2018 at 22:53, Hans Dembinski via Boost <boost@lists.boost.org> wrote: [...] No review, but I vote to include! Thanks, Hans (as it looks like it will be included from what I read so far).
Thank you for your valuable feedback. FYI, despite declared as not a review, your vote has been recorded :) Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

Hi degski,
On 26. Sep 2018, at 07:40, degski via Boost <boost@lists.boost.org> wrote:
On Tue, 25 Sep 2018 at 22:53, Hans Dembinski via Boost < boost@lists.boost.org> wrote:
As this issue caught my attention, I decided to read the readme.md. I think you should remove the word modern in "modern C++11 design". I don't doubt you designed it well, but 1. C++11 is not modern and 2. today's modern is tomorrow's old (Boost.Histogram might not even make in into 1.69, which will push it to 2019), just stick with "C++11 design", that's precise and clear.
Ok, fair point. https://github.com/HDembinski/histogram/issues/108
But, elaborating the above, are there any benefits [low hanging fruit] to be had from C++14 and/or C++17. One I can think of, which would have caught the example problem (at compile time), is the use of the [[nodiscard]] attribute. As you can see, live here, it's a biggie. Then there are static_assert's, extended constexpr (more compile time stuff), noexcept and direct-list-initialization. Structured bindings is maybe something as well [the buckets]. If you strive to make it "modern" (but please don't talk about it), you should actually make it modern and conditionally add C++17 features.
I considered requiring a more recent C++ standard. Generic lambdas and automatic return type deduction would have made the internal code simpler. Template deduction of constructors would have been useful. std::string_view could have replaced boost::string_view and std::variant boost::variant. In the end I concluded that while it would have made my life as the implementer a little easier, the cost of dropping support for older compilers outweighs these benefits. C++17 features on the user-facing side will be added conditionally over time, of course. I added an issue for [[nodiscard]] https://github.com/HDembinski/histogram/issues/107
Just one last thing, it is (really) good that this example issue was caught by that/the poster as beginners/first-time-users will be stumped by this kind of thing and will do a RYO instead.
Absolutely, this is bad. I am testing that all examples in the docs compile, but I should also test the output. Which some hackery I can probably also automatically test the code in the README.md. Apparently is no include-code-from-file statement for Markdown.
No review, but I vote to include!
Cool, thanks! :) Best regards, Hans

Hi Hans, I did some further hands on trying. In particular I implemented my own axis type and used it with the dynamic historgram variant. 1. I think axis::any is a misnomer. We normally use the word "any" to indicate type erasure. So I would expect axis::any to be able to hold any type that implements the axis concept. Instead, axis::any derives for boost::variant and can hold an axis from a discrete set of types provided by the user. This bites when using your own axis_type in a dynamic histogram, because it means you then also have to specify your own variant of any_std. E.g. I was required to do the following: using my_any = bh::axis::any<bh::axis::regular<>, bh::axis::category<std::string>, my_axis>; std::vector<my_any> axes; You could rename your axis::any to axis::variant, however I think it would be better to more fully achieve type erasure so you can store your dynamic axes as a std::vector<axis::any> and dramatically simplify usage. 2. The bin_type concept is very opaque to me. And, in the end the only two member functions I was required to implement to make my bin_type work were: std::string value() const { return is_negative_bin ? "negative" : "positive"; } operator int() const { return is_negative_bin ? 0 : 1; } "value()" seems to be a misnomer, since it is used to print a label for the bin (in your second example: getting_started_listing_02.cpp line 41). To me it is not logical for a bin to have a value() member function, since a bin really is a domain of values. Would a function "as_string()" make more sense? Or, better, simply work with ostream << ? It would make the odd cast requirement in your second example obsolete. I ended up with the following cast: for (auto my_bin : static_cast<const my_axis&>(h.axis(2))) { std::printf("%s\n", my_bin.value().c_str()); and in my opinion the following would be better for (auto my_bin : h.axis(2)) { std::printf("%s\n", my_bin.as_string().c_str()); or even better for (auto my_bin : h.axis(2)) { std::cout << my_bin << '\n'; operator int() seems inappropriate because why would a bin need to know its index? That should logically be the responsibility of the axis 3. I think this was covered before, but the inheritance from labeled_base is awkward. I am not just required to inherit from it, but also construct it in the appropriate form. I ended up constructing the base class with: my_axis() : labeled_base(2, bh::axis::uoflow_type::off, "positive vs. negative", std::allocator<char>{}) { ... It seems too onerous to require my own axis to know the number of bins on construction. What if I want to add bins one at a time? What if my number of bins is based on some function? I also really don't like having to bother with std::allocator<char>; I normally stay away from that kind of complexity. Why should using histograms force me to start thinking about allocators? 4. Using iterator_over might not be efficient If my axis stores bin_types in a std::vector (which seems reasonable), why can I then not just use the iterator of std::vector? You could achieve that by leaving it up to the user to implement the iterators, and offer iterator_over as the default.. I still think histogram should be accepted, but in my opinion there remains quite some scope to tidy up the concepts to make working with histograms as simple as possible. Kind regards, Alex

Hi Alex,
On 26. Sep 2018, at 12:35, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
I did some further hands on trying. In particular I implemented my own axis type and used it with the dynamic historgram variant.
1. I think axis::any is a misnomer. We normally use the word "any" to indicate type erasure. So I would expect axis::any to be able to hold any type that implements the axis concept. Instead, axis::any derives for boost::variant and can hold an axis from a discrete set of types provided by the user. This bites when using your own axis_type in a dynamic histogram, because it means you then also have to specify your own variant of any_std. E.g. I was required to do the following:
using my_any = bh::axis::any<bh::axis::regular<>, bh::axis::category<std::string>, my_axis>; std::vector<my_any> axes;
You could rename your axis::any to axis::variant, however I think it would be better to more fully achieve type erasure so you can store your dynamic axes as a std::vector<axis::any> and dramatically simplify usage.
you don't need to define my_any yourself, if you use the make_*_histogram factories, as shown in the user guide. Those work automatically with custom types. I am open to renaming axis::any, but isn't axis::variant equally confusing? I avoided the name "variant", because axis::any does more than a normal variant, it implements the common interface of the builtin axis types to provide some convenient polymorphism. axis::any cannot represent all aspects of all axes, but it works for simple cases at least.
2. The bin_type concept is very opaque to me. And, in the end the only two member functions I was required to implement to make my bin_type work were:
The concepts need better documentation, sorry about that, and so does the section about implementing your own axis types.
std::string value() const { return is_negative_bin ? "negative" : "positive"; } operator int() const { return is_negative_bin ? 0 : 1; }
"value()" seems to be a misnomer, since it is used to print a label for the bin (in your second example: getting_started_listing_02.cpp line 41). To me it is not logical for a bin to have a value() member function, since a bin really is a domain of values.
That depends on the axis. axis::category has bins which represent a single value, not a continuum. That's the example you are referring to. Same for axis::integer. The other axes types use a different kind of bin type that represents an interval. That's why there are two helper templates interval_view and value_view. Use the former for axis types whose bins represent an interval and the latter for axis types whose bins represent a single item.
operator int() seems inappropriate because why would a bin need to know its index? That should logically be the responsibility of the axis
I explained the benefit in an answer to Steven.
3. I think this was covered before, but the inheritance from labeled_base is awkward. I am not just required to inherit from it, but also construct it in the appropriate form.
I ended up constructing the base class with:
my_axis() : labeled_base(2, bh::axis::uoflow_type::off, "positive vs. negative", std::allocator<char>{}) { ...
It seems too onerous to require my own axis to know the number of bins on construction. What if I want to add bins one at a time? What if my number of bins is based on some function? I also really don't like having to bother with std::allocator<char>; I normally stay away from that kind of complexity. Why should using histograms force me to start thinking about allocators?
The allocator argument can get a default so you don't have to see it. To be honest, I didn't think someone would want to make axis from scratch. Usually, you want to bend behaviour of an existing axis a bit to achieve what you want, and that is much simpler, see the example in the user guide.
4. Using iterator_over might not be efficient
If my axis stores bin_types in a std::vector (which seems reasonable), why can I then not just use the iterator of std::vector? You could achieve that by leaving it up to the user to implement the iterators, and offer iterator_over as the default..
I still think histogram should be accepted, but in my opinion there remains quite some scope to tidy up the concepts to make working with histograms as simple as possible.
I am thankful for your input on the advanced usage of the library, and of course that should be as simple as possible and it needs better documentation, but you cannot expect it to be as simple as the basic usage. A good library should make simple things simple and complicated things possible. Best regards, Hans

you don't need to define my_any yourself, if you use the make_*_histogram factories, as shown in the user guide. Those work automatically with custom types.
I took you second example as starting point and did whatever necessary to make it work with my own axis. That example starts by putting axes into a vector and then uses the make_dynamic_histogram. I need my_any as value_type for this vector. *********** using my_any = bh::axis::any<bh::axis::regular<>, bh::axis::category<std::string>, my_axis>; int main() { std::vector<my_any> axes; axes.emplace_back(bh::axis::category<std::string>({"red", "blue"})); axes.emplace_back(bh::axis::regular<>(5, -5, 5, "x")); axes.emplace_back(my_axis{}); auto h = bh::make_dynamic_histogram(axes.begin(), axes.end()); ************
I am open to renaming axis::any, but isn't axis::variant equally confusing? I avoided the name "variant", because axis::any does more than a normal variant, it implements the common interface of the builtin axis types to provide some convenient polymorphism. axis::any cannot represent all aspects of all axes, but it works for simple cases at least.
I don't know what you mean by "cannot". If you define a consistent Axis concept, you can wrap this in a type-erased axis::any. If you chose not to do this, then axis::variant is a less confusing name to me.
"value()" seems to be a misnomer, since it is used to print a label for the bin (in your second example: getting_started_listing_02.cpp line 41). To me it is not logical for a bin to have a value() member function, since a bin really is a domain of values.
That depends on the axis. axis::category has bins which represent a single value, not a continuum.
A single value, a set of values and a continuum of values are all domains of values. A bin logically is a domain of values.
That's the example you are referring to. Same for axis::integer. The other axes types use a different kind of bin type that represents an interval. That's why there are two helper templates interval_view and value_view. Use the former for axis types whose bins represent an interval and the latter for axis types whose bins represent a single item.
I don't agree, a bin always is a domain of values and there are more options than interval and value. For instance, I could chose to have bins for odd and even integers. I think you need an interface for the generic case: a bin is a domain.
operator int() seems inappropriate because why would a bin need to know its index? That should logically be the responsibility of the axis
I explained the benefit in an answer to Steven.
But, I don't think that convenience is sufficient reason. It is not logical and brings about further questions. A histogram is a relatively simple mathematical construct, it must be possible to support it with concepts that are rigorously logical, not just convenient.
3. I think this was covered before, but the inheritance from labeled_base is awkward. I am not just required to inherit from it, but also construct it in the appropriate form.
I ended up constructing the base class with:
my_axis() : labeled_base(2, bh::axis::uoflow_type::off, "positive vs. negative", std::allocator<char>{}) { ...
It seems too onerous to require my own axis to know the number of bins on construction. What if I want to add bins one at a time? What if my number of bins is based on some function? I also really don't like having to bother with std::allocator<char>; I normally stay away from that kind of complexity. Why should using histograms force me to start thinking about allocators?
The allocator argument can get a default so you don't have to see it. To be honest, I didn't think someone would want to make axis from scratch. Usually, you want to bend behaviour of an existing axis a bit to achieve what you want, and that is much simpler, see the example in the user guide.
Even though in practice that is what I might do, I am not sure if it is really simpler. I would just be less aware of what I was doing.
4. Using iterator_over might not be efficient
If my axis stores bin_types in a std::vector (which seems reasonable), why can I then not just use the iterator of std::vector? You could achieve that by leaving it up to the user to implement the iterators, and offer iterator_over as the default..
I still think histogram should be accepted, but in my opinion there remains quite some scope to tidy up the concepts to make working with histograms as simple as possible.
I am thankful for your input on the advanced usage of the library, and of course that should be as simple as possible and it needs better documentation, but you cannot expect it to be as simple as the basic usage. A good library should make simple things simple and complicated things possible.
Hmm, in my opinion defining your own axis should be a simple thing in a histogram library. Moreover, I think a mathematical library should map mathematical concepts to programming concepts as closely as possible.

On 26. Sep 2018, at 15:07, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
"value()" seems to be a misnomer, since it is used to print a label for the bin (in your second example: getting_started_listing_02.cpp line 41). To me it is not logical for a bin to have a value() member function, since a bin really is a domain of values.
That depends on the axis. axis::category has bins which represent a single value, not a continuum.
A single value, a set of values and a continuum of values are all domains of values. A bin logically is a domain of values.
Histogram uses a generalized bin concept, where a single bin can represent a single value on an axis or a domain of values. You cannot implement a category axis with your narrow concept of bin. When you have an category axis that represents the sequence "red", "green", "blue", then what's the domain for the bin "red"?
That's the example you are referring to. Same for axis::integer. The other axes types use a different kind of bin type that represents an interval. That's why there are two helper templates interval_view and value_view. Use the former for axis types whose bins represent an interval and the latter for axis types whose bins represent a single item.
I don't agree, a bin always is a domain of values and there are more options than interval and value. For instance, I could chose to have bins for odd and even integers. I think you need an interface for the generic case: a bin is a domain.
You are free to define your bin_type in almost any way you like.
operator int() seems inappropriate because why would a bin need to know its index? That should logically be the responsibility of the axis
I explained the benefit in an answer to Steven.
But, I don't think that convenience is sufficient reason. It is not logical and brings about further questions. A histogram is a relatively simple mathematical construct, it must be possible to support it with concepts that are rigorously logical, not just convenient.
Why is it illogical that a bin knows its own index? A bin both represents a location on an axis and a value or value range on that axis. The library is not forcing you to define your own bins in this way.

Histogram uses a generalized bin concept, where a single bin can represent a single value on an axis or a domain of values.
That is exactly why a generic member function called value() makes no sense to me. It is not a given that the bin has a single value.
When you have an category axis that represents the sequence "red", "green", "blue", then what's the domain for the bin "red"?
Well that would just consist of be the value "red" wouldn't it. Somehow we're talking past each other. I am perfectly fine with a bin consisting of a single value. But in the general case you cannot expect bins to have a single value. Maybe you never intended that. However, I then think your example 2 could be better chosen. I assumed this to demonstrate general behaviour.
Why is it illogical that a bin knows its own index?
Because the axis already maps indices to bin types. It is not necessary for the bin to know it, and bins can be a lot simpler if they don't need to know their number.
The library is not forcing you to define your own bins in this way.
The library isn't forcing me to do anything, but if I want to use it, I have to conform to its concepts, right.?

On 26. Sep 2018, at 19:01, <a.hagen-zanker@surrey.ac.uk> <a.hagen-zanker@surrey.ac.uk> wrote:
Histogram uses a generalized bin concept, where a single bin can represent a single value on an axis or a domain of values.
That is exactly why a generic member function called value() makes no sense to me. It is not a given that the bin has a single value.
Could you please stop arguing and look into the code and the examples? The builtin axis types come with two kinds of bin_types, one based on interval_view, which has no value() method, but a lower() and upper() method, and one based on value_view, which has a value() method. It all makes sense.
When you have an category axis that represents the sequence "red", "green", "blue", then what's the domain for the bin "red"?
Well that would just consist of be the value "red" wouldn't it. Somehow we're talking past each other. I am perfectly fine with a bin consisting of a single value. But in the general case you cannot expect bins to have a single value. Maybe you never intended that. However, I then think your example 2 could be better chosen. I assumed this to demonstrate general behaviour.
That's your fault then. The Getting Started section just has some examples to get you started and give you a feel about the library. It is not intended to give you a full tour of the library and/or to explain all details. For that you need to read the User Guide.
The library is not forcing you to define your own bins in this way.
The library isn't forcing me to do anything, but if I want to use it, I have to conform to its concepts, right.?
There is no BinType concept, as you have noted, which means you are free to define your bin_types in almost arbitrary weird ways.

Could you please stop arguing and look into the code and the examples?
That's exactly what I did. My feedback came from playing with your first and second example.
There is no BinType concept, as you have noted, which means you are free to define your bin_types in almost arbitrary weird ways.
There is a BinType concept, because there are requirements upon bin types in order to work with the rest of the library. You just haven't documented these. I will stop arguing now...
participants (7)
-
a.hagen-zanker@surrey.ac.uk
-
degski
-
Gavin Lambert
-
Hans Dembinski
-
Mateusz Loskot
-
Seth
-
Steven Watanabe