A proposal to make float_hash_value<> more portable

Hello, the current implementation function for hashing of floating point types, the function template float_hash_value<> in hash.hpp, is implicitely written for real types where std::numeric_limits<T>::radix == 2. Although base = 2 is probably the most prominent case, it is not the only representation compatible with Our Holy Standard. E.g. base 16 floating types are well-known to exist. The hidden assumption becomes obvious in the formular std::size_t const length = (std::numeric_limits<T>::digits + std::numeric_limits<int>::digits - 1) / std::numeric_limits<int>::digits; which should actually mean (pseudo formular): std::size_t const length = (std::numeric_limits<T>::digits2 + std::numeric_limits<int>::digits - 1) / std::numeric_limits<int>::digits; where the (non-existing) constant digits2 would describe the "Number of base 2 digits that can be represented without change." To my opinion boost itself provides a simple helper class, namely static_log2, which allows the correct implementation by using the extended formular #include <boost/integer/static_log2.hpp> ... std::size_t const length = (std::numeric_limits<T>::digits * boost::static_log2<std::numeric_limits<T >::radix>::value + std::numeric_limits<int>::digits - 1) / std::numeric_limits<int>::digits; Note that the expression boost::static_log2<std::numeric_limits<T>::radix>::value is exactly 1 in the usual base-2 case, but e.g. 4 in case of base-16. My proposed fix is exact in all cases, where radix is a power of 2, in other cases (like base-10) it would at least be a better approximation than the current one. Greetings from Bremen, Daniel Krügler

Daniel Krügler wrote:
To my opinion boost itself provides a simple helper class, namely static_log2, which allows the correct implementation by using the extended formular
#include <boost/integer/static_log2.hpp> .... std::size_t const length = (std::numeric_limits<T>::digits * boost::static_log2<std::numeric_limits<T >::radix>::value + std::numeric_limits<int>::digits - 1) / std::numeric_limits<int>::digits;
Note that the expression boost::static_log2<std::numeric_limits<T>::radix>::value is exactly 1 in the usual base-2 case, but e.g. 4 in case of base-16.
My proposed fix is exact in all cases, where radix is a power of 2, in other cases (like base-10) it would at least be a better approximation than the current one.
Thanks for letting me know, I need to think about this. It the radix isn't a power of 2, then I guess that frexp and ldexp will be slow and inaccurate. Maybe ldexp could be replaced with a multiplication. As it is, I'm not happy with the current implementation. It might be better to print the float to a string and then generate a hash value from that. That's not great, but it might be more portable. Daniel

Daniel James wrote:
Daniel Krügler wrote:
To my opinion boost itself provides a simple helper class, namely static_log2, which allows the correct implementation by using the extended formular
#include <boost/integer/static_log2.hpp> .... std::size_t const length = (std::numeric_limits<T>::digits * boost::static_log2<std::numeric_limits<T >::radix>::value + std::numeric_limits<int>::digits - 1) / std::numeric_limits<int>::digits;
Note that the expression boost::static_log2<std::numeric_limits<T>::radix>::value is exactly 1 in the usual base-2 case, but e.g. 4 in case of base-16.
My proposed fix is exact in all cases, where radix is a power of 2, in other cases (like base-10) it would at least be a better approximation than the current one.
Thanks for letting me know, I need to think about this. It the radix isn't a power of 2, then I guess that frexp and ldexp will be slow and inaccurate. Maybe ldexp could be replaced with a multiplication.
As it is, I'm not happy with the current implementation. It might be better to print the float to a string and then generate a hash value from that. That's not great, but it might be more portable.
I don't think, that stringifying floats to perform its hashing is really the better solution. Besides the performance issues this method is also not strictly portable, because the length of the string would depend on std::numeric_limits::digits10. One of the (minor) shortcomings of the current impl (although not for everyone) is that non-normal floats like infinities and NaNs (if they exist) all return the same value like 0. Having some of the new C99 floating-point diagnosis functions available (fpclassify is your friend), one could create discriminating values for at least these categories (not for individuals of those). Besides these and the originally described base issue I don't see any problems of your current implementation, I really don't see any reason for absolute portability. Daniel
participants (2)
-
Daniel James
-
Daniel Krügler