BFloat16 conversion to Float32 is not 100% trivial, because of NaN and rounding modes. I think https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/bfl... does a good job of documenting this (the linked code is Apache 2.0 licensed, in case you worry about such issues). However, if one ignores these complexities, conversion is as simple as bit shifting.

The public Intel BF16 spec is https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-n..., which describes the details of the Intel definition. There are some implementation-specific details in https://reviews.llvm.org/D60550. I can't comment on other hardware implementations.

The important distinction between __fp16 and _Float16 is that the former is a storage-format type, not an arithmetic type, whereas the latter is an arithmetic type. The former is more easily implementable: e.g. Intel CPUs since Ivy Bridge use the F16C instructions for fast conversion between the 16-bit storage format and float32, but all arithmetic is done with float32 hardware.

Folks who are interested in this topic may enjoy reading https://arxiv.org/abs/1904.06376. The methods described therein are not necessarily applicable to Boost Multiprecision, but may be relevant if uBLAS gets involved.
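For anyone who wants to see the plain bit-shift version next to one that at least handles rounding and NaN, here is a minimal C++ sketch (not the TensorFlow code, just the usual bit tricks; the quiet-NaN pattern 0x7FC0 is one arbitrary choice):

    #include <cstdint>
    #include <cstring>
    #include <cmath>

    // bfloat16 -> float32: the 16 bfloat16 bits are exactly the top 16 bits
    // of the corresponding float32, so widening is a pure shift.
    float bf16_to_f32(std::uint16_t h) {
        std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }

    // float32 -> bfloat16 with round-to-nearest-even, plus a NaN check so
    // that rounding cannot turn a NaN payload into infinity.
    std::uint16_t f32_to_bf16(float f) {
        if (std::isnan(f)) return 0x7FC0;                 // quiet NaN
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        bits += 0x7FFFu + ((bits >> 16) & 1u);            // round to nearest, ties to even
        return static_cast<std::uint16_t>(bits >> 16);
    }

Dropping the NaN check and the rounding bias (i.e. just taking bits >> 16) gives the "simple bit shift" version, which truncates toward zero. Note also that F16C covers the IEEE 1-5-10 format, not bfloat16.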
Jeff, who works for Intel

On Tue, Oct 15, 2019 at 7:04 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:

Matt Hurd wrote:
IEEE 16-bit (fp16) and bfloat16 are both around, but bfloat16 seems to be the new leader in modern implementations thanks to ML use. I haven't seen both used together, but I wouldn't rule it out, given that bfloat16 may be accelerator-specific. Google and Intel have support for bfloat16 in some hardware. bfloat16 makes it easy to move to fp32, as the two have the same exponent size.
Refs: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
https://nickhigham.wordpress.com/2018/12/03/half-precision-arithmetic-fp16-v...
According to section 4.1.2 of this ARM document:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1....
implementations support both the IEEE format (1 sign, 5 exponent and 10 mantissa) and an alternative format which is similar except that it doesn't support Inf and NaN, and gains slightly more range. Apparently the bfloat16 format is supported in ARMv8.6-A, but I don't believe that is deployed anywhere yet.
The other place where I've used 16-bit floats is in OpenGL textures (https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_float.txt), which use the 1-5-10 format.
I was a bit surprised by the 1-5-10 choice; the maximum value that can be represented is only 65504, i.e. less than the maximum value for an unsigned int of the same size.
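(For reference, 65504 falls straight out of the format: the largest finite 1-5-10 value has a full mantissa and the top non-infinity exponent, i.e. (2 - 2^-10) * 2^15 = 65504, whereas a 16-bit unsigned integer goes up to 65535.)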
bfloat16 can be trivially implemented (as a storage-only type) simply by truncating a 32-bit float; perhaps support for that would be useful too?
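Roughly, such a storage-only type could be as small as the following sketch (just an illustration, with a made-up name; it uses plain truncation, i.e. round-toward-zero, rather than round-to-nearest-even):

    #include <cstdint>
    #include <cstring>

    // Storage-only bfloat16: keep the top 16 bits of a float and widen
    // back to float for any arithmetic.
    struct bfloat16_t {
        std::uint16_t bits = 0;

        bfloat16_t() = default;
        explicit bfloat16_t(float f) {
            std::uint32_t u;
            std::memcpy(&u, &f, sizeof u);
            bits = static_cast<std::uint16_t>(u >> 16);   // drop low 16 mantissa bits
        }
        explicit operator float() const {
            std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
            float f;
            std::memcpy(&f, &u, sizeof f);
            return f;
        }
    };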
Regards, Phil.
--
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/