BFloat16 conversion to Float32 is not 100% trivial, because of NaN and rounding modes. I think https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/bfl... does a good job of documenting this (the linked code is Apache 2.0 licensed, in case you worry about such issues). However, if one ignores these complexities, conversion is as simple as bit shifting.

The public Intel BF16 spec is https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-n..., which describes the details of the Intel definition. There are some implementation-specific details in https://reviews.llvm.org/D60550. I can't comment on other hardware implementations.

The important distinction between __fp16 and _Float16 is that the former is a storage-format type, not an arithmetic type, whereas the latter is an arithmetic type. The former is more easily implementable: e.g. Intel CPUs since Ivy Bridge use the F16C instructions for fast conversion between the 16-bit storage format and float32, but all arithmetic is done with float32 hardware.

Folks who are interested in this topic may enjoy reading https://arxiv.org/abs/1904.06376. The methods described therein are not necessarily applicable to Boost Multiprecision, but may be relevant if uBLAS gets involved.
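For anyone who wants to see the plain bit-shift version next to one that at least handles rounding and NaN, here is a minimal C++ sketch (not the TensorFlow code, just the usual bit tricks; the quiet-NaN pattern 0x7FC0 is one arbitrary choice):

    #include <cstdint>
    #include <cstring>
    #include <cmath>

    // bfloat16 -> float32: the 16 bfloat16 bits are exactly the top 16 bits
    // of the corresponding float32, so widening is a pure shift.
    float bf16_to_f32(std::uint16_t h) {
        std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }

    // float32 -> bfloat16 with round-to-nearest-even, plus a NaN check so
    // that rounding cannot turn a NaN payload into infinity.
    std::uint16_t f32_to_bf16(float f) {
        if (std::isnan(f)) return 0x7FC0;                 // quiet NaN
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        bits += 0x7FFFu + ((bits >> 16) & 1u);            // round to nearest, ties to even
        return static_cast<std::uint16_t>(bits >> 16);
    }

Dropping the NaN check and the rounding bias (i.e. just taking bits >> 16) gives the "simple bit shift" version, which truncates toward zero. Note also that F16C covers the IEEE 1-5-10 format, not bfloat16.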
Jeff, who works for Intel

On Tue, Oct 15, 2019 at 7:04 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:

Matt Hurd wrote:
IEEE 16-bit (fp16) and bfloat16 are both around, but bfloat16 seems to be the new leader in modern implementations thanks to ML use. I haven't seen both used together, but I wouldn't rule it out, given that bfloat16 may be accelerator-specific. Google and Intel have support for bfloat16 in some hardware. bfloat16 makes it easy to move to fp32, as the two have the same exponent size.
Refs: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
https://nickhigham.wordpress.com/2018/12/03/half-precision-arithmetic-fp16-v...
According to section 4.1.2 of this ARM document:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1....
implementations support both the IEEE format (1 sign, 5 exponent and 10 mantissa) and an alternative format which is similar except that it doesn't support Inf and NaN, and gains slightly more range. Apparently the bfloat16 format is supported in ARMv8.6-A, but I don't believe that is deployed anywhere yet.
The other place where I've used 16-bit floats is in OpenGL textures (https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_float.txt), which use the 1-5-10 format.
I was a bit surprised by the 1-5-10 choice; the maximum value that can be represented is only 65504, i.e. less than the maximum value for an unsigned int of the same size.
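(For reference, 65504 falls straight out of the format: the largest finite 1-5-10 value has a full mantissa and the top non-infinity exponent, i.e. (2 - 2^-10) * 2^15 = 65504, whereas a 16-bit unsigned integer goes up to 65535.)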
bfloat16 can be trivially implemented (as a storage-only type) simply by truncating a 32-bit float; perhaps support for that would be useful too?
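Roughly, such a storage-only type could be as small as the following sketch (just an illustration, with a made-up name; it uses plain truncation, i.e. round-toward-zero, rather than round-to-nearest-even):

    #include <cstdint>
    #include <cstring>

    // Storage-only bfloat16: keep the top 16 bits of a float and widen
    // back to float for any arithmetic.
    struct bfloat16_t {
        std::uint16_t bits = 0;

        bfloat16_t() = default;
        explicit bfloat16_t(float f) {
            std::uint32_t u;
            std::memcpy(&u, &f, sizeof u);
            bits = static_cast<std::uint16_t>(u >> 16);   // drop low 16 mantissa bits
        }
        explicit operator float() const {
            std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
            float f;
            std::memcpy(&f, &u, sizeof f);
            return f;
        }
    };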
Regards, Phil.
--
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/