- type Float16
- type Precision
const ErrInvalidNaNValue = float16Error("float16: invalid NaN value, expected IEEE 754 NaN")
ErrInvalidNaNValue indicates a NaN was not received.
type Float16 uint16
Float16 represents IEEE 754 half-precision floating-point numbers (binary16).
func FromNaN32ps ¶
FromNaN32ps converts nan to IEEE binary16 NaN while preserving both signaling and payload. Unlike Fromfloat32(), which can only return qNaN because it sets quiet bit = 1, this can return both sNaN and qNaN. If the result is infinity (sNaN with empty payload), then the lowest bit of payload is set to make the result a NaN. Returns ErrInvalidNaNValue and 0x7c01 (sNaN) if nan isn't IEEE 754 NaN. This function was kept simple to be able to inline.
Frombits returns the float16 number corresponding to the IEEE 754 binary16 representation u16, with the sign bit of u16 and the result in the same bit position. Frombits(Bits(x)) == x.
func Fromfloat32 ¶
Fromfloat32 returns a Float16 value converted from f32. Conversion uses IEEE default rounding (nearest int, with ties to even).
Inf returns a Float16 with an infinity value with the specified sign. A sign >= returns positive infinity. A sign < 0 returns negative infinity.
func NaN() Float16
NaN returns a Float16 of IEEE 754 binary16 not-a-number (NaN). Returned NaN value 0x7e01 has all exponent bits = 1 with the first and last bits = 1 in the significand. This is consistent with Go's 64-bit math.NaN(). Canonical CBOR in RFC 7049 uses 0x7e00.
Bits returns the IEEE 754 binary16 representation of f, with the sign bit of f and the result in the same bit position. Bits(Frombits(x)) == x.
Float32 returns a float32 converted from f (Float16). This is a lossless conversion.
IsFinite returns true if f is neither infinite nor NaN.
IsInf reports whether f is an infinity (inf). A sign > 0 reports whether f is positive inf. A sign < 0 reports whether f is negative inf. A sign == 0 reports whether f is either inf.
IsNaN reports whether f is an IEEE 754 binary16 “not-a-number” value.
IsNormal returns true if f is neither zero, infinite, subnormal, or NaN.
func (Float16) IsQuietNaN ¶
IsQuietNaN reports whether f is a quiet (non-signaling) IEEE 754 binary16 “not-a-number” value.
Signbit reports whether f is negative or negative zero.
type Precision int
Precision indicates whether the conversion to Float16 is exact, subnormal without dropped bits, inexact, underflow, or overflow.
const ( // PrecisionExact is for non-subnormals that don't drop bits during conversion. // All of these can round-trip. Should always convert to float16. PrecisionExact Precision = iota // PrecisionUnknown is for subnormals that don't drop bits during conversion but // not all of these can round-trip so precision is unknown without more effort. // Only 2046 of these can round-trip and the rest cannot round-trip. PrecisionUnknown // PrecisionInexact is for dropped significand bits and cannot round-trip. // Some of these are subnormals. Cannot round-trip float32->float16->float32. PrecisionInexact // PrecisionUnderflow is for Underflows. Cannot round-trip float32->float16->float32. PrecisionUnderflow // PrecisionOverflow is for Overflows. Cannot round-trip float32->float16->float32. PrecisionOverflow )
func PrecisionFromfloat32 ¶
PrecisionFromfloat32 returns Precision without performing the conversion. Conversions from both Infinity and NaN values will always report PrecisionExact even if NaN payload or NaN-Quiet-Bit is lost. This function is kept simple to allow inlining and run < 0.5 ns/op, to serve as a fast filter.