Package float16

Published: Jan 17, 2020 | License: MIT | Module:



const ErrInvalidNaNValue = float16Error("float16: invalid NaN value, expected IEEE 754 NaN")

ErrInvalidNaNValue indicates a NaN was not received.

type Float16

type Float16 uint16

Float16 represents IEEE 754 half-precision floating-point numbers (binary16).
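
As a rough illustration of the binary16 layout behind this type, the sketch below splits a 16-bit pattern into its sign (1 bit), biased exponent (5 bits, bias 15), and significand (10 bits). The type name half and the method fields are hypothetical and are not part of this package.

```go
package main

import "fmt"

// half is a local stand-in for a binary16 bit pattern (hypothetical name).
type half uint16

// fields splits a binary16 bit pattern into its three IEEE 754 fields:
// 1 sign bit, 5 biased exponent bits (bias 15), and 10 significand bits.
func (h half) fields() (sign, exp, frac uint16) {
	sign = uint16(h) >> 15
	exp = (uint16(h) >> 10) & 0x1f
	frac = uint16(h) & 0x03ff
	return sign, exp, frac
}

func main() {
	// 1.0, -2.0, the largest finite value, and the smallest subnormal.
	for _, h := range []half{0x3c00, 0xc000, 0x7bff, 0x0001} {
		s, e, f := h.fields()
		fmt.Printf("0x%04x sign=%d exp=%d frac=0x%03x\n", uint16(h), s, e, f)
	}
}
```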

func FromNaN32ps

func FromNaN32ps(nan float32) (Float16, error)

FromNaN32ps converts nan to IEEE binary16 NaN while preserving both signaling and payload. Unlike Fromfloat32(), which can only return qNaN because it sets quiet bit = 1, this can return both sNaN and qNaN. If the result is infinity (sNaN with empty payload), then the lowest bit of payload is set to make the result a NaN. Returns ErrInvalidNaNValue and 0x7c01 (sNaN) if nan isn't IEEE 754 NaN. This function was kept simple so that it can be inlined.
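
The behavior described above can be sketched on raw bit patterns. This is an illustration under assumptions, not the package's code: narrowNaN is a hypothetical helper that works on uint32 bits and reports failure with a bool instead of ErrInvalidNaNValue.

```go
package main

import "fmt"

// narrowNaN is a hypothetical helper. It takes the raw bits of a float32
// and returns binary16 NaN bits that keep the sign, the quiet bit, and the
// top 10 payload bits. ok is false when the input is not a NaN, in which
// case 0x7c01 (sNaN) is returned.
func narrowNaN(u32 uint32) (u16 uint16, ok bool) {
	exp := u32 & 0x7f800000
	frac := u32 & 0x007fffff
	if exp != 0x7f800000 || frac == 0 {
		return 0x7c01, false
	}
	u16 = uint16(u32>>16)&0x8000 | 0x7c00 | uint16(frac>>13)
	if u16&0x03ff == 0 {
		// All surviving payload bits are zero, which would encode
		// infinity, so set the lowest bit to keep the result a NaN.
		u16 |= 0x0001
	}
	return u16, true
}

func main() {
	for _, u := range []uint32{0x7fc00000, 0x7f800001, 0xffc02000, 0x3f800000} {
		h, ok := narrowNaN(u)
		fmt.Printf("0x%08x -> 0x%04x ok=%v\n", u, h, ok)
	}
}
```

Working on bits rather than float32 values keeps the sketch independent of how a platform passes signaling NaNs between functions.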

func Frombits

func Frombits(u16 uint16) Float16

Frombits returns the float16 number corresponding to the IEEE 754 binary16 representation u16, with the sign bit of u16 and the result in the same bit position. Frombits(Bits(x)) == x.

func Fromfloat32

func Fromfloat32(f32 float32) Float16

Fromfloat32 returns a Float16 value converted from f32. Conversion uses the IEEE 754 default rounding mode (round to nearest, with ties to even).
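
To make the rounding rule concrete, here is a minimal sketch that handles only float32 inputs whose exponent falls in the binary16 normal range. roundNormal is a hypothetical helper written for this illustration; it is not how Fromfloat32 is implemented, and it skips zeros, subnormal results, infinities, and NaNs.

```go
package main

import (
	"fmt"
	"math"
)

// roundNormal is a hypothetical helper. It converts f to binary16 bits using
// round-to-nearest, ties-to-even, but only when the float32 exponent lies in
// the binary16 normal range (2^-14 .. 2^15). ok is false for any other input.
func roundNormal(f float32) (h uint16, ok bool) {
	u := math.Float32bits(f)
	exp := int(u>>23) & 0xff // biased float32 exponent
	if exp < 113 || exp > 142 {
		return 0, false
	}
	sign := uint16(u>>16) & 0x8000
	frac := u & 0x007fffff
	q := frac >> 13      // the 10 significand bits that are kept
	rem := frac & 0x1fff // the 13 significand bits that are dropped
	if rem > 0x1000 || (rem == 0x1000 && q&1 == 1) {
		q++ // round up; an exact tie goes to the even neighbor
	}
	// Adding q (rather than OR-ing it) lets a carry out of the significand
	// bump the exponent; at the top of the range that carry yields infinity.
	return sign | (uint16(exp-112)<<10 + uint16(q)), true
}

func main() {
	// Between 2048 and 4096 binary16 values are 2 apart, so odd integers are ties.
	for _, f := range []float32{2049, 2051, 1, -2.5} {
		h, ok := roundNormal(f)
		fmt.Printf("%v -> 0x%04x ok=%v\n", f, h, ok)
	}
}
```

In this sketch 2049 lands on 0x6800 (2048) and 2051 lands on 0x6802 (2052): both ties resolve to the neighbor whose last significand bit is 0.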

func Inf

func Inf(sign int) Float16

Inf returns a Float16 with an infinity value with the specified sign. A sign >= 0 returns positive infinity. A sign < 0 returns negative infinity.

func NaN

func NaN() Float16

NaN returns a Float16 of IEEE 754 binary16 not-a-number (NaN). Returned NaN value 0x7e01 has all exponent bits = 1 with the first and last bits = 1 in the significand. This is consistent with Go's 64-bit math.NaN(). Canonical CBOR in RFC 7049 uses 0x7e00.
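
The bit patterns mentioned above can be checked with a few lines of standard-library Go. firstAndLast is a hypothetical helper; the sketch only inspects bit patterns and does not call this package.

```go
package main

import (
	"fmt"
	"math"
)

// firstAndLast reports whether the most and least significant bits of a
// significand of the given width are both set.
func firstAndLast(sig uint64, width uint) bool {
	first := sig>>(width-1)&1 == 1
	last := sig&1 == 1
	return first && last
}

func main() {
	const halfNaN = 0x7e01 // value documented for NaN()
	const cborNaN = 0x7e00 // canonical CBOR NaN per the text above

	// Exponent field all ones, plus first and last significand bits set.
	fmt.Println(halfNaN>>10&0x1f == 0x1f, firstAndLast(halfNaN&0x03ff, 10))
	// The CBOR value sets only the first significand bit.
	fmt.Println(firstAndLast(cborNaN&0x03ff, 10))

	// The same check against the float64 returned by math.NaN().
	b := math.Float64bits(math.NaN())
	fmt.Printf("0x%016x %v\n", b, firstAndLast(b&(1<<52-1), 52))
}
```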

func (Float16) Bits

func (f Float16) Bits() uint16

Bits returns the IEEE 754 binary16 representation of f, with the sign bit of f and the result in the same bit position. Bits(Frombits(x)) == x.

func (Float16) Float32

func (f Float16) Float32() float32

Float32 returns a float32 converted from f (Float16). This is a lossless conversion.
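
The conversion is lossless because float32 has a wider exponent range and more significand bits than binary16, so every binary16 value has an exact float32 counterpart. A minimal sketch of such a widening follows; halfToFloat32 is a hypothetical helper and is not the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// halfToFloat32 is a hypothetical helper that widens binary16 bits to float32.
func halfToFloat32(h uint16) float32 {
	sign := uint32(h&0x8000) << 16
	exp := uint32(h>>10) & 0x1f
	frac := uint32(h & 0x03ff)
	switch {
	case exp == 0x1f:
		// Inf or NaN: keep the significand bits in the top of the field.
		return math.Float32frombits(sign | 0x7f800000 | frac<<13)
	case exp == 0:
		// Zero or subnormal: the value is frac * 2^-24, exact in float32.
		v := float32(frac) * (1.0 / (1 << 24))
		if sign != 0 {
			v = -v
		}
		return v
	default:
		// Normal: rebias the exponent from 15 to 127 and widen the significand.
		return math.Float32frombits(sign | (exp+112)<<23 | frac<<13)
	}
}

func main() {
	for _, h := range []uint16{0x3c00, 0xc000, 0x7bff, 0x0001, 0x7c00} {
		fmt.Printf("0x%04x -> %v\n", h, halfToFloat32(h))
	}
}
```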

func (Float16) IsFinite

func (f Float16) IsFinite() bool

IsFinite returns true if f is neither infinite nor NaN.

func (Float16) IsInf

func (f Float16) IsInf(sign int) bool

IsInf reports whether f is an infinity (inf). A sign > 0 reports whether f is positive inf. A sign < 0 reports whether f is negative inf. A sign == 0 reports whether f is either inf.

func (Float16) IsNaN

func (f Float16) IsNaN() bool

IsNaN reports whether f is an IEEE 754 binary16 “not-a-number” value.

func (Float16) IsNormal

func (f Float16) IsNormal() bool

IsNormal returns true if f is not zero, infinite, subnormal, or NaN.
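
The classes tested by IsFinite, IsInf, IsNaN, and IsNormal follow from the exponent and significand fields alone. The sketch below shows one way to express that on raw bits; classify and isFinite are hypothetical helpers, not this package's methods.

```go
package main

import "fmt"

// classify is a hypothetical helper that names the IEEE 754 class of a
// binary16 bit pattern from its 5-bit exponent and 10-bit significand.
func classify(h uint16) string {
	exp := (h >> 10) & 0x1f
	frac := h & 0x03ff
	switch {
	case exp == 0x1f && frac != 0:
		return "nan"
	case exp == 0x1f:
		return "infinite"
	case exp == 0 && frac == 0:
		return "zero"
	case exp == 0:
		return "subnormal"
	default:
		return "normal"
	}
}

// isFinite is true for zero, subnormal, and normal values, that is, whenever
// the exponent field is not all ones.
func isFinite(h uint16) bool {
	return (h>>10)&0x1f != 0x1f
}

func main() {
	for _, h := range []uint16{0x3c00, 0x8000, 0x0001, 0x7c00, 0xfc00, 0x7e01} {
		fmt.Printf("0x%04x %-9s finite=%v\n", h, classify(h), isFinite(h))
	}
}
```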

func (Float16) IsQuietNaN

func (f Float16) IsQuietNaN() bool

IsQuietNaN reports whether f is a quiet (non-signaling) IEEE 754 binary16 “not-a-number” value.
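
A quiet NaN differs from a signaling NaN only in the most significant bit of the significand (0x0200 in binary16). The following sketch makes that distinction on raw bits; isQuietNaN and isSignalingNaN here are hypothetical free functions, not the package's methods.

```go
package main

import "fmt"

// isNaN reports whether h has an all-ones exponent and a nonzero significand.
func isNaN(h uint16) bool {
	return h&0x7c00 == 0x7c00 && h&0x03ff != 0
}

// isQuietNaN reports whether h is a NaN with the quiet bit (0x0200) set.
func isQuietNaN(h uint16) bool {
	return isNaN(h) && h&0x0200 != 0
}

// isSignalingNaN reports whether h is a NaN with the quiet bit clear.
func isSignalingNaN(h uint16) bool {
	return isNaN(h) && h&0x0200 == 0
}

func main() {
	for _, h := range []uint16{0x7e00, 0x7e01, 0x7c01, 0x7c00, 0xfe00} {
		fmt.Printf("0x%04x quiet=%v signaling=%v\n", h, isQuietNaN(h), isSignalingNaN(h))
	}
}
```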

func (Float16) Signbit

func (f Float16) Signbit() bool

Signbit reports whether f is negative or negative zero.

func (Float16) String

func (f Float16) String() string

String satisfies the fmt.Stringer interface.

type Precision

type Precision int

Precision indicates whether the conversion to Float16 is exact, subnormal without dropped bits, inexact, underflow, or overflow.

const (

	// PrecisionExact is for non-subnormals that don't drop bits during conversion.
	// All of these can round-trip.  Should always convert to float16.
	PrecisionExact Precision = iota

	// PrecisionUnknown is for subnormals that don't drop bits during conversion but
	// not all of these can round-trip so precision is unknown without more effort.
	// Only 2046 of these can round-trip and the rest cannot round-trip.
	PrecisionUnknown

	// PrecisionInexact is for dropped significand bits and cannot round-trip.
	// Some of these are subnormals. Cannot round-trip float32->float16->float32.
	PrecisionInexact

	// PrecisionUnderflow is for Underflows. Cannot round-trip float32->float16->float32.
	PrecisionUnderflow

	// PrecisionOverflow is for Overflows. Cannot round-trip float32->float16->float32.
	PrecisionOverflow
)

func PrecisionFromfloat32

func PrecisionFromfloat32(f32 float32) Precision

PrecisionFromfloat32 returns Precision without performing the conversion. Conversions from both Infinity and NaN values will always report PrecisionExact even if NaN payload or NaN-Quiet-Bit is lost. This function is kept simple to allow inlining and to run in under 0.5 ns/op, so that it can serve as a fast filter.
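
One way such a filter could be structured is sketched below, using only the float32 exponent and the 13 significand bits that binary16 cannot hold. The type precision, its lowercase constants, and precisionOf are hypothetical stand-ins modeled on the descriptions above; the thresholds are assumptions drawn from the binary16 range (2^-24 for the smallest subnormal, 2^-14 for the smallest normal, 2^15 for the largest exponent) rather than from the package's source.

```go
package main

import (
	"fmt"
	"math"
)

// precision mirrors the five categories described above (hypothetical type).
type precision int

const (
	exact precision = iota
	unknown
	inexact
	underflow
	overflow
)

// precisionOf is a hypothetical classifier that inspects float32 bits only.
func precisionOf(f float32) precision {
	u := math.Float32bits(f)
	if u&0x7fffffff == 0 {
		return exact // +0 and -0
	}
	exp := int(u>>23&0xff) - 127 // unbiased exponent
	frac := u & 0x007fffff
	switch {
	case exp == 128:
		return exact // Inf and NaN are reported as exact
	case exp < -24:
		return underflow // below the smallest binary16 subnormal
	case exp > 15:
		return overflow // above the largest binary16 exponent
	case frac&0x1fff != 0:
		return inexact // the low 13 significand bits would be dropped
	case exp < -14:
		return unknown // binary16 subnormal range
	default:
		return exact
	}
}

func main() {
	for _, f := range []float32{1, 0.1, 1e-10, 1e6, 1.0 / (1 << 20)} {
		fmt.Printf("%v -> %d\n", f, precisionOf(f))
	}
}
```

A caller could use a result of exact to pick the 16-bit encoding immediately and fall back to a full conversion and round-trip comparison only for the unknown case.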

Documentation was rendered with GOOS=linux and GOARCH=amd64.
