float8

package module

v0.0.3 Latest Latest Go to latest Published: Aug 1, 2024 License: MIT Imports: 1 Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/kshard/float8

Links

Open Source Insights

README ¶

Float8

minifloat in Golang

In computing, minifloats are floating-point values represented with very few bits. The library implements float8 (8-bit uint8). It ideal for applications where memory and storage efficiency are crucial but lossy precision is acceptable (e.g. computer graphics, manche learning, etc).

Features

IEEE 754 and FP8 E4M3 compatible format.
Fast conversion from/to float32.
Fast algebraic operations (+, -, *, /).

Getting Started

The latest version of the module is available at main branch. All development, including new features and bug fixes, take place on the main branch using forking and pull requests as described in contribution guidelines. The stable version is available via Golang modules.

Use go get to retrieve the library and add it as dependency to your application.

go get -u github.com/kshard/float8

Quick example

package main

import (
  "fmt"
  "math"

  "github.com/kshard/float8"
)

func main() {
  phi08 := float8.ToFloat8(math.Phi)
  phi32 := float8.ToFloat32(phi08)
  fmt.Printf("𝝅(08) %f : %08b \n", phi32, phi08)
  fmt.Printf("𝝅(32) %f : %032b \n", float32(math.Phi), math.Float32bits(math.Phi))
}

The conversion is lossy and supported range of values is limited.

Benchmark

The library implement public api using code books so that its pure Go code is efficient:

go test -run=^$ -bench=. -benchtime=1s -cpu 1
BenchmarkToFloat8     546344320         1.9410 ns/op
BenchmarkToFloat32   1000000000         0.5158 ns/op
BenchmarkAdd         1000000000         0.5804 ns/op
BenchmarkMul         1000000000         0.6461 ns/op
BenchmarkToSlice8       3481468         348.70 ns/op

The internal package math8 implements float-point algebra with focus on correctness, which is used to build code books.

How To Contribute

The library is MIT licensed and accepts contributions via GitHub pull requests:

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Added some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

The build and testing process requires Go version 1.21 or later.

commit message

The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter Contributing to a Project of Git book.

bugs

If you experience any issues with the library, please let us know via GitHub issues. We appreciate detailed and accurate reports that help us to identity and replicate the issue.

License

References

Documentation ¶

Overview ¶

DO NOT EDIT! Use cmd to regenerate it.

Package float8 implement minifloat (https://en.wikipedia.org/wiki/Minifloat) compatible with IEEE 754 and FP8 E4M3 The number is defined as ±mantissa × 2^exponent

DO NOT EDIT! Use cmd to regenerate it.

Constants ¶

View Source

const (
	Infinity = 0x7f | mantissaMask
)

Variables ¶

This section is empty.

Functions ¶

func ToFloat32 ¶

func ToFloat32(f8 Float8) float32

Convert float8 to float32

Types ¶

type Float8 ¶

type Float8 = uint8

Float8 data type

func Add ¶

func Add(a, b Float8) Float8

Add float8(s)

func Div ¶

func Div(a, b Float8) Float8

Divide float8(s)

func Mul ¶

func Mul(a, b Float8) Float8

Multiply float8(s)

func Sub ¶

func Sub(a, b Float8) Float8

Subtract float8(s)

func ToFloat8 ¶

func ToFloat8(f32 float32) Float8

Convert float32 to float8

func ToSlice8 ¶ added in v0.0.3

func ToSlice8(f32s []float32) (f8s []Float8)

Convert []float32 to []float8 Note: the function is faster than standard range over []float32

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
cmd
internal
math8 Package math8 implements canonical operations using float8 type.	Package math8 implements canonical operations using float8 type.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL