simd

v1.1.3
Published: Nov 22, 2024 License: MIT Imports: 8 Imported by: 0

README

SIMD

SIMD (Single Instruction, Multiple Data)

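SIMD support via Go assembly for arithmetic and bitwise operations, allowing parallel element-wise computations. This typically results in a 100% to 400% speedup, although the AMD64 benchmarks below show that very small vectors (under about 300 elements) can run slower than plain Go. Currently, AMD64 (x86_64) and ARM64 processors are supported.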

Function Documentation

SIMD Support

| Function | AMD64 (x86_64) | ARM64 | PPC64 / PPC64LE |
| --- | --- | --- | --- |
| AddFloat32 | SSE2 / AVX | NEON | |
| AddFloat64 | SSE2 / AVX | NEON | |
| AddInt32 | SSE2 / AVX2 | NEON | |
| AddInt64 | SSE2 / AVX2 | NEON | |
| AndInt32 | SSE2 / AVX2 | NEON | |
| AndInt64 | SSE2 / AVX2 | NEON | |
| DivFloat32 | SSE2 / AVX | | |
| DivFloat64 | SSE2 / AVX | | |
| DivInt32 | | | |
| DivInt64 | | | |
| MulFloat32 | SSE2 / AVX | NEON | |
| MulFloat64 | SSE2 / AVX | NEON | |
| MulInt32 | SSE4.1 / AVX2 | NEON | |
| MulInt64 | | | |
| OrInt32 | SSE2 / AVX2 | NEON | |
| OrInt64 | SSE2 / AVX2 | NEON | |
| SubFloat32 | SSE2 / AVX | NEON | |
| SubFloat64 | SSE2 / AVX | NEON | |
| SubInt32 | SSE2 / AVX2 | NEON | |
| SubInt64 | SSE2 / AVX2 | NEON | |
| XorInt32 | SSE2 / AVX2 | | |
| XorInt64 | SSE2 / AVX2 | | |

Make Targets

Tests

| Command | Description |
| --- | --- |
| make test | Compiles and runs tests natively on hardware. |
| make test_amd64 | Cross-compiles for amd64 and runs tests via QEMU (qemu-x86_64). |
| make test_arm64 | Cross-compiles for arm64 and runs tests via QEMU (qemu-aarch64). |

Benchmarks

| Command | Description |
| --- | --- |
| make bench | Compiles and runs benchmarks natively on hardware. |
| make bench_amd64 | Cross-compiles for amd64 and runs benchmarks via QEMU (qemu-x86_64). |
| make bench_arm64 | Cross-compiles for arm64 and runs benchmarks via QEMU (qemu-aarch64). |
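
The repository's own benchmark code is not shown on this page. As a rough sketch of how per-size ns/op figures for the plain Go baseline could be measured, the program below uses `testing.Benchmark` from the standard library. The helper names `addGo` and `benchAdd` are hypothetical and are not part of this package; passing the package's AddFloat32 instead of `addGo` would, under that assumption, measure the SIMD path.

```go
package main

import (
	"fmt"
	"testing"
)

// addGo is a hypothetical plain Go baseline with the same signature as
// AddFloat32. It is not the package's implementation.
func addGo(left, right, result []float32) int {
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	if len(result) < n {
		n = len(result)
	}
	for i := 0; i < n; i++ {
		result[i] = left[i] + right[i]
	}
	return n
}

// benchAdd times add on three slices of n elements each.
func benchAdd(n int, add func(left, right, result []float32) int) testing.BenchmarkResult {
	left := make([]float32, n)
	right := make([]float32, n)
	result := make([]float32, n)
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			add(left, right, result)
		}
	})
}

func main() {
	// Sizes are illustrative; timings depend on the machine.
	for _, n := range []int{100, 1000, 10000} {
		res := benchAdd(n, addGo)
		fmt.Printf("%d elements: %d ns/op\n", n, res.NsPerOp())
	}
}
```

Numbers produced this way vary between machines and runs, so they are only comparable when both implementations are measured on the same hardware.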

AMD64 AddFloat32 Performance:

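| Elements | Go ns/op | SIMD ns/op | Speedup (x) |
| --- | --- | --- | --- |
| Small Vectors | | | |
| 100 | 42.5 | 96.6 | 0.4 |
| 200 | 88.4 | 99.9 | 0.8 |
| 300 | 127.6 | 106.0 | 1.2 |
| 400 | 167.8 | 110.7 | 1.5 |
| 500 | 208.8 | 118.2 | 1.7 |
| 600 | 247.2 | 123.3 | 2.0 |
| 700 | 286.5 | 129.4 | 2.2 |
| 800 | 328.5 | 131.4 | 2.5 |
| 900 | 362.7 | 137.8 | 2.6 |
| Medium Vectors | | | |
| 1000 | 407.5 | 139.6 | 2.9 |
| 2000 | 818.0 | 182.9 | 4.4 |
| 3000 | 1207 | 222.1 | 5.4 |
| 4000 | 1612 | 290.0 | 5.5 |
| 5000 | 2028 | 482.7 | 4.2 |
| 6000 | 2412 | 544.5 | 4.4 |
| 7000 | 2846 | 623.4 | 4.5 |
| 8000 | 3277 | 747.4 | 4.3 |
| 9000 | 3681 | 806.7 | 4.5 |
| Large Vectors | | | |
| 10000 | 4101 | 858.6 | 4.7 |
| 20000 | 8218 | 1744 | 4.7 |
| 30000 | 12188 | 2587 | 4.7 |
| 40000 | 16363 | 3277 | 4.9 |
| 50000 | 20343 | 4074 | 4.9 |
| 60000 | 24265 | 5029 | 4.8 |
| 70000 | 28435 | 6210 | 4.5 |
| 80000 | 32298 | 7519 | 4.2 |
| 90000 | 36328 | 9987 | 3.6 |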
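
In the AMD64 figures above, the SIMD path is slower than plain Go at 100 and 200 elements and faster from 300 elements on. One way a caller might act on that is a small wrapper that routes short inputs to a scalar loop. The sketch below is an assumption about caller-side code, not something this package provides: `binaryOp`, `addGo`, and `withThreshold` are hypothetical names, and the cutoff of 300 is read off the table rather than tuned.

```go
package main

import "fmt"

// binaryOp matches the signature shared by the float32 functions documented here.
type binaryOp func(left, right, result []float32) int

// addGo is a hypothetical plain Go loop used for short inputs.
func addGo(left, right, result []float32) int {
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	if len(result) < n {
		n = len(result)
	}
	for i := 0; i < n; i++ {
		result[i] = left[i] + right[i]
	}
	return n
}

// withThreshold returns a function that calls scalar when any slice is shorter
// than minLen elements, and vector otherwise. minLen is a tuning assumption.
func withThreshold(scalar, vector binaryOp, minLen int) binaryOp {
	return func(left, right, result []float32) int {
		if len(left) < minLen || len(right) < minLen || len(result) < minLen {
			return scalar(left, right, result)
		}
		return vector(left, right, result)
	}
}

func main() {
	// In real use, vector would be the package's AddFloat32. A counting
	// wrapper around addGo stands in so the sketch compiles on its own.
	vectorCalls := 0
	vector := func(l, r, res []float32) int {
		vectorCalls++
		return addGo(l, r, res)
	}
	add := withThreshold(addGo, vector, 300)

	short := make([]float32, 4)
	long := make([]float32, 1000)
	fmt.Println(add(short, short, short), vectorCalls) // 4 0
	fmt.Println(add(long, long, long), vectorCalls)    // 1000 1
}
```

The ARM64 figures show no such crossover, so a wrapper like this would only matter on AMD64, and the right cutoff would need to be measured on the target machine.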

ARM64 AddFloat32 Performance:

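| Elements | Go ns/op | SIMD ns/op | Speedup (x) |
| --- | --- | --- | --- |
| Small Vectors | | | |
| 100 | 51.8 | 13.6 | 3.8 |
| 200 | 102.2 | 24.2 | 4.2 |
| 300 | 152.8 | 35.9 | 4.2 |
| 400 | 209.0 | 47.7 | 4.3 |
| 500 | 258.7 | 64.8 | 3.9 |
| 600 | 309.8 | 73.4 | 4.2 |
| 700 | 359.6 | 89.0 | 4.0 |
| 800 | 410.6 | 101.9 | 4.0 |
| 900 | 460.3 | 112.5 | 4.0 |
| Medium Vectors | | | |
| 1000 | 511.5 | 124.3 | 4.1 |
| 2000 | 1015 | 241.0 | 4.2 |
| 3000 | 1520 | 356.9 | 4.2 |
| 4000 | 2024 | 473.1 | 4.2 |
| 5000 | 2527 | 589.9 | 4.2 |
| 6000 | 3032 | 706.1 | 4.2 |
| 7000 | 3535 | 822.5 | 4.2 |
| 8000 | 4039 | 939.2 | 4.3 |
| 9000 | 4543 | 1056 | 4.3 |
| Large Vectors | | | |
| 10000 | 5046 | 1172 | 4.3 |
| 20000 | 10107 | 2394 | 4.2 |
| 30000 | 15139 | 3599 | 4.2 |
| 40000 | 20178 | 4957 | 4.0 |
| 50000 | 25218 | 6190 | 4.0 |
| 60000 | 30253 | 7277 | 4.1 |
| 70000 | 35285 | 8707 | 4.0 |
| 80000 | 40346 | 9924 | 4.0 |
| 90000 | 45378 | 11189 | 4.0 |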

Documentation

Overview

SIMD support via Go assembly for arithmetic and bitwise operations, allowing parallel element-wise computations.

Functions

func AddFloat32

func AddFloat32(left, right, result []float32) int

AddFloat32 performs element-wise addition on left and right, storing the sums in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float32{1, 9, 2, 8}
right := []float32{3, 7, 4, 6, 5}
result := []float32{0, 0, 0, 0, 0, 0}
length := AddFloat32(left, right, result)
fmt.Print(length, result)
Output:

4 [4 16 6 14 0 0]
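
The length rule in the example above (four operations, with the last two elements of result left untouched) can be spelled out with a pure Go stand-in. This is a sketch of the documented behavior only, not the package's assembly implementation; `addFloat32Ref` is a hypothetical name. It also shows one way to use the returned count: slicing result down to the elements that were actually written.

```go
package main

import "fmt"

// addFloat32Ref is a hypothetical pure Go stand-in that follows the documented
// semantics of AddFloat32. It is not the package's implementation.
func addFloat32Ref(left, right, result []float32) int {
	// Only the shortest of the three slices is processed.
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	if len(result) < n {
		n = len(result)
	}
	for i := 0; i < n; i++ {
		result[i] = left[i] + right[i]
	}
	return n
}

func main() {
	left := []float32{1, 9, 2, 8}
	right := []float32{3, 7, 4, 6, 5}
	result := make([]float32, 6)

	n := addFloat32Ref(left, right, result)

	// Elements past n keep whatever they held before the call,
	// so result[:n] is the part that contains sums.
	fmt.Println(n, result[:n]) // 4 [4 16 6 14]
}
```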

func AddFloat64

func AddFloat64(left, right, result []float64) int

AddFloat64 performs element-wise addition on left and right, storing the sums in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float64{1, 9, 2, 8}
right := []float64{3, 7, 4, 6, 5}
result := []float64{0, 0, 0, 0, 0, 0}
length := AddFloat64(left, right, result)
fmt.Print(length, result)
Output:

4 [4 16 6 14 0 0]

func AddInt32

func AddInt32(left, right, result []int32) int

AddInt32 performs element-wise addition on left and right, storing the sums in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := AddInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [4 16 6 14 0 0]

func AddInt64

func AddInt64(left, right, result []int64) int

AddInt64 performs element-wise addition on left and right, storing the sums in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := AddInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [4 16 6 14 0 0]

func AndInt32

func AndInt32(left, right, result []int32) int

AndInt32 performs element-wise AND on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := AndInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [1 1 0 0 0 0]

func AndInt64

func AndInt64(left, right, result []int64) int

AndInt64 performs element-wise AND on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := AndInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [1 1 0 0 0 0]

func DivFloat32

func DivFloat32(left, right, result []float32) int

DivFloat32 performs element-wise division on left and right, storing the quotients in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float32{1, 9, 2, 8}
right := []float32{3, 7, 4, 6, 5}
result := []float32{0, 0, 0, 0, 0, 0}
length := DivFloat32(left, right, result)
fmt.Print(length, result)
Output:

4 [0.33333334 1.2857143 0.5 1.3333334 0 0]

func DivFloat64

func DivFloat64(left, right, result []float64) int

DivFloat64 performs element-wise division on left and right, storing the quotients in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float64{1, 9, 2, 8}
right := []float64{3, 7, 4, 6, 5}
result := []float64{0, 0, 0, 0, 0, 0}
length := DivFloat64(left, right, result)
fmt.Print(length, result)
Output:

4 [0.3333333333333333 1.2857142857142858 0.5 1.3333333333333333 0 0]

func DivInt32

func DivInt32(left, right, result []int32) int

DivInt32 performs element-wise division on left and right, storing the quotients in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := DivInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [0 1 0 1 0 0]

func DivInt64

func DivInt64(left, right, result []int64) int

DivInt64 performs element-wise division on left and right, storing the quotients in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := DivInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [0 1 0 1 0 0]

func MulFloat32

func MulFloat32(left, right, result []float32) int

MulFloat32 performs element-wise multiplication on left and right, storing the products in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

func MulFloat64

func MulFloat64(left, right, result []float64) int

MulFloat64 performs element-wise multiplication on left and right, storing the products in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

func MulInt32

func MulInt32(left, right, result []int32) int

MulInt32 performs element-wise multiplication on left and right, storing the products in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

func MulInt64

func MulInt64(left, right, result []int64) int

MulInt64 performs element-wise multiplication on left and right, storing the products in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.
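
The Mul functions have no example on this page. As a sketch of what the documented semantics imply for the same inputs used by the other examples, the program below uses a generic pure Go stand-in. `elementWise` is a hypothetical helper, not part of this package, and the printed result comes from that stand-in rather than from the package's own functions. A helper like this could also serve as a reference when testing any of the element-wise functions.

```go
package main

import "fmt"

// elementWise is a hypothetical generic reference for the documented semantics:
// apply op to pairs from left and right up to the shortest of the three slices,
// store each value in result, and return the number of operations performed.
func elementWise[T any](left, right, result []T, op func(a, b T) T) int {
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	if len(result) < n {
		n = len(result)
	}
	for i := 0; i < n; i++ {
		result[i] = op(left[i], right[i])
	}
	return n
}

func main() {
	left := []int32{1, 9, 2, 8}
	right := []int32{3, 7, 4, 6, 5}
	result := []int32{0, 0, 0, 0, 0, 0}

	// Multiplication stands in for MulInt32 here.
	length := elementWise(left, right, result, func(a, b int32) int32 { return a * b })
	fmt.Print(length, result) // 4 [3 63 8 48 0 0]
}
```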

func OrInt32

func OrInt32(left, right, result []int32) int

OrInt32 performs element-wise OR on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := OrInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [3 15 6 14 0 0]

func OrInt64

func OrInt64(left, right, result []int64) int

OrInt64 performs element-wise OR on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := OrInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [3 15 6 14 0 0]

func SubFloat32

func SubFloat32(left, right, result []float32) int

SubFloat32 performs element-wise subtraction on left and right, storing the differences in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float32{1, 9, 2, 8}
right := []float32{3, 7, 4, 6, 5}
result := []float32{0, 0, 0, 0, 0, 0}
length := SubFloat32(left, right, result)
fmt.Print(length, result)
Output:

4 [-2 2 -2 2 0 0]

func SubFloat64

func SubFloat64(left, right, result []float64) int

SubFloat64 performs element-wise subtraction on left and right, storing the differences in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []float64{1, 9, 2, 8}
right := []float64{3, 7, 4, 6, 5}
result := []float64{0, 0, 0, 0, 0, 0}
length := SubFloat64(left, right, result)
fmt.Print(length, result)
Output:

4 [-2 2 -2 2 0 0]

func SubInt32

func SubInt32(left, right, result []int32) int

SubInt32 performs element-wise subtraction on left and right, storing the differences in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := SubInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [-2 2 -2 2 0 0]

func SubInt64

func SubInt64(left, right, result []int64) int

SubInt64 performs element-wise subtraction on left and right, storing the differences in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := SubInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [-2 2 -2 2 0 0]

func XorInt32

func XorInt32(left, right, result []int32) int

XorInt32 performs element-wise XOR on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int32{1, 9, 2, 8}
right := []int32{3, 7, 4, 6, 5}
result := []int32{0, 0, 0, 0, 0, 0}
length := XorInt32(left, right, result)
fmt.Print(length, result)
Output:

4 [2 14 6 14 0 0]

func XorInt64

func XorInt64(left, right, result []int64) int

XorInt64 performs element-wise XOR on left and right, storing the results in result. The operation is performed up to the shortest length of left, right, and result. Returns the number of operations performed.

Example
left := []int64{1, 9, 2, 8}
right := []int64{3, 7, 4, 6, 5}
result := []int64{0, 0, 0, 0, 0, 0}
length := XorInt64(left, right, result)
fmt.Print(length, result)
Output:

4 [2 14 6 14 0 0]

Directories

Path Synopsis
internal
avx
sse
