archsimd

package standard library
go1.26rc1
Published: Dec 16, 2025 License: BSD-3-Clause Imports: 3 Imported by: 0

Documentation

Overview

Package archsimd provides access to architecture-specific SIMD operations.

This is a low-level package that exposes hardware-specific functionality. It currently supports AMD64.

This package is experimental, and not subject to the Go 1 compatibility promise. It only exists when building with the GOEXPERIMENT=simd environment variable set.

Vector types and operations

Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding to the hardware's vector registers. On AMD64, 128-, 256-, and 512-bit vectors are supported.

Mask types, such as Mask8x16, are defined similarly but are opaque, which hides the differences between their underlying hardware representations. A mask can be converted to or from the corresponding integer vector type, or to or from a bitmask.
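The following scalar sketch in plain Go makes the mask representations concrete. It does not use archsimd. The helper names toBitmask, fromBitmask, and toIntVector are hypothetical. Two details are assumptions based on common AMD64 conventions, not statements about the package's API: element i maps to bit i, and true elements are all ones in the integer-vector form.

```go
package main

import "fmt"

// toBitmask packs a 16-element boolean mask into an integer, storing
// element i in bit i (assumed bit order; hypothetical helper).
func toBitmask(m [16]bool) uint16 {
	var bits uint16
	for i, set := range m {
		if set {
			bits |= 1 << i
		}
	}
	return bits
}

// fromBitmask is the inverse: bit i of bits becomes element i of the mask.
func fromBitmask(bits uint16) [16]bool {
	var m [16]bool
	for i := range m {
		m[i] = bits&(1<<i) != 0
	}
	return m
}

// toIntVector models the integer-vector form of a mask, assuming true
// elements are all ones (-1) and false elements are 0.
func toIntVector(m [16]bool) [16]int8 {
	var v [16]int8
	for i, set := range m {
		if set {
			v[i] = -1
		}
	}
	return v
}

func main() {
	var m [16]bool
	m[0], m[3], m[15] = true, true, true
	bits := toBitmask(m)
	fmt.Printf("%#x\n", bits)           // 0x8009
	fmt.Println(fromBitmask(bits) == m) // true
	v := toIntVector(m)
	fmt.Println(v[:4]) // [-1 0 0 -1]
}
```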

Operations are mostly defined as methods on the vector types. Most of them are compiler intrinsics and correspond directly to hardware instructions.

Common operations include:

  • Load/Store: Load a vector from memory or store a vector to memory.
  • Arithmetic: Add, Sub, Mul, etc.
  • Bitwise: And, Or, Xor, etc.
  • Comparison: Equal, Greater, etc., which produce a mask.
  • Conversion: Convert between different vector types.
  • Field selection and rearrangement: GetElem, Permute, etc.
  • Masking: Masked, Merge.

The compiler recognizes certain patterns of operations and may optimize them to more performant instructions. For example, on AVX512, an Add operation followed by Masked may be optimized to a masked add instruction. For this reason, not all hardware instructions are available as APIs.

CPU feature checks

The package provides global variables to check for CPU features available at runtime. For example, on AMD64, the X86 variable provides methods to check for AVX2, AVX512, etc. It is recommended to check for CPU features before using the corresponding vector operations.

Notes

  • This package is not portable, as the available types and operations depend on the target architecture. It is not recommended to expose the SIMD types defined in this package in public APIs.
  • For performance reasons, it is recommended to use the vector types directly as values. It is not recommended to take the address of a vector type, allocate it on the heap, or put it in an aggregate type.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ClearAVXUpperBits

func ClearAVXUpperBits()

ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers. It is intended for transitioning from AVX to SSE, eliminating the performance penalties caused by false dependencies.

Note: in the future the compiler may automatically generate the instruction, making this function unnecessary.

Asm: VZEROUPPER, CPU Feature: AVX

Types

type Float32x16

type Float32x16 struct {
	// contains filtered or unexported fields
}

Float32x16 is a 512-bit SIMD vector of 16 float32 values.

func BroadcastFloat32x16

func BroadcastFloat32x16(x float32) Float32x16

BroadcastFloat32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadFloat32x16

func LoadFloat32x16(y *[16]float32) Float32x16

LoadFloat32x16 loads a Float32x16 from an array.

func LoadFloat32x16Slice

func LoadFloat32x16Slice(s []float32) Float32x16

LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s.

func LoadFloat32x16SlicePart

func LoadFloat32x16SlicePart(s []float32) Float32x16

LoadFloat32x16SlicePart loads a Float32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.

func LoadMaskedFloat32x16

func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16

LoadMaskedFloat32x16 loads a Float32x16 from an array, reading only the elements where mask is true. The other elements of the result are zero.

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func (Float32x16) Add

func (x Float32x16) Add(y Float32x16) Float32x16

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX512

func (Float32x16) AsFloat64x8

func (from Float32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 reinterprets the bits of a Float32x16 as a Float64x8.

func (Float32x16) AsInt16x32

func (from Float32x16) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of a Float32x16 as an Int16x32.

func (Float32x16) AsInt32x16

func (from Float32x16) AsInt32x16() (to Int32x16)

AsInt32x16 reinterprets the bits of a Float32x16 as an Int32x16.

func (Float32x16) AsInt64x8

func (from Float32x16) AsInt64x8() (to Int64x8)

AsInt64x8 reinterprets the bits of a Float32x16 as an Int64x8.

func (Float32x16) AsInt8x64

func (from Float32x16) AsInt8x64() (to Int8x64)

AsInt8x64 reinterprets the bits of a Float32x16 as an Int8x64.

func (Float32x16) AsUint16x32

func (from Float32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of a Float32x16 as a Uint16x32.

func (Float32x16) AsUint32x16

func (from Float32x16) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of a Float32x16 as a Uint32x16.

func (Float32x16) AsUint64x8

func (from Float32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of a Float32x16 as a Uint64x8.

func (Float32x16) AsUint8x64

func (from Float32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 reinterprets the bits of a Float32x16 as a Uint8x64.
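Every As method keeps the total vector width. For example, 16 float32 values and 8 float64 values both occupy 512 bits. That means these methods can only be bit reinterpretations, not value conversions. The sketch below models two of the Float32x4 counterparts on a plain 4-element array using math.Float32bits. The names asUint32 and asUint64 are hypothetical. The little-endian element order is an assumption that matches AMD64.

```go
package main

import (
	"fmt"
	"math"
)

// asUint32 models AsUint32x4 on a Float32x4: each element keeps its bit
// pattern, and no numeric conversion takes place.
func asUint32(x [4]float32) [4]uint32 {
	var out [4]uint32
	for i, f := range x {
		out[i] = math.Float32bits(f)
	}
	return out
}

// asUint64 models AsUint64x2 on a Float32x4, assuming little-endian lane
// order: element 2*i supplies the low 32 bits of output element i and
// element 2*i+1 supplies the high 32 bits.
func asUint64(x [4]float32) [2]uint64 {
	var out [2]uint64
	for i := range out {
		lo := uint64(math.Float32bits(x[2*i]))
		hi := uint64(math.Float32bits(x[2*i+1]))
		out[i] = hi<<32 | lo
	}
	return out
}

func main() {
	x := [4]float32{1, -2, 0.5, 0}
	fmt.Printf("%#x\n", asUint32(x)[0]) // 0x3f800000, the bits of 1.0
	fmt.Printf("%#x\n", asUint64(x)[0]) // 0xc00000003f800000: -2.0 high, 1.0 low
}
```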

func (Float32x16) CeilScaled

func (x Float32x16) CeilScaled(prec uint8) Float32x16

CeilScaled rounds elements up with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) CeilScaledResidue

func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16

CeilScaledResidue computes the difference after ceiling with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512
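Below is a scalar model of CeilScaled and CeilScaledResidue. It assumes they follow the VRNDSCALEPS and VREDUCEPS definitions. Under that assumption, prec is the number of fraction bits kept, so the rounded result is a multiple of 2^-prec. The residue is x minus that rounded value. The functions ceilScaled and ceilScaledResidue are hypothetical and use float64 for simplicity. FloorScaled, RoundToEvenScaled, and TruncScaled would presumably swap math.Ceil for math.Floor, math.RoundToEven, or math.Trunc.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled rounds x up to the nearest multiple of 2^-prec, keeping prec
// fraction bits. Assumed model of CeilScaled based on VRNDSCALEPS.
func ceilScaled(x float64, prec uint8) float64 {
	scale := math.Ldexp(1, int(prec)) // 2^prec; scaling by it is exact
	return math.Ceil(x*scale) / scale
}

// ceilScaledResidue returns x minus its rounded value, the quantity that
// VREDUCEPS computes. Assumed model of CeilScaledResidue.
func ceilScaledResidue(x float64, prec uint8) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.3, 1))         // 1.5
	fmt.Println(ceilScaled(1.3, 0))         // 2
	fmt.Println(ceilScaledResidue(1.25, 1)) // -0.25
}
```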

func (Float32x16) Compress

func (x Float32x16) Compress(mask Mask32x16) Float32x16

Compress selects the elements of x where mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512

func (Float32x16) ConcatPermute

func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the 32-element concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices, that is, the low 5 bits (values 0-31).

Asm: VPERMI2PS, CPU Feature: AVX512

func (Float32x16) ConvertToInt32

func (x Float32x16) ConvertToInt32() Int32x16

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX512

func (Float32x16) ConvertToUint32

func (x Float32x16) ConvertToUint32() Uint32x16

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512
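The truncation rule above matches Go's own float-to-integer conversion, which discards the fractional part of representable values. The hypothetical truncToInt32 below models ConvertToInt32 on four lanes for in-range inputs only. Out-of-range results are implementation-defined. On AMD64 hardware, VCVTTPS2DQ returns the "integer indefinite" value 0x80000000 for such inputs.

```go
package main

import "fmt"

// truncToInt32 mirrors the in-range behavior of ConvertToInt32: the
// fractional part is discarded (round toward zero). Out-of-range inputs
// are deliberately not modeled.
func truncToInt32(x [4]float32) [4]int32 {
	var out [4]int32
	for i, f := range x {
		out[i] = int32(f)
	}
	return out
}

func main() {
	fmt.Println(truncToInt32([4]float32{2.7, -2.7, 0.5, -0.5})) // [2 -2 0 0]
}
```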

func (Float32x16) Div

func (x Float32x16) Div(y Float32x16) Float32x16

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX512

func (Float32x16) Equal

func (x Float32x16) Equal(y Float32x16) Mask32x16

Equal returns a mask that is true in each element where x equals y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Expand

func (x Float32x16) Expand(mask Mask32x16) Float32x16

Expand distributes the lowest-indexed elements of x, in order, to the positions of the result where mask is true, from lower positions to higher.

Asm: VEXPANDPS, CPU Feature: AVX512
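Here is a scalar sketch of Compress and Expand on four lanes. The names compress and expand are hypothetical. Zeroing the positions that receive no element is an assumption of this model, because the documentation above does not say what those positions hold.

```go
package main

import "fmt"

// compress packs the elements of x whose mask bit is set, in order, into
// the low positions of the result. Remaining positions are zero (assumed).
func compress(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

// expand places the low elements of x, in order, at the positions where
// mask is set. Positions where mask is clear are zero (assumed).
func expand(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	n := 0
	for i, put := range mask {
		if put {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	m := [4]bool{false, true, false, true}
	c := compress([4]float32{10, 20, 30, 40}, m)
	fmt.Println(c)            // [20 40 0 0]
	fmt.Println(expand(c, m)) // [0 20 0 40]
}
```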

func (Float32x16) FloorScaled

func (x Float32x16) FloorScaled(prec uint8) Float32x16

FloorScaled rounds elements down with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) FloorScaledResidue

func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16

FloorScaledResidue computes the difference after flooring with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x16) GetHi

func (x Float32x16) GetHi() Float32x8

GetHi returns the upper half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float32x16) GetLo

func (x Float32x16) GetLo() Float32x8

GetLo returns the lower half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float32x16) Greater

func (x Float32x16) Greater(y Float32x16) Mask32x16

Greater returns a mask that is true in each element where x is greater than y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) GreaterEqual

func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16

GreaterEqual returns a mask that is true in each element where x is greater than or equal to y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) IsNan

func (x Float32x16) IsNan(y Float32x16) Mask32x16

IsNan returns a mask that is true where elements are NaN. Call it as x.IsNan(x) to test the elements of x.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Len

func (x Float32x16) Len() int

Len returns the number of elements in a Float32x16.

func (Float32x16) Less

func (x Float32x16) Less(y Float32x16) Mask32x16

Less returns a mask that is true in each element where x is less than y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) LessEqual

func (x Float32x16) LessEqual(y Float32x16) Mask32x16

LessEqual returns a mask that is true in each element where x is less than or equal to y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Masked

func (x Float32x16) Masked(mask Mask32x16) Float32x16

Masked returns x but with elements zeroed where mask is false.

func (Float32x16) Max

func (x Float32x16) Max(y Float32x16) Float32x16

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX512

func (Float32x16) Merge

func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16

Merge returns x but with elements set to y where mask is false.
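The next example is a scalar sketch of Masked and Merge on four lanes. It includes the Add-then-Masked pattern that the overview says the compiler may turn into a single masked add on AVX-512. The helpers masked, merge, and add are hypothetical models, not the package's methods.

```go
package main

import "fmt"

// masked keeps x where mask is true and zeroes the other elements.
func masked(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge keeps x where mask is true and takes y where mask is false.
func merge(x, y [4]float32, mask [4]bool) [4]float32 {
	out := y
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// add is an elementwise addition, standing in for the vector Add method.
func add(x, y [4]float32) [4]float32 {
	var out [4]float32
	for i := range x {
		out[i] = x[i] + y[i]
	}
	return out
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{10, 20, 30, 40}
	m := [4]bool{true, false, true, false}
	fmt.Println(masked(add(x, y), m)) // [11 0 33 0]
	fmt.Println(merge(x, y, m))       // [1 20 3 40]
}
```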

func (Float32x16) Min

func (x Float32x16) Min(y Float32x16) Float32x16

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX512

func (Float32x16) Mul

func (x Float32x16) Mul(y Float32x16) Float32x16

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX512

func (Float32x16) MulAdd

func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x16) MulAddSub

func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16

MulAddSub performs a fused (x * y) - z for even-indexed elements and (x * y) + z for odd-indexed elements, counting from index 0.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x16) MulSubAdd

func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16

MulSubAdd performs a fused (x * y) + z for even-indexed elements and (x * y) - z for odd-indexed elements, counting from index 0.

Asm: VFMSUBADD213PS, CPU Feature: AVX512

func (Float32x16) NotEqual

func (x Float32x16) NotEqual(y Float32x16) Mask32x16

NotEqual returns a mask that is true in each element where x is not equal to y.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Permute

func (x Float32x16) Permute(indices Uint32x16) Float32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMPS, CPU Feature: AVX512
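The following scalar models cover Permute and ConcatPermute. As the documentation describes, permute uses only the low 4 bits of each index. concatPermute indexes a 32-element table formed by x followed by y, using the low 5 bits of each index. Both function names are hypothetical.

```go
package main

import "fmt"

// permute models Float32x16.Permute: result[i] = x[indices[i] & 15].
func permute(x [16]float32, indices [16]uint32) [16]float32 {
	var out [16]float32
	for i, idx := range indices {
		out[i] = x[idx&15]
	}
	return out
}

// concatPermute models Float32x16.ConcatPermute: x and y form one
// 32-element table (x first), indexed by the low 5 bits of each index.
func concatPermute(x, y [16]float32, indices [16]uint32) [16]float32 {
	var out [16]float32
	for i, idx := range indices {
		j := idx & 31
		if j < 16 {
			out[i] = x[j]
		} else {
			out[i] = y[j-16]
		}
	}
	return out
}

func main() {
	var x, y [16]float32
	var idx [16]uint32
	for i := range x {
		x[i] = float32(i)
		y[i] = float32(100 + i)
		idx[i] = uint32(15 - i) // reverse order
	}
	fmt.Println(permute(x, idx)[0]) // 15
	idx[0] = 16 + 3                 // selects y[3] from the concatenation
	fmt.Println(concatPermute(x, y, idx)[0]) // 103
}
```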

func (Float32x16) Reciprocal

func (x Float32x16) Reciprocal() Float32x16

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PS, CPU Feature: AVX512

func (Float32x16) ReciprocalSqrt

func (x Float32x16) ReciprocalSqrt() Float32x16

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PS, CPU Feature: AVX512

func (Float32x16) RoundToEvenScaled

func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16

RoundToEvenScaled rounds elements with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) RoundToEvenScaledResidue

func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x16) Scale

func (x Float32x16) Scale(y Float32x16) Float32x16

Scale multiplies each element of x by a power of 2: the result is x * 2^floor(y), elementwise.

Asm: VSCALEFPS, CPU Feature: AVX512
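Here is a scalar model of Scale. It assumes Scale follows the VSCALEFPS definition of x * 2^floor(y) per element. The scale function is hypothetical and ignores special inputs such as NaN and infinities.

```go
package main

import (
	"fmt"
	"math"
)

// scale computes x[i] * 2^floor(y[i]) for each element using math.Ldexp.
// NaN, infinities, and other special cases are not modeled.
func scale(x, y [4]float64) [4]float64 {
	var out [4]float64
	for i := range x {
		out[i] = math.Ldexp(x[i], int(math.Floor(y[i])))
	}
	return out
}

func main() {
	fmt.Println(scale([4]float64{3, 3, 1.5, -1}, [4]float64{2, 2.9, -1, 0.5})) // [12 12 0.75 -1]
}
```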

func (Float32x16) SelectFromPairGrouped

func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16

SelectFromPairGrouped performs, within each of the four 128-bit subvectors of x and y, a selection of four elements, where selector values in the range 0-3 specify elements of x's subvector and values in the range 4-7 specify elements 0-3 of y's subvector. When the selectors are constants and the selection can be implemented in a single instruction, it is; otherwise it requires two.

If the selectors are not constant, this translates to a function call.

Asm: VSHUFPS, CPU Feature: AVX512

func (Float32x16) SetHi

func (x Float32x16) SetHi(y Float32x8) Float32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float32x16) SetLo

func (x Float32x16) SetLo(y Float32x8) Float32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float32x16) Sqrt

func (x Float32x16) Sqrt() Float32x16

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX512

func (Float32x16) Store

func (x Float32x16) Store(y *[16]float32)

Store stores a Float32x16 to an array.

func (Float32x16) StoreMasked

func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)

StoreMasked stores a Float32x16 to an array, writing only the elements where mask is true.

Asm: VMOVDQU32, CPU Feature: AVX512

func (Float32x16) StoreSlice

func (x Float32x16) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 16 float32s.

func (Float32x16) StoreSlicePart

func (x Float32x16) StoreSlicePart(s []float32)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
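The Part variants make it easy to handle a slice whose length is not a multiple of 16. The sketch below models LoadFloat32x16SlicePart and StoreSlicePart with copy on plain arrays, then uses them in a chunked loop. loadSlicePart, storeSlicePart, and scaleAll are hypothetical names. With the real package, the inner loop would be a single vector operation.

```go
package main

import "fmt"

const lanes = 16

// loadSlicePart copies up to 16 elements from s and leaves the rest of
// the vector zero, mirroring LoadFloat32x16SlicePart.
func loadSlicePart(s []float32) [lanes]float32 {
	var v [lanes]float32
	copy(v[:], s)
	return v
}

// storeSlicePart writes as many of the 16 elements as fit in s,
// mirroring StoreSlicePart.
func storeSlicePart(v [lanes]float32, s []float32) {
	copy(s, v[:])
}

// scaleAll doubles every element of s, processing full 16-element chunks
// and a shorter tail through the same Part functions.
func scaleAll(s []float32) {
	for i := 0; i < len(s); i += lanes {
		chunk := s[i:min(i+lanes, len(s))]
		v := loadSlicePart(chunk)
		for j := range v {
			v[j] *= 2 // stands in for a vector Mul or Add
		}
		storeSlicePart(v, chunk)
	}
}

func main() {
	v := loadSlicePart([]float32{1, 2, 3})
	fmt.Println(v[2], v[3]) // 3 0

	data := make([]float32, 20)
	for i := range data {
		data[i] = float32(i)
	}
	scaleAll(data)
	fmt.Println(data[15], data[19]) // 30 38
}
```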

func (Float32x16) String

func (x Float32x16) String() string

String returns a string representation of SIMD vector x.

func (Float32x16) Sub

func (x Float32x16) Sub(y Float32x16) Float32x16

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX512

func (Float32x16) TruncScaled

func (x Float32x16) TruncScaled(prec uint8) Float32x16

TruncScaled truncates elements with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) TruncScaledResidue

func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16

TruncScaledResidue computes the difference after truncating with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float32x4

type Float32x4 struct {
	// contains filtered or unexported fields
}

Float32x4 is a 128-bit SIMD vector of 4 float32 values.

func BroadcastFloat32x4

func BroadcastFloat32x4(x float32) Float32x4

BroadcastFloat32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat32x4

func LoadFloat32x4(y *[4]float32) Float32x4

LoadFloat32x4 loads a Float32x4 from an array.

func LoadFloat32x4Slice

func LoadFloat32x4Slice(s []float32) Float32x4

LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s.

func LoadFloat32x4SlicePart

func LoadFloat32x4SlicePart(s []float32) Float32x4

LoadFloat32x4SlicePart loads a Float32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.

func LoadMaskedFloat32x4

func LoadMaskedFloat32x4(y *[4]float32, mask Mask32x4) Float32x4

LoadMaskedFloat32x4 loads a Float32x4 from an array, reading only the elements where mask is true. The other elements of the result are zero.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x4) Add

func (x Float32x4) Add(y Float32x4) Float32x4

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX

func (Float32x4) AddPairs

func (x Float32x4) AddPairs(y Float32x4) Float32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPS, CPU Feature: AVX

func (Float32x4) AddSub

func (x Float32x4) AddSub(y Float32x4) Float32x4

AddSub computes x - y in even-indexed elements and x + y in odd-indexed elements.

Asm: VADDSUBPS, CPU Feature: AVX

func (Float32x4) AsFloat64x2

func (from Float32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 reinterprets the bits of a Float32x4 as a Float64x2.

func (Float32x4) AsInt16x8

func (from Float32x4) AsInt16x8() (to Int16x8)

AsInt16x8 reinterprets the bits of a Float32x4 as an Int16x8.

func (Float32x4) AsInt32x4

func (from Float32x4) AsInt32x4() (to Int32x4)

AsInt32x4 reinterprets the bits of a Float32x4 as an Int32x4.

func (Float32x4) AsInt64x2

func (from Float32x4) AsInt64x2() (to Int64x2)

AsInt64x2 reinterprets the bits of a Float32x4 as an Int64x2.

func (Float32x4) AsInt8x16

func (from Float32x4) AsInt8x16() (to Int8x16)

AsInt8x16 reinterprets the bits of a Float32x4 as an Int8x16.

func (Float32x4) AsUint16x8

func (from Float32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 reinterprets the bits of a Float32x4 as a Uint16x8.

func (Float32x4) AsUint32x4

func (from Float32x4) AsUint32x4() (to Uint32x4)

AsUint32x4 reinterprets the bits of a Float32x4 as a Uint32x4.

func (Float32x4) AsUint64x2

func (from Float32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 reinterprets the bits of a Float32x4 as a Uint64x2.

func (Float32x4) AsUint8x16

func (from Float32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 reinterprets the bits of a Float32x4 as a Uint8x16.

func (Float32x4) Broadcast128

func (x Float32x4) Broadcast128() Float32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX2

func (Float32x4) Broadcast256

func (x Float32x4) Broadcast256() Float32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX2

func (Float32x4) Broadcast512

func (x Float32x4) Broadcast512() Float32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX512

func (Float32x4) Ceil

func (x Float32x4) Ceil() Float32x4

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) CeilScaled

func (x Float32x4) CeilScaled(prec uint8) Float32x4

CeilScaled rounds elements up with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) CeilScaledResidue

func (x Float32x4) CeilScaledResidue(prec uint8) Float32x4

CeilScaledResidue computes the difference after ceiling with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x4) Compress

func (x Float32x4) Compress(mask Mask32x4) Float32x4

Compress selects the elements of x where mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512

func (Float32x4) ConcatPermute

func (x Float32x4) ConcatPermute(y Float32x4, indices Uint32x4) Float32x4

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], xy[indices[2]], xy[indices[3]]}, where xy is the 8-element concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices, that is, the low 3 bits (values 0-7).

Asm: VPERMI2PS, CPU Feature: AVX512

func (Float32x4) ConvertToFloat64

func (x Float32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTPS2PD, CPU Feature: AVX

func (Float32x4) ConvertToInt32

func (x Float32x4) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX

func (Float32x4) ConvertToInt64

func (x Float32x4) ConvertToInt64() Int64x4

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2QQ, CPU Feature: AVX512

func (Float32x4) ConvertToUint32

func (x Float32x4) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512

func (Float32x4) ConvertToUint64

func (x Float32x4) ConvertToUint64() Uint64x4

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UQQ, CPU Feature: AVX512

func (Float32x4) Div

func (x Float32x4) Div(y Float32x4) Float32x4

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX

func (Float32x4) Equal

func (x Float32x4) Equal(y Float32x4) Mask32x4

Equal returns a mask that is true in each element where x equals y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Expand

func (x Float32x4) Expand(mask Mask32x4) Float32x4

Expand distributes the lowest-indexed elements of x, in order, to the positions of the result where mask is true, from lower positions to higher.

Asm: VEXPANDPS, CPU Feature: AVX512

func (Float32x4) Floor

func (x Float32x4) Floor() Float32x4

Floor rounds elements down to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) FloorScaled

func (x Float32x4) FloorScaled(prec uint8) Float32x4

FloorScaled rounds elements down with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) FloorScaledResidue

func (x Float32x4) FloorScaledResidue(prec uint8) Float32x4

FloorScaledResidue computes the difference after flooring with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x4) GetElem

func (x Float32x4) GetElem(index uint8) float32

GetElem returns the element of x at position index.

index yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Float32x4) Greater

func (x Float32x4) Greater(y Float32x4) Mask32x4

Greater returns a mask that is true in each element where x is greater than y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) GreaterEqual

func (x Float32x4) GreaterEqual(y Float32x4) Mask32x4

GreaterEqual returns a mask that is true in each element where x is greater than or equal to y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) IsNan

func (x Float32x4) IsNan(y Float32x4) Mask32x4

IsNan returns a mask that is true where elements are NaN. Call it as x.IsNan(x) to test the elements of x.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Len

func (x Float32x4) Len() int

Len returns the number of elements in a Float32x4.

func (Float32x4) Less

func (x Float32x4) Less(y Float32x4) Mask32x4

Less returns a mask that is true in each element where x is less than y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) LessEqual

func (x Float32x4) LessEqual(y Float32x4) Mask32x4

LessEqual returns a mask that is true in each element where x is less than or equal to y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Masked

func (x Float32x4) Masked(mask Mask32x4) Float32x4

Masked returns x but with elements zeroed where mask is false.

func (Float32x4) Max

func (x Float32x4) Max(y Float32x4) Float32x4

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX

func (Float32x4) Merge

func (x Float32x4) Merge(y Float32x4, mask Mask32x4) Float32x4

Merge returns x but with elements set to y where mask is false.

func (Float32x4) Min

func (x Float32x4) Min(y Float32x4) Float32x4

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX

func (Float32x4) Mul

func (x Float32x4) Mul(y Float32x4) Float32x4

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX

func (Float32x4) MulAdd

func (x Float32x4) MulAdd(y Float32x4, z Float32x4) Float32x4

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x4) MulAddSub

func (x Float32x4) MulAddSub(y Float32x4, z Float32x4) Float32x4

MulAddSub performs a fused (x * y) - z for even-indexed elements and (x * y) + z for odd-indexed elements, counting from index 0.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x4) MulSubAdd

func (x Float32x4) MulSubAdd(y Float32x4, z Float32x4) Float32x4

MulSubAdd performs a fused (x * y) + z for even-indexed elements and (x * y) - z for odd-indexed elements, counting from index 0.

Asm: VFMSUBADD213PS, CPU Feature: AVX512
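Below is a scalar model of the alternating operations AddSub, MulAddSub, and MulSubAdd on four lanes. It assumes the Intel element ordering. In that ordering, VADDSUBPS and VFMADDSUB213PS subtract in even-indexed elements (counting from 0) and add in odd-indexed ones, and VFMSUBADD213PS does the opposite. math.FMA stands in for the fused operation because it rounds only once. The helper names are hypothetical, and float64 is used for simplicity.

```go
package main

import (
	"fmt"
	"math"
)

// addSub computes x-y in even-indexed elements and x+y in odd ones.
func addSub(x, y [4]float64) [4]float64 {
	var out [4]float64
	for i := range x {
		if i%2 == 0 {
			out[i] = x[i] - y[i]
		} else {
			out[i] = x[i] + y[i]
		}
	}
	return out
}

// mulAddSub computes fused x*y-z in even elements and x*y+z in odd ones.
func mulAddSub(x, y, z [4]float64) [4]float64 {
	var out [4]float64
	for i := range x {
		if i%2 == 0 {
			out[i] = math.FMA(x[i], y[i], -z[i])
		} else {
			out[i] = math.FMA(x[i], y[i], z[i])
		}
	}
	return out
}

// mulSubAdd computes fused x*y+z in even elements and x*y-z in odd ones.
func mulSubAdd(x, y, z [4]float64) [4]float64 {
	var out [4]float64
	for i := range x {
		if i%2 == 0 {
			out[i] = math.FMA(x[i], y[i], z[i])
		} else {
			out[i] = math.FMA(x[i], y[i], -z[i])
		}
	}
	return out
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	y := [4]float64{10, 10, 10, 10}
	z := [4]float64{1, 1, 1, 1}
	fmt.Println(addSub(x, y))       // [-9 12 -7 14]
	fmt.Println(mulAddSub(x, y, z)) // [9 21 29 41]
	fmt.Println(mulSubAdd(x, y, z)) // [11 19 31 39]
}
```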

func (Float32x4) NotEqual

func (x Float32x4) NotEqual(y Float32x4) Mask32x4

NotEqual returns a mask that is true in each element where x is not equal to y.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Reciprocal

func (x Float32x4) Reciprocal() Float32x4

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCPPS, CPU Feature: AVX

func (Float32x4) ReciprocalSqrt

func (x Float32x4) ReciprocalSqrt() Float32x4

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRTPS, CPU Feature: AVX

func (Float32x4) RoundToEven

func (x Float32x4) RoundToEven() Float32x4

RoundToEven rounds elements to the nearest integer, with ties rounded to even.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) RoundToEvenScaled

func (x Float32x4) RoundToEvenScaled(prec uint8) Float32x4

RoundToEvenScaled rounds elements with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) RoundToEvenScaledResidue

func (x Float32x4) RoundToEvenScaledResidue(prec uint8) Float32x4

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x4) Scale

func (x Float32x4) Scale(y Float32x4) Float32x4

Scale multiplies each element of x by a power of 2: the result is x * 2^floor(y), elementwise.

Asm: VSCALEFPS, CPU Feature: AVX512

func (Float32x4) SelectFromPair

func (x Float32x4) SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4

SelectFromPair returns a selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements of x and values in the range 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it is; otherwise it requires two. a is the source index of the lowest-indexed element of the output, and b, c, and d are the source indices of the 2nd, 3rd, and 4th elements. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constant, this translates to a function call.

Asm: VSHUFPS, CPU Feature: AVX
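This scalar model of SelectFromPair reproduces the example in the documentation above. selectFromPair is a hypothetical name, and selectors outside 0-7 are not handled. SelectFromPairGrouped, described earlier, applies the same kind of selection independently within each 128-bit subvector.

```go
package main

import "fmt"

// selectFromPair picks output element 0 with selector a and elements
// 1, 2, 3 with b, c, d. Selectors 0-3 read x; 4-7 read element s-4 of y.
// Selectors above 7 are out of range for this model and would panic.
func selectFromPair(x [4]float32, a, b, c, d uint8, y [4]float32) [4]float32 {
	pick := func(s uint8) float32 {
		if s < 4 {
			return x[s]
		}
		return y[s-4]
	}
	return [4]float32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]float32{1, 2, 4, 8}
	y := [4]float32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```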

func (Float32x4) SetElem

func (x Float32x4) SetElem(index uint8, y float32) Float32x4

SetElem returns x with the element at position index set to y.

index yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Float32x4) Sqrt

func (x Float32x4) Sqrt() Float32x4

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX

func (Float32x4) Store

func (x Float32x4) Store(y *[4]float32)

Store stores a Float32x4 to an array.

func (Float32x4) StoreMasked

func (x Float32x4) StoreMasked(y *[4]float32, mask Mask32x4)

StoreMasked stores a Float32x4 to an array, writing only the elements where mask is true.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x4) StoreSlice

func (x Float32x4) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 4 float32s.

func (Float32x4) StoreSlicePart

func (x Float32x4) StoreSlicePart(s []float32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Float32x4) String

func (x Float32x4) String() string

String returns a string representation of SIMD vector x.

func (Float32x4) Sub

func (x Float32x4) Sub(y Float32x4) Float32x4

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX

func (Float32x4) SubPairs

func (x Float32x4) SubPairs(y Float32x4) Float32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPS, CPU Feature: AVX

func (Float32x4) Trunc

func (x Float32x4) Trunc() Float32x4

Trunc truncates elements towards zero.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) TruncScaled

func (x Float32x4) TruncScaled(prec uint8) Float32x4

TruncScaled truncates elements with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) TruncScaledResidue

func (x Float32x4) TruncScaledResidue(prec uint8) Float32x4

TruncScaledResidue computes the difference after truncating with specified precision.

prec yields better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float32x8

type Float32x8 struct {
	// contains filtered or unexported fields
}

Float32x8 is a 256-bit SIMD vector of 8 float32 values.

func BroadcastFloat32x8

func BroadcastFloat32x8(x float32) Float32x8

BroadcastFloat32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat32x8

func LoadFloat32x8(y *[8]float32) Float32x8

LoadFloat32x8 loads a Float32x8 from an array.

func LoadFloat32x8Slice

func LoadFloat32x8Slice(s []float32) Float32x8

LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s.

func LoadFloat32x8SlicePart

func LoadFloat32x8SlicePart(s []float32) Float32x8

LoadFloat32x8SlicePart loads a Float32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.

func LoadMaskedFloat32x8

func LoadMaskedFloat32x8(y *[8]float32, mask Mask32x8) Float32x8

LoadMaskedFloat32x8 loads a Float32x8 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x8) Add

func (x Float32x8) Add(y Float32x8) Float32x8

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX

func (Float32x8) AddPairs

func (x Float32x8) AddPairs(y Float32x8) Float32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPS, CPU Feature: AVX

func (Float32x8) AddSub

func (x Float32x8) AddSub(y Float32x8) Float32x8

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPS, CPU Feature: AVX
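A plain-Go model of AddSub on an [8]float32 array is sketched below. It assumes that x is the left-hand operand, so even-indexed results are x[i]-y[i] and odd-indexed results are x[i]+y[i]; addSub is a hypothetical name.

```go
package main

import "fmt"

// addSub is a hypothetical scalar model of Float32x8.AddSub, assuming x is
// the left-hand operand: even lanes get x[i]-y[i], odd lanes get x[i]+y[i].
func addSub(x, y [8]float32) [8]float32 {
	var r [8]float32
	for i := range r {
		if i%2 == 0 {
			r[i] = x[i] - y[i]
		} else {
			r[i] = x[i] + y[i]
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]float32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addSub(x, y)) // [-9 22 -27 44 -45 66 -63 88]
}
```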

func (Float32x8) AsFloat64x4

func (from Float32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of a Float32x8 as a Float64x4.

func (Float32x8) AsInt16x16

func (from Float32x8) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of a Float32x8 as an Int16x16.

func (Float32x8) AsInt32x8

func (from Float32x8) AsInt32x8() (to Int32x8)

AsInt32x8 reinterprets the bits of a Float32x8 as an Int32x8.

func (Float32x8) AsInt64x4

func (from Float32x8) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of a Float32x8 as an Int64x4.

func (Float32x8) AsInt8x32

func (from Float32x8) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of a Float32x8 as an Int8x32.

func (Float32x8) AsUint16x16

func (from Float32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of a Float32x8 as a Uint16x16.

func (Float32x8) AsUint32x8

func (from Float32x8) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of a Float32x8 as a Uint32x8.

func (Float32x8) AsUint64x4

func (from Float32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of a Float32x8 as a Uint64x4.

func (Float32x8) AsUint8x32

func (from Float32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of a Float32x8 as a Uint8x32.
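Because some As methods change the element count (for example, Float32x8 to Int8x32), they keep the vector's 256 bits unchanged rather than converting values, whereas ConvertToUint32 converts each value. The sketch below contrasts the two for a single float32 lane using math.Float32bits; asUint32 and convertToUint32 are hypothetical helpers, and convertToUint32 only models in-range inputs.

```go
package main

import (
	"fmt"
	"math"
)

// asUint32 models one lane of AsUint32x8: the float32 bit pattern is kept
// unchanged and read as an unsigned integer.
func asUint32(f float32) uint32 { return math.Float32bits(f) }

// convertToUint32 models one lane of ConvertToUint32 for in-range inputs:
// the numeric value is converted, truncating toward zero.
func convertToUint32(f float32) uint32 { return uint32(f) }

func main() {
	f := float32(1.0)
	fmt.Printf("%#x %d\n", asUint32(f), convertToUint32(f)) // 0x3f800000 1
}
```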

func (Float32x8) Ceil

func (x Float32x8) Ceil() Float32x8

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) CeilScaled

func (x Float32x8) CeilScaled(prec uint8) Float32x8

CeilScaled rounds elements up with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) CeilScaledResidue

func (x Float32x8) CeilScaledResidue(prec uint8) Float32x8

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) Compress

func (x Float32x8) Compress(mask Mask32x8) Float32x8

Compress performs a compression on vector x using mask: it selects the elements of x where mask is true and packs them into the lowest-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512
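As a hedged illustration, the plain-Go sketch below models Compress with an [8]float32 array and an [8]bool mask in place of Float32x8 and Mask32x8. It assumes that result elements not filled by selected inputs are zero, which the description above does not state; compress is a hypothetical name.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Float32x8.Compress. It assumes
// that result elements not filled by selected inputs are zero.
func compress(x [8]float32, mask [8]bool) [8]float32 {
	var r [8]float32
	n := 0 // next free position in r
	for i, v := range x {
		if mask[i] {
			r[n] = v
			n++
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	m := [8]bool{true, false, true, false, false, false, false, true}
	fmt.Println(compress(x, m)) // [1 3 8 0 0 0 0 0]
}
```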

func (Float32x8) ConcatPermute

func (x Float32x8) ConcatPermute(y Float32x8, indices Uint32x8) Float32x8

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where xy is the concatenation of x (lower half) and y (upper half) and n is the number of elements in x. Only the low bits needed to index xy are used from each element of indices (the low 4 bits, since xy has 16 elements).

Asm: VPERMI2PS, CPU Feature: AVX512
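The indexing rule can be modeled in plain Go as below, with arrays in place of the vector types. concatPermute is a hypothetical name; the example interleaves the low halves of x and y, and the index masking with 15 reflects the "low 4 bits" rule described above.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of Float32x8.ConcatPermute.
// xy holds x in elements 0-7 and y in elements 8-15, so only the low 4 bits
// of each index are needed.
func concatPermute(x, y [8]float32, indices [8]uint32) [8]float32 {
	var xy [16]float32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]float32
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return r
}

func main() {
	x := [8]float32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	// Interleave the low halves of x and y.
	idx := [8]uint32{0, 8, 1, 9, 2, 10, 3, 11}
	fmt.Println(concatPermute(x, y, idx)) // [0 10 1 11 2 12 3 13]
}
```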

func (Float32x8) ConvertToFloat64

func (x Float32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTPS2PD, CPU Feature: AVX512

func (Float32x8) ConvertToInt32

func (x Float32x8) ConvertToInt32() Int32x8

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX

func (Float32x8) ConvertToInt64

func (x Float32x8) ConvertToInt64() Int64x8

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2QQ, CPU Feature: AVX512

func (Float32x8) ConvertToUint32

func (x Float32x8) ConvertToUint32() Uint32x8

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512

func (Float32x8) ConvertToUint64

func (x Float32x8) ConvertToUint64() Uint64x8

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UQQ, CPU Feature: AVX512

func (Float32x8) Div

func (x Float32x8) Div(y Float32x8) Float32x8

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX

func (Float32x8) Equal

func (x Float32x8) Equal(y Float32x8) Mask32x8

Equal returns x equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Expand

func (x Float32x8) Expand(mask Mask32x8) Float32x8

Expand performs an expansion on a vector x whose elements are packed into its lower part: the elements of x are distributed, in order, to the positions where mask is true, from the lowest mask element to the highest.

Asm: VEXPANDPS, CPU Feature: AVX512
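Expand is the inverse placement of Compress. The plain-Go sketch below models it on arrays, assuming that result positions where mask is false are zero (an assumption; the description above does not say). expand is a hypothetical name.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Float32x8.Expand. It assumes that
// result positions where mask is false are zero.
func expand(x [8]float32, mask [8]bool) [8]float32 {
	var r [8]float32
	n := 0 // next packed element of x to place
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 3}
	m := [8]bool{false, true, false, true, false, false, true, false}
	fmt.Println(expand(x, m)) // [0 1 0 2 0 0 3 0]
}
```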

func (Float32x8) Floor

func (x Float32x8) Floor() Float32x8

Floor rounds elements down to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) FloorScaled

func (x Float32x8) FloorScaled(prec uint8) Float32x8

FloorScaled rounds elements down with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) FloorScaledResidue

func (x Float32x8) FloorScaledResidue(prec uint8) Float32x8

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) GetHi

func (x Float32x8) GetHi() Float32x4

GetHi returns the upper half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float32x8) GetLo

func (x Float32x8) GetLo() Float32x4

GetLo returns the lower half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float32x8) Greater

func (x Float32x8) Greater(y Float32x8) Mask32x8

Greater returns x greater-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) GreaterEqual

func (x Float32x8) GreaterEqual(y Float32x8) Mask32x8

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) IsNan

func (x Float32x8) IsNan(y Float32x8) Mask32x8

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPS, CPU Feature: AVX
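The advice to call x.IsNan(x) is consistent with an unordered comparison, which is true when either operand is NaN. The one-lane sketch below models that assumption in plain Go; isNaN is a hypothetical name, and the comparison predicate actually used by archsimd is not stated in this documentation.

```go
package main

import (
	"fmt"
	"math"
)

// isNaN is a hypothetical scalar model of one lane of x.IsNan(y), assuming an
// unordered comparison: it is true when either operand is NaN.
func isNaN(x, y float32) bool {
	return math.IsNaN(float64(x)) || math.IsNaN(float64(y))
}

func main() {
	nan := float32(math.NaN())
	fmt.Println(isNaN(nan, nan), isNaN(1, 1), isNaN(1, nan)) // true false true
}
```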

func (Float32x8) Len

func (x Float32x8) Len() int

Len returns the number of elements in a Float32x8

func (Float32x8) Less

func (x Float32x8) Less(y Float32x8) Mask32x8

Less returns x less-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) LessEqual

func (x Float32x8) LessEqual(y Float32x8) Mask32x8

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Masked

func (x Float32x8) Masked(mask Mask32x8) Float32x8

Masked returns x but with elements zeroed where mask is false.

func (Float32x8) Max

func (x Float32x8) Max(y Float32x8) Float32x8

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX

func (Float32x8) Merge

func (x Float32x8) Merge(y Float32x8, mask Mask32x8) Float32x8

Merge returns x but with elements set to y where mask is false.
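Masked and Merge are often combined with a comparison. The plain-Go sketch below models them on arrays and uses a "greater than zero" mask to clamp negative elements to zero, in the spirit of x.Masked(x.Greater(zero)); masked, merge, and greaterZero are hypothetical names.

```go
package main

import "fmt"

// masked models x.Masked(mask): lanes where mask is false become zero.
func masked(x [8]float32, mask [8]bool) [8]float32 {
	var r [8]float32
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge models x.Merge(y, mask): lanes come from x where mask is true and
// from y where mask is false.
func merge(x, y [8]float32, mask [8]bool) [8]float32 {
	r := y
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// greaterZero models x.Greater(zero) for a vector of zeros.
func greaterZero(x [8]float32) [8]bool {
	var m [8]bool
	for i, v := range x {
		m[i] = v > 0
	}
	return m
}

func main() {
	x := [8]float32{-1, 2, -3, 4, -5, 6, -7, 8}
	m := greaterZero(x)
	fmt.Println(masked(x, m)) // [0 2 0 4 0 6 0 8]
	ones := [8]float32{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(merge(x, ones, m)) // [1 2 1 4 1 6 1 8]
}
```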

func (Float32x8) Min

func (x Float32x8) Min(y Float32x8) Float32x8

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX

func (Float32x8) Mul

func (x Float32x8) Mul(y Float32x8) Float32x8

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX

func (Float32x8) MulAdd

func (x Float32x8) MulAdd(y Float32x8, z Float32x8) Float32x8

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x8) MulAddSub

func (x Float32x8) MulAddSub(y Float32x8, z Float32x8) Float32x8

MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x8) MulSubAdd

func (x Float32x8) MulSubAdd(y Float32x8, z Float32x8) Float32x8

MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.

Asm: VFMSUBADD213PS, CPU Feature: AVX512

func (Float32x8) NotEqual

func (x Float32x8) NotEqual(y Float32x8) Mask32x8

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Permute

func (x Float32x8) Permute(indices Uint32x8) Float32x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMPS, CPU Feature: AVX2
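A plain-Go model of Permute on an [8]float32 array follows; it reverses the elements and shows that only the low 3 bits of each index matter. permute is a hypothetical name.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Float32x8.Permute.
func permute(x [8]float32, indices [8]uint32) [8]float32 {
	var r [8]float32
	for i, idx := range indices {
		r[i] = x[idx&7] // only the low 3 bits are used
	}
	return r
}

func main() {
	x := [8]float32{0, 10, 20, 30, 40, 50, 60, 70}
	rev := [8]uint32{7, 6, 5, 4, 3, 2, 1, 0}
	fmt.Println(permute(x, rev)) // [70 60 50 40 30 20 10 0]
}
```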

func (Float32x8) Reciprocal

func (x Float32x8) Reciprocal() Float32x8

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCPPS, CPU Feature: AVX

func (Float32x8) ReciprocalSqrt

func (x Float32x8) ReciprocalSqrt() Float32x8

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRTPS, CPU Feature: AVX

func (Float32x8) RoundToEven

func (x Float32x8) RoundToEven() Float32x8

RoundToEven rounds elements to the nearest integer, rounding halfway cases to the nearest even integer.

Asm: VROUNDPS, CPU Feature: AVX
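Go's math package has scalar functions with the same rounding directions as Ceil, Floor, Trunc, and RoundToEven. The sketch below applies them to one value to contrast the four methods, on the assumption that the vector methods agree element by element with these functions for finite inputs; roundAll is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// roundAll applies the scalar counterparts of Ceil, Floor, Trunc, and
// RoundToEven to v, in that order.
func roundAll(v float64) [4]float64 {
	return [4]float64{math.Ceil(v), math.Floor(v), math.Trunc(v), math.RoundToEven(v)}
}

func main() {
	for _, v := range []float64{-1.5, 0.5, 1.5, 2.5} {
		fmt.Println(v, roundAll(v))
	}
}
```

Halfway cases show the difference most clearly: 2.5 rounds to 2 under RoundToEven but up to 3 under Ceil.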

func (Float32x8) RoundToEvenScaled

func (x Float32x8) RoundToEvenScaled(prec uint8) Float32x8

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) RoundToEvenScaledResidue

func (x Float32x8) RoundToEvenScaledResidue(prec uint8) Float32x8

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) Scale

func (x Float32x8) Scale(y Float32x8) Float32x8

Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element of y.

Asm: VSCALEFPS, CPU Feature: AVX512
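VSCALEFPS computes x * 2^floor(y) per element. The one-lane sketch below models that with math.Ldexp for finite inputs only; NaN, infinite, and out-of-range y values are not modeled, and scale is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// scale is a hypothetical scalar model of one lane of x.Scale(y) for finite
// inputs: x * 2^floor(y).
func scale(x, y float64) float64 {
	return math.Ldexp(x, int(math.Floor(y)))
}

func main() {
	fmt.Println(scale(3, 2.7), scale(3, -0.5)) // 12 1.5
}
```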

func (Float32x8) Select128FromPair

func (x Float32x8) Select128FromPair(lo, hi uint8, y Float32x8) Float32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2F128, CPU Feature: AVX
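The block numbering used by Select128FromPair can be modeled in plain Go as below; the example reproduces the one given above. select128FromPair is a hypothetical name, and in this model lo or hi values above 3 panic with an index-out-of-range error.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of Float32x8.Select128FromPair.
// x and y are viewed as four 128-bit blocks of four float32s each: blocks 0
// and 1 are the low and high halves of x, blocks 2 and 3 those of y.
func select128FromPair(x [8]float32, lo, hi uint8, y [8]float32) [8]float32 {
	blocks := [4][4]float32{
		{x[0], x[1], x[2], x[3]},
		{x[4], x[5], x[6], x[7]},
		{y[0], y[1], y[2], y[3]},
		{y[4], y[5], y[6], y[7]},
	}
	var r [8]float32
	copy(r[:4], blocks[lo][:]) // lower half of the result
	copy(r[4:], blocks[hi][:]) // upper half of the result
	return r
}

func main() {
	x := [8]float32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]float32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```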

func (Float32x8) SelectFromPairGrouped

func (x Float32x8) SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, a selection of four elements from x and y, where selector values 0-3 pick elements 0-3 of the corresponding half of x and values 4-7 pick elements 0-3 of the corresponding half of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
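The per-half selection rule can be checked with the plain-Go model below, which reproduces the example above. selectFromPairGrouped is a hypothetical name; selector values above 7 are not modeled and panic here.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model of
// Float32x8.SelectFromPairGrouped: the same four selectors are applied
// independently to each 128-bit half (elements 0-3, then 4-7).
func selectFromPairGrouped(x [8]float32, a, b, c, d uint8, y [8]float32) [8]float32 {
	sel := [4]uint8{a, b, c, d}
	var r [8]float32
	for half := 0; half < 8; half += 4 {
		for i, s := range sel {
			if s < 4 {
				r[half+i] = x[half+int(s)] // 0-3: from this half of x
			} else {
				r[half+i] = y[half+int(s)-4] // 4-7: from this half of y
			}
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]float32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```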

func (Float32x8) SetHi

func (x Float32x8) SetHi(y Float32x4) Float32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float32x8) SetLo

func (x Float32x8) SetLo(y Float32x4) Float32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float32x8) Sqrt

func (x Float32x8) Sqrt() Float32x8

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX

func (Float32x8) Store

func (x Float32x8) Store(y *[8]float32)

Store stores a Float32x8 to an array

func (Float32x8) StoreMasked

func (x Float32x8) StoreMasked(y *[8]float32, mask Mask32x8)

StoreMasked stores a Float32x8 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x8) StoreSlice

func (x Float32x8) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 8 float32s

func (Float32x8) StoreSlicePart

func (x Float32x8) StoreSlicePart(s []float32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Float32x8) String

func (x Float32x8) String() string

String returns a string representation of SIMD vector x

func (Float32x8) Sub

func (x Float32x8) Sub(y Float32x8) Float32x8

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX

func (Float32x8) SubPairs

func (x Float32x8) SubPairs(y Float32x8) Float32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPS, CPU Feature: AVX

func (Float32x8) Trunc

func (x Float32x8) Trunc() Float32x8

Trunc truncates elements towards zero.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) TruncScaled

func (x Float32x8) TruncScaled(prec uint8) Float32x8

TruncScaled truncates elements with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) TruncScaledResidue

func (x Float32x8) TruncScaledResidue(prec uint8) Float32x8

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float64x2

type Float64x2 struct {
	// contains filtered or unexported fields
}

Float64x2 is a 128-bit SIMD vector of 2 float64

func BroadcastFloat64x2

func BroadcastFloat64x2(x float64) Float64x2

BroadcastFloat64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat64x2

func LoadFloat64x2(y *[2]float64) Float64x2

LoadFloat64x2 loads a Float64x2 from an array

func LoadFloat64x2Slice

func LoadFloat64x2Slice(s []float64) Float64x2

LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s

func LoadFloat64x2SlicePart

func LoadFloat64x2SlicePart(s []float64) Float64x2

LoadFloat64x2SlicePart loads a Float64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.

func LoadMaskedFloat64x2

func LoadMaskedFloat64x2(y *[2]float64, mask Mask64x2) Float64x2

LoadMaskedFloat64x2 loads a Float64x2 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x2) Add

func (x Float64x2) Add(y Float64x2) Float64x2

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX

func (Float64x2) AddPairs

func (x Float64x2) AddPairs(y Float64x2) Float64x2

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPD, CPU Feature: AVX

func (Float64x2) AddSub

func (x Float64x2) AddSub(y Float64x2) Float64x2

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPD, CPU Feature: AVX

func (Float64x2) AsFloat32x4

func (from Float64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 reinterprets the bits of a Float64x2 as a Float32x4.

func (Float64x2) AsInt16x8

func (from Float64x2) AsInt16x8() (to Int16x8)

AsInt16x8 reinterprets the bits of a Float64x2 as an Int16x8.

func (Float64x2) AsInt32x4

func (from Float64x2) AsInt32x4() (to Int32x4)

AsInt32x4 reinterprets the bits of a Float64x2 as an Int32x4.

func (Float64x2) AsInt64x2

func (from Float64x2) AsInt64x2() (to Int64x2)

AsInt64x2 reinterprets the bits of a Float64x2 as an Int64x2.

func (Float64x2) AsInt8x16

func (from Float64x2) AsInt8x16() (to Int8x16)

AsInt8x16 reinterprets the bits of a Float64x2 as an Int8x16.

func (Float64x2) AsUint16x8

func (from Float64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 reinterprets the bits of a Float64x2 as a Uint16x8.

func (Float64x2) AsUint32x4

func (from Float64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 reinterprets the bits of a Float64x2 as a Uint32x4.

func (Float64x2) AsUint64x2

func (from Float64x2) AsUint64x2() (to Uint64x2)

AsUint64x2 reinterprets the bits of a Float64x2 as a Uint64x2.

func (Float64x2) AsUint8x16

func (from Float64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 reinterprets the bits of a Float64x2 as a Uint8x16.

func (Float64x2) Broadcast128

func (x Float64x2) Broadcast128() Float64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Float64x2) Broadcast256

func (x Float64x2) Broadcast256() Float64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VBROADCASTSD, CPU Feature: AVX2

func (Float64x2) Broadcast512

func (x Float64x2) Broadcast512() Float64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VBROADCASTSD, CPU Feature: AVX512

func (Float64x2) Ceil

func (x Float64x2) Ceil() Float64x2

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) CeilScaled

func (x Float64x2) CeilScaled(prec uint8) Float64x2

CeilScaled rounds elements up with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) CeilScaledResidue

func (x Float64x2) CeilScaledResidue(prec uint8) Float64x2

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x2) Compress

func (x Float64x2) Compress(mask Mask64x2) Float64x2

Compress performs a compression on vector x using mask: it selects the elements of x where mask is true and packs them into the lowest-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512

func (Float64x2) ConcatPermute

func (x Float64x2) ConcatPermute(y Float64x2, indices Uint64x2) Float64x2

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where xy is the concatenation of x (lower half) and y (upper half) and n is the number of elements in x. Only the low bits needed to index xy are used from each element of indices (the low 2 bits, since xy has 4 elements).

Asm: VPERMI2PD, CPU Feature: AVX512

func (Float64x2) ConvertToFloat32

func (x Float64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PSX, CPU Feature: AVX

func (Float64x2) ConvertToInt32

func (x Float64x2) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQX, CPU Feature: AVX

func (Float64x2) ConvertToInt64

func (x Float64x2) ConvertToInt64() Int64x2

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512

func (Float64x2) ConvertToUint32

func (x Float64x2) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQX, CPU Feature: AVX512

func (Float64x2) ConvertToUint64

func (x Float64x2) ConvertToUint64() Uint64x2

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x2) Div

func (x Float64x2) Div(y Float64x2) Float64x2

Div divides elements of two vectors.

Asm: VDIVPD, CPU Feature: AVX

func (Float64x2) Equal

func (x Float64x2) Equal(y Float64x2) Mask64x2

Equal returns x equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Expand

func (x Float64x2) Expand(mask Mask64x2) Float64x2

Expand performs an expansion on a vector x whose elements are packed into its lower part: the elements of x are distributed, in order, to the positions where mask is true, from the lowest mask element to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512

func (Float64x2) Floor

func (x Float64x2) Floor() Float64x2

Floor rounds elements down to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) FloorScaled

func (x Float64x2) FloorScaled(prec uint8) Float64x2

FloorScaled rounds elements down with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) FloorScaledResidue

func (x Float64x2) FloorScaledResidue(prec uint8) Float64x2

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x2) GetElem

func (x Float64x2) GetElem(index uint8) float64

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Float64x2) Greater

func (x Float64x2) Greater(y Float64x2) Mask64x2

Greater returns x greater-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) GreaterEqual

func (x Float64x2) GreaterEqual(y Float64x2) Mask64x2

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) IsNan

func (x Float64x2) IsNan(y Float64x2) Mask64x2

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Len

func (x Float64x2) Len() int

Len returns the number of elements in a Float64x2

func (Float64x2) Less

func (x Float64x2) Less(y Float64x2) Mask64x2

Less returns x less-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) LessEqual

func (x Float64x2) LessEqual(y Float64x2) Mask64x2

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Masked

func (x Float64x2) Masked(mask Mask64x2) Float64x2

Masked returns x but with elements zeroed where mask is false.

func (Float64x2) Max

func (x Float64x2) Max(y Float64x2) Float64x2

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX

func (Float64x2) Merge

func (x Float64x2) Merge(y Float64x2, mask Mask64x2) Float64x2

Merge returns x but with elements set to y where mask is false.

func (Float64x2) Min

func (x Float64x2) Min(y Float64x2) Float64x2

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX

func (Float64x2) Mul

func (x Float64x2) Mul(y Float64x2) Float64x2

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX

func (Float64x2) MulAdd

func (x Float64x2) MulAdd(y Float64x2, z Float64x2) Float64x2

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512

func (Float64x2) MulAddSub

func (x Float64x2) MulAddSub(y Float64x2, z Float64x2) Float64x2

MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x2) MulSubAdd

func (x Float64x2) MulSubAdd(y Float64x2, z Float64x2) Float64x2

MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x2) NotEqual

func (x Float64x2) NotEqual(y Float64x2) Mask64x2

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Reciprocal

func (x Float64x2) Reciprocal() Float64x2

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x2) ReciprocalSqrt

func (x Float64x2) ReciprocalSqrt() Float64x2

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x2) RoundToEven

func (x Float64x2) RoundToEven() Float64x2

RoundToEven rounds elements to the nearest integer, rounding halfway cases to the nearest even integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) RoundToEvenScaled

func (x Float64x2) RoundToEvenScaled(prec uint8) Float64x2

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) RoundToEvenScaledResidue

func (x Float64x2) RoundToEvenScaledResidue(prec uint8) Float64x2

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x2) Scale

func (x Float64x2) Scale(y Float64x2) Float64x2

Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element of y.

Asm: VSCALEFPD, CPU Feature: AVX512

func (Float64x2) SelectFromPair

func (x Float64x2) SelectFromPair(a, b uint8, y Float64x2) Float64x2

SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements 0-1 of x and values in the range 2-3 specify elements 0-1 of y. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX
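The selector numbering for Float64x2.SelectFromPair can be modeled in plain Go by concatenating x and y into four elements; selectFromPair is a hypothetical name, and selector values above 3 panic in this model.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model of Float64x2.SelectFromPair:
// selectors 0-1 pick from x and 2-3 pick from y.
func selectFromPair(x [2]float64, a, b uint8, y [2]float64) [2]float64 {
	xy := [4]float64{x[0], x[1], y[0], y[1]}
	return [2]float64{xy[a], xy[b]}
}

func main() {
	x := [2]float64{1, 2}
	y := [2]float64{3, 4}
	fmt.Println(selectFromPair(x, 1, 2, y)) // [2 3]
}
```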

func (Float64x2) SetElem

func (x Float64x2) SetElem(index uint8, y float64) Float64x2

SetElem sets a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Float64x2) Sqrt

func (x Float64x2) Sqrt() Float64x2

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX

func (Float64x2) Store

func (x Float64x2) Store(y *[2]float64)

Store stores a Float64x2 to an array

func (Float64x2) StoreMasked

func (x Float64x2) StoreMasked(y *[2]float64, mask Mask64x2)

StoreMasked stores a Float64x2 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x2) StoreSlice

func (x Float64x2) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 2 float64s

func (Float64x2) StoreSlicePart

func (x Float64x2) StoreSlicePart(s []float64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.

func (Float64x2) String

func (x Float64x2) String() string

String returns a string representation of SIMD vector x

func (Float64x2) Sub

func (x Float64x2) Sub(y Float64x2) Float64x2

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX

func (Float64x2) SubPairs

func (x Float64x2) SubPairs(y Float64x2) Float64x2

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPD, CPU Feature: AVX

func (Float64x2) Trunc

func (x Float64x2) Trunc() Float64x2

Trunc truncates elements towards zero.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) TruncScaled

func (x Float64x2) TruncScaled(prec uint8) Float64x2

TruncScaled truncates elements with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) TruncScaledResidue

func (x Float64x2) TruncScaledResidue(prec uint8) Float64x2

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Float64x4

type Float64x4 struct {
	// contains filtered or unexported fields
}

Float64x4 is a 256-bit SIMD vector of 4 float64

func BroadcastFloat64x4

func BroadcastFloat64x4(x float64) Float64x4

BroadcastFloat64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat64x4

func LoadFloat64x4(y *[4]float64) Float64x4

LoadFloat64x4 loads a Float64x4 from an array

func LoadFloat64x4Slice

func LoadFloat64x4Slice(s []float64) Float64x4

LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s

func LoadFloat64x4SlicePart

func LoadFloat64x4SlicePart(s []float64) Float64x4

LoadFloat64x4SlicePart loads a Float64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.

func LoadMaskedFloat64x4

func LoadMaskedFloat64x4(y *[4]float64, mask Mask64x4) Float64x4

LoadMaskedFloat64x4 loads a Float64x4 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x4) Add

func (x Float64x4) Add(y Float64x4) Float64x4

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX

func (Float64x4) AddPairs

func (x Float64x4) AddPairs(y Float64x4) Float64x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPD, CPU Feature: AVX

func (Float64x4) AddSub

func (x Float64x4) AddSub(y Float64x4) Float64x4

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPD, CPU Feature: AVX

func (Float64x4) AsFloat32x8

func (from Float64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of a Float64x4 as a Float32x8.

func (Float64x4) AsInt16x16

func (from Float64x4) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of a Float64x4 as an Int16x16.

func (Float64x4) AsInt32x8

func (from Float64x4) AsInt32x8() (to Int32x8)

AsInt32x8 reinterprets the bits of a Float64x4 as an Int32x8; no numeric conversion is performed.

func (Float64x4) AsInt64x4

func (from Float64x4) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of a Float64x4 as an Int64x4; no numeric conversion is performed (see ConvertToInt64 for that).

func (Float64x4) AsInt8x32

func (from Float64x4) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of a Float64x4 as an Int8x32; no numeric conversion is performed.

func (Float64x4) AsUint16x16

func (from Float64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of a Float64x4 as a Uint16x16; no numeric conversion is performed.

func (Float64x4) AsUint32x8

func (from Float64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of a Float64x4 as a Uint32x8; no numeric conversion is performed.

func (Float64x4) AsUint64x4

func (from Float64x4) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of a Float64x4 as a Uint64x4; no numeric conversion is performed (see ConvertToUint64 for that).

func (Float64x4) AsUint8x32

func (from Float64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of a Float64x4 as a Uint8x32; no numeric conversion is performed.
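The difference between an As method and a Convert method can be seen on a single lane in plain Go. This sketch uses math.Float64bits as a scalar analogue of a bit reinterpretation such as AsInt64x4 and an ordinary Go conversion as an analogue of ConvertToInt64 for in-range values; it does not call archsimd, and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// reinterpretAsInt64 models one lane of Float64x4.AsInt64x4: the 64 bits of
// the float64 are read back as an int64, with no numeric conversion.
func reinterpretAsInt64(f float64) int64 {
	return int64(math.Float64bits(f))
}

// convertToInt64 models one lane of Float64x4.ConvertToInt64 for in-range
// inputs: the value is truncated toward zero.
func convertToInt64(f float64) int64 {
	return int64(f)
}

func main() {
	f := 1.0
	fmt.Printf("%#x\n", reinterpretAsInt64(f)) // 0x3ff0000000000000
	fmt.Println(convertToInt64(f))             // 1
}
```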

func (Float64x4) Ceil

func (x Float64x4) Ceil() Float64x4

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) CeilScaled

func (x Float64x4) CeilScaled(prec uint8) Float64x4

CeilScaled rounds each element up with the specified precision, that is, up to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) CeilScaledResidue

func (x Float64x4) CeilScaledResidue(prec uint8) Float64x4

CeilScaledResidue computes x - x.CeilScaled(prec): the difference between each element and its value rounded up with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512
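A single lane of CeilScaled and CeilScaledResidue can be modeled in plain Go. The sketch assumes that prec is the number of fractional bits kept, as in the immediate operand of VRNDSCALEPD and VREDUCEPD; that reading is an assumption, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled models one lane of CeilScaled: round up to a multiple of 2^-prec.
func ceilScaled(x float64, prec uint8) float64 {
	s := math.Ldexp(1, int(prec)) // 2^prec
	return math.Ceil(x*s) / s
}

// ceilScaledResidue models one lane of CeilScaledResidue: what is left after
// rounding up, so it is never positive for finite inputs.
func ceilScaledResidue(x float64, prec uint8) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.375, 1))        // 1.5 (multiples of 0.5)
	fmt.Println(ceilScaledResidue(1.375, 1)) // -0.125
}
```

The Floor, RoundToEven, and Trunc variants follow the same shape with math.Floor, math.RoundToEven, and math.Trunc in place of math.Ceil.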

func (Float64x4) Compress

func (x Float64x4) Compress(mask Mask64x4) Float64x4

Compress selects the elements of x for which mask is true and packs them, in their original order, into the lowest-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512
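Compress can be modeled on four lanes in plain Go, with a [4]bool standing in for a Mask64x4. The text above does not say what the upper, unused lanes contain; this sketch assumes they are zero, and the helper name compress is hypothetical.

```go
package main

import "fmt"

// compress packs the lanes of x whose mask bit is set into the low lanes,
// preserving order. Remaining lanes are left zero (an assumption).
func compress(x [4]float64, mask [4]bool) [4]float64 {
	var r [4]float64
	n := 0
	for i, v := range x {
		if mask[i] {
			r[n] = v
			n++
		}
	}
	return r
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	fmt.Println(compress(x, [4]bool{false, true, false, true})) // [2 4 0 0]
}
```

Compress is the usual building block for filtering: compute a mask with a comparison, compress, then store only the number of selected lanes.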

func (Float64x4) ConcatPermute

func (x Float64x4) ConcatPermute(y Float64x4, indices Uint64x4) Float64x4

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[3]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 3 bits of each element of indices, enough to address the 8 elements of xy, are used.

Asm: VPERMI2PD, CPU Feature: AVX512
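The indexing rule of ConcatPermute can be modeled in plain Go. This is a scalar sketch, not archsimd code; the helper name concatPermute is hypothetical, and the index 11 in the example shows that only its low 3 bits (value 3) matter.

```go
package main

import "fmt"

// concatPermute models Float64x4.ConcatPermute: indices select from the
// 8-element concatenation xy = x (indices 0-3) followed by y (indices 4-7).
// Only the low 3 bits of each index are used.
func concatPermute(x, y [4]float64, indices [4]uint64) [4]float64 {
	var xy [8]float64
	copy(xy[:4], x[:])
	copy(xy[4:], y[:])
	var r [4]float64
	for i, idx := range indices {
		r[i] = xy[idx&7]
	}
	return r
}

func main() {
	x := [4]float64{0, 1, 2, 3}
	y := [4]float64{10, 11, 12, 13}
	fmt.Println(concatPermute(x, y, [4]uint64{7, 0, 4, 11})) // [13 0 10 3]
}
```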

func (Float64x4) ConvertToFloat32

func (x Float64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PSY, CPU Feature: AVX

func (Float64x4) ConvertToInt32

func (x Float64x4) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQY, CPU Feature: AVX

func (Float64x4) ConvertToInt64

func (x Float64x4) ConvertToInt64() Int64x4

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512

func (Float64x4) ConvertToUint32

func (x Float64x4) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQY, CPU Feature: AVX512

func (Float64x4) ConvertToUint64

func (x Float64x4) ConvertToUint64() Uint64x4

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x4) Div

func (x Float64x4) Div(y Float64x4) Float64x4

Div divides each element of x by the corresponding element of y.

Asm: VDIVPD, CPU Feature: AVX

func (Float64x4) Equal

func (x Float64x4) Equal(y Float64x4) Mask64x4

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Expand

func (x Float64x4) Expand(mask Mask64x4) Float64x4

Expand is the inverse of Compress: it takes the lowest-indexed elements of x, in order, and places them at the positions where mask is true, from the lowest such position to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512
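A plain-Go model of Expand on four lanes follows. As with the Compress model, the contents of lanes where mask is false are assumed to be zero, which the text above does not state; the helper name expand is hypothetical.

```go
package main

import "fmt"

// expand models Float64x4.Expand: the low lanes of x are placed, in order,
// at the positions where mask is true. Other lanes are zero (an assumption).
func expand(x [4]float64, mask [4]bool) [4]float64 {
	var r [4]float64
	n := 0
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	m := [4]bool{false, true, false, true}
	fmt.Println(expand([4]float64{2, 4, 0, 0}, m)) // [0 2 0 4]
}
```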

func (Float64x4) Floor

func (x Float64x4) Floor() Float64x4

Floor rounds elements down to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) FloorScaled

func (x Float64x4) FloorScaled(prec uint8) Float64x4

FloorScaled rounds each element down with the specified precision, that is, down to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) FloorScaledResidue

func (x Float64x4) FloorScaledResidue(prec uint8) Float64x4

FloorScaledResidue computes x - x.FloorScaled(prec): the difference between each element and its value rounded down with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x4) GetHi

func (x Float64x4) GetHi() Float64x2

GetHi returns the upper half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float64x4) GetLo

func (x Float64x4) GetLo() Float64x2

GetLo returns the lower half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float64x4) Greater

func (x Float64x4) Greater(y Float64x4) Mask64x4

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) GreaterEqual

func (x Float64x4) GreaterEqual(y Float64x4) Mask64x4

GreaterEqual returns a mask whose elements indicate whether x >= y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) IsNan

func (x Float64x4) IsNan(y Float64x4) Mask64x4

IsNan returns a mask that is true where x or y is NaN, elementwise. To test the elements of a single vector, use x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Len

func (x Float64x4) Len() int

Len returns the number of elements in a Float64x4

func (Float64x4) Less

func (x Float64x4) Less(y Float64x4) Mask64x4

Less returns a mask whose elements indicate whether x < y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) LessEqual

func (x Float64x4) LessEqual(y Float64x4) Mask64x4

LessEqual returns a mask whose elements indicate whether x <= y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Masked

func (x Float64x4) Masked(mask Mask64x4) Float64x4

Masked returns x but with elements zeroed where mask is false.

func (Float64x4) Max

func (x Float64x4) Max(y Float64x4) Float64x4

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX

func (Float64x4) Merge

func (x Float64x4) Merge(y Float64x4, mask Mask64x4) Float64x4

Merge returns x but with elements set to y where mask is false.
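Masked and Merge differ only in what fills the lanes where mask is false: zero for Masked, the corresponding lane of y for Merge. A plain-Go model with [4]bool as the mask (hypothetical helper names, no archsimd calls):

```go
package main

import "fmt"

// masked models x.Masked(mask): lanes where mask is false become zero.
func masked(x [4]float64, mask [4]bool) [4]float64 {
	var r [4]float64
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge models x.Merge(y, mask): keep x where mask is true, take y elsewhere.
func merge(x, y [4]float64, mask [4]bool) [4]float64 {
	r := y
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	y := [4]float64{-1, -2, -3, -4}
	m := [4]bool{true, false, true, false}
	fmt.Println(masked(x, m))   // [1 0 3 0]
	fmt.Println(merge(x, y, m)) // [1 -2 3 -4]
}
```

Combined with a comparison mask, Merge gives a branch-free per-lane select, for example x.Merge(y, x.Greater(y)) for an elementwise maximum.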

func (Float64x4) Min

func (x Float64x4) Min(y Float64x4) Float64x4

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX

func (Float64x4) Mul

func (x Float64x4) Mul(y Float64x4) Float64x4

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX

func (Float64x4) MulAdd

func (x Float64x4) MulAdd(y Float64x4, z Float64x4) Float64x4

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512

func (Float64x4) MulAddSub

func (x Float64x4) MulAddSub(y Float64x4, z Float64x4) Float64x4

MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x4) MulSubAdd

func (x Float64x4) MulSubAdd(y Float64x4, z Float64x4) Float64x4

MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x4) NotEqual

func (x Float64x4) NotEqual(y Float64x4) Mask64x4

NotEqual returns a mask whose elements indicate whether x != y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Permute

func (x Float64x4) Permute(indices Uint64x4) Float64x4

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[3]]}. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMPD, CPU Feature: AVX512
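A plain-Go model of Permute, showing both a lane reversal and how out-of-range indices wrap through the low 2 bits (hypothetical helper name, no archsimd calls):

```go
package main

import "fmt"

// permute models Float64x4.Permute: result[i] = x[indices[i] & 3].
func permute(x [4]float64, indices [4]uint64) [4]float64 {
	var r [4]float64
	for i, idx := range indices {
		r[i] = x[idx&3]
	}
	return r
}

func main() {
	x := [4]float64{10, 20, 30, 40}
	fmt.Println(permute(x, [4]uint64{3, 2, 1, 0})) // [40 30 20 10]
	fmt.Println(permute(x, [4]uint64{6, 4, 5, 7})) // [30 10 20 40]
}
```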

func (Float64x4) Reciprocal

func (x Float64x4) Reciprocal() Float64x4

Reciprocal computes an approximate reciprocal of each element, with a relative error of less than 2^-14.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x4) ReciprocalSqrt

func (x Float64x4) ReciprocalSqrt() Float64x4

ReciprocalSqrt computes an approximate reciprocal of the square root of each element, with a relative error of less than 2^-14.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x4) RoundToEven

func (x Float64x4) RoundToEven() Float64x4

RoundToEven rounds elements to the nearest integer, rounding halfway cases to even.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) RoundToEvenScaled

func (x Float64x4) RoundToEvenScaled(prec uint8) Float64x4

RoundToEvenScaled rounds each element to the nearest multiple of 2^-prec (keeping prec fractional bits), rounding halfway cases to even.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) RoundToEvenScaledResidue

func (x Float64x4) RoundToEvenScaledResidue(prec uint8) Float64x4

RoundToEvenScaledResidue computes x - x.RoundToEvenScaled(prec): the difference between each element and its value rounded with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x4) Scale

func (x Float64x4) Scale(y Float64x4) Float64x4

Scale multiplies each element of x by 2 raised to the floor of the corresponding element of y, that is, x * 2^floor(y).

Asm: VSCALEFPD, CPU Feature: AVX512
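One lane of Scale can be modeled with math.Ldexp. The sketch assumes the VSCALEFPD definition x * 2^floor(y) and ignores the special cases for NaN and infinite y; the helper name scale is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// scale models one lane of Float64x4.Scale, assuming VSCALEFPD semantics:
// x * 2^floor(y). NaN and infinite y are not handled in this sketch.
func scale(x, y float64) float64 {
	return math.Ldexp(x, int(math.Floor(y)))
}

func main() {
	fmt.Println(scale(3, 2.7))  // 12 (3 * 2^2)
	fmt.Println(scale(3, -1.5)) // 0.75 (3 * 2^-2)
}
```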

func (Float64x4) Select128FromPair

func (x Float64x4) Select128FromPair(lo, hi uint8, y Float64x4) Float64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

Performance is better when lo and hi are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2F128, CPU Feature: AVX
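The example above can be reproduced with a plain-Go model that treats each pair of float64 values as one 128-bit block. The helper name select128FromPair is hypothetical; out-of-range selectors panic with an index error in this model.

```go
package main

import "fmt"

// select128FromPair models Float64x4.Select128FromPair. The two inputs hold
// four 128-bit blocks of two float64 each: x supplies blocks 0-1 and y
// supplies blocks 2-3. lo picks the lower result block, hi the upper one.
func select128FromPair(x [4]float64, lo, hi uint8, y [4]float64) [4]float64 {
	blocks := [4][2]float64{{x[0], x[1]}, {x[2], x[3]}, {y[0], y[1]}, {y[2], y[3]}}
	b0, b1 := blocks[lo], blocks[hi] // out-of-range selectors panic here
	return [4]float64{b0[0], b0[1], b1[0], b1[1]}
}

func main() {
	x := [4]float64{40, 41, 50, 51}
	y := [4]float64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```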

func (Float64x4) SelectFromPairGrouped

func (x Float64x4) SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4

SelectFromPairGrouped operates on each of the two 128-bit halves of x and y independently. Within a half, the two elements of x are numbered 0-1 and the two elements of y are numbered 2-3; selector a chooses the lower element of the result half and b the upper one. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX
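A plain-Go model of SelectFromPairGrouped follows. It assumes a picks the lower element and b the upper element of each result half, which is a reading of the description rather than something it spells out; the helper name is hypothetical.

```go
package main

import "fmt"

// selectFromPairGrouped models Float64x4.SelectFromPairGrouped under the
// reading that, in each 128-bit half, selector a picks the lower result lane
// and b the upper one; values 0-1 index that half of x, 2-3 that half of y.
func selectFromPairGrouped(x [4]float64, a, b uint8, y [4]float64) [4]float64 {
	var r [4]float64
	for h := 0; h < 2; h++ {
		p := [4]float64{x[2*h], x[2*h+1], y[2*h], y[2*h+1]}
		r[2*h] = p[a]
		r[2*h+1] = p[b]
	}
	return r
}

func main() {
	x := [4]float64{0, 1, 2, 3}
	y := [4]float64{10, 11, 12, 13}
	fmt.Println(selectFromPairGrouped(x, 1, 2, y)) // [1 10 3 12]
}
```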

func (Float64x4) SetHi

func (x Float64x4) SetHi(y Float64x2) Float64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float64x4) SetLo

func (x Float64x4) SetLo(y Float64x2) Float64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float64x4) Sqrt

func (x Float64x4) Sqrt() Float64x4

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX

func (Float64x4) Store

func (x Float64x4) Store(y *[4]float64)

Store stores a Float64x4 to an array

func (Float64x4) StoreMasked

func (x Float64x4) StoreMasked(y *[4]float64, mask Mask64x4)

StoreMasked stores a Float64x4 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x4) StoreSlice

func (x Float64x4) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 4 float64s

func (Float64x4) StoreSlicePart

func (x Float64x4) StoreSlicePart(s []float64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
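The SlicePart load and store let a loop handle a final chunk of fewer than 4 elements without a separate scalar tail. The loop shape can be modeled in plain Go with [4]float64 standing in for Float64x4; loadPart, storePart, and scaleAll are hypothetical helpers and no archsimd calls are made.

```go
package main

import "fmt"

// loadPart models LoadFloat64x4SlicePart: copy up to 4 elements, zero-fill the rest.
func loadPart(s []float64) [4]float64 {
	var v [4]float64
	copy(v[:], s)
	return v
}

// storePart models StoreSlicePart: store as many elements as fit in s.
func storePart(v [4]float64, s []float64) {
	copy(s, v[:])
}

// scaleAll multiplies every element of s by k, four at a time. For full
// chunks the partial load/store behave like the Slice variants.
func scaleAll(s []float64, k float64) {
	for i := 0; i < len(s); i += 4 {
		chunk := s[i:]
		v := loadPart(chunk)
		for j := range v {
			v[j] *= k
		}
		storePart(v, chunk)
	}
}

func main() {
	s := []float64{1, 2, 3, 4, 5, 6}
	scaleAll(s, 10)
	fmt.Println(s) // [10 20 30 40 50 60]
}
```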

func (Float64x4) String

func (x Float64x4) String() string

String returns a string representation of SIMD vector x

func (Float64x4) Sub

func (x Float64x4) Sub(y Float64x4) Float64x4

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX

func (Float64x4) SubPairs

func (x Float64x4) SubPairs(y Float64x4) Float64x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPD, CPU Feature: AVX

func (Float64x4) Trunc

func (x Float64x4) Trunc() Float64x4

Trunc truncates elements towards zero.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) TruncScaled

func (x Float64x4) TruncScaled(prec uint8) Float64x4

TruncScaled truncates each element toward zero with the specified precision, that is, to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) TruncScaledResidue

func (x Float64x4) TruncScaledResidue(prec uint8) Float64x4

TruncScaledResidue computes x - x.TruncScaled(prec): the difference between each element and its value truncated with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Float64x8

type Float64x8 struct {
	// contains filtered or unexported fields
}

Float64x8 is a 512-bit SIMD vector of 8 float64

func BroadcastFloat64x8

func BroadcastFloat64x8(x float64) Float64x8

BroadcastFloat64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadFloat64x8

func LoadFloat64x8(y *[8]float64) Float64x8

LoadFloat64x8 loads a Float64x8 from an array

func LoadFloat64x8Slice

func LoadFloat64x8Slice(s []float64) Float64x8

LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s

func LoadFloat64x8SlicePart

func LoadFloat64x8SlicePart(s []float64) Float64x8

LoadFloat64x8SlicePart loads a Float64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.

func LoadMaskedFloat64x8

func LoadMaskedFloat64x8(y *[8]float64, mask Mask64x8) Float64x8

LoadMaskedFloat64x8 loads a Float64x8 from an array, at those elements enabled by mask

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func (Float64x8) Add

func (x Float64x8) Add(y Float64x8) Float64x8

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX512

func (Float64x8) AsFloat32x16

func (from Float64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 reinterprets the bits of a Float64x8 as a Float32x16; no numeric conversion is performed.

func (Float64x8) AsInt16x32

func (from Float64x8) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of a Float64x8 as an Int16x32; no numeric conversion is performed.

func (Float64x8) AsInt32x16

func (from Float64x8) AsInt32x16() (to Int32x16)

AsInt32x16 reinterprets the bits of a Float64x8 as an Int32x16; no numeric conversion is performed.

func (Float64x8) AsInt64x8

func (from Float64x8) AsInt64x8() (to Int64x8)

AsInt64x8 reinterprets the bits of a Float64x8 as an Int64x8; no numeric conversion is performed (see ConvertToInt64 for that).

func (Float64x8) AsInt8x64

func (from Float64x8) AsInt8x64() (to Int8x64)

AsInt8x64 reinterprets the bits of a Float64x8 as an Int8x64; no numeric conversion is performed.

func (Float64x8) AsUint16x32

func (from Float64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of a Float64x8 as a Uint16x32; no numeric conversion is performed.

func (Float64x8) AsUint32x16

func (from Float64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of a Float64x8 as a Uint32x16; no numeric conversion is performed.

func (Float64x8) AsUint64x8

func (from Float64x8) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of a Float64x8 as a Uint64x8; no numeric conversion is performed (see ConvertToUint64 for that).

func (Float64x8) AsUint8x64

func (from Float64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 reinterprets the bits of a Float64x8 as a Uint8x64; no numeric conversion is performed.

func (Float64x8) CeilScaled

func (x Float64x8) CeilScaled(prec uint8) Float64x8

CeilScaled rounds each element up with the specified precision, that is, up to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) CeilScaledResidue

func (x Float64x8) CeilScaledResidue(prec uint8) Float64x8

CeilScaledResidue computes x - x.CeilScaled(prec): the difference between each element and its value rounded up with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) Compress

func (x Float64x8) Compress(mask Mask64x8) Float64x8

Compress selects the elements of x for which mask is true and packs them, in their original order, into the lowest-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512

func (Float64x8) ConcatPermute

func (x Float64x8) ConcatPermute(y Float64x8, indices Uint64x8) Float64x8

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 4 bits of each element of indices, enough to address the 16 elements of xy, are used.

Asm: VPERMI2PD, CPU Feature: AVX512

func (Float64x8) ConvertToFloat32

func (x Float64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PS, CPU Feature: AVX512

func (Float64x8) ConvertToInt32

func (x Float64x8) ConvertToInt32() Int32x8

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQ, CPU Feature: AVX512

func (Float64x8) ConvertToInt64

func (x Float64x8) ConvertToInt64() Int64x8

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512

func (Float64x8) ConvertToUint32

func (x Float64x8) ConvertToUint32() Uint32x8

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQ, CPU Feature: AVX512

func (Float64x8) ConvertToUint64

func (x Float64x8) ConvertToUint64() Uint64x8

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x8) Div

func (x Float64x8) Div(y Float64x8) Float64x8

Div divides each element of x by the corresponding element of y.

Asm: VDIVPD, CPU Feature: AVX512

func (Float64x8) Equal

func (x Float64x8) Equal(y Float64x8) Mask64x8

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Expand

func (x Float64x8) Expand(mask Mask64x8) Float64x8

Expand is the inverse of Compress: it takes the lowest-indexed elements of x, in order, and places them at the positions where mask is true, from the lowest such position to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512

func (Float64x8) FloorScaled

func (x Float64x8) FloorScaled(prec uint8) Float64x8

FloorScaled rounds each element down with the specified precision, that is, down to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) FloorScaledResidue

func (x Float64x8) FloorScaledResidue(prec uint8) Float64x8

FloorScaledResidue computes x - x.FloorScaled(prec): the difference between each element and its value rounded down with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) GetHi

func (x Float64x8) GetHi() Float64x4

GetHi returns the upper half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float64x8) GetLo

func (x Float64x8) GetLo() Float64x4

GetLo returns the lower half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float64x8) Greater

func (x Float64x8) Greater(y Float64x8) Mask64x8

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) GreaterEqual

func (x Float64x8) GreaterEqual(y Float64x8) Mask64x8

GreaterEqual returns a mask whose elements indicate whether x >= y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) IsNan

func (x Float64x8) IsNan(y Float64x8) Mask64x8

IsNan returns a mask that is true where x or y is NaN, elementwise. To test the elements of a single vector, use x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Len

func (x Float64x8) Len() int

Len returns the number of elements in a Float64x8

func (Float64x8) Less

func (x Float64x8) Less(y Float64x8) Mask64x8

Less returns a mask whose elements indicate whether x < y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) LessEqual

func (x Float64x8) LessEqual(y Float64x8) Mask64x8

LessEqual returns a mask whose elements indicate whether x <= y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Masked

func (x Float64x8) Masked(mask Mask64x8) Float64x8

Masked returns x but with elements zeroed where mask is false.

func (Float64x8) Max

func (x Float64x8) Max(y Float64x8) Float64x8

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX512

func (Float64x8) Merge

func (x Float64x8) Merge(y Float64x8, mask Mask64x8) Float64x8

Merge returns x but with elements set to y where mask is false.

func (Float64x8) Min

func (x Float64x8) Min(y Float64x8) Float64x8

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX512

func (Float64x8) Mul

func (x Float64x8) Mul(y Float64x8) Float64x8

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX512

func (Float64x8) MulAdd

func (x Float64x8) MulAdd(y Float64x8, z Float64x8) Float64x8

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512

func (Float64x8) MulAddSub

func (x Float64x8) MulAddSub(y Float64x8, z Float64x8) Float64x8

MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x8) MulSubAdd

func (x Float64x8) MulSubAdd(y Float64x8, z Float64x8) Float64x8

MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x8) NotEqual

func (x Float64x8) NotEqual(y Float64x8) Mask64x8

NotEqual returns a mask whose elements indicate whether x != y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Permute

func (x Float64x8) Permute(indices Uint64x8) Float64x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMPD, CPU Feature: AVX512

func (Float64x8) Reciprocal

func (x Float64x8) Reciprocal() Float64x8

Reciprocal computes an approximate reciprocal of each element, with a relative error of less than 2^-14.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x8) ReciprocalSqrt

func (x Float64x8) ReciprocalSqrt() Float64x8

ReciprocalSqrt computes an approximate reciprocal of the square root of each element, with a relative error of less than 2^-14.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x8) RoundToEvenScaled

func (x Float64x8) RoundToEvenScaled(prec uint8) Float64x8

RoundToEvenScaled rounds each element to the nearest multiple of 2^-prec (keeping prec fractional bits), rounding halfway cases to even.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) RoundToEvenScaledResidue

func (x Float64x8) RoundToEvenScaledResidue(prec uint8) Float64x8

RoundToEvenScaledResidue computes x - x.RoundToEvenScaled(prec): the difference between each element and its value rounded with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) Scale

func (x Float64x8) Scale(y Float64x8) Float64x8

Scale multiplies each element of x by 2 raised to the floor of the corresponding element of y, that is, x * 2^floor(y).

Asm: VSCALEFPD, CPU Feature: AVX512

func (Float64x8) SelectFromPairGrouped

func (x Float64x8) SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8

SelectFromPairGrouped operates on each of the four 128-bit groups of x and y independently. Within a group, the two elements of x are numbered 0-1 and the two elements of y are numbered 2-3; selector a chooses the lower element of the result group and b the upper one. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Float64x8) SetHi

func (x Float64x8) SetHi(y Float64x4) Float64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float64x8) SetLo

func (x Float64x8) SetLo(y Float64x4) Float64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float64x8) Sqrt

func (x Float64x8) Sqrt() Float64x8

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX512

func (Float64x8) Store

func (x Float64x8) Store(y *[8]float64)

Store stores a Float64x8 to an array

func (Float64x8) StoreMasked

func (x Float64x8) StoreMasked(y *[8]float64, mask Mask64x8)

StoreMasked stores a Float64x8 to an array, at those elements enabled by mask

Asm: VMOVDQU64, CPU Feature: AVX512

func (Float64x8) StoreSlice

func (x Float64x8) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 8 float64s

func (Float64x8) StoreSlicePart

func (x Float64x8) StoreSlicePart(s []float64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Float64x8) String

func (x Float64x8) String() string

String returns a string representation of SIMD vector x

func (Float64x8) Sub

func (x Float64x8) Sub(y Float64x8) Float64x8

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX512

func (Float64x8) TruncScaled

func (x Float64x8) TruncScaled(prec uint8) Float64x8

TruncScaled truncates each element toward zero with the specified precision, that is, to a multiple of 2^-prec (keeping prec fractional bits).

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) TruncScaledResidue

func (x Float64x8) TruncScaledResidue(prec uint8) Float64x8

TruncScaledResidue computes x - x.TruncScaled(prec): the difference between each element and its value truncated with the specified precision.

Performance is better when prec is a constant; a non-constant prec is translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Int16x16

type Int16x16 struct {
	// contains filtered or unexported fields
}

Int16x16 is a 256-bit SIMD vector of 16 int16

func BroadcastInt16x16

func BroadcastInt16x16(x int16) Int16x16

BroadcastInt16x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt16x16

func LoadInt16x16(y *[16]int16) Int16x16

LoadInt16x16 loads an Int16x16 from an array

func LoadInt16x16Slice

func LoadInt16x16Slice(s []int16) Int16x16

LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s

func LoadInt16x16SlicePart

func LoadInt16x16SlicePart(s []int16) Int16x16

LoadInt16x16SlicePart loads an Int16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.

func (Int16x16) Abs

func (x Int16x16) Abs() Int16x16

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX2

func (Int16x16) Add

func (x Int16x16) Add(y Int16x16) Int16x16

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX2

func (Int16x16) AddPairs

func (x Int16x16) AddPairs(y Int16x16) Int16x16

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX2

func (Int16x16) AddPairsSaturated

func (x Int16x16) AddPairsSaturated(y Int16x16) Int16x16

AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDSW, CPU Feature: AVX2

func (Int16x16) AddSaturated

func (x Int16x16) AddSaturated(y Int16x16) Int16x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX2
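The difference between Add and AddSaturated on int16 lanes is wraparound versus clamping, which a one-lane plain-Go model makes concrete. AddPairsSaturated applies the same clamp to each pairwise sum. The helper name addSat is hypothetical and no archsimd calls are made.

```go
package main

import (
	"fmt"
	"math"
)

// addSat models one lane of Int16x16.AddSaturated: the exact sum is clamped
// to [math.MinInt16, math.MaxInt16] instead of wrapping around.
func addSat(a, b int16) int16 {
	s := int32(a) + int32(b)
	if s > math.MaxInt16 {
		return math.MaxInt16
	}
	if s < math.MinInt16 {
		return math.MinInt16
	}
	return int16(s)
}

func main() {
	var a, b int16 = 30000, 10000
	fmt.Println(a+b, addSat(a, b)) // -25536 32767 (plain addition wraps, saturation clamps)
}
```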

func (Int16x16) And

func (x Int16x16) And(y Int16x16) Int16x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int16x16) AndNot

func (x Int16x16) AndNot(y Int16x16) Int16x16

AndNot computes the bitwise x &^ y (x AND NOT y) of two vectors.

Asm: VPANDN, CPU Feature: AVX2

func (Int16x16) AsFloat32x8

func (from Int16x16) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of an Int16x16 as a Float32x8; no numeric conversion is performed.

func (Int16x16) AsFloat64x4

func (from Int16x16) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of an Int16x16 as a Float64x4; no numeric conversion is performed.

func (Int16x16) AsInt32x8

func (from Int16x16) AsInt32x8() (to Int32x8)

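AsInt32x8 reinterprets the bits of an Int16x16 as an Int32x8; no numeric conversion is performed (see ExtendToInt32 for a value-preserving widening).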

func (Int16x16) AsInt64x4

func (from Int16x16) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of an Int16x16 as an Int64x4; no numeric conversion is performed.

func (Int16x16) AsInt8x32

func (from Int16x16) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of an Int16x16 as an Int8x32; no numeric conversion is performed.

func (Int16x16) AsUint16x16

func (from Int16x16) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of an Int16x16 as a Uint16x16; no numeric conversion is performed.

func (Int16x16) AsUint32x8

func (from Int16x16) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of an Int16x16 as a Uint32x8; no numeric conversion is performed.

func (Int16x16) AsUint64x4

func (from Int16x16) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of an Int16x16 as a Uint64x4; no numeric conversion is performed.

func (Int16x16) AsUint8x32

func (from Int16x16) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of an Int16x16 as a Uint8x32; no numeric conversion is performed.

func (Int16x16) Compress

func (x Int16x16) Compress(mask Mask16x16) Int16x16

Compress selects the elements of x for which mask is true and packs them, in their original order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x16) ConcatPermute

func (x Int16x16) ConcatPermute(y Int16x16, indices Uint16x16) Int16x16

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 5 bits of each element of indices, enough to address the 32 elements of xy, are used.

Asm: VPERMI2W, CPU Feature: AVX512

func (Int16x16) CopySign

func (x Int16x16) CopySign(y Int16x16) Int16x16

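CopySign returns x with each element adjusted by the sign of the corresponding element of y: negated where y is negative, set to zero where y is zero, and unchanged where y is positive. That is, each element of x is multiplied by -1, 0, or 1 according to the sign of y.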

Asm: VPSIGNW, CPU Feature: AVX2

func (Int16x16) DotProductPairs

func (x Int16x16) DotProductPairs(y Int16x16) Int32x8

DotProductPairs multiplies corresponding elements of x and y and adds adjacent pairs of products, yielding a vector with half as many elements, each twice the input element size: result[i] = x[2i]*y[2i] + x[2i+1]*y[2i+1].

Asm: VPMADDWD, CPU Feature: AVX2
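The widening pairwise dot product can be modeled in plain Go, with [16]int16 and [8]int32 standing in for Int16x16 and Int32x8. The helper name dotProductPairs is hypothetical.

```go
package main

import "fmt"

// dotProductPairs models Int16x16.DotProductPairs: each product is formed in
// 32 bits and adjacent products are summed, giving 8 int32 lanes.
func dotProductPairs(x, y [16]int16) [8]int32 {
	var r [8]int32
	for i := range r {
		r[i] = int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
	}
	return r
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i + 1) // 1, 2, ..., 16
		y[i] = 1
	}
	fmt.Println(dotProductPairs(x, y)) // [3 7 11 15 19 23 27 31]
}
```

Summing the eight int32 lanes afterwards gives the full 16-element dot product, which is why this operation often appears in integer dot-product kernels.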

func (Int16x16) Equal

func (x Int16x16) Equal(y Int16x16) Mask16x16

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX2

func (Int16x16) Expand

func (x Int16x16) Expand(mask Mask16x16) Int16x16

Expand is the inverse of Compress: it takes the lowest-indexed elements of x, in order, and places them at the positions where mask is true, from the lowest such position to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
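Expand is roughly the reverse of Compress. The scalar sketch below is illustrative only: arrays replace Int16x16 and Mask16x16, expandModel is a hypothetical name, and zeroing the positions where mask is false is an assumption the description above does not state.

```go
package main

import "fmt"

// expandModel is a hypothetical scalar model of Int16x16.Expand: the low
// elements of x are placed, in order, at the positions where mask is true.
// Zeroing the other positions is an assumption of this sketch.
func expandModel(x [16]int16, mask [16]bool) [16]int16 {
	var out [16]int16
	next := 0
	for i, set := range mask {
		if set {
			out[i] = x[next]
			next++
		}
	}
	return out
}

func main() {
	x := [16]int16{1, 2, 3}
	var mask [16]bool
	mask[2], mask[5], mask[9] = true, true, true
	fmt.Println(expandModel(x, mask))
	// [0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0]
}
```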

func (Int16x16) ExtendToInt32

func (x Int16x16) ExtendToInt32() Int32x16

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX512

func (Int16x16) GetHi

func (x Int16x16) GetHi() Int16x8

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int16x16) GetLo

func (x Int16x16) GetLo() Int16x8

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int16x16) Greater

func (x Int16x16) Greater(y Int16x16) Mask16x16

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTW, CPU Feature: AVX2

func (Int16x16) GreaterEqual

func (x Int16x16) GreaterEqual(y Int16x16) Mask16x16

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX2

func (Int16x16) InterleaveHiGrouped

func (x Int16x16) InterleaveHiGrouped(y Int16x16) Int16x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX2
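A scalar sketch of the grouped high interleave for 16 int16 elements (two 128-bit groups of 8). Arrays stand in for the vectors, interleaveHiGroupedModel is a hypothetical name, and the sketch assumes x supplies the even result positions and y the odd ones within each group.

```go
package main

import "fmt"

// interleaveHiGroupedModel is a hypothetical scalar model of
// Int16x16.InterleaveHiGrouped. Within each 8-element group, the high 4
// elements of x and y are interleaved as x, y, x, y, ...
// (x in the even positions is an assumption of this sketch).
func interleaveHiGroupedModel(x, y [16]int16) [16]int16 {
	var out [16]int16
	for g := 0; g < 16; g += 8 { // groups start at elements 0 and 8
		for k := 0; k < 4; k++ {
			out[g+2*k] = x[g+4+k]
			out[g+2*k+1] = y[g+4+k]
		}
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i)
		y[i] = int16(100 + i)
	}
	fmt.Println(interleaveHiGroupedModel(x, y))
	// [4 104 5 105 6 106 7 107 12 112 13 113 14 114 15 115]
}
```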

func (Int16x16) InterleaveLoGrouped

func (x Int16x16) InterleaveLoGrouped(y Int16x16) Int16x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX2

func (Int16x16) IsZero

func (x Int16x16) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to a single VPTEST of x and y.

Asm: VPTEST, CPU Feature: AVX

func (Int16x16) Len

func (x Int16x16) Len() int

Len returns the number of elements in an Int16x16.

func (Int16x16) Less

func (x Int16x16) Less(y Int16x16) Mask16x16

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Int16x16) LessEqual

func (x Int16x16) LessEqual(y Int16x16) Mask16x16

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Int16x16) Masked

func (x Int16x16) Masked(mask Mask16x16) Int16x16

Masked returns x but with elements zeroed where mask is false.

func (Int16x16) Max

func (x Int16x16) Max(y Int16x16) Int16x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX2

func (Int16x16) Merge

func (x Int16x16) Merge(y Int16x16, mask Mask16x16) Int16x16

Merge returns x but with elements set to y where mask is false.

func (Int16x16) Min

func (x Int16x16) Min(y Int16x16) Int16x16

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX2

func (Int16x16) Mul

func (x Int16x16) Mul(y Int16x16) Int16x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX2

func (Int16x16) MulHigh

func (x Int16x16) MulHigh(y Int16x16) Int16x16

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit signed product.

Asm: VPMULHW, CPU Feature: AVX2
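For one element, the MulHigh computation can be sketched in scalar Go. mulHighModel is a hypothetical name. Go's >> on a signed integer is an arithmetic shift, which matches taking the high half of a signed product.

```go
package main

import "fmt"

// mulHighModel is a hypothetical scalar model of one element of
// Int16x16.MulHigh: form the full 32-bit signed product and keep its
// upper 16 bits.
func mulHighModel(x, y int16) int16 {
	return int16((int32(x) * int32(y)) >> 16)
}

func main() {
	fmt.Println(mulHighModel(16384, 4))   // 1   (65536 >> 16)
	fmt.Println(mulHighModel(-3, 5))      // -1  (-15 >> 16, arithmetic shift)
	fmt.Println(mulHighModel(1000, 1000)) // 15  (1000000 >> 16)
}
```

Pairing this with Mul, which keeps the low 16 bits, recovers the full 32-bit product.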

func (Int16x16) Not

func (x Int16x16) Not() Int16x16

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Int16x16) NotEqual

func (x Int16x16) NotEqual(y Int16x16) Mask16x16

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Int16x16) OnesCount

func (x Int16x16) OnesCount() Int16x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x16) Or

func (x Int16x16) Or(y Int16x16) Int16x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int16x16) Permute

func (x Int16x16) Permute(indices Uint16x16) Int16x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Int16x16) PermuteScalarsHiGrouped

func (x Int16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

    result =
      {x[0], x[1], x[2], x[3],   x[a+4], x[b+4], x[c+4], x[d+4],
       x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the operation compiles to a single instruction; otherwise, a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX2
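The result layout above can be reproduced with a short scalar Go model. permuteScalarsHiGroupedModel is a hypothetical name, and arrays stand in for Int16x16. Masking each selector to 2 bits is this sketch's own choice for out-of-range values; callers should pass 0-3 as documented.

```go
package main

import "fmt"

// permuteScalarsHiGroupedModel is a hypothetical scalar model of the layout
// documented for Int16x16.PermuteScalarsHiGrouped. In each 8-element group,
// the low 4 elements are copied unchanged and the high 4 are chosen by
// a, b, c, d from that group's high 4 elements.
func permuteScalarsHiGroupedModel(x [16]int16, a, b, c, d uint8) [16]int16 {
	sel := [4]uint8{a, b, c, d}
	out := x // low halves of each group stay as they are
	for g := 0; g < 16; g += 8 {
		for k, s := range sel {
			out[g+4+k] = x[g+4+int(s&3)] // keep 2 bits (sketch's choice)
		}
	}
	return out
}

func main() {
	var x [16]int16
	for i := range x {
		x[i] = int16(i)
	}
	// Reverse the high four elements of each group.
	fmt.Println(permuteScalarsHiGroupedModel(x, 3, 2, 1, 0))
	// [0 1 2 3 7 6 5 4 8 9 10 11 15 14 13 12]
}
```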

func (Int16x16) PermuteScalarsLoGrouped

func (x Int16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

    result =
      {x[a], x[b], x[c], x[d],         x[4], x[5], x[6], x[7],
       x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the operation compiles to a single instruction; otherwise, a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX2

func (Int16x16) SaturateToInt8

func (x Int16x16) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. The 16 results exactly fill the returned Int8x16.

Asm: VPMOVSWB, CPU Feature: AVX512
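Saturating narrowing clamps out-of-range values instead of wrapping them. The scalar Go sketch below shows that rule for 16 elements; saturateToInt8Model is a hypothetical name, and arrays stand in for Int16x16 and Int8x16.

```go
package main

import "fmt"

// saturateToInt8Model is a hypothetical scalar model of
// Int16x16.SaturateToInt8: values above 127 become 127, values below -128
// become -128, and everything else converts unchanged.
func saturateToInt8Model(x [16]int16) [16]int8 {
	var out [16]int8
	for i, v := range x {
		switch {
		case v > 127:
			out[i] = 127
		case v < -128:
			out[i] = -128
		default:
			out[i] = int8(v)
		}
	}
	return out
}

func main() {
	x := [16]int16{5, 300, -300, 127, -129}
	fmt.Println(saturateToInt8Model(x))
	// [5 127 -128 127 -128 0 0 0 0 0 0 0 0 0 0 0]
}
```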

func (Int16x16) SaturateToUint8

func (x Int16x16) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. The 16 results exactly fill the returned vector.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x16) Select128FromPair

func (x Int16x16) Select128FromPair(lo, hi uint8, y Int16x16) Int16x16

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
 {60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})

returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.

lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may cause a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
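A scalar Go sketch that reproduces the documentation example above, with arrays standing in for Int16x16. select128FromPairModel is a hypothetical name. Each 128-bit block holds 8 int16 elements; in this sketch an lo or hi value of 4 or more panics with an index-out-of-range error.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar model of
// Int16x16.Select128FromPair. Blocks 0 and 1 are the low and high halves
// of x, blocks 2 and 3 the low and high halves of y. The result's low half
// is block lo and its high half is block hi.
func select128FromPairModel(x [16]int16, lo, hi uint8, y [16]int16) [16]int16 {
	var blocks [4][8]int16
	copy(blocks[0][:], x[:8])
	copy(blocks[1][:], x[8:])
	copy(blocks[2][:], y[:8])
	copy(blocks[3][:], y[8:])
	var out [16]int16
	copy(out[:8], blocks[lo][:])
	copy(out[8:], blocks[hi][:])
	return out
}

func main() {
	x := [16]int16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]int16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPairModel(x, 3, 0, y))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```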

func (Int16x16) SetHi

func (x Int16x16) SetHi(y Int16x8) Int16x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int16x16) SetLo

func (x Int16x16) SetLo(y Int16x8) Int16x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int16x16) ShiftAllLeft

func (x Int16x16) ShiftAllLeft(y uint64) Int16x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX2

func (Int16x16) ShiftAllLeftConcat

func (x Int16x16) ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2
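For a single element and a shift between 0 and 15, the concatenating left shift is equivalent to (x << shift) | (y >> (16 - shift)) on the unsigned bit patterns. The scalar Go sketch below models only that range; shiftAllLeftConcatModel is a hypothetical name, and larger shift values are deliberately not modeled.

```go
package main

import "fmt"

// shiftAllLeftConcatModel is a hypothetical scalar model of one element of
// Int16x16.ShiftAllLeftConcat for shift values 0-15: x is shifted left and
// the vacated low bits receive the top shift bits of y.
func shiftAllLeftConcatModel(x int16, shift uint8, y int16) int16 {
	ux, uy := uint16(x), uint16(y)
	s := uint(shift)
	// For s == 0, uy>>16 is 0, so the result is x unchanged.
	return int16(ux<<s | uy>>(16-s))
}

func main() {
	// 0x0001 shifted left by 4, with the top 4 bits of 0x7000 shifted in.
	fmt.Printf("%#x\n", uint16(shiftAllLeftConcatModel(0x0001, 4, 0x7000))) // 0x17
}
```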

func (Int16x16) ShiftAllRight

func (x Int16x16) ShiftAllRight(y uint64) Int16x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX2

func (Int16x16) ShiftAllRightConcat

func (x Int16x16) ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2
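The right-shifting counterpart can be sketched the same way: for a shift between 0 and 15, each element becomes (x >> shift) | (y << (16 - shift)) on the unsigned bit patterns, so the vacated upper bits come from y rather than from the sign of x. shiftAllRightConcatModel is a hypothetical name, and shifts of 16 or more are not modeled.

```go
package main

import "fmt"

// shiftAllRightConcatModel is a hypothetical scalar model of one element of
// Int16x16.ShiftAllRightConcat for shift values 0-15: x is shifted right
// (logically) and the vacated high bits receive the low shift bits of y.
func shiftAllRightConcatModel(x int16, shift uint8, y int16) int16 {
	ux, uy := uint16(x), uint16(y)
	s := uint(shift)
	// For s == 0, uy<<16 is 0 in uint16, so the result is x unchanged.
	return int16(ux>>s | uy<<(16-s))
}

func main() {
	// 0x1230 shifted right by 4, with the low 4 bits of 0x0005 shifted in.
	fmt.Printf("%#x\n", uint16(shiftAllRightConcatModel(0x1230, 4, 0x0005))) // 0x5123
}
```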

func (Int16x16) ShiftLeft

func (x Int16x16) ShiftLeft(y Int16x16) Int16x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x16) ShiftLeftConcat

func (x Int16x16) ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 4 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x16) ShiftRight

func (x Int16x16) ShiftRight(y Int16x16) Int16x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x16) ShiftRightConcat

func (x Int16x16) ShiftRightConcat(y Int16x16, z Int16x16) Int16x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 4 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x16) Store

func (x Int16x16) Store(y *[16]int16)

Store stores an Int16x16 to an array.

func (Int16x16) StoreSlice

func (x Int16x16) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 16 int16s

func (Int16x16) StoreSlicePart

func (x Int16x16) StoreSlicePart(s []int16)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x16) String

func (x Int16x16) String() string

String returns a string representation of SIMD vector x.

func (Int16x16) Sub

func (x Int16x16) Sub(y Int16x16) Int16x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX2

func (Int16x16) SubPairs

func (x Int16x16) SubPairs(y Int16x16) Int16x16

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX2

func (Int16x16) SubPairsSaturated

func (x Int16x16) SubPairsSaturated(y Int16x16) Int16x16

SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBSW, CPU Feature: AVX2

func (Int16x16) SubSaturated

func (x Int16x16) SubSaturated(y Int16x16) Int16x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX2

func (Int16x16) ToMask

func (from Int16x16) ToMask() (to Mask16x16)

ToMask converts from Int16x16 to Mask16x16; a mask element is set to true when the corresponding vector element is non-zero.

func (Int16x16) TruncateToInt8

func (x Int16x16) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. The 16 results exactly fill the returned Int8x16.

Asm: VPMOVWB, CPU Feature: AVX512
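Truncation keeps only the low 8 bits of each element, which is exactly what Go's int16-to-int8 conversion does. The sketch below uses arrays in place of Int16x16 and Int8x16, and truncateToInt8Model is a hypothetical name. For comparison, SaturateToInt8 would map 300 and -300 to 127 and -128.

```go
package main

import "fmt"

// truncateToInt8Model is a hypothetical scalar model of
// Int16x16.TruncateToInt8: each element keeps its low 8 bits.
func truncateToInt8Model(x [16]int16) [16]int8 {
	var out [16]int8
	for i, v := range x {
		out[i] = int8(v) // Go's integer conversion discards the high byte
	}
	return out
}

func main() {
	x := [16]int16{5, 300, -300, 127, -129}
	fmt.Println(truncateToInt8Model(x))
	// [5 44 -44 127 127 0 0 0 0 0 0 0 0 0 0 0]
}
```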

func (Int16x16) Xor

func (x Int16x16) Xor(y Int16x16) Int16x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int16x32

type Int16x32 struct {
	// contains filtered or unexported fields
}

Int16x32 is a 512-bit SIMD vector of 32 int16 elements.

func BroadcastInt16x32

func BroadcastInt16x32(x int16) Int16x32

BroadcastInt16x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512BW

func LoadInt16x32

func LoadInt16x32(y *[32]int16) Int16x32

LoadInt16x32 loads an Int16x32 from an array.

func LoadInt16x32Slice

func LoadInt16x32Slice(s []int16) Int16x32

LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s

func LoadInt16x32SlicePart

func LoadInt16x32SlicePart(s []int16) Int16x32

LoadInt16x32SlicePart loads an Int16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeros. If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.

func LoadMaskedInt16x32

func LoadMaskedInt16x32(y *[32]int16, mask Mask16x32) Int16x32

LoadMaskedInt16x32 loads an Int16x32 from an array, loading only the elements enabled by mask; the other elements of the result are zeroed.

Asm: VMOVDQU16.Z, CPU Feature: AVX512

func (Int16x32) Abs

func (x Int16x32) Abs() Int16x32

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX512

func (Int16x32) Add

func (x Int16x32) Add(y Int16x32) Int16x32

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX512

func (Int16x32) AddSaturated

func (x Int16x32) AddSaturated(y Int16x32) Int16x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX512

func (Int16x32) And

func (x Int16x32) And(y Int16x32) Int16x32

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int16x32) AndNot

func (x Int16x32) AndNot(y Int16x32) Int16x32

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int16x32) AsFloat32x16

func (from Int16x32) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Int16x32 to Float32x16, reinterpreting the bits.

func (Int16x32) AsFloat64x8

func (from Int16x32) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Int16x32 to Float64x8, reinterpreting the bits.

func (Int16x32) AsInt32x16

func (from Int16x32) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Int16x32 to Int32x16, reinterpreting the bits.

func (Int16x32) AsInt64x8

func (from Int16x32) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Int16x32 to Int64x8, reinterpreting the bits.

func (Int16x32) AsInt8x64

func (from Int16x32) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Int16x32 to Int8x64, reinterpreting the bits.

func (Int16x32) AsUint16x32

func (from Int16x32) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Int16x32 to Uint16x32, reinterpreting the bits.

func (Int16x32) AsUint32x16

func (from Int16x32) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Int16x32 to Uint32x16, reinterpreting the bits.

func (Int16x32) AsUint64x8

func (from Int16x32) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Int16x32 to Uint64x8, reinterpreting the bits.

func (Int16x32) AsUint8x64

func (from Int16x32) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Int16x32 to Uint8x64, reinterpreting the bits.

func (Int16x32) Compress

func (x Int16x32) Compress(mask Mask16x32) Int16x32

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x32) ConcatPermute

func (x Int16x32) ConcatPermute(y Int16x32, indices Uint16x32) Int16x32

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[31]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 6 bits of each element of indices, enough to address the 64 elements of xy, are used.

Asm: VPERMI2W, CPU Feature: AVX512

func (Int16x32) DotProductPairs

func (x Int16x32) DotProductPairs(y Int16x32) Int32x16

DotProductPairs multiplies corresponding elements of x and y and adds each adjacent pair of products, yielding a vector with half as many elements, each twice the input element size: result[i] = x[2i]*y[2i] + x[2i+1]*y[2i+1].

Asm: VPMADDWD, CPU Feature: AVX512

func (Int16x32) Equal

func (x Int16x32) Equal(y Int16x32) Mask16x32

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQW, CPU Feature: AVX512

func (Int16x32) Expand

func (x Int16x32) Expand(mask Mask16x32) Int16x32

Expand distributes the elements packed into the low part of x, in order, to the positions where mask is true, starting from the lowest-indexed position.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Int16x32) GetHi

func (x Int16x32) GetHi() Int16x16

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int16x32) GetLo

func (x Int16x32) GetLo() Int16x16

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int16x32) Greater

func (x Int16x32) Greater(y Int16x32) Mask16x32

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTW, CPU Feature: AVX512

func (Int16x32) GreaterEqual

func (x Int16x32) GreaterEqual(y Int16x32) Mask16x32

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) InterleaveHiGrouped

func (x Int16x32) InterleaveHiGrouped(y Int16x32) Int16x32

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX512

func (Int16x32) InterleaveLoGrouped

func (x Int16x32) InterleaveLoGrouped(y Int16x32) Int16x32

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX512

func (Int16x32) Len

func (x Int16x32) Len() int

Len returns the number of elements in an Int16x32.

func (Int16x32) Less

func (x Int16x32) Less(y Int16x32) Mask16x32

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) LessEqual

func (x Int16x32) LessEqual(y Int16x32) Mask16x32

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) Masked

func (x Int16x32) Masked(mask Mask16x32) Int16x32

Masked returns x but with elements zeroed where mask is false.

func (Int16x32) Max

func (x Int16x32) Max(y Int16x32) Int16x32

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX512

func (Int16x32) Merge

func (x Int16x32) Merge(y Int16x32, mask Mask16x32) Int16x32

Merge returns x but with elements set to y where mask is false.

func (Int16x32) Min

func (x Int16x32) Min(y Int16x32) Int16x32

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX512

func (Int16x32) Mul

func (x Int16x32) Mul(y Int16x32) Int16x32

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX512

func (Int16x32) MulHigh

func (x Int16x32) MulHigh(y Int16x32) Int16x32

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit signed product.

Asm: VPMULHW, CPU Feature: AVX512

func (Int16x32) Not

func (x Int16x32) Not() Int16x32

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Int16x32) NotEqual

func (x Int16x32) NotEqual(y Int16x32) Mask16x32

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) OnesCount

func (x Int16x32) OnesCount() Int16x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x32) Or

func (x Int16x32) Or(y Int16x32) Int16x32

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int16x32) Permute

func (x Int16x32) Permute(indices Uint16x32) Int16x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[31]]}. Only the low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Int16x32) PermuteScalarsHiGrouped

func (x Int16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

    result =
      {x[0], x[1], x[2], x[3],     x[a+4], x[b+4], x[c+4], x[d+4],
       x[8], x[9], x[10], x[11],   x[a+12], x[b+12], x[c+12], x[d+12],
       x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
       x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the operation compiles to a single instruction; otherwise, a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Int16x32) PermuteScalarsLoGrouped

func (x Int16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

    result =
      {x[a], x[b], x[c], x[d],             x[4], x[5], x[6], x[7],
       x[a+8], x[b+8], x[c+8], x[d+8],     x[12], x[13], x[14], x[15],
       x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
       x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the operation compiles to a single instruction; otherwise, a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Int16x32) SaturateToInt8

func (x Int16x32) SaturateToInt8() Int8x32

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x32) SetHi

func (x Int16x32) SetHi(y Int16x16) Int16x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int16x32) SetLo

func (x Int16x32) SetLo(y Int16x16) Int16x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int16x32) ShiftAllLeft

func (x Int16x32) ShiftAllLeft(y uint64) Int16x32

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX512

func (Int16x32) ShiftAllLeftConcat

func (x Int16x32) ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftAllRight

func (x Int16x32) ShiftAllRight(y uint64) Int16x32

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX512

func (Int16x32) ShiftAllRightConcat

func (x Int16x32) ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftLeft

func (x Int16x32) ShiftLeft(y Int16x32) Int16x32

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x32) ShiftLeftConcat

func (x Int16x32) ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 4 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftRight

func (x Int16x32) ShiftRight(y Int16x32) Int16x32

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x32) ShiftRightConcat

func (x Int16x32) ShiftRightConcat(y Int16x32, z Int16x32) Int16x32

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 4 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x32) Store

func (x Int16x32) Store(y *[32]int16)

Store stores an Int16x32 to an array.

func (Int16x32) StoreMasked

func (x Int16x32) StoreMasked(y *[32]int16, mask Mask16x32)

StoreMasked stores the elements of x enabled by mask into the array y; the other elements of the array are left unchanged.

Asm: VMOVDQU16, CPU Feature: AVX512

func (Int16x32) StoreSlice

func (x Int16x32) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 32 int16s

func (Int16x32) StoreSlicePart

func (x Int16x32) StoreSlicePart(s []int16)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x32) String

func (x Int16x32) String() string

String returns a string representation of SIMD vector x.

func (Int16x32) Sub

func (x Int16x32) Sub(y Int16x32) Int16x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX512

func (Int16x32) SubSaturated

func (x Int16x32) SubSaturated(y Int16x32) Int16x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX512

func (Int16x32) ToMask

func (from Int16x32) ToMask() (to Mask16x32)

ToMask converts from Int16x32 to Mask16x32; a mask element is set to true when the corresponding vector element is non-zero.

func (Int16x32) TruncateToInt8

func (x Int16x32) TruncateToInt8() Int8x32

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements.

Asm: VPMOVWB, CPU Feature: AVX512

func (Int16x32) Xor

func (x Int16x32) Xor(y Int16x32) Int16x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Int16x8

type Int16x8 struct {
	// contains filtered or unexported fields
}

Int16x8 is a 128-bit SIMD vector of 8 int16 elements.

func BroadcastInt16x8

func BroadcastInt16x8(x int16) Int16x8

BroadcastInt16x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt16x8

func LoadInt16x8(y *[8]int16) Int16x8

LoadInt16x8 loads an Int16x8 from an array.

func LoadInt16x8Slice

func LoadInt16x8Slice(s []int16) Int16x8

LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s

func LoadInt16x8SlicePart

func LoadInt16x8SlicePart(s []int16) Int16x8

LoadInt16x8SlicePart loads an Int16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeros. If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.

func (Int16x8) Abs

func (x Int16x8) Abs() Int16x8

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX

func (Int16x8) Add

func (x Int16x8) Add(y Int16x8) Int16x8

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX

func (Int16x8) AddPairs

func (x Int16x8) AddPairs(y Int16x8) Int16x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX

func (Int16x8) AddPairsSaturated

func (x Int16x8) AddPairsSaturated(y Int16x8) Int16x8

AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDSW, CPU Feature: AVX

func (Int16x8) AddSaturated

func (x Int16x8) AddSaturated(y Int16x8) Int16x8

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX

func (Int16x8) And

func (x Int16x8) And(y Int16x8) Int16x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int16x8) AndNot

func (x Int16x8) AndNot(y Int16x8) Int16x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int16x8) AsFloat32x4

func (from Int16x8) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Int16x8 to Float32x4, reinterpreting the bits.

func (Int16x8) AsFloat64x2

func (from Int16x8) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Int16x8 to Float64x2, reinterpreting the bits.

func (Int16x8) AsInt32x4

func (from Int16x8) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Int16x8 to Int32x4, reinterpreting the bits.

func (Int16x8) AsInt64x2

func (from Int16x8) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Int16x8 to Int64x2, reinterpreting the bits.

func (Int16x8) AsInt8x16

func (from Int16x8) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Int16x8 to Int8x16, reinterpreting the bits.

func (Int16x8) AsUint16x8

func (from Int16x8) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Int16x8 to Uint16x8, reinterpreting the bits.

func (Int16x8) AsUint32x4

func (from Int16x8) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Int16x8 to Uint32x4, reinterpreting the bits.

func (Int16x8) AsUint64x2

func (from Int16x8) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Int16x8 to Uint64x2, reinterpreting the bits.

func (Int16x8) AsUint8x16

func (from Int16x8) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Int16x8 to Uint8x16, reinterpreting the bits.

func (Int16x8) Broadcast128

func (x Int16x8) Broadcast128() Int16x8

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Int16x8) Broadcast256

func (x Int16x8) Broadcast256() Int16x16

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Int16x8) Broadcast512

func (x Int16x8) Broadcast512() Int16x32

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX512

func (Int16x8) Compress

func (x Int16x8) Compress(mask Mask16x8) Int16x8

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x8) ConcatPermute

func (x Int16x8) ConcatPermute(y Int16x8, indices Uint16x8) Int16x8

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 4 bits of each element of indices, enough to address the 16 elements of xy, are used.

Asm: VPERMI2W, CPU Feature: AVX512

func (Int16x8) CopySign

func (x Int16x8) CopySign(y Int16x8) Int16x8

CopySign returns x with each element multiplied by the sign of the corresponding element of y: the element is negated where y is negative, zeroed where y is zero, and left unchanged where y is positive.

Asm: VPSIGNW, CPU Feature: AVX

func (Int16x8) DotProductPairs

func (x Int16x8) DotProductPairs(y Int16x8) Int32x4

DotProductPairs multiplies corresponding elements of x and y and adds each adjacent pair of products, yielding a vector with half as many elements, each twice the input element size: result[i] = x[2i]*y[2i] + x[2i+1]*y[2i+1].

Asm: VPMADDWD, CPU Feature: AVX

func (Int16x8) Equal

func (x Int16x8) Equal(y Int16x8) Mask16x8

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQW, CPU Feature: AVX

func (Int16x8) Expand

func (x Int16x8) Expand(mask Mask16x8) Int16x8

Expand distributes the elements packed into the low part of x, in order, to the positions where mask is true, starting from the lowest-indexed position.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Int16x8) ExtendLo2ToInt64x2

func (x Int16x8) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts the 2 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX

func (Int16x8) ExtendLo4ToInt32x4

func (x Int16x8) ExtendLo4ToInt32x4() Int32x4

ExtendLo4ToInt32x4 converts the 4 lowest elements of x to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX

func (Int16x8) ExtendLo4ToInt64x4

func (x Int16x8) ExtendLo4ToInt64x4() Int64x4

ExtendLo4ToInt64x4 converts the 4 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX2

func (Int16x8) ExtendToInt32

func (x Int16x8) ExtendToInt32() Int32x8

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX2

func (Int16x8) ExtendToInt64

func (x Int16x8) ExtendToInt64() Int64x8

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX512

func (Int16x8) GetElem

func (x Int16x8) GetElem(index uint8) int16

GetElem returns the value of the element at position index.

Performance is best when index is a constant; a non-constant index is translated into a jump table.

Asm: VPEXTRW, CPU Feature: AVX512

func (Int16x8) Greater

func (x Int16x8) Greater(y Int16x8) Mask16x8

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTW, CPU Feature: AVX

func (Int16x8) GreaterEqual

func (x Int16x8) GreaterEqual(y Int16x8) Mask16x8

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX

func (Int16x8) InterleaveHi

func (x Int16x8) InterleaveHi(y Int16x8) Int16x8

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX

func (Int16x8) InterleaveLo

func (x Int16x8) InterleaveLo(y Int16x8) Int16x8

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX
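The element order produced by InterleaveLo and InterleaveHi can be sketched in plain Go. The names are hypothetical, and the models assume x supplies the even-numbered result elements and y the odd-numbered ones.

```go
package main

import "fmt"

// interleaveLo16x8 is a hypothetical scalar model of Int16x8.InterleaveLo:
// it alternates the low four elements of x and y.
func interleaveLo16x8(x, y [8]int16) [8]int16 {
	var r [8]int16
	for i := 0; i < 4; i++ {
		r[2*i] = x[i]
		r[2*i+1] = y[i]
	}
	return r
}

// interleaveHi16x8 is a hypothetical scalar model of Int16x8.InterleaveHi:
// it alternates the high four elements of x and y.
func interleaveHi16x8(x, y [8]int16) [8]int16 {
	var r [8]int16
	for i := 0; i < 4; i++ {
		r[2*i] = x[i+4]
		r[2*i+1] = y[i+4]
	}
	return r
}

func main() {
	x := [8]int16{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int16{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveLo16x8(x, y)) // [0 10 1 11 2 12 3 13]
	fmt.Println(interleaveHi16x8(x, y)) // [4 14 5 15 6 16 7 17]
}
```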

func (Int16x8) IsZero

func (x Int16x8) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The compiler optimizes x.And(y).IsZero() and x.AndNot(y).IsZero() to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int16x8) Len

func (x Int16x8) Len() int

Len returns the number of elements in an Int16x8.

func (Int16x8) Less

func (x Int16x8) Less(y Int16x8) Mask16x8

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX

func (Int16x8) LessEqual

func (x Int16x8) LessEqual(y Int16x8) Mask16x8

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX

func (Int16x8) Masked

func (x Int16x8) Masked(mask Mask16x8) Int16x8

Masked returns x but with elements zeroed where mask is false.

func (Int16x8) Max

func (x Int16x8) Max(y Int16x8) Int16x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX

func (Int16x8) Merge

func (x Int16x8) Merge(y Int16x8, mask Mask16x8) Int16x8

Merge returns x but with elements set to y where mask is false.
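Masked and Merge differ only in what fills the positions where mask is false: zero for Masked, the corresponding element of y for Merge. The plain-Go sketch below models both from the descriptions above; the function names are hypothetical.

```go
package main

import "fmt"

// masked16x8 is a hypothetical scalar model of Int16x8.Masked.
func masked16x8(x [8]int16, mask [8]bool) [8]int16 {
	var r [8]int16 // elements where mask is false stay zero
	for i, m := range mask {
		if m {
			r[i] = x[i]
		}
	}
	return r
}

// merge16x8 is a hypothetical scalar model of Int16x8.Merge.
func merge16x8(x, y [8]int16, mask [8]bool) [8]int16 {
	r := y // start from y, then keep x where mask is true
	for i, m := range mask {
		if m {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [8]int16{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int16{-1, -2, -3, -4, -5, -6, -7, -8}
	mask := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked16x8(x, mask))   // [1 0 3 0 5 0 7 0]
	fmt.Println(merge16x8(x, y, mask)) // [1 -2 3 -4 5 -6 7 -8]
}
```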

func (Int16x8) Min

func (x Int16x8) Min(y Int16x8) Int16x8

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX

func (Int16x8) Mul

func (x Int16x8) Mul(y Int16x8) Int16x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX

func (Int16x8) MulHigh

func (x Int16x8) MulHigh(y Int16x8) Int16x8

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHW, CPU Feature: AVX
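The relationship between MulHigh and Mul can be shown on a single lane in plain Go: both compute the full signed 32-bit product, and MulHigh keeps its upper half while Mul keeps its lower half. The function names below are hypothetical.

```go
package main

import "fmt"

// mulHigh16 is a hypothetical scalar model of one lane of Int16x8.MulHigh
// (VPMULHW): the high 16 bits of the signed 32-bit product.
func mulHigh16(x, y int16) int16 {
	return int16((int32(x) * int32(y)) >> 16) // arithmetic shift keeps the sign
}

// mulLow16 models one lane of Int16x8.Mul (VPMULLW): the low 16 bits.
func mulLow16(x, y int16) int16 {
	return x * y // Go's int16 multiplication wraps, keeping the low 16 bits
}

func main() {
	// 300 * 300 = 90000 = 0x15F90: high half 1, low half 0x5F90 (24464).
	fmt.Println(mulHigh16(300, 300), mulLow16(300, 300)) // 1 24464
}
```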

func (Int16x8) Not

func (x Int16x8) Not() Int16x8

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX

func (Int16x8) NotEqual

func (x Int16x8) NotEqual(y Int16x8) Mask16x8

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX

func (Int16x8) OnesCount

func (x Int16x8) OnesCount() Int16x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x8) Or

func (x Int16x8) Or(y Int16x8) Int16x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int16x8) Permute

func (x Int16x8) Permute(indices Uint16x8) Int16x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512
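A plain-Go scalar model shows the index masking described above; permute16x8 is a hypothetical name, not part of archsimd.

```go
package main

import "fmt"

// permute16x8 is a hypothetical scalar model of Int16x8.Permute: each index
// is reduced to its low 3 bits (0-7) before selecting from x.
func permute16x8(x [8]int16, indices [8]uint16) [8]int16 {
	var r [8]int16
	for i, idx := range indices {
		r[i] = x[idx&7]
	}
	return r
}

func main() {
	x := [8]int16{10, 11, 12, 13, 14, 15, 16, 17}
	reverse := [8]uint16{7, 6, 5, 4, 3, 2, 1, 0}
	fmt.Println(permute16x8(x, reverse)) // [17 16 15 14 13 12 11 10]
}
```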

func (Int16x8) PermuteScalarsHi

func (x Int16x8) PermuteScalarsHi(a, b, c, d uint8) Int16x8

PermuteScalarsHi performs a permutation of vector x using the supplied indices:

result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Int16x8) PermuteScalarsLo

func (x Int16x8) PermuteScalarsLo(a, b, c, d uint8) Int16x8

PermuteScalarsLo performs a permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}

Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512
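PermuteScalarsLo and PermuteScalarsHi each rearrange one half of x and pass the other half through. The plain-Go sketch below models both; the names are hypothetical, and the & 3 masking is an assumption based on the 2-bit selector fields of the VPSHUFLW/VPSHUFHW immediate (the documented range is 0-3 in any case).

```go
package main

import "fmt"

// permuteScalarsLo16x8 is a hypothetical scalar model of
// Int16x8.PermuteScalarsLo: the low four elements are permuted.
func permuteScalarsLo16x8(x [8]int16, a, b, c, d uint8) [8]int16 {
	r := x // upper four elements pass through unchanged
	for i, s := range [4]uint8{a, b, c, d} {
		r[i] = x[s&3]
	}
	return r
}

// permuteScalarsHi16x8 is a hypothetical scalar model of
// Int16x8.PermuteScalarsHi: the high four elements are permuted.
func permuteScalarsHi16x8(x [8]int16, a, b, c, d uint8) [8]int16 {
	r := x // lower four elements pass through unchanged
	for i, s := range [4]uint8{a, b, c, d} {
		r[i+4] = x[4+(s&3)]
	}
	return r
}

func main() {
	x := [8]int16{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(permuteScalarsLo16x8(x, 3, 2, 1, 0)) // [13 12 11 10 14 15 16 17]
	fmt.Println(permuteScalarsHi16x8(x, 3, 2, 1, 0)) // [10 11 12 13 17 16 15 14]
}
```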

func (Int16x8) SaturateToInt8

func (x Int16x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSWB, CPU Feature: AVX512
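Saturation clamps out-of-range values to the nearest int8 limit, whereas TruncateToInt8 (documented below for the same type) keeps only the low 8 bits. The plain-Go sketch below models both on an 8-element input and a 16-element result; the function names are hypothetical.

```go
package main

import "fmt"

// saturateToInt8 is a hypothetical scalar model of Int16x8.SaturateToInt8:
// 8 clamped values in the low lanes, zeros in the upper 8 lanes.
func saturateToInt8(x [8]int16) [16]int8 {
	var r [16]int8 // r[8:] stays zero
	for i, v := range x {
		switch {
		case v > 127:
			r[i] = 127
		case v < -128:
			r[i] = -128
		default:
			r[i] = int8(v)
		}
	}
	return r
}

// truncateToInt8 is a hypothetical scalar model of Int16x8.TruncateToInt8.
func truncateToInt8(x [8]int16) [16]int8 {
	var r [16]int8
	for i, v := range x {
		r[i] = int8(v) // keeps the low 8 bits of each element
	}
	return r
}

func main() {
	x := [8]int16{-300, -128, -1, 0, 1, 127, 128, 300}
	s := saturateToInt8(x)
	t := truncateToInt8(x)
	fmt.Println(s[:8]) // [-128 -128 -1 0 1 127 127 127]
	fmt.Println(t[:8]) // [-44 -128 -1 0 1 127 -128 44]
}
```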

func (Int16x8) SaturateToUint8

func (x Int16x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x8) SetElem

func (x Int16x8) SetElem(index uint8, y int16) Int16x8

SetElem returns x with the element at position index set to y.

Performance is best when index is a constant; a non-constant index is translated into a jump table.

Asm: VPINSRW, CPU Feature: AVX

func (Int16x8) ShiftAllLeft

func (x Int16x8) ShiftAllLeft(y uint64) Int16x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX

func (Int16x8) ShiftAllLeftConcat

func (x Int16x8) ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used, since elements are 16 bits wide), and then copies the upper bits of y into the emptied lower bits of the shifted x.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Int16x8) ShiftAllRight

func (x Int16x8) ShiftAllRight(y uint64) Int16x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX

func (Int16x8) ShiftAllRightConcat

func (x Int16x8) ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used, since elements are 16 bits wide), and then copies the lower bits of y into the emptied upper bits of the shifted x.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2
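Both concat shifts are funnel shifts: bits shifted out of one operand refill the other. The plain-Go sketch below models one lane of each on raw 16-bit patterns. The names are hypothetical, and reducing the shift modulo 16 is an assumption taken from Intel's description of VPSHLDW/VPSHRDW.

```go
package main

import "fmt"

// shiftAllLeftConcat16 is a hypothetical scalar model of one lane of
// Int16x8.ShiftAllLeftConcat (VPSHLDW). Bits from the top of y fill the
// emptied low bits of x. Go defines y >> 16 as 0 for a uint16, so a shift
// of 0 returns x unchanged.
func shiftAllLeftConcat16(x uint16, shift uint8, y uint16) uint16 {
	s := uint(shift & 15) // assumed: shift taken modulo 16
	return x<<s | y>>(16-s)
}

// shiftAllRightConcat16 is a hypothetical scalar model of one lane of
// Int16x8.ShiftAllRightConcat (VPSHRDW). Bits from the bottom of y fill the
// emptied high bits of x.
func shiftAllRightConcat16(x uint16, shift uint8, y uint16) uint16 {
	s := uint(shift & 15) // assumed: shift taken modulo 16
	return x>>s | y<<(16-s)
}

func main() {
	fmt.Printf("%#x %#x\n",
		shiftAllLeftConcat16(0x1234, 4, 0xABCD),
		shiftAllRightConcat16(0x1234, 4, 0xABCD)) // 0x234a 0xd123
}
```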

func (Int16x8) ShiftLeft

func (x Int16x8) ShiftLeft(y Int16x8) Int16x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x8) ShiftLeftConcat

func (x Int16x8) ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 4 bits are used, since elements are 16 bits wide), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x8) ShiftRight

func (x Int16x8) ShiftRight(y Int16x8) Int16x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x8) ShiftRightConcat

func (x Int16x8) ShiftRightConcat(y Int16x8, z Int16x8) Int16x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 4 bits are used, since elements are 16 bits wide), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x8) Store

func (x Int16x8) Store(y *[8]int16)

Store stores an Int16x8 to an array.

func (Int16x8) StoreSlice

func (x Int16x8) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 8 int16s.

func (Int16x8) StoreSlicePart

func (x Int16x8) StoreSlicePart(s []int16)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x8) String

func (x Int16x8) String() string

String returns a string representation of SIMD vector x.

func (Int16x8) Sub

func (x Int16x8) Sub(y Int16x8) Int16x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX

func (Int16x8) SubPairs

func (x Int16x8) SubPairs(y Int16x8) Int16x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX

func (Int16x8) SubPairsSaturated

func (x Int16x8) SubPairsSaturated(y Int16x8) Int16x8

SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBSW, CPU Feature: AVX

func (Int16x8) SubSaturated

func (x Int16x8) SubSaturated(y Int16x8) Int16x8

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX

func (Int16x8) ToMask

func (from Int16x8) ToMask() (to Mask16x8)

ToMask converts from Int16x8 to Mask16x8; a mask element is true when the corresponding vector element is non-zero.

func (Int16x8) TruncateToInt8

func (x Int16x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVWB, CPU Feature: AVX512

func (Int16x8) Xor

func (x Int16x8) Xor(y Int16x8) Int16x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int32x16

type Int32x16 struct {
	// contains filtered or unexported fields
}

Int32x16 is a 512-bit SIMD vector of 16 int32.

func BroadcastInt32x16

func BroadcastInt32x16(x int32) Int32x16

BroadcastInt32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadInt32x16

func LoadInt32x16(y *[16]int32) Int32x16

LoadInt32x16 loads an Int32x16 from an array.

func LoadInt32x16Slice

func LoadInt32x16Slice(s []int32) Int32x16

LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s.

func LoadInt32x16SlicePart

func LoadInt32x16SlicePart(s []int32) Int32x16

LoadInt32x16SlicePart loads an Int32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.
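The partial load described above, together with Int32x16.StoreSlicePart, behaves like a bounded copy. The plain-Go sketch below models the vector as a [16]int32 array; both function names are hypothetical and the sketch does not use archsimd.

```go
package main

import "fmt"

// loadInt32x16SlicePart is a hypothetical scalar model of
// LoadInt32x16SlicePart.
func loadInt32x16SlicePart(s []int32) [16]int32 {
	var v [16]int32 // elements beyond len(s) stay zero
	copy(v[:], s)   // copies min(len(s), 16) elements
	return v
}

// storeInt32x16SlicePart is a hypothetical scalar model of
// Int32x16.StoreSlicePart: it writes min(len(s), 16) elements.
func storeInt32x16SlicePart(v [16]int32, s []int32) {
	copy(s, v[:])
}

func main() {
	v := loadInt32x16SlicePart([]int32{1, 2, 3})
	fmt.Println(v) // [1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0]

	out := make([]int32, 5)
	storeInt32x16SlicePart(v, out)
	fmt.Println(out) // [1 2 3 0 0]
}
```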

func LoadMaskedInt32x16

func LoadMaskedInt32x16(y *[16]int32, mask Mask32x16) Int32x16

LoadMaskedInt32x16 loads an Int32x16 from an array, reading only the elements enabled by mask; the other elements of the result are zero.

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func (Int32x16) Abs

func (x Int32x16) Abs() Int32x16

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX512

func (Int32x16) Add

func (x Int32x16) Add(y Int32x16) Int32x16

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX512

func (Int32x16) And

func (x Int32x16) And(y Int32x16) Int32x16

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int32x16) AndNot

func (x Int32x16) AndNot(y Int32x16) Int32x16

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int32x16) AsFloat32x16

func (from Int32x16) AsFloat32x16() (to Float32x16)

AsFloat32x16 reinterprets the bits of an Int32x16 as a Float32x16. Element values are not converted numerically; use ConvertToFloat32 for that.

func (Int32x16) AsFloat64x8

func (from Int32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 reinterprets the bits of an Int32x16 as a Float64x8.

func (Int32x16) AsInt16x32

func (from Int32x16) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of an Int32x16 as an Int16x32.

func (Int32x16) AsInt64x8

func (from Int32x16) AsInt64x8() (to Int64x8)

AsInt64x8 reinterprets the bits of an Int32x16 as an Int64x8.

func (Int32x16) AsInt8x64

func (from Int32x16) AsInt8x64() (to Int8x64)

AsInt8x64 reinterprets the bits of an Int32x16 as an Int8x64.

func (Int32x16) AsUint16x32

func (from Int32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of an Int32x16 as a Uint16x32.

func (Int32x16) AsUint32x16

func (from Int32x16) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of an Int32x16 as a Uint32x16.

func (Int32x16) AsUint64x8

func (from Int32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of an Int32x16 as a Uint64x8.

func (Int32x16) AsUint8x64

func (from Int32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 reinterprets the bits of an Int32x16 as a Uint8x64.
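The difference between an As method and ConvertToFloat32 is the difference between reusing bits and converting values. The plain-Go sketch below models one lane of each, assuming As methods are pure bit reinterpretations; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// asFloat32 models one lane of Int32x16.AsFloat32x16: the 32 bits are
// reused unchanged and read as an IEEE 754 float32.
func asFloat32(v int32) float32 {
	return math.Float32frombits(uint32(v))
}

// convertToFloat32 models one lane of Int32x16.ConvertToFloat32, which
// converts the numeric value instead.
func convertToFloat32(v int32) float32 {
	return float32(v)
}

func main() {
	const bitsOfOne = 0x3F800000 // bit pattern of float32(1.0)
	fmt.Println(asFloat32(bitsOfOne) == 1)                // true
	fmt.Println(convertToFloat32(bitsOfOne) == 1065353216) // true
}
```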

func (Int32x16) Compress

func (x Int32x16) Compress(mask Mask32x16) Int32x16

Compress selects the elements of x whose mask element is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512
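Compress is the inverse direction of Expand. The plain-Go sketch below models it for any lane count using slices; compress is a hypothetical name, and it assumes the unused upper positions of the result are zero, which the description above does not state.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress: elements of x whose
// mask is true are packed, in order, into the lowest result positions.
// Assumption: the remaining upper positions are zero.
func compress(x []int32, mask []bool) []int32 {
	r := make([]int32, len(x))
	next := 0 // next free position at the low end of r
	for i, m := range mask {
		if m {
			r[next] = x[i]
			next++
		}
	}
	return r
}

func main() {
	x := []int32{10, 20, 30, 40}
	mask := []bool{false, true, false, true}
	fmt.Println(compress(x, mask)) // [20 40 0 0]
}
```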

func (Int32x16) ConcatPermute

func (x Int32x16) ConcatPermute(y Int32x16, indices Uint32x16) Int32x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 5 bits of each element of indices (values 0-31), the bits needed to address xy's 32 elements, are used.

Asm: VPERMI2D, CPU Feature: AVX512

func (Int32x16) ConvertToFloat32

func (x Int32x16) ConvertToFloat32() Float32x16

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX512

func (Int32x16) Equal

func (x Int32x16) Equal(y Int32x16) Mask32x16

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQD, CPU Feature: AVX512

func (Int32x16) Expand

func (x Int32x16) Expand(mask Mask32x16) Int32x16

Expand treats the elements of x as packed into its lowest positions and distributes them, in order, to the positions where mask is true, from the lowest such position upward.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Int32x16) GetHi

func (x Int32x16) GetHi() Int32x8

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int32x16) GetLo

func (x Int32x16) GetLo() Int32x8

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int32x16) Greater

func (x Int32x16) Greater(y Int32x16) Mask32x16

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTD, CPU Feature: AVX512

func (Int32x16) GreaterEqual

func (x Int32x16) GreaterEqual(y Int32x16) Mask32x16

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) InterleaveHiGrouped

func (x Int32x16) InterleaveHiGrouped(y Int32x16) Int32x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX512

func (Int32x16) InterleaveLoGrouped

func (x Int32x16) InterleaveLoGrouped(y Int32x16) Int32x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX512

func (Int32x16) LeadingZeros

func (x Int32x16) LeadingZeros() Int32x16

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Int32x16) Len

func (x Int32x16) Len() int

Len returns the number of elements in an Int32x16.

func (Int32x16) Less

func (x Int32x16) Less(y Int32x16) Mask32x16

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) LessEqual

func (x Int32x16) LessEqual(y Int32x16) Mask32x16

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) Masked

func (x Int32x16) Masked(mask Mask32x16) Int32x16

Masked returns x but with elements zeroed where mask is false.

func (Int32x16) Max

func (x Int32x16) Max(y Int32x16) Int32x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX512

func (Int32x16) Merge

func (x Int32x16) Merge(y Int32x16, mask Mask32x16) Int32x16

Merge returns x but with elements set to y where mask is false.

func (Int32x16) Min

func (x Int32x16) Min(y Int32x16) Int32x16

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX512

func (Int32x16) Mul

func (x Int32x16) Mul(y Int32x16) Int32x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX512

func (Int32x16) Not

func (x Int32x16) Not() Int32x16

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Int32x16) NotEqual

func (x Int32x16) NotEqual(y Int32x16) Mask32x16

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) OnesCount

func (x Int32x16) OnesCount() Int32x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x16) Or

func (x Int32x16) Or(y Int32x16) Int32x16

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int32x16) Permute

func (x Int32x16) Permute(indices Uint32x16) Int32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX512

func (Int32x16) PermuteScalarsGrouped

func (x Int32x16) PermuteScalarsGrouped(a, b, c, d uint8) Int32x16

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
          x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX512
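The grouped permutation applies the same four selectors independently within each 128-bit (4-element) group. The plain-Go sketch below models that; permuteScalarsGrouped is a hypothetical name, and the & 3 masking assumes selectors are meant to be in the documented 0-3 range.

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model of
// Int32x16.PermuteScalarsGrouped (VPSHUFD): selectors a, b, c, d are
// applied within each group of four elements.
func permuteScalarsGrouped(x [16]int32, a, b, c, d uint8) [16]int32 {
	var r [16]int32
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 16; g += 4 {
		for k, s := range sel {
			r[g+k] = x[g+int(s&3)] // assumes selectors are in 0-3
		}
	}
	return r
}

func main() {
	var x [16]int32
	for i := range x {
		x[i] = int32(i)
	}
	// Reverse each group of four.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
	// [3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12]
}
```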

func (Int32x16) RotateAllLeft

func (x Int32x16) RotateAllLeft(shift uint8) Int32x16

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Int32x16) RotateAllRight

func (x Int32x16) RotateAllRight(shift uint8) Int32x16

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x16) RotateLeft

func (x Int32x16) RotateLeft(y Int32x16) Int32x16

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x16) RotateRight

func (x Int32x16) RotateRight(y Int32x16) Int32x16

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x16) SaturateToInt16

func (x Int32x16) SaturateToInt16() Int16x16

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512

func (Int32x16) SaturateToInt16Concat

func (x Int32x16) SaturateToInt16Concat(y Int32x16) Int16x32

SaturateToInt16Concat converts the element values of x and y to int16, with saturation. The conversion works on 128-bit groups: within each group, the four converted elements of x fill the lower half of the corresponding result group, and the four converted elements of y fill its upper half.

Asm: VPACKSSDW, CPU Feature: AVX512
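Because packing happens per 128-bit group, the output is not simply all of x followed by all of y. The plain-Go sketch below models the layout described above; the function names are hypothetical and the sketch does not use archsimd.

```go
package main

import "fmt"

// sat16 clamps v to the int16 range [-32768, 32767].
func sat16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// saturateToInt16Concat is a hypothetical scalar model of
// Int32x16.SaturateToInt16Concat (VPACKSSDW). For each 128-bit group g,
// four saturated elements of x fill the low half of result group g and
// four saturated elements of y fill its high half.
func saturateToInt16Concat(x, y [16]int32) [32]int16 {
	var r [32]int16
	for g := 0; g < 4; g++ {
		for k := 0; k < 4; k++ {
			r[8*g+k] = sat16(x[4*g+k])
			r[8*g+4+k] = sat16(y[4*g+k])
		}
	}
	return r
}

func main() {
	var x, y [16]int32
	for i := range x {
		x[i] = int32(i)
		y[i] = int32(100 + i)
	}
	x[0], y[0] = 70000, -70000 // out of int16 range: saturated
	r := saturateToInt16Concat(x, y)
	fmt.Println(r[:8])   // [32767 1 2 3 -32768 101 102 103]
	fmt.Println(r[8:16]) // [4 5 6 7 104 105 106 107]
}
```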

func (Int32x16) SaturateToInt8

func (x Int32x16) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x16) SaturateToUint8

func (x Int32x16) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x16) SelectFromPairGrouped

func (x Int32x16) SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of x and y, a selection of four elements from x and y: selector values 0-3 specify elements 0-3 of x, and values 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it is; otherwise it requires two.

If the selectors are not constant, this translates to a function call.

Asm: VSHUFPS, CPU Feature: AVX512
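The selector encoding can be modeled in plain Go directly from the description above. selectFromPairGrouped is a hypothetical name, and masking selectors with & 7 is an assumption about out-of-range values, which the documentation does not cover.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model of
// Int32x16.SelectFromPairGrouped: within each 4-element group, selector
// values 0-3 pick from x and values 4-7 pick elements 0-3 of y.
func selectFromPairGrouped(x [16]int32, a, b, c, d uint8, y [16]int32) [16]int32 {
	var r [16]int32
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 16; g += 4 {
		for k, s := range sel {
			s &= 7 // assumption: only selector values 0-7 are meaningful
			if s < 4 {
				r[g+k] = x[g+int(s)]
			} else {
				r[g+k] = y[g+int(s)-4]
			}
		}
	}
	return r
}

func main() {
	var x, y [16]int32
	for i := range x {
		x[i] = int32(i)
		y[i] = int32(100 + i)
	}
	// Interleave the first two elements of each group of x and y.
	fmt.Println(selectFromPairGrouped(x, 0, 4, 1, 5, y))
	// [0 100 1 101 4 104 5 105 8 108 9 109 12 112 13 113]
}
```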

func (Int32x16) SetHi

func (x Int32x16) SetHi(y Int32x8) Int32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int32x16) SetLo

func (x Int32x16) SetLo(y Int32x8) Int32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int32x16) ShiftAllLeft

func (x Int32x16) ShiftAllLeft(y uint64) Int32x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX512

func (Int32x16) ShiftAllLeftConcat

func (x Int32x16) ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 5 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Int32x16) ShiftAllRight

func (x Int32x16) ShiftAllRight(y uint64) Int32x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX512

func (Int32x16) ShiftAllRightConcat

func (x Int32x16) ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 5 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

Performance is best when shift is a constant; a non-constant shift is translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Int32x16) ShiftLeft

func (x Int32x16) ShiftLeft(y Int32x16) Int32x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX512

func (Int32x16) ShiftLeftConcat

func (x Int32x16) ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x16) ShiftRight

func (x Int32x16) ShiftRight(y Int32x16) Int32x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX512

func (Int32x16) ShiftRightConcat

func (x Int32x16) ShiftRightConcat(y Int32x16, z Int32x16) Int32x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x16) Store

func (x Int32x16) Store(y *[16]int32)

Store stores an Int32x16 to an array.

func (Int32x16) StoreMasked

func (x Int32x16) StoreMasked(y *[16]int32, mask Mask32x16)

StoreMasked stores the elements of x enabled by mask into the array; array elements whose mask element is false are left unchanged.

Asm: VMOVDQU32, CPU Feature: AVX512

func (Int32x16) StoreSlice

func (x Int32x16) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 16 int32s.

func (Int32x16) StoreSlicePart

func (x Int32x16) StoreSlicePart(s []int32)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Int32x16) String

func (x Int32x16) String() string

String returns a string representation of SIMD vector x.

func (Int32x16) Sub

func (x Int32x16) Sub(y Int32x16) Int32x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX512

func (Int32x16) ToMask

func (from Int32x16) ToMask() (to Mask32x16)

ToMask converts from Int32x16 to Mask32x16; a mask element is true when the corresponding vector element is non-zero.

func (Int32x16) TruncateToInt16

func (x Int32x16) TruncateToInt16() Int16x16

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Int32x16) TruncateToInt8

func (x Int32x16) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512

func (Int32x16) Xor

func (x Int32x16) Xor(y Int32x16) Int32x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Int32x4

type Int32x4 struct {
	// contains filtered or unexported fields
}

Int32x4 is a 128-bit SIMD vector of 4 int32.

func BroadcastInt32x4

func BroadcastInt32x4(x int32) Int32x4

BroadcastInt32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt32x4

func LoadInt32x4(y *[4]int32) Int32x4

LoadInt32x4 loads an Int32x4 from an array.

func LoadInt32x4Slice

func LoadInt32x4Slice(s []int32) Int32x4

LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s.

func LoadInt32x4SlicePart

func LoadInt32x4SlicePart(s []int32) Int32x4

LoadInt32x4SlicePart loads an Int32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.

func LoadMaskedInt32x4

func LoadMaskedInt32x4(y *[4]int32, mask Mask32x4) Int32x4

LoadMaskedInt32x4 loads an Int32x4 from an array, reading only the elements enabled by mask; the other elements of the result are zero.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x4) Abs

func (x Int32x4) Abs() Int32x4

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX

func (Int32x4) Add

func (x Int32x4) Add(y Int32x4) Int32x4

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX

func (Int32x4) AddPairs

func (x Int32x4) AddPairs(y Int32x4) Int32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX

func (Int32x4) And

func (x Int32x4) And(y Int32x4) Int32x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int32x4) AndNot

func (x Int32x4) AndNot(y Int32x4) Int32x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int32x4) AsFloat32x4

func (from Int32x4) AsFloat32x4() (to Float32x4)

AsFloat32x4 reinterprets the bits of an Int32x4 as a Float32x4. Element values are not converted numerically; use ConvertToFloat32 for that.

func (Int32x4) AsFloat64x2

func (from Int32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 reinterprets the bits of an Int32x4 as a Float64x2.

func (Int32x4) AsInt16x8

func (from Int32x4) AsInt16x8() (to Int16x8)

AsInt16x8 reinterprets the bits of an Int32x4 as an Int16x8.

func (Int32x4) AsInt64x2

func (from Int32x4) AsInt64x2() (to Int64x2)

AsInt64x2 reinterprets the bits of an Int32x4 as an Int64x2.

func (Int32x4) AsInt8x16

func (from Int32x4) AsInt8x16() (to Int8x16)

AsInt8x16 reinterprets the bits of an Int32x4 as an Int8x16.

func (Int32x4) AsUint16x8

func (from Int32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 reinterprets the bits of an Int32x4 as a Uint16x8.

func (Int32x4) AsUint32x4

func (from Int32x4) AsUint32x4() (to Uint32x4)

AsUint32x4 reinterprets the bits of an Int32x4 as a Uint32x4.

func (Int32x4) AsUint64x2

func (from Int32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 reinterprets the bits of an Int32x4 as a Uint64x2.

func (Int32x4) AsUint8x16

func (from Int32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 reinterprets the bits of an Int32x4 as a Uint8x16.

func (Int32x4) Broadcast128

func (x Int32x4) Broadcast128() Int32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Int32x4) Broadcast256

func (x Int32x4) Broadcast256() Int32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Int32x4) Broadcast512

func (x Int32x4) Broadcast512() Int32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX512

func (Int32x4) Compress

func (x Int32x4) Compress(mask Mask32x4) Int32x4

Compress selects the elements of x whose mask element is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Int32x4) ConcatPermute

func (x Int32x4) ConcatPermute(y Int32x4, indices Uint32x4) Int32x4

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], xy[indices[2]], xy[indices[3]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 3 bits of each element of indices (values 0-7), the bits needed to address xy's 8 elements, are used.

Asm: VPERMI2D, CPU Feature: AVX512

func (Int32x4) ConvertToFloat32

func (x Int32x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX

func (Int32x4) ConvertToFloat64

func (x Int32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTDQ2PD, CPU Feature: AVX

func (Int32x4) CopySign

func (x Int32x4) CopySign(y Int32x4) Int32x4

CopySign returns x with each element multiplied by the sign of the corresponding element of y: the element of x where y is positive, its negation where y is negative, and 0 where y is zero.

Asm: VPSIGND, CPU Feature: AVX

func (Int32x4) Equal

func (x Int32x4) Equal(y Int32x4) Mask32x4

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQD, CPU Feature: AVX

func (Int32x4) Expand

func (x Int32x4) Expand(mask Mask32x4) Int32x4

Expand treats the elements of x as packed into its lowest positions and distributes them, in order, to the positions where mask is true, from the lowest such position upward.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Int32x4) ExtendLo2ToInt64x2

func (x Int32x4) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts the 2 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX

func (Int32x4) ExtendToInt64

func (x Int32x4) ExtendToInt64() Int64x4

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX2

func (Int32x4) GetElem

func (x Int32x4) GetElem(index uint8) int32

GetElem returns the value of the element of x at index.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Int32x4) Greater

func (x Int32x4) Greater(y Int32x4) Mask32x4

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VPCMPGTD, CPU Feature: AVX

func (Int32x4) GreaterEqual

func (x Int32x4) GreaterEqual(y Int32x4) Mask32x4

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX

func (Int32x4) InterleaveHi

func (x Int32x4) InterleaveHi(y Int32x4) Int32x4

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX

func (Int32x4) InterleaveLo

func (x Int32x4) InterleaveLo(y Int32x4) Int32x4

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX
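
The interleave pattern for InterleaveLo and InterleaveHi can be sketched as a scalar model in plain Go. This assumes x supplies the first element of each interleaved pair, which matches the Intel definition of VPUNPCKLDQ/VPUNPCKHDQ when x is the first source operand; the function names are hypothetical.

```go
package main

import "fmt"

// interleaveLo32x4 is a hypothetical scalar model of Int32x4.InterleaveLo:
// the low halves of x and y, alternating x then y (assumed ordering).
func interleaveLo32x4(x, y [4]int32) [4]int32 {
	return [4]int32{x[0], y[0], x[1], y[1]}
}

// interleaveHi32x4 is a hypothetical scalar model of Int32x4.InterleaveHi:
// the high halves of x and y, alternating x then y (assumed ordering).
func interleaveHi32x4(x, y [4]int32) [4]int32 {
	return [4]int32{x[2], y[2], x[3], y[3]}
}

func main() {
	x := [4]int32{0, 1, 2, 3}
	y := [4]int32{10, 11, 12, 13}
	fmt.Println(interleaveLo32x4(x, y)) // [0 10 1 11]
	fmt.Println(interleaveHi32x4(x, y)) // [2 12 3 13]
}
```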

func (Int32x4) IsZero

func (x Int32x4) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int32x4) LeadingZeros

func (x Int32x4) LeadingZeros() Int32x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Int32x4) Len

func (x Int32x4) Len() int

Len returns the number of elements in an Int32x4.

func (Int32x4) Less

func (x Int32x4) Less(y Int32x4) Mask32x4

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX

func (Int32x4) LessEqual

func (x Int32x4) LessEqual(y Int32x4) Mask32x4

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX

func (Int32x4) Masked

func (x Int32x4) Masked(mask Mask32x4) Int32x4

Masked returns x but with elements zeroed where mask is false.

func (Int32x4) Max

func (x Int32x4) Max(y Int32x4) Int32x4

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX

func (Int32x4) Merge

func (x Int32x4) Merge(y Int32x4, mask Mask32x4) Int32x4

Merge returns x but with elements set to y where mask is false.

func (Int32x4) Min

func (x Int32x4) Min(y Int32x4) Int32x4

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX

func (Int32x4) Mul

func (x Int32x4) Mul(y Int32x4) Int32x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX

func (Int32x4) MulEvenWiden

func (x Int32x4) MulEvenWiden(y Int32x4) Int64x2

MulEvenWiden multiplies the even-indexed elements of x and y, widening the products to int64: result[i] = int64(x[2*i]) * int64(y[2*i]).

Asm: VPMULDQ, CPU Feature: AVX
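
The widening formula can be checked with a scalar model in plain Go. The function mulEvenWiden32x4 is a hypothetical illustration of the documented formula, not the archsimd implementation; odd-indexed elements are ignored, and widening before the multiply is what prevents overflow.

```go
package main

import "fmt"

// mulEvenWiden32x4 is a hypothetical scalar model of Int32x4.MulEvenWiden:
// result[i] = int64(x[2*i]) * int64(y[2*i]).
func mulEvenWiden32x4(x, y [4]int32) [2]int64 {
	var r [2]int64
	for i := range r {
		r[i] = int64(x[2*i]) * int64(y[2*i]) // widen first, so the product cannot overflow
	}
	return r
}

func main() {
	// Odd-indexed elements (1 and 100) do not affect the result.
	fmt.Println(mulEvenWiden32x4([4]int32{2147483647, 1, -3, 1}, [4]int32{2, 100, 4, 100})) // [4294967294 -12]
}
```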

func (Int32x4) Not

func (x Int32x4) Not() Int32x4

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX

func (Int32x4) NotEqual

func (x Int32x4) NotEqual(y Int32x4) Mask32x4

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX

func (Int32x4) OnesCount

func (x Int32x4) OnesCount() Int32x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x4) Or

func (x Int32x4) Or(y Int32x4) Int32x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int32x4) PermuteScalars

func (x Int32x4) PermuteScalars(a, b, c, d uint8) Int32x4

PermuteScalars performs a permutation of vector x's elements using the supplied indices:

result = {x[a], x[b], x[c], x[d]}

Parameters a, b, c, and d should have values between 0 and 3, inclusive. If a through d are constants, an instruction will be inlined; otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX

func (Int32x4) RotateAllLeft

func (x Int32x4) RotateAllLeft(shift uint8) Int32x4

RotateAllLeft rotates each element of x to the left by shift bits.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512
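
Rotation acts on each element's raw 32-bit pattern, so a scalar model views each int32 as a uint32 and uses math/bits.RotateLeft32. The sketch takes the count modulo 32, which matches how VPROLD treats its count; that this carries over to archsimd for out-of-range shift values is an assumption. The name rotateAllLeft32x4 is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft32x4 is a hypothetical scalar model of Int32x4.RotateAllLeft.
// Each element is rotated as a uint32 bit pattern; the count is taken modulo 32
// (assumed to match the hardware's handling of large counts).
func rotateAllLeft32x4(x [4]int32, shift uint8) [4]int32 {
	var r [4]int32
	for i, v := range x {
		r[i] = int32(bits.RotateLeft32(uint32(v), int(shift%32)))
	}
	return r
}

func main() {
	// 0x80000000 rotated left by 4 becomes 0x00000008; -1 (all ones) is unchanged.
	fmt.Println(rotateAllLeft32x4([4]int32{1, -2147483648, 0x0F, -1}, 4)) // [16 8 240 -1]
}
```

RotateAllRight can be modeled the same way with a negative count, since bits.RotateLeft32 rotates right when k is negative.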

func (Int32x4) RotateAllRight

func (x Int32x4) RotateAllRight(shift uint8) Int32x4

RotateAllRight rotates each element of x to the right by shift bits.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x4) RotateLeft

func (x Int32x4) RotateLeft(y Int32x4) Int32x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x4) RotateRight

func (x Int32x4) RotateRight(y Int32x4) Int32x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x4) SaturateToInt16

func (x Int32x4) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512

func (Int32x4) SaturateToInt16Concat

func (x Int32x4) SaturateToInt16Concat(y Int32x4) Int16x8

SaturateToInt16Concat converts the elements of x and y to int16 with saturation. Within each 128-bit group, the converted elements from x are packed into the lower half of the result group and the converted elements from y into the upper half; for this 128-bit type, the result is the saturated values of {x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}.

Asm: VPACKSSDW, CPU Feature: AVX

func (Int32x4) SaturateToInt8

func (x Int32x4) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x4) SaturateToUint8

func (x Int32x4) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x4) SelectFromPair

func (x Int32x4) SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4

SelectFromPair returns a selection of four elements from the two vectors x and y. Selector values 0-3 specify elements 0-3 of x, and values 4-7 specify elements 0-3 of y. The selector a chooses the lowest-indexed element of the output, and b, c, and d choose the 2nd, 3rd, and 4th elements. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constants, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
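
The selector encoding can be reproduced with a scalar model in plain Go that checks the documented example. The function selectFromPair32x4 is hypothetical, and rejecting selectors above 7 with a panic is a choice of this sketch, not documented archsimd behavior.

```go
package main

import "fmt"

// selectFromPair32x4 is a hypothetical scalar model of Int32x4.SelectFromPair:
// selectors 0-3 pick from x, selectors 4-7 pick element (s-4) of y.
func selectFromPair32x4(x [4]int32, a, b, c, d uint8, y [4]int32) [4]int32 {
	pick := func(s uint8) int32 {
		switch {
		case s < 4:
			return x[s]
		case s < 8:
			return y[s-4]
		default:
			panic("selector out of range 0-7") // this sketch's choice for bad input
		}
	}
	return [4]int32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	// The example from the documentation above.
	fmt.Println(selectFromPair32x4([4]int32{1, 2, 4, 8}, 2, 3, 5, 7, [4]int32{9, 25, 49, 81})) // [4 8 25 81]
}
```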

func (Int32x4) SetElem

func (x Int32x4) SetElem(index uint8, y int32) Int32x4

SetElem returns x with the element at index set to y.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Int32x4) ShiftAllLeft

func (x Int32x4) ShiftAllLeft(y uint64) Int32x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX

func (Int32x4) ShiftAllLeftConcat

func (x Int32x4) ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4

ShiftAllLeftConcat shifts each element of x to the left by shift bits (only the lower 5 bits of shift are used), and then fills the emptied lower bits of the shifted x with the upper bits of the corresponding element of y.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Int32x4) ShiftAllRight

func (x Int32x4) ShiftAllRight(y uint64) Int32x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX

func (Int32x4) ShiftAllRightConcat

func (x Int32x4) ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4

ShiftAllRightConcat shifts each element of x to the right by shift bits (only the lower 5 bits of shift are used), and then fills the emptied upper bits of the shifted x with the lower bits of the corresponding element of y.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2
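
Both concatenating shifts can be modeled per element by building a 64-bit value from the two 32-bit inputs and extracting one half after the shift, which is how VPSHLDD and VPSHRDD are defined. The sketch below is a plain-Go scalar model, not archsimd code; the function names are hypothetical.

```go
package main

import "fmt"

// shiftAllLeftConcat32 is a hypothetical one-element model of ShiftAllLeftConcat:
// the upper 32 bits of (x:y << s), with x as the high half of a 64-bit value.
func shiftAllLeftConcat32(x, y int32, shift uint8) int32 {
	s := uint(shift & 31) // only the lower 5 bits of shift are used
	wide := uint64(uint32(x))<<32 | uint64(uint32(y))
	return int32(uint32((wide << s) >> 32))
}

// shiftAllRightConcat32 is a hypothetical one-element model of ShiftAllRightConcat:
// the lower 32 bits of (y:x >> s), with y as the high half of a 64-bit value.
func shiftAllRightConcat32(x, y int32, shift uint8) int32 {
	s := uint(shift & 31)
	wide := uint64(uint32(y))<<32 | uint64(uint32(x))
	return int32(uint32(wide >> s))
}

func main() {
	// (1 << 4) | (0x70000000 >> 28) = 0x10 | 0x7 = 0x17
	fmt.Println(shiftAllLeftConcat32(1, 0x70000000, 4)) // 23
	// (0x100 >> 4) | (1 << 28) = 0x10000010
	fmt.Println(shiftAllRightConcat32(0x100, 1, 4)) // 268435472
}
```

Note that in this model the bits of x move as an unsigned pattern, so ShiftAllRightConcat does not sign-fill the way ShiftAllRight does.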

func (Int32x4) ShiftLeft

func (x Int32x4) ShiftLeft(y Int32x4) Int32x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Int32x4) ShiftLeftConcat

func (x Int32x4) ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of the corresponding element of z.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x4) ShiftRight

func (x Int32x4) ShiftRight(y Int32x4) Int32x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX2

func (Int32x4) ShiftRightConcat

func (x Int32x4) ShiftRightConcat(y Int32x4, z Int32x4) Int32x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of the corresponding element of z.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x4) Store

func (x Int32x4) Store(y *[4]int32)

Store stores x into the array pointed to by y.

func (Int32x4) StoreMasked

func (x Int32x4) StoreMasked(y *[4]int32, mask Mask32x4)

StoreMasked stores the elements of x enabled by mask into the corresponding elements of the array pointed to by y.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x4) StoreSlice

func (x Int32x4) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 4 int32s.

func (Int32x4) StoreSlicePart

func (x Int32x4) StoreSlicePart(s []int32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Int32x4) String

func (x Int32x4) String() string

String returns a string representation of SIMD vector x.

func (Int32x4) Sub

func (x Int32x4) Sub(y Int32x4) Int32x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX

func (Int32x4) SubPairs

func (x Int32x4) SubPairs(y Int32x4) Int32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX

func (Int32x4) ToMask

func (from Int32x4) ToMask() (to Mask32x4)

ToMask converts from Int32x4 to Mask32x4; a mask element is true when the corresponding vector element is non-zero.

func (Int32x4) TruncateToInt16

func (x Int32x4) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Int32x4) TruncateToInt8

func (x Int32x4) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512
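
The difference between the saturating and truncating conversions is easiest to see on out-of-range values. The per-element models below are plain Go, and the function names are hypothetical. Truncation keeps the low 8 bits, which is exactly what Go's int32-to-int8 conversion does, while saturation clamps to the int8 range.

```go
package main

import "fmt"

// saturateToInt8 is a hypothetical per-element model of SaturateToInt8:
// values outside the int8 range are clamped to -128 or 127.
func saturateToInt8(v int32) int8 {
	switch {
	case v > 127:
		return 127
	case v < -128:
		return -128
	}
	return int8(v)
}

// truncateToInt8 is a hypothetical per-element model of TruncateToInt8:
// only the low 8 bits of v are kept.
func truncateToInt8(v int32) int8 {
	return int8(v)
}

func main() {
	for _, v := range []int32{100, 300, -300} {
		fmt.Println(v, saturateToInt8(v), truncateToInt8(v))
	}
	// Output:
	// 100 100 100
	// 300 127 44
	// -300 -128 -44
}
```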

func (Int32x4) Xor

func (x Int32x4) Xor(y Int32x4) Int32x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int32x8

type Int32x8 struct {
	// contains filtered or unexported fields
}

Int32x8 is a 256-bit SIMD vector of 8 int32 values.

func BroadcastInt32x8

func BroadcastInt32x8(x int32) Int32x8

BroadcastInt32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt32x8

func LoadInt32x8(y *[8]int32) Int32x8

LoadInt32x8 loads an Int32x8 from an array.

func LoadInt32x8Slice

func LoadInt32x8Slice(s []int32) Int32x8

LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s.

func LoadInt32x8SlicePart

func LoadInt32x8SlicePart(s []int32) Int32x8

LoadInt32x8SlicePart loads an Int32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.

func LoadMaskedInt32x8

func LoadMaskedInt32x8(y *[8]int32, mask Mask32x8) Int32x8

LoadMaskedInt32x8 loads an Int32x8 from an array, at those elements enabled by mask.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x8) Abs

func (x Int32x8) Abs() Int32x8

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX2

func (Int32x8) Add

func (x Int32x8) Add(y Int32x8) Int32x8

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX2

func (Int32x8) AddPairs

func (x Int32x8) AddPairs(y Int32x8) Int32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX2

func (Int32x8) And

func (x Int32x8) And(y Int32x8) Int32x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int32x8) AndNot

func (x Int32x8) AndNot(y Int32x8) Int32x8

AndNot performs a bitwise AND NOT operation, x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int32x8) AsFloat32x8

func (from Int32x8) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of an Int32x8 as a Float32x8.

func (Int32x8) AsFloat64x4

func (from Int32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of an Int32x8 as a Float64x4.

func (Int32x8) AsInt16x16

func (from Int32x8) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of an Int32x8 as an Int16x16.

func (Int32x8) AsInt64x4

func (from Int32x8) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of an Int32x8 as an Int64x4.

func (Int32x8) AsInt8x32

func (from Int32x8) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of an Int32x8 as an Int8x32.

func (Int32x8) AsUint16x16

func (from Int32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of an Int32x8 as a Uint16x16.

func (Int32x8) AsUint32x8

func (from Int32x8) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of an Int32x8 as a Uint32x8.

func (Int32x8) AsUint64x4

func (from Int32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of an Int32x8 as a Uint64x4.

func (Int32x8) AsUint8x32

func (from Int32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of an Int32x8 as a Uint8x32.

func (Int32x8) Compress

func (x Int32x8) Compress(mask Mask32x8) Int32x8

Compress selects the elements of x where mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Int32x8) ConcatPermute

func (x Int32x8) ConcatPermute(y Int32x8, indices Uint32x8) Int32x8

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the 16-element concatenation of x (lower half) and y (upper half). Only the low bits of each index needed to address xy (here, 4 bits) are used.

Asm: VPERMI2D, CPU Feature: AVX512

func (Int32x8) ConvertToFloat32

func (x Int32x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX

func (Int32x8) ConvertToFloat64

func (x Int32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTDQ2PD, CPU Feature: AVX512

func (Int32x8) CopySign

func (x Int32x8) CopySign(y Int32x8) Int32x8

CopySign returns x with each element multiplied by the sign of the corresponding element of y: negated where y is negative, zeroed where y is zero, and unchanged where y is positive.

Asm: VPSIGND, CPU Feature: AVX2

func (Int32x8) Equal

func (x Int32x8) Equal(y Int32x8) Mask32x8

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX2

func (Int32x8) Expand

func (x Int32x8) Expand(mask Mask32x8) Int32x8

Expand is the inverse of Compress: it takes consecutive elements from the low end of x and places them, in order, at the positions where mask is true, from the lowest mask position to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Int32x8) ExtendToInt64

func (x Int32x8) ExtendToInt64() Int64x8

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX512

func (Int32x8) GetHi

func (x Int32x8) GetHi() Int32x4

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int32x8) GetLo

func (x Int32x8) GetLo() Int32x4

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int32x8) Greater

func (x Int32x8) Greater(y Int32x8) Mask32x8

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VPCMPGTD, CPU Feature: AVX2

func (Int32x8) GreaterEqual

func (x Int32x8) GreaterEqual(y Int32x8) Mask32x8

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX2

func (Int32x8) InterleaveHiGrouped

func (x Int32x8) InterleaveHiGrouped(y Int32x8) Int32x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX2

func (Int32x8) InterleaveLoGrouped

func (x Int32x8) InterleaveLoGrouped(y Int32x8) Int32x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX2

func (Int32x8) IsZero

func (x Int32x8) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int32x8) LeadingZeros

func (x Int32x8) LeadingZeros() Int32x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Int32x8) Len

func (x Int32x8) Len() int

Len returns the number of elements in an Int32x8.

func (Int32x8) Less

func (x Int32x8) Less(y Int32x8) Mask32x8

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Int32x8) LessEqual

func (x Int32x8) LessEqual(y Int32x8) Mask32x8

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Int32x8) Masked

func (x Int32x8) Masked(mask Mask32x8) Int32x8

Masked returns x but with elements zeroed where mask is false.

func (Int32x8) Max

func (x Int32x8) Max(y Int32x8) Int32x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX2

func (Int32x8) Merge

func (x Int32x8) Merge(y Int32x8, mask Mask32x8) Int32x8

Merge returns x but with elements set to y where mask is false.

func (Int32x8) Min

func (x Int32x8) Min(y Int32x8) Int32x8

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX2

func (Int32x8) Mul

func (x Int32x8) Mul(y Int32x8) Int32x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX2

func (Int32x8) MulEvenWiden

func (x Int32x8) MulEvenWiden(y Int32x8) Int64x4

MulEvenWiden multiplies the even-indexed elements of x and y, widening the products to int64: result[i] = int64(x[2*i]) * int64(y[2*i]).

Asm: VPMULDQ, CPU Feature: AVX2

func (Int32x8) Not

func (x Int32x8) Not() Int32x8

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Int32x8) NotEqual

func (x Int32x8) NotEqual(y Int32x8) Mask32x8

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Int32x8) OnesCount

func (x Int32x8) OnesCount() Int32x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x8) Or

func (x Int32x8) Or(y Int32x8) Int32x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int32x8) Permute

func (x Int32x8) Permute(indices Uint32x8) Int32x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX2
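
A scalar model makes the "low 3 bits" rule concrete: an index of 8 wraps around to element 0. This is a plain-Go sketch with a hypothetical function name, not a call into archsimd.

```go
package main

import "fmt"

// permute32x8 is a hypothetical scalar model of Int32x8.Permute:
// result[i] = x[indices[i] & 7].
func permute32x8(x [8]int32, indices [8]uint32) [8]int32 {
	var r [8]int32
	for i, idx := range indices {
		r[i] = x[idx&7] // only the low 3 bits of each index are used
	}
	return r
}

func main() {
	x := [8]int32{0, 10, 20, 30, 40, 50, 60, 70}
	// Reverse the vector; the last index, 8, wraps to element 0.
	fmt.Println(permute32x8(x, [8]uint32{7, 6, 5, 4, 3, 2, 1, 8})) // [70 60 50 40 30 20 10 0]
}
```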

func (Int32x8) PermuteScalarsGrouped

func (x Int32x8) PermuteScalarsGrouped(a, b, c, d uint8) Int32x8

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, and d should have values between 0 and 3, inclusive. If a through d are constants, an instruction will be inlined; otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX2

func (Int32x8) RotateAllLeft

func (x Int32x8) RotateAllLeft(shift uint8) Int32x8

RotateAllLeft rotates each element of x to the left by shift bits.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Int32x8) RotateAllRight

func (x Int32x8) RotateAllRight(shift uint8) Int32x8

RotateAllRight rotates each element of x to the right by shift bits.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x8) RotateLeft

func (x Int32x8) RotateLeft(y Int32x8) Int32x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x8) RotateRight

func (x Int32x8) RotateRight(y Int32x8) Int32x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x8) SaturateToInt16

func (x Int32x8) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512

func (Int32x8) SaturateToInt16Concat

func (x Int32x8) SaturateToInt16Concat(y Int32x8) Int16x16

SaturateToInt16Concat converts the elements of x and y to int16 with saturation. Within each 128-bit group, the converted elements from x are packed into the lower half of the result group and the converted elements from y into the upper half; for this 256-bit type, the result is the saturated values of {x[0..3], y[0..3], x[4..7], y[4..7]}.

Asm: VPACKSSDW, CPU Feature: AVX2
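
Because the packing is done per 128-bit group, the result is not simply "all of x, then all of y". The plain-Go scalar model below shows the interleaved group layout described above; saturateToInt16Concat32x8 and its helper sat16 are hypothetical names.

```go
package main

import "fmt"

// sat16 clamps an int32 to the int16 range (hypothetical helper).
func sat16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// saturateToInt16Concat32x8 is a hypothetical scalar model of
// Int32x8.SaturateToInt16Concat: within each 128-bit group, the four
// saturated elements of x come first, followed by the four of y.
func saturateToInt16Concat32x8(x, y [8]int32) [16]int16 {
	var r [16]int16
	for g := 0; g < 2; g++ { // two 128-bit groups of four int32 each
		for j := 0; j < 4; j++ {
			r[8*g+j] = sat16(x[4*g+j])
			r[8*g+4+j] = sat16(y[4*g+j])
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 100000}
	y := [8]int32{-1, -2, -3, -4, -5, -6, -7, -100000}
	fmt.Println(saturateToInt16Concat32x8(x, y))
	// [1 2 3 4 -1 -2 -3 -4 5 6 7 32767 -5 -6 -7 -32768]
}
```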

func (Int32x8) SaturateToInt8

func (x Int32x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x8) SaturateToUint8

func (x Int32x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x8) Select128FromPair

func (x Int32x8) Select128FromPair(lo, hi uint8, y Int32x8) Int32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
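
The block numbering used by Select128FromPair (x's low half is 0, x's high half is 1, y's halves are 2 and 3) can be checked against the documented example with a plain-Go scalar model. The name select128FromPair32x8 is hypothetical; in this sketch, lo or hi above 3 panics with an out-of-range slice.

```go
package main

import "fmt"

// select128FromPair32x8 is a hypothetical scalar model of Int32x8.Select128FromPair.
// x and y form four 128-bit blocks numbered 0-3; the result is block lo
// followed by block hi.
func select128FromPair32x8(x [8]int32, lo, hi uint8, y [8]int32) [8]int32 {
	var blocks [16]int32 // blocks 0 and 1 come from x; blocks 2 and 3 come from y
	copy(blocks[:8], x[:])
	copy(blocks[8:], y[:])
	l, h := 4*int(lo), 4*int(hi)
	var r [8]int32
	copy(r[:4], blocks[l:l+4]) // lower 128 bits of the result
	copy(r[4:], blocks[h:h+4]) // upper 128 bits of the result
	return r
}

func main() {
	x := [8]int32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]int32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair32x8(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```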

func (Int32x8) SelectFromPairGrouped

func (x Int32x8) SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, a selection of four elements from that half of x and y. Selector values 0-3 specify elements 0-3 of x's half, and values 4-7 specify elements 0-3 of y's half. The selector a chooses the lowest-indexed element of each output half, and b, c, and d choose the 2nd, 3rd, and 4th elements. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. For example, {1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}.

If the selectors are not constants, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
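
The grouped selection applies the same four selectors independently to each 128-bit half. The plain-Go scalar model below reproduces the documented example; selectFromPairGrouped32x8 is a hypothetical name, and panicking on selectors above 7 is this sketch's choice rather than documented behavior.

```go
package main

import "fmt"

// selectFromPairGrouped32x8 is a hypothetical scalar model of
// Int32x8.SelectFromPairGrouped.
func selectFromPairGrouped32x8(x [8]int32, a, b, c, d uint8, y [8]int32) [8]int32 {
	sel := [4]uint8{a, b, c, d}
	var r [8]int32
	for g := 0; g < 2; g++ { // each 128-bit half is handled independently
		base := 4 * g
		for j, s := range sel {
			switch {
			case s < 4:
				r[base+j] = x[base+int(s)] // 0-3 select from x's half
			case s < 8:
				r[base+j] = y[base+int(s)-4] // 4-7 select from y's half
			default:
				panic("selector out of range 0-7")
			}
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]int32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped32x8(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```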

func (Int32x8) SetHi

func (x Int32x8) SetHi(y Int32x4) Int32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int32x8) SetLo

func (x Int32x8) SetLo(y Int32x4) Int32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int32x8) ShiftAllLeft

func (x Int32x8) ShiftAllLeft(y uint64) Int32x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX2

func (Int32x8) ShiftAllLeftConcat

func (x Int32x8) ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8

ShiftAllLeftConcat shifts each element of x to the left by shift bits (only the lower 5 bits of shift are used), and then fills the emptied lower bits of the shifted x with the upper bits of the corresponding element of y.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Int32x8) ShiftAllRight

func (x Int32x8) ShiftAllRight(y uint64) Int32x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX2

func (Int32x8) ShiftAllRightConcat

func (x Int32x8) ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8

ShiftAllRightConcat shifts each element of x to the right by shift bits (only the lower 5 bits of shift are used), and then fills the emptied upper bits of the shifted x with the lower bits of the corresponding element of y.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Int32x8) ShiftLeft

func (x Int32x8) ShiftLeft(y Int32x8) Int32x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Int32x8) ShiftLeftConcat

func (x Int32x8) ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of the corresponding element of z.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x8) ShiftRight

func (x Int32x8) ShiftRight(y Int32x8) Int32x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX2

func (Int32x8) ShiftRightConcat

func (x Int32x8) ShiftRightConcat(y Int32x8, z Int32x8) Int32x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 5 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of the corresponding element of z.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x8) Store

func (x Int32x8) Store(y *[8]int32)

Store stores x into the array pointed to by y.

func (Int32x8) StoreMasked

func (x Int32x8) StoreMasked(y *[8]int32, mask Mask32x8)

StoreMasked stores the elements of x enabled by mask into the corresponding elements of the array pointed to by y.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x8) StoreSlice

func (x Int32x8) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 8 int32s.

func (Int32x8) StoreSlicePart

func (x Int32x8) StoreSlicePart(s []int32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Int32x8) String

func (x Int32x8) String() string

String returns a string representation of SIMD vector x.

func (Int32x8) Sub

func (x Int32x8) Sub(y Int32x8) Int32x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX2

func (Int32x8) SubPairs

func (x Int32x8) SubPairs(y Int32x8) Int32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX2

func (Int32x8) ToMask

func (from Int32x8) ToMask() (to Mask32x8)

ToMask converts from Int32x8 to Mask32x8; a mask element is true when the corresponding vector element is non-zero.

func (Int32x8) TruncateToInt16

func (x Int32x8) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Int32x8) TruncateToInt8

func (x Int32x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512

func (Int32x8) Xor

func (x Int32x8) Xor(y Int32x8) Int32x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int64x2

type Int64x2 struct {
	// contains filtered or unexported fields
}

Int64x2 is a 128-bit SIMD vector of 2 int64 values.

func BroadcastInt64x2

func BroadcastInt64x2(x int64) Int64x2

BroadcastInt64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt64x2

func LoadInt64x2(y *[2]int64) Int64x2

LoadInt64x2 loads an Int64x2 from an array.

func LoadInt64x2Slice

func LoadInt64x2Slice(s []int64) Int64x2

LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s.

func LoadInt64x2SlicePart

func LoadInt64x2SlicePart(s []int64) Int64x2

LoadInt64x2SlicePart loads an Int64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.

func LoadMaskedInt64x2

func LoadMaskedInt64x2(y *[2]int64, mask Mask64x2) Int64x2

LoadMaskedInt64x2 loads an Int64x2 from an array, at those elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x2) Abs

func (x Int64x2) Abs() Int64x2

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x2) Add

func (x Int64x2) Add(y Int64x2) Int64x2

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX

func (Int64x2) And

func (x Int64x2) And(y Int64x2) Int64x2

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int64x2) AndNot

func (x Int64x2) AndNot(y Int64x2) Int64x2

AndNot performs a bitwise AND NOT operation, x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int64x2) AsFloat32x4

func (from Int64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 reinterprets the bits of an Int64x2 as a Float32x4.

func (Int64x2) AsFloat64x2

func (from Int64x2) AsFloat64x2() (to Float64x2)

AsFloat64x2 reinterprets the bits of an Int64x2 as a Float64x2.

func (Int64x2) AsInt16x8

func (from Int64x2) AsInt16x8() (to Int16x8)

AsInt16x8 reinterprets the bits of an Int64x2 as an Int16x8.

func (Int64x2) AsInt32x4

func (from Int64x2) AsInt32x4() (to Int32x4)

AsInt32x4 reinterprets the bits of an Int64x2 as an Int32x4.

func (Int64x2) AsInt8x16

func (from Int64x2) AsInt8x16() (to Int8x16)

AsInt8x16 reinterprets the bits of an Int64x2 as an Int8x16.

func (Int64x2) AsUint16x8

func (from Int64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 reinterprets the bits of an Int64x2 as a Uint16x8.

func (Int64x2) AsUint32x4

func (from Int64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 reinterprets the bits of an Int64x2 as a Uint32x4.

func (Int64x2) AsUint64x2

func (from Int64x2) AsUint64x2() (to Uint64x2)

AsUint64x2 reinterprets the bits of an Int64x2 as a Uint64x2.

func (Int64x2) AsUint8x16

func (from Int64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 reinterprets the bits of an Int64x2 as a Uint8x16.

func (Int64x2) Broadcast128

func (x Int64x2) Broadcast128() Int64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Int64x2) Broadcast256

func (x Int64x2) Broadcast256() Int64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Int64x2) Broadcast512

func (x Int64x2) Broadcast512() Int64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX512

func (Int64x2) Compress

func (x Int64x2) Compress(mask Mask64x2) Int64x2

Compress selects the elements of x enabled by mask and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Int64x2) ConcatPermute

func (x Int64x2) ConcatPermute(y Int64x2, indices Uint64x2) Int64x2

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where xy is the concatenation of x (lower half) and y (upper half) and n is the number of elements in x. Only the low 2 bits of each element of indices are used, enough to index the 4 elements of xy.

Asm: VPERMI2Q, CPU Feature: AVX512

func (Int64x2) ConvertToFloat32

func (x Int64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PSX, CPU Feature: AVX512

func (Int64x2) ConvertToFloat64

func (x Int64x2) ConvertToFloat64() Float64x2

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x2) Equal

func (x Int64x2) Equal(y Int64x2) Mask64x2

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX

func (Int64x2) Expand

func (x Int64x2) Expand(mask Mask64x2) Int64x2

Expand takes the elements packed into the lower part of x and distributes them, in order, to the positions of the result whose mask elements are true, from the lowest position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Int64x2) GetElem

func (x Int64x2) GetElem(index uint8) int64

GetElem returns the value of the element of x at index.

For best performance, index should be a constant; a non-constant index is translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Int64x2) Greater

func (x Int64x2) Greater(y Int64x2) Mask64x2

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTQ, CPU Feature: AVX

func (Int64x2) GreaterEqual

func (x Int64x2) GreaterEqual(y Int64x2) Mask64x2

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature AVX

func (Int64x2) InterleaveHi

func (x Int64x2) InterleaveHi(y Int64x2) Int64x2

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX

func (Int64x2) InterleaveLo

func (x Int64x2) InterleaveLo(y Int64x2) Int64x2

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX

func (Int64x2) IsZero

func (x Int64x2) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX
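The two optimized patterns answer common bit-set questions. The scalar sketch below spells out what each expression computes, using hypothetical helpers on slices rather than the archsimd API.

```go
package main

import "fmt"

// noCommonBits models x.And(y).IsZero(): true when no element of x shares a
// set bit with the corresponding element of y.
func noCommonBits(x, y []int64) bool {
	for i := range x {
		if x[i]&y[i] != 0 {
			return false
		}
	}
	return true
}

// coveredBy models x.AndNot(y).IsZero(): true when every bit set in an
// element of x is also set in the corresponding element of y.
func coveredBy(x, y []int64) bool {
	for i := range x {
		if x[i]&^y[i] != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(noCommonBits([]int64{1, 2}, []int64{4, 8})) // true
	fmt.Println(coveredBy([]int64{1, 2}, []int64{3, 6}))    // true
	fmt.Println(coveredBy([]int64{1, 2}, []int64{2, 2}))    // false
}
```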

func (Int64x2) LeadingZeros

func (x Int64x2) LeadingZeros() Int64x2

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x2) Len

func (x Int64x2) Len() int

Len returns the number of elements in an Int64x2.

func (Int64x2) Less

func (x Int64x2) Less(y Int64x2) Mask64x2

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature AVX

func (Int64x2) LessEqual

func (x Int64x2) LessEqual(y Int64x2) Mask64x2

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature AVX

func (Int64x2) Masked

func (x Int64x2) Masked(mask Mask64x2) Int64x2

Masked returns x but with elements zeroed where mask is false.

func (Int64x2) Max

func (x Int64x2) Max(y Int64x2) Int64x2

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x2) Merge

func (x Int64x2) Merge(y Int64x2, mask Mask64x2) Int64x2

Merge returns x but with elements set to y where mask is false.

func (Int64x2) Min

func (x Int64x2) Min(y Int64x2) Int64x2

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x2) Mul

func (x Int64x2) Mul(y Int64x2) Int64x2

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x2) Not

func (x Int64x2) Not() Int64x2

Not returns the bitwise complement of x.

Emulated, CPU Feature AVX

func (Int64x2) NotEqual

func (x Int64x2) NotEqual(y Int64x2) Mask64x2

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature AVX

func (Int64x2) OnesCount

func (x Int64x2) OnesCount() Int64x2

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x2) Or

func (x Int64x2) Or(y Int64x2) Int64x2

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int64x2) RotateAllLeft

func (x Int64x2) RotateAllLeft(shift uint8) Int64x2

RotateAllLeft rotates each element to the left by the number of bits specified by shift.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Int64x2) RotateAllRight

func (x Int64x2) RotateAllRight(shift uint8) Int64x2

RotateAllRight rotates each element to the right by the number of bits specified by shift.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x2) RotateLeft

func (x Int64x2) RotateLeft(y Int64x2) Int64x2

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x2) RotateRight

func (x Int64x2) RotateRight(y Int64x2) Int64x2

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Int64x2) SaturateToInt16

func (x Int64x2) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x2) SaturateToInt32

func (x Int64x2) SaturateToInt32() Int32x4

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512

func (Int64x2) SaturateToInt8

func (x Int64x2) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x2) SaturateToUint8

func (x Int64x2) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x2) SelectFromPair

func (x Int64x2) SelectFromPair(a, b uint8, y Int64x2) Int64x2

SelectFromPair returns a vector of two elements selected from x and y by a and b, where selector values 0-1 specify elements 0-1 of x and values 2-3 specify elements 0-1 of y. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constant, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX
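A scalar model of the selector encoding follows. selectFromPair is a hypothetical helper on [2]int64 arrays; it assumes a chooses the first result element and b the second, and it simply panics for selectors above 3.

```go
package main

import "fmt"

// selectFromPair models Int64x2.SelectFromPair(a, b, y). Selectors 0-1 pick
// elements of x and 2-3 pick elements 0-1 of y. Selectors above 3 make this
// model panic; that is a property of the sketch, not a documented behavior.
func selectFromPair(x [2]int64, a, b uint8, y [2]int64) [2]int64 {
	xy := [4]int64{x[0], x[1], y[0], y[1]}
	return [2]int64{xy[a], xy[b]}
}

func main() {
	x, y := [2]int64{1, 2}, [2]int64{3, 4}
	fmt.Println(selectFromPair(x, 3, 0, y)) // [4 1]
	fmt.Println(selectFromPair(x, 1, 2, y)) // [2 3]
}
```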

func (Int64x2) SetElem

func (x Int64x2) SetElem(index uint8, y int64) Int64x2

SetElem returns x with the element at index set to y.

For best performance, index should be a constant; a non-constant index is translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Int64x2) ShiftAllLeft

func (x Int64x2) ShiftAllLeft(y uint64) Int64x2

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX

func (Int64x2) ShiftAllLeftConcat

func (x Int64x2) ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the low 6 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of y.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Int64x2) ShiftAllRight

func (x Int64x2) ShiftAllRight(y uint64) Int64x2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512

func (Int64x2) ShiftAllRightConcat

func (x Int64x2) ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the low 6 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of y.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
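Per element, the Concat shifts behave like funnel shifts over the 128-bit concatenation of an element of x and the corresponding element of y. The scalar sketch below models the constant-count variants with hypothetical helpers; it assumes the count is reduced to its low 6 bits, as the VPSHLDQ and VPSHRDQ instructions specify. ShiftLeftConcat and ShiftRightConcat apply the same rule with a per-element count.

```go
package main

import "fmt"

// shiftLeftConcat models one element of ShiftAllLeftConcat: x is shifted left
// by shift, and the emptied low bits are filled from the top bits of y.
// The count is reduced to its low 6 bits (0-63).
func shiftLeftConcat(x int64, shift uint8, y int64) int64 {
	n := uint(shift) & 63
	// For n == 0, uint64(y) >> 64 is 0 in Go, so the result is just x.
	return int64(uint64(x)<<n | uint64(y)>>(64-n))
}

// shiftRightConcat models one element of ShiftAllRightConcat: x is shifted
// right (logically), and the emptied high bits are filled from the low bits of y.
func shiftRightConcat(x int64, shift uint8, y int64) int64 {
	n := uint(shift) & 63
	return int64(uint64(x)>>n | uint64(y)<<(64-n))
}

func main() {
	fmt.Println(shiftLeftConcat(1, 4, -1<<63)) // 24: 1<<4 | top bit of y moved to bit 3
	fmt.Println(shiftRightConcat(16, 4, 1))    // 1152921504606846977: 16>>4 | 1<<60
	fmt.Println(shiftLeftConcat(5, 64, 7))     // 5: 64 & 63 == 0
}
```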

func (Int64x2) ShiftLeft

func (x Int64x2) ShiftLeft(y Int64x2) Int64x2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Int64x2) ShiftLeftConcat

func (x Int64x2) ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the low 6 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of z.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x2) ShiftRight

func (x Int64x2) ShiftRight(y Int64x2) Int64x2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x2) ShiftRightConcat

func (x Int64x2) ShiftRightConcat(y Int64x2, z Int64x2) Int64x2

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the low 6 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of z.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x2) Store

func (x Int64x2) Store(y *[2]int64)

Store stores an Int64x2 to an array.

func (Int64x2) StoreMasked

func (x Int64x2) StoreMasked(y *[2]int64, mask Mask64x2)

StoreMasked stores an Int64x2 to an array, writing only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x2) StoreSlice

func (x Int64x2) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 2 int64s.

func (Int64x2) StoreSlicePart

func (x Int64x2) StoreSlicePart(s []int64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.

func (Int64x2) String

func (x Int64x2) String() string

String returns a string representation of SIMD vector x.

func (Int64x2) Sub

func (x Int64x2) Sub(y Int64x2) Int64x2

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX

func (Int64x2) ToMask

func (from Int64x2) ToMask() (to Mask64x2)

ToMask converts from Int64x2 to Mask64x2; a mask element is set to true when the corresponding vector element is non-zero.

func (Int64x2) TruncateToInt16

func (x Int64x2) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x2) TruncateToInt32

func (x Int64x2) TruncateToInt32() Int32x4

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x2) TruncateToInt8

func (x Int64x2) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512
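Saturation and truncation differ only in how out-of-range values are handled. The scalar sketch below contrasts SaturateToInt8 and TruncateToInt8 for the Int64x2 case and models the packed result layout; the helpers are hypothetical and operate on plain Go values.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt8 is the per-element rule of SaturateToInt8: values outside
// the int8 range are clamped to the nearest bound.
func saturateToInt8(v int64) int8 {
	switch {
	case v > math.MaxInt8:
		return math.MaxInt8
	case v < math.MinInt8:
		return math.MinInt8
	}
	return int8(v)
}

// truncateToInt8 is the per-element rule of TruncateToInt8: only the low
// 8 bits are kept, which is exactly Go's int8 conversion.
func truncateToInt8(v int64) int8 {
	return int8(v)
}

// packLow models the result layout: converted elements fill the low
// positions of the 16-element Int8x16 result; the rest are zero.
func packLow(x []int64, conv func(int64) int8) [16]int8 {
	var r [16]int8
	for i, v := range x {
		r[i] = conv(v)
	}
	return r
}

func main() {
	x := []int64{300, -300}
	s := packLow(x, saturateToInt8)
	t := packLow(x, truncateToInt8)
	fmt.Println(s[:2], t[:2], s[2]) // [127 -128] [44 -44] 0
}
```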

func (Int64x2) Xor

func (x Int64x2) Xor(y Int64x2) Int64x2

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int64x4

type Int64x4 struct {
	// contains filtered or unexported fields
}

Int64x4 is a 256-bit SIMD vector of 4 int64 values.

func BroadcastInt64x4

func BroadcastInt64x4(x int64) Int64x4

BroadcastInt64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt64x4

func LoadInt64x4(y *[4]int64) Int64x4

LoadInt64x4 loads an Int64x4 from an array.

func LoadInt64x4Slice

func LoadInt64x4Slice(s []int64) Int64x4

LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s.

func LoadInt64x4SlicePart

func LoadInt64x4SlicePart(s []int64) Int64x4

LoadInt64x4SlicePart loads an Int64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.

func LoadMaskedInt64x4

func LoadMaskedInt64x4(y *[4]int64, mask Mask64x4) Int64x4

LoadMaskedInt64x4 loads an Int64x4 from an array, reading only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x4) Abs

func (x Int64x4) Abs() Int64x4

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x4) Add

func (x Int64x4) Add(y Int64x4) Int64x4

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX2

func (Int64x4) And

func (x Int64x4) And(y Int64x4) Int64x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int64x4) AndNot

func (x Int64x4) AndNot(y Int64x4) Int64x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int64x4) AsFloat32x8

func (from Int64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of an Int64x4 as a Float32x8.

func (Int64x4) AsFloat64x4

func (from Int64x4) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of an Int64x4 as a Float64x4.

func (Int64x4) AsInt16x16

func (from Int64x4) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of an Int64x4 as an Int16x16.

func (Int64x4) AsInt32x8

func (from Int64x4) AsInt32x8() (to Int32x8)

AsInt32x8 reinterprets the bits of an Int64x4 as an Int32x8.

func (Int64x4) AsInt8x32

func (from Int64x4) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of an Int64x4 as an Int8x32.

func (Int64x4) AsUint16x16

func (from Int64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of an Int64x4 as a Uint16x16.

func (Int64x4) AsUint32x8

func (from Int64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of an Int64x4 as a Uint32x8.

func (Int64x4) AsUint64x4

func (from Int64x4) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of an Int64x4 as a Uint64x4.

func (Int64x4) AsUint8x32

func (from Int64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of an Int64x4 as a Uint8x32.

func (Int64x4) Compress

func (x Int64x4) Compress(mask Mask64x4) Int64x4

Compress selects the elements of x enabled by mask and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Int64x4) ConcatPermute

func (x Int64x4) ConcatPermute(y Int64x4, indices Uint64x4) Int64x4

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where xy is the concatenation of x (lower half) and y (upper half) and n is the number of elements in x. Only the low 3 bits of each element of indices are used, enough to index the 8 elements of xy.

Asm: VPERMI2Q, CPU Feature: AVX512

func (Int64x4) ConvertToFloat32

func (x Int64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PSY, CPU Feature: AVX512

func (Int64x4) ConvertToFloat64

func (x Int64x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x4) Equal

func (x Int64x4) Equal(y Int64x4) Mask64x4

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX2

func (Int64x4) Expand

func (x Int64x4) Expand(mask Mask64x4) Int64x4

Expand takes the elements packed into the lower part of x and distributes them, in order, to the positions of the result whose mask elements are true, from the lowest position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Int64x4) GetHi

func (x Int64x4) GetHi() Int64x2

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int64x4) GetLo

func (x Int64x4) GetLo() Int64x2

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int64x4) Greater

func (x Int64x4) Greater(y Int64x4) Mask64x4

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTQ, CPU Feature: AVX2

func (Int64x4) GreaterEqual

func (x Int64x4) GreaterEqual(y Int64x4) Mask64x4

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature AVX2

func (Int64x4) InterleaveHiGrouped

func (x Int64x4) InterleaveHiGrouped(y Int64x4) Int64x4

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX2

func (Int64x4) InterleaveLoGrouped

func (x Int64x4) InterleaveLoGrouped(y Int64x4) Int64x4

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX2
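Because the interleave happens inside each 128-bit group, the result differs from interleaving the full-width low halves, which would give {x[0], y[0], x[1], y[1]}. The hypothetical helpers below model the grouped versions on [4]int64 arrays, assuming x's element comes first in each pair.

```go
package main

import "fmt"

// interleaveLoGrouped models Int64x4.InterleaveLoGrouped. Each 128-bit group
// holds two int64 elements; within each group, the low element of x is
// paired with the low element of y.
func interleaveLoGrouped(x, y [4]int64) [4]int64 {
	return [4]int64{x[0], y[0], x[2], y[2]}
}

// interleaveHiGrouped pairs the high element of each group instead.
func interleaveHiGrouped(x, y [4]int64) [4]int64 {
	return [4]int64{x[1], y[1], x[3], y[3]}
}

func main() {
	x := [4]int64{0, 1, 2, 3}
	y := [4]int64{10, 11, 12, 13}
	fmt.Println(interleaveLoGrouped(x, y)) // [0 10 2 12]
	fmt.Println(interleaveHiGrouped(x, y)) // [1 11 3 13]
}
```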

func (Int64x4) IsZero

func (x Int64x4) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int64x4) LeadingZeros

func (x Int64x4) LeadingZeros() Int64x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x4) Len

func (x Int64x4) Len() int

Len returns the number of elements in an Int64x4.

func (Int64x4) Less

func (x Int64x4) Less(y Int64x4) Mask64x4

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature AVX2

func (Int64x4) LessEqual

func (x Int64x4) LessEqual(y Int64x4) Mask64x4

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature AVX2

func (Int64x4) Masked

func (x Int64x4) Masked(mask Mask64x4) Int64x4

Masked returns x but with elements zeroed where mask is false.

func (Int64x4) Max

func (x Int64x4) Max(y Int64x4) Int64x4

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x4) Merge

func (x Int64x4) Merge(y Int64x4, mask Mask64x4) Int64x4

Merge returns x but with elements set to y where mask is false.

func (Int64x4) Min

func (x Int64x4) Min(y Int64x4) Int64x4

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x4) Mul

func (x Int64x4) Mul(y Int64x4) Int64x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x4) Not

func (x Int64x4) Not() Int64x4

Not returns the bitwise complement of x.

Emulated, CPU Feature AVX2

func (Int64x4) NotEqual

func (x Int64x4) NotEqual(y Int64x4) Mask64x4

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature AVX2

func (Int64x4) OnesCount

func (x Int64x4) OnesCount() Int64x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x4) Or

func (x Int64x4) Or(y Int64x4) Int64x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int64x4) Permute

func (x Int64x4) Permute(indices Uint64x4) Int64x4

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements in x. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512
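The index masking can be shown with a scalar sketch; permute is a hypothetical helper on [4]int64 arrays, not the package's implementation.

```go
package main

import "fmt"

// permute models Int64x4.Permute: result element i is x[indices[i]],
// using only the low 2 bits of each index.
func permute(x [4]int64, indices [4]uint64) [4]int64 {
	var r [4]int64
	for i, idx := range indices {
		r[i] = x[idx&3]
	}
	return r
}

func main() {
	x := [4]int64{10, 20, 30, 40}
	fmt.Println(permute(x, [4]uint64{3, 2, 1, 0})) // [40 30 20 10]
	fmt.Println(permute(x, [4]uint64{4, 5, 6, 7})) // [10 20 30 40]: high bits ignored
}
```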

func (Int64x4) RotateAllLeft

func (x Int64x4) RotateAllLeft(shift uint8) Int64x4

RotateAllLeft rotates each element to the left by the number of bits specified by shift.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Int64x4) RotateAllRight

func (x Int64x4) RotateAllRight(shift uint8) Int64x4

RotateAllRight rotates each element to the right by the number of bits specified by shift.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x4) RotateLeft

func (x Int64x4) RotateLeft(y Int64x4) Int64x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x4) RotateRight

func (x Int64x4) RotateRight(y Int64x4) Int64x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Int64x4) SaturateToInt16

func (x Int64x4) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x4) SaturateToInt32

func (x Int64x4) SaturateToInt32() Int32x4

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512

func (Int64x4) SaturateToInt8

func (x Int64x4) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x4) SaturateToUint8

func (x Int64x4) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x4) Select128FromPair

func (x Int64x4) Select128FromPair(lo, hi uint8, y Int64x4) Int64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements (elements 0-1 from x, 2-3 from y) and returns a 256-bit result whose lower half is the element selected by lo and whose upper half is the element selected by hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

For best performance, lo and hi should be constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
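The example above can be checked against a scalar model. select128FromPair is a hypothetical helper on [4]int64 arrays that numbers the four 128-bit blocks as described; it panics for selectors above 3 rather than modeling whatever the package does.

```go
package main

import "fmt"

// select128FromPair models Int64x4.Select128FromPair. The 128-bit blocks are
// numbered 0: low half of x, 1: high half of x, 2: low half of y,
// 3: high half of y. Block lo becomes the low half of the result and block hi
// the high half. Selectors above 3 make this model panic.
func select128FromPair(x [4]int64, lo, hi uint8, y [4]int64) [4]int64 {
	blocks := [4][2]int64{{x[0], x[1]}, {x[2], x[3]}, {y[0], y[1]}, {y[2], y[3]}}
	l, h := blocks[lo], blocks[hi]
	return [4]int64{l[0], l[1], h[0], h[1]}
}

func main() {
	x := [4]int64{40, 41, 50, 51}
	y := [4]int64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```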

func (Int64x4) SelectFromPairGrouped

func (x Int64x4) SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4

SelectFromPairGrouped applies the same selection to each of the two 128-bit halves of x and y: within each half, selector values 0-1 specify that half's elements of x and values 2-3 specify that half's elements of y. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constant, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX
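A scalar model of the grouped selection follows. selectFromPairGrouped is a hypothetical helper on [4]int64 arrays; it assumes a chooses the first element of each group's result and b the second, and it panics for selectors above 3.

```go
package main

import "fmt"

// selectFromPairGrouped models Int64x4.SelectFromPairGrouped: the same
// selectors a and b are applied to each 128-bit group. Within a group,
// 0-1 pick the group's elements of x and 2-3 pick the group's elements of y.
func selectFromPairGrouped(x [4]int64, a, b uint8, y [4]int64) [4]int64 {
	var r [4]int64
	for g := 0; g < 4; g += 2 { // g is the first element of each group
		pair := [4]int64{x[g], x[g+1], y[g], y[g+1]}
		r[g], r[g+1] = pair[a], pair[b]
	}
	return r
}

func main() {
	x := [4]int64{1, 2, 3, 4}
	y := [4]int64{5, 6, 7, 8}
	fmt.Println(selectFromPairGrouped(x, 1, 2, y)) // [2 5 4 7]
}
```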

func (Int64x4) SetHi

func (x Int64x4) SetHi(y Int64x2) Int64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int64x4) SetLo

func (x Int64x4) SetLo(y Int64x2) Int64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int64x4) ShiftAllLeft

func (x Int64x4) ShiftAllLeft(y uint64) Int64x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX2

func (Int64x4) ShiftAllLeftConcat

func (x Int64x4) ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the low 6 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of y.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Int64x4) ShiftAllRight

func (x Int64x4) ShiftAllRight(y uint64) Int64x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512

func (Int64x4) ShiftAllRightConcat

func (x Int64x4) ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the low 6 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of y.

For best performance, shift should be a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Int64x4) ShiftLeft

func (x Int64x4) ShiftLeft(y Int64x4) Int64x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Int64x4) ShiftLeftConcat

func (x Int64x4) ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the low 6 bits are used), and then fills the emptied lower bits of the shifted x with the upper bits of z.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x4) ShiftRight

func (x Int64x4) ShiftRight(y Int64x4) Int64x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x4) ShiftRightConcat

func (x Int64x4) ShiftRightConcat(y Int64x4, z Int64x4) Int64x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the low 6 bits are used), and then fills the emptied upper bits of the shifted x with the lower bits of z.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x4) Store

func (x Int64x4) Store(y *[4]int64)

Store stores an Int64x4 to an array.

func (Int64x4) StoreMasked

func (x Int64x4) StoreMasked(y *[4]int64, mask Mask64x4)

StoreMasked stores an Int64x4 to an array, writing only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x4) StoreSlice

func (x Int64x4) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 4 int64s.

func (Int64x4) StoreSlicePart

func (x Int64x4) StoreSlicePart(s []int64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Int64x4) String

func (x Int64x4) String() string

String returns a string representation of SIMD vector x.

func (Int64x4) Sub

func (x Int64x4) Sub(y Int64x4) Int64x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX2

func (Int64x4) ToMask

func (from Int64x4) ToMask() (to Mask64x4)

ToMask converts from Int64x4 to Mask64x4; a mask element is set to true when the corresponding vector element is non-zero.

func (Int64x4) TruncateToInt16

func (x Int64x4) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x4) TruncateToInt32

func (x Int64x4) TruncateToInt32() Int32x4

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x4) TruncateToInt8

func (x Int64x4) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Int64x4) Xor

func (x Int64x4) Xor(y Int64x4) Int64x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int64x8

type Int64x8 struct {
	// contains filtered or unexported fields
}

Int64x8 is a 512-bit SIMD vector of 8 int64 values.

func BroadcastInt64x8

func BroadcastInt64x8(x int64) Int64x8

BroadcastInt64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadInt64x8

func LoadInt64x8(y *[8]int64) Int64x8

LoadInt64x8 loads an Int64x8 from an array.

func LoadInt64x8Slice

func LoadInt64x8Slice(s []int64) Int64x8

LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s.

func LoadInt64x8SlicePart

func LoadInt64x8SlicePart(s []int64) Int64x8

LoadInt64x8SlicePart loads an Int64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.

func LoadMaskedInt64x8

func LoadMaskedInt64x8(y *[8]int64, mask Mask64x8) Int64x8

LoadMaskedInt64x8 loads an Int64x8 from an array, reading only the elements enabled by mask.

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func (Int64x8) Abs

func (x Int64x8) Abs() Int64x8

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x8) Add

func (x Int64x8) Add(y Int64x8) Int64x8

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX512

func (Int64x8) And

func (x Int64x8) And(y Int64x8) Int64x8

And performs a bitwise AND operation between two vectors.

Asm: VPANDQ, CPU Feature: AVX512

func (Int64x8) AndNot

func (x Int64x8) AndNot(y Int64x8) Int64x8

AndNot performs a bitwise x &^ y.

Asm: VPANDNQ, CPU Feature: AVX512

func (Int64x8) AsFloat32x16

func (from Int64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 reinterprets the bits of an Int64x8 as a Float32x16.

func (Int64x8) AsFloat64x8

func (from Int64x8) AsFloat64x8() (to Float64x8)

AsFloat64x8 reinterprets the bits of an Int64x8 as a Float64x8.

func (Int64x8) AsInt16x32

func (from Int64x8) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of an Int64x8 as an Int16x32.

func (Int64x8) AsInt32x16

func (from Int64x8) AsInt32x16() (to Int32x16)

AsInt32x16 reinterprets the bits of an Int64x8 as an Int32x16.

func (Int64x8) AsInt8x64

func (from Int64x8) AsInt8x64() (to Int8x64)

AsInt8x64 reinterprets the bits of an Int64x8 as an Int8x64.

func (Int64x8) AsUint16x32

func (from Int64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of an Int64x8 as a Uint16x32.

func (Int64x8) AsUint32x16

func (from Int64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of an Int64x8 as a Uint32x16.

func (Int64x8) AsUint64x8

func (from Int64x8) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of an Int64x8 as a Uint64x8.

func (Int64x8) AsUint8x64

func (from Int64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 reinterprets the bits of an Int64x8 as a Uint8x64.

func (Int64x8) Compress

func (x Int64x8) Compress(mask Mask64x8) Int64x8

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512
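As a reference for checking results, the following plain-Go sketch models Compress on an [8]int64. compress8 is a hypothetical helper, not the package's implementation, and the zero value it leaves in the unused high lanes is an assumption: the description above does not say what those lanes contain.

```go
package main

import "fmt"

// compress8 is a hypothetical scalar model of Int64x8.Compress: lanes of x
// whose mask entry is true are packed, in order, into the low end of the
// result. Unused high lanes are left zero here (an assumption).
func compress8(x [8]int64, mask [8]bool) [8]int64 {
	var r [8]int64
	n := 0
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	x := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	m := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(compress8(x, m)) // [11 13 14 17 0 0 0 0]
}
```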

func (Int64x8) ConcatPermute

func (x Int64x8) ConcatPermute(y Int64x8, indices Uint64x8) Int64x8

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy (4 bits, since xy has 16 elements) of each element of indices are used.

Asm: VPERMI2Q, CPU Feature: AVX512
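The indexing rule can be modeled in plain Go as below. concatPermute8 is a hypothetical helper that builds the 16-element concatenation explicitly and masks each index to its low 4 bits, so out-of-range indices such as 16 or 31 wrap instead of panicking.

```go
package main

import "fmt"

// concatPermute8 is a hypothetical scalar model of Int64x8.ConcatPermute.
// xy is x (elements 0-7) followed by y (elements 8-15); each index selects
// from xy using only its low 4 bits.
func concatPermute8(x, y [8]int64, indices [8]uint64) [8]int64 {
	var xy [16]int64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]int64
	for i, k := range indices {
		r[i] = xy[k&15]
	}
	return r
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{0, 8, 1, 9, 15, 7, 16, 31} // 16 and 31 wrap to 0 and 15
	fmt.Println(concatPermute8(x, y, idx))      // [0 100 1 101 107 7 0 107]
}
```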

func (Int64x8) ConvertToFloat32

func (x Int64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PS, CPU Feature: AVX512

func (Int64x8) ConvertToFloat64

func (x Int64x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x8) Equal

func (x Int64x8) Equal(y Int64x8) Mask64x8

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VPCMPEQQ, CPU Feature: AVX512

func (Int64x8) Expand

func (x Int64x8) Expand(mask Mask64x8) Int64x8

Expand takes consecutive elements from the low end of x and places them, in order, at the positions of the result where mask is true, starting from the lowest such position.

Asm: VPEXPANDQ, CPU Feature: AVX512
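A plain-Go sketch of the same placement rule follows. expand8 is a hypothetical helper; it sets lanes where mask is false to zero, which is an assumption since the description above does not specify them. Feeding it the packed output of a Compress with the same mask puts the selected elements back at their original positions.

```go
package main

import "fmt"

// expand8 is a hypothetical scalar model of Int64x8.Expand: consecutive lanes
// from the low end of x are written, in order, to the positions where mask is
// true. Positions where mask is false are left zero here (an assumption).
func expand8(x [8]int64, mask [8]bool) [8]int64 {
	var r [8]int64
	n := 0
	for i, put := range mask {
		if put {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	packed := [8]int64{11, 13, 14, 17}
	m := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand8(packed, m)) // [0 11 0 13 14 0 0 17]
}
```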

func (Int64x8) GetHi

func (x Int64x8) GetHi() Int64x4

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int64x8) GetLo

func (x Int64x8) GetLo() Int64x4

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int64x8) Greater

func (x Int64x8) Greater(y Int64x8) Mask64x8

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VPCMPGTQ, CPU Feature: AVX512

func (Int64x8) GreaterEqual

func (x Int64x8) GreaterEqual(y Int64x8) Mask64x8

GreaterEqual returns a mask whose elements indicate whether x >= y, elementwise.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) InterleaveHiGrouped

func (x Int64x8) InterleaveHiGrouped(y Int64x8) Int64x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX512

func (Int64x8) InterleaveLoGrouped

func (x Int64x8) InterleaveLoGrouped(y Int64x8) Int64x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX512
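Because each 128-bit group holds only two int64 elements, the "half" of a group is a single element. The plain-Go sketch below models both InterleaveLoGrouped and InterleaveHiGrouped for Int64x8; interleaveGrouped8 and its hi flag are hypothetical, chosen only to show the lane layout.

```go
package main

import "fmt"

// interleaveGrouped8 is a hypothetical scalar model of Int64x8's
// InterleaveLoGrouped (hi == false) and InterleaveHiGrouped (hi == true).
// Group g holds lanes 2g and 2g+1; the result's group g is
// {x[2g], y[2g]} for Lo and {x[2g+1], y[2g+1]} for Hi.
func interleaveGrouped8(x, y [8]int64, hi bool) [8]int64 {
	off := 0
	if hi {
		off = 1
	}
	var r [8]int64
	for g := 0; g < 4; g++ {
		r[2*g] = x[2*g+off]
		r[2*g+1] = y[2*g+off]
	}
	return r
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveGrouped8(x, y, false)) // [0 10 2 12 4 14 6 16]
	fmt.Println(interleaveGrouped8(x, y, true))  // [1 11 3 13 5 15 7 17]
}
```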

func (Int64x8) LeadingZeros

func (x Int64x8) LeadingZeros() Int64x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x8) Len

func (x Int64x8) Len() int

Len returns the number of elements in an Int64x8.

func (Int64x8) Less

func (x Int64x8) Less(y Int64x8) Mask64x8

Less returns a mask whose elements indicate whether x < y, elementwise.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) LessEqual

func (x Int64x8) LessEqual(y Int64x8) Mask64x8

LessEqual returns a mask whose elements indicate whether x <= y, elementwise.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) Masked

func (x Int64x8) Masked(mask Mask64x8) Int64x8

Masked returns x but with elements zeroed where mask is false.

func (Int64x8) Max

func (x Int64x8) Max(y Int64x8) Int64x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x8) Merge

func (x Int64x8) Merge(y Int64x8, mask Mask64x8) Int64x8

Merge returns x but with elements set to y where mask is false.

func (Int64x8) Min

func (x Int64x8) Min(y Int64x8) Int64x8

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x8) Mul

func (x Int64x8) Mul(y Int64x8) Int64x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x8) Not

func (x Int64x8) Not() Int64x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Int64x8) NotEqual

func (x Int64x8) NotEqual(y Int64x8) Mask64x8

NotEqual returns a mask whose elements indicate whether x != y, elementwise.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) OnesCount

func (x Int64x8) OnesCount() Int64x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x8) Or

func (x Int64x8) Or(y Int64x8) Int64x8

Or performs a bitwise OR operation between two vectors.

Asm: VPORQ, CPU Feature: AVX512

func (Int64x8) Permute

func (x Int64x8) Permute(indices Uint64x8) Int64x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512

func (Int64x8) RotateAllLeft

func (x Int64x8) RotateAllLeft(shift uint8) Int64x8

RotateAllLeft rotates each element to the left by the number of bits specified by shift.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Int64x8) RotateAllRight

func (x Int64x8) RotateAllRight(shift uint8) Int64x8

RotateAllRight rotates each element to the right by the number of bits specified by shift.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x8) RotateLeft

func (x Int64x8) RotateLeft(y Int64x8) Int64x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x8) RotateRight

func (x Int64x8) RotateRight(y Int64x8) Int64x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512
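For per-element rotation, a scalar model can lean on math/bits. The sketch below is a hypothetical reference model, not the package's implementation; it reduces each count modulo 64, which is how the VPROLVQ and VPRORVQ instructions treat the count, so a count of 64 leaves the element unchanged.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeft8 and rotateRight8 are hypothetical scalar models of Int64x8's
// RotateLeft and RotateRight. Each count is taken modulo 64.
func rotateLeft8(x, y [8]int64) [8]int64 {
	var r [8]int64
	for i := range x {
		r[i] = int64(bits.RotateLeft64(uint64(x[i]), int(y[i]&63)))
	}
	return r
}

func rotateRight8(x, y [8]int64) [8]int64 {
	var r [8]int64
	for i := range x {
		// A negative count makes bits.RotateLeft64 rotate right.
		r[i] = int64(bits.RotateLeft64(uint64(x[i]), -int(y[i]&63)))
	}
	return r
}

func main() {
	x := [8]int64{1, -1 << 63, 1, 1}
	y := [8]int64{3, 1, 64, 65}
	fmt.Println(rotateLeft8(x, y)) // [8 1 1 2 0 0 0 0]
	fmt.Println(rotateRight8(x, y))
}
```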

func (Int64x8) SaturateToInt16

func (x Int64x8) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x8) SaturateToInt32

func (x Int64x8) SaturateToInt32() Int32x8

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512
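The difference between saturating and truncating narrowing (compare TruncateToInt32) is easiest to see on a single lane. This plain-Go sketch uses hypothetical helper names; saturateToInt32 clamps to the int32 limits, while truncateToInt32 keeps the low 32 bits, exactly as a Go int32(v) conversion does.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt32 is a hypothetical scalar model of one lane of
// Int64x8.SaturateToInt32: out-of-range values clamp to the int32 limits.
func saturateToInt32(v int64) int32 {
	switch {
	case v > math.MaxInt32:
		return math.MaxInt32
	case v < math.MinInt32:
		return math.MinInt32
	}
	return int32(v)
}

// truncateToInt32 models one lane of Int64x8.TruncateToInt32: only the low
// 32 bits are kept.
func truncateToInt32(v int64) int32 {
	return int32(v)
}

func main() {
	for _, v := range []int64{-5, 1<<32 + 7, math.MaxInt32 + 1, -(1 << 40)} {
		fmt.Println(v, saturateToInt32(v), truncateToInt32(v))
	}
}
```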

func (Int64x8) SaturateToInt8

func (x Int64x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x8) SaturateToUint8

func (x Int64x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x8) SelectFromPairGrouped

func (x Int64x8) SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8

SelectFromPairGrouped returns, for each of the four 128-bit subvectors, two elements selected from the corresponding subvectors of x and y: selector values 0-1 specify elements 0-1 of x's subvector, and values 2-3 specify elements 0-1 of y's subvector. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Int64x8) SetHi

func (x Int64x8) SetHi(y Int64x4) Int64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int64x8) SetLo

func (x Int64x8) SetLo(y Int64x4) Int64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int64x8) ShiftAllLeft

func (x Int64x8) ShiftAllLeft(y uint64) Int64x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX512

func (Int64x8) ShiftAllLeftConcat

func (x Int64x8) ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Int64x8) ShiftAllRight

func (x Int64x8) ShiftAllRight(y uint64) Int64x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512

func (Int64x8) ShiftAllRightConcat

func (x Int64x8) ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
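These "funnel" shifts behave like shifting a 128-bit concatenation and keeping one half. The per-lane sketch below uses hypothetical helper names and reduces the count modulo 64, as the VPSHLDQ and VPSHRDQ instructions do. Note that the right shift of x is logical here: the vacated high bits come from y, not from the sign bit as in ShiftAllRight.

```go
package main

import "fmt"

// shiftLeftConcat and shiftRightConcat are hypothetical scalar models of one
// lane of Int64x8.ShiftAllLeftConcat and ShiftAllRightConcat.
func shiftLeftConcat(x, y int64, shift uint8) int64 {
	s := uint(shift & 63)
	// Go defines a uint64 shift by 64 as 0, so s == 0 returns x unchanged.
	return int64(uint64(x)<<s | uint64(y)>>(64-s))
}

func shiftRightConcat(x, y int64, shift uint8) int64 {
	s := uint(shift & 63)
	// Vacated high bits of x are filled from the low bits of y.
	return int64(uint64(x)>>s | uint64(y)<<(64-s))
}

func main() {
	fmt.Println(shiftLeftConcat(1, -1<<63, 4)) // 24: 1<<4 plus y's top bit in bit 3
	fmt.Println(shiftRightConcat(-16, 0, 4))   // 1152921504606846975, i.e. 2^60 - 1
}
```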

func (Int64x8) ShiftLeft

func (x Int64x8) ShiftLeft(y Int64x8) Int64x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX512

func (Int64x8) ShiftLeftConcat

func (x Int64x8) ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x8) ShiftRight

func (x Int64x8) ShiftRight(y Int64x8) Int64x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x8) ShiftRightConcat

func (x Int64x8) ShiftRightConcat(y Int64x8, z Int64x8) Int64x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x8) Store

func (x Int64x8) Store(y *[8]int64)

Store stores an Int64x8 to an array.

func (Int64x8) StoreMasked

func (x Int64x8) StoreMasked(y *[8]int64, mask Mask64x8)

StoreMasked stores an Int64x8 to an array, writing only the elements enabled by mask; array elements where mask is false are left unchanged.

Asm: VMOVDQU64, CPU Feature: AVX512
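Masked loads and masked stores treat disabled lanes differently, which this plain-Go sketch models side by side. loadMasked8 and storeMasked8 are hypothetical helpers; the zeroing on load follows from the ".Z" (zero-masking) suffix on VMOVDQU64.Z for LoadMaskedInt64x8, while the store simply skips disabled lanes.

```go
package main

import "fmt"

// loadMasked8 is a hypothetical scalar model of LoadMaskedInt64x8:
// enabled lanes are read from a and disabled lanes are zero.
func loadMasked8(a *[8]int64, mask [8]bool) [8]int64 {
	var r [8]int64
	for i, on := range mask {
		if on {
			r[i] = a[i]
		}
	}
	return r
}

// storeMasked8 models Int64x8.StoreMasked: only enabled lanes are written,
// and the other array elements keep their previous values.
func storeMasked8(x [8]int64, a *[8]int64, mask [8]bool) {
	for i, on := range mask {
		if on {
			a[i] = x[i]
		}
	}
}

func main() {
	m := [8]bool{true, false, true, false, true, false, true, false}
	src := [8]int64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(loadMasked8(&src, m)) // [1 0 3 0 5 0 7 0]

	dst := [8]int64{-1, -1, -1, -1, -1, -1, -1, -1}
	storeMasked8([8]int64{10, 20, 30, 40, 50, 60, 70, 80}, &dst, m)
	fmt.Println(dst) // [10 -1 30 -1 50 -1 70 -1]
}
```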

func (Int64x8) StoreSlice

func (x Int64x8) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 8 int64s

func (Int64x8) StoreSlicePart

func (x Int64x8) StoreSlicePart(s []int64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Int64x8) String

func (x Int64x8) String() string

String returns a string representation of SIMD vector x

func (Int64x8) Sub

func (x Int64x8) Sub(y Int64x8) Int64x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX512

func (Int64x8) ToMask

func (from Int64x8) ToMask() (to Mask64x8)

ToMask converts from Int64x8 to Mask64x8: a mask element is set to true when the corresponding vector element is non-zero.

func (Int64x8) TruncateToInt16

func (x Int64x8) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x8) TruncateToInt32

func (x Int64x8) TruncateToInt32() Int32x8

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x8) TruncateToInt8

func (x Int64x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Int64x8) Xor

func (x Int64x8) Xor(y Int64x8) Int64x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORQ, CPU Feature: AVX512

type Int8x16

type Int8x16 struct {
	// contains filtered or unexported fields
}

Int8x16 is a 128-bit SIMD vector of 16 int8

func BroadcastInt8x16

func BroadcastInt8x16(x int8) Int8x16

BroadcastInt8x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt8x16

func LoadInt8x16(y *[16]int8) Int8x16

LoadInt8x16 loads an Int8x16 from an array.

func LoadInt8x16Slice

func LoadInt8x16Slice(s []int8) Int8x16

LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s

func LoadInt8x16SlicePart

func LoadInt8x16SlicePart(s []int8) Int8x16

LoadInt8x16SlicePart loads an Int8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.

func (Int8x16) Abs

func (x Int8x16) Abs() Int8x16

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX

func (Int8x16) Add

func (x Int8x16) Add(y Int8x16) Int8x16

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX

func (Int8x16) AddSaturated

func (x Int8x16) AddSaturated(y Int8x16) Int8x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX

func (Int8x16) And

func (x Int8x16) And(y Int8x16) Int8x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int8x16) AndNot

func (x Int8x16) AndNot(y Int8x16) Int8x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int8x16) AsFloat32x4

func (from Int8x16) AsFloat32x4() (to Float32x4)

AsFloat32x4 reinterprets the bits of an Int8x16 as a Float32x4.

func (Int8x16) AsFloat64x2

func (from Int8x16) AsFloat64x2() (to Float64x2)

AsFloat64x2 reinterprets the bits of an Int8x16 as a Float64x2.

func (Int8x16) AsInt16x8

func (from Int8x16) AsInt16x8() (to Int16x8)

AsInt16x8 reinterprets the bits of an Int8x16 as an Int16x8.

func (Int8x16) AsInt32x4

func (from Int8x16) AsInt32x4() (to Int32x4)

AsInt32x4 reinterprets the bits of an Int8x16 as an Int32x4.

func (Int8x16) AsInt64x2

func (from Int8x16) AsInt64x2() (to Int64x2)

AsInt64x2 reinterprets the bits of an Int8x16 as an Int64x2.

func (Int8x16) AsUint16x8

func (from Int8x16) AsUint16x8() (to Uint16x8)

AsUint16x8 reinterprets the bits of an Int8x16 as a Uint16x8.

func (Int8x16) AsUint32x4

func (from Int8x16) AsUint32x4() (to Uint32x4)

AsUint32x4 reinterprets the bits of an Int8x16 as a Uint32x4.

func (Int8x16) AsUint64x2

func (from Int8x16) AsUint64x2() (to Uint64x2)

AsUint64x2 reinterprets the bits of an Int8x16 as a Uint64x2.

func (Int8x16) AsUint8x16

func (from Int8x16) AsUint8x16() (to Uint8x16)

AsUint8x16 reinterprets the bits of an Int8x16 as a Uint8x16.

func (Int8x16) Broadcast128

func (x Int8x16) Broadcast128() Int8x16

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Int8x16) Broadcast256

func (x Int8x16) Broadcast256() Int8x32

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Int8x16) Broadcast512

func (x Int8x16) Broadcast512() Int8x64

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX512

func (Int8x16) Compress

func (x Int8x16) Compress(mask Mask8x16) Int8x16

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Int8x16) ConcatPermute

func (x Int8x16) ConcatPermute(y Int8x16, indices Uint8x16) Int8x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy (5 bits, since xy has 32 elements) of each element of indices are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Int8x16) CopySign

func (x Int8x16) CopySign(y Int8x16) Int8x16

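CopySign returns x with each element negated where the corresponding element of y is negative, set to zero where it is zero, and left unchanged where it is positive; that is, each element of x is multiplied by the sign (-1, 0, or 1) of the corresponding element of y.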

Asm: VPSIGNB, CPU Feature: AVX
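Despite its name, this CopySign is not math.Copysign: it multiplies by the sign of y rather than copying a sign bit. A one-lane plain-Go model (copySign is a hypothetical helper) makes the three cases and the two's-complement edge case explicit; negating -128 wraps back to -128, both in Go and in the hardware.

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of one lane of Int8x16.CopySign:
// x is negated where y < 0, zeroed where y == 0, and kept where y > 0.
func copySign(x, y int8) int8 {
	switch {
	case y < 0:
		return -x // wraps for x == -128
	case y == 0:
		return 0
	}
	return x
}

func main() {
	fmt.Println(copySign(5, -3), copySign(5, 0), copySign(-5, 7), copySign(-128, -1)) // -5 0 -5 -128
}
```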

func (Int8x16) DotProductQuadruple

func (x Int8x16) DotProductQuadruple(y Uint8x16) Int32x4

DotProductQuadruple multiplies each element of x by the corresponding element of y and sums each group of 4 adjacent products into one int32 element of the result. x.DotProductQuadruple(y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSD, CPU Feature: AVXVNNI
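The grouping is easiest to check against a scalar model. The sketch below (dotProductQuadruple is a hypothetical helper) treats x as signed bytes and y as unsigned bytes, matching the Int8x16 and Uint8x16 parameter types, and computes the sums without an accumulator; the fused .Add(z) form would add z to these results.

```go
package main

import "fmt"

// dotProductQuadruple is a hypothetical scalar model of
// Int8x16.DotProductQuadruple: result lane i is the sum of
// x[4i+k] * y[4i+k] for k = 0..3, with x signed and y unsigned.
func dotProductQuadruple(x [16]int8, y [16]uint8) [4]int32 {
	var r [4]int32
	for i := range r {
		for k := 0; k < 4; k++ {
			r[i] += int32(x[4*i+k]) * int32(y[4*i+k])
		}
	}
	return r
}

func main() {
	x := [16]int8{1, 2, 3, 4, -1, -1, -1, -1, 127, 127, 127, 127, -128, 0, 0, 0}
	y := [16]uint8{1, 1, 1, 1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0}
	fmt.Println(dotProductQuadruple(x, y)) // [10 -1020 129540 -32640]
}
```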

func (Int8x16) DotProductQuadrupleSaturated

func (x Int8x16) DotProductQuadrupleSaturated(y Uint8x16) Int32x4

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. x.DotProductQuadrupleSaturated(y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVXVNNI

func (Int8x16) Equal

func (x Int8x16) Equal(y Int8x16) Mask8x16

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX

func (Int8x16) Expand

func (x Int8x16) Expand(mask Mask8x16) Int8x16

Expand takes consecutive elements from the low end of x and places them, in order, at the positions of the result where mask is true, starting from the lowest such position.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Int8x16) ExtendLo2ToInt64x2

func (x Int8x16) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts the 2 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX

func (Int8x16) ExtendLo4ToInt32x4

func (x Int8x16) ExtendLo4ToInt32x4() Int32x4

ExtendLo4ToInt32x4 converts the 4 lowest elements of x to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX

func (Int8x16) ExtendLo4ToInt64x4

func (x Int8x16) ExtendLo4ToInt64x4() Int64x4

ExtendLo4ToInt64x4 converts the 4 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX2

func (Int8x16) ExtendLo8ToInt16x8

func (x Int8x16) ExtendLo8ToInt16x8() Int16x8

ExtendLo8ToInt16x8 converts the 8 lowest elements of x to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX

func (Int8x16) ExtendLo8ToInt32x8

func (x Int8x16) ExtendLo8ToInt32x8() Int32x8

ExtendLo8ToInt32x8 converts the 8 lowest elements of x to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX2

func (Int8x16) ExtendLo8ToInt64x8

func (x Int8x16) ExtendLo8ToInt64x8() Int64x8

ExtendLo8ToInt64x8 converts the 8 lowest elements of x to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX512

func (Int8x16) ExtendToInt16

func (x Int8x16) ExtendToInt16() Int16x16

ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX2

func (Int8x16) ExtendToInt32

func (x Int8x16) ExtendToInt32() Int32x16

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX512

func (Int8x16) GetElem

func (x Int8x16) GetElem(index uint8) int8

GetElem returns the value of the element of x at index.

index gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRB, CPU Feature: AVX512

func (Int8x16) Greater

func (x Int8x16) Greater(y Int8x16) Mask8x16

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX

func (Int8x16) GreaterEqual

func (x Int8x16) GreaterEqual(y Int8x16) Mask8x16

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Int8x16) IsZero

func (x Int8x16) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int8x16) Len

func (x Int8x16) Len() int

Len returns the number of elements in an Int8x16.

func (Int8x16) Less

func (x Int8x16) Less(y Int8x16) Mask8x16

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Int8x16) LessEqual

func (x Int8x16) LessEqual(y Int8x16) Mask8x16

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Int8x16) Masked

func (x Int8x16) Masked(mask Mask8x16) Int8x16

Masked returns x but with elements zeroed where mask is false.

func (Int8x16) Max

func (x Int8x16) Max(y Int8x16) Int8x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX

func (Int8x16) Merge

func (x Int8x16) Merge(y Int8x16, mask Mask8x16) Int8x16

Merge returns x but with elements set to y where mask is false.

func (Int8x16) Min

func (x Int8x16) Min(y Int8x16) Int8x16

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX

func (Int8x16) Not

func (x Int8x16) Not() Int8x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Int8x16) NotEqual

func (x Int8x16) NotEqual(y Int8x16) Mask8x16

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Int8x16) OnesCount

func (x Int8x16) OnesCount() Int8x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x16) Or

func (x Int8x16) Or(y Int8x16) Int8x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int8x16) Permute

func (x Int8x16) Permute(indices Uint8x16) Int8x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Int8x16) PermuteOrZero

func (x Int8x16) PermuteOrZero(indices Int8x16) Int8x16

PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set in which case zero is used instead.

Asm: VPSHUFB, CPU Feature: AVX
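A scalar model clarifies the two rules (sign bit zeroes the lane; otherwise only the low four bits matter). permuteOrZero below is a hypothetical plain-Go helper, not the package API. The grouped 256-bit variant, PermuteOrZeroGrouped, applies the same rule independently within each 16-byte group.

```go
package main

import "fmt"

// permuteOrZero is a hypothetical scalar model of Int8x16.PermuteOrZero:
// a negative index (sign bit set) yields 0; otherwise the low four bits of
// the index select a byte of x.
func permuteOrZero(x, indices [16]int8) [16]int8 {
	var r [16]int8
	for i, k := range indices {
		if k < 0 {
			continue // sign bit set: this lane stays zero
		}
		r[i] = x[k&15]
	}
	return r
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(3 * i)
	}
	idx := [16]int8{-1, 31, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6}
	fmt.Println(permuteOrZero(x, idx)) // [0 45 0 0 3 3 6 6 9 9 12 12 15 15 18 18]
}
```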

func (Int8x16) SetElem

func (x Int8x16) SetElem(index uint8, y int8) Int8x16

SetElem returns a copy of x with the element at index set to y.

index gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRB, CPU Feature: AVX

func (Int8x16) Store

func (x Int8x16) Store(y *[16]int8)

Store stores an Int8x16 to an array.

func (Int8x16) StoreSlice

func (x Int8x16) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 16 int8s

func (Int8x16) StoreSlicePart

func (x Int8x16) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Int8x16) String

func (x Int8x16) String() string

String returns a string representation of SIMD vector x

func (Int8x16) Sub

func (x Int8x16) Sub(y Int8x16) Int8x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX

func (Int8x16) SubSaturated

func (x Int8x16) SubSaturated(y Int8x16) Int8x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX

func (Int8x16) ToMask

func (from Int8x16) ToMask() (to Mask8x16)

ToMask converts from Int8x16 to Mask8x16: a mask element is set to true when the corresponding vector element is non-zero.

func (Int8x16) Xor

func (x Int8x16) Xor(y Int8x16) Int8x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int8x32

type Int8x32 struct {
	// contains filtered or unexported fields
}

Int8x32 is a 256-bit SIMD vector of 32 int8

func BroadcastInt8x32

func BroadcastInt8x32(x int8) Int8x32

BroadcastInt8x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt8x32

func LoadInt8x32(y *[32]int8) Int8x32

LoadInt8x32 loads an Int8x32 from an array.

func LoadInt8x32Slice

func LoadInt8x32Slice(s []int8) Int8x32

LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s

func LoadInt8x32SlicePart

func LoadInt8x32SlicePart(s []int8) Int8x32

LoadInt8x32SlicePart loads an Int8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.

func (Int8x32) Abs

func (x Int8x32) Abs() Int8x32

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX2

func (Int8x32) Add

func (x Int8x32) Add(y Int8x32) Int8x32

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX2

func (Int8x32) AddSaturated

func (x Int8x32) AddSaturated(y Int8x32) Int8x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX2

func (Int8x32) And

func (x Int8x32) And(y Int8x32) Int8x32

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int8x32) AndNot

func (x Int8x32) AndNot(y Int8x32) Int8x32

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int8x32) AsFloat32x8

func (from Int8x32) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of an Int8x32 as a Float32x8.

func (Int8x32) AsFloat64x4

func (from Int8x32) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of an Int8x32 as a Float64x4.

func (Int8x32) AsInt16x16

func (from Int8x32) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of an Int8x32 as an Int16x16.

func (Int8x32) AsInt32x8

func (from Int8x32) AsInt32x8() (to Int32x8)

AsInt32x8 reinterprets the bits of an Int8x32 as an Int32x8.

func (Int8x32) AsInt64x4

func (from Int8x32) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of an Int8x32 as an Int64x4.

func (Int8x32) AsUint16x16

func (from Int8x32) AsUint16x16() (to Uint16x16)

AsUint16x16 reinterprets the bits of an Int8x32 as a Uint16x16.

func (Int8x32) AsUint32x8

func (from Int8x32) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of an Int8x32 as a Uint32x8.

func (Int8x32) AsUint64x4

func (from Int8x32) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of an Int8x32 as a Uint64x4.

func (Int8x32) AsUint8x32

func (from Int8x32) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of an Int8x32 as a Uint8x32.

func (Int8x32) Compress

func (x Int8x32) Compress(mask Mask8x32) Int8x32

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Int8x32) ConcatPermute

func (x Int8x32) ConcatPermute(y Int8x32, indices Uint8x32) Int8x32

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[31]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy (6 bits, since xy has 64 elements) of each element of indices are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Int8x32) CopySign

func (x Int8x32) CopySign(y Int8x32) Int8x32

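CopySign returns x with each element negated where the corresponding element of y is negative, set to zero where it is zero, and left unchanged where it is positive; that is, each element of x is multiplied by the sign (-1, 0, or 1) of the corresponding element of y.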

Asm: VPSIGNB, CPU Feature: AVX2

func (Int8x32) DotProductQuadruple

func (x Int8x32) DotProductQuadruple(y Uint8x32) Int32x8

DotProductQuadruple multiplies each element of x by the corresponding element of y and sums each group of 4 adjacent products into one int32 element of the result. x.DotProductQuadruple(y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSD, CPU Feature: AVXVNNI

func (Int8x32) DotProductQuadrupleSaturated

func (x Int8x32) DotProductQuadrupleSaturated(y Uint8x32) Int32x8

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. x.DotProductQuadrupleSaturated(y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVXVNNI

func (Int8x32) Equal

func (x Int8x32) Equal(y Int8x32) Mask8x32

Equal returns a mask whose elements indicate whether x == y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX2

func (Int8x32) Expand

func (x Int8x32) Expand(mask Mask8x32) Int8x32

Expand takes consecutive elements from the low end of x and places them, in order, at the positions of the result where mask is true, starting from the lowest such position.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Int8x32) ExtendToInt16

func (x Int8x32) ExtendToInt16() Int16x32

ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX512

func (Int8x32) GetHi

func (x Int8x32) GetHi() Int8x16

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int8x32) GetLo

func (x Int8x32) GetLo() Int8x16

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int8x32) Greater

func (x Int8x32) Greater(y Int8x32) Mask8x32

Greater returns a mask whose elements indicate whether x > y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX2

func (Int8x32) GreaterEqual

func (x Int8x32) GreaterEqual(y Int8x32) Mask8x32

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Int8x32) IsZero

func (x Int8x32) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int8x32) Len

func (x Int8x32) Len() int

Len returns the number of elements in an Int8x32.

func (Int8x32) Less

func (x Int8x32) Less(y Int8x32) Mask8x32

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Int8x32) LessEqual

func (x Int8x32) LessEqual(y Int8x32) Mask8x32

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Int8x32) Masked

func (x Int8x32) Masked(mask Mask8x32) Int8x32

Masked returns x but with elements zeroed where mask is false.

func (Int8x32) Max

func (x Int8x32) Max(y Int8x32) Int8x32

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX2

func (Int8x32) Merge

func (x Int8x32) Merge(y Int8x32, mask Mask8x32) Int8x32

Merge returns x but with elements set to y where mask is false.

func (Int8x32) Min

func (x Int8x32) Min(y Int8x32) Int8x32

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX2

func (Int8x32) Not

func (x Int8x32) Not() Int8x32

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Int8x32) NotEqual

func (x Int8x32) NotEqual(y Int8x32) Mask8x32

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Int8x32) OnesCount

func (x Int8x32) OnesCount() Int8x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x32) Or

func (x Int8x32) Or(y Int8x32) Int8x32

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int8x32) Permute

func (x Int8x32) Permute(indices Uint8x32) Int8x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[31]]}. Only the low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI
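The index rule above can be checked against a plain-Go scalar model. This sketch only restates the documented behavior for 32 int8 elements; permuteModel is a hypothetical helper and does not use archsimd.

```go
package main

import "fmt"

// permuteModel mirrors x.Permute(indices) for an Int8x32:
// result[i] = x[indices[i] & 31], since only the low 5 bits of an index are used.
func permuteModel(x [32]int8, indices [32]uint8) [32]int8 {
	var r [32]int8
	for i, idx := range indices {
		r[i] = x[idx&31]
	}
	return r
}

func main() {
	var x [32]int8
	var rev [32]uint8
	for i := range x {
		x[i] = int8(i)
		rev[i] = uint8(31 - i) // indices that reverse the vector
	}
	r := permuteModel(x, rev)
	fmt.Println(r[0], r[31]) // 31 0
}
```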

func (Int8x32) PermuteOrZeroGrouped

func (x Int8x32) PermuteOrZeroGrouped(indices Int8x32) Int8x32

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. Each group is 128 bits wide. The lower four bits of each byte-sized index in indices select an element from the corresponding group in x, unless the index's sign bit is set, in which case the result element is zero.

Asm: VPSHUFB, CPU Feature: AVX2
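A plain-Go scalar model of the grouped, zeroing permutation may make the per-group indexing clearer. It is a sketch of the documented rule only; shuffleGroupedModel is a hypothetical helper, and indices are modeled as [32]int8 to match the method's Int8x32 parameter.

```go
package main

import "fmt"

// shuffleGroupedModel mirrors x.PermuteOrZeroGrouped(indices) for an Int8x32.
func shuffleGroupedModel(x, indices [32]int8) [32]int8 {
	var r [32]int8
	for i, idx := range indices {
		if idx < 0 { // sign bit set: this result element is zero
			continue
		}
		base := (i / 16) * 16 // first element of the 128-bit group holding element i
		r[i] = x[base+int(idx&15)]
	}
	return r
}

func main() {
	var x, idx [32]int8
	for i := range x {
		x[i] = int8(i)
		idx[i] = int8(15 - i%16) // reverse each 16-element group
	}
	idx[1] = -1 // sign bit set, so result element 1 is zero
	r := shuffleGroupedModel(x, idx)
	fmt.Println(r[0], r[1], r[16]) // 15 0 31
}
```

Note that an index can never reach across groups: element 16 selects from x[16] through x[31] even though its index is 15.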

func (Int8x32) Select128FromPair

func (x Int8x32) Select128FromPair(lo, hi uint8, y Int8x32) Int8x32

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
     {0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})

returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.

lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
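The example above can be reproduced with a plain-Go scalar model that numbers the four 128-bit elements 0 (low half of x) through 3 (high half of y), which is the numbering the example implies. select128Model is a hypothetical helper; unlike the method, it simply panics on lo or hi values above 3.

```go
package main

import "fmt"

// select128Model mirrors x.Select128FromPair(lo, hi, y) for an Int8x32.
// The four 128-bit elements are numbered 0 (low half of x), 1 (high half of x),
// 2 (low half of y) and 3 (high half of y). lo or hi above 3 makes this model
// panic with an index-out-of-range error.
func select128Model(x [32]int8, lo, hi uint8, y [32]int8) [32]int8 {
	quarters := [4][]int8{x[:16], x[16:], y[:16], y[16:]}
	var r [32]int8
	copy(r[:16], quarters[lo])
	copy(r[16:], quarters[hi])
	return r
}

func main() {
	var x, y [32]int8
	for i := range x {
		x[i] = int8(0x40 + i)
		y[i] = int8(0x60 + i)
	}
	r := select128Model(x, 3, 0, y)
	fmt.Printf("%#x %#x\n", r[0], r[16]) // 0x70 0x40
}
```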

func (Int8x32) SetHi

func (x Int8x32) SetHi(y Int8x16) Int8x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int8x32) SetLo

func (x Int8x32) SetLo(y Int8x16) Int8x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int8x32) Store

func (x Int8x32) Store(y *[32]int8)

Store stores an Int8x32 to an array.

func (Int8x32) StoreSlice

func (x Int8x32) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 32 int8s

func (Int8x32) StoreSlicePart

func (x Int8x32) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
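The Part variants behave like Go's built-in copy, which transfers min(len(dst), len(src)) elements. The sketch below models StoreSlicePart together with the matching zero-filling load (documented for the 512-bit type as LoadInt8x64SlicePart) on plain arrays; the helper names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// loadSlicePartModel mirrors a 32-element SlicePart load: elements missing
// from s are zero, and elements of s beyond the 32nd are ignored.
func loadSlicePartModel(s []int8) [32]int8 {
	var v [32]int8
	copy(v[:], s) // copies min(32, len(s)) elements
	return v
}

// storeSlicePartModel mirrors x.StoreSlicePart(s) for an Int8x32: it writes
// min(32, len(s)) leading elements and leaves the rest of s untouched.
func storeSlicePartModel(x [32]int8, s []int8) {
	copy(s, x[:])
}

func main() {
	v := loadSlicePartModel([]int8{1, 2, 3})
	out := make([]int8, 5)
	storeSlicePartModel(v, out)
	fmt.Println(v[3], out) // 0 [1 2 3 0 0]
}
```

This makes the Part variants convenient for the tail of a loop whose length is not a multiple of the vector width.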

func (Int8x32) String

func (x Int8x32) String() string

String returns a string representation of SIMD vector x

func (Int8x32) Sub

func (x Int8x32) Sub(y Int8x32) Int8x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX2

func (Int8x32) SubSaturated

func (x Int8x32) SubSaturated(y Int8x32) Int8x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX2
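Saturation means the exact result is clamped to the element type's range instead of wrapping around, which is what plain Sub (VPSUBB) does, like Go's own int8 arithmetic. A per-element scalar sketch, using the hypothetical helper subSaturatedModel:

```go
package main

import (
	"fmt"
	"math"
)

// subSaturatedModel mirrors one element of x.SubSaturated(y) for an Int8x32:
// the exact difference is computed in 16 bits and clamped to [-128, 127].
func subSaturatedModel(a, b int8) int8 {
	d := int16(a) - int16(b)
	if d > math.MaxInt8 {
		return math.MaxInt8
	}
	if d < math.MinInt8 {
		return math.MinInt8
	}
	return int8(d)
}

func main() {
	fmt.Println(subSaturatedModel(-100, 100), subSaturatedModel(100, -100), subSaturatedModel(5, 3)) // -128 127 2
}
```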

func (Int8x32) ToMask

func (from Int8x32) ToMask() (to Mask8x32)

ToMask converts from Int8x32 to Mask8x32; a mask element is set to true when the corresponding vector element is non-zero.

func (Int8x32) Xor

func (x Int8x32) Xor(y Int8x32) Int8x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int8x64

type Int8x64 struct {
	// contains filtered or unexported fields
}

Int8x64 is a 512-bit SIMD vector of 64 int8

func BroadcastInt8x64

func BroadcastInt8x64(x int8) Int8x64

BroadcastInt8x64 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512BW

func LoadInt8x64

func LoadInt8x64(y *[64]int8) Int8x64

LoadInt8x64 loads an Int8x64 from an array.

func LoadInt8x64Slice

func LoadInt8x64Slice(s []int8) Int8x64

LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s

func LoadInt8x64SlicePart

func LoadInt8x64SlicePart(s []int8) Int8x64

LoadInt8x64SlicePart loads an Int8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.

func LoadMaskedInt8x64

func LoadMaskedInt8x64(y *[64]int8, mask Mask8x64) Int8x64

LoadMaskedInt8x64 loads an Int8x64 from an array, at the elements enabled by mask; the remaining elements of the result are zeroed.

Asm: VMOVDQU8.Z, CPU Feature: AVX512

func (Int8x64) Abs

func (x Int8x64) Abs() Int8x64

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX512

func (Int8x64) Add

func (x Int8x64) Add(y Int8x64) Int8x64

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX512

func (Int8x64) AddSaturated

func (x Int8x64) AddSaturated(y Int8x64) Int8x64

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX512

func (Int8x64) And

func (x Int8x64) And(y Int8x64) Int8x64

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int8x64) AndNot

func (x Int8x64) AndNot(y Int8x64) Int8x64

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int8x64) AsFloat32x16

func (from Int8x64) AsFloat32x16() (to Float32x16)

AsFloat32x16 reinterprets the bits of an Int8x64 as a Float32x16.

func (Int8x64) AsFloat64x8

func (from Int8x64) AsFloat64x8() (to Float64x8)

AsFloat64x8 reinterprets the bits of an Int8x64 as a Float64x8.

func (Int8x64) AsInt16x32

func (from Int8x64) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of an Int8x64 as an Int16x32.

func (Int8x64) AsInt32x16

func (from Int8x64) AsInt32x16() (to Int32x16)

AsInt32x16 reinterprets the bits of an Int8x64 as an Int32x16.

func (Int8x64) AsInt64x8

func (from Int8x64) AsInt64x8() (to Int64x8)

AsInt64x8 reinterprets the bits of an Int8x64 as an Int64x8.

func (Int8x64) AsUint16x32

func (from Int8x64) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of an Int8x64 as a Uint16x32.

func (Int8x64) AsUint32x16

func (from Int8x64) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of an Int8x64 as a Uint32x16.

func (Int8x64) AsUint64x8

func (from Int8x64) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of an Int8x64 as a Uint64x8.

func (Int8x64) AsUint8x64

func (from Int8x64) AsUint8x64() (to Uint8x64)

AsUint8x64 reinterprets the bits of an Int8x64 as a Uint8x64.

func (Int8x64) Compress

func (x Int8x64) Compress(mask Mask8x64) Int8x64

Compress selects the elements of x whose corresponding mask elements are true and packs them, in their original order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
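A plain-Go scalar model of Compress is sketched below. The documentation does not state what the upper result elements hold after packing; this sketch assumes they are zero, so treat that part as an assumption. compressModel is a hypothetical helper and the mask is modeled as a [64]bool.

```go
package main

import "fmt"

// compressModel mirrors x.Compress(mask) for an Int8x64: the selected elements
// are packed, in order, into the lowest-indexed result elements.
// ASSUMPTION: the remaining upper elements are modeled as zero.
func compressModel(x [64]int8, mask [64]bool) [64]int8 {
	var r [64]int8
	n := 0
	for i := range x {
		if mask[i] {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	var x [64]int8
	var mask [64]bool
	for i := range x {
		x[i] = int8(i)
		mask[i] = i%2 == 1 // select the odd-indexed elements
	}
	r := compressModel(x, mask)
	fmt.Println(r[0], r[1], r[31], r[32]) // 1 3 63 0
}
```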

func (Int8x64) ConcatPermute

func (x Int8x64) ConcatPermute(y Int8x64, indices Uint8x64) Int8x64

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[63]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used: the low 7 bits (values 0-127) of each element of indices.

Asm: VPERMI2B, CPU Feature: AVX512VBMI
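Because xy has 128 elements, an index needs 7 bits; higher bits are ignored. The following plain-Go scalar model sketches that behavior. concatPermuteModel is a hypothetical helper, not part of archsimd.

```go
package main

import "fmt"

// concatPermuteModel mirrors x.ConcatPermute(y, indices) for an Int8x64.
// xy holds x at positions 0-63 and y at 64-127; 7 bits address 128 elements,
// so each index is reduced to its low 7 bits.
func concatPermuteModel(x, y [64]int8, indices [64]uint8) [64]int8 {
	var xy [128]int8
	copy(xy[:64], x[:])
	copy(xy[64:], y[:])
	var r [64]int8
	for i, idx := range indices {
		r[i] = xy[idx&127]
	}
	return r
}

func main() {
	var x, y [64]int8
	for i := range x {
		x[i] = int8(i)
		y[i] = int8(-1 - i)
	}
	var idx [64]uint8
	idx[0], idx[1], idx[2] = 64, 200, 3 // 200&127 = 72, i.e. y[8]
	r := concatPermuteModel(x, y, idx)
	fmt.Println(r[0], r[1], r[2]) // -1 -9 3
}
```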

func (Int8x64) DotProductQuadruple

func (x Int8x64) DotProductQuadruple(y Uint8x64) Int32x16

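DotProductQuadruple performs dot products on groups of 4 elements of x and y: element i of the result is the sum of the four products x[4i+j]*y[4i+j] for j = 0 through 3. The expression x.DotProductQuadruple(y).Add(z) is optimized to the full (accumulating) form of the underlying instruction.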

Asm: VPDPBUSD, CPU Feature: AVX512VNNI
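A plain-Go scalar model of the grouping is sketched below, treating x's elements as signed and y's as unsigned, as their Int8x64 and Uint8x64 types indicate. dotQuadModel is a hypothetical helper; the saturating variant and the accumulating Add(z) form are not modeled.

```go
package main

import "fmt"

// dotQuadModel mirrors x.DotProductQuadruple(y) for an Int8x64 and a Uint8x64:
// result element i is the int32 sum of x[4i+j]*y[4i+j] for j = 0..3, with x's
// elements signed and y's elements unsigned.
func dotQuadModel(x [64]int8, y [64]uint8) [16]int32 {
	var r [16]int32
	for i := range r {
		for j := 0; j < 4; j++ {
			r[i] += int32(x[4*i+j]) * int32(y[4*i+j])
		}
	}
	return r
}

func main() {
	var x [64]int8
	var y [64]uint8
	copy(x[:4], []int8{1, -2, 3, -4})
	copy(y[:4], []uint8{10, 20, 30, 40})
	for j := 4; j < 8; j++ {
		x[j], y[j] = -128, 255 // the most negative product, repeated four times
	}
	r := dotQuadModel(x, y)
	fmt.Println(r[0], r[1]) // -100 -130560
}
```

Even the extreme case fits comfortably in an int32, so overflow only becomes a concern once an accumulator is added.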

func (Int8x64) DotProductQuadrupleSaturated

func (x Int8x64) DotProductQuadrupleSaturated(y Uint8x64) Int32x16

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. The expression x.DotProductQuadrupleSaturated(y).Add(z) is optimized to the full (accumulating) form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVX512VNNI

func (Int8x64) Equal

func (x Int8x64) Equal(y Int8x64) Mask8x64

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX512

func (Int8x64) Expand

func (x Int8x64) Expand(mask Mask8x64) Int8x64

Expand performs an expansion on a vector x whose elements are packed into its lower part: the elements of x are distributed, in order, to the positions where mask is true, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
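Expand moves data in the opposite direction from Compress. In the plain-Go scalar model below, positions where mask is false are set to zero; the documentation does not specify their contents, so that part is an assumption. expandModel is a hypothetical helper.

```go
package main

import "fmt"

// expandModel mirrors x.Expand(mask) for an Int8x64: the k-th position where
// mask is true receives x[k].
// ASSUMPTION: positions where mask is false are modeled as zero.
func expandModel(x [64]int8, mask [64]bool) [64]int8 {
	var r [64]int8
	k := 0
	for i := range mask {
		if mask[i] {
			r[i] = x[k]
			k++
		}
	}
	return r
}

func main() {
	var x [64]int8
	var mask [64]bool
	for i := range x {
		x[i] = int8(i + 1)
	}
	mask[2], mask[5], mask[9] = true, true, true
	r := expandModel(x, mask)
	fmt.Println(r[0], r[2], r[5], r[9]) // 0 1 2 3
}
```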

func (Int8x64) GetHi

func (x Int8x64) GetHi() Int8x32

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int8x64) GetLo

func (x Int8x64) GetLo() Int8x32

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int8x64) Greater

func (x Int8x64) Greater(y Int8x64) Mask8x64

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX512

func (Int8x64) GreaterEqual

func (x Int8x64) GreaterEqual(y Int8x64) Mask8x64

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) Len

func (x Int8x64) Len() int

Len returns the number of elements in an Int8x64.

func (Int8x64) Less

func (x Int8x64) Less(y Int8x64) Mask8x64

Less returns x less-than y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) LessEqual

func (x Int8x64) LessEqual(y Int8x64) Mask8x64

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) Masked

func (x Int8x64) Masked(mask Mask8x64) Int8x64

Masked returns x but with elements zeroed where mask is false.

func (Int8x64) Max

func (x Int8x64) Max(y Int8x64) Int8x64

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX512

func (Int8x64) Merge

func (x Int8x64) Merge(y Int8x64, mask Mask8x64) Int8x64

Merge returns x but with elements set to y where mask is false.

func (Int8x64) Min

func (x Int8x64) Min(y Int8x64) Int8x64

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX512

func (Int8x64) Not

func (x Int8x64) Not() Int8x64

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Int8x64) NotEqual

func (x Int8x64) NotEqual(y Int8x64) Mask8x64

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) OnesCount

func (x Int8x64) OnesCount() Int8x64

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x64) Or

func (x Int8x64) Or(y Int8x64) Int8x64

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int8x64) Permute

func (x Int8x64) Permute(indices Uint8x64) Int8x64

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[63]]}. Only the low 6 bits (values 0-63) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Int8x64) PermuteOrZeroGrouped

func (x Int8x64) PermuteOrZeroGrouped(indices Int8x64) Int8x64

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. Each group is 128 bits wide. The lower four bits of each byte-sized index in indices select an element from the corresponding group in x, unless the index's sign bit is set, in which case the result element is zero.

Asm: VPSHUFB, CPU Feature: AVX512

func (Int8x64) SetHi

func (x Int8x64) SetHi(y Int8x32) Int8x64

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int8x64) SetLo

func (x Int8x64) SetLo(y Int8x32) Int8x64

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int8x64) Store

func (x Int8x64) Store(y *[64]int8)

Store stores an Int8x64 to an array.

func (Int8x64) StoreMasked

func (x Int8x64) StoreMasked(y *[64]int8, mask Mask8x64)

StoreMasked stores the elements of x enabled by mask to the corresponding elements of the array; the other array elements are left unchanged.

Asm: VMOVDQU8, CPU Feature: AVX512
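The masked load and store differ in what happens to disabled elements: LoadMaskedInt8x64 zeroes them (the .Z zero-masking form of VMOVDQU8), while a masked store leaves the destination untouched, as AVX-512 masked stores do. A plain-Go scalar sketch with hypothetical helper names:

```go
package main

import "fmt"

// loadMaskedModel mirrors LoadMaskedInt8x64(src, mask): disabled elements are zero.
func loadMaskedModel(src *[64]int8, mask [64]bool) [64]int8 {
	var r [64]int8
	for i := range r {
		if mask[i] {
			r[i] = src[i]
		}
	}
	return r
}

// storeMaskedModel mirrors x.StoreMasked(dst, mask): disabled elements of dst are untouched.
func storeMaskedModel(x [64]int8, dst *[64]int8, mask [64]bool) {
	for i := range x {
		if mask[i] {
			dst[i] = x[i]
		}
	}
}

func main() {
	var src, dst [64]int8
	var mask [64]bool
	for i := range src {
		src[i] = 7
		dst[i] = -1
	}
	mask[0], mask[3] = true, true
	v := loadMaskedModel(&src, mask)
	storeMaskedModel(v, &dst, mask)
	fmt.Println(v[0], v[1], dst[0], dst[1]) // 7 0 7 -1
}
```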

func (Int8x64) StoreSlice

func (x Int8x64) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 64 int8s

func (Int8x64) StoreSlicePart

func (x Int8x64) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.

func (Int8x64) String

func (x Int8x64) String() string

String returns a string representation of SIMD vector x

func (Int8x64) Sub

func (x Int8x64) Sub(y Int8x64) Int8x64

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX512

func (Int8x64) SubSaturated

func (x Int8x64) SubSaturated(y Int8x64) Int8x64

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX512

func (Int8x64) ToMask

func (from Int8x64) ToMask() (to Mask8x64)

ToMask converts from Int8x64 to Mask8x64; a mask element is set to true when the corresponding vector element is non-zero.

func (Int8x64) Xor

func (x Int8x64) Xor(y Int8x64) Int8x64

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Mask16x16

type Mask16x16 struct {
	// contains filtered or unexported fields
}

Mask16x16 is a mask for 256-bit SIMD vectors with 16 elements of 16 bits each, such as Int16x16.

func Mask16x16FromBits

func Mask16x16FromBits(y uint16) Mask16x16

Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x16) And

func (x Mask16x16) And(y Mask16x16) Mask16x16

func (Mask16x16) Or

func (x Mask16x16) Or(y Mask16x16) Mask16x16

func (Mask16x16) ToBits

func (x Mask16x16) ToBits() uint16

ToBits constructs a bitmap from a Mask16x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x16) ToInt16x16

func (from Mask16x16) ToInt16x16() (to Int16x16)

ToInt16x16 converts from Mask16x16 to Int16x16
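The FromBits and ToBits bitmap convention can be sketched in plain Go, assuming (as with AVX-512 mask registers) that bit i, counting from the least significant bit, corresponds to element i. fromBitsModel and toBitsModel are hypothetical helpers that model the mask as a [16]bool.

```go
package main

import "fmt"

// fromBitsModel mirrors Mask16x16FromBits(bits): element i is set when bit i is 1.
func fromBitsModel(bits uint16) [16]bool {
	var m [16]bool
	for i := range m {
		m[i] = bits>>i&1 == 1
	}
	return m
}

// toBitsModel mirrors m.ToBits(): bit i is 1 when element i is set.
func toBitsModel(m [16]bool) uint16 {
	var bits uint16
	for i, set := range m {
		if set {
			bits |= 1 << i
		}
	}
	return bits
}

func main() {
	m := fromBitsModel(0b101)
	fmt.Println(m[0], m[1], m[2], toBitsModel(m)) // true false true 5
}
```

Under this convention the two functions are inverses, so a mask survives a round trip through a uint16.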

type Mask16x32

type Mask16x32 struct {
	// contains filtered or unexported fields
}

Mask16x32 is a mask for 512-bit SIMD vectors with 32 elements of 16 bits each, such as Int16x32.

func Mask16x32FromBits

func Mask16x32FromBits(y uint32) Mask16x32

Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x32) And

func (x Mask16x32) And(y Mask16x32) Mask16x32

func (Mask16x32) Or

func (x Mask16x32) Or(y Mask16x32) Mask16x32

func (Mask16x32) ToBits

func (x Mask16x32) ToBits() uint32

ToBits constructs a bitmap from a Mask16x32, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x32) ToInt16x32

func (from Mask16x32) ToInt16x32() (to Int16x32)

ToInt16x32 converts from Mask16x32 to Int16x32

type Mask16x8

type Mask16x8 struct {
	// contains filtered or unexported fields
}

Mask16x8 is a mask for 128-bit SIMD vectors with 8 elements of 16 bits each, such as Int16x8.

func Mask16x8FromBits

func Mask16x8FromBits(y uint8) Mask16x8

Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x8) And

func (x Mask16x8) And(y Mask16x8) Mask16x8

func (Mask16x8) Or

func (x Mask16x8) Or(y Mask16x8) Mask16x8

func (Mask16x8) ToBits

func (x Mask16x8) ToBits() uint8

ToBits constructs a bitmap from a Mask16x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x8) ToInt16x8

func (from Mask16x8) ToInt16x8() (to Int16x8)

ToInt16x8 converts from Mask16x8 to Int16x8

type Mask32x16

type Mask32x16 struct {
	// contains filtered or unexported fields
}

Mask32x16 is a mask for 512-bit SIMD vectors with 16 elements of 32 bits each, such as Int32x16.

func Mask32x16FromBits

func Mask32x16FromBits(y uint16) Mask32x16

Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x16) And

func (x Mask32x16) And(y Mask32x16) Mask32x16

func (Mask32x16) Or

func (x Mask32x16) Or(y Mask32x16) Mask32x16

func (Mask32x16) ToBits

func (x Mask32x16) ToBits() uint16

ToBits constructs a bitmap from a Mask32x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x16) ToInt32x16

func (from Mask32x16) ToInt32x16() (to Int32x16)

ToInt32x16 converts from Mask32x16 to Int32x16

type Mask32x4

type Mask32x4 struct {
	// contains filtered or unexported fields
}

Mask32x4 is a mask for 128-bit SIMD vectors with 4 elements of 32 bits each, such as Int32x4.

func Mask32x4FromBits

func Mask32x4FromBits(y uint8) Mask32x4

Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x4) And

func (x Mask32x4) And(y Mask32x4) Mask32x4

func (Mask32x4) Or

func (x Mask32x4) Or(y Mask32x4) Mask32x4

func (Mask32x4) ToBits

func (x Mask32x4) ToBits() uint8

ToBits constructs a bitmap from a Mask32x4, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of the result correspond to mask elements.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x4) ToInt32x4

func (from Mask32x4) ToInt32x4() (to Int32x4)

ToInt32x4 converts from Mask32x4 to Int32x4

type Mask32x8

type Mask32x8 struct {
	// contains filtered or unexported fields
}

Mask32x8 is a mask for 256-bit SIMD vectors with 8 elements of 32 bits each, such as Int32x8.

func Mask32x8FromBits

func Mask32x8FromBits(y uint8) Mask32x8

Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x8) And

func (x Mask32x8) And(y Mask32x8) Mask32x8

func (Mask32x8) Or

func (x Mask32x8) Or(y Mask32x8) Mask32x8

func (Mask32x8) ToBits

func (x Mask32x8) ToBits() uint8

ToBits constructs a bitmap from a Mask32x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x8) ToInt32x8

func (from Mask32x8) ToInt32x8() (to Int32x8)

ToInt32x8 converts from Mask32x8 to Int32x8

type Mask64x2

type Mask64x2 struct {
	// contains filtered or unexported fields
}

Mask64x2 is a mask for 128-bit SIMD vectors with 2 elements of 64 bits each, such as Int64x2.

func Mask64x2FromBits

func Mask64x2FromBits(y uint8) Mask64x2

Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of y are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x2) And

func (x Mask64x2) And(y Mask64x2) Mask64x2

func (Mask64x2) Or

func (x Mask64x2) Or(y Mask64x2) Mask64x2

func (Mask64x2) ToBits

func (x Mask64x2) ToBits() uint8

ToBits constructs a bitmap from a Mask64x2, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of the result correspond to mask elements.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x2) ToInt64x2

func (from Mask64x2) ToInt64x2() (to Int64x2)

ToInt64x2 converts from Mask64x2 to Int64x2

type Mask64x4

type Mask64x4 struct {
	// contains filtered or unexported fields
}

Mask64x4 is a mask for 256-bit SIMD vectors with 4 elements of 64 bits each, such as Int64x4.

func Mask64x4FromBits

func Mask64x4FromBits(y uint8) Mask64x4

Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x4) And

func (x Mask64x4) And(y Mask64x4) Mask64x4

func (Mask64x4) Or

func (x Mask64x4) Or(y Mask64x4) Mask64x4

func (Mask64x4) ToBits

func (x Mask64x4) ToBits() uint8

ToBits constructs a bitmap from a Mask64x4, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of the result correspond to mask elements.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x4) ToInt64x4

func (from Mask64x4) ToInt64x4() (to Int64x4)

ToInt64x4 converts from Mask64x4 to Int64x4

type Mask64x8

type Mask64x8 struct {
	// contains filtered or unexported fields
}

Mask64x8 is a mask for 512-bit SIMD vectors with 8 elements of 64 bits each, such as Int64x8.

func Mask64x8FromBits

func Mask64x8FromBits(y uint8) Mask64x8

Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x8) And

func (x Mask64x8) And(y Mask64x8) Mask64x8

func (Mask64x8) Or

func (x Mask64x8) Or(y Mask64x8) Mask64x8

func (Mask64x8) ToBits

func (x Mask64x8) ToBits() uint8

ToBits constructs a bitmap from a Mask64x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x8) ToInt64x8

func (from Mask64x8) ToInt64x8() (to Int64x8)

ToInt64x8 converts from Mask64x8 to Int64x8

type Mask8x16

type Mask8x16 struct {
	// contains filtered or unexported fields
}

Mask8x16 is a mask for 128-bit SIMD vectors with 16 elements of 8 bits each, such as Int8x16.

func Mask8x16FromBits

func Mask8x16FromBits(y uint16) Mask8x16

Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x16) And

func (x Mask8x16) And(y Mask8x16) Mask8x16

func (Mask8x16) Or

func (x Mask8x16) Or(y Mask8x16) Mask8x16

func (Mask8x16) ToBits

func (x Mask8x16) ToBits() uint16

ToBits constructs a bitmap from a Mask8x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x16) ToInt8x16

func (from Mask8x16) ToInt8x16() (to Int8x16)

ToInt8x16 converts from Mask8x16 to Int8x16

type Mask8x32

type Mask8x32 struct {
	// contains filtered or unexported fields
}

Mask8x32 is a mask for 256-bit SIMD vectors with 32 elements of 8 bits each, such as Int8x32.

func Mask8x32FromBits

func Mask8x32FromBits(y uint32) Mask8x32

Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x32) And

func (x Mask8x32) And(y Mask8x32) Mask8x32

func (Mask8x32) Or

func (x Mask8x32) Or(y Mask8x32) Mask8x32

func (Mask8x32) ToBits

func (x Mask8x32) ToBits() uint32

ToBits constructs a bitmap from a Mask8x32, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x32) ToInt8x32

func (from Mask8x32) ToInt8x32() (to Int8x32)

ToInt8x32 converts from Mask8x32 to Int8x32

type Mask8x64

type Mask8x64 struct {
	// contains filtered or unexported fields
}

Mask8x64 is a mask for 512-bit SIMD vectors with 64 elements of 8 bits each, such as Int8x64.

func Mask8x64FromBits

func Mask8x64FromBits(y uint64) Mask8x64

Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x64) And

func (x Mask8x64) And(y Mask8x64) Mask8x64

func (Mask8x64) Or

func (x Mask8x64) Or(y Mask8x64) Mask8x64

func (Mask8x64) ToBits

func (x Mask8x64) ToBits() uint64

ToBits constructs a bitmap from a Mask8x64, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x64) ToInt8x64

func (from Mask8x64) ToInt8x64() (to Int8x64)

ToInt8x64 converts from Mask8x64 to Int8x64

type Uint16x16

type Uint16x16 struct {
	// contains filtered or unexported fields
}

Uint16x16 is a 256-bit SIMD vector of 16 uint16

func BroadcastUint16x16

func BroadcastUint16x16(x uint16) Uint16x16

BroadcastUint16x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadUint16x16

func LoadUint16x16(y *[16]uint16) Uint16x16

LoadUint16x16 loads a Uint16x16 from an array

func LoadUint16x16Slice

func LoadUint16x16Slice(s []uint16) Uint16x16

LoadUint16x16Slice loads a Uint16x16 from a slice of at least 16 uint16s.

func LoadUint16x16SlicePart

func LoadUint16x16SlicePart(s []uint16) Uint16x16

LoadUint16x16SlicePart loads a Uint16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.

func (Uint16x16) Add

func (x Uint16x16) Add(y Uint16x16) Uint16x16

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX2

func (Uint16x16) AddPairs

func (x Uint16x16) AddPairs(y Uint16x16) Uint16x16

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX2

func (Uint16x16) AddSaturated

func (x Uint16x16) AddSaturated(y Uint16x16) Uint16x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX2

func (Uint16x16) And

func (x Uint16x16) And(y Uint16x16) Uint16x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint16x16) AndNot

func (x Uint16x16) AndNot(y Uint16x16) Uint16x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint16x16) AsFloat32x8

func (from Uint16x16) AsFloat32x8() (to Float32x8)

AsFloat32x8 reinterprets the bits of a Uint16x16 as a Float32x8.

func (Uint16x16) AsFloat64x4

func (from Uint16x16) AsFloat64x4() (to Float64x4)

AsFloat64x4 reinterprets the bits of a Uint16x16 as a Float64x4.

func (Uint16x16) AsInt16x16

func (from Uint16x16) AsInt16x16() (to Int16x16)

AsInt16x16 reinterprets the bits of a Uint16x16 as an Int16x16.

func (Uint16x16) AsInt32x8

func (from Uint16x16) AsInt32x8() (to Int32x8)

AsInt32x8 reinterprets the bits of a Uint16x16 as an Int32x8.

func (Uint16x16) AsInt64x4

func (from Uint16x16) AsInt64x4() (to Int64x4)

AsInt64x4 reinterprets the bits of a Uint16x16 as an Int64x4.

func (Uint16x16) AsInt8x32

func (from Uint16x16) AsInt8x32() (to Int8x32)

AsInt8x32 reinterprets the bits of a Uint16x16 as an Int8x32.

func (Uint16x16) AsUint32x8

func (from Uint16x16) AsUint32x8() (to Uint32x8)

AsUint32x8 reinterprets the bits of a Uint16x16 as a Uint32x8.

func (Uint16x16) AsUint64x4

func (from Uint16x16) AsUint64x4() (to Uint64x4)

AsUint64x4 reinterprets the bits of a Uint16x16 as a Uint64x4.

func (Uint16x16) AsUint8x32

func (from Uint16x16) AsUint8x32() (to Uint8x32)

AsUint8x32 reinterprets the bits of a Uint16x16 as a Uint8x32.

func (Uint16x16) Average

func (x Uint16x16) Average(y Uint16x16) Uint16x16

Average computes the rounded average of corresponding elements.

Asm: VPAVGW, CPU Feature: AVX2
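"Rounded" here means the VPAVGW rule: add the two elements plus 1, then halve, without intermediate overflow. A per-element plain-Go sketch with the hypothetical helper averageModel:

```go
package main

import "fmt"

// averageModel mirrors one element of x.Average(y) for a Uint16x16: the sum is
// formed in 32 bits so it cannot overflow, and adding 1 before halving rounds
// exact halves up.
func averageModel(a, b uint16) uint16 {
	return uint16((uint32(a) + uint32(b) + 1) >> 1)
}

func main() {
	fmt.Println(averageModel(1, 2), averageModel(65535, 65535), averageModel(0, 1)) // 2 65535 1
}
```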

func (Uint16x16) Compress

func (x Uint16x16) Compress(mask Mask16x16) Uint16x16

Compress selects the elements of x whose corresponding mask elements are true and packs them, in their original order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Uint16x16) ConcatPermute

func (x Uint16x16) ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used: the low 5 bits (values 0-31) of each element of indices.

Asm: VPERMI2W, CPU Feature: AVX512

func (Uint16x16) Equal

func (x Uint16x16) Equal(y Uint16x16) Mask16x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX2

func (Uint16x16) Expand

func (x Uint16x16) Expand(mask Mask16x16) Uint16x16

Expand performs an expansion on a vector x whose elements are packed into its lower part: the elements of x are distributed, in order, to the positions where mask is true, from the lowest mask element to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Uint16x16) ExtendToUint32

func (x Uint16x16) ExtendToUint32() Uint32x16

ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXWD, CPU Feature: AVX512

func (Uint16x16) GetHi

func (x Uint16x16) GetHi() Uint16x8

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint16x16) GetLo

func (x Uint16x16) GetLo() Uint16x8

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint16x16) Greater

func (x Uint16x16) Greater(y Uint16x16) Mask16x16

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature: AVX2

func (Uint16x16) GreaterEqual

func (x Uint16x16) GreaterEqual(y Uint16x16) Mask16x16

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX2

func (Uint16x16) InterleaveHiGrouped

func (x Uint16x16) InterleaveHiGrouped(y Uint16x16) Uint16x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX2

func (Uint16x16) InterleaveLoGrouped

func (x Uint16x16) InterleaveLoGrouped(y Uint16x16) Uint16x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX2
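Both interleaves operate independently within each 128-bit group, as VPUNPCKLWD and VPUNPCKHWD do. The plain-Go scalar sketch below assumes that x supplies the even-indexed result elements and y the odd ones; the documentation does not state the order, so verify it before relying on it. interleaveGroupedModel is a hypothetical helper.

```go
package main

import "fmt"

// interleaveGroupedModel mirrors x.InterleaveLoGrouped(y) when hi is false and
// x.InterleaveHiGrouped(y) when hi is true, for a Uint16x16.
// ASSUMPTION: x supplies the even-indexed result elements and y the odd ones.
func interleaveGroupedModel(x, y [16]uint16, hi bool) [16]uint16 {
	var r [16]uint16
	for g := 0; g < 2; g++ { // two 128-bit groups of 8 elements each
		src := g * 8
		if hi {
			src += 4 // take the high half of the group
		}
		for k := 0; k < 4; k++ {
			r[g*8+2*k] = x[src+k]
			r[g*8+2*k+1] = y[src+k]
		}
	}
	return r
}

func main() {
	var x, y [16]uint16
	for i := range x {
		x[i] = uint16(i)
		y[i] = uint16(100 + i)
	}
	lo := interleaveGroupedModel(x, y, false)
	hi := interleaveGroupedModel(x, y, true)
	fmt.Println(lo[:4], hi[8:10]) // [0 100 1 101] [12 112]
}
```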

func (Uint16x16) IsZero

func (x Uint16x16) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint16x16) Len

func (x Uint16x16) Len() int

Len returns the number of elements in a Uint16x16

func (Uint16x16) Less

func (x Uint16x16) Less(y Uint16x16) Mask16x16

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Uint16x16) LessEqual

func (x Uint16x16) LessEqual(y Uint16x16) Mask16x16

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Uint16x16) Masked

func (x Uint16x16) Masked(mask Mask16x16) Uint16x16

Masked returns x but with elements zeroed where mask is false.

func (Uint16x16) Max

func (x Uint16x16) Max(y Uint16x16) Uint16x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX2

func (Uint16x16) Merge

func (x Uint16x16) Merge(y Uint16x16, mask Mask16x16) Uint16x16

Merge returns x but with elements set to y where mask is false.

func (Uint16x16) Min

func (x Uint16x16) Min(y Uint16x16) Uint16x16

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX2

func (Uint16x16) Mul

func (x Uint16x16) Mul(y Uint16x16) Uint16x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX2

func (Uint16x16) MulHigh

func (x Uint16x16) MulHigh(y Uint16x16) Uint16x16

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHUW, CPU Feature: AVX2
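Together with Mul, which keeps the low 16 bits of each product, MulHigh recovers the full unsigned 32-bit product. A per-element plain-Go sketch with the hypothetical helper mulHighModel:

```go
package main

import "fmt"

// mulHighModel mirrors one element of x.MulHigh(y) for a Uint16x16: the full
// 32-bit product is formed and its upper 16 bits are returned.
func mulHighModel(a, b uint16) uint16 {
	return uint16((uint32(a) * uint32(b)) >> 16)
}

func main() {
	fmt.Println(mulHighModel(0x8000, 4), mulHighModel(65535, 65535)) // 2 65534
}
```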

func (Uint16x16) Not

func (x Uint16x16) Not() Uint16x16

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Uint16x16) NotEqual

func (x Uint16x16) NotEqual(y Uint16x16) Mask16x16

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Uint16x16) OnesCount

func (x Uint16x16) OnesCount() Uint16x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x16) Or

func (x Uint16x16) Or(y Uint16x16) Uint16x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint16x16) Permute

func (x Uint16x16) Permute(indices Uint16x16) Uint16x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x16) PermuteScalarsHiGrouped

func (x Uint16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[0], x[1], x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
    x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}

Each group is 128 bits wide.

Parameters a, b, c, and d should have values between 0 and 3. If they are constants, the instruction is emitted inline; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX2

func (Uint16x16) PermuteScalarsLoGrouped

func (x Uint16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[a],   x[b],   x[c],   x[d],   x[4],  x[5],  x[6],  x[7],
    x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}

Each group is 128 bits wide.

Parameters a, b, c, and d should have values between 0 and 3. If they are constants, the instruction is emitted inline; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX2
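The result formulas for PermuteScalarsLoGrouped and PermuteScalarsHiGrouped can be sketched as a single plain-Go scalar model: each variant permutes one half of every 128-bit group and copies the other half through unchanged. permuteScalarsModel is a hypothetical helper, not part of archsimd.

```go
package main

import "fmt"

// permuteScalarsModel mirrors x.PermuteScalarsLoGrouped(a, b, c, d) when hi is
// false and x.PermuteScalarsHiGrouped(a, b, c, d) when hi is true, for a Uint16x16.
// a, b, c and d are expected to be in 0-3; larger values would index outside
// the selected half of the group.
func permuteScalarsModel(x [16]uint16, a, b, c, d uint8, hi bool) [16]uint16 {
	r := x // the other half of each group passes through unchanged
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 2; g++ { // two 128-bit groups of 8 elements each
		base := g * 8
		if hi {
			base += 4 // the high four elements of the group
		}
		for k, s := range sel {
			r[base+k] = x[base+int(s)]
		}
	}
	return r
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(i)
	}
	lo := permuteScalarsModel(x, 3, 2, 1, 0, false)
	hi := permuteScalarsModel(x, 3, 2, 1, 0, true)
	fmt.Println(lo[0], lo[4], lo[8])  // 3 4 11
	fmt.Println(hi[0], hi[4], hi[12]) // 0 7 15
}
```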

func (Uint16x16) Select128FromPair

func (x Uint16x16) Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
 {60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})

returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.

lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
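The example above can be reproduced with a scalar model in plain Go. The function select128FromPair is hypothetical and does not use archsimd; an lo or hi above 3 makes this model panic with an index error, which is only loosely analogous to the runtime panic mentioned above.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of
// Uint16x16.Select128FromPair. x and y are viewed as four 128-bit blocks of
// 8 elements: 0 = low half of x, 1 = high half of x, 2 = low half of y,
// 3 = high half of y. The result is block lo followed by block hi.
func select128FromPair(x, y [16]uint16, lo, hi uint8) [16]uint16 {
	blocks := [4][]uint16{x[:8], x[8:], y[:8], y[8:]}
	var r [16]uint16
	copy(r[:8], blocks[lo]) // lower half of the result
	copy(r[8:], blocks[hi]) // upper half of the result
	return r
}

func main() {
	var x, y [16]uint16
	for i := 0; i < 8; i++ {
		x[i], x[8+i] = uint16(40+i), uint16(50+i)
		y[i], y[8+i] = uint16(60+i), uint16(70+i)
	}
	fmt.Println(select128FromPair(x, y, 3, 0))
	// prints [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```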

func (Uint16x16) SetHi

func (x Uint16x16) SetHi(y Uint16x8) Uint16x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint16x16) SetLo

func (x Uint16x16) SetLo(y Uint16x8) Uint16x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint16x16) ShiftAllLeft

func (x Uint16x16) ShiftAllLeft(y uint64) Uint16x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX2

func (Uint16x16) ShiftAllLeftConcat

func (x Uint16x16) ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2
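A per-element scalar model helps show what "copies the upper bits of y" means. This plain-Go sketch (the name shlConcat16 is hypothetical) treats x and y as one 32-bit value and masks the count to 4 bits, following the Intel definition of VPSHLDW for 16-bit elements.

```go
package main

import "fmt"

// shlConcat16 is a hypothetical scalar model of one element of
// ShiftAllLeftConcat: x and y are joined into the 32-bit value x:y, shifted
// left by shift&15, and the upper 16 bits are kept. The result equals
// (x << s) | (y >> (16 - s)) for s in 1..15, and x when s is 0.
func shlConcat16(x, y uint16, shift uint8) uint16 {
	s := uint(shift & 15)
	joined := uint32(x)<<16 | uint32(y)
	return uint16((joined << s) >> 16)
}

func main() {
	fmt.Printf("%#x\n", shlConcat16(0x1234, 0xABCD, 4)) // prints 0x234a
}
```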

func (Uint16x16) ShiftAllRight

func (x Uint16x16) ShiftAllRight(y uint64) Uint16x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX2

func (Uint16x16) ShiftAllRightConcat

func (x Uint16x16) ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x16) ShiftLeft

func (x Uint16x16) ShiftLeft(y Uint16x16) Uint16x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512
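Unlike the Concat shifts, the variable shifts do not mask their counts. Based on the Intel definition of VPSLLVW (not stated on this page), a count above 15 produces 0; Go's own << on a uint16 behaves the same way, so a scalar model needs no special case. The name shiftLeft16 is hypothetical.

```go
package main

import "fmt"

// shiftLeft16 is a hypothetical scalar model of one element of
// Uint16x16.ShiftLeft. Go's << on uint16 yields 0 for counts of 16 or more,
// matching VPSLLVW, so no masking or branch is needed.
func shiftLeft16(x, count uint16) uint16 {
	return x << count
}

func main() {
	fmt.Println(shiftLeft16(3, 2), shiftLeft16(3, 16)) // prints 12 0
}
```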

func (Uint16x16) ShiftLeftConcat

func (x Uint16x16) ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 4 bits are used), and then copies the upper bits of the corresponding element of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x16) ShiftRight

func (x Uint16x16) ShiftRight(y Uint16x16) Uint16x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x16) ShiftRightConcat

func (x Uint16x16) ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 4 bits are used), and then copies the lower bits of the corresponding element of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
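The right-shifting counterpart can be modeled the same way, with z in the upper half of the joined value and a separate count per element. The names shrConcat16 and shiftRightConcat are hypothetical plain-Go helpers, and the 4-bit count mask follows the Intel definition of VPSHRDVW.

```go
package main

import "fmt"

// shrConcat16 is a hypothetical scalar model of one element of
// ShiftRightConcat(y, z): the 32-bit value z:x (z in the upper half) is
// shifted right by y&15 and the low 16 bits are kept, so the low bits of z
// flow into the emptied upper bits of x.
func shrConcat16(x, y, z uint16) uint16 {
	s := uint(y & 15)
	joined := uint32(z)<<16 | uint32(x)
	return uint16(joined >> s)
}

// shiftRightConcat applies the model element-wise; each element has its own count.
func shiftRightConcat(x, y, z [16]uint16) [16]uint16 {
	var r [16]uint16
	for i := range r {
		r[i] = shrConcat16(x[i], y[i], z[i])
	}
	return r
}

func main() {
	fmt.Printf("%#x\n", shrConcat16(0x1234, 4, 0xABCD)) // prints 0xd123
}
```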

func (Uint16x16) Store

func (x Uint16x16) Store(y *[16]uint16)

Store stores a Uint16x16 to an array.

func (Uint16x16) StoreSlice

func (x Uint16x16) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 16 uint16s.

func (Uint16x16) StoreSlicePart

func (x Uint16x16) StoreSlicePart(s []uint16)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Uint16x16) String

func (x Uint16x16) String() string

String returns a string representation of SIMD vector x.

func (Uint16x16) Sub

func (x Uint16x16) Sub(y Uint16x16) Uint16x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX2

func (Uint16x16) SubPairs

func (x Uint16x16) SubPairs(y Uint16x16) Uint16x16

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX2

func (Uint16x16) SubSaturated

func (x Uint16x16) SubSaturated(y Uint16x16) Uint16x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX2

func (Uint16x16) TruncateToUint8

func (x Uint16x16) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8 by truncation: each element keeps its low 8 bits. The 16 results fill the returned Uint8x16.

Asm: VPMOVWB, CPU Feature: AVX512

func (Uint16x16) Xor

func (x Uint16x16) Xor(y Uint16x16) Uint16x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint16x32

type Uint16x32 struct {
	// contains filtered or unexported fields
}

Uint16x32 is a 512-bit SIMD vector of 32 uint16 values.

func BroadcastUint16x32

func BroadcastUint16x32(x uint16) Uint16x32

BroadcastUint16x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512BW

func LoadMaskedUint16x32

func LoadMaskedUint16x32(y *[32]uint16, mask Mask16x32) Uint16x32

LoadMaskedUint16x32 loads a Uint16x32 from an array, reading only the elements enabled by mask; the other elements of the result are zeroed.

Asm: VMOVDQU16.Z, CPU Feature: AVX512

func LoadUint16x32

func LoadUint16x32(y *[32]uint16) Uint16x32

LoadUint16x32 loads a Uint16x32 from an array.

func LoadUint16x32Slice

func LoadUint16x32Slice(s []uint16) Uint16x32

LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s.

func LoadUint16x32SlicePart

func LoadUint16x32SlicePart(s []uint16) Uint16x32

LoadUint16x32SlicePart loads a Uint16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.
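The zero-filling rule makes SlicePart loads convenient for the tail of a slice. Below is a scalar model in plain Go with a hypothetical name, loadUint16x32SlicePart, used in a chunked loop; because the missing elements are zero, summing the tail needs no special case.

```go
package main

import "fmt"

// loadUint16x32SlicePart is a hypothetical scalar model of
// LoadUint16x32SlicePart: copy moves min(len(s), 32) elements, and the
// remaining elements of the zero-valued array stay zero.
func loadUint16x32SlicePart(s []uint16) [32]uint16 {
	var v [32]uint16
	copy(v[:], s)
	return v
}

func main() {
	data := make([]uint16, 70) // 2 full 32-element chunks + a 6-element tail
	for i := range data {
		data[i] = 1
	}
	var total uint64
	// The same loader handles full chunks and the short tail.
	for start := 0; start < len(data); start += 32 {
		v := loadUint16x32SlicePart(data[start:])
		for _, e := range v {
			total += uint64(e)
		}
	}
	fmt.Println(total) // prints 70
}
```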

func (Uint16x32) Add

func (x Uint16x32) Add(y Uint16x32) Uint16x32

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX512

func (Uint16x32) AddSaturated

func (x Uint16x32) AddSaturated(y Uint16x32) Uint16x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX512

func (Uint16x32) And

func (x Uint16x32) And(y Uint16x32) Uint16x32

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint16x32) AndNot

func (x Uint16x32) AndNot(y Uint16x32) Uint16x32

AndNot computes the bitwise AND NOT of x and y, that is, x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint16x32) AsFloat32x16

func (from Uint16x32) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint16x32 to Float32x16 by reinterpreting the underlying bits.

func (Uint16x32) AsFloat64x8

func (from Uint16x32) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint16x32 to Float64x8 by reinterpreting the underlying bits.

func (Uint16x32) AsInt16x32

func (from Uint16x32) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint16x32 to Int16x32 by reinterpreting the underlying bits.

func (Uint16x32) AsInt32x16

func (from Uint16x32) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint16x32 to Int32x16 by reinterpreting the underlying bits.

func (Uint16x32) AsInt64x8

func (from Uint16x32) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint16x32 to Int64x8 by reinterpreting the underlying bits.

func (Uint16x32) AsInt8x64

func (from Uint16x32) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint16x32 to Int8x64 by reinterpreting the underlying bits.

func (Uint16x32) AsUint32x16

func (from Uint16x32) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Uint16x32 to Uint32x16 by reinterpreting the underlying bits.

func (Uint16x32) AsUint64x8

func (from Uint16x32) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Uint16x32 to Uint64x8 by reinterpreting the underlying bits.

func (Uint16x32) AsUint8x64

func (from Uint16x32) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint16x32 to Uint8x64 by reinterpreting the underlying bits.

func (Uint16x32) Average

func (x Uint16x32) Average(y Uint16x32) Uint16x32

Average computes the rounded-up average of corresponding elements, (x + y + 1) >> 1, without intermediate overflow.

Asm: VPAVGW, CPU Feature: AVX512
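A scalar model shows why the average cannot overflow: the sum is formed in a wider type before shifting. The name average16 is hypothetical and the code is plain Go, not archsimd.

```go
package main

import "fmt"

// average16 is a hypothetical scalar model of one element of Average
// (VPAVGW): (x + y + 1) >> 1, computed in 32 bits so the sum cannot overflow.
func average16(x, y uint16) uint16 {
	return uint16((uint32(x) + uint32(y) + 1) >> 1)
}

func main() {
	fmt.Println(average16(1, 2), average16(65535, 65535)) // prints 2 65535
}
```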

func (Uint16x32) Compress

func (x Uint16x32) Compress(mask Mask16x32) Uint16x32

Compress selects the elements of x enabled by mask and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
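The packing order of Compress can be illustrated with a scalar model over slices. The name compress is hypothetical, and filling the unused upper positions with zero is an assumption of this model; the description above does not specify what they contain.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress: elements of x whose
// mask entry is true are packed, in order, into the low positions of the
// result. Zeroing the remaining positions is an assumption of this model.
func compress(x []uint16, mask []bool) []uint16 {
	r := make([]uint16, len(x))
	n := 0 // next free low position
	for i, v := range x {
		if mask[i] {
			r[n] = v
			n++
		}
	}
	return r
}

func main() {
	x := []uint16{10, 20, 30, 40}
	mask := []bool{false, true, false, true}
	fmt.Println(compress(x, mask)) // prints [20 40 0 0]
}
```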

func (Uint16x32) ConcatPermute

func (x Uint16x32) ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[31]]}, where xy is the 64-element concatenation of x (lower half) and y (upper half). Only the low 6 bits of each element of indices, enough to index xy, are used.

Asm: VPERMI2W, CPU Feature: AVX512
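A scalar model over slices makes the concatenated indexing and the index masking explicit. The function concatPermute is a hypothetical plain-Go helper that assumes the vector length n is a power of two, so masking with 2n-1 keeps exactly the bits needed to address xy (6 bits when n is 32).

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute for vectors
// of n elements, n a power of two: xy is x followed by y, and each index is
// masked to the bits needed to address 2n elements.
func concatPermute(x, y, indices []uint16) []uint16 {
	n := len(x)
	xy := append(append([]uint16{}, x...), y...)
	r := make([]uint16, n)
	for i, idx := range indices {
		r[i] = xy[int(idx)&(2*n-1)]
	}
	return r
}

func main() {
	x := []uint16{0, 1, 2, 3}
	y := []uint16{10, 11, 12, 13}
	// With 2n = 8, index 9 is masked to 9&7 = 1.
	fmt.Println(concatPermute(x, y, []uint16{7, 4, 0, 9})) // prints [13 10 0 1]
}
```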

func (Uint16x32) Equal

func (x Uint16x32) Equal(y Uint16x32) Mask16x32

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQW, CPU Feature: AVX512

func (Uint16x32) Expand

func (x Uint16x32) Expand(mask Mask16x32) Uint16x32

Expand is the inverse of Compress: it takes the elements of x in order, starting from element 0, and places them at the positions where mask is true, from the lowest such position upward.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
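The distribution order of Expand can be shown with a scalar model over slices. The name expand is hypothetical, and zeroing the positions where mask is false is an assumption of this model; the description above does not specify their contents.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand: the elements of x are
// taken in order and written to the positions where mask is true. Zeroing
// the other positions is an assumption of this model.
func expand(x []uint16, mask []bool) []uint16 {
	r := make([]uint16, len(x))
	n := 0 // next element of x to place
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	fmt.Println(expand([]uint16{20, 40, 0, 0}, []bool{false, true, false, true}))
	// prints [0 20 0 40]
}
```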

func (Uint16x32) GetHi

func (x Uint16x32) GetHi() Uint16x16

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint16x32) GetLo

func (x Uint16x32) GetLo() Uint16x16

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint16x32) Greater

func (x Uint16x32) Greater(y Uint16x32) Mask16x32

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) GreaterEqual

func (x Uint16x32) GreaterEqual(y Uint16x32) Mask16x32

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) InterleaveHiGrouped

func (x Uint16x32) InterleaveHiGrouped(y Uint16x32) Uint16x32

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX512
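The per-group behavior of InterleaveHiGrouped is easiest to see in a scalar model. The name interleaveHiGrouped is hypothetical, and the assumption that x supplies the even positions of each pair follows the usual VPUNPCKHWD operand order; it is not stated on this page.

```go
package main

import "fmt"

// interleaveHiGrouped is a hypothetical scalar model of InterleaveHiGrouped
// for vectors whose length is a multiple of 8 (one 128-bit group of uint16).
// Within each group, the high four elements of x and y alternate, starting
// with x (an assumption about operand order).
func interleaveHiGrouped(x, y []uint16) []uint16 {
	r := make([]uint16, len(x))
	for g := 0; g < len(x); g += 8 {
		for j := 0; j < 4; j++ {
			r[g+2*j] = x[g+4+j]
			r[g+2*j+1] = y[g+4+j]
		}
	}
	return r
}

func main() {
	x := make([]uint16, 16)
	y := make([]uint16, 16)
	for i := range x {
		x[i], y[i] = uint16(i), uint16(100+i)
	}
	fmt.Println(interleaveHiGrouped(x, y))
	// prints [4 104 5 105 6 106 7 107 12 112 13 113 14 114 15 115]
}
```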

func (Uint16x32) InterleaveLoGrouped

func (x Uint16x32) InterleaveLoGrouped(y Uint16x32) Uint16x32

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX512

func (Uint16x32) Len

func (x Uint16x32) Len() int

Len returns the number of elements in a Uint16x32.

func (Uint16x32) Less

func (x Uint16x32) Less(y Uint16x32) Mask16x32

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) LessEqual

func (x Uint16x32) LessEqual(y Uint16x32) Mask16x32

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) Masked

func (x Uint16x32) Masked(mask Mask16x32) Uint16x32

Masked returns x but with elements zeroed where mask is false.

func (Uint16x32) Max

func (x Uint16x32) Max(y Uint16x32) Uint16x32

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX512

func (Uint16x32) Merge

func (x Uint16x32) Merge(y Uint16x32, mask Mask16x32) Uint16x32

Merge returns x but with elements set to y where mask is false.

func (Uint16x32) Min

func (x Uint16x32) Min(y Uint16x32) Uint16x32

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX512

func (Uint16x32) Mul

func (x Uint16x32) Mul(y Uint16x32) Uint16x32

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX512

func (Uint16x32) MulHigh

func (x Uint16x32) MulHigh(y Uint16x32) Uint16x32

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHUW, CPU Feature: AVX512
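A scalar model shows how MulHigh complements Mul: Mul keeps the low 16 bits of each product and MulHigh the high 16 bits, so together they recover the full 32-bit product. The name mulHigh16 is hypothetical plain Go, not the archsimd API.

```go
package main

import "fmt"

// mulHigh16 is a hypothetical scalar model of one element of MulHigh
// (VPMULHUW): the unsigned 32-bit product shifted right by 16.
func mulHigh16(x, y uint16) uint16 {
	return uint16((uint32(x) * uint32(y)) >> 16)
}

func main() {
	x, y := uint16(300), uint16(300)
	fmt.Println(mulHigh16(x, y), mulHigh16(0xFFFF, 0xFFFF)) // prints 1 65534

	// x*y on uint16 wraps, giving the low half that Mul would return.
	full := uint32(mulHigh16(x, y))<<16 | uint32(x*y)
	fmt.Println(full) // prints 90000
}
```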

func (Uint16x32) Not

func (x Uint16x32) Not() Uint16x32

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Uint16x32) NotEqual

func (x Uint16x32) NotEqual(y Uint16x32) Mask16x32

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) OnesCount

func (x Uint16x32) OnesCount() Uint16x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x32) Or

func (x Uint16x32) Or(y Uint16x32) Uint16x32

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint16x32) Permute

func (x Uint16x32) Permute(indices Uint16x32) Uint16x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[31]]}. Only the low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x32) PermuteScalarsHiGrouped

func (x Uint16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[0],  x[1],  x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
    x[8],  x[9],  x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
    x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
    x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}

Each group is 128 bits (8 elements) wide.

Parameters a, b, c, and d should be between 0 and 3, inclusive. If all four are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Uint16x32) PermuteScalarsLoGrouped

func (x Uint16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[a],    x[b],    x[c],    x[d],    x[4],  x[5],  x[6],  x[7],
    x[a+8],  x[b+8],  x[c+8],  x[d+8],  x[12], x[13], x[14], x[15],
    x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
    x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}

Each group is 128 bits (8 elements) wide.

Parameters a, b, c, and d should be between 0 and 3, inclusive. If all four are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Uint16x32) SaturateToUint8

func (x Uint16x32) SaturateToUint8() Uint8x32

SaturateToUint8 converts element values to uint8 with unsigned saturation: values greater than 255 become 255.

Asm: VPMOVUSWB, CPU Feature: AVX512
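The difference between saturation and truncation (see TruncateToUint8 below) shows up only for values above 255. These two plain-Go functions are hypothetical per-element scalar models, not the archsimd API.

```go
package main

import "fmt"

// saturateToUint8 models one element of SaturateToUint8: clamp to 255.
func saturateToUint8(v uint16) uint8 {
	if v > 255 {
		return 255
	}
	return uint8(v)
}

// truncateToUint8 models one element of TruncateToUint8: keep the low 8 bits.
func truncateToUint8(v uint16) uint8 {
	return uint8(v)
}

func main() {
	fmt.Println(saturateToUint8(300), truncateToUint8(300)) // prints 255 44
}
```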

func (Uint16x32) SetHi

func (x Uint16x32) SetHi(y Uint16x16) Uint16x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint16x32) SetLo

func (x Uint16x32) SetLo(y Uint16x16) Uint16x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint16x32) ShiftAllLeft

func (x Uint16x32) ShiftAllLeft(y uint64) Uint16x32

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX512

func (Uint16x32) ShiftAllLeftConcat

func (x Uint16x32) ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftAllRight

func (x Uint16x32) ShiftAllRight(y uint64) Uint16x32

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX512

func (Uint16x32) ShiftAllRightConcat

func (x Uint16x32) ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftLeft

func (x Uint16x32) ShiftLeft(y Uint16x32) Uint16x32

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Uint16x32) ShiftLeftConcat

func (x Uint16x32) ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 4 bits are used), and then copies the upper bits of the corresponding element of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftRight

func (x Uint16x32) ShiftRight(y Uint16x32) Uint16x32

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x32) ShiftRightConcat

func (x Uint16x32) ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 4 bits are used), and then copies the lower bits of the corresponding element of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Uint16x32) Store

func (x Uint16x32) Store(y *[32]uint16)

Store stores a Uint16x32 to an array.

func (Uint16x32) StoreMasked

func (x Uint16x32) StoreMasked(y *[32]uint16, mask Mask16x32)

StoreMasked stores a Uint16x32 to an array, writing only the elements enabled by mask; the other array elements are left unchanged.

Asm: VMOVDQU16, CPU Feature: AVX512

func (Uint16x32) StoreSlice

func (x Uint16x32) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 32 uint16s.

func (Uint16x32) StoreSlicePart

func (x Uint16x32) StoreSlicePart(s []uint16)

StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Uint16x32) String

func (x Uint16x32) String() string

String returns a string representation of SIMD vector x.

func (Uint16x32) Sub

func (x Uint16x32) Sub(y Uint16x32) Uint16x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX512

func (Uint16x32) SubSaturated

func (x Uint16x32) SubSaturated(y Uint16x32) Uint16x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX512

func (Uint16x32) TruncateToUint8

func (x Uint16x32) TruncateToUint8() Uint8x32

TruncateToUint8 converts element values to uint8 by truncation: each element keeps its low 8 bits.

Asm: VPMOVWB, CPU Feature: AVX512

func (Uint16x32) Xor

func (x Uint16x32) Xor(y Uint16x32) Uint16x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Uint16x8

type Uint16x8 struct {
	// contains filtered or unexported fields
}

Uint16x8 is a 128-bit SIMD vector of 8 uint16 values.

func BroadcastUint16x8

func BroadcastUint16x8(x uint16) Uint16x8

BroadcastUint16x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadUint16x8

func LoadUint16x8(y *[8]uint16) Uint16x8

LoadUint16x8 loads a Uint16x8 from an array.

func LoadUint16x8Slice

func LoadUint16x8Slice(s []uint16) Uint16x8

LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s.

func LoadUint16x8SlicePart

func LoadUint16x8SlicePart(s []uint16) Uint16x8

LoadUint16x8SlicePart loads a Uint16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.

func (Uint16x8) Add

func (x Uint16x8) Add(y Uint16x8) Uint16x8

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX

func (Uint16x8) AddPairs

func (x Uint16x8) AddPairs(y Uint16x8) Uint16x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX

func (Uint16x8) AddSaturated

func (x Uint16x8) AddSaturated(y Uint16x8) Uint16x8

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX

func (Uint16x8) And

func (x Uint16x8) And(y Uint16x8) Uint16x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint16x8) AndNot

func (x Uint16x8) AndNot(y Uint16x8) Uint16x8

AndNot computes the bitwise AND NOT of x and y, that is, x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint16x8) AsFloat32x4

func (from Uint16x8) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint16x8 to Float32x4 by reinterpreting the underlying bits.

func (Uint16x8) AsFloat64x2

func (from Uint16x8) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint16x8 to Float64x2 by reinterpreting the underlying bits.

func (Uint16x8) AsInt16x8

func (from Uint16x8) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint16x8 to Int16x8 by reinterpreting the underlying bits.

func (Uint16x8) AsInt32x4

func (from Uint16x8) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint16x8 to Int32x4 by reinterpreting the underlying bits.

func (Uint16x8) AsInt64x2

func (from Uint16x8) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint16x8 to Int64x2 by reinterpreting the underlying bits.

func (Uint16x8) AsInt8x16

func (from Uint16x8) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint16x8 to Int8x16 by reinterpreting the underlying bits.

func (Uint16x8) AsUint32x4

func (from Uint16x8) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint16x8 to Uint32x4 by reinterpreting the underlying bits.

func (Uint16x8) AsUint64x2

func (from Uint16x8) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint16x8 to Uint64x2 by reinterpreting the underlying bits.

func (Uint16x8) AsUint8x16

func (from Uint16x8) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint16x8 to Uint8x16 by reinterpreting the underlying bits.

func (Uint16x8) Average

func (x Uint16x8) Average(y Uint16x8) Uint16x8

Average computes the rounded-up average of corresponding elements, (x + y + 1) >> 1, without intermediate overflow.

Asm: VPAVGW, CPU Feature: AVX

func (Uint16x8) Broadcast128

func (x Uint16x8) Broadcast128() Uint16x8

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Uint16x8) Broadcast256

func (x Uint16x8) Broadcast256() Uint16x16

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Uint16x8) Broadcast512

func (x Uint16x8) Broadcast512() Uint16x32

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX512

func (Uint16x8) Compress

func (x Uint16x8) Compress(mask Mask16x8) Uint16x8

Compress selects the elements of x enabled by mask and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Uint16x8) ConcatPermute

func (x Uint16x8) ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the 16-element concatenation of x (lower half) and y (upper half). Only the low 4 bits of each element of indices, enough to index xy, are used.

Asm: VPERMI2W, CPU Feature: AVX512

func (Uint16x8) Equal

func (x Uint16x8) Equal(y Uint16x8) Mask16x8

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQW, CPU Feature: AVX

func (Uint16x8) Expand

func (x Uint16x8) Expand(mask Mask16x8) Uint16x8

Expand is the inverse of Compress: it takes the elements of x in order, starting from element 0, and places them at the positions where mask is true, from the lowest such position upward.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Uint16x8) ExtendLo2ToUint64x2

func (x Uint16x8) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts the 2 lowest elements of x to uint64 by zero-extension.

Asm: VPMOVZXWQ, CPU Feature: AVX

func (Uint16x8) ExtendLo4ToUint32x4

func (x Uint16x8) ExtendLo4ToUint32x4() Uint32x4

ExtendLo4ToUint32x4 converts the 4 lowest elements of x to uint32 by zero-extension.

Asm: VPMOVZXWD, CPU Feature: AVX

func (Uint16x8) ExtendLo4ToUint64x4

func (x Uint16x8) ExtendLo4ToUint64x4() Uint64x4

ExtendLo4ToUint64x4 converts the 4 lowest elements of x to uint64 by zero-extension.

Asm: VPMOVZXWQ, CPU Feature: AVX2

func (Uint16x8) ExtendToUint32

func (x Uint16x8) ExtendToUint32() Uint32x8

ExtendToUint32 converts each element of x to uint32 by zero-extension.

Asm: VPMOVZXWD, CPU Feature: AVX2

func (Uint16x8) ExtendToUint64

func (x Uint16x8) ExtendToUint64() Uint64x8

ExtendToUint64 converts each element of x to uint64 by zero-extension.

Asm: VPMOVZXWQ, CPU Feature: AVX512

func (Uint16x8) GetElem

func (x Uint16x8) GetElem(index uint8) uint16

GetElem returns the value of the element at position index.

index gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRW, CPU Feature: AVX512

func (Uint16x8) Greater

func (x Uint16x8) Greater(y Uint16x8) Mask16x8

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature: AVX

func (Uint16x8) GreaterEqual

func (x Uint16x8) GreaterEqual(y Uint16x8) Mask16x8

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX

func (Uint16x8) InterleaveHi

func (x Uint16x8) InterleaveHi(y Uint16x8) Uint16x8

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX

func (Uint16x8) InterleaveLo

func (x Uint16x8) InterleaveLo(y Uint16x8) Uint16x8

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX

func (Uint16x8) IsZero

func (x Uint16x8) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX
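The And(y).IsZero() pattern answers "do x and y share any set bit in any element?". Below is a scalar model of that question in plain Go; the names isZero and andIsZero are hypothetical and only mirror the semantics, not the single-instruction code the compiler emits.

```go
package main

import "fmt"

// isZero is a hypothetical scalar model of Uint16x8.IsZero.
func isZero(x [8]uint16) bool {
	for _, e := range x {
		if e != 0 {
			return false
		}
	}
	return true
}

// andIsZero models x.And(y).IsZero(): true when no element of x has a set
// bit in common with the corresponding element of y.
func andIsZero(x, y [8]uint16) bool {
	var t [8]uint16
	for i := range x {
		t[i] = x[i] & y[i]
	}
	return isZero(t)
}

func main() {
	a := [8]uint16{0x00F0, 1}
	b := [8]uint16{0x0F00, 2}
	fmt.Println(andIsZero(a, b), andIsZero(a, a)) // prints true false
}
```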

func (Uint16x8) Len

func (x Uint16x8) Len() int

Len returns the number of elements in a Uint16x8.

func (Uint16x8) Less

func (x Uint16x8) Less(y Uint16x8) Mask16x8

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX

func (Uint16x8) LessEqual

func (x Uint16x8) LessEqual(y Uint16x8) Mask16x8

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX

func (Uint16x8) Masked

func (x Uint16x8) Masked(mask Mask16x8) Uint16x8

Masked returns x but with elements zeroed where mask is false.

func (Uint16x8) Max

func (x Uint16x8) Max(y Uint16x8) Uint16x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX

func (Uint16x8) Merge

func (x Uint16x8) Merge(y Uint16x8, mask Mask16x8) Uint16x8

Merge returns x but with elements set to y where mask is false.
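Masked and Merge differ only in what fills the positions where mask is false: zero for Masked, the corresponding element of y for Merge. These plain-Go functions are hypothetical scalar models written directly from the two descriptions above.

```go
package main

import "fmt"

// masked models Uint16x8.Masked: elements where mask is false become zero.
func masked(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	for i := range r {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge models Uint16x8.Merge: elements where mask is false are taken from y.
func merge(x, y [8]uint16, mask [8]bool) [8]uint16 {
	r := x
	for i := range r {
		if !mask[i] {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	var x, y [8]uint16
	var mask [8]bool
	for i := range x {
		x[i], y[i] = uint16(i+1), uint16(10*(i+1))
		mask[i] = i%2 == 0
	}
	fmt.Println(masked(x, mask))   // prints [1 0 3 0 5 0 7 0]
	fmt.Println(merge(x, y, mask)) // prints [1 20 3 40 5 60 7 80]
}
```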

func (Uint16x8) Min

func (x Uint16x8) Min(y Uint16x8) Uint16x8

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX

func (Uint16x8) Mul

func (x Uint16x8) Mul(y Uint16x8) Uint16x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX

func (Uint16x8) MulHigh

func (x Uint16x8) MulHigh(y Uint16x8) Uint16x8

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHUW, CPU Feature: AVX

func (Uint16x8) Not

func (x Uint16x8) Not() Uint16x8

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX

func (Uint16x8) NotEqual

func (x Uint16x8) NotEqual(y Uint16x8) Mask16x8

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX

func (Uint16x8) OnesCount

func (x Uint16x8) OnesCount() Uint16x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x8) Or

func (x Uint16x8) Or(y Uint16x8) Uint16x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint16x8) Permute

func (x Uint16x8) Permute(indices Uint16x8) Uint16x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x8) PermuteScalarsHi

func (x Uint16x8) PermuteScalarsHi(a, b, c, d uint8) Uint16x8

PermuteScalarsHi performs a permutation of vector x using the supplied indices:

result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, and d should be between 0 and 3, inclusive. If all four are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Uint16x8) PermuteScalarsLo

func (x Uint16x8) PermuteScalarsLo(a, b, c, d uint8) Uint16x8

PermuteScalarsLo performs a permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}

Parameters a, b, c, and d should be between 0 and 3, inclusive. If all four are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Uint16x8) SetElem

func (x Uint16x8) SetElem(index uint8, y uint16) Uint16x8

SetElem returns x with the element at position index set to y.

index gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRW, CPU Feature: AVX

func (Uint16x8) ShiftAllLeft

func (x Uint16x8) ShiftAllLeft(y uint64) Uint16x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX

func (Uint16x8) ShiftAllLeftConcat

func (x Uint16x8) ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftAllRight

func (x Uint16x8) ShiftAllRight(y uint64) Uint16x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX

func (Uint16x8) ShiftAllRightConcat

func (x Uint16x8) ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate shift (only the lower 4 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftLeft

func (x Uint16x8) ShiftLeft(y Uint16x8) Uint16x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Uint16x8) ShiftLeftConcat

func (x Uint16x8) ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 4 bits are used), and then copies the upper bits of the corresponding element of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftRight

func (x Uint16x8) ShiftRight(y Uint16x8) Uint16x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x8) ShiftRightConcat

func (x Uint16x8) ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 4 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
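ShiftRightConcat can be modelled the same way, lane by lane. This is a sketch using a hypothetical scalar helper on [8]uint16 arrays, not the package's code; it assumes, following Intel's description of VPSHRDVW, that each per-lane count in y is taken modulo 16.

```go
package main

// shiftRightConcat16 is a hypothetical scalar model of
// Uint16x8.ShiftRightConcat. Following Intel's VPSHRDVW definition,
// each per-lane count in y is taken modulo 16.
func shiftRightConcat16(x, y, z [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := range x {
		s := uint(y[i] & 15)
		// Place z[i] above x[i], shift right, and keep the low 16 bits:
		// x's bits move down and the low s bits of z[i] fill the
		// emptied high bits.
		wide := uint32(z[i])<<16 | uint32(x[i])
		r[i] = uint16(wide >> s)
	}
	return r
}
```

With x[i] = 0x1234, y[i] = 4, and z[i] = 0xABCD, lane i becomes 0xD123: x moves down four bits and the low four bits of z (0xD) fill the top.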

func (Uint16x8) Store

func (x Uint16x8) Store(y *[8]uint16)

Store stores a Uint16x8 to an array.

func (Uint16x8) StoreSlice

func (x Uint16x8) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 8 uint16s.

func (Uint16x8) StoreSlicePart

func (x Uint16x8) StoreSlicePart(s []uint16)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Uint16x8) String

func (x Uint16x8) String() string

String returns a string representation of SIMD vector x.

func (Uint16x8) Sub

func (x Uint16x8) Sub(y Uint16x8) Uint16x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX

func (Uint16x8) SubPairs

func (x Uint16x8) SubPairs(y Uint16x8) Uint16x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX

func (Uint16x8) SubSaturated

func (x Uint16x8) SubSaturated(y Uint16x8) Uint16x8

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX
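Saturated unsigned subtraction clamps at zero instead of wrapping around. A minimal scalar sketch of the per-lane rule, using a hypothetical helper over [8]uint16 arrays in place of Uint16x8:

```go
package main

// subSaturatedU16 is a hypothetical scalar model of Uint16x8.SubSaturated:
// unsigned subtraction that clamps at 0 instead of wrapping around.
func subSaturatedU16(x, y [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := range x {
		if x[i] > y[i] {
			r[i] = x[i] - y[i]
		} // otherwise r[i] stays 0: the result saturates at the minimum
	}
	return r
}
```

By contrast, a plain wrapping subtraction would turn 0 - 1 into 65535 in the same lane.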

func (Uint16x8) TruncateToUint8

func (x Uint16x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVWB, CPU Feature: AVX512
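The truncation, packing, and zeroing described above can be modelled in plain Go. This is a sketch: the helper name is hypothetical and the arrays stand in for Uint16x8 and Uint8x16.

```go
package main

// truncateToUint8 is a hypothetical scalar model of
// Uint16x8.TruncateToUint8: each uint16 keeps only its low 8 bits, the
// eight results fill elements 0-7, and elements 8-15 are zero.
func truncateToUint8(x [8]uint16) [16]uint8 {
	var r [16]uint8 // elements 8..15 remain zero
	for i, v := range x {
		r[i] = uint8(v) // Go's conversion keeps the low 8 bits
	}
	return r
}
```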

func (Uint16x8) Xor

func (x Uint16x8) Xor(y Uint16x8) Uint16x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint32x16

type Uint32x16 struct {
	// contains filtered or unexported fields
}

Uint32x16 is a 512-bit SIMD vector of 16 uint32.

func BroadcastUint32x16

func BroadcastUint32x16(x uint32) Uint32x16

BroadcastUint32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadMaskedUint32x16

func LoadMaskedUint32x16(y *[16]uint32, mask Mask32x16) Uint32x16

LoadMaskedUint32x16 loads a Uint32x16 from an array, reading only the elements enabled by mask; the other elements of the result are zeroed.

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func LoadUint32x16

func LoadUint32x16(y *[16]uint32) Uint32x16

LoadUint32x16 loads a Uint32x16 from an array.

func LoadUint32x16Slice

func LoadUint32x16Slice(s []uint32) Uint32x16

LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s.

func LoadUint32x16SlicePart

func LoadUint32x16SlicePart(s []uint32) Uint32x16

LoadUint32x16SlicePart loads a Uint32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.

func (Uint32x16) Add

func (x Uint32x16) Add(y Uint32x16) Uint32x16

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX512

func (Uint32x16) And

func (x Uint32x16) And(y Uint32x16) Uint32x16

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint32x16) AndNot

func (x Uint32x16) AndNot(y Uint32x16) Uint32x16

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint32x16) AsFloat32x16

func (from Uint32x16) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint32x16 to Float32x16 by reinterpreting the bits of the vector.

func (Uint32x16) AsFloat64x8

func (from Uint32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint32x16 to Float64x8 by reinterpreting the bits of the vector.

func (Uint32x16) AsInt16x32

func (from Uint32x16) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint32x16 to Int16x32 by reinterpreting the bits of the vector.

func (Uint32x16) AsInt32x16

func (from Uint32x16) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint32x16 to Int32x16 by reinterpreting the bits of the vector.

func (Uint32x16) AsInt64x8

func (from Uint32x16) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint32x16 to Int64x8 by reinterpreting the bits of the vector.

func (Uint32x16) AsInt8x64

func (from Uint32x16) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint32x16 to Int8x64 by reinterpreting the bits of the vector.

func (Uint32x16) AsUint16x32

func (from Uint32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Uint32x16 to Uint16x32 by reinterpreting the bits of the vector.

func (Uint32x16) AsUint64x8

func (from Uint32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Uint32x16 to Uint64x8 by reinterpreting the bits of the vector.

func (Uint32x16) AsUint8x64

func (from Uint32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint32x16 to Uint8x64 by reinterpreting the bits of the vector.

func (Uint32x16) Compress

func (x Uint32x16) Compress(mask Mask32x16) Uint32x16

Compress selects the elements of x where mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512
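A scalar sketch of Compress on 16 lanes, with the mask modelled as a [16]bool (an assumption, since Mask32x16 is opaque). The documentation does not state what the lanes past the packed elements contain, so this hypothetical helper simply leaves them zero.

```go
package main

// compress32 is a hypothetical scalar model of Compress for 16 lanes.
// The mask is modelled as a [16]bool. Lanes past the last packed element
// are left zero here; the documentation does not specify their value.
func compress32(x [16]uint32, mask [16]bool) [16]uint32 {
	var r [16]uint32
	n := 0
	for i, v := range x {
		if mask[i] {
			r[n] = v // selected elements keep their relative order
			n++
		}
	}
	return r
}
```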

func (Uint32x16) ConcatPermute

func (x Uint32x16) ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy (5 bits for its 32 elements) are used from each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512
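The indexing rule for ConcatPermute on Uint32x16 can be sketched in plain Go: xy has 32 elements, so 5 index bits are needed. The helper name and array types are hypothetical stand-ins for the vector types.

```go
package main

// concatPermute32 is a hypothetical scalar model of
// Uint32x16.ConcatPermute. xy is the 32-element concatenation of x
// (indices 0-15) and y (indices 16-31); only the low 5 bits of each
// index are used.
func concatPermute32(x, y, indices [16]uint32) [16]uint32 {
	var xy [32]uint32
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var r [16]uint32
	for i, idx := range indices {
		r[i] = xy[idx&31]
	}
	return r
}
```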

func (Uint32x16) ConvertToFloat32

func (x Uint32x16) ConvertToFloat32() Float32x16

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x16) Equal

func (x Uint32x16) Equal(y Uint32x16) Mask32x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX512

func (Uint32x16) Expand

func (x Uint32x16) Expand(mask Mask32x16) Uint32x16

Expand takes the elements packed in the low part of x and distributes them, in order, to the positions where mask is true, from the lowest mask position to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512
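Expand places elements in the opposite direction to Compress. A scalar sketch on 16 lanes, with the mask modelled as a [16]bool (an assumption, since Mask32x16 is opaque) and with lanes where mask is false left zero, which the documentation does not specify. The helper name is hypothetical.

```go
package main

// expand32 is a hypothetical scalar model of Expand for 16 lanes. The
// low-order elements of x are placed, in order, at the positions where
// mask is true. Other lanes are left zero in this sketch.
func expand32(x [16]uint32, mask [16]bool) [16]uint32 {
	var r [16]uint32
	n := 0 // next source element to take from the low part of x
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}
```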

func (Uint32x16) GetHi

func (x Uint32x16) GetHi() Uint32x8

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint32x16) GetLo

func (x Uint32x16) GetLo() Uint32x8

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint32x16) Greater

func (x Uint32x16) Greater(y Uint32x16) Mask32x16

Greater returns x greater-than y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) GreaterEqual

func (x Uint32x16) GreaterEqual(y Uint32x16) Mask32x16

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) InterleaveHiGrouped

func (x Uint32x16) InterleaveHiGrouped(y Uint32x16) Uint32x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX512

func (Uint32x16) InterleaveLoGrouped

func (x Uint32x16) InterleaveLoGrouped(y Uint32x16) Uint32x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX512

func (Uint32x16) LeadingZeros

func (x Uint32x16) LeadingZeros() Uint32x16

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Uint32x16) Len

func (x Uint32x16) Len() int

Len returns the number of elements in a Uint32x16.

func (Uint32x16) Less

func (x Uint32x16) Less(y Uint32x16) Mask32x16

Less returns x less-than y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) LessEqual

func (x Uint32x16) LessEqual(y Uint32x16) Mask32x16

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) Masked

func (x Uint32x16) Masked(mask Mask32x16) Uint32x16

Masked returns x but with elements zeroed where mask is false.

func (Uint32x16) Max

func (x Uint32x16) Max(y Uint32x16) Uint32x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX512

func (Uint32x16) Merge

func (x Uint32x16) Merge(y Uint32x16, mask Mask32x16) Uint32x16

Merge returns x but with elements set to y where mask is false.
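Merge is a per-lane select. A plain-Go sketch with a hypothetical helper, using arrays for the vectors and a [16]bool for the opaque mask type:

```go
package main

// merge32 is a hypothetical scalar model of Uint32x16.Merge: lanes where
// mask is true come from x, lanes where it is false come from y.
func merge32(x, y [16]uint32, mask [16]bool) [16]uint32 {
	var r [16]uint32
	for i := range r {
		if mask[i] {
			r[i] = x[i]
		} else {
			r[i] = y[i]
		}
	}
	return r
}
```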

func (Uint32x16) Min

func (x Uint32x16) Min(y Uint32x16) Uint32x16

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX512

func (Uint32x16) Mul

func (x Uint32x16) Mul(y Uint32x16) Uint32x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX512

func (Uint32x16) Not

func (x Uint32x16) Not() Uint32x16

Not returns the bitwise complement of x.

Emulated, CPU Feature AVX512

func (Uint32x16) NotEqual

func (x Uint32x16) NotEqual(y Uint32x16) Mask32x16

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) OnesCount

func (x Uint32x16) OnesCount() Uint32x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x16) Or

func (x Uint32x16) Or(y Uint32x16) Uint32x16

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint32x16) Permute

func (x Uint32x16) Permute(indices Uint32x16) Uint32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX512

func (Uint32x16) PermuteScalarsGrouped

func (x Uint32x16) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

	result = {x[a],    x[b],    x[c],    x[d],
	          x[a+4],  x[b+4],  x[c+4],  x[d+4],
	          x[a+8],  x[b+8],  x[c+8],  x[d+8],
	          x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise a jump table is generated.

Asm: VPSHUFD, CPU Feature: AVX512
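The grouped permutation applies one 4-element shuffle to every 128-bit group of four lanes. This hypothetical scalar helper masks each selector to 2 bits, which is an assumption for out-of-range inputs; the documentation only defines values 0 to 3.

```go
package main

// permuteScalarsGrouped is a hypothetical scalar model of
// Uint32x16.PermuteScalarsGrouped: the same 4-element shuffle {a, b, c, d}
// is applied independently to each 128-bit group of four lanes.
func permuteScalarsGrouped(x [16]uint32, a, b, c, d uint8) [16]uint32 {
	sel := [4]uint8{a & 3, b & 3, c & 3, d & 3} // valid selectors are 0-3
	var r [16]uint32
	for g := 0; g < 16; g += 4 { // g is the first lane of each group
		for j, s := range sel {
			r[g+j] = x[g+int(s)]
		}
	}
	return r
}
```

For instance, selectors (3, 2, 1, 0) reverse the four elements inside each group.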

func (Uint32x16) RotateAllLeft

func (x Uint32x16) RotateAllLeft(shift uint8) Uint32x16

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512
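A scalar sketch of RotateAllLeft using math/bits.RotateLeft32. Taking the count modulo 32 is an assumption based on Intel's description of VPROLD, and the helper name is hypothetical.

```go
package main

import "math/bits"

// rotateAllLeft32 is a hypothetical scalar model of
// Uint32x16.RotateAllLeft. bits.RotateLeft32 rotates by k mod 32,
// matching a rotate count taken modulo the 32-bit lane width.
func rotateAllLeft32(x [16]uint32, shift uint8) [16]uint32 {
	var r [16]uint32
	for i, v := range x {
		r[i] = bits.RotateLeft32(v, int(shift))
	}
	return r
}
```

A right rotation corresponds to bits.RotateLeft32(v, -int(shift)).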

func (Uint32x16) RotateAllRight

func (x Uint32x16) RotateAllRight(shift uint8) Uint32x16

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Uint32x16) RotateLeft

func (x Uint32x16) RotateLeft(y Uint32x16) Uint32x16

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Uint32x16) RotateRight

func (x Uint32x16) RotateRight(y Uint32x16) Uint32x16

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x16) SaturateToUint16

func (x Uint32x16) SaturateToUint16() Uint16x16

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512

func (Uint32x16) SaturateToUint16Concat

func (x Uint32x16) SaturateToUint16Concat(y Uint32x16) Uint16x32

SaturateToUint16Concat converts element values to uint16 with saturation, treating each 128-bit group independently: in each 128-bit group of the result, the converted elements from the corresponding group of x are packed into the lower half and those from the corresponding group of y into the upper half.

Asm: VPACKUSDW, CPU Feature: AVX512

func (Uint32x16) SelectFromPairGrouped

func (x Uint32x16) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it is; otherwise it requires two.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPS, CPU Feature: AVX512

func (Uint32x16) SetHi

func (x Uint32x16) SetHi(y Uint32x8) Uint32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint32x16) SetLo

func (x Uint32x16) SetLo(y Uint32x8) Uint32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint32x16) ShiftAllLeft

func (x Uint32x16) ShiftAllLeft(y uint64) Uint32x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX512

func (Uint32x16) ShiftAllLeftConcat

func (x Uint32x16) ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftAllRight

func (x Uint32x16) ShiftAllRight(y uint64) Uint32x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX512

func (Uint32x16) ShiftAllRightConcat

func (x Uint32x16) ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftLeft

func (x Uint32x16) ShiftLeft(y Uint32x16) Uint32x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX512

func (Uint32x16) ShiftLeftConcat

func (x Uint32x16) ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftRight

func (x Uint32x16) ShiftRight(y Uint32x16) Uint32x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX512

func (Uint32x16) ShiftRightConcat

func (x Uint32x16) ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Uint32x16) Store

func (x Uint32x16) Store(y *[16]uint32)

Store stores a Uint32x16 to an array.

func (Uint32x16) StoreMasked

func (x Uint32x16) StoreMasked(y *[16]uint32, mask Mask32x16)

StoreMasked stores a Uint32x16 to an array, writing only the elements enabled by mask; array elements where mask is false are left unchanged.

Asm: VMOVDQU32, CPU Feature: AVX512

func (Uint32x16) StoreSlice

func (x Uint32x16) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 16 uint32s.

func (Uint32x16) StoreSlicePart

func (x Uint32x16) StoreSlicePart(s []uint32)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Uint32x16) String

func (x Uint32x16) String() string

String returns a string representation of SIMD vector x.

func (Uint32x16) Sub

func (x Uint32x16) Sub(y Uint32x16) Uint32x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX512

func (Uint32x16) TruncateToUint16

func (x Uint32x16) TruncateToUint16() Uint16x16

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x16) TruncateToUint8

func (x Uint32x16) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512

func (Uint32x16) Xor

func (x Uint32x16) Xor(y Uint32x16) Uint32x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Uint32x4

type Uint32x4 struct {
	// contains filtered or unexported fields
}

Uint32x4 is a 128-bit SIMD vector of 4 uint32.

func BroadcastUint32x4

func BroadcastUint32x4(x uint32) Uint32x4

BroadcastUint32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint32x4

func LoadMaskedUint32x4(y *[4]uint32, mask Mask32x4) Uint32x4

LoadMaskedUint32x4 loads a Uint32x4 from an array, reading only the elements enabled by mask; the other elements of the result are zeroed.

Asm: VMASKMOVD, CPU Feature: AVX2

func LoadUint32x4

func LoadUint32x4(y *[4]uint32) Uint32x4

LoadUint32x4 loads a Uint32x4 from an array.

func LoadUint32x4Slice

func LoadUint32x4Slice(s []uint32) Uint32x4

LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s.

func LoadUint32x4SlicePart

func LoadUint32x4SlicePart(s []uint32) Uint32x4

LoadUint32x4SlicePart loads a Uint32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.

func (Uint32x4) AESInvMixColumns

func (x Uint32x4) AESInvMixColumns() Uint32x4

AESInvMixColumns performs the InvMixColumns operation of the AES cipher algorithm defined in FIPS 197. x is the chunk of the w array in use; result = InvMixColumns(x).

Asm: VAESIMC, CPU Feature: AVX, AES

func (Uint32x4) AESRoundKeyGenAssist

func (x Uint32x4) AESRoundKeyGenAssist(rconVal uint8) Uint32x4

AESRoundKeyGenAssist performs some components of KeyExpansion in the AES cipher algorithm defined in FIPS 197. x is an array of AES words, of which only x[1] and x[3] are used, and rconVal is a value from the Rcon constant array. result[0] = SubWord(x[1]), result[1] = XOR(SubWord(RotWord(x[1])), rconVal), result[2] = SubWord(x[3]), result[3] = XOR(SubWord(RotWord(x[3])), rconVal).

Performance is better when rconVal is a constant; a non-constant value is translated into a jump table.

Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES

func (Uint32x4) Add

func (x Uint32x4) Add(y Uint32x4) Uint32x4

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX

func (Uint32x4) AddPairs

func (x Uint32x4) AddPairs(y Uint32x4) Uint32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX

func (Uint32x4) And

func (x Uint32x4) And(y Uint32x4) Uint32x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint32x4) AndNot

func (x Uint32x4) AndNot(y Uint32x4) Uint32x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint32x4) AsFloat32x4

func (from Uint32x4) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint32x4 to Float32x4 by reinterpreting the bits of the vector.

func (Uint32x4) AsFloat64x2

func (from Uint32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint32x4 to Float64x2 by reinterpreting the bits of the vector.

func (Uint32x4) AsInt16x8

func (from Uint32x4) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint32x4 to Int16x8 by reinterpreting the bits of the vector.

func (Uint32x4) AsInt32x4

func (from Uint32x4) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint32x4 to Int32x4 by reinterpreting the bits of the vector.

func (Uint32x4) AsInt64x2

func (from Uint32x4) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint32x4 to Int64x2 by reinterpreting the bits of the vector.

func (Uint32x4) AsInt8x16

func (from Uint32x4) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint32x4 to Int8x16 by reinterpreting the bits of the vector.

func (Uint32x4) AsUint16x8

func (from Uint32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint32x4 to Uint16x8 by reinterpreting the bits of the vector.

func (Uint32x4) AsUint64x2

func (from Uint32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint32x4 to Uint64x2 by reinterpreting the bits of the vector.

func (Uint32x4) AsUint8x16

func (from Uint32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint32x4 to Uint8x16 by reinterpreting the bits of the vector.

func (Uint32x4) Broadcast128

func (x Uint32x4) Broadcast128() Uint32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Uint32x4) Broadcast256

func (x Uint32x4) Broadcast256() Uint32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Uint32x4) Broadcast512

func (x Uint32x4) Broadcast512() Uint32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX512

func (Uint32x4) Compress

func (x Uint32x4) Compress(mask Mask32x4) Uint32x4

Compress selects the elements of x where mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Uint32x4) ConcatPermute

func (x Uint32x4) ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[3]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy (3 bits for its 8 elements) are used from each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512

func (Uint32x4) ConvertToFloat32

func (x Uint32x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x4) ConvertToFloat64

func (x Uint32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTUDQ2PD, CPU Feature: AVX512

func (Uint32x4) Equal

func (x Uint32x4) Equal(y Uint32x4) Mask32x4

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX

func (Uint32x4) Expand

func (x Uint32x4) Expand(mask Mask32x4) Uint32x4

Expand takes the elements packed in the low part of x and distributes them, in order, to the positions where mask is true, from the lowest mask position to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Uint32x4) ExtendLo2ToUint64x2

func (x Uint32x4) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts the 2 lowest elements of x to uint64, zero-extending them.

Asm: VPMOVZXDQ, CPU Feature: AVX

func (Uint32x4) ExtendToUint64

func (x Uint32x4) ExtendToUint64() Uint64x4

ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXDQ, CPU Feature: AVX2

func (Uint32x4) GetElem

func (x Uint32x4) GetElem(index uint8) uint32

GetElem retrieves a single constant-indexed element's value.

Performance is better when index is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Uint32x4) Greater

func (x Uint32x4) Greater(y Uint32x4) Mask32x4

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature AVX

func (Uint32x4) GreaterEqual

func (x Uint32x4) GreaterEqual(y Uint32x4) Mask32x4

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature AVX

func (Uint32x4) InterleaveHi

func (x Uint32x4) InterleaveHi(y Uint32x4) Uint32x4

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX

func (Uint32x4) InterleaveLo

func (x Uint32x4) InterleaveLo(y Uint32x4) Uint32x4

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX

func (Uint32x4) IsZero

func (x Uint32x4) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint32x4) LeadingZeros

func (x Uint32x4) LeadingZeros() Uint32x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Uint32x4) Len

func (x Uint32x4) Len() int

Len returns the number of elements in a Uint32x4.

func (Uint32x4) Less

func (x Uint32x4) Less(y Uint32x4) Mask32x4

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature AVX

func (Uint32x4) LessEqual

func (x Uint32x4) LessEqual(y Uint32x4) Mask32x4

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature AVX

func (Uint32x4) Masked

func (x Uint32x4) Masked(mask Mask32x4) Uint32x4

Masked returns x but with elements zeroed where mask is false.

func (Uint32x4) Max

func (x Uint32x4) Max(y Uint32x4) Uint32x4

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX

func (Uint32x4) Merge

func (x Uint32x4) Merge(y Uint32x4, mask Mask32x4) Uint32x4

Merge returns x but with elements set to y where mask is false.

func (Uint32x4) Min

func (x Uint32x4) Min(y Uint32x4) Uint32x4

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX

func (Uint32x4) Mul

func (x Uint32x4) Mul(y Uint32x4) Uint32x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX

func (Uint32x4) MulEvenWiden

func (x Uint32x4) MulEvenWiden(y Uint32x4) Uint64x2

MulEvenWiden multiplies the even-indexed elements of x and y, widening the products to 64 bits: result[i] = uint64(x[2*i]) * uint64(y[2*i]).

Asm: VPMULUDQ, CPU Feature: AVX
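MulEvenWiden keeps the full 64-bit product of each even lane pair, so nothing overflows. A sketch using a hypothetical scalar helper, with arrays standing in for Uint32x4 and Uint64x2:

```go
package main

// mulEvenWiden is a hypothetical scalar model of Uint32x4.MulEvenWiden:
// elements 0 and 2 of x and y are multiplied as 64-bit values, so the
// full product is kept. Elements 1 and 3 are ignored.
func mulEvenWiden(x, y [4]uint32) [2]uint64 {
	return [2]uint64{
		uint64(x[0]) * uint64(y[0]),
		uint64(x[2]) * uint64(y[2]),
	}
}
```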

func (Uint32x4) Not

func (x Uint32x4) Not() Uint32x4

Not returns the bitwise complement of x.

Emulated, CPU Feature AVX

func (Uint32x4) NotEqual

func (x Uint32x4) NotEqual(y Uint32x4) Mask32x4

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature AVX

func (Uint32x4) OnesCount

func (x Uint32x4) OnesCount() Uint32x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x4) Or

func (x Uint32x4) Or(y Uint32x4) Uint32x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint32x4) PermuteScalars

func (x Uint32x4) PermuteScalars(a, b, c, d uint8) Uint32x4

PermuteScalars performs a permutation of vector x's elements using the supplied indices:

result = {x[a], x[b], x[c], x[d]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX

func (Uint32x4) RotateAllLeft

func (x Uint32x4) RotateAllLeft(shift uint8) Uint32x4

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Uint32x4) RotateAllRight

func (x Uint32x4) RotateAllRight(shift uint8) Uint32x4

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Uint32x4) RotateLeft

func (x Uint32x4) RotateLeft(y Uint32x4) Uint32x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Uint32x4) RotateRight

func (x Uint32x4) RotateRight(y Uint32x4) Uint32x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x4) SHA1FourRounds

func (x Uint32x4) SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4

SHA1FourRounds performs 4 rounds of the B loop in the SHA1 algorithm defined in FIPS 180-4. x contains the state variables a, b, c, and d from upper to lower order. y contains the W array elements (with the state variable e added to the upper element) from upper to lower order. result = the state variables a', b', c', d' updated after 4 rounds. constant is 0 for the first 20 rounds of the loop, 1 for the next 20, 2 for the 20 after that, and 3 for the last 20 rounds.

Performance is better when constant is a compile-time constant; a non-constant value is translated into a jump table.

Asm: SHA1RNDS4, CPU Feature: SHA

func (Uint32x4) SHA1Message1

func (x Uint32x4) SHA1Message1(y Uint32x4) Uint32x4

SHA1Message1 does the XORing of 1 in SHA1 algorithm defined in FIPS 180-4. x = {W3, W2, W1, W0} y = {0, 0, W5, W4} result = {W3^W5, W2^W4, W1^W3, W0^W2}.

Asm: SHA1MSG1, CPU Feature: SHA
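The lane arithmetic of SHA1Message1 can be written out in plain Go. This sketch assumes the braces above list elements starting from index 0, which is consistent with Intel's description of SHA1MSG1; the helper name is hypothetical.

```go
package main

// sha1Message1 is a hypothetical scalar model of Uint32x4.SHA1Message1,
// assuming elements are listed from index 0 upward:
// x = {W3, W2, W1, W0}, y = {0, 0, W5, W4}.
func sha1Message1(x, y [4]uint32) [4]uint32 {
	return [4]uint32{
		x[0] ^ y[2], // W3 ^ W5
		x[1] ^ y[3], // W2 ^ W4
		x[2] ^ x[0], // W1 ^ W3
		x[3] ^ x[1], // W0 ^ W2
	}
}
```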

func (Uint32x4) SHA1Message2

func (x Uint32x4) SHA1Message2(y Uint32x4) Uint32x4

SHA1Message2 does the calculation of 3 and 4 in SHA1 algorithm defined in FIPS 180-4. x = result of 2. y = {W15, W14, W13} result = {W19, W18, W17, W16}

Asm: SHA1MSG2, CPU Feature: SHA

func (Uint32x4) SHA1NextE

func (x Uint32x4) SHA1NextE(y Uint32x4) Uint32x4

SHA1NextE calculates the state variable e' updated after 4 rounds in the SHA1 algorithm defined in FIPS 180-4. x contains the state variable a (before the 4 rounds), placed in the upper element. y contains the elements of the W array for the next 4 rounds, from upper to lower order. result = the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element, from upper to lower order. For the last round of the loop, you can specify zero for y to obtain the e' value itself, or, better, specify H4:0:0:0 for y to get e' added to H4. (The value of e' is computed only from x; the values of y do not affect it.)

Asm: SHA1NEXTE, CPU Feature: SHA

func (Uint32x4) SHA256Message1

func (x Uint32x4) SHA256Message1(y Uint32x4) Uint32x4

SHA256Message1 does the sigma and addition of 1 in the SHA256 algorithm defined in FIPS 180-4. x = {W0, W1, W2, W3}, y = {W4, 0, 0, 0}; result = {W0+σ0(W1), W1+σ0(W2), W2+σ0(W3), W3+σ0(W4)}, where σ0 is the SHA-256 σ0 function from FIPS 180-4.

Asm: SHA256MSG1, CPU Feature: SHA
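The σ0 function used by SHA256Message1 is defined in FIPS 180-4 as ROTR^7(w) XOR ROTR^18(w) XOR SHR^3(w). A scalar sketch of the whole step follows, assuming the braces above list elements from index 0; both function names are hypothetical.

```go
package main

import "math/bits"

// sigma0 is the SHA-256 σ0 function from FIPS 180-4:
// ROTR^7(w) XOR ROTR^18(w) XOR SHR^3(w).
func sigma0(w uint32) uint32 {
	return bits.RotateLeft32(w, -7) ^ bits.RotateLeft32(w, -18) ^ (w >> 3)
}

// sha256Message1 is a hypothetical scalar model of
// Uint32x4.SHA256Message1 with x = {W0, W1, W2, W3} and
// y = {W4, 0, 0, 0}. Additions wrap modulo 2^32, as in SHA-256.
func sha256Message1(x, y [4]uint32) [4]uint32 {
	w := [5]uint32{x[0], x[1], x[2], x[3], y[0]}
	var r [4]uint32
	for i := range r {
		r[i] = w[i] + sigma0(w[i+1])
	}
	return r
}
```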

func (Uint32x4) SHA256Message2

func (x Uint32x4) SHA256Message2(y Uint32x4) Uint32x4

SHA256Message2 does the sigma and addition of 3 in the SHA256 algorithm defined in FIPS 180-4. x = result of 2, y = {0, 0, W14, W15}; result = {W16, W17, W18, W19}.

Asm: SHA256MSG2, CPU Feature: SHA

func (Uint32x4) SHA256TwoRounds

func (x Uint32x4) SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4

SHA256TwoRounds does 2 rounds of the B loop to calculate updated state variables in the SHA256 algorithm defined in FIPS 180-4. x = {h, g, d, c}, y = {f, e, b, a}, z = {W0+K0, W1+K1}; result = {f', e', b', a'}. The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is added to the corresponding element of the W array to form the input z. The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input y (the state variables a, b, e, f before the 2 rounds).

Asm: SHA256RNDS2, CPU Feature: SHA

func (Uint32x4) SaturateToUint16

func (x Uint32x4) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512
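Saturation clamps out-of-range values instead of wrapping them. The scalar model below contrasts it with truncation. It uses hypothetical helper names, is written in plain Go, and covers the four input elements only. It does not model where the packed values land in the Uint16x8 result.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToUint16 clamps each element to math.MaxUint16 instead of wrapping.
func saturateToUint16(x [4]uint32) [4]uint16 {
	var out [4]uint16
	for i, v := range x {
		if v > math.MaxUint16 {
			v = math.MaxUint16
		}
		out[i] = uint16(v)
	}
	return out
}

// truncateToUint16 keeps only the low 16 bits of each element, for comparison.
func truncateToUint16(x [4]uint32) [4]uint16 {
	var out [4]uint16
	for i, v := range x {
		out[i] = uint16(v)
	}
	return out
}

func main() {
	x := [4]uint32{7, 65535, 65536, 0x12345}
	fmt.Println(saturateToUint16(x)) // [7 65535 65535 65535]
	fmt.Println(truncateToUint16(x)) // [7 65535 0 9029]
}
```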

func (Uint32x4) SaturateToUint16Concat

func (x Uint32x4) SaturateToUint16Concat(y Uint32x4) Uint16x8

SaturateToUint16Concat converts the element values of x and y to uint16, with saturation, and packs them into one vector. The vectors are processed in 128-bit groups: the converted group from the first input vector (x) is packed into the lower part of the corresponding result group, and the converted group from the second input vector (y) into the upper part.

Asm: VPACKUSDW, CPU Feature: AVX

func (Uint32x4) SelectFromPair

func (x Uint32x4) SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4

SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements of x and values in the range 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constant, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
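The scalar model below applies the selection rule and reproduces the example above. selectFromPair is a hypothetical name. The model assumes every selector is in 0-7. An out-of-range selector makes this Go code panic, which says nothing about how the real method behaves.

```go
package main

import "fmt"

// selectFromPair models x.SelectFromPair(a, b, c, d, y):
// selectors 0-3 pick from x, selectors 4-7 pick element sel-4 of y.
func selectFromPair(x [4]uint32, a, b, c, d uint8, y [4]uint32) [4]uint32 {
	pick := func(sel uint8) uint32 {
		if sel < 4 {
			return x[sel]
		}
		return y[sel-4]
	}
	return [4]uint32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]uint32{1, 2, 4, 8}
	y := [4]uint32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```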

func (Uint32x4) SetElem

func (x Uint32x4) SetElem(index uint8, y uint32) Uint32x4

SetElem returns x with the element at index set to y.

Performance is better when index is a constant; a non-constant index is translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Uint32x4) ShiftAllLeft

func (x Uint32x4) ShiftAllLeft(y uint64) Uint32x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX

func (Uint32x4) ShiftAllLeftConcat

func (x Uint32x4) ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Uint32x4) ShiftAllRight

func (x Uint32x4) ShiftAllRight(y uint64) Uint32x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX

func (Uint32x4) ShiftAllRightConcat

func (x Uint32x4) ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2
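Per element, the Concat shifts behave like a funnel shift of a 64-bit concatenation. This scalar sketch models one element of ShiftAllLeftConcat and ShiftAllRightConcat, using hypothetical helper names. Applying the same functions with a separate shift count for each element models ShiftLeftConcat and ShiftRightConcat.

```go
package main

import "fmt"

// shiftLeftConcat shifts x left and fills the vacated low bits
// with the top bits of fill.
func shiftLeftConcat(x, fill uint32, shift uint8) uint32 {
	s := uint(shift & 31) // only the lower 5 bits are used
	// In Go, fill >> 32 is 0 for a uint32, so a shift of 0 returns x unchanged.
	return x<<s | fill>>(32-s)
}

// shiftRightConcat shifts x right and fills the vacated high bits
// with the low bits of fill.
func shiftRightConcat(x, fill uint32, shift uint8) uint32 {
	s := uint(shift & 31)
	return x>>s | fill<<(32-s)
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat(0x0000ffff, 0xabcd0000, 16))  // 0xffffabcd
	fmt.Printf("%#x\n", shiftRightConcat(0xffff0000, 0x0000abcd, 16)) // 0xabcdffff
}
```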

func (Uint32x4) ShiftLeft

func (x Uint32x4) ShiftLeft(y Uint32x4) Uint32x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Uint32x4) ShiftLeftConcat

func (x Uint32x4) ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x4) ShiftRight

func (x Uint32x4) ShiftRight(y Uint32x4) Uint32x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX2

func (Uint32x4) ShiftRightConcat

func (x Uint32x4) ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Uint32x4) Store

func (x Uint32x4) Store(y *[4]uint32)

Store stores a Uint32x4 to an array

func (Uint32x4) StoreMasked

func (x Uint32x4) StoreMasked(y *[4]uint32, mask Mask32x4)

StoreMasked stores a Uint32x4 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Uint32x4) StoreSlice

func (x Uint32x4) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 4 uint32s

func (Uint32x4) StoreSlicePart

func (x Uint32x4) StoreSlicePart(s []uint32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Uint32x4) String

func (x Uint32x4) String() string

String returns a string representation of SIMD vector x

func (Uint32x4) Sub

func (x Uint32x4) Sub(y Uint32x4) Uint32x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX

func (Uint32x4) SubPairs

func (x Uint32x4) SubPairs(y Uint32x4) Uint32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX

func (Uint32x4) TruncateToUint16

func (x Uint32x4) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x4) TruncateToUint8

func (x Uint32x4) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. The results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512
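Truncation keeps only the low 8 bits, and the four results occupy the first elements of the 16-element result. Here is a scalar model of TruncateToUint8 for a Uint32x4 input. truncateToUint8 is a hypothetical helper name.

```go
package main

import "fmt"

// truncateToUint8 keeps the low 8 bits of each element; elements 4..15 stay zero.
func truncateToUint8(x [4]uint32) [16]uint8 {
	var out [16]uint8
	for i, v := range x {
		out[i] = uint8(v)
	}
	return out
}

func main() {
	fmt.Println(truncateToUint8([4]uint32{0x1ff, 0x100, 0xab, 0x12345678}))
	// [255 0 171 120 0 0 0 0 0 0 0 0 0 0 0 0]
}
```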

func (Uint32x4) Xor

func (x Uint32x4) Xor(y Uint32x4) Uint32x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint32x8

type Uint32x8 struct {
	// contains filtered or unexported fields
}

Uint32x8 is a 256-bit SIMD vector of 8 uint32

func BroadcastUint32x8

func BroadcastUint32x8(x uint32) Uint32x8

BroadcastUint32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint32x8

func LoadMaskedUint32x8(y *[8]uint32, mask Mask32x8) Uint32x8

LoadMaskedUint32x8 loads a Uint32x8 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func LoadUint32x8

func LoadUint32x8(y *[8]uint32) Uint32x8

LoadUint32x8 loads a Uint32x8 from an array

func LoadUint32x8Slice

func LoadUint32x8Slice(s []uint32) Uint32x8

LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s

func LoadUint32x8SlicePart

func LoadUint32x8SlicePart(s []uint32) Uint32x8

LoadUint32x8SlicePart loads a Uint32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.
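The Part variants are typically useful for the tail of a loop whose length is not a multiple of the vector length. This plain-Go sketch models LoadUint32x8SlicePart and StoreSlicePart with copy on [8]uint32 arrays. The helpers loadSlicePart and storeSlicePart are hypothetical. The example doubles every element of an 11-element slice.

```go
package main

import "fmt"

// loadSlicePart copies up to 8 elements; the rest of the vector stays zero.
func loadSlicePart(s []uint32) [8]uint32 {
	var v [8]uint32
	copy(v[:], s) // copies min(len(s), 8) elements
	return v
}

// storeSlicePart writes as many elements of v as fit in s.
func storeSlicePart(v [8]uint32, s []uint32) {
	copy(s, v[:])
}

func main() {
	data := []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
	// One full 8-element chunk, then a 3-element tail.
	for rest := data; len(rest) > 0; rest = rest[min(len(rest), 8):] {
		v := loadSlicePart(rest)
		for i := range v {
			v[i] *= 2 // stand-in for a vector operation
		}
		storeSlicePart(v, rest)
	}
	fmt.Println(data) // [2 4 6 8 10 12 14 16 18 20 22]
}
```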

func (Uint32x8) Add

func (x Uint32x8) Add(y Uint32x8) Uint32x8

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX2

func (Uint32x8) AddPairs

func (x Uint32x8) AddPairs(y Uint32x8) Uint32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX2

func (Uint32x8) And

func (x Uint32x8) And(y Uint32x8) Uint32x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint32x8) AndNot

func (x Uint32x8) AndNot(y Uint32x8) Uint32x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint32x8) AsFloat32x8

func (from Uint32x8) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint32x8 to Float32x8 by reinterpreting the underlying bits.

func (Uint32x8) AsFloat64x4

func (from Uint32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint32x8 to Float64x4 by reinterpreting the underlying bits.

func (Uint32x8) AsInt16x16

func (from Uint32x8) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint32x8 to Int16x16 by reinterpreting the underlying bits.

func (Uint32x8) AsInt32x8

func (from Uint32x8) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint32x8 to Int32x8 by reinterpreting the underlying bits.

func (Uint32x8) AsInt64x4

func (from Uint32x8) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint32x8 to Int64x4 by reinterpreting the underlying bits.

func (Uint32x8) AsInt8x32

func (from Uint32x8) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint32x8 to Int8x32 by reinterpreting the underlying bits.

func (Uint32x8) AsUint16x16

func (from Uint32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint32x8 to Uint16x16 by reinterpreting the underlying bits.

func (Uint32x8) AsUint64x4

func (from Uint32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Uint32x8 to Uint64x4 by reinterpreting the underlying bits.

func (Uint32x8) AsUint8x32

func (from Uint32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Uint32x8 to Uint8x32 by reinterpreting the underlying bits.

func (Uint32x8) Compress

func (x Uint32x8) Compress(mask Mask32x8) Uint32x8

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512
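Here is a scalar model of Compress, with a bool array standing in for Mask32x8. compress is a hypothetical helper name. The documentation does not say what the upper result elements hold, so this sketch assumes they are zero.

```go
package main

import "fmt"

// compress packs the elements of x whose mask entry is true into the low
// positions, preserving order. Zeroing the remaining positions is an assumption.
func compress(x [8]uint32, mask [8]bool) [8]uint32 {
	var out [8]uint32
	n := 0
	for i, v := range x {
		if mask[i] {
			out[n] = v
			n++
		}
	}
	return out
}

func main() {
	x := [8]uint32{10, 11, 12, 13, 14, 15, 16, 17}
	m := [8]bool{true, false, true, false, false, true, false, false}
	fmt.Println(compress(x, m)) // [10 12 15 0 0 0 0 0]
}
```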

func (Uint32x8) ConcatPermute

func (x Uint32x8) ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used (4 bits, since xy has 16 elements).

Asm: VPERMI2D, CPU Feature: AVX512
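Here is a scalar model of ConcatPermute for eight-element vectors. It masks each index to 4 bits, as described above. concatPermute is a hypothetical helper name.

```go
package main

import "fmt"

// concatPermute indexes into the 16-element concatenation of x and y.
func concatPermute(x, y, indices [8]uint32) [8]uint32 {
	var xy [16]uint32
	copy(xy[:8], x[:]) // x is the lower half
	copy(xy[8:], y[:]) // y is the upper half
	var out [8]uint32
	for i, idx := range indices {
		out[i] = xy[idx&15] // 4 bits suffice to index 16 elements
	}
	return out
}

func main() {
	x := [8]uint32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint32{100, 101, 102, 103, 104, 105, 106, 107}
	// Interleave the low halves of x and y.
	fmt.Println(concatPermute(x, y, [8]uint32{0, 8, 1, 9, 2, 10, 3, 11}))
	// [0 100 1 101 2 102 3 103]
}
```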

func (Uint32x8) ConvertToFloat32

func (x Uint32x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x8) ConvertToFloat64

func (x Uint32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTUDQ2PD, CPU Feature: AVX512

func (Uint32x8) Equal

func (x Uint32x8) Equal(y Uint32x8) Mask32x8

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQD, CPU Feature: AVX2

func (Uint32x8) Expand

func (x Uint32x8) Expand(mask Mask32x8) Uint32x8

Expand takes the elements packed into the lower part of x and distributes them, in order, to the positions where mask is true, from the lowest such position to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512
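Here is a scalar model of Expand, with a bool array standing in for Mask32x8. expand is a hypothetical helper name. The documentation does not specify the positions where mask is false, so this sketch assumes they are zero.

```go
package main

import "fmt"

// expand places consecutive low elements of x at the positions where mask
// is true. Zero elsewhere is an assumption of this model.
func expand(x [8]uint32, mask [8]bool) [8]uint32 {
	var out [8]uint32
	n := 0
	for i := range out {
		if mask[i] {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [8]uint32{10, 20, 30}
	m := [8]bool{false, true, false, true, true, false, false, false}
	fmt.Println(expand(x, m)) // [0 10 0 20 30 0 0 0]
}
```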

func (Uint32x8) ExtendToUint64

func (x Uint32x8) ExtendToUint64() Uint64x8

ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXDQ, CPU Feature: AVX512

func (Uint32x8) GetHi

func (x Uint32x8) GetHi() Uint32x4

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint32x8) GetLo

func (x Uint32x8) GetLo() Uint32x4

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint32x8) Greater

func (x Uint32x8) Greater(y Uint32x8) Mask32x8

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint32x8) GreaterEqual

func (x Uint32x8) GreaterEqual(y Uint32x8) Mask32x8

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint32x8) InterleaveHiGrouped

func (x Uint32x8) InterleaveHiGrouped(y Uint32x8) Uint32x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX2

func (Uint32x8) InterleaveLoGrouped

func (x Uint32x8) InterleaveLoGrouped(y Uint32x8) Uint32x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX2

func (Uint32x8) IsZero

func (x Uint32x8) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint32x8) LeadingZeros

func (x Uint32x8) LeadingZeros() Uint32x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Uint32x8) Len

func (x Uint32x8) Len() int

Len returns the number of elements in a Uint32x8

func (Uint32x8) Less

func (x Uint32x8) Less(y Uint32x8) Mask32x8

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint32x8) LessEqual

func (x Uint32x8) LessEqual(y Uint32x8) Mask32x8

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint32x8) Masked

func (x Uint32x8) Masked(mask Mask32x8) Uint32x8

Masked returns x but with elements zeroed where mask is false.

func (Uint32x8) Max

func (x Uint32x8) Max(y Uint32x8) Uint32x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX2

func (Uint32x8) Merge

func (x Uint32x8) Merge(y Uint32x8, mask Mask32x8) Uint32x8

Merge returns x but with elements set to y where mask is false.
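Masked and Merge are both per-element selects. The scalar model below uses bool arrays in place of Mask32x8. The helper names are hypothetical.

```go
package main

import "fmt"

// masked zeroes the elements of x where mask is false.
func masked(x [8]uint32, mask [8]bool) [8]uint32 {
	var out [8]uint32
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge keeps x where mask is true and takes y where mask is false.
func merge(x, y [8]uint32, mask [8]bool) [8]uint32 {
	out := x
	for i := range out {
		if !mask[i] {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	x := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]uint32{10, 20, 30, 40, 50, 60, 70, 80}
	m := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked(x, m))   // [1 0 3 0 5 0 7 0]
	fmt.Println(merge(x, y, m)) // [1 20 3 40 5 60 7 80]
}
```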

func (Uint32x8) Min

func (x Uint32x8) Min(y Uint32x8) Uint32x8

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX2

func (Uint32x8) Mul

func (x Uint32x8) Mul(y Uint32x8) Uint32x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX2

func (Uint32x8) MulEvenWiden

func (x Uint32x8) MulEvenWiden(y Uint32x8) Uint64x4

MulEvenWiden multiplies the even-indexed elements of x and y, widening each product to 64 bits: result[i] = x[2*i] * y[2*i].

Asm: VPMULUDQ, CPU Feature: AVX2
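Here is a scalar model of MulEvenWiden. mulEvenWiden is a hypothetical helper name. The model ignores odd-indexed elements and computes each product in 64 bits, so the product cannot overflow.

```go
package main

import "fmt"

// mulEvenWiden multiplies elements 0, 2, 4, 6 of x and y as 64-bit values.
func mulEvenWiden(x, y [8]uint32) [4]uint64 {
	var out [4]uint64
	for i := range out {
		out[i] = uint64(x[2*i]) * uint64(y[2*i])
	}
	return out
}

func main() {
	// The odd-indexed 99s are ignored.
	x := [8]uint32{0xffffffff, 99, 3, 99, 0, 99, 1 << 31, 99}
	y := [8]uint32{2, 99, 5, 99, 7, 99, 2, 99}
	fmt.Println(mulEvenWiden(x, y)) // [8589934590 15 0 4294967296]
}
```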

func (Uint32x8) Not

func (x Uint32x8) Not() Uint32x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Uint32x8) NotEqual

func (x Uint32x8) NotEqual(y Uint32x8) Mask32x8

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Uint32x8) OnesCount

func (x Uint32x8) OnesCount() Uint32x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x8) Or

func (x Uint32x8) Or(y Uint32x8) Uint32x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint32x8) Permute

func (x Uint32x8) Permute(indices Uint32x8) Uint32x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX2
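Here is a scalar model of Permute. It uses only the low 3 bits of each index. permute is a hypothetical helper name.

```go
package main

import "fmt"

// permute gathers x[indices[i] & 7] into position i.
func permute(x, indices [8]uint32) [8]uint32 {
	var out [8]uint32
	for i, idx := range indices {
		out[i] = x[idx&7]
	}
	return out
}

func main() {
	x := [8]uint32{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(permute(x, [8]uint32{7, 6, 5, 4, 3, 2, 1, 0})) // [17 16 15 14 13 12 11 10]
}
```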

func (Uint32x8) PermuteScalarsGrouped

func (x Uint32x8) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFD, CPU Feature: AVX2
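Here is a scalar model of PermuteScalarsGrouped, which applies the same four selectors to each 128-bit group. permuteScalarsGrouped is a hypothetical helper name, and selectors are assumed to be in 0-3.

```go
package main

import "fmt"

// permuteScalarsGrouped applies selectors a..d to the low group (elements 0-3)
// and, offset by 4, to the high group (elements 4-7).
func permuteScalarsGrouped(x [8]uint32, a, b, c, d uint8) [8]uint32 {
	return [8]uint32{x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
}

func main() {
	x := [8]uint32{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0)) // [3 2 1 0 7 6 5 4]
}
```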

func (Uint32x8) RotateAllLeft

func (x Uint32x8) RotateAllLeft(shift uint8) Uint32x8

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Uint32x8) RotateAllRight

func (x Uint32x8) RotateAllRight(shift uint8) Uint32x8

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512
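Per element, the rotates match math/bits.RotateLeft32, where a negative count rotates right. Here is a scalar sketch of RotateAllLeft and RotateAllRight using hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft rotates every element left by shift bits.
func rotateAllLeft(x [8]uint32, shift uint8) [8]uint32 {
	var out [8]uint32
	for i, v := range x {
		out[i] = bits.RotateLeft32(v, int(shift))
	}
	return out
}

// rotateAllRight rotates every element right by shift bits;
// RotateLeft32 with a negative count rotates right.
func rotateAllRight(x [8]uint32, shift uint8) [8]uint32 {
	var out [8]uint32
	for i, v := range x {
		out[i] = bits.RotateLeft32(v, -int(shift))
	}
	return out
}

func main() {
	x := [8]uint32{0x80000001, 1}
	fmt.Println(rotateAllLeft(x, 1))  // [3 2 0 0 0 0 0 0]
	fmt.Println(rotateAllRight(x, 1)) // [3221225472 2147483648 0 0 0 0 0 0]
}
```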

func (Uint32x8) RotateLeft

func (x Uint32x8) RotateLeft(y Uint32x8) Uint32x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Uint32x8) RotateRight

func (x Uint32x8) RotateRight(y Uint32x8) Uint32x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x8) SaturateToUint16

func (x Uint32x8) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512

func (Uint32x8) SaturateToUint16Concat

func (x Uint32x8) SaturateToUint16Concat(y Uint32x8) Uint16x16

SaturateToUint16Concat converts the element values of x and y to uint16, with saturation, and packs them into one vector. The vectors are processed in 128-bit groups: the converted group from the first input vector (x) is packed into the lower part of the corresponding result group, and the converted group from the second input vector (y) into the upper part.

Asm: VPACKUSDW, CPU Feature: AVX2

func (Uint32x8) Select128FromPair

func (x Uint32x8) Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
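Here is a scalar model of Select128FromPair that reproduces the example above. select128FromPair is a hypothetical helper name. In this Go code, an out-of-range lo or hi panics through ordinary index checks.

```go
package main

import "fmt"

// select128FromPair views x and y as four 4-element (128-bit) chunks,
// numbered 0-3, and concatenates chunk lo and chunk hi.
func select128FromPair(x [8]uint32, lo, hi uint8, y [8]uint32) [8]uint32 {
	var chunks [4][4]uint32
	copy(chunks[0][:], x[:4])
	copy(chunks[1][:], x[4:])
	copy(chunks[2][:], y[:4])
	copy(chunks[3][:], y[4:])
	var out [8]uint32
	copy(out[:4], chunks[lo][:])
	copy(out[4:], chunks[hi][:])
	return out
}

func main() {
	x := [8]uint32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]uint32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```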

func (Uint32x8) SelectFromPairGrouped

func (x Uint32x8) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements of x and values in the range 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}

If the selectors are not constant, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
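Here is a scalar model of the grouped selection that reproduces the example above. selectFromPairGrouped is a hypothetical helper name, and the selectors are assumed to be in 0-7.

```go
package main

import "fmt"

// selectFromPairGrouped applies the same four selectors to each 128-bit half:
// 0-3 pick from that half of x, 4-7 pick from that half of y.
func selectFromPairGrouped(x [8]uint32, a, b, c, d uint8, y [8]uint32) [8]uint32 {
	var out [8]uint32
	for g := 0; g < 8; g += 4 { // g is the first element of the current half
		pick := func(sel uint8) uint32 {
			if sel < 4 {
				return x[g+int(sel)]
			}
			return y[g+int(sel)-4]
		}
		out[g], out[g+1], out[g+2], out[g+3] = pick(a), pick(b), pick(c), pick(d)
	}
	return out
}

func main() {
	x := [8]uint32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]uint32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```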

func (Uint32x8) SetHi

func (x Uint32x8) SetHi(y Uint32x4) Uint32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint32x8) SetLo

func (x Uint32x8) SetLo(y Uint32x4) Uint32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint32x8) ShiftAllLeft

func (x Uint32x8) ShiftAllLeft(y uint64) Uint32x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX2

func (Uint32x8) ShiftAllLeftConcat

func (x Uint32x8) ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Uint32x8) ShiftAllRight

func (x Uint32x8) ShiftAllRight(y uint64) Uint32x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX2

func (Uint32x8) ShiftAllRightConcat

func (x Uint32x8) ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Uint32x8) ShiftLeft

func (x Uint32x8) ShiftLeft(y Uint32x8) Uint32x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Uint32x8) ShiftLeftConcat

func (x Uint32x8) ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x8) ShiftRight

func (x Uint32x8) ShiftRight(y Uint32x8) Uint32x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX2

func (Uint32x8) ShiftRightConcat

func (x Uint32x8) ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Uint32x8) Store

func (x Uint32x8) Store(y *[8]uint32)

Store stores a Uint32x8 to an array

func (Uint32x8) StoreMasked

func (x Uint32x8) StoreMasked(y *[8]uint32, mask Mask32x8)

StoreMasked stores a Uint32x8 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Uint32x8) StoreSlice

func (x Uint32x8) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 8 uint32s

func (Uint32x8) StoreSlicePart

func (x Uint32x8) StoreSlicePart(s []uint32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Uint32x8) String

func (x Uint32x8) String() string

String returns a string representation of SIMD vector x

func (Uint32x8) Sub

func (x Uint32x8) Sub(y Uint32x8) Uint32x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX2

func (Uint32x8) SubPairs

func (x Uint32x8) SubPairs(y Uint32x8) Uint32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX2

func (Uint32x8) TruncateToUint16

func (x Uint32x8) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x8) TruncateToUint8

func (x Uint32x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. The results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512

func (Uint32x8) Xor

func (x Uint32x8) Xor(y Uint32x8) Uint32x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint64x2

type Uint64x2 struct {
	// contains filtered or unexported fields
}

Uint64x2 is a 128-bit SIMD vector of 2 uint64

func BroadcastUint64x2

func BroadcastUint64x2(x uint64) Uint64x2

BroadcastUint64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint64x2

func LoadMaskedUint64x2(y *[2]uint64, mask Mask64x2) Uint64x2

LoadMaskedUint64x2 loads a Uint64x2 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func LoadUint64x2

func LoadUint64x2(y *[2]uint64) Uint64x2

LoadUint64x2 loads a Uint64x2 from an array

func LoadUint64x2Slice

func LoadUint64x2Slice(s []uint64) Uint64x2

LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s

func LoadUint64x2SlicePart

func LoadUint64x2SlicePart(s []uint64) Uint64x2

LoadUint64x2SlicePart loads a Uint64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.

func (Uint64x2) Add

func (x Uint64x2) Add(y Uint64x2) Uint64x2

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX

func (Uint64x2) And

func (x Uint64x2) And(y Uint64x2) Uint64x2

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint64x2) AndNot

func (x Uint64x2) AndNot(y Uint64x2) Uint64x2

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint64x2) AsFloat32x4

func (from Uint64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint64x2 to Float32x4 by reinterpreting the underlying bits.

func (Uint64x2) AsFloat64x2

func (from Uint64x2) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint64x2 to Float64x2 by reinterpreting the underlying bits.

func (Uint64x2) AsInt16x8

func (from Uint64x2) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint64x2 to Int16x8 by reinterpreting the underlying bits.

func (Uint64x2) AsInt32x4

func (from Uint64x2) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint64x2 to Int32x4 by reinterpreting the underlying bits.

func (Uint64x2) AsInt64x2

func (from Uint64x2) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint64x2 to Int64x2 by reinterpreting the underlying bits.

func (Uint64x2) AsInt8x16

func (from Uint64x2) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint64x2 to Int8x16 by reinterpreting the underlying bits.

func (Uint64x2) AsUint16x8

func (from Uint64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint64x2 to Uint16x8 by reinterpreting the underlying bits.

func (Uint64x2) AsUint32x4

func (from Uint64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint64x2 to Uint32x4 by reinterpreting the underlying bits.

func (Uint64x2) AsUint8x16

func (from Uint64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint64x2 to Uint8x16 by reinterpreting the underlying bits.

func (Uint64x2) Broadcast128

func (x Uint64x2) Broadcast128() Uint64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Uint64x2) Broadcast256

func (x Uint64x2) Broadcast256() Uint64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Uint64x2) Broadcast512

func (x Uint64x2) Broadcast512() Uint64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX512

func (Uint64x2) CarrylessMultiply

func (x Uint64x2) CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2

CarrylessMultiply computes one of four possible carryless multiplications of selected high and low halves of x and y, depending on the values of a and b, returning the 128-bit product in the concatenated two elements of the result. a selects the low (0) or high (1) element of x and b selects the low (0) or high (1) element of y.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX
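The sketch below computes a 64x64 to 128-bit carryless multiply by shift-and-XOR, with a wrapper that mimics the a/b element selection. The helper names are hypothetical. Placing the low 64 bits of the product in element 0 is an assumption of this model. The model reproduces the 11 * 11 = 101 example above.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (hi, lo):
// for every set bit i of b, a shifted left by i is XORed in (no carries).
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if b>>i&1 == 1 {
			lo ^= a << i
			if i > 0 {
				hi ^= a >> (64 - i) // bits shifted out of the low word
			}
		}
	}
	return hi, lo
}

// carrylessMultiply mimics x.CarrylessMultiply(a, b, y): a and b (0 or 1)
// select the low or high element of x and y. Low half in element 0 is assumed.
func carrylessMultiply(x [2]uint64, a, b uint8, y [2]uint64) [2]uint64 {
	hi, lo := clmul64(x[a], y[b])
	return [2]uint64{lo, hi}
}

func main() {
	fmt.Println(clmul64(3, 3))                                             // 0 5 (0b11 * 0b11 = 0b101)
	fmt.Println(carrylessMultiply([2]uint64{7, 3}, 1, 0, [2]uint64{3, 9})) // [5 0]
}
```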

func (Uint64x2) Compress

func (x Uint64x2) Compress(mask Mask64x2) Uint64x2

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Uint64x2) ConcatPermute

func (x Uint64x2) ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used (2 bits, since xy has 4 elements).

Asm: VPERMI2Q, CPU Feature: AVX512

func (Uint64x2) ConvertToFloat32

func (x Uint64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PSX, CPU Feature: AVX512

func (Uint64x2) ConvertToFloat64

func (x Uint64x2) ConvertToFloat64() Float64x2

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x2) Equal

func (x Uint64x2) Equal(y Uint64x2) Mask64x2

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX

func (Uint64x2) Expand

func (x Uint64x2) Expand(mask Mask64x2) Uint64x2

Expand takes the elements packed into the lower part of x and distributes them, in order, to the positions where mask is true, from the lowest such position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Uint64x2) GetElem

func (x Uint64x2) GetElem(index uint8) uint64

GetElem retrieves a single constant-indexed element's value.

Performance is better when index is a constant; a non-constant index is translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Uint64x2) Greater

func (x Uint64x2) Greater(y Uint64x2) Mask64x2

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX

func (Uint64x2) GreaterEqual

func (x Uint64x2) GreaterEqual(y Uint64x2) Mask64x2

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Uint64x2) InterleaveHi

func (x Uint64x2) InterleaveHi(y Uint64x2) Uint64x2

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX

func (Uint64x2) InterleaveLo

func (x Uint64x2) InterleaveLo(y Uint64x2) Uint64x2

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX

func (Uint64x2) IsZero

func (x Uint64x2) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint64x2) LeadingZeros

func (x Uint64x2) LeadingZeros() Uint64x2

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x2) Len

func (x Uint64x2) Len() int

Len returns the number of elements in a Uint64x2

func (Uint64x2) Less

func (x Uint64x2) Less(y Uint64x2) Mask64x2

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Uint64x2) LessEqual

func (x Uint64x2) LessEqual(y Uint64x2) Mask64x2

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Uint64x2) Masked

func (x Uint64x2) Masked(mask Mask64x2) Uint64x2

Masked returns x but with elements zeroed where mask is false.

func (Uint64x2) Max

func (x Uint64x2) Max(y Uint64x2) Uint64x2

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x2) Merge

func (x Uint64x2) Merge(y Uint64x2, mask Mask64x2) Uint64x2

Merge returns x but with elements set to y where mask is false.

func (Uint64x2) Min

func (x Uint64x2) Min(y Uint64x2) Uint64x2

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x2) Mul

func (x Uint64x2) Mul(y Uint64x2) Uint64x2

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x2) Not

func (x Uint64x2) Not() Uint64x2

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Uint64x2) NotEqual

func (x Uint64x2) NotEqual(y Uint64x2) Mask64x2

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX

func (Uint64x2) OnesCount

func (x Uint64x2) OnesCount() Uint64x2

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x2) Or

func (x Uint64x2) Or(y Uint64x2) Uint64x2

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint64x2) RotateAllLeft

func (x Uint64x2) RotateAllLeft(shift uint8) Uint64x2

RotateAllLeft rotates each element to the left by the number of bits specified by shift (an immediate operand).

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x2) RotateAllRight

func (x Uint64x2) RotateAllRight(shift uint8) Uint64x2

RotateAllRight rotates each element to the right by the number of bits specified by shift (an immediate operand).

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x2) RotateLeft

func (x Uint64x2) RotateLeft(y Uint64x2) Uint64x2

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x2) RotateRight

func (x Uint64x2) RotateRight(y Uint64x2) Uint64x2

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Uint64x2) SaturateToUint16

func (x Uint64x2) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x2) SaturateToUint32

func (x Uint64x2) SaturateToUint32() Uint32x4

SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQD, CPU Feature: AVX512

func (Uint64x2) SelectFromPair

func (x Uint64x2) SelectFromPair(a, b uint8, y Uint64x2) Uint64x2

SelectFromPair returns two elements selected from the vectors x and y, where selector values 0-1 specify elements 0-1 of x and values 2-3 specify elements 0-1 of y. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX
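The selector scheme indexes the four-element concatenation of x and y. The plain-Go model below assumes that a chooses result element 0 and b chooses result element 1; the text above does not say which selector fills which position, so treat that as an assumption. selectFromPair is a hypothetical helper, not package API.

```go
package main

import "fmt"

// selectFromPair models Uint64x2.SelectFromPair, assuming a picks result
// element 0 and b picks result element 1. Selectors index the concatenation
// {x[0], x[1], y[0], y[1]}, so 0-1 come from x and 2-3 from y.
// A selector above 3 makes this model panic with an index error.
func selectFromPair(x [2]uint64, a, b uint8, y [2]uint64) [2]uint64 {
	xy := [4]uint64{x[0], x[1], y[0], y[1]}
	return [2]uint64{xy[a], xy[b]}
}

func main() {
	x, y := [2]uint64{1, 2}, [2]uint64{3, 4}
	fmt.Println(selectFromPair(x, 1, 2, y)) // [2 3]
	fmt.Println(selectFromPair(x, 3, 3, y)) // [4 4]
}
```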

func (Uint64x2) SetElem

func (x Uint64x2) SetElem(index uint8, y uint64) Uint64x2

SetElem returns x with the element at index set to y.

Performance is better when index is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Uint64x2) ShiftAllLeft

func (x Uint64x2) ShiftAllLeft(y uint64) Uint64x2

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX

func (Uint64x2) ShiftAllLeftConcat

func (x Uint64x2) ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of the corresponding element of y into the emptied lower bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftAllRight

func (x Uint64x2) ShiftAllRight(y uint64) Uint64x2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX

func (Uint64x2) ShiftAllRightConcat

func (x Uint64x2) ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of the corresponding element of y into the emptied upper bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftLeft

func (x Uint64x2) ShiftLeft(y Uint64x2) Uint64x2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Uint64x2) ShiftLeftConcat

func (x Uint64x2) ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the upper bits of the corresponding element of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftRight

func (x Uint64x2) ShiftRight(y Uint64x2) Uint64x2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX2

func (Uint64x2) ShiftRightConcat

func (x Uint64x2) ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the lower bits of the corresponding element of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x2) Store

func (x Uint64x2) Store(y *[2]uint64)

Store stores a Uint64x2 to an array.

func (Uint64x2) StoreMasked

func (x Uint64x2) StoreMasked(y *[2]uint64, mask Mask64x2)

StoreMasked stores a Uint64x2 to an array, writing only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Uint64x2) StoreSlice

func (x Uint64x2) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 2 uint64s.

func (Uint64x2) StoreSlicePart

func (x Uint64x2) StoreSlicePart(s []uint64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.

func (Uint64x2) String

func (x Uint64x2) String() string

String returns a string representation of SIMD vector x.

func (Uint64x2) Sub

func (x Uint64x2) Sub(y Uint64x2) Uint64x2

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX

func (Uint64x2) TruncateToUint16

func (x Uint64x2) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Uint64x2) TruncateToUint32

func (x Uint64x2) TruncateToUint32() Uint32x4

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x2) TruncateToUint8

func (x Uint64x2) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512
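The Saturate and Truncate conversions differ only for out-of-range values: saturation clamps to the target type's maximum, while truncation keeps the low bits. The scalar Go model below also shows the packing described for TruncateToUint8. All function names are hypothetical, and plain arrays stand in for the vector types.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToUint16 models one element of SaturateToUint16: values above
// the uint16 range are clamped to math.MaxUint16.
func saturateToUint16(v uint64) uint16 {
	if v > math.MaxUint16 {
		return math.MaxUint16
	}
	return uint16(v)
}

// truncateToUint16 models one element of TruncateToUint16: only the low 16 bits survive.
func truncateToUint16(v uint64) uint16 {
	return uint16(v)
}

// truncateToUint8x16 models Uint64x2.TruncateToUint8: the two results are
// packed into the low elements and the other 14 elements stay zero.
func truncateToUint8x16(x [2]uint64) [16]uint8 {
	var r [16]uint8
	for i, v := range x {
		r[i] = uint8(v)
	}
	return r
}

func main() {
	v := uint64(0x12345)
	fmt.Printf("%#x %#x\n", saturateToUint16(v), truncateToUint16(v)) // 0xffff 0x2345
	r := truncateToUint8x16([2]uint64{0x1FF, 0x42})
	fmt.Println(r[:3]) // [255 66 0]
}
```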

func (Uint64x2) Xor

func (x Uint64x2) Xor(y Uint64x2) Uint64x2

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint64x4

type Uint64x4 struct {
	// contains filtered or unexported fields
}

Uint64x4 is a 256-bit SIMD vector of 4 uint64.

func BroadcastUint64x4

func BroadcastUint64x4(x uint64) Uint64x4

BroadcastUint64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadMaskedUint64x4

func LoadMaskedUint64x4(y *[4]uint64, mask Mask64x4) Uint64x4

LoadMaskedUint64x4 loads a Uint64x4 from an array, reading only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func LoadUint64x4

func LoadUint64x4(y *[4]uint64) Uint64x4

LoadUint64x4 loads a Uint64x4 from an array.

func LoadUint64x4Slice

func LoadUint64x4Slice(s []uint64) Uint64x4

LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s.

func LoadUint64x4SlicePart

func LoadUint64x4SlicePart(s []uint64) Uint64x4

LoadUint64x4SlicePart loads a Uint64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.
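The SlicePart variants make it easy to handle a tail shorter than one vector. The plain-Go model below reproduces the described semantics with copy; loadSlicePart and storeSlicePart are hypothetical helpers, and the chunked loop in main only illustrates the kind of tail handling these functions are suited for.

```go
package main

import "fmt"

// loadSlicePart models LoadUint64x4SlicePart: up to 4 elements are copied
// from s and any remaining vector elements stay zero.
func loadSlicePart(s []uint64) [4]uint64 {
	var v [4]uint64
	copy(v[:], s) // copies min(len(s), 4) elements
	return v
}

// storeSlicePart models Uint64x4.StoreSlicePart: only as many elements as fit in s are written.
func storeSlicePart(v [4]uint64, s []uint64) {
	copy(s, v[:])
}

func main() {
	data := []uint64{1, 2, 3, 4, 5, 6}
	// Process 4 elements at a time; the last chunk is only partly filled.
	for i := 0; i < len(data); i += 4 {
		v := loadSlicePart(data[i:])
		for j := range v {
			v[j] *= 10
		}
		storeSlicePart(v, data[i:])
	}
	fmt.Println(data) // [10 20 30 40 50 60]
}
```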

func (Uint64x4) Add

func (x Uint64x4) Add(y Uint64x4) Uint64x4

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX2

func (Uint64x4) And

func (x Uint64x4) And(y Uint64x4) Uint64x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint64x4) AndNot

func (x Uint64x4) AndNot(y Uint64x4) Uint64x4

AndNot computes the bitwise x &^ y (x AND NOT y).

Asm: VPANDN, CPU Feature: AVX2

func (Uint64x4) AsFloat32x8

func (from Uint64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint64x4 to Float32x8 by reinterpreting the bits.

func (Uint64x4) AsFloat64x4

func (from Uint64x4) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint64x4 to Float64x4 by reinterpreting the bits.

func (Uint64x4) AsInt16x16

func (from Uint64x4) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint64x4 to Int16x16 by reinterpreting the bits.

func (Uint64x4) AsInt32x8

func (from Uint64x4) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint64x4 to Int32x8 by reinterpreting the bits.

func (Uint64x4) AsInt64x4

func (from Uint64x4) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint64x4 to Int64x4 by reinterpreting the bits.

func (Uint64x4) AsInt8x32

func (from Uint64x4) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint64x4 to Int8x32 by reinterpreting the bits.

func (Uint64x4) AsUint16x16

func (from Uint64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint64x4 to Uint16x16 by reinterpreting the bits.

func (Uint64x4) AsUint32x8

func (from Uint64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Uint64x4 to Uint32x8 by reinterpreting the bits.

func (Uint64x4) AsUint8x32

func (from Uint64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Uint64x4 to Uint8x32 by reinterpreting the bits.

func (Uint64x4) CarrylessMultiplyGrouped

func (x Uint64x4) CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4

CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the two 128-bit lanes of x and y, depending on the values of a and b, and returns the two 128-bit products in the result's lanes. a selects the low (0) or high (1) element of x's lanes, and b selects the low (0) or high (1) element of y's lanes.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
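The XOR-instead-of-carry rule can be checked with a scalar Go model of a 64×64→128-bit carryless multiply, reproducing the 11 * 11 = 101 example above. The grouped wrapper assumes the usual little-endian lane layout, with the product's low 64 bits in the lower element of each lane. clmul64 and carrylessMultiplyGrouped are hypothetical helpers, not package API.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (hi, lo):
// partial products a<<i are combined with XOR instead of addition.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if b>>i&1 == 1 {
			lo ^= a << i
			hi ^= a >> (64 - i) // for i == 0 this shift yields 0 in Go
		}
	}
	return hi, lo
}

// carrylessMultiplyGrouped models Uint64x4.CarrylessMultiplyGrouped: in each
// 128-bit lane (elements 2k and 2k+1), a picks x's low or high element and b
// picks y's; the product's low 64 bits go to element 2k, high 64 bits to 2k+1.
// Only the low bit of a and b is used in this model.
func carrylessMultiplyGrouped(x [4]uint64, a, b uint8, y [4]uint64) [4]uint64 {
	var r [4]uint64
	for lane := 0; lane < 4; lane += 2 {
		hi, lo := clmul64(x[lane+int(a&1)], y[lane+int(b&1)])
		r[lane], r[lane+1] = lo, hi
	}
	return r
}

func main() {
	hi, lo := clmul64(0b11, 0b11)
	fmt.Printf("%b %b\n", hi, lo) // 0 101
	x := [4]uint64{3, 7, 1 << 63, 5}
	y := [4]uint64{3, 9, 2, 11}
	fmt.Println(carrylessMultiplyGrouped(x, 0, 0, y)) // [5 0 0 1]
}
```

The second lane shows a bit shifted out of the low word landing in the high word: (1<<63) * 2 has its single set bit at position 64.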

func (Uint64x4) Compress

func (x Uint64x4) Compress(mask Mask64x4) Uint64x4

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512
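A scalar model of Compress in plain Go. The text above does not state what fills the result positions past the packed elements; this sketch leaves them zero, which is an assumption. compress is a hypothetical helper, and a bool array stands in for Mask64x4.

```go
package main

import "fmt"

// compress models Uint64x4.Compress: elements of x whose mask entry is true
// are packed, in order, into the lowest elements of the result.
// Unfilled upper elements are zero here; that is an assumption of this model.
func compress(x [4]uint64, mask [4]bool) [4]uint64 {
	var r [4]uint64
	n := 0
	for i, v := range x {
		if mask[i] {
			r[n] = v
			n++
		}
	}
	return r
}

func main() {
	x := [4]uint64{1, 2, 3, 4}
	fmt.Println(compress(x, [4]bool{false, true, false, true})) // [2 4 0 0]
}
```

Combined with a comparison that produces the mask, this is the usual building block for a vectorized filter.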

func (Uint64x4) ConcatPermute

func (x Uint64x4) ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4

ConcatPermute performs a full permutation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[3]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices, that is, the low 3 bits (values 0-7).

Asm: VPERMI2Q, CPU Feature: AVX512
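Because xy has 8 elements, each index is effectively taken modulo 8. The plain-Go model below makes that explicit; concatPermute is a hypothetical helper, and arrays stand in for Uint64x4.

```go
package main

import "fmt"

// concatPermute models Uint64x4.ConcatPermute: indices select from the
// 8-element concatenation xy = {x[0..3], y[0..3]}. Since xy has 8 entries,
// only the low 3 bits of each index are used.
func concatPermute(x, y, indices [4]uint64) [4]uint64 {
	xy := [8]uint64{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	var r [4]uint64
	for i, idx := range indices {
		r[i] = xy[idx&7]
	}
	return r
}

func main() {
	x := [4]uint64{0, 1, 2, 3}
	y := [4]uint64{10, 11, 12, 13}
	fmt.Println(concatPermute(x, y, [4]uint64{7, 0, 5, 2})) // [13 0 11 2]
}
```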

func (Uint64x4) ConvertToFloat32

func (x Uint64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PSY, CPU Feature: AVX512

func (Uint64x4) ConvertToFloat64

func (x Uint64x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x4) Equal

func (x Uint64x4) Equal(y Uint64x4) Mask64x4

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX2

func (Uint64x4) Expand

func (x Uint64x4) Expand(mask Mask64x4) Uint64x4

Expand takes the elements packed into the low part of x and distributes them, in order, to the positions where mask is true, from the lowest position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512
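A scalar model of Expand in plain Go. As with Compress, the text does not say what appears where mask is false; this sketch writes zero there, which is an assumption. expand is a hypothetical helper.

```go
package main

import "fmt"

// expand models Uint64x4.Expand: the low elements of x are distributed, in
// order, to the positions where mask is true. Positions where mask is false
// are zero here; that is an assumption of this model.
func expand(x [4]uint64, mask [4]bool) [4]uint64 {
	var r [4]uint64
	n := 0
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	packed := [4]uint64{2, 4, 0, 0}
	fmt.Println(expand(packed, [4]bool{false, true, false, true})) // [0 2 0 4]
}
```

With the same mask, Expand puts back the elements that Compress packed, so the two can bracket work done on the packed form.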

func (Uint64x4) GetHi

func (x Uint64x4) GetHi() Uint64x2

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint64x4) GetLo

func (x Uint64x4) GetLo() Uint64x2

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint64x4) Greater

func (x Uint64x4) Greater(y Uint64x4) Mask64x4

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature: AVX2

func (Uint64x4) GreaterEqual

func (x Uint64x4) GreaterEqual(y Uint64x4) Mask64x4

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX2

func (Uint64x4) InterleaveHiGrouped

func (x Uint64x4) InterleaveHiGrouped(y Uint64x4) Uint64x4

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX2

func (Uint64x4) InterleaveLoGrouped

func (x Uint64x4) InterleaveLoGrouped(y Uint64x4) Uint64x4

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX2
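The "Grouped" interleaves work within each 128-bit subvector, not across the whole 256-bit vector. For example, InterleaveLoGrouped does not produce {x[0], y[0], x[1], y[1]}. A scalar model in plain Go, where interleaveGrouped is a hypothetical helper covering both methods:

```go
package main

import "fmt"

// interleaveGrouped models InterleaveLoGrouped (hi == false) and
// InterleaveHiGrouped (hi == true) for Uint64x4. Each 128-bit group holds two
// uint64s, so its "low half" is one element and its "high half" the other.
func interleaveGrouped(x, y [4]uint64, hi bool) [4]uint64 {
	off := 0
	if hi {
		off = 1
	}
	var r [4]uint64
	for g := 0; g < 4; g += 2 {
		r[g], r[g+1] = x[g+off], y[g+off]
	}
	return r
}

func main() {
	x := [4]uint64{0, 1, 2, 3}
	y := [4]uint64{10, 11, 12, 13}
	fmt.Println(interleaveGrouped(x, y, false)) // [0 10 2 12]
	fmt.Println(interleaveGrouped(x, y, true))  // [1 11 3 13]
}
```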

func (Uint64x4) IsZero

func (x Uint64x4) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint64x4) LeadingZeros

func (x Uint64x4) LeadingZeros() Uint64x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x4) Len

func (x Uint64x4) Len() int

Len returns the number of elements in a Uint64x4.

func (Uint64x4) Less

func (x Uint64x4) Less(y Uint64x4) Mask64x4

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Uint64x4) LessEqual

func (x Uint64x4) LessEqual(y Uint64x4) Mask64x4

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Uint64x4) Masked

func (x Uint64x4) Masked(mask Mask64x4) Uint64x4

Masked returns x but with elements zeroed where mask is false.

func (Uint64x4) Max

func (x Uint64x4) Max(y Uint64x4) Uint64x4

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x4) Merge

func (x Uint64x4) Merge(y Uint64x4, mask Mask64x4) Uint64x4

Merge returns x but with elements set to y where mask is false.

func (Uint64x4) Min

func (x Uint64x4) Min(y Uint64x4) Uint64x4

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x4) Mul

func (x Uint64x4) Mul(y Uint64x4) Uint64x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x4) Not

func (x Uint64x4) Not() Uint64x4

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Uint64x4) NotEqual

func (x Uint64x4) NotEqual(y Uint64x4) Mask64x4

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Uint64x4) OnesCount

func (x Uint64x4) OnesCount() Uint64x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x4) Or

func (x Uint64x4) Or(y Uint64x4) Uint64x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint64x4) Permute

func (x Uint64x4) Permute(indices Uint64x4) Uint64x4

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[3]]}. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512

func (Uint64x4) RotateAllLeft

func (x Uint64x4) RotateAllLeft(shift uint8) Uint64x4

RotateAllLeft rotates each element to the left by the number of bits specified by shift (an immediate operand).

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x4) RotateAllRight

func (x Uint64x4) RotateAllRight(shift uint8) Uint64x4

RotateAllRight rotates each element to the right by the number of bits specified by shift (an immediate operand).

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x4) RotateLeft

func (x Uint64x4) RotateLeft(y Uint64x4) Uint64x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x4) RotateRight

func (x Uint64x4) RotateRight(y Uint64x4) Uint64x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Uint64x4) SaturateToUint16

func (x Uint64x4) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x4) SaturateToUint32

func (x Uint64x4) SaturateToUint32() Uint32x4

SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQD, CPU Feature: AVX512

func (Uint64x4) Select128FromPair

func (x Uint64x4) Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
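The example above can be reproduced with a scalar Go model that treats x and y as four 128-bit chunks. select128FromPair is a hypothetical helper, and arrays stand in for Uint64x4.

```go
package main

import "fmt"

// select128FromPair models Uint64x4.Select128FromPair: x and y are viewed as
// four 128-bit chunks (0 and 1 from x, 2 and 3 from y), and the result is
// chunk lo followed by chunk hi. lo or hi above 3 panics in this model.
func select128FromPair(x [4]uint64, lo, hi uint8, y [4]uint64) [4]uint64 {
	chunks := [4][2]uint64{{x[0], x[1]}, {x[2], x[3]}, {y[0], y[1]}, {y[2], y[3]}}
	l, h := chunks[lo], chunks[hi]
	return [4]uint64{l[0], l[1], h[0], h[1]}
}

func main() {
	x := [4]uint64{40, 41, 50, 51}
	y := [4]uint64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```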

func (Uint64x4) SelectFromPairGrouped

func (x Uint64x4) SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4

SelectFromPairGrouped returns, for each of the two 128-bit halves of x and y, two elements selected from the corresponding halves of x and y, where selector values 0-1 specify elements 0-1 of x's half and values 2-3 specify elements 0-1 of y's half. When the selectors are constants, the selection can be implemented in a single instruction.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX

func (Uint64x4) SetHi

func (x Uint64x4) SetHi(y Uint64x2) Uint64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint64x4) SetLo

func (x Uint64x4) SetLo(y Uint64x2) Uint64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint64x4) ShiftAllLeft

func (x Uint64x4) ShiftAllLeft(y uint64) Uint64x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX2

func (Uint64x4) ShiftAllLeftConcat

func (x Uint64x4) ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of the corresponding element of y into the emptied lower bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Uint64x4) ShiftAllRight

func (x Uint64x4) ShiftAllRight(y uint64) Uint64x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX2

func (Uint64x4) ShiftAllRightConcat

func (x Uint64x4) ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of the corresponding element of y into the emptied upper bits of the shifted x.

Performance is better when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Uint64x4) ShiftLeft

func (x Uint64x4) ShiftLeft(y Uint64x4) Uint64x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Uint64x4) ShiftLeftConcat

func (x Uint64x4) ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the upper bits of the corresponding element of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x4) ShiftRight

func (x Uint64x4) ShiftRight(y Uint64x4) Uint64x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX2

func (Uint64x4) ShiftRightConcat

func (x Uint64x4) ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the lower bits of the corresponding element of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x4) Store

func (x Uint64x4) Store(y *[4]uint64)

Store stores a Uint64x4 to an array.

func (Uint64x4) StoreMasked

func (x Uint64x4) StoreMasked(y *[4]uint64, mask Mask64x4)

StoreMasked stores a Uint64x4 to an array, writing only the elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Uint64x4) StoreSlice

func (x Uint64x4) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 4 uint64s.

func (Uint64x4) StoreSlicePart

func (x Uint64x4) StoreSlicePart(s []uint64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Uint64x4) String

func (x Uint64x4) String() string

String returns a string representation of SIMD vector x.

func (Uint64x4) Sub

func (x Uint64x4) Sub(y Uint64x4) Uint64x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX2

func (Uint64x4) TruncateToUint16

func (x Uint64x4) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Uint64x4) TruncateToUint32

func (x Uint64x4) TruncateToUint32() Uint32x4

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x4) TruncateToUint8

func (x Uint64x4) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector, and its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Uint64x4) Xor

func (x Uint64x4) Xor(y Uint64x4) Uint64x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint64x8

type Uint64x8 struct {
	// contains filtered or unexported fields
}

Uint64x8 is a 512-bit SIMD vector of 8 uint64.

func BroadcastUint64x8

func BroadcastUint64x8(x uint64) Uint64x8

BroadcastUint64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadMaskedUint64x8

func LoadMaskedUint64x8(y *[8]uint64, mask Mask64x8) Uint64x8

LoadMaskedUint64x8 loads a Uint64x8 from an array, reading only the elements enabled by mask.

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func LoadUint64x8

func LoadUint64x8(y *[8]uint64) Uint64x8

LoadUint64x8 loads a Uint64x8 from an array.

func LoadUint64x8Slice

func LoadUint64x8Slice(s []uint64) Uint64x8

LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s.

func LoadUint64x8SlicePart

func LoadUint64x8SlicePart(s []uint64) Uint64x8

LoadUint64x8SlicePart loads a Uint64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.

func (Uint64x8) Add

func (x Uint64x8) Add(y Uint64x8) Uint64x8

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX512

func (Uint64x8) And

func (x Uint64x8) And(y Uint64x8) Uint64x8

And performs a bitwise AND operation between two vectors.

Asm: VPANDQ, CPU Feature: AVX512

func (Uint64x8) AndNot

func (x Uint64x8) AndNot(y Uint64x8) Uint64x8

AndNot computes the bitwise x &^ y (x AND NOT y).

Asm: VPANDNQ, CPU Feature: AVX512

func (Uint64x8) AsFloat32x16

func (from Uint64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint64x8 to Float32x16 by reinterpreting the bits.

func (Uint64x8) AsFloat64x8

func (from Uint64x8) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint64x8 to Float64x8 by reinterpreting the bits.

func (Uint64x8) AsInt16x32

func (from Uint64x8) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint64x8 to Int16x32 by reinterpreting the bits.

func (Uint64x8) AsInt32x16

func (from Uint64x8) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint64x8 to Int32x16 by reinterpreting the bits.

func (Uint64x8) AsInt64x8

func (from Uint64x8) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint64x8 to Int64x8 by reinterpreting the bits.

func (Uint64x8) AsInt8x64

func (from Uint64x8) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint64x8 to Int8x64 by reinterpreting the bits.

func (Uint64x8) AsUint16x32

func (from Uint64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Uint64x8 to Uint16x32 by reinterpreting the bits.

func (Uint64x8) AsUint32x16

func (from Uint64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Uint64x8 to Uint32x16 by reinterpreting the bits.

func (Uint64x8) AsUint8x64

func (from Uint64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint64x8 to Uint8x64 by reinterpreting the bits.

func (Uint64x8) CarrylessMultiplyGrouped

func (x Uint64x8) CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8

CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the four 128-bit lanes of x and y, depending on the values of a and b, and returns the four 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ

func (Uint64x8) Compress

func (x Uint64x8) Compress(mask Mask64x8) Uint64x8

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Uint64x8) ConcatPermute

func (x Uint64x8) ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8

ConcatPermute performs a full permutation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[7]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices, that is, the low 4 bits (values 0-15).

Asm: VPERMI2Q, CPU Feature: AVX512

func (Uint64x8) ConvertToFloat32

func (x Uint64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PS, CPU Feature: AVX512

func (Uint64x8) ConvertToFloat64

func (x Uint64x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x8) Equal

func (x Uint64x8) Equal(y Uint64x8) Mask64x8

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX512

func (Uint64x8) Expand

func (x Uint64x8) Expand(mask Mask64x8) Uint64x8

Expand takes the elements packed into the low part of x and distributes them, in order, to the positions where mask is true, from the lowest position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Uint64x8) GetHi

func (x Uint64x8) GetHi() Uint64x4

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint64x8) GetLo

func (x Uint64x8) GetLo() Uint64x4

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint64x8) Greater

func (x Uint64x8) Greater(y Uint64x8) Mask64x8

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) GreaterEqual

func (x Uint64x8) GreaterEqual(y Uint64x8) Mask64x8

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) InterleaveHiGrouped

func (x Uint64x8) InterleaveHiGrouped(y Uint64x8) Uint64x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX512

func (Uint64x8) InterleaveLoGrouped

func (x Uint64x8) InterleaveLoGrouped(y Uint64x8) Uint64x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX512

func (Uint64x8) LeadingZeros

func (x Uint64x8) LeadingZeros() Uint64x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x8) Len

func (x Uint64x8) Len() int

Len returns the number of elements in a Uint64x8.

func (Uint64x8) Less

func (x Uint64x8) Less(y Uint64x8) Mask64x8

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) LessEqual

func (x Uint64x8) LessEqual(y Uint64x8) Mask64x8

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) Masked

func (x Uint64x8) Masked(mask Mask64x8) Uint64x8

Masked returns x but with elements zeroed where mask is false.

func (Uint64x8) Max

func (x Uint64x8) Max(y Uint64x8) Uint64x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x8) Merge

func (x Uint64x8) Merge(y Uint64x8, mask Mask64x8) Uint64x8

Merge returns x but with elements set to y where mask is false.

func (Uint64x8) Min

func (x Uint64x8) Min(y Uint64x8) Uint64x8

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x8) Mul

func (x Uint64x8) Mul(y Uint64x8) Uint64x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x8) Not

func (x Uint64x8) Not() Uint64x8

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Uint64x8) NotEqual

func (x Uint64x8) NotEqual(y Uint64x8) Mask64x8

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) OnesCount

func (x Uint64x8) OnesCount() Uint64x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x8) Or

func (x Uint64x8) Or(y Uint64x8) Uint64x8

Or performs a bitwise OR operation between two vectors.

Asm: VPORQ, CPU Feature: AVX512

func (Uint64x8) Permute

func (x Uint64x8) Permute(indices Uint64x8) Uint64x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512
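The index arithmetic in Permute can be made concrete with a plain-Go scalar model. The helper permuteUint64x8 is hypothetical (not part of archsimd) and uses arrays in place of vectors so that it runs without GOEXPERIMENT=simd; it sketches the documented semantics, not the package's implementation.

```go
package main

// permuteUint64x8 is a hypothetical scalar model of Uint64x8.Permute.
// Result element i is x[indices[i]&7]: only the low 3 bits of each index matter.
func permuteUint64x8(x, indices [8]uint64) [8]uint64 {
	var r [8]uint64
	for i := range r {
		r[i] = x[indices[i]&7]
	}
	return r
}
```

For example, indices {7, 6, 5, 4, 3, 2, 1, 0} reverses the elements, and an index of 9 behaves like 1.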

func (Uint64x8) RotateAllLeft

func (x Uint64x8) RotateAllLeft(shift uint8) Uint64x8

RotateAllLeft rotates each element to the left by the number of bits specified by shift.

Performance is best when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x8) RotateAllRight

func (x Uint64x8) RotateAllRight(shift uint8) Uint64x8

RotateAllRight rotates each element to the right by the number of bits specified by shift.

Performance is best when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x8) RotateLeft

func (x Uint64x8) RotateLeft(y Uint64x8) Uint64x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x8) RotateRight

func (x Uint64x8) RotateRight(y Uint64x8) Uint64x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512
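RotateLeft and RotateRight can be modeled lane by lane with the standard math/bits package. The helper names below are hypothetical, and the sketch assumes each count is reduced modulo 64, which is how the Intel manual describes VPROLVQ and VPRORVQ.

```go
package main

import "math/bits"

// rotateLeftUint64x8 is a hypothetical model of Uint64x8.RotateLeft: each
// element of x is rotated left by the corresponding element of y (mod 64).
func rotateLeftUint64x8(x, y [8]uint64) [8]uint64 {
	var r [8]uint64
	for i := range r {
		r[i] = bits.RotateLeft64(x[i], int(y[i]%64))
	}
	return r
}

// rotateRightUint64x8 models Uint64x8.RotateRight; bits.RotateLeft64 with a
// negative count rotates to the right.
func rotateRightUint64x8(x, y [8]uint64) [8]uint64 {
	var r [8]uint64
	for i := range r {
		r[i] = bits.RotateLeft64(x[i], -int(y[i]%64))
	}
	return r
}
```

Unlike a shift, a rotation loses no bits, so rotating right by the same counts restores the original vector.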

func (Uint64x8) SaturateToUint16

func (x Uint64x8) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16 with unsigned saturation: values greater than 0xFFFF become 0xFFFF.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x8) SaturateToUint32

func (x Uint64x8) SaturateToUint32() Uint32x8

SaturateToUint32 converts element values to uint32 with unsigned saturation: values greater than 0xFFFFFFFF become 0xFFFFFFFF.

Asm: VPMOVUSQD, CPU Feature: AVX512
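The difference between the saturating and truncating conversions (for example SaturateToUint16 versus TruncateToUint16) is easiest to see on a single lane. The helpers below are hypothetical scalar models, not archsimd APIs.

```go
package main

import "math"

// saturateToUint16 models one lane of Uint64x8.SaturateToUint16: values
// above math.MaxUint16 are clamped to math.MaxUint16.
func saturateToUint16(v uint64) uint16 {
	if v > math.MaxUint16 {
		return math.MaxUint16
	}
	return uint16(v)
}

// truncateToUint16 models one lane of Uint64x8.TruncateToUint16: the upper
// 48 bits are discarded, exactly like Go's integer conversion.
func truncateToUint16(v uint64) uint16 {
	return uint16(v)
}
```

For 70000, saturation yields 65535, while truncation keeps only the low 16 bits and yields 4464.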

func (Uint64x8) SelectFromPairGrouped

func (x Uint64x8) SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8

SelectFromPairGrouped operates on each of the four 128-bit groups of x and y independently: within each group, it selects two elements from that group's candidates, where selector values 0-1 choose elements 0-1 of x and values 2-3 choose elements 0-1 of y. When the selectors are constants, the selection can be implemented with a single instruction.

If the selectors are not constants, this translates to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Uint64x8) SetHi

func (x Uint64x8) SetHi(y Uint64x4) Uint64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint64x8) SetLo

func (x Uint64x8) SetLo(y Uint64x4) Uint64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint64x8) ShiftAllLeft

func (x Uint64x8) ShiftAllLeft(y uint64) Uint64x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX512

func (Uint64x8) ShiftAllLeftConcat

func (x Uint64x8) ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y into the emptied lower bits of the shifted x.

Performance is best when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
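ShiftAllLeftConcat is a per-lane funnel shift: conceptually, x and y form a 128-bit value x:y that is shifted left, and the upper 64 bits are kept. The hypothetical helper below models one lane and assumes the count is reduced modulo 64 (6 bits), as the Intel manual specifies for VPSHLDQ on 64-bit elements.

```go
package main

// shiftLeftConcat64 is a hypothetical model of one lane of
// Uint64x8.ShiftAllLeftConcat: it returns the upper 64 bits of the 128-bit
// value x:y shifted left by shift&63 bits.
func shiftLeftConcat64(x, y uint64, shift uint8) uint64 {
	s := uint(shift & 63) // assumption: 6-bit count for 64-bit lanes
	// Go defines y>>64 as 0, so s == 0 returns x unchanged.
	return x<<s | y>>(64-s)
}
```

For instance, shifting x = 0xAB left by 8 while feeding in y = 0xCD00000000000000 yields 0xABCD.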

func (Uint64x8) ShiftAllRight

func (x Uint64x8) ShiftAllRight(y uint64) Uint64x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX512

func (Uint64x8) ShiftAllRightConcat

func (x Uint64x8) ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y into the emptied upper bits of the shifted x.

Performance is best when shift is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
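ShiftAllRightConcat is the mirror image: as x moves right, the low bits of y enter from the top. The hypothetical helper below models one lane, again assuming a count reduced modulo 64 as described for VPSHRDQ.

```go
package main

// shiftRightConcat64 is a hypothetical model of one lane of
// Uint64x8.ShiftAllRightConcat: x is shifted right by shift&63 bits and the
// vacated upper bits are filled with the low bits of y.
func shiftRightConcat64(x, y uint64, shift uint8) uint64 {
	s := uint(shift & 63) // assumption: 6-bit count for 64-bit lanes
	// Go defines y<<64 as 0, so s == 0 returns x unchanged.
	return x>>s | y<<(64-s)
}
```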

func (Uint64x8) ShiftLeft

func (x Uint64x8) ShiftLeft(y Uint64x8) Uint64x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX512

func (Uint64x8) ShiftLeftConcat

func (x Uint64x8) ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the upper bits of z into the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x8) ShiftRight

func (x Uint64x8) ShiftRight(y Uint64x8) Uint64x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX512

func (Uint64x8) ShiftRightConcat

func (x Uint64x8) ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding element of y (only the lower 6 bits are used), and then copies the lower bits of z into the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x8) Store

func (x Uint64x8) Store(y *[8]uint64)

Store stores a Uint64x8 to an array.

func (Uint64x8) StoreMasked

func (x Uint64x8) StoreMasked(y *[8]uint64, mask Mask64x8)

StoreMasked stores a Uint64x8 to an array, writing only the elements enabled by mask.

Asm: VMOVDQU64, CPU Feature: AVX512

func (Uint64x8) StoreSlice

func (x Uint64x8) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 8 uint64s.

func (Uint64x8) StoreSlicePart

func (x Uint64x8) StoreSlicePart(s []uint64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Uint64x8) String

func (x Uint64x8) String() string

String returns a string representation of SIMD vector x.

func (Uint64x8) Sub

func (x Uint64x8) Sub(y Uint64x8) Uint64x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX512

func (Uint64x8) TruncateToUint16

func (x Uint64x8) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Uint64x8) TruncateToUint32

func (x Uint64x8) TruncateToUint32() Uint32x8

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x8) TruncateToUint8

func (x Uint64x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. The results are packed into the low 8 elements of the returned vector, and its upper 8 elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Uint64x8) Xor

func (x Uint64x8) Xor(y Uint64x8) Uint64x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORQ, CPU Feature: AVX512

type Uint8x16

type Uint8x16 struct {
	// contains filtered or unexported fields
}

Uint8x16 is a 128-bit SIMD vector of 16 uint8.

func BroadcastUint8x16

func BroadcastUint8x16(x uint8) Uint8x16

BroadcastUint8x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint8x16

func LoadUint8x16(y *[16]uint8) Uint8x16

LoadUint8x16 loads a Uint8x16 from an array.

func LoadUint8x16Slice

func LoadUint8x16Slice(s []uint8) Uint8x16

LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s.

func LoadUint8x16SlicePart

func LoadUint8x16SlicePart(s []uint8) Uint8x16

LoadUint8x16SlicePart loads a Uint8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.
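The SlicePart load and store variants behave like Go's built-in copy between a slice and a zero-initialized 16-byte buffer. The helper names below are hypothetical, and the [16]uint8 arrays stand in for Uint8x16.

```go
package main

// loadUint8x16SlicePart is a hypothetical model of LoadUint8x16SlicePart:
// it copies up to 16 bytes from s and leaves the remaining lanes zero.
func loadUint8x16SlicePart(s []uint8) [16]uint8 {
	var v [16]uint8
	copy(v[:], s) // copies min(len(s), 16) bytes
	return v
}

// storeUint8x16SlicePart models Uint8x16.StoreSlicePart: it writes as many
// of the 16 lanes as fit into s.
func storeUint8x16SlicePart(v [16]uint8, s []uint8) {
	copy(s, v[:]) // copies min(len(s), 16) bytes
}
```

This makes the tail of a loop over a byte slice straightforward: the final, shorter chunk is loaded zero-padded and stored back without writing past the end of the slice.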

func (Uint8x16) AESDecryptLastRound

func (x Uint8x16) AESDecryptLastRound(y Uint32x4) Uint8x16

AESDecryptLastRound performs the last round of AES decryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y).

Asm: VAESDECLAST, CPU Feature: AVX, AES

func (Uint8x16) AESDecryptOneRound

func (x Uint8x16) AESDecryptOneRound(y Uint32x4) Uint8x16

AESDecryptOneRound performs one round of AES decryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y).

Asm: VAESDEC, CPU Feature: AVX, AES

func (Uint8x16) AESEncryptLastRound

func (x Uint8x16) AESEncryptLastRound(y Uint32x4) Uint8x16

AESEncryptLastRound performs the last round of AES encryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y).

Asm: VAESENCLAST, CPU Feature: AVX, AES

func (Uint8x16) AESEncryptOneRound

func (x Uint8x16) AESEncryptOneRound(y Uint32x4) Uint8x16

AESEncryptOneRound performs one round of AES encryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y).

Asm: VAESENC, CPU Feature: AVX, AES

func (Uint8x16) Add

func (x Uint8x16) Add(y Uint8x16) Uint8x16

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX

func (Uint8x16) AddSaturated

func (x Uint8x16) AddSaturated(y Uint8x16) Uint8x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX
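Unsigned saturation means results clamp at the ends of the uint8 range instead of wrapping around. The hypothetical helpers below model one lane of AddSaturated and of the corresponding SubSaturated method in plain Go.

```go
package main

// addSaturatedUint8 models one lane of Uint8x16.AddSaturated: the sum is
// clamped to 255 instead of wrapping around.
func addSaturatedUint8(a, b uint8) uint8 {
	s := uint16(a) + uint16(b) // widen so the sum cannot overflow
	if s > 255 {
		return 255
	}
	return uint8(s)
}

// subSaturatedUint8 models one lane of Uint8x16.SubSaturated: the difference
// is clamped to 0 instead of wrapping around.
func subSaturatedUint8(a, b uint8) uint8 {
	if b > a {
		return 0
	}
	return a - b
}
```

With plain Add, 200 + 100 would wrap to 44; with AddSaturated it stays at 255.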

func (Uint8x16) And

func (x Uint8x16) And(y Uint8x16) Uint8x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint8x16) AndNot

func (x Uint8x16) AndNot(y Uint8x16) Uint8x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint8x16) AsFloat32x4

func (from Uint8x16) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint8x16 to Float32x4 by reinterpreting the underlying bits.

func (Uint8x16) AsFloat64x2

func (from Uint8x16) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint8x16 to Float64x2 by reinterpreting the underlying bits.

func (Uint8x16) AsInt16x8

func (from Uint8x16) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint8x16 to Int16x8 by reinterpreting the underlying bits.

func (Uint8x16) AsInt32x4

func (from Uint8x16) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint8x16 to Int32x4 by reinterpreting the underlying bits.

func (Uint8x16) AsInt64x2

func (from Uint8x16) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint8x16 to Int64x2 by reinterpreting the underlying bits.

func (Uint8x16) AsInt8x16

func (from Uint8x16) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint8x16 to Int8x16 by reinterpreting the underlying bits.

func (Uint8x16) AsUint16x8

func (from Uint8x16) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint8x16 to Uint16x8 by reinterpreting the underlying bits.

func (Uint8x16) AsUint32x4

func (from Uint8x16) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint8x16 to Uint32x4 by reinterpreting the underlying bits.

func (Uint8x16) AsUint64x2

func (from Uint8x16) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint8x16 to Uint64x2 by reinterpreting the underlying bits.

func (Uint8x16) Average

func (x Uint8x16) Average(y Uint8x16) Uint8x16

Average computes the average of corresponding elements, rounded up: (x + y + 1) >> 1, computed without intermediate overflow.

Asm: VPAVGB, CPU Feature: AVX
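The rounding rule of Average can be checked with a one-lane model. The helper averageUint8 is hypothetical; it widens to uint16 so that the intermediate sum cannot overflow, which mirrors how VPAVGB computes (a + b + 1) >> 1.

```go
package main

// averageUint8 models one lane of Uint8x16.Average: (a + b + 1) >> 1,
// computed in a wider type so the sum never overflows.
func averageUint8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}
```

Halves round up, so the average of 1 and 2 is 2, and the average of 255 and 255 stays 255.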

func (Uint8x16) Broadcast128

func (x Uint8x16) Broadcast128() Uint8x16

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Uint8x16) Broadcast256

func (x Uint8x16) Broadcast256() Uint8x32

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Uint8x16) Broadcast512

func (x Uint8x16) Broadcast512() Uint8x64

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX512

func (Uint8x16) Compress

func (x Uint8x16) Compress(mask Mask8x16) Uint8x16

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
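A scalar model makes the packing order of Compress explicit. The helper compressUint8x16 is hypothetical, uses a [16]bool in place of Mask8x16, and assumes the remaining upper lanes are zero; the description above does not state what those lanes contain.

```go
package main

// compressUint8x16 is a hypothetical model of Uint8x16.Compress: elements of
// x whose mask bit is set are packed, in order, into the lowest lanes.
// Assumption: lanes past the last selected element are zero.
func compressUint8x16(x [16]uint8, mask [16]bool) [16]uint8 {
	var r [16]uint8
	n := 0 // next output lane
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}
```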

func (Uint8x16) ConcatPermute

func (x Uint8x16) ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[15]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 5 bits (values 0-31) of each element of indices are used, enough to address any element of xy.

Asm: VPERMI2B, CPU Feature: AVX512VBMI
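ConcatPermute behaves like a lookup in a 32-entry table built from x and y. The helper concatPermuteUint8x16 below is a hypothetical scalar model of that description, with arrays standing in for vectors.

```go
package main

// concatPermuteUint8x16 is a hypothetical model of Uint8x16.ConcatPermute:
// x fills table lanes 0-15, y fills lanes 16-31, and result lane i is
// table[indices[i]&31].
func concatPermuteUint8x16(x, y, indices [16]uint8) [16]uint8 {
	var table [32]uint8
	copy(table[:16], x[:])
	copy(table[16:], y[:])
	var r [16]uint8
	for i, idx := range indices {
		r[i] = table[idx&31] // only the low 5 bits select a table entry
	}
	return r
}
```

An index of 47 (binary 101111) is therefore treated as 15 and selects the last element of x.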

func (Uint8x16) ConcatShiftBytesRight

func (x Uint8x16) ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16

ConcatShiftBytesRight concatenates x and y and shifts the result right by constant bytes. The result is the lower half of the shifted concatenation.

Performance is best when constant is a compile-time constant; a non-constant value is translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX

func (Uint8x16) DotProductPairsSaturated

func (x Uint8x16) DotProductPairsSaturated(y Int8x16) Int16x8

DotProductPairsSaturated multiplies each unsigned element of x by the corresponding signed element of y and adds adjacent pairs of products with signed saturation, yielding a vector with half as many elements, each twice the input element size.

Asm: VPMADDUBSW, CPU Feature: AVX
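The mixed signedness and the saturation step of DotProductPairsSaturated are easy to get wrong, so here is a scalar model. The helper name is hypothetical; it treats x as unsigned and y as signed, following the method signature, and clamps each pair sum to the int16 range.

```go
package main

import "math"

// dotProductPairsSaturated is a hypothetical model of
// Uint8x16.DotProductPairsSaturated: unsigned bytes of x are multiplied by
// signed bytes of y, and each adjacent pair of products is summed with
// signed 16-bit saturation.
func dotProductPairsSaturated(x [16]uint8, y [16]int8) [8]int16 {
	var r [8]int16
	for i := range r {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		} else if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		r[i] = int16(sum)
	}
	return r
}
```

A single pair can already overflow int16: 255*127 + 255*127 = 64770, which saturates to 32767.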

func (Uint8x16) Equal

func (x Uint8x16) Equal(y Uint8x16) Mask8x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX

func (Uint8x16) Expand

func (x Uint8x16) Expand(mask Mask8x16) Uint8x16

Expand is the counterpart of Compress: it takes consecutive elements from the low end of x and distributes them, in order, to the positions where mask is true, from the lowest position to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
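A scalar model of Expand shows how packed elements are scattered back out. The helper expandUint8x16 is hypothetical, uses a [16]bool in place of Mask8x16, and assumes that lanes where mask is false are zero, which the description above does not state.

```go
package main

// expandUint8x16 is a hypothetical model of Uint8x16.Expand: consecutive
// elements from the low end of x are placed, in order, into the lanes where
// mask is true. Assumption: lanes where mask is false are zero.
func expandUint8x16(x [16]uint8, mask [16]bool) [16]uint8 {
	var r [16]uint8
	n := 0 // next input element to consume
	for i, set := range mask {
		if set {
			r[i] = x[n]
			n++
		}
	}
	return r
}
```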

func (Uint8x16) ExtendLo2ToUint64x2

func (x Uint8x16) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts the 2 lowest elements of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX

func (Uint8x16) ExtendLo4ToUint32x4

func (x Uint8x16) ExtendLo4ToUint32x4() Uint32x4

ExtendLo4ToUint32x4 converts the 4 lowest elements of x to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX

func (Uint8x16) ExtendLo4ToUint64x4

func (x Uint8x16) ExtendLo4ToUint64x4() Uint64x4

ExtendLo4ToUint64x4 converts the 4 lowest elements of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX2

func (Uint8x16) ExtendLo8ToUint16x8

func (x Uint8x16) ExtendLo8ToUint16x8() Uint16x8

ExtendLo8ToUint16x8 converts the 8 lowest elements of x to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX

func (Uint8x16) ExtendLo8ToUint32x8

func (x Uint8x16) ExtendLo8ToUint32x8() Uint32x8

ExtendLo8ToUint32x8 converts the 8 lowest elements of x to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX2

func (Uint8x16) ExtendLo8ToUint64x8

func (x Uint8x16) ExtendLo8ToUint64x8() Uint64x8

ExtendLo8ToUint64x8 converts the 8 lowest elements of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX512

func (Uint8x16) ExtendToUint16

func (x Uint8x16) ExtendToUint16() Uint16x16

ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX2

func (Uint8x16) ExtendToUint32

func (x Uint8x16) ExtendToUint32() Uint32x16

ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX512

func (Uint8x16) GaloisFieldAffineTransform

func (x Uint8x16) GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, taken in groups of 8 adjacent elements; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

Performance is best when b is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI

func (Uint8x16) GaloisFieldAffineTransformInverse

func (x Uint8x16) GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16

GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8) after replacing each element of x with its multiplicative inverse with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, taken in groups of 8 adjacent elements; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

Performance is best when b is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI

func (Uint8x16) GaloisFieldMul

func (x Uint8x16) GaloisFieldMul(y Uint8x16) Uint8x16

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
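GaloisFieldMul uses the same field as AES, so one lane can be modeled with the classic shift-and-reduce loop. The helper gfMul below is a hypothetical scalar model: it multiplies without carries and reduces by x^8 + x^4 + x^3 + x + 1 (0x11B) whenever the intermediate value overflows 8 bits.

```go
package main

// gfMul is a hypothetical model of one lane of Uint8x16.GaloisFieldMul:
// multiplication in GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for b != 0 {
		if b&1 != 0 {
			p ^= a // addition in GF(2^8) is XOR
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B // reduce: x^8 = x^4 + x^3 + x + 1
		}
		b >>= 1
	}
	return p
}
```

The worked examples in FIPS 197 give {57} • {83} = {c1} and {57} • {13} = {fe}, which this model reproduces.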

func (Uint8x16) GetElem

func (x Uint8x16) GetElem(index uint8) uint8

GetElem retrieves a single constant-indexed element's value.

Performance is best when index is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRB, CPU Feature: AVX512

func (Uint8x16) Greater

func (x Uint8x16) Greater(y Uint8x16) Mask8x16

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature AVX2

func (Uint8x16) GreaterEqual

func (x Uint8x16) GreaterEqual(y Uint8x16) Mask8x16

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature AVX2

func (Uint8x16) IsZero

func (x Uint8x16) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint8x16) Len

func (x Uint8x16) Len() int

Len returns the number of elements in a Uint8x16.

func (Uint8x16) Less

func (x Uint8x16) Less(y Uint8x16) Mask8x16

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature AVX2

func (Uint8x16) LessEqual

func (x Uint8x16) LessEqual(y Uint8x16) Mask8x16

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature AVX2

func (Uint8x16) Masked

func (x Uint8x16) Masked(mask Mask8x16) Uint8x16

Masked returns x but with elements zeroed where mask is false.

func (Uint8x16) Max

func (x Uint8x16) Max(y Uint8x16) Uint8x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX

func (Uint8x16) Merge

func (x Uint8x16) Merge(y Uint8x16, mask Mask8x16) Uint8x16

Merge returns x but with elements set to y where mask is false.
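Masked and Merge are the two basic ways of applying a mask. The hypothetical helpers below model them in plain Go with a [16]bool in place of Mask8x16: Masked zeroes the disabled lanes, and Merge fills them from y.

```go
package main

// maskedUint8x16 is a hypothetical model of Uint8x16.Masked: lanes where
// mask is false are zeroed.
func maskedUint8x16(x [16]uint8, mask [16]bool) [16]uint8 {
	var r [16]uint8
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// mergeUint8x16 models x.Merge(y, mask): lanes where mask is true keep x,
// and lanes where mask is false take y.
func mergeUint8x16(x, y [16]uint8, mask [16]bool) [16]uint8 {
	r := x
	for i := range r {
		if !mask[i] {
			r[i] = y[i]
		}
	}
	return r
}
```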

func (Uint8x16) Min

func (x Uint8x16) Min(y Uint8x16) Uint8x16

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX

func (Uint8x16) Not

func (x Uint8x16) Not() Uint8x16

Not returns the bitwise complement of x.

Emulated, CPU Feature AVX

func (Uint8x16) NotEqual

func (x Uint8x16) NotEqual(y Uint8x16) Mask8x16

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature AVX

func (Uint8x16) OnesCount

func (x Uint8x16) OnesCount() Uint8x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Uint8x16) Or

func (x Uint8x16) Or(y Uint8x16) Uint8x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint8x16) Permute

func (x Uint8x16) Permute(indices Uint8x16) Uint8x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Uint8x16) PermuteOrZero

func (x Uint8x16) PermuteOrZero(indices Int8x16) Uint8x16

PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. The lower four bits of each byte-sized index select an element from x, unless the index's sign bit is set, in which case zero is used instead.

Asm: VPSHUFB, CPU Feature: AVX
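The zeroing rule of PermuteOrZero is what makes it useful as a 16-entry byte lookup table. The helper below is a hypothetical scalar model of the description above: a negative index yields zero, and otherwise only the low four bits of the index are used.

```go
package main

// permuteOrZeroUint8x16 is a hypothetical model of Uint8x16.PermuteOrZero:
// a negative index (sign bit set) produces 0; otherwise the low four bits
// of the index select an element of x.
func permuteOrZeroUint8x16(x [16]uint8, indices [16]int8) [16]uint8 {
	var r [16]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: the lane stays zero
		}
		r[i] = x[idx&15]
	}
	return r
}
```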

func (Uint8x16) SetElem

func (x Uint8x16) SetElem(index uint8, y uint8) Uint8x16

SetElem sets a single constant-indexed element's value.

Performance is best when index is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRB, CPU Feature: AVX

func (Uint8x16) Store

func (x Uint8x16) Store(y *[16]uint8)

Store stores a Uint8x16 to an array.

func (Uint8x16) StoreSlice

func (x Uint8x16) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 16 uint8s.

func (Uint8x16) StoreSlicePart

func (x Uint8x16) StoreSlicePart(s []uint8)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Uint8x16) String

func (x Uint8x16) String() string

String returns a string representation of SIMD vector x.

func (Uint8x16) Sub

func (x Uint8x16) Sub(y Uint8x16) Uint8x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX

func (Uint8x16) SubSaturated

func (x Uint8x16) SubSaturated(y Uint8x16) Uint8x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX

func (Uint8x16) SumAbsDiff

func (x Uint8x16) SumAbsDiff(y Uint8x16) Uint16x8

SumAbsDiff computes the sum of absolute differences between x and y over each group of 8 adjacent bytes. The result is a vector of 16-bit elements in which element 4*n holds the sum for the n-th group; the other elements are zeroed. Each group's sum is the L1 distance between the corresponding 8-byte groups of x and y.

Asm: VPSADBW, CPU Feature: AVX
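Where the sums land in the result is the least obvious part of SumAbsDiff. The helper below is a hypothetical scalar model of the description above: each 8-byte group produces one sum, stored in 16-bit lane 4*n, and the maximum possible sum (8 * 255 = 2040) fits comfortably in a uint16.

```go
package main

// sumAbsDiffUint8x16 is a hypothetical model of Uint8x16.SumAbsDiff: for
// each 8-byte group n it sums |x[i] - y[i]| and stores the total in lane
// 4*n of the result; all other lanes are zero.
func sumAbsDiffUint8x16(x, y [16]uint8) [8]uint16 {
	var r [8]uint16
	for n := 0; n < 2; n++ {
		var sum uint16
		for i := 8 * n; i < 8*n+8; i++ {
			if x[i] > y[i] {
				sum += uint16(x[i] - y[i])
			} else {
				sum += uint16(y[i] - x[i])
			}
		}
		r[4*n] = sum
	}
	return r
}
```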

func (Uint8x16) Xor

func (x Uint8x16) Xor(y Uint8x16) Uint8x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint8x32

type Uint8x32 struct {
	// contains filtered or unexported fields
}

Uint8x32 is a 256-bit SIMD vector of 32 uint8.

func BroadcastUint8x32

func BroadcastUint8x32(x uint8) Uint8x32

BroadcastUint8x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint8x32

func LoadUint8x32(y *[32]uint8) Uint8x32

LoadUint8x32 loads a Uint8x32 from an array.

func LoadUint8x32Slice

func LoadUint8x32Slice(s []uint8) Uint8x32

LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s.

func LoadUint8x32SlicePart

func LoadUint8x32SlicePart(s []uint8) Uint8x32

LoadUint8x32SlicePart loads a Uint8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.

func (Uint8x32) AESDecryptLastRound

func (x Uint8x32) AESDecryptLastRound(y Uint32x8) Uint8x32

AESDecryptLastRound performs the last round of AES decryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y). Each 128-bit half of x is processed as an independent state with the corresponding half of y.

Asm: VAESDECLAST, CPU Feature: AVX512VAES

func (Uint8x32) AESDecryptOneRound

func (x Uint8x32) AESDecryptOneRound(y Uint32x8) Uint8x32

AESDecryptOneRound performs one round of AES decryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y). Each 128-bit half of x is processed as an independent state with the corresponding half of y.

Asm: VAESDEC, CPU Feature: AVX512VAES

func (Uint8x32) AESEncryptLastRound

func (x Uint8x32) AESEncryptLastRound(y Uint32x8) Uint8x32

AESEncryptLastRound performs the last round of AES encryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y). Each 128-bit half of x is processed as an independent state with the corresponding half of y.

Asm: VAESENCLAST, CPU Feature: AVX512VAES

func (Uint8x32) AESEncryptOneRound

func (x Uint8x32) AESEncryptOneRound(y Uint32x8) Uint8x32

AESEncryptOneRound performs one round of AES encryption as defined in FIPS 197. x is the state array, whose elements from low to high index are s00, s10, s20, s30, s01, ..., s33. y is the round key, the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y). Each 128-bit half of x is processed as an independent state with the corresponding half of y.

Asm: VAESENC, CPU Feature: AVX512VAES

func (Uint8x32) Add

func (x Uint8x32) Add(y Uint8x32) Uint8x32

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX2

func (Uint8x32) AddSaturated

func (x Uint8x32) AddSaturated(y Uint8x32) Uint8x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX2

func (Uint8x32) And

func (x Uint8x32) And(y Uint8x32) Uint8x32

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint8x32) AndNot

func (x Uint8x32) AndNot(y Uint8x32) Uint8x32

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint8x32) AsFloat32x8

func (from Uint8x32) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint8x32 to Float32x8 by reinterpreting the underlying bits.

func (Uint8x32) AsFloat64x4

func (from Uint8x32) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint8x32 to Float64x4 by reinterpreting the underlying bits.

func (Uint8x32) AsInt16x16

func (from Uint8x32) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint8x32 to Int16x16 by reinterpreting the underlying bits.

func (Uint8x32) AsInt32x8

func (from Uint8x32) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint8x32 to Int32x8 by reinterpreting the underlying bits.

func (Uint8x32) AsInt64x4

func (from Uint8x32) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint8x32 to Int64x4 by reinterpreting the underlying bits.

func (Uint8x32) AsInt8x32

func (from Uint8x32) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint8x32 to Int8x32 by reinterpreting the underlying bits.

func (Uint8x32) AsUint16x16

func (from Uint8x32) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint8x32 to Uint16x16 by reinterpreting the underlying bits.

func (Uint8x32) AsUint32x8

func (from Uint8x32) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Uint8x32 to Uint32x8 by reinterpreting the underlying bits.

func (Uint8x32) AsUint64x4

func (from Uint8x32) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Uint8x32 to Uint64x4 by reinterpreting the underlying bits.

func (Uint8x32) Average

func (x Uint8x32) Average(y Uint8x32) Uint8x32

Average computes the average of corresponding elements, rounded up: (x + y + 1) >> 1, computed without intermediate overflow.

Asm: VPAVGB, CPU Feature: AVX2

func (Uint8x32) Compress

func (x Uint8x32) Compress(mask Mask8x32) Uint8x32

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Uint8x32) ConcatPermute

func (x Uint8x32) ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[31]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low 6 bits (values 0-63) of each element of indices are used, enough to address any element of xy.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Uint8x32) ConcatShiftBytesRightGrouped

func (x Uint8x32) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32

ConcatShiftBytesRightGrouped operates independently on each 16-byte group: within each group, it concatenates the corresponding bytes of x and y, shifts the result right by constant bytes, and keeps the lower half of the shifted concatenation.

Performance is best when constant is a compile-time constant; a non-constant value is translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX2

func (Uint8x32) DotProductPairsSaturated

func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16

DotProductPairsSaturated multiplies each unsigned element of x by the corresponding signed element of y and adds adjacent pairs of products with signed saturation, yielding a vector with half as many elements, each twice the input element size.

Asm: VPMADDUBSW, CPU Feature: AVX2

func (Uint8x32) Equal

func (x Uint8x32) Equal(y Uint8x32) Mask8x32

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX2

func (Uint8x32) Expand

func (x Uint8x32) Expand(mask Mask8x32) Uint8x32

Expand is the counterpart of Compress: it takes consecutive elements from the low end of x and distributes them, in order, to the positions where mask is true, from the lowest position to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Uint8x32) ExtendToUint16

func (x Uint8x32) ExtendToUint16() Uint16x32

ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX512

func (Uint8x32) GaloisFieldAffineTransform

func (x Uint8x32) GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, taken in groups of 8 adjacent elements; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

Performance is best when b is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI

func (Uint8x32) GaloisFieldAffineTransformInverse

func (x Uint8x32) GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32

GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8) after replacing each element of x with its multiplicative inverse with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, taken in groups of 8 adjacent elements; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

Performance is best when b is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI

func (Uint8x32) GaloisFieldMul

func (x Uint8x32) GaloisFieldMul(y Uint8x32) Uint8x32

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI

func (Uint8x32) GetHi

func (x Uint8x32) GetHi() Uint8x16

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint8x32) GetLo

func (x Uint8x32) GetLo() Uint8x16

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint8x32) Greater

func (x Uint8x32) Greater(y Uint8x32) Mask8x32

Greater returns a mask whose elements indicate whether x > y.

Emulated, CPU Feature AVX2

func (Uint8x32) GreaterEqual

func (x Uint8x32) GreaterEqual(y Uint8x32) Mask8x32

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature AVX2

func (Uint8x32) IsZero

func (x Uint8x32) IsZero() bool

IsZero reports whether all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint8x32) Len

func (x Uint8x32) Len() int

Len returns the number of elements in a Uint8x32.

func (Uint8x32) Less

func (x Uint8x32) Less(y Uint8x32) Mask8x32

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Uint8x32) LessEqual

func (x Uint8x32) LessEqual(y Uint8x32) Mask8x32

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Uint8x32) Masked

func (x Uint8x32) Masked(mask Mask8x32) Uint8x32

Masked returns x but with elements zeroed where mask is false.

func (Uint8x32) Max

func (x Uint8x32) Max(y Uint8x32) Uint8x32

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX2

func (Uint8x32) Merge

func (x Uint8x32) Merge(y Uint8x32, mask Mask8x32) Uint8x32

Merge returns x but with elements set to y where mask is false.
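A scalar sketch of the documented Merge and Masked semantics follows. The real Mask8x32 is opaque, so a [32]bool stands in for it here as an assumption, and the helper names merge and masked are hypothetical rather than part of archsimd.

```go
package main

import "fmt"

// merge models x.Merge(y, mask): x[i] where mask[i] is true, y[i] otherwise.
func merge(x, y [32]uint8, mask [32]bool) [32]uint8 {
	r := y
	for i := range r {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// masked models x.Masked(mask): x[i] where mask[i] is true, 0 otherwise.
func masked(x [32]uint8, mask [32]bool) [32]uint8 {
	var zero [32]uint8
	return merge(x, zero, mask)
}

func main() {
	var x, y [32]uint8
	var mask [32]bool
	for i := range x {
		x[i], y[i] = 1, 2
		mask[i] = i%2 == 0 // enable even positions only
	}
	m := merge(x, y, mask)
	z := masked(x, mask)
	fmt.Println(m[:4], z[:4]) // prints [1 2 1 2] [1 0 1 0]
}
```

Masked is simply Merge with an all-zero second operand, which is why the sketch defines one in terms of the other.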

func (Uint8x32) Min

func (x Uint8x32) Min(y Uint8x32) Uint8x32

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX2

func (Uint8x32) Not

func (x Uint8x32) Not() Uint8x32

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Uint8x32) NotEqual

func (x Uint8x32) NotEqual(y Uint8x32) Mask8x32

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Uint8x32) OnesCount

func (x Uint8x32) OnesCount() Uint8x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Uint8x32) Or

func (x Uint8x32) Or(y Uint8x32) Uint8x32

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint8x32) Permute

func (x Uint8x32) Permute(indices Uint8x32) Uint8x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[31]]}. Only the low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Uint8x32) PermuteOrZeroGrouped

func (x Uint8x32) PermuteOrZeroGrouped(indices Int8x32) Uint8x32

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result := {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. Each group is 128 bits (16 elements) wide. The low four bits of each byte-sized index select an element from the corresponding group of x, unless the index's sign bit is set, in which case the result element is zero.

Asm: VPSHUFB, CPU Feature: AVX2
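The grouped indexing rule can be modeled in plain Go as below. This is a scalar sketch of the documented behavior, not the archsimd implementation; permuteOrZeroGrouped is a hypothetical name, and arrays stand in for Uint8x32 and Int8x32.

```go
package main

import "fmt"

// permuteOrZeroGrouped models x.PermuteOrZeroGrouped(indices) on 32 bytes.
func permuteOrZeroGrouped(x [32]uint8, indices [32]int8) [32]uint8 {
	var r [32]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: the result element stays zero
		}
		base := i / 16 * 16          // first element of the 16-byte group holding i
		r[i] = x[base+int(idx&0x0F)] // low four bits select within that group
	}
	return r
}

func main() {
	var x [32]uint8
	var idx [32]int8
	for i := range x {
		x[i] = uint8(i)
		idx[i] = int8(15 - i%16) // reverse each group
	}
	idx[0] = -1 // force element 0 to zero
	r := permuteOrZeroGrouped(x, idx)
	fmt.Println(r[:4], r[16:20]) // prints [0 14 13 12] [31 30 29 28]
}
```

Note that an index can never reach across groups: element 16 reads from the second group even though its index (15) is the same as element 0's would be.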

func (Uint8x32) Select128FromPair

func (x Uint8x32) Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
     {0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})

returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.

lo and hi result in better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may cause a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
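The example above can be reproduced with a small scalar model. This is a plain-Go sketch under the assumption that the four 128-bit elements are numbered 0 and 1 for the low and high halves of x and 2 and 3 for the halves of y, which is what the example implies; select128FromPair is a hypothetical name.

```go
package main

import "fmt"

// select128FromPair models x.Select128FromPair(lo, hi, y) on 32-byte arrays.
func select128FromPair(x, y [32]uint8, lo, hi uint8) [32]uint8 {
	// Number the four 16-byte halves: 0 and 1 are the low and high halves of x,
	// 2 and 3 are the low and high halves of y.
	halves := [4][]uint8{x[:16], x[16:], y[:16], y[16:]}
	var r [32]uint8
	copy(r[:16], halves[lo]) // an out-of-range lo or hi panics here
	copy(r[16:], halves[hi])
	return r
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = 0x40 + uint8(i)
		y[i] = 0x60 + uint8(i)
	}
	r := select128FromPair(x, y, 3, 0) // the example from the documentation
	fmt.Printf("%#x %#x %#x %#x\n", r[0], r[15], r[16], r[31]) // prints 0x70 0x7f 0x40 0x4f
}
```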

func (Uint8x32) SetHi

func (x Uint8x32) SetHi(y Uint8x16) Uint8x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint8x32) SetLo

func (x Uint8x32) SetLo(y Uint8x16) Uint8x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint8x32) Store

func (x Uint8x32) Store(y *[32]uint8)

Store stores a Uint8x32 to an array.

func (Uint8x32) StoreSlice

func (x Uint8x32) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 32 uint8s.

func (Uint8x32) StoreSlicePart

func (x Uint8x32) StoreSlicePart(s []uint8)

StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Uint8x32) String

func (x Uint8x32) String() string

String returns a string representation of SIMD vector x.

func (Uint8x32) Sub

func (x Uint8x32) Sub(y Uint8x32) Uint8x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX2

func (Uint8x32) SubSaturated

func (x Uint8x32) SubSaturated(y Uint8x32) Uint8x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX2
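The difference between Sub and SubSaturated is easiest to see on a single lane. The following plain-Go sketch models one uint8 element; subSaturated is a hypothetical helper, not part of archsimd.

```go
package main

import "fmt"

// subSaturated models one lane of SubSaturated on uint8: results that would
// fall below zero are clamped to 0 instead of wrapping around.
func subSaturated(a, b uint8) uint8 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	a, b := uint8(10), uint8(20)
	fmt.Println(a-b, subSaturated(a, b)) // prints 246 0 (Sub wraps, SubSaturated clamps)
}
```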

func (Uint8x32) SumAbsDiff

func (x Uint8x32) SumAbsDiff(y Uint8x32) Uint16x16

SumAbsDiff computes the sum of absolute differences between corresponding elements of x and y, treating each run of 8 adjacent bytes as a group. The result is a vector of 16-bit elements in which element 4*n holds the sum for the n-th input group; all other elements are zero. Each nonzero result element is thus the L1 distance between the corresponding 8-byte groups of x and y.

Asm: VPSADBW, CPU Feature: AVX2
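The output layout is the least obvious part of SumAbsDiff, so here is a plain-Go scalar sketch of the documented behavior for 32 input bytes. sumAbsDiff is a hypothetical name, and arrays stand in for Uint8x32 and Uint16x16.

```go
package main

import "fmt"

// sumAbsDiff models x.SumAbsDiff(y) for 32 bytes and 16 uint16 results.
func sumAbsDiff(x, y [32]uint8) [16]uint16 {
	var r [16]uint16
	for g := 0; g < 4; g++ { // four groups of 8 adjacent bytes
		var sum uint16
		for i := 8 * g; i < 8*g+8; i++ {
			d := int(x[i]) - int(y[i])
			if d < 0 {
				d = -d
			}
			sum += uint16(d) // at most 8*255 = 2040, so no overflow
		}
		r[4*g] = sum // other elements of r stay zero
	}
	return r
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(i)
	}
	fmt.Println(sumAbsDiff(x, y)) // prints [28 0 0 0 92 0 0 0 156 0 0 0 220 0 0 0]
}
```

Each 8-byte group produces one sum in the low 16 bits of a 64-bit lane, which is why only every fourth 16-bit element is nonzero.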

func (Uint8x32) Xor

func (x Uint8x32) Xor(y Uint8x32) Uint8x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint8x64

type Uint8x64 struct {
	// contains filtered or unexported fields
}

Uint8x64 is a 512-bit SIMD vector of 64 uint8 elements.

func BroadcastUint8x64

func BroadcastUint8x64(x uint8) Uint8x64

BroadcastUint8x64 returns a vector with every element set to x.

Emulated, CPU Feature: AVX512BW

func LoadMaskedUint8x64

func LoadMaskedUint8x64(y *[64]uint8, mask Mask8x64) Uint8x64

LoadMaskedUint8x64 loads a Uint8x64 from an array, reading only the elements enabled by mask; elements not enabled by mask are set to zero.

Asm: VMOVDQU8.Z, CPU Feature: AVX512

func LoadUint8x64

func LoadUint8x64(y *[64]uint8) Uint8x64

LoadUint8x64 loads a Uint8x64 from an array.

func LoadUint8x64Slice

func LoadUint8x64Slice(s []uint8) Uint8x64

LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s.

func LoadUint8x64SlicePart

func LoadUint8x64SlicePart(s []uint8) Uint8x64

LoadUint8x64SlicePart loads a Uint8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.
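The SlicePart variants make it possible to process a buffer whose length is not a multiple of 64 without a separate scalar tail loop. The sketch below models that pattern in plain Go; loadSlicePart, storeSlicePart, and addOne are hypothetical helpers, a [64]uint8 array stands in for a Uint8x64, and the inner loop stands in for a vector Add.

```go
package main

import "fmt"

// loadSlicePart models LoadUint8x64SlicePart: copy up to 64 elements of s and
// leave the remaining elements zero.
func loadSlicePart(s []uint8) [64]uint8 {
	var v [64]uint8
	copy(v[:], s) // copies min(len(s), 64) elements
	return v
}

// storeSlicePart models StoreSlicePart: store as many elements as fit in s.
func storeSlicePart(v [64]uint8, s []uint8) {
	copy(s, v[:])
}

// addOne adds 1 to every byte of data, 64 bytes at a time; the final partial
// chunk goes through the same load/store path as the full chunks.
func addOne(data []uint8) {
	for i := 0; i < len(data); i += 64 {
		v := loadSlicePart(data[i:])
		for j := range v {
			v[j]++ // stands in for a vector Add
		}
		storeSlicePart(v, data[i:])
	}
}

func main() {
	data := make([]uint8, 70) // one full chunk plus a 6-byte tail
	addOne(data)
	fmt.Println(len(data), data[0], data[69]) // prints 70 1 1
}
```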

func (Uint8x64) AESDecryptLastRound

func (x Uint8x64) AESDecryptLastRound(y Uint32x16) Uint8x64

AESDecryptLastRound performs the last round of the AES equivalent inverse cipher defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33; each 128-bit lane of x holds a separate state. y is the round key, the chunk of the dw array used in this round. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)

Asm: VAESDECLAST, CPU Feature: AVX512VAES

func (Uint8x64) AESDecryptOneRound

func (x Uint8x64) AESDecryptOneRound(y Uint32x16) Uint8x64

AESDecryptOneRound performs one round of the AES equivalent inverse cipher defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33; each 128-bit lane of x holds a separate state. y is the round key, the chunk of the dw array used in this round. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)

Asm: VAESDEC, CPU Feature: AVX512VAES

func (Uint8x64) AESEncryptLastRound

func (x Uint8x64) AESEncryptLastRound(y Uint32x16) Uint8x64

AESEncryptLastRound performs the last round of the AES cipher defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33; each 128-bit lane of x holds a separate state. y is the round key, the chunk of the w array used in this round. result = AddRoundKey(ShiftRows(SubBytes(x)), y)

Asm: VAESENCLAST, CPU Feature: AVX512VAES

func (Uint8x64) AESEncryptOneRound

func (x Uint8x64) AESEncryptOneRound(y Uint32x16) Uint8x64

AESEncryptOneRound performs one round of the AES cipher defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33; each 128-bit lane of x holds a separate state. y is the round key, the chunk of the w array used in this round. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)

Asm: VAESENC, CPU Feature: AVX512VAES

func (Uint8x64) Add

func (x Uint8x64) Add(y Uint8x64) Uint8x64

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX512

func (Uint8x64) AddSaturated

func (x Uint8x64) AddSaturated(y Uint8x64) Uint8x64

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX512

func (Uint8x64) And

func (x Uint8x64) And(y Uint8x64) Uint8x64

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint8x64) AndNot

func (x Uint8x64) AndNot(y Uint8x64) Uint8x64

AndNot performs a bitwise AND NOT operation, x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint8x64) AsFloat32x16

func (from Uint8x64) AsFloat32x16() (to Float32x16)

AsFloat32x16 reinterprets the bits of a Uint8x64 as a Float32x16.

func (Uint8x64) AsFloat64x8

func (from Uint8x64) AsFloat64x8() (to Float64x8)

AsFloat64x8 reinterprets the bits of a Uint8x64 as a Float64x8.

func (Uint8x64) AsInt16x32

func (from Uint8x64) AsInt16x32() (to Int16x32)

AsInt16x32 reinterprets the bits of a Uint8x64 as an Int16x32.

func (Uint8x64) AsInt32x16

func (from Uint8x64) AsInt32x16() (to Int32x16)

AsInt32x16 reinterprets the bits of a Uint8x64 as an Int32x16.

func (Uint8x64) AsInt64x8

func (from Uint8x64) AsInt64x8() (to Int64x8)

AsInt64x8 reinterprets the bits of a Uint8x64 as an Int64x8.

func (Uint8x64) AsInt8x64

func (from Uint8x64) AsInt8x64() (to Int8x64)

AsInt8x64 reinterprets the bits of a Uint8x64 as an Int8x64.

func (Uint8x64) AsUint16x32

func (from Uint8x64) AsUint16x32() (to Uint16x32)

AsUint16x32 reinterprets the bits of a Uint8x64 as a Uint16x32.

func (Uint8x64) AsUint32x16

func (from Uint8x64) AsUint32x16() (to Uint32x16)

AsUint32x16 reinterprets the bits of a Uint8x64 as a Uint32x16.

func (Uint8x64) AsUint64x8

func (from Uint8x64) AsUint64x8() (to Uint64x8)

AsUint64x8 reinterprets the bits of a Uint8x64 as a Uint64x8.

func (Uint8x64) Average

func (x Uint8x64) Average(y Uint8x64) Uint8x64

Average computes the rounded average of corresponding elements.

Asm: VPAVGB, CPU Feature: AVX512
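"Rounded average" here means that halves round up. A plain-Go model of one lane, assuming the standard VPAVGB definition (a + b + 1) >> 1, looks like this; average is a hypothetical helper.

```go
package main

import "fmt"

// average models one lane of Average: (a + b + 1) / 2, computed in a wider type
// so the intermediate sum cannot overflow uint8.
func average(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(average(1, 2), average(255, 255)) // prints 2 255
}
```

A naive (a + b) / 2 computed directly in uint8 would wrap for inputs such as 255 and 255; the hardware instruction avoids that by using a wider intermediate.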

func (Uint8x64) Compress

func (x Uint8x64) Compress(mask Mask8x64) Uint8x64

Compress selects the elements of x for which mask is true and packs them, in order, into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
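The packing behavior can be modeled in plain Go as below. This is a scalar sketch, not the archsimd implementation: compress is a hypothetical name, [64]bool stands in for the opaque Mask8x64, and leaving the unused high positions zero is an assumption of the sketch that the documentation above does not state.

```go
package main

import "fmt"

// compress models x.Compress(mask) on 64 bytes. Unselected result positions are
// left zero here; that detail is an assumption of this sketch.
func compress(x [64]uint8, mask [64]bool) [64]uint8 {
	var r [64]uint8
	n := 0
	for i := range x {
		if mask[i] {
			r[n] = x[i] // pack selected elements toward index 0, preserving order
			n++
		}
	}
	return r
}

func main() {
	var x [64]uint8
	var mask [64]bool
	for i := range x {
		x[i] = uint8(i)
		mask[i] = i%3 == 0 // select every third element
	}
	r := compress(x, mask)
	fmt.Println(r[:4]) // prints [0 3 6 9]
}
```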

func (Uint8x64) ConcatPermute

func (x Uint8x64) ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[63]]}, where xy is the 128-element concatenation of x (lower half) and y (upper half). Only the low bits of each index needed to address xy are used (7 bits, values 0-127).

Asm: VPERMI2B, CPU Feature: AVX512VBMI
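A scalar sketch of ConcatPermute in plain Go follows, assuming the 7-bit index masking described above. concatPermute is a hypothetical helper, and arrays stand in for the vector types.

```go
package main

import "fmt"

// concatPermute models x.ConcatPermute(y, indices) for 64-byte vectors: xy is
// x followed by y, and the low 7 bits of each index select one of its 128 bytes.
func concatPermute(x, y, indices [64]uint8) [64]uint8 {
	var xy [128]uint8
	copy(xy[:64], x[:])
	copy(xy[64:], y[:])
	var r [64]uint8
	for i, idx := range indices {
		r[i] = xy[idx&0x7F]
	}
	return r
}

func main() {
	var x, y, idx [64]uint8
	for i := range x {
		x[i] = uint8(i)
		y[i] = 100 + uint8(i)
		idx[i] = uint8(2 * i) // every other element of xy
	}
	r := concatPermute(x, y, idx)
	fmt.Println(r[0], r[31], r[32], r[63]) // prints 0 62 100 162
}
```

Indices 0-63 read from x and indices 64-127 read from y, so a single instruction can gather from a 128-byte table.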

func (Uint8x64) ConcatShiftBytesRightGrouped

func (x Uint8x64) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64

ConcatShiftBytesRightGrouped operates on each 16-byte group independently: it concatenates the corresponding groups of x and y, shifts the concatenated value right by the number of bytes given by constant, and returns the lower half of the shifted value.

The constant argument results in better performance when it is a compile-time constant; a non-constant value is translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX512

func (Uint8x64) DotProductPairsSaturated

func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32

DotProductPairsSaturated multiplies corresponding elements of x and y and adds each adjacent pair of products with saturation, yielding a vector with half as many elements, each twice as wide as the inputs.

Asm: VPMADDUBSW, CPU Feature: AVX512
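The mixed signedness and the saturation are easy to get wrong, so here is a plain-Go scalar sketch that follows Intel's definition of VPMADDUBSW (unsigned bytes of x times signed bytes of y, pairs summed and saturated to int16). dotProductPairsSaturated is a hypothetical name, and arrays stand in for Uint8x64, Int8x64, and Int16x32.

```go
package main

import (
	"fmt"
	"math"
)

// dotProductPairsSaturated models x.DotProductPairsSaturated(y): unsigned bytes
// of x times signed bytes of y, each adjacent pair of products summed and
// saturated to the int16 range.
func dotProductPairsSaturated(x [64]uint8, y [64]int8) [32]int16 {
	var r [32]int16
	for i := range r {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		} else if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		r[i] = int16(sum)
	}
	return r
}

func main() {
	var x [64]uint8
	var y [64]int8
	x[0], x[1], y[0], y[1] = 255, 255, 127, 127 // 64770 saturates
	x[2], x[3], y[2], y[3] = 3, 4, -2, 5        // -6 + 20 = 14
	r := dotProductPairsSaturated(x, y)
	fmt.Println(r[0], r[1]) // prints 32767 14
}
```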

func (Uint8x64) Equal

func (x Uint8x64) Equal(y Uint8x64) Mask8x64

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQB, CPU Feature: AVX512

func (Uint8x64) Expand

func (x Uint8x64) Expand(mask Mask8x64) Uint8x64

Expand distributes the elements packed at the low end of x, in order, to the positions where mask is true, starting from the lowest such position.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
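A plain-Go scalar sketch of Expand follows. expand is a hypothetical name, [64]bool stands in for Mask8x64, and filling the positions where mask is false with zero is an assumption of the sketch rather than something stated above.

```go
package main

import "fmt"

// expand models x.Expand(mask) on 64 bytes: the packed low elements of x are
// written, in order, to the positions where mask is true. Positions where mask
// is false are left zero here; that detail is an assumption of this sketch.
func expand(x [64]uint8, mask [64]bool) [64]uint8 {
	var r [64]uint8
	n := 0
	for i := range r {
		if mask[i] {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	var x [64]uint8
	var mask [64]bool
	for i := range x {
		x[i] = uint8(i + 1)
		mask[i] = i%2 == 1 // odd positions only
	}
	r := expand(x, mask)
	fmt.Println(r[:6]) // prints [0 1 0 2 0 3]
}
```

Expand scatters in the opposite direction to Compress: Compress gathers selected elements toward index 0, while Expand spreads low elements out to the selected positions.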

func (Uint8x64) GaloisFieldAffineTransform

func (x Uint8x64) GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, grouped in runs of 8 adjacent elements; y is a vector of 8x8 bit matrices; b is an 8-bit vector. The transformation computes y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

b results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI

func (Uint8x64) GaloisFieldAffineTransformInverse

func (x Uint8x64) GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64

GaloisFieldAffineTransformInverse replaces each element of x with its multiplicative inverse in GF(2^8), with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1, and then applies an affine transformation: x is a vector of 8-bit vectors, grouped in runs of 8 adjacent elements; y is a vector of 8x8 bit matrices; b is an 8-bit vector. The transformation computes y * x + b, where each element of y applies to the corresponding group of 8 elements of x.

b results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
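Inversion followed by an affine map is exactly the structure of the AES S-box, which makes a convenient check. The plain-Go sketch below assumes Intel's GF2P8AFFINEQB bit-matrix convention and maps 0 to 0, as AES does; under that convention the AES affine step is the matrix 0xF1E3C78F1F3E7CF8 with b = 0x63. All helper names (gfMul, gfInverse, affineByte, affineInverseByte) are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// gfMul multiplies in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for b != 0 {
		if b&1 != 0 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1B
		}
		b >>= 1
	}
	return p
}

// gfInverse returns x^254, the multiplicative inverse of x in GF(2^8) for
// nonzero x; it maps 0 to 0.
func gfInverse(x uint8) uint8 {
	r := uint8(1)
	for i := 0; i < 254; i++ {
		r = gfMul(r, x)
	}
	return r
}

// affineByte uses Intel's GF2P8AFFINEQB convention: bit i of the result is the
// parity of (byte 7-i of matrix) AND x, XORed with bit i of b.
func affineByte(matrix uint64, x, b uint8) uint8 {
	var r uint8
	for i := 0; i < 8; i++ {
		row := uint8(matrix >> (8 * (7 - i)))
		r |= uint8(bits.OnesCount8(row&x)&1) << i
	}
	return r ^ b
}

// affineInverseByte models one byte of GaloisFieldAffineTransformInverse.
func affineInverseByte(matrix uint64, x, b uint8) uint8 {
	return affineByte(matrix, gfInverse(x), b)
}

func main() {
	const aesMatrix = 0xF1E3C78F1F3E7CF8 // AES S-box affine matrix in this layout
	fmt.Printf("%#x %#x\n", gfInverse(0x53), affineInverseByte(aesMatrix, 0x53, 0x63))
	// prints 0xca 0xed (the FIPS 197 worked example for {53})
}
```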

func (Uint8x64) GaloisFieldMul

func (x Uint8x64) GaloisFieldMul(y Uint8x64) Uint8x64

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI

func (Uint8x64) GetHi

func (x Uint8x64) GetHi() Uint8x32

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint8x64) GetLo

func (x Uint8x64) GetLo() Uint8x32

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint8x64) Greater

func (x Uint8x64) Greater(y Uint8x64) Mask8x64

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) GreaterEqual

func (x Uint8x64) GreaterEqual(y Uint8x64) Mask8x64

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) Len

func (x Uint8x64) Len() int

Len returns the number of elements in a Uint8x64.

func (Uint8x64) Less

func (x Uint8x64) Less(y Uint8x64) Mask8x64

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) LessEqual

func (x Uint8x64) LessEqual(y Uint8x64) Mask8x64

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) Masked

func (x Uint8x64) Masked(mask Mask8x64) Uint8x64

Masked returns x but with elements zeroed where mask is false.

func (Uint8x64) Max

func (x Uint8x64) Max(y Uint8x64) Uint8x64

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX512

func (Uint8x64) Merge

func (x Uint8x64) Merge(y Uint8x64, mask Mask8x64) Uint8x64

Merge returns x but with elements set to y where mask is false.

func (Uint8x64) Min

func (x Uint8x64) Min(y Uint8x64) Uint8x64

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX512

func (Uint8x64) Not

func (x Uint8x64) Not() Uint8x64

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Uint8x64) NotEqual

func (x Uint8x64) NotEqual(y Uint8x64) Mask8x64

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) OnesCount

func (x Uint8x64) OnesCount() Uint8x64

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Uint8x64) Or

func (x Uint8x64) Or(y Uint8x64) Uint8x64

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint8x64) Permute

func (x Uint8x64) Permute(indices Uint8x64) Uint8x64

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[63]]}. Only the low 6 bits (values 0-63) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Uint8x64) PermuteOrZeroGrouped

func (x Uint8x64) PermuteOrZeroGrouped(indices Int8x64) Uint8x64

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result := {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. Each group is 128 bits (16 elements) wide. The low four bits of each byte-sized index select an element from the corresponding group of x, unless the index's sign bit is set, in which case the result element is zero.

Asm: VPSHUFB, CPU Feature: AVX512

func (Uint8x64) SetHi

func (x Uint8x64) SetHi(y Uint8x32) Uint8x64

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint8x64) SetLo

func (x Uint8x64) SetLo(y Uint8x32) Uint8x64

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint8x64) Store

func (x Uint8x64) Store(y *[64]uint8)

Store stores a Uint8x64 to an array.

func (Uint8x64) StoreMasked

func (x Uint8x64) StoreMasked(y *[64]uint8, mask Mask8x64)

StoreMasked stores a Uint8x64 to an array, writing only the elements enabled by mask.

Asm: VMOVDQU8, CPU Feature: AVX512

func (Uint8x64) StoreSlice

func (x Uint8x64) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 64 uint8s.

func (Uint8x64) StoreSlicePart

func (x Uint8x64) StoreSlicePart(s []uint8)

StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.

func (Uint8x64) String

func (x Uint8x64) String() string

String returns a string representation of SIMD vector x.

func (Uint8x64) Sub

func (x Uint8x64) Sub(y Uint8x64) Uint8x64

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX512

func (Uint8x64) SubSaturated

func (x Uint8x64) SubSaturated(y Uint8x64) Uint8x64

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX512

func (Uint8x64) SumAbsDiff

func (x Uint8x64) SumAbsDiff(y Uint8x64) Uint16x32

SumAbsDiff computes the sum of absolute differences between corresponding elements of x and y, treating each run of 8 adjacent bytes as a group. The result is a vector of 16-bit elements in which element 4*n holds the sum for the n-th input group; all other elements are zero. Each nonzero result element is thus the L1 distance between the corresponding 8-byte groups of x and y.

Asm: VPSADBW, CPU Feature: AVX512

func (Uint8x64) Xor

func (x Uint8x64) Xor(y Uint8x64) Uint8x64

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type X86Features

type X86Features struct{}

var X86 X86Features

X86 provides methods that report which x86 CPU features are available at run time.

func (X86Features) AES

func (X86Features) AES() bool

AES returns whether the CPU supports the AES feature.

AES is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX

func (X86Features) AVX() bool

AVX returns whether the CPU supports the AVX feature.

AVX is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX2

func (X86Features) AVX2() bool

AVX2 returns whether the CPU supports the AVX2 feature.

AVX2 is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512

func (X86Features) AVX512() bool

AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.

These five CPU features are bundled together, and no use of AVX-512 is allowed unless all of these features are supported together. Nearly every CPU that has shipped with any support for AVX-512 has supported all five of these features.

AVX512 is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512BITALG

func (X86Features) AVX512BITALG() bool

AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.

AVX512BITALG is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512GFNI

func (X86Features) AVX512GFNI() bool

AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.

AVX512GFNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VAES

func (X86Features) AVX512VAES() bool

AVX512VAES returns whether the CPU supports the AVX512VAES feature.

AVX512VAES is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VBMI

func (X86Features) AVX512VBMI() bool

AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.

AVX512VBMI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VBMI2

func (X86Features) AVX512VBMI2() bool

AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.

AVX512VBMI2 is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VNNI

func (X86Features) AVX512VNNI() bool

AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.

AVX512VNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VPCLMULQDQ

func (X86Features) AVX512VPCLMULQDQ() bool

AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.

AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VPOPCNTDQ

func (X86Features) AVX512VPOPCNTDQ() bool

AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.

AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVXVNNI

func (X86Features) AVXVNNI() bool

AVXVNNI returns whether the CPU supports the AVXVNNI feature.

AVXVNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) SHA

func (X86Features) SHA() bool

SHA returns whether the CPU supports the SHA feature.

SHA is defined on all GOARCHes, but will only return true on GOARCH amd64.

Notes

Bugs

  • Using a vector type as a type parameter may not work.

  • Using reflect Call to call a vector function/method may not work.
