archsimd

package standard library

Version: go1.26rc1. Published: Dec 16, 2025. License: BSD-3-Clause. Imports: 3. Imported by: 0.

Repository

cs.opensource.google/go/go

Overview

Package archsimd provides access to architecture-specific SIMD operations.

This is a low-level package that exposes hardware-specific functionality. It currently supports AMD64.

This package is experimental and is not subject to the Go 1 compatibility promise. It exists only when building with GOEXPERIMENT=simd set in the environment.

Vector types and operations

Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding to the hardware's vector registers. On AMD64, 128-, 256-, and 512-bit vectors are supported.

Mask types, such as Mask8x16, are defined similarly. They are opaque types that hide the differences in the underlying representations. A mask can be converted to or from the corresponding integer vector type, or to or from a bitmask.

Operations are mostly defined as methods on the vector types. Most of them are compiler intrinsics and correspond directly to hardware instructions.

Common operations include:

- Load/Store: Load a vector from memory or store a vector to memory.
- Arithmetic: Add, Sub, Mul, etc.
- Bitwise: And, Or, Xor, etc.
- Comparison: Equal, Greater, etc., which produce a mask.
- Conversion: Convert between different vector types.
- Field selection and rearrangement: GetElem, Permute, etc.
- Masking: Masked, Merge.

The compiler recognizes certain patterns of operations and may optimize them into more efficient instructions. For example, on AVX512, an Add operation followed by Masked may be optimized to a single masked add instruction. For this reason, not every hardware instruction has a corresponding API.
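The Add-then-Masked pattern can be pictured lane by lane with a small scalar model. This is only a sketch: lanes4, mask4, add, masked, and merge are hypothetical stand-ins written in plain Go, not the package's types or intrinsics, and the sketch assumes that Masked zeroes the lanes where the mask is false and that Merge fills those lanes from its argument.

```go
package main

import "fmt"

// lanes4 is a hypothetical scalar stand-in for a 4-lane int32 vector.
type lanes4 [4]int32

// mask4 is a hypothetical stand-in for a 4-lane mask.
type mask4 [4]bool

// add returns the lane-wise sum of x and y.
func add(x, y lanes4) lanes4 {
	var r lanes4
	for i := range r {
		r[i] = x[i] + y[i]
	}
	return r
}

// masked keeps the lanes of x where m is true and zeroes the rest
// (assumed semantics of Masked).
func masked(x lanes4, m mask4) lanes4 {
	var r lanes4
	for i := range r {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge keeps the lanes of x where m is true and takes y elsewhere
// (assumed semantics of Merge).
func merge(x, y lanes4, m mask4) lanes4 {
	r := y
	for i := range r {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := lanes4{1, 2, 3, 4}
	y := lanes4{10, 20, 30, 40}
	m := mask4{true, false, true, false}

	// Two separate steps in source; hardware with masked adds can do this in one.
	fmt.Println(masked(add(x, y), m)) // [11 0 33 0]

	// Merge falls back to x in the unselected lanes instead of zero.
	fmt.Println(merge(add(x, y), x, m)) // [11 2 33 4]
}
```

Writing the two steps separately keeps the API small; whether they are fused into one instruction is left to the compiler.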

CPU feature checks

The package provides global variables to check for CPU features available at runtime. For example, on AMD64, the X86 variable provides methods to check for AVX2, AVX512, etc. It is recommended to check for CPU features before using the corresponding vector operations.
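One way to follow this recommendation is to check the feature once and select an implementation up front. The sketch below is plain Go and does not import the package: hasWide is a hypothetical flag that would, in real code, come from one of the feature-check methods on the X86 variable, and sumWide is a scalar stand-in for a kernel written with vector types.

```go
package main

import "fmt"

// sumScalar is the portable fallback used when the feature is missing.
func sumScalar(s []float32) float32 {
	var t float32
	for _, v := range s {
		t += v
	}
	return t
}

// sumWide stands in for a kernel that would use 8-lane vector operations.
// It processes 8 elements per step and finishes the tail one at a time.
func sumWide(s []float32) float32 {
	var acc [8]float32
	i := 0
	for ; i+8 <= len(s); i += 8 {
		for j := 0; j < 8; j++ {
			acc[j] += s[i+j]
		}
	}
	var t float32
	for _, v := range acc {
		t += v
	}
	for ; i < len(s); i++ {
		t += s[i]
	}
	return t
}

// pickSum selects an implementation from the result of a feature check.
// hasWide is a hypothetical parameter, not part of the package.
func pickSum(hasWide bool) func([]float32) float32 {
	if hasWide {
		return sumWide
	}
	return sumScalar
}

func main() {
	// Hard-coded here; real code would query the CPU at startup.
	sum := pickSum(false)
	fmt.Println(sum([]float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})) // 55
}
```

Selecting the function once keeps the feature check out of the hot loop and guarantees the vector path is never entered on hardware that lacks it.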

Notes

- This package is not portable, as the available types and operations depend on the target architecture. It is not recommended to expose the SIMD types defined in this package in public APIs.
- For performance reasons, it is recommended to use the vector types directly as values. It is not recommended to take the address of a vector, allocate it on the heap, or put it in an aggregate type.
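The notes above suggest a layering in which the public function takes ordinary slices and the vector values stay local to the loop. The following is a sketch of that shape only: vec4 and its load, mul, and store helpers are hypothetical scalar stand-ins for a vector type and its operations, and Scale is an illustrative name, not part of the package.

```go
package main

import "fmt"

// vec4 is a hypothetical stand-in for a 4-lane vector type.
// It is unexported so it never appears in the public API.
type vec4 [4]float32

// load reads the first four elements of s into a vec4 value.
func load(s []float32) vec4 {
	return vec4{s[0], s[1], s[2], s[3]}
}

// mul returns the lane-wise product of x and y.
func (x vec4) mul(y vec4) vec4 {
	var r vec4
	for i := range r {
		r[i] = x[i] * y[i]
	}
	return r
}

// store writes the four lanes of x to the start of s.
func (x vec4) store(s []float32) {
	for i, v := range x {
		s[i] = v
	}
}

// Scale is the public entry point. It accepts plain slices, so callers
// never see the architecture-specific vector type.
func Scale(dst, src []float32, k float32) {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	kv := vec4{k, k, k, k}
	i := 0
	for ; i+4 <= n; i += 4 {
		// v is a local value: it is not addressed, stored in a struct,
		// or allocated on the heap.
		v := load(src[i:])
		v.mul(kv).store(dst[i:])
	}
	for ; i < n; i++ {
		dst[i] = src[i] * k // scalar tail
	}
}

func main() {
	src := []float32{1, 2, 3, 4, 5, 6}
	dst := make([]float32, len(src))
	Scale(dst, src, 2)
	fmt.Println(dst) // [2 4 6 8 10 12]
}
```

With this shape, a port to another architecture only needs to replace the body of Scale; its signature and callers stay unchanged.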

Index

func ClearAVXUpperBits()
type Float32x16
- func BroadcastFloat32x16(x float32) Float32x16
- func LoadFloat32x16(y *[16]float32) Float32x16
- func LoadFloat32x16Slice(s []float32) Float32x16
- func LoadFloat32x16SlicePart(s []float32) Float32x16
- func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16
- func (x Float32x16) Add(y Float32x16) Float32x16
- func (from Float32x16) AsFloat64x8() (to Float64x8)
- func (from Float32x16) AsInt16x32() (to Int16x32)
- func (from Float32x16) AsInt32x16() (to Int32x16)
- func (from Float32x16) AsInt64x8() (to Int64x8)
- func (from Float32x16) AsInt8x64() (to Int8x64)
- func (from Float32x16) AsUint16x32() (to Uint16x32)
- func (from Float32x16) AsUint32x16() (to Uint32x16)
- func (from Float32x16) AsUint64x8() (to Uint64x8)
- func (from Float32x16) AsUint8x64() (to Uint8x64)
- func (x Float32x16) CeilScaled(prec uint8) Float32x16
- func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Compress(mask Mask32x16) Float32x16
- func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
- func (x Float32x16) ConvertToInt32() Int32x16
- func (x Float32x16) ConvertToUint32() Uint32x16
- func (x Float32x16) Div(y Float32x16) Float32x16
- func (x Float32x16) Equal(y Float32x16) Mask32x16
- func (x Float32x16) Expand(mask Mask32x16) Float32x16
- func (x Float32x16) FloorScaled(prec uint8) Float32x16
- func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16
- func (x Float32x16) GetHi() Float32x8
- func (x Float32x16) GetLo() Float32x8
- func (x Float32x16) Greater(y Float32x16) Mask32x16
- func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16
- func (x Float32x16) IsNan(y Float32x16) Mask32x16
- func (x Float32x16) Len() int
- func (x Float32x16) Less(y Float32x16) Mask32x16
- func (x Float32x16) LessEqual(y Float32x16) Mask32x16
- func (x Float32x16) Masked(mask Mask32x16) Float32x16
- func (x Float32x16) Max(y Float32x16) Float32x16
- func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16
- func (x Float32x16) Min(y Float32x16) Float32x16
- func (x Float32x16) Mul(y Float32x16) Float32x16
- func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) NotEqual(y Float32x16) Mask32x16
- func (x Float32x16) Permute(indices Uint32x16) Float32x16
- func (x Float32x16) Reciprocal() Float32x16
- func (x Float32x16) ReciprocalSqrt() Float32x16
- func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16
- func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Scale(y Float32x16) Float32x16
- func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
- func (x Float32x16) SetHi(y Float32x8) Float32x16
- func (x Float32x16) SetLo(y Float32x8) Float32x16
- func (x Float32x16) Sqrt() Float32x16
- func (x Float32x16) Store(y *[16]float32)
- func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)
- func (x Float32x16) StoreSlice(s []float32)
- func (x Float32x16) StoreSlicePart(s []float32)
- func (x Float32x16) String() string
- func (x Float32x16) Sub(y Float32x16) Float32x16
- func (x Float32x16) TruncScaled(prec uint8) Float32x16
- func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16
type Float32x4
- func BroadcastFloat32x4(x float32) Float32x4
- func LoadFloat32x4(y *[4]float32) Float32x4
- func LoadFloat32x4Slice(s []float32) Float32x4
- func LoadFloat32x4SlicePart(s []float32) Float32x4
- func LoadMaskedFloat32x4(y *[4]float32, mask Mask32x4) Float32x4
- func (x Float32x4) Add(y Float32x4) Float32x4
- func (x Float32x4) AddPairs(y Float32x4) Float32x4
- func (x Float32x4) AddSub(y Float32x4) Float32x4
- func (from Float32x4) AsFloat64x2() (to Float64x2)
- func (from Float32x4) AsInt16x8() (to Int16x8)
- func (from Float32x4) AsInt32x4() (to Int32x4)
- func (from Float32x4) AsInt64x2() (to Int64x2)
- func (from Float32x4) AsInt8x16() (to Int8x16)
- func (from Float32x4) AsUint16x8() (to Uint16x8)
- func (from Float32x4) AsUint32x4() (to Uint32x4)
- func (from Float32x4) AsUint64x2() (to Uint64x2)
- func (from Float32x4) AsUint8x16() (to Uint8x16)
- func (x Float32x4) Broadcast128() Float32x4
- func (x Float32x4) Broadcast256() Float32x8
- func (x Float32x4) Broadcast512() Float32x16
- func (x Float32x4) Ceil() Float32x4
- func (x Float32x4) CeilScaled(prec uint8) Float32x4
- func (x Float32x4) CeilScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Compress(mask Mask32x4) Float32x4
- func (x Float32x4) ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
- func (x Float32x4) ConvertToFloat64() Float64x4
- func (x Float32x4) ConvertToInt32() Int32x4
- func (x Float32x4) ConvertToInt64() Int64x4
- func (x Float32x4) ConvertToUint32() Uint32x4
- func (x Float32x4) ConvertToUint64() Uint64x4
- func (x Float32x4) Div(y Float32x4) Float32x4
- func (x Float32x4) Equal(y Float32x4) Mask32x4
- func (x Float32x4) Expand(mask Mask32x4) Float32x4
- func (x Float32x4) Floor() Float32x4
- func (x Float32x4) FloorScaled(prec uint8) Float32x4
- func (x Float32x4) FloorScaledResidue(prec uint8) Float32x4
- func (x Float32x4) GetElem(index uint8) float32
- func (x Float32x4) Greater(y Float32x4) Mask32x4
- func (x Float32x4) GreaterEqual(y Float32x4) Mask32x4
- func (x Float32x4) IsNan(y Float32x4) Mask32x4
- func (x Float32x4) Len() int
- func (x Float32x4) Less(y Float32x4) Mask32x4
- func (x Float32x4) LessEqual(y Float32x4) Mask32x4
- func (x Float32x4) Masked(mask Mask32x4) Float32x4
- func (x Float32x4) Max(y Float32x4) Float32x4
- func (x Float32x4) Merge(y Float32x4, mask Mask32x4) Float32x4
- func (x Float32x4) Min(y Float32x4) Float32x4
- func (x Float32x4) Mul(y Float32x4) Float32x4
- func (x Float32x4) MulAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulAddSub(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulSubAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) NotEqual(y Float32x4) Mask32x4
- func (x Float32x4) Reciprocal() Float32x4
- func (x Float32x4) ReciprocalSqrt() Float32x4
- func (x Float32x4) RoundToEven() Float32x4
- func (x Float32x4) RoundToEvenScaled(prec uint8) Float32x4
- func (x Float32x4) RoundToEvenScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Scale(y Float32x4) Float32x4
- func (x Float32x4) SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
- func (x Float32x4) SetElem(index uint8, y float32) Float32x4
- func (x Float32x4) Sqrt() Float32x4
- func (x Float32x4) Store(y *[4]float32)
- func (x Float32x4) StoreMasked(y *[4]float32, mask Mask32x4)
- func (x Float32x4) StoreSlice(s []float32)
- func (x Float32x4) StoreSlicePart(s []float32)
- func (x Float32x4) String() string
- func (x Float32x4) Sub(y Float32x4) Float32x4
- func (x Float32x4) SubPairs(y Float32x4) Float32x4
- func (x Float32x4) Trunc() Float32x4
- func (x Float32x4) TruncScaled(prec uint8) Float32x4
- func (x Float32x4) TruncScaledResidue(prec uint8) Float32x4
type Float32x8
- func BroadcastFloat32x8(x float32) Float32x8
- func LoadFloat32x8(y *[8]float32) Float32x8
- func LoadFloat32x8Slice(s []float32) Float32x8
- func LoadFloat32x8SlicePart(s []float32) Float32x8
- func LoadMaskedFloat32x8(y *[8]float32, mask Mask32x8) Float32x8
- func (x Float32x8) Add(y Float32x8) Float32x8
- func (x Float32x8) AddPairs(y Float32x8) Float32x8
- func (x Float32x8) AddSub(y Float32x8) Float32x8
- func (from Float32x8) AsFloat64x4() (to Float64x4)
- func (from Float32x8) AsInt16x16() (to Int16x16)
- func (from Float32x8) AsInt32x8() (to Int32x8)
- func (from Float32x8) AsInt64x4() (to Int64x4)
- func (from Float32x8) AsInt8x32() (to Int8x32)
- func (from Float32x8) AsUint16x16() (to Uint16x16)
- func (from Float32x8) AsUint32x8() (to Uint32x8)
- func (from Float32x8) AsUint64x4() (to Uint64x4)
- func (from Float32x8) AsUint8x32() (to Uint8x32)
- func (x Float32x8) Ceil() Float32x8
- func (x Float32x8) CeilScaled(prec uint8) Float32x8
- func (x Float32x8) CeilScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Compress(mask Mask32x8) Float32x8
- func (x Float32x8) ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
- func (x Float32x8) ConvertToFloat64() Float64x8
- func (x Float32x8) ConvertToInt32() Int32x8
- func (x Float32x8) ConvertToInt64() Int64x8
- func (x Float32x8) ConvertToUint32() Uint32x8
- func (x Float32x8) ConvertToUint64() Uint64x8
- func (x Float32x8) Div(y Float32x8) Float32x8
- func (x Float32x8) Equal(y Float32x8) Mask32x8
- func (x Float32x8) Expand(mask Mask32x8) Float32x8
- func (x Float32x8) Floor() Float32x8
- func (x Float32x8) FloorScaled(prec uint8) Float32x8
- func (x Float32x8) FloorScaledResidue(prec uint8) Float32x8
- func (x Float32x8) GetHi() Float32x4
- func (x Float32x8) GetLo() Float32x4
- func (x Float32x8) Greater(y Float32x8) Mask32x8
- func (x Float32x8) GreaterEqual(y Float32x8) Mask32x8
- func (x Float32x8) IsNan(y Float32x8) Mask32x8
- func (x Float32x8) Len() int
- func (x Float32x8) Less(y Float32x8) Mask32x8
- func (x Float32x8) LessEqual(y Float32x8) Mask32x8
- func (x Float32x8) Masked(mask Mask32x8) Float32x8
- func (x Float32x8) Max(y Float32x8) Float32x8
- func (x Float32x8) Merge(y Float32x8, mask Mask32x8) Float32x8
- func (x Float32x8) Min(y Float32x8) Float32x8
- func (x Float32x8) Mul(y Float32x8) Float32x8
- func (x Float32x8) MulAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulAddSub(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulSubAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) NotEqual(y Float32x8) Mask32x8
- func (x Float32x8) Permute(indices Uint32x8) Float32x8
- func (x Float32x8) Reciprocal() Float32x8
- func (x Float32x8) ReciprocalSqrt() Float32x8
- func (x Float32x8) RoundToEven() Float32x8
- func (x Float32x8) RoundToEvenScaled(prec uint8) Float32x8
- func (x Float32x8) RoundToEvenScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Scale(y Float32x8) Float32x8
- func (x Float32x8) Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
- func (x Float32x8) SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
- func (x Float32x8) SetHi(y Float32x4) Float32x8
- func (x Float32x8) SetLo(y Float32x4) Float32x8
- func (x Float32x8) Sqrt() Float32x8
- func (x Float32x8) Store(y *[8]float32)
- func (x Float32x8) StoreMasked(y *[8]float32, mask Mask32x8)
- func (x Float32x8) StoreSlice(s []float32)
- func (x Float32x8) StoreSlicePart(s []float32)
- func (x Float32x8) String() string
- func (x Float32x8) Sub(y Float32x8) Float32x8
- func (x Float32x8) SubPairs(y Float32x8) Float32x8
- func (x Float32x8) Trunc() Float32x8
- func (x Float32x8) TruncScaled(prec uint8) Float32x8
- func (x Float32x8) TruncScaledResidue(prec uint8) Float32x8
type Float64x2
- func BroadcastFloat64x2(x float64) Float64x2
- func LoadFloat64x2(y *[2]float64) Float64x2
- func LoadFloat64x2Slice(s []float64) Float64x2
- func LoadFloat64x2SlicePart(s []float64) Float64x2
- func LoadMaskedFloat64x2(y *[2]float64, mask Mask64x2) Float64x2
- func (x Float64x2) Add(y Float64x2) Float64x2
- func (x Float64x2) AddPairs(y Float64x2) Float64x2
- func (x Float64x2) AddSub(y Float64x2) Float64x2
- func (from Float64x2) AsFloat32x4() (to Float32x4)
- func (from Float64x2) AsInt16x8() (to Int16x8)
- func (from Float64x2) AsInt32x4() (to Int32x4)
- func (from Float64x2) AsInt64x2() (to Int64x2)
- func (from Float64x2) AsInt8x16() (to Int8x16)
- func (from Float64x2) AsUint16x8() (to Uint16x8)
- func (from Float64x2) AsUint32x4() (to Uint32x4)
- func (from Float64x2) AsUint64x2() (to Uint64x2)
- func (from Float64x2) AsUint8x16() (to Uint8x16)
- func (x Float64x2) Broadcast128() Float64x2
- func (x Float64x2) Broadcast256() Float64x4
- func (x Float64x2) Broadcast512() Float64x8
- func (x Float64x2) Ceil() Float64x2
- func (x Float64x2) CeilScaled(prec uint8) Float64x2
- func (x Float64x2) CeilScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Compress(mask Mask64x2) Float64x2
- func (x Float64x2) ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
- func (x Float64x2) ConvertToFloat32() Float32x4
- func (x Float64x2) ConvertToInt32() Int32x4
- func (x Float64x2) ConvertToInt64() Int64x2
- func (x Float64x2) ConvertToUint32() Uint32x4
- func (x Float64x2) ConvertToUint64() Uint64x2
- func (x Float64x2) Div(y Float64x2) Float64x2
- func (x Float64x2) Equal(y Float64x2) Mask64x2
- func (x Float64x2) Expand(mask Mask64x2) Float64x2
- func (x Float64x2) Floor() Float64x2
- func (x Float64x2) FloorScaled(prec uint8) Float64x2
- func (x Float64x2) FloorScaledResidue(prec uint8) Float64x2
- func (x Float64x2) GetElem(index uint8) float64
- func (x Float64x2) Greater(y Float64x2) Mask64x2
- func (x Float64x2) GreaterEqual(y Float64x2) Mask64x2
- func (x Float64x2) IsNan(y Float64x2) Mask64x2
- func (x Float64x2) Len() int
- func (x Float64x2) Less(y Float64x2) Mask64x2
- func (x Float64x2) LessEqual(y Float64x2) Mask64x2
- func (x Float64x2) Masked(mask Mask64x2) Float64x2
- func (x Float64x2) Max(y Float64x2) Float64x2
- func (x Float64x2) Merge(y Float64x2, mask Mask64x2) Float64x2
- func (x Float64x2) Min(y Float64x2) Float64x2
- func (x Float64x2) Mul(y Float64x2) Float64x2
- func (x Float64x2) MulAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulAddSub(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulSubAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) NotEqual(y Float64x2) Mask64x2
- func (x Float64x2) Reciprocal() Float64x2
- func (x Float64x2) ReciprocalSqrt() Float64x2
- func (x Float64x2) RoundToEven() Float64x2
- func (x Float64x2) RoundToEvenScaled(prec uint8) Float64x2
- func (x Float64x2) RoundToEvenScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Scale(y Float64x2) Float64x2
- func (x Float64x2) SelectFromPair(a, b uint8, y Float64x2) Float64x2
- func (x Float64x2) SetElem(index uint8, y float64) Float64x2
- func (x Float64x2) Sqrt() Float64x2
- func (x Float64x2) Store(y *[2]float64)
- func (x Float64x2) StoreMasked(y *[2]float64, mask Mask64x2)
- func (x Float64x2) StoreSlice(s []float64)
- func (x Float64x2) StoreSlicePart(s []float64)
- func (x Float64x2) String() string
- func (x Float64x2) Sub(y Float64x2) Float64x2
- func (x Float64x2) SubPairs(y Float64x2) Float64x2
- func (x Float64x2) Trunc() Float64x2
- func (x Float64x2) TruncScaled(prec uint8) Float64x2
- func (x Float64x2) TruncScaledResidue(prec uint8) Float64x2
type Float64x4
- func BroadcastFloat64x4(x float64) Float64x4
- func LoadFloat64x4(y *[4]float64) Float64x4
- func LoadFloat64x4Slice(s []float64) Float64x4
- func LoadFloat64x4SlicePart(s []float64) Float64x4
- func LoadMaskedFloat64x4(y *[4]float64, mask Mask64x4) Float64x4
- func (x Float64x4) Add(y Float64x4) Float64x4
- func (x Float64x4) AddPairs(y Float64x4) Float64x4
- func (x Float64x4) AddSub(y Float64x4) Float64x4
- func (from Float64x4) AsFloat32x8() (to Float32x8)
- func (from Float64x4) AsInt16x16() (to Int16x16)
- func (from Float64x4) AsInt32x8() (to Int32x8)
- func (from Float64x4) AsInt64x4() (to Int64x4)
- func (from Float64x4) AsInt8x32() (to Int8x32)
- func (from Float64x4) AsUint16x16() (to Uint16x16)
- func (from Float64x4) AsUint32x8() (to Uint32x8)
- func (from Float64x4) AsUint64x4() (to Uint64x4)
- func (from Float64x4) AsUint8x32() (to Uint8x32)
- func (x Float64x4) Ceil() Float64x4
- func (x Float64x4) CeilScaled(prec uint8) Float64x4
- func (x Float64x4) CeilScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Compress(mask Mask64x4) Float64x4
- func (x Float64x4) ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
- func (x Float64x4) ConvertToFloat32() Float32x4
- func (x Float64x4) ConvertToInt32() Int32x4
- func (x Float64x4) ConvertToInt64() Int64x4
- func (x Float64x4) ConvertToUint32() Uint32x4
- func (x Float64x4) ConvertToUint64() Uint64x4
- func (x Float64x4) Div(y Float64x4) Float64x4
- func (x Float64x4) Equal(y Float64x4) Mask64x4
- func (x Float64x4) Expand(mask Mask64x4) Float64x4
- func (x Float64x4) Floor() Float64x4
- func (x Float64x4) FloorScaled(prec uint8) Float64x4
- func (x Float64x4) FloorScaledResidue(prec uint8) Float64x4
- func (x Float64x4) GetHi() Float64x2
- func (x Float64x4) GetLo() Float64x2
- func (x Float64x4) Greater(y Float64x4) Mask64x4
- func (x Float64x4) GreaterEqual(y Float64x4) Mask64x4
- func (x Float64x4) IsNan(y Float64x4) Mask64x4
- func (x Float64x4) Len() int
- func (x Float64x4) Less(y Float64x4) Mask64x4
- func (x Float64x4) LessEqual(y Float64x4) Mask64x4
- func (x Float64x4) Masked(mask Mask64x4) Float64x4
- func (x Float64x4) Max(y Float64x4) Float64x4
- func (x Float64x4) Merge(y Float64x4, mask Mask64x4) Float64x4
- func (x Float64x4) Min(y Float64x4) Float64x4
- func (x Float64x4) Mul(y Float64x4) Float64x4
- func (x Float64x4) MulAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulAddSub(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulSubAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) NotEqual(y Float64x4) Mask64x4
- func (x Float64x4) Permute(indices Uint64x4) Float64x4
- func (x Float64x4) Reciprocal() Float64x4
- func (x Float64x4) ReciprocalSqrt() Float64x4
- func (x Float64x4) RoundToEven() Float64x4
- func (x Float64x4) RoundToEvenScaled(prec uint8) Float64x4
- func (x Float64x4) RoundToEvenScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Scale(y Float64x4) Float64x4
- func (x Float64x4) Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
- func (x Float64x4) SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
- func (x Float64x4) SetHi(y Float64x2) Float64x4
- func (x Float64x4) SetLo(y Float64x2) Float64x4
- func (x Float64x4) Sqrt() Float64x4
- func (x Float64x4) Store(y *[4]float64)
- func (x Float64x4) StoreMasked(y *[4]float64, mask Mask64x4)
- func (x Float64x4) StoreSlice(s []float64)
- func (x Float64x4) StoreSlicePart(s []float64)
- func (x Float64x4) String() string
- func (x Float64x4) Sub(y Float64x4) Float64x4
- func (x Float64x4) SubPairs(y Float64x4) Float64x4
- func (x Float64x4) Trunc() Float64x4
- func (x Float64x4) TruncScaled(prec uint8) Float64x4
- func (x Float64x4) TruncScaledResidue(prec uint8) Float64x4
type Float64x8
- func BroadcastFloat64x8(x float64) Float64x8
- func LoadFloat64x8(y *[8]float64) Float64x8
- func LoadFloat64x8Slice(s []float64) Float64x8
- func LoadFloat64x8SlicePart(s []float64) Float64x8
- func LoadMaskedFloat64x8(y *[8]float64, mask Mask64x8) Float64x8
- func (x Float64x8) Add(y Float64x8) Float64x8
- func (from Float64x8) AsFloat32x16() (to Float32x16)
- func (from Float64x8) AsInt16x32() (to Int16x32)
- func (from Float64x8) AsInt32x16() (to Int32x16)
- func (from Float64x8) AsInt64x8() (to Int64x8)
- func (from Float64x8) AsInt8x64() (to Int8x64)
- func (from Float64x8) AsUint16x32() (to Uint16x32)
- func (from Float64x8) AsUint32x16() (to Uint32x16)
- func (from Float64x8) AsUint64x8() (to Uint64x8)
- func (from Float64x8) AsUint8x64() (to Uint8x64)
- func (x Float64x8) CeilScaled(prec uint8) Float64x8
- func (x Float64x8) CeilScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Compress(mask Mask64x8) Float64x8
- func (x Float64x8) ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
- func (x Float64x8) ConvertToFloat32() Float32x8
- func (x Float64x8) ConvertToInt32() Int32x8
- func (x Float64x8) ConvertToInt64() Int64x8
- func (x Float64x8) ConvertToUint32() Uint32x8
- func (x Float64x8) ConvertToUint64() Uint64x8
- func (x Float64x8) Div(y Float64x8) Float64x8
- func (x Float64x8) Equal(y Float64x8) Mask64x8
- func (x Float64x8) Expand(mask Mask64x8) Float64x8
- func (x Float64x8) FloorScaled(prec uint8) Float64x8
- func (x Float64x8) FloorScaledResidue(prec uint8) Float64x8
- func (x Float64x8) GetHi() Float64x4
- func (x Float64x8) GetLo() Float64x4
- func (x Float64x8) Greater(y Float64x8) Mask64x8
- func (x Float64x8) GreaterEqual(y Float64x8) Mask64x8
- func (x Float64x8) IsNan(y Float64x8) Mask64x8
- func (x Float64x8) Len() int
- func (x Float64x8) Less(y Float64x8) Mask64x8
- func (x Float64x8) LessEqual(y Float64x8) Mask64x8
- func (x Float64x8) Masked(mask Mask64x8) Float64x8
- func (x Float64x8) Max(y Float64x8) Float64x8
- func (x Float64x8) Merge(y Float64x8, mask Mask64x8) Float64x8
- func (x Float64x8) Min(y Float64x8) Float64x8
- func (x Float64x8) Mul(y Float64x8) Float64x8
- func (x Float64x8) MulAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulAddSub(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulSubAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) NotEqual(y Float64x8) Mask64x8
- func (x Float64x8) Permute(indices Uint64x8) Float64x8
- func (x Float64x8) Reciprocal() Float64x8
- func (x Float64x8) ReciprocalSqrt() Float64x8
- func (x Float64x8) RoundToEvenScaled(prec uint8) Float64x8
- func (x Float64x8) RoundToEvenScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Scale(y Float64x8) Float64x8
- func (x Float64x8) SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
- func (x Float64x8) SetHi(y Float64x4) Float64x8
- func (x Float64x8) SetLo(y Float64x4) Float64x8
- func (x Float64x8) Sqrt() Float64x8
- func (x Float64x8) Store(y *[8]float64)
- func (x Float64x8) StoreMasked(y *[8]float64, mask Mask64x8)
- func (x Float64x8) StoreSlice(s []float64)
- func (x Float64x8) StoreSlicePart(s []float64)
- func (x Float64x8) String() string
- func (x Float64x8) Sub(y Float64x8) Float64x8
- func (x Float64x8) TruncScaled(prec uint8) Float64x8
- func (x Float64x8) TruncScaledResidue(prec uint8) Float64x8
type Int16x16
- func BroadcastInt16x16(x int16) Int16x16
- func LoadInt16x16(y *[16]int16) Int16x16
- func LoadInt16x16Slice(s []int16) Int16x16
- func LoadInt16x16SlicePart(s []int16) Int16x16
- func (x Int16x16) Abs() Int16x16
- func (x Int16x16) Add(y Int16x16) Int16x16
- func (x Int16x16) AddPairs(y Int16x16) Int16x16
- func (x Int16x16) AddPairsSaturated(y Int16x16) Int16x16
- func (x Int16x16) AddSaturated(y Int16x16) Int16x16
- func (x Int16x16) And(y Int16x16) Int16x16
- func (x Int16x16) AndNot(y Int16x16) Int16x16
- func (from Int16x16) AsFloat32x8() (to Float32x8)
- func (from Int16x16) AsFloat64x4() (to Float64x4)
- func (from Int16x16) AsInt32x8() (to Int32x8)
- func (from Int16x16) AsInt64x4() (to Int64x4)
- func (from Int16x16) AsInt8x32() (to Int8x32)
- func (from Int16x16) AsUint16x16() (to Uint16x16)
- func (from Int16x16) AsUint32x8() (to Uint32x8)
- func (from Int16x16) AsUint64x4() (to Uint64x4)
- func (from Int16x16) AsUint8x32() (to Uint8x32)
- func (x Int16x16) Compress(mask Mask16x16) Int16x16
- func (x Int16x16) ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
- func (x Int16x16) CopySign(y Int16x16) Int16x16
- func (x Int16x16) DotProductPairs(y Int16x16) Int32x8
- func (x Int16x16) Equal(y Int16x16) Mask16x16
- func (x Int16x16) Expand(mask Mask16x16) Int16x16
- func (x Int16x16) ExtendToInt32() Int32x16
- func (x Int16x16) GetHi() Int16x8
- func (x Int16x16) GetLo() Int16x8
- func (x Int16x16) Greater(y Int16x16) Mask16x16
- func (x Int16x16) GreaterEqual(y Int16x16) Mask16x16
- func (x Int16x16) InterleaveHiGrouped(y Int16x16) Int16x16
- func (x Int16x16) InterleaveLoGrouped(y Int16x16) Int16x16
- func (x Int16x16) IsZero() bool
- func (x Int16x16) Len() int
- func (x Int16x16) Less(y Int16x16) Mask16x16
- func (x Int16x16) LessEqual(y Int16x16) Mask16x16
- func (x Int16x16) Masked(mask Mask16x16) Int16x16
- func (x Int16x16) Max(y Int16x16) Int16x16
- func (x Int16x16) Merge(y Int16x16, mask Mask16x16) Int16x16
- func (x Int16x16) Min(y Int16x16) Int16x16
- func (x Int16x16) Mul(y Int16x16) Int16x16
- func (x Int16x16) MulHigh(y Int16x16) Int16x16
- func (x Int16x16) Not() Int16x16
- func (x Int16x16) NotEqual(y Int16x16) Mask16x16
- func (x Int16x16) OnesCount() Int16x16
- func (x Int16x16) Or(y Int16x16) Int16x16
- func (x Int16x16) Permute(indices Uint16x16) Int16x16
- func (x Int16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) SaturateToInt8() Int8x16
- func (x Int16x16) SaturateToUint8() Int8x16
- func (x Int16x16) Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
- func (x Int16x16) SetHi(y Int16x8) Int16x16
- func (x Int16x16) SetLo(y Int16x8) Int16x16
- func (x Int16x16) ShiftAllLeft(y uint64) Int16x16
- func (x Int16x16) ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftAllRight(y uint64) Int16x16
- func (x Int16x16) ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftLeft(y Int16x16) Int16x16
- func (x Int16x16) ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) ShiftRight(y Int16x16) Int16x16
- func (x Int16x16) ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) Store(y *[16]int16)
- func (x Int16x16) StoreSlice(s []int16)
- func (x Int16x16) StoreSlicePart(s []int16)
- func (x Int16x16) String() string
- func (x Int16x16) Sub(y Int16x16) Int16x16
- func (x Int16x16) SubPairs(y Int16x16) Int16x16
- func (x Int16x16) SubPairsSaturated(y Int16x16) Int16x16
- func (x Int16x16) SubSaturated(y Int16x16) Int16x16
- func (from Int16x16) ToMask() (to Mask16x16)
- func (x Int16x16) TruncateToInt8() Int8x16
- func (x Int16x16) Xor(y Int16x16) Int16x16
type Int16x32
- func BroadcastInt16x32(x int16) Int16x32
- func LoadInt16x32(y *[32]int16) Int16x32
- func LoadInt16x32Slice(s []int16) Int16x32
- func LoadInt16x32SlicePart(s []int16) Int16x32
- func LoadMaskedInt16x32(y *[32]int16, mask Mask16x32) Int16x32
- func (x Int16x32) Abs() Int16x32
- func (x Int16x32) Add(y Int16x32) Int16x32
- func (x Int16x32) AddSaturated(y Int16x32) Int16x32
- func (x Int16x32) And(y Int16x32) Int16x32
- func (x Int16x32) AndNot(y Int16x32) Int16x32
- func (from Int16x32) AsFloat32x16() (to Float32x16)
- func (from Int16x32) AsFloat64x8() (to Float64x8)
- func (from Int16x32) AsInt32x16() (to Int32x16)
- func (from Int16x32) AsInt64x8() (to Int64x8)
- func (from Int16x32) AsInt8x64() (to Int8x64)
- func (from Int16x32) AsUint16x32() (to Uint16x32)
- func (from Int16x32) AsUint32x16() (to Uint32x16)
- func (from Int16x32) AsUint64x8() (to Uint64x8)
- func (from Int16x32) AsUint8x64() (to Uint8x64)
- func (x Int16x32) Compress(mask Mask16x32) Int16x32
- func (x Int16x32) ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
- func (x Int16x32) DotProductPairs(y Int16x32) Int32x16
- func (x Int16x32) Equal(y Int16x32) Mask16x32
- func (x Int16x32) Expand(mask Mask16x32) Int16x32
- func (x Int16x32) GetHi() Int16x16
- func (x Int16x32) GetLo() Int16x16
- func (x Int16x32) Greater(y Int16x32) Mask16x32
- func (x Int16x32) GreaterEqual(y Int16x32) Mask16x32
- func (x Int16x32) InterleaveHiGrouped(y Int16x32) Int16x32
- func (x Int16x32) InterleaveLoGrouped(y Int16x32) Int16x32
- func (x Int16x32) Len() int
- func (x Int16x32) Less(y Int16x32) Mask16x32
- func (x Int16x32) LessEqual(y Int16x32) Mask16x32
- func (x Int16x32) Masked(mask Mask16x32) Int16x32
- func (x Int16x32) Max(y Int16x32) Int16x32
- func (x Int16x32) Merge(y Int16x32, mask Mask16x32) Int16x32
- func (x Int16x32) Min(y Int16x32) Int16x32
- func (x Int16x32) Mul(y Int16x32) Int16x32
- func (x Int16x32) MulHigh(y Int16x32) Int16x32
- func (x Int16x32) Not() Int16x32
- func (x Int16x32) NotEqual(y Int16x32) Mask16x32
- func (x Int16x32) OnesCount() Int16x32
- func (x Int16x32) Or(y Int16x32) Int16x32
- func (x Int16x32) Permute(indices Uint16x32) Int16x32
- func (x Int16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) SaturateToInt8() Int8x32
- func (x Int16x32) SetHi(y Int16x16) Int16x32
- func (x Int16x32) SetLo(y Int16x16) Int16x32
- func (x Int16x32) ShiftAllLeft(y uint64) Int16x32
- func (x Int16x32) ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftAllRight(y uint64) Int16x32
- func (x Int16x32) ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftLeft(y Int16x32) Int16x32
- func (x Int16x32) ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) ShiftRight(y Int16x32) Int16x32
- func (x Int16x32) ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) Store(y *[32]int16)
- func (x Int16x32) StoreMasked(y *[32]int16, mask Mask16x32)
- func (x Int16x32) StoreSlice(s []int16)
- func (x Int16x32) StoreSlicePart(s []int16)
- func (x Int16x32) String() string
- func (x Int16x32) Sub(y Int16x32) Int16x32
- func (x Int16x32) SubSaturated(y Int16x32) Int16x32
- func (from Int16x32) ToMask() (to Mask16x32)
- func (x Int16x32) TruncateToInt8() Int8x32
- func (x Int16x32) Xor(y Int16x32) Int16x32
type Int16x8
- func BroadcastInt16x8(x int16) Int16x8
- func LoadInt16x8(y *[8]int16) Int16x8
- func LoadInt16x8Slice(s []int16) Int16x8
- func LoadInt16x8SlicePart(s []int16) Int16x8
- func (x Int16x8) Abs() Int16x8
- func (x Int16x8) Add(y Int16x8) Int16x8
- func (x Int16x8) AddPairs(y Int16x8) Int16x8
- func (x Int16x8) AddPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) AddSaturated(y Int16x8) Int16x8
- func (x Int16x8) And(y Int16x8) Int16x8
- func (x Int16x8) AndNot(y Int16x8) Int16x8
- func (from Int16x8) AsFloat32x4() (to Float32x4)
- func (from Int16x8) AsFloat64x2() (to Float64x2)
- func (from Int16x8) AsInt32x4() (to Int32x4)
- func (from Int16x8) AsInt64x2() (to Int64x2)
- func (from Int16x8) AsInt8x16() (to Int8x16)
- func (from Int16x8) AsUint16x8() (to Uint16x8)
- func (from Int16x8) AsUint32x4() (to Uint32x4)
- func (from Int16x8) AsUint64x2() (to Uint64x2)
- func (from Int16x8) AsUint8x16() (to Uint8x16)
- func (x Int16x8) Broadcast128() Int16x8
- func (x Int16x8) Broadcast256() Int16x16
- func (x Int16x8) Broadcast512() Int16x32
- func (x Int16x8) Compress(mask Mask16x8) Int16x8
- func (x Int16x8) ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
- func (x Int16x8) CopySign(y Int16x8) Int16x8
- func (x Int16x8) DotProductPairs(y Int16x8) Int32x4
- func (x Int16x8) Equal(y Int16x8) Mask16x8
- func (x Int16x8) Expand(mask Mask16x8) Int16x8
- func (x Int16x8) ExtendLo2ToInt64x2() Int64x2
- func (x Int16x8) ExtendLo4ToInt32x4() Int32x4
- func (x Int16x8) ExtendLo4ToInt64x4() Int64x4
- func (x Int16x8) ExtendToInt32() Int32x8
- func (x Int16x8) ExtendToInt64() Int64x8
- func (x Int16x8) GetElem(index uint8) int16
- func (x Int16x8) Greater(y Int16x8) Mask16x8
- func (x Int16x8) GreaterEqual(y Int16x8) Mask16x8
- func (x Int16x8) InterleaveHi(y Int16x8) Int16x8
- func (x Int16x8) InterleaveLo(y Int16x8) Int16x8
- func (x Int16x8) IsZero() bool
- func (x Int16x8) Len() int
- func (x Int16x8) Less(y Int16x8) Mask16x8
- func (x Int16x8) LessEqual(y Int16x8) Mask16x8
- func (x Int16x8) Masked(mask Mask16x8) Int16x8
- func (x Int16x8) Max(y Int16x8) Int16x8
- func (x Int16x8) Merge(y Int16x8, mask Mask16x8) Int16x8
- func (x Int16x8) Min(y Int16x8) Int16x8
- func (x Int16x8) Mul(y Int16x8) Int16x8
- func (x Int16x8) MulHigh(y Int16x8) Int16x8
- func (x Int16x8) Not() Int16x8
- func (x Int16x8) NotEqual(y Int16x8) Mask16x8
- func (x Int16x8) OnesCount() Int16x8
- func (x Int16x8) Or(y Int16x8) Int16x8
- func (x Int16x8) Permute(indices Uint16x8) Int16x8
- func (x Int16x8) PermuteScalarsHi(a, b, c, d uint8) Int16x8
- func (x Int16x8) PermuteScalarsLo(a, b, c, d uint8) Int16x8
- func (x Int16x8) SaturateToInt8() Int8x16
- func (x Int16x8) SaturateToUint8() Int8x16
- func (x Int16x8) SetElem(index uint8, y int16) Int16x8
- func (x Int16x8) ShiftAllLeft(y uint64) Int16x8
- func (x Int16x8) ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftAllRight(y uint64) Int16x8
- func (x Int16x8) ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftLeft(y Int16x8) Int16x8
- func (x Int16x8) ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) ShiftRight(y Int16x8) Int16x8
- func (x Int16x8) ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) Store(y *[8]int16)
- func (x Int16x8) StoreSlice(s []int16)
- func (x Int16x8) StoreSlicePart(s []int16)
- func (x Int16x8) String() string
- func (x Int16x8) Sub(y Int16x8) Int16x8
- func (x Int16x8) SubPairs(y Int16x8) Int16x8
- func (x Int16x8) SubPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) SubSaturated(y Int16x8) Int16x8
- func (from Int16x8) ToMask() (to Mask16x8)
- func (x Int16x8) TruncateToInt8() Int8x16
- func (x Int16x8) Xor(y Int16x8) Int16x8
type Int32x16
- func BroadcastInt32x16(x int32) Int32x16
- func LoadInt32x16(y *[16]int32) Int32x16
- func LoadInt32x16Slice(s []int32) Int32x16
- func LoadInt32x16SlicePart(s []int32) Int32x16
- func LoadMaskedInt32x16(y *[16]int32, mask Mask32x16) Int32x16
- func (x Int32x16) Abs() Int32x16
- func (x Int32x16) Add(y Int32x16) Int32x16
- func (x Int32x16) And(y Int32x16) Int32x16
- func (x Int32x16) AndNot(y Int32x16) Int32x16
- func (from Int32x16) AsFloat32x16() (to Float32x16)
- func (from Int32x16) AsFloat64x8() (to Float64x8)
- func (from Int32x16) AsInt16x32() (to Int16x32)
- func (from Int32x16) AsInt64x8() (to Int64x8)
- func (from Int32x16) AsInt8x64() (to Int8x64)
- func (from Int32x16) AsUint16x32() (to Uint16x32)
- func (from Int32x16) AsUint32x16() (to Uint32x16)
- func (from Int32x16) AsUint64x8() (to Uint64x8)
- func (from Int32x16) AsUint8x64() (to Uint8x64)
- func (x Int32x16) Compress(mask Mask32x16) Int32x16
- func (x Int32x16) ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
- func (x Int32x16) ConvertToFloat32() Float32x16
- func (x Int32x16) Equal(y Int32x16) Mask32x16
- func (x Int32x16) Expand(mask Mask32x16) Int32x16
- func (x Int32x16) GetHi() Int32x8
- func (x Int32x16) GetLo() Int32x8
- func (x Int32x16) Greater(y Int32x16) Mask32x16
- func (x Int32x16) GreaterEqual(y Int32x16) Mask32x16
- func (x Int32x16) InterleaveHiGrouped(y Int32x16) Int32x16
- func (x Int32x16) InterleaveLoGrouped(y Int32x16) Int32x16
- func (x Int32x16) LeadingZeros() Int32x16
- func (x Int32x16) Len() int
- func (x Int32x16) Less(y Int32x16) Mask32x16
- func (x Int32x16) LessEqual(y Int32x16) Mask32x16
- func (x Int32x16) Masked(mask Mask32x16) Int32x16
- func (x Int32x16) Max(y Int32x16) Int32x16
- func (x Int32x16) Merge(y Int32x16, mask Mask32x16) Int32x16
- func (x Int32x16) Min(y Int32x16) Int32x16
- func (x Int32x16) Mul(y Int32x16) Int32x16
- func (x Int32x16) Not() Int32x16
- func (x Int32x16) NotEqual(y Int32x16) Mask32x16
- func (x Int32x16) OnesCount() Int32x16
- func (x Int32x16) Or(y Int32x16) Int32x16
- func (x Int32x16) Permute(indices Uint32x16) Int32x16
- func (x Int32x16) PermuteScalarsGrouped(a, b, c, d uint8) Int32x16
- func (x Int32x16) RotateAllLeft(shift uint8) Int32x16
- func (x Int32x16) RotateAllRight(shift uint8) Int32x16
- func (x Int32x16) RotateLeft(y Int32x16) Int32x16
- func (x Int32x16) RotateRight(y Int32x16) Int32x16
- func (x Int32x16) SaturateToInt16() Int16x16
- func (x Int32x16) SaturateToInt16Concat(y Int32x16) Int16x32
- func (x Int32x16) SaturateToInt8() Int8x16
- func (x Int32x16) SaturateToUint8() Int8x16
- func (x Int32x16) SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
- func (x Int32x16) SetHi(y Int32x8) Int32x16
- func (x Int32x16) SetLo(y Int32x8) Int32x16
- func (x Int32x16) ShiftAllLeft(y uint64) Int32x16
- func (x Int32x16) ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftAllRight(y uint64) Int32x16
- func (x Int32x16) ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftLeft(y Int32x16) Int32x16
- func (x Int32x16) ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) ShiftRight(y Int32x16) Int32x16
- func (x Int32x16) ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) Store(y *[16]int32)
- func (x Int32x16) StoreMasked(y *[16]int32, mask Mask32x16)
- func (x Int32x16) StoreSlice(s []int32)
- func (x Int32x16) StoreSlicePart(s []int32)
- func (x Int32x16) String() string
- func (x Int32x16) Sub(y Int32x16) Int32x16
- func (from Int32x16) ToMask() (to Mask32x16)
- func (x Int32x16) TruncateToInt16() Int16x16
- func (x Int32x16) TruncateToInt8() Int8x16
- func (x Int32x16) Xor(y Int32x16) Int32x16
type Int32x4
- func BroadcastInt32x4(x int32) Int32x4
- func LoadInt32x4(y *[4]int32) Int32x4
- func LoadInt32x4Slice(s []int32) Int32x4
- func LoadInt32x4SlicePart(s []int32) Int32x4
- func LoadMaskedInt32x4(y *[4]int32, mask Mask32x4) Int32x4
- func (x Int32x4) Abs() Int32x4
- func (x Int32x4) Add(y Int32x4) Int32x4
- func (x Int32x4) AddPairs(y Int32x4) Int32x4
- func (x Int32x4) And(y Int32x4) Int32x4
- func (x Int32x4) AndNot(y Int32x4) Int32x4
- func (from Int32x4) AsFloat32x4() (to Float32x4)
- func (from Int32x4) AsFloat64x2() (to Float64x2)
- func (from Int32x4) AsInt16x8() (to Int16x8)
- func (from Int32x4) AsInt64x2() (to Int64x2)
- func (from Int32x4) AsInt8x16() (to Int8x16)
- func (from Int32x4) AsUint16x8() (to Uint16x8)
- func (from Int32x4) AsUint32x4() (to Uint32x4)
- func (from Int32x4) AsUint64x2() (to Uint64x2)
- func (from Int32x4) AsUint8x16() (to Uint8x16)
- func (x Int32x4) Broadcast128() Int32x4
- func (x Int32x4) Broadcast256() Int32x8
- func (x Int32x4) Broadcast512() Int32x16
- func (x Int32x4) Compress(mask Mask32x4) Int32x4
- func (x Int32x4) ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
- func (x Int32x4) ConvertToFloat32() Float32x4
- func (x Int32x4) ConvertToFloat64() Float64x4
- func (x Int32x4) CopySign(y Int32x4) Int32x4
- func (x Int32x4) Equal(y Int32x4) Mask32x4
- func (x Int32x4) Expand(mask Mask32x4) Int32x4
- func (x Int32x4) ExtendLo2ToInt64x2() Int64x2
- func (x Int32x4) ExtendToInt64() Int64x4
- func (x Int32x4) GetElem(index uint8) int32
- func (x Int32x4) Greater(y Int32x4) Mask32x4
- func (x Int32x4) GreaterEqual(y Int32x4) Mask32x4
- func (x Int32x4) InterleaveHi(y Int32x4) Int32x4
- func (x Int32x4) InterleaveLo(y Int32x4) Int32x4
- func (x Int32x4) IsZero() bool
- func (x Int32x4) LeadingZeros() Int32x4
- func (x Int32x4) Len() int
- func (x Int32x4) Less(y Int32x4) Mask32x4
- func (x Int32x4) LessEqual(y Int32x4) Mask32x4
- func (x Int32x4) Masked(mask Mask32x4) Int32x4
- func (x Int32x4) Max(y Int32x4) Int32x4
- func (x Int32x4) Merge(y Int32x4, mask Mask32x4) Int32x4
- func (x Int32x4) Min(y Int32x4) Int32x4
- func (x Int32x4) Mul(y Int32x4) Int32x4
- func (x Int32x4) MulEvenWiden(y Int32x4) Int64x2
- func (x Int32x4) Not() Int32x4
- func (x Int32x4) NotEqual(y Int32x4) Mask32x4
- func (x Int32x4) OnesCount() Int32x4
- func (x Int32x4) Or(y Int32x4) Int32x4
- func (x Int32x4) PermuteScalars(a, b, c, d uint8) Int32x4
- func (x Int32x4) RotateAllLeft(shift uint8) Int32x4
- func (x Int32x4) RotateAllRight(shift uint8) Int32x4
- func (x Int32x4) RotateLeft(y Int32x4) Int32x4
- func (x Int32x4) RotateRight(y Int32x4) Int32x4
- func (x Int32x4) SaturateToInt16() Int16x8
- func (x Int32x4) SaturateToInt16Concat(y Int32x4) Int16x8
- func (x Int32x4) SaturateToInt8() Int8x16
- func (x Int32x4) SaturateToUint8() Int8x16
- func (x Int32x4) SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
- func (x Int32x4) SetElem(index uint8, y int32) Int32x4
- func (x Int32x4) ShiftAllLeft(y uint64) Int32x4
- func (x Int32x4) ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftAllRight(y uint64) Int32x4
- func (x Int32x4) ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftLeft(y Int32x4) Int32x4
- func (x Int32x4) ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) ShiftRight(y Int32x4) Int32x4
- func (x Int32x4) ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) Store(y *[4]int32)
- func (x Int32x4) StoreMasked(y *[4]int32, mask Mask32x4)
- func (x Int32x4) StoreSlice(s []int32)
- func (x Int32x4) StoreSlicePart(s []int32)
- func (x Int32x4) String() string
- func (x Int32x4) Sub(y Int32x4) Int32x4
- func (x Int32x4) SubPairs(y Int32x4) Int32x4
- func (from Int32x4) ToMask() (to Mask32x4)
- func (x Int32x4) TruncateToInt16() Int16x8
- func (x Int32x4) TruncateToInt8() Int8x16
- func (x Int32x4) Xor(y Int32x4) Int32x4
type Int32x8
- func BroadcastInt32x8(x int32) Int32x8
- func LoadInt32x8(y *[8]int32) Int32x8
- func LoadInt32x8Slice(s []int32) Int32x8
- func LoadInt32x8SlicePart(s []int32) Int32x8
- func LoadMaskedInt32x8(y *[8]int32, mask Mask32x8) Int32x8
- func (x Int32x8) Abs() Int32x8
- func (x Int32x8) Add(y Int32x8) Int32x8
- func (x Int32x8) AddPairs(y Int32x8) Int32x8
- func (x Int32x8) And(y Int32x8) Int32x8
- func (x Int32x8) AndNot(y Int32x8) Int32x8
- func (from Int32x8) AsFloat32x8() (to Float32x8)
- func (from Int32x8) AsFloat64x4() (to Float64x4)
- func (from Int32x8) AsInt16x16() (to Int16x16)
- func (from Int32x8) AsInt64x4() (to Int64x4)
- func (from Int32x8) AsInt8x32() (to Int8x32)
- func (from Int32x8) AsUint16x16() (to Uint16x16)
- func (from Int32x8) AsUint32x8() (to Uint32x8)
- func (from Int32x8) AsUint64x4() (to Uint64x4)
- func (from Int32x8) AsUint8x32() (to Uint8x32)
- func (x Int32x8) Compress(mask Mask32x8) Int32x8
- func (x Int32x8) ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
- func (x Int32x8) ConvertToFloat32() Float32x8
- func (x Int32x8) ConvertToFloat64() Float64x8
- func (x Int32x8) CopySign(y Int32x8) Int32x8
- func (x Int32x8) Equal(y Int32x8) Mask32x8
- func (x Int32x8) Expand(mask Mask32x8) Int32x8
- func (x Int32x8) ExtendToInt64() Int64x8
- func (x Int32x8) GetHi() Int32x4
- func (x Int32x8) GetLo() Int32x4
- func (x Int32x8) Greater(y Int32x8) Mask32x8
- func (x Int32x8) GreaterEqual(y Int32x8) Mask32x8
- func (x Int32x8) InterleaveHiGrouped(y Int32x8) Int32x8
- func (x Int32x8) InterleaveLoGrouped(y Int32x8) Int32x8
- func (x Int32x8) IsZero() bool
- func (x Int32x8) LeadingZeros() Int32x8
- func (x Int32x8) Len() int
- func (x Int32x8) Less(y Int32x8) Mask32x8
- func (x Int32x8) LessEqual(y Int32x8) Mask32x8
- func (x Int32x8) Masked(mask Mask32x8) Int32x8
- func (x Int32x8) Max(y Int32x8) Int32x8
- func (x Int32x8) Merge(y Int32x8, mask Mask32x8) Int32x8
- func (x Int32x8) Min(y Int32x8) Int32x8
- func (x Int32x8) Mul(y Int32x8) Int32x8
- func (x Int32x8) MulEvenWiden(y Int32x8) Int64x4
- func (x Int32x8) Not() Int32x8
- func (x Int32x8) NotEqual(y Int32x8) Mask32x8
- func (x Int32x8) OnesCount() Int32x8
- func (x Int32x8) Or(y Int32x8) Int32x8
- func (x Int32x8) Permute(indices Uint32x8) Int32x8
- func (x Int32x8) PermuteScalarsGrouped(a, b, c, d uint8) Int32x8
- func (x Int32x8) RotateAllLeft(shift uint8) Int32x8
- func (x Int32x8) RotateAllRight(shift uint8) Int32x8
- func (x Int32x8) RotateLeft(y Int32x8) Int32x8
- func (x Int32x8) RotateRight(y Int32x8) Int32x8
- func (x Int32x8) SaturateToInt16() Int16x8
- func (x Int32x8) SaturateToInt16Concat(y Int32x8) Int16x16
- func (x Int32x8) SaturateToInt8() Int8x16
- func (x Int32x8) SaturateToUint8() Int8x16
- func (x Int32x8) Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
- func (x Int32x8) SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
- func (x Int32x8) SetHi(y Int32x4) Int32x8
- func (x Int32x8) SetLo(y Int32x4) Int32x8
- func (x Int32x8) ShiftAllLeft(y uint64) Int32x8
- func (x Int32x8) ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftAllRight(y uint64) Int32x8
- func (x Int32x8) ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftLeft(y Int32x8) Int32x8
- func (x Int32x8) ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) ShiftRight(y Int32x8) Int32x8
- func (x Int32x8) ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) Store(y *[8]int32)
- func (x Int32x8) StoreMasked(y *[8]int32, mask Mask32x8)
- func (x Int32x8) StoreSlice(s []int32)
- func (x Int32x8) StoreSlicePart(s []int32)
- func (x Int32x8) String() string
- func (x Int32x8) Sub(y Int32x8) Int32x8
- func (x Int32x8) SubPairs(y Int32x8) Int32x8
- func (from Int32x8) ToMask() (to Mask32x8)
- func (x Int32x8) TruncateToInt16() Int16x8
- func (x Int32x8) TruncateToInt8() Int8x16
- func (x Int32x8) Xor(y Int32x8) Int32x8
type Int64x2
- func BroadcastInt64x2(x int64) Int64x2
- func LoadInt64x2(y *[2]int64) Int64x2
- func LoadInt64x2Slice(s []int64) Int64x2
- func LoadInt64x2SlicePart(s []int64) Int64x2
- func LoadMaskedInt64x2(y *[2]int64, mask Mask64x2) Int64x2
- func (x Int64x2) Abs() Int64x2
- func (x Int64x2) Add(y Int64x2) Int64x2
- func (x Int64x2) And(y Int64x2) Int64x2
- func (x Int64x2) AndNot(y Int64x2) Int64x2
- func (from Int64x2) AsFloat32x4() (to Float32x4)
- func (from Int64x2) AsFloat64x2() (to Float64x2)
- func (from Int64x2) AsInt16x8() (to Int16x8)
- func (from Int64x2) AsInt32x4() (to Int32x4)
- func (from Int64x2) AsInt8x16() (to Int8x16)
- func (from Int64x2) AsUint16x8() (to Uint16x8)
- func (from Int64x2) AsUint32x4() (to Uint32x4)
- func (from Int64x2) AsUint64x2() (to Uint64x2)
- func (from Int64x2) AsUint8x16() (to Uint8x16)
- func (x Int64x2) Broadcast128() Int64x2
- func (x Int64x2) Broadcast256() Int64x4
- func (x Int64x2) Broadcast512() Int64x8
- func (x Int64x2) Compress(mask Mask64x2) Int64x2
- func (x Int64x2) ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
- func (x Int64x2) ConvertToFloat32() Float32x4
- func (x Int64x2) ConvertToFloat64() Float64x2
- func (x Int64x2) Equal(y Int64x2) Mask64x2
- func (x Int64x2) Expand(mask Mask64x2) Int64x2
- func (x Int64x2) GetElem(index uint8) int64
- func (x Int64x2) Greater(y Int64x2) Mask64x2
- func (x Int64x2) GreaterEqual(y Int64x2) Mask64x2
- func (x Int64x2) InterleaveHi(y Int64x2) Int64x2
- func (x Int64x2) InterleaveLo(y Int64x2) Int64x2
- func (x Int64x2) IsZero() bool
- func (x Int64x2) LeadingZeros() Int64x2
- func (x Int64x2) Len() int
- func (x Int64x2) Less(y Int64x2) Mask64x2
- func (x Int64x2) LessEqual(y Int64x2) Mask64x2
- func (x Int64x2) Masked(mask Mask64x2) Int64x2
- func (x Int64x2) Max(y Int64x2) Int64x2
- func (x Int64x2) Merge(y Int64x2, mask Mask64x2) Int64x2
- func (x Int64x2) Min(y Int64x2) Int64x2
- func (x Int64x2) Mul(y Int64x2) Int64x2
- func (x Int64x2) Not() Int64x2
- func (x Int64x2) NotEqual(y Int64x2) Mask64x2
- func (x Int64x2) OnesCount() Int64x2
- func (x Int64x2) Or(y Int64x2) Int64x2
- func (x Int64x2) RotateAllLeft(shift uint8) Int64x2
- func (x Int64x2) RotateAllRight(shift uint8) Int64x2
- func (x Int64x2) RotateLeft(y Int64x2) Int64x2
- func (x Int64x2) RotateRight(y Int64x2) Int64x2
- func (x Int64x2) SaturateToInt16() Int16x8
- func (x Int64x2) SaturateToInt32() Int32x4
- func (x Int64x2) SaturateToInt8() Int8x16
- func (x Int64x2) SaturateToUint8() Int8x16
- func (x Int64x2) SelectFromPair(a, b uint8, y Int64x2) Int64x2
- func (x Int64x2) SetElem(index uint8, y int64) Int64x2
- func (x Int64x2) ShiftAllLeft(y uint64) Int64x2
- func (x Int64x2) ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftAllRight(y uint64) Int64x2
- func (x Int64x2) ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftLeft(y Int64x2) Int64x2
- func (x Int64x2) ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) ShiftRight(y Int64x2) Int64x2
- func (x Int64x2) ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) Store(y *[2]int64)
- func (x Int64x2) StoreMasked(y *[2]int64, mask Mask64x2)
- func (x Int64x2) StoreSlice(s []int64)
- func (x Int64x2) StoreSlicePart(s []int64)
- func (x Int64x2) String() string
- func (x Int64x2) Sub(y Int64x2) Int64x2
- func (from Int64x2) ToMask() (to Mask64x2)
- func (x Int64x2) TruncateToInt16() Int16x8
- func (x Int64x2) TruncateToInt32() Int32x4
- func (x Int64x2) TruncateToInt8() Int8x16
- func (x Int64x2) Xor(y Int64x2) Int64x2
type Int64x4
- func BroadcastInt64x4(x int64) Int64x4
- func LoadInt64x4(y *[4]int64) Int64x4
- func LoadInt64x4Slice(s []int64) Int64x4
- func LoadInt64x4SlicePart(s []int64) Int64x4
- func LoadMaskedInt64x4(y *[4]int64, mask Mask64x4) Int64x4
- func (x Int64x4) Abs() Int64x4
- func (x Int64x4) Add(y Int64x4) Int64x4
- func (x Int64x4) And(y Int64x4) Int64x4
- func (x Int64x4) AndNot(y Int64x4) Int64x4
- func (from Int64x4) AsFloat32x8() (to Float32x8)
- func (from Int64x4) AsFloat64x4() (to Float64x4)
- func (from Int64x4) AsInt16x16() (to Int16x16)
- func (from Int64x4) AsInt32x8() (to Int32x8)
- func (from Int64x4) AsInt8x32() (to Int8x32)
- func (from Int64x4) AsUint16x16() (to Uint16x16)
- func (from Int64x4) AsUint32x8() (to Uint32x8)
- func (from Int64x4) AsUint64x4() (to Uint64x4)
- func (from Int64x4) AsUint8x32() (to Uint8x32)
- func (x Int64x4) Compress(mask Mask64x4) Int64x4
- func (x Int64x4) ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
- func (x Int64x4) ConvertToFloat32() Float32x4
- func (x Int64x4) ConvertToFloat64() Float64x4
- func (x Int64x4) Equal(y Int64x4) Mask64x4
- func (x Int64x4) Expand(mask Mask64x4) Int64x4
- func (x Int64x4) GetHi() Int64x2
- func (x Int64x4) GetLo() Int64x2
- func (x Int64x4) Greater(y Int64x4) Mask64x4
- func (x Int64x4) GreaterEqual(y Int64x4) Mask64x4
- func (x Int64x4) InterleaveHiGrouped(y Int64x4) Int64x4
- func (x Int64x4) InterleaveLoGrouped(y Int64x4) Int64x4
- func (x Int64x4) IsZero() bool
- func (x Int64x4) LeadingZeros() Int64x4
- func (x Int64x4) Len() int
- func (x Int64x4) Less(y Int64x4) Mask64x4
- func (x Int64x4) LessEqual(y Int64x4) Mask64x4
- func (x Int64x4) Masked(mask Mask64x4) Int64x4
- func (x Int64x4) Max(y Int64x4) Int64x4
- func (x Int64x4) Merge(y Int64x4, mask Mask64x4) Int64x4
- func (x Int64x4) Min(y Int64x4) Int64x4
- func (x Int64x4) Mul(y Int64x4) Int64x4
- func (x Int64x4) Not() Int64x4
- func (x Int64x4) NotEqual(y Int64x4) Mask64x4
- func (x Int64x4) OnesCount() Int64x4
- func (x Int64x4) Or(y Int64x4) Int64x4
- func (x Int64x4) Permute(indices Uint64x4) Int64x4
- func (x Int64x4) RotateAllLeft(shift uint8) Int64x4
- func (x Int64x4) RotateAllRight(shift uint8) Int64x4
- func (x Int64x4) RotateLeft(y Int64x4) Int64x4
- func (x Int64x4) RotateRight(y Int64x4) Int64x4
- func (x Int64x4) SaturateToInt16() Int16x8
- func (x Int64x4) SaturateToInt32() Int32x4
- func (x Int64x4) SaturateToInt8() Int8x16
- func (x Int64x4) SaturateToUint8() Int8x16
- func (x Int64x4) Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
- func (x Int64x4) SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
- func (x Int64x4) SetHi(y Int64x2) Int64x4
- func (x Int64x4) SetLo(y Int64x2) Int64x4
- func (x Int64x4) ShiftAllLeft(y uint64) Int64x4
- func (x Int64x4) ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftAllRight(y uint64) Int64x4
- func (x Int64x4) ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftLeft(y Int64x4) Int64x4
- func (x Int64x4) ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) ShiftRight(y Int64x4) Int64x4
- func (x Int64x4) ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) Store(y *[4]int64)
- func (x Int64x4) StoreMasked(y *[4]int64, mask Mask64x4)
- func (x Int64x4) StoreSlice(s []int64)
- func (x Int64x4) StoreSlicePart(s []int64)
- func (x Int64x4) String() string
- func (x Int64x4) Sub(y Int64x4) Int64x4
- func (from Int64x4) ToMask() (to Mask64x4)
- func (x Int64x4) TruncateToInt16() Int16x8
- func (x Int64x4) TruncateToInt32() Int32x4
- func (x Int64x4) TruncateToInt8() Int8x16
- func (x Int64x4) Xor(y Int64x4) Int64x4
type Int64x8
- func BroadcastInt64x8(x int64) Int64x8
- func LoadInt64x8(y *[8]int64) Int64x8
- func LoadInt64x8Slice(s []int64) Int64x8
- func LoadInt64x8SlicePart(s []int64) Int64x8
- func LoadMaskedInt64x8(y *[8]int64, mask Mask64x8) Int64x8
- func (x Int64x8) Abs() Int64x8
- func (x Int64x8) Add(y Int64x8) Int64x8
- func (x Int64x8) And(y Int64x8) Int64x8
- func (x Int64x8) AndNot(y Int64x8) Int64x8
- func (from Int64x8) AsFloat32x16() (to Float32x16)
- func (from Int64x8) AsFloat64x8() (to Float64x8)
- func (from Int64x8) AsInt16x32() (to Int16x32)
- func (from Int64x8) AsInt32x16() (to Int32x16)
- func (from Int64x8) AsInt8x64() (to Int8x64)
- func (from Int64x8) AsUint16x32() (to Uint16x32)
- func (from Int64x8) AsUint32x16() (to Uint32x16)
- func (from Int64x8) AsUint64x8() (to Uint64x8)
- func (from Int64x8) AsUint8x64() (to Uint8x64)
- func (x Int64x8) Compress(mask Mask64x8) Int64x8
- func (x Int64x8) ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
- func (x Int64x8) ConvertToFloat32() Float32x8
- func (x Int64x8) ConvertToFloat64() Float64x8
- func (x Int64x8) Equal(y Int64x8) Mask64x8
- func (x Int64x8) Expand(mask Mask64x8) Int64x8
- func (x Int64x8) GetHi() Int64x4
- func (x Int64x8) GetLo() Int64x4
- func (x Int64x8) Greater(y Int64x8) Mask64x8
- func (x Int64x8) GreaterEqual(y Int64x8) Mask64x8
- func (x Int64x8) InterleaveHiGrouped(y Int64x8) Int64x8
- func (x Int64x8) InterleaveLoGrouped(y Int64x8) Int64x8
- func (x Int64x8) LeadingZeros() Int64x8
- func (x Int64x8) Len() int
- func (x Int64x8) Less(y Int64x8) Mask64x8
- func (x Int64x8) LessEqual(y Int64x8) Mask64x8
- func (x Int64x8) Masked(mask Mask64x8) Int64x8
- func (x Int64x8) Max(y Int64x8) Int64x8
- func (x Int64x8) Merge(y Int64x8, mask Mask64x8) Int64x8
- func (x Int64x8) Min(y Int64x8) Int64x8
- func (x Int64x8) Mul(y Int64x8) Int64x8
- func (x Int64x8) Not() Int64x8
- func (x Int64x8) NotEqual(y Int64x8) Mask64x8
- func (x Int64x8) OnesCount() Int64x8
- func (x Int64x8) Or(y Int64x8) Int64x8
- func (x Int64x8) Permute(indices Uint64x8) Int64x8
- func (x Int64x8) RotateAllLeft(shift uint8) Int64x8
- func (x Int64x8) RotateAllRight(shift uint8) Int64x8
- func (x Int64x8) RotateLeft(y Int64x8) Int64x8
- func (x Int64x8) RotateRight(y Int64x8) Int64x8
- func (x Int64x8) SaturateToInt16() Int16x8
- func (x Int64x8) SaturateToInt32() Int32x8
- func (x Int64x8) SaturateToInt8() Int8x16
- func (x Int64x8) SaturateToUint8() Int8x16
- func (x Int64x8) SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
- func (x Int64x8) SetHi(y Int64x4) Int64x8
- func (x Int64x8) SetLo(y Int64x4) Int64x8
- func (x Int64x8) ShiftAllLeft(y uint64) Int64x8
- func (x Int64x8) ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftAllRight(y uint64) Int64x8
- func (x Int64x8) ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftLeft(y Int64x8) Int64x8
- func (x Int64x8) ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) ShiftRight(y Int64x8) Int64x8
- func (x Int64x8) ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) Store(y *[8]int64)
- func (x Int64x8) StoreMasked(y *[8]int64, mask Mask64x8)
- func (x Int64x8) StoreSlice(s []int64)
- func (x Int64x8) StoreSlicePart(s []int64)
- func (x Int64x8) String() string
- func (x Int64x8) Sub(y Int64x8) Int64x8
- func (from Int64x8) ToMask() (to Mask64x8)
- func (x Int64x8) TruncateToInt16() Int16x8
- func (x Int64x8) TruncateToInt32() Int32x8
- func (x Int64x8) TruncateToInt8() Int8x16
- func (x Int64x8) Xor(y Int64x8) Int64x8
type Int8x16
- func BroadcastInt8x16(x int8) Int8x16
- func LoadInt8x16(y *[16]int8) Int8x16
- func LoadInt8x16Slice(s []int8) Int8x16
- func LoadInt8x16SlicePart(s []int8) Int8x16
- func (x Int8x16) Abs() Int8x16
- func (x Int8x16) Add(y Int8x16) Int8x16
- func (x Int8x16) AddSaturated(y Int8x16) Int8x16
- func (x Int8x16) And(y Int8x16) Int8x16
- func (x Int8x16) AndNot(y Int8x16) Int8x16
- func (from Int8x16) AsFloat32x4() (to Float32x4)
- func (from Int8x16) AsFloat64x2() (to Float64x2)
- func (from Int8x16) AsInt16x8() (to Int16x8)
- func (from Int8x16) AsInt32x4() (to Int32x4)
- func (from Int8x16) AsInt64x2() (to Int64x2)
- func (from Int8x16) AsUint16x8() (to Uint16x8)
- func (from Int8x16) AsUint32x4() (to Uint32x4)
- func (from Int8x16) AsUint64x2() (to Uint64x2)
- func (from Int8x16) AsUint8x16() (to Uint8x16)
- func (x Int8x16) Broadcast128() Int8x16
- func (x Int8x16) Broadcast256() Int8x32
- func (x Int8x16) Broadcast512() Int8x64
- func (x Int8x16) Compress(mask Mask8x16) Int8x16
- func (x Int8x16) ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
- func (x Int8x16) CopySign(y Int8x16) Int8x16
- func (x Int8x16) DotProductQuadruple(y Uint8x16) Int32x4
- func (x Int8x16) DotProductQuadrupleSaturated(y Uint8x16) Int32x4
- func (x Int8x16) Equal(y Int8x16) Mask8x16
- func (x Int8x16) Expand(mask Mask8x16) Int8x16
- func (x Int8x16) ExtendLo2ToInt64x2() Int64x2
- func (x Int8x16) ExtendLo4ToInt32x4() Int32x4
- func (x Int8x16) ExtendLo4ToInt64x4() Int64x4
- func (x Int8x16) ExtendLo8ToInt16x8() Int16x8
- func (x Int8x16) ExtendLo8ToInt32x8() Int32x8
- func (x Int8x16) ExtendLo8ToInt64x8() Int64x8
- func (x Int8x16) ExtendToInt16() Int16x16
- func (x Int8x16) ExtendToInt32() Int32x16
- func (x Int8x16) GetElem(index uint8) int8
- func (x Int8x16) Greater(y Int8x16) Mask8x16
- func (x Int8x16) GreaterEqual(y Int8x16) Mask8x16
- func (x Int8x16) IsZero() bool
- func (x Int8x16) Len() int
- func (x Int8x16) Less(y Int8x16) Mask8x16
- func (x Int8x16) LessEqual(y Int8x16) Mask8x16
- func (x Int8x16) Masked(mask Mask8x16) Int8x16
- func (x Int8x16) Max(y Int8x16) Int8x16
- func (x Int8x16) Merge(y Int8x16, mask Mask8x16) Int8x16
- func (x Int8x16) Min(y Int8x16) Int8x16
- func (x Int8x16) Not() Int8x16
- func (x Int8x16) NotEqual(y Int8x16) Mask8x16
- func (x Int8x16) OnesCount() Int8x16
- func (x Int8x16) Or(y Int8x16) Int8x16
- func (x Int8x16) Permute(indices Uint8x16) Int8x16
- func (x Int8x16) PermuteOrZero(indices Int8x16) Int8x16
- func (x Int8x16) SetElem(index uint8, y int8) Int8x16
- func (x Int8x16) Store(y *[16]int8)
- func (x Int8x16) StoreSlice(s []int8)
- func (x Int8x16) StoreSlicePart(s []int8)
- func (x Int8x16) String() string
- func (x Int8x16) Sub(y Int8x16) Int8x16
- func (x Int8x16) SubSaturated(y Int8x16) Int8x16
- func (from Int8x16) ToMask() (to Mask8x16)
- func (x Int8x16) Xor(y Int8x16) Int8x16
type Int8x32
- func BroadcastInt8x32(x int8) Int8x32
- func LoadInt8x32(y *[32]int8) Int8x32
- func LoadInt8x32Slice(s []int8) Int8x32
- func LoadInt8x32SlicePart(s []int8) Int8x32
- func (x Int8x32) Abs() Int8x32
- func (x Int8x32) Add(y Int8x32) Int8x32
- func (x Int8x32) AddSaturated(y Int8x32) Int8x32
- func (x Int8x32) And(y Int8x32) Int8x32
- func (x Int8x32) AndNot(y Int8x32) Int8x32
- func (from Int8x32) AsFloat32x8() (to Float32x8)
- func (from Int8x32) AsFloat64x4() (to Float64x4)
- func (from Int8x32) AsInt16x16() (to Int16x16)
- func (from Int8x32) AsInt32x8() (to Int32x8)
- func (from Int8x32) AsInt64x4() (to Int64x4)
- func (from Int8x32) AsUint16x16() (to Uint16x16)
- func (from Int8x32) AsUint32x8() (to Uint32x8)
- func (from Int8x32) AsUint64x4() (to Uint64x4)
- func (from Int8x32) AsUint8x32() (to Uint8x32)
- func (x Int8x32) Compress(mask Mask8x32) Int8x32
- func (x Int8x32) ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
- func (x Int8x32) CopySign(y Int8x32) Int8x32
- func (x Int8x32) DotProductQuadruple(y Uint8x32) Int32x8
- func (x Int8x32) DotProductQuadrupleSaturated(y Uint8x32) Int32x8
- func (x Int8x32) Equal(y Int8x32) Mask8x32
- func (x Int8x32) Expand(mask Mask8x32) Int8x32
- func (x Int8x32) ExtendToInt16() Int16x32
- func (x Int8x32) GetHi() Int8x16
- func (x Int8x32) GetLo() Int8x16
- func (x Int8x32) Greater(y Int8x32) Mask8x32
- func (x Int8x32) GreaterEqual(y Int8x32) Mask8x32
- func (x Int8x32) IsZero() bool
- func (x Int8x32) Len() int
- func (x Int8x32) Less(y Int8x32) Mask8x32
- func (x Int8x32) LessEqual(y Int8x32) Mask8x32
- func (x Int8x32) Masked(mask Mask8x32) Int8x32
- func (x Int8x32) Max(y Int8x32) Int8x32
- func (x Int8x32) Merge(y Int8x32, mask Mask8x32) Int8x32
- func (x Int8x32) Min(y Int8x32) Int8x32
- func (x Int8x32) Not() Int8x32
- func (x Int8x32) NotEqual(y Int8x32) Mask8x32
- func (x Int8x32) OnesCount() Int8x32
- func (x Int8x32) Or(y Int8x32) Int8x32
- func (x Int8x32) Permute(indices Uint8x32) Int8x32
- func (x Int8x32) PermuteOrZeroGrouped(indices Int8x32) Int8x32
- func (x Int8x32) Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
- func (x Int8x32) SetHi(y Int8x16) Int8x32
- func (x Int8x32) SetLo(y Int8x16) Int8x32
- func (x Int8x32) Store(y *[32]int8)
- func (x Int8x32) StoreSlice(s []int8)
- func (x Int8x32) StoreSlicePart(s []int8)
- func (x Int8x32) String() string
- func (x Int8x32) Sub(y Int8x32) Int8x32
- func (x Int8x32) SubSaturated(y Int8x32) Int8x32
- func (from Int8x32) ToMask() (to Mask8x32)
- func (x Int8x32) Xor(y Int8x32) Int8x32
type Int8x64
- func BroadcastInt8x64(x int8) Int8x64
- func LoadInt8x64(y *[64]int8) Int8x64
- func LoadInt8x64Slice(s []int8) Int8x64
- func LoadInt8x64SlicePart(s []int8) Int8x64
- func LoadMaskedInt8x64(y *[64]int8, mask Mask8x64) Int8x64
- func (x Int8x64) Abs() Int8x64
- func (x Int8x64) Add(y Int8x64) Int8x64
- func (x Int8x64) AddSaturated(y Int8x64) Int8x64
- func (x Int8x64) And(y Int8x64) Int8x64
- func (x Int8x64) AndNot(y Int8x64) Int8x64
- func (from Int8x64) AsFloat32x16() (to Float32x16)
- func (from Int8x64) AsFloat64x8() (to Float64x8)
- func (from Int8x64) AsInt16x32() (to Int16x32)
- func (from Int8x64) AsInt32x16() (to Int32x16)
- func (from Int8x64) AsInt64x8() (to Int64x8)
- func (from Int8x64) AsUint16x32() (to Uint16x32)
- func (from Int8x64) AsUint32x16() (to Uint32x16)
- func (from Int8x64) AsUint64x8() (to Uint64x8)
- func (from Int8x64) AsUint8x64() (to Uint8x64)
- func (x Int8x64) Compress(mask Mask8x64) Int8x64
- func (x Int8x64) ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
- func (x Int8x64) DotProductQuadruple(y Uint8x64) Int32x16
- func (x Int8x64) DotProductQuadrupleSaturated(y Uint8x64) Int32x16
- func (x Int8x64) Equal(y Int8x64) Mask8x64
- func (x Int8x64) Expand(mask Mask8x64) Int8x64
- func (x Int8x64) GetHi() Int8x32
- func (x Int8x64) GetLo() Int8x32
- func (x Int8x64) Greater(y Int8x64) Mask8x64
- func (x Int8x64) GreaterEqual(y Int8x64) Mask8x64
- func (x Int8x64) Len() int
- func (x Int8x64) Less(y Int8x64) Mask8x64
- func (x Int8x64) LessEqual(y Int8x64) Mask8x64
- func (x Int8x64) Masked(mask Mask8x64) Int8x64
- func (x Int8x64) Max(y Int8x64) Int8x64
- func (x Int8x64) Merge(y Int8x64, mask Mask8x64) Int8x64
- func (x Int8x64) Min(y Int8x64) Int8x64
- func (x Int8x64) Not() Int8x64
- func (x Int8x64) NotEqual(y Int8x64) Mask8x64
- func (x Int8x64) OnesCount() Int8x64
- func (x Int8x64) Or(y Int8x64) Int8x64
- func (x Int8x64) Permute(indices Uint8x64) Int8x64
- func (x Int8x64) PermuteOrZeroGrouped(indices Int8x64) Int8x64
- func (x Int8x64) SetHi(y Int8x32) Int8x64
- func (x Int8x64) SetLo(y Int8x32) Int8x64
- func (x Int8x64) Store(y *[64]int8)
- func (x Int8x64) StoreMasked(y *[64]int8, mask Mask8x64)
- func (x Int8x64) StoreSlice(s []int8)
- func (x Int8x64) StoreSlicePart(s []int8)
- func (x Int8x64) String() string
- func (x Int8x64) Sub(y Int8x64) Int8x64
- func (x Int8x64) SubSaturated(y Int8x64) Int8x64
- func (from Int8x64) ToMask() (to Mask8x64)
- func (x Int8x64) Xor(y Int8x64) Int8x64
type Mask16x16
- func Mask16x16FromBits(y uint16) Mask16x16
- func (x Mask16x16) And(y Mask16x16) Mask16x16
- func (x Mask16x16) Or(y Mask16x16) Mask16x16
- func (x Mask16x16) ToBits() uint16
- func (from Mask16x16) ToInt16x16() (to Int16x16)
type Mask16x32
- func Mask16x32FromBits(y uint32) Mask16x32
- func (x Mask16x32) And(y Mask16x32) Mask16x32
- func (x Mask16x32) Or(y Mask16x32) Mask16x32
- func (x Mask16x32) ToBits() uint32
- func (from Mask16x32) ToInt16x32() (to Int16x32)
type Mask16x8
- func Mask16x8FromBits(y uint8) Mask16x8
- func (x Mask16x8) And(y Mask16x8) Mask16x8
- func (x Mask16x8) Or(y Mask16x8) Mask16x8
- func (x Mask16x8) ToBits() uint8
- func (from Mask16x8) ToInt16x8() (to Int16x8)
type Mask32x16
- func Mask32x16FromBits(y uint16) Mask32x16
- func (x Mask32x16) And(y Mask32x16) Mask32x16
- func (x Mask32x16) Or(y Mask32x16) Mask32x16
- func (x Mask32x16) ToBits() uint16
- func (from Mask32x16) ToInt32x16() (to Int32x16)
type Mask32x4
- func Mask32x4FromBits(y uint8) Mask32x4
- func (x Mask32x4) And(y Mask32x4) Mask32x4
- func (x Mask32x4) Or(y Mask32x4) Mask32x4
- func (x Mask32x4) ToBits() uint8
- func (from Mask32x4) ToInt32x4() (to Int32x4)
type Mask32x8
- func Mask32x8FromBits(y uint8) Mask32x8
- func (x Mask32x8) And(y Mask32x8) Mask32x8
- func (x Mask32x8) Or(y Mask32x8) Mask32x8
- func (x Mask32x8) ToBits() uint8
- func (from Mask32x8) ToInt32x8() (to Int32x8)
type Mask64x2
- func Mask64x2FromBits(y uint8) Mask64x2
- func (x Mask64x2) And(y Mask64x2) Mask64x2
- func (x Mask64x2) Or(y Mask64x2) Mask64x2
- func (x Mask64x2) ToBits() uint8
- func (from Mask64x2) ToInt64x2() (to Int64x2)
type Mask64x4
- func Mask64x4FromBits(y uint8) Mask64x4
- func (x Mask64x4) And(y Mask64x4) Mask64x4
- func (x Mask64x4) Or(y Mask64x4) Mask64x4
- func (x Mask64x4) ToBits() uint8
- func (from Mask64x4) ToInt64x4() (to Int64x4)
type Mask64x8
- func Mask64x8FromBits(y uint8) Mask64x8
- func (x Mask64x8) And(y Mask64x8) Mask64x8
- func (x Mask64x8) Or(y Mask64x8) Mask64x8
- func (x Mask64x8) ToBits() uint8
- func (from Mask64x8) ToInt64x8() (to Int64x8)
type Mask8x16
- func Mask8x16FromBits(y uint16) Mask8x16
- func (x Mask8x16) And(y Mask8x16) Mask8x16
- func (x Mask8x16) Or(y Mask8x16) Mask8x16
- func (x Mask8x16) ToBits() uint16
- func (from Mask8x16) ToInt8x16() (to Int8x16)
type Mask8x32
- func Mask8x32FromBits(y uint32) Mask8x32
- func (x Mask8x32) And(y Mask8x32) Mask8x32
- func (x Mask8x32) Or(y Mask8x32) Mask8x32
- func (x Mask8x32) ToBits() uint32
- func (from Mask8x32) ToInt8x32() (to Int8x32)
type Mask8x64
- func Mask8x64FromBits(y uint64) Mask8x64
- func (x Mask8x64) And(y Mask8x64) Mask8x64
- func (x Mask8x64) Or(y Mask8x64) Mask8x64
- func (x Mask8x64) ToBits() uint64
- func (from Mask8x64) ToInt8x64() (to Int8x64)
type Uint16x16
- func BroadcastUint16x16(x uint16) Uint16x16
- func LoadUint16x16(y *[16]uint16) Uint16x16
- func LoadUint16x16Slice(s []uint16) Uint16x16
- func LoadUint16x16SlicePart(s []uint16) Uint16x16
- func (x Uint16x16) Add(y Uint16x16) Uint16x16
- func (x Uint16x16) AddPairs(y Uint16x16) Uint16x16
- func (x Uint16x16) AddSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) And(y Uint16x16) Uint16x16
- func (x Uint16x16) AndNot(y Uint16x16) Uint16x16
- func (from Uint16x16) AsFloat32x8() (to Float32x8)
- func (from Uint16x16) AsFloat64x4() (to Float64x4)
- func (from Uint16x16) AsInt16x16() (to Int16x16)
- func (from Uint16x16) AsInt32x8() (to Int32x8)
- func (from Uint16x16) AsInt64x4() (to Int64x4)
- func (from Uint16x16) AsInt8x32() (to Int8x32)
- func (from Uint16x16) AsUint32x8() (to Uint32x8)
- func (from Uint16x16) AsUint64x4() (to Uint64x4)
- func (from Uint16x16) AsUint8x32() (to Uint8x32)
- func (x Uint16x16) Average(y Uint16x16) Uint16x16
- func (x Uint16x16) Compress(mask Mask16x16) Uint16x16
- func (x Uint16x16) ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
- func (x Uint16x16) Equal(y Uint16x16) Mask16x16
- func (x Uint16x16) Expand(mask Mask16x16) Uint16x16
- func (x Uint16x16) ExtendToUint32() Uint32x16
- func (x Uint16x16) GetHi() Uint16x8
- func (x Uint16x16) GetLo() Uint16x8
- func (x Uint16x16) Greater(y Uint16x16) Mask16x16
- func (x Uint16x16) GreaterEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) InterleaveHiGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) InterleaveLoGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) IsZero() bool
- func (x Uint16x16) Len() int
- func (x Uint16x16) Less(y Uint16x16) Mask16x16
- func (x Uint16x16) LessEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) Masked(mask Mask16x16) Uint16x16
- func (x Uint16x16) Max(y Uint16x16) Uint16x16
- func (x Uint16x16) Merge(y Uint16x16, mask Mask16x16) Uint16x16
- func (x Uint16x16) Min(y Uint16x16) Uint16x16
- func (x Uint16x16) Mul(y Uint16x16) Uint16x16
- func (x Uint16x16) MulHigh(y Uint16x16) Uint16x16
- func (x Uint16x16) Not() Uint16x16
- func (x Uint16x16) NotEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) OnesCount() Uint16x16
- func (x Uint16x16) Or(y Uint16x16) Uint16x16
- func (x Uint16x16) Permute(indices Uint16x16) Uint16x16
- func (x Uint16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) SetHi(y Uint16x8) Uint16x16
- func (x Uint16x16) SetLo(y Uint16x8) Uint16x16
- func (x Uint16x16) ShiftAllLeft(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftAllRight(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeft(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRight(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) Store(y *[16]uint16)
- func (x Uint16x16) StoreSlice(s []uint16)
- func (x Uint16x16) StoreSlicePart(s []uint16)
- func (x Uint16x16) String() string
- func (x Uint16x16) Sub(y Uint16x16) Uint16x16
- func (x Uint16x16) SubPairs(y Uint16x16) Uint16x16
- func (x Uint16x16) SubSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) TruncateToUint8() Uint8x16
- func (x Uint16x16) Xor(y Uint16x16) Uint16x16
type Uint16x32
- func BroadcastUint16x32(x uint16) Uint16x32
- func LoadMaskedUint16x32(y *[32]uint16, mask Mask16x32) Uint16x32
- func LoadUint16x32(y *[32]uint16) Uint16x32
- func LoadUint16x32Slice(s []uint16) Uint16x32
- func LoadUint16x32SlicePart(s []uint16) Uint16x32
- func (x Uint16x32) Add(y Uint16x32) Uint16x32
- func (x Uint16x32) AddSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) And(y Uint16x32) Uint16x32
- func (x Uint16x32) AndNot(y Uint16x32) Uint16x32
- func (from Uint16x32) AsFloat32x16() (to Float32x16)
- func (from Uint16x32) AsFloat64x8() (to Float64x8)
- func (from Uint16x32) AsInt16x32() (to Int16x32)
- func (from Uint16x32) AsInt32x16() (to Int32x16)
- func (from Uint16x32) AsInt64x8() (to Int64x8)
- func (from Uint16x32) AsInt8x64() (to Int8x64)
- func (from Uint16x32) AsUint32x16() (to Uint32x16)
- func (from Uint16x32) AsUint64x8() (to Uint64x8)
- func (from Uint16x32) AsUint8x64() (to Uint8x64)
- func (x Uint16x32) Average(y Uint16x32) Uint16x32
- func (x Uint16x32) Compress(mask Mask16x32) Uint16x32
- func (x Uint16x32) ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
- func (x Uint16x32) Equal(y Uint16x32) Mask16x32
- func (x Uint16x32) Expand(mask Mask16x32) Uint16x32
- func (x Uint16x32) GetHi() Uint16x16
- func (x Uint16x32) GetLo() Uint16x16
- func (x Uint16x32) Greater(y Uint16x32) Mask16x32
- func (x Uint16x32) GreaterEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) InterleaveHiGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) InterleaveLoGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) Len() int
- func (x Uint16x32) Less(y Uint16x32) Mask16x32
- func (x Uint16x32) LessEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) Masked(mask Mask16x32) Uint16x32
- func (x Uint16x32) Max(y Uint16x32) Uint16x32
- func (x Uint16x32) Merge(y Uint16x32, mask Mask16x32) Uint16x32
- func (x Uint16x32) Min(y Uint16x32) Uint16x32
- func (x Uint16x32) Mul(y Uint16x32) Uint16x32
- func (x Uint16x32) MulHigh(y Uint16x32) Uint16x32
- func (x Uint16x32) Not() Uint16x32
- func (x Uint16x32) NotEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) OnesCount() Uint16x32
- func (x Uint16x32) Or(y Uint16x32) Uint16x32
- func (x Uint16x32) Permute(indices Uint16x32) Uint16x32
- func (x Uint16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) SaturateToUint8() Uint8x32
- func (x Uint16x32) SetHi(y Uint16x16) Uint16x32
- func (x Uint16x32) SetLo(y Uint16x16) Uint16x32
- func (x Uint16x32) ShiftAllLeft(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftAllRight(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeft(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRight(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) Store(y *[32]uint16)
- func (x Uint16x32) StoreMasked(y *[32]uint16, mask Mask16x32)
- func (x Uint16x32) StoreSlice(s []uint16)
- func (x Uint16x32) StoreSlicePart(s []uint16)
- func (x Uint16x32) String() string
- func (x Uint16x32) Sub(y Uint16x32) Uint16x32
- func (x Uint16x32) SubSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) TruncateToUint8() Uint8x32
- func (x Uint16x32) Xor(y Uint16x32) Uint16x32
type Uint16x8
- func BroadcastUint16x8(x uint16) Uint16x8
- func LoadUint16x8(y *[8]uint16) Uint16x8
- func LoadUint16x8Slice(s []uint16) Uint16x8
- func LoadUint16x8SlicePart(s []uint16) Uint16x8
- func (x Uint16x8) Add(y Uint16x8) Uint16x8
- func (x Uint16x8) AddPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) AddSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) And(y Uint16x8) Uint16x8
- func (x Uint16x8) AndNot(y Uint16x8) Uint16x8
- func (from Uint16x8) AsFloat32x4() (to Float32x4)
- func (from Uint16x8) AsFloat64x2() (to Float64x2)
- func (from Uint16x8) AsInt16x8() (to Int16x8)
- func (from Uint16x8) AsInt32x4() (to Int32x4)
- func (from Uint16x8) AsInt64x2() (to Int64x2)
- func (from Uint16x8) AsInt8x16() (to Int8x16)
- func (from Uint16x8) AsUint32x4() (to Uint32x4)
- func (from Uint16x8) AsUint64x2() (to Uint64x2)
- func (from Uint16x8) AsUint8x16() (to Uint8x16)
- func (x Uint16x8) Average(y Uint16x8) Uint16x8
- func (x Uint16x8) Broadcast128() Uint16x8
- func (x Uint16x8) Broadcast256() Uint16x16
- func (x Uint16x8) Broadcast512() Uint16x32
- func (x Uint16x8) Compress(mask Mask16x8) Uint16x8
- func (x Uint16x8) ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
- func (x Uint16x8) Equal(y Uint16x8) Mask16x8
- func (x Uint16x8) Expand(mask Mask16x8) Uint16x8
- func (x Uint16x8) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint16x8) ExtendLo4ToUint32x4() Uint32x4
- func (x Uint16x8) ExtendLo4ToUint64x4() Uint64x4
- func (x Uint16x8) ExtendToUint32() Uint32x8
- func (x Uint16x8) ExtendToUint64() Uint64x8
- func (x Uint16x8) GetElem(index uint8) uint16
- func (x Uint16x8) Greater(y Uint16x8) Mask16x8
- func (x Uint16x8) GreaterEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) InterleaveHi(y Uint16x8) Uint16x8
- func (x Uint16x8) InterleaveLo(y Uint16x8) Uint16x8
- func (x Uint16x8) IsZero() bool
- func (x Uint16x8) Len() int
- func (x Uint16x8) Less(y Uint16x8) Mask16x8
- func (x Uint16x8) LessEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) Masked(mask Mask16x8) Uint16x8
- func (x Uint16x8) Max(y Uint16x8) Uint16x8
- func (x Uint16x8) Merge(y Uint16x8, mask Mask16x8) Uint16x8
- func (x Uint16x8) Min(y Uint16x8) Uint16x8
- func (x Uint16x8) Mul(y Uint16x8) Uint16x8
- func (x Uint16x8) MulHigh(y Uint16x8) Uint16x8
- func (x Uint16x8) Not() Uint16x8
- func (x Uint16x8) NotEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) OnesCount() Uint16x8
- func (x Uint16x8) Or(y Uint16x8) Uint16x8
- func (x Uint16x8) Permute(indices Uint16x8) Uint16x8
- func (x Uint16x8) PermuteScalarsHi(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) PermuteScalarsLo(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) SetElem(index uint8, y uint16) Uint16x8
- func (x Uint16x8) ShiftAllLeft(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftAllRight(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeft(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRight(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) Store(y *[8]uint16)
- func (x Uint16x8) StoreSlice(s []uint16)
- func (x Uint16x8) StoreSlicePart(s []uint16)
- func (x Uint16x8) String() string
- func (x Uint16x8) Sub(y Uint16x8) Uint16x8
- func (x Uint16x8) SubPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) SubSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) TruncateToUint8() Uint8x16
- func (x Uint16x8) Xor(y Uint16x8) Uint16x8
type Uint32x16
- func BroadcastUint32x16(x uint32) Uint32x16
- func LoadMaskedUint32x16(y *[16]uint32, mask Mask32x16) Uint32x16
- func LoadUint32x16(y *[16]uint32) Uint32x16
- func LoadUint32x16Slice(s []uint32) Uint32x16
- func LoadUint32x16SlicePart(s []uint32) Uint32x16
- func (x Uint32x16) Add(y Uint32x16) Uint32x16
- func (x Uint32x16) And(y Uint32x16) Uint32x16
- func (x Uint32x16) AndNot(y Uint32x16) Uint32x16
- func (from Uint32x16) AsFloat32x16() (to Float32x16)
- func (from Uint32x16) AsFloat64x8() (to Float64x8)
- func (from Uint32x16) AsInt16x32() (to Int16x32)
- func (from Uint32x16) AsInt32x16() (to Int32x16)
- func (from Uint32x16) AsInt64x8() (to Int64x8)
- func (from Uint32x16) AsInt8x64() (to Int8x64)
- func (from Uint32x16) AsUint16x32() (to Uint16x32)
- func (from Uint32x16) AsUint64x8() (to Uint64x8)
- func (from Uint32x16) AsUint8x64() (to Uint8x64)
- func (x Uint32x16) Compress(mask Mask32x16) Uint32x16
- func (x Uint32x16) ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
- func (x Uint32x16) ConvertToFloat32() Float32x16
- func (x Uint32x16) Equal(y Uint32x16) Mask32x16
- func (x Uint32x16) Expand(mask Mask32x16) Uint32x16
- func (x Uint32x16) GetHi() Uint32x8
- func (x Uint32x16) GetLo() Uint32x8
- func (x Uint32x16) Greater(y Uint32x16) Mask32x16
- func (x Uint32x16) GreaterEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) InterleaveHiGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) InterleaveLoGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) LeadingZeros() Uint32x16
- func (x Uint32x16) Len() int
- func (x Uint32x16) Less(y Uint32x16) Mask32x16
- func (x Uint32x16) LessEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) Masked(mask Mask32x16) Uint32x16
- func (x Uint32x16) Max(y Uint32x16) Uint32x16
- func (x Uint32x16) Merge(y Uint32x16, mask Mask32x16) Uint32x16
- func (x Uint32x16) Min(y Uint32x16) Uint32x16
- func (x Uint32x16) Mul(y Uint32x16) Uint32x16
- func (x Uint32x16) Not() Uint32x16
- func (x Uint32x16) NotEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) OnesCount() Uint32x16
- func (x Uint32x16) Or(y Uint32x16) Uint32x16
- func (x Uint32x16) Permute(indices Uint32x16) Uint32x16
- func (x Uint32x16) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16
- func (x Uint32x16) RotateAllLeft(shift uint8) Uint32x16
- func (x Uint32x16) RotateAllRight(shift uint8) Uint32x16
- func (x Uint32x16) RotateLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) RotateRight(y Uint32x16) Uint32x16
- func (x Uint32x16) SaturateToUint16() Uint16x16
- func (x Uint32x16) SaturateToUint16Concat(y Uint32x16) Uint16x32
- func (x Uint32x16) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) SetHi(y Uint32x8) Uint32x16
- func (x Uint32x16) SetLo(y Uint32x8) Uint32x16
- func (x Uint32x16) ShiftAllLeft(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftAllRight(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRight(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) Store(y *[16]uint32)
- func (x Uint32x16) StoreMasked(y *[16]uint32, mask Mask32x16)
- func (x Uint32x16) StoreSlice(s []uint32)
- func (x Uint32x16) StoreSlicePart(s []uint32)
- func (x Uint32x16) String() string
- func (x Uint32x16) Sub(y Uint32x16) Uint32x16
- func (x Uint32x16) TruncateToUint16() Uint16x16
- func (x Uint32x16) TruncateToUint8() Uint8x16
- func (x Uint32x16) Xor(y Uint32x16) Uint32x16
type Uint32x4
- func BroadcastUint32x4(x uint32) Uint32x4
- func LoadMaskedUint32x4(y *[4]uint32, mask Mask32x4) Uint32x4
- func LoadUint32x4(y *[4]uint32) Uint32x4
- func LoadUint32x4Slice(s []uint32) Uint32x4
- func LoadUint32x4SlicePart(s []uint32) Uint32x4
- func (x Uint32x4) AESInvMixColumns() Uint32x4
- func (x Uint32x4) AESRoundKeyGenAssist(rconVal uint8) Uint32x4
- func (x Uint32x4) Add(y Uint32x4) Uint32x4
- func (x Uint32x4) AddPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) And(y Uint32x4) Uint32x4
- func (x Uint32x4) AndNot(y Uint32x4) Uint32x4
- func (from Uint32x4) AsFloat32x4() (to Float32x4)
- func (from Uint32x4) AsFloat64x2() (to Float64x2)
- func (from Uint32x4) AsInt16x8() (to Int16x8)
- func (from Uint32x4) AsInt32x4() (to Int32x4)
- func (from Uint32x4) AsInt64x2() (to Int64x2)
- func (from Uint32x4) AsInt8x16() (to Int8x16)
- func (from Uint32x4) AsUint16x8() (to Uint16x8)
- func (from Uint32x4) AsUint64x2() (to Uint64x2)
- func (from Uint32x4) AsUint8x16() (to Uint8x16)
- func (x Uint32x4) Broadcast128() Uint32x4
- func (x Uint32x4) Broadcast256() Uint32x8
- func (x Uint32x4) Broadcast512() Uint32x16
- func (x Uint32x4) Compress(mask Mask32x4) Uint32x4
- func (x Uint32x4) ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
- func (x Uint32x4) ConvertToFloat32() Float32x4
- func (x Uint32x4) ConvertToFloat64() Float64x4
- func (x Uint32x4) Equal(y Uint32x4) Mask32x4
- func (x Uint32x4) Expand(mask Mask32x4) Uint32x4
- func (x Uint32x4) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint32x4) ExtendToUint64() Uint64x4
- func (x Uint32x4) GetElem(index uint8) uint32
- func (x Uint32x4) Greater(y Uint32x4) Mask32x4
- func (x Uint32x4) GreaterEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) InterleaveHi(y Uint32x4) Uint32x4
- func (x Uint32x4) InterleaveLo(y Uint32x4) Uint32x4
- func (x Uint32x4) IsZero() bool
- func (x Uint32x4) LeadingZeros() Uint32x4
- func (x Uint32x4) Len() int
- func (x Uint32x4) Less(y Uint32x4) Mask32x4
- func (x Uint32x4) LessEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) Masked(mask Mask32x4) Uint32x4
- func (x Uint32x4) Max(y Uint32x4) Uint32x4
- func (x Uint32x4) Merge(y Uint32x4, mask Mask32x4) Uint32x4
- func (x Uint32x4) Min(y Uint32x4) Uint32x4
- func (x Uint32x4) Mul(y Uint32x4) Uint32x4
- func (x Uint32x4) MulEvenWiden(y Uint32x4) Uint64x2
- func (x Uint32x4) Not() Uint32x4
- func (x Uint32x4) NotEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) OnesCount() Uint32x4
- func (x Uint32x4) Or(y Uint32x4) Uint32x4
- func (x Uint32x4) PermuteScalars(a, b, c, d uint8) Uint32x4
- func (x Uint32x4) RotateAllLeft(shift uint8) Uint32x4
- func (x Uint32x4) RotateAllRight(shift uint8) Uint32x4
- func (x Uint32x4) RotateLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) RotateRight(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1NextE(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) SaturateToUint16() Uint16x8
- func (x Uint32x4) SaturateToUint16Concat(y Uint32x4) Uint16x8
- func (x Uint32x4) SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SetElem(index uint8, y uint32) Uint32x4
- func (x Uint32x4) ShiftAllLeft(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftAllRight(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRight(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) Store(y *[4]uint32)
- func (x Uint32x4) StoreMasked(y *[4]uint32, mask Mask32x4)
- func (x Uint32x4) StoreSlice(s []uint32)
- func (x Uint32x4) StoreSlicePart(s []uint32)
- func (x Uint32x4) String() string
- func (x Uint32x4) Sub(y Uint32x4) Uint32x4
- func (x Uint32x4) SubPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) TruncateToUint16() Uint16x8
- func (x Uint32x4) TruncateToUint8() Uint8x16
- func (x Uint32x4) Xor(y Uint32x4) Uint32x4
type Uint32x8
- func BroadcastUint32x8(x uint32) Uint32x8
- func LoadMaskedUint32x8(y *[8]uint32, mask Mask32x8) Uint32x8
- func LoadUint32x8(y *[8]uint32) Uint32x8
- func LoadUint32x8Slice(s []uint32) Uint32x8
- func LoadUint32x8SlicePart(s []uint32) Uint32x8
- func (x Uint32x8) Add(y Uint32x8) Uint32x8
- func (x Uint32x8) AddPairs(y Uint32x8) Uint32x8
- func (x Uint32x8) And(y Uint32x8) Uint32x8
- func (x Uint32x8) AndNot(y Uint32x8) Uint32x8
- func (from Uint32x8) AsFloat32x8() (to Float32x8)
- func (from Uint32x8) AsFloat64x4() (to Float64x4)
- func (from Uint32x8) AsInt16x16() (to Int16x16)
- func (from Uint32x8) AsInt32x8() (to Int32x8)
- func (from Uint32x8) AsInt64x4() (to Int64x4)
- func (from Uint32x8) AsInt8x32() (to Int8x32)
- func (from Uint32x8) AsUint16x16() (to Uint16x16)
- func (from Uint32x8) AsUint64x4() (to Uint64x4)
- func (from Uint32x8) AsUint8x32() (to Uint8x32)
- func (x Uint32x8) Compress(mask Mask32x8) Uint32x8
- func (x Uint32x8) ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
- func (x Uint32x8) ConvertToFloat32() Float32x8
- func (x Uint32x8) ConvertToFloat64() Float64x8
- func (x Uint32x8) Equal(y Uint32x8) Mask32x8
- func (x Uint32x8) Expand(mask Mask32x8) Uint32x8
- func (x Uint32x8) ExtendToUint64() Uint64x8
- func (x Uint32x8) GetHi() Uint32x4
- func (x Uint32x8) GetLo() Uint32x4
- func (x Uint32x8) Greater(y Uint32x8) Mask32x8
- func (x Uint32x8) GreaterEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) InterleaveHiGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) InterleaveLoGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) IsZero() bool
- func (x Uint32x8) LeadingZeros() Uint32x8
- func (x Uint32x8) Len() int
- func (x Uint32x8) Less(y Uint32x8) Mask32x8
- func (x Uint32x8) LessEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) Masked(mask Mask32x8) Uint32x8
- func (x Uint32x8) Max(y Uint32x8) Uint32x8
- func (x Uint32x8) Merge(y Uint32x8, mask Mask32x8) Uint32x8
- func (x Uint32x8) Min(y Uint32x8) Uint32x8
- func (x Uint32x8) Mul(y Uint32x8) Uint32x8
- func (x Uint32x8) MulEvenWiden(y Uint32x8) Uint64x4
- func (x Uint32x8) Not() Uint32x8
- func (x Uint32x8) NotEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) OnesCount() Uint32x8
- func (x Uint32x8) Or(y Uint32x8) Uint32x8
- func (x Uint32x8) Permute(indices Uint32x8) Uint32x8
- func (x Uint32x8) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8
- func (x Uint32x8) RotateAllLeft(shift uint8) Uint32x8
- func (x Uint32x8) RotateAllRight(shift uint8) Uint32x8
- func (x Uint32x8) RotateLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) RotateRight(y Uint32x8) Uint32x8
- func (x Uint32x8) SaturateToUint16() Uint16x8
- func (x Uint32x8) SaturateToUint16Concat(y Uint32x8) Uint16x16
- func (x Uint32x8) Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SetHi(y Uint32x4) Uint32x8
- func (x Uint32x8) SetLo(y Uint32x4) Uint32x8
- func (x Uint32x8) ShiftAllLeft(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftAllRight(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRight(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) Store(y *[8]uint32)
- func (x Uint32x8) StoreMasked(y *[8]uint32, mask Mask32x8)
- func (x Uint32x8) StoreSlice(s []uint32)
- func (x Uint32x8) StoreSlicePart(s []uint32)
- func (x Uint32x8) String() string
- func (x Uint32x8) Sub(y Uint32x8) Uint32x8
- func (x Uint32x8) SubPairs(y Uint32x8) Uint32x8
- func (x Uint32x8) TruncateToUint16() Uint16x8
- func (x Uint32x8) TruncateToUint8() Uint8x16
- func (x Uint32x8) Xor(y Uint32x8) Uint32x8
type Uint64x2
- func BroadcastUint64x2(x uint64) Uint64x2
- func LoadMaskedUint64x2(y *[2]uint64, mask Mask64x2) Uint64x2
- func LoadUint64x2(y *[2]uint64) Uint64x2
- func LoadUint64x2Slice(s []uint64) Uint64x2
- func LoadUint64x2SlicePart(s []uint64) Uint64x2
- func (x Uint64x2) Add(y Uint64x2) Uint64x2
- func (x Uint64x2) And(y Uint64x2) Uint64x2
- func (x Uint64x2) AndNot(y Uint64x2) Uint64x2
- func (from Uint64x2) AsFloat32x4() (to Float32x4)
- func (from Uint64x2) AsFloat64x2() (to Float64x2)
- func (from Uint64x2) AsInt16x8() (to Int16x8)
- func (from Uint64x2) AsInt32x4() (to Int32x4)
- func (from Uint64x2) AsInt64x2() (to Int64x2)
- func (from Uint64x2) AsInt8x16() (to Int8x16)
- func (from Uint64x2) AsUint16x8() (to Uint16x8)
- func (from Uint64x2) AsUint32x4() (to Uint32x4)
- func (from Uint64x2) AsUint8x16() (to Uint8x16)
- func (x Uint64x2) Broadcast128() Uint64x2
- func (x Uint64x2) Broadcast256() Uint64x4
- func (x Uint64x2) Broadcast512() Uint64x8
- func (x Uint64x2) CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) Compress(mask Mask64x2) Uint64x2
- func (x Uint64x2) ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
- func (x Uint64x2) ConvertToFloat32() Float32x4
- func (x Uint64x2) ConvertToFloat64() Float64x2
- func (x Uint64x2) Equal(y Uint64x2) Mask64x2
- func (x Uint64x2) Expand(mask Mask64x2) Uint64x2
- func (x Uint64x2) GetElem(index uint8) uint64
- func (x Uint64x2) Greater(y Uint64x2) Mask64x2
- func (x Uint64x2) GreaterEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) InterleaveHi(y Uint64x2) Uint64x2
- func (x Uint64x2) InterleaveLo(y Uint64x2) Uint64x2
- func (x Uint64x2) IsZero() bool
- func (x Uint64x2) LeadingZeros() Uint64x2
- func (x Uint64x2) Len() int
- func (x Uint64x2) Less(y Uint64x2) Mask64x2
- func (x Uint64x2) LessEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) Masked(mask Mask64x2) Uint64x2
- func (x Uint64x2) Max(y Uint64x2) Uint64x2
- func (x Uint64x2) Merge(y Uint64x2, mask Mask64x2) Uint64x2
- func (x Uint64x2) Min(y Uint64x2) Uint64x2
- func (x Uint64x2) Mul(y Uint64x2) Uint64x2
- func (x Uint64x2) Not() Uint64x2
- func (x Uint64x2) NotEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) OnesCount() Uint64x2
- func (x Uint64x2) Or(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateAllLeft(shift uint8) Uint64x2
- func (x Uint64x2) RotateAllRight(shift uint8) Uint64x2
- func (x Uint64x2) RotateLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateRight(y Uint64x2) Uint64x2
- func (x Uint64x2) SaturateToUint16() Uint16x8
- func (x Uint64x2) SaturateToUint32() Uint32x4
- func (x Uint64x2) SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) SetElem(index uint8, y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeft(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftAllRight(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRight(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) Store(y *[2]uint64)
- func (x Uint64x2) StoreMasked(y *[2]uint64, mask Mask64x2)
- func (x Uint64x2) StoreSlice(s []uint64)
- func (x Uint64x2) StoreSlicePart(s []uint64)
- func (x Uint64x2) String() string
- func (x Uint64x2) Sub(y Uint64x2) Uint64x2
- func (x Uint64x2) TruncateToUint16() Uint16x8
- func (x Uint64x2) TruncateToUint32() Uint32x4
- func (x Uint64x2) TruncateToUint8() Uint8x16
- func (x Uint64x2) Xor(y Uint64x2) Uint64x2
type Uint64x4
- func BroadcastUint64x4(x uint64) Uint64x4
- func LoadMaskedUint64x4(y *[4]uint64, mask Mask64x4) Uint64x4
- func LoadUint64x4(y *[4]uint64) Uint64x4
- func LoadUint64x4Slice(s []uint64) Uint64x4
- func LoadUint64x4SlicePart(s []uint64) Uint64x4
- func (x Uint64x4) Add(y Uint64x4) Uint64x4
- func (x Uint64x4) And(y Uint64x4) Uint64x4
- func (x Uint64x4) AndNot(y Uint64x4) Uint64x4
- func (from Uint64x4) AsFloat32x8() (to Float32x8)
- func (from Uint64x4) AsFloat64x4() (to Float64x4)
- func (from Uint64x4) AsInt16x16() (to Int16x16)
- func (from Uint64x4) AsInt32x8() (to Int32x8)
- func (from Uint64x4) AsInt64x4() (to Int64x4)
- func (from Uint64x4) AsInt8x32() (to Int8x32)
- func (from Uint64x4) AsUint16x16() (to Uint16x16)
- func (from Uint64x4) AsUint32x8() (to Uint32x8)
- func (from Uint64x4) AsUint8x32() (to Uint8x32)
- func (x Uint64x4) CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) Compress(mask Mask64x4) Uint64x4
- func (x Uint64x4) ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
- func (x Uint64x4) ConvertToFloat32() Float32x4
- func (x Uint64x4) ConvertToFloat64() Float64x4
- func (x Uint64x4) Equal(y Uint64x4) Mask64x4
- func (x Uint64x4) Expand(mask Mask64x4) Uint64x4
- func (x Uint64x4) GetHi() Uint64x2
- func (x Uint64x4) GetLo() Uint64x2
- func (x Uint64x4) Greater(y Uint64x4) Mask64x4
- func (x Uint64x4) GreaterEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) InterleaveHiGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) InterleaveLoGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) IsZero() bool
- func (x Uint64x4) LeadingZeros() Uint64x4
- func (x Uint64x4) Len() int
- func (x Uint64x4) Less(y Uint64x4) Mask64x4
- func (x Uint64x4) LessEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) Masked(mask Mask64x4) Uint64x4
- func (x Uint64x4) Max(y Uint64x4) Uint64x4
- func (x Uint64x4) Merge(y Uint64x4, mask Mask64x4) Uint64x4
- func (x Uint64x4) Min(y Uint64x4) Uint64x4
- func (x Uint64x4) Mul(y Uint64x4) Uint64x4
- func (x Uint64x4) Not() Uint64x4
- func (x Uint64x4) NotEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) OnesCount() Uint64x4
- func (x Uint64x4) Or(y Uint64x4) Uint64x4
- func (x Uint64x4) Permute(indices Uint64x4) Uint64x4
- func (x Uint64x4) RotateAllLeft(shift uint8) Uint64x4
- func (x Uint64x4) RotateAllRight(shift uint8) Uint64x4
- func (x Uint64x4) RotateLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) RotateRight(y Uint64x4) Uint64x4
- func (x Uint64x4) SaturateToUint16() Uint16x8
- func (x Uint64x4) SaturateToUint32() Uint32x4
- func (x Uint64x4) Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SetHi(y Uint64x2) Uint64x4
- func (x Uint64x4) SetLo(y Uint64x2) Uint64x4
- func (x Uint64x4) ShiftAllLeft(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftAllRight(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRight(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) Store(y *[4]uint64)
- func (x Uint64x4) StoreMasked(y *[4]uint64, mask Mask64x4)
- func (x Uint64x4) StoreSlice(s []uint64)
- func (x Uint64x4) StoreSlicePart(s []uint64)
- func (x Uint64x4) String() string
- func (x Uint64x4) Sub(y Uint64x4) Uint64x4
- func (x Uint64x4) TruncateToUint16() Uint16x8
- func (x Uint64x4) TruncateToUint32() Uint32x4
- func (x Uint64x4) TruncateToUint8() Uint8x16
- func (x Uint64x4) Xor(y Uint64x4) Uint64x4
type Uint64x8
- func BroadcastUint64x8(x uint64) Uint64x8
- func LoadMaskedUint64x8(y *[8]uint64, mask Mask64x8) Uint64x8
- func LoadUint64x8(y *[8]uint64) Uint64x8
- func LoadUint64x8Slice(s []uint64) Uint64x8
- func LoadUint64x8SlicePart(s []uint64) Uint64x8
- func (x Uint64x8) Add(y Uint64x8) Uint64x8
- func (x Uint64x8) And(y Uint64x8) Uint64x8
- func (x Uint64x8) AndNot(y Uint64x8) Uint64x8
- func (from Uint64x8) AsFloat32x16() (to Float32x16)
- func (from Uint64x8) AsFloat64x8() (to Float64x8)
- func (from Uint64x8) AsInt16x32() (to Int16x32)
- func (from Uint64x8) AsInt32x16() (to Int32x16)
- func (from Uint64x8) AsInt64x8() (to Int64x8)
- func (from Uint64x8) AsInt8x64() (to Int8x64)
- func (from Uint64x8) AsUint16x32() (to Uint16x32)
- func (from Uint64x8) AsUint32x16() (to Uint32x16)
- func (from Uint64x8) AsUint8x64() (to Uint8x64)
- func (x Uint64x8) CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) Compress(mask Mask64x8) Uint64x8
- func (x Uint64x8) ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
- func (x Uint64x8) ConvertToFloat32() Float32x8
- func (x Uint64x8) ConvertToFloat64() Float64x8
- func (x Uint64x8) Equal(y Uint64x8) Mask64x8
- func (x Uint64x8) Expand(mask Mask64x8) Uint64x8
- func (x Uint64x8) GetHi() Uint64x4
- func (x Uint64x8) GetLo() Uint64x4
- func (x Uint64x8) Greater(y Uint64x8) Mask64x8
- func (x Uint64x8) GreaterEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) InterleaveHiGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) InterleaveLoGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) LeadingZeros() Uint64x8
- func (x Uint64x8) Len() int
- func (x Uint64x8) Less(y Uint64x8) Mask64x8
- func (x Uint64x8) LessEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) Masked(mask Mask64x8) Uint64x8
- func (x Uint64x8) Max(y Uint64x8) Uint64x8
- func (x Uint64x8) Merge(y Uint64x8, mask Mask64x8) Uint64x8
- func (x Uint64x8) Min(y Uint64x8) Uint64x8
- func (x Uint64x8) Mul(y Uint64x8) Uint64x8
- func (x Uint64x8) Not() Uint64x8
- func (x Uint64x8) NotEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) OnesCount() Uint64x8
- func (x Uint64x8) Or(y Uint64x8) Uint64x8
- func (x Uint64x8) Permute(indices Uint64x8) Uint64x8
- func (x Uint64x8) RotateAllLeft(shift uint8) Uint64x8
- func (x Uint64x8) RotateAllRight(shift uint8) Uint64x8
- func (x Uint64x8) RotateLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) RotateRight(y Uint64x8) Uint64x8
- func (x Uint64x8) SaturateToUint16() Uint16x8
- func (x Uint64x8) SaturateToUint32() Uint32x8
- func (x Uint64x8) SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) SetHi(y Uint64x4) Uint64x8
- func (x Uint64x8) SetLo(y Uint64x4) Uint64x8
- func (x Uint64x8) ShiftAllLeft(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftAllRight(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRight(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) Store(y *[8]uint64)
- func (x Uint64x8) StoreMasked(y *[8]uint64, mask Mask64x8)
- func (x Uint64x8) StoreSlice(s []uint64)
- func (x Uint64x8) StoreSlicePart(s []uint64)
- func (x Uint64x8) String() string
- func (x Uint64x8) Sub(y Uint64x8) Uint64x8
- func (x Uint64x8) TruncateToUint16() Uint16x8
- func (x Uint64x8) TruncateToUint32() Uint32x8
- func (x Uint64x8) TruncateToUint8() Uint8x16
- func (x Uint64x8) Xor(y Uint64x8) Uint64x8
type Uint8x16
- func BroadcastUint8x16(x uint8) Uint8x16
- func LoadUint8x16(y *[16]uint8) Uint8x16
- func LoadUint8x16Slice(s []uint8) Uint8x16
- func LoadUint8x16SlicePart(s []uint8) Uint8x16
- func (x Uint8x16) AESDecryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESDecryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) Add(y Uint8x16) Uint8x16
- func (x Uint8x16) AddSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) And(y Uint8x16) Uint8x16
- func (x Uint8x16) AndNot(y Uint8x16) Uint8x16
- func (from Uint8x16) AsFloat32x4() (to Float32x4)
- func (from Uint8x16) AsFloat64x2() (to Float64x2)
- func (from Uint8x16) AsInt16x8() (to Int16x8)
- func (from Uint8x16) AsInt32x4() (to Int32x4)
- func (from Uint8x16) AsInt64x2() (to Int64x2)
- func (from Uint8x16) AsInt8x16() (to Int8x16)
- func (from Uint8x16) AsUint16x8() (to Uint16x8)
- func (from Uint8x16) AsUint32x4() (to Uint32x4)
- func (from Uint8x16) AsUint64x2() (to Uint64x2)
- func (x Uint8x16) Average(y Uint8x16) Uint8x16
- func (x Uint8x16) Broadcast128() Uint8x16
- func (x Uint8x16) Broadcast256() Uint8x32
- func (x Uint8x16) Broadcast512() Uint8x64
- func (x Uint8x16) Compress(mask Mask8x16) Uint8x16
- func (x Uint8x16) ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
- func (x Uint8x16) ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
- func (x Uint8x16) DotProductPairsSaturated(y Int8x16) Int16x8
- func (x Uint8x16) Equal(y Uint8x16) Mask8x16
- func (x Uint8x16) Expand(mask Mask8x16) Uint8x16
- func (x Uint8x16) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint8x16) ExtendLo4ToUint32x4() Uint32x4
- func (x Uint8x16) ExtendLo4ToUint64x4() Uint64x4
- func (x Uint8x16) ExtendLo8ToUint16x8() Uint16x8
- func (x Uint8x16) ExtendLo8ToUint32x8() Uint32x8
- func (x Uint8x16) ExtendLo8ToUint64x8() Uint64x8
- func (x Uint8x16) ExtendToUint16() Uint16x16
- func (x Uint8x16) ExtendToUint32() Uint32x16
- func (x Uint8x16) GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldMul(y Uint8x16) Uint8x16
- func (x Uint8x16) GetElem(index uint8) uint8
- func (x Uint8x16) Greater(y Uint8x16) Mask8x16
- func (x Uint8x16) GreaterEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) IsZero() bool
- func (x Uint8x16) Len() int
- func (x Uint8x16) Less(y Uint8x16) Mask8x16
- func (x Uint8x16) LessEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) Masked(mask Mask8x16) Uint8x16
- func (x Uint8x16) Max(y Uint8x16) Uint8x16
- func (x Uint8x16) Merge(y Uint8x16, mask Mask8x16) Uint8x16
- func (x Uint8x16) Min(y Uint8x16) Uint8x16
- func (x Uint8x16) Not() Uint8x16
- func (x Uint8x16) NotEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) OnesCount() Uint8x16
- func (x Uint8x16) Or(y Uint8x16) Uint8x16
- func (x Uint8x16) Permute(indices Uint8x16) Uint8x16
- func (x Uint8x16) PermuteOrZero(indices Int8x16) Uint8x16
- func (x Uint8x16) SetElem(index uint8, y uint8) Uint8x16
- func (x Uint8x16) Store(y *[16]uint8)
- func (x Uint8x16) StoreSlice(s []uint8)
- func (x Uint8x16) StoreSlicePart(s []uint8)
- func (x Uint8x16) String() string
- func (x Uint8x16) Sub(y Uint8x16) Uint8x16
- func (x Uint8x16) SubSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) SumAbsDiff(y Uint8x16) Uint16x8
- func (x Uint8x16) Xor(y Uint8x16) Uint8x16
type Uint8x32
- func BroadcastUint8x32(x uint8) Uint8x32
- func LoadUint8x32(y *[32]uint8) Uint8x32
- func LoadUint8x32Slice(s []uint8) Uint8x32
- func LoadUint8x32SlicePart(s []uint8) Uint8x32
- func (x Uint8x32) AESDecryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESDecryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) Add(y Uint8x32) Uint8x32
- func (x Uint8x32) AddSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) And(y Uint8x32) Uint8x32
- func (x Uint8x32) AndNot(y Uint8x32) Uint8x32
- func (from Uint8x32) AsFloat32x8() (to Float32x8)
- func (from Uint8x32) AsFloat64x4() (to Float64x4)
- func (from Uint8x32) AsInt16x16() (to Int16x16)
- func (from Uint8x32) AsInt32x8() (to Int32x8)
- func (from Uint8x32) AsInt64x4() (to Int64x4)
- func (from Uint8x32) AsInt8x32() (to Int8x32)
- func (from Uint8x32) AsUint16x16() (to Uint16x16)
- func (from Uint8x32) AsUint32x8() (to Uint32x8)
- func (from Uint8x32) AsUint64x4() (to Uint64x4)
- func (x Uint8x32) Average(y Uint8x32) Uint8x32
- func (x Uint8x32) Compress(mask Mask8x32) Uint8x32
- func (x Uint8x32) ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
- func (x Uint8x32) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16
- func (x Uint8x32) Equal(y Uint8x32) Mask8x32
- func (x Uint8x32) Expand(mask Mask8x32) Uint8x32
- func (x Uint8x32) ExtendToUint16() Uint16x32
- func (x Uint8x32) GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldMul(y Uint8x32) Uint8x32
- func (x Uint8x32) GetHi() Uint8x16
- func (x Uint8x32) GetLo() Uint8x16
- func (x Uint8x32) Greater(y Uint8x32) Mask8x32
- func (x Uint8x32) GreaterEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) IsZero() bool
- func (x Uint8x32) Len() int
- func (x Uint8x32) Less(y Uint8x32) Mask8x32
- func (x Uint8x32) LessEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) Masked(mask Mask8x32) Uint8x32
- func (x Uint8x32) Max(y Uint8x32) Uint8x32
- func (x Uint8x32) Merge(y Uint8x32, mask Mask8x32) Uint8x32
- func (x Uint8x32) Min(y Uint8x32) Uint8x32
- func (x Uint8x32) Not() Uint8x32
- func (x Uint8x32) NotEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) OnesCount() Uint8x32
- func (x Uint8x32) Or(y Uint8x32) Uint8x32
- func (x Uint8x32) Permute(indices Uint8x32) Uint8x32
- func (x Uint8x32) PermuteOrZeroGrouped(indices Int8x32) Uint8x32
- func (x Uint8x32) Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) SetHi(y Uint8x16) Uint8x32
- func (x Uint8x32) SetLo(y Uint8x16) Uint8x32
- func (x Uint8x32) Store(y *[32]uint8)
- func (x Uint8x32) StoreSlice(s []uint8)
- func (x Uint8x32) StoreSlicePart(s []uint8)
- func (x Uint8x32) String() string
- func (x Uint8x32) Sub(y Uint8x32) Uint8x32
- func (x Uint8x32) SubSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) SumAbsDiff(y Uint8x32) Uint16x16
- func (x Uint8x32) Xor(y Uint8x32) Uint8x32
type Uint8x64
- func BroadcastUint8x64(x uint8) Uint8x64
- func LoadMaskedUint8x64(y *[64]uint8, mask Mask8x64) Uint8x64
- func LoadUint8x64(y *[64]uint8) Uint8x64
- func LoadUint8x64Slice(s []uint8) Uint8x64
- func LoadUint8x64SlicePart(s []uint8) Uint8x64
- func (x Uint8x64) AESDecryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESDecryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) Add(y Uint8x64) Uint8x64
- func (x Uint8x64) AddSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) And(y Uint8x64) Uint8x64
- func (x Uint8x64) AndNot(y Uint8x64) Uint8x64
- func (from Uint8x64) AsFloat32x16() (to Float32x16)
- func (from Uint8x64) AsFloat64x8() (to Float64x8)
- func (from Uint8x64) AsInt16x32() (to Int16x32)
- func (from Uint8x64) AsInt32x16() (to Int32x16)
- func (from Uint8x64) AsInt64x8() (to Int64x8)
- func (from Uint8x64) AsInt8x64() (to Int8x64)
- func (from Uint8x64) AsUint16x32() (to Uint16x32)
- func (from Uint8x64) AsUint32x16() (to Uint32x16)
- func (from Uint8x64) AsUint64x8() (to Uint64x8)
- func (x Uint8x64) Average(y Uint8x64) Uint8x64
- func (x Uint8x64) Compress(mask Mask8x64) Uint8x64
- func (x Uint8x64) ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
- func (x Uint8x64) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
- func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32
- func (x Uint8x64) Equal(y Uint8x64) Mask8x64
- func (x Uint8x64) Expand(mask Mask8x64) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldMul(y Uint8x64) Uint8x64
- func (x Uint8x64) GetHi() Uint8x32
- func (x Uint8x64) GetLo() Uint8x32
- func (x Uint8x64) Greater(y Uint8x64) Mask8x64
- func (x Uint8x64) GreaterEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Len() int
- func (x Uint8x64) Less(y Uint8x64) Mask8x64
- func (x Uint8x64) LessEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Masked(mask Mask8x64) Uint8x64
- func (x Uint8x64) Max(y Uint8x64) Uint8x64
- func (x Uint8x64) Merge(y Uint8x64, mask Mask8x64) Uint8x64
- func (x Uint8x64) Min(y Uint8x64) Uint8x64
- func (x Uint8x64) Not() Uint8x64
- func (x Uint8x64) NotEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) OnesCount() Uint8x64
- func (x Uint8x64) Or(y Uint8x64) Uint8x64
- func (x Uint8x64) Permute(indices Uint8x64) Uint8x64
- func (x Uint8x64) PermuteOrZeroGrouped(indices Int8x64) Uint8x64
- func (x Uint8x64) SetHi(y Uint8x32) Uint8x64
- func (x Uint8x64) SetLo(y Uint8x32) Uint8x64
- func (x Uint8x64) Store(y *[64]uint8)
- func (x Uint8x64) StoreMasked(y *[64]uint8, mask Mask8x64)
- func (x Uint8x64) StoreSlice(s []uint8)
- func (x Uint8x64) StoreSlicePart(s []uint8)
- func (x Uint8x64) String() string
- func (x Uint8x64) Sub(y Uint8x64) Uint8x64
- func (x Uint8x64) SubSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) SumAbsDiff(y Uint8x64) Uint16x32
- func (x Uint8x64) Xor(y Uint8x64) Uint8x64
type X86Features
- func (X86Features) AES() bool
- func (X86Features) AVX() bool
- func (X86Features) AVX2() bool
- func (X86Features) AVX512() bool
- func (X86Features) AVX512BITALG() bool
- func (X86Features) AVX512GFNI() bool
- func (X86Features) AVX512VAES() bool
- func (X86Features) AVX512VBMI() bool
- func (X86Features) AVX512VBMI2() bool
- func (X86Features) AVX512VNNI() bool
- func (X86Features) AVX512VPCLMULQDQ() bool
- func (X86Features) AVX512VPOPCNTDQ() bool
- func (X86Features) AVXVNNI() bool
- func (X86Features) SHA() bool
Bugs

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ClearAVXUpperBits ¶

func ClearAVXUpperBits()

ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers. It is intended for transitioning from AVX to SSE, eliminating the performance penalties caused by false dependencies.

Note: in the future the compiler may automatically generate the instruction, making this function unnecessary.

Asm: VZEROUPPER, CPU Feature: AVX

Types ¶

type Float32x16 ¶

type Float32x16 struct {
	// contains filtered or unexported fields
}

Float32x16 is a 512-bit SIMD vector of 16 float32 elements.

func BroadcastFloat32x16 ¶

func BroadcastFloat32x16(x float32) Float32x16

BroadcastFloat32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadFloat32x16 ¶

func LoadFloat32x16(y *[16]float32) Float32x16

LoadFloat32x16 loads a Float32x16 from an array

func LoadFloat32x16Slice ¶

func LoadFloat32x16Slice(s []float32) Float32x16

LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s

func LoadFloat32x16SlicePart ¶

func LoadFloat32x16SlicePart(s []float32) Float32x16

LoadFloat32x16SlicePart loads a Float32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.
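
The zero-fill behavior described above can be sketched in plain Go. This is only a scalar model, not the package's implementation: a [16]float32 stands in for the vector, and loadSlicePart16 is a hypothetical helper name.

```go
package main

import "fmt"

// loadSlicePart16 is a hypothetical scalar stand-in for
// LoadFloat32x16SlicePart. A [16]float32 plays the role of the vector.
func loadSlicePart16(s []float32) [16]float32 {
	var v [16]float32 // every lane starts at zero
	copy(v[:], s)     // copies min(16, len(s)) elements
	return v
}

func main() {
	tail := []float32{1, 2, 3} // e.g. a 3-element loop remainder
	fmt.Println(loadSlicePart16(tail))
}
```

This shape is convenient for the final, partial iteration of a loop that otherwise processes 16 elements at a time.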

func LoadMaskedFloat32x16 ¶

func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16

LoadMaskedFloat32x16 loads a Float32x16 from an array, at those elements enabled by mask

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func (Float32x16) Add ¶

func (x Float32x16) Add(y Float32x16) Float32x16

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX512

func (Float32x16) AsFloat64x8 ¶

func (from Float32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Float32x16 to Float64x8.

func (Float32x16) AsInt16x32 ¶

func (from Float32x16) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Float32x16 to Int16x32.

func (Float32x16) AsInt32x16 ¶

func (from Float32x16) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Float32x16 to Int32x16.

func (Float32x16) AsInt64x8 ¶

func (from Float32x16) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Float32x16 to Int64x8.

func (Float32x16) AsInt8x64 ¶

func (from Float32x16) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Float32x16 to Int8x64.

func (Float32x16) AsUint16x32 ¶

func (from Float32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Float32x16 to Uint16x32.

func (Float32x16) AsUint32x16 ¶

func (from Float32x16) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Float32x16 to Uint32x16.

func (Float32x16) AsUint64x8 ¶

func (from Float32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Float32x16 to Uint64x8.

func (Float32x16) AsUint8x64 ¶

func (from Float32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Float32x16 to Uint8x64.

func (Float32x16) CeilScaled ¶

func (x Float32x16) CeilScaled(prec uint8) Float32x16

CeilScaled rounds elements up with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) CeilScaledResidue ¶

func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16

CeilScaledResidue computes the difference after ceiling with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512
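
As a rough illustration of CeilScaled and CeilScaledResidue, the following scalar sketch models a single lane. It rests on an assumption that this page does not state: that prec is the number of binary fraction bits kept, so results are multiples of 2^-prec. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled is a hypothetical scalar model of one lane of CeilScaled.
// ASSUMPTION: prec is the number of binary fraction bits to keep, so the
// result is the smallest multiple of 2^-prec that is >= x.
func ceilScaled(x float32, prec uint8) float32 {
	scale := math.Ldexp(1, int(prec)) // 2^prec
	return float32(math.Ceil(float64(x)*scale) / scale)
}

// ceilScaledResidue models CeilScaledResidue as x minus the rounded value.
func ceilScaledResidue(x float32, prec uint8) float32 {
	return x - ceilScaled(x, prec)
}

func main() {
	for _, prec := range []uint8{0, 1, 2} {
		fmt.Println(prec, ceilScaled(1.25, prec), ceilScaledResidue(1.25, prec))
	}
}
```

Under the same assumption, the Floor, RoundToEven, and Trunc variants would differ only in the rounding function applied to the scaled value.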

func (Float32x16) Compress ¶

func (x Float32x16) Compress(mask Mask32x16) Float32x16

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512
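
A scalar sketch of the packing step may help. It is a model in plain Go, not the package's implementation: compress16 is a hypothetical name, arrays stand in for the vector and mask types, and it assumes that result lanes above the packed elements are zero, which this page does not specify.

```go
package main

import "fmt"

// compress16 is a hypothetical scalar model of Float32x16.Compress.
// ASSUMPTION: lanes above the packed elements are left at zero.
func compress16(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	n := 0 // next free low lane in the result
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [16]float32{10, 11, 12, 13, 14, 15}
	mask := [16]bool{1: true, 3: true, 4: true}
	fmt.Println(compress16(x, mask)) // lanes 1, 3, 4 move to lanes 0, 1, 2
}
```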

func (Float32x16) ConcatPermute ¶

func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where xy is the concatenation of x (lower half) and y (upper half) and n is the number of elements in the result. Only the bits needed to index xy are used from each element of indices.

Asm: VPERMI2PS, CPU Feature: AVX512
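
The indexing rule can be written out as a scalar loop. The sketch below is a model in plain Go with a hypothetical helper name. For 16-lane vectors, xy has 32 elements, so the model keeps the low 5 bits of each index.

```go
package main

import "fmt"

// concatPermute16 is a hypothetical scalar model of Float32x16.ConcatPermute.
func concatPermute16(x, y [16]float32, indices [16]uint32) [16]float32 {
	var xy [32]float32
	copy(xy[:16], x[:]) // x is the lower half of the table
	copy(xy[16:], y[:]) // y is the upper half
	var out [16]float32
	for i, idx := range indices {
		out[i] = xy[idx&31] // 5 bits are enough to index 32 elements
	}
	return out
}

func main() {
	var x, y [16]float32
	for i := range x {
		x[i] = float32(i)       // x = 0, 1, ..., 15
		y[i] = float32(100 + i) // y = 100, 101, ..., 115
	}
	idx := [16]uint32{0: 16, 1: 0, 2: 31, 3: 33}
	fmt.Println(concatPermute16(x, y, idx))
}
```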

func (Float32x16) ConvertToInt32 ¶

func (x Float32x16) ConvertToInt32() Int32x16

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX512
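
For in-range values, the truncating behavior matches Go's ordinary float-to-integer conversion, which also discards the fraction. The scalar sketch below uses a hypothetical helper name and does not model out-of-range inputs, whose results are architecture-specific.

```go
package main

import "fmt"

// convertToInt32 is a hypothetical scalar model of Float32x16.ConvertToInt32
// for values that fit in int32. Out-of-range inputs are not modelled.
func convertToInt32(x [16]float32) [16]int32 {
	var out [16]int32
	for i, v := range x {
		out[i] = int32(v) // Go's conversion truncates toward zero
	}
	return out
}

func main() {
	x := [16]float32{2.9, -2.9, 0.5, -0.5, 7}
	fmt.Println(convertToInt32(x))
}
```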

func (Float32x16) ConvertToUint32 ¶

func (x Float32x16) ConvertToUint32() Uint32x16

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512

func (Float32x16) Div ¶

func (x Float32x16) Div(y Float32x16) Float32x16

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX512

func (Float32x16) Equal ¶

func (x Float32x16) Equal(y Float32x16) Mask32x16

Equal returns x equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Expand ¶

func (x Float32x16) Expand(mask Mask32x16) Float32x16

Expand distributes the elements packed into the low part of x to the positions selected by mask, taking them in order from the lowest set mask element to the highest.

Asm: VEXPANDPS, CPU Feature: AVX512
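
Expand is the inverse of the packing done by Compress. The scalar sketch below is a model in plain Go with a hypothetical helper name. It assumes that lanes whose mask element is false are zero in the result, which this page does not specify.

```go
package main

import "fmt"

// expand16 is a hypothetical scalar model of Float32x16.Expand.
// ASSUMPTION: lanes where mask is false are left at zero.
func expand16(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	n := 0 // next packed element of x to hand out
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [16]float32{10, 20, 30} // three packed elements
	mask := [16]bool{1: true, 4: true, 5: true}
	fmt.Println(expand16(x, mask)) // 10, 20, 30 land in lanes 1, 4, 5
}
```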

func (Float32x16) FloorScaled ¶

func (x Float32x16) FloorScaled(prec uint8) Float32x16

FloorScaled rounds elements down with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) FloorScaledResidue ¶

func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16

FloorScaledResidue computes the difference after flooring with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x16) GetHi ¶

func (x Float32x16) GetHi() Float32x8

GetHi returns the upper half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float32x16) GetLo ¶

func (x Float32x16) GetLo() Float32x8

GetLo returns the lower half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float32x16) Greater ¶

func (x Float32x16) Greater(y Float32x16) Mask32x16

Greater returns x greater-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) GreaterEqual ¶

func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) IsNan ¶

func (x Float32x16) IsNan(y Float32x16) Mask32x16

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPS, CPU Feature: AVX512
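
The two-operand form suggests an "unordered" comparison, which is true when either operand is NaN. That is an assumption here, not something this page states. Under it, a scalar model (with a hypothetical helper name) looks like this:

```go
package main

import (
	"fmt"
	"math"
)

// isNan16 is a hypothetical scalar model of Float32x16.IsNan.
// ASSUMPTION: a lane is true when x or y holds NaN in that lane.
func isNan16(x, y [16]float32) [16]bool {
	var out [16]bool
	for i := range x {
		out[i] = math.IsNaN(float64(x[i])) || math.IsNaN(float64(y[i]))
	}
	return out
}

func main() {
	nan := float32(math.NaN())
	x := [16]float32{0: nan, 1: 1}
	fmt.Println(isNan16(x, x)) // the x.IsNan(x) form tests x alone
}
```

Passing the same vector for both operands, as the doc comment recommends, makes the result depend on x alone.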

func (Float32x16) Len ¶

func (x Float32x16) Len() int

Len returns the number of elements in a Float32x16

func (Float32x16) Less ¶

func (x Float32x16) Less(y Float32x16) Mask32x16

Less returns x less-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) LessEqual ¶

func (x Float32x16) LessEqual(y Float32x16) Mask32x16

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Masked ¶

func (x Float32x16) Masked(mask Mask32x16) Float32x16

Masked returns x but with elements zeroed where mask is false.

func (Float32x16) Max ¶

func (x Float32x16) Max(y Float32x16) Float32x16

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX512

func (Float32x16) Merge ¶

func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16

Merge returns x but with elements set to y where mask is false.
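
Merge and Masked differ only in what fills the lanes where the mask is false. The scalar sketch below models both in plain Go. The helper names are hypothetical, and arrays stand in for the vector and mask types.

```go
package main

import "fmt"

// masked16 is a hypothetical scalar model of Float32x16.Masked:
// lanes where mask is false become zero.
func masked16(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge16 is a hypothetical scalar model of Float32x16.Merge:
// lanes where mask is false are taken from y instead of x.
func merge16(x, y [16]float32, mask [16]bool) [16]float32 {
	out := y
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [16]float32{1, 2, 3, 4}
	y := [16]float32{-1, -2, -3, -4}
	mask := [16]bool{0: true, 2: true}
	fmt.Println(masked16(x, mask))
	fmt.Println(merge16(x, y, mask))
}
```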

func (Float32x16) Min ¶

func (x Float32x16) Min(y Float32x16) Float32x16

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX512

func (Float32x16) Mul ¶

func (x Float32x16) Mul(y Float32x16) Float32x16

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX512

func (Float32x16) MulAdd ¶

func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x16) MulAddSub ¶

func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x16) MulSubAdd ¶

func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PS, CPU Feature: AVX512

func (Float32x16) NotEqual ¶

func (x Float32x16) NotEqual(y Float32x16) Mask32x16

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX512

func (Float32x16) Permute ¶

func (x Float32x16) Permute(indices Uint32x16) Float32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMPS, CPU Feature: AVX512
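
As a scalar illustration of the rule above, the following plain-Go model (with a hypothetical helper name) reverses a vector and shows that index bits above the low 4 are ignored.

```go
package main

import "fmt"

// permute16 is a hypothetical scalar model of Float32x16.Permute.
func permute16(x [16]float32, indices [16]uint32) [16]float32 {
	var out [16]float32
	for i, idx := range indices {
		out[i] = x[idx&15] // only the low 4 bits of each index are used
	}
	return out
}

func main() {
	var x [16]float32
	var rev [16]uint32
	for i := range x {
		x[i] = float32(i)
		rev[i] = uint32(15 - i) // reversal pattern
	}
	fmt.Println(permute16(x, rev))
}
```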

func (Float32x16) Reciprocal ¶

func (x Float32x16) Reciprocal() Float32x16

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PS, CPU Feature: AVX512

func (Float32x16) ReciprocalSqrt ¶

func (x Float32x16) ReciprocalSqrt() Float32x16

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PS, CPU Feature: AVX512

func (Float32x16) RoundToEvenScaled ¶

func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16

RoundToEvenScaled rounds elements with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) RoundToEvenScaledResidue ¶

func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x16) Scale ¶

func (x Float32x16) Scale(y Float32x16) Float32x16

Scale multiplies elements by a power of 2.

Asm: VSCALEFPS, CPU Feature: AVX512
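
The doc comment does not say how y selects the power of 2. Assuming the usual behavior of the underlying instruction, where each lane computes x * 2^floor(y), a scalar model looks like the sketch below. The helper name is hypothetical, and special values (NaN, infinities, denormals) are not modelled.

```go
package main

import (
	"fmt"
	"math"
)

// scale16 is a hypothetical scalar model of Float32x16.Scale.
// ASSUMPTION: each lane computes x * 2^floor(y).
func scale16(x, y [16]float32) [16]float32 {
	var out [16]float32
	for i := range x {
		exp := int(math.Floor(float64(y[i])))
		out[i] = float32(math.Ldexp(float64(x[i]), exp))
	}
	return out
}

func main() {
	x := [16]float32{3, 1.5}
	y := [16]float32{2, -1.5}
	fmt.Println(scale16(x, y))
}
```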

func (Float32x16) SelectFromPairGrouped ¶

func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX512
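
The per-group selection can be modelled with two nested loops. The sketch below is a scalar model in plain Go under two assumptions not spelled out above: that the selectors a, b, c, d fill result lanes 0 to 3 of each group in that order, and that the same selectors apply to every group. The helper name is hypothetical, and selectors outside 0-7 are not modelled.

```go
package main

import "fmt"

// selectFromPairGrouped16 is a hypothetical scalar model of
// Float32x16.SelectFromPairGrouped.
// ASSUMPTION: a, b, c, d fill lanes 0..3 of every 128-bit group in order.
func selectFromPairGrouped16(x [16]float32, a, b, c, d uint8, y [16]float32) [16]float32 {
	sel := [4]uint8{a, b, c, d}
	var out [16]float32
	for g := 0; g < 16; g += 4 { // one 128-bit group holds 4 float32 lanes
		for j, s := range sel {
			switch {
			case s < 4:
				out[g+j] = x[g+int(s)] // 0-3 pick from x's group
			case s < 8:
				out[g+j] = y[g+int(s-4)] // 4-7 pick from y's group
			default:
				panic("selector out of range 0-7 is not modelled")
			}
		}
	}
	return out
}

func main() {
	var x, y [16]float32
	for i := range x {
		x[i] = float32(i)
		y[i] = float32(100 + i)
	}
	fmt.Println(selectFromPairGrouped16(x, 0, 5, 3, 6, y))
}
```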

func (Float32x16) SetHi ¶

func (x Float32x16) SetHi(y Float32x8) Float32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float32x16) SetLo ¶

func (x Float32x16) SetLo(y Float32x8) Float32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512
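
GetLo, GetHi, SetLo, and SetHi treat a 512-bit vector as two 256-bit halves. The scalar sketch below models them with arrays. The helper names are hypothetical, and "lower half" is taken to mean elements 0 through 7.

```go
package main

import "fmt"

// Hypothetical scalar models of the half accessors on Float32x16.
// A [16]float32 stands in for Float32x16 and a [8]float32 for Float32x8.

func getLo(x [16]float32) (lo [8]float32) { copy(lo[:], x[:8]); return lo }
func getHi(x [16]float32) (hi [8]float32) { copy(hi[:], x[8:]); return hi }

// setLo and setHi return a modified copy; the argument x is passed by value.
func setLo(x [16]float32, y [8]float32) [16]float32 { copy(x[:8], y[:]); return x }
func setHi(x [16]float32, y [8]float32) [16]float32 { copy(x[8:], y[:]); return x }

func main() {
	var x [16]float32
	for i := range x {
		x[i] = float32(i)
	}
	// Swap the two halves of x.
	swapped := setHi(setLo(x, getHi(x)), getLo(x))
	fmt.Println(swapped)
}
```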

func (Float32x16) Sqrt ¶

func (x Float32x16) Sqrt() Float32x16

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX512

func (Float32x16) Store ¶

func (x Float32x16) Store(y *[16]float32)

Store stores a Float32x16 to an array

func (Float32x16) StoreMasked ¶

func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)

StoreMasked stores a Float32x16 to an array, at those elements enabled by mask

Asm: VMOVDQU32, CPU Feature: AVX512

func (Float32x16) StoreSlice ¶

func (x Float32x16) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 16 float32s

func (Float32x16) StoreSlicePart ¶

func (x Float32x16) StoreSlicePart(s []float32)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
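
The "as many elements as will fit" rule corresponds to what the built-in copy does. The scalar sketch below is a model in plain Go with a hypothetical helper name.

```go
package main

import "fmt"

// storeSlicePart16 is a hypothetical scalar model of
// Float32x16.StoreSlicePart: it writes min(16, len(s)) elements.
func storeSlicePart16(x [16]float32, s []float32) {
	copy(s, x[:]) // elements of s beyond index 15 are left untouched
}

func main() {
	var x [16]float32
	for i := range x {
		x[i] = float32(i + 1)
	}
	tail := make([]float32, 3) // room for only 3 elements
	storeSlicePart16(x, tail)
	fmt.Println(tail)
}
```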

func (Float32x16) String ¶

func (x Float32x16) String() string

String returns a string representation of SIMD vector x

func (Float32x16) Sub ¶

func (x Float32x16) Sub(y Float32x16) Float32x16

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX512

func (Float32x16) TruncScaled ¶

func (x Float32x16) TruncScaled(prec uint8) Float32x16

TruncScaled truncates elements with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x16) TruncScaledResidue ¶

func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16

TruncScaledResidue computes the difference after truncating with specified precision.

For best performance prec should be a constant; a non-constant value is translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float32x4 ¶

type Float32x4 struct {
	// contains filtered or unexported fields
}

Float32x4 is a 128-bit SIMD vector of 4 float32 elements.

func BroadcastFloat32x4 ¶

func BroadcastFloat32x4(x float32) Float32x4

BroadcastFloat32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat32x4 ¶

func LoadFloat32x4(y *[4]float32) Float32x4

LoadFloat32x4 loads a Float32x4 from an array

func LoadFloat32x4Slice ¶

func LoadFloat32x4Slice(s []float32) Float32x4

LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s

func LoadFloat32x4SlicePart ¶

func LoadFloat32x4SlicePart(s []float32) Float32x4

LoadFloat32x4SlicePart loads a Float32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.

func LoadMaskedFloat32x4 ¶

func LoadMaskedFloat32x4(y *[4]float32, mask Mask32x4) Float32x4

LoadMaskedFloat32x4 loads a Float32x4 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x4) Add ¶

func (x Float32x4) Add(y Float32x4) Float32x4

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX

func (Float32x4) AddPairs ¶

func (x Float32x4) AddPairs(y Float32x4) Float32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPS, CPU Feature: AVX

func (Float32x4) AddSub ¶

func (x Float32x4) AddSub(y Float32x4) Float32x4

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPS, CPU Feature: AVX
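
The alternating pattern can be made concrete with a scalar model. The sketch below is plain Go with a hypothetical helper name. Indices are zero-based, so element 0 is an even element.

```go
package main

import "fmt"

// addSub4 is a hypothetical scalar model of Float32x4.AddSub:
// even-indexed lanes compute x-y, odd-indexed lanes compute x+y.
func addSub4(x, y [4]float32) [4]float32 {
	var out [4]float32
	for i := range x {
		if i%2 == 0 {
			out[i] = x[i] - y[i]
		} else {
			out[i] = x[i] + y[i]
		}
	}
	return out
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{10, 20, 30, 40}
	fmt.Println(addSub4(x, y))
}
```

This pattern is commonly used for complex multiplication, where real and imaginary parts sit in adjacent lanes.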

func (Float32x4) AsFloat64x2 ¶

func (from Float32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Float32x4 to Float64x2.

func (Float32x4) AsInt16x8 ¶

func (from Float32x4) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Float32x4 to Int16x8.

func (Float32x4) AsInt32x4 ¶

func (from Float32x4) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Float32x4 to Int32x4.

func (Float32x4) AsInt64x2 ¶

func (from Float32x4) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Float32x4 to Int64x2.

func (Float32x4) AsInt8x16 ¶

func (from Float32x4) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Float32x4 to Int8x16.

func (Float32x4) AsUint16x8 ¶

func (from Float32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Float32x4 to Uint16x8.

func (Float32x4) AsUint32x4 ¶

func (from Float32x4) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Float32x4 to Uint32x4.

func (Float32x4) AsUint64x2 ¶

func (from Float32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Float32x4 to Uint64x2.

func (Float32x4) AsUint8x16 ¶

func (from Float32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Float32x4 to Uint8x16.

func (Float32x4) Broadcast128 ¶

func (x Float32x4) Broadcast128() Float32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX2

func (Float32x4) Broadcast256 ¶

func (x Float32x4) Broadcast256() Float32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX2

func (Float32x4) Broadcast512 ¶

func (x Float32x4) Broadcast512() Float32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VBROADCASTSS, CPU Feature: AVX512

func (Float32x4) Ceil ¶

func (x Float32x4) Ceil() Float32x4

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) CeilScaled ¶

func (x Float32x4) CeilScaled(prec uint8) Float32x4

CeilScaled rounds elements up with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) CeilScaledResidue ¶

func (x Float32x4) CeilScaledResidue(prec uint8) Float32x4

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512
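One plausible reading of "specified precision" is sketched below in scalar Go. It rests on an assumption not stated above: that prec is the number of binary fraction bits kept, so CeilScaled rounds up to a multiple of 2^-prec and CeilScaledResidue is x minus that rounded value. ceilScaled and ceilScaledResidue are hypothetical helpers, not archsimd functions.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled is a hypothetical scalar model.
// Assumption: the result is ceil(x * 2^prec) / 2^prec.
func ceilScaled(x float32, prec uint8) float32 {
	scale := math.Ldexp(1, int(prec)) // 2^prec
	return float32(math.Ceil(float64(x)*scale) / scale)
}

// ceilScaledResidue is a hypothetical scalar model.
// Assumption: the residue is x minus the scaled ceiling of x.
func ceilScaledResidue(x float32, prec uint8) float32 {
	return x - ceilScaled(x, prec)
}

func main() {
	for _, prec := range []uint8{0, 1, 2} {
		fmt.Println(prec, ceilScaled(1.25, prec), ceilScaledResidue(1.25, prec))
	}
}
```

Under this assumption, 1.25 rounds to 2, 1.5, and 1.25 for prec values of 0, 1, and 2.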

func (Float32x4) Compress ¶

func (x Float32x4) Compress(mask Mask32x4) Float32x4

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512
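The packing step can be sketched in scalar Go. compress4 is a hypothetical helper that represents the mask as [4]bool; it assumes the elements above the packed ones are zero, which the description above does not specify.

```go
package main

import "fmt"

// compress4 is a hypothetical scalar model of Float32x4.Compress.
// Assumption: result elements above the packed ones are left as zero.
func compress4(x [4]float32, mask [4]bool) [4]float32 {
	var r [4]float32
	n := 0 // next free low-indexed slot in the result
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	mask := [4]bool{false, true, false, true}
	fmt.Println(compress4(x, mask)) // [2 4 0 0]
}
```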

func (Float32x4) ConcatPermute ¶

func (x Float32x4) ConcatPermute(y Float32x4, indices Uint32x4) Float32x4

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2PS, CPU Feature: AVX512
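The indexing rule can be sketched in scalar Go for the 4-element case. concatPermute4 is a hypothetical helper on plain arrays; since xy has 8 elements, only the low 3 bits of each index are used.

```go
package main

import "fmt"

// concatPermute4 is a hypothetical scalar model of Float32x4.ConcatPermute.
func concatPermute4(x, y [4]float32, indices [4]uint32) [4]float32 {
	var xy [8]float32
	copy(xy[:4], x[:]) // x is the lower half
	copy(xy[4:], y[:]) // y is the upper half
	var r [4]float32
	for i, idx := range indices {
		r[i] = xy[idx&7] // 8 candidates need only the low 3 bits
	}
	return r
}

func main() {
	x := [4]float32{0, 1, 2, 3}
	y := [4]float32{10, 11, 12, 13}
	fmt.Println(concatPermute4(x, y, [4]uint32{7, 0, 4, 3})) // [13 0 10 3]
}
```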

func (Float32x4) ConvertToFloat64 ¶

func (x Float32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTPS2PD, CPU Feature: AVX

func (Float32x4) ConvertToInt32 ¶

func (x Float32x4) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX
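Truncation toward zero matches Go's own float-to-integer conversion for in-range values, so the rule can be sketched in scalar Go. convertToInt32x4 is a hypothetical helper on plain arrays; it does not model the out-of-range case.

```go
package main

import "fmt"

// convertToInt32x4 is a hypothetical scalar model of Float32x4.ConvertToInt32
// for values that fit in int32.
func convertToInt32x4(x [4]float32) [4]int32 {
	var r [4]int32
	for i, v := range x {
		r[i] = int32(v) // Go's conversion discards the fraction
	}
	return r
}

func main() {
	fmt.Println(convertToInt32x4([4]float32{1.9, -1.9, 2.5, -0.5})) // [1 -1 2 0]
}
```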

func (Float32x4) ConvertToInt64 ¶

func (x Float32x4) ConvertToInt64() Int64x4

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2QQ, CPU Feature: AVX512

func (Float32x4) ConvertToUint32 ¶

func (x Float32x4) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512

func (Float32x4) ConvertToUint64 ¶

func (x Float32x4) ConvertToUint64() Uint64x4

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UQQ, CPU Feature: AVX512

func (Float32x4) Div ¶

func (x Float32x4) Div(y Float32x4) Float32x4

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX

func (Float32x4) Equal ¶

func (x Float32x4) Equal(y Float32x4) Mask32x4

Equal returns x equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Expand ¶

func (x Float32x4) Expand(mask Mask32x4) Float32x4

Expand distributes the elements of x, which are expected to be packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.

Asm: VEXPANDPS, CPU Feature: AVX512
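Expand is the inverse of Compress, which can be sketched in scalar Go. expand4 is a hypothetical helper with a [4]bool mask; it assumes positions not selected by mask are zero, which the description above does not specify.

```go
package main

import "fmt"

// expand4 is a hypothetical scalar model of Float32x4.Expand.
// Assumption: positions not selected by mask are left as zero.
func expand4(x [4]float32, mask [4]bool) [4]float32 {
	var r [4]float32
	n := 0 // next packed source element of x
	for i, sel := range mask {
		if sel {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [4]float32{2, 4, 0, 0}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expand4(x, mask)) // [0 2 0 4]
}
```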

func (Float32x4) Floor ¶

func (x Float32x4) Floor() Float32x4

Floor rounds elements down to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) FloorScaled ¶

func (x Float32x4) FloorScaled(prec uint8) Float32x4

FloorScaled rounds elements down with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) FloorScaledResidue ¶

func (x Float32x4) FloorScaledResidue(prec uint8) Float32x4

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x4) GetElem ¶

func (x Float32x4) GetElem(index uint8) float32

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Float32x4) Greater ¶

func (x Float32x4) Greater(y Float32x4) Mask32x4

Greater returns x greater-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) GreaterEqual ¶

func (x Float32x4) GreaterEqual(y Float32x4) Mask32x4

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) IsNan ¶

func (x Float32x4) IsNan(y Float32x4) Mask32x4

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPS, CPU Feature: AVX
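For the recommended x.IsNan(x) form, the resulting mask can be sketched in scalar Go. isNaN4 is a hypothetical helper that returns a [4]bool in place of a Mask32x4; it does not model the two-operand case.

```go
package main

import (
	"fmt"
	"math"
)

// isNaN4 is a hypothetical scalar model of x.IsNan(x) for a Float32x4.
func isNaN4(x [4]float32) [4]bool {
	var m [4]bool
	for i, v := range x {
		m[i] = math.IsNaN(float64(v))
	}
	return m
}

func main() {
	nan := float32(math.NaN())
	fmt.Println(isNaN4([4]float32{1, nan, 0, nan})) // [false true false true]
}
```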

func (Float32x4) Len ¶

func (x Float32x4) Len() int

Len returns the number of elements in a Float32x4

func (Float32x4) Less ¶

func (x Float32x4) Less(y Float32x4) Mask32x4

Less returns x less-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) LessEqual ¶

func (x Float32x4) LessEqual(y Float32x4) Mask32x4

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Masked ¶

func (x Float32x4) Masked(mask Mask32x4) Float32x4

Masked returns x but with elements zeroed where mask is false.

func (Float32x4) Max ¶

func (x Float32x4) Max(y Float32x4) Float32x4

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX

func (Float32x4) Merge ¶

func (x Float32x4) Merge(y Float32x4, mask Mask32x4) Float32x4

Merge returns x but with elements set to y where mask is false.
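Masked and Merge differ only in what replaces the elements where mask is false. The scalar sketch below uses hypothetical helpers masked4 and merge4 with a [4]bool mask; it does not call archsimd.

```go
package main

import "fmt"

// masked4 is a hypothetical scalar model of Float32x4.Masked:
// elements are zeroed where mask is false.
func masked4(x [4]float32, mask [4]bool) [4]float32 {
	var r [4]float32
	for i, keep := range mask {
		if keep {
			r[i] = x[i]
		}
	}
	return r
}

// merge4 is a hypothetical scalar model of Float32x4.Merge:
// elements are taken from y where mask is false.
func merge4(x, y [4]float32, mask [4]bool) [4]float32 {
	r := y
	for i, keep := range mask {
		if keep {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{10, 20, 30, 40}
	mask := [4]bool{true, false, true, false}
	fmt.Println(masked4(x, mask))   // [1 0 3 0]
	fmt.Println(merge4(x, y, mask)) // [1 20 3 40]
}
```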

func (Float32x4) Min ¶

func (x Float32x4) Min(y Float32x4) Float32x4

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX

func (Float32x4) Mul ¶

func (x Float32x4) Mul(y Float32x4) Float32x4

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX

func (Float32x4) MulAdd ¶

func (x Float32x4) MulAdd(y Float32x4, z Float32x4) Float32x4

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x4) MulAddSub ¶

func (x Float32x4) MulAddSub(y Float32x4, z Float32x4) Float32x4

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x4) MulSubAdd ¶

func (x Float32x4) MulSubAdd(y Float32x4, z Float32x4) Float32x4

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PS, CPU Feature: AVX512
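The alternating signs of MulAddSub and MulSubAdd can be sketched in scalar Go. mulAddSub4 and mulSubAdd4 are hypothetical helpers on plain arrays; they use ordinary float32 arithmetic and do not reproduce the single rounding of a fused operation, so results may differ in the last bit for inputs that are not exactly representable.

```go
package main

import "fmt"

// mulAddSub4 is a hypothetical scalar model of Float32x4.MulAddSub:
// (x*y)+z at even indices, (x*y)-z at odd indices.
func mulAddSub4(x, y, z [4]float32) [4]float32 {
	var r [4]float32
	for i := range r {
		if i%2 == 0 {
			r[i] = x[i]*y[i] + z[i]
		} else {
			r[i] = x[i]*y[i] - z[i]
		}
	}
	return r
}

// mulSubAdd4 is a hypothetical scalar model of Float32x4.MulSubAdd:
// (x*y)-z at even indices, (x*y)+z at odd indices.
func mulSubAdd4(x, y, z [4]float32) [4]float32 {
	var r [4]float32
	for i := range r {
		if i%2 == 0 {
			r[i] = x[i]*y[i] - z[i]
		} else {
			r[i] = x[i]*y[i] + z[i]
		}
	}
	return r
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{2, 2, 2, 2}
	z := [4]float32{1, 1, 1, 1}
	fmt.Println(mulAddSub4(x, y, z)) // [3 3 7 7]
	fmt.Println(mulSubAdd4(x, y, z)) // [1 5 5 9]
}
```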

func (Float32x4) NotEqual ¶

func (x Float32x4) NotEqual(y Float32x4) Mask32x4

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x4) Reciprocal ¶

func (x Float32x4) Reciprocal() Float32x4

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCPPS, CPU Feature: AVX

func (Float32x4) ReciprocalSqrt ¶

func (x Float32x4) ReciprocalSqrt() Float32x4

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRTPS, CPU Feature: AVX

func (Float32x4) RoundToEven ¶

func (x Float32x4) RoundToEven() Float32x4

RoundToEven rounds elements to the nearest integer, rounding ties to even.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) RoundToEvenScaled ¶

func (x Float32x4) RoundToEvenScaled(prec uint8) Float32x4

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) RoundToEvenScaledResidue ¶

func (x Float32x4) RoundToEvenScaledResidue(prec uint8) Float32x4

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x4) Scale ¶

func (x Float32x4) Scale(y Float32x4) Float32x4

Scale multiplies elements by a power of 2.

Asm: VSCALEFPS, CPU Feature: AVX512
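A scalar sketch of Scale follows. It assumes, based on the behavior of the VSCALEFPS instruction rather than on the description above, that each element of x is multiplied by 2 raised to the floor of the corresponding element of y. scale4 is a hypothetical helper and does not model special values such as NaN or infinities.

```go
package main

import (
	"fmt"
	"math"
)

// scale4 is a hypothetical scalar model of Float32x4.Scale.
// Assumption: r[i] = x[i] * 2^floor(y[i]).
func scale4(x, y [4]float32) [4]float32 {
	var r [4]float32
	for i := range r {
		exp := int(math.Floor(float64(y[i])))
		r[i] = float32(math.Ldexp(float64(x[i]), exp))
	}
	return r
}

func main() {
	x := [4]float32{1, 3, 5, 8}
	y := [4]float32{1, 2, 0, -3}
	fmt.Println(scale4(x, y)) // [2 12 5 1]
}
```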

func (Float32x4) SelectFromPair ¶

func (x Float32x4) SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4

SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
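The selector rule can be checked against the documented example with a scalar sketch. selectFromPair4 is a hypothetical helper on plain arrays; it assumes every selector is in the range 0-7.

```go
package main

import "fmt"

// selectFromPair4 is a hypothetical scalar model of Float32x4.SelectFromPair.
// Selectors 0-3 pick from x; selectors 4-7 pick elements 0-3 of y.
func selectFromPair4(x [4]float32, a, b, c, d uint8, y [4]float32) [4]float32 {
	pick := func(sel uint8) float32 {
		if sel < 4 {
			return x[sel]
		}
		return y[sel-4]
	}
	return [4]float32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]float32{1, 2, 4, 8}
	y := [4]float32{9, 25, 49, 81}
	fmt.Println(selectFromPair4(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```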

func (Float32x4) SetElem ¶

func (x Float32x4) SetElem(index uint8, y float32) Float32x4

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Float32x4) Sqrt ¶

func (x Float32x4) Sqrt() Float32x4

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX

func (Float32x4) Store ¶

func (x Float32x4) Store(y *[4]float32)

Store stores a Float32x4 to an array

func (Float32x4) StoreMasked ¶

func (x Float32x4) StoreMasked(y *[4]float32, mask Mask32x4)

StoreMasked stores a Float32x4 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x4) StoreSlice ¶

func (x Float32x4) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 4 float32s

func (Float32x4) StoreSlicePart ¶

func (x Float32x4) StoreSlicePart(s []float32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
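The "as many elements as will fit" rule can be modeled in plain Go. storePart4 is a hypothetical scalar helper that takes a [4]float32 in place of a Float32x4; it does not call archsimd.

```go
package main

import "fmt"

// storePart4 is a hypothetical scalar model of Float32x4.StoreSlicePart:
// it writes min(len(s), 4) elements and leaves the rest of s untouched.
func storePart4(x [4]float32, s []float32) {
	copy(s, x[:])
}

func main() {
	x := [4]float32{1, 2, 3, 4}

	short := make([]float32, 2)
	storePart4(x, short)

	long := []float32{9, 9, 9, 9, 9, 9}
	storePart4(x, long)

	fmt.Println(short, long) // [1 2] [1 2 3 4 9 9]
}
```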

func (Float32x4) String ¶

func (x Float32x4) String() string

String returns a string representation of SIMD vector x

func (Float32x4) Sub ¶

func (x Float32x4) Sub(y Float32x4) Float32x4

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX

func (Float32x4) SubPairs ¶

func (x Float32x4) SubPairs(y Float32x4) Float32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPS, CPU Feature: AVX

func (Float32x4) Trunc ¶

func (x Float32x4) Trunc() Float32x4

Trunc truncates elements towards zero.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x4) TruncScaled ¶

func (x Float32x4) TruncScaled(prec uint8) Float32x4

TruncScaled truncates elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x4) TruncScaledResidue ¶

func (x Float32x4) TruncScaledResidue(prec uint8) Float32x4

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float32x8 ¶

type Float32x8 struct {
	// contains filtered or unexported fields
}

Float32x8 is a 256-bit SIMD vector of 8 float32

func BroadcastFloat32x8 ¶

func BroadcastFloat32x8(x float32) Float32x8

BroadcastFloat32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat32x8 ¶

func LoadFloat32x8(y *[8]float32) Float32x8

LoadFloat32x8 loads a Float32x8 from an array

func LoadFloat32x8Slice ¶

func LoadFloat32x8Slice(s []float32) Float32x8

LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s

func LoadFloat32x8SlicePart ¶

func LoadFloat32x8SlicePart(s []float32) Float32x8

LoadFloat32x8SlicePart loads a Float32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.

func LoadMaskedFloat32x8 ¶

func LoadMaskedFloat32x8(y *[8]float32, mask Mask32x8) Float32x8

LoadMaskedFloat32x8 loads a Float32x8 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x8) Add ¶

func (x Float32x8) Add(y Float32x8) Float32x8

Add adds corresponding elements of two vectors.

Asm: VADDPS, CPU Feature: AVX

func (Float32x8) AddPairs ¶

func (x Float32x8) AddPairs(y Float32x8) Float32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPS, CPU Feature: AVX

func (Float32x8) AddSub ¶

func (x Float32x8) AddSub(y Float32x8) Float32x8

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPS, CPU Feature: AVX

func (Float32x8) AsFloat64x4 ¶

func (from Float32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Float32x8 to Float64x4.

func (Float32x8) AsInt16x16 ¶

func (from Float32x8) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Float32x8 to Int16x16.

func (Float32x8) AsInt32x8 ¶

func (from Float32x8) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Float32x8 to Int32x8.

func (Float32x8) AsInt64x4 ¶

func (from Float32x8) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Float32x8 to Int64x4.

func (Float32x8) AsInt8x32 ¶

func (from Float32x8) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Float32x8 to Int8x32.

func (Float32x8) AsUint16x16 ¶

func (from Float32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Float32x8 to Uint16x16.

func (Float32x8) AsUint32x8 ¶

func (from Float32x8) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Float32x8 to Uint32x8.

func (Float32x8) AsUint64x4 ¶

func (from Float32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Float32x8 to Uint64x4.

func (Float32x8) AsUint8x32 ¶

func (from Float32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Float32x8 to Uint8x32.

func (Float32x8) Ceil ¶

func (x Float32x8) Ceil() Float32x8

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) CeilScaled ¶

func (x Float32x8) CeilScaled(prec uint8) Float32x8

CeilScaled rounds elements up with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) CeilScaledResidue ¶

func (x Float32x8) CeilScaledResidue(prec uint8) Float32x8

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) Compress ¶

func (x Float32x8) Compress(mask Mask32x8) Float32x8

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VCOMPRESSPS, CPU Feature: AVX512

func (Float32x8) ConcatPermute ¶

func (x Float32x8) ConcatPermute(y Float32x8, indices Uint32x8) Float32x8

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2PS, CPU Feature: AVX512

func (Float32x8) ConvertToFloat64 ¶

func (x Float32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTPS2PD, CPU Feature: AVX512

func (Float32x8) ConvertToInt32 ¶

func (x Float32x8) ConvertToInt32() Int32x8

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2DQ, CPU Feature: AVX

func (Float32x8) ConvertToInt64 ¶

func (x Float32x8) ConvertToInt64() Int64x8

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2QQ, CPU Feature: AVX512

func (Float32x8) ConvertToUint32 ¶

func (x Float32x8) ConvertToUint32() Uint32x8

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UDQ, CPU Feature: AVX512

func (Float32x8) ConvertToUint64 ¶

func (x Float32x8) ConvertToUint64() Uint64x8

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPS2UQQ, CPU Feature: AVX512

func (Float32x8) Div ¶

func (x Float32x8) Div(y Float32x8) Float32x8

Div divides elements of two vectors.

Asm: VDIVPS, CPU Feature: AVX

func (Float32x8) Equal ¶

func (x Float32x8) Equal(y Float32x8) Mask32x8

Equal returns x equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Expand ¶

func (x Float32x8) Expand(mask Mask32x8) Float32x8

Expand distributes the elements of x, which are expected to be packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.

Asm: VEXPANDPS, CPU Feature: AVX512

func (Float32x8) Floor ¶

func (x Float32x8) Floor() Float32x8

Floor rounds elements down to the nearest integer.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) FloorScaled ¶

func (x Float32x8) FloorScaled(prec uint8) Float32x8

FloorScaled rounds elements down with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) FloorScaledResidue ¶

func (x Float32x8) FloorScaledResidue(prec uint8) Float32x8

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) GetHi ¶

func (x Float32x8) GetHi() Float32x4

GetHi returns the upper half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float32x8) GetLo ¶

func (x Float32x8) GetLo() Float32x4

GetLo returns the lower half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float32x8) Greater ¶

func (x Float32x8) Greater(y Float32x8) Mask32x8

Greater returns x greater-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) GreaterEqual ¶

func (x Float32x8) GreaterEqual(y Float32x8) Mask32x8

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) IsNan ¶

func (x Float32x8) IsNan(y Float32x8) Mask32x8

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Len ¶

func (x Float32x8) Len() int

Len returns the number of elements in a Float32x8

func (Float32x8) Less ¶

func (x Float32x8) Less(y Float32x8) Mask32x8

Less returns x less-than y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) LessEqual ¶

func (x Float32x8) LessEqual(y Float32x8) Mask32x8

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Masked ¶

func (x Float32x8) Masked(mask Mask32x8) Float32x8

Masked returns x but with elements zeroed where mask is false.

func (Float32x8) Max ¶

func (x Float32x8) Max(y Float32x8) Float32x8

Max computes the maximum of corresponding elements.

Asm: VMAXPS, CPU Feature: AVX

func (Float32x8) Merge ¶

func (x Float32x8) Merge(y Float32x8, mask Mask32x8) Float32x8

Merge returns x but with elements set to y where mask is false.

func (Float32x8) Min ¶

func (x Float32x8) Min(y Float32x8) Float32x8

Min computes the minimum of corresponding elements.

Asm: VMINPS, CPU Feature: AVX

func (Float32x8) Mul ¶

func (x Float32x8) Mul(y Float32x8) Float32x8

Mul multiplies corresponding elements of two vectors.

Asm: VMULPS, CPU Feature: AVX

func (Float32x8) MulAdd ¶

func (x Float32x8) MulAdd(y Float32x8, z Float32x8) Float32x8

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PS, CPU Feature: AVX512

func (Float32x8) MulAddSub ¶

func (x Float32x8) MulAddSub(y Float32x8, z Float32x8) Float32x8

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PS, CPU Feature: AVX512

func (Float32x8) MulSubAdd ¶

func (x Float32x8) MulSubAdd(y Float32x8, z Float32x8) Float32x8

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PS, CPU Feature: AVX512

func (Float32x8) NotEqual ¶

func (x Float32x8) NotEqual(y Float32x8) Mask32x8

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPS, CPU Feature: AVX

func (Float32x8) Permute ¶

func (x Float32x8) Permute(indices Uint32x8) Float32x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMPS, CPU Feature: AVX2
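The indexing rule, including the masking to the low 3 bits, can be sketched in scalar Go. permute8 is a hypothetical helper on plain arrays and does not call archsimd.

```go
package main

import "fmt"

// permute8 is a hypothetical scalar model of Float32x8.Permute.
func permute8(x [8]float32, indices [8]uint32) [8]float32 {
	var r [8]float32
	for i, idx := range indices {
		r[i] = x[idx&7] // only the low 3 bits select an element
	}
	return r
}

func main() {
	x := [8]float32{0, 1, 2, 3, 4, 5, 6, 7}
	reverse := [8]uint32{7, 6, 5, 4, 3, 2, 1, 0}
	fmt.Println(permute8(x, reverse)) // [7 6 5 4 3 2 1 0]
}
```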

func (Float32x8) Reciprocal ¶

func (x Float32x8) Reciprocal() Float32x8

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCPPS, CPU Feature: AVX

func (Float32x8) ReciprocalSqrt ¶

func (x Float32x8) ReciprocalSqrt() Float32x8

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRTPS, CPU Feature: AVX

func (Float32x8) RoundToEven ¶

func (x Float32x8) RoundToEven() Float32x8

RoundToEven rounds elements to the nearest integer, rounding ties to even.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) RoundToEvenScaled ¶

func (x Float32x8) RoundToEvenScaled(prec uint8) Float32x8

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) RoundToEvenScaledResidue ¶

func (x Float32x8) RoundToEvenScaledResidue(prec uint8) Float32x8

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

func (Float32x8) Scale ¶

func (x Float32x8) Scale(y Float32x8) Float32x8

Scale multiplies elements by a power of 2.

Asm: VSCALEFPS, CPU Feature: AVX512

func (Float32x8) Select128FromPair ¶

func (x Float32x8) Select128FromPair(lo, hi uint8, y Float32x8) Float32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2F128, CPU Feature: AVX
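The documented example can be reproduced with a scalar sketch. select128FromPair8 is a hypothetical helper on plain arrays; it numbers the four 128-bit groups as x low, x high, y low, y high, and it panics on an out-of-range index as ordinary Go array indexing does.

```go
package main

import "fmt"

// select128FromPair8 is a hypothetical scalar model of
// Float32x8.Select128FromPair.
func select128FromPair8(x [8]float32, lo, hi uint8, y [8]float32) [8]float32 {
	// Groups 0 and 1 are the halves of x; groups 2 and 3 are the halves of y.
	groups := [4][4]float32{
		{x[0], x[1], x[2], x[3]},
		{x[4], x[5], x[6], x[7]},
		{y[0], y[1], y[2], y[3]},
		{y[4], y[5], y[6], y[7]},
	}
	var r [8]float32
	copy(r[:4], groups[lo][:])
	copy(r[4:], groups[hi][:])
	return r
}

func main() {
	x := [8]float32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]float32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair8(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```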

func (Float32x8) SelectFromPairGrouped ¶

func (x Float32x8) SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify elements 0-3 of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,

{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
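The per-half selection can be checked against the documented example with a scalar sketch. selectFromPairGrouped8 is a hypothetical helper on plain arrays; it assumes every selector is in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGrouped8 is a hypothetical scalar model of
// Float32x8.SelectFromPairGrouped.
func selectFromPairGrouped8(x [8]float32, a, b, c, d uint8, y [8]float32) [8]float32 {
	var r [8]float32
	for base := 0; base < 8; base += 4 { // one pass per 128-bit half
		pick := func(sel uint8) float32 {
			if sel < 4 {
				return x[base+int(sel)]
			}
			return y[base+int(sel-4)]
		}
		r[base], r[base+1], r[base+2], r[base+3] = pick(a), pick(b), pick(c), pick(d)
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]float32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped8(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```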

func (Float32x8) SetHi ¶

func (x Float32x8) SetHi(y Float32x4) Float32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float32x8) SetLo ¶

func (x Float32x8) SetLo(y Float32x4) Float32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float32x8) Sqrt ¶

func (x Float32x8) Sqrt() Float32x8

Sqrt computes the square root of each element.

Asm: VSQRTPS, CPU Feature: AVX

func (Float32x8) Store ¶

func (x Float32x8) Store(y *[8]float32)

Store stores a Float32x8 to an array

func (Float32x8) StoreMasked ¶

func (x Float32x8) StoreMasked(y *[8]float32, mask Mask32x8)

StoreMasked stores a Float32x8 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Float32x8) StoreSlice ¶

func (x Float32x8) StoreSlice(s []float32)

StoreSlice stores x into a slice of at least 8 float32s

func (Float32x8) StoreSlicePart ¶

func (x Float32x8) StoreSlicePart(s []float32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Float32x8) String ¶

func (x Float32x8) String() string

String returns a string representation of SIMD vector x

func (Float32x8) Sub ¶

func (x Float32x8) Sub(y Float32x8) Float32x8

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPS, CPU Feature: AVX

func (Float32x8) SubPairs ¶

func (x Float32x8) SubPairs(y Float32x8) Float32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPS, CPU Feature: AVX

func (Float32x8) Trunc ¶

func (x Float32x8) Trunc() Float32x8

Trunc truncates elements towards zero.

Asm: VROUNDPS, CPU Feature: AVX

func (Float32x8) TruncScaled ¶

func (x Float32x8) TruncScaled(prec uint8) Float32x8

TruncScaled truncates elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPS, CPU Feature: AVX512

func (Float32x8) TruncScaledResidue ¶

func (x Float32x8) TruncScaledResidue(prec uint8) Float32x8

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPS, CPU Feature: AVX512

type Float64x2 ¶

type Float64x2 struct {
	// contains filtered or unexported fields
}

Float64x2 is a 128-bit SIMD vector of 2 float64

func BroadcastFloat64x2 ¶

func BroadcastFloat64x2(x float64) Float64x2

BroadcastFloat64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat64x2 ¶

func LoadFloat64x2(y *[2]float64) Float64x2

LoadFloat64x2 loads a Float64x2 from an array

func LoadFloat64x2Slice ¶

func LoadFloat64x2Slice(s []float64) Float64x2

LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s

func LoadFloat64x2SlicePart ¶

func LoadFloat64x2SlicePart(s []float64) Float64x2

LoadFloat64x2SlicePart loads a Float64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.

func LoadMaskedFloat64x2 ¶

func LoadMaskedFloat64x2(y *[2]float64, mask Mask64x2) Float64x2

LoadMaskedFloat64x2 loads a Float64x2 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x2) Add ¶

func (x Float64x2) Add(y Float64x2) Float64x2

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX

func (Float64x2) AddPairs ¶

func (x Float64x2) AddPairs(y Float64x2) Float64x2

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPD, CPU Feature: AVX

func (Float64x2) AddSub ¶

func (x Float64x2) AddSub(y Float64x2) Float64x2

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPD, CPU Feature: AVX

func (Float64x2) AsFloat32x4 ¶

func (from Float64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Float64x2 to Float32x4.

func (Float64x2) AsInt16x8 ¶

func (from Float64x2) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Float64x2 to Int16x8.

func (Float64x2) AsInt32x4 ¶

func (from Float64x2) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Float64x2 to Int32x4.

func (Float64x2) AsInt64x2 ¶

func (from Float64x2) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Float64x2 to Int64x2.

func (Float64x2) AsInt8x16 ¶

func (from Float64x2) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Float64x2 to Int8x16.

func (Float64x2) AsUint16x8 ¶

func (from Float64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Float64x2 to Uint16x8.

func (Float64x2) AsUint32x4 ¶

func (from Float64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Float64x2 to Uint32x4.

func (Float64x2) AsUint64x2 ¶

func (from Float64x2) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Float64x2 to Uint64x2.

func (Float64x2) AsUint8x16 ¶

func (from Float64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Float64x2 to Uint8x16.

func (Float64x2) Broadcast128 ¶

func (x Float64x2) Broadcast128() Float64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Float64x2) Broadcast256 ¶

func (x Float64x2) Broadcast256() Float64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VBROADCASTSD, CPU Feature: AVX2

func (Float64x2) Broadcast512 ¶

func (x Float64x2) Broadcast512() Float64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VBROADCASTSD, CPU Feature: AVX512

func (Float64x2) Ceil ¶

func (x Float64x2) Ceil() Float64x2

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) CeilScaled ¶

func (x Float64x2) CeilScaled(prec uint8) Float64x2

CeilScaled rounds elements up with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) CeilScaledResidue ¶

func (x Float64x2) CeilScaledResidue(prec uint8) Float64x2

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512
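
The relationship between CeilScaled and CeilScaledResidue can be sketched per element in scalar Go. This is a model under assumptions, not the package's implementation: it assumes prec is the number of binary fraction bits kept (as in the immediate of VRNDSCALEPD) and that the residue is the input minus the rounded value. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled is a hypothetical scalar model of one element of CeilScaled.
// Assumption: prec is the number of binary fraction bits to keep, so the
// result is ceil(x * 2^prec) / 2^prec.
func ceilScaled(x float64, prec uint8) float64 {
	scale := math.Ldexp(1, int(prec)) // 2^prec
	return math.Ceil(x*scale) / scale
}

// ceilScaledResidue models one element of CeilScaledResidue under the same
// assumption: the input minus its scaled ceiling.
func ceilScaledResidue(x float64, prec uint8) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.25, 1))        // 1.5
	fmt.Println(ceilScaledResidue(1.25, 1)) // -0.25
}
```

With prec equal to 0 the model reduces to an ordinary ceiling, so ceilScaled(1.25, 0) is 2.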

func (Float64x2) Compress ¶

func (x Float64x2) Compress(mask Mask64x2) Float64x2

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512

func (Float64x2) ConcatPermute ¶

func (x Float64x2) ConcatPermute(y Float64x2, indices Uint64x2) Float64x2

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where n is the number of elements in the result and xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.

Asm: VPERMI2PD, CPU Feature: AVX512
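
For a two-element vector, the concatenation xy has four elements, so only the low 2 bits of each index matter. The scalar sketch below models that rule in plain Go; concatPermute2 is a hypothetical helper and does not use archsimd.

```go
package main

import "fmt"

// concatPermute2 is a hypothetical scalar model of Float64x2.ConcatPermute.
// xy is x (lower half) followed by y (upper half); with 4 elements in xy,
// only the low 2 bits of each index are used.
func concatPermute2(x, y [2]float64, indices [2]uint64) [2]float64 {
	xy := [4]float64{x[0], x[1], y[0], y[1]}
	var r [2]float64
	for i, idx := range indices {
		r[i] = xy[idx&3]
	}
	return r
}

func main() {
	x := [2]float64{10, 11}
	y := [2]float64{20, 21}
	fmt.Println(concatPermute2(x, y, [2]uint64{3, 0})) // [21 10]
	fmt.Println(concatPermute2(x, y, [2]uint64{6, 1})) // [20 11], since 6&3 == 2
}
```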

func (Float64x2) ConvertToFloat32 ¶

func (x Float64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PSX, CPU Feature: AVX

func (Float64x2) ConvertToInt32 ¶

func (x Float64x2) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQX, CPU Feature: AVX

func (Float64x2) ConvertToInt64 ¶

func (x Float64x2) ConvertToInt64() Int64x2

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512
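
For values that fit in the destination type, the truncating behavior matches Go's own float-to-integer conversion. The scalar sketch below illustrates this with a hypothetical helper, convertToInt64; it does not use archsimd and does not model out-of-range inputs, whose results are architecture-specific.

```go
package main

import "fmt"

// convertToInt64 is a hypothetical scalar model of Float64x2.ConvertToInt64
// for in-range inputs. Go's float-to-integer conversion also truncates
// toward zero, so it matches the documented rounding.
func convertToInt64(x [2]float64) [2]int64 {
	return [2]int64{int64(x[0]), int64(x[1])}
}

func main() {
	fmt.Println(convertToInt64([2]float64{2.9, -2.9})) // [2 -2]
}
```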

func (Float64x2) ConvertToUint32 ¶

func (x Float64x2) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQX, CPU Feature: AVX512

func (Float64x2) ConvertToUint64 ¶

func (x Float64x2) ConvertToUint64() Uint64x2

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x2) Div ¶

func (x Float64x2) Div(y Float64x2) Float64x2

Div divides the elements of x by the corresponding elements of y.

Asm: VDIVPD, CPU Feature: AVX

func (Float64x2) Equal ¶

func (x Float64x2) Equal(y Float64x2) Mask64x2

Equal returns x equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Expand ¶

func (x Float64x2) Expand(mask Mask64x2) Float64x2

Expand performs an expansion on a vector x whose elements are packed into the lower-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512

func (Float64x2) Floor ¶

func (x Float64x2) Floor() Float64x2

Floor rounds elements down to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) FloorScaled ¶

func (x Float64x2) FloorScaled(prec uint8) Float64x2

FloorScaled rounds elements down with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) FloorScaledResidue ¶

func (x Float64x2) FloorScaledResidue(prec uint8) Float64x2

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x2) GetElem ¶

func (x Float64x2) GetElem(index uint8) float64

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Float64x2) Greater ¶

func (x Float64x2) Greater(y Float64x2) Mask64x2

Greater returns x greater-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) GreaterEqual ¶

func (x Float64x2) GreaterEqual(y Float64x2) Mask64x2

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) IsNan ¶

func (x Float64x2) IsNan(y Float64x2) Mask64x2

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX
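
The advice to call x.IsNan(x) makes sense if the method is a two-operand unordered comparison, which is an assumption here based on the VCMPPD instruction it maps to: an element of the result is set when either operand is NaN, so passing x twice tests x alone. A scalar sketch with a hypothetical helper, isNan2:

```go
package main

import (
	"fmt"
	"math"
)

// isNan2 is a hypothetical scalar model of Float64x2.IsNan.
// Assumption: each element is an unordered comparison, true when either
// operand of that element is NaN.
func isNan2(x, y [2]float64) [2]bool {
	var m [2]bool
	for i := range x {
		m[i] = math.IsNaN(x[i]) || math.IsNaN(y[i])
	}
	return m
}

func main() {
	x := [2]float64{1, math.NaN()}
	fmt.Println(isNan2(x, x)) // [false true]
}
```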

func (Float64x2) Len ¶

func (x Float64x2) Len() int

Len returns the number of elements in a Float64x2

func (Float64x2) Less ¶

func (x Float64x2) Less(y Float64x2) Mask64x2

Less returns x less-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) LessEqual ¶

func (x Float64x2) LessEqual(y Float64x2) Mask64x2

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Masked ¶

func (x Float64x2) Masked(mask Mask64x2) Float64x2

Masked returns x but with elements zeroed where mask is false.

func (Float64x2) Max ¶

func (x Float64x2) Max(y Float64x2) Float64x2

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX

func (Float64x2) Merge ¶

func (x Float64x2) Merge(y Float64x2, mask Mask64x2) Float64x2

Merge returns x but with elements set to y where mask is false.
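
Masked and Merge differ only in what fills the elements where mask is false: zero for Masked, the corresponding element of y for Merge. The scalar sketch below models both in plain Go, representing a mask as a [2]bool; the helper names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// masked2 models Float64x2.Masked: elements where mask is false become zero.
func masked2(x [2]float64, mask [2]bool) [2]float64 {
	var r [2]float64
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge2 models Float64x2.Merge: elements where mask is false come from y.
func merge2(x, y [2]float64, mask [2]bool) [2]float64 {
	r := x
	for i := range r {
		if !mask[i] {
			r[i] = y[i]
		}
	}
	return r
}

func main() {
	x := [2]float64{1, 2}
	y := [2]float64{10, 20}
	mask := [2]bool{true, false}
	fmt.Println(masked2(x, mask))   // [1 0]
	fmt.Println(merge2(x, y, mask)) // [1 20]
}
```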

func (Float64x2) Min ¶

func (x Float64x2) Min(y Float64x2) Float64x2

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX

func (Float64x2) Mul ¶

func (x Float64x2) Mul(y Float64x2) Float64x2

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX

func (Float64x2) MulAdd ¶

func (x Float64x2) MulAdd(y Float64x2, z Float64x2) Float64x2

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512
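
A fused multiply-add rounds once, after the addition, rather than once after the multiplication and again after the addition. In scalar Go the same operation is available as math.FMA, so an element-by-element model can be sketched as follows; mulAdd2 is a hypothetical helper and does not use archsimd.

```go
package main

import (
	"fmt"
	"math"
)

// mulAdd2 is a hypothetical scalar model of Float64x2.MulAdd. math.FMA
// computes x*y + z with a single rounding, like a fused hardware operation.
func mulAdd2(x, y, z [2]float64) [2]float64 {
	var r [2]float64
	for i := range r {
		r[i] = math.FMA(x[i], y[i], z[i])
	}
	return r
}

func main() {
	fmt.Println(mulAdd2([2]float64{2, 3}, [2]float64{4, 5}, [2]float64{1, 1})) // [9 16]
}
```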

func (Float64x2) MulAddSub ¶

func (x Float64x2) MulAddSub(y Float64x2, z Float64x2) Float64x2

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x2) MulSubAdd ¶

func (x Float64x2) MulSubAdd(y Float64x2, z Float64x2) Float64x2

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x2) NotEqual ¶

func (x Float64x2) NotEqual(y Float64x2) Mask64x2

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x2) Reciprocal ¶

func (x Float64x2) Reciprocal() Float64x2

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x2) ReciprocalSqrt ¶

func (x Float64x2) ReciprocalSqrt() Float64x2

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x2) RoundToEven ¶

func (x Float64x2) RoundToEven() Float64x2

RoundToEven rounds elements to the nearest integer, rounding ties to even.

Asm: VROUNDPD, CPU Feature: AVX
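
Rounding to even affects only values exactly halfway between two integers. The scalar function math.RoundToEven has the same tie-breaking rule, so it can serve as a per-element model; roundToEven2 below is a hypothetical helper and does not use archsimd.

```go
package main

import (
	"fmt"
	"math"
)

// roundToEven2 is a hypothetical scalar model of Float64x2.RoundToEven.
func roundToEven2(x [2]float64) [2]float64 {
	return [2]float64{math.RoundToEven(x[0]), math.RoundToEven(x[1])}
}

func main() {
	fmt.Println(roundToEven2([2]float64{2.5, 3.5})) // [2 4]
	fmt.Println(roundToEven2([2]float64{0.5, 1.5})) // [0 2]
}
```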

func (Float64x2) RoundToEvenScaled ¶

func (x Float64x2) RoundToEvenScaled(prec uint8) Float64x2

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) RoundToEvenScaledResidue ¶

func (x Float64x2) RoundToEvenScaledResidue(prec uint8) Float64x2

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x2) Scale ¶

func (x Float64x2) Scale(y Float64x2) Float64x2

Scale multiplies each element of x by 2 raised to the floor of the corresponding element of y.

Asm: VSCALEFPD, CPU Feature: AVX512
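
A scalar sketch of Scale, assuming it follows the VSCALEFPD instruction, which computes x * 2^floor(y) per element. The helper scale2 is hypothetical, does not use archsimd, and ignores special cases such as NaN, infinities, and denormal handling.

```go
package main

import (
	"fmt"
	"math"
)

// scale2 is a hypothetical scalar model of Float64x2.Scale.
// Assumption: each element computes x * 2^floor(y), as VSCALEFPD does
// for ordinary finite inputs.
func scale2(x, y [2]float64) [2]float64 {
	var r [2]float64
	for i := range r {
		r[i] = math.Ldexp(x[i], int(math.Floor(y[i])))
	}
	return r
}

func main() {
	fmt.Println(scale2([2]float64{3, 1}, [2]float64{2, -1.5})) // [12 0.25]
}
```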

func (Float64x2) SelectFromPair ¶

func (x Float64x2) SelectFromPair(a, b uint8, y Float64x2) Float64x2

SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX
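
The selector numbering can be read as indexing into the four-element sequence formed by x followed by y. The scalar sketch below models that reading in plain Go; selectFromPair2 is a hypothetical helper, does not use archsimd, and does not model selectors outside the range 0-3.

```go
package main

import "fmt"

// selectFromPair2 is a hypothetical scalar model of Float64x2.SelectFromPair:
// selectors 0-1 pick from x and selectors 2-3 pick elements 0-1 of y.
func selectFromPair2(x [2]float64, a, b uint8, y [2]float64) [2]float64 {
	xy := [4]float64{x[0], x[1], y[0], y[1]}
	return [2]float64{xy[a], xy[b]}
}

func main() {
	x := [2]float64{10, 11}
	y := [2]float64{20, 21}
	fmt.Println(selectFromPair2(x, 1, 2, y)) // [11 20]
	fmt.Println(selectFromPair2(x, 3, 3, y)) // [21 21]
}
```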

func (Float64x2) SetElem ¶

func (x Float64x2) SetElem(index uint8, y float64) Float64x2

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Float64x2) Sqrt ¶

func (x Float64x2) Sqrt() Float64x2

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX

func (Float64x2) Store ¶

func (x Float64x2) Store(y *[2]float64)

Store stores a Float64x2 to an array

func (Float64x2) StoreMasked ¶

func (x Float64x2) StoreMasked(y *[2]float64, mask Mask64x2)

StoreMasked stores a Float64x2 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x2) StoreSlice ¶

func (x Float64x2) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 2 float64s

func (Float64x2) StoreSlicePart ¶

func (x Float64x2) StoreSlicePart(s []float64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
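
The "as many elements as will fit" rule is the same one Go's built-in copy follows, so a scalar model is a single call. The helper storeSlicePart2 below is hypothetical and does not use archsimd.

```go
package main

import "fmt"

// storeSlicePart2 is a hypothetical scalar model of Float64x2.StoreSlicePart.
// copy stops at the shorter of the two lengths, so a short slice receives
// only the leading elements and a long slice keeps its tail.
func storeSlicePart2(x [2]float64, s []float64) {
	copy(s, x[:])
}

func main() {
	x := [2]float64{1, 2}

	short := make([]float64, 1)
	storeSlicePart2(x, short)
	fmt.Println(short) // [1]

	long := []float64{9, 9, 9}
	storeSlicePart2(x, long)
	fmt.Println(long) // [1 2 9]
}
```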

func (Float64x2) String ¶

func (x Float64x2) String() string

String returns a string representation of SIMD vector x

func (Float64x2) Sub ¶

func (x Float64x2) Sub(y Float64x2) Float64x2

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX

func (Float64x2) SubPairs ¶

func (x Float64x2) SubPairs(y Float64x2) Float64x2

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPD, CPU Feature: AVX

func (Float64x2) Trunc ¶

func (x Float64x2) Trunc() Float64x2

Trunc truncates elements towards zero.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x2) TruncScaled ¶

func (x Float64x2) TruncScaled(prec uint8) Float64x2

TruncScaled truncates elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x2) TruncScaledResidue ¶

func (x Float64x2) TruncScaledResidue(prec uint8) Float64x2

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Float64x4 ¶

type Float64x4 struct {
	// contains filtered or unexported fields
}

Float64x4 is a 256-bit SIMD vector of 4 float64

func BroadcastFloat64x4 ¶

func BroadcastFloat64x4(x float64) Float64x4

BroadcastFloat64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadFloat64x4 ¶

func LoadFloat64x4(y *[4]float64) Float64x4

LoadFloat64x4 loads a Float64x4 from an array

func LoadFloat64x4Slice ¶

func LoadFloat64x4Slice(s []float64) Float64x4

LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s

func LoadFloat64x4SlicePart ¶

func LoadFloat64x4SlicePart(s []float64) Float64x4

LoadFloat64x4SlicePart loads a Float64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.
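
Zero filling is what makes a partial load convenient for the last block of a loop: for an operation such as addition, the padding elements contribute nothing. The scalar sketch below models the load and uses it in a four-elements-at-a-time sum. Both helpers are hypothetical and do not use archsimd.

```go
package main

import "fmt"

// loadSlicePart4 is a hypothetical scalar model of LoadFloat64x4SlicePart:
// elements beyond len(s) are left as zero.
func loadSlicePart4(s []float64) [4]float64 {
	var v [4]float64
	copy(v[:], s) // copies at most 4 elements
	return v
}

// sum adds the elements of s four at a time; the zero fill makes the
// final partial block harmless for addition.
func sum(s []float64) float64 {
	var acc [4]float64
	for len(s) > 0 {
		v := loadSlicePart4(s)
		for i := range acc {
			acc[i] += v[i]
		}
		n := len(s)
		if n > 4 {
			n = 4
		}
		s = s[n:]
	}
	return acc[0] + acc[1] + acc[2] + acc[3]
}

func main() {
	fmt.Println(loadSlicePart4([]float64{1, 2, 3})) // [1 2 3 0]
	fmt.Println(sum([]float64{1, 2, 3, 4, 5, 6}))   // 21
}
```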

func LoadMaskedFloat64x4 ¶

func LoadMaskedFloat64x4(y *[4]float64, mask Mask64x4) Float64x4

LoadMaskedFloat64x4 loads a Float64x4 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x4) Add ¶

func (x Float64x4) Add(y Float64x4) Float64x4

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX

func (Float64x4) AddPairs ¶

func (x Float64x4) AddPairs(y Float64x4) Float64x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VHADDPD, CPU Feature: AVX

func (Float64x4) AddSub ¶

func (x Float64x4) AddSub(y Float64x4) Float64x4

AddSub subtracts even elements and adds odd elements of two vectors.

Asm: VADDSUBPD, CPU Feature: AVX

func (Float64x4) AsFloat32x8 ¶

func (from Float64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Float64x4 to Float32x8.

func (Float64x4) AsInt16x16 ¶

func (from Float64x4) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Float64x4 to Int16x16.

func (Float64x4) AsInt32x8 ¶

func (from Float64x4) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Float64x4 to Int32x8.

func (Float64x4) AsInt64x4 ¶

func (from Float64x4) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Float64x4 to Int64x4.

func (Float64x4) AsInt8x32 ¶

func (from Float64x4) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Float64x4 to Int8x32.

func (Float64x4) AsUint16x16 ¶

func (from Float64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Float64x4 to Uint16x16.

func (Float64x4) AsUint32x8 ¶

func (from Float64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Float64x4 to Uint32x8.

func (Float64x4) AsUint64x4 ¶

func (from Float64x4) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Float64x4 to Uint64x4.

func (Float64x4) AsUint8x32 ¶

func (from Float64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Float64x4 to Uint8x32.

func (Float64x4) Ceil ¶

func (x Float64x4) Ceil() Float64x4

Ceil rounds elements up to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) CeilScaled ¶

func (x Float64x4) CeilScaled(prec uint8) Float64x4

CeilScaled rounds elements up with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) CeilScaledResidue ¶

func (x Float64x4) CeilScaledResidue(prec uint8) Float64x4

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x4) Compress ¶

func (x Float64x4) Compress(mask Mask64x4) Float64x4

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512
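
Compression can be modeled in scalar Go as a filter that keeps the selected elements in their original order. In the sketch below a mask is represented as a [4]bool and compress4 is a hypothetical helper that does not use archsimd. Zero-filling the unused upper elements is an assumption of the model; the description above does not specify them.

```go
package main

import "fmt"

// compress4 is a hypothetical scalar model of Float64x4.Compress: elements
// whose mask entry is true are packed, in order, into the low end of the
// result. Assumption: the remaining upper elements are zero.
func compress4(x [4]float64, mask [4]bool) [4]float64 {
	var r [4]float64
	j := 0
	for i := range x {
		if mask[i] {
			r[j] = x[i]
			j++
		}
	}
	return r
}

func main() {
	x := [4]float64{10, 11, 12, 13}
	mask := [4]bool{false, true, false, true}
	fmt.Println(compress4(x, mask)) // [11 13 0 0]
}
```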

func (Float64x4) ConcatPermute ¶

func (x Float64x4) ConcatPermute(y Float64x4, indices Uint64x4) Float64x4

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where n is the number of elements in the result and xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.

Asm: VPERMI2PD, CPU Feature: AVX512

func (Float64x4) ConvertToFloat32 ¶

func (x Float64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PSY, CPU Feature: AVX

func (Float64x4) ConvertToInt32 ¶

func (x Float64x4) ConvertToInt32() Int32x4

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQY, CPU Feature: AVX

func (Float64x4) ConvertToInt64 ¶

func (x Float64x4) ConvertToInt64() Int64x4

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512

func (Float64x4) ConvertToUint32 ¶

func (x Float64x4) ConvertToUint32() Uint32x4

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQY, CPU Feature: AVX512

func (Float64x4) ConvertToUint64 ¶

func (x Float64x4) ConvertToUint64() Uint64x4

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x4) Div ¶

func (x Float64x4) Div(y Float64x4) Float64x4

Div divides the elements of x by the corresponding elements of y.

Asm: VDIVPD, CPU Feature: AVX

func (Float64x4) Equal ¶

func (x Float64x4) Equal(y Float64x4) Mask64x4

Equal returns x equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Expand ¶

func (x Float64x4) Expand(mask Mask64x4) Float64x4

Expand performs an expansion on a vector x whose elements are packed into the lower-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512
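
Expansion is the inverse of compression: consecutive low elements of x are placed at the positions where mask is true. The scalar sketch below represents a mask as a [4]bool; expand4 is a hypothetical helper that does not use archsimd. Zero-filling the unselected positions is an assumption of the model, since the description above does not specify them.

```go
package main

import "fmt"

// expand4 is a hypothetical scalar model of Float64x4.Expand: the low
// elements of x are consumed in order and written to the positions whose
// mask entry is true. Assumption: unselected positions are zero.
func expand4(x [4]float64, mask [4]bool) [4]float64 {
	var r [4]float64
	j := 0
	for i := range r {
		if mask[i] {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	packed := [4]float64{11, 13, 0, 0}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expand4(packed, mask)) // [0 11 0 13]
}
```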

func (Float64x4) Floor ¶

func (x Float64x4) Floor() Float64x4

Floor rounds elements down to the nearest integer.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) FloorScaled ¶

func (x Float64x4) FloorScaled(prec uint8) Float64x4

FloorScaled rounds elements down with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) FloorScaledResidue ¶

func (x Float64x4) FloorScaledResidue(prec uint8) Float64x4

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x4) GetHi ¶

func (x Float64x4) GetHi() Float64x2

GetHi returns the upper half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float64x4) GetLo ¶

func (x Float64x4) GetLo() Float64x2

GetLo returns the lower half of x.

Asm: VEXTRACTF128, CPU Feature: AVX

func (Float64x4) Greater ¶

func (x Float64x4) Greater(y Float64x4) Mask64x4

Greater returns x greater-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) GreaterEqual ¶

func (x Float64x4) GreaterEqual(y Float64x4) Mask64x4

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) IsNan ¶

func (x Float64x4) IsNan(y Float64x4) Mask64x4

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Len ¶

func (x Float64x4) Len() int

Len returns the number of elements in a Float64x4

func (Float64x4) Less ¶

func (x Float64x4) Less(y Float64x4) Mask64x4

Less returns x less-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) LessEqual ¶

func (x Float64x4) LessEqual(y Float64x4) Mask64x4

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Masked ¶

func (x Float64x4) Masked(mask Mask64x4) Float64x4

Masked returns x but with elements zeroed where mask is false.

func (Float64x4) Max ¶

func (x Float64x4) Max(y Float64x4) Float64x4

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX

func (Float64x4) Merge ¶

func (x Float64x4) Merge(y Float64x4, mask Mask64x4) Float64x4

Merge returns x but with elements set to y where mask is false.

func (Float64x4) Min ¶

func (x Float64x4) Min(y Float64x4) Float64x4

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX

func (Float64x4) Mul ¶

func (x Float64x4) Mul(y Float64x4) Float64x4

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX

func (Float64x4) MulAdd ¶

func (x Float64x4) MulAdd(y Float64x4, z Float64x4) Float64x4

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512

func (Float64x4) MulAddSub ¶

func (x Float64x4) MulAddSub(y Float64x4, z Float64x4) Float64x4

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x4) MulSubAdd ¶

func (x Float64x4) MulSubAdd(y Float64x4, z Float64x4) Float64x4

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x4) NotEqual ¶

func (x Float64x4) NotEqual(y Float64x4) Mask64x4

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX

func (Float64x4) Permute ¶

func (x Float64x4) Permute(indices Uint64x4) Float64x4

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements in x. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMPD, CPU Feature: AVX512
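
A scalar sketch of the permutation rule for a four-element vector, including the masking of each index to its low 2 bits. The helper permute4 is hypothetical and does not use archsimd.

```go
package main

import "fmt"

// permute4 is a hypothetical scalar model of Float64x4.Permute: element i of
// the result is x[indices[i]], using only the low 2 bits of each index.
func permute4(x [4]float64, indices [4]uint64) [4]float64 {
	var r [4]float64
	for i, idx := range indices {
		r[i] = x[idx&3]
	}
	return r
}

func main() {
	x := [4]float64{10, 11, 12, 13}
	fmt.Println(permute4(x, [4]uint64{3, 3, 0, 5})) // [13 13 10 11], since 5&3 == 1
}
```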

func (Float64x4) Reciprocal ¶

func (x Float64x4) Reciprocal() Float64x4

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x4) ReciprocalSqrt ¶

func (x Float64x4) ReciprocalSqrt() Float64x4

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x4) RoundToEven ¶

func (x Float64x4) RoundToEven() Float64x4

RoundToEven rounds elements to the nearest integer, rounding ties to even.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) RoundToEvenScaled ¶

func (x Float64x4) RoundToEvenScaled(prec uint8) Float64x4

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) RoundToEvenScaledResidue ¶

func (x Float64x4) RoundToEvenScaledResidue(prec uint8) Float64x4

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x4) Scale ¶

func (x Float64x4) Scale(y Float64x4) Float64x4

Scale multiplies each element of x by 2 raised to the floor of the corresponding element of y.

Asm: VSCALEFPD, CPU Feature: AVX512

func (Float64x4) Select128FromPair ¶

func (x Float64x4) Select128FromPair(lo, hi uint8, y Float64x4) Float64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. Both should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2F128, CPU Feature: AVX
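
The documented example can be reproduced with a scalar model that numbers the four 128-bit elements 0 through 3: the low and high halves of x, then the low and high halves of y. The helper select128FromPair below is hypothetical, does not use archsimd, and does not model selectors outside 0-3.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of
// Float64x4.Select128FromPair. The 128-bit elements are numbered
// 0: x low, 1: x high, 2: y low, 3: y high.
func select128FromPair(x [4]float64, lo, hi uint8, y [4]float64) [4]float64 {
	parts := [4][2]float64{
		{x[0], x[1]},
		{x[2], x[3]},
		{y[0], y[1]},
		{y[2], y[3]},
	}
	l, h := parts[lo], parts[hi]
	return [4]float64{l[0], l[1], h[0], h[1]}
}

func main() {
	x := [4]float64{40, 41, 50, 51}
	y := [4]float64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```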

func (Float64x4) SelectFromPairGrouped ¶

func (x Float64x4) SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX
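
"Grouped" here means the same pair of selectors is applied independently to each 128-bit half. A scalar sketch of that reading, with the hypothetical helper selectFromPairGrouped4 (no archsimd, selectors outside 0-3 not modeled):

```go
package main

import "fmt"

// selectFromPairGrouped4 is a hypothetical scalar model of
// Float64x4.SelectFromPairGrouped. Within each 128-bit half, selectors 0-1
// pick from that half of x and selectors 2-3 pick from that half of y.
func selectFromPairGrouped4(x [4]float64, a, b uint8, y [4]float64) [4]float64 {
	var r [4]float64
	for g := 0; g < 4; g += 2 {
		xy := [4]float64{x[g], x[g+1], y[g], y[g+1]}
		r[g], r[g+1] = xy[a], xy[b]
	}
	return r
}

func main() {
	x := [4]float64{10, 11, 12, 13}
	y := [4]float64{20, 21, 22, 23}
	fmt.Println(selectFromPairGrouped4(x, 1, 2, y)) // [11 20 13 22]
}
```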

func (Float64x4) SetHi ¶

func (x Float64x4) SetHi(y Float64x2) Float64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTF128, CPU Feature: AVX

func (Float64x4) SetLo ¶

func (x Float64x4) SetLo(y Float64x2) Float64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTF128, CPU Feature: AVX
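
GetLo, GetHi, SetLo, and SetHi move 128-bit halves between Float64x4 and Float64x2 values. The scalar sketch below models them on arrays, assuming that the lower half holds elements 0-1 and the upper half holds elements 2-3. The helper names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// Hypothetical scalar models of the half accessors on Float64x4.
// Assumption: the lower half is elements 0-1, the upper half elements 2-3.

func getLo(x [4]float64) [2]float64 { return [2]float64{x[0], x[1]} }

func getHi(x [4]float64) [2]float64 { return [2]float64{x[2], x[3]} }

// setLo returns a copy of x with its lower half replaced by y.
func setLo(x [4]float64, y [2]float64) [4]float64 {
	x[0], x[1] = y[0], y[1]
	return x
}

// setHi returns a copy of x with its upper half replaced by y.
func setHi(x [4]float64, y [2]float64) [4]float64 {
	x[2], x[3] = y[0], y[1]
	return x
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	fmt.Println(getLo(x), getHi(x))         // [1 2] [3 4]
	fmt.Println(setHi(x, [2]float64{9, 9})) // [1 2 9 9]
	fmt.Println(setLo(x, getHi(x)))         // [3 4 3 4]
}
```

Because the arrays are passed by value, the setters leave the caller's x unchanged, matching the method signatures above, which return a new vector.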

func (Float64x4) Sqrt ¶

func (x Float64x4) Sqrt() Float64x4

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX

func (Float64x4) Store ¶

func (x Float64x4) Store(y *[4]float64)

Store stores a Float64x4 to an array

func (Float64x4) StoreMasked ¶

func (x Float64x4) StoreMasked(y *[4]float64, mask Mask64x4)

StoreMasked stores a Float64x4 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Float64x4) StoreSlice ¶

func (x Float64x4) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 4 float64s

func (Float64x4) StoreSlicePart ¶

func (x Float64x4) StoreSlicePart(s []float64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Float64x4) String ¶

func (x Float64x4) String() string

String returns a string representation of SIMD vector x

func (Float64x4) Sub ¶

func (x Float64x4) Sub(y Float64x4) Float64x4

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX

func (Float64x4) SubPairs ¶

func (x Float64x4) SubPairs(y Float64x4) Float64x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VHSUBPD, CPU Feature: AVX

func (Float64x4) Trunc ¶

func (x Float64x4) Trunc() Float64x4

Trunc truncates elements towards zero.

Asm: VROUNDPD, CPU Feature: AVX

func (Float64x4) TruncScaled ¶

func (x Float64x4) TruncScaled(prec uint8) Float64x4

TruncScaled truncates elements with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x4) TruncScaledResidue ¶

func (x Float64x4) TruncScaledResidue(prec uint8) Float64x4

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Float64x8 ¶

type Float64x8 struct {
	// contains filtered or unexported fields
}

Float64x8 is a 512-bit SIMD vector of 8 float64

func BroadcastFloat64x8 ¶

func BroadcastFloat64x8(x float64) Float64x8

BroadcastFloat64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512F

func LoadFloat64x8 ¶

func LoadFloat64x8(y *[8]float64) Float64x8

LoadFloat64x8 loads a Float64x8 from an array

func LoadFloat64x8Slice ¶

func LoadFloat64x8Slice(s []float64) Float64x8

LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s

func LoadFloat64x8SlicePart ¶

func LoadFloat64x8SlicePart(s []float64) Float64x8

LoadFloat64x8SlicePart loads a Float64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.

func LoadMaskedFloat64x8 ¶

func LoadMaskedFloat64x8(y *[8]float64, mask Mask64x8) Float64x8

LoadMaskedFloat64x8 loads a Float64x8 from an array, at those elements enabled by mask

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func (Float64x8) Add ¶

func (x Float64x8) Add(y Float64x8) Float64x8

Add adds corresponding elements of two vectors.

Asm: VADDPD, CPU Feature: AVX512

func (Float64x8) AsFloat32x16 ¶

func (from Float64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Float64x8 to Float32x16.

func (Float64x8) AsInt16x32 ¶

func (from Float64x8) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Float64x8 to Int16x32.

func (Float64x8) AsInt32x16 ¶

func (from Float64x8) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Float64x8 to Int32x16.

func (Float64x8) AsInt64x8 ¶

func (from Float64x8) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Float64x8 to Int64x8.

func (Float64x8) AsInt8x64 ¶

func (from Float64x8) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Float64x8 to Int8x64.

func (Float64x8) AsUint16x32 ¶

func (from Float64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Float64x8 to Uint16x32.

func (Float64x8) AsUint32x16 ¶

func (from Float64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Float64x8 to Uint32x16.

func (Float64x8) AsUint64x8 ¶

func (from Float64x8) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Float64x8 to Uint64x8.

func (Float64x8) AsUint8x64 ¶

func (from Float64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Float64x8 to Uint8x64.

func (Float64x8) CeilScaled ¶

func (x Float64x8) CeilScaled(prec uint8) Float64x8

CeilScaled rounds elements up with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) CeilScaledResidue ¶

func (x Float64x8) CeilScaledResidue(prec uint8) Float64x8

CeilScaledResidue computes the difference after ceiling with specified precision.

prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) Compress ¶

func (x Float64x8) Compress(mask Mask64x8) Float64x8

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VCOMPRESSPD, CPU Feature: AVX512

func (Float64x8) ConcatPermute ¶

func (x Float64x8) ConcatPermute(y Float64x8, indices Uint64x8) Float64x8

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where n is the number of elements in the result and xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.

Asm: VPERMI2PD, CPU Feature: AVX512

func (Float64x8) ConvertToFloat32 ¶

func (x Float64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.

Asm: VCVTPD2PS, CPU Feature: AVX512

func (Float64x8) ConvertToInt32 ¶

func (x Float64x8) ConvertToInt32() Int32x8

ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2DQ, CPU Feature: AVX512
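
For in-range inputs, the truncating behavior matches an ordinary Go float-to-integer conversion, which also discards the fraction. The sketch below uses a hypothetical convertToInt32 helper on arrays; it deliberately does not model NaN or out-of-range inputs, whose results are architecture-specific as noted above.

```go
package main

import "fmt"

// convertToInt32 is a hypothetical scalar model of
// Float64x8.ConvertToInt32 for in-range, non-NaN inputs only.
func convertToInt32(x [8]float64) [8]int32 {
	var out [8]int32
	for i, v := range x {
		out[i] = int32(v) // Go conversion truncates toward zero
	}
	return out
}

func main() {
	x := [8]float64{2.9, -2.9, 0.5, -0.5, 7, -7, 1e9, -1e9}
	fmt.Println(convertToInt32(x)) // [2 -2 0 0 7 -7 1000000000 -1000000000]
}
```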

func (Float64x8) ConvertToInt64 ¶

func (x Float64x8) ConvertToInt64() Int64x8

ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2QQ, CPU Feature: AVX512

func (Float64x8) ConvertToUint32 ¶

func (x Float64x8) ConvertToUint32() Uint32x8

ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UDQ, CPU Feature: AVX512

func (Float64x8) ConvertToUint64 ¶

func (x Float64x8) ConvertToUint64() Uint64x8

ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.

Asm: VCVTTPD2UQQ, CPU Feature: AVX512

func (Float64x8) Div ¶

func (x Float64x8) Div(y Float64x8) Float64x8

Div divides elements of two vectors.

Asm: VDIVPD, CPU Feature: AVX512

func (Float64x8) Equal ¶

func (x Float64x8) Equal(y Float64x8) Mask64x8

Equal returns x equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Expand ¶

func (x Float64x8) Expand(mask Mask64x8) Float64x8

Expand distributes the elements packed in the lower part of x to the positions selected by mask, consuming them in order from the lowest selected position to the highest.

Asm: VEXPANDPD, CPU Feature: AVX512
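
Expand is the inverse of a compression, which a scalar sketch may make clearer. The helper name expand and the array types are hypothetical, and leaving unselected positions at zero is an assumption not stated above.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Float64x8.Expand.
// Packed low elements of x are consumed in order and written to the
// positions where mask is true. Unselected positions stay zero
// (an assumption).
func expand(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [8]float64{11, 13, 14, 17, 0, 0, 0, 0}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand(x, mask)) // [0 11 0 13 14 0 0 17]
}
```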

func (Float64x8) FloorScaled ¶

func (x Float64x8) FloorScaled(prec uint8) Float64x8

FloorScaled rounds elements down with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) FloorScaledResidue ¶

func (x Float64x8) FloorScaledResidue(prec uint8) Float64x8

FloorScaledResidue computes the difference after flooring with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) GetHi ¶

func (x Float64x8) GetHi() Float64x4

GetHi returns the upper half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float64x8) GetLo ¶

func (x Float64x8) GetLo() Float64x4

GetLo returns the lower half of x.

Asm: VEXTRACTF64X4, CPU Feature: AVX512

func (Float64x8) Greater ¶

func (x Float64x8) Greater(y Float64x8) Mask64x8

Greater returns x greater-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) GreaterEqual ¶

func (x Float64x8) GreaterEqual(y Float64x8) Mask64x8

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) IsNan ¶

func (x Float64x8) IsNan(y Float64x8) Mask64x8

IsNan checks if elements are NaN. Use as x.IsNan(x).

Asm: VCMPPD, CPU Feature: AVX512
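
The two-operand form suggests an unordered comparison, where an element is reported when either input is NaN at that position. The sketch below models that reading in plain Go; the helper name isNan and the unordered interpretation are assumptions, and passing the same vector twice reduces it to a plain NaN test.

```go
package main

import (
	"fmt"
	"math"
)

// isNan is a hypothetical scalar model of Float64x8.IsNan, assuming
// an unordered comparison of corresponding elements.
func isNan(x, y [8]float64) [8]bool {
	var out [8]bool
	for i := range x {
		out[i] = math.IsNaN(x[i]) || math.IsNaN(y[i])
	}
	return out
}

func main() {
	x := [8]float64{1, math.NaN(), 3, math.Inf(1), math.NaN(), 0, -1, 2}
	fmt.Println(isNan(x, x)) // [false true false false true false false false]
}
```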

func (Float64x8) Len ¶

func (x Float64x8) Len() int

Len returns the number of elements in a Float64x8

func (Float64x8) Less ¶

func (x Float64x8) Less(y Float64x8) Mask64x8

Less returns x less-than y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) LessEqual ¶

func (x Float64x8) LessEqual(y Float64x8) Mask64x8

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Masked ¶

func (x Float64x8) Masked(mask Mask64x8) Float64x8

Masked returns x but with elements zeroed where mask is false.

func (Float64x8) Max ¶

func (x Float64x8) Max(y Float64x8) Float64x8

Max computes the maximum of corresponding elements.

Asm: VMAXPD, CPU Feature: AVX512

func (Float64x8) Merge ¶

func (x Float64x8) Merge(y Float64x8, mask Mask64x8) Float64x8

Merge returns x but with elements set to y where mask is false.
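
Masked and Merge differ only in what fills the positions where the mask is false: zero for Masked, the corresponding element of y for Merge. A scalar sketch of both, using hypothetical helper names and array types in place of the vector and mask types:

```go
package main

import "fmt"

// masked is a hypothetical scalar model of Float64x8.Masked.
func masked(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a hypothetical scalar model of Float64x8.Merge.
func merge(x, y [8]float64, mask [8]bool) [8]float64 {
	out := y
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [8]float64{1, -2, 3, -4, 5, -6, 7, -8}
	y := [8]float64{9, 9, 9, 9, 9, 9, 9, 9}
	var positive [8]bool // stands in for a mask produced by a comparison
	for i, v := range x {
		positive[i] = v > 0
	}
	fmt.Println(masked(x, positive))   // [1 0 3 0 5 0 7 0]
	fmt.Println(merge(x, y, positive)) // [1 9 3 9 5 9 7 9]
}
```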

func (Float64x8) Min ¶

func (x Float64x8) Min(y Float64x8) Float64x8

Min computes the minimum of corresponding elements.

Asm: VMINPD, CPU Feature: AVX512

func (Float64x8) Mul ¶

func (x Float64x8) Mul(y Float64x8) Float64x8

Mul multiplies corresponding elements of two vectors.

Asm: VMULPD, CPU Feature: AVX512

func (Float64x8) MulAdd ¶

func (x Float64x8) MulAdd(y Float64x8, z Float64x8) Float64x8

MulAdd performs a fused (x * y) + z.

Asm: VFMADD213PD, CPU Feature: AVX512

func (Float64x8) MulAddSub ¶

func (x Float64x8) MulAddSub(y Float64x8, z Float64x8) Float64x8

MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.

Asm: VFMADDSUB213PD, CPU Feature: AVX512

func (Float64x8) MulSubAdd ¶

func (x Float64x8) MulSubAdd(y Float64x8, z Float64x8) Float64x8

MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.

Asm: VFMSUBADD213PD, CPU Feature: AVX512

func (Float64x8) NotEqual ¶

func (x Float64x8) NotEqual(y Float64x8) Mask64x8

NotEqual returns x not-equals y, elementwise.

Asm: VCMPPD, CPU Feature: AVX512

func (Float64x8) Permute ¶

func (x Float64x8) Permute(indices Uint64x8) Float64x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMPD, CPU Feature: AVX512
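
A scalar sketch of the permutation rule, including the 3-bit index mask. The helper name permute and the array types are hypothetical.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Float64x8.Permute.
func permute(x [8]float64, indices [8]uint64) [8]float64 {
	var out [8]float64
	for i, idx := range indices {
		out[i] = x[idx&7] // only the low 3 bits select an element
	}
	return out
}

func main() {
	x := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	// The last index, 8, wraps to 0 because only 3 bits are used.
	idx := [8]uint64{7, 6, 5, 4, 3, 2, 1, 8}
	fmt.Println(permute(x, idx)) // [17 16 15 14 13 12 11 10]
}
```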

func (Float64x8) Reciprocal ¶

func (x Float64x8) Reciprocal() Float64x8

Reciprocal computes an approximate reciprocal of each element.

Asm: VRCP14PD, CPU Feature: AVX512

func (Float64x8) ReciprocalSqrt ¶

func (x Float64x8) ReciprocalSqrt() Float64x8

ReciprocalSqrt computes an approximate reciprocal of the square root of each element.

Asm: VRSQRT14PD, CPU Feature: AVX512

func (Float64x8) RoundToEvenScaled ¶

func (x Float64x8) RoundToEvenScaled(prec uint8) Float64x8

RoundToEvenScaled rounds elements with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) RoundToEvenScaledResidue ¶

func (x Float64x8) RoundToEvenScaledResidue(prec uint8) Float64x8

RoundToEvenScaledResidue computes the difference after rounding with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

func (Float64x8) Scale ¶

func (x Float64x8) Scale(y Float64x8) Float64x8

Scale multiplies elements by a power of 2.

Asm: VSCALEFPD, CPU Feature: AVX512

func (Float64x8) SelectFromPairGrouped ¶

func (x Float64x8) SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Float64x8) SetHi ¶

func (x Float64x8) SetHi(y Float64x4) Float64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float64x8) SetLo ¶

func (x Float64x8) SetLo(y Float64x4) Float64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTF64X4, CPU Feature: AVX512

func (Float64x8) Sqrt ¶

func (x Float64x8) Sqrt() Float64x8

Sqrt computes the square root of each element.

Asm: VSQRTPD, CPU Feature: AVX512

func (Float64x8) Store ¶

func (x Float64x8) Store(y *[8]float64)

Store stores a Float64x8 to an array

func (Float64x8) StoreMasked ¶

func (x Float64x8) StoreMasked(y *[8]float64, mask Mask64x8)

StoreMasked stores a Float64x8 to an array, at those elements enabled by mask

Asm: VMOVDQU64, CPU Feature: AVX512

func (Float64x8) StoreSlice ¶

func (x Float64x8) StoreSlice(s []float64)

StoreSlice stores x into a slice of at least 8 float64s

func (Float64x8) StoreSlicePart ¶

func (x Float64x8) StoreSlicePart(s []float64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
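
The "as many elements as will fit" rule behaves like Go's built-in copy, which a scalar sketch can show. The helper name storeSlicePart and the array type are hypothetical stand-ins for the method and Float64x8.

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model of
// Float64x8.StoreSlicePart: copy writes min(len(s), 8) elements.
func storeSlicePart(x [8]float64, s []float64) {
	copy(s, x[:])
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}

	short := make([]float64, 3)
	storeSlicePart(x, short)
	fmt.Println(short) // [1 2 3]

	long := []float64{-1, -1, -1, -1, -1, -1, -1, -1, -1, -1}
	storeSlicePart(x, long)
	fmt.Println(long) // [1 2 3 4 5 6 7 8 -1 -1]
}
```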

func (Float64x8) String ¶

func (x Float64x8) String() string

String returns a string representation of SIMD vector x

func (Float64x8) Sub ¶

func (x Float64x8) Sub(y Float64x8) Float64x8

Sub subtracts corresponding elements of two vectors.

Asm: VSUBPD, CPU Feature: AVX512

func (Float64x8) TruncScaled ¶

func (x Float64x8) TruncScaled(prec uint8) Float64x8

TruncScaled truncates elements with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VRNDSCALEPD, CPU Feature: AVX512

func (Float64x8) TruncScaledResidue ¶

func (x Float64x8) TruncScaledResidue(prec uint8) Float64x8

TruncScaledResidue computes the difference after truncating with specified precision.

prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VREDUCEPD, CPU Feature: AVX512

type Int16x16 ¶

type Int16x16 struct {
	// contains filtered or unexported fields
}

Int16x16 is a 256-bit SIMD vector of 16 int16

func BroadcastInt16x16 ¶

func BroadcastInt16x16(x int16) Int16x16

BroadcastInt16x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt16x16 ¶

func LoadInt16x16(y *[16]int16) Int16x16

LoadInt16x16 loads an Int16x16 from an array.

func LoadInt16x16Slice ¶

func LoadInt16x16Slice(s []int16) Int16x16

LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s

func LoadInt16x16SlicePart ¶

func LoadInt16x16SlicePart(s []int16) Int16x16

LoadInt16x16SlicePart loads an Int16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.
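
The zero-fill rule for short slices can be modeled in plain Go with copy into a zero-initialized array. The helper name loadSlicePart and the array type are hypothetical stand-ins for the function and Int16x16.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of
// LoadInt16x16SlicePart: at most 16 elements are read, and any
// elements not covered by s remain zero.
func loadSlicePart(s []int16) [16]int16 {
	var out [16]int16
	copy(out[:], s)
	return out
}

func main() {
	fmt.Println(loadSlicePart([]int16{1, 2, 3}))
	// [1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0]
}
```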

func (Int16x16) Abs ¶

func (x Int16x16) Abs() Int16x16

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX2

func (Int16x16) Add ¶

func (x Int16x16) Add(y Int16x16) Int16x16

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX2

func (Int16x16) AddPairs ¶

func (x Int16x16) AddPairs(y Int16x16) Int16x16

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX2

func (Int16x16) AddPairsSaturated ¶

func (x Int16x16) AddPairsSaturated(y Int16x16) Int16x16

AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDSW, CPU Feature: AVX2

func (Int16x16) AddSaturated ¶

func (x Int16x16) AddSaturated(y Int16x16) Int16x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX2
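
Saturation means the sum is clamped to the int16 range instead of wrapping around. A scalar sketch, with the hypothetical helper addSaturated operating on arrays:

```go
package main

import (
	"fmt"
	"math"
)

// addSaturated is a hypothetical scalar model of Int16x16.AddSaturated.
func addSaturated(x, y [16]int16) [16]int16 {
	var out [16]int16
	for i := range x {
		sum := int32(x[i]) + int32(y[i]) // widen so the sum cannot wrap
		switch {
		case sum > math.MaxInt16:
			out[i] = math.MaxInt16
		case sum < math.MinInt16:
			out[i] = math.MinInt16
		default:
			out[i] = int16(sum)
		}
	}
	return out
}

func main() {
	x := [16]int16{32767, -32768, 100}
	y := [16]int16{1, -1, 23}
	r := addSaturated(x, y)
	fmt.Println(r[:3]) // [32767 -32768 123]
}
```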

func (Int16x16) And ¶

func (x Int16x16) And(y Int16x16) Int16x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int16x16) AndNot ¶

func (x Int16x16) AndNot(y Int16x16) Int16x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int16x16) AsFloat32x8 ¶

func (from Int16x16) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Int16x16 to Float32x8.

func (Int16x16) AsFloat64x4 ¶

func (from Int16x16) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Int16x16 to Float64x4.

func (Int16x16) AsInt32x8 ¶

func (from Int16x16) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Int16x16 to Int32x8.

func (Int16x16) AsInt64x4 ¶

func (from Int16x16) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Int16x16 to Int64x4.

func (Int16x16) AsInt8x32 ¶

func (from Int16x16) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Int16x16 to Int8x32.

func (Int16x16) AsUint16x16 ¶

func (from Int16x16) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Int16x16 to Uint16x16.

func (Int16x16) AsUint32x8 ¶

func (from Int16x16) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Int16x16 to Uint32x8.

func (Int16x16) AsUint64x4 ¶

func (from Int16x16) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Int16x16 to Uint64x4.

func (Int16x16) AsUint8x32 ¶

func (from Int16x16) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Int16x16 to Uint8x32.

func (Int16x16) Compress ¶

func (x Int16x16) Compress(mask Mask16x16) Int16x16

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x16) ConcatPermute ¶

func (x Int16x16) ConcatPermute(y Int16x16, indices Uint16x16) Int16x16

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2W, CPU Feature: AVX512

func (Int16x16) CopySign ¶

func (x Int16x16) CopySign(y Int16x16) Int16x16

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGNW, CPU Feature: AVX2
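
In other words, each element of x is negated, zeroed, or kept depending on whether the corresponding element of y is negative, zero, or positive. A scalar sketch with a hypothetical copySign helper on arrays:

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of Int16x16.CopySign:
// x is multiplied by the sign (-1, 0, or 1) of y, elementwise.
func copySign(x, y [16]int16) [16]int16 {
	var out [16]int16
	for i := range x {
		switch {
		case y[i] < 0:
			out[i] = -x[i]
		case y[i] > 0:
			out[i] = x[i]
		} // y[i] == 0 leaves out[i] at zero
	}
	return out
}

func main() {
	x := [16]int16{5, 5, 5, -5}
	y := [16]int16{-3, 0, 9, -1}
	r := copySign(x, y)
	fmt.Println(r[:4]) // [-5 0 5 5]
}
```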

func (Int16x16) DotProductPairs ¶

func (x Int16x16) DotProductPairs(y Int16x16) Int32x8

DotProductPairs multiplies corresponding elements and adds each adjacent pair of products together, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDWD, CPU Feature: AVX2
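
A scalar sketch of the pairwise multiply-and-add, widening to int32 before multiplying. The helper name dotProductPairs and the array types are hypothetical stand-ins for Int16x16 and Int32x8.

```go
package main

import "fmt"

// dotProductPairs is a hypothetical scalar model of
// Int16x16.DotProductPairs: each int32 result is the sum of two
// adjacent int16 products.
func dotProductPairs(x, y [16]int16) [8]int32 {
	var out [8]int32
	for i := range out {
		lo := int32(x[2*i]) * int32(y[2*i])
		hi := int32(x[2*i+1]) * int32(y[2*i+1])
		out[i] = lo + hi
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i + 1) // 1, 2, ..., 16
		y[i] = 2
	}
	fmt.Println(dotProductPairs(x, y)) // [6 14 22 30 38 46 54 62]
}
```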

func (Int16x16) Equal ¶

func (x Int16x16) Equal(y Int16x16) Mask16x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX2

func (Int16x16) Expand ¶

func (x Int16x16) Expand(mask Mask16x16) Int16x16

Expand distributes the elements packed in the lower part of x to the positions selected by mask, consuming them in order from the lowest selected position to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Int16x16) ExtendToInt32 ¶

func (x Int16x16) ExtendToInt32() Int32x16

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX512

func (Int16x16) GetHi ¶

func (x Int16x16) GetHi() Int16x8

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int16x16) GetLo ¶

func (x Int16x16) GetLo() Int16x8

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int16x16) Greater ¶

func (x Int16x16) Greater(y Int16x16) Mask16x16

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTW, CPU Feature: AVX2

func (Int16x16) GreaterEqual ¶

func (x Int16x16) GreaterEqual(y Int16x16) Mask16x16

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Int16x16) InterleaveHiGrouped ¶

func (x Int16x16) InterleaveHiGrouped(y Int16x16) Int16x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX2

func (Int16x16) InterleaveLoGrouped ¶

func (x Int16x16) InterleaveLoGrouped(y Int16x16) Int16x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX2
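
"Grouped" here means the interleave is applied independently to each 128-bit subvector, which holds eight int16 elements. The scalar sketch below models both the Lo and Hi variants; the helper name interleaveGrouped is hypothetical, and placing the x element before the y element in each pair is an assumption based on the usual behavior of the unpack instructions.

```go
package main

import "fmt"

// interleaveGrouped is a hypothetical scalar model of
// Int16x16.InterleaveLoGrouped (hi == false) and
// InterleaveHiGrouped (hi == true).
func interleaveGrouped(x, y [16]int16, hi bool) [16]int16 {
	var out [16]int16
	off := 0
	if hi {
		off = 4 // start at the high half of each group
	}
	for g := 0; g < 16; g += 8 { // one 128-bit group = 8 int16s
		for i := 0; i < 4; i++ {
			out[g+2*i] = x[g+off+i]
			out[g+2*i+1] = y[g+off+i]
		}
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i)
		y[i] = int16(100 + i)
	}
	fmt.Println(interleaveGrouped(x, y, false))
	// [0 100 1 101 2 102 3 103 8 108 9 109 10 110 11 111]
	fmt.Println(interleaveGrouped(x, y, true))
	// [4 104 5 105 6 106 7 107 12 112 13 113 14 114 15 115]
}
```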

func (Int16x16) IsZero ¶

func (x Int16x16) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Int16x16) Len ¶

func (x Int16x16) Len() int

Len returns the number of elements in an Int16x16.

func (Int16x16) Less ¶

func (x Int16x16) Less(y Int16x16) Mask16x16

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Int16x16) LessEqual ¶

func (x Int16x16) LessEqual(y Int16x16) Mask16x16

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Int16x16) Masked ¶

func (x Int16x16) Masked(mask Mask16x16) Int16x16

Masked returns x but with elements zeroed where mask is false.

func (Int16x16) Max ¶

func (x Int16x16) Max(y Int16x16) Int16x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX2

func (Int16x16) Merge ¶

func (x Int16x16) Merge(y Int16x16, mask Mask16x16) Int16x16

Merge returns x but with elements set to y where mask is false.

func (Int16x16) Min ¶

func (x Int16x16) Min(y Int16x16) Int16x16

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX2

func (Int16x16) Mul ¶

func (x Int16x16) Mul(y Int16x16) Int16x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX2

func (Int16x16) MulHigh ¶

func (x Int16x16) MulHigh(y Int16x16) Int16x16

MulHigh multiplies elements and stores the high part of the result.

Asm: VPMULHW, CPU Feature: AVX2
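
The "high part" is the upper 16 bits of the full 32-bit signed product. A scalar sketch, using a hypothetical mulHigh helper on arrays:

```go
package main

import "fmt"

// mulHigh is a hypothetical scalar model of Int16x16.MulHigh:
// the upper 16 bits of each signed 32-bit product.
func mulHigh(x, y [16]int16) [16]int16 {
	var out [16]int16
	for i := range x {
		prod := int32(x[i]) * int32(y[i])
		out[i] = int16(prod >> 16) // arithmetic shift keeps the sign
	}
	return out
}

func main() {
	x := [16]int16{16384, -16384, 300, 100, -1}
	y := [16]int16{4, 4, 300, 100, 1}
	r := mulHigh(x, y)
	fmt.Println(r[:5]) // [1 -1 1 0 -1]
}
```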

func (Int16x16) Not ¶

func (x Int16x16) Not() Int16x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Int16x16) NotEqual ¶

func (x Int16x16) NotEqual(y Int16x16) Mask16x16

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Int16x16) OnesCount ¶

func (x Int16x16) OnesCount() Int16x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x16) Or ¶

func (x Int16x16) Or(y Int16x16) Int16x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int16x16) Permute ¶

func (x Int16x16) Permute(indices Uint16x16) Int16x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Int16x16) PermuteScalarsHiGrouped ¶

func (x Int16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

	result =
	{x[0], x[1], x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
	 x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a, b, c, and d should have values between 0 and 3. If they are constants, the operation is inlined as an instruction; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX2
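
The result layout above translates directly into a scalar loop over the two 128-bit groups. This sketch uses a hypothetical permuteScalarsHiGrouped helper on arrays and assumes a, b, c, and d are already in the range 0 to 3.

```go
package main

import "fmt"

// permuteScalarsHiGrouped is a hypothetical scalar model of
// Int16x16.PermuteScalarsHiGrouped. It assumes a, b, c, d are in 0..3.
func permuteScalarsHiGrouped(x [16]int16, a, b, c, d uint8) [16]int16 {
	out := x // the low four elements of each group pass through
	for g := 0; g < 16; g += 8 {
		out[g+4] = x[g+4+int(a)]
		out[g+5] = x[g+4+int(b)]
		out[g+6] = x[g+4+int(c)]
		out[g+7] = x[g+4+int(d)]
	}
	return out
}

func main() {
	var x [16]int16
	for i := range x {
		x[i] = int16(i)
	}
	// Reverse the high four elements of each group.
	fmt.Println(permuteScalarsHiGrouped(x, 3, 2, 1, 0))
	// [0 1 2 3 7 6 5 4 8 9 10 11 15 14 13 12]
}
```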

func (Int16x16) PermuteScalarsLoGrouped ¶

func (x Int16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
 {x[a], x[b], x[c], x[d],         x[4], x[5], x[6], x[7],
	 x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}

Parameters a, b, c, and d should have values between 0 and 3. If they are constants, the operation is inlined as an instruction; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX2

func (Int16x16) SaturateToInt8 ¶

func (x Int16x16) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector; its upper elements are zero-cleared.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x16) SaturateToUint8 ¶

func (x Int16x16) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector; its upper elements are zero-cleared.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x16) Select128FromPair ¶

func (x Int16x16) Select128FromPair(lo, hi uint8, y Int16x16) Int16x16

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
 {60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})

returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. Both should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
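
The documented example can be reproduced with a scalar model that treats x followed by y as four 128-bit chunks of eight int16s each. The helper name select128FromPair and the array types are hypothetical; the sketch assumes lo and hi are in the range 0 to 3.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of
// Int16x16.Select128FromPair. It assumes lo and hi are in 0..3.
func select128FromPair(x [16]int16, lo, hi uint8, y [16]int16) [16]int16 {
	var xy [32]int16 // chunks 0,1 come from x; chunks 2,3 come from y
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]int16
	copy(out[:8], xy[int(lo)*8:int(lo)*8+8])
	copy(out[8:], xy[int(hi)*8:int(hi)*8+8])
	return out
}

func main() {
	x := [16]int16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]int16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPair(x, 3, 0, y))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```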

func (Int16x16) SetHi ¶

func (x Int16x16) SetHi(y Int16x8) Int16x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int16x16) SetLo ¶

func (x Int16x16) SetLo(y Int16x8) Int16x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int16x16) ShiftAllLeft ¶

func (x Int16x16) ShiftAllLeft(y uint64) Int16x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX2

func (Int16x16) ShiftAllLeftConcat ¶

func (x Int16x16) ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Int16x16) ShiftAllRight ¶

func (x Int16x16) ShiftAllRight(y uint64) Int16x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX2

func (Int16x16) ShiftAllRightConcat ¶

func (x Int16x16) ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Int16x16) ShiftLeft ¶

func (x Int16x16) ShiftLeft(y Int16x16) Int16x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x16) ShiftLeftConcat ¶

func (x Int16x16) ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x16) ShiftRight ¶

func (x Int16x16) ShiftRight(y Int16x16) Int16x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x16) ShiftRightConcat ¶

func (x Int16x16) ShiftRightConcat(y Int16x16, z Int16x16) Int16x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x16) Store ¶

func (x Int16x16) Store(y *[16]int16)

Store stores an Int16x16 to an array.

func (Int16x16) StoreSlice ¶

func (x Int16x16) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 16 int16s

func (Int16x16) StoreSlicePart ¶

func (x Int16x16) StoreSlicePart(s []int16)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x16) String ¶

func (x Int16x16) String() string

String returns a string representation of SIMD vector x

func (Int16x16) Sub ¶

func (x Int16x16) Sub(y Int16x16) Int16x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX2

func (Int16x16) SubPairs ¶

func (x Int16x16) SubPairs(y Int16x16) Int16x16

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX2

func (Int16x16) SubPairsSaturated ¶

func (x Int16x16) SubPairsSaturated(y Int16x16) Int16x16

SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBSW, CPU Feature: AVX2

func (Int16x16) SubSaturated ¶

func (x Int16x16) SubSaturated(y Int16x16) Int16x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX2

func (Int16x16) ToMask ¶

func (from Int16x16) ToMask() (to Mask16x16)

ToMask converts from Int16x16 to Mask16x16, mask element is set to true when the corresponding vector element is non-zero.

func (Int16x16) TruncateToInt8 ¶

func (x Int16x16) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector; its upper elements are zero-cleared.

Asm: VPMOVWB, CPU Feature: AVX512

func (Int16x16) Xor ¶

func (x Int16x16) Xor(y Int16x16) Int16x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int16x32 ¶

type Int16x32 struct {
	// contains filtered or unexported fields
}

Int16x32 is a 512-bit SIMD vector of 32 int16

func BroadcastInt16x32 ¶

func BroadcastInt16x32(x int16) Int16x32

BroadcastInt16x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512BW

func LoadInt16x32 ¶

func LoadInt16x32(y *[32]int16) Int16x32

LoadInt16x32 loads an Int16x32 from an array.

func LoadInt16x32Slice ¶

func LoadInt16x32Slice(s []int16) Int16x32

LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s

func LoadInt16x32SlicePart ¶

func LoadInt16x32SlicePart(s []int16) Int16x32

LoadInt16x32SlicePart loads an Int16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.

func LoadMaskedInt16x32 ¶

func LoadMaskedInt16x32(y *[32]int16, mask Mask16x32) Int16x32

LoadMaskedInt16x32 loads an Int16x32 from an array, at those elements enabled by mask.

Asm: VMOVDQU16.Z, CPU Feature: AVX512

func (Int16x32) Abs ¶

func (x Int16x32) Abs() Int16x32

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX512

func (Int16x32) Add ¶

func (x Int16x32) Add(y Int16x32) Int16x32

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX512

func (Int16x32) AddSaturated ¶

func (x Int16x32) AddSaturated(y Int16x32) Int16x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX512

func (Int16x32) And ¶

func (x Int16x32) And(y Int16x32) Int16x32

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int16x32) AndNot ¶

func (x Int16x32) AndNot(y Int16x32) Int16x32

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int16x32) AsFloat32x16 ¶

func (from Int16x32) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Int16x32 to Float32x16.

func (Int16x32) AsFloat64x8 ¶

func (from Int16x32) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Int16x32 to Float64x8.

func (Int16x32) AsInt32x16 ¶

func (from Int16x32) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Int16x32 to Int32x16.

func (Int16x32) AsInt64x8 ¶

func (from Int16x32) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Int16x32 to Int64x8.

func (Int16x32) AsInt8x64 ¶

func (from Int16x32) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Int16x32 to Int8x64.

func (Int16x32) AsUint16x32 ¶

func (from Int16x32) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Int16x32 to Uint16x32.

func (Int16x32) AsUint32x16 ¶

func (from Int16x32) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Int16x32 to Uint32x16.

func (Int16x32) AsUint64x8 ¶

func (from Int16x32) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Int16x32 to Uint64x8.

func (Int16x32) AsUint8x64 ¶

func (from Int16x32) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Int16x32 to Uint8x64.

func (Int16x32) Compress ¶

func (x Int16x32) Compress(mask Mask16x32) Int16x32

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x32) ConcatPermute ¶

func (x Int16x32) ConcatPermute(y Int16x32, indices Uint16x32) Int16x32

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2W, CPU Feature: AVX512

func (Int16x32) DotProductPairs ¶

func (x Int16x32) DotProductPairs(y Int16x32) Int32x16

DotProductPairs multiplies corresponding elements and adds each adjacent pair of products together, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDWD, CPU Feature: AVX512

func (Int16x32) Equal ¶

func (x Int16x32) Equal(y Int16x32) Mask16x32

Equal returns x equals y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX512

func (Int16x32) Expand ¶

func (x Int16x32) Expand(mask Mask16x32) Int16x32

Expand distributes the elements packed in the lower part of x to the positions selected by mask, consuming them in order from the lowest selected position to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Int16x32) GetHi ¶

func (x Int16x32) GetHi() Int16x16

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int16x32) GetLo ¶

func (x Int16x32) GetLo() Int16x16

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int16x32) Greater ¶

func (x Int16x32) Greater(y Int16x32) Mask16x32

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTW, CPU Feature: AVX512

func (Int16x32) GreaterEqual ¶

func (x Int16x32) GreaterEqual(y Int16x32) Mask16x32

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) InterleaveHiGrouped ¶

func (x Int16x32) InterleaveHiGrouped(y Int16x32) Int16x32

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX512

func (Int16x32) InterleaveLoGrouped ¶

func (x Int16x32) InterleaveLoGrouped(y Int16x32) Int16x32

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX512

func (Int16x32) Len ¶

func (x Int16x32) Len() int

Len returns the number of elements in an Int16x32.

func (Int16x32) Less ¶

func (x Int16x32) Less(y Int16x32) Mask16x32

Less returns x less-than y, elementwise.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) LessEqual ¶

func (x Int16x32) LessEqual(y Int16x32) Mask16x32

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) Masked ¶

func (x Int16x32) Masked(mask Mask16x32) Int16x32

Masked returns x but with elements zeroed where mask is false.

func (Int16x32) Max ¶

func (x Int16x32) Max(y Int16x32) Int16x32

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX512

func (Int16x32) Merge ¶

func (x Int16x32) Merge(y Int16x32, mask Mask16x32) Int16x32

Merge returns x but with elements set to y where mask is false.
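
The difference between Masked and Merge can be sketched with a scalar model in plain Go. This is only an illustration of the documented semantics: maskedModel and mergeModel are hypothetical helpers, not part of archsimd, and they use 8 lanes and a [8]bool in place of the 32-lane vector and the opaque Mask16x32 to keep the example short.

```go
package main

// maskedModel mirrors the documented behaviour of Masked:
// keep x[i] where mask[i] is true, zero it otherwise.
func maskedModel(x [8]int16, mask [8]bool) [8]int16 {
	var out [8]int16
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// mergeModel mirrors the documented behaviour of Merge:
// keep x[i] where mask[i] is true, take y[i] where it is false.
func mergeModel(x, y [8]int16, mask [8]bool) [8]int16 {
	out := x
	for i := range out {
		if !mask[i] {
			out[i] = y[i]
		}
	}
	return out
}
```

In this model, Masked is the special case of Merge in which y is the all-zero vector.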

func (Int16x32) Min ¶

func (x Int16x32) Min(y Int16x32) Int16x32

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX512

func (Int16x32) Mul ¶

func (x Int16x32) Mul(y Int16x32) Int16x32

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX512

func (Int16x32) MulHigh ¶

func (x Int16x32) MulHigh(y Int16x32) Int16x32

MulHigh multiplies elements and stores the high part of the result.

Asm: VPMULHW, CPU Feature: AVX512

func (Int16x32) Not ¶

func (x Int16x32) Not() Int16x32

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Int16x32) NotEqual ¶

func (x Int16x32) NotEqual(y Int16x32) Mask16x32

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPW, CPU Feature: AVX512

func (Int16x32) OnesCount ¶

func (x Int16x32) OnesCount() Int16x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x32) Or ¶

func (x Int16x32) Or(y Int16x32) Int16x32

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int16x32) Permute ¶

func (x Int16x32) Permute(indices Uint16x32) Int16x32

Permute performs a full permutation of vector x using indices:

result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}

The low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512
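
As a rough scalar sketch of the indexing rule described for Permute (plain Go, not the intrinsic itself), each output lane reads the input lane named by the low 5 bits of its index. permuteModel is a hypothetical helper introduced only for illustration.

```go
package main

// permuteModel is a scalar sketch of the documented Permute result for a
// 32-lane vector: out[i] = x[indices[i] & 31], so only the low 5 bits of
// each index are significant.
func permuteModel(x [32]int16, indices [32]uint16) [32]int16 {
	var out [32]int16
	for i, idx := range indices {
		out[i] = x[idx&31]
	}
	return out
}
```

With indices[i] = 31 - i the model reverses the vector; an index such as 37 selects lane 5 because the upper bits are ignored.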

func (Int16x32) PermuteScalarsHiGrouped ¶

func (x Int16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
    {x[0],  x[1],  x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
     x[8],  x[9],  x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
     x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
     x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512
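
The grouped layout above can be sketched as a scalar loop over the four 8-element groups. This is an illustrative model in plain Go, not the intrinsic; permuteScalarsHiGroupedModel is a hypothetical name, and masking the selectors with 3 is an assumption made here because the behaviour for values outside 0..3 is not stated.

```go
package main

// permuteScalarsHiGroupedModel is a scalar sketch of the documented result:
// within each group of 8 elements (one 128-bit subvector) the low four
// elements are copied unchanged and the high four are chosen by a, b, c, d.
func permuteScalarsHiGroupedModel(x [32]int16, a, b, c, d uint8) [32]int16 {
	var out [32]int16
	// Assumption of this sketch: selectors are reduced to the range 0..3.
	sel := [4]uint8{a & 3, b & 3, c & 3, d & 3}
	for g := 0; g < 32; g += 8 {
		copy(out[g:g+4], x[g:g+4]) // low half of the group is untouched
		for j, s := range sel {
			out[g+4+j] = x[g+4+int(s)]
		}
	}
	return out
}
```

Calling the model with a, b, c, d = 3, 2, 1, 0 reverses the upper four elements of every group and leaves the lower four in place.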

func (Int16x32) PermuteScalarsLoGrouped ¶

func (x Int16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
    {x[a],    x[b],    x[c],    x[d],    x[4],  x[5],  x[6],  x[7],
     x[a+8],  x[b+8],  x[c+8],  x[d+8],  x[12], x[13], x[14], x[15],
     x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
     x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Int16x32) SaturateToInt8 ¶

func (x Int16x32) SaturateToInt8() Int8x32

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x32) SetHi ¶

func (x Int16x32) SetHi(y Int16x16) Int16x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int16x32) SetLo ¶

func (x Int16x32) SetLo(y Int16x16) Int16x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int16x32) ShiftAllLeft ¶

func (x Int16x32) ShiftAllLeft(y uint64) Int16x32

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX512

func (Int16x32) ShiftAllLeftConcat ¶

func (x Int16x32) ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftAllRight ¶

func (x Int16x32) ShiftAllRight(y uint64) Int16x32

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX512

func (Int16x32) ShiftAllRightConcat ¶

func (x Int16x32) ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftLeft ¶

func (x Int16x32) ShiftLeft(y Int16x32) Int16x32

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x32) ShiftLeftConcat ¶

func (x Int16x32) ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x32) ShiftRight ¶

func (x Int16x32) ShiftRight(y Int16x32) Int16x32

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x32) ShiftRightConcat ¶

func (x Int16x32) ShiftRightConcat(y Int16x32, z Int16x32) Int16x32

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x32) Store ¶

func (x Int16x32) Store(y *[32]int16)

Store stores an Int16x32 to an array.

func (Int16x32) StoreMasked ¶

func (x Int16x32) StoreMasked(y *[32]int16, mask Mask16x32)

StoreMasked stores an Int16x32 to an array, at those elements enabled by mask.

Asm: VMOVDQU16, CPU Feature: AVX512

func (Int16x32) StoreSlice ¶

func (x Int16x32) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 32 int16s

func (Int16x32) StoreSlicePart ¶

func (x Int16x32) StoreSlicePart(s []int16)

StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x32) String ¶

func (x Int16x32) String() string

String returns a string representation of SIMD vector x

func (Int16x32) Sub ¶

func (x Int16x32) Sub(y Int16x32) Int16x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX512

func (Int16x32) SubSaturated ¶

func (x Int16x32) SubSaturated(y Int16x32) Int16x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX512

func (Int16x32) ToMask ¶

func (from Int16x32) ToMask() (to Mask16x32)

ToMask converts from Int16x32 to Mask16x32. A mask element is set to true when the corresponding vector element is non-zero.

func (Int16x32) TruncateToInt8 ¶

func (x Int16x32) TruncateToInt8() Int8x32

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements.

Asm: VPMOVWB, CPU Feature: AVX512
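
SaturateToInt8 and TruncateToInt8 differ only in how out-of-range values are narrowed. A per-element sketch in plain Go follows; saturateToInt8Model and truncateToInt8Model are hypothetical helpers, not part of archsimd, and they model a single lane rather than a whole vector.

```go
package main

import "math"

// saturateToInt8Model clamps v to the int8 range, which is the usual
// meaning of a saturating narrowing conversion.
func saturateToInt8Model(v int16) int8 {
	switch {
	case v > math.MaxInt8:
		return math.MaxInt8
	case v < math.MinInt8:
		return math.MinInt8
	}
	return int8(v)
}

// truncateToInt8Model keeps only the low 8 bits of v, which is what a Go
// integer conversion does.
func truncateToInt8Model(v int16) int8 {
	return int8(v)
}
```

For an input of 300 the saturating model yields 127 while the truncating model yields 44 (the low byte 0x2C); values already inside the int8 range are unchanged by both.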

func (Int16x32) Xor ¶

func (x Int16x32) Xor(y Int16x32) Int16x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Int16x8 ¶

type Int16x8 struct {
	// contains filtered or unexported fields
}

Int16x8 is a 128-bit SIMD vector of 8 int16

func BroadcastInt16x8 ¶

func BroadcastInt16x8(x int16) Int16x8

BroadcastInt16x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt16x8 ¶

func LoadInt16x8(y *[8]int16) Int16x8

LoadInt16x8 loads an Int16x8 from an array.

func LoadInt16x8Slice ¶

func LoadInt16x8Slice(s []int16) Int16x8

LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s

func LoadInt16x8SlicePart ¶

func LoadInt16x8SlicePart(s []int16) Int16x8

LoadInt16x8SlicePart loads an Int16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.
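
The partial-load rule can be pictured with a short scalar model in plain Go. loadSlicePartModel is a hypothetical helper that returns an array in place of the vector type; it is a sketch of the documented behaviour, not the package's implementation.

```go
package main

// loadSlicePartModel is a scalar sketch of LoadInt16x8SlicePart: it copies
// up to 8 elements from s and leaves any remaining lanes zero.
func loadSlicePartModel(s []int16) [8]int16 {
	var out [8]int16
	copy(out[:], s) // copy stops at the shorter of the two lengths
	return out
}
```

A three-element slice therefore fills lanes 0 through 2 and leaves lanes 3 through 7 zero, which is convenient for processing the tail of a buffer whose length is not a multiple of 8.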

func (Int16x8) Abs ¶

func (x Int16x8) Abs() Int16x8

Abs computes the absolute value of each element.

Asm: VPABSW, CPU Feature: AVX

func (Int16x8) Add ¶

func (x Int16x8) Add(y Int16x8) Int16x8

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX

func (Int16x8) AddPairs ¶

func (x Int16x8) AddPairs(y Int16x8) Int16x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX

func (Int16x8) AddPairsSaturated ¶

func (x Int16x8) AddPairsSaturated(y Int16x8) Int16x8

AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDSW, CPU Feature: AVX

func (Int16x8) AddSaturated ¶

func (x Int16x8) AddSaturated(y Int16x8) Int16x8

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSW, CPU Feature: AVX
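
To make the contrast with the wrapping Add concrete, here is a per-element sketch of signed saturating addition in plain Go. addSaturatedModel is a hypothetical helper used only for illustration.

```go
package main

import "math"

// addSaturatedModel adds two int16 values in a wider type and clamps the
// sum to the int16 range instead of letting it wrap around.
func addSaturatedModel(a, b int16) int16 {
	sum := int32(a) + int32(b)
	if sum > math.MaxInt16 {
		return math.MaxInt16
	}
	if sum < math.MinInt16 {
		return math.MinInt16
	}
	return int16(sum)
}
```

For 32000 + 1000 the model returns 32767, whereas ordinary int16 addition wraps to -32536.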

func (Int16x8) And ¶

func (x Int16x8) And(y Int16x8) Int16x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int16x8) AndNot ¶

func (x Int16x8) AndNot(y Int16x8) Int16x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int16x8) AsFloat32x4 ¶

func (from Int16x8) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Int16x8 to Float32x4.

func (Int16x8) AsFloat64x2 ¶

func (from Int16x8) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Int16x8 to Float64x2.

func (Int16x8) AsInt32x4 ¶

func (from Int16x8) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Int16x8 to Int32x4.

func (Int16x8) AsInt64x2 ¶

func (from Int16x8) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Int16x8 to Int64x2.

func (Int16x8) AsInt8x16 ¶

func (from Int16x8) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Int16x8 to Int8x16.

func (Int16x8) AsUint16x8 ¶

func (from Int16x8) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Int16x8 to Uint16x8.

func (Int16x8) AsUint32x4 ¶

func (from Int16x8) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Int16x8 to Uint32x4.

func (Int16x8) AsUint64x2 ¶

func (from Int16x8) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Int16x8 to Uint64x2.

func (Int16x8) AsUint8x16 ¶

func (from Int16x8) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Int16x8 to Uint8x16.

func (Int16x8) Broadcast128 ¶

func (x Int16x8) Broadcast128() Int16x8

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Int16x8) Broadcast256 ¶

func (x Int16x8) Broadcast256() Int16x16

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Int16x8) Broadcast512 ¶

func (x Int16x8) Broadcast512() Int16x32

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX512

func (Int16x8) Compress ¶

func (x Int16x8) Compress(mask Mask16x8) Int16x8

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Int16x8) ConcatPermute ¶

func (x Int16x8) ConcatPermute(y Int16x8, indices Uint16x8) Int16x8

ConcatPermute performs a full permutation of vectors x and y using indices:

result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2W, CPU Feature: AVX512
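
A scalar sketch of the two-source permutation described for ConcatPermute, written in plain Go, may help. concatPermuteModel is a hypothetical helper; for 8-lane inputs the concatenation has 16 entries, so the model keeps the low 4 bits of each index.

```go
package main

// concatPermuteModel is a scalar sketch of the documented ConcatPermute
// result for 8-lane vectors: indices select from the 16-element
// concatenation of x (lower half) and y (upper half).
func concatPermuteModel(x, y [8]int16, indices [8]uint16) [8]int16 {
	var xy [16]int16
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]int16
	for i, idx := range indices {
		out[i] = xy[idx&15] // only 4 bits are needed to index 16 entries
	}
	return out
}
```

Indices 0 through 7 pick from x and 8 through 15 pick from y, so an index vector such as {0, 8, 1, 9, 2, 10, 3, 11} interleaves the low halves of the two inputs.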

func (Int16x8) CopySign ¶

func (x Int16x8) CopySign(y Int16x8) Int16x8

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGNW, CPU Feature: AVX
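
Read per element, the description of CopySign amounts to multiplying x by the sign of y. A small scalar sketch in plain Go, with the hypothetical helper copySignModel:

```go
package main

// copySignModel is a scalar sketch of the documented CopySign result:
// x multiplied by -1, 0, or 1 according to the sign of y.
func copySignModel(x, y int16) int16 {
	switch {
	case y < 0:
		return -x // wraps for x == -32768, as int16 negation does in Go
	case y == 0:
		return 0
	}
	return x
}
```

Note that a zero in y zeroes the corresponding result element, which differs from the floating-point notion of copying a sign bit.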

func (Int16x8) DotProductPairs ¶

func (x Int16x8) DotProductPairs(y Int16x8) Int32x4

DotProductPairs multiplies the elements and adds the pairs together, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDWD, CPU Feature: AVX
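
The pairing can be sketched in scalar Go as follows. dotProductPairsModel is a hypothetical helper that uses arrays in place of Int16x8 and Int32x4; it illustrates the documented arithmetic and is not the intrinsic.

```go
package main

// dotProductPairsModel is a scalar sketch of Int16x8.DotProductPairs:
// each pair of adjacent 16-bit products is summed into one 32-bit element.
func dotProductPairsModel(x, y [8]int16) [4]int32 {
	var out [4]int32
	for i := range out {
		lo := int32(x[2*i]) * int32(y[2*i])
		hi := int32(x[2*i+1]) * int32(y[2*i+1])
		out[i] = lo + hi
	}
	return out
}
```

Summing the four results of the model gives the full dot product of the two 8-element inputs, which is the typical reason to reach for this operation.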

func (Int16x8) Equal ¶

func (x Int16x8) Equal(y Int16x8) Mask16x8

Equal returns x equals y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX

func (Int16x8) Expand ¶

func (x Int16x8) Expand(mask Mask16x8) Int16x8

Expand performs an expansion on a vector x whose elements are packed into its lower lanes. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
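
Compress and Expand are easiest to follow side by side. The scalar sketch below, in plain Go, models both for 8 lanes; compressModel and expandModel are hypothetical helpers, a [8]bool stands in for Mask16x8, and zeroing the lanes that receive no element is an assumption of the sketch that the descriptions above do not spell out.

```go
package main

// compressModel packs the elements of x whose mask entry is true into the
// lowest lanes, preserving their order. Remaining lanes are left zero
// (an assumption of this sketch).
func compressModel(x [8]int16, mask [8]bool) [8]int16 {
	var out [8]int16
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

// expandModel does the reverse: it takes elements from the low lanes of x
// in order and places them at the lanes where mask is true. Other lanes are
// left zero (again an assumption).
func expandModel(x [8]int16, mask [8]bool) [8]int16 {
	var out [8]int16
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}
```

In this model, expanding a compressed vector with the same mask restores every selected element to its original lane.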

func (Int16x8) ExtendLo2ToInt64x2 ¶

func (x Int16x8) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX

func (Int16x8) ExtendLo4ToInt32x4 ¶

func (x Int16x8) ExtendLo4ToInt32x4() Int32x4

ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX

func (Int16x8) ExtendLo4ToInt64x4 ¶

func (x Int16x8) ExtendLo4ToInt64x4() Int64x4

ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX2

func (Int16x8) ExtendToInt32 ¶

func (x Int16x8) ExtendToInt32() Int32x8

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXWD, CPU Feature: AVX2

func (Int16x8) ExtendToInt64 ¶

func (x Int16x8) ExtendToInt64() Int64x8

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXWQ, CPU Feature: AVX512

func (Int16x8) GetElem ¶

func (x Int16x8) GetElem(index uint8) int16

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRW, CPU Feature: AVX512

func (Int16x8) Greater ¶

func (x Int16x8) Greater(y Int16x8) Mask16x8

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTW, CPU Feature: AVX

func (Int16x8) GreaterEqual ¶

func (x Int16x8) GreaterEqual(y Int16x8) Mask16x8

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Int16x8) InterleaveHi ¶

func (x Int16x8) InterleaveHi(y Int16x8) Int16x8

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX

func (Int16x8) InterleaveLo ¶

func (x Int16x8) InterleaveLo(y Int16x8) Int16x8

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX

func (Int16x8) IsZero ¶

func (x Int16x8) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int16x8) Len ¶

func (x Int16x8) Len() int

Len returns the number of elements in an Int16x8.

func (Int16x8) Less ¶

func (x Int16x8) Less(y Int16x8) Mask16x8

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Int16x8) LessEqual ¶

func (x Int16x8) LessEqual(y Int16x8) Mask16x8

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Int16x8) Masked ¶

func (x Int16x8) Masked(mask Mask16x8) Int16x8

Masked returns x but with elements zeroed where mask is false.

func (Int16x8) Max ¶

func (x Int16x8) Max(y Int16x8) Int16x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSW, CPU Feature: AVX

func (Int16x8) Merge ¶

func (x Int16x8) Merge(y Int16x8, mask Mask16x8) Int16x8

Merge returns x but with elements set to y where mask is false.

func (Int16x8) Min ¶

func (x Int16x8) Min(y Int16x8) Int16x8

Min computes the minimum of corresponding elements.

Asm: VPMINSW, CPU Feature: AVX

func (Int16x8) Mul ¶

func (x Int16x8) Mul(y Int16x8) Int16x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX

func (Int16x8) MulHigh ¶

func (x Int16x8) MulHigh(y Int16x8) Int16x8

MulHigh multiplies elements and stores the high part of the result.

Asm: VPMULHW, CPU Feature: AVX
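
Mul and MulHigh return the two halves of the same 32-bit signed product. A per-element sketch in plain Go, using the hypothetical helper mulHighModel:

```go
package main

// mulHighModel returns the upper 16 bits of the 32-bit signed product of
// a and b, the counterpart of the low 16 bits kept by an ordinary int16
// multiplication.
func mulHighModel(a, b int16) int16 {
	return int16((int32(a) * int32(b)) >> 16)
}
```

For 1000 * 1000 = 1000000 the high half is 15 and the low half, as produced by a wrapping int16 multiply, is 16960, since 15*65536 + 16960 = 1000000.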

func (Int16x8) Not ¶

func (x Int16x8) Not() Int16x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Int16x8) NotEqual ¶

func (x Int16x8) NotEqual(y Int16x8) Mask16x8

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Int16x8) OnesCount ¶

func (x Int16x8) OnesCount() Int16x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Int16x8) Or ¶

func (x Int16x8) Or(y Int16x8) Int16x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int16x8) Permute ¶

func (x Int16x8) Permute(indices Uint16x8) Int16x8

Permute performs a full permutation of vector x using indices:

result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}

The low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Int16x8) PermuteScalarsHi ¶

func (x Int16x8) PermuteScalarsHi(a, b, c, d uint8) Int16x8

PermuteScalarsHi performs a permutation of vector x using the supplied indices:

result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Int16x8) PermuteScalarsLo ¶

func (x Int16x8) PermuteScalarsLo(a, b, c, d uint8) Int16x8

PermuteScalarsLo performs a permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Int16x8) SaturateToInt8 ¶

func (x Int16x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x8) SaturateToUint8 ¶

func (x Int16x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSWB, CPU Feature: AVX512

func (Int16x8) SetElem ¶

func (x Int16x8) SetElem(index uint8, y int16) Int16x8

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRW, CPU Feature: AVX

func (Int16x8) ShiftAllLeft ¶

func (x Int16x8) ShiftAllLeft(y uint64) Int16x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX

func (Int16x8) ShiftAllLeftConcat ¶

func (x Int16x8) ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Int16x8) ShiftAllRight ¶

func (x Int16x8) ShiftAllRight(y uint64) Int16x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAW, CPU Feature: AVX

func (Int16x8) ShiftAllRightConcat ¶

func (x Int16x8) ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Int16x8) ShiftLeft ¶

func (x Int16x8) ShiftLeft(y Int16x8) Int16x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Int16x8) ShiftLeftConcat ¶

func (x Int16x8) ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Int16x8) ShiftRight ¶

func (x Int16x8) ShiftRight(y Int16x8) Int16x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVW, CPU Feature: AVX512

func (Int16x8) ShiftRightConcat ¶

func (x Int16x8) ShiftRightConcat(y Int16x8, z Int16x8) Int16x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Int16x8) Store ¶

func (x Int16x8) Store(y *[8]int16)

Store stores an Int16x8 to an array.

func (Int16x8) StoreSlice ¶

func (x Int16x8) StoreSlice(s []int16)

StoreSlice stores x into a slice of at least 8 int16s

func (Int16x8) StoreSlicePart ¶

func (x Int16x8) StoreSlicePart(s []int16)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Int16x8) String ¶

func (x Int16x8) String() string

String returns a string representation of SIMD vector x

func (Int16x8) Sub ¶

func (x Int16x8) Sub(y Int16x8) Int16x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX

func (Int16x8) SubPairs ¶

func (x Int16x8) SubPairs(y Int16x8) Int16x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX

func (Int16x8) SubPairsSaturated ¶

func (x Int16x8) SubPairsSaturated(y Int16x8) Int16x8

SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBSW, CPU Feature: AVX

func (Int16x8) SubSaturated ¶

func (x Int16x8) SubSaturated(y Int16x8) Int16x8

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSW, CPU Feature: AVX

func (Int16x8) ToMask ¶

func (from Int16x8) ToMask() (to Mask16x8)

ToMask converts from Int16x8 to Mask16x8. A mask element is set to true when the corresponding vector element is non-zero.

func (Int16x8) TruncateToInt8 ¶

func (x Int16x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVWB, CPU Feature: AVX512

func (Int16x8) Xor ¶

func (x Int16x8) Xor(y Int16x8) Int16x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int32x16 ¶

type Int32x16 struct {
	// contains filtered or unexported fields
}

Int32x16 is a 512-bit SIMD vector of 16 int32

func BroadcastInt32x16 ¶

func BroadcastInt32x16(x int32) Int32x16

BroadcastInt32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadInt32x16 ¶

func LoadInt32x16(y *[16]int32) Int32x16

LoadInt32x16 loads an Int32x16 from an array.

func LoadInt32x16Slice ¶

func LoadInt32x16Slice(s []int32) Int32x16

LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s

func LoadInt32x16SlicePart ¶

func LoadInt32x16SlicePart(s []int32) Int32x16

LoadInt32x16SlicePart loads an Int32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.

func LoadMaskedInt32x16 ¶

func LoadMaskedInt32x16(y *[16]int32, mask Mask32x16) Int32x16

LoadMaskedInt32x16 loads an Int32x16 from an array, at those elements enabled by mask.

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func (Int32x16) Abs ¶

func (x Int32x16) Abs() Int32x16

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX512

func (Int32x16) Add ¶

func (x Int32x16) Add(y Int32x16) Int32x16

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX512

func (Int32x16) And ¶

func (x Int32x16) And(y Int32x16) Int32x16

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int32x16) AndNot ¶

func (x Int32x16) AndNot(y Int32x16) Int32x16

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int32x16) AsFloat32x16 ¶

func (from Int32x16) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Int32x16 to Float32x16.

func (Int32x16) AsFloat64x8 ¶

func (from Int32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Int32x16 to Float64x8.

func (Int32x16) AsInt16x32 ¶

func (from Int32x16) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Int32x16 to Int16x32.

func (Int32x16) AsInt64x8 ¶

func (from Int32x16) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Int32x16 to Int64x8.

func (Int32x16) AsInt8x64 ¶

func (from Int32x16) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Int32x16 to Int8x64.

func (Int32x16) AsUint16x32 ¶

func (from Int32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Int32x16 to Uint16x32.

func (Int32x16) AsUint32x16 ¶

func (from Int32x16) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Int32x16 to Uint32x16.

func (Int32x16) AsUint64x8 ¶

func (from Int32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Int32x16 to Uint64x8.

func (Int32x16) AsUint8x64 ¶

func (from Int32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Int32x16 to Uint8x64.

func (Int32x16) Compress ¶

func (x Int32x16) Compress(mask Mask32x16) Int32x16

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Int32x16) ConcatPermute ¶

func (x Int32x16) ConcatPermute(y Int32x16, indices Uint32x16) Int32x16

ConcatPermute performs a full permutation of vectors x and y using indices:

result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512

func (Int32x16) ConvertToFloat32 ¶

func (x Int32x16) ConvertToFloat32() Float32x16

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX512

func (Int32x16) Equal ¶

func (x Int32x16) Equal(y Int32x16) Mask32x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX512

func (Int32x16) Expand ¶

func (x Int32x16) Expand(mask Mask32x16) Int32x16

Expand performs an expansion on a vector x whose elements are packed into its lower lanes. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Int32x16) GetHi ¶

func (x Int32x16) GetHi() Int32x8

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int32x16) GetLo ¶

func (x Int32x16) GetLo() Int32x8

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int32x16) Greater ¶

func (x Int32x16) Greater(y Int32x16) Mask32x16

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTD, CPU Feature: AVX512

func (Int32x16) GreaterEqual ¶

func (x Int32x16) GreaterEqual(y Int32x16) Mask32x16

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) InterleaveHiGrouped ¶

func (x Int32x16) InterleaveHiGrouped(y Int32x16) Int32x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX512

func (Int32x16) InterleaveLoGrouped ¶

func (x Int32x16) InterleaveLoGrouped(y Int32x16) Int32x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX512

func (Int32x16) LeadingZeros ¶

func (x Int32x16) LeadingZeros() Int32x16

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Int32x16) Len ¶

func (x Int32x16) Len() int

Len returns the number of elements in an Int32x16.

func (Int32x16) Less ¶

func (x Int32x16) Less(y Int32x16) Mask32x16

Less returns x less-than y, elementwise.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) LessEqual ¶

func (x Int32x16) LessEqual(y Int32x16) Mask32x16

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) Masked ¶

func (x Int32x16) Masked(mask Mask32x16) Int32x16

Masked returns x but with elements zeroed where mask is false.

func (Int32x16) Max ¶

func (x Int32x16) Max(y Int32x16) Int32x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX512

func (Int32x16) Merge ¶

func (x Int32x16) Merge(y Int32x16, mask Mask32x16) Int32x16

Merge returns x but with elements set to y where mask is false.

func (Int32x16) Min ¶

func (x Int32x16) Min(y Int32x16) Int32x16

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX512

func (Int32x16) Mul ¶

func (x Int32x16) Mul(y Int32x16) Int32x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX512

func (Int32x16) Not ¶

func (x Int32x16) Not() Int32x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Int32x16) NotEqual ¶

func (x Int32x16) NotEqual(y Int32x16) Mask32x16

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPD, CPU Feature: AVX512

func (Int32x16) OnesCount ¶

func (x Int32x16) OnesCount() Int32x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x16) Or ¶

func (x Int32x16) Or(y Int32x16) Int32x16

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int32x16) Permute ¶

func (x Int32x16) Permute(indices Uint32x16) Int32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX512
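As a scalar sketch of the indexing rule described for Permute (plain Go, not the archsimd API; the function name permute and the array types are assumptions for illustration):

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Int32x16.Permute:
// lane i of the result is x[indices[i] & 15], so only the low
// 4 bits of each index are significant.
func permute(x [16]int32, indices [16]uint32) [16]int32 {
	var r [16]int32
	for i, idx := range indices {
		r[i] = x[idx&15]
	}
	return r
}

func main() {
	var x [16]int32
	var rev [16]uint32
	for i := range x {
		x[i] = int32(10 * i)
		rev[i] = uint32(15 - i) // reverse the lanes
	}
	fmt.Println(permute(x, rev))
}
```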

func (Int32x16) PermuteScalarsGrouped ¶

func (x Int32x16) PermuteScalarsGrouped(a, b, c, d uint8) Int32x16

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

	result = {x[a],    x[b],    x[c],    x[d],
	          x[a+4],  x[b+4],  x[c+4],  x[d+4],
	          x[a+8],  x[b+8],  x[c+8],  x[d+8],
	          x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX512
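The grouped form applies the same four selectors inside every 128-bit group. A scalar sketch in plain Go (not the archsimd API; permuteScalarsGrouped is a hypothetical name, and masking the selectors with 3 is an assumption made only to keep the model in bounds):

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model: within each
// group of four int32 lanes, output lane j takes the lane chosen by
// the j-th selector from the same group.
func permuteScalarsGrouped(x [16]int32, a, b, c, d uint8) [16]int32 {
	var r [16]int32
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 16; g += 4 { // g is the first lane of a 128-bit group
		for j, s := range sel {
			r[g+j] = x[g+int(s&3)] // &3 is an assumption, see lead-in
		}
	}
	return r
}

func main() {
	var x [16]int32
	for i := range x {
		x[i] = int32(i)
	}
	// Reverse the four lanes inside each group.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
}
```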

func (Int32x16) RotateAllLeft ¶

func (x Int32x16) RotateAllLeft(shift uint8) Int32x16

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512
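A per-lane rotate can be modeled with math/bits. The sketch below is plain Go rather than the archsimd API, and rotateAllLeft is a hypothetical name; it treats each int32 lane as its 32-bit pattern.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft is a hypothetical scalar model: every lane is rotated
// left by the same count. bits.RotateLeft32 reduces the count modulo 32.
func rotateAllLeft(x [16]int32, shift uint8) [16]int32 {
	var r [16]int32
	for i, v := range x {
		r[i] = int32(bits.RotateLeft32(uint32(v), int(shift)))
	}
	return r
}

func main() {
	var x [16]int32
	x[0] = int32(-1 << 31) // only the top bit set
	x[1] = 1
	// The top bit of lane 0 wraps around to bit 0.
	fmt.Println(rotateAllLeft(x, 1))
}
```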

func (Int32x16) RotateAllRight ¶

func (x Int32x16) RotateAllRight(shift uint8) Int32x16

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x16) RotateLeft ¶

func (x Int32x16) RotateLeft(y Int32x16) Int32x16

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x16) RotateRight ¶

func (x Int32x16) RotateRight(y Int32x16) Int32x16

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x16) SaturateToInt16 ¶

func (x Int32x16) SaturateToInt16() Int16x16

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512
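Saturation means out-of-range values are clamped to the nearest representable int16 rather than wrapped. A scalar sketch in plain Go (not the archsimd API; saturateToInt16 is a hypothetical name):

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt16 is a hypothetical scalar model: each int32 lane is
// clamped to the range [math.MinInt16, math.MaxInt16] before narrowing.
func saturateToInt16(x [16]int32) [16]int16 {
	var r [16]int16
	for i, v := range x {
		switch {
		case v > math.MaxInt16:
			r[i] = math.MaxInt16
		case v < math.MinInt16:
			r[i] = math.MinInt16
		default:
			r[i] = int16(v)
		}
	}
	return r
}

func main() {
	var x [16]int32
	x[0], x[1], x[2] = 70000, -70000, 123
	fmt.Println(saturateToInt16(x))
}
```

A plain Go conversion int16(v) would instead keep only the low 16 bits, which corresponds to the truncating variants.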

func (Int32x16) SaturateToInt16Concat ¶

func (x Int32x16) SaturateToInt16Concat(y Int32x16) Int16x32

SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.

Asm: VPACKSSDW, CPU Feature: AVX512
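The grouped packing order can be hard to picture from the prose. The following scalar sketch in plain Go (not the archsimd API) models it under the assumption that the receiver x is the "first input vector" and y the second; the names sat16 and saturateToInt16Concat are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sat16 clamps v to the int16 range.
func sat16(v int32) int16 {
	if v > math.MaxInt16 {
		return math.MaxInt16
	}
	if v < math.MinInt16 {
		return math.MinInt16
	}
	return int16(v)
}

// saturateToInt16Concat is a hypothetical scalar model. For each 128-bit
// group g, the four lanes of x fill the lower half of the result group
// and the four lanes of y fill the upper half.
func saturateToInt16Concat(x, y [16]int32) [32]int16 {
	var r [32]int16
	for g := 0; g < 4; g++ {
		for j := 0; j < 4; j++ {
			r[8*g+j] = sat16(x[4*g+j])
			r[8*g+4+j] = sat16(y[4*g+j])
		}
	}
	return r
}

func main() {
	var x, y [16]int32
	for i := range x {
		x[i] = int32(i)
		y[i] = int32(100 + i)
	}
	fmt.Println(saturateToInt16Concat(x, y))
}
```

Note that the result is interleaved per group, so it is not simply all of x followed by all of y.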

func (Int32x16) SaturateToInt8 ¶

func (x Int32x16) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x16) SaturateToUint8 ¶

func (x Int32x16) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x16) SelectFromPairGrouped ¶

func (x Int32x16) SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX512

func (Int32x16) SetHi ¶

func (x Int32x16) SetHi(y Int32x8) Int32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int32x16) SetLo ¶

func (x Int32x16) SetLo(y Int32x8) Int32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int32x16) ShiftAllLeft ¶

func (x Int32x16) ShiftAllLeft(y uint64) Int32x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX512

func (Int32x16) ShiftAllLeftConcat ¶

func (x Int32x16) ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2
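The concatenating shift is a per-lane funnel shift. A scalar sketch in plain Go (not the archsimd API; shiftAllLeftConcat is a hypothetical name) of the behavior described above:

```go
package main

import "fmt"

// shiftAllLeftConcat is a hypothetical scalar model: each lane of x is
// shifted left by s = shift & 31, and the vacated low bits are filled
// with the top s bits of the matching lane of y.
func shiftAllLeftConcat(x [16]int32, shift uint8, y [16]int32) [16]int32 {
	s := uint(shift & 31) // only the lower 5 bits are used
	var r [16]int32
	for i := range x {
		hi := uint32(x[i]) << s
		// For s == 0 this shifts by 32, which Go defines as 0 for
		// unsigned operands, so the lane of x is returned unchanged.
		lo := uint32(y[i]) >> (32 - s)
		r[i] = int32(hi | lo)
	}
	return r
}

func main() {
	var x, y [16]int32
	x[0] = 1
	y[0] = int32(-1 << 31) // top bit set
	fmt.Println(shiftAllLeftConcat(x, 4, y)[0])
}
```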

func (Int32x16) ShiftAllRight ¶

func (x Int32x16) ShiftAllRight(y uint64) Int32x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX512

func (Int32x16) ShiftAllRightConcat ¶

func (x Int32x16) ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Int32x16) ShiftLeft ¶

func (x Int32x16) ShiftLeft(y Int32x16) Int32x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX512

func (Int32x16) ShiftLeftConcat ¶

func (x Int32x16) ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x16) ShiftRight ¶

func (x Int32x16) ShiftRight(y Int32x16) Int32x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX512

func (Int32x16) ShiftRightConcat ¶

func (x Int32x16) ShiftRightConcat(y Int32x16, z Int32x16) Int32x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x16) Store ¶

func (x Int32x16) Store(y *[16]int32)

Store stores an Int32x16 to an array.

func (Int32x16) StoreMasked ¶

func (x Int32x16) StoreMasked(y *[16]int32, mask Mask32x16)

StoreMasked stores an Int32x16 to an array, at those elements enabled by mask.

Asm: VMOVDQU32, CPU Feature: AVX512

func (Int32x16) StoreSlice ¶

func (x Int32x16) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 16 int32s

func (Int32x16) StoreSlicePart ¶

func (x Int32x16) StoreSlicePart(s []int32)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
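The "as many elements as will fit" rule matches what Go's built-in copy does. A scalar sketch in plain Go (not the archsimd API; storeSlicePart here is a hypothetical free function standing in for the method):

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model: it writes the leading
// min(len(s), 16) lanes of x into s and leaves the rest of s untouched.
func storeSlicePart(x [16]int32, s []int32) {
	copy(s, x[:])
}

func main() {
	var x [16]int32
	for i := range x {
		x[i] = int32(i + 1)
	}
	short := make([]int32, 5)
	storeSlicePart(x, short) // only the first 5 lanes fit
	fmt.Println(short)
}
```

This kind of partial store is typically useful for the tail of a slice whose length is not a multiple of the vector width.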

func (Int32x16) String ¶

func (x Int32x16) String() string

String returns a string representation of SIMD vector x

func (Int32x16) Sub ¶

func (x Int32x16) Sub(y Int32x16) Int32x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX512

func (Int32x16) ToMask ¶

func (from Int32x16) ToMask() (to Mask32x16)

ToMask converts from Int32x16 to Mask32x16, mask element is set to true when the corresponding vector element is non-zero.

func (Int32x16) TruncateToInt16 ¶

func (x Int32x16) TruncateToInt16() Int16x16

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Int32x16) TruncateToInt8 ¶

func (x Int32x16) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVDB, CPU Feature: AVX512

func (Int32x16) Xor ¶

func (x Int32x16) Xor(y Int32x16) Int32x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Int32x4 ¶

type Int32x4 struct {
	// contains filtered or unexported fields
}

Int32x4 is a 128-bit SIMD vector of 4 int32

func BroadcastInt32x4 ¶

func BroadcastInt32x4(x int32) Int32x4

BroadcastInt32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt32x4 ¶

func LoadInt32x4(y *[4]int32) Int32x4

LoadInt32x4 loads an Int32x4 from an array.

func LoadInt32x4Slice ¶

func LoadInt32x4Slice(s []int32) Int32x4

LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s

func LoadInt32x4SlicePart ¶

func LoadInt32x4SlicePart(s []int32) Int32x4

LoadInt32x4SlicePart loads an Int32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.
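The zero-fill behavior for short slices can be modeled in plain Go as follows. This is a scalar sketch, not the archsimd API; loadSlicePart is a hypothetical name and [4]int32 stands in for the vector.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model: it copies up to 4
// elements from s and leaves any remaining lanes at zero.
func loadSlicePart(s []int32) [4]int32 {
	var r [4]int32
	copy(r[:], s)
	return r
}

func main() {
	fmt.Println(loadSlicePart([]int32{7, 8}))             // short slice, zero-filled
	fmt.Println(loadSlicePart([]int32{1, 2, 3, 4, 5, 6})) // extra elements ignored
}
```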

func LoadMaskedInt32x4 ¶

func LoadMaskedInt32x4(y *[4]int32, mask Mask32x4) Int32x4

LoadMaskedInt32x4 loads an Int32x4 from an array, at those elements enabled by mask.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x4) Abs ¶

func (x Int32x4) Abs() Int32x4

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX

func (Int32x4) Add ¶

func (x Int32x4) Add(y Int32x4) Int32x4

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX

func (Int32x4) AddPairs ¶

func (x Int32x4) AddPairs(y Int32x4) Int32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX

func (Int32x4) And ¶

func (x Int32x4) And(y Int32x4) Int32x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int32x4) AndNot ¶

func (x Int32x4) AndNot(y Int32x4) Int32x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int32x4) AsFloat32x4 ¶

func (from Int32x4) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Int32x4 to Float32x4

func (Int32x4) AsFloat64x2 ¶

func (from Int32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Int32x4 to Float64x2

func (Int32x4) AsInt16x8 ¶

func (from Int32x4) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Int32x4 to Int16x8

func (Int32x4) AsInt64x2 ¶

func (from Int32x4) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Int32x4 to Int64x2

func (Int32x4) AsInt8x16 ¶

func (from Int32x4) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Int32x4 to Int8x16

func (Int32x4) AsUint16x8 ¶

func (from Int32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Int32x4 to Uint16x8

func (Int32x4) AsUint32x4 ¶

func (from Int32x4) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Int32x4 to Uint32x4

func (Int32x4) AsUint64x2 ¶

func (from Int32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Int32x4 to Uint64x2

func (Int32x4) AsUint8x16 ¶

func (from Int32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Int32x4 to Uint8x16

func (Int32x4) Broadcast128 ¶

func (x Int32x4) Broadcast128() Int32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Int32x4) Broadcast256 ¶

func (x Int32x4) Broadcast256() Int32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Int32x4) Broadcast512 ¶

func (x Int32x4) Broadcast512() Int32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX512

func (Int32x4) Compress ¶

func (x Int32x4) Compress(mask Mask32x4) Int32x4

Compress performs a compression on vector x using mask by selecting elements as indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSD, CPU Feature: AVX512
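Compression gathers the selected lanes toward index 0 while preserving their order. A scalar sketch in plain Go (not the archsimd API; compress is a hypothetical name, [4]bool stands in for Mask32x4, and leaving the unused upper lanes at zero is an assumption, since the text above does not specify them):

```go
package main

import "fmt"

// compress is a hypothetical scalar model: lanes of x whose mask bit is
// set are packed, in order, into the lowest lanes of the result.
func compress(x [4]int32, mask [4]bool) [4]int32 {
	var r [4]int32 // lanes beyond the packed ones stay zero (assumption)
	n := 0
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	x := [4]int32{10, 20, 30, 40}
	mask := [4]bool{false, true, false, true}
	fmt.Println(compress(x, mask))
}
```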

func (Int32x4) ConcatPermute ¶

func (x Int32x4) ConcatPermute(y Int32x4, indices Uint32x4) Int32x4

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2D, CPU Feature: AVX512
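For the 4-lane case, the concatenation xy has 8 elements, so 3 index bits are needed. A scalar sketch in plain Go (not the archsimd API; concatPermute is a hypothetical name):

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model: indices 0-3 select from
// x and indices 4-7 select from y. Only the low 3 bits of each index
// are used, because xy has 8 elements.
func concatPermute(x, y [4]int32, indices [4]uint32) [4]int32 {
	xy := [8]int32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	var r [4]int32
	for i, idx := range indices {
		r[i] = xy[idx&7]
	}
	return r
}

func main() {
	x := [4]int32{1, 2, 3, 4}
	y := [4]int32{5, 6, 7, 8}
	fmt.Println(concatPermute(x, y, [4]uint32{0, 4, 3, 7}))
}
```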

func (Int32x4) ConvertToFloat32 ¶

func (x Int32x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX

func (Int32x4) ConvertToFloat64 ¶

func (x Int32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTDQ2PD, CPU Feature: AVX

func (Int32x4) CopySign ¶

func (x Int32x4) CopySign(y Int32x4) Int32x4

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGND, CPU Feature: AVX
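Put differently, each lane of x is negated, zeroed, or kept according to the sign of the matching lane of y. A scalar sketch in plain Go (not the archsimd API; copySign is a hypothetical name):

```go
package main

import "fmt"

// copySign is a hypothetical scalar model: the result lane is -x when
// y is negative, 0 when y is zero, and x when y is positive.
func copySign(x, y [4]int32) [4]int32 {
	var r [4]int32
	for i := range x {
		switch {
		case y[i] < 0:
			r[i] = -x[i]
		case y[i] > 0:
			r[i] = x[i]
		}
		// y[i] == 0 leaves r[i] at zero.
	}
	return r
}

func main() {
	x := [4]int32{5, -5, 7, 9}
	y := [4]int32{-1, -3, 0, 2}
	fmt.Println(copySign(x, y))
}
```

Note that, unlike math.Copysign, the result does not take the sign of y unconditionally: a negative x with a negative y yields a positive lane.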

func (Int32x4) Equal ¶

func (x Int32x4) Equal(y Int32x4) Mask32x4

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX

func (Int32x4) Expand ¶

func (x Int32x4) Expand(mask Mask32x4) Int32x4

Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.

Asm: VPEXPANDD, CPU Feature: AVX512
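Expansion is the inverse of compression: consecutive low lanes of x are scattered to the positions where mask is set. A scalar sketch in plain Go (not the archsimd API; expand is a hypothetical name, [4]bool stands in for Mask32x4, and zeroing the lanes where mask is false is an assumption not stated above):

```go
package main

import "fmt"

// expand is a hypothetical scalar model: the n-th set position of mask
// receives x[n]; positions where mask is false stay zero (assumption).
func expand(x [4]int32, mask [4]bool) [4]int32 {
	var r [4]int32
	n := 0
	for i, on := range mask {
		if on {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [4]int32{10, 20, 30, 40}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expand(x, mask))
}
```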

func (Int32x4) ExtendLo2ToInt64x2 ¶

func (x Int32x4) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX

func (Int32x4) ExtendToInt64 ¶

func (x Int32x4) ExtendToInt64() Int64x4

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX2

func (Int32x4) GetElem ¶

func (x Int32x4) GetElem(index uint8) int32

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Int32x4) Greater ¶

func (x Int32x4) Greater(y Int32x4) Mask32x4

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTD, CPU Feature: AVX

func (Int32x4) GreaterEqual ¶

func (x Int32x4) GreaterEqual(y Int32x4) Mask32x4

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Int32x4) InterleaveHi ¶

func (x Int32x4) InterleaveHi(y Int32x4) Int32x4

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX

func (Int32x4) InterleaveLo ¶

func (x Int32x4) InterleaveLo(y Int32x4) Int32x4

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX

func (Int32x4) IsZero ¶

func (x Int32x4) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int32x4) LeadingZeros ¶

func (x Int32x4) LeadingZeros() Int32x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Int32x4) Len ¶

func (x Int32x4) Len() int

Len returns the number of elements in an Int32x4.

func (Int32x4) Less ¶

func (x Int32x4) Less(y Int32x4) Mask32x4

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Int32x4) LessEqual ¶

func (x Int32x4) LessEqual(y Int32x4) Mask32x4

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Int32x4) Masked ¶

func (x Int32x4) Masked(mask Mask32x4) Int32x4

Masked returns x but with elements zeroed where mask is false.

func (Int32x4) Max ¶

func (x Int32x4) Max(y Int32x4) Int32x4

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX

func (Int32x4) Merge ¶

func (x Int32x4) Merge(y Int32x4, mask Mask32x4) Int32x4

Merge returns x but with elements set to y where mask is false.

func (Int32x4) Min ¶

func (x Int32x4) Min(y Int32x4) Int32x4

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX

func (Int32x4) Mul ¶

func (x Int32x4) Mul(y Int32x4) Int32x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX

func (Int32x4) MulEvenWiden ¶

func (x Int32x4) MulEvenWiden(y Int32x4) Int64x2

MulEvenWiden multiplies even-indexed elements, widening the result: result[i] = int64(x[2*i]) * int64(y[2*i]).

Asm: VPMULDQ, CPU Feature: AVX
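Because the product is widened to 64 bits, it cannot overflow the way a 32-bit multiply can. A scalar sketch in plain Go (not the archsimd API; mulEvenWiden is a hypothetical name):

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: lanes 0 and 2 of x and y
// are sign-extended to int64 and multiplied; lanes 1 and 3 are ignored.
func mulEvenWiden(x, y [4]int32) [2]int64 {
	return [2]int64{
		int64(x[0]) * int64(y[0]),
		int64(x[2]) * int64(y[2]),
	}
}

func main() {
	x := [4]int32{100000, 9, -3, 9}
	y := [4]int32{100000, 9, 7, 9}
	// 100000 * 100000 does not fit in int32 but fits in int64.
	fmt.Println(mulEvenWiden(x, y))
}
```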

func (Int32x4) Not ¶

func (x Int32x4) Not() Int32x4

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Int32x4) NotEqual ¶

func (x Int32x4) NotEqual(y Int32x4) Mask32x4

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Int32x4) OnesCount ¶

func (x Int32x4) OnesCount() Int32x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x4) Or ¶

func (x Int32x4) Or(y Int32x4) Int32x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int32x4) PermuteScalars ¶

func (x Int32x4) PermuteScalars(a, b, c, d uint8) Int32x4

PermuteScalars performs a permutation of vector x's elements using the supplied indices:

result = {x[a], x[b], x[c], x[d]}

Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX

func (Int32x4) RotateAllLeft ¶

func (x Int32x4) RotateAllLeft(shift uint8) Int32x4

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Int32x4) RotateAllRight ¶

func (x Int32x4) RotateAllRight(shift uint8) Int32x4

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x4) RotateLeft ¶

func (x Int32x4) RotateLeft(y Int32x4) Int32x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x4) RotateRight ¶

func (x Int32x4) RotateRight(y Int32x4) Int32x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x4) SaturateToInt16 ¶

func (x Int32x4) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512

func (Int32x4) SaturateToInt16Concat ¶

func (x Int32x4) SaturateToInt16Concat(y Int32x4) Int16x8

SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.

Asm: VPACKSSDW, CPU Feature: AVX

func (Int32x4) SaturateToInt8 ¶

func (x Int32x4) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x4) SaturateToUint8 ¶

func (x Int32x4) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x4) SelectFromPair ¶

func (x Int32x4) SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4

SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
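The selector numbering can be checked against the documented example with a scalar model. The sketch below is plain Go, not the archsimd API; selectFromPair is a hypothetical free function, and masking selectors with 7 is an assumption made only to keep the model in bounds.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model: selectors 0-3 pick
// lanes of x, selectors 4-7 pick lanes 0-3 of y.
func selectFromPair(x [4]int32, a, b, c, d uint8, y [4]int32) [4]int32 {
	xy := [8]int32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	return [4]int32{xy[a&7], xy[b&7], xy[c&7], xy[d&7]}
}

func main() {
	x := [4]int32{1, 2, 4, 8}
	y := [4]int32{9, 25, 49, 81}
	// Same inputs as the example in the description above.
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y))
}
```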

func (Int32x4) SetElem ¶

func (x Int32x4) SetElem(index uint8, y int32) Int32x4

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Int32x4) ShiftAllLeft ¶

func (x Int32x4) ShiftAllLeft(y uint64) Int32x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX

func (Int32x4) ShiftAllLeftConcat ¶

func (x Int32x4) ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Int32x4) ShiftAllRight ¶

func (x Int32x4) ShiftAllRight(y uint64) Int32x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX

func (Int32x4) ShiftAllRightConcat ¶

func (x Int32x4) ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Int32x4) ShiftLeft ¶

func (x Int32x4) ShiftLeft(y Int32x4) Int32x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Int32x4) ShiftLeftConcat ¶

func (x Int32x4) ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x4) ShiftRight ¶

func (x Int32x4) ShiftRight(y Int32x4) Int32x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX2

func (Int32x4) ShiftRightConcat ¶

func (x Int32x4) ShiftRightConcat(y Int32x4, z Int32x4) Int32x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x4) Store ¶

func (x Int32x4) Store(y *[4]int32)

Store stores an Int32x4 to an array.

func (Int32x4) StoreMasked ¶

func (x Int32x4) StoreMasked(y *[4]int32, mask Mask32x4)

StoreMasked stores an Int32x4 to an array, at those elements enabled by mask.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x4) StoreSlice ¶

func (x Int32x4) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 4 int32s

func (Int32x4) StoreSlicePart ¶

func (x Int32x4) StoreSlicePart(s []int32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Int32x4) String ¶

func (x Int32x4) String() string

String returns a string representation of SIMD vector x

func (Int32x4) Sub ¶

func (x Int32x4) Sub(y Int32x4) Int32x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX

func (Int32x4) SubPairs ¶

func (x Int32x4) SubPairs(y Int32x4) Int32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX

func (Int32x4) ToMask ¶

func (from Int32x4) ToMask() (to Mask32x4)

ToMask converts from Int32x4 to Mask32x4, mask element is set to true when the corresponding vector element is non-zero.

func (Int32x4) TruncateToInt16 ¶

func (x Int32x4) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Int32x4) TruncateToInt8 ¶

func (x Int32x4) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVDB, CPU Feature: AVX512

func (Int32x4) Xor ¶

func (x Int32x4) Xor(y Int32x4) Int32x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int32x8 ¶

type Int32x8 struct {
	// contains filtered or unexported fields
}

Int32x8 is a 256-bit SIMD vector of 8 int32

func BroadcastInt32x8 ¶

func BroadcastInt32x8(x int32) Int32x8

BroadcastInt32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt32x8 ¶

func LoadInt32x8(y *[8]int32) Int32x8

LoadInt32x8 loads an Int32x8 from an array.

func LoadInt32x8Slice ¶

func LoadInt32x8Slice(s []int32) Int32x8

LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s

func LoadInt32x8SlicePart ¶

func LoadInt32x8SlicePart(s []int32) Int32x8

LoadInt32x8SlicePart loads an Int32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.

func LoadMaskedInt32x8 ¶

func LoadMaskedInt32x8(y *[8]int32, mask Mask32x8) Int32x8

LoadMaskedInt32x8 loads an Int32x8 from an array, at those elements enabled by mask.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x8) Abs ¶

func (x Int32x8) Abs() Int32x8

Abs computes the absolute value of each element.

Asm: VPABSD, CPU Feature: AVX2

func (Int32x8) Add ¶

func (x Int32x8) Add(y Int32x8) Int32x8

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX2

func (Int32x8) AddPairs ¶

func (x Int32x8) AddPairs(y Int32x8) Int32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX2

func (Int32x8) And ¶

func (x Int32x8) And(y Int32x8) Int32x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int32x8) AndNot ¶

func (x Int32x8) AndNot(y Int32x8) Int32x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int32x8) AsFloat32x8 ¶

func (from Int32x8) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Int32x8 to Float32x8

func (Int32x8) AsFloat64x4 ¶

func (from Int32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Int32x8 to Float64x4

func (Int32x8) AsInt16x16 ¶

func (from Int32x8) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Int32x8 to Int16x16

func (Int32x8) AsInt64x4 ¶

func (from Int32x8) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Int32x8 to Int64x4

func (Int32x8) AsInt8x32 ¶

func (from Int32x8) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Int32x8 to Int8x32

func (Int32x8) AsUint16x16 ¶

func (from Int32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Int32x8 to Uint16x16

func (Int32x8) AsUint32x8 ¶

func (from Int32x8) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Int32x8 to Uint32x8

func (Int32x8) AsUint64x4 ¶

func (from Int32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Int32x8 to Uint64x4

func (Int32x8) AsUint8x32 ¶

func (from Int32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Int32x8 to Uint8x32

func (Int32x8) Compress ¶

func (x Int32x8) Compress(mask Mask32x8) Int32x8

Compress performs a compression on vector x using mask by selecting elements as indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Int32x8) ConcatPermute ¶

func (x Int32x8) ConcatPermute(y Int32x8, indices Uint32x8) Int32x8

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2D, CPU Feature: AVX512

func (Int32x8) ConvertToFloat32 ¶

func (x Int32x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTDQ2PS, CPU Feature: AVX

func (Int32x8) ConvertToFloat64 ¶

func (x Int32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTDQ2PD, CPU Feature: AVX512

func (Int32x8) CopySign ¶

func (x Int32x8) CopySign(y Int32x8) Int32x8

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGND, CPU Feature: AVX2

func (Int32x8) Equal ¶

func (x Int32x8) Equal(y Int32x8) Mask32x8

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX2

func (Int32x8) Expand ¶

func (x Int32x8) Expand(mask Mask32x8) Int32x8

Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Int32x8) ExtendToInt64 ¶

func (x Int32x8) ExtendToInt64() Int64x8

ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXDQ, CPU Feature: AVX512

func (Int32x8) GetHi ¶

func (x Int32x8) GetHi() Int32x4

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int32x8) GetLo ¶

func (x Int32x8) GetLo() Int32x4

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int32x8) Greater ¶

func (x Int32x8) Greater(y Int32x8) Mask32x8

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTD, CPU Feature: AVX2

func (Int32x8) GreaterEqual ¶

func (x Int32x8) GreaterEqual(y Int32x8) Mask32x8

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Int32x8) InterleaveHiGrouped ¶

func (x Int32x8) InterleaveHiGrouped(y Int32x8) Int32x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX2

func (Int32x8) InterleaveLoGrouped ¶

func (x Int32x8) InterleaveLoGrouped(y Int32x8) Int32x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX2

func (Int32x8) IsZero ¶

func (x Int32x8) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Int32x8) LeadingZeros ¶

func (x Int32x8) LeadingZeros() Int32x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512
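
For reference, the per-element count can be expressed with math/bits. This is a scalar sketch with a hypothetical helper name, not the package's implementation; it assumes each element is treated as a 32-bit pattern, so a zero element counts 32 leading zeros.

```go
package main

import (
	"fmt"
	"math/bits"
)

// leadingZeros is a hypothetical scalar model that applies
// bits.LeadingZeros32 to the bit pattern of every element.
func leadingZeros(x [8]int32) [8]int32 {
	var out [8]int32
	for i, v := range x {
		out[i] = int32(bits.LeadingZeros32(uint32(v)))
	}
	return out
}

func main() {
	x := [8]int32{0, 1, 2, 255, 256, -1, 1 << 30, -1 << 31}
	fmt.Println(leadingZeros(x))
}
```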

func (Int32x8) Len ¶

func (x Int32x8) Len() int

Len returns the number of elements in an Int32x8.

func (Int32x8) Less ¶

func (x Int32x8) Less(y Int32x8) Mask32x8

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Int32x8) LessEqual ¶

func (x Int32x8) LessEqual(y Int32x8) Mask32x8

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Int32x8) Masked ¶

func (x Int32x8) Masked(mask Mask32x8) Int32x8

Masked returns x but with elements zeroed where mask is false.

func (Int32x8) Max ¶

func (x Int32x8) Max(y Int32x8) Int32x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSD, CPU Feature: AVX2

func (Int32x8) Merge ¶

func (x Int32x8) Merge(y Int32x8, mask Mask32x8) Int32x8

Merge returns x but with elements set to y where mask is false.
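
Masked and Merge differ only in what replaces the elements where mask is false: zero for Masked, the corresponding element of y for Merge. A scalar sketch of both, following the descriptions above (helper names and the [8]bool mask are hypothetical):

```go
package main

import "fmt"

// masked is a hypothetical scalar model: keep x[i] where mask[i] is true,
// zero elsewhere.
func masked(x [8]int32, mask [8]bool) [8]int32 {
	var out [8]int32
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a hypothetical scalar model: keep x[i] where mask[i] is true,
// take y[i] where it is false.
func merge(x, y [8]int32, mask [8]bool) [8]int32 {
	out := y
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int32{-1, -2, -3, -4, -5, -6, -7, -8}
	mask := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked(x, mask))
	fmt.Println(merge(x, y, mask))
}
```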

func (Int32x8) Min ¶

func (x Int32x8) Min(y Int32x8) Int32x8

Min computes the minimum of corresponding elements.

Asm: VPMINSD, CPU Feature: AVX2

func (Int32x8) Mul ¶

func (x Int32x8) Mul(y Int32x8) Int32x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX2

func (Int32x8) MulEvenWiden ¶

func (x Int32x8) MulEvenWiden(y Int32x8) Int64x4

MulEvenWiden multiplies the even-indexed elements of x and y, widening each product to int64: result[i] = int64(x[2*i]) * int64(y[2*i]).

Asm: VPMULDQ, CPU Feature: AVX2
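
A scalar sketch of the widening multiply, assuming "even-indexed" means elements 0, 2, 4, and 6 of each input. The helper name mulEvenWiden is hypothetical. The first element shows why widening matters: the product does not fit in an int32.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: each 64-bit result element
// is the full product of the even-indexed 32-bit elements of x and y.
func mulEvenWiden(x, y [8]int32) [4]int64 {
	var out [4]int64
	for i := range out {
		out[i] = int64(x[2*i]) * int64(y[2*i])
	}
	return out
}

func main() {
	x := [8]int32{1 << 30, 9, -3, 9, 7, 9, -2, 9}
	y := [8]int32{4, 9, 5, 9, -6, 9, -8, 9}
	fmt.Println(mulEvenWiden(x, y))
}
```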

func (Int32x8) Not ¶

func (x Int32x8) Not() Int32x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Int32x8) NotEqual ¶

func (x Int32x8) NotEqual(y Int32x8) Mask32x8

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Int32x8) OnesCount ¶

func (x Int32x8) OnesCount() Int32x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Int32x8) Or ¶

func (x Int32x8) Or(y Int32x8) Int32x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int32x8) Permute ¶

func (x Int32x8) Permute(indices Uint32x8) Int32x8

Permute performs a full permutation of vector x using indices:

result := {x[indices[0]], x[indices[1]], ..., x[indices[7]]}

Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX2

func (Int32x8) PermuteScalarsGrouped ¶

func (x Int32x8) PermuteScalarsGrouped(a, b, c, d uint8) Int32x8

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX2
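
The grouped form applies the same four selectors to each 128-bit group independently. A scalar sketch of the formula above (hypothetical helper name; masking the selectors to 2 bits is an assumption made so the sketch cannot index out of range):

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model of
// result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}.
func permuteScalarsGrouped(x [8]int32, a, b, c, d uint8) [8]int32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]int32
	for g := 0; g < 2; g++ { // two 128-bit groups of four int32s
		for j, s := range sel {
			out[4*g+j] = x[4*g+int(s&3)] // assumption: only 2 bits used
		}
	}
	return out
}

func main() {
	x := [8]int32{0, 10, 20, 30, 40, 50, 60, 70}
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
}
```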

func (Int32x8) RotateAllLeft ¶

func (x Int32x8) RotateAllLeft(shift uint8) Int32x8

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512
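
Per element, the rotation matches what math/bits.RotateLeft32 computes on the element's bit pattern. A scalar sketch with a hypothetical helper name; it assumes the rotate count is taken modulo 32, which is how bits.RotateLeft32 behaves.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft is a hypothetical scalar model: every element is rotated
// left by the same count, treating the element as a 32-bit pattern.
func rotateAllLeft(x [8]int32, shift uint8) [8]int32 {
	var out [8]int32
	for i, v := range x {
		out[i] = int32(bits.RotateLeft32(uint32(v), int(shift)))
	}
	return out
}

func main() {
	x := [8]int32{1, -1 << 31, 0x40000000, 3, 0, 0, 0, 0}
	fmt.Println(rotateAllLeft(x, 1))
}
```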

func (Int32x8) RotateAllRight ¶

func (x Int32x8) RotateAllRight(shift uint8) Int32x8

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Int32x8) RotateLeft ¶

func (x Int32x8) RotateLeft(y Int32x8) Int32x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Int32x8) RotateRight ¶

func (x Int32x8) RotateRight(y Int32x8) Int32x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Int32x8) SaturateToInt16 ¶

func (x Int32x8) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSDW, CPU Feature: AVX512

func (Int32x8) SaturateToInt16Concat ¶

func (x Int32x8) SaturateToInt16Concat(y Int32x8) Int16x16

SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.

Asm: VPACKSSDW, CPU Feature: AVX2

func (Int32x8) SaturateToInt8 ¶

func (x Int32x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x8) SaturateToUint8 ¶

func (x Int32x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSDB, CPU Feature: AVX512

func (Int32x8) Select128FromPair ¶

func (x Int32x8) Select128FromPair(lo, hi uint8, y Int32x8) Int32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
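
The example above can be reproduced with a scalar model that numbers the four 128-bit quarters 0-1 (halves of x) and 2-3 (halves of y). This is an illustrative sketch with a hypothetical helper name, not the package's implementation; it panics on out-of-range selectors, mirroring the "may result in a runtime panic" note.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model: x and y are viewed as
// four 4-element quarters, and the result concatenates quarter lo and
// quarter hi.
func select128FromPair(x [8]int32, lo, hi uint8, y [8]int32) [8]int32 {
	if lo > 3 || hi > 3 {
		panic("select128FromPair: lo and hi must be between 0 and 3")
	}
	var quarters [16]int32
	copy(quarters[:8], x[:])
	copy(quarters[8:], y[:])
	var out [8]int32
	copy(out[:4], quarters[4*int(lo):]) // lower half of the result
	copy(out[4:], quarters[4*int(hi):]) // upper half of the result
	return out
}

func main() {
	x := [8]int32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]int32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y))
}
```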

func (Int32x8) SelectFromPairGrouped ¶

func (x Int32x8) SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,

{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}.

If the selectors are not constant, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
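
A scalar model makes the selector numbering concrete and reproduces the example above. This is a sketch with a hypothetical helper name, not the package's implementation; reducing each selector to its low 3 bits is an assumption made to keep the sketch in range.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model: within each
// 128-bit group, selectors 0-3 pick from x and selectors 4-7 pick
// elements 0-3 of y.
func selectFromPairGrouped(x [8]int32, a, b, c, d uint8, y [8]int32) [8]int32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]int32
	for g := 0; g < 2; g++ {
		base := 4 * g
		for j, s := range sel {
			s &= 7 // assumption: only the low 3 bits matter
			if s < 4 {
				out[base+j] = x[base+int(s)]
			} else {
				out[base+j] = y[base+int(s-4)]
			}
		}
	}
	return out
}

func main() {
	x := [8]int32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]int32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y))
}
```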

func (Int32x8) SetHi ¶

func (x Int32x8) SetHi(y Int32x4) Int32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int32x8) SetLo ¶

func (x Int32x8) SetLo(y Int32x4) Int32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int32x8) ShiftAllLeft ¶

func (x Int32x8) ShiftAllLeft(y uint64) Int32x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX2

func (Int32x8) ShiftAllLeftConcat ¶

func (x Int32x8) ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2
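
This operation is a per-element funnel shift. The scalar sketch below models it on 32-bit elements (hypothetical helper name, not the package's implementation), using the lower 5 bits of shift as stated above.

```go
package main

import "fmt"

// shiftAllLeftConcat is a hypothetical scalar model: x[i] is shifted left
// by s bits and the emptied low bits are filled with the top s bits of y[i].
func shiftAllLeftConcat(x [8]int32, shift uint8, y [8]int32) [8]int32 {
	s := uint(shift & 31) // only the lower 5 bits are used
	var out [8]int32
	for i := range x {
		hi := uint32(x[i]) << s
		lo := uint32(y[i]) >> (32 - s) // Go yields 0 for a shift by 32
		out[i] = int32(hi | lo)
	}
	return out
}

func main() {
	x := [8]int32{1, 1, 1, 1, 1, 1, 1, 1}
	y := [8]int32{-0x10000000, -0x10000000, -0x10000000, -0x10000000,
		-0x10000000, -0x10000000, -0x10000000, -0x10000000}
	fmt.Println(shiftAllLeftConcat(x, 4, y))
}
```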

func (Int32x8) ShiftAllRight ¶

func (x Int32x8) ShiftAllRight(y uint64) Int32x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAD, CPU Feature: AVX2
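
Filling with the sign bit is an arithmetic shift, which is what Go's >> operator does on signed integers. A scalar sketch with a hypothetical helper name; how the sketch treats counts of 32 or more follows Go's shift rules and is an assumption about this model, not a statement about the package.

```go
package main

import "fmt"

// shiftAllRight is a hypothetical scalar model: every element is shifted
// right arithmetically by the same count. In Go, a signed right shift by
// 32 or more yields 0 for non-negative values and -1 for negative ones.
func shiftAllRight(x [8]int32, y uint64) [8]int32 {
	var out [8]int32
	for i, v := range x {
		out[i] = v >> y
	}
	return out
}

func main() {
	x := [8]int32{-8, 8, -1, 1, -100, 100, 0, -1 << 31}
	fmt.Println(shiftAllRight(x, 2))
	fmt.Println(shiftAllRight(x, 40))
}
```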

func (Int32x8) ShiftAllRightConcat ¶

func (x Int32x8) ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Int32x8) ShiftLeft ¶

func (x Int32x8) ShiftLeft(y Int32x8) Int32x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Int32x8) ShiftLeftConcat ¶

func (x Int32x8) ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Int32x8) ShiftRight ¶

func (x Int32x8) ShiftRight(y Int32x8) Int32x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVD, CPU Feature: AVX2

func (Int32x8) ShiftRightConcat ¶

func (x Int32x8) ShiftRightConcat(y Int32x8, z Int32x8) Int32x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Int32x8) Store ¶

func (x Int32x8) Store(y *[8]int32)

Store stores an Int32x8 to an array.

func (Int32x8) StoreMasked ¶

func (x Int32x8) StoreMasked(y *[8]int32, mask Mask32x8)

StoreMasked stores an Int32x8 to an array, writing only those elements enabled by mask.

Asm: VMASKMOVD, CPU Feature: AVX2

func (Int32x8) StoreSlice ¶

func (x Int32x8) StoreSlice(s []int32)

StoreSlice stores x into a slice of at least 8 int32s

func (Int32x8) StoreSlicePart ¶

func (x Int32x8) StoreSlicePart(s []int32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
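
The "as many elements as will fit" rule has the same shape as Go's built-in copy. A scalar sketch of that behavior, using a hypothetical helper and a plain array in place of the vector:

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model: it writes
// min(len(s), 8) leading elements of x into s and leaves the rest of s
// untouched.
func storeSlicePart(x [8]int32, s []int32) {
	copy(s, x[:])
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}

	short := make([]int32, 3)
	storeSlicePart(x, short)
	fmt.Println(short) // only the first three elements fit

	long := make([]int32, 10)
	storeSlicePart(x, long)
	fmt.Println(long) // the last two elements are not written
}
```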

func (Int32x8) String ¶

func (x Int32x8) String() string

String returns a string representation of SIMD vector x

func (Int32x8) Sub ¶

func (x Int32x8) Sub(y Int32x8) Int32x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX2

func (Int32x8) SubPairs ¶

func (x Int32x8) SubPairs(y Int32x8) Int32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX2

func (Int32x8) ToMask ¶

func (from Int32x8) ToMask() (to Mask32x8)

ToMask converts from Int32x8 to Mask32x8; a mask element is set to true when the corresponding vector element is non-zero.

func (Int32x8) TruncateToInt16 ¶

func (x Int32x8) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512
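
Truncation keeps only the low 16 bits of each element, whereas the SaturateToInt16 method clamps to the int16 range. The scalar sketch below contrasts the two on the same input; both helpers are hypothetical models, not the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// truncateToInt16 is a hypothetical scalar model: keep the low 16 bits.
func truncateToInt16(x [8]int32) [8]int16 {
	var out [8]int16
	for i, v := range x {
		out[i] = int16(v)
	}
	return out
}

// saturateToInt16 is a hypothetical scalar model: clamp to the int16 range.
func saturateToInt16(x [8]int32) [8]int16 {
	var out [8]int16
	for i, v := range x {
		switch {
		case v > math.MaxInt16:
			out[i] = math.MaxInt16
		case v < math.MinInt16:
			out[i] = math.MinInt16
		default:
			out[i] = int16(v)
		}
	}
	return out
}

func main() {
	x := [8]int32{123, -123, 32767, 32768, -32768, -32769, 40000, -40000}
	fmt.Println(truncateToInt16(x))
	fmt.Println(saturateToInt16(x))
}
```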

func (Int32x8) TruncateToInt8 ¶

func (x Int32x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVDB, CPU Feature: AVX512

func (Int32x8) Xor ¶

func (x Int32x8) Xor(y Int32x8) Int32x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int64x2 ¶

type Int64x2 struct {
	// contains filtered or unexported fields
}

Int64x2 is a 128-bit SIMD vector of 2 int64

func BroadcastInt64x2 ¶

func BroadcastInt64x2(x int64) Int64x2

BroadcastInt64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt64x2 ¶

func LoadInt64x2(y *[2]int64) Int64x2

LoadInt64x2 loads an Int64x2 from an array.

func LoadInt64x2Slice ¶

func LoadInt64x2Slice(s []int64) Int64x2

LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s

func LoadInt64x2SlicePart ¶

func LoadInt64x2SlicePart(s []int64) Int64x2

LoadInt64x2SlicePart loads an Int64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.
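
The zero-fill rule can be sketched in scalar Go by copying into a zero-initialized array. The helper name loadSlicePart and the [2]int64 stand-in for the vector are hypothetical.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model: it takes up to two leading
// elements of s and leaves any remaining element at zero.
func loadSlicePart(s []int64) [2]int64 {
	var out [2]int64
	copy(out[:], s)
	return out
}

func main() {
	fmt.Println(loadSlicePart(nil))
	fmt.Println(loadSlicePart([]int64{7}))
	fmt.Println(loadSlicePart([]int64{1, 2, 3}))
}
```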

func LoadMaskedInt64x2 ¶

func LoadMaskedInt64x2(y *[2]int64, mask Mask64x2) Int64x2

LoadMaskedInt64x2 loads an Int64x2 from an array, reading only those elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x2) Abs ¶

func (x Int64x2) Abs() Int64x2

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x2) Add ¶

func (x Int64x2) Add(y Int64x2) Int64x2

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX

func (Int64x2) And ¶

func (x Int64x2) And(y Int64x2) Int64x2

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int64x2) AndNot ¶

func (x Int64x2) AndNot(y Int64x2) Int64x2

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int64x2) AsFloat32x4 ¶

func (from Int64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Int64x2 to Float32x4.

func (Int64x2) AsFloat64x2 ¶

func (from Int64x2) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Int64x2 to Float64x2.

func (Int64x2) AsInt16x8 ¶

func (from Int64x2) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Int64x2 to Int16x8.

func (Int64x2) AsInt32x4 ¶

func (from Int64x2) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Int64x2 to Int32x4.

func (Int64x2) AsInt8x16 ¶

func (from Int64x2) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Int64x2 to Int8x16.

func (Int64x2) AsUint16x8 ¶

func (from Int64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Int64x2 to Uint16x8.

func (Int64x2) AsUint32x4 ¶

func (from Int64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Int64x2 to Uint32x4.

func (Int64x2) AsUint64x2 ¶

func (from Int64x2) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Int64x2 to Uint64x2.

func (Int64x2) AsUint8x16 ¶

func (from Int64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Int64x2 to Uint8x16.

func (Int64x2) Broadcast128 ¶

func (x Int64x2) Broadcast128() Int64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Int64x2) Broadcast256 ¶

func (x Int64x2) Broadcast256() Int64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Int64x2) Broadcast512 ¶

func (x Int64x2) Broadcast512() Int64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX512

func (Int64x2) Compress ¶

func (x Int64x2) Compress(mask Mask64x2) Int64x2

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Int64x2) ConcatPermute ¶

func (x Int64x2) ConcatPermute(y Int64x2, indices Uint64x2) Int64x2

ConcatPermute performs a full permutation of the vectors x and y using indices:

result := {xy[indices[0]], xy[indices[1]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2Q, CPU Feature: AVX512

func (Int64x2) ConvertToFloat32 ¶

func (x Int64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PSX, CPU Feature: AVX512

func (Int64x2) ConvertToFloat64 ¶

func (x Int64x2) ConvertToFloat64() Float64x2

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x2) Equal ¶

func (x Int64x2) Equal(y Int64x2) Mask64x2

Equal returns x equals y, elementwise.

Asm: VPCMPEQQ, CPU Feature: AVX

func (Int64x2) Expand ¶

func (x Int64x2) Expand(mask Mask64x2) Int64x2

Expand performs an expansion on a vector x whose elements are packed into its lower part. The packed elements are distributed, in order, to the positions where mask is set, from lower mask elements to upper.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Int64x2) GetElem ¶

func (x Int64x2) GetElem(index uint8) int64

GetElem retrieves the value of the element at index.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Int64x2) Greater ¶

func (x Int64x2) Greater(y Int64x2) Mask64x2

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTQ, CPU Feature: AVX

func (Int64x2) GreaterEqual ¶

func (x Int64x2) GreaterEqual(y Int64x2) Mask64x2

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Int64x2) InterleaveHi ¶

func (x Int64x2) InterleaveHi(y Int64x2) Int64x2

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX

func (Int64x2) InterleaveLo ¶

func (x Int64x2) InterleaveLo(y Int64x2) Int64x2

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX

func (Int64x2) IsZero ¶

func (x Int64x2) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Int64x2) LeadingZeros ¶

func (x Int64x2) LeadingZeros() Int64x2

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x2) Len ¶

func (x Int64x2) Len() int

Len returns the number of elements in an Int64x2.

func (Int64x2) Less ¶

func (x Int64x2) Less(y Int64x2) Mask64x2

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Int64x2) LessEqual ¶

func (x Int64x2) LessEqual(y Int64x2) Mask64x2

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Int64x2) Masked ¶

func (x Int64x2) Masked(mask Mask64x2) Int64x2

Masked returns x but with elements zeroed where mask is false.

func (Int64x2) Max ¶

func (x Int64x2) Max(y Int64x2) Int64x2

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x2) Merge ¶

func (x Int64x2) Merge(y Int64x2, mask Mask64x2) Int64x2

Merge returns x but with elements set to y where mask is false.

func (Int64x2) Min ¶

func (x Int64x2) Min(y Int64x2) Int64x2

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x2) Mul ¶

func (x Int64x2) Mul(y Int64x2) Int64x2

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x2) Not ¶

func (x Int64x2) Not() Int64x2

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Int64x2) NotEqual ¶

func (x Int64x2) NotEqual(y Int64x2) Mask64x2

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Int64x2) OnesCount ¶

func (x Int64x2) OnesCount() Int64x2

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x2) Or ¶

func (x Int64x2) Or(y Int64x2) Int64x2

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int64x2) RotateAllLeft ¶

func (x Int64x2) RotateAllLeft(shift uint8) Int64x2

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Int64x2) RotateAllRight ¶

func (x Int64x2) RotateAllRight(shift uint8) Int64x2

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x2) RotateLeft ¶

func (x Int64x2) RotateLeft(y Int64x2) Int64x2

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x2) RotateRight ¶

func (x Int64x2) RotateRight(y Int64x2) Int64x2

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Int64x2) SaturateToInt16 ¶

func (x Int64x2) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x2) SaturateToInt32 ¶

func (x Int64x2) SaturateToInt32() Int32x4

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512

func (Int64x2) SaturateToInt8 ¶

func (x Int64x2) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x2) SaturateToUint8 ¶

func (x Int64x2) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x2) SelectFromPair ¶

func (x Int64x2) SelectFromPair(a, b uint8, y Int64x2) Int64x2

SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX

func (Int64x2) SetElem ¶

func (x Int64x2) SetElem(index uint8, y int64) Int64x2

SetElem returns x with the element at index set to y.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Int64x2) ShiftAllLeft ¶

func (x Int64x2) ShiftAllLeft(y uint64) Int64x2

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX

func (Int64x2) ShiftAllLeftConcat ¶

func (x Int64x2) ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Int64x2) ShiftAllRight ¶

func (x Int64x2) ShiftAllRight(y uint64) Int64x2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512

func (Int64x2) ShiftAllRightConcat ¶

func (x Int64x2) ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Int64x2) ShiftLeft ¶

func (x Int64x2) ShiftLeft(y Int64x2) Int64x2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Int64x2) ShiftLeftConcat ¶

func (x Int64x2) ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x2) ShiftRight ¶

func (x Int64x2) ShiftRight(y Int64x2) Int64x2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x2) ShiftRightConcat ¶

func (x Int64x2) ShiftRightConcat(y Int64x2, z Int64x2) Int64x2

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x2) Store ¶

func (x Int64x2) Store(y *[2]int64)

Store stores an Int64x2 to an array.

func (Int64x2) StoreMasked ¶

func (x Int64x2) StoreMasked(y *[2]int64, mask Mask64x2)

StoreMasked stores an Int64x2 to an array, writing only those elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x2) StoreSlice ¶

func (x Int64x2) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 2 int64s

func (Int64x2) StoreSlicePart ¶

func (x Int64x2) StoreSlicePart(s []int64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.

func (Int64x2) String ¶

func (x Int64x2) String() string

String returns a string representation of SIMD vector x

func (Int64x2) Sub ¶

func (x Int64x2) Sub(y Int64x2) Int64x2

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX

func (Int64x2) ToMask ¶

func (from Int64x2) ToMask() (to Mask64x2)

ToMask converts from Int64x2 to Mask64x2; a mask element is set to true when the corresponding vector element is non-zero.

func (Int64x2) TruncateToInt16 ¶

func (x Int64x2) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x2) TruncateToInt32 ¶

func (x Int64x2) TruncateToInt32() Int32x4

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x2) TruncateToInt8 ¶

func (x Int64x2) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Int64x2) Xor ¶

func (x Int64x2) Xor(y Int64x2) Int64x2

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int64x4 ¶

type Int64x4 struct {
	// contains filtered or unexported fields
}

Int64x4 is a 256-bit SIMD vector of 4 int64

func BroadcastInt64x4 ¶

func BroadcastInt64x4(x int64) Int64x4

BroadcastInt64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt64x4 ¶

func LoadInt64x4(y *[4]int64) Int64x4

LoadInt64x4 loads an Int64x4 from an array.

func LoadInt64x4Slice ¶

func LoadInt64x4Slice(s []int64) Int64x4

LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s

func LoadInt64x4SlicePart ¶

func LoadInt64x4SlicePart(s []int64) Int64x4

LoadInt64x4SlicePart loads an Int64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.

func LoadMaskedInt64x4 ¶

func LoadMaskedInt64x4(y *[4]int64, mask Mask64x4) Int64x4

LoadMaskedInt64x4 loads an Int64x4 from an array, reading only those elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x4) Abs ¶

func (x Int64x4) Abs() Int64x4

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x4) Add ¶

func (x Int64x4) Add(y Int64x4) Int64x4

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX2

func (Int64x4) And ¶

func (x Int64x4) And(y Int64x4) Int64x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int64x4) AndNot ¶

func (x Int64x4) AndNot(y Int64x4) Int64x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int64x4) AsFloat32x8 ¶

func (from Int64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Int64x4 to Float32x8.

func (Int64x4) AsFloat64x4 ¶

func (from Int64x4) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Int64x4 to Float64x4.

func (Int64x4) AsInt16x16 ¶

func (from Int64x4) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Int64x4 to Int16x16.

func (Int64x4) AsInt32x8 ¶

func (from Int64x4) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Int64x4 to Int32x8.

func (Int64x4) AsInt8x32 ¶

func (from Int64x4) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Int64x4 to Int8x32.

func (Int64x4) AsUint16x16 ¶

func (from Int64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Int64x4 to Uint16x16.

func (Int64x4) AsUint32x8 ¶

func (from Int64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Int64x4 to Uint32x8.

func (Int64x4) AsUint64x4 ¶

func (from Int64x4) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Int64x4 to Uint64x4.

func (Int64x4) AsUint8x32 ¶

func (from Int64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Int64x4 to Uint8x32.

func (Int64x4) Compress ¶

func (x Int64x4) Compress(mask Mask64x4) Int64x4

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512
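The packing behaviour described above can be sketched with a scalar model. The helper compress4 is hypothetical and uses plain arrays rather than the package's vector and mask types; zeroing the unused upper elements is an assumption of this sketch, not something the description states.

```go
package main

import "fmt"

// compress4 is a hypothetical scalar model of Compress on 4 elements:
// elements whose mask entry is true are packed, in order, into the low
// positions of the result. The remaining positions are left zero
// (an assumption of this sketch).
func compress4(x [4]int64, mask [4]bool) [4]int64 {
	var out [4]int64
	j := 0
	for i, keep := range mask {
		if keep {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	x := [4]int64{10, 20, 30, 40}
	mask := [4]bool{false, true, false, true}
	fmt.Println(compress4(x, mask)) // [20 40 0 0]
}
```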

func (Int64x4) ConcatPermute ¶

func (x Int64x4) ConcatPermute(y Int64x4, indices Uint64x4) Int64x4

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where n is the number of elements and xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy (the low 3 bits here, since xy has 8 elements) are used from each element of indices.

Asm: VPERMI2Q, CPU Feature: AVX512
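As a rough illustration of the indexing rule, the following scalar model treats x and y as one 8-element table. The helper concatPermute4 is hypothetical; masking each index to its low 3 bits reflects the sketch's reading of "only the needed bits".

```go
package main

import "fmt"

// concatPermute4 is a hypothetical scalar model of ConcatPermute on
// 4-element vectors: each result element is looked up in the
// 8-element concatenation of x (indices 0-3) and y (indices 4-7).
func concatPermute4(x, y [4]int64, indices [4]uint64) [4]int64 {
	var xy [8]int64
	copy(xy[:4], x[:])
	copy(xy[4:], y[:])
	var out [4]int64
	for i, idx := range indices {
		out[i] = xy[idx&7] // only the low 3 bits are needed to index 8 elements
	}
	return out
}

func main() {
	x := [4]int64{0, 1, 2, 3}
	y := [4]int64{10, 11, 12, 13}
	fmt.Println(concatPermute4(x, y, [4]uint64{7, 0, 4, 3})) // [13 0 10 3]
}
```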

func (Int64x4) ConvertToFloat32 ¶

func (x Int64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PSY, CPU Feature: AVX512

func (Int64x4) ConvertToFloat64 ¶

func (x Int64x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x4) Equal ¶

func (x Int64x4) Equal(y Int64x4) Mask64x4

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX2

func (Int64x4) Expand ¶

func (x Int64x4) Expand(mask Mask64x4) Int64x4

Expand performs an expansion on a vector x whose elements are packed into its low positions. The elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512
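Expand is the inverse of packing, which a scalar model may make clearer. The helper expand4 is hypothetical, and zeroing the positions not selected by mask is an assumption of this sketch.

```go
package main

import "fmt"

// expand4 is a hypothetical scalar model of Expand on 4 elements:
// the low elements of x are handed out, in order, to the positions
// whose mask entry is true. Unselected positions are left zero
// (an assumption of this sketch).
func expand4(x [4]int64, mask [4]bool) [4]int64 {
	var out [4]int64
	j := 0
	for i, set := range mask {
		if set {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	packed := [4]int64{20, 40, 0, 0}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expand4(packed, mask)) // [0 20 0 40]
}
```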

func (Int64x4) GetHi ¶

func (x Int64x4) GetHi() Int64x2

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int64x4) GetLo ¶

func (x Int64x4) GetLo() Int64x2

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int64x4) Greater ¶

func (x Int64x4) Greater(y Int64x4) Mask64x4

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTQ, CPU Feature: AVX2

func (Int64x4) GreaterEqual ¶

func (x Int64x4) GreaterEqual(y Int64x4) Mask64x4

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2
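"Emulated" means there is no single corresponding instruction at this vector width. One possible way to build such a comparison from Greater is shown below as a scalar sketch; the helper names are hypothetical and the actual instruction sequence used by the package is not documented here.

```go
package main

import "fmt"

// greater4 is a hypothetical scalar stand-in for an elementwise x > y.
func greater4(x, y [4]int64) [4]bool {
	var m [4]bool
	for i := range x {
		m[i] = x[i] > y[i]
	}
	return m
}

// greaterEqual4 derives x >= y as NOT (y > x), so only a
// greater-than comparison and a mask negation are required.
func greaterEqual4(x, y [4]int64) [4]bool {
	m := greater4(y, x)
	for i := range m {
		m[i] = !m[i]
	}
	return m
}

func main() {
	x := [4]int64{1, 5, -3, 7}
	y := [4]int64{1, 4, -2, 9}
	fmt.Println(greaterEqual4(x, y)) // [true true false false]
}
```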

func (Int64x4) InterleaveHiGrouped ¶

func (x Int64x4) InterleaveHiGrouped(y Int64x4) Int64x4

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX2

func (Int64x4) InterleaveLoGrouped ¶

func (x Int64x4) InterleaveLoGrouped(y Int64x4) Int64x4

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX2
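For 64-bit elements each 128-bit subvector holds two elements, so the grouped interleaves reduce to a simple pattern. The scalar sketch below uses hypothetical helper names and assumes x supplies the first element of each pair.

```go
package main

import "fmt"

// interleaveLoGrouped4 models taking the low element of each 128-bit
// group of x and y and placing them side by side.
func interleaveLoGrouped4(x, y [4]int64) [4]int64 {
	return [4]int64{x[0], y[0], x[2], y[2]}
}

// interleaveHiGrouped4 does the same with the high element of each group.
func interleaveHiGrouped4(x, y [4]int64) [4]int64 {
	return [4]int64{x[1], y[1], x[3], y[3]}
}

func main() {
	x := [4]int64{0, 1, 2, 3}
	y := [4]int64{10, 11, 12, 13}
	fmt.Println(interleaveLoGrouped4(x, y)) // [0 10 2 12]
	fmt.Println(interleaveHiGrouped4(x, y)) // [1 11 3 13]
}
```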

func (Int64x4) IsZero ¶

func (x Int64x4) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int64x4) LeadingZeros ¶

func (x Int64x4) LeadingZeros() Int64x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x4) Len ¶

func (x Int64x4) Len() int

Len returns the number of elements in an Int64x4.

func (Int64x4) Less ¶

func (x Int64x4) Less(y Int64x4) Mask64x4

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Int64x4) LessEqual ¶

func (x Int64x4) LessEqual(y Int64x4) Mask64x4

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Int64x4) Masked ¶

func (x Int64x4) Masked(mask Mask64x4) Int64x4

Masked returns x but with elements zeroed where mask is false.

func (Int64x4) Max ¶

func (x Int64x4) Max(y Int64x4) Int64x4

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x4) Merge ¶

func (x Int64x4) Merge(y Int64x4, mask Mask64x4) Int64x4

Merge returns x but with elements set to y where mask is false.
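Masked and Merge differ only in what fills the positions where mask is false. A scalar sketch of both, with hypothetical helper names and plain arrays standing in for the vector and mask types:

```go
package main

import "fmt"

// masked4 keeps x where mask is true and produces zero elsewhere.
func masked4(x [4]int64, mask [4]bool) [4]int64 {
	var out [4]int64
	for i, m := range mask {
		if m {
			out[i] = x[i]
		}
	}
	return out
}

// merge4 keeps x where mask is true and takes y where mask is false.
func merge4(x, y [4]int64, mask [4]bool) [4]int64 {
	out := y
	for i, m := range mask {
		if m {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [4]int64{1, 2, 3, 4}
	y := [4]int64{-1, -2, -3, -4}
	mask := [4]bool{true, false, true, false}
	fmt.Println(masked4(x, mask))   // [1 0 3 0]
	fmt.Println(merge4(x, y, mask)) // [1 -2 3 -4]
}
```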

func (Int64x4) Min ¶

func (x Int64x4) Min(y Int64x4) Int64x4

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x4) Mul ¶

func (x Int64x4) Mul(y Int64x4) Int64x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x4) Not ¶

func (x Int64x4) Not() Int64x4

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2
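A bitwise complement can be expressed as XOR with an all-ones value, which is one plausible emulation. The scalar sketch below uses a hypothetical helper and does not claim to match the package's actual instruction sequence.

```go
package main

import "fmt"

// not4 is a hypothetical scalar model of Not: XOR with all ones
// (-1 in two's complement) flips every bit of each element.
func not4(x [4]int64) [4]int64 {
	var out [4]int64
	for i, v := range x {
		out[i] = v ^ -1
	}
	return out
}

func main() {
	fmt.Println(not4([4]int64{0, -1, 5, -6})) // [-1 0 -6 5]
}
```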

func (Int64x4) NotEqual ¶

func (x Int64x4) NotEqual(y Int64x4) Mask64x4

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Int64x4) OnesCount ¶

func (x Int64x4) OnesCount() Int64x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x4) Or ¶

func (x Int64x4) Or(y Int64x4) Int64x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int64x4) Permute ¶

func (x Int64x4) Permute(indices Uint64x4) Int64x4

Permute performs a full permutation of vector x using indices: result = {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512
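A scalar sketch of the lookup, using a hypothetical helper permute4. It masks each index to its low 2 bits, following the description above.

```go
package main

import "fmt"

// permute4 is a hypothetical scalar model of Permute on 4 elements:
// each result element is x at the position given by the low 2 bits
// of the corresponding index.
func permute4(x [4]int64, indices [4]uint64) [4]int64 {
	var out [4]int64
	for i, idx := range indices {
		out[i] = x[idx&3]
	}
	return out
}

func main() {
	x := [4]int64{10, 20, 30, 40}
	// Index 5 has low bits 01, so it selects element 1.
	fmt.Println(permute4(x, [4]uint64{3, 3, 0, 5})) // [40 40 10 20]
}
```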

func (Int64x4) RotateAllLeft ¶

func (x Int64x4) RotateAllLeft(shift uint8) Int64x4

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512
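The per-element effect can be modelled with math/bits. The helper rotateAllLeft4 is hypothetical; bits.RotateLeft64 reduces the count modulo 64, which this sketch assumes matches the instruction.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft4 is a hypothetical scalar model of RotateAllLeft:
// every element is rotated left by the same number of bits.
func rotateAllLeft4(x [4]int64, shift uint8) [4]int64 {
	var out [4]int64
	for i, v := range x {
		out[i] = int64(bits.RotateLeft64(uint64(v), int(shift)))
	}
	return out
}

func main() {
	// The top bit of -1<<63 wraps around to bit 0.
	x := [4]int64{1, -1, -1 << 63, 3}
	fmt.Println(rotateAllLeft4(x, 1)) // [2 -1 1 6]
}
```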

func (Int64x4) RotateAllRight ¶

func (x Int64x4) RotateAllRight(shift uint8) Int64x4

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x4) RotateLeft ¶

func (x Int64x4) RotateLeft(y Int64x4) Int64x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x4) RotateRight ¶

func (x Int64x4) RotateRight(y Int64x4) Int64x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Int64x4) SaturateToInt16 ¶

func (x Int64x4) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x4) SaturateToInt32 ¶

func (x Int64x4) SaturateToInt32() Int32x4

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512
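Saturation clamps out-of-range values to the nearest representable int32 instead of discarding high bits. A scalar sketch with a hypothetical helper:

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt32 is a hypothetical scalar model of SaturateToInt32:
// values outside the int32 range are clamped to its limits.
func saturateToInt32(x [4]int64) [4]int32 {
	var out [4]int32
	for i, v := range x {
		switch {
		case v > math.MaxInt32:
			out[i] = math.MaxInt32
		case v < math.MinInt32:
			out[i] = math.MinInt32
		default:
			out[i] = int32(v)
		}
	}
	return out
}

func main() {
	x := [4]int64{1, 1 << 40, -(1 << 40), -5}
	fmt.Println(saturateToInt32(x)) // [1 2147483647 -2147483648 -5]
}
```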

func (Int64x4) SaturateToInt8 ¶

func (x Int64x4) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x4) SaturateToUint8 ¶

func (x Int64x4) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x4) Select128FromPair ¶

func (x Int64x4) Select128FromPair(lo, hi uint8, y Int64x4) Int64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
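The worked example above can be reproduced with a scalar model. The helper select128FromPair is hypothetical; it indexes directly with lo and hi, so out-of-range selectors panic in this sketch as well.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model: x and y together
// form four 128-bit elements (x low, x high, y low, y high), and the
// result is element lo followed by element hi.
func select128FromPair(x [4]int64, lo, hi uint8, y [4]int64) [4]int64 {
	parts := [4][2]int64{
		{x[0], x[1]}, // element 0: low half of x
		{x[2], x[3]}, // element 1: high half of x
		{y[0], y[1]}, // element 2: low half of y
		{y[2], y[3]}, // element 3: high half of y
	}
	l, h := parts[lo], parts[hi]
	return [4]int64{l[0], l[1], h[0], h[1]}
}

func main() {
	x := [4]int64{40, 41, 50, 51}
	y := [4]int64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```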

func (Int64x4) SelectFromPairGrouped ¶

func (x Int64x4) SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the two elements chosen by the selectors a and b. Selector values in the range 0-1 specify elements 0-1 of that half of x, and values in the range 2-3 specify elements 0-1 of that half of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX
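One reading of the selector rule, written as a scalar model. The helper selectFromPairGrouped4 is hypothetical, and the sketch assumes the same pair of selectors is applied to every 128-bit group.

```go
package main

import "fmt"

// selectFromPairGrouped4 is a hypothetical scalar model: within each
// 128-bit group (two int64 elements), selectors 0-1 pick from x's
// group and selectors 2-3 pick from y's group.
func selectFromPairGrouped4(x [4]int64, a, b uint8, y [4]int64) [4]int64 {
	var out [4]int64
	for g := 0; g < 4; g += 2 { // one iteration per 128-bit group
		group := [4]int64{x[g], x[g+1], y[g], y[g+1]}
		out[g], out[g+1] = group[a], group[b]
	}
	return out
}

func main() {
	x := [4]int64{0, 1, 2, 3}
	y := [4]int64{10, 11, 12, 13}
	fmt.Println(selectFromPairGrouped4(x, 1, 2, y)) // [1 10 3 12]
}
```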

func (Int64x4) SetHi ¶

func (x Int64x4) SetHi(y Int64x2) Int64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int64x4) SetLo ¶

func (x Int64x4) SetLo(y Int64x2) Int64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int64x4) ShiftAllLeft ¶

func (x Int64x4) ShiftAllLeft(y uint64) Int64x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX2

func (Int64x4) ShiftAllLeftConcat ¶

func (x Int64x4) ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
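This operation is a funnel shift: bits shifted out of the top of y flow into the bottom of x. The scalar sketch below uses a hypothetical helper and reduces the count modulo 64, which is an assumption about 64-bit elements made by this sketch.

```go
package main

import "fmt"

// shiftAllLeftConcat4 is a hypothetical scalar model of a left funnel
// shift: x is shifted left by s bits and the vacated low bits are
// filled with the top s bits of y.
func shiftAllLeftConcat4(x [4]int64, shift uint8, y [4]int64) [4]int64 {
	s := uint(shift) & 63 // assumed: count taken modulo the element width
	var out [4]int64
	for i := range x {
		hi := uint64(x[i]) << s
		// In Go, shifting a uint64 right by 64 yields 0, which covers s == 0.
		lo := uint64(y[i]) >> (64 - s)
		out[i] = int64(hi | lo)
	}
	return out
}

func main() {
	x := [4]int64{1, 2, 0, -1}
	y := [4]int64{-1, 0, -1 << 63, 0}
	fmt.Println(shiftAllLeftConcat4(x, 4, y)) // [31 32 8 -16]
}
```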

func (Int64x4) ShiftAllRight ¶

func (x Int64x4) ShiftAllRight(y uint64) Int64x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512
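Filling with the sign bit is an arithmetic shift, which is what Go's >> does on signed integers. A scalar sketch with a hypothetical helper:

```go
package main

import "fmt"

// shiftAllRight4 is a hypothetical scalar model of ShiftAllRight:
// an arithmetic right shift of every element by the same count.
// For counts of 64 or more, Go yields 0 or -1 depending on the sign.
func shiftAllRight4(x [4]int64, y uint64) [4]int64 {
	var out [4]int64
	for i, v := range x {
		out[i] = v >> y
	}
	return out
}

func main() {
	x := [4]int64{-8, 8, -1, 1}
	fmt.Println(shiftAllRight4(x, 2))   // [-2 2 -1 0]
	fmt.Println(shiftAllRight4(x, 100)) // [-1 0 -1 0]
}
```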

func (Int64x4) ShiftAllRightConcat ¶

func (x Int64x4) ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Int64x4) ShiftLeft ¶

func (x Int64x4) ShiftLeft(y Int64x4) Int64x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Int64x4) ShiftLeftConcat ¶

func (x Int64x4) ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x4) ShiftRight ¶

func (x Int64x4) ShiftRight(y Int64x4) Int64x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x4) ShiftRightConcat ¶

func (x Int64x4) ShiftRightConcat(y Int64x4, z Int64x4) Int64x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x4) Store ¶

func (x Int64x4) Store(y *[4]int64)

Store stores an Int64x4 to an array.

func (Int64x4) StoreMasked ¶

func (x Int64x4) StoreMasked(y *[4]int64, mask Mask64x4)

StoreMasked stores an Int64x4 to an array, at those elements enabled by mask.

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Int64x4) StoreSlice ¶

func (x Int64x4) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 4 int64s

func (Int64x4) StoreSlicePart ¶

func (x Int64x4) StoreSlicePart(s []int64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
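The "as many as will fit" rule matches the behaviour of Go's built-in copy, as this scalar sketch shows. The helper storeSlicePart4 is hypothetical and uses an array in place of the vector.

```go
package main

import "fmt"

// storeSlicePart4 is a hypothetical scalar model of StoreSlicePart:
// copy stops at the shorter of the two lengths, so a short slice
// receives only the leading elements and a long slice keeps its tail.
func storeSlicePart4(x [4]int64, s []int64) {
	copy(s, x[:])
}

func main() {
	x := [4]int64{1, 2, 3, 4}

	short := make([]int64, 2)
	storeSlicePart4(x, short)
	fmt.Println(short) // [1 2]

	long := []int64{9, 9, 9, 9, 9, 9}
	storeSlicePart4(x, long)
	fmt.Println(long) // [1 2 3 4 9 9]
}
```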

func (Int64x4) String ¶

func (x Int64x4) String() string

String returns a string representation of SIMD vector x

func (Int64x4) Sub ¶

func (x Int64x4) Sub(y Int64x4) Int64x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX2

func (Int64x4) ToMask ¶

func (from Int64x4) ToMask() (to Mask64x4)

ToMask converts from Int64x4 to Mask64x4. A mask element is set to true when the corresponding vector element is non-zero.

func (Int64x4) TruncateToInt16 ¶

func (x Int64x4) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x4) TruncateToInt32 ¶

func (x Int64x4) TruncateToInt32() Int32x4

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x4) TruncateToInt8 ¶

func (x Int64x4) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512
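Truncation keeps only the low 8 bits of each element, and the four results occupy the low positions of a 16-element vector. A scalar sketch with a hypothetical helper:

```go
package main

import "fmt"

// truncateToInt8 is a hypothetical scalar model of TruncateToInt8 for
// a 4-element input: each int64 keeps only its low 8 bits, the results
// fill elements 0-3, and elements 4-15 stay zero.
func truncateToInt8(x [4]int64) [16]int8 {
	var out [16]int8
	for i, v := range x {
		out[i] = int8(v) // conversion discards the high 56 bits
	}
	return out
}

func main() {
	x := [4]int64{0x101, -1, 0x7F, 0x80}
	out := truncateToInt8(x)
	fmt.Println(out[:4]) // [1 -1 127 -128]
}
```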

func (Int64x4) Xor ¶

func (x Int64x4) Xor(y Int64x4) Int64x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int64x8 ¶

type Int64x8 struct {
	// contains filtered or unexported fields
}

Int64x8 is a 512-bit SIMD vector of 8 int64

func BroadcastInt64x8 ¶

func BroadcastInt64x8(x int64) Int64x8

BroadcastInt64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadInt64x8 ¶

func LoadInt64x8(y *[8]int64) Int64x8

LoadInt64x8 loads an Int64x8 from an array.

func LoadInt64x8Slice ¶

func LoadInt64x8Slice(s []int64) Int64x8

LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s

func LoadInt64x8SlicePart ¶

func LoadInt64x8SlicePart(s []int64) Int64x8

LoadInt64x8SlicePart loads an Int64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.
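The zero-fill rule for short slices can be sketched with a scalar model. The helper loadSlicePart8 is hypothetical and returns an array in place of the vector.

```go
package main

import "fmt"

// loadSlicePart8 is a hypothetical scalar model of a partial load:
// up to 8 elements are copied from s and any remaining positions
// keep their zero value.
func loadSlicePart8(s []int64) [8]int64 {
	var out [8]int64
	copy(out[:], s)
	return out
}

func main() {
	fmt.Println(loadSlicePart8([]int64{1, 2, 3})) // [1 2 3 0 0 0 0 0]
}
```

A partial load of this kind is typically useful for the tail of a slice whose length is not a multiple of the vector width.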

func LoadMaskedInt64x8 ¶

func LoadMaskedInt64x8(y *[8]int64, mask Mask64x8) Int64x8

LoadMaskedInt64x8 loads an Int64x8 from an array, at those elements enabled by mask.

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func (Int64x8) Abs ¶

func (x Int64x8) Abs() Int64x8

Abs computes the absolute value of each element.

Asm: VPABSQ, CPU Feature: AVX512

func (Int64x8) Add ¶

func (x Int64x8) Add(y Int64x8) Int64x8

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX512

func (Int64x8) And ¶

func (x Int64x8) And(y Int64x8) Int64x8

And performs a bitwise AND operation between two vectors.

Asm: VPANDQ, CPU Feature: AVX512

func (Int64x8) AndNot ¶

func (x Int64x8) AndNot(y Int64x8) Int64x8

AndNot performs a bitwise x &^ y.

Asm: VPANDNQ, CPU Feature: AVX512

func (Int64x8) AsFloat32x16 ¶

func (from Int64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Int64x8 to Float32x16.

func (Int64x8) AsFloat64x8 ¶

func (from Int64x8) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Int64x8 to Float64x8.

func (Int64x8) AsInt16x32 ¶

func (from Int64x8) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Int64x8 to Int16x32.

func (Int64x8) AsInt32x16 ¶

func (from Int64x8) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Int64x8 to Int32x16.

func (Int64x8) AsInt8x64 ¶

func (from Int64x8) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Int64x8 to Int8x64.

func (Int64x8) AsUint16x32 ¶

func (from Int64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Int64x8 to Uint16x32.

func (Int64x8) AsUint32x16 ¶

func (from Int64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Int64x8 to Uint32x16.

func (Int64x8) AsUint64x8 ¶

func (from Int64x8) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Int64x8 to Uint64x8.

func (Int64x8) AsUint8x64 ¶

func (from Int64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Int64x8 to Uint8x64.

func (Int64x8) Compress ¶

func (x Int64x8) Compress(mask Mask64x8) Int64x8

Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Int64x8) ConcatPermute ¶

func (x Int64x8) ConcatPermute(y Int64x8, indices Uint64x8) Int64x8

ConcatPermute performs a full permutation of the concatenation of x and y using indices: result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n-1]]}, where n is the number of elements and xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy (the low 4 bits here, since xy has 16 elements) are used from each element of indices.

Asm: VPERMI2Q, CPU Feature: AVX512

func (Int64x8) ConvertToFloat32 ¶

func (x Int64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTQQ2PS, CPU Feature: AVX512

func (Int64x8) ConvertToFloat64 ¶

func (x Int64x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTQQ2PD, CPU Feature: AVX512

func (Int64x8) Equal ¶

func (x Int64x8) Equal(y Int64x8) Mask64x8

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQQ, CPU Feature: AVX512

func (Int64x8) Expand ¶

func (x Int64x8) Expand(mask Mask64x8) Int64x8

Expand performs an expansion on a vector x whose elements are packed into its low positions. The elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Int64x8) GetHi ¶

func (x Int64x8) GetHi() Int64x4

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int64x8) GetLo ¶

func (x Int64x8) GetLo() Int64x4

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int64x8) Greater ¶

func (x Int64x8) Greater(y Int64x8) Mask64x8

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPGTQ, CPU Feature: AVX512

func (Int64x8) GreaterEqual ¶

func (x Int64x8) GreaterEqual(y Int64x8) Mask64x8

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) InterleaveHiGrouped ¶

func (x Int64x8) InterleaveHiGrouped(y Int64x8) Int64x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX512

func (Int64x8) InterleaveLoGrouped ¶

func (x Int64x8) InterleaveLoGrouped(y Int64x8) Int64x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX512

func (Int64x8) LeadingZeros ¶

func (x Int64x8) LeadingZeros() Int64x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Int64x8) Len ¶

func (x Int64x8) Len() int

Len returns the number of elements in an Int64x8.

func (Int64x8) Less ¶

func (x Int64x8) Less(y Int64x8) Mask64x8

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) LessEqual ¶

func (x Int64x8) LessEqual(y Int64x8) Mask64x8

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) Masked ¶

func (x Int64x8) Masked(mask Mask64x8) Int64x8

Masked returns x but with elements zeroed where mask is false.

func (Int64x8) Max ¶

func (x Int64x8) Max(y Int64x8) Int64x8

Max computes the maximum of corresponding elements.

Asm: VPMAXSQ, CPU Feature: AVX512

func (Int64x8) Merge ¶

func (x Int64x8) Merge(y Int64x8, mask Mask64x8) Int64x8

Merge returns x but with elements set to y where mask is false.

func (Int64x8) Min ¶

func (x Int64x8) Min(y Int64x8) Int64x8

Min computes the minimum of corresponding elements.

Asm: VPMINSQ, CPU Feature: AVX512

func (Int64x8) Mul ¶

func (x Int64x8) Mul(y Int64x8) Int64x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Int64x8) Not ¶

func (x Int64x8) Not() Int64x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Int64x8) NotEqual ¶

func (x Int64x8) NotEqual(y Int64x8) Mask64x8

NotEqual returns a mask whose elements indicate whether x != y.

Asm: VPCMPQ, CPU Feature: AVX512

func (Int64x8) OnesCount ¶

func (x Int64x8) OnesCount() Int64x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Int64x8) Or ¶

func (x Int64x8) Or(y Int64x8) Int64x8

Or performs a bitwise OR operation between two vectors.

Asm: VPORQ, CPU Feature: AVX512

func (Int64x8) Permute ¶

func (x Int64x8) Permute(indices Uint64x8) Int64x8

Permute performs a full permutation of vector x using indices: result = {x[indices[0]], x[indices[1]], ..., x[indices[n-1]]}, where n is the number of elements. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512

func (Int64x8) RotateAllLeft ¶

func (x Int64x8) RotateAllLeft(shift uint8) Int64x8

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Int64x8) RotateAllRight ¶

func (x Int64x8) RotateAllRight(shift uint8) Int64x8

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Int64x8) RotateLeft ¶

func (x Int64x8) RotateLeft(y Int64x8) Int64x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Int64x8) RotateRight ¶

func (x Int64x8) RotateRight(y Int64x8) Int64x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Int64x8) SaturateToInt16 ¶

func (x Int64x8) SaturateToInt16() Int16x8

SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQW, CPU Feature: AVX512

func (Int64x8) SaturateToInt32 ¶

func (x Int64x8) SaturateToInt32() Int32x8

SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.

Asm: VPMOVSQD, CPU Feature: AVX512

func (Int64x8) SaturateToInt8 ¶

func (x Int64x8) SaturateToInt8() Int8x16

SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x8) SaturateToUint8 ¶

func (x Int64x8) SaturateToUint8() Int8x16

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVSQB, CPU Feature: AVX512

func (Int64x8) SelectFromPairGrouped ¶

func (x Int64x8) SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the two elements chosen by the selectors a and b. Selector values in the range 0-1 specify elements 0-1 of that subvector of x, and values in the range 2-3 specify elements 0-1 of that subvector of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Int64x8) SetHi ¶

func (x Int64x8) SetHi(y Int64x4) Int64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int64x8) SetLo ¶

func (x Int64x8) SetLo(y Int64x4) Int64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int64x8) ShiftAllLeft ¶

func (x Int64x8) ShiftAllLeft(y uint64) Int64x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX512

func (Int64x8) ShiftAllLeftConcat ¶

func (x Int64x8) ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Int64x8) ShiftAllRight ¶

func (x Int64x8) ShiftAllRight(y uint64) Int64x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.

Asm: VPSRAQ, CPU Feature: AVX512

func (Int64x8) ShiftAllRightConcat ¶

func (x Int64x8) ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Int64x8) ShiftLeft ¶

func (x Int64x8) ShiftLeft(y Int64x8) Int64x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX512

func (Int64x8) ShiftLeftConcat ¶

func (x Int64x8) ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Int64x8) ShiftRight ¶

func (x Int64x8) ShiftRight(y Int64x8) Int64x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.

Asm: VPSRAVQ, CPU Feature: AVX512

func (Int64x8) ShiftRightConcat ¶

func (x Int64x8) ShiftRightConcat(y Int64x8, z Int64x8) Int64x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Int64x8) Store ¶

func (x Int64x8) Store(y *[8]int64)

Store stores an Int64x8 to an array.

func (Int64x8) StoreMasked ¶

func (x Int64x8) StoreMasked(y *[8]int64, mask Mask64x8)

StoreMasked stores an Int64x8 to an array, at those elements enabled by mask.

Asm: VMOVDQU64, CPU Feature: AVX512

func (Int64x8) StoreSlice ¶

func (x Int64x8) StoreSlice(s []int64)

StoreSlice stores x into a slice of at least 8 int64s

func (Int64x8) StoreSlicePart ¶

func (x Int64x8) StoreSlicePart(s []int64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Int64x8) String ¶

func (x Int64x8) String() string

String returns a string representation of SIMD vector x

func (Int64x8) Sub ¶

func (x Int64x8) Sub(y Int64x8) Int64x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX512

func (Int64x8) ToMask ¶

func (from Int64x8) ToMask() (to Mask64x8)

ToMask converts from Int64x8 to Mask64x8. A mask element is set to true when the corresponding vector element is non-zero.

func (Int64x8) TruncateToInt16 ¶

func (x Int64x8) TruncateToInt16() Int16x8

TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Int64x8) TruncateToInt32 ¶

func (x Int64x8) TruncateToInt32() Int32x8

TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Int64x8) TruncateToInt8 ¶

func (x Int64x8) TruncateToInt8() Int8x16

TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVQB, CPU Feature: AVX512

func (Int64x8) Xor ¶

func (x Int64x8) Xor(y Int64x8) Int64x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORQ, CPU Feature: AVX512

type Int8x16 ¶

type Int8x16 struct {
	// contains filtered or unexported fields
}

Int8x16 is a 128-bit SIMD vector of 16 int8

func BroadcastInt8x16 ¶

func BroadcastInt8x16(x int8) Int8x16

BroadcastInt8x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadInt8x16 ¶

func LoadInt8x16(y *[16]int8) Int8x16

LoadInt8x16 loads an Int8x16 from an array.

func LoadInt8x16Slice ¶

func LoadInt8x16Slice(s []int8) Int8x16

LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s

func LoadInt8x16SlicePart ¶

func LoadInt8x16SlicePart(s []int8) Int8x16

LoadInt8x16SlicePart loads an Int8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.

func (Int8x16) Abs ¶

func (x Int8x16) Abs() Int8x16

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX

func (Int8x16) Add ¶

func (x Int8x16) Add(y Int8x16) Int8x16

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX

func (Int8x16) AddSaturated ¶

func (x Int8x16) AddSaturated(y Int8x16) Int8x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX

func (Int8x16) And ¶

func (x Int8x16) And(y Int8x16) Int8x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Int8x16) AndNot ¶

func (x Int8x16) AndNot(y Int8x16) Int8x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Int8x16) AsFloat32x4 ¶

func (from Int8x16) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Int8x16 to Float32x4.

func (Int8x16) AsFloat64x2 ¶

func (from Int8x16) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Int8x16 to Float64x2.

func (Int8x16) AsInt16x8 ¶

func (from Int8x16) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Int8x16 to Int16x8.

func (Int8x16) AsInt32x4 ¶

func (from Int8x16) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Int8x16 to Int32x4.

func (Int8x16) AsInt64x2 ¶

func (from Int8x16) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Int8x16 to Int64x2.

func (Int8x16) AsUint16x8 ¶

func (from Int8x16) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Int8x16 to Uint16x8.

func (Int8x16) AsUint32x4 ¶

func (from Int8x16) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Int8x16 to Uint32x4.

func (Int8x16) AsUint64x2 ¶

func (from Int8x16) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Int8x16 to Uint64x2.

func (Int8x16) AsUint8x16 ¶

func (from Int8x16) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Int8x16 to Uint8x16.

func (Int8x16) Broadcast128 ¶

func (x Int8x16) Broadcast128() Int8x16

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Int8x16) Broadcast256 ¶

func (x Int8x16) Broadcast256() Int8x32

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Int8x16) Broadcast512 ¶

func (x Int8x16) Broadcast512() Int8x64

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX512

func (Int8x16) Compress ¶

func (x Int8x16) Compress(mask Mask8x16) Int8x16

Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
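For illustration only, the packing described above can be modelled in scalar Go on plain arrays. This sketch does not use the archsimd package: compressModel is a hypothetical name, the mask is modelled as a [16]bool, and zero-filling the unused upper elements is an assumption of the sketch.

```go
package main

import "fmt"

// compressModel is a hypothetical scalar model of Int8x16.Compress.
// Elements of x whose mask entry is true are packed, in order, into the
// lowest positions of the result. Leaving the remaining positions zero is
// an assumption of this sketch.
func compressModel(x [16]int8, mask [16]bool) [16]int8 {
	var out [16]int8
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [16]int8{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [16]bool{1: true, 3: true, 4: true}
	fmt.Println(compressModel(x, mask))
}
```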

func (Int8x16) ConcatPermute ¶

func (x Int8x16) ConcatPermute(y Int8x16, indices Uint8x16) Int8x16

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits of each element of indices that are needed to represent an index into xy are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI
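As a sketch of the indexing rule above, the following scalar model concatenates x and y and looks up each index. It does not call archsimd; concatPermuteModel is a hypothetical name, and masking each index to 5 bits (enough for the 32 elements of xy) is this sketch's reading of "only the needed bits".

```go
package main

import "fmt"

// concatPermuteModel is a hypothetical scalar model of
// Int8x16.ConcatPermute. xy is x (lower half) followed by y (upper half).
func concatPermuteModel(x, y [16]int8, indices [16]uint8) [16]int8 {
	var xy [32]int8
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])

	var out [16]int8
	for i, idx := range indices {
		// Assumption: only the 5 bits needed to address 32 elements are used.
		out[i] = xy[idx&31]
	}
	return out
}

func main() {
	var x, y [16]int8
	for i := range x {
		x[i] = int8(i)       // 0..15
		y[i] = int8(100 + i) // 100..115
	}
	indices := [16]uint8{0, 16, 15, 31}
	fmt.Println(concatPermuteModel(x, y, indices))
}
```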

func (Int8x16) CopySign ¶

func (x Int8x16) CopySign(y Int8x16) Int8x16

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGNB, CPU Feature: AVX
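The rule above amounts to "negate, zero, or keep each element of x according to the sign of the matching element of y". A minimal scalar sketch, with the hypothetical name copySignModel and no dependency on archsimd:

```go
package main

import "fmt"

// copySignModel is a hypothetical scalar model of Int8x16.CopySign:
// each x[i] is multiplied by -1, 0, or 1 depending on the sign of y[i].
func copySignModel(x, y [16]int8) [16]int8 {
	var out [16]int8
	for i := range x {
		switch {
		case y[i] < 0:
			// Ordinary int8 negation; -128 wraps to -128 in Go. That the
			// hardware behaves the same way is an assumption of this sketch.
			out[i] = -x[i]
		case y[i] > 0:
			out[i] = x[i]
		}
		// y[i] == 0 leaves out[i] at zero.
	}
	return out
}

func main() {
	x := [16]int8{5, 5, 5, -7}
	y := [16]int8{-3, 0, 9, -1}
	fmt.Println(copySignModel(x, y))
}
```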

func (Int8x16) DotProductQuadruple ¶

func (x Int8x16) DotProductQuadruple(y Uint8x16) Int32x4

DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSD, CPU Feature: AVXVNNI
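A scalar sketch of the grouping may help: each int32 result lane is the sum of four byte products. The signedness below follows the method signature (x is int8, y is uint8); dotProductQuadrupleModel is a hypothetical name and the code does not use archsimd.

```go
package main

import "fmt"

// dotProductQuadrupleModel is a hypothetical scalar model of
// Int8x16.DotProductQuadruple. Result lane g is the dot product of
// elements 4g..4g+3 of x (signed) and y (unsigned), widened to int32.
func dotProductQuadrupleModel(x [16]int8, y [16]uint8) [4]int32 {
	var out [4]int32
	for g := 0; g < 4; g++ {
		for j := 0; j < 4; j++ {
			out[g] += int32(x[4*g+j]) * int32(y[4*g+j])
		}
	}
	return out
}

func main() {
	x := [16]int8{1, 2, 3, 4, -1, -1, -1, -1}
	y := [16]uint8{1, 1, 1, 1, 255, 255, 255, 255}
	fmt.Println(dotProductQuadrupleModel(x, y))
}
```

Adding an accumulator z to the result corresponds to the DotProductQuadruple(x, y).Add(z) pattern mentioned above.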

func (Int8x16) DotProductQuadrupleSaturated ¶

func (x Int8x16) DotProductQuadrupleSaturated(y Uint8x16) Int32x4

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVXVNNI

func (Int8x16) Equal ¶

func (x Int8x16) Equal(y Int8x16) Mask8x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX

func (Int8x16) Expand ¶

func (x Int8x16) Expand(mask Mask8x16) Int8x16

Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The expansion distributes those elements, in order, to the positions indicated by mask, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
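Expand is the inverse of the packing done by Compress. The scalar sketch below models it on plain arrays without using archsimd; expandModel is a hypothetical name, and leaving positions with an unset mask at zero is an assumption of the sketch.

```go
package main

import "fmt"

// expandModel is a hypothetical scalar model of Int8x16.Expand.
// The low elements of x are handed out, in order, to the positions
// where mask is true. Other positions stay zero (an assumption).
func expandModel(x [16]int8, mask [16]bool) [16]int8 {
	var out [16]int8
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	packed := [16]int8{11, 13, 14}
	mask := [16]bool{1: true, 3: true, 4: true}
	fmt.Println(expandModel(packed, mask))
}
```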

func (Int8x16) ExtendLo2ToInt64x2 ¶

func (x Int8x16) ExtendLo2ToInt64x2() Int64x2

ExtendLo2ToInt64x2 converts the 2 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX

func (Int8x16) ExtendLo4ToInt32x4 ¶

func (x Int8x16) ExtendLo4ToInt32x4() Int32x4

ExtendLo4ToInt32x4 converts the 4 lowest vector element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX

func (Int8x16) ExtendLo4ToInt64x4 ¶

func (x Int8x16) ExtendLo4ToInt64x4() Int64x4

ExtendLo4ToInt64x4 converts the 4 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX2

func (Int8x16) ExtendLo8ToInt16x8 ¶

func (x Int8x16) ExtendLo8ToInt16x8() Int16x8

ExtendLo8ToInt16x8 converts the 8 lowest vector element values to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX

func (Int8x16) ExtendLo8ToInt32x8 ¶

func (x Int8x16) ExtendLo8ToInt32x8() Int32x8

ExtendLo8ToInt32x8 converts the 8 lowest vector element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX2

func (Int8x16) ExtendLo8ToInt64x8 ¶

func (x Int8x16) ExtendLo8ToInt64x8() Int64x8

ExtendLo8ToInt64x8 converts the 8 lowest vector element values to int64. The result vector's elements are sign-extended.

Asm: VPMOVSXBQ, CPU Feature: AVX512

func (Int8x16) ExtendToInt16 ¶

func (x Int8x16) ExtendToInt16() Int16x16

ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX2

func (Int8x16) ExtendToInt32 ¶

func (x Int8x16) ExtendToInt32() Int32x16

ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.

Asm: VPMOVSXBD, CPU Feature: AVX512

func (Int8x16) GetElem ¶

func (x Int8x16) GetElem(index uint8) int8

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRB, CPU Feature: AVX512

func (Int8x16) Greater ¶

func (x Int8x16) Greater(y Int8x16) Mask8x16

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX

func (Int8x16) GreaterEqual ¶

func (x Int8x16) GreaterEqual(y Int8x16) Mask8x16

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX

func (Int8x16) IsZero ¶

func (x Int8x16) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int8x16) Len ¶

func (x Int8x16) Len() int

Len returns the number of elements in an Int8x16.

func (Int8x16) Less ¶

func (x Int8x16) Less(y Int8x16) Mask8x16

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX

func (Int8x16) LessEqual ¶

func (x Int8x16) LessEqual(y Int8x16) Mask8x16

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX

func (Int8x16) Masked ¶

func (x Int8x16) Masked(mask Mask8x16) Int8x16

Masked returns x but with elements zeroed where mask is false.

func (Int8x16) Max ¶

func (x Int8x16) Max(y Int8x16) Int8x16

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX

func (Int8x16) Merge ¶

func (x Int8x16) Merge(y Int8x16, mask Mask8x16) Int8x16

Merge returns x but with elements set to y where mask is false.
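The selection rules of Masked and Merge can be sketched in scalar Go as follows. This is a model on plain arrays, not the archsimd API; maskedModel and mergeModel are hypothetical names and the mask is modelled as a [16]bool.

```go
package main

import "fmt"

// mergeModel is a hypothetical scalar model of Int8x16.Merge:
// the result takes x[i] where mask[i] is true and y[i] where it is false.
func mergeModel(x, y [16]int8, mask [16]bool) [16]int8 {
	out := y
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

// maskedModel is a hypothetical scalar model of Int8x16.Masked:
// the same selection, with zero taking the place of y.
func maskedModel(x [16]int8, mask [16]bool) [16]int8 {
	return mergeModel(x, [16]int8{}, mask)
}

func main() {
	x := [16]int8{1, 1, 1, 1}
	y := [16]int8{2, 2, 2, 2}
	mask := [16]bool{0: true, 2: true}
	fmt.Println(mergeModel(x, y, mask))
	fmt.Println(maskedModel(x, mask))
}
```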

func (Int8x16) Min ¶

func (x Int8x16) Min(y Int8x16) Int8x16

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX

func (Int8x16) Not ¶

func (x Int8x16) Not() Int8x16

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX

func (Int8x16) NotEqual ¶

func (x Int8x16) NotEqual(y Int8x16) Mask8x16

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX

func (Int8x16) OnesCount ¶

func (x Int8x16) OnesCount() Int8x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x16) Or ¶

func (x Int8x16) Or(y Int8x16) Int8x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Int8x16) Permute ¶

func (x Int8x16) Permute(indices Uint8x16) Int8x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Int8x16) PermuteOrZero ¶

func (x Int8x16) PermuteOrZero(indices Int8x16) Int8x16

PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set, in which case zero is used instead.

Asm: VPSHUFB, CPU Feature: AVX
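The index rule above (low four bits select, sign bit zeroes) can be sketched in scalar Go. permuteOrZeroModel is a hypothetical name; the code models the documented behavior on plain arrays and does not use archsimd.

```go
package main

import "fmt"

// permuteOrZeroModel is a hypothetical scalar model of
// Int8x16.PermuteOrZero.
func permuteOrZeroModel(x, indices [16]int8) [16]int8 {
	var out [16]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: the element is zero
		}
		out[i] = x[idx&15] // only the low four bits select an element
	}
	return out
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(10 + i) // 10..25
	}
	indices := [16]int8{3, -1, 0, 15, 19}
	fmt.Println(permuteOrZeroModel(x, indices))
}
```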

func (Int8x16) SetElem ¶

func (x Int8x16) SetElem(index uint8, y int8) Int8x16

SetElem sets a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRB, CPU Feature: AVX

func (Int8x16) Store ¶

func (x Int8x16) Store(y *[16]int8)

Store stores an Int8x16 to an array.

func (Int8x16) StoreSlice ¶

func (x Int8x16) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 16 int8s

func (Int8x16) StoreSlicePart ¶

func (x Int8x16) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
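In scalar terms, the behavior described above matches Go's built-in copy, which stops at the shorter of its two arguments. A minimal sketch on plain arrays, with the hypothetical name storeSlicePartModel and no use of archsimd:

```go
package main

import "fmt"

// storeSlicePartModel is a hypothetical scalar model of
// Int8x16.StoreSlicePart: it writes min(len(s), 16) elements of x to s.
func storeSlicePartModel(x [16]int8, s []int8) {
	copy(s, x[:]) // copy stops at the shorter of the two lengths
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(i + 1) // 1..16
	}
	tail := make([]int8, 5)
	storeSlicePartModel(x, tail)
	fmt.Println(tail)
}
```

This is the shape typically needed for the final, partial chunk of a slice whose length is not a multiple of the vector length.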

func (Int8x16) String ¶

func (x Int8x16) String() string

String returns a string representation of SIMD vector x

func (Int8x16) Sub ¶

func (x Int8x16) Sub(y Int8x16) Int8x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX

func (Int8x16) SubSaturated ¶

func (x Int8x16) SubSaturated(y Int8x16) Int8x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX
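To make "with saturation" concrete, the sketch below computes each difference in a wider type and clamps it to the int8 range, instead of letting it wrap as Sub does. subSaturatedModel is a hypothetical name and the code is a scalar model, not the archsimd API.

```go
package main

import (
	"fmt"
	"math"
)

// subSaturatedModel is a hypothetical scalar model of
// Int8x16.SubSaturated: differences are clamped to [-128, 127].
func subSaturatedModel(x, y [16]int8) [16]int8 {
	var out [16]int8
	for i := range x {
		d := int16(x[i]) - int16(y[i]) // cannot overflow in int16
		if d > math.MaxInt8 {
			d = math.MaxInt8
		}
		if d < math.MinInt8 {
			d = math.MinInt8
		}
		out[i] = int8(d)
	}
	return out
}

func main() {
	x := [16]int8{100, -100, 5}
	y := [16]int8{-100, 100, 3}
	fmt.Println(subSaturatedModel(x, y))
}
```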

func (Int8x16) ToMask ¶

func (from Int8x16) ToMask() (to Mask8x16)

ToMask converts from Int8x16 to Mask8x16. A mask element is set to true when the corresponding vector element is non-zero.

func (Int8x16) Xor ¶

func (x Int8x16) Xor(y Int8x16) Int8x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Int8x32 ¶

type Int8x32 struct {
	// contains filtered or unexported fields
}

Int8x32 is a 256-bit SIMD vector of 32 int8

func BroadcastInt8x32 ¶

func BroadcastInt8x32(x int8) Int8x32

BroadcastInt8x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX2

func LoadInt8x32 ¶

func LoadInt8x32(y *[32]int8) Int8x32

LoadInt8x32 loads an Int8x32 from an array.

func LoadInt8x32Slice ¶

func LoadInt8x32Slice(s []int8) Int8x32

LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s

func LoadInt8x32SlicePart ¶

func LoadInt8x32SlicePart(s []int8) Int8x32

LoadInt8x32SlicePart loads an Int8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.
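The zero-fill rule above can be modelled in scalar Go with a zero-initialized array and the built-in copy. loadSlicePartModel is a hypothetical name; the sketch works on plain arrays and does not use archsimd.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar model of
// LoadInt8x32SlicePart: it takes up to 32 elements from s and leaves
// the remaining elements of the result at zero.
func loadSlicePartModel(s []int8) [32]int8 {
	var v [32]int8
	copy(v[:], s) // copies min(len(s), 32) elements
	return v
}

func main() {
	fmt.Println(loadSlicePartModel([]int8{7, 8, 9}))
}
```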

func (Int8x32) Abs ¶

func (x Int8x32) Abs() Int8x32

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX2

func (Int8x32) Add ¶

func (x Int8x32) Add(y Int8x32) Int8x32

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX2

func (Int8x32) AddSaturated ¶

func (x Int8x32) AddSaturated(y Int8x32) Int8x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX2

func (Int8x32) And ¶

func (x Int8x32) And(y Int8x32) Int8x32

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Int8x32) AndNot ¶

func (x Int8x32) AndNot(y Int8x32) Int8x32

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Int8x32) AsFloat32x8 ¶

func (from Int8x32) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Int8x32 to Float32x8.

func (Int8x32) AsFloat64x4 ¶

func (from Int8x32) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Int8x32 to Float64x4.

func (Int8x32) AsInt16x16 ¶

func (from Int8x32) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Int8x32 to Int16x16.

func (Int8x32) AsInt32x8 ¶

func (from Int8x32) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Int8x32 to Int32x8.

func (Int8x32) AsInt64x4 ¶

func (from Int8x32) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Int8x32 to Int64x4.

func (Int8x32) AsUint16x16 ¶

func (from Int8x32) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Int8x32 to Uint16x16.

func (Int8x32) AsUint32x8 ¶

func (from Int8x32) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Int8x32 to Uint32x8.

func (Int8x32) AsUint64x4 ¶

func (from Int8x32) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Int8x32 to Uint64x4.

func (Int8x32) AsUint8x32 ¶

func (from Int8x32) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Int8x32 to Uint8x32.

func (Int8x32) Compress ¶

func (x Int8x32) Compress(mask Mask8x32) Int8x32

Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Int8x32) ConcatPermute ¶

func (x Int8x32) ConcatPermute(y Int8x32, indices Uint8x32) Int8x32

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits of each element of indices that are needed to represent an index into xy are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Int8x32) CopySign ¶

func (x Int8x32) CopySign(y Int8x32) Int8x32

CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.

Asm: VPSIGNB, CPU Feature: AVX2

func (Int8x32) DotProductQuadruple ¶

func (x Int8x32) DotProductQuadruple(y Uint8x32) Int32x8

DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSD, CPU Feature: AVXVNNI

func (Int8x32) DotProductQuadrupleSaturated ¶

func (x Int8x32) DotProductQuadrupleSaturated(y Uint8x32) Int32x8

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVXVNNI

func (Int8x32) Equal ¶

func (x Int8x32) Equal(y Int8x32) Mask8x32

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX2

func (Int8x32) Expand ¶

func (x Int8x32) Expand(mask Mask8x32) Int8x32

Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The expansion distributes those elements, in order, to the positions indicated by mask, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Int8x32) ExtendToInt16 ¶

func (x Int8x32) ExtendToInt16() Int16x32

ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.

Asm: VPMOVSXBW, CPU Feature: AVX512

func (Int8x32) GetHi ¶

func (x Int8x32) GetHi() Int8x16

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int8x32) GetLo ¶

func (x Int8x32) GetLo() Int8x16

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Int8x32) Greater ¶

func (x Int8x32) Greater(y Int8x32) Mask8x32

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX2

func (Int8x32) GreaterEqual ¶

func (x Int8x32) GreaterEqual(y Int8x32) Mask8x32

GreaterEqual returns a mask whose elements indicate whether x >= y.

Emulated, CPU Feature: AVX2

func (Int8x32) IsZero ¶

func (x Int8x32) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Int8x32) Len ¶

func (x Int8x32) Len() int

Len returns the number of elements in an Int8x32.

func (Int8x32) Less ¶

func (x Int8x32) Less(y Int8x32) Mask8x32

Less returns a mask whose elements indicate whether x < y.

Emulated, CPU Feature: AVX2

func (Int8x32) LessEqual ¶

func (x Int8x32) LessEqual(y Int8x32) Mask8x32

LessEqual returns a mask whose elements indicate whether x <= y.

Emulated, CPU Feature: AVX2

func (Int8x32) Masked ¶

func (x Int8x32) Masked(mask Mask8x32) Int8x32

Masked returns x but with elements zeroed where mask is false.

func (Int8x32) Max ¶

func (x Int8x32) Max(y Int8x32) Int8x32

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX2

func (Int8x32) Merge ¶

func (x Int8x32) Merge(y Int8x32, mask Mask8x32) Int8x32

Merge returns x but with elements set to y where mask is false.

func (Int8x32) Min ¶

func (x Int8x32) Min(y Int8x32) Int8x32

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX2

func (Int8x32) Not ¶

func (x Int8x32) Not() Int8x32

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX2

func (Int8x32) NotEqual ¶

func (x Int8x32) NotEqual(y Int8x32) Mask8x32

NotEqual returns a mask whose elements indicate whether x != y.

Emulated, CPU Feature: AVX2

func (Int8x32) OnesCount ¶

func (x Int8x32) OnesCount() Int8x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x32) Or ¶

func (x Int8x32) Or(y Int8x32) Int8x32

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Int8x32) Permute ¶

func (x Int8x32) Permute(indices Uint8x32) Int8x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Int8x32) PermuteOrZeroGrouped ¶

func (x Int8x32) PermuteOrZeroGrouped(indices Int8x32) Int8x32

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.

Asm: VPSHUFB, CPU Feature: AVX2
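The grouping above means that an index never reaches outside its own 128-bit (16-byte) group. A scalar sketch on plain arrays, with the hypothetical name permuteOrZeroGroupedModel and no use of archsimd:

```go
package main

import "fmt"

// permuteOrZeroGroupedModel is a hypothetical scalar model of
// Int8x32.PermuteOrZeroGrouped. Result element i is taken from the
// 16-element group of x that contains position i.
func permuteOrZeroGroupedModel(x, indices [32]int8) [32]int8 {
	var out [32]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: the element is zero
		}
		group := i / 16 * 16 // first position of the group holding i
		out[i] = x[group+int(idx&15)]
	}
	return out
}

func main() {
	var x [32]int8
	for i := range x {
		x[i] = int8(i) // 0..31
	}
	indices := [32]int8{0: 5, 16: 5, 17: -1}
	fmt.Println(permuteOrZeroGroupedModel(x, indices))
}
```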

func (Int8x32) Select128FromPair ¶

func (x Int8x32) Select128FromPair(lo, hi uint8, y Int8x32) Int8x32

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
     {0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})

returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.

lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
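The example above can be reproduced with a scalar model that numbers the four 128-bit elements 0 (low half of x), 1 (high half of x), 2 (low half of y), and 3 (high half of y). select128FromPairModel is a hypothetical name; the sketch works on plain arrays and does not use archsimd.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar model of
// Int8x32.Select128FromPair. lo picks the 16-byte element placed in the
// low half of the result, hi the one placed in the high half.
func select128FromPairModel(x [32]int8, lo, hi uint8, y [32]int8) [32]int8 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be between 0 and 3")
	}
	var xy [64]int8
	copy(xy[:32], x[:])
	copy(xy[32:], y[:])

	var out [32]int8
	copy(out[:16], xy[16*int(lo):16*int(lo)+16])
	copy(out[16:], xy[16*int(hi):16*int(hi)+16])
	return out
}

func main() {
	var x, y [32]int8
	for i := range x {
		x[i] = int8(0x40 + i) // 0x40..0x5f
		y[i] = int8(0x60 + i) // 0x60..0x7f
	}
	fmt.Printf("%#x\n", select128FromPairModel(x, 3, 0, y))
}
```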

func (Int8x32) SetHi ¶

func (x Int8x32) SetHi(y Int8x16) Int8x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int8x32) SetLo ¶

func (x Int8x32) SetLo(y Int8x16) Int8x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Int8x32) Store ¶

func (x Int8x32) Store(y *[32]int8)

Store stores an Int8x32 to an array.

func (Int8x32) StoreSlice ¶

func (x Int8x32) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 32 int8s

func (Int8x32) StoreSlicePart ¶

func (x Int8x32) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Int8x32) String ¶

func (x Int8x32) String() string

String returns a string representation of SIMD vector x

func (Int8x32) Sub ¶

func (x Int8x32) Sub(y Int8x32) Int8x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX2

func (Int8x32) SubSaturated ¶

func (x Int8x32) SubSaturated(y Int8x32) Int8x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX2

func (Int8x32) ToMask ¶

func (from Int8x32) ToMask() (to Mask8x32)

ToMask converts from Int8x32 to Mask8x32. A mask element is set to true when the corresponding vector element is non-zero.

func (Int8x32) Xor ¶

func (x Int8x32) Xor(y Int8x32) Int8x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Int8x64 ¶

type Int8x64 struct {
	// contains filtered or unexported fields
}

Int8x64 is a 512-bit SIMD vector of 64 int8

func BroadcastInt8x64 ¶

func BroadcastInt8x64(x int8) Int8x64

BroadcastInt8x64 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature: AVX512BW

func LoadInt8x64 ¶

func LoadInt8x64(y *[64]int8) Int8x64

LoadInt8x64 loads an Int8x64 from an array.

func LoadInt8x64Slice ¶

func LoadInt8x64Slice(s []int8) Int8x64

LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s

func LoadInt8x64SlicePart ¶

func LoadInt8x64SlicePart(s []int8) Int8x64

LoadInt8x64SlicePart loads an Int8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.

func LoadMaskedInt8x64 ¶

func LoadMaskedInt8x64(y *[64]int8, mask Mask8x64) Int8x64

LoadMaskedInt8x64 loads an Int8x64 from an array, at those elements enabled by mask.

Asm: VMOVDQU8.Z, CPU Feature: AVX512

func (Int8x64) Abs ¶

func (x Int8x64) Abs() Int8x64

Abs computes the absolute value of each element.

Asm: VPABSB, CPU Feature: AVX512

func (Int8x64) Add ¶

func (x Int8x64) Add(y Int8x64) Int8x64

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX512

func (Int8x64) AddSaturated ¶

func (x Int8x64) AddSaturated(y Int8x64) Int8x64

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDSB, CPU Feature: AVX512

func (Int8x64) And ¶

func (x Int8x64) And(y Int8x64) Int8x64

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Int8x64) AndNot ¶

func (x Int8x64) AndNot(y Int8x64) Int8x64

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Int8x64) AsFloat32x16 ¶

func (from Int8x64) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Int8x64 to Float32x16.

func (Int8x64) AsFloat64x8 ¶

func (from Int8x64) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Int8x64 to Float64x8.

func (Int8x64) AsInt16x32 ¶

func (from Int8x64) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Int8x64 to Int16x32.

func (Int8x64) AsInt32x16 ¶

func (from Int8x64) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Int8x64 to Int32x16.

func (Int8x64) AsInt64x8 ¶

func (from Int8x64) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Int8x64 to Int64x8.

func (Int8x64) AsUint16x32 ¶

func (from Int8x64) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Int8x64 to Uint16x32.

func (Int8x64) AsUint32x16 ¶

func (from Int8x64) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Int8x64 to Uint32x16.

func (Int8x64) AsUint64x8 ¶

func (from Int8x64) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Int8x64 to Uint64x8.

func (Int8x64) AsUint8x64 ¶

func (from Int8x64) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Int8x64 to Uint8x64.

func (Int8x64) Compress ¶

func (x Int8x64) Compress(mask Mask8x64) Int8x64

Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Int8x64) ConcatPermute ¶

func (x Int8x64) ConcatPermute(y Int8x64, indices Uint8x64) Int8x64

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits of each element of indices that are needed to represent an index into xy are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Int8x64) DotProductQuadruple ¶

func (x Int8x64) DotProductQuadruple(y Uint8x64) Int32x16

DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSD, CPU Feature: AVX512VNNI

func (Int8x64) DotProductQuadrupleSaturated ¶

func (x Int8x64) DotProductQuadrupleSaturated(y Uint8x64) Int32x16

DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y, with saturation. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.

Asm: VPDPBUSDS, CPU Feature: AVX512VNNI

func (Int8x64) Equal ¶

func (x Int8x64) Equal(y Int8x64) Mask8x64

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX512

func (Int8x64) Expand ¶

func (x Int8x64) Expand(mask Mask8x64) Int8x64

Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The expansion distributes those elements, in order, to the positions indicated by mask, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Int8x64) GetHi ¶

func (x Int8x64) GetHi() Int8x32

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int8x64) GetLo ¶

func (x Int8x64) GetLo() Int8x32

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Int8x64) Greater ¶

func (x Int8x64) Greater(y Int8x64) Mask8x64

Greater returns x greater-than y, elementwise.

Asm: VPCMPGTB, CPU Feature: AVX512

func (Int8x64) GreaterEqual ¶

func (x Int8x64) GreaterEqual(y Int8x64) Mask8x64

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) Len ¶

func (x Int8x64) Len() int

Len returns the number of elements in an Int8x64.

func (Int8x64) Less ¶

func (x Int8x64) Less(y Int8x64) Mask8x64

Less returns x less-than y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) LessEqual ¶

func (x Int8x64) LessEqual(y Int8x64) Mask8x64

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) Masked ¶

func (x Int8x64) Masked(mask Mask8x64) Int8x64

Masked returns x but with elements zeroed where mask is false.

func (Int8x64) Max ¶

func (x Int8x64) Max(y Int8x64) Int8x64

Max computes the maximum of corresponding elements.

Asm: VPMAXSB, CPU Feature: AVX512

func (Int8x64) Merge ¶

func (x Int8x64) Merge(y Int8x64, mask Mask8x64) Int8x64

Merge returns x but with elements set to y where mask is false.

func (Int8x64) Min ¶

func (x Int8x64) Min(y Int8x64) Int8x64

Min computes the minimum of corresponding elements.

Asm: VPMINSB, CPU Feature: AVX512

func (Int8x64) Not ¶

func (x Int8x64) Not() Int8x64

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Int8x64) NotEqual ¶

func (x Int8x64) NotEqual(y Int8x64) Mask8x64

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPB, CPU Feature: AVX512

func (Int8x64) OnesCount ¶

func (x Int8x64) OnesCount() Int8x64

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Int8x64) Or ¶

func (x Int8x64) Or(y Int8x64) Int8x64

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Int8x64) Permute ¶

func (x Int8x64) Permute(indices Uint8x64) Int8x64

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 6 bits (values 0-63) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Int8x64) PermuteOrZeroGrouped ¶

func (x Int8x64) PermuteOrZeroGrouped(indices Int8x64) Int8x64

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.

Asm: VPSHUFB, CPU Feature: AVX512

func (Int8x64) SetHi ¶

func (x Int8x64) SetHi(y Int8x32) Int8x64

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int8x64) SetLo ¶

func (x Int8x64) SetLo(y Int8x32) Int8x64

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Int8x64) Store ¶

func (x Int8x64) Store(y *[64]int8)

Store stores an Int8x64 to an array.

func (Int8x64) StoreMasked ¶

func (x Int8x64) StoreMasked(y *[64]int8, mask Mask8x64)

StoreMasked stores an Int8x64 to an array, at those elements enabled by mask.

Asm: VMOVDQU8, CPU Feature: AVX512

func (Int8x64) StoreSlice ¶

func (x Int8x64) StoreSlice(s []int8)

StoreSlice stores x into a slice of at least 64 int8s

func (Int8x64) StoreSlicePart ¶

func (x Int8x64) StoreSlicePart(s []int8)

StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.

func (Int8x64) String ¶

func (x Int8x64) String() string

String returns a string representation of SIMD vector x

func (Int8x64) Sub ¶

func (x Int8x64) Sub(y Int8x64) Int8x64

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX512

func (Int8x64) SubSaturated ¶

func (x Int8x64) SubSaturated(y Int8x64) Int8x64

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBSB, CPU Feature: AVX512

func (Int8x64) ToMask ¶

func (from Int8x64) ToMask() (to Mask8x64)

ToMask converts from Int8x64 to Mask8x64. A mask element is set to true when the corresponding vector element is non-zero.

func (Int8x64) Xor ¶

func (x Int8x64) Xor(y Int8x64) Int8x64

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Mask16x16 ¶

type Mask16x16 struct {
	// contains filtered or unexported fields
}

Mask16x16 is a mask for a 256-bit SIMD vector of 16 16-bit elements.

func Mask16x16FromBits ¶

func Mask16x16FromBits(y uint16) Mask16x16

Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x16) And ¶

func (x Mask16x16) And(y Mask16x16) Mask16x16

func (Mask16x16) Or ¶

func (x Mask16x16) Or(y Mask16x16) Mask16x16

func (Mask16x16) ToBits ¶

func (x Mask16x16) ToBits() uint16

ToBits constructs a bitmap from a Mask16x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512
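The bitmap convention used by Mask16x16FromBits and ToBits can be sketched in scalar Go. The model below represents the mask as a [16]bool and assumes that bit i of the bitmap corresponds to element i; maskFromBitsModel and maskToBitsModel are hypothetical names, and the code does not use archsimd.

```go
package main

import "fmt"

// maskFromBitsModel is a hypothetical scalar model of Mask16x16FromBits.
// Assumption: bit i of bits controls element i of the mask.
func maskFromBitsModel(bits uint16) [16]bool {
	var m [16]bool
	for i := range m {
		m[i] = bits>>i&1 == 1
	}
	return m
}

// maskToBitsModel is a hypothetical scalar model of Mask16x16.ToBits,
// the inverse of maskFromBitsModel.
func maskToBitsModel(m [16]bool) uint16 {
	var bits uint16
	for i, set := range m {
		if set {
			bits |= 1 << i
		}
	}
	return bits
}

func main() {
	m := maskFromBitsModel(0x8005) // elements 0, 2, and 15 set
	fmt.Println(m)
	fmt.Printf("%#04x\n", maskToBitsModel(m))
}
```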

func (Mask16x16) ToInt16x16 ¶

func (from Mask16x16) ToInt16x16() (to Int16x16)

ToInt16x16 converts from Mask16x16 to Int16x16

type Mask16x32 ¶

type Mask16x32 struct {
	// contains filtered or unexported fields
}

Mask16x32 is a mask for a 512-bit SIMD vector of 32 16-bit elements.

func Mask16x32FromBits ¶

func Mask16x32FromBits(y uint32) Mask16x32

Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x32) And ¶

func (x Mask16x32) And(y Mask16x32) Mask16x32

func (Mask16x32) Or ¶

func (x Mask16x32) Or(y Mask16x32) Mask16x32

func (Mask16x32) ToBits ¶

func (x Mask16x32) ToBits() uint32

ToBits constructs a bitmap from a Mask16x32, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x32) ToInt16x32 ¶

func (from Mask16x32) ToInt16x32() (to Int16x32)

ToInt16x32 converts from Mask16x32 to Int16x32

type Mask16x8 ¶

type Mask16x8 struct {
	// contains filtered or unexported fields
}

Mask16x8 is a mask for a 128-bit SIMD vector of 8 16-bit elements.

func Mask16x8FromBits ¶

func Mask16x8FromBits(y uint8) Mask16x8

Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x8) And ¶

func (x Mask16x8) And(y Mask16x8) Mask16x8

func (Mask16x8) Or ¶

func (x Mask16x8) Or(y Mask16x8) Mask16x8

func (Mask16x8) ToBits ¶

func (x Mask16x8) ToBits() uint8

ToBits constructs a bitmap from a Mask16x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVW, CPU Feature: AVX512

func (Mask16x8) ToInt16x8 ¶

func (from Mask16x8) ToInt16x8() (to Int16x8)

ToInt16x8 converts from Mask16x8 to Int16x8

type Mask32x16 ¶

type Mask32x16 struct {
	// contains filtered or unexported fields
}

Mask32x16 is a mask for a 512-bit SIMD vector of 16 32-bit elements.

func Mask32x16FromBits ¶

func Mask32x16FromBits(y uint16) Mask32x16

Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x16) And ¶

func (x Mask32x16) And(y Mask32x16) Mask32x16

func (Mask32x16) Or ¶

func (x Mask32x16) Or(y Mask32x16) Mask32x16

func (Mask32x16) ToBits ¶

func (x Mask32x16) ToBits() uint16

ToBits constructs a bitmap from a Mask32x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x16) ToInt32x16 ¶

func (from Mask32x16) ToInt32x16() (to Int32x16)

ToInt32x16 converts from Mask32x16 to Int32x16

type Mask32x4 ¶

type Mask32x4 struct {
	// contains filtered or unexported fields
}

Mask32x4 is a mask for a 128-bit SIMD vector of 4 int32, with one mask element per vector element.

func Mask32x4FromBits ¶

func Mask32x4FromBits(y uint8) Mask32x4

Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x4) And ¶

func (x Mask32x4) And(y Mask32x4) Mask32x4

func (Mask32x4) Or ¶

func (x Mask32x4) Or(y Mask32x4) Mask32x4

func (Mask32x4) ToBits ¶

func (x Mask32x4) ToBits() uint8

ToBits constructs a bitmap from a Mask32x4, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of the result are used.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x4) ToInt32x4 ¶

func (from Mask32x4) ToInt32x4() (to Int32x4)

ToInt32x4 converts from Mask32x4 to Int32x4

type Mask32x8 ¶

type Mask32x8 struct {
	// contains filtered or unexported fields
}

Mask32x8 is a mask for a 256-bit SIMD vector of 8 int32, with one mask element per vector element.

func Mask32x8FromBits ¶

func Mask32x8FromBits(y uint8) Mask32x8

Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x8) And ¶

func (x Mask32x8) And(y Mask32x8) Mask32x8

func (Mask32x8) Or ¶

func (x Mask32x8) Or(y Mask32x8) Mask32x8

func (Mask32x8) ToBits ¶

func (x Mask32x8) ToBits() uint8

ToBits constructs a bitmap from a Mask32x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVD, CPU Feature: AVX512

func (Mask32x8) ToInt32x8 ¶

func (from Mask32x8) ToInt32x8() (to Int32x8)

ToInt32x8 converts from Mask32x8 to Int32x8

type Mask64x2 ¶

type Mask64x2 struct {
	// contains filtered or unexported fields
}

Mask64x2 is a mask for a 128-bit SIMD vector of 2 int64, with one mask element per vector element.

func Mask64x2FromBits ¶

func Mask64x2FromBits(y uint8) Mask64x2

Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of y are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x2) And ¶

func (x Mask64x2) And(y Mask64x2) Mask64x2

func (Mask64x2) Or ¶

func (x Mask64x2) Or(y Mask64x2) Mask64x2

func (Mask64x2) ToBits ¶

func (x Mask64x2) ToBits() uint8

ToBits constructs a bitmap from a Mask64x2, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of the result are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x2) ToInt64x2 ¶

func (from Mask64x2) ToInt64x2() (to Int64x2)

ToInt64x2 converts from Mask64x2 to Int64x2

type Mask64x4 ¶

type Mask64x4 struct {
	// contains filtered or unexported fields
}

Mask64x4 is a mask for a 256-bit SIMD vector of 4 int64, with one mask element per vector element.

func Mask64x4FromBits ¶

func Mask64x4FromBits(y uint8) Mask64x4

Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x4) And ¶

func (x Mask64x4) And(y Mask64x4) Mask64x4

func (Mask64x4) Or ¶

func (x Mask64x4) Or(y Mask64x4) Mask64x4

func (Mask64x4) ToBits ¶

func (x Mask64x4) ToBits() uint8

ToBits constructs a bitmap from a Mask64x4, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of the result are used.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x4) ToInt64x4 ¶

func (from Mask64x4) ToInt64x4() (to Int64x4)

ToInt64x4 converts from Mask64x4 to Int64x4

type Mask64x8 ¶

type Mask64x8 struct {
	// contains filtered or unexported fields
}

Mask64x8 is a mask for a 512-bit SIMD vector of 8 int64, with one mask element per vector element.

func Mask64x8FromBits ¶

func Mask64x8FromBits(y uint8) Mask64x8

Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x8) And ¶

func (x Mask64x8) And(y Mask64x8) Mask64x8

func (Mask64x8) Or ¶

func (x Mask64x8) Or(y Mask64x8) Mask64x8

func (Mask64x8) ToBits ¶

func (x Mask64x8) ToBits() uint8

ToBits constructs a bitmap from a Mask64x8, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVQ, CPU Feature: AVX512

func (Mask64x8) ToInt64x8 ¶

func (from Mask64x8) ToInt64x8() (to Int64x8)

ToInt64x8 converts from Mask64x8 to Int64x8

type Mask8x16 ¶

type Mask8x16 struct {
	// contains filtered or unexported fields
}

Mask8x16 is a mask for a 128-bit SIMD vector of 16 int8, with one mask element per vector element.

func Mask8x16FromBits ¶

func Mask8x16FromBits(y uint16) Mask8x16

Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x16) And ¶

func (x Mask8x16) And(y Mask8x16) Mask8x16

func (Mask8x16) Or ¶

func (x Mask8x16) Or(y Mask8x16) Mask8x16

func (Mask8x16) ToBits ¶

func (x Mask8x16) ToBits() uint16

ToBits constructs a bitmap from a Mask8x16, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x16) ToInt8x16 ¶

func (from Mask8x16) ToInt8x16() (to Int8x16)

ToInt8x16 converts from Mask8x16 to Int8x16

type Mask8x32 ¶

type Mask8x32 struct {
	// contains filtered or unexported fields
}

Mask8x32 is a mask for a 256-bit SIMD vector of 32 int8, with one mask element per vector element.

func Mask8x32FromBits ¶

func Mask8x32FromBits(y uint32) Mask8x32

Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x32) And ¶

func (x Mask8x32) And(y Mask8x32) Mask8x32

func (Mask8x32) Or ¶

func (x Mask8x32) Or(y Mask8x32) Mask8x32

func (Mask8x32) ToBits ¶

func (x Mask8x32) ToBits() uint32

ToBits constructs a bitmap from a Mask8x32, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x32) ToInt8x32 ¶

func (from Mask8x32) ToInt8x32() (to Int8x32)

ToInt8x32 converts from Mask8x32 to Int8x32

type Mask8x64 ¶

type Mask8x64 struct {
	// contains filtered or unexported fields
}

Mask8x64 is a mask for a 512-bit SIMD vector of 64 int8, with one mask element per vector element.

func Mask8x64FromBits ¶

func Mask8x64FromBits(y uint64) Mask8x64

Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x64) And ¶

func (x Mask8x64) And(y Mask8x64) Mask8x64

func (Mask8x64) Or ¶

func (x Mask8x64) Or(y Mask8x64) Mask8x64

func (Mask8x64) ToBits ¶

func (x Mask8x64) ToBits() uint64

ToBits constructs a bitmap from a Mask8x64, where 1 means set for the indexed element, 0 means unset.

Asm: KMOVB, CPU Feature: AVX512

func (Mask8x64) ToInt8x64 ¶

func (from Mask8x64) ToInt8x64() (to Int8x64)

ToInt8x64 converts from Mask8x64 to Int8x64

type Uint16x16 ¶

type Uint16x16 struct {
	// contains filtered or unexported fields
}

Uint16x16 is a 256-bit SIMD vector of 16 uint16

func BroadcastUint16x16 ¶

func BroadcastUint16x16(x uint16) Uint16x16

BroadcastUint16x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint16x16 ¶

func LoadUint16x16(y *[16]uint16) Uint16x16

LoadUint16x16 loads a Uint16x16 from an array

func LoadUint16x16Slice ¶

func LoadUint16x16Slice(s []uint16) Uint16x16

LoadUint16x16Slice loads a Uint16x16 from a slice of at least 16 uint16s

func LoadUint16x16SlicePart ¶

func LoadUint16x16SlicePart(s []uint16) Uint16x16

LoadUint16x16SlicePart loads a Uint16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.
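The zero-fill behavior for short slices can be sketched with a scalar model. This example does not call archsimd: loadSlicePart is a hypothetical helper that models the vector as a [16]uint16.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of LoadUint16x16SlicePart:
// it copies up to 16 elements from s and leaves the remaining elements zero.
func loadSlicePart(s []uint16) [16]uint16 {
	var v [16]uint16
	copy(v[:], s) // copy stops at the shorter of the two lengths
	return v
}

func main() {
	fmt.Println(loadSlicePart([]uint16{1, 2, 3}))
}
```

A partial load of this kind is typically useful for the tail of a loop, where fewer than 16 elements remain.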

func (Uint16x16) Add ¶

func (x Uint16x16) Add(y Uint16x16) Uint16x16

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX2

func (Uint16x16) AddPairs ¶

func (x Uint16x16) AddPairs(y Uint16x16) Uint16x16

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX2

func (Uint16x16) AddSaturated ¶

func (x Uint16x16) AddSaturated(y Uint16x16) Uint16x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX2
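For unsigned elements, saturation means the sum is clamped to the largest uint16 value instead of wrapping around. The scalar sketch below models a single element; addSaturated is a hypothetical helper, not part of archsimd.

```go
package main

import "fmt"

// addSaturated is a hypothetical scalar model of one element of
// Uint16x16.AddSaturated: the sum is clamped to 0xFFFF instead of wrapping.
func addSaturated(x, y uint16) uint16 {
	sum := x + y
	if sum < x { // unsigned overflow wrapped the result
		return 0xFFFF
	}
	return sum
}

func main() {
	a, b := uint16(65530), uint16(10)
	// The first value is the saturated sum, the second is the wrapping sum.
	fmt.Println(addSaturated(a, b), a+b)
}
```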

func (Uint16x16) And ¶

func (x Uint16x16) And(y Uint16x16) Uint16x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint16x16) AndNot ¶

func (x Uint16x16) AndNot(y Uint16x16) Uint16x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint16x16) AsFloat32x8 ¶

func (from Uint16x16) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint16x16 to Float32x8, reinterpreting the vector's bits.

func (Uint16x16) AsFloat64x4 ¶

func (from Uint16x16) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint16x16 to Float64x4, reinterpreting the vector's bits.

func (Uint16x16) AsInt16x16 ¶

func (from Uint16x16) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint16x16 to Int16x16, reinterpreting the vector's bits.

func (Uint16x16) AsInt32x8 ¶

func (from Uint16x16) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint16x16 to Int32x8, reinterpreting the vector's bits.

func (Uint16x16) AsInt64x4 ¶

func (from Uint16x16) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint16x16 to Int64x4, reinterpreting the vector's bits.

func (Uint16x16) AsInt8x32 ¶

func (from Uint16x16) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint16x16 to Int8x32, reinterpreting the vector's bits.

func (Uint16x16) AsUint32x8 ¶

func (from Uint16x16) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Uint16x16 to Uint32x8, reinterpreting the vector's bits.

func (Uint16x16) AsUint64x4 ¶

func (from Uint16x16) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Uint16x16 to Uint64x4, reinterpreting the vector's bits.

func (Uint16x16) AsUint8x32 ¶

func (from Uint16x16) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Uint16x16 to Uint8x32, reinterpreting the vector's bits.

func (Uint16x16) Average ¶

func (x Uint16x16) Average(y Uint16x16) Uint16x16

Average computes the rounded average of corresponding elements.

Asm: VPAVGW, CPU Feature: AVX2
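The rounding can be made concrete with a scalar model of one element. The sketch assumes the behavior of the VPAVGW instruction named above, which computes (x + y + 1) >> 1 in a wider intermediate so that halves round up; average is a hypothetical helper, not part of archsimd.

```go
package main

import "fmt"

// average is a hypothetical scalar model of one element of
// Uint16x16.Average. The sum is formed in 32 bits so it cannot overflow,
// and adding 1 before the shift rounds halves up.
func average(x, y uint16) uint16 {
	return uint16((uint32(x) + uint32(y) + 1) >> 1)
}

func main() {
	fmt.Println(average(1, 2), average(65535, 65535))
}
```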

func (Uint16x16) Compress ¶

func (x Uint16x16) Compress(mask Mask16x16) Uint16x16

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
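A scalar sketch of the selection and packing follows. It does not use archsimd: compress is a hypothetical helper, the mask is modeled as a [16]bool, and the sketch assumes that result elements beyond the packed ones are zero, which the description above does not state.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Uint16x16.Compress: elements
// of x whose mask element is set are packed, in order, into the low
// elements of the result. Assumption: the remaining elements are zero.
func compress(x [16]uint16, mask [16]bool) [16]uint16 {
	var out [16]uint16
	k := 0 // next free position in the result
	for i, set := range mask {
		if set {
			out[k] = x[i]
			k++
		}
	}
	return out
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(10 + i)
	}
	var mask [16]bool
	mask[1], mask[3] = true, true
	fmt.Println(compress(x, mask))
}
```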

func (Uint16x16) ConcatPermute ¶

func (x Uint16x16) ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2W, CPU Feature: AVX512
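The indexing into the concatenation can be sketched in scalar form. This example does not use archsimd: concatPermute is a hypothetical helper, and it assumes that for 16-element vectors the "needed bits" are the low 5 bits of each index, since xy has 32 elements.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of Uint16x16.ConcatPermute.
// xy is x (lower half) followed by y (upper half); each result element is
// picked from xy by the low 5 bits of the corresponding index.
func concatPermute(x, y, indices [16]uint16) [16]uint16 {
	var xy [32]uint16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]uint16
	for i, idx := range indices {
		out[i] = xy[idx&31]
	}
	return out
}

func main() {
	var x, y, indices [16]uint16
	for i := range x {
		x[i] = uint16(i)
		y[i] = uint16(100 + i)
	}
	indices[0] = 16 // first element of y
	indices[1] = 3  // fourth element of x
	fmt.Println(concatPermute(x, y, indices))
}
```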

func (Uint16x16) Equal ¶

func (x Uint16x16) Equal(y Uint16x16) Mask16x16

Equal returns a mask whose elements indicate whether x == y

Asm: VPCMPEQW, CPU Feature: AVX2

func (Uint16x16) Expand ¶

func (x Uint16x16) Expand(mask Mask16x16) Uint16x16

Expand performs an expansion on vector x, whose elements are packed into its lower part. The expansion distributes those elements to the positions indicated by mask, in order from the lowest set mask element to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
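Expand is the inverse of the packing done by Compress, which a scalar sketch can show. The example does not use archsimd: expand is a hypothetical helper, the mask is modeled as a [16]bool, and the sketch assumes that positions whose mask element is unset are zero, which the description above does not state.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Uint16x16.Expand: the packed
// low elements of x are handed out, in order, to the positions whose mask
// element is set. Assumption: unselected positions are zero.
func expand(x [16]uint16, mask [16]bool) [16]uint16 {
	var out [16]uint16
	k := 0 // next packed element of x to hand out
	for i, set := range mask {
		if set {
			out[i] = x[k]
			k++
		}
	}
	return out
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(10 + i)
	}
	var mask [16]bool
	mask[1], mask[3] = true, true
	fmt.Println(expand(x, mask))
}
```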

func (Uint16x16) ExtendToUint32 ¶

func (x Uint16x16) ExtendToUint32() Uint32x16

ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXWD, CPU Feature: AVX512

func (Uint16x16) GetHi ¶

func (x Uint16x16) GetHi() Uint16x8

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint16x16) GetLo ¶

func (x Uint16x16) GetLo() Uint16x8

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint16x16) Greater ¶

func (x Uint16x16) Greater(y Uint16x16) Mask16x16

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint16x16) GreaterEqual ¶

func (x Uint16x16) GreaterEqual(y Uint16x16) Mask16x16

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint16x16) InterleaveHiGrouped ¶

func (x Uint16x16) InterleaveHiGrouped(y Uint16x16) Uint16x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX2

func (Uint16x16) InterleaveLoGrouped ¶

func (x Uint16x16) InterleaveLoGrouped(y Uint16x16) Uint16x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX2

func (Uint16x16) IsZero ¶

func (x Uint16x16) IsZero() bool

IsZero returns true if all elements of x are zero.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint16x16) Len ¶

func (x Uint16x16) Len() int

Len returns the number of elements in a Uint16x16

func (Uint16x16) Less ¶

func (x Uint16x16) Less(y Uint16x16) Mask16x16

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint16x16) LessEqual ¶

func (x Uint16x16) LessEqual(y Uint16x16) Mask16x16

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint16x16) Masked ¶

func (x Uint16x16) Masked(mask Mask16x16) Uint16x16

Masked returns x but with elements zeroed where mask is false.

func (Uint16x16) Max ¶

func (x Uint16x16) Max(y Uint16x16) Uint16x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX2

func (Uint16x16) Merge ¶

func (x Uint16x16) Merge(y Uint16x16, mask Mask16x16) Uint16x16

Merge returns x but with elements set to y where mask is false.
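The difference between Masked and Merge is what fills the elements where the mask is false: zero for Masked, the corresponding element of y for Merge. The scalar sketch below models both with hypothetical helpers that do not use archsimd and represent the mask as a [16]bool.

```go
package main

import "fmt"

// masked is a hypothetical scalar model of Uint16x16.Masked:
// elements of x are kept where mask is true and zeroed elsewhere.
func masked(x [16]uint16, mask [16]bool) [16]uint16 {
	var out [16]uint16
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a hypothetical scalar model of Uint16x16.Merge:
// elements of x are kept where mask is true and taken from y elsewhere.
func merge(x, y [16]uint16, mask [16]bool) [16]uint16 {
	out := y
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	var x, y [16]uint16
	for i := range x {
		x[i], y[i] = 1, 2
	}
	var mask [16]bool
	mask[0] = true
	fmt.Println(masked(x, mask))
	fmt.Println(merge(x, y, mask))
}
```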

func (Uint16x16) Min ¶

func (x Uint16x16) Min(y Uint16x16) Uint16x16

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX2

func (Uint16x16) Mul ¶

func (x Uint16x16) Mul(y Uint16x16) Uint16x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX2

func (Uint16x16) MulHigh ¶

func (x Uint16x16) MulHigh(y Uint16x16) Uint16x16

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHUW, CPU Feature: AVX2
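The relationship between Mul and MulHigh can be shown per element: the full product of two uint16 values needs 32 bits, Mul keeps the low 16 and MulHigh keeps the high 16. The helpers below are hypothetical scalar models and do not use archsimd.

```go
package main

import "fmt"

// mulLow is a hypothetical scalar model of one element of Uint16x16.Mul:
// the low 16 bits of the 32-bit product.
func mulLow(x, y uint16) uint16 {
	return uint16(uint32(x) * uint32(y))
}

// mulHigh is a hypothetical scalar model of one element of
// Uint16x16.MulHigh: the high 16 bits of the 32-bit product.
func mulHigh(x, y uint16) uint16 {
	return uint16((uint32(x) * uint32(y)) >> 16)
}

func main() {
	// 0xFFFF * 0xFFFF = 0xFFFE0001
	fmt.Printf("%#x %#x\n", mulHigh(0xFFFF, 0xFFFF), mulLow(0xFFFF, 0xFFFF))
}
```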

func (Uint16x16) Not ¶

func (x Uint16x16) Not() Uint16x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Uint16x16) NotEqual ¶

func (x Uint16x16) NotEqual(y Uint16x16) Mask16x16

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Uint16x16) OnesCount ¶

func (x Uint16x16) OnesCount() Uint16x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x16) Or ¶

func (x Uint16x16) Or(y Uint16x16) Uint16x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint16x16) Permute ¶

func (x Uint16x16) Permute(indices Uint16x16) Uint16x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x16) PermuteScalarsHiGrouped ¶

func (x Uint16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[0], x[1], x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
    x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}

Each group is 128 bits wide.

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX2
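The grouped indexing can be sketched in scalar form. The example does not use archsimd: permuteScalarsHiGrouped is a hypothetical helper that follows the result formula above, and it masks each index to 2 bits as an assumption, since the documentation only says the values should be between 0 and 3.

```go
package main

import "fmt"

// permuteScalarsHiGrouped is a hypothetical scalar model of
// Uint16x16.PermuteScalarsHiGrouped. Within each 128-bit group of 8
// elements, the low 4 elements pass through and the high 4 are selected
// by a, b, c, d from the high 4 elements of the same group.
func permuteScalarsHiGrouped(x [16]uint16, a, b, c, d uint8) [16]uint16 {
	out := x
	// Assumption: out-of-range indices are reduced to 0..3.
	sel := [4]uint8{a & 3, b & 3, c & 3, d & 3}
	for g := 0; g < len(x); g += 8 { // one iteration per 128-bit group
		for j, s := range sel {
			out[g+4+j] = x[g+4+int(s)]
		}
	}
	return out
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(i)
	}
	// Reverse the high four elements of each group.
	fmt.Println(permuteScalarsHiGrouped(x, 3, 2, 1, 0))
}
```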

func (Uint16x16) PermuteScalarsLoGrouped ¶

func (x Uint16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[a],   x[b],   x[c],   x[d],   x[4],  x[5],  x[6],  x[7],
    x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX2

func (Uint16x16) Select128FromPair ¶

func (x Uint16x16) Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
 {60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})

returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.

lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
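The example above can be reproduced with a scalar model. The sketch does not use archsimd: select128FromPair is a hypothetical helper that numbers the four 128-bit elements 0 and 1 for the halves of x and 2 and 3 for the halves of y, which matches the documented result.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of
// Uint16x16.Select128FromPair. x and y are viewed as four 128-bit
// elements of 8 uint16 each; lo picks the low half of the result and hi
// picks the high half.
func select128FromPair(x [16]uint16, lo, hi uint8, y [16]uint16) [16]uint16 {
	var xy [32]uint16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]uint16
	// Slicing panics if lo or hi is greater than 3.
	copy(out[:8], xy[int(lo)*8:int(lo)*8+8])
	copy(out[8:], xy[int(hi)*8:int(hi)*8+8])
	return out
}

func main() {
	x := [16]uint16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]uint16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPair(x, 3, 0, y))
}
```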

func (Uint16x16) SetHi ¶

func (x Uint16x16) SetHi(y Uint16x8) Uint16x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint16x16) SetLo ¶

func (x Uint16x16) SetLo(y Uint16x8) Uint16x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint16x16) ShiftAllLeft ¶

func (x Uint16x16) ShiftAllLeft(y uint64) Uint16x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX2

func (Uint16x16) ShiftAllLeftConcat ¶

func (x Uint16x16) ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2
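A scalar model of one element may help picture the concatenated shift. The sketch does not use archsimd: shiftAllLeftConcat is a hypothetical helper, and it only models shift counts from 0 to 15; how larger counts behave is not modeled here.

```go
package main

import "fmt"

// shiftAllLeftConcat is a hypothetical scalar model of one element of
// Uint16x16.ShiftAllLeftConcat for shift counts 0 through 15: x is
// shifted left and the vacated low bits are filled with the top bits of y.
func shiftAllLeftConcat(x uint16, shift uint8, y uint16) uint16 {
	s := uint(shift)
	// For s == 0, y>>16 is 0 in Go, so the result is x unchanged.
	return x<<s | y>>(16-s)
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcat(0x1234, 4, 0xF000))
}
```

Equivalently, the result is the upper 16 bits of the 32-bit value formed by x followed by y, after shifting that value left by the shift count.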

func (Uint16x16) ShiftAllRight ¶

func (x Uint16x16) ShiftAllRight(y uint64) Uint16x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX2

func (Uint16x16) ShiftAllRightConcat ¶

func (x Uint16x16) ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x16) ShiftLeft ¶

func (x Uint16x16) ShiftLeft(y Uint16x16) Uint16x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Uint16x16) ShiftLeftConcat ¶

func (x Uint16x16) ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x16) ShiftRight ¶

func (x Uint16x16) ShiftRight(y Uint16x16) Uint16x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x16) ShiftRightConcat ¶

func (x Uint16x16) ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Uint16x16) Store ¶

func (x Uint16x16) Store(y *[16]uint16)

Store stores a Uint16x16 to an array

func (Uint16x16) StoreSlice ¶

func (x Uint16x16) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 16 uint16s

func (Uint16x16) StoreSlicePart ¶

func (x Uint16x16) StoreSlicePart(s []uint16)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
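The partial store can be modeled in scalar form as a bounded copy. The sketch does not use archsimd: storeSlicePart is a hypothetical helper that models the vector as a [16]uint16.

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model of
// Uint16x16.StoreSlicePart: it writes as many of the 16 elements of x as
// fit in s and leaves any elements of s beyond the first 16 untouched.
func storeSlicePart(x [16]uint16, s []uint16) {
	copy(s, x[:]) // copy stops at the shorter of the two lengths
}

func main() {
	x := [16]uint16{1, 2, 3, 4}
	s := make([]uint16, 3)
	storeSlicePart(x, s)
	fmt.Println(s)
}
```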

func (Uint16x16) String ¶

func (x Uint16x16) String() string

String returns a string representation of SIMD vector x

func (Uint16x16) Sub ¶

func (x Uint16x16) Sub(y Uint16x16) Uint16x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX2

func (Uint16x16) SubPairs ¶

func (x Uint16x16) SubPairs(y Uint16x16) Uint16x16

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX2

func (Uint16x16) SubSaturated ¶

func (x Uint16x16) SubSaturated(y Uint16x16) Uint16x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX2

func (Uint16x16) TruncateToUint8 ¶

func (x Uint16x16) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVWB, CPU Feature: AVX512

func (Uint16x16) Xor ¶

func (x Uint16x16) Xor(y Uint16x16) Uint16x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint16x32 ¶

type Uint16x32 struct {
	// contains filtered or unexported fields
}

Uint16x32 is a 512-bit SIMD vector of 32 uint16

func BroadcastUint16x32 ¶

func BroadcastUint16x32(x uint16) Uint16x32

BroadcastUint16x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512BW

func LoadMaskedUint16x32 ¶

func LoadMaskedUint16x32(y *[32]uint16, mask Mask16x32) Uint16x32

LoadMaskedUint16x32 loads a Uint16x32 from an array, at those elements enabled by mask

Asm: VMOVDQU16.Z, CPU Feature: AVX512

func LoadUint16x32 ¶

func LoadUint16x32(y *[32]uint16) Uint16x32

LoadUint16x32 loads a Uint16x32 from an array

func LoadUint16x32Slice ¶

func LoadUint16x32Slice(s []uint16) Uint16x32

LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s

func LoadUint16x32SlicePart ¶

func LoadUint16x32SlicePart(s []uint16) Uint16x32

LoadUint16x32SlicePart loads a Uint16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.

func (Uint16x32) Add ¶

func (x Uint16x32) Add(y Uint16x32) Uint16x32

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX512

func (Uint16x32) AddSaturated ¶

func (x Uint16x32) AddSaturated(y Uint16x32) Uint16x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX512

func (Uint16x32) And ¶

func (x Uint16x32) And(y Uint16x32) Uint16x32

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint16x32) AndNot ¶

func (x Uint16x32) AndNot(y Uint16x32) Uint16x32

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint16x32) AsFloat32x16 ¶

func (from Uint16x32) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint16x32 to Float32x16, reinterpreting the vector's bits.

func (Uint16x32) AsFloat64x8 ¶

func (from Uint16x32) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint16x32 to Float64x8, reinterpreting the vector's bits.

func (Uint16x32) AsInt16x32 ¶

func (from Uint16x32) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint16x32 to Int16x32, reinterpreting the vector's bits.

func (Uint16x32) AsInt32x16 ¶

func (from Uint16x32) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint16x32 to Int32x16, reinterpreting the vector's bits.

func (Uint16x32) AsInt64x8 ¶

func (from Uint16x32) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint16x32 to Int64x8, reinterpreting the vector's bits.

func (Uint16x32) AsInt8x64 ¶

func (from Uint16x32) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint16x32 to Int8x64, reinterpreting the vector's bits.

func (Uint16x32) AsUint32x16 ¶

func (from Uint16x32) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Uint16x32 to Uint32x16, reinterpreting the vector's bits.

func (Uint16x32) AsUint64x8 ¶

func (from Uint16x32) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Uint16x32 to Uint64x8, reinterpreting the vector's bits.

func (Uint16x32) AsUint8x64 ¶

func (from Uint16x32) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint16x32 to Uint8x64, reinterpreting the vector's bits.

func (Uint16x32) Average ¶

func (x Uint16x32) Average(y Uint16x32) Uint16x32

Average computes the rounded average of corresponding elements.

Asm: VPAVGW, CPU Feature: AVX512

func (Uint16x32) Compress ¶

func (x Uint16x32) Compress(mask Mask16x32) Uint16x32

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Uint16x32) ConcatPermute ¶

func (x Uint16x32) ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2W, CPU Feature: AVX512

func (Uint16x32) Equal ¶

func (x Uint16x32) Equal(y Uint16x32) Mask16x32

Equal returns a mask whose elements indicate whether x == y

Asm: VPCMPEQW, CPU Feature: AVX512

func (Uint16x32) Expand ¶

func (x Uint16x32) Expand(mask Mask16x32) Uint16x32

Expand performs an expansion on vector x, whose elements are packed into its lower part. The expansion distributes those elements to the positions indicated by mask, in order from the lowest set mask element to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2

func (Uint16x32) GetHi ¶

func (x Uint16x32) GetHi() Uint16x16

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint16x32) GetLo ¶

func (x Uint16x32) GetLo() Uint16x16

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint16x32) Greater ¶

func (x Uint16x32) Greater(y Uint16x32) Mask16x32

Greater returns a mask whose elements indicate whether x > y

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) GreaterEqual ¶

func (x Uint16x32) GreaterEqual(y Uint16x32) Mask16x32

GreaterEqual returns a mask whose elements indicate whether x >= y

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) InterleaveHiGrouped ¶

func (x Uint16x32) InterleaveHiGrouped(y Uint16x32) Uint16x32

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX512

func (Uint16x32) InterleaveLoGrouped ¶

func (x Uint16x32) InterleaveLoGrouped(y Uint16x32) Uint16x32

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX512

func (Uint16x32) Len ¶

func (x Uint16x32) Len() int

Len returns the number of elements in a Uint16x32

func (Uint16x32) Less ¶

func (x Uint16x32) Less(y Uint16x32) Mask16x32

Less returns a mask whose elements indicate whether x < y

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) LessEqual ¶

func (x Uint16x32) LessEqual(y Uint16x32) Mask16x32

LessEqual returns a mask whose elements indicate whether x <= y

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) Masked ¶

func (x Uint16x32) Masked(mask Mask16x32) Uint16x32

Masked returns x but with elements zeroed where mask is false.

func (Uint16x32) Max ¶

func (x Uint16x32) Max(y Uint16x32) Uint16x32

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX512

func (Uint16x32) Merge ¶

func (x Uint16x32) Merge(y Uint16x32, mask Mask16x32) Uint16x32

Merge returns x but with elements set to y where mask is false.

func (Uint16x32) Min ¶

func (x Uint16x32) Min(y Uint16x32) Uint16x32

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX512

func (Uint16x32) Mul ¶

func (x Uint16x32) Mul(y Uint16x32) Uint16x32

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX512

func (Uint16x32) MulHigh ¶

func (x Uint16x32) MulHigh(y Uint16x32) Uint16x32

MulHigh multiplies corresponding elements and returns the high 16 bits of each 32-bit product.

Asm: VPMULHUW, CPU Feature: AVX512

func (Uint16x32) Not ¶

func (x Uint16x32) Not() Uint16x32

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Uint16x32) NotEqual ¶

func (x Uint16x32) NotEqual(y Uint16x32) Mask16x32

NotEqual returns a mask whose elements indicate whether x != y

Asm: VPCMPUW, CPU Feature: AVX512

func (Uint16x32) OnesCount ¶

func (x Uint16x32) OnesCount() Uint16x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x32) Or ¶

func (x Uint16x32) Or(y Uint16x32) Uint16x32

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint16x32) Permute ¶

func (x Uint16x32) Permute(indices Uint16x32) Uint16x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x32) PermuteScalarsHiGrouped ¶

func (x Uint16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32

PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[0],  x[1],  x[2],  x[3],  x[a+4],  x[b+4],  x[c+4],  x[d+4],
    x[8],  x[9],  x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
    x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
    x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Uint16x32) PermuteScalarsLoGrouped ¶

func (x Uint16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32

PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
   {x[a],    x[b],    x[c],    x[d],    x[4],  x[5],  x[6],  x[7],
    x[a+8],  x[b+8],  x[c+8],  x[d+8],  x[12], x[13], x[14], x[15],
    x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
    x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}

Each group is 128 bits wide.

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, a single instruction is inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512

func (Uint16x32) SaturateToUint8 ¶

func (x Uint16x32) SaturateToUint8() Uint8x32

SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSWB, CPU Feature: AVX512
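Saturation differs from the truncation used by TruncateToUint8 in how values above 255 are handled. The scalar sketch below contrasts the two for a single element; both helpers are hypothetical and do not use archsimd.

```go
package main

import "fmt"

// saturateToUint8 is a hypothetical scalar model of one element of
// Uint16x32.SaturateToUint8: values above 255 are clamped to 255.
func saturateToUint8(x uint16) uint8 {
	if x > 0xFF {
		return 0xFF
	}
	return uint8(x)
}

// truncateToUint8 is a hypothetical scalar model of one element of a
// truncating conversion: only the low 8 bits are kept.
func truncateToUint8(x uint16) uint8 {
	return uint8(x)
}

func main() {
	x := uint16(300)
	fmt.Println(saturateToUint8(x), truncateToUint8(x))
}
```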

func (Uint16x32) SetHi ¶

func (x Uint16x32) SetHi(y Uint16x16) Uint16x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint16x32) SetLo ¶

func (x Uint16x32) SetLo(y Uint16x16) Uint16x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint16x32) ShiftAllLeft ¶

func (x Uint16x32) ShiftAllLeft(y uint64) Uint16x32

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX512

func (Uint16x32) ShiftAllLeftConcat ¶

func (x Uint16x32) ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift gives better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftAllRight ¶

func (x Uint16x32) ShiftAllRight(y uint64) Uint16x32

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX512

func (Uint16x32) ShiftAllRightConcat ¶

func (x Uint16x32) ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftLeft ¶

func (x Uint16x32) ShiftLeft(y Uint16x32) Uint16x32

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Uint16x32) ShiftLeftConcat ¶

func (x Uint16x32) ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x32) ShiftRight ¶

func (x Uint16x32) ShiftRight(y Uint16x32) Uint16x32

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x32) ShiftRightConcat ¶

func (x Uint16x32) ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Uint16x32) Store ¶

func (x Uint16x32) Store(y *[32]uint16)

Store stores a Uint16x32 to an array

func (Uint16x32) StoreMasked ¶

func (x Uint16x32) StoreMasked(y *[32]uint16, mask Mask16x32)

StoreMasked stores a Uint16x32 to an array, at those elements enabled by mask

Asm: VMOVDQU16, CPU Feature: AVX512

func (Uint16x32) StoreSlice ¶

func (x Uint16x32) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 32 uint16s

func (Uint16x32) StoreSlicePart ¶

func (x Uint16x32) StoreSlicePart(s []uint16)

StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Uint16x32) String ¶

func (x Uint16x32) String() string

String returns a string representation of SIMD vector x

func (Uint16x32) Sub ¶

func (x Uint16x32) Sub(y Uint16x32) Uint16x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX512

func (Uint16x32) SubSaturated ¶

func (x Uint16x32) SubSaturated(y Uint16x32) Uint16x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX512

func (Uint16x32) TruncateToUint8 ¶

func (x Uint16x32) TruncateToUint8() Uint8x32

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements.

Asm: VPMOVWB, CPU Feature: AVX512
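SaturateToUint8 and TruncateToUint8 differ only for element values above 255. The scalar sketch below contrasts the two per-element rules; it does not use archsimd, and the function names are hypothetical.

```go
package main

import "fmt"

// saturateToUint8 clamps values above 255 to 255.
func saturateToUint8(v uint16) uint8 {
	if v > 255 {
		return 255
	}
	return uint8(v)
}

// truncateToUint8 keeps only the low 8 bits.
func truncateToUint8(v uint16) uint8 {
	return uint8(v)
}

func main() {
	for _, v := range []uint16{200, 300, 65535} {
		fmt.Println(v, saturateToUint8(v), truncateToUint8(v))
	}
}
```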

func (Uint16x32) Xor ¶

func (x Uint16x32) Xor(y Uint16x32) Uint16x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Uint16x8 ¶

type Uint16x8 struct {
	// contains filtered or unexported fields
}

Uint16x8 is a 128-bit SIMD vector of 8 uint16

func BroadcastUint16x8 ¶

func BroadcastUint16x8(x uint16) Uint16x8

BroadcastUint16x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint16x8 ¶

func LoadUint16x8(y *[8]uint16) Uint16x8

LoadUint16x8 loads a Uint16x8 from an array

func LoadUint16x8Slice ¶

func LoadUint16x8Slice(s []uint16) Uint16x8

LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s

func LoadUint16x8SlicePart ¶

func LoadUint16x8SlicePart(s []uint16) Uint16x8

LoadUint16x8SlicePart loads a Uint16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.
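The partial-load rule (and the matching StoreSlicePart rule for stores) can be modeled with Go's built-in copy, which transfers the smaller of the two lengths. This is a scalar sketch with hypothetical names that uses an [8]uint16 array in place of the vector type; it does not use archsimd.

```go
package main

import "fmt"

// loadSlicePart8 models a partial load: missing elements stay zero.
func loadSlicePart8(s []uint16) [8]uint16 {
	var v [8]uint16
	copy(v[:], s) // copies min(8, len(s)) elements
	return v
}

// storeSlicePart8 models a partial store: only as many elements as fit in s.
func storeSlicePart8(v [8]uint16, s []uint16) {
	copy(s, v[:])
}

func main() {
	v := loadSlicePart8([]uint16{1, 2, 3})
	fmt.Println(v)

	out := make([]uint16, 5)
	storeSlicePart8([8]uint16{9, 8, 7, 6, 5, 4, 3, 2}, out)
	fmt.Println(out)
}
```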

func (Uint16x8) Add ¶

func (x Uint16x8) Add(y Uint16x8) Uint16x8

Add adds corresponding elements of two vectors.

Asm: VPADDW, CPU Feature: AVX

func (Uint16x8) AddPairs ¶

func (x Uint16x8) AddPairs(y Uint16x8) Uint16x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDW, CPU Feature: AVX

func (Uint16x8) AddSaturated ¶

func (x Uint16x8) AddSaturated(y Uint16x8) Uint16x8

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSW, CPU Feature: AVX

func (Uint16x8) And ¶

func (x Uint16x8) And(y Uint16x8) Uint16x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint16x8) AndNot ¶

func (x Uint16x8) AndNot(y Uint16x8) Uint16x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint16x8) AsFloat32x4 ¶

func (from Uint16x8) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint16x8 to Float32x4

func (Uint16x8) AsFloat64x2 ¶

func (from Uint16x8) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint16x8 to Float64x2

func (Uint16x8) AsInt16x8 ¶

func (from Uint16x8) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint16x8 to Int16x8

func (Uint16x8) AsInt32x4 ¶

func (from Uint16x8) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint16x8 to Int32x4

func (Uint16x8) AsInt64x2 ¶

func (from Uint16x8) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint16x8 to Int64x2

func (Uint16x8) AsInt8x16 ¶

func (from Uint16x8) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint16x8 to Int8x16

func (Uint16x8) AsUint32x4 ¶

func (from Uint16x8) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint16x8 to Uint32x4

func (Uint16x8) AsUint64x2 ¶

func (from Uint16x8) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint16x8 to Uint64x2

func (Uint16x8) AsUint8x16 ¶

func (from Uint16x8) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint16x8 to Uint8x16

func (Uint16x8) Average ¶

func (x Uint16x8) Average(y Uint16x8) Uint16x8

Average computes the rounded average of corresponding elements.

Asm: VPAVGW, CPU Feature: AVX
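As a scalar illustration of "rounded average", the sketch below assumes the round-half-up rule of the VPAVGW instruction named above, computing (x + y + 1) >> 1 in a wider type so the sum cannot overflow. The function name is hypothetical and archsimd is not used.

```go
package main

import "fmt"

// average16 models one element of a rounded average.
// Assumption: halves round up, as in (x + y + 1) >> 1.
func average16(x, y uint16) uint16 {
	return uint16((uint32(x) + uint32(y) + 1) >> 1)
}

func main() {
	fmt.Println(average16(1, 2))         // 2
	fmt.Println(average16(65535, 65535)) // 65535
}
```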

func (Uint16x8) Broadcast128 ¶

func (x Uint16x8) Broadcast128() Uint16x8

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Uint16x8) Broadcast256 ¶

func (x Uint16x8) Broadcast256() Uint16x16

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX2

func (Uint16x8) Broadcast512 ¶

func (x Uint16x8) Broadcast512() Uint16x32

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTW, CPU Feature: AVX512

func (Uint16x8) Compress ¶

func (x Uint16x8) Compress(mask Mask16x8) Uint16x8

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements of the result.

Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2

func (Uint16x8) ConcatPermute ¶

func (x Uint16x8) ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8

ConcatPermute performs a full permutation of vector x, y using indices:

result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2W, CPU Feature: AVX512
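For the 8-element case, xy has 16 entries, so 4 index bits are needed. The scalar sketch below models that selection rule with plain arrays. It does not use archsimd, and the function name is hypothetical.

```go
package main

import "fmt"

// concatPermute8 models the selection from the concatenation of x and y.
func concatPermute8(x, y, indices [8]uint16) [8]uint16 {
	var xy [16]uint16
	copy(xy[:8], x[:]) // x is the lower half
	copy(xy[8:], y[:]) // y is the upper half
	var r [8]uint16
	for i, idx := range indices {
		r[i] = xy[idx&15] // only the low 4 bits select among 16 entries
	}
	return r
}

func main() {
	x := [8]uint16{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint16{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint16{8, 0, 9, 1, 15, 7, 16, 31}
	fmt.Println(concatPermute8(x, y, idx))
}
```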

func (Uint16x8) Equal ¶

func (x Uint16x8) Equal(y Uint16x8) Mask16x8

Equal returns x equals y, elementwise.

Asm: VPCMPEQW, CPU Feature: AVX

func (Uint16x8) Expand ¶

func (x Uint16x8) Expand(mask Mask16x8) Uint16x8

Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The elements are distributed to the positions selected by mask, in order from the lowest mask element to the highest.

Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
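Compress and Expand are inverse-style operations. The scalar sketch below models both with a [8]bool standing in for the mask type. It does not use archsimd, the names are hypothetical, and zeroing the positions that receive no element is an assumption of this sketch.

```go
package main

import "fmt"

// compress8 packs the selected elements of x into the low positions.
// Assumption: the remaining positions are zero.
func compress8(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	n := 0
	for i, on := range mask {
		if on {
			r[n] = x[i]
			n++
		}
	}
	return r
}

// expand8 spreads the low elements of x to the selected positions, in order.
// Assumption: unselected positions are zero.
func expand8(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	n := 0
	for i, on := range mask {
		if on {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]uint16{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(compress8(x, mask))
	fmt.Println(expand8(x, mask))
}
```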

func (Uint16x8) ExtendLo2ToUint64x2 ¶

func (x Uint16x8) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts the 2 lowest vector element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXWQ, CPU Feature: AVX

func (Uint16x8) ExtendLo4ToUint32x4 ¶

func (x Uint16x8) ExtendLo4ToUint32x4() Uint32x4

ExtendLo4ToUint32x4 converts the 4 lowest vector element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXWD, CPU Feature: AVX

func (Uint16x8) ExtendLo4ToUint64x4 ¶

func (x Uint16x8) ExtendLo4ToUint64x4() Uint64x4

ExtendLo4ToUint64x4 converts the 4 lowest vector element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXWQ, CPU Feature: AVX2

func (Uint16x8) ExtendToUint32 ¶

func (x Uint16x8) ExtendToUint32() Uint32x8

ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXWD, CPU Feature: AVX2

func (Uint16x8) ExtendToUint64 ¶

func (x Uint16x8) ExtendToUint64() Uint64x8

ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXWQ, CPU Feature: AVX512

func (Uint16x8) GetElem ¶

func (x Uint16x8) GetElem(index uint8) uint16

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRW, CPU Feature: AVX512

func (Uint16x8) Greater ¶

func (x Uint16x8) Greater(y Uint16x8) Mask16x8

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX

func (Uint16x8) GreaterEqual ¶

func (x Uint16x8) GreaterEqual(y Uint16x8) Mask16x8

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Uint16x8) InterleaveHi ¶

func (x Uint16x8) InterleaveHi(y Uint16x8) Uint16x8

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHWD, CPU Feature: AVX

func (Uint16x8) InterleaveLo ¶

func (x Uint16x8) InterleaveLo(y Uint16x8) Uint16x8

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLWD, CPU Feature: AVX

func (Uint16x8) IsZero ¶

func (x Uint16x8) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint16x8) Len ¶

func (x Uint16x8) Len() int

Len returns the number of elements in a Uint16x8

func (Uint16x8) Less ¶

func (x Uint16x8) Less(y Uint16x8) Mask16x8

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Uint16x8) LessEqual ¶

func (x Uint16x8) LessEqual(y Uint16x8) Mask16x8

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Uint16x8) Masked ¶

func (x Uint16x8) Masked(mask Mask16x8) Uint16x8

Masked returns x but with elements zeroed where mask is false.

func (Uint16x8) Max ¶

func (x Uint16x8) Max(y Uint16x8) Uint16x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUW, CPU Feature: AVX

func (Uint16x8) Merge ¶

func (x Uint16x8) Merge(y Uint16x8, mask Mask16x8) Uint16x8

Merge returns x but with elements set to y where mask is false.
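Masked and Merge both keep the elements of x where the mask is true and differ only in what fills the other positions. The scalar sketch below models the two rules with a [8]bool standing in for the mask type; it does not use archsimd, and the names are hypothetical.

```go
package main

import "fmt"

// masked8 keeps x where mask is true and zeroes the rest.
func masked8(x [8]uint16, mask [8]bool) [8]uint16 {
	var r [8]uint16
	for i, on := range mask {
		if on {
			r[i] = x[i]
		}
	}
	return r
}

// merge8 keeps x where mask is true and takes y where mask is false.
func merge8(x, y [8]uint16, mask [8]bool) [8]uint16 {
	r := y
	for i, on := range mask {
		if on {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]uint16{10, 20, 30, 40, 50, 60, 70, 80}
	mask := [8]bool{true, false, true, false, true, false, true, false}
	fmt.Println(masked8(x, mask))
	fmt.Println(merge8(x, y, mask))
}
```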

func (Uint16x8) Min ¶

func (x Uint16x8) Min(y Uint16x8) Uint16x8

Min computes the minimum of corresponding elements.

Asm: VPMINUW, CPU Feature: AVX

func (Uint16x8) Mul ¶

func (x Uint16x8) Mul(y Uint16x8) Uint16x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLW, CPU Feature: AVX

func (Uint16x8) MulHigh ¶

func (x Uint16x8) MulHigh(y Uint16x8) Uint16x8

MulHigh multiplies elements and stores the high part of the result.

Asm: VPMULHUW, CPU Feature: AVX
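Mul and MulHigh together cover the full 32-bit product of two uint16 elements: Mul keeps the low 16 bits and MulHigh keeps the high 16 bits. The scalar sketch below shows the per-element arithmetic; the function names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// mulLow16 returns the low 16 bits of the 32-bit product.
func mulLow16(x, y uint16) uint16 {
	return uint16(uint32(x) * uint32(y))
}

// mulHigh16 returns the high 16 bits of the 32-bit product.
func mulHigh16(x, y uint16) uint16 {
	return uint16(uint32(x) * uint32(y) >> 16)
}

func main() {
	fmt.Println(mulLow16(256, 256), mulHigh16(256, 256)) // 0 1
	fmt.Printf("%#x\n", mulHigh16(0xFFFF, 0xFFFF))       // 0xfffe
}
```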

func (Uint16x8) Not ¶

func (x Uint16x8) Not() Uint16x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Uint16x8) NotEqual ¶

func (x Uint16x8) NotEqual(y Uint16x8) Mask16x8

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Uint16x8) OnesCount ¶

func (x Uint16x8) OnesCount() Uint16x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTW, CPU Feature: AVX512BITALG

func (Uint16x8) Or ¶

func (x Uint16x8) Or(y Uint16x8) Uint16x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint16x8) Permute ¶

func (x Uint16x8) Permute(indices Uint16x8) Uint16x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMW, CPU Feature: AVX512

func (Uint16x8) PermuteScalarsHi ¶

func (x Uint16x8) PermuteScalarsHi(a, b, c, d uint8) Uint16x8

PermuteScalarsHi performs a permutation of vector x using the supplied indices:

result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFHW, CPU Feature: AVX512

func (Uint16x8) PermuteScalarsLo ¶

func (x Uint16x8) PermuteScalarsLo(a, b, c, d uint8) Uint16x8

PermuteScalarsLo performs a permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFLW, CPU Feature: AVX512
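PermuteScalarsHi and PermuteScalarsLo each rearrange one half of the vector and pass the other half through. The scalar sketch below models both result formulas. It does not use archsimd, the names are hypothetical, and reducing the indices to two bits is an assumption made to keep the model in bounds.

```go
package main

import "fmt"

// permuteScalarsLo8 rearranges elements 0..3 and keeps elements 4..7.
func permuteScalarsLo8(x [8]uint16, a, b, c, d uint8) [8]uint16 {
	r := x
	for i, s := range [4]uint8{a, b, c, d} {
		r[i] = x[s&3] // assumption: indices reduced to 0..3
	}
	return r
}

// permuteScalarsHi8 keeps elements 0..3 and rearranges elements 4..7.
func permuteScalarsHi8(x [8]uint16, a, b, c, d uint8) [8]uint16 {
	r := x
	for i, s := range [4]uint8{a, b, c, d} {
		r[4+i] = x[4+int(s&3)] // assumption: indices reduced to 0..3
	}
	return r
}

func main() {
	x := [8]uint16{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(permuteScalarsLo8(x, 1, 1, 0, 0))
	fmt.Println(permuteScalarsHi8(x, 3, 2, 1, 0))
}
```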

func (Uint16x8) SetElem ¶

func (x Uint16x8) SetElem(index uint8, y uint16) Uint16x8

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRW, CPU Feature: AVX

func (Uint16x8) ShiftAllLeft ¶

func (x Uint16x8) ShiftAllLeft(y uint64) Uint16x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLW, CPU Feature: AVX

func (Uint16x8) ShiftAllLeftConcat ¶

func (x Uint16x8) ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftAllRight ¶

func (x Uint16x8) ShiftAllRight(y uint64) Uint16x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLW, CPU Feature: AVX

func (Uint16x8) ShiftAllRightConcat ¶

func (x Uint16x8) ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftLeft ¶

func (x Uint16x8) ShiftLeft(y Uint16x8) Uint16x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVW, CPU Feature: AVX512

func (Uint16x8) ShiftLeftConcat ¶

func (x Uint16x8) ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVW, CPU Feature: AVX512VBMI2

func (Uint16x8) ShiftRight ¶

func (x Uint16x8) ShiftRight(y Uint16x8) Uint16x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVW, CPU Feature: AVX512

func (Uint16x8) ShiftRightConcat ¶

func (x Uint16x8) ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVW, CPU Feature: AVX512VBMI2

func (Uint16x8) Store ¶

func (x Uint16x8) Store(y *[8]uint16)

Store stores a Uint16x8 to an array

func (Uint16x8) StoreSlice ¶

func (x Uint16x8) StoreSlice(s []uint16)

StoreSlice stores x into a slice of at least 8 uint16s

func (Uint16x8) StoreSlicePart ¶

func (x Uint16x8) StoreSlicePart(s []uint16)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.

func (Uint16x8) String ¶

func (x Uint16x8) String() string

String returns a string representation of SIMD vector x

func (Uint16x8) Sub ¶

func (x Uint16x8) Sub(y Uint16x8) Uint16x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBW, CPU Feature: AVX

func (Uint16x8) SubPairs ¶

func (x Uint16x8) SubPairs(y Uint16x8) Uint16x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBW, CPU Feature: AVX

func (Uint16x8) SubSaturated ¶

func (x Uint16x8) SubSaturated(y Uint16x8) Uint16x8

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSW, CPU Feature: AVX
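For unsigned elements, saturation means clamping at 0 for subtraction and at 65535 for addition, instead of wrapping around. The scalar sketch below shows the per-element rules next to Go's wrapping arithmetic; the function names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// subSaturated16 clamps at 0 instead of wrapping.
func subSaturated16(x, y uint16) uint16 {
	if y > x {
		return 0
	}
	return x - y
}

// addSaturated16 clamps at 65535 instead of wrapping.
func addSaturated16(x, y uint16) uint16 {
	sum := x + y
	if sum < x { // wrapped around
		return 65535
	}
	return sum
}

func main() {
	var a, b uint16 = 3, 5
	fmt.Println(a-b, subSaturated16(a, b)) // wrapping vs. saturating
	var c, d uint16 = 65000, 1000
	fmt.Println(c+d, addSaturated16(c, d))
}
```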

func (Uint16x8) TruncateToUint8 ¶

func (x Uint16x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.

Asm: VPMOVWB, CPU Feature: AVX512
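Because the 8 inputs produce only 8 bytes, the 16-element result is half empty. The scalar sketch below models the packing described above with plain arrays; the function name is hypothetical and archsimd is not used.

```go
package main

import "fmt"

// truncatePack8 models truncation of 8 uint16 values into the low 8 bytes of
// a 16-byte result, with the upper 8 bytes left zero.
func truncatePack8(x [8]uint16) [16]uint8 {
	var r [16]uint8
	for i, v := range x {
		r[i] = uint8(v) // keep only the low 8 bits
	}
	return r
}

func main() {
	x := [8]uint16{0x0102, 0x00FF, 0x0100, 7, 0, 0xFFFF, 0x1234, 1}
	fmt.Println(truncatePack8(x))
}
```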

func (Uint16x8) Xor ¶

func (x Uint16x8) Xor(y Uint16x8) Uint16x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint32x16 ¶

type Uint32x16 struct {
	// contains filtered or unexported fields
}

Uint32x16 is a 512-bit SIMD vector of 16 uint32

func BroadcastUint32x16 ¶

func BroadcastUint32x16(x uint32) Uint32x16

BroadcastUint32x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadMaskedUint32x16 ¶

func LoadMaskedUint32x16(y *[16]uint32, mask Mask32x16) Uint32x16

LoadMaskedUint32x16 loads a Uint32x16 from an array, at those elements enabled by mask

Asm: VMOVDQU32.Z, CPU Feature: AVX512

func LoadUint32x16 ¶

func LoadUint32x16(y *[16]uint32) Uint32x16

LoadUint32x16 loads a Uint32x16 from an array

func LoadUint32x16Slice ¶

func LoadUint32x16Slice(s []uint32) Uint32x16

LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s

func LoadUint32x16SlicePart ¶

func LoadUint32x16SlicePart(s []uint32) Uint32x16

LoadUint32x16SlicePart loads a Uint32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.

func (Uint32x16) Add ¶

func (x Uint32x16) Add(y Uint32x16) Uint32x16

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX512

func (Uint32x16) And ¶

func (x Uint32x16) And(y Uint32x16) Uint32x16

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint32x16) AndNot ¶

func (x Uint32x16) AndNot(y Uint32x16) Uint32x16

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint32x16) AsFloat32x16 ¶

func (from Uint32x16) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint32x16 to Float32x16

func (Uint32x16) AsFloat64x8 ¶

func (from Uint32x16) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint32x16 to Float64x8

func (Uint32x16) AsInt16x32 ¶

func (from Uint32x16) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint32x16 to Int16x32

func (Uint32x16) AsInt32x16 ¶

func (from Uint32x16) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint32x16 to Int32x16

func (Uint32x16) AsInt64x8 ¶

func (from Uint32x16) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint32x16 to Int64x8

func (Uint32x16) AsInt8x64 ¶

func (from Uint32x16) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint32x16 to Int8x64

func (Uint32x16) AsUint16x32 ¶

func (from Uint32x16) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Uint32x16 to Uint16x32

func (Uint32x16) AsUint64x8 ¶

func (from Uint32x16) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Uint32x16 to Uint64x8

func (Uint32x16) AsUint8x64 ¶

func (from Uint32x16) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint32x16 to Uint8x64

func (Uint32x16) Compress ¶

func (x Uint32x16) Compress(mask Mask32x16) Uint32x16

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Uint32x16) ConcatPermute ¶

func (x Uint32x16) ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16

ConcatPermute performs a full permutation of vector x, y using indices:

result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512

func (Uint32x16) ConvertToFloat32 ¶

func (x Uint32x16) ConvertToFloat32() Float32x16

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x16) Equal ¶

func (x Uint32x16) Equal(y Uint32x16) Mask32x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX512

func (Uint32x16) Expand ¶

func (x Uint32x16) Expand(mask Mask32x16) Uint32x16

Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The elements are distributed to the positions selected by mask, in order from the lowest mask element to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Uint32x16) GetHi ¶

func (x Uint32x16) GetHi() Uint32x8

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint32x16) GetLo ¶

func (x Uint32x16) GetLo() Uint32x8

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint32x16) Greater ¶

func (x Uint32x16) Greater(y Uint32x16) Mask32x16

Greater returns x greater-than y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) GreaterEqual ¶

func (x Uint32x16) GreaterEqual(y Uint32x16) Mask32x16

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) InterleaveHiGrouped ¶

func (x Uint32x16) InterleaveHiGrouped(y Uint32x16) Uint32x16

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX512

func (Uint32x16) InterleaveLoGrouped ¶

func (x Uint32x16) InterleaveLoGrouped(y Uint32x16) Uint32x16

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX512

func (Uint32x16) LeadingZeros ¶

func (x Uint32x16) LeadingZeros() Uint32x16

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512
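The per-element result matches what math/bits.LeadingZeros32 returns for a single value. The scalar sketch below applies it across a [16]uint32 array as a reference model; the function name is hypothetical and archsimd is not used.

```go
package main

import (
	"fmt"
	"math/bits"
)

// leadingZeros16x32 applies bits.LeadingZeros32 to each of 16 elements.
func leadingZeros16x32(x [16]uint32) [16]uint32 {
	var r [16]uint32
	for i, v := range x {
		r[i] = uint32(bits.LeadingZeros32(v)) // 32 for a zero element
	}
	return r
}

func main() {
	x := [16]uint32{0, 1, 0x80000000, 0x0000FFFF}
	fmt.Println(leadingZeros16x32(x))
}
```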

func (Uint32x16) Len ¶

func (x Uint32x16) Len() int

Len returns the number of elements in a Uint32x16

func (Uint32x16) Less ¶

func (x Uint32x16) Less(y Uint32x16) Mask32x16

Less returns x less-than y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) LessEqual ¶

func (x Uint32x16) LessEqual(y Uint32x16) Mask32x16

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) Masked ¶

func (x Uint32x16) Masked(mask Mask32x16) Uint32x16

Masked returns x but with elements zeroed where mask is false.

func (Uint32x16) Max ¶

func (x Uint32x16) Max(y Uint32x16) Uint32x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX512

func (Uint32x16) Merge ¶

func (x Uint32x16) Merge(y Uint32x16, mask Mask32x16) Uint32x16

Merge returns x but with elements set to y where mask is false.

func (Uint32x16) Min ¶

func (x Uint32x16) Min(y Uint32x16) Uint32x16

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX512

func (Uint32x16) Mul ¶

func (x Uint32x16) Mul(y Uint32x16) Uint32x16

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX512

func (Uint32x16) Not ¶

func (x Uint32x16) Not() Uint32x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Uint32x16) NotEqual ¶

func (x Uint32x16) NotEqual(y Uint32x16) Mask32x16

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPUD, CPU Feature: AVX512

func (Uint32x16) OnesCount ¶

func (x Uint32x16) OnesCount() Uint32x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x16) Or ¶

func (x Uint32x16) Or(y Uint32x16) Uint32x16

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint32x16) Permute ¶

func (x Uint32x16) Permute(indices Uint32x16) Uint32x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX512

func (Uint32x16) PermuteScalarsGrouped ¶

func (x Uint32x16) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

 result =
	{x[a],   x[b],   x[c],   x[d],   x[a+4],  x[b+4],  x[c+4],  x[d+4],
	 x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}

Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.

Asm: VPSHUFD, CPU Feature: AVX512

func (Uint32x16) RotateAllLeft ¶

func (x Uint32x16) RotateAllLeft(shift uint8) Uint32x16

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Uint32x16) RotateAllRight ¶

func (x Uint32x16) RotateAllRight(shift uint8) Uint32x16

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Uint32x16) RotateLeft ¶

func (x Uint32x16) RotateLeft(y Uint32x16) Uint32x16

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Uint32x16) RotateRight ¶

func (x Uint32x16) RotateRight(y Uint32x16) Uint32x16

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x16) SaturateToUint16 ¶

func (x Uint32x16) SaturateToUint16() Uint16x16

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512

func (Uint32x16) SaturateToUint16Concat ¶

func (x Uint32x16) SaturateToUint16Concat(y Uint32x16) Uint16x32

SaturateToUint16Concat converts element values to uint16. Treating each 128 bits as a group, the converted group from the first input vector is packed into the lower part of the result vector, and the converted group from the second input vector is packed into the upper part of the result vector. Conversion is done with saturation on the vector elements.

Asm: VPACKUSDW, CPU Feature: AVX512

func (Uint32x16) SelectFromPairGrouped ¶

func (x Uint32x16) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.

If the selectors are not constant, this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX512
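The selector rule can be modeled with plain arrays: within each group of four uint32 elements, a selector in 0-3 reads from x and a selector in 4-7 reads from y. This is a scalar sketch with a hypothetical function name. It does not use archsimd, and using only the low 3 bits of each selector is an assumption of the sketch.

```go
package main

import "fmt"

// selectFromPairGrouped models the per-group selection from x and y.
func selectFromPairGrouped(x, y [16]uint32, a, b, c, d uint8) [16]uint32 {
	var r [16]uint32
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 16; g += 4 { // each 128-bit group holds 4 uint32 elements
		for i, s := range sel {
			if s&4 == 0 {
				r[g+i] = x[g+int(s&3)] // selectors 0..3 pick from x
			} else {
				r[g+i] = y[g+int(s&3)] // selectors 4..7 pick from y
			}
		}
	}
	return r
}

func main() {
	var x, y [16]uint32
	for i := range x {
		x[i] = uint32(i)
		y[i] = uint32(100 + i)
	}
	fmt.Println(selectFromPairGrouped(x, y, 0, 5, 2, 7))
}
```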

func (Uint32x16) SetHi ¶

func (x Uint32x16) SetHi(y Uint32x8) Uint32x16

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint32x16) SetLo ¶

func (x Uint32x16) SetLo(y Uint32x8) Uint32x16

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint32x16) ShiftAllLeft ¶

func (x Uint32x16) ShiftAllLeft(y uint64) Uint32x16

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX512

func (Uint32x16) ShiftAllLeftConcat ¶

func (x Uint32x16) ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftAllRight ¶

func (x Uint32x16) ShiftAllRight(y uint64) Uint32x16

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX512

func (Uint32x16) ShiftAllRightConcat ¶

func (x Uint32x16) ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftLeft ¶

func (x Uint32x16) ShiftLeft(y Uint32x16) Uint32x16

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX512

func (Uint32x16) ShiftLeftConcat ¶

func (x Uint32x16) ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x16) ShiftRight ¶

func (x Uint32x16) ShiftRight(y Uint32x16) Uint32x16

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX512

func (Uint32x16) ShiftRightConcat ¶

func (x Uint32x16) ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Uint32x16) Store ¶

func (x Uint32x16) Store(y *[16]uint32)

Store stores a Uint32x16 to an array

func (Uint32x16) StoreMasked ¶

func (x Uint32x16) StoreMasked(y *[16]uint32, mask Mask32x16)

StoreMasked stores a Uint32x16 to an array, at those elements enabled by mask

Asm: VMOVDQU32, CPU Feature: AVX512

func (Uint32x16) StoreSlice ¶

func (x Uint32x16) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 16 uint32s

func (Uint32x16) StoreSlicePart ¶

func (x Uint32x16) StoreSlicePart(s []uint32)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.

func (Uint32x16) String ¶

func (x Uint32x16) String() string

String returns a string representation of SIMD vector x

func (Uint32x16) Sub ¶

func (x Uint32x16) Sub(y Uint32x16) Uint32x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX512

func (Uint32x16) TruncateToUint16 ¶

func (x Uint32x16) TruncateToUint16() Uint16x16

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x16) TruncateToUint8 ¶

func (x Uint32x16) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVDB, CPU Feature: AVX512

func (Uint32x16) Xor ¶

func (x Uint32x16) Xor(y Uint32x16) Uint32x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type Uint32x4 ¶

type Uint32x4 struct {
	// contains filtered or unexported fields
}

Uint32x4 is a 128-bit SIMD vector of 4 uint32

func BroadcastUint32x4 ¶

func BroadcastUint32x4(x uint32) Uint32x4

BroadcastUint32x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint32x4 ¶

func LoadMaskedUint32x4(y *[4]uint32, mask Mask32x4) Uint32x4

LoadMaskedUint32x4 loads a Uint32x4 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func LoadUint32x4 ¶

func LoadUint32x4(y *[4]uint32) Uint32x4

LoadUint32x4 loads a Uint32x4 from an array

func LoadUint32x4Slice ¶

func LoadUint32x4Slice(s []uint32) Uint32x4

LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s

func LoadUint32x4SlicePart ¶

func LoadUint32x4SlicePart(s []uint32) Uint32x4

LoadUint32x4SlicePart loads a Uint32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.
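
The zero-fill rule for short slices can be modelled in plain Go. This is a sketch of the documented semantics only; loadSlicePart4 is a hypothetical name, and a [4]uint32 array stands in for the vector.

```go
package main

import "fmt"

// loadSlicePart4 is a hypothetical scalar model of LoadUint32x4SlicePart:
// it copies up to 4 elements from s and leaves the remaining lanes zero.
func loadSlicePart4(s []uint32) [4]uint32 {
	var v [4]uint32 // the zero value supplies the zero fill
	copy(v[:], s)   // copies min(4, len(s)) elements
	return v
}

func main() {
	fmt.Println(loadSlicePart4([]uint32{7, 8}))          // [7 8 0 0]
	fmt.Println(loadSlicePart4([]uint32{1, 2, 3, 4, 5})) // [1 2 3 4]
}
```

This is the behaviour that makes the function convenient for the tail of a loop, where fewer than 4 elements remain.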

func (Uint32x4) AESInvMixColumns ¶

func (x Uint32x4) AESInvMixColumns() Uint32x4

AESInvMixColumns performs the InvMixColumns operation in the AES cipher algorithm defined in FIPS 197. x is the chunk of the w array in use.

result = InvMixColumns(x)

Asm: VAESIMC, CPU Feature: AVX, AES

func (Uint32x4) AESRoundKeyGenAssist ¶

func (x Uint32x4) AESRoundKeyGenAssist(rconVal uint8) Uint32x4

AESRoundKeyGenAssist performs some components of KeyExpansion in the AES cipher algorithm defined in FIPS 197. x is an array of AES words, but only x[0] and x[2] are used. rconVal is a value from the Rcon constant array.

result[0] = XOR(SubWord(RotWord(x[0])), rconVal)
result[1] = SubWord(x[1])
result[2] = XOR(SubWord(RotWord(x[2])), rconVal)
result[3] = SubWord(x[3])

rconVal results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES

func (Uint32x4) Add ¶

func (x Uint32x4) Add(y Uint32x4) Uint32x4

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX

func (Uint32x4) AddPairs ¶

func (x Uint32x4) AddPairs(y Uint32x4) Uint32x4

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX

func (Uint32x4) And ¶

func (x Uint32x4) And(y Uint32x4) Uint32x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint32x4) AndNot ¶

func (x Uint32x4) AndNot(y Uint32x4) Uint32x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint32x4) AsFloat32x4 ¶

func (from Uint32x4) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint32x4 to Float32x4

func (Uint32x4) AsFloat64x2 ¶

func (from Uint32x4) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint32x4 to Float64x2

func (Uint32x4) AsInt16x8 ¶

func (from Uint32x4) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint32x4 to Int16x8

func (Uint32x4) AsInt32x4 ¶

func (from Uint32x4) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint32x4 to Int32x4

func (Uint32x4) AsInt64x2 ¶

func (from Uint32x4) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint32x4 to Int64x2

func (Uint32x4) AsInt8x16 ¶

func (from Uint32x4) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint32x4 to Int8x16

func (Uint32x4) AsUint16x8 ¶

func (from Uint32x4) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint32x4 to Uint16x8

func (Uint32x4) AsUint64x2 ¶

func (from Uint32x4) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint32x4 to Uint64x2

func (Uint32x4) AsUint8x16 ¶

func (from Uint32x4) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint32x4 to Uint8x16

func (Uint32x4) Broadcast128 ¶

func (x Uint32x4) Broadcast128() Uint32x4

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Uint32x4) Broadcast256 ¶

func (x Uint32x4) Broadcast256() Uint32x8

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX2

func (Uint32x4) Broadcast512 ¶

func (x Uint32x4) Broadcast512() Uint32x16

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTD, CPU Feature: AVX512

func (Uint32x4) Compress ¶

func (x Uint32x4) Compress(mask Mask32x4) Uint32x4

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512
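
A scalar sketch of the packing step may help. The helper compress4 is hypothetical, a [4]bool stands in for Mask32x4, and the sketch assumes that lanes beyond the packed elements are zero, which the text above does not state.

```go
package main

import "fmt"

// compress4 is a hypothetical scalar model of Compress on 4 lanes.
func compress4(x [4]uint32, mask [4]bool) [4]uint32 {
	var out [4]uint32 // assumption: lanes past the packed elements are zero
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i] // selected elements keep their relative order
			n++
		}
	}
	return out
}

func main() {
	x := [4]uint32{10, 20, 30, 40}
	fmt.Println(compress4(x, [4]bool{false, true, false, true})) // [20 40 0 0]
}
```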

func (Uint32x4) ConcatPermute ¶

func (x Uint32x4) ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4

ConcatPermute performs a full permutation of vectors x and y using indices:

result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512
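
For 4-lane vectors, xy has 8 elements, so 3 index bits are needed. The following plain-Go sketch models that case; concatPermute4 is a hypothetical name and arrays stand in for the vector types.

```go
package main

import "fmt"

// concatPermute4 is a hypothetical scalar model of ConcatPermute on
// 4-lane vectors: each result lane picks one of the 8 elements of x and y.
func concatPermute4(x, y, indices [4]uint32) [4]uint32 {
	var xy [8]uint32
	copy(xy[:4], x[:]) // x is the lower half
	copy(xy[4:], y[:]) // y is the upper half
	var out [4]uint32
	for i, idx := range indices {
		out[i] = xy[idx&7] // 8 candidates, so only the low 3 bits matter
	}
	return out
}

func main() {
	x := [4]uint32{1, 2, 3, 4}
	y := [4]uint32{5, 6, 7, 8}
	fmt.Println(concatPermute4(x, y, [4]uint32{7, 0, 4, 3})) // [8 1 5 4]
}
```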

func (Uint32x4) ConvertToFloat32 ¶

func (x Uint32x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x4) ConvertToFloat64 ¶

func (x Uint32x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTUDQ2PD, CPU Feature: AVX512

func (Uint32x4) Equal ¶

func (x Uint32x4) Equal(y Uint32x4) Mask32x4

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX

func (Uint32x4) Expand ¶

func (x Uint32x4) Expand(mask Mask32x4) Uint32x4

Expand takes a vector x whose elements are packed into its lowest-indexed positions and distributes them, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512
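
A scalar sketch of the distribution step follows. The helper expand4 is hypothetical, a [4]bool stands in for Mask32x4, and the sketch assumes that lanes not selected by mask are zero, which the text above does not state.

```go
package main

import "fmt"

// expand4 is a hypothetical scalar model of Expand on 4 lanes.
func expand4(x [4]uint32, mask [4]bool) [4]uint32 {
	var out [4]uint32 // assumption: unselected lanes are zero
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n] // the next packed element goes to the next set lane
			n++
		}
	}
	return out
}

func main() {
	packed := [4]uint32{20, 40, 0, 0}
	fmt.Println(expand4(packed, [4]bool{false, true, false, true})) // [0 20 0 40]
}
```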

func (Uint32x4) ExtendLo2ToUint64x2 ¶

func (x Uint32x4) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXDQ, CPU Feature: AVX

func (Uint32x4) ExtendToUint64 ¶

func (x Uint32x4) ExtendToUint64() Uint64x4

ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXDQ, CPU Feature: AVX2

func (Uint32x4) GetElem ¶

func (x Uint32x4) GetElem(index uint8) uint32

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPEXTRD, CPU Feature: AVX

func (Uint32x4) Greater ¶

func (x Uint32x4) Greater(y Uint32x4) Mask32x4

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX

func (Uint32x4) GreaterEqual ¶

func (x Uint32x4) GreaterEqual(y Uint32x4) Mask32x4

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Uint32x4) InterleaveHi ¶

func (x Uint32x4) InterleaveHi(y Uint32x4) Uint32x4

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX

func (Uint32x4) InterleaveLo ¶

func (x Uint32x4) InterleaveLo(y Uint32x4) Uint32x4

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX

func (Uint32x4) IsZero ¶

func (x Uint32x4) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint32x4) LeadingZeros ¶

func (x Uint32x4) LeadingZeros() Uint32x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Uint32x4) Len ¶

func (x Uint32x4) Len() int

Len returns the number of elements in a Uint32x4

func (Uint32x4) Less ¶

func (x Uint32x4) Less(y Uint32x4) Mask32x4

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Uint32x4) LessEqual ¶

func (x Uint32x4) LessEqual(y Uint32x4) Mask32x4

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Uint32x4) Masked ¶

func (x Uint32x4) Masked(mask Mask32x4) Uint32x4

Masked returns x but with elements zeroed where mask is false.

func (Uint32x4) Max ¶

func (x Uint32x4) Max(y Uint32x4) Uint32x4

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX

func (Uint32x4) Merge ¶

func (x Uint32x4) Merge(y Uint32x4, mask Mask32x4) Uint32x4

Merge returns x but with elements set to y where mask is false.
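
The difference between Masked and Merge, as described above, can be modelled lane by lane in plain Go. The helpers masked4 and merge4 are hypothetical names, and a [4]bool stands in for Mask32x4.

```go
package main

import "fmt"

// masked4 models Masked: lanes where mask is false become zero.
func masked4(x [4]uint32, mask [4]bool) [4]uint32 {
	var out [4]uint32
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

// merge4 models Merge: lanes where mask is false are taken from y.
func merge4(x, y [4]uint32, mask [4]bool) [4]uint32 {
	out := y
	for i, keep := range mask {
		if keep {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [4]uint32{1, 2, 3, 4}
	y := [4]uint32{9, 9, 9, 9}
	mask := [4]bool{true, false, true, false}
	fmt.Println(masked4(x, mask))   // [1 0 3 0]
	fmt.Println(merge4(x, y, mask)) // [1 9 3 9]
}
```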

func (Uint32x4) Min ¶

func (x Uint32x4) Min(y Uint32x4) Uint32x4

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX

func (Uint32x4) Mul ¶

func (x Uint32x4) Mul(y Uint32x4) Uint32x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX

func (Uint32x4) MulEvenWiden ¶

func (x Uint32x4) MulEvenWiden(y Uint32x4) Uint64x2

MulEvenWiden multiplies the even-indexed elements of x and y, widening each product to 64 bits: result[i] = uint64(x[2*i]) * uint64(y[2*i]).

Asm: VPMULUDQ, CPU Feature: AVX
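
A plain-Go sketch of the widening multiply, using arrays in place of the vector types. The name mulEvenWiden4 is hypothetical; the point of the example is that the product is formed in 64 bits, so it cannot overflow.

```go
package main

import "fmt"

// mulEvenWiden4 is a hypothetical scalar model of MulEvenWiden on
// Uint32x4: lanes 0 and 2 are multiplied, lanes 1 and 3 are ignored.
func mulEvenWiden4(x, y [4]uint32) [2]uint64 {
	var out [2]uint64
	for i := range out {
		out[i] = uint64(x[2*i]) * uint64(y[2*i]) // widen before multiplying
	}
	return out
}

func main() {
	x := [4]uint32{0xFFFFFFFF, 7, 3, 7}
	y := [4]uint32{0xFFFFFFFF, 7, 5, 7}
	r := mulEvenWiden4(x, y)
	// Lane 0 is 0xFFFFFFFF * 0xFFFFFFFF, which would not fit in a uint32.
	fmt.Println(r[0] == 0xFFFFFFFE00000001, r[1] == 15)
}
```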

func (Uint32x4) Not ¶

func (x Uint32x4) Not() Uint32x4

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Uint32x4) NotEqual ¶

func (x Uint32x4) NotEqual(y Uint32x4) Mask32x4

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Uint32x4) OnesCount ¶

func (x Uint32x4) OnesCount() Uint32x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x4) Or ¶

func (x Uint32x4) Or(y Uint32x4) Uint32x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint32x4) PermuteScalars ¶

func (x Uint32x4) PermuteScalars(a, b, c, d uint8) Uint32x4

PermuteScalars performs a permutation of vector x's elements using the supplied indices:

result = {x[a], x[b], x[c], x[d]}

Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.

Asm: VPSHUFD, CPU Feature: AVX
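
The formula above translates directly into plain Go. In this sketch permuteScalars4 is a hypothetical name, and the indices are masked to 0-3 so the model never indexes out of range; how the real method treats larger values is not stated here.

```go
package main

import "fmt"

// permuteScalars4 is a hypothetical scalar model of PermuteScalars.
func permuteScalars4(x [4]uint32, a, b, c, d uint8) [4]uint32 {
	// Assumption: indices are reduced to the range 0-3.
	return [4]uint32{x[a&3], x[b&3], x[c&3], x[d&3]}
}

func main() {
	x := [4]uint32{10, 20, 30, 40}
	fmt.Println(permuteScalars4(x, 3, 2, 1, 0)) // [40 30 20 10]
	fmt.Println(permuteScalars4(x, 0, 0, 0, 0)) // [10 10 10 10]
}
```

Repeating an index is allowed, so the same call shape covers reversal, rotation and broadcast of a single lane.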

func (Uint32x4) RotateAllLeft ¶

func (x Uint32x4) RotateAllLeft(shift uint8) Uint32x4

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512
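
A lane-wise rotation can be modelled with math/bits from the standard library. The helper rotateAllLeft4 is a hypothetical name; the sketch assumes the count is taken modulo 32, which is how bits.RotateLeft32 behaves.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft4 is a hypothetical scalar model of RotateAllLeft:
// every lane is rotated left by the same count.
func rotateAllLeft4(x [4]uint32, shift uint8) [4]uint32 {
	for i := range x {
		// Bits shifted out at the top re-enter at the bottom.
		x[i] = bits.RotateLeft32(x[i], int(shift))
	}
	return x
}

func main() {
	x := [4]uint32{0x80000001, 0x00000001, 0xF0000000, 0}
	fmt.Printf("%#x\n", rotateAllLeft4(x, 4)) // [0x18 0x10 0xf 0x0]
}
```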

func (Uint32x4) RotateAllRight ¶

func (x Uint32x4) RotateAllRight(shift uint8) Uint32x4

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Uint32x4) RotateLeft ¶

func (x Uint32x4) RotateLeft(y Uint32x4) Uint32x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512

func (Uint32x4) RotateRight ¶

func (x Uint32x4) RotateRight(y Uint32x4) Uint32x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x4) SHA1FourRounds ¶

func (x Uint32x4) SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4

SHA1FourRounds performs 4 rounds of the B loop in the SHA1 algorithm defined in FIPS 180-4. x contains the state variables a, b, c and d from upper to lower order. y contains the W array elements (with the state variable e added to the upper element) from upper to lower order. The result is the state variables a', b', c', d' updated after 4 rounds. constant is 0 for the first 20 rounds of the loop, 1 for the next 20 rounds of the loop, ..., 3 for the last 20 rounds of the loop.

constant results in better performance when it is a compile-time constant; a non-constant value is translated into a jump table.

Asm: SHA1RNDS4, CPU Feature: SHA

func (Uint32x4) SHA1Message1 ¶

func (x Uint32x4) SHA1Message1(y Uint32x4) Uint32x4

SHA1Message1 does the XORing of 1 in the SHA1 algorithm defined in FIPS 180-4.

x = {W3, W2, W1, W0}
y = {0, 0, W5, W4}
result = {W3^W5, W2^W4, W1^W3, W0^W2}

Asm: SHA1MSG1, CPU Feature: SHA

func (Uint32x4) SHA1Message2 ¶

func (x Uint32x4) SHA1Message2(y Uint32x4) Uint32x4

SHA1Message2 does the calculation of 3 and 4 in the SHA1 algorithm defined in FIPS 180-4.

x = result of 2
y = {W15, W14, W13}
result = {W19, W18, W17, W16}

Asm: SHA1MSG2, CPU Feature: SHA

func (Uint32x4) SHA1NextE ¶

func (x Uint32x4) SHA1NextE(y Uint32x4) Uint32x4

SHA1NextE calculates the state variable e' updated after 4 rounds in the SHA1 algorithm defined in FIPS 180-4. x contains the state variable a (before the 4 rounds), placed in the upper element. y is the elements of the W array for the next 4 rounds from upper to lower order. The result is the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element, from upper to lower order. For the last round of the loop, you can specify zero for y to obtain the e' value itself or, better, specify H4:0:0:0 for y to get e' added to H4. (Note that the value of e' is computed only from x; the values of y do not affect it.)

Asm: SHA1NEXTE, CPU Feature: SHA

func (Uint32x4) SHA256Message1 ¶

func (x Uint32x4) SHA256Message1(y Uint32x4) Uint32x4

SHA256Message1 does the sigma and addition of 1 in the SHA256 algorithm defined in FIPS 180-4.

x = {W0, W1, W2, W3}
y = {W4, 0, 0, 0}
result = {W0+σ(W1), W1+σ(W2), W2+σ(W3), W3+σ(W4)}

Asm: SHA256MSG1, CPU Feature: SHA

func (Uint32x4) SHA256Message2 ¶

func (x Uint32x4) SHA256Message2(y Uint32x4) Uint32x4

SHA256Message2 does the sigma and addition of 3 in the SHA256 algorithm defined in FIPS 180-4.

x = result of 2
y = {0, 0, W14, W15}
result = {W16, W17, W18, W19}

Asm: SHA256MSG2, CPU Feature: SHA

func (Uint32x4) SHA256TwoRounds ¶

func (x Uint32x4) SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4

SHA256TwoRounds does 2 rounds of the B loop to calculate updated state variables in the SHA256 algorithm defined in FIPS 180-4.

x = {h, g, d, c}
y = {f, e, b, a}
z = {W0+K0, W1+K1}
result = {f', e', b', a'}

The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is to be added to the corresponding element of the W array to make the input data z. The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input data y (the state variables a, b, e, f before the 2 rounds).

Asm: SHA256RNDS2, CPU Feature: SHA

func (Uint32x4) SaturateToUint16 ¶

func (x Uint32x4) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512
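
Saturation differs from an ordinary Go conversion, which truncates. The following plain-Go sketch shows the per-element rule only; saturateToUint16 here is a hypothetical scalar helper, not the vector method.

```go
package main

import "fmt"

// saturateToUint16 is a hypothetical scalar model of the per-element
// conversion: values that do not fit are clamped to the uint16 maximum.
func saturateToUint16(v uint32) uint16 {
	if v > 0xFFFF {
		return 0xFFFF
	}
	return uint16(v)
}

func main() {
	for _, v := range []uint32{0x1234, 0xFFFF, 0x12345} {
		// uint16(v) keeps only the low 16 bits, so 0x12345 becomes 0x2345,
		// while saturation gives 0xffff.
		fmt.Printf("%#x: saturate=%#x truncate=%#x\n", v, saturateToUint16(v), uint16(v))
	}
}
```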

func (Uint32x4) SaturateToUint16Concat ¶

func (x Uint32x4) SaturateToUint16Concat(y Uint32x4) Uint16x8

SaturateToUint16Concat converts element values to uint16 with saturation. Each 128-bit group is handled separately: the converted group from the first input vector is packed into the lower part of the result vector, and the converted group from the second input vector is packed into the upper part.

Asm: VPACKUSDW, CPU Feature: AVX

func (Uint32x4) SelectFromPair ¶

func (x Uint32x4) SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4

SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
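
The selector rule can be checked against the example in the description with a plain-Go model. selectFromPair4 is a hypothetical name; arrays stand in for the vectors, and reducing selectors modulo 8 is an assumption of the sketch.

```go
package main

import "fmt"

// selectFromPair4 is a hypothetical scalar model of SelectFromPair.
func selectFromPair4(x [4]uint32, a, b, c, d uint8, y [4]uint32) [4]uint32 {
	pick := func(sel uint8) uint32 {
		sel &= 7 // assumption: selectors are reduced to the range 0-7
		if sel < 4 {
			return x[sel] // 0-3 select from x
		}
		return y[sel-4] // 4-7 select elements 0-3 of y
	}
	return [4]uint32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]uint32{1, 2, 4, 8}
	y := [4]uint32{9, 25, 49, 81}
	fmt.Println(selectFromPair4(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```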

func (Uint32x4) SetElem ¶

func (x Uint32x4) SetElem(index uint8, y uint32) Uint32x4

SetElem sets a single constant-indexed element's value.

index results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPINSRD, CPU Feature: AVX

func (Uint32x4) ShiftAllLeft ¶

func (x Uint32x4) ShiftAllLeft(y uint64) Uint32x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX

func (Uint32x4) ShiftAllLeftConcat ¶

func (x Uint32x4) ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2

func (Uint32x4) ShiftAllRight ¶

func (x Uint32x4) ShiftAllRight(y uint64) Uint32x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX

func (Uint32x4) ShiftAllRightConcat ¶

func (x Uint32x4) ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value is translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Uint32x4) ShiftLeft ¶

func (x Uint32x4) ShiftLeft(y Uint32x4) Uint32x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Uint32x4) ShiftLeftConcat ¶

func (x Uint32x4) ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x4) ShiftRight ¶

func (x Uint32x4) ShiftRight(y Uint32x4) Uint32x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX2

func (Uint32x4) ShiftRightConcat ¶

func (x Uint32x4) ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
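
With per-element counts, each lane performs its own funnel shift. The sketch below models this in plain Go; shiftRightConcat4 is a hypothetical name and arrays stand in for the vector types.

```go
package main

import "fmt"

// shiftRightConcat4 is a hypothetical scalar model of ShiftRightConcat:
// lane i of x is shifted right by y[i], and the vacated high bits are
// filled from the low bits of z[i].
func shiftRightConcat4(x, y, z [4]uint32) [4]uint32 {
	var out [4]uint32
	for i := range out {
		s := y[i] & 31 // only the lower 5 bits of each count are used
		// When s is 0, z[i]<<32 is 0 in Go, so the lane is just x[i].
		out[i] = x[i]>>s | z[i]<<(32-s)
	}
	return out
}

func main() {
	x := [4]uint32{0xFF000000, 0xFF000000, 0x00000001, 0x80000000}
	y := [4]uint32{8, 0, 1, 36} // 36 is treated as 4
	z := [4]uint32{0x000000AB, 0x000000AB, 0x00000001, 0x0000000F}
	fmt.Printf("%#x\n", shiftRightConcat4(x, y, z)) // [0xabff0000 0xff000000 0x80000000 0xf8000000]
}
```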

func (Uint32x4) Store ¶

func (x Uint32x4) Store(y *[4]uint32)

Store stores a Uint32x4 to an array

func (Uint32x4) StoreMasked ¶

func (x Uint32x4) StoreMasked(y *[4]uint32, mask Mask32x4)

StoreMasked stores a Uint32x4 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Uint32x4) StoreSlice ¶

func (x Uint32x4) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 4 uint32s

func (Uint32x4) StoreSlicePart ¶

func (x Uint32x4) StoreSlicePart(s []uint32)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
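
The "as many elements as will fit" rule matches the behaviour of Go's built-in copy, which the following plain-Go model uses. storeSlicePart4 is a hypothetical name, and a [4]uint32 array stands in for the vector.

```go
package main

import "fmt"

// storeSlicePart4 is a hypothetical scalar model of StoreSlicePart:
// it writes min(len(s), 4) elements and never grows s.
func storeSlicePart4(x [4]uint32, s []uint32) {
	copy(s, x[:])
}

func main() {
	x := [4]uint32{1, 2, 3, 4}

	short := make([]uint32, 2)
	storeSlicePart4(x, short)
	fmt.Println(short) // [1 2]

	long := []uint32{9, 9, 9, 9, 9, 9}
	storeSlicePart4(x, long)
	fmt.Println(long) // [1 2 3 4 9 9]
}
```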

func (Uint32x4) String ¶

func (x Uint32x4) String() string

String returns a string representation of SIMD vector x

func (Uint32x4) Sub ¶

func (x Uint32x4) Sub(y Uint32x4) Uint32x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX

func (Uint32x4) SubPairs ¶

func (x Uint32x4) SubPairs(y Uint32x4) Uint32x4

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX

func (Uint32x4) TruncateToUint16 ¶

func (x Uint32x4) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x4) TruncateToUint8 ¶

func (x Uint32x4) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.

Asm: VPMOVDB, CPU Feature: AVX512

func (Uint32x4) Xor ¶

func (x Uint32x4) Xor(y Uint32x4) Uint32x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint32x8 ¶

type Uint32x8 struct {
	// contains filtered or unexported fields
}

Uint32x8 is a 256-bit SIMD vector of 8 uint32

func BroadcastUint32x8 ¶

func BroadcastUint32x8(x uint32) Uint32x8

BroadcastUint32x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint32x8 ¶

func LoadMaskedUint32x8(y *[8]uint32, mask Mask32x8) Uint32x8

LoadMaskedUint32x8 loads a Uint32x8 from an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func LoadUint32x8 ¶

func LoadUint32x8(y *[8]uint32) Uint32x8

LoadUint32x8 loads a Uint32x8 from an array

func LoadUint32x8Slice ¶

func LoadUint32x8Slice(s []uint32) Uint32x8

LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s

func LoadUint32x8SlicePart ¶

func LoadUint32x8SlicePart(s []uint32) Uint32x8

LoadUint32x8SlicePart loads a Uint32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.

func (Uint32x8) Add ¶

func (x Uint32x8) Add(y Uint32x8) Uint32x8

Add adds corresponding elements of two vectors.

Asm: VPADDD, CPU Feature: AVX2

func (Uint32x8) AddPairs ¶

func (x Uint32x8) AddPairs(y Uint32x8) Uint32x8

AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].

Asm: VPHADDD, CPU Feature: AVX2

func (Uint32x8) And ¶

func (x Uint32x8) And(y Uint32x8) Uint32x8

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint32x8) AndNot ¶

func (x Uint32x8) AndNot(y Uint32x8) Uint32x8

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint32x8) AsFloat32x8 ¶

func (from Uint32x8) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint32x8 to Float32x8

func (Uint32x8) AsFloat64x4 ¶

func (from Uint32x8) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint32x8 to Float64x4

func (Uint32x8) AsInt16x16 ¶

func (from Uint32x8) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint32x8 to Int16x16

func (Uint32x8) AsInt32x8 ¶

func (from Uint32x8) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint32x8 to Int32x8

func (Uint32x8) AsInt64x4 ¶

func (from Uint32x8) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint32x8 to Int64x4

func (Uint32x8) AsInt8x32 ¶

func (from Uint32x8) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint32x8 to Int8x32

func (Uint32x8) AsUint16x16 ¶

func (from Uint32x8) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint32x8 to Uint16x16

func (Uint32x8) AsUint64x4 ¶

func (from Uint32x8) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Uint32x8 to Uint64x4

func (Uint32x8) AsUint8x32 ¶

func (from Uint32x8) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Uint32x8 to Uint8x32

func (Uint32x8) Compress ¶

func (x Uint32x8) Compress(mask Mask32x8) Uint32x8

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSD, CPU Feature: AVX512

func (Uint32x8) ConcatPermute ¶

func (x Uint32x8) ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8

ConcatPermute performs a full permutation of vectors x and y using indices:

result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}

where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.

Asm: VPERMI2D, CPU Feature: AVX512

func (Uint32x8) ConvertToFloat32 ¶

func (x Uint32x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTUDQ2PS, CPU Feature: AVX512

func (Uint32x8) ConvertToFloat64 ¶

func (x Uint32x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTUDQ2PD, CPU Feature: AVX512

func (Uint32x8) Equal ¶

func (x Uint32x8) Equal(y Uint32x8) Mask32x8

Equal returns x equals y, elementwise.

Asm: VPCMPEQD, CPU Feature: AVX2

func (Uint32x8) Expand ¶

func (x Uint32x8) Expand(mask Mask32x8) Uint32x8

Expand takes a vector x whose elements are packed into its lowest-indexed positions and distributes them, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDD, CPU Feature: AVX512

func (Uint32x8) ExtendToUint64 ¶

func (x Uint32x8) ExtendToUint64() Uint64x8

ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXDQ, CPU Feature: AVX512

func (Uint32x8) GetHi ¶

func (x Uint32x8) GetHi() Uint32x4

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint32x8) GetLo ¶

func (x Uint32x8) GetLo() Uint32x4

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint32x8) Greater ¶

func (x Uint32x8) Greater(y Uint32x8) Mask32x8

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint32x8) GreaterEqual ¶

func (x Uint32x8) GreaterEqual(y Uint32x8) Mask32x8

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint32x8) InterleaveHiGrouped ¶

func (x Uint32x8) InterleaveHiGrouped(y Uint32x8) Uint32x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHDQ, CPU Feature: AVX2

func (Uint32x8) InterleaveLoGrouped ¶

func (x Uint32x8) InterleaveLoGrouped(y Uint32x8) Uint32x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLDQ, CPU Feature: AVX2

func (Uint32x8) IsZero ¶

func (x Uint32x8) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() are optimized to VPTEST x, y.

Asm: VPTEST, CPU Feature: AVX

func (Uint32x8) LeadingZeros ¶

func (x Uint32x8) LeadingZeros() Uint32x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTD, CPU Feature: AVX512

func (Uint32x8) Len ¶

func (x Uint32x8) Len() int

Len returns the number of elements in a Uint32x8

func (Uint32x8) Less ¶

func (x Uint32x8) Less(y Uint32x8) Mask32x8

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint32x8) LessEqual ¶

func (x Uint32x8) LessEqual(y Uint32x8) Mask32x8

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint32x8) Masked ¶

func (x Uint32x8) Masked(mask Mask32x8) Uint32x8

Masked returns x but with elements zeroed where mask is false.

func (Uint32x8) Max ¶

func (x Uint32x8) Max(y Uint32x8) Uint32x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUD, CPU Feature: AVX2

func (Uint32x8) Merge ¶

func (x Uint32x8) Merge(y Uint32x8, mask Mask32x8) Uint32x8

Merge returns x but with elements set to y where mask is false.

func (Uint32x8) Min ¶

func (x Uint32x8) Min(y Uint32x8) Uint32x8

Min computes the minimum of corresponding elements.

Asm: VPMINUD, CPU Feature: AVX2

func (Uint32x8) Mul ¶

func (x Uint32x8) Mul(y Uint32x8) Uint32x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLD, CPU Feature: AVX2

func (Uint32x8) MulEvenWiden ¶

func (x Uint32x8) MulEvenWiden(y Uint32x8) Uint64x4

MulEvenWiden multiplies the even-indexed elements of x and y, widening each product to 64 bits: result[i] = uint64(x[2*i]) * uint64(y[2*i]).

Asm: VPMULUDQ, CPU Feature: AVX2

func (Uint32x8) Not ¶

func (x Uint32x8) Not() Uint32x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Uint32x8) NotEqual ¶

func (x Uint32x8) NotEqual(y Uint32x8) Mask32x8

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Uint32x8) OnesCount ¶

func (x Uint32x8) OnesCount() Uint32x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ

func (Uint32x8) Or ¶

func (x Uint32x8) Or(y Uint32x8) Uint32x8

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint32x8) Permute ¶

func (x Uint32x8) Permute(indices Uint32x8) Uint32x8

Permute performs a full permutation of vector x using indices:

result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}

Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMD, CPU Feature: AVX2

func (Uint32x8) PermuteScalarsGrouped ¶

func (x Uint32x8) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8

PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:

result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}

Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.

Asm: VPSHUFD, CPU Feature: AVX2
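The grouped form applies the same four indices to each 128-bit half. A minimal scalar sketch of the formula above, with a hypothetical helper name and the assumption that a through d are already in the range 0-3:

```go
package main

import "fmt"

// permuteScalarsGroupedModel is a hypothetical scalar model of
// Uint32x8.PermuteScalarsGrouped. It assumes a, b, c, d are in 0..3,
// as the documentation requires.
func permuteScalarsGroupedModel(x [8]uint32, a, b, c, d uint8) [8]uint32 {
	return [8]uint32{
		x[a], x[b], x[c], x[d], // lower 128-bit group
		x[a+4], x[b+4], x[c+4], x[d+4], // upper 128-bit group
	}
}

func main() {
	x := [8]uint32{0, 1, 2, 3, 4, 5, 6, 7}
	// Reverse the elements inside each group of four.
	fmt.Println(permuteScalarsGroupedModel(x, 3, 2, 1, 0))
}
```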

func (Uint32x8) RotateAllLeft ¶

func (x Uint32x8) RotateAllLeft(shift uint8) Uint32x8

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLD, CPU Feature: AVX512

func (Uint32x8) RotateAllRight ¶

func (x Uint32x8) RotateAllRight(shift uint8) Uint32x8

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORD, CPU Feature: AVX512

func (Uint32x8) RotateLeft ¶

func (x Uint32x8) RotateLeft(y Uint32x8) Uint32x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVD, CPU Feature: AVX512
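The per-element rotate can be modeled with math/bits. The sketch below is not the package's code: rotateLeftModel is a hypothetical name, and it assumes rotation counts are taken modulo the 32-bit element width, which is how the VPROLVD instruction treats them.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeftModel is a hypothetical scalar model of Uint32x8.RotateLeft.
// Assumption: each count is reduced modulo 32 before rotating.
func rotateLeftModel(x, y [8]uint32) [8]uint32 {
	var r [8]uint32
	for i := range r {
		r[i] = bits.RotateLeft32(x[i], int(y[i]%32))
	}
	return r
}

func main() {
	var x [8]uint32
	for i := range x {
		x[i] = 0x80000001
	}
	counts := [8]uint32{0, 1, 4, 31, 32, 33, 0, 0}
	fmt.Printf("%#x\n", rotateLeftModel(x, counts))
}
```

RotateRight is the mirror image: rotating right by k is the same as rotating left by 32-k.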

func (Uint32x8) RotateRight ¶

func (x Uint32x8) RotateRight(y Uint32x8) Uint32x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVD, CPU Feature: AVX512

func (Uint32x8) SaturateToUint16 ¶

func (x Uint32x8) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSDW, CPU Feature: AVX512
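To make the difference between saturation and truncation concrete, here is a scalar sketch of both narrowing rules. The helper names are hypothetical and arrays stand in for the vector types.

```go
package main

import "fmt"

// saturateToUint16Model is a hypothetical scalar model of
// Uint32x8.SaturateToUint16: values above 0xFFFF clamp to 0xFFFF.
func saturateToUint16Model(x [8]uint32) [8]uint16 {
	var r [8]uint16
	for i, v := range x {
		if v > 0xFFFF {
			v = 0xFFFF
		}
		r[i] = uint16(v)
	}
	return r
}

// truncateToUint16Model models truncation for contrast: only the low
// 16 bits of each element are kept.
func truncateToUint16Model(x [8]uint32) [8]uint16 {
	var r [8]uint16
	for i, v := range x {
		r[i] = uint16(v)
	}
	return r
}

func main() {
	x := [8]uint32{0, 1, 65535, 65536, 70000, 0xFFFFFFFF, 42, 65534}
	fmt.Println(saturateToUint16Model(x))
	fmt.Println(truncateToUint16Model(x))
}
```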

func (Uint32x8) SaturateToUint16Concat ¶

func (x Uint32x8) SaturateToUint16Concat(y Uint32x8) Uint16x16

SaturateToUint16Concat converts element values to uint16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.

Asm: VPACKUSDW, CPU Feature: AVX2

func (Uint32x8) Select128FromPair ¶

func (x Uint32x8) Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})

returns {70, 71, 72, 73, 40, 41, 42, 43}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
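The example above can be checked with a scalar sketch that lays x and y out as four 128-bit groups. The helper select128FromPairModel is hypothetical, and its panic on out-of-range selectors is a modeling choice rather than a statement about the intrinsic.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar model of
// Uint32x8.Select128FromPair. Groups 0 and 1 are the halves of x,
// groups 2 and 3 are the halves of y.
func select128FromPairModel(x [8]uint32, lo, hi uint8, y [8]uint32) [8]uint32 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be in 0..3")
	}
	var xy [16]uint32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])

	var r [8]uint32
	copy(r[:4], xy[4*int(lo):4*int(lo)+4]) // lower half of the result
	copy(r[4:], xy[4*int(hi):4*int(hi)+4]) // upper half of the result
	return r
}

func main() {
	x := [8]uint32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]uint32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPairModel(x, 3, 0, y))
}
```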

func (Uint32x8) SelectFromPairGrouped ¶

func (x Uint32x8) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,

{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})

returns {4,8,25,81,64,128,169,289}

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPS, CPU Feature: AVX
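A scalar sketch of the selection rule, useful for checking the example above by hand. The helper name is hypothetical, and the model assumes every selector is in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGroupedModel is a hypothetical scalar model of
// Uint32x8.SelectFromPairGrouped. Selectors 0-3 pick from x and 4-7 pick
// from y, independently within each 128-bit half. Selectors are assumed
// to be in 0..7.
func selectFromPairGroupedModel(x [8]uint32, a, b, c, d uint8, y [8]uint32) [8]uint32 {
	sel := [4]uint8{a, b, c, d}
	var r [8]uint32
	for half := 0; half < 8; half += 4 {
		for i, s := range sel {
			if s < 4 {
				r[half+i] = x[half+int(s)]
			} else {
				r[half+i] = y[half+int(s)-4]
			}
		}
	}
	return r
}

func main() {
	x := [8]uint32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]uint32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGroupedModel(x, 2, 3, 5, 7, y))
}
```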

func (Uint32x8) SetHi ¶

func (x Uint32x8) SetHi(y Uint32x4) Uint32x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint32x8) SetLo ¶

func (x Uint32x8) SetLo(y Uint32x4) Uint32x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint32x8) ShiftAllLeft ¶

func (x Uint32x8) ShiftAllLeft(y uint64) Uint32x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLD, CPU Feature: AVX2

func (Uint32x8) ShiftAllLeftConcat ¶

func (x Uint32x8) ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDD, CPU Feature: AVX512VBMI2
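This operation is a per-element funnel shift. A scalar sketch under the stated rule (lower 5 bits of the count, upper bits of y filling the vacated low bits); the helper name shiftAllLeftConcatModel is hypothetical.

```go
package main

import "fmt"

// shiftAllLeftConcatModel is a hypothetical scalar model of
// Uint32x8.ShiftAllLeftConcat: each result element is the upper 32 bits
// of the 64-bit value (x[i]:y[i]) shifted left by shift&31.
func shiftAllLeftConcatModel(x [8]uint32, shift uint8, y [8]uint32) [8]uint32 {
	s := uint(shift & 31) // only the lower 5 bits of the count are used
	var r [8]uint32
	for i := range r {
		// In Go, shifting a uint32 right by 32 yields 0, so s == 0 returns x[i].
		r[i] = x[i]<<s | y[i]>>(32-s)
	}
	return r
}

func main() {
	var x, y [8]uint32
	for i := range x {
		x[i] = 0x00000001
		y[i] = 0xF0000000
	}
	fmt.Printf("%#x\n", shiftAllLeftConcatModel(x, 4, y))
}
```

ShiftAllRightConcat is the mirror image: x moves right and the low bits of y fill the vacated upper bits.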

func (Uint32x8) ShiftAllRight ¶

func (x Uint32x8) ShiftAllRight(y uint64) Uint32x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLD, CPU Feature: AVX2

func (Uint32x8) ShiftAllRightConcat ¶

func (x Uint32x8) ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDD, CPU Feature: AVX512VBMI2

func (Uint32x8) ShiftLeft ¶

func (x Uint32x8) ShiftLeft(y Uint32x8) Uint32x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVD, CPU Feature: AVX2

func (Uint32x8) ShiftLeftConcat ¶

func (x Uint32x8) ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVD, CPU Feature: AVX512VBMI2

func (Uint32x8) ShiftRight ¶

func (x Uint32x8) ShiftRight(y Uint32x8) Uint32x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVD, CPU Feature: AVX2

func (Uint32x8) ShiftRightConcat ¶

func (x Uint32x8) ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVD, CPU Feature: AVX512VBMI2

func (Uint32x8) Store ¶

func (x Uint32x8) Store(y *[8]uint32)

Store stores a Uint32x8 to an array

func (Uint32x8) StoreMasked ¶

func (x Uint32x8) StoreMasked(y *[8]uint32, mask Mask32x8)

StoreMasked stores a Uint32x8 to an array, at those elements enabled by mask

Asm: VMASKMOVD, CPU Feature: AVX2

func (Uint32x8) StoreSlice ¶

func (x Uint32x8) StoreSlice(s []uint32)

StoreSlice stores x into a slice of at least 8 uint32s

func (Uint32x8) StoreSlicePart ¶

func (x Uint32x8) StoreSlicePart(s []uint32)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
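The "as many elements as will fit" rule matches what the built-in copy does, so a scalar model is short. The name storeSlicePartModel is hypothetical and an array stands in for the vector.

```go
package main

import "fmt"

// storeSlicePartModel is a hypothetical scalar model of
// Uint32x8.StoreSlicePart: it writes min(len(s), 8) elements of x into s
// and leaves any remaining elements of s untouched.
func storeSlicePartModel(x [8]uint32, s []uint32) {
	copy(s, x[:])
}

func main() {
	x := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}

	short := make([]uint32, 3)
	storeSlicePartModel(x, short)
	fmt.Println(short)

	long := make([]uint32, 10)
	storeSlicePartModel(x, long)
	fmt.Println(long)
}
```

In a loop over a slice whose length is not a multiple of 8, a partial store of this kind is what handles the final, shorter chunk.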

func (Uint32x8) String ¶

func (x Uint32x8) String() string

String returns a string representation of SIMD vector x

func (Uint32x8) Sub ¶

func (x Uint32x8) Sub(y Uint32x8) Uint32x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBD, CPU Feature: AVX2

func (Uint32x8) SubPairs ¶

func (x Uint32x8) SubPairs(y Uint32x8) Uint32x8

SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].

Asm: VPHSUBD, CPU Feature: AVX2

func (Uint32x8) TruncateToUint16 ¶

func (x Uint32x8) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVDW, CPU Feature: AVX512

func (Uint32x8) TruncateToUint8 ¶

func (x Uint32x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.

Asm: VPMOVDB, CPU Feature: AVX512
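Because eight 32-bit inputs produce only eight bytes, half of the 16-byte result is padding. A scalar sketch with a hypothetical helper name:

```go
package main

import "fmt"

// truncateToUint8Model is a hypothetical scalar model of
// Uint32x8.TruncateToUint8: the low byte of each input element lands in
// result elements 0..7, and result elements 8..15 are zero.
func truncateToUint8Model(x [8]uint32) [16]uint8 {
	var r [16]uint8 // upper eight elements stay zero
	for i, v := range x {
		r[i] = uint8(v)
	}
	return r
}

func main() {
	x := [8]uint32{0x101, 0x1FF, 0x100, 7, 0xFFFFFFFF, 0, 0x12A, 0x80}
	fmt.Println(truncateToUint8Model(x))
}
```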

func (Uint32x8) Xor ¶

func (x Uint32x8) Xor(y Uint32x8) Uint32x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint64x2 ¶

type Uint64x2 struct {
	// contains filtered or unexported fields
}

Uint64x2 is a 128-bit SIMD vector of 2 uint64

func BroadcastUint64x2 ¶

func BroadcastUint64x2(x uint64) Uint64x2

BroadcastUint64x2 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint64x2 ¶

func LoadMaskedUint64x2(y *[2]uint64, mask Mask64x2) Uint64x2

LoadMaskedUint64x2 loads a Uint64x2 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func LoadUint64x2 ¶

func LoadUint64x2(y *[2]uint64) Uint64x2

LoadUint64x2 loads a Uint64x2 from an array

func LoadUint64x2Slice ¶

func LoadUint64x2Slice(s []uint64) Uint64x2

LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s

func LoadUint64x2SlicePart ¶

func LoadUint64x2SlicePart(s []uint64) Uint64x2

LoadUint64x2SlicePart loads a Uint64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.
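The zero-fill rule can be modeled in scalar code as a copy into a zero-initialized array. The helper loadSlicePartModel is hypothetical.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar model of
// LoadUint64x2SlicePart: it takes up to 2 elements from s and leaves
// the rest of the result at zero.
func loadSlicePartModel(s []uint64) [2]uint64 {
	var r [2]uint64 // zero-initialized, so missing elements stay zero
	copy(r[:], s)
	return r
}

func main() {
	fmt.Println(loadSlicePartModel(nil))
	fmt.Println(loadSlicePartModel([]uint64{7}))
	fmt.Println(loadSlicePartModel([]uint64{7, 8, 9}))
}
```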

func (Uint64x2) Add ¶

func (x Uint64x2) Add(y Uint64x2) Uint64x2

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX

func (Uint64x2) And ¶

func (x Uint64x2) And(y Uint64x2) Uint64x2

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint64x2) AndNot ¶

func (x Uint64x2) AndNot(y Uint64x2) Uint64x2

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint64x2) AsFloat32x4 ¶

func (from Uint64x2) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint64x2 to Float32x4

func (Uint64x2) AsFloat64x2 ¶

func (from Uint64x2) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint64x2 to Float64x2

func (Uint64x2) AsInt16x8 ¶

func (from Uint64x2) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint64x2 to Int16x8

func (Uint64x2) AsInt32x4 ¶

func (from Uint64x2) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint64x2 to Int32x4

func (Uint64x2) AsInt64x2 ¶

func (from Uint64x2) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint64x2 to Int64x2

func (Uint64x2) AsInt8x16 ¶

func (from Uint64x2) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint64x2 to Int8x16

func (Uint64x2) AsUint16x8 ¶

func (from Uint64x2) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint64x2 to Uint16x8

func (Uint64x2) AsUint32x4 ¶

func (from Uint64x2) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint64x2 to Uint32x4

func (Uint64x2) AsUint8x16 ¶

func (from Uint64x2) AsUint8x16() (to Uint8x16)

AsUint8x16 converts from Uint64x2 to Uint8x16

func (Uint64x2) Broadcast128 ¶

func (x Uint64x2) Broadcast128() Uint64x2

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Uint64x2) Broadcast256 ¶

func (x Uint64x2) Broadcast256() Uint64x4

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX2

func (Uint64x2) Broadcast512 ¶

func (x Uint64x2) Broadcast512() Uint64x8

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTQ, CPU Feature: AVX512

func (Uint64x2) CarrylessMultiply ¶

func (x Uint64x2) CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2

CarrylessMultiply computes one of four possible carryless multiplications of selected high and low halves of x and y, depending on the values of a and b, returning the 128-bit product in the concatenated two elements of the result. a selects the low (0) or high (1) element of x and b selects the low (0) or high (1) element of y.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b will result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX
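The XOR-based multiplication described above can be written as a bit-by-bit scalar loop. This is a sketch for checking small cases by hand, not the package's implementation: clmul64 and carrylessMultiplyModel are hypothetical names, and placing the low 64 bits of the product in element 0 is an assumption about the result layout.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (hi, lo).
// For every set bit i of b, a shifted left by i is XORed into the result.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := 0; i < 64; i++ {
		if b>>i&1 == 1 {
			lo ^= a << i
			if i > 0 {
				hi ^= a >> (64 - i) // bits of a that moved past bit 63
			}
		}
	}
	return hi, lo
}

// carrylessMultiplyModel is a hypothetical scalar model of
// Uint64x2.CarrylessMultiply. a selects the element of x and b the
// element of y. Assumption: element 0 of the result holds the low 64
// bits of the product and element 1 the high 64 bits.
func carrylessMultiplyModel(x [2]uint64, a, b uint8, y [2]uint64) [2]uint64 {
	if a > 1 || b > 1 {
		panic("a and b must be 0 or 1 in this model")
	}
	hi, lo := clmul64(x[a], y[b])
	return [2]uint64{lo, hi}
}

func main() {
	x := [2]uint64{3, 0}
	y := [2]uint64{3, 0}
	// 0b11 times 0b11 without carries is 0b101, as in the text above.
	fmt.Printf("%b\n", carrylessMultiplyModel(x, 0, 0, y))
}
```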

func (Uint64x2) Compress ¶

func (x Uint64x2) Compress(mask Mask64x2) Uint64x2

Compress performs a compression on vector x using mask by selecting elements as indicated by mask and packing them to lower indexed elements.

Asm: VPCOMPRESSQ, CPU Feature: AVX512

func (Uint64x2) ConcatPermute ¶

func (x Uint64x2) ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2Q, CPU Feature: AVX512

func (Uint64x2) ConvertToFloat32 ¶

func (x Uint64x2) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PSX, CPU Feature: AVX512

func (Uint64x2) ConvertToFloat64 ¶

func (x Uint64x2) ConvertToFloat64() Float64x2

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x2) Equal ¶

func (x Uint64x2) Equal(y Uint64x2) Mask64x2

Equal returns a mask whose elements indicate whether x == y

Asm: VPCMPEQQ, CPU Feature: AVX

func (Uint64x2) Expand ¶

func (x Uint64x2) Expand(mask Mask64x2) Uint64x2

Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion distributes those elements, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512

func (Uint64x2) GetElem ¶

func (x Uint64x2) GetElem(index uint8) uint64

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRQ, CPU Feature: AVX

func (Uint64x2) Greater ¶

func (x Uint64x2) Greater(y Uint64x2) Mask64x2

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX

func (Uint64x2) GreaterEqual ¶

func (x Uint64x2) GreaterEqual(y Uint64x2) Mask64x2

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX

func (Uint64x2) InterleaveHi ¶

func (x Uint64x2) InterleaveHi(y Uint64x2) Uint64x2

InterleaveHi interleaves the elements of the high halves of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX

func (Uint64x2) InterleaveLo ¶

func (x Uint64x2) InterleaveLo(y Uint64x2) Uint64x2

InterleaveLo interleaves the elements of the low halves of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX

func (Uint64x2) IsZero ¶

func (x Uint64x2) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint64x2) LeadingZeros ¶

func (x Uint64x2) LeadingZeros() Uint64x2

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x2) Len ¶

func (x Uint64x2) Len() int

Len returns the number of elements in a Uint64x2

func (Uint64x2) Less ¶

func (x Uint64x2) Less(y Uint64x2) Mask64x2

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX

func (Uint64x2) LessEqual ¶

func (x Uint64x2) LessEqual(y Uint64x2) Mask64x2

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX

func (Uint64x2) Masked ¶

func (x Uint64x2) Masked(mask Mask64x2) Uint64x2

Masked returns x but with elements zeroed where mask is false.

func (Uint64x2) Max ¶

func (x Uint64x2) Max(y Uint64x2) Uint64x2

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x2) Merge ¶

func (x Uint64x2) Merge(y Uint64x2, mask Mask64x2) Uint64x2

Merge returns x but with elements set to y where mask is false.

func (Uint64x2) Min ¶

func (x Uint64x2) Min(y Uint64x2) Uint64x2

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x2) Mul ¶

func (x Uint64x2) Mul(y Uint64x2) Uint64x2

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x2) Not ¶

func (x Uint64x2) Not() Uint64x2

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Uint64x2) NotEqual ¶

func (x Uint64x2) NotEqual(y Uint64x2) Mask64x2

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Uint64x2) OnesCount ¶

func (x Uint64x2) OnesCount() Uint64x2

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x2) Or ¶

func (x Uint64x2) Or(y Uint64x2) Uint64x2

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint64x2) RotateAllLeft ¶

func (x Uint64x2) RotateAllLeft(shift uint8) Uint64x2

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x2) RotateAllRight ¶

func (x Uint64x2) RotateAllRight(shift uint8) Uint64x2

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x2) RotateLeft ¶

func (x Uint64x2) RotateLeft(y Uint64x2) Uint64x2

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x2) RotateRight ¶

func (x Uint64x2) RotateRight(y Uint64x2) Uint64x2

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Uint64x2) SaturateToUint16 ¶

func (x Uint64x2) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x2) SaturateToUint32 ¶

func (x Uint64x2) SaturateToUint32() Uint32x4

SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQD, CPU Feature: AVX512

func (Uint64x2) SelectFromPair ¶

func (x Uint64x2) SelectFromPair(a, b uint8, y Uint64x2) Uint64x2

SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX

func (Uint64x2) SetElem ¶

func (x Uint64x2) SetElem(index uint8, y uint64) Uint64x2

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRQ, CPU Feature: AVX

func (Uint64x2) ShiftAllLeft ¶

func (x Uint64x2) ShiftAllLeft(y uint64) Uint64x2

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX

func (Uint64x2) ShiftAllLeftConcat ¶

func (x Uint64x2) ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftAllRight ¶

func (x Uint64x2) ShiftAllRight(y uint64) Uint64x2

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX

func (Uint64x2) ShiftAllRightConcat ¶

func (x Uint64x2) ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftLeft ¶

func (x Uint64x2) ShiftLeft(y Uint64x2) Uint64x2

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Uint64x2) ShiftLeftConcat ¶

func (x Uint64x2) ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x2) ShiftRight ¶

func (x Uint64x2) ShiftRight(y Uint64x2) Uint64x2

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX2

func (Uint64x2) ShiftRightConcat ¶

func (x Uint64x2) ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x2) Store ¶

func (x Uint64x2) Store(y *[2]uint64)

Store stores a Uint64x2 to an array

func (Uint64x2) StoreMasked ¶

func (x Uint64x2) StoreMasked(y *[2]uint64, mask Mask64x2)

StoreMasked stores a Uint64x2 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Uint64x2) StoreSlice ¶

func (x Uint64x2) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 2 uint64s

func (Uint64x2) StoreSlicePart ¶

func (x Uint64x2) StoreSlicePart(s []uint64)

StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.

func (Uint64x2) String ¶

func (x Uint64x2) String() string

String returns a string representation of SIMD vector x

func (Uint64x2) Sub ¶

func (x Uint64x2) Sub(y Uint64x2) Uint64x2

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX

func (Uint64x2) TruncateToUint16 ¶

func (x Uint64x2) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Uint64x2) TruncateToUint32 ¶

func (x Uint64x2) TruncateToUint32() Uint32x4

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x2) TruncateToUint8 ¶

func (x Uint64x2) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed to the low elements of the returned vector; its upper elements are zero-cleared.

Asm: VPMOVQB, CPU Feature: AVX512

func (Uint64x2) Xor ¶

func (x Uint64x2) Xor(y Uint64x2) Uint64x2

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint64x4 ¶

type Uint64x4 struct {
	// contains filtered or unexported fields
}

Uint64x4 is a 256-bit SIMD vector of 4 uint64

func BroadcastUint64x4 ¶

func BroadcastUint64x4(x uint64) Uint64x4

BroadcastUint64x4 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadMaskedUint64x4 ¶

func LoadMaskedUint64x4(y *[4]uint64, mask Mask64x4) Uint64x4

LoadMaskedUint64x4 loads a Uint64x4 from an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func LoadUint64x4 ¶

func LoadUint64x4(y *[4]uint64) Uint64x4

LoadUint64x4 loads a Uint64x4 from an array

func LoadUint64x4Slice ¶

func LoadUint64x4Slice(s []uint64) Uint64x4

LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s

func LoadUint64x4SlicePart ¶

func LoadUint64x4SlicePart(s []uint64) Uint64x4

LoadUint64x4SlicePart loads a Uint64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.

func (Uint64x4) Add ¶

func (x Uint64x4) Add(y Uint64x4) Uint64x4

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX2

func (Uint64x4) And ¶

func (x Uint64x4) And(y Uint64x4) Uint64x4

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint64x4) AndNot ¶

func (x Uint64x4) AndNot(y Uint64x4) Uint64x4

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint64x4) AsFloat32x8 ¶

func (from Uint64x4) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint64x4 to Float32x8

func (Uint64x4) AsFloat64x4 ¶

func (from Uint64x4) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint64x4 to Float64x4

func (Uint64x4) AsInt16x16 ¶

func (from Uint64x4) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint64x4 to Int16x16

func (Uint64x4) AsInt32x8 ¶

func (from Uint64x4) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint64x4 to Int32x8

func (Uint64x4) AsInt64x4 ¶

func (from Uint64x4) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint64x4 to Int64x4

func (Uint64x4) AsInt8x32 ¶

func (from Uint64x4) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint64x4 to Int8x32

func (Uint64x4) AsUint16x16 ¶

func (from Uint64x4) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint64x4 to Uint16x16

func (Uint64x4) AsUint32x8 ¶

func (from Uint64x4) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Uint64x4 to Uint32x8

func (Uint64x4) AsUint8x32 ¶

func (from Uint64x4) AsUint8x32() (to Uint8x32)

AsUint8x32 converts from Uint64x4 to Uint8x32

func (Uint64x4) CarrylessMultiplyGrouped ¶

func (x Uint64x4) CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4

CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the two 128-bit lanes of x and y, depending on the values of a and b, and returns the two 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b will result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ

func (Uint64x4) Compress ¶

func (x Uint64x4) Compress(mask Mask64x4) Uint64x4

Compress performs a compression on vector x using mask by selecting elements as indicated by mask and packing them to lower indexed elements.

Asm: VPCOMPRESSQ, CPU Feature: AVX512
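A scalar sketch of compression. The helper compressModel is hypothetical, the mask is modeled as [4]bool, and filling the unused upper elements with zero is an assumption that the description above does not state.

```go
package main

import "fmt"

// compressModel is a hypothetical scalar model of Uint64x4.Compress:
// elements whose mask entry is true are packed, in order, into the
// lowest result positions. Assumption: the remaining positions are zero.
func compressModel(x [4]uint64, mask [4]bool) [4]uint64 {
	var r [4]uint64
	j := 0
	for i, v := range x {
		if mask[i] {
			r[j] = v
			j++
		}
	}
	return r
}

func main() {
	x := [4]uint64{10, 20, 30, 40}
	mask := [4]bool{false, true, false, true}
	fmt.Println(compressModel(x, mask))
}
```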

func (Uint64x4) ConcatPermute ¶

func (x Uint64x4) ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4

ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.

Asm: VPERMI2Q, CPU Feature: AVX512
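For four-element vectors the concatenation xy has eight entries, so three index bits are needed. A scalar sketch under that reading, with the hypothetical helper name concatPermuteModel:

```go
package main

import "fmt"

// concatPermuteModel is a hypothetical scalar model of
// Uint64x4.ConcatPermute: xy is x followed by y (8 elements), and
// result[i] = xy[indices[i] & 7].
func concatPermuteModel(x, y, indices [4]uint64) [4]uint64 {
	var xy [8]uint64
	copy(xy[:4], x[:])
	copy(xy[4:], y[:])

	var r [4]uint64
	for i := range r {
		r[i] = xy[indices[i]&7] // only the bits needed to index xy are used
	}
	return r
}

func main() {
	x := [4]uint64{0, 1, 2, 3}
	y := [4]uint64{4, 5, 6, 7}
	fmt.Println(concatPermuteModel(x, y, [4]uint64{7, 0, 4, 3}))
}
```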

func (Uint64x4) ConvertToFloat32 ¶

func (x Uint64x4) ConvertToFloat32() Float32x4

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PSY, CPU Feature: AVX512

func (Uint64x4) ConvertToFloat64 ¶

func (x Uint64x4) ConvertToFloat64() Float64x4

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x4) Equal ¶

func (x Uint64x4) Equal(y Uint64x4) Mask64x4

Equal returns a mask whose elements indicate whether x == y

Asm: VPCMPEQQ, CPU Feature: AVX2

func (Uint64x4) Expand ¶

func (x Uint64x4) Expand(mask Mask64x4) Uint64x4

Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion distributes those elements, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512
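Expansion is the inverse of compression: consecutive low elements of x are scattered to the positions the mask enables. A scalar sketch with a hypothetical helper name; leaving unselected positions at zero is an assumption.

```go
package main

import "fmt"

// expandModel is a hypothetical scalar model of Uint64x4.Expand: the
// next unused low element of x is placed at each position whose mask
// entry is true. Assumption: positions with a false mask entry are zero.
func expandModel(x [4]uint64, mask [4]bool) [4]uint64 {
	var r [4]uint64
	j := 0
	for i := range r {
		if mask[i] {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	x := [4]uint64{10, 20, 30, 40}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expandModel(x, mask))
}
```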

func (Uint64x4) GetHi ¶

func (x Uint64x4) GetHi() Uint64x2

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint64x4) GetLo ¶

func (x Uint64x4) GetLo() Uint64x2

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint64x4) Greater ¶

func (x Uint64x4) Greater(y Uint64x4) Mask64x4

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint64x4) GreaterEqual ¶

func (x Uint64x4) GreaterEqual(y Uint64x4) Mask64x4

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint64x4) InterleaveHiGrouped ¶

func (x Uint64x4) InterleaveHiGrouped(y Uint64x4) Uint64x4

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX2

func (Uint64x4) InterleaveLoGrouped ¶

func (x Uint64x4) InterleaveLoGrouped(y Uint64x4) Uint64x4

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX2
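With 64-bit elements each 128-bit subvector holds only two values, so its low half is one element and its high half is the other. The scalar sketch below models both grouped interleaves; the helper names are hypothetical, and placing the element from x before the element from y in each pair is an assumption based on the operand order of VPUNPCKLQDQ and VPUNPCKHQDQ.

```go
package main

import "fmt"

// interleaveLoGroupedModel is a hypothetical scalar model of
// Uint64x4.InterleaveLoGrouped: from each 128-bit group it takes the
// low element of x followed by the low element of y.
func interleaveLoGroupedModel(x, y [4]uint64) [4]uint64 {
	return [4]uint64{x[0], y[0], x[2], y[2]}
}

// interleaveHiGroupedModel does the same with the high element of each
// 128-bit group.
func interleaveHiGroupedModel(x, y [4]uint64) [4]uint64 {
	return [4]uint64{x[1], y[1], x[3], y[3]}
}

func main() {
	x := [4]uint64{0, 1, 2, 3}
	y := [4]uint64{10, 11, 12, 13}
	fmt.Println(interleaveLoGroupedModel(x, y))
	fmt.Println(interleaveHiGroupedModel(x, y))
}
```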

func (Uint64x4) IsZero ¶

func (x Uint64x4) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint64x4) LeadingZeros ¶

func (x Uint64x4) LeadingZeros() Uint64x4

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x4) Len ¶

func (x Uint64x4) Len() int

Len returns the number of elements in a Uint64x4

func (Uint64x4) Less ¶

func (x Uint64x4) Less(y Uint64x4) Mask64x4

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint64x4) LessEqual ¶

func (x Uint64x4) LessEqual(y Uint64x4) Mask64x4

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint64x4) Masked ¶

func (x Uint64x4) Masked(mask Mask64x4) Uint64x4

Masked returns x but with elements zeroed where mask is false.

func (Uint64x4) Max ¶

func (x Uint64x4) Max(y Uint64x4) Uint64x4

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x4) Merge ¶

func (x Uint64x4) Merge(y Uint64x4, mask Mask64x4) Uint64x4

Merge returns x but with elements set to y where mask is false.
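
The difference between Masked and Merge can be sketched with a scalar model in which a Mask64x4 is represented as a [4]bool. This is only an illustration of the documented semantics; masked and merge are hypothetical helper names, not package functions.

```go
package main

import "fmt"

// masked keeps x[i] where mask[i] is true and zeroes the rest.
func masked(x [4]uint64, mask [4]bool) [4]uint64 {
	var r [4]uint64
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge keeps x[i] where mask[i] is true and takes y[i] where it is false.
func merge(x, y [4]uint64, mask [4]bool) [4]uint64 {
	r := y
	for i := range x {
		if mask[i] {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [4]uint64{1, 2, 3, 4}
	y := [4]uint64{10, 20, 30, 40}
	m := [4]bool{true, false, true, false}
	fmt.Println(masked(x, m))   // [1 0 3 0]
	fmt.Println(merge(x, y, m)) // [1 20 3 40]
}
```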

func (Uint64x4) Min ¶

func (x Uint64x4) Min(y Uint64x4) Uint64x4

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x4) Mul ¶

func (x Uint64x4) Mul(y Uint64x4) Uint64x4

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x4) Not ¶

func (x Uint64x4) Not() Uint64x4

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Uint64x4) NotEqual ¶

func (x Uint64x4) NotEqual(y Uint64x4) Mask64x4

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Uint64x4) OnesCount ¶

func (x Uint64x4) OnesCount() Uint64x4

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x4) Or ¶

func (x Uint64x4) Or(y Uint64x4) Uint64x4

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint64x4) Permute ¶

func (x Uint64x4) Permute(indices Uint64x4) Uint64x4

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 2 bits (values 0-3) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512
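
The permutation rule can be written as a short scalar loop. The sketch below is a model of the documented behavior on plain arrays; permute is a hypothetical name.

```go
package main

import "fmt"

// permute models Permute on 4 x uint64: each result element is the
// element of x selected by the low 2 bits of the matching index.
func permute(x, indices [4]uint64) [4]uint64 {
	var r [4]uint64
	for i, idx := range indices {
		r[i] = x[idx&3]
	}
	return r
}

func main() {
	x := [4]uint64{10, 11, 12, 13}
	fmt.Println(permute(x, [4]uint64{3, 2, 1, 0})) // [13 12 11 10]
	fmt.Println(permute(x, [4]uint64{4, 5, 0, 0})) // [10 11 10 10]
}
```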

func (Uint64x4) RotateAllLeft ¶

func (x Uint64x4) RotateAllLeft(shift uint8) Uint64x4

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x4) RotateAllRight ¶

func (x Uint64x4) RotateAllRight(shift uint8) Uint64x4

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x4) RotateLeft ¶

func (x Uint64x4) RotateLeft(y Uint64x4) Uint64x4

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x4) RotateRight ¶

func (x Uint64x4) RotateRight(y Uint64x4) Uint64x4

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Uint64x4) SaturateToUint16 ¶

func (x Uint64x4) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x4) SaturateToUint32 ¶

func (x Uint64x4) SaturateToUint32() Uint32x4

SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQD, CPU Feature: AVX512

func (Uint64x4) Select128FromPair ¶

func (x Uint64x4) Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})

returns {70, 71, 40, 41}.

lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
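
The example above can be reproduced with a scalar model that lists the four 128-bit elements explicitly. This is a sketch on plain arrays; select128FromPair is a hypothetical name, and its out-of-range behavior (a Go index panic) only loosely mirrors the runtime panic mentioned above.

```go
package main

import "fmt"

// select128FromPair models the selection on 4 x uint64 vectors.
// Elements 0 and 1 are the low and high halves of x; 2 and 3 those of y.
func select128FromPair(x [4]uint64, lo, hi uint8, y [4]uint64) [4]uint64 {
	pair := [4][2]uint64{
		{x[0], x[1]}, {x[2], x[3]},
		{y[0], y[1]}, {y[2], y[3]},
	}
	l, h := pair[lo], pair[hi] // indexes above 3 panic
	return [4]uint64{l[0], l[1], h[0], h[1]}
}

func main() {
	x := [4]uint64{40, 41, 50, 51}
	y := [4]uint64{60, 61, 70, 71}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 40 41]
}
```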

func (Uint64x4) SelectFromPairGrouped ¶

func (x Uint64x4) SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4

SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX

func (Uint64x4) SetHi ¶

func (x Uint64x4) SetHi(y Uint64x2) Uint64x4

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint64x4) SetLo ¶

func (x Uint64x4) SetLo(y Uint64x2) Uint64x4

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint64x4) ShiftAllLeft ¶

func (x Uint64x4) ShiftAllLeft(y uint64) Uint64x4

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX2

func (Uint64x4) ShiftAllLeftConcat ¶

func (x Uint64x4) ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Uint64x4) ShiftAllRight ¶

func (x Uint64x4) ShiftAllRight(y uint64) Uint64x4

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX2

func (Uint64x4) ShiftAllRightConcat ¶

func (x Uint64x4) ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
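
The concatenating shifts are funnel shifts: bits shifted out of y fill the bits vacated in x. The scalar sketch below models both directions on [4]uint64. The function names are hypothetical, and the sketch assumes the count is reduced modulo the 64-bit element width, which is how the underlying VPSHLDQ and VPSHRDQ instructions treat it.

```go
package main

import "fmt"

// shiftAllLeftConcat shifts x left and fills the low bits from the top of y.
func shiftAllLeftConcat(x [4]uint64, shift uint8, y [4]uint64) [4]uint64 {
	s := uint(shift) & 63 // assumption: count taken modulo 64
	var r [4]uint64
	for i := range x {
		// In Go, y[i]>>64 is 0, so s == 0 leaves x[i] unchanged.
		r[i] = x[i]<<s | y[i]>>(64-s)
	}
	return r
}

// shiftAllRightConcat shifts x right and fills the high bits from the bottom of y.
func shiftAllRightConcat(x [4]uint64, shift uint8, y [4]uint64) [4]uint64 {
	s := uint(shift) & 63 // assumption: count taken modulo 64
	var r [4]uint64
	for i := range x {
		r[i] = x[i]>>s | y[i]<<(64-s)
	}
	return r
}

func main() {
	x := [4]uint64{0x1, 0x10, 0, 0}
	y := [4]uint64{0xF000000000000000, 0xF, 0, 0}
	fmt.Printf("%#x\n", shiftAllLeftConcat(x, 4, y)[0])  // 0x1f
	fmt.Printf("%#x\n", shiftAllRightConcat(x, 4, y)[1]) // 0xf000000000000001
}
```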

func (Uint64x4) ShiftLeft ¶

func (x Uint64x4) ShiftLeft(y Uint64x4) Uint64x4

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX2

func (Uint64x4) ShiftLeftConcat ¶

func (x Uint64x4) ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x4) ShiftRight ¶

func (x Uint64x4) ShiftRight(y Uint64x4) Uint64x4

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX2

func (Uint64x4) ShiftRightConcat ¶

func (x Uint64x4) ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x4) Store ¶

func (x Uint64x4) Store(y *[4]uint64)

Store stores a Uint64x4 to an array

func (Uint64x4) StoreMasked ¶

func (x Uint64x4) StoreMasked(y *[4]uint64, mask Mask64x4)

StoreMasked stores a Uint64x4 to an array, at those elements enabled by mask

Asm: VMASKMOVQ, CPU Feature: AVX2

func (Uint64x4) StoreSlice ¶

func (x Uint64x4) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 4 uint64s

func (Uint64x4) StoreSlicePart ¶

func (x Uint64x4) StoreSlicePart(s []uint64)

StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.

func (Uint64x4) String ¶

func (x Uint64x4) String() string

String returns a string representation of SIMD vector x

func (Uint64x4) Sub ¶

func (x Uint64x4) Sub(y Uint64x4) Uint64x4

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX2

func (Uint64x4) TruncateToUint16 ¶

func (x Uint64x4) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512
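
Saturating and truncating conversions differ only for values that do not fit the narrower type. The scalar sketch below contrasts the two for uint64 to uint16; the function names are hypothetical, and the sketch returns just the four converted values without modeling where they sit in the 8-element result vector.

```go
package main

import "fmt"

// saturateToUint16 clamps each value to the uint16 range.
func saturateToUint16(x [4]uint64) [4]uint16 {
	var r [4]uint16
	for i, v := range x {
		if v > 0xFFFF {
			v = 0xFFFF
		}
		r[i] = uint16(v)
	}
	return r
}

// truncateToUint16 keeps only the low 16 bits of each value.
func truncateToUint16(x [4]uint64) [4]uint16 {
	var r [4]uint16
	for i, v := range x {
		r[i] = uint16(v)
	}
	return r
}

func main() {
	x := [4]uint64{1, 0xFFFF, 0x10000, 0x12345}
	fmt.Println(saturateToUint16(x)) // [1 65535 65535 65535]
	fmt.Println(truncateToUint16(x)) // [1 65535 0 9029]
}
```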

func (Uint64x4) TruncateToUint32 ¶

func (x Uint64x4) TruncateToUint32() Uint32x4

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x4) TruncateToUint8 ¶

func (x Uint64x4) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.

Asm: VPMOVQB, CPU Feature: AVX512

func (Uint64x4) Xor ¶

func (x Uint64x4) Xor(y Uint64x4) Uint64x4

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint64x8 ¶

type Uint64x8 struct {
	// contains filtered or unexported fields
}

Uint64x8 is a 512-bit SIMD vector of 8 uint64

func BroadcastUint64x8 ¶

func BroadcastUint64x8(x uint64) Uint64x8

BroadcastUint64x8 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512F

func LoadMaskedUint64x8 ¶

func LoadMaskedUint64x8(y *[8]uint64, mask Mask64x8) Uint64x8

LoadMaskedUint64x8 loads a Uint64x8 from an array, at those elements enabled by mask

Asm: VMOVDQU64.Z, CPU Feature: AVX512

func LoadUint64x8 ¶

func LoadUint64x8(y *[8]uint64) Uint64x8

LoadUint64x8 loads a Uint64x8 from an array

func LoadUint64x8Slice ¶

func LoadUint64x8Slice(s []uint64) Uint64x8

LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s

func LoadUint64x8SlicePart ¶

func LoadUint64x8SlicePart(s []uint64) Uint64x8

LoadUint64x8SlicePart loads a Uint64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.

func (Uint64x8) Add ¶

func (x Uint64x8) Add(y Uint64x8) Uint64x8

Add adds corresponding elements of two vectors.

Asm: VPADDQ, CPU Feature: AVX512

func (Uint64x8) And ¶

func (x Uint64x8) And(y Uint64x8) Uint64x8

And performs a bitwise AND operation between two vectors.

Asm: VPANDQ, CPU Feature: AVX512

func (Uint64x8) AndNot ¶

func (x Uint64x8) AndNot(y Uint64x8) Uint64x8

AndNot performs a bitwise x &^ y.

Asm: VPANDNQ, CPU Feature: AVX512

func (Uint64x8) AsFloat32x16 ¶

func (from Uint64x8) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint64x8 to Float32x16

func (Uint64x8) AsFloat64x8 ¶

func (from Uint64x8) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint64x8 to Float64x8

func (Uint64x8) AsInt16x32 ¶

func (from Uint64x8) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint64x8 to Int16x32

func (Uint64x8) AsInt32x16 ¶

func (from Uint64x8) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint64x8 to Int32x16

func (Uint64x8) AsInt64x8 ¶

func (from Uint64x8) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint64x8 to Int64x8

func (Uint64x8) AsInt8x64 ¶

func (from Uint64x8) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint64x8 to Int8x64

func (Uint64x8) AsUint16x32 ¶

func (from Uint64x8) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Uint64x8 to Uint16x32

func (Uint64x8) AsUint32x16 ¶

func (from Uint64x8) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Uint64x8 to Uint32x16

func (Uint64x8) AsUint8x64 ¶

func (from Uint64x8) AsUint8x64() (to Uint8x64)

AsUint8x64 converts from Uint64x8 to Uint8x64

func (Uint64x8) CarrylessMultiplyGrouped ¶

func (x Uint64x8) CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8

CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the four 128-bit lanes of x and y, depending on the values of a and b, and returns the four 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.

A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101

This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)

Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.

Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
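
To make the carryless product concrete, the sketch below computes it bit by bit in scalar Go and then applies it per 128-bit lane. Both function names are hypothetical, and the model assumes that each lane's 128-bit product is stored with its low 64 bits in the lane's lower element.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (lo, hi).
func clmul64(a, b uint64) (lo, hi uint64) {
	for i := 0; i < 64; i++ {
		if (b>>i)&1 == 1 {
			lo ^= a << i
			hi ^= a >> (64 - i) // a>>64 is 0 in Go, so i == 0 adds nothing
		}
	}
	return lo, hi
}

// carrylessMultiplyGrouped models the per-lane operation on 8 x uint64:
// a and b (0 or 1) pick the low or high element of each 128-bit lane.
func carrylessMultiplyGrouped(x [8]uint64, a, b uint8, y [8]uint64) [8]uint64 {
	var r [8]uint64
	for g := 0; g < 4; g++ {
		r[2*g], r[2*g+1] = clmul64(x[2*g+int(a&1)], y[2*g+int(b&1)])
	}
	return r
}

func main() {
	fmt.Println(clmul64(3, 3)) // 5 0, matching 11 * 11 = 101 in base two
	x := [8]uint64{3, 7, 3, 7, 3, 7, 3, 7}
	fmt.Println(carrylessMultiplyGrouped(x, 1, 1, x)) // [21 0 21 0 21 0 21 0]
}
```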

func (Uint64x8) Compress ¶

func (x Uint64x8) Compress(mask Mask64x8) Uint64x8

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSQ, CPU Feature: AVX512
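
A scalar model of compression, with the mask represented as [8]bool, might look as follows. compress is a hypothetical name, and the sketch assumes the result elements above the packed ones are zero, which the description above does not state.

```go
package main

import "fmt"

// compress packs the elements of x whose mask entry is true into the
// low elements of the result, preserving their order.
func compress(x [8]uint64, mask [8]bool) [8]uint64 {
	var r [8]uint64 // assumption: unused upper elements stay zero
	j := 0
	for i, keep := range mask {
		if keep {
			r[j] = x[i]
			j++
		}
	}
	return r
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	m := [8]bool{false, true, false, true, true, false, false, false}
	fmt.Println(compress(x, m)) // [11 13 14 0 0 0 0 0]
}
```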

func (Uint64x8) ConcatPermute ¶

func (x Uint64x8) ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits of each element of indices that are needed to index xy are used.

Asm: VPERMI2Q, CPU Feature: AVX512
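
For eight 64-bit elements, xy has 16 entries, so 4 index bits are needed. The scalar sketch below models that case on plain arrays; concatPermute is a hypothetical name.

```go
package main

import "fmt"

// concatPermute models ConcatPermute on 8 x uint64: indices 0-7 select
// from x and 8-15 select from y; higher index bits are ignored.
func concatPermute(x, y, indices [8]uint64) [8]uint64 {
	var xy [16]uint64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]uint64
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return r
}

func main() {
	x := [8]uint64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{8, 0, 9, 1, 15, 7, 16, 17}
	fmt.Println(concatPermute(x, y, idx)) // [100 0 101 1 107 7 0 1]
}
```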

func (Uint64x8) ConvertToFloat32 ¶

func (x Uint64x8) ConvertToFloat32() Float32x8

ConvertToFloat32 converts element values to float32.

Asm: VCVTUQQ2PS, CPU Feature: AVX512

func (Uint64x8) ConvertToFloat64 ¶

func (x Uint64x8) ConvertToFloat64() Float64x8

ConvertToFloat64 converts element values to float64.

Asm: VCVTUQQ2PD, CPU Feature: AVX512

func (Uint64x8) Equal ¶

func (x Uint64x8) Equal(y Uint64x8) Mask64x8

Equal returns x equals y, elementwise.

Asm: VPCMPEQQ, CPU Feature: AVX512

func (Uint64x8) Expand ¶

func (x Uint64x8) Expand(mask Mask64x8) Uint64x8

Expand performs an expansion on a vector x whose elements are packed into its lower part. The packed elements are distributed, in order, to the positions selected by mask, from the lowest selected position to the highest.

Asm: VPEXPANDQ, CPU Feature: AVX512
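
Expansion is the inverse of packing selected elements into the low positions. The scalar sketch below models it with the mask as [8]bool; expand is a hypothetical name, and the sketch assumes positions not selected by the mask are zero, which the description above does not state.

```go
package main

import "fmt"

// expand places the low elements of x, in order, at the positions
// where mask is true.
func expand(x [8]uint64, mask [8]bool) [8]uint64 {
	var r [8]uint64 // assumption: unselected positions stay zero
	j := 0
	for i, take := range mask {
		if take {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	packed := [8]uint64{11, 13, 14, 0, 0, 0, 0, 0}
	m := [8]bool{false, true, false, true, true, false, false, false}
	fmt.Println(expand(packed, m)) // [0 11 0 13 14 0 0 0]
}
```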

func (Uint64x8) GetHi ¶

func (x Uint64x8) GetHi() Uint64x4

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint64x8) GetLo ¶

func (x Uint64x8) GetLo() Uint64x4

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint64x8) Greater ¶

func (x Uint64x8) Greater(y Uint64x8) Mask64x8

Greater returns x greater-than y, elementwise.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) GreaterEqual ¶

func (x Uint64x8) GreaterEqual(y Uint64x8) Mask64x8

GreaterEqual returns x greater-than-or-equals y, elementwise.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) InterleaveHiGrouped ¶

func (x Uint64x8) InterleaveHiGrouped(y Uint64x8) Uint64x8

InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.

Asm: VPUNPCKHQDQ, CPU Feature: AVX512

func (Uint64x8) InterleaveLoGrouped ¶

func (x Uint64x8) InterleaveLoGrouped(y Uint64x8) Uint64x8

InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.

Asm: VPUNPCKLQDQ, CPU Feature: AVX512

func (Uint64x8) LeadingZeros ¶

func (x Uint64x8) LeadingZeros() Uint64x8

LeadingZeros counts the leading zeros of each element in x.

Asm: VPLZCNTQ, CPU Feature: AVX512

func (Uint64x8) Len ¶

func (x Uint64x8) Len() int

Len returns the number of elements in a Uint64x8

func (Uint64x8) Less ¶

func (x Uint64x8) Less(y Uint64x8) Mask64x8

Less returns x less-than y, elementwise.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) LessEqual ¶

func (x Uint64x8) LessEqual(y Uint64x8) Mask64x8

LessEqual returns x less-than-or-equals y, elementwise.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) Masked ¶

func (x Uint64x8) Masked(mask Mask64x8) Uint64x8

Masked returns x but with elements zeroed where mask is false.

func (Uint64x8) Max ¶

func (x Uint64x8) Max(y Uint64x8) Uint64x8

Max computes the maximum of corresponding elements.

Asm: VPMAXUQ, CPU Feature: AVX512

func (Uint64x8) Merge ¶

func (x Uint64x8) Merge(y Uint64x8, mask Mask64x8) Uint64x8

Merge returns x but with elements set to y where mask is false.

func (Uint64x8) Min ¶

func (x Uint64x8) Min(y Uint64x8) Uint64x8

Min computes the minimum of corresponding elements.

Asm: VPMINUQ, CPU Feature: AVX512

func (Uint64x8) Mul ¶

func (x Uint64x8) Mul(y Uint64x8) Uint64x8

Mul multiplies corresponding elements of two vectors.

Asm: VPMULLQ, CPU Feature: AVX512

func (Uint64x8) Not ¶

func (x Uint64x8) Not() Uint64x8

Not returns the bitwise complement of x

Emulated, CPU Feature AVX512

func (Uint64x8) NotEqual ¶

func (x Uint64x8) NotEqual(y Uint64x8) Mask64x8

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPUQ, CPU Feature: AVX512

func (Uint64x8) OnesCount ¶

func (x Uint64x8) OnesCount() Uint64x8

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ

func (Uint64x8) Or ¶

func (x Uint64x8) Or(y Uint64x8) Uint64x8

Or performs a bitwise OR operation between two vectors.

Asm: VPORQ, CPU Feature: AVX512

func (Uint64x8) Permute ¶

func (x Uint64x8) Permute(indices Uint64x8) Uint64x8

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.

Asm: VPERMQ, CPU Feature: AVX512

func (Uint64x8) RotateAllLeft ¶

func (x Uint64x8) RotateAllLeft(shift uint8) Uint64x8

RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPROLQ, CPU Feature: AVX512

func (Uint64x8) RotateAllRight ¶

func (x Uint64x8) RotateAllRight(shift uint8) Uint64x8

RotateAllRight rotates each element to the right by the number of bits specified by the immediate.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPRORQ, CPU Feature: AVX512

func (Uint64x8) RotateLeft ¶

func (x Uint64x8) RotateLeft(y Uint64x8) Uint64x8

RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.

Asm: VPROLVQ, CPU Feature: AVX512

func (Uint64x8) RotateRight ¶

func (x Uint64x8) RotateRight(y Uint64x8) Uint64x8

RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.

Asm: VPRORVQ, CPU Feature: AVX512

func (Uint64x8) SaturateToUint16 ¶

func (x Uint64x8) SaturateToUint16() Uint16x8

SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQW, CPU Feature: AVX512

func (Uint64x8) SaturateToUint32 ¶

func (x Uint64x8) SaturateToUint32() Uint32x8

SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.

Asm: VPMOVUSQD, CPU Feature: AVX512

func (Uint64x8) SelectFromPairGrouped ¶

func (x Uint64x8) SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8

SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.

If the selectors are not constant this will translate to a function call.

Asm: VSHUFPD, CPU Feature: AVX512

func (Uint64x8) SetHi ¶

func (x Uint64x8) SetHi(y Uint64x4) Uint64x8

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint64x8) SetLo ¶

func (x Uint64x8) SetLo(y Uint64x4) Uint64x8

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint64x8) ShiftAllLeft ¶

func (x Uint64x8) ShiftAllLeft(y uint64) Uint64x8

ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.

Asm: VPSLLQ, CPU Feature: AVX512

func (Uint64x8) ShiftAllLeftConcat ¶

func (x Uint64x8) ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8

ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of y to the emptied lower bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHLDQ, CPU Feature: AVX512VBMI2

func (Uint64x8) ShiftAllRight ¶

func (x Uint64x8) ShiftAllRight(y uint64) Uint64x8

ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.

Asm: VPSRLQ, CPU Feature: AVX512

func (Uint64x8) ShiftAllRightConcat ¶

func (x Uint64x8) ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8

ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of y to the emptied upper bits of the shifted x.

shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.

Asm: VPSHRDQ, CPU Feature: AVX512VBMI2

func (Uint64x8) ShiftLeft ¶

func (x Uint64x8) ShiftLeft(y Uint64x8) Uint64x8

ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.

Asm: VPSLLVQ, CPU Feature: AVX512

func (Uint64x8) ShiftLeftConcat ¶

func (x Uint64x8) ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8

ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of z to the emptied lower bits of the shifted x.

Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2

func (Uint64x8) ShiftRight ¶

func (x Uint64x8) ShiftRight(y Uint64x8) Uint64x8

ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.

Asm: VPSRLVQ, CPU Feature: AVX512

func (Uint64x8) ShiftRightConcat ¶

func (x Uint64x8) ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8

ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of z to the emptied upper bits of the shifted x.

Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2

func (Uint64x8) Store ¶

func (x Uint64x8) Store(y *[8]uint64)

Store stores a Uint64x8 to an array

func (Uint64x8) StoreMasked ¶

func (x Uint64x8) StoreMasked(y *[8]uint64, mask Mask64x8)

StoreMasked stores a Uint64x8 to an array, at those elements enabled by mask

Asm: VMOVDQU64, CPU Feature: AVX512

func (Uint64x8) StoreSlice ¶

func (x Uint64x8) StoreSlice(s []uint64)

StoreSlice stores x into a slice of at least 8 uint64s

func (Uint64x8) StoreSlicePart ¶

func (x Uint64x8) StoreSlicePart(s []uint64)

StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
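
The partial load and store behave much like Go's built-in copy between a fixed 8-element array and a slice of any length. The sketch below models LoadUint64x8SlicePart and StoreSlicePart that way on plain arrays; the helper names are hypothetical.

```go
package main

import "fmt"

// loadSlicePart copies up to 8 elements from s; missing elements are zero.
func loadSlicePart(s []uint64) [8]uint64 {
	var v [8]uint64
	copy(v[:], s)
	return v
}

// storeSlicePart writes as many elements of v as fit in s.
func storeSlicePart(v [8]uint64, s []uint64) {
	copy(s, v[:])
}

func main() {
	v := loadSlicePart([]uint64{1, 2, 3})
	out := make([]uint64, 5)
	storeSlicePart(v, out)
	fmt.Println(v)   // [1 2 3 0 0 0 0 0]
	fmt.Println(out) // [1 2 3 0 0]
}
```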

func (Uint64x8) String ¶

func (x Uint64x8) String() string

String returns a string representation of SIMD vector x

func (Uint64x8) Sub ¶

func (x Uint64x8) Sub(y Uint64x8) Uint64x8

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBQ, CPU Feature: AVX512

func (Uint64x8) TruncateToUint16 ¶

func (x Uint64x8) TruncateToUint16() Uint16x8

TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.

Asm: VPMOVQW, CPU Feature: AVX512

func (Uint64x8) TruncateToUint32 ¶

func (x Uint64x8) TruncateToUint32() Uint32x8

TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.

Asm: VPMOVQD, CPU Feature: AVX512

func (Uint64x8) TruncateToUint8 ¶

func (x Uint64x8) TruncateToUint8() Uint8x16

TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zero-cleared.

Asm: VPMOVQB, CPU Feature: AVX512

func (Uint64x8) Xor ¶

func (x Uint64x8) Xor(y Uint64x8) Uint64x8

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORQ, CPU Feature: AVX512

type Uint8x16 ¶

type Uint8x16 struct {
	// contains filtered or unexported fields
}

Uint8x16 is a 128-bit SIMD vector of 16 uint8

func BroadcastUint8x16 ¶

func BroadcastUint8x16(x uint8) Uint8x16

BroadcastUint8x16 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint8x16 ¶

func LoadUint8x16(y *[16]uint8) Uint8x16

LoadUint8x16 loads a Uint8x16 from an array

func LoadUint8x16Slice ¶

func LoadUint8x16Slice(s []uint8) Uint8x16

LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s

func LoadUint8x16SlicePart ¶

func LoadUint8x16SlicePart(s []uint8) Uint8x16

LoadUint8x16SlicePart loads a Uint8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.

func (Uint8x16) AESDecryptLastRound ¶

func (x Uint8x16) AESDecryptLastRound(y Uint32x4) Uint8x16

AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; its elements, from low index to high, are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)

Asm: VAESDECLAST, CPU Feature: AVX, AES

func (Uint8x16) AESDecryptOneRound ¶

func (x Uint8x16) AESDecryptOneRound(y Uint32x4) Uint8x16

AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; its elements, from low index to high, are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)

Asm: VAESDEC, CPU Feature: AVX, AES

func (Uint8x16) AESEncryptLastRound ¶

func (x Uint8x16) AESEncryptLastRound(y Uint32x4) Uint8x16

AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; its elements, from low index to high, are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)

Asm: VAESENCLAST, CPU Feature: AVX, AES

func (Uint8x16) AESEncryptOneRound ¶

func (x Uint8x16) AESEncryptOneRound(y Uint32x4) Uint8x16

AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; its elements, from low index to high, are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)

Asm: VAESENC, CPU Feature: AVX, AES

func (Uint8x16) Add ¶

func (x Uint8x16) Add(y Uint8x16) Uint8x16

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX

func (Uint8x16) AddSaturated ¶

func (x Uint8x16) AddSaturated(y Uint8x16) Uint8x16

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX

func (Uint8x16) And ¶

func (x Uint8x16) And(y Uint8x16) Uint8x16

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX

func (Uint8x16) AndNot ¶

func (x Uint8x16) AndNot(y Uint8x16) Uint8x16

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX

func (Uint8x16) AsFloat32x4 ¶

func (from Uint8x16) AsFloat32x4() (to Float32x4)

AsFloat32x4 converts from Uint8x16 to Float32x4

func (Uint8x16) AsFloat64x2 ¶

func (from Uint8x16) AsFloat64x2() (to Float64x2)

AsFloat64x2 converts from Uint8x16 to Float64x2

func (Uint8x16) AsInt16x8 ¶

func (from Uint8x16) AsInt16x8() (to Int16x8)

AsInt16x8 converts from Uint8x16 to Int16x8

func (Uint8x16) AsInt32x4 ¶

func (from Uint8x16) AsInt32x4() (to Int32x4)

AsInt32x4 converts from Uint8x16 to Int32x4

func (Uint8x16) AsInt64x2 ¶

func (from Uint8x16) AsInt64x2() (to Int64x2)

AsInt64x2 converts from Uint8x16 to Int64x2

func (Uint8x16) AsInt8x16 ¶

func (from Uint8x16) AsInt8x16() (to Int8x16)

AsInt8x16 converts from Uint8x16 to Int8x16

func (Uint8x16) AsUint16x8 ¶

func (from Uint8x16) AsUint16x8() (to Uint16x8)

AsUint16x8 converts from Uint8x16 to Uint16x8

func (Uint8x16) AsUint32x4 ¶

func (from Uint8x16) AsUint32x4() (to Uint32x4)

AsUint32x4 converts from Uint8x16 to Uint32x4

func (Uint8x16) AsUint64x2 ¶

func (from Uint8x16) AsUint64x2() (to Uint64x2)

AsUint64x2 converts from Uint8x16 to Uint64x2

func (Uint8x16) Average ¶

func (x Uint8x16) Average(y Uint8x16) Uint8x16

Average computes the rounded average of corresponding elements.

Asm: VPAVGB, CPU Feature: AVX
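
Per element, the rounded average and the saturating add on uint8 can be modeled by widening to 16 bits first. The sketch below shows one element at a time; averageU8 and addSaturatedU8 are hypothetical names, and the rounding rule (add one before halving) is that of the underlying VPAVGB instruction.

```go
package main

import "fmt"

// averageU8 returns (a + b + 1) / 2 computed without overflow.
func averageU8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

// addSaturatedU8 returns a + b clamped to 255.
func addSaturatedU8(a, b uint8) uint8 {
	s := uint16(a) + uint16(b)
	if s > 255 {
		return 255
	}
	return uint8(s)
}

func main() {
	fmt.Println(averageU8(1, 2))          // 2
	fmt.Println(averageU8(254, 255))      // 255
	fmt.Println(addSaturatedU8(200, 100)) // 255
	fmt.Println(addSaturatedU8(20, 30))   // 50
}
```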

func (Uint8x16) Broadcast128 ¶

func (x Uint8x16) Broadcast128() Uint8x16

Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Uint8x16) Broadcast256 ¶

func (x Uint8x16) Broadcast256() Uint8x32

Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX2

func (Uint8x16) Broadcast512 ¶

func (x Uint8x16) Broadcast512() Uint8x64

Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.

Asm: VPBROADCASTB, CPU Feature: AVX512

func (Uint8x16) Compress ¶

func (x Uint8x16) Compress(mask Mask8x16) Uint8x16

Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Uint8x16) ConcatPermute ¶

func (x Uint8x16) ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16

ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits of each element of indices that are needed to index xy are used.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Uint8x16) ConcatShiftBytesRight ¶

func (x Uint8x16) ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16

ConcatShiftBytesRight concatenates x and y and shifts the result right by constant bytes. The result vector is the lower half of the shifted concatenation.

constant results in better performance when it is a compile-time constant; a non-constant value will be translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX

func (Uint8x16) DotProductPairsSaturated ¶

func (x Uint8x16) DotProductPairsSaturated(y Int8x16) Int16x8

DotProductPairsSaturated multiplies corresponding elements and adds adjacent pairs of products together with saturation, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDUBSW, CPU Feature: AVX
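
The pairwise dot product can be modeled in scalar Go by widening each product before adding. The sketch below follows the semantics of the VPMADDUBSW instruction (unsigned x elements, signed y elements, signed 16-bit saturation); dotProductPairsSaturated is a hypothetical name for this model.

```go
package main

import "fmt"

// dotProductPairsSaturated multiplies x[i] by y[i], sums each adjacent
// pair of products, and clamps the sum to the int16 range.
func dotProductPairsSaturated(x [16]uint8, y [16]int8) [8]int16 {
	var r [8]int16
	for i := range r {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > 32767 {
			sum = 32767
		}
		if sum < -32768 {
			sum = -32768
		}
		r[i] = int16(sum)
	}
	return r
}

func main() {
	x := [16]uint8{1, 2, 255, 255, 255, 255}
	y := [16]int8{3, 4, 127, 127, -128, -128}
	fmt.Println(dotProductPairsSaturated(x, y)) // [11 32767 -32768 0 0 0 0 0]
}
```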

func (Uint8x16) Equal ¶

func (x Uint8x16) Equal(y Uint8x16) Mask8x16

Equal returns x equals y, elementwise.

Asm: VPCMPEQB, CPU Feature: AVX

func (Uint8x16) Expand ¶

func (x Uint8x16) Expand(mask Mask8x16) Uint8x16

Expand performs an expansion on a vector x whose elements are packed into its lower part. The packed elements are distributed, in order, to the positions selected by mask, from the lowest selected position to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Uint8x16) ExtendLo2ToUint64x2 ¶

func (x Uint8x16) ExtendLo2ToUint64x2() Uint64x2

ExtendLo2ToUint64x2 converts the 2 lowest element values of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX

func (Uint8x16) ExtendLo4ToUint32x4 ¶

func (x Uint8x16) ExtendLo4ToUint32x4() Uint32x4

ExtendLo4ToUint32x4 converts the 4 lowest element values of x to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX
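For illustration, zero-extension of the lowest elements can be modeled in plain Go as below. This is a scalar sketch with a hypothetical function name, not the archsimd method itself.

```go
package main

import "fmt"

// extendLo4ToUint32x4 is a hypothetical scalar model: it widens the four
// lowest bytes to 32 bits, filling the new high bits with zeros.
func extendLo4ToUint32x4(x [16]uint8) [4]uint32 {
	var out [4]uint32
	for i := range out {
		out[i] = uint32(x[i]) // unsigned widening never sign-extends
	}
	return out
}

func main() {
	x := [16]uint8{0xFF, 0x80, 0x01, 0x00, 0xAA}
	fmt.Println(extendLo4ToUint32x4(x)) // [255 128 1 0]
}
```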

func (Uint8x16) ExtendLo4ToUint64x4 ¶

func (x Uint8x16) ExtendLo4ToUint64x4() Uint64x4

ExtendLo4ToUint64x4 converts the 4 lowest element values of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX2

func (Uint8x16) ExtendLo8ToUint16x8 ¶

func (x Uint8x16) ExtendLo8ToUint16x8() Uint16x8

ExtendLo8ToUint16x8 converts the 8 lowest element values of x to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX

func (Uint8x16) ExtendLo8ToUint32x8 ¶

func (x Uint8x16) ExtendLo8ToUint32x8() Uint32x8

ExtendLo8ToUint32x8 converts the 8 lowest element values of x to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX2

func (Uint8x16) ExtendLo8ToUint64x8 ¶

func (x Uint8x16) ExtendLo8ToUint64x8() Uint64x8

ExtendLo8ToUint64x8 converts the 8 lowest element values of x to uint64. The result vector's elements are zero-extended.

Asm: VPMOVZXBQ, CPU Feature: AVX512

func (Uint8x16) ExtendToUint16 ¶

func (x Uint8x16) ExtendToUint16() Uint16x16

ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX2

func (Uint8x16) ExtendToUint32 ¶

func (x Uint8x16) ExtendToUint32() Uint32x16

ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.

Asm: VPMOVZXBD, CPU Feature: AVX512

func (Uint8x16) GaloisFieldAffineTransform ¶

func (x Uint8x16) GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
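The per-byte computation can be sketched in plain Go as follows. This is a scalar model under stated assumptions, not the archsimd implementation: affineByte and gfAffine16 are hypothetical names, and the row order (result bit i uses byte 7-i of the 64-bit matrix) follows Intel's description of VGF2P8AFFINEQB, assuming the method passes its operands through unchanged.

```go
package main

import (
	"fmt"
	"math/bits"
)

// affineByte computes one result byte. Result bit i is the parity of
// (row AND x), where row is byte 7-i of matrix, XORed with bit i of b.
func affineByte(matrix uint64, x, b uint8) uint8 {
	var out uint8
	for i := 0; i < 8; i++ {
		row := uint8(matrix >> (8 * (7 - i)))
		parity := uint8(bits.OnesCount8(row&x) & 1)
		out |= (parity ^ ((b >> i) & 1)) << i
	}
	return out
}

// gfAffine16 applies y[j/8] to byte j of x, so each 64-bit matrix
// covers one group of 8 adjacent bytes.
func gfAffine16(x [16]uint8, y [2]uint64, b uint8) [16]uint8 {
	var out [16]uint8
	for j := range x {
		out[j] = affineByte(y[j/8], x[j], b)
	}
	return out
}

func main() {
	const identity = 0x0102040810204080 // row i selects bit i of x
	x := [16]uint8{0xA5, 0x3C}
	out := gfAffine16(x, [2]uint64{identity, identity}, 0xFF)
	fmt.Printf("%#x %#x\n", out[0], out[1]) // 0x5a 0xc3
}
```

With the identity matrix the transform reduces to x XOR b, which makes it a convenient sanity check for the bit ordering.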

func (Uint8x16) GaloisFieldAffineTransformInverse ¶

func (x Uint8x16) GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16

GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI

func (Uint8x16) GaloisFieldMul ¶

func (x Uint8x16) GaloisFieldMul(y Uint8x16) Uint8x16

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
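To make the arithmetic concrete, here is a plain-Go scalar sketch of GF(2^8) multiplication with the same reduction polynomial. It does not call archsimd; gfMul and gfMul16 are hypothetical names, and the check values are the worked examples from FIPS 197.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1
// using shift-and-add (addition in this field is XOR).
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1B // low 8 bits of the reduction polynomial
		}
		b >>= 1
	}
	return p
}

// gfMul16 applies gfMul to corresponding elements of two 16-byte vectors.
func gfMul16(x, y [16]uint8) [16]uint8 {
	var out [16]uint8
	for i := range out {
		out[i] = gfMul(x[i], y[i])
	}
	return out
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x57, 0x83)) // 0xc1
	fmt.Printf("%#x\n", gfMul(0x57, 0x13)) // 0xfe
}
```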

func (Uint8x16) GetElem ¶

func (x Uint8x16) GetElem(index uint8) uint8

GetElem retrieves a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPEXTRB, CPU Feature: AVX512

func (Uint8x16) Greater ¶

func (x Uint8x16) Greater(y Uint8x16) Mask8x16

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint8x16) GreaterEqual ¶

func (x Uint8x16) GreaterEqual(y Uint8x16) Mask8x16

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint8x16) IsZero ¶

func (x Uint8x16) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint8x16) Len ¶

func (x Uint8x16) Len() int

Len returns the number of elements in a Uint8x16

func (Uint8x16) Less ¶

func (x Uint8x16) Less(y Uint8x16) Mask8x16

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint8x16) LessEqual ¶

func (x Uint8x16) LessEqual(y Uint8x16) Mask8x16

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint8x16) Masked ¶

func (x Uint8x16) Masked(mask Mask8x16) Uint8x16

Masked returns x but with elements zeroed where mask is false.

func (Uint8x16) Max ¶

func (x Uint8x16) Max(y Uint8x16) Uint8x16

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX

func (Uint8x16) Merge ¶

func (x Uint8x16) Merge(y Uint8x16, mask Mask8x16) Uint8x16

Merge returns x but with elements set to y where mask is false.

func (Uint8x16) Min ¶

func (x Uint8x16) Min(y Uint8x16) Uint8x16

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX

func (Uint8x16) Not ¶

func (x Uint8x16) Not() Uint8x16

Not returns the bitwise complement of x

Emulated, CPU Feature AVX

func (Uint8x16) NotEqual ¶

func (x Uint8x16) NotEqual(y Uint8x16) Mask8x16

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX

func (Uint8x16) OnesCount ¶

func (x Uint8x16) OnesCount() Uint8x16

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Uint8x16) Or ¶

func (x Uint8x16) Or(y Uint8x16) Uint8x16

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX

func (Uint8x16) Permute ¶

func (x Uint8x16) Permute(indices Uint8x16) Uint8x16

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 4 bits (values 0-15) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Uint8x16) PermuteOrZero ¶

func (x Uint8x16) PermuteOrZero(indices Int8x16) Uint8x16

PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set in which case zero is used instead.

Asm: VPSHUFB, CPU Feature: AVX

func (Uint8x16) SetElem ¶

func (x Uint8x16) SetElem(index uint8, y uint8) Uint8x16

SetElem sets a single constant-indexed element's value.

index results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPINSRB, CPU Feature: AVX

func (Uint8x16) Store ¶

func (x Uint8x16) Store(y *[16]uint8)

Store stores a Uint8x16 to an array

func (Uint8x16) StoreSlice ¶

func (x Uint8x16) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 16 uint8s

func (Uint8x16) StoreSlicePart ¶

func (x Uint8x16) StoreSlicePart(s []uint8)

StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
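The partial-store behavior resembles Go's built-in copy, which can serve as a scalar model. The sketch below is plain Go with a hypothetical function name; it does not use archsimd.

```go
package main

import "fmt"

// storeSlicePart16 is a hypothetical scalar model of StoreSlicePart:
// it writes as many of the 16 elements as fit in s and leaves any
// elements of s beyond index 15 untouched.
func storeSlicePart16(x [16]uint8, s []uint8) {
	copy(s, x[:]) // copy stops at the shorter of the two lengths
}

func main() {
	x := [16]uint8{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
	short := make([]uint8, 3)
	storeSlicePart16(x, short)
	fmt.Println(short) // [1 2 3]
}
```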

func (Uint8x16) String ¶

func (x Uint8x16) String() string

String returns a string representation of SIMD vector x

func (Uint8x16) Sub ¶

func (x Uint8x16) Sub(y Uint8x16) Uint8x16

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX

func (Uint8x16) SubSaturated ¶

func (x Uint8x16) SubSaturated(y Uint8x16) Uint8x16

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX

func (Uint8x16) SumAbsDiff ¶

func (x Uint8x16) SumAbsDiff(y Uint8x16) Uint16x8

SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which element 4*n contains the sum for the n-th input group; the other elements are zero. This can be viewed as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.

Asm: VPSADBW, CPU Feature: AVX
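The layout of the result is easier to see in a scalar model. The plain-Go sketch below uses a hypothetical function name and does not call archsimd; for 16 input bytes there are two groups, whose sums land in result elements 0 and 4.

```go
package main

import "fmt"

// sumAbsDiff16 is a hypothetical scalar model of SumAbsDiff for 16 bytes.
func sumAbsDiff16(x, y [16]uint8) [8]uint16 {
	var out [8]uint16
	for g := 0; g < 2; g++ { // one group per 8 adjacent bytes
		var sum uint16
		for i := 8 * g; i < 8*g+8; i++ {
			d := int(x[i]) - int(y[i])
			if d < 0 {
				d = -d
			}
			sum += uint16(d)
		}
		out[4*g] = sum // the remaining elements stay zero
	}
	return out
}

func main() {
	var x, y [16]uint8
	for i := 0; i < 8; i++ {
		x[i], y[i] = 10, 7 // |10-7| * 8 = 24
	}
	for i := 8; i < 16; i++ {
		x[i], y[i] = 0, 255 // 255 * 8 = 2040
	}
	fmt.Println(sumAbsDiff16(x, y)) // [24 0 0 0 2040 0 0 0]
}
```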

func (Uint8x16) Xor ¶

func (x Uint8x16) Xor(y Uint8x16) Uint8x16

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX

type Uint8x32 ¶

type Uint8x32 struct {
	// contains filtered or unexported fields
}

Uint8x32 is a 256-bit SIMD vector of 32 uint8

func BroadcastUint8x32 ¶

func BroadcastUint8x32(x uint8) Uint8x32

BroadcastUint8x32 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX2

func LoadUint8x32 ¶

func LoadUint8x32(y *[32]uint8) Uint8x32

LoadUint8x32 loads a Uint8x32 from an array

func LoadUint8x32Slice ¶

func LoadUint8x32Slice(s []uint8) Uint8x32

LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s.

func LoadUint8x32SlicePart ¶

func LoadUint8x32SlicePart(s []uint8) Uint8x32

LoadUint8x32SlicePart loads a Uint8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.
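A scalar model of the partial load, in plain Go with a hypothetical function name (no archsimd call), shows the zero-filling of the tail:

```go
package main

import "fmt"

// loadSlicePart32 is a hypothetical scalar model of LoadUint8x32SlicePart:
// it takes up to 32 elements from s and leaves the rest as zero.
func loadSlicePart32(s []uint8) [32]uint8 {
	var out [32]uint8
	copy(out[:], s) // copies min(32, len(s)) elements
	return out
}

func main() {
	v := loadSlicePart32([]uint8{7, 8, 9})
	fmt.Println(v[:5]) // [7 8 9 0 0]
}
```

This kind of load is typically useful for the final, shorter-than-a-vector tail of a slice processed in 32-element chunks.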

func (Uint8x32) AESDecryptLastRound ¶

func (x Uint8x32) AESDecryptLastRound(y Uint32x8) Uint8x32

AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)

Asm: VAESDECLAST, CPU Feature: AVX512VAES

func (Uint8x32) AESDecryptOneRound ¶

func (x Uint8x32) AESDecryptOneRound(y Uint32x8) Uint8x32

AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)

Asm: VAESDEC, CPU Feature: AVX512VAES

func (Uint8x32) AESEncryptLastRound ¶

func (x Uint8x32) AESEncryptLastRound(y Uint32x8) Uint8x32

AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)

Asm: VAESENCLAST, CPU Feature: AVX512VAES

func (Uint8x32) AESEncryptOneRound ¶

func (x Uint8x32) AESEncryptOneRound(y Uint32x8) Uint8x32

AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)

Asm: VAESENC, CPU Feature: AVX512VAES

func (Uint8x32) Add ¶

func (x Uint8x32) Add(y Uint8x32) Uint8x32

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX2

func (Uint8x32) AddSaturated ¶

func (x Uint8x32) AddSaturated(y Uint8x32) Uint8x32

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX2

func (Uint8x32) And ¶

func (x Uint8x32) And(y Uint8x32) Uint8x32

And performs a bitwise AND operation between two vectors.

Asm: VPAND, CPU Feature: AVX2

func (Uint8x32) AndNot ¶

func (x Uint8x32) AndNot(y Uint8x32) Uint8x32

AndNot performs a bitwise x &^ y.

Asm: VPANDN, CPU Feature: AVX2

func (Uint8x32) AsFloat32x8 ¶

func (from Uint8x32) AsFloat32x8() (to Float32x8)

AsFloat32x8 converts from Uint8x32 to Float32x8.

func (Uint8x32) AsFloat64x4 ¶

func (from Uint8x32) AsFloat64x4() (to Float64x4)

AsFloat64x4 converts from Uint8x32 to Float64x4.

func (Uint8x32) AsInt16x16 ¶

func (from Uint8x32) AsInt16x16() (to Int16x16)

AsInt16x16 converts from Uint8x32 to Int16x16.

func (Uint8x32) AsInt32x8 ¶

func (from Uint8x32) AsInt32x8() (to Int32x8)

AsInt32x8 converts from Uint8x32 to Int32x8.

func (Uint8x32) AsInt64x4 ¶

func (from Uint8x32) AsInt64x4() (to Int64x4)

AsInt64x4 converts from Uint8x32 to Int64x4.

func (Uint8x32) AsInt8x32 ¶

func (from Uint8x32) AsInt8x32() (to Int8x32)

AsInt8x32 converts from Uint8x32 to Int8x32.

func (Uint8x32) AsUint16x16 ¶

func (from Uint8x32) AsUint16x16() (to Uint16x16)

AsUint16x16 converts from Uint8x32 to Uint16x16.

func (Uint8x32) AsUint32x8 ¶

func (from Uint8x32) AsUint32x8() (to Uint32x8)

AsUint32x8 converts from Uint8x32 to Uint32x8.

func (Uint8x32) AsUint64x4 ¶

func (from Uint8x32) AsUint64x4() (to Uint64x4)

AsUint64x4 converts from Uint8x32 to Uint64x4.

func (Uint8x32) Average ¶

func (x Uint8x32) Average(y Uint8x32) Uint8x32

Average computes the rounded average of corresponding elements.

Asm: VPAVGB, CPU Feature: AVX2

func (Uint8x32) Compress ¶

func (x Uint8x32) Compress(mask Mask8x32) Uint8x32

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Uint8x32) ConcatPermute ¶

func (x Uint8x32) ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Uint8x32) ConcatShiftBytesRightGrouped ¶

func (x Uint8x32) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32

ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes. The result vector is the lower half of the shifted concatenation. This operation is performed independently on each 16-byte group.

constant results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX2

func (Uint8x32) DotProductPairsSaturated ¶

func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16

DotProductPairsSaturated multiplies corresponding elements and adds adjacent pairs of products together with saturation, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDUBSW, CPU Feature: AVX2

func (Uint8x32) Equal ¶

func (x Uint8x32) Equal(y Uint8x32) Mask8x32

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQB, CPU Feature: AVX2

func (Uint8x32) Expand ¶

func (x Uint8x32) Expand(mask Mask8x32) Uint8x32

Expand performs an expansion of a vector x whose elements are packed into its lower-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Uint8x32) ExtendToUint16 ¶

func (x Uint8x32) ExtendToUint16() Uint16x32

ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.

Asm: VPMOVZXBW, CPU Feature: AVX512

func (Uint8x32) GaloisFieldAffineTransform ¶

func (x Uint8x32) GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI

func (Uint8x32) GaloisFieldAffineTransformInverse ¶

func (x Uint8x32) GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32

GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI

func (Uint8x32) GaloisFieldMul ¶

func (x Uint8x32) GaloisFieldMul(y Uint8x32) Uint8x32

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI

func (Uint8x32) GetHi ¶

func (x Uint8x32) GetHi() Uint8x16

GetHi returns the upper half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint8x32) GetLo ¶

func (x Uint8x32) GetLo() Uint8x16

GetLo returns the lower half of x.

Asm: VEXTRACTI128, CPU Feature: AVX2

func (Uint8x32) Greater ¶

func (x Uint8x32) Greater(y Uint8x32) Mask8x32

Greater returns a mask whose elements indicate whether x > y

Emulated, CPU Feature AVX2

func (Uint8x32) GreaterEqual ¶

func (x Uint8x32) GreaterEqual(y Uint8x32) Mask8x32

GreaterEqual returns a mask whose elements indicate whether x >= y

Emulated, CPU Feature AVX2

func (Uint8x32) IsZero ¶

func (x Uint8x32) IsZero() bool

IsZero returns true if all elements of x are zeros.

This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y

Asm: VPTEST, CPU Feature: AVX

func (Uint8x32) Len ¶

func (x Uint8x32) Len() int

Len returns the number of elements in a Uint8x32

func (Uint8x32) Less ¶

func (x Uint8x32) Less(y Uint8x32) Mask8x32

Less returns a mask whose elements indicate whether x < y

Emulated, CPU Feature AVX2

func (Uint8x32) LessEqual ¶

func (x Uint8x32) LessEqual(y Uint8x32) Mask8x32

LessEqual returns a mask whose elements indicate whether x <= y

Emulated, CPU Feature AVX2

func (Uint8x32) Masked ¶

func (x Uint8x32) Masked(mask Mask8x32) Uint8x32

Masked returns x but with elements zeroed where mask is false.

func (Uint8x32) Max ¶

func (x Uint8x32) Max(y Uint8x32) Uint8x32

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX2

func (Uint8x32) Merge ¶

func (x Uint8x32) Merge(y Uint8x32, mask Mask8x32) Uint8x32

Merge returns x but with elements set to y where mask is false.

func (Uint8x32) Min ¶

func (x Uint8x32) Min(y Uint8x32) Uint8x32

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX2

func (Uint8x32) Not ¶

func (x Uint8x32) Not() Uint8x32

Not returns the bitwise complement of x

Emulated, CPU Feature AVX2

func (Uint8x32) NotEqual ¶

func (x Uint8x32) NotEqual(y Uint8x32) Mask8x32

NotEqual returns a mask whose elements indicate whether x != y

Emulated, CPU Feature AVX2

func (Uint8x32) OnesCount ¶

func (x Uint8x32) OnesCount() Uint8x32

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG

func (Uint8x32) Or ¶

func (x Uint8x32) Or(y Uint8x32) Uint8x32

Or performs a bitwise OR operation between two vectors.

Asm: VPOR, CPU Feature: AVX2

func (Uint8x32) Permute ¶

func (x Uint8x32) Permute(indices Uint8x32) Uint8x32

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 5 bits (values 0-31) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI

func (Uint8x32) PermuteOrZeroGrouped ¶

func (x Uint8x32) PermuteOrZeroGrouped(indices Int8x32) Uint8x32

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...} The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set in which case zero is used instead. Each group is of size 128-bit.

Asm: VPSHUFB, CPU Feature: AVX2
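The grouping can be sketched in plain Go as below. This is a scalar model with a hypothetical function name, not the archsimd method: each index only ever selects from the 16-byte group that contains the result element.

```go
package main

import "fmt"

// permuteOrZeroGrouped32 is a hypothetical scalar model of
// PermuteOrZeroGrouped for 32 bytes (two 128-bit groups).
func permuteOrZeroGrouped32(x [32]uint8, indices [32]int8) [32]uint8 {
	var out [32]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: the element stays zero
		}
		base := i &^ 15 // first element of the group containing i
		out[i] = x[base+int(idx&15)]
	}
	return out
}

func main() {
	var x [32]uint8
	for i := range x {
		x[i] = uint8(i + 1)
	}
	var indices [32]int8 // all zero: broadcast element 0 of each group
	indices[3] = -128
	indices[20] = 5
	out := permuteOrZeroGrouped32(x, indices)
	fmt.Println(out[0], out[3], out[16], out[20]) // 1 0 17 22
}
```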

func (Uint8x32) Select128FromPair ¶

func (x Uint8x32) Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32

Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,

{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
     {0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})

returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.

lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.

Asm: VPERM2I128, CPU Feature: AVX2
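The worked example above can be reproduced with a plain-Go scalar model. The function name is hypothetical and the code does not call archsimd; panicking on out-of-range selectors is this sketch's own choice, since the text only says a panic may occur.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model: the four 128-bit
// elements are, in order, x's lower half, x's upper half, y's lower
// half and y's upper half.
func select128FromPair(x [32]uint8, lo, hi uint8, y [32]uint8) [32]uint8 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be between 0 and 3")
	}
	var quad [64]uint8
	copy(quad[:32], x[:])
	copy(quad[32:], y[:])

	var out [32]uint8
	copy(out[:16], quad[16*int(lo):16*int(lo)+16])
	copy(out[16:], quad[16*int(hi):16*int(hi)+16])
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = uint8(0x40 + i)
		y[i] = uint8(0x60 + i)
	}
	out := select128FromPair(x, 3, 0, y)
	fmt.Printf("%#x %#x %#x %#x\n", out[0], out[15], out[16], out[31])
	// 0x70 0x7f 0x40 0x4f
}
```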

func (Uint8x32) SetHi ¶

func (x Uint8x32) SetHi(y Uint8x16) Uint8x32

SetHi returns x with its upper half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint8x32) SetLo ¶

func (x Uint8x32) SetLo(y Uint8x16) Uint8x32

SetLo returns x with its lower half set to y.

Asm: VINSERTI128, CPU Feature: AVX2

func (Uint8x32) Store ¶

func (x Uint8x32) Store(y *[32]uint8)

Store stores a Uint8x32 to an array

func (Uint8x32) StoreSlice ¶

func (x Uint8x32) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 32 uint8s

func (Uint8x32) StoreSlicePart ¶

func (x Uint8x32) StoreSlicePart(s []uint8)

StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.

func (Uint8x32) String ¶

func (x Uint8x32) String() string

String returns a string representation of SIMD vector x

func (Uint8x32) Sub ¶

func (x Uint8x32) Sub(y Uint8x32) Uint8x32

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX2

func (Uint8x32) SubSaturated ¶

func (x Uint8x32) SubSaturated(y Uint8x32) Uint8x32

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX2

func (Uint8x32) SumAbsDiff ¶

func (x Uint8x32) SumAbsDiff(y Uint8x32) Uint16x16

SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which element 4*n contains the sum for the n-th input group; the other elements are zero. This can be viewed as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.

Asm: VPSADBW, CPU Feature: AVX2

func (Uint8x32) Xor ¶

func (x Uint8x32) Xor(y Uint8x32) Uint8x32

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXOR, CPU Feature: AVX2

type Uint8x64 ¶

type Uint8x64 struct {
	// contains filtered or unexported fields
}

Uint8x64 is a 512-bit SIMD vector of 64 uint8

func BroadcastUint8x64 ¶

func BroadcastUint8x64(x uint8) Uint8x64

BroadcastUint8x64 returns a vector with the input x assigned to all elements of the output.

Emulated, CPU Feature AVX512BW

func LoadMaskedUint8x64 ¶

func LoadMaskedUint8x64(y *[64]uint8, mask Mask8x64) Uint8x64

LoadMaskedUint8x64 loads a Uint8x64 from an array, at those elements enabled by mask

Asm: VMOVDQU8.Z, CPU Feature: AVX512

func LoadUint8x64 ¶

func LoadUint8x64(y *[64]uint8) Uint8x64

LoadUint8x64 loads a Uint8x64 from an array

func LoadUint8x64Slice ¶

func LoadUint8x64Slice(s []uint8) Uint8x64

LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s.

func LoadUint8x64SlicePart ¶

func LoadUint8x64SlicePart(s []uint8) Uint8x64

LoadUint8x64SlicePart loads a Uint8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.

func (Uint8x64) AESDecryptLastRound ¶

func (x Uint8x64) AESDecryptLastRound(y Uint32x16) Uint8x64

AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)

Asm: VAESDECLAST, CPU Feature: AVX512VAES

func (Uint8x64) AESDecryptOneRound ¶

func (x Uint8x64) AESDecryptOneRound(y Uint32x16) Uint8x64

AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)

Asm: VAESDEC, CPU Feature: AVX512VAES

func (Uint8x64) AESEncryptLastRound ¶

func (x Uint8x64) AESEncryptLastRound(y Uint32x16) Uint8x64

AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)

Asm: VAESENCLAST, CPU Feature: AVX512VAES

func (Uint8x64) AESEncryptOneRound ¶

func (x Uint8x64) AESEncryptOneRound(y Uint32x16) Uint8x64

AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array; from low index to high, its elements are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)

Asm: VAESENC, CPU Feature: AVX512VAES

func (Uint8x64) Add ¶

func (x Uint8x64) Add(y Uint8x64) Uint8x64

Add adds corresponding elements of two vectors.

Asm: VPADDB, CPU Feature: AVX512

func (Uint8x64) AddSaturated ¶

func (x Uint8x64) AddSaturated(y Uint8x64) Uint8x64

AddSaturated adds corresponding elements of two vectors with saturation.

Asm: VPADDUSB, CPU Feature: AVX512

func (Uint8x64) And ¶

func (x Uint8x64) And(y Uint8x64) Uint8x64

And performs a bitwise AND operation between two vectors.

Asm: VPANDD, CPU Feature: AVX512

func (Uint8x64) AndNot ¶

func (x Uint8x64) AndNot(y Uint8x64) Uint8x64

AndNot performs a bitwise x &^ y.

Asm: VPANDND, CPU Feature: AVX512

func (Uint8x64) AsFloat32x16 ¶

func (from Uint8x64) AsFloat32x16() (to Float32x16)

AsFloat32x16 converts from Uint8x64 to Float32x16.

func (Uint8x64) AsFloat64x8 ¶

func (from Uint8x64) AsFloat64x8() (to Float64x8)

AsFloat64x8 converts from Uint8x64 to Float64x8.

func (Uint8x64) AsInt16x32 ¶

func (from Uint8x64) AsInt16x32() (to Int16x32)

AsInt16x32 converts from Uint8x64 to Int16x32.

func (Uint8x64) AsInt32x16 ¶

func (from Uint8x64) AsInt32x16() (to Int32x16)

AsInt32x16 converts from Uint8x64 to Int32x16.

func (Uint8x64) AsInt64x8 ¶

func (from Uint8x64) AsInt64x8() (to Int64x8)

AsInt64x8 converts from Uint8x64 to Int64x8.

func (Uint8x64) AsInt8x64 ¶

func (from Uint8x64) AsInt8x64() (to Int8x64)

AsInt8x64 converts from Uint8x64 to Int8x64.

func (Uint8x64) AsUint16x32 ¶

func (from Uint8x64) AsUint16x32() (to Uint16x32)

AsUint16x32 converts from Uint8x64 to Uint16x32.

func (Uint8x64) AsUint32x16 ¶

func (from Uint8x64) AsUint32x16() (to Uint32x16)

AsUint32x16 converts from Uint8x64 to Uint32x16.

func (Uint8x64) AsUint64x8 ¶

func (from Uint8x64) AsUint64x8() (to Uint64x8)

AsUint64x8 converts from Uint8x64 to Uint64x8.

func (Uint8x64) Average ¶

func (x Uint8x64) Average(y Uint8x64) Uint8x64

Average computes the rounded average of corresponding elements.

Asm: VPAVGB, CPU Feature: AVX512

func (Uint8x64) Compress ¶

func (x Uint8x64) Compress(mask Mask8x64) Uint8x64

Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.

Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2

func (Uint8x64) ConcatPermute ¶

func (x Uint8x64) ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64

ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.

Asm: VPERMI2B, CPU Feature: AVX512VBMI

func (Uint8x64) ConcatShiftBytesRightGrouped ¶

func (x Uint8x64) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64

ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes. The result vector is the lower half of the shifted concatenation. This operation is performed independently on each 16-byte group.

constant results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VPALIGNR, CPU Feature: AVX512

func (Uint8x64) DotProductPairsSaturated ¶

func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32

DotProductPairsSaturated multiplies corresponding elements and adds adjacent pairs of products together with saturation, yielding a vector of half as many elements with twice the input element size.

Asm: VPMADDUBSW, CPU Feature: AVX512

func (Uint8x64) Equal ¶

func (x Uint8x64) Equal(y Uint8x64) Mask8x64

Equal returns a mask whose elements indicate whether x == y.

Asm: VPCMPEQB, CPU Feature: AVX512

func (Uint8x64) Expand ¶

func (x Uint8x64) Expand(mask Mask8x64) Uint8x64

Expand performs an expansion of a vector x whose elements are packed into its lower-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.

Asm: VPEXPANDB, CPU Feature: AVX512VBMI2

func (Uint8x64) GaloisFieldAffineTransform ¶

func (x Uint8x64) GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64

GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI

func (Uint8x64) GaloisFieldAffineTransformInverse ¶

func (x Uint8x64) GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64

GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrixes; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.

b results in better performance when it's a constant; a non-constant value will be translated into a jump table.

Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI

func (Uint8x64) GaloisFieldMul ¶

func (x Uint8x64) GaloisFieldMul(y Uint8x64) Uint8x64

GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.

Asm: VGF2P8MULB, CPU Feature: AVX512GFNI

func (Uint8x64) GetHi ¶

func (x Uint8x64) GetHi() Uint8x32

GetHi returns the upper half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint8x64) GetLo ¶

func (x Uint8x64) GetLo() Uint8x32

GetLo returns the lower half of x.

Asm: VEXTRACTI64X4, CPU Feature: AVX512

func (Uint8x64) Greater ¶

func (x Uint8x64) Greater(y Uint8x64) Mask8x64

Greater returns a mask whose elements indicate whether x > y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) GreaterEqual ¶

func (x Uint8x64) GreaterEqual(y Uint8x64) Mask8x64

GreaterEqual returns a mask whose elements indicate whether x >= y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) Len ¶

func (x Uint8x64) Len() int

Len returns the number of elements in a Uint8x64

func (Uint8x64) Less ¶

func (x Uint8x64) Less(y Uint8x64) Mask8x64

Less returns a mask whose elements indicate whether x < y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) LessEqual ¶

func (x Uint8x64) LessEqual(y Uint8x64) Mask8x64

LessEqual returns a mask whose elements indicate whether x <= y.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) Masked ¶

func (x Uint8x64) Masked(mask Mask8x64) Uint8x64

Masked returns x but with elements zeroed where mask is false.

func (Uint8x64) Max ¶

func (x Uint8x64) Max(y Uint8x64) Uint8x64

Max computes the maximum of corresponding elements.

Asm: VPMAXUB, CPU Feature: AVX512

func (Uint8x64) Merge ¶

func (x Uint8x64) Merge(y Uint8x64, mask Mask8x64) Uint8x64

Merge returns x but with elements set to y where mask is false.
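
The difference between Masked and Merge is easiest to see side by side. The sketch below models both with plain arrays; representing the mask as [64]bool and the helper names masked and merge are assumptions made for this example, since the real mask type is opaque.

```go
package main

import "fmt"

// masked models Masked: elements where the mask is false become zero.
func masked(x [64]uint8, mask [64]bool) [64]uint8 {
	var out [64]uint8
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

// merge models Merge: keep x where the mask is true, take y where it is false.
func merge(x, y [64]uint8, mask [64]bool) [64]uint8 {
	out := y
	for i := range x {
		if mask[i] {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	var x, limit [64]uint8
	var mask [64]bool
	for i := range x {
		x[i] = uint8(i * 4)
		limit[i] = 100
		mask[i] = x[i] <= limit[i] // what a LessEqual comparison would produce
	}
	clamped := merge(x, limit, mask)
	fmt.Println(clamped[10], clamped[40]) // prints 40 100
	fmt.Println(masked(x, mask)[40])      // prints 0
}
```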

func (Uint8x64) Min ¶

func (x Uint8x64) Min(y Uint8x64) Uint8x64

Min computes the minimum of corresponding elements.

Asm: VPMINUB, CPU Feature: AVX512

func (Uint8x64) Not ¶

func (x Uint8x64) Not() Uint8x64

Not returns the bitwise complement of x.

Emulated, CPU Feature: AVX512

func (Uint8x64) NotEqual ¶

func (x Uint8x64) NotEqual(y Uint8x64) Mask8x64

NotEqual returns x not-equals y, elementwise.

Asm: VPCMPUB, CPU Feature: AVX512

func (Uint8x64) OnesCount ¶

func (x Uint8x64) OnesCount() Uint8x64

OnesCount counts the number of set bits in each element.

Asm: VPOPCNTB, CPU Feature: AVX512BITALG
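
For reference, the same per-element result can be computed with the standard library's math/bits package. This is a scalar sketch for comparison or testing; onesCount is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// onesCount returns, for each element of x, the number of set bits in it.
func onesCount(x [64]uint8) [64]uint8 {
	var out [64]uint8
	for i, v := range x {
		out[i] = uint8(bits.OnesCount8(v))
	}
	return out
}

func main() {
	var x [64]uint8
	x[0], x[1], x[2] = 0xFF, 0xA5, 0x00
	r := onesCount(x)
	fmt.Println(r[0], r[1], r[2]) // prints 8 4 0
}
```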

func (Uint8x64) Or ¶

func (x Uint8x64) Or(y Uint8x64) Uint8x64

Or performs a bitwise OR operation between two vectors.

Asm: VPORD, CPU Feature: AVX512

func (Uint8x64) Permute ¶

func (x Uint8x64) Permute(indices Uint8x64) Uint8x64

Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[63]]}. Only the low 6 bits (values 0-63) of each element of indices are used.

Asm: VPERMB, CPU Feature: AVX512VBMI
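
A scalar sketch of the rule above, with permute as a hypothetical helper name. It shows that only the low 6 bits of each index matter, so an index of 69 selects element 5.

```go
package main

import "fmt"

// permute models Permute: element i of the result is x[indices[i] & 63].
func permute(x, indices [64]uint8) [64]uint8 {
	var out [64]uint8
	for i := range out {
		out[i] = x[indices[i]&63]
	}
	return out
}

func main() {
	var x, rev [64]uint8
	for i := range x {
		x[i] = uint8(100 + i)
		rev[i] = uint8(63 - i) // reverse the vector
	}
	r := permute(x, rev)
	fmt.Println(r[0], r[63]) // prints 163 100
}
```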

func (Uint8x64) PermuteOrZeroGrouped ¶

func (x Uint8x64) PermuteOrZeroGrouped(indices Int8x64) Uint8x64

PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.

Asm: VPSHUFB, CPU Feature: AVX512
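
The grouping rule can be sketched in scalar Go as follows. The helper name permuteOrZeroGrouped is hypothetical; the model assumes 16 byte elements per 128-bit group, which follows from the element and group sizes given above.

```go
package main

import "fmt"

// permuteOrZeroGrouped models the grouped permutation: element i belongs to
// group i/16, and its index selects only within that group.
func permuteOrZeroGrouped(x [64]uint8, indices [64]int8) [64]uint8 {
	var out [64]uint8
	for i := range out {
		idx := indices[i]
		if idx < 0 {
			continue // sign bit set: the element stays zero
		}
		group := i / 16
		out[i] = x[group*16+int(idx)&15] // only the low four bits select
	}
	return out
}

func main() {
	var x [64]uint8
	for i := range x {
		x[i] = uint8(i + 1)
	}
	var idx [64]int8
	idx[1] = -1 // zero this element
	idx[20] = 3 // element 3 of group 1, that is x[19]
	r := permuteOrZeroGrouped(x, idx)
	fmt.Println(r[0], r[1], r[20]) // prints 1 0 20
}
```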

func (Uint8x64) SetHi ¶

func (x Uint8x64) SetHi(y Uint8x32) Uint8x64

SetHi returns x with its upper half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint8x64) SetLo ¶

func (x Uint8x64) SetLo(y Uint8x32) Uint8x64

SetLo returns x with its lower half set to y.

Asm: VINSERTI64X4, CPU Feature: AVX512

func (Uint8x64) Store ¶

func (x Uint8x64) Store(y *[64]uint8)

Store stores a Uint8x64 to an array.

func (Uint8x64) StoreMasked ¶

func (x Uint8x64) StoreMasked(y *[64]uint8, mask Mask8x64)

StoreMasked stores a Uint8x64 to an array, at those elements enabled by mask.

Asm: VMOVDQU8, CPU Feature: AVX512

func (Uint8x64) StoreSlice ¶

func (x Uint8x64) StoreSlice(s []uint8)

StoreSlice stores x into a slice of at least 64 uint8s.

func (Uint8x64) StoreSlicePart ¶

func (x Uint8x64) StoreSlicePart(s []uint8)

StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.
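
The partial-store rule matches the behaviour of Go's built-in copy, which writes only as many elements as fit in the destination. The sketch below models it that way; storeSlicePart is a hypothetical helper for this example, not the method itself.

```go
package main

import "fmt"

// storeSlicePart models StoreSlicePart: it writes min(len(s), 64) elements.
func storeSlicePart(x [64]uint8, s []uint8) {
	copy(s, x[:])
}

func main() {
	var x [64]uint8
	for i := range x {
		x[i] = uint8(i + 1)
	}
	tail := make([]uint8, 10) // shorter than the vector
	storeSlicePart(x, tail)
	fmt.Println(tail) // prints [1 2 3 4 5 6 7 8 9 10]
}
```

This is the typical way to handle the final, shorter chunk of a buffer whose length is not a multiple of 64.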

func (Uint8x64) String ¶

func (x Uint8x64) String() string

String returns a string representation of SIMD vector x.

func (Uint8x64) Sub ¶

func (x Uint8x64) Sub(y Uint8x64) Uint8x64

Sub subtracts corresponding elements of two vectors.

Asm: VPSUBB, CPU Feature: AVX512

func (Uint8x64) SubSaturated ¶

func (x Uint8x64) SubSaturated(y Uint8x64) Uint8x64

SubSaturated subtracts corresponding elements of two vectors with saturation.

Asm: VPSUBUSB, CPU Feature: AVX512
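
For unsigned elements, saturation means the result is clamped at zero instead of wrapping around. A scalar sketch of that behaviour, with subSaturated as a hypothetical helper name:

```go
package main

import "fmt"

// subSaturated models unsigned saturating subtraction: results that would
// go below zero are clamped to 0 instead of wrapping.
func subSaturated(x, y [64]uint8) [64]uint8 {
	var out [64]uint8
	for i := range x {
		if x[i] > y[i] {
			out[i] = x[i] - y[i]
		}
	}
	return out
}

func main() {
	var x, y [64]uint8
	x[0], y[0] = 10, 3
	x[1], y[1] = 3, 10 // plain Sub would wrap to 249
	r := subSaturated(x, y)
	fmt.Println(r[0], r[1]) // prints 7 0
}
```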

func (Uint8x64) SumAbsDiff ¶

func (x Uint8x64) SumAbsDiff(y Uint8x64) Uint16x32

SumAbsDiff sums the absolute differences of corresponding elements of the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which every 4*n-th element contains the sum for the n-th input group; the other elements of the result are zeroed. This can be seen as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.

Asm: VPSADBW, CPU Feature: AVX512
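
The result layout is the part that is easy to misread, so here is a scalar sketch of it. The helper name sumAbsDiff is hypothetical; the model writes the sum for group n to element 4*n of the result and leaves every other element zero.

```go
package main

import "fmt"

// sumAbsDiff models SumAbsDiff: for each group n of 8 bytes, the sum of
// |x[i] - y[i]| is stored in element 4*n of the 16-bit result.
func sumAbsDiff(x, y [64]uint8) [32]uint16 {
	var out [32]uint16
	for n := 0; n < 8; n++ {
		var sum uint16
		for k := 0; k < 8; k++ {
			a, b := x[8*n+k], y[8*n+k]
			if a > b {
				sum += uint16(a - b)
			} else {
				sum += uint16(b - a)
			}
		}
		out[4*n] = sum
	}
	return out
}

func main() {
	var x, y [64]uint8
	for i := range x {
		x[i], y[i] = 10, 7
	}
	r := sumAbsDiff(x, y)
	fmt.Println(r[0], r[1], r[4]) // prints 24 0 24
}
```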

func (Uint8x64) Xor ¶

func (x Uint8x64) Xor(y Uint8x64) Uint8x64

Xor performs a bitwise XOR operation between two vectors.

Asm: VPXORD, CPU Feature: AVX512

type X86Features ¶

type X86Features struct{}

var X86 X86Features

func (X86Features) AES ¶

func (X86Features) AES() bool

AES returns whether the CPU supports the AES feature.

AES is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX ¶

func (X86Features) AVX() bool

AVX returns whether the CPU supports the AVX feature.

AVX is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX2 ¶

func (X86Features) AVX2() bool

AVX2 returns whether the CPU supports the AVX2 feature.

AVX2 is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512 ¶

func (X86Features) AVX512() bool

AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.

These five CPU features are bundled together, and no use of AVX-512 is allowed unless all of these features are supported together. Nearly every CPU that has shipped with any support for AVX-512 has supported all five of these features.

AVX512 is defined on all GOARCHes, but will only return true on GOARCH amd64.
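
A common pattern is to check features once and select an implementation accordingly. The sketch below is written against a small hypothetical interface (features) so that it runs without GOEXPERIMENT=simd; the X86 variable has methods with these signatures and could be passed in its place. The names features, fakeCPU, and pickKernel are assumptions made for this example.

```go
package main

import "fmt"

// features lists the subset of X86Features methods this sketch needs.
type features interface {
	AVX2() bool
	AVX512() bool
}

// fakeCPU is a stand-in so the example runs on any platform.
type fakeCPU struct{ avx2, avx512 bool }

func (c fakeCPU) AVX2() bool   { return c.avx2 }
func (c fakeCPU) AVX512() bool { return c.avx512 }

// pickKernel chooses the widest implementation the CPU reports support for,
// falling back to scalar code when no vector feature is available.
func pickKernel(cpu features) string {
	switch {
	case cpu.AVX512():
		return "avx512"
	case cpu.AVX2():
		return "avx2"
	default:
		return "scalar"
	}
}

func main() {
	fmt.Println(pickKernel(fakeCPU{avx2: true, avx512: true})) // prints avx512
	fmt.Println(pickKernel(fakeCPU{avx2: true}))               // prints avx2
	fmt.Println(pickKernel(fakeCPU{}))                         // prints scalar
}
```

Because the feature methods return false on other GOARCHes, a dispatch of this shape falls through to the scalar path there.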

func (X86Features) AVX512BITALG ¶

func (X86Features) AVX512BITALG() bool

AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.

AVX512BITALG is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512GFNI ¶

func (X86Features) AVX512GFNI() bool

AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.

AVX512GFNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VAES ¶

func (X86Features) AVX512VAES() bool

AVX512VAES returns whether the CPU supports the AVX512VAES feature.

AVX512VAES is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VBMI ¶

func (X86Features) AVX512VBMI() bool

AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.

AVX512VBMI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VBMI2 ¶

func (X86Features) AVX512VBMI2() bool

AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.

AVX512VBMI2 is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VNNI ¶

func (X86Features) AVX512VNNI() bool

AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.

AVX512VNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VPCLMULQDQ ¶

func (X86Features) AVX512VPCLMULQDQ() bool

AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.

AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVX512VPOPCNTDQ ¶

func (X86Features) AVX512VPOPCNTDQ() bool

AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.

AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) AVXVNNI ¶

func (X86Features) AVXVNNI() bool

AVXVNNI returns whether the CPU supports the AVXVNNI feature.

AVXVNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.

func (X86Features) SHA ¶

func (X86Features) SHA() bool

SHA returns whether the CPU supports the SHA feature.

SHA is defined on all GOARCHes, but will only return true on GOARCH amd64.

Notes ¶

Bugs ¶

Using a vector type as a type parameter may not work.
Using reflect.Value.Call to call a vector function/method may not work.
