Documentation
Overview
Package archsimd provides access to architecture-specific SIMD operations.
This is a low-level package that exposes hardware-specific functionality. It currently supports AMD64.
This package is experimental and is not subject to the Go 1 compatibility promise. It exists only when building with the GOEXPERIMENT=simd environment variable set.
Vector types and operations
Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding to the hardware's vector registers. On AMD64, 128-, 256-, and 512-bit vectors are supported.
Mask types, such as Mask8x16, are defined similarly. They are opaque types, which hides the differences in how the hardware represents masks. A mask can be converted to and from the corresponding integer vector type, or to and from a bitmask.
Operations are mostly defined as methods on the vector types. Most of them are compiler intrinsics and correspond directly to hardware instructions.
Common operations include:
- Load/Store: Load a vector from memory or store a vector to memory.
- Arithmetic: Add, Sub, Mul, etc.
- Bitwise: And, Or, Xor, etc.
- Comparison: Equal, Greater, etc., which produce a mask.
- Conversion: Convert between different vector types.
- Field selection and rearrangement: GetElem, Permute, etc.
- Masking: Masked, Merge.
The compiler recognizes certain patterns of operations and may compile them to more efficient instructions. For example, on AVX512, an Add operation followed by Masked may be optimized to a single masked add instruction. For this reason, not every hardware instruction is exposed as a separate API.
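To make the lane-wise behavior of these operations concrete, here is a scalar sketch in plain Go. It does not use this package, so it runs without GOEXPERIMENT=simd. The types vec4 and mask4 and the helper functions are hypothetical stand-ins, and the semantics assumed for Masked (lanes with a false mask element become zero) and Merge (lanes with a false mask element are taken from y) should be checked against the method documentation.

```go
package main

import "fmt"

// vec4 and mask4 are hypothetical scalar stand-ins for a four-lane
// vector (such as Float32x4) and its mask (such as Mask32x4).
type vec4 [4]float32
type mask4 [4]bool

// add models a lane-wise Add.
func add(x, y vec4) vec4 {
	var r vec4
	for i := range r {
		r[i] = x[i] + y[i]
	}
	return r
}

// greater models a lane-wise Greater comparison, which yields a mask.
func greater(x, y vec4) mask4 {
	var m mask4
	for i := range m {
		m[i] = x[i] > y[i]
	}
	return m
}

// masked models Masked, assuming that lanes whose mask element is
// false become zero.
func masked(x vec4, m mask4) vec4 {
	var r vec4
	for i := range r {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge models Merge, assuming that lanes whose mask element is
// false are taken from y.
func merge(x, y vec4, m mask4) vec4 {
	r := y
	for i := range r {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := vec4{1, 2, 3, 4}
	y := vec4{4, 3, 2, 1}
	m := greater(x, y) // only lanes 2 and 3 are true

	// Add followed by Masked is the pattern that, per the text above,
	// may become a single masked add on AVX512.
	fmt.Println(masked(add(x, y), m)) // [0 0 5 5]
	fmt.Println(merge(x, y, m))       // [4 3 3 4]
}
```

With the real vector types the same chain would be written as method calls on values, which is what gives the compiler the chance to fuse adjacent operations.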
CPU feature checks
The package provides global variables for checking which CPU features are available at runtime. For example, on AMD64, the X86 variable provides methods to check for AVX2, AVX512, etc. Check for the required CPU features before using the corresponding vector operations.
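One way to structure such a check is to select an implementation once, based on the reported features, and fall back to scalar code otherwise. The sketch below is plain Go and does not import this package. The featureSet interface and the fakeCPU type are hypothetical stand-ins for the X86 variable, and the method signatures are assumptions based on the feature names mentioned above.

```go
package main

import "fmt"

// featureSet is a hypothetical stand-in for the value the package
// exposes for CPU feature queries. The method names mirror the
// features named in the text; their exact signatures are assumed.
type featureSet interface {
	AVX2() bool
	AVX512() bool
}

// fakeCPU lets the sketch run anywhere by reporting fixed answers.
type fakeCPU struct{ avx2, avx512 bool }

func (c fakeCPU) AVX2() bool   { return c.avx2 }
func (c fakeCPU) AVX512() bool { return c.avx512 }

// pickKernel chooses the widest implementation the CPU supports and
// falls back to scalar code when no vector extension is reported.
func pickKernel(cpu featureSet) string {
	switch {
	case cpu.AVX512():
		return "512-bit kernel"
	case cpu.AVX2():
		return "256-bit kernel"
	default:
		return "scalar fallback"
	}
}

func main() {
	fmt.Println(pickKernel(fakeCPU{avx2: true})) // 256-bit kernel
	fmt.Println(pickKernel(fakeCPU{}))           // scalar fallback
}
```

In real code the returned value would more likely be a function than a string, chosen once at startup so the feature check stays out of the hot loop.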
Notes
- This package is not portable, as the available types and operations depend on the target architecture. It is not recommended to expose the SIMD types defined in this package in public APIs.
- For performance reasons, it is recommended to use the vector types directly as values. It is not recommended to take the address of a vector, allocate it on the heap, or put it in an aggregate type.
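Both notes can be followed by keeping vector values local to an unexported kernel behind a slice-based public function. The sketch below is plain Go and does not import this package. The lanes4 type and its helpers are hypothetical stand-ins for a four-lane vector type such as Float32x4, used only to show the shape of the code.

```go
package main

import "fmt"

// lanes4 is a hypothetical stand-in for a four-lane vector type such
// as Float32x4. It stays unexported and is only passed by value.
type lanes4 [4]float32

// load4 models loading four elements from the front of a slice.
func load4(s []float32) lanes4 { return lanes4{s[0], s[1], s[2], s[3]} }

// add models a lane-wise Add on values; x is a copy of the receiver.
func (x lanes4) add(y lanes4) lanes4 {
	for i := range x {
		x[i] += y[i]
	}
	return x
}

// Sum is the public entry point. Its signature mentions only
// portable Go types, so callers never see the vector type.
func Sum(xs []float32) float32 {
	var acc lanes4 // local value: never addressed or stored in a struct
	i := 0
	for ; i+4 <= len(xs); i += 4 {
		acc = acc.add(load4(xs[i:]))
	}
	total := acc[0] + acc[1] + acc[2] + acc[3]
	for ; i < len(xs); i++ { // scalar tail for the leftover elements
		total += xs[i]
	}
	return total
}

func main() {
	fmt.Println(Sum([]float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})) // 55
}
```

Because the vector type never appears in an exported signature, the kernel can be replaced per architecture without changing callers.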
Index
- func ClearAVXUpperBits()
- type Float32x16
- func (x Float32x16) Add(y Float32x16) Float32x16
- func (x Float32x16) AsFloat64x8() Float64x8
- func (x Float32x16) AsInt16x32() Int16x32
- func (x Float32x16) AsInt32x16() Int32x16
- func (x Float32x16) AsInt64x8() Int64x8
- func (x Float32x16) AsInt8x64() Int8x64
- func (x Float32x16) AsUint16x32() Uint16x32
- func (x Float32x16) AsUint32x16() Uint32x16
- func (x Float32x16) AsUint64x8() Uint64x8
- func (x Float32x16) AsUint8x64() Uint8x64
- func (x Float32x16) CeilScaled(prec uint8) Float32x16
- func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Compress(mask Mask32x16) Float32x16
- func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
- func (x Float32x16) ConvertToInt32() Int32x16
- func (x Float32x16) ConvertToUint32() Uint32x16
- func (x Float32x16) Div(y Float32x16) Float32x16
- func (x Float32x16) Equal(y Float32x16) Mask32x16
- func (x Float32x16) Expand(mask Mask32x16) Float32x16
- func (x Float32x16) FloorScaled(prec uint8) Float32x16
- func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16
- func (x Float32x16) GetHi() Float32x8
- func (x Float32x16) GetLo() Float32x8
- func (x Float32x16) Greater(y Float32x16) Mask32x16
- func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16
- func (x Float32x16) IsNaN() Mask32x16
- func (x Float32x16) Len() int
- func (x Float32x16) Less(y Float32x16) Mask32x16
- func (x Float32x16) LessEqual(y Float32x16) Mask32x16
- func (x Float32x16) Masked(mask Mask32x16) Float32x16
- func (x Float32x16) Max(y Float32x16) Float32x16
- func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16
- func (x Float32x16) Min(y Float32x16) Float32x16
- func (x Float32x16) Mul(y Float32x16) Float32x16
- func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) NotEqual(y Float32x16) Mask32x16
- func (x Float32x16) Permute(indices Uint32x16) Float32x16
- func (x Float32x16) Reciprocal() Float32x16
- func (x Float32x16) ReciprocalSqrt() Float32x16
- func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16
- func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Scale(y Float32x16) Float32x16
- func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
- func (x Float32x16) SetHi(y Float32x8) Float32x16
- func (x Float32x16) SetLo(y Float32x8) Float32x16
- func (x Float32x16) Sqrt() Float32x16
- func (x Float32x16) Store(y *[16]float32)
- func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)
- func (x Float32x16) StoreSlice(s []float32)
- func (x Float32x16) StoreSlicePart(s []float32)
- func (x Float32x16) String() string
- func (x Float32x16) Sub(y Float32x16) Float32x16
- func (x Float32x16) TruncScaled(prec uint8) Float32x16
- func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16
- type Float32x4
- func (x Float32x4) Add(y Float32x4) Float32x4
- func (x Float32x4) AddPairs(y Float32x4) Float32x4
- func (x Float32x4) AddSub(y Float32x4) Float32x4
- func (x Float32x4) AsFloat64x2() Float64x2
- func (x Float32x4) AsInt16x8() Int16x8
- func (x Float32x4) AsInt32x4() Int32x4
- func (x Float32x4) AsInt64x2() Int64x2
- func (x Float32x4) AsInt8x16() Int8x16
- func (x Float32x4) AsUint16x8() Uint16x8
- func (x Float32x4) AsUint32x4() Uint32x4
- func (x Float32x4) AsUint64x2() Uint64x2
- func (x Float32x4) AsUint8x16() Uint8x16
- func (x Float32x4) Broadcast1To16() Float32x16
- func (x Float32x4) Broadcast1To4() Float32x4
- func (x Float32x4) Broadcast1To8() Float32x8
- func (x Float32x4) Ceil() Float32x4
- func (x Float32x4) CeilScaled(prec uint8) Float32x4
- func (x Float32x4) CeilScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Compress(mask Mask32x4) Float32x4
- func (x Float32x4) ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
- func (x Float32x4) ConvertToFloat64() Float64x4
- func (x Float32x4) ConvertToInt32() Int32x4
- func (x Float32x4) ConvertToInt64() Int64x4
- func (x Float32x4) ConvertToUint32() Uint32x4
- func (x Float32x4) ConvertToUint64() Uint64x4
- func (x Float32x4) Div(y Float32x4) Float32x4
- func (x Float32x4) Equal(y Float32x4) Mask32x4
- func (x Float32x4) Expand(mask Mask32x4) Float32x4
- func (x Float32x4) Floor() Float32x4
- func (x Float32x4) FloorScaled(prec uint8) Float32x4
- func (x Float32x4) FloorScaledResidue(prec uint8) Float32x4
- func (x Float32x4) GetElem(index uint8) float32
- func (x Float32x4) Greater(y Float32x4) Mask32x4
- func (x Float32x4) GreaterEqual(y Float32x4) Mask32x4
- func (x Float32x4) IsNaN() Mask32x4
- func (x Float32x4) Len() int
- func (x Float32x4) Less(y Float32x4) Mask32x4
- func (x Float32x4) LessEqual(y Float32x4) Mask32x4
- func (x Float32x4) Masked(mask Mask32x4) Float32x4
- func (x Float32x4) Max(y Float32x4) Float32x4
- func (x Float32x4) Merge(y Float32x4, mask Mask32x4) Float32x4
- func (x Float32x4) Min(y Float32x4) Float32x4
- func (x Float32x4) Mul(y Float32x4) Float32x4
- func (x Float32x4) MulAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulAddSub(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulSubAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) NotEqual(y Float32x4) Mask32x4
- func (x Float32x4) Reciprocal() Float32x4
- func (x Float32x4) ReciprocalSqrt() Float32x4
- func (x Float32x4) RoundToEven() Float32x4
- func (x Float32x4) RoundToEvenScaled(prec uint8) Float32x4
- func (x Float32x4) RoundToEvenScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Scale(y Float32x4) Float32x4
- func (x Float32x4) SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
- func (x Float32x4) SetElem(index uint8, y float32) Float32x4
- func (x Float32x4) Sqrt() Float32x4
- func (x Float32x4) Store(y *[4]float32)
- func (x Float32x4) StoreMasked(y *[4]float32, mask Mask32x4)
- func (x Float32x4) StoreSlice(s []float32)
- func (x Float32x4) StoreSlicePart(s []float32)
- func (x Float32x4) String() string
- func (x Float32x4) Sub(y Float32x4) Float32x4
- func (x Float32x4) SubPairs(y Float32x4) Float32x4
- func (x Float32x4) Trunc() Float32x4
- func (x Float32x4) TruncScaled(prec uint8) Float32x4
- func (x Float32x4) TruncScaledResidue(prec uint8) Float32x4
- type Float32x8
- func (x Float32x8) Add(y Float32x8) Float32x8
- func (x Float32x8) AddPairsGrouped(y Float32x8) Float32x8
- func (x Float32x8) AddSub(y Float32x8) Float32x8
- func (x Float32x8) AsFloat64x4() Float64x4
- func (x Float32x8) AsInt16x16() Int16x16
- func (x Float32x8) AsInt32x8() Int32x8
- func (x Float32x8) AsInt64x4() Int64x4
- func (x Float32x8) AsInt8x32() Int8x32
- func (x Float32x8) AsUint16x16() Uint16x16
- func (x Float32x8) AsUint32x8() Uint32x8
- func (x Float32x8) AsUint64x4() Uint64x4
- func (x Float32x8) AsUint8x32() Uint8x32
- func (x Float32x8) Ceil() Float32x8
- func (x Float32x8) CeilScaled(prec uint8) Float32x8
- func (x Float32x8) CeilScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Compress(mask Mask32x8) Float32x8
- func (x Float32x8) ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
- func (x Float32x8) ConvertToFloat64() Float64x8
- func (x Float32x8) ConvertToInt32() Int32x8
- func (x Float32x8) ConvertToInt64() Int64x8
- func (x Float32x8) ConvertToUint32() Uint32x8
- func (x Float32x8) ConvertToUint64() Uint64x8
- func (x Float32x8) Div(y Float32x8) Float32x8
- func (x Float32x8) Equal(y Float32x8) Mask32x8
- func (x Float32x8) Expand(mask Mask32x8) Float32x8
- func (x Float32x8) Floor() Float32x8
- func (x Float32x8) FloorScaled(prec uint8) Float32x8
- func (x Float32x8) FloorScaledResidue(prec uint8) Float32x8
- func (x Float32x8) GetHi() Float32x4
- func (x Float32x8) GetLo() Float32x4
- func (x Float32x8) Greater(y Float32x8) Mask32x8
- func (x Float32x8) GreaterEqual(y Float32x8) Mask32x8
- func (x Float32x8) IsNaN() Mask32x8
- func (x Float32x8) Len() int
- func (x Float32x8) Less(y Float32x8) Mask32x8
- func (x Float32x8) LessEqual(y Float32x8) Mask32x8
- func (x Float32x8) Masked(mask Mask32x8) Float32x8
- func (x Float32x8) Max(y Float32x8) Float32x8
- func (x Float32x8) Merge(y Float32x8, mask Mask32x8) Float32x8
- func (x Float32x8) Min(y Float32x8) Float32x8
- func (x Float32x8) Mul(y Float32x8) Float32x8
- func (x Float32x8) MulAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulAddSub(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulSubAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) NotEqual(y Float32x8) Mask32x8
- func (x Float32x8) Permute(indices Uint32x8) Float32x8
- func (x Float32x8) Reciprocal() Float32x8
- func (x Float32x8) ReciprocalSqrt() Float32x8
- func (x Float32x8) RoundToEven() Float32x8
- func (x Float32x8) RoundToEvenScaled(prec uint8) Float32x8
- func (x Float32x8) RoundToEvenScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Scale(y Float32x8) Float32x8
- func (x Float32x8) Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
- func (x Float32x8) SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
- func (x Float32x8) SetHi(y Float32x4) Float32x8
- func (x Float32x8) SetLo(y Float32x4) Float32x8
- func (x Float32x8) Sqrt() Float32x8
- func (x Float32x8) Store(y *[8]float32)
- func (x Float32x8) StoreMasked(y *[8]float32, mask Mask32x8)
- func (x Float32x8) StoreSlice(s []float32)
- func (x Float32x8) StoreSlicePart(s []float32)
- func (x Float32x8) String() string
- func (x Float32x8) Sub(y Float32x8) Float32x8
- func (x Float32x8) SubPairsGrouped(y Float32x8) Float32x8
- func (x Float32x8) Trunc() Float32x8
- func (x Float32x8) TruncScaled(prec uint8) Float32x8
- func (x Float32x8) TruncScaledResidue(prec uint8) Float32x8
- type Float64x2
- func (x Float64x2) Add(y Float64x2) Float64x2
- func (x Float64x2) AddPairs(y Float64x2) Float64x2
- func (x Float64x2) AddSub(y Float64x2) Float64x2
- func (x Float64x2) AsFloat32x4() Float32x4
- func (x Float64x2) AsInt16x8() Int16x8
- func (x Float64x2) AsInt32x4() Int32x4
- func (x Float64x2) AsInt64x2() Int64x2
- func (x Float64x2) AsInt8x16() Int8x16
- func (x Float64x2) AsUint16x8() Uint16x8
- func (x Float64x2) AsUint32x4() Uint32x4
- func (x Float64x2) AsUint64x2() Uint64x2
- func (x Float64x2) AsUint8x16() Uint8x16
- func (x Float64x2) Broadcast1To2() Float64x2
- func (x Float64x2) Broadcast1To4() Float64x4
- func (x Float64x2) Broadcast1To8() Float64x8
- func (x Float64x2) Ceil() Float64x2
- func (x Float64x2) CeilScaled(prec uint8) Float64x2
- func (x Float64x2) CeilScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Compress(mask Mask64x2) Float64x2
- func (x Float64x2) ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
- func (x Float64x2) ConvertToFloat32() Float32x4
- func (x Float64x2) ConvertToInt32() Int32x4
- func (x Float64x2) ConvertToInt64() Int64x2
- func (x Float64x2) ConvertToUint32() Uint32x4
- func (x Float64x2) ConvertToUint64() Uint64x2
- func (x Float64x2) Div(y Float64x2) Float64x2
- func (x Float64x2) Equal(y Float64x2) Mask64x2
- func (x Float64x2) Expand(mask Mask64x2) Float64x2
- func (x Float64x2) Floor() Float64x2
- func (x Float64x2) FloorScaled(prec uint8) Float64x2
- func (x Float64x2) FloorScaledResidue(prec uint8) Float64x2
- func (x Float64x2) GetElem(index uint8) float64
- func (x Float64x2) Greater(y Float64x2) Mask64x2
- func (x Float64x2) GreaterEqual(y Float64x2) Mask64x2
- func (x Float64x2) IsNaN() Mask64x2
- func (x Float64x2) Len() int
- func (x Float64x2) Less(y Float64x2) Mask64x2
- func (x Float64x2) LessEqual(y Float64x2) Mask64x2
- func (x Float64x2) Masked(mask Mask64x2) Float64x2
- func (x Float64x2) Max(y Float64x2) Float64x2
- func (x Float64x2) Merge(y Float64x2, mask Mask64x2) Float64x2
- func (x Float64x2) Min(y Float64x2) Float64x2
- func (x Float64x2) Mul(y Float64x2) Float64x2
- func (x Float64x2) MulAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulAddSub(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulSubAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) NotEqual(y Float64x2) Mask64x2
- func (x Float64x2) Reciprocal() Float64x2
- func (x Float64x2) ReciprocalSqrt() Float64x2
- func (x Float64x2) RoundToEven() Float64x2
- func (x Float64x2) RoundToEvenScaled(prec uint8) Float64x2
- func (x Float64x2) RoundToEvenScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Scale(y Float64x2) Float64x2
- func (x Float64x2) SelectFromPair(a, b uint8, y Float64x2) Float64x2
- func (x Float64x2) SetElem(index uint8, y float64) Float64x2
- func (x Float64x2) Sqrt() Float64x2
- func (x Float64x2) Store(y *[2]float64)
- func (x Float64x2) StoreMasked(y *[2]float64, mask Mask64x2)
- func (x Float64x2) StoreSlice(s []float64)
- func (x Float64x2) StoreSlicePart(s []float64)
- func (x Float64x2) String() string
- func (x Float64x2) Sub(y Float64x2) Float64x2
- func (x Float64x2) SubPairs(y Float64x2) Float64x2
- func (x Float64x2) Trunc() Float64x2
- func (x Float64x2) TruncScaled(prec uint8) Float64x2
- func (x Float64x2) TruncScaledResidue(prec uint8) Float64x2
- type Float64x4
- func (x Float64x4) Add(y Float64x4) Float64x4
- func (x Float64x4) AddPairsGrouped(y Float64x4) Float64x4
- func (x Float64x4) AddSub(y Float64x4) Float64x4
- func (x Float64x4) AsFloat32x8() Float32x8
- func (x Float64x4) AsInt16x16() Int16x16
- func (x Float64x4) AsInt32x8() Int32x8
- func (x Float64x4) AsInt64x4() Int64x4
- func (x Float64x4) AsInt8x32() Int8x32
- func (x Float64x4) AsUint16x16() Uint16x16
- func (x Float64x4) AsUint32x8() Uint32x8
- func (x Float64x4) AsUint64x4() Uint64x4
- func (x Float64x4) AsUint8x32() Uint8x32
- func (x Float64x4) Ceil() Float64x4
- func (x Float64x4) CeilScaled(prec uint8) Float64x4
- func (x Float64x4) CeilScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Compress(mask Mask64x4) Float64x4
- func (x Float64x4) ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
- func (x Float64x4) ConvertToFloat32() Float32x4
- func (x Float64x4) ConvertToInt32() Int32x4
- func (x Float64x4) ConvertToInt64() Int64x4
- func (x Float64x4) ConvertToUint32() Uint32x4
- func (x Float64x4) ConvertToUint64() Uint64x4
- func (x Float64x4) Div(y Float64x4) Float64x4
- func (x Float64x4) Equal(y Float64x4) Mask64x4
- func (x Float64x4) Expand(mask Mask64x4) Float64x4
- func (x Float64x4) Floor() Float64x4
- func (x Float64x4) FloorScaled(prec uint8) Float64x4
- func (x Float64x4) FloorScaledResidue(prec uint8) Float64x4
- func (x Float64x4) GetHi() Float64x2
- func (x Float64x4) GetLo() Float64x2
- func (x Float64x4) Greater(y Float64x4) Mask64x4
- func (x Float64x4) GreaterEqual(y Float64x4) Mask64x4
- func (x Float64x4) IsNaN() Mask64x4
- func (x Float64x4) Len() int
- func (x Float64x4) Less(y Float64x4) Mask64x4
- func (x Float64x4) LessEqual(y Float64x4) Mask64x4
- func (x Float64x4) Masked(mask Mask64x4) Float64x4
- func (x Float64x4) Max(y Float64x4) Float64x4
- func (x Float64x4) Merge(y Float64x4, mask Mask64x4) Float64x4
- func (x Float64x4) Min(y Float64x4) Float64x4
- func (x Float64x4) Mul(y Float64x4) Float64x4
- func (x Float64x4) MulAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulAddSub(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulSubAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) NotEqual(y Float64x4) Mask64x4
- func (x Float64x4) Permute(indices Uint64x4) Float64x4
- func (x Float64x4) Reciprocal() Float64x4
- func (x Float64x4) ReciprocalSqrt() Float64x4
- func (x Float64x4) RoundToEven() Float64x4
- func (x Float64x4) RoundToEvenScaled(prec uint8) Float64x4
- func (x Float64x4) RoundToEvenScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Scale(y Float64x4) Float64x4
- func (x Float64x4) Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
- func (x Float64x4) SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
- func (x Float64x4) SetHi(y Float64x2) Float64x4
- func (x Float64x4) SetLo(y Float64x2) Float64x4
- func (x Float64x4) Sqrt() Float64x4
- func (x Float64x4) Store(y *[4]float64)
- func (x Float64x4) StoreMasked(y *[4]float64, mask Mask64x4)
- func (x Float64x4) StoreSlice(s []float64)
- func (x Float64x4) StoreSlicePart(s []float64)
- func (x Float64x4) String() string
- func (x Float64x4) Sub(y Float64x4) Float64x4
- func (x Float64x4) SubPairsGrouped(y Float64x4) Float64x4
- func (x Float64x4) Trunc() Float64x4
- func (x Float64x4) TruncScaled(prec uint8) Float64x4
- func (x Float64x4) TruncScaledResidue(prec uint8) Float64x4
- type Float64x8
- func (x Float64x8) Add(y Float64x8) Float64x8
- func (x Float64x8) AsFloat32x16() Float32x16
- func (x Float64x8) AsInt16x32() Int16x32
- func (x Float64x8) AsInt32x16() Int32x16
- func (x Float64x8) AsInt64x8() Int64x8
- func (x Float64x8) AsInt8x64() Int8x64
- func (x Float64x8) AsUint16x32() Uint16x32
- func (x Float64x8) AsUint32x16() Uint32x16
- func (x Float64x8) AsUint64x8() Uint64x8
- func (x Float64x8) AsUint8x64() Uint8x64
- func (x Float64x8) CeilScaled(prec uint8) Float64x8
- func (x Float64x8) CeilScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Compress(mask Mask64x8) Float64x8
- func (x Float64x8) ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
- func (x Float64x8) ConvertToFloat32() Float32x8
- func (x Float64x8) ConvertToInt32() Int32x8
- func (x Float64x8) ConvertToInt64() Int64x8
- func (x Float64x8) ConvertToUint32() Uint32x8
- func (x Float64x8) ConvertToUint64() Uint64x8
- func (x Float64x8) Div(y Float64x8) Float64x8
- func (x Float64x8) Equal(y Float64x8) Mask64x8
- func (x Float64x8) Expand(mask Mask64x8) Float64x8
- func (x Float64x8) FloorScaled(prec uint8) Float64x8
- func (x Float64x8) FloorScaledResidue(prec uint8) Float64x8
- func (x Float64x8) GetHi() Float64x4
- func (x Float64x8) GetLo() Float64x4
- func (x Float64x8) Greater(y Float64x8) Mask64x8
- func (x Float64x8) GreaterEqual(y Float64x8) Mask64x8
- func (x Float64x8) IsNaN() Mask64x8
- func (x Float64x8) Len() int
- func (x Float64x8) Less(y Float64x8) Mask64x8
- func (x Float64x8) LessEqual(y Float64x8) Mask64x8
- func (x Float64x8) Masked(mask Mask64x8) Float64x8
- func (x Float64x8) Max(y Float64x8) Float64x8
- func (x Float64x8) Merge(y Float64x8, mask Mask64x8) Float64x8
- func (x Float64x8) Min(y Float64x8) Float64x8
- func (x Float64x8) Mul(y Float64x8) Float64x8
- func (x Float64x8) MulAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulAddSub(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulSubAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) NotEqual(y Float64x8) Mask64x8
- func (x Float64x8) Permute(indices Uint64x8) Float64x8
- func (x Float64x8) Reciprocal() Float64x8
- func (x Float64x8) ReciprocalSqrt() Float64x8
- func (x Float64x8) RoundToEvenScaled(prec uint8) Float64x8
- func (x Float64x8) RoundToEvenScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Scale(y Float64x8) Float64x8
- func (x Float64x8) SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
- func (x Float64x8) SetHi(y Float64x4) Float64x8
- func (x Float64x8) SetLo(y Float64x4) Float64x8
- func (x Float64x8) Sqrt() Float64x8
- func (x Float64x8) Store(y *[8]float64)
- func (x Float64x8) StoreMasked(y *[8]float64, mask Mask64x8)
- func (x Float64x8) StoreSlice(s []float64)
- func (x Float64x8) StoreSlicePart(s []float64)
- func (x Float64x8) String() string
- func (x Float64x8) Sub(y Float64x8) Float64x8
- func (x Float64x8) TruncScaled(prec uint8) Float64x8
- func (x Float64x8) TruncScaledResidue(prec uint8) Float64x8
- type Int16x16
- func (x Int16x16) Abs() Int16x16
- func (x Int16x16) Add(y Int16x16) Int16x16
- func (x Int16x16) AddPairsGrouped(y Int16x16) Int16x16
- func (x Int16x16) AddPairsSaturatedGrouped(y Int16x16) Int16x16
- func (x Int16x16) AddSaturated(y Int16x16) Int16x16
- func (x Int16x16) And(y Int16x16) Int16x16
- func (x Int16x16) AndNot(y Int16x16) Int16x16
- func (x Int16x16) AsFloat32x8() Float32x8
- func (x Int16x16) AsFloat64x4() Float64x4
- func (x Int16x16) AsInt32x8() Int32x8
- func (x Int16x16) AsInt64x4() Int64x4
- func (x Int16x16) AsInt8x32() Int8x32
- func (x Int16x16) AsUint16x16() Uint16x16
- func (x Int16x16) AsUint32x8() Uint32x8
- func (x Int16x16) AsUint64x4() Uint64x4
- func (x Int16x16) AsUint8x32() Uint8x32
- func (x Int16x16) Compress(mask Mask16x16) Int16x16
- func (x Int16x16) ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
- func (x Int16x16) CopySign(y Int16x16) Int16x16
- func (x Int16x16) DotProductPairs(y Int16x16) Int32x8
- func (x Int16x16) Equal(y Int16x16) Mask16x16
- func (x Int16x16) Expand(mask Mask16x16) Int16x16
- func (x Int16x16) ExtendToInt32() Int32x16
- func (x Int16x16) GetHi() Int16x8
- func (x Int16x16) GetLo() Int16x8
- func (x Int16x16) Greater(y Int16x16) Mask16x16
- func (x Int16x16) GreaterEqual(y Int16x16) Mask16x16
- func (x Int16x16) InterleaveHiGrouped(y Int16x16) Int16x16
- func (x Int16x16) InterleaveLoGrouped(y Int16x16) Int16x16
- func (x Int16x16) IsZero() bool
- func (x Int16x16) Len() int
- func (x Int16x16) Less(y Int16x16) Mask16x16
- func (x Int16x16) LessEqual(y Int16x16) Mask16x16
- func (x Int16x16) Masked(mask Mask16x16) Int16x16
- func (x Int16x16) Max(y Int16x16) Int16x16
- func (x Int16x16) Merge(y Int16x16, mask Mask16x16) Int16x16
- func (x Int16x16) Min(y Int16x16) Int16x16
- func (x Int16x16) Mul(y Int16x16) Int16x16
- func (x Int16x16) MulHigh(y Int16x16) Int16x16
- func (x Int16x16) Not() Int16x16
- func (x Int16x16) NotEqual(y Int16x16) Mask16x16
- func (x Int16x16) OnesCount() Int16x16
- func (x Int16x16) Or(y Int16x16) Int16x16
- func (x Int16x16) Permute(indices Uint16x16) Int16x16
- func (x Int16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) SaturateToInt8() Int8x16
- func (x Int16x16) Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
- func (x Int16x16) SetHi(y Int16x8) Int16x16
- func (x Int16x16) SetLo(y Int16x8) Int16x16
- func (x Int16x16) ShiftAllLeft(y uint64) Int16x16
- func (x Int16x16) ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftAllRight(y uint64) Int16x16
- func (x Int16x16) ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftLeft(y Int16x16) Int16x16
- func (x Int16x16) ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) ShiftRight(y Int16x16) Int16x16
- func (x Int16x16) ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) Store(y *[16]int16)
- func (x Int16x16) StoreSlice(s []int16)
- func (x Int16x16) StoreSlicePart(s []int16)
- func (x Int16x16) String() string
- func (x Int16x16) Sub(y Int16x16) Int16x16
- func (x Int16x16) SubPairsGrouped(y Int16x16) Int16x16
- func (x Int16x16) SubPairsSaturatedGrouped(y Int16x16) Int16x16
- func (x Int16x16) SubSaturated(y Int16x16) Int16x16
- func (from Int16x16) ToMask() (to Mask16x16)
- func (x Int16x16) TruncateToInt8() Int8x16
- func (x Int16x16) Xor(y Int16x16) Int16x16
- type Int16x32
- func (x Int16x32) Abs() Int16x32
- func (x Int16x32) Add(y Int16x32) Int16x32
- func (x Int16x32) AddSaturated(y Int16x32) Int16x32
- func (x Int16x32) And(y Int16x32) Int16x32
- func (x Int16x32) AndNot(y Int16x32) Int16x32
- func (x Int16x32) AsFloat32x16() Float32x16
- func (x Int16x32) AsFloat64x8() Float64x8
- func (x Int16x32) AsInt32x16() Int32x16
- func (x Int16x32) AsInt64x8() Int64x8
- func (x Int16x32) AsInt8x64() Int8x64
- func (x Int16x32) AsUint16x32() Uint16x32
- func (x Int16x32) AsUint32x16() Uint32x16
- func (x Int16x32) AsUint64x8() Uint64x8
- func (x Int16x32) AsUint8x64() Uint8x64
- func (x Int16x32) Compress(mask Mask16x32) Int16x32
- func (x Int16x32) ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
- func (x Int16x32) DotProductPairs(y Int16x32) Int32x16
- func (x Int16x32) Equal(y Int16x32) Mask16x32
- func (x Int16x32) Expand(mask Mask16x32) Int16x32
- func (x Int16x32) GetHi() Int16x16
- func (x Int16x32) GetLo() Int16x16
- func (x Int16x32) Greater(y Int16x32) Mask16x32
- func (x Int16x32) GreaterEqual(y Int16x32) Mask16x32
- func (x Int16x32) InterleaveHiGrouped(y Int16x32) Int16x32
- func (x Int16x32) InterleaveLoGrouped(y Int16x32) Int16x32
- func (x Int16x32) Len() int
- func (x Int16x32) Less(y Int16x32) Mask16x32
- func (x Int16x32) LessEqual(y Int16x32) Mask16x32
- func (x Int16x32) Masked(mask Mask16x32) Int16x32
- func (x Int16x32) Max(y Int16x32) Int16x32
- func (x Int16x32) Merge(y Int16x32, mask Mask16x32) Int16x32
- func (x Int16x32) Min(y Int16x32) Int16x32
- func (x Int16x32) Mul(y Int16x32) Int16x32
- func (x Int16x32) MulHigh(y Int16x32) Int16x32
- func (x Int16x32) Not() Int16x32
- func (x Int16x32) NotEqual(y Int16x32) Mask16x32
- func (x Int16x32) OnesCount() Int16x32
- func (x Int16x32) Or(y Int16x32) Int16x32
- func (x Int16x32) Permute(indices Uint16x32) Int16x32
- func (x Int16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) SaturateToInt8() Int8x32
- func (x Int16x32) SetHi(y Int16x16) Int16x32
- func (x Int16x32) SetLo(y Int16x16) Int16x32
- func (x Int16x32) ShiftAllLeft(y uint64) Int16x32
- func (x Int16x32) ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftAllRight(y uint64) Int16x32
- func (x Int16x32) ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftLeft(y Int16x32) Int16x32
- func (x Int16x32) ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) ShiftRight(y Int16x32) Int16x32
- func (x Int16x32) ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) Store(y *[32]int16)
- func (x Int16x32) StoreMasked(y *[32]int16, mask Mask16x32)
- func (x Int16x32) StoreSlice(s []int16)
- func (x Int16x32) StoreSlicePart(s []int16)
- func (x Int16x32) String() string
- func (x Int16x32) Sub(y Int16x32) Int16x32
- func (x Int16x32) SubSaturated(y Int16x32) Int16x32
- func (from Int16x32) ToMask() (to Mask16x32)
- func (x Int16x32) TruncateToInt8() Int8x32
- func (x Int16x32) Xor(y Int16x32) Int16x32
- type Int16x8
- func (x Int16x8) Abs() Int16x8
- func (x Int16x8) Add(y Int16x8) Int16x8
- func (x Int16x8) AddPairs(y Int16x8) Int16x8
- func (x Int16x8) AddPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) AddSaturated(y Int16x8) Int16x8
- func (x Int16x8) And(y Int16x8) Int16x8
- func (x Int16x8) AndNot(y Int16x8) Int16x8
- func (x Int16x8) AsFloat32x4() Float32x4
- func (x Int16x8) AsFloat64x2() Float64x2
- func (x Int16x8) AsInt32x4() Int32x4
- func (x Int16x8) AsInt64x2() Int64x2
- func (x Int16x8) AsInt8x16() Int8x16
- func (x Int16x8) AsUint16x8() Uint16x8
- func (x Int16x8) AsUint32x4() Uint32x4
- func (x Int16x8) AsUint64x2() Uint64x2
- func (x Int16x8) AsUint8x16() Uint8x16
- func (x Int16x8) Broadcast1To16() Int16x16
- func (x Int16x8) Broadcast1To32() Int16x32
- func (x Int16x8) Broadcast1To8() Int16x8
- func (x Int16x8) Compress(mask Mask16x8) Int16x8
- func (x Int16x8) ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
- func (x Int16x8) CopySign(y Int16x8) Int16x8
- func (x Int16x8) DotProductPairs(y Int16x8) Int32x4
- func (x Int16x8) Equal(y Int16x8) Mask16x8
- func (x Int16x8) Expand(mask Mask16x8) Int16x8
- func (x Int16x8) ExtendLo2ToInt64() Int64x2
- func (x Int16x8) ExtendLo4ToInt32() Int32x4
- func (x Int16x8) ExtendLo4ToInt64() Int64x4
- func (x Int16x8) ExtendToInt32() Int32x8
- func (x Int16x8) ExtendToInt64() Int64x8
- func (x Int16x8) GetElem(index uint8) int16
- func (x Int16x8) Greater(y Int16x8) Mask16x8
- func (x Int16x8) GreaterEqual(y Int16x8) Mask16x8
- func (x Int16x8) InterleaveHi(y Int16x8) Int16x8
- func (x Int16x8) InterleaveLo(y Int16x8) Int16x8
- func (x Int16x8) IsZero() bool
- func (x Int16x8) Len() int
- func (x Int16x8) Less(y Int16x8) Mask16x8
- func (x Int16x8) LessEqual(y Int16x8) Mask16x8
- func (x Int16x8) Masked(mask Mask16x8) Int16x8
- func (x Int16x8) Max(y Int16x8) Int16x8
- func (x Int16x8) Merge(y Int16x8, mask Mask16x8) Int16x8
- func (x Int16x8) Min(y Int16x8) Int16x8
- func (x Int16x8) Mul(y Int16x8) Int16x8
- func (x Int16x8) MulHigh(y Int16x8) Int16x8
- func (x Int16x8) Not() Int16x8
- func (x Int16x8) NotEqual(y Int16x8) Mask16x8
- func (x Int16x8) OnesCount() Int16x8
- func (x Int16x8) Or(y Int16x8) Int16x8
- func (x Int16x8) Permute(indices Uint16x8) Int16x8
- func (x Int16x8) PermuteScalarsHi(a, b, c, d uint8) Int16x8
- func (x Int16x8) PermuteScalarsLo(a, b, c, d uint8) Int16x8
- func (x Int16x8) SaturateToInt8() Int8x16
- func (x Int16x8) SetElem(index uint8, y int16) Int16x8
- func (x Int16x8) ShiftAllLeft(y uint64) Int16x8
- func (x Int16x8) ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftAllRight(y uint64) Int16x8
- func (x Int16x8) ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftLeft(y Int16x8) Int16x8
- func (x Int16x8) ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) ShiftRight(y Int16x8) Int16x8
- func (x Int16x8) ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) Store(y *[8]int16)
- func (x Int16x8) StoreSlice(s []int16)
- func (x Int16x8) StoreSlicePart(s []int16)
- func (x Int16x8) String() string
- func (x Int16x8) Sub(y Int16x8) Int16x8
- func (x Int16x8) SubPairs(y Int16x8) Int16x8
- func (x Int16x8) SubPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) SubSaturated(y Int16x8) Int16x8
- func (from Int16x8) ToMask() (to Mask16x8)
- func (x Int16x8) TruncateToInt8() Int8x16
- func (x Int16x8) Xor(y Int16x8) Int16x8
- type Int32x16
- func (x Int32x16) Abs() Int32x16
- func (x Int32x16) Add(y Int32x16) Int32x16
- func (x Int32x16) And(y Int32x16) Int32x16
- func (x Int32x16) AndNot(y Int32x16) Int32x16
- func (x Int32x16) AsFloat32x16() Float32x16
- func (x Int32x16) AsFloat64x8() Float64x8
- func (x Int32x16) AsInt16x32() Int16x32
- func (x Int32x16) AsInt64x8() Int64x8
- func (x Int32x16) AsInt8x64() Int8x64
- func (x Int32x16) AsUint16x32() Uint16x32
- func (x Int32x16) AsUint32x16() Uint32x16
- func (x Int32x16) AsUint64x8() Uint64x8
- func (x Int32x16) AsUint8x64() Uint8x64
- func (x Int32x16) Compress(mask Mask32x16) Int32x16
- func (x Int32x16) ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
- func (x Int32x16) ConvertToFloat32() Float32x16
- func (x Int32x16) Equal(y Int32x16) Mask32x16
- func (x Int32x16) Expand(mask Mask32x16) Int32x16
- func (x Int32x16) GetHi() Int32x8
- func (x Int32x16) GetLo() Int32x8
- func (x Int32x16) Greater(y Int32x16) Mask32x16
- func (x Int32x16) GreaterEqual(y Int32x16) Mask32x16
- func (x Int32x16) InterleaveHiGrouped(y Int32x16) Int32x16
- func (x Int32x16) InterleaveLoGrouped(y Int32x16) Int32x16
- func (x Int32x16) LeadingZeros() Int32x16
- func (x Int32x16) Len() int
- func (x Int32x16) Less(y Int32x16) Mask32x16
- func (x Int32x16) LessEqual(y Int32x16) Mask32x16
- func (x Int32x16) Masked(mask Mask32x16) Int32x16
- func (x Int32x16) Max(y Int32x16) Int32x16
- func (x Int32x16) Merge(y Int32x16, mask Mask32x16) Int32x16
- func (x Int32x16) Min(y Int32x16) Int32x16
- func (x Int32x16) Mul(y Int32x16) Int32x16
- func (x Int32x16) Not() Int32x16
- func (x Int32x16) NotEqual(y Int32x16) Mask32x16
- func (x Int32x16) OnesCount() Int32x16
- func (x Int32x16) Or(y Int32x16) Int32x16
- func (x Int32x16) Permute(indices Uint32x16) Int32x16
- func (x Int32x16) PermuteScalarsGrouped(a, b, c, d uint8) Int32x16
- func (x Int32x16) RotateAllLeft(shift uint8) Int32x16
- func (x Int32x16) RotateAllRight(shift uint8) Int32x16
- func (x Int32x16) RotateLeft(y Int32x16) Int32x16
- func (x Int32x16) RotateRight(y Int32x16) Int32x16
- func (x Int32x16) SaturateToInt16() Int16x16
- func (x Int32x16) SaturateToInt16ConcatGrouped(y Int32x16) Int16x32
- func (x Int32x16) SaturateToInt8() Int8x16
- func (x Int32x16) SaturateToUint16ConcatGrouped(y Int32x16) Uint16x32
- func (x Int32x16) SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
- func (x Int32x16) SetHi(y Int32x8) Int32x16
- func (x Int32x16) SetLo(y Int32x8) Int32x16
- func (x Int32x16) ShiftAllLeft(y uint64) Int32x16
- func (x Int32x16) ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftAllRight(y uint64) Int32x16
- func (x Int32x16) ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftLeft(y Int32x16) Int32x16
- func (x Int32x16) ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) ShiftRight(y Int32x16) Int32x16
- func (x Int32x16) ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) Store(y *[16]int32)
- func (x Int32x16) StoreMasked(y *[16]int32, mask Mask32x16)
- func (x Int32x16) StoreSlice(s []int32)
- func (x Int32x16) StoreSlicePart(s []int32)
- func (x Int32x16) String() string
- func (x Int32x16) Sub(y Int32x16) Int32x16
- func (from Int32x16) ToMask() (to Mask32x16)
- func (x Int32x16) TruncateToInt16() Int16x16
- func (x Int32x16) TruncateToInt8() Int8x16
- func (x Int32x16) Xor(y Int32x16) Int32x16
- type Int32x4
- func (x Int32x4) Abs() Int32x4
- func (x Int32x4) Add(y Int32x4) Int32x4
- func (x Int32x4) AddPairs(y Int32x4) Int32x4
- func (x Int32x4) And(y Int32x4) Int32x4
- func (x Int32x4) AndNot(y Int32x4) Int32x4
- func (x Int32x4) AsFloat32x4() Float32x4
- func (x Int32x4) AsFloat64x2() Float64x2
- func (x Int32x4) AsInt16x8() Int16x8
- func (x Int32x4) AsInt64x2() Int64x2
- func (x Int32x4) AsInt8x16() Int8x16
- func (x Int32x4) AsUint16x8() Uint16x8
- func (x Int32x4) AsUint32x4() Uint32x4
- func (x Int32x4) AsUint64x2() Uint64x2
- func (x Int32x4) AsUint8x16() Uint8x16
- func (x Int32x4) Broadcast1To16() Int32x16
- func (x Int32x4) Broadcast1To4() Int32x4
- func (x Int32x4) Broadcast1To8() Int32x8
- func (x Int32x4) Compress(mask Mask32x4) Int32x4
- func (x Int32x4) ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
- func (x Int32x4) ConvertToFloat32() Float32x4
- func (x Int32x4) ConvertToFloat64() Float64x4
- func (x Int32x4) CopySign(y Int32x4) Int32x4
- func (x Int32x4) Equal(y Int32x4) Mask32x4
- func (x Int32x4) Expand(mask Mask32x4) Int32x4
- func (x Int32x4) ExtendLo2ToInt64() Int64x2
- func (x Int32x4) ExtendToInt64() Int64x4
- func (x Int32x4) GetElem(index uint8) int32
- func (x Int32x4) Greater(y Int32x4) Mask32x4
- func (x Int32x4) GreaterEqual(y Int32x4) Mask32x4
- func (x Int32x4) InterleaveHi(y Int32x4) Int32x4
- func (x Int32x4) InterleaveLo(y Int32x4) Int32x4
- func (x Int32x4) IsZero() bool
- func (x Int32x4) LeadingZeros() Int32x4
- func (x Int32x4) Len() int
- func (x Int32x4) Less(y Int32x4) Mask32x4
- func (x Int32x4) LessEqual(y Int32x4) Mask32x4
- func (x Int32x4) Masked(mask Mask32x4) Int32x4
- func (x Int32x4) Max(y Int32x4) Int32x4
- func (x Int32x4) Merge(y Int32x4, mask Mask32x4) Int32x4
- func (x Int32x4) Min(y Int32x4) Int32x4
- func (x Int32x4) Mul(y Int32x4) Int32x4
- func (x Int32x4) MulEvenWiden(y Int32x4) Int64x2
- func (x Int32x4) Not() Int32x4
- func (x Int32x4) NotEqual(y Int32x4) Mask32x4
- func (x Int32x4) OnesCount() Int32x4
- func (x Int32x4) Or(y Int32x4) Int32x4
- func (x Int32x4) PermuteScalars(a, b, c, d uint8) Int32x4
- func (x Int32x4) RotateAllLeft(shift uint8) Int32x4
- func (x Int32x4) RotateAllRight(shift uint8) Int32x4
- func (x Int32x4) RotateLeft(y Int32x4) Int32x4
- func (x Int32x4) RotateRight(y Int32x4) Int32x4
- func (x Int32x4) SaturateToInt16() Int16x8
- func (x Int32x4) SaturateToInt16Concat(y Int32x4) Int16x8
- func (x Int32x4) SaturateToInt8() Int8x16
- func (x Int32x4) SaturateToUint16Concat(y Int32x4) Uint16x8
- func (x Int32x4) SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
- func (x Int32x4) SetElem(index uint8, y int32) Int32x4
- func (x Int32x4) ShiftAllLeft(y uint64) Int32x4
- func (x Int32x4) ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftAllRight(y uint64) Int32x4
- func (x Int32x4) ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftLeft(y Int32x4) Int32x4
- func (x Int32x4) ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) ShiftRight(y Int32x4) Int32x4
- func (x Int32x4) ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) Store(y *[4]int32)
- func (x Int32x4) StoreMasked(y *[4]int32, mask Mask32x4)
- func (x Int32x4) StoreSlice(s []int32)
- func (x Int32x4) StoreSlicePart(s []int32)
- func (x Int32x4) String() string
- func (x Int32x4) Sub(y Int32x4) Int32x4
- func (x Int32x4) SubPairs(y Int32x4) Int32x4
- func (from Int32x4) ToMask() (to Mask32x4)
- func (x Int32x4) TruncateToInt16() Int16x8
- func (x Int32x4) TruncateToInt8() Int8x16
- func (x Int32x4) Xor(y Int32x4) Int32x4
- type Int32x8
- func (x Int32x8) Abs() Int32x8
- func (x Int32x8) Add(y Int32x8) Int32x8
- func (x Int32x8) AddPairsGrouped(y Int32x8) Int32x8
- func (x Int32x8) And(y Int32x8) Int32x8
- func (x Int32x8) AndNot(y Int32x8) Int32x8
- func (x Int32x8) AsFloat32x8() Float32x8
- func (x Int32x8) AsFloat64x4() Float64x4
- func (x Int32x8) AsInt16x16() Int16x16
- func (x Int32x8) AsInt64x4() Int64x4
- func (x Int32x8) AsInt8x32() Int8x32
- func (x Int32x8) AsUint16x16() Uint16x16
- func (x Int32x8) AsUint32x8() Uint32x8
- func (x Int32x8) AsUint64x4() Uint64x4
- func (x Int32x8) AsUint8x32() Uint8x32
- func (x Int32x8) Compress(mask Mask32x8) Int32x8
- func (x Int32x8) ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
- func (x Int32x8) ConvertToFloat32() Float32x8
- func (x Int32x8) ConvertToFloat64() Float64x8
- func (x Int32x8) CopySign(y Int32x8) Int32x8
- func (x Int32x8) Equal(y Int32x8) Mask32x8
- func (x Int32x8) Expand(mask Mask32x8) Int32x8
- func (x Int32x8) ExtendToInt64() Int64x8
- func (x Int32x8) GetHi() Int32x4
- func (x Int32x8) GetLo() Int32x4
- func (x Int32x8) Greater(y Int32x8) Mask32x8
- func (x Int32x8) GreaterEqual(y Int32x8) Mask32x8
- func (x Int32x8) InterleaveHiGrouped(y Int32x8) Int32x8
- func (x Int32x8) InterleaveLoGrouped(y Int32x8) Int32x8
- func (x Int32x8) IsZero() bool
- func (x Int32x8) LeadingZeros() Int32x8
- func (x Int32x8) Len() int
- func (x Int32x8) Less(y Int32x8) Mask32x8
- func (x Int32x8) LessEqual(y Int32x8) Mask32x8
- func (x Int32x8) Masked(mask Mask32x8) Int32x8
- func (x Int32x8) Max(y Int32x8) Int32x8
- func (x Int32x8) Merge(y Int32x8, mask Mask32x8) Int32x8
- func (x Int32x8) Min(y Int32x8) Int32x8
- func (x Int32x8) Mul(y Int32x8) Int32x8
- func (x Int32x8) MulEvenWiden(y Int32x8) Int64x4
- func (x Int32x8) Not() Int32x8
- func (x Int32x8) NotEqual(y Int32x8) Mask32x8
- func (x Int32x8) OnesCount() Int32x8
- func (x Int32x8) Or(y Int32x8) Int32x8
- func (x Int32x8) Permute(indices Uint32x8) Int32x8
- func (x Int32x8) PermuteScalarsGrouped(a, b, c, d uint8) Int32x8
- func (x Int32x8) RotateAllLeft(shift uint8) Int32x8
- func (x Int32x8) RotateAllRight(shift uint8) Int32x8
- func (x Int32x8) RotateLeft(y Int32x8) Int32x8
- func (x Int32x8) RotateRight(y Int32x8) Int32x8
- func (x Int32x8) SaturateToInt16() Int16x8
- func (x Int32x8) SaturateToInt16ConcatGrouped(y Int32x8) Int16x16
- func (x Int32x8) SaturateToInt8() Int8x16
- func (x Int32x8) SaturateToUint16ConcatGrouped(y Int32x8) Uint16x16
- func (x Int32x8) Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
- func (x Int32x8) SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
- func (x Int32x8) SetHi(y Int32x4) Int32x8
- func (x Int32x8) SetLo(y Int32x4) Int32x8
- func (x Int32x8) ShiftAllLeft(y uint64) Int32x8
- func (x Int32x8) ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftAllRight(y uint64) Int32x8
- func (x Int32x8) ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftLeft(y Int32x8) Int32x8
- func (x Int32x8) ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) ShiftRight(y Int32x8) Int32x8
- func (x Int32x8) ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) Store(y *[8]int32)
- func (x Int32x8) StoreMasked(y *[8]int32, mask Mask32x8)
- func (x Int32x8) StoreSlice(s []int32)
- func (x Int32x8) StoreSlicePart(s []int32)
- func (x Int32x8) String() string
- func (x Int32x8) Sub(y Int32x8) Int32x8
- func (x Int32x8) SubPairsGrouped(y Int32x8) Int32x8
- func (from Int32x8) ToMask() (to Mask32x8)
- func (x Int32x8) TruncateToInt16() Int16x8
- func (x Int32x8) TruncateToInt8() Int8x16
- func (x Int32x8) Xor(y Int32x8) Int32x8
- type Int64x2
- func (x Int64x2) Abs() Int64x2
- func (x Int64x2) Add(y Int64x2) Int64x2
- func (x Int64x2) And(y Int64x2) Int64x2
- func (x Int64x2) AndNot(y Int64x2) Int64x2
- func (x Int64x2) AsFloat32x4() Float32x4
- func (x Int64x2) AsFloat64x2() Float64x2
- func (x Int64x2) AsInt16x8() Int16x8
- func (x Int64x2) AsInt32x4() Int32x4
- func (x Int64x2) AsInt8x16() Int8x16
- func (x Int64x2) AsUint16x8() Uint16x8
- func (x Int64x2) AsUint32x4() Uint32x4
- func (x Int64x2) AsUint64x2() Uint64x2
- func (x Int64x2) AsUint8x16() Uint8x16
- func (x Int64x2) Broadcast1To2() Int64x2
- func (x Int64x2) Broadcast1To4() Int64x4
- func (x Int64x2) Broadcast1To8() Int64x8
- func (x Int64x2) Compress(mask Mask64x2) Int64x2
- func (x Int64x2) ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
- func (x Int64x2) ConvertToFloat32() Float32x4
- func (x Int64x2) ConvertToFloat64() Float64x2
- func (x Int64x2) Equal(y Int64x2) Mask64x2
- func (x Int64x2) Expand(mask Mask64x2) Int64x2
- func (x Int64x2) GetElem(index uint8) int64
- func (x Int64x2) Greater(y Int64x2) Mask64x2
- func (x Int64x2) GreaterEqual(y Int64x2) Mask64x2
- func (x Int64x2) InterleaveHi(y Int64x2) Int64x2
- func (x Int64x2) InterleaveLo(y Int64x2) Int64x2
- func (x Int64x2) IsZero() bool
- func (x Int64x2) LeadingZeros() Int64x2
- func (x Int64x2) Len() int
- func (x Int64x2) Less(y Int64x2) Mask64x2
- func (x Int64x2) LessEqual(y Int64x2) Mask64x2
- func (x Int64x2) Masked(mask Mask64x2) Int64x2
- func (x Int64x2) Max(y Int64x2) Int64x2
- func (x Int64x2) Merge(y Int64x2, mask Mask64x2) Int64x2
- func (x Int64x2) Min(y Int64x2) Int64x2
- func (x Int64x2) Mul(y Int64x2) Int64x2
- func (x Int64x2) Not() Int64x2
- func (x Int64x2) NotEqual(y Int64x2) Mask64x2
- func (x Int64x2) OnesCount() Int64x2
- func (x Int64x2) Or(y Int64x2) Int64x2
- func (x Int64x2) RotateAllLeft(shift uint8) Int64x2
- func (x Int64x2) RotateAllRight(shift uint8) Int64x2
- func (x Int64x2) RotateLeft(y Int64x2) Int64x2
- func (x Int64x2) RotateRight(y Int64x2) Int64x2
- func (x Int64x2) SaturateToInt16() Int16x8
- func (x Int64x2) SaturateToInt32() Int32x4
- func (x Int64x2) SaturateToInt8() Int8x16
- func (x Int64x2) SelectFromPair(a, b uint8, y Int64x2) Int64x2
- func (x Int64x2) SetElem(index uint8, y int64) Int64x2
- func (x Int64x2) ShiftAllLeft(y uint64) Int64x2
- func (x Int64x2) ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftAllRight(y uint64) Int64x2
- func (x Int64x2) ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftLeft(y Int64x2) Int64x2
- func (x Int64x2) ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) ShiftRight(y Int64x2) Int64x2
- func (x Int64x2) ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) Store(y *[2]int64)
- func (x Int64x2) StoreMasked(y *[2]int64, mask Mask64x2)
- func (x Int64x2) StoreSlice(s []int64)
- func (x Int64x2) StoreSlicePart(s []int64)
- func (x Int64x2) String() string
- func (x Int64x2) Sub(y Int64x2) Int64x2
- func (from Int64x2) ToMask() (to Mask64x2)
- func (x Int64x2) TruncateToInt16() Int16x8
- func (x Int64x2) TruncateToInt32() Int32x4
- func (x Int64x2) TruncateToInt8() Int8x16
- func (x Int64x2) Xor(y Int64x2) Int64x2
- type Int64x4
- func (x Int64x4) Abs() Int64x4
- func (x Int64x4) Add(y Int64x4) Int64x4
- func (x Int64x4) And(y Int64x4) Int64x4
- func (x Int64x4) AndNot(y Int64x4) Int64x4
- func (x Int64x4) AsFloat32x8() Float32x8
- func (x Int64x4) AsFloat64x4() Float64x4
- func (x Int64x4) AsInt16x16() Int16x16
- func (x Int64x4) AsInt32x8() Int32x8
- func (x Int64x4) AsInt8x32() Int8x32
- func (x Int64x4) AsUint16x16() Uint16x16
- func (x Int64x4) AsUint32x8() Uint32x8
- func (x Int64x4) AsUint64x4() Uint64x4
- func (x Int64x4) AsUint8x32() Uint8x32
- func (x Int64x4) Compress(mask Mask64x4) Int64x4
- func (x Int64x4) ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
- func (x Int64x4) ConvertToFloat32() Float32x4
- func (x Int64x4) ConvertToFloat64() Float64x4
- func (x Int64x4) Equal(y Int64x4) Mask64x4
- func (x Int64x4) Expand(mask Mask64x4) Int64x4
- func (x Int64x4) GetHi() Int64x2
- func (x Int64x4) GetLo() Int64x2
- func (x Int64x4) Greater(y Int64x4) Mask64x4
- func (x Int64x4) GreaterEqual(y Int64x4) Mask64x4
- func (x Int64x4) InterleaveHiGrouped(y Int64x4) Int64x4
- func (x Int64x4) InterleaveLoGrouped(y Int64x4) Int64x4
- func (x Int64x4) IsZero() bool
- func (x Int64x4) LeadingZeros() Int64x4
- func (x Int64x4) Len() int
- func (x Int64x4) Less(y Int64x4) Mask64x4
- func (x Int64x4) LessEqual(y Int64x4) Mask64x4
- func (x Int64x4) Masked(mask Mask64x4) Int64x4
- func (x Int64x4) Max(y Int64x4) Int64x4
- func (x Int64x4) Merge(y Int64x4, mask Mask64x4) Int64x4
- func (x Int64x4) Min(y Int64x4) Int64x4
- func (x Int64x4) Mul(y Int64x4) Int64x4
- func (x Int64x4) Not() Int64x4
- func (x Int64x4) NotEqual(y Int64x4) Mask64x4
- func (x Int64x4) OnesCount() Int64x4
- func (x Int64x4) Or(y Int64x4) Int64x4
- func (x Int64x4) Permute(indices Uint64x4) Int64x4
- func (x Int64x4) RotateAllLeft(shift uint8) Int64x4
- func (x Int64x4) RotateAllRight(shift uint8) Int64x4
- func (x Int64x4) RotateLeft(y Int64x4) Int64x4
- func (x Int64x4) RotateRight(y Int64x4) Int64x4
- func (x Int64x4) SaturateToInt16() Int16x8
- func (x Int64x4) SaturateToInt32() Int32x4
- func (x Int64x4) SaturateToInt8() Int8x16
- func (x Int64x4) Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
- func (x Int64x4) SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
- func (x Int64x4) SetHi(y Int64x2) Int64x4
- func (x Int64x4) SetLo(y Int64x2) Int64x4
- func (x Int64x4) ShiftAllLeft(y uint64) Int64x4
- func (x Int64x4) ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftAllRight(y uint64) Int64x4
- func (x Int64x4) ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftLeft(y Int64x4) Int64x4
- func (x Int64x4) ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) ShiftRight(y Int64x4) Int64x4
- func (x Int64x4) ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) Store(y *[4]int64)
- func (x Int64x4) StoreMasked(y *[4]int64, mask Mask64x4)
- func (x Int64x4) StoreSlice(s []int64)
- func (x Int64x4) StoreSlicePart(s []int64)
- func (x Int64x4) String() string
- func (x Int64x4) Sub(y Int64x4) Int64x4
- func (from Int64x4) ToMask() (to Mask64x4)
- func (x Int64x4) TruncateToInt16() Int16x8
- func (x Int64x4) TruncateToInt32() Int32x4
- func (x Int64x4) TruncateToInt8() Int8x16
- func (x Int64x4) Xor(y Int64x4) Int64x4
- type Int64x8
- func (x Int64x8) Abs() Int64x8
- func (x Int64x8) Add(y Int64x8) Int64x8
- func (x Int64x8) And(y Int64x8) Int64x8
- func (x Int64x8) AndNot(y Int64x8) Int64x8
- func (x Int64x8) AsFloat32x16() Float32x16
- func (x Int64x8) AsFloat64x8() Float64x8
- func (x Int64x8) AsInt16x32() Int16x32
- func (x Int64x8) AsInt32x16() Int32x16
- func (x Int64x8) AsInt8x64() Int8x64
- func (x Int64x8) AsUint16x32() Uint16x32
- func (x Int64x8) AsUint32x16() Uint32x16
- func (x Int64x8) AsUint64x8() Uint64x8
- func (x Int64x8) AsUint8x64() Uint8x64
- func (x Int64x8) Compress(mask Mask64x8) Int64x8
- func (x Int64x8) ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
- func (x Int64x8) ConvertToFloat32() Float32x8
- func (x Int64x8) ConvertToFloat64() Float64x8
- func (x Int64x8) Equal(y Int64x8) Mask64x8
- func (x Int64x8) Expand(mask Mask64x8) Int64x8
- func (x Int64x8) GetHi() Int64x4
- func (x Int64x8) GetLo() Int64x4
- func (x Int64x8) Greater(y Int64x8) Mask64x8
- func (x Int64x8) GreaterEqual(y Int64x8) Mask64x8
- func (x Int64x8) InterleaveHiGrouped(y Int64x8) Int64x8
- func (x Int64x8) InterleaveLoGrouped(y Int64x8) Int64x8
- func (x Int64x8) LeadingZeros() Int64x8
- func (x Int64x8) Len() int
- func (x Int64x8) Less(y Int64x8) Mask64x8
- func (x Int64x8) LessEqual(y Int64x8) Mask64x8
- func (x Int64x8) Masked(mask Mask64x8) Int64x8
- func (x Int64x8) Max(y Int64x8) Int64x8
- func (x Int64x8) Merge(y Int64x8, mask Mask64x8) Int64x8
- func (x Int64x8) Min(y Int64x8) Int64x8
- func (x Int64x8) Mul(y Int64x8) Int64x8
- func (x Int64x8) Not() Int64x8
- func (x Int64x8) NotEqual(y Int64x8) Mask64x8
- func (x Int64x8) OnesCount() Int64x8
- func (x Int64x8) Or(y Int64x8) Int64x8
- func (x Int64x8) Permute(indices Uint64x8) Int64x8
- func (x Int64x8) RotateAllLeft(shift uint8) Int64x8
- func (x Int64x8) RotateAllRight(shift uint8) Int64x8
- func (x Int64x8) RotateLeft(y Int64x8) Int64x8
- func (x Int64x8) RotateRight(y Int64x8) Int64x8
- func (x Int64x8) SaturateToInt16() Int16x8
- func (x Int64x8) SaturateToInt32() Int32x8
- func (x Int64x8) SaturateToInt8() Int8x16
- func (x Int64x8) SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
- func (x Int64x8) SetHi(y Int64x4) Int64x8
- func (x Int64x8) SetLo(y Int64x4) Int64x8
- func (x Int64x8) ShiftAllLeft(y uint64) Int64x8
- func (x Int64x8) ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftAllRight(y uint64) Int64x8
- func (x Int64x8) ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftLeft(y Int64x8) Int64x8
- func (x Int64x8) ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) ShiftRight(y Int64x8) Int64x8
- func (x Int64x8) ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) Store(y *[8]int64)
- func (x Int64x8) StoreMasked(y *[8]int64, mask Mask64x8)
- func (x Int64x8) StoreSlice(s []int64)
- func (x Int64x8) StoreSlicePart(s []int64)
- func (x Int64x8) String() string
- func (x Int64x8) Sub(y Int64x8) Int64x8
- func (from Int64x8) ToMask() (to Mask64x8)
- func (x Int64x8) TruncateToInt16() Int16x8
- func (x Int64x8) TruncateToInt32() Int32x8
- func (x Int64x8) TruncateToInt8() Int8x16
- func (x Int64x8) Xor(y Int64x8) Int64x8
- type Int8x16
- func (x Int8x16) Abs() Int8x16
- func (x Int8x16) Add(y Int8x16) Int8x16
- func (x Int8x16) AddSaturated(y Int8x16) Int8x16
- func (x Int8x16) And(y Int8x16) Int8x16
- func (x Int8x16) AndNot(y Int8x16) Int8x16
- func (x Int8x16) AsFloat32x4() Float32x4
- func (x Int8x16) AsFloat64x2() Float64x2
- func (x Int8x16) AsInt16x8() Int16x8
- func (x Int8x16) AsInt32x4() Int32x4
- func (x Int8x16) AsInt64x2() Int64x2
- func (x Int8x16) AsUint16x8() Uint16x8
- func (x Int8x16) AsUint32x4() Uint32x4
- func (x Int8x16) AsUint64x2() Uint64x2
- func (x Int8x16) AsUint8x16() Uint8x16
- func (x Int8x16) Broadcast1To16() Int8x16
- func (x Int8x16) Broadcast1To32() Int8x32
- func (x Int8x16) Broadcast1To64() Int8x64
- func (x Int8x16) Compress(mask Mask8x16) Int8x16
- func (x Int8x16) ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
- func (x Int8x16) CopySign(y Int8x16) Int8x16
- func (x Int8x16) Equal(y Int8x16) Mask8x16
- func (x Int8x16) Expand(mask Mask8x16) Int8x16
- func (x Int8x16) ExtendLo2ToInt64() Int64x2
- func (x Int8x16) ExtendLo4ToInt32() Int32x4
- func (x Int8x16) ExtendLo4ToInt64() Int64x4
- func (x Int8x16) ExtendLo8ToInt16() Int16x8
- func (x Int8x16) ExtendLo8ToInt32() Int32x8
- func (x Int8x16) ExtendLo8ToInt64() Int64x8
- func (x Int8x16) ExtendToInt16() Int16x16
- func (x Int8x16) ExtendToInt32() Int32x16
- func (x Int8x16) GetElem(index uint8) int8
- func (x Int8x16) Greater(y Int8x16) Mask8x16
- func (x Int8x16) GreaterEqual(y Int8x16) Mask8x16
- func (x Int8x16) IsZero() bool
- func (x Int8x16) Len() int
- func (x Int8x16) Less(y Int8x16) Mask8x16
- func (x Int8x16) LessEqual(y Int8x16) Mask8x16
- func (x Int8x16) Masked(mask Mask8x16) Int8x16
- func (x Int8x16) Max(y Int8x16) Int8x16
- func (x Int8x16) Merge(y Int8x16, mask Mask8x16) Int8x16
- func (x Int8x16) Min(y Int8x16) Int8x16
- func (x Int8x16) Not() Int8x16
- func (x Int8x16) NotEqual(y Int8x16) Mask8x16
- func (x Int8x16) OnesCount() Int8x16
- func (x Int8x16) Or(y Int8x16) Int8x16
- func (x Int8x16) Permute(indices Uint8x16) Int8x16
- func (x Int8x16) PermuteOrZero(indices Int8x16) Int8x16
- func (x Int8x16) SetElem(index uint8, y int8) Int8x16
- func (x Int8x16) Store(y *[16]int8)
- func (x Int8x16) StoreSlice(s []int8)
- func (x Int8x16) StoreSlicePart(s []int8)
- func (x Int8x16) String() string
- func (x Int8x16) Sub(y Int8x16) Int8x16
- func (x Int8x16) SubSaturated(y Int8x16) Int8x16
- func (from Int8x16) ToMask() (to Mask8x16)
- func (x Int8x16) Xor(y Int8x16) Int8x16
- type Int8x32
- func (x Int8x32) Abs() Int8x32
- func (x Int8x32) Add(y Int8x32) Int8x32
- func (x Int8x32) AddSaturated(y Int8x32) Int8x32
- func (x Int8x32) And(y Int8x32) Int8x32
- func (x Int8x32) AndNot(y Int8x32) Int8x32
- func (x Int8x32) AsFloat32x8() Float32x8
- func (x Int8x32) AsFloat64x4() Float64x4
- func (x Int8x32) AsInt16x16() Int16x16
- func (x Int8x32) AsInt32x8() Int32x8
- func (x Int8x32) AsInt64x4() Int64x4
- func (x Int8x32) AsUint16x16() Uint16x16
- func (x Int8x32) AsUint32x8() Uint32x8
- func (x Int8x32) AsUint64x4() Uint64x4
- func (x Int8x32) AsUint8x32() Uint8x32
- func (x Int8x32) Compress(mask Mask8x32) Int8x32
- func (x Int8x32) ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
- func (x Int8x32) CopySign(y Int8x32) Int8x32
- func (x Int8x32) Equal(y Int8x32) Mask8x32
- func (x Int8x32) Expand(mask Mask8x32) Int8x32
- func (x Int8x32) ExtendToInt16() Int16x32
- func (x Int8x32) GetHi() Int8x16
- func (x Int8x32) GetLo() Int8x16
- func (x Int8x32) Greater(y Int8x32) Mask8x32
- func (x Int8x32) GreaterEqual(y Int8x32) Mask8x32
- func (x Int8x32) IsZero() bool
- func (x Int8x32) Len() int
- func (x Int8x32) Less(y Int8x32) Mask8x32
- func (x Int8x32) LessEqual(y Int8x32) Mask8x32
- func (x Int8x32) Masked(mask Mask8x32) Int8x32
- func (x Int8x32) Max(y Int8x32) Int8x32
- func (x Int8x32) Merge(y Int8x32, mask Mask8x32) Int8x32
- func (x Int8x32) Min(y Int8x32) Int8x32
- func (x Int8x32) Not() Int8x32
- func (x Int8x32) NotEqual(y Int8x32) Mask8x32
- func (x Int8x32) OnesCount() Int8x32
- func (x Int8x32) Or(y Int8x32) Int8x32
- func (x Int8x32) Permute(indices Uint8x32) Int8x32
- func (x Int8x32) PermuteOrZeroGrouped(indices Int8x32) Int8x32
- func (x Int8x32) Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
- func (x Int8x32) SetHi(y Int8x16) Int8x32
- func (x Int8x32) SetLo(y Int8x16) Int8x32
- func (x Int8x32) Store(y *[32]int8)
- func (x Int8x32) StoreSlice(s []int8)
- func (x Int8x32) StoreSlicePart(s []int8)
- func (x Int8x32) String() string
- func (x Int8x32) Sub(y Int8x32) Int8x32
- func (x Int8x32) SubSaturated(y Int8x32) Int8x32
- func (from Int8x32) ToMask() (to Mask8x32)
- func (x Int8x32) Xor(y Int8x32) Int8x32
- type Int8x64
- func (x Int8x64) Abs() Int8x64
- func (x Int8x64) Add(y Int8x64) Int8x64
- func (x Int8x64) AddSaturated(y Int8x64) Int8x64
- func (x Int8x64) And(y Int8x64) Int8x64
- func (x Int8x64) AndNot(y Int8x64) Int8x64
- func (x Int8x64) AsFloat32x16() Float32x16
- func (x Int8x64) AsFloat64x8() Float64x8
- func (x Int8x64) AsInt16x32() Int16x32
- func (x Int8x64) AsInt32x16() Int32x16
- func (x Int8x64) AsInt64x8() Int64x8
- func (x Int8x64) AsUint16x32() Uint16x32
- func (x Int8x64) AsUint32x16() Uint32x16
- func (x Int8x64) AsUint64x8() Uint64x8
- func (x Int8x64) AsUint8x64() Uint8x64
- func (x Int8x64) Compress(mask Mask8x64) Int8x64
- func (x Int8x64) ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
- func (x Int8x64) Equal(y Int8x64) Mask8x64
- func (x Int8x64) Expand(mask Mask8x64) Int8x64
- func (x Int8x64) GetHi() Int8x32
- func (x Int8x64) GetLo() Int8x32
- func (x Int8x64) Greater(y Int8x64) Mask8x64
- func (x Int8x64) GreaterEqual(y Int8x64) Mask8x64
- func (x Int8x64) Len() int
- func (x Int8x64) Less(y Int8x64) Mask8x64
- func (x Int8x64) LessEqual(y Int8x64) Mask8x64
- func (x Int8x64) Masked(mask Mask8x64) Int8x64
- func (x Int8x64) Max(y Int8x64) Int8x64
- func (x Int8x64) Merge(y Int8x64, mask Mask8x64) Int8x64
- func (x Int8x64) Min(y Int8x64) Int8x64
- func (x Int8x64) Not() Int8x64
- func (x Int8x64) NotEqual(y Int8x64) Mask8x64
- func (x Int8x64) OnesCount() Int8x64
- func (x Int8x64) Or(y Int8x64) Int8x64
- func (x Int8x64) Permute(indices Uint8x64) Int8x64
- func (x Int8x64) PermuteOrZeroGrouped(indices Int8x64) Int8x64
- func (x Int8x64) SetHi(y Int8x32) Int8x64
- func (x Int8x64) SetLo(y Int8x32) Int8x64
- func (x Int8x64) Store(y *[64]int8)
- func (x Int8x64) StoreMasked(y *[64]int8, mask Mask8x64)
- func (x Int8x64) StoreSlice(s []int8)
- func (x Int8x64) StoreSlicePart(s []int8)
- func (x Int8x64) String() string
- func (x Int8x64) Sub(y Int8x64) Int8x64
- func (x Int8x64) SubSaturated(y Int8x64) Int8x64
- func (from Int8x64) ToMask() (to Mask8x64)
- func (x Int8x64) Xor(y Int8x64) Int8x64
- type Mask16x16
- type Mask16x32
- type Mask16x8
- type Mask32x16
- type Mask32x4
- type Mask32x8
- type Mask64x2
- type Mask64x4
- type Mask64x8
- type Mask8x16
- type Mask8x32
- type Mask8x64
- type Uint16x16
- func (x Uint16x16) Add(y Uint16x16) Uint16x16
- func (x Uint16x16) AddPairsGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) AddSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) And(y Uint16x16) Uint16x16
- func (x Uint16x16) AndNot(y Uint16x16) Uint16x16
- func (x Uint16x16) AsFloat32x8() Float32x8
- func (x Uint16x16) AsFloat64x4() Float64x4
- func (x Uint16x16) AsInt16x16() Int16x16
- func (x Uint16x16) AsInt32x8() Int32x8
- func (x Uint16x16) AsInt64x4() Int64x4
- func (x Uint16x16) AsInt8x32() Int8x32
- func (x Uint16x16) AsUint32x8() Uint32x8
- func (x Uint16x16) AsUint64x4() Uint64x4
- func (x Uint16x16) AsUint8x32() Uint8x32
- func (x Uint16x16) Average(y Uint16x16) Uint16x16
- func (x Uint16x16) Compress(mask Mask16x16) Uint16x16
- func (x Uint16x16) ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
- func (x Uint16x16) Equal(y Uint16x16) Mask16x16
- func (x Uint16x16) Expand(mask Mask16x16) Uint16x16
- func (x Uint16x16) ExtendToUint32() Uint32x16
- func (x Uint16x16) GetHi() Uint16x8
- func (x Uint16x16) GetLo() Uint16x8
- func (x Uint16x16) Greater(y Uint16x16) Mask16x16
- func (x Uint16x16) GreaterEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) InterleaveHiGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) InterleaveLoGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) IsZero() bool
- func (x Uint16x16) Len() int
- func (x Uint16x16) Less(y Uint16x16) Mask16x16
- func (x Uint16x16) LessEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) Masked(mask Mask16x16) Uint16x16
- func (x Uint16x16) Max(y Uint16x16) Uint16x16
- func (x Uint16x16) Merge(y Uint16x16, mask Mask16x16) Uint16x16
- func (x Uint16x16) Min(y Uint16x16) Uint16x16
- func (x Uint16x16) Mul(y Uint16x16) Uint16x16
- func (x Uint16x16) MulHigh(y Uint16x16) Uint16x16
- func (x Uint16x16) Not() Uint16x16
- func (x Uint16x16) NotEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) OnesCount() Uint16x16
- func (x Uint16x16) Or(y Uint16x16) Uint16x16
- func (x Uint16x16) Permute(indices Uint16x16) Uint16x16
- func (x Uint16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) SaturateToUint8() Uint8x16
- func (x Uint16x16) Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) SetHi(y Uint16x8) Uint16x16
- func (x Uint16x16) SetLo(y Uint16x8) Uint16x16
- func (x Uint16x16) ShiftAllLeft(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftAllRight(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeft(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRight(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) Store(y *[16]uint16)
- func (x Uint16x16) StoreSlice(s []uint16)
- func (x Uint16x16) StoreSlicePart(s []uint16)
- func (x Uint16x16) String() string
- func (x Uint16x16) Sub(y Uint16x16) Uint16x16
- func (x Uint16x16) SubPairsGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) SubSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) TruncateToUint8() Uint8x16
- func (x Uint16x16) Xor(y Uint16x16) Uint16x16
- type Uint16x32
- func (x Uint16x32) Add(y Uint16x32) Uint16x32
- func (x Uint16x32) AddSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) And(y Uint16x32) Uint16x32
- func (x Uint16x32) AndNot(y Uint16x32) Uint16x32
- func (x Uint16x32) AsFloat32x16() Float32x16
- func (x Uint16x32) AsFloat64x8() Float64x8
- func (x Uint16x32) AsInt16x32() Int16x32
- func (x Uint16x32) AsInt32x16() Int32x16
- func (x Uint16x32) AsInt64x8() Int64x8
- func (x Uint16x32) AsInt8x64() Int8x64
- func (x Uint16x32) AsUint32x16() Uint32x16
- func (x Uint16x32) AsUint64x8() Uint64x8
- func (x Uint16x32) AsUint8x64() Uint8x64
- func (x Uint16x32) Average(y Uint16x32) Uint16x32
- func (x Uint16x32) Compress(mask Mask16x32) Uint16x32
- func (x Uint16x32) ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
- func (x Uint16x32) Equal(y Uint16x32) Mask16x32
- func (x Uint16x32) Expand(mask Mask16x32) Uint16x32
- func (x Uint16x32) GetHi() Uint16x16
- func (x Uint16x32) GetLo() Uint16x16
- func (x Uint16x32) Greater(y Uint16x32) Mask16x32
- func (x Uint16x32) GreaterEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) InterleaveHiGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) InterleaveLoGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) Len() int
- func (x Uint16x32) Less(y Uint16x32) Mask16x32
- func (x Uint16x32) LessEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) Masked(mask Mask16x32) Uint16x32
- func (x Uint16x32) Max(y Uint16x32) Uint16x32
- func (x Uint16x32) Merge(y Uint16x32, mask Mask16x32) Uint16x32
- func (x Uint16x32) Min(y Uint16x32) Uint16x32
- func (x Uint16x32) Mul(y Uint16x32) Uint16x32
- func (x Uint16x32) MulHigh(y Uint16x32) Uint16x32
- func (x Uint16x32) Not() Uint16x32
- func (x Uint16x32) NotEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) OnesCount() Uint16x32
- func (x Uint16x32) Or(y Uint16x32) Uint16x32
- func (x Uint16x32) Permute(indices Uint16x32) Uint16x32
- func (x Uint16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) SaturateToUint8() Uint8x32
- func (x Uint16x32) SetHi(y Uint16x16) Uint16x32
- func (x Uint16x32) SetLo(y Uint16x16) Uint16x32
- func (x Uint16x32) ShiftAllLeft(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftAllRight(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeft(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRight(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) Store(y *[32]uint16)
- func (x Uint16x32) StoreMasked(y *[32]uint16, mask Mask16x32)
- func (x Uint16x32) StoreSlice(s []uint16)
- func (x Uint16x32) StoreSlicePart(s []uint16)
- func (x Uint16x32) String() string
- func (x Uint16x32) Sub(y Uint16x32) Uint16x32
- func (x Uint16x32) SubSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) TruncateToUint8() Uint8x32
- func (x Uint16x32) Xor(y Uint16x32) Uint16x32
- type Uint16x8
- func (x Uint16x8) Add(y Uint16x8) Uint16x8
- func (x Uint16x8) AddPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) AddSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) And(y Uint16x8) Uint16x8
- func (x Uint16x8) AndNot(y Uint16x8) Uint16x8
- func (x Uint16x8) AsFloat32x4() Float32x4
- func (x Uint16x8) AsFloat64x2() Float64x2
- func (x Uint16x8) AsInt16x8() Int16x8
- func (x Uint16x8) AsInt32x4() Int32x4
- func (x Uint16x8) AsInt64x2() Int64x2
- func (x Uint16x8) AsInt8x16() Int8x16
- func (x Uint16x8) AsUint32x4() Uint32x4
- func (x Uint16x8) AsUint64x2() Uint64x2
- func (x Uint16x8) AsUint8x16() Uint8x16
- func (x Uint16x8) Average(y Uint16x8) Uint16x8
- func (x Uint16x8) Broadcast1To16() Uint16x16
- func (x Uint16x8) Broadcast1To32() Uint16x32
- func (x Uint16x8) Broadcast1To8() Uint16x8
- func (x Uint16x8) Compress(mask Mask16x8) Uint16x8
- func (x Uint16x8) ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
- func (x Uint16x8) Equal(y Uint16x8) Mask16x8
- func (x Uint16x8) Expand(mask Mask16x8) Uint16x8
- func (x Uint16x8) ExtendLo2ToUint64() Uint64x2
- func (x Uint16x8) ExtendLo4ToUint32() Uint32x4
- func (x Uint16x8) ExtendLo4ToUint64() Uint64x4
- func (x Uint16x8) ExtendToUint32() Uint32x8
- func (x Uint16x8) ExtendToUint64() Uint64x8
- func (x Uint16x8) GetElem(index uint8) uint16
- func (x Uint16x8) Greater(y Uint16x8) Mask16x8
- func (x Uint16x8) GreaterEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) InterleaveHi(y Uint16x8) Uint16x8
- func (x Uint16x8) InterleaveLo(y Uint16x8) Uint16x8
- func (x Uint16x8) IsZero() bool
- func (x Uint16x8) Len() int
- func (x Uint16x8) Less(y Uint16x8) Mask16x8
- func (x Uint16x8) LessEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) Masked(mask Mask16x8) Uint16x8
- func (x Uint16x8) Max(y Uint16x8) Uint16x8
- func (x Uint16x8) Merge(y Uint16x8, mask Mask16x8) Uint16x8
- func (x Uint16x8) Min(y Uint16x8) Uint16x8
- func (x Uint16x8) Mul(y Uint16x8) Uint16x8
- func (x Uint16x8) MulHigh(y Uint16x8) Uint16x8
- func (x Uint16x8) Not() Uint16x8
- func (x Uint16x8) NotEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) OnesCount() Uint16x8
- func (x Uint16x8) Or(y Uint16x8) Uint16x8
- func (x Uint16x8) Permute(indices Uint16x8) Uint16x8
- func (x Uint16x8) PermuteScalarsHi(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) PermuteScalarsLo(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) SaturateToUint8() Uint8x16
- func (x Uint16x8) SetElem(index uint8, y uint16) Uint16x8
- func (x Uint16x8) ShiftAllLeft(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftAllRight(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeft(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRight(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) Store(y *[8]uint16)
- func (x Uint16x8) StoreSlice(s []uint16)
- func (x Uint16x8) StoreSlicePart(s []uint16)
- func (x Uint16x8) String() string
- func (x Uint16x8) Sub(y Uint16x8) Uint16x8
- func (x Uint16x8) SubPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) SubSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) TruncateToUint8() Uint8x16
- func (x Uint16x8) Xor(y Uint16x8) Uint16x8
- type Uint32x16
- func (x Uint32x16) Add(y Uint32x16) Uint32x16
- func (x Uint32x16) And(y Uint32x16) Uint32x16
- func (x Uint32x16) AndNot(y Uint32x16) Uint32x16
- func (x Uint32x16) AsFloat32x16() Float32x16
- func (x Uint32x16) AsFloat64x8() Float64x8
- func (x Uint32x16) AsInt16x32() Int16x32
- func (x Uint32x16) AsInt32x16() Int32x16
- func (x Uint32x16) AsInt64x8() Int64x8
- func (x Uint32x16) AsInt8x64() Int8x64
- func (x Uint32x16) AsUint16x32() Uint16x32
- func (x Uint32x16) AsUint64x8() Uint64x8
- func (x Uint32x16) AsUint8x64() Uint8x64
- func (x Uint32x16) Compress(mask Mask32x16) Uint32x16
- func (x Uint32x16) ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
- func (x Uint32x16) ConvertToFloat32() Float32x16
- func (x Uint32x16) Equal(y Uint32x16) Mask32x16
- func (x Uint32x16) Expand(mask Mask32x16) Uint32x16
- func (x Uint32x16) GetHi() Uint32x8
- func (x Uint32x16) GetLo() Uint32x8
- func (x Uint32x16) Greater(y Uint32x16) Mask32x16
- func (x Uint32x16) GreaterEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) InterleaveHiGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) InterleaveLoGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) LeadingZeros() Uint32x16
- func (x Uint32x16) Len() int
- func (x Uint32x16) Less(y Uint32x16) Mask32x16
- func (x Uint32x16) LessEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) Masked(mask Mask32x16) Uint32x16
- func (x Uint32x16) Max(y Uint32x16) Uint32x16
- func (x Uint32x16) Merge(y Uint32x16, mask Mask32x16) Uint32x16
- func (x Uint32x16) Min(y Uint32x16) Uint32x16
- func (x Uint32x16) Mul(y Uint32x16) Uint32x16
- func (x Uint32x16) Not() Uint32x16
- func (x Uint32x16) NotEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) OnesCount() Uint32x16
- func (x Uint32x16) Or(y Uint32x16) Uint32x16
- func (x Uint32x16) Permute(indices Uint32x16) Uint32x16
- func (x Uint32x16) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16
- func (x Uint32x16) RotateAllLeft(shift uint8) Uint32x16
- func (x Uint32x16) RotateAllRight(shift uint8) Uint32x16
- func (x Uint32x16) RotateLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) RotateRight(y Uint32x16) Uint32x16
- func (x Uint32x16) SaturateToUint16() Uint16x16
- func (x Uint32x16) SaturateToUint8() Uint8x16
- func (x Uint32x16) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) SetHi(y Uint32x8) Uint32x16
- func (x Uint32x16) SetLo(y Uint32x8) Uint32x16
- func (x Uint32x16) ShiftAllLeft(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftAllRight(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRight(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) Store(y *[16]uint32)
- func (x Uint32x16) StoreMasked(y *[16]uint32, mask Mask32x16)
- func (x Uint32x16) StoreSlice(s []uint32)
- func (x Uint32x16) StoreSlicePart(s []uint32)
- func (x Uint32x16) String() string
- func (x Uint32x16) Sub(y Uint32x16) Uint32x16
- func (x Uint32x16) TruncateToUint16() Uint16x16
- func (x Uint32x16) TruncateToUint8() Uint8x16
- func (x Uint32x16) Xor(y Uint32x16) Uint32x16
- type Uint32x4
- func (x Uint32x4) AESInvMixColumns() Uint32x4
- func (x Uint32x4) AESRoundKeyGenAssist(rconVal uint8) Uint32x4
- func (x Uint32x4) Add(y Uint32x4) Uint32x4
- func (x Uint32x4) AddPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) And(y Uint32x4) Uint32x4
- func (x Uint32x4) AndNot(y Uint32x4) Uint32x4
- func (x Uint32x4) AsFloat32x4() Float32x4
- func (x Uint32x4) AsFloat64x2() Float64x2
- func (x Uint32x4) AsInt16x8() Int16x8
- func (x Uint32x4) AsInt32x4() Int32x4
- func (x Uint32x4) AsInt64x2() Int64x2
- func (x Uint32x4) AsInt8x16() Int8x16
- func (x Uint32x4) AsUint16x8() Uint16x8
- func (x Uint32x4) AsUint64x2() Uint64x2
- func (x Uint32x4) AsUint8x16() Uint8x16
- func (x Uint32x4) Broadcast1To16() Uint32x16
- func (x Uint32x4) Broadcast1To4() Uint32x4
- func (x Uint32x4) Broadcast1To8() Uint32x8
- func (x Uint32x4) Compress(mask Mask32x4) Uint32x4
- func (x Uint32x4) ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
- func (x Uint32x4) ConvertToFloat32() Float32x4
- func (x Uint32x4) ConvertToFloat64() Float64x4
- func (x Uint32x4) Equal(y Uint32x4) Mask32x4
- func (x Uint32x4) Expand(mask Mask32x4) Uint32x4
- func (x Uint32x4) ExtendLo2ToUint64() Uint64x2
- func (x Uint32x4) ExtendToUint64() Uint64x4
- func (x Uint32x4) GetElem(index uint8) uint32
- func (x Uint32x4) Greater(y Uint32x4) Mask32x4
- func (x Uint32x4) GreaterEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) InterleaveHi(y Uint32x4) Uint32x4
- func (x Uint32x4) InterleaveLo(y Uint32x4) Uint32x4
- func (x Uint32x4) IsZero() bool
- func (x Uint32x4) LeadingZeros() Uint32x4
- func (x Uint32x4) Len() int
- func (x Uint32x4) Less(y Uint32x4) Mask32x4
- func (x Uint32x4) LessEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) Masked(mask Mask32x4) Uint32x4
- func (x Uint32x4) Max(y Uint32x4) Uint32x4
- func (x Uint32x4) Merge(y Uint32x4, mask Mask32x4) Uint32x4
- func (x Uint32x4) Min(y Uint32x4) Uint32x4
- func (x Uint32x4) Mul(y Uint32x4) Uint32x4
- func (x Uint32x4) MulEvenWiden(y Uint32x4) Uint64x2
- func (x Uint32x4) Not() Uint32x4
- func (x Uint32x4) NotEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) OnesCount() Uint32x4
- func (x Uint32x4) Or(y Uint32x4) Uint32x4
- func (x Uint32x4) PermuteScalars(a, b, c, d uint8) Uint32x4
- func (x Uint32x4) RotateAllLeft(shift uint8) Uint32x4
- func (x Uint32x4) RotateAllRight(shift uint8) Uint32x4
- func (x Uint32x4) RotateLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) RotateRight(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1NextE(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) SaturateToUint16() Uint16x8
- func (x Uint32x4) SaturateToUint8() Uint8x16
- func (x Uint32x4) SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SetElem(index uint8, y uint32) Uint32x4
- func (x Uint32x4) ShiftAllLeft(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftAllRight(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRight(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) Store(y *[4]uint32)
- func (x Uint32x4) StoreMasked(y *[4]uint32, mask Mask32x4)
- func (x Uint32x4) StoreSlice(s []uint32)
- func (x Uint32x4) StoreSlicePart(s []uint32)
- func (x Uint32x4) String() string
- func (x Uint32x4) Sub(y Uint32x4) Uint32x4
- func (x Uint32x4) SubPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) TruncateToUint16() Uint16x8
- func (x Uint32x4) TruncateToUint8() Uint8x16
- func (x Uint32x4) Xor(y Uint32x4) Uint32x4
- type Uint32x8
- func (x Uint32x8) Add(y Uint32x8) Uint32x8
- func (x Uint32x8) AddPairsGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) And(y Uint32x8) Uint32x8
- func (x Uint32x8) AndNot(y Uint32x8) Uint32x8
- func (x Uint32x8) AsFloat32x8() Float32x8
- func (x Uint32x8) AsFloat64x4() Float64x4
- func (x Uint32x8) AsInt16x16() Int16x16
- func (x Uint32x8) AsInt32x8() Int32x8
- func (x Uint32x8) AsInt64x4() Int64x4
- func (x Uint32x8) AsInt8x32() Int8x32
- func (x Uint32x8) AsUint16x16() Uint16x16
- func (x Uint32x8) AsUint64x4() Uint64x4
- func (x Uint32x8) AsUint8x32() Uint8x32
- func (x Uint32x8) Compress(mask Mask32x8) Uint32x8
- func (x Uint32x8) ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
- func (x Uint32x8) ConvertToFloat32() Float32x8
- func (x Uint32x8) ConvertToFloat64() Float64x8
- func (x Uint32x8) Equal(y Uint32x8) Mask32x8
- func (x Uint32x8) Expand(mask Mask32x8) Uint32x8
- func (x Uint32x8) ExtendToUint64() Uint64x8
- func (x Uint32x8) GetHi() Uint32x4
- func (x Uint32x8) GetLo() Uint32x4
- func (x Uint32x8) Greater(y Uint32x8) Mask32x8
- func (x Uint32x8) GreaterEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) InterleaveHiGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) InterleaveLoGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) IsZero() bool
- func (x Uint32x8) LeadingZeros() Uint32x8
- func (x Uint32x8) Len() int
- func (x Uint32x8) Less(y Uint32x8) Mask32x8
- func (x Uint32x8) LessEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) Masked(mask Mask32x8) Uint32x8
- func (x Uint32x8) Max(y Uint32x8) Uint32x8
- func (x Uint32x8) Merge(y Uint32x8, mask Mask32x8) Uint32x8
- func (x Uint32x8) Min(y Uint32x8) Uint32x8
- func (x Uint32x8) Mul(y Uint32x8) Uint32x8
- func (x Uint32x8) MulEvenWiden(y Uint32x8) Uint64x4
- func (x Uint32x8) Not() Uint32x8
- func (x Uint32x8) NotEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) OnesCount() Uint32x8
- func (x Uint32x8) Or(y Uint32x8) Uint32x8
- func (x Uint32x8) Permute(indices Uint32x8) Uint32x8
- func (x Uint32x8) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8
- func (x Uint32x8) RotateAllLeft(shift uint8) Uint32x8
- func (x Uint32x8) RotateAllRight(shift uint8) Uint32x8
- func (x Uint32x8) RotateLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) RotateRight(y Uint32x8) Uint32x8
- func (x Uint32x8) SaturateToUint16() Uint16x8
- func (x Uint32x8) SaturateToUint8() Uint8x16
- func (x Uint32x8) Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SetHi(y Uint32x4) Uint32x8
- func (x Uint32x8) SetLo(y Uint32x4) Uint32x8
- func (x Uint32x8) ShiftAllLeft(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftAllRight(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRight(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) Store(y *[8]uint32)
- func (x Uint32x8) StoreMasked(y *[8]uint32, mask Mask32x8)
- func (x Uint32x8) StoreSlice(s []uint32)
- func (x Uint32x8) StoreSlicePart(s []uint32)
- func (x Uint32x8) String() string
- func (x Uint32x8) Sub(y Uint32x8) Uint32x8
- func (x Uint32x8) SubPairsGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) TruncateToUint16() Uint16x8
- func (x Uint32x8) TruncateToUint8() Uint8x16
- func (x Uint32x8) Xor(y Uint32x8) Uint32x8
- type Uint64x2
- func (x Uint64x2) Add(y Uint64x2) Uint64x2
- func (x Uint64x2) And(y Uint64x2) Uint64x2
- func (x Uint64x2) AndNot(y Uint64x2) Uint64x2
- func (x Uint64x2) AsFloat32x4() Float32x4
- func (x Uint64x2) AsFloat64x2() Float64x2
- func (x Uint64x2) AsInt16x8() Int16x8
- func (x Uint64x2) AsInt32x4() Int32x4
- func (x Uint64x2) AsInt64x2() Int64x2
- func (x Uint64x2) AsInt8x16() Int8x16
- func (x Uint64x2) AsUint16x8() Uint16x8
- func (x Uint64x2) AsUint32x4() Uint32x4
- func (x Uint64x2) AsUint8x16() Uint8x16
- func (x Uint64x2) Broadcast1To2() Uint64x2
- func (x Uint64x2) Broadcast1To4() Uint64x4
- func (x Uint64x2) Broadcast1To8() Uint64x8
- func (x Uint64x2) CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) Compress(mask Mask64x2) Uint64x2
- func (x Uint64x2) ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
- func (x Uint64x2) ConvertToFloat32() Float32x4
- func (x Uint64x2) ConvertToFloat64() Float64x2
- func (x Uint64x2) Equal(y Uint64x2) Mask64x2
- func (x Uint64x2) Expand(mask Mask64x2) Uint64x2
- func (x Uint64x2) GetElem(index uint8) uint64
- func (x Uint64x2) Greater(y Uint64x2) Mask64x2
- func (x Uint64x2) GreaterEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) InterleaveHi(y Uint64x2) Uint64x2
- func (x Uint64x2) InterleaveLo(y Uint64x2) Uint64x2
- func (x Uint64x2) IsZero() bool
- func (x Uint64x2) LeadingZeros() Uint64x2
- func (x Uint64x2) Len() int
- func (x Uint64x2) Less(y Uint64x2) Mask64x2
- func (x Uint64x2) LessEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) Masked(mask Mask64x2) Uint64x2
- func (x Uint64x2) Max(y Uint64x2) Uint64x2
- func (x Uint64x2) Merge(y Uint64x2, mask Mask64x2) Uint64x2
- func (x Uint64x2) Min(y Uint64x2) Uint64x2
- func (x Uint64x2) Mul(y Uint64x2) Uint64x2
- func (x Uint64x2) Not() Uint64x2
- func (x Uint64x2) NotEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) OnesCount() Uint64x2
- func (x Uint64x2) Or(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateAllLeft(shift uint8) Uint64x2
- func (x Uint64x2) RotateAllRight(shift uint8) Uint64x2
- func (x Uint64x2) RotateLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateRight(y Uint64x2) Uint64x2
- func (x Uint64x2) SaturateToUint16() Uint16x8
- func (x Uint64x2) SaturateToUint32() Uint32x4
- func (x Uint64x2) SaturateToUint8() Uint8x16
- func (x Uint64x2) SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) SetElem(index uint8, y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeft(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftAllRight(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRight(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) Store(y *[2]uint64)
- func (x Uint64x2) StoreMasked(y *[2]uint64, mask Mask64x2)
- func (x Uint64x2) StoreSlice(s []uint64)
- func (x Uint64x2) StoreSlicePart(s []uint64)
- func (x Uint64x2) String() string
- func (x Uint64x2) Sub(y Uint64x2) Uint64x2
- func (x Uint64x2) TruncateToUint16() Uint16x8
- func (x Uint64x2) TruncateToUint32() Uint32x4
- func (x Uint64x2) TruncateToUint8() Uint8x16
- func (x Uint64x2) Xor(y Uint64x2) Uint64x2
- type Uint64x4
- func (x Uint64x4) Add(y Uint64x4) Uint64x4
- func (x Uint64x4) And(y Uint64x4) Uint64x4
- func (x Uint64x4) AndNot(y Uint64x4) Uint64x4
- func (x Uint64x4) AsFloat32x8() Float32x8
- func (x Uint64x4) AsFloat64x4() Float64x4
- func (x Uint64x4) AsInt16x16() Int16x16
- func (x Uint64x4) AsInt32x8() Int32x8
- func (x Uint64x4) AsInt64x4() Int64x4
- func (x Uint64x4) AsInt8x32() Int8x32
- func (x Uint64x4) AsUint16x16() Uint16x16
- func (x Uint64x4) AsUint32x8() Uint32x8
- func (x Uint64x4) AsUint8x32() Uint8x32
- func (x Uint64x4) CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) Compress(mask Mask64x4) Uint64x4
- func (x Uint64x4) ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
- func (x Uint64x4) ConvertToFloat32() Float32x4
- func (x Uint64x4) ConvertToFloat64() Float64x4
- func (x Uint64x4) Equal(y Uint64x4) Mask64x4
- func (x Uint64x4) Expand(mask Mask64x4) Uint64x4
- func (x Uint64x4) GetHi() Uint64x2
- func (x Uint64x4) GetLo() Uint64x2
- func (x Uint64x4) Greater(y Uint64x4) Mask64x4
- func (x Uint64x4) GreaterEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) InterleaveHiGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) InterleaveLoGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) IsZero() bool
- func (x Uint64x4) LeadingZeros() Uint64x4
- func (x Uint64x4) Len() int
- func (x Uint64x4) Less(y Uint64x4) Mask64x4
- func (x Uint64x4) LessEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) Masked(mask Mask64x4) Uint64x4
- func (x Uint64x4) Max(y Uint64x4) Uint64x4
- func (x Uint64x4) Merge(y Uint64x4, mask Mask64x4) Uint64x4
- func (x Uint64x4) Min(y Uint64x4) Uint64x4
- func (x Uint64x4) Mul(y Uint64x4) Uint64x4
- func (x Uint64x4) Not() Uint64x4
- func (x Uint64x4) NotEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) OnesCount() Uint64x4
- func (x Uint64x4) Or(y Uint64x4) Uint64x4
- func (x Uint64x4) Permute(indices Uint64x4) Uint64x4
- func (x Uint64x4) RotateAllLeft(shift uint8) Uint64x4
- func (x Uint64x4) RotateAllRight(shift uint8) Uint64x4
- func (x Uint64x4) RotateLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) RotateRight(y Uint64x4) Uint64x4
- func (x Uint64x4) SaturateToUint16() Uint16x8
- func (x Uint64x4) SaturateToUint32() Uint32x4
- func (x Uint64x4) SaturateToUint8() Uint8x16
- func (x Uint64x4) Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SetHi(y Uint64x2) Uint64x4
- func (x Uint64x4) SetLo(y Uint64x2) Uint64x4
- func (x Uint64x4) ShiftAllLeft(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftAllRight(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRight(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) Store(y *[4]uint64)
- func (x Uint64x4) StoreMasked(y *[4]uint64, mask Mask64x4)
- func (x Uint64x4) StoreSlice(s []uint64)
- func (x Uint64x4) StoreSlicePart(s []uint64)
- func (x Uint64x4) String() string
- func (x Uint64x4) Sub(y Uint64x4) Uint64x4
- func (x Uint64x4) TruncateToUint16() Uint16x8
- func (x Uint64x4) TruncateToUint32() Uint32x4
- func (x Uint64x4) TruncateToUint8() Uint8x16
- func (x Uint64x4) Xor(y Uint64x4) Uint64x4
- type Uint64x8
- func (x Uint64x8) Add(y Uint64x8) Uint64x8
- func (x Uint64x8) And(y Uint64x8) Uint64x8
- func (x Uint64x8) AndNot(y Uint64x8) Uint64x8
- func (x Uint64x8) AsFloat32x16() Float32x16
- func (x Uint64x8) AsFloat64x8() Float64x8
- func (x Uint64x8) AsInt16x32() Int16x32
- func (x Uint64x8) AsInt32x16() Int32x16
- func (x Uint64x8) AsInt64x8() Int64x8
- func (x Uint64x8) AsInt8x64() Int8x64
- func (x Uint64x8) AsUint16x32() Uint16x32
- func (x Uint64x8) AsUint32x16() Uint32x16
- func (x Uint64x8) AsUint8x64() Uint8x64
- func (x Uint64x8) CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) Compress(mask Mask64x8) Uint64x8
- func (x Uint64x8) ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
- func (x Uint64x8) ConvertToFloat32() Float32x8
- func (x Uint64x8) ConvertToFloat64() Float64x8
- func (x Uint64x8) Equal(y Uint64x8) Mask64x8
- func (x Uint64x8) Expand(mask Mask64x8) Uint64x8
- func (x Uint64x8) GetHi() Uint64x4
- func (x Uint64x8) GetLo() Uint64x4
- func (x Uint64x8) Greater(y Uint64x8) Mask64x8
- func (x Uint64x8) GreaterEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) InterleaveHiGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) InterleaveLoGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) LeadingZeros() Uint64x8
- func (x Uint64x8) Len() int
- func (x Uint64x8) Less(y Uint64x8) Mask64x8
- func (x Uint64x8) LessEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) Masked(mask Mask64x8) Uint64x8
- func (x Uint64x8) Max(y Uint64x8) Uint64x8
- func (x Uint64x8) Merge(y Uint64x8, mask Mask64x8) Uint64x8
- func (x Uint64x8) Min(y Uint64x8) Uint64x8
- func (x Uint64x8) Mul(y Uint64x8) Uint64x8
- func (x Uint64x8) Not() Uint64x8
- func (x Uint64x8) NotEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) OnesCount() Uint64x8
- func (x Uint64x8) Or(y Uint64x8) Uint64x8
- func (x Uint64x8) Permute(indices Uint64x8) Uint64x8
- func (x Uint64x8) RotateAllLeft(shift uint8) Uint64x8
- func (x Uint64x8) RotateAllRight(shift uint8) Uint64x8
- func (x Uint64x8) RotateLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) RotateRight(y Uint64x8) Uint64x8
- func (x Uint64x8) SaturateToUint16() Uint16x8
- func (x Uint64x8) SaturateToUint32() Uint32x8
- func (x Uint64x8) SaturateToUint8() Uint8x16
- func (x Uint64x8) SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) SetHi(y Uint64x4) Uint64x8
- func (x Uint64x8) SetLo(y Uint64x4) Uint64x8
- func (x Uint64x8) ShiftAllLeft(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftAllRight(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRight(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) Store(y *[8]uint64)
- func (x Uint64x8) StoreMasked(y *[8]uint64, mask Mask64x8)
- func (x Uint64x8) StoreSlice(s []uint64)
- func (x Uint64x8) StoreSlicePart(s []uint64)
- func (x Uint64x8) String() string
- func (x Uint64x8) Sub(y Uint64x8) Uint64x8
- func (x Uint64x8) TruncateToUint16() Uint16x8
- func (x Uint64x8) TruncateToUint32() Uint32x8
- func (x Uint64x8) TruncateToUint8() Uint8x16
- func (x Uint64x8) Xor(y Uint64x8) Uint64x8
- type Uint8x16
- func (x Uint8x16) AESDecryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESDecryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) Add(y Uint8x16) Uint8x16
- func (x Uint8x16) AddSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) And(y Uint8x16) Uint8x16
- func (x Uint8x16) AndNot(y Uint8x16) Uint8x16
- func (x Uint8x16) AsFloat32x4() Float32x4
- func (x Uint8x16) AsFloat64x2() Float64x2
- func (x Uint8x16) AsInt16x8() Int16x8
- func (x Uint8x16) AsInt32x4() Int32x4
- func (x Uint8x16) AsInt64x2() Int64x2
- func (x Uint8x16) AsInt8x16() Int8x16
- func (x Uint8x16) AsUint16x8() Uint16x8
- func (x Uint8x16) AsUint32x4() Uint32x4
- func (x Uint8x16) AsUint64x2() Uint64x2
- func (x Uint8x16) Average(y Uint8x16) Uint8x16
- func (x Uint8x16) Broadcast1To16() Uint8x16
- func (x Uint8x16) Broadcast1To32() Uint8x32
- func (x Uint8x16) Broadcast1To64() Uint8x64
- func (x Uint8x16) Compress(mask Mask8x16) Uint8x16
- func (x Uint8x16) ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
- func (x Uint8x16) ConcatShiftBytesRight(shift uint8, y Uint8x16) Uint8x16
- func (x Uint8x16) DotProductPairsSaturated(y Int8x16) Int16x8
- func (x Uint8x16) Equal(y Uint8x16) Mask8x16
- func (x Uint8x16) Expand(mask Mask8x16) Uint8x16
- func (x Uint8x16) ExtendLo2ToUint64() Uint64x2
- func (x Uint8x16) ExtendLo4ToUint32() Uint32x4
- func (x Uint8x16) ExtendLo4ToUint64() Uint64x4
- func (x Uint8x16) ExtendLo8ToUint16() Uint16x8
- func (x Uint8x16) ExtendLo8ToUint32() Uint32x8
- func (x Uint8x16) ExtendLo8ToUint64() Uint64x8
- func (x Uint8x16) ExtendToUint16() Uint16x16
- func (x Uint8x16) ExtendToUint32() Uint32x16
- func (x Uint8x16) GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldMul(y Uint8x16) Uint8x16
- func (x Uint8x16) GetElem(index uint8) uint8
- func (x Uint8x16) Greater(y Uint8x16) Mask8x16
- func (x Uint8x16) GreaterEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) IsZero() bool
- func (x Uint8x16) Len() int
- func (x Uint8x16) Less(y Uint8x16) Mask8x16
- func (x Uint8x16) LessEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) Masked(mask Mask8x16) Uint8x16
- func (x Uint8x16) Max(y Uint8x16) Uint8x16
- func (x Uint8x16) Merge(y Uint8x16, mask Mask8x16) Uint8x16
- func (x Uint8x16) Min(y Uint8x16) Uint8x16
- func (x Uint8x16) Not() Uint8x16
- func (x Uint8x16) NotEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) OnesCount() Uint8x16
- func (x Uint8x16) Or(y Uint8x16) Uint8x16
- func (x Uint8x16) Permute(indices Uint8x16) Uint8x16
- func (x Uint8x16) PermuteOrZero(indices Int8x16) Uint8x16
- func (x Uint8x16) SetElem(index uint8, y uint8) Uint8x16
- func (x Uint8x16) Store(y *[16]uint8)
- func (x Uint8x16) StoreSlice(s []uint8)
- func (x Uint8x16) StoreSlicePart(s []uint8)
- func (x Uint8x16) String() string
- func (x Uint8x16) Sub(y Uint8x16) Uint8x16
- func (x Uint8x16) SubSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) SumAbsDiff(y Uint8x16) Uint16x8
- func (x Uint8x16) Xor(y Uint8x16) Uint8x16
- type Uint8x32
- func (x Uint8x32) AESDecryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESDecryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) Add(y Uint8x32) Uint8x32
- func (x Uint8x32) AddSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) And(y Uint8x32) Uint8x32
- func (x Uint8x32) AndNot(y Uint8x32) Uint8x32
- func (x Uint8x32) AsFloat32x8() Float32x8
- func (x Uint8x32) AsFloat64x4() Float64x4
- func (x Uint8x32) AsInt16x16() Int16x16
- func (x Uint8x32) AsInt32x8() Int32x8
- func (x Uint8x32) AsInt64x4() Int64x4
- func (x Uint8x32) AsInt8x32() Int8x32
- func (x Uint8x32) AsUint16x16() Uint16x16
- func (x Uint8x32) AsUint32x8() Uint32x8
- func (x Uint8x32) AsUint64x4() Uint64x4
- func (x Uint8x32) Average(y Uint8x32) Uint8x32
- func (x Uint8x32) Compress(mask Mask8x32) Uint8x32
- func (x Uint8x32) ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
- func (x Uint8x32) ConcatShiftBytesRightGrouped(shift uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16
- func (x Uint8x32) Equal(y Uint8x32) Mask8x32
- func (x Uint8x32) Expand(mask Mask8x32) Uint8x32
- func (x Uint8x32) ExtendToUint16() Uint16x32
- func (x Uint8x32) GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldMul(y Uint8x32) Uint8x32
- func (x Uint8x32) GetHi() Uint8x16
- func (x Uint8x32) GetLo() Uint8x16
- func (x Uint8x32) Greater(y Uint8x32) Mask8x32
- func (x Uint8x32) GreaterEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) IsZero() bool
- func (x Uint8x32) Len() int
- func (x Uint8x32) Less(y Uint8x32) Mask8x32
- func (x Uint8x32) LessEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) Masked(mask Mask8x32) Uint8x32
- func (x Uint8x32) Max(y Uint8x32) Uint8x32
- func (x Uint8x32) Merge(y Uint8x32, mask Mask8x32) Uint8x32
- func (x Uint8x32) Min(y Uint8x32) Uint8x32
- func (x Uint8x32) Not() Uint8x32
- func (x Uint8x32) NotEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) OnesCount() Uint8x32
- func (x Uint8x32) Or(y Uint8x32) Uint8x32
- func (x Uint8x32) Permute(indices Uint8x32) Uint8x32
- func (x Uint8x32) PermuteOrZeroGrouped(indices Int8x32) Uint8x32
- func (x Uint8x32) Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) SetHi(y Uint8x16) Uint8x32
- func (x Uint8x32) SetLo(y Uint8x16) Uint8x32
- func (x Uint8x32) Store(y *[32]uint8)
- func (x Uint8x32) StoreSlice(s []uint8)
- func (x Uint8x32) StoreSlicePart(s []uint8)
- func (x Uint8x32) String() string
- func (x Uint8x32) Sub(y Uint8x32) Uint8x32
- func (x Uint8x32) SubSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) SumAbsDiff(y Uint8x32) Uint16x16
- func (x Uint8x32) Xor(y Uint8x32) Uint8x32
- type Uint8x64
- func (x Uint8x64) AESDecryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESDecryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) Add(y Uint8x64) Uint8x64
- func (x Uint8x64) AddSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) And(y Uint8x64) Uint8x64
- func (x Uint8x64) AndNot(y Uint8x64) Uint8x64
- func (x Uint8x64) AsFloat32x16() Float32x16
- func (x Uint8x64) AsFloat64x8() Float64x8
- func (x Uint8x64) AsInt16x32() Int16x32
- func (x Uint8x64) AsInt32x16() Int32x16
- func (x Uint8x64) AsInt64x8() Int64x8
- func (x Uint8x64) AsInt8x64() Int8x64
- func (x Uint8x64) AsUint16x32() Uint16x32
- func (x Uint8x64) AsUint32x16() Uint32x16
- func (x Uint8x64) AsUint64x8() Uint64x8
- func (x Uint8x64) Average(y Uint8x64) Uint8x64
- func (x Uint8x64) Compress(mask Mask8x64) Uint8x64
- func (x Uint8x64) ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
- func (x Uint8x64) ConcatShiftBytesRightGrouped(shift uint8, y Uint8x64) Uint8x64
- func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32
- func (x Uint8x64) Equal(y Uint8x64) Mask8x64
- func (x Uint8x64) Expand(mask Mask8x64) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldMul(y Uint8x64) Uint8x64
- func (x Uint8x64) GetHi() Uint8x32
- func (x Uint8x64) GetLo() Uint8x32
- func (x Uint8x64) Greater(y Uint8x64) Mask8x64
- func (x Uint8x64) GreaterEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Len() int
- func (x Uint8x64) Less(y Uint8x64) Mask8x64
- func (x Uint8x64) LessEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Masked(mask Mask8x64) Uint8x64
- func (x Uint8x64) Max(y Uint8x64) Uint8x64
- func (x Uint8x64) Merge(y Uint8x64, mask Mask8x64) Uint8x64
- func (x Uint8x64) Min(y Uint8x64) Uint8x64
- func (x Uint8x64) Not() Uint8x64
- func (x Uint8x64) NotEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) OnesCount() Uint8x64
- func (x Uint8x64) Or(y Uint8x64) Uint8x64
- func (x Uint8x64) Permute(indices Uint8x64) Uint8x64
- func (x Uint8x64) PermuteOrZeroGrouped(indices Int8x64) Uint8x64
- func (x Uint8x64) SetHi(y Uint8x32) Uint8x64
- func (x Uint8x64) SetLo(y Uint8x32) Uint8x64
- func (x Uint8x64) Store(y *[64]uint8)
- func (x Uint8x64) StoreMasked(y *[64]uint8, mask Mask8x64)
- func (x Uint8x64) StoreSlice(s []uint8)
- func (x Uint8x64) StoreSlicePart(s []uint8)
- func (x Uint8x64) String() string
- func (x Uint8x64) Sub(y Uint8x64) Uint8x64
- func (x Uint8x64) SubSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) SumAbsDiff(y Uint8x64) Uint16x32
- func (x Uint8x64) Xor(y Uint8x64) Uint8x64
- type X86Features
- func (X86Features) AES() bool
- func (X86Features) AVX() bool
- func (X86Features) AVX2() bool
- func (X86Features) AVX512() bool
- func (X86Features) AVX512BITALG() bool
- func (X86Features) AVX512GFNI() bool
- func (X86Features) AVX512VAES() bool
- func (X86Features) AVX512VBMI() bool
- func (X86Features) AVX512VBMI2() bool
- func (X86Features) AVX512VNNI() bool
- func (X86Features) AVX512VPCLMULQDQ() bool
- func (X86Features) AVX512VPOPCNTDQ() bool
- func (X86Features) AVXVNNI() bool
- func (X86Features) SHA() bool
- Bugs
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ClearAVXUpperBits ¶
func ClearAVXUpperBits()
ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers. It is intended for transitioning from AVX to SSE, eliminating the performance penalties caused by false dependencies.
Note: in the future the compiler may automatically generate the instruction, making this function unnecessary.
Asm: VZEROUPPER, CPU Feature: AVX
Types ¶
type Float32x16 ¶
type Float32x16 struct {
// contains filtered or unexported fields
}
Float32x16 is a 512-bit SIMD vector of 16 float32s.
func BroadcastFloat32x16 ¶
func BroadcastFloat32x16(x float32) Float32x16
BroadcastFloat32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadFloat32x16 ¶
func LoadFloat32x16(y *[16]float32) Float32x16
LoadFloat32x16 loads a Float32x16 from an array.
func LoadFloat32x16Slice ¶
func LoadFloat32x16Slice(s []float32) Float32x16
LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s.
func LoadFloat32x16SlicePart ¶
func LoadFloat32x16SlicePart(s []float32) Float32x16
LoadFloat32x16SlicePart loads a Float32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.
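The zero-fill rule can be modeled in plain Go. This is a scalar sketch only: `loadSlicePart` is a hypothetical helper, not part of this package, and a `[16]float32` array stands in for the opaque vector type.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of LoadFloat32x16SlicePart.
// It copies at most 16 elements from s; lanes beyond len(s) stay zero.
func loadSlicePart(s []float32) [16]float32 {
	var v [16]float32
	copy(v[:], s) // copy stops at min(len(v), len(s))
	return v
}

func main() {
	fmt.Println(loadSlicePart([]float32{1, 2, 3}))
}
```

A typical use of the real function would be the tail of a loop over a slice whose length is not a multiple of 16.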
func LoadMaskedFloat32x16 ¶
func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16
LoadMaskedFloat32x16 loads a Float32x16 from an array, at those elements enabled by mask.
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func (Float32x16) Add ¶
func (x Float32x16) Add(y Float32x16) Float32x16
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX512
func (Float32x16) AsFloat64x8 ¶
func (x Float32x16) AsFloat64x8() Float64x8
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Float32x16) AsInt16x32 ¶
func (x Float32x16) AsInt16x32() Int16x32
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Float32x16) AsInt32x16 ¶
func (x Float32x16) AsInt32x16() Int32x16
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Float32x16) AsInt64x8 ¶
func (x Float32x16) AsInt64x8() Int64x8
AsInt64x8 returns an Int64x8 with the same bit representation as x.
func (Float32x16) AsInt8x64 ¶
func (x Float32x16) AsInt8x64() Int8x64
AsInt8x64 returns an Int8x64 with the same bit representation as x.
func (Float32x16) AsUint16x32 ¶
func (x Float32x16) AsUint16x32() Uint16x32
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Float32x16) AsUint32x16 ¶
func (x Float32x16) AsUint32x16() Uint32x16
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Float32x16) AsUint64x8 ¶
func (x Float32x16) AsUint64x8() Uint64x8
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Float32x16) AsUint8x64 ¶
func (x Float32x16) AsUint8x64() Uint8x64
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Float32x16) CeilScaled ¶
func (x Float32x16) CeilScaled(prec uint8) Float32x16
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) CeilScaledResidue ¶
func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
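As a rough scalar model of CeilScaled and CeilScaledResidue, the sketch below assumes that prec counts binary fraction bits (the scale field of VRNDSCALEPS and VREDUCEPS), so that rounding happens to a multiple of 2^-prec. That reading of prec is an assumption, the helper names are hypothetical, and special values (NaN, infinities, denormals) are not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled is a hypothetical scalar model: round x up to a multiple of 2^-prec.
// Assumption: prec is the number of binary fraction bits to keep.
func ceilScaled(x float32, prec uint8) float32 {
	scaled := math.Ldexp(float64(x), int(prec)) // x * 2^prec
	return float32(math.Ldexp(math.Ceil(scaled), -int(prec)))
}

// ceilScaledResidue models the residue as x minus the rounded value.
func ceilScaledResidue(x float32, prec uint8) float32 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.25, 1), ceilScaledResidue(1.25, 1))
}
```

Under the same assumption, the Floor, RoundToEven, and Trunc variants would differ only in the rounding function applied to the scaled value.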
func (Float32x16) Compress ¶
func (x Float32x16) Compress(mask Mask32x16) Float32x16
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPS, CPU Feature: AVX512
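The packing behavior can be sketched with a scalar loop. `compress` is a hypothetical helper that uses arrays in place of the vector and mask types; the text above does not say what the unused upper lanes hold, so zeroing them here is an assumption.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress on 16 lanes.
// Elements of x whose mask lane is true are packed, in order, into the
// lowest lanes of the result. Assumption: the unused upper lanes are zero.
func compress(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	n := 0 // next free lane in the result
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [16]float32{10, 11, 12, 13, 14, 15}
	mask := [16]bool{1: true, 3: true, 4: true}
	fmt.Println(compress(x, mask)) // lanes 0-2 hold 11, 13, 14
}
```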
func (Float32x16) ConcatPermute ¶
func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
ConcatPermute performs a full permutation of the vectors x and y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2PS, CPU Feature: AVX512
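A scalar sketch of the indexing rule for the 16-lane case follows. `concatPermute` is a hypothetical helper, and it assumes the bits "needed" to index the 32 elements of xy are the low 5 bits of each index.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute on 16 lanes.
// Indices 0-15 select from x and indices 16-31 select from y.
// Assumption: only the low 5 bits of each index are significant.
func concatPermute(x, y [16]float32, indices [16]uint32) [16]float32 {
	var out [16]float32
	for i, idx := range indices {
		idx &= 31 // 5 bits address the 32 elements of xy
		if idx < 16 {
			out[i] = x[idx]
		} else {
			out[i] = y[idx-16]
		}
	}
	return out
}

func main() {
	var x, y [16]float32
	for i := range x {
		x[i] = float32(i)
		y[i] = float32(100 + i)
	}
	idx := [16]uint32{0: 0, 1: 16, 2: 31, 3: 15}
	fmt.Println(concatPermute(x, y, idx))
}
```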
func (Float32x16) ConvertToInt32 ¶
func (x Float32x16) ConvertToInt32() Int32x16
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX512
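Go's built-in float-to-integer conversion also discards the fraction (truncation toward zero) for in-range values, so the in-range behavior can be sketched lane by lane. `convertToInt32` is a hypothetical scalar helper over a 4-lane array; out-of-range inputs are deliberately not modeled, since the result for them is architecture-specific.

```go
package main

import "fmt"

// convertToInt32 is a hypothetical scalar model of the in-range behavior:
// each lane is truncated toward zero. Out-of-range lanes are not modeled.
func convertToInt32(x [4]float32) [4]int32 {
	var out [4]int32
	for i, v := range x {
		out[i] = int32(v) // Go truncates toward zero for in-range values
	}
	return out
}

func main() {
	fmt.Println(convertToInt32([4]float32{2.9, -2.9, 0.5, -0.5}))
}
```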
func (Float32x16) ConvertToUint32 ¶
func (x Float32x16) ConvertToUint32() Uint32x16
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x16) Div ¶
func (x Float32x16) Div(y Float32x16) Float32x16
Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX512
func (Float32x16) Equal ¶
func (x Float32x16) Equal(y Float32x16) Mask32x16
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Expand ¶
func (x Float32x16) Expand(mask Mask32x16) Float32x16
Expand distributes the elements packed in the low lanes of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
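Expand is the inverse of the packing done by Compress, which a scalar loop makes concrete. `expand` is a hypothetical helper using arrays in place of the vector and mask types; leaving unselected lanes at zero is an assumption, since the text above does not specify them.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand on 16 lanes.
// Consecutive low lanes of x are placed at the positions where mask is true.
// Assumption: lanes where mask is false are zero in the result.
func expand(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	n := 0 // next packed element of x to consume
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [16]float32{10, 20, 30}
	mask := [16]bool{1: true, 4: true, 6: true}
	fmt.Println(expand(x, mask)) // lanes 1, 4, 6 hold 10, 20, 30
}
```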
func (Float32x16) FloorScaled ¶
func (x Float32x16) FloorScaled(prec uint8) Float32x16
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) FloorScaledResidue ¶
func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x16) GetHi ¶
func (x Float32x16) GetHi() Float32x8
GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512
func (Float32x16) GetLo ¶
func (x Float32x16) GetLo() Float32x8
GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512
func (Float32x16) Greater ¶
func (x Float32x16) Greater(y Float32x16) Mask32x16
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) GreaterEqual ¶
func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) IsNaN ¶
func (x Float32x16) IsNaN() Mask32x16
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Len ¶
func (x Float32x16) Len() int
Len returns the number of elements in a Float32x16.
func (Float32x16) Less ¶
func (x Float32x16) Less(y Float32x16) Mask32x16
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) LessEqual ¶
func (x Float32x16) LessEqual(y Float32x16) Mask32x16
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Masked ¶
func (x Float32x16) Masked(mask Mask32x16) Float32x16
Masked returns x but with elements zeroed where mask is false.
func (Float32x16) Max ¶
func (x Float32x16) Max(y Float32x16) Float32x16
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPS, CPU Feature: AVX512
func (Float32x16) Merge ¶
func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16
Merge returns x but with elements set to y where mask is false.
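Masked and Merge differ only in what fills the lanes where the mask is false: zero for Masked, the corresponding element of y for Merge. The scalar sketch below illustrates this with hypothetical helpers `masked` and `merge` that use arrays in place of the vector and mask types.

```go
package main

import "fmt"

// masked is a hypothetical scalar model of Masked:
// lanes where mask is false become zero.
func masked(x [16]float32, mask [16]bool) [16]float32 {
	var out [16]float32
	for i, on := range mask {
		if on {
			out[i] = x[i]
		}
	}
	return out
}

// merge is a hypothetical scalar model of Merge:
// lanes where mask is false are taken from y.
func merge(x, y [16]float32, mask [16]bool) [16]float32 {
	out := y
	for i, on := range mask {
		if on {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [16]float32{1, 2, 3, 4}
	y := [16]float32{-1, -2, -3, -4}
	mask := [16]bool{0: true, 2: true}
	fmt.Println(masked(x, mask))
	fmt.Println(merge(x, y, mask))
}
```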
func (Float32x16) Min ¶
func (x Float32x16) Min(y Float32x16) Float32x16
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPS, CPU Feature: AVX512
func (Float32x16) Mul ¶
func (x Float32x16) Mul(y Float32x16) Float32x16
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX512
func (Float32x16) MulAdd ¶
func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x16) MulAddSub ¶
func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x16) MulSubAdd ¶
func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x16) NotEqual ¶
func (x Float32x16) NotEqual(y Float32x16) Mask32x16
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Permute ¶
func (x Float32x16) Permute(indices Uint32x16) Float32x16
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX512
func (Float32x16) Reciprocal ¶
func (x Float32x16) Reciprocal() Float32x16
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PS, CPU Feature: AVX512
func (Float32x16) ReciprocalSqrt ¶
func (x Float32x16) ReciprocalSqrt() Float32x16
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PS, CPU Feature: AVX512
func (Float32x16) RoundToEvenScaled ¶
func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) RoundToEvenScaledResidue ¶
func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x16) Scale ¶
func (x Float32x16) Scale(y Float32x16) Float32x16
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPS, CPU Feature: AVX512
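For ordinary finite inputs, the per-element formula is x * 2^floor(y). The sketch below models one lane with the standard library; `scale` is a hypothetical helper, and the hardware's handling of NaN, infinities, and denormals is not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// scale is a hypothetical scalar model of one lane of Scale:
// x multiplied by 2 raised to floor(y), for ordinary finite inputs.
func scale(x, y float32) float32 {
	exp := int(math.Floor(float64(y)))
	return float32(math.Ldexp(float64(x), exp))
}

func main() {
	fmt.Println(scale(3, 2.7), scale(1, -1.5))
}
```

Note that the exponent is floored, not truncated, so a y of -1.5 scales by 2^-2.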
func (Float32x16) SelectFromPairGrouped ¶
func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Float32x16) SetHi ¶
func (x Float32x16) SetHi(y Float32x8) Float32x16
SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float32x16) SetLo ¶
func (x Float32x16) SetLo(y Float32x8) Float32x16
SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float32x16) Sqrt ¶
func (x Float32x16) Sqrt() Float32x16
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX512
func (Float32x16) Store ¶
func (x Float32x16) Store(y *[16]float32)
Store stores a Float32x16 to an array.
func (Float32x16) StoreMasked ¶
func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)
StoreMasked stores a Float32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512
func (Float32x16) StoreSlice ¶
func (x Float32x16) StoreSlice(s []float32)
StoreSlice stores x into a slice of at least 16 float32s.
func (Float32x16) StoreSlicePart ¶
func (x Float32x16) StoreSlicePart(s []float32)
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
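The "as many elements as will fit" rule matches the behavior of Go's built-in copy, which the scalar sketch below relies on. `storeSlicePart` is a hypothetical helper, not part of this package, and a `[16]float32` array stands in for the vector.

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model of StoreSlicePart.
// It writes the first min(16, len(s)) lanes of v into s and never
// writes past the end of s.
func storeSlicePart(v [16]float32, s []float32) {
	copy(s, v[:])
}

func main() {
	var v [16]float32
	for i := range v {
		v[i] = float32(i + 1)
	}
	tail := make([]float32, 3)
	storeSlicePart(v, tail)
	fmt.Println(tail)
}
```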
func (Float32x16) String ¶
func (x Float32x16) String() string
String returns a string representation of SIMD vector x.
func (Float32x16) Sub ¶
func (x Float32x16) Sub(y Float32x16) Float32x16
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX512
func (Float32x16) TruncScaled ¶
func (x Float32x16) TruncScaled(prec uint8) Float32x16
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) TruncScaledResidue ¶
func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float32x4 ¶
type Float32x4 struct {
// contains filtered or unexported fields
}
Float32x4 is a 128-bit SIMD vector of 4 float32s.
func BroadcastFloat32x4 ¶
BroadcastFloat32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat32x4 ¶
LoadFloat32x4 loads a Float32x4 from an array.
func LoadFloat32x4Slice ¶
LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s.
func LoadFloat32x4SlicePart ¶
LoadFloat32x4SlicePart loads a Float32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.
func LoadMaskedFloat32x4 ¶
LoadMaskedFloat32x4 loads a Float32x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX
func (Float32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VHADDPS, CPU Feature: AVX
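For the 4-lane case the result layout is [x0+x1, x2+x3, y0+y1, y2+y3]. A scalar sketch, with the hypothetical helper `addPairs` and arrays in place of the vector type:

```go
package main

import "fmt"

// addPairs is a hypothetical scalar model of AddPairs on 4 lanes:
// pair sums of x fill the low half, pair sums of y fill the high half.
func addPairs(x, y [4]float32) [4]float32 {
	return [4]float32{
		x[0] + x[1],
		x[2] + x[3],
		y[0] + y[1],
		y[2] + y[3],
	}
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{10, 20, 30, 40}
	fmt.Println(addPairs(x, y))
}
```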
func (Float32x4) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX
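The alternating pattern can be sketched with a scalar loop, counting elements from index 0. `addSub` is a hypothetical helper using arrays in place of the vector type.

```go
package main

import "fmt"

// addSub is a hypothetical scalar model of AddSub on 4 lanes:
// even-indexed lanes compute x-y, odd-indexed lanes compute x+y.
func addSub(x, y [4]float32) [4]float32 {
	var out [4]float32
	for i := range out {
		if i%2 == 0 {
			out[i] = x[i] - y[i]
		} else {
			out[i] = x[i] + y[i]
		}
	}
	return out
}

func main() {
	x := [4]float32{10, 20, 30, 40}
	y := [4]float32{1, 2, 3, 4}
	fmt.Println(addSub(x, y))
}
```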
func (Float32x4) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Float32x4) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Float32x4) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Float32x4) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Float32x4) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Float32x4) Broadcast1To16 ¶
func (x Float32x4) Broadcast1To16() Float32x16
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VBROADCASTSS, CPU Feature: AVX512
func (Float32x4) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2
func (Float32x4) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2
func (Float32x4) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPS, CPU Feature: AVX512
func (Float32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2PS, CPU Feature: AVX512
func (Float32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX
func (Float32x4) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX
func (Float32x4) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512
func (Float32x4) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x4) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512
func (Float32x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Expand ¶
Expand distributes the elements packed in the low lanes of x to the positions selected by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
func (Float32x4) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Float32x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) IsNaN ¶
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPS, CPU Feature: AVX
func (Float32x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPS, CPU Feature: AVX
func (Float32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX
func (Float32x4) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x4) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x4) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX
func (Float32x4) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX
func (Float32x4) RoundToEven ¶
RoundToEven rounds elements to the nearest integer, rounding ties to even.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) Scale ¶
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPS, CPU Feature: AVX512
func (Float32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81})
returns {4,8,25,81}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
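The selector rule and the worked example above can be checked with a scalar sketch. `selectFromPair` is a hypothetical helper using arrays in place of the vector type; it assumes every selector is in the range 0-7 and panics on an out-of-range index otherwise.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model of SelectFromPair.
// Selectors 0-3 pick from x and selectors 4-7 pick elements 0-3 of y.
// Assumption: every selector is in the range 0-7.
func selectFromPair(x [4]float32, a, b, c, d uint8, y [4]float32) [4]float32 {
	pick := func(sel uint8) float32 {
		if sel < 4 {
			return x[sel]
		}
		return y[sel-4] // index out of range panic if sel > 7
	}
	return [4]float32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]float32{1, 2, 4, 8}
	y := [4]float32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // the documented example
}
```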
func (Float32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Float32x4) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX
func (Float32x4) StoreMasked ¶
StoreMasked stores a Float32x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 float32s.
func (Float32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Float32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX
func (Float32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VHSUBPS, CPU Feature: AVX
func (Float32x4) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float32x8 ¶
type Float32x8 struct {
// contains filtered or unexported fields
}
Float32x8 is a 256-bit SIMD vector of 8 float32s.
func BroadcastFloat32x8 ¶
BroadcastFloat32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat32x8 ¶
LoadFloat32x8 loads a Float32x8 from an array.
func LoadFloat32x8Slice ¶
LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s.
func LoadFloat32x8SlicePart ¶
LoadFloat32x8SlicePart loads a Float32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.
func LoadMaskedFloat32x8 ¶
LoadMaskedFloat32x8 loads a Float32x8 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX
func (Float32x8) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements. With each 128-bit as a group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VHADDPS, CPU Feature: AVX
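In the grouped form, the pair-sum layout applies independently to each 128-bit group, which for 8 float32 lanes means lanes 0-3 and lanes 4-7. The scalar sketch below illustrates that reading; `addPairsGrouped` is a hypothetical helper using arrays in place of the vector type.

```go
package main

import "fmt"

// addPairsGrouped is a hypothetical scalar model of AddPairsGrouped on 8 lanes.
// Each 128-bit group (4 lanes) is processed on its own: the pair sums of x
// fill the low half of the group and the pair sums of y fill the high half.
func addPairsGrouped(x, y [8]float32) [8]float32 {
	var out [8]float32
	for g := 0; g < 8; g += 4 { // g is the first lane of each group
		out[g+0] = x[g+0] + x[g+1]
		out[g+1] = x[g+2] + x[g+3]
		out[g+2] = y[g+0] + y[g+1]
		out[g+3] = y[g+2] + y[g+3]
	}
	return out
}

func main() {
	x := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]float32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped(x, y))
}
```

The grouping means the sums of x and y are interleaved per group in the result, unlike the ungrouped 4-lane AddPairs.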
func (Float32x8) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX
func (Float32x8) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Float32x8) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Float32x8) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Float32x8) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Float32x8) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Float32x8) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Float32x8) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPS, CPU Feature: AVX512
func (Float32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2PS, CPU Feature: AVX512
func (Float32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX512
func (Float32x8) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX
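For values that fit in int32, the truncating behavior matches Go's own float-to-integer conversion, so a scalar sketch is short. convertToInt32Model is a hypothetical helper and covers in-range inputs only; out-of-range results are left unspecified, as above.

```go
package main

import "fmt"

// convertToInt32Model is a hypothetical scalar stand-in for ConvertToInt32.
// Go's float-to-integer conversion discards the fraction (truncation
// toward zero) for values that fit in the target type.
func convertToInt32Model(x [8]float32) [8]int32 {
	var out [8]int32
	for i, v := range x {
		out[i] = int32(v)
	}
	return out
}

func main() {
	x := [8]float32{1.9, -1.9, 2.5, -2.5, 0.5, -0.5, 7, -7}
	fmt.Println(convertToInt32Model(x)) // [1 -1 2 -2 0 0 7 -7]
}
```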
func (Float32x8) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512
func (Float32x8) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x8) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512
func (Float32x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The expansion distributes those elements, in order, to the positions enabled in mask, from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
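A scalar sketch may make the distribution order easier to see. expand8 is a hypothetical helper, the mask is modelled as a [8]bool, and leaving the lanes that mask does not enable at zero is an assumption.

```go
package main

import "fmt"

// expand8 is a hypothetical scalar model of Expand for 8 lanes.
// The packed low elements of x are handed out, in order, to the lanes
// enabled in mask. Lanes that are not enabled stay zero (an assumption).
func expand8(x [8]float32, mask [8]bool) [8]float32 {
	var out [8]float32
	n := 0
	for i, on := range mask {
		if on {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	packed := [8]float32{11, 13, 14, 17, 0, 0, 0, 0}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand8(packed, mask)) // [0 11 0 13 14 0 0 17]
}
```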
func (Float32x8) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) IsNaN ¶
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPS, CPU Feature: AVX
func (Float32x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPS, CPU Feature: AVX
func (Float32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX
func (Float32x8) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x8) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements, with indices counted from 0.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x8) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements, with indices counted from 0.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX2
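The formula above can be written as a short scalar loop. This is an illustrative model, not the package API: permute8 is a hypothetical helper and the [8]uint32 index type is an assumption.

```go
package main

import "fmt"

// permute8 is a hypothetical scalar model of Permute for 8 lanes.
// Each result lane i takes x[indices[i] & 7].
func permute8(x [8]float32, indices [8]uint32) [8]float32 {
	var out [8]float32
	for i, idx := range indices {
		out[i] = x[idx&7] // only the low 3 bits select the source lane
	}
	return out
}

func main() {
	x := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	// The last index is 8, which wraps to lane 0 once masked to 3 bits.
	idx := [8]uint32{7, 6, 5, 4, 3, 2, 1, 8}
	fmt.Println(permute8(x, idx)) // [17 16 15 14 13 12 11 10]
}
```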
func (Float32x8) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX
func (Float32x8) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX
func (Float32x8) RoundToEven ¶
RoundToEven rounds elements to the nearest integer, rounding ties to even.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x8) Scale ¶
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPS, CPU Feature: AVX512
func (Float32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX
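The worked example above can be reproduced with a scalar model. select128FromPair is a hypothetical helper written for this note. It panics on out-of-range selectors, which mirrors the "may result in a runtime panic" wording but is otherwise an assumption.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model for 8 float32 lanes.
// x and y together form four 128-bit blocks (4 floats each), numbered 0-3.
// The result is block lo followed by block hi.
func select128FromPair(x, y [8]float32, lo, hi uint8) [8]float32 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be between 0 and 3")
	}
	var xy [16]float32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]float32
	copy(out[:4], xy[4*int(lo):4*int(lo)+4])
	copy(out[4:], xy[4*int(hi):4*int(hi)+4])
	return out
}

func main() {
	x := [8]float32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]float32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, y, 3, 0)) // [70 71 72 73 40 41 42 43]
}
```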
func (Float32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
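A scalar model reproduces the example above and shows how the same four selectors are applied to each 128-bit half. selectFromPairGrouped is a hypothetical helper, and the sketch assumes all selectors lie in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model for 8 float32 lanes.
// Within each group of four lanes, selectors 0-3 pick from x and
// selectors 4-7 pick lanes 0-3 of y.
func selectFromPairGrouped(x, y [8]float32, a, b, c, d uint8) [8]float32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]float32
	for g := 0; g < 8; g += 4 { // one 128-bit group at a time
		for i, s := range sel {
			if s < 4 {
				out[g+i] = x[g+int(s)]
			} else {
				out[g+i] = y[g+int(s&3)]
			}
		}
	}
	return out
}

func main() {
	x := [8]float32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]float32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, y, 2, 3, 5, 7)) // [4 8 25 81 64 128 169 289]
}
```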
func (Float32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float32x8) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX
func (Float32x8) StoreMasked ¶
StoreMasked stores a Float32x8 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 float32s.
func (Float32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Float32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX
func (Float32x8) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements. Within each 128-bit group, for x = [x0, x1, x2, x3] and y = [y0, y1, y2, y3], the result is [x0-x1, x2-x3, y0-y1, y2-y3].
Asm: VHSUBPS, CPU Feature: AVX
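Because the grouping is easy to misread, here is a scalar sketch of the lane layout for 8 float32 elements. subPairsGrouped8 is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// subPairsGrouped8 is a hypothetical scalar model for 8 float32 lanes.
// Each 128-bit group (4 lanes) is handled on its own: the two pair
// differences from x come first, then the two pair differences from y.
func subPairsGrouped8(x, y [8]float32) [8]float32 {
	var out [8]float32
	for g := 0; g < 8; g += 4 {
		out[g+0] = x[g+0] - x[g+1]
		out[g+1] = x[g+2] - x[g+3]
		out[g+2] = y[g+0] - y[g+1]
		out[g+3] = y[g+2] - y[g+3]
	}
	return out
}

func main() {
	x := [8]float32{10, 1, 20, 2, 30, 3, 40, 4}
	y := [8]float32{50, 5, 60, 6, 70, 7, 80, 8}
	fmt.Println(subPairsGrouped8(x, y)) // [9 18 45 54 27 36 63 72]
}
```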
func (Float32x8) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float64x2 ¶
type Float64x2 struct {
// contains filtered or unexported fields
}
Float64x2 is a 128-bit SIMD vector of 2 float64s.
func BroadcastFloat64x2 ¶
BroadcastFloat64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat64x2 ¶
LoadFloat64x2 loads a Float64x2 from an array.
func LoadFloat64x2Slice ¶
LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s.
func LoadFloat64x2SlicePart ¶
LoadFloat64x2SlicePart loads a Float64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.
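The zero-fill rule can be modelled with the built-in copy, which copies at most as many elements as the shorter operand holds. loadPart2 is a hypothetical helper that uses a [2]float64 in place of the vector type.

```go
package main

import "fmt"

// loadPart2 is a hypothetical scalar model of LoadFloat64x2SlicePart.
// It copies at most 2 elements from s; lanes with no source stay zero.
func loadPart2(s []float64) [2]float64 {
	var v [2]float64
	copy(v[:], s)
	return v
}

func main() {
	fmt.Println(loadPart2(nil))                   // [0 0]
	fmt.Println(loadPart2([]float64{7}))          // [7 0]
	fmt.Println(loadPart2([]float64{1, 2, 3, 4})) // [1 2]
}
```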
func LoadMaskedFloat64x2 ¶
LoadMaskedFloat64x2 loads a Float64x2 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x2) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX
func (Float64x2) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1] and y = [y0, y1], the result is [x0+x1, y0+y1].
Asm: VHADDPD, CPU Feature: AVX
func (Float64x2) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX
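As a quick illustration of the lane layout of AddPairs and AddSub, here is a scalar sketch for two float64 lanes. addPairs2 and addSub2 are hypothetical helpers, and the sketch reads "even" and "odd" as zero-based lane indices.

```go
package main

import "fmt"

// addPairs2 is a hypothetical scalar model of AddPairs for 2 lanes:
// lane 0 is the horizontal sum of x, lane 1 is the horizontal sum of y.
func addPairs2(x, y [2]float64) [2]float64 {
	return [2]float64{x[0] + x[1], y[0] + y[1]}
}

// addSub2 is a hypothetical scalar model of AddSub for 2 lanes:
// the even lane (0) is subtracted, the odd lane (1) is added.
func addSub2(x, y [2]float64) [2]float64 {
	return [2]float64{x[0] - y[0], x[1] + y[1]}
}

func main() {
	x := [2]float64{1, 2}
	y := [2]float64{10, 20}
	fmt.Println(addPairs2(x, y)) // [3 30]
	fmt.Println(addSub2(x, y))   // [-9 22]
}
```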
func (Float64x2) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Float64x2) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Float64x2) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Float64x2) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
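"Same bit representation" means the bits are reinterpreted, not converted numerically. A scalar analogue in standard Go is math.Float64bits. asUint64x2Model below is a hypothetical helper that applies it lane by lane.

```go
package main

import (
	"fmt"
	"math"
)

// asUint64x2Model is a hypothetical scalar stand-in for AsUint64x2.
// Each lane keeps its IEEE 754 bit pattern and is merely viewed as a uint64.
func asUint64x2Model(x [2]float64) [2]uint64 {
	return [2]uint64{math.Float64bits(x[0]), math.Float64bits(x[1])}
}

func main() {
	bits := asUint64x2Model([2]float64{1, -2})
	fmt.Printf("%#x %#x\n", bits[0], bits[1]) // 0x3ff0000000000000 0xc000000000000000
}
```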
func (Float64x2) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Float64x2) Broadcast1To2 ¶
Broadcast1To2 copies the lowest element of its input to all 2 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Float64x2) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VBROADCASTSD, CPU Feature: AVX2
func (Float64x2) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VBROADCASTSD, CPU Feature: AVX512
func (Float64x2) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
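The relationship between CeilScaled and CeilScaledResidue can be sketched in scalar Go. This model assumes that prec is the number of binary fraction bits kept, so the result is a multiple of 2^-prec, as with the immediate of the underlying VRNDSCALEPD instruction. ceilScaled and ceilScaledResidue are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled is a hypothetical scalar model: it rounds x up to the
// nearest multiple of 2^-prec (assumed meaning of prec).
func ceilScaled(x float64, prec uint8) float64 {
	scale := math.Ldexp(1, int(prec)) // 2^prec
	return math.Ceil(x*scale) / scale
}

// ceilScaledResidue is what remains after the scaled ceiling:
// x - ceilScaled(x, prec).
func ceilScaledResidue(x float64, prec uint8) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.3, 2))         // 1.5
	fmt.Println(ceilScaledResidue(1.25, 1)) // -0.25
}
```

With prec set to 0 the model reduces to an ordinary ceiling.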
func (Float64x2) Compress ¶
Compress performs a compression on vector x using mask by selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512
func (Float64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to represent an index into xy are used.
Asm: VPERMI2PD, CPU Feature: AVX512
func (Float64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSX, CPU Feature: AVX
func (Float64x2) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQX, CPU Feature: AVX
func (Float64x2) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x2) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQX, CPU Feature: AVX512
func (Float64x2) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x2) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The expansion distributes those elements, in order, to the positions enabled in mask, from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
func (Float64x2) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x2) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Float64x2) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) IsNaN ¶
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPD, CPU Feature: AVX
func (Float64x2) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPD, CPU Feature: AVX
func (Float64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX
func (Float64x2) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
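"Fused" means the product is not rounded before the addition, so only one rounding happens at the end. The scalar sketch below uses math.FMA from the standard library to show the difference for a single lane. mulAddFused and mulAddUnfused are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// mulAddFused is a hypothetical scalar stand-in for one lane of MulAdd:
// math.FMA computes x*y + z with a single rounding at the end.
func mulAddFused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

// mulAddUnfused rounds the product first and then the sum. The explicit
// float64 conversion keeps the compiler from fusing the two operations.
func mulAddUnfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	x := 1 + 0x1p-27 // x*x is exactly 1 + 2^-26 + 2^-54
	z := -(1 + 0x1p-26)
	fmt.Println(mulAddFused(x, x, z) == 0x1p-54) // true: the low term survives
	fmt.Println(mulAddUnfused(x, x, z) == 0)     // true: the product was rounded first
}
```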
func (Float64x2) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements, with indices counted from 0.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
func (Float64x2) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements, with indices counted from 0.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x2) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x2) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x2) RoundToEven ¶
RoundToEven rounds elements to the nearest integer, rounding ties to even.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x2) Scale ¶
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPD, CPU Feature: AVX512
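For ordinary finite inputs, the formula above is x * 2^floor(y), which maps onto math.Ldexp in scalar Go. scaleModel is a hypothetical helper. It does not try to reproduce the instruction's handling of NaN, infinities, or denormals.

```go
package main

import (
	"fmt"
	"math"
)

// scaleModel is a hypothetical scalar model of one lane of Scale:
// x multiplied by 2 raised to floor(y). Finite, moderate inputs only.
func scaleModel(x, y float64) float64 {
	return math.Ldexp(x, int(math.Floor(y)))
}

func main() {
	fmt.Println(scaleModel(3, 2.7))  // 12   (3 * 2^2)
	fmt.Println(scaleModel(3, -1.5)) // 0.75 (3 * 2^-2)
}
```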
func (Float64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Float64x2) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Float64x2) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX
func (Float64x2) StoreMasked ¶
StoreMasked stores a Float64x2 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
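The effect of a mask on memory access can be sketched with plain arrays. storeMasked2 and loadMasked2 are hypothetical helpers, and the mask is modelled as a [2]bool. The sketch assumes that a masked store leaves disabled elements of the destination untouched and that a masked load yields zero in disabled lanes.

```go
package main

import "fmt"

// storeMasked2 is a hypothetical scalar model of StoreMasked for 2 lanes:
// only lanes enabled in mask are written to dst.
func storeMasked2(dst *[2]float64, v [2]float64, mask [2]bool) {
	for i, on := range mask {
		if on {
			dst[i] = v[i]
		}
	}
}

// loadMasked2 is a hypothetical scalar model of a masked load:
// disabled lanes are zero in the result (an assumption).
func loadMasked2(src *[2]float64, mask [2]bool) [2]float64 {
	var v [2]float64
	for i, on := range mask {
		if on {
			v[i] = src[i]
		}
	}
	return v
}

func main() {
	mem := [2]float64{1, 2}
	storeMasked2(&mem, [2]float64{10, 20}, [2]bool{false, true})
	fmt.Println(mem)                                     // [1 20]
	fmt.Println(loadMasked2(&mem, [2]bool{true, false})) // [1 0]
}
```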
func (Float64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 float64s.
func (Float64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX
func (Float64x2) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1] and y = [y0, y1], the result is [x0-x1, y0-y1].
Asm: VHSUBPD, CPU Feature: AVX
func (Float64x2) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Float64x4 ¶
type Float64x4 struct {
// contains filtered or unexported fields
}
Float64x4 is a 256-bit SIMD vector of 4 float64s.
func BroadcastFloat64x4 ¶
BroadcastFloat64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat64x4 ¶
LoadFloat64x4 loads a Float64x4 from an array.
func LoadFloat64x4Slice ¶
LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s.
func LoadFloat64x4SlicePart ¶
LoadFloat64x4SlicePart loads a Float64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.
func LoadMaskedFloat64x4 ¶
LoadMaskedFloat64x4 loads a Float64x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX
func (Float64x4) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements. Within each 128-bit group, for x = [x0, x1] and y = [y0, y1], the result is [x0+x1, y0+y1].
Asm: VHADDPD, CPU Feature: AVX
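For a 4-element vector the grouped form interleaves sums from x and y, which a scalar sketch makes explicit. addPairsGrouped4 is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// addPairsGrouped4 is a hypothetical scalar model for 4 float64 lanes.
// Each 128-bit group (2 lanes) yields the pair sum of x, then the pair sum of y.
func addPairsGrouped4(x, y [4]float64) [4]float64 {
	var out [4]float64
	for g := 0; g < 4; g += 2 {
		out[g] = x[g] + x[g+1]
		out[g+1] = y[g] + y[g+1]
	}
	return out
}

func main() {
	x := [4]float64{1, 2, 3, 4}
	y := [4]float64{10, 20, 30, 40}
	fmt.Println(addPairsGrouped4(x, y)) // [3 30 7 70]
}
```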
func (Float64x4) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX
func (Float64x4) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Float64x4) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Float64x4) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Float64x4) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Float64x4) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Float64x4) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Float64x4) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Compress ¶
Compress performs a compression on vector x using mask by selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VCOMPRESSPD, CPU Feature: AVX512
func (Float64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to represent an index into xy are used.
Asm: VPERMI2PD, CPU Feature: AVX512
func (Float64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSY, CPU Feature: AVX
func (Float64x4) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQY, CPU Feature: AVX
func (Float64x4) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x4) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQY, CPU Feature: AVX512
func (Float64x4) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The expansion distributes those elements, in order, to the positions enabled in mask, from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
func (Float64x4) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) IsNaN ¶
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPD, CPU Feature: AVX
func (Float64x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPD, CPU Feature: AVX
func (Float64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX
func (Float64x4) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
func (Float64x4) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for even-indexed elements, and (x * y) + z for odd-indexed elements, with indices counted from 0.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
func (Float64x4) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for even-indexed elements, and (x * y) - z for odd-indexed elements, with indices counted from 0.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512
func (Float64x4) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x4) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x4) RoundToEven ¶
RoundToEven rounds elements to the nearest integer, rounding ties to even.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Scale ¶
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPD, CPU Feature: AVX512
func (Float64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX
func (Float64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Float64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float64x4) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX
func (Float64x4) StoreMasked ¶
StoreMasked stores a Float64x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 float64s.
func (Float64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX
func (Float64x4) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements. Within each 128-bit group, for x = [x0, x1] and y = [y0, y1], the result is [x0-x1, y0-y1].
Asm: VHSUBPD, CPU Feature: AVX
func (Float64x4) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Float64x8 ¶
type Float64x8 struct {
// contains filtered or unexported fields
}
Float64x8 is a 512-bit SIMD vector of 8 float64s.
func BroadcastFloat64x8 ¶
BroadcastFloat64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadFloat64x8 ¶
LoadFloat64x8 loads a Float64x8 from an array.
func LoadFloat64x8Slice ¶
LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s.
func LoadFloat64x8SlicePart ¶
LoadFloat64x8SlicePart loads a Float64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.
func LoadMaskedFloat64x8 ¶
LoadMaskedFloat64x8 loads a Float64x8 from an array, at those elements enabled by mask.
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func (Float64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX512
func (Float64x8) AsFloat32x16 ¶
func (x Float64x8) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Float64x8) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Float64x8) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Float64x8) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Float64x8) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Float64x8) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Float64x8) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Float64x8) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
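As a rough per-element illustration of CeilScaled and CeilScaledResidue, the sketch below assumes that prec is the number of binary fraction bits kept, so values are rounded up to a multiple of 2^-prec. That reading follows the underlying instructions and is an assumption here; ceilScaled and ceilScaledResidue are hypothetical scalar helpers, not package functions.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled rounds x up to a multiple of 2^-prec.
// Assumption: prec counts the binary fraction bits to keep.
func ceilScaled(x float64, prec int) float64 {
	return math.Ldexp(math.Ceil(math.Ldexp(x, prec)), -prec)
}

// ceilScaledResidue is the difference between x and its rounded value.
func ceilScaledResidue(x float64, prec int) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.25, 1), ceilScaledResidue(1.25, 1))
}
```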
func (Float64x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VCOMPRESSPD, CPU Feature: AVX512
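A scalar model may make the packing order easier to see. In this sketch, compress is a hypothetical helper, the mask is modeled as [8]bool, and the unused upper elements are left zero, which is an assumption rather than something the description above states.

```go
package main

import "fmt"

// compress packs the mask-selected elements of x into the low positions.
// Assumption: positions past the last packed element are left zero.
func compress(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64
	n := 0 // next free low position
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}
	mask := [8]bool{false, true, false, true, true}
	fmt.Println(compress(x, mask))
}
```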
func (Float64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy are used from each element of indices.
Asm: VPERMI2PD, CPU Feature: AVX512
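The indexing rule can be sketched in scalar Go. For two 8-element vectors, xy has 16 elements, so 4 index bits are needed. The helper name concatPermute and the uint64 index type are assumptions made for illustration.

```go
package main

import "fmt"

// concatPermute picks each result element from the 16-element
// concatenation of x (indices 0-7) and y (indices 8-15).
func concatPermute(x, y [8]float64, indices [8]uint64) [8]float64 {
	var out [8]float64
	for i, idx := range indices {
		k := idx & 15 // only the 4 low bits are needed to index xy
		if k < 8 {
			out[i] = x[k]
		} else {
			out[i] = y[k-8]
		}
	}
	return out
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(concatPermute(x, y, [8]uint64{8, 0, 9, 1, 15, 7, 16, 3}))
}
```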
func (Float64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PS, CPU Feature: AVX512
func (Float64x8) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQ, CPU Feature: AVX512
func (Float64x8) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x8) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQ, CPU Feature: AVX512
func (Float64x8) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Expand ¶
Expand distributes the elements packed in the lower part of x to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
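Expand is the inverse of Compress, which a scalar model shows directly. In this sketch, expand is a hypothetical helper, the mask is modeled as [8]bool, and positions not enabled by the mask are left zero as an assumption.

```go
package main

import "fmt"

// expand hands out the low elements of x, in order, to the positions
// enabled by mask. Assumption: disabled positions are left zero.
func expand(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64
	n := 0 // next packed element of x to hand out
	for i, on := range mask {
		if on {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}
	mask := [8]bool{false, true, false, true, true}
	fmt.Println(expand(x, mask))
}
```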
func (Float64x8) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) IsNaN ¶
IsNaN returns a mask whose elements indicate whether the corresponding elements of x are NaN.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VMAXPD, CPU Feature: AVX512
func (Float64x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VMINPD, CPU Feature: AVX512
func (Float64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX512
func (Float64x8) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
func (Float64x8) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
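The alternating pattern of MulAddSub can be written out per element with math.FMA from the standard library. This is a scalar sketch only; mulAddSub is a hypothetical helper and [8]float64 stands in for Float64x8.

```go
package main

import (
	"fmt"
	"math"
)

// mulAddSub computes a fused x*y - z for odd-indexed elements and a
// fused x*y + z for even-indexed elements.
func mulAddSub(x, y, z [8]float64) [8]float64 {
	var out [8]float64
	for i := range out {
		if i%2 == 1 {
			out[i] = math.FMA(x[i], y[i], -z[i])
		} else {
			out[i] = math.FMA(x[i], y[i], z[i])
		}
	}
	return out
}

func main() {
	x := [8]float64{2, 2, 2, 2, 2, 2, 2, 2}
	y := [8]float64{3, 3, 3, 3, 3, 3, 3, 3}
	z := [8]float64{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(mulAddSub(x, y, z))
}
```

MulSubAdd follows the same shape with the two branches swapped.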
func (Float64x8) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512
func (Float64x8) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x8) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x8) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x8) Scale ¶
Scale multiplies each element of x by 2 raised to the power of the floor of the corresponding element in y.
Asm: VSCALEFPD, CPU Feature: AVX512
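For ordinary finite inputs, the per-element arithmetic of Scale can be sketched with math.Ldexp and math.Floor. scaleElem is a hypothetical helper, and the sketch does not attempt to model special cases such as NaN, infinities, or very large exponents.

```go
package main

import (
	"fmt"
	"math"
)

// scaleElem multiplies x by 2 raised to floor(y), for finite inputs
// whose floor fits comfortably in an int.
func scaleElem(x, y float64) float64 {
	return math.Ldexp(x, int(math.Floor(y)))
}

func main() {
	fmt.Println(scaleElem(3, 2.7), scaleElem(1, -1.5))
}
```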
func (Float64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant, this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
func (Float64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float64x8) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX512
func (Float64x8) StoreMasked ¶
StoreMasked stores a Float64x8 to an array, at those elements enabled by mask.
Asm: VMOVDQU64, CPU Feature: AVX512
func (Float64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 float64s.
func (Float64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX512
func (Float64x8) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Int16x16 ¶
type Int16x16 struct {
// contains filtered or unexported fields
}
Int16x16 is a 256-bit SIMD vector of 16 int16s.
func BroadcastInt16x16 ¶
BroadcastInt16x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt16x16 ¶
LoadInt16x16 loads an Int16x16 from an array.
func LoadInt16x16Slice ¶
LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s.
func LoadInt16x16SlicePart ¶
LoadInt16x16SlicePart loads an Int16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.
func (Int16x16) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX2
func (Int16x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2
func (Int16x16) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements. Within each 128-bit group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDW, CPU Feature: AVX2
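The grouping is easier to follow when written out. The scalar sketch below treats a 16-element vector as two 128-bit groups of 8 int16s each; addPairsGrouped is a hypothetical helper and [16]int16 stands in for Int16x16.

```go
package main

import "fmt"

// addPairsGrouped adds adjacent pairs within each 8-element group.
// Each result group holds the four pair sums from x, then the four
// pair sums from y.
func addPairsGrouped(x, y [16]int16) [16]int16 {
	var out [16]int16
	for g := 0; g < 16; g += 8 { // g is the first index of a 128-bit group
		for i := 0; i < 4; i++ {
			out[g+i] = x[g+2*i] + x[g+2*i+1]
			out[g+4+i] = y[g+2*i] + y[g+2*i+1]
		}
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i)
		y[i] = int16(100 + i)
	}
	fmt.Println(addPairsGrouped(x, y))
}
```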
func (Int16x16) AddPairsSaturatedGrouped ¶
AddPairsSaturatedGrouped horizontally adds adjacent pairs of elements with saturation. Within each 128-bit group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDSW, CPU Feature: AVX2
func (Int16x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX2
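Saturation means the sum is clamped to the int16 range instead of wrapping around. A per-element sketch, with addSaturated as a hypothetical scalar helper:

```go
package main

import (
	"fmt"
	"math"
)

// addSaturated adds two int16 values and clamps the result to the
// int16 range instead of wrapping.
func addSaturated(a, b int16) int16 {
	s := int32(a) + int32(b) // widen so the sum cannot overflow
	if s > math.MaxInt16 {
		return math.MaxInt16
	}
	if s < math.MinInt16 {
		return math.MinInt16
	}
	return int16(s)
}

func main() {
	fmt.Println(addSaturated(32767, 1), addSaturated(-32768, -1), addSaturated(100, 23))
}
```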
func (Int16x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int16x16) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Int16x16) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Int16x16) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Int16x16) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Int16x16) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Int16x16) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Int16x16) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x16) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGNW, CPU Feature: AVX2
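Put differently, each element of x is negated, zeroed, or kept depending on the sign of the corresponding element of y. A per-element sketch, with copySign as a hypothetical scalar helper:

```go
package main

import "fmt"

// copySign multiplies x by -1, 0, or 1 according to the sign of y.
func copySign(x, y int16) int16 {
	switch {
	case y < 0:
		return -x
	case y == 0:
		return 0
	default:
		return x
	}
}

func main() {
	fmt.Println(copySign(5, -3), copySign(5, 0), copySign(5, 9))
}
```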
func (Int16x16) DotProductPairs ¶
DotProductPairs multiplies corresponding elements of x and y and adds each adjacent pair of products, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX2
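A scalar model shows how 16 int16 inputs become 8 int32 outputs. dotProductPairs is a hypothetical helper; [16]int16 and [8]int32 stand in for the vector types.

```go
package main

import "fmt"

// dotProductPairs multiplies corresponding elements as 32-bit values
// and adds each adjacent pair of products.
func dotProductPairs(x, y [16]int16) [8]int32 {
	var out [8]int32
	for i := range out {
		lo := int32(x[2*i]) * int32(y[2*i])
		hi := int32(x[2*i+1]) * int32(y[2*i+1])
		out[i] = lo + hi
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i + 1)
		y[i] = 2
	}
	fmt.Println(dotProductPairs(x, y))
}
```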
func (Int16x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX2
func (Int16x16) Expand ¶
Expand distributes the elements packed in the lower part of x to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x16) ExtendToInt32 ¶
ExtendToInt32 sign-extends element values to int32.
Asm: VPMOVSXWD, CPU Feature: AVX512
func (Int16x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTW, CPU Feature: AVX2
func (Int16x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Int16x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2
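As a scalar illustration, the high half of each 128-bit subvector is elements 4 through 7 of each group of 8 int16s. The sketch assumes the element from x comes first in each interleaved pair, as in the underlying instruction; interleaveHiGrouped is a hypothetical helper.

```go
package main

import "fmt"

// interleaveHiGrouped interleaves elements 4-7 of each 8-element group
// of x and y. Assumption: the element from x precedes the one from y.
func interleaveHiGrouped(x, y [16]int16) [16]int16 {
	var out [16]int16
	for g := 0; g < 16; g += 8 { // g is the first index of a 128-bit group
		for i := 0; i < 4; i++ {
			out[g+2*i] = x[g+4+i]
			out[g+2*i+1] = y[g+4+i]
		}
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i)
		y[i] = int16(100 + i)
	}
	fmt.Println(interleaveHiGrouped(x, y))
}
```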
func (Int16x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2
func (Int16x16) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int16x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Int16x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Int16x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSW, CPU Feature: AVX2
func (Int16x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSW, CPU Feature: AVX2
func (Int16x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2
func (Int16x16) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX2
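For 16-bit elements, the high part is the upper 16 bits of the full 32-bit signed product. A per-element sketch, with mulHigh as a hypothetical scalar helper:

```go
package main

import "fmt"

// mulHigh returns the upper 16 bits of the 32-bit signed product.
func mulHigh(a, b int16) int16 {
	return int16((int32(a) * int32(b)) >> 16)
}

func main() {
	fmt.Println(mulHigh(16384, 4), mulHigh(100, 100), mulHigh(-1, 1))
}
```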
func (Int16x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Int16x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int16x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Int16x16) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise, a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2
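The result layout above can be reproduced with a short scalar model. permuteHiGrouped is a hypothetical helper, and the sketch assumes a through d are already in the range 0 to 3.

```go
package main

import "fmt"

// permuteHiGrouped keeps elements 0-3 of each 8-element group and
// rearranges elements 4-7 of that group using a, b, c, d (each 0-3).
func permuteHiGrouped(x [16]int16, a, b, c, d uint8) [16]int16 {
	out := x // the low four elements of every group are unchanged
	for g := 0; g < 16; g += 8 {
		out[g+4] = x[g+4+int(a)]
		out[g+5] = x[g+4+int(b)]
		out[g+6] = x[g+4+int(c)]
		out[g+7] = x[g+4+int(d)]
	}
	return out
}

func main() {
	var x [16]int16
	for i := range x {
		x[i] = int16(i)
	}
	fmt.Println(permuteHiGrouped(x, 3, 2, 1, 0))
}
```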
func (Int16x16) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise, a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2
func (Int16x16) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation.
Asm: VPMOVSWB, CPU Feature: AVX512
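Signed saturation clamps each value to the int8 range, whereas truncation keeps only the low 8 bits. A per-element sketch contrasting the two, with saturateToInt8 as a hypothetical scalar helper:

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt8 clamps v to the int8 range.
func saturateToInt8(v int16) int8 {
	if v > math.MaxInt8 {
		return math.MaxInt8
	}
	if v < math.MinInt8 {
		return math.MinInt8
	}
	return int8(v)
}

func main() {
	v := int16(300)
	// Saturation clamps; a plain Go conversion keeps the low 8 bits.
	fmt.Println(saturateToInt8(v), int8(v))
}
```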
func (Int16x16) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
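The example above can be checked against a scalar model. select128FromPair is a hypothetical helper that treats x and y as four 8-element blocks and assumes lo and hi are in the range 0 to 3.

```go
package main

import "fmt"

// select128FromPair views x and y as four 128-bit blocks (x low, x high,
// y low, y high) and concatenates blocks lo and hi.
func select128FromPair(x, y [16]int16, lo, hi uint8) [16]int16 {
	var xy [32]int16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]int16
	copy(out[:8], xy[int(lo)*8:int(lo)*8+8])
	copy(out[8:], xy[int(hi)*8:int(hi)*8+8])
	return out
}

func main() {
	x := [16]int16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]int16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPair(x, y, 3, 0))
}
```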
func (Int16x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int16x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int16x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX2
func (Int16x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
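One way to picture the operation is as a shift of the double-width value formed by placing x above y. The per-element sketch below covers shift counts from 0 to 15 only and does not model how larger counts are handled; shiftLeftConcat is a hypothetical scalar helper.

```go
package main

import "fmt"

// shiftLeftConcat shifts x left and fills the emptied low bits with the
// upper bits of y. Valid for shift counts 0 through 15 in this sketch.
func shiftLeftConcat(x, y int16, shift uint8) int16 {
	wide := uint32(uint16(x))<<16 | uint32(uint16(y)) // x in the high half, y in the low half
	return int16(uint16((wide << shift) >> 16))
}

func main() {
	// The top bit of y (0x8000) moves into the low bit of the result.
	fmt.Println(shiftLeftConcat(1, -32768, 1))
}
```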
func (Int16x16) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAW, CPU Feature: AVX2
func (Int16x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x16) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int16s.
func (Int16x16) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2
func (Int16x16) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements. Within each 128-bit group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBW, CPU Feature: AVX2
func (Int16x16) SubPairsSaturatedGrouped ¶
SubPairsSaturatedGrouped horizontally subtracts adjacent pairs of elements with saturation. Within each 128-bit group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBSW, CPU Feature: AVX2
func (Int16x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX2
func (Int16x16) ToMask ¶
ToMask converts from Int16x16 to Mask16x16; a mask element is set to true when the corresponding vector element is non-zero.
func (Int16x16) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8.
Asm: VPMOVWB, CPU Feature: AVX512
type Int16x32 ¶
type Int16x32 struct {
// contains filtered or unexported fields
}
Int16x32 is a 512-bit SIMD vector of 32 int16s.
func BroadcastInt16x32 ¶
BroadcastInt16x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512BW
func LoadInt16x32 ¶
LoadInt16x32 loads an Int16x32 from an array.
func LoadInt16x32Slice ¶
LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s.
func LoadInt16x32SlicePart ¶
LoadInt16x32SlicePart loads an Int16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.
func LoadMaskedInt16x32 ¶
LoadMaskedInt16x32 loads an Int16x32 from an array, at those elements enabled by mask.
Asm: VMOVDQU16.Z, CPU Feature: AVX512
func (Int16x32) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX512
func (Int16x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512
func (Int16x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX512
func (Int16x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int16x32) AsFloat32x16 ¶
func (x Int16x32) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Int16x32) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Int16x32) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Int16x32) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Int16x32) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Int16x32) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Int16x32) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Int16x32) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x32) DotProductPairs ¶
DotProductPairs multiplies corresponding elements of x and y and adds each adjacent pair of products, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX512
func (Int16x32) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX512
func (Int16x32) Expand ¶
Expand distributes the elements packed in the lower part of x to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x32) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTW, CPU Feature: AVX512
func (Int16x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512
func (Int16x32) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512
func (Int16x32) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSW, CPU Feature: AVX512
func (Int16x32) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSW, CPU Feature: AVX512
func (Int16x32) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512
func (Int16x32) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX512
func (Int16x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int16x32) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
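A scalar model of the permutation, including the masking of each index to its low 5 bits, might look like the following. permute is a hypothetical helper, and the uint16 index element type is an assumption made for illustration.

```go
package main

import "fmt"

// permute builds the result by looking up x at each index, using only
// the low 5 bits (values 0-31) of that index.
func permute(x [32]int16, indices [32]uint16) [32]int16 {
	var out [32]int16
	for i, idx := range indices {
		out[i] = x[idx&31]
	}
	return out
}

func main() {
	var x [32]int16
	var rev [32]uint16
	for i := range x {
		x[i] = int16(i * 10)
		rev[i] = uint16(31 - i) // reverse the vector
	}
	fmt.Println(permute(x, rev))
}
```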
func (Int16x32) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise, a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Int16x32) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Parameters a, b, c, d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise, a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Int16x32) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int16x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int16x32) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX512
func (Int16x32) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAW, CPU Feature: AVX512
func (Int16x32) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x32) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x32) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x32) StoreMasked ¶
StoreMasked stores an Int16x32 to an array, at those elements enabled by mask.
Asm: VMOVDQU16, CPU Feature: AVX512
func (Int16x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 int16s.
func (Int16x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512
func (Int16x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX512
func (Int16x32) ToMask ¶
ToMask converts from Int16x32 to Mask16x32; a mask element is set to true when the corresponding vector element is non-zero.
func (Int16x32) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8.
Asm: VPMOVWB, CPU Feature: AVX512
type Int16x8 ¶
type Int16x8 struct {
// contains filtered or unexported fields
}
Int16x8 is a 128-bit SIMD vector of 8 int16s.
func BroadcastInt16x8 ¶
BroadcastInt16x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt16x8 ¶
LoadInt16x8 loads an Int16x8 from an array.
func LoadInt16x8Slice ¶
LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s.
func LoadInt16x8SlicePart ¶
LoadInt16x8SlicePart loads an Int16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.
func (Int16x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDW, CPU Feature: AVX
func (Int16x8) AddPairsSaturated ¶
AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDSW, CPU Feature: AVX
func (Int16x8) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX
func (Int16x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int16x8) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Int16x8) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Int16x8) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Int16x8) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Int16x8) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Int16x8) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Int16x8) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Int16x8) Broadcast1To32 ¶
Broadcast1To32 copies the lowest element of its input to all 32 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512
func (Int16x8) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Int16x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits needed to index xy are used from each element of indices.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x8) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGNW, CPU Feature: AVX
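Read element by element, the description amounts to multiplying x by the sign of y. A scalar sketch, with the hypothetical name copySign and arrays standing in for the vector type:

```go
package main

import "fmt"

// copySign is a hypothetical scalar model: each element of x is negated,
// zeroed, or kept depending on whether y is negative, zero, or positive.
func copySign(x, y [8]int16) [8]int16 {
	var r [8]int16
	for i := range x {
		switch {
		case y[i] < 0:
			r[i] = -x[i] // int16 negation wraps for -32768
		case y[i] == 0:
			r[i] = 0
		default:
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [8]int16{5, 5, 5, -7, -7, -7, 1, 0}
	y := [8]int16{3, 0, -2, 9, 0, -1, -32768, 4}
	fmt.Println(copySign(x, y))
}
```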
func (Int16x8) DotProductPairs ¶
DotProductPairs multiplies corresponding elements and adds adjacent pairs of products together, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX
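A scalar sketch of the pairing, assuming x and y are multiplied element by element before adjacent products are summed. The name dotProductPairs and the array types are illustrative only.

```go
package main

import "fmt"

// dotProductPairs is a hypothetical scalar model: 8 int16 inputs per operand
// produce 4 int32 results, each the sum of two adjacent widened products.
func dotProductPairs(x, y [8]int16) [4]int32 {
	var r [4]int32
	for i := range r {
		lo := int32(x[2*i]) * int32(y[2*i])
		hi := int32(x[2*i+1]) * int32(y[2*i+1])
		r[i] = lo + hi
	}
	return r
}

func main() {
	x := [8]int16{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int16{1, 1, 2, 2, 3, 3, -1, -1}
	fmt.Println(dotProductPairs(x, y))
}
```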
func (Int16x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX
func (Int16x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its low-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x8) ExtendLo2ToInt64 ¶
ExtendLo2ToInt64 sign-extends 2 lowest vector element values to int64.
Asm: VPMOVSXWQ, CPU Feature: AVX
func (Int16x8) ExtendLo4ToInt32 ¶
ExtendLo4ToInt32 sign-extends 4 lowest vector element values to int32.
Asm: VPMOVSXWD, CPU Feature: AVX
func (Int16x8) ExtendLo4ToInt64 ¶
ExtendLo4ToInt64 sign-extends 4 lowest vector element values to int64.
Asm: VPMOVSXWQ, CPU Feature: AVX2
func (Int16x8) ExtendToInt32 ¶
ExtendToInt32 sign-extends element values to int32.
Asm: VPMOVSXWD, CPU Feature: AVX2
func (Int16x8) ExtendToInt64 ¶
ExtendToInt64 sign-extends element values to int64.
Asm: VPMOVSXWQ, CPU Feature: AVX512
func (Int16x8) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512
func (Int16x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTW, CPU Feature: AVX
func (Int16x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Int16x8) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX
func (Int16x8) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX
func (Int16x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int16x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Int16x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Int16x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSW, CPU Feature: AVX
func (Int16x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSW, CPU Feature: AVX
func (Int16x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX
func (Int16x8) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX
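For 16-bit elements, the high part is the upper 16 bits of the signed 32-bit product. The following scalar sketch illustrates this; mulHigh is a hypothetical name, not part of the package.

```go
package main

import "fmt"

// mulHigh is a hypothetical scalar model: each pair of int16 elements is
// multiplied at 32-bit width and only the upper 16 bits are kept.
func mulHigh(x, y [8]int16) [8]int16 {
	var r [8]int16
	for i := range x {
		p := int32(x[i]) * int32(y[i])
		r[i] = int16(p >> 16) // arithmetic shift keeps the sign
	}
	return r
}

func main() {
	x := [8]int16{16384, -1, 300, 32767, 0, 0, 0, 0}
	y := [8]int16{4, 1, 300, 32767, 0, 0, 0, 0}
	fmt.Println(mulHigh(x, y))
}
```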
func (Int16x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Int16x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int16x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Int16x8) PermuteScalarsHi ¶
PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Int16x8) PermuteScalarsLo ¶
PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
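PermuteScalarsLo and PermuteScalarsHi each rearrange one half of the vector and pass the other half through unchanged. A scalar sketch of both result formulas follows. The function names and the uint8 parameter type are assumptions, and masking out-of-range selectors with &3 is a choice of this model only.

```go
package main

import "fmt"

// permuteScalarsLo is a hypothetical scalar model: the low four elements are
// chosen by a, b, c, d and the high four are copied unchanged.
func permuteScalarsLo(x [8]int16, a, b, c, d uint8) [8]int16 {
	r := x
	r[0], r[1], r[2], r[3] = x[a&3], x[b&3], x[c&3], x[d&3]
	return r
}

// permuteScalarsHi is the mirror image: the low four elements are copied and
// the high four are chosen from x[4..7].
func permuteScalarsHi(x [8]int16, a, b, c, d uint8) [8]int16 {
	r := x
	r[4], r[5], r[6], r[7] = x[4+(a&3)], x[4+(b&3)], x[4+(c&3)], x[4+(d&3)]
	return r
}

func main() {
	x := [8]int16{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(permuteScalarsLo(x, 3, 2, 1, 0))
	fmt.Println(permuteScalarsHi(x, 3, 3, 0, 0))
}
```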
func (Int16x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x8) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX
func (Int16x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX
func (Int16x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAW, CPU Feature: AVX
func (Int16x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int16s.
func (Int16x8) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX
func (Int16x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBW, CPU Feature: AVX
func (Int16x8) SubPairsSaturated ¶
SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBSW, CPU Feature: AVX
func (Int16x8) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX
func (Int16x8) ToMask ¶
ToMask converts from Int16x8 to Mask16x8. A mask element is set to true when the corresponding vector element is non-zero.
func (Int16x8) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVWB, CPU Feature: AVX512
type Int32x16 ¶
type Int32x16 struct {
// contains filtered or unexported fields
}
Int32x16 is a 512-bit SIMD vector of 16 int32s.
func BroadcastInt32x16 ¶
BroadcastInt32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadInt32x16 ¶
LoadInt32x16 loads an Int32x16 from an array.
func LoadInt32x16Slice ¶
LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s.
func LoadInt32x16SlicePart ¶
LoadInt32x16SlicePart loads an Int32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.
func LoadMaskedInt32x16 ¶
LoadMaskedInt32x16 loads an Int32x16 from an array, at those elements enabled by mask.
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func (Int32x16) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX512
func (Int32x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512
func (Int32x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int32x16) AsFloat32x16 ¶
func (x Int32x16) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Int32x16) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Int32x16) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Int32x16) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Int32x16) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Int32x16) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Int32x16) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Int32x16) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512
func (Int32x16) ConvertToFloat32 ¶
func (x Int32x16) ConvertToFloat32() Float32x16
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX512
func (Int32x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX512
func (Int32x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its low-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Int32x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTD, CPU Feature: AVX512
func (Int32x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512
func (Int32x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512
func (Int32x16) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSD, CPU Feature: AVX512
func (Int32x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSD, CPU Feature: AVX512
func (Int32x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512
func (Int32x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int32x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512
func (Int32x16) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX512
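The grouped form applies the same four selectors inside every 128-bit group of four int32s. A scalar sketch of the result formula above follows. The name, the array type, and the &3 masking are assumptions of this model.

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model: the selectors a, b,
// c, d are applied independently within each group of four elements.
func permuteScalarsGrouped(x [16]int32, a, b, c, d uint8) [16]int32 {
	var r [16]int32
	sel := [4]int{int(a & 3), int(b & 3), int(c & 3), int(d & 3)}
	for g := 0; g < 16; g += 4 { // g is the first index of each group
		for i, s := range sel {
			r[g+i] = x[g+s]
		}
	}
	return r
}

func main() {
	var x [16]int32
	for i := range x {
		x[i] = int32(i)
	}
	// Reverses the elements within each group of four.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
}
```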
func (Int32x16) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x16) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Int32x16) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x16) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Int32x16) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x16) SaturateToInt16ConcatGrouped ¶
SaturateToInt16ConcatGrouped converts element values to int16 with signed saturation. Treating each 128-bit block as a group, the converted elements from x are packed into the lower part of the group in the result vector, and the converted elements from y are packed into the upper part of the group.
Asm: VPACKSSDW, CPU Feature: AVX512
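The grouping is easiest to see in index form: each 128-bit group of the result holds 8 int16s, the low 4 taken from the matching group of x and the high 4 from the matching group of y. A scalar sketch follows; the names and array types are illustrative and not part of the package.

```go
package main

import "fmt"

// sat16 clamps a 32-bit value to the int16 range.
func sat16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// saturateToInt16ConcatGrouped is a hypothetical scalar model of the
// per-group packing of two 16-element int32 inputs into 32 int16 results.
func saturateToInt16ConcatGrouped(x, y [16]int32) [32]int16 {
	var r [32]int16
	for g := 0; g < 4; g++ { // four 128-bit groups
		for i := 0; i < 4; i++ {
			r[8*g+i] = sat16(x[4*g+i])   // lower part of the group
			r[8*g+4+i] = sat16(y[4*g+i]) // upper part of the group
		}
	}
	return r
}

func main() {
	var x, y [16]int32
	for i := range x {
		x[i] = int32(i) * 20000
		y[i] = -int32(i) * 20000
	}
	fmt.Println(saturateToInt16ConcatGrouped(x, y))
}
```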
func (Int32x16) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x16) SaturateToUint16ConcatGrouped ¶
SaturateToUint16ConcatGrouped converts element values to uint16 with unsigned saturation. Treating each 128-bit block as a group, the converted elements from x are packed into the lower part of the group in the result vector, and the converted elements from y are packed into the upper part of the group.
Asm: VPACKUSDW, CPU Feature: AVX512
func (Int32x16) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Int32x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int32x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int32x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX512
func (Int32x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
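For a single 32-bit element, the concatenating shift can be sketched in scalar Go as below. shiftLeftConcat32 is a hypothetical helper that models one lane; the vector method applies the same rule to every element.

```go
package main

import "fmt"

// shiftLeftConcat32 is a hypothetical one-lane model: x is shifted left by
// the low 5 bits of shift and the vacated low bits are filled from the top
// bits of y.
func shiftLeftConcat32(x, y int32, shift uint8) int32 {
	s := uint(shift & 31) // only the lower 5 bits are used
	ux, uy := uint32(x), uint32(y)
	// For s == 0, uy>>32 is 0 in Go, so the result is x unchanged.
	return int32(ux<<s | uy>>(32-s))
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat32(0x00001234, 0x7F000000, 8))
	fmt.Printf("%#x\n", shiftLeftConcat32(0x00001234, 0x7F000000, 40)) // 40&31 == 8
}
```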
func (Int32x16) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAD, CPU Feature: AVX512
func (Int32x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX512
func (Int32x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x16) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVD, CPU Feature: AVX512
func (Int32x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x16) StoreMasked ¶
StoreMasked stores an Int32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512
func (Int32x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int32s.
func (Int32x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
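Together with LoadInt32x16SlicePart, this method allows a loop to handle a final chunk shorter than 16 elements without a separate scalar tail. The sketch below models both operations with plain arrays to show the pattern. loadSlicePart, storeSlicePart, and incrementAll are hypothetical names, and the inner loop stands in for a vector operation.

```go
package main

import "fmt"

// loadSlicePart models the partial load: missing elements become zero.
func loadSlicePart(s []int32) [16]int32 {
	var v [16]int32
	copy(v[:], s) // copies min(len(s), 16) elements
	return v
}

// storeSlicePart models the partial store: only as many elements as fit in
// s are written.
func storeSlicePart(v [16]int32, s []int32) {
	copy(s, v[:])
}

// incrementAll adds 1 to every element of s, 16 elements at a time.
func incrementAll(s []int32) {
	for len(s) > 0 {
		v := loadSlicePart(s)
		for i := range v {
			v[i]++ // stands in for a vector Add
		}
		storeSlicePart(v, s)
		if len(s) < 16 {
			return // the short final chunk has been handled
		}
		s = s[16:]
	}
}

func main() {
	data := make([]int32, 20)
	incrementAll(data)
	fmt.Println(data)
}
```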
func (Int32x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512
func (Int32x16) ToMask ¶
ToMask converts from Int32x16 to Mask32x16. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x16) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x16) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8.
Asm: VPMOVDB, CPU Feature: AVX512
type Int32x4 ¶
type Int32x4 struct {
// contains filtered or unexported fields
}
Int32x4 is a 128-bit SIMD vector of 4 int32s.
func BroadcastInt32x4 ¶
BroadcastInt32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt32x4 ¶
LoadInt32x4 loads an Int32x4 from an array.
func LoadInt32x4Slice ¶
LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s.
func LoadInt32x4SlicePart ¶
LoadInt32x4SlicePart loads an Int32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.
func LoadMaskedInt32x4 ¶
LoadMaskedInt32x4 loads an Int32x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDD, CPU Feature: AVX
func (Int32x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int32x4) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Int32x4) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Int32x4) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Int32x4) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Int32x4) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Int32x4) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Int32x4) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512
func (Int32x4) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Int32x4) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Int32x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512
func (Int32x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX
func (Int32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX
func (Int32x4) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGND, CPU Feature: AVX
func (Int32x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX
func (Int32x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its low-indexed positions. The packed elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
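Expand is the inverse of Compress: consecutive low elements of x are placed at the positions the mask selects. A scalar sketch follows. The name expand and the [4]bool stand-in for the mask type are hypothetical, and zeroing the unselected positions is an assumption of this model that the text does not state.

```go
package main

import "fmt"

// expand is a hypothetical scalar model: the n-th selected position of the
// result receives the n-th low element of x.
func expand(x [4]int32, mask [4]bool) [4]int32 {
	var r [4]int32 // assumption: unselected positions stay zero
	n := 0
	for i, selected := range mask {
		if selected {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [4]int32{7, 8, 9, 10}
	mask := [4]bool{false, true, false, true}
	fmt.Println(expand(x, mask))
}
```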
func (Int32x4) ExtendLo2ToInt64 ¶
ExtendLo2ToInt64 sign-extends 2 lowest vector element values to int64.
Asm: VPMOVSXDQ, CPU Feature: AVX
func (Int32x4) ExtendToInt64 ¶
ExtendToInt64 sign-extends element values to int64.
Asm: VPMOVSXDQ, CPU Feature: AVX2
func (Int32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Int32x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTD, CPU Feature: AVX
func (Int32x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Int32x4) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX
func (Int32x4) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX
func (Int32x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int32x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Int32x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Int32x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSD, CPU Feature: AVX
func (Int32x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSD, CPU Feature: AVX
func (Int32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX
func (Int32x4) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1[2*i] * v2[2*i].
Asm: VPMULDQ, CPU Feature: AVX
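The formula Result[i] = v1[2*i] * v2[2*i] can be sketched in scalar Go as follows. mulEvenWiden is a hypothetical name, and arrays stand in for the vector types. Four int32 inputs per operand yield two int64 products.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: elements 0 and 2 of each input
// are multiplied at 64-bit width; elements 1 and 3 are ignored.
func mulEvenWiden(v1, v2 [4]int32) [2]int64 {
	var r [2]int64
	for i := range r {
		r[i] = int64(v1[2*i]) * int64(v2[2*i])
	}
	return r
}

func main() {
	v1 := [4]int32{100000, 9, -3, 9}
	v2 := [4]int32{100000, 9, 7, 9}
	fmt.Println(mulEvenWiden(v1, v2))
}
```

The first product, 10000000000, does not fit in an int32, which is why the result elements are widened.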
func (Int32x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Int32x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int32x4) PermuteScalars ¶
PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined; otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX
func (Int32x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Int32x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Int32x4) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x4) SaturateToInt16Concat ¶
SaturateToInt16Concat converts element values to int16 with signed saturation. The converted elements from x are packed into the lower part of the result vector, and the converted elements from y are packed into the upper part.
Asm: VPACKSSDW, CPU Feature: AVX
func (Int32x4) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x4) SaturateToUint16Concat ¶
SaturateToUint16Concat converts element values to uint16 with unsigned saturation. The converted elements from x are packed into the lower part of the result vector, and the converted elements from y are packed into the upper part.
Asm: VPACKUSDW, CPU Feature: AVX
func (Int32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81})
returns {4,8,25,81}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
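The selector rule and the example above can be checked with a scalar sketch. selectFromPair here is a hypothetical plain-Go function that mirrors the documented argument order. It is not the method itself, and the &7 masking is an assumption of the model.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model: selectors 0-3 pick from x
// and selectors 4-7 pick elements 0-3 of y.
func selectFromPair(x [4]int32, a, b, c, d uint8, y [4]int32) [4]int32 {
	xy := [8]int32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	return [4]int32{xy[a&7], xy[b&7], xy[c&7], xy[d&7]}
}

func main() {
	x := [4]int32{1, 2, 4, 8}
	y := [4]int32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // prints [4 8 25 81]
}
```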
func (Int32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Int32x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX
func (Int32x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAD, CPU Feature: AVX
func (Int32x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX2
func (Int32x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVD, CPU Feature: AVX2
func (Int32x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x4) StoreMasked ¶
StoreMasked stores an Int32x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 int32s.
func (Int32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Int32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX
func (Int32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBD, CPU Feature: AVX
func (Int32x4) ToMask ¶
ToMask converts from Int32x4 to Mask32x4. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x4) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x4) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Int32x8 ¶
type Int32x8 struct {
// contains filtered or unexported fields
}
Int32x8 is a 256-bit SIMD vector of 8 int32s.
func BroadcastInt32x8 ¶
BroadcastInt32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt32x8 ¶
LoadInt32x8 loads an Int32x8 from an array.
func LoadInt32x8Slice ¶
LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s.
func LoadInt32x8SlicePart ¶
LoadInt32x8SlicePart loads an Int32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.
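The zero-fill behavior of a partial load can be pictured with a plain array. This is a scalar sketch only, using a hypothetical helper loadSlicePart rather than the package function.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of a partial load:
// it copies up to 8 elements from s and leaves the rest as zero.
func loadSlicePart(s []int32) [8]int32 {
	var v [8]int32
	copy(v[:], s) // copy stops at the shorter of the two lengths
	return v
}

func main() {
	fmt.Println(loadSlicePart([]int32{1, 2, 3}))
}
```

A partial load of this kind is typically useful for the tail of a loop, where fewer than 8 elements remain.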
func LoadMaskedInt32x8 ¶
LoadMaskedInt32x8 loads an Int32x8 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x8) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX2
func (Int32x8) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements, treating each 128-bit half as a separate group. Within each group, for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDD, CPU Feature: AVX2
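The grouping can be easier to see with all 8 elements written out. The sketch below is a scalar model in plain Go with a hypothetical helper name; it is not the package implementation.

```go
package main

import "fmt"

// addPairsGrouped is a hypothetical scalar model for 8 int32 elements.
// Each 128-bit group holds 4 elements and is processed on its own.
func addPairsGrouped(x, y [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 8; g += 4 { // g is the first index of a 128-bit group
		r[g+0] = x[g+0] + x[g+1]
		r[g+1] = x[g+2] + x[g+3]
		r[g+2] = y[g+0] + y[g+1]
		r[g+3] = y[g+2] + y[g+3]
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped(x, y))
}
```

Note that sums from x and y alternate per group, so the result is not simply "all x pairs, then all y pairs".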
func (Int32x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int32x8) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Int32x8) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Int32x8) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Int32x8) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Int32x8) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Int32x8) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Int32x8) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Int32x8) Compress ¶
Compress selects the elements of x enabled by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices needed to index xy are used.
Asm: VPERMI2D, CPU Feature: AVX512
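A scalar sketch of this two-source permutation for 8 elements follows. It is written in plain Go with a hypothetical helper; the index element type (uint32 here) is an assumption made for the example.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model: indices select from the
// 16-element concatenation of x (positions 0-7) and y (positions 8-15).
func concatPermute(x, y [8]int32, indices [8]uint32) [8]int32 {
	var xy [16]int32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]int32
	for i, idx := range indices {
		r[i] = xy[idx&15] // only the 4 bits needed to index 16 elements are used
	}
	return r
}

func main() {
	x := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int32{100, 101, 102, 103, 104, 105, 106, 107}
	fmt.Println(concatPermute(x, y, [8]uint32{0, 8, 1, 9, 15, 7, 16, 31}))
}
```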
func (Int32x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX
func (Int32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX512
func (Int32x8) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGND, CPU Feature: AVX2
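Put differently, each element of x is negated, zeroed, or kept depending on the sign of the corresponding element of y. A scalar sketch of one lane, with a hypothetical helper name:

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of one lane:
// multiply x by -1, 0, or 1 according to the sign of y.
func copySign(x, y int32) int32 {
	switch {
	case y < 0:
		return -x
	case y == 0:
		return 0
	default:
		return x
	}
}

func main() {
	fmt.Println(copySign(5, -3), copySign(5, 0), copySign(-5, 7))
}
```

Unlike a floating-point copysign, a zero in y zeroes the result rather than leaving the magnitude of x intact.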
func (Int32x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX2
func (Int32x8) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
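Compress and Expand are inverses of each other on the enabled positions. The scalar sketch below models both for 8 elements in plain Go. The helper names and the [8]bool mask are hypothetical, and it assumes that positions not written are zero, which the descriptions above do not state.

```go
package main

import "fmt"

// compress packs the elements of x enabled by mask into the low positions.
// Assumption: remaining positions are zero.
func compress(x [8]int32, mask [8]bool) [8]int32 {
	var r [8]int32
	n := 0
	for i, on := range mask {
		if on {
			r[n] = x[i]
			n++
		}
	}
	return r
}

// expand distributes the packed low elements of x to the positions
// enabled by mask, from the lowest mask element upward.
// Assumption: disabled positions are zero.
func expand(x [8]int32, mask [8]bool) [8]int32 {
	var r [8]int32
	n := 0
	for i, on := range mask {
		if on {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]int32{10, 20, 30, 40, 50, 60, 70, 80}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	packed := compress(x, mask)
	fmt.Println(packed)
	fmt.Println(expand(packed, mask))
}
```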
func (Int32x8) ExtendToInt64 ¶
ExtendToInt64 sign-extends element values to int64.
Asm: VPMOVSXDQ, CPU Feature: AVX512
func (Int32x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTD, CPU Feature: AVX2
func (Int32x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Int32x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2
func (Int32x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2
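The grouped interleaves can be sketched in scalar Go as follows. The helper is hypothetical, and the element order within each pair (x first, then y) is an assumption based on the behavior of the underlying unpack instructions.

```go
package main

import "fmt"

// interleaveGrouped is a hypothetical scalar model for 8 int32 elements.
// For each 128-bit group of 4 elements it interleaves either the low two
// or the high two elements of x and y.
func interleaveGrouped(x, y [8]int32, hi bool) [8]int32 {
	off := 0
	if hi {
		off = 2 // start at the high half of each group
	}
	var r [8]int32
	for g := 0; g < 8; g += 4 {
		r[g+0] = x[g+off]
		r[g+1] = y[g+off]
		r[g+2] = x[g+off+1]
		r[g+3] = y[g+off+1]
	}
	return r
}

func main() {
	x := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int32{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveGrouped(x, y, false))
	fmt.Println(interleaveGrouped(x, y, true))
}
```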
func (Int32x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int32x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Int32x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Int32x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSD, CPU Feature: AVX2
func (Int32x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSD, CPU Feature: AVX2
func (Int32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2
func (Int32x8) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1[2*i] * v2[2*i].
Asm: VPMULDQ, CPU Feature: AVX2
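A scalar sketch of the widening multiply, using a hypothetical helper in plain Go: only the even-indexed int32 elements take part, and each product is kept at full 64-bit width.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: result[i] is the 64-bit
// product of the even-indexed elements x[2*i] and y[2*i].
func mulEvenWiden(x, y [8]int32) [4]int64 {
	var r [4]int64
	for i := range r {
		r[i] = int64(x[2*i]) * int64(y[2*i]) // widen before multiplying
	}
	return r
}

func main() {
	x := [8]int32{1 << 30, 9, -3, 9, 5, 9, 0, 9}
	y := [8]int32{4, 9, 7, 9, -5, 9, 1, 9}
	fmt.Println(mulEvenWiden(x, y))
}
```

The first product, 2^30 * 4, does not fit in an int32, which is the case the widening is meant to handle.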
func (Int32x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Int32x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int32x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2
func (Int32x8) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX2
func (Int32x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
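For reference, the per-element effect of the two whole-vector rotates matches math/bits.RotateLeft32 applied to every lane. The sketch below uses hypothetical helpers and models the lanes as uint32 bit patterns; the type of the shift argument is an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft is a hypothetical scalar model: every lane is rotated
// left by the same number of bits.
func rotateAllLeft(x [8]uint32, shift int) [8]uint32 {
	for i := range x {
		x[i] = bits.RotateLeft32(x[i], shift)
	}
	return x // x is a copy, so the caller's array is untouched
}

// rotateAllRight rotates every lane right by the same number of bits.
func rotateAllRight(x [8]uint32, shift int) [8]uint32 {
	for i := range x {
		x[i] = bits.RotateLeft32(x[i], -shift)
	}
	return x
}

func main() {
	x := [8]uint32{0x80000001, 1, 2, 3, 4, 5, 6, 7}
	fmt.Printf("%#x\n", rotateAllLeft(x, 1))
	fmt.Printf("%#x\n", rotateAllRight(x, 1))
}
```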
func (Int32x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Int32x8) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x8) SaturateToInt16ConcatGrouped ¶
SaturateToInt16ConcatGrouped converts element values to int16 with signed saturation, treating each 128-bit half as a separate group. Within each group, the converted elements from x are packed into the lower part of the group in the result vector, and the converted elements from y are packed into the upper part.
Asm: VPACKSSDW, CPU Feature: AVX2
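The resulting element layout can be sketched in scalar Go. The helper names below are hypothetical and the code only models the description above for 8-element inputs producing 16 int16 values.

```go
package main

import (
	"fmt"
	"math"
)

// sat16 clamps v to the int16 range (signed saturation).
func sat16(v int32) int16 {
	if v > math.MaxInt16 {
		return math.MaxInt16
	}
	if v < math.MinInt16 {
		return math.MinInt16
	}
	return int16(v)
}

// saturateToInt16ConcatGrouped is a hypothetical scalar model. Each
// 128-bit group of the result holds 8 int16s: 4 from x, then 4 from y.
func saturateToInt16ConcatGrouped(x, y [8]int32) [16]int16 {
	var r [16]int16
	for g := 0; g < 2; g++ {
		for i := 0; i < 4; i++ {
			r[8*g+i] = sat16(x[4*g+i])
			r[8*g+4+i] = sat16(y[4*g+i])
		}
	}
	return r
}

func main() {
	x := [8]int32{1, -1, 40000, -40000, 5, 6, 7, 8}
	y := [8]int32{100, 200, 300, 400, 70000, -70000, 0, 9}
	fmt.Println(saturateToInt16ConcatGrouped(x, y))
}
```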
func (Int32x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x8) SaturateToUint16ConcatGrouped ¶
SaturateToUint16ConcatGrouped converts element values to uint16 with unsigned saturation, treating each 128-bit half as a separate group. Within each group, the converted elements from x are packed into the lower part of the group in the result vector, and the converted elements from y are packed into the upper part.
Asm: VPACKUSDW, CPU Feature: AVX2
func (Int32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
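The example above can be reproduced with a scalar model. The sketch below is plain Go with a hypothetical helper; its argument order mirrors the example call, and the selector type (uint8) is an assumption.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model. x and y together form
// four 128-bit elements (4 int32s each): 0 and 1 from x, 2 and 3 from y.
func select128FromPair(x [8]int32, lo, hi uint8, y [8]int32) [8]int32 {
	if lo > 3 || hi > 3 {
		panic("selector out of range")
	}
	var xy [16]int32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]int32
	copy(r[:4], xy[4*int(lo):]) // low half of the result
	copy(r[4:], xy[4*int(hi):]) // high half of the result
	return r
}

func main() {
	x := [8]int32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]int32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y))
}
```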
func (Int32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest-indexed element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
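The example above can be checked against a scalar model. This sketch is plain Go with a hypothetical helper; the selector type and the panic on out-of-range selectors are assumptions, since the text does not define that case.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model. The same four
// selectors are applied to each 128-bit half: 0-3 pick from x, 4-7 from y.
func selectFromPairGrouped(x [8]int32, a, b, c, d uint8, y [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 8; g += 4 { // one pass per 128-bit half
		for i, s := range [4]uint8{a, b, c, d} {
			switch {
			case s < 4:
				r[g+i] = x[g+int(s)]
			case s < 8:
				r[g+i] = y[g+int(s)-4]
			default:
				panic("selector out of range") // assumption, see above
			}
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]int32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y))
}
```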
func (Int32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int32x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX2
func (Int32x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Int32x8) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAD, CPU Feature: AVX2
func (Int32x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX2
func (Int32x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x8) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVD, CPU Feature: AVX2
func (Int32x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x8) StoreMasked ¶
StoreMasked stores an Int32x8 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int32s.
func (Int32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Int32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2
func (Int32x8) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements, treating each 128-bit half as a separate group. Within each group, for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBD, CPU Feature: AVX2
func (Int32x8) ToMask ¶
ToMask converts from Int32x8 to Mask32x8. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x8) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x8) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Int64x2 ¶
type Int64x2 struct {
// contains filtered or unexported fields
}
Int64x2 is a 128-bit SIMD vector of 2 int64s.
func BroadcastInt64x2 ¶
BroadcastInt64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt64x2 ¶
LoadInt64x2 loads an Int64x2 from an array.
func LoadInt64x2Slice ¶
LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s.
func LoadInt64x2SlicePart ¶
LoadInt64x2SlicePart loads an Int64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.
func LoadMaskedInt64x2 ¶
LoadMaskedInt64x2 loads an Int64x2 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x2) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x2) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int64x2) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Int64x2) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Int64x2) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Int64x2) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Int64x2) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Int64x2) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Int64x2) Broadcast1To2 ¶
Broadcast1To2 copies the lowest element of its input to all 2 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Int64x2) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Int64x2) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512
func (Int64x2) Compress ¶
Compress selects the elements of x enabled by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Int64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Int64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSX, CPU Feature: AVX512
func (Int64x2) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x2) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX
func (Int64x2) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Int64x2) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Int64x2) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTQ, CPU Feature: AVX
func (Int64x2) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Int64x2) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX
func (Int64x2) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX
func (Int64x2) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int64x2) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x2) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Int64x2) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Int64x2) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x2) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x2) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Int64x2) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x2) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int64x2) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Int64x2) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x2) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Int64x2) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x2) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x2) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x2) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
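A scalar sketch of the two-element selection follows. It is plain Go with a hypothetical helper; the argument order, the selector type, and the panic on out-of-range selectors are assumptions made for the example.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model: selectors 0-1 pick
// elements of x, selectors 2-3 pick elements 0-1 of y.
func selectFromPair(x [2]int64, a, b uint8, y [2]int64) [2]int64 {
	if a > 3 || b > 3 {
		panic("selector out of range") // assumption: not defined by the text
	}
	xy := [4]int64{x[0], x[1], y[0], y[1]}
	return [2]int64{xy[a], xy[b]}
}

func main() {
	x := [2]int64{1, 2}
	y := [2]int64{3, 4}
	fmt.Println(selectFromPair(x, 1, 2, y))
}
```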
func (Int64x2) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Int64x2) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX
func (Int64x2) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAQ, CPU Feature: AVX512
func (Int64x2) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Int64x2) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x2) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x2) StoreMasked ¶
StoreMasked stores an Int64x2 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 int64s.
func (Int64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX
func (Int64x2) ToMask ¶
ToMask converts from Int64x2 to Mask64x2. A mask element is set to true when the corresponding vector element is non-zero.
func (Int64x2) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x2) TruncateToInt32 ¶
TruncateToInt32 truncates element values to int32. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQD, CPU Feature: AVX512
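Truncation and saturation differ only for values outside the target range. The scalar sketch below contrasts the two for the int64 to int32 case in plain Go; the helper names are hypothetical, and the 4-element result models the description above (results in the low elements, upper elements zero).

```go
package main

import (
	"fmt"
	"math"
)

// truncateToInt32 is a hypothetical scalar model: keep the low 32 bits
// of each element and leave the upper result elements zero.
func truncateToInt32(x [2]int64) [4]int32 {
	var r [4]int32
	for i, v := range x {
		r[i] = int32(v) // Go's conversion discards the high bits
	}
	return r
}

// saturateToInt32 clamps each element to the int32 range instead.
func saturateToInt32(x [2]int64) [4]int32 {
	var r [4]int32
	for i, v := range x {
		switch {
		case v > math.MaxInt32:
			r[i] = math.MaxInt32
		case v < math.MinInt32:
			r[i] = math.MinInt32
		default:
			r[i] = int32(v)
		}
	}
	return r
}

func main() {
	x := [2]int64{1<<32 + 5, -1}
	fmt.Println(truncateToInt32(x))
	fmt.Println(saturateToInt32(x))
}
```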
func (Int64x2) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Int64x4 ¶
type Int64x4 struct {
// contains filtered or unexported fields
}
Int64x4 is a 256-bit SIMD vector of 4 int64s.
func BroadcastInt64x4 ¶
BroadcastInt64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt64x4 ¶
LoadInt64x4 loads an Int64x4 from an array.
func LoadInt64x4Slice ¶
LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s.
func LoadInt64x4SlicePart ¶
LoadInt64x4SlicePart loads an Int64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.
func LoadMaskedInt64x4 ¶
LoadMaskedInt64x4 loads an Int64x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x4) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int64x4) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Int64x4) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Int64x4) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Int64x4) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Int64x4) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Int64x4) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Int64x4) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Int64x4) Compress ¶
Compress selects the elements of x enabled by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Int64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Int64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSY, CPU Feature: AVX512
func (Int64x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX2
func (Int64x4) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them to the positions enabled by mask, in order from the lowest mask element to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Int64x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTQ, CPU Feature: AVX2
func (Int64x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Int64x4) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2
func (Int64x4) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2
func (Int64x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int64x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Int64x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Int64x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Int64x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int64x4) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
func (Int64x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Int64x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Int64x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x4) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x4) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32 with signed saturation.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x4) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Int64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Int64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int64x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX2
func (Int64x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
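The concatenating shift is sometimes called a funnel shift. A minimal scalar sketch of one 64-bit lane follows; shiftLeftConcatModel is a hypothetical helper, not part of the package, and it is only meant for shift counts from 0 to 31, where the result does not depend on how many low bits of shift are significant.

```go
package main

import "fmt"

// shiftLeftConcatModel is a hypothetical scalar model of a single lane.
// It shifts x left by s bits and fills the vacated low bits with the
// top s bits of y. Intended for 0 <= s <= 31 only.
func shiftLeftConcatModel(x, y int64, s uint8) int64 {
	ux, uy := uint64(x), uint64(y)
	// In Go, shifting a uint64 right by 64 yields 0, so s == 0 returns x.
	return int64(ux<<s | uy>>(64-s))
}

func main() {
	x := int64(1)
	y := int64(-1) << 60 // top four bits set
	fmt.Printf("%#x\n", uint64(shiftLeftConcatModel(x, y, 4))) // 0x1f
}
```

The right-shifting variants mirror this: the element moves right and the vacated upper bits are filled from the low bits of the other operand.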
func (Int64x4) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAQ, CPU Feature: AVX512
func (Int64x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Int64x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x4) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x4) StoreMasked ¶
StoreMasked stores an Int64x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 int64s.
func (Int64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2
func (Int64x4) ToMask ¶
ToMask converts from Int64x4 to Mask64x4. A mask element is set to true when the corresponding vector element is non-zero.
func (Int64x4) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x4) TruncateToInt32 ¶
TruncateToInt32 truncates element values to int32.
Asm: VPMOVQD, CPU Feature: AVX512
func (Int64x4) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Int64x8 ¶
type Int64x8 struct {
// contains filtered or unexported fields
}
Int64x8 is a 512-bit SIMD vector of 8 int64s.
func BroadcastInt64x8 ¶
BroadcastInt64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadInt64x8 ¶
LoadInt64x8 loads an Int64x8 from an array.
func LoadInt64x8Slice ¶
LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s.
func LoadInt64x8SlicePart ¶
LoadInt64x8SlicePart loads an Int64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.
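The zero-fill behaviour can be illustrated with a scalar sketch. loadSlicePartModel is a hypothetical stand-in that uses a [8]int64 in place of an Int64x8; it is not how the package implements the load.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar stand-in for a partial load:
// it copies up to 8 elements of s and leaves the remaining elements zero.
func loadSlicePartModel(s []int64) [8]int64 {
	var v [8]int64
	copy(v[:], s) // copy stops at the shorter of the two lengths
	return v
}

func main() {
	fmt.Println(loadSlicePartModel([]int64{1, 2, 3})) // [1 2 3 0 0 0 0 0]
}
```

A partial load of this kind is typically useful for the tail of a loop, where fewer than a full vector of elements remain.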
func LoadMaskedInt64x8 ¶
LoadMaskedInt64x8 loads an Int64x8 from an array, at those elements enabled by mask.
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func (Int64x8) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512
func (Int64x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512
func (Int64x8) AsFloat32x16 ¶
func (x Int64x8) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Int64x8) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Int64x8) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Int64x8) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Int64x8) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Int64x8) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Int64x8) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Int64x8) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Int64x8) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
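A scalar sketch of the compression follows. compressModel is hypothetical: it takes the mask as a plain bitmask (bit i enables element i) rather than a Mask64x8, and it assumes that the result elements above the packed ones are zero, which the description above does not state.

```go
package main

import "fmt"

// compressModel is a hypothetical scalar model of Compress.
// Bit i of mask enables element i of x. Enabled elements are packed,
// in order, into the low elements of the result.
// Assumption: the remaining upper elements are left as zero.
func compressModel(x [8]int64, mask uint8) [8]int64 {
	var out [8]int64
	n := 0
	for i, v := range x {
		if mask&(1<<i) != 0 {
			out[n] = v
			n++
		}
	}
	return out
}

func main() {
	x := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(compressModel(x, 0b10100101)) // [10 12 15 17 0 0 0 0]
}
```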
func (Int64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
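The formula above can be read as a table lookup into the 16-element concatenation of x and y. The sketch below is a scalar model only; concatPermuteModel and its array parameters are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// concatPermuteModel is a hypothetical scalar model of ConcatPermute.
// xy is x (lower half) followed by y (upper half), 16 entries in total,
// so only the low 4 bits of each index are needed.
func concatPermuteModel(x, y [8]int64, indices [8]uint64) [8]int64 {
	var xy [16]int64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]int64
	for i, idx := range indices {
		out[i] = xy[idx&15]
	}
	return out
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{8, 0, 9, 1, 15, 7, 16, 3}
	fmt.Println(concatPermuteModel(x, y, idx)) // [100 0 101 1 107 7 0 3]
}
```

Index 16 in the example wraps to 0 because only the low 4 bits are kept.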
func (Int64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PS, CPU Feature: AVX512
func (Int64x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX512
func (Int64x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
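Expansion is the inverse of packing elements under a mask. A scalar sketch follows; expandModel is hypothetical, takes the mask as a plain bitmask, and assumes that positions not enabled by the mask are zero, which the description above does not state.

```go
package main

import "fmt"

// expandModel is a hypothetical scalar model of Expand.
// The low elements of x are handed out, in order, to the positions
// whose mask bit is set. Assumption: other positions are zero.
func expandModel(x [8]int64, mask uint8) [8]int64 {
	var out [8]int64
	n := 0
	for i := range out {
		if mask&(1<<i) != 0 {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	packed := [8]int64{10, 12, 15, 17}
	fmt.Println(expandModel(packed, 0b10100101)) // [10 0 12 0 0 15 0 17]
}
```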
func (Int64x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTQ, CPU Feature: AVX512
func (Int64x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512
func (Int64x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512
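With 64-bit elements, each 128-bit subvector holds two elements, so the low half is the even-indexed element and the high half is the odd-indexed one. The scalar sketch below illustrates that reading; both helper functions are hypothetical models, not package functions.

```go
package main

import "fmt"

// interleaveLoGroupedModel is a hypothetical scalar model: for each
// 128-bit group (two int64s), take the low element of x, then of y.
func interleaveLoGroupedModel(x, y [8]int64) [8]int64 {
	var out [8]int64
	for g := 0; g < 8; g += 2 {
		out[g], out[g+1] = x[g], y[g]
	}
	return out
}

// interleaveHiGroupedModel does the same with the high element of each group.
func interleaveHiGroupedModel(x, y [8]int64) [8]int64 {
	var out [8]int64
	for g := 0; g < 8; g += 2 {
		out[g], out[g+1] = x[g+1], y[g+1]
	}
	return out
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveLoGroupedModel(x, y)) // [0 10 2 12 4 14 6 16]
	fmt.Println(interleaveHiGroupedModel(x, y)) // [1 11 3 13 5 15 7 17]
}
```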
func (Int64x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512
func (Int64x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
func (Int64x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
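A per-element rotation can be modeled with math/bits from the standard library. rotateAllLeftModel is a hypothetical scalar stand-in; bits.RotateLeft64 reduces the count modulo 64, which is an assumption of this sketch rather than something stated above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeftModel is a hypothetical scalar model of RotateAllLeft:
// every element is rotated left by the same number of bits.
func rotateAllLeftModel(x [8]int64, shift uint8) [8]int64 {
	var out [8]int64
	for i, v := range x {
		out[i] = int64(bits.RotateLeft64(uint64(v), int(shift)))
	}
	return out
}

func main() {
	x := [8]int64{1, 2, -1 << 63} // last value has only the top bit set
	fmt.Println(rotateAllLeftModel(x, 1)) // [2 4 1 0 0 0 0 0]
}
```

Unlike a shift, the top bit of the third element wraps around to bit 0 instead of being discarded.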
func (Int64x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Int64x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x8) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16 with signed saturation.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x8) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32 with signed saturation.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8 with signed saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
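The difference between saturating and truncating narrowing is easiest to see on values outside the int8 range. The scalar sketch below is illustrative only; both helpers are hypothetical, and the 16-element result (8 converted values followed by 8 zeros) is an assumption based on the note about zeroed upper elements.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt8Model is a hypothetical scalar model: values outside the
// int8 range are clamped to the nearest bound. Upper elements stay zero.
func saturateToInt8Model(x [8]int64) [16]int8 {
	var out [16]int8
	for i, v := range x {
		switch {
		case v > math.MaxInt8:
			out[i] = math.MaxInt8
		case v < math.MinInt8:
			out[i] = math.MinInt8
		default:
			out[i] = int8(v)
		}
	}
	return out
}

// truncateToInt8Model keeps only the low 8 bits of each value.
func truncateToInt8Model(x [8]int64) [16]int8 {
	var out [16]int8
	for i, v := range x {
		out[i] = int8(v)
	}
	return out
}

func main() {
	x := [8]int64{300, -300, 5}
	s := saturateToInt8Model(x)
	t := truncateToInt8Model(x)
	fmt.Println(s[:3], t[:3]) // [127 -128 5] [44 -44 5]
}
```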
func (Int64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
func (Int64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int64x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX512
func (Int64x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftAllRight ¶
ShiftAllRight performs a signed right shift on each element by y bits.
Asm: VPSRAQ, CPU Feature: AVX512
func (Int64x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX512
func (Int64x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftRight ¶
ShiftRight performs a signed right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x8) StoreMasked ¶
StoreMasked stores an Int64x8 to an array, at those elements enabled by mask.
Asm: VMOVDQU64, CPU Feature: AVX512
func (Int64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int64s.
func (Int64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512
func (Int64x8) ToMask ¶
ToMask converts from Int64x8 to Mask64x8. A mask element is set to true when the corresponding vector element is non-zero.
func (Int64x8) TruncateToInt16 ¶
TruncateToInt16 truncates element values to int16.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x8) TruncateToInt32 ¶
TruncateToInt32 truncates element values to int32.
Asm: VPMOVQD, CPU Feature: AVX512
func (Int64x8) TruncateToInt8 ¶
TruncateToInt8 truncates element values to int8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Int8x16 ¶
type Int8x16 struct {
// contains filtered or unexported fields
}
Int8x16 is a 128-bit SIMD vector of 16 int8s.
func BroadcastInt8x16 ¶
BroadcastInt8x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt8x16 ¶
LoadInt8x16 loads an Int8x16 from an array.
func LoadInt8x16Slice ¶
LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s.
func LoadInt8x16SlicePart ¶
LoadInt8x16SlicePart loads an Int8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.
func (Int8x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX
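Saturation here means that a sum outside the int8 range is clamped instead of wrapping around. A one-lane scalar sketch, using the hypothetical helper addSaturatedModel:

```go
package main

import (
	"fmt"
	"math"
)

// addSaturatedModel is a hypothetical scalar model of one int8 lane.
// The sum is computed in int16, where it cannot overflow, then clamped.
func addSaturatedModel(a, b int8) int8 {
	sum := int16(a) + int16(b)
	switch {
	case sum > math.MaxInt8:
		return math.MaxInt8
	case sum < math.MinInt8:
		return math.MinInt8
	}
	return int8(sum)
}

func main() {
	fmt.Println(
		addSaturatedModel(100, 100),   // clamps to 127
		addSaturatedModel(-100, -100), // clamps to -128
		addSaturatedModel(1, 2),       // in range: 3
	)
}
```

A wrapping addition of 100 and 100 in int8 would instead produce -56.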
func (Int8x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int8x16) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Int8x16) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Int8x16) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Int8x16) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Int8x16) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Int8x16) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Int8x16) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Int8x16) Broadcast1To32 ¶
Broadcast1To32 copies the lowest element of its input to all 32 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Int8x16) Broadcast1To64 ¶
Broadcast1To64 copies the lowest element of its input to all 64 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512
func (Int8x16) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x16) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGNB, CPU Feature: AVX
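In other words, each element of x is negated, zeroed, or kept depending on the sign of the matching element of y. A one-lane scalar sketch, using the hypothetical helper copySignModel:

```go
package main

import "fmt"

// copySignModel is a hypothetical scalar model of one int8 lane:
// x is multiplied by -1, 0, or 1 according to the sign of y.
func copySignModel(x, y int8) int8 {
	switch {
	case y < 0:
		return -x // note: -(-128) wraps back to -128 in int8
	case y == 0:
		return 0
	default:
		return x
	}
}

func main() {
	fmt.Println(copySignModel(5, -9), copySignModel(5, 0), copySignModel(5, 9)) // -5 0 5
}
```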
func (Int8x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX
func (Int8x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x16) ExtendLo2ToInt64 ¶
ExtendLo2ToInt64 sign-extends 2 lowest vector element values to int64.
Asm: VPMOVSXBQ, CPU Feature: AVX
func (Int8x16) ExtendLo4ToInt32 ¶
ExtendLo4ToInt32 sign-extends 4 lowest vector element values to int32.
Asm: VPMOVSXBD, CPU Feature: AVX
func (Int8x16) ExtendLo4ToInt64 ¶
ExtendLo4ToInt64 sign-extends 4 lowest vector element values to int64.
Asm: VPMOVSXBQ, CPU Feature: AVX2
func (Int8x16) ExtendLo8ToInt16 ¶
ExtendLo8ToInt16 sign-extends 8 lowest vector element values to int16.
Asm: VPMOVSXBW, CPU Feature: AVX
func (Int8x16) ExtendLo8ToInt32 ¶
ExtendLo8ToInt32 sign-extends 8 lowest vector element values to int32.
Asm: VPMOVSXBD, CPU Feature: AVX2
func (Int8x16) ExtendLo8ToInt64 ¶
ExtendLo8ToInt64 sign-extends 8 lowest vector element values to int64.
Asm: VPMOVSXBQ, CPU Feature: AVX512
func (Int8x16) ExtendToInt16 ¶
ExtendToInt16 sign-extends element values to int16.
Asm: VPMOVSXBW, CPU Feature: AVX2
func (Int8x16) ExtendToInt32 ¶
ExtendToInt32 sign-extends element values to int32.
Asm: VPMOVSXBD, CPU Feature: AVX512
func (Int8x16) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512
func (Int8x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTB, CPU Feature: AVX
func (Int8x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Int8x16) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int8x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Int8x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Int8x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSB, CPU Feature: AVX
func (Int8x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSB, CPU Feature: AVX
func (Int8x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Int8x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int8x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x16) PermuteOrZero ¶
PermuteOrZero performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set, in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX
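The sign-bit rule makes this a lookup with a built-in "write zero" escape. A scalar sketch follows; permuteOrZeroModel is hypothetical and represents the indices as [16]int8 so that a set sign bit shows up as a negative value.

```go
package main

import "fmt"

// permuteOrZeroModel is a hypothetical scalar model of PermuteOrZero.
// A negative index (sign bit set) produces zero; otherwise the low
// four bits of the index select an element of x.
func permuteOrZeroModel(x [16]int8, indices [16]int8) [16]int8 {
	var out [16]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // leave out[i] as zero
		}
		out[i] = x[idx&15]
	}
	return out
}

func main() {
	var x [16]int8
	for i := range x {
		x[i] = int8(i + 1) // 1, 2, ..., 16
	}
	indices := [16]int8{15, 0, -1, 3}
	out := permuteOrZeroModel(x, indices)
	fmt.Println(out[:4]) // [16 1 0 4]
}
```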
func (Int8x16) SetElem ¶
SetElem sets a single constant-indexed element's value.
index gives better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX
func (Int8x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int8s.
func (Int8x16) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Int8x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX
func (Int8x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX
type Int8x32 ¶
type Int8x32 struct {
// contains filtered or unexported fields
}
Int8x32 is a 256-bit SIMD vector of 32 int8s.
func BroadcastInt8x32 ¶
BroadcastInt8x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt8x32 ¶
LoadInt8x32 loads an Int8x32 from an array.
func LoadInt8x32Slice ¶
LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s.
func LoadInt8x32SlicePart ¶
LoadInt8x32SlicePart loads an Int8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.
func (Int8x32) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX2
func (Int8x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX2
func (Int8x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int8x32) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Int8x32) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Int8x32) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Int8x32) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Int8x32) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Int8x32) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Int8x32) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Int8x32) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x32) CopySign ¶
CopySign returns the product of x with -1, 0, or 1, whichever constant is nearest to the value of y.
Asm: VPSIGNB, CPU Feature: AVX2
func (Int8x32) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX2
func (Int8x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x32) ExtendToInt16 ¶
ExtendToInt16 sign-extends element values to int16.
Asm: VPMOVSXBW, CPU Feature: AVX512
func (Int8x32) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTB, CPU Feature: AVX2
func (Int8x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Int8x32) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int8x32) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Int8x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Int8x32) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSB, CPU Feature: AVX2
func (Int8x32) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSB, CPU Feature: AVX2
func (Int8x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Int8x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int8x32) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x32) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.
Asm: VPSHUFB, CPU Feature: AVX2
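Grouping means an index can only reach elements inside its own 128-bit group (16 bytes). The scalar sketch below illustrates this for a 32-element vector; permuteOrZeroGroupedModel is hypothetical and represents indices as [32]int8 so that a set sign bit is a negative value.

```go
package main

import "fmt"

// permuteOrZeroGroupedModel is a hypothetical scalar model.
// Output element i looks only inside the 16-byte group that contains i.
// A negative index (sign bit set) produces zero.
func permuteOrZeroGroupedModel(x [32]int8, indices [32]int8) [32]int8 {
	var out [32]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // leave out[i] as zero
		}
		group := i / 16 * 16 // first element of this group: 0 or 16
		out[i] = x[group+int(idx&15)]
	}
	return out
}

func main() {
	var x [32]int8
	for i := range x {
		x[i] = int8(i)
	}
	var indices [32]int8 // all zero by default
	indices[0] = 15
	indices[16] = 15
	indices[17] = -1
	out := permuteOrZeroGroupedModel(x, indices)
	fmt.Println(out[0], out[16], out[17], out[18]) // 15 31 0 16
}
```

The same index value 15 selects x[15] in the first group and x[31] in the second.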
func (Int8x32) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo and hi give better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Int8x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int8x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int8x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 int8s.
func (Int8x32) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Int8x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2
func (Int8x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX2
type Int8x64 ¶
type Int8x64 struct {
// contains filtered or unexported fields
}
Int8x64 is a 512-bit SIMD vector of 64 int8s.
func BroadcastInt8x64 ¶
BroadcastInt8x64 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512BW
func LoadInt8x64 ¶
LoadInt8x64 loads an Int8x64 from an array.
func LoadInt8x64Slice ¶
LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s.
func LoadInt8x64SlicePart ¶
LoadInt8x64SlicePart loads an Int8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.
func LoadMaskedInt8x64 ¶
LoadMaskedInt8x64 loads an Int8x64 from an array, at those elements enabled by mask.
Asm: VMOVDQU8.Z, CPU Feature: AVX512
func (Int8x64) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX512
func (Int8x64) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512
func (Int8x64) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX512
func (Int8x64) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int8x64) AsFloat32x16 ¶
func (x Int8x64) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Int8x64) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Int8x64) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Int8x64) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Int8x64) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Int8x64) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Int8x64) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Int8x64) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Int8x64) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x64) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x64) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX512
func (Int8x64) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x64) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTB, CPU Feature: AVX512
func (Int8x64) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXSB, CPU Feature: AVX512
func (Int8x64) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINSB, CPU Feature: AVX512
func (Int8x64) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x64) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int8x64) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x64) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.
Asm: VPSHUFB, CPU Feature: AVX512
func (Int8x64) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int8x64) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int8x64) StoreMasked ¶
StoreMasked stores an Int8x64 to an array, at those elements enabled by mask.
Asm: VMOVDQU8, CPU Feature: AVX512
func (Int8x64) StoreSlice ¶
StoreSlice stores x into a slice of at least 64 int8s.
func (Int8x64) StoreSlicePart ¶
StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.
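A minimal scalar sketch of the partial-store rule, written in plain Go without archsimd. The function name storeSlicePartModel is hypothetical; the built-in copy happens to have the same "as many as will fit" behaviour.

```go
package main

import "fmt"

// storeSlicePartModel is a hypothetical scalar model of StoreSlicePart:
// it writes as many leading elements of x as fit in s.
func storeSlicePartModel(x [64]int8, s []int8) {
	copy(s, x[:]) // copy stops after min(len(s), 64) elements
}

func main() {
	var x [64]int8
	for i := range x {
		x[i] = int8(i + 1)
	}
	short := make([]int8, 3)
	storeSlicePartModel(x, short)
	fmt.Println(short) // [1 2 3]
}
```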
func (Int8x64) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512
func (Int8x64) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX512
type Mask16x16 ¶
type Mask16x16 struct {
// contains filtered or unexported fields
}
Mask16x16 is a mask for a SIMD vector of 16 16-bit elements.
func Mask16x16FromBits ¶
Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x16) ToBits ¶
ToBits constructs a bitmap from a Mask16x16, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
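The bitmap convention used by Mask16x16FromBits and ToBits can be sketched with a plain [16]bool standing in for the opaque mask type. This is an illustration only: the helper names are hypothetical, and it assumes bit i of the bitmap corresponds to element i of the mask.

```go
package main

import "fmt"

// maskFromBitsModel is a hypothetical model: bit i of bits sets element i.
func maskFromBitsModel(bits uint16) [16]bool {
	var m [16]bool
	for i := range m {
		m[i] = bits>>i&1 == 1
	}
	return m
}

// maskToBitsModel is the inverse: element i sets bit i of the result.
func maskToBitsModel(m [16]bool) uint16 {
	var bits uint16
	for i, set := range m {
		if set {
			bits |= 1 << i
		}
	}
	return bits
}

func main() {
	m := maskFromBitsModel(0b101)
	fmt.Println(m[0], m[1], m[2], maskToBitsModel(m)) // true false true 5
}
```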
func (Mask16x16) ToInt16x16 ¶
ToInt16x16 converts from Mask16x16 to Int16x16.
type Mask16x32 ¶
type Mask16x32 struct {
// contains filtered or unexported fields
}
Mask16x32 is a mask for a SIMD vector of 32 16-bit elements.
func Mask16x32FromBits ¶
Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x32) ToBits ¶
ToBits constructs a bitmap from a Mask16x32, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x32) ToInt16x32 ¶
ToInt16x32 converts from Mask16x32 to Int16x32.
type Mask16x8 ¶
type Mask16x8 struct {
// contains filtered or unexported fields
}
Mask16x8 is a mask for a SIMD vector of 8 16-bit elements.
func Mask16x8FromBits ¶
Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
type Mask32x16 ¶
type Mask32x16 struct {
// contains filtered or unexported fields
}
Mask32x16 is a mask for a SIMD vector of 16 32-bit elements.
func Mask32x16FromBits ¶
Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
func (Mask32x16) ToBits ¶
ToBits constructs a bitmap from a Mask32x16, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
func (Mask32x16) ToInt32x16 ¶
ToInt32x16 converts from Mask32x16 to Int32x16.
type Mask32x4 ¶
type Mask32x4 struct {
// contains filtered or unexported fields
}
Mask32x4 is a mask for a SIMD vector of 4 32-bit elements.
func Mask32x4FromBits ¶
Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.
Asm: KMOVD, CPU Feature: AVX512
type Mask32x8 ¶
type Mask32x8 struct {
// contains filtered or unexported fields
}
Mask32x8 is a mask for a SIMD vector of 8 32-bit elements.
func Mask32x8FromBits ¶
Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
type Mask64x2 ¶
type Mask64x2 struct {
// contains filtered or unexported fields
}
Mask64x2 is a mask for a SIMD vector of 2 64-bit elements.
func Mask64x2FromBits ¶
Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
type Mask64x4 ¶
type Mask64x4 struct {
// contains filtered or unexported fields
}
Mask64x4 is a mask for a SIMD vector of 4 64-bit elements.
func Mask64x4FromBits ¶
Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
type Mask64x8 ¶
type Mask64x8 struct {
// contains filtered or unexported fields
}
Mask64x8 is a mask for a SIMD vector of 8 64-bit elements.
func Mask64x8FromBits ¶
Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVQ, CPU Feature: AVX512
type Mask8x16 ¶
type Mask8x16 struct {
// contains filtered or unexported fields
}
Mask8x16 is a mask for a SIMD vector of 16 8-bit elements.
func Mask8x16FromBits ¶
Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Mask8x32 ¶
type Mask8x32 struct {
// contains filtered or unexported fields
}
Mask8x32 is a mask for a SIMD vector of 32 8-bit elements.
func Mask8x32FromBits ¶
Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Mask8x64 ¶
type Mask8x64 struct {
// contains filtered or unexported fields
}
Mask8x64 is a mask for a SIMD vector of 64 8-bit elements.
func Mask8x64FromBits ¶
Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Uint16x16 ¶
type Uint16x16 struct {
// contains filtered or unexported fields
}
Uint16x16 is a 256-bit SIMD vector of 16 uint16s.
func BroadcastUint16x16 ¶
BroadcastUint16x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadUint16x16 ¶
LoadUint16x16 loads a Uint16x16 from an array.
func LoadUint16x16Slice ¶
LoadUint16x16Slice loads a Uint16x16 from a slice of at least 16 uint16s.
func LoadUint16x16SlicePart ¶
LoadUint16x16SlicePart loads a Uint16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.
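The zero-fill rule for short slices can be modelled in a few lines of plain Go. This sketch does not use archsimd; loadSlicePartModel and the [16]uint16 result type are hypothetical stand-ins.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar model of LoadUint16x16SlicePart:
// elements past the end of s are left as zero.
func loadSlicePartModel(s []uint16) [16]uint16 {
	var v [16]uint16
	copy(v[:], s) // copies at most 16 elements
	return v
}

func main() {
	v := loadSlicePartModel([]uint16{7, 8, 9})
	fmt.Println(v[0], v[2], v[3], v[15]) // 7 9 0 0
}
```

A helper of this shape is typically useful for the tail of a loop, where fewer than a full vector of elements remain.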
func (Uint16x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2
func (Uint16x16) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements. Treating each 128-bit group separately: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDW, CPU Feature: AVX2
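To make the grouping concrete, here is a scalar sketch in plain Go. It assumes each 128-bit group holds 8 uint16 elements, and that within a group the four pair sums from x come first, followed by the four from y. addPairsGroupedModel is a hypothetical name; archsimd is not used.

```go
package main

import "fmt"

// addPairsGroupedModel is a hypothetical scalar model of AddPairsGrouped
// for 16 uint16 elements split into two 128-bit groups.
func addPairsGroupedModel(x, y [16]uint16) [16]uint16 {
	var r [16]uint16
	for g := 0; g < 16; g += 8 { // g is the first element of each group
		for i := 0; i < 4; i++ {
			r[g+i] = x[g+2*i] + x[g+2*i+1]   // pair sums from x
			r[g+4+i] = y[g+2*i] + y[g+2*i+1] // pair sums from y
		}
	}
	return r
}

func main() {
	var x, y [16]uint16
	for i := range x {
		x[i] = uint16(i)
		y[i] = uint16(100 + i)
	}
	r := addPairsGroupedModel(x, y)
	fmt.Println(r[0], r[4], r[8], r[12]) // 1 201 17 217
}
```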
func (Uint16x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX2
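For unsigned elements, saturation means a sum that would overflow is clamped to the largest representable value instead of wrapping. A one-element scalar sketch in plain Go follows; addSaturatedModel is a hypothetical name and the code does not use archsimd.

```go
package main

import "fmt"

// addSaturatedModel is a hypothetical per-element model of unsigned
// saturating addition on uint16.
func addSaturatedModel(a, b uint16) uint16 {
	sum := a + b // wraps around on overflow
	if sum < a { // wrap-around happened, so clamp
		return 0xFFFF
	}
	return sum
}

func main() {
	fmt.Println(addSaturatedModel(1, 2))         // 3
	fmt.Println(addSaturatedModel(40000, 30000)) // 65535
}
```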
func (Uint16x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint16x16) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Uint16x16) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Uint16x16) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Uint16x16) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Uint16x16) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Uint16x16) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Uint16x16) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX2
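"Rounded average" is read here as (a + b + 1) >> 1 computed without intermediate overflow, which is how the VPAVGW instruction is specified. The scalar sketch below shows that formula for one element; averageModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// averageModel is a hypothetical per-element model of the rounded average:
// the sum is formed in 32 bits so it cannot overflow, and ties round up.
func averageModel(a, b uint16) uint16 {
	return uint16((uint32(a) + uint32(b) + 1) >> 1)
}

func main() {
	fmt.Println(averageModel(1, 2))         // 2
	fmt.Println(averageModel(65535, 65535)) // 65535
}
```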
func (Uint16x16) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
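A scalar sketch of the packing step, in plain Go with a [16]bool standing in for the mask. The text above does not say what the unfilled upper elements hold; this model leaves them zero, which is an assumption. compressModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// compressModel is a hypothetical scalar model of Compress: selected
// elements are packed, in order, into the lowest positions of the result.
// Assumption: the remaining positions are zero.
func compressModel(x [16]uint16, mask [16]bool) [16]uint16 {
	var r [16]uint16
	n := 0 // next free position in the result
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(10 * i)
	}
	var mask [16]bool
	mask[1], mask[4], mask[9] = true, true, true
	r := compressModel(x, mask)
	fmt.Println(r[:3]) // [10 40 90]
}
```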
func (Uint16x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only as many low bits of each element of indices as are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
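For 16-element vectors the concatenation xy has 32 entries, so under the rule above 5 index bits are needed. The following scalar sketch in plain Go models that case; concatPermuteModel is a hypothetical name and the code does not use archsimd.

```go
package main

import "fmt"

// concatPermuteModel is a hypothetical scalar model of ConcatPermute for
// 16 uint16 elements: indices 0-15 select from x, 16-31 select from y.
func concatPermuteModel(x, y, indices [16]uint16) [16]uint16 {
	var xy [32]uint16
	copy(xy[:16], x[:]) // lower half
	copy(xy[16:], y[:]) // upper half
	var r [16]uint16
	for i := range r {
		r[i] = xy[indices[i]&31] // 5 bits are enough to index 32 entries
	}
	return r
}

func main() {
	var x, y, indices [16]uint16
	for i := range x {
		x[i] = uint16(i)
		y[i] = uint16(100 + i)
		indices[i] = uint16(31 - i) // walk xy backwards from its last entry
	}
	r := concatPermuteModel(x, y, indices)
	fmt.Println(r[0], r[15]) // 115 100
}
```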
func (Uint16x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX2
func (Uint16x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
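Expand is the inverse of the packing done by Compress. The scalar sketch below models it in plain Go with a [16]bool standing in for the mask. It assumes positions not selected by the mask are zero, which the text above does not state. expandModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// expandModel is a hypothetical scalar model of Expand: the packed low
// elements of x are handed out, in order, to the positions set in mask.
// Assumption: positions that are not selected stay zero.
func expandModel(x [16]uint16, mask [16]bool) [16]uint16 {
	var r [16]uint16
	n := 0 // next packed element of x to hand out
	for i, set := range mask {
		if set {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [16]uint16{7, 8, 9} // three packed elements
	var mask [16]bool
	mask[1], mask[4], mask[9] = true, true, true
	r := expandModel(x, mask)
	fmt.Println(r[1], r[4], r[9]) // 7 8 9
}
```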
func (Uint16x16) ExtendToUint32 ¶
ExtendToUint32 zero-extends element values to uint32.
Asm: VPMOVZXWD, CPU Feature: AVX512
func (Uint16x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2
func (Uint16x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Uint16x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2
func (Uint16x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2
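The two grouped interleaves differ only in which half of each 128-bit subvector they read. The scalar sketch below models both in plain Go, assuming 8 uint16 elements per subvector and that the element from x precedes the element from y in each pair. interleaveGroupedModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// interleaveGroupedModel is a hypothetical scalar model of
// InterleaveLoGrouped (hi == false) and InterleaveHiGrouped (hi == true).
func interleaveGroupedModel(x, y [16]uint16, hi bool) [16]uint16 {
	off := 0
	if hi {
		off = 4 // read the upper four elements of each subvector
	}
	var r [16]uint16
	for g := 0; g < 16; g += 8 { // g is the first element of each subvector
		for i := 0; i < 4; i++ {
			r[g+2*i] = x[g+off+i]
			r[g+2*i+1] = y[g+off+i]
		}
	}
	return r
}

func main() {
	var x, y [16]uint16
	for i := range x {
		x[i] = uint16(i)
		y[i] = uint16(100 + i)
	}
	lo := interleaveGroupedModel(x, y, false)
	hi := interleaveGroupedModel(x, y, true)
	fmt.Println(lo[:4], hi[:4]) // [0 100 1 101] [4 104 5 105]
}
```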
func (Uint16x16) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint16x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Uint16x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Uint16x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUW, CPU Feature: AVX2
func (Uint16x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUW, CPU Feature: AVX2
func (Uint16x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2
func (Uint16x16) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX2
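Mul and MulHigh together cover the full 32-bit product of two uint16 elements: Mul keeps the low 16 bits and MulHigh the high 16 bits. The scalar sketch below shows one element of each in plain Go; the function names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// mulLowModel is a hypothetical per-element model of Mul: the low 16 bits
// of the product.
func mulLowModel(a, b uint16) uint16 {
	return a * b
}

// mulHighModel is a hypothetical per-element model of MulHigh: the high
// 16 bits of the full 32-bit product.
func mulHighModel(a, b uint16) uint16 {
	return uint16(uint32(a) * uint32(b) >> 16)
}

func main() {
	a, b := uint16(0xFFFF), uint16(0xFFFF) // product is 0xFFFE0001
	fmt.Printf("%#x %#x\n", mulHighModel(a, b), mulLowModel(a, b)) // 0xfffe 0x1
}
```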
func (Uint16x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Uint16x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint16x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x16) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Each group is 128 bits wide.
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2
func (Uint16x16) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2
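The result layout given for PermuteScalarsLoGrouped can be written as a loop over the two 128-bit groups. This is a scalar sketch in plain Go that assumes a through d are already in the range 0 to 3; permuteScalarsLoGroupedModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// permuteScalarsLoGroupedModel is a hypothetical scalar model of
// PermuteScalarsLoGrouped. It assumes 0 <= a, b, c, d <= 3.
func permuteScalarsLoGroupedModel(x [16]uint16, a, b, c, d int) [16]uint16 {
	r := x // the upper four elements of each group are copied unchanged
	for g := 0; g < 16; g += 8 {
		r[g+0] = x[g+a]
		r[g+1] = x[g+b]
		r[g+2] = x[g+c]
		r[g+3] = x[g+d]
	}
	return r
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(i)
	}
	// Reverse the low four elements of each group.
	fmt.Println(permuteScalarsLoGroupedModel(x, 3, 2, 1, 0))
	// [3 2 1 0 4 5 6 7 11 10 9 8 12 13 14 15]
}
```

The Hi variant follows the same pattern with the roles swapped: the low four elements of each group are copied and the upper four are selected with an offset of 4.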
func (Uint16x16) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation.
Asm: VPMOVUSWB, CPU Feature: AVX512
func (Uint16x16) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi give better performance when they are constants; non-constant values are translated into a jump table. Both should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
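The example above can be reproduced with a scalar model that lays x and y out as four 128-bit elements (8 uint16s each) and copies two of them. This sketch is plain Go, assumes lo and hi are in the range 0 to 3, and does not use archsimd; select128FromPairModel is a hypothetical name.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar model of Select128FromPair.
// It assumes 0 <= lo, hi <= 3.
func select128FromPairModel(x, y [16]uint16, lo, hi int) [16]uint16 {
	var xy [32]uint16 // four 128-bit elements: x low, x high, y low, y high
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var r [16]uint16
	copy(r[:8], xy[lo*8:lo*8+8]) // lower half of the result
	copy(r[8:], xy[hi*8:hi*8+8]) // upper half of the result
	return r
}

func main() {
	x := [16]uint16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]uint16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPairModel(x, y, 3, 0))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```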
func (Uint16x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint16x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint16x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX2
func (Uint16x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
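The concatenating shift can be pictured as shifting the 32-bit value formed by x (high half) and y (low half) and keeping the high 16 bits. The scalar sketch below models one element for shift counts 0 to 15 only; it makes no claim about how larger counts are treated. shiftAllLeftConcatModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// shiftAllLeftConcatModel is a hypothetical per-element model of
// ShiftAllLeftConcat, valid only for 0 <= shift <= 15.
func shiftAllLeftConcatModel(x, y uint16, shift uint8) uint16 {
	concat := uint32(x)<<16 | uint32(y) // x on top, y below
	// Shifting left moves the upper bits of y into the emptied lower
	// bits of x; the high 16 bits are the result.
	return uint16(concat << shift >> 16)
}

func main() {
	fmt.Printf("%#x\n", shiftAllLeftConcatModel(0x1234, 0xABCD, 4)) // 0x234a
}
```

In the example, x shifted left by 4 is 0x2340, and the top 4 bits of y (0xA) fill the vacated low bits.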
func (Uint16x16) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLW, CPU Feature: AVX2
func (Uint16x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x16) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint16s.
func (Uint16x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Uint16x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2
func (Uint16x16) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements. Treating each 128-bit group separately: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBW, CPU Feature: AVX2
func (Uint16x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX2
func (Uint16x16) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8.
Asm: VPMOVWB, CPU Feature: AVX512
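SaturateToUint8 and TruncateToUint8 agree for values that fit in 8 bits and differ otherwise. The scalar sketch below contrasts the two conversions on a single element in plain Go; the function names are hypothetical and archsimd is not used.

```go
package main

import "fmt"

// saturateToUint8Model is a hypothetical per-element model of unsigned
// saturation: values above 255 are clamped to 255.
func saturateToUint8Model(v uint16) uint8 {
	if v > 0xFF {
		return 0xFF
	}
	return uint8(v)
}

// truncateToUint8Model is a hypothetical per-element model of truncation:
// only the low 8 bits are kept.
func truncateToUint8Model(v uint16) uint8 {
	return uint8(v)
}

func main() {
	v := uint16(0x1234)
	fmt.Printf("%#x %#x\n", saturateToUint8Model(v), truncateToUint8Model(v)) // 0xff 0x34
}
```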
type Uint16x32 ¶
type Uint16x32 struct {
// contains filtered or unexported fields
}
Uint16x32 is a 512-bit SIMD vector of 32 uint16s.
func BroadcastUint16x32 ¶
BroadcastUint16x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512BW
func LoadMaskedUint16x32 ¶
LoadMaskedUint16x32 loads a Uint16x32 from an array, at those elements enabled by mask.
Asm: VMOVDQU16.Z, CPU Feature: AVX512
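A scalar sketch of a masked load in plain Go, with a [32]bool standing in for the mask. It assumes elements whose mask entry is unset are zero in the result, an assumption suggested by the zeroing (.Z) form of the instruction named above rather than stated in the text. loadMaskedModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// loadMaskedModel is a hypothetical scalar model of a masked load.
// Assumption: elements that are not enabled by mask are zero.
func loadMaskedModel(src *[32]uint16, mask [32]bool) [32]uint16 {
	var v [32]uint16
	for i, on := range mask {
		if on {
			v[i] = src[i]
		}
	}
	return v
}

func main() {
	var src [32]uint16
	for i := range src {
		src[i] = uint16(i + 1)
	}
	var mask [32]bool
	mask[0], mask[31] = true, true
	v := loadMaskedModel(&src, mask)
	fmt.Println(v[0], v[1], v[31]) // 1 0 32
}
```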
func LoadUint16x32 ¶
LoadUint16x32 loads a Uint16x32 from an array.
func LoadUint16x32Slice ¶
LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s.
func LoadUint16x32SlicePart ¶
LoadUint16x32SlicePart loads a Uint16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.
func (Uint16x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512
func (Uint16x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX512
func (Uint16x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint16x32) AsFloat32x16 ¶
func (x Uint16x32) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Uint16x32) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Uint16x32) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Uint16x32) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Uint16x32) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Uint16x32) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Uint16x32) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Uint16x32) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX512
func (Uint16x32) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Uint16x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only as many low bits of each element of indices as are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
func (Uint16x32) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX512
func (Uint16x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Uint16x32) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512
func (Uint16x32) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512
func (Uint16x32) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUW, CPU Feature: AVX512
func (Uint16x32) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUW, CPU Feature: AVX512
func (Uint16x32) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512
func (Uint16x32) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX512
func (Uint16x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint16x32) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x32) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Uint16x32) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Each group is 128 bits wide.
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Uint16x32) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation.
Asm: VPMOVUSWB, CPU Feature: AVX512
func (Uint16x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint16x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint16x32) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX512
func (Uint16x32) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLW, CPU Feature: AVX512
func (Uint16x32) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x32) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x32) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x32) StoreMasked ¶
StoreMasked stores a Uint16x32 to an array, at those elements enabled by mask.
Asm: VMOVDQU16, CPU Feature: AVX512
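A masked store writes only the enabled elements and leaves the rest of the destination untouched. The scalar sketch below models that in plain Go with a [32]bool standing in for the mask; storeMaskedModel is a hypothetical name and archsimd is not used.

```go
package main

import "fmt"

// storeMaskedModel is a hypothetical scalar model of StoreMasked: only the
// elements enabled by mask are written to dst.
func storeMaskedModel(x [32]uint16, dst *[32]uint16, mask [32]bool) {
	for i, on := range mask {
		if on {
			dst[i] = x[i]
		}
	}
}

func main() {
	var x, dst [32]uint16
	for i := range x {
		x[i] = uint16(i + 1)
		dst[i] = 999 // existing contents of the destination
	}
	var mask [32]bool
	mask[2] = true
	storeMaskedModel(x, &dst, mask)
	fmt.Println(dst[1], dst[2], dst[3]) // 999 3 999
}
```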
func (Uint16x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 uint16s.
func (Uint16x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Uint16x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512
func (Uint16x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX512
func (Uint16x32) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8.
Asm: VPMOVWB, CPU Feature: AVX512
type Uint16x8 ¶
type Uint16x8 struct {
// contains filtered or unexported fields
}
Uint16x8 is a 128-bit SIMD vector of 8 uint16s.
func BroadcastUint16x8 ¶
BroadcastUint16x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadUint16x8 ¶
LoadUint16x8 loads a Uint16x8 from an array.
func LoadUint16x8Slice ¶
LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s.
func LoadUint16x8SlicePart ¶
LoadUint16x8SlicePart loads a Uint16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.
func (Uint16x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDW, CPU Feature: AVX
func (Uint16x8) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX
func (Uint16x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint16x8) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Uint16x8) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Uint16x8) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Uint16x8) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Uint16x8) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Uint16x8) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX
func (Uint16x8) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Uint16x8) Broadcast1To32 ¶
Broadcast1To32 copies the lowest element of its input to all 32 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512
func (Uint16x8) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Uint16x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Uint16x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only as many low bits of each element of indices as are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
func (Uint16x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQW, CPU Feature: AVX
func (Uint16x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. The elements are distributed, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ExtendLo2ToUint64 ¶
ExtendLo2ToUint64 zero-extends 2 lowest vector element values to uint64.
Asm: VPMOVZXWQ, CPU Feature: AVX
func (Uint16x8) ExtendLo4ToUint32 ¶
ExtendLo4ToUint32 zero-extends 4 lowest vector element values to uint32.
Asm: VPMOVZXWD, CPU Feature: AVX
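The ExtendLo variants widen only the lowest elements, so the result has fewer, wider elements than the input. A scalar sketch of ExtendLo4ToUint32 in plain Go follows; extendLo4ToUint32Model and the array types are hypothetical stand-ins, and archsimd is not used.

```go
package main

import "fmt"

// extendLo4ToUint32Model is a hypothetical scalar model of
// ExtendLo4ToUint32: the four lowest uint16 elements are zero-extended.
func extendLo4ToUint32Model(x [8]uint16) [4]uint32 {
	var r [4]uint32
	for i := range r {
		r[i] = uint32(x[i]) // zero extension: the upper 16 bits are 0
	}
	return r
}

func main() {
	x := [8]uint16{1, 0xFFFF, 3, 4, 5, 6, 7, 8}
	fmt.Println(extendLo4ToUint32Model(x)) // [1 65535 3 4]
}
```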
func (Uint16x8) ExtendLo4ToUint64 ¶
ExtendLo4ToUint64 zero-extends 4 lowest vector element values to uint64.
Asm: VPMOVZXWQ, CPU Feature: AVX2
func (Uint16x8) ExtendToUint32 ¶
ExtendToUint32 zero-extends element values to uint32.
Asm: VPMOVZXWD, CPU Feature: AVX2
func (Uint16x8) ExtendToUint64 ¶
ExtendToUint64 zero-extends element values to uint64.
Asm: VPMOVZXWQ, CPU Feature: AVX512
func (Uint16x8) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512
func (Uint16x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX
func (Uint16x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Uint16x8) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX
func (Uint16x8) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX
func (Uint16x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint16x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Uint16x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Uint16x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUW, CPU Feature: AVX
func (Uint16x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUW, CPU Feature: AVX
func (Uint16x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX
func (Uint16x8) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX
func (Uint16x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Uint16x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint16x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x8) PermuteScalarsHi ¶
PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Uint16x8) PermuteScalarsLo ¶
PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, the instruction is emitted inline; otherwise, a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Uint16x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSWB, CPU Feature: AVX512
func (Uint16x8) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX
func (Uint16x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLW, CPU Feature: AVX
func (Uint16x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLW, CPU Feature: AVX
func (Uint16x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint16s.
func (Uint16x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Uint16x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX
func (Uint16x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBW, CPU Feature: AVX
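The result layout for 8 lanes can be sketched in plain scalar Go. `subPairs` is a hypothetical helper, not part of this package; subtraction wraps modulo 2^16, as Go's uint16 arithmetic does.

```go
package main

import "fmt"

// subPairs is a hypothetical scalar model of a horizontal subtract on
// 8 lanes: the low half of the result holds the pair differences from x,
// and the high half holds the pair differences from y.
func subPairs(x, y [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := 0; i < 4; i++ {
		r[i] = x[2*i] - x[2*i+1]
		r[i+4] = y[2*i] - y[2*i+1]
	}
	return r
}

func main() {
	x := [8]uint16{10, 3, 5, 5, 1, 2, 100, 1}
	y := [8]uint16{8, 1, 0, 1, 7, 7, 9, 4}
	fmt.Println(subPairs(x, y)) // [7 0 65535 99 7 65535 0 5]
}
```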
func (Uint16x8) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX
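For unsigned elements, saturation means a difference that would go below zero is clamped to zero instead of wrapping. A minimal scalar sketch in plain Go, with the hypothetical helper `subSaturated` standing in for the vector method:

```go
package main

import "fmt"

// subSaturated is a hypothetical scalar model of unsigned saturating
// subtraction: results that would underflow are clamped to 0.
func subSaturated(x, y [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := range x {
		if x[i] > y[i] {
			r[i] = x[i] - y[i]
		}
	}
	return r
}

func main() {
	x := [8]uint16{10, 3, 0, 65535, 7, 1, 500, 2}
	y := [8]uint16{3, 10, 1, 1, 7, 2, 100, 0}
	fmt.Println(subSaturated(x, y)) // [7 0 0 65534 0 0 400 2]
}
```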
func (Uint16x8) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVWB, CPU Feature: AVX512
type Uint32x16 ¶
type Uint32x16 struct {
// contains filtered or unexported fields
}
Uint32x16 is a 512-bit SIMD vector of 16 uint32s.
func BroadcastUint32x16 ¶
BroadcastUint32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadMaskedUint32x16 ¶
LoadMaskedUint32x16 loads a Uint32x16 from an array, at those elements enabled by mask.
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func LoadUint32x16 ¶
LoadUint32x16 loads a Uint32x16 from an array.
func LoadUint32x16Slice ¶
LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s.
func LoadUint32x16SlicePart ¶
LoadUint32x16SlicePart loads a Uint32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.
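The zero-fill behavior for short slices can be modeled in plain Go with the built-in copy. `loadSlicePart` is a hypothetical stand-in for the vector function, and the array is a stand-in for a Uint32x16.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of a partial load:
// it copies up to 16 elements from s and leaves the rest at zero.
func loadSlicePart(s []uint32) [16]uint32 {
	var v [16]uint32
	copy(v[:], s) // copies min(len(s), 16) elements
	return v
}

func main() {
	fmt.Println(loadSlicePart([]uint32{1, 2, 3}))
	// [1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0]
}
```

This pattern is typically useful for the tail of a loop, where fewer than 16 elements remain.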
func (Uint32x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512
func (Uint32x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint32x16) AsFloat32x16 ¶
func (x Uint32x16) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Uint32x16) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Uint32x16) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Uint32x16) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Uint32x16) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Uint32x16) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Uint32x16) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Uint32x16) Compress ¶
Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Uint32x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
func (Uint32x16) ConvertToFloat32 ¶
func (x Uint32x16) ConvertToFloat32() Float32x16
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX512
func (Uint32x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. It distributes those elements, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Uint32x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512
func (Uint32x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512
func (Uint32x16) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUD, CPU Feature: AVX512
func (Uint32x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUD, CPU Feature: AVX512
func (Uint32x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512
func (Uint32x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint32x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512
func (Uint32x16) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX512
func (Uint32x16) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
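The per-element effect matches a 32-bit rotate from the standard library. The sketch below is a scalar model in plain Go; `rotateAllLeft` is a hypothetical helper, and the uint8 type of its shift parameter is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAllLeft is a hypothetical scalar model: every lane is rotated
// left by the same bit count, using math/bits for the rotation.
func rotateAllLeft(x [16]uint32, shift uint8) [16]uint32 {
	var r [16]uint32
	for i, v := range x {
		r[i] = bits.RotateLeft32(v, int(shift))
	}
	return r
}

func main() {
	var x [16]uint32
	x[0] = 0x80000001
	x[1] = 0x000000FF
	r := rotateAllLeft(x, 4)
	fmt.Printf("%#x %#x\n", r[0], r[1]) // 0x18 0xff0
}
```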
func (Uint32x16) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x16) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Uint32x16) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x16) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation.
Asm: VPMOVUSDW, CPU Feature: AVX512
func (Uint32x16) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation.
Asm: VPMOVUSDB, CPU Feature: AVX512
func (Uint32x16) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Uint32x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint32x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint32x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX512
func (Uint32x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
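This is a funnel shift: the bits shifted out of the top of x are discarded, and the vacated low bits are filled from the top of y. A single-lane scalar sketch in plain Go, where `shiftLeftConcat32` is a hypothetical helper and the uint8 shift type is an assumption:

```go
package main

import "fmt"

// shiftLeftConcat32 is a hypothetical scalar model of one 32-bit lane:
// x is shifted left by the low 5 bits of shift, and the emptied low bits
// are filled with the upper bits of y.
func shiftLeftConcat32(x, y uint32, shift uint8) uint32 {
	s := shift & 31
	// When s is 0, y>>32 is 0 in Go, so the result is x unchanged.
	return x<<s | y>>(32-s)
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat32(0x0000FFFF, 0xABCD0000, 8))  // 0xffffab
	fmt.Printf("%#x\n", shiftLeftConcat32(0x0000FFFF, 0xABCD0000, 40)) // 0xffffab (40 & 31 == 8)
	fmt.Printf("%#x\n", shiftLeftConcat32(0x0000FFFF, 0xABCD0000, 0))  // 0xffff
}
```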
func (Uint32x16) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLD, CPU Feature: AVX512
func (Uint32x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Uint32x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX512
func (Uint32x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x16) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVD, CPU Feature: AVX512
func (Uint32x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x16) StoreMasked ¶
StoreMasked stores a Uint32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512
func (Uint32x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint32s.
func (Uint32x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Uint32x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512
func (Uint32x16) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16.
Asm: VPMOVDW, CPU Feature: AVX512
func (Uint32x16) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint32x4 ¶
type Uint32x4 struct {
// contains filtered or unexported fields
}
Uint32x4 is a 128-bit SIMD vector of 4 uint32s.
func BroadcastUint32x4 ¶
BroadcastUint32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadMaskedUint32x4 ¶
LoadMaskedUint32x4 loads a Uint32x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func LoadUint32x4 ¶
LoadUint32x4 loads a Uint32x4 from an array.
func LoadUint32x4Slice ¶
LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s.
func LoadUint32x4SlicePart ¶
LoadUint32x4SlicePart loads a Uint32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.
func (Uint32x4) AESInvMixColumns ¶
AESInvMixColumns performs the InvMixColumns operation in the AES cipher algorithm defined in FIPS 197. x is the chunk of the w array in use. result = InvMixColumns(x)
Asm: VAESIMC, CPU Feature: AVX, AES
func (Uint32x4) AESRoundKeyGenAssist ¶
AESRoundKeyGenAssist performs some components of KeyExpansion in the AES cipher algorithm defined in FIPS 197. x is an array of AES words, but only x[0] and x[2] are used. r is a value from the Rcon constant array.
result[0] = XOR(SubWord(RotWord(x[0])), r)
result[1] = SubWord(x[1])
result[2] = XOR(SubWord(RotWord(x[2])), r)
result[3] = SubWord(x[3])
rconVal results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES
func (Uint32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDD, CPU Feature: AVX
func (Uint32x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint32x4) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Uint32x4) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Uint32x4) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Uint32x4) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Uint32x4) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Uint32x4) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512
func (Uint32x4) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Uint32x4) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Uint32x4) Compress ¶
Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Uint32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
func (Uint32x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512
func (Uint32x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX
func (Uint32x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. It distributes those elements, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Uint32x4) ExtendLo2ToUint64 ¶
ExtendLo2ToUint64 zero-extends the 2 lowest vector element values to uint64.
Asm: VPMOVZXDQ, CPU Feature: AVX
func (Uint32x4) ExtendToUint64 ¶
ExtendToUint64 zero-extends element values to uint64.
Asm: VPMOVZXDQ, CPU Feature: AVX2
func (Uint32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Uint32x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX
func (Uint32x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Uint32x4) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX
func (Uint32x4) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX
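For 4 lanes, interleaving the low halves alternates the two lowest elements of x and y. A scalar sketch in plain Go, with `interleaveLo` as a hypothetical stand-in for the method:

```go
package main

import "fmt"

// interleaveLo is a hypothetical scalar model for 4 lanes: the low two
// elements of x and y are alternated, starting with x.
func interleaveLo(x, y [4]uint32) [4]uint32 {
	return [4]uint32{x[0], y[0], x[1], y[1]}
}

func main() {
	x := [4]uint32{1, 2, 3, 4}
	y := [4]uint32{10, 20, 30, 40}
	fmt.Println(interleaveLo(x, y)) // [1 10 2 20]
}
```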
func (Uint32x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint32x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Uint32x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Uint32x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUD, CPU Feature: AVX
func (Uint32x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUD, CPU Feature: AVX
func (Uint32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX
func (Uint32x4) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1[2*i] * v2[2*i].
Asm: VPMULUDQ, CPU Feature: AVX
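The formula above can be sketched in plain scalar Go. `mulEvenWiden` is a hypothetical helper; the 2-lane uint64 result shape is an assumption based on "widening" a 4-lane uint32 input.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: lanes 0 and 2 of each
// input are widened to 64 bits and multiplied, so no product bits are lost.
// Odd-indexed lanes are ignored.
func mulEvenWiden(v1, v2 [4]uint32) [2]uint64 {
	var r [2]uint64
	for i := range r {
		r[i] = uint64(v1[2*i]) * uint64(v2[2*i])
	}
	return r
}

func main() {
	v1 := [4]uint32{0xFFFFFFFF, 7, 3, 9}
	v2 := [4]uint32{0xFFFFFFFF, 5, 4, 9}
	r := mulEvenWiden(v1, v2)
	fmt.Printf("%#x %d\n", r[0], r[1]) // 0xfffffffe00000001 12
}
```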
func (Uint32x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Uint32x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint32x4) PermuteScalars ¶
PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX
func (Uint32x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Uint32x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Uint32x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x4) SHA1FourRounds ¶
SHA1FourRounds performs 4 rounds of the B loop in the SHA1 algorithm defined in FIPS 180-4. x contains the state variables a, b, c and d from upper to lower order. y contains the W array elements (with the state variable e added to the upper element) from upper to lower order. result = the state variables a', b', c', d' updated after 4 rounds. constant = 0 for the first 20 rounds of the loop, 1 for the next 20 rounds of the loop, and so on, up to 3 for the last 20 rounds of the loop.
constant results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: SHA1RNDS4, CPU Feature: SHA
func (Uint32x4) SHA1Message1 ¶
SHA1Message1 does the XORing of 1 in the SHA1 algorithm defined in FIPS 180-4.
x = {W3, W2, W1, W0}
y = {0, 0, W5, W4}
result = {W3^W5, W2^W4, W1^W3, W0^W2}
Asm: SHA1MSG1, CPU Feature: SHA
func (Uint32x4) SHA1Message2 ¶
SHA1Message2 does the calculation of 3 and 4 in the SHA1 algorithm defined in FIPS 180-4.
x = result of 2
y = {W15, W14, W13}
result = {W19, W18, W17, W16}
Asm: SHA1MSG2, CPU Feature: SHA
func (Uint32x4) SHA1NextE ¶
SHA1NextE calculates the state variable e' updated after 4 rounds in the SHA1 algorithm defined in FIPS 180-4. x contains the state variable a (before the 4 rounds), placed in the upper element. y contains the elements of the W array for the next 4 rounds from upper to lower order. result = the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element, from upper to lower order. For the last round of the loop, you can specify zero for y to obtain the e' value itself, or, better, specify H4:0:0:0 for y to get e' added to H4. (Note that the value of e' is computed only from x; the values of y don't affect it.)
Asm: SHA1NEXTE, CPU Feature: SHA
func (Uint32x4) SHA256Message1 ¶
SHA256Message1 does the sigma and addition of 1 in the SHA256 algorithm defined in FIPS 180-4.
x = {W0, W1, W2, W3}
y = {W4, 0, 0, 0}
result = {W0+σ(W1), W1+σ(W2), W2+σ(W3), W3+σ(W4)}
Asm: SHA256MSG1, CPU Feature: SHA
func (Uint32x4) SHA256Message2 ¶
SHA256Message2 does the sigma and addition of 3 in the SHA256 algorithm defined in FIPS 180-4.
x = result of 2
y = {0, 0, W14, W15}
result = {W16, W17, W18, W19}
Asm: SHA256MSG2, CPU Feature: SHA
func (Uint32x4) SHA256TwoRounds ¶
SHA256TwoRounds does 2 rounds of the B loop to calculate updated state variables in the SHA256 algorithm defined in FIPS 180-4.
x = {h, g, d, c}
y = {f, e, b, a}
z = {W0+K0, W1+K1}
result = {f', e', b', a'}
The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is added to the corresponding element of the W array to form the input data z. The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input data y (the state variables a, b, e, f before the 2 rounds).
Asm: SHA256RNDS2, CPU Feature: SHA
func (Uint32x4) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSDW, CPU Feature: AVX512
func (Uint32x4) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSDB, CPU Feature: AVX512
func (Uint32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81})
returns {4,8,25,81}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
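The worked example in the description can be reproduced with a scalar model in plain Go. `selectFromPair` is a hypothetical helper; its argument order mirrors the example, and masking each selector to 3 bits is an assumption of this model for out-of-range values.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model: selectors 0-3 pick
// elements of x, and selectors 4-7 pick elements 0-3 of y.
func selectFromPair(x [4]uint32, a, b, c, d uint8, y [4]uint32) [4]uint32 {
	pick := func(sel uint8) uint32 {
		sel &= 7 // assumption: only the low 3 bits are meaningful
		if sel < 4 {
			return x[sel]
		}
		return y[sel-4]
	}
	return [4]uint32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]uint32{1, 2, 4, 8}
	y := [4]uint32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```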
func (Uint32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Uint32x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX
func (Uint32x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLD, CPU Feature: AVX
func (Uint32x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX2
func (Uint32x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVD, CPU Feature: AVX2
func (Uint32x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x4) StoreMasked ¶
StoreMasked stores a Uint32x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Uint32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 uint32s.
func (Uint32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Uint32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX
func (Uint32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBD, CPU Feature: AVX
func (Uint32x4) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDW, CPU Feature: AVX512
func (Uint32x4) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint32x8 ¶
type Uint32x8 struct {
// contains filtered or unexported fields
}
Uint32x8 is a 256-bit SIMD vector of 8 uint32s.
func BroadcastUint32x8 ¶
BroadcastUint32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadMaskedUint32x8 ¶
LoadMaskedUint32x8 loads a Uint32x8 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func LoadUint32x8 ¶
LoadUint32x8 loads a Uint32x8 from an array.
func LoadUint32x8Slice ¶
LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s.
func LoadUint32x8SlicePart ¶
LoadUint32x8SlicePart loads a Uint32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.
func (Uint32x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2
func (Uint32x8) AddPairsGrouped ¶
AddPairsGrouped horizontally adds adjacent pairs of elements. With each 128-bit subvector treated as a group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0+x1, x2+x3, ..., y0+y1, y2+y3, ...].
Asm: VPHADDD, CPU Feature: AVX2
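Because the operation works per 128-bit group, the output interleaves results from x and y group by group, not all of x followed by all of y. The scalar sketch below shows the layout for 8 lanes in plain Go; `addPairsGrouped` is a hypothetical helper.

```go
package main

import "fmt"

// addPairsGrouped is a hypothetical scalar model for 8 uint32 lanes.
// Each 128-bit group (4 lanes) is handled independently: the group's low
// two results come from x, and its high two results come from y.
func addPairsGrouped(x, y [8]uint32) [8]uint32 {
	var r [8]uint32
	for g := 0; g < 2; g++ {
		o := 4 * g // offset of the first lane in this group
		r[o+0] = x[o+0] + x[o+1]
		r[o+1] = x[o+2] + x[o+3]
		r[o+2] = y[o+0] + y[o+1]
		r[o+3] = y[o+2] + y[o+3]
	}
	return r
}

func main() {
	x := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]uint32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped(x, y)) // [3 7 30 70 11 15 110 150]
}
```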
func (Uint32x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint32x8) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Uint32x8) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Uint32x8) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Uint32x8) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Uint32x8) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Uint32x8) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Uint32x8) Compress ¶
Compress performs a compression on vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Uint32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
func (Uint32x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512
func (Uint32x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQD, CPU Feature: AVX2
func (Uint32x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. It distributes those elements, in order, to the positions selected by mask, from the lowest mask element to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Uint32x8) ExtendToUint64 ¶
ExtendToUint64 zero-extends element values to uint64.
Asm: VPMOVZXDQ, CPU Feature: AVX512
func (Uint32x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2
func (Uint32x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Uint32x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2
func (Uint32x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2
func (Uint32x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint32x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Uint32x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Uint32x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUD, CPU Feature: AVX2
func (Uint32x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUD, CPU Feature: AVX2
func (Uint32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2
func (Uint32x8) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1[2*i] * v2[2*i].
Asm: VPMULUDQ, CPU Feature: AVX2
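The formula Result[i] = v1[2*i] * v2[2*i] can be sketched in scalar Go as follows. mulEvenWidenModel is a hypothetical name, and the 4-element uint64 result type is an assumption based on the widening from 32 to 64 bits.

```go
package main

import "fmt"

// mulEvenWidenModel is a hypothetical scalar stand-in for
// Uint32x8.MulEvenWiden. Only the even-indexed 32-bit elements take part;
// each product is computed in 64 bits so it cannot overflow.
func mulEvenWidenModel(v1, v2 [8]uint32) [4]uint64 {
	var out [4]uint64
	for i := range out {
		out[i] = uint64(v1[2*i]) * uint64(v2[2*i])
	}
	return out
}

func main() {
	v1 := [8]uint32{0xFFFFFFFF, 1, 2, 3, 4, 5, 6, 7}
	v2 := [8]uint32{0xFFFFFFFF, 9, 10, 9, 100, 9, 1000, 9}
	fmt.Println(mulEvenWidenModel(v1, v2)) // [18446744065119617025 20 400 6000]
}
```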
func (Uint32x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Uint32x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint32x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2
func (Uint32x8) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX2
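The result formula can be written out as a scalar sketch. permuteScalarsGroupedModel is a hypothetical name, and reducing out-of-range selectors to their low 2 bits is an assumption made only to keep the model total.

```go
package main

import "fmt"

// permuteScalarsGroupedModel is a hypothetical scalar stand-in for
// Uint32x8.PermuteScalarsGrouped. The same four selectors are applied to
// each 128-bit group (elements 0-3 and elements 4-7).
// Assumption: selectors outside 0-3 are reduced to their low 2 bits.
func permuteScalarsGroupedModel(x [8]uint32, a, b, c, d uint8) [8]uint32 {
	a, b, c, d = a&3, b&3, c&3, d&3
	return [8]uint32{
		x[a], x[b], x[c], x[d],
		x[a+4], x[b+4], x[c+4], x[d+4],
	}
}

func main() {
	x := [8]uint32{0, 1, 2, 3, 4, 5, 6, 7}
	// Reverse the elements within each 128-bit group.
	fmt.Println(permuteScalarsGroupedModel(x, 3, 2, 1, 0)) // [3 2 1 0 7 6 5 4]
}
```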
func (Uint32x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Uint32x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Uint32x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x8) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation.
Asm: VPMOVUSDW, CPU Feature: AVX512
func (Uint32x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSDB, CPU Feature: AVX512
func (Uint32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
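The selection can be reproduced with a scalar sketch that uses the example values above. select128FromPairModel is a hypothetical name, arrays stand in for the vectors, and the argument order (lo, hi, y) simply mirrors the example call.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar stand-in for
// Uint32x8.Select128FromPair. The pair x, y is viewed as four 128-bit
// elements: 0 and 1 are the halves of x, 2 and 3 are the halves of y.
func select128FromPairModel(x [8]uint32, lo, hi uint8, y [8]uint32) [8]uint32 {
	pick := func(sel uint8) []uint32 {
		switch sel {
		case 0:
			return x[0:4]
		case 1:
			return x[4:8]
		case 2:
			return y[0:4]
		case 3:
			return y[4:8]
		}
		// The model panics on out-of-range selectors.
		panic("selector out of range")
	}
	var out [8]uint32
	copy(out[0:4], pick(lo)) // low 128 bits of the result
	copy(out[4:8], pick(hi)) // high 128 bits of the result
	return out
}

func main() {
	x := [8]uint32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]uint32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPairModel(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```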
func (Uint32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest-indexed element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
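The per-half selection rule can be checked against the example above with a scalar sketch. selectFromPairGroupedModel is a hypothetical name, arrays stand in for the vectors, and selectors are assumed to lie in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGroupedModel is a hypothetical scalar stand-in for
// Uint32x8.SelectFromPairGrouped. Within each 128-bit half, selectors 0-3
// pick from that half of x and selectors 4-7 pick elements 0-3 of the same
// half of y.
// Assumption: every selector is in the range 0-7.
func selectFromPairGroupedModel(x [8]uint32, a, b, c, d uint8, y [8]uint32) [8]uint32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]uint32
	for g := 0; g < 8; g += 4 { // g is the first index of each 128-bit half
		for i, s := range sel {
			if s < 4 {
				out[g+i] = x[g+int(s)]
			} else {
				out[g+i] = y[g+int(s-4)]
			}
		}
	}
	return out
}

func main() {
	x := [8]uint32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]uint32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGroupedModel(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```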
func (Uint32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint32x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLD, CPU Feature: AVX2
func (Uint32x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
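A scalar sketch of the 32-bit case is below. shiftAllLeftConcatModel is a hypothetical name, and the argument order (shift, then y) is an assumption. Each result element is the upper 32 bits of the 64-bit value x:y shifted left by shift.

```go
package main

import "fmt"

// shiftAllLeftConcatModel is a hypothetical scalar stand-in for
// Uint32x8.ShiftAllLeftConcat. Only the low 5 bits of shift are used.
// The low bits vacated in x are filled with the upper bits of y.
func shiftAllLeftConcatModel(x [8]uint32, shift uint8, y [8]uint32) [8]uint32 {
	s := uint(shift & 31)
	var out [8]uint32
	for i := range out {
		// When s == 0, y[i] >> 32 is 0 in Go, so the result is x[i].
		out[i] = x[i]<<s | y[i]>>(32-s)
	}
	return out
}

func main() {
	var x, y [8]uint32
	for i := range x {
		x[i] = 0x000000FF
		y[i] = 0xAB000000
	}
	out := shiftAllLeftConcatModel(x, 8, y)
	fmt.Printf("%#x\n", out[0]) // 0xffab
}
```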
func (Uint32x8) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLD, CPU Feature: AVX2
func (Uint32x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Uint32x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVD, CPU Feature: AVX2
func (Uint32x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x8) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVD, CPU Feature: AVX2
func (Uint32x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x8) StoreMasked ¶
StoreMasked stores a Uint32x8 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Uint32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint32s.
func (Uint32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Uint32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2
func (Uint32x8) SubPairsGrouped ¶
SubPairsGrouped horizontally subtracts adjacent pairs of elements. With each 128-bit as a group: for x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [x0-x1, x2-x3, ..., y0-y1, y2-y3, ...].
Asm: VPHSUBD, CPU Feature: AVX2
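For 8 elements, the grouped layout can be sketched in scalar Go. subPairsGroupedModel is a hypothetical name and arrays stand in for the vectors. Within each 128-bit group, the first two results come from x and the last two from y.

```go
package main

import "fmt"

// subPairsGroupedModel is a hypothetical scalar stand-in for
// Uint32x8.SubPairsGrouped. Each 128-bit group holds four uint32 elements.
// Subtraction wraps around, as unsigned arithmetic does in Go.
func subPairsGroupedModel(x, y [8]uint32) [8]uint32 {
	var out [8]uint32
	for g := 0; g < 8; g += 4 { // g is the first index of each 128-bit group
		out[g+0] = x[g+0] - x[g+1]
		out[g+1] = x[g+2] - x[g+3]
		out[g+2] = y[g+0] - y[g+1]
		out[g+3] = y[g+2] - y[g+3]
	}
	return out
}

func main() {
	x := [8]uint32{10, 1, 20, 2, 30, 3, 40, 4}
	y := [8]uint32{100, 10, 200, 20, 300, 30, 400, 40}
	fmt.Println(subPairsGroupedModel(x, y)) // [9 18 90 180 27 36 270 360]
}
```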
func (Uint32x8) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16.
Asm: VPMOVDW, CPU Feature: AVX512
func (Uint32x8) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint64x2 ¶
type Uint64x2 struct {
// contains filtered or unexported fields
}
Uint64x2 is a 128-bit SIMD vector of 2 uint64s.
func BroadcastUint64x2 ¶
BroadcastUint64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadMaskedUint64x2 ¶
LoadMaskedUint64x2 loads a Uint64x2 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func LoadUint64x2 ¶
LoadUint64x2 loads a Uint64x2 from an array.
func LoadUint64x2Slice ¶
LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s.
func LoadUint64x2SlicePart ¶
LoadUint64x2SlicePart loads a Uint64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.
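The zero-fill behavior can be modeled in scalar Go with copy. loadSlicePartModel is a hypothetical name, and a [2]uint64 array stands in for Uint64x2.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar stand-in for
// LoadUint64x2SlicePart. copy transfers min(len(s), 2) elements, and the
// rest of v keeps its zero value.
func loadSlicePartModel(s []uint64) [2]uint64 {
	var v [2]uint64
	copy(v[:], s)
	return v
}

func main() {
	fmt.Println(loadSlicePartModel(nil))                // [0 0]
	fmt.Println(loadSlicePartModel([]uint64{7}))        // [7 0]
	fmt.Println(loadSlicePartModel([]uint64{7, 8, 9}))  // [7 8]
}
```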
func (Uint64x2) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint64x2) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Uint64x2) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Uint64x2) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Uint64x2) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Uint64x2) AsUint8x16 ¶
AsUint8x16 returns a Uint8x16 with the same bit representation as x.
func (Uint64x2) Broadcast1To2 ¶
Broadcast1To2 copies the lowest element of its input to all 2 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Uint64x2) Broadcast1To4 ¶
Broadcast1To4 copies the lowest element of its input to all 4 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Uint64x2) Broadcast1To8 ¶
Broadcast1To8 copies the lowest element of its input to all 8 elements of the output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512
func (Uint64x2) CarrylessMultiply ¶
CarrylessMultiply computes one of four possible carryless multiplications of selected high and low halves of x and y, depending on the values of a and b, returning the 128-bit product in the concatenated two elements of the result. a selects the low (0) or high (1) element of x and b selects the low (0) or high (1) element of y.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance; otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX
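The XOR-based multiplication can be written as a scalar shift-and-XOR loop. This is a sketch of the arithmetic only: clmul64 and carrylessMultiplyModel are hypothetical names, the argument order (a, b, y) is an assumption, and the low 64 bits of the product are placed in element 0.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (hi, lo).
// For every set bit i of b, a shifted left by i is XORed into the result.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if (b>>i)&1 == 1 {
			lo ^= a << i
			hi ^= a >> (64 - i) // for i == 0 this shifts by 64, which is 0 in Go
		}
	}
	return hi, lo
}

// carrylessMultiplyModel is a hypothetical scalar stand-in for
// Uint64x2.CarrylessMultiply. a selects the low (0) or high (1) element
// of x, and b does the same for y.
func carrylessMultiplyModel(x [2]uint64, a, b uint8, y [2]uint64) [2]uint64 {
	hi, lo := clmul64(x[a&1], y[b&1])
	return [2]uint64{lo, hi}
}

func main() {
	x := [2]uint64{0b11, 0b1001}
	y := [2]uint64{0b11, 0b1001}
	r := carrylessMultiplyModel(x, 0, 0, y)
	fmt.Printf("%b\n", r[0]) // 101
}
```

The printed value matches the base-two example above: 11 * 11 = 101.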
func (Uint64x2) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Uint64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Uint64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSX, CPU Feature: AVX512
func (Uint64x2) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x2) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX
func (Uint64x2) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. It distributes those elements to the positions enabled by mask, in order from the lowest enabled position to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Uint64x2) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Uint64x2) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX
func (Uint64x2) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX
func (Uint64x2) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX
func (Uint64x2) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX
func (Uint64x2) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint64x2) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x2) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX
func (Uint64x2) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX
func (Uint64x2) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x2) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x2) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Uint64x2) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x2) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint64x2) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x2) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x2) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Uint64x2) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x2) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x2) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQD, CPU Feature: AVX512
func (Uint64x2) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQB, CPU Feature: AVX512
func (Uint64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Uint64x2) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Uint64x2) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX
func (Uint64x2) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLQ, CPU Feature: AVX
func (Uint64x2) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Uint64x2) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVQ, CPU Feature: AVX2
func (Uint64x2) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x2) StoreMasked ¶
StoreMasked stores a Uint64x2 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Uint64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 uint64s.
func (Uint64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Uint64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX
func (Uint64x2) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x2) TruncateToUint32 ¶
TruncateToUint32 truncates element values to uint32. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x2) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint64x4 ¶
type Uint64x4 struct {
// contains filtered or unexported fields
}
Uint64x4 is a 256-bit SIMD vector of 4 uint64s.
func BroadcastUint64x4 ¶
BroadcastUint64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadMaskedUint64x4 ¶
LoadMaskedUint64x4 loads a Uint64x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func LoadUint64x4 ¶
LoadUint64x4 loads a Uint64x4 from an array.
func LoadUint64x4Slice ¶
LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s.
func LoadUint64x4SlicePart ¶
LoadUint64x4SlicePart loads a Uint64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.
func (Uint64x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2
func (Uint64x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint64x4) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Uint64x4) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Uint64x4) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Uint64x4) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Uint64x4) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Uint64x4) AsUint8x32 ¶
AsUint8x32 returns a Uint8x32 with the same bit representation as x.
func (Uint64x4) CarrylessMultiplyGrouped ¶
CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the two 128-bit lanes of x and y, depending on the values of a and b, and returns the two 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b will result in better performance; otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
func (Uint64x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Uint64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Uint64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSY, CPU Feature: AVX512
func (Uint64x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x4) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX2
func (Uint64x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. It distributes those elements to the positions enabled by mask, in order from the lowest enabled position to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Uint64x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2
func (Uint64x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Uint64x4) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2
func (Uint64x4) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2
func (Uint64x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint64x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x4) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Uint64x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Uint64x4) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x4) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Uint64x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint64x4) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
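The index masking can be seen in a scalar sketch. permuteModel is a hypothetical name, and representing indices as a 4-element uint64 array is an assumption about the index vector's type.

```go
package main

import "fmt"

// permuteModel is a hypothetical scalar stand-in for Uint64x4.Permute.
// Each result element is x[indices[i]], using only the low 2 bits of the
// index.
func permuteModel(x, indices [4]uint64) [4]uint64 {
	var out [4]uint64
	for i, idx := range indices {
		out[i] = x[idx&3]
	}
	return out
}

func main() {
	x := [4]uint64{10, 20, 30, 40}
	indices := [4]uint64{3, 3, 0, 6} // 6 has low bits 10, so it selects x[2]
	fmt.Println(permuteModel(x, indices)) // [40 40 10 30]
}
```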
func (Uint64x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
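A scalar model of the per-element rotation can be built on math/bits. rotateLeftModel is a hypothetical name and arrays stand in for the vectors. Because a 64-bit rotation repeats every 64 bits, reducing each count modulo 64 does not change the result.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeftModel is a hypothetical scalar stand-in for
// Uint64x4.RotateLeft. Element i of x is rotated left by element i of y.
func rotateLeftModel(x, y [4]uint64) [4]uint64 {
	var out [4]uint64
	for i := range out {
		out[i] = bits.RotateLeft64(x[i], int(y[i]&63))
	}
	return out
}

func main() {
	x := [4]uint64{1, 1 << 63, 0xF0, 0x8000000000000001}
	y := [4]uint64{4, 1, 60, 65}
	fmt.Println(rotateLeftModel(x, y)) // [16 1 15 3]
}
```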
func (Uint64x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x4) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x4) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32 with unsigned saturation.
Asm: VPMOVUSQD, CPU Feature: AVX512
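Unsigned saturation clamps values that do not fit instead of discarding their high bits. A scalar sketch, with the hypothetical name saturateToUint32Model and an assumed 4-element uint32 result:

```go
package main

import (
	"fmt"
	"math"
)

// saturateToUint32Model is a hypothetical scalar stand-in for
// Uint64x4.SaturateToUint32. Values above the uint32 range become
// math.MaxUint32. Other values are converted unchanged.
func saturateToUint32Model(x [4]uint64) [4]uint32 {
	var out [4]uint32
	for i, v := range x {
		if v > math.MaxUint32 {
			out[i] = math.MaxUint32
		} else {
			out[i] = uint32(v)
		}
	}
	return out
}

func main() {
	x := [4]uint64{0, 4294967295, 4294967296, 1 << 63}
	fmt.Println(saturateToUint32Model(x)) // [0 4294967295 4294967295 4294967295]
}
```

A plain truncation of 4294967296 would instead give 0, since only the low 32 bits would be kept.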
func (Uint64x4) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQB, CPU Feature: AVX512
func (Uint64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Uint64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Uint64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint64x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX2
func (Uint64x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLQ, CPU Feature: AVX2
func (Uint64x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Uint64x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVQ, CPU Feature: AVX2
func (Uint64x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x4) StoreMasked ¶
StoreMasked stores a Uint64x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Uint64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 uint64s.
func (Uint64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Uint64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2
func (Uint64x4) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x4) TruncateToUint32 ¶
TruncateToUint32 truncates element values to uint32.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x4) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint64x8 ¶
type Uint64x8 struct {
// contains filtered or unexported fields
}
Uint64x8 is a 512-bit SIMD vector of 8 uint64s.
func BroadcastUint64x8 ¶
BroadcastUint64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadMaskedUint64x8 ¶
LoadMaskedUint64x8 loads a Uint64x8 from an array, at those elements enabled by mask.
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func LoadUint64x8 ¶
LoadUint64x8 loads a Uint64x8 from an array.
func LoadUint64x8Slice ¶
LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s.
func LoadUint64x8SlicePart ¶
LoadUint64x8SlicePart loads a Uint64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.
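The zero-fill behavior described for LoadUint64x8SlicePart can be pictured with a scalar model. This is only a sketch: loadSlicePart8 is a hypothetical helper that uses a plain array in place of the vector type and does not call the archsimd package.

```go
package main

import "fmt"

// loadSlicePart8 is a hypothetical scalar model of LoadUint64x8SlicePart:
// it copies at most 8 elements from s and leaves the remaining lanes zero.
func loadSlicePart8(s []uint64) [8]uint64 {
	var v [8]uint64
	copy(v[:], s) // copy stops at min(len(s), 8)
	return v
}

func main() {
	fmt.Println(loadSlicePart8([]uint64{1, 2, 3})) // [1 2 3 0 0 0 0 0]
}
```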
func (Uint64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512
func (Uint64x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512
func (Uint64x8) AsFloat32x16 ¶
func (x Uint64x8) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Uint64x8) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Uint64x8) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Uint64x8) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Uint64x8) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Uint64x8) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Uint64x8) AsUint8x64 ¶
AsUint8x64 returns a Uint8x64 with the same bit representation as x.
func (Uint64x8) CarrylessMultiplyGrouped ¶
CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the four 128-bit lanes of x and y, depending on the values of a and b, and returns the four 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two):
11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b result in better performance; otherwise the intrinsic may be translated into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
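As a scalar illustration of the carryless product described above, the following sketch multiplies two 64-bit values into a 128-bit result using only shifts and XOR. clmul64 is a hypothetical helper for one 64-bit half of one lane; it is a reference model, not the intrinsic.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (hi, lo).
// Each set bit i of b contributes a shifted left by i, combined with XOR
// instead of addition, so no carries propagate.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := 0; i < 64; i++ {
		if b>>i&1 == 1 {
			lo ^= a << i
			if i > 0 {
				hi ^= a >> (64 - i) // bits shifted out of the low word
			}
		}
	}
	return hi, lo
}

func main() {
	hi, lo := clmul64(0b11, 0b11)
	fmt.Printf("%b %b\n", hi, lo) // 0 101
}
```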
func (Uint64x8) Compress ¶
Compress performs a compression on vector x using mask by selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
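A scalar sketch of the packing that Compress describes, using a plain array and a bitmask in place of the vector and mask types. compress8 is a hypothetical helper, and filling the unused upper lanes with zero is an assumption that the description above does not state.

```go
package main

import "fmt"

// compress8 packs the lanes of x whose mask bit is set into the low lanes
// of the result, preserving their order. Unused upper lanes are left zero
// (an assumption of this model).
func compress8(x [8]uint64, mask uint8) [8]uint64 {
	var out [8]uint64
	j := 0
	for i := 0; i < 8; i++ {
		if mask>>i&1 == 1 {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(compress8(x, 0b10100101)) // [10 12 15 17 0 0 0 0]
}
```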
func (Uint64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2Q, CPU Feature: AVX512
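The indexing rule for ConcatPermute can be sketched in scalar form as follows. concatPermute8 is a hypothetical helper that uses arrays in place of vectors; with 8 lanes per input, xy has 16 entries, so this model uses the low 4 bits of each index.

```go
package main

import "fmt"

// concatPermute8 models ConcatPermute for 8-lane vectors: xy is x (lower
// half) followed by y (upper half), and each result lane i is xy[indices[i]].
func concatPermute8(x, y, indices [8]uint64) [8]uint64 {
	var xy [16]uint64
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var out [8]uint64
	for i, idx := range indices {
		out[i] = xy[idx&15] // 16 entries need 4 index bits
	}
	return out
}

func main() {
	x := [8]uint64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{0, 8, 1, 9, 2, 10, 3, 11}
	fmt.Println(concatPermute8(x, y, idx)) // [0 100 1 101 2 102 3 103]
}
```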
func (Uint64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PS, CPU Feature: AVX512
func (Uint64x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x8) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQQ, CPU Feature: AVX512
func (Uint64x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. It distributes those elements to the positions enabled by mask, in order from the lowest enabled position to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
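A scalar sketch of Expand, using a plain array and a bitmask in place of the vector and mask types. expand8 is a hypothetical helper, and leaving the disabled positions zero is an assumption that the description above does not state.

```go
package main

import "fmt"

// expand8 takes the elements packed into the low lanes of x and places them,
// in order, into the lanes whose mask bit is set. Lanes whose mask bit is
// clear are left zero (an assumption of this model).
func expand8(x [8]uint64, mask uint8) [8]uint64 {
	var out [8]uint64
	j := 0
	for i := 0; i < 8; i++ {
		if mask>>i&1 == 1 {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(expand8(x, 0b10100101)) // [10 0 11 0 0 12 0 13]
}
```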
func (Uint64x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512
func (Uint64x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512
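For 64-bit elements each 128-bit subvector holds two lanes, so the grouped interleaves reduce to picking one lane from x and one from y per group. The sketch below models this with arrays; the helper names are hypothetical, and placing x's element before y's in each group is an assumption based on the operand order of the underlying unpack instructions.

```go
package main

import "fmt"

// interleaveLoGrouped pairs the low lane of each 128-bit group of x with the
// low lane of the same group of y.
func interleaveLoGrouped(x, y [8]uint64) [8]uint64 {
	var out [8]uint64
	for g := 0; g < 4; g++ {
		out[2*g] = x[2*g]
		out[2*g+1] = y[2*g]
	}
	return out
}

// interleaveHiGrouped does the same with the high lane of each group.
func interleaveHiGrouped(x, y [8]uint64) [8]uint64 {
	var out [8]uint64
	for g := 0; g < 4; g++ {
		out[2*g] = x[2*g+1]
		out[2*g+1] = y[2*g+1]
	}
	return out
}

func main() {
	x := [8]uint64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveLoGrouped(x, y)) // [0 10 2 12 4 14 6 16]
	fmt.Println(interleaveHiGrouped(x, y)) // [1 11 3 13 5 15 7 17]
}
```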
func (Uint64x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x8) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x8) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512
func (Uint64x8) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
func (Uint64x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by shift.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Uint64x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x8) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16 with unsigned saturation.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x8) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32 with unsigned saturation.
Asm: VPMOVUSQD, CPU Feature: AVX512
func (Uint64x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8 with unsigned saturation. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVUSQB, CPU Feature: AVX512
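The difference between the SaturateTo and TruncateTo conversions is easiest to see side by side. The following scalar sketch models both for 8 uint64 lanes narrowed to uint8, with the results packed into the low 8 bytes of a 16-byte result and the upper bytes zero, as described above. The helper names are hypothetical.

```go
package main

import "fmt"

// saturateToUint8 clamps each lane to the uint8 range before narrowing.
func saturateToUint8(x [8]uint64) [16]uint8 {
	var out [16]uint8 // upper 8 bytes stay zero
	for i, v := range x {
		if v > 0xFF {
			v = 0xFF
		}
		out[i] = uint8(v)
	}
	return out
}

// truncateToUint8 keeps only the low 8 bits of each lane.
func truncateToUint8(x [8]uint64) [16]uint8 {
	var out [16]uint8 // upper 8 bytes stay zero
	for i, v := range x {
		out[i] = uint8(v)
	}
	return out
}

func main() {
	x := [8]uint64{1, 255, 256, 1000, 0, 65535, 300, 7}
	fmt.Println(saturateToUint8(x))
	fmt.Println(truncateToUint8(x))
}
```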
func (Uint64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
func (Uint64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint64x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by y bits.
Asm: VPSLLQ, CPU Feature: AVX512
func (Uint64x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by shift (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
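The concatenating shift can be modeled per lane as a funnel shift. The sketch below is a scalar reference for one 64-bit lane; shiftLeftConcat64 is a hypothetical helper. It masks the count to 0-63, which is how the underlying VPSHLDQ instruction treats quadword elements.

```go
package main

import "fmt"

// shiftLeftConcat64 shifts x left by shift bits and fills the vacated low
// bits with the upper bits of y.
func shiftLeftConcat64(x, y uint64, shift uint8) uint64 {
	s := uint(shift & 63) // count is taken modulo 64 for 64-bit lanes
	// When s == 0, y>>64 evaluates to 0 in Go, so the result is x unchanged.
	return x<<s | y>>(64-s)
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat64(0x1, 0x8000000000000000, 1)) // 0x3
}
```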
func (Uint64x8) ShiftAllRight ¶
ShiftAllRight performs an unsigned right shift on each element by y bits.
Asm: VPSRLQ, CPU Feature: AVX512
func (Uint64x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by shift (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements.
Asm: VPSLLVQ, CPU Feature: AVX512
func (Uint64x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x8) ShiftRight ¶
ShiftRight performs an unsigned right shift on each element in x by the number of bits specified in y's corresponding elements.
Asm: VPSRLVQ, CPU Feature: AVX512
func (Uint64x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used for 64-bit elements), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x8) StoreMasked ¶
StoreMasked stores a Uint64x8 to an array, at those elements enabled by mask.
Asm: VMOVDQU64, CPU Feature: AVX512
func (Uint64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint64s.
func (Uint64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Uint64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512
func (Uint64x8) TruncateToUint16 ¶
TruncateToUint16 truncates element values to uint16.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x8) TruncateToUint32 ¶
TruncateToUint32 truncates element values to uint32.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x8) TruncateToUint8 ¶
TruncateToUint8 truncates element values to uint8. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint8x16 ¶
type Uint8x16 struct {
// contains filtered or unexported fields
}
Uint8x16 is a 128-bit SIMD vector of 16 uint8s.
func BroadcastUint8x16 ¶
BroadcastUint8x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadUint8x16 ¶
LoadUint8x16 loads a Uint8x16 from an array.
func LoadUint8x16Slice ¶
LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s.
func LoadUint8x16SlicePart ¶
LoadUint8x16SlicePart loads a Uint8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.
func (Uint8x16) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX, AES
func (Uint8x16) AESDecryptOneRound ¶
AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX, AES
func (Uint8x16) AESEncryptLastRound ¶
AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX, AES
func (Uint8x16) AESEncryptOneRound ¶
AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX, AES
func (Uint8x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX
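Saturation here means that a sum that would overflow uint8 is clamped to 255 instead of wrapping. A scalar sketch for a single lane, where addSat8 is a hypothetical helper:

```go
package main

import "fmt"

// addSat8 adds two uint8 values and clamps the result to 255 on overflow.
func addSat8(a, b uint8) uint8 {
	sum := uint16(a) + uint16(b) // widen so the true sum is representable
	if sum > 0xFF {
		return 0xFF
	}
	return uint8(sum)
}

func main() {
	fmt.Println(addSat8(200, 100), addSat8(20, 30)) // 255 50
}
```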
func (Uint8x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint8x16) AsFloat32x4 ¶
AsFloat32x4 returns a Float32x4 with the same bit representation as x.
func (Uint8x16) AsFloat64x2 ¶
AsFloat64x2 returns a Float64x2 with the same bit representation as x.
func (Uint8x16) AsUint16x8 ¶
AsUint16x8 returns a Uint16x8 with the same bit representation as x.
func (Uint8x16) AsUint32x4 ¶
AsUint32x4 returns a Uint32x4 with the same bit representation as x.
func (Uint8x16) AsUint64x2 ¶
AsUint64x2 returns a Uint64x2 with the same bit representation as x.
func (Uint8x16) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX
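The rounding performed by the underlying VPAVGB instruction is round-half-up: each lane is (a + b + 1) >> 1, computed without overflow. A scalar sketch for one lane, where avg8 is a hypothetical helper:

```go
package main

import "fmt"

// avg8 returns the average of a and b, rounding halves up.
func avg8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(avg8(1, 2), avg8(255, 255)) // 2 255
}
```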
func (Uint8x16) Broadcast1To16 ¶
Broadcast1To16 copies the lowest element of its input to all 16 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Uint8x16) Broadcast1To32 ¶
Broadcast1To32 copies the lowest element of its input to all 32 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Uint8x16) Broadcast1To64 ¶
Broadcast1To64 copies the lowest element of its input to all 64 elements of the output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512
func (Uint8x16) Compress ¶
Compress performs a compression on vector x using mask by selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Uint8x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Uint8x16) ConcatShiftBytesRight ¶
ConcatShiftBytesRight concatenates x and y and shifts the result right by shift bytes. The result vector is the lower half of the shifted concatenation.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX
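A scalar sketch of the byte-wise concatenate-and-shift. The description above does not say which operand forms the upper half of the concatenation, so the hypothetical helper below names its parameters hi and lo explicitly rather than x and y; mapping them onto the method's receiver and argument is left as an assumption to check.

```go
package main

import "fmt"

// concatShiftBytesRight forms the 32-byte value hi:lo (lo in bytes 0-15),
// shifts it right by shift bytes, and returns the low 16 bytes. Bytes
// shifted in from beyond the concatenation are zero.
func concatShiftBytesRight(hi, lo [16]byte, shift uint8) [16]byte {
	var cat [32]byte
	copy(cat[:16], lo[:])
	copy(cat[16:], hi[:])
	var out [16]byte
	for i := range out {
		if j := i + int(shift); j < len(cat) {
			out[i] = cat[j]
		}
	}
	return out
}

func main() {
	var hi, lo [16]byte
	for i := range lo {
		lo[i] = byte(i)
		hi[i] = byte(16 + i)
	}
	fmt.Println(concatShiftBytesRight(hi, lo, 4)) // bytes 4 through 19
}
```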
func (Uint8x16) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX
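A scalar sketch of the pairwise multiply-add. It assumes the behavior of the underlying VPMADDUBSW instruction: the receiver's elements are treated as unsigned bytes, the other operand's as signed bytes, and each pair sum saturates to the int16 range. The helper name and array types are hypothetical stand-ins for the vector types.

```go
package main

import (
	"fmt"
	"math"
)

// dotProductPairsSaturated multiplies unsigned x[i] by signed y[i], adds
// adjacent products, and saturates each sum to int16.
func dotProductPairsSaturated(x [16]uint8, y [16]int8) [8]int16 {
	var out [8]int16
	for i := range out {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		}
		if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	x := [16]uint8{2, 4, 255, 255}
	y := [16]int8{3, -5, 127, 127}
	fmt.Println(dotProductPairsSaturated(x, y)) // [-14 32767 0 0 0 0 0 0]
}
```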
func (Uint8x16) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX
func (Uint8x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. It distributes those elements to the positions enabled by mask, in order from the lowest enabled position to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Uint8x16) ExtendLo2ToUint64 ¶
ExtendLo2ToUint64 zero-extends 2 lowest vector element values to uint64.
Asm: VPMOVZXBQ, CPU Feature: AVX
func (Uint8x16) ExtendLo4ToUint32 ¶
ExtendLo4ToUint32 zero-extends 4 lowest vector element values to uint32.
Asm: VPMOVZXBD, CPU Feature: AVX
func (Uint8x16) ExtendLo4ToUint64 ¶
ExtendLo4ToUint64 zero-extends 4 lowest vector element values to uint64.
Asm: VPMOVZXBQ, CPU Feature: AVX2
func (Uint8x16) ExtendLo8ToUint16 ¶
ExtendLo8ToUint16 zero-extends 8 lowest vector element values to uint16.
Asm: VPMOVZXBW, CPU Feature: AVX
func (Uint8x16) ExtendLo8ToUint32 ¶
ExtendLo8ToUint32 zero-extends 8 lowest vector element values to uint32.
Asm: VPMOVZXBD, CPU Feature: AVX2
func (Uint8x16) ExtendLo8ToUint64 ¶
ExtendLo8ToUint64 zero-extends 8 lowest vector element values to uint64.
Asm: VPMOVZXBQ, CPU Feature: AVX512
func (Uint8x16) ExtendToUint16 ¶
ExtendToUint16 zero-extends element values to uint16.
Asm: VPMOVZXBW, CPU Feature: AVX2
func (Uint8x16) ExtendToUint32 ¶
ExtendToUint32 zero-extends element values to uint32.
Asm: VPMOVZXBD, CPU Feature: AVX512
func (Uint8x16) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x16) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x16) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
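For reference, multiplication in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the AES field) can be written in scalar form as a shift-and-XOR loop. gfMul is a hypothetical helper that models one lane; it is not the intrinsic.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 == 1 {
			p ^= a // add (XOR) the current multiple of a
		}
		carry := a & 0x80
		a <<= 1 // multiply a by x
		if carry != 0 {
			a ^= 0x1B // reduce: x^8 = x^4 + x^3 + x + 1
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x57, 0x83)) // 0xc1, the worked example in FIPS 197
}
```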
func (Uint8x16) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512
func (Uint8x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2
func (Uint8x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Uint8x16) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint8x16) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Uint8x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Uint8x16) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUB, CPU Feature: AVX
func (Uint8x16) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUB, CPU Feature: AVX
func (Uint8x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX
func (Uint8x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint8x16) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x16) PermuteOrZero ¶
PermuteOrZero performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX
func (Uint8x16) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX
func (Uint8x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint8s.
func (Uint8x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Uint8x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX
func (Uint8x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX
func (Uint8x16) SumAbsDiff ¶
SumAbsDiff computes the sum of absolute differences between the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which element 4*n contains the sum for the n-th input group; the other elements are zeroed. This can be seen as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX
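A scalar sketch of the result layout for the 16-byte case: two groups of 8 bytes produce two sums, stored in word elements 0 and 4, with the remaining words zero. sumAbsDiff is a hypothetical helper that uses arrays in place of vectors.

```go
package main

import "fmt"

// sumAbsDiff computes, for each group of 8 adjacent bytes, the sum of
// |x[i] - y[i]| and stores it in word element 4*g of the result.
func sumAbsDiff(x, y [16]uint8) [8]uint16 {
	var out [8]uint16
	for g := 0; g < 2; g++ {
		var sum uint16
		for i := 0; i < 8; i++ {
			a, b := x[8*g+i], y[8*g+i]
			if a > b {
				sum += uint16(a - b)
			} else {
				sum += uint16(b - a)
			}
		}
		out[4*g] = sum
	}
	return out
}

func main() {
	var x, y [16]uint8
	for i := range x {
		x[i], y[i] = 5, 2
	}
	y[8] = 255
	fmt.Println(sumAbsDiff(x, y)) // [24 0 0 0 271 0 0 0]
}
```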
type Uint8x32 ¶
type Uint8x32 struct {
// contains filtered or unexported fields
}
Uint8x32 is a 256-bit SIMD vector of 32 uint8s.
func BroadcastUint8x32 ¶
BroadcastUint8x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadUint8x32 ¶
LoadUint8x32 loads a Uint8x32 from an array.
func LoadUint8x32Slice ¶
LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s.
func LoadUint8x32SlicePart ¶
LoadUint8x32SlicePart loads a Uint8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.
func (Uint8x32) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES
func (Uint8x32) AESDecryptOneRound ¶
AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES
func (Uint8x32) AESEncryptLastRound ¶
AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES
func (Uint8x32) AESEncryptOneRound ¶
AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES
func (Uint8x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2
func (Uint8x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX2
func (Uint8x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint8x32) AsFloat32x8 ¶
AsFloat32x8 returns a Float32x8 with the same bit representation as x.
func (Uint8x32) AsFloat64x4 ¶
AsFloat64x4 returns a Float64x4 with the same bit representation as x.
func (Uint8x32) AsInt16x16 ¶
AsInt16x16 returns an Int16x16 with the same bit representation as x.
func (Uint8x32) AsUint16x16 ¶
AsUint16x16 returns a Uint16x16 with the same bit representation as x.
func (Uint8x32) AsUint32x8 ¶
AsUint32x8 returns a Uint32x8 with the same bit representation as x.
func (Uint8x32) AsUint64x4 ¶
AsUint64x4 returns a Uint64x4 with the same bit representation as x.
func (Uint8x32) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX2
func (Uint8x32) Compress ¶
Compress performs a compression on vector x using mask by selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Uint8x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Uint8x32) ConcatShiftBytesRightGrouped ¶
ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by shift bytes. The result vector is the lower half of the shifted concatenation. This operation is performed independently within each 16-byte group.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX2
func (Uint8x32) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX2
func (Uint8x32) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX2
func (Uint8x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower part. It distributes those elements to the positions enabled by mask, in order from the lowest enabled position to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Uint8x32) ExtendToUint16 ¶
ExtendToUint16 zero-extends element values to uint16.
Asm: VPMOVZXBW, CPU Feature: AVX512
func (Uint8x32) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x32) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x32) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
func (Uint8x32) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Emulated, CPU Feature: AVX2
func (Uint8x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Emulated, CPU Feature: AVX2
func (Uint8x32) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint8x32) Less ¶
Less returns a mask whose elements indicate whether x < y.
Emulated, CPU Feature: AVX2
func (Uint8x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Emulated, CPU Feature: AVX2
func (Uint8x32) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUB, CPU Feature: AVX2
func (Uint8x32) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUB, CPU Feature: AVX2
func (Uint8x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Emulated, CPU Feature: AVX2
func (Uint8x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint8x32) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x32) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.
Asm: VPSHUFB, CPU Feature: AVX2
func (Uint8x32) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
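The worked example above can be reproduced with a scalar model that treats x and y as four 16-byte quarters. select128FromPair is a hypothetical helper that uses arrays in place of vectors; in this model, out-of-range selectors cause an index-out-of-range panic.

```go
package main

import "fmt"

// select128FromPair views x:y as four 128-bit elements (x low, x high,
// y low, y high) and returns element lo in the low half of the result and
// element hi in the high half.
func select128FromPair(x, y [32]uint8, lo, hi uint8) [32]uint8 {
	var quarters [4][16]uint8
	copy(quarters[0][:], x[:16])
	copy(quarters[1][:], x[16:])
	copy(quarters[2][:], y[:16])
	copy(quarters[3][:], y[16:])
	var out [32]uint8
	copy(out[:16], quarters[lo][:])
	copy(out[16:], quarters[hi][:])
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = 0x40 + uint8(i)
		y[i] = 0x60 + uint8(i)
	}
	r := select128FromPair(x, y, 3, 0)
	fmt.Printf("%#x %#x\n", r[0], r[16]) // 0x70 0x40
}
```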
func (Uint8x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint8x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint8x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 uint8s.
func (Uint8x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Uint8x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2
func (Uint8x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX2
func (Uint8x32) SumAbsDiff ¶
SumAbsDiff computes the sum of absolute differences between the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which element 4*n contains the sum for the n-th input group; the other elements are zeroed. This can be seen as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX2
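The grouping and the placement of the sums can be sketched in scalar Go. This is an illustrative model on plain arrays, not the package's code; the name sumAbsDiff and the array types are assumptions, with uint16 standing in for the word-sized result elements.

```go
package main

import "fmt"

// sumAbsDiff is a hypothetical scalar model for 32-byte inputs. Each group of
// 8 adjacent bytes yields one sum, stored in element 4*n of the 16 word-sized
// results; the remaining elements stay zero.
func sumAbsDiff(x, y [32]uint8) [16]uint16 {
	var out [16]uint16
	for n := 0; n < 4; n++ {
		var sum uint16 // at most 8*255, so uint16 cannot overflow
		for i := n * 8; i < n*8+8; i++ {
			if x[i] > y[i] {
				sum += uint16(x[i] - y[i])
			} else {
				sum += uint16(y[i] - x[i])
			}
		}
		out[4*n] = sum
	}
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i], y[i] = 10, 7
	}
	// Every group contributes 8 differences of 3.
	fmt.Println(sumAbsDiff(x, y))
}
```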
type Uint8x64 ¶
type Uint8x64 struct {
// contains filtered or unexported fields
}
Uint8x64 is a 512-bit SIMD vector of 64 uint8s.
func BroadcastUint8x64 ¶
BroadcastUint8x64 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512BW
func LoadMaskedUint8x64 ¶
LoadMaskedUint8x64 loads a Uint8x64 from an array, at those elements enabled by mask.
Asm: VMOVDQU8.Z, CPU Feature: AVX512
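A masked load can be modeled in scalar Go as follows. This sketch assumes that elements not enabled by the mask come out as zero, which is how the .Z (zeroing) suffix of the instruction is usually read; the [64]bool mask and the name loadMasked are stand-ins invented for this example.

```go
package main

import "fmt"

// loadMasked is a hypothetical scalar model of a zeroing masked load:
// enabled elements are read from src, disabled elements are left as zero.
func loadMasked(src *[64]uint8, mask [64]bool) [64]uint8 {
	var out [64]uint8
	for i, on := range mask {
		if on {
			out[i] = src[i]
		}
	}
	return out
}

func main() {
	var src [64]uint8
	var mask [64]bool
	for i := range src {
		src[i] = uint8(i + 1)
		mask[i] = i%2 == 0 // enable even positions only
	}
	v := loadMasked(&src, mask)
	fmt.Println(v[:8])
}
```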
func LoadUint8x64 ¶
LoadUint8x64 loads a Uint8x64 from an array.
func LoadUint8x64Slice ¶
LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s.
func LoadUint8x64SlicePart ¶
LoadUint8x64SlicePart loads a Uint8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.
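The zero-filling behavior for short slices can be sketched on a plain array. The name loadSlicePart and the [64]uint8 return type are assumptions for illustration; the sketch only models the element values, not how the load is performed.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model: it copies up to 64 elements
// from s and leaves the rest of the vector as zero.
func loadSlicePart(s []uint8) [64]uint8 {
	var v [64]uint8
	copy(v[:], s) // copies min(len(s), 64) elements
	return v
}

func main() {
	tail := []uint8{9, 8, 7} // e.g. the leftover bytes at the end of a buffer
	v := loadSlicePart(tail)
	fmt.Println(v[:6])
}
```

A partial load of this kind is typically useful for the last iteration of a loop over a buffer whose length is not a multiple of 64.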
func (Uint8x64) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES
func (Uint8x64) AESDecryptOneRound ¶
AESDecryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES
func (Uint8x64) AESEncryptLastRound ¶
AESEncryptLastRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES
func (Uint8x64) AESEncryptOneRound ¶
AESEncryptOneRound performs a series of operations from the AES cipher algorithm defined in FIPS 197. x is the state array, whose elements from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES
func (Uint8x64) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512
func (Uint8x64) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX512
func (Uint8x64) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint8x64) AsFloat32x16 ¶
func (x Uint8x64) AsFloat32x16() Float32x16
AsFloat32x16 returns a Float32x16 with the same bit representation as x.
func (Uint8x64) AsFloat64x8 ¶
AsFloat64x8 returns a Float64x8 with the same bit representation as x.
func (Uint8x64) AsInt16x32 ¶
AsInt16x32 returns an Int16x32 with the same bit representation as x.
func (Uint8x64) AsInt32x16 ¶
AsInt32x16 returns an Int32x16 with the same bit representation as x.
func (Uint8x64) AsUint16x32 ¶
AsUint16x32 returns a Uint16x32 with the same bit representation as x.
func (Uint8x64) AsUint32x16 ¶
AsUint32x16 returns a Uint32x16 with the same bit representation as x.
func (Uint8x64) AsUint64x8 ¶
AsUint64x8 returns a Uint64x8 with the same bit representation as x.
func (Uint8x64) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX512
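The rounding used by the underlying VPAVGB instruction is (a + b + 1) >> 1, computed without intermediate overflow, so ties round up. A per-element sketch follows; the name average is an illustrative helper.

```go
package main

import "fmt"

// average models one element of the rounded average: (a + b + 1) >> 1,
// evaluated in a wider type so that the sum cannot overflow.
func average(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(average(1, 2), average(255, 255), average(0, 0))
}
```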
func (Uint8x64) Compress ¶
Compress compresses vector x by selecting the elements indicated by mask and packing them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Uint8x64) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices:
result = {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}
where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used from each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Uint8x64) ConcatShiftBytesRightGrouped ¶
ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by shift bytes. The result vector is the lower half of the shifted concatenation. The operation is performed independently on each 16-byte group.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX512
func (Uint8x64) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies corresponding elements and adds adjacent pairs of products together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX512
func (Uint8x64) Equal ¶
Equal returns a mask whose elements indicate whether x == y.
Asm: VPCMPEQB, CPU Feature: AVX512
func (Uint8x64) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lowest-indexed positions. It distributes those elements, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Uint8x64) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x64) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x64) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
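Multiplication in GF(2^8) with this reduction polynomial can be sketched per element using the classic shift-and-xor method. The helper name gfMul is illustrative; the constant 0x1B is the low 8 bits of x^8 + x^4 + x^3 + x + 1.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
// Addition in this field is XOR; multiplying by x is a left shift followed
// by a conditional reduction with 0x1B.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1B
		}
		b >>= 1
	}
	return p
}

func main() {
	// FIPS 197 works through the product {57} * {83} = {c1}.
	fmt.Printf("%#x\n", gfMul(0x57, 0x83))
}
```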
func (Uint8x64) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) Less ¶
Less returns a mask whose elements indicate whether x < y.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) Max ¶
Max computes the maximum of each pair of corresponding elements in x and y.
Asm: VPMAXUB, CPU Feature: AVX512
func (Uint8x64) Min ¶
Min computes the minimum of each pair of corresponding elements in x and y.
Asm: VPMINUB, CPU Feature: AVX512
func (Uint8x64) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x64) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint8x64) Permute ¶
Permute performs a full permutation of vector x using indices:
result = {x[indices[0]], x[indices[1]], ..., x[indices[n]]}
The low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x64) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices:
result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}
The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.
Asm: VPSHUFB, CPU Feature: AVX512
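The grouped lookup can be sketched in scalar Go. This model works on plain arrays and stores the indices as uint8, testing bit 7 as the sign bit; the name permuteOrZeroGrouped and the array types are assumptions for illustration.

```go
package main

import "fmt"

// permuteOrZeroGrouped is a hypothetical scalar model. Result element i is
// taken from the 16-byte group that contains position i, at the offset given
// by the low four bits of indices[i]. If the sign bit (bit 7) of the index is
// set, the result element is zero.
func permuteOrZeroGrouped(x, indices [64]uint8) [64]uint8 {
	var out [64]uint8
	for i, idx := range indices {
		if idx&0x80 != 0 {
			continue // sign bit set: leave zero
		}
		groupStart := i / 16 * 16
		out[i] = x[groupStart+int(idx&0x0F)]
	}
	return out
}

func main() {
	var x, indices [64]uint8
	for i := range x {
		x[i] = uint8(i)
		indices[i] = 3 // every position reads offset 3 of its own group
	}
	indices[1] = 0x80 // force element 1 to zero
	r := permuteOrZeroGrouped(x, indices)
	fmt.Println(r[0], r[1], r[16], r[32], r[48])
}
```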
func (Uint8x64) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint8x64) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint8x64) StoreMasked ¶
StoreMasked stores a Uint8x64 to an array, at those elements enabled by mask.
Asm: VMOVDQU8, CPU Feature: AVX512
func (Uint8x64) StoreSlice ¶
StoreSlice stores x into a slice of at least 64 uint8s.
func (Uint8x64) StoreSlicePart ¶
StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.
func (Uint8x64) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512
func (Uint8x64) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX512
func (Uint8x64) SumAbsDiff ¶
SumAbsDiff computes the sum of absolute differences between the two input vectors, treating each adjacent 8 bytes as a group. The result is a vector of word-sized elements in which element 4*n holds the sum for the n-th input group; all other elements are zero. The result can be seen as the L1 distance between each pair of corresponding 8-byte groups of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX512
type X86Features ¶
type X86Features struct{}
var X86 X86Features
func (X86Features) AES ¶
func (X86Features) AES() bool
AES returns whether the CPU supports the AES feature.
AES is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX ¶
func (X86Features) AVX() bool
AVX returns whether the CPU supports the AVX feature.
AVX is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX2 ¶
func (X86Features) AVX2() bool
AVX2 returns whether the CPU supports the AVX2 feature.
AVX2 is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512 ¶
func (X86Features) AVX512() bool
AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
These five CPU features are bundled together, and no use of AVX-512 is allowed unless all of these features are supported together. Nearly every CPU that has shipped with any support for AVX-512 has supported all five of these features.
AVX512 is defined on all GOARCHes, but will only return true on GOARCH amd64.
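One way to act on these checks is to choose a code path once, from the widest supported feature down to a scalar fallback. The sketch below is runnable without this package: the featureSet interface and the fakeCPU type are hypothetical stand-ins that mirror only the AVX2 and AVX512 method signatures listed here, and pickKernel is an illustrative name.

```go
package main

import "fmt"

// featureSet lists the two checks this sketch needs. It is a hypothetical
// stand-in: a value with AVX2() bool and AVX512() bool methods, such as the
// X86 variable documented here, would satisfy it.
type featureSet interface {
	AVX2() bool
	AVX512() bool
}

// fakeCPU is a test double used so that the sketch runs anywhere.
type fakeCPU struct{ avx2, avx512 bool }

func (c fakeCPU) AVX2() bool   { return c.avx2 }
func (c fakeCPU) AVX512() bool { return c.avx512 }

// pickKernel prefers the widest vector width the CPU reports and falls back
// to scalar code when neither feature is available.
func pickKernel(f featureSet) string {
	switch {
	case f.AVX512():
		return "512-bit"
	case f.AVX2():
		return "256-bit"
	default:
		return "scalar"
	}
}

func main() {
	fmt.Println(pickKernel(fakeCPU{avx2: true, avx512: true}))
	fmt.Println(pickKernel(fakeCPU{avx2: true}))
	fmt.Println(pickKernel(fakeCPU{}))
}
```

Because the check methods return false on architectures other than amd64, a selection of this shape falls through to the scalar path there.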
func (X86Features) AVX512BITALG ¶
func (X86Features) AVX512BITALG() bool
AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.
AVX512BITALG is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512GFNI ¶
func (X86Features) AVX512GFNI() bool
AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.
AVX512GFNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VAES ¶
func (X86Features) AVX512VAES() bool
AVX512VAES returns whether the CPU supports the AVX512VAES feature.
AVX512VAES is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VBMI ¶
func (X86Features) AVX512VBMI() bool
AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.
AVX512VBMI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VBMI2 ¶
func (X86Features) AVX512VBMI2() bool
AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.
AVX512VBMI2 is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VNNI ¶
func (X86Features) AVX512VNNI() bool
AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.
AVX512VNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VPCLMULQDQ ¶
func (X86Features) AVX512VPCLMULQDQ() bool
AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.
AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VPOPCNTDQ ¶
func (X86Features) AVX512VPOPCNTDQ() bool
AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.
AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVXVNNI ¶
func (X86Features) AVXVNNI() bool
AVXVNNI returns whether the CPU supports the AVXVNNI feature.
AVXVNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) SHA ¶
func (X86Features) SHA() bool
SHA returns whether the CPU supports the SHA feature.
SHA is defined on all GOARCHes, but will only return true on GOARCH amd64.
Notes ¶
Bugs ¶
Using a vector type as a type parameter may not work.
Using reflect Call to call a vector function/method may not work.