Documentation
Overview
Package archsimd provides access to architecture-specific SIMD operations.
This is a low-level package that exposes hardware-specific functionality. It currently supports AMD64.
This package is experimental, and not subject to the Go 1 compatibility promise. It only exists when building with the GOEXPERIMENT=simd environment variable set.
Vector types and operations
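Vector types are defined as structs, such as Int8x16 and Float64x8, corresponding to the hardware's vector registers. The name gives the element type and the element count: Int8x16 holds sixteen int8 elements (128 bits), and Float64x8 holds eight float64 elements (512 bits). On AMD64, 128-, 256-, and 512-bit vectors are supported.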
Mask types, such as Mask8x16, are defined similarly. They are opaque types that hide the differences between the underlying hardware representations. A mask can be converted to/from the corresponding integer vector type, or to/from a bitmask.
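To make the mask/bitmask relationship concrete, here is a minimal scalar sketch in plain Go. It does not use this package: mask4, toBits, and mask4FromBits are hypothetical names, and the mapping of lane i to bit i is an assumption made for illustration, not a statement about the package's actual conversion methods.

```go
package main

import "fmt"

// mask4 is a hypothetical scalar stand-in for a 4-lane mask such as Mask32x4.
// The real mask types are opaque; this only models their logical contents.
type mask4 [4]bool

// toBits packs the mask into an integer, assuming lane i maps to bit i.
func (m mask4) toBits() uint8 {
	var bits uint8
	for i, on := range m {
		if on {
			bits |= 1 << i
		}
	}
	return bits
}

// mask4FromBits is the inverse of toBits under the same assumption.
func mask4FromBits(bits uint8) mask4 {
	var m mask4
	for i := range m {
		m[i] = bits&(1<<i) != 0
	}
	return m
}

func main() {
	m := mask4{true, false, true, true} // lanes 0, 2, and 3 are selected
	bits := m.toBits()
	fmt.Printf("%04b\n", bits)            // prints 1101
	fmt.Println(mask4FromBits(bits) == m) // prints true (round trip)
}
```

A bitmask form is convenient for scalar code (counting or testing selected lanes), while the vector form keeps the mask usable in further vector operations.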
Operations are mostly defined as methods on the vector types. Most of them are compiler intrinsics and correspond directly to hardware instructions.
Common operations include:
- Load/Store: Load a vector from memory or store a vector to memory.
- Arithmetic: Add, Sub, Mul, etc.
- Bitwise: And, Or, Xor, etc.
- Comparison: Equal, Greater, etc., which produce a mask.
- Conversion: Convert between different vector types.
- Field selection and rearrangement: GetElem, Permute, etc.
- Masking: Masked, Merge.
The compiler recognizes certain patterns of operations and may optimize them into more efficient instructions. For example, on AVX512, an Add operation followed by Masked may be optimized to a single masked add instruction. For this reason, not every hardware instruction has a corresponding API.
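The interaction between comparison, masking, and arithmetic described above can be modeled with a small scalar sketch in plain Go. It does not use this package: vec4 and mask4 are hypothetical stand-ins for Int32x4 and Mask32x4, and the lowercase methods mimic the listed operations under the assumption that Masked zeroes the lanes whose mask element is false and that Merge takes lanes from x where the mask is true and from y otherwise.

```go
package main

import "fmt"

// vec4 and mask4 are hypothetical scalar stand-ins for Int32x4 and Mask32x4.
type vec4 [4]int32
type mask4 [4]bool

// add returns the lane-wise sum of x and y.
func (x vec4) add(y vec4) vec4 {
	var r vec4
	for i := range x {
		r[i] = x[i] + y[i]
	}
	return r
}

// greater returns a mask that is true in each lane where x > y.
func (x vec4) greater(y vec4) mask4 {
	var m mask4
	for i := range x {
		m[i] = x[i] > y[i]
	}
	return m
}

// masked keeps the lanes of x where m is true and zeroes the rest (assumed semantics).
func (x vec4) masked(m mask4) vec4 {
	var r vec4
	for i := range x {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

// merge takes lanes from x where m is true and from y otherwise (assumed semantics).
func (x vec4) merge(y vec4, m mask4) vec4 {
	r := y
	for i := range x {
		if m[i] {
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := vec4{1, 2, 3, 4}
	y := vec4{10, 20, 30, 40}
	m := x.greater(vec4{2, 2, 2, 2}) // true in lanes 2 and 3

	// Add followed by Masked: the pattern the text says may become one masked add.
	fmt.Println(x.add(y).masked(m)) // prints [0 0 33 44]
	fmt.Println(x.merge(y, m))      // prints [10 20 3 4]
}
```

Writing the operation as two separate steps keeps the API small; whether the two steps are fused into one instruction is left to the compiler.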
CPU feature checks
The package provides global variables to check for CPU features available at runtime. For example, on AMD64, the X86 variable provides methods to check for AVX2, AVX512, etc. It is recommended to check for CPU features before using the corresponding vector operations.
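One common way to follow this recommendation is to select an implementation once, based on the feature check, and fall back to scalar code otherwise. The sketch below is plain Go and does not use this package: cpuHasAVX2 is a hypothetical stand-in for a real feature check (such as the methods on the X86 variable), and sumChunked only imitates the shape of a vector loop (eight lanes per step plus a scalar tail) rather than issuing SIMD instructions.

```go
package main

import "fmt"

// cpuHasAVX2 is a hypothetical stand-in for a runtime CPU feature check.
// It is hard-coded here so the example runs anywhere.
func cpuHasAVX2() bool { return false }

// sumScalar is the portable fallback.
func sumScalar(s []int32) int32 {
	var total int32
	for _, v := range s {
		total += v
	}
	return total
}

// sumChunked imitates a vector loop: it accumulates eight lanes per step
// and then handles the leftover elements one at a time.
func sumChunked(s []int32) int32 {
	var acc [8]int32
	i := 0
	for ; i+8 <= len(s); i += 8 {
		for lane := 0; lane < 8; lane++ {
			acc[lane] += s[i+lane]
		}
	}
	var total int32
	for _, v := range acc {
		total += v
	}
	for ; i < len(s); i++ {
		total += s[i]
	}
	return total
}

// pickSum chooses an implementation based on the feature check result.
func pickSum(hasFeature bool) func([]int32) int32 {
	if hasFeature {
		return sumChunked
	}
	return sumScalar
}

func main() {
	sum := pickSum(cpuHasAVX2()) // decide once, then reuse
	fmt.Println(sum([]int32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})) // prints 55
}
```

Deciding once keeps the feature check out of hot loops, and the scalar path keeps the program correct on CPUs that lack the feature.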
Notes
- This package is not portable, as the available types and operations depend on the target architecture. It is not recommended to expose the SIMD types defined in this package in public APIs.
- For performance reasons, it is recommended to use the vector types directly as values. It is not recommended to take the address of a vector, allocate it on the heap, or put it in an aggregate type.
Index
- func ClearAVXUpperBits()
- type Float32x16
- func (x Float32x16) Add(y Float32x16) Float32x16
- func (from Float32x16) AsFloat64x8() (to Float64x8)
- func (from Float32x16) AsInt16x32() (to Int16x32)
- func (from Float32x16) AsInt32x16() (to Int32x16)
- func (from Float32x16) AsInt64x8() (to Int64x8)
- func (from Float32x16) AsInt8x64() (to Int8x64)
- func (from Float32x16) AsUint16x32() (to Uint16x32)
- func (from Float32x16) AsUint32x16() (to Uint32x16)
- func (from Float32x16) AsUint64x8() (to Uint64x8)
- func (from Float32x16) AsUint8x64() (to Uint8x64)
- func (x Float32x16) CeilScaled(prec uint8) Float32x16
- func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Compress(mask Mask32x16) Float32x16
- func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
- func (x Float32x16) ConvertToInt32() Int32x16
- func (x Float32x16) ConvertToUint32() Uint32x16
- func (x Float32x16) Div(y Float32x16) Float32x16
- func (x Float32x16) Equal(y Float32x16) Mask32x16
- func (x Float32x16) Expand(mask Mask32x16) Float32x16
- func (x Float32x16) FloorScaled(prec uint8) Float32x16
- func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16
- func (x Float32x16) GetHi() Float32x8
- func (x Float32x16) GetLo() Float32x8
- func (x Float32x16) Greater(y Float32x16) Mask32x16
- func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16
- func (x Float32x16) IsNan(y Float32x16) Mask32x16
- func (x Float32x16) Len() int
- func (x Float32x16) Less(y Float32x16) Mask32x16
- func (x Float32x16) LessEqual(y Float32x16) Mask32x16
- func (x Float32x16) Masked(mask Mask32x16) Float32x16
- func (x Float32x16) Max(y Float32x16) Float32x16
- func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16
- func (x Float32x16) Min(y Float32x16) Float32x16
- func (x Float32x16) Mul(y Float32x16) Float32x16
- func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16
- func (x Float32x16) NotEqual(y Float32x16) Mask32x16
- func (x Float32x16) Permute(indices Uint32x16) Float32x16
- func (x Float32x16) Reciprocal() Float32x16
- func (x Float32x16) ReciprocalSqrt() Float32x16
- func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16
- func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16
- func (x Float32x16) Scale(y Float32x16) Float32x16
- func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
- func (x Float32x16) SetHi(y Float32x8) Float32x16
- func (x Float32x16) SetLo(y Float32x8) Float32x16
- func (x Float32x16) Sqrt() Float32x16
- func (x Float32x16) Store(y *[16]float32)
- func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)
- func (x Float32x16) StoreSlice(s []float32)
- func (x Float32x16) StoreSlicePart(s []float32)
- func (x Float32x16) String() string
- func (x Float32x16) Sub(y Float32x16) Float32x16
- func (x Float32x16) TruncScaled(prec uint8) Float32x16
- func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16
- type Float32x4
- func (x Float32x4) Add(y Float32x4) Float32x4
- func (x Float32x4) AddPairs(y Float32x4) Float32x4
- func (x Float32x4) AddSub(y Float32x4) Float32x4
- func (from Float32x4) AsFloat64x2() (to Float64x2)
- func (from Float32x4) AsInt16x8() (to Int16x8)
- func (from Float32x4) AsInt32x4() (to Int32x4)
- func (from Float32x4) AsInt64x2() (to Int64x2)
- func (from Float32x4) AsInt8x16() (to Int8x16)
- func (from Float32x4) AsUint16x8() (to Uint16x8)
- func (from Float32x4) AsUint32x4() (to Uint32x4)
- func (from Float32x4) AsUint64x2() (to Uint64x2)
- func (from Float32x4) AsUint8x16() (to Uint8x16)
- func (x Float32x4) Broadcast128() Float32x4
- func (x Float32x4) Broadcast256() Float32x8
- func (x Float32x4) Broadcast512() Float32x16
- func (x Float32x4) Ceil() Float32x4
- func (x Float32x4) CeilScaled(prec uint8) Float32x4
- func (x Float32x4) CeilScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Compress(mask Mask32x4) Float32x4
- func (x Float32x4) ConcatPermute(y Float32x4, indices Uint32x4) Float32x4
- func (x Float32x4) ConvertToFloat64() Float64x4
- func (x Float32x4) ConvertToInt32() Int32x4
- func (x Float32x4) ConvertToInt64() Int64x4
- func (x Float32x4) ConvertToUint32() Uint32x4
- func (x Float32x4) ConvertToUint64() Uint64x4
- func (x Float32x4) Div(y Float32x4) Float32x4
- func (x Float32x4) Equal(y Float32x4) Mask32x4
- func (x Float32x4) Expand(mask Mask32x4) Float32x4
- func (x Float32x4) Floor() Float32x4
- func (x Float32x4) FloorScaled(prec uint8) Float32x4
- func (x Float32x4) FloorScaledResidue(prec uint8) Float32x4
- func (x Float32x4) GetElem(index uint8) float32
- func (x Float32x4) Greater(y Float32x4) Mask32x4
- func (x Float32x4) GreaterEqual(y Float32x4) Mask32x4
- func (x Float32x4) IsNan(y Float32x4) Mask32x4
- func (x Float32x4) Len() int
- func (x Float32x4) Less(y Float32x4) Mask32x4
- func (x Float32x4) LessEqual(y Float32x4) Mask32x4
- func (x Float32x4) Masked(mask Mask32x4) Float32x4
- func (x Float32x4) Max(y Float32x4) Float32x4
- func (x Float32x4) Merge(y Float32x4, mask Mask32x4) Float32x4
- func (x Float32x4) Min(y Float32x4) Float32x4
- func (x Float32x4) Mul(y Float32x4) Float32x4
- func (x Float32x4) MulAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulAddSub(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) MulSubAdd(y Float32x4, z Float32x4) Float32x4
- func (x Float32x4) NotEqual(y Float32x4) Mask32x4
- func (x Float32x4) Reciprocal() Float32x4
- func (x Float32x4) ReciprocalSqrt() Float32x4
- func (x Float32x4) RoundToEven() Float32x4
- func (x Float32x4) RoundToEvenScaled(prec uint8) Float32x4
- func (x Float32x4) RoundToEvenScaledResidue(prec uint8) Float32x4
- func (x Float32x4) Scale(y Float32x4) Float32x4
- func (x Float32x4) SelectFromPair(a, b, c, d uint8, y Float32x4) Float32x4
- func (x Float32x4) SetElem(index uint8, y float32) Float32x4
- func (x Float32x4) Sqrt() Float32x4
- func (x Float32x4) Store(y *[4]float32)
- func (x Float32x4) StoreMasked(y *[4]float32, mask Mask32x4)
- func (x Float32x4) StoreSlice(s []float32)
- func (x Float32x4) StoreSlicePart(s []float32)
- func (x Float32x4) String() string
- func (x Float32x4) Sub(y Float32x4) Float32x4
- func (x Float32x4) SubPairs(y Float32x4) Float32x4
- func (x Float32x4) Trunc() Float32x4
- func (x Float32x4) TruncScaled(prec uint8) Float32x4
- func (x Float32x4) TruncScaledResidue(prec uint8) Float32x4
- type Float32x8
- func (x Float32x8) Add(y Float32x8) Float32x8
- func (x Float32x8) AddPairs(y Float32x8) Float32x8
- func (x Float32x8) AddSub(y Float32x8) Float32x8
- func (from Float32x8) AsFloat64x4() (to Float64x4)
- func (from Float32x8) AsInt16x16() (to Int16x16)
- func (from Float32x8) AsInt32x8() (to Int32x8)
- func (from Float32x8) AsInt64x4() (to Int64x4)
- func (from Float32x8) AsInt8x32() (to Int8x32)
- func (from Float32x8) AsUint16x16() (to Uint16x16)
- func (from Float32x8) AsUint32x8() (to Uint32x8)
- func (from Float32x8) AsUint64x4() (to Uint64x4)
- func (from Float32x8) AsUint8x32() (to Uint8x32)
- func (x Float32x8) Ceil() Float32x8
- func (x Float32x8) CeilScaled(prec uint8) Float32x8
- func (x Float32x8) CeilScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Compress(mask Mask32x8) Float32x8
- func (x Float32x8) ConcatPermute(y Float32x8, indices Uint32x8) Float32x8
- func (x Float32x8) ConvertToFloat64() Float64x8
- func (x Float32x8) ConvertToInt32() Int32x8
- func (x Float32x8) ConvertToInt64() Int64x8
- func (x Float32x8) ConvertToUint32() Uint32x8
- func (x Float32x8) ConvertToUint64() Uint64x8
- func (x Float32x8) Div(y Float32x8) Float32x8
- func (x Float32x8) Equal(y Float32x8) Mask32x8
- func (x Float32x8) Expand(mask Mask32x8) Float32x8
- func (x Float32x8) Floor() Float32x8
- func (x Float32x8) FloorScaled(prec uint8) Float32x8
- func (x Float32x8) FloorScaledResidue(prec uint8) Float32x8
- func (x Float32x8) GetHi() Float32x4
- func (x Float32x8) GetLo() Float32x4
- func (x Float32x8) Greater(y Float32x8) Mask32x8
- func (x Float32x8) GreaterEqual(y Float32x8) Mask32x8
- func (x Float32x8) IsNan(y Float32x8) Mask32x8
- func (x Float32x8) Len() int
- func (x Float32x8) Less(y Float32x8) Mask32x8
- func (x Float32x8) LessEqual(y Float32x8) Mask32x8
- func (x Float32x8) Masked(mask Mask32x8) Float32x8
- func (x Float32x8) Max(y Float32x8) Float32x8
- func (x Float32x8) Merge(y Float32x8, mask Mask32x8) Float32x8
- func (x Float32x8) Min(y Float32x8) Float32x8
- func (x Float32x8) Mul(y Float32x8) Float32x8
- func (x Float32x8) MulAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulAddSub(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) MulSubAdd(y Float32x8, z Float32x8) Float32x8
- func (x Float32x8) NotEqual(y Float32x8) Mask32x8
- func (x Float32x8) Permute(indices Uint32x8) Float32x8
- func (x Float32x8) Reciprocal() Float32x8
- func (x Float32x8) ReciprocalSqrt() Float32x8
- func (x Float32x8) RoundToEven() Float32x8
- func (x Float32x8) RoundToEvenScaled(prec uint8) Float32x8
- func (x Float32x8) RoundToEvenScaledResidue(prec uint8) Float32x8
- func (x Float32x8) Scale(y Float32x8) Float32x8
- func (x Float32x8) Select128FromPair(lo, hi uint8, y Float32x8) Float32x8
- func (x Float32x8) SelectFromPairGrouped(a, b, c, d uint8, y Float32x8) Float32x8
- func (x Float32x8) SetHi(y Float32x4) Float32x8
- func (x Float32x8) SetLo(y Float32x4) Float32x8
- func (x Float32x8) Sqrt() Float32x8
- func (x Float32x8) Store(y *[8]float32)
- func (x Float32x8) StoreMasked(y *[8]float32, mask Mask32x8)
- func (x Float32x8) StoreSlice(s []float32)
- func (x Float32x8) StoreSlicePart(s []float32)
- func (x Float32x8) String() string
- func (x Float32x8) Sub(y Float32x8) Float32x8
- func (x Float32x8) SubPairs(y Float32x8) Float32x8
- func (x Float32x8) Trunc() Float32x8
- func (x Float32x8) TruncScaled(prec uint8) Float32x8
- func (x Float32x8) TruncScaledResidue(prec uint8) Float32x8
- type Float64x2
- func (x Float64x2) Add(y Float64x2) Float64x2
- func (x Float64x2) AddPairs(y Float64x2) Float64x2
- func (x Float64x2) AddSub(y Float64x2) Float64x2
- func (from Float64x2) AsFloat32x4() (to Float32x4)
- func (from Float64x2) AsInt16x8() (to Int16x8)
- func (from Float64x2) AsInt32x4() (to Int32x4)
- func (from Float64x2) AsInt64x2() (to Int64x2)
- func (from Float64x2) AsInt8x16() (to Int8x16)
- func (from Float64x2) AsUint16x8() (to Uint16x8)
- func (from Float64x2) AsUint32x4() (to Uint32x4)
- func (from Float64x2) AsUint64x2() (to Uint64x2)
- func (from Float64x2) AsUint8x16() (to Uint8x16)
- func (x Float64x2) Broadcast128() Float64x2
- func (x Float64x2) Broadcast256() Float64x4
- func (x Float64x2) Broadcast512() Float64x8
- func (x Float64x2) Ceil() Float64x2
- func (x Float64x2) CeilScaled(prec uint8) Float64x2
- func (x Float64x2) CeilScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Compress(mask Mask64x2) Float64x2
- func (x Float64x2) ConcatPermute(y Float64x2, indices Uint64x2) Float64x2
- func (x Float64x2) ConvertToFloat32() Float32x4
- func (x Float64x2) ConvertToInt32() Int32x4
- func (x Float64x2) ConvertToInt64() Int64x2
- func (x Float64x2) ConvertToUint32() Uint32x4
- func (x Float64x2) ConvertToUint64() Uint64x2
- func (x Float64x2) Div(y Float64x2) Float64x2
- func (x Float64x2) Equal(y Float64x2) Mask64x2
- func (x Float64x2) Expand(mask Mask64x2) Float64x2
- func (x Float64x2) Floor() Float64x2
- func (x Float64x2) FloorScaled(prec uint8) Float64x2
- func (x Float64x2) FloorScaledResidue(prec uint8) Float64x2
- func (x Float64x2) GetElem(index uint8) float64
- func (x Float64x2) Greater(y Float64x2) Mask64x2
- func (x Float64x2) GreaterEqual(y Float64x2) Mask64x2
- func (x Float64x2) IsNan(y Float64x2) Mask64x2
- func (x Float64x2) Len() int
- func (x Float64x2) Less(y Float64x2) Mask64x2
- func (x Float64x2) LessEqual(y Float64x2) Mask64x2
- func (x Float64x2) Masked(mask Mask64x2) Float64x2
- func (x Float64x2) Max(y Float64x2) Float64x2
- func (x Float64x2) Merge(y Float64x2, mask Mask64x2) Float64x2
- func (x Float64x2) Min(y Float64x2) Float64x2
- func (x Float64x2) Mul(y Float64x2) Float64x2
- func (x Float64x2) MulAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulAddSub(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) MulSubAdd(y Float64x2, z Float64x2) Float64x2
- func (x Float64x2) NotEqual(y Float64x2) Mask64x2
- func (x Float64x2) Reciprocal() Float64x2
- func (x Float64x2) ReciprocalSqrt() Float64x2
- func (x Float64x2) RoundToEven() Float64x2
- func (x Float64x2) RoundToEvenScaled(prec uint8) Float64x2
- func (x Float64x2) RoundToEvenScaledResidue(prec uint8) Float64x2
- func (x Float64x2) Scale(y Float64x2) Float64x2
- func (x Float64x2) SelectFromPair(a, b uint8, y Float64x2) Float64x2
- func (x Float64x2) SetElem(index uint8, y float64) Float64x2
- func (x Float64x2) Sqrt() Float64x2
- func (x Float64x2) Store(y *[2]float64)
- func (x Float64x2) StoreMasked(y *[2]float64, mask Mask64x2)
- func (x Float64x2) StoreSlice(s []float64)
- func (x Float64x2) StoreSlicePart(s []float64)
- func (x Float64x2) String() string
- func (x Float64x2) Sub(y Float64x2) Float64x2
- func (x Float64x2) SubPairs(y Float64x2) Float64x2
- func (x Float64x2) Trunc() Float64x2
- func (x Float64x2) TruncScaled(prec uint8) Float64x2
- func (x Float64x2) TruncScaledResidue(prec uint8) Float64x2
- type Float64x4
- func (x Float64x4) Add(y Float64x4) Float64x4
- func (x Float64x4) AddPairs(y Float64x4) Float64x4
- func (x Float64x4) AddSub(y Float64x4) Float64x4
- func (from Float64x4) AsFloat32x8() (to Float32x8)
- func (from Float64x4) AsInt16x16() (to Int16x16)
- func (from Float64x4) AsInt32x8() (to Int32x8)
- func (from Float64x4) AsInt64x4() (to Int64x4)
- func (from Float64x4) AsInt8x32() (to Int8x32)
- func (from Float64x4) AsUint16x16() (to Uint16x16)
- func (from Float64x4) AsUint32x8() (to Uint32x8)
- func (from Float64x4) AsUint64x4() (to Uint64x4)
- func (from Float64x4) AsUint8x32() (to Uint8x32)
- func (x Float64x4) Ceil() Float64x4
- func (x Float64x4) CeilScaled(prec uint8) Float64x4
- func (x Float64x4) CeilScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Compress(mask Mask64x4) Float64x4
- func (x Float64x4) ConcatPermute(y Float64x4, indices Uint64x4) Float64x4
- func (x Float64x4) ConvertToFloat32() Float32x4
- func (x Float64x4) ConvertToInt32() Int32x4
- func (x Float64x4) ConvertToInt64() Int64x4
- func (x Float64x4) ConvertToUint32() Uint32x4
- func (x Float64x4) ConvertToUint64() Uint64x4
- func (x Float64x4) Div(y Float64x4) Float64x4
- func (x Float64x4) Equal(y Float64x4) Mask64x4
- func (x Float64x4) Expand(mask Mask64x4) Float64x4
- func (x Float64x4) Floor() Float64x4
- func (x Float64x4) FloorScaled(prec uint8) Float64x4
- func (x Float64x4) FloorScaledResidue(prec uint8) Float64x4
- func (x Float64x4) GetHi() Float64x2
- func (x Float64x4) GetLo() Float64x2
- func (x Float64x4) Greater(y Float64x4) Mask64x4
- func (x Float64x4) GreaterEqual(y Float64x4) Mask64x4
- func (x Float64x4) IsNan(y Float64x4) Mask64x4
- func (x Float64x4) Len() int
- func (x Float64x4) Less(y Float64x4) Mask64x4
- func (x Float64x4) LessEqual(y Float64x4) Mask64x4
- func (x Float64x4) Masked(mask Mask64x4) Float64x4
- func (x Float64x4) Max(y Float64x4) Float64x4
- func (x Float64x4) Merge(y Float64x4, mask Mask64x4) Float64x4
- func (x Float64x4) Min(y Float64x4) Float64x4
- func (x Float64x4) Mul(y Float64x4) Float64x4
- func (x Float64x4) MulAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulAddSub(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) MulSubAdd(y Float64x4, z Float64x4) Float64x4
- func (x Float64x4) NotEqual(y Float64x4) Mask64x4
- func (x Float64x4) Permute(indices Uint64x4) Float64x4
- func (x Float64x4) Reciprocal() Float64x4
- func (x Float64x4) ReciprocalSqrt() Float64x4
- func (x Float64x4) RoundToEven() Float64x4
- func (x Float64x4) RoundToEvenScaled(prec uint8) Float64x4
- func (x Float64x4) RoundToEvenScaledResidue(prec uint8) Float64x4
- func (x Float64x4) Scale(y Float64x4) Float64x4
- func (x Float64x4) Select128FromPair(lo, hi uint8, y Float64x4) Float64x4
- func (x Float64x4) SelectFromPairGrouped(a, b uint8, y Float64x4) Float64x4
- func (x Float64x4) SetHi(y Float64x2) Float64x4
- func (x Float64x4) SetLo(y Float64x2) Float64x4
- func (x Float64x4) Sqrt() Float64x4
- func (x Float64x4) Store(y *[4]float64)
- func (x Float64x4) StoreMasked(y *[4]float64, mask Mask64x4)
- func (x Float64x4) StoreSlice(s []float64)
- func (x Float64x4) StoreSlicePart(s []float64)
- func (x Float64x4) String() string
- func (x Float64x4) Sub(y Float64x4) Float64x4
- func (x Float64x4) SubPairs(y Float64x4) Float64x4
- func (x Float64x4) Trunc() Float64x4
- func (x Float64x4) TruncScaled(prec uint8) Float64x4
- func (x Float64x4) TruncScaledResidue(prec uint8) Float64x4
- type Float64x8
- func (x Float64x8) Add(y Float64x8) Float64x8
- func (from Float64x8) AsFloat32x16() (to Float32x16)
- func (from Float64x8) AsInt16x32() (to Int16x32)
- func (from Float64x8) AsInt32x16() (to Int32x16)
- func (from Float64x8) AsInt64x8() (to Int64x8)
- func (from Float64x8) AsInt8x64() (to Int8x64)
- func (from Float64x8) AsUint16x32() (to Uint16x32)
- func (from Float64x8) AsUint32x16() (to Uint32x16)
- func (from Float64x8) AsUint64x8() (to Uint64x8)
- func (from Float64x8) AsUint8x64() (to Uint8x64)
- func (x Float64x8) CeilScaled(prec uint8) Float64x8
- func (x Float64x8) CeilScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Compress(mask Mask64x8) Float64x8
- func (x Float64x8) ConcatPermute(y Float64x8, indices Uint64x8) Float64x8
- func (x Float64x8) ConvertToFloat32() Float32x8
- func (x Float64x8) ConvertToInt32() Int32x8
- func (x Float64x8) ConvertToInt64() Int64x8
- func (x Float64x8) ConvertToUint32() Uint32x8
- func (x Float64x8) ConvertToUint64() Uint64x8
- func (x Float64x8) Div(y Float64x8) Float64x8
- func (x Float64x8) Equal(y Float64x8) Mask64x8
- func (x Float64x8) Expand(mask Mask64x8) Float64x8
- func (x Float64x8) FloorScaled(prec uint8) Float64x8
- func (x Float64x8) FloorScaledResidue(prec uint8) Float64x8
- func (x Float64x8) GetHi() Float64x4
- func (x Float64x8) GetLo() Float64x4
- func (x Float64x8) Greater(y Float64x8) Mask64x8
- func (x Float64x8) GreaterEqual(y Float64x8) Mask64x8
- func (x Float64x8) IsNan(y Float64x8) Mask64x8
- func (x Float64x8) Len() int
- func (x Float64x8) Less(y Float64x8) Mask64x8
- func (x Float64x8) LessEqual(y Float64x8) Mask64x8
- func (x Float64x8) Masked(mask Mask64x8) Float64x8
- func (x Float64x8) Max(y Float64x8) Float64x8
- func (x Float64x8) Merge(y Float64x8, mask Mask64x8) Float64x8
- func (x Float64x8) Min(y Float64x8) Float64x8
- func (x Float64x8) Mul(y Float64x8) Float64x8
- func (x Float64x8) MulAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulAddSub(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) MulSubAdd(y Float64x8, z Float64x8) Float64x8
- func (x Float64x8) NotEqual(y Float64x8) Mask64x8
- func (x Float64x8) Permute(indices Uint64x8) Float64x8
- func (x Float64x8) Reciprocal() Float64x8
- func (x Float64x8) ReciprocalSqrt() Float64x8
- func (x Float64x8) RoundToEvenScaled(prec uint8) Float64x8
- func (x Float64x8) RoundToEvenScaledResidue(prec uint8) Float64x8
- func (x Float64x8) Scale(y Float64x8) Float64x8
- func (x Float64x8) SelectFromPairGrouped(a, b uint8, y Float64x8) Float64x8
- func (x Float64x8) SetHi(y Float64x4) Float64x8
- func (x Float64x8) SetLo(y Float64x4) Float64x8
- func (x Float64x8) Sqrt() Float64x8
- func (x Float64x8) Store(y *[8]float64)
- func (x Float64x8) StoreMasked(y *[8]float64, mask Mask64x8)
- func (x Float64x8) StoreSlice(s []float64)
- func (x Float64x8) StoreSlicePart(s []float64)
- func (x Float64x8) String() string
- func (x Float64x8) Sub(y Float64x8) Float64x8
- func (x Float64x8) TruncScaled(prec uint8) Float64x8
- func (x Float64x8) TruncScaledResidue(prec uint8) Float64x8
- type Int16x16
- func (x Int16x16) Abs() Int16x16
- func (x Int16x16) Add(y Int16x16) Int16x16
- func (x Int16x16) AddPairs(y Int16x16) Int16x16
- func (x Int16x16) AddPairsSaturated(y Int16x16) Int16x16
- func (x Int16x16) AddSaturated(y Int16x16) Int16x16
- func (x Int16x16) And(y Int16x16) Int16x16
- func (x Int16x16) AndNot(y Int16x16) Int16x16
- func (from Int16x16) AsFloat32x8() (to Float32x8)
- func (from Int16x16) AsFloat64x4() (to Float64x4)
- func (from Int16x16) AsInt32x8() (to Int32x8)
- func (from Int16x16) AsInt64x4() (to Int64x4)
- func (from Int16x16) AsInt8x32() (to Int8x32)
- func (from Int16x16) AsUint16x16() (to Uint16x16)
- func (from Int16x16) AsUint32x8() (to Uint32x8)
- func (from Int16x16) AsUint64x4() (to Uint64x4)
- func (from Int16x16) AsUint8x32() (to Uint8x32)
- func (x Int16x16) Compress(mask Mask16x16) Int16x16
- func (x Int16x16) ConcatPermute(y Int16x16, indices Uint16x16) Int16x16
- func (x Int16x16) CopySign(y Int16x16) Int16x16
- func (x Int16x16) DotProductPairs(y Int16x16) Int32x8
- func (x Int16x16) Equal(y Int16x16) Mask16x16
- func (x Int16x16) Expand(mask Mask16x16) Int16x16
- func (x Int16x16) ExtendToInt32() Int32x16
- func (x Int16x16) GetHi() Int16x8
- func (x Int16x16) GetLo() Int16x8
- func (x Int16x16) Greater(y Int16x16) Mask16x16
- func (x Int16x16) GreaterEqual(y Int16x16) Mask16x16
- func (x Int16x16) InterleaveHiGrouped(y Int16x16) Int16x16
- func (x Int16x16) InterleaveLoGrouped(y Int16x16) Int16x16
- func (x Int16x16) IsZero() bool
- func (x Int16x16) Len() int
- func (x Int16x16) Less(y Int16x16) Mask16x16
- func (x Int16x16) LessEqual(y Int16x16) Mask16x16
- func (x Int16x16) Masked(mask Mask16x16) Int16x16
- func (x Int16x16) Max(y Int16x16) Int16x16
- func (x Int16x16) Merge(y Int16x16, mask Mask16x16) Int16x16
- func (x Int16x16) Min(y Int16x16) Int16x16
- func (x Int16x16) Mul(y Int16x16) Int16x16
- func (x Int16x16) MulHigh(y Int16x16) Int16x16
- func (x Int16x16) Not() Int16x16
- func (x Int16x16) NotEqual(y Int16x16) Mask16x16
- func (x Int16x16) OnesCount() Int16x16
- func (x Int16x16) Or(y Int16x16) Int16x16
- func (x Int16x16) Permute(indices Uint16x16) Int16x16
- func (x Int16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x16
- func (x Int16x16) SaturateToInt8() Int8x16
- func (x Int16x16) SaturateToUint8() Int8x16
- func (x Int16x16) Select128FromPair(lo, hi uint8, y Int16x16) Int16x16
- func (x Int16x16) SetHi(y Int16x8) Int16x16
- func (x Int16x16) SetLo(y Int16x8) Int16x16
- func (x Int16x16) ShiftAllLeft(y uint64) Int16x16
- func (x Int16x16) ShiftAllLeftConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftAllRight(y uint64) Int16x16
- func (x Int16x16) ShiftAllRightConcat(shift uint8, y Int16x16) Int16x16
- func (x Int16x16) ShiftLeft(y Int16x16) Int16x16
- func (x Int16x16) ShiftLeftConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) ShiftRight(y Int16x16) Int16x16
- func (x Int16x16) ShiftRightConcat(y Int16x16, z Int16x16) Int16x16
- func (x Int16x16) Store(y *[16]int16)
- func (x Int16x16) StoreSlice(s []int16)
- func (x Int16x16) StoreSlicePart(s []int16)
- func (x Int16x16) String() string
- func (x Int16x16) Sub(y Int16x16) Int16x16
- func (x Int16x16) SubPairs(y Int16x16) Int16x16
- func (x Int16x16) SubPairsSaturated(y Int16x16) Int16x16
- func (x Int16x16) SubSaturated(y Int16x16) Int16x16
- func (from Int16x16) ToMask() (to Mask16x16)
- func (x Int16x16) TruncateToInt8() Int8x16
- func (x Int16x16) Xor(y Int16x16) Int16x16
- type Int16x32
- func (x Int16x32) Abs() Int16x32
- func (x Int16x32) Add(y Int16x32) Int16x32
- func (x Int16x32) AddSaturated(y Int16x32) Int16x32
- func (x Int16x32) And(y Int16x32) Int16x32
- func (x Int16x32) AndNot(y Int16x32) Int16x32
- func (from Int16x32) AsFloat32x16() (to Float32x16)
- func (from Int16x32) AsFloat64x8() (to Float64x8)
- func (from Int16x32) AsInt32x16() (to Int32x16)
- func (from Int16x32) AsInt64x8() (to Int64x8)
- func (from Int16x32) AsInt8x64() (to Int8x64)
- func (from Int16x32) AsUint16x32() (to Uint16x32)
- func (from Int16x32) AsUint32x16() (to Uint32x16)
- func (from Int16x32) AsUint64x8() (to Uint64x8)
- func (from Int16x32) AsUint8x64() (to Uint8x64)
- func (x Int16x32) Compress(mask Mask16x32) Int16x32
- func (x Int16x32) ConcatPermute(y Int16x32, indices Uint16x32) Int16x32
- func (x Int16x32) DotProductPairs(y Int16x32) Int32x16
- func (x Int16x32) Equal(y Int16x32) Mask16x32
- func (x Int16x32) Expand(mask Mask16x32) Int16x32
- func (x Int16x32) GetHi() Int16x16
- func (x Int16x32) GetLo() Int16x16
- func (x Int16x32) Greater(y Int16x32) Mask16x32
- func (x Int16x32) GreaterEqual(y Int16x32) Mask16x32
- func (x Int16x32) InterleaveHiGrouped(y Int16x32) Int16x32
- func (x Int16x32) InterleaveLoGrouped(y Int16x32) Int16x32
- func (x Int16x32) Len() int
- func (x Int16x32) Less(y Int16x32) Mask16x32
- func (x Int16x32) LessEqual(y Int16x32) Mask16x32
- func (x Int16x32) Masked(mask Mask16x32) Int16x32
- func (x Int16x32) Max(y Int16x32) Int16x32
- func (x Int16x32) Merge(y Int16x32, mask Mask16x32) Int16x32
- func (x Int16x32) Min(y Int16x32) Int16x32
- func (x Int16x32) Mul(y Int16x32) Int16x32
- func (x Int16x32) MulHigh(y Int16x32) Int16x32
- func (x Int16x32) Not() Int16x32
- func (x Int16x32) NotEqual(y Int16x32) Mask16x32
- func (x Int16x32) OnesCount() Int16x32
- func (x Int16x32) Or(y Int16x32) Int16x32
- func (x Int16x32) Permute(indices Uint16x32) Int16x32
- func (x Int16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Int16x32
- func (x Int16x32) SaturateToInt8() Int8x32
- func (x Int16x32) SetHi(y Int16x16) Int16x32
- func (x Int16x32) SetLo(y Int16x16) Int16x32
- func (x Int16x32) ShiftAllLeft(y uint64) Int16x32
- func (x Int16x32) ShiftAllLeftConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftAllRight(y uint64) Int16x32
- func (x Int16x32) ShiftAllRightConcat(shift uint8, y Int16x32) Int16x32
- func (x Int16x32) ShiftLeft(y Int16x32) Int16x32
- func (x Int16x32) ShiftLeftConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) ShiftRight(y Int16x32) Int16x32
- func (x Int16x32) ShiftRightConcat(y Int16x32, z Int16x32) Int16x32
- func (x Int16x32) Store(y *[32]int16)
- func (x Int16x32) StoreMasked(y *[32]int16, mask Mask16x32)
- func (x Int16x32) StoreSlice(s []int16)
- func (x Int16x32) StoreSlicePart(s []int16)
- func (x Int16x32) String() string
- func (x Int16x32) Sub(y Int16x32) Int16x32
- func (x Int16x32) SubSaturated(y Int16x32) Int16x32
- func (from Int16x32) ToMask() (to Mask16x32)
- func (x Int16x32) TruncateToInt8() Int8x32
- func (x Int16x32) Xor(y Int16x32) Int16x32
- type Int16x8
- func (x Int16x8) Abs() Int16x8
- func (x Int16x8) Add(y Int16x8) Int16x8
- func (x Int16x8) AddPairs(y Int16x8) Int16x8
- func (x Int16x8) AddPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) AddSaturated(y Int16x8) Int16x8
- func (x Int16x8) And(y Int16x8) Int16x8
- func (x Int16x8) AndNot(y Int16x8) Int16x8
- func (from Int16x8) AsFloat32x4() (to Float32x4)
- func (from Int16x8) AsFloat64x2() (to Float64x2)
- func (from Int16x8) AsInt32x4() (to Int32x4)
- func (from Int16x8) AsInt64x2() (to Int64x2)
- func (from Int16x8) AsInt8x16() (to Int8x16)
- func (from Int16x8) AsUint16x8() (to Uint16x8)
- func (from Int16x8) AsUint32x4() (to Uint32x4)
- func (from Int16x8) AsUint64x2() (to Uint64x2)
- func (from Int16x8) AsUint8x16() (to Uint8x16)
- func (x Int16x8) Broadcast128() Int16x8
- func (x Int16x8) Broadcast256() Int16x16
- func (x Int16x8) Broadcast512() Int16x32
- func (x Int16x8) Compress(mask Mask16x8) Int16x8
- func (x Int16x8) ConcatPermute(y Int16x8, indices Uint16x8) Int16x8
- func (x Int16x8) CopySign(y Int16x8) Int16x8
- func (x Int16x8) DotProductPairs(y Int16x8) Int32x4
- func (x Int16x8) Equal(y Int16x8) Mask16x8
- func (x Int16x8) Expand(mask Mask16x8) Int16x8
- func (x Int16x8) ExtendLo2ToInt64x2() Int64x2
- func (x Int16x8) ExtendLo4ToInt32x4() Int32x4
- func (x Int16x8) ExtendLo4ToInt64x4() Int64x4
- func (x Int16x8) ExtendToInt32() Int32x8
- func (x Int16x8) ExtendToInt64() Int64x8
- func (x Int16x8) GetElem(index uint8) int16
- func (x Int16x8) Greater(y Int16x8) Mask16x8
- func (x Int16x8) GreaterEqual(y Int16x8) Mask16x8
- func (x Int16x8) InterleaveHi(y Int16x8) Int16x8
- func (x Int16x8) InterleaveLo(y Int16x8) Int16x8
- func (x Int16x8) IsZero() bool
- func (x Int16x8) Len() int
- func (x Int16x8) Less(y Int16x8) Mask16x8
- func (x Int16x8) LessEqual(y Int16x8) Mask16x8
- func (x Int16x8) Masked(mask Mask16x8) Int16x8
- func (x Int16x8) Max(y Int16x8) Int16x8
- func (x Int16x8) Merge(y Int16x8, mask Mask16x8) Int16x8
- func (x Int16x8) Min(y Int16x8) Int16x8
- func (x Int16x8) Mul(y Int16x8) Int16x8
- func (x Int16x8) MulHigh(y Int16x8) Int16x8
- func (x Int16x8) Not() Int16x8
- func (x Int16x8) NotEqual(y Int16x8) Mask16x8
- func (x Int16x8) OnesCount() Int16x8
- func (x Int16x8) Or(y Int16x8) Int16x8
- func (x Int16x8) Permute(indices Uint16x8) Int16x8
- func (x Int16x8) PermuteScalarsHi(a, b, c, d uint8) Int16x8
- func (x Int16x8) PermuteScalarsLo(a, b, c, d uint8) Int16x8
- func (x Int16x8) SaturateToInt8() Int8x16
- func (x Int16x8) SaturateToUint8() Int8x16
- func (x Int16x8) SetElem(index uint8, y int16) Int16x8
- func (x Int16x8) ShiftAllLeft(y uint64) Int16x8
- func (x Int16x8) ShiftAllLeftConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftAllRight(y uint64) Int16x8
- func (x Int16x8) ShiftAllRightConcat(shift uint8, y Int16x8) Int16x8
- func (x Int16x8) ShiftLeft(y Int16x8) Int16x8
- func (x Int16x8) ShiftLeftConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) ShiftRight(y Int16x8) Int16x8
- func (x Int16x8) ShiftRightConcat(y Int16x8, z Int16x8) Int16x8
- func (x Int16x8) Store(y *[8]int16)
- func (x Int16x8) StoreSlice(s []int16)
- func (x Int16x8) StoreSlicePart(s []int16)
- func (x Int16x8) String() string
- func (x Int16x8) Sub(y Int16x8) Int16x8
- func (x Int16x8) SubPairs(y Int16x8) Int16x8
- func (x Int16x8) SubPairsSaturated(y Int16x8) Int16x8
- func (x Int16x8) SubSaturated(y Int16x8) Int16x8
- func (from Int16x8) ToMask() (to Mask16x8)
- func (x Int16x8) TruncateToInt8() Int8x16
- func (x Int16x8) Xor(y Int16x8) Int16x8
- type Int32x16
- func (x Int32x16) Abs() Int32x16
- func (x Int32x16) Add(y Int32x16) Int32x16
- func (x Int32x16) And(y Int32x16) Int32x16
- func (x Int32x16) AndNot(y Int32x16) Int32x16
- func (from Int32x16) AsFloat32x16() (to Float32x16)
- func (from Int32x16) AsFloat64x8() (to Float64x8)
- func (from Int32x16) AsInt16x32() (to Int16x32)
- func (from Int32x16) AsInt64x8() (to Int64x8)
- func (from Int32x16) AsInt8x64() (to Int8x64)
- func (from Int32x16) AsUint16x32() (to Uint16x32)
- func (from Int32x16) AsUint32x16() (to Uint32x16)
- func (from Int32x16) AsUint64x8() (to Uint64x8)
- func (from Int32x16) AsUint8x64() (to Uint8x64)
- func (x Int32x16) Compress(mask Mask32x16) Int32x16
- func (x Int32x16) ConcatPermute(y Int32x16, indices Uint32x16) Int32x16
- func (x Int32x16) ConvertToFloat32() Float32x16
- func (x Int32x16) Equal(y Int32x16) Mask32x16
- func (x Int32x16) Expand(mask Mask32x16) Int32x16
- func (x Int32x16) GetHi() Int32x8
- func (x Int32x16) GetLo() Int32x8
- func (x Int32x16) Greater(y Int32x16) Mask32x16
- func (x Int32x16) GreaterEqual(y Int32x16) Mask32x16
- func (x Int32x16) InterleaveHiGrouped(y Int32x16) Int32x16
- func (x Int32x16) InterleaveLoGrouped(y Int32x16) Int32x16
- func (x Int32x16) LeadingZeros() Int32x16
- func (x Int32x16) Len() int
- func (x Int32x16) Less(y Int32x16) Mask32x16
- func (x Int32x16) LessEqual(y Int32x16) Mask32x16
- func (x Int32x16) Masked(mask Mask32x16) Int32x16
- func (x Int32x16) Max(y Int32x16) Int32x16
- func (x Int32x16) Merge(y Int32x16, mask Mask32x16) Int32x16
- func (x Int32x16) Min(y Int32x16) Int32x16
- func (x Int32x16) Mul(y Int32x16) Int32x16
- func (x Int32x16) Not() Int32x16
- func (x Int32x16) NotEqual(y Int32x16) Mask32x16
- func (x Int32x16) OnesCount() Int32x16
- func (x Int32x16) Or(y Int32x16) Int32x16
- func (x Int32x16) Permute(indices Uint32x16) Int32x16
- func (x Int32x16) PermuteScalarsGrouped(a, b, c, d uint8) Int32x16
- func (x Int32x16) RotateAllLeft(shift uint8) Int32x16
- func (x Int32x16) RotateAllRight(shift uint8) Int32x16
- func (x Int32x16) RotateLeft(y Int32x16) Int32x16
- func (x Int32x16) RotateRight(y Int32x16) Int32x16
- func (x Int32x16) SaturateToInt16() Int16x16
- func (x Int32x16) SaturateToInt16Concat(y Int32x16) Int16x32
- func (x Int32x16) SaturateToInt8() Int8x16
- func (x Int32x16) SaturateToUint8() Int8x16
- func (x Int32x16) SelectFromPairGrouped(a, b, c, d uint8, y Int32x16) Int32x16
- func (x Int32x16) SetHi(y Int32x8) Int32x16
- func (x Int32x16) SetLo(y Int32x8) Int32x16
- func (x Int32x16) ShiftAllLeft(y uint64) Int32x16
- func (x Int32x16) ShiftAllLeftConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftAllRight(y uint64) Int32x16
- func (x Int32x16) ShiftAllRightConcat(shift uint8, y Int32x16) Int32x16
- func (x Int32x16) ShiftLeft(y Int32x16) Int32x16
- func (x Int32x16) ShiftLeftConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) ShiftRight(y Int32x16) Int32x16
- func (x Int32x16) ShiftRightConcat(y Int32x16, z Int32x16) Int32x16
- func (x Int32x16) Store(y *[16]int32)
- func (x Int32x16) StoreMasked(y *[16]int32, mask Mask32x16)
- func (x Int32x16) StoreSlice(s []int32)
- func (x Int32x16) StoreSlicePart(s []int32)
- func (x Int32x16) String() string
- func (x Int32x16) Sub(y Int32x16) Int32x16
- func (from Int32x16) ToMask() (to Mask32x16)
- func (x Int32x16) TruncateToInt16() Int16x16
- func (x Int32x16) TruncateToInt8() Int8x16
- func (x Int32x16) Xor(y Int32x16) Int32x16
- type Int32x4
- func (x Int32x4) Abs() Int32x4
- func (x Int32x4) Add(y Int32x4) Int32x4
- func (x Int32x4) AddPairs(y Int32x4) Int32x4
- func (x Int32x4) And(y Int32x4) Int32x4
- func (x Int32x4) AndNot(y Int32x4) Int32x4
- func (from Int32x4) AsFloat32x4() (to Float32x4)
- func (from Int32x4) AsFloat64x2() (to Float64x2)
- func (from Int32x4) AsInt16x8() (to Int16x8)
- func (from Int32x4) AsInt64x2() (to Int64x2)
- func (from Int32x4) AsInt8x16() (to Int8x16)
- func (from Int32x4) AsUint16x8() (to Uint16x8)
- func (from Int32x4) AsUint32x4() (to Uint32x4)
- func (from Int32x4) AsUint64x2() (to Uint64x2)
- func (from Int32x4) AsUint8x16() (to Uint8x16)
- func (x Int32x4) Broadcast128() Int32x4
- func (x Int32x4) Broadcast256() Int32x8
- func (x Int32x4) Broadcast512() Int32x16
- func (x Int32x4) Compress(mask Mask32x4) Int32x4
- func (x Int32x4) ConcatPermute(y Int32x4, indices Uint32x4) Int32x4
- func (x Int32x4) ConvertToFloat32() Float32x4
- func (x Int32x4) ConvertToFloat64() Float64x4
- func (x Int32x4) CopySign(y Int32x4) Int32x4
- func (x Int32x4) Equal(y Int32x4) Mask32x4
- func (x Int32x4) Expand(mask Mask32x4) Int32x4
- func (x Int32x4) ExtendLo2ToInt64x2() Int64x2
- func (x Int32x4) ExtendToInt64() Int64x4
- func (x Int32x4) GetElem(index uint8) int32
- func (x Int32x4) Greater(y Int32x4) Mask32x4
- func (x Int32x4) GreaterEqual(y Int32x4) Mask32x4
- func (x Int32x4) InterleaveHi(y Int32x4) Int32x4
- func (x Int32x4) InterleaveLo(y Int32x4) Int32x4
- func (x Int32x4) IsZero() bool
- func (x Int32x4) LeadingZeros() Int32x4
- func (x Int32x4) Len() int
- func (x Int32x4) Less(y Int32x4) Mask32x4
- func (x Int32x4) LessEqual(y Int32x4) Mask32x4
- func (x Int32x4) Masked(mask Mask32x4) Int32x4
- func (x Int32x4) Max(y Int32x4) Int32x4
- func (x Int32x4) Merge(y Int32x4, mask Mask32x4) Int32x4
- func (x Int32x4) Min(y Int32x4) Int32x4
- func (x Int32x4) Mul(y Int32x4) Int32x4
- func (x Int32x4) MulEvenWiden(y Int32x4) Int64x2
- func (x Int32x4) Not() Int32x4
- func (x Int32x4) NotEqual(y Int32x4) Mask32x4
- func (x Int32x4) OnesCount() Int32x4
- func (x Int32x4) Or(y Int32x4) Int32x4
- func (x Int32x4) PermuteScalars(a, b, c, d uint8) Int32x4
- func (x Int32x4) RotateAllLeft(shift uint8) Int32x4
- func (x Int32x4) RotateAllRight(shift uint8) Int32x4
- func (x Int32x4) RotateLeft(y Int32x4) Int32x4
- func (x Int32x4) RotateRight(y Int32x4) Int32x4
- func (x Int32x4) SaturateToInt16() Int16x8
- func (x Int32x4) SaturateToInt16Concat(y Int32x4) Int16x8
- func (x Int32x4) SaturateToInt8() Int8x16
- func (x Int32x4) SaturateToUint8() Int8x16
- func (x Int32x4) SelectFromPair(a, b, c, d uint8, y Int32x4) Int32x4
- func (x Int32x4) SetElem(index uint8, y int32) Int32x4
- func (x Int32x4) ShiftAllLeft(y uint64) Int32x4
- func (x Int32x4) ShiftAllLeftConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftAllRight(y uint64) Int32x4
- func (x Int32x4) ShiftAllRightConcat(shift uint8, y Int32x4) Int32x4
- func (x Int32x4) ShiftLeft(y Int32x4) Int32x4
- func (x Int32x4) ShiftLeftConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) ShiftRight(y Int32x4) Int32x4
- func (x Int32x4) ShiftRightConcat(y Int32x4, z Int32x4) Int32x4
- func (x Int32x4) Store(y *[4]int32)
- func (x Int32x4) StoreMasked(y *[4]int32, mask Mask32x4)
- func (x Int32x4) StoreSlice(s []int32)
- func (x Int32x4) StoreSlicePart(s []int32)
- func (x Int32x4) String() string
- func (x Int32x4) Sub(y Int32x4) Int32x4
- func (x Int32x4) SubPairs(y Int32x4) Int32x4
- func (from Int32x4) ToMask() (to Mask32x4)
- func (x Int32x4) TruncateToInt16() Int16x8
- func (x Int32x4) TruncateToInt8() Int8x16
- func (x Int32x4) Xor(y Int32x4) Int32x4
- type Int32x8
- func (x Int32x8) Abs() Int32x8
- func (x Int32x8) Add(y Int32x8) Int32x8
- func (x Int32x8) AddPairs(y Int32x8) Int32x8
- func (x Int32x8) And(y Int32x8) Int32x8
- func (x Int32x8) AndNot(y Int32x8) Int32x8
- func (from Int32x8) AsFloat32x8() (to Float32x8)
- func (from Int32x8) AsFloat64x4() (to Float64x4)
- func (from Int32x8) AsInt16x16() (to Int16x16)
- func (from Int32x8) AsInt64x4() (to Int64x4)
- func (from Int32x8) AsInt8x32() (to Int8x32)
- func (from Int32x8) AsUint16x16() (to Uint16x16)
- func (from Int32x8) AsUint32x8() (to Uint32x8)
- func (from Int32x8) AsUint64x4() (to Uint64x4)
- func (from Int32x8) AsUint8x32() (to Uint8x32)
- func (x Int32x8) Compress(mask Mask32x8) Int32x8
- func (x Int32x8) ConcatPermute(y Int32x8, indices Uint32x8) Int32x8
- func (x Int32x8) ConvertToFloat32() Float32x8
- func (x Int32x8) ConvertToFloat64() Float64x8
- func (x Int32x8) CopySign(y Int32x8) Int32x8
- func (x Int32x8) Equal(y Int32x8) Mask32x8
- func (x Int32x8) Expand(mask Mask32x8) Int32x8
- func (x Int32x8) ExtendToInt64() Int64x8
- func (x Int32x8) GetHi() Int32x4
- func (x Int32x8) GetLo() Int32x4
- func (x Int32x8) Greater(y Int32x8) Mask32x8
- func (x Int32x8) GreaterEqual(y Int32x8) Mask32x8
- func (x Int32x8) InterleaveHiGrouped(y Int32x8) Int32x8
- func (x Int32x8) InterleaveLoGrouped(y Int32x8) Int32x8
- func (x Int32x8) IsZero() bool
- func (x Int32x8) LeadingZeros() Int32x8
- func (x Int32x8) Len() int
- func (x Int32x8) Less(y Int32x8) Mask32x8
- func (x Int32x8) LessEqual(y Int32x8) Mask32x8
- func (x Int32x8) Masked(mask Mask32x8) Int32x8
- func (x Int32x8) Max(y Int32x8) Int32x8
- func (x Int32x8) Merge(y Int32x8, mask Mask32x8) Int32x8
- func (x Int32x8) Min(y Int32x8) Int32x8
- func (x Int32x8) Mul(y Int32x8) Int32x8
- func (x Int32x8) MulEvenWiden(y Int32x8) Int64x4
- func (x Int32x8) Not() Int32x8
- func (x Int32x8) NotEqual(y Int32x8) Mask32x8
- func (x Int32x8) OnesCount() Int32x8
- func (x Int32x8) Or(y Int32x8) Int32x8
- func (x Int32x8) Permute(indices Uint32x8) Int32x8
- func (x Int32x8) PermuteScalarsGrouped(a, b, c, d uint8) Int32x8
- func (x Int32x8) RotateAllLeft(shift uint8) Int32x8
- func (x Int32x8) RotateAllRight(shift uint8) Int32x8
- func (x Int32x8) RotateLeft(y Int32x8) Int32x8
- func (x Int32x8) RotateRight(y Int32x8) Int32x8
- func (x Int32x8) SaturateToInt16() Int16x8
- func (x Int32x8) SaturateToInt16Concat(y Int32x8) Int16x16
- func (x Int32x8) SaturateToInt8() Int8x16
- func (x Int32x8) SaturateToUint8() Int8x16
- func (x Int32x8) Select128FromPair(lo, hi uint8, y Int32x8) Int32x8
- func (x Int32x8) SelectFromPairGrouped(a, b, c, d uint8, y Int32x8) Int32x8
- func (x Int32x8) SetHi(y Int32x4) Int32x8
- func (x Int32x8) SetLo(y Int32x4) Int32x8
- func (x Int32x8) ShiftAllLeft(y uint64) Int32x8
- func (x Int32x8) ShiftAllLeftConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftAllRight(y uint64) Int32x8
- func (x Int32x8) ShiftAllRightConcat(shift uint8, y Int32x8) Int32x8
- func (x Int32x8) ShiftLeft(y Int32x8) Int32x8
- func (x Int32x8) ShiftLeftConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) ShiftRight(y Int32x8) Int32x8
- func (x Int32x8) ShiftRightConcat(y Int32x8, z Int32x8) Int32x8
- func (x Int32x8) Store(y *[8]int32)
- func (x Int32x8) StoreMasked(y *[8]int32, mask Mask32x8)
- func (x Int32x8) StoreSlice(s []int32)
- func (x Int32x8) StoreSlicePart(s []int32)
- func (x Int32x8) String() string
- func (x Int32x8) Sub(y Int32x8) Int32x8
- func (x Int32x8) SubPairs(y Int32x8) Int32x8
- func (from Int32x8) ToMask() (to Mask32x8)
- func (x Int32x8) TruncateToInt16() Int16x8
- func (x Int32x8) TruncateToInt8() Int8x16
- func (x Int32x8) Xor(y Int32x8) Int32x8
- type Int64x2
- func (x Int64x2) Abs() Int64x2
- func (x Int64x2) Add(y Int64x2) Int64x2
- func (x Int64x2) And(y Int64x2) Int64x2
- func (x Int64x2) AndNot(y Int64x2) Int64x2
- func (from Int64x2) AsFloat32x4() (to Float32x4)
- func (from Int64x2) AsFloat64x2() (to Float64x2)
- func (from Int64x2) AsInt16x8() (to Int16x8)
- func (from Int64x2) AsInt32x4() (to Int32x4)
- func (from Int64x2) AsInt8x16() (to Int8x16)
- func (from Int64x2) AsUint16x8() (to Uint16x8)
- func (from Int64x2) AsUint32x4() (to Uint32x4)
- func (from Int64x2) AsUint64x2() (to Uint64x2)
- func (from Int64x2) AsUint8x16() (to Uint8x16)
- func (x Int64x2) Broadcast128() Int64x2
- func (x Int64x2) Broadcast256() Int64x4
- func (x Int64x2) Broadcast512() Int64x8
- func (x Int64x2) Compress(mask Mask64x2) Int64x2
- func (x Int64x2) ConcatPermute(y Int64x2, indices Uint64x2) Int64x2
- func (x Int64x2) ConvertToFloat32() Float32x4
- func (x Int64x2) ConvertToFloat64() Float64x2
- func (x Int64x2) Equal(y Int64x2) Mask64x2
- func (x Int64x2) Expand(mask Mask64x2) Int64x2
- func (x Int64x2) GetElem(index uint8) int64
- func (x Int64x2) Greater(y Int64x2) Mask64x2
- func (x Int64x2) GreaterEqual(y Int64x2) Mask64x2
- func (x Int64x2) InterleaveHi(y Int64x2) Int64x2
- func (x Int64x2) InterleaveLo(y Int64x2) Int64x2
- func (x Int64x2) IsZero() bool
- func (x Int64x2) LeadingZeros() Int64x2
- func (x Int64x2) Len() int
- func (x Int64x2) Less(y Int64x2) Mask64x2
- func (x Int64x2) LessEqual(y Int64x2) Mask64x2
- func (x Int64x2) Masked(mask Mask64x2) Int64x2
- func (x Int64x2) Max(y Int64x2) Int64x2
- func (x Int64x2) Merge(y Int64x2, mask Mask64x2) Int64x2
- func (x Int64x2) Min(y Int64x2) Int64x2
- func (x Int64x2) Mul(y Int64x2) Int64x2
- func (x Int64x2) Not() Int64x2
- func (x Int64x2) NotEqual(y Int64x2) Mask64x2
- func (x Int64x2) OnesCount() Int64x2
- func (x Int64x2) Or(y Int64x2) Int64x2
- func (x Int64x2) RotateAllLeft(shift uint8) Int64x2
- func (x Int64x2) RotateAllRight(shift uint8) Int64x2
- func (x Int64x2) RotateLeft(y Int64x2) Int64x2
- func (x Int64x2) RotateRight(y Int64x2) Int64x2
- func (x Int64x2) SaturateToInt16() Int16x8
- func (x Int64x2) SaturateToInt32() Int32x4
- func (x Int64x2) SaturateToInt8() Int8x16
- func (x Int64x2) SaturateToUint8() Int8x16
- func (x Int64x2) SelectFromPair(a, b uint8, y Int64x2) Int64x2
- func (x Int64x2) SetElem(index uint8, y int64) Int64x2
- func (x Int64x2) ShiftAllLeft(y uint64) Int64x2
- func (x Int64x2) ShiftAllLeftConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftAllRight(y uint64) Int64x2
- func (x Int64x2) ShiftAllRightConcat(shift uint8, y Int64x2) Int64x2
- func (x Int64x2) ShiftLeft(y Int64x2) Int64x2
- func (x Int64x2) ShiftLeftConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) ShiftRight(y Int64x2) Int64x2
- func (x Int64x2) ShiftRightConcat(y Int64x2, z Int64x2) Int64x2
- func (x Int64x2) Store(y *[2]int64)
- func (x Int64x2) StoreMasked(y *[2]int64, mask Mask64x2)
- func (x Int64x2) StoreSlice(s []int64)
- func (x Int64x2) StoreSlicePart(s []int64)
- func (x Int64x2) String() string
- func (x Int64x2) Sub(y Int64x2) Int64x2
- func (from Int64x2) ToMask() (to Mask64x2)
- func (x Int64x2) TruncateToInt16() Int16x8
- func (x Int64x2) TruncateToInt32() Int32x4
- func (x Int64x2) TruncateToInt8() Int8x16
- func (x Int64x2) Xor(y Int64x2) Int64x2
- type Int64x4
- func (x Int64x4) Abs() Int64x4
- func (x Int64x4) Add(y Int64x4) Int64x4
- func (x Int64x4) And(y Int64x4) Int64x4
- func (x Int64x4) AndNot(y Int64x4) Int64x4
- func (from Int64x4) AsFloat32x8() (to Float32x8)
- func (from Int64x4) AsFloat64x4() (to Float64x4)
- func (from Int64x4) AsInt16x16() (to Int16x16)
- func (from Int64x4) AsInt32x8() (to Int32x8)
- func (from Int64x4) AsInt8x32() (to Int8x32)
- func (from Int64x4) AsUint16x16() (to Uint16x16)
- func (from Int64x4) AsUint32x8() (to Uint32x8)
- func (from Int64x4) AsUint64x4() (to Uint64x4)
- func (from Int64x4) AsUint8x32() (to Uint8x32)
- func (x Int64x4) Compress(mask Mask64x4) Int64x4
- func (x Int64x4) ConcatPermute(y Int64x4, indices Uint64x4) Int64x4
- func (x Int64x4) ConvertToFloat32() Float32x4
- func (x Int64x4) ConvertToFloat64() Float64x4
- func (x Int64x4) Equal(y Int64x4) Mask64x4
- func (x Int64x4) Expand(mask Mask64x4) Int64x4
- func (x Int64x4) GetHi() Int64x2
- func (x Int64x4) GetLo() Int64x2
- func (x Int64x4) Greater(y Int64x4) Mask64x4
- func (x Int64x4) GreaterEqual(y Int64x4) Mask64x4
- func (x Int64x4) InterleaveHiGrouped(y Int64x4) Int64x4
- func (x Int64x4) InterleaveLoGrouped(y Int64x4) Int64x4
- func (x Int64x4) IsZero() bool
- func (x Int64x4) LeadingZeros() Int64x4
- func (x Int64x4) Len() int
- func (x Int64x4) Less(y Int64x4) Mask64x4
- func (x Int64x4) LessEqual(y Int64x4) Mask64x4
- func (x Int64x4) Masked(mask Mask64x4) Int64x4
- func (x Int64x4) Max(y Int64x4) Int64x4
- func (x Int64x4) Merge(y Int64x4, mask Mask64x4) Int64x4
- func (x Int64x4) Min(y Int64x4) Int64x4
- func (x Int64x4) Mul(y Int64x4) Int64x4
- func (x Int64x4) Not() Int64x4
- func (x Int64x4) NotEqual(y Int64x4) Mask64x4
- func (x Int64x4) OnesCount() Int64x4
- func (x Int64x4) Or(y Int64x4) Int64x4
- func (x Int64x4) Permute(indices Uint64x4) Int64x4
- func (x Int64x4) RotateAllLeft(shift uint8) Int64x4
- func (x Int64x4) RotateAllRight(shift uint8) Int64x4
- func (x Int64x4) RotateLeft(y Int64x4) Int64x4
- func (x Int64x4) RotateRight(y Int64x4) Int64x4
- func (x Int64x4) SaturateToInt16() Int16x8
- func (x Int64x4) SaturateToInt32() Int32x4
- func (x Int64x4) SaturateToInt8() Int8x16
- func (x Int64x4) SaturateToUint8() Int8x16
- func (x Int64x4) Select128FromPair(lo, hi uint8, y Int64x4) Int64x4
- func (x Int64x4) SelectFromPairGrouped(a, b uint8, y Int64x4) Int64x4
- func (x Int64x4) SetHi(y Int64x2) Int64x4
- func (x Int64x4) SetLo(y Int64x2) Int64x4
- func (x Int64x4) ShiftAllLeft(y uint64) Int64x4
- func (x Int64x4) ShiftAllLeftConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftAllRight(y uint64) Int64x4
- func (x Int64x4) ShiftAllRightConcat(shift uint8, y Int64x4) Int64x4
- func (x Int64x4) ShiftLeft(y Int64x4) Int64x4
- func (x Int64x4) ShiftLeftConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) ShiftRight(y Int64x4) Int64x4
- func (x Int64x4) ShiftRightConcat(y Int64x4, z Int64x4) Int64x4
- func (x Int64x4) Store(y *[4]int64)
- func (x Int64x4) StoreMasked(y *[4]int64, mask Mask64x4)
- func (x Int64x4) StoreSlice(s []int64)
- func (x Int64x4) StoreSlicePart(s []int64)
- func (x Int64x4) String() string
- func (x Int64x4) Sub(y Int64x4) Int64x4
- func (from Int64x4) ToMask() (to Mask64x4)
- func (x Int64x4) TruncateToInt16() Int16x8
- func (x Int64x4) TruncateToInt32() Int32x4
- func (x Int64x4) TruncateToInt8() Int8x16
- func (x Int64x4) Xor(y Int64x4) Int64x4
- type Int64x8
- func (x Int64x8) Abs() Int64x8
- func (x Int64x8) Add(y Int64x8) Int64x8
- func (x Int64x8) And(y Int64x8) Int64x8
- func (x Int64x8) AndNot(y Int64x8) Int64x8
- func (from Int64x8) AsFloat32x16() (to Float32x16)
- func (from Int64x8) AsFloat64x8() (to Float64x8)
- func (from Int64x8) AsInt16x32() (to Int16x32)
- func (from Int64x8) AsInt32x16() (to Int32x16)
- func (from Int64x8) AsInt8x64() (to Int8x64)
- func (from Int64x8) AsUint16x32() (to Uint16x32)
- func (from Int64x8) AsUint32x16() (to Uint32x16)
- func (from Int64x8) AsUint64x8() (to Uint64x8)
- func (from Int64x8) AsUint8x64() (to Uint8x64)
- func (x Int64x8) Compress(mask Mask64x8) Int64x8
- func (x Int64x8) ConcatPermute(y Int64x8, indices Uint64x8) Int64x8
- func (x Int64x8) ConvertToFloat32() Float32x8
- func (x Int64x8) ConvertToFloat64() Float64x8
- func (x Int64x8) Equal(y Int64x8) Mask64x8
- func (x Int64x8) Expand(mask Mask64x8) Int64x8
- func (x Int64x8) GetHi() Int64x4
- func (x Int64x8) GetLo() Int64x4
- func (x Int64x8) Greater(y Int64x8) Mask64x8
- func (x Int64x8) GreaterEqual(y Int64x8) Mask64x8
- func (x Int64x8) InterleaveHiGrouped(y Int64x8) Int64x8
- func (x Int64x8) InterleaveLoGrouped(y Int64x8) Int64x8
- func (x Int64x8) LeadingZeros() Int64x8
- func (x Int64x8) Len() int
- func (x Int64x8) Less(y Int64x8) Mask64x8
- func (x Int64x8) LessEqual(y Int64x8) Mask64x8
- func (x Int64x8) Masked(mask Mask64x8) Int64x8
- func (x Int64x8) Max(y Int64x8) Int64x8
- func (x Int64x8) Merge(y Int64x8, mask Mask64x8) Int64x8
- func (x Int64x8) Min(y Int64x8) Int64x8
- func (x Int64x8) Mul(y Int64x8) Int64x8
- func (x Int64x8) Not() Int64x8
- func (x Int64x8) NotEqual(y Int64x8) Mask64x8
- func (x Int64x8) OnesCount() Int64x8
- func (x Int64x8) Or(y Int64x8) Int64x8
- func (x Int64x8) Permute(indices Uint64x8) Int64x8
- func (x Int64x8) RotateAllLeft(shift uint8) Int64x8
- func (x Int64x8) RotateAllRight(shift uint8) Int64x8
- func (x Int64x8) RotateLeft(y Int64x8) Int64x8
- func (x Int64x8) RotateRight(y Int64x8) Int64x8
- func (x Int64x8) SaturateToInt16() Int16x8
- func (x Int64x8) SaturateToInt32() Int32x8
- func (x Int64x8) SaturateToInt8() Int8x16
- func (x Int64x8) SaturateToUint8() Int8x16
- func (x Int64x8) SelectFromPairGrouped(a, b uint8, y Int64x8) Int64x8
- func (x Int64x8) SetHi(y Int64x4) Int64x8
- func (x Int64x8) SetLo(y Int64x4) Int64x8
- func (x Int64x8) ShiftAllLeft(y uint64) Int64x8
- func (x Int64x8) ShiftAllLeftConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftAllRight(y uint64) Int64x8
- func (x Int64x8) ShiftAllRightConcat(shift uint8, y Int64x8) Int64x8
- func (x Int64x8) ShiftLeft(y Int64x8) Int64x8
- func (x Int64x8) ShiftLeftConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) ShiftRight(y Int64x8) Int64x8
- func (x Int64x8) ShiftRightConcat(y Int64x8, z Int64x8) Int64x8
- func (x Int64x8) Store(y *[8]int64)
- func (x Int64x8) StoreMasked(y *[8]int64, mask Mask64x8)
- func (x Int64x8) StoreSlice(s []int64)
- func (x Int64x8) StoreSlicePart(s []int64)
- func (x Int64x8) String() string
- func (x Int64x8) Sub(y Int64x8) Int64x8
- func (from Int64x8) ToMask() (to Mask64x8)
- func (x Int64x8) TruncateToInt16() Int16x8
- func (x Int64x8) TruncateToInt32() Int32x8
- func (x Int64x8) TruncateToInt8() Int8x16
- func (x Int64x8) Xor(y Int64x8) Int64x8
- type Int8x16
- func (x Int8x16) Abs() Int8x16
- func (x Int8x16) Add(y Int8x16) Int8x16
- func (x Int8x16) AddSaturated(y Int8x16) Int8x16
- func (x Int8x16) And(y Int8x16) Int8x16
- func (x Int8x16) AndNot(y Int8x16) Int8x16
- func (from Int8x16) AsFloat32x4() (to Float32x4)
- func (from Int8x16) AsFloat64x2() (to Float64x2)
- func (from Int8x16) AsInt16x8() (to Int16x8)
- func (from Int8x16) AsInt32x4() (to Int32x4)
- func (from Int8x16) AsInt64x2() (to Int64x2)
- func (from Int8x16) AsUint16x8() (to Uint16x8)
- func (from Int8x16) AsUint32x4() (to Uint32x4)
- func (from Int8x16) AsUint64x2() (to Uint64x2)
- func (from Int8x16) AsUint8x16() (to Uint8x16)
- func (x Int8x16) Broadcast128() Int8x16
- func (x Int8x16) Broadcast256() Int8x32
- func (x Int8x16) Broadcast512() Int8x64
- func (x Int8x16) Compress(mask Mask8x16) Int8x16
- func (x Int8x16) ConcatPermute(y Int8x16, indices Uint8x16) Int8x16
- func (x Int8x16) CopySign(y Int8x16) Int8x16
- func (x Int8x16) DotProductQuadruple(y Uint8x16) Int32x4
- func (x Int8x16) DotProductQuadrupleSaturated(y Uint8x16) Int32x4
- func (x Int8x16) Equal(y Int8x16) Mask8x16
- func (x Int8x16) Expand(mask Mask8x16) Int8x16
- func (x Int8x16) ExtendLo2ToInt64x2() Int64x2
- func (x Int8x16) ExtendLo4ToInt32x4() Int32x4
- func (x Int8x16) ExtendLo4ToInt64x4() Int64x4
- func (x Int8x16) ExtendLo8ToInt16x8() Int16x8
- func (x Int8x16) ExtendLo8ToInt32x8() Int32x8
- func (x Int8x16) ExtendLo8ToInt64x8() Int64x8
- func (x Int8x16) ExtendToInt16() Int16x16
- func (x Int8x16) ExtendToInt32() Int32x16
- func (x Int8x16) GetElem(index uint8) int8
- func (x Int8x16) Greater(y Int8x16) Mask8x16
- func (x Int8x16) GreaterEqual(y Int8x16) Mask8x16
- func (x Int8x16) IsZero() bool
- func (x Int8x16) Len() int
- func (x Int8x16) Less(y Int8x16) Mask8x16
- func (x Int8x16) LessEqual(y Int8x16) Mask8x16
- func (x Int8x16) Masked(mask Mask8x16) Int8x16
- func (x Int8x16) Max(y Int8x16) Int8x16
- func (x Int8x16) Merge(y Int8x16, mask Mask8x16) Int8x16
- func (x Int8x16) Min(y Int8x16) Int8x16
- func (x Int8x16) Not() Int8x16
- func (x Int8x16) NotEqual(y Int8x16) Mask8x16
- func (x Int8x16) OnesCount() Int8x16
- func (x Int8x16) Or(y Int8x16) Int8x16
- func (x Int8x16) Permute(indices Uint8x16) Int8x16
- func (x Int8x16) PermuteOrZero(indices Int8x16) Int8x16
- func (x Int8x16) SetElem(index uint8, y int8) Int8x16
- func (x Int8x16) Store(y *[16]int8)
- func (x Int8x16) StoreSlice(s []int8)
- func (x Int8x16) StoreSlicePart(s []int8)
- func (x Int8x16) String() string
- func (x Int8x16) Sub(y Int8x16) Int8x16
- func (x Int8x16) SubSaturated(y Int8x16) Int8x16
- func (from Int8x16) ToMask() (to Mask8x16)
- func (x Int8x16) Xor(y Int8x16) Int8x16
- type Int8x32
- func (x Int8x32) Abs() Int8x32
- func (x Int8x32) Add(y Int8x32) Int8x32
- func (x Int8x32) AddSaturated(y Int8x32) Int8x32
- func (x Int8x32) And(y Int8x32) Int8x32
- func (x Int8x32) AndNot(y Int8x32) Int8x32
- func (from Int8x32) AsFloat32x8() (to Float32x8)
- func (from Int8x32) AsFloat64x4() (to Float64x4)
- func (from Int8x32) AsInt16x16() (to Int16x16)
- func (from Int8x32) AsInt32x8() (to Int32x8)
- func (from Int8x32) AsInt64x4() (to Int64x4)
- func (from Int8x32) AsUint16x16() (to Uint16x16)
- func (from Int8x32) AsUint32x8() (to Uint32x8)
- func (from Int8x32) AsUint64x4() (to Uint64x4)
- func (from Int8x32) AsUint8x32() (to Uint8x32)
- func (x Int8x32) Compress(mask Mask8x32) Int8x32
- func (x Int8x32) ConcatPermute(y Int8x32, indices Uint8x32) Int8x32
- func (x Int8x32) CopySign(y Int8x32) Int8x32
- func (x Int8x32) DotProductQuadruple(y Uint8x32) Int32x8
- func (x Int8x32) DotProductQuadrupleSaturated(y Uint8x32) Int32x8
- func (x Int8x32) Equal(y Int8x32) Mask8x32
- func (x Int8x32) Expand(mask Mask8x32) Int8x32
- func (x Int8x32) ExtendToInt16() Int16x32
- func (x Int8x32) GetHi() Int8x16
- func (x Int8x32) GetLo() Int8x16
- func (x Int8x32) Greater(y Int8x32) Mask8x32
- func (x Int8x32) GreaterEqual(y Int8x32) Mask8x32
- func (x Int8x32) IsZero() bool
- func (x Int8x32) Len() int
- func (x Int8x32) Less(y Int8x32) Mask8x32
- func (x Int8x32) LessEqual(y Int8x32) Mask8x32
- func (x Int8x32) Masked(mask Mask8x32) Int8x32
- func (x Int8x32) Max(y Int8x32) Int8x32
- func (x Int8x32) Merge(y Int8x32, mask Mask8x32) Int8x32
- func (x Int8x32) Min(y Int8x32) Int8x32
- func (x Int8x32) Not() Int8x32
- func (x Int8x32) NotEqual(y Int8x32) Mask8x32
- func (x Int8x32) OnesCount() Int8x32
- func (x Int8x32) Or(y Int8x32) Int8x32
- func (x Int8x32) Permute(indices Uint8x32) Int8x32
- func (x Int8x32) PermuteOrZeroGrouped(indices Int8x32) Int8x32
- func (x Int8x32) Select128FromPair(lo, hi uint8, y Int8x32) Int8x32
- func (x Int8x32) SetHi(y Int8x16) Int8x32
- func (x Int8x32) SetLo(y Int8x16) Int8x32
- func (x Int8x32) Store(y *[32]int8)
- func (x Int8x32) StoreSlice(s []int8)
- func (x Int8x32) StoreSlicePart(s []int8)
- func (x Int8x32) String() string
- func (x Int8x32) Sub(y Int8x32) Int8x32
- func (x Int8x32) SubSaturated(y Int8x32) Int8x32
- func (from Int8x32) ToMask() (to Mask8x32)
- func (x Int8x32) Xor(y Int8x32) Int8x32
- type Int8x64
- func (x Int8x64) Abs() Int8x64
- func (x Int8x64) Add(y Int8x64) Int8x64
- func (x Int8x64) AddSaturated(y Int8x64) Int8x64
- func (x Int8x64) And(y Int8x64) Int8x64
- func (x Int8x64) AndNot(y Int8x64) Int8x64
- func (from Int8x64) AsFloat32x16() (to Float32x16)
- func (from Int8x64) AsFloat64x8() (to Float64x8)
- func (from Int8x64) AsInt16x32() (to Int16x32)
- func (from Int8x64) AsInt32x16() (to Int32x16)
- func (from Int8x64) AsInt64x8() (to Int64x8)
- func (from Int8x64) AsUint16x32() (to Uint16x32)
- func (from Int8x64) AsUint32x16() (to Uint32x16)
- func (from Int8x64) AsUint64x8() (to Uint64x8)
- func (from Int8x64) AsUint8x64() (to Uint8x64)
- func (x Int8x64) Compress(mask Mask8x64) Int8x64
- func (x Int8x64) ConcatPermute(y Int8x64, indices Uint8x64) Int8x64
- func (x Int8x64) DotProductQuadruple(y Uint8x64) Int32x16
- func (x Int8x64) DotProductQuadrupleSaturated(y Uint8x64) Int32x16
- func (x Int8x64) Equal(y Int8x64) Mask8x64
- func (x Int8x64) Expand(mask Mask8x64) Int8x64
- func (x Int8x64) GetHi() Int8x32
- func (x Int8x64) GetLo() Int8x32
- func (x Int8x64) Greater(y Int8x64) Mask8x64
- func (x Int8x64) GreaterEqual(y Int8x64) Mask8x64
- func (x Int8x64) Len() int
- func (x Int8x64) Less(y Int8x64) Mask8x64
- func (x Int8x64) LessEqual(y Int8x64) Mask8x64
- func (x Int8x64) Masked(mask Mask8x64) Int8x64
- func (x Int8x64) Max(y Int8x64) Int8x64
- func (x Int8x64) Merge(y Int8x64, mask Mask8x64) Int8x64
- func (x Int8x64) Min(y Int8x64) Int8x64
- func (x Int8x64) Not() Int8x64
- func (x Int8x64) NotEqual(y Int8x64) Mask8x64
- func (x Int8x64) OnesCount() Int8x64
- func (x Int8x64) Or(y Int8x64) Int8x64
- func (x Int8x64) Permute(indices Uint8x64) Int8x64
- func (x Int8x64) PermuteOrZeroGrouped(indices Int8x64) Int8x64
- func (x Int8x64) SetHi(y Int8x32) Int8x64
- func (x Int8x64) SetLo(y Int8x32) Int8x64
- func (x Int8x64) Store(y *[64]int8)
- func (x Int8x64) StoreMasked(y *[64]int8, mask Mask8x64)
- func (x Int8x64) StoreSlice(s []int8)
- func (x Int8x64) StoreSlicePart(s []int8)
- func (x Int8x64) String() string
- func (x Int8x64) Sub(y Int8x64) Int8x64
- func (x Int8x64) SubSaturated(y Int8x64) Int8x64
- func (from Int8x64) ToMask() (to Mask8x64)
- func (x Int8x64) Xor(y Int8x64) Int8x64
- type Mask16x16
- type Mask16x32
- type Mask16x8
- type Mask32x16
- type Mask32x4
- type Mask32x8
- type Mask64x2
- type Mask64x4
- type Mask64x8
- type Mask8x16
- type Mask8x32
- type Mask8x64
- type Uint16x16
- func (x Uint16x16) Add(y Uint16x16) Uint16x16
- func (x Uint16x16) AddPairs(y Uint16x16) Uint16x16
- func (x Uint16x16) AddSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) And(y Uint16x16) Uint16x16
- func (x Uint16x16) AndNot(y Uint16x16) Uint16x16
- func (from Uint16x16) AsFloat32x8() (to Float32x8)
- func (from Uint16x16) AsFloat64x4() (to Float64x4)
- func (from Uint16x16) AsInt16x16() (to Int16x16)
- func (from Uint16x16) AsInt32x8() (to Int32x8)
- func (from Uint16x16) AsInt64x4() (to Int64x4)
- func (from Uint16x16) AsInt8x32() (to Int8x32)
- func (from Uint16x16) AsUint32x8() (to Uint32x8)
- func (from Uint16x16) AsUint64x4() (to Uint64x4)
- func (from Uint16x16) AsUint8x32() (to Uint8x32)
- func (x Uint16x16) Average(y Uint16x16) Uint16x16
- func (x Uint16x16) Compress(mask Mask16x16) Uint16x16
- func (x Uint16x16) ConcatPermute(y Uint16x16, indices Uint16x16) Uint16x16
- func (x Uint16x16) Equal(y Uint16x16) Mask16x16
- func (x Uint16x16) Expand(mask Mask16x16) Uint16x16
- func (x Uint16x16) ExtendToUint32() Uint32x16
- func (x Uint16x16) GetHi() Uint16x8
- func (x Uint16x16) GetLo() Uint16x8
- func (x Uint16x16) Greater(y Uint16x16) Mask16x16
- func (x Uint16x16) GreaterEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) InterleaveHiGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) InterleaveLoGrouped(y Uint16x16) Uint16x16
- func (x Uint16x16) IsZero() bool
- func (x Uint16x16) Len() int
- func (x Uint16x16) Less(y Uint16x16) Mask16x16
- func (x Uint16x16) LessEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) Masked(mask Mask16x16) Uint16x16
- func (x Uint16x16) Max(y Uint16x16) Uint16x16
- func (x Uint16x16) Merge(y Uint16x16, mask Mask16x16) Uint16x16
- func (x Uint16x16) Min(y Uint16x16) Uint16x16
- func (x Uint16x16) Mul(y Uint16x16) Uint16x16
- func (x Uint16x16) MulHigh(y Uint16x16) Uint16x16
- func (x Uint16x16) Not() Uint16x16
- func (x Uint16x16) NotEqual(y Uint16x16) Mask16x16
- func (x Uint16x16) OnesCount() Uint16x16
- func (x Uint16x16) Or(y Uint16x16) Uint16x16
- func (x Uint16x16) Permute(indices Uint16x16) Uint16x16
- func (x Uint16x16) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x16
- func (x Uint16x16) Select128FromPair(lo, hi uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) SetHi(y Uint16x8) Uint16x16
- func (x Uint16x16) SetLo(y Uint16x8) Uint16x16
- func (x Uint16x16) ShiftAllLeft(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllLeftConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftAllRight(y uint64) Uint16x16
- func (x Uint16x16) ShiftAllRightConcat(shift uint8, y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeft(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftLeftConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRight(y Uint16x16) Uint16x16
- func (x Uint16x16) ShiftRightConcat(y Uint16x16, z Uint16x16) Uint16x16
- func (x Uint16x16) Store(y *[16]uint16)
- func (x Uint16x16) StoreSlice(s []uint16)
- func (x Uint16x16) StoreSlicePart(s []uint16)
- func (x Uint16x16) String() string
- func (x Uint16x16) Sub(y Uint16x16) Uint16x16
- func (x Uint16x16) SubPairs(y Uint16x16) Uint16x16
- func (x Uint16x16) SubSaturated(y Uint16x16) Uint16x16
- func (x Uint16x16) TruncateToUint8() Uint8x16
- func (x Uint16x16) Xor(y Uint16x16) Uint16x16
- type Uint16x32
- func (x Uint16x32) Add(y Uint16x32) Uint16x32
- func (x Uint16x32) AddSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) And(y Uint16x32) Uint16x32
- func (x Uint16x32) AndNot(y Uint16x32) Uint16x32
- func (from Uint16x32) AsFloat32x16() (to Float32x16)
- func (from Uint16x32) AsFloat64x8() (to Float64x8)
- func (from Uint16x32) AsInt16x32() (to Int16x32)
- func (from Uint16x32) AsInt32x16() (to Int32x16)
- func (from Uint16x32) AsInt64x8() (to Int64x8)
- func (from Uint16x32) AsInt8x64() (to Int8x64)
- func (from Uint16x32) AsUint32x16() (to Uint32x16)
- func (from Uint16x32) AsUint64x8() (to Uint64x8)
- func (from Uint16x32) AsUint8x64() (to Uint8x64)
- func (x Uint16x32) Average(y Uint16x32) Uint16x32
- func (x Uint16x32) Compress(mask Mask16x32) Uint16x32
- func (x Uint16x32) ConcatPermute(y Uint16x32, indices Uint16x32) Uint16x32
- func (x Uint16x32) Equal(y Uint16x32) Mask16x32
- func (x Uint16x32) Expand(mask Mask16x32) Uint16x32
- func (x Uint16x32) GetHi() Uint16x16
- func (x Uint16x32) GetLo() Uint16x16
- func (x Uint16x32) Greater(y Uint16x32) Mask16x32
- func (x Uint16x32) GreaterEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) InterleaveHiGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) InterleaveLoGrouped(y Uint16x32) Uint16x32
- func (x Uint16x32) Len() int
- func (x Uint16x32) Less(y Uint16x32) Mask16x32
- func (x Uint16x32) LessEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) Masked(mask Mask16x32) Uint16x32
- func (x Uint16x32) Max(y Uint16x32) Uint16x32
- func (x Uint16x32) Merge(y Uint16x32, mask Mask16x32) Uint16x32
- func (x Uint16x32) Min(y Uint16x32) Uint16x32
- func (x Uint16x32) Mul(y Uint16x32) Uint16x32
- func (x Uint16x32) MulHigh(y Uint16x32) Uint16x32
- func (x Uint16x32) Not() Uint16x32
- func (x Uint16x32) NotEqual(y Uint16x32) Mask16x32
- func (x Uint16x32) OnesCount() Uint16x32
- func (x Uint16x32) Or(y Uint16x32) Uint16x32
- func (x Uint16x32) Permute(indices Uint16x32) Uint16x32
- func (x Uint16x32) PermuteScalarsHiGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) PermuteScalarsLoGrouped(a, b, c, d uint8) Uint16x32
- func (x Uint16x32) SaturateToUint8() Uint8x32
- func (x Uint16x32) SetHi(y Uint16x16) Uint16x32
- func (x Uint16x32) SetLo(y Uint16x16) Uint16x32
- func (x Uint16x32) ShiftAllLeft(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllLeftConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftAllRight(y uint64) Uint16x32
- func (x Uint16x32) ShiftAllRightConcat(shift uint8, y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeft(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftLeftConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRight(y Uint16x32) Uint16x32
- func (x Uint16x32) ShiftRightConcat(y Uint16x32, z Uint16x32) Uint16x32
- func (x Uint16x32) Store(y *[32]uint16)
- func (x Uint16x32) StoreMasked(y *[32]uint16, mask Mask16x32)
- func (x Uint16x32) StoreSlice(s []uint16)
- func (x Uint16x32) StoreSlicePart(s []uint16)
- func (x Uint16x32) String() string
- func (x Uint16x32) Sub(y Uint16x32) Uint16x32
- func (x Uint16x32) SubSaturated(y Uint16x32) Uint16x32
- func (x Uint16x32) TruncateToUint8() Uint8x32
- func (x Uint16x32) Xor(y Uint16x32) Uint16x32
- type Uint16x8
- func (x Uint16x8) Add(y Uint16x8) Uint16x8
- func (x Uint16x8) AddPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) AddSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) And(y Uint16x8) Uint16x8
- func (x Uint16x8) AndNot(y Uint16x8) Uint16x8
- func (from Uint16x8) AsFloat32x4() (to Float32x4)
- func (from Uint16x8) AsFloat64x2() (to Float64x2)
- func (from Uint16x8) AsInt16x8() (to Int16x8)
- func (from Uint16x8) AsInt32x4() (to Int32x4)
- func (from Uint16x8) AsInt64x2() (to Int64x2)
- func (from Uint16x8) AsInt8x16() (to Int8x16)
- func (from Uint16x8) AsUint32x4() (to Uint32x4)
- func (from Uint16x8) AsUint64x2() (to Uint64x2)
- func (from Uint16x8) AsUint8x16() (to Uint8x16)
- func (x Uint16x8) Average(y Uint16x8) Uint16x8
- func (x Uint16x8) Broadcast128() Uint16x8
- func (x Uint16x8) Broadcast256() Uint16x16
- func (x Uint16x8) Broadcast512() Uint16x32
- func (x Uint16x8) Compress(mask Mask16x8) Uint16x8
- func (x Uint16x8) ConcatPermute(y Uint16x8, indices Uint16x8) Uint16x8
- func (x Uint16x8) Equal(y Uint16x8) Mask16x8
- func (x Uint16x8) Expand(mask Mask16x8) Uint16x8
- func (x Uint16x8) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint16x8) ExtendLo4ToUint32x4() Uint32x4
- func (x Uint16x8) ExtendLo4ToUint64x4() Uint64x4
- func (x Uint16x8) ExtendToUint32() Uint32x8
- func (x Uint16x8) ExtendToUint64() Uint64x8
- func (x Uint16x8) GetElem(index uint8) uint16
- func (x Uint16x8) Greater(y Uint16x8) Mask16x8
- func (x Uint16x8) GreaterEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) InterleaveHi(y Uint16x8) Uint16x8
- func (x Uint16x8) InterleaveLo(y Uint16x8) Uint16x8
- func (x Uint16x8) IsZero() bool
- func (x Uint16x8) Len() int
- func (x Uint16x8) Less(y Uint16x8) Mask16x8
- func (x Uint16x8) LessEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) Masked(mask Mask16x8) Uint16x8
- func (x Uint16x8) Max(y Uint16x8) Uint16x8
- func (x Uint16x8) Merge(y Uint16x8, mask Mask16x8) Uint16x8
- func (x Uint16x8) Min(y Uint16x8) Uint16x8
- func (x Uint16x8) Mul(y Uint16x8) Uint16x8
- func (x Uint16x8) MulHigh(y Uint16x8) Uint16x8
- func (x Uint16x8) Not() Uint16x8
- func (x Uint16x8) NotEqual(y Uint16x8) Mask16x8
- func (x Uint16x8) OnesCount() Uint16x8
- func (x Uint16x8) Or(y Uint16x8) Uint16x8
- func (x Uint16x8) Permute(indices Uint16x8) Uint16x8
- func (x Uint16x8) PermuteScalarsHi(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) PermuteScalarsLo(a, b, c, d uint8) Uint16x8
- func (x Uint16x8) SetElem(index uint8, y uint16) Uint16x8
- func (x Uint16x8) ShiftAllLeft(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllLeftConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftAllRight(y uint64) Uint16x8
- func (x Uint16x8) ShiftAllRightConcat(shift uint8, y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeft(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftLeftConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRight(y Uint16x8) Uint16x8
- func (x Uint16x8) ShiftRightConcat(y Uint16x8, z Uint16x8) Uint16x8
- func (x Uint16x8) Store(y *[8]uint16)
- func (x Uint16x8) StoreSlice(s []uint16)
- func (x Uint16x8) StoreSlicePart(s []uint16)
- func (x Uint16x8) String() string
- func (x Uint16x8) Sub(y Uint16x8) Uint16x8
- func (x Uint16x8) SubPairs(y Uint16x8) Uint16x8
- func (x Uint16x8) SubSaturated(y Uint16x8) Uint16x8
- func (x Uint16x8) TruncateToUint8() Uint8x16
- func (x Uint16x8) Xor(y Uint16x8) Uint16x8
- type Uint32x16
- func (x Uint32x16) Add(y Uint32x16) Uint32x16
- func (x Uint32x16) And(y Uint32x16) Uint32x16
- func (x Uint32x16) AndNot(y Uint32x16) Uint32x16
- func (from Uint32x16) AsFloat32x16() (to Float32x16)
- func (from Uint32x16) AsFloat64x8() (to Float64x8)
- func (from Uint32x16) AsInt16x32() (to Int16x32)
- func (from Uint32x16) AsInt32x16() (to Int32x16)
- func (from Uint32x16) AsInt64x8() (to Int64x8)
- func (from Uint32x16) AsInt8x64() (to Int8x64)
- func (from Uint32x16) AsUint16x32() (to Uint16x32)
- func (from Uint32x16) AsUint64x8() (to Uint64x8)
- func (from Uint32x16) AsUint8x64() (to Uint8x64)
- func (x Uint32x16) Compress(mask Mask32x16) Uint32x16
- func (x Uint32x16) ConcatPermute(y Uint32x16, indices Uint32x16) Uint32x16
- func (x Uint32x16) ConvertToFloat32() Float32x16
- func (x Uint32x16) Equal(y Uint32x16) Mask32x16
- func (x Uint32x16) Expand(mask Mask32x16) Uint32x16
- func (x Uint32x16) GetHi() Uint32x8
- func (x Uint32x16) GetLo() Uint32x8
- func (x Uint32x16) Greater(y Uint32x16) Mask32x16
- func (x Uint32x16) GreaterEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) InterleaveHiGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) InterleaveLoGrouped(y Uint32x16) Uint32x16
- func (x Uint32x16) LeadingZeros() Uint32x16
- func (x Uint32x16) Len() int
- func (x Uint32x16) Less(y Uint32x16) Mask32x16
- func (x Uint32x16) LessEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) Masked(mask Mask32x16) Uint32x16
- func (x Uint32x16) Max(y Uint32x16) Uint32x16
- func (x Uint32x16) Merge(y Uint32x16, mask Mask32x16) Uint32x16
- func (x Uint32x16) Min(y Uint32x16) Uint32x16
- func (x Uint32x16) Mul(y Uint32x16) Uint32x16
- func (x Uint32x16) Not() Uint32x16
- func (x Uint32x16) NotEqual(y Uint32x16) Mask32x16
- func (x Uint32x16) OnesCount() Uint32x16
- func (x Uint32x16) Or(y Uint32x16) Uint32x16
- func (x Uint32x16) Permute(indices Uint32x16) Uint32x16
- func (x Uint32x16) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x16
- func (x Uint32x16) RotateAllLeft(shift uint8) Uint32x16
- func (x Uint32x16) RotateAllRight(shift uint8) Uint32x16
- func (x Uint32x16) RotateLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) RotateRight(y Uint32x16) Uint32x16
- func (x Uint32x16) SaturateToUint16() Uint16x16
- func (x Uint32x16) SaturateToUint16Concat(y Uint32x16) Uint16x32
- func (x Uint32x16) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) SetHi(y Uint32x8) Uint32x16
- func (x Uint32x16) SetLo(y Uint32x8) Uint32x16
- func (x Uint32x16) ShiftAllLeft(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllLeftConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftAllRight(y uint64) Uint32x16
- func (x Uint32x16) ShiftAllRightConcat(shift uint8, y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeft(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftLeftConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRight(y Uint32x16) Uint32x16
- func (x Uint32x16) ShiftRightConcat(y Uint32x16, z Uint32x16) Uint32x16
- func (x Uint32x16) Store(y *[16]uint32)
- func (x Uint32x16) StoreMasked(y *[16]uint32, mask Mask32x16)
- func (x Uint32x16) StoreSlice(s []uint32)
- func (x Uint32x16) StoreSlicePart(s []uint32)
- func (x Uint32x16) String() string
- func (x Uint32x16) Sub(y Uint32x16) Uint32x16
- func (x Uint32x16) TruncateToUint16() Uint16x16
- func (x Uint32x16) TruncateToUint8() Uint8x16
- func (x Uint32x16) Xor(y Uint32x16) Uint32x16
- type Uint32x4
- func (x Uint32x4) AESInvMixColumns() Uint32x4
- func (x Uint32x4) AESRoundKeyGenAssist(rconVal uint8) Uint32x4
- func (x Uint32x4) Add(y Uint32x4) Uint32x4
- func (x Uint32x4) AddPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) And(y Uint32x4) Uint32x4
- func (x Uint32x4) AndNot(y Uint32x4) Uint32x4
- func (from Uint32x4) AsFloat32x4() (to Float32x4)
- func (from Uint32x4) AsFloat64x2() (to Float64x2)
- func (from Uint32x4) AsInt16x8() (to Int16x8)
- func (from Uint32x4) AsInt32x4() (to Int32x4)
- func (from Uint32x4) AsInt64x2() (to Int64x2)
- func (from Uint32x4) AsInt8x16() (to Int8x16)
- func (from Uint32x4) AsUint16x8() (to Uint16x8)
- func (from Uint32x4) AsUint64x2() (to Uint64x2)
- func (from Uint32x4) AsUint8x16() (to Uint8x16)
- func (x Uint32x4) Broadcast128() Uint32x4
- func (x Uint32x4) Broadcast256() Uint32x8
- func (x Uint32x4) Broadcast512() Uint32x16
- func (x Uint32x4) Compress(mask Mask32x4) Uint32x4
- func (x Uint32x4) ConcatPermute(y Uint32x4, indices Uint32x4) Uint32x4
- func (x Uint32x4) ConvertToFloat32() Float32x4
- func (x Uint32x4) ConvertToFloat64() Float64x4
- func (x Uint32x4) Equal(y Uint32x4) Mask32x4
- func (x Uint32x4) Expand(mask Mask32x4) Uint32x4
- func (x Uint32x4) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint32x4) ExtendToUint64() Uint64x4
- func (x Uint32x4) GetElem(index uint8) uint32
- func (x Uint32x4) Greater(y Uint32x4) Mask32x4
- func (x Uint32x4) GreaterEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) InterleaveHi(y Uint32x4) Uint32x4
- func (x Uint32x4) InterleaveLo(y Uint32x4) Uint32x4
- func (x Uint32x4) IsZero() bool
- func (x Uint32x4) LeadingZeros() Uint32x4
- func (x Uint32x4) Len() int
- func (x Uint32x4) Less(y Uint32x4) Mask32x4
- func (x Uint32x4) LessEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) Masked(mask Mask32x4) Uint32x4
- func (x Uint32x4) Max(y Uint32x4) Uint32x4
- func (x Uint32x4) Merge(y Uint32x4, mask Mask32x4) Uint32x4
- func (x Uint32x4) Min(y Uint32x4) Uint32x4
- func (x Uint32x4) Mul(y Uint32x4) Uint32x4
- func (x Uint32x4) MulEvenWiden(y Uint32x4) Uint64x2
- func (x Uint32x4) Not() Uint32x4
- func (x Uint32x4) NotEqual(y Uint32x4) Mask32x4
- func (x Uint32x4) OnesCount() Uint32x4
- func (x Uint32x4) Or(y Uint32x4) Uint32x4
- func (x Uint32x4) PermuteScalars(a, b, c, d uint8) Uint32x4
- func (x Uint32x4) RotateAllLeft(shift uint8) Uint32x4
- func (x Uint32x4) RotateAllRight(shift uint8) Uint32x4
- func (x Uint32x4) RotateLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) RotateRight(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1FourRounds(constant uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA1NextE(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message1(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256Message2(y Uint32x4) Uint32x4
- func (x Uint32x4) SHA256TwoRounds(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) SaturateToUint16() Uint16x8
- func (x Uint32x4) SaturateToUint16Concat(y Uint32x4) Uint16x8
- func (x Uint32x4) SelectFromPair(a, b, c, d uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) SetElem(index uint8, y uint32) Uint32x4
- func (x Uint32x4) ShiftAllLeft(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllLeftConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftAllRight(y uint64) Uint32x4
- func (x Uint32x4) ShiftAllRightConcat(shift uint8, y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeft(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftLeftConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRight(y Uint32x4) Uint32x4
- func (x Uint32x4) ShiftRightConcat(y Uint32x4, z Uint32x4) Uint32x4
- func (x Uint32x4) Store(y *[4]uint32)
- func (x Uint32x4) StoreMasked(y *[4]uint32, mask Mask32x4)
- func (x Uint32x4) StoreSlice(s []uint32)
- func (x Uint32x4) StoreSlicePart(s []uint32)
- func (x Uint32x4) String() string
- func (x Uint32x4) Sub(y Uint32x4) Uint32x4
- func (x Uint32x4) SubPairs(y Uint32x4) Uint32x4
- func (x Uint32x4) TruncateToUint16() Uint16x8
- func (x Uint32x4) TruncateToUint8() Uint8x16
- func (x Uint32x4) Xor(y Uint32x4) Uint32x4
- type Uint32x8
- func (x Uint32x8) Add(y Uint32x8) Uint32x8
- func (x Uint32x8) AddPairs(y Uint32x8) Uint32x8
- func (x Uint32x8) And(y Uint32x8) Uint32x8
- func (x Uint32x8) AndNot(y Uint32x8) Uint32x8
- func (from Uint32x8) AsFloat32x8() (to Float32x8)
- func (from Uint32x8) AsFloat64x4() (to Float64x4)
- func (from Uint32x8) AsInt16x16() (to Int16x16)
- func (from Uint32x8) AsInt32x8() (to Int32x8)
- func (from Uint32x8) AsInt64x4() (to Int64x4)
- func (from Uint32x8) AsInt8x32() (to Int8x32)
- func (from Uint32x8) AsUint16x16() (to Uint16x16)
- func (from Uint32x8) AsUint64x4() (to Uint64x4)
- func (from Uint32x8) AsUint8x32() (to Uint8x32)
- func (x Uint32x8) Compress(mask Mask32x8) Uint32x8
- func (x Uint32x8) ConcatPermute(y Uint32x8, indices Uint32x8) Uint32x8
- func (x Uint32x8) ConvertToFloat32() Float32x8
- func (x Uint32x8) ConvertToFloat64() Float64x8
- func (x Uint32x8) Equal(y Uint32x8) Mask32x8
- func (x Uint32x8) Expand(mask Mask32x8) Uint32x8
- func (x Uint32x8) ExtendToUint64() Uint64x8
- func (x Uint32x8) GetHi() Uint32x4
- func (x Uint32x8) GetLo() Uint32x4
- func (x Uint32x8) Greater(y Uint32x8) Mask32x8
- func (x Uint32x8) GreaterEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) InterleaveHiGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) InterleaveLoGrouped(y Uint32x8) Uint32x8
- func (x Uint32x8) IsZero() bool
- func (x Uint32x8) LeadingZeros() Uint32x8
- func (x Uint32x8) Len() int
- func (x Uint32x8) Less(y Uint32x8) Mask32x8
- func (x Uint32x8) LessEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) Masked(mask Mask32x8) Uint32x8
- func (x Uint32x8) Max(y Uint32x8) Uint32x8
- func (x Uint32x8) Merge(y Uint32x8, mask Mask32x8) Uint32x8
- func (x Uint32x8) Min(y Uint32x8) Uint32x8
- func (x Uint32x8) Mul(y Uint32x8) Uint32x8
- func (x Uint32x8) MulEvenWiden(y Uint32x8) Uint64x4
- func (x Uint32x8) Not() Uint32x8
- func (x Uint32x8) NotEqual(y Uint32x8) Mask32x8
- func (x Uint32x8) OnesCount() Uint32x8
- func (x Uint32x8) Or(y Uint32x8) Uint32x8
- func (x Uint32x8) Permute(indices Uint32x8) Uint32x8
- func (x Uint32x8) PermuteScalarsGrouped(a, b, c, d uint8) Uint32x8
- func (x Uint32x8) RotateAllLeft(shift uint8) Uint32x8
- func (x Uint32x8) RotateAllRight(shift uint8) Uint32x8
- func (x Uint32x8) RotateLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) RotateRight(y Uint32x8) Uint32x8
- func (x Uint32x8) SaturateToUint16() Uint16x8
- func (x Uint32x8) SaturateToUint16Concat(y Uint32x8) Uint16x16
- func (x Uint32x8) Select128FromPair(lo, hi uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SelectFromPairGrouped(a, b, c, d uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) SetHi(y Uint32x4) Uint32x8
- func (x Uint32x8) SetLo(y Uint32x4) Uint32x8
- func (x Uint32x8) ShiftAllLeft(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllLeftConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftAllRight(y uint64) Uint32x8
- func (x Uint32x8) ShiftAllRightConcat(shift uint8, y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeft(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftLeftConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRight(y Uint32x8) Uint32x8
- func (x Uint32x8) ShiftRightConcat(y Uint32x8, z Uint32x8) Uint32x8
- func (x Uint32x8) Store(y *[8]uint32)
- func (x Uint32x8) StoreMasked(y *[8]uint32, mask Mask32x8)
- func (x Uint32x8) StoreSlice(s []uint32)
- func (x Uint32x8) StoreSlicePart(s []uint32)
- func (x Uint32x8) String() string
- func (x Uint32x8) Sub(y Uint32x8) Uint32x8
- func (x Uint32x8) SubPairs(y Uint32x8) Uint32x8
- func (x Uint32x8) TruncateToUint16() Uint16x8
- func (x Uint32x8) TruncateToUint8() Uint8x16
- func (x Uint32x8) Xor(y Uint32x8) Uint32x8
- type Uint64x2
- func (x Uint64x2) Add(y Uint64x2) Uint64x2
- func (x Uint64x2) And(y Uint64x2) Uint64x2
- func (x Uint64x2) AndNot(y Uint64x2) Uint64x2
- func (from Uint64x2) AsFloat32x4() (to Float32x4)
- func (from Uint64x2) AsFloat64x2() (to Float64x2)
- func (from Uint64x2) AsInt16x8() (to Int16x8)
- func (from Uint64x2) AsInt32x4() (to Int32x4)
- func (from Uint64x2) AsInt64x2() (to Int64x2)
- func (from Uint64x2) AsInt8x16() (to Int8x16)
- func (from Uint64x2) AsUint16x8() (to Uint16x8)
- func (from Uint64x2) AsUint32x4() (to Uint32x4)
- func (from Uint64x2) AsUint8x16() (to Uint8x16)
- func (x Uint64x2) Broadcast128() Uint64x2
- func (x Uint64x2) Broadcast256() Uint64x4
- func (x Uint64x2) Broadcast512() Uint64x8
- func (x Uint64x2) CarrylessMultiply(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) Compress(mask Mask64x2) Uint64x2
- func (x Uint64x2) ConcatPermute(y Uint64x2, indices Uint64x2) Uint64x2
- func (x Uint64x2) ConvertToFloat32() Float32x4
- func (x Uint64x2) ConvertToFloat64() Float64x2
- func (x Uint64x2) Equal(y Uint64x2) Mask64x2
- func (x Uint64x2) Expand(mask Mask64x2) Uint64x2
- func (x Uint64x2) GetElem(index uint8) uint64
- func (x Uint64x2) Greater(y Uint64x2) Mask64x2
- func (x Uint64x2) GreaterEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) InterleaveHi(y Uint64x2) Uint64x2
- func (x Uint64x2) InterleaveLo(y Uint64x2) Uint64x2
- func (x Uint64x2) IsZero() bool
- func (x Uint64x2) LeadingZeros() Uint64x2
- func (x Uint64x2) Len() int
- func (x Uint64x2) Less(y Uint64x2) Mask64x2
- func (x Uint64x2) LessEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) Masked(mask Mask64x2) Uint64x2
- func (x Uint64x2) Max(y Uint64x2) Uint64x2
- func (x Uint64x2) Merge(y Uint64x2, mask Mask64x2) Uint64x2
- func (x Uint64x2) Min(y Uint64x2) Uint64x2
- func (x Uint64x2) Mul(y Uint64x2) Uint64x2
- func (x Uint64x2) Not() Uint64x2
- func (x Uint64x2) NotEqual(y Uint64x2) Mask64x2
- func (x Uint64x2) OnesCount() Uint64x2
- func (x Uint64x2) Or(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateAllLeft(shift uint8) Uint64x2
- func (x Uint64x2) RotateAllRight(shift uint8) Uint64x2
- func (x Uint64x2) RotateLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) RotateRight(y Uint64x2) Uint64x2
- func (x Uint64x2) SaturateToUint16() Uint16x8
- func (x Uint64x2) SaturateToUint32() Uint32x4
- func (x Uint64x2) SelectFromPair(a, b uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) SetElem(index uint8, y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeft(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllLeftConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftAllRight(y uint64) Uint64x2
- func (x Uint64x2) ShiftAllRightConcat(shift uint8, y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeft(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftLeftConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRight(y Uint64x2) Uint64x2
- func (x Uint64x2) ShiftRightConcat(y Uint64x2, z Uint64x2) Uint64x2
- func (x Uint64x2) Store(y *[2]uint64)
- func (x Uint64x2) StoreMasked(y *[2]uint64, mask Mask64x2)
- func (x Uint64x2) StoreSlice(s []uint64)
- func (x Uint64x2) StoreSlicePart(s []uint64)
- func (x Uint64x2) String() string
- func (x Uint64x2) Sub(y Uint64x2) Uint64x2
- func (x Uint64x2) TruncateToUint16() Uint16x8
- func (x Uint64x2) TruncateToUint32() Uint32x4
- func (x Uint64x2) TruncateToUint8() Uint8x16
- func (x Uint64x2) Xor(y Uint64x2) Uint64x2
- type Uint64x4
- func (x Uint64x4) Add(y Uint64x4) Uint64x4
- func (x Uint64x4) And(y Uint64x4) Uint64x4
- func (x Uint64x4) AndNot(y Uint64x4) Uint64x4
- func (from Uint64x4) AsFloat32x8() (to Float32x8)
- func (from Uint64x4) AsFloat64x4() (to Float64x4)
- func (from Uint64x4) AsInt16x16() (to Int16x16)
- func (from Uint64x4) AsInt32x8() (to Int32x8)
- func (from Uint64x4) AsInt64x4() (to Int64x4)
- func (from Uint64x4) AsInt8x32() (to Int8x32)
- func (from Uint64x4) AsUint16x16() (to Uint16x16)
- func (from Uint64x4) AsUint32x8() (to Uint32x8)
- func (from Uint64x4) AsUint8x32() (to Uint8x32)
- func (x Uint64x4) CarrylessMultiplyGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) Compress(mask Mask64x4) Uint64x4
- func (x Uint64x4) ConcatPermute(y Uint64x4, indices Uint64x4) Uint64x4
- func (x Uint64x4) ConvertToFloat32() Float32x4
- func (x Uint64x4) ConvertToFloat64() Float64x4
- func (x Uint64x4) Equal(y Uint64x4) Mask64x4
- func (x Uint64x4) Expand(mask Mask64x4) Uint64x4
- func (x Uint64x4) GetHi() Uint64x2
- func (x Uint64x4) GetLo() Uint64x2
- func (x Uint64x4) Greater(y Uint64x4) Mask64x4
- func (x Uint64x4) GreaterEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) InterleaveHiGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) InterleaveLoGrouped(y Uint64x4) Uint64x4
- func (x Uint64x4) IsZero() bool
- func (x Uint64x4) LeadingZeros() Uint64x4
- func (x Uint64x4) Len() int
- func (x Uint64x4) Less(y Uint64x4) Mask64x4
- func (x Uint64x4) LessEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) Masked(mask Mask64x4) Uint64x4
- func (x Uint64x4) Max(y Uint64x4) Uint64x4
- func (x Uint64x4) Merge(y Uint64x4, mask Mask64x4) Uint64x4
- func (x Uint64x4) Min(y Uint64x4) Uint64x4
- func (x Uint64x4) Mul(y Uint64x4) Uint64x4
- func (x Uint64x4) Not() Uint64x4
- func (x Uint64x4) NotEqual(y Uint64x4) Mask64x4
- func (x Uint64x4) OnesCount() Uint64x4
- func (x Uint64x4) Or(y Uint64x4) Uint64x4
- func (x Uint64x4) Permute(indices Uint64x4) Uint64x4
- func (x Uint64x4) RotateAllLeft(shift uint8) Uint64x4
- func (x Uint64x4) RotateAllRight(shift uint8) Uint64x4
- func (x Uint64x4) RotateLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) RotateRight(y Uint64x4) Uint64x4
- func (x Uint64x4) SaturateToUint16() Uint16x8
- func (x Uint64x4) SaturateToUint32() Uint32x4
- func (x Uint64x4) Select128FromPair(lo, hi uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SelectFromPairGrouped(a, b uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) SetHi(y Uint64x2) Uint64x4
- func (x Uint64x4) SetLo(y Uint64x2) Uint64x4
- func (x Uint64x4) ShiftAllLeft(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllLeftConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftAllRight(y uint64) Uint64x4
- func (x Uint64x4) ShiftAllRightConcat(shift uint8, y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeft(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftLeftConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRight(y Uint64x4) Uint64x4
- func (x Uint64x4) ShiftRightConcat(y Uint64x4, z Uint64x4) Uint64x4
- func (x Uint64x4) Store(y *[4]uint64)
- func (x Uint64x4) StoreMasked(y *[4]uint64, mask Mask64x4)
- func (x Uint64x4) StoreSlice(s []uint64)
- func (x Uint64x4) StoreSlicePart(s []uint64)
- func (x Uint64x4) String() string
- func (x Uint64x4) Sub(y Uint64x4) Uint64x4
- func (x Uint64x4) TruncateToUint16() Uint16x8
- func (x Uint64x4) TruncateToUint32() Uint32x4
- func (x Uint64x4) TruncateToUint8() Uint8x16
- func (x Uint64x4) Xor(y Uint64x4) Uint64x4
- type Uint64x8
- func (x Uint64x8) Add(y Uint64x8) Uint64x8
- func (x Uint64x8) And(y Uint64x8) Uint64x8
- func (x Uint64x8) AndNot(y Uint64x8) Uint64x8
- func (from Uint64x8) AsFloat32x16() (to Float32x16)
- func (from Uint64x8) AsFloat64x8() (to Float64x8)
- func (from Uint64x8) AsInt16x32() (to Int16x32)
- func (from Uint64x8) AsInt32x16() (to Int32x16)
- func (from Uint64x8) AsInt64x8() (to Int64x8)
- func (from Uint64x8) AsInt8x64() (to Int8x64)
- func (from Uint64x8) AsUint16x32() (to Uint16x32)
- func (from Uint64x8) AsUint32x16() (to Uint32x16)
- func (from Uint64x8) AsUint8x64() (to Uint8x64)
- func (x Uint64x8) CarrylessMultiplyGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) Compress(mask Mask64x8) Uint64x8
- func (x Uint64x8) ConcatPermute(y Uint64x8, indices Uint64x8) Uint64x8
- func (x Uint64x8) ConvertToFloat32() Float32x8
- func (x Uint64x8) ConvertToFloat64() Float64x8
- func (x Uint64x8) Equal(y Uint64x8) Mask64x8
- func (x Uint64x8) Expand(mask Mask64x8) Uint64x8
- func (x Uint64x8) GetHi() Uint64x4
- func (x Uint64x8) GetLo() Uint64x4
- func (x Uint64x8) Greater(y Uint64x8) Mask64x8
- func (x Uint64x8) GreaterEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) InterleaveHiGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) InterleaveLoGrouped(y Uint64x8) Uint64x8
- func (x Uint64x8) LeadingZeros() Uint64x8
- func (x Uint64x8) Len() int
- func (x Uint64x8) Less(y Uint64x8) Mask64x8
- func (x Uint64x8) LessEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) Masked(mask Mask64x8) Uint64x8
- func (x Uint64x8) Max(y Uint64x8) Uint64x8
- func (x Uint64x8) Merge(y Uint64x8, mask Mask64x8) Uint64x8
- func (x Uint64x8) Min(y Uint64x8) Uint64x8
- func (x Uint64x8) Mul(y Uint64x8) Uint64x8
- func (x Uint64x8) Not() Uint64x8
- func (x Uint64x8) NotEqual(y Uint64x8) Mask64x8
- func (x Uint64x8) OnesCount() Uint64x8
- func (x Uint64x8) Or(y Uint64x8) Uint64x8
- func (x Uint64x8) Permute(indices Uint64x8) Uint64x8
- func (x Uint64x8) RotateAllLeft(shift uint8) Uint64x8
- func (x Uint64x8) RotateAllRight(shift uint8) Uint64x8
- func (x Uint64x8) RotateLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) RotateRight(y Uint64x8) Uint64x8
- func (x Uint64x8) SaturateToUint16() Uint16x8
- func (x Uint64x8) SaturateToUint32() Uint32x8
- func (x Uint64x8) SelectFromPairGrouped(a, b uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) SetHi(y Uint64x4) Uint64x8
- func (x Uint64x8) SetLo(y Uint64x4) Uint64x8
- func (x Uint64x8) ShiftAllLeft(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllLeftConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftAllRight(y uint64) Uint64x8
- func (x Uint64x8) ShiftAllRightConcat(shift uint8, y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeft(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftLeftConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRight(y Uint64x8) Uint64x8
- func (x Uint64x8) ShiftRightConcat(y Uint64x8, z Uint64x8) Uint64x8
- func (x Uint64x8) Store(y *[8]uint64)
- func (x Uint64x8) StoreMasked(y *[8]uint64, mask Mask64x8)
- func (x Uint64x8) StoreSlice(s []uint64)
- func (x Uint64x8) StoreSlicePart(s []uint64)
- func (x Uint64x8) String() string
- func (x Uint64x8) Sub(y Uint64x8) Uint64x8
- func (x Uint64x8) TruncateToUint16() Uint16x8
- func (x Uint64x8) TruncateToUint32() Uint32x8
- func (x Uint64x8) TruncateToUint8() Uint8x16
- func (x Uint64x8) Xor(y Uint64x8) Uint64x8
- type Uint8x16
- func (x Uint8x16) AESDecryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESDecryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptLastRound(y Uint32x4) Uint8x16
- func (x Uint8x16) AESEncryptOneRound(y Uint32x4) Uint8x16
- func (x Uint8x16) Add(y Uint8x16) Uint8x16
- func (x Uint8x16) AddSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) And(y Uint8x16) Uint8x16
- func (x Uint8x16) AndNot(y Uint8x16) Uint8x16
- func (from Uint8x16) AsFloat32x4() (to Float32x4)
- func (from Uint8x16) AsFloat64x2() (to Float64x2)
- func (from Uint8x16) AsInt16x8() (to Int16x8)
- func (from Uint8x16) AsInt32x4() (to Int32x4)
- func (from Uint8x16) AsInt64x2() (to Int64x2)
- func (from Uint8x16) AsInt8x16() (to Int8x16)
- func (from Uint8x16) AsUint16x8() (to Uint16x8)
- func (from Uint8x16) AsUint32x4() (to Uint32x4)
- func (from Uint8x16) AsUint64x2() (to Uint64x2)
- func (x Uint8x16) Average(y Uint8x16) Uint8x16
- func (x Uint8x16) Broadcast128() Uint8x16
- func (x Uint8x16) Broadcast256() Uint8x32
- func (x Uint8x16) Broadcast512() Uint8x64
- func (x Uint8x16) Compress(mask Mask8x16) Uint8x16
- func (x Uint8x16) ConcatPermute(y Uint8x16, indices Uint8x16) Uint8x16
- func (x Uint8x16) ConcatShiftBytesRight(constant uint8, y Uint8x16) Uint8x16
- func (x Uint8x16) DotProductPairsSaturated(y Int8x16) Int16x8
- func (x Uint8x16) Equal(y Uint8x16) Mask8x16
- func (x Uint8x16) Expand(mask Mask8x16) Uint8x16
- func (x Uint8x16) ExtendLo2ToUint64x2() Uint64x2
- func (x Uint8x16) ExtendLo4ToUint32x4() Uint32x4
- func (x Uint8x16) ExtendLo4ToUint64x4() Uint64x4
- func (x Uint8x16) ExtendLo8ToUint16x8() Uint16x8
- func (x Uint8x16) ExtendLo8ToUint32x8() Uint32x8
- func (x Uint8x16) ExtendLo8ToUint64x8() Uint64x8
- func (x Uint8x16) ExtendToUint16() Uint16x16
- func (x Uint8x16) ExtendToUint32() Uint32x16
- func (x Uint8x16) GaloisFieldAffineTransform(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldAffineTransformInverse(y Uint64x2, b uint8) Uint8x16
- func (x Uint8x16) GaloisFieldMul(y Uint8x16) Uint8x16
- func (x Uint8x16) GetElem(index uint8) uint8
- func (x Uint8x16) Greater(y Uint8x16) Mask8x16
- func (x Uint8x16) GreaterEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) IsZero() bool
- func (x Uint8x16) Len() int
- func (x Uint8x16) Less(y Uint8x16) Mask8x16
- func (x Uint8x16) LessEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) Masked(mask Mask8x16) Uint8x16
- func (x Uint8x16) Max(y Uint8x16) Uint8x16
- func (x Uint8x16) Merge(y Uint8x16, mask Mask8x16) Uint8x16
- func (x Uint8x16) Min(y Uint8x16) Uint8x16
- func (x Uint8x16) Not() Uint8x16
- func (x Uint8x16) NotEqual(y Uint8x16) Mask8x16
- func (x Uint8x16) OnesCount() Uint8x16
- func (x Uint8x16) Or(y Uint8x16) Uint8x16
- func (x Uint8x16) Permute(indices Uint8x16) Uint8x16
- func (x Uint8x16) PermuteOrZero(indices Int8x16) Uint8x16
- func (x Uint8x16) SetElem(index uint8, y uint8) Uint8x16
- func (x Uint8x16) Store(y *[16]uint8)
- func (x Uint8x16) StoreSlice(s []uint8)
- func (x Uint8x16) StoreSlicePart(s []uint8)
- func (x Uint8x16) String() string
- func (x Uint8x16) Sub(y Uint8x16) Uint8x16
- func (x Uint8x16) SubSaturated(y Uint8x16) Uint8x16
- func (x Uint8x16) SumAbsDiff(y Uint8x16) Uint16x8
- func (x Uint8x16) Xor(y Uint8x16) Uint8x16
- type Uint8x32
- func (x Uint8x32) AESDecryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESDecryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptLastRound(y Uint32x8) Uint8x32
- func (x Uint8x32) AESEncryptOneRound(y Uint32x8) Uint8x32
- func (x Uint8x32) Add(y Uint8x32) Uint8x32
- func (x Uint8x32) AddSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) And(y Uint8x32) Uint8x32
- func (x Uint8x32) AndNot(y Uint8x32) Uint8x32
- func (from Uint8x32) AsFloat32x8() (to Float32x8)
- func (from Uint8x32) AsFloat64x4() (to Float64x4)
- func (from Uint8x32) AsInt16x16() (to Int16x16)
- func (from Uint8x32) AsInt32x8() (to Int32x8)
- func (from Uint8x32) AsInt64x4() (to Int64x4)
- func (from Uint8x32) AsInt8x32() (to Int8x32)
- func (from Uint8x32) AsUint16x16() (to Uint16x16)
- func (from Uint8x32) AsUint32x8() (to Uint32x8)
- func (from Uint8x32) AsUint64x4() (to Uint64x4)
- func (x Uint8x32) Average(y Uint8x32) Uint8x32
- func (x Uint8x32) Compress(mask Mask8x32) Uint8x32
- func (x Uint8x32) ConcatPermute(y Uint8x32, indices Uint8x32) Uint8x32
- func (x Uint8x32) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) DotProductPairsSaturated(y Int8x32) Int16x16
- func (x Uint8x32) Equal(y Uint8x32) Mask8x32
- func (x Uint8x32) Expand(mask Mask8x32) Uint8x32
- func (x Uint8x32) ExtendToUint16() Uint16x32
- func (x Uint8x32) GaloisFieldAffineTransform(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldAffineTransformInverse(y Uint64x4, b uint8) Uint8x32
- func (x Uint8x32) GaloisFieldMul(y Uint8x32) Uint8x32
- func (x Uint8x32) GetHi() Uint8x16
- func (x Uint8x32) GetLo() Uint8x16
- func (x Uint8x32) Greater(y Uint8x32) Mask8x32
- func (x Uint8x32) GreaterEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) IsZero() bool
- func (x Uint8x32) Len() int
- func (x Uint8x32) Less(y Uint8x32) Mask8x32
- func (x Uint8x32) LessEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) Masked(mask Mask8x32) Uint8x32
- func (x Uint8x32) Max(y Uint8x32) Uint8x32
- func (x Uint8x32) Merge(y Uint8x32, mask Mask8x32) Uint8x32
- func (x Uint8x32) Min(y Uint8x32) Uint8x32
- func (x Uint8x32) Not() Uint8x32
- func (x Uint8x32) NotEqual(y Uint8x32) Mask8x32
- func (x Uint8x32) OnesCount() Uint8x32
- func (x Uint8x32) Or(y Uint8x32) Uint8x32
- func (x Uint8x32) Permute(indices Uint8x32) Uint8x32
- func (x Uint8x32) PermuteOrZeroGrouped(indices Int8x32) Uint8x32
- func (x Uint8x32) Select128FromPair(lo, hi uint8, y Uint8x32) Uint8x32
- func (x Uint8x32) SetHi(y Uint8x16) Uint8x32
- func (x Uint8x32) SetLo(y Uint8x16) Uint8x32
- func (x Uint8x32) Store(y *[32]uint8)
- func (x Uint8x32) StoreSlice(s []uint8)
- func (x Uint8x32) StoreSlicePart(s []uint8)
- func (x Uint8x32) String() string
- func (x Uint8x32) Sub(y Uint8x32) Uint8x32
- func (x Uint8x32) SubSaturated(y Uint8x32) Uint8x32
- func (x Uint8x32) SumAbsDiff(y Uint8x32) Uint16x16
- func (x Uint8x32) Xor(y Uint8x32) Uint8x32
- type Uint8x64
- func (x Uint8x64) AESDecryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESDecryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptLastRound(y Uint32x16) Uint8x64
- func (x Uint8x64) AESEncryptOneRound(y Uint32x16) Uint8x64
- func (x Uint8x64) Add(y Uint8x64) Uint8x64
- func (x Uint8x64) AddSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) And(y Uint8x64) Uint8x64
- func (x Uint8x64) AndNot(y Uint8x64) Uint8x64
- func (from Uint8x64) AsFloat32x16() (to Float32x16)
- func (from Uint8x64) AsFloat64x8() (to Float64x8)
- func (from Uint8x64) AsInt16x32() (to Int16x32)
- func (from Uint8x64) AsInt32x16() (to Int32x16)
- func (from Uint8x64) AsInt64x8() (to Int64x8)
- func (from Uint8x64) AsInt8x64() (to Int8x64)
- func (from Uint8x64) AsUint16x32() (to Uint16x32)
- func (from Uint8x64) AsUint32x16() (to Uint32x16)
- func (from Uint8x64) AsUint64x8() (to Uint64x8)
- func (x Uint8x64) Average(y Uint8x64) Uint8x64
- func (x Uint8x64) Compress(mask Mask8x64) Uint8x64
- func (x Uint8x64) ConcatPermute(y Uint8x64, indices Uint8x64) Uint8x64
- func (x Uint8x64) ConcatShiftBytesRightGrouped(constant uint8, y Uint8x64) Uint8x64
- func (x Uint8x64) DotProductPairsSaturated(y Int8x64) Int16x32
- func (x Uint8x64) Equal(y Uint8x64) Mask8x64
- func (x Uint8x64) Expand(mask Mask8x64) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransform(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldAffineTransformInverse(y Uint64x8, b uint8) Uint8x64
- func (x Uint8x64) GaloisFieldMul(y Uint8x64) Uint8x64
- func (x Uint8x64) GetHi() Uint8x32
- func (x Uint8x64) GetLo() Uint8x32
- func (x Uint8x64) Greater(y Uint8x64) Mask8x64
- func (x Uint8x64) GreaterEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Len() int
- func (x Uint8x64) Less(y Uint8x64) Mask8x64
- func (x Uint8x64) LessEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) Masked(mask Mask8x64) Uint8x64
- func (x Uint8x64) Max(y Uint8x64) Uint8x64
- func (x Uint8x64) Merge(y Uint8x64, mask Mask8x64) Uint8x64
- func (x Uint8x64) Min(y Uint8x64) Uint8x64
- func (x Uint8x64) Not() Uint8x64
- func (x Uint8x64) NotEqual(y Uint8x64) Mask8x64
- func (x Uint8x64) OnesCount() Uint8x64
- func (x Uint8x64) Or(y Uint8x64) Uint8x64
- func (x Uint8x64) Permute(indices Uint8x64) Uint8x64
- func (x Uint8x64) PermuteOrZeroGrouped(indices Int8x64) Uint8x64
- func (x Uint8x64) SetHi(y Uint8x32) Uint8x64
- func (x Uint8x64) SetLo(y Uint8x32) Uint8x64
- func (x Uint8x64) Store(y *[64]uint8)
- func (x Uint8x64) StoreMasked(y *[64]uint8, mask Mask8x64)
- func (x Uint8x64) StoreSlice(s []uint8)
- func (x Uint8x64) StoreSlicePart(s []uint8)
- func (x Uint8x64) String() string
- func (x Uint8x64) Sub(y Uint8x64) Uint8x64
- func (x Uint8x64) SubSaturated(y Uint8x64) Uint8x64
- func (x Uint8x64) SumAbsDiff(y Uint8x64) Uint16x32
- func (x Uint8x64) Xor(y Uint8x64) Uint8x64
- type X86Features
- func (X86Features) AES() bool
- func (X86Features) AVX() bool
- func (X86Features) AVX2() bool
- func (X86Features) AVX512() bool
- func (X86Features) AVX512BITALG() bool
- func (X86Features) AVX512GFNI() bool
- func (X86Features) AVX512VAES() bool
- func (X86Features) AVX512VBMI() bool
- func (X86Features) AVX512VBMI2() bool
- func (X86Features) AVX512VNNI() bool
- func (X86Features) AVX512VPCLMULQDQ() bool
- func (X86Features) AVX512VPOPCNTDQ() bool
- func (X86Features) AVXVNNI() bool
- func (X86Features) SHA() bool
- Bugs
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ClearAVXUpperBits ¶
func ClearAVXUpperBits()
ClearAVXUpperBits clears the high bits of Y0-Y15 and Z0-Z15 registers. It is intended for transitioning from AVX to SSE, eliminating the performance penalties caused by false dependencies.
Note: in the future the compiler may automatically generate the instruction, making this function unnecessary.
Asm: VZEROUPPER, CPU Feature: AVX
Types ¶
type Float32x16 ¶
type Float32x16 struct {
// contains filtered or unexported fields
}
Float32x16 is a 512-bit SIMD vector of 16 float32
func BroadcastFloat32x16 ¶
func BroadcastFloat32x16(x float32) Float32x16
BroadcastFloat32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
func LoadFloat32x16 ¶
func LoadFloat32x16(y *[16]float32) Float32x16
LoadFloat32x16 loads a Float32x16 from an array
func LoadFloat32x16Slice ¶
func LoadFloat32x16Slice(s []float32) Float32x16
LoadFloat32x16Slice loads a Float32x16 from a slice of at least 16 float32s
func LoadFloat32x16SlicePart ¶
func LoadFloat32x16SlicePart(s []float32) Float32x16
LoadFloat32x16SlicePart loads a Float32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadFloat32x16Slice.
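The zero-fill rule can be pictured with a scalar model in plain Go. This is only a sketch of the documented behavior, not the package's implementation: loadPart16 is a hypothetical helper, and a [16]float32 array stands in for the vector.

```go
package main

import "fmt"

// loadPart16 is a hypothetical scalar model of LoadFloat32x16SlicePart.
// It copies at most 16 elements of s; lanes not covered by s stay zero.
func loadPart16(s []float32) [16]float32 {
	var v [16]float32 // all lanes start at zero
	copy(v[:], s)     // copy moves min(16, len(s)) elements
	return v
}

func main() {
	fmt.Println(loadPart16([]float32{1, 2, 3}))
}
```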
func LoadMaskedFloat32x16 ¶
func LoadMaskedFloat32x16(y *[16]float32, mask Mask32x16) Float32x16
LoadMaskedFloat32x16 loads a Float32x16 from an array, at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func (Float32x16) Add ¶
func (x Float32x16) Add(y Float32x16) Float32x16
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX512
func (Float32x16) AsFloat64x8 ¶
func (from Float32x16) AsFloat64x8() (to Float64x8)
AsFloat64x8 converts from Float32x16 to Float64x8
func (Float32x16) AsInt16x32 ¶
func (from Float32x16) AsInt16x32() (to Int16x32)
AsInt16x32 converts from Float32x16 to Int16x32
func (Float32x16) AsInt32x16 ¶
func (from Float32x16) AsInt32x16() (to Int32x16)
AsInt32x16 converts from Float32x16 to Int32x16
func (Float32x16) AsInt64x8 ¶
func (from Float32x16) AsInt64x8() (to Int64x8)
AsInt64x8 converts from Float32x16 to Int64x8
func (Float32x16) AsInt8x64 ¶
func (from Float32x16) AsInt8x64() (to Int8x64)
AsInt8x64 converts from Float32x16 to Int8x64
func (Float32x16) AsUint16x32 ¶
func (from Float32x16) AsUint16x32() (to Uint16x32)
AsUint16x32 converts from Float32x16 to Uint16x32
func (Float32x16) AsUint32x16 ¶
func (from Float32x16) AsUint32x16() (to Uint32x16)
AsUint32x16 converts from Float32x16 to Uint32x16
func (Float32x16) AsUint64x8 ¶
func (from Float32x16) AsUint64x8() (to Uint64x8)
AsUint64x8 converts from Float32x16 to Uint64x8
func (Float32x16) AsUint8x64 ¶
func (from Float32x16) AsUint8x64() (to Uint8x64)
AsUint8x64 converts from Float32x16 to Uint8x64
func (Float32x16) CeilScaled ¶
func (x Float32x16) CeilScaled(prec uint8) Float32x16
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) CeilScaledResidue ¶
func (x Float32x16) CeilScaledResidue(prec uint8) Float32x16
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
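As a rough scalar reading of CeilScaled and CeilScaledResidue, the sketch below assumes that prec counts binary fraction bits, so rounding is to a multiple of 2^-prec, and that the residue is x minus the rounded value. Both assumptions follow the usual description of VRNDSCALEPS and VREDUCEPS rather than anything stated on this page, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled rounds x up to a multiple of 2^-prec.
// Assumption: prec is the number of fraction bits to keep.
func ceilScaled(x float32, prec uint8) float32 {
	scale := math.Ldexp(1, int(prec)) // 2^prec
	return float32(math.Ceil(float64(x)*scale) / scale)
}

// ceilScaledResidue returns the difference left after the scaled ceiling.
// Assumption: residue = x - ceilScaled(x, prec).
func ceilScaledResidue(x float32, prec uint8) float32 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.25, 1), ceilScaledResidue(1.25, 1))
}
```

With prec = 0 the model reduces to an ordinary ceiling to the nearest integer.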
func (Float32x16) Compress ¶
func (x Float32x16) Compress(mask Mask32x16) Float32x16
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512
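A scalar model may help picture the packing. The sketch below uses 4 lanes instead of 16 for brevity; compress is a hypothetical helper, and zeroing the unused upper lanes is an assumption of the sketch rather than something this page states.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress on 4 lanes.
// Lanes whose mask entry is true are packed, in order, into the low lanes.
// Assumption: lanes above the packed elements are zero.
func compress(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	n := 0 // next free low lane
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	fmt.Println(compress([4]float32{10, 20, 30, 40}, [4]bool{false, true, false, true}))
}
```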
func (Float32x16) ConcatPermute ¶
func (x Float32x16) ConcatPermute(y Float32x16, indices Uint32x16) Float32x16
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512
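The indexing rule can be modeled in scalar Go. The sketch below uses 4-lane inputs, so the concatenation xy has 8 elements and only the low 3 bits of each index are needed; concatPermute is a hypothetical helper, not the package API.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute on 4 lanes.
// Each index selects from xy, the concatenation of x (low) and y (high).
func concatPermute(x, y [4]float32, indices [4]uint32) [4]float32 {
	var xy [8]float32
	copy(xy[:4], x[:])
	copy(xy[4:], y[:])
	var out [4]float32
	for i, idx := range indices {
		out[i] = xy[idx&7] // only the bits needed to index xy are used
	}
	return out
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{5, 6, 7, 8}
	fmt.Println(concatPermute(x, y, [4]uint32{0, 4, 7, 3}))
}
```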
func (Float32x16) ConvertToInt32 ¶
func (x Float32x16) ConvertToInt32() Int32x16
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX512
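For in-range values, the truncating behavior matches Go's own float-to-integer conversion, which also discards the fraction. The scalar sketch below illustrates only that case on 4 lanes; convertToInt32 is a hypothetical helper, and out-of-range inputs are deliberately not modeled.

```go
package main

import "fmt"

// convertToInt32 is a hypothetical scalar model of ConvertToInt32 on 4 lanes.
// It covers representable values only; each lane is truncated toward zero.
func convertToInt32(x [4]float32) [4]int32 {
	var out [4]int32
	for i, v := range x {
		out[i] = int32(v) // Go's conversion drops the fraction
	}
	return out
}

func main() {
	fmt.Println(convertToInt32([4]float32{1.9, -1.9, 0.5, -0.5}))
}
```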
func (Float32x16) ConvertToUint32 ¶
func (x Float32x16) ConvertToUint32() Uint32x16
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x16) Div ¶
func (x Float32x16) Div(y Float32x16) Float32x16
Div divides elements of two vectors.
Asm: VDIVPS, CPU Feature: AVX512
func (Float32x16) Equal ¶
func (x Float32x16) Equal(y Float32x16) Mask32x16
Equal returns x equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Expand ¶
func (x Float32x16) Expand(mask Mask32x16) Float32x16
Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The expansion distributes those elements, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
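Expand can be read as the inverse of Compress. A scalar sketch on 4 lanes follows; expand is a hypothetical helper, and zeroing the lanes where mask is false is an assumption of the sketch.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand on 4 lanes.
// Consecutive low lanes of x are placed, in order, where mask is true.
// Assumption: lanes where mask is false are zero.
func expand(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	n := 0 // next packed source lane
	for i, on := range mask {
		if on {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	fmt.Println(expand([4]float32{20, 40, 0, 0}, [4]bool{false, true, false, true}))
}
```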
func (Float32x16) FloorScaled ¶
func (x Float32x16) FloorScaled(prec uint8) Float32x16
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) FloorScaledResidue ¶
func (x Float32x16) FloorScaledResidue(prec uint8) Float32x16
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x16) GetHi ¶
func (x Float32x16) GetHi() Float32x8
GetHi returns the upper half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512
func (Float32x16) GetLo ¶
func (x Float32x16) GetLo() Float32x8
GetLo returns the lower half of x.
Asm: VEXTRACTF64X4, CPU Feature: AVX512
func (Float32x16) Greater ¶
func (x Float32x16) Greater(y Float32x16) Mask32x16
Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) GreaterEqual ¶
func (x Float32x16) GreaterEqual(y Float32x16) Mask32x16
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) IsNan ¶
func (x Float32x16) IsNan(y Float32x16) Mask32x16
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Len ¶
func (x Float32x16) Len() int
Len returns the number of elements in a Float32x16
func (Float32x16) Less ¶
func (x Float32x16) Less(y Float32x16) Mask32x16
Less returns x less-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) LessEqual ¶
func (x Float32x16) LessEqual(y Float32x16) Mask32x16
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Masked ¶
func (x Float32x16) Masked(mask Mask32x16) Float32x16
Masked returns x but with elements zeroed where mask is false.
func (Float32x16) Max ¶
func (x Float32x16) Max(y Float32x16) Float32x16
Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX512
func (Float32x16) Merge ¶
func (x Float32x16) Merge(y Float32x16, mask Mask32x16) Float32x16
Merge returns x but with elements set to y where mask is false.
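The difference between Masked and Merge is easiest to see side by side. The scalar sketch below models both on 4 lanes with a [4]bool mask; masked and merge are hypothetical helpers that follow the descriptions above, not the package's implementation.

```go
package main

import "fmt"

// masked models Masked: lanes where mask is false become zero.
func masked(x [4]float32, mask [4]bool) [4]float32 {
	var out [4]float32
	for i, on := range mask {
		if on {
			out[i] = x[i]
		}
	}
	return out
}

// merge models Merge: lanes where mask is false are taken from y.
func merge(x, y [4]float32, mask [4]bool) [4]float32 {
	out := y
	for i, on := range mask {
		if on {
			out[i] = x[i]
		}
	}
	return out
}

func main() {
	x := [4]float32{1, 2, 3, 4}
	y := [4]float32{10, 20, 30, 40}
	m := [4]bool{true, false, true, false}
	fmt.Println(masked(x, m), merge(x, y, m))
}
```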
func (Float32x16) Min ¶
func (x Float32x16) Min(y Float32x16) Float32x16
Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX512
func (Float32x16) Mul ¶
func (x Float32x16) Mul(y Float32x16) Float32x16
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX512
func (Float32x16) MulAdd ¶
func (x Float32x16) MulAdd(y Float32x16, z Float32x16) Float32x16
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x16) MulAddSub ¶
func (x Float32x16) MulAddSub(y Float32x16, z Float32x16) Float32x16
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x16) MulSubAdd ¶
func (x Float32x16) MulSubAdd(y Float32x16, z Float32x16) Float32x16
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x16) NotEqual ¶
func (x Float32x16) NotEqual(y Float32x16) Mask32x16
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX512
func (Float32x16) Permute ¶
func (x Float32x16) Permute(indices Uint32x16) Float32x16
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX512
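A scalar model of the 16-lane permutation, including the masking of indices to their low 4 bits, might look like the following. permute is a hypothetical helper and arrays stand in for the vector types.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Permute on 16 lanes.
// result[i] = x[indices[i] & 15]; higher index bits are ignored.
func permute(x [16]float32, indices [16]uint32) [16]float32 {
	var out [16]float32
	for i, idx := range indices {
		out[i] = x[idx&15]
	}
	return out
}

func main() {
	var x [16]float32
	var rev [16]uint32
	for i := range x {
		x[i] = float32(i)
		rev[i] = uint32(15 - i) // reverse the lanes
	}
	fmt.Println(permute(x, rev))
}
```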
func (Float32x16) Reciprocal ¶
func (x Float32x16) Reciprocal() Float32x16
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PS, CPU Feature: AVX512
func (Float32x16) ReciprocalSqrt ¶
func (x Float32x16) ReciprocalSqrt() Float32x16
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PS, CPU Feature: AVX512
func (Float32x16) RoundToEvenScaled ¶
func (x Float32x16) RoundToEvenScaled(prec uint8) Float32x16
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) RoundToEvenScaledResidue ¶
func (x Float32x16) RoundToEvenScaledResidue(prec uint8) Float32x16
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x16) Scale ¶
func (x Float32x16) Scale(y Float32x16) Float32x16
Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512
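The description above does not say how the power of 2 is derived from y. The scalar sketch below assumes the usual VSCALEFPS reading, in which each element of x is multiplied by 2 raised to floor(y); that reading is an assumption, and scale is a hypothetical helper operating on a single lane.

```go
package main

import (
	"fmt"
	"math"
)

// scale is a hypothetical one-lane model of Scale.
// Assumption: the result is x * 2^floor(y).
func scale(x, y float32) float32 {
	exp := int(math.Floor(float64(y)))
	return float32(math.Ldexp(float64(x), exp))
}

func main() {
	fmt.Println(scale(3, 2), scale(8, -1))
}
```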
func (Float32x16) SelectFromPairGrouped ¶
func (x Float32x16) SelectFromPairGrouped(a, b, c, d uint8, y Float32x16) Float32x16
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Float32x16) SetHi ¶
func (x Float32x16) SetHi(y Float32x8) Float32x16
SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float32x16) SetLo ¶
func (x Float32x16) SetLo(y Float32x8) Float32x16
SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float32x16) Sqrt ¶
func (x Float32x16) Sqrt() Float32x16
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX512
func (Float32x16) Store ¶
func (x Float32x16) Store(y *[16]float32)
Store stores a Float32x16 to an array
func (Float32x16) StoreMasked ¶
func (x Float32x16) StoreMasked(y *[16]float32, mask Mask32x16)
StoreMasked stores a Float32x16 to an array, at those elements enabled by mask
Asm: VMOVDQU32, CPU Feature: AVX512
func (Float32x16) StoreSlice ¶
func (x Float32x16) StoreSlice(s []float32)
StoreSlice stores x into a slice of at least 16 float32s
func (Float32x16) StoreSlicePart ¶
func (x Float32x16) StoreSlicePart(s []float32)
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
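The "as many elements as will fit" rule maps directly onto Go's built-in copy. The scalar sketch below is an illustration only: storePart16 is a hypothetical helper, a [16]float32 array stands in for the vector, and the returned count is an addition of the sketch (the method itself returns nothing).

```go
package main

import "fmt"

// storePart16 is a hypothetical scalar model of StoreSlicePart.
// It writes min(16, len(s)) elements of x into s and reports how many.
func storePart16(x [16]float32, s []float32) int {
	return copy(s, x[:])
}

func main() {
	var x [16]float32
	for i := range x {
		x[i] = float32(i + 1)
	}
	s := make([]float32, 3)
	n := storePart16(x, s)
	fmt.Println(n, s)
}
```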
func (Float32x16) String ¶
func (x Float32x16) String() string
String returns a string representation of SIMD vector x
func (Float32x16) Sub ¶
func (x Float32x16) Sub(y Float32x16) Float32x16
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX512
func (Float32x16) TruncScaled ¶
func (x Float32x16) TruncScaled(prec uint8) Float32x16
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x16) TruncScaledResidue ¶
func (x Float32x16) TruncScaledResidue(prec uint8) Float32x16
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float32x4 ¶
type Float32x4 struct {
// contains filtered or unexported fields
}
Float32x4 is a 128-bit SIMD vector of 4 float32
func BroadcastFloat32x4 ¶
BroadcastFloat32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadFloat32x4 ¶
LoadFloat32x4 loads a Float32x4 from an array
func LoadFloat32x4Slice ¶
LoadFloat32x4Slice loads a Float32x4 from a slice of at least 4 float32s
func LoadFloat32x4SlicePart ¶
LoadFloat32x4SlicePart loads a Float32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat32x4Slice.
func LoadMaskedFloat32x4 ¶
LoadMaskedFloat32x4 loads a Float32x4 from an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX
func (Float32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX
func (Float32x4) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX
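Read with zero-based lane indices, the alternating pattern can be modeled in scalar Go as follows. addSub is a hypothetical helper; the choice of x as the minuend (x - y) is an assumption of the sketch.

```go
package main

import "fmt"

// addSub is a hypothetical scalar model of AddSub on 4 lanes.
// Even lanes hold x[i] - y[i]; odd lanes hold x[i] + y[i].
func addSub(x, y [4]float32) [4]float32 {
	var out [4]float32
	for i := range x {
		if i%2 == 0 {
			out[i] = x[i] - y[i]
		} else {
			out[i] = x[i] + y[i]
		}
	}
	return out
}

func main() {
	fmt.Println(addSub([4]float32{1, 2, 3, 4}, [4]float32{10, 20, 30, 40}))
}
```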
func (Float32x4) AsFloat64x2 ¶
AsFloat64x2 converts from Float32x4 to Float64x2
func (Float32x4) AsUint16x8 ¶
AsUint16x8 converts from Float32x4 to Uint16x8
func (Float32x4) AsUint32x4 ¶
AsUint32x4 converts from Float32x4 to Uint32x4
func (Float32x4) AsUint64x2 ¶
AsUint64x2 converts from Float32x4 to Uint64x2
func (Float32x4) AsUint8x16 ¶
AsUint8x16 converts from Float32x4 to Uint8x16
func (Float32x4) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2
func (Float32x4) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX2
func (Float32x4) Broadcast512 ¶
func (x Float32x4) Broadcast512() Float32x16
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VBROADCASTSS, CPU Feature: AVX512
func (Float32x4) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VCOMPRESSPS, CPU Feature: AVX512
func (Float32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2PS, CPU Feature: AVX512
func (Float32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX
func (Float32x4) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX
func (Float32x4) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512
func (Float32x4) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x4) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512
func (Float32x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower-indexed positions. The expansion distributes those elements, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
func (Float32x4) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Float32x4) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) IsNan ¶
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX
func (Float32x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX
func (Float32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX
func (Float32x4) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x4) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x4) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x4) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x4) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX
func (Float32x4) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX
func (Float32x4) RoundToEven ¶
RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x4) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x4) Scale ¶
Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512
func (Float32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
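The selector rule and the worked example in the description can be checked with a scalar model. selectFromPair below is a hypothetical helper on [4]float32 arrays; masking each selector to its low 3 bits is an assumption of the sketch.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model of SelectFromPair.
// Selectors 0-3 pick from x, selectors 4-7 pick elements 0-3 of y.
func selectFromPair(x [4]float32, a, b, c, d uint8, y [4]float32) [4]float32 {
	pick := func(sel uint8) float32 {
		sel &= 7 // assumption: only the low 3 bits are significant
		if sel < 4 {
			return x[sel]
		}
		return y[sel-4]
	}
	return [4]float32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]float32{1, 2, 4, 8}
	y := [4]float32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y))
}
```

Called with the values from the description, the model selects x[2], x[3], y[1], and y[3], which reproduces the documented result {4,8,25,81}.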
func (Float32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Float32x4) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX
func (Float32x4) StoreMasked ¶
StoreMasked stores a Float32x4 to an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 float32s
func (Float32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Float32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX
func (Float32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX
func (Float32x4) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x4) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float32x8 ¶
type Float32x8 struct {
// contains filtered or unexported fields
}
Float32x8 is a 256-bit SIMD vector of 8 float32
func BroadcastFloat32x8 ¶
BroadcastFloat32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadFloat32x8 ¶
LoadFloat32x8 loads a Float32x8 from an array
func LoadFloat32x8Slice ¶
LoadFloat32x8Slice loads a Float32x8 from a slice of at least 8 float32s
func LoadFloat32x8SlicePart ¶
LoadFloat32x8SlicePart loads a Float32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat32x8Slice.
func LoadMaskedFloat32x8 ¶
LoadMaskedFloat32x8 loads a Float32x8 from an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPS, CPU Feature: AVX
func (Float32x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPS, CPU Feature: AVX
func (Float32x8) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPS, CPU Feature: AVX
func (Float32x8) AsFloat64x4 ¶
AsFloat64x4 converts from Float32x8 to Float64x4
func (Float32x8) AsInt16x16 ¶
AsInt16x16 converts from Float32x8 to Int16x16
func (Float32x8) AsUint16x16 ¶
AsUint16x16 converts from Float32x8 to Uint16x16
func (Float32x8) AsUint32x8 ¶
AsUint32x8 converts from Float32x8 to Uint32x8
func (Float32x8) AsUint64x4 ¶
AsUint64x4 converts from Float32x8 to Uint64x4
func (Float32x8) AsUint8x32 ¶
AsUint8x32 converts from Float32x8 to Uint8x32
func (Float32x8) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPS, CPU Feature: AVX512
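The packing step can be sketched with a scalar loop. This is a reference model only, with a hypothetical helper name; it assumes the lanes above the packed elements are zero, which the text above does not state.

```go
package main

import "fmt"

// compress is a scalar model of Compress for 8 float32 lanes (hypothetical
// helper). Assumption: lanes above the packed elements are zero.
func compress(x [8]float32, mask [8]bool) [8]float32 {
	var r [8]float32
	n := 0 // next free low-indexed slot
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	x := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(compress(x, mask)) // [11 13 14 17 0 0 0 0]
}
```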
func (Float32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2PS, CPU Feature: AVX512
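A scalar sketch of the indexing rule for 8 float32 lanes follows. The helper name concatPermute is hypothetical; since xy has 16 elements, the model keeps the low 4 bits of each index.

```go
package main

import "fmt"

// concatPermute is a scalar model of ConcatPermute for 8 float32 lanes
// (hypothetical helper). xy has 16 elements, so only the low 4 bits of each
// index are used.
func concatPermute(x, y [8]float32, indices [8]uint32) [8]float32 {
	var xy [16]float32
	copy(xy[:8], x[:]) // x is the lower half
	copy(xy[8:], y[:]) // y is the upper half
	var r [8]float32
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return r
}

func main() {
	x := [8]float32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]uint32{8, 0, 9, 1, 15, 7, 16, 31}
	fmt.Println(concatPermute(x, y, idx)) // [10 0 11 1 17 7 0 17]
}
```

Indices 16 and 31 in the example wrap to 0 and 15 because the higher bits are ignored.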
func (Float32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTPS2PD, CPU Feature: AVX512
func (Float32x8) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2DQ, CPU Feature: AVX
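For values that fit in int32, the truncating behavior matches Go's own float-to-integer conversion, as this scalar sketch shows. The helper name is hypothetical, and the model deliberately does not cover NaN or out-of-range inputs, whose results are implementation-defined.

```go
package main

import "fmt"

// convertToInt32 is a scalar model of ConvertToInt32 for in-range inputs
// (hypothetical helper). NaN and out-of-range values are not modeled.
func convertToInt32(x [8]float32) [8]int32 {
	var r [8]int32
	for i, f := range x {
		r[i] = int32(f) // Go's conversion discards the fraction (rounds toward zero)
	}
	return r
}

func main() {
	x := [8]float32{1.9, -1.9, 0.5, -0.5, 2, -3, 100.99, -100.99}
	fmt.Println(convertToInt32(x)) // [1 -1 0 0 2 -3 100 -100]
}
```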
func (Float32x8) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2QQ, CPU Feature: AVX512
func (Float32x8) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UDQ, CPU Feature: AVX512
func (Float32x8) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPS2UQQ, CPU Feature: AVX512
func (Float32x8) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VEXPANDPS, CPU Feature: AVX512
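Expand is the inverse of a compression, which a scalar loop makes concrete. This is a sketch with a hypothetical helper name; it assumes positions not enabled by mask are zero, which the text above does not state.

```go
package main

import "fmt"

// expand is a scalar model of Expand for 8 float32 lanes (hypothetical
// helper). Assumption: lanes not enabled by mask are zero.
func expand(x [8]float32, mask [8]bool) [8]float32 {
	var r [8]float32
	n := 0 // next packed element of x to hand out
	for i, on := range mask {
		if on {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand(x, mask)) // [0 1 0 2 3 0 0 4]
}
```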
func (Float32x8) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
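One way to read the Scaled and ScaledResidue pair is sketched below for a single element. This is a scalar model under a stated assumption, not the package's definition: it assumes prec is the number of binary fraction bits kept, so rounding happens at a granularity of 2^-prec. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// floorScaled models FloorScaled for one element (hypothetical helper).
// Assumption: prec is the number of binary fraction bits kept.
func floorScaled(x float32, prec int) float32 {
	scaled := math.Ldexp(float64(x), prec)                // x * 2^prec
	return float32(math.Ldexp(math.Floor(scaled), -prec)) // floor, then scale back
}

// floorScaledResidue models FloorScaledResidue as x minus the floored value.
func floorScaledResidue(x float32, prec int) float32 {
	return x - floorScaled(x, prec)
}

func main() {
	fmt.Println(floorScaled(1.375, 2), floorScaledResidue(1.375, 2)) // 1.25 0.125
}
```

Under this reading, the scaled value and its residue always add back up to the original element.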
func (Float32x8) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) IsNan ¶
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPS, CPU Feature: AVX
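The "Use as x.IsNan(x)" hint makes sense if IsNan is an unordered comparison of its two operands, which is an assumption here rather than something the text states. Under that assumption a lane is set when either operand is NaN, so passing x twice tests x alone. A scalar sketch with hypothetical helper names:

```go
package main

import (
	"fmt"
	"math"
)

// nan32 returns a float32 NaN (hypothetical helper for the example).
func nan32() float32 { return float32(math.NaN()) }

// isNan is a scalar model of IsNan (hypothetical helper).
// Assumption: a lane is true when x[i] or y[i] is NaN (unordered compare).
func isNan(x, y [8]float32) [8]bool {
	var r [8]bool
	for i := range r {
		r[i] = x[i] != x[i] || y[i] != y[i] // NaN is the only value not equal to itself
	}
	return r
}

func main() {
	x := [8]float32{1, nan32(), 3, 4, 5, 6, 7, 8}
	fmt.Println(isNan(x, x)) // [false true false false false false false false]
}
```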
func (Float32x8) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VMAXPS, CPU Feature: AVX
func (Float32x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VMINPS, CPU Feature: AVX
func (Float32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPS, CPU Feature: AVX
func (Float32x8) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PS, CPU Feature: AVX512
func (Float32x8) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PS, CPU Feature: AVX512
func (Float32x8) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PS, CPU Feature: AVX512
func (Float32x8) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPS, CPU Feature: AVX
func (Float32x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPS, CPU Feature: AVX2
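The index masking can be seen in a scalar sketch of Permute for 8 lanes. The helper name permute is hypothetical, and a [8]uint32 array stands in for the index vector.

```go
package main

import "fmt"

// permute is a scalar model of Permute for 8 float32 lanes (hypothetical
// helper). Only the low 3 bits of each index are used.
func permute(x [8]float32, indices [8]uint32) [8]float32 {
	var r [8]float32
	for i, idx := range indices {
		r[i] = x[idx&7]
	}
	return r
}

func main() {
	x := [8]float32{10, 11, 12, 13, 14, 15, 16, 17}
	// The last index, 8, wraps to 0 because only the low 3 bits count.
	fmt.Println(permute(x, [8]uint32{7, 6, 5, 4, 3, 2, 1, 8})) // [17 16 15 14 13 12 11 10]
}
```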
func (Float32x8) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCPPS, CPU Feature: AVX
func (Float32x8) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRTPS, CPU Feature: AVX
func (Float32x8) RoundToEven ¶
RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPS, CPU Feature: AVX
func (Float32x8) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
func (Float32x8) Scale ¶
Scale multiplies elements by a power of 2.
Asm: VSCALEFPS, CPU Feature: AVX512
func (Float32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX
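The example above can be reproduced with a scalar model that treats x and y as four 128-bit groups of four float32 each. The helper name is hypothetical, and the panic on out-of-range selectors is a modeling choice that mirrors the "may result in a runtime panic" wording.

```go
package main

import "fmt"

// select128FromPair is a scalar model of Select128FromPair for float32 lanes
// (hypothetical helper). Groups 0-1 come from x, groups 2-3 from y.
func select128FromPair(x [8]float32, lo, hi uint8, y [8]float32) [8]float32 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be between 0 and 3")
	}
	var xy [16]float32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])
	var r [8]float32
	copy(r[:4], xy[4*int(lo):4*int(lo)+4]) // lower half of the result
	copy(r[4:], xy[4*int(hi):4*int(hi)+4]) // upper half of the result
	return r
}

func main() {
	x := [8]float32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]float32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```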
func (Float32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest-indexed element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
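A scalar model helps check the grouped example by hand: the same four selectors are applied independently to each 128-bit half. The helper name is hypothetical, and selectors outside 0-7 are not modeled.

```go
package main

import "fmt"

// selectFromPairGrouped is a scalar model of SelectFromPairGrouped for
// float32 lanes (hypothetical helper). Selectors 0-3 pick from x and 4-7
// pick from y, within each group of four lanes. Selectors above 7 are not
// modeled.
func selectFromPairGrouped(x [8]float32, a, b, c, d uint8, y [8]float32) [8]float32 {
	sel := [4]uint8{a, b, c, d}
	var r [8]float32
	for half := 0; half < 8; half += 4 {
		for i, s := range sel {
			if s < 4 {
				r[half+i] = x[half+int(s)]
			} else {
				r[half+i] = y[half+int(s&3)]
			}
		}
	}
	return r
}

func main() {
	x := [8]float32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]float32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```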
func (Float32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float32x8) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPS, CPU Feature: AVX
func (Float32x8) StoreMasked ¶
StoreMasked stores a Float32x8 to an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Float32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 float32s
func (Float32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
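The "as many elements as will fit" rule corresponds to Go's built-in copy, as in this scalar sketch. The helper name storeSlicePart is hypothetical, and a [8]float32 array stands in for Float32x8.

```go
package main

import "fmt"

// storeSlicePart is a scalar model of StoreSlicePart (hypothetical helper):
// it writes min(len(s), 8) elements of x into s and leaves the rest of s
// untouched.
func storeSlicePart(x [8]float32, s []float32) {
	copy(s, x[:])
}

func main() {
	x := [8]float32{1, 2, 3, 4, 5, 6, 7, 8}
	short := make([]float32, 3)
	storeSlicePart(x, short)
	fmt.Println(short) // [1 2 3]
}
```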
func (Float32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPS, CPU Feature: AVX
func (Float32x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPS, CPU Feature: AVX
func (Float32x8) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPS, CPU Feature: AVX512
func (Float32x8) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPS, CPU Feature: AVX512
type Float64x2 ¶
type Float64x2 struct {
// contains filtered or unexported fields
}
Float64x2 is a 128-bit SIMD vector of 2 float64
func BroadcastFloat64x2 ¶
BroadcastFloat64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat64x2 ¶
LoadFloat64x2 loads a Float64x2 from an array
func LoadFloat64x2Slice ¶
LoadFloat64x2Slice loads a Float64x2 from a slice of at least 2 float64s
func LoadFloat64x2SlicePart ¶
LoadFloat64x2SlicePart loads a Float64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadFloat64x2Slice.
func LoadMaskedFloat64x2 ¶
LoadMaskedFloat64x2 loads a Float64x2 from an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x2) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX
func (Float64x2) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX
func (Float64x2) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX
func (Float64x2) AsFloat32x4 ¶
AsFloat32x4 converts from Float64x2 to Float32x4
func (Float64x2) AsUint16x8 ¶
AsUint16x8 converts from Float64x2 to Uint16x8
func (Float64x2) AsUint32x4 ¶
AsUint32x4 converts from Float64x2 to Uint32x4
func (Float64x2) AsUint64x2 ¶
AsUint64x2 converts from Float64x2 to Uint64x2
func (Float64x2) AsUint8x16 ¶
AsUint8x16 converts from Float64x2 to Uint8x16
func (Float64x2) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Float64x2) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX2
func (Float64x2) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VBROADCASTSD, CPU Feature: AVX512
func (Float64x2) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x2) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPD, CPU Feature: AVX512
func (Float64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2PD, CPU Feature: AVX512
func (Float64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSX, CPU Feature: AVX
func (Float64x2) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQX, CPU Feature: AVX
func (Float64x2) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x2) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQX, CPU Feature: AVX512
func (Float64x2) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x2) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
func (Float64x2) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x2) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Float64x2) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) IsNan ¶
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Max ¶
Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX
func (Float64x2) Min ¶
Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX
func (Float64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX
func (Float64x2) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
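"Fused" means the product is not rounded before the addition. The scalar sketch below uses the standard library's math.FMA to model one MulAdd on two float64 lanes and contrasts it with a separately rounded multiply and add. The helper name mulAdd is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// mulAdd is a scalar model of MulAdd for 2 float64 lanes (hypothetical
// helper): each lane computes x*y+z with a single rounding.
func mulAdd(x, y, z [2]float64) [2]float64 {
	return [2]float64{
		math.FMA(x[0], y[0], z[0]),
		math.FMA(x[1], y[1], z[1]),
	}
}

func main() {
	// (1+2^-52)*(1-2^-52) = 1-2^-104, which rounds to 1 when stored as float64.
	x := [2]float64{1 + 1.0/(1<<52), 2}
	y := [2]float64{1 - 1.0/(1<<52), 3}
	z := [2]float64{-1, 4}
	fused := mulAdd(x, y, z)
	// The explicit conversion forces the product to be rounded before the add.
	unfused := float64(x[0]*y[0]) + z[0]
	fmt.Println(fused[0] != 0, unfused == 0, fused[1]) // true true 10
}
```

The first lane shows the difference: the fused result keeps the tiny term -2^-104, while the unfused computation loses it.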
func (Float64x2) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
func (Float64x2) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x2) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x2) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x2) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x2) RoundToEven ¶
RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x2) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x2) Scale ¶
Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512
func (Float64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
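For the two-element case, the selector ranges can be illustrated with a small scalar model. The helper name selectFromPair and the parameter order (selectors before y) are assumptions for this sketch, and selectors outside 0-3 are not modeled.

```go
package main

import "fmt"

// selectFromPair is a scalar model of SelectFromPair for 2 float64 lanes
// (hypothetical helper). Selectors 0-1 pick from x and 2-3 pick from y.
// Selectors above 3 are not modeled.
func selectFromPair(x [2]float64, a, b uint8, y [2]float64) [2]float64 {
	pick := func(s uint8) float64 {
		if s < 2 {
			return x[s]
		}
		return y[s&1]
	}
	return [2]float64{pick(a), pick(b)}
}

func main() {
	x := [2]float64{1, 2}
	y := [2]float64{3, 4}
	fmt.Println(selectFromPair(x, 1, 2, y)) // [2 3]
}
```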
func (Float64x2) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Float64x2) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX
func (Float64x2) StoreMasked ¶
StoreMasked stores a Float64x2 to an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 float64s
func (Float64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX
func (Float64x2) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX
func (Float64x2) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x2) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Float64x4 ¶
type Float64x4 struct {
// contains filtered or unexported fields
}
Float64x4 is a 256-bit SIMD vector of 4 float64
func BroadcastFloat64x4 ¶
BroadcastFloat64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadFloat64x4 ¶
LoadFloat64x4 loads a Float64x4 from an array
func LoadFloat64x4Slice ¶
LoadFloat64x4Slice loads a Float64x4 from a slice of at least 4 float64s
func LoadFloat64x4SlicePart ¶
LoadFloat64x4SlicePart loads a Float64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadFloat64x4Slice.
func LoadMaskedFloat64x4 ¶
LoadMaskedFloat64x4 loads a Float64x4 from an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX
func (Float64x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VHADDPD, CPU Feature: AVX
func (Float64x4) AddSub ¶
AddSub subtracts even elements and adds odd elements of two vectors.
Asm: VADDSUBPD, CPU Feature: AVX
func (Float64x4) AsFloat32x8 ¶
AsFloat32x8 converts from Float64x4 to Float32x8
func (Float64x4) AsInt16x16 ¶
AsInt16x16 converts from Float64x4 to Int16x16
func (Float64x4) AsUint16x16 ¶
AsUint16x16 converts from Float64x4 to Uint16x16
func (Float64x4) AsUint32x8 ¶
AsUint32x8 converts from Float64x4 to Uint32x8
func (Float64x4) AsUint64x4 ¶
AsUint64x4 converts from Float64x4 to Uint64x4
func (Float64x4) AsUint8x32 ¶
AsUint8x32 converts from Float64x4 to Uint8x32
func (Float64x4) Ceil ¶
Ceil rounds elements up to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VCOMPRESSPD, CPU Feature: AVX512
func (Float64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2PD, CPU Feature: AVX512
func (Float64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PSY, CPU Feature: AVX
func (Float64x4) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQY, CPU Feature: AVX
func (Float64x4) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x4) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQY, CPU Feature: AVX512
func (Float64x4) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x4) Expand ¶
Expand takes a vector x whose elements are packed into its lower part and distributes them, in order, to the positions enabled by mask, from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
func (Float64x4) Floor ¶
Floor rounds elements down to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) IsNan ¶
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX
func (Float64x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX
func (Float64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX
func (Float64x4) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
func (Float64x4) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
func (Float64x4) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x4) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX
func (Float64x4) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512
func (Float64x4) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x4) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x4) RoundToEven ¶
RoundToEven rounds elements to the nearest integer.
Asm: VROUNDPD, CPU Feature: AVX
func (Float64x4) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x4) Scale ¶
Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512
func (Float64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo, hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2F128, CPU Feature: AVX
func (Float64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Float64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF128, CPU Feature: AVX
func (Float64x4) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX
func (Float64x4) StoreMasked ¶
StoreMasked stores a Float64x4 to an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Float64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 float64s
func (Float64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX
func (Float64x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VHSUBPD, CPU Feature: AVX
func (Float64x4) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x4) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Float64x8 ¶
type Float64x8 struct {
// contains filtered or unexported fields
}
Float64x8 is a 512-bit SIMD vector of 8 float64
func BroadcastFloat64x8 ¶
BroadcastFloat64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512F
func LoadFloat64x8 ¶
LoadFloat64x8 loads a Float64x8 from an array
func LoadFloat64x8Slice ¶
LoadFloat64x8Slice loads a Float64x8 from a slice of at least 8 float64s
func LoadFloat64x8SlicePart ¶
LoadFloat64x8SlicePart loads a Float64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadFloat64x8Slice.
func LoadMaskedFloat64x8 ¶
LoadMaskedFloat64x8 loads a Float64x8 from an array, at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func (Float64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VADDPD, CPU Feature: AVX512
func (Float64x8) AsFloat32x16 ¶
func (from Float64x8) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Float64x8 to Float32x16
func (Float64x8) AsInt16x32 ¶
AsInt16x32 converts from Float64x8 to Int16x32
func (Float64x8) AsInt32x16 ¶
AsInt32x16 converts from Float64x8 to Int32x16
func (Float64x8) AsUint16x32 ¶
AsUint16x32 converts from Float64x8 to Uint16x32
func (Float64x8) AsUint32x16 ¶
AsUint32x16 converts from Float64x8 to Uint32x16
func (Float64x8) AsUint64x8 ¶
AsUint64x8 converts from Float64x8 to Uint64x8
func (Float64x8) AsUint8x64 ¶
AsUint8x64 converts from Float64x8 to Uint8x64
func (Float64x8) CeilScaled ¶
CeilScaled rounds elements up with specified precision.
prec results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) CeilScaledResidue ¶
CeilScaledResidue computes the difference after ceiling with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
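One way to read the Scaled and ScaledResidue pair is as rounding to a multiple of a power of two and then taking what is left over. The sketch below models a single lane in scalar Go. It assumes that prec counts binary fraction bits (so the result is a multiple of 2^-prec), which the text above does not state explicitly; ceilScaled and ceilScaledResidue are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// ceilScaled models one lane: round x up to a multiple of 2^-prec.
// Treating prec as a count of fraction bits is an assumption.
func ceilScaled(x float64, prec uint8) float64 {
	scaled := math.Ldexp(x, int(prec)) // x * 2^prec
	return math.Ldexp(math.Ceil(scaled), -int(prec))
}

// ceilScaledResidue models one lane of the residue: x minus its rounded value.
func ceilScaledResidue(x float64, prec uint8) float64 {
	return x - ceilScaled(x, prec)
}

func main() {
	fmt.Println(ceilScaled(1.3, 1), ceilScaledResidue(1.25, 1))
}
```

Under this reading, the Floor, RoundToEven, and Trunc variants would differ only in the rounding function applied to the scaled value.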
func (Float64x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VCOMPRESSPD, CPU Feature: AVX512
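The packing behavior can be modeled in scalar Go. This is a minimal sketch with hypothetical names: a [8]float64 stands in for the vector and a [8]bool for the mask. The sketch leaves the unused upper lanes at zero, which is an assumption, since the description does not say what they hold.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress on 8 lanes.
// Selected elements are packed toward index 0 in their original order.
func compress(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64 // assumption: lanes past the packed run stay zero
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(compress(x, mask))
}
```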
func (Float64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2PD, CPU Feature: AVX512
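The indexing rule can be written out in scalar Go. This is a minimal sketch, not the package API: arrays stand in for vectors and concatPermute is a hypothetical name. For 8-element inputs the concatenation has 16 entries, so the sketch keeps the low 4 bits of each index.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute on 8 lanes.
func concatPermute(x, y [8]float64, indices [8]uint64) [8]float64 {
	var xy [16]float64
	copy(xy[:8], x[:]) // x is the lower half
	copy(xy[8:], y[:]) // y is the upper half
	var out [8]float64
	for i, idx := range indices {
		out[i] = xy[idx&15] // only the bits needed to index 16 entries
	}
	return out
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(concatPermute(x, y, [8]uint64{0, 8, 1, 9, 2, 10, 3, 11}))
}
```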
func (Float64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32. The result vector's elements are rounded to the nearest value.
Asm: VCVTPD2PS, CPU Feature: AVX512
func (Float64x8) ConvertToInt32 ¶
ConvertToInt32 converts element values to int32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2DQ, CPU Feature: AVX512
func (Float64x8) ConvertToInt64 ¶
ConvertToInt64 converts element values to int64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in int64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2QQ, CPU Feature: AVX512
func (Float64x8) ConvertToUint32 ¶
ConvertToUint32 converts element values to uint32. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint32, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UDQ, CPU Feature: AVX512
func (Float64x8) ConvertToUint64 ¶
ConvertToUint64 converts element values to uint64. When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in uint64, an implementation-defined architecture-specific value is returned.
Asm: VCVTTPD2UQQ, CPU Feature: AVX512
func (Float64x8) Expand ¶
Expand distributes the elements of x, which are packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.
Asm: VEXPANDPD, CPU Feature: AVX512
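Expand is the inverse of the packing done by Compress, which a scalar model makes concrete. This is a minimal sketch with hypothetical names; leaving unselected lanes at zero is an assumption, since the description does not say what they hold.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand on 8 lanes.
// The n-th set mask position receives the n-th low element of x.
func expand(x [8]float64, mask [8]bool) [8]float64 {
	var out [8]float64 // assumption: unselected lanes stay zero
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	x := [8]float64{11, 13, 14, 17, 0, 0, 0, 0}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand(x, mask))
}
```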
func (Float64x8) FloorScaled ¶
FloorScaled rounds elements down with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) FloorScaledResidue ¶
FloorScaledResidue computes the difference after flooring with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x8) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) IsNan ¶
IsNan checks if elements are NaN. Use as x.IsNan(x).
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VMAXPD, CPU Feature: AVX512
func (Float64x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VMINPD, CPU Feature: AVX512
func (Float64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VMULPD, CPU Feature: AVX512
func (Float64x8) MulAdd ¶
MulAdd performs a fused (x * y) + z.
Asm: VFMADD213PD, CPU Feature: AVX512
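A fused multiply-add rounds once, after the addition, rather than once after the multiply and again after the add. The standard library's math.FMA has the same property, so a per-lane scalar model can be sketched as follows. The name mulAdd and the use of arrays in place of vectors are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// mulAdd is a hypothetical scalar model of MulAdd on 8 lanes:
// each lane computes x*y+z with a single rounding step.
func mulAdd(x, y, z [8]float64) [8]float64 {
	var out [8]float64
	for i := range out {
		out[i] = math.FMA(x[i], y[i], z[i])
	}
	return out
}

func main() {
	x := [8]float64{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]float64{2, 2, 2, 2, 2, 2, 2, 2}
	z := [8]float64{0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5}
	fmt.Println(mulAdd(x, y, z))
}
```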
func (Float64x8) MulAddSub ¶
MulAddSub performs a fused (x * y) - z for odd-indexed elements, and (x * y) + z for even-indexed elements.
Asm: VFMADDSUB213PD, CPU Feature: AVX512
func (Float64x8) MulSubAdd ¶
MulSubAdd performs a fused (x * y) + z for odd-indexed elements, and (x * y) - z for even-indexed elements.
Asm: VFMSUBADD213PD, CPU Feature: AVX512
func (Float64x8) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VCMPPD, CPU Feature: AVX512
func (Float64x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMPD, CPU Feature: AVX512
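The effect of using only the low 3 bits of each index is easy to see in a scalar model. This is a minimal sketch with a hypothetical function name, using arrays in place of vectors.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Permute on 8 lanes.
func permute(x [8]float64, indices [8]uint64) [8]float64 {
	var out [8]float64
	for i, idx := range indices {
		out[i] = x[idx&7] // bits above the low 3 are ignored
	}
	return out
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	// Index 8 selects element 0 and index 15 selects element 7.
	fmt.Println(permute(x, [8]uint64{7, 6, 5, 4, 3, 2, 8, 15}))
}
```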
func (Float64x8) Reciprocal ¶
Reciprocal computes an approximate reciprocal of each element.
Asm: VRCP14PD, CPU Feature: AVX512
func (Float64x8) ReciprocalSqrt ¶
ReciprocalSqrt computes an approximate reciprocal of the square root of each element.
Asm: VRSQRT14PD, CPU Feature: AVX512
func (Float64x8) RoundToEvenScaled ¶
RoundToEvenScaled rounds elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) RoundToEvenScaledResidue ¶
RoundToEvenScaledResidue computes the difference after rounding with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
func (Float64x8) Scale ¶
Scale multiplies elements by a power of 2.
Asm: VSCALEFPD, CPU Feature: AVX512
func (Float64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
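The selector encoding described above can be illustrated with a scalar model. This is a sketch under stated assumptions: the selector parameters are named a and b here for illustration, arrays stand in for vectors, and the sketch masks selectors to 2 bits to stay in range, which the description does not specify.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model for 8 float64 lanes.
// Each 128-bit group holds 2 elements; selectors 0-1 pick from x's group
// and selectors 2-3 pick elements 0-1 of y's group.
func selectFromPairGrouped(x, y [8]float64, a, b uint8) [8]float64 {
	pick := func(g int, sel uint8) float64 {
		sel &= 3 // assumption: keep the sketch in range
		if sel < 2 {
			return x[2*g+int(sel)]
		}
		return y[2*g+int(sel-2)]
	}
	var out [8]float64
	for g := 0; g < 4; g++ {
		out[2*g] = pick(g, a)
		out[2*g+1] = pick(g, b)
	}
	return out
}

func main() {
	x := [8]float64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]float64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(selectFromPairGrouped(x, y, 1, 2))
}
```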
func (Float64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTF64X4, CPU Feature: AVX512
func (Float64x8) Sqrt ¶
Sqrt computes the square root of each element.
Asm: VSQRTPD, CPU Feature: AVX512
func (Float64x8) StoreMasked ¶
StoreMasked stores a Float64x8 to an array, at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512
func (Float64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 float64s
func (Float64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Float64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VSUBPD, CPU Feature: AVX512
func (Float64x8) TruncScaled ¶
TruncScaled truncates elements with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VRNDSCALEPD, CPU Feature: AVX512
func (Float64x8) TruncScaledResidue ¶
TruncScaledResidue computes the difference after truncating with specified precision.
prec results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VREDUCEPD, CPU Feature: AVX512
type Int16x16 ¶
type Int16x16 struct {
// contains filtered or unexported fields
}
Int16x16 is a 256-bit SIMD vector of 16 int16
func BroadcastInt16x16 ¶
BroadcastInt16x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt16x16 ¶
LoadInt16x16 loads an Int16x16 from an array
func LoadInt16x16Slice ¶
LoadInt16x16Slice loads an Int16x16 from a slice of at least 16 int16s
func LoadInt16x16SlicePart ¶
LoadInt16x16SlicePart loads an Int16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt16x16Slice.
func (Int16x16) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX2
func (Int16x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2
func (Int16x16) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2
func (Int16x16) AddPairsSaturated ¶
AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX2
func (Int16x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX2
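Saturating addition clamps a result that would overflow instead of letting it wrap around. A scalar model of one int16 lane, using a hypothetical helper name, might look like this:

```go
package main

import (
	"fmt"
	"math"
)

// addSaturated is a hypothetical scalar model of one int16 lane of
// AddSaturated: the sum is clamped to the int16 range.
func addSaturated(a, b int16) int16 {
	sum := int32(a) + int32(b) // widen so the true sum is representable
	if sum > math.MaxInt16 {
		return math.MaxInt16
	}
	if sum < math.MinInt16 {
		return math.MinInt16
	}
	return int16(sum)
}

func main() {
	fmt.Println(addSaturated(32000, 1000), addSaturated(-32000, -1000), addSaturated(1, 2))
}
```

By contrast, a plain Add of 32000 and 1000 in int16 arithmetic would wrap to a negative value.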
func (Int16x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int16x16) AsFloat32x8 ¶
AsFloat32x8 reinterprets the bits of an Int16x16 as a Float32x8.
func (Int16x16) AsFloat64x4 ¶
AsFloat64x4 reinterprets the bits of an Int16x16 as a Float64x4.
func (Int16x16) AsUint16x16 ¶
AsUint16x16 reinterprets the bits of an Int16x16 as a Uint16x16.
func (Int16x16) AsUint32x8 ¶
AsUint32x8 reinterprets the bits of an Int16x16 as a Uint32x8.
func (Int16x16) AsUint64x4 ¶
AsUint64x4 reinterprets the bits of an Int16x16 as a Uint64x4.
func (Int16x16) AsUint8x32 ¶
AsUint8x32 reinterprets the bits of an Int16x16 as a Uint8x32.
func (Int16x16) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x16) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x16) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX2
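Put differently, each result element is the first operand negated, zeroed, or passed through, depending on the sign of the second operand. A scalar model of one lane, with a hypothetical helper name:

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of one int16 lane of CopySign:
// x is multiplied by -1, 0, or 1 according to the sign of y.
func copySign(x, y int16) int16 {
	switch {
	case y < 0:
		return -x
	case y == 0:
		return 0
	default:
		return x
	}
}

func main() {
	fmt.Println(copySign(5, -3), copySign(5, 0), copySign(-5, 9))
}
```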
func (Int16x16) DotProductPairs ¶
DotProductPairs multiplies corresponding elements and adds adjacent pairs of products together, yielding a vector with half as many elements, each twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX2
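The pairing can be made concrete with a scalar model. This is a minimal sketch using arrays in place of vectors and a hypothetical function name: 16 int16 inputs per operand produce 8 int32 outputs.

```go
package main

import "fmt"

// dotProductPairs is a hypothetical scalar model for 16 int16 lanes.
// Output lane i is x[2i]*y[2i] + x[2i+1]*y[2i+1], computed in int32.
func dotProductPairs(x, y [16]int16) [8]int32 {
	var out [8]int32
	for i := range out {
		lo := int32(x[2*i]) * int32(y[2*i])
		hi := int32(x[2*i+1]) * int32(y[2*i+1])
		out[i] = lo + hi
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i + 1) // 1, 2, ..., 16
		y[i] = 2
	}
	fmt.Println(dotProductPairs(x, y))
}
```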
func (Int16x16) Expand ¶
Expand distributes the elements of x, which are packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x16) ExtendToInt32 ¶
ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX512
func (Int16x16) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX2
func (Int16x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature: AVX2
func (Int16x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2
func (Int16x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2
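"Grouped" here means that the interleave is applied independently within each 128-bit subvector, which for int16 is a group of 8 elements. The scalar sketch below models both the Lo and Hi variants. It assumes that the element from x comes first in each interleaved pair, which the description does not state; the function name and the hi flag are hypothetical.

```go
package main

import "fmt"

// interleaveGrouped is a hypothetical scalar model for 16 int16 lanes.
// Within each group of 8, it interleaves the low (or high) 4 elements
// of x and y. Assumption: x's element precedes y's in each pair.
func interleaveGrouped(x, y [16]int16, hi bool) [16]int16 {
	var out [16]int16
	for g := 0; g < 16; g += 8 {
		src := g
		if hi {
			src += 4 // start at the high half of the group
		}
		for i := 0; i < 4; i++ {
			out[g+2*i] = x[src+i]
			out[g+2*i+1] = y[src+i]
		}
	}
	return out
}

func main() {
	var x, y [16]int16
	for i := range x {
		x[i] = int16(i)
		y[i] = int16(100 + i)
	}
	fmt.Println(interleaveGrouped(x, y, false))
	fmt.Println(interleaveGrouped(x, y, true))
}
```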
func (Int16x16) IsZero ¶
IsZero returns true if all elements of x are zero.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int16x16) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature: AVX2
func (Int16x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature: AVX2
func (Int16x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX2
func (Int16x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX2
func (Int16x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2
func (Int16x16) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX2
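For int16 elements, the "high part" is the upper 16 bits of the full 32-bit signed product. A scalar model of one lane, with a hypothetical helper name:

```go
package main

import "fmt"

// mulHigh is a hypothetical scalar model of one int16 lane of MulHigh:
// the upper 16 bits of the 32-bit signed product.
func mulHigh(a, b int16) int16 {
	product := int32(a) * int32(b)
	return int16(product >> 16) // arithmetic shift keeps the sign
}

func main() {
	fmt.Println(mulHigh(16384, 4), mulHigh(300, 300), mulHigh(-1, 1))
}
```

Together with Mul, which keeps the low 16 bits, this recovers the full product of each pair of elements.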
func (Int16x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature: AVX2
func (Int16x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int16x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Int16x16) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2
func (Int16x16) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2
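The result layouts given for PermuteScalarsLoGrouped and PermuteScalarsHiGrouped can be reproduced with one scalar model. This is a minimal sketch with hypothetical names: the hi flag selects which half of each 8-element group is permuted, and masking the selectors to 2 bits is an assumption made to keep the sketch in range.

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model for 16 int16 lanes.
// In each group of 8, four elements (the low or high half) are permuted
// by a, b, c, d; the other four are copied through unchanged.
func permuteScalarsGrouped(x [16]int16, a, b, c, d uint8, hi bool) [16]int16 {
	out := x // the untouched half of each group is copied through
	sel := [4]uint8{a, b, c, d}
	for g := 0; g < 16; g += 8 {
		base := g
		if hi {
			base += 4
		}
		for i, s := range sel {
			out[base+i] = x[base+int(s&3)] // assumption: selectors masked to 0-3
		}
	}
	return out
}

func main() {
	var x [16]int16
	for i := range x {
		x[i] = int16(i)
	}
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0, false))
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0, true))
}
```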
func (Int16x16) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSWB, CPU Feature: AVX512
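The difference between a saturating and a truncating narrowing conversion shows up only for values outside the int8 range. The scalar sketch below models one lane of each; both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt8 is a hypothetical scalar model of one lane of
// SaturateToInt8: out-of-range values clamp to the int8 limits.
func saturateToInt8(v int16) int8 {
	switch {
	case v > math.MaxInt8:
		return math.MaxInt8
	case v < math.MinInt8:
		return math.MinInt8
	}
	return int8(v)
}

// truncateToInt8 is a hypothetical scalar model of one lane of
// TruncateToInt8: only the low 8 bits are kept.
func truncateToInt8(v int16) int8 {
	return int8(v)
}

func main() {
	fmt.Println(saturateToInt8(300), truncateToInt8(300))
}
```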
func (Int16x16) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x16) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
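The worked example above can be checked against a scalar model. This is a minimal sketch with a hypothetical function name, using [16]int16 arrays in place of Int16x16 vectors; each 128-bit element is a run of 8 int16 values.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model for 16 int16 lanes.
// x and y together form four 128-bit elements (8 int16 each), numbered
// 0-1 in x and 2-3 in y; lo and hi choose the two halves of the result.
func select128FromPair(x, y [16]int16, lo, hi uint8) [16]int16 {
	var xy [32]int16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]int16
	// Out-of-range lo or hi panics here on the slice bounds check.
	copy(out[:8], xy[int(lo)*8:int(lo)*8+8])
	copy(out[8:], xy[int(hi)*8:int(hi)*8+8])
	return out
}

func main() {
	x := [16]int16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]int16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPair(x, y, 3, 0))
}
```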
func (Int16x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int16x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int16x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2
func (Int16x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Int16x16) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX2
func (Int16x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x16) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int16s
func (Int16x16) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2
func (Int16x16) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2
func (Int16x16) SubPairsSaturated ¶
SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX2
func (Int16x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX2
func (Int16x16) ToMask ¶
ToMask converts from Int16x16 to Mask16x16. A mask element is set to true when the corresponding vector element is non-zero.
func (Int16x16) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVWB, CPU Feature: AVX512
type Int16x32 ¶
type Int16x32 struct {
// contains filtered or unexported fields
}
Int16x32 is a 512-bit SIMD vector of 32 int16
func BroadcastInt16x32 ¶
BroadcastInt16x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX512BW
func LoadInt16x32 ¶
LoadInt16x32 loads an Int16x32 from an array
func LoadInt16x32Slice ¶
LoadInt16x32Slice loads an Int16x32 from a slice of at least 32 int16s
func LoadInt16x32SlicePart ¶
LoadInt16x32SlicePart loads an Int16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt16x32Slice.
func LoadMaskedInt16x32 ¶
LoadMaskedInt16x32 loads an Int16x32 from an array, at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
func (Int16x32) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSW, CPU Feature: AVX512
func (Int16x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512
func (Int16x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX512
func (Int16x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int16x32) AsFloat32x16 ¶
func (from Int16x32) AsFloat32x16() (to Float32x16)
AsFloat32x16 reinterprets the bits of an Int16x32 as a Float32x16.
func (Int16x32) AsFloat64x8 ¶
AsFloat64x8 reinterprets the bits of an Int16x32 as a Float64x8.
func (Int16x32) AsInt32x16 ¶
AsInt32x16 reinterprets the bits of an Int16x32 as an Int32x16.
func (Int16x32) AsUint16x32 ¶
AsUint16x32 reinterprets the bits of an Int16x32 as a Uint16x32.
func (Int16x32) AsUint32x16 ¶
AsUint32x16 reinterprets the bits of an Int16x32 as a Uint32x16.
func (Int16x32) AsUint64x8 ¶
AsUint64x8 reinterprets the bits of an Int16x32 as a Uint64x8.
func (Int16x32) AsUint8x64 ¶
AsUint8x64 reinterprets the bits of an Int16x32 as a Uint8x64.
func (Int16x32) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x32) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x32) DotProductPairs ¶
DotProductPairs multiplies corresponding elements and adds adjacent pairs of products together, yielding a vector with half as many elements, each twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX512
func (Int16x32) Expand ¶
Expand distributes the elements of x, which are packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x32) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX512
func (Int16x32) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512
func (Int16x32) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512
func (Int16x32) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX512
func (Int16x32) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX512
func (Int16x32) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512
func (Int16x32) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX512
func (Int16x32) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPW, CPU Feature: AVX512
func (Int16x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int16x32) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Int16x32) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Int16x32) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction will be inlined; otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Int16x32) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int16x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int16x32) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512
func (Int16x32) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX512
func (Int16x32) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x32) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x32) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x32) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x32) StoreMasked ¶
StoreMasked stores an Int16x32 to an array, at those elements enabled by mask
Asm: VMOVDQU16, CPU Feature: AVX512
func (Int16x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 int16s
func (Int16x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512
func (Int16x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX512
func (Int16x32) ToMask ¶
ToMask converts from Int16x32 to Mask16x32. A mask element is set to true when the corresponding vector element is non-zero.
func (Int16x32) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512
type Int16x8 ¶
type Int16x8 struct {
// contains filtered or unexported fields
}
Int16x8 is a 128-bit SIMD vector of 8 int16
func BroadcastInt16x8 ¶
BroadcastInt16x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature: AVX2
func LoadInt16x8Slice ¶
LoadInt16x8Slice loads an Int16x8 from a slice of at least 8 int16s
func LoadInt16x8SlicePart ¶
LoadInt16x8SlicePart loads an Int16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt16x8Slice.
func (Int16x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX
func (Int16x8) AddPairsSaturated ¶
AddPairsSaturated horizontally adds adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDSW, CPU Feature: AVX
func (Int16x8) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSW, CPU Feature: AVX
func (Int16x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int16x8) AsFloat32x4 ¶
AsFloat32x4 reinterprets the bits of an Int16x8 as a Float32x4.
func (Int16x8) AsFloat64x2 ¶
AsFloat64x2 reinterprets the bits of an Int16x8 as a Float64x2.
func (Int16x8) AsUint16x8 ¶
AsUint16x8 reinterprets the bits of an Int16x8 as a Uint16x8.
func (Int16x8) AsUint32x4 ¶
AsUint32x4 reinterprets the bits of an Int16x8 as a Uint32x4.
func (Int16x8) AsUint64x2 ¶
AsUint64x2 reinterprets the bits of an Int16x8 as a Uint64x2.
func (Int16x8) AsUint8x16 ¶
AsUint8x16 reinterprets the bits of an Int16x8 as a Uint8x16.
func (Int16x8) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Int16x8) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Int16x8) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512
func (Int16x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Int16x8) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index xy are used.
Asm: VPERMI2W, CPU Feature: AVX512
func (Int16x8) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGNW, CPU Feature: AVX
func (Int16x8) DotProductPairs ¶
DotProductPairs multiplies corresponding elements and adds adjacent pairs of products together, yielding a vector with half as many elements, each twice the input element size.
Asm: VPMADDWD, CPU Feature: AVX
func (Int16x8) Expand ¶
Expand distributes the elements of x, which are packed into its lower-indexed positions, to the positions selected by mask, proceeding in order from the lowest mask element to the highest.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Int16x8) ExtendLo2ToInt64x2 ¶
ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX
func (Int16x8) ExtendLo4ToInt32x4 ¶
ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX
func (Int16x8) ExtendLo4ToInt64x4 ¶
ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX2
func (Int16x8) ExtendToInt32 ¶
ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXWD, CPU Feature: AVX2
func (Int16x8) ExtendToInt64 ¶
ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXWQ, CPU Feature: AVX512
func (Int16x8) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512
func (Int16x8) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTW, CPU Feature: AVX
func (Int16x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Int16x8) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX
func (Int16x8) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX
func (Int16x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int16x8) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Int16x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Int16x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSW, CPU Feature: AVX
func (Int16x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSW, CPU Feature: AVX
func (Int16x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX
func (Int16x8) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHW, CPU Feature: AVX
func (Int16x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Int16x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Int16x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int16x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Int16x8) PermuteScalarsHi ¶
PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Int16x8) PermuteScalarsLo ¶
PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Int16x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSWB, CPU Feature: AVX512
func (Int16x8) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX
func (Int16x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX
func (Int16x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAW, CPU Feature: AVX
func (Int16x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Int16x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Int16x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVW, CPU Feature: AVX512
func (Int16x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Int16x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int16s
func (Int16x8) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Int16x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX
func (Int16x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX
func (Int16x8) SubPairsSaturated ¶
SubPairsSaturated horizontally subtracts adjacent pairs of elements with saturation. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBSW, CPU Feature: AVX
func (Int16x8) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSW, CPU Feature: AVX
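Saturation here means that a difference outside the int16 range is clamped to the nearest representable value instead of wrapping around. The following plain Go sketch models one lane; subSaturated16 is a hypothetical helper name and the code does not call archsimd.

```go
package main

import (
	"fmt"
	"math"
)

// subSaturated16 is a hypothetical scalar model of one lane of
// SubSaturated: the difference is computed in a wider type and then
// clamped to the int16 range.
func subSaturated16(a, b int16) int16 {
	d := int32(a) - int32(b)
	if d > math.MaxInt16 {
		return math.MaxInt16
	}
	if d < math.MinInt16 {
		return math.MinInt16
	}
	return int16(d)
}

func main() {
	fmt.Println(subSaturated16(-32768, 1), subSaturated16(32767, -1), subSaturated16(5, 3)) // -32768 32767 2
}
```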
func (Int16x8) ToMask ¶
ToMask converts from Int16x8 to Mask16x8. A mask element is set to true when the corresponding vector element is non-zero.
func (Int16x8) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVWB, CPU Feature: AVX512
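The difference between the saturating and truncating narrowing conversions documented for Int16x8 can be seen on single values. The sketch below is a scalar model in plain Go with hypothetical helper names; it does not use archsimd and only models the per-element conversion, not the packing into the low half of the result.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToInt8 is a hypothetical scalar model of the per-element
// conversion in SaturateToInt8: out-of-range values clamp to the int8 limits.
func saturateToInt8(v int16) int8 {
	if v > math.MaxInt8 {
		return math.MaxInt8
	}
	if v < math.MinInt8 {
		return math.MinInt8
	}
	return int8(v)
}

// truncateToInt8 is a hypothetical scalar model of the per-element
// conversion in TruncateToInt8: only the low 8 bits are kept.
func truncateToInt8(v int16) int8 {
	return int8(v)
}

func main() {
	fmt.Println(saturateToInt8(300), truncateToInt8(300))   // 127 44
	fmt.Println(saturateToInt8(-200), truncateToInt8(-200)) // -128 56
}
```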
type Int32x16 ¶
type Int32x16 struct {
// contains filtered or unexported fields
}
Int32x16 is a 512-bit SIMD vector of 16 int32
func BroadcastInt32x16 ¶
BroadcastInt32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
func LoadInt32x16 ¶
LoadInt32x16 loads an Int32x16 from an array.
func LoadInt32x16Slice ¶
LoadInt32x16Slice loads an Int32x16 from a slice of at least 16 int32s
func LoadInt32x16SlicePart ¶
LoadInt32x16SlicePart loads an Int32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt32x16Slice.
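The zero-fill behavior of a partial load can be modeled with an ordinary array and copy. This is a scalar sketch in plain Go, not the archsimd implementation; loadPart16 is a hypothetical helper name and a [16]int32 array stands in for the vector.

```go
package main

import "fmt"

// loadPart16 is a hypothetical scalar model of a partial load: it copies
// up to 16 elements from s and leaves the remaining lanes zero.
func loadPart16(s []int32) [16]int32 {
	var v [16]int32
	copy(v[:], s) // copy stops at the shorter of the two lengths
	return v
}

func main() {
	v := loadPart16([]int32{1, 2, 3})
	fmt.Println(v[:4]) // [1 2 3 0]
}
```

Tail handling of this kind is typically useful for the last, shorter chunk of a loop that otherwise processes 16 elements at a time.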
func LoadMaskedInt32x16 ¶
LoadMaskedInt32x16 loads an Int32x16 from an array, at those elements enabled by mask.
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func (Int32x16) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX512
func (Int32x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512
func (Int32x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int32x16) AsFloat32x16 ¶
func (from Int32x16) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Int32x16 to Float32x16.
func (Int32x16) AsFloat64x8 ¶
AsFloat64x8 converts from Int32x16 to Float64x8.
func (Int32x16) AsInt16x32 ¶
AsInt16x32 converts from Int32x16 to Int16x32.
func (Int32x16) AsUint16x32 ¶
AsUint16x32 converts from Int32x16 to Uint16x32.
func (Int32x16) AsUint32x16 ¶
AsUint32x16 converts from Int32x16 to Uint32x16.
func (Int32x16) AsUint64x8 ¶
AsUint64x8 converts from Int32x16 to Uint64x8.
func (Int32x16) AsUint8x64 ¶
AsUint8x64 converts from Int32x16 to Uint8x64.
func (Int32x16) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512
func (Int32x16) ConvertToFloat32 ¶
func (x Int32x16) ConvertToFloat32() Float32x16
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX512
func (Int32x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Int32x16) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX512
func (Int32x16) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512
func (Int32x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512
func (Int32x16) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x16) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX512
func (Int32x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX512
func (Int32x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512
func (Int32x16) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPD, CPU Feature: AVX512
func (Int32x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int32x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512
func (Int32x16) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX512
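The grouped permutation formula above applies the same four selectors inside each 128-bit group of four elements. A scalar sketch in plain Go follows; permuteScalarsGrouped is a hypothetical helper, a [16]int32 array stands in for the vector, and masking the selectors to two bits is an assumption of the sketch rather than documented behavior.

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model of the formula
// above: within each group of four elements, the output takes the
// elements selected by a, b, c, and d from the same group.
func permuteScalarsGrouped(x [16]int32, a, b, c, d uint8) [16]int32 {
	var out [16]int32
	for g := 0; g < 16; g += 4 {
		// Selectors are masked to 0-3 here; this is an assumption.
		out[g+0] = x[g+int(a&3)]
		out[g+1] = x[g+int(b&3)]
		out[g+2] = x[g+int(c&3)]
		out[g+3] = x[g+int(d&3)]
	}
	return out
}

func main() {
	var x [16]int32
	for i := range x {
		x[i] = int32(i)
	}
	// Reverse the elements inside every group of four.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
	// [3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12]
}
```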
func (Int32x16) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x16) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Int32x16) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x16) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
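For a single 32-bit element, the rotate operations above correspond to what math/bits.RotateLeft32 computes in scalar Go. The sketch below uses hypothetical helper names, does not call archsimd, and assumes that only the low 5 bits of the count matter.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl32 is a hypothetical scalar model of rotating one int32 lane left.
// The count is reduced to its low 5 bits (an assumption of this sketch).
func rotl32(x int32, k uint32) int32 {
	return int32(bits.RotateLeft32(uint32(x), int(k&31)))
}

// rotr32 rotates right by passing a negative count to RotateLeft32.
func rotr32(x int32, k uint32) int32 {
	return int32(bits.RotateLeft32(uint32(x), -int(k&31)))
}

func main() {
	fmt.Println(rotl32(1, 4), rotr32(16, 4)) // 16 1
}
```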
func (Int32x16) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x16) SaturateToInt16Concat ¶
SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX512
func (Int32x16) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x16) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x16) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Int32x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int32x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int32x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512
func (Int32x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
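The concatenating left shift described above is a funnel shift: the bits shifted out of the top of y fill the vacated low bits of x. A scalar sketch for one 32-bit element in plain Go follows; shiftLeftConcat32 is a hypothetical helper and archsimd is not used.

```go
package main

import (
	"fmt"
	"math"
)

// shiftLeftConcat32 is a hypothetical scalar model of one lane:
// x is shifted left by the low 5 bits of shift, and the top bits of y
// fill the emptied low bits of the result.
func shiftLeftConcat32(x, y int32, shift uint8) int32 {
	s := uint(shift & 31)
	// For s == 0, Go defines uint32(y)>>32 as 0, so the result is x unchanged.
	return int32(uint32(x)<<s | uint32(y)>>(32-s))
}

func main() {
	// The top bit of y (the sign bit) becomes the low bit of the result.
	fmt.Println(shiftLeftConcat32(1, math.MinInt32, 1)) // 3
}
```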
func (Int32x16) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX512
func (Int32x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512
func (Int32x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x16) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX512
func (Int32x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x16) StoreMasked ¶
StoreMasked stores an Int32x16 to an array, at those elements enabled by mask.
Asm: VMOVDQU32, CPU Feature: AVX512
func (Int32x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int32s
func (Int32x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Int32x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512
func (Int32x16) ToMask ¶
ToMask converts from Int32x16 to Mask32x16. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x16) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x16) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512
type Int32x4 ¶
type Int32x4 struct {
// contains filtered or unexported fields
}
Int32x4 is a 128-bit SIMD vector of 4 int32
func BroadcastInt32x4 ¶
BroadcastInt32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt32x4Slice ¶
LoadInt32x4Slice loads an Int32x4 from a slice of at least 4 int32s
func LoadInt32x4SlicePart ¶
LoadInt32x4SlicePart loads an Int32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt32x4Slice.
func LoadMaskedInt32x4 ¶
LoadMaskedInt32x4 loads an Int32x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX
func (Int32x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int32x4) AsFloat32x4 ¶
AsFloat32x4 converts from Int32x4 to Float32x4.
func (Int32x4) AsFloat64x2 ¶
AsFloat64x2 converts from Int32x4 to Float64x2.
func (Int32x4) AsUint16x8 ¶
AsUint16x8 converts from Int32x4 to Uint16x8.
func (Int32x4) AsUint32x4 ¶
AsUint32x4 converts from Int32x4 to Uint32x4.
func (Int32x4) AsUint64x2 ¶
AsUint64x2 converts from Int32x4 to Uint64x2.
func (Int32x4) AsUint8x16 ¶
AsUint8x16 converts from Int32x4 to Uint8x16.
func (Int32x4) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Int32x4) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Int32x4) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512
func (Int32x4) Compress ¶
Compress compresses vector x using mask: it selects the elements indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512
func (Int32x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX
func (Int32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX
func (Int32x4) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX
func (Int32x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Int32x4) ExtendLo2ToInt64x2 ¶
ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX
func (Int32x4) ExtendToInt64 ¶
ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX2
func (Int32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Int32x4) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTD, CPU Feature: AVX
func (Int32x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Int32x4) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX
func (Int32x4) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX
func (Int32x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int32x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x4) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Int32x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Int32x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX
func (Int32x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX
func (Int32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX
func (Int32x4) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX
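The formula Result[i] = v1.Even[i] * v2.Even[i] can be spelled out for four int32 elements as follows. This is a scalar sketch in plain Go with a hypothetical helper name. It does not call archsimd, and it assumes the result has two int64 elements, taken from input elements 0 and 2.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model: it multiplies the
// even-indexed int32 elements (0 and 2) and returns full 64-bit products.
func mulEvenWiden(x, y [4]int32) [2]int64 {
	return [2]int64{
		int64(x[0]) * int64(y[0]),
		int64(x[2]) * int64(y[2]),
	}
}

func main() {
	x := [4]int32{100000, 9, -3, 9}
	y := [4]int32{100000, 9, 7, 9}
	// The first product does not fit in 32 bits, which is why widening matters.
	fmt.Println(mulEvenWiden(x, y)) // [10000000000 -21]
}
```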
func (Int32x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Int32x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int32x4) PermuteScalars ¶
PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX
func (Int32x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Int32x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Int32x4) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x4) SaturateToInt16Concat ¶
SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX
func (Int32x4) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x4) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
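The selector rule can be checked against the example in the description with a scalar model in plain Go. The helper selectFromPair below is hypothetical and does not call archsimd; masking the selectors to three bits is an assumption of the sketch.

```go
package main

import "fmt"

// selectFromPair is a hypothetical scalar model: selectors 0-3 pick
// elements of x and selectors 4-7 pick elements 0-3 of y.
func selectFromPair(x [4]int32, a, b, c, d uint8, y [4]int32) [4]int32 {
	xy := [8]int32{x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3]}
	// Selectors are masked to 0-7 here; this is an assumption.
	return [4]int32{xy[a&7], xy[b&7], xy[c&7], xy[d&7]}
}

func main() {
	x := [4]int32{1, 2, 4, 8}
	y := [4]int32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, 2, 3, 5, 7, y)) // [4 8 25 81]
}
```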
func (Int32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Int32x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX
func (Int32x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX
func (Int32x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2
func (Int32x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x4) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2
func (Int32x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x4) StoreMasked ¶
StoreMasked stores an Int32x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 int32s
func (Int32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Int32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX
func (Int32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX
func (Int32x4) ToMask ¶
ToMask converts from Int32x4 to Mask32x4. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x4) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x4) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVDB, CPU Feature: AVX512
type Int32x8 ¶
type Int32x8 struct {
// contains filtered or unexported fields
}
Int32x8 is a 256-bit SIMD vector of 8 int32
func BroadcastInt32x8 ¶
BroadcastInt32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt32x8Slice ¶
LoadInt32x8Slice loads an Int32x8 from a slice of at least 8 int32s.
func LoadInt32x8SlicePart ¶
LoadInt32x8SlicePart loads an Int32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt32x8Slice.
func LoadMaskedInt32x8 ¶
LoadMaskedInt32x8 loads an Int32x8 from an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x8) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSD, CPU Feature: AVX2
func (Int32x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2
func (Int32x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int32x8) AsFloat32x8 ¶
AsFloat32x8 converts from Int32x8 to Float32x8.
func (Int32x8) AsFloat64x4 ¶
AsFloat64x4 converts from Int32x8 to Float64x4.
func (Int32x8) AsInt16x16 ¶
AsInt16x16 converts from Int32x8 to Int16x16.
func (Int32x8) AsUint16x16 ¶
AsUint16x16 converts from Int32x8 to Uint16x16.
func (Int32x8) AsUint32x8 ¶
AsUint32x8 converts from Int32x8 to Uint32x8.
func (Int32x8) AsUint64x4 ¶
AsUint64x4 converts from Int32x8 to Uint64x4.
func (Int32x8) AsUint8x32 ¶
AsUint8x32 converts from Int32x8 to Uint8x32.
func (Int32x8) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Int32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2D, CPU Feature: AVX512
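The indexing rule for ConcatPermute can be modelled in plain Go. This is a scalar sketch of the documented semantics only, not the package's implementation: the function name concatPermute and the use of [8]int32 for all three operands (including indices) are assumptions made for illustration.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of Int32x8.ConcatPermute.
// xy is the 16-element concatenation of x (lower half) and y (upper half),
// so only the low 4 bits of each index are needed to address it.
func concatPermute(x, y, indices [8]int32) [8]int32 {
	var xy [16]int32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])

	var r [8]int32
	for i, idx := range indices {
		r[i] = xy[idx&15] // higher index bits are ignored
	}
	return r
}

func main() {
	x := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int32{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]int32{0, 8, 1, 9, 15, 7, 16, 31}
	fmt.Println(concatPermute(x, y, idx))
}
```

Indices 16 and 31 in the usage above wrap to 0 and 15 because only the low 4 bits are used.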
func (Int32x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTDQ2PS, CPU Feature: AVX
func (Int32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTDQ2PD, CPU Feature: AVX512
func (Int32x8) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGND, CPU Feature: AVX2
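Read element by element, the CopySign description amounts to multiplying x by the sign of y. The following is a scalar sketch under that reading; copySign and the [8]int32 stand-in for Int32x8 are illustrative names, not part of the package.

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of Int32x8.CopySign:
// each x[i] is multiplied by -1, 0, or 1 according to the sign of y[i].
func copySign(x, y [8]int32) [8]int32 {
	var r [8]int32
	for i := range x {
		switch {
		case y[i] < 0:
			r[i] = -x[i]
		case y[i] == 0:
			r[i] = 0
		default:
			r[i] = x[i]
		}
	}
	return r
}

func main() {
	x := [8]int32{5, -5, 7, 7, 1, 2, 3, 4}
	y := [8]int32{-1, -3, 0, 9, 1, -1, 0, 100}
	fmt.Println(copySign(x, y))
}
```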
func (Int32x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDD, CPU Feature: AVX512
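Compress and Expand are easiest to see side by side. The sketch below models both on plain arrays. It assumes that lanes not written by either operation are zero, which the text above does not state, and the names compress and expand and the [8]bool mask are illustrative stand-ins.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Int32x8.Compress: elements
// whose mask lane is true are packed, in order, into the low lanes.
// Assumption: the remaining high lanes are zero.
func compress(x [8]int32, mask [8]bool) [8]int32 {
	var r [8]int32
	n := 0
	for i, m := range mask {
		if m {
			r[n] = x[i]
			n++
		}
	}
	return r
}

// expand is a hypothetical scalar model of Int32x8.Expand: the low lanes
// of x are distributed, in order, to the lanes whose mask is true.
// Assumption: lanes whose mask is false are zero.
func expand(x [8]int32, mask [8]bool) [8]int32 {
	var r [8]int32
	n := 0
	for i, m := range mask {
		if m {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]int32{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	packed := compress(x, mask)
	fmt.Println(packed)
	fmt.Println(expand(packed, mask))
}
```

Under these assumptions, expanding a compressed vector with the same mask restores the selected elements to their original lanes.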
func (Int32x8) ExtendToInt64 ¶
ExtendToInt64 converts element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXDQ, CPU Feature: AVX512
func (Int32x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTD, CPU Feature: AVX2
func (Int32x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Int32x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2
func (Int32x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2
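The "grouped" interleaves operate independently on each 128-bit group of four int32 lanes. The scalar sketch below illustrates that grouping. It assumes that within each pair the element from x comes first, which the text above does not spell out, and the function names are illustrative only.

```go
package main

import "fmt"

// interleaveLoGrouped is a hypothetical scalar model of
// Int32x8.InterleaveLoGrouped. For each group of 4 lanes, the low two
// elements of x and y are interleaved (assumed order: x first, then y).
func interleaveLoGrouped(x, y [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 8; g += 4 {
		r[g+0], r[g+1] = x[g+0], y[g+0]
		r[g+2], r[g+3] = x[g+1], y[g+1]
	}
	return r
}

// interleaveHiGrouped does the same with the high two elements of each group.
func interleaveHiGrouped(x, y [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 8; g += 4 {
		r[g+0], r[g+1] = x[g+2], y[g+2]
		r[g+2], r[g+3] = x[g+3], y[g+3]
	}
	return r
}

func main() {
	x := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int32{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveLoGrouped(x, y))
	fmt.Println(interleaveHiGrouped(x, y))
}
```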
func (Int32x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int32x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Int32x8) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Int32x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Int32x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSD, CPU Feature: AVX2
func (Int32x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSD, CPU Feature: AVX2
func (Int32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2
func (Int32x8) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULDQ, CPU Feature: AVX2
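The formula Result[i] = v1.Even[i] * v2.Even[i] can be written out as a scalar loop. This is a sketch of the documented semantics only; mulEvenWiden and the array types standing in for Int32x8 and the 4-element int64 result are assumptions for illustration.

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model of Int32x8.MulEvenWiden.
// Only lanes 0, 2, 4, 6 take part; each product is computed in 64 bits,
// so it cannot overflow.
func mulEvenWiden(x, y [8]int32) [4]int64 {
	var r [4]int64
	for i := range r {
		r[i] = int64(x[2*i]) * int64(y[2*i])
	}
	return r
}

func main() {
	x := [8]int32{1 << 30, 9, -3, 9, 7, 9, 0, 9}
	y := [8]int32{4, 9, 5, 9, -7, 9, 100, 9}
	fmt.Println(mulEvenWiden(x, y))
}
```

The first product, 2^30 * 4 = 2^32, does not fit in an int32, which is the reason for the widened result.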
func (Int32x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Int32x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Int32x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int32x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2
func (Int32x8) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX2
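The result formula for PermuteScalarsGrouped translates directly into a scalar function. The sketch below assumes a, b, c, d are already in the range 0-3, as the text requires; the name permuteScalarsGrouped and the uint8 parameter type are illustrative choices.

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model of
// Int32x8.PermuteScalarsGrouped. The same four selectors are applied to
// the low group (lanes 0-3) and the high group (lanes 4-7).
// The selectors are assumed to be in the range 0-3.
func permuteScalarsGrouped(x [8]int32, a, b, c, d uint8) [8]int32 {
	return [8]int32{
		x[a], x[b], x[c], x[d],
		x[a+4], x[b+4], x[c+4], x[d+4],
	}
}

func main() {
	x := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	// Reverse the elements within each 128-bit group.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
}
```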
func (Int32x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Int32x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Int32x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Int32x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Int32x8) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSDW, CPU Feature: AVX512
func (Int32x8) SaturateToInt16Concat ¶
SaturateToInt16Concat converts element values to int16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.
Asm: VPACKSSDW, CPU Feature: AVX2
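The per-group packing order of SaturateToInt16Concat is easy to misread, so a scalar sketch may help. It follows the description above (within each 128-bit group, the converted x elements come first and the converted y elements second); the names sat16 and saturateToInt16Concat are illustrative.

```go
package main

import "fmt"

// sat16 clamps v to the int16 range.
func sat16(v int32) int16 {
	switch {
	case v > 32767:
		return 32767
	case v < -32768:
		return -32768
	}
	return int16(v)
}

// saturateToInt16Concat is a hypothetical scalar model of
// Int32x8.SaturateToInt16Concat. Each 128-bit group of the result holds
// 8 int16 lanes: 4 converted from x's group, then 4 from y's group.
func saturateToInt16Concat(x, y [8]int32) [16]int16 {
	var r [16]int16
	for g := 0; g < 2; g++ {
		for i := 0; i < 4; i++ {
			r[8*g+i] = sat16(x[4*g+i])
			r[8*g+4+i] = sat16(y[4*g+i])
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 40000}
	y := [8]int32{-1, -2, -3, -40000, -5, -6, -7, -8}
	fmt.Println(saturateToInt16Concat(x, y))
}
```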
func (Int32x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSDB, CPU Feature: AVX512
func (Int32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
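The Select128FromPair example above can be checked with a scalar model. This is a sketch of the described behaviour, not the package's code; select128FromPair is an illustrative name and [8]int32 stands in for Int32x8.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of
// Int32x8.Select128FromPair. x and y together form four 128-bit elements
// (4 int32 lanes each): 0 and 1 come from x, 2 and 3 come from y.
// lo picks the low half of the result, hi picks the high half.
// lo and hi are assumed to be in the range 0-3.
func select128FromPair(x [8]int32, lo, hi uint8, y [8]int32) [8]int32 {
	var xy [16]int32
	copy(xy[:8], x[:])
	copy(xy[8:], y[:])

	var r [8]int32
	copy(r[:4], xy[4*int(lo):4*int(lo)+4])
	copy(r[4:], xy[4*int(hi):4*int(hi)+4])
	return r
}

func main() {
	x := [8]int32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]int32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y))
}
```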
func (Int32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
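The selector rule for SelectFromPairGrouped can be modelled in scalar Go and checked against the example above. The function name and the uint8 selector type are illustrative assumptions, and selectors are assumed to be in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model of
// Int32x8.SelectFromPairGrouped. Within each 128-bit group, selectors
// 0-3 pick from x's group and selectors 4-7 pick elements 0-3 of y's group.
func selectFromPairGrouped(x [8]int32, a, b, c, d uint8, y [8]int32) [8]int32 {
	sel := [4]uint8{a, b, c, d}
	var r [8]int32
	for g := 0; g < 8; g += 4 {
		for i, s := range sel {
			if s < 4 {
				r[g+i] = x[g+int(s)]
			} else {
				r[g+i] = y[g+int(s-4)]
			}
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]int32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y))
}
```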
func (Int32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int32x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2
func (Int32x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
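ShiftAllLeftConcat is a funnel shift: the bits shifted out of the top of x are discarded and the vacated low bits are filled from the top of y. The scalar sketch below follows that description for 32-bit lanes; the function name and the uint8 shift type are illustrative assumptions.

```go
package main

import "fmt"

// shiftAllLeftConcat is a hypothetical scalar model of
// Int32x8.ShiftAllLeftConcat. Only the low 5 bits of shift are used.
// The work is done on uint32 so that the right shift of y is logical.
func shiftAllLeftConcat(x [8]int32, shift uint8, y [8]int32) [8]int32 {
	s := uint(shift & 31)
	var r [8]int32
	for i := range x {
		hi := uint32(x[i]) << s
		lo := uint32(y[i]) >> (32 - s) // yields 0 when s == 0
		r[i] = int32(hi | lo)
	}
	return r
}

func main() {
	x := [8]int32{0xFF}
	y := [8]int32{-1 << 28} // top 4 bits set
	fmt.Printf("%#x\n", shiftAllLeftConcat(x, 4, y)[0])
}
```

In Go, shifting a uint32 right by 32 yields 0, so a shift count of 0 returns x unchanged in this model.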
func (Int32x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAD, CPU Feature: AVX2
func (Int32x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Int32x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2
func (Int32x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Int32x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVD, CPU Feature: AVX2
func (Int32x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Int32x8) StoreMasked ¶
StoreMasked stores an Int32x8 to an array, at those elements enabled by mask.
Asm: VMASKMOVD, CPU Feature: AVX2
func (Int32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int32s.
func (Int32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
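The "Part" variants exist so that a loop over a slice can handle a tail shorter than one vector without a separate scalar path. The sketch below models LoadInt32x8SlicePart and StoreSlicePart with plain arrays to show that loop shape. The helper names, the lanes constant, and the addOne function are illustrative; real code would use the package's vector types and would need GOEXPERIMENT=simd.

```go
package main

import "fmt"

const lanes = 8 // number of int32 elements in the modelled vector

// loadSlicePart models LoadInt32x8SlicePart: missing elements are zero.
func loadSlicePart(s []int32) [lanes]int32 {
	var v [lanes]int32
	copy(v[:], s) // copies min(len(s), lanes) elements
	return v
}

// storeSlicePart models StoreSlicePart: stores as many elements as fit.
func storeSlicePart(v [lanes]int32, s []int32) {
	copy(s, v[:])
}

// addOne increments every element, one modelled vector at a time.
// The final iteration may see fewer than 8 elements.
func addOne(data []int32) {
	for i := 0; i < len(data); i += lanes {
		v := loadSlicePart(data[i:])
		for j := range v {
			v[j]++ // stands in for a vector Add
		}
		storeSlicePart(v, data[i:])
	}
}

func main() {
	data := []int32{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	addOne(data)
	fmt.Println(data)
}
```

A common refinement is to use the full-width load and store for all complete chunks and the Part variants only for the last one.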
func (Int32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2
func (Int32x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2
func (Int32x8) ToMask ¶
ToMask converts from Int32x8 to Mask32x8. A mask element is set to true when the corresponding vector element is non-zero.
func (Int32x8) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
func (Int32x8) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
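The difference between the truncating and saturating narrowing conversions shows up only for values outside the int8 range. The scalar sketch below contrasts TruncateToInt8 and SaturateToInt8 on an 8-element input, with results in the low 8 lanes of a 16-lane output and the rest zero, as described above. The function names and array types are illustrative stand-ins.

```go
package main

import "fmt"

// truncateToInt8 is a hypothetical scalar model of Int32x8.TruncateToInt8:
// each element keeps only its low 8 bits.
func truncateToInt8(x [8]int32) [16]int8 {
	var r [16]int8 // lanes 8-15 stay zero
	for i, v := range x {
		r[i] = int8(v)
	}
	return r
}

// saturateToInt8 is a hypothetical scalar model of Int32x8.SaturateToInt8:
// each element is clamped to [-128, 127].
func saturateToInt8(x [8]int32) [16]int8 {
	var r [16]int8 // lanes 8-15 stay zero
	for i, v := range x {
		switch {
		case v > 127:
			r[i] = 127
		case v < -128:
			r[i] = -128
		default:
			r[i] = int8(v)
		}
	}
	return r
}

func main() {
	x := [8]int32{1, 300, -200, -1, 127, 128, -128, -129}
	t := truncateToInt8(x)
	s := saturateToInt8(x)
	fmt.Println(t[:8])
	fmt.Println(s[:8])
}
```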
type Int64x2 ¶
type Int64x2 struct {
// contains filtered or unexported fields
}
Int64x2 is a 128-bit SIMD vector of 2 int64
func BroadcastInt64x2 ¶
BroadcastInt64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt64x2Slice ¶
LoadInt64x2Slice loads an Int64x2 from a slice of at least 2 int64s.
func LoadInt64x2SlicePart ¶
LoadInt64x2SlicePart loads an Int64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadInt64x2Slice.
func LoadMaskedInt64x2 ¶
LoadMaskedInt64x2 loads an Int64x2 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x2) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x2) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int64x2) AsFloat32x4 ¶
AsFloat32x4 converts from Int64x2 to Float32x4.
func (Int64x2) AsFloat64x2 ¶
AsFloat64x2 converts from Int64x2 to Float64x2.
func (Int64x2) AsUint16x8 ¶
AsUint16x8 converts from Int64x2 to Uint16x8.
func (Int64x2) AsUint32x4 ¶
AsUint32x4 converts from Int64x2 to Uint32x4.
func (Int64x2) AsUint64x2 ¶
AsUint64x2 converts from Int64x2 to Uint64x2.
func (Int64x2) AsUint8x16 ¶
AsUint8x16 converts from Int64x2 to Uint8x16.
func (Int64x2) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Int64x2) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Int64x2) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512
func (Int64x2) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Int64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Int64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSX, CPU Feature: AVX512
func (Int64x2) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x2) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Int64x2) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Int64x2) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTQ, CPU Feature: AVX
func (Int64x2) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Int64x2) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX
func (Int64x2) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX
func (Int64x2) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int64x2) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x2) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Int64x2) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Int64x2) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x2) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x2) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Int64x2) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x2) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int64x2) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Int64x2) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x2) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Int64x2) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x2) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x2) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x2) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x2) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Int64x2) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Int64x2) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX
func (Int64x2) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512
func (Int64x2) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Int64x2) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x2) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x2) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x2) StoreMasked ¶
StoreMasked stores an Int64x2 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 int64s.
func (Int64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX
func (Int64x2) ToMask ¶
ToMask converts from Int64x2 to Mask64x2. A mask element is set to true when the corresponding vector element is non-zero.
func (Int64x2) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x2) TruncateToInt32 ¶
TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Int64x2) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Int64x4 ¶
type Int64x4 struct {
// contains filtered or unexported fields
}
Int64x4 is a 256-bit SIMD vector of 4 int64
func BroadcastInt64x4 ¶
BroadcastInt64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt64x4Slice ¶
LoadInt64x4Slice loads an Int64x4 from a slice of at least 4 int64s.
func LoadInt64x4SlicePart ¶
LoadInt64x4SlicePart loads an Int64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadInt64x4Slice.
func LoadMaskedInt64x4 ¶
LoadMaskedInt64x4 loads an Int64x4 from an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x4) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int64x4) AsFloat32x8 ¶
AsFloat32x8 converts from Int64x4 to Float32x8.
func (Int64x4) AsFloat64x4 ¶
AsFloat64x4 converts from Int64x4 to Float64x4.
func (Int64x4) AsInt16x16 ¶
AsInt16x16 converts from Int64x4 to Int16x16.
func (Int64x4) AsUint16x16 ¶
AsUint16x16 converts from Int64x4 to Uint16x16.
func (Int64x4) AsUint32x8 ¶
AsUint32x8 converts from Int64x4 to Uint32x8.
func (Int64x4) AsUint64x4 ¶
AsUint64x4 converts from Int64x4 to Uint64x4.
func (Int64x4) AsUint8x32 ¶
AsUint8x32 converts from Int64x4 to Uint8x32.
func (Int64x4) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Int64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Int64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PSY, CPU Feature: AVX512
func (Int64x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x4) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Int64x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y.
Asm: VPCMPGTQ, CPU Feature: AVX2
func (Int64x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Int64x4) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2
func (Int64x4) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2
func (Int64x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int64x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x4) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Int64x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Int64x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Int64x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int64x4) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
func (Int64x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Int64x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Int64x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x4) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x4) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x4) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x4) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Int64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Int64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int64x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2
func (Int64x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Int64x4) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512
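The sign-filling behaviour can be sketched with Go's own shift operator, which is also arithmetic for signed integers. This is a scalar model with a hypothetical helper name, not the archsimd implementation.

```go
package main

import "fmt"

// shiftAllRight is a hypothetical scalar model of ShiftAllRight.
// Go's >> on a signed integer fills vacated upper bits with the sign bit.
func shiftAllRight(x [4]int64, shift uint64) [4]int64 {
	var r [4]int64
	for i, v := range x {
		r[i] = v >> shift
	}
	return r
}

func main() {
	fmt.Println(shiftAllRight([4]int64{-8, 8, -1, 1}, 2)) // [-2 2 -1 0]
}
```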
func (Int64x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Int64x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x4) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x4) StoreMasked ¶
StoreMasked stores an Int64x4 to an array, at those elements enabled by mask.
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Int64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 int64s
func (Int64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2
func (Int64x4) ToMask ¶
ToMask converts from Int64x4 to Mask64x4; a mask element is set to true when the corresponding vector element is non-zero.
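As a small illustration, the rule can be modelled with a boolean array standing in for the opaque mask type. The helper name toMask and the [4]bool representation are assumptions for this sketch; the real Mask64x4 representation is not exposed.

```go
package main

import "fmt"

// toMask is a hypothetical scalar model of ToMask: an element of the
// mask is true exactly when the corresponding vector element is non-zero.
func toMask(x [4]int64) [4]bool {
	var m [4]bool
	for i, v := range x {
		m[i] = v != 0
	}
	return m
}

func main() {
	fmt.Println(toMask([4]int64{0, 5, -1, 0})) // [false true true false]
}
```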
func (Int64x4) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x4) TruncateToInt32 ¶
TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Int64x4) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
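The difference between the truncating and saturating narrowing conversions shows up only for values outside the int8 range. The sketch below models both on plain arrays; the helper names are hypothetical, and representing the result as a [16]int8 whose upper twelve elements are zero is an assumption based on the packing described above.

```go
package main

import "fmt"

// truncateToInt8 is a hypothetical scalar model: it keeps the low 8 bits
// of each element and leaves elements 4-15 of the result at zero.
func truncateToInt8(x [4]int64) [16]int8 {
	var r [16]int8
	for i, v := range x {
		r[i] = int8(v)
	}
	return r
}

// saturateToInt8 is a hypothetical scalar model: it clamps each element
// to the range -128..127 before narrowing.
func saturateToInt8(x [4]int64) [16]int8 {
	var r [16]int8
	for i, v := range x {
		if v > 127 {
			v = 127
		} else if v < -128 {
			v = -128
		}
		r[i] = int8(v)
	}
	return r
}

func main() {
	x := [4]int64{1, 300, -129, -1}
	t := truncateToInt8(x)
	s := saturateToInt8(x)
	fmt.Println(t[:5]) // [1 44 127 -1 0]
	fmt.Println(s[:5]) // [1 127 -128 -1 0]
}
```

In-range values (1 and -1) are identical under both conversions; 300 and -129 wrap under truncation and clamp under saturation.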
type Int64x8 ¶
type Int64x8 struct {
// contains filtered or unexported fields
}
Int64x8 is a 512-bit SIMD vector of 8 int64
func BroadcastInt64x8 ¶
BroadcastInt64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
func LoadInt64x8Slice ¶
LoadInt64x8Slice loads an Int64x8 from a slice of at least 8 int64s
func LoadInt64x8SlicePart ¶
LoadInt64x8SlicePart loads an Int64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadInt64x8Slice.
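The zero-filling rule is the same as copying a short slice into a zeroed array. The following scalar sketch shows that equivalence; the helper name loadSlicePart and the [8]int64 stand-in for Int64x8 are assumptions for the example.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of LoadInt64x8SlicePart.
// copy stops at the shorter of the two lengths, so missing elements
// keep their zero value and extra slice elements are ignored.
func loadSlicePart(s []int64) [8]int64 {
	var v [8]int64
	copy(v[:], s)
	return v
}

func main() {
	fmt.Println(loadSlicePart([]int64{1, 2, 3})) // [1 2 3 0 0 0 0 0]
}
```

This is the usual way to handle the tail of a slice whose length is not a multiple of the vector width.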
func LoadMaskedInt64x8 ¶
LoadMaskedInt64x8 loads an Int64x8 from an array, at those elements enabled by mask.
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func (Int64x8) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSQ, CPU Feature: AVX512
func (Int64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512
func (Int64x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512
func (Int64x8) AsFloat32x16 ¶
func (from Int64x8) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Int64x8 to Float32x16.
func (Int64x8) AsFloat64x8 ¶
AsFloat64x8 converts from Int64x8 to Float64x8.
func (Int64x8) AsInt16x32 ¶
AsInt16x32 converts from Int64x8 to Int16x32.
func (Int64x8) AsInt32x16 ¶
AsInt32x16 converts from Int64x8 to Int32x16.
func (Int64x8) AsUint16x32 ¶
AsUint16x32 converts from Int64x8 to Uint16x32.
func (Int64x8) AsUint32x16 ¶
AsUint32x16 converts from Int64x8 to Uint32x16.
func (Int64x8) AsUint64x8 ¶
AsUint64x8 converts from Int64x8 to Uint64x8.
func (Int64x8) AsUint8x64 ¶
AsUint8x64 converts from Int64x8 to Uint8x64.
func (Int64x8) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packs them to lower indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
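The packing step can be sketched in scalar code. This is a model of the description above, not the archsimd implementation: the helper name compress and the [8]bool mask are hypothetical, and leaving the unused upper elements at zero is an assumption, since the description does not state their value.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress. Elements whose
// mask entry is true are packed, in order, into the lowest positions.
// Assumption: the remaining upper elements are left at zero.
func compress(x [8]int64, mask [8]bool) [8]int64 {
	var r [8]int64
	n := 0
	for i, keep := range mask {
		if keep {
			r[n] = x[i]
			n++
		}
	}
	return r
}

func main() {
	x := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(compress(x, mask)) // [11 13 14 17 0 0 0 0]
}
```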
func (Int64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2Q, CPU Feature: AVX512
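For 8-element vectors, xy has 16 elements, so four index bits are needed. The scalar sketch below models that rule; the helper name concatPermute, the parameter order, and the use of [8]uint64 for indices are assumptions made for the example, not the archsimd signature.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute for
// 8-element vectors. Only the low 4 bits of each index are used,
// because xy has 16 elements.
func concatPermute(x, y [8]int64, indices [8]uint64) [8]int64 {
	var xy [16]int64
	copy(xy[:8], x[:]) // x is the lower half
	copy(xy[8:], y[:]) // y is the upper half
	var r [8]int64
	for i, idx := range indices {
		r[i] = xy[idx&15]
	}
	return r
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{8, 0, 15, 7, 16, 1, 9, 2}
	fmt.Println(concatPermute(x, y, idx)) // [100 0 107 7 0 1 101 2]
}
```

Index 16 wraps to 0 because only its low four bits are considered.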
func (Int64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTQQ2PS, CPU Feature: AVX512
func (Int64x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTQQ2PD, CPU Feature: AVX512
func (Int64x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. The expansion distributes those elements to the positions enabled by mask, from lower mask elements to upper in order.
Asm: VPEXPANDQ, CPU Feature: AVX512
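Expand is the inverse of the packing done by Compress, which a scalar sketch makes concrete. The helper name expand and the [8]bool mask are hypothetical, and setting positions whose mask entry is false to zero is an assumption, since the description does not state their value.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand. The lowest elements
// of x are handed out, in order, to the positions whose mask entry is
// true. Assumption: positions with a false mask entry are zero.
func expand(x [8]int64, mask [8]bool) [8]int64 {
	var r [8]int64
	n := 0
	for i, set := range mask {
		if set {
			r[i] = x[n]
			n++
		}
	}
	return r
}

func main() {
	x := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, true, false, false, true}
	fmt.Println(expand(x, mask)) // [0 10 0 11 12 0 0 13]
}
```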
func (Int64x8) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTQ, CPU Feature: AVX512
func (Int64x8) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512
func (Int64x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512
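With 64-bit elements, each 128-bit subvector holds two elements, so the low half of a group is its first element and the high half is its second. The following scalar sketch models both grouped interleaves under that reading; the helper names are hypothetical and the code is not the archsimd implementation.

```go
package main

import "fmt"

// interleaveLoGrouped is a hypothetical scalar model: for each group of
// two elements, it pairs the low element of x with the low element of y.
func interleaveLoGrouped(x, y [8]int64) [8]int64 {
	var r [8]int64
	for g := 0; g < 8; g += 2 {
		r[g], r[g+1] = x[g], y[g]
	}
	return r
}

// interleaveHiGrouped is a hypothetical scalar model: for each group of
// two elements, it pairs the high element of x with the high element of y.
func interleaveHiGrouped(x, y [8]int64) [8]int64 {
	var r [8]int64
	for g := 0; g < 8; g += 2 {
		r[g], r[g+1] = x[g+1], y[g+1]
	}
	return r
}

func main() {
	x := [8]int64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(interleaveLoGrouped(x, y)) // [0 10 2 12 4 14 6 16]
	fmt.Println(interleaveHiGrouped(x, y)) // [1 11 3 13 5 15 7 17]
}
```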
func (Int64x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Int64x8) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSQ, CPU Feature: AVX512
func (Int64x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSQ, CPU Feature: AVX512
func (Int64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Int64x8) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPQ, CPU Feature: AVX512
func (Int64x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Int64x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512
func (Int64x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
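As a minimal sketch of the indexing rule, the following scalar model applies the low-3-bit masking explicitly. The helper name permute and the [8]uint64 index type are assumptions for the example.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Permute for 8-element
// vectors. Only the low 3 bits of each index are used.
func permute(x [8]int64, indices [8]uint64) [8]int64 {
	var r [8]int64
	for i, idx := range indices {
		r[i] = x[idx&7]
	}
	return r
}

func main() {
	x := [8]int64{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]uint64{7, 6, 5, 4, 3, 2, 1, 8}
	fmt.Println(permute(x, idx)) // [17 16 15 14 13 12 11 10]
}
```

The last index, 8, selects element 0 because its low three bits are zero.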
func (Int64x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Int64x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Int64x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
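A per-element rotate can be modelled with math/bits. In the sketch below the helper name rotateLeft is hypothetical, and reducing each count modulo 64 is an assumption about how out-of-range counts are treated.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeft is a hypothetical scalar model of RotateLeft: each element
// of x is rotated by the count in the corresponding element of y.
// Assumption: counts are taken modulo 64.
func rotateLeft(x, y [8]int64) [8]int64 {
	var r [8]int64
	for i := range x {
		r[i] = int64(bits.RotateLeft64(uint64(x[i]), int(y[i]&63)))
	}
	return r
}

func main() {
	x := [8]int64{1, 1, 1, 1, 1, 1, 1, 1}
	y := [8]int64{0, 1, 2, 3, 4, 5, 6, 64}
	fmt.Println(rotateLeft(x, y)) // [1 2 4 8 16 32 64 1]
}
```

Unlike a shift, a rotate loses no bits: bits that leave the top of an element re-enter at the bottom.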
func (Int64x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Int64x8) SaturateToInt16 ¶
SaturateToInt16 converts element values to int16. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQW, CPU Feature: AVX512
func (Int64x8) SaturateToInt32 ¶
SaturateToInt32 converts element values to int32. Conversion is done with saturation on the vector elements.
Asm: VPMOVSQD, CPU Feature: AVX512
func (Int64x8) SaturateToInt8 ¶
SaturateToInt8 converts element values to int8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x8) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVSQB, CPU Feature: AVX512
func (Int64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
func (Int64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int64x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512
func (Int64x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are filled with the sign bit.
Asm: VPSRAQ, CPU Feature: AVX512
func (Int64x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512
func (Int64x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Int64x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are filled with the sign bit.
Asm: VPSRAVQ, CPU Feature: AVX512
func (Int64x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Int64x8) StoreMasked ¶
StoreMasked stores an Int64x8 to an array, at those elements enabled by mask.
Asm: VMOVDQU64, CPU Feature: AVX512
func (Int64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 int64s
func (Int64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
func (Int64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512
func (Int64x8) ToMask ¶
ToMask converts from Int64x8 to Mask64x8; a mask element is set to true when the corresponding vector element is non-zero.
func (Int64x8) TruncateToInt16 ¶
TruncateToInt16 converts element values to int16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Int64x8) TruncateToInt32 ¶
TruncateToInt32 converts element values to int32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Int64x8) TruncateToInt8 ¶
TruncateToInt8 converts element values to int8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Int8x16 ¶
type Int8x16 struct {
// contains filtered or unexported fields
}
Int8x16 is a 128-bit SIMD vector of 16 int8
func BroadcastInt8x16 ¶
BroadcastInt8x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt8x16Slice ¶
LoadInt8x16Slice loads an Int8x16 from a slice of at least 16 int8s
func LoadInt8x16SlicePart ¶
LoadInt8x16SlicePart loads an Int8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.
func (Int8x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX
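Saturation means that a sum outside the int8 range is clamped to the nearest representable value instead of wrapping. A scalar sketch of that rule follows; the helper name addSaturated and the [16]int8 stand-in for Int8x16 are assumptions for the example.

```go
package main

import "fmt"

// addSaturated is a hypothetical scalar model of AddSaturated. Each sum
// is computed in a wider type and clamped to the range -128..127.
func addSaturated(x, y [16]int8) [16]int8 {
	var r [16]int8
	for i := range x {
		sum := int16(x[i]) + int16(y[i])
		if sum > 127 {
			sum = 127
		} else if sum < -128 {
			sum = -128
		}
		r[i] = int8(sum)
	}
	return r
}

func main() {
	x := [16]int8{100, -100, 1}
	y := [16]int8{100, -100, 1}
	r := addSaturated(x, y)
	fmt.Println(r[:4]) // [127 -128 2 0]
}
```

A wrapping add would instead give -56 and 56 for the first two elements.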
func (Int8x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Int8x16) AsFloat32x4 ¶
AsFloat32x4 converts from Int8x16 to Float32x4.
func (Int8x16) AsFloat64x2 ¶
AsFloat64x2 converts from Int8x16 to Float64x2.
func (Int8x16) AsUint16x8 ¶
AsUint16x8 converts from Int8x16 to Uint16x8.
func (Int8x16) AsUint32x4 ¶
AsUint32x4 converts from Int8x16 to Uint32x4.
func (Int8x16) AsUint64x2 ¶
AsUint64x2 converts from Int8x16 to Uint64x2.
func (Int8x16) AsUint8x16 ¶
AsUint8x16 converts from Int8x16 to Uint8x16.
func (Int8x16) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Int8x16) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Int8x16) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512
func (Int8x16) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packs them to lower indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x16) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX
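In other words, each element of the first operand is negated, zeroed, or kept depending on whether the matching element of the second operand is negative, zero, or positive. The scalar sketch below models that reading; the helper name copySign is hypothetical, and the wrap-around for -128 follows Go's int8 negation, which is assumed here to match the instruction.

```go
package main

import "fmt"

// copySign is a hypothetical scalar model of CopySign: x is multiplied
// by -1, 0, or 1 according to the sign of y.
func copySign(x, y [16]int8) [16]int8 {
	var r [16]int8
	for i := range x {
		switch {
		case y[i] < 0:
			r[i] = -x[i] // negating -128 wraps back to -128 in int8
		case y[i] > 0:
			r[i] = x[i]
		default:
			r[i] = 0
		}
	}
	return r
}

func main() {
	x := [16]int8{5, 5, 5, -128}
	y := [16]int8{-3, 0, 9, -1}
	r := copySign(x, y)
	fmt.Println(r[:4]) // [-5 0 5 -128]
}
```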
func (Int8x16) DotProductQuadruple ¶
DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (Int8x16) DotProductQuadrupleSaturated ¶
DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (Int8x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. The expansion distributes those elements to the positions enabled by mask, from lower mask elements to upper in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x16) ExtendLo2ToInt64x2 ¶
ExtendLo2ToInt64x2 converts 2 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX
func (Int8x16) ExtendLo4ToInt32x4 ¶
ExtendLo4ToInt32x4 converts 4 lowest vector element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX
func (Int8x16) ExtendLo4ToInt64x4 ¶
ExtendLo4ToInt64x4 converts 4 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX2
func (Int8x16) ExtendLo8ToInt16x8 ¶
ExtendLo8ToInt16x8 converts 8 lowest vector element values to int16. The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX
func (Int8x16) ExtendLo8ToInt32x8 ¶
ExtendLo8ToInt32x8 converts 8 lowest vector element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX2
func (Int8x16) ExtendLo8ToInt64x8 ¶
ExtendLo8ToInt64x8 converts 8 lowest vector element values to int64. The result vector's elements are sign-extended.
Asm: VPMOVSXBQ, CPU Feature: AVX512
func (Int8x16) ExtendToInt16 ¶
ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX2
func (Int8x16) ExtendToInt32 ¶
ExtendToInt32 converts element values to int32. The result vector's elements are sign-extended.
Asm: VPMOVSXBD, CPU Feature: AVX512
func (Int8x16) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512
func (Int8x16) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX
func (Int8x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Int8x16) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
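The combined test is useful for asking whether two vectors share any set bits without materializing the intermediate result. A scalar sketch of the two checks follows; the helper names isZero and andIsZero are hypothetical and model only the documented results, not the generated instructions.

```go
package main

import "fmt"

// isZero is a hypothetical scalar model of IsZero.
func isZero(x [16]int8) bool {
	for _, v := range x {
		if v != 0 {
			return false
		}
	}
	return true
}

// andIsZero is a hypothetical scalar model of x.And(y).IsZero(): it
// reports whether x and y have no set bits in common.
func andIsZero(x, y [16]int8) bool {
	for i := range x {
		if x[i]&y[i] != 0 {
			return false
		}
	}
	return true
}

func main() {
	a := [16]int8{0x0F}
	b := [16]int8{0x70}
	fmt.Println(isZero([16]int8{}), andIsZero(a, b)) // true true
}
```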
func (Int8x16) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Int8x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Int8x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX
func (Int8x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX
func (Int8x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Int8x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Int8x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x16) PermuteOrZero ¶
PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX
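The sign-bit rule makes this a combined shuffle and zeroing operation. The scalar sketch below models it; the helper name permuteOrZero and the use of [16]int8 for indices are assumptions for the example.

```go
package main

import "fmt"

// permuteOrZero is a hypothetical scalar model of PermuteOrZero. A
// negative index (sign bit set) produces zero; otherwise the low four
// bits of the index select an element of x.
func permuteOrZero(x, indices [16]int8) [16]int8 {
	var r [16]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // r[i] stays zero
		}
		r[i] = x[idx&15]
	}
	return r
}

func main() {
	x := [16]int8{10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25}
	idx := [16]int8{15, 0, -1, 16, 1}
	r := permuteOrZero(x, idx)
	fmt.Println(r[:5]) // [25 10 0 10 11]
}
```

Index 16 selects element 0 because only its low four bits are used, while index -1 yields zero.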
func (Int8x16) SetElem ¶
SetElem sets a single constant-indexed element's value.
index gives better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX
func (Int8x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 int8s
func (Int8x16) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Int8x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX
func (Int8x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX
type Int8x32 ¶
type Int8x32 struct {
// contains filtered or unexported fields
}
Int8x32 is a 256-bit SIMD vector of 32 int8
func BroadcastInt8x32 ¶
BroadcastInt8x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadInt8x32Slice ¶
LoadInt8x32Slice loads an Int8x32 from a slice of at least 32 int8s
func LoadInt8x32SlicePart ¶
LoadInt8x32SlicePart loads an Int8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadInt8x32Slice.
func (Int8x32) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX2
func (Int8x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX2
func (Int8x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Int8x32) AsFloat32x8 ¶
AsFloat32x8 converts from Int8x32 to Float32x8.
func (Int8x32) AsFloat64x4 ¶
AsFloat64x4 converts from Int8x32 to Float64x4.
func (Int8x32) AsInt16x16 ¶
AsInt16x16 converts from Int8x32 to Int16x16.
func (Int8x32) AsUint16x16 ¶
AsUint16x16 converts from Int8x32 to Uint16x16.
func (Int8x32) AsUint32x8 ¶
AsUint32x8 converts from Int8x32 to Uint32x8.
func (Int8x32) AsUint64x4 ¶
AsUint64x4 converts from Int8x32 to Uint64x4.
func (Int8x32) AsUint8x32 ¶
AsUint8x32 converts from Int8x32 to Uint8x32.
func (Int8x32) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packs them to lower indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x32) CopySign ¶
CopySign returns the product of the first operand with -1, 0, or 1, whichever constant is nearest to the value of the second operand.
Asm: VPSIGNB, CPU Feature: AVX2
func (Int8x32) DotProductQuadruple ¶
DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVXVNNI
func (Int8x32) DotProductQuadrupleSaturated ¶
DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVXVNNI
func (Int8x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. The expansion distributes those elements to the positions enabled by mask, from lower mask elements to upper in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x32) ExtendToInt16 ¶
ExtendToInt16 converts element values to int16. The result vector's elements are sign-extended.
Asm: VPMOVSXBW, CPU Feature: AVX512
func (Int8x32) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX2
func (Int8x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Int8x32) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Int8x32) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Int8x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Int8x32) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX2
func (Int8x32) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX2
func (Int8x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Int8x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Int8x32) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x32) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...} The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set in which case zero is used instead. Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX2
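The grouped form applies the same rule independently within each 128-bit group, so an index can never reach across a group boundary. A scalar sketch for a 32-element vector follows; the helper name permuteOrZeroGrouped and the [32]int8 index type are assumptions for the example.

```go
package main

import "fmt"

// permuteOrZeroGrouped is a hypothetical scalar model of
// PermuteOrZeroGrouped for 32 int8 elements, in two groups of 16.
func permuteOrZeroGrouped(x, indices [32]int8) [32]int8 {
	var r [32]int8
	for i, idx := range indices {
		if idx < 0 {
			continue // sign bit set: r[i] stays zero
		}
		group := i / 16 * 16 // first element of the group containing i
		r[i] = x[group+int(idx&15)]
	}
	return r
}

func main() {
	var x, idx [32]int8
	for i := range x {
		x[i] = int8(i)
		idx[i] = 15 // select the last element of each group
	}
	idx[3] = -1
	r := permuteOrZeroGrouped(x, idx)
	fmt.Println(r[0], r[3], r[16]) // 15 0 31
}
```

The same index value 15 yields element 15 in the first group and element 31 in the second.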
func (Int8x32) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo and hi give better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Int8x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int8x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Int8x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 int8s
func (Int8x32) StoreSlicePart ¶
StoreSlicePart stores the elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Int8x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2
func (Int8x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX2
type Int8x64 ¶
type Int8x64 struct {
// contains filtered or unexported fields
}
Int8x64 is a 512-bit SIMD vector of 64 int8
func BroadcastInt8x64 ¶
BroadcastInt8x64 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
func LoadInt8x64Slice ¶
LoadInt8x64Slice loads an Int8x64 from a slice of at least 64 int8s
func LoadInt8x64SlicePart ¶
LoadInt8x64SlicePart loads an Int8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadInt8x64Slice.
func LoadMaskedInt8x64 ¶
LoadMaskedInt8x64 loads an Int8x64 from an array, at those elements enabled by mask.
Asm: VMOVDQU8.Z, CPU Feature: AVX512
func (Int8x64) Abs ¶
Abs computes the absolute value of each element.
Asm: VPABSB, CPU Feature: AVX512
func (Int8x64) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512
func (Int8x64) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDSB, CPU Feature: AVX512
func (Int8x64) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Int8x64) AsFloat32x16 ¶
func (from Int8x64) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Int8x64 to Float32x16.
func (Int8x64) AsFloat64x8 ¶
AsFloat64x8 converts from Int8x64 to Float64x8.
func (Int8x64) AsInt16x32 ¶
AsInt16x32 converts from Int8x64 to Int16x32.
func (Int8x64) AsInt32x16 ¶
AsInt32x16 converts from Int8x64 to Int32x16.
func (Int8x64) AsUint16x32 ¶
AsUint16x32 converts from Int8x64 to Uint16x32.
func (Int8x64) AsUint32x16 ¶
AsUint32x16 converts from Int8x64 to Uint32x16.
func (Int8x64) AsUint64x8 ¶
AsUint64x8 converts from Int8x64 to Uint64x8.
func (Int8x64) AsUint8x64 ¶
AsUint8x64 converts from Int8x64 to Uint8x64.
func (Int8x64) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packs them to lower indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Int8x64) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to represent an index into xy are used in each element of indices.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Int8x64) DotProductQuadruple ¶
DotProductQuadruple performs dot products on groups of 4 elements of x and y. DotProductQuadruple(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSD, CPU Feature: AVX512VNNI
func (Int8x64) DotProductQuadrupleSaturated ¶
DotProductQuadrupleSaturated performs dot products on groups of 4 elements of x and y. DotProductQuadrupleSaturated(x, y).Add(z) will be optimized to the full form of the underlying instruction.
Asm: VPDPBUSDS, CPU Feature: AVX512VNNI
func (Int8x64) Expand ¶
Expand performs an expansion on a vector x whose elements are packed into its lower elements. The expansion distributes those elements to the positions enabled by mask, from lower mask elements to upper in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Int8x64) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPGTB, CPU Feature: AVX512
func (Int8x64) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXSB, CPU Feature: AVX512
func (Int8x64) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINSB, CPU Feature: AVX512
func (Int8x64) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPB, CPU Feature: AVX512
func (Int8x64) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Int8x64) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Int8x64) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. The low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Int8x64) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...} The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set in which case zero is used instead. Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX512
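The grouped behaviour can be sketched in scalar Go as follows. permuteOrZeroGroupedModel is a hypothetical helper that follows the description above; it is not the package's implementation.

```go
package main

import "fmt"

// permuteOrZeroGroupedModel is a hypothetical scalar model. Each 16-byte
// (128-bit) group is permuted independently: the low four bits of an index
// pick a byte from the same group, and a set sign bit yields zero.
func permuteOrZeroGroupedModel(x, indices [64]int8) [64]int8 {
	var out [64]int8
	for i, idx := range indices {
		if idx < 0 { // sign bit set: leave the lane as zero
			continue
		}
		group := i &^ 15 // first lane of the 128-bit group holding lane i
		out[i] = x[group+int(idx&15)]
	}
	return out
}

func main() {
	var x, indices [64]int8
	for i := range x {
		x[i] = int8(i)
		indices[i] = 3 // every lane picks byte 3 of its own group
	}
	indices[1] = -1 // sign bit set
	r := permuteOrZeroGroupedModel(x, indices)
	fmt.Println(r[0], r[1], r[16], r[48]) // 3 0 19 51
}
```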
func (Int8x64) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int8x64) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Int8x64) StoreMasked ¶
StoreMasked stores an Int8x64 to an array, at those elements enabled by mask
Asm: VMOVDQU8, CPU Feature: AVX512
func (Int8x64) StoreSlice ¶
StoreSlice stores x into a slice of at least 64 int8s
func (Int8x64) StoreSlicePart ¶
StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.
func (Int8x64) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512
func (Int8x64) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBSB, CPU Feature: AVX512
type Mask16x16 ¶
type Mask16x16 struct {
// contains filtered or unexported fields
}
Mask16x16 is a mask for a 256-bit SIMD vector of 16 int16 elements
func Mask16x16FromBits ¶
Mask16x16FromBits constructs a Mask16x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x16) ToBits ¶
ToBits constructs a bitmap from a Mask16x16, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
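The bitmap conversions can be pictured with a scalar model. The helpers fromBitsModel and toBitsModel are hypothetical, and the sketch assumes that bit i of the bitmap corresponds to element i of the mask, with bit 0 being the lowest element.

```go
package main

import "fmt"

// fromBitsModel is a hypothetical scalar model of Mask16x16FromBits.
// Assumption: bit i of the bitmap maps to element i of the mask.
func fromBitsModel(bits uint16) [16]bool {
	var m [16]bool
	for i := range m {
		m[i] = bits&(1<<i) != 0
	}
	return m
}

// toBitsModel is the inverse: it packs the mask elements into a bitmap.
func toBitsModel(m [16]bool) uint16 {
	var bits uint16
	for i, set := range m {
		if set {
			bits |= 1 << i
		}
	}
	return bits
}

func main() {
	m := fromBitsModel(0b1000_0000_0000_0101)
	fmt.Println(m[0], m[1], m[2], m[15]) // true false true true
	fmt.Printf("%#x\n", toBitsModel(m))  // 0x8005
}
```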
func (Mask16x16) ToInt16x16 ¶
ToInt16x16 converts from Mask16x16 to Int16x16
type Mask16x32 ¶
type Mask16x32 struct {
// contains filtered or unexported fields
}
Mask16x32 is a mask for a 512-bit SIMD vector of 32 int16 elements
func Mask16x32FromBits ¶
Mask16x32FromBits constructs a Mask16x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x32) ToBits ¶
ToBits constructs a bitmap from a Mask16x32, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
func (Mask16x32) ToInt16x32 ¶
ToInt16x32 converts from Mask16x32 to Int16x32
type Mask16x8 ¶
type Mask16x8 struct {
// contains filtered or unexported fields
}
Mask16x8 is a mask for a 128-bit SIMD vector of 8 int16 elements
func Mask16x8FromBits ¶
Mask16x8FromBits constructs a Mask16x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVW, CPU Feature: AVX512
type Mask32x16 ¶
type Mask32x16 struct {
// contains filtered or unexported fields
}
Mask32x16 is a mask for a 512-bit SIMD vector of 16 int32 elements
func Mask32x16FromBits ¶
Mask32x16FromBits constructs a Mask32x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
func (Mask32x16) ToBits ¶
ToBits constructs a bitmap from a Mask32x16, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
func (Mask32x16) ToInt32x16 ¶
ToInt32x16 converts from Mask32x16 to Int32x16
type Mask32x4 ¶
type Mask32x4 struct {
// contains filtered or unexported fields
}
Mask32x4 is a mask for a 128-bit SIMD vector of 4 int32 elements
func Mask32x4FromBits ¶
Mask32x4FromBits constructs a Mask32x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.
Asm: KMOVD, CPU Feature: AVX512
type Mask32x8 ¶
type Mask32x8 struct {
// contains filtered or unexported fields
}
Mask32x8 is a mask for a 256-bit SIMD vector of 8 int32 elements
func Mask32x8FromBits ¶
Mask32x8FromBits constructs a Mask32x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVD, CPU Feature: AVX512
type Mask64x2 ¶
type Mask64x2 struct {
// contains filtered or unexported fields
}
Mask64x2 is a mask for a 128-bit SIMD vector of 2 int64 elements
func Mask64x2FromBits ¶
Mask64x2FromBits constructs a Mask64x2 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 2 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
type Mask64x4 ¶
type Mask64x4 struct {
// contains filtered or unexported fields
}
Mask64x4 is a mask for a 256-bit SIMD vector of 4 int64 elements
func Mask64x4FromBits ¶
Mask64x4FromBits constructs a Mask64x4 from a bitmap value, where 1 means set for the indexed element, 0 means unset. Only the lower 4 bits of y are used.
Asm: KMOVQ, CPU Feature: AVX512
type Mask64x8 ¶
type Mask64x8 struct {
// contains filtered or unexported fields
}
Mask64x8 is a mask for a 512-bit SIMD vector of 8 int64 elements
func Mask64x8FromBits ¶
Mask64x8FromBits constructs a Mask64x8 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVQ, CPU Feature: AVX512
type Mask8x16 ¶
type Mask8x16 struct {
// contains filtered or unexported fields
}
Mask8x16 is a mask for a 128-bit SIMD vector of 16 int8 elements
func Mask8x16FromBits ¶
Mask8x16FromBits constructs a Mask8x16 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Mask8x32 ¶
type Mask8x32 struct {
// contains filtered or unexported fields
}
Mask8x32 is a mask for a 256-bit SIMD vector of 32 int8 elements
func Mask8x32FromBits ¶
Mask8x32FromBits constructs a Mask8x32 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Mask8x64 ¶
type Mask8x64 struct {
// contains filtered or unexported fields
}
Mask8x64 is a mask for a 512-bit SIMD vector of 64 int8 elements
func Mask8x64FromBits ¶
Mask8x64FromBits constructs a Mask8x64 from a bitmap value, where 1 means set for the indexed element, 0 means unset.
Asm: KMOVB, CPU Feature: AVX512
type Uint16x16 ¶
type Uint16x16 struct {
// contains filtered or unexported fields
}
Uint16x16 is a 256-bit SIMD vector of 16 uint16
func BroadcastUint16x16 ¶
BroadcastUint16x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadUint16x16 ¶
LoadUint16x16 loads a Uint16x16 from an array
func LoadUint16x16Slice ¶
LoadUint16x16Slice loads a Uint16x16 from a slice of at least 16 uint16s
func LoadUint16x16SlicePart ¶
LoadUint16x16SlicePart loads a Uint16x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint16x16Slice.
func (Uint16x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX2
func (Uint16x16) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX2
func (Uint16x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX2
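To make "with saturation" concrete for unsigned 16-bit lanes, here is a scalar sketch. addSaturatedModel is a hypothetical helper operating on slices rather than on the vector type.

```go
package main

import (
	"fmt"
	"math"
)

// addSaturatedModel is a hypothetical scalar model of unsigned saturating
// addition on uint16 lanes: sums that would overflow clamp to 65535
// instead of wrapping around.
func addSaturatedModel(x, y []uint16) []uint16 {
	out := make([]uint16, len(x))
	for i := range x {
		sum := uint32(x[i]) + uint32(y[i])
		if sum > math.MaxUint16 {
			sum = math.MaxUint16
		}
		out[i] = uint16(sum)
	}
	return out
}

func main() {
	x := []uint16{1, 65000, 65535}
	y := []uint16{2, 1000, 1}
	fmt.Println(addSaturatedModel(x, y)) // [3 65535 65535]
}
```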
func (Uint16x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint16x16) AsFloat32x8 ¶
AsFloat32x8 converts from Uint16x16 to Float32x8
func (Uint16x16) AsFloat64x4 ¶
AsFloat64x4 converts from Uint16x16 to Float64x4
func (Uint16x16) AsInt16x16 ¶
AsInt16x16 converts from Uint16x16 to Int16x16
func (Uint16x16) AsUint32x8 ¶
AsUint32x8 converts from Uint16x16 to Uint32x8
func (Uint16x16) AsUint64x4 ¶
AsUint64x4 converts from Uint16x16 to Uint64x4
func (Uint16x16) AsUint8x32 ¶
AsUint8x32 converts from Uint16x16 to Uint8x32
func (Uint16x16) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX2
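The rounding behaviour of VPAVGW can be written out in scalar form. averageModel is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// averageModel is a hypothetical scalar model of the rounded average
// computed by VPAVGW: (x + y + 1) >> 1, evaluated in a wider type so the
// intermediate sum cannot overflow. Ties round up.
func averageModel(x, y []uint16) []uint16 {
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = uint16((uint32(x[i]) + uint32(y[i]) + 1) >> 1)
	}
	return out
}

func main() {
	x := []uint16{1, 4, 65535}
	y := []uint16{2, 8, 65535}
	fmt.Println(averageModel(x, y)) // [2 6 65535]
}
```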
func (Uint16x16) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
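A scalar sketch of the compression follows. compressModel is a hypothetical helper, the mask is a plain []bool, and the sketch assumes that the lanes above the packed elements are zeroed.

```go
package main

import "fmt"

// compressModel is a hypothetical scalar model of Compress. Elements of x
// whose mask element is set are packed, in order, into the lowest lanes.
// Assumption: the remaining upper lanes are zero.
func compressModel(x []uint16, mask []bool) []uint16 {
	out := make([]uint16, len(x))
	next := 0 // next free lane in the packed result
	for i, on := range mask {
		if on {
			out[next] = x[i]
			next++
		}
	}
	return out
}

func main() {
	x := []uint16{10, 11, 12, 13, 14, 15, 16, 17}
	mask := []bool{false, true, false, true, true, false, false, false}
	fmt.Println(compressModel(x, mask)) // [11 13 14 0 0 0 0 0]
}
```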
func (Uint16x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2W, CPU Feature: AVX512
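For a 16-element vector, the concatenation xy has 32 elements, so 5 index bits are needed. The scalar sketch below models this; concatPermuteModel is a hypothetical helper, and passing the indices as a [16]uint16 array is an assumption of the example.

```go
package main

import "fmt"

// concatPermuteModel is a hypothetical scalar model of ConcatPermute for
// 16 uint16 lanes. xy is x (lower half) followed by y (upper half).
func concatPermuteModel(x, y, indices [16]uint16) [16]uint16 {
	var xy [32]uint16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	var out [16]uint16
	for i, idx := range indices {
		out[i] = xy[idx&31] // 5 bits are enough to index 32 elements
	}
	return out
}

func main() {
	var x, y, indices [16]uint16
	for i := range x {
		x[i] = uint16(100 + i)
		y[i] = uint16(200 + i)
		indices[i] = uint16(2 * i) // pick every other element of xy
	}
	r := concatPermuteModel(x, y, indices)
	fmt.Println(r[0], r[7], r[8], r[15]) // 100 114 200 214
}
```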
func (Uint16x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Uint16x16) ExtendToUint32 ¶
ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX512
func (Uint16x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX2
func (Uint16x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Uint16x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX2
func (Uint16x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX2
func (Uint16x16) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint16x16) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Uint16x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Uint16x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX2
func (Uint16x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX2
func (Uint16x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX2
func (Uint16x16) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX2
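"The high part of the result" refers to the upper 16 bits of the full 32-bit product. A scalar sketch, with mulHighModel as a hypothetical helper:

```go
package main

import "fmt"

// mulHighModel is a hypothetical scalar model of MulHigh on uint16 lanes:
// each pair is multiplied into a 32-bit product and only the upper
// 16 bits of that product are kept.
func mulHighModel(x, y []uint16) []uint16 {
	out := make([]uint16, len(x))
	for i := range x {
		out[i] = uint16((uint32(x[i]) * uint32(y[i])) >> 16)
	}
	return out
}

func main() {
	x := []uint16{0x8000, 65535, 3}
	y := []uint16{4, 65535, 5}
	fmt.Println(mulHighModel(x, y)) // [2 65534 0]
}
```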
func (Uint16x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Uint16x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint16x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x16) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12]}
Each group is 128 bits wide.
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX2
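The result layout above can be reproduced with a short scalar model. permuteScalarsHiGroupedModel is a hypothetical helper, and it assumes a through d are already in the range 0 to 3.

```go
package main

import "fmt"

// permuteScalarsHiGroupedModel is a hypothetical scalar model for 16
// uint16 lanes. Within each 128-bit group (8 lanes) the low four lanes
// pass through and the high four lanes are picked by a, b, c, d.
// Assumption: a, b, c, d are between 0 and 3.
func permuteScalarsHiGroupedModel(x [16]uint16, a, b, c, d uint8) [16]uint16 {
	out := x // start from a copy so the low lanes are preserved
	for g := 0; g < 16; g += 8 {
		hi := g + 4 // first high lane of this group
		out[hi+0] = x[hi+int(a)]
		out[hi+1] = x[hi+int(b)]
		out[hi+2] = x[hi+int(c)]
		out[hi+3] = x[hi+int(d)]
	}
	return out
}

func main() {
	var x [16]uint16
	for i := range x {
		x[i] = uint16(i)
	}
	// Reverse the high four lanes of each group.
	fmt.Println(permuteScalarsHiGroupedModel(x, 3, 2, 1, 0))
	// [0 1 2 3 7 6 5 4 8 9 10 11 15 14 13 12]
}
```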
func (Uint16x16) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX2
func (Uint16x16) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}.Select128FromPair(3, 0,
{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77})
returns {70, 71, 72, 73, 74, 75, 76, 77, 40, 41, 42, 43, 44, 45, 46, 47}.
lo and hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
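The worked example above can be checked against a scalar model. select128FromPairModel is a hypothetical helper; unlike the method, it takes x and y as its first two arguments.

```go
package main

import "fmt"

// select128FromPairModel is a hypothetical scalar model. x and y are
// viewed as four 128-bit elements (x low, x high, y low, y high), and the
// result is element lo followed by element hi.
func select128FromPairModel(x, y [16]uint16, lo, hi uint8) [16]uint16 {
	if lo > 3 || hi > 3 {
		panic("lo and hi must be between 0 and 3")
	}
	var xy [32]uint16
	copy(xy[:16], x[:])
	copy(xy[16:], y[:])
	l, h := int(lo)*8, int(hi)*8 // each 128-bit element holds 8 uint16 lanes
	var out [16]uint16
	copy(out[:8], xy[l:l+8])
	copy(out[8:], xy[h:h+8])
	return out
}

func main() {
	x := [16]uint16{40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56, 57}
	y := [16]uint16{60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 72, 73, 74, 75, 76, 77}
	fmt.Println(select128FromPairModel(x, y, 3, 0))
	// [70 71 72 73 74 75 76 77 40 41 42 43 44 45 46 47]
}
```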
func (Uint16x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint16x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint16x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX2
func (Uint16x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
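One way to picture the concatenated shift is as a shift of the 32-bit value formed by x (upper half) and y (lower half), keeping the upper 16 bits. The sketch below does this in scalar Go; shiftAllLeftConcatModel is a hypothetical helper, and it assumes the shift amount is between 0 and 15.

```go
package main

import "fmt"

// shiftAllLeftConcatModel is a hypothetical scalar model on uint16 lanes.
// Assumption: shift is between 0 and 15.
func shiftAllLeftConcatModel(x, y []uint16, shift uint) []uint16 {
	out := make([]uint16, len(x))
	for i := range x {
		// Concatenate x (upper 16 bits) with y (lower 16 bits), shift the
		// pair left, and keep the upper 16 bits: x is shifted left and the
		// emptied low bits are filled from the top of y.
		wide := uint32(x[i])<<16 | uint32(y[i])
		out[i] = uint16((wide << shift) >> 16)
	}
	return out
}

func main() {
	r4 := shiftAllLeftConcatModel([]uint16{0x00FF}, []uint16{0xF000}, 4)
	r8 := shiftAllLeftConcatModel([]uint16{0x1234}, []uint16{0xABCD}, 8)
	fmt.Printf("%#x %#x\n", r4[0], r8[0]) // 0xfff 0x34ab
}
```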
func (Uint16x16) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX2
func (Uint16x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x16) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint16s
func (Uint16x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
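The partial load and store variants are useful for the tail of a slice whose length is not a multiple of 16. The scalar sketch below models their documented behaviour with plain arrays; loadSlicePartModel and storeSlicePartModel are hypothetical stand-ins for LoadUint16x16SlicePart and StoreSlicePart.

```go
package main

import "fmt"

// loadSlicePartModel is a hypothetical scalar model of
// LoadUint16x16SlicePart: missing elements are filled with zeroes.
func loadSlicePartModel(s []uint16) [16]uint16 {
	var v [16]uint16
	copy(v[:], s) // copies min(len(s), 16) elements; the rest stay zero
	return v
}

// storeSlicePartModel is a hypothetical scalar model of StoreSlicePart:
// it stores as many elements as fit in s.
func storeSlicePartModel(v [16]uint16, s []uint16) {
	copy(s, v[:])
}

func main() {
	src := []uint16{1, 2, 3, 4, 5} // a 5-element tail
	dst := make([]uint16, len(src))
	v := loadSlicePartModel(src)
	for i := range v {
		v[i] += 100 // stand-in for some vector operation
	}
	storeSlicePartModel(v, dst)
	fmt.Println(dst) // [101 102 103 104 105]
}
```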
func (Uint16x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX2
func (Uint16x16) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX2
func (Uint16x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX2
func (Uint16x16) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVWB, CPU Feature: AVX512
type Uint16x32 ¶
type Uint16x32 struct {
// contains filtered or unexported fields
}
Uint16x32 is a 512-bit SIMD vector of 32 uint16
func BroadcastUint16x32 ¶
BroadcastUint16x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
func LoadMaskedUint16x32 ¶
LoadMaskedUint16x32 loads a Uint16x32 from an array, at those elements enabled by mask
Asm: VMOVDQU16.Z, CPU Feature: AVX512
func LoadUint16x32 ¶
LoadUint16x32 loads a Uint16x32 from an array
func LoadUint16x32Slice ¶
LoadUint16x32Slice loads a Uint16x32 from a slice of at least 32 uint16s
func LoadUint16x32SlicePart ¶
LoadUint16x32SlicePart loads a Uint16x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint16x32Slice.
func (Uint16x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDW, CPU Feature: AVX512
func (Uint16x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX512
func (Uint16x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint16x32) AsFloat32x16 ¶
func (from Uint16x32) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Uint16x32 to Float32x16
func (Uint16x32) AsFloat64x8 ¶
AsFloat64x8 converts from Uint16x32 to Float64x8
func (Uint16x32) AsInt16x32 ¶
AsInt16x32 converts from Uint16x32 to Int16x32
func (Uint16x32) AsInt32x16 ¶
AsInt32x16 converts from Uint16x32 to Int32x16
func (Uint16x32) AsUint32x16 ¶
AsUint32x16 converts from Uint16x32 to Uint32x16
func (Uint16x32) AsUint64x8 ¶
AsUint64x8 converts from Uint16x32 to Uint64x8
func (Uint16x32) AsUint8x64 ¶
AsUint8x64 converts from Uint16x32 to Uint8x64
func (Uint16x32) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX512
func (Uint16x32) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Uint16x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2W, CPU Feature: AVX512
func (Uint16x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Uint16x32) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX512
func (Uint16x32) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX512
func (Uint16x32) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX512
func (Uint16x32) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX512
func (Uint16x32) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX512
func (Uint16x32) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX512
func (Uint16x32) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUW, CPU Feature: AVX512
func (Uint16x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint16x32) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x32) PermuteScalarsHiGrouped ¶
PermuteScalarsHiGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4],
x[8], x[9], x[10], x[11], x[a+12], x[b+12], x[c+12], x[d+12],
x[16], x[17], x[18], x[19], x[a+20], x[b+20], x[c+20], x[d+20],
x[24], x[25], x[26], x[27], x[a+28], x[b+28], x[c+28], x[d+28]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Uint16x32) PermuteScalarsLoGrouped ¶
PermuteScalarsLoGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7],
x[a+8], x[b+8], x[c+8], x[d+8], x[12], x[13], x[14], x[15],
x[a+16], x[b+16], x[c+16], x[d+16], x[20], x[21], x[22], x[23],
x[a+24], x[b+24], x[c+24], x[d+24], x[28], x[29], x[30], x[31]}
Each group is 128 bits wide.
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Uint16x32) SaturateToUint8 ¶
SaturateToUint8 converts element values to uint8. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSWB, CPU Feature: AVX512
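The difference between saturating and truncating narrowing shows up for values above 255. The scalar sketch below contrasts the two; saturateToUint8Model and truncateToUint8Model are hypothetical helpers standing in for SaturateToUint8 and TruncateToUint8.

```go
package main

import "fmt"

// saturateToUint8Model is a hypothetical scalar model of SaturateToUint8:
// values above 255 clamp to 255.
func saturateToUint8Model(x []uint16) []uint8 {
	out := make([]uint8, len(x))
	for i, v := range x {
		if v > 255 {
			v = 255
		}
		out[i] = uint8(v)
	}
	return out
}

// truncateToUint8Model is a hypothetical scalar model of TruncateToUint8:
// only the low 8 bits of each value are kept.
func truncateToUint8Model(x []uint16) []uint8 {
	out := make([]uint8, len(x))
	for i, v := range x {
		out[i] = uint8(v)
	}
	return out
}

func main() {
	x := []uint16{7, 255, 256, 0x1234}
	fmt.Println(saturateToUint8Model(x)) // [7 255 255 255]
	fmt.Println(truncateToUint8Model(x)) // [7 255 0 52]
}
```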
func (Uint16x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint16x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint16x32) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX512
func (Uint16x32) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX512
func (Uint16x32) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x32) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x32) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x32) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x32) StoreMasked ¶
StoreMasked stores a Uint16x32 to an array, at those elements enabled by mask
Asm: VMOVDQU16, CPU Feature: AVX512
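A masked store writes only the enabled lanes and leaves the other destination elements untouched. The scalar sketch below models this; storeMaskedModel is a hypothetical helper, and representing the mask as a uint32 bitmap (bit i for element i) is an assumption of the example.

```go
package main

import "fmt"

// storeMaskedModel is a hypothetical scalar model of StoreMasked for 32
// uint16 lanes. Assumption: bit i of mask enables element i. Disabled
// elements of dst keep their previous contents.
func storeMaskedModel(x [32]uint16, dst *[32]uint16, mask uint32) {
	for i := range x {
		if mask&(1<<i) != 0 {
			dst[i] = x[i]
		}
	}
}

func main() {
	var x, dst [32]uint16
	for i := range x {
		x[i] = uint16(i)
		dst[i] = 9 // previous memory contents
	}
	storeMaskedModel(x, &dst, 0b1010) // enable elements 1 and 3 only
	fmt.Println(dst[:5])              // [9 1 9 3 9]
}
```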
func (Uint16x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 uint16s
func (Uint16x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Uint16x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX512
func (Uint16x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX512
func (Uint16x32) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements.
Asm: VPMOVWB, CPU Feature: AVX512
type Uint16x8 ¶
type Uint16x8 struct {
// contains filtered or unexported fields
}
Uint16x8 is a 128-bit SIMD vector of 8 uint16
func BroadcastUint16x8 ¶
BroadcastUint16x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadUint16x8 ¶
LoadUint16x8 loads a Uint16x8 from an array
func LoadUint16x8Slice ¶
LoadUint16x8Slice loads a Uint16x8 from a slice of at least 8 uint16s
func LoadUint16x8SlicePart ¶
LoadUint16x8SlicePart loads a Uint16x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint16x8Slice.
func (Uint16x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDW, CPU Feature: AVX
func (Uint16x8) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSW, CPU Feature: AVX
func (Uint16x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint16x8) AsFloat32x4 ¶
AsFloat32x4 converts from Uint16x8 to Float32x4
func (Uint16x8) AsFloat64x2 ¶
AsFloat64x2 converts from Uint16x8 to Float64x2
func (Uint16x8) AsUint32x4 ¶
AsUint32x4 converts from Uint16x8 to Uint32x4
func (Uint16x8) AsUint64x2 ¶
AsUint64x2 converts from Uint16x8 to Uint64x2
func (Uint16x8) AsUint8x16 ¶
AsUint8x16 converts from Uint16x8 to Uint8x16
func (Uint16x8) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGW, CPU Feature: AVX
func (Uint16x8) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Uint16x8) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX2
func (Uint16x8) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTW, CPU Feature: AVX512
func (Uint16x8) Compress ¶
Compress performs a compression on vector x using mask by selecting elements as indicated by mask, and packing them into the lower-indexed elements.
Asm: VPCOMPRESSW, CPU Feature: AVX512VBMI2
func (Uint16x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2W, CPU Feature: AVX512
func (Uint16x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ExtendLo2ToUint64x2 ¶
ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX
func (Uint16x8) ExtendLo4ToUint32x4 ¶
ExtendLo4ToUint32x4 converts 4 lowest vector element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX
func (Uint16x8) ExtendLo4ToUint64x4 ¶
ExtendLo4ToUint64x4 converts 4 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX2
func (Uint16x8) ExtendToUint32 ¶
ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXWD, CPU Feature: AVX2
func (Uint16x8) ExtendToUint64 ¶
ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXWQ, CPU Feature: AVX512
func (Uint16x8) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRW, CPU Feature: AVX512
func (Uint16x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX
func (Uint16x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Uint16x8) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHWD, CPU Feature: AVX
func (Uint16x8) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLWD, CPU Feature: AVX
func (Uint16x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint16x8) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Uint16x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Uint16x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUW, CPU Feature: AVX
func (Uint16x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUW, CPU Feature: AVX
func (Uint16x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLW, CPU Feature: AVX
func (Uint16x8) MulHigh ¶
MulHigh multiplies elements and stores the high part of the result.
Asm: VPMULHUW, CPU Feature: AVX
func (Uint16x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Uint16x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTW, CPU Feature: AVX512BITALG
func (Uint16x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint16x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMW, CPU Feature: AVX512
func (Uint16x8) PermuteScalarsHi ¶
PermuteScalarsHi performs a permutation of vector x using the supplied indices:
result = {x[0], x[1], x[2], x[3], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFHW, CPU Feature: AVX512
func (Uint16x8) PermuteScalarsLo ¶
PermuteScalarsLo performs a permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[4], x[5], x[6], x[7]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFLW, CPU Feature: AVX512
func (Uint16x8) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRW, CPU Feature: AVX
func (Uint16x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLW, CPU Feature: AVX
func (Uint16x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLW, CPU Feature: AVX
func (Uint16x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVW, CPU Feature: AVX512
func (Uint16x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVW, CPU Feature: AVX512VBMI2
func (Uint16x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVW, CPU Feature: AVX512
func (Uint16x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVW, CPU Feature: AVX512VBMI2
func (Uint16x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint16s
func (Uint16x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
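The "as many elements as will fit" rule matches the behavior of Go's built-in copy. A minimal scalar sketch, assuming an array in place of a Uint16x8 value (storeSlicePart is a hypothetical helper, not the package's implementation):

```go
package main

import "fmt"

// storeSlicePart models a partial store: it writes min(len(s), 8) lanes of x
// into s and returns how many lanes were written.
func storeSlicePart(x [8]uint16, s []uint16) int {
	return copy(s, x[:])
}

func main() {
	x := [8]uint16{1, 2, 3, 4, 5, 6, 7, 8}

	short := make([]uint16, 3) // room for only three lanes
	n := storeSlicePart(x, short)
	fmt.Println(n, short)

	long := make([]uint16, 10) // more room than needed; the tail is untouched
	n = storeSlicePart(x, long)
	fmt.Println(n, long)
}
```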
func (Uint16x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBW, CPU Feature: AVX
func (Uint16x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBW, CPU Feature: AVX
func (Uint16x8) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSW, CPU Feature: AVX
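For unsigned elements, saturation means a difference that would go below zero is clamped to zero instead of wrapping around. The following scalar sketch illustrates this; subSaturated is a hypothetical helper that uses arrays in place of vectors.

```go
package main

import "fmt"

// subSaturated models unsigned saturating subtraction per lane:
// the result is x-y when x > y, and 0 otherwise (no wraparound).
func subSaturated(x, y [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := range x {
		if x[i] > y[i] {
			r[i] = x[i] - y[i]
		}
	}
	return r
}

func main() {
	x := [8]uint16{10, 5, 0, 65535, 1, 2, 3, 4}
	y := [8]uint16{3, 9, 1, 65535, 0, 2, 4, 1}
	fmt.Println(subSaturated(x, y))
}
```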
func (Uint16x8) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVWB, CPU Feature: AVX512
type Uint32x16 ¶
type Uint32x16 struct {
// contains filtered or unexported fields
}
Uint32x16 is a 512-bit SIMD vector of 16 uint32
func BroadcastUint32x16 ¶
BroadcastUint32x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
func LoadMaskedUint32x16 ¶
LoadMaskedUint32x16 loads a Uint32x16 from an array, at those elements enabled by mask
Asm: VMOVDQU32.Z, CPU Feature: AVX512
func LoadUint32x16 ¶
LoadUint32x16 loads a Uint32x16 from an array
func LoadUint32x16Slice ¶
LoadUint32x16Slice loads a Uint32x16 from a slice of at least 16 uint32s
func LoadUint32x16SlicePart ¶
LoadUint32x16SlicePart loads a Uint32x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint32x16Slice.
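The zero-fill rule can be sketched in scalar Go. loadSlicePart is a hypothetical helper that returns an array in place of a Uint32x16; it models the documented result, not the package's implementation.

```go
package main

import "fmt"

// loadSlicePart models a partial load: up to 16 elements are taken from s,
// and any lanes beyond len(s) keep their zero value.
func loadSlicePart(s []uint32) [16]uint32 {
	var v [16]uint32
	copy(v[:], s) // copies min(len(s), 16) elements
	return v
}

func main() {
	fmt.Println(loadSlicePart([]uint32{1, 2, 3}))
}
```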
func (Uint32x16) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX512
func (Uint32x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint32x16) AsFloat32x16 ¶
func (from Uint32x16) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Uint32x16 to Float32x16
func (Uint32x16) AsFloat64x8 ¶
AsFloat64x8 converts from Uint32x16 to Float64x8
func (Uint32x16) AsInt16x32 ¶
AsInt16x32 converts from Uint32x16 to Int16x32
func (Uint32x16) AsInt32x16 ¶
AsInt32x16 converts from Uint32x16 to Int32x16
func (Uint32x16) AsUint16x32 ¶
AsUint16x32 converts from Uint32x16 to Uint16x32
func (Uint32x16) AsUint64x8 ¶
AsUint64x8 converts from Uint32x16 to Uint64x8
func (Uint32x16) AsUint8x64 ¶
AsUint8x64 converts from Uint32x16 to Uint8x64
func (Uint32x16) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
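A scalar model may help make the packing order concrete. In this sketch, compress16 is a hypothetical helper, a [16]bool stands in for the opaque mask type, and leaving the unfilled upper lanes at zero is an assumption of the model rather than something the text above states.

```go
package main

import "fmt"

// compress16 packs the lanes of x whose mask entry is true, in order,
// into the lowest lanes of the result.
// Assumption of this sketch: lanes that are not filled stay zero.
func compress16(x [16]uint32, mask [16]bool) [16]uint32 {
	var r [16]uint32
	j := 0
	for i, keep := range mask {
		if keep {
			r[j] = x[i]
			j++
		}
	}
	return r
}

func main() {
	var x [16]uint32
	var mask [16]bool
	for i := range x {
		x[i] = uint32(100 + i)
		mask[i] = i%4 == 1 // select lanes 1, 5, 9, 13
	}
	fmt.Println(compress16(x, mask))
}
```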
func (Uint32x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
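The two-source lookup can be sketched in scalar Go. concatPermute16 is a hypothetical helper using arrays in place of Uint32x16 values. With 16 lanes per input, xy has 32 entries, so the model keeps the low 5 bits of each index.

```go
package main

import "fmt"

// concatPermute16 models a permutation over the 32-entry concatenation xy,
// where entries 0-15 come from x and entries 16-31 come from y.
// Only the low 5 bits of each index are used.
func concatPermute16(x, y, indices [16]uint32) [16]uint32 {
	var r [16]uint32
	for i, idx := range indices {
		k := idx & 31
		if k < 16 {
			r[i] = x[k]
		} else {
			r[i] = y[k-16]
		}
	}
	return r
}

func main() {
	var x, y, idx [16]uint32
	for i := range x {
		x[i] = uint32(i)
		y[i] = uint32(100 + i)
		idx[i] = uint32(2 * i) // even entries of xy: first from x, then from y
	}
	fmt.Println(concatPermute16(x, y, idx))
}
```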
func (Uint32x16) ConvertToFloat32 ¶
func (x Uint32x16) ConvertToFloat32() Float32x16
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x16) Expand ¶
Expand takes a vector x whose elements are packed into its lower lanes and distributes them, in order, to the lanes selected by mask, from the lowest selected lane to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
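Expand is the inverse of the packing step that Compress performs. A scalar sketch follows, with the same caveats as any model of this kind: expand16 is a hypothetical helper, a [16]bool stands in for the mask type, and zeroing the unselected lanes is an assumption of the model.

```go
package main

import "fmt"

// expand16 distributes the low lanes of x, in order, to the lanes whose
// mask entry is true.
// Assumption of this sketch: unselected lanes are zero.
func expand16(x [16]uint32, mask [16]bool) [16]uint32 {
	var r [16]uint32
	j := 0
	for i, sel := range mask {
		if sel {
			r[i] = x[j]
			j++
		}
	}
	return r
}

func main() {
	x := [16]uint32{1, 2, 3} // three packed values in the low lanes
	var mask [16]bool
	mask[2], mask[3], mask[10] = true, true, true
	fmt.Println(expand16(x, mask))
}
```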
func (Uint32x16) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX512
func (Uint32x16) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX512
func (Uint32x16) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x16) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX512
func (Uint32x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX512
func (Uint32x16) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX512
func (Uint32x16) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUD, CPU Feature: AVX512
func (Uint32x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint32x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[15]]}. Only the low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX512
func (Uint32x16) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result =
{ x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4],
x[a+8], x[b+8], x[c+8], x[d+8], x[a+12], x[b+12], x[c+12], x[d+12]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX512
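The grouped form applies the same four selectors inside every 128-bit group of four lanes. A scalar sketch, assuming arrays in place of vectors (permuteScalarsGrouped is a hypothetical helper, and masking the selectors with 3 is a choice of this model):

```go
package main

import "fmt"

// permuteScalarsGrouped applies selectors a, b, c, d within each group of
// four lanes: for group base g, result[g+j] = x[g+sel[j]].
// Selectors are masked to 0-3; that masking is an assumption of this sketch.
func permuteScalarsGrouped(x [16]uint32, a, b, c, d uint8) [16]uint32 {
	sel := [4]uint8{a & 3, b & 3, c & 3, d & 3}
	var r [16]uint32
	for g := 0; g < 16; g += 4 {
		for j, s := range sel {
			r[g+j] = x[g+int(s)]
		}
	}
	return r
}

func main() {
	var x [16]uint32
	for i := range x {
		x[i] = uint32(i)
	}
	// Reverses the lanes inside each group without crossing group boundaries.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0))
}
```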
func (Uint32x16) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Uint32x16) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x16) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Uint32x16) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x16) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512
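The difference between saturating and truncating narrowing (see TruncateToUint16) shows up only for values that do not fit in 16 bits. The sketch below models both per lane; saturateToUint16 and truncateToUint16 are hypothetical helpers that use arrays in place of vectors.

```go
package main

import "fmt"

// saturateToUint16 clamps each lane to the uint16 range before narrowing.
func saturateToUint16(x [16]uint32) [16]uint16 {
	var r [16]uint16
	for i, v := range x {
		if v > 0xFFFF {
			v = 0xFFFF
		}
		r[i] = uint16(v)
	}
	return r
}

// truncateToUint16 keeps only the low 16 bits of each lane.
func truncateToUint16(x [16]uint32) [16]uint16 {
	var r [16]uint16
	for i, v := range x {
		r[i] = uint16(v)
	}
	return r
}

func main() {
	x := [16]uint32{7, 0xFFFF, 0x10000, 0x12345}
	fmt.Printf("%#x\n", saturateToUint16(x))
	fmt.Printf("%#x\n", truncateToUint16(x))
}
```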
func (Uint32x16) SaturateToUint16Concat ¶
SaturateToUint16Concat converts element values to uint16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX512
func (Uint32x16) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX512
func (Uint32x16) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint32x16) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint32x16) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX512
func (Uint32x16) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Uint32x16) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX512
func (Uint32x16) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
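The concatenating ("funnel") shifts on 32-bit elements can be modeled per lane in scalar Go. This is a sketch of the documented behavior, not the package's implementation; shiftLeftConcat32 and shiftRightConcat32 are hypothetical helpers. The model relies on Go defining a uint32 shift by 32 as zero, so a shift count of 0 returns x unchanged.

```go
package main

import "fmt"

// shiftLeftConcat32 shifts x left by the low 5 bits of shift and fills the
// emptied low bits with the upper bits of y.
func shiftLeftConcat32(x, y uint32, shift uint8) uint32 {
	s := uint(shift & 31)
	return x<<s | y>>(32-s)
}

// shiftRightConcat32 shifts x right by the low 5 bits of shift and fills the
// emptied high bits with the lower bits of y.
func shiftRightConcat32(x, y uint32, shift uint8) uint32 {
	s := uint(shift & 31)
	return x>>s | y<<(32-s)
}

func main() {
	fmt.Printf("%#08x\n", shiftLeftConcat32(0x000000FF, 0xAB000000, 8))
	fmt.Printf("%#08x\n", shiftRightConcat32(0xFF000000, 0x000000CD, 8))
}
```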
func (Uint32x16) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX512
func (Uint32x16) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x16) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX512
func (Uint32x16) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x16) StoreMasked ¶
StoreMasked stores a Uint32x16 to an array, at those elements enabled by mask
Asm: VMOVDQU32, CPU Feature: AVX512
func (Uint32x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint32s
func (Uint32x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Uint32x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX512
func (Uint32x16) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
func (Uint32x16) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint32x4 ¶
type Uint32x4 struct {
// contains filtered or unexported fields
}
Uint32x4 is a 128-bit SIMD vector of 4 uint32
func BroadcastUint32x4 ¶
BroadcastUint32x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadMaskedUint32x4 ¶
LoadMaskedUint32x4 loads a Uint32x4 from an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func LoadUint32x4 ¶
LoadUint32x4 loads a Uint32x4 from an array
func LoadUint32x4Slice ¶
LoadUint32x4Slice loads a Uint32x4 from a slice of at least 4 uint32s
func LoadUint32x4SlicePart ¶
LoadUint32x4SlicePart loads a Uint32x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint32x4Slice.
func (Uint32x4) AESInvMixColumns ¶
AESInvMixColumns performs the InvMixColumns operation in AES cipher algorithm defined in FIPS 197. x is the chunk of w array in use. result = InvMixColumns(x)
Asm: VAESIMC, CPU Feature: AVX, AES
func (Uint32x4) AESRoundKeyGenAssist ¶
AESRoundKeyGenAssist performs some components of KeyExpansion in the AES cipher algorithm defined in FIPS 197. x is an array of AES words, but only x[0] and x[2] are used. r is a value from the Rcon constant array.
result[0] = XOR(SubWord(RotWord(x[0])), r)
result[1] = SubWord(x[1])
result[2] = XOR(SubWord(RotWord(x[2])), r)
result[3] = SubWord(x[3])
rconVal results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VAESKEYGENASSIST, CPU Feature: AVX, AES
func (Uint32x4) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX
func (Uint32x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint32x4) AsFloat32x4 ¶
AsFloat32x4 converts from Uint32x4 to Float32x4
func (Uint32x4) AsFloat64x2 ¶
AsFloat64x2 converts from Uint32x4 to Float64x2
func (Uint32x4) AsUint16x8 ¶
AsUint16x8 converts from Uint32x4 to Uint16x8
func (Uint32x4) AsUint64x2 ¶
AsUint64x2 converts from Uint32x4 to Uint64x2
func (Uint32x4) AsUint8x16 ¶
AsUint8x16 converts from Uint32x4 to Uint8x16
func (Uint32x4) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Uint32x4) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX2
func (Uint32x4) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTD, CPU Feature: AVX512
func (Uint32x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Uint32x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
func (Uint32x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512
func (Uint32x4) Expand ¶
Expand takes a vector x whose elements are packed into its lower lanes and distributes them, in order, to the lanes selected by mask, from the lowest selected lane to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Uint32x4) ExtendLo2ToUint64x2 ¶
ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX
func (Uint32x4) ExtendToUint64 ¶
ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX2
func (Uint32x4) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRD, CPU Feature: AVX
func (Uint32x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX
func (Uint32x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Uint32x4) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX
func (Uint32x4) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX
func (Uint32x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint32x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x4) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Uint32x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Uint32x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX
func (Uint32x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX
func (Uint32x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX
func (Uint32x4) MulEvenWiden ¶
MulEvenWiden multiplies even-indexed elements, widening the result. Result[i] = v1.Even[i] * v2.Even[i].
Asm: VPMULUDQ, CPU Feature: AVX
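The widening multiply reads only the even-indexed 32-bit lanes and produces full 64-bit products, so no high bits are lost. A scalar sketch, assuming arrays in place of vectors (mulEvenWiden is a hypothetical helper):

```go
package main

import "fmt"

// mulEvenWiden models result[i] = uint64(x[2*i]) * uint64(y[2*i]).
// Odd-indexed lanes of x and y are ignored.
func mulEvenWiden(x, y [4]uint32) [2]uint64 {
	var r [2]uint64
	for i := range r {
		r[i] = uint64(x[2*i]) * uint64(y[2*i])
	}
	return r
}

func main() {
	x := [4]uint32{0xFFFFFFFF, 9, 3, 9}
	y := [4]uint32{0xFFFFFFFF, 9, 5, 9}
	// The first product needs all 64 bits; a 32-bit multiply would overflow.
	fmt.Printf("%#x\n", mulEvenWiden(x, y))
}
```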
func (Uint32x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Uint32x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint32x4) PermuteScalars ¶
PermuteScalars performs a permutation of vector x's elements using the supplied indices:
result = {x[a], x[b], x[c], x[d]}
Parameters a, b, c, and d should have values between 0 and 3. If a through d are constants, an instruction is inlined; otherwise a jump table may be generated.
Asm: VPSHUFD, CPU Feature: AVX
func (Uint32x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Uint32x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
func (Uint32x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x4) SHA1FourRounds ¶
SHA1FourRounds performs 4 rounds of the B loop in the SHA1 algorithm defined in FIPS 180-4. x contains the state variables a, b, c and d from upper to lower order. y contains the W array elements (with the state variable e added to the upper element) from upper to lower order. result = the state variables a', b', c', d' updated after 4 rounds. constant = 0 for the first 20 rounds of the loop, 1 for the next 20 rounds of the loop..., 3 for the last 20 rounds of the loop.
constant results in better performance when it's a compile-time constant; a non-constant value will be translated into a jump table.
Asm: SHA1RNDS4, CPU Feature: SHA
func (Uint32x4) SHA1Message1 ¶
SHA1Message1 does the XORing of 1 in SHA1 algorithm defined in FIPS 180-4. x = {W3, W2, W1, W0} y = {0, 0, W5, W4} result = {W3^W5, W2^W4, W1^W3, W0^W2}.
Asm: SHA1MSG1, CPU Feature: SHA
func (Uint32x4) SHA1Message2 ¶
SHA1Message2 does the calculation of 3 and 4 in SHA1 algorithm defined in FIPS 180-4. x = result of 2. y = {W15, W14, W13} result = {W19, W18, W17, W16}
Asm: SHA1MSG2, CPU Feature: SHA
func (Uint32x4) SHA1NextE ¶
SHA1NextE calculates the state variable e' updated after 4 rounds in the SHA1 algorithm defined in FIPS 180-4. x contains the state variable a (before the 4 rounds), placed in the upper element. y is the elements of the W array for the next 4 rounds from upper to lower order. result = the elements of the W array for the next 4 rounds, with the updated state variable e' added to the upper element, from upper to lower order. For the last round of the loop, you can specify zero for y to obtain the e' value itself or, better, specify H4:0:0:0 for y to get e' added to H4. (Note that the value of e' is computed only from x; the values of y don't affect the computation of e'.)
Asm: SHA1NEXTE, CPU Feature: SHA
func (Uint32x4) SHA256Message1 ¶
SHA256Message1 does the sigma and addition of 1 in the SHA256 algorithm defined in FIPS 180-4. x = {W0, W1, W2, W3} y = {W4, 0, 0, 0} result = {W0+σ(W1), W1+σ(W2), W2+σ(W3), W3+σ(W4)}
Asm: SHA256MSG1, CPU Feature: SHA
func (Uint32x4) SHA256Message2 ¶
SHA256Message2 does the sigma and addition of 3 in the SHA256 algorithm defined in FIPS 180-4. x = result of 2 y = {0, 0, W14, W15} result = {W16, W17, W18, W19}
Asm: SHA256MSG2, CPU Feature: SHA
func (Uint32x4) SHA256TwoRounds ¶
SHA256TwoRounds does 2 rounds of the B loop to calculate updated state variables in the SHA256 algorithm defined in FIPS 180-4. x = {h, g, d, c} y = {f, e, b, a} z = {W0+K0, W1+K1} result = {f', e', b', a'} The K array is a 64-DWORD constant array defined on page 11 of FIPS 180-4. Each element of the K array is to be added to the corresponding element of the W array to make the input data z. The updated state variables c', d', g', h' are not returned by this instruction, because they are equal to the input data y (the state variables a, b, e, f before the 2 rounds).
Asm: SHA256RNDS2, CPU Feature: SHA
func (Uint32x4) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512
func (Uint32x4) SaturateToUint16Concat ¶
SaturateToUint16Concat converts element values to uint16. With each 128-bit as a group: The converted group from the first input vector will be packed to the lower part of the result vector, the converted group from the second input vector will be packed to the upper part of the result vector. Conversion is done with saturation on the vector elements.
Asm: VPACKUSDW, CPU Feature: AVX
func (Uint32x4) SelectFromPair ¶
SelectFromPair returns the selection of four elements from the two vectors x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the lowest element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example, {1,2,4,8}.SelectFromPair(2,3,5,7,{9,25,49,81}) returns {4,8,25,81}
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
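The selector rule and the example above can be checked with a scalar model. This is a sketch only: selectFromPair is a hypothetical helper using arrays in place of Uint32x4 values, and masking each selector to 0-7 is an assumption of the model.

```go
package main

import "fmt"

// selectFromPair models the selection: selectors 0-3 pick x[0..3] and
// selectors 4-7 pick y[0..3]. a chooses output lane 0, b lane 1, and so on.
// Masking selectors to 0-7 is an assumption of this sketch.
func selectFromPair(x, y [4]uint32, a, b, c, d uint8) [4]uint32 {
	pick := func(i uint8) uint32 {
		i &= 7
		if i < 4 {
			return x[i]
		}
		return y[i-4]
	}
	return [4]uint32{pick(a), pick(b), pick(c), pick(d)}
}

func main() {
	x := [4]uint32{1, 2, 4, 8}
	y := [4]uint32{9, 25, 49, 81}
	fmt.Println(selectFromPair(x, y, 2, 3, 5, 7)) // [4 8 25 81]
}
```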
func (Uint32x4) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRD, CPU Feature: AVX
func (Uint32x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX
func (Uint32x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX
func (Uint32x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2
func (Uint32x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x4) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2
func (Uint32x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x4) StoreMasked ¶
StoreMasked stores a Uint32x4 to an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Uint32x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 uint32s
func (Uint32x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Uint32x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX
func (Uint32x4) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX
func (Uint32x4) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
func (Uint32x4) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint32x8 ¶
type Uint32x8 struct {
// contains filtered or unexported fields
}
Uint32x8 is a 256-bit SIMD vector of 8 uint32
func BroadcastUint32x8 ¶
BroadcastUint32x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadMaskedUint32x8 ¶
LoadMaskedUint32x8 loads a Uint32x8 from an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func LoadUint32x8 ¶
LoadUint32x8 loads a Uint32x8 from an array
func LoadUint32x8Slice ¶
LoadUint32x8Slice loads a Uint32x8 from a slice of at least 8 uint32s
func LoadUint32x8SlicePart ¶
LoadUint32x8SlicePart loads a Uint32x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint32x8Slice.
func (Uint32x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDD, CPU Feature: AVX2
func (Uint32x8) AddPairs ¶
AddPairs horizontally adds adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
Asm: VPHADDD, CPU Feature: AVX2
func (Uint32x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint32x8) AsFloat32x8 ¶
AsFloat32x8 converts from Uint32x8 to Float32x8
func (Uint32x8) AsFloat64x4 ¶
AsFloat64x4 converts from Uint32x8 to Float64x4
func (Uint32x8) AsInt16x16 ¶
AsInt16x16 converts from Uint32x8 to Int16x16
func (Uint32x8) AsUint16x16 ¶
AsUint16x16 converts from Uint32x8 to Uint16x16
func (Uint32x8) AsUint64x4 ¶
AsUint64x4 converts from Uint32x8 to Uint64x4
func (Uint32x8) AsUint8x32 ¶
AsUint8x32 converts from Uint32x8 to Uint8x32
func (Uint32x8) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lower-indexed elements of the result.
Asm: VPCOMPRESSD, CPU Feature: AVX512
func (Uint32x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the bits needed to index xy are used from each element of indices.
Asm: VPERMI2D, CPU Feature: AVX512
func (Uint32x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUDQ2PS, CPU Feature: AVX512
func (Uint32x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUDQ2PD, CPU Feature: AVX512
func (Uint32x8) Expand ¶
Expand takes a vector x whose elements are packed into its lower lanes and distributes them, in order, to the lanes selected by mask, from the lowest selected lane to the highest.
Asm: VPEXPANDD, CPU Feature: AVX512
func (Uint32x8) ExtendToUint64 ¶
ExtendToUint64 converts element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXDQ, CPU Feature: AVX512
func (Uint32x8) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX2
func (Uint32x8) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Uint32x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHDQ, CPU Feature: AVX2
func (Uint32x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLDQ, CPU Feature: AVX2
func (Uint32x8) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint32x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTD, CPU Feature: AVX512
func (Uint32x8) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Uint32x8) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Uint32x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUD, CPU Feature: AVX2
func (Uint32x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUD, CPU Feature: AVX2
func (Uint32x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLD, CPU Feature: AVX2
func (Uint32x8) MulEvenWiden ¶
MulEvenWiden multiplies the even-indexed elements of x and y, widening each product to twice the element width: result[i] = x[2*i] * y[2*i].
Asm: VPMULUDQ, CPU Feature: AVX2
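As a scalar sketch of the widening multiply (mulEvenWiden is a hypothetical helper, and the arrays stand in for the vector types), each even-indexed pair of 32-bit elements yields one full 64-bit product, so no high bits are lost:

```go
package main

import "fmt"

// mulEvenWiden is a hypothetical scalar model of MulEvenWiden on
// 8 uint32 elements. Only elements 0, 2, 4, and 6 take part; each
// product is computed in 64 bits.
func mulEvenWiden(x, y [8]uint32) [4]uint64 {
	var out [4]uint64
	for i := range out {
		out[i] = uint64(x[2*i]) * uint64(y[2*i])
	}
	return out
}

func main() {
	x := [8]uint32{0xFFFFFFFF, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint32{0xFFFFFFFF, 9, 10, 9, 100, 9, 1000, 9}
	fmt.Printf("%#x\n", mulEvenWiden(x, y))
}
```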
func (Uint32x8) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Uint32x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTD, CPU Feature: AVX512VPOPCNTDQ
func (Uint32x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint32x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMD, CPU Feature: AVX2
func (Uint32x8) PermuteScalarsGrouped ¶
PermuteScalarsGrouped performs a grouped permutation of vector x using the supplied indices:
result = {x[a], x[b], x[c], x[d], x[a+4], x[b+4], x[c+4], x[d+4]}
Parameters a,b,c,d should have values between 0 and 3. If a through d are constants, then an instruction will be inlined, otherwise a jump table is generated.
Asm: VPSHUFD, CPU Feature: AVX2
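The result formula above can be written out as a scalar model. This is a sketch under assumptions: permuteScalarsGrouped is a hypothetical helper, and it expects a through d to be in the range 0-3 (out-of-range values make the sketch index out of bounds, which says nothing about the real method's behavior).

```go
package main

import "fmt"

// permuteScalarsGrouped is a hypothetical scalar model of
// PermuteScalarsGrouped on 8 uint32 elements. The same four selectors
// are applied to each 128-bit group (elements 0-3 and elements 4-7).
// Assumption: a, b, c, d are in the range 0-3.
func permuteScalarsGrouped(x [8]uint32, a, b, c, d uint8) [8]uint32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]uint32
	for i, s := range sel {
		out[i] = x[s]
		out[i+4] = x[s+4]
	}
	return out
}

func main() {
	x := [8]uint32{10, 11, 12, 13, 14, 15, 16, 17}
	// Reverse the elements within each 128-bit group.
	fmt.Println(permuteScalarsGrouped(x, 3, 2, 1, 0)) // [13 12 11 10 17 16 15 14]
}
```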
func (Uint32x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPROLD, CPU Feature: AVX512
func (Uint32x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPRORD, CPU Feature: AVX512
func (Uint32x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVD, CPU Feature: AVX512
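A per-element rotate can be sketched with math/bits. rotateLeft8 is a hypothetical helper, and reducing each count modulo 32 is an assumption based on the behavior of the underlying instruction, not something stated above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateLeft8 is a hypothetical scalar model of RotateLeft on 8 uint32
// elements: element i of x is rotated left by element i of y.
// Assumption: only the count modulo 32 matters.
func rotateLeft8(x, y [8]uint32) [8]uint32 {
	var out [8]uint32
	for i := range out {
		out[i] = bits.RotateLeft32(x[i], int(y[i]&31))
	}
	return out
}

func main() {
	x := [8]uint32{0x80000001, 0x80000001, 0x80000001, 0x80000001, 1, 1, 1, 1}
	y := [8]uint32{0, 1, 4, 31, 0, 1, 32, 33}
	fmt.Printf("%#x\n", rotateLeft8(x, y))
}
```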
func (Uint32x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVD, CPU Feature: AVX512
func (Uint32x8) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSDW, CPU Feature: AVX512
func (Uint32x8) SaturateToUint16Concat ¶
SaturateToUint16Concat converts element values to uint16 with saturation. Treating each 128 bits as a group, the converted group from the first input vector is packed into the lower part of the result, and the converted group from the second input vector is packed into the upper part.
Asm: VPACKUSDW, CPU Feature: AVX2
func (Uint32x8) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 42, 43, 50, 51, 52, 53}.Select128FromPair(3, 0, {60, 61, 62, 63, 70, 71, 72, 73})
returns {70, 71, 72, 73, 40, 41, 42, 43}.
lo and hi result in better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
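The example above can be reproduced with a scalar model. select128FromPair is a hypothetical helper written for illustration; it numbers the four 128-bit elements as x low, x high, y low, y high, which is the ordering the documented example implies.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of Select128FromPair
// on 8 uint32 elements. The four 128-bit elements are numbered
// 0: x low, 1: x high, 2: y low, 3: y high.
// lo picks the low half of the result and hi picks the high half.
func select128FromPair(x [8]uint32, lo, hi uint8, y [8]uint32) [8]uint32 {
	var quads [4][4]uint32
	copy(quads[0][:], x[:4])
	copy(quads[1][:], x[4:])
	copy(quads[2][:], y[:4])
	copy(quads[3][:], y[4:])
	var out [8]uint32
	copy(out[:4], quads[lo][:])
	copy(out[4:], quads[hi][:])
	return out
}

func main() {
	x := [8]uint32{40, 41, 42, 43, 50, 51, 52, 53}
	y := [8]uint32{60, 61, 62, 63, 70, 71, 72, 73}
	fmt.Println(select128FromPair(x, 3, 0, y)) // [70 71 72 73 40 41 42 43]
}
```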
func (Uint32x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of four elements from x and y, where selector values in the range 0-3 specify elements from x and values in the range 4-7 specify the 0-3 elements of y. When the selectors are constants and the selection can be implemented in a single instruction, it will be; otherwise it requires two. a is the source index of the least element in the output, and b, c, and d are the indices of the 2nd, 3rd, and 4th elements in the output. For example,
{1,2,4,8,16,32,64,128}.SelectFromPairGrouped(2,3,5,7,{9,25,49,81,121,169,225,289})
returns {4,8,25,81,64,128,169,289}.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPS, CPU Feature: AVX
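A scalar model reproduces the documented example. selectFromPairGrouped is a hypothetical helper, and it assumes the selectors are in the range 0-7.

```go
package main

import "fmt"

// selectFromPairGrouped is a hypothetical scalar model of
// SelectFromPairGrouped on 8 uint32 elements. Within each 128-bit half,
// selectors 0-3 pick from x and selectors 4-7 pick elements 0-3 of y.
// Assumption: a, b, c, d are in the range 0-7.
func selectFromPairGrouped(x [8]uint32, a, b, c, d uint8, y [8]uint32) [8]uint32 {
	sel := [4]uint8{a, b, c, d}
	var out [8]uint32
	for half := 0; half < 8; half += 4 {
		for i, s := range sel {
			if s < 4 {
				out[half+i] = x[half+int(s)]
			} else {
				out[half+i] = y[half+int(s-4)]
			}
		}
	}
	return out
}

func main() {
	x := [8]uint32{1, 2, 4, 8, 16, 32, 64, 128}
	y := [8]uint32{9, 25, 49, 81, 121, 169, 225, 289}
	fmt.Println(selectFromPairGrouped(x, 2, 3, 5, 7, y)) // [4 8 25 81 64 128 169 289]
}
```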
func (Uint32x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint32x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint32x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLD, CPU Feature: AVX2
func (Uint32x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDD, CPU Feature: AVX512VBMI2
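The concatenating shift is sometimes called a funnel shift. The following scalar sketch uses a hypothetical helper, shiftAllLeftConcat, to show how the upper bits of y fill the bits vacated in x; the argument order is an assumption made for the example.

```go
package main

import "fmt"

// shiftAllLeftConcat is a hypothetical scalar model of ShiftAllLeftConcat
// on 8 uint32 elements. Only the low 5 bits of shift are used.
// Go defines y>>32 as 0 for a uint32, so a shift of 0 returns x unchanged.
func shiftAllLeftConcat(x [8]uint32, shift uint8, y [8]uint32) [8]uint32 {
	s := shift & 31
	var out [8]uint32
	for i := range out {
		out[i] = x[i]<<s | y[i]>>(32-s)
	}
	return out
}

func main() {
	x := [8]uint32{0x000000FF, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint32{0xAB000000, 0, 0, 0, 0, 0, 0, 0}
	fmt.Printf("%#x\n", shiftAllLeftConcat(x, 8, y)[0]) // 0xffab
}
```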
func (Uint32x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLD, CPU Feature: AVX2
func (Uint32x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDD, CPU Feature: AVX512VBMI2
func (Uint32x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVD, CPU Feature: AVX2
func (Uint32x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVD, CPU Feature: AVX512VBMI2
func (Uint32x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVD, CPU Feature: AVX2
func (Uint32x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVD, CPU Feature: AVX512VBMI2
func (Uint32x8) StoreMasked ¶
StoreMasked stores a Uint32x8 to an array, at those elements enabled by mask
Asm: VMASKMOVD, CPU Feature: AVX2
func (Uint32x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint32s
func (Uint32x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
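In scalar terms this behaves much like the built-in copy. The sketch below uses a hypothetical helper, storeSlicePart, and assumes that a short slice receives the lowest-indexed elements of x.

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model of StoreSlicePart.
// copy writes min(len(s), 8) elements, so a short slice receives only
// the lowest-indexed elements of x (an assumption about which elements
// are stored), and a slice of 8 or more elements receives all of them.
func storeSlicePart(x [8]uint32, s []uint32) {
	copy(s, x[:])
}

func main() {
	x := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}
	short := make([]uint32, 3)
	storeSlicePart(x, short)
	fmt.Println(short) // [1 2 3]
}
```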
func (Uint32x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBD, CPU Feature: AVX2
func (Uint32x8) SubPairs ¶
SubPairs horizontally subtracts adjacent pairs of elements. For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Asm: VPHSUBD, CPU Feature: AVX2
func (Uint32x8) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVDW, CPU Feature: AVX512
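The difference between TruncateToUint16 and SaturateToUint16 shows up only for values that do not fit in 16 bits. A per-element scalar sketch, using hypothetical helper names:

```go
package main

import "fmt"

// saturateToUint16 models one element of SaturateToUint16: values above
// the uint16 range clamp to the maximum uint16 value.
func saturateToUint16(v uint32) uint16 {
	if v > 0xFFFF {
		return 0xFFFF
	}
	return uint16(v)
}

// truncateToUint16 models one element of TruncateToUint16: the upper
// 16 bits are discarded.
func truncateToUint16(v uint32) uint16 {
	return uint16(v)
}

func main() {
	v := uint32(0x12345)
	fmt.Printf("%#x %#x\n", saturateToUint16(v), truncateToUint16(v)) // 0xffff 0x2345
}
```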
func (Uint32x8) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVDB, CPU Feature: AVX512
type Uint64x2 ¶
type Uint64x2 struct {
// contains filtered or unexported fields
}
Uint64x2 is a 128-bit SIMD vector of 2 uint64
func BroadcastUint64x2 ¶
BroadcastUint64x2 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadMaskedUint64x2 ¶
LoadMaskedUint64x2 loads a Uint64x2 from an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func LoadUint64x2 ¶
LoadUint64x2 loads a Uint64x2 from an array
func LoadUint64x2Slice ¶
LoadUint64x2Slice loads a Uint64x2 from a slice of at least 2 uint64s.
func LoadUint64x2SlicePart ¶
LoadUint64x2SlicePart loads a Uint64x2 from the slice s. If s has fewer than 2 elements, the remaining elements of the vector are filled with zeroes. If s has 2 or more elements, the function is equivalent to LoadUint64x2Slice.
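The zero-filling behavior can be sketched in scalar Go. loadSlicePart is a hypothetical helper, with a [2]uint64 standing in for the vector type.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of LoadUint64x2SlicePart.
// Elements that s does not provide stay zero; extra elements of s are
// ignored.
func loadSlicePart(s []uint64) [2]uint64 {
	var v [2]uint64
	copy(v[:], s)
	return v
}

func main() {
	fmt.Println(loadSlicePart([]uint64{7}))       // [7 0]
	fmt.Println(loadSlicePart([]uint64{7, 8, 9})) // [7 8]
}
```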
func (Uint64x2) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint64x2) AsFloat32x4 ¶
AsFloat32x4 converts from Uint64x2 to Float32x4.
func (Uint64x2) AsFloat64x2 ¶
AsFloat64x2 converts from Uint64x2 to Float64x2.
func (Uint64x2) AsUint16x8 ¶
AsUint16x8 converts from Uint64x2 to Uint16x8.
func (Uint64x2) AsUint32x4 ¶
AsUint32x4 converts from Uint64x2 to Uint32x4.
func (Uint64x2) AsUint8x16 ¶
AsUint8x16 converts from Uint64x2 to Uint8x16.
func (Uint64x2) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Uint64x2) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX2
func (Uint64x2) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTQ, CPU Feature: AVX512
func (Uint64x2) CarrylessMultiply ¶
CarrylessMultiply computes one of four possible carryless multiplications of selected high and low halves of x and y, depending on the values of a and b, returning the 128-bit product in the concatenated two elements of the result. a selects the low (0) or high (1) element of x and b selects the low (0) or high (1) element of y.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX
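The arithmetic described above can be checked with a bit-by-bit scalar model. This is a sketch, not the package's implementation: clmul64 and carrylessMultiply are hypothetical helpers, and placing the low 64 bits of the product in element 0 of the result is an assumption.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carryless product of a and b as (lo, hi).
// For every set bit i of b, a shifted left by i is XORed into the
// result instead of being added.
func clmul64(a, b uint64) (lo, hi uint64) {
	for i := 0; i < 64; i++ {
		if (b>>i)&1 == 1 {
			lo ^= a << i
			if i > 0 {
				hi ^= a >> (64 - i)
			}
		}
	}
	return lo, hi
}

// carrylessMultiply is a hypothetical scalar model of CarrylessMultiply:
// a selects the element of x and b selects the element of y (0 = low,
// 1 = high). Assumption: the low 64 bits of the product go in element 0.
func carrylessMultiply(x [2]uint64, a, b uint8, y [2]uint64) [2]uint64 {
	lo, hi := clmul64(x[a], y[b])
	return [2]uint64{lo, hi}
}

func main() {
	// Binary 11 * 11 = 101, that is, 3 * 3 = 5.
	lo, hi := clmul64(3, 3)
	fmt.Println(lo, hi) // 5 0
}
```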
func (Uint64x2) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Uint64x2) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only as many low bits of each element of indices as are needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Uint64x2) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSX, CPU Feature: AVX512
func (Uint64x2) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x2) Expand ¶
Expand takes the elements packed into the low-indexed positions of x and distributes them, in order, to the positions selected by mask, from the lowest selected position to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Uint64x2) GetElem ¶
GetElem retrieves the value of the single element selected by index.
index results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPEXTRQ, CPU Feature: AVX
func (Uint64x2) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX
func (Uint64x2) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX
func (Uint64x2) InterleaveHi ¶
InterleaveHi interleaves the elements of the high halves of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX
func (Uint64x2) InterleaveLo ¶
InterleaveLo interleaves the elements of the low halves of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX
func (Uint64x2) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint64x2) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x2) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX
func (Uint64x2) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX
func (Uint64x2) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x2) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x2) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x2) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Uint64x2) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x2) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint64x2) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x2) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x2) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Uint64x2) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x2) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x2) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512
func (Uint64x2) SelectFromPair ¶
SelectFromPair returns the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Uint64x2) SetElem ¶
SetElem sets the value of the single element selected by index.
index results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPINSRQ, CPU Feature: AVX
func (Uint64x2) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX
func (Uint64x2) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX
func (Uint64x2) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Uint64x2) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x2) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2
func (Uint64x2) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x2) StoreMasked ¶
StoreMasked stores a Uint64x2 to an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Uint64x2) StoreSlice ¶
StoreSlice stores x into a slice of at least 2 uint64s
func (Uint64x2) StoreSlicePart ¶
StoreSlicePart stores the 2 elements of x into the slice s. It stores as many elements as will fit in s. If s has 2 or more elements, the method is equivalent to x.StoreSlice.
func (Uint64x2) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX
func (Uint64x2) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x2) TruncateToUint32 ¶
TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x2) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint64x4 ¶
type Uint64x4 struct {
// contains filtered or unexported fields
}
Uint64x4 is a 256-bit SIMD vector of 4 uint64
func BroadcastUint64x4 ¶
BroadcastUint64x4 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadMaskedUint64x4 ¶
LoadMaskedUint64x4 loads a Uint64x4 from an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func LoadUint64x4 ¶
LoadUint64x4 loads a Uint64x4 from an array
func LoadUint64x4Slice ¶
LoadUint64x4Slice loads a Uint64x4 from a slice of at least 4 uint64s.
func LoadUint64x4SlicePart ¶
LoadUint64x4SlicePart loads a Uint64x4 from the slice s. If s has fewer than 4 elements, the remaining elements of the vector are filled with zeroes. If s has 4 or more elements, the function is equivalent to LoadUint64x4Slice.
func (Uint64x4) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX2
func (Uint64x4) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint64x4) AsFloat32x8 ¶
AsFloat32x8 converts from Uint64x4 to Float32x8.
func (Uint64x4) AsFloat64x4 ¶
AsFloat64x4 converts from Uint64x4 to Float64x4.
func (Uint64x4) AsInt16x16 ¶
AsInt16x16 converts from Uint64x4 to Int16x16.
func (Uint64x4) AsUint16x16 ¶
AsUint16x16 converts from Uint64x4 to Uint16x16.
func (Uint64x4) AsUint32x8 ¶
AsUint32x8 converts from Uint64x4 to Uint32x8.
func (Uint64x4) AsUint8x32 ¶
AsUint8x32 converts from Uint64x4 to Uint8x32.
func (Uint64x4) CarrylessMultiplyGrouped ¶
CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the two 128-bit lanes of x and y, depending on the values of a and b, and returns the two 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
func (Uint64x4) Compress ¶
Compress selects the elements of x indicated by mask and packs them into the lowest-indexed elements of the result.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
func (Uint64x4) ConcatPermute ¶
ConcatPermute performs a full permutation of vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only as many low bits of each element of indices as are needed to index xy are used.
Asm: VPERMI2Q, CPU Feature: AVX512
func (Uint64x4) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PSY, CPU Feature: AVX512
func (Uint64x4) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x4) Expand ¶
Expand takes the elements packed into the low-indexed positions of x and distributes them, in order, to the positions selected by mask, from the lowest selected position to the highest.
Asm: VPEXPANDQ, CPU Feature: AVX512
func (Uint64x4) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX2
func (Uint64x4) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Uint64x4) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX2
func (Uint64x4) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX2
func (Uint64x4) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y
Asm: VPTEST, CPU Feature: AVX
func (Uint64x4) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x4) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Uint64x4) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Uint64x4) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x4) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x4) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x4) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Uint64x4) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x4) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint64x4) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 2 bits (values 0-3) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
func (Uint64x4) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x4) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x4) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Uint64x4) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x4) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x4) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512
func (Uint64x4) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{40, 41, 50, 51}.Select128FromPair(3, 0, {60, 61, 70, 71})
returns {70, 71, 40, 41}.
lo and hi result in better performance when they are constants; non-constant values are translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
func (Uint64x4) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the two 128-bit halves of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX
func (Uint64x4) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint64x4) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint64x4) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX2
func (Uint64x4) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX2
func (Uint64x4) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 6 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it is a constant; a non-constant value is translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX2
func (Uint64x4) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x4) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX2
func (Uint64x4) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 6 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x4) StoreMasked ¶
StoreMasked stores a Uint64x4 to an array, at those elements enabled by mask
Asm: VMASKMOVQ, CPU Feature: AVX2
func (Uint64x4) StoreSlice ¶
StoreSlice stores x into a slice of at least 4 uint64s
func (Uint64x4) StoreSlicePart ¶
StoreSlicePart stores the 4 elements of x into the slice s. It stores as many elements as will fit in s. If s has 4 or more elements, the method is equivalent to x.StoreSlice.
func (Uint64x4) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX2
func (Uint64x4) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x4) TruncateToUint32 ¶
TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x4) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed into the low elements of the returned vector; its upper elements are zeroed.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint64x8 ¶
type Uint64x8 struct {
// contains filtered or unexported fields
}
Uint64x8 is a 512-bit SIMD vector of 8 uint64
func BroadcastUint64x8 ¶
BroadcastUint64x8 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512F
func LoadMaskedUint64x8 ¶
LoadMaskedUint64x8 loads a Uint64x8 from an array, at those elements enabled by mask
Asm: VMOVDQU64.Z, CPU Feature: AVX512
func LoadUint64x8 ¶
LoadUint64x8 loads a Uint64x8 from an array
func LoadUint64x8Slice ¶
LoadUint64x8Slice loads a Uint64x8 from a slice of at least 8 uint64s.
func LoadUint64x8SlicePart ¶
LoadUint64x8SlicePart loads a Uint64x8 from the slice s. If s has fewer than 8 elements, the remaining elements of the vector are filled with zeroes. If s has 8 or more elements, the function is equivalent to LoadUint64x8Slice.
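The zero-fill rule for partial loads can be modeled on plain arrays. The following is a scalar sketch only: loadSlicePart is a hypothetical helper that works on a [8]uint64 stand-in for the vector, and it is not how the package implements the load.

```go
package main

import "fmt"

// loadSlicePart is a hypothetical scalar model of LoadUint64x8SlicePart:
// it copies up to 8 elements from s and leaves the remaining lanes zero.
func loadSlicePart(s []uint64) [8]uint64 {
	var lanes [8]uint64
	copy(lanes[:], s) // copy stops at min(len(s), 8)
	return lanes
}

func main() {
	fmt.Println(loadSlicePart([]uint64{1, 2, 3})) // prints "[1 2 3 0 0 0 0 0]"
}
```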
func (Uint64x8) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDQ, CPU Feature: AVX512
func (Uint64x8) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDQ, CPU Feature: AVX512
func (Uint64x8) AsFloat32x16 ¶
func (from Uint64x8) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Uint64x8 to Float32x16
func (Uint64x8) AsFloat64x8 ¶
AsFloat64x8 converts from Uint64x8 to Float64x8
func (Uint64x8) AsInt16x32 ¶
AsInt16x32 converts from Uint64x8 to Int16x32
func (Uint64x8) AsInt32x16 ¶
AsInt32x16 converts from Uint64x8 to Int32x16
func (Uint64x8) AsUint16x32 ¶
AsUint16x32 converts from Uint64x8 to Uint16x32
func (Uint64x8) AsUint32x16 ¶
AsUint32x16 converts from Uint64x8 to Uint32x16
func (Uint64x8) AsUint8x64 ¶
AsUint8x64 converts from Uint64x8 to Uint8x64
func (Uint64x8) CarrylessMultiplyGrouped ¶
CarrylessMultiplyGrouped computes one of four possible carryless multiplications of selected high and low halves of each of the four 128-bit lanes of x and y, depending on the values of a and b, and returns the four 128-bit products in the result's lanes. a selects the low (0) or high (1) elements of x's lanes and b selects the low (0) or high (1) elements of y's lanes.
A carryless multiplication uses bitwise XOR instead of add-with-carry, for example (in base two): 11 * 11 = 11 * (10 ^ 1) = (11 * 10) ^ (11 * 1) = 110 ^ 11 = 101
This also models multiplication of polynomials with coefficients from GF(2) -- 11 * 11 models (x+1)*(x+1) = x**2 + (1^1)x + 1 = x**2 + 0x + 1 = x**2 + 1 modeled by 101. (Note that "+" adds polynomial terms, but coefficients "add" with XOR.)
Constant values of a and b result in better performance; otherwise the intrinsic may translate into a jump table.
Asm: VPCLMULQDQ, CPU Feature: AVX512VPCLMULQDQ
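The XOR-based arithmetic described above can be checked with a scalar model of a single 64-bit by 64-bit carryless product. This is a minimal sketch, not the package's implementation: clmul64 is a hypothetical helper, and the selection of halves by a and b is not modeled.

```go
package main

import "fmt"

// clmul64 is a hypothetical scalar model of one carryless multiplication.
// It multiplies a and b as polynomials over GF(2) and returns the 128-bit
// product as (hi, lo).
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if b&(1<<i) == 0 {
			continue
		}
		// XOR in a shifted left by i bits, split across the two halves.
		lo ^= a << i
		if i > 0 {
			hi ^= a >> (64 - i)
		}
	}
	return hi, lo
}

func main() {
	hi, lo := clmul64(0b11, 0b11)
	fmt.Printf("%b %b\n", hi, lo) // prints "0 101"
}
```

The printed value matches the worked example in the text: 11 * 11 = 101 in base two.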
func (Uint64x8) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSQ, CPU Feature: AVX512
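A scalar model may help picture the packing. This sketch uses a [8]uint64 and a [8]bool as stand-ins for the vector and mask types; compress is a hypothetical helper. It leaves the unused upper lanes zero, which is an assumption of the sketch because the text does not say what the method places there.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress on 8 lanes: lanes whose
// mask entry is true are packed, in order, into the lowest result lanes.
// Unused upper lanes are left zero (an assumption of this sketch).
func compress(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	n := 0
	for i, keep := range mask {
		if keep {
			out[n] = x[i]
			n++
		}
	}
	return out
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	mask := [8]bool{false, true, false, true, false, false, false, true}
	fmt.Println(compress(x, mask)) // prints "[11 13 17 0 0 0 0 0]"
}
```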
func (Uint64x8) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2Q, CPU Feature: AVX512
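The indexing rule can be sketched with plain arrays. In this hypothetical scalar model, concatPermute treats x and y as one 16-entry table, so 4 index bits are needed; the array types stand in for the vector types and are not the package's API.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute on 8 lanes.
// Indices 0-7 select from x (lower half) and 8-15 select from y (upper half).
// Only the low 4 bits of each index are used.
func concatPermute(x, y, indices [8]uint64) [8]uint64 {
	var out [8]uint64
	for i, idx := range indices {
		idx &= 15
		if idx < 8 {
			out[i] = x[idx]
		} else {
			out[i] = y[idx-8]
		}
	}
	return out
}

func main() {
	x := [8]uint64{0, 1, 2, 3, 4, 5, 6, 7}
	y := [8]uint64{100, 101, 102, 103, 104, 105, 106, 107}
	idx := [8]uint64{0, 8, 1, 9, 2, 10, 3, 11}
	fmt.Println(concatPermute(x, y, idx)) // prints "[0 100 1 101 2 102 3 103]"
}
```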
func (Uint64x8) ConvertToFloat32 ¶
ConvertToFloat32 converts element values to float32.
Asm: VCVTUQQ2PS, CPU Feature: AVX512
func (Uint64x8) ConvertToFloat64 ¶
ConvertToFloat64 converts element values to float64.
Asm: VCVTUQQ2PD, CPU Feature: AVX512
func (Uint64x8) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDQ, CPU Feature: AVX512
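Expansion is the reverse of packing, which a scalar model makes concrete. The sketch below uses a [8]uint64 and a [8]bool as stand-ins for the vector and mask types; expand is a hypothetical helper, and zeroing the unselected lanes is an assumption of the sketch.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand on 8 lanes: consecutive low
// lanes of x are written, in order, to the lanes whose mask entry is true.
// Lanes with a false mask entry are left zero (an assumption of this sketch).
func expand(x [8]uint64, mask [8]bool) [8]uint64 {
	var out [8]uint64
	n := 0
	for i, set := range mask {
		if set {
			out[i] = x[n]
			n++
		}
	}
	return out
}

func main() {
	packed := [8]uint64{11, 13, 17}
	mask := [8]bool{false, true, false, true, false, false, false, true}
	fmt.Println(expand(packed, mask)) // prints "[0 11 0 13 0 0 0 17]"
}
```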
func (Uint64x8) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) InterleaveHiGrouped ¶
InterleaveHiGrouped interleaves the elements of the high half of each 128-bit subvector of x and y.
Asm: VPUNPCKHQDQ, CPU Feature: AVX512
func (Uint64x8) InterleaveLoGrouped ¶
InterleaveLoGrouped interleaves the elements of the low half of each 128-bit subvector of x and y.
Asm: VPUNPCKLQDQ, CPU Feature: AVX512
func (Uint64x8) LeadingZeros ¶
LeadingZeros counts the leading zeros of each element in x.
Asm: VPLZCNTQ, CPU Feature: AVX512
func (Uint64x8) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUQ, CPU Feature: AVX512
func (Uint64x8) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUQ, CPU Feature: AVX512
func (Uint64x8) Mul ¶
Mul multiplies corresponding elements of two vectors.
Asm: VPMULLQ, CPU Feature: AVX512
func (Uint64x8) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUQ, CPU Feature: AVX512
func (Uint64x8) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTQ, CPU Feature: AVX512VPOPCNTDQ
func (Uint64x8) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORQ, CPU Feature: AVX512
func (Uint64x8) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 3 bits (values 0-7) of each element of indices are used.
Asm: VPERMQ, CPU Feature: AVX512
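As a scalar illustration of the rule above, the following sketch permutes a [8]uint64 stand-in for the vector. permute is a hypothetical helper; it masks each index to its low 3 bits as the text describes.

```go
package main

import "fmt"

// permute is a hypothetical scalar model of Permute on 8 lanes.
// Only the low 3 bits of each index are used, so index 8 selects lane 0.
func permute(x, indices [8]uint64) [8]uint64 {
	var out [8]uint64
	for i, idx := range indices {
		out[i] = x[idx&7]
	}
	return out
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	idx := [8]uint64{7, 6, 5, 4, 3, 2, 1, 8}
	fmt.Println(permute(x, idx)) // prints "[17 16 15 14 13 12 11 10]"
}
```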
func (Uint64x8) RotateAllLeft ¶
RotateAllLeft rotates each element to the left by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPROLQ, CPU Feature: AVX512
func (Uint64x8) RotateAllRight ¶
RotateAllRight rotates each element to the right by the number of bits specified by the immediate.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPRORQ, CPU Feature: AVX512
func (Uint64x8) RotateLeft ¶
RotateLeft rotates each element in x to the left by the number of bits specified by y's corresponding elements.
Asm: VPROLVQ, CPU Feature: AVX512
func (Uint64x8) RotateRight ¶
RotateRight rotates each element in x to the right by the number of bits specified by y's corresponding elements.
Asm: VPRORVQ, CPU Feature: AVX512
func (Uint64x8) SaturateToUint16 ¶
SaturateToUint16 converts element values to uint16. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQW, CPU Feature: AVX512
func (Uint64x8) SaturateToUint32 ¶
SaturateToUint32 converts element values to uint32. Conversion is done with saturation on the vector elements.
Asm: VPMOVUSQD, CPU Feature: AVX512
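The difference between the SaturateTo and TruncateTo families shows up only for values that do not fit the narrower type. A per-element scalar sketch, with hypothetical helper names:

```go
package main

import (
	"fmt"
	"math"
)

// saturateToUint32 clamps values that do not fit in 32 bits to the maximum.
func saturateToUint32(v uint64) uint32 {
	if v > math.MaxUint32 {
		return math.MaxUint32
	}
	return uint32(v)
}

// truncateToUint32 keeps only the low 32 bits.
func truncateToUint32(v uint64) uint32 {
	return uint32(v)
}

func main() {
	v := uint64(1<<32 + 5)
	fmt.Println(saturateToUint32(v), truncateToUint32(v)) // prints "4294967295 5"
}
```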
func (Uint64x8) SelectFromPairGrouped ¶
SelectFromPairGrouped returns, for each of the four 128-bit subvectors of the vectors x and y, the selection of two elements from the two vectors x and y, where selector values in the range 0-1 specify elements from x and values in the range 2-3 specify the 0-1 elements of y. When the selectors are constants the selection can be implemented in a single instruction.
If the selectors are not constant this will translate to a function call.
Asm: VSHUFPD, CPU Feature: AVX512
func (Uint64x8) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint64x8) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint64x8) ShiftAllLeft ¶
ShiftAllLeft shifts each element to the left by the specified number of bits. Emptied lower bits are zeroed.
Asm: VPSLLQ, CPU Feature: AVX512
func (Uint64x8) ShiftAllLeftConcat ¶
ShiftAllLeftConcat shifts each element of x to the left by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the upper bits of y to the emptied lower bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHLDQ, CPU Feature: AVX512VBMI2
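The concatenating shifts are often called funnel shifts. The following per-element sketch models the left and right forms on uint64 values. The helper names are hypothetical, the shift count is assumed to be already reduced to the range 0-63, and the masking of the count is not modeled.

```go
package main

import "fmt"

// shiftLeftConcat shifts x left by s bits and fills the emptied low bits
// with the top s bits of y. s is assumed to be in the range 0-63.
func shiftLeftConcat(x, y uint64, s uint) uint64 {
	if s == 0 {
		return x
	}
	return x<<s | y>>(64-s)
}

// shiftRightConcat shifts x right by s bits and fills the emptied high bits
// with the low s bits of y. s is assumed to be in the range 0-63.
func shiftRightConcat(x, y uint64, s uint) uint64 {
	if s == 0 {
		return x
	}
	return x>>s | y<<(64-s)
}

func main() {
	fmt.Printf("%#x\n", shiftLeftConcat(0x1, 0x8000000000000000, 1)) // prints "0x3"
	fmt.Printf("%#x\n", shiftRightConcat(0x2, 0x1, 1))               // prints "0x8000000000000001"
}
```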
func (Uint64x8) ShiftAllRight ¶
ShiftAllRight shifts each element to the right by the specified number of bits. Emptied upper bits are zeroed.
Asm: VPSRLQ, CPU Feature: AVX512
func (Uint64x8) ShiftAllRightConcat ¶
ShiftAllRightConcat shifts each element of x to the right by the number of bits specified by the immediate (only the lower 5 bits are used), and then copies the lower bits of y to the emptied upper bits of the shifted x.
shift results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPSHRDQ, CPU Feature: AVX512VBMI2
func (Uint64x8) ShiftLeft ¶
ShiftLeft shifts each element in x to the left by the number of bits specified in y's corresponding elements. Emptied lower bits are zeroed.
Asm: VPSLLVQ, CPU Feature: AVX512
func (Uint64x8) ShiftLeftConcat ¶
ShiftLeftConcat shifts each element of x to the left by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the upper bits of z to the emptied lower bits of the shifted x.
Asm: VPSHLDVQ, CPU Feature: AVX512VBMI2
func (Uint64x8) ShiftRight ¶
ShiftRight shifts each element in x to the right by the number of bits specified in y's corresponding elements. Emptied upper bits are zeroed.
Asm: VPSRLVQ, CPU Feature: AVX512
func (Uint64x8) ShiftRightConcat ¶
ShiftRightConcat shifts each element of x to the right by the number of bits specified by the corresponding elements in y (only the lower 5 bits are used), and then copies the lower bits of z to the emptied upper bits of the shifted x.
Asm: VPSHRDVQ, CPU Feature: AVX512VBMI2
func (Uint64x8) StoreMasked ¶
StoreMasked stores a Uint64x8 to an array, at those elements enabled by mask
Asm: VMOVDQU64, CPU Feature: AVX512
func (Uint64x8) StoreSlice ¶
StoreSlice stores x into a slice of at least 8 uint64s
func (Uint64x8) StoreSlicePart ¶
StoreSlicePart stores the 8 elements of x into the slice s. It stores as many elements as will fit in s. If s has 8 or more elements, the method is equivalent to x.StoreSlice.
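The "as many elements as will fit" rule can be modeled with the built-in copy. This is a scalar sketch on a [8]uint64 stand-in; storeSlicePart is a hypothetical helper, and its returned count is an addition for illustration only.

```go
package main

import "fmt"

// storeSlicePart is a hypothetical scalar model of StoreSlicePart: it writes
// the leading lanes of x into s, stopping when s is full, and reports how
// many lanes were written (the return value is not part of the original API).
func storeSlicePart(x [8]uint64, s []uint64) int {
	return copy(s, x[:])
}

func main() {
	x := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	s := make([]uint64, 3)
	n := storeSlicePart(x, s)
	fmt.Println(n, s) // prints "3 [10 11 12]"
}
```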
func (Uint64x8) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBQ, CPU Feature: AVX512
func (Uint64x8) TruncateToUint16 ¶
TruncateToUint16 converts element values to uint16. Conversion is done with truncation on the vector elements.
Asm: VPMOVQW, CPU Feature: AVX512
func (Uint64x8) TruncateToUint32 ¶
TruncateToUint32 converts element values to uint32. Conversion is done with truncation on the vector elements.
Asm: VPMOVQD, CPU Feature: AVX512
func (Uint64x8) TruncateToUint8 ¶
TruncateToUint8 converts element values to uint8. Conversion is done with truncation on the vector elements. Results are packed to low elements in the returned vector, its upper elements are zero-cleared.
Asm: VPMOVQB, CPU Feature: AVX512
type Uint8x16 ¶
type Uint8x16 struct {
// contains filtered or unexported fields
}
Uint8x16 is a 128-bit SIMD vector of 16 uint8
func BroadcastUint8x16 ¶
BroadcastUint8x16 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadUint8x16 ¶
LoadUint8x16 loads a Uint8x16 from an array
func LoadUint8x16Slice ¶
LoadUint8x16Slice loads a Uint8x16 from a slice of at least 16 uint8s
func LoadUint8x16SlicePart ¶
LoadUint8x16SlicePart loads a Uint8x16 from the slice s. If s has fewer than 16 elements, the remaining elements of the vector are filled with zeroes. If s has 16 or more elements, the function is equivalent to LoadUint8x16Slice.
func (Uint8x16) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX, AES
func (Uint8x16) AESDecryptOneRound ¶
AESDecryptOneRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX, AES
func (Uint8x16) AESEncryptLastRound ¶
AESEncryptLastRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX, AES
func (Uint8x16) AESEncryptOneRound ¶
AESEncryptOneRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX, AES
func (Uint8x16) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX
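Saturation means the result is clamped to the element type's range instead of wrapping around. A per-element scalar sketch for uint8, with hypothetical helper names, covering both the addition here and its subtraction counterpart:

```go
package main

import "fmt"

// addSaturated adds two bytes and clamps the result to 255.
func addSaturated(a, b uint8) uint8 {
	sum := uint16(a) + uint16(b)
	if sum > 255 {
		return 255
	}
	return uint8(sum)
}

// subSaturated subtracts b from a and clamps the result to 0.
func subSaturated(a, b uint8) uint8 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	fmt.Println(addSaturated(200, 100), subSaturated(100, 200)) // prints "255 0"
}
```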
func (Uint8x16) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX
func (Uint8x16) AsFloat32x4 ¶
AsFloat32x4 converts from Uint8x16 to Float32x4
func (Uint8x16) AsFloat64x2 ¶
AsFloat64x2 converts from Uint8x16 to Float64x2
func (Uint8x16) AsUint16x8 ¶
AsUint16x8 converts from Uint8x16 to Uint16x8
func (Uint8x16) AsUint32x4 ¶
AsUint32x4 converts from Uint8x16 to Uint32x4
func (Uint8x16) AsUint64x2 ¶
AsUint64x2 converts from Uint8x16 to Uint64x2
func (Uint8x16) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX
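"Rounded average" here follows the VPAVGB definition, (a + b + 1) >> 1, computed without overflow. A per-element scalar sketch, with a hypothetical helper name:

```go
package main

import "fmt"

// average returns the average of a and b, rounded up on ties, using a wider
// intermediate so the sum cannot overflow.
func average(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(average(1, 2), average(255, 255)) // prints "2 255"
}
```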
func (Uint8x16) Broadcast128 ¶
Broadcast128 copies element zero of its (128-bit) input to all elements of the 128-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Uint8x16) Broadcast256 ¶
Broadcast256 copies element zero of its (128-bit) input to all elements of the 256-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX2
func (Uint8x16) Broadcast512 ¶
Broadcast512 copies element zero of its (128-bit) input to all elements of the 512-bit output vector.
Asm: VPBROADCASTB, CPU Feature: AVX512
func (Uint8x16) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Uint8x16) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Uint8x16) ConcatShiftBytesRight ¶
ConcatShiftBytesRight concatenates x and y and shifts the result right by a constant number of bytes. The result vector is the lower half of the shifted concatenation.
constant results in better performance when its value is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX
func (Uint8x16) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX
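A scalar model of the pairwise multiply-add may clarify the shape of the result. This sketch follows the VPMADDUBSW instruction, which multiplies unsigned bytes by signed bytes and saturates each pair sum to a signed 16-bit value. The array types and the signedness of the second operand and result are assumptions taken from the instruction, not from the method signature, and dotProductPairsSaturated is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// dotProductPairsSaturated is a hypothetical scalar model: each output lane is
// x[2i]*y[2i] + x[2i+1]*y[2i+1], clamped to the int16 range.
func dotProductPairsSaturated(x [16]uint8, y [16]int8) [8]int16 {
	var out [8]int16
	for i := range out {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		}
		if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	x := [16]uint8{2, 4, 255, 255}
	y := [16]int8{3, -5, 127, 127}
	out := dotProductPairsSaturated(x, y)
	fmt.Println(out[:2]) // prints "[-14 32767]"
}
```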
func (Uint8x16) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Uint8x16) ExtendLo2ToUint64x2 ¶
ExtendLo2ToUint64x2 converts 2 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX
func (Uint8x16) ExtendLo4ToUint32x4 ¶
ExtendLo4ToUint32x4 converts 4 lowest vector element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX
func (Uint8x16) ExtendLo4ToUint64x4 ¶
ExtendLo4ToUint64x4 converts 4 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX2
func (Uint8x16) ExtendLo8ToUint16x8 ¶
ExtendLo8ToUint16x8 converts 8 lowest vector element values to uint16. The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX
func (Uint8x16) ExtendLo8ToUint32x8 ¶
ExtendLo8ToUint32x8 converts 8 lowest vector element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX2
func (Uint8x16) ExtendLo8ToUint64x8 ¶
ExtendLo8ToUint64x8 converts 8 lowest vector element values to uint64. The result vector's elements are zero-extended.
Asm: VPMOVZXBQ, CPU Feature: AVX512
func (Uint8x16) ExtendToUint16 ¶
ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX2
func (Uint8x16) ExtendToUint32 ¶
ExtendToUint32 converts element values to uint32. The result vector's elements are zero-extended.
Asm: VPMOVZXBD, CPU Feature: AVX512
func (Uint8x16) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x16) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x16) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
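For reference, GF(2^8) multiplication with this reduction polynomial (0x11B, the AES field) can be written per element with the shift-and-XOR method. This is a scalar sketch with a hypothetical helper name; it illustrates the arithmetic and is not how the instruction is implemented.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // add (XOR) the current multiple of a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1B // reduce by the low 8 bits of the polynomial
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x57, 0x83)) // prints "0xc1"
}
```

The inputs 0x57 and 0x83 are the worked example from FIPS 197.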
func (Uint8x16) GetElem ¶
GetElem retrieves a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPEXTRB, CPU Feature: AVX512
func (Uint8x16) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX2
func (Uint8x16) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Uint8x16) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint8x16) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Uint8x16) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Uint8x16) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX
func (Uint8x16) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX
func (Uint8x16) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX
func (Uint8x16) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x16) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX
func (Uint8x16) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 4 bits (values 0-15) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x16) PermuteOrZero ¶
PermuteOrZero performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The lower four bits of each byte-sized index in indices select an element from x, unless the index's sign bit is set in which case zero is used instead.
Asm: VPSHUFB, CPU Feature: AVX
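The sign-bit rule can be sketched on plain arrays. In this hypothetical scalar model the indices are held as int8 so that "sign bit set" is simply a negative value; the array types are stand-ins chosen for the sketch and are not taken from the method signature.

```go
package main

import "fmt"

// permuteOrZero is a hypothetical scalar model of PermuteOrZero on 16 bytes.
// A negative index (sign bit set) yields zero; otherwise the low 4 bits of
// the index select a byte of x.
func permuteOrZero(x [16]uint8, indices [16]int8) [16]uint8 {
	var out [16]uint8
	for i, idx := range indices {
		if idx < 0 {
			continue // leave this lane zero
		}
		out[i] = x[idx&15]
	}
	return out
}

func main() {
	var x [16]uint8
	for i := range x {
		x[i] = 0xA0 + uint8(i)
	}
	out := permuteOrZero(x, [16]int8{15, -1, 0})
	fmt.Println(out[:3]) // prints "[175 0 160]"
}
```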
func (Uint8x16) SetElem ¶
SetElem sets a single constant-indexed element's value.
index results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VPINSRB, CPU Feature: AVX
func (Uint8x16) StoreSlice ¶
StoreSlice stores x into a slice of at least 16 uint8s
func (Uint8x16) StoreSlicePart ¶
StoreSlicePart stores the 16 elements of x into the slice s. It stores as many elements as will fit in s. If s has 16 or more elements, the method is equivalent to x.StoreSlice.
func (Uint8x16) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX
func (Uint8x16) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX
func (Uint8x16) SumAbsDiff ¶
SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The output is a vector of word-sized elements in which every 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed. This can be seen as the L1 distance between each adjacent 8-byte group of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX
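The layout of the result is easiest to see in a scalar model. The sketch below treats the 16 input bytes as two groups of 8 and writes each group's sum to every fourth 16-bit lane, as the description states. sumAbsDiff and the array types are hypothetical stand-ins.

```go
package main

import "fmt"

// sumAbsDiff is a hypothetical scalar model of SumAbsDiff on 16 bytes.
// Group n (bytes 8n to 8n+7) contributes its sum of |x-y| to lane 4n.
func sumAbsDiff(x, y [16]uint8) [8]uint16 {
	var out [8]uint16
	for g := 0; g < 2; g++ {
		var sum uint16
		for i := 0; i < 8; i++ {
			a, b := x[g*8+i], y[g*8+i]
			if a > b {
				sum += uint16(a - b)
			} else {
				sum += uint16(b - a)
			}
		}
		out[4*g] = sum
	}
	return out
}

func main() {
	var x, y [16]uint8
	for i := range x {
		x[i], y[i] = 10, 7
	}
	fmt.Println(sumAbsDiff(x, y)) // prints "[24 0 0 0 24 0 0 0]"
}
```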
type Uint8x32 ¶
type Uint8x32 struct {
// contains filtered or unexported fields
}
Uint8x32 is a 256-bit SIMD vector of 32 uint8
func BroadcastUint8x32 ¶
BroadcastUint8x32 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX2
func LoadUint8x32 ¶
LoadUint8x32 loads a Uint8x32 from an array
func LoadUint8x32Slice ¶
LoadUint8x32Slice loads a Uint8x32 from a slice of at least 32 uint8s
func LoadUint8x32SlicePart ¶
LoadUint8x32SlicePart loads a Uint8x32 from the slice s. If s has fewer than 32 elements, the remaining elements of the vector are filled with zeroes. If s has 32 or more elements, the function is equivalent to LoadUint8x32Slice.
func (Uint8x32) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES
func (Uint8x32) AESDecryptOneRound ¶
AESDecryptOneRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES
func (Uint8x32) AESEncryptLastRound ¶
AESEncryptLastRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES
func (Uint8x32) AESEncryptOneRound ¶
AESEncryptOneRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES
func (Uint8x32) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX2
func (Uint8x32) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX2
func (Uint8x32) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPAND, CPU Feature: AVX2
func (Uint8x32) AsFloat32x8 ¶
AsFloat32x8 converts from Uint8x32 to Float32x8
func (Uint8x32) AsFloat64x4 ¶
AsFloat64x4 converts from Uint8x32 to Float64x4
func (Uint8x32) AsInt16x16 ¶
AsInt16x16 converts from Uint8x32 to Int16x16
func (Uint8x32) AsUint16x16 ¶
AsUint16x16 converts from Uint8x32 to Uint16x16
func (Uint8x32) AsUint32x8 ¶
AsUint32x8 converts from Uint8x32 to Uint32x8
func (Uint8x32) AsUint64x4 ¶
AsUint64x4 converts from Uint8x32 to Uint64x4
func (Uint8x32) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX2
func (Uint8x32) Compress ¶
Compress performs a compression on vector x using mask, selecting the elements indicated by mask and packing them into the lower-indexed elements.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
func (Uint8x32) ConcatPermute ¶
ConcatPermute performs a full permutation of vector x, y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]} where xy is the concatenation of x (lower half) and y (upper half). Only the needed bits to represent xy's index are used in indices' elements.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
func (Uint8x32) ConcatShiftBytesRightGrouped ¶
ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by a constant number of bytes. The result vector is the lower half of the shifted concatenation. This operation is performed separately on each 16-byte group.
constant results in better performance when its value is a compile-time constant; a non-constant value will be translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX2
func (Uint8x32) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies the elements and adds the pairs together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX2
func (Uint8x32) Expand ¶
Expand performs an expansion on a vector x whose elements are packed to lower parts. The expansion is to distribute elements as indexed by mask, from lower mask elements to upper in order.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
func (Uint8x32) ExtendToUint16 ¶
ExtendToUint16 converts element values to uint16. The result vector's elements are zero-extended.
Asm: VPMOVZXBW, CPU Feature: AVX512
func (Uint8x32) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x32) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
b results in better performance when it's a constant; a non-constant value will be translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x32) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
func (Uint8x32) Greater ¶
Greater returns a mask whose elements indicate whether x > y
Emulated, CPU Feature AVX2
func (Uint8x32) GreaterEqual ¶
GreaterEqual returns a mask whose elements indicate whether x >= y
Emulated, CPU Feature AVX2
func (Uint8x32) IsZero ¶
IsZero returns true if all elements of x are zeros.
This method compiles to VPTEST x, x. The expressions x.And(y).IsZero() and x.AndNot(y).IsZero() will be optimized to VPTEST x, y.
Asm: VPTEST, CPU Feature: AVX
func (Uint8x32) Less ¶
Less returns a mask whose elements indicate whether x < y
Emulated, CPU Feature AVX2
func (Uint8x32) LessEqual ¶
LessEqual returns a mask whose elements indicate whether x <= y
Emulated, CPU Feature AVX2
func (Uint8x32) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX2
func (Uint8x32) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX2
func (Uint8x32) NotEqual ¶
NotEqual returns a mask whose elements indicate whether x != y
Emulated, CPU Feature AVX2
func (Uint8x32) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x32) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPOR, CPU Feature: AVX2
func (Uint8x32) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]} The low 5 bits (values 0-31) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x32) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...} The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set in which case zero is used instead. Each group is of size 128-bit.
Asm: VPSHUFB, CPU Feature: AVX2
func (Uint8x32) Select128FromPair ¶
Select128FromPair treats the 256-bit vectors x and y as a single vector of four 128-bit elements, and returns a 256-bit result formed by concatenating the two elements specified by lo and hi. For example,
{0x40, 0x41, ..., 0x4f, 0x50, 0x51, ..., 0x5f}.Select128FromPair(3, 0,
{0x60, 0x61, ..., 0x6f, 0x70, 0x71, ..., 0x7f})
returns {0x70, 0x71, ..., 0x7f, 0x40, 0x41, ..., 0x4f}.
lo, hi result in better performance when they are constants; non-constant values will be translated into a jump table. lo and hi should be between 0 and 3, inclusive; other values may result in a runtime panic.
Asm: VPERM2I128, CPU Feature: AVX2
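The example above can be reproduced with a scalar model that numbers the four 128-bit quarters 0-3, with x holding quarters 0 and 1 and y holding quarters 2 and 3. select128FromPair and the [32]uint8 arrays are hypothetical stand-ins for the method and vector types.

```go
package main

import "fmt"

// select128FromPair is a hypothetical scalar model of Select128FromPair.
// Quarter lo becomes the low 16 bytes of the result and quarter hi the high
// 16 bytes. Values outside 0-3 panic here through an out-of-range slice.
func select128FromPair(x, y [32]uint8, lo, hi int) [32]uint8 {
	var xy [64]uint8
	copy(xy[:32], x[:])
	copy(xy[32:], y[:])
	var out [32]uint8
	copy(out[:16], xy[lo*16:lo*16+16])
	copy(out[16:], xy[hi*16:hi*16+16])
	return out
}

func main() {
	var x, y [32]uint8
	for i := range x {
		x[i] = 0x40 + uint8(i)
		y[i] = 0x60 + uint8(i)
	}
	out := select128FromPair(x, y, 3, 0)
	fmt.Printf("%#x %#x %#x %#x\n", out[0], out[15], out[16], out[31])
	// prints "0x70 0x7f 0x40 0x4f"
}
```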
func (Uint8x32) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint8x32) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI128, CPU Feature: AVX2
func (Uint8x32) StoreSlice ¶
StoreSlice stores x into a slice of at least 32 uint8s
func (Uint8x32) StoreSlicePart ¶
StoreSlicePart stores the 32 elements of x into the slice s. It stores as many elements as will fit in s. If s has 32 or more elements, the method is equivalent to x.StoreSlice.
func (Uint8x32) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX2
func (Uint8x32) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX2
func (Uint8x32) SumAbsDiff ¶
SumAbsDiff sums the absolute differences of the two input vectors, treating each adjacent 8 bytes as a group. The output is a vector of word-sized elements in which every 4*n-th element contains the sum for the n-th input group. The other elements in the result vector are zeroed. This can be seen as the L1 distance between each adjacent 8-byte group of the two input vectors.
Asm: VPSADBW, CPU Feature: AVX2
type Uint8x64 ¶
type Uint8x64 struct {
// contains filtered or unexported fields
}
Uint8x64 is a 512-bit SIMD vector of 64 uint8
func BroadcastUint8x64 ¶
BroadcastUint8x64 returns a vector with the input x assigned to all elements of the output.
Emulated, CPU Feature AVX512BW
func LoadMaskedUint8x64 ¶
LoadMaskedUint8x64 loads a Uint8x64 from an array, at those elements enabled by mask
Asm: VMOVDQU8.Z, CPU Feature: AVX512
func LoadUint8x64 ¶
LoadUint8x64 loads a Uint8x64 from an array
func LoadUint8x64Slice ¶
LoadUint8x64Slice loads a Uint8x64 from a slice of at least 64 uint8s
func LoadUint8x64SlicePart ¶
LoadUint8x64SlicePart loads a Uint8x64 from the slice s. If s has fewer than 64 elements, the remaining elements of the vector are filled with zeroes. If s has 64 or more elements, the function is equivalent to LoadUint8x64Slice.
func (Uint8x64) AESDecryptLastRound ¶
AESDecryptLastRound performs a series of operations in AES cipher algorithm defined in FIPS 197. x is the state array, starting from low index to high are s00, s10, s20, s30, s01, ..., s33. y is the chunk of dw array in use. result = AddRoundKey(InvShiftRows(InvSubBytes(x)), y)
Asm: VAESDECLAST, CPU Feature: AVX512VAES
func (Uint8x64) AESDecryptOneRound ¶
AESDecryptOneRound performs one round of AES decryption as defined in FIPS 197. x is the state array, laid out from low index to high as s00, s10, s20, s30, s01, ..., s33. y is the chunk of the dw array in use. result = AddRoundKey(InvMixColumns(InvShiftRows(InvSubBytes(x))), y)
Asm: VAESDEC, CPU Feature: AVX512VAES
func (Uint8x64) AESEncryptLastRound ¶
AESEncryptLastRound performs the last round of AES encryption as defined in FIPS 197. x is the state array, laid out from low index to high as s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(ShiftRows(SubBytes(x)), y)
Asm: VAESENCLAST, CPU Feature: AVX512VAES
func (Uint8x64) AESEncryptOneRound ¶
AESEncryptOneRound performs one round of AES encryption as defined in FIPS 197. x is the state array, laid out from low index to high as s00, s10, s20, s30, s01, ..., s33. y is the chunk of the w array in use. result = AddRoundKey(MixColumns(ShiftRows(SubBytes(x))), y)
Asm: VAESENC, CPU Feature: AVX512VAES
func (Uint8x64) Add ¶
Add adds corresponding elements of two vectors.
Asm: VPADDB, CPU Feature: AVX512
func (Uint8x64) AddSaturated ¶
AddSaturated adds corresponding elements of two vectors with saturation.
Asm: VPADDUSB, CPU Feature: AVX512
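The difference between Add and AddSaturated on unsigned bytes can be illustrated one lane at a time. The helpers below are hypothetical scalar models, not package APIs; the same clamping idea applies to SubSaturated at the low end.

```go
package main

import "fmt"

// addSaturated is a hypothetical scalar model of one lane of AddSaturated:
// the sum is clamped to 255 instead of wrapping around.
func addSaturated(a, b uint8) uint8 {
	sum := uint16(a) + uint16(b)
	if sum > 255 {
		return 255
	}
	return uint8(sum)
}

// subSaturated is the matching model for SubSaturated: the difference is
// clamped to 0 instead of wrapping around.
func subSaturated(a, b uint8) uint8 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	a, b := uint8(200), uint8(100)
	fmt.Println(a+b, addSaturated(a, b)) // wrapping sum, then saturated sum
	fmt.Println(b-a, subSaturated(b, a)) // wrapping difference, then saturated difference
}
```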
func (Uint8x64) And ¶
And performs a bitwise AND operation between two vectors.
Asm: VPANDD, CPU Feature: AVX512
func (Uint8x64) AsFloat32x16 ¶
func (from Uint8x64) AsFloat32x16() (to Float32x16)
AsFloat32x16 converts from Uint8x64 to Float32x16.
func (Uint8x64) AsFloat64x8 ¶
AsFloat64x8 converts from Uint8x64 to Float64x8.
func (Uint8x64) AsInt16x32 ¶
AsInt16x32 converts from Uint8x64 to Int16x32.
func (Uint8x64) AsInt32x16 ¶
AsInt32x16 converts from Uint8x64 to Int32x16.
func (Uint8x64) AsUint16x32 ¶
AsUint16x32 converts from Uint8x64 to Uint16x32.
func (Uint8x64) AsUint32x16 ¶
AsUint32x16 converts from Uint8x64 to Uint32x16.
func (Uint8x64) AsUint64x8 ¶
AsUint64x8 converts from Uint8x64 to Uint64x8.
func (Uint8x64) Average ¶
Average computes the rounded average of corresponding elements.
Asm: VPAVGB, CPU Feature: AVX512
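For reference, the VPAVGB instruction rounds halves upward: each lane computes (a + b + 1) >> 1 in a wider intermediate, so the sum cannot overflow. A scalar sketch of one lane, with average as a hypothetical helper name:

```go
package main

import "fmt"

// average is a hypothetical scalar model of one lane of Average. The sum is
// formed in 16 bits so that 255+255+1 does not overflow before the shift.
func average(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(average(1, 2), average(255, 255), average(0, 1))
}
```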
func (Uint8x64) Compress ¶
Compress selects the elements of x enabled by mask and packs them, in order, into the lowest-indexed elements of the result.
Asm: VPCOMPRESSB, CPU Feature: AVX512VBMI2
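The packing behavior can be sketched in scalar Go. The model below makes two assumptions that the text above does not state: the mask is represented as a 64-bit bitmask with bit i controlling element i, and result elements past the packed ones are zero. compress is a hypothetical helper.

```go
package main

import "fmt"

// compress is a hypothetical scalar model of Compress. Elements of x whose
// mask bit is set are packed into the low end of the result, preserving
// order. Remaining result elements are left zero (an assumption).
func compress(x [64]uint8, mask uint64) [64]uint8 {
	var out [64]uint8
	j := 0
	for i := 0; i < 64; i++ {
		if (mask>>uint(i))&1 == 1 {
			out[j] = x[i]
			j++
		}
	}
	return out
}

func main() {
	var x [64]uint8
	for i := range x {
		x[i] = uint8(i)
	}
	out := compress(x, 0b1010) // keep elements 1 and 3
	fmt.Println(out[:4])
}
```

A typical use is filtering: compare to build a mask, then compress to gather the matching elements contiguously.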
func (Uint8x64) ConcatPermute ¶
ConcatPermute performs a full permutation of the vectors x and y using indices: result := {xy[indices[0]], xy[indices[1]], ..., xy[indices[n]]}, where xy is the concatenation of x (lower half) and y (upper half). Only the low bits of each element of indices that are needed to index into xy are used.
Asm: VPERMI2B, CPU Feature: AVX512VBMI
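A scalar sketch of the two-source lookup follows. With 64 elements per input, xy has 128 entries, so the model assumes 7 index bits are significant. concatPermute is a hypothetical helper and arrays stand in for the vector types.

```go
package main

import "fmt"

// concatPermute is a hypothetical scalar model of ConcatPermute. Each result
// element is picked from the 128-entry concatenation of x (entries 0-63) and
// y (entries 64-127), using only the low 7 bits of the index.
func concatPermute(x, y, indices [64]uint8) [64]uint8 {
	var out [64]uint8
	for i, idx := range indices {
		k := idx & 127
		if k < 64 {
			out[i] = x[k]
		} else {
			out[i] = y[k-64]
		}
	}
	return out
}

func main() {
	var x, y, indices [64]uint8
	for i := range x {
		x[i] = uint8(i)
		y[i] = uint8(100 + i)
	}
	indices[0], indices[1], indices[2] = 63, 64, 127
	out := concatPermute(x, y, indices)
	fmt.Println(out[:3])
}
```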
func (Uint8x64) ConcatShiftBytesRightGrouped ¶
ConcatShiftBytesRightGrouped concatenates x and y and shifts the result right by constant bytes. The result vector is the lower half of the shifted concatenation. The operation is performed independently on each 16-byte group.
The constant parameter yields better performance when it is a compile-time constant; a non-constant value is translated into a jump table.
Asm: VPALIGNR, CPU Feature: AVX512
func (Uint8x64) DotProductPairsSaturated ¶
DotProductPairsSaturated multiplies corresponding elements and adds each adjacent pair of products together with saturation, yielding a vector of half as many elements with twice the input element size.
Asm: VPMADDUBSW, CPU Feature: AVX512
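The pairing can be modeled in scalar Go. The sketch follows the VPMADDUBSW instruction, which multiplies unsigned bytes by signed bytes and saturates each pair sum to a signed 16-bit value; that the Go method takes the signed operand as its argument is an assumption here, and dotProductPairsSaturated is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// dotProductPairsSaturated is a hypothetical scalar model. Element i of the
// result is x[2i]*y[2i] + x[2i+1]*y[2i+1], clamped to the int16 range.
// x holds unsigned bytes and y signed bytes (assumed operand types).
func dotProductPairsSaturated(x [64]uint8, y [64]int8) [32]int16 {
	var out [32]int16
	for i := range out {
		sum := int32(x[2*i])*int32(y[2*i]) + int32(x[2*i+1])*int32(y[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		}
		if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	var x [64]uint8
	var y [64]int8
	x[0], x[1], y[0], y[1] = 255, 255, 127, 127 // pair sum overflows int16
	x[2], x[3], y[2], y[3] = 2, 4, 3, -5        // pair sum fits
	out := dotProductPairsSaturated(x, y)
	fmt.Println(out[:2])
}
```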
func (Uint8x64) Expand ¶
Expand distributes the elements of x, which are packed into its lowest-indexed positions, to the positions enabled by mask. Enabled positions are filled in order, from the lowest mask element to the highest.
Asm: VPEXPANDB, CPU Feature: AVX512VBMI2
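Expand is the inverse of packing selected elements to the low end. A scalar sketch, assuming the mask is a 64-bit bitmask with bit i controlling element i and that positions not enabled by the mask are zero in the result; expand is a hypothetical helper.

```go
package main

import "fmt"

// expand is a hypothetical scalar model of Expand. Consecutive low elements
// of x are written to the positions whose mask bit is set, lowest position
// first. Positions with a clear mask bit are left zero (an assumption).
func expand(x [64]uint8, mask uint64) [64]uint8 {
	var out [64]uint8
	j := 0
	for i := 0; i < 64; i++ {
		if (mask>>uint(i))&1 == 1 {
			out[i] = x[j]
			j++
		}
	}
	return out
}

func main() {
	var x [64]uint8
	x[0], x[1] = 7, 9
	out := expand(x, 0b1010) // place x[0] at position 1 and x[1] at position 3
	fmt.Println(out[:4])
}
```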
func (Uint8x64) GaloisFieldAffineTransform ¶
GaloisFieldAffineTransform computes an affine transformation in GF(2^8): x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
The b parameter yields better performance when it is a compile-time constant; a non-constant value is translated into a jump table.
Asm: VGF2P8AFFINEQB, CPU Feature: AVX512GFNI
func (Uint8x64) GaloisFieldAffineTransformInverse ¶
GaloisFieldAffineTransformInverse computes an affine transformation in GF(2^8), with x inverted with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1: x is a vector of 8-bit vectors, with each adjacent 8 as a group; y is a vector of 8x8 1-bit matrices; b is an 8-bit vector. The affine transformation is y * x + b, with each element of y corresponding to a group of 8 elements in x.
The b parameter yields better performance when it is a compile-time constant; a non-constant value is translated into a jump table.
Asm: VGF2P8AFFINEINVQB, CPU Feature: AVX512GFNI
func (Uint8x64) GaloisFieldMul ¶
GaloisFieldMul computes element-wise GF(2^8) multiplication with reduction polynomial x^8 + x^4 + x^3 + x + 1.
Asm: VGF2P8MULB, CPU Feature: AVX512GFNI
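The per-element arithmetic can be checked against a scalar shift-and-reduce multiply. The sketch below uses the same reduction polynomial (0x11B, the AES field polynomial); gfMul is a hypothetical helper, not a package function.

```go
package main

import "fmt"

// gfMul is a hypothetical scalar model of one lane of GaloisFieldMul: it
// multiplies a and b as polynomials over GF(2) and reduces the product
// modulo x^8 + x^4 + x^3 + x + 1.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a // add (XOR) the current multiple of a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1B // reduce: x^8 is congruent to x^4 + x^3 + x + 1
		}
		b >>= 1
	}
	return p
}

func main() {
	// Worked example from FIPS 197: {57} * {83} = {c1}.
	fmt.Printf("%#x\n", gfMul(0x57, 0x83))
}
```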
func (Uint8x64) Greater ¶
Greater returns x greater-than y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) GreaterEqual ¶
GreaterEqual returns x greater-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) LessEqual ¶
LessEqual returns x less-than-or-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) Max ¶
Max computes the maximum of corresponding elements.
Asm: VPMAXUB, CPU Feature: AVX512
func (Uint8x64) Min ¶
Min computes the minimum of corresponding elements.
Asm: VPMINUB, CPU Feature: AVX512
func (Uint8x64) NotEqual ¶
NotEqual returns x not-equals y, elementwise.
Asm: VPCMPUB, CPU Feature: AVX512
func (Uint8x64) OnesCount ¶
OnesCount counts the number of set bits in each element.
Asm: VPOPCNTB, CPU Feature: AVX512BITALG
func (Uint8x64) Or ¶
Or performs a bitwise OR operation between two vectors.
Asm: VPORD, CPU Feature: AVX512
func (Uint8x64) Permute ¶
Permute performs a full permutation of vector x using indices: result := {x[indices[0]], x[indices[1]], ..., x[indices[n]]}. Only the low 6 bits (values 0-63) of each element of indices are used.
Asm: VPERMB, CPU Feature: AVX512VBMI
func (Uint8x64) PermuteOrZeroGrouped ¶
PermuteOrZeroGrouped performs a grouped permutation of vector x using indices: result = {x_group0[indices[0]], x_group0[indices[1]], ..., x_group1[indices[16]], x_group1[indices[17]], ...}. The lower four bits of each byte-sized index in indices select an element from its corresponding group in x, unless the index's sign bit is set, in which case zero is used instead. Each group is 128 bits wide.
Asm: VPSHUFB, CPU Feature: AVX512
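The group-local lookup can be sketched in scalar Go. In this model the indices are held as uint8 values, with bit 7 (0x80) playing the role of the sign bit; permuteOrZeroGrouped is a hypothetical helper and arrays stand in for the vector types.

```go
package main

import "fmt"

// permuteOrZeroGrouped is a hypothetical scalar model. Result element i is
// taken from the same 16-byte group of x that contains position i, selected
// by the low four bits of indices[i], or is zero if bit 7 of the index is set.
func permuteOrZeroGrouped(x, indices [64]uint8) [64]uint8 {
	var out [64]uint8
	for i, idx := range indices {
		if idx&0x80 != 0 {
			continue // sign bit set: element stays zero
		}
		group := i &^ 15 // index of the first byte of the group containing i
		out[i] = x[group+int(idx&0x0F)]
	}
	return out
}

func main() {
	var x, indices [64]uint8
	for i := range x {
		x[i] = uint8(i)
		indices[i] = 0x80 // zero everything by default
	}
	indices[0] = 5  // group 0, element 5
	indices[17] = 3 // group 1, element 3
	out := permuteOrZeroGrouped(x, indices)
	fmt.Println(out[0], out[17], out[18])
}
```

Because lookups never cross a group boundary, the operation behaves like four independent 16-entry table lookups.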
func (Uint8x64) SetHi ¶
SetHi returns x with its upper half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint8x64) SetLo ¶
SetLo returns x with its lower half set to y.
Asm: VINSERTI64X4, CPU Feature: AVX512
func (Uint8x64) StoreMasked ¶
StoreMasked stores a Uint8x64 to an array, at those elements enabled by mask.
Asm: VMOVDQU8, CPU Feature: AVX512
func (Uint8x64) StoreSlice ¶
StoreSlice stores x into a slice of at least 64 uint8s.
func (Uint8x64) StoreSlicePart ¶
StoreSlicePart stores the 64 elements of x into the slice s. It stores as many elements as will fit in s. If s has 64 or more elements, the method is equivalent to x.StoreSlice.
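The truncating store can be modeled with the built-in copy. A minimal sketch, with a [64]uint8 array standing in for the vector and storeSlicePart as a hypothetical helper name:

```go
package main

import "fmt"

// storeSlicePart models the documented behavior of StoreSlicePart
// (hypothetical helper): it writes as many leading elements of v as fit in
// s and leaves any elements of s beyond index 63 untouched.
func storeSlicePart(v [64]uint8, s []uint8) {
	copy(s, v[:]) // copies min(len(s), 64) elements
}

func main() {
	var v [64]uint8
	for i := range v {
		v[i] = uint8(i + 1)
	}
	tail := make([]uint8, 3) // a short tail buffer
	storeSlicePart(v, tail)
	fmt.Println(tail)
}
```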
func (Uint8x64) Sub ¶
Sub subtracts corresponding elements of two vectors.
Asm: VPSUBB, CPU Feature: AVX512
func (Uint8x64) SubSaturated ¶
SubSaturated subtracts corresponding elements of two vectors with saturation.
Asm: VPSUBUSB, CPU Feature: AVX512
func (Uint8x64) SumAbsDiff ¶
SumAbsDiff sums the absolute differences between corresponding elements of the two input vectors, treating each run of 8 adjacent bytes as a group. The result is a vector of word-sized elements in which element 4*n holds the sum for the n-th input group; the other elements are zero. Each sum is the L1 distance between the corresponding 8-byte groups of the two inputs.
Asm: VPSADBW, CPU Feature: AVX512
type X86Features ¶
type X86Features struct{}
var X86 X86Features
func (X86Features) AES ¶
func (X86Features) AES() bool
AES returns whether the CPU supports the AES feature.
AES is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX ¶
func (X86Features) AVX() bool
AVX returns whether the CPU supports the AVX feature.
AVX is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX2 ¶
func (X86Features) AVX2() bool
AVX2 returns whether the CPU supports the AVX2 feature.
AVX2 is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512 ¶
func (X86Features) AVX512() bool
AVX512 returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
These five CPU features are bundled together, and no use of AVX-512 is allowed unless all of these features are supported together. Nearly every CPU that has shipped with any support for AVX-512 has supported all five of these features.
AVX512 is defined on all GOARCHes, but will only return true on GOARCH amd64.
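A common pattern is to query the feature methods once and pick a code path. The sketch below is runnable without the experimental package: cpuFeatures, fakeCPU, and chooseWidth are hypothetical names, and fakeCPU stands in for the package-level X86 variable, whose AVX2 and AVX512 methods have the same signatures.

```go
package main

import "fmt"

// cpuFeatures lists the two feature checks this sketch relies on. The
// method names and signatures match X86Features.AVX2 and X86Features.AVX512.
type cpuFeatures interface {
	AVX2() bool
	AVX512() bool
}

// fakeCPU is a hypothetical stand-in for the real feature variable so the
// selection logic can be exercised on any machine.
type fakeCPU struct{ avx2, avx512 bool }

func (c fakeCPU) AVX2() bool   { return c.avx2 }
func (c fakeCPU) AVX512() bool { return c.avx512 }

// chooseWidth returns the widest vector width, in bits, that the caller
// should use: 512 for Uint8x64-style code, 256 for Uint8x32-style code, or
// 0 to signal a scalar fallback.
func chooseWidth(cpu cpuFeatures) int {
	switch {
	case cpu.AVX512():
		return 512
	case cpu.AVX2():
		return 256
	default:
		return 0
	}
}

func main() {
	fmt.Println(chooseWidth(fakeCPU{avx2: true, avx512: true}))
	fmt.Println(chooseWidth(fakeCPU{avx2: true}))
	fmt.Println(chooseWidth(fakeCPU{}))
}
```

Methods that list an additional feature, such as AVX512VBMI2 or AVX512GFNI, need their own check on top of the bundled AVX512 one.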
func (X86Features) AVX512BITALG ¶
func (X86Features) AVX512BITALG() bool
AVX512BITALG returns whether the CPU supports the AVX512BITALG feature.
AVX512BITALG is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512GFNI ¶
func (X86Features) AVX512GFNI() bool
AVX512GFNI returns whether the CPU supports the AVX512GFNI feature.
AVX512GFNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VAES ¶
func (X86Features) AVX512VAES() bool
AVX512VAES returns whether the CPU supports the AVX512VAES feature.
AVX512VAES is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VBMI ¶
func (X86Features) AVX512VBMI() bool
AVX512VBMI returns whether the CPU supports the AVX512VBMI feature.
AVX512VBMI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VBMI2 ¶
func (X86Features) AVX512VBMI2() bool
AVX512VBMI2 returns whether the CPU supports the AVX512VBMI2 feature.
AVX512VBMI2 is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VNNI ¶
func (X86Features) AVX512VNNI() bool
AVX512VNNI returns whether the CPU supports the AVX512VNNI feature.
AVX512VNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VPCLMULQDQ ¶
func (X86Features) AVX512VPCLMULQDQ() bool
AVX512VPCLMULQDQ returns whether the CPU supports the AVX512VPCLMULQDQ feature.
AVX512VPCLMULQDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVX512VPOPCNTDQ ¶
func (X86Features) AVX512VPOPCNTDQ() bool
AVX512VPOPCNTDQ returns whether the CPU supports the AVX512VPOPCNTDQ feature.
AVX512VPOPCNTDQ is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) AVXVNNI ¶
func (X86Features) AVXVNNI() bool
AVXVNNI returns whether the CPU supports the AVXVNNI feature.
AVXVNNI is defined on all GOARCHes, but will only return true on GOARCH amd64.
func (X86Features) SHA ¶
func (X86Features) SHA() bool
SHA returns whether the CPU supports the SHA feature.
SHA is defined on all GOARCHes, but will only return true on GOARCH amd64.
Notes ¶
Bugs ¶
Using a vector type as a type parameter may not work.
Using reflect Call to call a vector function/method may not work.