simd

package
v0.0.10 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 8, 2020 License: Apache-2.0 Imports: 5 Imported by: 10

Documentation

Overview

Package simd provides access to SIMD-based implementations of several common operations on byte arrays which the compiler cannot be trusted to autovectorize within the next several years.

The backend assumes SSE4.2 is available: init() checks for SSE4.2 support, and panics when it isn't there. The interface is designed to allow the backend to also autodetect e.g. AVX2/AVX-512 and opportunistically use those instructions, without any changes to properly written higher-level code. Implementation of the AVX2 part of this is in progress.

The central constraint driving this package's design is the standard Go compiler's inability to inline short assembly functions; see

https://groups.google.com/forum/#!topic/golang-nuts/yVOfeHYCIT4
https://github.com/golang/go/issues/4947#issuecomment-66075571

for more details. As a consequence, it is critical to push looping logic down to the assembly level as well, otherwise function call overhead is overwhelming. Conversely, due to the much higher development burden of this type of code, we don't go beyond that point; this package is essentially a collection of for loops.

Two classes of functions are exported:

- Functions with 'Unsafe' in their names are very performant, but are memory-unsafe, do not validate documented preconditions, and may have the unusual property of reading/writing to a few bytes *past* the end of the given slices. The MakeUnsafe() function and its relatives allocate byte-slices with sufficient extra capacity for all Unsafe functions with the latter property to work properly.

- Their safe analogues work properly on ordinary slices, and often panic when documented preconditions are not met. When a precondition is not explicitly checked (due to computational cost), safe functions may return garbage values when the condition is not met, but they are memory-safe: they will not corrupt unrelated memory or perform out-of-bounds read operations.

Index

Constants

View Source
const BitsPerWord = BytesPerWord * 8

BitsPerWord is the number of bits in a machine word.

View Source
const BytesPerWord = 8

BytesPerWord is the number of bytes in a machine word. We don't use unsafe.Sizeof(uintptr(1)) since there are advantages to having this as an untyped constant, and there's essentially no drawback since this is an _amd64-specific file.

View Source
const Log2BytesPerWord = uint(3)

Log2BytesPerWord is log2(BytesPerWord). This is relevant for manual bit-shifting when we know that's a safe way to divide and the compiler does not (e.g. dividend is of signed int type).

Variables

This section is empty.

Functions

func Accumulate8

func Accumulate8(src []byte) int

Accumulate8 returns the sum of the (unsigned) bytes in src[].

func Accumulate8Greater

func Accumulate8Greater(src []byte, val byte) int

Accumulate8Greater returns the sum of all bytes in src[] greater than the given value.

func AddConst8

func AddConst8(dst, src []byte, val byte)

AddConst8 sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

func AddConst8Inplace

func AddConst8Inplace(main []byte, val byte)

AddConst8Inplace adds the given constant to every byte of main[], with unsigned overflow.

func AddConst8Unsafe

func AddConst8Unsafe(dst, src []byte, val byte)

AddConst8Unsafe sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow).

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AddConst8UnsafeInplace

func AddConst8UnsafeInplace(main []byte, val byte)

AddConst8UnsafeInplace adds the given constant to every byte of main[], with unsigned overflow.

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

func And

func And(dst, src1, src2 []byte)

And sets dst[pos] := src1[pos] & src2[pos] for every position in dst. It panics if slice lengths don't match.

func AndConst8

func AndConst8(dst, src []byte, val byte)

AndConst8 sets dst[pos] := src[pos] & val for every position in dst. It panics if slice lengths don't match.

func AndConst8Inplace

func AndConst8Inplace(main []byte, val byte)

AndConst8Inplace sets main[pos] := main[pos] & val for every position in main[].

func AndConst8Unsafe

func AndConst8Unsafe(dst, src []byte, val byte)

AndConst8Unsafe sets dst[pos] := src[pos] & val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AndConst8UnsafeInplace

func AndConst8UnsafeInplace(main []byte, val byte)

AndConst8UnsafeInplace sets main[pos] := main[pos] & val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func AndInplace

func AndInplace(main, arg []byte)

AndInplace sets main[pos] := arg[pos] & main[pos] for every position in main[]. It panics if slice lengths don't match.

func AndUnsafe

func AndUnsafe(dst, src1, src2 []byte)

AndUnsafe sets dst[pos] := src1[pos] & src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AndUnsafeInplace

func AndUnsafeInplace(main, arg []byte)

AndUnsafeInplace sets main[pos] := main[pos] & arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func BitFromEveryByte

func BitFromEveryByte(dst, src []byte, bitIdx int)

BitFromEveryByte fills dst[] with a bitarray containing every 8th bit from src[], starting with bitIdx, where bitIdx is in [0,7]. If len(src) is not divisible by 8, extra bits in the last filled byte of dst are set to zero.

For example, if src[] is

0x1f 0x33 0x0d 0x00 0x51 0xcc 0x34 0x59 0x44

and bitIdx is 2, bit 2 from every byte is

1    0    1    0    0    1    1    0    1

so dst[] is filled with

0x65 0x01.

- It panics if len(dst) < (len(src) + 7) / 8, or bitIdx isn't in [0,7]. - If dst is larger than necessary, the extra bytes are not changed.

func BytesPerVec

func BytesPerVec() int

BytesPerVec is an accessor for the bytesPerVec package variable.

func Count2Bytes

func Count2Bytes(src []byte, val1, val2 byte) int

Count2Bytes counts the number of bytes in src[] which are equal to either val1 or val2. (bytes.Count() should be good enough for a single byte.)

func Count3Bytes

func Count3Bytes(src []byte, val1, val2, val3 byte) int

Count3Bytes counts the number of bytes in src[] which are equal to val1, val2, or val3.

func CountNibblesInSet

func CountNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

CountNibblesInSet counts the number of nibbles in src[] which are in the given set. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the table. It may return a garbage result on invalid input. (However, it won't corrupt memory.)

func CountNibblesInTwoSets

func CountNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

CountNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

func CountUnpackedNibblesInSet

func CountUnpackedNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

CountUnpackedNibblesInSet counts the number of bytes in src[] which are in the given set, assuming all bytes are <16. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the table. It may crash or return a garbage result on invalid input. (However, it won't corrupt memory.)

func CountUnpackedNibblesInTwoSets

func CountUnpackedNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

CountUnpackedNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

func DivUpPow2

func DivUpPow2(dividend, divisor int, log2Divisor uint) int

DivUpPow2 efficiently divides a number by a power-of-2 divisor. (This works for negative dividends since the language specifies arithmetic right-shifts of signed numbers. I'm pretty sure this doesn't have a performance penalty.)

func FirstGreater8

func FirstGreater8(arg []byte, val byte, startPos int) int

FirstGreater8 scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

func FirstGreater8Unsafe

func FirstGreater8Unsafe(arg []byte, val byte, startPos int) int

FirstGreater8Unsafe scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. startPos is nonnegative.

2. cap(arg) >= RoundUpPow2(len, bytesPerVec).

func FirstLeq8

func FirstLeq8(arg []byte, val byte, startPos int) int

FirstLeq8 scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

func FirstLeq8Unsafe

func FirstLeq8Unsafe(arg []byte, val byte, startPos int) int

FirstLeq8Unsafe scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

See warning for FirstGreater8Unsafe.

func FirstUnequal8

func FirstUnequal8(arg1, arg2 []byte, startPos int) int

FirstUnequal8 scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). It panics if the lengths are not identical, or startPos is negative.

This is essentially an extension of bytes.Compare().

func FirstUnequal8Unsafe

func FirstUnequal8Unsafe(arg1, arg2 []byte, startPos int) int

FirstUnequal8Unsafe scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). This has essentially the same speed as bytes.Compare().

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for arg2[].

1. len(arg1) == len(arg2).

2. Capacities are at least RoundUpPow2(len, bytesPerVec).

func IndexU16 added in v0.0.10

func IndexU16(main []uint16, val uint16) int

IndexU16 returns the index of the first instance of val in main, or -1 if val is not present in main.

func Interleave8

func Interleave8(dst, even, odd []byte)

Interleave8 sets the bytes in dst[] as follows:

if pos is even, dst[pos] := even[pos/2]
if pos is odd, dst[pos] := odd[pos/2]

It panics if ((len(dst) + 1) / 2) != len(even), or (len(dst) / 2) != len(odd).

func Interleave8Unsafe

func Interleave8Unsafe(dst, even, odd []byte)

Interleave8Unsafe sets the bytes in dst[] as follows:

if pos is even, dst[pos] := even[pos/2]
if pos is odd, dst[pos] := odd[pos/2]

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on dst[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for even[] and odd[].

1. len(even) = (len(dst) + 1) / 2, and len(odd) = len(dst) / 2.

2. cap(dst) >= RoundUpPow2(len(dst) + 1, bytesPerVec), cap(even) >= RoundUpPow2(len(even) + 1, bytesPerVec), and cap(odd) >= RoundUpPow2(len(odd) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func Invmask

func Invmask(dst, src1, src2 []byte)

Invmask sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst. It panics if slice lengths don't match.

func InvmaskConst8

func InvmaskConst8(dst, src []byte, val byte)

InvmaskConst8 sets dst[pos] := src[pos] &^ val for every position in dst. It panics if slice lengths don't match.

func InvmaskConst8Inplace

func InvmaskConst8Inplace(main []byte, val byte)

InvmaskConst8Inplace sets main[pos] := main[pos] &^ val for every position in main[].

func InvmaskConst8Unsafe

func InvmaskConst8Unsafe(dst, src []byte, val byte)

InvmaskConst8Unsafe sets dst[pos] := src[pos] &^ val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func InvmaskConst8UnsafeInplace

func InvmaskConst8UnsafeInplace(main []byte, val byte)

InvmaskConst8UnsafeInplace sets main[pos] := main[pos] &^ val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func InvmaskInplace

func InvmaskInplace(main, arg []byte)

InvmaskInplace sets main[pos] := arg[pos] &^ main[pos] for every position in main[]. It panics if slice lengths don't match.

func InvmaskUnsafe

func InvmaskUnsafe(dst, src1, src2 []byte)

InvmaskUnsafe sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func InvmaskUnsafeInplace

func InvmaskUnsafeInplace(main, arg []byte)

InvmaskUnsafeInplace sets main[pos] := main[pos] &^ arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func MakeUnsafe

func MakeUnsafe(len int) []byte

MakeUnsafe returns a byte slice of the given length which is guaranteed to have enough capacity for all Unsafe functions in this package to work. (It is not itself an unsafe function: allocated memory is zero-initialized.) Note that Unsafe functions occasionally have other caveats: e.g. PopcntUnsafe also requires relevant bytes past the end of the slice to be zeroed out.

func MaskThenCountByte

func MaskThenCountByte(src []byte, mask, val byte) int

MaskThenCountByte counts the number of bytes in src[] satisfying

src[pos] & mask == val.

func Memset16Raw

func Memset16Raw(dst, valPtr unsafe.Pointer, nElem int)

Memset16Raw assumes dst points to an array of nElem 2-byte elements, and valPtr points to a single 2-byte element. It fills dst with copies of *valPtr.

func Memset32Raw

func Memset32Raw(dst, valPtr unsafe.Pointer, nElem int)

Memset32Raw assumes dst points to an array of nElem 4-byte elements, and valPtr points to a single 4-byte element. It fills dst with copies of *valPtr.

func Memset8

func Memset8(dst []byte, val byte)

Memset8 sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case.)

func Memset8Unsafe

func Memset8Unsafe(dst []byte, val byte)

Memset8Unsafe sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case; see https://github.com/golang/go/issues/5373 .)

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(dst), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of dst[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on dst[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

func Or

func Or(dst, src1, src2 []byte)

Or sets dst[pos] := src1[pos] | src2[pos] for every position in dst. It panics if slice lengths don't match.

func OrConst8

func OrConst8(dst, src []byte, val byte)

OrConst8 sets dst[pos] := src[pos] | val for every position in dst. It panics if slice lengths don't match.

func OrConst8Inplace

func OrConst8Inplace(main []byte, val byte)

OrConst8Inplace sets main[pos] := main[pos] | val for every position in main[].

func OrConst8Unsafe

func OrConst8Unsafe(dst, src []byte, val byte)

OrConst8Unsafe sets dst[pos] := src[pos] | val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func OrConst8UnsafeInplace

func OrConst8UnsafeInplace(main []byte, val byte)

OrConst8UnsafeInplace sets main[pos] := main[pos] | val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func OrInplace

func OrInplace(main, arg []byte)

OrInplace sets main[pos] := arg[pos] | main[pos] for every position in main[]. It panics if slice lengths don't match.

func OrUnsafe

func OrUnsafe(dst, src1, src2 []byte)

OrUnsafe sets dst[pos] := src1[pos] | src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func OrUnsafeInplace

func OrUnsafeInplace(main, arg []byte)

OrUnsafeInplace sets main[pos] := main[pos] | arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func PackedNibbleLookup

func PackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

PackedNibbleLookup sets the bytes in dst[] as follows:

if pos is even, dst[pos] := table[src[pos / 2] & 15]
if pos is odd, dst[pos] := table[src[pos / 2] >> 4]

It panics if len(src) != (len(dst) + 1) / 2.

Nothing bad happens if len(dst) is odd and some high bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.

func PackedNibbleLookupUnsafe

func PackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

PackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

if pos is even, dst[pos] := table[src[pos / 2] & 15]
if pos is odd, dst[pos] := table[src[pos / 2] >> 4]

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-#3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) == (len(dst) + 1) / 2.

2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.

3. The caller does not care if a few bytes past the end of dst[] are changed.

func Popcnt

func Popcnt(bytes []byte) int

Popcnt returns the number of set bits in the given []byte.

Some effort has been made to make this run acceptably fast on relatively short arrays, since I expect knowing how to do so to be helpful when working with hundreds of millions of .bam reads with ~75 bytes of base data and ~150 bytes of quality data. Interestingly, moving the leading-byte handling code to assembly didn't make a difference.

Some single-threaded benchmark results calling Popcnt 99999999 times on a 14-byte unaligned array:

C implementation: 0.219-0.232s
This code: 0.606-0.620s
C implementation using memcpy for trailing bytes: 0.964-0.983s

So Go's extra looping and function call overhead can almost triple runtime in the short-array limit, but that's actually not as bad as the 4.5x overhead of trusting memcpy to handle trailing bytes.

func PopcntUnsafe

func PopcntUnsafe(bytes []byte) int

PopcntUnsafe returns the number of set bits in the given []byte, assuming that any trailing bytes up to the next multiple of BytesPerWord are zeroed out.

func RemakeUnsafe

func RemakeUnsafe(bufptr *[]byte, len int)

RemakeUnsafe reuses the given buffer if it has sufficient capacity; otherwise it does the same thing as MakeUnsafe. It does NOT preserve existing contents of buf[]; use ResizeUnsafe() for that.

func RepeatI16

func RepeatI16(dst []int16, val int16)

RepeatI16 fills dst[] with the given int16.

func RepeatU16

func RepeatU16(dst []uint16, val uint16)

RepeatU16 fills dst[] with the given uint16.

func ResizeUnsafe

func ResizeUnsafe(bufptr *[]byte, len int)

ResizeUnsafe changes the length of buf and ensures it has enough extra capacity to be passed to this package's Unsafe functions. Existing buf[] contents are preserved (with possible truncation), though when length is increased, new bytes might not be zero-initialized.

func Reverse16InplaceRaw

func Reverse16InplaceRaw(main unsafe.Pointer, nElem int)

Reverse16InplaceRaw assumes main points to an array of ct 2-byte elements, and reverses it in-place.

func Reverse16Raw

func Reverse16Raw(dst, src unsafe.Pointer, nElem int)

Reverse16Raw assumes dst and src both point to arrays of ct 2-byte elements, and sets dst[pos] := src[ct - 1 - pos] for each position.

func Reverse8

func Reverse8(dst, src []byte)

Reverse8 sets dst[pos] := src[len(src) - 1 - pos] for every position in src. It panics if len(src) != len(dst).

func Reverse8Inplace

func Reverse8Inplace(main []byte)

Reverse8Inplace reverses the bytes in main[]. (There is no unsafe version of this function.)

func Reverse8Unsafe

func Reverse8Unsafe(dst, src []byte)

Reverse8Unsafe sets dst[pos] := src[len(src) - 1 - pos] for every position in src.

WARNING: This does not verify len(dst) == len(src); call the safe version of this function if you want that.

func ReverseI16

func ReverseI16(dst, src []int16)

ReverseI16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

func ReverseI16Inplace

func ReverseI16Inplace(main []int16)

ReverseI16Inplace reverses a []int16 in-place.

func ReverseU16

func ReverseU16(dst, src []uint16)

ReverseU16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

func ReverseU16Inplace

func ReverseU16Inplace(main []uint16)

ReverseU16Inplace reverses a []uint16 in-place.

func RoundUpPow2

func RoundUpPow2(val, alignment int) int

RoundUpPow2 returns val rounded up to a multiple of alignment, assuming alignment is a power of 2.

func SubtractFromConst8

func SubtractFromConst8(dst, src []byte, val byte)

SubtractFromConst8 sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

func SubtractFromConst8Inplace

func SubtractFromConst8Inplace(main []byte, val byte)

SubtractFromConst8Inplace subtracts every byte of main[] from the given constant, with unsigned underflow.

func SubtractFromConst8Unsafe

func SubtractFromConst8Unsafe(dst, src []byte, val byte)

SubtractFromConst8Unsafe sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow).

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func SubtractFromConst8UnsafeInplace

func SubtractFromConst8UnsafeInplace(main []byte, val byte)

SubtractFromConst8UnsafeInplace subtracts every byte of main[] from the given constant, with unsigned underflow.

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

func UnpackedNibbleLookup

func UnpackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookup sets the bytes in dst[] as follows:

if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
otherwise, set dst[pos] := 0

It panics if len(src) != len(dst).

func UnpackedNibbleLookupInplace

func UnpackedNibbleLookupInplace(main []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupInplace replaces the bytes in main[] as follows:

if value < 128, set to table[value & 15]
otherwise, set to 0

func UnpackedNibbleLookupS added in v0.0.10

func UnpackedNibbleLookupS(dst []byte, src string, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupS is a variant of UnpackedNibbleLookup() that takes string src.

func UnpackedNibbleLookupUnsafe

func UnpackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
otherwise, set dst[pos] := 0

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func UnpackedNibbleLookupUnsafeInplace

func UnpackedNibbleLookupUnsafeInplace(main []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupUnsafeInplace replaces the bytes in main[] as follows:

if value < 128, set to table[value & 15]
otherwise, set to 0

WARNING: This is a function designed to be used in inner loops, which makes assumptions about capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) must be at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func XcapUnsafe

func XcapUnsafe(bufptr *[]byte)

XcapUnsafe is shorthand for ResizeUnsafe's most common use case (no length change, just want to ensure sufficient capacity).

func Xor

func Xor(dst, src1, src2 []byte)

Xor sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst. It panics if slice lengths don't match.

func XorConst8

func XorConst8(dst, src []byte, val byte)

XorConst8 sets dst[pos] := src[pos] ^ val for every position in dst. It panics if slice lengths don't match.

func XorConst8Inplace

func XorConst8Inplace(main []byte, val byte)

XorConst8Inplace sets main[pos] := main[pos] ^ val for every position in main[].

func XorConst8Unsafe

func XorConst8Unsafe(dst, src []byte, val byte)

XorConst8Unsafe sets dst[pos] := src[pos] ^ val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func XorConst8UnsafeInplace

func XorConst8UnsafeInplace(main []byte, val byte)

XorConst8UnsafeInplace sets main[pos] := main[pos] ^ val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func XorInplace

func XorInplace(main, arg []byte)

XorInplace sets main[pos] := arg[pos] ^ main[pos] for every position in main[]. It panics if slice lengths don't match.

func XorUnsafe

func XorUnsafe(dst, src1, src2 []byte)

XorUnsafe sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func XorUnsafeInplace

func XorUnsafeInplace(main, arg []byte)

XorUnsafeInplace sets main[pos] := main[pos] ^ arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

Types

type NibbleLookupTable

type NibbleLookupTable struct {
	// contains filtered or unexported fields
}

NibbleLookupTable represents a parallel-byte-substitution operation f, where every byte b in a byte-slice is replaced with

f(b) := shuffle[0][b & 15] for b <= 127, and
f(b) := 0 for b > 127.

(The second part is usually irrelevant in practice, but must be defined this way to allow _mm_shuffle_epi8()/_mm256_shuffle_epi8()/_mm512_shuffle_epi8() to be used to implement the operation efficiently.) It's named NibbleLookupTable rather than ByteLookupTable since only the bottom nibble of each byte can be used for table lookup. It potentially stores multiple adjacent copies of the lookup table since that speeds up the AVX2 and AVX-512 use cases (the table can be loaded with a single _mm256_loadu_si256 operation, instead of e.g. _mm_loadu_si128 followed by _mm256_set_m128i with the same argument twice), and the typical use case involves initializing very few tables and using them many, many times.

func MakeNibbleLookupTable

func MakeNibbleLookupTable(table [16]byte) (t NibbleLookupTable)

MakeNibbleLookupTable generates a NibbleLookupTable from a [16]byte.

func (*NibbleLookupTable) Get

func (t *NibbleLookupTable) Get(b byte) byte

Get performs the b <= 127 part of the lookup operation described above. The b > 127 branch is omitted because in many use cases (e.g. PackedNibbleLookup below), it can be proven that b > 127 is impossible, and removing the if-statement is a significant performance win when it's possible.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL