Documentation

Overview

    Package simd provides access to SIMD-based implementations of several common operations on byte arrays which the compiler cannot be trusted to autovectorize within the next several years.

    The backend assumes SSE4.2 is available: init() checks for SSE4.2 support, and panics when it isn't there. The interface is designed to allow the backend to also autodetect e.g. AVX2/AVX-512 and opportunistically use those instructions, without any changes to properly written higher-level code. Implementation of the AVX2 part of this is in progress.

    The central constraint driving this package's design is the standard Go compiler's inability to inline short assembly functions; see

    https://groups.google.com/forum/#!topic/golang-nuts/yVOfeHYCIT4
    https://github.com/golang/go/issues/4947#issuecomment-66075571
    

    for more details. As a consequence, it is critical to push looping logic down to the assembly level as well, otherwise function call overhead is overwhelming. Conversely, due to the much higher development burden of this type of code, we don't go beyond that point; this package is essentially a collection of for loops.

    Two classes of functions are exported:

    - Functions with 'Unsafe' in their names are very performant, but are memory-unsafe, do not validate documented preconditions, and may have the unusual property of reading/writing to a few bytes *past* the end of the given slices. The MakeUnsafe() function and its relatives allocate byte-slices with sufficient extra capacity for all Unsafe functions with the latter property to work properly.

    - Their safe analogues work properly on ordinary slices, and often panic when documented preconditions are not met. When a precondition is not explicitly checked (due to computational cost), safe functions may return garbage values when the condition is not met, but they are memory-safe: they will not corrupt unrelated memory or perform out-of-bounds read operations.

    Index

    Constants

    View Source
    const BitsPerWord = BytesPerWord * 8

      BitsPerWord is the number of bits in a machine word.

      View Source
      const BytesPerWord = 8

        BytesPerWord is the number of bytes in a machine word. We don't use unsafe.Sizeof(uintptr(1)) since there are advantages to having this as an untyped constant, and there's essentially no drawback since this is an _amd64-specific file.

        View Source
        const Log2BytesPerWord = uint(3)

          Log2BytesPerWord is log2(BytesPerWord). This is relevant for manual bit-shifting when we know that's a safe way to divide and the compiler does not (e.g. dividend is of signed int type).

          Variables

          This section is empty.

          Functions

          func Accumulate8

          func Accumulate8(src []byte) int

            Accumulate8 returns the sum of the (unsigned) bytes in src[].

            func Accumulate8Greater

            func Accumulate8Greater(src []byte, val byte) int

              Accumulate8Greater returns the sum of all bytes in src[] greater than the given value.

              func AddConst8

              func AddConst8(dst, src []byte, val byte)

                AddConst8 sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

                func AddConst8Inplace

                func AddConst8Inplace(main []byte, val byte)

                  AddConst8Inplace adds the given constant to every byte of main[], with unsigned overflow.

                  func AddConst8Unsafe

                  func AddConst8Unsafe(dst, src []byte, val byte)

                    AddConst8Unsafe sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow).

                    WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

                    1. len(src) and len(dst) are equal.

                    2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

                    3. The caller does not care if a few bytes past the end of dst[] are changed.

                    func AddConst8UnsafeInplace

                    func AddConst8UnsafeInplace(main []byte, val byte)

                      AddConst8UnsafeInplace adds the given constant to every byte of main[], with unsigned overflow.

                      WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                      func And

                      func And(dst, src1, src2 []byte)

                        And sets dst[pos] := src1[pos] & src2[pos] for every position in dst. It panics if slice lengths don't match.

                        func AndConst8

                        func AndConst8(dst, src []byte, val byte)

                          AndConst8 sets dst[pos] := src[pos] & val for every position in dst. It panics if slice lengths don't match.

                          func AndConst8Inplace

                          func AndConst8Inplace(main []byte, val byte)

                            AndConst8Inplace sets main[pos] := main[pos] & val for every position in main[].

                            func AndConst8Unsafe

                            func AndConst8Unsafe(dst, src []byte, val byte)

                              AndConst8Unsafe sets dst[pos] := src[pos] & val for every position in dst.

                              WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                              1. len(src) and len(dst) must be equal.

                              2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                              3. The caller does not care if a few bytes past the end of dst[] are changed.

                              func AndConst8UnsafeInplace

                              func AndConst8UnsafeInplace(main []byte, val byte)

                                AndConst8UnsafeInplace sets main[pos] := main[pos] & val for every position in main[].

                                WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                2. The caller does not care if a few bytes past the end of main[] are changed.

                                func AndInplace

                                func AndInplace(main, arg []byte)

                                  AndInplace sets main[pos] := arg[pos] & main[pos] for every position in main[]. It panics if slice lengths don't match.

                                  func AndUnsafe

                                  func AndUnsafe(dst, src1, src2 []byte)

                                    AndUnsafe sets dst[pos] := src1[pos] & src2[pos] for every position in dst.

                                    WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

                                    1. len(src1), len(src2), and len(dst) must be equal.

                                    2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                    3. The caller does not care if a few bytes past the end of dst[] are changed.

                                    func AndUnsafeInplace

                                    func AndUnsafeInplace(main, arg []byte)

                                      AndUnsafeInplace sets main[pos] := main[pos] & arg[pos] for every position in main[].

                                      WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

                                      1. len(arg) and len(main) must be equal.

                                      2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                      3. The caller does not care if a few bytes past the end of main[] are changed.

                                      func BitFromEveryByte

                                      func BitFromEveryByte(dst, src []byte, bitIdx int)

                                        BitFromEveryByte fills dst[] with a bitarray containing every 8th bit from src[], starting with bitIdx, where bitIdx is in [0,7]. If len(src) is not divisible by 8, extra bits in the last filled byte of dst are set to zero.

                                        For example, if src[] is

                                        0x1f 0x33 0x0d 0x00 0x51 0xcc 0x34 0x59 0x44
                                        

                                        and bitIdx is 2, bit 2 from every byte is

                                        1    0    1    0    0    1    1    0    1
                                        

                                        so dst[] is filled with

                                        0x65 0x01.
                                        

                                        - It panics if len(dst) < (len(src) + 7) / 8, or bitIdx isn't in [0,7]. - If dst is larger than necessary, the extra bytes are not changed.

                                        func BytesPerVec

                                        func BytesPerVec() int

                                          BytesPerVec is an accessor for the bytesPerVec package variable.

                                          func Count2Bytes

                                          func Count2Bytes(src []byte, val1, val2 byte) int

                                            Count2Bytes counts the number of bytes in src[] which are equal to either val1 or val2. (bytes.Count() should be good enough for a single byte.)

                                            func Count3Bytes

                                            func Count3Bytes(src []byte, val1, val2, val3 byte) int

                                              Count3Bytes counts the number of bytes in src[] which are equal to val1, val2, or val3.

                                              func CountNibblesInSet

                                              func CountNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

                                                CountNibblesInSet counts the number of nibbles in src[] which are in the given set. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

                                                WARNING: This function does not validate the table. It may return a garbage result on invalid input. (However, it won't corrupt memory.)

                                                func CountNibblesInTwoSets

                                                func CountNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

                                                  CountNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

                                                  WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

                                                  func CountUnpackedNibblesInSet

                                                  func CountUnpackedNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

                                                    CountUnpackedNibblesInSet counts the number of bytes in src[] which are in the given set, assuming all bytes are <16. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

                                                    WARNING: This function does not validate the table. It may crash or return a garbage result on invalid input. (However, it won't corrupt memory.)

                                                    func CountUnpackedNibblesInTwoSets

                                                    func CountUnpackedNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

                                                      CountUnpackedNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

                                                      WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

                                                      func DivUpPow2

                                                      func DivUpPow2(dividend, divisor int, log2Divisor uint) int

                                                        DivUpPow2 efficiently divides a number by a power-of-2 divisor. (This works for negative dividends since the language specifies arithmetic right-shifts of signed numbers. I'm pretty sure this doesn't have a performance penalty.)

                                                        func FirstGreater8

                                                        func FirstGreater8(arg []byte, val byte, startPos int) int

                                                          FirstGreater8 scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

                                                          This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

                                                          func FirstGreater8Unsafe

                                                          func FirstGreater8Unsafe(arg []byte, val byte, startPos int) int

                                                            FirstGreater8Unsafe scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

                                                            This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

                                                            WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                            1. startPos is nonnegative.

                                                            2. cap(arg) >= RoundUpPow2(len, bytesPerVec).

                                                            func FirstLeq8

                                                            func FirstLeq8(arg []byte, val byte, startPos int) int

                                                              FirstLeq8 scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

                                                              This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

                                                              func FirstLeq8Unsafe

                                                              func FirstLeq8Unsafe(arg []byte, val byte, startPos int) int

                                                                FirstLeq8Unsafe scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

                                                                This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

                                                                See warning for FirstGreater8Unsafe.

                                                                func FirstUnequal8

                                                                func FirstUnequal8(arg1, arg2 []byte, startPos int) int

                                                                  FirstUnequal8 scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). It panics if the lengths are not identical, or startPos is negative.

                                                                  This is essentially an extension of bytes.Compare().

                                                                  func FirstUnequal8Unsafe

                                                                  func FirstUnequal8Unsafe(arg1, arg2 []byte, startPos int) int

                                                                    FirstUnequal8Unsafe scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). This has essentially the same speed as bytes.Compare().

                                                                    WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for arg2[].

                                                                    1. len(arg1) == len(arg2).

                                                                    2. Capacities are at least RoundUpPow2(len, bytesPerVec).

                                                                    func IndexU16

                                                                    func IndexU16(main []uint16, val uint16) int

                                                                      IndexU16 returns the index of the first instance of val in main, or -1 if val is not present in main.

                                                                      func Interleave8

                                                                      func Interleave8(dst, even, odd []byte)

                                                                        Interleave8 sets the bytes in dst[] as follows:

                                                                        if pos is even, dst[pos] := even[pos/2]
                                                                        if pos is odd, dst[pos] := odd[pos/2]
                                                                        

                                                                        It panics if ((len(dst) + 1) / 2) != len(even), or (len(dst) / 2) != len(odd).

                                                                        func Interleave8Unsafe

                                                                        func Interleave8Unsafe(dst, even, odd []byte)

                                                                          Interleave8Unsafe sets the bytes in dst[] as follows:

                                                                          if pos is even, dst[pos] := even[pos/2]
                                                                          if pos is odd, dst[pos] := odd[pos/2]
                                                                          

                                                                          WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on dst[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for even[] and odd[].

                                                                          1. len(even) = (len(dst) + 1) / 2, and len(odd) = len(dst) / 2.

                                                                          2. cap(dst) >= RoundUpPow2(len(dst) + 1, bytesPerVec), cap(even) >= RoundUpPow2(len(even) + 1, bytesPerVec), and cap(odd) >= RoundUpPow2(len(odd) + 1, bytesPerVec).

                                                                          3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                          func Invmask

                                                                          func Invmask(dst, src1, src2 []byte)

                                                                            Invmask sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst. It panics if slice lengths don't match.

                                                                            func InvmaskConst8

                                                                            func InvmaskConst8(dst, src []byte, val byte)

                                                                              InvmaskConst8 sets dst[pos] := src[pos] &^ val for every position in dst. It panics if slice lengths don't match.

                                                                              func InvmaskConst8Inplace

                                                                              func InvmaskConst8Inplace(main []byte, val byte)

                                                                                InvmaskConst8Inplace sets main[pos] := main[pos] &^ val for every position in main[].

                                                                                func InvmaskConst8Unsafe

                                                                                func InvmaskConst8Unsafe(dst, src []byte, val byte)

                                                                                  InvmaskConst8Unsafe sets dst[pos] := src[pos] &^ val for every position in dst.

                                                                                  WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                                                                                  1. len(src) and len(dst) must be equal.

                                                                                  2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                  3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                  func InvmaskConst8UnsafeInplace

                                                                                  func InvmaskConst8UnsafeInplace(main []byte, val byte)

                                                                                    InvmaskConst8UnsafeInplace sets main[pos] := main[pos] &^ val for every position in main[].

                                                                                    WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                    1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                    2. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                    func InvmaskInplace

                                                                                    func InvmaskInplace(main, arg []byte)

                                                                                      InvmaskInplace sets main[pos] := arg[pos] &^ main[pos] for every position in main[]. It panics if slice lengths don't match.

                                                                                      func InvmaskUnsafe

                                                                                      func InvmaskUnsafe(dst, src1, src2 []byte)

                                                                                        InvmaskUnsafe sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst.

                                                                                        WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

                                                                                        1. len(src1), len(src2), and len(dst) must be equal.

                                                                                        2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                        3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                        func InvmaskUnsafeInplace

                                                                                        func InvmaskUnsafeInplace(main, arg []byte)

                                                                                          InvmaskUnsafeInplace sets main[pos] := main[pos] &^ arg[pos] for every position in main[].

                                                                                          WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

                                                                                          1. len(arg) and len(main) must be equal.

                                                                                          2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                          3. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                          func MakeUnsafe

                                                                                          func MakeUnsafe(len int) []byte

                                                                                            MakeUnsafe returns a byte slice of the given length which is guaranteed to have enough capacity for all Unsafe functions in this package to work. (It is not itself an unsafe function: allocated memory is zero-initialized.) Note that Unsafe functions occasionally have other caveats: e.g. PopcntUnsafe also requires relevant bytes past the end of the slice to be zeroed out.

                                                                                            func MaskThenCountByte

                                                                                            func MaskThenCountByte(src []byte, mask, val byte) int

                                                                                              MaskThenCountByte counts the number of bytes in src[] satisfying

                                                                                              src[pos] & mask == val.
                                                                                              

                                                                                              func Memset16Raw

                                                                                              func Memset16Raw(dst, valPtr unsafe.Pointer, nElem int)

                                                                                                Memset16Raw assumes dst points to an array of nElem 2-byte elements, and valPtr points to a single 2-byte element. It fills dst with copies of *valPtr.

                                                                                                func Memset32Raw

                                                                                                func Memset32Raw(dst, valPtr unsafe.Pointer, nElem int)

                                                                                                  Memset32Raw assumes dst points to an array of nElem 4-byte elements, and valPtr points to a single 4-byte element. It fills dst with copies of *valPtr.

                                                                                                  func Memset8

                                                                                                  func Memset8(dst []byte, val byte)

                                                                                                    Memset8 sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case.)

                                                                                                    func Memset8Unsafe

                                                                                                    func Memset8Unsafe(dst []byte, val byte)

                                                                                                      Memset8Unsafe sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case; see https://github.com/golang/go/issues/5373 .)

                                                                                                      WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(dst), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of dst[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on dst[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                                      func Or

                                                                                                      func Or(dst, src1, src2 []byte)

                                                                                                        Or sets dst[pos] := src1[pos] | src2[pos] for every position in dst. It panics if slice lengths don't match.

                                                                                                        func OrConst8

                                                                                                        func OrConst8(dst, src []byte, val byte)

                                                                                                          OrConst8 sets dst[pos] := src[pos] | val for every position in dst. It panics if slice lengths don't match.

                                                                                                          func OrConst8Inplace

                                                                                                          func OrConst8Inplace(main []byte, val byte)

                                                                                                            OrConst8Inplace sets main[pos] := main[pos] | val for every position in main[].

                                                                                                            func OrConst8Unsafe

                                                                                                            func OrConst8Unsafe(dst, src []byte, val byte)

                                                                                                              OrConst8Unsafe sets dst[pos] := src[pos] | val for every position in dst.

                                                                                                              WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                                                                                                              1. len(src) and len(dst) must be equal.

                                                                                                              2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                                              3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                              func OrConst8UnsafeInplace

                                                                                                              func OrConst8UnsafeInplace(main []byte, val byte)

                                                                                                                OrConst8UnsafeInplace sets main[pos] := main[pos] | val for every position in main[].

                                                                                                                WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                                                1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                                                2. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                                                func OrInplace

                                                                                                                func OrInplace(main, arg []byte)

                                                                                                                  OrInplace sets main[pos] := arg[pos] | main[pos] for every position in main[]. It panics if slice lengths don't match.

                                                                                                                  func OrUnsafe

                                                                                                                  func OrUnsafe(dst, src1, src2 []byte)

                                                                                                                    OrUnsafe sets dst[pos] := src1[pos] | src2[pos] for every position in dst.

                                                                                                                    WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

                                                                                                                    1. len(src1), len(src2), and len(dst) must be equal.

                                                                                                                    2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                                                    3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                    func OrUnsafeInplace

                                                                                                                    func OrUnsafeInplace(main, arg []byte)

                                                                                                                      OrUnsafeInplace sets main[pos] := main[pos] | arg[pos] for every position in main[].

                                                                                                                      WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

                                                                                                                      1. len(arg) and len(main) must be equal.

                                                                                                                      2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                                                      3. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                                                      func PackedNibbleLookup

                                                                                                                      func PackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

                                                                                                                        PackedNibbleLookup sets the bytes in dst[] as follows:

                                                                                                                        if pos is even, dst[pos] := table[src[pos / 2] & 15]
                                                                                                                        if pos is odd, dst[pos] := table[src[pos / 2] >> 4]
                                                                                                                        

                                                                                                                        It panics if len(src) != (len(dst) + 1) / 2.

                                                                                                                        Nothing bad happens if len(dst) is odd and some high bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.

                                                                                                                        func PackedNibbleLookupUnsafe

                                                                                                                        func PackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

                                                                                                                          PackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

                                                                                                                          if pos is even, dst[pos] := table[src[pos / 2] & 15]
                                                                                                                          if pos is odd, dst[pos] := table[src[pos / 2] >> 4]
                                                                                                                          

                                                                                                                          WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-#3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                                                                                                                          1. len(src) == (len(dst) + 1) / 2.

                                                                                                                          2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.

                                                                                                                          3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                          func Popcnt

                                                                                                                          func Popcnt(bytes []byte) int

                                                                                                                            Popcnt returns the number of set bits in the given []byte.

                                                                                                                            Some effort has been made to make this run acceptably fast on relatively short arrays, since I expect knowing how to do so to be helpful when working with hundreds of millions of .bam reads with ~75 bytes of base data and ~150 bytes of quality data. Interestingly, moving the leading-byte handling code to assembly didn't make a difference.

                                                                                                                            Some single-threaded benchmark results calling Popcnt 99999999 times on a 14-byte unaligned array:

                                                                                                                            C implementation: 0.219-0.232s
                                                                                                                            This code: 0.606-0.620s
                                                                                                                            C implementation using memcpy for trailing bytes: 0.964-0.983s
                                                                                                                            

                                                                                                                            So Go's extra looping and function call overhead can almost triple runtime in the short-array limit, but that's actually not as bad as the 4.5x overhead of trusting memcpy to handle trailing bytes.

                                                                                                                            func PopcntUnsafe

                                                                                                                            func PopcntUnsafe(bytes []byte) int

                                                                                                                              PopcntUnsafe returns the number of set bits in the given []byte, assuming that any trailing bytes up to the next multiple of BytesPerWord are zeroed out.

                                                                                                                              func RemakeUnsafe

                                                                                                                              func RemakeUnsafe(bufptr *[]byte, len int)

                                                                                                                                RemakeUnsafe reuses the given buffer if it has sufficient capacity; otherwise it does the same thing as MakeUnsafe. It does NOT preserve existing contents of buf[]; use ResizeUnsafe() for that.

                                                                                                                                func RepeatI16

                                                                                                                                func RepeatI16(dst []int16, val int16)

                                                                                                                                  RepeatI16 fills dst[] with the given int16.

                                                                                                                                  func RepeatU16

                                                                                                                                  func RepeatU16(dst []uint16, val uint16)

                                                                                                                                    RepeatU16 fills dst[] with the given uint16.

                                                                                                                                    func ResizeUnsafe

                                                                                                                                    func ResizeUnsafe(bufptr *[]byte, len int)

                                                                                                                                      ResizeUnsafe changes the length of buf and ensures it has enough extra capacity to be passed to this package's Unsafe functions. Existing buf[] contents are preserved (with possible truncation), though when length is increased, new bytes might not be zero-initialized.

                                                                                                                                      func Reverse16InplaceRaw

                                                                                                                                      func Reverse16InplaceRaw(main unsafe.Pointer, nElem int)

                                                                                                                                        Reverse16InplaceRaw assumes main points to an array of ct 2-byte elements, and reverses it in-place.

                                                                                                                                        func Reverse16Raw

                                                                                                                                        func Reverse16Raw(dst, src unsafe.Pointer, nElem int)

                                                                                                                                          Reverse16Raw assumes dst and src both point to arrays of ct 2-byte elements, and sets dst[pos] := src[ct - 1 - pos] for each position.

                                                                                                                                          func Reverse8

                                                                                                                                          func Reverse8(dst, src []byte)

                                                                                                                                            Reverse8 sets dst[pos] := src[len(src) - 1 - pos] for every position in src. It panics if len(src) != len(dst).

                                                                                                                                            func Reverse8Inplace

                                                                                                                                            func Reverse8Inplace(main []byte)

                                                                                                                                              Reverse8Inplace reverses the bytes in main[]. (There is no unsafe version of this function.)

                                                                                                                                              func Reverse8Unsafe

                                                                                                                                              func Reverse8Unsafe(dst, src []byte)

                                                                                                                                                Reverse8Unsafe sets dst[pos] := src[len(src) - 1 - pos] for every position in src.

                                                                                                                                                WARNING: This does not verify len(dst) == len(src); call the safe version of this function if you want that.

                                                                                                                                                func ReverseI16

                                                                                                                                                func ReverseI16(dst, src []int16)

                                                                                                                                                  ReverseI16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

                                                                                                                                                  func ReverseI16Inplace

                                                                                                                                                  func ReverseI16Inplace(main []int16)

                                                                                                                                                    ReverseI16Inplace reverses a []int16 in-place.

                                                                                                                                                    func ReverseU16

                                                                                                                                                    func ReverseU16(dst, src []uint16)

                                                                                                                                                      ReverseU16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

                                                                                                                                                      func ReverseU16Inplace

                                                                                                                                                      func ReverseU16Inplace(main []uint16)

                                                                                                                                                        ReverseU16Inplace reverses a []uint16 in-place.

                                                                                                                                                        func RoundUpPow2

                                                                                                                                                        func RoundUpPow2(val, alignment int) int

                                                                                                                                                          RoundUpPow2 returns val rounded up to a multiple of alignment, assuming alignment is a power of 2.

                                                                                                                                                          func SubtractFromConst8

                                                                                                                                                          func SubtractFromConst8(dst, src []byte, val byte)

                                                                                                                                                            SubtractFromConst8 sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

                                                                                                                                                            func SubtractFromConst8Inplace

                                                                                                                                                            func SubtractFromConst8Inplace(main []byte, val byte)

                                                                                                                                                              SubtractFromConst8Inplace subtracts every byte of main[] from the given constant, with unsigned underflow.

                                                                                                                                                              func SubtractFromConst8Unsafe

                                                                                                                                                              func SubtractFromConst8Unsafe(dst, src []byte, val byte)

                                                                                                                                                                SubtractFromConst8Unsafe sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow).

                                                                                                                                                                WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

                                                                                                                                                                1. len(src) and len(dst) are equal.

                                                                                                                                                                2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

                                                                                                                                                                3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                                                                func SubtractFromConst8UnsafeInplace

                                                                                                                                                                func SubtractFromConst8UnsafeInplace(main []byte, val byte)

                                                                                                                                                                  SubtractFromConst8UnsafeInplace subtracts every byte of main[] from the given constant, with unsigned underflow.

                                                                                                                                                                  WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                                                                                                  func UnpackedNibbleLookup

                                                                                                                                                                  func UnpackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

                                                                                                                                                                    UnpackedNibbleLookup sets the bytes in dst[] as follows:

                                                                                                                                                                    if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
                                                                                                                                                                    otherwise, set dst[pos] := 0
                                                                                                                                                                    

                                                                                                                                                                    It panics if len(src) != len(dst).

                                                                                                                                                                    func UnpackedNibbleLookupInplace

                                                                                                                                                                    func UnpackedNibbleLookupInplace(main []byte, tablePtr *NibbleLookupTable)

                                                                                                                                                                      UnpackedNibbleLookupInplace replaces the bytes in main[] as follows:

                                                                                                                                                                      if value < 128, set to table[value & 15]
                                                                                                                                                                      otherwise, set to 0
                                                                                                                                                                      

                                                                                                                                                                      func UnpackedNibbleLookupS

                                                                                                                                                                      func UnpackedNibbleLookupS(dst []byte, src string, tablePtr *NibbleLookupTable)

                                                                                                                                                                        UnpackedNibbleLookupS is a variant of UnpackedNibbleLookup() that takes string src.

                                                                                                                                                                        func UnpackedNibbleLookupUnsafe

                                                                                                                                                                        func UnpackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

                                                                                                                                                                          UnpackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

                                                                                                                                                                          if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
                                                                                                                                                                          otherwise, set dst[pos] := 0
                                                                                                                                                                          

                                                                                                                                                                          WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                                                                                                                                                                          1. len(src) and len(dst) are equal.

                                                                                                                                                                          2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

                                                                                                                                                                          3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                                                                          func UnpackedNibbleLookupUnsafeInplace

                                                                                                                                                                          func UnpackedNibbleLookupUnsafeInplace(main []byte, tablePtr *NibbleLookupTable)

                                                                                                                                                                            UnpackedNibbleLookupUnsafeInplace replaces the bytes in main[] as follows:

                                                                                                                                                                            if value < 128, set to table[value & 15]
                                                                                                                                                                            otherwise, set to 0
                                                                                                                                                                            

                                                                                                                                                                            WARNING: This is a function designed to be used in inner loops, which makes assumptions about capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                                                                                                            1. cap(main) must be at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                                                                                                            2. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                                                                                                            func XcapUnsafe

                                                                                                                                                                            func XcapUnsafe(bufptr *[]byte)

                                                                                                                                                                              XcapUnsafe is shorthand for ResizeUnsafe's most common use case (no length change, just want to ensure sufficient capacity).

                                                                                                                                                                              func Xor

                                                                                                                                                                              func Xor(dst, src1, src2 []byte)

                                                                                                                                                                                Xor sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst. It panics if slice lengths don't match.

                                                                                                                                                                                func XorConst8

                                                                                                                                                                                func XorConst8(dst, src []byte, val byte)

                                                                                                                                                                                  XorConst8 sets dst[pos] := src[pos] ^ val for every position in dst. It panics if slice lengths don't match.

                                                                                                                                                                                  func XorConst8Inplace

                                                                                                                                                                                  func XorConst8Inplace(main []byte, val byte)

                                                                                                                                                                                    XorConst8Inplace sets main[pos] := main[pos] ^ val for every position in main[].

                                                                                                                                                                                    func XorConst8Unsafe

                                                                                                                                                                                    func XorConst8Unsafe(dst, src []byte, val byte)

                                                                                                                                                                                      XorConst8Unsafe sets dst[pos] := src[pos] ^ val for every position in dst.

                                                                                                                                                                                      WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

                                                                                                                                                                                      1. len(src) and len(dst) must be equal.

                                                                                                                                                                                      2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                                                                                                                      3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                                                                                      func XorConst8UnsafeInplace

                                                                                                                                                                                      func XorConst8UnsafeInplace(main []byte, val byte)

                                                                                                                                                                                        XorConst8UnsafeInplace sets main[pos] := main[pos] ^ val for every position in main[].

                                                                                                                                                                                        WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

                                                                                                                                                                                        1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                                                                                                                        2. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                                                                                                                        func XorInplace

                                                                                                                                                                                        func XorInplace(main, arg []byte)

                                                                                                                                                                                          XorInplace sets main[pos] := arg[pos] ^ main[pos] for every position in main[]. It panics if slice lengths don't match.

                                                                                                                                                                                          func XorUnsafe

                                                                                                                                                                                          func XorUnsafe(dst, src1, src2 []byte)

                                                                                                                                                                                            XorUnsafe sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst.

                                                                                                                                                                                            WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

                                                                                                                                                                                            1. len(src1), len(src2), and len(dst) must be equal.

                                                                                                                                                                                            2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

                                                                                                                                                                                            3. The caller does not care if a few bytes past the end of dst[] are changed.

                                                                                                                                                                                            func XorUnsafeInplace

                                                                                                                                                                                            func XorUnsafeInplace(main, arg []byte)

                                                                                                                                                                                              XorUnsafeInplace sets main[pos] := main[pos] ^ arg[pos] for every position in main[].

                                                                                                                                                                                              WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

                                                                                                                                                                                              1. len(arg) and len(main) must be equal.

                                                                                                                                                                                              2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

                                                                                                                                                                                              3. The caller does not care if a few bytes past the end of main[] are changed.

                                                                                                                                                                                              Types

                                                                                                                                                                                              type NibbleLookupTable

                                                                                                                                                                                              type NibbleLookupTable struct {
                                                                                                                                                                                              	// contains filtered or unexported fields
                                                                                                                                                                                              }

                                                                                                                                                                                                NibbleLookupTable represents a parallel-byte-substitution operation f, where every byte b in a byte-slice is replaced with

                                                                                                                                                                                                f(b) := shuffle[0][b & 15] for b <= 127, and
                                                                                                                                                                                                f(b) := 0 for b > 127.
                                                                                                                                                                                                

                                                                                                                                                                                                (The second part is usually irrelevant in practice, but must be defined this way to allow _mm_shuffle_epi8()/_mm256_shuffle_epi8()/_mm512_shuffle_epi8() to be used to implement the operation efficiently.) It's named NibbleLookupTable rather than ByteLookupTable since only the bottom nibble of each byte can be used for table lookup. It potentially stores multiple adjacent copies of the lookup table since that speeds up the AVX2 and AVX-512 use cases (the table can be loaded with a single _mm256_loadu_si256 operation, instead of e.g. _mm_loadu_si128 followed by _mm256_set_m128i with the same argument twice), and the typical use case involves initializing very few tables and using them many, many times.

                                                                                                                                                                                                func MakeNibbleLookupTable

                                                                                                                                                                                                func MakeNibbleLookupTable(table [16]byte) (t NibbleLookupTable)

                                                                                                                                                                                                  MakeNibbleLookupTable generates a NibbleLookupTable from a [16]byte.

                                                                                                                                                                                                  func (*NibbleLookupTable) Get

                                                                                                                                                                                                  func (t *NibbleLookupTable) Get(b byte) byte

                                                                                                                                                                                                    Get performs the b <= 127 part of the lookup operation described above. The b > 127 branch is omitted because in many use cases (e.g. PackedNibbleLookup below), it can be proven that b > 127 is impossible, and removing the if-statement is a significant performance win when it's possible.