hd

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 15, 2024 License: MIT Imports: 1 Imported by: 0

README

go-hackers-delight

codecov fuzzing Go Reference

An interactive Go showcase of "Hacker's Delight" 2nd Edition by Henrey S.Warren Jr, 2013.[^1][^2]

If you write optimizing compilers or high-performance code, you must read this book. — Guy L. Steele [PhD, ACM Award, AAAI Fellow, Sun Fellow, core Java design]

This is the first book that promises to tell the deep, dark secrets of computer arithmetic, and it delivers in spades. It contains every trick I knew plus many, many more. A godsend for library developers, compiler writers, and lovers of elegant hacks. It deserves a spot on your self right next to Knuth. In the ten years since the first edition came out, it's been absolutely invaluable to my work at Sun and Google. — Joshua Bloch [PhD, Distinguished Engineer at Sun, Chief Java Architect at Google 2004~2012]

Methodology
  • extensively fuzzing
  • not using any Go packages, not even standard library
  • using generics whenever possible
  • verifying compiled code via https://godbolt.org
Observations
  • native Abs performance is the same
  • native Div and Mod by small constants performance is better
  • native math/bits.Mul32 and math/bits.Mul64[^3] performance is the same
  • native math/bits.LeadingZeros performance is better than LeadingZeroes algorithm
  • native math.Sqrt performance is better
  • native math.Pow(x, 1./3)[^4] performance is worse than Cbrt algorithm 💥
  • native math.Pow(x, 1./3)[^4][^5] underflows but Cbrt algorithm is correct 💥
  • native math.Pow(x, n)[^4] performance is worse than Pow algorithm 💥
  • native math.Log2(x)[^4] performance is worse than Log2 algorithm 💥
  • native math.Log10(x)[^4] performance is worse than Log10 algorithm 💥
  • native math.Log10(x) [^4][^5] overflows for math.MaxUint64 but Log10 algorithm is correct 💥
  • native 1/math.Sqrt(x) performance is 1.5x better than RSqrtFloat32 algorithm
  • native switch performance is better than CycleThreeValues algorithm
  • native crc32.Checksum performance is 30x~500x better than CRC32 algorithms
  • simplistic LRUCache performance is 3x worse than LRUCache algorithm 💥
Appendix: Benchmarks
$ go test -bench .
goos: darwin
goarch: arm64
pkg: github.com/nikolaydubina/go-hackers-delight
BenchmarkNoop/---------------------------------16         	1000000000	         0.0000001 ns/op
BenchmarkAbs/basic-16                                     	1000000000	         0.9826 ns/op
BenchmarkAbs/Abs-16                                       	1000000000	         0.9647 ns/op
BenchmarkAbs/Abs2-16                                      	1000000000	         0.9943 ns/op
BenchmarkAbs/Abs3-16                                      	1000000000	         0.9819 ns/op
BenchmarkAbs/Abs4-16                                      	1000000000	         1.003 ns/op
BenchmarkAbs/AbsFastMul-16                                	1000000000	         0.9598 ns/op
BenchmarkAvg/basic-16                                     	973716225	         2.045 ns/op
BenchmarkAvg/AvgFloor-16                                  	602586224	         2.050 ns/op
BenchmarkAvg/AvgCeil-16                                   	582029594	         2.054 ns/op
BenchmarkCycleThree/basic-16                              	767160418	         1.560 ns/op
BenchmarkCycleThree/CycleThreeValues-16                   	438818894	         2.729 ns/op
BenchmarkLeadingZeros/uint32/basic-16                     	1000000000	         0.9419 ns/op
BenchmarkLeadingZeros/uint32/LeadingZerosUint32-16        	1000000000	         1.124 ns/op
BenchmarkLeadingZeros/uint64/basic-16                     	1000000000	         0.9230 ns/op
BenchmarkLeadingZeros/uint64/LeadingZerosUint64-16        	898095195	         1.336 ns/op
BenchmarkCompress/Compress-16                             	100000000	        10.60 ns/op
BenchmarkCompress/Compress2-16                            	55584826	        21.52 ns/op
BenchmarkLRU/basic-16                                     	246358870	         4.870 ns/op
BenchmarkLRU/LRUCache-16                                  	960896830	         1.239 ns/op
BenchmarkMul/uint32/basic-16                              	593555838	         1.892 ns/op
BenchmarkMul/uint32/MultiplyHighOrder32-16                	951445552	         2.046 ns/op
BenchmarkMul/uint64/basic-16                              	977065424	         1.220 ns/op
BenchmarkMul/uint64/MultiplyHighOrder64-16                	675693746	         2.042 ns/op
BenchmarkDivMod/DivMod/3/basic-16                         	1000000000	         0.8500 ns/op
BenchmarkDivMod/DivMod/3/DivMod3Signed-16                 	605588445	         1.970 ns/op
BenchmarkDivMod/DivMod/3/DivMod3Signed2-16                	1000000000	         1.078 ns/op
BenchmarkDivMod/DivMod/7/basic-16                         	1000000000	         0.8311 ns/op
BenchmarkDivMod/DivMod/7/DivMod7Signed-16                 	582087586	         2.105 ns/op
BenchmarkDivMod/Div/3/basic-16                            	1000000000	         0.8325 ns/op
BenchmarkDivMod/Div/3/Div3Signed-16                       	793883130	         1.509 ns/op
BenchmarkDivMod/Div/3/Div3ShiftSigned-16                  	907116610	         1.320 ns/op
BenchmarkDivMod/Div/7/basic-16                            	1000000000	         0.8344 ns/op
BenchmarkDivMod/Div/7/Div7Signed-16                       	755509315	         1.590 ns/op
BenchmarkDivMod/Div/7/Div7ShiftSigned-16                  	841563656	         1.424 ns/op
BenchmarkDivMod/Mod/3/basic-16                            	1000000000	         0.8309 ns/op
BenchmarkDivMod/Mod/3/Mod3Signed-16                       	812136249	         1.466 ns/op
BenchmarkDivMod/Mod/3/Mod3Signed2-16                      	1000000000	         0.8410 ns/op
BenchmarkDivMod/Mod/7/basic-16                            	1000000000	         0.8332 ns/op
BenchmarkDivMod/Mod/7/Mod7Signed-16                       	766677633	         1.564 ns/op
BenchmarkDivMod/Mod/7/Mod7Signed2-16                      	1000000000	         1.095 ns/op
BenchmarkDivMod/Mod/10/basic-16                           	1000000000	         0.8318 ns/op
BenchmarkDivMod/Mod/10/Mod10Signed-16                     	868932930	         1.441 ns/op
BenchmarkDivMod/DivExact/7/basic-16                       	1000000000	         0.9247 ns/op
BenchmarkDivMod/DivExact/7/DivExact7-16                   	1000000000	         0.9238 ns/op
BenchmarkDivMod/DivExact/7/Div7Signed-16                  	718667949	         1.668 ns/op
BenchmarkDivMod/DivExact/7/Div7ShiftSigned-16             	802988229	         1.490 ns/op
BenchmarkCbrt/basic-16                                    	47340079	        26.01 ns/op
BenchmarkCbrt/Cbrt-16                                     	85196262	        14.55 ns/op
BenchmarkPow/basic-16                                     	24005180	        48.25 ns/op
BenchmarkPow/Pow-16                                       	65121390	        19.08 ns/op
BenchmarkLog/uint32/2/basic-16                            	99810775	        12.06 ns/op
BenchmarkLog/uint32/2/Log2-16                             	984283590	         1.223 ns/op
BenchmarkLog/uint32/10/basic-16                           	140540709	         8.516 ns/op
BenchmarkLog/uint32/10/Log10-16                           	539441811	         2.220 ns/op
BenchmarkLog/uint64/2/basic-16                            	100000000	        11.73 ns/op
BenchmarkLog/uint64/2/Log2-16                             	839779903	         1.419 ns/op
BenchmarkLog/uint64/10/basic-16                           	142679388	         8.419 ns/op
BenchmarkLog/uint64/10/Log10-16                           	538269764	         2.228 ns/op
BenchmarkSqrt/basic-16                                    	1000000000	         1.019 ns/op
BenchmarkSqrt/SqrtNewton-16                               	188142513	         6.538 ns/op
BenchmarkSqrt/SqrtBinarySearch-16                         	74752382	        15.71 ns/op
BenchmarkSqrt/SqrtShiftAndSubtract-16                     	136426688	         8.834 ns/op
BenchmarkCRC32/basic-16                                   	220094371	         5.395 ns/op
BenchmarkCRC32/CRC32Basic-16                              	  448540	      2510 ns/op
BenchmarkCRC32/CRC32TableLookup-16                        	 8108901	       147.5 ns/op
BenchmarkRSqrtFloat32/basic-16                            	1000000000	         0.9354 ns/op
BenchmarkRSqrtFloat32/RSqrtFloat32-16                     	828149971	         1.448 ns/op
PASS
ok  	github.com/nikolaydubina/go-hackers-delight	91.183s

[^1]: showcase in Chttps://github.com/hcs0/Hackers-Delight [^2]: showcase in Rusthttps://github.com/victoryang00/Delight [^3]: given manual inlining of generic type, which produces equivalent Go code [^4]: we are comparing native float64 result converted to uint32, as there is no better standard function [^5]: which is due to float64 not having enough precision

Documentation

Overview

Chapter 2: Basics

For better performance, it is advisable to avoid branches in simple functions.

Right-to-Left Computability — a function mapping words to words can be implemented with word-parallel `add`, `substract`, `and`, `or` and `not` instructions if and only if each bit of of the result depends only on the bits at and to the right of each input operand. All these operands themselves depend only on right bits, so any of their composition also depends on the right bits.

It can be shown that in the ordinary addition of binary numbers with each bit independently equally likely to be 0 or 1, a carry occurs at each position with probability about 0.5.

If instruction set includes an instruction for each of 16 Boolean functions of two variables, then any Boolean function of three variables can be implemented with four or fewer instructions (and four variables with seven instructions).

Comparison functions are a couple branch-free instructions that store result in the most significant bit.

Overflow means result of operation is too large or too small to fit into the variable. Many hardware supply bits for overflow, but high-lever languages may not have access to it (rejected Go overflow proposal).

Chapter 8: Multiplication

Any multiplication can be decomposed in a summation of left shifts. For example,

  • 13 = 0b1101
  • x * 13 = 8x + 4x + x
  • t1 := x << 2
  • t2 := x << 3
  • result := t1 + t2 + x

For every multiplication there are multiple paths possible that utilize registers and shifts. These paths can have fewer or more instructions and registers to accomplish this. In general, it is nontrivial to find minimum number instructions required and which these instructions are. Following theorem gives upper bound: "Multiplication of a variable x by an n-bit constant m, m >= 1, can be accomplished with at most n instructions of the type add, subtract, and shift left by any given amount."

Chapter 9: Integer Division

  • Short Division is division of single word by another (e.g. 32bit / 32bit => 32bit). This is typical division operator available in high level languages.
  • Long Division is division of multi-word by single word (e.g. 64bit / 32bit => 32bit).

Chapter 10: Integer Division by Constants

On many computers, division is very time consuming and is to be avoided. A value of 20 or more elementary Add instructions is usually needed and execution time is usually large. When the divisor is a constant, the division can be replaced by faster operations.

Certain divisors have repetitive patterns in binary representation which allows to use few (~20) shift, add, subtract instructions to approximate residual and get quotient. It is not trivial how to generate such optimal code.

Chapter 12: Unusual Bases for Number Systems

Standard binary representation of an non-negative integer number is base 2.

  • (aₙ, ... a₃ a₂, a₁, a₀) = aₙ ⋅ 2ⁿ + ... + a₃ ⋅ 2³ + a₂ ⋅ 2² + a₁ ⋅ 2¹ + a₀ ⋅ 2⁰

However, there are alternative bases, including complex numbers.

  • (aₙ, ... a₃ a₂, a₁, a₀) = aₙ ⋅ -2ⁿ + ... + a₃ ⋅ -2³ + a₂ ⋅ -2² + a₁ ⋅ -2¹ + a₀ ⋅ -2⁰
  • (aₙ, ... a₃ a₂, a₁, a₀) = aₙ ⋅ (-1 + 𝑖)ⁿ + ... + a₃ ⋅ (-1 + 𝑖)³ + a₂ ⋅ (-1 + 𝑖)² + a₁ ⋅ (-1 + 𝑖)¹ + a₀ ⋅ (-1 + 𝑖)⁰
  • (aₙ, ... a₃ a₂, a₁, a₀) = aₙ ⋅ (-1 - 𝑖)ⁿ + ... + a₃ ⋅ (-1 - 𝑖)³ + a₂ ⋅ (-1 - 𝑖)² + a₁ ⋅ (-1 - 𝑖)¹ + a₀ ⋅ (-1 - 𝑖)⁰

Which base is the most efficient? It can be argued that under standard assumptions of hardware design (e.g. "cost of b-state circuit" is proportional to b) the most efficient base is 𝑒. However, it is not practical. Meanwhile, base 2 is only 5% more costly than base 3 and only 6% more costly than base 𝑒. Which is an interesting result, given base 2 is both convenient to use and is theoretically optimal.

Chapter 18: Primes

It is known, that there is no polynomial f(n) that produces a prime for every n.

Index

Examples

Constants

View Source
const SqrtFloat32ErrorRate float32 = 0.035

SqrtFloat32ErrorRate is error rate (abs(true_value - result) / result) that RSqrtFloat32 produces.

Variables

View Source
var CRC32Generator uint32 = 0x4C11DB7

CRC32Generator is Cycle Redundancy Check generator polynomial. Each bit i corresponds to the 1/0 of coefficient at x^i in polynomial. Among others, it is used in following standards: ISO 3309 (HDLC), ANSI X3.66 (ADCCP), ISO/IEC/IEEE 802-3 (Ethernet), MPEG-2, PKZIP, Gzip, Bzip2, POSIX cksum, PNG.

View Source
var CRC32GeneratorReversed uint32 = 0xEDB88320
View Source
var InterestingPrimes = [...]uint{
	(1 << 32) - 5,
}
View Source
var PowerOfTwo = [...]uint{
	1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
	16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
	4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 268435456,
	536870912, 1073741824, 2147483648, 4294967296, 8589934592,
	17179869184, 34359738368, 68719476736, 137438953472, 274877906944,
	549755813888, 1099511627776, 2199023255552, 4398046511104, 8796093022208,
	17592186044416, 35184372088832, 70368744177664, 140737488355328,
	281474976710656, 562949953421312, 1125899906842624, 2251799813685248,
	4503599627370496, 9007199254740992, 18014398509481984, 36028797018963968,
	72057594037927936, 144115188075855872, 288230376151711744, 576460752303423488,
	1152921504606846976, 2305843009213693952, 4611686018427387904, 9223372036854775808,
}

PowerOfTwo that fit into uint64.

Functions

func Abs

func Abs[T Signed](x T) T

Abs can be computed in three or four branch-free instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Print(hd.Abs(-42))
}
Output:

42

func Abs2

func Abs2[T Signed](x T) T

func Abs3

func Abs3[T Signed](x T) T

func Abs4

func Abs4[T Signed](x T) T

func AbsDiff

func AbsDiff[T Integer](x, y T) T

AbsDiff is absolute difference that does not overflow.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Print(hd.AbsDiff(1, 100))
}
Output:

99

func AbsDiff2

func AbsDiff2[T Signed](x, y T) T

func AbsDiffUnsigned

func AbsDiffUnsigned(x, y uint32) uint32

func AbsFastMul

func AbsFastMul(x int32) int32

AbsFastMul when you have fast instruction for +1/-1 multiplication.

func AddDoubleLength

func AddDoubleLength(x [2]uint32, y [2]uint32) [2]uint32

AddDoubleLength of uint64 numbers encoded in two uint32. This takes nine instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{310, 410}
	y := [2]uint32{100, 200}
	fmt.Println(hd.AddDoubleLength(x, y))
}
Output:

[410 610]

func AddFourUint8

func AddFourUint8(x, y uint32) uint32

AddFourUint8 executes in eight branch-free instructions

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := hd.FourUint8ToUint32([4]uint8{1, 2, 254, 4})
	y := hd.FourUint8ToUint32([4]uint8{2, 3, 1, 5})

	sum := hd.AddFourUint8(x, y)

	fmt.Println(hd.FourUint8FromUint32(sum))
}
Output:

[3 5 255 9]

func AddTwoUint16

func AddTwoUint16(x, y uint32) uint32

AddFourUint8 executes in seven branch-free instructions

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := hd.TwoUint16ToUint32([2]uint16{65535 - 1, 4})
	y := hd.TwoUint16ToUint32([2]uint16{1, 5})

	sum := hd.AddTwoUint16(x, y)

	fmt.Println(hd.TwoUint16FromUint32(sum))
}
Output:

[65535 9]

func AvgCeil

func AvgCeil[T Integer](x, y T) T

AvgCeil does not cause overflow. Rounded up. In Go, when left operand of shift operator is signed int, then arithmetic shift is performed. Arithmetic shift is same thing as signed shift. Since in Go right shift is unsigned, then formula for int32 is the same as for uint32.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Print(hd.AvgCeil[int32](-101, -200))
}
Output:

-150

func AvgFloor

func AvgFloor[T Integer](x, y T) T

AvgFloor does not cause overflow. Rounded down. This is same as unsigned version, but uses signed-shift right. In Go, when left operand of shift operator is signed int, then arithmetic shift is performed. Since in Go right shift is unsigned, then formula for int32 is the same as for uint32.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Print(hd.AvgFloor[int32](-101, -200))
}
Output:

-151

func BitSize32

func BitSize32(x int32) uint8

BitSize32 returns minimum number of bits requires to represent number in two's complement signed number.

Example (Bit32)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.BitSize32(0xFFFF_FFFF >> 1))
}
Output:

32
Example (Bit6)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.BitSize32(0b0000_1101))
}
Output:

5
Example (Bit7)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.BitSize32(0b0011_1101))
}
Output:

7
Example (Zero)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.BitSize32(0))
}
Output:

0

func CLPTwo

func CLPTwo(x uint32) uint32

CLPTwo is Ceil to nearest Power of Two. Values >= 2^31 will be zero.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.CLPTwo(200))
}
Output:

256

func CLPTwo2

func CLPTwo2(x uint32) uint32

func CLPTwo3

func CLPTwo3(x uint32) uint32

CLPTwo3 is alternative branch-free version when NLZ is not available.

func CRC32Basic

func CRC32Basic(data []byte) uint32

CRC32Basic computes CRC 32bit checksum one bit a time.

func CRC32TableLookup

func CRC32TableLookup(data []byte) uint32

CRC32TableLookup uses precomputed values.

func CSA

func CSA[T Unsigned](a, b, c T) (h, l T)

CSA is Carry Save Adder

func Cbrt

func Cbrt(x uint32) uint32

Cbrt is cube root, it starts of with shift based Newton algorithm for Sqrt and modifies it to cuber root

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Cbrt(64))
}
Output:

4

func CheckBits

func CheckBits(u uint32) uint32

CheckBits produces 6bits of parity checks. It is simple Hamming Code based error detection and correction scheme.

Example (No_error)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	v := uint32(0xABCD_EF01)
	p := hd.CheckBits(v)
	u, errb := hd.Correct(p, v)
	fmt.Printf("%0x %v", u, errb)
}
Output:

abcdef01 0

func CompareCountOnes

func CompareCountOnes(x, y uint32) int
Example (Left_bigger)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 0b0000_0100_0100_01111
	var y uint32 = 0b0000_0100_0100_01110
	fmt.Println(hd.CompareCountOnes(x, y))
}
Output:

1
Example (Left_smaller)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 0b0000_0100_0100_01110
	var y uint32 = 0b0000_0100_0100_01111
	fmt.Println(hd.CompareCountOnes(x, y))
}
Output:

-1
Example (Same)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 0b0000_0100_0100_01110
	var y uint32 = 0b0000_0100_0100_01110
	fmt.Println(hd.CompareCountOnes(x, y))
}
Output:

0

func Compress

func Compress(x, m uint32) uint32

Compress (Generalized Extract) selects bits from the x where m is 1 and puts them in order to least significant bits. This uses divide and conquer. This executes in 169 instructions on 64bit RISC. There is also hardware focused version that does not work well in general RISC.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	//          x:   abcd_efgh_ijkl_mnop_qrst_uvwx_yzAB_CDEF
	var x uint32 = 0b1010_1011_0101_1110_0001_0101_0000_1000
	var m uint32 = 0b0000_1111_0011_0011_1010_1010_0101_0101
	// 		  exp:   0000_0000_0000_0000_efgh_klop_qsuw_zBDF
	fmt.Printf("%032b", hd.Compress(x, m))
}
Output:

00000000000000001011011000000000

func Compress2

func Compress2[T Unsigned](x, m T) T

Compress2 is direct version of Compress. This is 260 RISC instructions in loop brach-free.

func Correct

func Correct(pr, u uint32) (cr uint32, errb uint8)

Correct bits in pr that contain 32 "information" bits stored in ur and 7 "check" bits stored in pr. Check bits are extracted and error is corrected in returned result. This can correct up to 1 wrong bit. This is based on Hamming error correcting code. It returns corrected bits and number of error bits detected (0,1,2). TODO: 2 bits error detection is not working.

func CountOnes

func CountOnes(x uint32) uint32

CountOnes (pop) uses divide and conquer to count number set bits. Folklore says counting ones is important for NSA, but nobody knows why. Applications of this function include Hamming Distance.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.CountOnes5(uint32(0b0000_0100_0100_01110)))
}
Output:

5

func CountOnes1

func CountOnes1(x uint32) uint32

CountOnes1 is optimized version of divide and conquer that executes in 21 instructions and is branch-free.

func CountOnes3

func CountOnes3(x uint32) uint32

CountOnes3 it executes in 19 RISC instructions, but works well on machines with two addresses.

func CountOnes4

func CountOnes4[T Unsigned](x T) T

CountOnes4 is very fast for if number of 1 bits is small.

func CountOnes5

func CountOnes5(x uint32) uint32

CountOnes5 uses uint64 registers. It works only with uint15.

func CountOnesArrayGroup8

func CountOnesArrayGroup8(A []uint32) uint32

CountOnesArrayGroup8 utilizes CSA for faster count ones computation. It is executing with 3x less instructions than direct version with CountOnes per uint32.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	vs := []uint32{
		0b0000_0100_0100_01110,
		0b0001_0100_0100_01110,
		0b0000_0100_0100_01110,
		0b0011_0100_0100_01110,
		0b0000_0100_0100_01110,
		0b0111_0100_0100_01110,
		0b1001_0100_0100_01110,
		0b0000_0100_0100_01110,
		0b0101_0100_0100_01110,
		0b0101_0100_0100_01110,
	}
	fmt.Println(hd.CountOnesArrayGroup8(vs))
}
Output:

62

func CycleThreeIdentifier

func CycleThreeIdentifier(vs [3]int32, n1, n2 uint8) [2][3]int32

CycleThreeIdentifier constructs identifier byte from bits stored at positions n1 and n2.

func CycleThreeValues

func CycleThreeValues(a, b, c, x int32) int32

CycleThreeValues is an ingenious and very efficient (8 branch-free instructions) way of cycling in three different constants. It requires pre-computing constants that can be done at compile time. It relies on the fact that among three different constants there are always two bit positions where they differ and each time one-odd-out.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var c int32 = 0b10101 // 21
	var a int32 = 0b11111 // 31
	var b int32 = 0b10100 // 20

	out := []int32{a}
	for range 10 {
		out = append(out, hd.CycleThreeValues(a, b, c, out[len(out)-1]))
	}

	fmt.Println(out)
}
Output:

[31 20 21 31 20 21 31 20 21 31 20]

func DifferenceOrZero

func DifferenceOrZero[T Signed](x, y T) T

DifferenceOrZero (doz) returns x - y if x > y else 0. It computes in 7-10 branch-free RISC instructions. Note, smaller DifferenceOrZero version with comparison is skipped as Go lacks comparison returning int.

func DifferenceOrZeroRanges

func DifferenceOrZeroRanges[T int32 | uint32](x, y T) T

DifferenceOrZeroRanges requires signed x and y be in range [-2^30, 2^30-1] and unsigned x and y be in range [0, 2^31-1]

func DifferenceOrZeroUnsigned

func DifferenceOrZeroUnsigned(x, y uint32) uint32

DifferenceOrZeroUnsigned (dozu) computes in 7-10 branch-free RISC instructions. Note, smaller DOZ version with comparison is skipped as Go lacks comparison returning int.

func Div1000ShiftSigned

func Div1000ShiftSigned(n int32) int32

Div1000ShiftSigned (divs1000)

func Div1000ShiftUnsigned

func Div1000ShiftUnsigned(n uint32) uint32

Div1000ShiftUnsigned (div1000) is 23 instructions.

func Div100ShiftSigned

func Div100ShiftSigned(n int32) int32

Div100ShiftSigned (divs100)

func Div100ShiftUnsigned

func Div100ShiftUnsigned(n uint32) uint32

Div100ShiftUnsigned (div100) is 25 instructions.

func Div10ShiftSigned

func Div10ShiftSigned(n int32) int32

Div10ShiftSigned (divs10)

func Div10ShiftUnsigned

func Div10ShiftUnsigned(n uint32) uint32

Div10ShiftUnsigned (div10)

func Div11ShiftSigned

func Div11ShiftSigned(n int32) int32

Div11ShiftSigned (divs11)

func Div11ShiftUnsigned

func Div11ShiftUnsigned(n uint32) uint32

Div11ShiftUnsigned (div11)

func Div12ShiftSigned

func Div12ShiftSigned(n int32) int32

Div12ShiftSigned (divs12)

func Div12ShiftUnsigned

func Div12ShiftUnsigned(n uint32) uint32

Div12ShiftUnsigned (div12)

func Div13ShiftSigned

func Div13ShiftSigned(n int32) int32

Div13ShiftSigned (divs13)

func Div13ShiftUnsigned

func Div13ShiftUnsigned(n uint32) uint32

Div13ShiftUnsigned (div13)

func Div3ShiftSigned

func Div3ShiftSigned(n int32) int32

Div3ShiftSigned (divs3)

func Div3ShiftUnsigned

func Div3ShiftUnsigned(n uint32) uint32

Div3ShiftUnsigned (divu3) uses shifts and adds to approximate through residual. It is 18 instructions.

func Div3Signed

func Div3Signed(n int32) int32

Div3Signed this computes in 6 instructions. This is 9 or 10 cycles. Meanwhile, divide operation can take 20 cycles.

func Div5ShiftSigned

func Div5ShiftSigned(n int32) int32

Div5ShiftSigned (divs5)

func Div5ShiftUnsigned

func Div5ShiftUnsigned(n uint32) uint32

Div5ShiftUnsigned (divu5a)

func Div5Signed

func Div5Signed(n int32) int32

Div5Signed is similar to Div3Signed, but error terms is too large, and thus it needs slight variation of magic constant and correction with shift right.

func Div6ShiftSigned

func Div6ShiftSigned(n int32) int32

Div6ShiftSigned (divs6)

func Div6ShiftUnsigned

func Div6ShiftUnsigned(n uint32) uint32

Div6ShiftUnsigned (divu6a)

func Div7ShiftSigned

func Div7ShiftSigned(n int32) int32

Div7ShiftSigned (divs7)

func Div7ShiftUnsigned

func Div7ShiftUnsigned(n uint32) uint32

Div7ShiftUnsigned (divu7)

func Div7Signed

func Div7Signed(n int32) int32

Div7Signed is similar to Div3Signed, but error terms is too large, and thus it needs slight variation of magic constant and correction with shift right. The magic is still too large, and thus it needs flipping sign and addition.

func Div9ShiftSigned

func Div9ShiftSigned(n int32) int32

Div9ShiftSigned (divs9)

func Div9ShiftUnsigned

func Div9ShiftUnsigned(n uint32) uint32

Div9ShiftUnsigned (div9)

func DivExact

func DivExact[T Integer](n, d T) T

DivExact uses multiplicative inverse that can be computed at compile time. It relies on theorem "if a and m are relatively prime integers, then there exists an integer ā such that a*ā = 1 (mod m)".

func DivExact7

func DivExact7(n int32) int32

func DivMod3Signed

func DivMod3Signed(n int32) (q, r int32)

func DivMod3Signed2

func DivMod3Signed2(n int32) (q, r int32)

DivMod3Signed2 calculates reminder first.

func DivMod5Signed

func DivMod5Signed(n int32) (q, r int32)

func DivMod7Signed

func DivMod7Signed(n int32) (q, r int32)

func DivModConstSigned

func DivModConstSigned(n, d int32) (q, r int32)

DivModConstSigned illustrates division by constant. This code should be generated at compile time depending on the value of compile time constant k and result of MagicSigned execution. The basic trick is to multiply by magic number 2**32/d and then extract leftmost 32 bits of the product.

func DivModConstSignedMagic

func DivModConstSignedMagic(d int32) (M, s int32)

DivModConstSignedMagic computes magic number and shift amount for signed integer division. This values can be computed at compile time. Code using big numbers can be simplified, such as in the case of Python or math.Big, it is no listed here.

func DivModConstUnsigned

func DivModConstUnsigned(n, d uint32) (q, r uint32)

func DivModConstUnsignedMagic

func DivModConstUnsignedMagic(d uint32) (M uint32, a, s int32)

DivModConstUnsignedMagic computes magic for unsigned constant multiplication. This values can be computed at compile time. Code using big numbers can be simplified, such as in the case of Python or math.Big, it is no listed here.

func DivModLongUnsigned64b32b

func DivModLongUnsigned64b32b(x uint64, y uint32) (q, r uint32)

DivModLongUnsigned64b32b (divlu) performs long division of 64-bit unsigned integer by 32-bit unsigned integer. This algorithm is slightly modified to store both lower and higher 32 bits of dividend into 64-bit number. This algorithm uses shift-and-subtract operations. It illustrates how hardware is doing such division. It does not work for overflow cases. This executes in 321 to 385 RISC instructions.

func DivModLongUnsigned64b32b2

func DivModLongUnsigned64b32b2(x uint64, y uint32) (q, r uint32)

DivModLongUnsigned64b32b2 is alternative version based on multiword division. If overflow, it returns maximum quotient and reminder.

func DivModMultiWordUnsigned

func DivModMultiWordUnsigned(q, r, u, v []uint16)

DivModMultiWordUnsigned is Knuth algorithm for integer division. It stores quotient in q and remainder in r.

func DivModSignedPowTwo

func DivModSignedPowTwo(n int32, k int) (q, r int32)

DivModSignedPowTwo takes 7 instructions, 5 instructions for division and 2 instructions for reminder.

func DivModUnsignedPowTwo

func DivModUnsignedPowTwo(n uint32, k int) (q, r uint32)

func DivModUnsignedSeven

func DivModUnsignedSeven(n uint32) (q, r uint32)

func DivModUnsignedThree

func DivModUnsignedThree(n uint32) (q, r uint32)

func DivSignedPowTwo

func DivSignedPowTwo(n int32, k int) int32

DivSignedPowTwo returns n / 2 ** k for 1 <= k < 31. This is illustration, for performance k should be fixed at compile time. This is four branch-free instructions.

func DivSignedPowTwo2

func DivSignedPowTwo2(n int32, k int) int32

DivSignedPowTwo2 is alternative version that uses branch.

func DivSignedPowTwo_fixed5

func DivSignedPowTwo_fixed5(n int32) int32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.DivSignedPowTwo_fixed5(96))
}
Output:

3

func DoubleLengthInt32FromUint64

func DoubleLengthInt32FromUint64(x uint64) [2]uint32

DoubleLengthInt32FromUint64 unpacks two uint32 from single uint64. This is known as Double-Length arithmetics. This can be implemented in five instructions by using only 31 bits and storing carry in most significant bit. Here is only simplified versions.

func Equal

func Equal[T Signed](x, y T) T
Example (Equal)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Equal(10, 10) >> 31)
}
Output:

-1
Example (Not_equal)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Equal(10, 11) >> 31)
}
Output:

0

func Equal2

func Equal2(x, y int32) int32

func Equal3

func Equal3(x, y int32) int32

func Equal4

func Equal4(x, y int32) int32

func Equal5

func Equal5[T Integer](x, y T) T

func EqualZero

func EqualZero[T Signed](x T) T

func EqualZero2

func EqualZero2(x int32) int32

func EqualZero3

func EqualZero3(x int32) int32

func EqualZero4

func EqualZero4[T Integer](x T) T

func EqualZero5

func EqualZero5[T Integer](x T) T

func ExchangeBitsInRegister

func ExchangeBitsInRegister[T Unsigned](x, md, mo T, k int) T

ExchangeBitsInRegister swaps two regions of bits within single register without affecting other bits. For example, swapping B and D regions. B and D should be same size, but A C E and B/D can be different sizes. k is number of bits of C + D md is mask for field D mo is mask for fields A C and E (all fields to not swap) [aaaa bbbb cccc dddd eeee] -> [aaaa dddd cccc bbbb eeee] This is straight forward approach and requires eleven instructions and six cycles.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var xx uint32 = 0xABCDEF12
	var mb uint32 = 0x0000F000
	var mo uint32 = 0xF0FF0FFF
	fmt.Printf("%0X", hd.ExchangeBitsInRegister(xx, mb, mo, 4*3))
}
Output:

AECDBF12

func ExchangeBitsInRegisterFast

func ExchangeBitsInRegisterFast[T Unsigned](x, md, mo T, k int) T

ExchangeBitsInRegisterFast requires eight instructions and five cycles.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var xx uint32 = 0xABCDEF12
	var mb uint32 = 0x0000F000
	var mo uint32 = 0xF0FF0FFF
	fmt.Printf("%0X", hd.ExchangeBitsInRegisterFast(xx, mb, mo, 4*3))
}
Output:

AECDBF12

func ExchangeRegisters

func ExchangeRegisters[T Integer](x, y T) (T, T)

ExchangeRegisters illustrates very old trick on how to exchange two registers without using third one. This is swap operation. Also known as multiple assignment in Go.

func ExchangeRegistersMasked

func ExchangeRegistersMasked[T Integer](x, y, m T) (T, T)

ExchangeRegistersMasked illustrates how to exchange only masked bits between two registers. This can be done in 3 cycles, given parallelism and and-not instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 0xABCDEF12
	var y uint32 = 0x12345678
	var m uint32 = 0x0F0F0F0F
	x, y = hd.ExchangeRegistersMasked(x, y, m)
	fmt.Printf("\n%0X\n%0X", x, y)
}
Output:

A2C4E618
1B3D5F72
Example (Bits)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 0b11110000111100001111000011110000
	var y uint32 = 0b00001111000011110000111100001111
	var m uint32 = 0b00001111111100000000000000000000

	x, y = hd.ExchangeRegistersMasked(x, y, m)
	fmt.Printf("\n%032b\n%032b", x, y)
}
Output:

11111111000000001111000011110000
00000000111111110000111100001111

func ExchangeRegistersMasked2

func ExchangeRegistersMasked2[T Integer](x, y, m T) (T, T)

func ExchangeRegistersMasked3

func ExchangeRegistersMasked3[T Integer](x, y, m T) (T, T)

ExchangeRegistersMasked3 is heavily using bitwise equality, but in Go there is no such operator, so expanding that notation.

func ExchangeRegistersMasked4

func ExchangeRegistersMasked4[T Integer](x, y, m T) (T, T)

ExchangeRegistersMasked4 also executes in three cycles, given sufficient instruction parallelism in the CPU.

func Expand

func Expand(x, m uint32) uint32

Expand (Generalized Insert, unpack, scatter, deposit) selects bits from the x where m is 1 and puts them in order to least significant bits. This is reverse of Compress steps. This is 168 RISC instructions, or 200 instructions on 64bit RISC. There is also hardware focused version that does not work well in general RISC.

func ExtendSign7

func ExtendSign7(x uint32) uint32

ExtendSign8 sign-extends a 8-bit number to a 32-bit number. Sign of 8-bit number is stored in 8th-bit. Sign extension is treating n-th least significant bit as sign bit and copying it to all more significant bits. This is usually implemented with shift-left-logical followed by shift-right-arithmetic, but alternative may be useful if you don't have shift.

Example (Extended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7(0b1110_1010))
}
Output:

11111111111111111111111111101010
Example (NotExtended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7(0b0010_1010))
}
Output:

00101010

func ExtendSign7Three

func ExtendSign7Three(x uint32) uint32

ExtendSign7Three is alternative version

Example (Extended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7Three(0b1110_1010))
}
Output:

11111111111111111111111111101010
Example (NotExtended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7Three(0b0010_1010))
}
Output:

00101010

func ExtendSign7Two

func ExtendSign7Two(x uint32) uint32

ExtendSign7Two is alternative version. If you know all higher order bits are zero, then `and` can be omitted.

Example (Extended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7Two(0b1110_1010))
}
Output:

11111111111111111111111111101010
Example (NotExtended)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ExtendSign7Two(0b0010_1010))
}
Output:

00101010

func FLPTwo

func FLPTwo(x uint32) uint32

FLPTwo is Floor to nearest Power of Two. Values >= 2^31 will be zero. This formula works for x > 0.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.FLPTwo(310))
}
Output:

256

func FLPTwo2

func FLPTwo2(x uint32) uint32

FLPTwo2 is alternative version that works for x == 0 as well.

func FLPTwo3

func FLPTwo3(x uint32) uint32

func FLPTwo4

func FLPTwo4(x uint32) uint32

FLPTwo4 is alternative branch-free version when NLZ is not available.

func FLPTwo5

func FLPTwo5(x uint32) uint32

FLPTwo5 uses simple loop that executes in 4 * NLZ(x) + 3 instructions.

func FindFirstStringOnes

func FindFirstStringOnes(x uint32, n int) int

FindFirstStringOnes (FFStr1) finds index of first string of 1s of length n or 32 if none found. This uses divide and conquer and executes in 3 to 36 RISC instructions. This loop can be effectively unrolled.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.FindFirstStringOnes(0b0000_0000_0000_0000_1100_1110_1111_1000, 5))
}
Output:

24

func FindFirstStringOnes1

func FindFirstStringOnes1(x uint32, n int) int

FindFirstStringOnes1 is is direct algorithm. For worst case this is not good, it is 178 RISC instructions for 0x5555_5555.

func FindInByte

func FindInByte(x uint32, m byte) int

FindInByte m in x illustrates that to find specific byte you need to XOR with repeated value.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.FindInByte(0x12_00_FF_00, 0xFF))
}
Output:

2

func FindInByteEq

func FindInByteEq(x, y uint32) int

FindInByteEq finds index of first byte that x and y are equal.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.FindInByteEq(0x12_00_F9_00, 0x23_01_F9_00))
}
Output:

2

func FirstOneOffDifferentBits

func FirstOneOffDifferentBits[T int32 | uint32](vs [3]T) [3]int8

FirstOneOffDifferentBits computes 1-based position of first least significant bit that is one-off among three values for corresponding value or -1 if there is no such bit.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var a int32 = 0b11111
	var b int32 = 0b10100
	var c int32 = 0b10101
	fmt.Println(hd.FirstOneOffDifferentBits([3]int32{a, b, c}))
}
Output:

[1 0 -1]

func FourUint8FromUint32

func FourUint8FromUint32(x uint32) [4]uint8

FourUint8FromUint32 unpacks 4 uint8 from single uint32.

func FourUint8ToUint32

func FourUint8ToUint32(x [4]uint8) uint32

FourUint8ToUint32 packs 4 uint4 into single uint32.

func GrayCodeFromUint

func GrayCodeFromUint[T Unsigned](n T) T

GrayCodeFromUint converts a binary number to Gray code. Gray Code is sequence of binary numbers where each successive differs only by one bit. It is possible to iterate through all 2ⁿ numbers in single Gray code. There are many versions of Gray Codes, here we use most common one, "reflected binary Gray code". Gray codes named after Frank Gray, physicist at Bell Labs, who invented it in 1930s for TV.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	for i := range 10 {
		fmt.Printf("%08b -> %08b\n", i, hd.GrayCodeFromUint(uint64(i)))
	}
}
Output:

00000000 -> 00000000
00000001 -> 00000001
00000010 -> 00000011
00000011 -> 00000010
00000100 -> 00000110
00000101 -> 00000111
00000110 -> 00000101
00000111 -> 00000100
00001000 -> 00001100
00001001 -> 00001101

func GrayCodeToUint16

func GrayCodeToUint16(g uint16) (b uint16)

func GrayCodeToUint32

func GrayCodeToUint32(g uint32) (b uint32)

func GrayCodeToUint64

func GrayCodeToUint64(g uint64) (b uint64)

func HigherEqualZero

func HigherEqualZero[T Signed](x T) T

func HigherZero

func HigherZero[T Signed](x T) T

func HigherZero2

func HigherZero2[T Signed](x T) T

func HigherZero3

func HigherZero3[T Signed](x T) T

func ISIGN

func ISIGN[T Signed](x, y T) T

ISIGN is sign-transfer function, as known in FORTRAN.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.ISIGN(10, -100000), hd.ISIGN(-10, 100000))
}
Output:

-10 10

func ISIGN2

func ISIGN2[T Signed](x, y T) T

func ISIGN3

func ISIGN3[T Signed](x, y T) T

func ISIGN4

func ISIGN4[T Signed](x, y T) T

func IncrReversed

func IncrReversed(x uint32) uint32

IncrReversed value is used in Fast Furier Transform (FFT) when computing reversed and incrementing is used in the loop. However, computing reversed takes 29 instructions, and lookup table for large values of is not practical. Thus storing previous reversed value and incrementing reversed is useful. This executes in 5 RISC instructions.

func Int64FromNx16b

func Int64FromNx16b(v []uint16) int64

func IntToNx16b

func IntToNx16b[T int | int32 | int64 | uint | uint32 | uint64](v T) []uint16
Example (Small)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IntToNx16b(5))
}
Output:

[5]

func IsAddOverflow

func IsAddOverflow[T Signed](x, y T) T

IsAddOverflow sets most significant bit if overflow occurs. This version does not use carry bit and is efficient.

Example
package main

import (
	"fmt"
	"math"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsAddOverflow(int32(math.MaxInt32), 1)>>31, hd.IsAddOverflow(int32(10), 1)>>31)
}
Output:

-1 0

func IsAddOverflow2

func IsAddOverflow2[T Signed](x, y T) T

func IsAddOverflowUnsigned

func IsAddOverflowUnsigned[T Unsigned](x, y T) T
Example
package main

import (
	"fmt"
	"math"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsAddOverflowUnsigned(uint32(math.MaxUint32), 1)>>31, hd.IsAddOverflowUnsigned(uint32(10), 1)>>31)
}
Output:

1 0

func IsAddOverflowUnsigned2

func IsAddOverflowUnsigned2[T Unsigned](x, y T) T

func IsAddOverflowUnsigned3

func IsAddOverflowUnsigned3[T Unsigned](x, y T) bool

func IsAddOverflowUnsigned4

func IsAddOverflowUnsigned4[T Unsigned](x, y, sum T) bool
Example
package main

import (
	"fmt"
	"math"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 1
	fmt.Println(hd.IsAddOverflowUnsigned4(math.MaxUint32, 1, math.MaxUint32+x), hd.IsAddOverflowUnsigned4(10, 1, 10+x))
}
Output:

true false

func IsDivExactSigned

func IsDivExactSigned(n, d int32) bool

IsDivExactSigned is simplified version with abs. it has single branch. however, it is not minimal number of instructions.

func IsDivExactUnsigned

func IsDivExactUnsigned(n, d uint32) bool

IsDivExactUnsigned is similar to odd version but transforms divisor d0 * 2^k and performs rotate right trick. It has single branch.

func IsDivExactUnsignedOdd

func IsDivExactUnsignedOdd(n, d uint32) bool

IsDivExactUnsignedOdd tests if n is multiple of d, for odd d. It has single branch.

func IsDivOverflow

func IsDivOverflow(x, y int32) bool

IsDivOverflow uses around seven instructions with branches, and it is not easy to improve this.

func IsDivOverflowUnsigned

func IsDivOverflowUnsigned[T Unsigned](x, y T) bool

func IsInRange

func IsInRange[T Integer](x, a, b T) bool

IsInRange illustrates how to perform signed integer bounds checking (both ends inclusive) in single comparison. This is useful for array bounds checking. Using uint conversion, to force unsigned comparison on signed integers, given Go lacks unsigned comparison operator on signed integers.

func IsInRangeClosedOpen

func IsInRangeClosedOpen[T Integer](x, a, b T) bool

IsInRangeClosedOpen: a <= x < b

func IsInRangeOpen

func IsInRangeOpen[T Integer](x, a, b T) bool

IsInRangeOpen: a < x < b

func IsInRangeOpen2

func IsInRangeOpen2[T Integer](x, a, b T) bool

func IsInRangeOpenClosed

func IsInRangeOpenClosed[T Integer](x, a, b T) bool

IsInRangeOpenClosed: a < x <= b

func IsInRangePowerTwo

func IsInRangePowerTwo[T Unsigned](x T, n int) bool

IsInRangePowerTwo: 0 <= x <= 2^n - 1

func IsInRangePowerTwoOffset

func IsInRangePowerTwoOffset[T Unsigned](x, a T, n int) bool

IsInRangePowerTwoOffset: a <= x <= a + 2^n - 1

func IsMostSignificantSet

func IsMostSignificantSet[T int32 | uint32](x T) bool
Example (Int32)
package main

import (
	"fmt"
	"math"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsMostSignificantSet(int32(-1)), hd.IsMostSignificantSet(int32(1)), hd.IsMostSignificantSet(int32(math.MaxInt32)))
}
Output:

true false false
Example (Uint32)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsMostSignificantSet(uint32(0xFFFFFFFF)), hd.IsMostSignificantSet(uint32(10)))
}
Output:

true false

func IsMulOverflow

func IsMulOverflow(x, y int32) bool

func IsMulOverflowUnsigned

func IsMulOverflowUnsigned(x, y uint32) bool

func IsMulOverflowUnsignedNLZ

func IsMulOverflowUnsignedNLZ(x, y uint32) bool

IsMulOverflowUnsignedNLZ counts number of leading zeros to detect overflow.

func IsPowerOfTwoBoundaryCrossed

func IsPowerOfTwoBoundaryCrossed[T Unsigned](a, l, b T) bool

IsPowerOfTwoBoundaryCrossed checks if adding l to a crosses power of block of size b. b size has to be power of two. This and versions bellow are five or six RISC instructions. b has to be power of two and likely to be a constant.

Example (Crossed)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsPowerOfTwoBoundaryCrossed[uint32](100, 300, 256))
}
Output:

true
Example (Not_crossed)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsPowerOfTwoBoundaryCrossed[uint32](100, 100, 256))
}
Output:

false

func IsPowerOfTwoBoundaryCrossed2

func IsPowerOfTwoBoundaryCrossed2[T Unsigned](a, l, b T) bool

func IsPowerOfTwoBoundaryCrossed3

func IsPowerOfTwoBoundaryCrossed3[T Unsigned](a, l, b T) bool

func IsPowerOfTwoBoundaryCrossed4

func IsPowerOfTwoBoundaryCrossed4[T Unsigned](a, l, b T) bool

func IsSubOverflow

func IsSubOverflow[T Signed](x, y T) T

IsSubOverflow sets most significant bit if overflow occurs. This version does not use carry bit and is efficient.

Example
package main

import (
	"fmt"
	"math"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsSubOverflow2(int32(math.MinInt32), 1)>>31, hd.IsSubOverflow2(int32(-10), 1)>>31)
}
Output:

-1 0

func IsSubOverflow2

func IsSubOverflow2[T Signed](x, y T) T

func IsSubOverflowUnsigned

func IsSubOverflowUnsigned[T Unsigned](x, y T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.IsSubOverflowUnsigned(uint32(0), 1)>>31, hd.IsSubOverflowUnsigned(uint32(10), 1)>>31)
}
Output:

1 0

func IsSubOverflowUnsigned2

func IsSubOverflowUnsigned2[T Unsigned](x, y T) T

func IsSubOverflowUnsigned3

func IsSubOverflowUnsigned3[T Unsigned](x, y T) bool

func IsSubOverflowUnsigned4

func IsSubOverflowUnsigned4[T Unsigned](x, y, sub T) bool
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var x uint32 = 1
	fmt.Println(hd.IsSubOverflowUnsigned4(0, 1, 0-x), hd.IsSubOverflowUnsigned4(10, 1, 10-x))
}
Output:

true false

func IsolateRightmostOneBit

func IsolateRightmostOneBit[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.IsolateRightmostOneBit(0b01011000))
}
Output:

00001000

func LeadingZerosEqual

func LeadingZerosEqual[T Unsigned](x, y T) bool

func LeadingZerosLess

func LeadingZerosLess[T Unsigned](x, y T) bool

func LeadingZerosLessOrEqual

func LeadingZerosLessOrEqual[T Unsigned](x, y T) bool

func LeadingZerosUint32

func LeadingZerosUint32(x uint32) uint8

LeadingZerosUint32 (nlz) is algorithm from Robert Harley. It consists of 14 instructions branch-free. It uses Julius Goryavsky variation for smaller lookup table size. LeadingZerosUint32 has direct relationship of log2 as well, and can be used to compute it directly. Some instruction sets, such as ARM M1 chips, include single assembly instruction for this operation.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.LeadingZerosUint32(255))
}
Output:

24
Example (Long)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.LeadingZerosUint32(0b00111111111111111111111111101010))
}
Output:

2

func LeadingZerosUint32BinarySearch

func LeadingZerosUint32BinarySearch(x uint32) uint8

func LeadingZerosUint64

func LeadingZerosUint64(x uint64) uint8

func LenLongestStringOnes

func LenLongestStringOnes[T Unsigned](x T) uint8

LenLongestStringOnes (maxstr1) returns longest length of string of ones. Executes in 131 RISC instructions in worst case. TODO: it has optimization to reduce to 39 RISC instructions in worst case.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.LenLongestStringOnes(uint64(0b0000_0000_0000_0000_1100_1110_1111_1000)))
}
Output:

5

func LenShortestStringOnes

func LenShortestStringOnes(x uint32) (p, n uint8)

LenShortestStringOnes (fminstr1) returns shortest length of string of ones.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.LenShortestStringOnes(0xFF0FF0))
}
Output:

8 8
Example (Zero)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.LenShortestStringOnes(0))
}
Output:

32 0

func Less

func Less[T Signed](x, y T) T

func Less2

func Less2[T Signed](x, y T) T

func Less3

func Less3[T Signed](x, y T) T

func Less4

func Less4[T Integer](x, y T) T

Less4 utilizes the fact that x/2 - y/2 never overflows. Stores result in most significant bit. Exactly same formula works for uint32. This is because Go uses signed shift right for int32 and unsigned shift right for uint32. This takes 6 or seven instructions.

func LessOrEqual

func LessOrEqual[T Signed](x, y T) T

func LessOrEqual2

func LessOrEqual2[T Signed](x, y T) T

func LessOrEqualUnsigned

func LessOrEqualUnsigned[T Unsigned](x, y T) T

func LessOrEqualZero

func LessOrEqualZero[T Signed](x T) T

func LessOrEqualZero2

func LessOrEqualZero2[T Signed](x T) T

func LessUnsigned

func LessUnsigned[T Unsigned](x, y T) T

func LessUnsigned2

func LessUnsigned2[T Unsigned](x, y T) T

func LessZero

func LessZero[T Signed](x T) T

func Log10x32

func Log10x32(x uint32) uint32

Log10x32 (ilog10) utilizes multiple techniques, but at its core it is number of leading zeroes with adjustments. It is 11 branch free RISC instructions.

func Log10x64

func Log10x64(x uint64) uint64

Log10x64 (ilog10) utilizes multiple techniques, but at its core it is number of leading zeroes with adjustments. It is 11 branch free RISC instructions.

func Log2

func Log2(x uint32) uint32

Log2 (ilog2)

Example (Four)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Log2(4))
}
Output:

2
Example (Nine)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Log2(9))
}
Output:

3
Example (One)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Log2(4))
}
Output:

2
Example (Zero)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	// This is special condition. There is a case to extend this definition to be valid.
	// However, it is not universally accepted.
	fmt.Println(hd.Log2(0))
}
Output:

4294967295

func Log2x64

func Log2x64(x uint64) uint64

Log2x64 (ilog2). Interestingly, it is more accurate than float64 based version, since float64 overflows, but this code does not.

func LoopDetectionGosper

func LoopDetectionGosper(f func(int) int, x0 int) (μLower, μUpper, λ int)

LoopDetectionGosper uses R.W.Gosper algorithm to detect start index of a loop and it's period. loop is defined on sequence: X_n+1 = f(X_n); X_0, X_1, ..., X_μ-1, X_μ, ... X_μ+λ. [HAK #132].

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	f := func(v int) int {
		if v < 100 {
			return v + 1
		}
		return 1
	}
	fmt.Println(hd.LoopDetectionGosper(f, 0))
}
Output:

0 63 100

func Max

func Max[T Signed](x, y T) T

func MaxAND

func MaxAND(minx, maxx, miny, maxy uint32) uint32

MaxAND is max(x|y) for x in [minx, maxx] and y in [miny, maxy]. This gives tighter bound than min(maxx, maxy).

func MaxOR

func MaxOR(minx, maxx, miny, maxy uint32) uint32

MaxOR is max(x|y) for x in [minx, maxx] and y in [miny, maxy]. This gives tighter bound than maxx + maxy.

func MaxRanges

func MaxRanges[T int32 | uint32](x, y T) T

MaxRanges requires signed x and y be in range [-2^30, 2^30-1] and unsigned x and y be in range [0, 2^31-1]

func Min

func Min[T Signed](x, y T) T

func MinAND

func MinAND(minx, maxx, miny, maxy uint32) uint32

MinAND is min(x&y) for x in [minx, maxx] and y in [miny, maxy]. This gives tighter bound than 0.

func MinOR

func MinOR(minx, maxx, miny, maxy uint32) uint32

MinOR is min(x|y) for x in [minx, maxx] and y in [miny, maxy]. This gives tighter bound than max(minx, miny).

func MinRanges

func MinRanges[T int32 | uint32](x, y T) T

MinRanges requires signed x and y be in range [-2^30, 2^30-1] and unsigned x and y be in range [0, 2^31-1]

func Mod10Signed

func Mod10Signed(n int32) int32

Mod10Signed (rems10) uses multiplication method.

func Mod10Unsigned

func Mod10Unsigned(n uint32) uint32

Mod10Unsigned (remu10) is using multiplication method similar to Mod3Unsigned5.

func Mod3Signed

func Mod3Signed(n int32) int32

Mod3Signed (rems3) is similar to unsigned version, but remaps output of it differently.

func Mod3Signed2

func Mod3Signed2(n int32) int32

Mod3Signed2 (rems3) is using multiply method.

func Mod3Unsigned

func Mod3Unsigned(n uint32) uint32

Mod3Unsigned (remu3) is using CountOnes (pop).

func Mod3Unsigned2

func Mod3Unsigned2(n uint32) uint32

Mod3Unsigned2 (remu3) is using CountOnes (pop) and table lookup.

func Mod3Unsigned3

func Mod3Unsigned3(n uint32) uint32

Mod3Unsigned3 (remu3) is using digit summing and in-register lookup.

func Mod3Unsigned4

func Mod3Unsigned4(n uint32) uint32

Mod3Unsigned4 (remu3) is using digit summing and in-memory lookup.

func Mod3Unsigned5

func Mod3Unsigned5(n uint32) uint32

Mod3Unsigned5 (remu3) is using the fact that n % 3 == floor(4/3 * n) % 3. This code is discovered in trial and error. This uses only shift and multiply operations. It executes in 13 instructions.

func Mod3Unsigned6

func Mod3Unsigned6(n uint32) uint32

Mod3Unsigned6 (remu3) is same as Mod3Unsigned5 but expands multiplications to shifts.

func Mod5Signed

func Mod5Signed(n int32) int32

Mod5Signed (rems5) is similar to unsigned version, but remaps output of it differently.

func Mod5Signed2

func Mod5Signed2(n int32) int32

Mod5Signed2 (rems5) is using multiply method.

func Mod5Unsigned

func Mod5Unsigned(n uint32) uint32

Mod5Unsigned (remu5) is using CountOnes (pop) and in-memory lookup.

func Mod5Unsigned2

func Mod5Unsigned2(n uint32) uint32

Mod5Unsigned2 (remu5) is using multiplication method similar to Mod3Unsigned5. It executes in 11 instructions.

func Mod63Unsigned

func Mod63Unsigned(n uint32) uint32

Mod63Unsigned (remu63) is mysterious code from Joe Keane that works in 12 instructions.

func Mod63Unsigned2

func Mod63Unsigned2(n uint32) uint32

Mod63Unsigned2 (remu63) is using multiplication method similar to Mod3Unsigned5. This is not as fast as Joe Keane method, unless there is fast multiplication instruction on the machine.

func Mod7Signed

func Mod7Signed(n int32) int32

Mod7Signed (rems7) is similar to unsigned version, but remaps output of it differently.

func Mod7Signed2

func Mod7Signed2(n int32) int32

Mod7Signed2 (rems7) uses multiplication method.

func Mod7Unsigned

func Mod7Unsigned(n uint32) uint32

Mod7Unsigned (remu7) is using summing method and in-memory lookup.

func Mod7Unsigned2

func Mod7Unsigned2(n uint32) uint32

Mod75Unsigned2 (remu7) is using multiplication method similar to Mod3Unsigned5.

func Mod9Signed

func Mod9Signed(n int32) int32

Mod9Signed (rems7) is similar to unsigned version, but remaps output of it differently.

func Mod9Signed2

func Mod9Signed2(n int32) int32

Mod9Signed2 (rems7) uses multiplication method.

func Mod9Unsigned

func Mod9Unsigned(n uint32) uint32

Mod9Unsigned (remu9) is using summing method and in-memory lookup.

func Mod9Unsigned2

func Mod9Unsigned2(n uint32) uint32

Mod9Unsigned2 (remu9) is using multiplication method similar to Mod3Unsigned5.

func MulMultiWord

func MulMultiWord(w, u, v []uint16)

MulMultiWord (mulmns) multiplies two multiwords word-wise. w = u * v This does not overflow. We are using uint16 and uint32 to avoid overflow in word multiplication. Most important word can be negative when converted to int16. Refer to routines Int64To4x16b and Int64From4x16b for conversion.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	u := []uint16{1, 2, 3, 4}
	v := []uint16{5, 6, 7}
	w := make([]uint16, len(u)+len(v))

	hd.MulMultiWord(w, u, v)

	fmt.Println(w)
}
Output:

[5 16 34 52 45 28 0]

func MultiplicativeInverseEuclid

func MultiplicativeInverseEuclid(d uint32) uint32

MultiplicativeInverseEuclid uses extended Euclidian algorithm.

func MultiplicativeInverseNewton

func MultiplicativeInverseNewton[T Integer](d T) T

MultiplicativeInverseNewton uses Newton method of multiplicative inverse. It follows from the well-known fact that sequence of xn_1 = xn * (2 - d * xn) converges to 1/d (mod d) given good starting xn. This also works with any inverse module of power of 2. Each iteration doubles number of correct bits.

func MultiplyHighOrder32

func MultiplyHighOrder32[T uint32 | int32](u, v T) T

MultiplyHighOrder32 (mulhs/mulhu) multiplies two integers and returns the high-order half of the product. This executes in 16 RISC instructions. Go has math/bits.Mul32 that returns higher order bits, however it does uint64 cast and works only for uint32. Remarkably, Go math/bits.Mul64 uses same algorithm as this function but uses 32bit words.

func MultiplyHighOrder64

func MultiplyHighOrder64[T uint64 | int64](u, v T) T

MultiplyHighOrder64 (mulhs/mulhu) multiplies two integers and returns the high-order half of the product. This executes in 16 RISC instructions. Go has math/bits.Mul64 that returns higher order bits, however it works only for uint64. Remarkably, Go math/bits.Mul64 uses same algorithm as this function but uses 32bit words. Algorithm in math/bits.Mul64 is the same, but it performs better because it does not use generics.

func NAbs

func NAbs[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Print(hd.NAbs(-42))
}
Output:

-42

func NAbs2

func NAbs2[T Signed](x T) T

func NAbs3

func NAbs3[T Signed](x T) T

func NewCRC32TableLookupTable

func NewCRC32TableLookupTable() [256]uint32

func NextHigherNumberWithSameNumberOfOnes

func NextHigherNumberWithSameNumberOfOnes[T Unsigned](x T) T

NextHigherNumberWithSameNumberOfOnes is useful in bitsets stored in integers to find next set of same cardinality. This function utilizes the theorem of Right-to-Left Computability.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.NextHigherNumberWithSameNumberOfOnes(uint64(0b0011110000)))
}
Output:

100000111

func NotEqual

func NotEqual[T Signed](x, y T) T

func NotEqual2

func NotEqual2(x, y int32) int32

func NotEqual3

func NotEqual3[T Integer](x, y T) T

func NotEqualZero

func NotEqualZero[T Signed](x T) T

func NotEqualZero2

func NotEqualZero2(x int32) int32

func NotEqualZero3

func NotEqualZero3[T Integer](x T) T

func NotEqualZero4

func NotEqualZero4[T int32 | uint32](x T) T

func Parity

func Parity(x uint32) uint32

Parity of number of 1-bits in bit-sequence. This executes in 10 instructions. This is example of "parallel prefix" operation, that very efficient for parallel computing.

Example (Even)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Parity(0b0010_1101))
}
Output:

0
Example (Odd)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Parity(0b0000_1101))
}
Output:

1

func Parity2

func Parity2(x uint32) uint32

Parity2 executes in 9 instructions. It avoids computing higher-order parity bits that will not be used.

func Parity3

func Parity3(x uint32) uint32

Parity3 computes parity of uint7 and sets 8th bit to 1/0 based on parity. Here we wrap into same signature as other Parity methods for convenience.

func Parity4

func Parity4(x uint32) uint32

Parity4 computes parity of uint7 and sets 8th bit to 1/0 based on parity, but flipped. Here we wrap into same signature as other Parity methods for convenience.

func Pow

func Pow[T Integer](x T, n uint) T

Pow (iexp) takes x to the power of n. This utilizes binary expression of exponent. For example: x^13, and binary of 13 = 0b1101 = 0b1000 + 0b0100 + 0b0001 = 8 + 4 + 1. Thus, we can compute exponent in log number of multiplications. However, this is not always optimal code (e.g. n=27 there is better decomposition ((x^3)^3)^3)). Note, this function is not resistant to overflows.

func RSqrtFloat32

func RSqrtFloat32(x float32) float32

RSqrtFloat32 (rsqrt) algorithm is fast floating point approximation of reciprocal square root (1 / sqrt). It does not work +/- Inf, NaN, denormals, negative numbers. This algorithm caused some buzz in early 2000s. It produces result with 3.5% error rate.

func Reverse8Bit

func Reverse8Bit(x uint32) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.Reverse8Bit(0b1101_0101))
}
Output:

10101011

func ReverseBits

func ReverseBits(x uint32) uint32

ReverseBits (rev) reverses bits in x using divide and conquer.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("0x%X", hd.ReverseBits(0x01234567))
}
Output:

0xE6A2C480

func ReverseBits2

func ReverseBits2(x uint32) uint32

ReverseBits2 uses Knuth algorithm. It is using 21 RISC instructions.

func ReverseBits64Knuth

func ReverseBits64Knuth(x uint64) uint64
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("0x%X", hd.ReverseBits64Knuth(0x01234567_89ABCDEF))
}
Output:

0xF7B3D1D1E6A2C480

func ReverseByte

func ReverseByte(b byte) byte
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%0b", hd.ReverseByte(0b1101_0101))
}
Output:

10101011

func RotateLeft

func RotateLeft(x uint32, n int) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%032b", hd.RotateLeft(0b11101111111100001111111111111101, 8))
}
Output:

11110000111111111111110111101111

func RotateRight

func RotateRight(x uint32, n int) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%032b", hd.RotateRight(0b11101111111100001111111111111101, 8))
}
Output:

11111101111011111111000011111111

func RoundDownBlockPowerOfTwo

func RoundDownBlockPowerOfTwo[T Unsigned](x T, k uint8) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.RoundDownBlockPowerOfTwo[uint](257, 8))
}
Output:

256

func RoundDownBlockPowerOfTwo2

func RoundDownBlockPowerOfTwo2[T Unsigned](x T, k uint8) T

func RoundUpBlockPowerOfTwo

func RoundUpBlockPowerOfTwo[T Unsigned](x T, k uint8) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.RoundUpBlockPowerOfTwo[uint](210, 8))
}
Output:

256

func RoundUpBlockPowerOfTwo2

func RoundUpBlockPowerOfTwo2[T Unsigned](x T, k uint8) T

func SetBitLastZero

func SetBitLastZero[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetBitLastZero(0b10100111))
}
Output:

00001000

func SetTrailingOnesWithRightMostOne

func SetTrailingOnesWithRightMostOne[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetTrailingOnesWithRightMostOne(0b01010111))
}
Output:

00001111

func SetTrailingZeros

func SetTrailingZeros[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetTrailingZeros(0b01011000))
}
Output:

00000111

func SetTrailingZeros2

func SetTrailingZeros2[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetTrailingZeros2(0b01011000))
}
Output:

00000111

func SetTrailingZeros3

func SetTrailingZeros3[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetTrailingZeros3(0b01011000))
}
Output:

00000111

func SetTrailingZerosWithRightMostOne

func SetTrailingZerosWithRightMostOne[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.SetTrailingZerosWithRightMostOne(0b01011000))
}
Output:

00001111

func SetupCycleThreeValues

func SetupCycleThreeValues(a, b, c int32) (c1, c2, c3 int32, n1, n2 uint8)

SetupCycleThreeValues is final routine that prepares constants for cycling. This can be computed at compile time.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	var c int32 = 0b10101 // 21
	var a int32 = 0b11111 // 31
	var b int32 = 0b10100 // 20
	fmt.Println(hd.SetupCycleThreeValues(a, b, c))
}
Output:

-11 10 21 1 0

func SetupCycleThreeValuesN1N2

func SetupCycleThreeValuesN1N2(a, b, c int32) (na, nb, nc int32, n1, n2 uint8)

SetupCycleThreeValuesN1N2 rearranges a,b,c in order required for cycling and defines n1 and n2 positions.

func ShiftLeftDoubleLength

func ShiftLeftDoubleLength(x [2]uint32, n int) [2]uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{0b1111, 0b1111}
	y := hd.ShiftLeftDoubleLength(x, 3)
	fmt.Printf("%032b_%032b", y[1], y[0])
}
Output:

00000000000000000000000001111000_00000000000000000000000001111000

func ShiftRightSigned32

func ShiftRightSigned32[T int32 | uint32](x T, v int) T

ShiftRightUnsigned32 does arithmetic shift right that Go is missing. This compiles to single RISC instruction and is going to be inlined.

func ShiftRightSigned64

func ShiftRightSigned64[T int64 | uint64](x T, v int) T

ShiftRightUnsigned64 does arithmetic shift right that Go is missing. This compiles to single RISC instruction and is going to be inlined.

func ShiftRightSignedDoubleLength

func ShiftRightSignedDoubleLength(x [2]uint32, n int) [2]uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{0b1111, 0b1111}
	y := hd.ShiftRightSignedDoubleLength(x, 3)
	fmt.Printf("%032b_%032b", y[1], y[0])
}
Output:

00000000000000000000000000000001_11100000000000000000000000000001
Example (Negative)
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{0b1111, 0b10000000000000000000000000000001}
	y := hd.ShiftRightSignedDoubleLength(x, 3)
	fmt.Printf("%032b_%032b", y[1], y[0])
}
Output:

11110000000000000000000000000000_00100000000000000000000000000001

func ShiftRightSignedFromUnsigned

func ShiftRightSignedFromUnsigned(x uint32, n int) uint32

ShiftRightSignedFromUnsigned computes signed shift from unsigned shift. This compiles to five or six RISC instructions. If n is fixed, then this compiles to three or four RISC instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ShiftRightSignedFromUnsigned(0b11111111111111111111111111101010, 2))
}
Output:

11111111111111111111111111111010

func ShiftRightSignedFromUnsigned2

func ShiftRightSignedFromUnsigned2(x uint32, n int) uint32

ShiftRightSignedFromUnsigned2 is alternative version. If n is fixed, then this compiles to three or four RISC instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ShiftRightSignedFromUnsigned2(0b11111111111111111111111111101010, 2))
}
Output:

11111111111111111111111111111010

func ShiftRightSignedFromUnsigned3

func ShiftRightSignedFromUnsigned3(x uint32, n int) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ShiftRightSignedFromUnsigned3(0b11111111111111111111111111101010, 2))
}
Output:

11111111111111111111111111111010

func ShiftRightSignedFromUnsigned4

func ShiftRightSignedFromUnsigned4(x uint32, n int) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ShiftRightSignedFromUnsigned4(0b11111111111111111111111111101010, 2))
}
Output:

11111111111111111111111111111010

func ShiftRightSignedFromUnsigned5

func ShiftRightSignedFromUnsigned5(x uint32, n int) uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.ShiftRightSignedFromUnsigned5(0b11111111111111111111111111101010, 2))
}
Output:

11111111111111111111111111111010

func ShiftRightUnsigned32

func ShiftRightUnsigned32[T int32 | uint32](x T, v int) T

ShiftRightUnsigned32 does logical shift right that Go is missing. This compiles to single RISC instruction and is going to be inlined.

func ShiftRightUnsigned64

func ShiftRightUnsigned64[T int64 | uint64](x T, v int) T

ShiftRightUnsigned64 does logical shift right that Go is missing. This compiles to single RISC instruction and is going to be inlined.

func ShiftRightUnsignedDoubleLength

func ShiftRightUnsignedDoubleLength(x [2]uint32, n int) [2]uint32
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{0b1111, 0b1111}
	y := hd.ShiftRightUnsignedDoubleLength(x, 3)
	fmt.Printf("%032b_%032b", y[1], y[0])
}
Output:

00000000000000000000000000000001_11100000000000000000000000000001

func ShuffleInner

func ShuffleInner(x uint32) uint32

ShuffleInner performs perfect inner shuffle of bits.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	// each symbol is a bit
	// in: abcd_efgh_ijkl_mnop_ABCD_EFGH_IJKL_MNPO
	// out: AaBb_CcDd_EeFf_GgHh_IiJj_KkLl_MmNn_OoPp
	fmt.Printf("%032b", hd.ShuffleInner(0b1111_1111_1111_1111_0000_0000_0000_0000))
}
Output:

01010101010101010101010101010101

func ShuffleOuter

func ShuffleOuter(x uint32) uint32

ShuffleOuter performs perfect outer shuffle of bits. This function is used in cryptography. This is 32 RISC instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	// each symbol is a bit
	// in: abcd_efgh_ijkl_mnop_ABCD_EFGH_IJKL_MNPO
	// out: aAbB_cCdD_eEfF_gGhH_iI_jJ_kKlL_mMnN_oOpP
	fmt.Printf("%032b", hd.ShuffleOuter(0b1111_1111_0000_0000_0000_0000_1111_1111))
}
Output:

10101010101010100101010101010101

func Sign

func Sign(x int32) int32

Sign can be calculated in four RISC instructions with comparison operators. Note, there is five and four instruction formula, but in Go it would not fit, since we need explicit logical right shift and conversion of full signed to unsigned bits. Note, there is three instruction formula using comparison operators, but in Go strong typing booleans not converted to ints, which would require branches and more instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.Sign(-10), hd.Sign(10), hd.Sign(0))
}
Output:

-1 1 0

func SqrtBinarySearch

func SqrtBinarySearch(x uint32) uint32

SqrtBinarySearch (isqrt) is similar to Newton method with estimate, but performs fully estimation.

func SqrtNewton

func SqrtNewton(x uint32) uint32

SqrtNewton (isqrt) converges quadratically on series of better approximations. This version uses lookup table coded in branches that is used for better starting values.

func SqrtShiftAndSubtract

func SqrtShiftAndSubtract(x uint32) uint32

SqrtShiftAndSubtract (isqrt) is similar to some hardware implementation due to its effective use of shift instructions.

func SubDoubleLength

func SubDoubleLength(x [2]uint32, y [2]uint32) [2]uint32

SubDoubleLength of uint64 numbers encoded in two uint32. This takes eight instructions.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := [2]uint32{310, 405}
	y := [2]uint32{100, 200}
	fmt.Println(hd.SubDoubleLength(x, y))
}
Output:

[210 205]

func SubFourUint8

func SubFourUint8(x, y uint32) uint32

SubFourUint8 executes in eight branch-free instructions

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := hd.FourUint8ToUint32([4]uint8{2, 3, 254, 17})
	y := hd.FourUint8ToUint32([4]uint8{2, 1, 253, 0})

	sum := hd.SubFourUint8(x, y)

	fmt.Println(hd.FourUint8FromUint32(sum))
}
Output:

[0 2 1 17]

func SubTwoUint16

func SubTwoUint16(x, y uint32) uint32

SubFourUint8 executes in seven branch-free instructions

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	x := hd.TwoUint16ToUint32([2]uint16{65535 - 1, 5})
	y := hd.TwoUint16ToUint32([2]uint16{65535 - 5, 1})

	sum := hd.SubTwoUint16(x, y)

	fmt.Println(hd.TwoUint16FromUint32(sum))
}
Output:

[4 4]

func Syndrome

func Syndrome(pr, u uint32) uint32

Syndrome in Hamming based error checking code

func TrailingZerosUint32

func TrailingZerosUint32(x uint32) uint8

TrailingZerosUint32 (ntz) uses John Reiser variant of David Seal method. Among applications of TrailingZerosUint32 is R.W.Gosper Loop Detection Algorithm.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.TrailingZerosUint32(0b0000_1101_0000))
}
Output:

4

func TransposeMatrix8bx8b

func TransposeMatrix8bx8b(A, B []byte, m, n int)

TransposeMatrix8bx8b (transpose8) uses divide and conquer method. Each matrix i,j value is encoded in i-th byte and j-th bit of a byte. This is 2x fewer calculating RISC instructions than straightforward masking method. This is 21 calculating RISC instructions. A: m rows * n columns is 1-based coordinates of source submatrix. B: n rows * m columns is 1-based coordinates of destination submatrix.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	A := []byte{
		0b10101011,
		0b01011110,
		0b00010101,
		0b00001000,
		0b00010000,
		0b00110000,
		0b01010000,
		0b10010000,
	}
	B := make([]byte, 8)

	hd.TransposeMatrix8bx8b(A, B, 1, 1)

	for _, q := range B {
		fmt.Printf("%08b\n", q)
	}
}
Output:

10000001
01000010
10000100
01101111
11010000
01100000
11000000
10100000

func TurnOffRightMostBit

func TurnOffRightMostBit[T Signed](x T) T

TurnOffRightMostBit this can be used to test if integer is power of 2

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOffRightMostBit(0b01011000))
}
Output:

01010000

func TurnOffRightmostOnes

func TurnOffRightmostOnes[T Signed](x T) T

TurnOffRightmostOnes this can be sued to determine if a nonnegative integer is of the for 2^j - 2^k for some j >= k >= 0

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOffRightmostOnes(0b01011100))
}
Output:

01000000

func TurnOffRightmostOnes2

func TurnOffRightmostOnes2[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOffRightmostOnes2(0b01011100))
}
Output:

01000000

func TurnOffTrailingOnes

func TurnOffTrailingOnes[T Signed](x T) T

TurnOffTrailingOnes this can be used to test if integer if of the form 2^n - 1

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOffTrailingOnes(0b10100111))
}
Output:

10100000

func TurnOnRightMostBit

func TurnOnRightMostBit[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOnRightMostBit(0b10100111))
}
Output:

10101111

func TurnOnTrailingZeros

func TurnOnTrailingZeros[T Signed](x T) T
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Printf("%08b", hd.TurnOnTrailingZeros(0b10101000))
}
Output:

10101111

func TwoUint16FromUint32

func TwoUint16FromUint32(x uint32) [2]uint16

TwoUint16FromUint32 unpacks two uint16 from single uint32.

func TwoUint16ToUint32

func TwoUint16ToUint32(x [2]uint16) uint32

TwoUint16ToUint32 packs two uint16 into single uint32.

func Uint64FromNx16b

func Uint64FromNx16b(v []uint16) uint64

func UnShuffleInner

func UnShuffleInner(x uint32) uint32

UnShuffleInner is reverse of ShuffleInner.

func UnShuffleOuter

func UnShuffleOuter(x uint32) uint32

UnShuffleOuter is reverse of ShuffleOuter.

func ZByteL

func ZByteL(x uint32) int

ZByteL finds index of left most zero byte. This is branch-free code in 11 RISC instructions. This is useful for `strlen` in C strings, which use 0 byte for string termination.

Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.ZByteL(0x12_00_FF_00))
}
Output:

1

func ZByteL1

func ZByteL1(x uint32) int

ZByteL1 is direct algorithm.

func ZByteL64

func ZByteL64(x uint64) int
Example
package main

import (
	"fmt"

	hd "github.com/nikolaydubina/go-hackers-delight"
)

func main() {
	fmt.Println(hd.ZByteL64(0x11_12_00_00_FF_FF_00_11))
}
Output:

2

Types

type Integer

type Integer interface {
	Signed | Unsigned
}

type LRUCache

type LRUCache struct {
	// contains filtered or unexported fields
}

LRUCache is eight-way set associative cache with least-recently-used replacement policy that uses reference matrix method. The whole structure fits into single 64bit register. Internally, least significant byte of uint64 holds row 0 of reference matrix.

func (*LRUCache) Hit

func (c *LRUCache) Hit(i uint8)

Hit value i as most recently used. This is five or six instructions on 64bit RISC. Values of i should be in [0, 7].

func (*LRUCache) LeastRecentlyUsed

func (c *LRUCache) LeastRecentlyUsed() uint8

type Signed

type Signed interface {
	int | int8 | int16 | int32 | int64
}

type Unsigned

type Unsigned interface {
	uint | uint8 | uint16 | uint32 | uint64
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL