Documentation
¶
Overview ¶
Package ace is an Abstract Compression Engine providing special-purpose lossless entropy coding/decoding.
The algorithm used is a simplified variation of Re-Pair, a grammar-based compression algorithm.
No wire/transport format is provided, hence the "abstract" part of the name.
The engine is not practical for very long input sequences/huge files as both encoding and decoding consumes several bytes of memory per every sequence symbol.
Symbols are 32 bit unsigned numbers, the length of the output sequence is limited to 2^32 symbols. However, many machines will run out of memory long before handling output sequences of such length.
It is usually possible to encode and later decode a 10^9 symbol sequence on a 16 GB RAM machine.
Compression ratios ¶
me@3900x:~/src/modernc.org/ace$ make test === RUN TestMemory0 all_test.go:79: abcabdabcabd : syms in= 12 syms out= 8 k=66.67% te= 8.62µs td= 1.37µs all_test.go:79: enwik3 : syms in= 1,000 syms out= 370 k=37.00% te= 332.58µs td= 11.94µs all_test.go:79: enwik4 : syms in= 10,000 syms out= 3,552 k=35.52% te= 2.933338ms td= 109.45µs all_test.go:79: enwik5 : syms in= 100,000 syms out= 25,913 k=25.91% te= 22.427729ms td= 1.467629ms all_test.go:79: enwik6 : syms in=1,000,000 syms out=195,100 k=19.51% te= 246.72995ms td= 10.718449ms all_test.go:79: gettysburg : syms in= 1,463 syms out= 737 k=50.38% te= 530.71µs td= 14.45µs all_test.go:79: gettysburgx10 : syms in= 14,630 syms out= 1,497 k=10.23% te= 1.734398ms td= 116.87µs all_test.go:79: gettysburgx100 : syms in= 146,300 syms out= 1,510 k= 1.03% te= 8.984032ms td= 921.319µs all_test.go:79: zero3 : syms in= 1,000 syms out= 22 k= 2.20% te= 51.54µs td= 6.49µs all_test.go:79: zero4 : syms in= 10,000 syms out= 29 k= 0.29% te= 486.01µs td= 55.889µs all_test.go:79: zero5 : syms in= 100,000 syms out= 36 k= 0.04% te= 4.719957ms td= 526.45µs all_test.go:79: zero6 : syms in=1,000,000 syms out= 43 k= 0.00% te= 47.401958ms td= 9.755601ms --- PASS: TestMemory0 (0.36s) PASS ok modernc.org/ace 0.364s me@3900x:~/src/modernc.org/ace$
Performance ¶
me@3900x:~/src/modernc.org/ace$ make benchmark goos: linux goarch: amd64 pkg: modernc.org/ace cpu: AMD Ryzen 9 3900X 12-Core Processor BenchmarkEncodeEnwik6-24 6 255495376 ns/op 3.91 MB/s 13102778 B/op 7572 allocs/op BenchmarkDecodeEnwik6-24 100 19666232 ns/op 50.85 MB/s 9355974 B/op 64 allocs/op PASS ok modernc.org/ace 4.264s me@3900x:~/src/modernc.org/ace$
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Decoder ¶ added in v0.1.0
type Decoder struct {
// contains filtered or unexported fields
}
Decoder represents the decoding state.
func NewDecoder ¶
NewDecoder returns a newly created Decoder. The value of the 'firstMeta' argument must be the same one used to encode the sequence. See NewEncoder for more details.
The 'write' function is called to add symbols to the decoded sequence.
type Encoder ¶ added in v0.1.0
type Encoder struct {
// contains filtered or unexported fields
}
Encoder represents the encoding state.
func NewEncoder ¶
NewEncoder returns a newly created Encoder.
The 'window' parameter sets the size of the compression window. The compression ratio may improve with larger values, but the improvements diminish quickly. A usable value for textual input sequences is ~32.
The 'write' function is called to add symbols to the encoded sequence.
The 'firstMeta' argument declares the limit of the values in the input sequence. If the input consists of, say ASCII-only characters, the value is 128 (or higher). For a byte sequence pass 256. If the symbols are Unicode runes then unicode.MaxRune+1 is a safe value of the 'firstMeta' argument. A possible way how to share this value without prior knowledge of the decoder could be, for example, to prepend the 'firstMeta' value to the output sequence, so the decoder can read it back.