shoco

v0.0.0-...-13bc643
Published: Sep 4, 2019 License: BSD-3-Clause, MIT Imports: 4 Imported by: 2

README

shoco

shoco is a Go package, based on the shoco C library, for compressing and decompressing short strings. It is fast (see the benchmarks below) and easy to use. The default compression model is optimized for English words, but you can generate your own compression model from your specific input data.

Compression models can be found in the models package.

Download

go get github.com/tmthrgd/shoco

Benchmark

BenchmarkCompress/#0-0-8         	10000000	       177 ns/op
BenchmarkCompress/#1-4-8         	 5000000	       264 ns/op	  15.14 MB/s
BenchmarkCompress/#2-5-8         	 5000000	       349 ns/op	  14.30 MB/s
BenchmarkCompress/#3-240-8       	  300000	      4768 ns/op	  50.33 MB/s
BenchmarkCompress/#4-58-8        	 1000000	      1180 ns/op	  49.15 MB/s
BenchmarkCompress/#5-20-8        	 2000000	       684 ns/op	  29.23 MB/s
BenchmarkCompress/#6-13-8        	 3000000	       450 ns/op	  28.83 MB/s
BenchmarkCompress/#7-111-8       	  500000	      2748 ns/op	  40.38 MB/s
BenchmarkCompress/#8-9-8         	 5000000	       400 ns/op	  22.45 MB/s
BenchmarkCompress/#9-13-8        	 3000000	       452 ns/op	  28.75 MB/s
BenchmarkCompress/#10-13-8       	 3000000	       433 ns/op	  30.02 MB/s
BenchmarkCompress/#11-10-8       	 3000000	       398 ns/op	  25.10 MB/s
BenchmarkCompress/#12-15-8       	 3000000	       462 ns/op	  32.44 MB/s
BenchmarkCompress/#13-35-8       	 2000000	       974 ns/op	  35.91 MB/s
BenchmarkCompress/#14-6-8        	 5000000	       330 ns/op	  18.18 MB/s
BenchmarkCompress/#15-2-8        	10000000	       218 ns/op	   9.14 MB/s
BenchmarkCompress/#16-4-8        	 5000000	       269 ns/op	  14.85 MB/s
BenchmarkCompress/#17-4-8        	 5000000	       269 ns/op	  14.82 MB/s
BenchmarkCompress/#18-9-8        	 5000000	       297 ns/op	  30.23 MB/s
BenchmarkCompress/#19-2-8        	10000000	       193 ns/op	  10.35 MB/s
BenchmarkCompress/#20-4-8        	10000000	       200 ns/op	  19.94 MB/s
BenchmarkCompress/#21-4-8        	10000000	       191 ns/op	  20.94 MB/s
BenchmarkDecompress/#0-0-8       	10000000	       120 ns/op
BenchmarkDecompress/#1-2-8       	10000000	       196 ns/op	  10.16 MB/s
BenchmarkDecompress/#2-3-8       	10000000	       214 ns/op	  14.02 MB/s
BenchmarkDecompress/#3-169-8     	  500000	      4170 ns/op	  40.52 MB/s
BenchmarkDecompress/#4-39-8      	 1000000	      1316 ns/op	  29.63 MB/s
BenchmarkDecompress/#5-24-8      	 3000000	       470 ns/op	  51.04 MB/s
BenchmarkDecompress/#6-17-8      	 5000000	       369 ns/op	  45.99 MB/s
BenchmarkDecompress/#7-79-8      	 1000000	      2255 ns/op	  35.02 MB/s
BenchmarkDecompress/#8-18-8      	 5000000	       284 ns/op	  63.29 MB/s
BenchmarkDecompress/#9-22-8      	 5000000	       333 ns/op	  65.96 MB/s
BenchmarkDecompress/#10-22-8     	 5000000	       327 ns/op	  67.26 MB/s
BenchmarkDecompress/#11-20-8     	 5000000	       304 ns/op	  65.77 MB/s
BenchmarkDecompress/#12-25-8     	 5000000	       360 ns/op	  69.35 MB/s
BenchmarkDecompress/#13-46-8     	 2000000	       858 ns/op	  53.60 MB/s
BenchmarkDecompress/#14-12-8     	10000000	       174 ns/op	  68.65 MB/s
BenchmarkDecompress/#15-4-8      	10000000	       176 ns/op	  22.71 MB/s
BenchmarkDecompress/#16-8-8      	10000000	       216 ns/op	  36.92 MB/s
BenchmarkDecompress/#17-8-8      	10000000	       222 ns/op	  36.00 MB/s
BenchmarkDecompress/#18-6-8      	 5000000	       344 ns/op	  17.43 MB/s
BenchmarkDecompress/#19-3-8      	10000000	       183 ns/op	  16.36 MB/s
BenchmarkDecompress/#20-5-8      	10000000	       190 ns/op	  26.31 MB/s
BenchmarkDecompress/#21-5-8      	10000000	       188 ns/op	  26.49 MB/s
BenchmarkWords/Compress-8        	     100	  21806321 ns/op	  43.05 MB/s
BenchmarkWords/Decompress-8      	     100	  16730975 ns/op	  39.60 MB/s
--- BENCH: BenchmarkWords
	shoco_test.go:228: len(in)  = 938848B
	shoco_test.go:229: len(out) = 662545B
	shoco_test.go:230: ratio    = 0.705700%
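
The figures in the BenchmarkWords lines can be cross-checked with a little arithmetic. The sketch below assumes that `go test` reports MB/s with 1 MB = 1e6 bytes, and that the value printed as `ratio` is the plain fraction len(out)/len(in), so 0.7057 means the output is about 70.57% of the input size despite the trailing percent sign. The helper names `ratio` and `throughputMBps` are hypothetical and not part of the package.

```go
package main

import "fmt"

// ratio returns compressed size divided by original size.
// The result is a fraction (0.7057), not a percentage.
func ratio(inLen, outLen int) float64 {
	return float64(outLen) / float64(inLen)
}

// throughputMBps converts bytes handled per operation and ns/op into MB/s,
// taking 1 MB as 1e6 bytes. One byte per nanosecond equals 1000 MB/s.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

func main() {
	const inLen, outLen = 938848, 662545 // from the BenchmarkWords log lines

	fmt.Printf("ratio      = %.4f (%.2f%% of the input size)\n",
		ratio(inLen, outLen), 100*ratio(inLen, outLen))

	// Compress throughput is measured against the uncompressed input size.
	fmt.Printf("compress   = %.2f MB/s\n", throughputMBps(inLen, 21806321))

	// The reported Decompress figure is consistent with the compressed size
	// being used as the byte count.
	fmt.Printf("decompress = %.2f MB/s\n", throughputMBps(outLen, 16730975))
}
```

The computed values agree with the 43.05 MB/s and 39.60 MB/s reported above, which suggests the two throughput figures are measured against different byte counts and are not directly comparable.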

License

Unless otherwise noted, the shoco source files are distributed under the Modified BSD License found in the LICENSE file.

Documentation

Overview

Package shoco is a compressor for small text strings based on the shoco C library.

Variables

var DefaultModel = WordsEnModel

DefaultModel is the default model used by the package level functions.

var ErrInvalid = errors.New("shoco: invalid input")

ErrInvalid is returned by decompress functions when the compressed input data is malformed.
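
Because ErrInvalid is a sentinel value created with errors.New, callers can test for it with errors.Is. The following is a minimal sketch under stated assumptions: `errInvalid` is a local copy of the sentinel (real code would compare against shoco.ErrInvalid), `fakeDecompress` is a hypothetical stand-in with the documented Decompress signature, and `classify` is an illustrative helper.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalid mirrors the documented sentinel so the sketch runs on its own.
// Real code would use shoco.ErrInvalid instead.
var errInvalid = errors.New("shoco: invalid input")

// decompressFunc has the documented signature of the decompress functions.
type decompressFunc func(in []byte) (out []byte, err error)

// classify shows how a caller might branch on the result of a decompress call.
func classify(decompress decompressFunc, in []byte) string {
	_, err := decompress(in)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errInvalid):
		return "malformed input"
	default:
		return "unexpected error"
	}
}

// fakeDecompress is a hypothetical stand-in, not the real decoder:
// it treats any 0xFF byte as malformed and otherwise copies the input.
func fakeDecompress(in []byte) ([]byte, error) {
	for _, b := range in {
		if b == 0xFF {
			return nil, errInvalid
		}
	}
	return append([]byte(nil), in...), nil
}

func main() {
	fmt.Println(classify(fakeDecompress, []byte("hello")))
	fmt.Println(classify(fakeDecompress, []byte{0xFF}))
}
```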

var FilePathModel = filePathModel

FilePathModel is a model optimised for filepaths.

Deprecated: Use models.FilePath() instead.

var TextEnModel = textEnModel

TextEnModel is a model optimised for English language text.

Deprecated: Use models.TextEn() instead.

var WordsEnModel = wordsEnModel

WordsEnModel is a model optimised for words of the English language.

Deprecated: Use models.WordsEn() instead.

Functions

func Compress

func Compress(in []byte) (out []byte)

Compress uses DefaultModel to compress the input data.

func Decompress

func Decompress(in []byte) (out []byte, err error)

Decompress uses DefaultModel to decompress the input data. It returns an error if the data is invalid.

func ProposedCompress

func ProposedCompress(in []byte) (out []byte)

ProposedCompress uses DefaultModel to compress the input data, using a shorter encoding for non-ASCII characters.

func ProposedDecompress

func ProposedDecompress(in []byte) (out []byte, err error)

ProposedDecompress uses DefaultModel to decompress the input data. It returns an error if the data is invalid. The data must have been compressed with the shorter encoding produced by ProposedCompress.
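
The two encodings are not interchangeable, so each compress function has to be paired with its matching decompress function. One way to keep the pairs together is sketched below. The `codec` type, `roundTrip` and `tagged` are hypothetical names; the field types copy the documented signatures, so in real code the pairs would presumably be {shoco.Compress, shoco.Decompress} and {shoco.ProposedCompress, shoco.ProposedDecompress}. The stand-in codecs only prefix a tag byte so that the example runs without the package, and their behaviour on mismatched input is an assumption of this sketch, not a statement about the real decoders.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// codec pairs a compress function with its matching decompress function.
// The field types copy the signatures documented above.
type codec struct {
	compress   func(in []byte) (out []byte)
	decompress func(in []byte) (out []byte, err error)
}

// roundTrip compresses in, decompresses the result with the same codec,
// and checks that the original bytes come back.
func roundTrip(c codec, in []byte) error {
	out, err := c.decompress(c.compress(in))
	if err != nil {
		return err
	}
	if !bytes.Equal(in, out) {
		return errors.New("round trip mismatch")
	}
	return nil
}

// tagged builds a hypothetical stand-in codec that prefixes one tag byte.
// It does no real compression; it only makes the pairing rule observable.
func tagged(tag byte) codec {
	return codec{
		compress: func(in []byte) []byte {
			return append([]byte{tag}, in...)
		},
		decompress: func(in []byte) ([]byte, error) {
			if len(in) == 0 || in[0] != tag {
				return nil, errors.New("stand-in: invalid input")
			}
			return append([]byte(nil), in[1:]...), nil
		},
	}
}

func main() {
	standard, proposed := tagged('S'), tagged('P')

	fmt.Println(roundTrip(standard, []byte("hello world")) == nil)
	fmt.Println(roundTrip(proposed, []byte("héllo wörld")) == nil)

	// Feeding one encoding's output to the other decoder is a caller bug.
	_, err := standard.decompress(proposed.compress([]byte("héllo")))
	fmt.Println(err != nil)
}
```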

Types

type Model

type Model struct {
	ChrsByChrID                 []byte
	ChrIdsByChr                 [256]int8
	SuccessorIDsByChrIDAndChrID [][]int8
	ChrsByChrAndSuccessorID     [][]byte
	Packs                       []Pack
	MinChr                      byte
	MaxSuccessorN               int
	// contains filtered or unexported fields
}

Model represents a shoco compression model.

It can be generated using the generate_compressor_model.py script in Ed-von-Schleck/shoco. The output of that script will require conversion to Go code. The script is available at: https://github.com/Ed-von-Schleck/shoco/blob/4dee0fc850cdec2bdb911093fe0a6a56e3623b71/generate_compressor_model.py.

func (*Model) Compress

func (m *Model) Compress(in []byte) (out []byte)

Compress uses the given model to compress the input data.

func (*Model) Decompress

func (m *Model) Decompress(in []byte) (out []byte, err error)

Decompress uses the given model to decompress the input data. It returns an error if the data is invalid.

func (*Model) ProposedCompress

func (m *Model) ProposedCompress(in []byte) (out []byte)

ProposedCompress uses the given model to compress the input data, using a shorter encoding for non-ASCII characters.

func (*Model) ProposedDecompress

func (m *Model) ProposedDecompress(in []byte) (out []byte, err error)

ProposedDecompress uses the given model to decompress the input data. It returns an error if the data is invalid. The data must have been compressed with the shorter encoding produced by ProposedCompress.

type Pack

type Pack struct {
	Word          uint32
	BytesPacked   int
	BytesUnpacked int
	Offsets       [8]uint
	Masks         [8]int16
}

Pack represents encoding data for a shoco compression model.
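
The Pack fields are not described further here. In the original shoco C library, a pack holds a header word plus per-position bit offsets and masks, and several small table indices are ORed into that word. The following is a rough sketch of that bit-packing idea, assuming the Go port follows the same scheme. The local `Pack` type copies the documented fields so the block is self-contained; the helpers `fits`, `packIndices` and `unpackIndices` are hypothetical, and the pack values are illustrative only.

```go
package main

import "fmt"

// Pack copies the documented struct so the sketch runs on its own.
type Pack struct {
	Word          uint32
	BytesPacked   int
	BytesUnpacked int
	Offsets       [8]uint
	Masks         [8]int16
}

// fits reports whether there is exactly one index per unpacked byte and
// every index is small enough for its bit field.
func fits(p Pack, indices []int16) bool {
	if len(indices) != p.BytesUnpacked || len(indices) > len(p.Masks) {
		return false
	}
	for i, idx := range indices {
		if idx < 0 || idx > p.Masks[i] {
			return false
		}
	}
	return true
}

// packIndices ORs each index into the header word at its bit offset.
// Callers are expected to check fits first.
func packIndices(p Pack, indices []int16) uint32 {
	word := p.Word
	for i, idx := range indices {
		word |= uint32(idx) << p.Offsets[i]
	}
	return word
}

// unpackIndices reverses packIndices by shifting and masking each field.
func unpackIndices(p Pack, word uint32) []int16 {
	out := make([]int16, p.BytesUnpacked)
	for i := range out {
		out[i] = int16((word >> p.Offsets[i]) & uint32(p.Masks[i]))
	}
	return out
}

func main() {
	// Illustrative pack: two input bytes stored in one output byte,
	// as a 4-bit field at bit 26 and a 2-bit field at bit 24.
	p := Pack{
		Word:          0x80000000,
		BytesPacked:   1,
		BytesUnpacked: 2,
		Offsets:       [8]uint{26, 24},
		Masks:         [8]int16{15, 3},
	}
	indices := []int16{5, 2}
	if fits(p, indices) {
		word := packIndices(p, indices)
		fmt.Printf("%#x %v\n", word, unpackIndices(p, word))
	}
}
```

Under this reading, only the most significant BytesPacked bytes of the word would be written to the output, and BytesUnpacked input bytes are consumed per pack.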

Directories

Path	Synopsis
models	Package models contains various compression models for shoco.
