ccc

Version: v0.0.0-...-704a610
Published: Aug 13, 2017 License: MIT Imports: 11 Imported by: 0

README

Crawlcoin Compression

A wrapper around Zlib and Brotli that lazy-loads versioned compression dictionaries over HTTP, from the file system, or from memory. It supports both a shared dictionary and a per-ID custom dictionary. The ID can be anything app-specific; commonly it is a domain name.

Dictionaries can be created manually or generated with dictator.

Part of the Crawlcoin web-scale crawling system.

Overview

Each dictionary must remain static for a given ID and version. If a new dictionary is created, bump the version. This keeps every existing compressed file decompressible while still allowing dictionaries to be refined over time. The version of the dictionary used to compress some bytes is not included in the compressed output, so it must be stored separately.

Passing 0 as the version of a dictionary disables that dictionary; this applies to the shared dictionary as well as the custom one.
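The rule can be pictured as a lookup that skips any dictionary whose version is 0. The package's Combined function appears to serve this role, but its exact behavior is not documented on this page, so the provider type and combined method below are a hypothetical, self-contained sketch; in particular, placing the custom dictionary before the shared one is an assumption.

```go
package main

import "fmt"

// provider is a hypothetical stand-in for a versioned dictionary source.
type provider struct {
	shared map[int][]byte
	custom map[string]map[int][]byte
}

// combined returns the dictionary bytes for one compress or decompress call.
// A version of 0 skips that dictionary; if both are 0 the result is nil,
// meaning "no dictionary at all". Putting the custom dictionary before the
// shared one is an assumption of this sketch, not documented ccc behavior.
func (p provider) combined(id string, customVersion, sharedVersion int) ([]byte, error) {
	var out []byte
	if customVersion != 0 {
		d, ok := p.custom[id][customVersion]
		if !ok {
			return nil, fmt.Errorf("no custom dictionary for %q version %d", id, customVersion)
		}
		out = append(out, d...)
	}
	if sharedVersion != 0 {
		d, ok := p.shared[sharedVersion]
		if !ok {
			return nil, fmt.Errorf("no shared dictionary version %d", sharedVersion)
		}
		out = append(out, d...)
	}
	return out, nil
}

func main() {
	p := provider{
		shared: map[int][]byte{1: []byte("shared")},
		custom: map[string]map[int][]byte{"example.com": {1: []byte("custom")}},
	}
	both, _ := p.combined("example.com", 1, 1)
	sharedOnly, _ := p.combined("example.com", 0, 1)
	none, _ := p.combined("example.com", 0, 0)
	fmt.Printf("%q %q %v\n", both, sharedOnly, none == nil) // "customshared" "shared" true
}
```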

Usage

To use the bindings, import the ccc package, provide a dictionary provider, and compress or decompress.

Compression + decompression example with no error handling:


import (
	"github.com/crawlcoin/ccc"
)

func cccRoundtrip(input []byte) []byte {
	mem := ccc.NewMemoryDictionaryProvider()
	id := "test"
	customVersion := 1
	sharedVersion := 1
	mem.AddCustom(id, customVersion, []byte{1, 2})
	mem.AddShared(sharedVersion, []byte{3, 4})

	compressed, _ := ccc.BrotliCompress(mem, input, id, customVersion, sharedVersion)
	decompressed, _ := ccc.BrotliDecompress(mem, compressed, id, customVersion, sharedVersion)
	return decompressed
}

Check out providers/url_test.go for complete examples of HTTP dictionaries and caching.

Full documentation is available on godoc.

Development

Testing

make test runs all tests locally.

make bench runs all benchmarks.

Submitting Patches

Before sending commits or patches, run

make fmt && make fulltest

or, if you want to git commit -a, run the convenience target

make commit

Documentation

Constants

const DEFAULT_CACHE_SIZE = 512
const SHARED = "shared"

Variables

var INVALID_HOSTS = map[string]bool{"shared": true}

Functions

func Append

func Append(dict1 []byte, dict2 []byte) []byte

func BrotliCompress

func BrotliCompress(provider DictionaryProvider, b []byte, id string, customVersion int, sharedVersion int) ([]byte, error)

Given a dictionary provider, compress b with Brotli using the custom dictionary for id at customVersion and the shared dictionary at sharedVersion. Pass 0 for either version to skip that dictionary.

func BrotliDecompress

func BrotliDecompress(provider DictionaryProvider, b []byte, id string, customVersion int, sharedVersion int) ([]byte, error)

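Given a dictionary provider, decompress b with Brotli using the custom dictionary for id at customVersion and the shared dictionary at sharedVersion; these must match the versions used to compress it. Pass 0 for either version to skip that dictionary.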

func Combined

func Combined(p DictionaryProvider, id string, customVersion int, sharedVersion int) ([]byte, error)

func CustomDictionary

func CustomDictionary(p DictionaryProvider, id string, version int) ([]byte, error)

func SharedDictionary

func SharedDictionary(p DictionaryProvider, version int) ([]byte, error)

func ZlibCompress

func ZlibCompress(provider DictionaryProvider, level int, b []byte, id string, customVersion int, sharedVersion int) ([]byte, error)

Given a dictionary provider, compress b with zlib at the given compression level, using the custom dictionary for id at customVersion and the shared dictionary at sharedVersion. Pass 0 for either version to skip that dictionary.

func ZlibDecompress

func ZlibDecompress(provider DictionaryProvider, b []byte, id string, customVersion int, sharedVersion int) ([]byte, error)

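Given a dictionary provider, decompress b with zlib using the custom dictionary for id at customVersion and the shared dictionary at sharedVersion; these must match the versions used to compress it. Pass 0 for either version to skip that dictionary.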

Types

type DictionaryProvider

type DictionaryProvider interface {
	// Return the bytes for a shared dictionary at a specific version
	SharedDictionary(version int) ([]byte, error)

	// Return the bytes for a custom dictionary at a specific version
	CustomDictionary(id string, version int) ([]byte, error)
}
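Because DictionaryProvider has only two methods, providers are easy to compose. As an illustration, the hypothetical fallbackProvider below asks each wrapped provider in turn and returns the first dictionary found, for example checking an in-memory source before a slower HTTP one. The interface is redeclared locally so the sketch compiles without the ccc module, and mapProvider is a hypothetical stand-in backend.

```go
package main

import (
	"errors"
	"fmt"
)

// DictionaryProvider mirrors the interface documented above.
type DictionaryProvider interface {
	SharedDictionary(version int) ([]byte, error)
	CustomDictionary(id string, version int) ([]byte, error)
}

var errNoProviders = errors.New("fallbackProvider: no providers configured")

// fallbackProvider tries each provider in order and returns the first success.
// If every provider fails, the last error is returned.
type fallbackProvider []DictionaryProvider

func (f fallbackProvider) SharedDictionary(version int) ([]byte, error) {
	lastErr := errNoProviders
	for _, p := range f {
		d, err := p.SharedDictionary(version)
		if err == nil {
			return d, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func (f fallbackProvider) CustomDictionary(id string, version int) ([]byte, error) {
	lastErr := errNoProviders
	for _, p := range f {
		d, err := p.CustomDictionary(id, version)
		if err == nil {
			return d, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// mapProvider is a hypothetical in-memory backend used to exercise the sketch.
type mapProvider struct {
	shared map[int][]byte
	custom map[string]map[int][]byte
}

func (m mapProvider) SharedDictionary(version int) ([]byte, error) {
	if d, ok := m.shared[version]; ok {
		return d, nil
	}
	return nil, fmt.Errorf("shared dictionary v%d not found", version)
}

func (m mapProvider) CustomDictionary(id string, version int) ([]byte, error) {
	if d, ok := m.custom[id][version]; ok {
		return d, nil
	}
	return nil, fmt.Errorf("custom dictionary %q v%d not found", id, version)
}

// Compile-time check that fallbackProvider satisfies the interface.
var _ DictionaryProvider = fallbackProvider(nil)

func main() {
	fast := mapProvider{shared: map[int][]byte{1: []byte("fast")}}
	slow := mapProvider{shared: map[int][]byte{1: []byte("slow"), 2: []byte("slow-only")}}
	p := fallbackProvider{fast, slow}
	d1, _ := p.SharedDictionary(1)
	d2, _ := p.SharedDictionary(2)
	fmt.Println(string(d1), string(d2)) // fast slow-only
}
```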

type FileDictionaryProvider

type FileDictionaryProvider struct {
	BaseSharedDirectory string
	BaseCustomDirectory string
}

func NewFileDictionaryProvider

func NewFileDictionaryProvider(BaseSharedDirectory string, BaseCustomDirectory string) *FileDictionaryProvider

func (FileDictionaryProvider) CustomDictionary

func (p FileDictionaryProvider) CustomDictionary(id string, version int) ([]byte, error)

func (FileDictionaryProvider) SharedDictionary

func (p FileDictionaryProvider) SharedDictionary(version int) ([]byte, error)

type MemoryDictionaryProvider

type MemoryDictionaryProvider struct {
	SharedDictionaries map[int][]byte
	CustomDictionaries map[string]map[int][]byte
	// contains filtered or unexported fields
}

func NewMemoryDictionaryProvider

func NewMemoryDictionaryProvider() *MemoryDictionaryProvider

func (*MemoryDictionaryProvider) AddCustom

func (p *MemoryDictionaryProvider) AddCustom(id string, version int, dict []byte) error

func (*MemoryDictionaryProvider) AddShared

func (p *MemoryDictionaryProvider) AddShared(version int, dict []byte) error

func (*MemoryDictionaryProvider) CustomDictionary

func (p *MemoryDictionaryProvider) CustomDictionary(id string, version int) ([]byte, error)

func (*MemoryDictionaryProvider) SharedDictionary

func (p *MemoryDictionaryProvider) SharedDictionary(version int) ([]byte, error)
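MemoryDictionaryProvider exposes its dictionaries as SharedDictionaries map[int][]byte and CustomDictionaries map[string]map[int][]byte. The sketch below reproduces that shape in a hypothetical memProvider to show how the nested map is populated and read. The errors its add methods return (rejecting version 0, which means "no dictionary", and refusing to overwrite an existing version, since each ID/version must stay static) are assumptions drawn from the overview, not documented behavior of AddCustom and AddShared; the mutex is likewise this sketch's own choice.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errZeroVersion = errors.New("version 0 is reserved for 'no dictionary'")
	errExists      = errors.New("dictionary version already exists and must not change")
	errNotFound    = errors.New("dictionary not found")
)

// memProvider is a hypothetical in-memory provider with the same map shapes
// as MemoryDictionaryProvider's exported fields, guarded by a RWMutex.
type memProvider struct {
	mu                 sync.RWMutex
	SharedDictionaries map[int][]byte
	CustomDictionaries map[string]map[int][]byte
}

func newMemProvider() *memProvider {
	return &memProvider{
		SharedDictionaries: make(map[int][]byte),
		CustomDictionaries: make(map[string]map[int][]byte),
	}
}

func (p *memProvider) AddShared(version int, dict []byte) error {
	if version == 0 {
		return errZeroVersion
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.SharedDictionaries[version]; ok {
		return errExists
	}
	// Copy so later changes to the caller's slice cannot alter a published version.
	p.SharedDictionaries[version] = append([]byte(nil), dict...)
	return nil
}

func (p *memProvider) AddCustom(id string, version int, dict []byte) error {
	if version == 0 {
		return errZeroVersion
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	byVersion, ok := p.CustomDictionaries[id]
	if !ok {
		// Create the inner map the first time an ID is seen.
		byVersion = make(map[int][]byte)
		p.CustomDictionaries[id] = byVersion
	}
	if _, ok := byVersion[version]; ok {
		return errExists
	}
	byVersion[version] = append([]byte(nil), dict...)
	return nil
}

func (p *memProvider) SharedDictionary(version int) ([]byte, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if d, ok := p.SharedDictionaries[version]; ok {
		return d, nil
	}
	return nil, errNotFound
}

func (p *memProvider) CustomDictionary(id string, version int) ([]byte, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if d, ok := p.CustomDictionaries[id][version]; ok {
		return d, nil
	}
	return nil, errNotFound
}

func main() {
	p := newMemProvider()
	fmt.Println(p.AddCustom("example.com", 1, []byte{1, 2}))                // <nil>
	fmt.Println(p.AddCustom("example.com", 1, []byte{9}) == errExists) // true
	d, err := p.CustomDictionary("example.com", 1)
	fmt.Println(d, err) // [1 2] <nil>
}
```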

type URLDictionaryProvider

type URLDictionaryProvider struct {
	BaseSharedURL string
	BaseCustomURL string

	UseMemoryCache  bool
	MemoryCacheSize int

	UseFSCache     bool
	FSSharedPrefix string
	FSCustomPrefix string
	// contains filtered or unexported fields
}

func NewCachedURLDictionaryProvider

func NewCachedURLDictionaryProvider(BaseSharedURL string, BaseCustomURL string, UseMemoryCache bool, MemoryCacheSize int, UseFSCache bool, FSSharedPrefix string, FSCustomPrefix string) (*URLDictionaryProvider, error)

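A more complete constructor that also configures the optional in-memory cache (UseMemoryCache, MemoryCacheSize) and file-system cache (UseFSCache, FSSharedPrefix, FSCustomPrefix).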
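With UseMemoryCache enabled, repeated lookups can presumably be served without another HTTP round trip. This page does not show how URLDictionaryProvider implements its cache, so the following is only an illustration: a hypothetical cachedProvider that wraps any provider in a least-recently-used cache built on container/list. Its capacity plays the role MemoryCacheSize appears to play (DEFAULT_CACHE_SIZE is 512), and countingProvider is a hypothetical backend used to observe hits and misses.

```go
package main

import (
	"container/list"
	"fmt"
	"strconv"
	"sync"
)

// DictionaryProvider mirrors the interface documented above.
type DictionaryProvider interface {
	SharedDictionary(version int) ([]byte, error)
	CustomDictionary(id string, version int) ([]byte, error)
}

type entry struct {
	key  string
	dict []byte
}

// cachedProvider is a hypothetical LRU cache in front of another provider.
// A dictionary never changes for a given id/version, so entries never need
// invalidation; they are only evicted when the cache is over capacity.
type cachedProvider struct {
	next     DictionaryProvider
	capacity int
	mu       sync.Mutex
	order    *list.List               // front = most recently used *entry
	items    map[string]*list.Element // cache key -> element in order
}

func newCachedProvider(next DictionaryProvider, capacity int) *cachedProvider {
	return &cachedProvider{next: next, capacity: capacity,
		order: list.New(), items: make(map[string]*list.Element)}
}

func (c *cachedProvider) get(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		d := el.Value.(*entry).dict
		c.mu.Unlock()
		return d, nil
	}
	c.mu.Unlock()

	d, err := fetch() // fetch outside the lock so slow downloads don't block hits
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok { // another goroutine cached it meanwhile
		c.order.MoveToFront(el)
		return el.Value.(*entry).dict, nil
	}
	c.items[key] = c.order.PushFront(&entry{key: key, dict: d})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return d, nil
}

func (c *cachedProvider) SharedDictionary(version int) ([]byte, error) {
	return c.get("shared/"+strconv.Itoa(version), func() ([]byte, error) {
		return c.next.SharedDictionary(version)
	})
}

func (c *cachedProvider) CustomDictionary(id string, version int) ([]byte, error) {
	return c.get("custom/"+id+"/"+strconv.Itoa(version), func() ([]byte, error) {
		return c.next.CustomDictionary(id, version)
	})
}

// countingProvider is a hypothetical backend that counts how often it is hit.
type countingProvider struct{ calls int }

func (p *countingProvider) SharedDictionary(version int) ([]byte, error) {
	p.calls++
	return []byte("shared-" + strconv.Itoa(version)), nil
}

func (p *countingProvider) CustomDictionary(id string, version int) ([]byte, error) {
	p.calls++
	return []byte(id + "-" + strconv.Itoa(version)), nil
}

func main() {
	backend := &countingProvider{}
	c := newCachedProvider(backend, 2)
	for _, v := range []int{1, 1, 2, 3, 1} { // miss, hit, miss, miss (evicts 1), miss
		c.SharedDictionary(v)
	}
	fmt.Println(backend.calls) // 4
}
```

Fetching outside the lock means two goroutines can occasionally download the same dictionary at once; the second result is discarded. Because dictionaries are immutable per version, either copy is correct.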

func NewURLDictionaryProvider

func NewURLDictionaryProvider(BaseSharedURL string, BaseCustomURL string) *URLDictionaryProvider

Creates a basic URL provider; every request goes over HTTP.

func (*URLDictionaryProvider) CustomDictionary

func (p *URLDictionaryProvider) CustomDictionary(id string, version int) ([]byte, error)

func (*URLDictionaryProvider) SharedDictionary

func (p *URLDictionaryProvider) SharedDictionary(version int) ([]byte, error)

type Version

type Version struct {
	Version int
}
