unsnap

Version: v0.0.0-...-47dfef3

This package is not in the latest version of its module.

Published: Jan 30, 2021 License: MIT Imports: 9 Imported by: 13

README

go-unsnap-stream

This is a small golang library for decoding and encoding the snappy streaming format, specified here: https://github.com/google/snappy/blob/master/framing_format.txt

Note that the streaming (or framing) format for snappy is different from snappy itself. Think of it as a train of boxcars: the streaming format breaks your data into chunks, applies snappy to each chunk on its own, puts a thin wrapper around each chunk, and sends the chunks along in turn. You can begin decoding before receiving everything, and the memory requirements for decoding stay modest.

Strangely, although the streaming format was first proposed in Go[1][2], it was never updated, and I could not locate any other Go library that handles the streaming/framed snappy format; hence this implementation of the spec. A command-line tool[3] provides a C implementation, but this is the only Go implementation I am aware of. The reference implementation of the framing/streaming spec appears to be the Python one[4].

Update to the previous paragraph: Hooray! Thanks to @nigeltao, we have since learned that the github.com/golang/snappy package now provides the snappy streaming format too. The type-level descriptions there are a little misleading because they don't mention the stream format, but the snappy package header documentation points out that the snappy.Reader and snappy.Writer types do provide stream (vs. block) handling. Although I have not benchmarked it, you should probably prefer that package: it will likely be maintained more actively than I have time for, and it may be better integrated with the underlying snappy code since they share the same repo.

For binary compatibility with the Python implementation in [4], one could use the C snappy compressor/decompressor code directly via github.com/dgryski/go-csnappy. In fact we did this for a while to verify byte-for-byte compatibility: the native Go implementation produces slightly different compressed output (still conformant with the standard, of course), which made test diffs harder, and some have complained that it is slower than the C version.

However, while c-snappy was useful for checking compatibility, it introduced dependencies on external C libraries (both the c-snappy library and the C standard library). Our Go binary that used the go-unsnap-stream library was no longer standalone, and deployment was painful, if not impossible, when the target had a different C standard library. So we've gone back to the snappy-go implementation (entirely in Go) for ease of deployment. See the comments at the top of unsnap.go if you wish to use c-snappy instead.

[1] https://groups.google.com/forum/#!msg/snappy-compression/qvLNe2cSH9s/R19oBC-p7g4J

[2] https://codereview.appspot.com/5167058

[3] https://github.com/kubo/snzip

[4] https://pypi.python.org/pypi/python-snappy

Documentation

Constants

const CHUNK_MAX = 65536
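CHUNK_MAX matches the framing spec's 65536-byte cap on the uncompressed data carried by a single chunk. A minimal sketch of splitting input so that every piece respects that cap; `splitChunks` is a hypothetical helper, not a function exported by this package.

```go
package main

import "fmt"

// chunkMax mirrors the package's CHUNK_MAX constant.
const chunkMax = 65536

// splitChunks (hypothetical) slices data into pieces of at most chunkMax
// bytes, without copying, so each piece can be encoded as its own chunk.
func splitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := len(data)
		if n > chunkMax {
			n = chunkMax
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	for _, c := range splitChunks(make([]byte, 150000)) {
		fmt.Println(len(c)) // 65536, then 65536, then 18928
	}
}
```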

Variables

var SnappyStreamHeaderMagic = []byte{0xff, 0x06, 0x00, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59}

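The stream identifier chunk that begins a snappy-framed stream: chunk type 0xff, a 3-byte little-endian length of 6 (0x06 0x00 0x00), then the ASCII bytes "sNaPpY".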
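A reader can use these magic bytes to sniff whether input looks like a snappy stream before decoding. A minimal sketch using only the standard library; `looksLikeSnappyStream` is a hypothetical helper, and the byte slice is copied from SnappyStreamHeaderMagic above.

```go
package main

import (
	"bytes"
	"fmt"
)

// streamHeader copies SnappyStreamHeaderMagic: chunk type 0xff, a 3-byte
// little-endian length of 6, then the ASCII bytes "sNaPpY".
var streamHeader = []byte{0xff, 0x06, 0x00, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59}

// looksLikeSnappyStream (hypothetical) reports whether data begins with
// the stream identifier chunk.
func looksLikeSnappyStream(data []byte) bool {
	return bytes.HasPrefix(data, streamHeader)
}

func main() {
	framed := append([]byte{0xff, 0x06, 0x00, 0x00}, "sNaPpY"...)
	fmt.Println(looksLikeSnappyStream(framed))                // true
	fmt.Println(looksLikeSnappyStream([]byte("plain text"))) // false
}
```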

Functions

func IntMin

func IntMin(a int, b int) int

func ReadSnappyStreamCompressedFile

func ReadSnappyStreamCompressedFile(filename string) ([]byte, error)

func UnsnapOneFrame

func UnsnapOneFrame(r io.Reader, encBuf *FixedSizeRingBuf, outDecodedBuf *FixedSizeRingBuf, fname string) (nEnc int64, nDec int64, err error)

For incremental decoding, one frame at a time: read from r into encBuf (encBuf still holds encoded data, hence the name), and write the unsnappified frames into outDecodedBuf.

The returned nEnc is the number of bytes read from the encoded encBuf.

func Unsnappy

func Unsnappy(r io.Reader, w io.Writer) (err error)

For the whole file at once:

Receive a stream of bytes in the snappy-streaming framed format (for example on stdin, via r), defined here: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt

Grab each frame, run it through the snappy decoder, and write the decoded frames back-to-back to w (for example stdout).

Types

type FixedSizeRingBuf

type FixedSizeRingBuf struct {
	A        [2][]byte // a pair of ping/pong buffers. Only one is active.
	Use      int       // which A buffer is in active use, 0 or 1
	N        int       // MaxViewInBytes, the size of A[0] and A[1] in bytes.
	Beg      int       // start of data in A[Use]
	Readable int       // number of bytes available to read in A[Use]

	OneMade bool // lazily instantiate the [1] buffer. If we never call Bytes(),

}

func NewFixedSizeRingBuf

func NewFixedSizeRingBuf(maxViewInBytes int) *FixedSizeRingBuf

func (*FixedSizeRingBuf) Adopt

func (b *FixedSizeRingBuf) Adopt(me []byte)

Adopt(): non-standard.

For efficiency's sake, (possibly) take ownership of the already-allocated slice offered in me.

If me is large we will adopt it, and we will potentially then write to the me buffer. If we already have a bigger buffer, copy me into the existing buffer instead.

func (*FixedSizeRingBuf) Advance

func (b *FixedSizeRingBuf) Advance(n int)

Advance(): non-standard, but better than Next(), because we don't have to unwrap our buffer and pay the CPU time for the copy that unwrapping may need. Useful in conjunction with, or after, ReadWithoutAdvance().

func (*FixedSizeRingBuf) Bytes

func (b *FixedSizeRingBuf) Bytes() []byte

From the standard library description of Bytes(): Bytes() returns a slice of the contents of the unread portion of the buffer. If the caller changes the contents of the returned slice, the contents of the buffer will change provided there are no intervening method calls on the Buffer.

func (*FixedSizeRingBuf) ContigLen

func (b *FixedSizeRingBuf) ContigLen() int

Get the length of the largest read that we can provide as a contiguous slice without an extra internal linearizing copy of all the bytes.
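When the readable region wraps past the end of the backing array, only the part up to the end is contiguous; returning all of it as one slice needs a linearizing copy (the struct's ping/pong A[0]/A[1] fields suggest Bytes() uses the second buffer for that). The toy sketch below shows that arithmetic with the same Beg/Readable/N bookkeeping. The `ring` type and its methods are hypothetical and far simpler than FixedSizeRingBuf.

```go
package main

import "fmt"

// ring is a hypothetical, minimal stand-in for FixedSizeRingBuf's bookkeeping:
// buf has fixed size N, data starts at beg, and readable bytes may wrap.
type ring struct {
	buf      []byte
	beg      int
	readable int
}

// contigLen returns how many readable bytes sit contiguously from beg,
// i.e. how much can be handed out as one slice without copying.
func (r *ring) contigLen() int {
	toEnd := len(r.buf) - r.beg
	if r.readable < toEnd {
		return r.readable
	}
	return toEnd
}

// linearize copies the (possibly wrapped) readable bytes into one new slice:
// the extra copy that checking contigLen lets a caller avoid.
func (r *ring) linearize() []byte {
	out := make([]byte, 0, r.readable)
	first := r.contigLen()
	out = append(out, r.buf[r.beg:r.beg+first]...)
	return append(out, r.buf[:r.readable-first]...)
}

func main() {
	// N = 8; "abc" sits at indices 5..7 and wraps to "de" at indices 0..1.
	r := &ring{buf: []byte("de???abc"), beg: 5, readable: 5}
	fmt.Println(r.contigLen())         // 3
	fmt.Println(string(r.linearize())) // abcde
}
```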

func (*FixedSizeRingBuf) GetEndmostWritable

func (b *FixedSizeRingBuf) GetEndmostWritable() (beg int, end int)

Get the (beg, end] indices of the trailing empty region of the byte slice that is free for writing. Note: not guaranteed to be zeroed. At all.
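The free space directly after the readable data either runs to the end of the backing array or, once the data has wrapped, fills the gap before Beg. A hypothetical sketch of that computation, using the Beg/Readable/N notions from the struct above. For illustration it returns a Go-style half-open range [w, e); it is not this package's code.

```go
package main

import "fmt"

// endmostWritable (hypothetical) returns the half-open range [w, e) of free
// bytes that directly follows the readable data in a ring of size n.
func endmostWritable(n, beg, readable int) (w, e int) {
	end := beg + readable
	if end < n {
		return end, n // data has not wrapped: free space runs to the end
	}
	return end - n, beg // data wrapped (or ends exactly at n): gap before beg
}

func main() {
	fmt.Println(endmostWritable(8, 2, 3)) // 5 8
	fmt.Println(endmostWritable(8, 5, 5)) // 2 5
}
```

A full buffer (readable == n) yields an empty range, since w and e both equal beg.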

func (*FixedSizeRingBuf) GetEndmostWritableSlice

func (b *FixedSizeRingBuf) GetEndmostWritableSlice() []byte

Note: not guaranteed to be zeroed.

func (*FixedSizeRingBuf) Make2ndBuffer

func (b *FixedSizeRingBuf) Make2ndBuffer()

func (*FixedSizeRingBuf) Read

func (b *FixedSizeRingBuf) Read(p []byte) (n int, err error)

func (*FixedSizeRingBuf) ReadAndMaybeAdvance

func (b *FixedSizeRingBuf) ReadAndMaybeAdvance(p []byte, doAdvance bool) (n int, err error)

func (*FixedSizeRingBuf) ReadFrom

func (b *FixedSizeRingBuf) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom() reads data from r until EOF or error. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned.

func (*FixedSizeRingBuf) ReadWithoutAdvance

func (b *FixedSizeRingBuf) ReadWithoutAdvance(p []byte) (n int, err error)

Use this if you want to Read the data but leave it in the buffer, for example to peek ahead.

func (*FixedSizeRingBuf) Reset

func (b *FixedSizeRingBuf) Reset()

func (*FixedSizeRingBuf) Write

func (b *FixedSizeRingBuf) Write(p []byte) (n int, err error)

Write writes len(p) bytes from p to the underlying data stream. It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered that caused the write to stop early. Write must return a non-nil error if it returns n < len(p).

func (*FixedSizeRingBuf) WriteTo

func (b *FixedSizeRingBuf) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes data to w until there's no more data to write or when an error occurs. The return value n is the number of bytes written. Any error encountered during the write is also returned.

type SnappyFile

type SnappyFile struct {
	Fname string

	Reader io.Reader
	Writer io.Writer

	// allow clients to substitute us for an os.File and just switch
	// off compression if they don't want it.
	SnappyEncodeDecodeOff bool // if true, we bypass straight to Filep

	EncBuf FixedSizeRingBuf // holds any extra that isn't yet returned, encoded
	DecBuf FixedSizeRingBuf // holds any extra that isn't yet returned, decoded

	// for writing to stream-framed snappy
	HeaderChunkWritten bool

	// Sanity check: we can only read, or only write, to one SnappyFile.
	// EncBuf and DecBuf are used differently in each mode. Verify
	// that we are consistent with this flag.
	Writing bool
}

func Create

func Create(name string) (file *SnappyFile, err error)

func NewReader

func NewReader(r io.Reader) *SnappyFile

func NewWriter

func NewWriter(w io.Writer) *SnappyFile

func Open

func Open(name string) (file *SnappyFile, err error)

func (*SnappyFile) Close

func (f *SnappyFile) Close() error

func (*SnappyFile) Dump

func (f *SnappyFile) Dump()

For debugging: show the state of the buffers.

func (*SnappyFile) Read

func (f *SnappyFile) Read(p []byte) (n int, err error)

func (*SnappyFile) Sync

func (f *SnappyFile) Sync() error

func (*SnappyFile) Write

func (sf *SnappyFile) Write(p []byte) (n int, err error)

Write writes len(p) bytes from p to the underlying data stream. It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered that caused the write to stop early. Write must return a non-nil error if it returns n < len(p).

Directories

Path Synopsis
cmd