README

ogórek

ogórek is a Go library for encoding and decoding pickles.

Fuzz Testing

Fuzz testing is implemented for both the decoder and the encoder. To run the fuzz tests:

go get github.com/dvyukov/go-fuzz/go-fuzz
go get github.com/dvyukov/go-fuzz/go-fuzz-build
go-fuzz-build github.com/kisielk/og-rek
go-fuzz -bin=./ogórek-fuzz.zip -workdir=./fuzz

Documentation

Overview

Package ogórek(*) is a library for decoding/encoding Python's pickle format.

Use Decoder to decode a pickle from an input stream, for example:

d := ogórek.NewDecoder(r)
obj, err := d.Decode() // obj is interface{} representing decoded Python object

Use Encoder to encode an object as a pickle to an output stream, for example:

e := ogórek.NewEncoder(w)
err := e.Encode(obj)

The following table summarizes the mapping of basic types between Python and Go:

Python        Go
------        --

None       ↔  ogórek.None
bool       ↔  bool
int        ↔  int64
int        ←  int, intX, uintX
long       ↔  *big.Int
float      ↔  float64
float      ←  floatX
list       ↔  []interface{}
tuple      ↔  ogórek.Tuple
dict       ↔  map[interface{}]interface{}

str        ↔  string         (+)
bytes      ↔  ogórek.Bytes   (~)
bytearray  ↔  []byte
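
Since Decode returns interface{}, callers typically dispatch on the dynamic type of the result. The following is a minimal sketch of such a type switch over the mapping above. None, Tuple and Bytes are local stand-ins that mirror the documented ogórek types so that the sketch compiles on its own, and describe is a hypothetical helper, not part of the library.

```go
package main

import (
	"fmt"
	"math/big"
)

// Local stand-ins mirroring the documented ogórek types (assumption: with
// the real package you would use ogórek.None, ogórek.Tuple, ogórek.Bytes).
type (
	None  struct{}
	Tuple []interface{}
	Bytes string
)

// describe names the Python type that a decoded Go value corresponds to,
// following the mapping table. It is a hypothetical helper.
func describe(v interface{}) string {
	switch v := v.(type) {
	case None:
		return "None"
	case bool:
		return "bool"
	case int64:
		return "int"
	case *big.Int:
		return "long"
	case float64:
		return "float"
	case []interface{}:
		return fmt.Sprintf("list of %d items", len(v))
	case Tuple:
		return fmt.Sprintf("tuple of %d items", len(v))
	case map[interface{}]interface{}:
		return fmt.Sprintf("dict of %d items", len(v))
	case string:
		return "str"
	case Bytes:
		return "bytes"
	case []byte:
		return "bytearray"
	default:
		return fmt.Sprintf("unhandled Go type %T", v)
	}
}

func main() {
	fmt.Println(describe(int64(42)))
	fmt.Println(describe(Tuple{"a", int64(1)}))
	fmt.Println(describe(Bytes("raw")))
}
```

Note that Tuple and []interface{} are distinct Go types, as are Bytes and string, so the switch can tell a Python tuple from a list and bytes from str.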

Python classes and instances are mapped to Class and Call, for example:

Python                          Go
------                          --

decimal.Decimal            ↔    ogórek.Class{"decimal", "Decimal"}
decimal.Decimal("3.14")    ↔    ogórek.Call{
                                    ogórek.Class{"decimal", "Decimal"},
                                    ogórek.Tuple{"3.14"},
                                }

Because classes and calls are represented as plain placeholder values and are never executed, it is by default safe on the Go side to decode pickles from untrusted sources(^).
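
An application that does want real values for selected classes can convert the placeholders explicitly. The sketch below turns the Call for decimal.Decimal("3.14") into an exact *big.Rat and rejects everything else. Class, Call and Tuple are local stand-ins mirroring the documented types, and decimalFromCall is a hypothetical helper, not part of ogórek.

```go
package main

import (
	"fmt"
	"math/big"
)

// Local stand-ins mirroring the documented ogórek types.
type (
	Tuple []interface{}
	Class struct{ Module, Name string }
	Call  struct {
		Callable Class
		Args     Tuple
	}
)

// decimalFromCall converts the placeholder for decimal.Decimal("...") into
// an exact rational number. Any other callable is rejected, so nothing named
// in the pickle is ever invoked implicitly. Hypothetical helper.
func decimalFromCall(c Call) (*big.Rat, error) {
	if c.Callable != (Class{Module: "decimal", Name: "Decimal"}) {
		return nil, fmt.Errorf("unexpected callable %s.%s", c.Callable.Module, c.Callable.Name)
	}
	if len(c.Args) != 1 {
		return nil, fmt.Errorf("decimal.Decimal: want 1 argument, got %d", len(c.Args))
	}
	s, ok := c.Args[0].(string)
	if !ok {
		return nil, fmt.Errorf("decimal.Decimal: want string argument, got %T", c.Args[0])
	}
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return nil, fmt.Errorf("decimal.Decimal: cannot parse %q", s)
	}
	return r, nil
}

func main() {
	c := Call{Class{"decimal", "Decimal"}, Tuple{"3.14"}}
	r, err := decimalFromCall(c)
	fmt.Println(r, err)
}
```

Special values such as NaN or Infinity have no *big.Rat equivalent and are reported as parse errors by this sketch.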

Pickle protocol versions

The pickle stream format has evolved over time. The original protocol version 0 is human-readable; versions 1 and 2 extend it in a backward-compatible way with binary encodings for efficiency. Protocol version 2 is the highest version understood by the standard pickle module of Python2. Protocol version 3 added ways to represent Python bytes objects from Python3(~). Protocol version 4 builds on version 3 and switches completely to binary-only encoding. Please see https://docs.python.org/3/library/pickle.html#data-stream-format for details.
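
To make the difference between the textual and the binary encodings concrete, here is a small hand-written sketch that pickles a single integer under protocols 0, 1 and 2. It uses the INT, BININT1, BININT2, BININT, PROTO and STOP opcodes of the pickle format. encodeInt is a hypothetical helper for illustration only and says nothing about how ogórek's Encoder is implemented.

```go
package main

import (
	"fmt"
	"strconv"
)

// encodeInt pickles v by hand for protocol 0, 1 or 2 (hypothetical helper).
func encodeInt(proto int, v int32) []byte {
	var out []byte
	if proto >= 2 {
		out = append(out, 0x80, byte(proto)) // PROTO opcode, then version
	}
	switch {
	case proto == 0:
		// INT: 'I', decimal text, newline. Human-readable.
		out = append(out, 'I')
		out = strconv.AppendInt(out, int64(v), 10)
		out = append(out, '\n')
	case v >= 0 && v <= 0xff:
		// BININT1: 'K', one unsigned byte.
		out = append(out, 'K', byte(v))
	case v >= 0 && v <= 0xffff:
		// BININT2: 'M', two bytes, little-endian.
		out = append(out, 'M', byte(v), byte(v>>8))
	default:
		// BININT: 'J', four bytes, signed, little-endian.
		u := uint32(v)
		out = append(out, 'J', byte(u), byte(u>>8), byte(u>>16), byte(u>>24))
	}
	return append(out, '.') // STOP
}

func main() {
	for proto := 0; proto <= 2; proto++ {
		fmt.Printf("protocol %d: %q\n", proto, encodeInt(proto, 42))
	}
}
```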

On decoding, ogórek detects which protocol is being used and automatically handles all necessary details.
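
For inspecting a pickle by hand, the version it announces can be read from its first two bytes: pickles written with protocol 2 or later start with the PROTO opcode (0x80) followed by a one-byte version number, while protocol 0 and 1 pickles carry no such header. The sketch below shows this check. declaredProtocol is a hypothetical helper and is not a description of how ogórek's Decoder works internally.

```go
package main

import "fmt"

// declaredProtocol returns the version announced by a leading PROTO opcode.
// ok is false when there is no such header, which is the case for
// protocol 0 and protocol 1 pickles. Hypothetical helper.
func declaredProtocol(data []byte) (version int, ok bool) {
	const opProto = 0x80
	if len(data) >= 2 && data[0] == opProto {
		return int(data[1]), true
	}
	return 0, false
}

func main() {
	for _, p := range [][]byte{[]byte("\x80\x02K*."), []byte("I42\n.")} {
		v, ok := declaredProtocol(p)
		fmt.Printf("%q: version=%d header=%v\n", p, v, ok)
	}
}
```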

On encoding, for compatibility with Python2, ogórek produces pickles with protocol 2 by default. Bytes will thus, by default, be unpickled as str on Python2 and as bytes on Python3. If an earlier protocol is desired, or if Bytes needs to be encoded efficiently (the protocol 2 encoding for bytes is far from optimal) and compatibility with pure Python2 is not an issue, the protocol to use for encoding can be specified explicitly, for example:

e := ogórek.NewEncoderWithConfig(w, &ogórek.EncoderConfig{
	Protocol: 3,
})
err := e.Encode(obj)

See EncoderConfig.Protocol for details.

Persistent references

Pickle was originally created for serialization in the ZODB (http://zodb.org) object database, where on-disk objects can reference each other similarly to how one in-RAM object can hold a reference to another in-RAM object.

When a pickle with such a persistent reference is decoded, ogórek represents the reference with a Ref placeholder, similarly to Class and Call. However, it is possible to hook into decoding and process such references in an application-specific way, for example by loading the referenced object from the database:

d := ogórek.NewDecoderWithConfig(r, &ogórek.DecoderConfig{
	PersistentLoad: ...
})
obj, err := d.Decode()

Similarly, for encoding, an application can hook into the serialization process and turn pointers to some in-RAM objects into persistent references.

Please see DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for details.
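
As an illustration of the decoding hook, the sketch below builds a function with the signature documented for DecoderConfig.PersistentLoad that resolves references against an in-memory map. Ref is a local stand-in mirroring ogórek.Ref. Account, newLoader and the map-backed "database" are hypothetical, and how the Decoder reacts to an error returned from the hook is not covered here.

```go
package main

import (
	"errors"
	"fmt"
)

// Ref is a local stand-in mirroring the documented ogórek.Ref.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application object stored in the database.
type Account struct{ Owner string }

// newLoader returns a function with the signature documented for
// DecoderConfig.PersistentLoad. db is a hypothetical in-memory stand-in
// for a database keyed by string persistent IDs.
func newLoader(db map[string]*Account) func(ref Ref) (interface{}, error) {
	return func(ref Ref) (interface{}, error) {
		oid, ok := ref.Pid.(string)
		if !ok {
			return nil, fmt.Errorf("unsupported persistent ID type %T", ref.Pid)
		}
		obj, ok := db[oid]
		if !ok {
			return nil, errors.New("no object with persistent ID " + oid)
		}
		return obj, nil
	}
}

func main() {
	load := newLoader(map[string]*Account{"0x01": {Owner: "alice"}})
	obj, err := load(Ref{Pid: "0x01"})
	fmt.Println(obj, err)
}
```

With the real library, the returned function would be assigned to the PersistentLoad field of the DecoderConfig passed to NewDecoderWithConfig.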

--------

(*) ogórek is Polish for "pickle".

(+) for Python2 both str and unicode are decoded into string, with Python str being considered UTF-8 encoded. Correspondingly, for protocol ≤ 2 a Go string is encoded as a UTF-8 encoded Python str, and for protocol ≥ 3 as unicode.

(~) bytes can be produced only by Python3 or zodbpickle (https://pypi.org/project/zodbpickle), not by standard Python2. Correspondingly, for protocol ≤ 2, what ogórek produces is unpickled as bytes by Python3 or zodbpickle, and as str by Python2.

(^) unlike the Python implementation, where a malicious pickle can cause the decoder to run arbitrary code, including e.g. os.system("rm -rf /").

Variables

var ErrInvalidPickleVersion = errors.New("invalid pickle version")

Types

type Bytes

type Bytes string

Bytes represents Python's bytes.

type Call

type Call struct {
	Callable Class
	Args     Tuple
}

Call represents Python's call.

type Class

type Class struct {
	Module, Name string
}

Class represents a Python class.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is a decoder for pickle streams.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder constructs a new Decoder which will decode the pickle stream in r.

func NewDecoderWithConfig

func NewDecoderWithConfig(r io.Reader, config *DecoderConfig) *Decoder

NewDecoderWithConfig is similar to NewDecoder, but allows specifying decoder configuration.

func (*Decoder) Decode

func (d *Decoder) Decode() (interface{}, error)

Decode decodes the pickle stream and returns the result or an error.

type DecoderConfig

type DecoderConfig struct {
	// PersistentLoad, if !nil, will be used by decoder to handle persistent references.
	//
	// Whenever the decoder finds an object reference in the pickle stream
	// it will call PersistentLoad. If PersistentLoad returns !nil object
	// without error, the decoder will use that object instead of Ref in
	// the resulting Go object.
	//
	// An example use-case for PersistentLoad is to transform persistent
	// references in a ZODB database of form (type, oid) tuple, into
	// equivalent-to-type Go ghost object, e.g. equivalent to zodb.BTree.
	//
	// See Ref documentation for more details.
	PersistentLoad func(ref Ref) (interface{}, error)
}

DecoderConfig allows tuning the Decoder.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder encodes Go data structures into a pickle byte stream.

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder returns a new Encoder struct with default values.

func NewEncoderWithConfig

func NewEncoderWithConfig(w io.Writer, config *EncoderConfig) *Encoder

NewEncoderWithConfig is similar to NewEncoder, but allows specifying the encoder configuration.

func (*Encoder) Encode

func (e *Encoder) Encode(v interface{}) error

Encode writes the pickle encoding of v to w, the encoder's writer.

type EncoderConfig

type EncoderConfig struct {
	// Protocol specifies which pickle protocol version should be used.
	Protocol int

	// PersistentRef, if !nil, will be used by encoder to encode objects as persistent references.
	//
	// Whenever the encoder sees a pointer to a Go struct object, it will call
	// PersistentRef to find out how to encode that object. If PersistentRef
	// returns nil, the object is encoded regularly. If it returns !nil, the
	// object will be encoded as an object reference.
	//
	// See Ref documentation for more details.
	PersistentRef func(obj interface{}) *Ref
}

EncoderConfig allows tuning the Encoder.
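
A minimal sketch of an encoding hook follows. persistentRef has the signature documented for EncoderConfig.PersistentRef: it returns a Ref for objects that already live in the database and nil for everything else, so those are encoded regularly. Ref is a local stand-in mirroring ogórek.Ref, and Account with its OID field is a hypothetical application type.

```go
package main

import "fmt"

// Ref is a local stand-in mirroring the documented ogórek.Ref.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application object. An empty OID means the
// object has not been stored in the database yet.
type Account struct {
	OID   string
	Owner string
}

// persistentRef has the signature documented for EncoderConfig.PersistentRef.
func persistentRef(obj interface{}) *Ref {
	if a, ok := obj.(*Account); ok && a != nil && a.OID != "" {
		return &Ref{Pid: a.OID} // encode as a persistent reference
	}
	return nil // encode the object regularly
}

func main() {
	stored := &Account{OID: "0x01", Owner: "alice"}
	fresh := &Account{Owner: "bob"}
	fmt.Println(persistentRef(stored) != nil, persistentRef(fresh) != nil)
}
```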

type None

type None struct{}

None is a representation of Python's None.

type OpcodeError

type OpcodeError struct {
	Key byte
	Pos int
}

OpcodeError is the error that Decode returns when it sees an unknown pickle opcode.

func (OpcodeError) Error

func (e OpcodeError) Error() string
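
Callers can tell these error kinds apart with the standard errors package. The sketch below uses local stand-ins for ErrInvalidPickleVersion and OpcodeError so that it compiles on its own. It assumes Decode returns OpcodeError by value, as the value receiver above suggests. The message produced by the stand-in Error method is made up, and classify is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins mirroring the documented ogórek declarations.
var ErrInvalidPickleVersion = errors.New("invalid pickle version")

type OpcodeError struct {
	Key byte
	Pos int
}

// The message text is an assumption; the library's wording may differ.
func (e OpcodeError) Error() string {
	return fmt.Sprintf("unknown opcode %#x at position %d", e.Key, e.Pos)
}

// classify turns a decoding error into a short diagnostic (hypothetical helper).
func classify(err error) string {
	var opErr OpcodeError
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrInvalidPickleVersion):
		return "unsupported pickle version"
	case errors.As(err, &opErr):
		return fmt.Sprintf("bad opcode %#x at offset %d", opErr.Key, opErr.Pos)
	default:
		return "other error: " + err.Error()
	}
}

func main() {
	fmt.Println(classify(nil))
	fmt.Println(classify(ErrInvalidPickleVersion))
	// Wrapped errors are still recognized by errors.As.
	fmt.Println(classify(fmt.Errorf("decode: %w", OpcodeError{Key: 0xff, Pos: 3})))
}
```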

type Ref

type Ref struct {
	// persistent ID of referenced object.
	//
	// used to be string for protocol 0, but "upgraded" to be arbitrary
	// object for later protocols.
	Pid interface{}
}

Ref is the default representation for a Python persistent reference.

Such references are used when one pickle somehow references another pickle in e.g. a database.

See https://docs.python.org/3/library/pickle.html#pickle-persistent for details.

See DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for ways to tune Decoder and Encoder to handle persistent references with user-specified application logic.

type Tuple

type Tuple []interface{}

Tuple is a representation of Python's tuple.

type TypeError

type TypeError struct {
	// contains filtered or unexported fields
}

func (*TypeError) Error

func (te *TypeError) Error() string