rezi

package module
v2.3.0
Published: Dec 18, 2023 License: MIT Imports: 11 Imported by: 3

README

REZI

The Rarefied Encoding (Compressible) for Interchange (REZI) library marshals data to and from a binary format. It can encode and decode most simple Go types out of the box, whether user-defined or built-in, and for those cases where the automatic format just doesn't cut it, it allows customization through user-defined types that implement the binary or text marshaling interfaces from Go's built-in encoding package.

All data is encoded in a deterministic fashion, or as deterministically as possible. Any non-determinism in the resulting encoded value will arise from functions outside of the library's control; it is up to the user to ensure that, for instance, calling MarshalBinary on a user-defined type passed to REZI for encoding gives a deterministic result.
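As an illustration of the kind of care this asks of the user, here is a minimal sketch of a MarshalBinary method that stays deterministic even though its data lives in a map. The Inventory type and its ad-hoc byte layout are hypothetical and exist only for this example; the point is that the keys are sorted before anything is written, since Go randomizes map iteration order.

```go
package main

import (
	"encoding/binary"
	"sort"
)

// Inventory is a hypothetical user-defined type whose data lives in a map.
type Inventory struct {
	Items map[string]uint32
}

// MarshalBinary writes entries in sorted key order so that the same Inventory
// always produces the same bytes. Ranging over the map directly would emit the
// entries in Go's randomized iteration order instead.
func (inv Inventory) MarshalBinary() ([]byte, error) {
	keys := make([]string, 0, len(inv.Items))
	for k := range inv.Items {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []byte
	for _, k := range keys {
		// Ad-hoc layout chosen for illustration only (an assumption, not a
		// REZI format): key length, key bytes, then the value.
		out = binary.BigEndian.AppendUint32(out, uint32(len(k)))
		out = append(out, k...)
		out = binary.BigEndian.AppendUint32(out, inv.Items[k])
	}
	return out, nil
}
```

Calling MarshalBinary repeatedly on the same Inventory yields identical bytes, which is the property REZI relies on but cannot enforce for you.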

The REZI format was originally created for structs in the Ictiobus project and eventually grew into a separate library for use with other projects.

Installation

Install REZI into your project:

$ go get -u github.com/dekarrin/rezi/v2@latest

And import the '/v2' path:

import "github.com/dekarrin/rezi/v2"

Usage

The primary REZI format functions are Enc for encoding data and Dec for decoding it. Both of these work similarly to the Marshal and Unmarshal functions in the json library. For encoding, the value to be encoded is passed in directly, and for decoding, a pointer to a value of the correct type is passed in.

Encoding

To encode a value using REZI, pass it to Enc(). This will return a slice of bytes holding the encoded value.

import (
    "fmt"

    "github.com/dekarrin/rezi/v2"
)

...

name := "Terezi"

nameData, err := rezi.Enc(name)
if err != nil {
    panic(err)
}

fmt.Println(nameData) // this will print out the encoded data bytes

Multiple encoded values are joined into a single slice of REZI-compatible bytes by appending the results of Enc() together.

// A new value to encode:
number := 612

numData, err := rezi.Enc(number)
if err != nil {
    panic(err)
}

// we'll append the two data slices together in a new slice containing both the
// encoded name and number:

var data []byte
data = append(data, nameData...)
data = append(data, numData...)

You'll need to keep the order of the encoded values in mind when decoding. In the above example, the data slice contains the encoded name, followed by the encoded number.

Decoding

To decode data from a slice of bytes containing REZI-format data, pass it along with a pointer to receive the value to the Dec() function. The data can contain more than one value in sequence; Dec() will decode the one that begins at the start of the slice, and return the number of bytes it decoded.

import (
    "fmt"

    "github.com/dekarrin/rezi/v2"
)

...

var decodedName string
var decodedNumber int

var readByteCount int
var err error

// assume data is the []byte from the end of the Enc() example. It contains a
// REZI-format string, followed by a REZI-format int.

// decode the first value, the name:
readByteCount, err = rezi.Dec(data, &decodedName)
if err != nil {
    panic(err)
}

// skip ahead the number of bytes that were just read so that the start of data
// now points at the next REZI-encoded value
data = data[readByteCount:]

// decode the second value, the number:
readByteCount, err = rezi.Dec(data, &decodedNumber)
if err != nil {
    panic(err)
}

fmt.Println(decodedName) // "Terezi"
fmt.Println(decodedNumber) // 612

Readers And Writers

You can also use REZI by creating a Reader or Writer and calling their Dec or Enc methods respectively. This lets you read and write values directly to and from streams of bytes.


// on the write side, get an io.Writer someWriter you want to write REZI-encoded
// data to, and write out with Enc.

w, err := rezi.NewWriter(someWriter, nil)
if err != nil {
    panic(err)
}

w.Enc(413)
w.Enc("NEPETA")
w.Enc(true)

// don't forget to call Flush or Close when done!
w.Flush()

// on the read side, get an io.Reader someReader you want to read REZI-encoded
// data from, and read it with Dec.

r, err := rezi.NewReader(someReader, nil)
if err != nil {
    panic(err)
}

var number int
var name string
var isGood bool

r.Dec(&number)
r.Dec(&name)
r.Dec(&isGood)

fmt.Println(number) // 413
fmt.Println(name)   // "NEPETA"
fmt.Println(isGood) // true

Output from a Writer can also be read by earlier versions of REZI using non-Reader calls, as long as the Writer was created with a nil or Version 1 Format and compression is not enabled. This does not extend to data types that did not exist in that earlier version, however.

Readers created with a nil or Version 1 Format with compression disabled are able to read data written by any prior version of REZI.

Supported Types

REZI supports all built-in basic types. Additionally, any type that implements encoding.BinaryMarshaler can be encoded, and any type that implements encoding.BinaryUnmarshaler with a pointer receiver can be decoded. Likewise, any type that implements encoding.TextMarshaler can be encoded, and any type that implements encoding.TextUnmarshaler can be decoded. If a type implements both sets of functions, REZI will prefer the binary marshaling functions.
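For reference, this is roughly what a type satisfying the text marshaling interfaces looks like. The Version type below is hypothetical and uses only the standard library; per the description above, a type shaped like this should be accepted for both encoding and decoding, though the sketch itself does not call into REZI.

```go
package main

import (
	"encoding"
	"fmt"
	"strconv"
	"strings"
)

// Version is a hypothetical type used only for illustration.
type Version struct {
	Major, Minor int
}

// Compile-time checks that Version satisfies both text interfaces.
var (
	_ encoding.TextMarshaler   = Version{}
	_ encoding.TextUnmarshaler = (*Version)(nil)
)

// MarshalText implements encoding.TextMarshaler with a value receiver.
func (v Version) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("%d.%d", v.Major, v.Minor)), nil
}

// UnmarshalText implements encoding.TextUnmarshaler with a pointer receiver.
func (v *Version) UnmarshalText(text []byte) error {
	majorText, minorText, ok := strings.Cut(string(text), ".")
	if !ok {
		return fmt.Errorf("version %q: missing '.'", text)
	}
	major, err := strconv.Atoi(majorText)
	if err != nil {
		return fmt.Errorf("major: %w", err)
	}
	minor, err := strconv.Atoi(minorText)
	if err != nil {
		return fmt.Errorf("minor: %w", err)
	}
	*v = Version{Major: major, Minor: minor}
	return nil
}
```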

REZI supports slice, array, and map types whose values are of any supported type (including those whose values are themselves slice, array, or map values). Maps must additionally have keys of type string, bool, one of the built-in integer types, or one of the built-in float types.

REZI can also handle encoding and decoding pointers to any supported type, with any level of indirection.

On top of all of the above, REZI automatically supports any type whose underlying type is supported, as well as any struct whose exported fields are all of supported types.

Struct Support

Much like the json package, REZI can encode and decode most simple structs out of the box without requiring any further customization. Simple in this case means that all of its exported fields are a supported type. As only the exported fields are encoded and decoded to bytes, it's okay if an unexported field is of an unsupported type.


// AnimalInfo is fully supported; all fields will be encoded and decoded.
type AnimalInfo struct {
    Name string
    Taxonomy []string
    AverageAge int
}

// Animal is *not* supported; despite field Info being of a supported type,
// Counter is a channel, which is unsupported.
type Animal struct {
    Info AnimalInfo
    Counter chan int
}

// HiddenCounterAnimal is supported. Even though field counter is of an
// unsupported type, it is unexported and so it will be ignored.
type HiddenCounterAnimal struct {
    Info AnimalInfo
    counter chan int
}

Unexported fields of a struct are ignored when encoding and decoding. If a pointer to a struct with unexported fields is passed in to be decoded, those fields keep their original values.

type Player struct {
    Name string
    Classpect string

    echeladder string
}

john := Player{Name: "John Egbert", Classpect: "Heir of Breath", echeladder: "Plucky Tot"}

// the encoded bytes will only contain Name and Classpect; echeladder is not
// exported so it is ignored
data, err := rezi.Enc(john)
if err != nil {
    panic(err)
}

// now we will decode the bytes to two structs, one with the unexported member
// pre-set
var playerWithoutEcheladder Player
var playerWithEcheladder Player = Player{echeladder: "Plucky Tot"}

_, err = rezi.Dec(data, &playerWithoutEcheladder)
if err != nil {
    panic(err)
}
_, err = rezi.Dec(data, &playerWithEcheladder)
if err != nil {
    panic(err)
}

fmt.Println(playerWithoutEcheladder.echeladder) // ""
fmt.Println(playerWithEcheladder.echeladder) // "Plucky Tot"

Embedded structs within structs are supported if the embedded struct type is exported; this is because it will be turned into a field with the same name as the embedded type, and if it is exported, the field name will correspondingly be exported. Likewise, embedded structs whose type is unexported will be ignored during encoding and will not be decoded to.

type InternalRecord struct {
    ID int
    Location string
}

type secret struct {
    BigSecret string
}

// All fields of Employee will be marshaled and unmarshaled to; InternalRecord
// is exported
type Employee struct {
    InternalRecord
    Name string
}

// Only Name will be encoded and decoded to; secret is an unexported type.
type KeyData struct {
    secret
    Name string
}

If any of the above limitations are a concern, you can customize the encoding of user-defined types by implementing one of the marshaler interfaces encoding.BinaryMarshaler or encoding.TextMarshaler (and their corresponding unmarshaler interfaces for decoding) as described in the next section.

Custom Encoding And Decoding

REZI supports encoding any custom type that implements encoding.BinaryMarshaler, and it supports decoding any custom type that implements encoding.BinaryUnmarshaler with a pointer receiver. In fact, the lack of built-in facilities in Go for binary encoding of user-defined types is partially why REZI exists.

REZI can perform automatic inference of a user-defined struct type's encoding, similar to what the json library is capable of. User-defined types that do not implement BinaryMarshaler or TextMarshaler are supported for encoding if their underlying type is one supported by REZI or, in the case of a struct type, if all of its exported fields are supported. The same rules apply to decoding for types that do not implement the corresponding unmarshaler interfaces.

Within the MarshalBinary method, you can customize encoding the data to whichever format you wish, though these examples will have that function use REZI to encode the members of the types. The contents of the slice that MarshalBinary returns are completely opaque to REZI, which will consider only the slice's length. Do note that this means that returning a nil slice or an empty but initialized slice will both be interpreted the same by REZI and will not result in different encodings.

// Person is an example of a user-defined type that REZI can encode and decode.
type Person struct {
    Name string
    Number int
}

func (p Person) MarshalBinary() ([]byte, error) {
    var enc []byte

    var err error
    var reziBytes []byte

    reziBytes, err = rezi.Enc(p.Name)
    if err != nil {
        return nil, fmt.Errorf("name: %w", err)
    }
    enc = append(enc, reziBytes...)

    reziBytes, err = rezi.Enc(p.Number)
    if err != nil {
        return nil, fmt.Errorf("number: %w", err)
    }
    enc = append(enc, reziBytes...)

    return enc, nil
}

It's always good practice to check the error value returned by Enc, but it is worth noting that for certain values (generally, ones whose type is built-in or consists only of built-in types), Enc will never return an error. If you know that a value cannot possibly return an error under normal circumstances (see the Godocs for Enc() to check which types that is true for), you can use MustEnc to return only the bytes, which can be useful when encoding several values in sequence directly into append() calls.

// this variant of MarshalBinary calls MustEnc to encode values that are
// built-in types.
func (p Person) MarshalBinary() ([]byte, error) {
    var enc []byte

    enc = append(enc, rezi.MustEnc(p.Name)...)
    enc = append(enc, rezi.MustEnc(p.Number)...)

    return enc, nil
}

Custom decoding of user-defined types is handled with the UnmarshalBinary method. The bytes that MarshalBinary returned during encoding are picked up by REZI during decoding and passed into UnmarshalBinary. Note that unlike the MarshalBinary method, which must be defined with a value receiver for the type, REZI requires UnmarshalBinary to be defined with a pointer receiver.

// UnmarshalBinary takes in bytes and decodes them into a new Person object,
// which it assigns as the value of its receiver.
func (p *Person) UnmarshalBinary(data []byte) error {
    var n int
    var offset int
    var err error

    var decoded Person

    // decode name member
    n, err = rezi.Dec(data[offset:], &decoded.Name)
    if err != nil {
        return fmt.Errorf("name: %w", err)
    }
    offset += n

    // decode number member
    n, err = rezi.Dec(data[offset:], &decoded.Number)
    if err != nil {
        return fmt.Errorf("number: %w", err)
    }
    offset += n

    // everyfin was successfully decoded! assign the result as the value of this
    // Person.
    *p = decoded

    return nil
}

REZI decoding supports reporting byte offsets where an error occurred in the supplied data. In order to support this in user-defined types, Wrapf can be used to wrap an error returned from REZI functions and give the offset into the data that the REZI function was called on. This offset will be combined with any inside the REZI error to give the complete offset:

// a typical use of Wrapf within an UnmarshalBinary method:

n, err = rezi.Dec(data[offset:], &decoded.Name)
if err != nil {
    // Always make sure to use %s or %v in Wrapf, never %w!
    return rezi.Wrapf(offset, "name: %s", err)
}
offset += n

// Additionally, first arg to the format string must always be the error
// returned from the REZI function.

When a type has both the UnmarshalBinary and MarshalBinary methods defined, it can be encoded and decoded with Enc and Dec just like any other type:

import (
    "fmt"

    "github.com/dekarrin/rezi/v2"
)

...

p := Person{Name: "Terezi", Number: 612}

data, err := rezi.Enc(p)
if err != nil {
    panic(err)
}

var decoded Person

// err was already declared above, so assign with = rather than :=
_, err = rezi.Dec(data, &decoded)
if err != nil {
    panic(err)
}

fmt.Println(decoded.Name) // "Terezi"
fmt.Println(decoded.Number) // 612

Compression

REZI supports compression via the use of Reader and Writer. When one is created, instead of giving a nil value for the Format it accepts, pass in a Format with Compression set to true.


w, err := rezi.NewWriter(someWriter, &rezi.Format{Compression: true})
if err != nil {
    panic(err)
}

w.Enc(413)

// don't forget to call Flush or Close when done
w.Flush()

// on the read side, get an io.Reader someReader you want to read REZI-encoded
// data from, and pass it to NewReader along with a Format that enables
// Compression.

r, err := rezi.NewReader(someReader, &rezi.Format{Compression: true})
if err != nil {
    panic(err)
}

var number int

r.Dec(&number)

fmt.Println(number) // 413

Documentation

Overview

Package rezi provides the ability to encode and decode data in Rarefied Encoding (Compressible) Interchange format. It allows Go types and user-defined types to be easily read from and written to byte slices, with customization possible by implementing encoding.BinaryUnmarshaler and encoding.BinaryMarshaler on a type, or alternatively by implementing encoding.TextUnmarshaler and encoding.TextMarshaler. REZI has an interface similar to the json package; one function is used to encode all supported types, and another function receives bytes and a receiver for decoded data and infers how to decode the bytes based on the receiver.

The Enc function is used to encode any supported type to REZI bytes:

import "github.com/dekarrin/rezi/v2"

func main() {
	specialNumber := 413
	name := "TEREZI"

	var numData []byte
	var nameData []byte
	var err error

	numData, err = rezi.Enc(specialNumber)
	if err != nil {
		panic(err.Error())
	}

	nameData, err = rezi.Enc(name)
	if err != nil {
		panic(err.Error())
	}
}

Data from multiple calls to Enc() can be combined into a single block of data by appending them together:

var allData []byte
allData = append(allData, numData...)
allData = append(allData, nameData...)

The Dec function is used to decode data from REZI bytes:

var readNumber int
var readName string

var n, offset int
var err error

n, err = rezi.Dec(allData[offset:], &readNumber)
if err != nil {
	panic(err.Error())
}
offset += n

n, err = rezi.Dec(allData[offset:], &readName)
if err != nil {
	panic(err.Error())
}
offset += n

Alternatively, instead of calling Dec and Enc directly on a pre-loaded slice of data bytes, the Reader and Writer types can be used to operate on a stream of data. See the section below for more information.

Compression

Compression can be enabled by passing a Format struct with Compression options set to any method which accepts a Format. At this time, this is possible only with Readers and Writers.

The zlib library is used for compression, with a compression ratio that may be specified at write time.
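To give a feel for what a write-time compression setting means, here is a small sketch that uses Go's standard compress/zlib package directly. It is not REZI's internal code and says nothing about how REZI frames compressed data; the helper names compress and decompress are made up for this example. It only shows that the level is chosen by the writer and that the reader needs no matching setting.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"io"
)

// compress deflates data with the standard library's zlib writer at the given
// level (for example zlib.BestSpeed or zlib.BestCompression).
func compress(data []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevel(&buf, level)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending compressed data to buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress inflates zlib data. The reader does not need to know which level
// the writer used.
func decompress(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```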

Readers and Writers

For reading and writing from data streams, Reader and Writer are provided. They each have their own Dec and Enc methods and do not require that manual tracking be provided for proper offset error-reporting.

Additionally, the Reader and Writer both support being used for writing arbitrary streams of bytes encoded as REZI byte slices. Using the typical Write method on Writer will result in writing them as a single REZI-encoded slice of bytes. Those bytes can later be read from via calls to Reader.Read, which will automatically read across multiple encoded slices if needed. This allows both sides to operate without needing to know the "full" length of their data ahead of time, although it should be noted that this is not a particularly efficient use of REZI encoding.

Error Checking

Errors in REZI have specific types that they can be checked against to determine their cause. These errors conform to the errors interface and must be checked by using errors.Is.

As mentioned in that library's documentation, errors should not be checked with simple equality checks. REZI enforces this fully. Non-nil errors that are checked with `==` will never return true.

if err == rezi.Error

The above expression is not simply the non-preferred way of checking an error, but rather is entirely non-functional, as it will always return false. Instead, do:

if errors.Is(err, rezi.Error)

There are several error types defined for checking non-nil errors. Error is the type that all non-nil errors from REZI will match. It may be caused by some other underlying error; again, use errors.Is to check this, even if a non-rezi error is being checked. For instance, to check if an error was caused due to the supplied bytes being shorter than expected, use errors.Is(err, io.ErrUnexpectedEOF).
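The difference between the two styles of check can be seen with the standard library alone. In this sketch, decodeShort is a hypothetical stand-in for a decoding call that ran out of input; it is not a REZI function and merely wraps io.ErrUnexpectedEOF the way a library error might.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// decodeShort stands in for a decoding call that ran out of input. It is a
// hypothetical helper, not part of REZI; it only mimics an error that wraps
// io.ErrUnexpectedEOF.
func decodeShort() error {
	return fmt.Errorf("decode int: %w", io.ErrUnexpectedEOF)
}

// sameValue is the non-functional style of check: a wrapped error is a
// different value, so this comparison is false.
func sameValue(err error) bool {
	return err == io.ErrUnexpectedEOF
}

// isTruncated is the working style of check: errors.Is walks the chain of
// wrapped errors looking for a match.
func isTruncated(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF)
}
```

With err := decodeShort(), sameValue(err) is false while isTruncated(err) is true.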

See the individual functions for a list of error types that returned errors may be checked against.

Supported Data Types

REZI supports all built-in basic Go types: int (as well as all of its unsigned and specific-size varieties), float32, float64, complex64, complex128, string, bool, and any type that implements encoding.BinaryMarshaler or encoding.TextMarshaler (for encoding) or whose pointer type implements encoding.BinaryUnmarshaler or encoding.TextUnmarshaler (for decoding). Implementations of encoding.BinaryUnmarshaler should use Wrapf when encountering an error from a REZI function called from within UnmarshalBinary to supply additional offset information, but this is not strictly required.

Slices, arrays, and maps are supported with some stipulations. Slices and arrays must contain only other supported types (or pointers to them). Maps have the same restrictions on their values, but only maps with a key type of string, int (or any of its unsigned or specific-size varieties), float32, float64, or bool are supported.

Pointers to any supported type are also accepted, including to other pointer types with any number of indirections. The REZI format encodes information on how many levels of indirection are valid, though of course note that it does not have any concept of two different pointer variables pointing to the same data.

All non-struct types whose underlying type is a supported type are themselves supported as well. For example, time.Duration has an underlying type of int64, and is therefore supported in REZI.
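One way to see what "underlying type" means here is to ask the reflect package for a value's kind. The sketch below is only an illustration of the concept and makes no claim about how REZI inspects types internally; Celsius, kindName, and exampleKinds are hypothetical names.

```go
package main

import (
	"reflect"
	"time"
)

// Celsius is a hypothetical user-defined type with an underlying float64.
type Celsius float64

// kindName returns the name of the kind underlying v's type, or "invalid" if
// v is a nil interface value.
func kindName(v any) string {
	if v == nil {
		return "invalid"
	}
	return reflect.TypeOf(v).Kind().String()
}

// exampleKinds reports the kinds underlying time.Duration and Celsius.
func exampleKinds() (duration, celsius string) {
	return kindName(time.Second), kindName(Celsius(21.5))
}
```

Here exampleKinds reports "int64" for time.Duration and "float64" for Celsius, which is why values of both types can be treated like their built-in counterparts.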

Struct types are supported even if they do not implement text or binary marshaling functions, provided all of their exported fields are of a supported type. Both decoding and encoding ignore all unexported fields. If a field is not present in the given bytes during decoding, its original value is left intact, even if it is exported.
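The exported/unexported distinction can be checked with reflection. This sketch lists the fields of a struct that are visible outside its package, using only the standard library; it illustrates which fields would take part in encoding under the rule above and is not taken from REZI's implementation. The helper name exportedFieldNames is an assumption of this example.

```go
package main

import "reflect"

// Player mirrors the README example: two exported fields and one unexported.
type Player struct {
	Name      string
	Classpect string

	echeladder string
}

// exportedFieldNames lists, in declaration order, the fields of a struct
// value that are exported. It returns nil for nil or non-struct values.
func exportedFieldNames(v any) []string {
	if v == nil {
		return nil
	}
	t := reflect.TypeOf(v)
	if t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if f := t.Field(i); f.IsExported() {
			names = append(names, f.Name)
		}
	}
	return names
}
```

For a Player value this returns Name and Classpect only; echeladder is skipped.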

Binary Data Format

REZI uses a binary format for all supported types. Other than bool, which is encoded as a single byte, an encoded value will start with one or more "info" bytes that contain metadata on the value itself. This is typically the length of the full value but may include additional information such as whether the encoded value is a nil pointer.

Note that the info byte does not give information on the type of the encoded value, besides whether it is nil (and still, the type of the nil is not encoded). Types of the encoded values are inferred by the pointer receiver that is passed to Dec(). If a pointer to an int is passed to it, the bytes will be interpreted as an encoded int; likewise, if a pointer to a string is passed to it, the bytes will be interpreted as an encoded string.

The INFO Byte

Layout:

SXNILLLL
|      |
MSB  LSB

The info byte has information coded into its bits represented as SXNILLLL, where each letter from left to right stands for a particular bit from most to least significant.

The bit labeled "S" is the sign bit; when high (1), it indicates that the following integer value is negative.

The "X" bit is the extension flag, and indicates that the next byte is a second info byte with additional information, called the info extension byte. At this time, only encoded string values use this extension byte.

The "N" bit is the explicit nil flag, and when set it indicates that the value is a nil and that there are no following bytes in the encoded value other than any indirection amount indicators.

The "I" bit is the indirection bit, and if set, indicates that the following bytes encode the number of additional indirections of the pointer beyond the initial indirection at which the nil occurs; for instance, a nil *int value is encoded as simply the info byte 0b00100000, but a non-nil **int that points at a nil *int would be encoded with one level of additional indirection and the info byte's I bit would be set.

The "L" bits make up the length of the value. Together, they are a 4-bit unsigned integer that indicates how many of the following bytes are part of the encoded value. If the I bit is set on the info byte, the L bits give the number of bytes that make up the indirection level rather than the actual value.
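The bit layout above translates directly into masks. The following sketch unpacks an INFO byte into its fields and packs it back; the Info type and parseInfo function are names invented for this example and are not part of the REZI API.

```go
package main

// Info holds the fields of an INFO byte laid out as SXNILLLL.
type Info struct {
	Negative    bool  // S: the following integer value is negative
	Extended    bool  // X: an info extension byte follows
	Nil         bool  // N: the value is nil
	Indirection bool  // I: extra pointer indirection levels follow
	Length      uint8 // LLLL: 4-bit unsigned byte count
}

// parseInfo splits an INFO byte into its individual fields.
func parseInfo(b byte) Info {
	return Info{
		Negative:    b&0x80 != 0,
		Extended:    b&0x40 != 0,
		Nil:         b&0x20 != 0,
		Indirection: b&0x10 != 0,
		Length:      b & 0x0f,
	}
}

// Byte packs the fields back into a single INFO byte.
func (in Info) Byte() byte {
	b := in.Length & 0x0f
	if in.Negative {
		b |= 0x80
	}
	if in.Extended {
		b |= 0x40
	}
	if in.Nil {
		b |= 0x20
	}
	if in.Indirection {
		b |= 0x10
	}
	return b
}
```

For example, parseInfo(0b00100000) yields an Info with only Nil set, matching the nil *int case described above.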

The EXT Byte

Layout:

BXUUVVVV
|      |
MSB  LSB

The initial INFO byte may be followed by a second byte, the info extension byte (EXT for short). This encodes additional metadata about the encoded value.

The "B" bit is the byte count flag. If this is set, it explicitly indicates that a count in bytes is given immediately after all extension bytes in the header have been scanned. This count is given as the data bytes of a regularly-encoded int value sans its own header (its header is the one that the EXT byte is a part of). Note that the lack of this flag or the extension byte as a whole does not necessarily indicate that the count is *not* byte-based; an encoded type format that explicitly notes that the count is byte-based without an EXT byte in its layout diagram will be assumed to have a byte-based length.

The "V" bits make up the version field of the extension byte. This indicates the version of encoding of the particular type that is represented, encoded as a 4-bit unsigned integer. If not present (all 0's, or the EXT byte itself is not present), it is assumed to be 1. This version number is purely informative and does not affect decoding in any way.

The "U" bits are unused at this time and are reserved for future use.
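A sketch of reading the EXT byte, under the rules just given: the B flag is the high bit, and a version field of zero is treated as version 1. The Ext type and parseExt function are hypothetical names. The second-highest bit appears in the layout as "X" but is not described in this section, so the sketch leaves it uninterpreted.

```go
package main

// Ext holds the documented fields of an EXT byte laid out as BXUUVVVV.
type Ext struct {
	ByteCount bool  // B: an explicit count in bytes follows the header
	Version   uint8 // VVVV, with an all-zero field normalized to 1
}

// parseExt extracts the B flag and version field. The X and U bits are not
// interpreted here (assumption: they can be ignored for this illustration).
func parseExt(b byte) Ext {
	version := b & 0x0f
	if version == 0 {
		// An absent version is assumed to be 1.
		version = 1
	}
	return Ext{
		ByteCount: b&0x80 != 0,
		Version:   version,
	}
}
```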

Bool Values

Layout:

[ VALUE ]
 1 byte

Boolean values are encoded in REZI as the byte value 0x01 for true, or 0x00 for false. Bool is the only type whose encoded value does not begin with an info byte, although a pointer-to-bool may be encoded with an info byte if it is nil.

Integer Values

Layout:

[ INFO ] [ INT VALUE ]
 1 byte    0..8 bytes

Integer values begin with the info byte. Assuming that it is not nil, the 4 L bits of the info byte give the number of bytes that are in the value itself, and the S bit represents whether the value is negative.

The INT VALUE portion of the integer includes all bytes necessary to rebuild the integer value. It is created by first taking the integer's value expanded to 64-bits, and then removing all leading insignificant bytes (those with a value of 0x00 for positive integers, or those with a value of 0xff for negative integers). These bytes are then used as the INT VALUE.

As a result of the above encoding, certain integer values can be encoded with no bytes in INT VALUE at all; the 64-bit representation for 0 is all 0x00's, and therefore has no significant bytes. Likewise, the 64-bit representation for -1 using two's complement representation is all 0xff's. Both of these are encoded by an INFO byte that gives a length of zero; distinguishing between the two is done via the sign bit in the INFO byte.

All Go integer types are encoded in the same way. This includes int, int8, int16, int32, int64, uint, uint8, uint16, uint32, and uint64. The specific interpretation into a value is handled at decoding time by inferring the type from the pointer passed to Dec.
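The integer rules above can be sketched in a few lines of standard-library Go. This is an illustration written from the description, not REZI's own code: encodeInt and decodeInt are hypothetical names, only int64 is handled, and the sketch assumes the value bytes are written most-significant first, which is what "leading" bytes suggests.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// encodeInt encodes v as an INFO byte followed by its significant bytes.
// Assumption: the value bytes are in big-endian order.
func encodeInt(v int64) []byte {
	var full [8]byte
	binary.BigEndian.PutUint64(full[:], uint64(v))

	// Leading 0x00 bytes are insignificant for non-negative values, and
	// leading 0xff bytes are insignificant for negative ones.
	info, pad := byte(0x00), byte(0x00)
	if v < 0 {
		info, pad = 0x80, 0xff // set the S bit
	}
	start := 0
	for start < len(full) && full[start] == pad {
		start++
	}
	info |= byte(len(full) - start) // L bits: number of value bytes
	return append([]byte{info}, full[start:]...)
}

// decodeInt reverses encodeInt, returning the value and the bytes consumed.
func decodeInt(data []byte) (int64, int, error) {
	if len(data) < 1 {
		return 0, 0, errors.New("missing INFO byte")
	}
	n := int(data[0] & 0x0f)
	if n > 8 || len(data) < 1+n {
		return 0, 0, errors.New("bad or truncated integer")
	}
	var full [8]byte
	if data[0]&0x80 != 0 {
		// Negative: restore the stripped 0xff bytes.
		for i := range full {
			full[i] = 0xff
		}
	}
	copy(full[8-n:], data[1:1+n])
	return int64(binary.BigEndian.Uint64(full[:])), 1 + n, nil
}
```

Under these assumptions, 612 (0x0264) becomes the three bytes 0x02 0x02 0x64, while 0 and -1 become the single bytes 0x00 and 0x80 respectively.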

Float Values

Full Layout:

[ INFO ] [ COMP-EXPONENT-HIGHS ] [ MIXED ] [ MANTISSA-LOWS ]
 1 byte          1 byte            1 byte      0..6 bytes

Short-Form Layout:

[ INFO ]
 1 byte

A non-zero float value is encoded by taking the components of its representation in IEEE-754 double-precision and encoding them across 3 to 9 bytes (including the INFO byte), using compaction where possible. These components are a 1-bit sign, an 11-bit exponent, and a 52-bit fraction (also known as the mantissa). Float values of 0.0 and -0.0 are instead encoded using an abbreviated "short-form" that consists of only a single byte.

All float values begin with an INFO byte. Assuming it does not denote a nil value, the 4 L bits of the info byte give the number of bytes following all header bytes that are used to encode the value, and the S bit represents whether the value is negative, thus encoding the 1-bit sign. If the L bits give a non-zero value, the float value uses the full encoding layout; if the L bits give a zero value, the float value uses the short-form.

An INFO byte in full-form is followed by the COMP-EXPONENT-HIGHS byte. This contains two fields, organized in the byte bits as CEEEEEEE. The first field is a 1-bit flag, denoted by "C", that indicates whether compaction of the mantissa is performed from the right or the left side. If set, it is from the right; if not set, it is from the left. The remaining bits in the byte, denoted by "E", are the 7 high-order bits of the exponent component of the represented value.

The next byte in full-form is a MIXED byte containing two fields, organized in the byte bits as EEEEMMMM. The first field, denoted by "E", contains the 4 lower-order bits of the exponent. The second field, denoted by "MMMM", contains the 4 high-order bits of the mantissa.

After the MIXED byte, the remaining 48 low-order bits of the mantissa are encoded with compaction similar to that performed on integer values, but with some modifications. First, only 0x00 bytes are removed from the representation to compact them; 0xff bytes are never removed, as the mantissa itself is never represented as a two's complement negative value. Second, consecutive 0x00 bytes may be removed from either the left or the right side of those 48 bits, whichever would make it more compact. The "C" bit being set in the COMP-EXPONENT-HIGHS byte indicates that they are removed from the right; otherwise they are removed from the left as in compaction of integer values. If all 48 low-order bits of the mantissa are zero, they will all be compacted and the entire float will take up only the initial three bytes.

Note that the above compaction applies only to the 48 low-order bits of the mantissa; the high 4 bits will always be present in the MIXED byte regardless of their value.

Zero-valued floats, 0.0 and -0.0, are not encoded using the full layout described above, but instead as special cases are encoded in a short-form layout as a single INFO byte whose L bits are all set to 0. 0.0 is encoded as a single 0x00 byte, and -0.0 is encoded as a single 0x80 byte. These are the only values of float that are encoded in short-form; all others use the full form.
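Putting the float layout together, the following is a sketch written purely from the description in this section; it is not REZI's implementation and its output has not been checked against the library. The names encodeFloat and decodeFloat are hypothetical. Two details are assumptions: the 48 low-order mantissa bits are treated as six big-endian bytes, and when both sides would strip the same number of zero bytes, the left side is used (C unset).

```go
package main

import (
	"errors"
	"math"
)

// encodeFloat sketches the full and short-form float layouts.
func encodeFloat(f float64) []byte {
	bits := math.Float64bits(f)
	var info byte
	if bits>>63 != 0 {
		info = 0x80 // S bit carries the sign
	}
	if f == 0 { // true for both 0.0 and -0.0: short form
		return []byte{info}
	}
	exp := (bits >> 52) & 0x7ff
	mant := bits & (1<<52 - 1)

	// 48 low-order mantissa bits as 6 bytes (assumption: big-endian).
	var low [6]byte
	for i := range low {
		low[i] = byte(mant >> (8 * (5 - i)))
	}
	left, right := 0, 0
	for left < 6 && low[left] == 0 {
		left++
	}
	for right < 6 && low[5-right] == 0 {
		right++
	}
	compExp := byte(exp >> 4) // 7 high-order exponent bits
	body := low[left:]
	if right > left { // assumption: ties compact from the left
		compExp |= 0x80 // C flag: zero bytes were dropped from the right
		body = low[:6-right]
	}
	mixed := byte(exp&0x0f)<<4 | byte(mant>>48)
	info |= byte(2 + len(body)) // L bits: bytes after the INFO byte
	return append([]byte{info, compExp, mixed}, body...)
}

// decodeFloat reverses encodeFloat, returning the value and bytes consumed.
func decodeFloat(data []byte) (float64, int, error) {
	if len(data) < 1 {
		return 0, 0, errors.New("missing INFO byte")
	}
	n := int(data[0] & 0x0f)
	bits := uint64(data[0]>>7) << 63
	if n == 0 {
		return math.Float64frombits(bits), 1, nil
	}
	if n < 2 || n > 8 || len(data) < 1+n {
		return 0, 0, errors.New("bad or truncated float")
	}
	compExp, mixed, body := data[1], data[2], data[3:1+n]
	var low [6]byte
	if compExp&0x80 != 0 {
		copy(low[:], body) // zero bytes were dropped from the right
	} else {
		copy(low[6-len(body):], body) // zero bytes were dropped from the left
	}
	exp := uint64(compExp&0x7f)<<4 | uint64(mixed>>4)
	mant := uint64(mixed&0x0f) << 48
	for i, b := range low {
		mant |= uint64(b) << (8 * (5 - i))
	}
	return math.Float64frombits(bits | exp<<52 | mant), 1 + n, nil
}
```

In this sketch a value such as 1.0, whose low 48 mantissa bits are all zero, takes exactly three bytes, while a value like 0.1 with no zero bytes in its mantissa takes the full nine.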

Complex Values

Full Layout:

[ INFO ] [ EXT ] [ INT VALUE ] [ INFO ] [ FLOAT VALUE ] [ INFO ] [ FLOAT VALUE ]
<-----------COUNT------------> <------REAL PART-------> <----IMAGINARY PART---->
         2..10 bytes                  3..9 bytes               3..9 bytes

Short-Form Layout:

[ INFO ]
 1 byte

Complex values are, in general, encoded as an explicit count of bytes given in the header, followed by that many bytes containing first the real component and then the imaginary component in sequence, each encoded as a float value.

As special cases, a complex value with a positive 0.0 real part and positive 0.0 imaginary part is encoded using the short-form layout as only a single info byte with a value of 0x00, and a complex value with a negative 0.0 real part and negative 0.0 imaginary part is encoded as only a single info byte with a value of 0x80. This only applies to values of (+0.0)+(+0.0)i and (-0.0)+(-0.0)i; there is no special case for when both parts are zero but of opposite signs, or for when one part is zero but the other part is not.

String Values

Full Layout:

[ INFO ] [ EXT ] [ INT VALUE ] [ CODEPOINT 1 ] ... [ CODEPOINT N ]
<-----------COUNT------------> <------------CODEPOINTS----------->
         2..10 bytes                       COUNT bytes

Short-Form Layout:

[ INFO ]
 1 byte

String values are encoded as a count of bytes in the info header section followed by the Unicode codepoints that make up the string encoded using UTF-8. Non-empty strings will use the full layout; an empty string will use the abbreviated short-form layout.

A non-empty string value's first info byte will have its extension bit set and will indicate explicitly that it uses a byte-based count in the extension byte that follows. This is to distinguish it from older-style (V0) string encodings, which encoded data length as the count of codepoints rather than bytes.

An empty string is encoded using the short-form layout as a single info byte, 0x00.
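As a rough sketch of the string layout, written only from the description above and not verified against the library: the INFO byte has its X bit set and its L bits give the size of the count, the EXT byte has its B bit set, and the UTF-8 bytes follow the count. The function name encodeString is hypothetical. The sketch assumes the count is written most-significant byte first and leaves the EXT version bits at zero.

```go
package main

// encodeString sketches the full and short-form string layouts.
func encodeString(s string) []byte {
	if s == "" {
		return []byte{0x00} // short form: a single INFO byte
	}

	// The count is the number of UTF-8 bytes (not codepoints), written as
	// integer data bytes with leading zero bytes dropped.
	// Assumption: big-endian order.
	var count []byte
	for n := uint64(len(s)); n > 0; n >>= 8 {
		count = append([]byte{byte(n)}, count...)
	}

	info := byte(0x40) | byte(len(count)) // X flag plus L bits
	ext := byte(0x80)                     // B flag; version bits left at 0 (assumption)

	out := []byte{info, ext}
	out = append(out, count...)
	return append(out, s...)
}
```

Under these assumptions, the six-byte string "Terezi" would be laid out as 0x41 0x80 0x06 followed by its six UTF-8 bytes.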

encoding.BinaryMarshaler Values

Layout:

[ INFO ] [ INT VALUE ] [ MARSHALED BYTES ]
<-------COUNT--------> <-MARSHALED BYTES->
      1..9 bytes           COUNT bytes

Any type that implements encoding.BinaryMarshaler is encoded by taking the result of calling its MarshalBinary() method and prepending it with an integer value giving the number of bytes in it.

encoding.TextMarshaler Values

Layout:

[ INFO ] [ INT VALUE ] [ MARSHALED BYTES ]
<-------COUNT--------> <-MARSHALED BYTES->
      1..9 bytes           COUNT bytes

Any type that implements encoding.TextMarshaler is encoded by taking the result of calling its MarshalText() method and encoding that value as a string.

Note that BinaryMarshaler encoding takes precedence over TextMarshaler encoding; if a type implements both, it will be encoded as a BinaryMarshaler, not a TextMarshaler.

Struct Values

Layout:

[ INFO ] [ INT VALUE ] [ FIELD 1 ] [ VALUE 1 ] ... [ FIELD N ] [ VALUE N ]
<-------COUNT--------> <---------------------VALUES---------------------->
      1..9 bytes                           COUNT bytes

Structs that do not implement binary marshaling or text marshaling functions are encoded as a count of all bytes that make up the entire struct, followed by pairs of the names and associated values for each exported field of the struct. Each pair consists of the case-sensitive name of the field encoded as a string, followed immediately by the encoded value of that field. There is no special delimiter between name-value pairs or between the name and value in a pair; where one ends, the next one begins.

The encoded names are placed in a consistent order; encoding the same struct will result in the same encoding.
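As a rough illustration of how exported field names can be gathered in a repeatable order, the sketch below uses the reflect package and sorts the names alphabetically. The alphabetical order is an assumption made for the example; this document only states that the order is consistent. The `player` type and `exportedFieldNames` function are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// player is a hypothetical struct with a mix of exported and unexported fields.
type player struct {
	Name   string
	Level  int
	secret string // unexported, so it is skipped
	Active bool
}

// exportedFieldNames returns the names of the exported fields of a struct
// value in sorted order. It returns nil if v is not a struct.
// Sorting alphabetically is an assumption for this sketch.
func exportedFieldNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if f := t.Field(i); f.IsExported() {
			names = append(names, f.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(exportedFieldNames(player{})) // [Active Level Name]
}
```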

Slice Values

Layout:

[ INFO ] [ INT VALUE ] [ ITEM 1 ] ... [ ITEM N ]
<-------COUNT--------> <--------VALUES--------->
      1..9 bytes              COUNT bytes

Slices are encoded as a count of bytes that make up the entire slice, followed by the encoded value of each element in the slice. There is no special delimiter between the encoded elements; when one ends, the next one begins.
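The idea of a byte count followed by undelimited elements can be sketched with the standard library. The example below is deliberately simplified and does not reproduce REZI's byte layout: it uses varints from encoding/binary as a stand-in for REZI's own count encoding, and handles only slices of strings. All function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// putCount appends n to dst as a varint. This stands in for REZI's own count
// encoding, which uses a different layout.
func putCount(dst []byte, n int) []byte {
	var buf [binary.MaxVarintLen64]byte
	w := binary.PutUvarint(buf[:], uint64(n))
	return append(dst, buf[:w]...)
}

// encodeStrings writes the total byte count of all elements, then each
// element back to back with no delimiter between them.
func encodeStrings(items []string) []byte {
	var body []byte
	for _, s := range items {
		body = putCount(body, len(s))
		body = append(body, s...)
	}
	return append(putCount(nil, len(body)), body...)
}

// decodeStrings reads one encoded slice from the start of data and returns
// the elements along with the number of bytes consumed.
func decodeStrings(data []byte) ([]string, int, error) {
	total, n := binary.Uvarint(data)
	if n <= 0 || uint64(len(data)-n) < total {
		return nil, 0, errors.New("truncated data")
	}
	body := data[n : n+int(total)]
	var items []string
	for len(body) > 0 {
		size, w := binary.Uvarint(body)
		if w <= 0 || uint64(len(body)-w) < size {
			return nil, 0, errors.New("truncated element")
		}
		items = append(items, string(body[w:w+int(size)]))
		body = body[w+int(size):]
	}
	return items, n + int(total), nil
}

func main() {
	data := encodeStrings([]string{"a", "", "REZI"})
	data = append(data, 0xFF) // a trailing byte that belongs to some later value
	items, used, err := decodeStrings(data)
	fmt.Println(items, used, err)
}
```

Because the outer count says exactly how many bytes belong to the slice, the decoder knows where the slice ends without any terminator, and the trailing byte is left untouched for the next value.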

Array Values

Layout:

(same as slice values)

Arrays are encoded in an identical fashion to slices. They do not record the size of the array type.

Map Values

Layout:

[ INFO ] [ INT VALUE ] [ KEY 1 ] [ VALUE 1 ] ... [ KEY N ] [ VALUE N ]
<-------COUNT--------> <-------------------VALUES-------------------->
      1..9 bytes                         COUNT bytes

Map values are encoded as a count of all bytes that make up the entire map, followed by pairs of the encoded keys and associated values for each element of the map. Each pair consists of the encoded key, followed immediately by the encoded value that the key maps to. There is no special delimiter between key-value pairs or between the key and value in a pair; where one ends, the next one begins.

The encoded keys are placed in a consistent order; encoding the same map will result in the same encoding regardless of the order of keys encountered during iteration over the keys.
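Go randomizes map iteration order, so any deterministic encoding has to impose an order on the keys itself. A common way to do that, shown here as a standard-library sketch, is to collect the keys and sort them before writing. The specific ordering (plain string sort) and the textual `key=value;` output are assumptions for illustration, not REZI's format.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order, independent of the
// order in which the runtime happens to iterate over the map.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// flatten renders m as "key=value;" pairs in sorted key order, so the same
// map contents always produce the same output.
func flatten(m map[string]int) string {
	out := ""
	for _, k := range sortedKeys(m) {
		out += fmt.Sprintf("%s=%d;", k, m[k])
	}
	return out
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	fmt.Println(flatten(m)) // a=1;b=2;c=3;
}
```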

Nil Values

Layout:

[ INFO ] [ INT INFO ] [ INT VALUE ]
<-INFO-> <---EXTRA INDIRECTIONS--->
 1 byte          0..9 bytes

Nil values are encoded similarly to integers, with one major exception: the nil bit in the info byte is set to true. This allows a nil to be stored in the same place as a length count, so when interpreting data, a length count can be checked for nil and if nil, instead of the normal value being decoded, a nil value is decoded.

Nil pointers to a non-pointer type of any kind are encoded as a single info byte with the nil bit set and the indirection bit unset.

Pointers that are themselves not nil but point to another pointer which is nil are encoded slightly differently. In this case, the info byte will have both the nil bit and the indirection bit set, and will then be followed by a normal encoded integer with its own info byte. The encoded integer gives the number of indirections that are done before a nil pointer is arrived at. For instance, a ***int that points to a valid **int that itself points to a valid *int which is nil would be encoded as a nil with an indirection level of 2.
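The indirection count from the ***int example can be reproduced with the reflect package. This sketch only computes the count; it does not produce any REZI bytes, and `nilIndirections` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// nilIndirections follows a chain of pointers starting at v. It reports
// whether a nil pointer was reached and how many dereferences were performed
// before reaching it (or before reaching the non-pointer value).
func nilIndirections(v interface{}) (isNil bool, levels int) {
	rv := reflect.ValueOf(v)
	for rv.IsValid() && rv.Kind() == reflect.Ptr {
		if rv.IsNil() {
			return true, levels
		}
		rv = rv.Elem()
		levels++
	}
	return false, levels
}

func main() {
	var p1 *int // nil pointer
	p2 := &p1   // valid **int pointing to the nil *int
	p3 := &p2   // valid ***int pointing to p2

	fmt.Println(nilIndirections(p1)) // true 0
	fmt.Println(nilIndirections(p3)) // true 2

	x := 413
	px := &x
	ppx := &px
	fmt.Println(nilIndirections(ppx)) // false 2
}
```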

Encoded nil values are not typed; they will be interpreted as the same type as the pointed-to value of the receiver passed to REZI during decoding.

Pointer Values

Layout:

(either encoded value type, or encoded nil)

Pointers do not have their own dedicated encoding format. Instead, the value a pointer points to is encoded as though it were not a pointer type, and when decoding to a pointer, the value is first decoded, then a pointer to the decoded value is created and used as the returned value.

If a pointer is nil, it is instead encoded as a nil value.

Pointers that have multiple levels of indirection before arriving at the pointed-to value are not treated any differently when non-nil; i.e. an **int which points to an *int which points to an int with value 413 would be encoded as an integer value representing 413. If a pointer with multiple levels of indirection has a nil somewhere in the indirection chain, it is encoded as a nil value; see the section on nil value encodings for a description of how this information is captured.

Backward Compatibility

Older versions of the REZI library use a binary data format that differs from the current one. The current version retains compatibility for reading data produced by prior versions of this library, regardless of whether they were major version releases. The binary format outlined above and the changes noted below are all considered a part of "V1" of the binary format itself separate from the version of the Go module.

REZI library versions prior to v1.1.0 indicate nil by giving -1 as the byte count, and can only encode a nil value for slices and maps. This older format is only able to encode a single level of indirection, i.e. a nil pointer-to-type, with no additional indirections. Due to this limitation, decoding these values will result in either a nil pointer or all levels indirected up to the non-nil value; it will never be decoded as, for example, a pointer to a pointer which is then nil. When writing a nil value, REZI sets the sign bit and keeps the length bytes clear in the first INFO header byte; this allows versions prior to v1.1.0 to read it, as long as it has only a single level of indirection.

REZI library versions prior to v2.1.0 encode string data length as the number of Unicode codepoints rather than the number of bytes and do so in the info byte with no info extension byte. These strings can be decoded as normal with Dec and Reader.Dec.


Constants

This section is empty.

Variables

var (
	// Error is a general error returned from encoding and decoding functions.
	// All non-nil errors returned from this package will return true for the
	// expression errors.Is(err, Error).
	Error = errors.New("a problem related to the binary REZI format has occurred")

	// ErrMarshalBinary indicates that calling a MarshalBinary method on a type
	// that was being encoded returned a non-nil error. Any error returned from
	// this package that was caused by this will return true for the expression
	// errors.Is(err, ErrMarshalBinary).
	ErrMarshalBinary = errors.New("MarshalBinary() returned an error")

	// ErrMarshalText indicates that calling a MarshalText method on a type that
	// was being encoded returned a non-nil error. Any error returned from this
	// package that was caused by this will return true for the expression
	// errors.Is(err, ErrMarshalText).
	ErrMarshalText = errors.New("MarshalText() returned an error")

	// ErrUnmarshalBinary indicates that calling an UnmarshalBinary method on a
	// type that was being decoded returned a non-nil error. Any error returned
	// from this package that was caused by this will return true for the
	// expression errors.Is(err, ErrUnmarshalBinary).
	ErrUnmarshalBinary = errors.New("UnmarshalBinary() returned an error")

	// ErrUnmarshalText indicates that calling an UnmarshalText method on a type
	// that was being decoded returned a non-nil error. Any error returned from
	// this package that was caused by this will return true for the expression
	// errors.Is(err, ErrUnmarshalText).
	ErrUnmarshalText = errors.New("UnmarshalText() returned an error")

	// ErrInvalidType indicates that the value to be encoded or decoded to is
	// not of a valid type. Any error returned from this package that was caused
	// by this will return true for the expression
	// errors.Is(err, ErrInvalidType).
	ErrInvalidType = errors.New("data is not the correct type")

	// ErrMalformedData indicates that there is a problem with the data being
	// decoded. Any error returned from this package that was caused by this
	// will return true for the expression errors.Is(err, ErrMalformedData).
	ErrMalformedData = errors.New("data cannot be interpretered")
)

Functions

func Dec

func Dec(data []byte, v interface{}) (n int, err error)

Dec decodes a value from REZI-format bytes in data, starting with the first byte in it. It returns the number of bytes consumed in order to read the complete value. If the data slice was constructed by appending encoded values together, then skipping over n bytes after a successful call to Dec will result in the next call to Dec reading the next subsequent value.

The parameter v must be a non-nil pointer to a type supported by REZI. The type of v is examined to determine how to decode the value. The data itself is not examined for type inference, so v must be a pointer to a compatible type. The value that v points to is only assigned on successful decoding; if this function returns a non-nil error, it will not have been assigned to.

If a problem occurs while decoding, the returned error will be non-nil and will return true for errors.Is(err, rezi.Error). Additionally, the same expression will return true for other error types, depending on the cause of the error. Do not check error types with the equality operator ==; this will always return false.

Non-nil errors from this function can match the following error types:

  • Error in all cases.
  • ErrInvalidType if the type pointed to by v is not supported or if v is a nil pointer.
  • ErrUnmarshalBinary if an implementor of encoding.BinaryUnmarshaler returns an error from its UnmarshalBinary method (additionally, the returned error will match the same types that the error returned from UnmarshalBinary would match).
  • ErrUnmarshalText if an implementor of encoding.TextUnmarshaler returns an error from its UnmarshalText method.
  • io.ErrUnexpectedEOF if there are fewer bytes than necessary to decode the value.
  • ErrMalformedData if there is any problem with the data itself (including there being fewer bytes than necessary to decode the value).
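The reason == is unreliable while errors.Is works is that a single returned error can match several sentinel values at once. The standard-library sketch below shows one way an error type can do that, by implementing an Is method. It is an illustration of the mechanism, not REZI's implementation; the sentinels `errGeneral` and `errMalformed`, the type `matchErr`, and the functions `decodeByte` and `matches` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Hypothetical sentinels standing in for a library's exported error values.
var (
	errGeneral   = errors.New("a problem has occurred")
	errMalformed = errors.New("data cannot be interpreted")
)

// matchErr is a hypothetical error type that matches several sentinels.
type matchErr struct {
	msg     string
	targets []error
}

func (e *matchErr) Error() string { return e.msg }

// Is is consulted by errors.Is; it reports a match for any listed sentinel.
func (e *matchErr) Is(target error) bool {
	for _, t := range e.targets {
		if t == target {
			return true
		}
	}
	return false
}

// decodeByte returns the first byte of data, or an error matching three
// different sentinels if data is empty.
func decodeByte(data []byte) (byte, error) {
	if len(data) == 0 {
		return 0, &matchErr{
			msg:     "unexpected end of data",
			targets: []error{errGeneral, errMalformed, io.ErrUnexpectedEOF},
		}
	}
	return data[0], nil
}

// matches lists which of the known sentinels err matches via errors.Is.
func matches(err error) []string {
	var names []string
	if errors.Is(err, errGeneral) {
		names = append(names, "general")
	}
	if errors.Is(err, errMalformed) {
		names = append(names, "malformed")
	}
	if errors.Is(err, io.ErrUnexpectedEOF) {
		names = append(names, "unexpected EOF")
	}
	return names
}

func main() {
	_, err := decodeByte(nil)
	fmt.Println(matches(err))      // [general malformed unexpected EOF]
	fmt.Println(err == errGeneral) // false: err is a distinct value
}
```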

func Enc

func Enc(v interface{}) (data []byte, err error)

Enc encodes a value to REZI-format bytes. The type of the value is examined to determine how to encode it. No type information is included in the returned bytes, so it is up to the caller to keep track of it and use a receiver of a compatible type when decoding.

If a problem occurs while encoding, the returned error will be non-nil and will return true for errors.Is(err, rezi.Error). Additionally, the same expression will return true for other error types, depending on the cause of the error. Do not check error types with the equality operator ==; this will always return false.

Non-nil errors from this function can match the following error types:

  • Error in all cases.
  • ErrInvalidType if the type of v is not supported.
  • ErrMarshalBinary if an implementor of encoding.BinaryMarshaler returns an error from its MarshalBinary method (additionally, the returned error will match the same types that the error returned from MarshalBinary would match).
  • ErrMarshalText if an implementor of encoding.TextMarshaler returns an error from its MarshalText method.

func MustDec

func MustDec(data []byte, v interface{}) int

MustDec is identical to Dec, but panics if an error would be returned.

func MustEnc

func MustEnc(v interface{}) []byte

MustEnc is identical to Enc, but panics if an error would be returned.

func Wrapf added in v2.1.0

func Wrapf(offset int, format string, reziErr error, a ...interface{}) error

Wrapf takes an offset and applies it to an existing error returned from rezi. It is intended to be used within custom UnmarshalBinary methods to provide the number of bytes into the data at which the problem occurred, for error reporting.

The offset is applied to the given error, which must be a rezi error. The first argument to the format string is the error, which will be wrapped by an error that adds the supplied offset. If the error is not an error returned from rezi, this function will panic.

Use it like this:

n, err = rezi.Dec(dataBytes[curPos:], &dest)
if err != nil {
  return rezi.Wrapf(curPos, "problem occurred: %v", err)
}

This is generally only intended to be used with errors returned from decoding, but it can be used to supply an offset for encoding errors as well, should it be desired.

Do not use "%w" to wrap the error; it will automatically be wrapped, so use "%v" instead. Using "%w" will make this function panic.

Types

type Format added in v2.1.0

type Format struct {
	// Version is the version of the Format used. At this time only data format
	// V1 exists.
	//
	// As a special case, a Version value of 0 is interpreted as data format V1;
	// all other values are interpreted as that exact data format version.
	//
	// A Version value of -1 is interpreted as auto-detected data format. This
	// can only be used to detect formats in data written in formats after V1.
	Version int

	// Compression is whether compression is enabled.
	Compression bool

	// CompressionLevel is the level of compression to use for writing, as
	// specified by constants from the zlib package. If not given,
	// zlib.DefaultCompression is used.
	//
	// This property is used only by NewWriter and is ignored by NewReader.
	CompressionLevel int
}

Format is a specification of a binary data format used by REZI. It specifies how data should be laid out and contains any options needed to do so.

A nil or empty Format can be passed to functions which use it, and will be interpreted as a version 1 Format with no compression.
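Since CompressionLevel takes its values from the zlib package, it may help to see those constants used directly. The sketch below is a plain compress/zlib round trip and does not involve REZI at all; `compress` and `decompress` are hypothetical helpers. It also shows that zlib rejects a level outside its supported range when the writer is opened.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// compress deflates data with zlib at the given level, such as
// zlib.DefaultCompression, zlib.BestSpeed, or zlib.BestCompression.
func compress(data []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevel(&buf, level)
	if err != nil {
		return nil, err // invalid level
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any buffered output; skipping it truncates the stream.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress inflates zlib-compressed data.
func decompress(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	original := bytes.Repeat([]byte("REZI "), 100)

	packed, err := compress(original, zlib.DefaultCompression)
	if err != nil {
		panic(err)
	}
	unpacked, err := decompress(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(packed) < len(original), bytes.Equal(original, unpacked))

	// A level outside the range zlib supports is rejected up front.
	_, err = compress(original, 42)
	fmt.Println(err != nil)
}
```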

type Reader added in v2.1.0

type Reader struct {
	// contains filtered or unexported fields
}

Reader is an io.ReadCloser that reads from REZI data streams. A Reader may be opened in compression mode or normal mode; compression mode can only read streams written by a Writer in compression mode.

The zero-value is a Reader ready to read REZI data streams in the default V1 data format.

func NewReader added in v2.1.0

func NewReader(r io.Reader, f *Format) (*Reader, error)

NewReader creates a new Reader ready to read data from r. If Compression is enabled in the supplied Format, it will interpret compressed data returned from r.

If f is nil or points to the zero-value of Format, the default format of V1 with compression disabled is selected, compatible for reading all written data that did not specify a Format (including in older releases of REZI). This function will make a copy of the Format pointed to; changes to it from outside this function will not be reflected in the returned Reader.

This function returns a non-nil error only in cases where compression is selected via the format and an error occurs when opening a zlib reader on r.

It is the caller's responsibility to call Close on the returned reader when done.

func (*Reader) Close added in v2.1.0

func (r *Reader) Close() error

Close frees any resources needed from opening the Reader.

func (*Reader) Dec added in v2.1.0

func (r *Reader) Dec(v interface{}) (err error)

Dec decodes REZI-encoded bytes in r at the current position into the supplied value v, then advances the data stream past those bytes.

Parameter v must be a pointer to a type supported by REZI.

func (*Reader) Format added in v2.1.0

func (r *Reader) Format() Format

Format returns the Format that r interprets data as.

func (*Reader) Offset added in v2.1.0

func (r *Reader) Offset() int

Offset returns the current number of bytes that the Reader has interpreted as REZI encoded bytes from the stream. Note that if compression is enabled, this refers to the number of uncompressed data bytes interpreted, regardless of how many actual bytes are read from the underlying reader provided to r at construction.

func (*Reader) Read added in v2.1.0

func (r *Reader) Read(p []byte) (n int, err error)

Read reads up to len(p) bytes from one or more REZI-encoded byte slices present in sequence in the data stream and places them into p. Returns the number of valid bytes read into p.

Read requires the underlying data stream at the current position to consist only of one or more REZI-encoded byte slices. Attempting to read more bytes than the current byte slice has will cause more slices to be read from the underlying reader until either p can be filled or the end of the stream is reached. If the last slice read in this fashion is not completely used by p, i.e. if p does not have enough room to hold the complete slice, then the remaining bytes decoded are buffered, and the next call to Read will begin filling its p with those bytes before reading another slice from the underlying reader.

Note that the number of bytes read into p (returned as n) is almost certainly less than the total number of bytes read from the underlying data stream; to capture this, call Offset before and after calling Read and check the difference between them.

Returns io.EOF only in non-error circumstances. It is possible for n > 0 when err is non-nil and even when err is not io.EOF. All errors besides io.EOF will be wrapped in a special error type from the rezi package; use errors.Is to compare the returned error.

If len(p) is greater than the total number of bytes available, but every byte that *is* available is organized as valid REZI-encoded byte slices, err will be io.EOF and n will be the number of bytes that could be read.
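The buffering behavior described for Read can be sketched with a small standard-library reader over length-prefixed chunks. This is an illustration of the technique only: the 4-byte big-endian header is a stand-in for REZI's actual byte-slice encoding, and `chunkReader`, `newChunkReader`, and `chunk` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// chunkReader reads a stream of length-prefixed chunks and presents their
// payloads as one continuous byte stream.
type chunkReader struct {
	src  io.Reader
	rest []byte // decoded bytes not yet handed to a caller
}

func newChunkReader(data []byte) *chunkReader {
	return &chunkReader{src: bytes.NewReader(data)}
}

// Read fills p from buffered bytes first, then reads further chunks until p
// is full or the stream ends.
func (c *chunkReader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if len(c.rest) == 0 {
			var hdr [4]byte
			if _, err := io.ReadFull(c.src, hdr[:]); err != nil {
				return n, err // io.EOF if the stream ended on a chunk boundary
			}
			chunk := make([]byte, binary.BigEndian.Uint32(hdr[:]))
			if _, err := io.ReadFull(c.src, chunk); err != nil {
				if err == io.EOF {
					err = io.ErrUnexpectedEOF // header promised more bytes
				}
				return n, err
			}
			c.rest = chunk
		}
		copied := copy(p[n:], c.rest)
		c.rest = c.rest[copied:] // keep whatever did not fit for the next call
		n += copied
	}
	return n, nil
}

// chunk builds one length-prefixed chunk for the demonstration stream.
func chunk(payload string) []byte {
	hdr := make([]byte, 4)
	binary.BigEndian.PutUint32(hdr, uint32(len(payload)))
	return append(hdr, payload...)
}

func main() {
	r := newChunkReader(append(chunk("hello"), chunk("world")...))
	p := make([]byte, 7)

	n, err := r.Read(p)
	fmt.Printf("%q %v\n", p[:n], err) // "hellowo" <nil>

	n, err = r.Read(p)
	fmt.Printf("%q %v\n", p[:n], err) // "rld" EOF
}
```

The first call spans two chunks and leaves three bytes buffered; the second call drains the buffer, finds no further chunk, and returns those bytes together with io.EOF.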

type Writer added in v2.1.0

type Writer struct {
	// contains filtered or unexported fields
}

Writer is an io.WriteCloser that writes REZI data streams. A Writer may be opened in compression mode or normal mode; bytes written in compression can only be read by a Reader in compression mode.

The zero-value is a Writer ready to write REZI data streams in the default V1 data format.

func NewWriter added in v2.1.0

func NewWriter(w io.Writer, f *Format) (*Writer, error)

NewWriter creates a new Writer ready to write data to w. If Compression is enabled in the supplied Format, it will write compressed REZI-encoded data to w.

If f is nil or points to the zero-value of Format, the default format of V1 with compression disabled is selected, compatible for writing data that can be read by routines which do not specify a Format (including those in older releases of REZI). This function will make a copy of the Format pointed to; changes to it from outside this function will not be reflected in the returned Writer.

This function returns a non-nil error only in cases where compression is selected via the format and an error occurs when opening a zlib writer on w.

It is the caller's responsibility to call Close on the returned Writer when done. Writes may be buffered and not flushed until Close.

func (*Writer) Close added in v2.1.0

func (w *Writer) Close() error

Close flushes any pending bytes to the underlying stream and frees any resources created from opening the Writer.

func (*Writer) Enc added in v2.1.0

func (w *Writer) Enc(v interface{}) error

Enc writes REZI-encoded bytes to w. The encoded bytes are not necessarily flushed until the Writer is closed or explicitly flushed.

Parameter v must be a type supported by REZI.

func (*Writer) Flush added in v2.1.0

func (w *Writer) Flush() error

Flush writes any pending data to the underlying data stream.

func (*Writer) Format added in v2.1.0

func (w *Writer) Format() Format

Format returns the Format that w encodes data as.

func (*Writer) Write added in v2.1.0

func (w *Writer) Write(p []byte) (n int, err error)

Write writes the given bytes as a single slice of REZI-encoded bytes to the underlying data stream. Any number of bytes written in this function across multiple calls to Write can be read by Reader.Read in any arbitrary order; this makes it so that the length does not need to be known ahead of time on either side, at the cost of data space.

If the Writer was opened with compression enabled, the written bytes are not necessarily flushed until the Writer is closed or explicitly flushed.

Written byte slices use an explicit header; this will result in corrupted data if n is ever < len(p). At this time, n is not a reliable indicator of the number of bytes from p that were written when err != nil, but rather the total number written to the stream. When err == nil, n will be equal to len(p).
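The space cost of giving every Write call its own header can be seen with a simplified standard-library writer. As with any sketch here, the 4-byte big-endian header is an assumption standing in for REZI's real byte-slice header, and `chunkWriter` and `streamSize` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// chunkWriter writes each call's bytes as one length-prefixed chunk.
type chunkWriter struct {
	dst io.Writer
}

// Write emits a 4-byte length header followed by p. The returned count covers
// only the payload bytes from p, not the header.
func (c *chunkWriter) Write(p []byte) (int, error) {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(p)))
	if _, err := c.dst.Write(hdr[:]); err != nil {
		return 0, err
	}
	return c.dst.Write(p)
}

// streamSize writes each part with a separate Write call and returns the
// total number of bytes that end up in the stream.
func streamSize(parts ...string) int {
	var buf bytes.Buffer
	w := &chunkWriter{dst: &buf}
	for _, part := range parts {
		if _, err := w.Write([]byte(part)); err != nil {
			panic(err)
		}
	}
	return buf.Len()
}

func main() {
	fmt.Println(streamSize("helloworld"))     // 14: one header, ten payload bytes
	fmt.Println(streamSize("hello", "world")) // 18: two headers, ten payload bytes
}
```

The same ten payload bytes occupy more space when split across more calls, which is the trade-off for not having to know the total length in advance.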
