mus

package module
v0.7.0
Published: Apr 9, 2025 License: MIT Imports: 1 Imported by: 7

README

mus-go Serializer

mus-go is a MUS format serializer. However, due to its minimalist design and a wide range of serialization primitives, it can also be used to implement other binary serialization formats (here is an example where mus-go is utilized to implement Protobuf encoding).

To get started quickly, go to the code generator page.

Why mus-go?

It is lightning fast, space efficient and well tested.

Description

  • Has a streaming version.
  • Can run on both 32 and 64-bit systems.
  • Variable-length data types (like string, array, slice, or map) are encoded as length + data. You can choose the binary representation for both of these parts.
  • Supports data versioning.
  • Deserialization may fail with one of the following errors: ErrOverflow, ErrNegativeLength, ErrTooSmallByteSlice, ErrWrongFormat.
  • Can validate and skip data while unmarshalling.
  • Supports pointers.
  • Can encode data structures such as graphs or linked lists.
  • Supports oneof feature.
  • Supports private fields.
  • Supports out-of-order deserialization.
  • Supports zero allocation deserialization.

cmd-stream-go

cmd-stream-go lets you execute commands on a server. cmd-stream-go/MUS is about 3 times faster than gRPC/Protobuf.

musgen-go

Writing mus-go code manually can be tedious and error-prone. A better approach is to use a code generator, which is also easy to use: just provide a type and call Generate().

Benchmarks

Why create another set of benchmarks? The existing ones have some notable issues: run them several times and you will likely get inconsistent results, which makes it difficult to determine which serializer is truly faster. That was one of the reasons; beyond that, I made them mainly for my own use.

How To

With mus-go, to make a type serializable, you need to implement the Serializer interface:


import "github.com/mus-format/mus-go"

// YourTypeMUS is a MUS serializer for YourType.
var YourTypeMUS = yourTypeMUS{}

// yourTypeMUS implements the mus.Serializer interface.
type yourTypeMUS struct{}

func (s yourTypeMUS) Marshal(v YourType, bs []byte) (n int)              {...}
func (s yourTypeMUS) Unmarshal(bs []byte) (v YourType, n int, err error) {...}
func (s yourTypeMUS) Size(v YourType) (size int)                         {...}
func (s yourTypeMUS) Skip(bs []byte) (n int, err error)                  {...}

And then use it like this:

var (
  value YourType = ...
  size = YourTypeMUS.Size(value) // The number of bytes required to serialize the value.
  bs = make([]byte, size)
)

n := YourTypeMUS.Marshal(value, bs) // Returns the number of used bytes.
value, n, err := YourTypeMUS.Unmarshal(bs) // Returns the value, the number of 
// used bytes and any error encountered.

// Instead of unmarshalling, the value can be skipped. Note the plain
// assignment: n and err are already declared above.
n, err = YourTypeMUS.Skip(bs)

Packages

mus-go offers several encoding options, each of which is in a separate package.

varint

Contains Varint serializers for all uint (uint64, uint32, uint16, uint8, uint), int, float, and byte data types. Example:

package main

import "github.com/mus-format/mus-go/varint"

func main() {
  var (
    num  = 100
    size = varint.Int.Size(num)
    bs = make([]byte, size)
  )
  n := varint.Int.Marshal(num, bs)
  num, n, err := varint.Int.Unmarshal(bs)
  // ...
}

Also includes the PositiveInt serializer (Varint without ZigZag) for positive int values. It can handle negative values as well, but with lower performance.
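The size trade-off between the two Varint flavours can be illustrated with the standard library alone. The sketch below does not use mus-go: it relies on encoding/binary, where PutVarint applies ZigZag and PutUvarint does not, and it assumes that a negative value handled without ZigZag is simply reinterpreted as an unsigned integer. The helper names zigZagSize and plainSize are made up for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// zigZagSize returns the number of bytes v occupies when encoded as a
// ZigZag Varint (binary.PutVarint applies ZigZag internally).
func zigZagSize(v int64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutVarint(buf, v)
}

// plainSize returns the number of bytes v occupies when its bits are
// reinterpreted as unsigned and encoded as a Varint without ZigZag.
func plainSize(v int64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, uint64(v))
}

func main() {
	for _, v := range []int64{100, -1} {
		fmt.Printf("%d: zigzag=%d bytes, plain=%d bytes\n",
			v, zigZagSize(v), plainSize(v))
	}
}
```

Under these assumptions, 100 takes one byte without ZigZag and two with it, while -1 takes one byte with ZigZag and ten without, which suggests why a ZigZag-free encoding suits values known to be positive.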

raw

Contains Raw serializers for the byte, uint, int, float, and time.Time data types. Example:

package main

import "github.com/mus-format/mus-go/raw"

func main() {
  var (
    num = 100
    size = raw.Int.Size(num)
    bs  = make([]byte, size)
  )
  n := raw.Int.Marshal(num, bs)
  num, n, err := raw.Int.Unmarshal(bs)
  // ...
}

More details about Varint and Raw encodings can be found in the MUS format specification. If in doubt, use Varint.
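As a rough illustration of the difference, a fixed-width encoding always uses the same number of bytes, while a Varint grows with the magnitude of the value. The sketch below uses only encoding/binary, not mus-go; the little-endian byte order and the helper names marshalRaw and varintSize are assumptions made for this example, so consult the MUS specification for the actual layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rawSize is the fixed number of bytes used for a uint64 in this sketch.
const rawSize = 8

// marshalRaw writes v into bs using a fixed 8-byte layout and returns the
// number of bytes written. Little-endian order is an assumption here.
func marshalRaw(v uint64, bs []byte) int {
	binary.LittleEndian.PutUint64(bs, v)
	return rawSize
}

// varintSize returns the number of bytes v occupies as a Varint.
func varintSize(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

func main() {
	bs := make([]byte, rawSize)
	for _, v := range []uint64{100, 1 << 62} {
		fmt.Printf("%d: raw=%d bytes, varint=%d bytes\n",
			v, marshalRaw(v, bs), varintSize(v))
	}
}
```

Small values are more compact as Varints (one byte for 100), whereas very large values can exceed the fixed width (nine bytes for 1 << 62).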

For time.Time, there are several serializers:

  • TimeUnix – encodes a value as a Unix timestamp in seconds.
  • TimeUnixMilli – encodes a value as a Unix timestamp in milliseconds.
  • TimeUnixMicro – encodes a value as a Unix timestamp in microseconds.
  • TimeUnixNano – encodes a value as a Unix timestamp in nanoseconds.

To ensure the deserialized value is in UTC, make sure your TZ environment variable is set to UTC. This can be done as follows:

os.Setenv("TZ", "")

Alternatively, you can use one of the corresponding UTC serializers, e.g., TimeUnixUTC, TimeUnixMilliUTC, etc.
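The choice of serializer determines how much sub-second precision survives a round trip. The following sketch uses only the standard time package, not the mus-go serializers themselves; the helper names (viaSeconds, viaMillis, viaMicros, viaNanos) and the sample value are hypothetical. Each helper reduces a time.Time to a Unix timestamp at one granularity and rebuilds it in UTC.

```go
package main

import (
	"fmt"
	"time"
)

// sample is an arbitrary instant with nanosecond precision.
var sample = time.Date(2025, time.April, 9, 12, 30, 45, 123456789, time.UTC)

// viaSeconds round-trips t through a Unix timestamp in seconds.
func viaSeconds(t time.Time) time.Time { return time.Unix(t.Unix(), 0).UTC() }

// viaMillis round-trips t through a Unix timestamp in milliseconds.
func viaMillis(t time.Time) time.Time { return time.UnixMilli(t.UnixMilli()).UTC() }

// viaMicros round-trips t through a Unix timestamp in microseconds.
func viaMicros(t time.Time) time.Time { return time.UnixMicro(t.UnixMicro()).UTC() }

// viaNanos round-trips t through a Unix timestamp in nanoseconds.
func viaNanos(t time.Time) time.Time { return time.Unix(0, t.UnixNano()).UTC() }

func main() {
	fmt.Println(viaSeconds(sample).Nanosecond()) // 0
	fmt.Println(viaMillis(sample).Nanosecond())  // 123000000
	fmt.Println(viaMicros(sample).Nanosecond())  // 123456000
	fmt.Println(viaNanos(sample).Nanosecond())   // 123456789
}
```

Without the trailing UTC() call, time.Unix and its siblings return a value in the local time zone, which is the behaviour the TZ setting and the UTC serializers address.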

ord (ordinary)

Contains serializers/constructors for bool, string, array, byte slice, slice, map, and pointer types.

Variable-length data types (such as string, array, slice, or map) are encoded as length + data. You can choose the binary representation for both parts. By default, the length is encoded using a Varint without ZigZag (varint.PositiveInt). In this case, the maximum length is limited by the maximum value of the int type on your system. This works well across different architectures - for example, an attempt to unmarshal a string that is too long on a 32-bit system will result in an ErrOverflow.
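The length + data layout can be sketched with the standard library. This is not the ord package's code: the function names and the two error variables below are stand-ins defined locally, and the length is written with binary.PutUvarint as an approximation of a Varint without ZigZag. The overflow check mirrors the idea that a length must fit into the platform's int.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Local stand-ins for the errors named in the text.
var (
	errTooSmallByteSlice = errors.New("too small byte slice")
	errOverflow          = errors.New("overflow")
)

// sizeString returns the number of bytes needed to encode s.
func sizeString(s string) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, uint64(len(s))) + len(s)
}

// marshalString writes the length of s followed by its bytes. It assumes bs
// holds at least sizeString(s) bytes.
func marshalString(s string, bs []byte) int {
	n := binary.PutUvarint(bs, uint64(len(s)))
	return n + copy(bs[n:], s)
}

// unmarshalString reads the length, checks that it fits into int and into
// the remaining bytes, and only then builds the string.
func unmarshalString(bs []byte) (s string, n int, err error) {
	length, n := binary.Uvarint(bs)
	if n == 0 {
		return "", 0, errTooSmallByteSlice
	}
	if n < 0 {
		return "", 0, errOverflow // the length does not fit into 64 bits
	}
	if length > math.MaxInt {
		return "", n, errOverflow // the length does not fit into int
	}
	if uint64(len(bs)-n) < length {
		return "", n, errTooSmallByteSlice
	}
	end := n + int(length)
	return string(bs[n:end]), end, nil
}

func main() {
	bs := make([]byte, sizeString("hello"))
	marshalString("hello", bs)

	s, n, err := unmarshalString(bs)
	fmt.Println(s, n, err)

	// A truncated input is rejected instead of being read past its end.
	_, _, err = unmarshalString(bs[:3])
	fmt.Println(err)
}
```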

For array, slice, and map types, there are only constructors available to create a concrete serializer.

Array

Unfortunately, Go does not support generic parameterization of array sizes. As a result, the array serializer constructor looks like this:

package main

import (
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
  arrops "github.com/mus-format/mus-go/options/array"
)

func main() {
  var (
    // The first type parameter of the NewArraySer function represents the array
    // type, and the second - the type of the array’s elements.
    //
    // As for the function parameters, varint.Int specifies the serializer for 
    // the array’s elements.
    ser = ord.NewArraySer[[3]int, int](varint.Int)

    // To create an array serializer with the specific length serializer use:
    // ser = ord.NewArraySer[[3]int, int](varint.Int, arrops.WithLenSer(lenSer))

    arr  = [3]int{1, 2, 3}
    size = ser.Size(arr)
    bs   = make([]byte, size)
  )
  n := ser.Marshal(arr, bs)
  arr, n, err := ser.Unmarshal(bs)
  // ...
}

Slice

package main

import (
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
  slops "github.com/mus-format/mus-go/options/slice"
)

func main() {
  var (
    // varint.Int specifies the serializer for the slice's elements.
    ser = ord.NewSliceSer[int](varint.Int)

    // To create a slice serializer with the specific length serializer use:
    // ser = ord.NewSliceSer[int](varint.Int, slops.WithLenSer(lenSer))

    sl = []int{1, 2, 3}
    size = ser.Size(sl)
    bs = make([]byte, size)
  )
  n := ser.Marshal(sl, bs)
  sl, n, err := ser.Unmarshal(bs)
  // ...
}

Map

package main

import (
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
  mapops "github.com/mus-format/mus-go/options/map"
)

func main() {
  var (
    // varint.Int specifies the serializer for the map’s keys, and ord.String -
    // the serializer for the map’s values.
    ser = ord.NewMapSer[int, string](varint.Int, ord.String)

    // To create a map serializer with the specific length serializer use:
    // ser = ord.NewMapSer[int, string](varint.Int, ord.String, mapops.WithLenSer(lenSer))

    m    = map[int]string{1: "one", 2: "two", 3: "three"}
    size = ser.Size(m)
    bs   = make([]byte, size)
  )
  n := ser.Marshal(m, bs)
  m, n, err := ser.Unmarshal(bs)
  // ...
}

unsafe

The unsafe package provides maximum performance, but be careful - it uses an unsafe type conversion. This warning largely applies to the string type because modifying the byte slice after unmarshalling will also change the string’s contents. Here is an example that demonstrates this behavior more clearly.

Provides serializers for the following data types: byte, bool, string, byte slice, time.Time and all uint, int, float.
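The aliasing effect described above can be reproduced without mus-go. The sketch below is an assumption about the general technique rather than the package's actual code: it uses unsafe.String and unsafe.SliceData (available since Go 1.20) to build a string that shares memory with a byte slice, and the helper name unsafeString is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeString returns a string that shares memory with bs instead of
// copying it, so no allocation is needed.
func unsafeString(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

func main() {
	bs := []byte("hello")

	safe := string(bs)       // copies the bytes
	fast := unsafeString(bs) // aliases the bytes

	// Modifying the slice afterwards leaks into the aliased string.
	bs[0] = 'j'

	fmt.Println(safe) // hello
	fmt.Println(fast) // jello
}
```

In practice this means the byte slice should not be reused or modified while strings produced this way are still in use.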

pm (pointer mapping)

Let's consider two pointers initialized with the same value:

var (
  str = "hello world"
  ptr = &str

  ptr1 *string = ptr
  ptr2 *string = ptr
)

The pm package preserves pointer equality after unmarshalling (ptr1 == ptr2), while the ord package does not. This capability enables the serialization of data structures like graphs or linked lists. You can find corresponding examples in mus-examples-go.
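One way to picture pointer mapping is the sketch below. It is not the pm package's implementation or wire format: it is a minimal illustration, using only encoding/binary, in which every distinct pointer gets an ID and the pointed-to value is written only on first sight. The functions marshalPtrs and unmarshalPtrs are hypothetical, and the sketch assumes non-nil pointers and well-formed input, so error handling is omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// marshalPtrs writes the number of pointers, then each pointer as an ID.
// The string behind a pointer is written only the first time it is seen.
func marshalPtrs(ptrs []*string) []byte {
	bs := binary.AppendUvarint(nil, uint64(len(ptrs)))
	ids := map[*string]uint64{}
	for _, p := range ptrs {
		id, seen := ids[p]
		if !seen {
			id = uint64(len(ids))
			ids[p] = id
		}
		bs = binary.AppendUvarint(bs, id)
		if !seen {
			bs = binary.AppendUvarint(bs, uint64(len(*p)))
			bs = append(bs, *p...)
		}
	}
	return bs
}

// unmarshalPtrs rebuilds the pointers, reusing one pointer per ID so that
// pointers that were equal before marshalling are equal again.
func unmarshalPtrs(bs []byte) []*string {
	count, n := binary.Uvarint(bs)
	ptrs := make([]*string, 0, count)
	byID := map[uint64]*string{}
	for i := uint64(0); i < count; i++ {
		id, m := binary.Uvarint(bs[n:])
		n += m
		p, seen := byID[id]
		if !seen {
			length, m := binary.Uvarint(bs[n:])
			n += m
			s := string(bs[n : n+int(length)])
			n += int(length)
			p = &s
			byID[id] = p
		}
		ptrs = append(ptrs, p)
	}
	return ptrs
}

func main() {
	str := "hello world"
	ptr := &str

	out := unmarshalPtrs(marshalPtrs([]*string{ptr, ptr}))
	fmt.Println(out[0] == out[1], *out[0]) // true hello world
}
```

The same idea extends to graphs and linked lists, where a node reached twice must decode to a single object rather than two copies.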

Structs Support

mus-go doesn’t support structural data types out of the box, which means you’ll need to implement the mus.Serializer interface yourself. But that’s not difficult at all. For example:

package main

import (
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
)

// We will implement the FooMUS serializer for this struct.
type Foo struct {
  str string
  sl  []int
}

// Serializers.
var (
  FooMUS = fooMUS{}

  // IntSliceMUS is used by the FooMUS serializer.
  IntSliceMUS = ord.NewSliceSer[int](varint.Int)
)

// fooMUS implements the mus.Serializer interface.
type fooMUS struct{}

func (s fooMUS) Marshal(v Foo, bs []byte) (n int) {
  n = ord.String.Marshal(v.str, bs)
  return n + IntSliceMUS.Marshal(v.sl, bs[n:])
}

func (s fooMUS) Unmarshal(bs []byte) (v Foo, n int, err error) {
  v.str, n, err = ord.String.Unmarshal(bs)
  if err != nil {
    return
  }
  var n1 int
  v.sl, n1, err = IntSliceMUS.Unmarshal(bs[n:])
  n += n1
  return
}

func (s fooMUS) Size(v Foo) (size int) {
  size += ord.String.Size(v.str)
  return size + IntSliceMUS.Size(v.sl)
}

func (s fooMUS) Skip(bs []byte) (n int, err error) {
  n, err = ord.String.Skip(bs)
  if err != nil {
    return
  }
  var n1 int
  n1, err = IntSliceMUS.Skip(bs[n:])
  n += n1
  return
}

All you have to do is deconstruct the structure into simpler data types and choose the desired encoding for each. Of course, this requires some effort. But first, the code can be generated; second, this approach provides greater flexibility; and third, mus-go stays quite simple, making it easy to implement in other programming languages.

DTS (Data Type metadata Support)

mus-dts-go enables typed data serialization using DTM.

Data Versioning

mus-dts-go can be used to implement data versioning. Here is an example.

MarshallerMUS Interface and MarshalMUS Function

It is often convenient to use the MarshallerMUS interface:

type MarshallerMUS interface {
  MarshalMUS(bs []byte) (n int)
  SizeMUS() (size int)
}

and the MarshalMUS function:

func MarshalMUS(v MarshallerMUS) (bs []byte) {
  bs = make([]byte, v.SizeMUS())
  v.MarshalMUS(bs)
  return
}

// Foo implements the MarshallerMUS interface.
type Foo struct {...}
...

func main() {
  // Foo can now be marshalled with a single function call.
  bs := MarshalMUS(Foo{...})
  // ...
}

They are already defined in the ext-mus-go module, which also includes the MarshallerTypedMUS interface and the MarshalTypedMUS function for typed data serialization (DTM + data).

The full code that uses the MarshalMUS function can be found here.

Interface Serialization (oneof feature)

mus-dts-go will also help to create a serializer for an interface. Example:

import (
  "fmt"
  "reflect"

  dts "github.com/mus-format/mus-dts-go"
  ext "github.com/mus-format/ext-mus-go"
)

// Interface to serialize.
type Instruction interface {...}

// Copy implements the Instruction and ext.MarshallerTypedMUS interfaces.
type Copy struct {...}

// MarshalTypedMUS uses CopyDTS.
func (c Copy) MarshalTypedMUS(bs []byte) (n int) {
  return CopyDTS.Marshal(c, bs)
}

// SizeTypedMUS uses CopyDTS.
func (c Copy) SizeTypedMUS() (size int) {
  return CopyDTS.Size(c)
}

// Insert implements the Instruction and ext.MarshallerTypedMUS interfaces.
type Insert struct {...}

// ...

// instructionMUS implements the mus.Serializer interface.
type instructionMUS struct {}

func (s instructionMUS) Marshal(i Instruction, bs []byte) (n int) {
  if m, ok := i.(ext.MarshallerTypedMUS); ok {
    return m.MarshalTypedMUS(bs)
  }
  panic(fmt.Sprintf("%v doesn't implement ext.MarshallerTypedMUS interface", 
    reflect.TypeOf(i)))
}

func (s instructionMUS) Unmarshal(bs []byte) (i Instruction, n int, err error) {
  dtm, n, err := dts.DTMSer.Unmarshal(bs)
  if err != nil {
    return
  }
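  // n1 is the number of bytes used by the data that follows the DTM.
  var n1 int
  switch dtm {
  case CopyDTM:
    i, n1, err = CopyDTS.UnmarshalData(bs[n:])
  case InsertDTM:
    i, n1, err = InsertDTS.UnmarshalData(bs[n:])
  default:
    err = ErrUnexpectedDTM
    return
  }
  // Report the total number of used bytes: DTM + data.
  n += n1
  return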
}

func (s instructionMUS) Size(i Instruction) (size int) {
  if m, ok := i.(ext.MarshallerTypedMUS); ok {
    return m.SizeTypedMUS()
  }
  panic(fmt.Sprintf("%v doesn't implement ext.MarshallerTypedMUS interface", 
    reflect.TypeOf(i)))
}

A full example can be found at mus-examples-go.

Validation

Validation is performed during unmarshalling. A validator is just a function with the signature func (value Type) error, where Type is the type of the value being validated.

String

The ord.NewValidStringSer constructor creates a string serializer with a length validator.

package main

import (
  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go/ord"
  strops "github.com/mus-format/mus-go/options/string"
)

func main() {
  var (
    // Length validator.
    lenVl = func(length int) (err error) {
      if length > 3 {
        err = com.ErrTooLargeLength
      }
      return
    }
    ser = ord.NewValidStringSer(strops.WithLenValidator(com.ValidatorFn[int](lenVl)))

    // To create a valid string serializer with the specific length serializer
    // use:
    // ser = ord.NewValidStringSer(strops.WithLenSer(lenSer), ...)

    value = "hello world"
    size  = ser.Size(value)
    bs    = make([]byte, size)
  )
  n := ser.Marshal(value, bs)
  // Unmarshalling stops when a validator returns an error. As a result, in
  // this case, we will receive a length validation error.
  value, n, err := ser.Unmarshal(bs)
  // ...
}

Slice

The ord.NewValidSliceSer constructor creates a valid slice serializer with length and element validators.

package main

import (
  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go/ord"
  slops "github.com/mus-format/mus-go/options/slice"
)

func main() {
  var (
    // Length validator.
    lenVl = func(length int) (err error) {
      if length > 3 {
        err = com.ErrTooLargeLength
      }
      return
    }
    // Element validator.
    elemVl = func(elem string) (err error) {
      if elem == "hello" {
        err = ErrBadElement
      }
      return
    }
    // Each of the validators could be nil.
    ser = ord.NewValidSliceSer[string](ord.String,
      slops.WithLenValidator[string](com.ValidatorFn[int](lenVl)),
      slops.WithElemValidator[string](com.ValidatorFn[string](elemVl)))

    // To create a valid slice serializer with the specific length serializer
    // use:
    // ser = ord.NewValidSliceSer[string](ord.String,
    //   slops.WithLenSer[string](lenSer), ...)

    value = []string{"hello", "world"}
    size  = ser.Size(value)
    bs    = make([]byte, size)
  )
  n := ser.Marshal(value, bs)
  // Unmarshalling stops when any of the validators return an error. As a
  // result, in this case, we will receive an element validation error.
  value, n, err := ser.Unmarshal(bs)
  // ...
}

Map

The ord.NewValidMapSer constructor creates a valid map serializer with length, key, and value validators.

package main

import (
  com "github.com/mus-format/common-go"
  "github.com/mus-format/mus-go/ord"
  "github.com/mus-format/mus-go/varint"
  mapops "github.com/mus-format/mus-go/options/map"
)

func main() {
  var (
    // Length validator.
    lenVl = func(length int) (err error) {
      if length > 3 {
        err = com.ErrTooLargeLength
      }
      return
    }
    // Key validator.
    keyVl = func(key int) (err error) {
      if key == 1 {
        err = ErrBadKey
      }
      return
    }
    // Value validator.
    valueVl = func(val string) (err error) {
      if val == "hello" {
        err = ErrBadValue
      }
      return
    }
    // Each of the validators could be nil.
    ser = ord.NewValidMapSer[int, string](varint.Int, ord.String,
      mapops.WithLenValidator[int, string](com.ValidatorFn[int](lenVl)),
      mapops.WithKeyValidator[int, string](com.ValidatorFn[int](keyVl)),
      mapops.WithValueValidator[int, string](com.ValidatorFn[string](valueVl)))

    // To create a valid map serializer with the specific length serializer
    // use:
    // ser = ord.NewValidMapSer[int, string](varint.Int, ord.String,
    //   mapops.WithLenSer[int, string](lenSer), ...)

    value = map[int]string{1: "hello", 2: "world"}
    size  = ser.Size(value)
    bs    = make([]byte, size)
  )
  n := ser.Marshal(value, bs)
  // Unmarshalling stops when any of the validators return an error. As a
  // result, in this case, we will receive a key validation error.
  value, n, err := ser.Unmarshal(bs)
  // ...
}

Struct

Unmarshalling an invalid structure may stop at the first invalid field, returning a validation error.

package main

import "github.com/mus-format/mus-go/ord"

type fooMUS struct{}

// ...

func (s fooMUS) Unmarshal(bs []byte) (v Foo, n int, err error) {
  // Unmarshal the first field.
  v.str, n, err = ord.String.Unmarshal(bs)
  if err != nil {
    return
  }
  // Validate the first field.
  if err = ValidateFieldA(v.str); err != nil {
    // The rest of the structure is not unmarshalled.
    return
  }
  // ...
}

Out of Order Deserialization

A simple example can be found here.
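The idea can be sketched without mus-go: because a Skip operation reports how many bytes an encoded value occupies, a later field can be decoded without decoding the fields before it. The code below uses only the standard library; all function names and the errMalformed error are hypothetical, and the length + data layout is an approximation built with encoding/binary.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errMalformed is a local error covering malformed and truncated input.
var errMalformed = errors.New("malformed or truncated data")

// appendString appends s as a Varint length followed by its bytes.
func appendString(bs []byte, s string) []byte {
	bs = binary.AppendUvarint(bs, uint64(len(s)))
	return append(bs, s...)
}

// bounds decodes the length prefix and returns where the data starts and
// ends, after checking that it fits into bs.
func bounds(bs []byte) (start, end int, err error) {
	length, n := binary.Uvarint(bs)
	if n <= 0 || uint64(len(bs)-n) < length {
		return 0, 0, errMalformed
	}
	return n, n + int(length), nil
}

// skipString returns the number of bytes occupied by an encoded string
// without building the string itself.
func skipString(bs []byte) (n int, err error) {
	_, n, err = bounds(bs)
	return
}

// unmarshalString decodes an encoded string and returns the used bytes.
func unmarshalString(bs []byte) (s string, n int, err error) {
	start, end, err := bounds(bs)
	if err != nil {
		return "", 0, err
	}
	return string(bs[start:end]), end, nil
}

// secondField decodes only the second of two consecutive string fields.
func secondField(bs []byte) (string, error) {
	n, err := skipString(bs)
	if err != nil {
		return "", err
	}
	s, _, err := unmarshalString(bs[n:])
	return s, err
}

func main() {
	bs := appendString(appendString(nil, "first"), "second")
	fmt.Println(secondField(bs)) // second <nil>
}
```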

Zero Allocation Deserialization

Can be achieved using the unsafe package.

Documentation

Index

Constants

This section is empty.

Variables

var ErrTooSmallByteSlice = errors.New("too small byte slice")

ErrTooSmallByteSlice means that an Unmarshal requires a longer byte slice than was provided.

Functions

This section is empty.

Types

type Serializer added in v0.5.0

type Serializer[T any] interface {
	Marshal(t T, bs []byte) (n int)
	Unmarshal(bs []byte) (t T, n int, err error)
	Size(t T) (size int)
	Skip(bs []byte) (n int, err error)
}

Serializer is the interface that groups the Marshal, Unmarshal, Size and Skip methods.

Marshal fills bs with an encoded value, returning the number of used bytes. It should panic if it receives a byte slice that is too small.

Unmarshal parses an encoded value from bs, returning the value, the number of used bytes and any error encountered.

Size returns the number of bytes needed to encode the value.

Skip skips an encoded value, returning the number of skipped bytes and any error encountered.

Directories

Path Synopsis
options
map
