avro

Version: v0.4.5
Published: Nov 29, 2023 License: MIT Imports: 20 Imported by: 0

README

Avro - Go-idiomatic encoding and decoding of Avro data

This package provides both a code generator that generates Go data structures from Avro schemas and a mapping between native Go data types and Avro schemas.

The API is modelled after that of Go's standard library encoding/json package.

The documentation can be found at https://pkg.go.dev/github.com/heetch/avro.

It also provides support for encoding and decoding messages using an Avro schema registry - see github.com/heetch/avro/avroregistry.

How are Avro schemas represented as Go datatypes?

When the avrogo command generates Go datatypes from Avro schemas, it uses the following rules:

  • "int" is represented as int
  • "long" is represented as int64
  • "float" is represented as float32
  • "double" is represented as float64
  • "string" is represented as string
  • "boolean" is represented as bool
  • "bytes" is represented as []byte
  • "null" is represented as the Go value nil
  • {"type": "array", "items": T} is represented as []T
  • {"type": "map", "values": T} is represented as map[string]T
  • {"type": "enum", "name": "E", "symbols": ["red", "green", "blue"]} is represented as a Go int type with String, MarshalText and UnmarshalText methods, so it will encode as a string when used in JSON.
  • {"type": "fixed", "size": 123, "name": "F"} is represented as a Go [123]byte type named F
  • ["null", T] is represented as *T
  • [T, "null"] is represented as *T
  • [T₁, T₂, ...] (a union) is represented as interface{}, which should hold only values of the types for T₁, T₂, etc.
  • {"type": "record", "name": "R", "fields": [....]} is represented as a Go struct type named R with corresponding fields.
  • {"type": "long", "logicalType": "timestamp-micros"} is represented as time.Time type
  • {"type": "string", "logicalType": "uuid"} is represented as github.com/google/uuid.UUID type.
  • {"type": "long", "logicalType": "duration-nanos"} is represented as time.Duration type.
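To make the enum, fixed, and optional-field rules above concrete, here is a hand-written approximation of the kind of types they describe. It is a sketch, not actual avrogo output: the names Color, Fingerprint, and Paint, the constant names, and the method bodies are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Color approximates the Go type described for the Avro enum
// {"type": "enum", "name": "Color", "symbols": ["red", "green", "blue"]}.
// Hand-written illustration; real avrogo output may differ.
type Color int

const (
	ColorRed Color = iota
	ColorGreen
	ColorBlue
)

var colorSymbols = []string{"red", "green", "blue"}

// String returns the Avro symbol for c.
func (c Color) String() string {
	if c < 0 || int(c) >= len(colorSymbols) {
		return fmt.Sprintf("Color(%d)", int(c))
	}
	return colorSymbols[c]
}

// MarshalText makes Color encode as its symbol when used in JSON.
func (c Color) MarshalText() ([]byte, error) {
	if c < 0 || int(c) >= len(colorSymbols) {
		return nil, fmt.Errorf("unknown Color value %d", int(c))
	}
	return []byte(colorSymbols[c]), nil
}

// UnmarshalText parses a symbol back into a Color.
func (c *Color) UnmarshalText(data []byte) error {
	for i, s := range colorSymbols {
		if s == string(data) {
			*c = Color(i)
			return nil
		}
	}
	return fmt.Errorf("unknown Color symbol %q", data)
}

// Fingerprint corresponds to {"type": "fixed", "size": 4, "name": "Fingerprint"}.
type Fingerprint [4]byte

// Paint corresponds to a record with an enum field, an optional
// ["null", "long"] field, and an array-of-string field.
type Paint struct {
	Shade  Color
	Batch  *int64
	Labels []string
}

func main() {
	batch := int64(7)
	out, err := json.Marshal(Paint{Shade: ColorGreen, Batch: &batch, Labels: []string{"matte"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"Shade":"green","Batch":7,"Labels":["matte"]}
}
```

Because the enum implements the text marshaling methods, encoding/json writes the symbol rather than the underlying integer.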

If a definition has a go.package annotation the type from that package will be used instead of generating a Go type. The type must be compatible with the Avro schema (it may contain extra fields, but all fields in common must be compatible).

If a definition has a go.name annotation the associated string will be used for the generated Go type name.
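As a rough illustration of what such an annotation looks like, the sketch below embeds a hypothetical schema carrying a go.name attribute and reads it back with encoding/json. The schema, the goName helper, and the fallback to the plain Avro name are assumptions made for this example; the real generator's naming logic is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schema is a hypothetical record definition that carries a go.name
// annotation as an extra attribute next to the standard Avro ones.
const schema = `{
	"type": "record",
	"name": "widget_v1",
	"go.name": "Widget",
	"fields": [
		{"name": "id", "type": "long"}
	]
}`

// goName extracts the go.name annotation from a definition. As a
// simplification for this sketch it falls back to the raw Avro name
// when the annotation is absent.
func goName(def string) (string, error) {
	var d struct {
		Name   string `json:"name"`
		GoName string `json:"go.name"`
	}
	if err := json.Unmarshal([]byte(def), &d); err != nil {
		return "", err
	}
	if d.GoName != "" {
		return d.GoName, nil
	}
	return d.Name, nil
}

func main() {
	name, err := goName(schema)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // Widget
}
```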

Comparison with other Go Avro packages

github.com/linkedin/goavro/v2 is oriented towards dynamic processing of Avro data. It does not provide an idiomatic way to marshal/unmarshal Avro data into Go struct values. It does, however, provide good support for encoding and decoding with the standard Avro JSON format, which this package does not.

github.com/actgardner/gogen-avro was the original inspiration for this package. It generates Go code for Avro schemas. It uses a neat VM-based schema for encoding and decoding (and is also used by this package under the hood), but the generated Go data structures are awkward to use and don't reflect the data structures that people would idiomatically define in Go.

For example, in gogen-avro the Avro type ["null", "int"] (either null or an integer) is represented as a struct containing three members, and an associated enum type:

type UnionNullIntTypeEnum int

const (
	UnionNullIntTypeEnumNull UnionNullIntTypeEnum = 0
	UnionNullIntTypeEnumInt UnionNullIntTypeEnum = 1
)

type UnionNullInt struct {
	Null *types.NullVal
	Int int32
	UnionType UnionNullIntTypeEnum
}

With heetch/avro, the above type is simply represented as a *int, a representation likely to be familiar to most Go users.
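A small sketch of how code might consume such an optional field. The Reading type and describe helper are hypothetical and exist only for this example.

```go
package main

import "fmt"

// Reading is a hypothetical record with one ["null", "int"] field.
type Reading struct {
	Count *int
}

// describe reports which branch of the union the field holds.
func describe(r Reading) string {
	if r.Count == nil {
		return "null"
	}
	return fmt.Sprintf("int %d", *r.Count)
}

func main() {
	n := 42
	fmt.Println(describe(Reading{}))          // null
	fmt.Println(describe(Reading{Count: &n})) // int 42
}
```

A nil check replaces the separate discriminator field and null placeholder that the three-member struct needs.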

Integration testing

A github.com/heetch/avro/avroregistrytest package is provided to run integration tests against a real schema registry.

import "github.com/heetch/avro/avroregistrytest"

type X struct {
   A int
}

avroregistrytest.Register(context.Background(), t, X{}, "test-topic")

This code snippet registers an Avro type for the X struct under test-topic in the schema registry defined by the KAFKA_REGISTRY_ADDR environment variable, which must be set in host:port form.
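Since the variable must hold a host:port value, a test setup might validate it up front and skip or fail early with a clear message. The following is a minimal sketch using only the standard library; checkRegistryAddr is a hypothetical helper and is not part of avroregistrytest.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// checkRegistryAddr verifies that addr is non-empty and has host:port form.
// Hypothetical helper for illustration only.
func checkRegistryAddr(addr string) error {
	if addr == "" {
		return errors.New("KAFKA_REGISTRY_ADDR is not set")
	}
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return fmt.Errorf("KAFKA_REGISTRY_ADDR must be host:port: %v", err)
	}
	return nil
}

func main() {
	if err := checkRegistryAddr(os.Getenv("KAFKA_REGISTRY_ADDR")); err != nil {
		fmt.Println("skipping registry tests:", err)
		return
	}
	fmt.Println("registry address looks valid")
}
```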

Documentation

Overview

Package avro provides encoding and decoding for the Avro binary data format.

The format uses out-of-band schemas to determine the encoding, with a schema migration scheme to allow data written with one schema to be read using another schema.

See here for more information on the format:

https://avro.apache.org/docs/1.9.1/spec.html

This package provides a mapping from regular Go types to Avro schemas. See the TypeOf function for more details.

There is also a code generation tool that can generate Go data structures from Avro schemas. See https://pkg.go.dev/github.com/heetch/avro/cmd/avrogo for details.

Index

Constants

const (
	Backward CompatMode = 1 << iota
	Forward
	Transitive

	BackwardTransitive = Backward | Transitive
	ForwardTransitive  = Forward | Transitive
	Full               = Backward | Forward
	FullTransitive     = Full | Transitive
)
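These constants form a bitmask, so the combined modes can be tested bit by bit. The sketch below copies the declarations into a local package so that it runs without the dependency; the has helper is an assumption for illustration and not part of the package.

```go
package main

import "fmt"

// CompatMode and its constants are a local copy of the declarations
// above, so this sketch is self-contained.
type CompatMode int

const (
	Backward CompatMode = 1 << iota
	Forward
	Transitive

	BackwardTransitive = Backward | Transitive
	ForwardTransitive  = Forward | Transitive
	Full               = Backward | Forward
	FullTransitive     = Full | Transitive
)

// has reports whether every bit of flag is set in m.
// Hypothetical helper, not part of the avro package.
func has(m, flag CompatMode) bool {
	return m&flag == flag
}

func main() {
	fmt.Println(has(FullTransitive, Backward))       // true
	fmt.Println(has(BackwardTransitive, Forward))    // false
	fmt.Println((Full | Transitive) == FullTransitive) // true
}
```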

Variables

This section is empty.

Functions

This section is empty.

Types

type CanonicalOpts

type CanonicalOpts int

CanonicalOpts holds a bitmask of options for CanonicalString.

const (
	// RetainDefaults specifies that default values should be retained in
	// the canonicalized schema string.
	RetainDefaults CanonicalOpts = 1 << iota
	RetainLogicalTypes
	RetainAll CanonicalOpts = RetainDefaults | RetainLogicalTypes
)

type CompatMode

type CompatMode int

CompatMode defines a compatibility mode used for checking Avro type compatibility.

func ParseCompatMode

func ParseCompatMode(s string) CompatMode

ParseCompatMode returns the CompatMode from a string. It returns -1 if no matches are found.

func (CompatMode) String

func (m CompatMode) String() string

String returns a string representation of m, one of the values defined in https://docs.confluent.io/current/schema-registry/avro.html#schema-evolution-and-compatibility. For example, FullTransitive.String() returns "FULL_TRANSITIVE".

type DecodingRegistry

type DecodingRegistry interface {
	// DecodeSchemaID returns the schema ID header of the message
	// and the bare message without schema information.
	// A schema ID is specific to the DecodingRegistry instance - within
	// a given DecodingRegistry instance (only), a given schema ID
	// must always correspond to the same schema.
	//
	// If the message isn't valid, DecodeSchemaID should return (0, nil).
	DecodeSchemaID(msg []byte) (int64, []byte)

	// SchemaForID returns the schema for the given ID.
	SchemaForID(ctx context.Context, id int64) (*Type, error)
}

DecodingRegistry is used by SingleDecoder to find information about schema identifiers in messages.

type EncodingRegistry

type EncodingRegistry interface {
	// AppendSchemaID appends the given schema ID header to buf
	// and returns the resulting slice.
	AppendSchemaID(buf []byte, id int64) []byte

	// IDForSchema returns an ID for the given schema.
	IDForSchema(ctx context.Context, schema *Type) (int64, error)
}

EncodingRegistry is used by SingleEncoder to find ids for schemas encoded in messages.
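The header-related halves of the two interfaces (AppendSchemaID and DecodeSchemaID) can be sketched without a real registry. The example below assumes a header consisting of a zero byte followed by a 4-byte big-endian schema ID, which matches the Confluent wire format; that layout and the headerRegistry type are assumptions for this example, and the schema lookup methods (SchemaForID, IDForSchema) are deliberately left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerRegistry sketches only the header methods of EncodingRegistry
// and DecodingRegistry. The 5-byte layout is an assumption of this example.
type headerRegistry struct{}

// AppendSchemaID appends the schema ID header to buf and returns
// the resulting slice.
func (headerRegistry) AppendSchemaID(buf []byte, id int64) []byte {
	if id < 0 || id >= 1<<32 {
		panic("schema id out of range for a 4-byte header")
	}
	var hdr [5]byte // hdr[0] stays zero: the magic byte
	binary.BigEndian.PutUint32(hdr[1:], uint32(id))
	return append(buf, hdr[:]...)
}

// DecodeSchemaID returns the schema ID and the bare message body,
// or (0, nil) if the message does not start with a valid header.
func (headerRegistry) DecodeSchemaID(msg []byte) (int64, []byte) {
	if len(msg) < 5 || msg[0] != 0 {
		return 0, nil
	}
	return int64(binary.BigEndian.Uint32(msg[1:5])), msg[5:]
}

func main() {
	var r headerRegistry
	msg := append(r.AppendSchemaID(nil, 42), "payload"...)
	id, body := r.DecodeSchemaID(msg)
	fmt.Println(id, string(body)) // 42 payload
}
```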

type Names

type Names struct {
	// contains filtered or unexported fields
}

Names represents a namespace that can rename schema names. The zero value of a Names is the empty namespace.

func (*Names) Marshal

func (names *Names) Marshal(x interface{}) ([]byte, *Type, error)

Marshal is like the Marshal function except that names in the schema for x are renamed according to names.

func (*Names) Rename

func (n *Names) Rename(oldName string, newName string, newAliases ...string) *Names

Rename returns a copy of n that renames oldName to newName with the given aliases when a schema is used.

If aliases aren't full names, their namespace will be taken from the namespace of newName.

If n already includes a rename for oldName, the old association will be overwritten.

The rename only applies to schemas directly - it does not rename names already passed to Rename as newName or aliases.

So for example:

n.Rename("foo", "bar").
	Rename("bar", "baz").
	TypeOf(`{"type":"record", "name": "foo", "fields": ...}`)

will return a type with a schema named "bar", not "baz".

Rename panics if oldName is any of the built-in Avro types.

func (*Names) RenameType

func (n *Names) RenameType(x interface{}, newName string, newAliases ...string) *Names

RenameType returns a copy of n that uses the given name and aliases for the type of x.

RenameType will panic if TypeOf(x) returns an error or the type doesn't represent an Avro named definition (a record, an enum or a fixed Avro type).

If RenameType has already been called for the type of x, the old association will be overwritten.

func (*Names) TypeOf

func (n *Names) TypeOf(x interface{}) (*Type, error)

TypeOf is like the TypeOf function except that Avro names in x will be translated through the namespace n.

func (*Names) Unmarshal

func (names *Names) Unmarshal(data []byte, x interface{}, wType *Type) (*Type, error)

Unmarshal is like the Unmarshal function except that names in the schema for x are renamed according to names.

type Null

type Null = avrotypegen.Null

Null represents the Avro null type. Its only JSON representation is null.

type SingleDecoder

type SingleDecoder struct {
	// contains filtered or unexported fields
}

SingleDecoder decodes messages in Avro binary format. Each message includes a header or wrapper that indicates the schema used to encode the message.

A DecodingRegistry is used to retrieve the schema for a given message or to find the encoding for a given schema.

To encode or decode a stream of messages that all use the same schema, use StreamEncoder or StreamDecoder instead.

func NewSingleDecoder

func NewSingleDecoder(r DecodingRegistry, names *Names) *SingleDecoder

NewSingleDecoder returns a new SingleDecoder that uses r to determine the schema of each message that's unmarshaled.

Go values unmarshaled through Unmarshal will have their Avro schemas translated with the given Names instance. If names is nil, the global namespace will be used.

func (*SingleDecoder) Unmarshal

func (c *SingleDecoder) Unmarshal(ctx context.Context, data []byte, x interface{}) (*Type, error)

Unmarshal unmarshals the given message into x. The body of the message is unmarshaled as with the Unmarshal function.

It needs the context argument because it might end up fetching schema data over the network via the DecodingRegistry.

Unmarshal returns the actual type that was decoded into.

type SingleEncoder

type SingleEncoder struct {
	// contains filtered or unexported fields
}

SingleEncoder encodes messages in Avro binary format. Each message includes a header or wrapper that indicates the schema.

func NewSingleEncoder

func NewSingleEncoder(r EncodingRegistry, names *Names) *SingleEncoder

NewSingleEncoder returns a SingleEncoder instance that encodes single messages along with their schema identifier.

Go values marshaled through Marshal will have their Avro schemas translated with the given Names instance. If names is nil, the global namespace will be used.

func (*SingleEncoder) CheckMarshalType

func (enc *SingleEncoder) CheckMarshalType(ctx context.Context, x interface{}) error

CheckMarshalType checks that the given type can be marshaled with the encoder. It also caches any type information obtained from the EncodingRegistry for the type, so future calls to Marshal with that type won't call the registry.

func (*SingleEncoder) Marshal

func (enc *SingleEncoder) Marshal(ctx context.Context, x interface{}) ([]byte, error)

Marshal returns x marshaled using the Avro binary encoding, along with an identifier that records the type that it was encoded with.

type Type

type Type struct {
	// contains filtered or unexported fields
}

Type represents an Avro schema type.

func Marshal

func Marshal(x interface{}) ([]byte, *Type, error)

Marshal encodes x as a message using the Avro binary encoding, using TypeOf(x) as the Avro type for marshaling.

Marshal returns the encoded data and the actual type that was used for marshaling.

See https://avro.apache.org/docs/current/spec.html#binary_encoding
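To give a feel for the bytes this encoding produces, here is a hand-rolled sketch of the Avro binary encoding for a record with one long field and one string field. It does not use this package; the helper names are hypothetical. It relies on the fact that Go's binary.PutVarint uses the same zig-zag variable-length coding that the Avro specification prescribes for int and long.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLong appends v using variable-length zig-zag coding, the
// encoding Avro uses for "int" and "long".
func appendLong(buf []byte, v int64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutVarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendString appends s as a long byte count followed by its UTF-8 bytes.
func appendString(buf []byte, s string) []byte {
	buf = appendLong(buf, int64(len(s)))
	return append(buf, s...)
}

// encodeRecord encodes a hypothetical record with fields (a: long, b: string).
// A record is the concatenation of its field encodings in schema order,
// with no field names or tags in the output.
func encodeRecord(a int64, b string) []byte {
	buf := appendLong(nil, a)
	return appendString(buf, b)
}

func main() {
	fmt.Printf("% x\n", encodeRecord(27, "foo")) // 36 06 66 6f 6f
}
```

The output carries no schema information at all, which is why the reader needs the writer's schema out of band.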

func ParseType

func ParseType(s string) (*Type, error)

ParseType parses an Avro schema in the format defined by the Avro specification at https://avro.apache.org/docs/current/spec.html.

func TypeOf

func TypeOf(x interface{}) (*Type, error)

TypeOf returns the Avro type for the Go type of x.

If the type was generated by avrogo, the returned schema will be the same as the schema it was generated from.

Otherwise TypeOf(T) is derived according to the following rules:

  • int, int64 and uint32 encode as "long"
  • int32, int16, uint16, int8 and uint8 encode as "int"
  • float32 encodes as "float"
  • float64 encodes as "double"
  • string encodes as "string"
  • Null{} encodes as "null"
  • time.Duration encodes as {"type": "long", "logicalType": "duration-nanos"}
  • time.Time encodes as {"type": "long", "logicalType": "timestamp-micros"}
  • github.com/google/uuid.UUID encodes as {"type": "string", "logicalType": "uuid"}
  • [N]byte encodes as {"type": "fixed", "name": "go.FixedN", "size": N}
  • a named type with underlying type [N]byte encodes as [N]byte, but with typeName(T) for the name.
  • []T encodes as {"type": "array", "items": TypeOf(T)}
  • map[string]T encodes as {"type": "map", "values": TypeOf(T)}
  • *T encodes as ["null", TypeOf(T)]
  • a named struct type encodes as {"type": "record", "name": typeName(T), "fields": ...} where the fields are encoded as described below.
  • interface types are disallowed.

Struct fields are encoded as follows:

  • unexported struct fields are ignored
  • the field name is taken from the Go field name, or from a "json" tag for the field if present.
  • the default value for the field is the zero value for the type.
  • anonymous struct fields are disallowed (this restriction may be lifted in the future).
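The numeric rules and the field-naming rules above can be mimicked with a few lines of reflection. This is a toy re-implementation for illustration only, not the package's code: primitiveNameOf and fieldNamesOf are hypothetical helpers, they cover only the basic kinds listed above, and they ignore named types such as time.Duration that the real rules treat specially.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// primitiveNameOf applies the integer, float and string rules listed
// above to the Go type of x. Toy sketch: logical types, fixed, arrays,
// maps, pointers and records are not handled.
func primitiveNameOf(x interface{}) (string, bool) {
	switch reflect.TypeOf(x).Kind() {
	case reflect.Int, reflect.Int64, reflect.Uint32:
		return "long", true
	case reflect.Int32, reflect.Int16, reflect.Uint16, reflect.Int8, reflect.Uint8:
		return "int", true
	case reflect.Float32:
		return "float", true
	case reflect.Float64:
		return "double", true
	case reflect.String:
		return "string", true
	}
	return "", false
}

// fieldNamesOf lists the Avro field names for a struct value: unexported
// fields are ignored and a "json" tag overrides the Go field name.
func fieldNamesOf(x interface{}) []string {
	t := reflect.TypeOf(x)
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // unexported field
			continue
		}
		name := f.Name
		if tag := strings.Split(f.Tag.Get("json"), ",")[0]; tag != "" {
			name = tag
		}
		names = append(names, name)
	}
	return names
}

// sample is a hypothetical struct used to exercise the helpers.
type sample struct {
	ID     int64 `json:"id"`
	Score  float32
	hidden string
}

func main() {
	fmt.Println(fieldNamesOf(sample{}))     // [id Score]
	fmt.Println(primitiveNameOf(uint32(0))) // long true
}
```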

func Unmarshal

func Unmarshal(data []byte, x interface{}, wType *Type) (*Type, error)

Unmarshal unmarshals the given Avro-encoded binary data, which must have been written with the Avro type described by wType, into x, which must be a pointer to a struct type.

The reader type used is TypeOf(*x), and must be compatible with wType according to the rules described here: https://avro.apache.org/docs/current/spec.html#Schema+Resolution

Unmarshal returns the reader type.

func (*Type) CanonicalString

func (t *Type) CanonicalString(opts CanonicalOpts) string

CanonicalString returns the canonical string representation of the type, as documented here: https://avro.apache.org/docs/1.9.1/spec.html#Transforming+into+Parsing+Canonical+Form

BUG: Unicode characters \u2028 and \u2029 in strings inside the schema are always escaped, contrary to the specification above.

func (*Type) Name

func (t *Type) Name() string

Name returns the fully qualified Avro Name for the type, or the empty string if it's not a definition.

func (*Type) String

func (t *Type) String() string

Directories

Path                    Synopsis
avroregistry            Package avroregistry provides avro.*Registry implementations that consult an Avro registry through its REST API.
avrotypegen             Package avrotypegen holds types that are used by generated Avro Go code.
cmd/avrogo              The avrogo command generates Go types for the Avro schemas specified on the command line.
cmd/avrogo/avrotypemap  Package avrotypemap is an internal implementation detail of the avrogo program and should not be used externally.
cmd/go2avro             The go2avro command generates Avro schemas for Go types.
internal/testtypes      Package testtypes defines types for testing the avro package that aren't easily defined in the test package there.