avro

package module
v0.0.0-...-f64d274
Published: Mar 1, 2026 License: MIT Imports: 19 Imported by: 0

README

avro

Encode and decode Avro binary data.

Parse an Avro JSON schema, then encode and decode Go values directly — no code generation required. Supports all primitive and complex types, logical types, schema evolution, Object Container Files, Single Object Encoding, and fingerprinting.

Quick Start

package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

var schema = avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

type User struct {
	Name string `avro:"name"`
	Age  int    `avro:"age"`
}

func main() {
	// Encode
	data, err := schema.Encode(&User{Name: "Alice", Age: 30})
	if err != nil {
		log.Fatal(err)
	}

	// Decode
	var u User
	_, err = schema.Decode(data, &u)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u) // {Alice 30}
}

Type Mapping

The table below shows which Go types can be used with each Avro type when encoding and decoding.

Avro type   Go types
null        any (always nil)
boolean     bool, any
int, long   int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, any
float       float32, float64, any
double      float64, float32, any
string      string, []byte, any; also encoding.TextUnmarshaler
bytes       []byte, string, any
enum        string, any integer type (ordinal), any
fixed       [N]byte, []byte, any
array       slice, any
map         map[string]T, any
union       any, *T (for ["null", T] unions), or the matched branch type
record      struct (matched by field name or avro tag), map[string]any, any

When decoding into any, values use their natural Go types: nil, bool, int64, float32, float64, string, []byte, []any, map[string]any.
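A type switch is the usual way to consume those generic values. The helper below is purely illustrative (it is not part of this package's API) and operates only on the plain Go kinds listed above, so it stands alone:

```go
package main

import (
	"fmt"
	"sort"
)

// describe renders a generically decoded Avro value as "path: value"
// lines. The input is plain Go data of the kinds listed above (nil,
// bool, int64, float64, string, []byte, []any, map[string]any), so
// this sketch does not need the avro package.
func describe(prefix string, v any, out *[]string) {
	switch x := v.(type) {
	case nil:
		*out = append(*out, prefix+": null")
	case map[string]any: // record or map: recurse in sorted key order
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			describe(prefix+"."+k, x[k], out)
		}
	case []any: // array
		for i, e := range x {
			describe(fmt.Sprintf("%s[%d]", prefix, i), e, out)
		}
	default: // bool, int64, float32/float64, string, []byte
		*out = append(*out, fmt.Sprintf("%s: %v", prefix, x))
	}
}

func main() {
	// Shape a generic decode of a User record would produce.
	v := map[string]any{"name": "Alice", "age": int64(30)}
	var lines []string
	describe("user", v, &lines)
	for _, l := range lines {
		fmt.Println(l)
	}
}
```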

Encoding also accepts fmt.Stringer and encoding.TextMarshaler for string schema types.
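A sketch of what such a type looks like, assuming the usual marshal/unmarshal pairing; only the text round-trip itself is exercised here, independent of the avro package:

```go
package main

import "fmt"

// Level is a custom type that could back an Avro "string" field via
// encoding.TextMarshaler / encoding.TextUnmarshaler, per the mapping
// above. Only the text round-trip is shown; wiring it into a record
// works like any other struct field.
type Level int

const (
	Info Level = iota
	Warn
	Error
)

var levelNames = [...]string{"INFO", "WARN", "ERROR"}

// MarshalText implements encoding.TextMarshaler.
func (l Level) MarshalText() ([]byte, error) {
	if l < Info || l > Error {
		return nil, fmt.Errorf("unknown level %d", int(l))
	}
	return []byte(levelNames[l]), nil
}

// UnmarshalText implements encoding.TextUnmarshaler.
func (l *Level) UnmarshalText(text []byte) error {
	for i, name := range levelNames {
		if name == string(text) {
			*l = Level(i)
			return nil
		}
	}
	return fmt.Errorf("unknown level %q", text)
}

func main() {
	b, _ := Warn.MarshalText()
	var l Level
	_ = l.UnmarshalText(b)
	fmt.Println(string(b), int(l)) // WARN 1
}
```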

Struct Tags

Struct fields are mapped to Avro record fields using the avro tag:

type Example struct {
    Name    string `avro:"name"`           // maps to Avro field "name"
    Ignored int    `avro:"-"`              // skipped entirely
    Inner   Nested `avro:",inline"`        // inlines nested struct fields into parent
    Email   *string `avro:",omitzero"`     // encodes as null when zero (for ["null", T] unions)
}

Without a tag, the exported Go field name is used. Embedded (anonymous) structs are inlined automatically unless they have an explicit avro:"name" tag.
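The name-resolution rules above can be sketched with reflect. avroFieldNames is a hypothetical helper for illustration only (it ignores the inline/omitzero options and embedded structs), not this package's implementation:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// avroFieldNames sketches the tag rules above: use the avro tag's
// name if present, skip "-", and fall back to the exported Go field
// name.
func avroFieldNames(v any) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		tag := f.Tag.Get("avro")
		name, _, _ := strings.Cut(tag, ",")
		switch name {
		case "-":
			continue // skipped entirely
		case "":
			name = f.Name // no tag: exported field name
		}
		names = append(names, name)
	}
	return names
}

func main() {
	type Example struct {
		Name    string `avro:"name"`
		Ignored int    `avro:"-"`
		Plain   string
	}
	fmt.Println(avroFieldNames(Example{})) // [name Plain]
}
```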

Logical Types

Logical types are decoded into natural Go types when available, and fall back to the underlying Avro type otherwise.

Logical type              Avro type            Go type
date                      int                  time.Time or int types
time-millis               int                  time.Duration or int types
time-micros               long                 time.Duration or int types
timestamp-millis          long                 time.Time or int types
timestamp-micros          long                 time.Time or int types
timestamp-nanos           long                 time.Time or int types
local-timestamp-millis    long                 time.Time or int types
local-timestamp-micros    long                 time.Time or int types
local-timestamp-nanos     long                 time.Time or int types
uuid                      string or fixed(16)  [16]byte (RFC 4122 hex-dash ↔ binary) or string
decimal                   bytes or fixed       underlying bytes/fixed
duration                  fixed(12)            underlying fixed

Unknown logical types are silently ignored per the Avro spec, and the underlying type is used as-is.

Schema Evolution

Avro data is always written with a specific schema — the writer schema. When you read that data later, your application may expect a different schema — the reader schema. You may have added a field, removed one, or widened a type from int to long.

Resolve bridges this gap. Given the writer and reader schemas, it returns a new schema that decodes data in the old wire format and produces values in the reader's layout:

  • Fields in the reader but not the writer are filled from defaults.
  • Fields in the writer but not the reader are skipped.
  • Fields that exist in both are matched by name (or alias) and decoded, with type promotion applied where needed (e.g. int → long).

Example

Suppose v1 of your application wrote User records with just a name:

var writerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}`)

In v2 you added an email field with a default:

var readerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name",  "type": "string"},
        {"name": "email", "type": "string", "default": ""}
    ]
}`)

type User struct {
    Name  string `avro:"name"`
    Email string `avro:"email"`
}

To read old v1 data with your v2 struct, resolve the two schemas:

resolved, err := avro.Resolve(writerSchema, readerSchema)

var u User
_, err = resolved.Decode(v1Data, &u)
// u == User{Name: "Alice", Email: ""}

The following type promotions are supported:

Writer → Reader
int → long, float, double
long → float, double
float → double
string ↔ bytes
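The promotion rules fit in a small lookup. promotable is an illustrative helper over plain Avro type names, not part of this package's API:

```go
package main

import "fmt"

// promotable sketches the table above: can a value written as
// writer be read as reader? Equal types are always compatible;
// otherwise only the listed widenings (and string ↔ bytes) apply.
func promotable(writer, reader string) bool {
	if writer == reader {
		return true
	}
	ok := map[string]map[string]bool{
		"int":    {"long": true, "float": true, "double": true},
		"long":   {"float": true, "double": true},
		"float":  {"double": true},
		"string": {"bytes": true},
		"bytes":  {"string": true},
	}
	return ok[writer][reader]
}

func main() {
	fmt.Println(promotable("int", "long"))   // true
	fmt.Println(promotable("double", "int")) // false: narrowing is not allowed
}
```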

CheckCompatibility checks whether two schemas are compatible without building a resolved schema. The direction you check depends on the guarantee you need:

// Backward: new schema can read old data.
avro.CheckCompatibility(oldSchema, newSchema)

// Forward: old schema can read new data.
avro.CheckCompatibility(newSchema, oldSchema)

// Full: check both directions.
avro.CheckCompatibility(oldSchema, newSchema)
avro.CheckCompatibility(newSchema, oldSchema)

Object Container Files

The ocf sub-package reads and writes Avro Object Container Files — self-describing binary files that embed the schema in the header and store data in compressed blocks.

Writing

var schema = avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

f, _ := os.Create("users.avro")
w, err := ocf.NewWriter(f, schema, ocf.WithCodec(ocf.SnappyCodec()))
if err != nil {
    log.Fatal(err)
}
w.Encode(&User{Name: "Alice", Age: 30})
w.Encode(&User{Name: "Bob", Age: 25})
w.Close()
f.Close()

Reading

f, _ := os.Open("users.avro")
r, err := ocf.NewReader(f)
if err != nil {
    log.Fatal(err)
}
defer r.Close()
for {
    var u User
    err := r.Decode(&u)
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(u)
}

The reader's Schema() method returns the schema parsed from the file header, which you can pass as the writer schema to Resolve.

Codecs

Built-in codecs: null (default, no compression), deflate (DeflateCodec), snappy (SnappyCodec), and zstandard (ZstdCodec). Custom codecs can be provided via the Codec interface.

Appending

NewAppendWriter opens an existing OCF for appending — it reads the header to recover the schema, codec, and sync marker, then seeks to the end.

Single Object Encoding

For sending self-describing values over the wire (as opposed to files, where OCF is preferred), use Single Object Encoding. Each message is a 2-byte magic header, an 8-byte CRC-64-AVRO fingerprint, and the Avro binary payload.

// Encode with fingerprint header
data, err := schema.AppendSingleObject(nil, &user)

// Decode (schema known)
_, err = schema.DecodeSingleObject(data, &user)

// Decode (schema unknown): extract fingerprint, look up schema
fp, payload, err := avro.SingleObjectFingerprint(data)
schema := registry.Lookup(fp) // your schema registry
_, err = schema.Decode(payload, &user)
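The wire layout is fixed by the Avro specification — magic bytes 0xC3 0x01, an 8-byte little-endian fingerprint, then the payload — so framing can be sketched without the library. These helper names are illustrative, not the package's API:

```go
package main

import (
	"bytes"
	"fmt"
)

// Per the Avro spec, a single-object message is the two magic bytes
// 0xC3 0x01, an 8-byte little-endian CRC-64-AVRO schema fingerprint,
// and then the Avro binary payload.
var soeMagic = [2]byte{0xC3, 0x01}

func wrapSingleObject(fp [8]byte, payload []byte) []byte {
	out := make([]byte, 0, 10+len(payload))
	out = append(out, soeMagic[:]...)
	out = append(out, fp[:]...)
	return append(out, payload...)
}

func splitSingleObject(msg []byte) (fp [8]byte, payload []byte, err error) {
	if len(msg) < 10 || !bytes.Equal(msg[:2], soeMagic[:]) {
		return fp, nil, fmt.Errorf("not a single-object message")
	}
	copy(fp[:], msg[2:10])
	return fp, msg[10:], nil
}

func main() {
	fp := [8]byte{1, 2, 3, 4, 5, 6, 7, 8} // placeholder fingerprint
	msg := wrapSingleObject(fp, []byte("payload"))
	gotFP, body, err := splitSingleObject(msg)
	fmt.Println(err == nil, gotFP == fp, string(body)) // true true payload
}
```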

Fingerprinting

Canonical returns the Parsing Canonical Form of a schema — a deterministic JSON representation stripped of doc, aliases, defaults, and other non-essential attributes. Use it for schema comparison and fingerprinting.

canonical := schema.Canonical() // []byte

// CRC-64-AVRO (Rabin) — the Avro-standard fingerprint
fp := schema.Fingerprint(avro.NewRabin())

// SHA-256 — common for cross-language registries
fp256 := schema.Fingerprint(sha256.New())

Performance

Struct field access uses unsafe pointer arithmetic (similar to encoding/json v2) to avoid reflect.Value overhead on every encode/decode. All schemas, type mappings, and codec state are cached after first use so repeated operations pay no extra allocation cost.

Documentation

Overview

Package avro encodes and decodes Avro binary data.

Parse an Avro JSON schema with Parse (or MustParse for package-level vars), then call Schema.Encode / Schema.Decode to convert between Go values and Avro binary. See Schema.Decode for the full Go-to-Avro type mapping.

Basic usage

schema := avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

type User struct {
    Name string `avro:"name"`
    Age  int    `avro:"age"`
}

// Encode
data, err := schema.Encode(&User{Name: "Alice", Age: 30})

// Decode
var u User
_, err = schema.Decode(data, &u)

Schema evolution

Avro data is always written with a specific schema — the "writer schema." When you read that data later, your application may expect a different schema — the "reader schema." For example, you may have added a field, removed one, or widened a type from int to long. The data on disk doesn't change, but your code expects the new layout.

Resolve bridges this gap. Given the writer and reader schemas, it returns a new schema that knows how to decode the old wire format and produce values in the reader's layout:

  • Fields in the reader but not the writer are filled from defaults.
  • Fields in the writer but not the reader are skipped.
  • Fields that exist in both are matched by name (or alias) and decoded, with type promotion applied where needed (e.g. int → long).

You typically get the writer schema from the data itself: an [ocf] file header embeds it, and schema registries store it by ID or fingerprint.

As a concrete example, suppose v1 of your application wrote User records with just a name:

var writerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}`)

In v2, you added an email field with a default:

var readerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name",  "type": "string"},
        {"name": "email", "type": "string", "default": ""}
    ]
}`)

type User struct {
    Name  string `avro:"name"`
    Email string `avro:"email"`
}

To read old v1 data with your v2 struct, resolve the two schemas:

resolved, err := avro.Resolve(writerSchema, readerSchema)

// Decode v1 data: "email" is absent in the old data, so it gets
// the reader default ("").
var u User
_, err = resolved.Decode(v1Data, &u)
// u == User{Name: "Alice", Email: ""}

If you just want to check whether two schemas are compatible without building a resolved schema, use CheckCompatibility.

Single Object Encoding

For sending self-describing values over the wire (as opposed to files, where [ocf] is preferred), use Schema.AppendSingleObject and Schema.DecodeSingleObject. To decode without knowing the schema in advance, extract the fingerprint with SingleObjectFingerprint and look it up in your own registry.

Fingerprinting

Schema.Canonical returns the Parsing Canonical Form for deterministic comparison. Schema.Fingerprint hashes it with any hash.Hash; use NewRabin for the Avro-standard CRC-64-AVRO.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`{
		"type": "record",
		"name": "User",
		"fields": [
			{"name": "name", "type": "string"},
			{"name": "age",  "type": "int"}
		]
	}`)

	type User struct {
		Name string `avro:"name"`
		Age  int32  `avro:"age"`
	}

	data, err := schema.Encode(&User{Name: "Alice", Age: 30})
	if err != nil {
		log.Fatal(err)
	}

	var u User
	if _, err := schema.Decode(data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s is %d\n", u.Name, u.Age)
}
Output:

Alice is 30

Functions

func CheckCompatibility

func CheckCompatibility(writer, reader *Schema) error

CheckCompatibility reports whether data written with the writer schema can be read by the reader schema. It returns nil on success or a *CompatibilityError describing the first incompatibility.

See Resolve for a note on argument order.

func NewRabin

func NewRabin() hash.Hash64

NewRabin returns a hash.Hash64 computing the CRC-64-AVRO (Rabin) fingerprint defined by the Avro specification.
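The algorithm is defined in the Avro specification. The sketch below is a minimal port of the spec's reference fingerprint64 routine (not this package's implementation): the seed constant 0xc15d213aa4d7a795 doubles as the fingerprint of empty input.

```go
package main

import "fmt"

// rabinEmpty is the CRC-64-AVRO "empty" value from the Avro spec; it
// is both the seed and the fingerprint of zero-length input.
const rabinEmpty uint64 = 0xc15d213aa4d7a795

// rabinTable is the byte-at-a-time lookup table, built exactly as in
// the spec's initFPTable.
var rabinTable = func() [256]uint64 {
	var t [256]uint64
	for i := range t {
		fp := uint64(i)
		for j := 0; j < 8; j++ {
			fp = (fp >> 1) ^ (rabinEmpty & -(fp & 1))
		}
		t[i] = fp
	}
	return t
}()

// rabinFingerprint is a direct port of the fingerprint64 reference
// algorithm in the Avro specification.
func rabinFingerprint(buf []byte) uint64 {
	fp := rabinEmpty
	for _, b := range buf {
		fp = (fp >> 8) ^ rabinTable[byte(fp)^b]
	}
	return fp
}

func main() {
	// Fingerprint of the canonical form of the schema `"string"`.
	fmt.Printf("%#x\n", rabinFingerprint([]byte(`"string"`)))
}
```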

func SingleObjectFingerprint

func SingleObjectFingerprint(data []byte) (fp [8]byte, rest []byte, err error)

SingleObjectFingerprint extracts the 8-byte CRC-64-AVRO fingerprint and returns the remaining payload. Use this to look up the schema by fingerprint before decoding.

Types

type CompatibilityError

type CompatibilityError struct {
	// Path is the dotted path to the incompatible element (e.g. "User.address.zip").
	Path string
	// ReaderType is the Avro type in the reader schema.
	ReaderType string
	// WriterType is the Avro type in the writer schema.
	WriterType string
	// Detail describes the specific incompatibility.
	Detail string
}

CompatibilityError describes an incompatibility between a reader and writer schema, as returned by CheckCompatibility and Resolve.

func (*CompatibilityError) Error

func (e *CompatibilityError) Error() string

type Duration

type Duration struct {
	Months       uint32
	Days         uint32
	Milliseconds uint32
}

Duration represents the Avro duration logical type: a 12-byte fixed value containing three little-endian unsigned 32-bit integers representing months, days, and milliseconds.
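That wire form is plain encoding/binary work. The sketch below mirrors the Duration struct locally so it stands alone; the method names are illustrative, not the package's API:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Duration mirrors the type above. Its wire form is a 12-byte fixed:
// three little-endian uint32s for months, days, and milliseconds.
type Duration struct {
	Months, Days, Milliseconds uint32
}

// bytes packs the three counters little-endian into the 12-byte fixed.
func (d Duration) bytes() [12]byte {
	var b [12]byte
	binary.LittleEndian.PutUint32(b[0:4], d.Months)
	binary.LittleEndian.PutUint32(b[4:8], d.Days)
	binary.LittleEndian.PutUint32(b[8:12], d.Milliseconds)
	return b
}

// durationFromBytes is the inverse of bytes.
func durationFromBytes(b [12]byte) Duration {
	return Duration{
		Months:       binary.LittleEndian.Uint32(b[0:4]),
		Days:         binary.LittleEndian.Uint32(b[4:8]),
		Milliseconds: binary.LittleEndian.Uint32(b[8:12]),
	}
}

func main() {
	d := Duration{Months: 1, Days: 15, Milliseconds: 500}
	fmt.Println(durationFromBytes(d.bytes()) == d) // true
}
```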

type Schema

type Schema struct {
	// contains filtered or unexported fields
}

Schema is a compiled Avro schema. Create one with Parse or MustParse, then use Schema.Encode / Schema.Decode to convert between Go values and Avro binary. A Schema is safe for concurrent use.

func MustParse

func MustParse(schema string) *Schema

MustParse is like Parse but panics on error. Useful for package-level var declarations.

func Parse

func Parse(schema string) (*Schema, error)

Parse parses an Avro JSON schema string and returns a compiled *Schema. The input can be a primitive name (e.g. `"string"`), a JSON object (record, enum, array, map, fixed), or a JSON array (union). Named types may reference each other and self-reference. The schema is fully validated: unknown types, duplicate names, invalid defaults, etc. all return errors.

To parse schemas that reference named types from other schemas, use SchemaCache.

func Resolve

func Resolve(writer, reader *Schema) (*Schema, error)

Resolve returns a schema that decodes data written with the writer schema and produces values matching the reader schema's layout. The writer schema is what the data was encoded with (typically from an [ocf] file header or a schema registry); the reader schema is what your application expects now.

Decoding with the returned schema handles field addition (defaults), field removal (skip), renaming (aliases), reordering, and type promotion. Encoding with it uses the reader's format.

If the schemas have identical canonical forms, reader is returned as-is. Otherwise CheckCompatibility is called first and any incompatibility is returned as a *CompatibilityError. See the package-level documentation for a full example.

Note: the argument order is (writer, reader), matching source-then-destination convention and Java's GenericDatumReader. This differs from the Avro spec text and hamba/avro, which put reader first.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	// v1 wrote User with just a name.
	writerSchema := avro.MustParse(`{
		"type": "record", "name": "User",
		"fields": [{"name": "name", "type": "string"}]
	}`)

	// v2 added an email field with a default.
	readerSchema := avro.MustParse(`{
		"type": "record", "name": "User",
		"fields": [
			{"name": "name",  "type": "string"},
			{"name": "email", "type": "string", "default": ""}
		]
	}`)

	resolved, err := avro.Resolve(writerSchema, readerSchema)
	if err != nil {
		log.Fatal(err)
	}

	// Encode a v1 record (name only).
	v1Data, err := writerSchema.Encode(map[string]any{"name": "Alice"})
	if err != nil {
		log.Fatal(err)
	}

	// Decode old data into the new layout; email gets the default.
	type User struct {
		Name  string `avro:"name"`
		Email string `avro:"email"`
	}
	var u User
	if _, err := resolved.Decode(v1Data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("name=%s email=%q\n", u.Name, u.Email)
}
Output:

name=Alice email=""

func (*Schema) AppendEncode

func (s *Schema) AppendEncode(dst []byte, v any) ([]byte, error)

AppendEncode appends the Avro binary encoding of v to dst. See Schema.Decode for the Go-to-Avro type mapping.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`"string"`)

	// AppendEncode reuses a buffer across calls, avoiding allocation.
	var buf []byte
	var err error
	for _, s := range []string{"hello", "world"} {
		buf, err = schema.AppendEncode(buf[:0], s)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("encoded %q: %d bytes\n", s, len(buf))
	}
}
Output:

encoded "hello": 6 bytes
encoded "world": 6 bytes
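Why 6 bytes: the Avro spec encodes a string as a zigzag-varint length followed by the raw UTF-8 bytes, so "hello" is one length byte (zigzag(5) = 10 = 0x0a) plus five content bytes. A standalone sketch of that encoding, independent of this package:

```go
package main

import "fmt"

// appendZigzagVarint encodes n as Avro's zigzag varint: zigzag maps
// small magnitudes (positive or negative) to small unsigned values,
// which the varint then stores in as few 7-bit groups as possible.
func appendZigzagVarint(dst []byte, n int64) []byte {
	u := uint64(n<<1) ^ uint64(n>>63) // zigzag transform
	for u >= 0x80 {
		dst = append(dst, byte(u)|0x80)
		u >>= 7
	}
	return append(dst, byte(u))
}

// appendAvroString encodes s as a zigzag-varint length plus the bytes.
func appendAvroString(dst []byte, s string) []byte {
	dst = appendZigzagVarint(dst, int64(len(s)))
	return append(dst, s...)
}

func main() {
	buf := appendAvroString(nil, "hello")
	fmt.Printf("% x (%d bytes)\n", buf, len(buf)) // 0a 68 65 6c 6c 6f (6 bytes)
}
```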

func (*Schema) AppendSingleObject

func (s *Schema) AppendSingleObject(dst []byte, v any) ([]byte, error)

AppendSingleObject appends a Single Object Encoding of v to dst: 2-byte magic, 8-byte CRC-64-AVRO fingerprint, then the Avro binary payload.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`{
		"type": "record",
		"name": "Event",
		"fields": [
			{"name": "id",   "type": "long"},
			{"name": "name", "type": "string"}
		]
	}`)

	type Event struct {
		ID   int64  `avro:"id"`
		Name string `avro:"name"`
	}

	// Encode: 2-byte magic + 8-byte fingerprint + Avro payload.
	data, err := schema.AppendSingleObject(nil, &Event{ID: 1, Name: "click"})
	if err != nil {
		log.Fatal(err)
	}

	// Decode.
	var e Event
	if _, err := schema.DecodeSingleObject(data, &e); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("id=%d name=%s\n", e.ID, e.Name)
}
Output:

id=1 name=click

func (*Schema) Canonical

func (s *Schema) Canonical() []byte

Canonical returns the Parsing Canonical Form of the schema, stripping doc, aliases, defaults, and other non-essential attributes. The result is deterministic and suitable for comparison and fingerprinting.

func (*Schema) Decode

func (s *Schema) Decode(src []byte, v any) ([]byte, error)

Decode reads Avro binary from src into v and returns the remaining bytes. v must be a non-nil pointer to a type compatible with the schema:

  • null: any (always decodes to nil)
  • boolean: bool, any
  • int, long: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, any
  • float: float32, float64, any
  • double: float64, float32, any
  • string: string, []byte, any; also encoding.TextUnmarshaler
  • bytes: []byte, string, any
  • enum: string, int/int8/.../uint64 (ordinal), any
  • fixed: [N]byte, []byte, any
  • array: slice, any
  • map: map[string]T, any
  • union: any, *T (for ["null", T] unions), or the matched branch type
  • record: struct (matched by field name or `avro` tag), map[string]any, any

When decoding into any (interface{}), values are returned as their natural Go types: nil, bool, int64, float32, float64, string, []byte, []any, and map[string]any (for both maps and records).

func (*Schema) DecodeSingleObject

func (s *Schema) DecodeSingleObject(data []byte, v any) ([]byte, error)

DecodeSingleObject decodes a Single Object Encoding message into v after verifying the magic and fingerprint match this schema.

func (*Schema) Encode

func (s *Schema) Encode(v any) ([]byte, error)

Encode encodes v as Avro binary. It is shorthand for AppendEncode(nil, v).

func (*Schema) Fingerprint

func (s *Schema) Fingerprint(h hash.Hash) []byte

Fingerprint hashes the schema's canonical form with h. Use NewRabin for CRC-64-AVRO or crypto/sha256 for cross-language compatibility.

func (*Schema) String

func (s *Schema) String() string

String returns the original JSON passed to Parse, preserving all attributes (doc, aliases, defaults, etc.) unlike Schema.Canonical.

type SchemaCache

type SchemaCache struct {
	// contains filtered or unexported fields
}

SchemaCache accumulates named types across multiple SchemaCache.Parse calls, allowing schemas to reference types defined in previously parsed schemas. This is useful for Schema Registry integrations where schemas have references to other schemas.

Schemas must be parsed in dependency order: referenced types must be parsed before the schemas that reference them.

The returned *Schema from each Parse call is fully resolved and independent of the cache — it can be used for Schema.Encode and Schema.Decode without the cache.

A SchemaCache is safe for concurrent use.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	cache := avro.NewSchemaCache()

	// Parse the Address type first.
	if _, err := cache.Parse(`{
		"type": "record",
		"name": "Address",
		"fields": [
			{"name": "street", "type": "string"},
			{"name": "city",   "type": "string"}
		]
	}`); err != nil {
		log.Fatal(err)
	}

	// User references Address by name.
	schema, err := cache.Parse(`{
		"type": "record",
		"name": "User",
		"fields": [
			{"name": "name",    "type": "string"},
			{"name": "address", "type": "Address"}
		]
	}`)
	if err != nil {
		log.Fatal(err)
	}

	type Address struct {
		Street string `avro:"street"`
		City   string `avro:"city"`
	}
	type User struct {
		Name    string  `avro:"name"`
		Address Address `avro:"address"`
	}

	data, err := schema.Encode(&User{
		Name:    "Alice",
		Address: Address{Street: "123 Main St", City: "Springfield"},
	})
	if err != nil {
		log.Fatal(err)
	}

	var u User
	if _, err := schema.Decode(data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s lives at %s, %s\n", u.Name, u.Address.Street, u.Address.City)
}
Output:

Alice lives at 123 Main St, Springfield

func NewSchemaCache

func NewSchemaCache() *SchemaCache

NewSchemaCache returns a new empty SchemaCache.

func (*SchemaCache) Parse

func (c *SchemaCache) Parse(schema string) (*Schema, error)

Parse parses a schema string, registering any named types (records, enums, fixed) in the cache. Named types from previous Parse calls are available for reference resolution. On failure, the cache is not modified.

type SemanticError

type SemanticError struct {
	// GoType is the Go type involved, if applicable.
	GoType reflect.Type
	// AvroType is the Avro schema type (e.g. "int", "record", "boolean").
	AvroType string
	// Field is the record field name, if within a record.
	Field string
	// Err is the underlying error.
	Err error
}

SemanticError indicates a Go type is incompatible with an Avro schema type during encoding or decoding.

func (*SemanticError) Error

func (e *SemanticError) Error() string

func (*SemanticError) Unwrap

func (e *SemanticError) Unwrap() error

type ShortBufferError

type ShortBufferError struct {
	// Type is what was being read (e.g. "boolean", "string", "uint32").
	Type string
	// Need is the number of bytes required (0 if unknown).
	Need int
	// Have is the number of bytes available.
	Have int
}

ShortBufferError indicates the input buffer is too short for the value being decoded.

func (*ShortBufferError) Error

func (e *ShortBufferError) Error() string

Directories

Path Synopsis
Package ocf implements Avro [Object Container Files] (OCF).
