avro

package module
v0.0.0-...-f64d274
Published: Mar 1, 2026 License: MIT Imports: 19 Imported by: 0

README

avro

Encode and decode Avro binary data.

Parse an Avro JSON schema, then encode and decode Go values directly — no code generation required. Supports all primitive and complex types, logical types, schema evolution, Object Container Files, Single Object Encoding, and fingerprinting.

Quick Start

package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

var schema = avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

type User struct {
	Name string `avro:"name"`
	Age  int    `avro:"age"`
}

func main() {
	// Encode
	data, err := schema.Encode(&User{Name: "Alice", Age: 30})
	if err != nil {
		log.Fatal(err)
	}

	// Decode
	var u User
	_, err = schema.Decode(data, &u)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u) // {Alice 30}
}

Type Mapping

The table below shows which Go types can be used with each Avro type when encoding and decoding.

Avro type   Go types
null        any (always nil)
boolean     bool, any
int, long   int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, any
float       float32, float64, any
double      float64, float32, any
string      string, []byte, any; also encoding.TextUnmarshaler
bytes       []byte, string, any
enum        string, any integer type (ordinal), any
fixed       [N]byte, []byte, any
array       slice, any
map         map[string]T, any
union       any, *T (for ["null", T] unions), or the matched branch type
record      struct (matched by field name or avro tag), map[string]any, any

When decoding into any, values use their natural Go types: nil, bool, int64, float32, float64, string, []byte, []any, map[string]any.
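A type switch is the usual way to consume those generic values. The helper below is purely illustrative (it is not part of this package's API) and operates only on the plain Go kinds listed above, so it stands alone:

```go
package main

import (
	"fmt"
	"sort"
)

// describe renders a generically decoded Avro value as "path: value"
// lines. The input is plain Go data of the kinds listed above (nil,
// bool, int64, float64, string, []byte, []any, map[string]any), so
// this sketch does not need the avro package.
func describe(prefix string, v any, out *[]string) {
	switch x := v.(type) {
	case nil:
		*out = append(*out, prefix+": null")
	case map[string]any: // record or map: recurse in sorted key order
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			describe(prefix+"."+k, x[k], out)
		}
	case []any: // array
		for i, e := range x {
			describe(fmt.Sprintf("%s[%d]", prefix, i), e, out)
		}
	default: // bool, int64, float32/float64, string, []byte
		*out = append(*out, fmt.Sprintf("%s: %v", prefix, x))
	}
}

func main() {
	// Shape a generic decode of a User record would produce.
	v := map[string]any{"name": "Alice", "age": int64(30)}
	var lines []string
	describe("user", v, &lines)
	for _, l := range lines {
		fmt.Println(l)
	}
}
```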

Encoding also accepts fmt.Stringer and encoding.TextMarshaler for string schema types.
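A sketch of what such a type looks like, assuming the usual marshal/unmarshal pairing; only the text round-trip itself is exercised here, independent of the avro package:

```go
package main

import "fmt"

// Level is a custom type that could back an Avro "string" field via
// encoding.TextMarshaler / encoding.TextUnmarshaler, per the mapping
// above. Only the text round-trip is shown; wiring it into a record
// works like any other struct field.
type Level int

const (
	Info Level = iota
	Warn
	Error
)

var levelNames = [...]string{"INFO", "WARN", "ERROR"}

// MarshalText implements encoding.TextMarshaler.
func (l Level) MarshalText() ([]byte, error) {
	if l < Info || l > Error {
		return nil, fmt.Errorf("unknown level %d", int(l))
	}
	return []byte(levelNames[l]), nil
}

// UnmarshalText implements encoding.TextUnmarshaler.
func (l *Level) UnmarshalText(text []byte) error {
	for i, name := range levelNames {
		if name == string(text) {
			*l = Level(i)
			return nil
		}
	}
	return fmt.Errorf("unknown level %q", text)
}

func main() {
	b, _ := Warn.MarshalText()
	var l Level
	_ = l.UnmarshalText(b)
	fmt.Println(string(b), int(l)) // WARN 1
}
```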

Struct Tags

Struct fields are mapped to Avro record fields using the avro tag:

type Example struct {
    Name    string `avro:"name"`           // maps to Avro field "name"
    Ignored int    `avro:"-"`              // skipped entirely
    Inner   Nested `avro:",inline"`        // inlines nested struct fields into parent
    Email   *string `avro:",omitzero"`     // encodes as null when zero (for ["null", T] unions)
}

Without a tag, the exported Go field name is used. Embedded (anonymous) structs are inlined automatically unless they have an explicit avro:"name" tag.
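The name-resolution rules above can be sketched with reflect. avroFieldNames is a hypothetical helper for illustration only (it ignores the inline/omitzero options and embedded structs), not this package's implementation:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// avroFieldNames sketches the tag rules above: use the avro tag's
// name if present, skip "-", and fall back to the exported Go field
// name.
func avroFieldNames(v any) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		tag := f.Tag.Get("avro")
		name, _, _ := strings.Cut(tag, ",")
		switch name {
		case "-":
			continue // skipped entirely
		case "":
			name = f.Name // no tag: exported field name
		}
		names = append(names, name)
	}
	return names
}

func main() {
	type Example struct {
		Name    string `avro:"name"`
		Ignored int    `avro:"-"`
		Plain   string
	}
	fmt.Println(avroFieldNames(Example{})) // [name Plain]
}
```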

Logical Types

Logical types are decoded into natural Go types when available, and fall back to the underlying Avro type otherwise.

Logical type              Avro type            Go type
date                      int                  time.Time or int types
time-millis               int                  time.Duration or int types
time-micros               long                 time.Duration or int types
timestamp-millis          long                 time.Time or int types
timestamp-micros          long                 time.Time or int types
timestamp-nanos           long                 time.Time or int types
local-timestamp-millis    long                 time.Time or int types
local-timestamp-micros    long                 time.Time or int types
local-timestamp-nanos     long                 time.Time or int types
uuid                      string or fixed(16)  [16]byte (RFC 4122 hex-dash ↔ binary) or string
decimal                   bytes or fixed       underlying bytes/fixed
duration                  fixed(12)            underlying fixed

Unknown logical types are silently ignored per the Avro spec, and the underlying type is used as-is.

Schema Evolution

Avro data is always written with a specific schema — the writer schema. When you read that data later, your application may expect a different schema — the reader schema. You may have added a field, removed one, or widened a type from int to long.

Resolve bridges this gap. Given the writer and reader schemas, it returns a new schema that decodes data in the old wire format and produces values in the reader's layout:

  • Fields in the reader but not the writer are filled from defaults.
  • Fields in the writer but not the reader are skipped.
  • Fields that exist in both are matched by name (or alias) and decoded, with type promotion applied where needed (e.g. int → long).

Example

Suppose v1 of your application wrote User records with just a name:

var writerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}`)

In v2 you added an email field with a default:

var readerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name",  "type": "string"},
        {"name": "email", "type": "string", "default": ""}
    ]
}`)

type User struct {
    Name  string `avro:"name"`
    Email string `avro:"email"`
}

To read old v1 data with your v2 struct, resolve the two schemas:

resolved, err := avro.Resolve(writerSchema, readerSchema)

var u User
_, err = resolved.Decode(v1Data, &u)
// u == User{Name: "Alice", Email: ""}

The following type promotions are supported:

Writer → Reader
int → long, float, double
long → float, double
float → double
string ↔ bytes
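The promotion rules fit in a small lookup. promotable is an illustrative helper over plain Avro type names, not part of this package's API:

```go
package main

import "fmt"

// promotable sketches the table above: can a value written as
// writer be read as reader? Equal types are always compatible;
// otherwise only the listed widenings (and string ↔ bytes) apply.
func promotable(writer, reader string) bool {
	if writer == reader {
		return true
	}
	ok := map[string]map[string]bool{
		"int":    {"long": true, "float": true, "double": true},
		"long":   {"float": true, "double": true},
		"float":  {"double": true},
		"string": {"bytes": true},
		"bytes":  {"string": true},
	}
	return ok[writer][reader]
}

func main() {
	fmt.Println(promotable("int", "long"))   // true
	fmt.Println(promotable("double", "int")) // false: narrowing is not allowed
}
```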

CheckCompatibility checks whether two schemas are compatible without building a resolved schema. The direction you check depends on the guarantee you need:

// Backward: new schema can read old data.
avro.CheckCompatibility(oldSchema, newSchema)

// Forward: old schema can read new data.
avro.CheckCompatibility(newSchema, oldSchema)

// Full: check both directions.
avro.CheckCompatibility(oldSchema, newSchema)
avro.CheckCompatibility(newSchema, oldSchema)

Object Container Files

The ocf sub-package reads and writes Avro Object Container Files — self-describing binary files that embed the schema in the header and store data in compressed blocks.

Writing

var schema = avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

f, _ := os.Create("users.avro")
w, err := ocf.NewWriter(f, schema, ocf.WithCodec(ocf.SnappyCodec()))
if err != nil {
    log.Fatal(err)
}
w.Encode(&User{Name: "Alice", Age: 30})
w.Encode(&User{Name: "Bob", Age: 25})
w.Close()
f.Close()

Reading

f, _ := os.Open("users.avro")
r, err := ocf.NewReader(f)
if err != nil {
    log.Fatal(err)
}
defer r.Close()
for {
    var u User
    err := r.Decode(&u)
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(u)
}

The reader's Schema() method returns the schema parsed from the file header, which you can pass as the writer schema to Resolve.

Codecs

Built-in codecs: null (default, no compression), deflate (DeflateCodec), snappy (SnappyCodec), and zstandard (ZstdCodec). Custom codecs can be provided via the Codec interface.

Appending

NewAppendWriter opens an existing OCF for appending — it reads the header to recover the schema, codec, and sync marker, then seeks to the end.

Single Object Encoding

For sending self-describing values over the wire (as opposed to files, where OCF is preferred), use Single Object Encoding. Each message is a 2-byte magic header, an 8-byte CRC-64-AVRO fingerprint, and the Avro binary payload.

// Encode with fingerprint header
data, err := schema.AppendSingleObject(nil, &user)

// Decode (schema known)
_, err = schema.DecodeSingleObject(data, &user)

// Decode (schema unknown): extract fingerprint, look up schema
fp, payload, err := avro.SingleObjectFingerprint(data)
schema := registry.Lookup(fp) // your schema registry
_, err = schema.Decode(payload, &user)
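The wire layout is fixed by the Avro specification — magic bytes 0xC3 0x01, an 8-byte little-endian fingerprint, then the payload — so framing can be sketched without the library. These helper names are illustrative, not the package's API:

```go
package main

import (
	"bytes"
	"fmt"
)

// Per the Avro spec, a single-object message is the two magic bytes
// 0xC3 0x01, an 8-byte little-endian CRC-64-AVRO schema fingerprint,
// and then the Avro binary payload.
var soeMagic = [2]byte{0xC3, 0x01}

func wrapSingleObject(fp [8]byte, payload []byte) []byte {
	out := make([]byte, 0, 10+len(payload))
	out = append(out, soeMagic[:]...)
	out = append(out, fp[:]...)
	return append(out, payload...)
}

func splitSingleObject(msg []byte) (fp [8]byte, payload []byte, err error) {
	if len(msg) < 10 || !bytes.Equal(msg[:2], soeMagic[:]) {
		return fp, nil, fmt.Errorf("not a single-object message")
	}
	copy(fp[:], msg[2:10])
	return fp, msg[10:], nil
}

func main() {
	fp := [8]byte{1, 2, 3, 4, 5, 6, 7, 8} // placeholder fingerprint
	msg := wrapSingleObject(fp, []byte("payload"))
	gotFP, body, err := splitSingleObject(msg)
	fmt.Println(err == nil, gotFP == fp, string(body)) // true true payload
}
```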

Fingerprinting

Canonical returns the Parsing Canonical Form of a schema — a deterministic JSON representation stripped of doc, aliases, defaults, and other non-essential attributes. Use it for schema comparison and fingerprinting.

canonical := schema.Canonical() // []byte

// CRC-64-AVRO (Rabin) — the Avro-standard fingerprint
fp := schema.Fingerprint(avro.NewRabin())

// SHA-256 — common for cross-language registries
fp256 := schema.Fingerprint(sha256.New())

Performance

Struct field access uses unsafe pointer arithmetic (similar to encoding/json v2) to avoid reflect.Value overhead on every encode/decode. All schemas, type mappings, and codec state are cached after first use so repeated operations pay no extra allocation cost.

Documentation

Overview

Package avro encodes and decodes Avro binary data.

Parse an Avro JSON schema with Parse (or MustParse for package-level vars), then call Schema.Encode / Schema.Decode to convert between Go values and Avro binary. See Schema.Decode for the full Go-to-Avro type mapping.

Basic usage

schema := avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
    ]
}`)

type User struct {
    Name string `avro:"name"`
    Age  int    `avro:"age"`
}

// Encode
data, err := schema.Encode(&User{Name: "Alice", Age: 30})

// Decode
var u User
_, err = schema.Decode(data, &u)

Schema evolution

Avro data is always written with a specific schema — the "writer schema." When you read that data later, your application may expect a different schema — the "reader schema." For example, you may have added a field, removed one, or widened a type from int to long. The data on disk doesn't change, but your code expects the new layout.

Resolve bridges this gap. Given the writer and reader schemas, it returns a new schema that knows how to decode the old wire format and produce values in the reader's layout:

  • Fields in the reader but not the writer are filled from defaults.
  • Fields in the writer but not the reader are skipped.
  • Fields that exist in both are matched by name (or alias) and decoded, with type promotion applied where needed (e.g. int → long).

You typically get the writer schema from the data itself: an [ocf] file header embeds it, and schema registries store it by ID or fingerprint.

As a concrete example, suppose v1 of your application wrote User records with just a name:

var writerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}`)

In v2, you added an email field with a default:

var readerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name",  "type": "string"},
        {"name": "email", "type": "string", "default": ""}
    ]
}`)

type User struct {
    Name  string `avro:"name"`
    Email string `avro:"email"`
}

To read old v1 data with your v2 struct, resolve the two schemas:

resolved, err := avro.Resolve(writerSchema, readerSchema)

// Decode v1 data: "email" is absent in the old data, so it gets
// the reader default ("").
var u User
_, err = resolved.Decode(v1Data, &u)
// u == User{Name: "Alice", Email: ""}

If you just want to check whether two schemas are compatible without building a resolved schema, use CheckCompatibility.

Single Object Encoding

For sending self-describing values over the wire (as opposed to files, where [ocf] is preferred), use Schema.AppendSingleObject and Schema.DecodeSingleObject. To decode without knowing the schema in advance, extract the fingerprint with SingleObjectFingerprint and look it up in your own registry.

Fingerprinting

Schema.Canonical returns the Parsing Canonical Form for deterministic comparison. Schema.Fingerprint hashes it with any hash.Hash; use NewRabin for the Avro-standard CRC-64-AVRO.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`{
		"type": "record",
		"name": "User",
		"fields": [
			{"name": "name", "type": "string"},
			{"name": "age",  "type": "int"}
		]
	}`)

	type User struct {
		Name string `avro:"name"`
		Age  int32  `avro:"age"`
	}

	data, err := schema.Encode(&User{Name: "Alice", Age: 30})
	if err != nil {
		log.Fatal(err)
	}

	var u User
	if _, err := schema.Decode(data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s is %d\n", u.Name, u.Age)
}
Output:

Alice is 30

Functions

func CheckCompatibility

func CheckCompatibility(writer, reader *Schema) error

CheckCompatibility reports whether data written with the writer schema can be read by the reader schema. It returns nil on success or a *CompatibilityError describing the first incompatibility.

See Resolve for a note on argument order.

func NewRabin

func NewRabin() hash.Hash64

NewRabin returns a hash.Hash64 computing the CRC-64-AVRO (Rabin) fingerprint defined by the Avro specification.
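The algorithm is defined in the Avro specification. The sketch below is a minimal port of the spec's reference fingerprint64 routine (not this package's implementation): the seed constant 0xc15d213aa4d7a795 doubles as the fingerprint of empty input.

```go
package main

import "fmt"

// rabinEmpty is the CRC-64-AVRO "empty" value from the Avro spec; it
// is both the seed and the fingerprint of zero-length input.
const rabinEmpty uint64 = 0xc15d213aa4d7a795

// rabinTable is the byte-at-a-time lookup table, built exactly as in
// the spec's initFPTable.
var rabinTable = func() [256]uint64 {
	var t [256]uint64
	for i := range t {
		fp := uint64(i)
		for j := 0; j < 8; j++ {
			fp = (fp >> 1) ^ (rabinEmpty & -(fp & 1))
		}
		t[i] = fp
	}
	return t
}()

// rabinFingerprint is a direct port of the fingerprint64 reference
// algorithm in the Avro specification.
func rabinFingerprint(buf []byte) uint64 {
	fp := rabinEmpty
	for _, b := range buf {
		fp = (fp >> 8) ^ rabinTable[byte(fp)^b]
	}
	return fp
}

func main() {
	// Fingerprint of the canonical form of the schema `"string"`.
	fmt.Printf("%#x\n", rabinFingerprint([]byte(`"string"`)))
}
```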

func SingleObjectFingerprint

func SingleObjectFingerprint(data []byte) (fp [8]byte, rest []byte, err error)

SingleObjectFingerprint extracts the 8-byte CRC-64-AVRO fingerprint and returns the remaining payload. Use this to look up the schema by fingerprint before decoding.

Types

type CompatibilityError

type CompatibilityError struct {
	// Path is the dotted path to the incompatible element (e.g. "User.address.zip").
	Path string
	// ReaderType is the Avro type in the reader schema.
	ReaderType string
	// WriterType is the Avro type in the writer schema.
	WriterType string
	// Detail describes the specific incompatibility.
	Detail string
}

CompatibilityError describes an incompatibility between a reader and writer schema, as returned by CheckCompatibility and Resolve.

func (*CompatibilityError) Error

func (e *CompatibilityError) Error() string

type Duration

type Duration struct {
	Months       uint32
	Days         uint32
	Milliseconds uint32
}

Duration represents the Avro duration logical type: a 12-byte fixed value containing three little-endian unsigned 32-bit integers representing months, days, and milliseconds.
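That wire form is plain encoding/binary work. The sketch below mirrors the Duration struct locally so it stands alone; the method names are illustrative, not the package's API:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Duration mirrors the type above. Its wire form is a 12-byte fixed:
// three little-endian uint32s for months, days, and milliseconds.
type Duration struct {
	Months, Days, Milliseconds uint32
}

// bytes packs the three counters little-endian into the 12-byte fixed.
func (d Duration) bytes() [12]byte {
	var b [12]byte
	binary.LittleEndian.PutUint32(b[0:4], d.Months)
	binary.LittleEndian.PutUint32(b[4:8], d.Days)
	binary.LittleEndian.PutUint32(b[8:12], d.Milliseconds)
	return b
}

// durationFromBytes is the inverse of bytes.
func durationFromBytes(b [12]byte) Duration {
	return Duration{
		Months:       binary.LittleEndian.Uint32(b[0:4]),
		Days:         binary.LittleEndian.Uint32(b[4:8]),
		Milliseconds: binary.LittleEndian.Uint32(b[8:12]),
	}
}

func main() {
	d := Duration{Months: 1, Days: 15, Milliseconds: 500}
	fmt.Println(durationFromBytes(d.bytes()) == d) // true
}
```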

type Schema

type Schema struct {
	// contains filtered or unexported fields
}

Schema is a compiled Avro schema. Create one with Parse or MustParse, then use Schema.Encode / Schema.Decode to convert between Go values and Avro binary. A Schema is safe for concurrent use.

func MustParse

func MustParse(schema string) *Schema

MustParse is like Parse but panics on error. Useful for package-level var declarations.

func Parse

func Parse(schema string) (*Schema, error)

Parse parses an Avro JSON schema string and returns a compiled *Schema. The input can be a primitive name (e.g. `"string"`), a JSON object (record, enum, array, map, fixed), or a JSON array (union). Named types may reference each other and self-reference. The schema is fully validated: unknown types, duplicate names, invalid defaults, etc. all return errors.

To parse schemas that reference named types from other schemas, use SchemaCache.

func Resolve

func Resolve(writer, reader *Schema) (*Schema, error)

Resolve returns a schema that decodes data written with the writer schema and produces values matching the reader schema's layout. The writer schema is what the data was encoded with (typically from an [ocf] file header or a schema registry); the reader schema is what your application expects now.

Decoding with the returned schema handles field addition (defaults), field removal (skip), renaming (aliases), reordering, and type promotion. Encoding with it uses the reader's format.

If the schemas have identical canonical forms, reader is returned as-is. Otherwise CheckCompatibility is called first and any incompatibility is returned as a *CompatibilityError. See the package-level documentation for a full example.

Note: the argument order is (writer, reader), matching source-then-destination convention and Java's GenericDatumReader. This differs from the Avro spec text and hamba/avro, which put reader first.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	// v1 wrote User with just a name.
	writerSchema := avro.MustParse(`{
		"type": "record", "name": "User",
		"fields": [{"name": "name", "type": "string"}]
	}`)

	// v2 added an email field with a default.
	readerSchema := avro.MustParse(`{
		"type": "record", "name": "User",
		"fields": [
			{"name": "name",  "type": "string"},
			{"name": "email", "type": "string", "default": ""}
		]
	}`)

	resolved, err := avro.Resolve(writerSchema, readerSchema)
	if err != nil {
		log.Fatal(err)
	}

	// Encode a v1 record (name only).
	v1Data, err := writerSchema.Encode(map[string]any{"name": "Alice"})
	if err != nil {
		log.Fatal(err)
	}

	// Decode old data into the new layout; email gets the default.
	type User struct {
		Name  string `avro:"name"`
		Email string `avro:"email"`
	}
	var u User
	if _, err := resolved.Decode(v1Data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("name=%s email=%q\n", u.Name, u.Email)
}
Output:

name=Alice email=""

func (*Schema) AppendEncode

func (s *Schema) AppendEncode(dst []byte, v any) ([]byte, error)

AppendEncode appends the Avro binary encoding of v to dst. See Schema.Decode for the Go-to-Avro type mapping.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`"string"`)

	// AppendEncode reuses a buffer across calls, avoiding allocation.
	var buf []byte
	var err error
	for _, s := range []string{"hello", "world"} {
		buf, err = schema.AppendEncode(buf[:0], s)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("encoded %q: %d bytes\n", s, len(buf))
	}
}
Output:

encoded "hello": 6 bytes
encoded "world": 6 bytes
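Why 6 bytes: the Avro spec encodes a string as a zigzag-varint length followed by the raw UTF-8 bytes, so "hello" is one length byte (zigzag(5) = 10 = 0x0a) plus five content bytes. A standalone sketch of that encoding, independent of this package:

```go
package main

import "fmt"

// appendZigzagVarint encodes n as Avro's zigzag varint: zigzag maps
// small magnitudes (positive or negative) to small unsigned values,
// which the varint then stores in as few 7-bit groups as possible.
func appendZigzagVarint(dst []byte, n int64) []byte {
	u := uint64(n<<1) ^ uint64(n>>63) // zigzag transform
	for u >= 0x80 {
		dst = append(dst, byte(u)|0x80)
		u >>= 7
	}
	return append(dst, byte(u))
}

// appendAvroString encodes s as a zigzag-varint length plus the bytes.
func appendAvroString(dst []byte, s string) []byte {
	dst = appendZigzagVarint(dst, int64(len(s)))
	return append(dst, s...)
}

func main() {
	buf := appendAvroString(nil, "hello")
	fmt.Printf("% x (%d bytes)\n", buf, len(buf)) // 0a 68 65 6c 6c 6f (6 bytes)
}
```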

func (*Schema) AppendSingleObject

func (s *Schema) AppendSingleObject(dst []byte, v any) ([]byte, error)

AppendSingleObject appends a Single Object Encoding of v to dst: 2-byte magic, 8-byte CRC-64-AVRO fingerprint, then the Avro binary payload.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	schema := avro.MustParse(`{
		"type": "record",
		"name": "Event",
		"fields": [
			{"name": "id",   "type": "long"},
			{"name": "name", "type": "string"}
		]
	}`)

	type Event struct {
		ID   int64  `avro:"id"`
		Name string `avro:"name"`
	}

	// Encode: 2-byte magic + 8-byte fingerprint + Avro payload.
	data, err := schema.AppendSingleObject(nil, &Event{ID: 1, Name: "click"})
	if err != nil {
		log.Fatal(err)
	}

	// Decode.
	var e Event
	if _, err := schema.DecodeSingleObject(data, &e); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("id=%d name=%s\n", e.ID, e.Name)
}
Output:

id=1 name=click

func (*Schema) Canonical

func (s *Schema) Canonical() []byte

Canonical returns the Parsing Canonical Form of the schema, stripping doc, aliases, defaults, and other non-essential attributes. The result is deterministic and suitable for comparison and fingerprinting.

func (*Schema) Decode

func (s *Schema) Decode(src []byte, v any) ([]byte, error)

Decode reads Avro binary from src into v and returns the remaining bytes. v must be a non-nil pointer to a type compatible with the schema:

  • null: any (always decodes to nil)
  • boolean: bool, any
  • int, long: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, any
  • float: float32, float64, any
  • double: float64, float32, any
  • string: string, []byte, any; also encoding.TextUnmarshaler
  • bytes: []byte, string, any
  • enum: string, int/int8/.../uint64 (ordinal), any
  • fixed: [N]byte, []byte, any
  • array: slice, any
  • map: map[string]T, any
  • union: any, *T (for ["null", T] unions), or the matched branch type
  • record: struct (matched by field name or `avro` tag), map[string]any, any

When decoding into any (interface{}), values are returned as their natural Go types: nil, bool, int64, float32, float64, string, []byte, []any, and map[string]any (for both maps and records).

func (*Schema) DecodeSingleObject

func (s *Schema) DecodeSingleObject(data []byte, v any) ([]byte, error)

DecodeSingleObject decodes a Single Object Encoding message into v after verifying the magic and fingerprint match this schema.

func (*Schema) Encode

func (s *Schema) Encode(v any) ([]byte, error)

Encode encodes v as Avro binary. It is shorthand for AppendEncode(nil, v).

func (*Schema) Fingerprint

func (s *Schema) Fingerprint(h hash.Hash) []byte

Fingerprint hashes the schema's canonical form with h. Use NewRabin for CRC-64-AVRO or crypto/sha256 for cross-language compatibility.

func (*Schema) String

func (s *Schema) String() string

String returns the original JSON passed to Parse, preserving all attributes (doc, aliases, defaults, etc.) unlike Schema.Canonical.

type SchemaCache

type SchemaCache struct {
	// contains filtered or unexported fields
}

SchemaCache accumulates named types across multiple SchemaCache.Parse calls, allowing schemas to reference types defined in previously parsed schemas. This is useful for Schema Registry integrations where schemas have references to other schemas.

Schemas must be parsed in dependency order: referenced types must be parsed before the schemas that reference them.

The returned *Schema from each Parse call is fully resolved and independent of the cache — it can be used for Schema.Encode and Schema.Decode without the cache.

A SchemaCache is safe for concurrent use.

Example
package main

import (
	"fmt"
	"log"

	"github.com/twmb/avro"
)

func main() {
	cache := avro.NewSchemaCache()

	// Parse the Address type first.
	if _, err := cache.Parse(`{
		"type": "record",
		"name": "Address",
		"fields": [
			{"name": "street", "type": "string"},
			{"name": "city",   "type": "string"}
		]
	}`); err != nil {
		log.Fatal(err)
	}

	// User references Address by name.
	schema, err := cache.Parse(`{
		"type": "record",
		"name": "User",
		"fields": [
			{"name": "name",    "type": "string"},
			{"name": "address", "type": "Address"}
		]
	}`)
	if err != nil {
		log.Fatal(err)
	}

	type Address struct {
		Street string `avro:"street"`
		City   string `avro:"city"`
	}
	type User struct {
		Name    string  `avro:"name"`
		Address Address `avro:"address"`
	}

	data, err := schema.Encode(&User{
		Name:    "Alice",
		Address: Address{Street: "123 Main St", City: "Springfield"},
	})
	if err != nil {
		log.Fatal(err)
	}

	var u User
	if _, err := schema.Decode(data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s lives at %s, %s\n", u.Name, u.Address.Street, u.Address.City)
}
Output:

Alice lives at 123 Main St, Springfield

func NewSchemaCache

func NewSchemaCache() *SchemaCache

NewSchemaCache returns a new empty SchemaCache.

func (*SchemaCache) Parse

func (c *SchemaCache) Parse(schema string) (*Schema, error)

Parse parses a schema string, registering any named types (records, enums, fixed) in the cache. Named types from previous Parse calls are available for reference resolution. On failure, the cache is not modified.

type SemanticError

type SemanticError struct {
	// GoType is the Go type involved, if applicable.
	GoType reflect.Type
	// AvroType is the Avro schema type (e.g. "int", "record", "boolean").
	AvroType string
	// Field is the record field name, if within a record.
	Field string
	// Err is the underlying error.
	Err error
}

SemanticError indicates a Go type is incompatible with an Avro schema type during encoding or decoding.

func (*SemanticError) Error

func (e *SemanticError) Error() string

func (*SemanticError) Unwrap

func (e *SemanticError) Unwrap() error

type ShortBufferError

type ShortBufferError struct {
	// Type is what was being read (e.g. "boolean", "string", "uint32").
	Type string
	// Need is the number of bytes required (0 if unknown).
	Need int
	// Have is the number of bytes available.
	Have int
}

ShortBufferError indicates the input buffer is too short for the value being decoded.

func (*ShortBufferError) Error

func (e *ShortBufferError) Error() string

Directories

Path Synopsis
Package ocf implements Avro [Object Container Files] (OCF).
