avro

package module
v0.0.19 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 7, 2023 License: MIT Imports: 17 Imported by: 0

README

avro

GoDoc

avro is a decoder for Apache AVRO that decodes directly into Go structs and follows naming from JSON tags. It is intended primarily for decoding output from Google's Big Query.

https://avro.apache.org/docs/1.8.1/spec.html

Documentation

Overview

Package avro is an AVRO decoder aimed principly at decoding AVRO output from Google's Big Query. It decodes directly into Go structs, and uses json tags as naming hints.

The primary interface to the package is ReadFile. This reads an AVRO file, combining the schema in the file with type information from the struct passed via the out parameter to decode the records. It then passes an instance of a struct of type out to the callback cb for each record in the file.

You can implement custom decoders for your own types and register them via the Register function. github.com/phil/avro/null is an example of custom decoders for the types defined in github.com/unravelin/null

Index

Constants

This section is empty.

Variables

View Source
var FileMagic = [4]byte{'O', 'b', 'j', 1}

Functions

func ReadFile

func ReadFile(r Reader, out interface{}, cb func(val unsafe.Pointer, rb *ResourceBank) error) error

ReadFile reads from an AVRO file. The records in the file are decoded into structs of the type indicated by out. These are fed back to the application via the cb callback. ReadFile calls cb with a pointer to the struct. The pointer is converted to an unsafe.Pointer. The pointer should not be retained by the application past the return of cb.

 var records []myrecord
 if err := ReadFile(f, myrecord{}, func(val unsafe.Pointer) error {
     records = append(records, *(*record)(val))
     return nil
 }); err != nil {
	    return err
 }

func Register

func Register(typ reflect.Type, f CodecBuildFunc)

Register is used to set a custom codec builder for a type

Types

type BoolCodec

type BoolCodec struct{}

func (BoolCodec) New

func (BoolCodec) New(r *Buffer) unsafe.Pointer

func (BoolCodec) Read

func (BoolCodec) Read(r *Buffer, p unsafe.Pointer) error

func (BoolCodec) Skip

func (BoolCodec) Skip(r *Buffer) error

type Buffer added in v0.0.3

type Buffer struct {
	// contains filtered or unexported fields
}

Buffer is a very simple replacement for bytes.Reader that avoids data copies

func NewBuffer added in v0.0.3

func NewBuffer(data []byte) *Buffer

NewBuffer returns a new Buffer.

func (*Buffer) Alloc added in v0.0.6

func (d *Buffer) Alloc(rtyp reflect.Type) unsafe.Pointer

Alloc allocates a pointer to the type rtyp. The data is allocated in a ResourceBank

func (*Buffer) ExtractResourceBank added in v0.0.14

func (d *Buffer) ExtractResourceBank() *ResourceBank

ExtractResourceBank extracts the current ResourceBank from the buffer, and replaces it with a fresh one.

func (*Buffer) Len added in v0.0.3

func (d *Buffer) Len() int

Len returns the length of unread data in the buffer

func (*Buffer) Next added in v0.0.3

func (d *Buffer) Next(l int) ([]byte, error)

Next returns the next l bytes from the buffer. It does so without copying, so if you hold onto the data you risk holding onto a lot of data. If l exceeds the remaining space Next returns io.EOF

func (*Buffer) NextAsString added in v0.0.6

func (d *Buffer) NextAsString(l int) (string, error)

NextAsString returns the next l bytes from the buffer as a string. The string data is held in a StringBank and will be valid only until someone calls Close on that bank. If l exceeds the remaining space NextAsString returns io.EOF

func (*Buffer) ReadByte added in v0.0.3

func (d *Buffer) ReadByte() (byte, error)

ReadByte returns the next byte from the buffer. If no bytes are left it returns io.EOF

func (*Buffer) Reset added in v0.0.3

func (d *Buffer) Reset(data []byte)

Reset allows you to reuse a buffer with a new set of data

func (*Buffer) Varint added in v0.0.4

func (d *Buffer) Varint() (int64, error)

Varint reads a varint from the buffer

type BytesCodec

type BytesCodec struct{}

func (BytesCodec) New

func (BytesCodec) New(r *Buffer) unsafe.Pointer

func (BytesCodec) Read

func (BytesCodec) Read(r *Buffer, ptr unsafe.Pointer) error

func (BytesCodec) Skip

func (BytesCodec) Skip(r *Buffer) error

type Codec

type Codec interface {
	// Read reads the wire format bytes for the current field from r and sets up
	// the value that p points to. The codec can assume that the memory for an
	// instance of the type for which the codec is registered is present behind
	// p
	Read(r *Buffer, p unsafe.Pointer) error
	// Skip advances the reader over the bytes for the current field.
	Skip(r *Buffer) error
	// New creates a pointer to the type for which the codec is registered. It is
	// used if the enclosing record has a field that is a pointer to this type
	New(r *Buffer) unsafe.Pointer
}

Codec defines a decoder for a type. It may eventually define an encoder too. You can write custom Codecs for types. See Register and CodecBuildFunc

type CodecBuildFunc

type CodecBuildFunc func(schema Schema, typ reflect.Type) (Codec, error)

CodecBuildFunc is the function signature for a codec builder. If you want to customise AVRO decoding for a type register a CodecBuildFunc via the Register call. Schema is the AVRO schema for the type to build. typ should match the type the function was registered under.

type Compression added in v0.0.18

type Compression string
const (
	CompressionNull    Compression = "null"
	CompressionDeflate Compression = "deflate"
	CompressionSnappy  Compression = "snappy"
)

type DoubleCodec

type DoubleCodec = floatCodec[float64]

type FileHeader

type FileHeader struct {
	Magic [4]byte           `json:"magic"`
	Meta  map[string][]byte `json:"meta"`
	Sync  [16]byte          `json:"sync"`
}

FileHeader represents an AVRO file header

type FileWriter added in v0.0.18

type FileWriter struct {
	// contains filtered or unexported fields
}

FileWriter provides limited support for writing AVRO files. It allows you to write blocks of already encoded data. Actually encoding data as AVRO is not yet supported.

func NewFileWriter added in v0.0.18

func NewFileWriter(schema []byte, compression Compression) (*FileWriter, error)

NewFileWriter creates a new FileWriter. The schema is the JSON encoded schema. The compression parameter indicates the compression codec to use.

func (*FileWriter) AppendHeader added in v0.0.18

func (f *FileWriter) AppendHeader(buf []byte) []byte

AppendHeader appends the AVRO file header to the provided buffer.

func (*FileWriter) WriteBlock added in v0.0.18

func (f *FileWriter) WriteBlock(w io.Writer, rowCount int, block []byte) error

WriteBlock writes a block of data to the writer. The block must be rowCount rows of AVRO encoded data.

func (*FileWriter) WriteHeader added in v0.0.18

func (f *FileWriter) WriteHeader(w io.Writer) error

WriteHeader writes the AVRO file header to the writer.

type Float32DoubleCodec

type Float32DoubleCodec struct {
	DoubleCodec
}

func (Float32DoubleCodec) New

func (Float32DoubleCodec) Read

type FloatCodec

type FloatCodec = floatCodec[float32]

type Int16Codec

type Int16Codec = IntCodec[int16]

type Int32Codec

type Int32Codec = IntCodec[int32]

type Int64Codec

type Int64Codec = IntCodec[int64]

type IntCodec added in v0.0.13

type IntCodec[T int64 | int32 | int16] struct{}

Int64Codec is an avro codec for int64

func (IntCodec[T]) New added in v0.0.13

func (IntCodec[T]) New(r *Buffer) unsafe.Pointer

New creates a pointer to a new int64

func (IntCodec[T]) Read added in v0.0.13

func (IntCodec[T]) Read(r *Buffer, p unsafe.Pointer) error

func (IntCodec[T]) Skip added in v0.0.13

func (IntCodec[T]) Skip(r *Buffer) error

Skip skips over an int

type MapCodec

type MapCodec struct {
	// contains filtered or unexported fields
}

MapCodec is a decoder for map types. The key must always be string

func (*MapCodec) New

func (m *MapCodec) New(r *Buffer) unsafe.Pointer

func (*MapCodec) Read

func (m *MapCodec) Read(r *Buffer, p unsafe.Pointer) error

func (*MapCodec) Skip

func (m *MapCodec) Skip(r *Buffer) error

type PointerCodec added in v0.0.6

type PointerCodec struct {
	Codec
}

func (*PointerCodec) New added in v0.0.6

func (c *PointerCodec) New(r *Buffer) unsafe.Pointer

func (*PointerCodec) Read added in v0.0.6

func (c *PointerCodec) Read(r *Buffer, p unsafe.Pointer) error

type Reader

type Reader interface {
	io.Reader
	io.ByteReader
}

Reader combines io.ByteReader and io.Reader. It's what we need to read

type ResourceBank added in v0.0.6

type ResourceBank struct {
	// contains filtered or unexported fields
}

ResourceBank is used to allocate memory used to create structs to decode AVRO into. The primary reason for having it is to allow the user to flag the memory can be re-used, so reducing the strain on the GC

We allocate using the required type of thing so the GC can still inspect within the memory.

func (*ResourceBank) Alloc added in v0.0.6

func (rb *ResourceBank) Alloc(rtyp reflect.Type) unsafe.Pointer

Alloc reserves some memory in the ResourceBank. Note that this memory may be re-used after Close is called.

func (*ResourceBank) Close added in v0.0.6

func (rb *ResourceBank) Close()

Close marks the resources in the ResourceBank as available for re-use

func (*ResourceBank) ToString added in v0.0.6

func (rb *ResourceBank) ToString(in []byte) string

ToString saves string data in the bank and returns a string. The string is valid until someone calls Close

type Schema

type Schema struct {
	Type   string
	Object *SchemaObject
	Union  []Schema
}

Schema is a representation of AVRO schema JSON. Primitive types populate Type only. UnionTypes populate Type and Union fields. All other types populate Type and a subset of Object fields.

func FileSchema added in v0.0.11

func FileSchema(filename string) (Schema, error)

FileSchema reads the Schema from an AVRO file.

func SchemaFromString added in v0.0.14

func SchemaFromString(in string) (Schema, error)

func (Schema) Codec added in v0.0.14

func (s Schema) Codec(out interface{}) (Codec, error)

Codec creates a codec for the given schema and output type

type SchemaObject

type SchemaObject struct {
	Type        string `json:"type"`
	LogicalType string `json:"logicalType,omitempty"`
	Name        string `json:"name,omitempty"`
	Namespace   string `json:"namespace,omitempty"`
	// Fields in a record
	Fields []SchemaRecordField `json:"fields,omitempty"`
	// The type of each item in an array
	Items Schema `json:"items,omitempty"`
	// The value types of a map (keys are strings)
	Values Schema `json:"values,omitempty"`
	// The size of a fixed type
	Size int `json:"size,omitempty"`
	// The values of an enum
	Symbols []string `json:"symbols,omitempty"`
}

SchemaObject contains all the fields of more complex schema types

type SchemaRecordField

type SchemaRecordField struct {
	Name string `json:"name,omitempty"`
	Type Schema `json:"type,omitempty"`
}

SchemaRecordField represents one field of a Record schema

type StringCodec

type StringCodec struct{}

StringCodec is a decoder for strings

func (StringCodec) New

func (StringCodec) New(r *Buffer) unsafe.Pointer

func (StringCodec) Read

func (StringCodec) Read(r *Buffer, ptr unsafe.Pointer) error

func (StringCodec) Skip

func (StringCodec) Skip(r *Buffer) error

Directories

Path Synopsis
Package null contains avro decoders for the types in github.com/unravelin/null.
Package null contains avro decoders for the types in github.com/unravelin/null.
Package time contains avro decoders for time.Time.
Package time contains avro decoders for time.Time.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL