bufarrow

v0.5.1
Published: Mar 13, 2025 License: Apache-2.0 Imports: 23 Imported by: 2

README

bufarrow 🦬

Go library to build Apache Arrow records from Protocol Buffers

Features

  • generate Arrow and Parquet schemas from Protobuf structs
  • build Arrow records from Protobuf

🚀 Install

Use go get to install the latest version of the library.

go get -u github.com/loicalleyne/bufarrow@latest

💡 Usage

You can import bufarrow using:

import "github.com/loicalleyne/bufarrow"

💫 Show your support

Give a ⭐️ if this project helped you! Feedback and PRs welcome.

License

Bufarrow is released under the Apache 2.0 license. See LICENCE.txt

Documentation

Constants

This section is empty.

Variables

var ErrMxDepth = errors.New("max depth reached, either the message is deeply nested or a circular dependency was introduced")
var ErrPathNotFound = errors.New("path not found")
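
Sentinel errors like these are usually matched with errors.Is rather than by comparing message strings. The sketch below uses a local stand-in, errPathNotFound, so it compiles without the module; lookup and isPathNotFound are hypothetical helpers, and whether bufarrow wraps its sentinels before returning them is an assumption, not something this page states.

```go
package main

import (
	"errors"
	"fmt"
)

// errPathNotFound is a local stand-in for bufarrow.ErrPathNotFound.
// Real code would compare against the exported variable instead.
var errPathNotFound = errors.New("path not found")

// lookup is a hypothetical helper that wraps the sentinel with context.
func lookup(path string) error {
	return fmt.Errorf("resolving %q: %w", path, errPathNotFound)
}

// isPathNotFound reports whether err is, or wraps, the sentinel.
func isPathNotFound(err error) bool {
	return errors.Is(err, errPathNotFound)
}

func main() {
	err := lookup("field1.field2")
	fmt.Println(isPathNotFound(err)) // prints true
}
```

errors.Is walks the wrap chain, so the check keeps working if context is added around the sentinel.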

Functions

This section is empty.

Types

type Cardinality (added in v0.5.1)

type Cardinality protoreflect.Cardinality

Cardinality determines whether a field is optional, required, or repeated.

const (
	Optional Cardinality = 1 // appears zero or one times
	Required Cardinality = 2 // appears exactly one time; invalid with Proto3
	Repeated Cardinality = 3 // appears zero or more times
)

Constants as defined by the google.protobuf.Cardinality enumeration.

func (*Cardinality) Get (added in v0.5.1)

type CustomField (added in v0.5.1)

type CustomField struct {
	Name             string
	Type             FieldType
	FieldCardinality Cardinality
	IsPacked         bool
}

type FieldType (added in v0.5.1)

type FieldType fieldType

const (
	BOOL    FieldType = "bool"
	BYTES   FieldType = "[]byte"
	STRING  FieldType = "string"
	INT64   FieldType = "int64"
	FLOAT64 FieldType = "float64"
)

type Opt (added in v0.5.0)

type Opt struct {
	// contains filtered or unexported fields
}

type Option (added in v0.5.0)

type Option func(config)

func WithCustomFields (added in v0.5.1)

func WithCustomFields(c []CustomField) Option

WithCustomFields

func WithNormalizer (added in v0.5.0)

func WithNormalizer(fields, aliases []string, failOnRangeError bool) Option

WithNormalizer configures the scalars to add to a flat Arrow Record suitable for efficient aggregation. Fields should be specified by their path (field names separated by a period, e.g. 'field1.field2.field3'). The Arrow field types of the selected fields will be used to build the new schema. If coalescing data between multiple fields of the same type, specify only one of the paths. List fields should specify the index to retrieve; otherwise all elements are selected. Ranges are not yet implemented.

Current functionality is limited to validating that the fields and aliases match in `New()`, `NormalizerBuilder()` returning an `*array.RecordBuilder` to be used externally to append data, and `NewNormalizerRecord()` returning an `arrow.Record` from the normalizer RecordBuilder. Future development may include Append methods that accept protopath operations to normalize protobuf messages in-flight internally to the package.

failOnRangeError indicates whether to fail on a list[start:end] where end > len(list). TODO
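
The argument rules described above can be pre-checked by the caller. The following is a minimal sketch using only the standard library, not the package's own validation: splitPath, checkNormalizerArgs, and clampRange are hypothetical helpers. Clamping the range when failOnRangeError is false is an assumption, since this page only says the flag controls whether to fail.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPath breaks a dotted field path such as "field1.field2.field3"
// into its segments and rejects empty segments (including an empty path).
func splitPath(path string) ([]string, error) {
	parts := strings.Split(path, ".")
	for _, p := range parts {
		if p == "" {
			return nil, fmt.Errorf("path %q has an empty segment", path)
		}
	}
	return parts, nil
}

// checkNormalizerArgs verifies that every field path is well formed and
// that there is exactly one alias per field path.
func checkNormalizerArgs(fields, aliases []string) error {
	if len(fields) != len(aliases) {
		return fmt.Errorf("%d fields but %d aliases", len(fields), len(aliases))
	}
	for _, f := range fields {
		if _, err := splitPath(f); err != nil {
			return err
		}
	}
	return nil
}

// clampRange resolves list[start:end] against a list of length n.
// When end exceeds n it fails if failOnRangeError is set; otherwise it
// clamps end (and start) to n, which is an assumed fallback behavior.
func clampRange(n, start, end int, failOnRangeError bool) (int, int, error) {
	if start < 0 || start > end {
		return 0, 0, fmt.Errorf("invalid range [%d:%d]", start, end)
	}
	if end > n {
		if failOnRangeError {
			return 0, 0, fmt.Errorf("range end %d exceeds list length %d", end, n)
		}
		end = n
	}
	if start > end {
		start = end
	}
	return start, end, nil
}

func main() {
	fields := []string{"user.id", "user.address.city"}
	aliases := []string{"user_id", "city"}
	fmt.Println(checkNormalizerArgs(fields, aliases)) // prints <nil>

	s, e, err := clampRange(3, 0, 5, false)
	fmt.Println(s, e, err) // prints 0 3 <nil>
}
```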

type Schema

type Schema[T proto.Message] struct {
	// contains filtered or unexported fields
}

func New

func New[T proto.Message](mem memory.Allocator, opts ...Option) (schema *Schema[T], err error)

func (*Schema[T]) Append

func (s *Schema[T]) Append(value T)

Append appends protobuf value to the schema builder. This method is not safe for concurrent use.
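
Since Append is not safe for concurrent use, callers that append from several goroutines need their own synchronization. One possible approach is a mutex wrapper around the append call. In this sketch lockedAppender and appendConcurrently are hypothetical names, and a plain slice stands in for the schema builder so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedAppender serializes calls to an append function that is not
// safe for concurrent use. In real code appendFn could be a method
// value such as s.Append.
type lockedAppender[T any] struct {
	mu       sync.Mutex
	appendFn func(T)
}

// Append forwards value to appendFn while holding the mutex.
func (l *lockedAppender[T]) Append(value T) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.appendFn(value)
}

// appendConcurrently appends the values 0..n-1 from n goroutines and
// returns how many values ended up in the buffer.
func appendConcurrently(n int) int {
	var buf []int // stand-in for the schema builder
	la := &lockedAppender[int]{appendFn: func(v int) { buf = append(buf, v) }}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			la.Append(v)
		}(i)
	}
	wg.Wait()
	return len(buf)
}

func main() {
	fmt.Println(appendConcurrently(100)) // prints 100
}
```

Funnelling all rows through a single goroutine over a channel is an alternative when lock contention matters.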

func (*Schema[T]) AppendWithCustom (added in v0.5.1)

func (s *Schema[T]) AppendWithCustom(value T, c ...any) error

AppendWithCustom appends protobuf value and custom field values to the schema builder. This method is not safe for concurrent use.
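
Because the custom values are passed as ...any, a type mismatch can only surface at run time. The sketch below shows one way a caller might pre-check values against the documented FieldType names before appending. It mirrors the constants locally as fieldType, and it assumes values are supplied positionally in the same order as the custom fields, which this page does not confirm. matches and checkCustomValues are hypothetical helpers.

```go
package main

import "fmt"

// fieldType mirrors the documented FieldType string constants locally.
type fieldType string

const (
	typeBool    fieldType = "bool"
	typeBytes   fieldType = "[]byte"
	typeString  fieldType = "string"
	typeInt64   fieldType = "int64"
	typeFloat64 fieldType = "float64"
)

// matches reports whether v has exactly the Go type named by ft.
func matches(ft fieldType, v any) bool {
	switch ft {
	case typeBool:
		_, ok := v.(bool)
		return ok
	case typeBytes:
		_, ok := v.([]byte)
		return ok
	case typeString:
		_, ok := v.(string)
		return ok
	case typeInt64:
		_, ok := v.(int64)
		return ok
	case typeFloat64:
		_, ok := v.(float64)
		return ok
	}
	return false
}

// checkCustomValues validates values against the declared types,
// position by position (assumed ordering).
func checkCustomValues(types []fieldType, values ...any) error {
	if len(types) != len(values) {
		return fmt.Errorf("got %d values for %d custom fields", len(values), len(types))
	}
	for i, ft := range types {
		if !matches(ft, values[i]) {
			return fmt.Errorf("value %d: want %s, got %T", i, ft, values[i])
		}
	}
	return nil
}

func main() {
	types := []fieldType{typeString, typeInt64}
	fmt.Println(checkCustomValues(types, "eu-west", int64(42)) == nil) // prints true
	fmt.Println(checkCustomValues(types, "eu-west", 42) == nil)        // prints false: 42 is an int
}
```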

func (*Schema[T]) Clone (added in v0.4.0)

func (s *Schema[T]) Clone(mem memory.Allocator) (schema *Schema[T], err error)

func (*Schema[T]) FieldNames (added in v0.3.0)

func (s *Schema[T]) FieldNames() []string

FieldNames returns top-level field names

func (*Schema[T]) NewNormalizerRecord (added in v0.5.0)

func (s *Schema[T]) NewNormalizerRecord() arrow.Record

NewNormalizerRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.

func (*Schema[T]) NewRecord

func (s *Schema[T]) NewRecord() arrow.Record

NewRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.
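
The reset-on-read behavior lends itself to batching: append rows, cut a record every N rows, and keep reusing the same builder. The sketch below illustrates that pattern with a hypothetical stand-in type, recordBuilder, whose NewRecord returns the buffered rows and empties the buffer. It is not the package's implementation, and the row and record types are plain slices for the sake of a runnable example.

```go
package main

import "fmt"

// recordBuilder is a hypothetical stand-in for the Append/NewRecord pair:
// rows accumulate until NewRecord hands them off and resets the builder.
type recordBuilder struct{ rows []int }

// Append buffers one row.
func (b *recordBuilder) Append(v int) { b.rows = append(b.rows, v) }

// NewRecord returns the buffered rows and resets the builder for reuse.
func (b *recordBuilder) NewRecord() []int {
	out := b.rows
	b.rows = nil
	return out
}

// batch appends values and cuts a record every size rows, plus a final
// partial record if any rows remain.
func batch(values []int, size int) [][]int {
	var b recordBuilder
	var records [][]int
	for _, v := range values {
		b.Append(v)
		if len(b.rows) == size {
			records = append(records, b.NewRecord())
		}
	}
	if len(b.rows) > 0 {
		records = append(records, b.NewRecord())
	}
	return records
}

func main() {
	fmt.Println(batch([]int{1, 2, 3, 4, 5}, 2)) // prints [[1 2] [3 4] [5]]
}
```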

func (*Schema[T]) NormalizerBuilder (added in v0.5.0)

func (s *Schema[T]) NormalizerBuilder() *array.RecordBuilder

func (*Schema[T]) Parquet

func (s *Schema[T]) Parquet() *schema.Schema

Parquet returns schema as parquet schema

func (*Schema[T]) Proto

func (s *Schema[T]) Proto(r arrow.Record, rows []int) []T

Proto decodes rows and returns them as proto messages.

func (*Schema[T]) ReadParquet

func (s *Schema[T]) ReadParquet(ctx context.Context, r parquet.ReaderAtSeeker, columns []int) (arrow.Record, error)

ReadParquet reads the specified columns from parquet source r and returns an Arrow record. The returned record must be released by the caller.

func (*Schema[T]) Release

func (s *Schema[T]) Release()

Release releases the reference on the message builder

func (*Schema[T]) Schema

func (s *Schema[T]) Schema() *arrow.Schema

Schema returns schema as arrow schema

func (*Schema[T]) WriteParquet

func (s *Schema[T]) WriteParquet(w io.Writer) error

WriteParquet writes Parquet to an io.Writer

func (*Schema[T]) WriteParquetRecords

func (s *Schema[T]) WriteParquetRecords(w io.Writer, records ...arrow.Record) error

WriteParquetRecords writes one or more Arrow records to Parquet.

Directories

Path Synopsis
gen
