dio

package
v0.0.0-...-bd113ff
Published: Mar 2, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package dio provides a simple, easy-to-use API for reading and writing records to and from different data sources. It is an abstraction layer with an API similar to that of the io package in the Go standard library.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Copy

func Copy(ctx context.Context, r Reader, w Writer) error

Copy transfers data from Reader to Writer. It first checks whether the Writer implements ReaderFrom and uses that optimization if so, falling back to the generic stream path otherwise.

Types

type And

type And struct {
	Filters []Filter
}

And combines multiple filters with logical AND.

type BatchStream

type BatchStream interface {
	// Schema returns the Arrow schema for records in this stream.
	Schema() *arrow.Schema

	// Next returns the next record batch in the stream.
	// Returns io.EOF when no more batches are available.
	// The caller must call Release() on the returned record.
	Next(ctx context.Context) (arrow.RecordBatch, error)

	// Close releases any resources held by this stream.
	Close() error
}

BatchStream represents a stream of Arrow RecordBatches using a pull model. Callers MUST call Release() on each returned record. Next returns io.EOF when the stream is exhausted.

func NewEmptyStream

func NewEmptyStream(schema *arrow.Schema) BatchStream

NewEmptyStream creates a BatchStream with the given schema that produces no records.

type Eq

type Eq struct {
	Column int
	Value  any
}

Eq tests column == value.

type Filter

type Filter interface {
	// contains filtered or unexported methods
}

Filter represents a pushdown predicate. Deliberately limited to column-vs-literal comparisons. The unexported filter() method seals the interface.

type FilterClassifier

type FilterClassifier interface {
	ClassifyFilters(filters []Filter) []FilterSupport
}

FilterClassifier is optionally implemented by Readers that can report which filters they actually handle.

type FilterSupport

type FilterSupport int

FilterSupport indicates how well a Reader handles a particular filter.

const (
	// FilterUnsupported means the Reader cannot use this filter.
	FilterUnsupported FilterSupport = iota

	// FilterInexact means the Reader uses this filter as a hint
	// (e.g., Parquet row-group pruning) but does not guarantee
	// all returned rows satisfy it. Caller must still post-filter.
	FilterInexact

	// FilterExact means the Reader guarantees all returned rows
	// satisfy this filter. Caller can skip post-filtering for it.
	FilterExact
)

type Gt

type Gt struct {
	Column int
	Value  any
}

Gt tests column > value.

type GtEq

type GtEq struct {
	Column int
	Value  any
}

GtEq tests column >= value.

type In

type In struct {
	Column int
	Values []any
}

In tests column IN (values...).

type IsNotNull

type IsNotNull struct {
	Column int
}

IsNotNull tests column IS NOT NULL.

type IsNull

type IsNull struct {
	Column int
}

IsNull tests column IS NULL.

type Lt

type Lt struct {
	Column int
	Value  any
}

Lt tests column < value.

type LtEq

type LtEq struct {
	Column int
	Value  any
}

LtEq tests column <= value.

type Neq

type Neq struct {
	Column int
	Value  any
}

Neq tests column != value.

type Not

type Not struct {
	Child Filter
}

Not negates a filter.

type Or

type Or struct {
	Filters []Filter
}

Or combines multiple filters with logical OR.

type ReadConfig

type ReadConfig struct {
	// Projection specifies column indices to read. Nil means all columns.
	Projection []int

	// Filters specifies pushdown filter predicates. Readers may ignore these.
	Filters []Filter
}

ReadConfig holds the configuration for a Read call. Fields are exported so sub-packages (dio/csv, dio/parquet) can access projection/filter config.

func ApplyOptions

func ApplyOptions(opts []ReadOption) ReadConfig

ApplyOptions applies the given ReadOptions to a ReadConfig and returns it.

type ReadOption

type ReadOption func(*ReadConfig)

ReadOption configures a Read operation.

func WithFilter

func WithFilter(filters []Filter) ReadOption

WithFilter returns a ReadOption that sets filter predicates for pushdown. Filters are hints; readers are not required to apply them.

func WithProjection

func WithProjection(cols []int) ReadOption

WithProjection returns a ReadOption that sets column projection. cols specifies which column indices to read from the source.

type Reader

type Reader interface {
	// Schema returns the Arrow schema for records produced by this reader.
	Schema() *arrow.Schema

	// Read opens a stream of Arrow RecordBatches from the source.
	// Options allow projection and filter pushdown.
	Read(ctx context.Context, opts ...ReadOption) (BatchStream, error)
}

Reader reads Arrow data from a source. The batch size is a construction-time concern, not a per-read parameter.

type ReaderFrom

type ReaderFrom interface {
	ReadFrom(ctx context.Context, r Reader) error
}

ReaderFrom allows a Writer to optimize for specific Reader types. Copy() checks for this before falling back to the generic stream path.

type Writer

type Writer interface {
	// WriteStream writes all records from the stream to the destination.
	WriteStream(ctx context.Context, s BatchStream) error

	// Close finalizes the destination (write footers, flush buffers, etc.).
	Close() error
}

Writer writes a stream of Arrow RecordBatches to a destination.

Directories

Path Synopsis
ABOUTME: Row group pruning logic for Parquet filter pushdown.
