Documentation
¶
Overview ¶
Package dio provides a simple and easy to use API for reading and writing records from/to different data sources. It is an abstraction layer with a similar API as the io package from the Go standard library.
Index ¶
- func Copy(ctx context.Context, r Reader, w Writer) error
- type And
- type BatchStream
- type Eq
- type Filter
- type FilterClassifier
- type FilterSupport
- type Gt
- type GtEq
- type In
- type IsNotNull
- type IsNull
- type Lt
- type LtEq
- type Neq
- type Not
- type Or
- type ReadConfig
- type ReadOption
- type Reader
- type ReaderFrom
- type Writer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type BatchStream ¶
type BatchStream interface {
// Schema returns the Arrow schema for records in this stream.
Schema() *arrow.Schema
// Next returns the next record batch in the stream.
// Returns io.EOF when no more batches are available.
// The caller must call Release() on the returned record.
Next(ctx context.Context) (arrow.RecordBatch, error)
// Close releases any resources held by this stream.
Close() error
}
BatchStream represents a stream of Arrow RecordBatches using a pull model. Callers MUST call Release() on each returned record. Returns io.EOF when the stream is exhausted.
func NewEmptyStream ¶
func NewEmptyStream(schema *arrow.Schema) BatchStream
NewEmptyStream creates a BatchStream with the given schema that produces no records.
type Filter ¶
type Filter interface {
// contains filtered or unexported methods
}
Filter represents a pushdown predicate. Deliberately limited to column-vs-literal comparisons. The unexported filter() method seals the interface.
type FilterClassifier ¶
type FilterClassifier interface {
ClassifyFilters(filters []Filter) []FilterSupport
}
FilterClassifier is optionally implemented by Readers that can report which filters they actually handle.
type FilterSupport ¶
type FilterSupport int
FilterSupport indicates how well a Reader handles a particular filter.
const ( // FilterUnsupported means the Reader cannot use this filter. FilterUnsupported FilterSupport = iota // FilterInexact means the Reader uses this filter as a hint // (e.g., Parquet row-group pruning) but does not guarantee // all returned rows satisfy it. Caller must still post-filter. FilterInexact // FilterExact means the Reader guarantees all returned rows // satisfy this filter. Caller can skip post-filtering for it. FilterExact )
type ReadConfig ¶
type ReadConfig struct {
// Projection specifies column indices to read. Nil means all columns.
Projection []int
// Filters specifies pushdown filter predicates. Readers may ignore these.
Filters []Filter
}
ReadConfig holds the configuration for a Read call. Fields are exported so sub-packages (dio/csv, dio/parquet) can access projection/filter config.
func ApplyOptions ¶
func ApplyOptions(opts []ReadOption) ReadConfig
ApplyOptions applies the given ReadOptions to a ReadConfig and returns it.
type ReadOption ¶
type ReadOption func(*ReadConfig)
ReadOption configures a Read operation.
func WithFilter ¶
func WithFilter(filters []Filter) ReadOption
WithFilter returns a ReadOption that sets filter predicates for pushdown. Filters are hints; readers are not required to apply them.
func WithProjection ¶
func WithProjection(cols []int) ReadOption
WithProjection returns a ReadOption that sets column projection. cols specifies which column indices to read from the source.
type Reader ¶
type Reader interface {
// Schema returns the Arrow schema for records produced by this reader.
Schema() *arrow.Schema
// Read opens a stream of Arrow RecordBatches from the source.
// Options allow projection and filter pushdown.
Read(ctx context.Context, opts ...ReadOption) (BatchStream, error)
}
Reader reads Arrow data from a source. The batch size is a construction-time concern, not a per-read parameter.
type ReaderFrom ¶
ReaderFrom allows a Writer to optimize for specific Reader types. Copy() checks for this before falling back to the generic stream path.
type Writer ¶
type Writer interface {
// WriteStream writes all records from the stream to the destination.
WriteStream(ctx context.Context, s BatchStream) error
// Close finalizes the destination (write footers, flush buffers, etc.).
Close() error
}
Writer writes a stream of Arrow RecordBatches to a destination.