io

package
v0.7.0
Published: May 15, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package io defines the I/O pipeline framework for Pulse: Reader/Writer interfaces, schema inference, and job types (ImportJob, ExportJob, ConvertJob).

Format-specific adapters live in sub-packages (csv, tsv, ndjson, jsonarray, parquet, arrow, excel).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ErrStopIteration

func ErrStopIteration() error

ErrStopIteration returns the stop iteration sentinel for use by readers.
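
One plausible calling pattern is sketched below. The example.com/pulse import path is an illustrative assumption, and the exact early-stop contract (whether ReadRows swallows the sentinel or returns it to the caller) is not spelled out above, so treat this as a sketch rather than documented behavior.

import (
	"context"

	pio "example.com/pulse/io"
)

// readFirstN collects at most n rows from any Reader. The row callback
// returns the sentinel to request an early stop; a reader that honors it
// is expected to return nil from ReadRows rather than propagating the error.
func readFirstN(ctx context.Context, r pio.Reader, n int) ([][]string, error) {
	var rows [][]string
	err := r.ReadRows(ctx, func(row []string) error {
		rows = append(rows, append([]string(nil), row...)) // copy: row may be reused by the reader
		if len(rows) >= n {
			return pio.ErrStopIteration() // ask the reader to stop early
		}
		return nil
	})
	return rows, err
}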

Types

type ConvertJob

type ConvertJob struct {
	Source      Reader
	Target      Writer
	Schema      *encoding.Schema
	KeepPulseAt string // optional: also write intermediate .pulse
	SampleRows  int
	FS          afero.Fs
}

ConvertJob chains import and export with no intermediate file on disk (unless KeepPulseAt is set).

func NewConvertJob

func NewConvertJob(source Reader, target Writer) *ConvertJob

NewConvertJob creates a ConvertJob with default settings.

func (*ConvertJob) Predict

func (j *ConvertJob) Predict(ctx context.Context) (*PredictReport, error)

Predict validates the conversion without writing.

func (*ConvertJob) Run

func (j *ConvertJob) Run(ctx context.Context) (*ConvertReport, error)

Run executes the convert job, streaming from source to target. If KeepPulseAt is set, also writes an intermediate .pulse file.
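
A hedged usage sketch follows: the example.com/pulse import path and the afero in-memory filesystem are assumptions for illustration, not part of this package's documentation. The pattern is to build a ConvertJob from any Reader/Writer pair, dry-run it with Predict, then stream with Run.

import (
	"context"
	"fmt"

	"github.com/spf13/afero"
	pio "example.com/pulse/io"
)

// convert streams src into dst; src and dst can be any Reader/Writer pair,
// for example adapters from the csv and tsv sub-packages.
func convert(ctx context.Context, src pio.Reader, dst pio.Writer) error {
	job := pio.NewConvertJob(src, dst)
	job.FS = afero.NewMemMapFs()           // filesystem used only if KeepPulseAt is set
	job.KeepPulseAt = "intermediate.pulse" // optional: also keep the intermediate .pulse

	if _, err := job.Predict(ctx); err != nil { // validate the conversion without writing
		return err
	}
	rep, err := job.Run(ctx)
	if err != nil {
		return err
	}
	fmt.Printf("converted %d rows (%d row errors)\n", rep.RowsConverted, len(rep.RowErrors))
	return nil
}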

type ConvertReport

type ConvertReport struct {
	RowsConverted int
	Schema        *encoding.Schema
	RowErrors     []RowError
}

ConvertReport summarizes the result of a convert operation.

type ExportJob

type ExportJob struct {
	Source string // input .pulse path
	Target Writer
	FS     afero.Fs
}

ExportJob converts a .pulse file into tabular output.

func NewExportJob

func NewExportJob(source string, target Writer) *ExportJob

NewExportJob creates an ExportJob.

func (*ExportJob) Predict

func (j *ExportJob) Predict(ctx context.Context) (*PredictReport, error)

Predict validates the export without writing.

func (*ExportJob) Run

func (j *ExportJob) Run(ctx context.Context) (*ExportReport, error)

Run executes the export job, converting a .pulse file to a tabular target.
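
A minimal usage sketch, assuming example.com/pulse as the module path:

import (
	"context"
	"fmt"

	pio "example.com/pulse/io"
)

// export writes the rows of a .pulse file to any tabular Writer.
func export(ctx context.Context, pulsePath string, dst pio.Writer) error {
	job := pio.NewExportJob(pulsePath, dst)
	rep, err := job.Run(ctx)
	if err != nil {
		return err
	}
	fmt.Printf("exported %d rows (%d row errors)\n", rep.RowsExported, len(rep.RowErrors))
	return nil
}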

type ExportReport

type ExportReport struct {
	RowsExported int
	RowErrors    []RowError
}

ExportReport summarizes the result of an export operation.

type ImportJob

type ImportJob struct {
	Source     Reader
	Target     string // output .pulse path
	Schema     *encoding.Schema
	SampleRows int // default 500, min 50
	FS         afero.Fs
}

ImportJob converts tabular source data into a .pulse file.

func NewImportJob

func NewImportJob(source Reader, target string) *ImportJob

NewImportJob creates an ImportJob with default settings.

func (*ImportJob) Predict

func (j *ImportJob) Predict(ctx context.Context) (*PredictReport, error)

Predict validates the import without writing any output.

func (*ImportJob) Run

func (j *ImportJob) Run(ctx context.Context) (*ImportReport, error)

Run executes the import job, converting tabular source into a .pulse file.
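
A usage sketch under the same assumptions as above (illustrative example.com/pulse import path; afero's in-memory filesystem stands in for the real one):

import (
	"context"
	"fmt"

	"github.com/spf13/afero"
	pio "example.com/pulse/io"
)

// importToPulse converts any tabular Reader into a .pulse file on an
// in-memory filesystem (handy for tests).
func importToPulse(ctx context.Context, src pio.Reader) (*pio.ImportReport, error) {
	job := pio.NewImportJob(src, "out.pulse")
	job.FS = afero.NewMemMapFs() // write to memory instead of the OS filesystem
	job.SampleRows = 200         // sample 200 rows for schema inference

	if _, err := job.Predict(ctx); err != nil { // dry run: infer schema, validate, write nothing
		return nil, err
	}
	rep, err := job.Run(ctx)
	if err != nil {
		return nil, err
	}
	fmt.Printf("imported %d rows into out.pulse\n", rep.RowsImported)
	return rep, nil
}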

type ImportReport

type ImportReport struct {
	RowsImported int
	Schema       *encoding.Schema
	RowErrors    []RowError
}

ImportReport summarizes the result of an import operation.

type InferenceWarning

type InferenceWarning struct {
	Column  string
	Message string
}

InferenceWarning records a non-fatal observation during inference.

func InferSchema

func InferSchema(reader Reader, sampleRows int) (*encoding.Schema, []InferenceWarning, error)

InferSchema samples up to sampleRows rows from reader and proposes a Schema. If sampleRows <= 0, the default sample size (500 rows) is used; values below the minimum of 50 are raised to it.
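
A short sketch of standalone inference (example.com/pulse import path assumed). Passing 0 lets InferSchema fall back to its default sample size; if the reader also implements ResetReader, rewinding afterwards lets a following import see every row.

import (
	"fmt"

	pio "example.com/pulse/io"
)

// proposeSchema infers a schema from the first rows of src and logs any
// non-fatal inference warnings.
func proposeSchema(src pio.Reader) error {
	schema, warnings, err := pio.InferSchema(src, 0)
	if err != nil {
		return err
	}
	for _, w := range warnings {
		fmt.Printf("column %s: %s\n", w.Column, w.Message)
	}
	_ = schema // e.g. assign to ImportJob.Schema to skip re-inference
	if rr, ok := src.(pio.ResetReader); ok {
		return rr.Reset() // rewind so a subsequent import starts from the first row
	}
	return nil
}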

type PredictReport

type PredictReport struct {
	Schema        *encoding.Schema
	EstimatedRows int
	Warnings      []InferenceWarning
}

PredictReport summarizes a validation-only run.

type Reader

type Reader interface {
	// ReadHeader returns column names from the source.
	ReadHeader() ([]string, error)
	// ReadRows streams rows; calls fn for each row.
	// The context controls cancellation.
	ReadRows(ctx context.Context, fn func(row []string) error) error
	// Close releases underlying resources.
	Close() error
}

Reader reads tabular data from a source format.

type ResetReader

type ResetReader interface {
	Reader
	// Reset rewinds the reader to the beginning.
	Reset() error
}

ResetReader is an optional interface for readers that support rewinding to the beginning. This is needed for schema inference followed by import.
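
A minimal in-memory implementation sketch covering both Reader and ResetReader. The example.com/pulse import path is assumed, as is the detail that ErrStopIteration's sentinel is a fixed value matchable with errors.Is; neither is stated above.

import (
	"context"
	"errors"

	pio "example.com/pulse/io"
)

// sliceReader serves rows from memory and satisfies both Reader and
// ResetReader, so it can be used for schema inference followed by import.
type sliceReader struct {
	header []string
	rows   [][]string
	pos    int
}

var _ pio.ResetReader = (*sliceReader)(nil)

func (r *sliceReader) ReadHeader() ([]string, error) { return r.header, nil }

func (r *sliceReader) ReadRows(ctx context.Context, fn func(row []string) error) error {
	for ; r.pos < len(r.rows); r.pos++ {
		if err := ctx.Err(); err != nil { // honor cancellation
			return err
		}
		if err := fn(r.rows[r.pos]); err != nil {
			if errors.Is(err, pio.ErrStopIteration()) {
				return nil // callback requested an early stop; not a failure
			}
			return err
		}
	}
	return nil
}

func (r *sliceReader) Reset() error { r.pos = 0; return nil }

func (r *sliceReader) Close() error { return nil }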

type RowError

type RowError struct {
	Row int
	Err error
}

RowError records a per-row error during import or export.

type SchemaAwareWriter added in v0.2.0

type SchemaAwareWriter interface {
	Writer
	SetPulseSchema(s *encoding.Schema)
}

SchemaAwareWriter is an optional extension of Writer for targets that emit native typed columns (Arrow, Parquet, Excel) and want the source .pulse schema to drive column-type selection. ExportJob calls SetPulseSchema before WriteHeader on writers that implement this interface, then passes typed values through WriteRow:

  • encoding.Decimal128 for decimal128 / nullable_decimal128 columns
  • encoding.PointF64 for point_f64 columns
  • encoding.H3Cell for h3_cell columns
  • canonical strings for narrow types (current behavior)

Writers that do not implement SchemaAwareWriter receive only canonical strings, which is the prior text-only export contract.
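
A sketch of a schema-aware writer is shown below. The example.com/pulse module and encoding import paths are assumptions; the type switch only demonstrates the typed-value contract described above and does not inspect the encoding types' fields.

import (
	"fmt"
	"os"

	enc "example.com/pulse/encoding"
	pio "example.com/pulse/io"
)

// debugWriter prints each row. Because it implements SetPulseSchema,
// ExportJob hands it decoded encoding values for the wide column types
// instead of canonical strings.
type debugWriter struct {
	schema *enc.Schema
}

var _ pio.SchemaAwareWriter = (*debugWriter)(nil)

func (w *debugWriter) SetPulseSchema(s *enc.Schema) { w.schema = s }

func (w *debugWriter) WriteHeader(columns []string) error {
	_, err := fmt.Fprintln(os.Stdout, columns)
	return err
}

func (w *debugWriter) WriteRow(values []any) error {
	for i, v := range values {
		switch v.(type) {
		case enc.Decimal128:
			fmt.Printf("col %d: decimal128 %v\n", i, v)
		case enc.PointF64:
			fmt.Printf("col %d: point %v\n", i, v)
		case enc.H3Cell:
			fmt.Printf("col %d: h3 cell %v\n", i, v)
		default:
			fmt.Printf("col %d: %v\n", i, v) // canonical string for narrow types
		}
	}
	return nil
}

func (w *debugWriter) Close() error { return nil }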

type Writer

type Writer interface {
	// WriteHeader writes column names to the target.
	WriteHeader(columns []string) error
	// WriteRow writes a single row of values.
	WriteRow(values []any) error
	// Close flushes and releases resources.
	Close() error
}

Writer writes tabular data to a target format.
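
A minimal Writer sketch using the standard library's encoding/csv (distinct from this module's csv sub-package); only the example.com/pulse import path is an assumption.

import (
	stdcsv "encoding/csv"
	"fmt"
	"os"

	pio "example.com/pulse/io"
)

// stdoutCSVWriter renders every value with fmt.Sprint and writes CSV
// records to standard output.
type stdoutCSVWriter struct {
	w *stdcsv.Writer
}

var _ pio.Writer = (*stdoutCSVWriter)(nil)

func newStdoutCSVWriter() *stdoutCSVWriter {
	return &stdoutCSVWriter{w: stdcsv.NewWriter(os.Stdout)}
}

func (c *stdoutCSVWriter) WriteHeader(columns []string) error {
	return c.w.Write(columns)
}

func (c *stdoutCSVWriter) WriteRow(values []any) error {
	record := make([]string, len(values))
	for i, v := range values {
		record[i] = fmt.Sprint(v)
	}
	return c.w.Write(record)
}

func (c *stdoutCSVWriter) Close() error {
	c.w.Flush() // flush buffered records before reporting any write error
	return c.w.Error()
}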

Directories

Path	Synopsis
arrow	Package arrow provides Arrow IPC (Feather V2) import and export for the pulse I/O pipeline, plus shared Arrow<->Pulse type-mapping helpers used by both this package and io/parquet.
csv	Package csv provides CSV format adapters for the Pulse I/O pipeline.
excel	Package excel provides Excel import and export for the pulse I/O pipeline.
format	Package format dispatches tabular Reader construction by format identifier, sitting between the io/ interface definitions and the per-format leaf packages (io/csv, io/tsv, io/ndjson, io/jsonarray, io/parquet, io/arrow, io/excel).
jsonarray	Package jsonarray provides JSON-array import and export for the pulse I/O pipeline.
jsonshared	Package jsonshared holds value coercion helpers shared by the ndjson and jsonarray packages.
ndjson	Package ndjson provides NDJSON (newline-delimited JSON) import and export for the pulse I/O pipeline.
parquet	Package parquet provides Parquet import and export for the pulse I/O pipeline.
tsv	Package tsv provides TSV import and export for the pulse I/O pipeline.
