Documentation
Overview ¶
Package memory provides a lightweight Go-slice-backed compute engine for the dataset package. It implements dataset.ColumnFactory, dataset.BuilderFactory, dataset.Aggregator, and dataset.Caster.
Usage:
	eng := memory.NewEngine(context.Background())
	f := eng.(dataset.ColumnFactory)
	ds, _ := f.FromColumns(
		dataset.NewSchema(dataset.FloatCol("x"), dataset.StringCol("label")),
		f.NewFloat64Column("x", []float64{1, 2, 3}),
		f.NewStringColumn("label", []string{"a", "b", "c"}),
	)
Index ¶
- Variables
- type Engine
- func (e *Engine) Abs(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Acos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Asin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan2(y, x dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitAnd(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitNot(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitOr(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftLeft(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftRight(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitXor(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Cast(col dataset.AnyColumn, target dataset.DType) (dataset.AnyColumn, error)
- func (e *Engine) Ceil(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Combine(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) Complete(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
- func (e *Engine) Context() context.Context
- func (e *Engine) Cos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Count(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMax(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumSum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DenseRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DivCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DropNA(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Erf(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Exp(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Fill(col dataset.AnyColumn, dir dataset.FillDirection) (dataset.AnyColumn, error)
- func (e *Engine) Filter(ds dataset.Table, mask dataset.Masker) (dataset.Table, error)
- func (e *Engine) FilterIndices(mask []bool) []int
- func (e *Engine) Floor(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
- func (e *Engine) Join(left, right dataset.Table, spec dataset.JoinSpec) (dataset.Table, error)
- func (e *Engine) Lag(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) Lead(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) Ln(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Log2(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Log10(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Mean(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Median(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MinMax(col dataset.AnyColumn) (dataset.AnyColumn, dataset.AnyColumn, error)
- func (e *Engine) MulCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Name() string
- func (e *Engine) Neg(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) NewBoolColumn(name string, data []bool) dataset.AnyColumn
- func (e *Engine) NewBuilder(schema *dataset.Schema) dataset.Builder
- func (e *Engine) NewFloat64Column(name string, data []float64) dataset.AnyColumn
- func (e *Engine) NewInt64Column(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewStringColumn(name string, data []string) dataset.AnyColumn
- func (e *Engine) NewTimestampColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) PercentRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) PivotLonger(ds dataset.Table, spec dataset.PivotLongerSpec) (dataset.Table, error)
- func (e *Engine) PivotWider(ds dataset.Table, spec dataset.PivotWiderSpec) (dataset.Table, error)
- func (e *Engine) Pow(col dataset.AnyColumn, exp float64) (dataset.AnyColumn, error)
- func (e *Engine) Rank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) ReadCSV(_ context.Context, r io.Reader, cfg dataset.CSVConfig) (dataset.Table, error)
- func (e *Engine) ReadParquet(_ context.Context, r io.ReaderAt, size int64, _ dataset.ParquetConfig) (dataset.Table, error)
- func (e *Engine) ReplaceNA(col dataset.AnyColumn, defaultVal float64) (dataset.AnyColumn, error)
- func (e *Engine) Round(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) RowNumber(n int) (dataset.AnyColumn, error)
- func (e *Engine) Select(col dataset.AnyColumn, indices []int) (dataset.AnyColumn, error)
- func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
- func (e *Engine) Sigmoid(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sign(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Slice(col dataset.AnyColumn, start, end int) (dataset.AnyColumn, error)
- func (e *Engine) SortIndices(col dataset.AnyColumn) ([]int, error)
- func (e *Engine) Sqrt(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Stack(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) SubCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tanh(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Variance(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) WriteCSV(_ context.Context, w io.Writer, ds dataset.Table, cfg dataset.CSVConfig) error
- func (e *Engine) WriteParquet(_ context.Context, w io.Writer, ds dataset.Table, _ dataset.ParquetConfig) error
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("memory: unsupported column type")
	// ErrLengthMismatch is returned when column lengths don't match.
	ErrLengthMismatch = errors.New("memory: column length mismatch")
	// ErrEmptyColumn is returned when an operation requires non-empty data.
	ErrEmptyColumn = errors.New("memory: empty column")
	// ErrRequiresFloat64 is returned when a float64 column is required.
	ErrRequiresFloat64 = errors.New("memory: operation requires float64 column")
	// ErrRequiresInt64 is returned when an int64 column is required.
	ErrRequiresInt64 = errors.New("memory: operation requires int64 column")
	// ErrRequiresNumeric is returned when a numeric column is required.
	ErrRequiresNumeric = errors.New("memory: operation requires numeric column")
	// ErrJoinKeyMismatch is returned when join key types don't match.
	ErrJoinKeyMismatch = errors.New("memory: join key type mismatch")
	// ErrTakeTypeMismatch is returned when a Take/Select result has an unexpected type.
	ErrTakeTypeMismatch = errors.New("memory: unexpected result type from Take/Select")
)
Sentinel errors for the memory engine package.
Functions ¶
This section is empty.
Types ¶
type Engine ¶
type Engine struct {
// contains filtered or unexported fields
}
Engine is the Go-slice compute backend.
func (*Engine) BitShiftLeft ¶
BitShiftLeft shifts each int64 element left by n bits.
func (*Engine) BitShiftRight ¶
BitShiftRight shifts each int64 element right by n bits.
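The element-wise behavior of the two shift methods can be sketched on plain []int64 slices. This is an illustration under assumptions, not the engine's code: shiftLeft and shiftRight are hypothetical names, the count is taken as uint to sidestep negative counts, and the right shift is Go's arithmetic shift on signed integers (the page does not say which kind the engine uses).

```go
package main

import "fmt"

// shiftLeft returns a new slice with every element shifted left by n bits.
func shiftLeft(data []int64, n uint) []int64 {
	out := make([]int64, len(data))
	for i, v := range data {
		out[i] = v << n
	}
	return out
}

// shiftRight returns a new slice with every element shifted right by n bits.
// On int64, Go's >> is an arithmetic shift, so the sign is preserved.
func shiftRight(data []int64, n uint) []int64 {
	out := make([]int64, len(data))
	for i, v := range data {
		out[i] = v >> n
	}
	return out
}

func main() {
	fmt.Println(shiftLeft([]int64{1, 2, -3}, 2))  // [4 8 -12]
	fmt.Println(shiftRight([]int64{8, -8, 7}, 1)) // [4 -4 3]
}
```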
func (*Engine) Complete ¶
Complete generates all combinations of the specified columns' unique values, filling missing rows with null values.
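To make "all combinations" concrete, here is a minimal standalone sketch with two key columns and one value column. The row type, the complete function, the first-appearance ordering, and the use of NaN as the null marker are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// row is a hypothetical record; NaN in Sales stands in for a null value.
type row struct {
	Region string
	Year   int
	Sales  float64
}

// unique returns the distinct values of xs in first-appearance order.
func unique[T comparable](xs []T) []T {
	seen := make(map[T]bool)
	var out []T
	for _, x := range xs {
		if !seen[x] {
			seen[x] = true
			out = append(out, x)
		}
	}
	return out
}

// complete emits one row per (region, year) combination, filling
// combinations that were absent from the input with NaN.
func complete(rows []row) []row {
	type key struct {
		region string
		year   int
	}
	existing := make(map[key]float64, len(rows))
	var regions []string
	var years []int
	for _, r := range rows {
		existing[key{r.Region, r.Year}] = r.Sales
		regions = append(regions, r.Region)
		years = append(years, r.Year)
	}
	var out []row
	for _, reg := range unique(regions) {
		for _, yr := range unique(years) {
			sales, ok := existing[key{reg, yr}]
			if !ok {
				sales = math.NaN()
			}
			out = append(out, row{Region: reg, Year: yr, Sales: sales})
		}
	}
	return out
}

func main() {
	rows := []row{{"east", 2023, 10}, {"east", 2024, 12}, {"west", 2024, 7}}
	for _, r := range complete(rows) {
		fmt.Println(r.Region, r.Year, r.Sales)
	}
}
```

In this example the input lacks a (west, 2023) row, so the output gains one row whose Sales is NaN.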
func (*Engine) Concatenate ¶
func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
Concatenate joins multiple string columns into one with a separator.
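Row by row, the operation amounts to joining the i-th value of each source column. A minimal sketch on plain string slices follows; the concatenate function is hypothetical and assumes all columns have the same length.

```go
package main

import (
	"fmt"
	"strings"
)

// concatenate joins the i-th element of every column with sep.
// All columns are assumed to have equal length.
func concatenate(cols [][]string, sep string) []string {
	if len(cols) == 0 {
		return nil
	}
	out := make([]string, len(cols[0]))
	parts := make([]string, len(cols))
	for i := range out {
		for j, c := range cols {
			parts[j] = c[i]
		}
		out[i] = strings.Join(parts, sep)
	}
	return out
}

func main() {
	year := []string{"2024", "2025"}
	month := []string{"01", "12"}
	fmt.Println(concatenate([][]string{year, month}, "-")) // [2024-01 2025-12]
}
```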
func (*Engine) FilterIndices ¶
FilterIndices returns the indices where mask is true.
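A standalone sketch of the same idea, with a hypothetical filterIndices function over a plain []bool; the resulting indices can then drive a gather over any column of the same length.

```go
package main

import "fmt"

// filterIndices returns the positions at which mask is true.
func filterIndices(mask []bool) []int {
	out := make([]int, 0, len(mask))
	for i, keep := range mask {
		if keep {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	idx := filterIndices([]bool{true, false, true})
	fmt.Println(idx) // [0 2]

	// Use the indices to pick the matching values from another slice.
	values := []string{"a", "b", "c"}
	for _, i := range idx {
		fmt.Println(values[i])
	}
}
```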
func (*Engine) FromColumns ¶
func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
FromColumns constructs a Table from a schema and pre-built columns.
func (*Engine) Join ¶
Join implements the Joiner interface with a hash-join algorithm. It supports Inner, Left, Right, Full, Semi, and Anti joins.
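The build-and-probe structure of a hash join can be sketched as follows. This is a generic illustration covering only inner and left joins on string keys; pair and hashJoin are hypothetical names, and the engine's actual data structures are not shown on this page.

```go
package main

import "fmt"

// pair holds matching row positions; Right is -1 when a left row has no match.
type pair struct{ Left, Right int }

// hashJoin builds a hash table over the right keys, then probes it with each
// left key. With keepUnmatched set it behaves like a left join, otherwise
// like an inner join.
func hashJoin(left, right []string, keepUnmatched bool) []pair {
	// Build phase: key -> positions in the right input.
	index := make(map[string][]int, len(right))
	for i, k := range right {
		index[k] = append(index[k], i)
	}
	// Probe phase: look up every left key.
	var out []pair
	for i, k := range left {
		matches := index[k]
		if len(matches) == 0 {
			if keepUnmatched {
				out = append(out, pair{Left: i, Right: -1})
			}
			continue
		}
		for _, j := range matches {
			out = append(out, pair{Left: i, Right: j})
		}
	}
	return out
}

func main() {
	left := []string{"a", "b", "c"}
	right := []string{"b", "c", "c"}
	fmt.Println(hashJoin(left, right, false)) // [{1 0} {2 1} {2 2}]
	fmt.Println(hashJoin(left, right, true))  // [{0 -1} {1 0} {2 1} {2 2}]
}
```

Returning index pairs rather than materialized rows keeps the sketch short; the output table would be assembled by gathering each column at those positions.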
func (*Engine) NewBoolColumn ¶
NewBoolColumn creates a bool column from the given slice.
func (*Engine) NewBuilder ¶
NewBuilder creates a typed row-appender for the given schema.
func (*Engine) NewFloat64Column ¶
NewFloat64Column creates a float64 column from the given slice.
func (*Engine) NewInt64Column ¶
NewInt64Column creates an int64 column from the given slice.
func (*Engine) NewStringColumn ¶
NewStringColumn creates a string column from the given slice.
func (*Engine) NewTimestampColumn ¶
NewTimestampColumn creates a timestamp column (int64-backed) from the given slice.
func (*Engine) PercentRank ¶
PercentRank returns (rank - 1) / (n - 1) as float64, where n is the number of elements. It returns 0 for a single-element column.
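The formula can be worked through on a plain slice. The sketch assumes the rank in the formula is the competition rank (one plus the number of strictly smaller values); percentRank is a hypothetical name and the quadratic loop is chosen for clarity, not speed.

```go
package main

import "fmt"

// percentRank computes (rank - 1) / (n - 1) for each element, where rank is
// assumed to be the competition rank. Inputs with fewer than two elements
// yield zeros.
func percentRank(data []float64) []float64 {
	n := len(data)
	out := make([]float64, n)
	if n <= 1 {
		return out
	}
	for i, v := range data {
		// The number of strictly smaller values equals rank - 1.
		less := 0
		for _, w := range data {
			if w < v {
				less++
			}
		}
		out[i] = float64(less) / float64(n-1)
	}
	return out
}

func main() {
	fmt.Println(percentRank([]float64{10, 20, 20, 30}))
	fmt.Println(percentRank([]float64{42})) // [0]
}
```

For [10, 20, 20, 30] the ranks are 1, 2, 2, 4, so the results are 0, 1/3, 1/3, and 1.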
func (*Engine) PivotLonger ¶
PivotLonger reshapes a wide dataset to long format. Columns listed in spec.Cols are "gathered" into two new columns: spec.NamesTo (holds original column names) and spec.ValuesTo (holds values). All other columns are repeated for each gathered column.
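The reshaping can be illustrated without the dataset types. In the sketch below, longRow and pivotLonger are hypothetical, the wide table is a map of float64 columns, and rows are emitted in row-major order, which is an assumption about ordering that the description does not settle.

```go
package main

import "fmt"

// longRow is one output row of the long format.
type longRow struct {
	ID    string  // an id column, repeated for each gathered column
	Name  string  // plays the role of spec.NamesTo
	Value float64 // plays the role of spec.ValuesTo
}

// pivotLonger gathers the listed columns of a wide table into name/value
// pairs, repeating the id for each gathered column.
func pivotLonger(ids []string, cols []string, wide map[string][]float64) []longRow {
	out := make([]longRow, 0, len(ids)*len(cols))
	for i, id := range ids {
		for _, c := range cols {
			out = append(out, longRow{ID: id, Name: c, Value: wide[c][i]})
		}
	}
	return out
}

func main() {
	ids := []string{"a", "b"}
	wide := map[string][]float64{"q1": {1, 3}, "q2": {2, 4}}
	for _, r := range pivotLonger(ids, []string{"q1", "q2"}, wide) {
		fmt.Println(r.ID, r.Name, r.Value)
	}
}
```

Two input rows with two gathered columns produce four output rows.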
func (*Engine) PivotWider ¶
PivotWider reshapes a long dataset to wide format. spec.NamesFrom identifies the column whose unique values become new column names. spec.ValuesFrom identifies the column whose values fill the new columns. All other columns are the "id" columns that define unique rows.
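The inverse reshaping can be sketched the same way. Here longRow and pivotWider are hypothetical, a single id column is assumed, ids and new column names keep first-appearance order, and NaN marks id/name combinations that have no value; none of these details are confirmed by the description.

```go
package main

import (
	"fmt"
	"math"
)

// longRow is one input row of the long format.
type longRow struct {
	ID    string  // the id column
	Name  string  // plays the role of spec.NamesFrom
	Value float64 // plays the role of spec.ValuesFrom
}

// pivotWider spreads Name/Value pairs into one column per distinct Name,
// with one output row per distinct ID.
func pivotWider(rows []longRow) (ids, names []string, wide map[string][]float64) {
	idPos := make(map[string]int)
	wide = make(map[string][]float64)
	// First pass: collect distinct ids and column names in input order.
	for _, r := range rows {
		if _, ok := idPos[r.ID]; !ok {
			idPos[r.ID] = len(ids)
			ids = append(ids, r.ID)
		}
		if _, ok := wide[r.Name]; !ok {
			wide[r.Name] = nil
			names = append(names, r.Name)
		}
	}
	// Allocate each new column, pre-filled with NaN for missing cells.
	for _, name := range names {
		col := make([]float64, len(ids))
		for i := range col {
			col[i] = math.NaN()
		}
		wide[name] = col
	}
	// Second pass: place each value at its (id, name) cell.
	for _, r := range rows {
		wide[r.Name][idPos[r.ID]] = r.Value
	}
	return ids, names, wide
}

func main() {
	rows := []longRow{{"a", "q1", 1}, {"a", "q2", 2}, {"b", "q1", 3}}
	ids, names, wide := pivotWider(rows)
	fmt.Println(ids, names)
	for _, n := range names {
		fmt.Println(n, wide[n])
	}
}
```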
func (*Engine) Rank ¶
Rank returns the competition rank (1-indexed). Tied values share the same rank, and the next distinct value's rank skips the positions the ties occupy. For example, [10,20,20,30] → [1,2,2,4].
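Competition ranking can be sketched by sorting indices and assigning each element its 1-based sorted position unless it ties with its predecessor. The competitionRank function below is a hypothetical standalone version over []float64, not the engine's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// competitionRank assigns 1-indexed ranks; tied values share the lowest
// rank of their group, and the following rank skips ahead accordingly.
func competitionRank(data []float64) []int {
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return data[idx[a]] < data[idx[b]] })

	ranks := make([]int, len(data))
	for pos, i := range idx {
		if pos > 0 && data[i] == data[idx[pos-1]] {
			ranks[i] = ranks[idx[pos-1]] // tie: reuse the previous rank
		} else {
			ranks[i] = pos + 1 // sorted position, so gaps appear after ties
		}
	}
	return ranks
}

func main() {
	fmt.Println(competitionRank([]float64{10, 20, 20, 30})) // [1 2 2 4]
	fmt.Println(competitionRank([]float64{30, 10, 20, 20})) // [4 1 2 2]
}
```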
func (*Engine) ReadCSV ¶
func (e *Engine) ReadCSV(_ context.Context, r io.Reader, cfg dataset.CSVConfig) (dataset.Table, error)
ReadCSV reads CSV data using go-simdcsv with schema inference.
func (*Engine) ReadParquet ¶
func (e *Engine) ReadParquet(_ context.Context, r io.ReaderAt, size int64, _ dataset.ParquetConfig) (dataset.Table, error)
ReadParquet reads Parquet data using parquet-go (row-based reader).
func (*Engine) ReplaceNA ¶
ReplaceNA replaces null (NaN) values in a float64 column with defaultVal.
func (*Engine) Separate ¶
func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
Separate splits a string column by a delimiter into multiple columns.
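Per row, the operation is a bounded string split. The sketch below uses a hypothetical separate function over a []string; leaving missing pieces empty and folding extra pieces into the last column are assumptions, since the description does not say how the engine handles rows with too few or too many pieces.

```go
package main

import (
	"fmt"
	"strings"
)

// separate splits each string by sep into n output columns. Rows with fewer
// than n pieces leave the remaining cells empty; extra pieces stay in the
// last column because strings.SplitN stops after n substrings.
func separate(data []string, n int, sep string) [][]string {
	cols := make([][]string, n)
	for j := range cols {
		cols[j] = make([]string, len(data))
	}
	for i, s := range data {
		for j, p := range strings.SplitN(s, sep, n) {
			cols[j][i] = p
		}
	}
	return cols
}

func main() {
	cols := separate([]string{"2024-01", "2025-12", "2026"}, 2, "-")
	fmt.Println(cols[0]) // [2024 2025 2026]
	fmt.Println(cols[1])
}
```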
func (*Engine) SortIndices ¶
SortIndices returns the permutation that sorts the column ascending.
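A sorting permutation (often called argsort) can be sketched by sorting a slice of indices with a comparator that looks up the data. The sortIndices function below is hypothetical, and the use of a stable sort is an assumption about tie handling.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIndices returns idx such that data[idx[0]] <= data[idx[1]] <= ...
// Equal values keep their original relative order.
func sortIndices(data []float64) []int {
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return data[idx[a]] < data[idx[b]] })
	return idx
}

func main() {
	data := []float64{30, 10, 20}
	idx := sortIndices(data)
	fmt.Println(idx) // [1 2 0]

	// Applying the permutation yields the values in ascending order.
	for _, i := range idx {
		fmt.Println(data[i])
	}
}
```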
func (*Engine) Variance ¶
Variance returns the sample variance of a float64 column as a single-row column.
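Sample variance divides the sum of squared deviations by n - 1 rather than n. A standalone sketch follows; sampleVariance is a hypothetical name, and returning NaN for fewer than two values is an assumption (the engine may instead return an error such as ErrEmptyColumn).

```go
package main

import (
	"fmt"
	"math"
)

// sampleVariance returns sum((x - mean)^2) / (n - 1).
// It returns NaN when fewer than two values are supplied.
func sampleVariance(data []float64) float64 {
	n := len(data)
	if n < 2 {
		return math.NaN()
	}
	mean := 0.0
	for _, v := range data {
		mean += v
	}
	mean /= float64(n)

	sumSq := 0.0
	for _, v := range data {
		d := v - mean
		sumSq += d * d
	}
	return sumSq / float64(n-1)
}

func main() {
	// mean = 3, squared deviations sum to 10, and 10 / 4 = 2.5
	fmt.Println(sampleVariance([]float64{1, 2, 3, 4, 5})) // 2.5
}
```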