datatable

v0.0.7
Published: Mar 14, 2024 License: Unlicense Imports: 7 Imported by: 0

README

datatable

Go package providing a column-centric data structure for aggregating data. Inspired by the R-based data.table (https://github.com/Rdatatable/data.table/wiki).

Installation

Simply run

go get github.com/iand/datatable

Documentation is at https://pkg.go.dev/github.com/iand/datatable

Author

Note that this package was initially developed for Avocet and is released here with their permission.

License

This is free and unencumbered software released into the public domain. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.

Documentation

Overview

Package datatable provides a column-centric data structure for aggregating data. See https://github.com/Rdatatable/data.table/wiki for inspiration.

Variables

var (
	ErrInvalidColumnLength   = errors.New("invalid column length")
	ErrMismatchedColumnTypes = errors.New("mismatched column types")
	ErrWrongNumberOfColumns  = errors.New("wrong number of columns in data")
)

Types

type Aggregator

type Aggregator interface {
	Aggregate(rg RowGroup) float64
}

func Count

func Count() Aggregator

Count returns an Aggregator that finds the count of numeric values in a group of rows.

func DifferenceOfSums

func DifferenceOfSums(a, b string) Aggregator

func Max

func Max(name string) Aggregator

Max returns an Aggregator that finds the maximum value of a numeric column in a group of rows.

func Mean

func Mean(name string) Aggregator

Mean returns an Aggregator that finds the mean value of a numeric column in a group of rows.

func Min

func Min(name string) Aggregator

Min returns an Aggregator that finds the minimum value of a numeric column in a group of rows.

func RatioOfSums

func RatioOfSums(a, b string) Aggregator

func Sum

func Sum(name string) Aggregator

Sum returns an Aggregator that sums a numeric column in a group of rows.

func Variance

func Variance(name string) Aggregator

Variance returns an Aggregator that finds the variance of a numeric column in a group of rows.

type AggregatorFunc

type AggregatorFunc func(rg RowGroup) float64

AggregatorFunc adapts a function to an Aggregator interface

func (AggregatorFunc) Aggregate

func (fn AggregatorFunc) Aggregate(rg RowGroup) float64
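
The adapter lets an ordinary function serve as an Aggregator. The following is a minimal sketch of that pattern, not the package's code: RowGroup is cut down to the three methods used here, sliceGroup is a hypothetical in-memory row group, and Range is a hypothetical aggregator that does not exist in the package. It also assumes that Aggregate simply calls the wrapped function.

```go
package main

import "fmt"

// RowGroup is a local stand-in holding only the methods this sketch needs.
// The package's RowGroup interface declares more.
type RowGroup interface {
	Reset()
	Next() bool
	FloatValue(name string) (float64, bool)
}

// Aggregator and AggregatorFunc mirror the documented declarations.
type Aggregator interface {
	Aggregate(rg RowGroup) float64
}

type AggregatorFunc func(rg RowGroup) float64

// Aggregate calls the wrapped function (assumed behaviour of the adapter).
func (fn AggregatorFunc) Aggregate(rg RowGroup) float64 { return fn(rg) }

// sliceGroup is a hypothetical row group over a single named column.
type sliceGroup struct {
	name string
	vals []float64
	pos  int
}

func newSliceGroup(name string, vals []float64) *sliceGroup {
	return &sliceGroup{name: name, vals: vals, pos: -1}
}

func (g *sliceGroup) Reset()     { g.pos = -1 }
func (g *sliceGroup) Next() bool { g.pos++; return g.pos < len(g.vals) }
func (g *sliceGroup) FloatValue(name string) (float64, bool) {
	if name != g.name || g.pos < 0 || g.pos >= len(g.vals) {
		return 0, false
	}
	return g.vals[g.pos], true
}

// Range is a hypothetical aggregator: maximum minus minimum of a column.
// It returns 0 for a group with no usable values.
func Range(name string) Aggregator {
	return AggregatorFunc(func(rg RowGroup) float64 {
		rg.Reset()
		first := true
		var lo, hi float64
		for rg.Next() {
			v, ok := rg.FloatValue(name)
			if !ok {
				continue
			}
			if first || v < lo {
				lo = v
			}
			if first || v > hi {
				hi = v
			}
			first = false
		}
		return hi - lo
	})
}

func main() {
	rg := newSliceGroup("price", []float64{3, 9, 4})
	fmt.Println(Range("price").Aggregate(rg)) // 6
}
```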

type Calculator

type Calculator interface {
	Calculate(row RowRef) float64
}

A Calculator performs a calculation on a single row of numeric data.

func Constant

func Constant(v float64) Calculator

Constant returns a Calculator that always returns the constant value v

func Zero

func Zero() Calculator

Zero returns a Calculator that always returns zero

type CalculatorFunc

type CalculatorFunc func(row RowRef) float64

CalculatorFunc adapts a function to a Calculator interface

func (CalculatorFunc) Calculate

func (fn CalculatorFunc) Calculate(row RowRef) float64
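
A Calculator works on one row at a time, so a custom one usually reads a few columns and returns a single number. The sketch below illustrates this with local stand-ins: Row is a hypothetical map-based replacement for RowRef, Product is a hypothetical calculator that is not part of the package, and returning NaN for a missing column is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// Row is a hypothetical stand-in for the package's RowRef, which is an
// opaque struct in the real API.
type Row map[string]float64

// FloatValue reports the value of the named column and whether it exists.
func (r Row) FloatValue(name string) (float64, bool) {
	v, ok := r[name]
	return v, ok
}

// Calculator and CalculatorFunc follow the documented shapes, with Row
// substituted for RowRef.
type Calculator interface {
	Calculate(row Row) float64
}

type CalculatorFunc func(row Row) float64

func (fn CalculatorFunc) Calculate(row Row) float64 { return fn(row) }

// Constant always returns v, matching the documented description.
func Constant(v float64) Calculator {
	return CalculatorFunc(func(Row) float64 { return v })
}

// Product is hypothetical: it multiplies two named columns and yields NaN
// when either column is missing (an assumption of this sketch).
func Product(a, b string) Calculator {
	return CalculatorFunc(func(row Row) float64 {
		x, okX := row.FloatValue(a)
		y, okY := row.FloatValue(b)
		if !okX || !okY {
			return math.NaN()
		}
		return x * y
	})
}

func main() {
	row := Row{"quantity": 2, "price": 3.5}
	fmt.Println(Product("quantity", "price").Calculate(row)) // 7
	fmt.Println(Constant(4).Calculate(row))                  // 4
}
```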

type DataTable

type DataTable struct {
	// contains filtered or unexported fields
}

DataTable is a column-centric table of data. Columns can be either numeric (float64) or text (string). A DataTable is not safe for concurrent use.

func (*DataTable) AddColumn

func (dt *DataTable) AddColumn(name string, values []float64) error

AddColumn adds a column of float64 data. The length of the column must equal the length of any other columns already present in the table.

func (*DataTable) AddStringColumn

func (dt *DataTable) AddStringColumn(name string, values []string) error

AddStringColumn adds a column of string data. The length of the column must equal the length of any other columns already present in the table.

func (*DataTable) Aggregate

func (dt *DataTable) Aggregate(colName string, a Aggregator)

Aggregate appends a new numeric column to the table whose values will be populated by executing the aggregator a against each group of rows that share the same key column values. Each row in a group will be assigned the same value. Rows are evaluated in the table's current sort order as specified by its keys.
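
The idea that every row in a group receives the same aggregated value can be sketched on plain slices. This is an illustration only, not the package's implementation: groupSums is a hypothetical helper, it hard-codes summation as the aggregate, and it assumes a single key column that is already sorted so equal keys are adjacent.

```go
package main

import "fmt"

// groupSums returns a column of the same length as values in which each row
// holds the sum of values over the run of rows that share its key.
// keys and values must have equal length, and keys must be sorted so that
// equal keys are adjacent.
func groupSums(keys []string, values []float64) []float64 {
	out := make([]float64, len(values))
	for start := 0; start < len(keys); {
		// Find the end of the current group and total it.
		end := start
		sum := 0.0
		for end < len(keys) && keys[end] == keys[start] {
			sum += values[end]
			end++
		}
		// Assign the same aggregate to every row in the group.
		for i := start; i < end; i++ {
			out[i] = sum
		}
		start = end
	}
	return out
}

func main() {
	keys := []string{"a", "a", "b", "c", "c"}
	vals := []float64{1, 2, 10, 4, 6}
	fmt.Println(groupSums(keys, vals)) // [3 3 10 10 10]
}
```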

func (*DataTable) AggregateIndex

func (dt *DataTable) AggregateIndex(colName string, a Aggregator, indices []int)

AggregateIndex appends a new numeric column to the table whose values will be populated by executing the aggregator a against each group of rows that share the same key column values and are present in indices. Each row in a group will be assigned the same value. Rows are evaluated in the order they appear in indices. Rows not present in indices will be assigned a NaN value in the new column.

func (*DataTable) AggregateIndexFill

func (dt *DataTable) AggregateIndexFill(col []float64, a Aggregator, indices []int)

AggregateIndexFill populates col with values found by executing the aggregator a against each group of rows that share the same key column values and are present in indices. col must be of the same length as the datatable

func (*DataTable) AggregateWhere

func (dt *DataTable) AggregateWhere(colName string, a Aggregator, m Matcher)

AggregateWhere appends a new numeric column to the table whose values will be populated by executing the aggregator a against each group of rows that share the same key column values and match m. Each row in a group will be assigned the same value. Rows are evaluated in the table's current sort order as specified by its keys. Rows not matched by m will be assigned a NaN value in the new column.

func (*DataTable) Append

func (dt *DataTable) Append(dt2 *DataTable) error

Append appends the rows of dt2 to the data table. An error is returned if the tables share a column name with differing types (numeric vs text). Columns present in dt but not in dt2 will be expanded to the correct length with either NaN or the empty string. Columns present in dt2 but not in dt will be pre-filled with NaN or empty strings before dt2's data is appended. The data table remains sorted according to its keys after the append.
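
The fill rules can be illustrated with a small numeric-only model. This is a sketch under stated assumptions rather than the package's code: table, nanFill and appendTable are hypothetical names, string columns and the type-mismatch error are left out, and the re-sort by keys is not shown.

```go
package main

import (
	"fmt"
	"math"
)

// table is a hypothetical numeric-only stand-in: n rows, column name to values.
type table struct {
	n    int
	cols map[string][]float64
}

// nanFill returns a slice of n NaN values.
func nanFill(n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = math.NaN()
	}
	return out
}

// appendTable appends the rows of src to dst, padding with NaN wherever a
// column exists in only one of the two tables.
func appendTable(dst, src *table) {
	// Columns only in src: pre-fill dst's existing rows with NaN.
	for name := range src.cols {
		if _, ok := dst.cols[name]; !ok {
			dst.cols[name] = nanFill(dst.n)
		}
	}
	// Extend every column, using NaN where src has no data for it.
	for name, col := range dst.cols {
		if s, ok := src.cols[name]; ok {
			dst.cols[name] = append(col, s...)
		} else {
			dst.cols[name] = append(col, nanFill(src.n)...)
		}
	}
	dst.n += src.n
}

func main() {
	dst := &table{n: 2, cols: map[string][]float64{"x": {1, 2}}}
	src := &table{n: 1, cols: map[string][]float64{"x": {3}, "y": {9}}}
	appendTable(dst, src)
	fmt.Println(dst.cols["x"], dst.cols["y"]) // [1 2 3] [NaN NaN 9]
}
```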

func (*DataTable) AppendRow

func (dt *DataTable) AppendRow(row []interface{}) error

AppendRow appends the data in row to the data table.

func (*DataTable) Apply

func (dt *DataTable) Apply(g Grouper)

Apply executes the grouper function g against each group of rows that share the same key column values. Rows are evaluated in the table's current sort order as specified by its keys.

func (*DataTable) ApplyIndex

func (dt *DataTable) ApplyIndex(g Grouper, indices []int)

ApplyIndex executes the grouper function g against each group of rows that share the same key column values and are present in indices. Rows are evaluated in the order they appear in indices.

func (*DataTable) ApplyWhere

func (dt *DataTable) ApplyWhere(g Grouper, m Matcher)

ApplyWhere executes the grouper function g against each group of rows that share the same key column values and match m. Rows are evaluated in the table's current sort order as specified by its keys.

func (*DataTable) CSV

func (dt *DataTable) CSV(w io.Writer) error

CSV writes the datatable as CSV

func (*DataTable) Calc

func (dt *DataTable) Calc(colName string, c Calculator)

Calc appends a new numeric column to the table whose values will be populated by executing the calculator c against each row of data. Rows are evaluated in the table's current sort order as specified by its keys.

func (*DataTable) CalcIndex

func (dt *DataTable) CalcIndex(colName string, c Calculator, indices []int)

CalcIndex appends a new numeric column to the table whose values will be populated by executing the calculator c against each row of data whose index is contained in indices. Rows are evaluated in the order they appear in indices. Rows not present in indices will be assigned a NaN value in the new column.

func (*DataTable) CalcIndexFill

func (dt *DataTable) CalcIndexFill(col []float64, c Calculator, indices []int)

func (*DataTable) CalcWhere

func (dt *DataTable) CalcWhere(colName string, c Calculator, m Matcher)

CalcWhere appends a new numeric column to the table whose values will be populated by executing the calculator c against each row of data that matches m. Rows are evaluated in the table's current sort order as specified by its keys. Rows not matched by m will be assigned a NaN value in the new column.
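
The NaN rule for unmatched rows can be sketched with plain functions in place of Calculator and Matcher. calcWhere here is a hypothetical helper written for illustration; it works on row indices and is not the package's method.

```go
package main

import (
	"fmt"
	"math"
)

// calcWhere builds a column of n values: calc(i) for rows where match(i) is
// true, NaN for every other row.
func calcWhere(n int, match func(i int) bool, calc func(i int) float64) []float64 {
	col := make([]float64, n)
	for i := range col {
		if match(i) {
			col[i] = calc(i)
		} else {
			col[i] = math.NaN()
		}
	}
	return col
}

func main() {
	price := []float64{5, 20, 8, 30}
	doubled := calcWhere(len(price),
		func(i int) bool { return price[i] > 10 },
		func(i int) float64 { return price[i] * 2 },
	)
	fmt.Println(doubled) // [NaN 40 NaN 60]
}
```

Because NaN never compares equal to itself, downstream code that needs to skip the unmatched rows should test with math.IsNaN rather than ==.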

func (*DataTable) Clone

func (dt *DataTable) Clone() *DataTable

Clone returns a new data table containing copies of the columns contained in dt. The returned data table will have no keys set.

func (*DataTable) CloneEmpty

func (dt *DataTable) CloneEmpty() *DataTable

CloneEmpty creates an identical but empty data table with no keys set.

func (*DataTable) CountWhere

func (dt *DataTable) CountWhere(m Matcher) int

CountWhere counts the number of rows that match m. Rows are evaluated in the table's current sort order as specified by its keys.

func (*DataTable) Equal

func (dt *DataTable) Equal(i, j int) bool

Equal compares two rows and returns whether they contain the same values. If the table has keys specified then only those columns will be used in the comparison, in the order specified by the keys. Otherwise all columns are compared in the order they were added to the table.

func (*DataTable) KeyNames

func (dt *DataTable) KeyNames() []string

func (*DataTable) Len

func (dt *DataTable) Len() int

Len returns the number of rows in the data table

func (*DataTable) Less

func (dt *DataTable) Less(i, j int) bool

Less compares two rows and returns whether the row with index i should sort before the row at index j. If the table has keys specified then only those columns will be used in the comparison, in the order specified by the keys. Otherwise all columns are compared in the order they were added to the table.

func (*DataTable) Matches

func (dt *DataTable) Matches(m Matcher) []int

func (*DataTable) N

func (dt *DataTable) N() int

N returns the number of columns in the data table

func (*DataTable) Names

func (dt *DataTable) Names() []string

Names returns a slice of the column names in the data table in the order the columns were added to the table.

func (*DataTable) ParseRow

func (dt *DataTable) ParseRow(values ...string) error

ParseRow attempts to append a row of data by parsing values as either float64 or string depending on the existing type of the relevant column. Values are processed in the order that columns were added to the table.
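
Parsing by column type can be sketched with strconv.ParseFloat. Everything below is a hypothetical model: the column struct, parseRow and the choice to validate the whole row before appending are assumptions of this example, as is reusing the "wrong number of columns" error for a length mismatch.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errWrongNumberOfColumns = errors.New("wrong number of columns in data")

// column is a hypothetical typed column: numeric columns use floats,
// text columns use texts.
type column struct {
	name    string
	numeric bool
	floats  []float64
	texts   []string
}

// parseRow appends one value per column, in column order, parsing numeric
// columns as float64. Nothing is appended if any value fails to parse.
func parseRow(cols []*column, values ...string) error {
	if len(values) != len(cols) {
		return errWrongNumberOfColumns
	}
	// Parse first so that a bad value leaves every column unchanged.
	parsed := make([]float64, len(values))
	for i, c := range cols {
		if !c.numeric {
			continue
		}
		f, err := strconv.ParseFloat(values[i], 64)
		if err != nil {
			return fmt.Errorf("column %q: %w", c.name, err)
		}
		parsed[i] = f
	}
	for i, c := range cols {
		if c.numeric {
			c.floats = append(c.floats, parsed[i])
		} else {
			c.texts = append(c.texts, values[i])
		}
	}
	return nil
}

func main() {
	cols := []*column{{name: "city"}, {name: "temp", numeric: true}}
	fmt.Println(parseRow(cols, "Oslo", "4.5"))         // <nil>
	fmt.Println(parseRow(cols, "Rome", "warm") != nil) // true
	fmt.Println(cols[0].texts, cols[1].floats)         // [Oslo] [4.5]
}
```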

func (*DataTable) RawRows

func (dt *DataTable) RawRows(headers bool) [][]interface{}

RawRows returns all the rows in the datatable. If headers is true then the first row returned will contain the column names. Values in each row are in the order the column was added to the table.

func (*DataTable) Reduce

func (dt *DataTable) Reduce(a Aggregator) float64

Reduce returns the value obtained by executing the aggregator a once against all the rows in the datatable.

func (*DataTable) RemoveColumn

func (dt *DataTable) RemoveColumn(name string) error

RemoveColumn removes a column of any type from the data table.

func (*DataTable) RemoveRows

func (dt *DataTable) RemoveRows(m Matcher)

RemoveRows removes any rows that match m without altering their order.

func (*DataTable) Row

func (dt *DataTable) Row(n int) ([]interface{}, bool)

Row returns a single row of data as a slice, or an empty slice and false if the row number exceeds the bounds of the table. The returned slice contains one value per column in the order the columns were added to the table.

func (*DataTable) RowMap

func (dt *DataTable) RowMap(n int) (RowMap, bool)

RowMap returns a single row of data as a map, or an empty map and false if the row number exceeds the bounds of the table. The keys in the returned map correspond to the names of the columns.

func (*DataTable) RowRef

func (dt *DataTable) RowRef(n int) (RowRef, bool)

func (*DataTable) Rows

func (dt *DataTable) Rows() RowGroup

func (*DataTable) RowsWhere

func (dt *DataTable) RowsWhere(m Matcher) RowGroup

func (*DataTable) Select

func (dt *DataTable) Select(names []string) (*DataTable, error)

Select returns a new data table containing copies of the columns specified in names. The returned data table will have no keys set.

func (*DataTable) SelectIndex

func (dt *DataTable) SelectIndex(names []string, indices []int) (*DataTable, error)

SelectIndex returns a new data table containing copies of the columns specified in names where the rows are in indices. The returned data table will have no keys set.

func (*DataTable) SelectWhere

func (dt *DataTable) SelectWhere(names []string, m Matcher) (*DataTable, error)

SelectWhere returns a new data table containing copies of the columns specified in names where the rows match m. The returned data table will have no keys set.

func (*DataTable) SetFloatValue

func (dt *DataTable) SetFloatValue(name string, row int, v float64) error

func (*DataTable) SetKeys

func (dt *DataTable) SetKeys(keys ...string) error

SetKeys assigns a set of column names to be used as keys when sorting or aggregating. Setting keys sorts the table immediately by the specified keys.
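
The Len, Less and Swap methods have the shape of the standard library's sort.Interface, so key-based sorting of a column-centric table can be sketched as follows. This is an assumption-laden illustration, not the package's code: byKeys and sortByKeys are hypothetical, only numeric columns are modelled, and NaN ordering is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// byKeys is a hypothetical column-centric table that sorts its rows by an
// ordered list of key columns.
type byKeys struct {
	cols map[string][]float64
	keys []string
}

// Len reports the row count, taken from the first key column.
func (t byKeys) Len() int {
	if len(t.keys) == 0 {
		return 0
	}
	return len(t.cols[t.keys[0]])
}

// Less compares rows i and j key by key, in the order the keys were given.
func (t byKeys) Less(i, j int) bool {
	for _, k := range t.keys {
		c := t.cols[k]
		if c[i] != c[j] {
			return c[i] < c[j]
		}
	}
	return false
}

// Swap exchanges rows i and j in every column so rows stay intact.
func (t byKeys) Swap(i, j int) {
	for _, c := range t.cols {
		c[i], c[j] = c[j], c[i]
	}
}

// sortByKeys sorts the columns in place by the given keys.
func sortByKeys(cols map[string][]float64, keys ...string) {
	sort.Sort(byKeys{cols: cols, keys: keys})
}

func main() {
	cols := map[string][]float64{
		"day":    {2, 1, 2, 1},
		"hour":   {9, 17, 8, 3},
		"clicks": {10, 20, 30, 40},
	}
	sortByKeys(cols, "day", "hour")
	fmt.Println(cols["day"], cols["hour"], cols["clicks"])
	// [1 1 2 2] [3 17 8 9] [40 20 30 10]
}
```

Swapping every column in Swap is the cost of a column-centric layout: a row is not a single value, so moving it touches each column once.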

func (*DataTable) Swap

func (dt *DataTable) Swap(i, j int)

Swap exchanges the data in one row of the table for the data in another row.

func (*DataTable) Unique

func (dt *DataTable) Unique() *DataTable

Unique returns a new data table containing only the unique rows from dt. The returned data table will contain the same number of columns in the same order as dt and will have no keys set.

type Grouper

type Grouper interface {
	Group(rg RowGroup)
}

A Grouper performs an action given a group of rows.

type GrouperFunc

type GrouperFunc func(rg RowGroup)

GrouperFunc adapts a function to a Grouper interface

func (GrouperFunc) Group

func (fn GrouperFunc) Group(rg RowGroup)

type Matcher

type Matcher interface {
	Match(row RowRef) bool
}

A Matcher tests a single row of data to determine whether it matches a particular set of criteria.

func CloselyEqual

func CloselyEqual(name string, v float64, e float64) Matcher

CloselyEqual returns a Matcher that tests whether the named column is equal to v within the range +/- e
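
A tolerance test is useful because floating-point arithmetic rarely produces exact values. The comparison can be sketched as below; closelyEqual is a hypothetical helper operating on bare numbers, and treating the bounds as inclusive is an assumption, since the documentation does not say whether the real matcher includes the endpoints.

```go
package main

import (
	"fmt"
	"math"
)

// closelyEqual reports whether x lies within v plus or minus e.
// The bounds are inclusive in this sketch. A NaN input never matches.
func closelyEqual(x, v, e float64) bool {
	return math.Abs(x-v) <= e
}

func main() {
	a, b := 0.1, 0.2
	sum := a + b // 0.30000000000000004 in float64

	fmt.Println(sum == 0.3)                    // false
	fmt.Println(closelyEqual(sum, 0.3, 1e-9))  // true
	fmt.Println(closelyEqual(1.0, 2.0, 0.5))   // false
}
```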

func GreaterThan

func GreaterThan(name string, v float64) Matcher

GreaterThan returns a Matcher that tests whether the named column is greater than v or not

func IsEqualString

func IsEqualString(col string, val string) Matcher

IsEqualString returns a Matcher that tests whether the named column is equal to the given string

func IsInf

func IsInf(name string) Matcher

IsInf returns a Matcher that tests whether the named column is infinite (either positive or negative infinity will return true).

func IsNan

func IsNan(name string) Matcher

IsNan returns a Matcher that tests whether the named column is NaN or not

func IsZero

func IsZero(name string) Matcher

IsZero returns a Matcher that tests whether the named column is zero or not

func LessThan

func LessThan(name string, v float64) Matcher

LessThan returns a Matcher that tests whether the named column is less than v or not

func MultiColumnMatcher

func MultiColumnMatcher(m map[string]string) Matcher

MultiColumnMatcher returns a Matcher that tests whether a row matches the names and values in the map m

func Not

func Not(m Matcher) Matcher

Not returns a Matcher that inverts the value of the supplied matcher

func NumericColumnMatcher

func NumericColumnMatcher(name string, fn func(float64) bool) Matcher

NumericColumnMatcher returns a Matcher that tests the value of a single column in a row of data against the numeric function fn.

func StringColumnMatcher

func StringColumnMatcher(name string, fn func(string) bool) Matcher

StringColumnMatcher returns a Matcher that tests the value of a single column in a row of data against the string function fn.

type MatcherFunc

type MatcherFunc func(row RowRef) bool

MatcherFunc adapts a function to a Matcher interface

func (MatcherFunc) Match

func (fn MatcherFunc) Match(row RowRef) bool
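
Because a Matcher is a single-method interface, matchers compose naturally through MatcherFunc. The sketch below shows the pattern with local stand-ins: Row is a hypothetical map-based replacement for RowRef, GreaterThan and Not are re-implemented here from their descriptions, and All is a hypothetical combinator that is not part of the package.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for the package's RowRef.
type Row map[string]float64

// Matcher and MatcherFunc follow the documented shapes, with Row
// substituted for RowRef.
type Matcher interface {
	Match(row Row) bool
}

type MatcherFunc func(row Row) bool

func (fn MatcherFunc) Match(row Row) bool { return fn(row) }

// GreaterThan matches rows whose named column exists and exceeds v.
func GreaterThan(name string, v float64) Matcher {
	return MatcherFunc(func(row Row) bool {
		x, ok := row[name]
		return ok && x > v
	})
}

// Not inverts the result of m.
func Not(m Matcher) Matcher {
	return MatcherFunc(func(row Row) bool { return !m.Match(row) })
}

// All is hypothetical: it matches only when every supplied matcher matches.
func All(ms ...Matcher) Matcher {
	return MatcherFunc(func(row Row) bool {
		for _, m := range ms {
			if !m.Match(row) {
				return false
			}
		}
		return true
	})
}

func main() {
	m := All(GreaterThan("clicks", 10), Not(GreaterThan("cost", 0)))
	fmt.Println(m.Match(Row{"clicks": 12, "cost": 0})) // true
	fmt.Println(m.Match(Row{"clicks": 3, "cost": 0}))  // false
}
```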

type MatchingRowGroup

type MatchingRowGroup struct {
	// contains filtered or unexported fields
}

func (*MatchingRowGroup) FloatValue

func (m *MatchingRowGroup) FloatValue(name string) (float64, bool)

func (*MatchingRowGroup) Next

func (m *MatchingRowGroup) Next() bool

func (*MatchingRowGroup) Reset

func (m *MatchingRowGroup) Reset()

func (*MatchingRowGroup) RowIndex

func (m *MatchingRowGroup) RowIndex() int

func (*MatchingRowGroup) StringValue

func (m *MatchingRowGroup) StringValue(name string) (string, bool)

func (*MatchingRowGroup) Value

func (m *MatchingRowGroup) Value(name string) (interface{}, bool)

type RowGroup

type RowGroup interface {
	Valuer
	Reset()
	RowIndex() int
	Next() bool
}

type RowMap

type RowMap map[string]interface{}

func (RowMap) FloatValue

func (r RowMap) FloatValue(name string) (float64, bool)

func (RowMap) StringValue

func (r RowMap) StringValue(name string) (string, bool)

func (RowMap) Value

func (r RowMap) Value(name string) (interface{}, bool)

type RowRef

type RowRef struct {
	// contains filtered or unexported fields
}

func (*RowRef) FloatValue

func (r *RowRef) FloatValue(name string) (float64, bool)

func (*RowRef) StringValue

func (r *RowRef) StringValue(name string) (string, bool)

func (*RowRef) Value

func (r *RowRef) Value(name string) (interface{}, bool)

type StaticRowGroup

type StaticRowGroup struct {
	// contains filtered or unexported fields
}

func (*StaticRowGroup) FloatValue

func (r *StaticRowGroup) FloatValue(name string) (float64, bool)

func (*StaticRowGroup) Next

func (r *StaticRowGroup) Next() bool

func (*StaticRowGroup) Reset

func (r *StaticRowGroup) Reset()

func (*StaticRowGroup) RowIndex

func (r *StaticRowGroup) RowIndex() int

RowIndex returns the datatable index of the current row in the row group. It is an error if this is called before calling Next and the function will panic.
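
The Reset, Next and RowIndex methods form a cursor-style iteration protocol: the group starts positioned before the first row and Next must be called to advance onto it. A minimal sketch of that protocol follows. indexGroup and collect are hypothetical names, and panicking after the group is exhausted (not only before the first Next) is an assumption of this example.

```go
package main

import "fmt"

// indexGroup is a hypothetical row group over a fixed list of row indices.
type indexGroup struct {
	indices []int
	pos     int // -1 means "before the first row"
}

func newIndexGroup(indices []int) *indexGroup {
	return &indexGroup{indices: indices, pos: -1}
}

// Reset moves the cursor back to before the first row.
func (g *indexGroup) Reset() { g.pos = -1 }

// Next advances the cursor and reports whether a current row exists.
func (g *indexGroup) Next() bool {
	if g.pos < len(g.indices) {
		g.pos++
	}
	return g.pos < len(g.indices)
}

// RowIndex returns the table index of the current row. It panics when there
// is no current row, for example before the first call to Next.
func (g *indexGroup) RowIndex() int {
	if g.pos < 0 || g.pos >= len(g.indices) {
		panic("RowIndex called without a current row")
	}
	return g.indices[g.pos]
}

// collect walks the group from the start and gathers its row indices.
func collect(g *indexGroup) []int {
	g.Reset()
	var out []int
	for g.Next() {
		out = append(out, g.RowIndex())
	}
	return out
}

func main() {
	g := newIndexGroup([]int{4, 1, 7})
	fmt.Println(collect(g)) // [4 1 7]
	fmt.Println(collect(g)) // [4 1 7], Reset makes the group reusable
}
```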

func (*StaticRowGroup) StringValue

func (r *StaticRowGroup) StringValue(name string) (string, bool)

func (*StaticRowGroup) Value

func (r *StaticRowGroup) Value(name string) (interface{}, bool)

func (*StaticRowGroup) Where

func (r *StaticRowGroup) Where(m Matcher) *StaticRowGroup

Where applies a matcher to the rows in this row group, returning a new row group that contains only the rows that matched. It does not affect the current position of r's iteration.

type Valuer

type Valuer interface {
	Value(name string) (interface{}, bool)
	FloatValue(name string) (float64, bool)
	StringValue(name string) (string, bool)
}

A Valuer can get the value of a column in a particular context
