Documentation ¶
Overview ¶
Package dataframe provides a general interface for managing tabular data that supports filtering, aggregation and data manipulation.
This package is beta and its API is subject to change.
This package includes a CSV decoder that can be used to create dataframes from CSV files. The decoder is not a streaming decoder and loads the entire CSV file into memory.
The following example shows how to create a dataframe from a CSV file:
package main

import ...

// Columns are declared once by name and reused in filters.
var (
	PickupDay  = dataframe.Strings("PICKUP_DAY")
	PickupTime = dataframe.Ints("PICKUP_TIME")
	// ...
)

func main() {
	run.Run(handler, run.Decode(dataframe.FromCSV))
}

func handler(d dataframe.DataFrame, o store.Options) (store.Solver, error) {
	d = d.Filter(
		PickupDay.Equals("Monday").Or(PickupDay.Equals("Tuesday")),
	).Filter(
		PickupTime.IsInRange(16, 19),
	)
	// ...
}
Running the example creates a dataframe from the CSV file and passes it to the handler function, which can then use the dataframe to solve the problem. To pass a CSV file to the example, use the input path flag:
nextmv run local . -- \
	-runner.input.path ./data.csv.gz \
	-runner.output.path output.json
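The non-streaming behavior mentioned in the overview can be pictured with the standard library alone. The sketch below is not the package's decoder. It assumes a header row, and the helpers `loadColumns` and `loadColumnsFromString` are hypothetical names introduced only for illustration. Every record is read with `encoding/csv` before any column is returned, so memory use grows with the size of the file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// loadColumns is a hypothetical helper, not part of package dataframe.
// It reads the whole CSV input into memory and returns the values of each
// column keyed by the header name.
func loadColumns(r io.Reader) (map[string][]string, error) {
	// ReadAll is non-streaming: all records are held in memory at once.
	records, err := csv.NewReader(r).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("missing header row")
	}
	header, rows := records[0], records[1:]
	columns := make(map[string][]string, len(header))
	for i, name := range header {
		values := make([]string, 0, len(rows))
		for _, row := range rows {
			// csv.Reader rejects records with a differing field count by
			// default, so row[i] is always in range here.
			values = append(values, row[i])
		}
		columns[name] = values
	}
	return columns, nil
}

// loadColumnsFromString is a convenience wrapper for in-memory input.
func loadColumnsFromString(input string) (map[string][]string, error) {
	return loadColumns(strings.NewReader(input))
}

func main() {
	input := "PICKUP_DAY,PICKUP_TIME\nMonday,17\nFriday,9\n"
	columns, err := loadColumnsFromString(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(columns["PICKUP_DAY"], columns["PICKUP_TIME"])
}
```

For inputs that do not fit in memory, a loop over `csv.Reader.Read` would be the streaming alternative, which the decoder described here does not provide.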
Constants ¶
const (
	// Bool type representing boolean true and false values.
	Bool DataType = "bool"
	// Int type representing int values.
	Int = "int"
	// Float type representing float64 values.
	Float = "float"
	// String type representing string values.
	String = "string"
)
Types of data in DataFrame.
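To illustrate how the four data types relate to Go values, the following sketch converts a raw text cell according to a data type. It is only an assumption about how such a conversion could look. `parseCell` is a hypothetical helper, and `DataType` and its constants are re-declared locally (all typed explicitly) so that the example compiles on its own.

```go
package main

import (
	"fmt"
	"strconv"
)

// DataType and its constants mirror the declarations listed above so that
// the sketch is self-contained. They are local copies, not the package's.
type DataType string

const (
	Bool   DataType = "bool"
	Int    DataType = "int"
	Float  DataType = "float"
	String DataType = "string"
)

// parseCell is a hypothetical helper that converts a raw text cell into the
// Go value matching the given data type: bool, int, float64 or string.
func parseCell(raw string, t DataType) (any, error) {
	switch t {
	case Bool:
		return strconv.ParseBool(raw)
	case Int:
		return strconv.Atoi(raw)
	case Float:
		return strconv.ParseFloat(raw, 64)
	case String:
		return raw, nil
	default:
		return nil, fmt.Errorf("unknown data type %q", t)
	}
}

func main() {
	cells := []struct {
		raw string
		t   DataType
	}{
		{"true", Bool}, {"17", Int}, {"2.5", Float}, {"Monday", String},
	}
	for _, c := range cells {
		v, err := parseCell(c.raw, c.t)
		fmt.Printf("%s: %T %v (err: %v)\n", c.t, v, v, err)
	}
}
```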
Types ¶
type Aggregation ¶
type Aggregation interface {
	fmt.Stringer
	// Column returns the column the aggregation will be applied to.
	Column() Column
	// As returns the column to be used to identify the newly created column
	// containing the aggregated value.
	As() Column
}
Aggregation defines how to aggregate the rows of each group in a Groups instance.
type Aggregations ¶
type Aggregations []Aggregation
Aggregations is a slice of Aggregation instances.
type BoolColumn ¶
type BoolColumn interface {
	Column
	// IsFalse creates a filter to filter all rows having value false.
	IsFalse() Filter
	// IsTrue creates a filter to filter all rows having value true.
	IsTrue() Filter
	// Value returns the value at row for dataframe df,
	// panics if out of bounds.
	Value(df DataFrame, row int) bool
	// Values returns all the values in the column for dataframe df.
	Values(df DataFrame) []bool
}
BoolColumn is the typed column of type Bool.
type Column ¶
type Column interface {
	fmt.Stringer
	// Name returns the name of the column. The name is the unique identifier
	// of the column within a DataFrame instance.
	Name() string
	// DataType returns the type of the column.
	DataType() DataType
}
Column is a single column in a DataFrame instance. It is identified by its name and has a DataType.
type DataFrame ¶
type DataFrame interface {
	// Column returns the column identified by name, panics if not present.
	Column(name string) Column
	// Columns returns all columns present in the dataframe.
	Columns() Columns
	// Distinct returns a new DataFrame that only contains unique rows with
	// respect to the specified columns. If no columns are given, rows are
	// compared on all columns.
	Distinct(columns ...Column) DataFrame
	// Filter returns a new filtered DataFrame according to the filter.
	Filter(filter Filter) DataFrame
	// GroupBy groups rows together for which the values of specified columns
	// are the same.
	GroupBy(columns ...Column) Groups
	// HasColumn reports whether a column with name is present in the
	// dataframe.
	HasColumn(name string) bool
	// AreBools returns true if column by name is of type Bool, otherwise false.
	AreBools(name string) bool
	// AreInts returns true if column by name is of type Int, otherwise false.
	AreInts(name string) bool
	// AreFloats returns true if column by name is of type Float, otherwise
	// false.
	AreFloats(name string) bool
	// AreStrings returns true if column by name is of type String, otherwise
	// false.
	AreStrings(name string) bool
	// Len returns the number of rows in the dataframe.
	Len() int
	// Select returns a new dataframe containing only the specified columns.
	Select(columns ...Column) DataFrame
}
DataFrame is an immutable data frame that supports filtering, aggregation and data manipulation.
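The idea behind Distinct and immutability, where an operation returns a new value and leaves its input untouched, can be sketched without the package. The code below is an illustration under stated assumptions, not the package's implementation. The `row` type and the `distinct` function are hypothetical, and keeping the first occurrence of each duplicate is a choice of this sketch that the documentation above does not specify.

```go
package main

import "fmt"

// row is a simplified stand-in for one dataframe row: column name -> value.
type row map[string]string

// distinct returns a new slice holding the first row seen for every unique
// combination of values in the given columns. The input slice is not
// modified. Keeping the first occurrence is an assumption of this sketch.
func distinct(rows []row, columns ...string) []row {
	seen := make(map[string]bool)
	var result []row
	for _, r := range rows {
		parts := make([]string, len(columns))
		for i, c := range columns {
			parts[i] = r[c]
		}
		// %q quotes every value, so different value lists never share a key.
		key := fmt.Sprintf("%q", parts)
		if !seen[key] {
			seen[key] = true
			result = append(result, r)
		}
	}
	return result
}

func main() {
	rows := []row{
		{"day": "Monday", "time": "17"},
		{"day": "Monday", "time": "18"},
		{"day": "Tuesday", "time": "17"},
	}
	// prints: 2 3 3
	fmt.Println(len(distinct(rows, "day")), len(distinct(rows, "day", "time")), len(rows))
}
```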
type Filter ¶
type Filter interface {
	fmt.Stringer
	// And creates and returns a conjunction filter of the invoking filter
	// and filter.
	And(filter Filter) Filter
	// Not creates and returns a negation filter of the invoking filter.
	Not() Filter
	// Or creates and returns a disjunction filter of the invoking filter
	// and filter.
	Or(filter Filter) Filter
}
Filter defines how to filter rows of a DataFrame instance.
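The And, Or and Not combinators can be illustrated with plain predicate functions. This is a minimal sketch, not the package's implementation. The `predicate` type and the `apply` function are hypothetical, and the sketch assumes that a filter keeps the rows it matches.

```go
package main

import "fmt"

// predicate is a simplified stand-in for Filter. It decides per row index
// whether the row matches.
type predicate func(row int) bool

// and returns a predicate matching rows matched by both p and q.
func (p predicate) and(q predicate) predicate {
	return func(row int) bool { return p(row) && q(row) }
}

// or returns a predicate matching rows matched by p, by q, or by both.
func (p predicate) or(q predicate) predicate {
	return func(row int) bool { return p(row) || q(row) }
}

// not returns a predicate matching exactly the rows p does not match.
func (p predicate) not() predicate {
	return func(row int) bool { return !p(row) }
}

// apply returns the indices of the rows in [0, n) that p matches.
func apply(n int, p predicate) []int {
	var rows []int
	for r := 0; r < n; r++ {
		if p(r) {
			rows = append(rows, r)
		}
	}
	return rows
}

func main() {
	days := []string{"Monday", "Tuesday", "Friday", "Monday"}
	times := []int{17, 9, 18, 20}

	isDay := func(d string) predicate {
		return func(row int) bool { return days[row] == d }
	}
	// The bounds are inclusive, like the documented range [min, max].
	inRange := func(min, max int) predicate {
		return func(row int) bool { return times[row] >= min && times[row] <= max }
	}

	f := isDay("Monday").or(isDay("Tuesday")).and(inRange(16, 19))
	fmt.Println(apply(len(days), f)) // prints [0]
}
```

Because each combinator returns a new predicate, filters compose by chaining, which is the same shape as the `Equals(...).Or(...)` call in the overview example.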
type FloatColumn ¶
type FloatColumn interface {
	Column
	NumericAggregations
	// IsInRange creates a filter to filter all rows having a value within
	// range [min, max].
	IsInRange(min, max float64) Filter
	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) float64
	// Values returns all the values in the column.
	Values(df DataFrame) []float64
}
FloatColumn is the typed column of type Float.
type Groups ¶
type Groups interface {
	// Aggregate applies the given aggregations to all row groups in the
	// Groups and returns a DataFrame instance in which each row corresponds
	// to one group.
	Aggregate(aggregations ...Aggregation) DataFrame
	// DataFrames returns a slice of DataFrame where each frame represents
	// the content of one group.
	DataFrames() DataFrames
}
Groups contains the groups of rows produced by the DataFrame.GroupBy function.
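Grouping followed by aggregation can be sketched with a map from group key to rows. The example is only an illustration of the concept. The `trip` type and the functions `groupByDay` and `sumDuration` are hypothetical and do not reflect how the package stores its groups.

```go
package main

import (
	"fmt"
	"sort"
)

// trip is a simplified stand-in for one row with a string and an int column.
type trip struct {
	day      string
	duration int
}

// groupByDay collects the rows that share the same value in the day column.
func groupByDay(trips []trip) map[string][]trip {
	groups := make(map[string][]trip)
	for _, t := range trips {
		groups[t.day] = append(groups[t.day], t)
	}
	return groups
}

// sumDuration reduces every group to one aggregated value, which
// corresponds to one output row per group.
func sumDuration(groups map[string][]trip) map[string]int {
	totals := make(map[string]int, len(groups))
	for day, rows := range groups {
		for _, r := range rows {
			totals[day] += r.duration
		}
	}
	return totals
}

func main() {
	trips := []trip{{"Monday", 10}, {"Tuesday", 5}, {"Monday", 7}}
	totals := sumDuration(groupByDay(trips))

	// Sort the group keys so the output order is deterministic.
	days := make([]string, 0, len(totals))
	for day := range totals {
		days = append(days, day)
	}
	sort.Strings(days)
	for _, day := range days {
		fmt.Println(day, totals[day])
	}
}
```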
type IntColumn ¶
type IntColumn interface {
	Column
	NumericAggregations
	// IsInRange creates a filter to filter all rows having a value within
	// range [min, max].
	IsInRange(min, max int) Filter
	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) int
	// Values returns all the values in the column.
	Values(df DataFrame) []int
}
IntColumn is the typed column of type Int.
type NumericAggregations ¶
type NumericAggregations interface {
	// Max creates an aggregation which reports the maximum value in a
	// column named as.
	Max(as string) Aggregation
	// Min creates an aggregation which reports the minimum value in a
	// column named as.
	Min(as string) Aggregation
	// Sum creates an aggregation which reports the sum of values in a
	// column named as.
	Sum(as string) Aggregation
}
NumericAggregations defines the possible aggregations which can be applied on columns of type Float and Int.
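What the three aggregations compute can be shown with one generic helper that works for both int and float64 values. This is a sketch under assumptions. The `number` constraint and the `minMaxSum` function are hypothetical, and returning `ok == false` for an empty group is a choice made here, since the documentation above does not say how empty input is handled.

```go
package main

import "fmt"

// number restricts the helper to the element types of Int and Float columns.
type number interface {
	~int | ~float64
}

// minMaxSum is a hypothetical helper that computes the minimum, maximum and
// sum of values in a single pass. ok is false when values is empty.
func minMaxSum[T number](values []T) (lo, hi, total T, ok bool) {
	if len(values) == 0 {
		return lo, hi, total, false
	}
	lo, hi = values[0], values[0]
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
		total += v
	}
	return lo, hi, total, true
}

func main() {
	lo, hi, total, _ := minMaxSum([]int{17, 9, 18})
	fmt.Println(lo, hi, total) // prints 9 18 44
}
```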
type StringColumn ¶
type StringColumn interface {
	Column
	// Equals creates a filter to filter all rows having value value.
	Equals(value string) Filter
	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) string
	// Values returns all the values in the column.
	Values(df DataFrame) []string
}
StringColumn is the typed column of type String.
func Strings ¶
func Strings(name string) StringColumn
Strings returns a StringColumn identified by name.
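Because a column is identified only by its name, it can be declared before any data exists and then resolved against whichever dataframe is passed to Value. The sketch below illustrates that pattern with local stand-ins. `frame`, `stringColumn` and `newStrings` are hypothetical and are not the package's types.

```go
package main

import "fmt"

// frame is a simplified stand-in for DataFrame: column name -> values.
type frame map[string][]string

// stringColumn is a stand-in for StringColumn. It holds only the column
// name, so it can be declared before any data exists.
type stringColumn struct{ name string }

// newStrings plays the role of Strings in this sketch.
func newStrings(name string) stringColumn { return stringColumn{name: name} }

// value returns the value at row in df. Indexing panics when row is out of
// bounds, similar to the documented behavior of Value.
func (c stringColumn) value(df frame, row int) string {
	return df[c.name][row]
}

// equals returns the indices of the rows in df whose value equals want.
func (c stringColumn) equals(df frame, want string) []int {
	var rows []int
	for i, v := range df[c.name] {
		if v == want {
			rows = append(rows, i)
		}
	}
	return rows
}

// pickupDay is declared once and reused with any frame.
var pickupDay = newStrings("PICKUP_DAY")

func main() {
	weekdays := frame{"PICKUP_DAY": {"Monday", "Friday"}}
	weekend := frame{"PICKUP_DAY": {"Sunday"}}
	fmt.Println(pickupDay.value(weekdays, 1), pickupDay.value(weekend, 0))
	fmt.Println(pickupDay.equals(weekdays, "Monday"))
}
```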