Documentation ¶
Overview ¶
Package dataframe provides a general interface for managing tabular data that supports filtering, aggregation and data manipulation.
This package is beta and its API is subject to change.
This package includes a decoder for CSV files which can be used to create dataframes from CSV files. The decoder is not a streaming decoder and will load the entire CSV file into memory.
The following example shows how to create a dataframe from a CSV file:
package main

import ...

var (
	PickupDay  = dataframe.Strings("PICKUP_DAY")
	PickupTime = dataframe.Ints("PICKUP_TIME")
	// ...
)

func main() {
	run.Run(handler, run.Decode(dataframe.FromCSV))
}

func handler(d dataframe.DataFrame, o store.Options) (store.Solver, error) {
	// Keep Monday and Tuesday pickups, then keep pickup times in [16, 19].
	d = d.Filter(
		PickupDay.Equals("Monday").Or(PickupDay.Equals("Tuesday")),
	).Filter(
		PickupTime.IsInRange(16, 19),
	)
	// ...
}
Running the example will create a dataframe from the CSV file and pass it to the handler function. The handler function can then use the dataframe to solve the problem. To pass a CSV file to the example, use the input path flag:
nextmv run local . -- \
  -runner.input.path ./data.csv.gz \
  -runner.output.path output.json
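The overview states that the CSV decoder is not streaming and loads the entire file into memory. The following is a minimal sketch of what that implies, using only the standard library `encoding/csv` package. The helper `loadColumns` and the column-oriented `map[string][]string` layout are assumptions made for illustration; they are not part of package dataframe and do not describe how `dataframe.FromCSV` is implemented.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// loadColumns reads every CSV record at once (non-streaming) and returns the
// data column by column, keyed by header name.
// Hypothetical helper for illustration; not part of package dataframe.
func loadColumns(input string) (map[string][]string, error) {
	// ReadAll holds all records in memory before returning.
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty CSV input")
	}
	header := records[0]
	columns := make(map[string][]string, len(header))
	for _, row := range records[1:] {
		// csv.Reader rejects rows whose field count differs from the
		// header by default, so indexing row[i] is safe here.
		for i, name := range header {
			columns[name] = append(columns[name], row[i])
		}
	}
	return columns, nil
}

func main() {
	input := "PICKUP_DAY,PICKUP_TIME\nMonday,17\nTuesday,9\n"
	columns, err := loadColumns(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(columns["PICKUP_DAY"], columns["PICKUP_TIME"])
	// prints: [Monday Tuesday] [17 9]
}
```

Because all records are held in memory at once, memory use grows with file size, which is worth keeping in mind for large inputs.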
Index ¶
Constants ¶
const (
	// Bool type representing boolean true and false values.
	Bool DataType = "bool"
	// Int type representing int values.
	Int = "int"
	// Float type representing float64 values.
	Float = "float"
	// String type representing string values.
	String = "string"
)
Types of data in DataFrame.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Aggregation ¶
type Aggregation interface {
	fmt.Stringer

	// Column returns the column the aggregation will be applied to.
	Column() Column
	// As returns the column to be used to identify the newly created column
	// containing the aggregated value.
	As() Column
}
Aggregation defines how to aggregate the rows of each group in a Groups instance.
type Aggregations ¶
type Aggregations []Aggregation
Aggregations is a slice of Aggregation instances.
type BoolColumn ¶
type BoolColumn interface {
	Column

	// IsFalse creates a filter to filter all rows having value false.
	IsFalse() Filter
	// IsTrue creates a filter to filter all rows having value true.
	IsTrue() Filter

	// Value returns the value at row for dataframe df,
	// panics if out of bounds.
	Value(df DataFrame, row int) bool
	// Values returns all the values in the column for dataframe df.
	Values(df DataFrame) []bool
}
BoolColumn is the typed column of type Bool.
type Column ¶
type Column interface {
	fmt.Stringer

	// Name returns the name of the column, the name is the unique identifier
	// of the column within a DataFrame instance.
	Name() string
	// DataType returns the type of the column.
	DataType() DataType
}
Column is a single column in a DataFrame instance. It is identified by its name and has a DataType.
type DataFrame ¶
type DataFrame interface {
	// Column returns a column identified by name, panics if not present.
	Column(name string) Column
	// Columns returns all columns present in the dataframe.
	Columns() Columns

	// Distinct returns a new DataFrame that only contains unique rows with
	// respect to the specified columns. If no columns are given Distinct will
	// return rows where all columns are unique.
	Distinct(columns ...Column) DataFrame
	// Filter returns a new filtered DataFrame according to the filter.
	Filter(filter Filter) DataFrame
	// GroupBy groups rows together for which the values of specified columns
	// are the same.
	GroupBy(columns ...Column) Groups

	// HasColumn reports if a column with name is present in the dataframe.
	HasColumn(name string) bool

	// AreBools returns true if column by name is of type Bool, otherwise false.
	AreBools(name string) bool
	// AreInts returns true if column by name is of type Int, otherwise false.
	AreInts(name string) bool
	// AreFloats returns true if column by name is of type Float, otherwise
	// false.
	AreFloats(name string) bool
	// AreStrings returns true if column by name is of type String, otherwise
	// false.
	AreStrings(name string) bool

	// Len returns the number of rows in the dataframe.
	Len() int

	// Select returns a new dataframe containing only the specified columns.
	Select(columns ...Column) DataFrame
}
DataFrame is an immutable data frame that supports filtering, aggregation and data manipulation.
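To make the documented behavior of Distinct concrete (a new result, the input left untouched, uniqueness judged on the given columns or on all columns when none are given), here is a small self-contained sketch. The `row` type and the `distinct` and `rowKey` helpers are hypothetical stand-ins, not the package's types, and keeping the first occurrence of each duplicate is an assumption; the documentation does not say which row survives.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// row is a hypothetical stand-in for one dataframe row: column name -> value.
type row map[string]string

// rowKey builds a comparison key from the values of the given columns.
// Values are quoted so separators inside values cannot cause collisions.
func rowKey(r row, columns []string) string {
	parts := make([]string, len(columns))
	for i, name := range columns {
		parts[i] = strconv.Quote(r[name])
	}
	return strings.Join(parts, ",")
}

// distinct returns a new slice holding the first row seen for each unique
// combination of values in columns. With no columns, all columns of the
// first row are used. The input slice is never modified.
func distinct(rows []row, columns ...string) []row {
	cols := append([]string(nil), columns...)
	if len(cols) == 0 && len(rows) > 0 {
		for name := range rows[0] {
			cols = append(cols, name)
		}
		sort.Strings(cols) // fixed order so keys are comparable across rows
	}
	seen := make(map[string]bool)
	var out []row
	for _, r := range rows {
		k := rowKey(r, cols)
		if !seen[k] {
			seen[k] = true
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []row{
		{"day": "Monday", "time": "16"},
		{"day": "Monday", "time": "17"},
		{"day": "Tuesday", "time": "16"},
		{"day": "Monday", "time": "16"},
	}
	fmt.Println(len(rows), len(distinct(rows, "day")), len(distinct(rows)))
	// prints: 4 2 3
}
```

The original slice still has four rows after both calls, mirroring how each DataFrame method returns a new value instead of changing the receiver.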
type Filter ¶
type Filter interface {
	fmt.Stringer

	// And creates and returns a conjunction filter of the invoking filter
	// and filter.
	And(filter Filter) Filter
	// Not creates and returns a negation filter of the invoking filter.
	Not() Filter
	// Or creates and returns a disjunction filter of the invoking filter
	// and filter.
	Or(filter Filter) Filter
}
Filter defines how to filter rows out of a DataFrame instance.
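The And, Or and Not methods compose filters as conjunction, disjunction and negation. One way to picture this is with plain predicate functions, as in the sketch below. The `trip` and `predicate` types and the `dayEquals`, `timeInRange` and `filter` helpers are all hypothetical; this illustrates the combinator idea only and is not the package's implementation.

```go
package main

import "fmt"

// trip is a hypothetical row type used only for this illustration.
type trip struct {
	Day  string
	Time int
}

// predicate reports whether a row should be kept.
type predicate func(trip) bool

// And keeps rows accepted by both predicates (conjunction).
func (p predicate) And(q predicate) predicate {
	return func(t trip) bool { return p(t) && q(t) }
}

// Or keeps rows accepted by either predicate (disjunction).
func (p predicate) Or(q predicate) predicate {
	return func(t trip) bool { return p(t) || q(t) }
}

// Not keeps rows rejected by the predicate (negation).
func (p predicate) Not() predicate {
	return func(t trip) bool { return !p(t) }
}

// dayEquals accepts rows whose Day equals day.
func dayEquals(day string) predicate {
	return func(t trip) bool { return t.Day == day }
}

// timeInRange accepts rows whose Time lies in the inclusive range [lo, hi].
func timeInRange(lo, hi int) predicate {
	return func(t trip) bool { return t.Time >= lo && t.Time <= hi }
}

// filter returns a new slice with the rows accepted by keep.
func filter(rows []trip, keep predicate) []trip {
	var out []trip
	for _, r := range rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []trip{{"Monday", 17}, {"Tuesday", 9}, {"Friday", 18}, {"Tuesday", 19}}
	keep := dayEquals("Monday").Or(dayEquals("Tuesday")).And(timeInRange(16, 19))
	fmt.Println(filter(rows, keep))
	// prints: [{Monday 17} {Tuesday 19}]
}
```

Chaining two Filter calls, as the overview example does, selects the same rows as a single filter built with And.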
type FloatColumn ¶
type FloatColumn interface {
	Column
	NumericAggregations

	// IsInRange creates a filter to filter all rows within range [min, max].
	IsInRange(min, max float64) Filter

	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) float64
	// Values returns all the values in the column.
	Values(df DataFrame) []float64
}
FloatColumn is the typed column of type Float.
type Groups ¶
type Groups interface {
	// Aggregate applies the given aggregations to all row groups in the
	// Groups and returns a DataFrame instance where each row corresponds
	// to one group.
	Aggregate(aggregations ...Aggregation) DataFrame
	// DataFrames returns a slice of DataFrame where each frame represents
	// the content of one group.
	DataFrames() DataFrames
}
Groups contains groups of rows produced by the DataFrame.GroupBy function.
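The group-then-aggregate flow described by GroupBy and Aggregate can be sketched with plain maps and slices. Everything below (`trip`, `dayTotal`, `groupByDay`, `aggregateSum`) is hypothetical and only mirrors the documented semantics: rows with equal key values land in the same group, and aggregation yields one result row per group. Sorting the output by key is an assumption made to get deterministic output; the documentation does not specify row order.

```go
package main

import (
	"fmt"
	"sort"
)

// trip is a hypothetical input row.
type trip struct {
	Day  string
	Fare int
}

// dayTotal is a hypothetical output row: one per group.
type dayTotal struct {
	Day     string
	FareSum int
}

// groupByDay puts rows with the same Day into the same group.
func groupByDay(rows []trip) map[string][]trip {
	groups := make(map[string][]trip)
	for _, r := range rows {
		groups[r.Day] = append(groups[r.Day], r)
	}
	return groups
}

// aggregateSum reduces each group to a single row holding the sum of Fare.
func aggregateSum(groups map[string][]trip) []dayTotal {
	out := make([]dayTotal, 0, len(groups))
	for day, rows := range groups {
		sum := 0
		for _, r := range rows {
			sum += r.Fare
		}
		out = append(out, dayTotal{Day: day, FareSum: sum})
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i].Day < out[j].Day })
	return out
}

func main() {
	rows := []trip{{"Monday", 10}, {"Tuesday", 7}, {"Monday", 5}}
	fmt.Println(aggregateSum(groupByDay(rows)))
	// prints: [{Monday 15} {Tuesday 7}]
}
```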
type IntColumn ¶
type IntColumn interface {
	Column
	NumericAggregations

	// IsInRange creates a filter to filter all values within range [min, max].
	IsInRange(min, max int) Filter

	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) int
	// Values returns all the values in the column.
	Values(df DataFrame) []int
}
IntColumn is the typed column of type Int.
type NumericAggregations ¶
type NumericAggregations interface {
	// Max creates an aggregation which reports the maximum value using
	// name as.
	Max(as string) Aggregation
	// Min creates an aggregation which reports the minimum value using
	// name as.
	Min(as string) Aggregation
	// Sum creates an aggregation which reports the sum of values using
	// name as.
	Sum(as string) Aggregation
}
NumericAggregations defines the possible aggregations which can be applied on columns of type Float and Int.
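Each of Max, Min and Sum takes an `as` argument that names the column receiving the aggregated value. A rough way to model this is a named reducer, as sketched below. The `aggregation` struct and the `maxOf`, `minOf`, `sumOf` and `aggregate` helpers are hypothetical and are not the package's API. The sketch also assumes the values of a group are non-empty.

```go
package main

import "fmt"

// aggregation pairs an output name with a reducer over a group's values.
// Hypothetical type for illustration only.
type aggregation struct {
	as     string
	reduce func(values []int) int
}

// maxOf reports the maximum value under the name as; values must be non-empty.
func maxOf(as string) aggregation {
	return aggregation{as: as, reduce: func(values []int) int {
		best := values[0]
		for _, v := range values[1:] {
			if v > best {
				best = v
			}
		}
		return best
	}}
}

// minOf reports the minimum value under the name as; values must be non-empty.
func minOf(as string) aggregation {
	return aggregation{as: as, reduce: func(values []int) int {
		best := values[0]
		for _, v := range values[1:] {
			if v < best {
				best = v
			}
		}
		return best
	}}
}

// sumOf reports the sum of values under the name as.
func sumOf(as string) aggregation {
	return aggregation{as: as, reduce: func(values []int) int {
		total := 0
		for _, v := range values {
			total += v
		}
		return total
	}}
}

// aggregate applies every aggregation to values and returns one result row,
// keyed by each aggregation's output name.
func aggregate(values []int, aggs ...aggregation) map[string]int {
	out := make(map[string]int, len(aggs))
	for _, a := range aggs {
		out[a.as] = a.reduce(values)
	}
	return out
}

func main() {
	times := []int{16, 19, 17}
	fmt.Println(aggregate(times, maxOf("latest"), minOf("earliest"), sumOf("total")))
	// prints: map[earliest:16 latest:19 total:52]
}
```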
type StringColumn ¶
type StringColumn interface {
	Column

	// Equals creates a filter to filter all rows having value value.
	Equals(value string) Filter

	// Value returns the value at row, panics if out of bounds.
	Value(df DataFrame, row int) string
	// Values returns all the values in the column.
	Values(df DataFrame) []string
}
StringColumn is the typed column of type String.
func Strings ¶
func Strings(name string) StringColumn
Strings returns a StringColumn identified by name.
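Typed columns such as StringColumn are declared from a name alone and are handed a dataframe only when values are read. The sketch below models that pattern with a hypothetical `frame` map and a hypothetical `stringColumn` handle created by `stringsCol`. None of these names belong to the package, and returning a copy from `Values` is an assumption chosen to match the immutability the documentation describes.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for a dataframe: column name -> values.
type frame map[string][]string

// stringColumn is a handle that knows only the column name.
type stringColumn struct{ name string }

// stringsCol returns a handle for the string column identified by name.
func stringsCol(name string) stringColumn { return stringColumn{name: name} }

// Name returns the column's identifier.
func (c stringColumn) Name() string { return c.name }

// Value returns the value at row in f; it panics if row is out of bounds.
func (c stringColumn) Value(f frame, row int) string { return f[c.name][row] }

// Values returns a copy of all values so callers cannot modify the frame.
func (c stringColumn) Values(f frame) []string {
	return append([]string(nil), f[c.name]...)
}

func main() {
	pickupDay := stringsCol("PICKUP_DAY")
	f := frame{"PICKUP_DAY": {"Monday", "Tuesday"}}
	fmt.Println(pickupDay.Name(), pickupDay.Value(f, 1), pickupDay.Values(f))
	// prints: PICKUP_DAY Tuesday [Monday Tuesday]
}
```

Declaring handles once at package level, as the overview example does with PickupDay, keeps column names in one place and lets the same handle be used against any dataframe that has that column.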