dataframe

package
v0.20.7
Published: Dec 7, 2022 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package dataframe provides a general interface for managing tabular data that supports filtering, aggregation and data manipulation.

This package is beta and its API is subject to change.

This package includes a decoder for CSV files which can be used to create dataframes from CSV files. The decoder is not a streaming decoder and will load the entire CSV file into memory.
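To make the memory implication of a non-streaming decoder concrete, here is a minimal sketch that uses only Go's standard `encoding/csv` package, not this package's decoder. The helper name `loadAll` and the sample data are hypothetical, and treating the first record as a header row is an assumption of the sketch.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// loadAll parses every record of data before returning, so memory use grows
// with the size of the input. It only illustrates non-streaming behavior and
// is not the decoder returned by FromCSV.
func loadAll(data string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(data)).ReadAll()
}

func main() {
	input := "PICKUP_DAY,PICKUP_TIME\nMonday,17\nTuesday,9\n"
	records, err := loadAll(input)
	if err != nil {
		panic(err)
	}
	// Assumption: the first record is a header row, the rest are data rows.
	fmt.Println(len(records)-1, "data rows loaded")
}
```

A streaming reader would instead call `Read` in a loop and handle one record at a time, which keeps memory use roughly constant for large files.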

The following example shows how to create a dataframe from a CSV file:

package main

import ...

var (
	PickupDay = dataframe.Strings("PICKUP_DAY")
	PickupTime = dataframe.Ints("PICKUP_TIME")
	// ...
)

func main() {
	run.Run(handler, run.Decode(dataframe.FromCSV))
}

func handler(d dataframe.DataFrame, o store.Options) (store.Solver, error) {
	d = d.Filter(
		PickupDay.Equals("Monday").Or(PickupDay.Equals("Tuesday")),
	).Filter(
		PickupTime.IsInRange(16, 19),
	)
	// ...
}

Running the example creates a dataframe from the CSV file and passes it to the handler function, which can then use the dataframe to solve the problem. To pass a CSV file to the example, use the input path flag:

nextmv run local . -- \
	-hop.runner.input.path ./data.csv.gz \
	-hop.runner.output.path output.json

Index

Constants

const (
	// Bool type representing boolean true and false values.
	Bool DataType = "bool"
	// Int type representing int values.
	Int = "int"
	// Float type representing float64 values.
	Float = "float"
	// String type representing string values.
	String = "string"
)

Types of data in DataFrame.

Variables

This section is empty.

Functions

func FromCSV

func FromCSV() decode.Decoder

FromCSV returns a decoder that decodes comma-separated values (CSV) files into a DataFrame.

Types

type Aggregation

type Aggregation interface {
	fmt.Stringer

	// Column returns the column the aggregation will be applied to.
	Column() Column

	// As returns the column to be used to identify the newly created column
	// containing the aggregated value.
	As() Column
}

Aggregation defines how to aggregate the rows of each group in a Groups instance.

type Aggregations

type Aggregations []Aggregation

Aggregations is the slice of Aggregation instances.

type BoolColumn

type BoolColumn interface {
	Column

	// IsFalse creates a filter to filter all rows having value false.
	IsFalse() Filter
	// IsTrue creates a filter to filter all rows having value true.
	IsTrue() Filter

	// Value returns the value at row for dataframe df,
	// panics if out of bound.
	Value(df DataFrame, row int) bool

	// Values returns all the values in the column for dataframe df.
	Values(df DataFrame) []bool
}

BoolColumn is the typed column of type Bool.

func Bools

func Bools(name string) BoolColumn

Bools returns a BoolColumn identified by name.

type Column

type Column interface {
	fmt.Stringer

	// Name returns the name of the column, the name is the unique identifier
	// of the column within a DataFrame instance.
	Name() string

	// DataType returns the type of the column.
	DataType() DataType
}

Column is a single column in a DataFrame instance. It is identified by its name and has a DataType.

type Columns

type Columns []Column

Columns is the slice of Column instances.

type DataFrame

type DataFrame interface {
	// Column returns a column identified by name, panics if not present.
	Column(name string) Column
	// Columns returns all columns present in the dataframe.
	Columns() Columns

	// Distinct returns a new DataFrame that only contains unique rows with
	// respect to the specified columns. If no columns are given Distinct will
	// return rows where all columns are unique.
	Distinct(columns ...Column) DataFrame

	// Filter returns a new filtered DataFrame according to the filter.
	Filter(filter Filter) DataFrame

	// GroupBy groups rows together for which the values of specified columns
	// are the same.
	GroupBy(columns ...Column) Groups

	// HasColumn reports if a column with name is present in the dataframe.
	HasColumn(name string) bool

	// AreBools returns true if column by name is of type Bool, otherwise false.
	AreBools(name string) bool
	// AreInts returns true if column by name is of type Int, otherwise false.
	AreInts(name string) bool
	// AreFloats returns true if column by name is of type Float, otherwise
	// false.
	AreFloats(name string) bool
	// AreStrings returns true if column by name is of type String, otherwise
	// false.
	AreStrings(name string) bool

	// Len returns the number of rows in the dataframe.
	Len() int

	// Select returns a new dataframe containing only the specified columns.
	Select(columns ...Column) DataFrame
}

DataFrame is an immutable data frame that supports filtering, aggregation and data manipulation.
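The Distinct semantics documented on the interface can be sketched on plain slices. This is a stand-in written for illustration, not the package's implementation: the function `distinct`, the use of column indices instead of Column values, and the choice to keep the first occurrence of each duplicate are all assumptions (the documentation does not say which duplicate survives).

```go
package main

import (
	"fmt"
	"strings"
)

// distinct keeps the first row seen for each combination of values in the
// columns at indices cols. With no indices, whole rows are compared.
// Assumption: the first occurrence wins.
func distinct(rows [][]string, cols ...int) [][]string {
	seen := make(map[string]bool)
	var out [][]string
	for _, row := range rows {
		key := row
		if len(cols) > 0 {
			key = make([]string, 0, len(cols))
			for _, c := range cols {
				key = append(key, row[c])
			}
		}
		// Join with a separator that is assumed not to occur in the data.
		k := strings.Join(key, "\x00")
		if !seen[k] {
			seen[k] = true
			out = append(out, row)
		}
	}
	return out
}

func main() {
	rows := [][]string{
		{"Monday", "17"},
		{"Monday", "18"},
		{"Monday", "17"},
		{"Tuesday", "9"},
	}
	fmt.Println(len(distinct(rows)))    // whole rows compared: prints 3
	fmt.Println(len(distinct(rows, 0))) // first column only: prints 2
}
```

Like the documented method, the sketch returns a new slice and leaves its input untouched, which matches the immutability described above.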

type DataFrames

type DataFrames []DataFrame

DataFrames is the slice of DataFrame instances.

type DataType

type DataType string

DataType defines the types of columns available in DataFrame.

type Filter

type Filter interface {
	fmt.Stringer

	// And creates and returns a conjunction filter of the invoking filter
	// and filter.
	And(filter Filter) Filter

	// Not creates and returns a negation filter of the invoking filter.
	Not() Filter

	// Or creates and returns a disjunction filter of the invoking filter
	// and filter.
	Or(filter Filter) Filter
}

Filter defines how to filter rows out of a DataFrame instance.
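The way And, Or and Not compose can be sketched with a predicate type. The type `rowFilter` and the helpers `equals`, `inRange` and `apply` are hypothetical stand-ins operating on plain slices; they are not part of this package and only mirror the documented method names. The sketch assumes a filter keeps the rows that match it and that ranges are inclusive, as the [min, max] notation suggests.

```go
package main

import "fmt"

// rowFilter is a hypothetical stand-in for Filter: a predicate on a row index.
type rowFilter func(row int) bool

// And keeps rows matched by both filters.
func (f rowFilter) And(g rowFilter) rowFilter {
	return func(row int) bool { return f(row) && g(row) }
}

// Or keeps rows matched by at least one of the filters.
func (f rowFilter) Or(g rowFilter) rowFilter {
	return func(row int) bool { return f(row) || g(row) }
}

// Not keeps rows the invoking filter rejects.
func (f rowFilter) Not() rowFilter {
	return func(row int) bool { return !f(row) }
}

// equals matches rows whose value in col equals v.
func equals(col []string, v string) rowFilter {
	return func(row int) bool { return col[row] == v }
}

// inRange matches rows whose value in col lies in the inclusive range [lo, hi].
func inRange(col []int, lo, hi int) rowFilter {
	return func(row int) bool { return col[row] >= lo && col[row] <= hi }
}

// apply returns the indices of the first n rows that f keeps.
func apply(n int, f rowFilter) []int {
	var kept []int
	for row := 0; row < n; row++ {
		if f(row) {
			kept = append(kept, row)
		}
	}
	return kept
}

func main() {
	days := []string{"Monday", "Tuesday", "Friday", "Monday"}
	times := []int{17, 9, 18, 20}
	f := equals(days, "Monday").Or(equals(days, "Tuesday")).And(inRange(times, 16, 19))
	fmt.Println(apply(len(days), f)) // prints [0]
}
```

Under these assumptions, chaining two Filter calls on a DataFrame, as in the overview example, selects the same rows as combining the two filters with And.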

type Filters

type Filters []Filter

Filters is the slice of Filter instances.

type FloatColumn

type FloatColumn interface {
	Column
	NumericAggregations

	// IsInRange creates a filter to filter all rows within range [min, max].
	IsInRange(min, max float64) Filter

	// Value returns the value at row, panics if out of bound.
	Value(df DataFrame, row int) float64

	// Values returns all the values in the column.
	Values(df DataFrame) []float64
}

FloatColumn is the typed column of type Float.

func Floats

func Floats(name string) FloatColumn

Floats returns a FloatColumn identified by name.

type Groups

type Groups interface {
	// Aggregate applies the given aggregations to all row groups in the
	// Groups and returns DataFrame instance where each row corresponds
	// to each group.
	Aggregate(aggregations ...Aggregation) DataFrame

	// DataFrames returns a slice of DataFrame where each frame represents
	// the content of one group.
	DataFrames() DataFrames
}

Groups contains groups of rows produced by DataFrame.GroupBy function.
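As a rough illustration of grouping followed by aggregation, the sketch below groups values by a key column and computes the minimum, maximum and sum of each group, yielding one result per group. The `summary` type and `aggregate` function are hypothetical and work on plain slices of equal length; they do not use or reproduce this package's Groups or Aggregation types.

```go
package main

import (
	"fmt"
	"sort"
)

// summary holds the aggregated values of one group (hypothetical type).
type summary struct{ Min, Max, Sum int }

// aggregate groups values by the key at the same index and returns one
// summary per distinct key. keys and values must have the same length.
func aggregate(keys []string, values []int) map[string]summary {
	out := make(map[string]summary)
	for i, k := range keys {
		v := values[i]
		s, ok := out[k]
		if !ok {
			// First value of the group initializes all three aggregates.
			out[k] = summary{Min: v, Max: v, Sum: v}
			continue
		}
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
		s.Sum += v
		out[k] = s
	}
	return out
}

func main() {
	days := []string{"Monday", "Tuesday", "Monday", "Tuesday", "Monday"}
	times := []int{17, 9, 12, 21, 8}
	groups := aggregate(days, times)

	// Sort the group names so the output order is deterministic.
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, groups[name])
	}
}
```

With the documented API, the corresponding steps would be a GroupBy call on the key column followed by Aggregate with the Min, Max and Sum aggregations of a numeric column.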

type IntColumn

type IntColumn interface {
	Column
	NumericAggregations

	// IsInRange creates a filter to filter all rows within range [min, max].
	IsInRange(min, max int) Filter

	// Value returns the value at row, panics if out of bound.
	Value(df DataFrame, row int) int

	// Values returns all the values in the column.
	Values(df DataFrame) []int
}

IntColumn is the typed column of type Int.

func Ints

func Ints(name string) IntColumn

Ints returns an IntColumn identified by name.

type NumericAggregations

type NumericAggregations interface {
	// Max creates an aggregation which reports the maximum value using
	// name as.
	Max(as string) Aggregation
	// Min creates an aggregation which reports the minimum value using
	// name as.
	Min(as string) Aggregation
	// Sum creates an aggregation which reports the sum of values using
	// name as.
	Sum(as string) Aggregation
}

NumericAggregations defines the possible aggregations which can be applied on columns of type Float and Int.

type StringColumn

type StringColumn interface {
	Column

	// Equals creates a filter to filter all rows having value value.
	Equals(value string) Filter

	// Value returns the value at row, panics if out of bound.
	Value(df DataFrame, row int) string

	// Values returns all the values in the column.
	Values(df DataFrame) []string
}

StringColumn is the typed column of type String.

func Strings

func Strings(name string) StringColumn

Strings returns a StringColumn identified by name.
