dataframe

Version: v1.1.0 (Latest)

Warning: This package is not in the latest version of its module.

Published: Dec 18, 2025 License: MIT Imports: 12 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AddTypedColumn

func AddTypedColumn[T any](df *DataFrame, col *Column[T]) error

AddTypedColumn adds a typed column to the DataFrame.

Parameters:

  • df: The DataFrame to which the column will be added.
  • col: The typed column to add.

Returns:

  • error: An error if the operation fails.

Types

type Column

type Column[T any] struct {
	Name string
	Data []T
}

Column represents a typed column in the DataFrame. T is the type of the column data (e.g., int, float64, string, bool).

func ConvertToAnyColumn

func ConvertToAnyColumn[T any](col *Column[T]) *Column[any]

ConvertToAnyColumn converts a typed column to a generic column whose elements have type `any`.

func NewColumn

func NewColumn[T any](name string, data []T) *Column[T]

NewColumn creates a new typed column with the given name and data.

func (*Column[T]) At

func (c *Column[T]) At(index int) (T, error)

At returns the value at the given index.

func (*Column[T]) Len

func (c *Column[T]) Len() int

Len returns the length of the column.

type DataFrame

type DataFrame struct {
	Columns map[string]*Column[any] // Map column name to generic Column
}

DataFrame represents a collection of typed columns. It provides methods for adding, removing, and manipulating columns and rows.

func FromCSVReader

func FromCSVReader(reader io.Reader) (*DataFrame, error)

FromCSVReader creates a DataFrame from a CSV reader.

Parameters:

  • reader: An io.Reader for the CSV data.

Returns:

  • *DataFrame: The created DataFrame.
  • error: An error if the data cannot be read.

func NewDataFrame

func NewDataFrame() *DataFrame

NewDataFrame creates a new empty DataFrame.

Returns:

  • *DataFrame: A pointer to the newly created DataFrame.

func (*DataFrame) Add

func (df *DataFrame) Add(other *DataFrame, fillValue ...any) (*DataFrame, error)

Add sums two DataFrames and returns the result as a new DataFrame.

Parameters:

  • other: The DataFrame to add to df.
  • fillValue (optional): The value to substitute when a nil value is present in a DataFrame.

Returns:

  • *DataFrame: A pointer to a new DataFrame that contains the summed values.
  • error: An error if the operation fails.

Note:

  • String values are not summed; a nil value is inserted in their place.
  • If the row counts do not match, the mismatched rows default to nil unless fillValue is specified.
  • Only the first value passed in fillValue is used; any further values are ignored.
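The Note bullets for Add describe three edge cases: strings, uneven row counts, and the variadic fillValue. The sketch below works on plain []any slices rather than the package's DataFrame and only illustrates those rules under assumptions: the fill value replaces the missing operand (pandas-style) rather than the result, and only int, int64, and float64 count as numeric. The names addColumns and toFloat are hypothetical.

```go
package main

import "fmt"

// toFloat reports whether v is a supported numeric type and returns it as float64.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	default:
		return 0, false
	}
}

// addColumns sums a and b element-wise. Positions past the end of the shorter
// column, and nil cells, are replaced by fillValue[0] when one is given
// (later fill values are ignored); otherwise the result cell is nil.
// Non-numeric operands, such as strings, also produce nil.
func addColumns(a, b []any, fillValue ...any) []any {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	var fill any
	if len(fillValue) > 0 {
		fill = fillValue[0]
	}
	out := make([]any, n)
	for i := 0; i < n; i++ {
		var x, y any
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x == nil {
			x = fill
		}
		if y == nil {
			y = fill
		}
		fx, okx := toFloat(x)
		fy, oky := toFloat(y)
		if okx && oky {
			out[i] = fx + fy
		} // otherwise out[i] stays nil
	}
	return out
}

func main() {
	a := []any{1, 2.5, "x", nil}
	b := []any{10, 0.5, "y"}
	fmt.Println(addColumns(a, b))    // [11 3 <nil> <nil>]
	fmt.Println(addColumns(a, b, 0)) // [11 3 <nil> 0]
}
```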

func (*DataFrame) AddColumn

func (df *DataFrame) AddColumn(col *Column[any]) error

AddColumn adds a generic column to the DataFrame.

Parameters:

  • col: The generic column to add.

Returns:

  • error: An error if the operation fails.
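AddColumn and AddTypedColumn only promise "an error if the operation fails". This sketch redefines the documented structs locally and assumes three failure cases (a nil column, a duplicate name, and a row-count mismatch); the package may check more, fewer, or different conditions.

```go
package main

import (
	"errors"
	"fmt"
)

// Column and DataFrame mirror the exported struct definitions in the docs.
type Column[T any] struct {
	Name string
	Data []T
}

type DataFrame struct {
	Columns map[string]*Column[any]
}

// AddColumn stores col under its name. The three failure cases checked here
// (nil column, duplicate name, row-count mismatch) are assumptions.
func (df *DataFrame) AddColumn(col *Column[any]) error {
	if col == nil {
		return errors.New("nil column")
	}
	if _, exists := df.Columns[col.Name]; exists {
		return fmt.Errorf("column %q already exists", col.Name)
	}
	// Every column must hold the same number of rows.
	for _, existing := range df.Columns {
		if len(existing.Data) != len(col.Data) {
			return fmt.Errorf("column %q has %d rows, want %d", col.Name, len(col.Data), len(existing.Data))
		}
	}
	df.Columns[col.Name] = col
	return nil
}

// AddTypedColumn boxes each typed value into an any and delegates to AddColumn.
// It is a top-level function because Go methods cannot declare type parameters.
func AddTypedColumn[T any](df *DataFrame, col *Column[T]) error {
	if col == nil {
		return errors.New("nil column")
	}
	data := make([]any, len(col.Data))
	for i, v := range col.Data {
		data[i] = v
	}
	return df.AddColumn(&Column[any]{Name: col.Name, Data: data})
}

func main() {
	df := &DataFrame{Columns: map[string]*Column[any]{}}
	fmt.Println(AddTypedColumn(df, &Column[int]{Name: "id", Data: []int{1, 2}}))        // <nil>
	fmt.Println(AddTypedColumn(df, &Column[string]{Name: "tag", Data: []string{"a"}})) // column "tag" has 1 rows, want 2
}
```

Because Go does not allow type parameters on methods, a generic free function alongside a non-generic AddColumn method is presumably why the package exposes AddTypedColumn as a function rather than a DataFrame method.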

func (*DataFrame) AddDatetimeIndex

func (df *DataFrame) AddDatetimeIndex(columnName string, format string) error

AddDatetimeIndex adds a datetime index to the DataFrame.

func (*DataFrame) AppendRow

func (df *DataFrame) AppendRow(result *DataFrame, row map[string]any) error

func (*DataFrame) Apply

func (df *DataFrame) Apply(function FuncType, axis ...int) (any, error)

Apply applies the given function to the data, either column-wise or row-wise.

Parameters:

  • function: The function to apply to the data. Custom functions are supported.
  • axis (optional): The direction in which to apply the function: 0 for column-wise, 1 for row-wise. Defaults to column-wise when omitted; any value other than 0 is treated as row-wise.

Returns:

  • any: The result; its concrete type depends on the return type of the function passed in.
  • error: An error if, for example, a row of the dataset cannot be retrieved or the function returns nothing.

Note:

  • The custom function must match the FuncType signature, func([]any) any.
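A sketch of the axis rule using a local FuncType with the declared signature func([]any) any. The helper apply works on a plain map of columns, and its result shapes (map[string]any per column, []any per row) are assumptions, since the package only promises an any result. Columns are visited in sorted-name order so that row slices have a stable layout.

```go
package main

import (
	"fmt"
	"sort"
)

// FuncType matches the package's declared type.
type FuncType func([]any) any

// apply runs fn over each column (axis 0, the default) or each row (any other
// axis value). The returned shapes are assumptions for this sketch.
func apply(cols map[string][]any, fn FuncType, axis ...int) any {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names) // fixed column order for row-wise application
	if len(axis) == 0 || axis[0] == 0 {
		out := make(map[string]any, len(cols))
		for _, name := range names {
			out[name] = fn(cols[name])
		}
		return out
	}
	nrows := 0
	if len(names) > 0 {
		nrows = len(cols[names[0]])
	}
	out := make([]any, nrows)
	for i := 0; i < nrows; i++ {
		row := make([]any, len(names))
		for j, name := range names {
			row[j] = cols[name][i]
		}
		out[i] = fn(row)
	}
	return out
}

// sumFloats satisfies FuncType: it totals every float64 in xs and ignores the rest.
func sumFloats(xs []any) any {
	total := 0.0
	for _, x := range xs {
		if f, ok := x.(float64); ok {
			total += f
		}
	}
	return total
}

func main() {
	cols := map[string][]any{"a": {1.0, 2.0}, "b": {10.0, 20.0}}
	fmt.Println(apply(cols, sumFloats))    // map[a:3 b:30]
	fmt.Println(apply(cols, sumFloats, 1)) // [11 22]
}
```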

func (*DataFrame) Astype

func (df *DataFrame) Astype(columnName string, targetType string) error

Astype converts the data type of the named column to targetType.

func (*DataFrame) BarPlot

func (df *DataFrame) BarPlot(columnName, outputFile string) error

BarPlot generates a bar plot for the specified column and saves it to outputFile.

func (*DataFrame) BooleanIndex

func (df *DataFrame) BooleanIndex(condition func(row map[string]any) bool) *DataFrame

BooleanIndex filters rows based on a boolean condition, keeping the rows for which condition returns true.

func (*DataFrame) ColumnNames

func (df *DataFrame) ColumnNames() []string

ColumnNames returns the names of all columns in the DataFrame.

Returns:

  • []string: A sorted list of column names.

func (*DataFrame) DropColumn

func (df *DataFrame) DropColumn(name string) error

DropColumn removes a column from the DataFrame.

Parameters:

  • name: The name of the column to remove.

Returns:

  • error: An error if the column does not exist.

func (*DataFrame) DropNa

func (df *DataFrame) DropNa() error

DropNa removes rows with missing values from the DataFrame.
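A minimal sketch of dropping incomplete rows, assuming nil marks a missing value and all columns have the same length. Unlike DropNa, which modifies the DataFrame and returns only an error, this hypothetical dropNa returns a new map of columns.

```go
package main

import "fmt"

// numRows returns the shared column length (all columns are assumed equal).
func numRows(cols map[string][]any) int {
	for _, data := range cols {
		return len(data)
	}
	return 0
}

// dropNa keeps only the rows in which every column holds a non-nil value.
func dropNa(cols map[string][]any) map[string][]any {
	n := numRows(cols)
	out := make(map[string][]any, len(cols))
	for name := range cols {
		out[name] = []any{}
	}
	for i := 0; i < n; i++ {
		complete := true
		for _, data := range cols {
			if data[i] == nil {
				complete = false
				break
			}
		}
		if !complete {
			continue
		}
		for name, data := range cols {
			out[name] = append(out[name], data[i])
		}
	}
	return out
}

func main() {
	cols := map[string][]any{
		"city": {"Oslo", nil, "Lima"},
		"temp": {4.5, 20.1, nil},
	}
	fmt.Println(dropNa(cols)) // map[city:[Oslo] temp:[4.5]]
}
```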

func (*DataFrame) DropRow

func (df *DataFrame) DropRow(i int) error

DropRow removes the row at index i from the DataFrame.

func (*DataFrame) FillNa

func (df *DataFrame) FillNa(value any)

FillNa fills missing values in the DataFrame with the specified value.

func (*DataFrame) Filter

func (df *DataFrame) Filter(condition func(row map[string]any) bool) *DataFrame

Filter returns a new DataFrame with rows that satisfy the given condition.

Parameters:

  • condition: A function that takes a row and returns true if the row should be included.

Returns:

  • *DataFrame: A new DataFrame containing the filtered rows.
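Filter and BooleanIndex both take a func(row map[string]any) bool, where each row is keyed by column name (the same shape Row returns). This sketch shows that calling convention on a plain map of columns; the helper filterRows and its explicit nrows argument are hypothetical.

```go
package main

import "fmt"

// filterRows rebuilds each column, keeping only rows for which condition,
// called with a column-name -> value map, returns true.
func filterRows(cols map[string][]any, nrows int, condition func(row map[string]any) bool) map[string][]any {
	out := make(map[string][]any, len(cols))
	for name := range cols {
		out[name] = []any{}
	}
	for i := 0; i < nrows; i++ {
		row := make(map[string]any, len(cols))
		for name, data := range cols {
			row[name] = data[i]
		}
		if condition(row) {
			for name, data := range cols {
				out[name] = append(out[name], data[i])
			}
		}
	}
	return out
}

func main() {
	cols := map[string][]any{"name": {"Ada", "Bob", "Cy"}, "age": {36, 17, 52}}
	adults := filterRows(cols, 3, func(row map[string]any) bool {
		// Use a checked type assertion so non-int ages are simply excluded.
		age, ok := row["age"].(int)
		return ok && age >= 18
	})
	fmt.Println(adults) // map[age:[36 52] name:[Ada Cy]]
}
```

Because row values are typed any, the condition must type-assert them; the two-value form avoids a panic on unexpected types.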

func (*DataFrame) FromCSV

func (df *DataFrame) FromCSV(filename string) (*DataFrame, error)

FromCSV creates a DataFrame from a CSV file.

Parameters:

  • filename: The path to the CSV file.

Returns:

  • *DataFrame: The created DataFrame.
  • error: An error if the file cannot be read.

func (*DataFrame) Groupby

func (df *DataFrame) Groupby(key any) *GroupedDataFrame

func (*DataFrame) Head

func (df *DataFrame) Head(n int) *DataFrame

Head returns the first n rows of the DataFrame.

Parameters:

  • n: The number of rows to return.

Returns:

  • *DataFrame: A new DataFrame containing the first n rows.

func (*DataFrame) Iloc

func (df *DataFrame) Iloc(rowIndices []int, colIndices []int) (*DataFrame, error)

Iloc selects rows and columns by integer position.

func (*DataFrame) InnerJoin

func (df *DataFrame) InnerJoin(other *DataFrame, key string) (*DataFrame, error)

func (*DataFrame) LeftJoin

func (df *DataFrame) LeftJoin(other *DataFrame, key string) (*DataFrame, error)

func (*DataFrame) LinePlot

func (df *DataFrame) LinePlot(xCol, yCol, outputFile string) error

LinePlot generates a line plot of yCol against xCol and saves it to outputFile.

func (*DataFrame) Loc

func (df *DataFrame) Loc(rowLabels []any, colLabels []string) (*DataFrame, error)

Loc selects rows and columns by label.

func (*DataFrame) Max

func (df *DataFrame) Max() (map[string]float64, error)

Max calculates the maximum value of each column in the DataFrame.

func (*DataFrame) Mean

func (df *DataFrame) Mean() (map[string]float64, error)

Mean calculates the mean of the numeric values in each column of the DataFrame.

func (*DataFrame) Min

func (df *DataFrame) Min() (map[string]float64, error)

Min calculates the minimum value of each column in the DataFrame.
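Max, Mean, and Min each return a map[string]float64 keyed by column name. For brevity this sketch computes all three in one pass into a hypothetical colStats struct. It assumes only columns made entirely of int or float64 values are aggregated, that other columns are skipped, and that having no numeric column at all is an error; the package may treat non-numeric columns differently.

```go
package main

import (
	"errors"
	"fmt"
)

// colStats holds the per-column aggregates computed below.
type colStats struct{ Min, Max, Mean float64 }

// toFloats converts data to []float64, accepting int and float64 only.
func toFloats(data []any) ([]float64, bool) {
	vals := make([]float64, 0, len(data))
	for _, v := range data {
		switch n := v.(type) {
		case int:
			vals = append(vals, float64(n))
		case float64:
			vals = append(vals, n)
		default:
			return nil, false
		}
	}
	return vals, true
}

// describe computes min, max, and mean for every non-empty numeric column.
func describe(cols map[string][]any) (map[string]colStats, error) {
	out := make(map[string]colStats)
	for name, data := range cols {
		vals, ok := toFloats(data)
		if !ok || len(vals) == 0 {
			continue // non-numeric or empty column: skipped
		}
		s := colStats{Min: vals[0], Max: vals[0]}
		sum := 0.0
		for _, v := range vals {
			if v < s.Min {
				s.Min = v
			}
			if v > s.Max {
				s.Max = v
			}
			sum += v
		}
		s.Mean = sum / float64(len(vals))
		out[name] = s
	}
	if len(out) == 0 {
		return nil, errors.New("no numeric columns")
	}
	return out, nil
}

func main() {
	cols := map[string][]any{"score": {2, 9.5, 6.5}, "label": {"a", "b", "c"}}
	stats, err := describe(cols)
	fmt.Println(stats["score"], err) // {2 9.5 6} <nil>
}
```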

func (*DataFrame) MultiSelect

func (df *DataFrame) MultiSelect(name ...string) (*DataFrame, error)

MultiSelect returns a dataframe of the selected columns.

Parameters:

  • name: The name of the column(s) to select.

Returns:

  • *DataFrame: The DataFrame struct containing the selected columns.
  • error: An error if the column(s) does not exist.

func (*DataFrame) Ncols

func (df *DataFrame) Ncols() int

Ncols returns the number of columns in the DataFrame.

Returns:

  • int: The number of columns in the DataFrame.

func (*DataFrame) Nrows

func (df *DataFrame) Nrows() int

Nrows returns the number of rows in the DataFrame.

Returns:

  • int: The number of rows in the DataFrame.

func (*DataFrame) OuterJoin

func (df *DataFrame) OuterJoin(other *DataFrame, key string) (*DataFrame, error)

func (*DataFrame) RenameColumn

func (df *DataFrame) RenameColumn(oldName, newName string) error

RenameColumn renames the column oldName to newName.

func (*DataFrame) Resample

func (df *DataFrame) Resample(datetimeColumn string, freq string, aggFunc func([]any) any) (*DataFrame, error)

Resample groups the rows into time buckets of the given frequency, using datetimeColumn, and combines the values in each bucket with aggFunc.

func (*DataFrame) RightJoin

func (df *DataFrame) RightJoin(other *DataFrame, key string) (*DataFrame, error)

func (*DataFrame) Row

func (df *DataFrame) Row(index int) (map[string]any, error)

Row returns a row by index.

Parameters:

  • index: The index of the row to retrieve.

Returns:

  • map[string]any: A map representing the row, with column names as keys.
  • error: An error if the index is out of bounds.
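A sketch of the documented Row contract: one value per column, keyed by column name, with an out-of-bounds error. The helper row and its explicit nrows argument are hypothetical; the error text is illustrative.

```go
package main

import "fmt"

// row collects the value at index from every column into a map keyed by
// column name, returning an error when index is out of bounds.
func row(cols map[string][]any, nrows, index int) (map[string]any, error) {
	if index < 0 || index >= nrows {
		return nil, fmt.Errorf("row index %d out of range [0, %d)", index, nrows)
	}
	r := make(map[string]any, len(cols))
	for name, data := range cols {
		r[name] = data[index]
	}
	return r, nil
}

func main() {
	cols := map[string][]any{"name": {"Ada", "Alan"}, "born": {1815, 1912}}
	r, err := row(cols, 2, 1)
	fmt.Println(r, err) // map[born:1912 name:Alan] <nil>
	_, err = row(cols, 2, 2)
	fmt.Println(err) // row index 2 out of range [0, 2)
}
```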

func (*DataFrame) Select

func (df *DataFrame) Select(name string) (*Column[any], error)

Select returns a column by name.

Parameters:

  • name: The name of the column to select.

Returns:

  • *Column[any]: The selected column.
  • error: An error if the column does not exist.

func (*DataFrame) Shift

func (df *DataFrame) Shift(periods int) *DataFrame

Shift shifts the data in the DataFrame by the given number of periods.
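The direction and fill behavior of Shift are not documented. This sketch of a single column assumes the pandas convention: positive periods move values down, negative periods move them up, and vacated slots become nil. The helper name shift is hypothetical.

```go
package main

import "fmt"

// shift moves values down by periods positions (up when periods is negative)
// and leaves the vacated slots nil.
func shift(data []any, periods int) []any {
	out := make([]any, len(data))
	for i := range data {
		j := i + periods
		if j >= 0 && j < len(data) {
			out[j] = data[i]
		}
	}
	return out
}

func main() {
	fmt.Println(shift([]any{1, 2, 3, 4}, 1))  // [<nil> 1 2 3]
	fmt.Println(shift([]any{1, 2, 3, 4}, -2)) // [3 4 <nil> <nil>]
}
```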

func (*DataFrame) SortValues added in v1.1.0

func (df *DataFrame) SortValues(by string, ascending ...bool) (*DataFrame, error)

SortValues sorts the rows of the DataFrame by the values in one column and returns a new, sorted DataFrame.

Parameters:

  • by: The name of the column to sort by.
  • ascending (optional): The sort order: true for ascending, false for descending. Defaults to ascending when omitted.

Returns:

  • *DataFrame: The sorted DataFrame, or an empty DataFrame if an error occurs.
  • error: An error if the operation fails.

func (*DataFrame) String

func (df *DataFrame) String() string

String returns a string representation of the DataFrame.

Returns:

  • string: A string representation of the DataFrame.

func (*DataFrame) Sum

func (df *DataFrame) Sum() (map[string]float64, error)

Sum calculates the sum of the numeric values in each column of the DataFrame.

func (*DataFrame) Tail

func (df *DataFrame) Tail(n int) *DataFrame

Tail returns the last n rows of the DataFrame.

Parameters:

  • n: The number of rows to return.

Returns:

  • *DataFrame: A new DataFrame containing the last n rows.

func (*DataFrame) ToCSV

func (df *DataFrame) ToCSV(filename string) error

ToCSV exports the DataFrame to a CSV file.

Parameters:

  • filename: The path to the output CSV file.

Returns:

  • error: An error if the file cannot be written.

func (*DataFrame) ToCSVWriter

func (df *DataFrame) ToCSVWriter(writer io.Writer) error

ToCSVWriter exports the DataFrame to a CSV writer.

Parameters:

  • writer: An io.Writer for the CSV data.

Returns:

  • error: An error if the data cannot be written.

type DataFrameSorter added in v1.1.0

type DataFrameSorter struct {
	// contains filtered or unexported fields
}

DataFrameSorter is a helper structure that implements sort.Interface, which allows the DataFrame to be sorted with Go's standard library sort package.

func (DataFrameSorter) Len added in v1.1.0

func (s DataFrameSorter) Len() int

Len is part of sort.Interface.

func (DataFrameSorter) Less added in v1.1.0

func (s DataFrameSorter) Less(i, j int) bool

Less is part of sort.Interface. It compares elements i and j in the sort column.

func (DataFrameSorter) Swap added in v1.1.0

func (s DataFrameSorter) Swap(i, j int)

Swap is part of sort.Interface. It swaps the elements at indices i and j across ALL columns to preserve row integrity.
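DataFrameSorter's fields are unexported, so this is a hypothetical stand-in showing the documented idea: Less compares values in the sort column, while Swap exchanges rows across every column to keep rows intact. For simplicity it assumes the sort column holds only float64 values (other types would panic in Less), sorts in place rather than returning a new DataFrame, and uses sort.Stable; which sort function the package calls is not stated.

```go
package main

import (
	"fmt"
	"sort"
)

// sorter orders rows by the float64 values in column `by`.
type sorter struct {
	cols      map[string][]any
	by        string
	ascending bool
}

func (s sorter) Len() int { return len(s.cols[s.by]) }

// Less assumes every value in the sort column is a float64.
func (s sorter) Less(i, j int) bool {
	a, b := s.cols[s.by][i].(float64), s.cols[s.by][j].(float64)
	if s.ascending {
		return a < b
	}
	return a > b
}

// Swap exchanges rows i and j in all columns so each row stays intact.
func (s sorter) Swap(i, j int) {
	for _, data := range s.cols {
		data[i], data[j] = data[j], data[i]
	}
}

// sortValues sorts cols in place by column `by`; ascending defaults to true.
func sortValues(cols map[string][]any, by string, ascending ...bool) error {
	if _, ok := cols[by]; !ok {
		return fmt.Errorf("column %q not found", by)
	}
	asc := len(ascending) == 0 || ascending[0]
	sort.Stable(sorter{cols: cols, by: by, ascending: asc})
	return nil
}

func main() {
	cols := map[string][]any{"name": {"b", "c", "a"}, "score": {2.0, 3.0, 1.0}}
	if err := sortValues(cols, "score", false); err != nil {
		panic(err)
	}
	fmt.Println(cols["name"], cols["score"]) // [c b a] [3 2 1]
}
```

A value receiver works here because each map entry is a slice header sharing its backing array, so Swap's writes are visible to the caller.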

type FuncType

type FuncType func([]any) any

type GroupedDataFrame

type GroupedDataFrame struct {
	Groups   map[any][]map[string]any
	KeyOrder []any // This is to preserve the order of the data
	Key      string
	Err      error
}

func (*GroupedDataFrame) Count

func (gdf *GroupedDataFrame) Count(colNames ...string) (*DataFrame, error)

func (*GroupedDataFrame) Error

func (gdf *GroupedDataFrame) Error() error

func (*GroupedDataFrame) GetAllColumnNames

func (gdf *GroupedDataFrame) GetAllColumnNames() []string

func (*GroupedDataFrame) Mean

func (gdf *GroupedDataFrame) Mean(colNames ...string) (*DataFrame, error)

func (*GroupedDataFrame) Sum

func (gdf *GroupedDataFrame) Sum(colNames ...string) (*DataFrame, error)
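GroupedDataFrame stores rows per key in Groups plus a KeyOrder slice. That slice is needed because Go map iteration order is unspecified, so ranging over Groups alone would list groups in a different order from run to run. This sketch mirrors the exported fields in a hypothetical grouped type, groups by a string column name (the package's Groupby accepts key any), and sums float64 values per group in first-seen order. Group keys must be comparable values, or the map insert panics.

```go
package main

import "fmt"

// grouped mirrors the exported fields of GroupedDataFrame (minus Err).
type grouped struct {
	Groups   map[any][]map[string]any
	KeyOrder []any
	Key      string
}

// groupBy buckets rows by row[key], recording each key the first time it appears.
func groupBy(rows []map[string]any, key string) grouped {
	g := grouped{Groups: map[any][]map[string]any{}, Key: key}
	for _, r := range rows {
		k := r[key]
		if _, seen := g.Groups[k]; !seen {
			g.KeyOrder = append(g.KeyOrder, k)
		}
		g.Groups[k] = append(g.Groups[k], r)
	}
	return g
}

// sum totals the float64 values of col within each group, in KeyOrder.
func (g grouped) sum(col string) []float64 {
	out := make([]float64, 0, len(g.KeyOrder))
	for _, k := range g.KeyOrder {
		total := 0.0
		for _, r := range g.Groups[k] {
			if f, ok := r[col].(float64); ok {
				total += f
			}
		}
		out = append(out, total)
	}
	return out
}

func main() {
	rows := []map[string]any{
		{"team": "red", "pts": 3.0},
		{"team": "blue", "pts": 1.0},
		{"team": "red", "pts": 2.0},
	}
	g := groupBy(rows, "team")
	fmt.Println(g.KeyOrder, g.sum("pts")) // [red blue] [5 1]
}
```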

type MultiIndex

type MultiIndex struct {
	Levels [][]any
	Labels [][]int
}

MultiIndex represents hierarchical indexing for rows.

type Series

type Series struct {
	Name string
	Data []any
}

Series represents a single named column of data stored as a []any slice. It provides methods for accessing and manipulating the data.

func NewSeries

func NewSeries(name string, data []any) *Series

NewSeries creates a new Series with the given name and data.

Parameters:

  • name: The name of the series.
  • data: The data for the series.

Returns:

  • *Series: A pointer to the newly created Series.

func (*Series) AsFloat64

func (s *Series) AsFloat64() ([]float64, error)

AsFloat64 returns the series data as a float64 slice, converting where possible.

Returns:

  • []float64: The data converted to float64.
  • error: An error if any value cannot be converted.
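"Converting where possible" is not defined further. This sketch of a hypothetical asFloat64 helper assumes it covers float64, float32, int, int64, and numeric strings (parsed with strconv.ParseFloat), and reports the index of the first value that cannot be converted.

```go
package main

import (
	"fmt"
	"strconv"
)

// asFloat64 converts each element to float64 or returns an error naming the
// first index whose value cannot be converted.
func asFloat64(data []any) ([]float64, error) {
	out := make([]float64, len(data))
	for i, v := range data {
		switch x := v.(type) {
		case float64:
			out[i] = x
		case float32:
			out[i] = float64(x)
		case int:
			out[i] = float64(x)
		case int64:
			out[i] = float64(x)
		case string:
			f, err := strconv.ParseFloat(x, 64)
			if err != nil {
				return nil, fmt.Errorf("index %d: %w", i, err)
			}
			out[i] = f
		default:
			return nil, fmt.Errorf("index %d: cannot convert %T to float64", i, v)
		}
	}
	return out, nil
}

func main() {
	vals, err := asFloat64([]any{1, 2.5, "4"})
	fmt.Println(vals, err) // [1 2.5 4] <nil>
	_, err = asFloat64([]any{true})
	fmt.Println(err) // index 0: cannot convert bool to float64
}
```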

func (*Series) At

func (s *Series) At(index int) interface{}

At returns the value at the given index.

Parameters:

  • index: The index of the value to retrieve.

Returns:

  • interface{}: The value at the specified index.

func (*Series) Len

func (s *Series) Len() int

Len returns the length of the series.

Returns:

  • int: The number of elements in the series.

func (*Series) Max

func (s *Series) Max() (float64, error)

Max finds the maximum value in the series.

Returns:

  • float64: The maximum value.
  • error: An error if the series is empty or contains non-numeric values.

func (*Series) Mean

func (s *Series) Mean() (float64, error)

Mean calculates the mean of numeric values in the series.

Returns:

  • float64: The mean of the numeric values.
  • error: An error if the series is empty or contains non-numeric values.

func (*Series) Min

func (s *Series) Min() (float64, error)

Min finds the minimum value in the series.

Returns:

  • float64: The minimum value.
  • error: An error if the series is empty or contains non-numeric values.

func (*Series) Sum

func (s *Series) Sum() (float64, error)

Sum calculates the sum of numeric values in the series.

Returns:

  • float64: The sum of the numeric values.
  • error: An error if the series contains non-numeric values.
