dataframe

package
v0.0.2
Published: May 6, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package dataframe implements goframe's DataFrame — a 2D labeled data structure.

What is a DataFrame?

A DataFrame is a table of rows and columns. Each column is a named Series, and all columns share the same row Index. This mirrors the pandas DataFrame model.

Internally we store:

  • columns map[string]*series.Series — column name → Series
  • colOrder []string — preserves insertion order
  • index *types.Index — shared row labels

Why not just a [][]Value 2D slice?

Columnar storage (one Series per column) matches how analytics workloads access data:

  • Column selection (df.Col("price"), the counterpart of df["price"] in pandas) is a single map lookup, O(1) on average
  • Aggregations on a column don't touch other columns' memory
  • Adding a column is cheap: it adds one entry to the map
  • Consistent with pandas' internal columnar storage

Row-based storage ([][]Value, like a CSV in memory) would make row operations fast but column operations expensive — the opposite of analytics workloads.
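
As a rough illustration of the layout described above, the following sketch stores one slice per column plus an explicit order slice. The frame type and its col and sum helpers are hypothetical stand-ins written for this example; they are not goframe's actual types, and the real package stores *series.Series values rather than []float64.

```go
package main

import "fmt"

// frame is a hypothetical columnar table: one slice per column.
type frame struct {
	columns  map[string][]float64 // column name -> values
	colOrder []string             // insertion order, since Go maps are unordered
}

// col looks up one column by name without touching the others.
func (f *frame) col(name string) ([]float64, bool) {
	c, ok := f.columns[name]
	return c, ok
}

// sum aggregates a single column; only that column's slice is read.
func sum(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	f := &frame{
		columns: map[string][]float64{
			"price": {10, 20, 30},
			"qty":   {1, 2, 3},
		},
		colOrder: []string{"price", "qty"},
	}
	if price, ok := f.col("price"); ok {
		fmt.Println(sum(price)) // 60
	}
}
```

Summing "price" here never reads the "qty" slice, which is the property the bullet list attributes to columnar storage.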

Index Consistency Invariant

All columns in a DataFrame must have the same length and use compatible indexes. We enforce this at construction and mutation time.
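
One way such a length check could look at construction time is sketched below. checkLengths is a hypothetical helper operating on plain []int64 columns; the package's real validation, including how it compares indexes, is not shown in this documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// checkLengths returns an error if the columns do not all share one length.
// It is an illustrative helper, not part of goframe.
func checkLengths(cols map[string][]int64) error {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names) // fixed iteration order gives deterministic errors

	want := -1
	for _, name := range names {
		n := len(cols[name])
		if want == -1 {
			want = n // the first column sets the expected length
			continue
		}
		if n != want {
			return fmt.Errorf("column %q has length %d, want %d", name, n, want)
		}
	}
	return nil
}

func main() {
	err := checkLengths(map[string][]int64{
		"a": {1, 2},
		"b": {3},
	})
	fmt.Println(err) // column "b" has length 1, want 2
}
```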

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DataFrame

type DataFrame struct {
	// contains filtered or unexported fields
}

DataFrame is a 2-dimensional labeled data structure.

func FromMap

func FromMap(data map[string]interface{}, colOrder []string) (*DataFrame, error)

FromMap is a convenience constructor accepting raw Go slices. Useful for quick DataFrame creation in tests and examples.

The data map values must be []int64, []float64, []string, or []bool. Returns an error for unsupported types.
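
Accepting interface{} values and rejecting unsupported types is typically done with a type switch. The sketch below shows that pattern under the assumption that only the four listed slice types are accepted; columnLen is a hypothetical helper, not the function FromMap uses internally.

```go
package main

import "fmt"

// columnLen reports the length of a raw column if its dynamic type is one
// of the four supported slice types, and an error otherwise.
func columnLen(v interface{}) (int, error) {
	switch c := v.(type) {
	case []int64:
		return len(c), nil
	case []float64:
		return len(c), nil
	case []string:
		return len(c), nil
	case []bool:
		return len(c), nil
	default:
		return 0, fmt.Errorf("unsupported column type %T", v)
	}
}

func main() {
	fmt.Println(columnLen([]int64{1, 2, 3})) // 3 <nil>

	// []int is a distinct type from []int64, so it is rejected.
	_, err := columnLen([]int{1, 2, 3})
	fmt.Println(err) // unsupported column type []int
}
```

Because Go does not convert between slice types implicitly, a caller holding []int would need to copy the values into a []int64 before building the map.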

func New

func New(cols map[string]*series.Series, colOrder []string) (*DataFrame, error)

New creates a DataFrame from a map of column name → Series.

All Series must have the same length. Column order in the output follows the order of the colOrder slice — pass nil to sort alphabetically.

Example:

df, err := dataframe.New(map[string]*series.Series{
    "name":  series.FromStrings([]string{"Alice", "Bob"}, "name"),
    "score": series.FromInts([]int64{90, 85}, "score"),
}, []string{"name", "score"})

func (*DataFrame) Apply

func (df *DataFrame) Apply(fn func(*series.Series) types.Value, resultName string) *series.Series

Apply applies a function to each column and returns a Series of results. Equivalent to df.apply(func, axis=0) in pandas (column-wise application).

Example — compute column sums:

sums := df.Apply(func(s *series.Series) types.Value {
    return types.Float(s.Sum())
}, "sums")

func (*DataFrame) Col

func (df *DataFrame) Col(name string) (*series.Series, error)

Col returns the Series for a column by name. Returns an error if the column doesn't exist. Equivalent to df["colname"] in pandas.

func (*DataFrame) Columns

func (df *DataFrame) Columns() []string

Columns returns the column names in order.

func (*DataFrame) Corr

func (df *DataFrame) Corr() (*DataFrame, error)

Corr computes pairwise Pearson correlation between all numeric columns. Returns a square DataFrame (correlation matrix), like df.corr() in pandas.

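Pearson correlation formula, where n is the number of rows:

r = Σ((x-mean_x)(y-mean_y)) / (n * std_x * std_y)

The formula as written holds when std_x and std_y are population standard deviations (dividing by n). With sample standard deviations the factor n becomes n-1, and r is the same either way.

Returns values in [-1, 1], where 1 = perfect positive linear relationship, -1 = perfect negative, and 0 = no linear correlation.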
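
The formula can be checked on plain slices. The sketch below is a standalone implementation written for illustration; pearson is a hypothetical function, and how Corr treats nulls or constant columns is not documented here, so the error cases are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// pearson computes the Pearson correlation of two equal-length samples.
// It uses sums of squared deviations directly, which is algebraically the
// same as dividing the summed products by n * std_x * std_y.
func pearson(x, y []float64) (float64, error) {
	if len(x) != len(y) || len(x) == 0 {
		return 0, errors.New("inputs must be non-empty and of equal length")
	}
	n := float64(len(x))
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/n, sumY/n

	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	if varX == 0 || varY == 0 {
		return 0, errors.New("correlation is undefined for a constant column")
	}
	return cov / math.Sqrt(varX*varY), nil
}

func main() {
	r, _ := pearson([]float64{1, 2, 3}, []float64{2, 4, 6})
	fmt.Println(r) // 1
}
```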

func (*DataFrame) Describe

func (df *DataFrame) Describe() (*DataFrame, error)

Describe returns summary statistics for all numeric columns. Equivalent to df.describe() in pandas.

func (*DataFrame) Drop

func (df *DataFrame) Drop(names ...string) (*DataFrame, error)

Drop returns a new DataFrame with the specified columns removed. Equivalent to df.drop(columns=[...]) in pandas.

func (*DataFrame) DropNull

func (df *DataFrame) DropNull(cols ...string) (*DataFrame, error)

DropNull removes rows where any of the specified columns contain null. If cols is empty, checks ALL columns — matching pandas' df.dropna() default.

func (*DataFrame) FillNull

func (df *DataFrame) FillNull(fill types.Value) (*DataFrame, error)

FillNull replaces null values in all columns with the given value. For per-column control, use df.WithColumn(name, col.FillNull(v)).

func (*DataFrame) Filter

func (df *DataFrame) Filter(mask *series.Series) (*DataFrame, error)

Filter returns a new DataFrame keeping only rows where mask[i] == true. Equivalent to df[boolean_mask] in pandas.

Example:

mask := df.MustCol("price").Gt(100)
expensive, err := df.Filter(mask)
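
To make the mask semantics concrete, here is a minimal sketch of boolean-mask filtering on a single plain slice. filterRows is a hypothetical helper; the real Filter applies the same selection to every column, and its handling of a mismatched mask length is assumed here to be an error.

```go
package main

import (
	"errors"
	"fmt"
)

// filterRows keeps values[i] wherever mask[i] is true.
func filterRows(values []float64, mask []bool) ([]float64, error) {
	if len(values) != len(mask) {
		return nil, errors.New("mask length must match the number of rows")
	}
	kept := make([]float64, 0, len(values))
	for i, keep := range mask {
		if keep {
			kept = append(kept, values[i])
		}
	}
	return kept, nil
}

func main() {
	prices := []float64{50, 150, 200}

	// Build the mask for "price > 100" one element at a time.
	mask := make([]bool, len(prices))
	for i, p := range prices {
		mask[i] = p > 100
	}

	kept, _ := filterRows(prices, mask)
	fmt.Println(kept) // [150 200]
}
```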

func (*DataFrame) GroupBy

func (df *DataFrame) GroupBy(
	groupCol string,
	aggs map[string]func(*series.Series) types.Value,
) (*DataFrame, error)

GroupBy groups the DataFrame by unique values of a column and applies an aggregation function to each group.

This is a simplified version of pandas' df.groupby("col").agg(func).

Returns a new DataFrame with one row per unique group value. The group key column is always the first column in the result.

Example (average price and total quantity per category):

result, _ := df.GroupBy("category", map[string]func(*series.Series) types.Value{
    "price": func(s *series.Series) types.Value { return types.Float(s.Mean()) },
    "qty":   func(s *series.Series) types.Value { return types.Float(s.Sum()) },
})
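
The grouping step itself can be sketched with plain slices: collect the row values for each key, then reduce each group. groupMean below is a hypothetical helper, and returning groups in first-seen order is an assumption of this sketch; the documentation does not state the row order GroupBy produces.

```go
package main

import (
	"errors"
	"fmt"
)

// groupMean returns the distinct keys in first-seen order together with
// the mean of the values belonging to each key.
func groupMean(keys []string, values []float64) ([]string, []float64, error) {
	if len(keys) != len(values) {
		return nil, nil, errors.New("keys and values must have equal length")
	}
	sums := map[string]float64{}
	counts := map[string]int{}
	var order []string
	for i, k := range keys {
		if counts[k] == 0 {
			order = append(order, k) // remember the first appearance of each key
		}
		sums[k] += values[i]
		counts[k]++
	}
	means := make([]float64, len(order))
	for i, k := range order {
		means[i] = sums[k] / float64(counts[k])
	}
	return order, means, nil
}

func main() {
	categories := []string{"a", "b", "a"}
	prices := []float64{10, 20, 30}
	groups, means, _ := groupMean(categories, prices)
	fmt.Println(groups, means) // [a b] [20 20]
}
```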

func (*DataFrame) HasColumn

func (df *DataFrame) HasColumn(name string) bool

HasColumn returns true if a column with the given name exists.

func (*DataFrame) Head

func (df *DataFrame) Head(n int) (*DataFrame, error)

Head returns the first n rows. Equivalent to df.head(n) in pandas.

func (*DataFrame) ILoc

func (df *DataFrame) ILoc(i int) map[string]types.Value

ILoc returns the row at integer position i as a map[string]types.Value. Equivalent to df.iloc[i] in pandas.

func (*DataFrame) ILocRange

func (df *DataFrame) ILocRange(start, end int) (*DataFrame, error)

ILocRange returns a new DataFrame with rows [start, end). Equivalent to df.iloc[start:end] in pandas.

func (*DataFrame) Index

func (df *DataFrame) Index() *types.Index

Index returns the shared row index.

func (*DataFrame) Len

func (df *DataFrame) Len() int

Len returns the number of rows.

func (*DataFrame) MustCol

func (df *DataFrame) MustCol(name string) *series.Series

MustCol is like Col but panics on error — use only when you know the column exists.

func (*DataFrame) Query

func (df *DataFrame) Query(predicate func(map[string]types.Value) bool) (*DataFrame, error)

Query is a higher-level filter that takes a predicate function over rows. The predicate receives a map[colName]Value for each row. This is less efficient than Filter, because the predicate runs once per row and cannot be vectorized, but it is more readable for multi-column conditions.

Example:

result, _ := df.Query(func(row map[string]types.Value) bool {
    price, _ := row["price"].AsFloat()
    qty, _ := row["qty"].AsInt()
    return price > 100 && qty > 5
})

func (*DataFrame) Rename

func (df *DataFrame) Rename(mapping map[string]string) (*DataFrame, error)

Rename renames columns. The `mapping` maps old name → new name. Equivalent to df.rename(columns={...}) in pandas.

func (*DataFrame) Select

func (df *DataFrame) Select(names ...string) (*DataFrame, error)

Select returns a new DataFrame containing only the specified columns. Equivalent to df[["a", "b"]] in pandas. Preserves the order given in `names`.

func (*DataFrame) Shape

func (df *DataFrame) Shape() (int, int)

Shape returns (nRows, nCols) — equivalent to df.shape in pandas.

func (*DataFrame) SortBy

func (df *DataFrame) SortBy(colName string, ascending bool) (*DataFrame, error)

SortBy returns a new DataFrame sorted by the given column. Pass ascending=true for smallest-first order, which is also the default in pandas' sort_values.
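
In a columnar layout, sorting by one column usually means computing a row permutation from that column and then applying it to every column. The sketch below shows that idea with hypothetical helpers argsort and take; whether SortBy is stable or works this way internally is not stated in the documentation, so the stable sort here is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// argsort returns the row positions that would sort xs.
func argsort(xs []float64, ascending bool) []int {
	idx := make([]int, len(xs))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		if ascending {
			return xs[idx[a]] < xs[idx[b]]
		}
		return xs[idx[a]] > xs[idx[b]]
	})
	return idx
}

// take reorders another column using the same permutation.
func take(xs []string, idx []int) []string {
	out := make([]string, len(idx))
	for i, j := range idx {
		out[i] = xs[j]
	}
	return out
}

func main() {
	prices := []float64{30, 10, 20}
	names := []string{"c", "a", "b"}

	idx := argsort(prices, true)
	fmt.Println(idx, take(names, idx)) // [1 2 0] [a b c]
}
```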

func (*DataFrame) String

func (df *DataFrame) String() string

String returns a human-readable table view of the DataFrame. Truncates to 20 rows and 8 columns for readability.

func (*DataFrame) Tail

func (df *DataFrame) Tail(n int) (*DataFrame, error)

Tail returns the last n rows. Equivalent to df.tail(n) in pandas.

func (*DataFrame) WithColumn

func (df *DataFrame) WithColumn(name string, s *series.Series) (*DataFrame, error)

WithColumn returns a new DataFrame with an added or replaced column. Equivalent to df["new_col"] = series in pandas (but immutable — returns new DF).

If the column already exists, it is replaced. If it's new, it's appended. The new Series must have the same length as the DataFrame.
