dataframe

package

Version: v0.0.2 | Published: May 6, 2026 | License: MIT | Imports: 6 | Imported by: 0

Repository

github.com/LuizCFdosSantos/goframe

Documentation ¶

Overview ¶

Package dataframe implements goframe's DataFrame — a 2D labeled data structure.

What is a DataFrame? ¶

A DataFrame is a table of rows × columns. Each column is a named Series, and all columns share the same row Index. The design mirrors pandas' DataFrame.

Internally we store:

columns map[string]*series.Series — column name → Series
colOrder []string — preserves insertion order
index *types.Index — shared row labels

Why not just a [][]Value 2D slice? ¶

Columnar storage (one Series per column) suits analytical workloads:

Column selection (df.Col("price"), the counterpart of df["price"] in pandas) is O(1) on average: a single map lookup
Aggregations on a column don't touch other columns' memory
Adding a column is cheap: one new map entry plus one name appended to colOrder
Consistent with pandas' internal columnar storage

Row-based storage ([][]Value, like a CSV in memory) would make row operations fast but column operations expensive, which is the opposite of what analytics workloads need.
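The trade-off can be illustrated with a minimal standalone sketch. It does not use goframe's types: the frame struct, addColumn and sum are hypothetical names, and plain []float64 slices stand in for Series.

```go
package main

import "fmt"

// frame is a hypothetical, simplified stand-in for a columnar table:
// one slice per column, plus a slice that remembers insertion order.
type frame struct {
	columns  map[string][]float64
	colOrder []string
}

// addColumn stores a column; it appends to colOrder only for new names.
func (f *frame) addColumn(name string, values []float64) {
	if _, exists := f.columns[name]; !exists {
		f.colOrder = append(f.colOrder, name)
	}
	f.columns[name] = values
}

// sum reads exactly one column; the other columns are never touched.
func (f *frame) sum(name string) (float64, bool) {
	col, ok := f.columns[name] // average-case O(1) map lookup
	if !ok {
		return 0, false
	}
	total := 0.0
	for _, v := range col {
		total += v
	}
	return total, true
}

func main() {
	f := &frame{columns: map[string][]float64{}}
	f.addColumn("price", []float64{10, 20, 30})
	f.addColumn("qty", []float64{1, 2, 3})
	total, _ := f.sum("price")
	fmt.Println(f.colOrder, total)
}
```

With a [][]float64 row layout, the same sum would have to visit every row and index into it, pulling unrelated cells through the cache along the way.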

Index Consistency Invariant ¶

All columns in a DataFrame must have the same length and use compatible indexes. We enforce this at construction and mutation time.
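As a rough illustration of the length half of this invariant, the check below rejects columns of unequal length. It is a sketch under assumptions, not the package's code: checkLengths is a hypothetical helper, []string slices stand in for Series, and index compatibility is not modelled.

```go
package main

import (
	"fmt"
	"sort"
)

// checkLengths is a hypothetical helper: it returns the shared row count,
// or an error naming the first column (in sorted order) whose length differs.
func checkLengths(cols map[string][]string) (int, error) {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic errors despite random map iteration order

	rows := -1
	for _, name := range names {
		n := len(cols[name])
		if rows == -1 {
			rows = n
			continue
		}
		if n != rows {
			return 0, fmt.Errorf("column %q has %d rows, want %d", name, n, rows)
		}
	}
	if rows == -1 {
		rows = 0 // no columns at all
	}
	return rows, nil
}

func main() {
	_, err := checkLengths(map[string][]string{
		"name":  {"Alice", "Bob"},
		"score": {"90"},
	})
	fmt.Println(err)
}
```

Running the same check in both the constructor and every mutating method is what keeps the invariant from being broken after construction.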

Index ¶

type DataFrame
- func FromMap(data map[string]interface{}, colOrder []string) (*DataFrame, error)
- func New(cols map[string]*series.Series, colOrder []string) (*DataFrame, error)

Types ¶

type DataFrame ¶

type DataFrame struct {
	// contains filtered or unexported fields
}

DataFrame is a 2-dimensional labeled data structure.

func FromMap ¶

func FromMap(data map[string]interface{}, colOrder []string) (*DataFrame, error)

FromMap is a convenience constructor accepting raw Go slices. Useful for quick DataFrame creation in tests and examples.

The data map values must be []int64, []float64, []string, or []bool. Returns an error for unsupported types.
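Accepting interface{} values usually means dispatching on the dynamic type. The sketch below shows that pattern for the four slice types listed above; columnLen is a hypothetical helper that only reports lengths, and it is an assumption about the approach rather than FromMap's actual code.

```go
package main

import "fmt"

// columnLen is a hypothetical helper showing the type switch: it reports
// the length of a supported slice and rejects everything else.
func columnLen(name string, raw interface{}) (int, error) {
	switch v := raw.(type) {
	case []int64:
		return len(v), nil
	case []float64:
		return len(v), nil
	case []string:
		return len(v), nil
	case []bool:
		return len(v), nil
	default:
		return 0, fmt.Errorf("column %q: unsupported type %T", name, raw)
	}
}

func main() {
	data := map[string]interface{}{
		"score": []int64{90, 85},
		"bad":   []int{1, 2}, // []int is a different type from []int64
	}
	for _, name := range []string{"score", "bad"} {
		n, err := columnLen(name, data[name])
		fmt.Println(name, n, err)
	}
}
```

A practical consequence of the documented type list is that a plain []int literal is not among the accepted types, so callers would need to write []int64{...} explicitly.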

func New ¶

func New(cols map[string]*series.Series, colOrder []string) (*DataFrame, error)

New creates a DataFrame from a map of column name → Series.

All Series must have the same length. Column order in the output follows the order of the colOrder slice — pass nil to sort alphabetically.

Example:

df, err := dataframe.New(map[string]*series.Series{
    "name":  series.FromStrings([]string{"Alice", "Bob"}, "name"),
    "score": series.FromInts([]int64{90, 85}, "score"),
}, []string{"name", "score"})
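An explicit colOrder is needed because Go does not define an iteration order for maps. The sketch below shows one way the documented nil fallback could work; resolveOrder is a hypothetical helper and []int64 slices stand in for Series.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveOrder is a hypothetical helper: it returns colOrder unchanged when
// one is given, and otherwise the map's keys sorted alphabetically.
func resolveOrder(cols map[string][]int64, colOrder []string) []string {
	if colOrder != nil {
		return colOrder
	}
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	cols := map[string][]int64{"score": {90, 85}, "age": {31, 27}}
	fmt.Println(resolveOrder(cols, nil))
	fmt.Println(resolveOrder(cols, []string{"score", "age"}))
}
```

This sketch does not validate that every name in colOrder exists in the map; a real constructor would presumably report that as an error.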

func (*DataFrame) Apply ¶

func (df *DataFrame) Apply(fn func(*series.Series) types.Value, resultName string) *series.Series

Apply applies a function to each column and returns a Series of results. Equivalent to df.apply(func, axis=0) in pandas (column-wise application).

Example — compute column sums:

sums := df.Apply(func(s *series.Series) types.Value {
    return types.Float(s.Sum())
}, "sums")

func (*DataFrame) Col ¶

func (df *DataFrame) Col(name string) (*series.Series, error)

Col returns the Series for a column by name. Returns an error if the column doesn't exist. Equivalent to df["colname"] in pandas.

func (*DataFrame) Columns ¶

func (df *DataFrame) Columns() []string

Columns returns the column names in order.

func (*DataFrame) Corr ¶

func (df *DataFrame) Corr() (*DataFrame, error)

Corr computes pairwise Pearson correlation between all numeric columns. Returns a square DataFrame (correlation matrix), like df.corr() in pandas.

Pearson correlation formula:

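r = Σ((x - mean_x)(y - mean_y)) / (n * std_x * std_y)

Here n is the number of rows, and std_x and std_y are population standard deviations (divisor n). With sample standard deviations (divisor n-1), the leading factor becomes n-1.

The result lies in [-1, 1]: 1 is a perfect positive linear relationship, -1 a perfect negative one, and 0 means no linear correlation.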
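For a single pair of columns, the formula can be sketched as follows. The pearson function is a hypothetical standalone helper on []float64 slices; how Corr treats nulls, non-numeric columns or constant columns is not stated here, so returning NaN in the degenerate cases is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation of x and y. It returns NaN when
// the inputs are empty, differ in length, or either one has zero variance.
func pearson(x, y []float64) float64 {
	n := len(x)
	if n == 0 || n != len(y) {
		return math.NaN()
	}
	var meanX, meanY float64
	for i := range x {
		meanX += x[i]
		meanY += y[i]
	}
	meanX /= float64(n)
	meanY /= float64(n)

	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	// The 1/n factors in the covariance and both variances cancel,
	// so the raw sums can be used directly.
	return cov / math.Sqrt(varX*varY)
}

func main() {
	x := []float64{1, 2, 3, 4}
	fmt.Println(pearson(x, []float64{2, 4, 6, 8}))
	fmt.Println(pearson(x, []float64{8, 6, 4, 2}))
}
```

A correlation matrix applies this to every pair of numeric columns; the diagonal is 1 for any column with non-zero variance, and the matrix is symmetric.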

func (*DataFrame) Describe ¶

func (df *DataFrame) Describe() (*DataFrame, error)

Describe returns summary statistics for all numeric columns. Equivalent to df.describe() in pandas.

func (*DataFrame) Drop ¶

func (df *DataFrame) Drop(names ...string) (*DataFrame, error)

Drop returns a new DataFrame with the specified columns removed. Equivalent to df.drop(columns=[...]) in pandas.

func (*DataFrame) DropNull ¶

func (df *DataFrame) DropNull(cols ...string) (*DataFrame, error)

DropNull removes rows where any of the specified columns contain null. If cols is empty, checks ALL columns — matching pandas' df.dropna() default.

func (*DataFrame) FillNull ¶

func (df *DataFrame) FillNull(fill types.Value) (*DataFrame, error)

FillNull replaces null values in all columns with the given value. For per-column control, use df.WithColumn(name, col.FillNull(v)).

func (*DataFrame) Filter ¶

func (df *DataFrame) Filter(mask *series.Series) (*DataFrame, error)

Filter returns a new DataFrame keeping only rows where mask[i] == true. Equivalent to df[boolean_mask] in pandas.

Example:

mask := df.MustCol("price").Gt(100)
expensive, err := df.Filter(mask)
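Conceptually, a mask filter walks the mask once and copies the matching positions of each column. The sketch below shows this for one column; filterColumn is a hypothetical helper on plain slices, and the length check is an assumption about why Filter returns an error.

```go
package main

import (
	"errors"
	"fmt"
)

// filterColumn is a hypothetical helper: it keeps col[i] wherever mask[i] is true.
func filterColumn(col []float64, mask []bool) ([]float64, error) {
	if len(col) != len(mask) {
		return nil, errors.New("mask length does not match column length")
	}
	out := make([]float64, 0, len(col))
	for i, keep := range mask {
		if keep {
			out = append(out, col[i])
		}
	}
	return out, nil
}

func main() {
	price := []float64{50, 150, 200}
	mask := make([]bool, len(price))
	for i, p := range price {
		mask[i] = p > 100
	}
	kept, err := filterColumn(price, mask)
	fmt.Println(kept, err)
}
```

Applying the same mask to every column, and to the row index, keeps the rows aligned in the result.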

func (*DataFrame) GroupBy ¶

func (df *DataFrame) GroupBy(
	groupCol string,
	aggs map[string]func(*series.Series) types.Value,
) (*DataFrame, error)

GroupBy groups the DataFrame by unique values of a column and applies an aggregation function to each group.

This is a simplified version of pandas' df.groupby("col").agg(func).

Returns a new DataFrame with one row per unique group value. The group key column is always the first column in the result.

Example: average price and total quantity per category:

result, _ := df.GroupBy("category", map[string]func(*series.Series) types.Value{
    "price": func(s *series.Series) types.Value { return types.Float(s.Mean()) },
    "qty":   func(s *series.Series) types.Value { return types.Float(s.Sum()) },
})
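The grouping step itself can be sketched without the library. groupMean below is a hypothetical helper that assumes keys and values have equal length and emits groups in first-seen order; the order GroupBy actually uses for its result rows is not specified in this documentation.

```go
package main

import "fmt"

// groupMean is a hypothetical helper: it groups values by key and returns
// the distinct keys in first-seen order together with each group's mean.
func groupMean(keys []string, values []float64) ([]string, []float64) {
	sums := map[string]float64{}
	counts := map[string]int{}
	var order []string
	for i, k := range keys {
		if _, seen := counts[k]; !seen {
			order = append(order, k) // remember when each group first appears
		}
		sums[k] += values[i]
		counts[k]++
	}
	means := make([]float64, len(order))
	for i, k := range order {
		means[i] = sums[k] / float64(counts[k])
	}
	return order, means
}

func main() {
	category := []string{"fruit", "veg", "fruit"}
	price := []float64{2, 3, 4}
	keys, means := groupMean(category, price)
	fmt.Println(keys, means)
}
```

Tracking the key order in a separate slice matters because ranging over the sums map directly would produce rows in a different order on each run.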

func (*DataFrame) HasColumn ¶

func (df *DataFrame) HasColumn(name string) bool

HasColumn returns true if a column with the given name exists.

func (*DataFrame) Head ¶

func (df *DataFrame) Head(n int) (*DataFrame, error)

Head returns the first n rows. Equivalent to df.head(n) in pandas.

func (*DataFrame) ILoc ¶

func (df *DataFrame) ILoc(i int) map[string]types.Value

ILoc returns the row at integer position i as a map[string]types.Value. Equivalent to df.iloc[i] in pandas.

func (*DataFrame) ILocRange ¶

func (df *DataFrame) ILocRange(start, end int) (*DataFrame, error)

ILocRange returns a new DataFrame with rows [start, end). Equivalent to df.iloc[start:end] in pandas.

func (*DataFrame) Index ¶

func (df *DataFrame) Index() *types.Index

Index returns the shared row index.

func (*DataFrame) Len ¶

func (df *DataFrame) Len() int

Len returns the number of rows.

func (*DataFrame) MustCol ¶

func (df *DataFrame) MustCol(name string) *series.Series

MustCol is like Col but panics on error — use only when you know the column exists.

func (*DataFrame) Query ¶

func (df *DataFrame) Query(predicate func(map[string]types.Value) bool) (*DataFrame, error)

Query is a higher-level filter that takes a predicate function over rows. The predicate receives a map[colName]Value for each row. This is less efficient than Filter (can't vectorize) but more readable for multi-column conditions.

Example:

result, _ := df.Query(func(row map[string]types.Value) bool {
    price, _ := row["price"].AsFloat()
    qty, _ := row["qty"].AsInt()
    return price > 100 && qty > 5
})
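The cost difference relative to Filter comes from materializing each row. A minimal sketch of row-wise evaluation over columnar data, assuming float64 columns and a hypothetical queryRows helper that returns the positions of the kept rows:

```go
package main

import "fmt"

// queryRows is a hypothetical helper: it builds one map per row from the
// columns and returns the positions of rows for which keep reports true.
func queryRows(cols map[string][]float64, nRows int, keep func(row map[string]float64) bool) []int {
	var kept []int
	for i := 0; i < nRows; i++ {
		row := make(map[string]float64, len(cols)) // one allocation per row
		for name, values := range cols {
			row[name] = values[i]
		}
		if keep(row) {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	cols := map[string][]float64{
		"price": {50, 150, 200},
		"qty":   {10, 2, 8},
	}
	kept := queryRows(cols, 3, func(row map[string]float64) bool {
		return row["price"] > 100 && row["qty"] > 5
	})
	fmt.Println(kept)
}
```

Each row costs a map allocation plus one lookup per column, whereas a mask-based filter only scans the columns it needs.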

func (*DataFrame) Rename ¶

func (df *DataFrame) Rename(mapping map[string]string) (*DataFrame, error)

Rename renames columns. The `mapping` maps old name → new name. Equivalent to df.rename(columns={...}) in pandas.

func (*DataFrame) Select ¶

func (df *DataFrame) Select(names ...string) (*DataFrame, error)

Select returns a new DataFrame containing only the specified columns. Equivalent to df[["a", "b"]] in pandas. Preserves the order given in `names`.

func (*DataFrame) Shape ¶

func (df *DataFrame) Shape() (int, int)

Shape returns (nRows, nCols) — equivalent to df.shape in pandas.

func (*DataFrame) SortBy ¶

func (df *DataFrame) SortBy(colName string, ascending bool) (*DataFrame, error)

SortBy returns a new DataFrame sorted by the given column. Pass ascending=true to sort smallest-first, which matches the pandas default.
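Sorting a columnar table by one column is commonly done by computing a permutation of row positions from the key column and then reordering every column with it. The sketch below shows that approach; sortPerm and take are hypothetical helpers, and using a stable sort for ties is an assumption rather than documented behaviour of SortBy.

```go
package main

import (
	"fmt"
	"sort"
)

// sortPerm returns row positions ordered by key; ties keep their original order.
func sortPerm(key []float64, ascending bool) []int {
	perm := make([]int, len(key))
	for i := range perm {
		perm[i] = i
	}
	sort.SliceStable(perm, func(a, b int) bool {
		if ascending {
			return key[perm[a]] < key[perm[b]]
		}
		return key[perm[a]] > key[perm[b]]
	})
	return perm
}

// take reorders one column according to perm.
func take(col []string, perm []int) []string {
	out := make([]string, len(perm))
	for i, p := range perm {
		out[i] = col[p]
	}
	return out
}

func main() {
	price := []float64{30, 10, 20}
	name := []string{"c", "a", "b"}
	perm := sortPerm(price, true)
	fmt.Println(perm, take(name, perm))
}
```

Because the permutation is computed once and applied to each column separately, rows stay aligned across columns without ever being assembled as row objects.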

func (*DataFrame) String ¶

func (df *DataFrame) String() string

String returns a human-readable table view of the DataFrame. Truncates to 20 rows and 8 columns for readability.

func (*DataFrame) Tail ¶

func (df *DataFrame) Tail(n int) (*DataFrame, error)

Tail returns the last n rows. Equivalent to df.tail(n) in pandas.

func (*DataFrame) WithColumn ¶

func (df *DataFrame) WithColumn(name string, s *series.Series) (*DataFrame, error)

WithColumn returns a new DataFrame with an added or replaced column. Equivalent to df["new_col"] = series in pandas, except that the receiver is not modified; a new DataFrame is returned.

If the column already exists, it is replaced. If it's new, it's appended. The new Series must have the same length as the DataFrame.

Source Files ¶

dataframe.go
