Documentation ¶
Overview ¶
Package dataframe implements goframe's DataFrame — a 2D labeled data structure.
What is a DataFrame? ¶
A DataFrame is a table: rows × columns. Each column is a Series with a name. All columns share the same row Index. This mirrors the pandas DataFrame model.
Internally we store:
- columns map[string]*series.Series — column name → Series
- colOrder []string — preserves insertion order
- index *types.Index — shared row labels
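The three fields above can be pictured with a minimal sketch. The `Series` and `Frame` types below are hypothetical stand-ins written for illustration; they are not goframe's actual definitions, and the index is simplified to a slice of labels.

```go
package main

import "fmt"

// Series is a hypothetical stand-in for series.Series: a named column of values.
type Series struct {
	Name   string
	Values []float64
}

// Frame mirrors the three fields listed above, using stand-in types.
type Frame struct {
	columns  map[string]*Series // column name -> Series
	colOrder []string           // insertion order of column names
	index    []string           // shared row labels (stand-in for *types.Index)
}

// Col looks a column up by name with a single map access.
func (f *Frame) Col(name string) (*Series, bool) {
	s, ok := f.columns[name]
	return s, ok
}

// newDemoFrame builds a two-column, two-row frame for the demo.
func newDemoFrame() *Frame {
	return &Frame{
		columns: map[string]*Series{
			"price": {Name: "price", Values: []float64{9.5, 20}},
			"qty":   {Name: "qty", Values: []float64{3, 1}},
		},
		colOrder: []string{"price", "qty"},
		index:    []string{"r0", "r1"},
	}
}

func main() {
	f := newDemoFrame()
	// Go map iteration order is unspecified, so colOrder drives the loop.
	for _, name := range f.colOrder {
		s, _ := f.Col(name)
		fmt.Println(name, s.Values)
	}
}
```

Keeping `colOrder` next to the map is what lets a frame print its columns in a stable order, since a Go map alone does not remember insertion order.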
Why not just a [][]Value 2D slice? ¶
Columnar storage (one Series per column) is how real data processing works:
- Column selection (df.Col("price"), the counterpart of pandas' df["price"]) is O(1) on average: a single map lookup
- Aggregations on a column don't touch other columns' memory
- Adding a column is cheap — add one entry to the map
- Consistent with pandas' internal columnar storage
Row-based storage ([][]Value, like a CSV in memory) would make row operations fast but column operations expensive, which is the opposite of what analytics workloads need.
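To make the trade-off concrete, here is a small sketch that sums one column under both layouts. It uses plain `[]float64` slices rather than goframe types, and both function names are made up for this illustration.

```go
package main

import "fmt"

// sumColumnar sums a column that is stored as its own contiguous slice.
func sumColumnar(col []float64) float64 {
	total := 0.0
	for _, v := range col {
		total += v
	}
	return total
}

// sumRowMajor sums column j of a row-major table. Every row slice is
// visited even though only one cell per row is needed.
func sumRowMajor(rows [][]float64, j int) float64 {
	total := 0.0
	for _, row := range rows {
		total += row[j]
	}
	return total
}

func main() {
	rows := [][]float64{{1, 10}, {2, 20}, {3, 30}} // columns: id, price
	price := []float64{10, 20, 30}                 // the same price column, stored alone
	fmt.Println(sumColumnar(price), sumRowMajor(rows, 1))
}
```

Both calls return the same total; the difference is that the columnar version reads only the memory belonging to the column being aggregated.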
Index Consistency Invariant ¶
All columns in a DataFrame must have the same length and use compatible indexes. We enforce this at construction and mutation time.
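The length half of this invariant can be sketched as a check run before a frame is built or a column is added. This is an illustrative stand-in over plain slices, not goframe's validation code; `checkLengths` is a hypothetical name, and index compatibility is not modelled here.

```go
package main

import (
	"fmt"
	"sort"
)

// checkLengths returns an error if any column does not have wantLen rows.
// Names are sorted first so the reported column is deterministic.
func checkLengths(cols map[string][]float64, wantLen int) error {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if n := len(cols[name]); n != wantLen {
			return fmt.Errorf("column %q has %d rows, want %d", name, n, wantLen)
		}
	}
	return nil
}

func main() {
	cols := map[string][]float64{"price": {1, 2, 3}, "qty": {4, 5}}
	fmt.Println(checkLengths(cols, 3)) // reports the short "qty" column
}
```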
Index ¶
- type DataFrame
- func (df *DataFrame) Apply(fn func(*series.Series) types.Value, resultName string) *series.Series
- func (df *DataFrame) Col(name string) (*series.Series, error)
- func (df *DataFrame) Columns() []string
- func (df *DataFrame) Corr() (*DataFrame, error)
- func (df *DataFrame) Describe() (*DataFrame, error)
- func (df *DataFrame) Drop(names ...string) (*DataFrame, error)
- func (df *DataFrame) DropNull(cols ...string) (*DataFrame, error)
- func (df *DataFrame) FillNull(fill types.Value) (*DataFrame, error)
- func (df *DataFrame) Filter(mask *series.Series) (*DataFrame, error)
- func (df *DataFrame) GroupBy(groupCol string, aggs map[string]func(*series.Series) types.Value) (*DataFrame, error)
- func (df *DataFrame) HasColumn(name string) bool
- func (df *DataFrame) Head(n int) (*DataFrame, error)
- func (df *DataFrame) ILoc(i int) map[string]types.Value
- func (df *DataFrame) ILocRange(start, end int) (*DataFrame, error)
- func (df *DataFrame) Index() *types.Index
- func (df *DataFrame) Len() int
- func (df *DataFrame) MustCol(name string) *series.Series
- func (df *DataFrame) Query(predicate func(map[string]types.Value) bool) (*DataFrame, error)
- func (df *DataFrame) Rename(mapping map[string]string) (*DataFrame, error)
- func (df *DataFrame) Select(names ...string) (*DataFrame, error)
- func (df *DataFrame) Shape() (int, int)
- func (df *DataFrame) SortBy(colName string, ascending bool) (*DataFrame, error)
- func (df *DataFrame) String() string
- func (df *DataFrame) Tail(n int) (*DataFrame, error)
- func (df *DataFrame) WithColumn(name string, s *series.Series) (*DataFrame, error)
Types ¶
type DataFrame ¶
type DataFrame struct {
// contains filtered or unexported fields
}
DataFrame is a 2-dimensional labeled data structure.
func FromMap ¶
FromMap is a convenience constructor accepting raw Go slices. Useful for quick DataFrame creation in tests and examples.
The data map values must be []int64, []float64, []string, or []bool. Returns an error for unsupported types.
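One plausible way to enforce the four supported slice types is a type switch. The sketch below is an assumption about the approach, not FromMap's actual code; `columnKind` is a hypothetical helper.

```go
package main

import "fmt"

// columnKind reports which of the four supported slice types v holds,
// or an error for anything else.
func columnKind(v any) (string, error) {
	switch v.(type) {
	case []int64:
		return "int64", nil
	case []float64:
		return "float64", nil
	case []string:
		return "string", nil
	case []bool:
		return "bool", nil
	default:
		return "", fmt.Errorf("unsupported column type %T", v)
	}
}

func main() {
	fmt.Println(columnKind([]int64{1, 2}))
	fmt.Println(columnKind([]int{1, 2})) // []int is a distinct type from []int64
}
```

Under a check like this, a `[]int` literal would be rejected, so callers would need to spell out `[]int64{...}` explicitly.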
func New ¶
New creates a DataFrame from a map of column name → Series.
All Series must have the same length. Column order in the output follows the order of the colOrder slice — pass nil to sort alphabetically.
Example:
df, err := dataframe.New(map[string]*series.Series{
	"name":  series.FromStrings([]string{"Alice", "Bob"}, "name"),
	"score": series.FromInts([]int64{90, 85}, "score"),
}, []string{"name", "score"})
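The nil-means-alphabetical rule for colOrder can be sketched as follows. `resolveOrder` is a hypothetical helper over plain slices, written only to illustrate the documented behaviour; how New handles a colOrder that names missing columns is not covered here.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveOrder returns colOrder unchanged when it is non-nil; otherwise it
// returns the column names sorted alphabetically.
func resolveOrder(cols map[string][]int64, colOrder []string) []string {
	if colOrder != nil {
		return colOrder
	}
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	cols := map[string][]int64{"score": {90, 85}, "age": {30, 41}}
	fmt.Println(resolveOrder(cols, nil))                      // alphabetical
	fmt.Println(resolveOrder(cols, []string{"score", "age"})) // caller's order
}
```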
func (*DataFrame) Apply ¶
Apply applies a function to each column and returns a Series of results. Equivalent to df.apply(func, axis=0) in pandas (column-wise application).
Example — compute column sums:
sums := df.Apply(func(s *series.Series) types.Value {
	return types.Float(s.Sum())
}, "sums")
func (*DataFrame) Col ¶
Col returns the Series for a column by name. Returns an error if the column doesn't exist. Equivalent to df["colname"] in pandas.
func (*DataFrame) Corr ¶
Corr computes pairwise Pearson correlation between all numeric columns. Returns a square DataFrame (correlation matrix), like df.corr() in pandas.
Pearson correlation formula:
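r = Σ((x - mean_x)(y - mean_y)) / (n * std_x * std_y)
Here n is the number of rows, and std_x and std_y are population standard deviations (dividing by n); with sample standard deviations the factor n becomes n - 1.
Returns values in [-1, 1], where 1 = perfect positive linear correlation, -1 = perfect negative, and 0 = no linear correlation.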
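The formula can be checked with a short standalone function over plain `[]float64` slices. This is a sketch that assumes population standard deviations (dividing by n); it is not Corr's implementation, and `pearson` is a name chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns r = sum((x-mean_x)(y-mean_y)) / (n * std_x * std_y),
// using population standard deviations. It returns NaN for empty or
// mismatched inputs, and also when either input has zero variance.
func pearson(x, y []float64) float64 {
	if len(x) == 0 || len(x) != len(y) {
		return math.NaN()
	}
	n := float64(len(x))

	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/n, sumY/n

	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	stdX, stdY := math.Sqrt(varX/n), math.Sqrt(varY/n)
	return cov / (n * stdX * stdY)
}

func main() {
	x := []float64{1, 2, 3}
	fmt.Println(pearson(x, []float64{2, 4, 6})) // perfectly linear, increasing
	fmt.Println(pearson(x, []float64{6, 4, 2})) // perfectly linear, decreasing
}
```

A correlation matrix is then this function applied to every pair of numeric columns, with 1 on the diagonal.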
func (*DataFrame) Describe ¶
Describe returns summary statistics for all numeric columns. Equivalent to df.describe() in pandas.
func (*DataFrame) Drop ¶
Drop returns a new DataFrame with the specified columns removed. Equivalent to df.drop(columns=[...]) in pandas.
func (*DataFrame) DropNull ¶
DropNull removes rows where any of the specified columns contain null. If cols is empty, checks ALL columns — matching pandas' df.dropna() default.
func (*DataFrame) FillNull ¶
FillNull replaces null values in all columns with the given value. For per-column control, use df.WithColumn(name, col.FillNull(v)).
func (*DataFrame) Filter ¶
Filter returns a new DataFrame keeping only rows where mask[i] == true. Equivalent to df[boolean_mask] in pandas.
Example:
mask := df.MustCol("price").Gt(100)
expensive, err := df.Filter(mask)
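Boolean-mask filtering itself is simple to sketch on a single column. The function below is an illustration over plain slices with a hypothetical name; a full Filter would apply the same mask to every column and to the index.

```go
package main

import (
	"errors"
	"fmt"
)

// filterColumn keeps col[i] wherever mask[i] is true.
func filterColumn(col []float64, mask []bool) ([]float64, error) {
	if len(col) != len(mask) {
		return nil, errors.New("mask length does not match column length")
	}
	out := make([]float64, 0, len(col))
	for i, keep := range mask {
		if keep {
			out = append(out, col[i])
		}
	}
	return out, nil
}

func main() {
	price := []float64{50, 150, 250}
	mask := []bool{false, true, true} // e.g. the result of price > 100
	kept, err := filterColumn(price, mask)
	fmt.Println(kept, err)
}
```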
func (*DataFrame) GroupBy ¶
func (df *DataFrame) GroupBy(
	groupCol string,
	aggs map[string]func(*series.Series) types.Value,
) (*DataFrame, error)
GroupBy groups the DataFrame by the unique values of groupCol. For each entry in aggs, it applies the aggregation function to the named column within every group.
This is a simplified version of pandas' df.groupby("col").agg({...}).
Returns a new DataFrame with one row per unique group value. The group key column is always the first column in the result.
Example (average price and total quantity per category):
result, _ := df.GroupBy("category", map[string]func(*series.Series) types.Value{
	"price": func(s *series.Series) types.Value { return types.Float(s.Mean()) },
	"qty":   func(s *series.Series) types.Value { return types.Float(s.Sum()) },
})
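The grouping step can be sketched on plain slices: collect rows by key, then reduce each group. The version below computes a per-group mean and emits groups in first-seen order; that ordering is an assumption of this sketch, since the documentation does not say how GroupBy orders its result rows. `groupMean` is a hypothetical name.

```go
package main

import "fmt"

// groupMean returns the distinct keys in first-seen order together with
// the mean of vals for each key. keys and vals must have the same length.
func groupMean(keys []string, vals []float64) ([]string, []float64) {
	order := []string{}
	sums := map[string]float64{}
	counts := map[string]int{}
	for i, k := range keys {
		if _, seen := counts[k]; !seen {
			order = append(order, k)
		}
		sums[k] += vals[i]
		counts[k]++
	}
	means := make([]float64, len(order))
	for i, k := range order {
		means[i] = sums[k] / float64(counts[k])
	}
	return order, means
}

func main() {
	category := []string{"a", "b", "a"}
	price := []float64{1, 10, 3}
	groups, means := groupMean(category, price)
	fmt.Println(groups, means)
}
```

The result has one entry per unique key, which corresponds to the one-row-per-group shape described above.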
func (*DataFrame) ILoc ¶
ILoc returns the row at integer position i as a map[string]types.Value. Equivalent to df.iloc[i] in pandas.
func (*DataFrame) ILocRange ¶
ILocRange returns a new DataFrame with rows [start, end). Equivalent to df.iloc[start:end] in pandas.
func (*DataFrame) MustCol ¶
MustCol is like Col but panics on error — use only when you know the column exists.
func (*DataFrame) Query ¶
Query is a higher-level filter that takes a predicate function over rows. The predicate receives a map[colName]Value for each row. This is less efficient than Filter, because the predicate is evaluated row by row rather than on whole columns, but it is more readable for multi-column conditions.
Example:
result, _ := df.Query(func(row map[string]types.Value) bool {
	price, _ := row["price"].AsFloat()
	qty, _ := row["qty"].AsInt()
	return price > 100 && qty > 5
})
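A row predicate can be reduced to a boolean mask by materialising each row and calling the predicate on it. The sketch below does this over plain `float64` columns; it is an illustration of the idea under that simplification, not Query's implementation, and `queryMask` is a hypothetical name.

```go
package main

import "fmt"

// queryMask calls pred once per row and records the result.
// cols maps column name to values; every column is assumed to have n rows.
func queryMask(cols map[string][]float64, n int, pred func(row map[string]float64) bool) []bool {
	mask := make([]bool, n)
	for i := 0; i < n; i++ {
		// A fresh map per row is the per-row cost that column-wise
		// filtering avoids.
		row := make(map[string]float64, len(cols))
		for name, vals := range cols {
			row[name] = vals[i]
		}
		mask[i] = pred(row)
	}
	return mask
}

func main() {
	cols := map[string][]float64{
		"price": {50, 150, 250},
		"qty":   {10, 2, 8},
	}
	mask := queryMask(cols, 3, func(row map[string]float64) bool {
		return row["price"] > 100 && row["qty"] > 5
	})
	fmt.Println(mask)
}
```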
func (*DataFrame) Rename ¶
Rename renames columns. The `mapping` maps old name → new name. Equivalent to df.rename(columns={...}) in pandas.
func (*DataFrame) Select ¶
Select returns a new DataFrame containing only the specified columns. Equivalent to df[["a", "b"]] in pandas. Preserves the order given in `names`.
func (*DataFrame) SortBy ¶
SortBy returns a new DataFrame sorted by the given column. Set ascending=true for smallest-first order, which matches the default of pandas' sort_values.
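Sorting a columnar table by one column is commonly done by computing a permutation of row positions from the key column and applying it to every column. The sketch below shows that approach on plain slices; whether SortBy works this way internally, and whether it is stable, is not stated in the documentation. `argsort` and `take` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// argsort returns the row positions that would sort key.
// The sort is stable, so equal keys keep their original relative order.
func argsort(key []float64, ascending bool) []int {
	idx := make([]int, len(key))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		if ascending {
			return key[idx[a]] < key[idx[b]]
		}
		return key[idx[a]] > key[idx[b]]
	})
	return idx
}

// take reorders col according to the positions in idx.
func take(col []string, idx []int) []string {
	out := make([]string, len(idx))
	for i, p := range idx {
		out[i] = col[p]
	}
	return out
}

func main() {
	price := []float64{3, 1, 2}
	name := []string{"c", "a", "b"}
	idx := argsort(price, true)
	fmt.Println(idx, take(name, idx))
}
```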
func (*DataFrame) String ¶
String returns a human-readable table view of the DataFrame. Truncates to 20 rows and 8 columns for readability.
func (*DataFrame) WithColumn ¶
WithColumn returns a new DataFrame with an added or replaced column. Equivalent to df["new_col"] = series in pandas, except that the receiver is not modified: a new DataFrame is returned.
If the column already exists, it is replaced. If it's new, it's appended. The new Series must have the same length as the DataFrame.
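The add-or-replace rule and the non-mutating behaviour can be sketched with a small stand-in type. The `frame` type and `withColumn` method below are hypothetical and operate on plain slices; the sketch copies only the map and the order slice, sharing column data between old and new frames, which is an assumption rather than a documented property of goframe.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical stand-in: column data plus column order.
type frame struct {
	cols  map[string][]float64
	order []string
}

// withColumn returns a new frame with name added or replaced.
// The receiver is left untouched.
func (f frame) withColumn(name string, vals []float64) (frame, error) {
	if len(f.order) > 0 && len(vals) != len(f.cols[f.order[0]]) {
		return frame{}, errors.New("new column length does not match frame length")
	}
	cols := make(map[string][]float64, len(f.cols)+1)
	for k, v := range f.cols {
		cols[k] = v // shares the underlying column data
	}
	order := append([]string(nil), f.order...)
	if _, exists := f.cols[name]; !exists {
		order = append(order, name) // new columns go last
	}
	cols[name] = vals
	return frame{cols: cols, order: order}, nil
}

func main() {
	base := frame{cols: map[string][]float64{"a": {1, 2}}, order: []string{"a"}}
	next, err := base.withColumn("b", []float64{3, 4})
	fmt.Println(base.order, next.order, err)
}
```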