gandalff

package module
v0.4.0
Published: Jun 7, 2025 License: MIT Imports: 3 Imported by: 0

README

GANDALFF: Golang, ANother DatAframe Library For Fun 🧙‍♂️

Or, for short, GDL: Golang Dataframe Library

What is it?

Gandalff is a library for data wrangling in Go. Its goal is to provide a simple and efficient API for data manipulation, similar to pandas or Polars in Python and dplyr in R. It supports nullable types, and null data is stored in a way that is optimized for memory usage.

Gandalff is a work in progress, and the API is not stable yet. The DataFrame package is still being developed.

However, it already supports the following formats:

  • CSV
  • XPT (SAS)
  • XLSX
  • HTML
  • Markdown

Examples

package main

import (
	"strings"

	"github.com/caerbannogwhite/gandalff"
	"github.com/caerbannogwhite/gandalff/dataframe"
)

func main() {
	data1 := `
name,age,weight,junior,department,salary band
Alice C,29,75.0,F,HR,4
John Doe,30,80.5,true,IT,2
Bob,31,85.0,F,IT,4
Jane H,25,60.0,false,IT,4
Mary,28,70.0,false,IT,3
Oliver,32,90.0,true,HR,1
Ursula,27,65.0,f,Business,4
Charlie,33,60.0,t,Business,2
Megan,26,55.0,F,IT,3`

	dataframe.NewBaseDataFrame(gandalff.NewContext()).
		FromCsv().
		SetReader(strings.NewReader(data1)).
		Read().
		Select("department", "age", "weight", "junior").
		GroupBy("department").
		Agg(dataframe.Min("age"), dataframe.Max("weight"), dataframe.Mean("junior"), dataframe.Count()).
		Run().
		PPrint(dataframe.NewPPrintParams().SetUseLipGloss(true))
}

//   BaseDataFrame: 3 rows, 5 columns
// ╭────────────┬──────────┬─────────────┬──────────────┬───────╮
// │ department │ min(age) │ max(weight) │ mean(junior) │ n     │
// ├────────────┼──────────┼─────────────┼──────────────┼───────┤
// │ String     │ Float64  │ Float64     │ Float64      │ Int64 │
// ├────────────┼──────────┼─────────────┼──────────────┼───────┤
// │ HR         │    29.00 │       90.00 │       0.5000 │ 2.000 │
// │ IT         │    25.00 │       85.00 │       0.2000 │ 5.000 │
// │ Business   │    27.00 │       65.00 │       0.5000 │ 2.000 │
// ╰────────────┴──────────┴─────────────┴──────────────┴───────╯
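
The aggregates in the example output can be checked by hand. The following is a standard-library sketch, not gandalff code, that recomputes min(age), max(weight), mean(junior) and the row count per department from the same CSV text. The names `sampleCSV`, `groupStats` and `aggregate` are hypothetical, and it assumes that boolean text such as `F`, `t` or `true` follows the forms accepted by `strconv.ParseBool`; whether gandalff's CSV reader applies the same rule is an assumption.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// sampleCSV is the same data as in the README example.
const sampleCSV = `name,age,weight,junior,department,salary band
Alice C,29,75.0,F,HR,4
John Doe,30,80.5,true,IT,2
Bob,31,85.0,F,IT,4
Jane H,25,60.0,false,IT,4
Mary,28,70.0,false,IT,3
Oliver,32,90.0,true,HR,1
Ursula,27,65.0,f,Business,4
Charlie,33,60.0,t,Business,2
Megan,26,55.0,F,IT,3`

// groupStats holds the running aggregates for one department.
type groupStats struct {
	minAge    float64
	maxWeight float64
	juniors   int // rows where junior is true
	n         int // rows in the group
}

// aggregate groups rows by department and returns the per-group stats
// together with the departments in order of first appearance.
func aggregate(text string) (map[string]*groupStats, []string, error) {
	rows, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	groups := make(map[string]*groupStats)
	var order []string
	for i, row := range rows {
		if i == 0 {
			continue // header
		}
		age, err := strconv.ParseFloat(row[1], 64)
		if err != nil {
			return nil, nil, err
		}
		weight, err := strconv.ParseFloat(row[2], 64)
		if err != nil {
			return nil, nil, err
		}
		// Assumption: boolean text follows strconv.ParseBool (t, F, true, ...).
		junior, err := strconv.ParseBool(row[3])
		if err != nil {
			return nil, nil, err
		}
		g, seen := groups[row[4]]
		if !seen {
			g = &groupStats{minAge: age, maxWeight: weight}
			groups[row[4]] = g
			order = append(order, row[4])
		}
		g.minAge = math.Min(g.minAge, age)
		g.maxWeight = math.Max(g.maxWeight, weight)
		if junior {
			g.juniors++
		}
		g.n++
	}
	return groups, order, nil
}

func main() {
	groups, order, err := aggregate(sampleCSV)
	if err != nil {
		panic(err)
	}
	for _, dept := range order {
		g := groups[dept]
		mean := float64(g.juniors) / float64(g.n)
		fmt.Printf("%-10s %6.2f %6.2f %.4f %d\n", dept, g.minAge, g.maxWeight, mean, g.n)
	}
}
```

The mean of a boolean column is the fraction of true values, which is why IT, with one junior out of five rows, shows 0.2000.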

Supported data types

The data types not checked are not yet supported, but might be in the future.

  • Bool
  • Bool (memory optimized, not fully implemented yet)
  • Int16
  • Int
  • Int64
  • Float32
  • Float64
  • Complex64
  • Complex128
  • String
  • Time
  • Duration

Supported operations for Series

  • Filter
    • filter by bool slice
    • filter by int slice
    • filter by bool series
    • filter by int series
  • Group
    • Group (with nulls)
    • SubGroup (with nulls)
  • Map
  • Sort
    • Sort (with nulls)
    • SortRev (with nulls)
  • Take

Supported operations for DataFrame

  • Agg
  • Filter
  • GroupBy
  • Join
    • Inner
    • Left
    • Right
    • Outer
    • Inner with nulls
    • Left with nulls
    • Right with nulls
    • Outer with nulls
  • Map
  • OrderBy
  • Select
  • Take
  • Pivot
  • Stack/Append

Supported stats functions

  • Count
  • Sum
  • Mean
  • Median
  • Min
  • Max
  • StdDev
  • Variance
  • Quantile

Dependencies

Built with:

TODO

  • Improve filtering interface.
  • Improve dataframe PrettyPrint: add parameters, optimize data display, use lipgloss.
  • Implement string factors.
  • Times: set time format.
  • Implement Set(i []int, v []any) Series.
  • Add Slice(i []int) Series (using filter?).
  • Implement memory optimized Bool series with uint64.
  • Use uint64 for null mask.
  • Optimize XPT reader/writer with float32.
  • Add url resolver to each reader.
  • Add format option to each writer.
  • JSON reader by records.
  • Implement chunked series.
  • Implement Parquet reader and writer.
  • Implement SPSS reader and writer.
  • Implement SAS7BDAT reader and writer (https://cran.r-project.org/web/packages/sas7bdat/vignettes/sas7bdat.pdf)

Documentation

Constants

const (
	// The default capacity of a series.
	DEFAULT_SERIES_INITIAL_CAPACITY = 10

	// The default capacity of a hash map.
	DEFAULT_HASH_MAP_INITIAL_CAPACITY = 1024

	// The default capacity of a dense map array.
	DEFAULT_DENSE_MAP_ARRAY_INITIAL_CAPACITY = 64

	// Number of threads to use for parallel operations.
	THREADS_NUMBER = 16

	// Minimum number of elements to use parallel operations.
	MINIMUM_PARALLEL_SIZE_1 = 16_384
	MINIMUM_PARALLEL_SIZE_2 = 131_072

	HASH_MAGIC_NUMBER      = int64(0xa8f4979b77e3f93)
	HASH_MAGIC_NUMBER_NULL = int64(0x7fff4979b77e3f93)
	HASH_NULL_KEY          = int64(0x7ff8000000000001)

	INF_TEXT        = "Inf"
	NA_TEXT         = "Na"
	EOL             = "\n"
	QUOTE           = "\""
	BOOL_TRUE_TEXT  = "true"
	BOOL_FALSE_TEXT = "false"

	CSV_READER_DEFAULT_DELIMITER           = ','
	CSV_READER_DEFAULT_HEADER              = true
	CSV_READER_DEFAULT_GUESS_DATA_TYPE_LEN = 1000

	XLSX_READER_DEFAULT_GUESS_DATA_TYPE_LEN = 1000
)
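
The constants `THREADS_NUMBER` and `MINIMUM_PARALLEL_SIZE_1` suggest that work is split across goroutines only once the input passes a size threshold. How gandalff applies them internally is not shown on this page, so the following is only a minimal sketch under that assumption. The names `threads`, `minParallelSize` and `sumInts` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for THREADS_NUMBER and MINIMUM_PARALLEL_SIZE_1.
const (
	threads         = 16
	minParallelSize = 16_384
)

// sumInts adds up data, splitting the work across goroutines only when the
// input is large enough to amortize the scheduling overhead.
func sumInts(data []int64) int64 {
	if len(data) < minParallelSize {
		var total int64
		for _, v := range data {
			total += v
		}
		return total
	}

	// Each goroutine writes to its own slot, so no locking is needed.
	partial := make([]int64, threads)
	chunk := (len(data) + threads - 1) / threads

	var wg sync.WaitGroup
	for t := 0; t < threads; t++ {
		start := t * chunk
		if start >= len(data) {
			break
		}
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(t, start, end int) {
			defer wg.Done()
			for _, v := range data[start:end] {
				partial[t] += v
			}
		}(t, start, end)
	}
	wg.Wait()

	var total int64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	small := make([]int64, 100)     // below the threshold: sequential path
	large := make([]int64, 200_000) // above the threshold: parallel path
	for i := range small {
		small[i] = 1
	}
	for i := range large {
		large[i] = 1
	}
	fmt.Println(sumInts(small), sumInts(large))
}
```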

Variables

This section is empty.

Functions

This section is empty.

Types

type Context

type Context struct {
	// StringPool is a pool of strings that are used by the series.
	// This is used to reduce the number of allocations and to allow for fast comparisons.
	StringPool *StringPool
	// contains filtered or unexported fields
}

func NewContext

func NewContext() *Context

func (*Context) GetNaText

func (ctx *Context) GetNaText() string

func (*Context) GetThreadsNumber

func (ctx *Context) GetThreadsNumber() int

func (*Context) GetTimeFormat

func (ctx *Context) GetTimeFormat() string

func (*Context) SetNaText

func (ctx *Context) SetNaText(s string) *Context

func (*Context) SetThreadsNumber

func (ctx *Context) SetThreadsNumber(n int) *Context

func (*Context) SetTimeFormat

func (ctx *Context) SetTimeFormat(s string) *Context
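
Each setter above returns `*Context`, which allows calls to be chained. The sketch below mirrors that pattern on a hypothetical local type named `settings` so that it runs without the library. The field names and the default values in `newSettings` are assumptions, not the behavior of `NewContext`.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for gandalff.Context; only the
// chaining pattern is meant to match the documented signatures.
type settings struct {
	naText     string
	threads    int
	timeFormat string
}

// newSettings fills in assumed defaults.
func newSettings() *settings {
	return &settings{naText: "Na", threads: 16, timeFormat: "2006-01-02T15:04:05Z07:00"}
}

// Each setter mutates the receiver and returns it so calls can be chained.
func (s *settings) SetNaText(v string) *settings {
	s.naText = v
	return s
}

func (s *settings) SetThreadsNumber(n int) *settings {
	s.threads = n
	return s
}

func (s *settings) SetTimeFormat(f string) *settings {
	s.timeFormat = f
	return s
}

func main() {
	s := newSettings().
		SetNaText("NULL").
		SetThreadsNumber(4).
		SetTimeFormat("2006-01-02")
	fmt.Println(s.naText, s.threads, s.timeFormat)
}
```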

type MapFunc

type MapFunc func(v any) any

type MapFuncNull

type MapFuncNull func(v any, isNull bool) (any, bool)
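
The two function types differ only in whether the null flag is visible to the callback. The sketch below declares local copies of both types and applies them to plain slices. The helper `applyNull` is hypothetical, and reading the second return value of `MapFuncNull` as the null flag of the result is an assumption, since this page does not document it.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the two documented function types.
type MapFunc func(v any) any
type MapFuncNull func(v any, isNull bool) (any, bool)

// applyNull is a hypothetical helper: it runs f over parallel slices of
// values and null flags, which is only one possible way a series could
// call a MapFuncNull.
func applyNull(values []any, nulls []bool, f MapFuncNull) ([]any, []bool) {
	outValues := make([]any, len(values))
	outNulls := make([]bool, len(values))
	for i, v := range values {
		outValues[i], outNulls[i] = f(v, nulls[i])
	}
	return outValues, outNulls
}

func main() {
	// A MapFunc sees only the value.
	var upper MapFunc = func(v any) any {
		return strings.ToUpper(v.(string))
	}
	fmt.Println(upper("hr"))

	// A MapFuncNull also receives the null flag and returns a new one;
	// here nulls are replaced by a default value.
	var fillNA MapFuncNull = func(v any, isNull bool) (any, bool) {
		if isNull {
			return "unknown", false
		}
		return v, false
	}
	values, nulls := applyNull([]any{"IT", "", "HR"}, []bool{false, true, false}, fillNA)
	fmt.Println(values, nulls)
}
```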

type NullableBool

type NullableBool struct {
	Valid bool
	Value bool
}

type NullableDuration

type NullableDuration struct {
	Valid bool
	Value time.Duration
}

type NullableFloat32

type NullableFloat32 struct {
	Valid bool
	Value float32
}

type NullableFloat64

type NullableFloat64 struct {
	Valid bool
	Value float64
}

type NullableInt

type NullableInt struct {
	Valid bool
	Value int
}

type NullableInt16

type NullableInt16 struct {
	Valid bool
	Value int16
}

type NullableInt32

type NullableInt32 struct {
	Valid bool
	Value int32
}

type NullableInt64

type NullableInt64 struct {
	Valid bool
	Value int64
}

type NullableInt8

type NullableInt8 struct {
	Valid bool
	Value int8
}

type NullableString

type NullableString struct {
	Valid bool
	Value string
}

type NullableTime

type NullableTime struct {
	Valid bool
	Value time.Time
}
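
All of the nullable structs share the same two-field shape. The sketch below uses a local copy of `NullableFloat64` and a hypothetical helper, `meanValid`, to show how such values might be consumed. It assumes that `Valid == true` means the value is present, as with the `Null*` types in `database/sql`; this page does not state the convention explicitly.

```go
package main

import "fmt"

// NullableFloat64 is a local copy of the documented struct.
type NullableFloat64 struct {
	Valid bool
	Value float64
}

// meanValid is a hypothetical helper that averages only the valid entries.
// It returns Valid == false when every entry is null.
func meanValid(xs []NullableFloat64) NullableFloat64 {
	var sum float64
	var n int
	for _, x := range xs {
		if x.Valid {
			sum += x.Value
			n++
		}
	}
	if n == 0 {
		return NullableFloat64{}
	}
	return NullableFloat64{Valid: true, Value: sum / float64(n)}
}

func main() {
	weights := []NullableFloat64{
		{Valid: true, Value: 75.0},
		{}, // null: the zero value has Valid == false
		{Valid: true, Value: 90.0},
	}
	fmt.Println(meanValid(weights))
}
```

Under this assumption the zero value of each struct is a null, so a freshly allocated slice of nullable values starts out entirely null.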

type SeriesSortOrder

type SeriesSortOrder int16

const (
	// The series is not sorted.
	SORTED_NONE SeriesSortOrder = iota
	// The series is sorted in ascending order.
	SORTED_ASC
	// The series is sorted in descending order.
	SORTED_DESC
)
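
A sort-order flag like this can be derived from the data itself. The sketch below declares local copies of the type and constants and adds a hypothetical helper, `detectOrder`, that classifies an integer slice. How gandalff sets or uses the flag is not documented on this page, so this is illustrative only.

```go
package main

import "fmt"

// Local copies of the documented type and constants.
type SeriesSortOrder int16

const (
	SORTED_NONE SeriesSortOrder = iota
	SORTED_ASC
	SORTED_DESC
)

// detectOrder is a hypothetical helper that classifies a slice.
// Empty and constant slices are reported as ascending.
func detectOrder(xs []int) SeriesSortOrder {
	asc, desc := true, true
	for i := 1; i < len(xs); i++ {
		if xs[i] < xs[i-1] {
			asc = false
		}
		if xs[i] > xs[i-1] {
			desc = false
		}
	}
	switch {
	case asc:
		return SORTED_ASC
	case desc:
		return SORTED_DESC
	default:
		return SORTED_NONE
	}
}

func main() {
	fmt.Println(
		detectOrder([]int{25, 27, 29}) == SORTED_ASC,
		detectOrder([]int{33, 30, 26}) == SORTED_DESC,
		detectOrder([]int{29, 25, 31}) == SORTED_NONE,
	)
}
```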

type StringPool

type StringPool struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

func NewStringPool

func NewStringPool() *StringPool

func (*StringPool) Get

func (sp *StringPool) Get(s string) *string

Get returns the address of the string if it exists in the pool, otherwise nil.

func (*StringPool) Len

func (sp *StringPool) Len() int

func (*StringPool) Put

func (sp *StringPool) Put(s string) *string

Put returns the address of the string if it exists in the pool, otherwise it adds it to the pool and returns its address.

func (*StringPool) PutSync

func (sp *StringPool) PutSync(s string) *string

PutSync returns the address of the string if it exists in the pool, otherwise it adds it to the pool and returns its address. This version is thread-safe.

func (*StringPool) SetNaText

func (sp *StringPool) SetNaText(s string) *StringPool

func (*StringPool) ToString

func (sp *StringPool) ToString() string
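
The `Get`, `Put` and `PutSync` descriptions above match a classic string-interning pool. The sketch below is a simplified, hypothetical version named `pool`, built on a map and an embedded `sync.RWMutex`. The real `StringPool` has unexported fields that are not visible on this page, so its internals may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified interning pool.
type pool struct {
	sync.RWMutex
	items map[string]*string
}

func newPool() *pool {
	return &pool{items: make(map[string]*string)}
}

// Get returns the pooled address of s, or nil if s was never added.
func (p *pool) Get(s string) *string {
	return p.items[s]
}

// Put returns the pooled address of s, adding s first if needed.
// It takes no lock, so it is only safe from a single goroutine.
func (p *pool) Put(s string) *string {
	if addr, ok := p.items[s]; ok {
		return addr
	}
	addr := &s
	p.items[s] = addr
	return addr
}

// PutSync is Put guarded by the write lock.
func (p *pool) PutSync(s string) *string {
	p.Lock()
	defer p.Unlock()
	return p.Put(s)
}

func main() {
	p := newPool()
	a := p.PutSync("IT")
	b := p.PutSync("IT")
	// Equal strings share one address, so equality is a pointer comparison.
	fmt.Println(a == b, p.Get("HR") == nil)
}
```

Because equal strings resolve to the same address, comparing two pooled strings reduces to comparing two pointers, which is consistent with the "fast comparisons" mentioned in the `Context` field comment.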
