hybridarray

v0.0.2
Published: Nov 2, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

README

hybridarray

A NumPy-inspired array library for Go, providing efficient columnar data structures with zero-copy views.

Overview

hybridarray combines three key data structures:

  • Map: O(1) column name lookups (like NumPy's structured array field access)
  • Linked list: Preserves insertion order for column iteration
  • Columnar arrays: Cache-friendly data access patterns (like NumPy's contiguous storage)

This hybrid approach enables NumPy-like operations with Go's type safety and performance.

Features

  • Zero-copy views: Slicing and column selection without data duplication
  • Type-aware columns: Runtime type information with DType
  • Ordered iteration: Preserves column insertion order
  • View composition: Views of views work correctly
  • Minimal API: Small, focused surface area for research workflows

Installation

go get github.com/alexshd/hybridarray

Quick Start

package main

import (
    "fmt"
    "github.com/alexshd/hybridarray"
)

func main() {
    // Create from map
    arr, _ := hybridarray.FromMap(map[string][]any{
        "x": {1.0, 2.0, 3.0, 4.0, 5.0},
        "y": {10.0, 20.0, 30.0, 40.0, 50.0},
    })

    fmt.Println(arr.Shape()) // 5 2

    // Zero-copy slice (rows 1-3)
    view, _ := arr.Slice(1, 4)
    fmt.Println(view.Shape()) // 3 2

    // Select columns
    xy, _ := arr.Select("x", "y")
    fmt.Println(xy.ColumnNames())

    // Access values
    val, _ := arr.At(2, "x")
    fmt.Println(val) // 3

    // Iterate columns
    for col := range arr.Columns() {
        fmt.Printf("%s: %v\n", col.Name, col.Data)
    }
}

API Reference

Creating Arrays
// New array with specified rows
arr := hybridarray.New(100)

// From map of columns
arr, err := hybridarray.FromMap(map[string][]any{
    "temperature": {20.5, 21.0, 19.8},
    "humidity":    {65.0, 68.0, 62.0},
})
Adding Columns
data := []any{1.0, 2.0, 3.0}
err := arr.AddColumn("sensor", data, hybridarray.DTypeFloat64)
Accessing Data
// Single value
val, err := arr.At(row, "column")

// Full row as map
row, err := arr.Row(5)

// Column lookup
col := arr.GetColumn("temperature")

// Shape
nrows, ncols := arr.Shape()

// Column names
names := arr.ColumnNames()
Zero-Copy Operations
// Row slicing [start:end)
view, err := arr.Slice(10, 100)

// Column selection
view, err := arr.Select("x", "y", "z")

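// Combine operations (Slice and Select each return a view and an error,
// so the calls cannot be chained directly)
rows, err := arr.Slice(50, 150)
filtered, err := rows.Select("sensor", "value")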
Iteration
// Range over columns (Go 1.23+ iter.Seq)
for col := range arr.Columns() {
    fmt.Printf("%s (%s): %d values\n",
        col.Name, col.DType, len(col.Data))
}

Data Types

const (
    DTypeFloat64 DType = iota // float64, float32
    DTypeInt64    // int, int64, int32, int16, int8
    DTypeString   // string
    DTypeBool     // bool
    DTypeAny      // any (type-erased)
)

Types are inferred automatically in FromMap or can be specified in AddColumn.
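
The README does not spell out the inference rules, so the following is only a sketch of one plausible approach: classify each value with a type switch that follows the table above, and fall back to the type-erased kind when a column is empty or mixed. The names `dtype`, `classify`, and `infer` are hypothetical, and the fallback rule is an assumption.

```go
package main

import "fmt"

// dtype mirrors the DType constants for illustration only.
type dtype int

const (
	dtFloat64 dtype = iota
	dtInt64
	dtString
	dtBool
	dtAny
)

// classify maps one value to a dtype, following the Data Types table.
func classify(v any) dtype {
	switch v.(type) {
	case float64, float32:
		return dtFloat64
	case int, int64, int32, int16, int8:
		return dtInt64
	case string:
		return dtString
	case bool:
		return dtBool
	default:
		return dtAny
	}
}

// infer returns the common dtype of a column.
// Assumption: empty or mixed columns fall back to dtAny.
func infer(data []any) dtype {
	if len(data) == 0 {
		return dtAny
	}
	first := classify(data[0])
	for _, v := range data[1:] {
		if classify(v) != first {
			return dtAny
		}
	}
	return first
}

func main() {
	fmt.Println(infer([]any{1.0, 2.5}) == dtFloat64) // true
	fmt.Println(infer([]any{1, 2}) == dtInt64)       // true
	fmt.Println(infer([]any{1, "a"}) == dtAny)       // true
}
```

Note that an untyped constant such as `1` stored in a `[]any` becomes an `int`, while `1.0` becomes a `float64`, so the literal form used when building a column decides which kind it falls under.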

Zero-Copy Semantics

Views created with Slice() and Select() share underlying data:

arr, _ := hybridarray.FromMap(map[string][]any{
    "x": {1.0, 2.0, 3.0, 4.0, 5.0},
})

view, _ := arr.Slice(1, 4) // Rows 1-3

// Modifying original data affects view
arr.GetColumn("x").Data[2] = 99.0

val, _ := view.At(1, "x") // 99.0 (sees the change)

This enables efficient data pipelines without copying large arrays.
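
The same sharing behavior can be seen with plain Go slices, which is one mechanism a zero-copy view could build on (whether hybridarray reslices or tracks offsets internally is not stated here). In this sketch, `window` and `snapshot` are hypothetical helpers that contrast an aliasing view with an independent copy.

```go
package main

import "fmt"

// window returns rows [start, end) of data without copying.
// The result aliases data's backing array.
func window(data []any, start, end int) []any {
	return data[start:end]
}

// snapshot returns an independent copy of the same rows.
func snapshot(data []any, start, end int) []any {
	return append([]any(nil), data[start:end]...)
}

func main() {
	data := []any{1.0, 2.0, 3.0, 4.0, 5.0}
	view := window(data, 1, 4)
	copied := snapshot(data, 1, 4)

	data[2] = 99.0

	fmt.Println(view[1])   // 99: the view sees the write
	fmt.Println(copied[1]) // 3: the copy does not
}
```

The trade-off is the usual one for views: no allocation and no copying, but a write through the parent is visible to every view that covers that row.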

Performance

Benchmarks on M1 MacBook Pro (example):

BenchmarkFromMap-8              50000    25000 ns/op
BenchmarkSlice-8              5000000      250 ns/op   (zero-copy)
BenchmarkSelect-8             1000000     1500 ns/op   (zero-copy)
BenchmarkAt-8                20000000       65 ns/op
BenchmarkGetColumn-8        100000000       12 ns/op   (map lookup)

Run benchmarks:

go test -bench=. -benchmem

Testing

# Unit tests
go test -v

# Fuzz tests (Go 1.18+)
go test -fuzz=FuzzFromMap -fuzztime=30s
go test -fuzz=FuzzSlice -fuzztime=30s
go test -fuzz=FuzzSelect -fuzztime=30s

# Race detection
go test -race

# Coverage
go test -cover

Design Philosophy

hybridarray is designed as a minimal reference implementation for scientific computing workflows. It prioritizes:

  1. Simplicity: Small API surface, easy to understand
  2. Zero-copy: Memory-efficient view semantics
  3. Type awareness: Runtime type info without generics overhead
  4. Research-friendly: Quick iteration on data transformations

It is not designed for:

  • Production databases (no ACID guarantees)
  • Distributed computing (single-machine only)
  • Complex query optimization (no query planner)

NumPy Comparison

Similar to NumPy's ndarray:

  • Zero-copy slicing (like NumPy views)
  • Typed columns (analogous to structured arrays with dtype)
  • Efficient iteration
  • Field-based access (like structured arrays: arr['field'])

Different from NumPy:

  • Columnar storage instead of row-major (more like pandas DataFrame)
  • Map-based field lookup (O(1) like NumPy's structured arrays)
  • No multi-dimensional indexing yet (1D + columns only)
  • No broadcasting or vectorized operations (yet)

Direct NumPy API Equivalents:

# NumPy structured arrays
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('x', 'i4'), ('y', 'f8')])
view = arr[0:1]    # Zero-copy slice
x_col = arr['x']   # Field access

// hybridarray (Go)
arr, _ := hybridarray.FromMap(map[string][]any{"x": {1, 2}, "y": {2.5, 3.5}})
view, _ := arr.Slice(0, 1)  // Zero-copy slice
x_col := arr.GetColumn("x") // Field access

Go 1.25.3 Features

Uses latest Go features:

  • iter.Seq for range-over-func column iteration (Go 1.23+)
  • Improved generic type inference
  • Enhanced fuzzing support

Future Enhancements

Potential NumPy-inspired additions (not implemented):

  • Vectorized operations: Add(), Mul(), Apply() (like NumPy ufuncs)
  • Aggregations: Sum(), Mean(), Std() (like NumPy reductions)
  • Boolean indexing: Where(predicate) (like NumPy boolean mask indexing)
  • Sorting: Sort(), Argsort() (like NumPy sorting)
  • Set operations: Unique(), Intersect() (like NumPy set routines)
  • Broadcasting: Automatic shape alignment (NumPy's killer feature)
  • Multi-dimensional: True ndarray with arbitrary dimensions
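
Until aggregations exist in the library, a reduction can be written by hand over a column's type-erased `Data` slice. The sketch below is self-contained and works on a plain `[]any`; `toFloat` and `mean` are hypothetical helpers, not hybridarray functions, and treating non-numeric values as errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// toFloat converts the numeric Go types listed under Data Types to float64.
func toFloat(v any) (float64, bool) {
	switch x := v.(type) {
	case float64:
		return x, true
	case float32:
		return float64(x), true
	case int:
		return float64(x), true
	case int64:
		return float64(x), true
	case int32:
		return float64(x), true
	case int16:
		return float64(x), true
	case int8:
		return float64(x), true
	}
	return 0, false
}

// mean averages a type-erased column, failing on empty or non-numeric data.
func mean(data []any) (float64, error) {
	if len(data) == 0 {
		return 0, errors.New("mean of empty column")
	}
	var total float64
	for i, v := range data {
		f, ok := toFloat(v)
		if !ok {
			return 0, fmt.Errorf("value %v at row %d is not numeric", v, i)
		}
		total += f
	}
	return total / float64(len(data)), nil
}

func main() {
	m, err := mean([]any{20.5, 21.0, 19.8})
	fmt.Printf("%.2f %v\n", m, err) // 20.43 <nil>
}
```

With the real package, the input would be something like `arr.GetColumn("temperature").Data`, after checking that GetColumn did not return nil.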

Contributing

This is a minimal reference implementation inspired by NumPy. For production scientific computing in Go, consider:

License

Apache License 2.0 - See LICENSE file for details.

Copyright 2025 Alex Shadrin

Credits

Primary inspiration: NumPy's ndarray architecture

Additional influences:

  • NumPy structured arrays (field access, zero-copy views, dtype system)
  • pandas DataFrame API (columnar storage, named columns)
  • Apache Arrow columnar format (memory layout)

This is a learning/research implementation to understand NumPy's design principles in Go.

Documentation

Overview

Package hybridarray provides a NumPy-like array implementation in Go. It combines map-based column indexing with columnar storage for efficient data manipulation and zero-copy views.

This is a standalone reference implementation inspired by NumPy's ndarray, specifically NumPy's structured arrays with named fields.

Key NumPy concepts implemented:

  • Zero-copy views (like NumPy array slicing)
  • Structured array field access (arr['field_name'])
  • dtype-like type system
  • Columnar storage (cache-friendly, like pandas/Arrow)

Example (compare with NumPy):

# NumPy (Python)
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('x', 'i4'), ('y', 'f8')])
view = arr[0:1]
x = arr['x']

// hybridarray (Go)
arr, _ := FromMap(map[string][]any{"x": {1, 2}, "y": {2.5, 3.5}})
view, _ := arr.Slice(0, 1)
x := arr.GetColumn("x")
Example (Basic)

Example_basic demonstrates basic array creation and access.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	// Create array from map
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {1.0, 2.0, 3.0},
		"y": {4.0, 5.0, 6.0},
	})

	// Get shape
	rows, cols := arr.Shape()
	fmt.Printf("Shape: (%d, %d)\n", rows, cols)

	// Access value
	val, _ := arr.At(1, "x")
	fmt.Printf("arr[1, 'x'] = %v\n", val)

}
Output:

Shape: (3, 2)
arr[1, 'x'] = 2
Example (ColumnOrdering)

Example_columnOrdering demonstrates insertion order preservation.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr := hybridarray.New(2)

	// Add in specific order
	arr.AddColumn("third", []any{3.0, 3.0}, hybridarray.DTypeFloat64)
	arr.AddColumn("first", []any{1.0, 1.0}, hybridarray.DTypeFloat64)
	arr.AddColumn("second", []any{2.0, 2.0}, hybridarray.DTypeFloat64)

	// Names preserve insertion order
	names := arr.ColumnNames()
	fmt.Printf("Column order: %v\n", names)

}
Output:

Column order: [third first second]
Example (ColumnSelection)

Example_columnSelection demonstrates zero-copy column selection.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {1.0, 2.0},
		"y": {3.0, 4.0},
		"z": {5.0, 6.0},
	})

	// Select only x and z
	view, _ := arr.Select("x", "z")

	names := view.ColumnNames()
	fmt.Printf("Selected columns: %v\n", names)

	// y is not in view
	if view.GetColumn("y") == nil {
		fmt.Println("Column y not in view")
	}

}
Output:

Selected columns: [x z]
Column y not in view
Example (DataTypes)

Example_dataTypes demonstrates type-aware columns.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr := hybridarray.New(2)

	arr.AddColumn("floats", []any{1.5, 2.5}, hybridarray.DTypeFloat64)
	arr.AddColumn("ints", []any{10, 20}, hybridarray.DTypeInt64)
	arr.AddColumn("strings", []any{"a", "b"}, hybridarray.DTypeString)
	arr.AddColumn("bools", []any{true, false}, hybridarray.DTypeBool)

	for col := range arr.Columns() {
		fmt.Printf("%s: %s\n", col.Name, col.DType)
	}

	// Columns iterate in insertion order: floats, ints, strings, bools.
	// The printed type names depend on DType.String().
}
Example (ErrorHandling)

Example_errorHandling demonstrates proper error checking.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr := hybridarray.New(5)
	arr.AddColumn("x", []any{1.0, 2.0, 3.0, 4.0, 5.0}, hybridarray.DTypeFloat64)

	// Invalid column access
	_, err := arr.At(0, "nonexistent")
	if err != nil {
		fmt.Println("Error:", err)
	}

	// Invalid row access
	_, err = arr.At(10, "x")
	if err != nil {
		fmt.Println("Error:", err)
	}

	// Invalid slice
	_, err = arr.Slice(3, 2)
	if err != nil {
		fmt.Println("Error:", err)
	}

}
Output:

Error: column nonexistent does not exist
Error: row index 10 out of range [0:5)
Error: invalid slice range [3:2] for array with 5 rows
Example (Iteration)

Example_iteration demonstrates column iteration.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"temp":     {20.5, 21.0, 19.8},
		"humidity": {65.0, 68.0, 62.0},
	})

	// Iterate columns
	for col := range arr.Columns() {
		fmt.Printf("%s (%s): %v\n", col.Name, col.DType, col.Data)
	}

	// Output depends on map iteration order
}
Example (ManualConstruction)

Example_manualConstruction demonstrates building an array step by step.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr := hybridarray.New(3)

	// Add columns
	arr.AddColumn("id", []any{1, 2, 3}, hybridarray.DTypeInt64)
	arr.AddColumn("name", []any{"Alice", "Bob", "Charlie"}, hybridarray.DTypeString)
	arr.AddColumn("score", []any{95.5, 87.0, 92.3}, hybridarray.DTypeFloat64)

	rows, cols := arr.Shape()
	fmt.Printf("Built array: %d rows, %d columns\n", rows, cols)

	// Access row
	row, _ := arr.Row(1)
	fmt.Printf("Row 1: %v\n", row)

	// fmt prints map keys in sorted order (Go 1.12+), so the row
	// prints its keys as id, name, score.
}
Example (ScientificWorkflow)

Example_scientificWorkflow demonstrates a typical research workflow.

package main

import (
	"fmt"
	"math"

	"github.com/alexshd/hybridarray"
)

func main() {
	// Simulate sensor data
	n := 100
	times := make([]any, n)
	temps := make([]any, n)
	pressures := make([]any, n)

	for i := 0; i < n; i++ {
		times[i] = float64(i)
		temps[i] = 20.0 + 5.0*math.Sin(float64(i)*0.1)
		pressures[i] = 1013.25 + 2.0*math.Cos(float64(i)*0.15)
	}

	// Create array
	arr, _ := hybridarray.FromMap(map[string][]any{
		"time":     times,
		"temp":     temps,
		"pressure": pressures,
	})

	// Filter to middle 50 samples
	filtered, _ := arr.Slice(25, 75)

	// Select only time and temperature
	selected, _ := filtered.Select("time", "temp")

	rows, cols := selected.Shape()
	fmt.Printf("Filtered data: %d samples, %d variables\n", rows, cols)

	// Sample first and last values
	first, _ := selected.At(0, "temp")
	last, _ := selected.At(rows-1, "temp")
	fmt.Printf("Temperature range: %.2f to %.2f°C\n", first, last)

}
Output:

Filtered data: 50 samples, 2 variables
Temperature range: 22.99 to 24.49°C
Example (Slicing)

Example_slicing demonstrates zero-copy row slicing.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {0.0, 1.0, 2.0, 3.0, 4.0, 5.0},
	})

	// Zero-copy slice
	view, _ := arr.Slice(2, 5) // rows 2, 3, 4

	rows, _ := view.Shape()
	fmt.Printf("View has %d rows\n", rows)

	val, _ := view.At(0, "x")
	fmt.Printf("view[0, 'x'] = %v\n", val)

}
Output:

View has 3 rows
view[0, 'x'] = 2
Example (ViewComposition)

Example_viewComposition demonstrates views of views.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0},
		"y": {10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0},
	})

	// First slice: rows 2-7
	view1, _ := arr.Slice(2, 8)

	// Second slice: rows 1-3 of view1 (absolute rows 3-5)
	view2, _ := view1.Slice(1, 4)

	// Select column
	view3, _ := view2.Select("x")

	rows, cols := view3.Shape()
	fmt.Printf("Final view: %d rows, %d cols\n", rows, cols)

	val, _ := view3.At(0, "x")
	fmt.Printf("view3[0, 'x'] = %v (absolute row 3)\n", val)

}
Output:

Final view: 3 rows, 1 cols
view3[0, 'x'] = 3 (absolute row 3)
Example (ViewVsArray)

Example_viewVsArray demonstrates view detection.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {1.0, 2.0, 3.0},
	})

	view, _ := arr.Slice(0, 2)

	fmt.Printf("arr.IsView() = %v\n", arr.IsView())
	fmt.Printf("view.IsView() = %v\n", view.IsView())

}
Output:

arr.IsView() = false
view.IsView() = true
Example (ZeroCopySemantics)

Example_zeroCopySemantics demonstrates shared data between array and view.

package main

import (
	"fmt"

	"github.com/alexshd/hybridarray"
)

func main() {
	arr, _ := hybridarray.FromMap(map[string][]any{
		"x": {1.0, 2.0, 3.0, 4.0, 5.0},
	})

	// Create view
	view, _ := arr.Slice(1, 4)

	// Modify original data
	arr.GetColumn("x").Data[2] = 99.0

	// View sees the change (zero-copy)
	val, _ := view.At(1, "x")
	fmt.Printf("view[1, 'x'] = %v (modified)\n", val)

}
Output:

view[1, 'x'] = 99 (modified)

Constants

const Version = "v0.0.1"

Version is the current version of the hybridarray package.

Variables

This section is empty.

Functions

This section is empty.

Types

type Array

type Array struct {
	// contains filtered or unexported fields
}

Array represents a hybrid data structure combining:

  • Map for O(1) column name lookups
  • Linked list for ordered column iteration
  • Columnar storage for cache-friendly data access
  • Zero-copy view semantics

Inspired by NumPy's ndarray and pandas DataFrame.

func FromMap

func FromMap(data map[string][]any) (*Array, error)

FromMap creates an Array from a map of column names to data slices. All slices must have the same length.

Example:

arr, err := FromMap(map[string][]any{
    "x": {1.0, 2.0, 3.0},
    "y": {4.0, 5.0, 6.0},
})

func New

func New(nrows int) *Array

New creates a new Array with the specified number of rows and no columns. Add columns with AddColumn; to build an array directly from existing column data, use FromMap instead.

func (*Array) AddColumn

func (a *Array) AddColumn(name string, data []any, dtype DType) error

AddColumn adds a new column to the array. The data slice must match the array's row count.

func (*Array) At

func (a *Array) At(row int, col string) (any, error)

At returns the value at the specified row and column. For views, coordinates are relative to the view.

func (*Array) ColumnNames

func (a *Array) ColumnNames() []string

ColumnNames returns all column names in insertion order.

func (*Array) Columns

func (a *Array) Columns() iter.Seq[*Column]

Columns returns an iterator over all columns in order. Uses Go 1.23+ range-over-func feature.

func (*Array) GetColumn

func (a *Array) GetColumn(name string) *Column

GetColumn retrieves a column by name. Returns nil if column doesn't exist.

func (*Array) IsView

func (a *Array) IsView() bool

IsView returns true if this array is a view of another array.

func (*Array) Row

func (a *Array) Row(row int) (map[string]any, error)

Row returns all values in the specified row as a map.
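
As an illustration of what gathering a row from columnar storage involves, here is a small self-contained sketch. The `column` type and `rowOf` function are hypothetical and operate on a plain slice of columns; the error wording is borrowed from the error-handling example and is not guaranteed to match what Row returns.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a named column.
type column struct {
	name string
	data []any
}

// rowOf collects the value at index row from every column into a map.
func rowOf(cols []column, row int) (map[string]any, error) {
	out := make(map[string]any, len(cols))
	for _, c := range cols {
		if row < 0 || row >= len(c.data) {
			return nil, fmt.Errorf("row index %d out of range [0:%d)", row, len(c.data))
		}
		out[c.name] = c.data[row]
	}
	return out, nil
}

func main() {
	cols := []column{
		{name: "id", data: []any{1, 2, 3}},
		{name: "name", data: []any{"Alice", "Bob", "Charlie"}},
	}
	r, _ := rowOf(cols, 1)
	fmt.Println(r) // map[id:2 name:Bob]
}
```

Because the result is a map, column order is not carried over; callers that need ordered output can pair a row lookup with ColumnNames.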

func (*Array) Select

func (a *Array) Select(colNames ...string) (*Array, error)

Select creates a zero-copy view with only specified columns.

Example:

view, err := arr.Select("x", "y")  // Only x and y columns
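
One way column selection can stay zero-copy is to copy only pointers to the chosen columns, never the values. The sketch below shows that idea in isolation; `column` and `pick` are hypothetical names, and returning an error for an unknown column is an assumption based on Select's signature.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a named column.
type column struct {
	name string
	data []any
}

// pick returns the named columns in the requested order.
// It copies pointers only, so the result shares data with index.
func pick(index map[string]*column, names ...string) ([]*column, error) {
	out := make([]*column, 0, len(names))
	for _, n := range names {
		c, ok := index[n]
		if !ok {
			return nil, fmt.Errorf("column %s does not exist", n)
		}
		out = append(out, c)
	}
	return out, nil
}

func main() {
	x := &column{name: "x", data: []any{1.0, 2.0}}
	y := &column{name: "y", data: []any{3.0, 4.0}}
	index := map[string]*column{"x": x, "y": y}

	sel, _ := pick(index, "x")
	x.data[0] = 42.0
	fmt.Println(sel[0].data[0]) // 42

	_, err := pick(index, "z")
	fmt.Println(err) // column z does not exist
}
```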

func (*Array) Shape

func (a *Array) Shape() (int, int)

Shape returns the dimensions of the array (rows, cols).

func (*Array) Slice

func (a *Array) Slice(start, end int) (*Array, error)

Slice creates a zero-copy view of rows [start:end). The view shares underlying column data with the parent array.

Example:

view, err := arr.Slice(10, 20)  // Rows 10-19

type Column

type Column struct {
	Name  string
	Data  []any // Type-erased for flexibility
	DType DType // Runtime type information
}

Column represents a single named column with typed data.

type ColumnNode

type ColumnNode struct {
	Column *Column
	Next   *ColumnNode
}

ColumnNode is a node in the column linked list.

type DType

type DType int

DType represents data types supported by the array.

const (
	DTypeFloat64 DType = iota
	DTypeInt64
	DTypeString
	DTypeBool
	DTypeAny
)

func (DType) String

func (d DType) String() string

String returns the string representation of a DType.
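
For context, this is the standard fmt.Stringer pattern for an iota-based enum, which is why a DType can be printed with `%s`. The sketch uses a hypothetical `kind` type, and the label strings are assumptions; the names DType.String actually returns are not shown in this documentation.

```go
package main

import "fmt"

// kind is a hypothetical enum mirroring the DType constants.
type kind int

const (
	kindFloat64 kind = iota
	kindInt64
	kindString
	kindBool
	kindAny
)

// String makes kind satisfy fmt.Stringer, so %s and %v print a label.
func (k kind) String() string {
	switch k {
	case kindFloat64:
		return "float64"
	case kindInt64:
		return "int64"
	case kindString:
		return "string"
	case kindBool:
		return "bool"
	case kindAny:
		return "any"
	default:
		return fmt.Sprintf("kind(%d)", int(k))
	}
}

func main() {
	fmt.Printf("%s %v\n", kindInt64, kindAny) // int64 any
}
```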

type ViewMetadata

type ViewMetadata struct {
	// contains filtered or unexported fields
}

ViewMetadata enables zero-copy array slicing.
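
ViewMetadata's fields are unexported, so the following is a guess at the kind of bookkeeping such a type needs rather than a description of it: a half-open row range that composes when a view is sliced again. The `viewMeta` type and its methods are hypothetical; the error messages imitate the ones in the error-handling example.

```go
package main

import "fmt"

// viewMeta is a hypothetical record of which parent rows a view exposes.
type viewMeta struct {
	start, end int // half-open range [start, end) in absolute rows
}

// rows reports how many rows the view exposes.
func (m viewMeta) rows() int { return m.end - m.start }

// slice narrows the view; start and end are relative to m.
func (m viewMeta) slice(start, end int) (viewMeta, error) {
	if start < 0 || end < start || end > m.rows() {
		return viewMeta{}, fmt.Errorf("invalid slice range [%d:%d] for array with %d rows",
			start, end, m.rows())
	}
	return viewMeta{start: m.start + start, end: m.start + end}, nil
}

// abs translates a view-relative row into an absolute row.
func (m viewMeta) abs(row int) (int, error) {
	if row < 0 || row >= m.rows() {
		return 0, fmt.Errorf("row index %d out of range [0:%d)", row, m.rows())
	}
	return m.start + row, nil
}

func main() {
	root := viewMeta{start: 0, end: 10}
	v1, _ := root.slice(2, 8) // absolute rows 2-7
	v2, _ := v1.slice(1, 4)   // absolute rows 3-5
	row, _ := v2.abs(0)
	fmt.Println(v2.rows(), row) // 3 3
}
```

This reproduces the arithmetic of the view-composition example: slicing [1:4) of a view that starts at absolute row 2 yields three rows beginning at absolute row 3.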
