hb

package module
v0.0.0-...-7e3c669 Latest

Warning

This package is not in the latest version of its module.

Published: Feb 7, 2026 License: MIT Imports: 10 Imported by: 0

README

🏠 homebase-go-lib


High-performance Go library for data processing, file I/O, and system utilities | Built for production workloads with automatic compression detection and format conversion

A comprehensive toolkit for building data pipelines, ETL workflows, and command-line tools in Go. Features universal file processing with 7 compression formats, structured data iteration (CSV, JSON Lines, Parquet, MsgPack), and production-ready utilities.

📒 Note: This library represents the beginning of open-sourcing minor components from parf's proprietary Homebase framework. The Homebase framework has been battle-tested in production environments processing petabytes of data. We're gradually sharing foundational utilities that can benefit the broader Go and data engineering community while keeping core business logic proprietary.


πŸ› οΈ Bundled Utilities

Command-line tools for data conversion and database import/export with comprehensive SQL support.

Utility Description
any2parquet Export data to Parquet format (recommended for analytics)
any2jsonl Export data to JSONL format (human-readable, debug-friendly)
any2csv Export data to CSV format (spreadsheet-compatible)
any2db Import data to MySQL/PostgreSQL databases with auto-schema

All utilities support:

  • 🔌 SQL queries from MySQL & PostgreSQL databases
  • 📦 Multiple file formats (Parquet, JSONL, CSV, MsgPack)
  • 🗜️ Compression (.gz, .zst, .lz4, .br, .xz)
  • 🔄 Stdout piping with - for data pipelines

📖 Full Documentation & Examples →


🌟 Key Features

📁 Universal File Processing
  • 7 compression formats with auto-detection
  • 5 structured formats: CSV, JSONL, Parquet, MsgPack, FlatBuffer
  • HTTP/HTTPS URL support for remote files
  • Streaming processing for large files
⚡ High Performance
  • Zero-copy operations where possible
  • Parallel processing support
  • Memory efficient streaming
  • Progress tracking built-in
🔧 Production Ready
  • Type-safe generic iterators
  • Error handling throughout
  • Battle-tested in production
  • Comprehensive tests
🎯 Developer Friendly
  • Simple API - consistent patterns
  • Auto-detection - no manual config
  • Rich examples included
  • Well documented

📦 Installation

go get github.com/parf/homebase-go-lib

🚀 Quick Start

Process Compressed Files Automatically
import "github.com/parf/homebase-go-lib/fileiterator"

// Works with .gz, .zst, .lz4, .br, .xz automatically!
fileiterator.IterateLines("access.log.gz", func(line string) error {
    // Process each line
    fmt.Println(line)
    return nil
})
Type-Safe JSON Lines Processing
type User struct {
    ID    int64  `json:"id"`
    Name  string `json:"name"`
    Email string `json:"email"`
}

// Generic type-safe iterator
fileiterator.IterateJSONLTyped("users.jsonl.zst", func(user User) error {
    fmt.Printf("User: %s <%s>\n", user.Name, user.Email)
    return nil
})
Universal Schema Support
// Read ANY schema from Parquet, CSV, JSONL, MsgPack
records, _ := fileiterator.ReadInput("data.parquet")
for _, record := range records {
    // record is map[string]any - works with any field structure!
    fmt.Printf("Record: %v\n", record)
}
Progress Tracking
import "github.com/parf/homebase-go-lib/clistat"

stat := clistat.New(10) // Report every 10 seconds
for i := 0; i < 1_000_000; i++ {
    stat.Hit()  // Auto-reports progress: "45.2K hits/sec (500K total)"
}
stat.Finish()   // Final summary

📋 Supported Formats

Structured Data Formats

| Format | Extensions | Read | Write | Use Case | Performance |
|---|---|---|---|---|---|
| 📄 CSV | .csv, .tsv | ✅ | ✅ | Excel compatibility, human-readable | Good |
| 📝 JSON Lines | .jsonl, .ndjson | ✅ | ✅ | Debugging, wide support | Moderate |
| 📊 Apache Parquet | .parquet, .pk | ✅ | ✅ | Analytics, columnar queries | Excellent |
| 🔧 MessagePack | .msgpack, .mp | ✅ | ✅ | Binary efficiency, 2x smaller than JSON | Very Good |
| ⚡ FlatBuffer | .fb | ✅ | ✅ | Zero-copy, fastest reads (3x faster) | Fastest |

Compression Formats (Auto-Detected)

| Compression | Extension | Speed | Ratio | Use Case |
|---|---|---|---|---|
| ⚡ LZ4 | .lz4 | Fastest | Good | Real-time processing |
| 🎯 Zstandard | .zst | Fast | Excellent | Recommended for most uses |
| 📦 Gzip | .gz | Moderate | Good | Universal compatibility |
| 🔥 Brotli | .br | Slow | Best | Maximum compression |
| ❄️ XZ/LZMA | .xz | Very Slow | Excellent | Archive storage |
| 📋 Zlib | .zlib, .zz | Moderate | Good | Legacy support |

All formats work seamlessly with all compression types! For example: .jsonl.zst, .csv.gz, .parquet.lz4


💡 Use Cases

🔄 Data Pipeline Processing
// Convert between any formats with automatic compression
input, _ := fileiterator.ReadInput("raw-data.csv.gz")           // CSV + Gzip
fileiterator.WriteParquetAny("processed.parquet.zst", input)    // Parquet + Zstd
📊 Log Analysis
stat := clistat.New(5)
fileiterator.IterateLines("access.log.gz", func(line string) error {
    if strings.Contains(line, "ERROR") {
        // Process error logs
    }
    stat.Hit()
    return nil
})
stat.Finish()  // "Processed 2.5M lines in 3.2s (781K lines/sec)"
🗄️ Database ETL
// Extract from CSV, transform, load to database
fileiterator.IterateCSVMap("export.csv.zst", func(row map[string]string) error {
    // Transform data
    user := transformUser(row)

    // Load to database
    return db.Insert(user)
})
🚀 Batch Processing
// Process millions of records efficiently
fileiterator.IterateParquetAny("events.parquet", func(event map[string]any) error {
    // Process each event with automatic memory management
    return processEvent(event)
})

📚 Core Packages

📁 fileiterator - Universal File Processing

The heart of homebase-go-lib. Process any file format with automatic compression detection.

Key Functions
// Universal I/O
FUOpen(filename)              // Open any file/URL with auto-decompression
FUCreate(filename)            // Create file with auto-compression
ReadInput(filename)           // Read ANY schema to []map[string]any
WriteOutput(filename, data)   // Write ANY schema from []map[string]any

// Line-by-line Processing
IterateLines(filename, func(line string) error)

// Structured Data (Untyped)
IterateJSONL(filename, func(map[string]any) error)
IterateCSVMap(filename, func(map[string]string) error)
IterateMsgPack(filename, func(any) error)
IterateParquetAny(filename, func(map[string]any) error)

// Structured Data (Type-Safe Generics)
IterateJSONLTyped[T](filename, func(T) error)
IterateMsgPackTyped[T](filename, func(T) error)

// Binary Formats
IterateBinaryRecords(filename, recordSize, func([]byte) error)
IterateFlatBufferList(filename, func([]byte) error)

Features:

  • ✅ Automatic compression detection from file extension
  • ✅ HTTP/HTTPS URL support
  • ✅ Streaming for memory efficiency
  • ✅ Progress reporting integration
  • ✅ Error handling with context
📊 clistat - Real-Time Statistics

Track processing progress with automatic hit-rate reporting.

stat := clistat.New(10)  // Report every 10 seconds

for i := 0; i < 1_000_000; i++ {
    // Your processing logic
    stat.Hit()  // Automatically reports: "45.2K hits/sec (500K total)"
}

stat.Finish()  // Final: "Processed 1M items in 22.1s (45.2K/sec)"

Features:

  • ✅ Automatic progress reporting
  • ✅ Configurable intervals
  • ✅ Hits-per-second calculation
  • ✅ Total count tracking
  • ✅ Elapsed time reporting
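The core of this kind of reporter is just a hit counter divided by elapsed wall-clock time. A minimal self-contained sketch of that idea — `Stat` here is a hypothetical type for illustration, not the clistat implementation:

```go
package main

import (
	"fmt"
	"time"
)

// Stat counts hits and reports the average rate since creation.
type Stat struct {
	start time.Time
	hits  int64
}

// NewStat starts the clock.
func NewStat() *Stat { return &Stat{start: time.Now()} }

// Hit records one processed item.
func (s *Stat) Hit() { s.hits++ }

// Rate returns hits per second since the Stat was created.
func (s *Stat) Rate() float64 {
	elapsed := time.Since(s.start).Seconds()
	if elapsed == 0 {
		return 0
	}
	return float64(s.hits) / elapsed
}

func main() {
	s := NewStat()
	for i := 0; i < 1000; i++ {
		s.Hit()
	}
	fmt.Printf("%d hits, %.0f hits/sec\n", s.hits, s.Rate())
}
```

A periodic reporter (clistat's "report every N seconds") would additionally check the elapsed time inside Hit and print when the interval passes.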
πŸ—„οΈ sql - Database Utilities

Efficient database operations with batch processing.

// Batch Insert
inserter := sql.NewBatchInserter(db, "users", []string{"id", "name", "email"}, 1000)
inserter.Add(1, "Alice", "alice@example.com")
inserter.Add(2, "Bob", "bob@example.com")
inserter.Flush()

// Query Iteration
sql.SqlIterator(db, "SELECT * FROM users WHERE active = true", func(row map[string]any) error {
    // Process each row
    return nil
})
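Batch insertion boils down to buffering rows and flushing them in one statement once the batch size is reached. A self-contained sketch of that buffering logic without a live database — `batcher` is a hypothetical type for illustration; the real sql.NewBatchInserter issues multi-row INSERTs:

```go
package main

import "fmt"

// batcher buffers rows and invokes flush whenever batchSize rows accumulate.
type batcher struct {
	batchSize int
	rows      [][]any
	flush     func(rows [][]any) // in a real inserter, this runs one multi-row INSERT
}

// Add buffers one row and flushes automatically when the batch is full.
func (b *batcher) Add(values ...any) {
	b.rows = append(b.rows, values)
	if len(b.rows) >= b.batchSize {
		b.Flush()
	}
}

// Flush sends any buffered rows and clears the buffer.
func (b *batcher) Flush() {
	if len(b.rows) == 0 {
		return
	}
	b.flush(b.rows)
	b.rows = nil
}

func main() {
	flushes := 0
	b := &batcher{batchSize: 2, flush: func(rows [][]any) { flushes++ }}
	b.Add(1, "Alice")
	b.Add(2, "Bob") // batch full: triggers flush #1
	b.Add(3, "Carol")
	b.Flush() // flush #2 for the remainder
	fmt.Println(flushes) // 2
}
```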
💾 cache - Map-to-Parquet File Caching

Cache any Go map as a Parquet file with precise scalar type preservation.

import "github.com/parf/homebase-go-lib/cache"

data := make(map[string]any)
if !cache.Map("lookup.parquet", data) {
    data = expensiveComputation()
    if err := cache.WriteMap("lookup.parquet", data); err != nil {
        log.Printf("cache write failed: %v", err)
    }
}
// use data directly — no type assertion needed

Supports any map type — map[string]any, map[uint32]string, map[int]float64, etc.

Features:

  • ✅ Precise type preservation (uint32 stays uint32, float32 stays float32)
  • ✅ String-keyed maps (column-per-key) and numeric-keyed maps (key-value columns)
  • ✅ Best-effort caching — silent errors, corrupted files treated as misses
  • ✅ Internal Snappy compression

📖 Full Documentation →


🎯 Format Conversion Tools

Universal Converters (Included)

Located in cmd/ directory:

any2parquet - Convert to Apache Parquet
# Convert any format to Parquet
any2parquet data.jsonl                    # → data.parquet
any2parquet logs.csv.gz                   # → logs.parquet
any2parquet events.msgpack.zst            # → events.parquet
any2jsonl - Convert to JSON Lines
# Convert any format to human-readable JSONL
any2jsonl data.parquet                    # → data.jsonl
any2jsonl users.csv                       # → users.jsonl
any2jsonl metrics.parquet.zst             # → metrics.jsonl

Standalone Tool: any-to-parquet - Optimized Parquet converter


📈 Performance Benchmarks

Based on 1 million records:

| Format | File Size | Read Time | Write Time | Compression | Best For |
|---|---|---|---|---|---|
| Parquet | 44 MB | 0.15s | 0.46s | Excellent | Everything 🏆 |
| MsgPack.zst | 38 MB | 0.59s | 0.61s | Best | Binary efficiency |
| JSONL.zst | 43 MB | 1.91s | 0.84s | Excellent | Debugging |
| FlatBuffer.lz4 | 66 MB | 0.06s | 0.42s | Good | Ultra-fast reads |
| CSV.gz | 52 MB | 2.1s | 1.2s | Good | Excel compatibility |
| Plain JSONL | 156 MB | 1.93s | 1.38s | None | Human-readable |

Winner: Parquet delivers the best balance of speed, compression, and compatibility.

Full Benchmark Results →


🧪 Examples & Tests

Running Examples
# File processing examples
cd examples/fileiterator
go run main.go

# Statistics tracking
cd examples/clistat
go run main.go

# Schema examples (5 different data structures)
cd cmd/examples/schemas
./test-all-schemas.sh
Test Different Schemas

The library works with ANY schema structure. See examples:

View All Schema Examples →


πŸ› οΈ Development

Prerequisites
  • Go 1.21 or higher
  • Make (optional)
Build
make build
Run Tests
make test
Test Coverage
make test-coverage
Format & Lint
make fmt
make lint

πŸ“ Project Structure

homebase-go-lib/
β”œβ”€β”€ πŸ“¦ fileiterator/       # File processing & format conversion
β”‚   β”œβ”€β”€ parquet.go         # Apache Parquet support
β”‚   β”œβ”€β”€ jsonl.go           # JSON Lines processing
β”‚   β”œβ”€β”€ csv.go             # CSV with auto-detection
β”‚   β”œβ”€β”€ msgpack.go         # MessagePack binary format
β”‚   β”œβ”€β”€ genericio.go       # Universal I/O functions
β”‚   └── compression.go     # 7 compression formats
β”‚
β”œβ”€β”€ πŸ“Š clistat/            # Real-time statistics tracking
β”‚   └── clistat.go
β”‚
β”œβ”€β”€ πŸ—„οΈ sql/                # Database utilities
β”‚   β”œβ”€β”€ batch.go           # Batch insert operations
β”‚   └── iterator.go        # Query iteration
β”‚
β”œβ”€β”€ πŸ’Ύ cache/              # Map-to-Parquet file caching
β”‚   └── mapcache.go        # Precise type-preserving cache
β”‚
β”œβ”€β”€ 🎯 cmd/                # Command-line tools
β”‚   β”œβ”€β”€ any2parquet.go     # Universal β†’ Parquet converter
β”‚   β”œβ”€β”€ any2jsonl.go       # Universal β†’ JSONL converter
β”‚   └── examples/          # Usage examples
β”‚       └── schemas/       # 5 different schema examples
β”‚
β”œβ”€β”€ πŸ§ͺ examples/           # Code examples
β”œβ”€β”€ πŸ“š docs/               # Documentation
β”œβ”€β”€ πŸ—οΈ benchmarks/         # Performance benchmarks
└── 🧰 testdata/           # Test fixtures

🔑 Key Concepts

Automatic Compression Detection
// All of these work automatically based on the file extension;
// the same line callback handles every compression format:
process := func(line string) error { return nil }

fileiterator.IterateLines("file.txt", process)      // Plain text
fileiterator.IterateLines("file.txt.gz", process)   // Gzip compressed
fileiterator.IterateLines("file.txt.zst", process)  // Zstandard compressed
fileiterator.IterateLines("file.txt.lz4", process)  // LZ4 compressed
Universal Schema Support
// No schema definition needed - works with ANY structure!
records, _ := fileiterator.ReadInput("data.csv")
// records[0] might be: {"user_id": 1, "name": "Alice", "age": 28}

records2, _ := fileiterator.ReadInput("sensors.jsonl")
// records2[0] might be: {"sensor": "temp-01", "value": 23.5, "unit": "celsius"}
Type-Safe Generics
// Define your struct
type Product struct {
    ID    int     `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

// Get type-safe iteration with Go generics
fileiterator.IterateJSONLTyped("products.jsonl", func(p Product) error {
    fmt.Printf("%s: $%.2f\n", p.Name, p.Price)
    return nil
})

🤝 Contributing

Contributions welcome! Please:

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch
  3. ✅ Add tests for new functionality
  4. 📝 Update documentation
  5. 🚀 Submit a pull request

Report Bug · Request Feature


📄 License

MIT License - see LICENSE file for details



🏷️ Keywords

go library, golang, file processing, data pipeline, ETL, compression, gzip, zstd, lz4, parquet, json lines, csv processing, msgpack, data engineering, batch processing, streaming, apache parquet, columnar format, data conversion, format converter, structured data, log processing, statistics tracking, progress reporting, database utilities, sql batch insert, type-safe iterators, go generics, high performance, production ready


⭐ Star History

If you find this library useful, please give it a star! ⭐



Built with ❤️ for the Go and data engineering community

Documentation · Examples · Benchmarks

Documentation

Overview

Package hb provides the main functionality for the homebase library.

Index

Examples

Constants

const Version = "0.1.0"

Version is the current version of the library

Variables

This section is empty.

Functions

func Any2uint32

func Any2uint32(iii any) (r uint32, err error)

Any2uint32 converts various integer types to uint32
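The conversion such a helper performs can be sketched as a type switch that rejects negative values and overflow. This is a hypothetical self-contained reimplementation for illustration, not the library's code (the real Any2uint32 may accept more types):

```go
package main

import (
	"fmt"
	"math"
)

// any2uint32 converts common integer types to uint32,
// returning an error on negative values or overflow.
func any2uint32(v any) (uint32, error) {
	switch n := v.(type) {
	case uint32:
		return n, nil
	case int:
		if n < 0 || uint64(n) > math.MaxUint32 {
			return 0, fmt.Errorf("value %d out of uint32 range", n)
		}
		return uint32(n), nil
	case int64:
		if n < 0 || n > math.MaxUint32 {
			return 0, fmt.Errorf("value %d out of uint32 range", n)
		}
		return uint32(n), nil
	case uint64:
		if n > math.MaxUint32 {
			return 0, fmt.Errorf("value %d out of uint32 range", n)
		}
		return uint32(n), nil
	default:
		return 0, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	v, err := any2uint32(int64(42))
	fmt.Println(v, err) // 42 <nil>
}
```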

func DumpSortedMap

func DumpSortedMap(m map[string]any)

DumpSortedMap prints a map in key-sorted order
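Printing a map deterministically requires sorting its keys first, since Go map iteration order is randomized. A sketch of the idea — `dumpSorted` is a hypothetical variant that returns the lines instead of printing, for testability:

```go
package main

import (
	"fmt"
	"sort"
)

// dumpSorted renders a map as "key: value" lines in key-sorted order.
func dumpSorted(m map[string]any) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s: %v", k, m[k]))
	}
	return lines
}

func main() {
	for _, line := range dumpSorted(map[string]any{"b": 2, "a": 1}) {
		fmt.Println(line)
	}
}
```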

func MemReport

func MemReport(event string)

MemReport reports allocated memory to STDOUT

func Scale

func Scale(nn uint32) byte

Scale returns a 0..9 scale value for uint32 numbers using logarithmic base-4. Returns 0 for nn=0 and 1 for nn=1; otherwise returns ceil(logBase4(nn))+1, capped at 9. This mirrors the HB::scale(nn, 4) method for the int16 number range.

Example
package main

import (
	"fmt"

	hb "github.com/parf/homebase-go-lib"
)

func main() {
	// Scale small values
	fmt.Println(hb.Scale(0))
	fmt.Println(hb.Scale(1))
	fmt.Println(hb.Scale(4))
	fmt.Println(hb.Scale(16))
	fmt.Println(hb.Scale(64))
	fmt.Println(hb.Scale(256))
	fmt.Println(hb.Scale(1024))
	fmt.Println(hb.Scale(4096))
	fmt.Println(hb.Scale(16384))
	fmt.Println(hb.Scale(100000))

}
Output:

0
1
2
3
4
5
6
7
8
9
Example (Ranges)
package main

import (
	"fmt"

	hb "github.com/parf/homebase-go-lib"
)

func main() {
	// Demonstrate ranges
	values := []uint32{0, 1, 5, 17, 65, 257, 1025, 5000, 20000}
	for _, v := range values {
		fmt.Printf("%d -> scale %d\n", v, hb.Scale(v))
	}

}
Output:

0 -> scale 0
1 -> scale 1
5 -> scale 3
17 -> scale 4
65 -> scale 5
257 -> scale 6
1025 -> scale 7
5000 -> scale 8
20000 -> scale 9

func SysLogError

func SysLogError(message string)

SysLogError writes an error message to syslog.
Usage: hb.SysLogError(fmt.Sprintf(...))

func SysLogNotice

func SysLogNotice(message string)

SysLogNotice writes a notice message to syslog.
Usage: hb.SysLogNotice(fmt.Sprintf(...))

Types

type DebugFunction

type DebugFunction func(level int, format string, a ...any)

DebugFunction is a function type for debug output

func Debug

func Debug(prefix string, level int) DebugFunction

Debug creates a debug function that outputs to STDERR

func DebugLog

func DebugLog(prefix string, level int) DebugFunction

DebugLog creates a debug function with timestamp prefix (uses log package)

type JobScheduler

type JobScheduler struct {
	// contains filtered or unexported fields
}

JobScheduler represents a periodic job scheduler

func NewJobScheduler

func NewJobScheduler(intervalSeconds int, jobFunc func()) *JobScheduler

NewJobScheduler creates a new job scheduler

func (*JobScheduler) IsRunning

func (js *JobScheduler) IsRunning() bool

IsRunning returns whether the scheduler is currently running

func (*JobScheduler) Start

func (js *JobScheduler) Start() error

Start begins the job scheduler

func (*JobScheduler) Stop

func (js *JobScheduler) Stop() error

Stop halts the job scheduler

type ParallelRunner

type ParallelRunner struct {
	// contains filtered or unexported fields
}

ParallelRunner runs tasks in parallel with performance tracking

func NewParallelRunner

func NewParallelRunner() ParallelRunner

NewParallelRunner creates a new parallel task runner

func (*ParallelRunner) Finish

func (p *ParallelRunner) Finish()

Finish waits for all parallel tasks to complete and logs final statistics

func (*ParallelRunner) Run

func (p *ParallelRunner) Run(name string, f func())

Run starts a task in parallel and logs start/finish with elapsed time and memory total

type SequentialRunner

type SequentialRunner struct {
	// contains filtered or unexported fields
}

SequentialRunner is a debug option to use instead of ParallelRunner

func NewSequentialRunner

func NewSequentialRunner() SequentialRunner

NewSequentialRunner creates a drop-in debug replacement for ParallelRunner. Usage:

runner := hb.NewSequentialRunner()
runner.Run("task-name", func() { /* work */ })
runner.Finish()

func (*SequentialRunner) Finish

func (p *SequentialRunner) Finish()

Finish logs final statistics for the sequential runner

func (*SequentialRunner) Run

func (p *SequentialRunner) Run(name string, f func())

Run executes a task sequentially and logs start/finish with elapsed time, memory diff & total

Directories

Path Synopsis
cmd
examples command
examples
clistat command
compression command
scale command
sql command
internal
