# homebase-go-lib

High-performance Go library for data processing, file I/O, and system utilities | Built for production workloads with automatic compression detection and format conversion
A comprehensive toolkit for building data pipelines, ETL workflows, and command-line tools in Go. Features universal file processing with 7 compression formats, structured data iteration (CSV, JSON Lines, Parquet, MsgPack), and production-ready utilities.
> **Note:** This library is the first step in open-sourcing minor components of parf's proprietary Homebase framework, which has been battle-tested in production environments processing petabytes of data. We are gradually sharing foundational utilities that can benefit the broader Go and data-engineering community while keeping core business logic proprietary.
## Bundled Utilities
Command-line tools for data conversion and database import/export with comprehensive SQL support.
| Utility | Description |
|---|---|
| any2parquet | Export data to Parquet format (recommended for analytics) |
| any2jsonl | Export data to JSONL format (human-readable, debug-friendly) |
| any2csv | Export data to CSV format (spreadsheet-compatible) |
| any2db | Import data to MySQL/PostgreSQL databases with auto-schema |
All utilities support:
- SQL queries from MySQL & PostgreSQL databases
- Multiple file formats (Parquet, JSONL, CSV, MsgPack)
- Compression (.gz, .zst, .lz4, .br, .xz)
- Stdout piping with `-` for data pipelines

Full Documentation & Examples →
## Key Features

### Universal File Processing
- 7 compression formats with auto-detection
- 5 structured formats: CSV, JSONL, Parquet, MsgPack, FlatBuffer
- HTTP/HTTPS URL support for remote files
- Streaming processing for large files

### High Performance
- Zero-copy operations where possible
- Parallel processing support
- Memory-efficient streaming
- Progress tracking built-in

### Production Ready
- Type-safe generic iterators
- Error handling throughout
- Battle-tested in production
- Comprehensive tests

### Developer Friendly
- Simple API with consistent patterns
- Auto-detection, no manual configuration
- Rich examples included
- Well documented
## Installation

```bash
go get github.com/parf/homebase-go-lib
```
## Quick Start

### Process Compressed Files Automatically

```go
import "github.com/parf/homebase-go-lib/fileiterator"

// Works with .gz, .zst, .lz4, .br, .xz automatically!
fileiterator.IterateLines("access.log.gz", func(line string) error {
    // Process each line
    fmt.Println(line)
    return nil
})
```
Type-Safe JSON Lines Processing
type User struct {
ID int64 `json:"id"`
Name string `json:"name"`
Email string `json:"email"`
}
// Generic type-safe iterator
fileiterator.IterateJSONLTyped("users.jsonl.zst", func(user User) error {
fmt.Printf("User: %s <%s>\n", user.Name, user.Email)
return nil
})
### Universal Schema Support

```go
// Read ANY schema from Parquet, CSV, JSONL, MsgPack
records, _ := fileiterator.ReadInput("data.parquet")
for _, record := range records {
    // record is map[string]any - works with any field structure!
    fmt.Printf("Record: %v\n", record)
}
```
### Progress Tracking

```go
import "github.com/parf/homebase-go-lib/clistat"

stat := clistat.New(10) // Report every 10 seconds
for i := 0; i < 1_000_000; i++ {
    stat.Hit() // Auto-reports progress: "45.2K hits/sec (500K total)"
}
stat.Finish() // Final summary
```
## Supported Formats

| Format | Extensions | Read | Write | Use Case | Performance |
|---|---|---|---|---|---|
| CSV | .csv, .tsv | ✅ | ✅ | Excel compatibility, human-readable | Good |
| JSON Lines | .jsonl, .ndjson | ✅ | ✅ | Debugging, wide support | Moderate |
| Apache Parquet | .parquet, .pk | ✅ | ✅ | Analytics, columnar queries | Excellent |
| MessagePack | .msgpack, .mp | ✅ | ✅ | Binary efficiency, 2x smaller than JSON | Very Good |
| FlatBuffer | .fb | ✅ | ✅ | Zero-copy, fastest reads (3x faster) | Fastest |
## Compression Support

| Compression | Extension | Speed | Ratio | Use Case |
|---|---|---|---|---|
| LZ4 | .lz4 | Fastest | Good | Real-time processing |
| Zstandard | .zst | Fast | Excellent | Recommended for most uses |
| Gzip | .gz | Moderate | Good | Universal compatibility |
| Brotli | .br | Slow | Best | Maximum compression |
| XZ/LZMA | .xz | Very Slow | Excellent | Archive storage |
| Zlib | .zlib, .zz | Moderate | Good | Legacy support |
All formats work seamlessly with all compression types, for example `.jsonl.zst`, `.csv.gz`, or `.parquet.lz4`.
## Use Cases

### Data Pipeline Processing

```go
// Convert between any formats with automatic compression
input, _ := fileiterator.ReadInput("raw-data.csv.gz")        // CSV + Gzip
fileiterator.WriteParquetAny("processed.parquet.zst", input) // Parquet + Zstd
```
### Log Analysis

```go
stat := clistat.New(5)
fileiterator.IterateLines("access.log.gz", func(line string) error {
    if strings.Contains(line, "ERROR") {
        // Process error logs
    }
    stat.Hit()
    return nil
})
stat.Finish() // "Processed 2.5M lines in 3.2s (781K lines/sec)"
```
### Database ETL

```go
// Extract from CSV, transform, load to database
fileiterator.IterateCSVMap("export.csv.zst", func(row map[string]string) error {
    // Transform data
    user := transformUser(row)
    // Load to database
    return db.Insert(user)
})
```
### Batch Processing

```go
// Process millions of records efficiently
fileiterator.IterateParquetAny("events.parquet", func(event map[string]any) error {
    // Process each event with automatic memory management
    return processEvent(event)
})
```
## Core Packages

### fileiterator - Universal File Processing

The heart of homebase-go-lib. Process any file format with automatic compression detection.

Key Functions:

```go
// Universal I/O
FUOpen(filename)            // Open any file/URL with auto-decompression
FUCreate(filename)          // Create file with auto-compression
ReadInput(filename)         // Read ANY schema to []map[string]any
WriteOutput(filename, data) // Write ANY schema from []map[string]any

// Line-by-line Processing
IterateLines(filename, func(line string) error)

// Structured Data (Untyped)
IterateJSONL(filename, func(map[string]any) error)
IterateCSVMap(filename, func(map[string]string) error)
IterateMsgPack(filename, func(any) error)
IterateParquetAny(filename, func(map[string]any) error)

// Structured Data (Type-Safe Generics)
IterateJSONLTyped[T](filename, func(T) error)
IterateMsgPackTyped[T](filename, func(T) error)

// Binary Formats
IterateBinaryRecords(filename, recordSize, func([]byte) error)
IterateFlatBufferList(filename, func([]byte) error)
```
Features:
- ✅ Automatic compression detection from file extension
- ✅ HTTP/HTTPS URL support
- ✅ Streaming for memory efficiency
- ✅ Progress reporting integration
- ✅ Error handling with context
### clistat - Real-Time Statistics

Track processing progress with automatic hit-rate reporting.

```go
stat := clistat.New(10) // Report every 10 seconds
for i := 0; i < 1_000_000; i++ {
    // Your processing logic
    stat.Hit() // Automatically reports: "45.2K hits/sec (500K total)"
}
stat.Finish() // Final: "Processed 1M items in 22.1s (45.2K/sec)"
```
Features:
- ✅ Automatic progress reporting
- ✅ Configurable intervals
- ✅ Hits-per-second calculation
- ✅ Total count tracking
- ✅ Elapsed time reporting
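The core of such a tracker fits in a few lines of stdlib Go. The sketch below uses a hypothetical `Stat` type to illustrate the idea; it is not the actual clistat internals:

```go
package main

import (
	"fmt"
	"time"
)

// Stat is a hypothetical, minimal stand-in for clistat: it counts hits
// and reports the overall rate when finished.
type Stat struct {
	start time.Time
	hits  int64
}

func NewStat() *Stat { return &Stat{start: time.Now()} }

// Hit records one processed item.
func (s *Stat) Hit() { s.hits++ }

// Finish returns a summary like "Processed 1000 items in 0.0s (...)".
func (s *Stat) Finish() string {
	elapsed := time.Since(s.start).Seconds()
	if elapsed == 0 {
		elapsed = 1e-9 // avoid division by zero on very fast runs
	}
	return fmt.Sprintf("Processed %d items in %.1fs (%.0f/sec)",
		s.hits, elapsed, float64(s.hits)/elapsed)
}

func main() {
	stat := NewStat()
	for i := 0; i < 1000; i++ {
		stat.Hit()
	}
	fmt.Println(stat.Finish())
}
```

Periodic reporting (clistat's interval argument) would add a check inside `Hit` that prints whenever the configured interval has elapsed since the last report.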
### sql - Database Utilities

Efficient database operations with batch processing.

```go
// Batch Insert
inserter := sql.NewBatchInserter(db, "users", []string{"id", "name", "email"}, 1000)
inserter.Add(1, "Alice", "alice@example.com")
inserter.Add(2, "Bob", "bob@example.com")
inserter.Flush()

// Query Iteration
sql.SqlIterator(db, "SELECT * FROM users WHERE active = true", func(row map[string]any) error {
    // Process each row
    return nil
})
```
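The point of batching is to replace one round-trip per row with a single multi-row INSERT. A stdlib-only sketch of the statement such an inserter might build on flush (hypothetical `buildBatchInsert`, not the sql package's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert produces a multi-row INSERT with "?" placeholders,
// e.g. INSERT INTO t (a, b) VALUES (?,?), (?,?).
func buildBatchInsert(table string, cols []string, rows int) string {
	// One "(?,?,...)" group per buffered row.
	ph := "(" + strings.TrimRight(strings.Repeat("?,", len(cols)), ",") + ")"
	values := make([]string, rows)
	for i := range values {
		values[i] = ph
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ", "), strings.Join(values, ", "))
}

func main() {
	fmt.Println(buildBatchInsert("users", []string{"id", "name", "email"}, 2))
	// Output: INSERT INTO users (id, name, email) VALUES (?,?,?), (?,?,?)
}
```

The batch size (1000 in the example above) trades memory for fewer round-trips; the flattened argument list is then passed to a single `db.Exec` call.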
### cache - Map-to-Parquet File Caching

Cache any Go map as a Parquet file with precise scalar type preservation.

```go
import "github.com/parf/homebase-go-lib/cache"

data := make(map[string]any)
if !cache.Map("lookup.parquet", data) {
    data = expensiveComputation()
    if err := cache.WriteMap("lookup.parquet", data); err != nil {
        log.Printf("cache write failed: %v", err)
    }
}
// use data directly - no type assertion needed
```

Supports any map type: map[string]any, map[uint32]string, map[int]float64, etc.
Features:
- ✅ Precise type preservation (uint32 stays uint32, float32 stays float32)
- ✅ String-keyed maps (column-per-key) and numeric-keyed maps (key-value columns)
- ✅ Best-effort caching: silent errors, corrupted files treated as misses
- ✅ Internal Snappy compression

Full Documentation →
## Universal Converters (Included)

Located in the cmd/ directory:

### any2parquet - Convert to Apache Parquet

```bash
# Convert any format to Parquet
any2parquet data.jsonl          # → data.parquet
any2parquet logs.csv.gz         # → logs.parquet
any2parquet events.msgpack.zst  # → events.parquet
```

### any2jsonl - Convert to JSON Lines

```bash
# Convert any format to human-readable JSONL
any2jsonl data.parquet          # → data.jsonl
any2jsonl users.csv             # → users.jsonl
any2jsonl metrics.parquet.zst   # → metrics.jsonl
```
**Standalone Tool:** any-to-parquet - Optimized Parquet converter

## Performance

Based on a benchmark of 1 million records:
| Format | File Size | Read Time | Write Time | Compression | Best For |
|---|---|---|---|---|---|
| Parquet | 44 MB | 0.15s | 0.46s | Excellent | Everything |
| MsgPack.zst | 38 MB | 0.59s | 0.61s | Best | Binary efficiency |
| JSONL.zst | 43 MB | 1.91s | 0.84s | Excellent | Debugging |
| FlatBuffer.lz4 | 66 MB | 0.06s | 0.42s | Good | Ultra-fast reads |
| CSV.gz | 52 MB | 2.1s | 1.2s | Good | Excel compatibility |
| Plain JSONL | 156 MB | 1.93s | 1.38s | None | Human-readable |
**Winner:** Parquet delivers the best balance of speed, compression, and compatibility.

Full Benchmark Results →
## Examples & Tests

### Running Examples

```bash
# File processing examples
cd examples/fileiterator
go run main.go

# Statistics tracking
cd examples/clistat
go run main.go

# Schema examples (5 different data structures)
cd cmd/examples/schemas
./test-all-schemas.sh
```

### Test Different Schemas

The library works with ANY schema structure. See the examples:

View All Schema Examples →
## Development

### Prerequisites

- Go 1.21 or higher
- Make (optional)

### Build

```bash
make build
```

### Run Tests

```bash
make test
```

### Test Coverage

```bash
make test-coverage
```

### Format & Lint

```bash
make fmt
make lint
```
## Project Structure

```
homebase-go-lib/
├── fileiterator/        # File processing & format conversion
│   ├── parquet.go       # Apache Parquet support
│   ├── jsonl.go         # JSON Lines processing
│   ├── csv.go           # CSV with auto-detection
│   ├── msgpack.go       # MessagePack binary format
│   ├── genericio.go     # Universal I/O functions
│   └── compression.go   # 7 compression formats
│
├── clistat/             # Real-time statistics tracking
│   └── clistat.go
│
├── sql/                 # Database utilities
│   ├── batch.go         # Batch insert operations
│   └── iterator.go      # Query iteration
│
├── cache/               # Map-to-Parquet file caching
│   └── mapcache.go      # Precise type-preserving cache
│
├── cmd/                 # Command-line tools
│   ├── any2parquet.go   # Universal → Parquet converter
│   ├── any2jsonl.go     # Universal → JSONL converter
│   └── examples/        # Usage examples
│       └── schemas/     # 5 different schema examples
│
├── examples/            # Code examples
├── docs/                # Documentation
├── benchmarks/          # Performance benchmarks
└── testdata/            # Test fixtures
```
## Key Concepts

### Automatic Compression Detection

```go
// All these work automatically based on the file extension:
fileiterator.IterateLines("file.txt")     // Plain text
fileiterator.IterateLines("file.txt.gz")  // Gzip compressed
fileiterator.IterateLines("file.txt.zst") // Zstandard compressed
fileiterator.IterateLines("file.txt.lz4") // LZ4 compressed
```

### Universal Schema Support

```go
// No schema definition needed - works with ANY structure!
records, _ := fileiterator.ReadInput("data.csv")
// records[0] might be: {"user_id": 1, "name": "Alice", "age": 28}

records2, _ := fileiterator.ReadInput("sensors.jsonl")
// records2[0] might be: {"sensor": "temp-01", "value": 23.5, "unit": "celsius"}
```
### Type-Safe Generics

```go
// Define your struct
type Product struct {
    ID    int     `json:"id"`
    Name  string  `json:"name"`
    Price float64 `json:"price"`
}

// Get type-safe iteration with Go generics
fileiterator.IterateJSONLTyped("products.jsonl", func(p Product) error {
    fmt.Printf("%s: $%.2f\n", p.Name, p.Price)
    return nil
})
```
## Contributing

Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Update documentation
- Submit a pull request

Report Bug · Request Feature
## License

MIT License - see the LICENSE file for details.
## Keywords
go library, golang, file processing, data pipeline, ETL, compression, gzip, zstd, lz4, parquet, json lines, csv processing, msgpack, data engineering, batch processing, streaming, apache parquet, columnar format, data conversion, format converter, structured data, log processing, statistics tracking, progress reporting, database utilities, sql batch insert, type-safe iterators, go generics, high performance, production ready
## Star History

If you find this library useful, please give it a star!
