parserkit

v1.1.0 | Published: Feb 4, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 0

README

Parser Kit


A generic data loader that reads JSON data from multiple sources (local file, Redis, remote HTTP) using a priority-based fallback strategy.

Features

  • Multi-source support: Load data from local files, Redis, or remote HTTP endpoints
  • Priority-based fallback: Automatically fall back to the next source if the previous one fails
  • Generic design: Works with any JSON-serializable type
  • HTTP retry mechanism: Automatic retry with exponential backoff for remote requests
  • Size limits: Protection against memory exhaustion attacks (MaxFileSize caps file and HTTP response reads and is also checked against the Redis value size)
  • Normalization support: Optional data normalization after parsing

Installation

go get github.com/soulteary/parser-kit

Usage

Basic Example
package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
    parserkit "github.com/soulteary/parser-kit"
)

type User struct {
    ID    string `json:"id"`
    Email string `json:"email"`
    Phone string `json:"phone"`
}

func main() {
    ctx := context.Background()

    // Create loader
    loader, err := parserkit.NewLoader[User](nil)
    if err != nil {
        panic(err)
    }

    // Redis client for the Redis source (a go-redis v9 *redis.Client)
    redisClient := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    defer redisClient.Close()

    // Define sources with priority (lower number = higher priority)
    sources := []parserkit.Source{
        {
            Type:     parserkit.SourceTypeRedis,
            Priority: 0, // Highest priority
            Config: parserkit.SourceConfig{
                RedisKey:    "users:cache",
                RedisClient: redisClient,
            },
        },
        {
            Type:     parserkit.SourceTypeRemote,
            Priority: 1,
            Config: parserkit.SourceConfig{
                RemoteURL:           "https://api.example.com/users",
                AuthorizationHeader: "Bearer token",
            },
        },
        {
            Type:     parserkit.SourceTypeFile,
            Priority: 2, // Lowest priority (fallback)
            Config: parserkit.SourceConfig{
                FilePath: "/path/to/users.json",
            },
        },
    }

    // Load data (tries Redis first, then remote, then file)
    users, err := loader.Load(ctx, sources...)
    if err != nil {
        panic(err)
    }

    fmt.Printf("loaded %d users\n", len(users))
}
Individual Source Loading
// Load from a local file
users, err := loader.FromFile(ctx, "/path/to/users.json")

// Load from a remote endpoint; the last argument is the Authorization header value
users, err = loader.FromRemote(ctx, "https://api.example.com/users", "Bearer token")

// Load from Redis
users, err = loader.FromRedis(ctx, redisClient, "users:cache")
Custom Options
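// Start from the defaults, then override only what you need
opts := parserkit.DefaultLoadOptions()
opts.MaxFileSize = 20 * 1024 * 1024 // 20MB
opts.MaxRetries = 5
opts.RetryDelay = 2 * time.Second
opts.HTTPTimeout = 10 * time.Second

// normalizeFunc is applied to the parsed data
normalizeFunc := func(users []User) []User {
    for i := range users {
        // Example normalization: canonicalize email addresses (needs "strings")
        users[i].Email = strings.ToLower(strings.TrimSpace(users[i].Email))
    }
    return users
}

loader, err := parserkit.NewLoaderWithNormalize[User](opts, normalizeFunc)
if err != nil {
    panic(err)
}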
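A normalize function receives the parsed slice and returns the slice to keep. The runnable sketch below shows one such function on its own, without importing parserkit; it has the func([]User) []User shape that NormalizeFunc[User] expects. The specific rules (trimming fields, lower-casing emails, dropping users with an empty ID) are illustrative assumptions, and normalizeUsers is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// User mirrors the struct from the Basic Example.
type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
	Phone string `json:"phone"`
}

// normalizeUsers is a hypothetical NormalizeFunc[User]-shaped function:
// it trims whitespace, lower-cases emails, and drops users without an ID.
// It filters in place, reusing the input slice's backing array.
func normalizeUsers(users []User) []User {
	out := users[:0]
	for _, u := range users {
		u.ID = strings.TrimSpace(u.ID)
		if u.ID == "" {
			continue // drop records that cannot be identified
		}
		u.Email = strings.ToLower(strings.TrimSpace(u.Email))
		u.Phone = strings.TrimSpace(u.Phone)
		out = append(out, u)
	}
	return out
}

func main() {
	users := []User{
		{ID: " 1 ", Email: " Alice@Example.COM ", Phone: "555-0100"},
		{ID: "", Email: "nobody@example.com"},
	}
	fmt.Printf("%+v\n", normalizeUsers(users))
}
```

Because the function may drop items, it returns a possibly shorter slice rather than mutating in place only; callers should always use the returned value.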

Source Types

File Source

Loads data from a local JSON file.

parserkit.Source{
    Type:     parserkit.SourceTypeFile,
    Priority: 2,
    Config: parserkit.SourceConfig{
        FilePath: "/path/to/data.json",
    },
}
Redis Source

Loads data from a Redis key (must contain JSON).

parserkit.Source{
    Type:     parserkit.SourceTypeRedis,
    Priority: 0,
    Config: parserkit.SourceConfig{
        RedisKey:    "data:cache",
        RedisClient: redisClient, // *redis.Client from redis-kit
    },
}
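The Options Reference notes that MaxFileSize is also checked against the Redis value size before loading. One way to do that is to ask Redis for the value's length (the STRLEN command) before fetching it with GET. The sketch below assumes that approach; the kvStore interface and memStore fake are hypothetical stand-ins so the example runs without Redis or go-redis, and parserkit's actual check may work differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// kvStore is a hypothetical minimal view of a Redis client:
// StrLen mirrors the STRLEN command and Get mirrors GET.
type kvStore interface {
	StrLen(key string) (int64, error)
	Get(key string) (string, error)
}

// memStore is an in-memory fake used only for this example.
// Like Redis STRLEN, it reports 0 for a missing key.
type memStore map[string]string

func (m memStore) StrLen(key string) (int64, error) { return int64(len(m[key])), nil }

func (m memStore) Get(key string) (string, error) {
	v, ok := m[key]
	if !ok {
		return "", errors.New("key not found")
	}
	return v, nil
}

// loadFromKV rejects values larger than maxSize before fetching them,
// then decodes the JSON array stored at key into []T.
func loadFromKV[T any](store kvStore, key string, maxSize int64) ([]T, error) {
	n, err := store.StrLen(key)
	if err != nil {
		return nil, err
	}
	if n > maxSize {
		return nil, fmt.Errorf("value at %q is %d bytes, exceeds limit %d", key, n, maxSize)
	}
	raw, err := store.Get(key)
	if err != nil {
		return nil, err
	}
	var items []T
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return nil, fmt.Errorf("decode %q: %w", key, err)
	}
	return items, nil
}

func main() {
	store := memStore{"data:cache": `[{"id":"1"},{"id":"2"}]`}
	items, err := loadFromKV[map[string]string](store, "data:cache", 1024)
	fmt.Println(len(items), err)
}
```

Checking the length first avoids transferring an oversized value at all; note that the two commands are not atomic, so a value could change between STRLEN and GET.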
Remote Source

Loads data from a remote HTTP/HTTPS endpoint. The caller is responsible for making sure RemoteURL is trusted or validated, to avoid server-side request forgery (SSRF).

parserkit.Source{
    Type:     parserkit.SourceTypeRemote,
    Priority: 1,
    Config: parserkit.SourceConfig{
        RemoteURL:           "https://api.example.com/data",
        AuthorizationHeader: "Bearer token", // Optional
        Timeout:             5 * time.Second, // Optional, uses default if not set
    },
}

Note: InsecureSkipVerify is applied at loader creation time via LoadOptions. Per-source values are ignored; create separate loaders if you need different TLS behavior per source.
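Since validating RemoteURL is left to the caller, a pre-flight check before building the Source is one option. The sketch below is a minimal, hypothetical validateRemoteURL (not a parserkit API) that requires https and rejects localhost and literal loopback, private, link-local and unspecified IP addresses. It deliberately does not resolve DNS, so a hostname that resolves to an internal address still passes; a complete defense also checks the resolved address at dial time.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// validateRemoteURL is a hypothetical caller-side SSRF pre-check.
// It only inspects the URL text; it does not perform DNS lookups.
func validateRemoteURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	host := u.Hostname() // strips port and IPv6 brackets
	if host == "" {
		return errors.New("missing host")
	}
	if strings.EqualFold(host, "localhost") {
		return errors.New("localhost not allowed")
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("address %s not allowed", ip)
		}
	}
	return nil
}

func main() {
	for _, s := range []string{
		"https://api.example.com/users",
		"http://api.example.com/users",
		"https://127.0.0.1/admin",
		"https://[::1]/",
		"https://169.254.169.254/latest/meta-data",
	} {
		fmt.Println(s, validateRemoteURL(s))
	}
}
```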

Priority System

Sources are processed in priority order:

  • Lower priority number = higher priority
  • Priority 0 is the highest priority
  • If a source fails, the loader automatically tries the next source
  • Behavior depends on LoadStrategy (see below)

Load Strategy

Two strategies control how data from multiple sources is combined:

Fallback (default)

LoadStrategyFallback: Returns data from the first successful source. Use for cache → remote → file style loading.

Merge

LoadStrategyMerge: Merges data from all successful sources with deduplication. Use it when remote data should be supplemented by local data (for example, Warden's REMOTE_FIRST mode). Requires KeyFunc to extract a unique key per item.

// Key users by phone number; include=false skips users without one
keyFunc := func(u User) (string, bool) { return u.Phone, u.Phone != "" }
opts := parserkit.DefaultLoadOptions()
opts.LoadStrategy = parserkit.LoadStrategyMerge
opts.KeyFunc = keyFunc
loader, err := parserkit.NewLoader[User](opts)
if err != nil {
    panic(err)
}

// Load merges all successful sources; for a duplicate key, the later source wins
users, err := loader.Load(ctx, sources...)
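Merge with deduplication can be modeled as a single pass over the sources' results in priority order, keyed by KeyFunc. The sketch below assumes, as the comment above states, that a later source overwrites an earlier item with the same key; keeping the overwritten item at the position where its key first appeared is an assumption of this sketch. mergeByKey is hypothetical, not parserkit's code.

```go
package main

import "fmt"

type User struct {
	ID    string
	Phone string
}

// mergeByKey combines batches (already in priority order) into one slice.
// keyFunc returns the dedup key and whether to keep the item at all.
// For a duplicate key, the item from the later batch replaces the earlier
// one but keeps the position where that key first appeared.
func mergeByKey[T any](keyFunc func(T) (string, bool), batches ...[]T) []T {
	index := make(map[string]int) // key -> position in out
	var out []T
	for _, batch := range batches {
		for _, item := range batch {
			key, ok := keyFunc(item)
			if !ok {
				continue // KeyFunc said: exclude this item
			}
			if i, seen := index[key]; seen {
				out[i] = item // later source wins
				continue
			}
			index[key] = len(out)
			out = append(out, item)
		}
	}
	return out
}

func main() {
	remote := []User{{ID: "r1", Phone: "111"}, {ID: "r2", Phone: "222"}}
	local := []User{{ID: "l1", Phone: "222"}, {ID: "l2", Phone: ""}, {ID: "l3", Phone: "333"}}
	byPhone := func(u User) (string, bool) { return u.Phone, u.Phone != "" }
	fmt.Println(mergeByKey(byPhone, remote, local))
}
```

The map lookup keeps the merge linear in the total number of items, which matters when a large remote list is supplemented by a local file.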

Options Reference

Option              Default    Description
MaxFileSize         10MB       Max bytes to read from a file or HTTP response
MaxRetries          3          Retries for remote requests
RetryDelay          1s         Base delay between retries
HTTPTimeout         5s         Timeout for remote requests
InsecureSkipVerify  false      Skip TLS certificate verification (development only)
AllowEmptyFile      false      Return [] instead of an error when the file is not found
AllowEmptyData      false      When false, an empty source is treated as a failure and the next source is tried
LoadStrategy        fallback   fallback or merge
KeyFunc             nil        Required for merge; func(T) (string, bool)

Start from DefaultLoadOptions() and override only the fields you need, so that MaxFileSize and the other limits keep sensible defaults. MaxFileSize is also checked against the Redis value size before the value is loaded.

Error Handling

  • If all sources fail, Load() returns an error that includes the last error encountered
  • The single-source methods (FromFile, FromRemote, FromRedis) do not fall back; they return their error directly
  • File not found: returns an error by default; set AllowEmptyFile to true to get an empty slice ([]) instead

Testing

Tests do not require a real Redis instance. The suite uses miniredis for Redis-related tests, so you can run tests and coverage locally without any external services:

go test ./...
go test -coverprofile=coverage.out -covermode=atomic ./...
go tool cover -func=coverage.out

Dependencies

  • github.com/soulteary/http-kit - For HTTP client and retry logic
  • github.com/redis/go-redis/v9 - For Redis operations

Test-only: github.com/alicebob/miniredis/v2 for in-process Redis in tests.

License

See the LICENSE file for details.

Documentation


Types

type DataLoader

type DataLoader[T any] interface {
	// FromFile loads data from a local file
	FromFile(ctx context.Context, path string) ([]T, error)

	// FromRemote loads data from a remote URL
	FromRemote(ctx context.Context, url, auth string) ([]T, error)

	// FromRedis loads data from Redis
	FromRedis(ctx context.Context, client interface{}, key string) ([]T, error)

	// Load loads data from multiple sources
	// Sources are processed in priority order (lower number = higher priority)
	// Behavior depends on LoadStrategy:
	// - LoadStrategyFallback: returns data from the first successful source
	// - LoadStrategyMerge: merges data from all successful sources with deduplication
	Load(ctx context.Context, sources ...Source) ([]T, error)
}

DataLoader is a generic interface for loading data from various sources

func NewLoader

func NewLoader[T any](opts *LoadOptions) (DataLoader[T], error)

NewLoader creates a new generic data loader

func NewLoaderWithNormalize

func NewLoaderWithNormalize[T any](opts *LoadOptions, normalizeFunc NormalizeFunc[T]) (DataLoader[T], error)

NewLoaderWithNormalize creates a new generic data loader with normalization function

type KeyFunc

type KeyFunc[T any] func(T) (string, bool)

KeyFunc extracts a unique key from an item for deduplication. It returns the key and true if the item should be included, or false otherwise.

type LoadOptions

type LoadOptions struct {
	// MaxFileSize limits the maximum file size to read (default: 10MB)
	MaxFileSize int64

	// MaxRetries for remote requests (default: 3)
	MaxRetries int

	// RetryDelay for remote requests (default: 1s)
	RetryDelay time.Duration

	// HTTPTimeout for remote requests (default: 5s)
	HTTPTimeout time.Duration

	// InsecureSkipVerify allows skipping TLS certificate verification (default: false)
	// Only use in development environments
	InsecureSkipVerify bool

	// AllowEmptyFile if true, returns empty slice instead of error when file not found (default: false)
	AllowEmptyFile bool

	// AllowEmptyData if true, continues to next source even if current source returns empty data (default: false)
	AllowEmptyData bool

	// LoadStrategy determines how to combine data from multiple sources (default: LoadStrategyFallback)
	// - LoadStrategyFallback: return data from first successful source
	// - LoadStrategyMerge: merge data from all successful sources with deduplication
	LoadStrategy LoadStrategy

	// KeyFunc is required when LoadStrategy is LoadStrategyMerge
	// It extracts a unique key from each item for deduplication
	// Note: This is stored as interface{} and will be type-asserted in the loader
	KeyFunc interface{} // Should be KeyFunc[T] but we can't use generics in struct fields
}

LoadOptions configures the behavior of Load operations
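Because LoadOptions.KeyFunc is declared as interface{}, the loader has to recover a typed function with a type assertion, and a Go type assertion matches the exact dynamic type: a plain func(User) (string, bool) and a KeyFunc[User] value are different types. The sketch below shows one way to accept both forms. This page does not show how parserkit performs the assertion, so resolveKeyFunc is hypothetical, and KeyFunc is redeclared locally with the documented signature.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyFunc mirrors the documented declaration: func(T) (string, bool).
type KeyFunc[T any] func(T) (string, bool)

// resolveKeyFunc recovers a typed key function from an interface{} value.
// The plain func type and the named KeyFunc[T] type need separate cases,
// because a type switch matches the exact dynamic type.
func resolveKeyFunc[T any](v interface{}) (func(T) (string, bool), error) {
	switch f := v.(type) {
	case func(T) (string, bool):
		return f, nil
	case KeyFunc[T]:
		return f, nil // assignable: same underlying type
	case nil:
		return nil, errors.New("KeyFunc is required for the merge strategy")
	default:
		return nil, fmt.Errorf("KeyFunc has unsupported type %T", v)
	}
}

type User struct{ Phone string }

func main() {
	plain := func(u User) (string, bool) { return u.Phone, true }
	wrongType := func(s string) (string, bool) { return s, true }
	for _, v := range []interface{}{plain, KeyFunc[User](plain), wrongType, nil} {
		_, err := resolveKeyFunc[User](v)
		fmt.Println(err)
	}
}
```

A function whose parameter type differs from the loader's T (here, string instead of User) falls through to the default case, which is a likely cause if a merge loader rejects its KeyFunc.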

func DefaultLoadOptions

func DefaultLoadOptions() *LoadOptions

DefaultLoadOptions returns default load options

type LoadStrategy

type LoadStrategy string

LoadStrategy determines how data from multiple sources should be combined

const (
	// LoadStrategyFallback returns data from the first successful source (default)
	LoadStrategyFallback LoadStrategy = "fallback"
	// LoadStrategyMerge merges data from all successful sources with deduplication
	LoadStrategyMerge LoadStrategy = "merge"
)

type NormalizeFunc

type NormalizeFunc[T any] func([]T) []T

NormalizeFunc is a function type for normalizing data after parsing. It accepts the parsed slice of T and returns the normalized slice.

type Source

type Source struct {
	Type     SourceType
	Priority int // Lower number = higher priority (0 is highest)
	Config   SourceConfig
}

Source represents a data source with priority

type SourceConfig

type SourceConfig struct {
	// For file source
	FilePath string

	// For Redis source
	RedisKey    string
	RedisClient interface{} // *redis.Client from redis-kit

	// For remote source
	RemoteURL           string
	AuthorizationHeader string
	Timeout             time.Duration
	InsecureSkipVerify  bool
}

SourceConfig holds configuration for a data source

type SourceType

type SourceType string

SourceType represents the type of data source

const (
	// SourceTypeFile represents a local file source
	SourceTypeFile SourceType = "file"
	// SourceTypeRedis represents a Redis source
	SourceTypeRedis SourceType = "redis"
	// SourceTypeRemote represents a remote HTTP/HTTPS source
	SourceTypeRemote SourceType = "remote"
)
