tinyparser

Version: v0.0.0-...-e0a48d9
Published: Feb 9, 2025 License: MIT Imports: 4 Imported by: 0

README

TinyParser: High-Performance Parallel File Processing in Go

TinyParser is a lightweight and efficient Go library designed for parallel file processing.
It reads and processes large files in multiple worker threads, optimizing CPU and memory usage.

🚀 Features

  • ✅ Parallel Processing: Uses multiple worker threads, each pinned to a separate CPU core.
  • ✅ Custom Parsing Function: Users can define their own parser function for handling data.
  • ✅ Optimized Memory Usage: Automatically distributes memory limits across workers.
  • ✅ Built-in Mutex Handling: Ensures thread-safe operations without extra coding.
  • ✅ Efficient Large File Handling: Reads chunks directly from the file without loading everything into memory.

Benchmark Test Results
  • File size: 20 MB
  • Number of workers: 4
  • Memory limit: 10 MB
Results:
  • TinyParser: 2.735708ms
  • bufio.NewReaderSize: 7.035084ms

✅ In this run, TinyParser finished about 2.6× faster than bufio.NewReaderSize (2.74 ms vs. 7.04 ms).


🔧 Installation

To install TinyParser, run the following command:

go get github.com/dzaurov/tinyparser

๐Ÿ› ๏ธ How It Works

  1. Divides the file into chunks sized according to the configured memory limit (MaxRAM).
  2. Assigns chunks to worker threads in an alternating sequence (e.g., with two workers, worker 1 → chunks 1, 3, 5, and so on).
  3. Each worker reads its assigned chunks without overlapping with others.
  4. Passes data to the user-defined parser function for processing.
  5. Stores results safely using a built-in mutex to prevent race conditions.
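
The chunking and assignment steps above can be sketched as follows. This is a minimal illustration, not TinyParser's actual implementation: the `task` type and the `planTasks` function are hypothetical names, the struct only mirrors the fields of the documented `WorkerTask`, and worker IDs are zero-based here. Whether the library realigns chunk boundaries to line breaks is not stated in this README, so the sketch splits purely by byte offset.

```go
package main

import "fmt"

// task mirrors the fields of the documented WorkerTask type. It is a local
// stand-in for this sketch, not the library's own type.
type task struct {
	WorkerID int
	Offset   int64
	Size     int64
}

// planTasks splits fileSize bytes into chunks of at most chunkSize bytes and
// assigns the chunks to workers in round-robin order.
func planTasks(fileSize, chunkSize int64, numWorkers int) []task {
	if fileSize <= 0 || chunkSize <= 0 || numWorkers <= 0 {
		return nil
	}
	var tasks []task
	for offset, i := int64(0), 0; offset < fileSize; offset, i = offset+chunkSize, i+1 {
		size := chunkSize
		if remaining := fileSize - offset; remaining < size {
			size = remaining // the last chunk may be shorter
		}
		tasks = append(tasks, task{WorkerID: i % numWorkers, Offset: offset, Size: size})
	}
	return tasks
}

func main() {
	// A 10-byte "file", 4-byte chunks, 2 workers.
	for _, t := range planTasks(10, 4, 2) {
		fmt.Printf("worker %d: offset=%d size=%d\n", t.WorkerID, t.Offset, t.Size)
	}
	// Output:
	// worker 0: offset=0 size=4
	// worker 1: offset=4 size=4
	// worker 0: offset=8 size=2
}
```

Because every task carries its own offset and size, no two workers touch the same byte range.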

💡 Usage Example

package main

import (
	"fmt"
	"log"

	"github.com/dzaurov/tinyparser"
)

// Storage structure for parsed data
type Storage struct {
	Results []string
}

// Custom parser function
func myParser(workerID int, data []byte, storage interface{}) error {
	store := storage.(*Storage)
	result := fmt.Sprintf("Worker %d processed: %s", workerID, string(data))
	store.Results = append(store.Results, result)
	return nil
}

func main() {
	// Initialize storage
	storage := &Storage{}

	// Configure FileParser
	config := tinyparser.Config{
		ParserFunc: myParser,
		Storage:    storage,  // Storage for results
		MaxRAM:     10000000, // Max RAM usage (10MB)
		NumWorkers: 4,        // 4 worker threads
		FilePath:   "large_file.txt",
	}

	// Run parser
	if err := tinyparser.Run(config); err != nil {
		log.Fatalf("Parsing error: %v", err)
	}

	// Print results
	for _, res := range storage.Results {
		fmt.Println(res)
	}
}
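
The README does not say exactly where the built-in mutex is held, so a parser that appends to a shared slice may want its own guard. The sketch below is one cautious way to do that, under stated assumptions: `SafeStorage`, `Add`, and `safeParser` are hypothetical names, and the goroutines in `main` only simulate concurrent workers so the example runs without the library. The parser has the same signature as `Config.ParserFunc`.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeStorage is a hypothetical storage type that guards its own slice.
type SafeStorage struct {
	mu      sync.Mutex
	Results []string
}

// Add appends one result while holding the lock.
func (s *SafeStorage) Add(result string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Results = append(s.Results, result)
}

// safeParser matches the ParserFunc signature and checks the type assertion
// instead of panicking on an unexpected storage value.
func safeParser(workerID int, data []byte, storage interface{}) error {
	store, ok := storage.(*SafeStorage)
	if !ok {
		return fmt.Errorf("unexpected storage type %T", storage)
	}
	store.Add(fmt.Sprintf("Worker %d processed %d bytes", workerID, len(data)))
	return nil
}

func main() {
	store := &SafeStorage{}
	var wg sync.WaitGroup
	// Simulate four workers calling the parser concurrently.
	for id := 0; id < 4; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := safeParser(id, []byte("chunk"), store); err != nil {
				fmt.Println("parse error:", err)
			}
		}(id)
	}
	wg.Wait()
	fmt.Println(len(store.Results)) // 4
}
```

If the library already serializes calls to the parser, the extra lock is redundant but harmless.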

📜 License

This project is licensed under the MIT License.
Feel free to use, modify, and contribute! 🚀


🤝 Contributions

Contributions are welcome!

  • Open an issue for bug reports or feature requests.
  • Submit a pull request for code improvements.

Happy coding! 🎉

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Run

func Run(cfg Config) error

Types

type Config

type Config struct {
	ParserFunc func(workerID int, data []byte, storage interface{}) error
	Storage    interface{}
	MaxRAM     int64
	NumWorkers int
	FilePath   string
}
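
As a rough illustration of how these fields relate, the sketch below validates a configuration and splits MaxRAM evenly across workers. The even split is an assumption of this sketch, since the README only says that memory limits are distributed across workers. The `config` type is a local copy of the documented fields and `perWorkerBudget` is a hypothetical helper, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a local copy of the documented Config fields, so the sketch
// compiles without the library.
type config struct {
	ParserFunc func(workerID int, data []byte, storage interface{}) error
	Storage    interface{}
	MaxRAM     int64
	NumWorkers int
	FilePath   string
}

// perWorkerBudget validates cfg and returns MaxRAM divided evenly by the
// number of workers. The even split is an assumption of this sketch.
func perWorkerBudget(cfg config) (int64, error) {
	switch {
	case cfg.ParserFunc == nil:
		return 0, errors.New("ParserFunc must not be nil")
	case cfg.FilePath == "":
		return 0, errors.New("FilePath must not be empty")
	case cfg.NumWorkers <= 0:
		return 0, errors.New("NumWorkers must be positive")
	case cfg.MaxRAM < int64(cfg.NumWorkers):
		return 0, errors.New("MaxRAM must allow at least one byte per worker")
	}
	return cfg.MaxRAM / int64(cfg.NumWorkers), nil
}

func main() {
	cfg := config{
		ParserFunc: func(int, []byte, interface{}) error { return nil },
		MaxRAM:     10000000,
		NumWorkers: 4,
		FilePath:   "large_file.txt",
	}
	budget, err := perWorkerBudget(cfg)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(budget) // 2500000
}
```

With the values from the usage example (10 MB, 4 workers), each worker would get 2.5 MB under this assumption.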

type WorkerTask

type WorkerTask struct {
	WorkerID int
	Offset   int64
	Size     int64
}
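
A task described by Offset and Size can be read without loading the rest of the file. The sketch below shows one way to do that with the standard `io.ReaderAt` interface, which `*os.File` implements; positional reads do not move a shared file cursor, so several goroutines can read different ranges of the same file. `readChunk` and `chunkOf` are hypothetical helpers, and a `strings.Reader` stands in for a file so the example runs anywhere. How TinyParser itself reads chunks is not documented here.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readChunk reads up to size bytes starting at offset. A short read at the
// end of the input is returned as-is rather than treated as an error.
func readChunk(r io.ReaderAt, offset, size int64) ([]byte, error) {
	if size < 0 {
		return nil, fmt.Errorf("negative chunk size %d", size)
	}
	buf := make([]byte, size)
	n, err := r.ReadAt(buf, offset)
	if err != nil && err != io.EOF {
		return nil, err
	}
	return buf[:n], nil
}

// chunkOf is a small demo helper that treats a string as the file contents.
func chunkOf(s string, offset, size int64) (string, error) {
	b, err := readChunk(strings.NewReader(s), offset, size)
	return string(b), err
}

func main() {
	chunk, err := chunkOf("0123456789", 4, 4)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(chunk) // 4567
}
```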
