splitcsv

command module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 9, 2025 License: Apache-2.0 Imports: 6 Imported by: 0

README

CSV Splitter

A robust and flexible command-line tool to split large CSV files into smaller chunks while preserving headers and maintaining data integrity.

Features

  • Flexible Configuration: Multiple command-line options for customization
  • Performance Optimized: Efficient memory usage and I/O operations
  • Error Handling: Comprehensive error reporting with line numbers
  • Multiple Delimiters: Support for different CSV delimiter characters
  • Verbose Output: Optional detailed progress information
  • Empty Record Handling: Configurable skipping of empty records
  • Output Directory Control: Specify custom output directories

Installation

You need Go installed (version 1.18 or newer recommended).

Build from Source
git clone https://github.com/kianooshaz/splitcsv.git
cd splitcsv
go build -o csvplit
Install via go install
go install github.com/kianooshaz/splitcsv@latest

Usage

Basic Usage
./csvplit -input data.csv -limit 5000
Advanced Usage
./csvplit -i data.csv -o chunk -dir ./output -l 1000 -delimiter ";" -v
Command Line Options
Flag Shorthand Default Description
-input -i required Path to the input CSV file
-out -o output Prefix for the output files
-limit -l 10000 Maximum number of records per output file
-dir . Output directory for split files
-delimiter , CSV delimiter character
-buffer 65536 Buffer size for file I/O in bytes
-skip-empty true Skip empty records
-verbose -v false Enable verbose output
-help -h Show help message
Examples

Split a large CSV file into chunks of 5000 records:

./csvplit -input bigdata.csv -limit 5000

Use a custom output prefix and directory:

./csvplit -i data.csv -o part -dir ./chunks

Process a semicolon-delimited file with verbose output:

./csvplit -i data.csv -delimiter ";" -v

Split with custom buffer size for better performance:

./csvplit -i largefile.csv -buffer 131072 -l 10000

Output

The tool creates numbered output files with the format: {prefix}_{number}.csv

For example, with -out part, you'll get:

  • part_1.csv
  • part_2.csv
  • part_3.csv
  • etc.

Each output file includes:

  • The original CSV header as the first line
  • Up to the specified number of data records
  • Proper CSV formatting with the same delimiter as the input

Error Handling

The tool provides detailed error messages including:

  • File access issues
  • CSV parsing errors with line numbers
  • Configuration validation errors
  • I/O operation failures

Performance Considerations

  • Memory Efficient: Processes files in streaming fashion
  • Configurable Buffering: Adjust buffer size for optimal I/O performance
  • Large File Support: Can handle files larger than available RAM

Requirements

  • Go 1.18 or newer
  • Read access to input CSV file
  • Write access to output directory

License

See LICENSE.

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL