The highest tagged major version is v2.

csv

Version: v3.0.0-...-3b7c5a1 | Published: Sep 3, 2025 | License: MIT | Imports: 9 | Imported by: 0

Repository

github.com/josephcopenhaver/csv-go

README ¶

csv-go

This package is a highly flexible and performant single-threaded CSV stream reader and writer. It opts for strictness, with nearly all options off by default. Using the functional options pattern on Reader and Writer creation offers extreme flexibility while allowing configuration to be validated up front in cold paths. The result is an immutable, clear execution of the CSV file/stream parsing strategy. It has been battle-tested thoroughly in production contexts for both correctness and speed, so feel free to use it in any way you like.

The reader is also more performant than the standard Go csv package when the two are compared in an apples-to-apples configuration. I expect mileage here to vary over time. My primary goal with this lib was to solve my own edge-case problems, such as suspect encodings and loose rules, and to offer something back that is more aligned with others who think like me with regard to reducing allocations and GC pauses and increasing efficiency.

package main

// this is a toy example that reads a csv file and writes to another

import (
	"os"

	"github.com/josephcopenhaver/csv-go/v3"
)

func main() {
	r, err := os.Open("input.csv")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	cr, err := csv.NewReader(
		csv.ReaderOpts().Reader(r),
		// by default quotes have no meaning
		// so must be specified to match RFC 4180
		// csv.ReaderOpts().Quote('"'),
	)
	if err != nil {
		panic(err)
	}
	defer cr.Close()

	w, err := os.Create("output.csv")
	if err != nil {
		panic(err)
	}
	defer w.Close()

	cw, err := csv.NewWriter(
		csv.WriterOpts().Writer(w),
	)
	if err != nil {
		panic(err)
	}
	defer cw.Close()

	for row := range cr.IntoIter() {
		if _, err := cw.WriteRow(row...); err != nil {
			panic(err)
		}
	}
	if err := cr.Err(); err != nil {
		panic(err)
	}
}

See the Reader and Writer examples for more in-depth usages.

Reader Features

Name	option(s)
Zero allocations during processing	BorrowRow + BorrowFields + InitialRecordBuffer + InitialRecordBufferSize + NumFields
Format Specification	Comment + CommentsAllowedAfterStartOfRecords + Escape + FieldSeparator + Quote + RecordSeparator + NumFields
Format Discovery	DiscoverRecordSeparator
Data Loss Prevention	ClearFreedDataMemory
Byte Order Marker Support	RemoveByteOrderMarker + ErrorOnNoByteOrderMarker
Headers Support	ExpectHeaders + RemoveHeaderRow + TrimHeaders
Reader Buffer tuning	ReaderBuffer + ReaderBufferSize
Format Validation	ErrorOnNoRows + ErrorOnNewlineInUnquotedField + ErrorOnQuotesInUnquotedField
Security Limits	MaxFields + MaxRecordBytes + MaxRecords + MaxComments + MaxCommentBytes

Writer Features

Name	option(s)
Zero allocations	planned
Header and Comment Specification	CommentRune + CommentLines + IncludeByteOrderMarker + Headers + TrimHeaders
Format Specification	Escape + FieldSeparator + Quote + RecordSeparator + NumFields
Data Loss Prevention	ClearFreedDataMemory
Encoding Validation	ErrorOnNonUTF8
Security Limits	planned

CHANGELOG

Documentation ¶

Index ¶

Constants
Variables
type Reader
- func NewReader(options ...ReaderOption) (Reader, error)
type ReaderOption
type ReaderOptions
- func ReaderOpts() ReaderOptions
type WriteHeaderOption
type WriteHeaderOptions
- func WriteHeaderOpts() WriteHeaderOptions
type Writer
- func NewWriter(options ...WriterOption) (*Writer, error)
type WriterOption
type WriterOptions
- func WriterOpts() WriterOptions

Constants ¶

const (

	// ReaderMinBufferSize is the minimum value a ReaderBufferSize
	// option will allow. It is also the minimum length for any
	// ReaderBuffer slice argument. This is exported so
	// configuration which may not be hardcoded by the utilizing
	// author can more easily define validation logic and cite
	// the reason for the limit.
	//
	// Algorithms used in this lib cannot work with a smaller buffer
	// size than this - however in general ReaderBufferSize and
	// ReaderBuffer options should be used to tune and balance mem
	// constraints with performance gained via using larger amounts
	// of buffer space.
	ReaderMinBufferSize = utf8.UTFMax + rMaxOverflowNumBytes
)

Variables ¶

var (
	// classifications
	ErrIO         = errors.New("io error")
	ErrParsing    = errors.New("parsing error")
	ErrFieldCount = errors.New("field count error")
	ErrBadConfig  = errors.New("bad config")
	ErrSecOp      = errors.New("security error")

	// instances
	ErrTooManyFields                = errors.New("too many fields")
	ErrSecOpRecordByteCountAboveMax = errors.New("record byte count exceeds max")
	// is a sub-instance of ErrTooManyFields
	ErrSecOpFieldCountAboveMax     = errors.New("field count exceeds max")
	ErrSecOpRecordCountAboveMax    = errors.New("record count exceeds max")
	ErrSecOpCommentBytesAboveMax   = errors.New("comment byte count exceeds max")
	ErrSecOpCommentsAboveMax       = errors.New("comment line count exceeds max")
	ErrNotEnoughFields             = errors.New("not enough fields")
	ErrReaderClosed                = errors.New("reader closed")
	ErrUnexpectedHeaderRowContents = errors.New("header row values do not match expectations")
	ErrBadRecordSeparator          = errors.New("record separator can only be one valid utf8 rune long or \"\\r\\n\"")
	ErrIncompleteQuotedField       = fmt.Errorf("incomplete quoted field: %w", io.ErrUnexpectedEOF)
	ErrQuoteInUnquotedField        = errors.New("quote found in unquoted field")
	ErrInvalidQuotedFieldEnding    = errors.New("unexpected character found after end of quoted field") // expecting field separator, record separator, quote char, or end of file if field count matches expectations
	ErrNoHeaderRow                 = fmt.Errorf("no header row: %w", io.ErrUnexpectedEOF)
	ErrNoRows                      = fmt.Errorf("no rows: %w", io.ErrUnexpectedEOF)
	ErrNoByteOrderMarker           = errors.New("no byte order marker")
	ErrNilReader                   = errors.New("nil reader")
	ErrInvalidEscSeqInQuotedField  = errors.New("invalid escape sequence in quoted field")
	ErrNewlineInUnquotedField      = errors.New("newline rune found in unquoted field")
	ErrUnexpectedQuoteAfterField   = errors.New("unexpected quote after quoted+escaped field")
	ErrUnsafeCRFileEnd             = fmt.Errorf("ended in a carriage return which must be quoted when record separator is CRLF: %w", io.ErrUnexpectedEOF)
)
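
Several of the sentinel errors above are built with %w around io.ErrUnexpectedEOF, so a caller can match either the specific sentinel or the underlying stdlib error with errors.Is. The following is a minimal, stdlib-only sketch of that behavior. It redeclares one definition locally instead of importing the package, and parse is a hypothetical stand-in for a failing read, not a function of this library.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errIncompleteQuotedField mirrors the definition of ErrIncompleteQuotedField
// shown above. It is redeclared locally so the sketch needs no third-party import.
var errIncompleteQuotedField = fmt.Errorf("incomplete quoted field: %w", io.ErrUnexpectedEOF)

// parse is a hypothetical stand-in that fails the way a reader might when
// input ends inside a quoted field. It adds its own context with %w.
func parse() error {
	return fmt.Errorf("record 3: %w", errIncompleteQuotedField)
}

// matches reports whether err matches the specific sentinel and whether it
// matches the wrapped stdlib error.
func matches(err error) (sentinel, eof bool) {
	return errors.Is(err, errIncompleteQuotedField), errors.Is(err, io.ErrUnexpectedEOF)
}

func main() {
	sentinel, eof := matches(parse())
	fmt.Println(sentinel, eof)
}
```

Matching on the broader error is handy when the caller only cares that the stream ended early, while matching on the sentinel keeps the handling specific.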

var (
	ErrRowNilOrEmpty             = errors.New("row is nil or empty")
	ErrNonUTF8InRecord           = errors.New("non-utf8 characters in record")
	ErrNonUTF8InComment          = errors.New("non-utf8 characters in comment")
	ErrWriterClosed              = errors.New("writer closed")
	ErrHeaderWritten             = errors.New("header already written")
	ErrInvalidFieldCountInRecord = errors.New("invalid field count in record")
)

Functions ¶

This section is empty.

Types ¶

type Reader ¶

type Reader interface {
	Close() error
	Err() error
	IntoIter() iter.Seq[[]string]
	Row() []string
	Scan() bool
}

func NewReader ¶

func NewReader(options ...ReaderOption) (Reader, error)

NewReader creates a new instance of a CSV reader which is not safe for concurrent reads.

type ReaderOption ¶

type ReaderOption func(*rCfg)

type ReaderOptions ¶

type ReaderOptions struct{}

ReaderOptions should never be instantiated manually. Instead, call ReaderOpts().

This is only exported to allow godocs to discover the exported methods.

ReaderOptions will never have exported members and the zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling the function is a no-op that is compiled away anyway, so constructing the struct by hand optimizes nothing at all. Use ReaderOpts()!
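
The shape described here, a zero-size struct whose methods return option functions that mutate a private config, can be sketched with the standard library alone. Every name below (cfg, option, options, opts, newCfg, errBadConfig) is hypothetical and chosen for illustration; this is not the library's code. The one validation rule shown, that an escape rune requires a quote rune, is taken from the Escape documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// cfg is a private configuration struct, analogous in role to rCfg.
type cfg struct {
	quote, escape    rune
	hasQuote, hasEsc bool
}

// option mutates the private config, like ReaderOption func(*rCfg).
type option func(*cfg)

// options is a zero-size namespace for option constructors, like ReaderOptions.
type options struct{}

func opts() options { return options{} }

func (options) Quote(r rune) option {
	return func(c *cfg) { c.quote, c.hasQuote = r, true }
}

func (options) Escape(r rune) option {
	return func(c *cfg) { c.escape, c.hasEsc = r, true }
}

var errBadConfig = errors.New("bad config")

// newCfg applies every option, then validates once, up front, so the hot
// path never needs to re-check the configuration.
func newCfg(list ...option) (cfg, error) {
	var c cfg
	for _, o := range list {
		o(&c)
	}
	if c.hasEsc && !c.hasQuote {
		return cfg{}, fmt.Errorf("%w: escape requires quote", errBadConfig)
	}
	return c, nil
}

func main() {
	_, err := newCfg(opts().Escape('\\'))
	fmt.Println(errors.Is(err, errBadConfig))

	_, err = newCfg(opts().Quote('"'), opts().Escape('\\'))
	fmt.Println(err == nil)
}
```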

func ReaderOpts ¶

func ReaderOpts() ReaderOptions

func (ReaderOptions) BorrowFields ¶

func (ReaderOptions) BorrowFields(b bool) ReaderOption

BorrowFields alters the Row function to return strings that directly reference the internal buffer without copying. This is UNSAFE and can lead to memory corruption if not handled properly.

WARNING: Specifying this option as true while BorrowRow is false will result in an error.

DANGER: Only set to true if you guarantee that field strings are NEVER used after the next call to Scan or Close. Otherwise, you MUST clone both the slice AND the strings within it via strings.Clone(). Failure to do so can lead to memory corruption as the underlying buffer will be reused.

Example of safe usage:

for reader.Scan() {
  row := reader.Row()
  // Process row immediately without storing references
  processRow(row[0], row[1])
}
if reader.Err() != nil { ... }

Example of UNSAFE usage that will lead to bugs:

var savedStrings []string
for reader.Scan() {
  row := reader.Row()
  savedStrings = append(savedStrings, row[0]) // WRONG! Will be corrupted
}
if reader.Err() != nil { ... }

This should be considered a micro-optimization only for performance-critical code paths where profiling has identified string copying as a bottleneck.
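
When a borrowed row does need to outlive the current iteration, the clone described above has to cover both the slice and every string in it. A minimal helper might look like the sketch below. cloneRow is a hypothetical name, and the borrowed row is simulated with an ordinary slice, so the sketch shows the copy itself rather than reproducing buffer corruption.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneRow returns a deep copy of row: a new slice whose strings each own
// freshly allocated backing memory, so nothing aliases the source.
func cloneRow(row []string) []string {
	out := make([]string, len(row))
	for i, s := range row {
		out[i] = strings.Clone(s)
	}
	return out
}

func main() {
	borrowed := []string{"id", "name"} // stand-in for a borrowed row
	kept := cloneRow(borrowed)

	// Simulates the slot being reused for the next record.
	borrowed[0] = "overwritten"

	fmt.Println(kept[0], kept[1])
}
```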

func (ReaderOptions) BorrowRow ¶

func (ReaderOptions) BorrowRow(b bool) ReaderOption

BorrowRow alters the Row function to return the same slice instance each time with the strings inside set to different values.

Only set to true if the returned row slice is never used or modified after the next call to Scan or Close. You must clone the slice if doing otherwise.

See BorrowFields() if you wish to also remove allocations related to cloning strings into the slice.

Please consider this a micro-optimization in most circumstances, because it tightens the usage contract of the returned row in ways most would not normally consider.
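
The aliasing hazard is easy to reproduce with any source that hands out the same slice on every call. The sketch below uses a hypothetical reusingSource type, not the library's Reader, to show what happens when rows are retained with and without slices.Clone (Go 1.21 or newer).

```go
package main

import (
	"fmt"
	"slices"
)

// reusingSource is a hypothetical record source that returns the same
// slice on every call, overwriting its elements in place.
type reusingSource struct {
	data [][]string
	pos  int
	row  []string
}

func (s *reusingSource) Scan() bool {
	if s.pos >= len(s.data) {
		return false
	}
	// Reuse the existing backing array instead of allocating a new row.
	s.row = append(s.row[:0], s.data[s.pos]...)
	s.pos++
	return true
}

func (s *reusingSource) Row() []string { return s.row }

// collect retains every row, optionally cloning the slice first.
func collect(clone bool) [][]string {
	src := &reusingSource{data: [][]string{{"a", "1"}, {"b", "2"}}}
	var saved [][]string
	for src.Scan() {
		row := src.Row()
		if clone {
			row = slices.Clone(row)
		}
		saved = append(saved, row)
	}
	return saved
}

func main() {
	fmt.Println(collect(false)[0][0]) // first saved row was overwritten
	fmt.Println(collect(true)[0][0])  // first saved row is intact
}
```

Without the clone, every retained row points at the same backing array and ends up showing the last record.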

func (ReaderOptions) ClearFreedDataMemory ¶

func (ReaderOptions) ClearFreedDataMemory(b bool) ReaderOption

ClearFreedDataMemory ensures that whenever a shared memory buffer containing data goes out of scope, zero values are written to every byte within the buffer.

This may significantly degrade performance and is recommended only for sensitive data or long-lived processes.
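
As a rough picture of what zeroing a buffer involves, the sketch below wipes a byte slice with the clear builtin (Go 1.21 or newer). wipe and allZero are hypothetical helpers, and how the library itself tracks and clears its buffers is not shown on this page.

```go
package main

import "fmt"

// wipe overwrites the whole backing array reachable from buf with zeros,
// including bytes between len(buf) and cap(buf) that may hold stale data.
func wipe(buf []byte) {
	clear(buf[:cap(buf)])
}

// allZero reports whether every byte up to cap(buf) is zero.
func allZero(buf []byte) bool {
	for _, b := range buf[:cap(buf)] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	buf := []byte("secret,4111-1111")
	buf = buf[:6] // shrinking the length leaves the tail bytes in memory
	wipe(buf)
	fmt.Println(allZero(buf))
}
```

The cost is one extra pass over the buffer each time it is released, which is where the performance warning above comes from.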

func (ReaderOptions) Comment ¶

func (ReaderOptions) Comment(r rune) ReaderOption

func (ReaderOptions) CommentsAllowedAfterStartOfRecords ¶

func (ReaderOptions) CommentsAllowedAfterStartOfRecords(b bool) ReaderOption

func (ReaderOptions) DiscoverRecordSeparator ¶

func (ReaderOptions) DiscoverRecordSeparator(b bool) ReaderOption

func (ReaderOptions) ErrorOnNewlineInUnquotedField ¶

func (ReaderOptions) ErrorOnNewlineInUnquotedField(b bool) ReaderOption

func (ReaderOptions) ErrorOnNoByteOrderMarker ¶

func (ReaderOptions) ErrorOnNoByteOrderMarker(b bool) ReaderOption

func (ReaderOptions) ErrorOnNoRows ¶

func (ReaderOptions) ErrorOnNoRows(b bool) ReaderOption

ErrorOnNoRows causes cr.Err() to return ErrNoRows should the reader stream terminate before any data records are parsed.

func (ReaderOptions) ErrorOnQuotesInUnquotedField ¶

func (ReaderOptions) ErrorOnQuotesInUnquotedField(b bool) ReaderOption

func (ReaderOptions) Escape ¶

func (ReaderOptions) Escape(r rune) ReaderOption

Escape specifies the character used to escape a quote within a field, as well as the literal escape character itself.

Without this option, a quote character is expected to be escaped by doubling it while the overall field is wrapped in quote characters.

This is mainly useful when processing a Spark CSV file, as it does not strictly follow RFC 4180.

So set to '\\' if you have this need.

It is not valid to use this option without specifically setting a quote. Doing so will result in an error being returned on Reader creation.
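
The difference between the two escaping conventions can be seen by decoding the interior of a quoted field both ways. The functions below are hypothetical, stdlib-only illustrations and not the library's parser. The second one assumes that an escape rune followed by anything other than the quote or the escape rune itself is invalid.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeDoubled decodes the interior of a quoted field in which a quote
// is escaped by doubling it (the RFC 4180 convention).
func unescapeDoubled(inner string, quote rune) string {
	q := string(quote)
	return strings.ReplaceAll(inner, q+q, q)
}

// unescapeWith decodes the interior of a quoted field in which esc escapes
// both the quote and itself. It returns false for a dangling escape or for
// an escaped rune that is neither the quote nor esc.
func unescapeWith(inner string, quote, esc rune) (string, bool) {
	var b strings.Builder
	escaped := false
	for _, r := range inner {
		switch {
		case escaped:
			if r != quote && r != esc {
				return "", false
			}
			b.WriteRune(r)
			escaped = false
		case r == esc:
			escaped = true
		default:
			b.WriteRune(r)
		}
	}
	if escaped {
		return "", false
	}
	return b.String(), true
}

func main() {
	fmt.Println(unescapeDoubled(`say ""hi""`, '"'))

	s, ok := unescapeWith(`say \"hi\" \\`, '"', '\\')
	fmt.Println(s, ok)
}
```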

func (ReaderOptions) ExpectHeaders ¶

func (ReaderOptions) ExpectHeaders(h ...string) ReaderOption

ExpectHeaders causes the first row to be recognized as a header row.

If the slice of header values does not match then the reader will error.

func (ReaderOptions) FieldSeparator ¶

func (ReaderOptions) FieldSeparator(r rune) ReaderOption

func (ReaderOptions) InitialRecordBuffer ¶

func (ReaderOptions) InitialRecordBuffer(v []byte) ReaderOption

InitialRecordBuffer is a hint that lets the caller pre-allocate record buffer space once externally and pass it in. This reduces the number of re-allocations when processing a reader and allows the buffer to be reused after the reader is closed.

This option should generally not be used. It only exists to assist with processing large numbers of CSV files should memory be a clear constraint. There is no guarantee this buffer will always be used until the end of the csv Reader's lifecycle.

Please consider this a micro-optimization in most circumstances, because it tightens the usage contract of the csv Reader in ways most would not normally consider.

func (ReaderOptions) InitialRecordBufferSize ¶

func (ReaderOptions) InitialRecordBufferSize(v int) ReaderOption

InitialRecordBufferSize is a hint to pre-allocate record buffer space once and reduce the number of re-allocations when processing a reader.

Please consider this a micro-optimization in most circumstances: most users are unlikely to know the maximum total record size they wish to target or stay under, and it is generally better practice to leave these details to the Go runtime to coordinate via standard garbage collection.
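
The effect of a size hint can be seen on a plain byte slice: appending within a pre-sized capacity never replaces the backing array, while starting from nothing forces growth. The sketch below uses a hypothetical fill helper and is a general Go illustration, not a measurement of this library.

```go
package main

import "fmt"

// fill appends n bytes to buf and reports whether the backing array had to
// grow along the way.
func fill(buf []byte, n int) (out []byte, grew bool) {
	before := cap(buf)
	for i := 0; i < n; i++ {
		buf = append(buf, 'x')
	}
	return buf, cap(buf) != before
}

func main() {
	_, grew := fill(make([]byte, 0, 1024), 1000)
	fmt.Println("presized grew:", grew)

	_, grew = fill(nil, 1000)
	fmt.Println("unsized grew:", grew)
}
```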

func (ReaderOptions) MaxCommentBytes ¶

func (ReaderOptions) MaxCommentBytes(n int) ReaderOption

MaxCommentBytes is a security option that limits the number of bytes allowed in a comment line before a SecOp error is returned.

func (ReaderOptions) MaxComments ¶

func (ReaderOptions) MaxComments(n int) ReaderOption

MaxComments is a security option that limits the number of comment lines allowed in a stream before a SecOp error is returned.

func (ReaderOptions) MaxFields ¶

func (ReaderOptions) MaxFields(v uint) ReaderOption

MaxFields is a security option that limits the number of fields allowed to be detected automatically before a SecOp error is returned.

Using this option at the same time as the NumFields option will lead to an error on reader creation, since using both is counterintuitive in general.

func (ReaderOptions) MaxRecordBytes ¶

func (ReaderOptions) MaxRecordBytes(n int) ReaderOption

MaxRecordBytes is a security option that limits the number of bytes allowed to be detected in a record before a SecOp error is returned.
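
For a feel of what a per-record byte limit guards against, the standard library offers a loosely comparable mechanism: bufio.Scanner refuses tokens that do not fit its maximum buffer and reports bufio.ErrTooLong. The sketch below is that stdlib analogue only. exceedsLimit is a hypothetical helper, and the exact threshold semantics of bufio differ from whatever this library enforces.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// exceedsLimit scans newline-separated records from input and reports
// whether any record is too long for a scanner capped at maxBytes.
func exceedsLimit(input string, maxBytes int) bool {
	sc := bufio.NewScanner(strings.NewReader(input))
	// Cap both the initial buffer and the maximum buffer at maxBytes.
	sc.Buffer(make([]byte, 0, maxBytes), maxBytes)
	for sc.Scan() {
		// Records within the limit are consumed and discarded here.
	}
	return errors.Is(sc.Err(), bufio.ErrTooLong)
}

func main() {
	long := "short\n" + strings.Repeat("x", 100) + "\n"
	fmt.Println(exceedsLimit(long, 16))
	fmt.Println(exceedsLimit("a,b\nc,d\n", 16))
}
```

Failing fast on an oversized record keeps a hostile or corrupt input from driving unbounded buffer growth.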

func (ReaderOptions) MaxRecords ¶

func (ReaderOptions) MaxRecords(n uint64) ReaderOption

MaxRecords is a security option that limits the number of records allowed in a stream before a SecOp error is returned.

func (ReaderOptions) NumFields ¶

func (ReaderOptions) NumFields(n int) ReaderOption

func (ReaderOptions) Quote ¶

func (ReaderOptions) Quote(r rune) ReaderOption

func (ReaderOptions) Reader ¶

func (ReaderOptions) Reader(r io.Reader) ReaderOption

func (ReaderOptions) ReaderBuffer ¶

func (ReaderOptions) ReaderBuffer(v []byte) ReaderOption

ReaderBuffer only accepts a slice with a length greater than or equal to ReaderMinBufferSize; otherwise, an error is returned when creating the reader instance. Only the length of the slice is utilized during buffering operations. The capacity of the provided slice is not utilized in any way.
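
Because only the length counts, a slice with a short length and a large capacity would still be rejected. The sketch below shows a caller-side check in that spirit. minSize is a hypothetical stand-in for ReaderMinBufferSize, whose real value depends on an unexported constant not shown on this page, and checkBuffer and errTooSmall are likewise illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// minSize is a hypothetical stand-in for ReaderMinBufferSize.
const minSize = 8

var errTooSmall = errors.New("buffer shorter than minimum")

// checkBuffer validates by length only; spare capacity does not count.
func checkBuffer(buf []byte) error {
	if len(buf) < minSize {
		return fmt.Errorf("%w: len=%d cap=%d min=%d", errTooSmall, len(buf), cap(buf), minSize)
	}
	return nil
}

func main() {
	fmt.Println(checkBuffer(make([]byte, 4, 4096)) != nil) // rejected despite the large capacity
	fmt.Println(checkBuffer(make([]byte, 4096)) != nil)    // accepted
}
```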

func (ReaderOptions) ReaderBufferSize ¶

func (ReaderOptions) ReaderBufferSize(v int) ReaderOption

ReaderBufferSize only accepts a value greater than or equal to ReaderMinBufferSize; otherwise, an error is returned when creating the reader instance.

func (ReaderOptions) RecordSeparator ¶

func (ReaderOptions) RecordSeparator(s string) ReaderOption

func (ReaderOptions) RemoveByteOrderMarker ¶

func (ReaderOptions) RemoveByteOrderMarker(b bool) ReaderOption

func (ReaderOptions) RemoveHeaderRow ¶

func (ReaderOptions) RemoveHeaderRow(b bool) ReaderOption

RemoveHeaderRow causes the first row to be recognized as a header row.

The row will be skipped over by Scan() and will not be returned by Row().

func (ReaderOptions) TerminalRecordSeparatorEmitsRecord ¶

func (ReaderOptions) TerminalRecordSeparatorEmitsRecord(b bool) ReaderOption

TerminalRecordSeparatorEmitsRecord only exists to acknowledge an edge case when processing csv documents that contain one column. If the file contents end in a record separator it's impossible to determine if that should indicate that a new record with an empty field should be emitted unless that record is enclosed in quotes or a config option like this exists.

In most cases this should not be an issue, unless the dataset is a single column list that allows empty strings for some use case and the writer used to create the file chooses to not always write the last record followed by a record separator. (treating the record separator like a record terminator)
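
The ambiguity is visible with a plain string split on single-column data. The sketch below uses a hypothetical splitRecords helper, not the library's parser, and assumes that enabling the option means a trailing record separator yields one more empty record.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecords splits single-column data on sep. When the data ends with
// sep, terminalEmits decides whether that final separator yields one more
// (empty) record or merely terminates the previous one.
func splitRecords(data, sep string, terminalEmits bool) []string {
	if data == "" {
		return nil
	}
	recs := strings.Split(data, sep)
	if strings.HasSuffix(data, sep) && !terminalEmits {
		// Treat the final separator as a terminator: drop the empty tail.
		recs = recs[:len(recs)-1]
	}
	return recs
}

func main() {
	fmt.Println(len(splitRecords("a\nb\n", "\n", false)))
	fmt.Println(len(splitRecords("a\nb\n", "\n", true)))
}
```

With more than one column the question does not arise, because an empty trailing line cannot be mistaken for a record with the expected number of fields.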

func (ReaderOptions) TrimHeaders ¶

func (ReaderOptions) TrimHeaders(b bool) ReaderOption

TrimHeaders causes the first row to be recognized as a header row and all values are returned with whitespace trimmed.

type WriteHeaderOption ¶

type WriteHeaderOption func(*whCfg)

type WriteHeaderOptions ¶

type WriteHeaderOptions struct{}

WriteHeaderOptions should never be instantiated manually. Instead, call WriteHeaderOpts().

This is only exported to allow godocs to discover the exported methods.

WriteHeaderOptions will never have exported members and the zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling the function is a no-op that is compiled away anyway, so constructing the struct by hand optimizes nothing at all. Use WriteHeaderOpts()!

func WriteHeaderOpts ¶

func WriteHeaderOpts() WriteHeaderOptions

func (WriteHeaderOptions) CommentLines ¶

func (WriteHeaderOptions) CommentLines(s ...string) WriteHeaderOption

func (WriteHeaderOptions) CommentRune ¶

func (WriteHeaderOptions) CommentRune(r rune) WriteHeaderOption

func (WriteHeaderOptions) Headers ¶

func (WriteHeaderOptions) Headers(h ...string) WriteHeaderOption

func (WriteHeaderOptions) IncludeByteOrderMarker ¶

func (WriteHeaderOptions) IncludeByteOrderMarker(b bool) WriteHeaderOption

func (WriteHeaderOptions) TrimHeaders ¶

func (WriteHeaderOptions) TrimHeaders(b bool) WriteHeaderOption

type Writer ¶

type Writer struct {
	// contains filtered or unexported fields
}

func NewWriter ¶

func NewWriter(options ...WriterOption) (*Writer, error)

NewWriter creates a new instance of a CSV writer which is not safe for concurrent writes.

func (*Writer) Close ¶

func (w *Writer) Close() error

Close should be called after writing all rows successfully to the underlying writer.

Close currently always returns nil, but in the future it may not.

Should any configuration options require post-flight checks they will be implemented here.

It will never attempt to flush or close the underlying writer instance. That is left to the calling context.
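
Since flushing is left to the caller, anyone who wraps the destination in a bufio.Writer needs to flush it explicitly after the CSV writer is closed. The stdlib-only sketch below shows why. writeBuffered is a hypothetical helper, and the direct fmt.Fprint call stands in for rows written through a CSV writer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeBuffered writes one line through a bufio.Writer and returns how many
// bytes had reached the destination before and after Flush.
func writeBuffered() (before, after int) {
	var dst bytes.Buffer
	bw := bufio.NewWriter(&dst)

	// A CSV writer would be constructed over bw here, used to write rows,
	// and closed after the last row.
	fmt.Fprint(bw, "a,b\n")

	before = dst.Len() // still buffered, nothing has reached dst yet
	if err := bw.Flush(); err != nil {
		panic(err)
	}
	return before, dst.Len()
}

func main() {
	before, after := writeBuffered()
	fmt.Println(before, after)
}
```

The same ordering applies to files: close the CSV writer, flush any buffering layer, then close the file.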

func (*Writer) WriteHeader ¶

func (w *Writer) WriteHeader(options ...WriteHeaderOption) (int, error)

func (*Writer) WriteRow ¶

func (w *Writer) WriteRow(row ...string) (int, error)

type WriterOption ¶

type WriterOption func(*wCfg)

type WriterOptions ¶

type WriterOptions struct{}

WriterOptions should never be instantiated manually. Instead, call WriterOpts().

This is only exported to allow godocs to discover the exported methods.

WriterOptions will never have exported members and the zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling the function is a no-op that is compiled away anyway, so constructing the struct by hand optimizes nothing at all. Use WriterOpts()!

func WriterOpts ¶

func WriterOpts() WriterOptions

func (WriterOptions) ClearFreedDataMemory ¶

func (WriterOptions) ClearFreedDataMemory(b bool) WriterOption

ClearFreedDataMemory ensures that whenever a shared memory buffer containing data goes out of scope, zero values are written to every byte within the buffer.

This may significantly degrade performance and is recommended only for sensitive data or long-lived processes.

func (WriterOptions) ErrorOnNonUTF8 ¶

func (WriterOptions) ErrorOnNonUTF8(v bool) WriterOption

func (WriterOptions) Escape ¶

func (WriterOptions) Escape(r rune) WriterOption

func (WriterOptions) FieldSeparator ¶

func (WriterOptions) FieldSeparator(v rune) WriterOption

func (WriterOptions) NumFields ¶

func (WriterOptions) NumFields(v int) WriterOption

func (WriterOptions) Quote ¶

func (WriterOptions) Quote(v rune) WriterOption

func (WriterOptions) RecordSeparator ¶

func (WriterOptions) RecordSeparator(s string) WriterOption

func (WriterOptions) Writer ¶

func (WriterOptions) Writer(v io.Writer) WriterOption

Directories ¶

Path	Synopsis
internal
cmd/generate command
examples/reader command
examples/writer command
