csvpp

package module v0.0.3

Published: Jan 29, 2026. License: MIT.

README
go-csvpp


A Go implementation of the IETF CSV++ specification (draft-mscaldas-csvpp-01).

CSV++ extends traditional CSV to support arrays and structured fields within cells, enabling complex data representation while maintaining CSV's simplicity.

Features

  • Full IETF CSV++ specification compliance
  • Wraps encoding/csv for RFC 4180 compatibility
  • Four field types: Simple, Array, Structured, ArrayStructured
  • Struct mapping with csvpp tags (Marshal/Unmarshal)
  • Configurable delimiters
  • Security-conscious design (nesting depth limits)

Requirements

  • Go 1.24 or later

Installation

go get github.com/osamingo/go-csvpp

Quick Start

Reading CSV++ Data
package main

import (
    "fmt"
    "io"
    "strings"

    "github.com/osamingo/go-csvpp"
)

func main() {
    input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
Bob,555-9999,40.7128^-74.0060
`

    reader := csvpp.NewReader(strings.NewReader(input))

    for {
        record, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }

        name := record[0].Value
        phones := record[1].Values
        lat := record[2].Components[0].Value
        lon := record[2].Components[1].Value

        fmt.Printf("%s: phones=%v, location=(%s, %s)\n", name, phones, lat, lon)
    }
}

Output:

Alice: phones=[555-1234 555-5678], location=(34.0522, -118.2437)
Bob: phones=[555-9999], location=(40.7128, -74.0060)

Writing CSV++ Data
package main

import (
    "bytes"
    "fmt"

    "github.com/osamingo/go-csvpp"
)

func main() {
    var buf bytes.Buffer
    writer := csvpp.NewWriter(&buf)

    headers := []*csvpp.ColumnHeader{
        {Name: "name", Kind: csvpp.SimpleField},
        {Name: "tags", Kind: csvpp.ArrayField, ArrayDelimiter: '~'},
    }
    writer.SetHeaders(headers)

    if err := writer.WriteHeader(); err != nil {
        panic(err)
    }
    if err := writer.Write([]*csvpp.Field{
        {Value: "Alice"},
        {Values: []string{"go", "rust", "python"}},
    }); err != nil {
        panic(err)
    }
    writer.Flush()

    fmt.Print(buf.String())
}

Output:

name,tags[]
Alice,go~rust~python

Struct Mapping
package main

import (
    "fmt"
    "strings"

    "github.com/osamingo/go-csvpp"
)

type Person struct {
    Name   string   `csvpp:"name"`
    Phones []string `csvpp:"phone[]"`
    Geo    struct {
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
}

func main() {
    input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
`

    var people []Person
    if err := csvpp.Unmarshal(strings.NewReader(input), &people); err != nil {
        panic(err)
    }

    for _, p := range people {
        fmt.Printf("%s: phones=%v, geo=(%s, %s)\n",
            p.Name, p.Phones, p.Geo.Lat, p.Geo.Lon)
    }
}

Output:

Alice: phones=[555-1234 555-5678], geo=(34.0522, -118.2437)

Field Types

CSV++ supports four field types in headers:

Type             Header Syntax      Example Data         Description
Simple           name               Alice                Plain text value
Array            tags[]             go~rust~python       Multiple values with delimiter
Structured       geo(lat^lon)       34.05^-118.24        Named components
ArrayStructured  addr[](city^zip)   LA^90210~NY^10001    Array of structures

Default Delimiters
  • Array delimiter: ~ (tilde)
  • Component delimiter: ^ (caret)

Custom delimiters can be specified in the header:

  • phone[|] - uses | as array delimiter
  • geo;(lat;lon) - uses ; as component delimiter

Delimiter Progression

For nested structures, the IETF specification recommends:

Level            Delimiter
1 (arrays)       ~
2 (components)   ^
3                ;
4                :
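As an illustration of how these delimiters compose (a hand-rolled sketch, not this package's parser), a cell from an addr[](city^zip) column can be split with the level-1 and level-2 delimiters:

```go
package main

import (
	"fmt"
	"strings"
)

// splitArrayStruct splits an ArrayStructured cell into per-element
// component lists using the default level-1 ('~') and level-2 ('^')
// delimiters from the progression above.
func splitArrayStruct(cell string) [][]string {
	var out [][]string
	for _, elem := range strings.Split(cell, "~") {
		out = append(out, strings.Split(elem, "^"))
	}
	return out
}

func main() {
	for _, parts := range splitArrayStruct("LA^90210~NY^10001") {
		fmt.Printf("city=%s zip=%s\n", parts[0], parts[1])
	}
}
```

This is why the spec recommends a fixed progression: each nesting level gets its own delimiter, so splitting can proceed outside-in without ambiguity.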

API Reference

Reader
reader := csvpp.NewReader(r) // r is io.Reader

// Configuration (same as encoding/csv)
reader.Comma = ','           // Field delimiter
reader.Comment = '#'         // Comment character
reader.LazyQuotes = false    // Relaxed quote handling
reader.TrimLeadingSpace = false
reader.MaxNestingDepth = 10  // Nesting limit (security)

// Methods
headers, err := reader.Headers()  // Get parsed headers
record, err := reader.Read()      // Read one record
records, err := reader.ReadAll()  // Read all records

Writer
writer := csvpp.NewWriter(w) // w is io.Writer

// Configuration
writer.Comma = ','      // Field delimiter
writer.UseCRLF = false  // Use \r\n line endings

// Methods
writer.SetHeaders(headers)  // Set column headers
writer.WriteHeader()        // Write header row
writer.Write(record)        // Write one record
writer.WriteAll(records)    // Write all records
writer.Flush()              // Flush buffer

Marshal/Unmarshal
// Unmarshal CSV++ data into structs
var people []Person
err := csvpp.Unmarshal(reader, &people)

// Marshal structs to CSV++ data
err := csvpp.Marshal(writer, people)

Struct Tags

Use csvpp struct tags to map fields:

type Record struct {
    Name     string   `csvpp:"name"`           // Simple field
    Tags     []string `csvpp:"tags[]"`         // Array field
    Location struct {                          // Structured field
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
    Addresses []Address `csvpp:"addr[](street^city)"` // Array structured
}

Compatibility

This package wraps encoding/csv and inherits:

  • Full RFC 4180 compliance
  • Quoted field handling
  • Configurable field/line delimiters
  • Comment support

Security

  • MaxNestingDepth: Limits nested structure depth (default: 10) to prevent stack overflow from malicious input
  • Header names are restricted to ASCII characters per IETF specification

CSV Injection Prevention

When CSV files are opened in spreadsheet applications, values starting with =, +, -, or @ may be interpreted as formulas. Use HasFormulaPrefix to detect and escape dangerous values:

if csvpp.HasFormulaPrefix(value) {
    value = "'" + value // Escape for spreadsheet safety
}

Specification

This implementation follows the IETF CSV++ specification, draft-mscaldas-csvpp-01: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Documentation


Package csvpp implements the IETF CSV++ specification (draft-mscaldas-csvpp-01).

CSV++ extends traditional CSV to support arrays and structured fields within cells, enabling complex data representation while maintaining CSV's simplicity. This package wraps encoding/csv and is fully compatible with RFC 4180.

Overview

CSV++ introduces four field types beyond simple text values:

  • Simple: "name" - plain text value
  • Array: "tags[]" - multiple values separated by a delimiter (default: ~)
  • Structured: "geo(lat^lon)" - named components separated by a delimiter (default: ^)
  • ArrayStructured: "addresses[](street^city)" - array of structured values

These field types are represented by the FieldKind constants: SimpleField, ArrayField, StructuredField, and ArrayStructuredField.
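The four syntaxes can be told apart purely by their header markers. A rough sketch of that classification (an illustration of the grammar, not this package's actual parser, which also validates names and delimiters):

```go
package main

import (
	"fmt"
	"strings"
)

// kindOf classifies a CSV++ header cell by its syntax markers:
// "[" marks an array, "(" marks named components, both mark an
// array of structures.
func kindOf(header string) string {
	hasArray := strings.Contains(header, "[")
	hasStruct := strings.Contains(header, "(")
	switch {
	case hasArray && hasStruct:
		return "ArrayStructured"
	case hasArray:
		return "Array"
	case hasStruct:
		return "Structured"
	default:
		return "Simple"
	}
}

func main() {
	for _, h := range []string{"name", "tags[]", "geo(lat^lon)", "addresses[](street^city)"} {
		fmt.Printf("%s -> %s\n", h, kindOf(h))
	}
}
```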

Basic Usage

Reading CSV++ data:

r := csvpp.NewReader(file)

// Get parsed headers
headers, err := r.Headers()
if err != nil {
    log.Fatal(err)
}

// Read records
for {
    record, err := r.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    // process record
}

Writing CSV++ data:

w := csvpp.NewWriter(file)
w.SetHeaders(headers)

if err := w.WriteHeader(); err != nil {
    log.Fatal(err)
}

for _, record := range records {
    if err := w.Write(record); err != nil {
        log.Fatal(err)
    }
}
w.Flush()
if err := w.Error(); err != nil {
    log.Fatal(err)
}

Struct Mapping

Use Marshal and Unmarshal for automatic struct mapping with struct tags:

type Person struct {
    Name   string   `csvpp:"name"`
    Phones []string `csvpp:"phone[]"`
    Geo    struct {
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
}

// Read into structs
var people []Person
if err := csvpp.Unmarshal(file, &people); err != nil {
    log.Fatal(err)
}

// Write from structs
var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
    log.Fatal(err)
}

Delimiter Conventions

The IETF CSV++ specification recommends using specific delimiters for nested structures to avoid conflicts. The recommended progression is:

  • Level 1 (arrays): ~ (tilde)
  • Level 2 (components): ^ (caret)
  • Level 3: ; (semicolon)
  • Level 4: : (colon)

This package uses ~ and ^ as defaults, matching the IETF recommendation.

Compatibility with encoding/csv

This package wraps encoding/csv and inherits its RFC 4180 compliance. The Reader and Writer types expose the same configuration options:

  • Comma: field delimiter (default: ',')
  • Comment: comment character (Reader only)
  • LazyQuotes: relaxed quote handling (Reader only)
  • TrimLeadingSpace: trim leading whitespace (Reader only)
  • UseCRLF: use \r\n line endings (Writer only)

Security Considerations

The MaxNestingDepth option (default: 10) limits the depth of nested structures to prevent stack overflow attacks from maliciously crafted input.

CSV Injection

When CSV files are opened in spreadsheet applications (Excel, Google Sheets, etc.), values beginning with '=', '+', '-', or '@' may be interpreted as formulas. This can lead to security vulnerabilities known as "CSV injection" or "formula injection".

Use the HasFormulaPrefix function to detect potentially dangerous values:

for _, field := range record {
    if csvpp.HasFormulaPrefix(field.Value) {
        field.Value = "'" + field.Value // Escape for spreadsheet safety
    }
}

Note: This package does not automatically escape formula prefixes to preserve data integrity. Applications should implement appropriate escaping based on their specific security requirements and target environments.
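The check itself amounts to a one-character prefix test. A standalone sketch of the same idea (mirroring the documented behavior, not the package's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// hasFormulaPrefix reports whether a value starts with a character
// that spreadsheet applications may interpret as a formula.
func hasFormulaPrefix(s string) bool {
	return s != "" && strings.ContainsRune("=+-@", rune(s[0]))
}

func main() {
	for _, v := range []string{"=SUM(A1:A3)", "Alice", "-42"} {
		fmt.Printf("%q -> %v\n", v, hasFormulaPrefix(v))
	}
}
```

Note that legitimate values such as negative numbers ("-42") also match, which is one reason escaping is left to the application rather than applied automatically.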

Errors

The package defines the sentinel errors ErrNoHeader, ErrInvalidHeader, and ErrNestingTooDeep (see Variables). Parse errors are wrapped in ParseError, which provides line/column information.

Constants

Default delimiters follow the IETF recommendations: '~' for arrays and '^' for components (see Constants).

Specification Reference

For the complete IETF CSV++ specification, see: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

Example
input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
Bob,555-9999,40.7128^-74.0060
`

reader := csvpp.NewReader(strings.NewReader(input))

// Get headers
headers, err := reader.Headers()
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Headers: %s, %s, %s\n", headers[0].Name, headers[1].Name, headers[2].Name)

// Read all records
for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	name := record[0].Value
	phones := record[1].Values
	lat := record[2].Components[0].Value
	lon := record[2].Components[1].Value

	fmt.Printf("%s: phones=%v, location=(%s, %s)\n", name, phones, lat, lon)
}
Output:

Headers: name, phone, geo
Alice: phones=[555-1234 555-5678], location=(34.0522, -118.2437)
Bob: phones=[555-9999], location=(40.7128, -74.0060)

Constants

const (
	DefaultArrayDelimiter     = '~' // IETF Section 2.3.2: recommended for array fields
	DefaultComponentDelimiter = '^' // IETF Section 2.3.2: recommended for structured fields
)

Default delimiters as recommended in IETF CSV++ Section 2.3.2. The specification suggests delimiter progression: ~ → ^ → ; → : for nested structures.

const DefaultMaxNestingDepth = 10

DefaultMaxNestingDepth is the default maximum nesting depth. IETF Section 5 (Security Considerations) recommends limiting nesting depth to prevent stack overflow attacks from maliciously crafted input.

Variables

var (
	ErrNoHeader       = errors.New("csvpp: header record is required")
	ErrInvalidHeader  = errors.New("csvpp: invalid column header format")
	ErrNestingTooDeep = errors.New("csvpp: nesting level exceeds limit")
)

Error definitions.

Functions

func HasFormulaPrefix (added in v0.0.2)

func HasFormulaPrefix(s string) bool

HasFormulaPrefix reports whether s starts with a character that spreadsheet applications may interpret as a formula. These characters are: '=', '+', '-', '@'.

When CSV files are opened in spreadsheet applications like Microsoft Excel or Google Sheets, values beginning with these characters may be executed as formulas, potentially leading to security vulnerabilities (CSV injection).

This function helps identify potentially dangerous values so that applications can take appropriate action, such as prefixing with a single quote or rejecting the input.

Example:

if csvpp.HasFormulaPrefix(value) {
    value = "'" + value // Escape for spreadsheet safety
}

func Marshal

func Marshal(w io.Writer, src any) error

Marshal encodes a slice of structs to CSV++ data.

Example
people := []Person{
	{Name: "Alice", Phones: []string{"555-1234", "555-5678"}},
	{Name: "Bob", Phones: []string{"555-9999"}},
}

var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
	log.Fatal(err)
}

fmt.Print(buf.String())
Output:

name,phone[]
Alice,555-1234~555-5678
Bob,555-9999

func MarshalWriter

func MarshalWriter(w *Writer, src any) error

MarshalWriter encodes a slice of structs to a Writer.

func Unmarshal

func Unmarshal(r io.Reader, dst any) error

Unmarshal decodes CSV++ data into a slice of structs. dst must be a pointer to a slice of structs.

Example
input := `name,phone[]
Alice,555-1234~555-5678
Bob,555-9999
`

var people []Person
if err := csvpp.Unmarshal(strings.NewReader(input), &people); err != nil {
	log.Fatal(err)
}

for _, p := range people {
	fmt.Printf("%s: %v\n", p.Name, p.Phones)
}
Output:

Alice: [555-1234 555-5678]
Bob: [555-9999]
Example (Structured)
input := `name,geo(lat^lon)
Los Angeles,34.0522^-118.2437
New York,40.7128^-74.0060
`

var locations []Location
if err := csvpp.Unmarshal(strings.NewReader(input), &locations); err != nil {
	log.Fatal(err)
}

for _, loc := range locations {
	fmt.Printf("%s: (%s, %s)\n", loc.Name, loc.Geo.Lat, loc.Geo.Lon)
}
Output:

Los Angeles: (34.0522, -118.2437)
New York: (40.7128, -74.0060)

func UnmarshalReader

func UnmarshalReader(r *Reader, dst any) error

UnmarshalReader decodes from a Reader into a slice of structs.

Types

type ColumnHeader

type ColumnHeader struct {
	Name               string          // Field name (ABNF: name = 1*field-char)
	Kind               FieldKind       // Field type (IETF Section 2.2)
	ArrayDelimiter     rune            // Array delimiter (ABNF: delimiter)
	ComponentDelimiter rune            // Component delimiter (ABNF: component-delim)
	Components         []*ColumnHeader // Component list (ABNF: component-list)
}

ColumnHeader represents the declaration information for an individual field. It corresponds to the ABNF "field" rule in IETF CSV++ Section 2.2:

field = simple-field / array-field / struct-field / array-struct-field
name  = 1*field-char
field-char = ALPHA / DIGIT / "_" / "-"
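The field-char rule above can be checked directly. A sketch of a validator for the name production (an illustration of the ABNF, not the package's own validation code):

```go
package main

import "fmt"

// validName reports whether s matches the ABNF name = 1*field-char,
// where field-char = ALPHA / DIGIT / "_" / "-". Non-ASCII runes are
// rejected, consistent with the ASCII restriction on header names.
func validName(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z',
			c >= '0' && c <= '9', c == '_', c == '-':
			// allowed field-char
		default:
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validName("geo_lat-2"), validName("naïve"), validName(""))
}
```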

type Field

type Field struct {
	Value      string   // Value for SimpleField
	Values     []string // Values for ArrayField (IETF Section 2.2.2)
	Components []*Field // Components for StructuredField/ArrayStructuredField (IETF Section 2.2.3/2.2.4)
}

Field represents a parsed field value from a data row. The populated fields depend on the corresponding ColumnHeader.Kind:

  • SimpleField: Value is set
  • ArrayField: Values is set
  • StructuredField: Components is set (each component is a Field)
  • ArrayStructuredField: Components is set (each is a Field with its own Components)
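Given these populated-field rules, rejoining a parsed Field into cell text can be sketched recursively. This uses a local copy of the Field shape shown above and assumes the default '~' and '^' delimiters; it handles Simple, Array, and Structured fields — a full round-trip would also need the header's Kind, since Structured and ArrayStructured both populate Components:

```go
package main

import (
	"fmt"
	"strings"
)

// Field mirrors the shape documented above (local copy for illustration).
type Field struct {
	Value      string
	Values     []string
	Components []*Field
}

// flatten rejoins a Field into cell text using the default delimiters:
// array values join with '~', structure components join with '^'.
func flatten(f *Field) string {
	switch {
	case f.Values != nil:
		return strings.Join(f.Values, "~")
	case f.Components != nil:
		parts := make([]string, len(f.Components))
		for i, c := range f.Components {
			parts[i] = flatten(c)
		}
		return strings.Join(parts, "^")
	default:
		return f.Value
	}
}

func main() {
	geo := &Field{Components: []*Field{{Value: "34.0522"}, {Value: "-118.2437"}}}
	fmt.Println(flatten(geo))
}
```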

type FieldKind

type FieldKind int

FieldKind represents the type of field as defined in IETF CSV++ Section 2.2. See: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

const (
	SimpleField          FieldKind = iota // IETF Section 2.2.1: simple-field = name
	ArrayField                            // IETF Section 2.2.2: array-field = name "[" [delimiter] "]"
	StructuredField                       // IETF Section 2.2.3: struct-field = name [component-delim] "(" component-list ")"
	ArrayStructuredField                  // IETF Section 2.2.4: array-struct-field = name "[" [delimiter] "]" [component-delim] "(" component-list ")"
)

func (FieldKind) String

func (k FieldKind) String() string

String returns the string representation of FieldKind.

type ParseError

type ParseError struct {
	Line   int    // Line number where the error occurred (1-based)
	Column int    // Column number where the error occurred (1-based)
	Field  string // Field name (if available)
	Err    error  // Original error
}

ParseError holds detailed information about an error that occurred during parsing.

func (*ParseError) Error

func (e *ParseError) Error() string

Error returns the error message for ParseError.

func (*ParseError) Unwrap

func (e *ParseError) Unwrap() error

Unwrap returns the original error.

type Reader

type Reader struct {
	// Comma is the field delimiter (default: ',').
	Comma rune
	// Comment is the comment character (disabled if 0).
	Comment rune
	// LazyQuotes relaxes strict quote checking if true.
	LazyQuotes bool
	// TrimLeadingSpace trims leading whitespace from fields if true.
	TrimLeadingSpace bool
	// MaxNestingDepth is the maximum nesting depth for structured fields (default: 10).
	// This limit prevents stack overflow from deeply nested input (IETF Section 5).
	// If 0, DefaultMaxNestingDepth is used.
	MaxNestingDepth int
	// contains filtered or unexported fields
}

Reader reads CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Reader and provides CSV++ header parsing and field parsing. The first row is always treated as the header row (IETF Section 2.1).

func NewReader

func NewReader(r io.Reader) *Reader

NewReader creates a new Reader.

Example (CustomDelimiter)
// Using semicolon as field delimiter (common in European locales)
input := `name;age
Alice;30
Bob;25
`

reader := csvpp.NewReader(strings.NewReader(input))
reader.Comma = ';'

records, err := reader.ReadAll()
if err != nil {
	log.Fatal(err)
}

for _, record := range records {
	fmt.Printf("%s is %s\n", record[0].Value, record[1].Value)
}
Output:

Alice is 30
Bob is 25

func (*Reader) Headers

func (r *Reader) Headers() ([]*ColumnHeader, error)

Headers returns the parsed header information. If headers have not been parsed yet, the first row is read and parsed.

Example
input := `id,name,tags[],address(street^city^zip)
1,Alice,go~rust,123 Main^LA^90210
`

reader := csvpp.NewReader(strings.NewReader(input))
headers, err := reader.Headers()
if err != nil {
	log.Fatal(err)
}

for _, h := range headers {
	fmt.Printf("%s: %s\n", h.Name, h.Kind)
}
Output:

id: SimpleField
name: SimpleField
tags: ArrayField
address: StructuredField

func (*Reader) Read

func (r *Reader) Read() ([]*Field, error)

Read reads and returns one record's worth of fields. The header row is automatically parsed on the first call. Returns io.EOF when the end of file is reached.

Example
input := `name,scores[]
Alice,100~95~88
Bob,77~82
`

reader := csvpp.NewReader(strings.NewReader(input))

for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%s: %v\n", record[0].Value, record[1].Values)
}
Output:

Alice: [100 95 88]
Bob: [77 82]

func (*Reader) ReadAll

func (r *Reader) ReadAll() ([][]*Field, error)

ReadAll reads and returns all records. The header row is automatically parsed on the first call.

Example
input := `name,age
Alice,30
Bob,25
Charlie,35
`

reader := csvpp.NewReader(strings.NewReader(input))
records, err := reader.ReadAll()
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Read %d records\n", len(records))
for _, record := range records {
	fmt.Printf("%s is %s years old\n", record[0].Value, record[1].Value)
}
Output:

Read 3 records
Alice is 30 years old
Bob is 25 years old
Charlie is 35 years old

type Writer

type Writer struct {
	// Comma is the field delimiter (default: ',').
	Comma rune
	// UseCRLF uses \r\n as the line terminator if true.
	UseCRLF bool
	// contains filtered or unexported fields
}

Writer writes CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Writer and serializes CSV++ fields using the delimiters defined in the headers. The output is RFC 4180 compliant.

Example
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
	{Name: "name", Kind: csvpp.SimpleField},
	{Name: "tags", Kind: csvpp.ArrayField, ArrayDelimiter: '~'},
}
writer.SetHeaders(headers)

if err := writer.WriteHeader(); err != nil {
	log.Fatal(err)
}

records := [][]*csvpp.Field{
	{{Value: "Alice"}, {Values: []string{"go", "rust"}}},
	{{Value: "Bob"}, {Values: []string{"python"}}},
}

for _, record := range records {
	if err := writer.Write(record); err != nil {
		log.Fatal(err)
	}
}
writer.Flush()

fmt.Print(buf.String())
Output:

name,tags[]
Alice,go~rust
Bob,python

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a new Writer.

func (*Writer) Error

func (w *Writer) Error() error

Error returns any error that occurred during writing.

func (*Writer) Flush

func (w *Writer) Flush()

Flush flushes the buffer.

func (*Writer) SetHeaders

func (w *Writer) SetHeaders(headers []*ColumnHeader)

SetHeaders sets the header information. This must be called before WriteHeader or Write.

func (*Writer) Write

func (w *Writer) Write(record []*Field) error

Write writes one record's worth of fields.

func (*Writer) WriteAll

func (w *Writer) WriteAll(records [][]*Field) error

WriteAll writes all records. The header row is also written automatically.

Example
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
	{Name: "name", Kind: csvpp.SimpleField},
	{Name: "score", Kind: csvpp.SimpleField},
}
writer.SetHeaders(headers)

records := [][]*csvpp.Field{
	{{Value: "Alice"}, {Value: "100"}},
	{{Value: "Bob"}, {Value: "95"}},
}

if err := writer.WriteAll(records); err != nil {
	log.Fatal(err)
}

fmt.Print(buf.String())
Output:

name,score
Alice,100
Bob,95

func (*Writer) WriteHeader

func (w *Writer) WriteHeader() error

WriteHeader writes the header row.

Directories

Package csvpputil provides utility functions for converting CSV++ data to other formats such as JSON and YAML.
