Documentation ¶
Overview ¶
Package csvpp implements the IETF CSV++ specification (draft-mscaldas-csvpp-01).
CSV++ extends traditional CSV to support arrays and structured fields within cells, enabling complex data representation while maintaining CSV's simplicity. This package wraps encoding/csv and is fully compatible with RFC 4180.
CSV++ introduces four field types beyond simple text values:
- Simple: "name" - plain text value
- Array: "tags[]" - multiple values separated by a delimiter (default: ~)
- Structured: "geo(lat^lon)" - named components separated by a delimiter (default: ^)
- ArrayStructured: "addresses[](street^city)" - array of structured values
These field types are represented by the FieldKind constants: SimpleField, ArrayField, StructuredField, and ArrayStructuredField.
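For illustration, a header row combining all four kinds, with one data row using the default ~ and ^ delimiters, could look like this (the column names are invented for the example):

    id,tags[],geo(lat^lon),addresses[](street^city)
    1,go~rust,34.0522^-118.2437,123 Main^LA~456 Oak^NY

The array cell splits on ~, the structured cell splits on ^ into named components, and the array-structured cell splits first on ~ into elements and then on ^ within each element.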
Basic Usage ¶
Reading CSV++ data:
r := csvpp.NewReader(file)

// Get parsed headers
headers, err := r.Headers()
if err != nil {
    log.Fatal(err)
}

// Read records
for {
    record, err := r.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    // process record
}
Writing CSV++ data:
w := csvpp.NewWriter(file)
w.SetHeaders(headers)
if err := w.WriteHeader(); err != nil {
    log.Fatal(err)
}
for _, record := range records {
    if err := w.Write(record); err != nil {
        log.Fatal(err)
    }
}
w.Flush()
if err := w.Error(); err != nil {
    log.Fatal(err)
}
Struct Mapping ¶
Use Marshal and Unmarshal for automatic struct mapping with struct tags:
type Person struct {
    Name   string   `csvpp:"name"`
    Phones []string `csvpp:"phone[]"`
    Geo    struct {
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
}

// Read into structs
var people []Person
if err := csvpp.Unmarshal(file, &people); err != nil {
    log.Fatal(err)
}

// Write from structs
var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
    log.Fatal(err)
}
Delimiter Conventions ¶
The IETF CSV++ specification recommends using specific delimiters for nested structures to avoid conflicts. The recommended progression is:
- Level 1 (arrays): ~ (tilde)
- Level 2 (components): ^ (caret)
- Level 3: ; (semicolon)
- Level 4: : (colon)
This package uses ~ and ^ as defaults, matching the IETF recommendation.
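The ABNF in Section 2.2 (see the FieldKind constants below) also allows a header to name its delimiter explicitly, for example tags[;] to use a semicolon for that column's array values. A hedged sketch of reading such a header, assuming the bracketed delimiter overrides the default as the grammar suggests:

    input := `tags[;]
    go;rust;zig
    `
    r := csvpp.NewReader(strings.NewReader(input))
    records, err := r.ReadAll()
    if err != nil {
        log.Fatal(err)
    }
    // The declared ';' is used instead of the default '~'.
    fmt.Println(records[0][0].Values) // expected: [go rust zig]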
Compatibility with encoding/csv ¶
This package wraps encoding/csv and inherits its RFC 4180 compliance. The Reader and Writer types expose the same configuration options:
- Comma: field delimiter (default: ',')
- Comment: comment character (Reader only)
- LazyQuotes: relaxed quote handling (Reader only)
- TrimLeadingSpace: trim leading whitespace (Reader only)
- UseCRLF: use \r\n line endings (Writer only)
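These options are set directly on the Reader or Writer before the first read or write, just as with encoding/csv. A minimal sketch (file is an open io.Reader):

    r := csvpp.NewReader(file)
    r.Comma = '\t'            // read tab-separated CSV++ input
    r.Comment = '#'           // skip lines that begin with '#'
    r.TrimLeadingSpace = true // ignore whitespace following the delimiter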
Security Considerations ¶
The MaxNestingDepth option (default: 10) limits the depth of nested structures to prevent stack overflow attacks from maliciously crafted input.
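The limit can be tightened for untrusted input; exceeding it surfaces as ErrNestingTooDeep (a minimal sketch, with untrustedInput standing in for the caller's io.Reader):

    r := csvpp.NewReader(untrustedInput)
    r.MaxNestingDepth = 3 // reject structures nested deeper than three levels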
CSV Injection ¶
When CSV files are opened in spreadsheet applications (Excel, Google Sheets, etc.), values beginning with '=', '+', '-', or '@' may be interpreted as formulas. This can lead to security vulnerabilities known as "CSV injection" or "formula injection".
Use the HasFormulaPrefix function to detect potentially dangerous values:
for _, field := range record {
    if csvpp.HasFormulaPrefix(field.Value) {
        field.Value = "'" + field.Value // Escape for spreadsheet safety
    }
}
Note: to preserve data integrity, this package does not automatically escape formula prefixes. Applications should implement appropriate escaping based on their specific security requirements and target environments.
Errors ¶
The package defines the following sentinel errors:
- ErrNoHeader: returned when attempting to read without a header row
- ErrInvalidHeader: returned when header format is invalid
- ErrNestingTooDeep: returned when nesting exceeds MaxNestingDepth
Parse errors are wrapped in ParseError, which provides line/column information.
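A hedged sketch of inspecting read errors with the standard errors package, assuming ParseError follows Go's error-wrapping conventions:

    record, err := r.Read()
    if err != nil && err != io.EOF {
        if errors.Is(err, csvpp.ErrNestingTooDeep) {
            log.Println("input exceeds the configured MaxNestingDepth")
        }
        var pe *csvpp.ParseError
        if errors.As(err, &pe) {
            log.Printf("parse error at line %d, column %d: %v", pe.Line, pe.Column, pe.Err)
        }
    }
    // process record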
Constants ¶
Default delimiters follow IETF recommendations:
- DefaultArrayDelimiter: ~ (tilde) for array fields
- DefaultComponentDelimiter: ^ (caret) for structured fields
- DefaultMaxNestingDepth: 10 (IETF recommended limit)
Specification Reference ¶
For the complete IETF CSV++ specification, see: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/
Example ¶
input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
Bob,555-9999,40.7128^-74.0060
`
reader := csvpp.NewReader(strings.NewReader(input))

// Get headers
headers, err := reader.Headers()
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Headers: %s, %s, %s\n", headers[0].Name, headers[1].Name, headers[2].Name)

// Read all records
for {
    record, err := reader.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }

    name := record[0].Value
    phones := record[1].Values
    lat := record[2].Components[0].Value
    lon := record[2].Components[1].Value
    fmt.Printf("%s: phones=%v, location=(%s, %s)\n", name, phones, lat, lon)
}
Output:

Headers: name, phone, geo
Alice: phones=[555-1234 555-5678], location=(34.0522, -118.2437)
Bob: phones=[555-9999], location=(40.7128, -74.0060)
Index ¶
- Constants
- Variables
- func HasFormulaPrefix(s string) bool
- func Marshal(w io.Writer, src any) error
- func MarshalWriter(w *Writer, src any) error
- func Unmarshal(r io.Reader, dst any) error
- func UnmarshalReader(r *Reader, dst any) error
- type ColumnHeader
- type Field
- type FieldKind
- type ParseError
- type Reader
- type Writer
Constants ¶
const (
    DefaultArrayDelimiter     = '~' // IETF Section 2.3.2: recommended for array fields
    DefaultComponentDelimiter = '^' // IETF Section 2.3.2: recommended for structured fields
)
Default delimiters as recommended in IETF CSV++ Section 2.3.2. The specification suggests delimiter progression: ~ → ^ → ; → : for nested structures.
const DefaultMaxNestingDepth = 10
DefaultMaxNestingDepth is the default maximum nesting depth. IETF Section 5 (Security Considerations) recommends limiting nesting depth to prevent stack overflow attacks from maliciously crafted input.
Variables ¶
var (
    ErrNoHeader       = errors.New("csvpp: header record is required")
    ErrInvalidHeader  = errors.New("csvpp: invalid column header format")
    ErrNestingTooDeep = errors.New("csvpp: nesting level exceeds limit")
)
Error definitions.
Functions ¶
func HasFormulaPrefix ¶ added in v0.0.2
func HasFormulaPrefix(s string) bool
HasFormulaPrefix reports whether s starts with a character that spreadsheet applications may interpret as a formula. These characters are: '=', '+', '-', '@'.
When CSV files are opened in spreadsheet applications like Microsoft Excel or Google Sheets, values beginning with these characters may be executed as formulas, potentially leading to security vulnerabilities (CSV injection).
This function helps identify potentially dangerous values so that applications can take appropriate action, such as prefixing with a single quote or rejecting the input.
Example:
if csvpp.HasFormulaPrefix(value) {
    value = "'" + value // Escape for spreadsheet safety
}
func Marshal ¶
func Marshal(w io.Writer, src any) error
Marshal encodes a slice of structs to CSV++ data.
Example ¶
people := []Person{
    {Name: "Alice", Phones: []string{"555-1234", "555-5678"}},
    {Name: "Bob", Phones: []string{"555-9999"}},
}

var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
    log.Fatal(err)
}
fmt.Print(buf.String())
Output:

name,phone[]
Alice,555-1234~555-5678
Bob,555-9999
func MarshalWriter ¶
func MarshalWriter(w *Writer, src any) error
MarshalWriter encodes a slice of structs to a Writer.
func Unmarshal ¶
func Unmarshal(r io.Reader, dst any) error
Unmarshal decodes CSV++ data into a slice of structs. dst must be a pointer to a slice of structs.
Example ¶
input := `name,phone[]
Alice,555-1234~555-5678
Bob,555-9999
`
var people []Person
if err := csvpp.Unmarshal(strings.NewReader(input), &people); err != nil {
    log.Fatal(err)
}
for _, p := range people {
    fmt.Printf("%s: %v\n", p.Name, p.Phones)
}
Output:

Alice: [555-1234 555-5678]
Bob: [555-9999]
Example (Structured) ¶
input := `name,geo(lat^lon)
Los Angeles,34.0522^-118.2437
New York,40.7128^-74.0060
`
var locations []Location
if err := csvpp.Unmarshal(strings.NewReader(input), &locations); err != nil {
    log.Fatal(err)
}
for _, loc := range locations {
    fmt.Printf("%s: (%s, %s)\n", loc.Name, loc.Geo.Lat, loc.Geo.Lon)
}
Output:

Los Angeles: (34.0522, -118.2437)
New York: (40.7128, -74.0060)
func UnmarshalReader ¶
func UnmarshalReader(r *Reader, dst any) error
UnmarshalReader decodes from a Reader into a slice of structs.
Types ¶
type ColumnHeader ¶
type ColumnHeader struct {
    Name               string          // Field name (ABNF: name = 1*field-char)
    Kind               FieldKind       // Field type (IETF Section 2.2)
    ArrayDelimiter     rune            // Array delimiter (ABNF: delimiter)
    ComponentDelimiter rune            // Component delimiter (ABNF: component-delim)
    Components         []*ColumnHeader // Component list (ABNF: component-list)
}
ColumnHeader represents the declaration information for an individual field. It corresponds to the ABNF "field" rule in IETF CSV++ Section 2.2:
field      = simple-field / array-field / struct-field / array-struct-field
name       = 1*field-char
field-char = ALPHA / DIGIT / "_" / "-"
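For example, the header geo(lat^lon) used throughout this documentation corresponds to a value shaped like the following sketch (the parser may also populate unexported state):

    geo := &csvpp.ColumnHeader{
        Name:               "geo",
        Kind:               csvpp.StructuredField,
        ComponentDelimiter: '^',
        Components: []*csvpp.ColumnHeader{
            {Name: "lat", Kind: csvpp.SimpleField},
            {Name: "lon", Kind: csvpp.SimpleField},
        },
    }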
type Field ¶
type Field struct {
    Value      string   // Value for SimpleField
    Values     []string // Values for ArrayField (IETF Section 2.2.2)
    Components []*Field // Components for StructuredField/ArrayStructuredField (IETF Section 2.2.3/2.2.4)
}
Field represents a parsed field value from a data row. The populated fields depend on the corresponding ColumnHeader.Kind:
- SimpleField: Value is set
- ArrayField: Values is set
- StructuredField: Components is set (each component is a Field)
- ArrayStructuredField: Components is set (each is a Field with its own Components)
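For instance, under the header geo(lat^lon), the cell 34.0522^-118.2437 parses into a Field shaped roughly like this sketch:

    f := &csvpp.Field{
        Components: []*csvpp.Field{
            {Value: "34.0522"},   // lat component
            {Value: "-118.2437"}, // lon component
        },
    }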
type FieldKind ¶
type FieldKind int
FieldKind represents the type of field as defined in IETF CSV++ Section 2.2. See: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/
const (
    SimpleField          FieldKind = iota // IETF Section 2.2.1: simple-field = name
    ArrayField                            // IETF Section 2.2.2: array-field = name "[" [delimiter] "]"
    StructuredField                       // IETF Section 2.2.3: struct-field = name [component-delim] "(" component-list ")"
    ArrayStructuredField                  // IETF Section 2.2.4: array-struct-field = name "[" [delimiter] "]" [component-delim] "(" component-list ")"
)
type ParseError ¶
type ParseError struct {
    Line   int    // Line number where the error occurred (1-based)
    Column int    // Column number where the error occurred (1-based)
    Field  string // Field name (if available)
    Err    error  // Original error
}
ParseError holds detailed information about an error that occurred during parsing.
func (*ParseError) Error ¶
func (e *ParseError) Error() string
Error returns the error message for ParseError.
type Reader ¶
type Reader struct {
    // Comma is the field delimiter (default: ',').
    Comma rune

    // Comment is the comment character (disabled if 0).
    Comment rune

    // LazyQuotes relaxes strict quote checking if true.
    LazyQuotes bool

    // TrimLeadingSpace trims leading whitespace from fields if true.
    TrimLeadingSpace bool

    // MaxNestingDepth is the maximum nesting depth for structured fields (default: 10).
    // This limit prevents stack overflow from deeply nested input (IETF Section 5).
    // If 0, DefaultMaxNestingDepth is used.
    MaxNestingDepth int
    // contains filtered or unexported fields
}
Reader reads CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Reader and provides CSV++ header parsing and field parsing. The first row is always treated as the header row (IETF Section 2.1).
func NewReader ¶
NewReader creates a new Reader.
Example (CustomDelimiter) ¶
// Using semicolon as field delimiter (common in European locales)
input := `name;age
Alice;30
Bob;25
`
reader := csvpp.NewReader(strings.NewReader(input))
reader.Comma = ';'
records, err := reader.ReadAll()
if err != nil {
log.Fatal(err)
}
for _, record := range records {
fmt.Printf("%s is %s\n", record[0].Value, record[1].Value)
}
Output:

Alice is 30
Bob is 25
func (*Reader) Headers ¶
func (r *Reader) Headers() ([]*ColumnHeader, error)
Headers returns the parsed header information. If headers have not been parsed yet, the first row is read and parsed.
Example ¶
input := `id,name,tags[],address(street^city^zip)
1,Alice,go~rust,123 Main^LA^90210
`
reader := csvpp.NewReader(strings.NewReader(input))

headers, err := reader.Headers()
if err != nil {
    log.Fatal(err)
}
for _, h := range headers {
    fmt.Printf("%s: %s\n", h.Name, h.Kind)
}
Output:

id: SimpleField
name: SimpleField
tags: ArrayField
address: StructuredField
func (*Reader) Read ¶
Read reads and returns one record's worth of fields. The header row is automatically parsed on the first call. Returns io.EOF when the end of file is reached.
Example ¶
input := `name,scores[]
Alice,100~95~88
Bob,77~82
`
reader := csvpp.NewReader(strings.NewReader(input))

for {
    record, err := reader.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s: %v\n", record[0].Value, record[1].Values)
}
Output:

Alice: [100 95 88]
Bob: [77 82]
func (*Reader) ReadAll ¶
ReadAll reads and returns all records. The header row is automatically parsed on the first call.
Example ¶
input := `name,age
Alice,30
Bob,25
Charlie,35
`
reader := csvpp.NewReader(strings.NewReader(input))

records, err := reader.ReadAll()
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Read %d records\n", len(records))
for _, record := range records {
    fmt.Printf("%s is %s years old\n", record[0].Value, record[1].Value)
}
Output:

Read 3 records
Alice is 30 years old
Bob is 25 years old
Charlie is 35 years old
type Writer ¶
type Writer struct {
    // Comma is the field delimiter (default: ',').
    Comma rune

    // UseCRLF uses \r\n as the line terminator if true.
    UseCRLF bool
    // contains filtered or unexported fields
}
Writer writes CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Writer and serializes CSV++ fields using the delimiters defined in the headers. The output is RFC 4180 compliant.
Example ¶
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
    {Name: "name", Kind: csvpp.SimpleField},
    {Name: "tags", Kind: csvpp.ArrayField, ArrayDelimiter: '~'},
}
writer.SetHeaders(headers)
if err := writer.WriteHeader(); err != nil {
    log.Fatal(err)
}

records := [][]*csvpp.Field{
    {{Value: "Alice"}, {Values: []string{"go", "rust"}}},
    {{Value: "Bob"}, {Values: []string{"python"}}},
}
for _, record := range records {
    if err := writer.Write(record); err != nil {
        log.Fatal(err)
    }
}
writer.Flush()
fmt.Print(buf.String())
Output:

name,tags[]
Alice,go~rust
Bob,python
func (*Writer) SetHeaders ¶
func (w *Writer) SetHeaders(headers []*ColumnHeader)
SetHeaders sets the header information. This must be called before WriteHeader or Write.
func (*Writer) WriteAll ¶
WriteAll writes all records. The header row is also written automatically.
Example ¶
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
    {Name: "name", Kind: csvpp.SimpleField},
    {Name: "score", Kind: csvpp.SimpleField},
}
writer.SetHeaders(headers)

records := [][]*csvpp.Field{
    {{Value: "Alice"}, {Value: "100"}},
    {{Value: "Bob"}, {Value: "95"}},
}
if err := writer.WriteAll(records); err != nil {
    log.Fatal(err)
}
fmt.Print(buf.String())
Output:

name,score
Alice,100
Bob,95
func (*Writer) WriteHeader ¶
WriteHeader writes the header row.