csv

package module
v0.0.0-...-0509d56 Latest
Published: Aug 23, 2018 License: MIT Imports: 13 Imported by: 0

README

CSV

A Golang package for reading and writing CSV-like documents.

Work in progress.

Why another CSV package?

Golang provides the encoding/csv package out of the box for reading and writing standard CSV files (as described in RFC 4180). However, not all CSV documents follow the spec. Although the built-in csv package offers some customization (for example the separator and the comment character), it cannot cover every variation of these CSV-like formats.

Also, the built-in csv package does not implement a marshaler/unmarshaler (like in json) for generating/parsing CSV documents from/into struct instances.

This package aims to offer better customizability than the built-in csv package, as well as an easy-to-use marshaler and unmarshaler.
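
For comparison, the sketch below shows the customization the standard library reader exposes. It uses only encoding/csv (not this package); the helper name readWithStdlib is made up for illustration. Note that the standard reader can trim leading spaces but has no option for trailing spaces or single-quoted fields.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readWithStdlib parses input with the standard library reader, using the
// options it exposes: separator, comment rune and leading-space trimming.
func readWithStdlib(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = ';'
	r.Comment = '#'
	r.TrimLeadingSpace = true
	return r.ReadAll()
}

func main() {
	rows, err := readWithStdlib("# comment\na; b;c \n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The trailing space in the last field is kept.
	fmt.Printf("%q\n", rows)
}
```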

Installation

$ go get -u github.com/beta/csv

Getting started

TBD.

Customization

csv uses setting functions for customization. The default settings accept a flexible variant of the standard CSV format:

  • The default encoding is UTF-8.

  • , is used as the default separator.

  • Single quotes are allowed. Escaping works exactly the same as double quotes.

    "field 1",'field 2','field 3 ''escaped''','field "4"'

    will be parsed as

    ["field 1", "field 2", "field 3 'escaped'", "field \"4\""]

  • Empty fields are allowed.

    field 1,,field 3

    will be parsed as

    ["field 1", "", "field 3"]

  • An ending line break in the last record is allowed.

  • Leading and trailing spaces in fields will be ignored.

    field 1 , field 2

    will be parsed as

    ["field 1", "field 2"]

  • Empty lines are omitted.

  • The marshaler by default outputs the header row based on the csv tag of struct fields.
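
The default reading rules listed above can be illustrated with a small stand-alone parser. This is only a sketch using the standard library, not the package's actual scanner: the names parseLine and parseDocument are hypothetical, and quoted fields spanning several lines are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseLine splits one line on ',' and accepts both ' and " as quotes.
// Inside a quoted field, a doubled quote stands for one literal quote.
func parseLine(line string) ([]string, error) {
	var fields []string
	rs := []rune(line)
	i := 0
	for {
		for i < len(rs) && rs[i] == ' ' { // ignore leading spaces
			i++
		}
		if i < len(rs) && (rs[i] == '"' || rs[i] == '\'') {
			quote := rs[i]
			i++
			var sb strings.Builder
			closed := false
			for i < len(rs) && !closed {
				switch {
				case rs[i] == quote && i+1 < len(rs) && rs[i+1] == quote:
					sb.WriteRune(quote) // escaped quote
					i += 2
				case rs[i] == quote:
					closed = true
					i++
				default:
					sb.WriteRune(rs[i])
					i++
				}
			}
			if !closed {
				return nil, errors.New("unterminated quoted field")
			}
			for i < len(rs) && rs[i] == ' ' { // ignore trailing spaces
				i++
			}
			if i < len(rs) && rs[i] != ',' {
				return nil, errors.New("unexpected character after closing quote")
			}
			fields = append(fields, sb.String())
		} else {
			start := i
			for i < len(rs) && rs[i] != ',' {
				i++
			}
			fields = append(fields, strings.TrimRight(string(rs[start:i]), " "))
		}
		if i >= len(rs) {
			return fields, nil
		}
		i++ // skip the separator
	}
}

// parseDocument parses every non-empty line; empty lines are omitted, so an
// ending line break after the last record is harmless.
func parseDocument(doc string) ([][]string, error) {
	var rows [][]string
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimRight(line, "\r")
		if strings.TrimSpace(line) == "" {
			continue
		}
		row, err := parseLine(line)
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	rows, err := parseDocument("field 1 , field 2\n\nfield 1,,'it''s'\n")
	fmt.Printf("%q %v\n", rows, err)
}
```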

The tables below list all the settings that can be used to customize the behavior of csv.

Common settings

| Setting | Description | Default |
| --- | --- | --- |
| Encoding(encoding.Encoding) | Sets the character encoding used while reading and writing a document. | unicode.UTF8 |
| Separator(rune) | Sets the separator used to separate fields while reading and writing a document. | , |
| Prefix(rune) | Sets the prefix of every field while reading and writing a document. | |
| Suffix(rune) | Sets the suffix of every field while reading and writing a document. | |

Scanner settings

| Setting | Description | Default |
| --- | --- | --- |
| AllowSingleQuote(bool) | Sets whether single quotes are allowed while scanning a document. | true |
| AllowEmptyField(bool) | Sets whether empty fields are allowed while scanning a document. | true |
| AllowEndingLineBreakInLastRecord(bool) | Sets whether the last record may have an ending line break while reading a document. | true |
| OmitLeadingSpace(bool) | Sets whether the leading spaces of fields should be omitted while scanning a document. | true |
| OmitTrailingSpace(bool) | Sets whether the trailing spaces of fields should be omitted while scanning a document. | true |
| OmitEmptyLine(bool) | Sets whether empty lines should be omitted while reading a document. | true |
| Comment(rune) | Sets the leading rune of comments used while scanning a document. | |
| IgnoreBOM(bool) | Sets whether the leading BOM (byte order mark) should be ignored while reading a document. If not, the BOM will be treated as normal content. This should not be done by a csv package, but since Golang has no built-in support for BOM, a workaround is required. | true |

Unmarshaler and marshaler settings

| Setting | Description | Default |
| --- | --- | --- |
| HeaderPrefix(rune) | Sets the prefix rune of header names while unmarshaling and marshaling a document. If a header prefix is set, the Prefix setting will be ignored while reading and writing the header row, but will still be used for fields. | |
| HeaderSuffix(rune) | Sets the suffix rune of header names while unmarshaling and marshaling a document. If a header suffix is set, the Suffix setting will be ignored while reading and writing the header row, but will still be used for fields. | |
| FieldPrefix(rune) | Sets the prefix rune of fields while unmarshaling and marshaling a document. If a field prefix is set, the Prefix setting will be ignored while reading and writing fields, but will still be used for the header. | |
| FieldSuffix(rune) | Sets the suffix rune of fields while unmarshaling and marshaling a document. If a field suffix is set, the Suffix setting will be ignored while reading and writing fields, but will still be used for the header. | |

Unmarshaler settings

| Setting | Description | Default |
| --- | --- | --- |
| Validator(string, func(interface{}) bool) | Adds a new validator function for validating a CSV value while unmarshaling a document. | |

Marshaler settings

| Setting | Description | Default |
| --- | --- | --- |
| WriteHeader(bool) | Sets whether to output the header row while writing the document. | true |

All scanner settings can be used in an unmarshaler. Also, all generator settings can be used in a marshaler.

Besides the settings above, there is a special setting named RFC4180 which applies the requirements described in RFC 4180, including

  • using , as the separator,
  • no prefix and suffix,
  • not allowing single quotes,
  • not allowing empty fields,
  • allowing an ending line break in the last record,
  • not omitting leading and trailing spaces,
  • not omitting empty lines, and
  • not allowing comments.
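
One way to picture such a preset is as a single setting that overwrites every relevant option at once. The sketch below is an assumption about the shape only: the rule struct, its field names, flexibleDefaults and strictPreset are all hypothetical stand-ins, since the package's rule type is unexported and not documented here.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for the package's unexported rule type.
type rule struct {
	separator            rune
	prefix, suffix       rune // 0 means "none"
	comment              rune // 0 means "comments not allowed"
	allowSingleQuote     bool
	allowEmptyField      bool
	allowEndingLineBreak bool
	omitLeadingSpace     bool
	omitTrailingSpace    bool
	omitEmptyLine        bool
}

type setting func(*rule)

// flexibleDefaults mirrors the defaults listed under "Customization".
func flexibleDefaults() rule {
	return rule{
		separator:            ',',
		allowSingleQuote:     true,
		allowEmptyField:      true,
		allowEndingLineBreak: true,
		omitLeadingSpace:     true,
		omitTrailingSpace:    true,
		omitEmptyLine:        true,
	}
}

// strictPreset applies every item of the list above in one setting.
func strictPreset() setting {
	return func(r *rule) {
		r.separator = ','
		r.prefix, r.suffix = 0, 0
		r.allowSingleQuote = false
		r.allowEmptyField = false
		r.allowEndingLineBreak = true
		r.omitLeadingSpace = false
		r.omitTrailingSpace = false
		r.omitEmptyLine = false
		r.comment = 0
	}
}

func main() {
	r := flexibleDefaults()
	strictPreset()(&r)
	fmt.Printf("%+v\n", r)
}
```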

License

MIT

Documentation

Overview

Package csv implements a parser and generator for CSV-like documents.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Marshal

func Marshal(v interface{}, settings ...Setting) ([]byte, error)

Marshal generates a CSV document from v with the given settings.

v should be an array or slice of structs or struct pointers. In these structs, each exported field will be marshaled as a CSV field, with the field name as the column header. This can be customized with a "csv" struct field tag, which gives the name of the field. Use "-" to omit a field from being marshaled. Use "-," to set the header name to "-".

Below are some examples of using the "csv" struct field tag.

// Field will be marshaled with "myName" as its header name.
Field int `csv:"myName"`

// Field is ignored.
Field int `csv:"-"`

// Field will be marshaled with "-" as its header name.
Field int `csv:"-,"`

Marshal supports the following types:

A boolean value will be marshaled to "true" or "false" based on its value.

A floating point, integer or number value will be marshaled to the string representation of its value.

A string value will be marshaled to the value of itself.

Any type implementing encoding.TextMarshaler will be marshaled to the value returned by MarshalText.

If Marshal encounters a field with an unsupported type, an UnsupportedTypeError will be returned.

In order to marshal an unsupported type, a translator can be used to translate the value. A translator is a func(interface{}) ([]byte, error). Use the Translator setting to register one or more translators before marshaling. For example:

func TranslateIntSlice(slice interface{}) ([]byte, error) {
    ...
}

csv.Marshal(..., csv.Translator("intSlice", TranslateIntSlice))

To use a translator for an unsupported type, add it to the "csv" struct field tag. For example:

// Field will be marshaled with "myName" as its header name, and use
// translator with name "intSlice" to translate its value.
Field []int `csv:"myName,intSlice"`
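
The tag forms shown so far ("name", "-", "-," and "name,translator") can be decoded with the reflect package. The following is a minimal sketch, not the package's own code: fieldInfo, parseTag and describe are hypothetical names, and falling back to the Go field name when the tag name is empty is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldInfo is a hypothetical summary of one field's "csv" tag.
type fieldInfo struct {
	Header     string
	Translator string
	Skip       bool
}

// parseTag decodes the "csv" tag of a struct field.
func parseTag(f reflect.StructField) fieldInfo {
	tag, ok := f.Tag.Lookup("csv")
	if !ok {
		return fieldInfo{Header: f.Name}
	}
	if tag == "-" {
		return fieldInfo{Skip: true}
	}
	parts := strings.SplitN(tag, ",", 2)
	info := fieldInfo{Header: parts[0]}
	if info.Header == "" {
		info.Header = f.Name // assumption: empty name keeps the field name
	}
	if len(parts) == 2 {
		info.Translator = parts[1]
	}
	return info
}

type record struct {
	ID     int   `csv:"myName"`
	Secret int   `csv:"-"`
	Dash   int   `csv:"-,"`
	Tags   []int `csv:"tags,intSlice"`
	Plain  string
}

// describe returns the tag information of every exported, non-skipped field.
func describe(v interface{}) []fieldInfo {
	t := reflect.TypeOf(v)
	var out []fieldInfo
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" {
			continue // unexported field
		}
		if info := parseTag(f); !info.Skip {
			out = append(out, info)
		}
	}
	return out
}

func main() {
	for _, info := range describe(record{}) {
		fmt.Printf("%+v\n", info)
	}
}
```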

If a field can be marshaled in more than one way, the following order of precedence applies:

  1. The translator specified in the "csv" struct field tag.
  2. The MarshalText method of the field.
  3. The default marshaling for the field's type, if the type is supported.
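
The precedence above can be sketched with a type switch. This is an illustration under assumptions, not the package's implementation: marshalValue, the translator type alias and the level type are hypothetical, only a few primitive types are covered, and a plain error stands in for UnsupportedTypeError.

```go
package main

import (
	"encoding"
	"fmt"
	"strconv"
)

// translator matches the func(interface{}) ([]byte, error) shape described above.
type translator func(interface{}) ([]byte, error)

// marshalValue tries the translator first, then encoding.TextMarshaler,
// then a default conversion for a few primitive types.
func marshalValue(v interface{}, tr translator) ([]byte, error) {
	if tr != nil {
		return tr(v)
	}
	if tm, ok := v.(encoding.TextMarshaler); ok {
		return tm.MarshalText()
	}
	switch x := v.(type) {
	case bool:
		return []byte(strconv.FormatBool(x)), nil
	case int:
		return []byte(strconv.Itoa(x)), nil
	case float64:
		return []byte(strconv.FormatFloat(x, 'g', -1, 64)), nil
	case string:
		return []byte(x), nil
	}
	return nil, fmt.Errorf("unsupported type %T", v)
}

// level is a hypothetical type that implements encoding.TextMarshaler.
type level int

func (l level) MarshalText() ([]byte, error) {
	return []byte("L" + strconv.Itoa(int(l))), nil
}

func main() {
	viaSprint := func(v interface{}) ([]byte, error) { return []byte(fmt.Sprint(v)), nil }

	a, _ := marshalValue(level(3), nil)
	b, _ := marshalValue([]int{1, 2, 3}, viaSprint)
	_, err := marshalValue([]int{1, 2, 3}, nil)
	fmt.Println(string(a), string(b), err)
}
```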

func Unmarshal

func Unmarshal(data []byte, dest interface{}, settings ...Setting) error

Unmarshal parses a CSV document and stores the result in the struct slice pointed to by dest. If dest is nil or not a pointer to a struct slice, Unmarshal returns an InvalidUnmarshalError.
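
A destination check of this kind is usually written with reflect. The sketch below is an assumption about how such a check could look, not the package's code: checkDest and person are hypothetical names, a plain error stands in for InvalidUnmarshalError, and only slices of structs (not of struct pointers) are accepted.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkDest reports an error unless dest is a non-nil pointer to a slice of structs.
func checkDest(dest interface{}) error {
	v := reflect.ValueOf(dest)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return fmt.Errorf("invalid destination %v: want non-nil pointer", reflect.TypeOf(dest))
	}
	slice := v.Type().Elem()
	if slice.Kind() != reflect.Slice || slice.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("invalid destination %v: want pointer to struct slice", v.Type())
	}
	return nil
}

// person is a hypothetical record type.
type person struct {
	Name string
}

func main() {
	var people []person
	fmt.Println(checkDest(&people)) // accepted
	fmt.Println(checkDest(people))  // rejected: not a pointer
	fmt.Println(checkDest(nil))     // rejected: nil
}
```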

Types

type Generator

type Generator struct {
	// contains filtered or unexported fields
}

A Generator generates a new CSV document.

func NewGenerator

func NewGenerator(settings ...Setting) *Generator

NewGenerator creates and returns a new generator with the given settings.

func (*Generator) Finish

func (g *Generator) Finish() ([]byte, error)

Finish finishes writing to the generator and returns the data of the document.

After calling Finish, the generator can no longer be written to. Any call to Write or WriteAll will return an error.

func (*Generator) Write

func (g *Generator) Write(record []string) error

Write writes a record row to the end of the document.

If Finish has been called, Write returns an error.

func (*Generator) WriteAll

func (g *Generator) WriteAll(records [][]string) error

WriteAll writes all the rows in records to the end of the document.

If Finish has been called, WriteAll returns an error.
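
The write-then-finish lifecycle described for Generator can be sketched with a small stand-in type. Everything here is hypothetical (docBuilder, errFinished) and deliberately simplified: fields are joined with commas without any quoting, and what a second call to Finish does is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strings"
)

var errFinished = errors.New("document already finished")

// docBuilder is a hypothetical stand-in that mimics the Generator lifecycle.
type docBuilder struct {
	buf      bytes.Buffer
	finished bool
}

// Write appends one record; it fails once Finish has been called.
func (b *docBuilder) Write(record []string) error {
	if b.finished {
		return errFinished
	}
	b.buf.WriteString(strings.Join(record, ","))
	b.buf.WriteByte('\n')
	return nil
}

// WriteAll appends every record; it fails once Finish has been called.
func (b *docBuilder) WriteAll(records [][]string) error {
	if b.finished {
		return errFinished
	}
	for _, r := range records {
		if err := b.Write(r); err != nil {
			return err
		}
	}
	return nil
}

// Finish seals the builder and returns the accumulated document.
func (b *docBuilder) Finish() ([]byte, error) {
	b.finished = true
	return b.buf.Bytes(), nil
}

func main() {
	var b docBuilder
	_ = b.WriteAll([][]string{{"name", "age"}, {"Ann", "30"}})
	data, _ := b.Finish()
	fmt.Printf("%q\n", data)
	fmt.Println(b.Write([]string{"late"}))
}
```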

type InvalidUnmarshalError

type InvalidUnmarshalError struct {
	Type reflect.Type
}

An InvalidUnmarshalError describes an invalid argument passed to Unmarshal. (The argument to Unmarshal must be a non-nil pointer.)

func (*InvalidUnmarshalError) Error

func (e *InvalidUnmarshalError) Error() string

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

A Scanner scans a CSV document and returns the scanned header and rows.

func NewScanner

func NewScanner(data []byte, settings ...Setting) (*Scanner, error)

NewScanner creates and returns a new scanner from a byte slice with the given settings.

func (*Scanner) Scan

func (s *Scanner) Scan() (row []string, err error)

Scan scans the next row from the CSV document.

If an error occurs, row will be returned as nil.

If there are no more rows to scan, io.EOF will be returned.

func (*Scanner) ScanAll

func (s *Scanner) ScanAll() (rows [][]string, err error)

ScanAll scans the remaining rows of the CSV document.

If an error occurs, rows will be returned as nil.
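
The io.EOF convention means callers typically read in a loop until that sentinel appears. The sketch below shows the loop against a hypothetical in-memory stand-in (rowScanner), not the package's Scanner. Whether the real ScanAll reports io.EOF at the end is not documented here; the stand-in returns a nil error, which is an assumption.

```go
package main

import (
	"fmt"
	"io"
)

// rowScanner is a hypothetical stand-in that serves pre-parsed rows.
type rowScanner struct {
	rows [][]string
	next int
}

// Scan returns the next row, or a nil row and io.EOF when none are left.
func (s *rowScanner) Scan() ([]string, error) {
	if s.next >= len(s.rows) {
		return nil, io.EOF
	}
	row := s.rows[s.next]
	s.next++
	return row, nil
}

// ScanAll collects the remaining rows by calling Scan until io.EOF.
func (s *rowScanner) ScanAll() ([][]string, error) {
	var rows [][]string
	for {
		row, err := s.Scan()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

func main() {
	s := &rowScanner{rows: [][]string{{"name", "age"}, {"Ann", "30"}, {"Bob", "25"}}}
	header, _ := s.Scan() // read the header row first
	rest, err := s.ScanAll()
	fmt.Println(header, rest, err)
}
```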

func (*Scanner) Setting

func (s *Scanner) Setting(settings ...Setting)

Setting applies settings for s.

type Setting

type Setting func(*rule)

A Setting provides information on how documents should be parsed.
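
A function type like this is commonly known as the functional options pattern: each setting is a closure that mutates a rule, and settings are applied in order on top of the defaults. The sketch below illustrates the pattern only. The rule fields and the helper names (setSeparator, setAllowSingleQuote, newRule) are hypothetical, since the real rule type is unexported.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for the unexported rule type.
type rule struct {
	separator        rune
	allowSingleQuote bool
}

// Setting has the same shape as the type documented above.
type Setting func(*rule)

// setSeparator returns a setting that changes the field separator.
func setSeparator(sep rune) Setting {
	return func(r *rule) { r.separator = sep }
}

// setAllowSingleQuote returns a setting that toggles single-quote support.
func setAllowSingleQuote(v bool) Setting {
	return func(r *rule) { r.allowSingleQuote = v }
}

// newRule starts from defaults and applies settings in order,
// so a later setting overrides an earlier one.
func newRule(settings ...Setting) rule {
	r := rule{separator: ',', allowSingleQuote: true}
	for _, s := range settings {
		s(&r)
	}
	return r
}

func main() {
	r := newRule(setSeparator(';'), setAllowSingleQuote(false), setSeparator('\t'))
	fmt.Printf("%q %v\n", r.separator, r.allowSingleQuote)
}
```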

func AllowEmptyField

func AllowEmptyField(v bool) Setting

AllowEmptyField sets whether empty fields are allowed while reading a document.

func AllowEndingLineBreakInLastRecord

func AllowEndingLineBreakInLastRecord(v bool) Setting

AllowEndingLineBreakInLastRecord sets whether the last record may have an ending line break while reading a document.

func AllowSingleQuote

func AllowSingleQuote(v bool) Setting

AllowSingleQuote sets whether single quotes are allowed while reading a document.

func Comment

func Comment(comment rune) Setting

Comment sets the leading rune of comments used while reading a document.

func Encoding

func Encoding(enc encoding.Encoding) Setting

Encoding sets the character encoding used while reading and writing a document.

func FieldPrefix

func FieldPrefix(prefix rune) Setting

FieldPrefix sets the prefix rune of fields while unmarshaling and marshaling a document.

If a field prefix is set, the Prefix setting will be ignored while reading and writing fields, but will still be used for the header.

func FieldSuffix

func FieldSuffix(suffix rune) Setting

FieldSuffix sets the suffix rune of fields while unmarshaling and marshaling a document.

If a field suffix is set, the Suffix setting will be ignored while reading and writing fields, but will still be used for the header.

func HeaderPrefix

func HeaderPrefix(prefix rune) Setting

HeaderPrefix sets the prefix rune of header names while unmarshaling and marshaling a document.

If a header prefix is set, the Prefix setting will be ignored while reading and writing the header row, but will still be used for fields.

func HeaderSuffix

func HeaderSuffix(suffix rune) Setting

HeaderSuffix sets the suffix rune of header names while unmarshaling and marshaling a document.

If a header suffix is set, the Suffix setting will be ignored while reading and writing the header row, but will still be used for fields.
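
The override rules for the prefix settings (the suffix settings follow the same logic) amount to a small lookup: a header-specific or field-specific rune wins over the general one for its own kind of row. The sketch below is a hypothetical model of that rule; the affixes type, its field names, and the use of 0 for "not set" are assumptions.

```go
package main

import "fmt"

// affixes is a hypothetical holder for the three prefix settings; 0 means "not set".
type affixes struct {
	prefix       rune
	headerPrefix rune
	fieldPrefix  rune
}

// effectivePrefix picks the rune that applies to a header row or a field row.
func (a affixes) effectivePrefix(header bool) rune {
	if header && a.headerPrefix != 0 {
		return a.headerPrefix
	}
	if !header && a.fieldPrefix != 0 {
		return a.fieldPrefix
	}
	return a.prefix
}

func main() {
	a := affixes{prefix: '|', headerPrefix: '#'}
	fmt.Printf("%q %q\n", a.effectivePrefix(true), a.effectivePrefix(false))
}
```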

func IgnoreBOM

func IgnoreBOM(v bool) Setting

IgnoreBOM sets whether the leading BOM (byte order mark) should be ignored while reading a document. If not, the BOM will be treated as normal content.

This should not be done by a csv package, but since Golang has no built-in support for BOM, a workaround is required.
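
Ignoring a BOM usually means dropping the marker bytes before parsing. A minimal sketch with the standard library follows; it handles only the UTF-8 BOM (EF BB BF), and stripBOM is a hypothetical helper, not part of this package.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the byte order mark as encoded in UTF-8.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM returns data without a leading UTF-8 BOM, if one is present.
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, utf8BOM)
}

func main() {
	withBOM := append([]byte{0xEF, 0xBB, 0xBF}, []byte("a,b")...)
	fmt.Printf("%q\n", stripBOM(withBOM))
}
```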

func OmitEmptyLine

func OmitEmptyLine(v bool) Setting

OmitEmptyLine sets whether empty lines should be omitted while reading a document.

func OmitLeadingSpace

func OmitLeadingSpace(v bool) Setting

OmitLeadingSpace sets whether the leading spaces of fields should be omitted while reading a document.

func OmitTrailingSpace

func OmitTrailingSpace(v bool) Setting

OmitTrailingSpace sets whether the trailing spaces of fields should be omitted while reading a document.

func Prefix

func Prefix(prefix rune) Setting

Prefix sets the prefix of every field while reading and writing a document.

func RFC4180

func RFC4180() Setting

RFC4180 sets the parser and generator to work in the exact way as described in RFC 4180.

func Separator

func Separator(sep rune) Setting

Separator sets the separator used to separate fields while reading and writing a document.

func Suffix

func Suffix(suffix rune) Setting

Suffix sets the suffix of every field when reading and writing a document.

func Validator

func Validator(name string, validator func(interface{}) bool) Setting

Validator adds a new validator function for validating a CSV value while unmarshaling a document.
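
Named validators of the shape func(interface{}) bool can be kept in a map and looked up by name. How this package refers to a validator from a struct field is not documented in this section, so the sketch below covers only registration and lookup; the validators type and its check method are hypothetical.

```go
package main

import "fmt"

// validators is a hypothetical registry keyed by validator name.
type validators map[string]func(interface{}) bool

// check runs the named validator against value.
func (vs validators) check(name string, value interface{}) error {
	fn, ok := vs[name]
	if !ok {
		return fmt.Errorf("unknown validator %q", name)
	}
	if !fn(value) {
		return fmt.Errorf("value %v rejected by validator %q", value, name)
	}
	return nil
}

func main() {
	vs := validators{
		"positive": func(v interface{}) bool {
			n, ok := v.(int)
			return ok && n > 0
		},
	}
	fmt.Println(vs.check("positive", 5))
	fmt.Println(vs.check("positive", -1))
}
```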

func WriteHeader

func WriteHeader(v bool) Setting

WriteHeader sets whether to output the header row while writing the document.
