tableschema-go

module
v0.1.3
Published: Sep 13, 2017 License: MIT

README

tableschema-go

A Go library for working with Table Schema.

Main Features

  • table package defines the Table and Iterator interfaces, which are used to manipulate and explore tabular data;
  • csv package contains an implementation of the Table and Iterator interfaces for the CSV format;
  • schema package contains types and functions for working with table schemas.

Getting started

Installation

This package uses semantic versioning 2.0.0.

Using dep
$ dep init
$ dep ensure -add github.com/frictionlessdata/tableschema-go/csv@>=0.1

Examples

The code examples in this README require Go 1.8+. You can find more examples in the examples directory.

package main

import (
	"github.com/frictionlessdata/tableschema-go/csv"
	"github.com/frictionlessdata/tableschema-go/schema"
)

type user struct {
	ID   int
	Age  int
	Name string
}

func main() {
	tab, err := csv.NewTable(csv.FromFile("data_infer_utf8.csv"), csv.SetHeaders("id", "age", "name"))
	if err != nil {
		panic(err)
	}
	sch, err := schema.Infer(tab) // infer the table schema
	if err != nil {
		panic(err)
	}
	if err := sch.SaveToFile("schema.json"); err != nil { // save the inferred schema to a file
		panic(err)
	}
	var users []user
	if err := sch.DecodeTable(tab, &users); err != nil { // unmarshal the table data into the slice
		panic(err)
	}
}

Documentation

Reading and processing data

A Table is a core concept in the tabular data world. It is the logical representation of the data, so its interface should be agnostic to the physical representation. Let's see how to use it in practice.

Suppose we have a local CSV file, data.csv:

city,location
london,"51.50,-0.11"
paris,"48.85,2.30"
rome,N/A

As the physical representation is a CSV file, we use csv.NewTable to create a table object, passing csv.FromFile as its Source.

locTable, _ := csv.NewTable(csv.FromFile("data.csv"), csv.LoadHeaders())
locTable.Headers() // ["city", "location"]
fmt.Println(locTable.ReadAll())
// [[london 51.50,-0.11] [paris 48.85,2.30] [rome N/A]]
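For comparison, parsing the same raw rows with nothing but the standard encoding/csv package looks like this. This is a self-contained sketch (readSample is an illustrative helper), and it shows that at this level the locations come back as plain strings:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readSample parses the data.csv contents shown above with the standard
// library; quoted fields keep their embedded commas intact.
func readSample() ([][]string, error) {
	data := "city,location\n" +
		"london,\"51.50,-0.11\"\n" +
		"paris,\"48.85,2.30\"\n" +
		"rome,N/A\n"
	r := csv.NewReader(strings.NewReader(data))
	return r.ReadAll()
}

func main() {
	records, err := readSample()
	if err != nil {
		panic(err)
	}
	fmt.Println(records[0])  // header row
	fmt.Println(records[1:]) // data rows; locations are still strings
}
```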

So far, the locations are strings, but they should be geopoints. Also, Rome's location is not available, yet it is represented by the literal string N/A rather than Go's zero value. To start doing proper data processing we need to cast the data. In Frictionless Data, the schema captures this information. An effortless way to bootstrap a schema is to use schema.Infer, which algorithmically infers the table's Schema:

locSchema, _ := schema.Infer(locTable)
fmt.Printf("%+v", locSchema)
// "fields": [
//     {"name": "city", "type": "string", "format": "default"},
//     {"name": "location", "type": "geopoint", "format": "default"},
// ],
// "missingValues": []
// ...
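The inferred schema is plain Table Schema JSON, so a saved descriptor can be inspected with nothing but the standard library. A self-contained sketch follows; the descriptor and field struct names here are our own, while the JSON keys come from the Table Schema specification:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field mirrors one entry of the "fields" array in a Table Schema descriptor.
type field struct {
	Name   string `json:"name"`
	Type   string `json:"type"`
	Format string `json:"format"`
}

// descriptor mirrors the top-level keys used in this README's example.
type descriptor struct {
	Fields        []field  `json:"fields"`
	MissingValues []string `json:"missingValues"`
}

// parseDescriptor unmarshals a Table Schema JSON document.
func parseDescriptor(data []byte) (descriptor, error) {
	var d descriptor
	err := json.Unmarshal(data, &d)
	return d, err
}

func main() {
	raw := []byte(`{"fields":[` +
		`{"name":"city","type":"string","format":"default"},` +
		`{"name":"location","type":"geopoint","format":"default"}],` +
		`"missingValues":[]}`)
	d, err := parseDescriptor(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Fields[1].Name, d.Fields[1].Type) // location geopoint
}
```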

Then we are ready to decode the table data into Go structs. It works like json.Unmarshal, but for table rows.

type Location struct {
    City string
    Location schema.GeoPoint
}

var locations []Location
err := locSchema.DecodeTable(locTable, &locations)
// Fails with: "Invalid geopoint:\"N/A\""

The problem is that the library does not know that N/A represents a missing value. For cases like this, the Table Schema specification provides the missingValues property. As a first step, we set missingValues to N/A in the schema.

locSchema.MissingValues = []string{"N/A"}
locSchema.DecodeTable(locTable, &locations)
// [{london {51.5 -0.11}} {paris {48.85 2.3}} {rome {0 0}}]
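Under the hood, decoding a geopoint boils down to parsing the "lat,lon" string and honoring missingValues. Here is a hypothetical, self-contained sketch of such a cast; castGeoPoint and this GeoPoint type are illustrations, not the library's API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GeoPoint is an illustrative stand-in for schema.GeoPoint.
type GeoPoint struct{ Lat, Lon float64 }

// castGeoPoint returns the zero value for any string listed in missing,
// and otherwise parses a "lat,lon" string into a GeoPoint.
func castGeoPoint(s string, missing []string) (GeoPoint, error) {
	for _, m := range missing {
		if s == m {
			return GeoPoint{}, nil // missing value -> zero value
		}
	}
	parts := strings.Split(s, ",")
	if len(parts) != 2 {
		return GeoPoint{}, fmt.Errorf("invalid geopoint: %q", s)
	}
	lat, err := strconv.ParseFloat(strings.TrimSpace(parts[0]), 64)
	if err != nil {
		return GeoPoint{}, err
	}
	lon, err := strconv.ParseFloat(strings.TrimSpace(parts[1]), 64)
	if err != nil {
		return GeoPoint{}, err
	}
	return GeoPoint{Lat: lat, Lon: lon}, nil
}

func main() {
	p, _ := castGeoPoint("51.50,-0.11", []string{"N/A"})
	fmt.Println(p.Lat, p.Lon) // 51.5 -0.11
	z, _ := castGeoPoint("N/A", []string{"N/A"})
	fmt.Println(z) // zero value, as in the Rome row above
}
```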

If the data being processed is too big to fit in memory, or you simply want to go through the table row by row, use the table's Iter method:

locSchema.MissingValues = []string{"N/A"}
iter, _ := locTable.Iter()
for iter.Next() {
    var loc Location
    locSchema.Decode(iter.Row(), &loc)
    // process location
}
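The Next/Row pattern used above is easy to mimic outside the library. Below is a minimal, self-contained sketch of an iterator with the same shape; sliceIter is an illustrative type, not part of tableschema-go:

```go
package main

import "fmt"

// sliceIter is a toy iterator over in-memory rows, mirroring the
// table.Iterator shape: Next advances, Row returns the current row.
type sliceIter struct {
	rows [][]string
	pos  int
}

// Next reports whether another row is available and advances the cursor.
func (it *sliceIter) Next() bool {
	if it.pos >= len(it.rows) {
		return false
	}
	it.pos++
	return true
}

// Row returns the row the iterator currently points at.
func (it *sliceIter) Row() []string { return it.rows[it.pos-1] }

func main() {
	it := &sliceIter{rows: [][]string{{"london"}, {"paris"}, {"rome"}}}
	for it.Next() {
		fmt.Println(it.Row()[0]) // process one row at a time
	}
}
```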

And because data reading produced no errors, we can be confident that our data is valid against our schema. Let's save it:

locSchema.SaveToFile("schema.json")

Writing data

The writing path is equally simple. Assume we have the locSchema created above and want to create a CSV table (we could also append to an existing one) from a slice of Location objects, which were generated by some internal pipeline. This new pipeline step would be:

package main

import (
	"encoding/csv"
	"os"

	"github.com/frictionlessdata/tableschema-go/schema"
)

func CreateLocationTable(sch *schema.Schema, locations []Location, path string) error {
	// os.Create (not os.Open) opens the file for writing.
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// Encode the structs into rows of strings using the schema.
	rows, err := sch.EncodeTable(locations)
	if err != nil {
		return err
	}
	w := csv.NewWriter(f)
	if err := w.Write([]string{"City", "Location"}); err != nil {
		return err
	}
	if err := w.WriteAll(rows); err != nil {
		return err
	}
	w.Flush()
	return w.Error()
}
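The CSV-writing half of this step can be exercised on its own with the standard encoding/csv package. A self-contained sketch, where writeRows is an illustrative helper rather than part of this library:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRows renders a header plus data rows as CSV text in memory.
func writeRows(header []string, rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(header); err != nil {
		return "", err
	}
	// WriteAll writes every row and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), w.Error()
}

func main() {
	out, err := writeRows([]string{"City", "Location"}, [][]string{
		{"london", "51.50,-0.11"},
		{"paris", "48.85,2.30"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // fields containing commas are quoted automatically
}
```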

Documentation

More detailed documentation about API methods is available at https://godoc.org/github.com/frictionlessdata/tableschema-go

