gg

package
v0.0.0-...-abd1f79

Warning: This package is not in the latest version of its module.
Published: Mar 23, 2017 License: BSD-3-Clause Imports: 25 Imported by: 4

Documentation

Overview

Package gg creates plots using the Grammar of Graphics.

WARNING: This API is highly unstable. For now, please vendor this package.

gg creates statistical visualizations. It's designed to help users quickly navigate and explore complex data in different ways, both in terms of what they're plotting and how they're plotting it. This focus on rapid exploration of complex data leads to a very different design than typical plotting packages.

gg is heavily inspired by Wilkinson's Grammar of Graphics [1]. A key observation of the Grammar of Graphics is that there are many motifs across different types of plots. The Grammar of Graphics separates these motifs into orthogonal concerns that can be manipulated and extended independently, enabling the creation of traditional plot types from their fundamental components as well as the creation of entirely new plot types.

Data model

Central to gg is its data model. At the most basic level, the input data consists of a table with a set of named columns, with the rows organized into one or more groups. At a higher level, because gg makes it easy to restructure data before plotting it, it expects to start with regularized input data, where each column represents a distinct independent or dependent variable. In other words, any two values that make sense to plot on the same axis should be in the same column.

For example, to express a line graph with several series of different colors in gg, you would say "plot column A against column B, grouped into series and colored according to column C". In contrast, typical plotting packages use a "spreadsheet" model, where each data series is a separate column, so expressing the same graph requires saying "plot column A against column B in color 1 and plot column A against column C in color 2" and so on.

gg's approach is suited to exploratory data analysis because you don't have to restructure the data to see it in a different way. In the traditional spreadsheet model, you have to structure the data to match the plot. In gg, you tell the plot what structure to extract from the data.
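The difference between the two models can be sketched with plain Go, independent of gg's own table types. The Row type and the Melt helper below are hypothetical names used only for illustration: Melt turns a spreadsheet-style table (one column per series) into the regularized form described above, where a single Series column says which group each row belongs to.

```go
package main

import "fmt"

// Row is one observation in regularized form: every value that belongs
// on the same axis lives in the same field. (Hypothetical type, not gg API.)
type Row struct {
	X      float64
	Y      float64
	Series string
}

// Melt converts a wide, spreadsheet-style table (one Y column per series)
// into regularized rows. Series are visited in the order given by names.
// Each Y column is assumed to be no longer than x.
func Melt(x []float64, wide map[string][]float64, names []string) []Row {
	var rows []Row
	for _, name := range names {
		for i, y := range wide[name] {
			rows = append(rows, Row{X: x[i], Y: y, Series: name})
		}
	}
	return rows
}

func main() {
	x := []float64{1, 2}
	wide := map[string][]float64{"B": {10, 20}, "C": {30, 40}}
	for _, r := range Melt(x, wide, []string{"B", "C"}) {
		fmt.Println(r.X, r.Y, r.Series)
	}
}
```

Once the data is in this shape, "plot X against Y, grouped and colored by Series" needs no further restructuring.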

Layers and scales

To visualize data, gg provides a set of composable plot building blocks. There are no fixed "plot types" in gg. The main building block is a "layer", which transforms a data set into a set of visual marks, such as lines, points, or rectangles. Each layer is configured by mapping columns of the data set to different "aesthetics". An aesthetic is a generalization of a dimension: X and Y are aesthetics, but so are color and stroke width and point shape. Unlike typical plotting packages, these various aesthetics are treated symmetrically and any aesthetic can be fed from any column of the data.

Layers work in close concert with "scales", which map from values in the data space to values in the visual space. Scales can map from continuous or discrete data values (such as numbers or strings) to continuous or discrete visual values (such as pixel offsets or point shapes). Each aesthetic has an associated scale. If the user hasn't provided a specific scale for an aesthetic, gg uses a default scale that guesses what to do based on the data type and aesthetic.
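As a rough illustration of what a continuous scale does, the following standalone sketch maps a data interval onto a visual interval by linear interpolation. LinearScale and its fields are hypothetical and are not gg's Scaler implementation; the handling of a zero-width domain is an assumption made for the sketch.

```go
package main

import "fmt"

// LinearScale maps a continuous data domain [DomLo, DomHi] onto a visual
// range [RangeLo, RangeHi], such as pixel offsets. (Hypothetical type.)
type LinearScale struct {
	DomLo, DomHi     float64
	RangeLo, RangeHi float64
}

// Map converts a data value to a visual value by linear interpolation.
func (s LinearScale) Map(v float64) float64 {
	if s.DomHi == s.DomLo {
		// Assumption: a zero-width domain maps to the middle of the range.
		return (s.RangeLo + s.RangeHi) / 2
	}
	t := (v - s.DomLo) / (s.DomHi - s.DomLo)
	return s.RangeLo + t*(s.RangeHi-s.RangeLo)
}

func main() {
	s := LinearScale{DomLo: 0, DomHi: 10, RangeLo: 0, RangeHi: 800}
	fmt.Println(s.Map(2.5)) // prints 200
}
```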

Stats

Before a layer renders it, data can be pre-processed with a "stat". A stat can be an arbitrary data transformation, but it's typically used to compute statistical summaries, such as the five-number summary (e.g., for a box plot), a linear regression, or a density estimate.
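To make the idea of a summarizing transformation concrete, here is a standalone sketch of a five-number summary. It does not use gg or ggstat; FiveNum and quantile are hypothetical helpers, and the quartiles use linear interpolation between order statistics, which is only one of several common definitions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the q-th quantile (0 <= q <= 1) of an ascending,
// non-empty slice, interpolating linearly between neighboring values.
func quantile(sorted []float64, q float64) float64 {
	pos := q * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo] + frac*(sorted[hi]-sorted[lo])
}

// FiveNum returns the minimum, first quartile, median, third quartile,
// and maximum of xs. It panics if xs is empty.
func FiveNum(xs []float64) (lo, q1, med, q3, hi float64) {
	s := append([]float64(nil), xs...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	return s[0], quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75), s[len(s)-1]
}

func main() {
	fmt.Println(FiveNum([]float64{7, 1, 3, 5, 9})) // prints 1 3 5 7 9
}
```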

TODO: "Compound" layers?

Facets

TODO.

Aesthetics

gg understands the following aesthetics.

"x" and "y" give the offset from the lower-left corner of a plot. Their ranges are always set to the pixel coordinates of the X and Y axes, respectively, and cannot be overridden.

"stroke" and "fill" give the stroke and fill colors of paths and points. Their ranger must have type color.Color. The default ranger returns a single-hue gradient for continuous data, or a categorical palette for discrete data.

"opacity" gives the overall opacity of a mark. Its ranger must have type float64 and give values between 0 and 1, inclusive. The default ranger ranges from 10% opaque (0.1) to fully opaque (1.0).

"size" gives the size of marks. Its ranger must have type float64 and yields values that are relative to the smallest dimension of the plot area (e.g., a value of 0.5 creates a point that covers half of the plot width or height, whichever is smaller). The default ranger ranges from 1% (0.01) to 10% (0.1).
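The default ranges for "opacity" and "size" can be read as simple interpolations. The sketch below is standalone and assumes the scale has already reduced a data value to a normalized t in [0, 1]; lerp, Opacity, and SizePixels are hypothetical helpers, not gg functions.

```go
package main

import "fmt"

// lerp maps t in [0, 1] onto [lo, hi].
func lerp(lo, hi, t float64) float64 { return lo + t*(hi-lo) }

// Opacity maps a normalized value onto the default opacity range
// described above (0.1 to 1.0).
func Opacity(t float64) float64 { return lerp(0.1, 1.0, t) }

// SizePixels maps a normalized value onto the default size range
// (1% to 10%) and converts it to pixels using the smaller plot dimension.
func SizePixels(t, plotW, plotH float64) float64 {
	small := plotW
	if plotH < small {
		small = plotH
	}
	return lerp(0.01, 0.1, t) * small
}

func main() {
	fmt.Printf("%.2f\n", Opacity(0.5))
	fmt.Printf("%.1f\n", SizePixels(1, 800, 400))
}
```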

gg draws ideas and inspiration from many sources. The core principle of a Grammar of Graphics was introduced by Wilkinson [1]. There have been many implementations in many languages. The most popular is certainly Wickham's ggplot2 for R [2]. gg draws most heavily on Wickham's follow-up work on ggvis for R [3].

[1] Leland Wilkinson, The Grammar of Graphics, Springer, 1999.

[2] Hadley Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer, 2009.

[3] Hadley Wickham, ggvis, http://ggvis.rstudio.com/.

TODO: Scale transforms, coordinate spaces.

Constants

This section is empty.

Variables

var Warning = log.New(os.Stderr, "[gg] ", log.Lshortfile)

Warning is a logger for reporting conditions that don't prevent the production of a plot, but may lead to unexpected results.

Functions

func NewTimeScaler

func NewTimeScaler() *timeScale

NewTimeScaler returns a continuous linear scale. The domain must be time.Time.

Example
var x []time.Time
var y []float64
var steps []time.Duration
for _, step := range []time.Duration{
	1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
	time.Minute, time.Hour, 24 * time.Hour, 7 * 24 * time.Hour,
} {
	t := time.Now()
	for i := 0; i < 100; i++ {
		x = append(x, t)
		y = append(y, rand.Float64()-.5)
		steps = append(steps, 100*step)
		t = t.Add(-step)
	}
}

tb := table.NewBuilder(nil)
tb.Add("x", x).Add("y", y).Add("steps", steps)

plot := NewPlot(tb.Done())

plot.SetScale("x", NewTimeScaler())

plot.Add(FacetY{
	Col:          "steps",
	SplitXScales: true,
})

plot.Add(LayerLines{
	X: "x",
	Y: "y",
})

f, err := os.Create("scale_time.svg")
if err != nil {
	panic("unable to create scale_time.svg")
}
defer f.Close()
if err := plot.WriteSVG(f, 800, 1000); err != nil {
	panic(err)
}
fmt.Println("ok")
Output:

ok

Types

type ContinuousRanger

type ContinuousRanger interface {
	Ranger
	Map(x float64) (y interface{})
	Unmap(y interface{}) (x float64, ok bool)
}

func NewFloatRanger

func NewFloatRanger(lo, hi float64) ContinuousRanger

type ContinuousScaler

type ContinuousScaler interface {
	Scaler

	// SetMin and SetMax set the minimum and maximum values of
	// this Scaler's domain and return the Scaler. If v is nil, it
	// unsets the bound.
	//
	// v must be convertible to the Scaler's domain type. For
	// example, if this is a linear scale, v can be of any
	// numerical type. Unlike ExpandDomain, these do not set the
	// Scaler's domain type.
	SetMin(v interface{}) ContinuousScaler
	SetMax(v interface{}) ContinuousScaler

	// Include requires that v be included in this Scaler's
	// domain. Like SetMin/SetMax, this can expand Scaler's
	// domain, but unlike SetMin/SetMax, this does not restrict
	// it. If v is nil, it does nothing.
	//
	// v must be convertible to the Scaler's domain type. Unlike
	// ExpandDomain, this does not set the Scaler's domain type.
	Include(v interface{}) ContinuousScaler
}
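The difference between fixing a bound and merely including a value can be sketched with a small standalone type. Domain below is hypothetical and restricted to float64. How gg itself combines fixed bounds with included values is not stated here, so letting a fixed bound take precedence over the observed extent is an assumption of the sketch.

```go
package main

import "fmt"

// Domain tracks the extent of a continuous scale's input values.
// (Hypothetical type; gg's real scalers accept interface{} values.)
type Domain struct {
	dataLo, dataHi   float64  // extent of observed and included values
	seen             bool     // whether any value has been observed yet
	fixedLo, fixedHi *float64 // optional fixed bounds; nil means unset
}

// Include widens the extent so that it contains v. It never shrinks it.
func (d *Domain) Include(v float64) *Domain {
	if !d.seen || v < d.dataLo {
		d.dataLo = v
	}
	if !d.seen || v > d.dataHi {
		d.dataHi = v
	}
	d.seen = true
	return d
}

// SetMin fixes the lower bound. Passing nil unsets it.
func (d *Domain) SetMin(v *float64) *Domain { d.fixedLo = v; return d }

// SetMax fixes the upper bound. Passing nil unsets it.
func (d *Domain) SetMax(v *float64) *Domain { d.fixedHi = v; return d }

// Bounds returns the effective domain. Assumption: a fixed bound
// replaces the observed extent on that side.
func (d *Domain) Bounds() (lo, hi float64) {
	lo, hi = d.dataLo, d.dataHi
	if d.fixedLo != nil {
		lo = *d.fixedLo
	}
	if d.fixedHi != nil {
		hi = *d.fixedHi
	}
	return lo, hi
}

func main() {
	d := &Domain{}
	d.Include(2).Include(8).Include(-1)
	fmt.Println(d.Bounds()) // prints -1 8

	max := 5.0
	d.SetMax(&max)
	fmt.Println(d.Bounds()) // prints -1 5

	d.SetMax(nil)
	fmt.Println(d.Bounds()) // prints -1 8
}
```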

func NewLinearScaler

func NewLinearScaler() ContinuousScaler

NewLinearScaler returns a continuous linear scale. The domain must be a VarCardinal.

XXX If I return a Scaler, I can't have methods for setting fixed bounds and such. I don't really want to expose the whole type. Maybe a sub-interface for continuous Scalers?

func NewLogScaler

func NewLogScaler(base int) ContinuousScaler

type DiscreteRanger

type DiscreteRanger interface {
	Ranger
	Levels() (min, max int)
	MapLevel(i, j int) interface{}
}

func NewColorRanger

func NewColorRanger(palette []color.Color) DiscreteRanger

type FacetCommon

type FacetCommon struct {
	// Col names the column to facet by. Each distinct value of
	// this column will become a separate plot. If Col is
	// orderable, the facets will be in value order; otherwise,
	// they will be in index order.
	Col string

	// SplitXScales indicates that each band (column for FacetX;
	// row for FacetY) created by this faceting operation should
	// have separate X axis scales. The default, false, indicates
	// that subplots should continue to share X scales.
	//
	// SplitXScales and SplitYScales, combined with facet
	// composition, give a great deal of control over how scales
	// are shared. Suppose you want to create an X/Y facet grid by
	// first performing a FacetX and then a FacetY. Here are some
	// common ways to share or split the scales:
	//
	// * To share the same scales between all subplots, set both
	// flags to false in both facet operations.
	//
	// * To have independent scales in all subplots, set both
	// flags to true in the FacetY (and it doesn't matter what
	// they are in the FacetX).
	//
	// * To share the X scale within each column and the Y scale
	// within each row, set SplitXScales in the FacetX and
	// SplitYScales in the FacetY.
	SplitXScales bool

	// SplitYScales is the equivalent of SplitXScales for Y axis
	// scales.
	SplitYScales bool

	// Labeler is a function that constructs facet labels from
	// data values. If this is nil, the default is fmt.Sprint.
	//
	// TODO: Call this through reflect to get the argument type
	// right?
	Labeler func(interface{}) string

	// Rows and Cols specify the number of rows or columns for
	// FacetWrap. If both are zero, FacetWrap chooses reasonable
	// defaults. Otherwise, one or the other should be zero.
	Rows, Cols int
}

FacetCommon is the base type for plot faceting operations. Faceting is a grouping operation that subdivides a plot into subplots based on the values in a data column. Faceting operations may be composed: if a faceting operation has already divided the plot into subplots, a further faceting operation will subdivide each of those subplots.
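Facet composition can be pictured as repeated subdivision of groups of rows. The standalone sketch below does not use gg's table package; Subdivide is a hypothetical helper that splits each existing group by the distinct values of a string column, in value order, so applying it twice yields the cells of a two-level facet.

```go
package main

import (
	"fmt"
	"sort"
)

// Subdivide splits every group of row indices by the distinct values of
// col, emitting the sub-groups of each parent group in sorted value order.
func Subdivide(groups [][]int, col []string) [][]int {
	var out [][]int
	for _, g := range groups {
		byVal := make(map[string][]int)
		var keys []string
		for _, i := range g {
			v := col[i]
			if _, ok := byVal[v]; !ok {
				keys = append(keys, v)
			}
			byVal[v] = append(byVal[v], i)
		}
		sort.Strings(keys)
		for _, k := range keys {
			out = append(out, byVal[k])
		}
	}
	return out
}

func main() {
	os := []string{"linux", "mac", "linux", "mac"}
	arch := []string{"arm", "arm", "x86", "x86"}

	root := [][]int{{0, 1, 2, 3}} // a single facet holding every row
	byOS := Subdivide(root, os)
	fmt.Println(byOS) // prints [[0 2] [1 3]]

	grid := Subdivide(byOS, arch)
	fmt.Println(grid) // prints [[0] [2] [1] [3]]
}
```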

type FacetWrap

type FacetWrap FacetCommon

FacetWrap splits a plot into a grid of rows and columns.

func (FacetWrap) Apply

func (f FacetWrap) Apply(p *Plot)

type FacetX

type FacetX FacetCommon

FacetX splits a plot into columns.

func (FacetX) Apply

func (f FacetX) Apply(p *Plot)

type FacetY

type FacetY FacetCommon

FacetY splits a plot into rows.

func (FacetY) Apply

func (f FacetY) Apply(p *Plot)

type LayerArea

type LayerArea struct {
	// X names the column that defines the input of each point. If
	// this is empty, it defaults to the first column.
	X string

	// Upper and Lower name columns that define the range of
	// response to shade. If either is "", it defaults to a
	// constant 0 value.
	Upper, Lower string

	// Fill names a column that defines the fill color of each
	// area. If Fill is "", it defaults to black. Otherwise, the
	// data is grouped by Fill.
	Fill string

	// FillOpacity names a column that defines the fill opacity of
	// each area. If FillOpacity is "", it defaults to 0.5.
	// Otherwise, the data is grouped by FillOpacity.
	FillOpacity string
}

LayerArea shades the area between two columns with a polygon. It is useful in conjunction with ggstat.AggMax and ggstat.AggMin for drawing the extents of data.

func (LayerArea) Apply

func (l LayerArea) Apply(p *Plot)

type LayerLines

type LayerLines LayerPaths

LayerLines is like LayerPaths, but connects data points in order by the "x" property.
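The distinction from LayerPaths is only the ordering of points before they are connected. A minimal standalone sketch, with a hypothetical Point type and LineOrder helper, and with a stable sort chosen as an assumption so that points sharing an X value keep their original relative order:

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a single (x, y) observation. (Hypothetical type.)
type Point struct{ X, Y float64 }

// LineOrder returns a copy of pts ordered by X, which is the order in
// which a line layer would connect them. A path layer would instead
// connect pts in their original order.
func LineOrder(pts []Point) []Point {
	out := append([]Point(nil), pts...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].X < out[j].X })
	return out
}

func main() {
	pts := []Point{{3, 1}, {1, 2}, {2, 0}}
	fmt.Println(LineOrder(pts)) // prints [{1 2} {2 0} {3 1}]
}
```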

func (LayerLines) Apply

func (l LayerLines) Apply(p *Plot)

type LayerPaths

type LayerPaths struct {
	// X and Y name columns that define the input and response of
	// each point on the path. If these are empty, they default to
	// the first and second columns, respectively.
	X, Y string

	// Color names a column that defines the stroke color of each
	// path. If Color is "", it defaults to constant black.
	// Otherwise, the data is grouped by Color.
	Color string

	// Fill names a column that defines the fill color of each
	// path. If Fill is "", it defaults to none. Otherwise, the
	// data is grouped by Fill.
	Fill string
}

LayerPaths groups by Color and Fill, and then connects successive data points in each group with a path and/or a filled polygon.

func (LayerPaths) Apply

func (l LayerPaths) Apply(p *Plot)

type LayerPoints

type LayerPoints struct {
	// X and Y name columns that define input and response of each
	// point. If these are empty, they default to the first and
	// second columns, respectively.
	X, Y string

	// Color names the column that defines the fill color of each
	// point. If Color is "", it defaults to constant black.
	Color string

	// Opacity names the column that defines the opacity of each
	// point. If Opacity is "", it defaults to fully opaque. This
	// is multiplied by any alpha value specified by Color.
	Opacity string

	// Size names the column that defines the size of each point.
	// If Size is "", it defaults to 1% of the smallest plot
	// dimension.
	Size string
}

LayerPoints layers a point mark at each data point.

func (LayerPoints) Apply

func (l LayerPoints) Apply(p *Plot)

type LayerSteps

type LayerSteps struct {
	LayerPaths

	Step StepMode
}

LayerSteps is like LayerPaths, but connects data points with a path consisting only of horizontal and vertical segments.

func (LayerSteps) Apply

func (l LayerSteps) Apply(p *Plot)

type LayerTags

type LayerTags struct {
	// X and Y name columns that define the input and response
	// each tag is attached to. If they are "", they default to
	// the first and second columns, respectively.
	X, Y string

	// Label names the column that gives the text to put in the
	// tag at X, Y. Label is required.
	Label string

	// HPos controls the horizontal position of the tag if
	// multiple points have the same Label. The label will be
	// attached to the point closest to HPos between the left-most
	// (HPos == 0) and the right-most (HPos == 1) points on this
	// curve.
	HPos float64

	// Offset controls the pixel offset of the tag from the point
	// it is attached to. If these are both zero, they are treated
	// as -20, -20.
	OffsetX, OffsetY int
}

LayerTags attaches text annotations to data points.

TODO: Currently this groups by label and makes one annotation per group. This should be controllable.

func (LayerTags) Apply

func (l LayerTags) Apply(p *Plot)

type LayerTiles

type LayerTiles struct {
	// X and Y name columns that define the input and response at
	// the center of each rectangle. If they are "", they default
	// to the first and second columns, respectively.
	X, Y string

	// Width and Height name columns that define the width and
	// height of each rectangle. If they are "", the width and/or
	// height are automatically determined from the smallest
	// spacing between distinct X and Y points.
	Width, Height string

	// Fill names a column that defines the fill color of each
	// rectangle. If it is "", the default fill is black.
	Fill string
}

LayerTiles layers a rectangle at each data point. The rectangle is specified by its center, width, and height.
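When Width or Height is left empty, the tile size comes from the smallest spacing between distinct coordinates. A standalone sketch of that computation along one axis follows; MinSpacing is a hypothetical helper, and returning false when fewer than two distinct values exist is an assumption, since the behavior in that case is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// MinSpacing returns the smallest gap between distinct values of xs.
// ok is false if xs has fewer than two distinct values.
func MinSpacing(xs []float64) (gap float64, ok bool) {
	s := append([]float64(nil), xs...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	for i := 1; i < len(s); i++ {
		d := s[i] - s[i-1]
		if d > 0 && (!ok || d < gap) {
			gap, ok = d, true
		}
	}
	return gap, ok
}

func main() {
	fmt.Println(MinSpacing([]float64{0, 2, 2, 5, 6})) // prints 1 true
}
```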

func (LayerTiles) Apply

func (l LayerTiles) Apply(p *Plot)

type LayerTooltips

type LayerTooltips struct {
	// X and Y name columns that define locations of tooltips. If
	// they are "", they default to the first and second columns,
	// respectively.
	X, Y string

	// Label names the column that gives the text of the tooltip.
	Label string
}

LayerTooltips attaches hover tooltips to data points.

func (LayerTooltips) Apply

func (l LayerTooltips) Apply(p *Plot)

type Plot

type Plot struct {
	// contains filtered or unexported fields
}

Plot represents a single (potentially faceted) plot.

func NewPlot

func NewPlot(data table.Grouping) *Plot

NewPlot returns a new Plot backed by data. It has no layers, one facet, and all scales are default.

func (*Plot) Add

func (p *Plot) Add(plotters ...Plotter) *Plot

Add applies each of plotters to Plot in order.

func (*Plot) Const

func (p *Plot) Const(val interface{}) string

Const creates a new constant column bound to val in all groups and returns the generated column name. This is a convenient way to pass constant values to layers as columns.

TODO: Typically this should be used with PreScaled or physical types.

func (*Plot) Data

func (p *Plot) Data() table.Grouping

Data returns p's current data table.

func (*Plot) GetScale

func (p *Plot) GetScale(aes string) Scaler

GetScale returns the scale for the given visual aesthetic used for data in the root group.

func (*Plot) GetScaleAt

func (p *Plot) GetScaleAt(aes string, gid table.GroupID) Scaler

GetScaleAt returns the scale for the given visual aesthetic used for data in group gid.

func (*Plot) GroupAuto

func (p *Plot) GroupAuto() *Plot

GroupAuto groups p's data table on all columns that are comparable but are not numeric (that is, all categorical columns).

TODO: Maybe there should be a CategoricalBindings that returns the set of categorical bindings, which callers could just pass to GroupBy, possibly after manipulating.

TODO: Does implementing sort.Interface make an otherwise cardinal column ordinal?

func (*Plot) GroupBy

func (p *Plot) GroupBy(cols ...string) *Plot

GroupBy sub-divides all groups such that all of the rows in each group have equal values for all of the named columns.

func (*Plot) Restore

func (p *Plot) Restore() *Plot

Restore restores the data table of p from the save stack.

func (*Plot) Save

func (p *Plot) Save() *Plot

Save saves the current data table of p to a stack.

func (*Plot) SetData

func (p *Plot) SetData(data table.Grouping) *Plot

SetData sets p's current data table. The caller must not modify data in this table after this point.

func (*Plot) SetScale

func (p *Plot) SetScale(aes string, s Scaler) *Plot

SetScale binds a scale to the given visual aesthetic. SetScale is shorthand for SetScaleAt(aes, s, table.RootGroupID). SetScale must be called before Add.

SetScale returns p for ease of chaining.

func (*Plot) SetScaleAt

func (p *Plot) SetScaleAt(aes string, s Scaler, gid table.GroupID) *Plot

SetScaleAt binds a scale to the given visual aesthetic for all data in group gid or descendants of gid. SetScaleAt must be called before Add.

func (*Plot) SortBy

func (p *Plot) SortBy(cols ...string) *Plot

SortBy sorts each group by the named columns. If a column's type implements sort.Interface, rows will be sorted according to that order. Otherwise, the values in the column must be naturally ordered (their types must be orderable by the Go specification). If neither is true, SortBy panics with a *generic.TypeError. If more than one column is given, SortBy sorts by the tuple of the columns; that is, if two values in the first column are equal, they are sorted by the second column, and so on.
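The tuple ordering described above is the usual multi-key comparison. The following standalone sketch uses only the standard sort package on a hypothetical Row type; it illustrates the ordering rule and is not how SortBy is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a two-column record. (Hypothetical type.)
type Row struct {
	Team  string
	Score int
}

// SortByTeamThenScore orders rows by Team and breaks ties by Score,
// which is the tuple order for the column list ("Team", "Score").
func SortByTeamThenScore(rows []Row) {
	sort.SliceStable(rows, func(i, j int) bool {
		if rows[i].Team != rows[j].Team {
			return rows[i].Team < rows[j].Team
		}
		return rows[i].Score < rows[j].Score
	})
}

func main() {
	rows := []Row{{"b", 2}, {"a", 9}, {"b", 1}, {"a", 3}}
	SortByTeamThenScore(rows)
	fmt.Println(rows) // prints [{a 3} {a 9} {b 1} {b 2}]
}
```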

func (*Plot) Stat

func (p *Plot) Stat(stats ...Stat) *Plot

Stat applies each of stats in order to p's data.

TODO: Perform scale transforms before applying stats.

func (*Plot) WriteSVG

func (p *Plot) WriteSVG(w io.Writer, width, height int) error

type Plotter

type Plotter interface {
	Apply(*Plot)
}

A Plotter is an operation that can modify a Plot.

func AxisLabel

func AxisLabel(axis, label string) Plotter

AxisLabel returns a Plotter that sets the label of an axis on a Plot. By default, Plot constructs automatic axis labels from column names, but AxisLabel lets callers override these.

TODO: Should labels be attached to aesthetics, generally?

TODO: Should this really be a Plotter or just a method of Plot?

func Title

func Title(label string) Plotter

Title returns a Plotter that sets the title of a Plot.

type Ranger

type Ranger interface {
	RangeType() reflect.Type
}

XXX

A Ranger must be either a ContinuousRanger or a DiscreteRanger.

type Scaler

type Scaler interface {
	ExpandDomain(table.Slice)

	// Ranger sets this Scaler's output range if r is non-nil and
	// returns the previous range. If a scale's Ranger is nil, it
	// will be assigned a default Ranger based on its aesthetic
	// when the Plot is rendered.
	Ranger(r Ranger) Ranger

	// XXX Should RangeType be implied by the aesthetic?
	//
	// XXX Should this be a method of Ranger instead?
	RangeType() reflect.Type

	// XXX
	//
	// x must be of the same type as the values in the domain Var.
	//
	// XXX Or should this take a slice? Or even a Var? That would
	// also eliminate RangeType(), though then Map would need to
	// know how to make the right type of return slice. Unless we
	// pushed slice mapping all the way to Ranger.
	//
	// XXX We could eliminate ExpandDomain if the caller was
	// required to pass everything to this at once and this did
	// the scale training. That would also make it easy to
	// implement the cardinal -> discrete by value order rule.
	// This would probably also make Map much faster.
	//
	// XXX If x is Unscaled, Map must only apply the ranger.
	Map(x interface{}) interface{}

	// Ticks returns a set of "nice" major and minor tick marks
	// spanning this Scaler's domain. The returned tick locations
	// are values in this Scaler's domain type in increasing
	// order. labels[i] gives the label of the major tick at
	// major[i]. The minor ticks are a superset of the major
	// ticks.
	//
	// max and pred constrain the ticks returned by Ticks. If
	// possible, Ticks returns the largest set of ticks such that
	// there are no more than max major ticks and the ticks
	// satisfy pred. Both are hints, since for some scale types
	// there's no clear way to reduce the number of ticks.
	//
	// pred should return true if the given set of ticks is
	// acceptable. pred must be "monotonic" in the following
	// sense: if pred is true for a given set of ticks, it must be
	// true for any subset of those ticks and if pred is false for
	// a given set of ticks, it must be false for any superset of
	// those ticks. In other words, pred should return false if
	// there are "too many" ticks or they are "too close
	// together". If pred is nil, it is assumed to always be
	// satisfied.
	//
	// If no tick marks can be produced (for example, there are no
	// values in this Scaler's domain or the predicate cannot be
	// satisfied), Ticks returns nil, nil, nil.
	//
	// TODO: Should this return ticks in the input space, the
	// intermediate space, or the output space? moremath returns
	// values in the input space. Input space values don't work
	// for discrete scales if I want the ticks between values.
	// Intermediate space works for continuous and discrete
	// inputs, but not for discrete ranges (maybe that's okay) and
	// it's awkward for a caller to do anything with an
	// intermediate space value. Output space doesn't work with
	// this API because I change the plot location in the course
	// of layout without recomputing ticks. However, output space
	// could work if Scaler exposed tick levels, since I could
	// save the computed tick level across a re-layout and
	// recompute the output space ticks from that.
	Ticks(max int, pred func(major, minor table.Slice, labels []string) bool) (major, minor table.Slice, labels []string)

	// SetFormatter sets the formatter for values on this scale.
	//
	// f may be nil, which makes this Scaler use the default
	// formatting. Otherwise, f must be a func(T) string where T
	// is convertible from the Scaler's input type (note that this
	// is weaker than typical Go function calls, which require
	// that the argument be assignable; this makes it possible to
	// use general-purpose functions like func(float64) string
	// even for more specific input types).
	SetFormatter(f interface{})

	CloneScaler() Scaler
}

XXX

A Scaler can be cardinal, discrete, or identity.

A cardinal Scaler has a VarCardinal input domain. If its output range is continuous, it maps an interval over the input to an interval of the output (possibly through a transformation such as a logarithm). If its output range is discrete, the input is discretized in value order and it acts like a discrete scale.

XXX The cardinal -> discrete rule means we need to keep all of the input data, rather than just its bounds, just in case the range is discrete. Maybe it should just be a bucketing rule?

A discrete Scaler has a VarNominal input domain. If the input is VarOrdinal, its order is used; otherwise, index order is imposed. If the output range is continuous, a discrete Scaler maps its input to the centers of equal sub-intervals of [0, 1] and then applies the Ranger. If the output range is discrete, the Scaler maps the Nth input level to the N%len(range)th output value.
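Both discrete mapping rules are small enough to sketch directly. The helpers below are hypothetical and standalone: Center gives the center of the i-th of n equal sub-intervals of [0, 1] (the value a continuous Ranger would then be applied to), and Pick selects from a discrete range with wrap-around.

```go
package main

import "fmt"

// Center returns the center of the i-th of n equal sub-intervals of [0, 1].
func Center(i, n int) float64 {
	return (float64(i) + 0.5) / float64(n)
}

// Pick returns the output value for input level i from a discrete range,
// wrapping around when there are more levels than output values.
func Pick(i int, values []string) string {
	return values[i%len(values)]
}

func main() {
	for i := 0; i < 4; i++ {
		fmt.Println(Center(i, 4)) // prints 0.125, 0.375, 0.625, 0.875
	}
	shapes := []string{"circle", "square", "triangle"}
	fmt.Println(Pick(4, shapes)) // prints square
}
```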

An identity Scaler ignores its input domain and output range and uses an identity function for mapping input to output. This is useful for specifying aesthetics directly, such as color or size, and is especially useful for constant Vars.

XXX Should identity Scalers map numeric types to float64? Maybe it should depend on the range type of the ranger?

XXX Arrange documentation as X -> Y?

func DefaultScale

func DefaultScale(seq table.Slice) (Scaler, error)

func NewIdentityScale

func NewIdentityScale() Scaler

func NewOrdinalScale

func NewOrdinalScale() Scaler

type Stat

type Stat interface {
	F(table.Grouping) table.Grouping
}

A Stat transforms a table.Grouping.

type StepMode

type StepMode int

StepMode controls how LayerSteps connects subsequent points.

const (
	// StepHV makes LayerSteps connect subsequent points with a
	// horizontal segment and then a vertical segment.
	StepHV StepMode = iota

	// StepVH makes LayerSteps connect subsequent points with a
	// vertical segment and then a horizontal segment.
	StepVH

	// StepHMid makes LayerSteps connect subsequent points A and B
	// with three segments: a horizontal segment from A to the
	// midpoint between A and B, followed by a vertical segment,
	// followed by a horizontal segment from the midpoint to B.
	StepHMid

	// StepVMid makes LayerSteps connect subsequent points A and B
	// with three segments: a vertical segment from A to the
	// midpoint between A and B, followed by a horizontal segment,
	// followed by a vertical segment from the midpoint to B.
	StepVMid
)
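The four modes can be illustrated by listing the vertices each one produces between two points. This is a standalone sketch with hypothetical names (Point, Mode, Steps) rather than gg's own types, and it reads "midpoint" as the midpoint along the axis of the first segment, which is an interpretation of the comments above.

```go
package main

import "fmt"

// Point is a position in plot coordinates. (Hypothetical type.)
type Point struct{ X, Y float64 }

// Mode mirrors the four step modes described above. (Hypothetical type.)
type Mode int

const (
	HV   Mode = iota // horizontal, then vertical
	VH               // vertical, then horizontal
	HMid             // horizontal to mid X, vertical, horizontal
	VMid             // vertical to mid Y, horizontal, vertical
)

// Steps returns the vertices of the axis-aligned path from a to b.
func Steps(a, b Point, m Mode) []Point {
	switch m {
	case HV:
		return []Point{a, {b.X, a.Y}, b}
	case VH:
		return []Point{a, {a.X, b.Y}, b}
	case HMid:
		mx := (a.X + b.X) / 2
		return []Point{a, {mx, a.Y}, {mx, b.Y}, b}
	case VMid:
		my := (a.Y + b.Y) / 2
		return []Point{a, {a.X, my}, {b.X, my}, b}
	}
	return []Point{a, b} // unknown mode: fall back to a straight segment
}

func main() {
	a, b := Point{0, 0}, Point{4, 2}
	fmt.Println(Steps(a, b, HV))   // prints [{0 0} {4 0} {4 2}]
	fmt.Println(Steps(a, b, HMid)) // prints [{0 0} {2 0} {2 2} {4 2}]
}
```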

func (StepMode) String

func (i StepMode) String() string

type Unscaled

type Unscaled float64

Unscaled represents a value that should not be scaled, but instead mapped directly to the output range. For continuous scales, this should be a value between 0 and 1. For discrete scales, this should be an integral value.

TODO: This is confusing for opacity and size because it *doesn't* specify an exact opacity or size ratio since their default rangers aren't [0,1]. Maybe Unscaled should bypass scaling altogether (and only work if the range type is float64).

Directories

Path Synopsis
Package layout provides helpers for laying out hierarchies of rectangular elements in two dimensional space.