Version: v1.0.1
Published: Nov 19, 2019
License: MIT

README ¶

regression

Multivariable Linear Regression in Go (golang)

installation

$ go get github.com/sajari/regression

Supports Go 1.8+

example usage

Import the package, create a regression, and add data to it. You can use as many variables as you like; the example below has 3 variables for each observation.

package main

import (
	"fmt"

	"github.com/sajari/regression"
)

func main() {
	r := new(regression.Regression)
	r.SetObserved("Murders per annum per 1,000,000 inhabitants")
	r.SetVar(0, "Inhabitants")
	r.SetVar(1, "Percent with incomes below $5000")
	r.SetVar(2, "Percent unemployed")
	r.Train(
		regression.DataPoint(11.2, []float64{587000, 16.5, 6.2}),
		regression.DataPoint(13.4, []float64{643000, 20.5, 6.4}),
		regression.DataPoint(40.7, []float64{635000, 26.3, 9.3}),
		regression.DataPoint(5.3, []float64{692000, 16.5, 5.3}),
		regression.DataPoint(24.8, []float64{1248000, 19.2, 7.3}),
		regression.DataPoint(12.7, []float64{643000, 16.5, 5.9}),
		regression.DataPoint(20.9, []float64{1964000, 20.2, 6.4}),
		regression.DataPoint(35.7, []float64{1531000, 21.3, 7.6}),
		regression.DataPoint(8.7, []float64{713000, 17.2, 4.9}),
		regression.DataPoint(9.6, []float64{749000, 14.3, 6.4}),
		regression.DataPoint(14.5, []float64{7895000, 18.1, 6}),
		regression.DataPoint(26.9, []float64{762000, 23.1, 7.4}),
		regression.DataPoint(15.7, []float64{2793000, 19.1, 5.8}),
		regression.DataPoint(36.2, []float64{741000, 24.7, 8.6}),
		regression.DataPoint(18.1, []float64{625000, 18.6, 6.5}),
		regression.DataPoint(28.9, []float64{854000, 24.9, 8.3}),
		regression.DataPoint(14.9, []float64{716000, 17.9, 6.7}),
		regression.DataPoint(25.8, []float64{921000, 22.4, 8.6}),
		regression.DataPoint(21.7, []float64{595000, 20.2, 8.4}),
		regression.DataPoint(25.7, []float64{3353000, 16.9, 6.7}),
	)
	r.Run()

	fmt.Printf("Regression formula:\n%v\n", r.Formula)
	fmt.Printf("Regression:\n%s\n", r)
}

Note: You can also add data points one by one.

Once the regression has been calculated, you can print the data and inspect the R^2, variance, residuals, etc. You can also access the coefficients directly to use elsewhere, e.g.

// Get the coefficient for the "Inhabitants" variable 0:
c := r.Coeff(0)

You can also use the model to predict new data points:

prediction, err := r.Predict([]float64{587000, 16.5, 6.2})

Feature crosses are supported so your model can capture fixed non-linear relationships:

r.Train(
	regression.DataPoint(11.2, []float64{587000, 16.5, 6.2}),
)
// Add a new feature which is the first variable (index 0) to the power of 2
r.AddCross(regression.PowCross(0, 2))
r.Run()

Documentation ¶

Constants ¶

This section is empty.

Variables ¶

var (
	// ErrNotEnoughData signals that there weren't enough data points to train the model.
	ErrNotEnoughData = errors.New("not enough data points")
	// ErrTooManyVars signals that there are too many variables for the number of observations being made.
	ErrTooManyVars = errors.New("not enough observations to to support this many variables")
	// ErrRegressionRun signals that the Run method has already been called on the trained dataset.
	ErrRegressionRun = errors.New("regression has already been run")
)

Functions ¶

func DataPoint ¶

func DataPoint(obs float64, vars []float64) *dataPoint

DataPoint creates a well-formed *dataPoint used for training.

func MakeDataPoints ¶

func MakeDataPoints(a [][]float64, obsIndex int) []*dataPoint

MakeDataPoints makes a `[]*dataPoint` from a `[][]float64`. The expected format for the input is a row-major [][]float64. That is to say, the outer slice holds the rows and each inner slice holds the columns of one row. All rows are expected to have the same length. The obsIndex parameter indicates which column should be used as the observed value; the remaining columns are the variables.
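To make the row-major layout concrete, here is a small standalone sketch of the split that obsIndex describes. splitRow is a hypothetical helper written for this illustration, not the package's implementation, and it assumes the non-observed columns keep their original order.

```go
package main

import "fmt"

// splitRow is a hypothetical helper, not part of the package. It separates
// one row of a row-major table into the observed value (the column at
// obsIndex) and the remaining columns, kept in their original order.
func splitRow(row []float64, obsIndex int) (obs float64, vars []float64) {
	vars = make([]float64, 0, len(row)-1)
	vars = append(vars, row[:obsIndex]...)
	vars = append(vars, row[obsIndex+1:]...)
	return row[obsIndex], vars
}

func main() {
	// Each inner slice is one row; every row has the same number of columns.
	table := [][]float64{
		{11.2, 587000, 16.5, 6.2},
		{13.4, 643000, 20.5, 6.4},
	}
	for _, row := range table {
		obs, vars := splitRow(row, 0) // column 0 holds the observed value
		fmt.Println(obs, vars)
	}
}
```

With the package itself, the same table can be passed straight to training, since Train is variadic: r.Train(regression.MakeDataPoints(table, 0)...).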

func MultiplierCross ¶

func MultiplierCross(vars ...int) featureCross

Feature cross based on the multiplication of multiple inputs.

func PowCross ¶

func PowCross(i int, power float64) featureCross

Feature cross based on computing the power of an input.
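A feature cross adds a derived input computed from the existing ones. The standalone sketch below shows the arithmetic behind the two kinds of cross described above; powFeature and productFeature are hypothetical helpers for illustration only, and appending the derived values after the original inputs is an assumption about layout rather than a statement about the package's internals.

```go
package main

import (
	"fmt"
	"math"
)

// powFeature mirrors the idea behind PowCross(i, power):
// raise input i to the given power.
func powFeature(vars []float64, i int, power float64) float64 {
	return math.Pow(vars[i], power)
}

// productFeature mirrors the idea behind MultiplierCross(idx...):
// multiply the selected inputs together.
func productFeature(vars []float64, idx ...int) float64 {
	p := 1.0
	for _, i := range idx {
		p *= vars[i]
	}
	return p
}

func main() {
	vars := []float64{3, 4, 5}
	// Copy the inputs, then append the derived features.
	extended := append([]float64{}, vars...)
	extended = append(extended, powFeature(vars, 0, 2))     // 3^2 = 9
	extended = append(extended, productFeature(vars, 1, 2)) // 4*5 = 20
	fmt.Println(extended)
}
```

In the package, a cross is registered with AddCross before calling Run, so the model is fitted on the original variables plus the derived ones.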

Types ¶

type DataPoints ¶

type DataPoints []*dataPoint

DataPoints is a slice of *dataPoint. This type allows for easier construction of training data points.

type Regression ¶

type Regression struct {
	R2                float64
	Varianceobserved  float64
	VariancePredicted float64

	Formula string
	// contains filtered or unexported fields
}

Regression is the exposed data structure for interacting with the API.

func (*Regression) AddCross ¶

func (r *Regression) AddCross(cross featureCross)

AddCross registers a feature cross to be applied to the data points.

func (*Regression) Coeff ¶

func (r *Regression) Coeff(i int) float64

Coeff returns the calculated coefficient for variable i.

func (*Regression) GetCoeffs ¶ added in v1.0.1

func (r *Regression) GetCoeffs() []float64

GetCoeffs returns the calculated coefficients. The element at index 0 is the offset.
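A coefficient slice in this shape can be evaluated outside the package, for example when exporting a fitted model. The sketch below is a hypothetical helper, evaluate, that assumes index 0 is the offset (as stated above) and that the coefficient at index i+1 belongs to variable i; the second assumption is not spelled out here, so verify it against Predict before relying on it. The coefficient values are made up.

```go
package main

import "fmt"

// evaluate is a hypothetical helper. It assumes coeffs[0] is the offset and
// coeffs[i+1] is the coefficient for vars[i].
func evaluate(coeffs, vars []float64) (float64, error) {
	if len(coeffs) != len(vars)+1 {
		return 0, fmt.Errorf("want %d coefficients, got %d", len(vars)+1, len(coeffs))
	}
	y := coeffs[0]
	for i, v := range vars {
		y += coeffs[i+1] * v
	}
	return y, nil
}

func main() {
	coeffs := []float64{1, 2, 0.5} // made-up values: offset 1, then two slopes
	y, err := evaluate(coeffs, []float64{3, 4})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(y) // 1 + 2*3 + 0.5*4 = 9
}
```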

func (*Regression) GetObserved ¶

func (r *Regression) GetObserved() string

GetObserved gets the name of the observed value.

func (*Regression) GetVar ¶

func (r *Regression) GetVar(i int) string

GetVar gets the name of variable i.

func (*Regression) Predict ¶

func (r *Regression) Predict(vars []float64) (float64, error)

Predict returns the predicted value for the input features.

func (*Regression) Run ¶

func (r *Regression) Run() error

Run checks that there is enough data to run the regression and that training has not already been completed. Once these checks have passed, any registered feature crosses are applied and the model is trained using QR decomposition.

func (*Regression) SetObserved ¶

func (r *Regression) SetObserved(name string)

SetObserved sets the name of the observed value.

func (*Regression) SetVar ¶

func (r *Regression) SetVar(i int, name string)

SetVar sets the name of variable i.

func (*Regression) String ¶

func (r *Regression) String() string

String satisfies the fmt.Stringer interface to display a regression as a string.

func (*Regression) Train ¶

func (r *Regression) Train(d ...*dataPoint)

Train the regression with some data points.