learn

Version: v0.0.0-...-c838f72
Warning

This package is not in the latest version of its module.

Published: Jun 5, 2017 License: MIT Imports: 16 Imported by: 0

README

=====
learn
=====

|image0|_ |image1|_ |image2|_

.. |image0| image:: https://godoc.org/github.com/eraclitux/learn?status.png
.. _image0: https://godoc.org/github.com/eraclitux/learn

.. |image1| image:: https://travis-ci.org/eraclitux/learn.svg?branch=master
.. _image1: https://travis-ci.org/eraclitux/learn

.. |image2| image:: https://goreportcard.com/badge/github.com/eraclitux/learn
.. _image2: https://goreportcard.com/report/github.com/eraclitux/learn

Machine learning for Go.

This is a work-in-progress package: APIs are unstable and may change quickly.

Datasets
--------

Datasets used in tests come from the `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml>`_.

Documentation

Overview

Package learn provides machine learning functionality.

Regression:

  • linear regression

Classification:

  • k-nearest neighbors (kNN)

Clustering:

  • k-means clustering

Example of data

Both categorical and numerical features are supported by kNN and k-means clustering (Kmc).

Hours   Status    Stars   Price
12,     "good",   5,      15.10
1,      "bad",    1,      1



Variables

var ErrNoData = errors.New("learn: no data")

ErrNoData is returned when data cannot be retrieved from a Table's underlying storage.

Functions

func Normalize

func Normalize(data Table, mu, sigma []float64, catSet []string) ([]float64, []float64, []string, error)

Normalize modifies rows in place, using the Table's Update method to replace them with normalized values.

Numerical features are normalized with the formula:

$$x' = \frac{x - \mu}{\sigma}$$

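Here mu and sigma are the per-feature mean and standard deviation. If mu, sigma, or catSet are nil, they are computed from data and returned. Otherwise their computation is skipped and the passed values are used. This lets a test set be normalized with the statistics of the training set.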

Categorical features are mapped to a representation suitable for the other functions in the package.
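The small, self-contained sketch below illustrates the numerical part of this scheme. It is not the package's code. The helper zScore is hypothetical: it works on plain [][]float64 rows instead of a Table, ignores categorical features, and assumes sigma is the population standard deviation (the package may use a different estimator). As with Normalize, nil statistics are computed, while existing ones are reused.

```go
package main

import (
	"fmt"
	"math"
)

// zScore normalizes each column of rows in place with x' = (x - mu) / sigma.
// Hypothetical helper: if mu or sigma is nil, both are computed from rows
// (mean and population standard deviation, an assumption) and returned;
// otherwise the given values are reused, e.g. training-set statistics.
func zScore(rows [][]float64, mu, sigma []float64) ([]float64, []float64) {
	if len(rows) == 0 {
		return mu, sigma
	}
	if mu == nil || sigma == nil {
		n := len(rows[0])
		m := float64(len(rows))
		mu = make([]float64, n)
		sigma = make([]float64, n)
		for _, r := range rows {
			for j, x := range r {
				mu[j] += x
			}
		}
		for j := range mu {
			mu[j] /= m
		}
		for _, r := range rows {
			for j, x := range r {
				d := x - mu[j]
				sigma[j] += d * d
			}
		}
		for j := range sigma {
			sigma[j] = math.Sqrt(sigma[j] / m)
		}
	}
	for _, r := range rows {
		for j := range r {
			if sigma[j] != 0 {
				r[j] = (r[j] - mu[j]) / sigma[j]
			} else {
				r[j] -= mu[j] // constant column: only center it
			}
		}
	}
	return mu, sigma
}

func main() {
	train := [][]float64{{1, 10}, {3, 30}}
	mu, sigma := zScore(train, nil, nil)
	fmt.Println(mu, sigma, train)

	// Reuse the training statistics for new samples.
	test := [][]float64{{2, 20}}
	zScore(test, mu, sigma)
	fmt.Println(test)
}
```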

Types

type Classifier

type Classifier interface {
	Predict(Table) (Table, error) // Returns a Table with predicted labels as rows.
}

Classifier models a classification problem (binary or multi-class).

func NewkNN

func NewkNN(trainData Table, k int) (Classifier, error)

NewkNN returns a new kNN Classifier. Labels must be stored in the last field of each of the Table's rows.

Given m training samples with n features each, brute-force search is used if m < 100; otherwise a k-d tree is built. Brute-force search is at least O(n*m), but for small m it should be the better choice because it avoids the tree-building overhead. Search in a k-d tree is O(n*log(m)), but when n exceeds roughly 20 it can degrade to O(n*m).
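The brute-force branch can be sketched as follows, assuming Euclidean distance and a simple majority vote among the k nearest samples. The type sample and the functions sqDist and knnBrute are hypothetical, and the k-d tree branch is not shown. Computing every distance costs O(n*m), and sorting the m distances adds O(m*log(m)).

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical training sample: numeric features plus a label.
type sample struct {
	features []float64
	label    string
}

// sqDist returns the squared Euclidean distance between a and b; skipping
// the square root does not change which neighbors are nearest.
func sqDist(a, b []float64) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// knnBrute labels q by majority vote among its k nearest training samples,
// comparing q against every sample (no index structure).
func knnBrute(train []sample, q []float64, k int) string {
	dist := make([]float64, len(train))
	idx := make([]int, len(train))
	for i, s := range train {
		dist[i] = sqDist(s.features, q) // O(n) per sample, O(n*m) in total
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return dist[idx[a]] < dist[idx[b]] })
	if k > len(idx) {
		k = len(idx)
	}
	// Count votes; on a tie, the label that reached the top count first wins.
	votes := make(map[string]int)
	best, bestVotes := "", 0
	for _, i := range idx[:k] {
		l := train[i].label
		votes[l]++
		if votes[l] > bestVotes {
			best, bestVotes = l, votes[l]
		}
	}
	return best
}

func main() {
	train := []sample{
		{[]float64{0, 0}, "a"},
		{[]float64{0, 1}, "a"},
		{[]float64{5, 5}, "b"},
		{[]float64{6, 5}, "b"},
		{[]float64{5, 6}, "b"},
	}
	fmt.Println(knnBrute(train, []float64{0.5, 0.5}, 3))
}
```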

Example
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainSet, err := learn.ReadAllCSV("datasets/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, catSet, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	// Categorize single sample.
	var testSet learn.MemoryTable = make([][]interface{}, 1)
	testSet[0] = []interface{}{5.2, 3.4, 1.3, 0.1}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, catSet)
	if err != nil {
		log.Fatal(err)
	}
	prediction, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	r, err := prediction.Row(0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("predicted category:", r[0])

}
Output:

predicted category: setosa

type ConfMatrix

type ConfMatrix struct {
	// contains filtered or unexported fields
}

ConfMatrix stores the confusion matrix needed to compute precision and recall for classified labels.

func ConfusionM

func ConfusionM(expect, predict Table) (ConfMatrix, error)

ConfusionM computes the confusion matrix. The predict Table must store each label in a single-field row. Expected labels are taken from the last field of expect's rows.
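Here is a minimal sketch of building such a matrix from two equal-length label lists. The function confusion is hypothetical and works on []string rather than Table. It assumes rows index expected labels and columns index predicted labels. That is a common convention, but the package's own orientation is not documented here.

```go
package main

import (
	"fmt"
	"sort"
)

// confusion returns the sorted label set and a matrix m where m[i][j]
// counts samples whose expected label is labels[i] and whose predicted
// label is labels[j].
func confusion(expected, predicted []string) ([]string, [][]int, error) {
	if len(expected) != len(predicted) {
		return nil, nil, fmt.Errorf("length mismatch: %d expected vs %d predicted",
			len(expected), len(predicted))
	}
	// Collect every label seen on either side.
	seen := make(map[string]bool)
	for _, l := range expected {
		seen[l] = true
	}
	for _, l := range predicted {
		seen[l] = true
	}
	labels := make([]string, 0, len(seen))
	for l := range seen {
		labels = append(labels, l)
	}
	sort.Strings(labels)
	pos := make(map[string]int, len(labels))
	for i, l := range labels {
		pos[l] = i
	}
	// Tally each (expected, predicted) pair.
	m := make([][]int, len(labels))
	for i := range m {
		m[i] = make([]int, len(labels))
	}
	for i := range expected {
		m[pos[expected[i]]][pos[predicted[i]]]++
	}
	return labels, m, nil
}

func main() {
	exp := []string{"setosa", "setosa", "virginica", "versicolor"}
	pred := []string{"setosa", "versicolor", "virginica", "versicolor"}
	labels, m, err := confusion(exp, pred)
	if err != nil {
		panic(err)
	}
	for i, l := range labels {
		fmt.Println(l, m[i])
	}
}
```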

func (ConfMatrix) String

func (cm ConfMatrix) String() string

type KmcResult

type KmcResult struct {
	Map       []Point
	Centroids [][]interface{}
	TotalSSE  float64 // Sum of squared errors
}

KmcResult stores the result of k-means clustering. FIXME: divide TotalSSE by the number of samples to get a smaller number.

func Kmc

func Kmc(data Table, k int, weights []float64) (result *KmcResult, er error)

Kmc computes k-means clustering (currently broken).

Data MUST be normalized before being passed in; use the Normalize function.

Example
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	data, err := learn.ReadAllCSV("datasets/iris_nolabels.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(data, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	result, err := learn.Kmc(data, 3, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
Output:

func (*KmcResult) String

func (r *KmcResult) String() string

type MemoryTable

type MemoryTable [][]interface{}

MemoryTable is a Table that stores data in memory.

func (MemoryTable) Caps

func (t MemoryTable) Caps() (int, int)

Caps implements Table's Caps.

func (MemoryTable) Row

func (t MemoryTable) Row(i int) ([]interface{}, error)

Row implements Table's Row.

func (MemoryTable) Update

func (t MemoryTable) Update(i int, r []interface{}) error

Update implements Table's Update.

type Point

type Point struct {
	K        int     // The index of centroid to which point belongs.
	Distance float64 // Distance from centroid.
}

Point stores per-sample data computed by Kmc: the index of the assigned centroid and the distance from it.

type Regression

type Regression interface {
	Predict(Table) ([]float64, error)
}

Regression models a regression problem.

func NewLinearRegression

func NewLinearRegression(Data Table) (Regression, error)

NewLinearRegression returns a Regression that performs linear regression.

Data is a Table with training samples as rows. The last element of each row MUST be the observed value of the dependent variable y.

The current implementation solves the normal equation, so data normalization is not necessary.

The whole Table is loaded into memory.

Example
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainData, err := learn.ReadAllCSV("datasets/linear_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	var tab learn.MemoryTable = make([][]interface{}, 1)
	// No need to normalize: the normal
	// equation is used.
	tab[0] = []interface{}{1650.0, 3.0}
	lr, err := learn.NewLinearRegression(trainData)
	if err != nil {
		log.Fatal(err)
	}
	y, err := lr.Predict(tab)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("predicted price for a (%.f sq-ft, %.f rooms) house: $%.f", tab[0][0], tab[0][1], y[0])
}
Output:

predicted price for a (1650 sq-ft, 3 rooms) house: $293081

type Table

type Table interface {
	Caps() (int, int)                    // Returns the number of rows and columns.
	Row(i int) ([]interface{}, error)    // Returns i-th row.
	Update(i int, r []interface{}) error // Substitutes i-th row with r.
}

Table models tabular data.
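Any type that provides these three methods can be used as a Table. The sketch below shows a hypothetical floatTable backed by [][]float64. It redeclares the Table interface locally so it compiles on its own. Because Update only accepts float64 fields, it suits purely numeric data. Whether Normalize writes float64 values back for every feature is not documented here, so treat this as an illustration of the interface, not a drop-in replacement for MemoryTable.

```go
package main

import (
	"errors"
	"fmt"
)

// Table mirrors the interface documented above so this sketch is self-contained.
type Table interface {
	Caps() (int, int)
	Row(i int) ([]interface{}, error)
	Update(i int, r []interface{}) error
}

// floatTable is a hypothetical Table backed by numeric rows.
type floatTable struct {
	rows [][]float64
}

var errOutOfRange = errors.New("row index out of range")

// Caps returns the number of rows and columns.
func (t *floatTable) Caps() (int, int) {
	if len(t.rows) == 0 {
		return 0, 0
	}
	return len(t.rows), len(t.rows[0])
}

// Row returns a copy of the i-th row as []interface{} holding float64 values.
func (t *floatTable) Row(i int) ([]interface{}, error) {
	if i < 0 || i >= len(t.rows) {
		return nil, errOutOfRange
	}
	r := make([]interface{}, len(t.rows[i]))
	for j, v := range t.rows[i] {
		r[j] = v
	}
	return r, nil
}

// Update replaces the i-th row; every field must be a float64.
func (t *floatTable) Update(i int, r []interface{}) error {
	if i < 0 || i >= len(t.rows) {
		return errOutOfRange
	}
	row := make([]float64, len(r))
	for j, v := range r {
		f, ok := v.(float64)
		if !ok {
			return fmt.Errorf("field %d: expected float64, got %T", j, v)
		}
		row[j] = f
	}
	t.rows[i] = row
	return nil
}

// Compile-time check that *floatTable satisfies Table.
var _ Table = (*floatTable)(nil)

func main() {
	tab := &floatTable{rows: [][]float64{{1, 2}, {3, 4}}}
	fmt.Println(tab.Caps())
	if err := tab.Update(1, []interface{}{5.0, 6.0}); err != nil {
		panic(err)
	}
	r, _ := tab.Row(1)
	fmt.Println(r)
}
```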

func ReadAllCSV

func ReadAllCSV(path string) (Table, error)

ReadAllCSV reads the whole file and loads it into memory.

type Validation

type Validation struct {
	Precision float64
	Recall    float64
}

Validation stores validation data for a single label.

type ValidationReport

type ValidationReport struct {
	Labels   map[string]Validation
	Accuracy float64
}

ValidationReport stores precision and recall for all the labels and the overall accuracy.

func Validate

func Validate(cm ConfMatrix) ValidationReport

Validate computes per-label precision and recall and the overall accuracy. It is used to cross-validate a Classifier.
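The metrics can be derived from a confusion matrix as sketched below, assuming m[i][j] counts samples with expected label i and predicted label j. Precision for label i is m[i][i] divided by the sum of column i. Recall is m[i][i] divided by the sum of row i. Accuracy is the sum of the diagonal divided by the total. The function metrics is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this sketch.

```go
package main

import "fmt"

// metrics computes per-label precision and recall and the overall accuracy
// from a square confusion matrix m (rows: expected, columns: predicted).
// A zero denominator yields 0 (a convention of this sketch).
func metrics(m [][]int) (precision, recall []float64, accuracy float64) {
	k := len(m)
	precision = make([]float64, k)
	recall = make([]float64, k)
	var diag, total int
	for i := 0; i < k; i++ {
		var rowSum, colSum int
		for j := 0; j < k; j++ {
			rowSum += m[i][j] // samples whose expected label is i
			colSum += m[j][i] // samples predicted as label i
			total += m[i][j]
		}
		diag += m[i][i]
		if colSum > 0 {
			precision[i] = float64(m[i][i]) / float64(colSum)
		}
		if rowSum > 0 {
			recall[i] = float64(m[i][i]) / float64(rowSum)
		}
	}
	if total > 0 {
		accuracy = float64(diag) / float64(total)
	}
	return precision, recall, accuracy
}

func main() {
	// Same counts as the matrix printed by the Validate example.
	m := [][]int{{5, 0, 0}, {0, 7, 0}, {0, 0, 3}}
	p, r, acc := metrics(m)
	fmt.Printf("precision=%.2f recall=%.2f accuracy=%.2f\n", p, r, acc)
}
```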

Example
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	// Cross validation
	trainSet, err := learn.ReadAllCSV("datasets/iris_train.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, stringFeature, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	testSet, err := learn.ReadAllCSV("datasets/iris_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, stringFeature)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	predictedLabels, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	confMatrix, err := learn.ConfusionM(testSet, predictedLabels)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(confMatrix)
	report := learn.Validate(confMatrix)
	fmt.Println(report)

}
Output:

            setosa(1):           5           0           0
        versicolor(2):           0           7           0
         virginica(3):           0           0           3

     feature | precision | recall |
      setosa |      1.00 |   1.00 |
  versicolor |      1.00 |   1.00 |
   virginica |      1.00 |   1.00 |
Overall accuracy: 1.00

func (ValidationReport) String

func (r ValidationReport) String() string

Notes

Bugs

  • Randomly returns the same category in tests.
