Documentation ¶
Overview ¶
Package learn exposes machine learning functionalities.
Regression:
- linear regression
Classification:
- kNN
Clustering:
- k means clustering
Example of data ¶
Categorical and numerical features are supported in kNN and k-means clustering (Kmc).
Hours  Status  Stars  Price
12,    "good", 5,     15.10
1,     "bad",  1,     1
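For illustration, rows like the ones above could be stored in a MemoryTable, with numerical features as float64 and categorical ones as string (a sketch, values taken from the rows above):

var data learn.MemoryTable = [][]interface{}{
	{12.0, "good", 5.0, 15.10},
	{1.0, "bad", 1.0, 1.0},
}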
This is a work-in-progress package; APIs are unstable and can change quickly.
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ErrNoData = errors.New("learn: no data")
ErrNoData is returned in case of problems retrieving data from Table's underlying storage.
Functions ¶
func Normalize ¶
func Normalize(data Table, mu, sigma []float64, catSet []string) ([]float64, []float64, []string, error)
Normalize uses Table's Update() to modify rows with normalized values.
Numerical features are normalized with the formula:
(x - mu) / sigma
If mu, sigma or catSet are nil, they are calculated and returned; otherwise their computation is skipped and the passed values are used.
Categorical features are mapped to a representation suitable for the other functions in the package.
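A minimal sketch of the intended usage, assuming trainSet and testSet are Tables loaded elsewhere (e.g. with ReadAllCSV): the statistics computed on the training set are reused to normalize the test set so that both share the same scale.

// First call: mu, sigma and catSet are nil, so they are computed from trainSet.
mu, sigma, catSet, err := learn.Normalize(trainSet, nil, nil, nil)
if err != nil {
	log.Fatal(err)
}
// Second call: the precomputed values are reused for the test data.
_, _, _, err = learn.Normalize(testSet, mu, sigma, catSet)
if err != nil {
	log.Fatal(err)
}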
Types ¶
type Classifier ¶
type Classifier interface {
Predict(Table) (Table, error) // Returns a Table with predicted labels as rows.
}
Classifier models a classification problem (binary or multi-label).
func NewkNN ¶
func NewkNN(trainData Table, k int) (Classifier, error)
NewkNN returns a new kNN Classifier. Labels must be stored as the last field in Table's rows.
Given m training samples with n features each, brute force is used if m < 100, otherwise a k-d tree is built. The brute force implementation is at least O(n*m), but if m is low it should be the better choice as it avoids the tree-building overhead. Search in a k-d tree is O(n*log(m)), but when n > ~20 the k-d tree could degrade to O(n*m).
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainSet, err := learn.ReadAllCSV("datasets/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, catSet, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	// Categorize single sample.
	var testSet learn.MemoryTable = make([][]interface{}, 1)
	testSet[0] = []interface{}{5.2, 3.4, 1.3, 0.1}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, catSet)
	if err != nil {
		log.Fatal(err)
	}
	prediction, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	r, err := prediction.Row(0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("predicted category:", r[0])
}
Output:

predicted category: setosa
type ConfMatrix ¶
type ConfMatrix struct {
// contains filtered or unexported fields
}
ConfMatrix stores confusion matrix needed to calculate precision and recall for classified labels.
func ConfusionM ¶
func ConfusionM(expect, predict Table) (ConfMatrix, error)
ConfusionM computes the confusion matrix. The predict Table must store labels in single-field rows; expected labels are taken from the last field of expect's rows.
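A minimal sketch with hand-made Tables (the values are illustrative only): expect carries the true labels in its last field, predict carries one predicted label per row.

var expect learn.MemoryTable = [][]interface{}{
	{1.2, 0.5, "setosa"},
	{0.7, 1.1, "virginica"},
}
var predict learn.MemoryTable = [][]interface{}{
	{"setosa"},
	{"setosa"},
}
cm, err := learn.ConfusionM(expect, predict)
if err != nil {
	log.Fatal(err)
}
fmt.Println(cm)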
func (ConfMatrix) String ¶
func (cm ConfMatrix) String() string
type KmcResult ¶
type KmcResult struct {
	Map       []Point
	Centroids [][]interface{}
	TotalSSE  float64 // Sum of squared errors
}
KmcResult stores the result of k-means clustering. FIXME: divide TotalSSE by the number of samples to obtain a smaller number.
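A minimal sketch of how a KmcResult could be inspected once returned by Kmc (printClusters is a hypothetical helper, not part of the package):

// printClusters reports the cluster assignment of every sample stored in a KmcResult.
func printClusters(result learn.KmcResult) {
	// result.Map[i].K is the index of the centroid assigned to sample i.
	for i, p := range result.Map {
		fmt.Printf("sample %d -> cluster %d (distance %.3f)\n", i, p.K, p.Distance)
	}
	fmt.Println("total SSE:", result.TotalSSE)
}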
func Kmc ¶
Kmc computes k-means clustering (currently broken).
Data MUST be normalized before being passed; the Normalize function should be used.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	data, err := learn.ReadAllCSV("datasets/iris_nolabels.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(data, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	result, err := learn.Kmc(data, 3, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
Output:
type MemoryTable ¶
type MemoryTable [][]interface{}
MemoryTable is a Table that stores data in memory.
func (MemoryTable) Row ¶
func (t MemoryTable) Row(i int) ([]interface{}, error)
Row implements Table's Row.
func (MemoryTable) Update ¶
func (t MemoryTable) Update(i int, r []interface{}) error
Update implements Table's Update.
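A minimal sketch of both methods on a hand-made table (values are illustrative only):

var t learn.MemoryTable = [][]interface{}{
	{1.0, "good", 5.0},
	{2.0, "bad", 1.0},
}
r, err := t.Row(0)
if err != nil {
	log.Fatal(err)
}
fmt.Println(r) // [1 good 5]
// Substitute the second row in place.
if err := t.Update(1, []interface{}{3.0, "good", 4.0}); err != nil {
	log.Fatal(err)
}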
type Point ¶
type Point struct {
	K        int     // The index of centroid to which point belongs.
	Distance float64 // Distance from centroid.
}
Point stores data about kmc's points.
type Regression ¶
Regression models a regression problem.
func NewLinearRegression ¶
func NewLinearRegression(Data Table) (Regression, error)
NewLinearRegression returns Regression type for linear regression.
Data is a Table with training samples as rows. Last element in the row MUST be the observed value of dependent variable y.
The current implementation uses the normal equation, so data normalization is not necessary.
The Table will be loaded into memory.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainData, err := learn.ReadAllCSV("datasets/linear_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	var tab learn.MemoryTable = make([][]interface{}, 1)
	// No need to normalize as normal equation is used.
	tab[0] = []interface{}{1650.0, 3.0}
	lr, err := learn.NewLinearRegression(trainData)
	if err != nil {
		log.Fatal(err)
	}
	y, err := lr.Predict(tab)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("predicted price for a (%.f sq-ft, %.f rooms) house: $%.f", tab[0][0], tab[0][1], y[0])
}
Output:

predicted price for a (1650 sq-ft, 3 rooms) house: $293081
type Table ¶
type Table interface {
	Caps() (int, int)                    // Returns rows and columns numbers.
	Row(i int) ([]interface{}, error)    // Returns i-th row.
	Update(i int, r []interface{}) error // Substitutes i-th row with r.
}
Table models tabular data.
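MemoryTable is the implementation shipped with the package; a custom backing store only needs the three methods above. A minimal sketch, assuming ErrNoData is the appropriate error for an out-of-range index (sliceTable is hypothetical and not part of the package):

// sliceTable is a hypothetical Table backed by a slice of rows.
type sliceTable struct {
	rows [][]interface{}
}

func (s *sliceTable) Caps() (int, int) {
	if len(s.rows) == 0 {
		return 0, 0
	}
	return len(s.rows), len(s.rows[0])
}

func (s *sliceTable) Row(i int) ([]interface{}, error) {
	if i < 0 || i >= len(s.rows) {
		return nil, learn.ErrNoData
	}
	return s.rows[i], nil
}

func (s *sliceTable) Update(i int, r []interface{}) error {
	if i < 0 || i >= len(s.rows) {
		return learn.ErrNoData
	}
	s.rows[i] = r
	return nil
}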
func ReadAllCSV ¶
ReadAllCSV reads the whole file and loads it into memory.
type Validation ¶
Validation stores validation data for a single label.
type ValidationReport ¶
type ValidationReport struct {
	Labels   map[string]Validation
	Accuracy float64
}
ValidationReport stores precision and recall for all the labels and the overall accuracy.
func Validate ¶
func Validate(cm ConfMatrix) ValidationReport
Validate computes precision, recall and overall accuracy. Used for cross-validating Classifier.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	// Cross validation
	trainSet, err := learn.ReadAllCSV("datasets/iris_train.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, stringFeature, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	testSet, err := learn.ReadAllCSV("datasets/iris_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, stringFeature)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	predictedLabels, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	confMatrix, err := learn.ConfusionM(testSet, predictedLabels)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(confMatrix)
	report := learn.Validate(confMatrix)
	fmt.Println(report)
}
Output:

setosa(1):     5 0 0
versicolor(2): 0 7 0
virginica(3):  0 0 3
feature    | precision | recall |
setosa     | 1.00      | 1.00   |
versicolor | 1.00      | 1.00   |
virginica  | 1.00      | 1.00   |
Overall accuracy: 1.00
func (ValidationReport) String ¶
func (r ValidationReport) String() string
Notes ¶
Bugs ¶
randomly returns same category in tests.