Documentation ¶
Overview ¶
Package learn exposes machine learning functionalities.
Regression:
- linear regression
Classification:
- kNN
Clustering:
- k means clustering
Example of data ¶
Categorical and numerical features are supported in kNN and k-means clustering (Kmc).
Hours  Status  Stars  Price
12,    "good", 5,     15.10
1,     "bad",  1,     1
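For illustration, rows like the ones above could be stored in a MemoryTable, with numerical features as float64 and categorical ones as string (a sketch, values taken from the rows above):

var data learn.MemoryTable = [][]interface{}{
	{12.0, "good", 5.0, 15.10},
	{1.0, "bad", 1.0, 1.0},
}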
This is a work-in-progress package; APIs are unstable and can change quickly.
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ErrNoData = errors.New("learn: no data")
ErrNoData is returned in case of problems retrieving data from Table's underlying storage.
Functions ¶
func Normalize ¶
func Normalize(data Table, mu, sigma []float64, catSet []string) ([]float64, []float64, []string, error)
Normalize uses Table's Update() to modify rows with normalized values.
Numerical features are normalized with the formula:
(x - mu) / sigma
If mu, sigma or catSet are nil, they are calculated and returned; otherwise their computation is skipped and the passed values are used.
Categorical features are mapped to a representation suitable for the other functions in the package.
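A minimal sketch of the intended usage, assuming trainSet and testSet are Tables loaded elsewhere (e.g. with ReadAllCSV): the statistics computed on the training set are reused to normalize the test set so that both share the same scale.

// First call: mu, sigma and catSet are nil, so they are computed from trainSet.
mu, sigma, catSet, err := learn.Normalize(trainSet, nil, nil, nil)
if err != nil {
	log.Fatal(err)
}
// Second call: the precomputed values are reused for the test data.
_, _, _, err = learn.Normalize(testSet, mu, sigma, catSet)
if err != nil {
	log.Fatal(err)
}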
Types ¶
type Classifier ¶
type Classifier interface {
Predict(Table) (Table, error) // Returns a Table with predicted labels as rows.
}
Classifier models a classification problem (binary or multi-label).
func NewkNN ¶
func NewkNN(trainData Table, k int) (Classifier, error)
NewkNN returns a new kNN Classifier. Labels must be stored as the last field in Table's rows.
Given m training samples with n features each, brute force is used if m < 100, otherwise a k-d tree is built. The brute force implementation is at least O(n*m), but if m is low it should be the better choice as it avoids the tree-building overhead. Search in a k-d tree is O(n*log(m)), but when n > ~20 the k-d tree could degrade to O(n*m).
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainSet, err := learn.ReadAllCSV("datasets/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, catSet, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	// Categorize single sample.
	var testSet learn.MemoryTable = make([][]interface{}, 1)
	testSet[0] = []interface{}{5.2, 3.4, 1.3, 0.1}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, catSet)
	if err != nil {
		log.Fatal(err)
	}
	prediction, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	r, err := prediction.Row(0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("predicted category:", r[0])
}
Output:

predicted category: setosa
type ConfMatrix ¶
type ConfMatrix struct {
// contains filtered or unexported fields
}
ConfMatrix stores confusion matrix needed to calculate precision and recall for classified labels.
func ConfusionM ¶
func ConfusionM(expect, predict Table) (ConfMatrix, error)
ConfusionM computes the confusion matrix. The predict Table must store labels in single-field rows; expected labels are taken from the last field of expect's rows.
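A minimal sketch with hand-made Tables (the values are illustrative only): expect carries the true labels in its last field, predict carries one predicted label per row.

var expect learn.MemoryTable = [][]interface{}{
	{1.2, 0.5, "setosa"},
	{0.7, 1.1, "virginica"},
}
var predict learn.MemoryTable = [][]interface{}{
	{"setosa"},
	{"setosa"},
}
cm, err := learn.ConfusionM(expect, predict)
if err != nil {
	log.Fatal(err)
}
fmt.Println(cm)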
func (ConfMatrix) String ¶
func (cm ConfMatrix) String() string
type KmcResult ¶
type KmcResult struct {
	Map       []Point
	Centroids [][]interface{}
	TotalSSE  float64 // Sum of squared errors
}
KmcResult stores the result of k-means clustering. FIXME: divide TotalSSE by the number of samples to obtain a smaller number.
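A minimal sketch of how a KmcResult could be inspected once returned by Kmc (printClusters is a hypothetical helper, not part of the package):

// printClusters reports the cluster assignment of every sample stored in a KmcResult.
func printClusters(result learn.KmcResult) {
	// result.Map[i].K is the index of the centroid assigned to sample i.
	for i, p := range result.Map {
		fmt.Printf("sample %d -> cluster %d (distance %.3f)\n", i, p.K, p.Distance)
	}
	fmt.Println("total SSE:", result.TotalSSE)
}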
func Kmc ¶
Kmc computes k-means clustering (currently broken).
Data MUST be normalized before being passed; the Normalize function should be used.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	data, err := learn.ReadAllCSV("datasets/iris_nolabels.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(data, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	result, err := learn.Kmc(data, 3, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
Output:
type MemoryTable ¶
type MemoryTable [][]interface{}
MemoryTable is a Table that stores data in memory.
func (MemoryTable) Row ¶
func (t MemoryTable) Row(i int) ([]interface{}, error)
Row implements Table's Row.
func (MemoryTable) Update ¶
func (t MemoryTable) Update(i int, r []interface{}) error
Update implements Table's Update.
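A minimal sketch of both methods on a hand-made table (values are illustrative only):

var t learn.MemoryTable = [][]interface{}{
	{1.0, "good", 5.0},
	{2.0, "bad", 1.0},
}
r, err := t.Row(0)
if err != nil {
	log.Fatal(err)
}
fmt.Println(r) // [1 good 5]
// Substitute the second row in place.
if err := t.Update(1, []interface{}{3.0, "good", 4.0}); err != nil {
	log.Fatal(err)
}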
type Point ¶
type Point struct {
	K        int     // The index of centroid to which point belongs.
	Distance float64 // Distance from centroid.
}
Point stores data about kmc's points.
type Regression ¶
Regression models a regression problem.
func NewLinearRegression ¶
func NewLinearRegression(Data Table) (Regression, error)
NewLinearRegression returns Regression type for linear regression.
Data is a Table with training samples as rows. Last element in the row MUST be the observed value of dependent variable y.
The current implementation uses the normal equation, so data normalization is not necessary.
The Table will be loaded into memory.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	trainData, err := learn.ReadAllCSV("datasets/linear_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	var tab learn.MemoryTable = make([][]interface{}, 1)
	// No need to normalize as normal equation is used.
	tab[0] = []interface{}{1650.0, 3.0}
	lr, err := learn.NewLinearRegression(trainData)
	if err != nil {
		log.Fatal(err)
	}
	y, err := lr.Predict(tab)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("predicted price for a (%.f sq-ft, %.f rooms) house: $%.f", tab[0][0], tab[0][1], y[0])
}
Output:

predicted price for a (1650 sq-ft, 3 rooms) house: $293081
type Table ¶
type Table interface {
	Caps() (int, int)                    // Returns rows and columns numbers.
	Row(i int) ([]interface{}, error)    // Returns i-th row.
	Update(i int, r []interface{}) error // Substitutes i-th row with r.
}
Table models tabular data.
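MemoryTable is the implementation shipped with the package; a custom backing store only needs the three methods above. A minimal sketch, assuming ErrNoData is the appropriate error for an out-of-range index (sliceTable is hypothetical and not part of the package):

// sliceTable is a hypothetical Table backed by a slice of rows.
type sliceTable struct {
	rows [][]interface{}
}

func (s *sliceTable) Caps() (int, int) {
	if len(s.rows) == 0 {
		return 0, 0
	}
	return len(s.rows), len(s.rows[0])
}

func (s *sliceTable) Row(i int) ([]interface{}, error) {
	if i < 0 || i >= len(s.rows) {
		return nil, learn.ErrNoData
	}
	return s.rows[i], nil
}

func (s *sliceTable) Update(i int, r []interface{}) error {
	if i < 0 || i >= len(s.rows) {
		return learn.ErrNoData
	}
	s.rows[i] = r
	return nil
}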
func ReadAllCSV ¶
ReadAllCSV reads the whole file and loads it into memory.
type Validation ¶
Validation stores validation data for a single label.
type ValidationReport ¶
type ValidationReport struct {
	Labels   map[string]Validation
	Accuracy float64
}
ValidationReport stores precision and recall for all the labels and the overall accuracy.
func Validate ¶
func Validate(cm ConfMatrix) ValidationReport
Validate computes precision, recall and overall accuracy. Used for cross-validating Classifier.
Example ¶
package main

import (
	"fmt"
	"log"

	"github.com/eraclitux/learn"
)

func main() {
	// Cross validation
	trainSet, err := learn.ReadAllCSV("datasets/iris_train.csv")
	if err != nil {
		log.Fatal(err)
	}
	mu, sigma, stringFeature, err := learn.Normalize(trainSet, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	testSet, err := learn.ReadAllCSV("datasets/iris_test.csv")
	if err != nil {
		log.Fatal(err)
	}
	_, _, _, err = learn.Normalize(testSet, mu, sigma, stringFeature)
	if err != nil {
		log.Fatal(err)
	}
	clf, err := learn.NewkNN(trainSet, 3)
	if err != nil {
		log.Fatal(err)
	}
	predictedLabels, err := clf.Predict(testSet)
	if err != nil {
		log.Fatal(err)
	}
	confMatrix, err := learn.ConfusionM(testSet, predictedLabels)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(confMatrix)
	report := learn.Validate(confMatrix)
	fmt.Println(report)
}
Output:

setosa(1):     5 0 0
versicolor(2): 0 7 0
virginica(3):  0 0 3
feature    | precision | recall |
setosa     | 1.00      | 1.00   |
versicolor | 1.00      | 1.00   |
virginica  | 1.00      | 1.00   |
Overall accuracy: 1.00
func (ValidationReport) String ¶
func (r ValidationReport) String() string
Notes ¶
Bugs ¶
randomly returns same category in tests.