randomforest

Version: v0.0.0-...-82dce2f
Published: Jul 8, 2022 License: Apache-2.0 Imports: 8 Imported by: 1

README

GoDoc: https://godoc.org/github.com/malaschitz/randomForest

Test:

go test ./... -cover -coverpkg=.  

randomForest

A Random Forest implementation in Go.

Simple Random Forest

	// Build 1000 random rows; the class is the integer part of the feature sum (0..3).
	xData := [][]float64{}
	yData := []int{}
	for i := 0; i < 1000; i++ {
		x := []float64{rand.Float64(), rand.Float64(), rand.Float64(), rand.Float64()}
		y := int(x[0] + x[1] + x[2] + x[3])
		xData = append(xData, x)
		yData = append(yData, y)
	}
	forest := randomforest.Forest{}
	forest.Data = randomforest.ForestData{X: xData, Class: yData}
	forest.Train(1000) // train 1000 trees
	// test
	fmt.Println("Vote", forest.Vote([]float64{0.1, 0.1, 0.1, 0.1}))
	fmt.Println("Vote", forest.Vote([]float64{0.9, 0.9, 0.9, 0.9}))
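According to the signature in the documentation, Vote returns a []float64 rather than a single class. The following is a minimal sketch of turning such a vector into a predicted class. It assumes (the source does not confirm this) that entry i holds the score for class i. The helper argmax is hypothetical and not part of the package.

```go
package main

import "fmt"

// argmax returns the index of the largest value in votes, or -1 if votes is empty.
// Assumption: index i of the vote vector corresponds to class i.
func argmax(votes []float64) int {
	best := -1
	for i, v := range votes {
		if best == -1 || v > votes[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Stand-in for the result of a forest.Vote(...) call.
	votes := []float64{0.05, 0.15, 0.7, 0.1}
	fmt.Println("predicted class:", argmax(votes))
}
```

On ties the sketch keeps the lowest index, which is an arbitrary choice.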

Extremely Randomized Trees

	forest.TrainX(1000)	

Deep Forest

Deep Forest is inspired by https://arxiv.org/abs/1705.07366

	dForest := forest.BuildDeepForest()
	dForest.Train(20, 100, 1000) // 20 small forests (groves) of 100 trees each help to build a deep forest of 1000 trees

Continuous Random Forest

Continuous Random Forest is meant for data that keeps arriving (forex, weather, user logs, ...). New data creates new trees, and the oldest trees are removed.

	forest := randomforest.Forest{}
	data := []float64{rand.Float64(), rand.Float64()}
	res := 1 // result (class of the new row)
	forest.AddDataRow(data, res, 1000, 10, 2000)
	// AddDataRow adds the new row, trims the oldest row if there are more than 1000 rows,
	// and calculates 10 new trees, removing the oldest trees if there are more than 2000 trees.
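The row-trimming behaviour described for AddDataRow can be pictured as a sliding window. The sketch below uses only the standard library and a hypothetical window type; it illustrates the idea (append, then drop the oldest rows beyond max) and is not the package's implementation.

```go
package main

import "fmt"

// window is a hypothetical stand-in for the data held by a forest.
type window struct {
	rows   [][]float64
	labels []int
}

// add appends a row and drops the oldest rows while more than max are stored.
// max < 1 means unlimited, as in the AddDataRow description.
func (w *window) add(row []float64, label int, max int) {
	w.rows = append(w.rows, row)
	w.labels = append(w.labels, label)
	if max >= 1 && len(w.rows) > max {
		drop := len(w.rows) - max
		w.rows = w.rows[drop:]
		w.labels = w.labels[drop:]
	}
}

func main() {
	w := &window{}
	for i := 0; i < 5; i++ {
		w.add([]float64{float64(i)}, i%2, 3) // keep at most 3 rows
	}
	fmt.Println(w.rows, w.labels)
}
```

The same pattern would apply to the list of trees with maxTrees as the cap.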

Boruta Algorithm for feature selection

The Boruta algorithm was developed as a package for the R language. It is one of the most effective feature selection algorithms. There is a paper in the Journal of Statistical Software.

The Boruta algorithm uses a random forest to select important features.

	xData := ... //data
	yData := ... //labels
	selectedFeatures, _ := randomforest.BorutaDefault(xData, yData) // second return value is ignored here
	// or randomforest.Boruta(xData, yData, 100, 20, 0.05, true, true)
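Boruta, as generally described, judges each real feature against "shadow" features: shuffled copies of the original columns that carry no information about the label. The sketch below shows only that data preparation step, using the standard library. The function addShadows and its seed parameter are hypothetical and are not taken from this package.

```go
package main

import (
	"fmt"
	"math/rand"
)

// addShadows returns a copy of x in which every row is extended with one
// shadow value per original column. Each shadow column is a random
// permutation of the matching original column.
// Assumption: x is rectangular (all rows have the same length).
func addShadows(x [][]float64, seed int64) [][]float64 {
	if len(x) == 0 {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	rows, cols := len(x), len(x[0])
	out := make([][]float64, rows)
	for i := range out {
		out[i] = make([]float64, 2*cols)
		copy(out[i], x[i])
	}
	for c := 0; c < cols; c++ {
		perm := rng.Perm(rows) // random row order for this column
		for i, src := range perm {
			out[i][cols+c] = x[src][c]
		}
	}
	return out
}

func main() {
	x := [][]float64{{1, 10}, {2, 20}, {3, 30}}
	for _, row := range addShadows(x, 1) {
		fmt.Println(row)
	}
}
```

A forest trained on the extended data can then compare the importance of each real column with the importance reached by the shadow columns.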

The /examples directory contains an example using the MNIST database. The picture shows the features selected from the images (495 of 784).

[image: boruta 05]

Isolation Forest

Isolation forest is an anomaly detection algorithm. It detects anomalies using isolation (how far a data point is from the rest of the data), rather than modelling the normal points. (Wikipedia)

Two Isolation Forest methods are implemented. The first is done as a statistic over the standard Random Forest. After the Random Forest is computed, per tree and per branch it calculates how deep each record is. This is done over all trees, and the function returns the ranked statistics of the individual records. I recommend increasing the MaxDepth value.

	isolations, mean, stddev := forest.IsolationForest()
	for i, d := range isolations {
		fmt.Println(i, d, mean, stddev)
	}
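One way to use the returned mean and stddev is to flag records whose value lies far from the mean. The sketch below is standard-library only and does not call the package. The helper names and the threshold k are hypothetical. Since the text does not say which direction indicates an anomaly, it flags deviations on both sides.

```go
package main

import (
	"fmt"
	"math"
)

// meanStd returns the mean and population standard deviation of v.
// It is only used here to produce stand-in values.
func meanStd(v []float64) (mean, std float64) {
	if len(v) == 0 {
		return 0, 0
	}
	for _, x := range v {
		mean += x
	}
	mean /= float64(len(v))
	for _, x := range v {
		std += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(std / float64(len(v)))
}

// flagOutliers returns the indices of values lying more than k standard
// deviations away from the mean, in either direction.
func flagOutliers(values []float64, mean, stddev, k float64) []int {
	var out []int
	if stddev == 0 {
		return out
	}
	for i, v := range values {
		if math.Abs(v-mean) > k*stddev {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	isolations := []float64{10, 11, 9, 10, 2} // stand-in values
	mean, stddev := meanStd(isolations)
	fmt.Println("outlier indices:", flagOutliers(isolations, mean, stddev, 1.5))
}
```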

The second method follows the algorithm described at https://en.wikipedia.org/wiki/Isolation_forest. It gives different results from the first one. In the isolation2.go example, each label is evaluated separately.

	forest := randomforest.IsolationForest{X: x}
	forest.Train(TREES)
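The Wikipedia article defines an anomaly score s = 2^(-E[h]/c(n)), where E[h] is the average depth at which a record is isolated and c(n) is the average path length for a sample of n points. The sketch below computes that score from a (sum of depths, count) pair, which is how the Results field is described in the type documentation. Whether the package itself computes this score is not stated, so treat the function names and the use of Sample as n as assumptions.

```go
package main

import (
	"fmt"
	"math"
)

const eulerGamma = 0.5772156649

// avgPathLength is c(n) from the isolation forest definition:
// 2*H(n-1) - 2*(n-1)/n for n > 2, 1 for n == 2, and 0 otherwise,
// with the harmonic number H(i) approximated by ln(i) + Euler's constant.
func avgPathLength(n int) float64 {
	switch {
	case n <= 1:
		return 0
	case n == 2:
		return 1
	}
	h := math.Log(float64(n-1)) + eulerGamma
	return 2*h - 2*float64(n-1)/float64(n)
}

// anomalyScore returns 2^(-avgDepth/c(sample)) with avgDepth = sumDepth/count.
// It returns NaN when the score is undefined (count == 0 or c(sample) == 0).
func anomalyScore(sumDepth, count, sample int) float64 {
	c := avgPathLength(sample)
	if count == 0 || c == 0 {
		return math.NaN()
	}
	avgDepth := float64(sumDepth) / float64(count)
	return math.Pow(2, -avgDepth/c)
}

func main() {
	// Stand-in values: {sum of depths, count} for two records, sample size 256.
	results := [][2]int{{40, 10}, {120, 10}}
	for i, r := range results {
		fmt.Printf("record %d score %.3f\n", i, anomalyScore(r[0], r[1], 256))
	}
}
```

Scores close to 1 indicate records that are isolated after few splits; scores well below 0.5 indicate ordinary records.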

Results for MNIST are shown in the image.

[image: isolation forest]

Documentation

Constants

This section is empty.

Variables

var (
	NumWorkers = runtime.NumCPU() // max number of concurrent goroutines during training
)
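NumWorkers caps the number of goroutines that run at once during training. How the package enforces the cap is not shown here; a common Go pattern for this is a buffered channel used as a semaphore. The sketch below assumes that pattern, and the function runBounded is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runBounded runs task(0..n-1) with at most limit tasks in flight and
// returns the highest concurrency that was observed.
func runBounded(n, limit int, task func(i int)) int64 {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed worker
	var wg sync.WaitGroup
	var running, peak int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are running
		go func(i int) {
			defer wg.Done()
			cur := atomic.AddInt64(&running, 1)
			for { // record the peak concurrency
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			task(i)
			atomic.AddInt64(&running, -1)
			<-sem // free the slot
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	limit := runtime.NumCPU() // same default as NumWorkers
	trees := make([]int, 20)  // stand-in for trained trees
	peak := runBounded(len(trees), limit, func(i int) { trees[i] = i * i })
	fmt.Println("peak concurrency:", peak, "limit:", limit)
}
```

Each task writes to its own slice index, so no extra locking is needed for the results.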

Functions

func Boruta

func Boruta(x [][]float64, class []int, trees int, cycles int, threshold float64, recursive bool, verbose bool) ([]int, map[int]int)

func BorutaDefault

func BorutaDefault(x [][]float64, class []int) ([]int, map[int]int)

Boruta is a smart algorithm for selecting important features with Random Forest. It was originally developed in the R language.

x [][]float64 - data for the random forest. At least three features (columns) are required.
class []int - classes for the random forest (0, 1, ...).
trees int - number of trees used by the Boruta algorithm. A large number of trees is not needed (50-200).
cycles int - number of cycles of the Boruta algorithm (20-50).
threshold float64 - threshold for selecting features (0.05).
recursive bool - the algorithm repeats the process until all remaining features are important.
verbose bool - prints the progress of the Boruta algorithm.

Types

type Branch

type Branch struct {
	Attribute        int
	Value            float64
	IsLeaf           bool
	LeafValue        []float64
	Gini             float64
	GiniGain         float64
	Size             int
	Branch0, Branch1 *Branch
	Depth            int
}

Branch is a node in the tree structure of branches.
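The fields suggest the usual decision-tree layout: an inner node tests one attribute against Value and continues into Branch0 or Branch1, and a leaf carries LeafValue. The sketch below walks such a tree using a local node type that mirrors a subset of the Branch fields. Which side of the comparison maps to Branch0 is an assumption, as is the classify helper itself.

```go
package main

import "fmt"

// node mirrors a subset of the Branch fields for illustration.
type node struct {
	Attribute        int
	Value            float64
	IsLeaf           bool
	LeafValue        []float64
	Branch0, Branch1 *node
}

// classify walks from root to a leaf and returns that leaf's LeafValue.
// Assumption: x[Attribute] < Value selects Branch0, otherwise Branch1.
func classify(root *node, x []float64) []float64 {
	n := root
	for n != nil && !n.IsLeaf {
		if x[n.Attribute] < n.Value {
			n = n.Branch0
		} else {
			n = n.Branch1
		}
	}
	if n == nil {
		return nil
	}
	return n.LeafValue
}

func main() {
	low := &node{IsLeaf: true, LeafValue: []float64{1, 0}}
	high := &node{IsLeaf: true, LeafValue: []float64{0, 1}}
	root := &node{Attribute: 1, Value: 0.5, Branch0: low, Branch1: high}
	fmt.Println(classify(root, []float64{0.9, 0.2}))
	fmt.Println(classify(root, []float64{0.1, 0.8}))
}
```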

type DeepForest

type DeepForest struct {
	Forest         *Forest
	ForestDeep     Forest
	Groves         []Forest
	NGroves        int
	NFeatures      int
	NTrees         int
	RandomFeatures [][]int
	ResultFeatures [][]float64
	Results        []float64
}

DeepForest is a deep forest implementation consisting of a standard forest (Forest), mini forests (Groves), and the final ForestDeep (Forest + Groves).

func (*DeepForest) Train

func (dForest *DeepForest) Train(groves int, trees int, deepTrees int)

Train trains the DeepForest. The parameters are the number of groves, the number of trees in each grove, and the number of trees in the final deep forest.

func (*DeepForest) Vote

func (dForest *DeepForest) Vote(x []float64) []float64

Vote returns the result of the DeepForest.
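The field names RandomFeatures and ResultFeatures hint at one possible data flow: each grove sees a random subset of the features, and the groves' outputs are appended to the original features before the final forest is trained. That reading is a guess from the names only. The sketch below illustrates it with two hypothetical helpers and stand-in vote values, without calling the package.

```go
package main

import "fmt"

// project returns the values of x at the given feature indices.
func project(x []float64, idx []int) []float64 {
	out := make([]float64, len(idx))
	for i, j := range idx {
		out[i] = x[j]
	}
	return out
}

// augment returns a new slice holding x followed by every grove's vote vector.
func augment(x []float64, groveVotes [][]float64) []float64 {
	out := append([]float64(nil), x...)
	for _, v := range groveVotes {
		out = append(out, v...)
	}
	return out
}

func main() {
	x := []float64{0.2, 0.4, 0.6, 0.8}
	randomFeatures := [][]int{{0, 2}, {1, 3}} // one feature subset per grove
	for g, idx := range randomFeatures {
		fmt.Println("grove", g, "sees", project(x, idx))
	}
	// Stand-in for the vote vector each grove would return.
	groveVotes := [][]float64{{0.9, 0.1}, {0.3, 0.7}}
	fmt.Println("deep forest input:", augment(x, groveVotes))
}
```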

type Forest

type Forest struct {
	Data              ForestData // data used to calculate the trees
	Trees             []Tree     // all generated trees
	Features          int        // number of attributes
	Classes           int        // number of classes
	LeafSize          int        // leaf size
	MFeatures         int        // attributes used for choosing a proper split
	NTrees            int        // number of trees
	NSize             int        // length of data
	MaxDepth          int        // max depth of forest
	FeatureImportance []float64  // feature importance statistics
}

Forest is the base type for the whole forest, holding the data, the properties of the forest, and the trees.

func (*Forest) AddDataRow

func (forest *Forest) AddDataRow(data []float64, class int, max int, newTrees int, maxTrees int)

AddDataRow adds a new data row.

data: new data row
class: result (class of the new row)
max: maximum number of data rows. The first (oldest) rows are removed if there are more. If max < 1, the number of rows is unlimited.
newTrees: number of new trees calculated after adding the data row
maxTrees: maximum number of trees

This method supports Continuous Random Forest.

func (*Forest) BuildDeepForest

func (forest *Forest) BuildDeepForest() DeepForest

BuildDeepForest creates a DeepForest from the Forest.

func (*Forest) IsolationForest

func (forest *Forest) IsolationForest() (isolations []float64, mean float64, stddev float64)

IsolationForest calculates outliers with the Isolation Forest method.

func (*Forest) PrintFeatureImportance

func (forest *Forest) PrintFeatureImportance()

PrintFeatureImportance prints the list of features.

func (*Forest) Train

func (forest *Forest) Train(trees int)

Train runs the training process. The parameter is the number of trees to calculate.

func (*Forest) TrainX

func (forest *Forest) TrainX(trees int)

TrainX runs the training process using Extremely Randomized Trees.

func (*Forest) Vote

func (forest *Forest) Vote(x []float64) []float64

Vote is used to calculate the class for x with an existing (trained) forest.

func (*Forest) WeightVote

func (forest *Forest) WeightVote(x []float64) []float64

WeightVote uses the validation weight of each tree when calculating the result.
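The difference between a plain vote and a weighted vote can be sketched as an average of per-tree class distributions, with or without per-tree weights. The sketch assumes that each tree yields a distribution over classes and that a tree's Validation value serves as its weight. Neither assumption is confirmed by the text, and the function combine is hypothetical.

```go
package main

import "fmt"

// combine returns the weighted average of per-tree class distributions.
// A nil weights slice means every tree counts equally.
// Assumption: all distributions have the same length, and weights (if given)
// has one entry per tree.
func combine(perTree [][]float64, weights []float64) []float64 {
	if len(perTree) == 0 {
		return nil
	}
	out := make([]float64, len(perTree[0]))
	total := 0.0
	for t, dist := range perTree {
		w := 1.0
		if weights != nil {
			w = weights[t]
		}
		total += w
		for c, p := range dist {
			out[c] += w * p
		}
	}
	if total == 0 {
		return out
	}
	for c := range out {
		out[c] /= total
	}
	return out
}

func main() {
	perTree := [][]float64{{1, 0}, {0, 1}, {0, 1}} // stand-in leaf values of three trees
	validation := []float64{0.9, 0.1, 0.2}         // stand-in validation weights
	fmt.Println("plain:   ", combine(perTree, nil))
	fmt.Println("weighted:", combine(perTree, validation))
}
```

With these stand-in numbers the plain vote favours class 1, while the weighted vote favours class 0 because the first tree carries most of the weight.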

type ForestData

type ForestData struct {
	X     [][]float64 // All data are float64 numbers
	Class []int       // Result should be int numbers 0,1,2,..
}

ForestData contains the data for the forest.

type IsolationForest

type IsolationForest struct {
	X        [][]float64
	Features int      // number of attributes
	NTrees   int      // number of trees
	NSize    int      // len of data
	Sample   int      //sample size
	Results  [][2]int //results - sum of depths and counts for every data
}

IsolationForest is the base type for the isolation forest, holding the data and the properties of the forest.

func (*IsolationForest) Train

func (forest *IsolationForest) Train(trees int)

Train runs the training process. The parameter is the number of trees to calculate.

type Tree

type Tree struct {
	Root       Branch
	Validation float64
}

Tree is one random tree in the forest, with its root Branch and a validation number.

Directories

Path Synopsis
img
tests
generator
Package generator creates testing data for machine learning
