rf

package
v0.30.0
Published: Oct 4, 2021 | License: BSD-3-Clause | Imports: 10 | Imported by: 0

Documentation

Overview

Package rf implements an ensemble of classifiers using the random forest algorithm by Breiman and Cutler.

Breiman, Leo. "Random forests." Machine learning 45.1 (2001): 5-32.

The implementation is based on various sources and on the author's experience.

Constants

const (
	// DefNumTree is the default number of trees.
	DefNumTree = 100

	// DefPercentBoot is the default percentage of samples used for
	// bootstrapping a tree.
	DefPercentBoot = 66

	// DefOOBStatsFile is the default OOB statistic output file.
	DefOOBStatsFile = "rf.oob.stat"

	// DefPerfFile is the default performance output file.
	DefPerfFile = "rf.perf"

	// DefStatFile is the default statistic file.
	DefStatFile = "rf.stat"
)

Variables

var (
	// ErrNoInput indicates that no input samples were given.
	ErrNoInput = errors.New("rf: input samples is empty")
)

Functions

This section is empty.

Types

type Runtime

type Runtime struct {
	// Runtime embeds the common fields for a classifier.
	classifier.Runtime

	// NTree is the number of trees in the forest.
	NTree int `json:"NTree"`
	// NRandomFeature is the number of features randomly selected for each tree.
	NRandomFeature int `json:"NRandomFeature"`
	// PercentBoot is the percentage of samples used for bootstrapping.
	PercentBoot int `json:"PercentBoot"`
	// contains filtered or unexported fields
}

Runtime contains the input and output configuration used when generating a random forest.
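Because the exported fields carry JSON tags, a forest configuration can presumably be loaded from JSON. The sketch below illustrates this with a hypothetical `forestConfig` struct that mirrors only the three fields shown above; it is not the package's own type, and the fields embedded from `classifier.Runtime` are left out because they are not listed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forestConfig is a hypothetical mirror of the three exported fields of
// rf.Runtime shown above. It is not part of the package.
type forestConfig struct {
	NTree          int `json:"NTree"`
	NRandomFeature int `json:"NRandomFeature"`
	PercentBoot    int `json:"PercentBoot"`
}

// parseConfig decodes a JSON document into forestConfig. Fields missing
// from the JSON keep their zero value.
func parseConfig(data []byte) (forestConfig, error) {
	var cfg forestConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"NTree": 50, "NRandomFeature": 4, "PercentBoot": 66}`)

	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Fields left at zero would then be candidates for the defaults described under Initialize.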

func (*Runtime) AddBagIndex

func (forest *Runtime) AddBagIndex(bagIndex []int)

AddBagIndex adds a bagging index for bookkeeping.

func (*Runtime) AddCartTree

func (forest *Runtime) AddCartTree(tree cart.Runtime)

AddCartTree adds a tree to the forest.

func (*Runtime) Build

func (forest *Runtime) Build(samples tabula.ClasetInterface) (e error)

Build builds the forest using the samples dataset.

Algorithm,

(0) Recheck the input values (number of trees, bootstrap percentage, etc.) and open the statistic output file.
(1) For 0 to NTree,
(1.1) create a new tree; repeat until all trees have been built.
(2) Compute and write the total statistics.
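The steps above can be sketched as a simple loop. This is an illustration under assumptions, not the package's code: `buildForest` and the `grow` callback are hypothetical names, the statistic is reduced to a mean OOB error, file output is omitted, and the sketch stops at the first failed tree, which may differ from how the package retries.

```go
package main

import (
	"errors"
	"fmt"
)

// defNumTree mirrors the package constant DefNumTree.
const defNumTree = 100

// errNoInput mirrors the package variable ErrNoInput.
var errNoInput = errors.New("rf: input samples is empty")

// buildForest is a hypothetical sketch of the Build steps. grow is an
// assumed callback that grows tree number i and returns its OOB error rate.
func buildForest(nSamples, nTree int, grow func(i int) (float64, error)) (float64, error) {
	// (0) Recheck the input values.
	if nSamples == 0 {
		return 0, errNoInput
	}
	if nTree <= 0 {
		nTree = defNumTree
	}

	// (1) Grow the trees one by one, accumulating the OOB error.
	var total float64
	for i := 0; i < nTree; i++ {
		oobErr, err := grow(i)
		if err != nil {
			return 0, fmt.Errorf("tree %d: %w", i, err)
		}
		total += oobErr
	}

	// (2) Compute the total statistic, here only the mean OOB error.
	return total / float64(nTree), nil
}

func main() {
	mean, err := buildForest(150, 4, func(i int) (float64, error) {
		return 0.25, nil // pretend every tree has a 25% OOB error
	})
	fmt.Println(mean, err)
}
```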

func (*Runtime) ClassifySet

func (forest *Runtime) ClassifySet(samples tabula.ClasetInterface,
	sampleIds []int,
) (
	predicts []string, cm *classifier.CM, probs []float64,
)

ClassifySet predicts the class of each sample by running it through the forest, and returns the class predictions together with a confusion matrix and the class probabilities. `samples` is the set of samples to predict and `sampleIds` contains the indices of those samples. If `sampleIds` is not nil, each sample index is checked against each tree; if the sample was used to train a tree, that tree's vote is not counted.

Algorithm,

(0) Get the value space (possible class values in the dataset).
(1) For each row in the test set,
(1.1) collect votes from all trees,
(1.2) select the majority class vote, and
(1.3) compute and save the actual class probabilities.
(2) Compute the confusion matrix from the predictions.
(3) Compute the statistics from the confusion matrix.
(4) Write the statistics to file only if sampleIds is empty, which means the run is not from an OOB set.
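Steps (1.2), (1.3) and (2) can be illustrated with plain slices and maps. The helpers below (`majority`, `classProb`, `confusion`) are hypothetical and do not use `classifier.CM`; the tie-breaking rule and the definition of probability as a fraction of votes are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// majority returns the class with the most votes. Ties go to the
// lexicographically smallest class so the result is deterministic.
func majority(votes []string) string {
	counts := make(map[string]int)
	for _, v := range votes {
		counts[v]++
	}
	classes := make([]string, 0, len(counts))
	for c := range counts {
		classes = append(classes, c)
	}
	sort.Strings(classes)

	best, bestN := "", 0
	for _, c := range classes {
		if counts[c] > bestN {
			best, bestN = c, counts[c]
		}
	}
	return best
}

// classProb returns the fraction of votes that equal class.
func classProb(votes []string, class string) float64 {
	if len(votes) == 0 {
		return 0
	}
	n := 0
	for _, v := range votes {
		if v == class {
			n++
		}
	}
	return float64(n) / float64(len(votes))
}

// confusion counts (actual, predicted) pairs. Both slices are assumed
// to have the same length.
func confusion(actual, predicted []string) map[[2]string]int {
	cm := make(map[[2]string]int)
	for i := range actual {
		cm[[2]string{actual[i], predicted[i]}]++
	}
	return cm
}

func main() {
	// One slice of tree votes per test row.
	votes := [][]string{{"1", "1", "0"}, {"0", "0", "1"}, {"1", "0", "1"}}
	actual := []string{"1", "0", "0"}

	predicted := make([]string, len(votes))
	for i, v := range votes {
		predicted[i] = majority(v)
		fmt.Printf("row %d: predicted=%s p(actual)=%.2f\n",
			i, predicted[i], classProb(v, actual[i]))
	}
	fmt.Println(confusion(actual, predicted))
}
```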

func (*Runtime) GrowTree

func (forest *Runtime) GrowTree(samples tabula.ClasetInterface) (
	cm *classifier.CM, stat *classifier.Stat, e error,
)

GrowTree builds a new tree in the forest and returns the OOB error values, or an error if the tree cannot be grown.

Algorithm,

(1) Select random samples with replacement, keeping the out-of-bag (OOB) samples as well.
(2) Build the tree using CART, without pruning.
(3) Add the tree to the forest.
(4) Save the indices of the random samples for calculating the error rate later.
(5) Run the OOB samples on the forest.
(6) Calculate the OOB error rate and statistic values.
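Step (1) is ordinary bootstrap sampling. As a minimal sketch, assuming rows are addressed by integer index, the hypothetical `bootstrap` helper below draws indices with replacement and reports the rows that were never drawn as the OOB set. It does not use the package's `tabula` types.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a seeded random source so runs are reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// bootstrap draws n row indices with replacement from [0, nRows) and
// returns them together with the out-of-bag indices, that is, the rows
// that were never drawn. nRows must be greater than zero.
func bootstrap(rng *rand.Rand, nRows, n int) (bag, oob []int) {
	picked := make([]bool, nRows)
	bag = make([]int, 0, n)

	for i := 0; i < n; i++ {
		idx := rng.Intn(nRows)
		picked[idx] = true
		bag = append(bag, idx)
	}
	for idx, ok := range picked {
		if !ok {
			oob = append(oob, idx)
		}
	}
	return bag, oob
}

func main() {
	bag, oob := bootstrap(newRNG(1), 10, 6)
	fmt.Println("bag:", bag)
	fmt.Println("oob:", oob)
}
```

The bag indices are what step (4) saves, so that a later vote can skip trees that already saw a given row during training.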

func (*Runtime) Initialize

func (forest *Runtime) Initialize(samples tabula.ClasetInterface) error

Initialize checks the forest inputs and sets them to their default values if they are invalid.

It also calculates the number of random samples for each tree using,

number-of-sample * percentage-of-bootstrap
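Since PercentBoot is an integer percentage, the product has to be scaled by 100. A small worked sketch follows, using the hypothetical helper `bootstrapSize`; the validity range and the truncating integer division are assumptions, and the package may round differently.

```go
package main

import "fmt"

// defPercentBoot mirrors the package constant DefPercentBoot.
const defPercentBoot = 66

// bootstrapSize returns the number of samples drawn for each tree.
// Assumption: a percentage outside 1..100 is treated as invalid and
// replaced by the default; the result is truncated toward zero.
func bootstrapSize(nSamples, percentBoot int) int {
	if percentBoot <= 0 || percentBoot > 100 {
		percentBoot = defPercentBoot
	}
	return nSamples * percentBoot / 100
}

func main() {
	fmt.Println(bootstrapSize(150, 66)) // prints 99
}
```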

func (*Runtime) Trees

func (forest *Runtime) Trees() []cart.Runtime

Trees returns all trees in the forest.

func (*Runtime) Votes

func (forest *Runtime) Votes(sample *tabula.Row, sampleIdx int) (
	votes []string,
)

Votes returns the votes, or classes, from each tree for the given sample. If checkIdx is true, `sampleIdx` is checked against the samples used to train each tree; if it exists there, that tree is skipped for this sample.

(1) If the row was used to build the tree, skip the tree,
(2) classify the row in the tree,
(3) save the tree's class value.
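The three steps can be sketched as follows. The `tree` type and `votes` function are hypothetical stand-ins, not `cart.Runtime` or the method above, and treating a negative `sampleIdx` as "do not check" is an assumption of this sketch.

```go
package main

import "fmt"

// tree is a hypothetical stand-in for a trained CART tree.
type tree struct {
	bag      map[int]bool               // row indices used to train this tree
	classify func(row []float64) string // predicts a class for a row
}

// votes collects one class per tree. Assumption: a negative sampleIdx
// disables the in-bag check.
func votes(forest []tree, row []float64, sampleIdx int) []string {
	var out []string
	for _, t := range forest {
		// (1) Skip trees that were trained on this row.
		if sampleIdx >= 0 && t.bag[sampleIdx] {
			continue
		}
		// (2) Classify the row and (3) save the class value.
		out = append(out, t.classify(row))
	}
	return out
}

func main() {
	yes := func(row []float64) string { return "yes" }
	no := func(row []float64) string { return "no" }

	forest := []tree{
		{bag: map[int]bool{0: true, 1: true}, classify: yes},
		{bag: map[int]bool{1: true, 2: true}, classify: no},
		{bag: map[int]bool{0: true, 2: true}, classify: yes},
	}
	row := []float64{5.1, 3.5}

	fmt.Println(votes(forest, row, 0))  // only the second tree votes
	fmt.Println(votes(forest, row, -1)) // all three trees vote
}
```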
