bayes

package module
v0.1.0
Published: Jun 14, 2018 License: MIT Imports: 5 Imported by: 5

README

bayes

A simple implementation of a Naive Bayes classifier. More details are in the docs.

Development

Testing

Install ginkgo, a BDD testing framework for Go, together with the gomega matcher library:

go get github.com/onsi/ginkgo/ginkgo
go get github.com/onsi/gomega

To run the tests, go to the root directory of the project and run

ginkgo

# or

go test

Other implementations:

Go, Java, Python, R, Ruby

Documentation

Overview

Package bayes implements a Naive Bayes trainer and classifier. Code is located at https://github.com/gnames/bayes

The Naive Bayes rule calculates the probability of a hypothesis from prior knowledge about the hypothesis, as well as from evidence that supports or diminishes the probability of the hypothesis. Prior knowledge can dramatically influence the posterior probability of a hypothesis. For example, assuming that an adult bird that cannot fly is a penguin is very unlikely in the northern hemisphere, but very likely in Antarctica. Bayes' theorem is often depicted as

P(H|E) = P(H) * P(E|H) / P(E)

where H is our hypothesis, E is new evidence, P(H) is the prior probability that H is true, P(E|H) is the known probability of the evidence when H is true, and P(E) is the known probability of E across all known cases. P(H|E) is the posterior probability of the hypothesis H adjusted according to the new evidence E.
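
As a small worked illustration of the formula, with numbers invented purely for the example:

package main

import "fmt"

func main() {
	// Invented numbers: H = "the bird is a penguin",
	// E = "the bird cannot fly".
	pH := 0.01       // P(H): prior probability of meeting a penguin
	pEgivenH := 0.99 // P(E|H): probability that a penguin cannot fly
	pE := 0.05       // P(E): probability that any bird cannot fly

	pHgivenE := pH * pEgivenH / pE          // P(H|E) = P(H) * P(E|H) / P(E)
	fmt.Printf("P(H|E) = %.3f\n", pHgivenE) // 0.198
}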

Finding the probability that a hypothesis is true can be considered a classification event. Given prior knowledge and new evidence, we can assign an entity to the hypothesis that has the highest posterior probability.

Using odds instead of probabilities

It is possible to represent Bayes' theorem using odds. Odds describe how likely a hypothesis is in comparison to all other possible hypotheses.

odds = P(H) / (1 - P(H))

P(H) = odds / (1 + odds)

Using odds allows us to simplify Bayes calculations

oddsPosterior = oddsPrior * likelihood

where likelihood is

likelihood = P(E|H)/P(E|H')

P(E|H') in this case is the known probability of the evidence when H is not true. If we have several pieces of evidence that are independent of each other, the posterior odds can be calculated as the product of the prior odds and the likelihoods of all the given evidence.

oddsPosterior = oddsPrior * likelihood1 * likelihood2 * likelihood3 ...

Each subsequent piece of evidence modifies the prior odds. If the evidence is not independent (for example, the inability to fly and a propensity for nesting on the ground in birds), it skews the outcome. In reality the given evidence is quite often not completely independent. This is how Naive Bayes got its name: people who apply it "naively" assume that their evidence is completely independent. In practice the Naive Bayes approach often shows good results in spite of this known fallacy.
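
A minimal sketch of the odds form in Go, with an invented prior and two invented likelihood ratios:

package main

import "fmt"

func main() {
	pH := 0.01                 // P(H): prior probability
	oddsPrior := pH / (1 - pH) // odds = P(H) / (1 - P(H))

	// Likelihood ratios P(E|H)/P(E|H') for two pieces of evidence that
	// we naively treat as independent.
	likelihood1 := 20.0 // the bird cannot fly
	likelihood2 := 3.0  // the bird nests on the ground

	oddsPosterior := oddsPrior * likelihood1 * likelihood2
	pPosterior := oddsPosterior / (1 + oddsPosterior) // back to a probability

	fmt.Printf("posterior odds: %.3f, posterior probability: %.3f\n",
		oddsPosterior, pPosterior)
}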

Training and prior odds

It is quite possible that, while the likelihoods of the evidence are representative for the classification data, the prior odds from the training are not. As in the previous example, evidence that a bird cannot fly supports a 'penguin' hypothesis much better in Antarctica, because the odds of meeting a penguin there are much higher than in the northern hemisphere. Therefore we provide the ability to supply a prior odds value at a classification event.

Terminology

In natural language processing, pieces of `evidence` are often called `features`. We follow the same convention in this package.

Hypotheses are often called `classes` or `labels`. Based on the outcome, we classify an entity (in other words, assign a label to it). In this package we use the term `label` for hypotheses. Every label receives a number of elements, or `tokens`, each with a set of features.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IgnorePriorOdds

func IgnorePriorOdds(nb *NaiveBayes) error

IgnorePriorOdds might be needed in a multistep Bayes calculation where the prior odds have already been accounted for.
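
For example, since IgnorePriorOdds already has the OptionNB signature, it can be passed directly to Predict. This is only a sketch: nb and fs stand for a hypothetical trained classifier and feature slice, and the package is assumed to be imported as bayes.

post, err := nb.Predict(fs, bayes.IgnorePriorOdds)
if err != nil {
	log.Fatal(err)
}
// post now holds posterior odds computed without the prior odds.
fmt.Println(post.MaxLabel, post.MaxOdds)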

func Odds

func Odds(l Labeler, lf LabelFreq) (float64, error)

Odds returns the odds for a label in a given label frequency distribution.

func RegisterLabel

func RegisterLabel(m map[string]Labeler)

RegisterLabel takes a map from strings to the Labeler interface. This map is required to unmarshal JSON data from a string to a label using LabelFactory.
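
For example (a sketch; birdLabel is a hypothetical Labeler implementation like the one shown under the Labeler type below):

// Register the string names of all labels before calling Restore
// or UnmarshalJSON.
bayes.RegisterLabel(map[string]bayes.Labeler{
	"penguin": birdLabel("penguin"),
	"swift":   birdLabel("swift"),
})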

func WithPriorOdds

func WithPriorOdds(lf LabelFreq) func(*NaiveBayes) error

WithPriorOdds allows dynamic change of the prior odds used in calculations. Sometimes the prior odds at a classification event are very different from the ones acquired during training. If, for example, the 'real' prior odds are 100 times larger, the calculated posterior odds will be 100 times smaller than they are supposed to be.
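
A sketch of overriding the training priors during classification; nb, fs and the birdLabel type are hypothetical, as in the other sketches:

// Prior counts that better reflect the population we classify
// (for example, birds observed in Antarctica).
prior := bayes.LabelFreq{
	birdLabel("penguin"): 900,
	birdLabel("swift"):   100,
}
post, err := nb.Predict(fs, bayes.WithPriorOdds(prior))
if err != nil {
	log.Fatal(err)
}
fmt.Println(post.MaxLabel)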

Types

type FeatureFreq

type FeatureFreq map[FeatureName]map[FeatureValue]map[Labeler]float64

FeatureFreq is a map for collecting frequencies of a training feature set. FeatureFreq is used for calculating Likelihoods of a NaiveBayes classifier.

type FeatureName

type FeatureName string

FeatureName is the name of a Feature.

type FeatureTotal

type FeatureTotal map[FeatureName]map[FeatureValue]float64

FeatureTotal is used for calculating multinomial likelihoods. For example, if we are interested in calculating the likelihood of a feature `f`, its likelihood would be

L = P(f|H)/P(f|H')

where `H` is the main "hypothesis" or "label" and `H'` is the combination of all other hypotheses.

type FeatureValue

type FeatureValue string

FeatureValue is the value of a Feature.

type Featurer

type Featurer interface {
	// Name defines an id of a feature
	Name() FeatureName
	// Value defines the value of a feature. The value set can be simple
	// 'true|false' or more complex 'red|blue|grey|yellow'
	Value() FeatureValue
}

Featurer is an interface for a piece of "evidence" we use for training a NaiveBayes classifier, or for the classification of an unknown entity.
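
A minimal sketch of a type satisfying Featurer; the birdFeature type is hypothetical and not part of the package:

// birdFeature is a hypothetical implementation of the Featurer interface.
type birdFeature struct {
	name  bayes.FeatureName
	value bayes.FeatureValue
}

// Name returns the feature's id, for example "flight".
func (f birdFeature) Name() bayes.FeatureName { return f.name }

// Value returns the feature's value, for example "true" or "false".
func (f birdFeature) Value() bayes.FeatureValue { return f.value }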

type LabelFreq

type LabelFreq map[Labeler]float64

LabelFreq is a collection of counts for every Label in the training dataset. This information allows calculating the prior odds for a Label.

type LabeledFeatures

type LabeledFeatures struct {
	Features []Featurer
	Label    Labeler
}

LabeledFeatures is the data used for supervised training of the NaiveBayes algorithm.

type Labeler

type Labeler interface {
	fmt.Stringer
}

Labeler is an interface representing a "hypothesis" or "class" that NaiveBayes is aware of.
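
Because Labeler only embeds fmt.Stringer, a simple string-based type is enough. The birdLabel type here is hypothetical:

// birdLabel is a hypothetical implementation of the Labeler interface.
type birdLabel string

// String satisfies fmt.Stringer and therefore Labeler.
func (l birdLabel) String() string { return string(l) }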

func LabelFactory

func LabelFactory(s string) (Labeler, error)

LabelFactory takes a string and returns a Label. This function is mostly used for unmarshalling data from JSON to a NaiveBayes object.

type Likelihoods

type Likelihoods map[Labeler]map[FeatureName]map[FeatureValue]float64

Likelihoods provides the likelihood of a feature appearing for a particular label.

type NaiveBayes

type NaiveBayes struct {
	// Labels is a list of "hypotheses", "classes", "categories", "labels".
	// It contains all labels created by training.
	Labels []Labeler `json:"-"`
	// FeatureFreq keeps count of all the features for the labels.
	FeatureFreq `json:"-"`
	// LabelFreq keeps counts of the tokens belonging to each label
	LabelFreq `json:"-"`
	// FeatureTotal keeps total count of tokens for each feature.
	FeatureTotal `json:"feature_total"`
	// Total is a total number of tokens used for training.
	Total float64 `json:"total"`
	// IgnorePriorOdds is set true if prior odds do not need to be used in the
	// returning result.
	IgnorePriorOdds bool

	Output io.Writer `json:"-"`
	// contains filtered or unexported fields
}

NaiveBayes is a classifier for assigning an entity represented by its features to a label.

func NewNaiveBayes

func NewNaiveBayes() *NaiveBayes

NewNaiveBayes is a constructor for a NaiveBayes object. It initializes several important defaults and sets options that modify the behavior of the NaiveBayes object. The options available in this package (IgnorePriorOdds and WithPriorOdds) are described under Functions above.

func TrainNB

func TrainNB(lfs []LabeledFeatures, opts ...OptionNB) *NaiveBayes

TrainNB takes data from a training dataset and returns a trained classifier.
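
A minimal training sketch based on the documented signatures; the birdFeature and birdLabel types (the same hypothetical implementations shown under Featurer and Labeler above) and the toy data are invented for illustration:

package main

import (
	"fmt"

	"github.com/gnames/bayes"
)

// Hypothetical Labeler implementation.
type birdLabel string

func (l birdLabel) String() string { return string(l) }

// Hypothetical Featurer implementation.
type birdFeature struct {
	name  bayes.FeatureName
	value bayes.FeatureValue
}

func (f birdFeature) Name() bayes.FeatureName   { return f.name }
func (f birdFeature) Value() bayes.FeatureValue { return f.value }

func main() {
	// A tiny invented training set: every token is a label plus its features.
	lfs := []bayes.LabeledFeatures{
		{
			Label: birdLabel("penguin"),
			Features: []bayes.Featurer{
				birdFeature{"flight", "false"},
				birdFeature{"nesting", "ground"},
			},
		},
		{
			Label: birdLabel("swift"),
			Features: []bayes.Featurer{
				birdFeature{"flight", "true"},
				birdFeature{"nesting", "cliff"},
			},
		},
	}

	nb := bayes.TrainNB(lfs)
	fmt.Println(nb.Total) // total number of tokens used for training
}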

func (*NaiveBayes) Dump

func (nb *NaiveBayes) Dump() []byte

Dump serializes a NaiveBayes object into JSON format.

func (*NaiveBayes) MarshalJSON

func (nb *NaiveBayes) MarshalJSON() ([]byte, error)

MarshalJSON serializes a NaiveBayes object to JSON.

func (*NaiveBayes) Predict

func (nb *NaiveBayes) Predict(fs []Featurer,
	opts ...OptionNB) (Posterior, error)

Predict is a general function that runs the NaiveBayes classifier against the trained set. It can take a different prior odds value to influence the calculation of the posterior odds.
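
A classification sketch, continuing with the hypothetical nb, birdFeature and birdLabel from the TrainNB example above:

fs := []bayes.Featurer{
	birdFeature{"flight", "false"},
	birdFeature{"nesting", "ground"},
}
post, err := nb.Predict(fs)
if err != nil {
	log.Fatal(err)
}
// The winning label and its posterior odds.
fmt.Println(post.MaxLabel, post.MaxOdds)
// Posterior odds for every label.
for label, odds := range post.LabelOdds {
	fmt.Println(label, odds)
}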

func (*NaiveBayes) Restore

func (nb *NaiveBayes) Restore(dump []byte)

Restore deserializes JSON text into a NaiveBayes object. The function needs to know how to convert a string that represents a label into an object. Use the RegisterLabel function to inject a string-to-Label conversion map.
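
A serialization round-trip sketch; nb is the hypothetical trained classifier from the examples above, and birdLabel the hypothetical label type:

// Serialize the trained classifier to JSON.
dump := nb.Dump()

// Tell the package how to turn label strings back into Labeler values.
bayes.RegisterLabel(map[string]bayes.Labeler{
	"penguin": birdLabel("penguin"),
	"swift":   birdLabel("swift"),
})

// Deserialize into a fresh classifier.
nb2 := bayes.NewNaiveBayes()
nb2.Restore(dump)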

func (*NaiveBayes) TrainingPrior

func (nb *NaiveBayes) TrainingPrior(l Labeler) (float64, error)

TrainingPrior returns the prior odds calculated from the training set.

func (*NaiveBayes) UnmarshalJSON

func (nb *NaiveBayes) UnmarshalJSON(data []byte) (err error)

UnmarshalJSON deserializes JSON data to a NaiveBayes object.

type OptionNB

type OptionNB func(*NaiveBayes) error

OptionNB is a type for options supplied to the NaiveBayes classifier. It supports both flag-like and parameterized options.

type Posterior

type Posterior struct {
	LabelOdds map[Labeler]float64
	MaxLabel  Labeler
	MaxOdds   float64
	LabelFreq
	Likelihoods
}

Posterior contains the outcomes from the NaiveBayes classifier.
