classifier

Version: v0.3.1
Published: Sep 3, 2018 License: Apache-2.0 Imports: 7 Imported by: 0

README

classifier

A naive bayes text classifier.

Installation

go get github.com/n3integration/classifier

Usage

Classification

Input can be supplied in two forms: as a string or as an io.Reader. To train on or classify strings, use the TrainString and ClassifyString methods. For larger sources, use the Train and Classify methods, which take an io.Reader as input.

package main

import (
	"fmt"

	"github.com/n3integration/classifier/naive"
)

func main() {
	classifier := naive.New()
	// TrainString returns an error; it is ignored here for brevity.
	classifier.TrainString("The quick brown fox jumped over the lazy dog", "ham")
	classifier.TrainString("Earn a degree online", "ham")
	classifier.TrainString("Earn cash quick online", "spam")

	if classification, err := classifier.ClassifyString("Earn your masters degree online"); err == nil {
		fmt.Println("Classification => ", classification) // ham
	} else {
		fmt.Println("error: ", err)
	}
}

Contributing

  • Fork the repository
  • Create a local feature branch
  • Run gofmt
  • Bump the VERSION file using semantic versioning
  • Submit a pull request

License

Copyright 2018 n3integration@gmail.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Filter

func Filter(vs chan string, f Predicate) chan string

Filter removes elements from the input channel where the supplied predicate is satisfied.

func IsNotStopWord

func IsNotStopWord(v string) bool

IsNotStopWord is the inverse function of IsStopWord

func IsStopWord

func IsStopWord(v string) bool

IsStopWord performs a binary search against a list of known English stop words. It returns true if v is a stop word and false otherwise.
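
The lookup described above can be sketched with the standard library's sort.SearchStrings. This is only an illustration: the stopWords list, the lowercasing step, and the lowercase function names are assumptions made for the example, not the package's actual word list or implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stopWords is a small, hypothetical sample list. It must stay sorted,
// otherwise the binary search below gives wrong answers.
var stopWords = []string{"a", "an", "and", "is", "of", "over", "the", "to"}

// isStopWord reports whether v appears in stopWords. Lowercasing the input
// first is an assumption of this sketch.
func isStopWord(v string) bool {
	v = strings.ToLower(v)
	i := sort.SearchStrings(stopWords, v) // index where v is or would be inserted
	return i < len(stopWords) && stopWords[i] == v
}

// isNotStopWord is the logical inverse of isStopWord.
func isNotStopWord(v string) bool {
	return !isStopWord(v)
}

func main() {
	for _, w := range []string{"the", "fox", "Over"} {
		fmt.Printf("%q stop word? %v\n", w, isStopWord(w))
	}
}
```

A binary search keeps each lookup at O(log n) comparisons, at the cost of requiring the list to be kept in sorted order.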

func Map

func Map(vs chan string, f Mapper) chan string

Map applies f to each element of the supplied input channel.
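
Because Map and Filter both take and return a chan string, they can be chained into a pipeline. The sketch below shows that pattern with locally defined stand-ins. The names mapChan, keep, emit, collect, and pipeline are hypothetical, and keep assumes that values satisfying the predicate are passed through (which is what a helper like IsNotStopWord suggests); check the package source for the exact semantics of Filter.

```go
package main

import (
	"fmt"
	"strings"
)

type mapper func(string) string
type predicate func(string) bool

// mapChan applies f to every value read from in and sends the result on
// the returned channel, which is closed once in is drained.
func mapChan(in chan string, f mapper) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

// keep forwards only the values for which f returns true (an assumption
// of this sketch, see the note above).
func keep(in chan string, f predicate) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for v := range in {
			if f(v) {
				out <- v
			}
		}
	}()
	return out
}

// emit sends the given words on a new channel and then closes it.
func emit(words ...string) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, w := range words {
			out <- w
		}
	}()
	return out
}

// collect drains ch into a slice.
func collect(ch chan string) []string {
	var out []string
	for v := range ch {
		out = append(out, v)
	}
	return out
}

// pipeline lowercases each word and keeps those longer than three bytes.
func pipeline(words ...string) []string {
	longer := func(s string) bool { return len(s) > 3 }
	return collect(keep(mapChan(emit(words...), strings.ToLower), longer))
}

func main() {
	fmt.Println(pipeline("The", "Quick", "Fox")) // [quick]
}
```

Each stage runs in its own goroutine and closes its output when its input is exhausted, so the final range loop terminates without extra signalling.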

func WordCounts

func WordCounts(r io.Reader) (map[string]int, error)

WordCounts extracts term frequencies from a text corpus
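
Term-frequency extraction from an io.Reader can be sketched as follows. The function wordCounts here is a hypothetical stand-in with the same signature. It splits on whitespace and lowercases each word, and unlike the package it applies no stop-word filtering or other tokenization rules, so its counts may differ from those returned by WordCounts.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// wordCounts tallies lowercased, whitespace-delimited words read from r.
func wordCounts(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords) // yield one word per Scan call
	for scanner.Scan() {
		counts[strings.ToLower(scanner.Text())]++
	}
	return counts, scanner.Err()
}

// wordCountsString is a convenience wrapper for in-memory text.
func wordCountsString(s string) (map[string]int, error) {
	return wordCounts(strings.NewReader(s))
}

func main() {
	counts, err := wordCountsString("the quick fox and the lazy dog")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(counts["the"], counts["fox"]) // 2 1
}
```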

Types

type Classifier

type Classifier interface {
	// Train allows clients to train the classifier
	Train(io.Reader, string) error
	// TrainString allows clients to train the classifier using a string
	TrainString(string, string) error
	// Classify performs a classification on the input corpus and assumes that
	// the underlying classifier has been trained.
	Classify(io.Reader) (string, error)
	// ClassifyString performs text classification using a string
	ClassifyString(string) (string, error)
}

Classifier provides a simple interface for different text classifiers
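
To illustrate what satisfying this interface involves, here is a toy implementation. The type toyBayes and its helpers are hypothetical. It uses textbook multinomial naive Bayes with add-one smoothing, which is not necessarily the scoring used by the naive package, so its results can differ from the library's on the same input. The string methods simply wrap the io.Reader methods.

```go
package main

import (
	"fmt"
	"io"
	"math"
	"strings"
)

// Classifier mirrors the interface documented above.
type Classifier interface {
	Train(io.Reader, string) error
	TrainString(string, string) error
	Classify(io.Reader) (string, error)
	ClassifyString(string) (string, error)
}

// toyBayes is a hypothetical multinomial naive Bayes model with add-one
// smoothing; it is not the library's implementation.
type toyBayes struct {
	counts map[string]map[string]int // category -> word -> occurrences
	totals map[string]int            // category -> total words seen
	docs   map[string]int            // category -> training documents seen
	vocab  map[string]bool           // every distinct word seen
	n      int                       // total training documents
}

func newToyBayes() *toyBayes {
	return &toyBayes{
		counts: map[string]map[string]int{},
		totals: map[string]int{},
		docs:   map[string]int{},
		vocab:  map[string]bool{},
	}
}

// words reads all of r and returns its lowercased, whitespace-separated words.
func words(r io.Reader) ([]string, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return strings.Fields(strings.ToLower(string(data))), nil
}

func (b *toyBayes) Train(r io.Reader, category string) error {
	ws, err := words(r)
	if err != nil {
		return err
	}
	if b.counts[category] == nil {
		b.counts[category] = map[string]int{}
	}
	for _, w := range ws {
		b.counts[category][w]++
		b.totals[category]++
		b.vocab[w] = true
	}
	b.docs[category]++
	b.n++
	return nil
}

// TrainString delegates to Train so both entry points share one code path.
func (b *toyBayes) TrainString(doc, category string) error {
	return b.Train(strings.NewReader(doc), category)
}

func (b *toyBayes) Classify(r io.Reader) (string, error) {
	if b.n == 0 {
		return "", fmt.Errorf("classifier has not been trained")
	}
	ws, err := words(r)
	if err != nil {
		return "", err
	}
	best, bestScore := "", math.Inf(-1)
	for category, d := range b.docs {
		// Log prior plus the sum of smoothed log likelihoods.
		score := math.Log(float64(d) / float64(b.n))
		den := float64(b.totals[category] + len(b.vocab))
		for _, w := range ws {
			score += math.Log(float64(b.counts[category][w]+1) / den)
		}
		if score > bestScore {
			best, bestScore = category, score
		}
	}
	return best, nil
}

func (b *toyBayes) ClassifyString(doc string) (string, error) {
	return b.Classify(strings.NewReader(doc))
}

func main() {
	var c Classifier = newToyBayes()
	c.TrainString("meeting agenda for monday", "ham")
	c.TrainString("win cash prizes now", "spam")
	label, err := c.ClassifyString("cash prizes")
	fmt.Println(label, err) // spam <nil>
}
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one term keeps unseen words from driving a category's score to negative infinity.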

type Mapper added in v0.3.0

type Mapper func(string) string

Mapper provides a map function

type Predicate added in v0.3.0

type Predicate func(string) bool

Predicate provides a predicate function

type Tokenizer added in v0.3.0

type Tokenizer interface {
	// Tokenize breaks the provided document into a stream of tokens
	Tokenize(r io.Reader) chan string
}

Tokenizer provides a common interface to tokenize documents

func NewRegexTokenizer added in v0.3.0

func NewRegexTokenizer() Tokenizer

NewRegexTokenizer initializes a new regular expression Tokenizer instance

func NewTokenizer added in v0.3.0

func NewTokenizer() Tokenizer

NewTokenizer initializes a new standard Tokenizer instance
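
A custom Tokenizer only needs to return a channel that yields tokens and is closed when the input is exhausted. The sketch below shows one way to do that. The wordTokenizer type and the tokens helper are hypothetical, and splitting on whitespace with lowercasing is an assumption of this example, not a description of what NewTokenizer or NewRegexTokenizer do internally.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Tokenizer mirrors the interface documented above.
type Tokenizer interface {
	Tokenize(r io.Reader) chan string
}

// wordTokenizer is a hypothetical Tokenizer that emits lowercased,
// whitespace-delimited words.
type wordTokenizer struct{}

// Tokenize scans r in a goroutine and closes the channel at end of input,
// so callers can simply range over the result.
func (wordTokenizer) Tokenize(r io.Reader) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		scanner := bufio.NewScanner(r)
		scanner.Split(bufio.ScanWords)
		for scanner.Scan() {
			out <- strings.ToLower(scanner.Text())
		}
	}()
	return out
}

// tokens collects every token produced for s into a slice.
func tokens(t Tokenizer, s string) []string {
	var out []string
	for tok := range t.Tokenize(strings.NewReader(s)) {
		out = append(out, tok)
	}
	return out
}

func main() {
	fmt.Println(tokens(wordTokenizer{}, "Earn cash QUICK")) // [earn cash quick]
}
```

Note that a caller who stops reading early leaves the producing goroutine blocked on an unbuffered channel, so consumers of this sketch should drain the channel fully.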
