classifier

package module
v1.3.0 (not the latest version of this module)
Published: Feb 5, 2023 License: Apache-2.0 Imports: 4 Imported by: 0

README

classifier

A naive Bayes text classifier.

Installation

go get github.com/carautenbach/classifier

Usage

Classification

There are two ways to provide input for classification: an io.Reader or a string. To classify strings, use the TrainString and ClassifyString functions. For larger sources, use the Train and Classify functions, which take an io.Reader as input.

import (
	"fmt"

	"github.com/carautenbach/classifier/naive"
)

classifier := naive.New()
classifier.TrainString("The quick brown fox jumped over the lazy dog", "ham")
classifier.TrainString("Earn a degree online", "ham")
classifier.TrainString("Earn cash quick online", "spam")

if classification, err := classifier.ClassifyString("Earn your masters degree online"); err == nil {
    fmt.Println("Classification => ", classification) // ham
} else {
    fmt.Println("error: ", err)
}
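For intuition about what training and classification do, here is a minimal multinomial naive Bayes sketch: each category gets a prior from its share of training documents, and each word contributes a Laplace-smoothed likelihood. This is a self-contained toy, not this package's implementation, and its smoothing may not reproduce the library's exact results:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// toyBayes is a minimal multinomial naive Bayes model. It illustrates the
// idea behind TrainString/ClassifyString and is NOT this package's code.
type toyBayes struct {
	wordCount map[string]map[string]int // category -> word -> count
	docCount  map[string]int            // category -> training documents
	total     int                       // total training documents
	vocab     map[string]struct{}       // distinct words seen
}

func newToyBayes() *toyBayes {
	return &toyBayes{
		wordCount: map[string]map[string]int{},
		docCount:  map[string]int{},
		vocab:     map[string]struct{}{},
	}
}

func (b *toyBayes) train(doc, category string) {
	if b.wordCount[category] == nil {
		b.wordCount[category] = map[string]int{}
	}
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		b.wordCount[category][w]++
		b.vocab[w] = struct{}{}
	}
	b.docCount[category]++
	b.total++
}

func (b *toyBayes) classify(doc string) string {
	best, bestScore := "", math.Inf(-1)
	for cat := range b.docCount {
		// Log prior: the category's share of training documents.
		score := math.Log(float64(b.docCount[cat]) / float64(b.total))
		catTotal := 0
		for _, n := range b.wordCount[cat] {
			catTotal += n
		}
		// Laplace-smoothed log likelihood of each query word.
		for _, w := range strings.Fields(strings.ToLower(doc)) {
			score += math.Log(float64(b.wordCount[cat][w]+1) /
				float64(catTotal+len(b.vocab)))
		}
		if score > bestScore {
			best, bestScore = cat, score
		}
	}
	return best
}

func main() {
	b := newToyBayes()
	b.train("The quick brown fox jumped over the lazy dog", "ham")
	b.train("Earn a degree online", "ham")
	b.train("Earn cash quick online", "spam")
	fmt.Println(b.classify("earn cash quick"))     // spam
	fmt.Println(b.classify("the quick brown fox")) // ham
}
```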

Contributing

  • Fork the repository
  • Create a local feature branch
  • Run gofmt
  • Bump the VERSION file using semantic versioning
  • Submit a pull request

License

Copyright 2022 carautenbach@gmail.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Filter

func Filter(vs chan string, filters ...Predicate) chan string

Filter removes elements from the input channel that fail the supplied predicates; Filter is a Predicate aggregation.
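As a sketch of the channel-pipeline style this signature suggests, the following self-contained filter forwards only values for which every predicate returns true (consistent with passing IsNotStopWord to drop stop words). It is an illustration, not the package's source:

```go
package main

import "fmt"

// Predicate mirrors the package's Predicate type: a test on a token.
type Predicate func(string) bool

// filter sketches a channel-based Filter: it forwards only the values
// for which every predicate returns true, closing the output channel
// once the input is drained.
func filter(vs chan string, preds ...Predicate) chan string {
	out := make(chan string)
	go func() {
		defer close(out)
	next:
		for v := range vs {
			for _, p := range preds {
				if !p(v) {
					continue next
				}
			}
			out <- v
		}
	}()
	return out
}

func main() {
	in := make(chan string, 3)
	for _, w := range []string{"the", "fox", "a"} {
		in <- w
	}
	close(in)
	// Keep only tokens longer than one character.
	longerThanOne := func(v string) bool { return len(v) > 1 }
	for v := range filter(in, longerThanOne) {
		fmt.Println(v) // prints "the" then "fox"
	}
}
```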

func IsNotStopWord

func IsNotStopWord(v string) bool

IsNotStopWord is the inverse function of IsStopWord

func IsStopWord

func IsStopWord(v string) bool

IsStopWord performs a binary search against a list of known English stop words; it returns true if v is a stop word, false otherwise.
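The binary-search lookup described here can be sketched with the standard library's sort.SearchStrings over a sorted slice. The tiny stop-word list below is a stand-in for the package's full English list:

```go
package main

import (
	"fmt"
	"sort"
)

// stopWords must be kept sorted for binary search; this sample list is an
// assumption for illustration (the package ships its own English list).
var stopWords = []string{"a", "an", "and", "the", "to"}

// isStopWord sketches the binary-search lookup: SearchStrings returns the
// insertion index, so we must also confirm an exact match at that index.
func isStopWord(v string) bool {
	i := sort.SearchStrings(stopWords, v)
	return i < len(stopWords) && stopWords[i] == v
}

func main() {
	fmt.Println(isStopWord("the"), isStopWord("fox")) // true false
}
```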

func Map

func Map(vs chan string, f ...Mapper) chan string

Map applies f to each element of the supplied input channel

Types

type Classifier

type Classifier interface {
	// Train allows clients to train the classifier
	Train(io.Reader, string) error
	// TrainString allows clients to train the classifier using a string
	TrainString(string, string) error
	// Classify performs a classification on the input corpus and assumes that
	// the underlying classifier has been trained.
	Classify(io.Reader) (string, error)
	// ClassifyString performs text classification using a string
	ClassifyString(string) (string, error)
}

Classifier provides a simple interface for different text classifiers

type Mapper

type Mapper func(string) string

Mapper provides a map function

type Predicate

type Predicate func(string) bool

Predicate provides a predicate function

type StdOption

type StdOption func(*StdTokenizer)

StdOption provides configuration settings for a StdTokenizer

func BufferSize

func BufferSize(size int) StdOption

BufferSize adjusts the size of the buffered channel

func Filters

func Filters(f ...Predicate) StdOption

Filters overrides the list of predicates

func Transforms

func Transforms(m ...Mapper) StdOption

Transforms overrides the list of mappers
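BufferSize, Filters, and Transforms follow Go's functional-options pattern: each returns a closure that mutates the tokenizer under construction. A self-contained sketch of the pattern (the field name and the default of 256 are assumptions for illustration, not the package's values):

```go
package main

import "fmt"

// tokenizer stands in for StdTokenizer; option mirrors StdOption.
type tokenizer struct {
	bufferSize int
}

type option func(*tokenizer)

// bufferSize sketches how an option like BufferSize works: it captures the
// value and applies it when the constructor runs the options.
func bufferSize(n int) option {
	return func(t *tokenizer) { t.bufferSize = n }
}

// newTokenizer applies each supplied option over assumed defaults.
func newTokenizer(opts ...option) *tokenizer {
	t := &tokenizer{bufferSize: 256} // assumed default, for illustration
	for _, o := range opts {
		o(t)
	}
	return t
}

func main() {
	fmt.Println(newTokenizer(bufferSize(64)).bufferSize) // 64
	fmt.Println(newTokenizer().bufferSize)               // 256
}
```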

type StdTokenizer

type StdTokenizer struct {
	// contains filtered or unexported fields
}

StdTokenizer provides a common document tokenizer that splits a document by word boundaries

func NewTokenizer

func NewTokenizer(opts ...StdOption) *StdTokenizer

NewTokenizer initializes a new standard Tokenizer instance

func (*StdTokenizer) Tokenize

func (t *StdTokenizer) Tokenize(r io.Reader) chan string

Tokenize splits the reader into words and returns the results as a streaming channel

type Tokenizer

type Tokenizer interface {
	// Tokenize breaks the provided document into a channel of tokens
	Tokenize(io.Reader) chan string
}

Tokenizer provides a common interface to tokenize documents

