filter

package
v2.3.5
Published: Dec 19, 2020 License: MIT Imports: 6 Imported by: 8

Documentation

Overview

Package filter prepares the inputs and outputs.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Drop

func Drop(tokens *[]tokenizer.Token, match func(t tokenizer.Token) bool)

Drop removes from tokens, in place, every token for which match returns true.

func Keep

func Keep(tokens *[]tokenizer.Token, match func(t tokenizer.Token) bool)

Keep retains in tokens, in place, only the tokens for which match returns true.
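The pointer-to-slice signature shared by Drop and Keep suggests in-place filtering. The following is a minimal sketch of that pattern, not the package's implementation. The `token` type is a hypothetical stand-in for tokenizer.Token, and `filterInPlace`, `drop`, and `keep` are illustrative names.

```go
package main

import "fmt"

// token is a hypothetical stand-in for tokenizer.Token; only the surface
// form is modeled here.
type token struct {
	Surface string
}

// filterInPlace rewrites *tokens so that it holds only the tokens for which
// match(t) == want. It reuses the backing array of the original slice.
func filterInPlace(tokens *[]token, match func(t token) bool, want bool) {
	if tokens == nil {
		return
	}
	out := (*tokens)[:0]
	for _, t := range *tokens {
		if match(t) == want {
			out = append(out, t)
		}
	}
	*tokens = out
}

// drop removes every token for which match returns true.
func drop(tokens *[]token, match func(t token) bool) {
	filterInPlace(tokens, match, false)
}

// keep retains only the tokens for which match returns true.
func keep(tokens *[]token, match func(t token) bool) {
	filterInPlace(tokens, match, true)
}

func main() {
	isParticle := func(t token) bool {
		return t.Surface == "は" || t.Surface == "の"
	}

	a := []token{{"猫"}, {"は"}, {"庭"}, {"の"}, {"中"}}
	drop(&a, isParticle)
	fmt.Println(a) // particles removed

	b := []token{{"猫"}, {"は"}, {"庭"}, {"の"}, {"中"}}
	keep(&b, isParticle)
	fmt.Println(b) // only particles remain
}
```

Because the slice is modified through the pointer, callers that need the original tokens afterwards should copy the slice first.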

func ScanSentences

func ScanSentences(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanSentences is a bufio.SplitFunc for use with bufio.Scanner that returns each sentence of text. See https://pkg.go.dev/bufio#SplitFunc

Types

type Feature

type Feature = string

Feature represents a feature.

const Any Feature = "\x00"

Any represents an arbitrary feature.

type Features

type Features = []string

Features represents a vector of features.

type FeaturesFilter

type FeaturesFilter struct {
	// contains filtered or unexported fields
}

FeaturesFilter represents a filter that filters a vector of features.

func NewFeaturesFilter

func NewFeaturesFilter(fs ...Features) *FeaturesFilter

NewFeaturesFilter returns a features filter.

func (FeaturesFilter) Match

func (f FeaturesFilter) Match(fs Features) bool

Match reports whether the filter matches the given features.
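This page does not spell out the matching rules, so the following is only a sketch of one plausible reading of wildcard matching with Any. It assumes that a pattern entry matches when it equals the feature at the same index or is the wildcard, and that a pattern shorter than the feature vector matches as a prefix. The names `anyFeature`, `matchPattern`, and `matchAny` are hypothetical and are not part of the package.

```go
package main

import "fmt"

// anyFeature mirrors the documented value of the Any constant ("\x00").
const anyFeature = "\x00"

// matchPattern reports whether features satisfies pattern.
// Assumption: each pattern entry must equal the feature at the same index
// or be anyFeature, and a shorter pattern matches as a prefix.
func matchPattern(pattern, features []string) bool {
	if len(pattern) > len(features) {
		return false
	}
	for i, p := range pattern {
		if p != anyFeature && p != features[i] {
			return false
		}
	}
	return true
}

// matchAny reports whether at least one pattern matches features.
func matchAny(patterns [][]string, features []string) bool {
	for _, p := range patterns {
		if matchPattern(p, features) {
			return true
		}
	}
	return false
}

func main() {
	patterns := [][]string{{"名詞", anyFeature, "人名"}}
	fmt.Println(matchAny(patterns, []string{"名詞", "固有名詞", "人名", "姓"}))
	fmt.Println(matchAny(patterns, []string{"動詞", "自立"}))
}
```

The linear scan over patterns keeps the sketch short; a real filter holding many patterns might prefer a tree keyed by feature position.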

func (FeaturesFilter) String

func (f FeaturesFilter) String() string

String implements the fmt.Stringer interface.

type POS

type POS = []string

POS represents a part-of-speech that is a vector of features.

type POSFilter

type POSFilter struct {
	// contains filtered or unexported fields
}

POSFilter represents a part-of-speech filter.

func NewPOSFilter

func NewPOSFilter(stops ...POS) *POSFilter

NewPOSFilter returns a part-of-speech filter.

func (POSFilter) Drop

func (f POSFilter) Drop(tokens *[]tokenizer.Token)

Drop removes each token whose POS matches the filter.

func (POSFilter) Keep

func (f POSFilter) Keep(tokens *[]tokenizer.Token)

Keep retains only the tokens whose POS matches the filter.

func (POSFilter) Match

func (f POSFilter) Match(p POS) bool

Match reports whether the filter matches the given POS.

type SentenceSplitter

type SentenceSplitter struct {
	Delim               []rune // delimiter set. ex. {'。','.'}
	Follower            []rune // allow following after delimiters. ex. {'」','』'}
	SkipWhiteSpace      bool   // eliminate white space or not
	DoubleLineFeedSplit bool   // split at '\n\n' or not
	MaxRuneLen          int    // max sentence length
}

SentenceSplitter is a tiny sentence splitter for Japanese texts.

func (SentenceSplitter) ScanSentences

func (s SentenceSplitter) ScanSentences(data []byte, atEOF bool) (advance int, token []byte, err error)

ScanSentences is a split function for a bufio.Scanner that returns each sentence of text.
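A method with the bufio.SplitFunc signature can be passed to Scanner.Split as a method value. The sketch below shows that wiring using only the standard library. The `splitter` type is a hypothetical, reduced stand-in for SentenceSplitter that models only a delimiter set; it does not handle followers, white space, double line feeds, or a length limit, and it is not the package's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitter is a hypothetical, reduced stand-in for SentenceSplitter.
type splitter struct {
	Delim []rune // delimiter set, e.g. {'。', '.'}
}

// isDelim reports whether r is in the delimiter set.
func (s splitter) isDelim(r rune) bool {
	for _, d := range s.Delim {
		if r == d {
			return true
		}
	}
	return false
}

// scan has the bufio.SplitFunc signature. It returns the text up to and
// including the first delimiter; at EOF it returns whatever remains.
func (s splitter) scan(data []byte, atEOF bool) (advance int, token []byte, err error) {
	for i := 0; i < len(data); {
		if !atEOF && !utf8.FullRune(data[i:]) {
			break // incomplete UTF-8 sequence: wait for more data
		}
		r, size := utf8.DecodeRune(data[i:])
		i += size
		if s.isDelim(r) {
			return i, data[:i], nil
		}
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	return 0, nil, nil // request more data
}

// sentences collects every token produced by the splitter for text.
func sentences(text string, s splitter) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Split(s.scan) // the method value satisfies bufio.SplitFunc
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	s := splitter{Delim: []rune{'。', '.'}}
	for _, line := range sentences("今日は晴れ。明日は雨。あさって", s) {
		fmt.Println(line)
	}
}
```

With the real package, the analogous call would presumably pass a configured SentenceSplitter's ScanSentences method to Scanner.Split in the same way.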

type WordFilter

type WordFilter struct {
	// contains filtered or unexported fields
}

WordFilter represents a word filter.

func NewWordFilter

func NewWordFilter(words []string) *WordFilter

NewWordFilter returns a word filter.

func (WordFilter) Drop

func (f WordFilter) Drop(tokens *[]tokenizer.Token)

Drop removes each token whose surface matches the filter.

func (WordFilter) Keep

func (f WordFilter) Keep(tokens *[]tokenizer.Token)

Keep retains only the tokens whose surface matches the filter.

func (WordFilter) Match

func (f WordFilter) Match(w string) bool

Match reports whether the filter matches the given word.
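A word filter of this shape can be thought of as a set-membership test on the token surface. The sketch below assumes exact string matching backed by a map. The `wordSet` type and its functions are hypothetical; the internals of WordFilter are unexported and may differ.

```go
package main

import "fmt"

// wordSet is a hypothetical stand-in for WordFilter, backed by a map.
type wordSet map[string]struct{}

// newWordSet builds a set from the given stop words.
func newWordSet(words []string) wordSet {
	s := make(wordSet, len(words))
	for _, w := range words {
		s[w] = struct{}{}
	}
	return s
}

// match reports whether w is in the set (exact match is an assumption).
func (s wordSet) match(w string) bool {
	_, ok := s[w]
	return ok
}

func main() {
	stops := newWordSet([]string{"の", "は", "を"})
	for _, w := range []string{"猫", "は", "魚", "を", "食べる"} {
		fmt.Printf("%s: %v\n", w, stops.match(w))
	}
}
```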
