cedar

Version: v0.0.0-...-6e174eb
Warning

This package is not in the latest version of its module.
Published: Sep 16, 2019 License: GPL-2.0 Imports: 11 Imported by: 0

README

ahocorasick (fork of iohub/ahocorasick)

Documentation

Constants

const (
	DefaultTokenBufferSize = 137500
	DefaultMatchBufferSize = 137500
)
const (
	CJKZhMin = '\u4E00'
	CJKZhMax = '\u9FFF'
)

CJKZhMin and CJKZhMax define the minimum and maximum code points of the Chinese CJK range.

Variables

var (
	ErrInvalidDataType = errors.New("cedar: invalid datatype")
	ErrInvalidValue    = errors.New("cedar: invalid value")
	ErrInvalidKey      = errors.New("cedar: invalid key")
	ErrNoPath          = errors.New("cedar: no path")
	ErrNoValue         = errors.New("cedar: no value")
	ErrTooLarge        = errors.New("acmatcher: Tool Large for grow")
)

These are the error values returned by the package.
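
Because these are package-level values created with errors.New, callers can tell them apart with errors.Is. The following is a minimal sketch of that pattern; `lookup` and `describe` are hypothetical stand-ins, the two errors are redeclared locally so the example compiles on its own, and the meaning attached to each error in the messages is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for two of the documented sentinel errors.
var (
	ErrNoPath  = errors.New("cedar: no path")
	ErrNoValue = errors.New("cedar: no value")
)

// lookup is a hypothetical stand-in for a trie lookup.
// It returns ErrNoPath for a missing key and ErrNoValue for a nil value.
func lookup(store map[string]interface{}, key string) (interface{}, error) {
	v, ok := store[key]
	if !ok {
		return nil, ErrNoPath
	}
	if v == nil {
		return nil, ErrNoValue
	}
	return v, nil
}

// describe maps an error to a message by comparing against the sentinels.
func describe(err error) string {
	switch {
	case err == nil:
		return "found"
	case errors.Is(err, ErrNoPath):
		return "no such key"
	case errors.Is(err, ErrNoValue):
		return "key has no value"
	default:
		return "unexpected error: " + err.Error()
	}
}

func main() {
	store := map[string]interface{}{"ab": 1, "abc": nil}
	for _, k := range []string{"ab", "abc", "xyz"} {
		_, err := lookup(store, k)
		fmt.Printf("%s: %s\n", k, describe(err))
	}
}
```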

Functions

This section is empty.

Types

type Cedar

type Cedar struct {
	// contains filtered or unexported fields
}

Cedar encapsulates a fast, compressed double-array trie for word lookup.

func NewCedar

func NewCedar() *Cedar

NewCedar creates a new Cedar instance.

func (*Cedar) Delete

func (da *Cedar) Delete(key []byte) error

Delete removes a key-value pair from the cedar. It returns ErrNoPath if the key has not been added.

func (*Cedar) DumpGraph

func (da *Cedar) DumpGraph(fname string)

DumpGraph dumps the inner data structures for Graphviz.

func (*Cedar) Get

func (da *Cedar) Get(key []byte) (value interface{}, err error)

Get returns the value associated with the given `key`. It is equivalent to

id, err1 = Jump(key, 0)
value, err2 = Value(id)

Thus, it may return ErrNoPath or ErrNoValue.

func (*Cedar) Insert

func (da *Cedar) Insert(key []byte, value interface{}) error

Insert adds a key-value pair into the cedar. It returns ErrInvalidValue if value < 0 or >= valueLimit.

func (*Cedar) Jump

func (da *Cedar) Jump(path []byte, from int) (to int, err error)

Jump travels from a node `from` to another node `to` by following the path `path`. For example, if the following keys were inserted:

id	key
19	abc
23	ab
37	abcd

then

Jump([]byte("ab"), 0) = 23, nil    // reach "ab" from root
Jump([]byte("c"), 23) = 19, nil    // reach "abc" from "ab"
Jump([]byte("cd"), 23) = 37, nil   // reach "abcd" from "ab"
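
The traversal described above can be illustrated with a toy trie. This is only a sketch: `toyTrie` is a hypothetical map-based structure, not the package's double-array implementation, so the node ids it assigns differ from the 19, 23 and 37 shown in the table. What it shares with Jump is the idea of resuming a walk from any node id.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoPath = errors.New("no path")

// toyTrie is a hypothetical stand-in: node ids are slice indices and
// node 0 is the root. Each node maps a byte to the id of its child.
type toyTrie struct {
	next []map[byte]int
}

func newToyTrie() *toyTrie {
	return &toyTrie{next: []map[byte]int{{}}}
}

// insert adds key and returns the id of the node where the key ends.
func (t *toyTrie) insert(key []byte) int {
	cur := 0
	for _, b := range key {
		child, ok := t.next[cur][b]
		if !ok {
			child = len(t.next)
			t.next = append(t.next, map[byte]int{})
			t.next[cur][b] = child
		}
		cur = child
	}
	return cur
}

// jump walks path starting at node from and returns the node reached.
func (t *toyTrie) jump(path []byte, from int) (int, error) {
	if from < 0 || from >= len(t.next) {
		return 0, errNoPath
	}
	cur := from
	for _, b := range path {
		child, ok := t.next[cur][b]
		if !ok {
			return 0, errNoPath
		}
		cur = child
	}
	return cur, nil
}

func main() {
	t := newToyTrie()
	abc := t.insert([]byte("abc"))
	ab := t.insert([]byte("ab"))
	abcd := t.insert([]byte("abcd"))

	fromRoot, _ := t.jump([]byte("ab"), 0)
	fromAB, _ := t.jump([]byte("cd"), ab)
	fmt.Println(fromRoot == ab, fromAB == abcd, abc)
}
```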

func (*Cedar) Key

func (da *Cedar) Key(id int) (key []byte, err error)

Key returns the key of the node with the given `id`. It returns ErrNoPath if the node does not exist.

func (*Cedar) Load

func (da *Cedar) Load(in io.Reader, dataType string) error

Load loads the cedar from an io.Reader, where dataType is either "json" or "gob".

func (*Cedar) LoadFromFile

func (da *Cedar) LoadFromFile(fileName string, dataType string) error

LoadFromFile loads the cedar from a file, where dataType is either "json" or "gob".

func (*Cedar) PrefixMatch

func (da *Cedar) PrefixMatch(key []byte, num int) (ids []int)

PrefixMatch returns a list of at most `num` nodes whose keys are prefixes of the given key. If `num` is 0, it returns all matches. For example, if the following keys were inserted:

id	key
19	abc
23	ab
37	abcd

then

PrefixMatch([]byte("abc"), 1) = [ 23 ]            // match ["ab"]
PrefixMatch([]byte("abcd"), 0) = [ 23, 19, 37 ]   // match ["ab", "abc", "abcd"]
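
The semantics of the example above can be reproduced with a plain map, which may help when reasoning about results. This is a sketch only: `prefixMatch` is a hypothetical function that checks every prefix of the key against a map of key to id, whereas the package walks its trie. The ids are copied from the table above.

```go
package main

import "fmt"

// prefixMatch is a hypothetical reference implementation: it returns the
// ids of all stored keys that are prefixes of key, shortest first, and
// stops after num results when num > 0.
func prefixMatch(ids map[string]int, key []byte, num int) []int {
	var out []int
	for end := 1; end <= len(key); end++ {
		if id, ok := ids[string(key[:end])]; ok {
			out = append(out, id)
			if num > 0 && len(out) == num {
				break
			}
		}
	}
	return out
}

func main() {
	ids := map[string]int{"abc": 19, "ab": 23, "abcd": 37}
	fmt.Println(prefixMatch(ids, []byte("abc"), 1))  // [23]
	fmt.Println(prefixMatch(ids, []byte("abcd"), 0)) // [23 19 37]
}
```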

func (*Cedar) PrefixPredict

func (da *Cedar) PrefixPredict(key []byte, num int) (ids []int)

PrefixPredict returns a list of at most `num` nodes which have the key as their prefix. These nodes are ordered by their keys. If `num` is 0, it returns all matches. For example, if the following keys were inserted:

id	key
19	abc
23	ab
37	abcd

then

PrefixPredict([]byte("ab"), 2) = [ 23, 19 ]       // predict ["ab", "abc"]
PrefixPredict([]byte("ab"), 0) = [ 23, 19, 37 ]   // predict ["ab", "abc", "abcd"]
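
The ordering and the `num` limit in the example above can be checked against a simple reference. This is a sketch only: `prefixPredict` is a hypothetical function that filters and sorts a map of key to id, which is far slower than a trie walk but yields the same results for the keys in the table.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixPredict is a hypothetical reference implementation: it returns the
// ids of all stored keys that start with key, ordered by key, and keeps
// only the first num results when num > 0.
func prefixPredict(ids map[string]int, key []byte, num int) []int {
	var keys []string
	for k := range ids {
		if strings.HasPrefix(k, string(key)) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random, so sort explicitly
	if num > 0 && len(keys) > num {
		keys = keys[:num]
	}
	out := make([]int, 0, len(keys))
	for _, k := range keys {
		out = append(out, ids[k])
	}
	return out
}

func main() {
	ids := map[string]int{"abc": 19, "ab": 23, "abcd": 37}
	fmt.Println(prefixPredict(ids, []byte("ab"), 2)) // [23 19]
	fmt.Println(prefixPredict(ids, []byte("ab"), 0)) // [23 19 37]
}
```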

func (*Cedar) Save

func (da *Cedar) Save(out io.Writer, dataType string) error

Save saves the cedar to an io.Writer, where dataType is either "json" or "gob".

func (*Cedar) SaveToFile

func (da *Cedar) SaveToFile(fileName string, dataType string) error

SaveToFile saves the cedar to a file, where dataType is either "json" or "gob".
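
The `dataType` argument of Save and Load selects the encoding. The following sketch shows one plausible way such a switch can be built on encoding/gob and encoding/json. Everything in it is an assumption made for illustration: `snapshot`, `save`, `load` and `roundTrip` are hypothetical, the package's real on-disk layout is not documented here, and returning an invalid-datatype error for unknown values is a guess based on the existence of ErrInvalidDataType.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

var errInvalidDataType = errors.New("invalid datatype")

// snapshot is a hypothetical payload, not the package's internal state.
type snapshot struct {
	Keys []string
	IDs  []int
}

// save encodes s to out using the encoding named by dataType.
func save(out io.Writer, dataType string, s *snapshot) error {
	switch dataType {
	case "gob":
		return gob.NewEncoder(out).Encode(s)
	case "json":
		return json.NewEncoder(out).Encode(s)
	}
	return errInvalidDataType
}

// load decodes from in into s using the encoding named by dataType.
func load(in io.Reader, dataType string, s *snapshot) error {
	switch dataType {
	case "gob":
		return gob.NewDecoder(in).Decode(s)
	case "json":
		return json.NewDecoder(in).Decode(s)
	}
	return errInvalidDataType
}

// roundTrip saves s into a buffer and loads it back.
func roundTrip(dataType string, s *snapshot) (*snapshot, error) {
	var buf bytes.Buffer
	if err := save(&buf, dataType, s); err != nil {
		return nil, err
	}
	got := &snapshot{}
	if err := load(&buf, dataType, got); err != nil {
		return nil, err
	}
	return got, nil
}

func main() {
	in := &snapshot{Keys: []string{"ab", "abc"}, IDs: []int{23, 19}}
	for _, dt := range []string{"gob", "json", "xml"} {
		out, err := roundTrip(dt, in)
		fmt.Println(dt, out, err)
	}
}
```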

func (*Cedar) Status

func (da *Cedar) Status() (keys, nodes, size, capacity int)

Status reports the following statistics of the cedar:

keys:      number of keys in the cedar,
nodes:     number of trie nodes (slots in the base array) that have been taken,
size:      size of the base array used by the cedar,
capacity:  capacity of the base array used by the cedar.

func (*Cedar) Update

func (da *Cedar) Update(key []byte, value int) error

Update increases the value associated with the `key`. The `key` will be inserted if it is not in the cedar. It returns ErrInvalidValue if the updated value < 0 or >= valueLimit.

type MatchToken

type MatchToken struct {
	KLen  int // len of key
	Value interface{}
	At    int // match position of source text
	Freq  uint
}

MatchToken describes a word matched by the Aho-Corasick Matcher.

type Matcher

type Matcher struct {
	// contains filtered or unexported fields
}

Matcher is an Aho-Corasick matcher.

func NewMatcher

func NewMatcher() *Matcher

NewMatcher creates a new Aho-Corasick matcher.

func (*Matcher) Cedar

func (m *Matcher) Cedar() *Cedar

Cedar returns the Matcher's cedar trie instance.

func (*Matcher) Compile

func (m *Matcher) Compile()

Compile compiles the trie into an Aho-Corasick automaton.

func (*Matcher) DumpGraph

func (m *Matcher) DumpGraph(fname string)

DumpGraph dumps the Aho-Corasick DFA structures to a Graphviz file.

func (*Matcher) Insert

func (m *Matcher) Insert(bs []byte, val interface{})

Insert adds a byte sequence to the Matcher's inner double-array trie.

func (*Matcher) Key

func (m *Matcher) Key(seq []byte, t MatchToken) []byte

Key extracts the matched key from seq.

func (*Matcher) Match

func (m *Matcher) Match(seq []byte) *Response

Match finds multiple subsequences in seq and returns the matched tokens through a Response.
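
To make the idea of multi-pattern matching concrete, the sketch below reports every occurrence of every pattern with a naive scan. It is not Aho-Corasick and not the package's code: `token`, `naiveMatch` and `key` are hypothetical, and the scan is much slower than an automaton on large inputs. The documentation does not say whether the package's At marks the start or the end of a match; this sketch assumes the start offset.

```go
package main

import (
	"bytes"
	"fmt"
)

// token is a hypothetical, reduced version of a match record.
// At is the START offset of the match in this sketch (an assumption).
type token struct {
	KLen int // length of the matched pattern
	At   int // start offset of the match in the scanned text
}

// naiveMatch tries every pattern at every offset of seq.
// Results are ordered by offset, then by pattern order.
func naiveMatch(patterns [][]byte, seq []byte) []token {
	var out []token
	for at := range seq {
		for _, p := range patterns {
			if len(p) > 0 && bytes.HasPrefix(seq[at:], p) {
				out = append(out, token{KLen: len(p), At: at})
			}
		}
	}
	return out
}

// key slices the matched bytes out of seq for a given token.
func key(seq []byte, t token) []byte {
	return seq[t.At : t.At+t.KLen]
}

func main() {
	patterns := [][]byte{[]byte("he"), []byte("she"), []byte("hers")}
	seq := []byte("ushers")
	for _, t := range naiveMatch(patterns, seq) {
		fmt.Printf("at=%d len=%d key=%s\n", t.At, t.KLen, key(seq, t))
	}
}
```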

type Response

type Response struct {
	// contains filtered or unexported fields
}

func NewResponse

func NewResponse(ac *Matcher) *Response

func (*Response) HasNext

func (r *Response) HasNext() bool

func (*Response) NextMatchItem

func (r *Response) NextMatchItem(content []byte) []MatchToken

func (*Response) Release

func (r *Response) Release()
