crf

package
v0.0.14
Published: Mar 4, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package crf implements a linear-chain Conditional Random Field.

Functions

func FeaturesToAttributes

func FeaturesToAttributes(features map[string]any) map[string]float64

FeaturesToAttributes converts a feature map with mixed value types into CRF attributes: attribute strings mapped to float64 values.

Conversion rules:

  • string value: "key=value" → 1.0
  • []string value: "key:item" → 1.0 for each item
  • bool value: "key" → 1.0 if true
  • int/float value: "key" → float64(value)
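The conversion rules above can be sketched as a type switch. This is a minimal illustration, not the package's source: featuresToAttributes is a hypothetical stand-in, it assumes that a false bool produces no attribute at all, and it handles only int and float64 among the numeric types.

```go
package main

import "fmt"

// featuresToAttributes is a hypothetical re-implementation of the listed
// conversion rules. Values of any other type are silently skipped here,
// which is an assumption of this sketch.
func featuresToAttributes(features map[string]any) map[string]float64 {
	attrs := make(map[string]float64, len(features))
	for key, value := range features {
		switch v := value.(type) {
		case string:
			attrs[key+"="+v] = 1.0 // "key=value" → 1.0
		case []string:
			for _, item := range v {
				attrs[key+":"+item] = 1.0 // "key:item" → 1.0 per item
			}
		case bool:
			if v {
				attrs[key] = 1.0 // only true values emit an attribute
			}
		case int:
			attrs[key] = float64(v)
		case float64:
			attrs[key] = v
		}
	}
	return attrs
}

func main() {
	attrs := featuresToAttributes(map[string]any{
		"tag":     "input",
		"classes": []string{"btn", "primary"},
		"visible": true,
		"hidden":  false,
		"length":  3,
		"ratio":   0.5,
	})
	fmt.Println(attrs)
}
```

Encoding categorical values into the attribute name keeps every attribute numeric, so one weight per (attribute, label) pair is enough.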

func MarshalModel

func MarshalModel(model *Model) ([]byte, error)

MarshalModel serializes the model to JSON bytes.

func SaveModel

func SaveModel(model *Model, path string) error

SaveModel serializes the model to JSON and writes it to the file at path.

func TransitionMarginals

func TransitionMarginals(fb ForwardBackwardResult, stateScores, transScores [][]float64) [][][]float64

TransitionMarginals computes P(y_{t-1}=i, y_t=j | x) for all t, i, j. It returns a [T-1][L][L] tensor.

func Viterbi

func Viterbi(stateScores, transScores [][]float64) ([]int, float64)

Viterbi finds the best label sequence using the Viterbi algorithm (log-domain).
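A log-domain Viterbi decoder with the same shape of signature might look like the following. This is an illustrative sketch rather than the package's implementation: viterbi is a hypothetical name, transScores is assumed to be indexed [from][to], and no separate start or end scores are modeled.

```go
package main

import (
	"fmt"
	"math"
)

// viterbi returns the highest-scoring label sequence and its score.
// stateScores is [T][L]; transScores is [L][L], assumed [from][to].
func viterbi(stateScores, transScores [][]float64) ([]int, float64) {
	T := len(stateScores)
	if T == 0 {
		return nil, 0
	}
	L := len(stateScores[0])
	score := make([][]float64, T) // best score of any path ending in label j at t
	back := make([][]int, T)      // predecessor label on that best path
	for t := range score {
		score[t] = make([]float64, L)
		back[t] = make([]int, L)
	}
	copy(score[0], stateScores[0])
	for t := 1; t < T; t++ {
		for j := 0; j < L; j++ {
			best, arg := math.Inf(-1), 0
			for i := 0; i < L; i++ {
				if s := score[t-1][i] + transScores[i][j]; s > best {
					best, arg = s, i
				}
			}
			score[t][j] = best + stateScores[t][j]
			back[t][j] = arg
		}
	}
	// Pick the best final label, then follow back-pointers.
	last, bestScore := 0, math.Inf(-1)
	for j := 0; j < L; j++ {
		if score[T-1][j] > bestScore {
			last, bestScore = j, score[T-1][j]
		}
	}
	path := make([]int, T)
	path[T-1] = last
	for t := T - 1; t > 0; t-- {
		path[t-1] = back[t][path[t]]
	}
	return path, bestScore
}

func main() {
	state := [][]float64{{1, 0}, {0, 1}, {1, 0}}
	trans := [][]float64{{0, -2}, {-2, 0}}
	path, score := viterbi(state, trans)
	fmt.Println(path, score) // prints [0 0 0] 2
}
```

In the example, label 1 has the higher state score at the middle position, but the transition penalties make staying in label 0 the better sequence overall. Working with sums of log-scores avoids the underflow that multiplying probabilities would cause on long sequences.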

Types

type Alphabet

type Alphabet struct {
	ToID  map[string]int `json:"to_id"`
	ToStr []string       `json:"to_str"`
}

Alphabet maps between string labels/attributes and integer IDs.

func BuildAttributeAlphabet

func BuildAttributeAlphabet(sequences []TrainingSequence) *Alphabet

BuildAttributeAlphabet builds the attribute alphabet from training sequences.

func BuildLabelAlphabet

func BuildLabelAlphabet(sequences []TrainingSequence) *Alphabet

BuildLabelAlphabet builds the label alphabet from training sequences.

func NewAlphabet

func NewAlphabet() *Alphabet

NewAlphabet creates an empty alphabet.

func (*Alphabet) Add

func (a *Alphabet) Add(s string) int

Add adds a string to the alphabet if it is not already present and returns its ID.

func (*Alphabet) Get

func (a *Alphabet) Get(s string) int

Get returns the ID for a string, or -1 if not found.

func (*Alphabet) Size

func (a *Alphabet) Size() int

Size returns the number of entries.
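The documented behaviour of Add, Get, and Size can be illustrated with a small stand-alone type. This is a sketch under assumptions, not the package's code: the lowercase alphabet type is a local stand-in, and it assumes IDs are assigned densely from 0 in insertion order, which the source does not state.

```go
package main

import "fmt"

// alphabet is a local stand-in for a two-way string/ID mapping.
type alphabet struct {
	ToID  map[string]int
	ToStr []string
}

func newAlphabet() *alphabet {
	return &alphabet{ToID: make(map[string]int)}
}

// add inserts s if it is missing and returns its ID either way.
func (a *alphabet) add(s string) int {
	if id, ok := a.ToID[s]; ok {
		return id
	}
	id := len(a.ToStr) // assumption: next free dense ID
	a.ToID[s] = id
	a.ToStr = append(a.ToStr, s)
	return id
}

// get returns the ID for s, or -1 if s is unknown.
func (a *alphabet) get(s string) int {
	if id, ok := a.ToID[s]; ok {
		return id
	}
	return -1
}

// size returns the number of entries.
func (a *alphabet) size() int { return len(a.ToStr) }

func main() {
	labels := newAlphabet()
	fmt.Println(labels.add("B"), labels.add("I"), labels.add("B")) // prints 0 1 0
	fmt.Println(labels.get("O"), labels.size())                    // prints -1 2
}
```

Keeping both a map and a slice makes lookups cheap in either direction: the map resolves strings to IDs, and the slice resolves IDs back to strings.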

type ForwardBackwardResult

type ForwardBackwardResult struct {
	LogZ      float64     // log partition function
	Marginals [][]float64 // [T][L] marginal probabilities P(y_t=j|x)
	Alpha     [][]float64 // [T][L] scaled forward variables
	Beta      [][]float64 // [T][L] scaled backward variables
	Scale     []float64   // [T] scaling factors
}

ForwardBackwardResult holds the results of the forward-backward algorithm.

func ForwardBackward

func ForwardBackward(stateScores, transScores [][]float64) ForwardBackwardResult

ForwardBackward runs the scaled forward-backward algorithm.

  • stateScores: [T][L] state feature scores
  • transScores: [L][L] transition feature scores

type Model

type Model struct {
	Labels     *Alphabet `json:"labels"`
	Attributes *Alphabet `json:"attributes"`
	Weights    []float64 `json:"weights"`
	NumLabels  int       `json:"num_labels"`
}

Model holds the CRF parameters.
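The struct tags above determine the field names of the serialized form. The sketch below mirrors the two documented structs locally and round-trips them through encoding/json to show the resulting shape. The lowercase types and the roundTrip helper are stand-ins for illustration; whether the package's own functions indent the output or add other fields is not stated in the source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alphabet and model are local mirrors of the documented structs,
// carrying the same JSON tags.
type alphabet struct {
	ToID  map[string]int `json:"to_id"`
	ToStr []string       `json:"to_str"`
}

type model struct {
	Labels     *alphabet `json:"labels"`
	Attributes *alphabet `json:"attributes"`
	Weights    []float64 `json:"weights"`
	NumLabels  int       `json:"num_labels"`
}

// roundTrip (hypothetical helper) encodes m and decodes it again.
func roundTrip(m *model) (*model, []byte, error) {
	data, err := json.Marshal(m)
	if err != nil {
		return nil, nil, err
	}
	var out model
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, nil, err
	}
	return &out, data, nil
}

func main() {
	m := &model{
		Labels:     &alphabet{ToID: map[string]int{"B": 0, "O": 1}, ToStr: []string{"B", "O"}},
		Attributes: &alphabet{ToID: map[string]int{"bias": 0}, ToStr: []string{"bias"}},
		Weights:    []float64{0.5, -0.25},
		NumLabels:  2,
	}
	restored, data, err := roundTrip(m)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(string(data))
	fmt.Println(restored.NumLabels, restored.Labels.ToStr)
}
```

Because the weights are stored as a flat slice, a saved model is only meaningful together with the two alphabets that define how attribute and label IDs map into it.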

func LoadModel

func LoadModel(path string) (*Model, error)

LoadModel deserializes a model from the JSON file at path.

func NewModel

func NewModel() *Model

NewModel creates a new empty model.

func Train

func Train(sequences []TrainingSequence, config TrainerConfig) *Model

Train trains a CRF model on the given sequences using OWL-QN.

func UnmarshalModel

func UnmarshalModel(data []byte) (*Model, error)

UnmarshalModel deserializes a model from JSON bytes.

func (*Model) ComputeStateScores

func (m *Model) ComputeStateScores(features []map[string]float64) [][]float64

ComputeStateScores computes state feature scores for each position and label. It returns a [T][L] matrix, where T is the sequence length and L is the number of labels.

func (*Model) ComputeTransScores

func (m *Model) ComputeTransScores() [][]float64

ComputeTransScores returns the [L][L] transition score matrix.

func (*Model) NumWeights

func (m *Model) NumWeights() int

NumWeights returns the total number of weights.

func (*Model) Predict

func (m *Model) Predict(features []map[string]float64) []string

Predict returns the best label sequence as strings.

func (*Model) PredictMarginals

func (m *Model) PredictMarginals(features []map[string]float64) []map[string]float64

PredictMarginals returns marginal probabilities for each position.

func (*Model) StateFeatureIndex

func (m *Model) StateFeatureIndex(attrID, labelID int) int

StateFeatureIndex returns the weight index for a state feature.

func (*Model) TransFeatureIndex

func (m *Model) TransFeatureIndex(fromLabelID, toLabelID int) int

TransFeatureIndex returns the weight index for a transition feature.

func (*Model) TransOffset

func (m *Model) TransOffset() int

TransOffset returns the offset where transition features start in the weight vector.
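The three index helpers imply a single flat weight vector in which transition weights follow state weights. The source does not specify the exact ordering, so the sketch below shows one plausible layout purely as an assumption: state weights stored row-major by (attribute, label), followed by an L×L transition block. The layout type and its methods are hypothetical and may not match the package.

```go
package main

import "fmt"

// layout describes an ASSUMED flat weight layout:
//   [0, A*L)         state weights, index = attrID*L + labelID
//   [A*L, A*L + L*L) transition weights, index = offset + from*L + to
type layout struct {
	numAttrs  int // A
	numLabels int // L
}

func (l layout) stateIndex(attrID, labelID int) int {
	return attrID*l.numLabels + labelID
}

func (l layout) transOffset() int {
	return l.numAttrs * l.numLabels
}

func (l layout) transIndex(fromLabelID, toLabelID int) int {
	return l.transOffset() + fromLabelID*l.numLabels + toLabelID
}

func (l layout) numWeights() int {
	return l.transOffset() + l.numLabels*l.numLabels
}

func main() {
	l := layout{numAttrs: 3, numLabels: 2}
	fmt.Println(l.stateIndex(2, 1), l.transOffset(), l.transIndex(1, 0), l.numWeights())
	// prints 5 6 8 10
}
```

A flat vector like this is convenient for a numerical optimizer, which sees the parameters as one array while the model addresses them through the index helpers.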

type Sequence

type Sequence struct {
	Features []map[string]float64 // per-position feature dicts
}

Sequence represents an unlabeled sequence for prediction.

type TrainerConfig

type TrainerConfig struct {
	C1                     float64 // L1 regularization
	C2                     float64 // L2 regularization
	MaxIterations          int
	AllPossibleTransitions bool
	Epsilon                float64 // convergence threshold
	Verbose                bool
}

TrainerConfig holds CRF training hyperparameters.

func DefaultTrainerConfig

func DefaultTrainerConfig() TrainerConfig

DefaultTrainerConfig returns default training config matching Formasaurus.

type TrainingSequence

type TrainingSequence struct {
	Features []map[string]float64 // per-position feature dicts
	Labels   []string             // gold labels
	Group    int                  // for grouped cross-validation
}

TrainingSequence represents a labeled sequence for training.
