sentiment

package
v1.36.0
Published: Mar 29, 2026 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package sentiment provides a high-level sentiment classification pipeline that wraps encoder model loading and inference. It supports pluggable tokenization, batch processing, and both softmax and continuous scoring modes.

Basic usage with pre-tokenized input:

p, err := sentiment.New("model.gguf",
    sentiment.WithLabels([]string{"negative", "positive"}),
)
if err != nil {
    log.Fatal(err)
}
defer p.Close()

results, err := p.ClassifyTokenized(ctx, [][]int{{101, 2023, 2003, 2307, 102}})

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Encoder

type Encoder interface {
	Forward(ctx context.Context, inputIDs []int) ([]float32, error)
	OutputShape() []int
	Close() error
}

Encoder abstracts the forward pass of an encoder model. This allows injection of mock models for testing without loading GGUF files.

type EpochMetric

type EpochMetric struct {
	Epoch     int
	TrainLoss float64
	ValAcc    float64
}

EpochMetric holds metrics for a single training epoch.

type Option

type Option func(*Pipeline)

Option configures a Pipeline.

func WithBatchSize

func WithBatchSize(n int) Option

WithBatchSize sets the number of texts processed per forward pass. Default is 64.

func WithContinuous

func WithContinuous() Option

WithContinuous enables continuous scoring mode. In this mode, Confidence contains per-class sigmoid outputs rather than softmax probabilities, which is useful for regression-style sentiment strength scoring.

func WithDevice

func WithDevice(device string) Option

WithDevice sets the compute device for model loading (e.g. "cpu", "cuda").

func WithEncoder

func WithEncoder(enc Encoder) Option

WithEncoder injects a pre-built encoder, bypassing GGUF file loading. The caller retains ownership and must close the encoder separately.

func WithLabels

func WithLabels(labels []string) Option

WithLabels sets the class labels. The order must match the model's output logit indices.

func WithMaxSeqLen

func WithMaxSeqLen(n int) Option

WithMaxSeqLen sets the maximum sequence length for tokenized input. Sequences longer than this are truncated. Default is 512.

func WithTokenizer

func WithTokenizer(t Tokenizer) Option

WithTokenizer sets a custom tokenizer for text input.

func WithVocabFile added in v1.10.0

func WithVocabFile(vocabPath string) Option

WithVocabFile loads a WordPiece vocabulary file and uses it as the tokenizer.

type Pipeline

type Pipeline struct {
	// contains filtered or unexported fields
}

Pipeline wraps an encoder model for sentiment classification.

func New

func New(modelPath string, opts ...Option) (*Pipeline, error)

New creates a sentiment pipeline. If modelPath is non-empty and no Encoder is injected via WithEncoder, the model is loaded from a GGUF file.

func (*Pipeline) Classify

func (p *Pipeline) Classify(ctx context.Context, texts []string) ([]SentimentResult, error)

Classify runs sentiment classification on one or more texts. A Tokenizer must be set via WithTokenizer or WithVocabFile; for pre-tokenized input, use ClassifyTokenized instead.

func (*Pipeline) ClassifyTokenized

func (p *Pipeline) ClassifyTokenized(ctx context.Context, inputIDs [][]int) ([]SentimentResult, error)

ClassifyTokenized runs classification on pre-tokenized input IDs.

func (*Pipeline) Close

func (p *Pipeline) Close() error

Close releases resources held by the pipeline. If the pipeline loaded the model itself (no injected Encoder), the model is closed.

type SentimentResult

type SentimentResult struct {
	Label      string    // predicted label (e.g. "positive")
	Score      float64   // probability of the predicted label
	Logits     []float64 // raw logits for all classes
	Confidence []float64 // softmax probabilities for all classes
}

SentimentResult holds classification output for a single text.

type Tokenizer

type Tokenizer interface {
	Encode(text string) ([]int, error)
	Decode(ids []int) (string, error)
}

Tokenizer is the interface for text tokenization.

type TrainableModel

type TrainableModel interface {
	// Forward runs the model on tokenized input and returns logits.
	Forward(inputIDs []int) ([]float32, error)

	// NumClasses returns the number of output classes.
	NumClasses() int

	// UpdateParams applies gradient-based parameter updates.
	// grad is the loss gradient (softmax - one_hot) averaged over the batch.
	// lr is the learning rate.
	UpdateParams(grad []float64, lr float64) error
}

TrainableModel abstracts a model that supports forward pass and parameter updates during fine-tuning. This allows mock implementations for testing.

type TrainingConfig

type TrainingConfig struct {
	Epochs       int
	LearningRate float64
	BatchSize    int
	ValSplit     float64 // fraction for validation (0.0-1.0)
	LoRARank     int     // 0 = full fine-tuning, >0 = LoRA
	MaxSeqLen    int
	Labels       []string
}

TrainingConfig holds fine-tuning configuration.

type TrainingData

type TrainingData struct {
	Text  string `json:"text"`
	Label string `json:"label"`
}

TrainingData represents a labeled text sample.

func LoadTrainingData

func LoadTrainingData(path string) ([]TrainingData, error)

LoadTrainingData reads labeled data from a CSV or JSONL file. CSV format: "text","label" columns (header required). JSONL format: {"text": "...", "label": "..."} per line. Format is auto-detected from file extension.

type TrainingResult

type TrainingResult struct {
	FinalTrainLoss float64
	FinalValAcc    float64
	EpochMetrics   []EpochMetric
}

TrainingResult holds metrics from fine-tuning.

func FineTune

func FineTune(model TrainableModel, tokenizer Tokenizer, data []TrainingData, cfg TrainingConfig) (*TrainingResult, error)

FineTune trains a sentiment model on labeled data using the provided TrainableModel, Tokenizer, and configuration. It returns training metrics.
