sentiment

package
v1.36.0
Published: Mar 29, 2026 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package sentiment provides a high-level sentiment classification pipeline that wraps encoder model loading and inference. It supports pluggable tokenization, batch processing, and both softmax and continuous scoring modes.

Basic usage with pre-tokenized input:

p, err := sentiment.New("model.gguf",
    sentiment.WithLabels([]string{"negative", "positive"}),
)
if err != nil {
    log.Fatal(err)
}
defer p.Close()

results, err := p.ClassifyTokenized(ctx, [][]int{{101, 2023, 2003, 2307, 102}})

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Encoder

type Encoder interface {
	Forward(ctx context.Context, inputIDs []int) ([]float32, error)
	OutputShape() []int
	Close() error
}

Encoder abstracts the forward pass of an encoder model. This allows injection of mock models for testing without loading GGUF files.

type EpochMetric

type EpochMetric struct {
	Epoch     int
	TrainLoss float64
	ValAcc    float64
}

EpochMetric holds metrics for a single training epoch.

type Option

type Option func(*Pipeline)

Option configures a Pipeline.

func WithBatchSize

func WithBatchSize(n int) Option

WithBatchSize sets the number of texts processed per forward pass. Default is 64.

func WithContinuous

func WithContinuous() Option

WithContinuous enables continuous scoring mode. In this mode, Confidence contains per-class sigmoid outputs rather than softmax probabilities, which is useful for regression-style sentiment strength scoring.

func WithDevice

func WithDevice(device string) Option

WithDevice sets the compute device for model loading (e.g. "cpu", "cuda").

func WithEncoder

func WithEncoder(enc Encoder) Option

WithEncoder injects a pre-built encoder, bypassing GGUF file loading. The caller retains ownership and must close the encoder separately.

func WithLabels

func WithLabels(labels []string) Option

WithLabels sets the class labels. The order must match the model's output logit indices.

func WithMaxSeqLen

func WithMaxSeqLen(n int) Option

WithMaxSeqLen sets the maximum sequence length for tokenized input. Sequences longer than this are truncated. Default is 512.

func WithTokenizer

func WithTokenizer(t Tokenizer) Option

WithTokenizer sets a custom tokenizer for text input.

func WithVocabFile added in v1.10.0

func WithVocabFile(vocabPath string) Option

WithVocabFile loads a WordPiece vocabulary file and uses it as the tokenizer.

type Pipeline

type Pipeline struct {
	// contains filtered or unexported fields
}

Pipeline wraps an encoder model for sentiment classification.

func New

func New(modelPath string, opts ...Option) (*Pipeline, error)

New creates a sentiment pipeline. If modelPath is non-empty and no Encoder is injected via WithEncoder, the model is loaded from a GGUF file.

func (*Pipeline) Classify

func (p *Pipeline) Classify(ctx context.Context, texts []string) ([]SentimentResult, error)

Classify runs sentiment classification on one or more texts. A Tokenizer must be set via WithTokenizer or WithVocabFile; for pre-tokenized input, use ClassifyTokenized instead.

func (*Pipeline) ClassifyTokenized

func (p *Pipeline) ClassifyTokenized(ctx context.Context, inputIDs [][]int) ([]SentimentResult, error)

ClassifyTokenized runs classification on pre-tokenized input IDs.

func (*Pipeline) Close

func (p *Pipeline) Close() error

Close releases resources held by the pipeline. If the pipeline loaded the model itself (no injected Encoder), the model is closed.

type SentimentResult

type SentimentResult struct {
	Label      string    // predicted label (e.g. "positive")
	Score      float64   // probability of the predicted label
	Logits     []float64 // raw logits for all classes
	Confidence []float64 // softmax probabilities for all classes
}

SentimentResult holds classification output for a single text.

type Tokenizer

type Tokenizer interface {
	Encode(text string) ([]int, error)
	Decode(ids []int) (string, error)
}

Tokenizer is the interface for text tokenization.

type TrainableModel

type TrainableModel interface {
	// Forward runs the model on tokenized input and returns logits.
	Forward(inputIDs []int) ([]float32, error)

	// NumClasses returns the number of output classes.
	NumClasses() int

	// UpdateParams applies gradient-based parameter updates.
	// grad is the loss gradient (softmax - one_hot) averaged over the batch.
	// lr is the learning rate.
	UpdateParams(grad []float64, lr float64) error
}

TrainableModel abstracts a model that supports forward pass and parameter updates during fine-tuning. This allows mock implementations for testing.

type TrainingConfig

type TrainingConfig struct {
	Epochs       int
	LearningRate float64
	BatchSize    int
	ValSplit     float64 // fraction for validation (0.0-1.0)
	LoRARank     int     // 0 = full fine-tuning, >0 = LoRA
	MaxSeqLen    int
	Labels       []string
}

TrainingConfig holds fine-tuning configuration.

type TrainingData

type TrainingData struct {
	Text  string `json:"text"`
	Label string `json:"label"`
}

TrainingData represents a labeled text sample.

func LoadTrainingData

func LoadTrainingData(path string) ([]TrainingData, error)

LoadTrainingData reads labeled data from a CSV or JSONL file. CSV format: "text","label" columns (header required). JSONL format: {"text": "...", "label": "..."} per line. Format is auto-detected from file extension.

type TrainingResult

type TrainingResult struct {
	FinalTrainLoss float64
	FinalValAcc    float64
	EpochMetrics   []EpochMetric
}

TrainingResult holds metrics from fine-tuning.

func FineTune

func FineTune(model TrainableModel, tokenizer Tokenizer, data []TrainingData, cfg TrainingConfig) (*TrainingResult, error)

FineTune trains a sentiment model on labeled data using the provided TrainableModel, Tokenizer, and configuration. It returns training metrics.
