Documentation ¶
Overview ¶
Package sentiment provides a high-level sentiment classification pipeline that wraps encoder model loading and inference. It supports pluggable tokenization, batch processing, and both softmax and continuous scoring modes.
Basic usage with pre-tokenized input:
p, err := sentiment.New("model.gguf",
	sentiment.WithLabels([]string{"negative", "positive"}),
)
if err != nil {
	log.Fatal(err)
}
defer p.Close()

results, err := p.ClassifyTokenized(ctx, [][]int{{101, 2023, 2003, 2307, 102}})
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Encoder ¶
type Encoder interface {
	Forward(ctx context.Context, inputIDs []int) ([]float32, error)
	OutputShape() []int
	Close() error
}
Encoder abstracts the forward pass of an encoder model. This allows injection of mock models for testing without loading GGUF files.
type EpochMetric ¶
EpochMetric holds metrics for a single training epoch.
type Option ¶
type Option func(*Pipeline)
Option configures a Pipeline.
func WithBatchSize ¶
WithBatchSize sets the number of texts processed per forward pass. Default is 64.
func WithContinuous ¶
func WithContinuous() Option
WithContinuous enables continuous scoring mode. In this mode, Confidence contains sigmoid outputs instead of softmax probabilities, which is useful for regression-style sentiment strength scoring.
func WithDevice ¶
WithDevice sets the compute device for model loading (e.g. "cpu", "cuda").
func WithEncoder ¶
WithEncoder injects a pre-built encoder, bypassing GGUF file loading. The caller retains ownership and must close the encoder separately.
func WithLabels ¶
WithLabels sets the class labels. The order must match the model's output logit indices.
func WithMaxSeqLen ¶
WithMaxSeqLen sets the maximum sequence length for tokenized input. Sequences longer than this are truncated. Default is 512.
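The truncation rule can be pictured as a simple slice (illustrative only; the pipeline's internal handling of special tokens, such as whether a trailing separator is preserved, is not specified here):

```go
package main

import "fmt"

// truncate keeps at most maxSeqLen token IDs, a sketch of the documented
// behavior. It does not model any special-token handling.
func truncate(ids []int, maxSeqLen int) []int {
	if len(ids) > maxSeqLen {
		return ids[:maxSeqLen]
	}
	return ids
}

func main() {
	ids := []int{101, 7592, 2088, 2003, 2307, 102}
	fmt.Println(truncate(ids, 4)) // → [101 7592 2088 2003]
}
```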
func WithTokenizer ¶
WithTokenizer sets a custom tokenizer for text input.
func WithVocabFile ¶ added in v1.10.0
WithVocabFile loads a WordPiece vocabulary file and uses it as the tokenizer.
type Pipeline ¶
type Pipeline struct {
	// contains filtered or unexported fields
}
Pipeline wraps an encoder model for sentiment classification.
func New ¶
New creates a sentiment pipeline. If modelPath is non-empty and no Encoder is injected via WithEncoder, the model is loaded from a GGUF file.
func (*Pipeline) Classify ¶
Classify runs sentiment classification on one or more texts. A tokenizer must be set via WithTokenizer (or WithVocabFile); otherwise use ClassifyTokenized.
func (*Pipeline) ClassifyTokenized ¶
func (p *Pipeline) ClassifyTokenized(ctx context.Context, inputIDs [][]int) ([]SentimentResult, error)
ClassifyTokenized runs classification on pre-tokenized input IDs.
type SentimentResult ¶
type SentimentResult struct {
	Label      string    // predicted label (e.g. "positive")
	Score      float64   // probability of the predicted label
	Logits     []float64 // raw logits for all classes
	Confidence []float64 // per-class probabilities: softmax, or sigmoid in continuous mode
}
SentimentResult holds classification output for a single text.
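The relationship between the fields can be sketched as: Label is presumably the class whose Confidence entry is largest, and Score is that entry (a standalone illustration; the label and confidence values are made up):

```go
package main

import "fmt"

func main() {
	labels := []string{"negative", "positive"}
	confidence := []float64{0.08, 0.92} // softmax over the raw logits

	// Pick the argmax class; Label and Score then describe that class.
	best := 0
	for i, c := range confidence {
		if c > confidence[best] {
			best = i
		}
	}
	fmt.Printf("Label=%s Score=%.2f\n", labels[best], confidence[best]) // → Label=positive Score=0.92
}
```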
type TrainableModel ¶
type TrainableModel interface {
	// Forward runs the model on tokenized input and returns logits.
	Forward(inputIDs []int) ([]float32, error)

	// NumClasses returns the number of output classes.
	NumClasses() int

	// UpdateParams applies gradient-based parameter updates.
	// grad is the loss gradient (softmax - one_hot) averaged over the batch.
	// lr is the learning rate.
	UpdateParams(grad []float64, lr float64) error
}
TrainableModel abstracts a model that supports forward pass and parameter updates during fine-tuning. This allows mock implementations for testing.
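As with Encoder, tests can substitute a trivial implementation (a self-contained sketch; linearModel and its per-class bias parameters are illustrative, not part of the package):

```go
package main

import "fmt"

// linearModel is a toy TrainableModel: its only "parameters" are per-class
// biases, and Forward ignores the input tokens entirely.
type linearModel struct {
	bias []float64
}

func (m *linearModel) Forward(inputIDs []int) ([]float32, error) {
	out := make([]float32, len(m.bias))
	for i, b := range m.bias {
		out[i] = float32(b)
	}
	return out, nil
}

func (m *linearModel) NumClasses() int { return len(m.bias) }

// UpdateParams applies a plain gradient-descent step: bias -= lr * grad.
func (m *linearModel) UpdateParams(grad []float64, lr float64) error {
	for i := range m.bias {
		m.bias[i] -= lr * grad[i]
	}
	return nil
}

func main() {
	m := &linearModel{bias: []float64{0.0, 0.0}}
	// grad = softmax - one_hot for a "positive" (index 1) example.
	m.UpdateParams([]float64{0.5, -0.5}, 0.1)
	fmt.Println(m.bias) // → [-0.05 0.05]
}
```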
type TrainingConfig ¶
type TrainingConfig struct {
	Epochs       int
	LearningRate float64
	BatchSize    int
	ValSplit     float64 // fraction for validation (0.0-1.0)
	LoRARank     int     // 0 = full fine-tuning, >0 = LoRA
	MaxSeqLen    int
	Labels       []string
}
TrainingConfig holds fine-tuning configuration.
type TrainingData ¶
TrainingData represents a labeled text sample.
func LoadTrainingData ¶
func LoadTrainingData(path string) ([]TrainingData, error)
LoadTrainingData reads labeled data from a CSV or JSONL file. CSV format: "text","label" columns (header required). JSONL format: {"text": "...", "label": "..."} per line. Format is auto-detected from file extension.
type TrainingResult ¶
type TrainingResult struct {
	FinalTrainLoss float64
	FinalValAcc    float64
	EpochMetrics   []EpochMetric
}
TrainingResult holds metrics from fine-tuning.
func FineTune ¶
func FineTune(model TrainableModel, tokenizer Tokenizer, data []TrainingData, cfg TrainingConfig) (*TrainingResult, error)
FineTune trains a sentiment model on labeled data using the provided TrainableModel, Tokenizer, and configuration. It returns training metrics.