compare

package
v1.38.1
Published: Mar 30, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package compare provides a model comparison tool that runs the same prompts through multiple models and compares their performance metrics.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FormatJSON

func FormatJSON(w io.Writer, cr *ComparisonResult) error

FormatJSON writes the comparison result as indented JSON to w.

func FormatTable

func FormatTable(w io.Writer, cr *ComparisonResult) error

FormatTable writes a comparison table to w using text/tabwriter.

Types

type ComparisonResult

type ComparisonResult struct {
	Results  []ModelResult `json:"results"`
	Rankings []Ranking     `json:"rankings"`
}

ComparisonResult holds the full comparison output.

type Config

type Config struct {
	Models  []ModelSpec `json:"models"`
	Prompts []string    `json:"prompts"`
	Metrics []string    `json:"metrics"`
}

Config controls a model comparison run.

type InferFunc

type InferFunc func(ctx context.Context, model ModelSpec, prompt string) (text string, metrics map[string]float64, err error)

InferFunc is the function signature for running inference on a model. It takes a model spec and prompt, and returns the generated text along with collected metrics (throughput_tok_s, latency_ms, memory_mb, perplexity).

type ModelComparator

type ModelComparator struct {
	// contains filtered or unexported fields
}

ModelComparator compares multiple models on the same set of prompts.

func NewModelComparator

func NewModelComparator(cfg Config, infer InferFunc) *ModelComparator

NewModelComparator creates a new comparator with the given config and inference function.

func (*ModelComparator) Compare

func (mc *ModelComparator) Compare(ctx context.Context) (*ComparisonResult, error)

Compare runs all prompts through all models and returns the comparison result.

type ModelResult

type ModelResult struct {
	Model   ModelSpec          `json:"model"`
	Metrics map[string]float64 `json:"metrics"`
	StdDev  map[string]float64 `json:"std_dev"`
	// PerPrompt holds the raw metrics for each prompt.
	PerPrompt []map[string]float64 `json:"per_prompt"`
}

ModelResult holds the aggregated results for a single model.

type ModelSpec

type ModelSpec struct {
	Path         string `json:"path"`
	Name         string `json:"name"`
	Architecture string `json:"architecture"`
	Quantization string `json:"quantization"`
}

ModelSpec identifies a model to compare.

type Ranking

type Ranking struct {
	Metric string   `json:"metric"`
	Order  []string `json:"order"`
}

Ranking holds a metric name and the ranked list of model names from best to worst.
