Documentation
Overview
Package compare provides a model comparison tool that runs the same prompts through multiple models and compares their performance metrics.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func FormatJSON
func FormatJSON(w io.Writer, cr *ComparisonResult) error
FormatJSON writes the comparison result as indented JSON to w.
func FormatTable
func FormatTable(w io.Writer, cr *ComparisonResult) error
FormatTable writes a comparison table to w using text/tabwriter.
Types
type ComparisonResult
type ComparisonResult struct {
	Results  []ModelResult `json:"results"`
	Rankings []Ranking     `json:"rankings"`
}
ComparisonResult holds the full comparison output.
type Config
type Config struct {
	Models  []ModelSpec `json:"models"`
	Prompts []string    `json:"prompts"`
	Metrics []string    `json:"metrics"`
}
Config controls a model comparison run.
type InferFunc
type InferFunc func(ctx context.Context, model ModelSpec, prompt string) (text string, metrics map[string]float64, err error)
InferFunc is the signature of a function that runs inference on a single model. It receives a model spec and a prompt, and returns the generated text together with the collected metrics (throughput_tok_s, latency_ms, memory_mb, perplexity).
type ModelComparator
type ModelComparator struct {
	// contains filtered or unexported fields
}
ModelComparator compares multiple models on the same set of prompts.
func NewModelComparator
func NewModelComparator(cfg Config, infer InferFunc) *ModelComparator
NewModelComparator creates a new comparator with the given config and inference function.
func (*ModelComparator) Compare
func (mc *ModelComparator) Compare(ctx context.Context) (*ComparisonResult, error)
Compare runs all prompts through all models and returns the comparison result.
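The doc comment does not say how per-prompt metrics are aggregated into ModelResult.Metrics; averaging over prompts is one plausible choice. The meanMetrics helper below is an illustrative sketch of that step, not the package's actual code.

```go
package main

import "fmt"

// meanMetrics averages a slice of per-prompt metric maps into one map,
// the kind of aggregation a Compare implementation could use to fill
// ModelResult.Metrics. Keys missing from some prompts are averaged over
// the prompts that reported them.
func meanMetrics(perPrompt []map[string]float64) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, m := range perPrompt {
		for k, v := range m {
			sums[k] += v
			counts[k]++
		}
	}
	out := make(map[string]float64, len(sums))
	for k, s := range sums {
		out[k] = s / float64(counts[k])
	}
	return out
}

func main() {
	per := []map[string]float64{
		{"latency_ms": 40},
		{"latency_ms": 60},
	}
	fmt.Println(meanMetrics(per)["latency_ms"]) // 50
}
```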
type ModelResult
type ModelResult struct {
	Model   ModelSpec          `json:"model"`
	Metrics map[string]float64 `json:"metrics"`
	StdDev  map[string]float64 `json:"std_dev"`
	// PerPrompt holds the raw metrics for each prompt.
	PerPrompt []map[string]float64 `json:"per_prompt"`
}
ModelResult holds the aggregated results for a single model.
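StdDev is presumably derived from PerPrompt. One plausible way to fill it is a per-key population standard deviation over the prompt runs; the sketch below assumes that definition (the package may use the sample variant instead).

```go
package main

import (
	"fmt"
	"math"
)

// stdDevMetrics computes a population standard deviation per metric key,
// one plausible way to derive ModelResult.StdDev from ModelResult.PerPrompt.
func stdDevMetrics(perPrompt []map[string]float64) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]float64{}
	for _, m := range perPrompt {
		for k, v := range m {
			sums[k] += v
			counts[k]++
		}
	}
	out := make(map[string]float64, len(sums))
	for k := range sums {
		mean := sums[k] / counts[k]
		var ss float64 // sum of squared deviations from the mean
		for _, m := range perPrompt {
			if v, ok := m[k]; ok {
				ss += (v - mean) * (v - mean)
			}
		}
		out[k] = math.Sqrt(ss / counts[k])
	}
	return out
}

func main() {
	per := []map[string]float64{{"latency_ms": 40}, {"latency_ms": 60}}
	fmt.Println(stdDevMetrics(per)["latency_ms"]) // 10
}
```

Keeping the raw PerPrompt slice in the result means callers can recompute either variant (or percentiles) themselves.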