agentmeter

package module
v0.0.1
Published: Mar 10, 2026 License: Apache-2.0 Imports: 4 Imported by: 1

README

agentmeter

A Go dev tool for inspecting and debugging LLM agent systems directly in your terminal.

When you're building multi-agent pipelines — planner, retriever, executor, whatever — things get hard to follow fast. agentmeter gives you a live, structured view of every step: which agent ran, what it said, what tools it called, how long it took, and what it cost. No dashboards, no cloud, no setup. Just your terminal.

It's framework-agnostic. The core has no SDK dependencies. An adapter for Eino is available as a separate module.

go get github.com/erlangb/agentmeter

[Screenshot: step output]

[Screenshot: history output]


Quick start

package main

import (
	"os"
	"time"

	"github.com/erlangb/agentmeter"
	"github.com/erlangb/agentmeter/pricing"
	"github.com/erlangb/agentmeter/reasoning" // import path assumed; see the reasoning package under Directories
)

func main() {
	meter := agentmeter.New(pricing.WithDefaultPricing())
	meter.Reset("run-1")

	start := time.Now()
	// ... model call ...
	meter.Record(agentmeter.AgentStep{
		Role:      "model",
		Cluster:   agentmeter.ClusterCognitive,
		AgentName: "planner",
		ModelID:   "gpt-4o",
		StartedAt: start, // Duration is auto-computed from StartedAt
		Content:   "I'll search for recent Go releases.",
		Usage:     agentmeter.TokenUsage{PromptTokens: 200, CompletionTokens: 30},
	})

	start = time.Now()
	// ... tool call ...
	meter.Record(agentmeter.AgentStep{
		Role:      "tool",
		Cluster:   agentmeter.ClusterAction,
		AgentName: "executor",
		StartedAt: start,
		ToolName:  "web_search",
		Content:   "Go 1.23 was released in August 2024...",
	})

	meter.Finalize() // seal the run and append it to history

	printer := reasoning.NewPrinter(os.Stdout)
	printer.Print(meter.Snapshot())
}

Adapters

Each adapter is a separate Go sub-module — pull in only what you need.

Eino
go get github.com/erlangb/agentmeter/adapters/eino
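meter := agentmeter.New(pricing.WithDefaultPricing())
handler := einometer.NewAgentMeterHandler(meter)

runner, err := graph.Compile(ctx, compose.WithGlobalCallbacks(handler))
if err != nil {
	log.Fatal(err)
}
_, err = runner.Invoke(ctx, "input")

// Print the trace even if the run failed; errors are recorded as ClusterError steps.
printer := reasoning.NewPrinter(os.Stdout)
printer.Print(meter.Snapshot())
if err != nil {
	log.Fatal(err)
}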

Captures: chain start/end → Reset/Finalize, ChatModel output → ClusterCognitive step (with token usage, model ID, tool calls), ToolsNode output → ClusterAction steps, errors → ClusterError.


Core concepts

StepCluster

The fixed vocabulary that drives rendering and counters. Set it on every step.

Constant          Use for                     Effect
ClusterCognitive  Model inference, thinking   Increments ModelCalls
ClusterAction     Tool result                 Rendered as a tool block
ClusterMessage    User-facing output          Rendered as a message
ClusterError      Failures, exceptions        Rendered in red

Role is a free-form string you own. The library defines no role constants.
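In other words, counters key off Cluster, never Role. A minimal sketch of that gating, using simplified local types that mirror the documented constants (`step` and `countModelCalls` are hypothetical helpers, not part of agentmeter's API):

```go
package main

import "fmt"

// StepCluster mirrors the documented fixed vocabulary.
type StepCluster string

const (
	ClusterCognitive StepCluster = "cognitive"
	ClusterAction    StepCluster = "action"
	ClusterMessage   StepCluster = "message"
	ClusterError     StepCluster = "error"
)

// step is a hypothetical stand-in for agentmeter.AgentStep.
type step struct {
	Role    string      // free-form, owned by the caller
	Cluster StepCluster // fixed vocabulary, drives counters
}

// countModelCalls counts steps by Cluster only; Role is never consulted.
func countModelCalls(steps []step) int {
	n := 0
	for _, s := range steps {
		if s.Cluster == ClusterCognitive {
			n++
		}
	}
	return n
}

func main() {
	steps := []step{
		{Role: "model", Cluster: ClusterCognitive},
		{Role: "tool", Cluster: ClusterAction},
		{Role: "rerank", Cluster: ClusterCognitive}, // custom role, still counted
		{Role: "model", Cluster: ClusterMessage},    // "model" role, not counted
	}
	fmt.Println(countModelCalls(steps)) // 2
}
```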

Run label vs agent name
Field                Scope      Set by
Snapshot.Label       The run    meter.Reset(label)
AgentStep.AgentName  Each step  You, in Record()

Label identifies the run as a whole; AgentName tells you which agent produced each step. The distinction matters when planner, retriever, and executor all share one meter.
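Because one meter can be shared by several agents, a natural follow-up is to slice a run by AgentName. A minimal sketch over a simplified step type (`step` and `byAgent` are hypothetical helpers; with the real library you would range over `meter.Snapshot().Steps`):

```go
package main

import "fmt"

// step is a hypothetical stand-in for agentmeter.AgentStep.
type step struct {
	AgentName string
	Content   string
}

// byAgent groups a run's steps by AgentName. Each group keeps the original
// chronological order; order lists agent names by first appearance.
func byAgent(steps []step) (order []string, groups map[string][]step) {
	groups = make(map[string][]step)
	for _, s := range steps {
		if _, seen := groups[s.AgentName]; !seen {
			order = append(order, s.AgentName)
		}
		groups[s.AgentName] = append(groups[s.AgentName], s)
	}
	return order, groups
}

func main() {
	run := []step{
		{AgentName: "planner", Content: "plan"},
		{AgentName: "executor", Content: "search"},
		{AgentName: "planner", Content: "revise"},
	}
	order, groups := byAgent(run)
	for _, name := range order {
		fmt.Println(name, len(groups[name]))
	}
	// planner 2
	// executor 1
}
```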

Snapshot

Snapshot() returns an immutable, mutex-free copy of the current run — safe to log, marshal, or pass around.

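snap := meter.Snapshot()
fmt.Println(snap.Label)                         // run label from Reset
fmt.Println(len(snap.Steps))                    // snap.Steps is []AgentStep, chronological
fmt.Println(snap.TokenSummary.ByModel)          // token breakdown per model
fmt.Println(snap.TokenSummary.EstimatedCostUSD) // stays 0 without a CostFunc
fmt.Println(snap.TotalDuration)                 // accumulated across all steps

History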

Finalize() seals a run and appends it to a bounded history (default: 100 runs).

history := meter.History() // []Snapshot
printer.PrintHistory(history)

Pricing

// Built-in table: OpenAI, Anthropic Claude, Google Gemini
meter := agentmeter.New(pricing.WithDefaultPricing())

// Or roll your own: $2 per 1M prompt tokens, $6 per 1M completion tokens
custom := agentmeter.New(agentmeter.WithCostFunc(func(s agentmeter.TokenSummary) float64 {
    u := s.AggregateTokenUsage()
    return float64(u.PromptTokens)*0.000002 + float64(u.CompletionTokens)*0.000006
}))

Cost is computed lazily at Snapshot() time. See pricing/pricing.go for the full model list.
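The custom function above prices every token at one flat rate. When a run mixes models, a CostFunc can instead walk TokenSummary.ByModel. A minimal sketch with simplified local types and placeholder prices (the `prices` table and `perModelCost` are hypothetical and are not the built-in pricing table):

```go
package main

import "fmt"

// Simplified mirrors of agentmeter.TokenUsage and agentmeter.TokenSummary.
type TokenUsage struct {
	PromptTokens     int
	CompletionTokens int
}

type TokenSummary struct {
	ByModel map[string]TokenUsage
}

// price is USD per one million tokens. These figures are placeholders,
// not real provider prices.
type price struct{ InputPerM, OutputPerM float64 }

var prices = map[string]price{
	"model-a": {InputPerM: 2.0, OutputPerM: 6.0},
	"model-b": {InputPerM: 0.5, OutputPerM: 1.5},
}

// perModelCost has the CostFunc shape func(TokenSummary) float64.
// Models missing from the table contribute zero instead of failing.
func perModelCost(s TokenSummary) float64 {
	var total float64
	for model, u := range s.ByModel {
		p, ok := prices[model]
		if !ok {
			continue
		}
		total += float64(u.PromptTokens)/1e6*p.InputPerM +
			float64(u.CompletionTokens)/1e6*p.OutputPerM
	}
	return total
}

func main() {
	s := TokenSummary{ByModel: map[string]TokenUsage{
		"model-a": {PromptTokens: 1_000_000, CompletionTokens: 500_000},
		"model-b": {PromptTokens: 2_000_000},
	}}
	fmt.Printf("%.2f\n", perModelCost(s)) // 6.00
}
```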


Terminal output

printer := reasoning.NewPrinter(os.Stdout)
printer.Print(snap)           // single run
printer.PrintHistory(history) // all runs with aggregate summary

Examples

Example What it shows
go run ./examples/basic/             One run: model → tool → model
go run ./examples/history/           Multi-run history with aggregate cost
go run ./examples/mixed_pricing/     GPT-4o and Gemini in one run
go run ./examples/custom_cost/       Custom CostFunc
go run ./examples/terminal_output/   Coloured, plain, and custom terminal styles

Options

meter := agentmeter.New(
    agentmeter.WithCostFunc(costFn),
    agentmeter.WithMaxHistory(50),
)

Testing

go test -v -race ./...
go vet ./...

# eino adapter
cd adapters/eino && go test -v -race ./...

Documentation

Overview

Package agentmeter provides framework-agnostic observability for LLM agent runs. It records reasoning traces, token usage, tool calls, and estimated costs without introducing any external dependencies in the core package.

Index

Constants

const DefaultMaxHistory = 100

DefaultMaxHistory is the default cap on completed-run snapshots retained in History.

Variables

This section is empty.

Functions

This section is empty.

Types

type AgentStep

type AgentStep struct {
	// Role is a free descriptive label set by the client (e.g. "model", "tool",
	// "retrieval"). It appears in headers and logs but does not drive rendering
	// or counter gating — use Cluster for that.
	Role StepRole
	// Cluster drives display and counter gating. Set to one of the Cluster*
	// constants. ClusterCognitive increments ModelCalls; ClusterAction renders
	// as a tool result; ClusterMessage and ClusterError have dedicated styles.
	Cluster StepCluster
	// AgentName is the name of the agent that performed this step.
	AgentName string
	// Provider identifies the API provider serving the model
	// (e.g. "openai", "anthropic", "google", "azure_openai", "aws_bedrock").
	// Follows the gen_ai.system semantic convention from OpenTelemetry.
	// Optional: leave empty if the provider is unambiguous from ModelID in your setup.
	// Explicit over inferred — the same ModelID can be served by multiple providers
	// (e.g. "claude-3-5-sonnet" via Anthropic direct, AWS Bedrock, or Vertex AI).
	Provider string
	// ModelID is the model identifier (e.g. "gpt-4o"). Non-empty for cognitive steps.
	ModelID string
	// Content holds the primary text output (model response text or tool result).
	Content string
	// ThinkingContent holds chain-of-thought text for thinking steps.
	ThinkingContent string
	// ToolName is the name of the tool invoked. Non-empty for action steps.
	ToolName string
	// ToolInput is the serialised input passed to the tool.
	ToolInput string
	// ToolCallID is the provider-assigned correlation ID linking a tool call
	// in an assistant message to its corresponding tool result step.
	ToolCallID string
	// ToolCalls is a list of tool-call summaries (e.g. "search({q:\"go\"})") for
	// model steps that dispatched one or more tools. Empty for other step types.
	ToolCalls []string
	// Usage records token counts for model and thinking steps; zero for tool steps.
	Usage TokenUsage
	// StartedAt is the wall-clock time at which this step began.
	// If Duration is zero and StartedAt is non-zero, Record() auto-computes
	// Duration = time.Since(StartedAt). Set this immediately before issuing
	// the model or tool call — before any blocking I/O — for accurate timing.
	StartedAt time.Time
	// Duration is the wall-clock time spent in this step.
	// If set explicitly it takes precedence over StartedAt-based auto-calculation.
	Duration time.Duration
}

AgentStep is an immutable record of one step in an agent run.

type CostFunc

type CostFunc func(TokenSummary) float64

CostFunc computes the estimated cost in USD for a given TokenSummary. Implementations should be pure functions with no side effects. A nil CostFunc is valid; Meter will skip cost computation in that case.
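A minimal sketch of the documented nil behaviour, over a simplified TokenSummary (`withCost` is a hypothetical helper, not library code):

```go
package main

import "fmt"

// Simplified mirrors of agentmeter.TokenSummary and agentmeter.CostFunc.
type TokenSummary struct {
	ModelCalls       int
	EstimatedCostUSD float64
}

type CostFunc func(TokenSummary) float64

// withCost fills in EstimatedCostUSD only when fn is non-nil; a nil fn
// leaves the summary unchanged, so the cost stays at its zero value.
func withCost(s TokenSummary, fn CostFunc) TokenSummary {
	if fn != nil {
		s.EstimatedCostUSD = fn(s)
	}
	return s
}

func main() {
	perCall := func(s TokenSummary) float64 { return 0.25 * float64(s.ModelCalls) }
	s := TokenSummary{ModelCalls: 3}
	fmt.Println(withCost(s, nil).EstimatedCostUSD)     // 0
	fmt.Println(withCost(s, perCall).EstimatedCostUSD) // 0.75
}
```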

type Meter

type Meter struct {
	// contains filtered or unexported fields
}

Meter records the reasoning trace of one or more agent runs and accumulates a bounded history of completed runs. Safe for concurrent use.

func New

func New(opts ...Option) *Meter

New creates a Meter instance with the supplied options applied. It is safe to call New with no options; all settings have safe defaults.

func (*Meter) ClearHistory

func (t *Meter) ClearHistory()

ClearHistory removes all completed-run snapshots from memory.

func (*Meter) Finalize

func (t *Meter) Finalize()

Finalize marks the current run complete and appends a Snapshot to history. Calling it more than once on the same run is a no-op.

func (*Meter) History

func (t *Meter) History() []Snapshot

History returns a copy of all completed runs in chronological order. Runs are added by Finalize(). The slice is bounded by the MaxHistory option (default 100). Use ClearHistory to reset it.

func (*Meter) Record

func (t *Meter) Record(s AgentStep)

Record appends s to the current run and updates token aggregates. Duration is auto-computed from StartedAt if not set explicitly.

func (*Meter) Reset

func (t *Meter) Reset(label string)

Reset clears the current run state and prepares Meter to record a new run. label is a human-readable identifier for this run (e.g. "conversation-42", "search-agent-run-3"). It is stored in Snapshot.Label and does not affect how individual steps are recorded. Steps carry their own AgentName field. Reset does not clear history.

func (*Meter) Snapshot

func (t *Meter) Snapshot() Snapshot

Snapshot returns a point-in-time, immutable view of the current run. The returned Snapshot is a deep copy: mutating it does not affect Meter. EstimatedCostUSD is computed here if a CostFunc was supplied.
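AgentStep contains a slice (ToolCalls) and TokenSummary a map (ByModel), so a plain struct copy would still alias the meter's data. A minimal sketch of the copying a deep-copy guarantee requires, over simplified local types (`snapshot`, `step`, and `deepCopy` are hypothetical):

```go
package main

import "fmt"

type TokenUsage struct{ PromptTokens, CompletionTokens int }

type step struct {
	ModelID   string
	ToolCalls []string
}

type snapshot struct {
	Steps   []step
	ByModel map[string]TokenUsage
}

// deepCopy duplicates every slice and map so the result shares no mutable
// memory with src; plain assignment would share them.
func deepCopy(src snapshot) snapshot {
	dst := snapshot{
		Steps:   make([]step, len(src.Steps)),
		ByModel: make(map[string]TokenUsage, len(src.ByModel)),
	}
	for i, s := range src.Steps {
		s.ToolCalls = append([]string(nil), s.ToolCalls...) // nested slice too
		dst.Steps[i] = s
	}
	for k, v := range src.ByModel {
		dst.ByModel[k] = v // TokenUsage holds only ints, so a value copy suffices
	}
	return dst
}

func main() {
	live := snapshot{
		Steps:   []step{{ModelID: "gpt-4o", ToolCalls: []string{"search"}}},
		ByModel: map[string]TokenUsage{"gpt-4o": {PromptTokens: 200}},
	}
	snap := deepCopy(live)
	snap.Steps[0].ToolCalls[0] = "changed"
	snap.ByModel["gpt-4o"] = TokenUsage{}
	fmt.Println(live.Steps[0].ToolCalls[0], live.ByModel["gpt-4o"].PromptTokens) // search 200
}
```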

type Option

type Option func(*config)

Option is a functional option that configures a Meter instance. Options are applied in the order they are passed to New().

func WithCostFunc

func WithCostFunc(fn CostFunc) Option

WithCostFunc sets the CostFunc used to estimate cost in USD for each run. Pass nil to disable cost estimation (the default).

func WithMaxHistory

func WithMaxHistory(n int) Option

WithMaxHistory sets the maximum number of completed runs to retain in history. n must be positive; zero or negative values are ignored and the default is kept.
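Option and WithMaxHistory follow Go's functional-options pattern. A minimal sketch of that pattern, assuming a `maxHistory` field (agentmeter's config type is unexported, so its field names are guesses; `newConfig` is hypothetical):

```go
package main

import "fmt"

const DefaultMaxHistory = 100

// config stands in for agentmeter's unexported config type.
type config struct {
	maxHistory int
}

type Option func(*config)

// WithMaxHistory ignores non-positive n, keeping whatever value was already set.
func WithMaxHistory(n int) Option {
	return func(c *config) {
		if n > 0 {
			c.maxHistory = n
		}
	}
}

// newConfig starts from defaults and applies options in order, so a later
// option overrides an earlier one.
func newConfig(opts ...Option) config {
	c := config{maxHistory: DefaultMaxHistory}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Println(newConfig().maxHistory)                                        // 100
	fmt.Println(newConfig(WithMaxHistory(0)).maxHistory)                       // 100
	fmt.Println(newConfig(WithMaxHistory(50), WithMaxHistory(-1)).maxHistory) // 50
}
```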

type Snapshot

type Snapshot struct {
	// Label is the run identifier set by Reset(label). It identifies the run,
	// not the agents within it. Individual steps carry their own AgentName.
	Label string
	// Steps is a deep copy of the reasoning steps recorded so far.
	Steps []AgentStep
	// TokenSummary is an aggregated token and cost summary for this run.
	TokenSummary TokenSummary
	// TotalDuration is the accumulated wall-clock time across all steps in this snapshot.
	TotalDuration time.Duration
}

Snapshot is a point-in-time, immutable view of an agent run. It is returned by Meter.Snapshot() and stored in History(). Unlike Meter, a Snapshot carries no mutex and is safe to share freely.

type StepCluster

type StepCluster string

StepCluster is the stable rendering and routing category for a step. It is the library's fixed vocabulary; Role is owned by the client. Use one of the Cluster* constants when recording a step.

const (
	// ClusterCognitive covers model inference and extended-thinking steps.
	// Steps with this cluster increment ModelCalls and are rendered with
	// thinking/plan/call labels.
	ClusterCognitive StepCluster = "cognitive"
	// ClusterAction covers tool calls and external side-effects.
	ClusterAction StepCluster = "action"
	// ClusterMessage covers text responses directed at the user.
	ClusterMessage StepCluster = "message"
	// ClusterError covers failures, exceptions, and retries.
	ClusterError StepCluster = "error"
)

type StepRole

type StepRole string

StepRole is a free-form descriptive label for a reasoning step. The library defines the type but no constants — clients supply whatever role labels make sense for their framework or application (e.g. "model", "tool", "retrieval", "rerank"). Rendering and counter gating are driven by StepCluster, not by Role.

type TokenSummary

type TokenSummary struct {
	// ByModel holds token usage aggregated per model ID.
	ByModel map[string]TokenUsage
	// ModelCalls is the number of ClusterCognitive steps recorded.
	ModelCalls int
	// EstimatedCostUSD is the cost computed by CostFunc at Snapshot time.
	EstimatedCostUSD float64
}

TokenSummary aggregates a run's token usage per model, together with the number of model calls and the estimated cost.

func (TokenSummary) AggregateTokenUsage

func (s TokenSummary) AggregateTokenUsage() TokenUsage

AggregateTokenUsage returns the sum of all per-model token usage.

type TokenUsage

type TokenUsage struct {
	PromptTokens      int
	CompletionTokens  int
	TotalTokens       int
	CachedInputTokens int
	CacheWriteTokens  int
	ReasoningTokens   int
}

TokenUsage captures raw token counts. It is the "source of truth" for what happened in a single model interaction.

func (TokenUsage) Add

func (u TokenUsage) Add(other TokenUsage) TokenUsage

Add returns a new TokenUsage that is the sum of u and other; neither operand is modified.
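Add and AggregateTokenUsage are documented only by their signatures and one-line descriptions. A minimal sketch of the sum they describe, assuming Add is field-wise over every field (`aggregate` is a hypothetical helper; the real method is defined on TokenSummary):

```go
package main

import "fmt"

// TokenUsage copies the documented field set.
type TokenUsage struct {
	PromptTokens      int
	CompletionTokens  int
	TotalTokens       int
	CachedInputTokens int
	CacheWriteTokens  int
	ReasoningTokens   int
}

// Add sums field by field. With a value receiver, u and other are copies,
// so neither caller-side value changes.
func (u TokenUsage) Add(other TokenUsage) TokenUsage {
	return TokenUsage{
		PromptTokens:      u.PromptTokens + other.PromptTokens,
		CompletionTokens:  u.CompletionTokens + other.CompletionTokens,
		TotalTokens:       u.TotalTokens + other.TotalTokens,
		CachedInputTokens: u.CachedInputTokens + other.CachedInputTokens,
		CacheWriteTokens:  u.CacheWriteTokens + other.CacheWriteTokens,
		ReasoningTokens:   u.ReasoningTokens + other.ReasoningTokens,
	}
}

// aggregate folds a per-model map with Add, in the spirit of AggregateTokenUsage.
func aggregate(byModel map[string]TokenUsage) TokenUsage {
	var total TokenUsage
	for _, u := range byModel {
		total = total.Add(u)
	}
	return total
}

func main() {
	byModel := map[string]TokenUsage{
		"model-a": {PromptTokens: 200, CompletionTokens: 30, TotalTokens: 230},
		"model-b": {PromptTokens: 100, CompletionTokens: 10, TotalTokens: 110},
	}
	fmt.Printf("%+v\n", aggregate(byModel))
	// {PromptTokens:300 CompletionTokens:40 TotalTokens:340 CachedInputTokens:0 CacheWriteTokens:0 ReasoningTokens:0}
}
```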

Directories

Path Synopsis
adapters
eino module
examples
basic command
Package main is the minimal agentmeter example.
custom_cost command
history command
Package main demonstrates multi-run history with agentmeter.
mixed_pricing command
Package main demonstrates per-model cost tracking with agentmeter.
terminal_output command
Package main demonstrates the terminal rendering options of agentmeter.
Package reasoning provides terminal rendering utilities for agentmeter traces.
