agent

package
v0.3.3 Latest
Published: May 30, 2026 | License: MIT | Imports: 34 | Imported by: 0

Documentation

Overview

compact.go owns session compaction: when the live message list grows past CompactionConfig.AutoCompactInputTokens the agent collapses the older portion into a synthetic summary message, preserving a configurable tail of recent turns. Compaction must not split a tool_use from its matching tool_result — see adjustBoundary in T-204.

compact_summary.go owns the deterministic local summarizer that CompactSession folds removed messages through. It makes no LLM call: the summary is rule-based, so compaction is fast, free, and reproducible for the same input across restarts.

events.go defines the tagged union the agent pushes through its Events() channel. It replaces the old Callbacks struct: instead of the UI registering function pointers and the agent calling them synchronously on its goroutine, the UI consumes a typed event stream and translates each event to its own representation (tea.Msg in the TUI, fprintf in the CLI).

Why a sealed interface over a struct-of-functions:

  • Order is explicit: events arrive in the order the agent emits them, so the consumer never has to reason about callback interleaving across goroutines.
  • Adding a new event kind is one type declaration plus a consumer case, with an exhaustiveness check (a lint pass; the Go compiler itself does not check type switches) flagging missing handling anywhere a switch lists known variants.
  • The CLI and TUI adapters are just two consumers of the same channel — no parallel implementations of the same function table to keep in sync.
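The sealed-interface pattern behind these points can be sketched as follows. This is a minimal illustration, not the package's code: the marker method name isEvent, the EventTextDelta type, and the fields on both event types are assumptions made for the example. The real stream is never closed by the agent; the sketch closes its channel only so the loop terminates.

```go
package main

import "fmt"

// Event is sealed: the unexported marker method means only types declared
// in this package can implement it. (isEvent is a hypothetical name.)
type Event interface{ isEvent() }

// Hypothetical event kinds, for illustration only.
type EventTextDelta struct{ Text string }
type EventDone struct{ Reason string }

func (EventTextDelta) isEvent() {}
func (EventDone) isEvent()      {}

// render translates one event into the consumer's own representation.
func render(ev Event) string {
	switch e := ev.(type) {
	case EventTextDelta:
		return "delta:" + e.Text
	case EventDone:
		return "done:" + e.Reason
	default:
		// A new variant without a case lands here instead of being lost.
		return fmt.Sprintf("unhandled:%T", ev)
	}
}

func main() {
	events := make(chan Event, 4)
	events <- EventTextDelta{Text: "hi"}
	events <- EventDone{Reason: "stop"}
	close(events) // sketch only; the agent's channel stays open
	for ev := range events {
		fmt.Println(render(ev)) // events print in emission order
	}
}
```

A CLI and a TUI adapter would each supply their own render-style function over the same channel.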

Permission ask is special. The agent must block until the user answers, so EventPermissionAsk carries a buffered Reply channel. The consumer writes a PermissionResponse on Reply; the agent goroutine is parked on that receive.
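The blocking handshake described above can be sketched like this. The type names come from the text, but their fields (Tool, Allow) and the helper ask are assumptions for illustration; the real agent emits the event through its own machinery.

```go
package main

import "fmt"

// PermissionResponse and EventPermissionAsk reuse names from the text;
// the fields are assumptions made for this sketch.
type PermissionResponse struct{ Allow bool }

type EventPermissionAsk struct {
	Tool  string
	Reply chan PermissionResponse // capacity 1, so the consumer's send never blocks
}

// ask emits a permission event and parks until the consumer answers.
func ask(events chan<- EventPermissionAsk, tool string) PermissionResponse {
	ev := EventPermissionAsk{Tool: tool, Reply: make(chan PermissionResponse, 1)}
	events <- ev
	return <-ev.Reply // the agent goroutine waits here
}

func main() {
	events := make(chan EventPermissionAsk, 1)

	// Consumer: stands in for the UI. It denies a hypothetical "bash" tool.
	go func() {
		for ev := range events {
			ev.Reply <- PermissionResponse{Allow: ev.Tool != "bash"}
		}
	}()

	fmt.Println(ask(events, "read").Allow) // true
	fmt.Println(ask(events, "bash").Allow) // false
}
```

Because Reply is buffered, the consumer can answer and move on even if the asking goroutine has not reached its receive yet.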

semantic_compact.go adds LLM-powered semantic compaction on top of the deterministic local summarizer. When context pressure hits the semantic threshold, the agent calls deepseek-v4-flash (thinking disabled) to produce a richer summary. Falls back to deterministic compaction on failure.

Package agent is the deepseekcode ReAct loop. It owns turn boundaries, tool dispatch, callback fan-out, and stop conditions.

The shape closely mirrors charmbracelet/crush's internal/agent/agent.go (callback table + StopWhen []StopCondition). The three load-bearing patches from docs/design.md §6.4 are applied here: stream/present split, finish-reason override, and two-tier timeout (the timeout is applied at the llm.Client layer; this package just configures it).

Index

Constants

const (
	BudgetKindWarning  = "warning"  // crossed WarnCNY
	BudgetKindBlocked  = "blocked"  // crossed HardCNY — the turn is refused
	BudgetKindUnpriced = "unpriced" // model has no pricing table; gate can't price it
)

Budget-gate event kinds (EventBudget.Kind), T1.3.

const DefaultSystemPrompt = `` /* 742-byte string literal not displayed */

DefaultSystemPrompt is the cache-stable system prompt. It must not change between turns; that would invalidate the prompt cache and blow the cost story. Versioned by binary release, not by session.

const EventProtocolVersion = 1

EventProtocolVersion is the current envelope version. Increment when the EventEnvelope shape changes in a backward-incompatible way.

const MaxContextTokens = 1_000_000

MaxContextTokens is the default maximum context window size for DeepSeek V4 models (1M context). Used by ContextPressure to compute the usage ratio. Override via Agent.MaxContextTokens.

Variables

This section is empty.

Functions

func AffectedPathsFor

func AffectedPathsFor(reg *tools.Registry, call llm.ToolCall) []string

AffectedPathsFor returns the static affected paths for a tool call (used by the snapshot manager). Bash returns nil since its effects are unknown statically; the destructive-bash check + permission prompt are the safety net there.

func CheckBudget

func CheckBudget(policy BudgetPolicy, state BudgetState, projectedCNY float64) (allow bool, warn bool)

CheckBudget evaluates whether a model turn should proceed given the policy, current state, and projected cost for the upcoming turn.

Returns:

  • allow: true if the turn may proceed; false if hard limit is exceeded
  • warn: true if this call should emit a warning (first time crossing WarnCNY)

Pure function: does not modify state. The caller updates BudgetState based on the returned flags.
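A minimal sketch of the contract described above, assuming the thresholds are compared against SpentCNY plus projectedCNY and that a zero threshold disables its gate. checkBudget is a hypothetical stand-in, not the package's implementation, and the structs are trimmed to the fields the sketch needs.

```go
package main

import "fmt"

// Trimmed copies of the documented types.
type BudgetPolicy struct {
	WarnCNY float64
	HardCNY float64
}

type BudgetState struct {
	SpentCNY float64
	Warned   bool
}

// checkBudget is a hypothetical stand-in for CheckBudget. Assumption: the
// thresholds apply to spent-so-far plus the projected cost of this turn.
func checkBudget(p BudgetPolicy, s BudgetState, projectedCNY float64) (allow, warn bool) {
	total := s.SpentCNY + projectedCNY
	if p.HardCNY > 0 && total >= p.HardCNY {
		return false, false // hard limit: refuse the turn
	}
	// Warn only the first time the warning threshold is crossed.
	warn = p.WarnCNY > 0 && total >= p.WarnCNY && !s.Warned
	return true, warn
}

func main() {
	policy := BudgetPolicy{WarnCNY: 5, HardCNY: 10}
	state := BudgetState{SpentCNY: 4}

	allow, warn := checkBudget(policy, state, 1.5)
	fmt.Println(allow, warn) // true true
	state.Warned = state.Warned || warn // the caller records the warning

	allow, warn = checkBudget(policy, state, 1.5)
	fmt.Println(allow, warn) // true false

	state.SpentCNY = 9
	allow, warn = checkBudget(policy, state, 1.5)
	fmt.Println(allow, warn) // false false
}
```

Keeping the function pure leaves the state update visible at the call site, which is where the text places it.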

func ContextPressure

func ContextPressure(messages []llm.Message, maxContextTokens int, charsPerToken float64) float64

ContextPressure returns the current context usage ratio, a value from 0 to 1: the estimated token count of messages relative to maxContextTokens.

func EstimateTokens

func EstimateTokens(messages []llm.Message) int

EstimateTokens returns the cold-start (UTF-8 byte ÷ 4) token estimate. It is the uncalibrated wrapper over EstimateTokensCalibrated; callers holding a learned per-session ratio should use the calibrated form directly.

func EstimateTokensCalibrated

func EstimateTokensCalibrated(messages []llm.Message, charsPerToken float64) int

EstimateTokensCalibrated estimates the token count of a message list using the given chars-per-token ratio — a per-session value learned from provider usage frames (see Agent.calibrateCharsPerToken). A non-positive ratio falls back to the cold-start prior. Used only for compaction triggering and the pre-stream budget projection — never for cost computation. Cheap, deterministic, no tokenizer dependency.

Rounding-locus note: this divides the SUMMED char count once, whereas the historical EstimateTokens floored each block's len/4 independently. For the existing strict test inputs every block length is a multiple of 4, so the results are identical; for other inputs the two differ by at most (blocks-1) tokens — negligible on a 1M window.
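The rounding-locus difference can be checked with a small worked example. Both helpers below are illustrative re-implementations, not the package's code; they measure length with len (bytes) and truncate the final division, which is an assumption about the real rounding.

```go
package main

import "fmt"

// perBlockFloor floors each block's length/4 independently, as the
// historical estimator is described as doing.
func perBlockFloor(blocks []string) int {
	total := 0
	for _, b := range blocks {
		total += len(b) / 4
	}
	return total
}

// summedOnce sums all lengths first and divides once. A non-positive
// ratio falls back to the cold-start prior of 4.
func summedOnce(blocks []string, charsPerToken float64) int {
	if charsPerToken <= 0 {
		charsPerToken = 4
	}
	chars := 0
	for _, b := range blocks {
		chars += len(b)
	}
	return int(float64(chars) / charsPerToken)
}

func main() {
	aligned := []string{"abcd", "efghijkl"}  // lengths 4 and 8
	ragged := []string{"abcdefg", "hijklmn"} // lengths 7 and 7

	fmt.Println(perBlockFloor(aligned), summedOnce(aligned, 0)) // 3 3
	fmt.Println(perBlockFloor(ragged), summedOnce(ragged, 0))   // 2 3
}
```

With two ragged blocks the estimates differ by one token, which matches the (blocks-1) bound stated in the note.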

func ProjectedTurnCostCNY

func ProjectedTurnCostCNY(model string, req llm.Request, charsPerToken, cacheHitRate float64) float64

ProjectedTurnCostCNY returns a pre-stream cost estimate for one model turn. It runs before DeepSeek returns authoritative cache hit/miss usage, so it prices the prompt against the rolling session cache-hit rate (T4.2): the fraction cacheHitRate of input tokens is priced at the cheap cache-hit rate and the rest as cache miss.

The input token count uses the per-session calibrated chars-per-token ratio (charsPerToken<=0 falls back to the cold-start char/4 prior), so the gate tracks the real prompt size on CJK/code-heavy sessions (T4.1).

cacheHitRate is clamped to [0,1] and the hit-token split is floored, so the projection is biased toward cache miss. cacheHitRate==0 (the cold-start floor, before any usage is observed) reproduces the all-miss estimate exactly — the gate is never looser than the conservative default.
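The hit/miss split can be sketched as below. This covers only the input-token part of the projection, and the per-token prices are placeholder numbers chosen to keep the arithmetic exact, not real DeepSeek pricing; projectInputCost is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// projectInputCost prices inputTokens against a cache-hit rate.
// hitPrice and missPrice are per-token placeholder prices (assumptions).
func projectInputCost(inputTokens int, cacheHitRate, hitPrice, missPrice float64) float64 {
	rate := math.Max(0, math.Min(1, cacheHitRate)) // clamp to [0,1]
	// Flooring the hit count pushes any fractional token to the miss side,
	// so the estimate is biased toward the more expensive price.
	hit := int(math.Floor(float64(inputTokens) * rate))
	miss := inputTokens - hit
	return float64(hit)*hitPrice + float64(miss)*missPrice
}

func main() {
	// 1001 tokens at a 50% hit rate: 500 hits, 501 misses.
	fmt.Println(projectInputCost(1001, 0.5, 0.25, 1.0)) // 626
	// Cold start: rate 0 reproduces the all-miss estimate.
	fmt.Println(projectInputCost(1001, 0, 0.25, 1.0)) // 1001
}
```

Any rate above zero can only lower the estimate from the all-miss figure, so the cold-start value is the upper bound.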

func ShouldCompact

func ShouldCompact(messages []llm.Message, cfg CompactionConfig, charsPerToken float64) (ok bool, fromIdx, toIdx int)

ShouldCompact decides whether the message list has grown enough to merit compaction. Returns ok=true with the proposed [fromIdx, toIdx) window of messages to summarize. The window is not yet boundary-safe — callers must run adjustBoundary (T-204) before deleting anything.

Returns ok=false (and zero indices) when:

  • estimated tokens are below AutoCompactInputTokens, or
  • len(messages) <= preserve*2 (nothing meaningful to compact)
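The two guard conditions above can be sketched as follows. Several details are assumptions for illustration: messages are plain strings, preserve means cfg.PreserveRecentMessages, the window starts at index 0 and stops before the preserved tail, and a count exactly equal to the threshold does not trigger. The real function also takes a calibrated charsPerToken ratio, omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed copy of the documented config.
type CompactionConfig struct {
	PreserveRecentMessages int
	AutoCompactInputTokens int
}

// estimateTokens uses the cold-start prior of 4 bytes per token.
func estimateTokens(messages []string) int {
	chars := 0
	for _, m := range messages {
		chars += len(m)
	}
	return chars / 4
}

// shouldCompact is a hypothetical stand-in for ShouldCompact. It returns
// a half-open [fromIdx, toIdx) window that is not yet boundary-safe.
func shouldCompact(messages []string, cfg CompactionConfig) (ok bool, fromIdx, toIdx int) {
	preserve := cfg.PreserveRecentMessages
	if len(messages) <= preserve*2 {
		return false, 0, 0 // nothing meaningful to compact
	}
	if estimateTokens(messages) <= cfg.AutoCompactInputTokens {
		return false, 0, 0 // threshold not exceeded
	}
	return true, 0, len(messages) - preserve
}

func main() {
	// Ten messages of 40 bytes each: an estimated 100 tokens.
	msgs := make([]string, 10)
	for i := range msgs {
		msgs[i] = strings.Repeat("x", 40)
	}
	fmt.Println(shouldCompact(msgs, CompactionConfig{PreserveRecentMessages: 2, AutoCompactInputTokens: 50}))  // true 0 8
	fmt.Println(shouldCompact(msgs, CompactionConfig{PreserveRecentMessages: 5, AutoCompactInputTokens: 50}))  // false 0 0
	fmt.Println(shouldCompact(msgs, CompactionConfig{PreserveRecentMessages: 2, AutoCompactInputTokens: 100})) // false 0 0
}
```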

func ShouldSemanticCompact

func ShouldSemanticCompact(pressure float64, cfg SemanticCompactionConfig) string

ShouldSemanticCompact decides whether semantic compaction should fire. Returns the action: "none", "warn", "compact", or "protect".

Types

type Agent

type Agent struct {
	Client      *llm.Client
	Tools       *tools.Registry
	Permissions *permissions.Policy

	// Persister, if non-nil, receives session and snapshot bookkeeping
	// alongside the in-memory Messages list. nil = ephemeral session
	// (the -p one-shot mode runs this way).
	Persister Persister

	// Model is the active main-loop model (e.g. deepseek-v4-flash).
	// Changed mid-session via /models.
	Model    string
	Thinking bool

	// Temperature and TopP, when non-nil, are sent on every model request as
	// the OpenAI-shaped sampling controls. nil (the default) omits the field
	// entirely so the main-loop wire bytes — and thus the cache fingerprint —
	// are unchanged. Sub-agents set these from their def frontmatter (T7.1).
	Temperature *float64
	TopP        *float64

	// EscalationModel, when non-empty and different from Model, enables
	// model-driven escalation (T2.3): a turn is re-issued once on this model
	// when the assistant emits a <<<NEEDS_PRO>>> self-declaration or the
	// per-turn repair-error count crosses escalationRepairThreshold. Empty (the
	// default) disables escalation entirely — the mechanism is a no-op and adds
	// no wire bytes. The marker *contract* (telling the model the marker exists)
	// is a separate, opt-in system-prompt addition (see escalationContract); the
	// detection here works regardless and never moves the Prefix Fingerprint.
	EscalationModel string

	// AutoReasoning enables per-turn thinking selection via
	// llm.SelectThinking. When true, runStep calls SelectThinking
	// with the last user message text to decide thinking on/off.
	AutoReasoning bool

	// DisablePrefixEpoch disables the PrefixEpoch feature (for benchmarking).
	DisablePrefixEpoch bool

	// DisableSemanticCompaction disables semantic (LLM) compaction (for benchmarking).
	DisableSemanticCompaction bool

	// System is the system prompt. Cache-stable across turns by design.
	System string

	// PromptBuilder, when non-nil, overrides System with the builder's
	// output at the start of Run. The builder owns the static + dynamic
	// split (see internal/prompt); the agent just calls Build() to get
	// the assembled string. nil → System stays as configured.
	PromptBuilder *prompt.SystemPromptBuilder

	// CompactionCfg controls when the running message list gets
	// collapsed into a synthetic summary. Initialized to
	// DefaultCompactionConfig in New; override fields before Run.
	CompactionCfg CompactionConfig

	// SemanticCfg controls semantic (LLM-powered) compaction.
	// Initialized to defaultSemanticCompactionConfig in New;
	// override fields before Run. Zero value disables semantic
	// compaction (falls back to deterministic only).
	SemanticCfg SemanticCompactionConfig

	// MaxContextTokens is the maximum context window size for
	// context pressure computation. Default: MaxContextTokens (1_000_000).
	MaxContextTokens int

	// HookRunner dispatches lifecycle hooks (PreToolUse, PostToolUse,
	// SessionStart, SessionEnd). nil = hooks disabled.
	HookRunner *hooks.Runner

	// StopWhen runs after each step; first match wins. Defaults below.
	StopWhen []StopCondition

	// Messages is the conversation. The agent appends user messages,
	// assistant turns, and tool results here.
	Messages []llm.Message

	// StepTimeout, if non-zero, caps the duration of a single step
	// (one model turn + tool execution). 0 = no per-step limit.
	StepTimeout time.Duration

	// MaxToolCalls is the hard cap on total tool calls per session.
	// Warns at 80% via OnInfo. 0 = unlimited.
	MaxToolCalls int

	// IsSubagent is true when this agent was spawned by a parent
	// via LoopSpawner. It disables thinking in sub-agents (via
	// SelectThinking) and may be used to adjust other behaviors.
	IsSubagent bool

	// Spawner, when non-nil, enables sub-agent dispatch from slash
	// commands that declare agent: or subtask: true. Set by the
	// assembly layer (cmd/dsc or TUI) after construction.
	Spawner tools.Spawner

	// Jobs manages background jobs (async sub-agents and background_bash).
	// Initialized in New; closed via defer in Run.
	Jobs *JobRegistry

	// BudgetPolicy and BudgetState control session cost gating.
	// Zero values (default) disable budget checks entirely.
	BudgetPolicy BudgetPolicy
	BudgetState  BudgetState

	// Skills is the skill metadata store. nil = no skills loaded.
	Skills *skills.Store

	// MCPRegistry is the MCP tool registry. nil = no MCP servers.
	// Its SchemaHash feeds the epoch's mcp_schema_hash, so startup MCP
	// discovery is part of the frozen prefix and mid-session schema
	// changes surface as pending changes rather than live drift.
	MCPRegistry *mcp.Registry

	// Profile is the active first-class agent profile. nil means the
	// implicit "default" profile. The profile name feeds the epoch's
	// agent_profile_hash; switching profiles via SwitchProfile creates a
	// new epoch (one expected cache miss) rather than mutating the live one.
	Profile *agents.AgentProfile

	// ActiveTiers controls which tool tiers are sent to the model. The
	// agent uses Tools.AsLLMToolsFiltered(ActiveTiers...) when building
	// requests; a nil/empty slice means "no filter" (all registered tools
	// are exposed). The constructor defaults this to [TierCore] (see New).
	ActiveTiers []tools.ToolTier
	// contains filtered or unexported fields
}

Agent is one running ReAct loop. Construct with New, drive with Run.

Agent is *not* safe for concurrent use within a single session. The TUI wraps it in a goroutine and a consumer reads events from Events() to drive the UI.

func New

func New(client *llm.Client, reg *tools.Registry, pol *permissions.Policy, model string) *Agent

New returns an Agent with sensible defaults for v0.1.

The Events channel is buffered at 256: roughly 4 seconds at a 60 tok/s burst rate. Streaming deltas don't block the model goroutine unless the consumer falls more than that behind, which would only happen if the UI goroutine were stuck — an upstream bug we'd want to surface.

func (*Agent) AskQuestion

func (a *Agent) AskQuestion(ctx context.Context, req tools.QuestionRequest) (tools.QuestionResponse, error)

AskQuestion implements tools.Questioner. It emits an EventQuestionAsk and blocks until the consumer replies or ctx is cancelled.

func (*Agent) AttachChildTraceSink

func (a *Agent) AttachChildTraceSink(child *Agent) *TraceSinkHandle

AttachChildTraceSink wires a subagent's event bus into the parent's trace writer, stamping every child record with agent_role="subagent" and the parent's current epoch_id. It is a no-op (returns nil) when the parent has no trace sink attached, so normal interactive/CLI runs are unaffected. The returned handle's Wait blocks until the child's EventDone is processed; the caller closes it after the subagent's Run returns.

func (*Agent) AttachTraceSink

func (a *Agent) AttachTraceSink(w io.Writer) *TraceSinkHandle

AttachTraceSink subscribes a root TraceSink to the agent's bus and starts a drain goroutine. The returned handle's Wait blocks until the run's EventDone is processed, so callers can flush a JSONL file before exit. The sink is also retained on the agent so spawned subagents can tee their own epoch/usage events into the same trace (see AttachChildTraceSink).

func (*Agent) Bus

func (a *Agent) Bus() *Bus

Bus returns the agent's event bus. Additional consumers (loggers, parity recorders, future daemons) subscribe via Bus().Subscribe to receive versioned EventEnvelope values. The primary consumer (TUI/CLI) should continue using Events() for backward compatibility.

func (*Agent) CancelJob

func (a *Agent) CancelJob(id string) error

CancelJob implements tools.JobStatusController.

func (*Agent) Close

func (a *Agent) Close()

Close releases agent resources. It cancels all running background jobs and should be called when the session ends (not per prompt turn).

func (*Agent) CurrentEpochID

func (a *Agent) CurrentEpochID() string

CurrentEpochID returns the current epoch's ID, or "" when no epoch has been initialized yet. Used to stamp a subagent's child trace with the parent epoch it ran under.

func (*Agent) EmitInfo

func (a *Agent) EmitInfo(msg string)

EmitInfo pushes an out-of-band notice onto the event stream. Used by adjacent components (e.g. llm.Client.OnRetry) that don't otherwise hold the event channel but want to surface user-visible status.

func (*Agent) EnableEscalation

func (a *Agent) EnableEscalation(model string)

EnableEscalation turns on model-driven escalation to the given model and adds the marker contract to the static system prompt so the model knows to emit <<<NEEDS_PRO>>>.

Call it BEFORE Run (before epoch #1 freezes) so the contract is part of the frozen, fingerprinted prefix; it is inserted just before prompt.DynamicContextBoundary (when present) so per-turn dynamic context still follows it.

Adding the contract deliberately moves the Prefix Fingerprint for this session (the model name is the only interpolant, so it stays byte-stable across turns), while the default (escalation off) leaves DefaultSystemPrompt and the committed cache-stable golden untouched.

No-op when model is empty or already the active model.

NOTE: when a PromptBuilder is set it rebuilds a.System each turn, overwriting this injection; such assemblies must add the contract through the builder's static section instead.

func (*Agent) EnterPlan

func (a *Agent) EnterPlan(_ context.Context) error

EnterPlan transitions the agent into plan mode. While in plan mode only read-only tools, question, and plan_exit are available. Calling EnterPlan when already in plan mode returns an error.

func (*Agent) Events

func (a *Agent) Events() <-chan Event

Events returns the receive end of the agent-lifetime event stream. Consume from one goroutine; the agent guarantees in-order delivery. The channel is never closed by the agent — multiple Run calls share it. Consumers should select against their own ctx.Done() to exit cleanly during shutdown.
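Since the channel is never closed, a consumer loop needs its own exit path. A minimal sketch, using an empty-interface stand-in for Event and hypothetical helper names (consume, demo):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Event is a stand-in for the package's sealed interface.
type Event interface{}

// consume reads events from a single goroutine until ctx is cancelled.
// It never waits for the channel to close, because the agent never closes it.
func consume(ctx context.Context, events <-chan Event, handle func(Event)) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev := <-events:
			handle(ev)
		}
	}
}

// demo feeds two events, cancels from the handler after the second, and
// reports how many were handled and whether consume saw the cancellation.
func demo() (handled int, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	events := make(chan Event, 2)
	events <- "first"
	events <- "second"

	err := consume(ctx, events, func(Event) {
		handled++
		if handled == 2 {
			cancel()
		}
	})
	return handled, errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(demo()) // 2 true
}
```

A plain range over the channel would block forever at shutdown, which is why the select on ctx.Done() matters here.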

func (*Agent) ExitPlan

func (a *Agent) ExitPlan(_ context.Context, plan string) error

ExitPlan transitions the agent out of plan mode, restoring the original tool registry and permissions policy. plan is the finalized plan text (unused here; consumed by the plan_exit tool itself). Calling ExitPlan when not in plan mode returns an error.

func (*Agent) ForceCompact

func (a *Agent) ForceCompact(ctx context.Context)

ForceCompact runs maybeCompact with a temporarily-lowered token threshold so the user's /compact slash command can fire even when the message list hasn't hit the auto threshold. The preserve count is honored — too-short transcripts still no-op.

func (*Agent) HasActiveBackgroundWork

func (a *Agent) HasActiveBackgroundWork() bool

HasActiveBackgroundWork reports whether any background job (async subagent or background_bash) is still running. ReloadSkills mutates state those detached goroutines read — notably the shared skill store via skill_read — so a caller must refuse a reload while this is true. a.running alone covers only the main loop, not work that outlives the parent turn.

func (*Agent) JobStatus

func (a *Agent) JobStatus(id string, tailLines int) (tools.Status, error)

JobStatus implements tools.JobStatusController.

func (*Agent) ReconcileUndo

func (a *Agent) ReconcileUndo(ctx context.Context, n int) (int, error)

ReconcileUndo rolls the transcript back by n completed steps so the model's view matches the files a /undo just reverted (today /undo reverts files only; a.Messages and disk still claim the reverted turn succeeded). It truncates a.Messages to the boundary the first undone step started from (StepRecord.MessageCount), trims a.steps in lockstep, and — when the Persister supports it — truncates the persisted messages to the same boundary.

It refuses to cross a compaction: compaction renumbers a.Messages, so boundaries recorded by steps below compactionFloor are stale and truncating to them would corrupt the transcript. The caller (TUI) must ensure the agent is not running, since ReconcileUndo mutates a.Messages that runStep reads.

The Static Prefix (system + tools) is untouched, so the cache fingerprint is byte-identical across the undo and the 50x discount survives the rewind. Returns the number of body messages removed.

func (*Agent) ReloadSkills

func (a *Agent) ReloadSkills(cwd, home string) (ReloadResult, error)

ReloadSkills re-scans the skill directories under cwd (then home), refreshes the in-place skill store, rebuilds the model-visible system prompt, and — only when the rebuilt prefix actually moves — mints a new prefix epoch so the edit takes effect mid-session.

This is the deliberate, user-triggered exception to skills.LoadScan's session-start-only rule: the skill directory is normally frozen for the whole session to protect DeepSeek's 50x prompt cache. /reload-skills trades exactly one cache miss (the new epoch's first turn) for the model seeing edited skills now instead of next session.

The store is mutated in place (skills.Store.ReplaceFrom). a.Skills and the skill_read dispatcher share one *Store pointer, so the capability set used for drift detection and on-demand skill-body lookups both pick up the reloaded skills atomically.

Concurrency: the caller MUST ensure no turn is in flight — neither the main loop (a.running) NOR any background job (HasActiveBackgroundWork). ReloadSkills mutates a.System, the shared skill store (which runStep's capability set and the skill_read tool both read — including from a still-live async subagent), and the epoch state, none of which is safe to touch while a reader runs. The TUI gates this behind both checks; /undo's guard covers only the main loop, so reload's guard is deliberately stronger.

func (*Agent) RequestStop

func (a *Agent) RequestStop()

RequestStop marks the current run as explicitly stopped by the user, so a subsequent context cancellation is reported as StopUserRequested rather than the ambient StopContextCancel. Callers invoke it immediately before cancelling the run's context (e.g. the TUI's ctrl+c handler). Safe to call from a goroutine other than the one driving Run.

func (*Agent) Run

func (a *Agent) Run(ctx context.Context, userPrompt string) (reason StopReason, err error)

Run drives the loop until a stop condition fires or context cancels. Returns the StopReason and any infrastructure error.

The userPrompt is appended as a user message. To resume without a new user prompt (e.g. after a tool result the model needs to react to), pass "".

Run defers an EventDone emit so the consumer sees a strict terminator AFTER every other event from this turn. Bypassing the events channel for the "done" signal used to race trailing deltas and leave the UI's chrome stuck on "writing…" — never do that.

func (*Agent) StartBashJob

func (a *Agent) StartBashJob(ctx context.Context, command string, usePTY bool, timeoutMs int, sb sandbox.Sandbox, profile sandbox.Profile) (string, error)

StartBashJob implements tools.JobController. It starts a background bash job and returns immediately with the job ID.

func (*Agent) SwitchProfile

func (a *Agent) SwitchProfile(p agents.AgentProfile) *PrefixEpoch

SwitchProfile makes p the active agent profile. It applies the profile's tool tiers and model, then creates a new PrefixEpoch via the epoch manager. The first turn of the new epoch is expected to miss cache (ExpectedCacheMiss); subsequent same-epoch turns stay cache-stable. Returns the new epoch. A no-op when an epoch hasn't been initialized yet (the first runStep will pick up the profile when it creates epoch #1).

func (*Agent) Transcript

func (a *Agent) Transcript() []byte

Transcript returns a compact wire-format snapshot of recent messages for the Duet builtin hook. Bounded so we don't blow up pro's context uselessly; for v0.1 we send the last 8 messages.

func (*Agent) WaitChildTraces

func (a *Agent) WaitChildTraces(timeout time.Duration)

WaitChildTraces blocks until every tracked subagent trace handle has flushed its child's EventDone, or the shared deadline elapses, then closes them. A one-shot run calls this before closing the root trace so an async (`task` with async:true) subagent's child epoch is flushed instead of being lost when the process exits. No-op when no subagent trace was attached.

If a handle times out the child never reached EventDone — its trace is partial. Rather than close it silently, a `child_trace_incomplete` record is written so the gate fails closed instead of trusting a cut-off child.

type BudgetPolicy

type BudgetPolicy struct {
	WarnCNY float64 // emit warning when projected spend >= WarnCNY
	HardCNY float64 // block turn when projected spend >= HardCNY
}

BudgetPolicy configures session cost thresholds. Zero values disable the corresponding gate.

type BudgetState

type BudgetState struct {
	SpentCNY float64
	Warned   bool

	// Rolling cache-hit accounting over every billed turn this session. Used
	// to discount the pre-stream cost projection by the realized cache-hit
	// rate (T4.2). Only frames with input tokens (hit+miss>0) fold in, so an
	// empty/absent usage frame can't skew the rate.
	CacheHitTokens  int
	CacheMissTokens int

	// UnknownModelWarned makes the "model has no known pricing, so the budget
	// gate can't cost-gate it" warning fire at most once per session.
	UnknownModelWarned bool
}

BudgetState tracks cumulative spend and whether a warning has already been emitted for this session.

func (*BudgetState) FoldCacheUsage

func (s *BudgetState) FoldCacheUsage(hitTokens, missTokens int)

FoldCacheUsage adds one turn's realized cache hit/miss token counts into the rolling session accounting. Turns with no input tokens are ignored so they can't move the rate.

func (BudgetState) SessionCacheHitRate

func (s BudgetState) SessionCacheHitRate() float64

SessionCacheHitRate is the rolling cache-hit fraction in [0,1] over all input tokens billed this session. It returns 0 before any usage is observed (cold start), so a projection discounted by it floors to all-miss — never looser than the conservative default. The result is clamped defensively.
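The rolling accounting can be sketched as below. The type and method names are lowercase stand-ins for BudgetState, FoldCacheUsage, and SessionCacheHitRate, written from the descriptions above rather than from the source.

```go
package main

import (
	"fmt"
	"math"
)

// budgetState is a trimmed stand-in for BudgetState.
type budgetState struct {
	CacheHitTokens  int
	CacheMissTokens int
}

// fold adds one turn's usage. Turns without input tokens are ignored so
// an empty usage frame cannot move the rate.
func (s *budgetState) fold(hitTokens, missTokens int) {
	if hitTokens+missTokens <= 0 {
		return
	}
	s.CacheHitTokens += hitTokens
	s.CacheMissTokens += missTokens
}

// hitRate returns the hit fraction in [0,1], or 0 before any usage.
func (s budgetState) hitRate() float64 {
	total := s.CacheHitTokens + s.CacheMissTokens
	if total <= 0 {
		return 0 // cold start: projections price everything as a miss
	}
	rate := float64(s.CacheHitTokens) / float64(total)
	return math.Max(0, math.Min(1, rate)) // defensive clamp
}

func main() {
	var s budgetState
	fmt.Println(s.hitRate()) // 0

	s.fold(0, 0) // ignored
	s.fold(300, 100)
	fmt.Println(s.hitRate()) // 0.75

	s.fold(100, 300)
	fmt.Println(s.hitRate()) // 0.5
}
```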

type Bus

type Bus struct {
	// contains filtered or unexported fields
}

Bus is a multi-consumer fan-out for agent events. Subscribers receive every published event as an EventEnvelope in publish order.

Ordinary events are delivered non-blocking: if a subscriber's buffer is full the event is dropped and its Dropped counter increments. Reply-carrying events (EventPermissionAsk, EventQuestionAsk) are delivered blocking — they carry a reply channel the agent goroutine parks on, so dropping them would deadlock the agent.
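The two delivery modes can be sketched with a select that has a default branch. The subscriber type, its fields, and deliver are hypothetical names; the sketch is single-goroutine, so Dropped is a plain int here, while a real fan-out would need synchronization.

```go
package main

import "fmt"

// subscriber is a hypothetical stand-in for a Bus subscription.
type subscriber struct {
	C       chan string
	Dropped int
}

// deliver sends ev to one subscriber. Ordinary events are dropped when the
// buffer is full; reply-carrying events block until there is room.
func deliver(s *subscriber, ev string, carriesReply bool) {
	if carriesReply {
		s.C <- ev // must not be dropped: the agent is parked on the reply
		return
	}
	select {
	case s.C <- ev:
	default:
		s.Dropped++ // buffer full: count the loss and move on
	}
}

func main() {
	s := &subscriber{C: make(chan string, 1)}

	deliver(s, "delta-1", false) // fits in the buffer
	deliver(s, "delta-2", false) // buffer full, dropped
	fmt.Println(len(s.C), s.Dropped) // 1 1

	fmt.Println(<-s.C) // delta-1

	deliver(s, "permission-ask", true) // room available, so this returns at once
	fmt.Println(<-s.C)                 // permission-ask
}
```

The blocking branch is why a subscriber that stops draining its channel can stall the agent on a reply event.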

func NewBus

func NewBus() *Bus

NewBus returns an empty Bus ready for subscribers.

func (*Bus) Close

func (b *Bus) Close()

Close shuts down the Bus and closes every subscriber channel. Publish after Close is a no-op.

func (*Bus) Publish

func (b *Bus) Publish(ev Event)

Publish wraps ev in an EventEnvelope (with a monotonic Seq and current time) and fans it out to every subscriber. The caller must not hold any locks that a subscriber's reading goroutine might also need.

func (*Bus) Subscribe

func (b *Bus) Subscribe(buffer int) *Subscription

Subscribe adds a consumer and returns its Subscription. buffer is the channel capacity; buffer <= 0 defaults to 256 (matching the legacy events channel). The caller must drain C or Unsubscribe to avoid back-pressure on reply events.

func (*Bus) Unsubscribe

func (b *Bus) Unsubscribe(s *Subscription)

Unsubscribe removes the subscription and closes its channel. Safe to call multiple times; subsequent calls are no-ops.

type CapabilitySet

type CapabilitySet struct {
	ProfileID string
	Skills    *skills.Store     // nil when no skill store is configured
	MCPTools  []mcp.McpToolMeta // nil when no MCP servers are connected
}

CapabilitySet is the latent capability identity behind a PrefixEpoch: the inputs that determine *which* StaticPrefix gets built (the active agent profile, the skill catalog, the connected MCP tools) but which are not themselves the model-visible bytes. EpochManager watches it to record pending changes; it is deliberately NOT part of the Prefix Fingerprint — see /CONTEXT.md and docs/adr/0001-prefix-fingerprint-is-model-visible-bytes-only.

Skill and active-MCP changes also move the fingerprint (the skill directory is rendered into the system prompt; active MCP tools are in the tool set), so they are reported here as one fine-grained pending change rather than also as a raw "system"/"tools" change — that is the double-report the cache-epoch review flagged.

type CompactionConfig

type CompactionConfig struct {
	// PreserveRecentMessages is how many trailing messages stay
	// outside the compaction window (default 4).
	PreserveRecentMessages int

	// MaxEstimatedTokens caps the compacted summary's own token
	// budget — used by the summarizer to truncate (default 10_000).
	MaxEstimatedTokens int

	// AutoCompactInputTokens is the trigger threshold: once the
	// estimated token count of the full message list exceeds this
	// value, compaction fires (default 100_000; override via env
	// DEEPSEEKCODE_AUTO_COMPACT_INPUT_TOKENS).
	AutoCompactInputTokens int
}

CompactionConfig controls when and how the agent compacts its running message list. Values flow in via Agent.CompactionCfg; the agent reads them without holding a lock, so set them once at construction.

func DefaultCompactionConfig

func DefaultCompactionConfig() CompactionConfig

DefaultCompactionConfig returns the default config. The AutoCompactInputTokens value can be overridden at process start via DEEPSEEKCODE_AUTO_COMPACT_INPUT_TOKENS — malformed values fall back to the default rather than crash.
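The override-with-fallback behaviour can be sketched as follows. The helper autoCompactThreshold is hypothetical; trimming whitespace and rejecting non-positive numbers are assumptions about what counts as malformed.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// autoCompactThreshold parses raw as the trigger threshold and returns
// def when raw is empty, non-numeric, or not positive (assumptions).
func autoCompactThreshold(raw string, def int) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || n <= 0 {
		return def // malformed input falls back instead of crashing
	}
	return n
}

func main() {
	const defaultThreshold = 100_000 // the documented default

	raw := os.Getenv("DEEPSEEKCODE_AUTO_COMPACT_INPUT_TOKENS")
	fmt.Println(autoCompactThreshold(raw, defaultThreshold))

	fmt.Println(autoCompactThreshold("250000", defaultThreshold)) // 250000
	fmt.Println(autoCompactThreshold("lots", defaultThreshold))   // 100000
}
```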

type CompactionResult

type CompactionResult struct {
	Summary        string
	FromIdx, ToIdx int
	RemovedCount   int
	SummaryMessage llm.Message
	KeptMessages   []llm.Message
}

CompactionResult is what CompactSession produces. Summary == "" means "no compaction performed" — the caller should leave the message list untouched.

func CompactSession

func CompactSession(messages []llm.Message, cfg CompactionConfig, charsPerToken float64) CompactionResult

CompactSession runs the full pipeline: ShouldCompact → adjustBoundary → summarize. Returns a CompactionResult with Summary == "" when no compaction was performed (caller must check Summary before mutating its message list).

CompactSession does NOT persist — the caller wires the result into Persister.ReplaceWithCompaction (T-209) and replaces its in-memory a.Messages slice.

type EpochComponents

type EpochComponents struct {
	AgentProfileID  string
	Model           string
	ReasoningEffort string
	StaticSystem    string
	FewShots        []llm.Message
	ToolSpecs       []llm.Tool
	Capability      CapabilitySet
}

EpochComponents is the input for creating a PrefixEpoch. StaticSystem, ToolSpecs (and, when folded in, FewShots) are the model-visible bytes that determine the Prefix Fingerprint; Capability is the latent identity used only for pending-change detection.

type EpochManager

type EpochManager struct {
	// contains filtered or unexported fields
}

EpochManager manages PrefixEpoch lifecycle.

func NewEpochManager

func NewEpochManager() *EpochManager

func (*EpochManager) CreateEpoch

func (m *EpochManager) CreateEpoch(reason string, components EpochComponents) *PrefixEpoch

CreateEpoch builds a new PrefixEpoch from components but does not make it current. To make an epoch current, use InitEpoch (for the first epoch) or SwitchEpoch.

func (*EpochManager) CurrentEpoch

func (m *EpochManager) CurrentEpoch() *PrefixEpoch

CurrentEpoch returns the current epoch. Returns nil if no epoch has been initialized.

func (*EpochManager) DetectDrift

func (m *EpochManager) DetectDrift(components EpochComponents) []PendingChange

DetectDrift records the latent capability deltas (profile / skills / MCP) between the frozen epoch and the live components as pending changes, using canonical comparisons. Model-visible byte drift is NOT detected here — it is caught per turn by llm.PrefixMonitor and treated as a bug, not a pending change. Returns the newly detected changes. See docs/adr/0001.

func (*EpochManager) ExpectedCacheMiss

func (m *EpochManager) ExpectedCacheMiss() bool

ExpectedCacheMiss returns true on the first turn after an epoch switch. Returns false on subsequent turns. Call once per turn — it clears the flag on read.
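The clear-on-read behaviour means a second call within the same turn returns false. A minimal sketch, with a hypothetical missFlag type standing in for the relevant piece of EpochManager state:

```go
package main

import (
	"fmt"
	"sync"
)

// missFlag is a hypothetical stand-in for the epoch manager's
// "first turn after a switch" state.
type missFlag struct {
	mu      sync.Mutex
	pending bool
}

// markSwitch records that an epoch switch just happened.
func (f *missFlag) markSwitch() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.pending = true
}

// expectedCacheMiss reports the flag and clears it in the same step,
// so only the first caller after a switch sees true.
func (f *missFlag) expectedCacheMiss() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	v := f.pending
	f.pending = false
	return v
}

func main() {
	var f missFlag
	f.markSwitch()

	// Read once per turn and keep the value if it is needed again.
	miss := f.expectedCacheMiss()
	fmt.Println(miss)                  // true
	fmt.Println(f.expectedCacheMiss()) // false
}
```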

func (*EpochManager) FreezeEpoch

func (m *EpochManager) FreezeEpoch()

FreezeEpoch marks the epoch as immutable after first model request and captures FrozenTools/FrozenSystem from the current epoch.

func (*EpochManager) InitEpoch

func (m *EpochManager) InitEpoch(reason string, components EpochComponents) *PrefixEpoch

InitEpoch creates and sets the initial epoch. Called once at session start. Panics if called when an epoch already exists.

func (*EpochManager) IsFrozen

func (m *EpochManager) IsFrozen() bool

IsFrozen reports whether the epoch is frozen.

func (*EpochManager) PendingChanges

func (m *EpochManager) PendingChanges() []PendingChange

PendingChanges returns a copy of the pending changes list.

func (*EpochManager) RecordPendingChange

func (m *EpochManager) RecordPendingChange(change PendingChange)

RecordPendingChange records a mutation that occurred after the epoch was frozen. The change is not applied to the current epoch.

func (*EpochManager) SetBus

func (m *EpochManager) SetBus(bus *Bus)

SetBus attaches an event bus for epoch lifecycle events.

func (*EpochManager) SwitchEpoch

func (m *EpochManager) SwitchEpoch(reason string, components EpochComponents) *PrefixEpoch

SwitchEpoch creates a new epoch, makes it current, and resets the frozen/pending state. The first turn of the new epoch will report ExpectedCacheMiss() = true.
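The lifecycle described by FreezeEpoch, RecordPendingChange, SwitchEpoch and ExpectedCacheMiss can be pictured with a small stand-in. This is a sketch only: epochState and its methods are hypothetical names, it is single-goroutine, and the choice to ignore changes recorded before the freeze is an assumption rather than documented behavior.

```go
package main

import "fmt"

// epochState is a hypothetical stand-in for the manager's bookkeeping;
// the real type's fields are unexported and not shown in this page.
type epochState struct {
	id         int
	frozen     bool
	pending    []string
	expectMiss bool
}

// freeze marks the current epoch immutable (after the first model request).
func (e *epochState) freeze() { e.frozen = true }

// record queues a change instead of applying it once the epoch is frozen.
// It reports whether the change was queued (assumption: unfrozen = ignored).
func (e *epochState) record(change string) bool {
	if !e.frozen {
		return false
	}
	e.pending = append(e.pending, change)
	return true
}

// switchEpoch starts a fresh epoch: unfrozen, no pending changes, and the
// next turn flagged as an expected cache miss.
func (e *epochState) switchEpoch() {
	e.id++
	e.frozen = false
	e.pending = nil
	e.expectMiss = true
}

// takeExpectedMiss returns the flag and clears it, so call it once per turn.
func (e *epochState) takeExpectedMiss() bool {
	v := e.expectMiss
	e.expectMiss = false
	return v
}

func main() {
	e := &epochState{id: 1}
	e.freeze()
	e.record("skill_added")
	fmt.Println(len(e.pending), e.takeExpectedMiss()) // 1 false
	e.switchEpoch()
	fmt.Println(len(e.pending), e.takeExpectedMiss(), e.takeExpectedMiss()) // 0 true false
}
```

The clear-on-read flag is why a second call within the same turn would misreport the turn as a cache hit.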

type Event

type Event interface {
	// contains filtered or unexported methods
}

Event is the sealed interface implemented by every event the agent emits. Type-switch on the concrete type at the consumer.
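As an illustration of the sealed-interface pattern, the sketch below uses an unexported marker method so that only types in the same package can satisfy the interface. The names event, textDelta, stepFinish and describe are local stand-ins, not this package's identifiers.

```go
package main

import "fmt"

// event is a hypothetical sealed interface: the unexported marker method
// means only types declared in this package can implement it.
type event interface{ isEvent() }

type textDelta struct{ Text string }
type stepFinish struct{ Model string }

func (textDelta) isEvent()  {}
func (stepFinish) isEvent() {}

// describe type-switches on the concrete event, as a consumer would.
func describe(ev event) string {
	switch e := ev.(type) {
	case textDelta:
		return "text: " + e.Text
	case stepFinish:
		return "step finished on " + e.Model
	default:
		return "unknown event"
	}
}

func main() {
	events := []event{textDelta{Text: "hi"}, stepFinish{Model: "loop-model"}}
	for _, ev := range events {
		fmt.Println(describe(ev))
	}
}
```

A default case keeps the consumer safe when a new event kind is added before the switch is updated.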

type EventBackgroundJobFinish

type EventBackgroundJobFinish struct {
	ID      string
	State   JobState
	Summary string
}

EventBackgroundJobFinish signals that a background job has completed.

type EventBackgroundJobStart

type EventBackgroundJobStart struct {
	ID   string
	Kind JobKind
}

EventBackgroundJobStart signals that a background job has started.

type EventBudget

type EventBudget struct {
	Kind         string
	ProjectedCNY float64
	SpentCNY     float64
	Model        string
}

EventBudget reports a session-budget gate decision (T1.3 — promoted from a stringly-typed EventInfo so "warned" vs "blocked" vs "unpriced" are distinguishable for analytics and programmatic gating). ProjectedCNY/SpentCNY are the gate's inputs (ProjectedCNY is 0 for the unpriced kind, where it cannot be computed); Model is the model being gated. Traced with type budget.warning / budget.blocked / budget.unpriced.

type EventCompaction

type EventCompaction struct {
	FromIdx, ToIdx int
	Summary        string
	RemovedCount   int
}

EventCompaction reports that the agent collapsed messages [FromIdx, ToIdx) into a single Summary message, freeing RemovedCount slots from the live transcript. Wired in Phase 2.

type EventCompactionWarning

type EventCompactionWarning struct {
	Pressure  float64
	Threshold float64
}

EventCompactionWarning is emitted when context pressure crosses the warning threshold (default 75%). The UI can show a status indicator or prepare pinned facts for an upcoming compaction.

type EventDone

type EventDone struct {
	Reason StopReason
	Err    error
}

EventDone is the agent's "this Run is finished" signal. Emitted via defer from Run, so it travels the same channel as every other event and arrives in strict order AFTER the final EventStepFinish. This is load-bearing: routing the "done" signal through a separate goroutine + tea.Msg path used to race past trailing text deltas, leaving the UI's chrome stuck on "writing…" because a late delta would re-fire BeginWriting after the reset.

type EventDriftBlocked

type EventDriftBlocked struct {
	EpochID string
	Which   string
}

EventDriftBlocked signals that an unauthorized prefix drift was detected and blocked within a frozen epoch.

type EventEnvelope

type EventEnvelope struct {
	Version int       // = EventProtocolVersion
	Seq     uint64    // monotonic, assigned by Bus
	At      time.Time // publish moment
	Event   Event     // the concrete event
}

EventEnvelope wraps an Event with versioning, sequence number, and timestamp for multi-consumer fan-out on the Bus. It does NOT implement Event itself — it is a container, not an event.

type EventEpochCreated

type EventEpochCreated struct {
	EpochID          string
	StaticPrefixHash string
	ToolsHash        string
	Reason           string
}

EventEpochCreated signals that a new PrefixEpoch was created.

type EventEpochFrozen

type EventEpochFrozen struct {
	EpochID string
}

EventEpochFrozen signals that the current PrefixEpoch was frozen after the first model request.

type EventEpochSwitched

type EventEpochSwitched struct {
	OldEpochID       string
	NewEpochID       string
	StaticPrefixHash string
	ToolsHash        string
	Reason           string
}

EventEpochSwitched signals an explicit epoch switch.

type EventEscalated

type EventEscalated struct {
	Trigger   string
	FromModel string
	ToModel   string
	Reason    string
}

EventEscalated reports that the current turn was re-issued on a stronger model (the Two-Model escalation). Trigger is "marker" (the model emitted a <<<NEEDS_PRO>>> self-declaration) or "repair_errors" (the per-turn repair failure count crossed the threshold). FromModel/ToModel record the switch. Traced with type policy.escalated.

type EventHookFired

type EventHookFired struct {
	HookName string
	Event    string // PreToolUse / PostToolUse / ...
	Decision string // allow / deny / continue / ask
	Reason   string
	Dur      time.Duration
}

EventHookFired reports that a registered hook ran. Decision is one of allow / deny / continue / ask; Reason is the hook's free-form explanation. Surfaced so the UI can show a `[hook] …` chat line. Wired in Phase 3.

type EventInfo

type EventInfo struct{ Text string }

EventInfo is an out-of-band notice (retry attempt, validator skipped, tool-call rate warning). Surfaced as a chat line.

type EventPendingChange

type EventPendingChange struct {
	EpochID     string
	Kind        PendingChangeKind
	Description string
}

EventPendingChange signals that a component change was detected after the epoch was frozen. The change is recorded but not applied.

type EventPermissionAsk

type EventPermissionAsk struct {
	Check permissions.Check
	Reply chan<- PermissionResponse
}

EventPermissionAsk requests user approval for a tool call. The consumer MUST send a PermissionResponse on Reply; the agent goroutine blocks on the receive until it arrives. Reply is buffered (cap 1), so the consumer's single send completes without blocking.
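The ask/reply handshake can be sketched as follows. The types here are reduced stand-ins (the real event carries a permissions.Check rather than a Tool string), and askPermission is a hypothetical helper, not an API of this package.

```go
package main

import "fmt"

// permissionResponse and permissionAsk are local stand-ins that mirror the
// documented shapes in reduced form.
type permissionResponse struct {
	Allow          bool
	PersistPattern bool
}

type permissionAsk struct {
	Tool  string
	Reply chan<- permissionResponse
}

// askPermission publishes an ask and blocks until the consumer answers.
func askPermission(events chan<- permissionAsk, tool string) permissionResponse {
	reply := make(chan permissionResponse, 1) // cap 1: the consumer's send cannot block
	events <- permissionAsk{Tool: tool, Reply: reply}
	return <-reply
}

func main() {
	events := make(chan permissionAsk)
	go func() {
		ask := <-events
		// A real UI would prompt the user; this consumer allows only "read".
		ask.Reply <- permissionResponse{Allow: ask.Tool == "read"}
	}()
	fmt.Println(askPermission(events, "read").Allow)
}
```

Because the agent side holds the receive end and the event exposes only a send-only channel, the consumer cannot accidentally read its own reply.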

type EventPermissionDenied

type EventPermissionDenied struct {
	Tool   string
	Reason string
	ByRule bool
}

EventPermissionDenied reports a tool call refused by the permission layer (T1.3 — promoted from EventInfo). ByRule distinguishes an explicit deny-rule match from a policy-tier denial; Reason is the human-readable cause. Traced with type permission.denied.

type EventQuestionAsk

type EventQuestionAsk struct {
	Questions []tools.Question
	Reply     chan<- tools.QuestionResponse
}

EventQuestionAsk requests that the user answer one or more questions. The consumer MUST send a QuestionResponse on Reply; the agent goroutine blocks on the receive until it arrives. Reply is buffered (cap 1), so the consumer's single send completes without blocking.

type EventReasoningDelta

type EventReasoningDelta struct{ Text string }

EventReasoningDelta appends to the active reasoning block.

type EventReasoningEnd

type EventReasoningEnd struct{}

EventReasoningEnd closes the active reasoning block.

type EventReasoningStart

type EventReasoningStart struct{}

EventReasoningStart opens a new reasoning block.

type EventRepair

type EventRepair struct {
	Kind       string
	Tool       string
	CallID     string
	Message    string
	BeforeHash string
	AfterHash  string
}

EventRepair reports a tool-call repair action (args completed, recovered, suppressed, or schema-complex). Published by the repair integration layer in runStep after model streaming finishes.

type EventSemanticCompaction

type EventSemanticCompaction struct {
	FromIdx, ToIdx         int
	UsedSemantic           bool
	SummaryCost            float64
	FallbackReason         string
	StaticPrefixHashBefore string
	StaticPrefixHashAfter  string
}

EventSemanticCompaction reports that semantic compaction ran. UsedSemantic is true when the LLM summary was used; false when deterministic fallback was used. SummaryCost is the cost of the LLM call (0 when fallback). FallbackReason is set when the LLM call failed and deterministic compaction was used instead.

StaticPrefixHashBefore/After are the measured static-prefix fingerprints (system + tools) of the frozen baseline and of the request compaction actually fed the model. They are emitted into the trace so the benchmark can verify compaction did not move the prefix — instead of the agent asserting stability with a hardcoded boolean. When the freeze override is intact they are equal; a regression that summarized against the live (non-frozen) prompt makes them diverge and fails the gate.

type EventStepFinish

type EventStepFinish struct {
	Reason StopReason
	Usage  llm.Usage
	Model  string
}

EventStepFinish ends one ReAct step. The consumer updates its status counters / cost HUD here. Model is the model that produced the step (an escalated turn reports the stronger model); empty means the loop model.

type EventSubagentFinish

type EventSubagentFinish struct {
	Agent  string
	Result SubResult
}

EventSubagentFinish signals that a sub-agent run completed (successfully or not).

type EventSubagentStart

type EventSubagentStart struct {
	Agent       string
	Description string
}

EventSubagentStart signals that a sub-agent spawn has begun.

type EventTextDelta

type EventTextDelta struct{ Text string }

EventTextDelta appends to the active assistant-text block.

type EventToolCallResult

type EventToolCallResult struct {
	CallID string
	Result tools.Result
	Dur    time.Duration
}

EventToolCallResult carries the result of an executed tool call.

type EventToolCallStart

type EventToolCallStart struct{ Call llm.ToolCall }

EventToolCallStart announces a tool call (model decided to run a tool; permission gates haven't fired yet).

type Job

type Job struct {
	ID          string
	Kind        JobKind
	Description string
	StartedAt   time.Time
	FinishedAt  time.Time
	State       JobState
	Summary     string
	// contains filtered or unexported fields
}

Job represents a running or completed background job.

func (*Job) AppendOutput

func (j *Job) AppendOutput(p []byte)

AppendOutput appends data to the job's ring buffer, dropping old data if the buffer would exceed maxBytes.

func (*Job) Tail

func (j *Job) Tail(maxLines int) (output string, droppedBytes int64, truncatedLines bool)

Tail returns the last maxLines lines of output, the number of bytes dropped from the ring buffer, and whether the returned output was truncated.
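A minimal sketch of a size-capped output buffer with a line tail, assuming a plain byte slice that discards its oldest bytes. The ring type, its fields and the exact truncation semantics are hypothetical; the package's actual buffer is unexported.

```go
package main

import (
	"fmt"
	"strings"
)

// ring is a hypothetical byte buffer capped at maxBytes.
type ring struct {
	buf      []byte
	maxBytes int
	dropped  int64
}

// write adds p and discards the oldest bytes beyond maxBytes.
func (r *ring) write(p []byte) {
	r.buf = append(r.buf, p...)
	if over := len(r.buf) - r.maxBytes; over > 0 {
		r.dropped += int64(over)
		r.buf = append([]byte(nil), r.buf[over:]...)
	}
}

// tail returns the last maxLines lines, the dropped byte count, and
// whether earlier buffered lines were left out.
func (r *ring) tail(maxLines int) (string, int64, bool) {
	if maxLines < 0 {
		maxLines = 0
	}
	lines := strings.Split(strings.TrimRight(string(r.buf), "\n"), "\n")
	truncated := len(lines) > maxLines
	if truncated {
		lines = lines[len(lines)-maxLines:]
	}
	return strings.Join(lines, "\n"), r.dropped, truncated
}

func main() {
	r := &ring{maxBytes: 8}
	r.write([]byte("a\nb\nc\nd\ne\n")) // 10 bytes in, oldest 2 dropped
	out, dropped, truncated := r.tail(2)
	fmt.Printf("%q %d %v\n", out, dropped, truncated)
}
```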

type JobKind

type JobKind int

JobKind distinguishes between different background job types.

const (
	JobSubagent JobKind = iota
	JobBackgroundBash
)

func (JobKind) String

func (k JobKind) String() string

String returns the human-readable name of the job kind.

type JobRegistry

type JobRegistry struct {
	// contains filtered or unexported fields
}

JobRegistry manages background jobs for an agent.

func NewJobRegistry

func NewJobRegistry() *JobRegistry

NewJobRegistry creates a new, empty job registry.

func (*JobRegistry) Cancel

func (r *JobRegistry) Cancel(id string) bool

Cancel cancels the job with the given ID. Returns true if the job was found and canceled, false otherwise. Calling Cancel on an already-canceled or finished job returns false.

func (*JobRegistry) Close

func (r *JobRegistry) Close()

Close cancels all running jobs and waits up to 2 seconds for them to finish. Any jobs still running after the grace period are marked as JobCanceled.

func (*JobRegistry) Finish

func (r *JobRegistry) Finish(id string, state JobState, summary string)

Finish marks a job as completed with the given state and summary. It is safe to call multiple times; only the first call takes effect.
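The "first call wins" rule can be sketched with a state check under a mutex. The job type and finish method below are hypothetical stand-ins; the boolean return is added only so the example can show which call took effect.

```go
package main

import (
	"fmt"
	"sync"
)

type jobState int

const (
	jobRunning jobState = iota
	jobSucceeded
	jobFailed
	jobCanceled
)

// job is a hypothetical stand-in holding only what the sketch needs.
type job struct {
	mu      sync.Mutex
	state   jobState
	summary string
}

// finish records the terminal state once; later calls are ignored.
func (j *job) finish(state jobState, summary string) bool {
	j.mu.Lock()
	defer j.mu.Unlock()
	if j.state != jobRunning {
		return false // already finished
	}
	j.state, j.summary = state, summary
	return true
}

func main() {
	j := &job{}
	first := j.finish(jobSucceeded, "ok")
	second := j.finish(jobFailed, "late")
	fmt.Println(first, second, j.summary)
}
```

This matters when a job's own goroutine and a registry-wide shutdown both try to set the final state.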

func (*JobRegistry) Get

func (r *JobRegistry) Get(id string) (*Job, bool)

Get returns the job with the given ID and true, or nil and false if no such job exists.

func (*JobRegistry) HasActive

func (r *JobRegistry) HasActive() bool

HasActive reports whether any job is currently in the JobRunning state. /reload-skills uses it (via Agent.HasActiveBackgroundWork) to refuse a skill reload while an async subagent or background_bash job is still live: those goroutines outlive the parent loop and can read the shared skill store that the reload mutates in place, so a.running alone does not cover them.

func (*JobRegistry) JobStatus

func (r *JobRegistry) JobStatus(id string, tailLines int) (Status, error)

JobStatus returns status information for the job with the given ID, in a form suitable for returning as a tools.Status.

func (*JobRegistry) List

func (r *JobRegistry) List() []*Job

List returns all jobs in the registry.

func (*JobRegistry) Start

func (r *JobRegistry) Start(parent context.Context, kind JobKind, description string) (*Job, context.Context)

Start creates a new job with State=JobRunning and returns it along with a derived context. The caller holds the context and runs the actual work; when done, call Finish to lock in the final state.

type JobState

type JobState int

JobState represents the current state of a background job.

const (
	JobRunning JobState = iota
	JobSucceeded
	JobFailed
	JobCanceled
)

func (JobState) String

func (s JobState) String() string

String returns the human-readable name of the job state.

type LoopSpawner

type LoopSpawner struct {
	Client   *llm.Client
	Parent   *Agent
	Defs     map[string]agents.AgentDef
	MaxDepth int // 0 → default 2

	// WT is the worktree manager for git-worktree isolation.
	// nil = worktree path disabled; def.Worktree==true degrades to normal spawn.
	WT *worktree.Manager

	// Locks provides branch-level mutual exclusion for worktree operations.
	Locks worktree.BranchLocker
	// contains filtered or unexported fields
}

LoopSpawner implements tools.Spawner by running a child Agent loop in the same process. It derives a child Registry and Policy from the parent Agent, respecting agent-def tool whitelists and depth limits.

func (*LoopSpawner) Spawn

type MessageTruncator

type MessageTruncator interface {
	TruncateMessages(ctx context.Context, keepCount int) (int, error)
	// PersistedMessageCount returns the count of persisted body messages.
	// ReconcileUndo compares it to len(a.Messages) to confirm the in-memory
	// transcript is index-aligned with disk before truncating disk — alignment
	// breaks after a resume, a branch, or a dangling-tool-call repair insert,
	// where an in-memory boundary would delete the wrong persisted rows.
	PersistedMessageCount(ctx context.Context) (int, error)
}

MessageTruncator is an optional Persister capability: drop persisted body messages with index >= keepCount so disk matches the in-memory transcript after an /undo (T3.5). internal/session.Persister implements it; it is checked via type assertion so non-persisting agents and test fakes need not implement it (mirrors ReceiptAppender).

type PendingChange

type PendingChange struct {
	Kind        PendingChangeKind
	Description string
	DetectedAt  time.Time
}

PendingChange is a detected mutation that is blocked from model-visibility until an explicit epoch switch.

func CapabilityDiff

func CapabilityDiff(oldCS, newCS CapabilitySet) []PendingChange

CapabilityDiff returns the pending changes between two capability sets using canonical comparisons (skills.Store.Diff and mcp.CompareToolLists), so cache-irrelevant noise — a reordered MCP tool list, or reordered JSON-Schema keys within a tool — never registers as a change.
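To illustrate what a canonical comparison buys, the sketch below diffs two tool lists while ignoring list order and JSON key order. It is not skills.Store.Diff or mcp.CompareToolLists; the tool type, diffTools and the change strings are hypothetical, and canonicalization relies on encoding/json writing map keys in sorted order.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// tool is a hypothetical reduced tool description.
type tool struct {
	Name   string
	Schema string // raw JSON Schema
}

// canonical re-encodes JSON so that key order no longer matters.
func canonical(raw string) ([]byte, error) {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

// diffTools reports added, removed, and schema-changed tool names.
func diffTools(oldTools, newTools []tool) ([]string, error) {
	var changes []string
	oldByName := map[string]string{}
	for _, o := range oldTools {
		oldByName[o.Name] = o.Schema
	}
	for _, n := range newTools {
		prev, ok := oldByName[n.Name]
		delete(oldByName, n.Name)
		if !ok {
			changes = append(changes, "tool_added:"+n.Name)
			continue
		}
		a, err := canonical(prev)
		if err != nil {
			return nil, err
		}
		b, err := canonical(n.Schema)
		if err != nil {
			return nil, err
		}
		if !bytes.Equal(a, b) {
			changes = append(changes, "tool_schema_changed:"+n.Name)
		}
	}
	for name := range oldByName {
		changes = append(changes, "tool_removed:"+name)
	}
	sort.Strings(changes) // deterministic output
	return changes, nil
}

func main() {
	oldTools := []tool{{"grep", `{"type":"object","required":["q"]}`}, {"ls", `{}`}}
	newTools := []tool{{"ls", `{}`}, {"grep", `{"required":["q"],"type":"object"}`}}
	changes, err := diffTools(oldTools, newTools)
	fmt.Println(len(changes), err)
}
```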

type PendingChangeKind

type PendingChangeKind string

PendingChangeKind identifies the type of change detected after an epoch was frozen.

const (
	PendingSystemChanged        PendingChangeKind = "system_changed"
	PendingToolAdded            PendingChangeKind = "tool_added"
	PendingToolRemoved          PendingChangeKind = "tool_removed"
	PendingToolSchemaChanged    PendingChangeKind = "tool_schema_changed"
	PendingSkillAdded           PendingChangeKind = "skill_added"
	PendingSkillRemoved         PendingChangeKind = "skill_removed"
	PendingSkillBodyChanged     PendingChangeKind = "skill_body_changed"
	PendingMCPToolAdded         PendingChangeKind = "mcp_tool_added"
	PendingMCPToolRemoved       PendingChangeKind = "mcp_tool_removed"
	PendingMCPToolSchemaChanged PendingChangeKind = "mcp_tool_schema_changed"
	PendingAgentProfileChanged  PendingChangeKind = "agent_profile_changed"
	PendingFewShotsChanged      PendingChangeKind = "few_shots_changed"
)

type PermissionResponse

type PermissionResponse struct {
	Allow          bool
	PersistPattern bool // when true (bash + "always"), persist to allowlist
}

PermissionResponse is what the consumer sends on EventPermissionAsk.Reply to answer a permission request.

type Persister

type Persister interface {
	// SessionID is the session this agent is associated with.
	SessionID() string

	// AppendUserMessage records a user turn as a typed block slice
	// (typically one TextBlock; multimodal extensions later).
	AppendUserMessage(ctx context.Context, blocks []llm.ContentBlock) (int, error)

	// AppendAssistant records an assistant turn as a typed block slice
	// (Thinking, Text, ToolUse — order matches model emission).
	AppendAssistant(ctx context.Context, blocks []llm.ContentBlock, model string, usage llm.Usage) (int, error)

	// AppendToolResult records the result of one tool_use, identified
	// by toolUseID. isError true marks an infrastructure failure so
	// renderers can color it.
	AppendToolResult(ctx context.Context, toolUseID string, content string, isError bool) (int, error)

	// TakeSnapshot snapshots the given paths before a mutating tool runs.
	// stepIdx is the message index of the assistant turn that contained
	// the tool call; the snapshot manager uses it to namespace files on
	// disk so /undo can revert a specific step.
	TakeSnapshot(stepIdx int, paths []string) (int, error)

	// SetActiveModel persists the result of a /models switch.
	SetActiveModel(ctx context.Context, model string) error

	// ReplaceWithCompaction atomically deletes messages in [fromIdx,
	// toIdx) and inserts a synthetic summary message at fromIdx,
	// renumbering subsequent messages to keep idx contiguous. Returns
	// the idx of the inserted summary. The full transactional
	// implementation lands in Phase 2 (T-208); for now this method
	// exists so the rest of the system can compile against the final
	// interface shape.
	ReplaceWithCompaction(ctx context.Context, fromIdx, toIdx int, summary string) (int, error)
}

Persister abstracts the session/snapshot bookkeeping the agent needs. internal/session.Persister and internal/snapshots.Manager satisfy this. Keeping the interface here lets the agent stay decoupled.
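The index arithmetic behind ReplaceWithCompaction can be sketched on an in-memory slice. This is an illustration of the half-open [fromIdx, toIdx) contract only; replaceWithSummary is a hypothetical helper, messages are reduced to strings, and nothing here is transactional.

```go
package main

import (
	"errors"
	"fmt"
)

// replaceWithSummary removes msgs[from:to] and puts summary at index from.
// It returns the new slice and the summary's index.
func replaceWithSummary(msgs []string, from, to int, summary string) ([]string, int, error) {
	if from < 0 || to > len(msgs) || from >= to {
		return nil, 0, errors.New("invalid range")
	}
	out := make([]string, 0, len(msgs)-(to-from)+1)
	out = append(out, msgs[:from]...)
	out = append(out, summary)
	out = append(out, msgs[to:]...) // later messages shift down, keeping indices contiguous
	return out, from, nil
}

func main() {
	msgs := []string{"u0", "a1", "t2", "u3", "a4"}
	out, idx, err := replaceWithSummary(msgs, 1, 3, "[summary]")
	fmt.Println(out, idx, err)
}
```

Here two messages collapse into one, so every message after the summary moves down by one index.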

type PrefixEpoch

type PrefixEpoch struct {
	EpochID         string
	AgentProfileID  string
	Model           string
	ReasoningEffort string
	StaticSystem    string
	FewShots        []llm.Message
	ToolSpecs       []llm.Tool
	// Capability is the latent identity (profile/skills/MCP) frozen with this
	// epoch. It drives pending-change detection but is NOT in StaticPrefixHash.
	Capability      CapabilitySet
	CreatedAt       time.Time
	CreatedReason   string
	ComponentHashes map[string]string
	// StaticPrefixHash is the Prefix Fingerprint: the canonical hash of the
	// model-visible bytes (system + tools) — the DeepSeek cache key. Latent
	// capability state is intentionally excluded (see docs/adr/0001).
	StaticPrefixHash string

	// FrozenTools and FrozenSystem capture the tool list and system
	// prompt at the moment FreezeEpoch is called. When the epoch is
	// frozen, runStep and maybeCompact MUST use these instead of the
	// live values to guarantee cache-stable prefixes.
	FrozenTools  []llm.Tool
	FrozenSystem string
}

PrefixEpoch is a frozen model-visible prefix snapshot. Once frozen (after first model request), it cannot change. Changes to tools, skills, MCP, system prompt, etc. become pending changes that are visible in receipts but not model-visible until an explicit epoch switch.

type ReceiptAppender

type ReceiptAppender interface {
	AppendReceipt(ctx context.Context, kind session.ReceiptKind, payload json.RawMessage) (int64, error)
}

ReceiptAppender is an optional interface that Persister implementations can satisfy to support transcript receipt persistence. The agent checks for this interface via type assertion and uses it when available.

type ReloadResult

type ReloadResult struct {
	// Changes is the skill-level diff (added / removed / body_changed) between
	// the previously loaded skills and the freshly re-scanned set.
	Changes []skills.SkillChange

	// FingerprintMoved reports whether the rebuilt model-visible system prompt
	// differs from the pre-reload one. True means the bytes DeepSeek caches
	// actually changed, so a new epoch was minted and the next turn will miss
	// cache exactly once. False means nothing the model can see changed and no
	// epoch was switched — so the reload costs no cache miss.
	FingerprintMoved bool

	// OldEpochID / NewEpochID are populated only when an epoch switch occurred
	// (FingerprintMoved && an epoch already existed).
	OldEpochID string
	NewEpochID string
}

ReloadResult summarizes an Agent.ReloadSkills (the /reload-skills command).

type SemanticCompactionConfig

type SemanticCompactionConfig struct {
	// WarnThreshold is the context ratio (0-1) at which to emit
	// a warning and prepare pinned facts. Default: 0.75
	WarnThreshold float64

	// CompactThreshold is the context ratio at which to attempt
	// semantic compaction. Default: 0.80
	CompactThreshold float64

	// ProtectionThreshold is the context ratio at which to enter
	// protection mode (preserve task continuity over full history).
	// Default: 0.90
	ProtectionThreshold float64

	// SummaryModel is the model to use for semantic summaries.
	// Default: "deepseek-v4-flash"
	SummaryModel string

	// SummaryTimeout is the timeout for the summary request.
	// Default: 15 seconds
	SummaryTimeout time.Duration

	// MaxSummaryTokens caps the semantic summary length.
	// Default: 2000
	MaxSummaryTokens int
}

SemanticCompactionConfig controls semantic compaction behavior.

type SemanticCompactionResult

type SemanticCompactionResult struct {
	Summary        string
	FromIdx, ToIdx int
	RemovedCount   int
	SummaryMessage llm.Message
	KeptMessages   []llm.Message
	UsedSemantic   bool    // true if LLM summary was used
	SummaryCost    float64 // cost of the LLM call, if any
	FallbackReason string  // why deterministic fallback was used
}

SemanticCompactionResult is what SemanticCompact produces.

func CompactWithSemantic

func CompactWithSemantic(
	ctx context.Context,
	messages []llm.Message,
	client *llm.Client,
	systemPrompt string,
	tools []llm.Tool,
	compCfg CompactionConfig,
	semanticCfg SemanticCompactionConfig,
	maxContextTokens int,
) SemanticCompactionResult

CompactWithSemantic checks context pressure and decides between no compaction, a warning, semantic compaction (LLM), or deterministic fallback. It returns a SemanticCompactionResult with Summary == "" when no compaction was performed.

The action decision:

  • "none": below all thresholds → no compaction
  • "warn": above warn threshold → warning only
  • "compact": above compact threshold → semantic compaction
  • "protect": above protection threshold → semantic compaction (same as compact for now)

When semantic compaction fails, it falls back to the deterministic CompactSession. The caller should check UsedSemantic and FallbackReason to report the outcome.
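The threshold ladder above can be sketched as a small decision function. The thresholds type and decide are hypothetical; defining pressure as used tokens divided by maxContextTokens, and treating a value equal to a threshold as crossing it, are both assumptions. The ratios are the documented defaults from SemanticCompactionConfig.

```go
package main

import "fmt"

// thresholds mirrors the three documented ratios; the struct is a local stand-in.
type thresholds struct{ Warn, Compact, Protect float64 }

// decide maps context pressure (used tokens / max tokens) to an action name.
func decide(usedTokens, maxTokens int, th thresholds) string {
	if maxTokens <= 0 {
		return "none"
	}
	pressure := float64(usedTokens) / float64(maxTokens)
	switch {
	case pressure >= th.Protect:
		return "protect"
	case pressure >= th.Compact:
		return "compact"
	case pressure >= th.Warn:
		return "warn"
	default:
		return "none"
	}
}

func main() {
	th := thresholds{Warn: 0.75, Compact: 0.80, Protect: 0.90} // documented defaults
	for _, used := range []int{50_000, 78_000, 85_000, 95_000} {
		fmt.Println(used, decide(used, 100_000, th))
	}
}
```

Checking the highest threshold first keeps the cases mutually exclusive without extra upper bounds.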

func SemanticCompact

func SemanticCompact(
	ctx context.Context,
	messages []llm.Message,
	client *llm.Client,
	systemPrompt string,
	tools []llm.Tool,
	cfg SemanticCompactionConfig,
) SemanticCompactionResult

SemanticCompact attempts LLM-powered compaction, falling back to deterministic on failure. The LLM call:

  • Uses deepseek-v4-flash
  • Disables thinking
  • Has a 15 second timeout
  • Reuses the same static system prefix
  • Preserves pinned skills and constraints
  • Preserves current objective
  • Preserves negative constraints
  • Preserves changed file paths
  • Preserves recent tool evidence
  • Records its own usage and cost

type Spawner

type Spawner interface {
	Spawn(ctx context.Context, task SubTask) (SubResult, error)
}

Spawner is the v0.2 interface for subagent dispatch. v0.1 leaves it unimplemented; reserving the type makes the v0.2 addition additive rather than a refactor.

type Status

type Status struct {
	ID           string
	Kind         string
	State        string
	StartedAt    time.Time
	FinishedAt   time.Time
	Summary      string
	Tail         string
	DroppedBytes int64
	TotalLines   int
	Truncated    bool
}

Status is the public status struct returned to tools.

type StepRecord

type StepRecord struct {
	FinishReason      string
	Usage             llm.Usage
	ToolCalls         []llm.ToolCall
	EpochID           string
	StaticPrefixHash  string
	ExpectedCacheMiss bool
	// Model is the model that actually produced this step. It usually equals
	// the loop model, but an escalated turn (T2.3) records the stronger model
	// so cost/trace attribution follows the turn, not the static loop model.
	Model string
	// MessageCount is len(a.Messages) captured BEFORE this step's model turn —
	// the transcript boundary this step started from. /undo (T3.5) truncates
	// a.Messages back to the boundary of the first undone step so the model's
	// view matches the reverted files. Boundaries recorded before a compaction
	// are stale (compaction renumbers messages), so undo refuses to cross one.
	MessageCount int
	// Snapshotted is true when this step took a file snapshot (i.e. it ran a
	// mutating tool). /undo counts SNAPSHOTS, not steps — snapshots are sparse
	// (read-only steps take none) — so ReconcileUndo walks snapshotted steps to
	// find the same boundary the snapshot manager reverts files to (T3.5).
	Snapshotted bool
}

StepRecord captures one step's outcome so stop conditions can reason across history.

type StopCondition

type StopCondition func(steps []StepRecord) (stop bool, reason StopReason)

StopCondition examines recent history and returns (true, reason) when the loop should terminate. The agent calls all conditions after each step and stops on the first that fires.

func LoopDetection

func LoopDetection(window, maxRepeats int) StopCondition

LoopDetection breaks the loop when the same tool call (name + arg hash) appears `maxRepeats` times within the last `window` steps. Crush calls this in internal/agent/loop_detection.go; we use the same shape. Default v0.1: window=5, maxRepeats=3.

func MaxSteps

func MaxSteps(n int) StopCondition

MaxSteps caps total agent steps in a single Run. Default 50 in v0.1.
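A sketch of how stop conditions of this shape compose. The types are reduced stand-ins: stepRecord carries precomputed call keys (assumed to be tool name plus an argument hash), reasons are plain strings instead of StopReason, and firstStop is a hypothetical helper for "stop on the first condition that fires".

```go
package main

import "fmt"

// stepRecord keeps only the field this sketch needs: one key per tool call.
type stepRecord struct{ CallKeys []string }

type stopCondition func(steps []stepRecord) (stop bool, reason string)

// loopDetection fires when one call key occurs maxRepeats times
// within the last window steps.
func loopDetection(window, maxRepeats int) stopCondition {
	return func(steps []stepRecord) (bool, string) {
		if window > 0 && len(steps) > window {
			steps = steps[len(steps)-window:]
		}
		counts := map[string]int{}
		for _, s := range steps {
			for _, k := range s.CallKeys {
				counts[k]++
				if counts[k] >= maxRepeats {
					return true, "loop_detected"
				}
			}
		}
		return false, ""
	}
}

// maxSteps fires once the run has taken n steps.
func maxSteps(n int) stopCondition {
	return func(steps []stepRecord) (bool, string) {
		if len(steps) >= n {
			return true, "max_steps"
		}
		return false, ""
	}
}

// firstStop evaluates conditions in order and returns the first that fires.
func firstStop(steps []stepRecord, conds ...stopCondition) (bool, string) {
	for _, c := range conds {
		if stop, reason := c(steps); stop {
			return true, reason
		}
	}
	return false, ""
}

func main() {
	steps := []stepRecord{
		{CallKeys: []string{"grep:ab12"}}, {CallKeys: []string{"ls:00ff"}},
		{CallKeys: []string{"grep:ab12"}}, {CallKeys: []string{"grep:ab12"}},
	}
	fmt.Println(firstStop(steps, maxSteps(50), loopDetection(5, 3)))
}
```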

type StopReason

type StopReason int

StopReason describes why the loop terminated.

const (
	StopUnknown       StopReason = iota
	StopModelDone                // finish_reason!=tool_calls and no tool calls
	StopMaxSteps                 // step cap exceeded
	StopLoopDetected             // same tool call repeated too many times
	StopContextCancel            // ctx.Err()
	StopUserRequested            // explicit cancellation from TUI
	StopStepTimeout              // per-step deadline exceeded (non-success)
)

func (StopReason) IsSuccess

func (r StopReason) IsSuccess() bool

IsSuccess reports whether a stop reason represents a clean, complete run (the model finished on its own). Every other reason — cancellation, a step timeout, a loop or step-cap halt, or an unknown/error exit — is a non-success termination and must not be rendered or recorded as "done".
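A minimal sketch of the rule, assuming that "the model finished on its own" corresponds to the model-done reason alone. The lowercase names and statusLabel are hypothetical stand-ins for illustration.

```go
package main

import "fmt"

type stopReason int

const (
	stopUnknown stopReason = iota
	stopModelDone
	stopMaxSteps
	stopLoopDetected
	stopContextCancel
	stopUserRequested
	stopStepTimeout
)

// isSuccess is true only when the model finished on its own.
func (r stopReason) isSuccess() bool { return r == stopModelDone }

// statusLabel picks what a UI might render for a finished run.
func statusLabel(r stopReason) string {
	if r.isSuccess() {
		return "done"
	}
	return "stopped early"
}

func main() {
	fmt.Println(statusLabel(stopModelDone), "/", statusLabel(stopStepTimeout))
}
```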

func (StopReason) String

func (r StopReason) String() string

type SubResult

type SubResult struct {
	Summary    string
	StepCount  int
	TokenCount int
}

SubResult is the summary a subagent returns.

type SubTask

type SubTask struct {
	Description string
	Tools       []string // names; subset of parent registry
}

SubTask is what a parent agent hands to a subagent.

type Subscription

type Subscription struct {
	C <-chan EventEnvelope
	// contains filtered or unexported fields
}

Subscription is one consumer's view of the Bus. C delivers events in publish order. Callers that stop reading should Unsubscribe to avoid leaking the goroutine that would otherwise block on the full channel.

func (*Subscription) Dropped

func (s *Subscription) Dropped() uint64

Dropped returns the number of events dropped for this subscription because its buffer was full. Only non-reply events are dropped; reply-carrying events block until the consumer reads them.
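The drop-on-full behavior for non-reply events can be sketched with a non-blocking send and an atomic counter. The subscription type and publish method here are hypothetical stand-ins, and events are reduced to ints; atomic.Uint64 requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subscription is a hypothetical stand-in with a bounded buffer.
type subscription struct {
	c       chan int
	dropped atomic.Uint64
}

// publish delivers v if the buffer has room and counts a drop otherwise.
func (s *subscription) publish(v int) {
	select {
	case s.c <- v:
	default:
		s.dropped.Add(1)
	}
}

func main() {
	s := &subscription{c: make(chan int, 2)}
	for i := 0; i < 5; i++ {
		s.publish(i)
	}
	fmt.Println(len(s.c), s.dropped.Load())
}
```

A reply-carrying event would use a plain blocking send instead, since dropping it would leave the agent waiting forever.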

type TraceSink

type TraceSink struct {
	// contains filtered or unexported fields
}

TraceSink converts an agent's event stream into JSONL trace records. It subscribes to a bus and writes one record per epoch lifecycle event, per turn (prefix snapshot + usage), per compaction, and per blocked drift. The trace is the source of truth for the benchmark's cache-reliability gate.

Construct the root via Agent.AttachTraceSink (wires the subscription, a drain goroutine, and a handle the caller waits on after Run). Subagent child sinks are derived via newChildTraceSink and share the root's writer.

func NewTraceSink

func NewTraceSink(w io.Writer, model string) *TraceSink

NewTraceSink builds a root sink writing JSONL to w. model is used to price usage records (cost_cny). Every record is stamped with a per-run run_id and agent_role="root" so the benchmark can distinguish the root epoch from any subagent epochs when judging parent/subagent cache pollution.

func (*TraceSink) Handle

func (s *TraceSink) Handle(ev Event)

Handle processes a single agent event, emitting trace records. Exported so it can be unit-tested without a live bus.

type TraceSinkHandle

type TraceSinkHandle struct {
	// contains filtered or unexported fields
}

TraceSinkHandle lets the caller wait for the sink to finish processing after Run returns (EventDone is the terminator) and unsubscribe.

func (*TraceSinkHandle) Close

func (h *TraceSinkHandle) Close()

Close unsubscribes the sink from the bus. Safe to call after Wait.

func (*TraceSinkHandle) Wait

func (h *TraceSinkHandle) Wait()

Wait blocks until the agent's terminating EventDone has been processed.

func (*TraceSinkHandle) WaitTimeout

func (h *TraceSinkHandle) WaitTimeout(d time.Duration) bool

WaitTimeout blocks until EventDone is processed or d elapses, whichever is first. It returns true when EventDone was processed (the agent finished cleanly) and false on timeout — a timed-out child trace is partial, which the caller must surface rather than silently close.
