analyzer

package
v0.7.0
Published: Mar 3, 2026 License: MIT Imports: 9 Imported by: 0

Documentation

Constants

const (
	CharsPerToken       = 4   // ~4 characters per token for English text
	Base64BytesPerToken = 750 // ~750 bytes of base64 per image token
)

Token estimation constants.
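As a rough illustration of how these two ratios could be applied, the sketch below re-declares the constants locally and divides a character count and a base64 length by them. The helper names estimateTextTokens and estimateBase64Tokens are hypothetical, and truncating integer division is an assumption; the package's own EstimateTokens and EstimateImageTokens may round or weight content differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Values copied from the documented constants.
const (
	CharsPerToken       = 4
	Base64BytesPerToken = 750
)

// estimateTextTokens (hypothetical) divides the character count by
// CharsPerToken, truncating toward zero.
func estimateTextTokens(text string) int {
	return utf8.RuneCountInString(text) / CharsPerToken
}

// estimateBase64Tokens (hypothetical) applies the image ratio to a
// base64 payload length given in bytes.
func estimateBase64Tokens(b64Len int) int {
	return b64Len / Base64BytesPerToken
}

func main() {
	fmt.Println(estimateTextTokens("hello, world")) // 12 characters -> 3
	fmt.Println(estimateBase64Tokens(1500))         // 1500 bytes -> 2
}
```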

const CompactionDropThreshold = 50000

CompactionDropThreshold is the minimum token decrease between consecutive assistant messages that indicates a compaction event occurred.
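The rule described here can be sketched as a single pass over the context sizes reported by consecutive assistant messages. This is only an illustration: detectCompactions and the local compactionEvent struct are hypothetical stand-ins, the input is simplified to a slice of token counts, and treating a drop exactly equal to the threshold as a compaction is an assumption based on the word "minimum".

```go
package main

import "fmt"

// Value copied from the documented constant.
const CompactionDropThreshold = 50000

// compactionEvent mirrors the fields of the documented CompactionEvent type.
type compactionEvent struct {
	LineIndex    int
	BeforeTokens int
	AfterTokens  int
	TokensDrop   int
}

// detectCompactions (hypothetical) flags every position where the token
// count fell by at least CompactionDropThreshold relative to the previous one.
func detectCompactions(tokens []int) []compactionEvent {
	var events []compactionEvent
	for i := 1; i < len(tokens); i++ {
		drop := tokens[i-1] - tokens[i]
		if drop >= CompactionDropThreshold {
			events = append(events, compactionEvent{
				LineIndex:    i,
				BeforeTokens: tokens[i-1],
				AfterTokens:  tokens[i],
				TokensDrop:   drop,
			})
		}
	}
	return events
}

func main() {
	// Context grows, collapses after a compaction, then grows again.
	for _, e := range detectCompactions([]int{120000, 150000, 166000, 30000, 45000}) {
		fmt.Printf("index %d: %d -> %d (drop %d)\n",
			e.LineIndex, e.BeforeTokens, e.AfterTokens, e.TokensDrop)
	}
}
```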

const CompactionThreshold = 165000

CompactionThreshold is the observed token count at which Claude Code triggers automatic compaction (~165K-170K tokens).

const ContextWindowSize = 200000

ContextWindowSize is the standard Claude context window size in tokens.

const LargeOutputThreshold = 4096

LargeOutputThreshold is the default byte threshold for flagging large Bash outputs.

const OversizedImageThreshold = 5 * 1024 * 1024

OversizedImageThreshold is the base64 data length above which an image is considered oversized. ~5 MB of base64 data ≈ 3.75 MB decoded image.
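The 5 MB to 3.75 MB relationship follows from base64 encoding every 3 bytes as 4 characters. The sketch below checks that arithmetic with the standard library; isOversized is a hypothetical helper, and comparing with a strict "greater than" is an assumption drawn from the phrase "above which".

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Value copied from the documented constant.
const OversizedImageThreshold = 5 * 1024 * 1024

// decodedSize returns the maximum decoded size of a padded base64 payload
// of the given length (3 bytes for every 4 characters).
func decodedSize(b64Len int) int {
	return base64.StdEncoding.DecodedLen(b64Len)
}

// isOversized (hypothetical) applies the threshold to a base64 length.
func isOversized(b64Len int) bool {
	return b64Len > OversizedImageThreshold
}

func main() {
	n := decodedSize(OversizedImageThreshold)
	fmt.Printf("%d bytes = %.2f MiB\n", n, float64(n)/(1024*1024)) // 3932160 bytes = 3.75 MiB
	fmt.Println(isOversized(OversizedImageThreshold + 1))         // true
}
```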

const TimeGapThreshold = 30 * time.Minute

TimeGapThreshold is the minimum time gap between consecutive entries that triggers a new branch within the same epoch.

Variables

var DefaultPricing = KnownPricing["claude-opus-4-6"]

DefaultPricing is used when the model cannot be determined.

var KnownPricing = map[string]ModelPricing{
	"claude-opus-4-6": {
		Name:                 "Opus 4.6",
		InputPerMillion:      15.0,
		OutputPerMillion:     75.0,
		CacheWritePerMillion: 3.75,
		CacheReadPerMillion:  0.75,
	},
	"claude-sonnet-4-6": {
		Name:                 "Sonnet 4.6",
		InputPerMillion:      3.0,
		OutputPerMillion:     15.0,
		CacheWritePerMillion: 0.75,
		CacheReadPerMillion:  0.15,
	},
	"claude-haiku-4-5-20251001": {
		Name:                 "Haiku 4.5",
		InputPerMillion:      0.80,
		OutputPerMillion:     4.0,
		CacheWritePerMillion: 0.20,
		CacheReadPerMillion:  0.04,
	},
}

KnownPricing maps model IDs to their pricing. Prices are per million tokens.

Functions

func AutoSelectProgress

func AutoSelectProgress(entries []jsonl.Entry, selected map[int]bool) map[int]bool

AutoSelectProgress expands a selection to include progress messages linked to selected assistant messages via toolUseID.

func CompactionDistance

func CompactionDistance(stats *ContextStats) int

CompactionDistance estimates the number of conversational turns remaining before the next automatic compaction triggers.
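One plausible way to produce such an estimate is to divide the remaining headroom below CompactionThreshold by the average token growth per turn. The sketch below assumes exactly that; the function name, the plain-argument signature, and the -1 result for an unknown growth rate are all hypothetical, and the package's actual formula is not documented here.

```go
package main

import "fmt"

// Value copied from the documented constant.
const CompactionThreshold = 165000

// compactionDistance (hypothetical) estimates turns left before compaction.
// It returns -1 when the growth rate is not positive, because no meaningful
// estimate exists in that case (an assumption of this sketch).
func compactionDistance(currentTokens int, growthPerTurn float64) int {
	if growthPerTurn <= 0 {
		return -1
	}
	remaining := CompactionThreshold - currentTokens
	if remaining <= 0 {
		return 0
	}
	return int(float64(remaining) / growthPerTurn)
}

func main() {
	// 45000 tokens of headroom at 1500 tokens per turn.
	fmt.Println(compactionDistance(120000, 1500)) // 30
}
```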

func CostPercent added in v0.4.0

func CostPercent(component, total float64) float64

CostPercent returns the percentage of total that component represents.

func DetectImageType added in v0.4.8

func DetectImageType(b64data string) string

DetectImageType returns the MIME type based on the base64-encoded image data magic bytes.

func DetectSessionCWD added in v0.4.3

func DetectSessionCWD(entries []jsonl.Entry) string

DetectSessionCWD extracts the CWD from the first entry that has one set.

func EstimateImageBytes

func EstimateImageBytes(e *jsonl.Entry) int64

EstimateImageBytes returns the approximate decoded size of all images in an entry.

func EstimateImageTokens added in v0.3.0

func EstimateImageTokens(e *jsonl.Entry) int

EstimateImageTokens returns the estimated token cost of images in an entry.

func EstimateTokens

func EstimateTokens(e *jsonl.Entry) int

EstimateTokens estimates the token count for a single entry.

func ExtractDecisionHint added in v0.4.6

func ExtractDecisionHint(text string) string

ExtractDecisionHint returns a truncated hint if the text contains decision keywords.

func ExtractSummaryText added in v0.4.6

func ExtractSummaryText(e jsonl.Entry) string

ExtractSummaryText extracts concatenated text content from an entry.

func ExtractToolInputPath added in v0.4.6

func ExtractToolInputPath(input json.RawMessage) string

ExtractToolInputPath extracts file_path, path, or pattern from tool_use input.

func FormatCost added in v0.4.0

func FormatCost(cost float64) string

FormatCost formats a dollar cost for display.

func FormatCostPerTurn added in v0.4.0

func FormatCostPerTurn(total float64, turns int) string

FormatCostPerTurn formats cost with per-turn breakdown.

func GradeFromSignalPercent added in v0.4.4

func GradeFromSignalPercent(pct int) string

GradeFromSignalPercent converts a signal percentage into a letter grade. It is exported for use by the TUI session browser.

func QuickCost added in v0.4.0

func QuickCost(inputTokens, outputTokens, cacheWriteTokens, cacheReadTokens int, model string) float64

QuickCost calculates an estimated cost from accumulated token counts. It is used by the session browser, where full parsing is too slow.
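Since prices are quoted per million tokens, the estimate reduces to four multiplications. The sketch below shows that arithmetic with the Sonnet 4.6 row from KnownPricing. It is a simplified, hypothetical variant: it takes a pricing struct directly instead of a model string, and the real function may differ in details such as rounding.

```go
package main

import "fmt"

// modelPricing mirrors the price fields of the documented ModelPricing type.
type modelPricing struct {
	InputPerMillion      float64
	OutputPerMillion     float64
	CacheWritePerMillion float64
	CacheReadPerMillion  float64
}

// Sonnet 4.6 row copied from the KnownPricing table.
var sonnet = modelPricing{
	InputPerMillion:      3.0,
	OutputPerMillion:     15.0,
	CacheWritePerMillion: 0.75,
	CacheReadPerMillion:  0.15,
}

// quickCost (hypothetical) converts token counts to dollars.
func quickCost(in, out, cacheWrite, cacheRead int, p modelPricing) float64 {
	const million = 1_000_000.0
	return float64(in)/million*p.InputPerMillion +
		float64(out)/million*p.OutputPerMillion +
		float64(cacheWrite)/million*p.CacheWritePerMillion +
		float64(cacheRead)/million*p.CacheReadPerMillion
}

func main() {
	// 3.00 (input) + 1.50 (output) + 0.00 (cache write) + 0.30 (cache read)
	fmt.Printf("$%.2f\n", quickCost(1_000_000, 100_000, 0, 2_000_000, sonnet))
}
```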

func TruncateHint added in v0.4.6

func TruncateHint(s string, maxLen int) string

TruncateHint truncates text to maxLen runes, adding "..." if needed.
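Truncating by runes rather than bytes keeps multi-byte characters intact. The sketch below is a hypothetical equivalent; whether the "..." suffix counts toward maxLen is not stated above, so this version assumes it is appended after maxLen runes.

```go
package main

import "fmt"

// truncateHint (hypothetical) keeps at most maxLen runes of s and appends
// "..." when anything was cut off.
func truncateHint(s string, maxLen int) string {
	if maxLen < 0 {
		maxLen = 0
	}
	runes := []rune(s)
	if len(runes) <= maxLen {
		return s
	}
	return string(runes[:maxLen]) + "..."
}

func main() {
	fmt.Println(truncateHint("héllo wörld", 5)) // héllo...
	fmt.Println(truncateHint("short", 10))      // short
}
```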

Types

type Branch added in v0.4.2

type Branch struct {
	Index         int
	StartIdx      int    // first entry index (inclusive)
	EndIdx        int    // last entry index (inclusive)
	Summary       string // first user message text, truncated to 60 chars
	TokenCost     int    // sum of RawSize/4
	EntryCount    int
	UserTurns     int // count of user messages
	TimeStart     time.Time
	TimeEnd       time.Time
	FileCount     int  // unique files from extractAllPaths
	HasCompaction bool // branch starts at a compaction boundary
	IsLast        bool
}

Branch represents a contiguous segment of conversation entries, bounded by compaction events or significant time gaps.

func FindBranches added in v0.4.2

func FindBranches(entries []jsonl.Entry, compactions []CompactionEvent) []Branch

FindBranches segments session entries into logical branches based on compaction boundaries and time gaps exceeding TimeGapThreshold.

type CleanupItem added in v0.4.0

type CleanupItem struct {
	Category    string // "progress", "snapshots", "stale_reads", "images", "large_outputs", "failed_retries", "sidechains", "tangents"
	Label       string // human-readable: "Progress messages"
	Count       int    // items affected
	TokensSaved int    // estimated tokens recoverable
	TurnsGained int    // TokensSaved / TokenGrowthRate
}

CleanupItem represents a single category of cleanable content.

type CleanupRecommendation added in v0.4.0

type CleanupRecommendation struct {
	Items            []CleanupItem // sorted by TokensSaved descending
	TotalTokens      int
	TotalTurnsGained int
	CurrentPercent   float64
	ProjectedPercent float64 // context usage after cleanup
}

CleanupRecommendation aggregates all cleanable categories into a ranked list.

func Recommend added in v0.4.0

func Recommend(stats *ContextStats, dupResult *DuplicateReadResult, retryResult *RetryResult, tangentResult *TangentResult) *CleanupRecommendation

Recommend builds a ranked cleanup recommendation from existing analysis data.

type CompactionArchaeology added in v0.4.0

type CompactionArchaeology struct {
	CompactionIndex int
	LineIndex       int
	Before          EpochSummary
	After           CompactionSummary
}

CompactionArchaeology describes what was lost at a single compaction boundary.

type CompactionEvent

type CompactionEvent struct {
	LineIndex    int
	BeforeTokens int
	AfterTokens  int
	TokensDrop   int
}

CompactionEvent records a detected context compaction.

type CompactionReport added in v0.4.0

type CompactionReport struct {
	Events []CompactionArchaeology
}

CompactionReport holds archaeology data for all compaction events in a session.

func AnalyzeCompactions added in v0.4.0

func AnalyzeCompactions(entries []jsonl.Entry, compactions []CompactionEvent) *CompactionReport

AnalyzeCompactions performs archaeology on all compaction events. It segments entries by compaction boundaries, extracts structural metadata from each pre-compaction epoch, and captures the post-compaction summary.

type CompactionSummary added in v0.4.0

type CompactionSummary struct {
	SummaryText      string
	SummaryCharCount int
	CompressionRatio float64
}

CompactionSummary holds post-compaction data.

type ContextStats

type ContextStats struct {
	TotalLines           int
	MessageCounts        map[jsonl.MessageType]int
	CurrentContextTokens int
	MaxContextTokens     int
	UsagePercent         float64
	CompactionCount      int
	Compactions          []CompactionEvent
	TokenGrowthRate      float64
	EstimatedTurnsLeft   int
	FileSizeBytes        int64
	ImageCount           int
	ImageBytesTotal      int64
	SnapshotCount        int
	SnapshotBytesTotal   int64
	LargeOutputCount     int
	LargeOutputTokens    int
	SidechainCount       int
	SidechainGroups      int
	SidechainTokens      int
	TangentCount         int
	TangentEntries       int
	TangentTokens        int
	ProgressCount        int
	ProgressTokens       int
	ConversationalTurns  int
	LastCompactionLine   int
	Cost                 *CostBreakdown
	EpochCosts           []EpochCost
	Model                string
	Archaeology          *CompactionReport
}

ContextStats holds comprehensive analysis results for a session.

func Analyze

func Analyze(entries []jsonl.Entry) *ContextStats

Analyze performs a full analysis of parsed session entries.

type CostBreakdown added in v0.4.0

type CostBreakdown struct {
	InputCost        float64
	OutputCost       float64
	CacheWriteCost   float64
	CacheReadCost    float64
	TotalCost        float64
	InputTokens      int
	OutputTokens     int
	CacheWriteTokens int
	CacheReadTokens  int
	TurnCount        int
	CostPerTurn      float64
	Model            string
}

CostBreakdown holds itemized cost for a session or epoch.

func CalculateCost added in v0.4.0

func CalculateCost(entries []jsonl.Entry) *CostBreakdown

CalculateCost computes total session cost from assistant message usage fields.

type DeletionImpact

type DeletionImpact struct {
	SelectedCount        int
	ProgressAutoRemoved  int
	EstimatedTokenSaved  int
	NewContextPercent    float64
	PredictedTurnsGained int
	ChainRepairs         int
	Warnings             []string
}

DeletionImpact predicts the effects of deleting selected messages.

func PredictImpact

func PredictImpact(entries []jsonl.Entry, selected map[int]bool, stats *ContextStats) *DeletionImpact

PredictImpact analyzes the impact of deleting selected entries.

type DiagnosisResult added in v0.3.0

type DiagnosisResult struct {
	Issues []Issue
}

DiagnosisResult holds all detected issues.

func Diagnose added in v0.3.0

func Diagnose(entries []jsonl.Entry) *DiagnosisResult

Diagnose scans session entries for common problems.

func (*DiagnosisResult) IssuesByIndex added in v0.3.0

func (d *DiagnosisResult) IssuesByIndex() map[int][]Issue

IssuesByIndex returns a map from entry index to issues affecting it.

type DuplicateGroup added in v0.3.0

type DuplicateGroup struct {
	FilePath        string
	ReadIndices     []int       // all entry indices containing tool_use for this file read
	LatestIndex     int         // the most recent read (to keep)
	StaleReads      []StaleRead // older reads with details for cleanup
	EstimatedTokens int         // total tokens across stale reads
}

DuplicateGroup represents a file that was read multiple times.

func (DuplicateGroup) StaleIndices added in v0.3.0

func (g DuplicateGroup) StaleIndices() []int

StaleIndices returns all entry indices that are stale, covering both the assistant tool_use entries and their tool_result entries.

type DuplicateReadResult added in v0.3.0

type DuplicateReadResult struct {
	Groups      []DuplicateGroup
	TotalStale  int
	TotalTokens int
	UniqueFiles int
}

DuplicateReadResult summarizes all duplicate reads in a session.

func FindDuplicateReads added in v0.3.0

func FindDuplicateReads(entries []jsonl.Entry) *DuplicateReadResult

FindDuplicateReads scans entries for files read more than once. Returns groups of duplicate reads with stale indices marked.
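The grouping idea can be sketched by bucketing read positions per file path and treating every read except the last as stale. The types and function below are hypothetical reductions of DuplicateGroup and FindDuplicateReads: the input is a pre-extracted list of (index, path) pairs in ascending index order, and tool_result entries and token estimates are left out.

```go
package main

import "fmt"

// fileRead is a hypothetical, pre-extracted record of one file read.
type fileRead struct {
	Index int
	Path  string
}

// dupGroup is a reduced, hypothetical version of DuplicateGroup.
type dupGroup struct {
	FilePath    string
	ReadIndices []int
	LatestIndex int
	Stale       []int
}

// findDuplicateReads (hypothetical) groups reads by path and reports the
// files read more than once. Reads must be in ascending index order.
func findDuplicateReads(reads []fileRead) []dupGroup {
	byPath := map[string][]int{}
	var order []string // first-seen order keeps the output deterministic
	for _, r := range reads {
		if _, seen := byPath[r.Path]; !seen {
			order = append(order, r.Path)
		}
		byPath[r.Path] = append(byPath[r.Path], r.Index)
	}
	var groups []dupGroup
	for _, p := range order {
		idx := byPath[p]
		if len(idx) < 2 {
			continue
		}
		groups = append(groups, dupGroup{
			FilePath:    p,
			ReadIndices: idx,
			LatestIndex: idx[len(idx)-1],
			Stale:       idx[:len(idx)-1],
		})
	}
	return groups
}

func main() {
	reads := []fileRead{{2, "a.go"}, {5, "b.go"}, {9, "a.go"}, {14, "a.go"}}
	for _, g := range findDuplicateReads(reads) {
		fmt.Println(g.FilePath, "keep", g.LatestIndex, "stale", g.Stale)
	}
}
```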

func (*DuplicateReadResult) AllStaleIndices added in v0.3.0

func (r *DuplicateReadResult) AllStaleIndices() map[int]bool

AllStaleIndices returns every stale entry index across all groups.

type Epoch added in v0.4.0

type Epoch struct {
	Index         int
	TurnCount     int
	PeakTokens    int
	Cost          float64
	Topic         string
	SurvivedChars int                    // -1 for active epoch
	IsActive      bool                   // true for last epoch
	Archaeology   *CompactionArchaeology // nil for active epoch
}

Epoch is a unified view of a compaction epoch, merging cost and archaeology data.

func BuildEpochs added in v0.4.0

func BuildEpochs(epochCosts []EpochCost, archaeology *CompactionReport, activeTopicHint string) []Epoch

BuildEpochs merges EpochCost and CompactionArchaeology into a unified epoch timeline. activeTopicHint is used as the topic for the active (last) epoch when no archaeology exists.

type EpochCost added in v0.4.0

type EpochCost struct {
	EpochIndex int
	TurnCount  int
	PeakTokens int
	Cost       CostBreakdown
}

EpochCost holds cost for a single compaction epoch.

func CalculateEpochCosts added in v0.4.0

func CalculateEpochCosts(entries []jsonl.Entry, compactions []CompactionEvent) []EpochCost

CalculateEpochCosts segments entries by compaction boundaries and computes cost per epoch. Epoch 0 precedes the first compaction; epoch N follows the Nth compaction.
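The epoch numbering can be sketched as counting how many compaction boundaries lie at or before a given entry. The helper below is hypothetical and works on a sorted slice of boundary line indices; whether the boundary entry itself belongs to the new epoch is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// epochIndex (hypothetical) returns the epoch an entry belongs to, given
// the ascending line indices at which compactions were detected. An entry
// located exactly on a boundary is counted as part of the new epoch.
func epochIndex(compactionLines []int, entryIdx int) int {
	// SearchInts finds the first boundary >= entryIdx+1, which equals the
	// number of boundaries <= entryIdx.
	return sort.SearchInts(compactionLines, entryIdx+1)
}

func main() {
	boundaries := []int{10, 25}
	for _, i := range []int{0, 9, 10, 24, 25, 100} {
		fmt.Printf("entry %d -> epoch %d\n", i, epochIndex(boundaries, i))
	}
}
```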

type EpochScope added in v0.4.1

type EpochScope struct {
	EpochIndex     int
	InScope        int            // entries with CWD path refs
	OutScope       int            // entries with external path refs
	OutScopeByRepo map[string]int // external repo root -> count
	DriftRatio     float64
	DriftCost      float64 // dollar cost of out-of-scope assistant turns
}

EpochScope holds scope distribution for a single compaction epoch.

type EpochSummary added in v0.4.0

type EpochSummary struct {
	TurnCount       int
	TokensPeak      int
	FilesReferenced []string
	ToolCallCounts  map[string]int
	UserQuestions   []string
	DecisionHints   []string
}

EpochSummary holds structural metadata extracted from a pre-compaction epoch.

func (EpochSummary) TotalToolCalls added in v0.4.0

func (s EpochSummary) TotalToolCalls() int

TotalToolCalls returns the sum of all tool call counts.

type HealthScore added in v0.4.4

type HealthScore struct {
	SignalTokens    int     // CurrentContextTokens - NoiseTokens
	NoiseTokens     int     // sum of all CleanupItem.TokensSaved
	TotalTokens     int     // CurrentContextTokens
	SignalPercent   float64 // SignalTokens / TotalTokens * 100
	NoisePercent    float64 // NoiseTokens / TotalTokens * 100
	Grade           string  // "A" (>90%), "B" (>75%), "C" (>60%), "D" (>40%), "F" (<=40%)
	BiggestOffender string  // CleanupItem[0].Category
	OffenderTokens  int     // CleanupItem[0].TokensSaved
}

HealthScore represents the signal/noise ratio for a session's context.
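The grade boundaries listed in the Grade field comment translate directly into a switch. The sketch below is a hypothetical re-implementation for illustration; the signalPercent helper and its use of integer arithmetic are assumptions, since the struct stores percentages as float64.

```go
package main

import "fmt"

// signalPercent (hypothetical) computes the share of context that is not
// noise, as a whole-number percentage.
func signalPercent(totalTokens, noiseTokens int) int {
	if totalTokens <= 0 {
		return 0
	}
	return (totalTokens - noiseTokens) * 100 / totalTokens
}

// gradeFromSignalPercent follows the thresholds in the Grade field comment:
// "A" (>90%), "B" (>75%), "C" (>60%), "D" (>40%), "F" (<=40%).
func gradeFromSignalPercent(pct int) string {
	switch {
	case pct > 90:
		return "A"
	case pct > 75:
		return "B"
	case pct > 60:
		return "C"
	case pct > 40:
		return "D"
	default:
		return "F"
	}
}

func main() {
	pct := signalPercent(100000, 20000)
	fmt.Println(pct, gradeFromSignalPercent(pct)) // 80 B
}
```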

func ComputeHealth added in v0.4.4

func ComputeHealth(stats *ContextStats, rec *CleanupRecommendation) *HealthScore

ComputeHealth derives a health score from existing analysis data.

type Issue added in v0.3.0

type Issue struct {
	Kind        IssueKind
	EntryIndex  int
	Description string
	// RelatedIndex is set for filter_block (the triggering user message index).
	RelatedIndex int
}

Issue describes a single detected problem in a session.

type IssueKind added in v0.3.0

type IssueKind string

IssueKind classifies the type of session problem detected.

const (
	IssueFilterBlock       IssueKind = "filter_block"
	IssueOversizedImage    IssueKind = "oversized_image"
	IssueOrphanedResult    IssueKind = "orphaned_result"
	IssueMalformed         IssueKind = "malformed"
	IssueMediaTypeMismatch IssueKind = "media_type_mismatch"
)

type ModelPricing added in v0.4.0

type ModelPricing struct {
	Name                 string
	InputPerMillion      float64
	OutputPerMillion     float64
	CacheWritePerMillion float64
	CacheReadPerMillion  float64
}

ModelPricing holds per-million token pricing for a Claude model.

func PricingForModel added in v0.4.0

func PricingForModel(model string) ModelPricing

PricingForModel looks up pricing by model ID. Supports prefix matching (e.g. "claude-opus-4-6-20260301" matches "claude-opus-4-6"). Falls back to DefaultPricing if no match.
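Prefix matching with a default fallback can be sketched as follows. The table is a trimmed copy of KnownPricing holding only names, the function is a hypothetical stand-in, and preferring the longest matching prefix is a design choice of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// knownNames is a trimmed, hypothetical copy of KnownPricing (names only).
var knownNames = map[string]string{
	"claude-opus-4-6":           "Opus 4.6",
	"claude-sonnet-4-6":         "Sonnet 4.6",
	"claude-haiku-4-5-20251001": "Haiku 4.5",
}

// defaultName corresponds to DefaultPricing, which points at Opus 4.6.
const defaultName = "Opus 4.6"

// pricingNameForModel (hypothetical) tries an exact match first, then the
// longest known ID that is a prefix of model, then the default.
func pricingNameForModel(model string) string {
	if name, ok := knownNames[model]; ok {
		return name
	}
	best := ""
	for id := range knownNames {
		if strings.HasPrefix(model, id) && len(id) > len(best) {
			best = id
		}
	}
	if best != "" {
		return knownNames[best]
	}
	return defaultName
}

func main() {
	fmt.Println(pricingNameForModel("claude-opus-4-6-20260301")) // Opus 4.6
	fmt.Println(pricingNameForModel("claude-sonnet-4-6"))        // Sonnet 4.6
	fmt.Println(pricingNameForModel("some-other-model"))         // Opus 4.6 (fallback)
}
```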

type RangeMetadata added in v0.4.3

type RangeMetadata struct {
	TargetRepo  string
	TokenCost   int
	DollarCost  float64
	ReExplFiles []string
}

RangeMetadata holds computed metadata for an entry range.

func ComputeRangeMetadata added in v0.4.3

func ComputeRangeMetadata(entries []jsonl.Entry, from, to int, cwd string) *RangeMetadata

ComputeRangeMetadata computes tangent metadata for entries[from:to+1].

type RetryResult added in v0.3.0

type RetryResult struct {
	Sequences   []RetrySequence
	TotalFailed int
	TotalTokens int
}

RetryResult summarizes all failed-then-retried sequences.

func FindFailedRetries added in v0.3.0

func FindFailedRetries(entries []jsonl.Entry) *RetryResult

FindFailedRetries detects tool_use attempts that failed and were retried. A sequence is flagged only when the same tool name appears again within retryWindow entries and the original tool_result indicates an error.

func (*RetryResult) AllFailedIndices added in v0.3.0

func (r *RetryResult) AllFailedIndices() map[int]bool

AllFailedIndices returns all entry indices that are part of failed attempts.

type RetrySequence added in v0.3.0

type RetrySequence struct {
	FailedToolUseIdx int    // index of assistant entry with failed tool_use
	FailedToolUseID  string // tool_use ID of the failed attempt
	FailedResultIdx  int    // index of user entry with error tool_result
	RetryToolUseIdx  int    // index of assistant entry with retry tool_use
	ToolName         string // tool name (e.g., "Bash", "Read")
	EstimatedTokens  int    // tokens in the failed attempt
}

RetrySequence represents a failed tool attempt that was retried.

type ScopeDrift added in v0.4.1

type ScopeDrift struct {
	SessionProject string            // CWD detected from entries
	EpochScopes    []EpochScope      // per-epoch scope distribution
	TangentSeqs    []TangentSequence // contiguous out-of-scope sequences
	TotalInScope   int
	TotalOutScope  int
	OverallDrift   float64 // TotalOutScope / (TotalInScope + TotalOutScope)
}

ScopeDrift holds the complete scope drift analysis for a session.
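The OverallDrift formula from the struct comment, TotalOutScope / (TotalInScope + TotalOutScope), can be sketched with a simple path classifier. Both helpers are hypothetical; the classifier assumes absolute, slash-separated, already-cleaned paths, and returning 0 when no paths were counted is an assumption to avoid dividing by zero.

```go
package main

import (
	"fmt"
	"strings"
)

// inScope (hypothetical) reports whether p lies inside the session
// directory cwd. The trailing slash prevents "/app-old" matching "/app".
func inScope(cwd, p string) bool {
	root := strings.TrimSuffix(cwd, "/")
	return p == root || strings.HasPrefix(p, root+"/")
}

// driftRatio applies the documented OverallDrift formula.
func driftRatio(in, out int) float64 {
	if in+out == 0 {
		return 0
	}
	return float64(out) / float64(in+out)
}

func main() {
	cwd := "/home/dev/app"
	paths := []string{
		"/home/dev/app/main.go",
		"/home/dev/app/internal/x.go",
		"/home/dev/app-old/main.go", // sibling directory, out of scope
		"/home/dev/app/go.mod",
	}
	in, out := 0, 0
	for _, p := range paths {
		if inScope(cwd, p) {
			in++
		} else {
			out++
		}
	}
	fmt.Println(in, out, driftRatio(in, out)) // 3 1 0.25
}
```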

func AnalyzeScopeDrift added in v0.4.1

func AnalyzeScopeDrift(entries []jsonl.Entry, compactions []CompactionEvent, cwd string) *ScopeDrift

AnalyzeScopeDrift performs per-epoch scope analysis and tangent sequence detection. cwd can be provided explicitly, or passed as "" to auto-detect it from entries.

func (*ScopeDrift) DriftIndices added in v0.4.1

func (d *ScopeDrift) DriftIndices() map[int]bool

DriftIndices returns all entry indices flagged as out-of-scope.

func (*ScopeDrift) DriftRepoForIndex added in v0.4.1

func (d *ScopeDrift) DriftRepoForIndex(idx int) string

DriftRepoForIndex returns the dominant external repo basename for a given entry, or "" if the entry is not in a tangent sequence.

type SessionInfoLite added in v0.7.0

type SessionInfoLite struct {
	SessionID string
	Slug      string
	Created   time.Time
	Modified  time.Time
}

SessionInfoLite is the subset of session.Info needed by distill. It avoids a circular import with the session package.

type StaleRead added in v0.3.0

type StaleRead struct {
	AssistantIdx int    // index of the assistant entry with the tool_use
	ToolUseID    string // tool_use ID for content surgery
	ResultIdx    int    // index of the tool_result entry, -1 if not found
}

StaleRead holds details about a single stale file read for cleanup.

type TangentGroup added in v0.3.0

type TangentGroup struct {
	StartIndex      int      // first entry in the tangent
	EndIndex        int      // last entry in the tangent (inclusive)
	EntryIndices    []int    // all entry indices in this tangent
	ExternalPaths   []string // unique external paths referenced
	EstimatedTokens int      // total tokens across tangent entries
}

TangentGroup represents a contiguous block of entries referencing external repos.

type TangentResult added in v0.3.0

type TangentResult struct {
	Groups       []TangentGroup
	TotalEntries int
	TotalTokens  int
	ExternalDirs int // unique external root directories
	SessionCWD   string
}

TangentResult summarizes all detected cross-repo tangents in a session.

func FindTangents added in v0.3.0

func FindTangents(entries []jsonl.Entry) *TangentResult

FindTangents detects cross-repo tangent sequences in a session. A tangent is a contiguous block of entries where tool_use inputs reference paths outside the session's CWD and no file modifications occur in the CWD. It flags only tangents where external paths are explicitly present; no semantic analysis is performed.

func (*TangentResult) AllTangentIndices added in v0.3.0

func (r *TangentResult) AllTangentIndices() map[int]bool

AllTangentIndices returns every entry index across all tangent groups.

type TangentSequence added in v0.4.1

type TangentSequence struct {
	StartIdx           int
	EndIdx             int
	TargetRepo         string // dominant external repo (basename)
	EntryIndices       []int
	TokenCost          int
	DollarCost         float64
	ReExplanationFiles []string // CWD files re-read after tangent
}

TangentSequence is a contiguous block of entries primarily about another project.

type Topic added in v0.7.0

type Topic struct {
	GlobalIndex int
	SessionID   string
	SessionSlug string
	Branch      Branch
	Entries     []jsonl.Entry
	Compaction  *CompactionArchaeology
	CostDollars float64
}

Topic represents a conversation branch annotated with session-level metadata.

type TopicSessionInfo added in v0.7.0

type TopicSessionInfo struct {
	SessionID  string
	Slug       string
	TopicCount int
	Created    time.Time
	Modified   time.Time
	Cost       float64
}

TopicSessionInfo holds lightweight session metadata for the distill output header.

type TopicSessionInput added in v0.7.0

type TopicSessionInput struct {
	Entries []jsonl.Entry
	Info    SessionInfoLite
}

TopicSessionInput is the input for CollectTopics.

type TopicSet added in v0.7.0

type TopicSet struct {
	ProjectName string
	Topics      []Topic
	Sessions    []TopicSessionInfo
	TotalTokens int
	TotalCost   float64
}

TopicSet holds all topics discovered across sessions for a project.

func CollectTopics added in v0.7.0

func CollectTopics(sessions []TopicSessionInput) *TopicSet

CollectTopics builds a TopicSet from parsed session data.
