Documentation ¶
Overview ¶
Package contexty implements a token budget allocator for LLM context windows.
LLMs have a fixed token limit (e.g. 8192). Contexty helps fit system prompts, pinned facts, RAG results, chat history, and tool outputs into that budget by treating memory as tiers: higher-priority blocks are allocated first, and configurable eviction strategies (strict, drop, truncate, summarize) apply when a block does not fit.
The library does not tokenize text itself. Callers inject a TokenCounter (e.g. tiktoken for a specific model, or CharFallbackCounter for tests). See AllocatorConfig, Builder, and Builder.Compile for the main API.
Example:
counter := &contexty.CharFallbackCounter{CharsPerToken: 4}
builder := contexty.NewBuilder(contexty.AllocatorConfig{
MaxTokens: 4000,
TokenCounter: counter,
})
builder.AddBlock(contexty.MemoryBlock{
ID: "persona", Tier: contexty.TierSystem,
Strategy: contexty.NewStrictStrategy(),
Messages: []contexty.Message{contexty.TextMessage("system", "You are a helpful assistant.")},
})
msgs, report, err := builder.Compile(ctx)
// report.TotalTokensUsed, report.Evictions, report.BlocksDropped describe what happened.
// See examples/full_assembly for a full multi-tier setup.
Example (BuilderChatHistory) ¶
package main
import (
"context"
"fmt"
"github.com/skosovsky/contexty"
)
func main() {
// System (1 msg) + history (6 msgs); 50 tokens each = 350 total, limit 300 -> truncate removes 1.
counter := &contexty.FixedCounter{TokensPerMessage: 50}
b := contexty.NewBuilder(contexty.AllocatorConfig{MaxTokens: 300, TokenCounter: counter})
b.AddBlock(contexty.MemoryBlock{
ID: "sys", Tier: contexty.TierSystem, Strategy: contexty.NewStrictStrategy(),
Messages: []contexty.Message{contexty.TextMessage("system", "You are helpful.")},
})
history := []contexty.Message{
contexty.TextMessage("user", "hi"),
contexty.TextMessage("assistant", "hello"),
contexty.TextMessage("user", "hi"),
contexty.TextMessage("assistant", "hello"),
contexty.TextMessage("user", "hi"),
contexty.TextMessage("assistant", "hello"),
}
b.AddBlock(contexty.MemoryBlock{
ID: "history", Tier: contexty.TierHistory, Strategy: contexty.NewTruncateOldestStrategy(),
Messages: history,
})
msgs, report, err := b.Compile(context.Background())
if err != nil {
return
}
fmt.Printf("messages: %d, tokens: %d, eviction(history)=%q\n",
len(msgs), report.TotalTokensUsed, report.Evictions["history"])
}
Output: messages: 6, tokens: 300, eviction(history)="truncated"
Example (InjectIntoSystemXML) ¶
package main
import (
"fmt"
"strings"
"github.com/skosovsky/contexty"
)
func main() {
sys := contexty.TextMessage("system", "Base.")
got := contexty.InjectIntoSystem(sys,
contexty.Message{Content: []contexty.ContentPart{{Type: "text", Text: "Fact1"}}},
contexty.Message{Content: []contexty.ContentPart{{Type: "text", Text: "Fact2"}}},
)
text := got.Content[0].Text
fmt.Println(got.Role)
fmt.Println(len(text) > 0 && text[:1] == "B")
fmt.Println(strings.Contains(text, "<context>") && strings.Contains(text, "<fact>"))
}
Output:
system
true
true
Index ¶
- Constants
- Variables
- type AllocatorConfig
- type Builder
- type CharFallbackCounter
- type CompileReport
- type ContentPart
- type EvictionStrategy
- type FactExtractor
- type FixedCounter
- type FunctionCall
- type ImageURL
- type MemoryBlock
- type Message
- type Summarizer
- type Tier
- type TokenCounter
- type ToolCall
- type ToolCallEstimator
- type TruncateOption
Constants ¶
const DefaultTokensPerNonTextPart = 85
DefaultTokensPerNonTextPart is the fallback token count for content parts that are not Type "text" (e.g. image_url). The value is applied as-is: parts are not validated and no network checks are made.
Variables ¶
var (
	// ErrBudgetExceeded is returned by StrictStrategy when a block does not fit
	// within the remaining token budget.
	ErrBudgetExceeded = errors.New("contexty: block exceeds remaining token budget")

	// ErrInvalidConfig is returned by Compile when configuration is invalid
	// (e.g. MaxTokens <= 0 or TokenCounter == nil).
	ErrInvalidConfig = errors.New("contexty: invalid allocator config")

	// ErrTokenCountFailed is returned when TokenCounter.Count returns an error.
	ErrTokenCountFailed = errors.New("contexty: token counting failed")

	// ErrInvalidCharsPerToken is returned by CharFallbackCounter when
	// CharsPerToken is zero or negative.
	ErrInvalidCharsPerToken = errors.New("contexty: CharsPerToken must be positive")

	// ErrNilStrategy is returned by Compile when a MemoryBlock has a nil Strategy.
	ErrNilStrategy = errors.New("contexty: block has nil eviction strategy")

	// ErrStrategyExceededBudget is returned by Compile when an EvictionStrategy.Apply
	// returns messages whose token count exceeds the remaining budget (contract violation).
	ErrStrategyExceededBudget = errors.New("contexty: strategy returned output exceeding remaining budget")
)
Sentinel errors for typical contexty failure modes. Use errors.Is to check for these in calling code.
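As an illustration of the errors.Is advice, here is a minimal sketch that does not import the library. The sentinel errBudgetExceeded and the helpers compileBlock and isBudgetError are hypothetical stand-ins; whether Compile wraps its sentinels with extra detail is an assumption, but errors.Is matches in both the wrapped and the unwrapped case.

```go
package main

import (
	"errors"
	"fmt"
)

// errBudgetExceeded is a local stand-in for contexty.ErrBudgetExceeded so the
// sketch runs without the library.
var errBudgetExceeded = errors.New("contexty: block exceeds remaining token budget")

// compileBlock is a hypothetical helper that fails the way a strict block
// might, wrapping the sentinel with extra detail via %w.
func compileBlock(id string) error {
	return fmt.Errorf("block %q: %w", id, errBudgetExceeded)
}

// isBudgetError reports whether err is, or wraps, the budget sentinel.
func isBudgetError(err error) bool {
	return errors.Is(err, errBudgetExceeded)
}

func main() {
	err := compileBlock("persona")
	if isBudgetError(err) {
		fmt.Println("budget exceeded:", err)
	}
}
```

Comparing with == would miss the wrapped error here, which is why errors.Is is the safer check in calling code.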
Functions ¶
This section is empty.
Types ¶
type AllocatorConfig ¶
type AllocatorConfig struct {
MaxTokens int // Total token budget (must be > 0)
TokenCounter TokenCounter // Required; used by Compile
}
AllocatorConfig configures the token budget and how to count tokens.
type Builder ¶
type Builder struct {
// contains filtered or unexported fields
}
Builder collects memory blocks and compiles them into a single message slice within the token budget. A Builder can be reused: call AddBlock and Compile multiple times. Each Compile uses the current list of blocks (blocks are not cleared after Compile). For a fresh compile, create a new Builder.
func NewBuilder ¶
func NewBuilder(cfg AllocatorConfig) *Builder
NewBuilder returns a new Builder with the given config. Config is not validated until Compile.
Example ¶
package main
import (
"context"
"fmt"
"github.com/skosovsky/contexty"
)
func main() {
counter := &contexty.CharFallbackCounter{CharsPerToken: 4}
builder := contexty.NewBuilder(contexty.AllocatorConfig{
MaxTokens: 100,
TokenCounter: counter,
})
builder.AddBlock(contexty.MemoryBlock{
ID: "sys", Tier: contexty.TierSystem, Strategy: contexty.NewStrictStrategy(),
Messages: []contexty.Message{contexty.TextMessage("system", "You are helpful.")},
})
msgs, report, err := builder.Compile(context.Background())
if err != nil {
return
}
fmt.Printf("messages: %d, tokens: %d\n", len(msgs), report.TotalTokensUsed)
}
Output: messages: 1, tokens: 4
func (*Builder) AddBlock ¶
func (b *Builder) AddBlock(block MemoryBlock) *Builder
AddBlock appends a block and returns the builder for chaining.
func (*Builder) Compile ¶
Compile assembles all blocks into a single []Message that fits within MaxTokens. Blocks are processed in Tier order (stable sort); within the same Tier, insertion order is kept. Returns the final messages, a report, and an error (e.g. invalid config or StrictStrategy overflow). Compile can be called multiple times on the same Builder; each call uses the current blocks.
Example ¶
package main
import (
"context"
"fmt"
"github.com/skosovsky/contexty"
)
func main() {
counter := &contexty.CharFallbackCounter{CharsPerToken: 4}
b := contexty.NewBuilder(contexty.AllocatorConfig{MaxTokens: 50, TokenCounter: counter})
b.AddBlock(contexty.MemoryBlock{
ID: "core", Tier: contexty.TierCore, Strategy: contexty.NewDropStrategy(),
Messages: []contexty.Message{contexty.TextMessage("system", "User: Alice")},
})
b.AddBlock(contexty.MemoryBlock{
ID: "history", Tier: contexty.TierHistory, Strategy: contexty.NewTruncateOldestStrategy(),
Messages: []contexty.Message{
contexty.TextMessage("user", "Hi"),
contexty.TextMessage("assistant", "Hello!"),
},
})
msgs, report, err := b.Compile(context.Background())
if err != nil {
return
}
fmt.Printf("msgs=%d evictions=%v\n", len(msgs), report.Evictions)
}
Output: msgs=3 evictions=map[]
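The ordering rule described for Compile (Tier order, insertion order within a Tier) can be sketched with a stable sort. The block type and orderByTier below are hypothetical stand-ins written for this illustration, not the library's internals.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical stand-in for contexty.MemoryBlock (ID and Tier only).
type block struct {
	ID   string
	Tier int
}

// orderByTier returns a copy sorted by ascending Tier. The stable sort keeps
// insertion order among blocks that share a Tier.
func orderByTier(blocks []block) []block {
	out := append([]block(nil), blocks...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Tier < out[j].Tier })
	return out
}

func main() {
	blocks := []block{
		{"history", 3}, {"rag-a", 2}, {"sys", 0}, {"rag-b", 2},
	}
	for _, b := range orderByTier(blocks) {
		fmt.Println(b.Tier, b.ID)
	}
}
```

In this sketch rag-a stays ahead of rag-b because both share Tier 2 and rag-a was added first.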
type CharFallbackCounter ¶
type CharFallbackCounter struct {
// CharsPerToken is the character-to-token ratio (e.g. 4 for English).
// Must be positive.
CharsPerToken int
// TokensPerNonTextPart is the weight for content parts with Type != "text"
// (e.g. image_url). Zero means use DefaultTokensPerNonTextPart.
TokensPerNonTextPart int
// EstimateTool is optional; when set, used for each ToolCall instead of rune-based fallback.
EstimateTool ToolCallEstimator
}
CharFallbackCounter approximates token count by dividing character count by a configurable ratio. It does not use a real tokenizer (BPE/tiktoken). Suitable for prototyping and environments where exact counting is not critical. For production, inject a model-specific TokenCounter (e.g. tiktoken).
func (*CharFallbackCounter) Count ¶
Count returns the estimated token count for all messages. Text from ContentPart (Type "text") is measured in runes; non-text parts use a constant weight. ToolCalls: if EstimateTool is set, its result is summed; otherwise runes of Arguments+Name are used. Returns ErrInvalidCharsPerToken if CharsPerToken <= 0.
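A rough sketch of the rune-based estimate for a single text part follows. The function estimateTokens and the local sentinel are hypothetical, and rounding up is an assumption of this sketch; the library's actual rounding is not stated above.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidCharsPerToken is a local stand-in for contexty.ErrInvalidCharsPerToken.
var errInvalidCharsPerToken = errors.New("contexty: CharsPerToken must be positive")

// estimateTokens approximates tokens as runes / charsPerToken.
// Rounding up is an assumption made for this sketch.
func estimateTokens(text string, charsPerToken int) (int, error) {
	if charsPerToken <= 0 {
		return 0, errInvalidCharsPerToken
	}
	runes := utf8.RuneCountInString(text)
	return (runes + charsPerToken - 1) / charsPerToken, nil
}

func main() {
	n, err := estimateTokens("You are helpful.", 4)
	fmt.Println(n, err) // 16 runes / 4 = 4 tokens

	// Multi-byte characters count as one rune each, not one per byte.
	n, err = estimateTokens("héllo wörld", 4)
	fmt.Println(n, err) // 11 runes, rounded up to 3 tokens
}
```

Counting runes rather than bytes keeps the estimate from inflating on non-ASCII text.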
type CompileReport ¶
type CompileReport struct {
TotalTokensUsed int // Total tokens in the final result
OriginalTokens int // Total tokens before eviction (all blocks considered)
RemainingTokens int // MaxTokens minus TotalTokensUsed after compile
OriginalTokensPerBlock map[string]int // Block ID -> tokens before eviction (before strategy applied)
TokensPerBlock map[string]int // Block ID -> tokens used in output
Evictions map[string]string // Block ID -> strategy applied ("rejected", "dropped", "truncated", "summarized")
BlocksDropped []string // IDs of blocks completely removed (may contain duplicates if multiple blocks shared the same ID)
}
CompileReport describes what happened during Compile: token usage and evictions.
type ContentPart ¶ added in v0.2.0
type ContentPart struct {
Type string // "text", "image_url", or provider-specific
Text string `json:"text,omitempty"`
ImageURL *ImageURL `json:"image_url,omitempty"`
}
ContentPart represents a single part of message content (text or image). Type is not validated in core; typical values are "text", "image_url".
type EvictionStrategy ¶
type EvictionStrategy interface {
// Apply returns a subset of msgs that fits within limit tokens, or an error.
// originalTokens is the token count of msgs (from counter.Count(ctx, msgs)); use it to avoid re-counting.
// Returned messages must have total token count <= limit; Compile enforces this.
Apply(ctx context.Context, msgs []Message, originalTokens int, limit int, counter TokenCounter) ([]Message, error)
}
EvictionStrategy defines how to shrink or trim a block to fit the remaining budget. Each MemoryBlock has its own strategy (strict, drop, truncate, summarize).
Apply receives originalTokens (pre-counted by the Builder) so that implementations do not have to count the input again. Implementations must return messages whose total token count is <= limit; Compile re-counts the output and returns ErrStrategyExceededBudget if this contract is violated.
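To make the Apply contract concrete, here is a self-contained sketch of a custom strategy. Message, TokenCounter, and EvictionStrategy are local stand-ins that mirror the shapes described above (the (int, error) return of Count is an assumption), and dropOldest, perMessage, and keptAfterApply are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins mirroring the shapes described above; not the library's types.
type Message struct{ Role, Text string }

type TokenCounter interface {
	Count(ctx context.Context, msgs []Message) (int, error)
}

type EvictionStrategy interface {
	Apply(ctx context.Context, msgs []Message, originalTokens, limit int, counter TokenCounter) ([]Message, error)
}

// perMessage charges a fixed cost per message.
type perMessage int

func (p perMessage) Count(_ context.Context, msgs []Message) (int, error) {
	return int(p) * len(msgs), nil
}

// dropOldest removes messages from the front until the rest fits within limit.
type dropOldest struct{}

func (dropOldest) Apply(ctx context.Context, msgs []Message, originalTokens, limit int, counter TokenCounter) ([]Message, error) {
	if originalTokens <= limit {
		return msgs, nil // fast path: the pre-counted total already fits
	}
	for start := 1; start <= len(msgs); start++ {
		n, err := counter.Count(ctx, msgs[start:])
		if err != nil {
			return nil, err
		}
		if n <= limit {
			return msgs[start:], nil
		}
	}
	return nil, nil // nothing fits: empty result
}

// keptAfterApply runs dropOldest over n messages and reports how many remain.
func keptAfterApply(n, perMsg, limit int) (int, error) {
	msgs := make([]Message, n)
	counter := perMessage(perMsg)
	original, err := counter.Count(context.Background(), msgs)
	if err != nil {
		return 0, err
	}
	var s EvictionStrategy = dropOldest{}
	out, err := s.Apply(context.Background(), msgs, original, limit, counter)
	return len(out), err
}

func main() {
	kept, err := keptAfterApply(6, 50, 250)
	fmt.Println(kept, err) // 5 <nil>
}
```

The fast path shows what originalTokens is for: when the block already fits, the strategy returns without calling the counter at all.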
func NewDropStrategy ¶
func NewDropStrategy() EvictionStrategy
NewDropStrategy returns a strategy that drops the block entirely when it exceeds the limit. Use for RAG or other optional blocks where partial content is worse than none.
func NewStrictStrategy ¶
func NewStrictStrategy() EvictionStrategy
NewStrictStrategy returns a strategy that fails with ErrBudgetExceeded when the block exceeds the limit. Use for TierSystem and other blocks that must never be evicted.
func NewSummarizeStrategy ¶
func NewSummarizeStrategy(summarizer Summarizer) EvictionStrategy
NewSummarizeStrategy returns a strategy that calls the given Summarizer when the block exceeds the limit. If the summary still does not fit, the block is dropped (empty result). Panics if summarizer is nil (programmer error at init time).
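The summarize-then-drop fallback can be sketched as follows. This is only an illustration of the described behavior: summarizeOrDrop and countWords are hypothetical, messages are reduced to plain strings, and the summarizer is a simple function value rather than the library's Summarizer interface, whose exact shape is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords is a toy token counter: one token per whitespace-separated word.
func countWords(msgs []string) int {
	n := 0
	for _, m := range msgs {
		n += len(strings.Fields(m))
	}
	return n
}

// summarizeOrDrop keeps msgs if they fit, otherwise tries the summary, and
// returns nil (block dropped) when even the summary exceeds limit.
func summarizeOrDrop(msgs []string, limit int, summarize func([]string) string) []string {
	if countWords(msgs) <= limit {
		return msgs
	}
	summary := []string{summarize(msgs)}
	if countWords(summary) <= limit {
		return summary
	}
	return nil
}

func main() {
	history := []string{"user asked about flights to Oslo", "assistant listed three options"}
	short := func([]string) string { return "flight options discussed" }

	fmt.Println(summarizeOrDrop(history, 20, short)) // fits unchanged
	fmt.Println(summarizeOrDrop(history, 5, short))  // replaced by the summary
	fmt.Println(summarizeOrDrop(history, 2, short))  // summary too long: dropped
}
```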
func NewTruncateOldestStrategy ¶
func NewTruncateOldestStrategy(opts ...TruncateOption) EvictionStrategy
NewTruncateOldestStrategy returns a strategy that truncates from the oldest messages. Options: KeepUserAssistantPairs, MinMessages, ProtectRole.
type FactExtractor ¶
FactExtractor analyzes conversation history and extracts new long-term facts. This interface is reserved for v2; the allocator does not use it yet. Implementations typically call an LLM with a prompt like "Extract new facts about the user."
TODO: v2 — integrate with TierCore updates and diffing utilities.
type FixedCounter ¶
type FixedCounter struct {
// TokensPerMessage is the base weight per message (always applied).
TokensPerMessage int
// TokensPerContentPart is added for each ContentPart in a message (0 = not used).
TokensPerContentPart int
// TokensPerToolCall is added for each ToolCall in a message (0 = not used).
TokensPerToolCall int
}
FixedCounter returns a token count derived from message structure for testing. Enables realistic eviction tests: removing one "heavy" message frees many tokens.
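Reading the three field comments together, the count is plausibly a sum of a per-message base plus per-part and per-tool-call weights. The sketch below assumes exactly that formula; msgShape and fixedWeights are hypothetical stand-ins, not the library's types.

```go
package main

import "fmt"

// msgShape is a hypothetical stand-in that records only how many content
// parts and tool calls a message carries.
type msgShape struct{ Parts, ToolCalls int }

// fixedWeights mirrors the three FixedCounter fields.
type fixedWeights struct{ PerMessage, PerContentPart, PerToolCall int }

// count sums the base weight plus per-part and per-tool-call weights.
func (w fixedWeights) count(msgs []msgShape) int {
	total := 0
	for _, m := range msgs {
		total += w.PerMessage + m.Parts*w.PerContentPart + m.ToolCalls*w.PerToolCall
	}
	return total
}

func main() {
	w := fixedWeights{PerMessage: 10, PerContentPart: 5, PerToolCall: 100}
	light := msgShape{Parts: 1}
	heavy := msgShape{Parts: 1, ToolCalls: 2}

	fmt.Println(w.count([]msgShape{light, heavy})) // 15 + 215 = 230
	fmt.Println(w.count([]msgShape{light}))        // 15
}
```

Removing the single heavy message frees 215 of the 230 tokens, which is the kind of uneven eviction the type is meant to exercise in tests.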
type FunctionCall ¶ added in v0.2.0
FunctionCall holds function name and arguments (JSON string; not validated in core).
type ImageURL ¶ added in v0.2.0
type ImageURL struct {
URL string `json:"url"`
Detail string `json:"detail,omitempty"` // e.g. "low", "high"
}
ImageURL holds URL and optional detail level for image content. No URL validation or network checks in core.
type MemoryBlock ¶
type MemoryBlock struct {
ID string
Messages []Message
Tier Tier
Strategy EvictionStrategy
MaxTokens int // Optional: hard per-block token limit (0 = no limit)
CacheControl string // Optional: caching rules for the block
}
MemoryBlock is a logical group of messages with a Tier and an EvictionStrategy. ID is used in CompileReport; empty ID is allowed. MaxTokens is optional: when > 0 and less than the remaining global budget, Apply receives this value as the limit so the block is capped locally (e.g. RAG block limited to 200 tokens). CacheControl is for provider-specific prompt caching (e.g. Anthropic/Gemini); not interpreted in core.
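The MaxTokens rule amounts to picking the smaller of the block cap and the remaining global budget, with zero meaning no cap. A small sketch under that reading (effectiveLimit is a hypothetical helper, not a library function):

```go
package main

import "fmt"

// effectiveLimit returns the limit a strategy would receive: the block's own
// cap when it is set and tighter than the remaining budget, otherwise the
// remaining budget.
func effectiveLimit(blockMax, remaining int) int {
	if blockMax > 0 && blockMax < remaining {
		return blockMax
	}
	return remaining
}

func main() {
	fmt.Println(effectiveLimit(200, 1000)) // 200: RAG block capped locally
	fmt.Println(effectiveLimit(0, 1000))   // 1000: no per-block limit
	fmt.Println(effectiveLimit(500, 120))  // 120: global budget is tighter
}
```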
type Message ¶
type Message struct {
Role string
Content []ContentPart // Always slice; text-only = one part with Type "text"
Name string // Optional: function name for tool messages
ToolCalls []ToolCall
ToolCallID string
Metadata map[string]any
}
Message is the minimal unit of context: a single chat turn with role and content. v2: Content is always []ContentPart; use TextMessage/MultipartMessage helpers. ToolCalls and Metadata support agents and prompt caching; no validation in core.
func InjectIntoSystem ¶
InjectIntoSystem merges auxiliary text blocks into a single system message using XML tags for structured separation. Only text parts (Type "text") are included; other part types (e.g. image_url, audio) are safely ignored to avoid embedding large or binary content. Content is XML-escaped to prevent injection. If blocks is empty, returns systemMsg unchanged.
Example ¶
package main
import (
"fmt"
"github.com/skosovsky/contexty"
)
func main() {
sys := contexty.TextMessage("system", "You are a doctor.")
got := contexty.InjectIntoSystem(sys,
contexty.Message{Content: []contexty.ContentPart{{Type: "text", Text: "Patient has fever."}}},
contexty.Message{Content: []contexty.ContentPart{{Type: "text", Text: "Allergies: none."}}},
)
fmt.Println(got.Role)
fmt.Println(len(got.Content) > 0 && len(got.Content[0].Text) > 0 && got.Content[0].Text[0] == 'Y')
}
Output:
system
true
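For intuition, the merge could look roughly like the sketch below. The <context> and <fact> tag names come from the InjectIntoSystemXML example earlier on this page; the exact layout (newlines, nesting) and the injectFacts helper are assumptions of this sketch rather than the library's implementation. Escaping uses encoding/xml from the standard library.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// injectFacts appends facts to base inside <context>/<fact> tags, escaping
// each fact. With no facts it returns base unchanged.
func injectFacts(base string, facts ...string) string {
	if len(facts) == 0 {
		return base
	}
	var sb strings.Builder
	sb.WriteString(base)
	sb.WriteString("\n<context>\n")
	for _, f := range facts {
		sb.WriteString("<fact>")
		// strings.Builder writes never fail, so the error can be ignored.
		_ = xml.EscapeText(&sb, []byte(f))
		sb.WriteString("</fact>\n")
	}
	sb.WriteString("</context>")
	return sb.String()
}

func main() {
	fmt.Println(injectFacts("You are a doctor.", "Patient has fever.", "Temp > 38 & rising"))
}
```

Escaping matters because a fact containing text such as </context> would otherwise close the wrapper early and let retrieved content masquerade as instructions.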
func MultipartMessage ¶ added in v0.2.0
func MultipartMessage(role string, parts ...ContentPart) Message
MultipartMessage creates a message with multiple content parts (text, images, etc.).
func TextMessage ¶ added in v0.2.0
TextMessage creates a simple text-only message (single ContentPart with Type "text").
type Summarizer ¶
Summarizer compresses a slice of messages into a single summary message. Typically implemented via a cheap/fast LLM call; used by SummarizeStrategy.
type Tier ¶
type Tier int
Tier is the priority level of a memory block (lower number = higher priority). The type is int so callers can define custom tiers (e.g. Tier(10) for debug logs). Built-in constants cover typical use cases but the set is not closed.
const (
	// TierSystem is for immutable instructions (persona, rules). Never evicted; error if doesn't fit.
	TierSystem Tier = 0
	// TierCore is for pinned facts (user name, preferences).
	TierCore Tier = 1
	// TierRAG is for external knowledge (episodic retrieval).
	TierRAG Tier = 2
	// TierHistory is for conversation history (working memory).
	TierHistory Tier = 3
	// TierScratchpad is for temporary reasoning and tool call logs.
	TierScratchpad Tier = 4
)
type TokenCounter ¶
TokenCounter counts tokens for a slice of messages. The library does not implement real tokenization; the caller injects an implementation. Count must account for message structure (role, content parts, tool calls) and any per-message overhead; no validation of content types or URLs in core. The context is passed from Compile and may be used for cancellation or timeouts (e.g. when counting involves a network call to a tokenization service).
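The point about cancellation can be illustrated with a counter that simulates a slow remote tokenizer. Everything here is hypothetical: remoteCounter, its latency field, and the helper functions are stand-ins, and messages are reduced to plain strings so the sketch runs without the library.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// remoteCounter is a hypothetical counter that simulates a slow tokenization
// service; it charges one token per message once the "call" completes.
type remoteCounter struct{ latency time.Duration }

// Count waits for the simulated call or gives up when ctx is done.
func (c remoteCounter) Count(ctx context.Context, msgs []string) (int, error) {
	select {
	case <-time.After(c.latency):
		return len(msgs), nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// countWithCancelledContext calls Count after the context is already cancelled.
func countWithCancelledContext() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return remoteCounter{latency: time.Hour}.Count(ctx, []string{"hi"})
}

// countImmediately calls Count with no latency and a live context.
func countImmediately(msgs []string) (int, error) {
	return remoteCounter{}.Count(context.Background(), msgs)
}

func main() {
	fmt.Println(countWithCancelledContext())              // 0 context canceled
	fmt.Println(countImmediately([]string{"hi", "there"})) // 2 <nil>
}
```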
type ToolCall ¶ added in v0.2.0
type ToolCall struct {
ID string `json:"id"`
Type string `json:"type"` // typically "function"
Function FunctionCall `json:"function"`
}
ToolCall represents a tool/function call in agent messages.
type ToolCallEstimator ¶ added in v0.2.1
ToolCallEstimator returns the token weight of a single tool call. When non-nil in CharFallbackCounter, it is used for ToolCalls instead of rune-based fallback.
type TruncateOption ¶
type TruncateOption func(*truncateConfig)
TruncateOption configures TruncateOldestStrategy behavior.
func KeepUserAssistantPairs ¶
func KeepUserAssistantPairs(keep bool) TruncateOption
KeepUserAssistantPairs ensures messages are removed in user-assistant pairs from the start, so that dialog coherence is preserved (no orphan user or assistant).
func MinMessages ¶
func MinMessages(n int) TruncateOption
MinMessages sets the minimum number of messages to keep after truncation. If the remaining budget cannot fit at least MinMessages messages, the block is dropped entirely (empty result).
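How KeepUserAssistantPairs and MinMessages interact can be sketched with a simplified model. The function truncateOldest below is hypothetical: it assumes every message costs the same number of tokens and that the history strictly alternates user and assistant turns, which the real strategy does not require.

```go
package main

import "fmt"

// truncateOldest drops messages from the front until the rest fits within
// limit, assuming every message costs perMsg tokens. With keepPairs it removes
// two messages at a time; if fewer than minMessages survive, it returns nil.
func truncateOldest(roles []string, perMsg, limit, minMessages int, keepPairs bool) []string {
	step := 1
	if keepPairs {
		step = 2
	}
	kept := roles
	for len(kept) > 0 && len(kept)*perMsg > limit {
		if len(kept) < step {
			return nil // cannot remove a whole pair: drop the block
		}
		kept = kept[step:]
	}
	if len(kept) < minMessages {
		return nil // too little would remain: drop the block
	}
	return kept
}

func main() {
	history := []string{"user", "assistant", "user", "assistant", "user", "assistant"}

	fmt.Println(len(truncateOldest(history, 50, 250, 0, false))) // 5
	fmt.Println(len(truncateOldest(history, 50, 250, 0, true)))  // 4
	fmt.Println(len(truncateOldest(history, 50, 250, 5, true)))  // 0
}
```

The third call shows the trade-off: pairing forces two removals, which leaves four messages, fewer than the required five, so the whole block is dropped.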
func ProtectRole ¶ added in v0.2.2
func ProtectRole(role string) TruncateOption
ProtectRole marks a role so that messages with this role are never removed when truncating. The first removable message (or user+assistant pair when KeepUserAssistantPairs is set) is removed instead. Duplicate roles are not added; the config stays deduplicated.
Directories ¶
| Path | Synopsis |
|---|---|
| examples/full_assembly (command) | Full-assembly example: builds a multi-tier context (system, core, RAG, history) and compiles it within a token budget. |