Documentation ¶
Overview ¶
Package tokencount provides a shared offline tiktoken wrapper for LLM token estimation. It maps model IDs to BPE encodings and counts tokens without any network calls, using embedded BPE tables from tiktoken-go-loader.
Custom encodings (e.g. MiniMax BPE) can be registered at init time via RegisterEncoding so that provider packages can wire in their own tokenizers without creating an import cycle (tokencount ← provider ← tokencount).
Index ¶
- Constants
- func ApplyAnthropicToolOverhead(tc *TokenCount, numTools int)
- func CountMessage(model string, msg llm.Message) (int, error)
- func CountMessagesAndTools(tc *TokenCount, req TokenCountRequest, opts CountOpts) error
- func CountMessagesAndToolsAnthropic(tc *TokenCount, req TokenCountRequest) error
- func CountText(model, text string) (int, error)
- func CountTextForEncoding(encoding, text string) (int, error)
- func CountTextForModel(modelID, text string) (int, error)
- func EncodingForModel(modelID string) (encoding string, ok bool)
- func RegisterEncoding(name string, fn func(text string) (int, error))
- type CountOpts
- type TokenCount
- type TokenCountRequest
- type TokenCounter
Constants ¶
const (
	EncodingCL100K = "cl100k_base"
	EncodingO200K  = "o200k_base"
	// EncodingMinimax is the encoding name for the MiniMax BPE tokenizer.
	// The implementation is registered by provider/minimax at init time via
	// RegisterEncoding.
	EncodingMinimax = "minimax_bpe"
)
Functions ¶
func ApplyAnthropicToolOverhead ¶ added in v0.29.0
func ApplyAnthropicToolOverhead(tc *TokenCount, numTools int)
ApplyAnthropicToolOverhead adds the Anthropic tool-use preamble and per-tool serialisation framing to tc.OverheadTokens and tc.InputTokens.
This is exported so that providers using the Anthropic API format (e.g. MiniMax) can apply the same overhead after calling CountMessagesAndTools with their own encoding.
func CountMessage ¶ added in v0.29.0
func CountMessage(model string, msg llm.Message) (int, error)
CountMessage returns the number of tokens in a single Message for the given model. The message is converted to its text representation using the same logic as CountTokens (message content, plus tool call names and arguments for assistant messages, the output for tool results, and so on).
This is a convenience function for callers that count messages individually rather than as a batch — for example, per-entry token estimates in a conversation history manager.
func CountMessagesAndTools ¶ added in v0.29.0
func CountMessagesAndTools(tc *TokenCount, req TokenCountRequest, opts CountOpts) error
CountMessagesAndTools is a low-level helper for provider TokenCounter implementations. Library consumers should use the TokenCounter interface directly rather than calling this function.
It fills tc.PerMessage, tc.ToolsTokens, tc.PerTool, and tc.InputTokens using the BPE encoding named in opts.Encoding, then calls applyRoleBreakdown to populate the role breakdown fields.
Returns an error if req.Model is empty.
func CountMessagesAndToolsAnthropic ¶ added in v0.29.0
func CountMessagesAndToolsAnthropic(tc *TokenCount, req TokenCountRequest) error
CountMessagesAndToolsAnthropic is like CountMessagesAndTools but applies Anthropic-specific tool overhead constants: the hidden tool-use system preamble (~330 tokens, paid once) plus per-tool serialisation framing (~126 tokens for the first tool, ~85 tokens for each additional tool). In total, a request with N tools (N ≥ 1) adds roughly 330+126+(N-1)×85 tokens on top of the raw JSON counts.
Use this for anthropic, bedrock, and claude providers.
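The overhead formula above can be worked through in a few lines. This is only a sketch: anthropicToolOverhead and the constant names are hypothetical, the figures are the approximate ones quoted above, and returning zero when no tools are supplied is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// Approximate figures quoted in the documentation above.
const (
	preambleTokens  = 330 // hidden tool-use system preamble, paid once
	firstToolTokens = 126 // serialisation framing for the first tool
	extraToolTokens = 85  // serialisation framing for each additional tool
)

// anthropicToolOverhead is a hypothetical helper that evaluates
// 330 + 126 + (N-1)*85 for N tools.
// Assumption: no tools means no overhead at all.
func anthropicToolOverhead(numTools int) int {
	if numTools <= 0 {
		return 0
	}
	return preambleTokens + firstToolTokens + (numTools-1)*extraToolTokens
}

func main() {
	for _, n := range []int{1, 2, 5} {
		fmt.Printf("%d tools -> ~%d overhead tokens\n", n, anthropicToolOverhead(n))
	}
}
```

For three tools the estimate is 330 + 126 + 2×85 = 626 tokens on top of the raw tool JSON counts.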
func CountText ¶
func CountText(model, text string) (int, error)
CountText returns the number of tokens in text for the given model. The encoding is selected automatically based on the model ID: o200k_base for GPT-4o/o-series, cl100k_base for everything else.
This is a convenience function for callers that need to count raw text without constructing a full TokenCountRequest — for example, context-budget managers that count individual history entries.
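As an illustration of the context-budget use case, the sketch below keeps the most recent history entries that fit within a token budget. It takes the counter as a parameter with the same shape as CountText so that it runs on its own; keepWithinBudget and wordCount are hypothetical names, and wordCount is a whitespace-based stand-in, not a real tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// countFunc has the same shape as CountText: (model, text) -> (tokens, error).
type countFunc func(model, text string) (int, error)

// keepWithinBudget returns the longest run of most-recent history entries
// whose combined token count does not exceed budget.
func keepWithinBudget(count countFunc, model string, history []string, budget int) ([]string, error) {
	total := 0
	start := len(history)
	// Walk backwards from the newest entry until the budget would be exceeded.
	for i := len(history) - 1; i >= 0; i-- {
		n, err := count(model, history[i])
		if err != nil {
			return nil, err
		}
		if total+n > budget {
			break
		}
		total += n
		start = i
	}
	return history[start:], nil
}

// wordCount is a stand-in counter for this sketch only; in real code the
// package's CountText would be passed instead.
func wordCount(_ string, text string) (int, error) {
	return len(strings.Fields(text)), nil
}

func main() {
	history := []string{"a b c", "d e", "f"}
	kept, err := keepWithinBudget(wordCount, "gpt-4o", history, 3)
	fmt.Println(kept, err)
}
```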
func CountTextForEncoding ¶ added in v0.29.0
func CountTextForEncoding(encoding, text string) (int, error)
CountTextForEncoding returns the number of tokens in text using the named BPE encoding. The encoding must be one of the constants in this package (cl100k_base, o200k_base, minimax_bpe) or a name registered via RegisterEncoding.
func CountTextForModel ¶
func CountTextForModel(modelID, text string) (int, error)
CountTextForModel is a convenience wrapper that calls EncodingForModel and then CountTextForEncoding.
func EncodingForModel ¶
func EncodingForModel(modelID string) (encoding string, ok bool)
EncodingForModel returns the BPE encoding name appropriate for the given model ID, using prefix matching.
Mappings:
- minimax_bpe: minimax-*, MiniMax-*
- o200k_base: gpt-4o*, gpt-4.1*, gpt-4.5*, o1*, o3*, o4*
- cl100k_base: claude-*, gpt-4* (non-o suffixed), gpt-3.5*, and all unknowns
The second return value is false when the model was not recognised and the fallback encoding (cl100k_base) was returned.
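The mapping table above can be approximated with an ordered list of prefixes. The sketch below is not the package's implementation: encodingForModel, rule, and rules are hypothetical names, and the ordering (more specific prefixes such as gpt-4o before gpt-4) is an assumption about how the listed mappings would have to be resolved.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a model ID prefix with an encoding name.
type rule struct{ prefix, encoding string }

// rules is checked in order, so more specific prefixes (gpt-4o, gpt-4.1)
// must appear before the broader gpt-4 prefix.
var rules = []rule{
	{"minimax-", "minimax_bpe"},
	{"MiniMax-", "minimax_bpe"},
	{"gpt-4o", "o200k_base"},
	{"gpt-4.1", "o200k_base"},
	{"gpt-4.5", "o200k_base"},
	{"o1", "o200k_base"},
	{"o3", "o200k_base"},
	{"o4", "o200k_base"},
	{"claude-", "cl100k_base"},
	{"gpt-4", "cl100k_base"},
	{"gpt-3.5", "cl100k_base"},
}

// encodingForModel returns the encoding for the first matching prefix.
// Unknown models fall back to cl100k_base with ok == false.
func encodingForModel(modelID string) (encoding string, ok bool) {
	for _, r := range rules {
		if strings.HasPrefix(modelID, r.prefix) {
			return r.encoding, true
		}
	}
	return "cl100k_base", false
}

func main() {
	for _, m := range []string{"gpt-4o-mini", "gpt-4-turbo", "MiniMax-M1", "llama3"} {
		enc, ok := encodingForModel(m)
		fmt.Printf("%-12s %-12s recognised=%v\n", m, enc, ok)
	}
}
```

Callers that care about accuracy can use the second return value to flag estimates that rely on the fallback encoding.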
func RegisterEncoding ¶ added in v0.26.0
func RegisterEncoding(name string, fn func(text string) (int, error))
RegisterEncoding registers a custom CountTextForEncoding implementation for the given encoding name. It is called from provider init() functions to wire in tokenizers that live outside the tokencount package, avoiding import cycles.
Registering the same name twice panics to catch accidental double-registration.
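The registration pattern described above might look roughly like the following. This is a self-contained stand-in, not the package's code: registerEncoding, countTextForEncoding, and the "example_bpe" name are hypothetical, the mutex and the error returned for unknown names are assumptions, and the registered counter splits on whitespace purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countFn has the same shape as the function passed to RegisterEncoding.
type countFn func(text string) (int, error)

var (
	mu       sync.RWMutex
	registry = map[string]countFn{}
)

// registerEncoding stores fn under name and panics on a duplicate name,
// mirroring the double-registration behaviour described above.
func registerEncoding(name string, fn countFn) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("encoding already registered: " + name)
	}
	registry[name] = fn
}

// countTextForEncoding looks up a registered counter by name.
// Assumption: an unknown name is reported as an error.
func countTextForEncoding(name, text string) (int, error) {
	mu.RLock()
	fn, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return 0, fmt.Errorf("unknown encoding %q", name)
	}
	return fn(text)
}

// In a provider package, registration would live in that package's init().
func init() {
	registerEncoding("example_bpe", func(text string) (int, error) {
		return len(strings.Fields(text)), nil // stand-in, not a real BPE tokenizer
	})
}

func main() {
	n, err := countTextForEncoding("example_bpe", "hello offline world")
	fmt.Println(n, err)
}
```

Because the provider package imports the registry and not the other way round, the dependency only points in one direction, which is how the import cycle is avoided.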
Types ¶
type CountOpts ¶ added in v0.29.0
type CountOpts struct {
	// Encoding is the BPE encoding to use for token counting
	// (e.g. "cl100k_base", "o200k_base", "minimax_bpe").
	Encoding string
	// PerMsgOverhead is added to InputTokens once per message. For example,
	// OpenAI adds 4 tokens per message for role/framing overhead.
	PerMsgOverhead int
	// ReplyPriming is a fixed addend for reply-priming tokens. For example,
	// OpenAI adds 3 tokens for the "assistant" token prepended by the API.
	ReplyPriming int
}
CountOpts configures the shared CountMessagesAndTools helper.
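To make the two overhead fields concrete, the sketch below adds them to a set of raw counts. It assumes the total is formed as the raw message counts, plus PerMsgOverhead for every message, plus ReplyPriming once, plus the tool tokens; how the package actually combines them is not spelled out here. countOpts and estimateInputTokens are hypothetical names.

```go
package main

import "fmt"

// countOpts mirrors the two numeric fields of CountOpts for this sketch.
type countOpts struct {
	PerMsgOverhead int
	ReplyPriming   int
}

// estimateInputTokens adds the per-message overhead once per message and
// the reply-priming addend once (assumed combination, see lead-in).
func estimateInputTokens(perMessage []int, toolsTokens int, opts countOpts) int {
	total := toolsTokens + opts.ReplyPriming
	for _, n := range perMessage {
		total += n + opts.PerMsgOverhead
	}
	return total
}

func main() {
	// OpenAI-style values taken from the field comments above.
	openAIStyle := countOpts{PerMsgOverhead: 4, ReplyPriming: 3}
	fmt.Println(estimateInputTokens([]int{12, 30, 8}, 40, openAIStyle))
}
```

With three messages of 12, 30, and 8 tokens and 40 tool tokens, this gives 50 + 3×4 + 3 + 40 = 105.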
type TokenCount ¶ added in v0.29.0
type TokenCount struct {
	// InputTokens is the total estimated input token count:
	// all messages + all tool definitions + any provider-specific overhead.
	InputTokens int
	// PerMessage contains the token count for each entry in TokenCountRequest.Messages,
	// in the same index order. Does not include tool definitions or overhead.
	// len(PerMessage) == len(TokenCountRequest.Messages) is guaranteed.
	PerMessage []int
	// Role breakdowns, derived from PerMessage and provided for convenience.
	// SystemTokens + UserTokens + AssistantTokens + ToolResultTokens == sum(PerMessage).
	SystemTokens     int // sum of PerMessage for all RoleSystem messages
	UserTokens       int // sum of PerMessage for all RoleUser messages
	AssistantTokens  int // sum of PerMessage for all RoleAssistant messages
	ToolResultTokens int // sum of PerMessage for all RoleTool (ToolResult) messages
	// ToolsTokens is the total raw token count for all tool definitions combined,
	// derived purely from the JSON-serialised tool schemas.
	// sum(values(PerTool)) == ToolsTokens.
	ToolsTokens int
	// PerTool maps each tool definition's name to its individual raw token count.
	// sum(values(PerTool)) == ToolsTokens.
	PerTool map[string]int
	// OverheadTokens is the number of tokens the provider adds on top of the
	// caller-supplied content: tokens the caller did not write and cannot
	// control. Examples:
	//   - Anthropic: hidden tool-use system preamble + per-tool framing (~330+126+85×(n-1) for n tools)
	//   - Claude OAuth: injected billing/identity system blocks (~45 tokens)
	//
	// Zero for providers that add no hidden content (OpenAI, OpenRouter, Ollama).
	//
	// The invariant: InputTokens == sum(PerMessage) + ToolsTokens + OverheadTokens
	// (plus any per-message overhead, e.g. +4/msg for OpenAI).
	OverheadTokens int
}
TokenCount holds the result of a CountTokens call.
Invariants:
- len(PerMessage) == len(TokenCountRequest.Messages)
- SystemTokens + UserTokens + AssistantTokens + ToolResultTokens == sum(PerMessage)
- sum(values(PerTool)) == ToolsTokens (raw tool JSON counts only, no overhead)
- InputTokens == sum(PerMessage) + ToolsTokens + OverheadTokens + provider-specific per-message overhead
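The invariants listed above lend themselves to a small consistency check, for example in a provider's unit tests. The sketch below uses a local copy of the struct so that it runs on its own; checkInvariants is a hypothetical helper, and its extra parameter stands for the provider-specific per-message overhead total, which the caller is assumed to know.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenCount is a local copy of the TokenCount fields, for illustration only.
type tokenCount struct {
	InputTokens      int
	PerMessage       []int
	SystemTokens     int
	UserTokens       int
	AssistantTokens  int
	ToolResultTokens int
	ToolsTokens      int
	PerTool          map[string]int
	OverheadTokens   int
}

// checkInvariants verifies the four documented invariants. extra is the
// provider-specific per-message overhead included in InputTokens.
func checkInvariants(tc tokenCount, numMessages, extra int) error {
	if len(tc.PerMessage) != numMessages {
		return errors.New("len(PerMessage) != len(Messages)")
	}
	sumMsgs := 0
	for _, n := range tc.PerMessage {
		sumMsgs += n
	}
	if tc.SystemTokens+tc.UserTokens+tc.AssistantTokens+tc.ToolResultTokens != sumMsgs {
		return errors.New("role breakdown != sum(PerMessage)")
	}
	sumTools := 0
	for _, n := range tc.PerTool {
		sumTools += n
	}
	if sumTools != tc.ToolsTokens {
		return errors.New("sum(PerTool) != ToolsTokens")
	}
	if tc.InputTokens != sumMsgs+tc.ToolsTokens+tc.OverheadTokens+extra {
		return errors.New("InputTokens does not match its components")
	}
	return nil
}

func main() {
	tc := tokenCount{
		InputTokens:     618, // 35 (messages) + 42 (tools) + 541 (overhead)
		PerMessage:      []int{10, 20, 5},
		SystemTokens:    10,
		UserTokens:      20,
		AssistantTokens: 5,
		ToolsTokens:     42,
		PerTool:         map[string]int{"search": 30, "calc": 12},
		OverheadTokens:  541, // 330 + 126 + 85 for two tools
	}
	fmt.Println(checkInvariants(tc, 3, 0))
}
```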
type TokenCountRequest ¶ added in v0.29.0
type TokenCountRequest struct {
	// Model is the model ID to count tokens for (e.g. "gpt-4o", "claude-sonnet-4-5").
	// Required; counting returns an error if it is empty.
	Model    string
	Messages llm.Messages
	Tools    []tool.Definition
}
TokenCountRequest is the input to TokenCounter.CountTokens. Model is required — providers use it to select the correct BPE encoding.
type TokenCounter ¶ added in v0.29.0
type TokenCounter interface {
	CountTokens(ctx context.Context, req TokenCountRequest) (*TokenCount, error)
}
TokenCounter is an optional interface providers may implement to estimate token usage before sending a request.
All implementations in this codebase are local/offline — no network call is made. Counts should be treated as estimates; accuracy varies by provider:
- OpenAI: exact (tiktoken matches the API tokenizer)
- OpenRouter: approximate (tiktoken, best-effort model prefix matching)
- Anthropic: approximate (cl100k_base, ±5-10% for English; tokenizer not public)
- Bedrock: approximate (same as Anthropic)
- Ollama: approximate (cl100k_base; no public tokenize endpoint)
Usage:
if tc, ok := provider.(llm.TokenCounter); ok {
	count, err := tc.CountTokens(ctx, llm.TokenCountRequest{
		Model:    "gpt-4o",
		Messages: messages,
		Tools:    tools,
	})
	if err == nil && count.InputTokens > maxTokens {
		return fmt.Errorf("request too large: %d tokens (limit %d)", count.InputTokens, maxTokens)
	}
}