Documentation ¶
Overview ¶
Package tokencount provides a shared offline tiktoken wrapper for LLM token estimation. It maps model IDs to BPE encodings and counts tokens without any network calls, using embedded BPE tables from tiktoken-go-loader.
Custom encodings (e.g. MiniMax BPE) can be registered at init time via RegisterEncoding so that provider packages can wire in their own tokenizers without creating an import cycle (tokencount ← provider ← tokencount).
Index ¶
Constants ¶
const (
	EncodingCL100K = "cl100k_base"
	EncodingO200K  = "o200k_base"

	// EncodingMinimax is the encoding name for the MiniMax BPE tokenizer.
	// The implementation is registered by provider/minimax at init time via
	// RegisterEncoding.
	EncodingMinimax = "minimax_bpe"
)
Variables ¶
This section is empty.
Functions ¶
func CountText ¶
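CountText returns the number of tokens in text using the named BPE encoding. The encoding must be one of the constants in this package (cl100k_base, o200k_base, minimax_bpe) or a name registered via RegisterEncoding. Note that minimax_bpe has no implementation inside this package; provider/minimax supplies it through RegisterEncoding at init time.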
func CountTextForModel ¶
CountTextForModel is a convenience wrapper that calls EncodingForModel and then CountText.
func EncodingForModel ¶
EncodingForModel returns the BPE encoding name appropriate for the given model ID, using prefix matching.
Mappings:
- minimax_bpe: minimax-*, MiniMax-*
- o200k_base: gpt-4o*, gpt-4.1*, gpt-4.5*, o1*, o3*, o4*
- cl100k_base: claude-*, gpt-4* (other than the gpt-4o*, gpt-4.1*, and gpt-4.5* prefixes listed above), gpt-3.5*, and all unrecognised model IDs
The second return value is false when the model was not recognised and the fallback encoding (cl100k_base) was returned.
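The mapping rules above can be illustrated with a minimal sketch of ordered prefix matching. This is not the package's actual implementation: the names encodingForModelSketch, prefixRule, and rules are hypothetical, and checking the rules in slice order is an assumption. The sketch only shows why the more specific gpt-4o, gpt-4.1, and gpt-4.5 prefixes need to be tested before the broader gpt-4 prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding names as listed in the Constants section.
const (
	encodingCL100K  = "cl100k_base"
	encodingO200K   = "o200k_base"
	encodingMinimax = "minimax_bpe"
)

// prefixRule pairs a model ID prefix with an encoding name (hypothetical type).
type prefixRule struct {
	prefix   string
	encoding string
}

// rules is scanned in order, so specific prefixes such as "gpt-4o" must
// appear before the broader "gpt-4" prefix. The ordering is an assumption
// of this sketch, not a documented detail of the package.
var rules = []prefixRule{
	{"minimax-", encodingMinimax},
	{"MiniMax-", encodingMinimax},
	{"gpt-4o", encodingO200K},
	{"gpt-4.1", encodingO200K},
	{"gpt-4.5", encodingO200K},
	{"o1", encodingO200K},
	{"o3", encodingO200K},
	{"o4", encodingO200K},
	{"claude-", encodingCL100K},
	{"gpt-4", encodingCL100K},
	{"gpt-3.5", encodingCL100K},
}

// encodingForModelSketch returns the encoding for the first matching prefix.
// The boolean is false when no prefix matched and the cl100k_base fallback
// was returned.
func encodingForModelSketch(model string) (string, bool) {
	for _, r := range rules {
		if strings.HasPrefix(model, r.prefix) {
			return r.encoding, true
		}
	}
	return encodingCL100K, false
}

func main() {
	// Example model IDs chosen for illustration only.
	for _, m := range []string{"gpt-4o-mini", "gpt-4-turbo", "MiniMax-M1", "some-unknown-model"} {
		enc, ok := encodingForModelSketch(m)
		fmt.Printf("%-20s %-12s recognised=%v\n", m, enc, ok)
	}
}
```

Callers that care about accuracy can use the boolean to log or flag estimates that rely on the fallback encoding.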
func RegisterEncoding ¶ added in v0.26.0
RegisterEncoding registers a custom CountText implementation for the given encoding name. It is called from provider init() functions to wire in tokenizers that live outside the tokencount package, avoiding import cycles.
Registering the same name twice panics to catch accidental double-registration.
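The registration pattern described above might look roughly like the following sketch. The real RegisterEncoding signature is not shown in this documentation, so the countFunc type, the function names registerEncodingSketch and countTextSketch, and the mutex-guarded map are all assumptions made for illustration. In a real provider package the registration call would sit inside an init() function; here it runs from main so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countFunc is an assumed shape for a custom tokenizer: text in, token count out.
type countFunc func(text string) (int, error)

var (
	mu       sync.RWMutex
	registry = map[string]countFunc{}
)

// registerEncodingSketch stores fn under name and panics if the name is
// already taken, mirroring the documented double-registration behaviour.
func registerEncodingSketch(name string, fn countFunc) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic(fmt.Sprintf("encoding %q registered twice", name))
	}
	registry[name] = fn
}

// countTextSketch looks up a registered encoding and delegates to it.
func countTextSketch(name, text string) (int, error) {
	mu.RLock()
	fn, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return 0, fmt.Errorf("unknown encoding %q", name)
	}
	return fn(text)
}

// registrationPanics reports whether registering name panics, which
// happens only when the name is already present in the registry.
func registrationPanics(name string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	registerEncodingSketch(name, func(string) (int, error) { return 0, nil })
	return false
}

func main() {
	// A toy stand-in for a real tokenizer: counts whitespace-separated words.
	registerEncodingSketch("toy_words", func(text string) (int, error) {
		return len(strings.Fields(text)), nil
	})

	n, err := countTextSketch("toy_words", "three short words")
	fmt.Println(n, err)

	// A second registration under the same name triggers the panic path.
	fmt.Println(registrationPanics("toy_words"))
}
```

Because the registry package never imports the provider, the dependency runs in one direction only (provider imports tokencount), which is what breaks the import cycle.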
Types ¶
This section is empty.