package tokencount

Version: v0.28.0 (latest) | Published: Mar 25, 2026 | License: MIT | Imports: 5 | Imported by: 0


Repository

github.com/codewandler/llm


Documentation

Overview

Package tokencount provides a shared offline tiktoken wrapper for LLM token estimation. It maps model IDs to BPE encodings and counts tokens without any network calls, using embedded BPE tables from tiktoken-go-loader.

Custom encodings (e.g. MiniMax BPE) can be registered at init time via RegisterEncoding so that provider packages can wire in their own tokenizers without creating an import cycle (tokencount ← provider ← tokencount).

Index

Constants
func CountText(encoding, text string) (int, error)
func CountTextForModel(modelID, text string) (int, error)
func EncodingForModel(modelID string) (encoding string, ok bool)
func RegisterEncoding(name string, fn func(text string) (int, error))

Constants


const (
	EncodingCL100K = "cl100k_base"
	EncodingO200K  = "o200k_base"
	// EncodingMinimax is the encoding name for the MiniMax BPE tokenizer.
	// The implementation is registered by provider/minimax at init time via
	// RegisterEncoding.
	EncodingMinimax = "minimax_bpe"
)

Variables

This section is empty.

Functions

func CountText

func CountText(encoding, text string) (int, error)

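CountText returns the number of tokens in text using the named BPE encoding. The encoding must be one of the constants in this package (cl100k_base, o200k_base, minimax_bpe) or a name registered via RegisterEncoding. Note that minimax_bpe is only usable after provider/minimax has registered its implementation at init time.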
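The name resolution CountText describes can be sketched as a lookup across two tables. This is a minimal illustration under assumptions, not the package's implementation: countText, builtins, custom and errUnknownEncoding are hypothetical names; custom registrations are assumed to be checked before the built-ins; unknown names are assumed to produce an error; and wordCount is a whitespace-splitting stand-in for the real tiktoken counters, so its numbers are not tiktoken token counts.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// counter counts tokens in text for a single encoding.
type counter func(text string) (int, error)

// wordCount is a stand-in for a real BPE counter: it counts
// whitespace-separated fields, which is NOT how tiktoken tokenizes.
func wordCount(text string) (int, error) {
	return len(strings.Fields(text)), nil
}

// builtins stands in for the embedded tiktoken encodings (hypothetical).
var builtins = map[string]counter{
	"cl100k_base": wordCount,
	"o200k_base":  wordCount,
}

// custom stands in for encodings added through a RegisterEncoding-style hook.
var custom = map[string]counter{}

// errUnknownEncoding is returned for names that are neither built in nor registered.
var errUnknownEncoding = errors.New("unknown encoding")

// countText resolves encoding to a counter and applies it to text.
// Assumption: custom registrations take precedence over built-ins.
func countText(encoding, text string) (int, error) {
	if fn, ok := custom[encoding]; ok {
		return fn(text)
	}
	if fn, ok := builtins[encoding]; ok {
		return fn(text)
	}
	return 0, fmt.Errorf("%w: %q", errUnknownEncoding, encoding)
}

func main() {
	n, err := countText("cl100k_base", "hello token world")
	fmt.Println(n, err) // 3 <nil>

	// minimax_bpe fails here because nothing has registered it yet.
	_, err = countText("minimax_bpe", "hi")
	fmt.Println(err) // unknown encoding: "minimax_bpe"
}
```

The sketch mirrors the documented constraint: a name such as minimax_bpe only resolves once something has added it to the custom table.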

func CountTextForModel

func CountTextForModel(modelID, text string) (int, error)

CountTextForModel is a convenience wrapper that calls EncodingForModel and then CountText.
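The two-step composition can be pictured as follows. This is a hedged sketch with hypothetical names: encodingForModel knows only the gpt-4o prefix, countText splits on whitespace instead of running BPE, and discarding the ok flag is an assumption about how the wrapper treats unrecognised models, chosen to match the documented cl100k_base fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// encodingForModel is a trimmed stand-in for EncodingForModel: it only knows
// the gpt-4o prefix and otherwise falls back to cl100k_base with ok=false.
func encodingForModel(modelID string) (string, bool) {
	if strings.HasPrefix(modelID, "gpt-4o") {
		return "o200k_base", true
	}
	return "cl100k_base", false
}

// countText is a stand-in counter that counts whitespace-separated fields
// for the two built-in encoding names and rejects anything else.
func countText(encoding, text string) (int, error) {
	if encoding != "cl100k_base" && encoding != "o200k_base" {
		return 0, fmt.Errorf("unknown encoding %q", encoding)
	}
	return len(strings.Fields(text)), nil
}

// countTextForModel composes the two steps. The ok flag is discarded
// (assumption), so an unrecognised model is counted with the fallback.
func countTextForModel(modelID, text string) (int, error) {
	enc, _ := encodingForModel(modelID)
	return countText(enc, text)
}

func main() {
	n, err := countTextForModel("gpt-4o-mini", "count these four words")
	fmt.Println(n, err) // 4 <nil>

	n, err = countTextForModel("some-new-model", "two words")
	fmt.Println(n, err) // 2 <nil>
}
```

If a caller needs to know that a fallback happened, calling EncodingForModel directly and inspecting ok before calling CountText is the safer pattern.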

func EncodingForModel

func EncodingForModel(modelID string) (encoding string, ok bool)

EncodingForModel returns the BPE encoding name appropriate for the given model ID, using prefix matching.

Mappings:

minimax_bpe: minimax-*, MiniMax-*
o200k_base: gpt-4o*, gpt-4.1*, gpt-4.5*, o1*, o3*, o4*
cl100k_base: claude-*, other gpt-4* models (those not matched by the o200k_base prefixes above), gpt-3.5*, and all unknown models

The second return value is false when the model was not recognised and the fallback encoding (cl100k_base) was returned.
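The prefix rules above can be expressed as an ordered table. In this sketch, encodingForModel, prefixRule and rules are hypothetical names, and the check order is an assumption: the o200k_base prefixes gpt-4o, gpt-4.1 and gpt-4.5 must be tested before the bare gpt-4 prefix, or they would be classified as cl100k_base. It also assumes that a claude-*, gpt-4* or gpt-3.5* match reports ok=true, since only unrecognised models are documented as returning false.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding names, mirroring the constants documented by tokencount.
const (
	encCL100K  = "cl100k_base"
	encO200K   = "o200k_base"
	encMinimax = "minimax_bpe"
)

// prefixRule pairs a model-ID prefix with the encoding it selects.
type prefixRule struct {
	prefix   string
	encoding string
}

// rules is checked in order, so the more specific gpt-4o/gpt-4.1/gpt-4.5
// prefixes come before the generic gpt-4 prefix.
var rules = []prefixRule{
	{"minimax-", encMinimax},
	{"MiniMax-", encMinimax},
	{"gpt-4o", encO200K},
	{"gpt-4.1", encO200K},
	{"gpt-4.5", encO200K},
	{"o1", encO200K},
	{"o3", encO200K},
	{"o4", encO200K},
	{"claude-", encCL100K},
	{"gpt-4", encCL100K},
	{"gpt-3.5", encCL100K},
}

// encodingForModel returns the encoding for modelID and whether the model
// matched a known prefix; unknown models fall back to cl100k_base.
func encodingForModel(modelID string) (string, bool) {
	for _, r := range rules {
		if strings.HasPrefix(modelID, r.prefix) {
			return r.encoding, true
		}
	}
	return encCL100K, false
}

func main() {
	for _, id := range []string{"gpt-4o-mini", "gpt-4-turbo", "o3-mini", "MiniMax-M1", "llama-3"} {
		enc, ok := encodingForModel(id)
		fmt.Printf("%-12s %-12s %v\n", id, enc, ok)
	}
}
```

An ordered slice, rather than a map, keeps the longest-prefix-first requirement explicit and easy to audit.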

func RegisterEncoding (added in v0.26.0)

func RegisterEncoding(name string, fn func(text string) (int, error))

RegisterEncoding registers a custom CountText implementation for the given encoding name. It is called from provider init() functions to wire in tokenizers that live outside the tokencount package, avoiding import cycles.

Registering the same name twice panics to catch accidental double-registration.
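The double-registration rule can be sketched as a mutex-guarded map that panics on a repeated name. This is a minimal illustration under assumptions, not the package's code: registerEncoding, lookupEncoding, counterFunc and didPanic are hypothetical names, the mutex is a defensive choice, and the init function stands in for a provider package such as provider/minimax, using a placeholder byte-count counter instead of a real BPE tokenizer.

```go
package main

import (
	"fmt"
	"sync"
)

// counterFunc counts tokens in text for one encoding.
type counterFunc func(text string) (int, error)

var (
	mu       sync.RWMutex
	registry = map[string]counterFunc{}
)

// registerEncoding stores fn under name and panics if name is already taken,
// mirroring the double-registration behaviour documented for RegisterEncoding.
func registerEncoding(name string, fn counterFunc) {
	mu.Lock()
	defer mu.Unlock()
	if _, exists := registry[name]; exists {
		panic(fmt.Sprintf("encoding %q already registered", name))
	}
	registry[name] = fn
}

// lookupEncoding returns the registered counter for name, if any.
func lookupEncoding(name string) (counterFunc, bool) {
	mu.RLock()
	defer mu.RUnlock()
	fn, ok := registry[name]
	return fn, ok
}

// init plays the role of a provider package's init function.
func init() {
	registerEncoding("minimax_bpe", func(text string) (int, error) {
		// Placeholder: one "token" per byte. A real provider would run its BPE here.
		return len(text), nil
	})
}

// didPanic reports whether f panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fn, ok := lookupEncoding("minimax_bpe")
	n, _ := fn("hello")
	fmt.Println(ok, n) // true 5

	// A second registration under the same name panics.
	fmt.Println(didPanic(func() { registerEncoding("minimax_bpe", fn) })) // true
}
```

Panicking rather than returning an error suits init-time registration: a duplicate name is a programming mistake, and an init function has no caller to hand an error back to.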

Types

This section is empty.

Source Files


tiktoken.go
