tokencount

package
v0.28.0
Published: Mar 25, 2026 | License: MIT | Imports: 5 | Imported by: 0

Documentation

Overview

Package tokencount provides a shared offline tiktoken wrapper for LLM token estimation. It maps model IDs to BPE encodings and counts tokens without any network calls, using embedded BPE tables from tiktoken-go-loader.

Custom encodings (e.g. MiniMax BPE) can be registered at init time via RegisterEncoding so that provider packages can wire in their own tokenizers without creating an import cycle (tokencount ← provider ← tokencount).

Constants

const (
	EncodingCL100K = "cl100k_base"
	EncodingO200K  = "o200k_base"
	// EncodingMinimax is the encoding name for the MiniMax BPE tokenizer.
	// The implementation is registered by provider/minimax at init time via
	// RegisterEncoding.
	EncodingMinimax = "minimax_bpe"
)

Variables

This section is empty.

Functions

func CountText

func CountText(encoding, text string) (int, error)

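CountText returns the number of tokens in text using the named BPE encoding. The encoding must be one of the constants in this package (cl100k_base, o200k_base, minimax_bpe) or a name registered via RegisterEncoding. Note that minimax_bpe is only a name in this package: its implementation is registered by provider/minimax at init time, so it is presumably usable only when that package is imported.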

func CountTextForModel

func CountTextForModel(modelID, text string) (int, error)

CountTextForModel is a convenience wrapper that calls EncodingForModel and then CountText.
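The composition described above can be sketched as follows. This is an illustrative stand-in, not the package's source: countTextForModel, stubLookup, and stubCount are hypothetical names, the stub counter counts whitespace-separated words rather than BPE tokens, and the sketch assumes that an unrecognised model is counted with the fallback encoding rather than rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// encodingFor and counter mirror the documented signatures of
// EncodingForModel and CountText so the wrapper can be shown in isolation.
type encodingFor func(modelID string) (encoding string, ok bool)
type counter func(encoding, text string) (int, error)

// countTextForModel is a hypothetical re-creation of the wrapper: look up
// the encoding, then count. The ok flag is discarded on the assumption that
// the fallback encoding is still good enough for an estimate.
func countTextForModel(lookup encodingFor, count counter, modelID, text string) (int, error) {
	enc, _ := lookup(modelID)
	return count(enc, text)
}

// stubLookup is a stand-in for EncodingForModel with a single rule.
func stubLookup(modelID string) (string, bool) {
	if strings.HasPrefix(modelID, "gpt-4o") {
		return "o200k_base", true
	}
	return "cl100k_base", false
}

// stubCount is a stand-in for CountText; it counts words, not BPE tokens.
func stubCount(encoding, text string) (int, error) {
	if encoding == "" {
		return 0, errors.New("empty encoding name")
	}
	return len(strings.Fields(text)), nil
}

func main() {
	n, err := countTextForModel(stubLookup, stubCount, "gpt-4o-mini", "hello token world")
	fmt.Println(n, err)
}
```

Callers that need to know whether the fallback was used would call EncodingForModel and CountText separately and inspect the second return value.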

func EncodingForModel

func EncodingForModel(modelID string) (encoding string, ok bool)

EncodingForModel returns the BPE encoding name appropriate for the given model ID, using prefix matching.

Mappings:

  • minimax_bpe: minimax-*, MiniMax-*
  • o200k_base: gpt-4o*, gpt-4.1*, gpt-4.5*, o1*, o3*, o4*
  • cl100k_base: claude-*, gpt-4* (excluding the gpt-4o*, gpt-4.1*, and gpt-4.5* prefixes above), gpt-3.5*, and all unrecognised model IDs

The second return value is false when the model was not recognised and the fallback encoding (cl100k_base) was returned.
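One way to implement the prefix matching described above is an ordered rule table, sketched below. The names encodingForModel, prefixRule, and rules are hypothetical, and the table is reconstructed from the documented mappings only; the package's actual implementation may differ. The ordering matters: gpt-4o, gpt-4.1, and gpt-4.5 are checked before the more general gpt-4 prefix.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	encodingCL100K  = "cl100k_base"
	encodingO200K   = "o200k_base"
	encodingMinimax = "minimax_bpe"
)

// prefixRule pairs a model-ID prefix with an encoding name.
type prefixRule struct {
	prefix   string
	encoding string
}

// rules is scanned in order, so specific prefixes come before general ones.
var rules = []prefixRule{
	{"minimax-", encodingMinimax},
	{"MiniMax-", encodingMinimax},
	{"gpt-4o", encodingO200K},
	{"gpt-4.1", encodingO200K},
	{"gpt-4.5", encodingO200K},
	{"o1", encodingO200K},
	{"o3", encodingO200K},
	{"o4", encodingO200K},
	{"claude-", encodingCL100K},
	{"gpt-4", encodingCL100K},
	{"gpt-3.5", encodingCL100K},
}

// encodingForModel returns the first matching encoding, or the cl100k_base
// fallback with ok == false when no prefix matches.
func encodingForModel(modelID string) (encoding string, ok bool) {
	for _, r := range rules {
		if strings.HasPrefix(modelID, r.prefix) {
			return r.encoding, true
		}
	}
	return encodingCL100K, false
}

func main() {
	// The model IDs below are examples chosen for illustration.
	for _, id := range []string{"gpt-4o-mini", "gpt-4-turbo", "minimax-demo", "mystery-model"} {
		enc, ok := encodingForModel(id)
		fmt.Printf("%s -> %s (recognised: %v)\n", id, enc, ok)
	}
}
```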

func RegisterEncoding added in v0.26.0

func RegisterEncoding(name string, fn func(text string) (int, error))

RegisterEncoding registers a custom CountText implementation for the given encoding name. It is called from provider init() functions to wire in tokenizers that live outside the tokencount package, avoiding import cycles.

Registering the same name twice panics to catch accidental double-registration.
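The registration pattern can be illustrated with a small self-contained registry. This is a sketch under stated assumptions, not the package's code: registerEncoding, countText, the mutex-guarded map, and the demo_words encoding are all hypothetical, and the word-counting function stands in for a real BPE tokenizer. In real use, a provider package would call RegisterEncoding from its own init function instead.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

var (
	mu       sync.RWMutex
	registry = map[string]func(text string) (int, error){}
)

// registerEncoding stores fn under name and panics on a duplicate name,
// mirroring the documented double-registration behaviour.
func registerEncoding(name string, fn func(text string) (int, error)) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("encoding registered twice: " + name)
	}
	registry[name] = fn
}

// countText dispatches to the registered implementation for encoding.
// Returning an error for an unknown name is an assumption of this sketch.
func countText(encoding, text string) (int, error) {
	mu.RLock()
	fn, ok := registry[encoding]
	mu.RUnlock()
	if !ok {
		return 0, fmt.Errorf("unknown encoding %q", encoding)
	}
	return fn(text)
}

func init() {
	// A whitespace word count stands in for a provider's tokenizer.
	registerEncoding("demo_words", func(text string) (int, error) {
		return len(strings.Fields(text)), nil
	})
}

func main() {
	n, err := countText("demo_words", "three short words")
	fmt.Println(n, err)
}
```

Because the registry only stores function values, the registering package imports the registry package and never the reverse, which is how the import cycle is avoided.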

Types

This section is empty.
