tokeniser

package
v1.0.2
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: EUPL-1.2 Imports: 4 Imported by: 0

Documentation


Constants

const (
	EncodingCl100kBase = "cl100k_base" // GPT-4, GPT-3.5-turbo, embeddings
	EncodingO200kBase  = "o200k_base"  // GPT-4o, GPT-4.1, GPT-4.5
	EncodingP50kBase   = "p50k_base"   // Codex models, davinci-002, davinci-003
	EncodingR50kBase   = "r50k_base"   // GPT-3 models
)

Encoding names accepted by tiktoken for common model families.

Variables

This section is empty.

Functions

func CountTokens

func CountTokens(text string, model string) (int, error)

CountTokens returns the number of tokens in text for the given model.

func EncodingForModel

func EncodingForModel(model string) (string, error)

EncodingForModel maps a model name to the nearest tiktoken encoding. Unknown model families fall back to cl100k_base.
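The package source is not shown here, so the following is only a minimal sketch of how a prefix-based lookup with a cl100k_base fallback could work. The lowercase names (encodingForModel, prefixRule, prefixRules), the specific prefixes, and the choice to return an error for an empty model name are all assumptions made for illustration, not the package's actual implementation.

```go
package main

import (
	"errors"
	"strings"
)

// Encoding names, copied from the package constants.
const (
	encodingCl100kBase = "cl100k_base"
	encodingO200kBase  = "o200k_base"
	encodingP50kBase   = "p50k_base"
	encodingR50kBase   = "r50k_base"
)

// prefixRule pairs a model-name prefix with an encoding name.
// Hypothetical type, introduced only for this sketch.
type prefixRule struct {
	prefix   string
	encoding string
}

// prefixRules is an illustrative, incomplete table. Order matters:
// more specific prefixes ("gpt-4o") must come before shorter ones ("gpt-4").
var prefixRules = []prefixRule{
	{"gpt-4o", encodingO200kBase},
	{"gpt-4.1", encodingO200kBase},
	{"gpt-4.5", encodingO200kBase},
	{"gpt-4", encodingCl100kBase},
	{"gpt-3.5-turbo", encodingCl100kBase},
	{"text-embedding-", encodingCl100kBase},
	{"text-davinci-002", encodingP50kBase},
	{"text-davinci-003", encodingP50kBase},
	{"code-", encodingP50kBase},
	{"text-curie-", encodingR50kBase},
	{"text-babbage-", encodingR50kBase},
	{"text-ada-", encodingR50kBase},
}

// encodingForModel returns the encoding of the first matching prefix and
// falls back to cl100k_base for unknown model families.
// Assumption: an empty model name is reported as an error.
func encodingForModel(model string) (string, error) {
	name := strings.ToLower(strings.TrimSpace(model))
	if name == "" {
		return "", errors.New("model name is empty")
	}
	for _, rule := range prefixRules {
		if strings.HasPrefix(name, rule.prefix) {
			return rule.encoding, nil
		}
	}
	return encodingCl100kBase, nil
}
```

Under these assumptions, encodingForModel("gpt-4o-mini") yields o200k_base, while an unrecognised name such as "some-unknown-model" yields cl100k_base with a nil error. Because the fallback is silent, a caller that needs exact counts for a new model family may want to check the returned encoding rather than rely on it.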

func GetEncoding

func GetEncoding(encodingName string) (*tiktoken.Tiktoken, error)

GetEncoding returns a cached Tiktoken encoding, initialising it if necessary.
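The caching behaviour described above can be illustrated with a mutex-guarded map. This is a sketch under stated assumptions, not the package's code: the encoding type stands in for *tiktoken.Tiktoken, loadEncoding stands in for whatever expensive initialisation the real package performs, and loadCount exists only to make the caching observable.

```go
package main

import (
	"errors"
	"sync"
)

// encoding is a hypothetical stand-in for *tiktoken.Tiktoken.
type encoding struct {
	name string
}

// loadCount records how many times loadEncoding ran (illustration only).
// It is only modified while cacheMu is held.
var loadCount int

// loadEncoding is a hypothetical stand-in for the expensive initialisation.
func loadEncoding(name string) (*encoding, error) {
	loadCount++
	if name == "" {
		return nil, errors.New("encoding name is empty")
	}
	return &encoding{name: name}, nil
}

var (
	cacheMu sync.Mutex
	cache   = map[string]*encoding{}
)

// getEncoding returns the cached encoding for name, initialising it on
// first use. Failed loads are not cached, so a later call retries.
func getEncoding(name string) (*encoding, error) {
	cacheMu.Lock()
	defer cacheMu.Unlock()

	if enc, ok := cache[name]; ok {
		return enc, nil
	}
	enc, err := loadEncoding(name)
	if err != nil {
		return nil, err
	}
	cache[name] = enc
	return enc, nil
}
```

In this sketch, two calls with the same name return the same pointer and trigger a single load. Holding one lock across the load keeps the code simple at the cost of serialising first-time initialisation of different encodings; whether the real package makes the same trade-off is not stated in this documentation.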

Types

This section is empty.
