llama

package
v1.4.0
Published: Apr 28, 2023 License: MIT Imports: 16 Imported by: 1

Documentation

Index

Constants

View Source
const (
	LLAMA_FILE_VERSION           = 1
	LLAMA_FILE_MAGIC             = 0x67676a74 // 'ggjt' in hex
	LLAMA_FILE_MAGIC_OLD         = 0x67676d66 // 'ggmf' in hex
	LLAMA_FILE_MAGIC_UNVERSIONED = 0x67676d6c // 'ggml' pre-versioned files
)

Variables

This section is empty.

Functions

func Colorize

func Colorize(format string, opts ...interface{}) (n int, err error)

Colorize prints colored text to the console.
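The package's implementation isn't shown here, but console colorization of this kind is usually done with ANSI escape codes. The sketch below (the `colorize` helper and its color table are illustrative, not this package's API) shows the idea:

```go
package main

import "fmt"

// ansi maps a color name to its ANSI SGR escape sequence.
var ansi = map[string]string{
	"red":   "\x1b[31m",
	"green": "\x1b[32m",
	"reset": "\x1b[0m",
}

// colorize wraps the formatted text in the given ANSI color, prints it
// to stdout, and returns fmt.Printf's byte count and error.
func colorize(color, format string, args ...interface{}) (int, error) {
	return fmt.Printf(ansi[color]+format+ansi["reset"], args...)
}

func main() {
	colorize("green", "loaded %d layers\n", 32)
}
```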

func Eval

func Eval(
	lctx *Context,
	vocab *ml.Vocab,
	model *Model,
	tokens []uint32,
	pastCount uint32,
	params *ModelParams,
) error

Eval runs one inference iteration over the LLaMA model.

lctx = model context with all LLaMA data
tokens = new batch of tokens to process
pastCount = the context size so far
params = all other parameters, like max threads allowed, etc.

func ExtractTokens

func ExtractTokens(r *ring.Ring, count int) []uint32

ExtractTokens extracts a slice of count tokens from the ring buffer.

func Resize

func Resize(slice []float32, size int) []float32

Resize is a safe replacement for C++ std::vector::resize(): https://go.dev/play/p/VlQ7N75E5AD
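A plausible implementation with std::vector::resize semantics — preserve existing elements, zero-fill any growth — could look like the sketch below (an assumption based on the doc comment, not the package's exact code):

```go
package main

import "fmt"

// resizeSlice returns a slice of exactly size elements: existing values
// are copied over and any extra elements are zero-initialized, mirroring
// C++ std::vector::resize. The input slice is left untouched.
func resizeSlice(slice []float32, size int) []float32 {
	out := make([]float32, size) // zero-filled by make
	copy(out, slice)             // copies min(len(slice), size) elements
	return out
}

func main() {
	a := []float32{1, 2, 3}
	fmt.Println(resizeSlice(a, 5)) // [1 2 3 0 0]
	fmt.Println(resizeSlice(a, 2)) // [1 2]
}
```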

func ResizeInplace

func ResizeInplace(slice *[]float32, size int)

NB! ResizeInplace does not clear the underlying array when resizing: https://go.dev/play/p/DbK4dFqwrZn
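The warning matters because reslicing within capacity in Go exposes whatever values are already sitting in the backing array. A minimal sketch of that behavior (hypothetical code, not the package's actual implementation):

```go
package main

import "fmt"

// resizeInplaceSketch grows or shrinks *slice to size. When capacity
// allows, it only reslices, so previously written (stale) values in the
// underlying array become visible again instead of being zeroed.
func resizeInplaceSketch(slice *[]float32, size int) {
	if size <= cap(*slice) {
		*slice = (*slice)[:size] // no clearing: stale data may reappear
		return
	}
	grown := make([]float32, size)
	copy(grown, *slice)
	*slice = grown
}

func main() {
	s := make([]float32, 0, 4)
	s = append(s, 7, 8, 9)
	s = s[:1] // shrink: 8 and 9 still live in the backing array
	resizeInplaceSketch(&s, 3)
	fmt.Println(s) // the stale values are back: [7 8 9]
}
```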

func SampleTopPTopK

func SampleTopPTopK(
	logits []float32,
	lastNTokens *ring.Ring,
	lastNTokensSize uint32,
	topK uint32,
	topP float32,
	temp float32,
	repeatPenalty float32,
) uint32

SampleTopPTopK samples the next token given the logits for each vocabulary entry:

  • consider only the top K tokens
  • from them, consider only the top tokens with cumulative probability > P
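These two filtering steps can be sketched as a standalone function (the name below is hypothetical, the softmax placement is an assumption, and the package's temperature and repeat-penalty handling are omitted):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// topKTopPCandidates returns the indices of tokens that survive
// top-K followed by top-P (nucleus) filtering of the given logits.
func topKTopPCandidates(logits []float32, topK int, topP float32) []int {
	// Sort token indices by logit, highest first.
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return logits[idx[a]] > logits[idx[b]] })

	// Keep only the top K tokens.
	if topK < len(idx) {
		idx = idx[:topK]
	}

	// Softmax over the surviving logits.
	var sum float64
	probs := make([]float64, len(idx))
	for i, id := range idx {
		probs[i] = math.Exp(float64(logits[id]))
		sum += probs[i]
	}

	// Keep tokens until the cumulative probability exceeds topP.
	var cum float64
	keep := 0
	for i := range idx {
		cum += probs[i] / sum
		keep = i + 1
		if cum > float64(topP) {
			break
		}
	}
	return idx[:keep]
}

func main() {
	logits := []float32{1.0, 3.0, 2.0, 0.0}
	fmt.Println(topKTopPCandidates(logits, 3, 0.9)) // [1 2]
}
```

A sampler would then draw the next token from this reduced candidate set according to the renormalized probabilities.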

Types

type Context

type Context struct {
	Logits    []float32 // decode output 2D array [tokensCount][vocabSize]
	Embedding []float32 // input embedding 1D array [embdSize]
	MLContext *ml.Context
	// contains filtered or unexported fields
}

Context is the context of the model.

func NewContext

func NewContext(model *Model, params *ModelParams) *Context

NewContext creates a new context.

func (*Context) ReleaseContext added in v1.4.0

func (ctx *Context) ReleaseContext()

type ContextParams

type ContextParams struct {
	CtxSize    uint32 // text context
	PartsCount int    // -1 for default
	Seed       int    // RNG seed, 0 for random
	LogitsAll  bool   // the llama_eval() call computes all logits, not just the last one
	VocabOnly  bool   // only load the vocabulary, no weights
	UseLock    bool   // force system to keep model in RAM
	Embedding  bool   // embedding mode only
}

ContextParams are the parameters for the context, mirroring the C++ struct llama_context_params.

type HParams

type HParams struct {
	// contains filtered or unexported fields
}

HParams are the hyperparameters of the model (LLaMA-7B commented as example).

type KVCache

type KVCache struct {
	K *ml.Tensor
	V *ml.Tensor

	N uint32 // number of tokens currently in the cache
}

KVCache is a key-value cache for the self attention.

type Layer

type Layer struct {
	// contains filtered or unexported fields
}

Layer is a single layer of the model.

type Model

type Model struct {
	Type ModelType
	// contains filtered or unexported fields
}

Model is the representation of any NN model (and LLaMA too).

func LoadModel

func LoadModel(fileName string, params *ModelParams, silent bool) (*ml.Vocab, *Model, error)

LoadModel loads a model's weights from a file. See convert-pth-to-ggml.py for details on the format.

func NewModel

func NewModel(params *ModelParams) *Model

NewModel creates a new model with default hyperparameters.

type ModelParams added in v1.2.0

type ModelParams struct {
	Model  string // model path
	Prompt string

	MaxThreads int

	UseAVX  bool
	UseNEON bool

	Seed         int
	PredictCount uint32 // new tokens to predict
	RepeatLastN  uint32 // last n tokens to penalize
	PartsCount   int    // amount of model parts (-1 = determine from model dimensions)
	CtxSize      uint32 // context size
	BatchSize    uint32 // batch size for prompt processing
	KeepCount    uint32

	TopK          uint32  // 40
	TopP          float32 // 0.95
	Temp          float32 // 0.80
	RepeatPenalty float32 // 1.10

	InputPrefix string   // string to prefix user inputs with
	Antiprompt  []string // string upon seeing which more user input is prompted

	MemoryFP16   bool // use f16 instead of f32 for memory kv
	RandomPrompt bool // do not randomize prompt if none provided
	UseColor     bool // use color to distinguish generations and inputs
	Interactive  bool // interactive mode

	Embedding        bool // get only sentence embedding
	InteractiveStart bool // wait for user input immediately

	Instruct   bool // instruction mode (used for Alpaca models)
	IgnoreEOS  bool // do not stop generating after eos
	Perplexity bool // compute perplexity over the prompt
	UseMLock   bool // use mlock to keep model in memory
	MemTest    bool // compute maximum memory usage

	VerbosePrompt bool
}

type ModelType

type ModelType uint8

ModelType is the type of the model.

const (
	MODEL_UNKNOWN ModelType = iota
	MODEL_7B
	MODEL_13B
	MODEL_30B
	MODEL_65B
)

Available LLaMA model sizes.
