llm

package
v0.9.1 Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 14, 2025 License: MIT Imports: 31 Imported by: 8

Documentation

Index

Constants

This section is empty.

Variables

View Source
var LlamaServerSysProcAttr = &syscall.SysProcAttr{}

Functions

func LoadModel added in v0.1.33

func LoadModel(model string, maxArraySize int) (*ggml.GGML, error)

LoadModel will load a model from disk. The model must be in the GGML format.

It collects array values for arrays with a size less than or equal to maxArraySize. If maxArraySize is 0, the default value of 1024 is used. If the maxArraySize is negative, all arrays are collected.

func PredictServerFit added in v0.1.33

func PredictServerFit(allGpus discover.GpuInfoList, f *ggml.GGML, adapters, projectors []string, opts api.Options, numParallel int) (bool, uint64)

This algorithm looks for a complete fit to determine if we need to unload other models.

Types

type CompletionRequest added in v0.1.32

type CompletionRequest struct {
	Prompt  string
	Format  json.RawMessage
	Images  []ImageData
	Options *api.Options

	Grammar string // set before sending the request to the subprocess
}

type CompletionResponse added in v0.1.32

type CompletionResponse struct {
	Content            string        `json:"content"`
	DoneReason         DoneReason    `json:"done_reason"`
	Done               bool          `json:"done"`
	PromptEvalCount    int           `json:"prompt_eval_count"`
	PromptEvalDuration time.Duration `json:"prompt_eval_duration"`
	EvalCount          int           `json:"eval_count"`
	EvalDuration       time.Duration `json:"eval_duration"`
}

type DetokenizeRequest

type DetokenizeRequest struct {
	Tokens []int `json:"tokens"`
}

type DetokenizeResponse

type DetokenizeResponse struct {
	Content string `json:"content"`
}

type DoneReason added in v0.6.5

type DoneReason int

DoneReason represents the reason why a completion response is done.

const (
	// DoneReasonStop indicates the completion stopped naturally
	DoneReasonStop DoneReason = iota
	// DoneReasonLength indicates the completion stopped due to length limits
	DoneReasonLength
	// DoneReasonConnectionClosed indicates the completion stopped due to the connection being closed
	DoneReasonConnectionClosed
)

func (DoneReason) String added in v0.6.5

func (d DoneReason) String() string

type EmbeddingRequest

type EmbeddingRequest struct {
	Content string `json:"content"`
}

type EmbeddingResponse

type EmbeddingResponse struct {
	Embedding []float32 `json:"embedding"`
}

type ImageData

type ImageData struct {
	Data []byte `json:"data"`
	ID   int    `json:"id"`
}

type LlamaServer added in v0.1.32

type LlamaServer interface {
	Ping(ctx context.Context) error
	WaitUntilRunning(ctx context.Context) error
	Completion(ctx context.Context, req CompletionRequest, fn func(CompletionResponse)) error
	Embedding(ctx context.Context, input string) ([]float32, error)
	Tokenize(ctx context.Context, content string) ([]int, error)
	Detokenize(ctx context.Context, tokens []int) (string, error)
	Close() error
	EstimatedVRAM() uint64 // Total VRAM across all GPUs
	EstimatedTotal() uint64
	EstimatedVRAMByGPU(gpuID string) uint64
	Pid() int
}

func NewLlamaServer added in v0.1.32

func NewLlamaServer(gpus discover.GpuInfoList, modelPath string, f *ggml.GGML, adapters, projectors []string, opts api.Options, numParallel int) (LlamaServer, error)

NewLlamaServer will run a server for the given GPUs. The GPU list must be a single family.

type MemoryEstimate added in v0.1.45

type MemoryEstimate struct {
	// How many layers we predict we can load
	Layers int

	// The size of the graph which occupies the main GPU
	Graph uint64

	// How much VRAM will be allocated given the number of layers we predict
	VRAMSize uint64

	// The total size of the model if loaded into VRAM.  If all layers are loaded, VRAMSize == TotalSize
	TotalSize uint64

	// For multi-GPU scenarios, this provides the tensor split parameter
	TensorSplit string

	// For multi-GPU scenarios, this is the size in bytes per GPU
	GPUSizes []uint64
	// contains filtered or unexported fields
}

func EstimateGPULayers added in v0.1.33

func EstimateGPULayers(gpus []discover.GpuInfo, f *ggml.GGML, projectors []string, opts api.Options, numParallel int) MemoryEstimate

Given a model and one or more GPU targets, predict how many layers and bytes we can load, and the total size. The GPUs provided must all be the same library.

func (MemoryEstimate) LogValue added in v0.5.12

func (m MemoryEstimate) LogValue() slog.Value

type ServerStatus added in v0.1.32

type ServerStatus int
const (
	ServerStatusReady ServerStatus = iota
	ServerStatusNoSlotsAvailable
	ServerStatusLoadingModel
	ServerStatusNotResponding
	ServerStatusError
)

func (ServerStatus) String added in v0.6.2

func (s ServerStatus) String() string

type ServerStatusResponse added in v0.6.2

type ServerStatusResponse struct {
	Status   ServerStatus `json:"status"`
	Progress float32      `json:"progress"`
}

type StatusWriter added in v0.1.32

type StatusWriter struct {
	LastErrMsg string
	// contains filtered or unexported fields
}

StatusWriter is a writer that captures error messages from the llama runner process.

func NewStatusWriter added in v0.1.32

func NewStatusWriter(out *os.File) *StatusWriter

func (*StatusWriter) Write added in v0.1.32

func (w *StatusWriter) Write(b []byte) (int, error)

type TokenizeRequest

type TokenizeRequest struct {
	Content string `json:"content"`
}

type TokenizeResponse

type TokenizeResponse struct {
	Tokens []int `json:"tokens"`
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL