# API Server Example

Start an OpenAI-compatible HTTP inference server powered by Zerfoo.
## Prerequisites

- Go 1.25+
- A GGUF model file (e.g., Gemma 3 1B or Llama 3.2 1B)
### Downloading a test model

```bash
pip install huggingface-hub
huggingface-cli download google/gemma-3-1b-it-qat-q4_0-gguf \
  --local-dir ./models
```
## Build

```bash
go build -o api-server ./examples/api-server/
```
## Run

```bash
./api-server ./models/gemma-3-1b-it-qat-q4_0.gguf
```

With a custom port:

```bash
./api-server -port 9090 ./models/gemma-3-1b-it-qat-q4_0.gguf
```

With GPU acceleration:

```bash
./api-server -device cuda ./models/gemma-3-1b-it-qat-q4_0.gguf
```
## Testing with curl

### Chat completion

```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "temperature": 0.7,
    "max_tokens": 128
  }' | jq .
```
### Text completion

```bash
curl -s http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "prompt": "The capital of France is",
    "max_tokens": 64
  }' | jq .
```
### Streaming

```bash
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Write a haiku about Go."}],
    "stream": true
  }'
```
### List models

```bash
curl -s http://localhost:8080/v1/models | jq .
```
## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | Chat completion (OpenAI-compatible) |
| POST | /v1/completions | Text completion |
| POST | /v1/embeddings | Text embeddings |
| GET | /v1/models | List loaded models |
| GET | /openapi.yaml | OpenAPI specification |
| GET | /metrics | Prometheus metrics |