go-llm-api

Version: v0.0.0-...-48005d2 | Published: May 12, 2026 | License: MIT

Repository

github.com/tiagomelo/go-llm-api

A small Go REST API that wraps a local Ollama instance and exposes it through a clean, versioned HTTP interface — including a token-by-token streaming endpoint over Server-Sent Events.

Built on top of tiagomelo/go-templates/example-rest-api, which provides the routing, structured logging, middleware, graceful shutdown, and Swagger plumbing.

Walkthrough article: Building a streaming LLM API in Go with Ollama — and watching it run from a SwiftUI iOS app

Architecture

┌────────────────┐      HTTP / SSE       ┌──────────────┐      HTTP        ┌──────────┐
│  SwiftUI app   │  ───────────────────► │   Go API     │ ───────────────► │  Ollama  │
│  (iOS / iPad)  │  ◄─────────────────── │  (this repo) │ ◄─────────────── │ (Docker) │
└────────────────┘  data: {"response":…} └──────────────┘  ndjson chunks   └──────────┘

The Go API is the only thing that talks to Ollama. The iOS app talks to the Go API. Streaming flows end-to-end: as Ollama emits each token, the Go server forwards it as an SSE frame, and the iOS app appends it to the screen as it arrives.

API endpoints

Method	Path	Description
GET	`/api/v1/models`	List locally available models
POST	`/api/v1/generate`	Non-streaming generation (single JSON response)
POST	`/api/v1/generate/stream`	Token-by-token generation over Server-Sent Events

Prerequisites

Go 1.26+
Docker + Docker Compose (for Ollama and Swagger UI)
For the iOS app (optional): macOS, Xcode 15+, and xcodegen. See mobile/README.md.

Configuration

All runtime configuration lives in .env at the repo root:

# Ollama Docker Configuration
OLLAMA_CONTAINER_NAME=ollama
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL_NAME=llama3.2:1b
DOCKER_NETWORK_NAME=ollama_network

# Ollama HTTP Client Configuration
OLLAMA_HTTP_CLIENT_TIMEOUT_SECONDS=30
OLLAMA_HTTP_CLIENT_KEEP_ALIVE_SECONDS=30
OLLAMA_HTTP_CLIENT_IDLE_CONN_TIMEOUT_SECONDS=90
OLLAMA_HTTP_CLIENT_TLS_HANDSHAKE_TIMEOUT_SECONDS=10
OLLAMA_HTTP_CLIENT_EXPECT_CONTINUE_TIMEOUT_SECONDS=1

# Go LLM API Configuration
GO_LLM_API_PORT=4000

Note: the HTTP client timeout is applied to dial and response-header phases only — not the body — so streaming responses are not bounded by OLLAMA_HTTP_CLIENT_TIMEOUT_SECONDS. Per-request lifetime is controlled by the caller's context.Context.

Quickstart

make run-ollama      # bring up Ollama in Docker
make download-model  # pull OLLAMA_MODEL_NAME inside the container
make run-api         # start the Go API on :4000

make run-api also makes sure Ollama is reachable before booting the API, so in practice you can just run that one target — it brings up Ollama if it isn't already running and pulls the model if needed.

Sanity-check the streaming endpoint with curl:

curl -N -s \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:4000/api/v1/generate/stream \
  -d '{"model":"llama3.2:1b","prompt":"say hello"}'

You should see frames trickle in one at a time. The -N (--no-buffer) flag is essential: without it, curl buffers its output and you may see the whole response at once.

Project structure

.
├── cmd/                      # Application entry point
│   └── main.go
├── config/                   # Env-based config loader
├── ollama/                   # Ollama client (Models, Generate, GenerateStream)
│   ├── ollama.go
│   ├── http.go               # http.Client factory tuned for streaming
│   └── ollama_test.go
├── handlers/                 # HTTP layer
│   ├── handlers.go
│   └── v1/
│       ├── v1.go             # v1 router + middleware wiring
│       ├── v1_test.go
│       └── ollama/           # Ollama-specific HTTP handlers
│           ├── ollama.go
│           ├── request.go
│           └── ollama_test.go
├── middleware/               # Logger, Compress (SSE-aware), PanicRecovery
├── validate/                 # Request validation helpers
├── web/                      # JSON request/response helpers
├── doc/                      # Swagger annotations + generated spec
├── docker-compose.yml        # Ollama service
├── Makefile
└── .env

Make targets

Target	Description
`help`	Show all available targets
`test`	Run unit tests with the race detector
`coverage`	Generate `coverage.html` (no `-race`, so `covermode=set` works)
`run-ollama`	Start the Ollama container
`stop-ollama`	Stop the Ollama container
`check-ollama`	Verify Ollama is reachable at `${OLLAMA_HOST}:${OLLAMA_PORT}`
`download-model`	Pull `${OLLAMA_MODEL_NAME}` inside the Ollama container
`run-api`	Start the Go API on `${GO_LLM_API_PORT}`
`swagger`	Regenerate the OpenAPI spec from code annotations
`swagger-ui`	Launch Swagger UI in Docker on port 80

Middleware

Applied to all v1 routes:

Logger — structured JSON logging of method, path, remote address, and duration via slog.
Compress — gzip for normal responses, bypassed for Accept: text/event-stream so SSE frames are not buffered by the compressor.
PanicRecovery — recovers from panics and logs stack traces.

Graceful shutdown

The server listens for SIGINT and SIGTERM and gives in-flight requests up to 5 seconds to complete before shutting down.

Testing

make test       # unit tests, race detector on
make coverage   # writes coverage.html (open in a browser)

make test runs with -race, which forces covermode=atomic. In that mode the HTML report shades covered lines by how often they ran, and the grey tint of rarely executed lines is easy to misread as uncovered. make coverage runs without -race and pins -covermode=set, so the HTML report is binary red/green.

Swagger documentation

make swagger      # regenerate doc/swagger.json from code annotations
make swagger-ui   # launch Swagger UI in Docker on port 80

Then open http://localhost for an interactive playground covering all three endpoints — including the streaming one.

License

MIT — see LICENSE.

Directories

Path	Synopsis
cmd	
config	Package config provides the configuration struct and related functions for the application.
doc	Go LLM API
handlers	Package handlers contains the HTTP handlers for the application.
handlers/v1	Package v1 provides version 1 of the API handlers.
handlers/v1/ollama	Package ollama provides handlers for interacting with the Ollama API.
middleware	Package middleware provides common HTTP middleware functions.
ollama	Package ollama provides a client for interacting with the Ollama API, allowing you to manage models and generate responses from them.
validate	Package validate provides functions for validating input data and ensuring it meets the required criteria.
web	Package web provides a simple web server for serving static files and handling HTTP requests.
