go-llm-api

Module version: v0.0.0-...-48005d2
Published: May 12, 2026 License: MIT




A small Go REST API that wraps a local Ollama instance and exposes it through a clean, versioned HTTP interface — including a token-by-token streaming endpoint over Server-Sent Events.

Built on top of tiagomelo/go-templates/example-rest-api, which provides the routing, structured logging, middleware, graceful shutdown, and Swagger plumbing.

Walkthrough article: Building a streaming LLM API in Go with Ollama — and watching it run from a SwiftUI iOS app


Architecture

┌────────────────┐      HTTP / SSE       ┌──────────────┐      HTTP        ┌──────────┐
│  SwiftUI app   │  ───────────────────► │   Go API     │ ───────────────► │  Ollama  │
│  (iOS / iPad)  │  ◄─────────────────── │  (this repo) │ ◄─────────────── │ (Docker) │
└────────────────┘  data: {"response":…} └──────────────┘  ndjson chunks   └──────────┘

The Go API is the only thing that talks to Ollama. The iOS app talks to the Go API. Streaming flows end-to-end: as Ollama emits each token, the Go server forwards it as an SSE frame, and the iOS app appends it to the screen as it arrives.


API endpoints

Method  Path                       Description
GET     /api/v1/models             List locally available models
POST    /api/v1/generate           Non-streaming generation (single JSON response)
POST    /api/v1/generate/stream    Token-by-token generation over Server-Sent Events

Prerequisites


Configuration

All runtime configuration lives in .env at the repo root:

# Ollama Docker Configuration
OLLAMA_CONTAINER_NAME=ollama
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL_NAME=llama3.2:1b
DOCKER_NETWORK_NAME=ollama_network

# Ollama HTTP Client Configuration
OLLAMA_HTTP_CLIENT_TIMEOUT_SECONDS=30
OLLAMA_HTTP_CLIENT_KEEP_ALIVE_SECONDS=30
OLLAMA_HTTP_CLIENT_IDLE_CONN_TIMEOUT_SECONDS=90
OLLAMA_HTTP_CLIENT_TLS_HANDSHAKE_TIMEOUT_SECONDS=10
OLLAMA_HTTP_CLIENT_EXPECT_CONTINUE_TIMEOUT_SECONDS=1

# Go LLM API Configuration
GO_LLM_API_PORT=4000

Note: the HTTP client timeout is applied to dial and response-header phases only — not the body — so streaming responses are not bounded by OLLAMA_HTTP_CLIENT_TIMEOUT_SECONDS. Per-request lifetime is controlled by the caller's context.Context.


Quickstart

make run-ollama      # bring up Ollama in Docker
make download-model  # pull OLLAMA_MODEL_NAME inside the container
make run-api         # start the Go API on :4000

make run-api also checks that Ollama is reachable before starting the API: if Ollama isn't running it brings it up, and it pulls the model if needed. In practice, that one target is all you need.

Sanity-check the streaming endpoint with curl:

curl -N -s \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:4000/api/v1/generate/stream \
  -d '{"model":"llama3.2:1b","prompt":"say hello"}'

You should see frames arrive one at a time. The -N flag (--no-buffer) is essential: without it, curl buffers its output and the frames appear all at once.


Project structure

.
├── cmd/                      # Application entry point
│   └── main.go
├── config/                   # Env-based config loader
├── ollama/                   # Ollama client (Models, Generate, GenerateStream)
│   ├── ollama.go
│   ├── http.go               # http.Client factory tuned for streaming
│   └── ollama_test.go
├── handlers/                 # HTTP layer
│   ├── handlers.go
│   └── v1/
│       ├── v1.go             # v1 router + middleware wiring
│       ├── v1_test.go
│       └── ollama/           # Ollama-specific HTTP handlers
│           ├── ollama.go
│           ├── request.go
│           └── ollama_test.go
├── middleware/               # Logger, Compress (SSE-aware), PanicRecovery
├── validate/                 # Request validation helpers
├── web/                      # JSON request/response helpers
├── doc/                      # Swagger annotations + generated spec
├── docker-compose.yml        # Ollama service
├── Makefile
└── .env

Make targets

Target          Description
help            Show all available targets
test            Run unit tests with the race detector
coverage        Generate coverage.html (no -race, so covermode=set works)
run-ollama      Start the Ollama container
stop-ollama     Stop the Ollama container
check-ollama    Verify Ollama is reachable at ${OLLAMA_HOST}:${OLLAMA_PORT}
download-model  Pull ${OLLAMA_MODEL_NAME} inside the Ollama container
run-api         Start the Go API on ${GO_LLM_API_PORT}
swagger         Regenerate the OpenAPI spec from code annotations
swagger-ui      Launch Swagger UI in Docker on port 80

Middleware

Applied to all v1 routes:

  • Logger — structured JSON logging of method, path, remote address, and duration via slog.
  • Compress — gzip for normal responses, bypassed for Accept: text/event-stream so SSE frames are not buffered by the compressor.
  • PanicRecovery — recovers from panics and logs stack traces.

Graceful shutdown

The server listens for SIGINT and SIGTERM and gives in-flight requests up to 5 seconds to complete before shutting down.


Testing

make test       # unit tests, race detector on
make coverage   # writes coverage.html (open in a browser)

make test runs with -race, which forces covermode=atomic — that produces frequency-tinted "grey" coverage that's easy to misread as uncovered. make coverage runs without -race and pins -covermode=set, so the HTML report is binary red/green.


Swagger documentation

make swagger      # regenerate doc/swagger.json from code annotations
make swagger-ui   # launch Swagger UI in Docker on port 80

Then open http://localhost for an interactive playground covering all three endpoints — including the streaming one.


License

MIT — see LICENSE.

Directories

Path Synopsis
Package config provides the configuration struct and related functions for the application.
Go LLM API
Package handlers contains the HTTP handlers for the application.
v1
Package v1 provides version 1 of the API handlers.
v1/ollama
Package ollama provides handlers for interacting with the Ollama API.
Package middleware provides common HTTP middleware functions.
Package ollama provides a client for interacting with the Ollama API, allowing you to manage models and generate responses from them.
Package validate provides functions for validating input data and ensuring it meets the required criteria.
Package web provides a simple web server for serving static files and handling HTTP requests.
