daimon
The spirit that runs alongside your AI app.

Daimon is a local sidecar process that gives your application a single, stable HTTP interface to any LLM. Swap providers, rotate keys, add tracing, wire up MCP tools, query vector stores, or traverse knowledge graphs — without touching your app code.
Inspired by Dapr's component model, adapted for AI-native primitives: streaming responses, pluggable providers, MCP tool calls, vector/graph stores, and persistent sessions.
How it works
your app ──POST /v1/converse/claude──▶ daimon ──▶ Anthropic API
         ◀── text/event-stream ────────── │
                                          ├──▶ MCP tool server(s)
                                          │    (filesystem, GitHub, ...)
                                          ├──▶ vector stores (Chroma, Qdrant,
                                          │    Redis, pgvector, in-memory)
                                          └──▶ graph stores (Neo4j, Memgraph)
Daimon runs on localhost:3500. Your app speaks plain HTTP + Server-Sent Events. The provider, model, credentials, and tool servers all live in a YAML config — not in your code.
Quick start
Prerequisites: An OpenAI or Anthropic API key.
1 — Install
macOS / Linux — Homebrew
brew tap sonicboom15/tap
brew install daimon
Windows — winget
winget install sonicboom15.daimon
Windows — Scoop
scoop bucket add sonicboom15 https://github.com/sonicboom15/scoop-bucket
scoop install daimon
Linux — apt / rpm
Download the .deb or .rpm from the latest release and install with dpkg -i or rpm -i.
Build from source
git clone https://github.com/sonicboom15/daimon.git && cd daimon && make build
# → ./bin/daimon
2 — Create a config
# config.yaml
port: 3500

components:
  - name: claude
    type: anthropic
    metadata:
      default_model: claude-haiku-4-5-20251001
      # api_key: sk-ant-...   # or set ANTHROPIC_API_KEY

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o-mini
      # api_key: sk-...       # or set OPENAI_API_KEY
3 — Run
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
daimon serve --config config.yaml
INFO daimon listening addr=127.0.0.1:3500
4 — First request
curl:
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is a daimon?"}]}'
data: {"type":"text","text":"In ancient Greek thought, a daimon"}
data: {"type":"text","text":" is a guiding spirit..."}
data: {"type":"done"}
Python SDK:
pip install daimon-client
import daimon_client as daimon
with daimon.Client() as client:
    for text in client.stream("claude", "What is a daimon?"):
        print(text, end="", flush=True)
TypeScript SDK:
npm install daimon-client
import { Client } from 'daimon-client';
const client = new Client();
for await (const text of client.stream('claude', 'What is a daimon?')) {
  process.stdout.write(text);
}
Configuration
port: 3500

components:
  # ── Embedder (declare before vector stores) ──────────────────────────────
  # - name: embedder
  #   type: embedding/openai
  #   metadata:
  #     base_url: http://localhost:11434/v1   # Ollama; omit for OpenAI
  #     model: nomic-embed-text
  #     dimensions: "768"

  # ── Session store (optional; defaults to in-memory) ──────────────────────
  # - name: sessions
  #   type: session/redis
  #   metadata:
  #     addr: localhost:6379
  #     ttl: "24h"

  # ── Vector / document stores ─────────────────────────────────────────────
  # - name: docs
  #   type: inmemory                  # BM25 lexical, no deps — dev/testing only
  #
  # - name: chroma-docs
  #   type: chroma
  #   metadata:
  #     base_url: http://localhost:8000
  #     collection: daimon
  #     create_if_missing: "true"
  #
  # - name: qdrant-docs
  #   type: qdrant
  #   metadata:
  #     base_url: http://localhost:6333
  #     collection: daimon
  #     embedder: embedder
  #     create_if_missing: "true"

  # ── Graph stores ──────────────────────────────────────────────────────────
  # - name: kg
  #   type: neo4j
  #   metadata:
  #     bolt_url: bolt://localhost:7687
  #     username: neo4j
  #     password: secret

  # ── LLM components ────────────────────────────────────────────────────────
  - name: claude
    type: anthropic
    # memory_store: chroma-docs       # enable transparent RAG from a vector store
    metadata:
      default_model: claude-opus-4-7
      # api_key: sk-ant-...           # or set ANTHROPIC_API_KEY
      # defaults:
      #   temperature: 1.0
      #   max_tokens: 4096
      #   top_p: 0.9
      #   top_k: 50                   # Anthropic-specific
      #   stop: ["Human:"]
      #   system: "You are a helpful assistant."

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o
      # api_key: sk-...               # or set OPENAI_API_KEY
      # defaults:
      #   temperature: 0.7
      #   max_tokens: 2048
      #   frequency_penalty: 0.0
      #   presence_penalty: 0.0
      #   seed: 42

  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1    # Ollama default
      # base_url: http://localhost:1234/v1   # LM Studio default
      # base_url: http://localhost:8080/v1   # llama.cpp default
      default_model: llama3.2:3b

# MCP tool servers — daimon connects at startup and injects their tools into
# every chat request automatically. The model can call them; daimon runs the loop.
# mcp_servers:
#   - name: filesystem
#     command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
#   - name: github
#     command: ["npx", "-y", "@modelcontextprotocol/server-github"]

telemetry:
  otlp_endpoint: ""   # e.g. "localhost:4318" — leave empty to disable
All component types — LLMs, embedders, session stores, vector stores, and graph stores — live under components:. Declaration order matters: embedders before vector stores, vector stores before LLMs that reference them via memory_store:. See examples/config.yaml for the fully-documented reference.
API
POST /v1/converse/{component}
Send a chat request and receive a streaming response over Server-Sent Events.
Request body:
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is a daimon?" }
  ],
  "model": "gpt-4o-mini",
  "system": "Override or set a system prompt here.",
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 50,
  "stop": ["Human:"],
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "seed": 42,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ]
}
All fields except messages are optional. Omitted inference parameters fall back to the component's configured defaults.
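If you would rather consume the stream without an SDK, here is a minimal sketch using the third-party requests library; it parses the data: lines exactly as documented in the Response section below.
import json
import requests

resp = requests.post(
    "http://127.0.0.1:3500/v1/converse/claude",
    json={"messages": [{"role": "user", "content": "What is a daimon?"}]},
    stream=True,
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):
        event = json.loads(line[len(b"data: "):])
        if event["type"] == "text":
            print(event["text"], end="", flush=True)
        elif event["type"] in ("done", "error"):
            break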
Sessions: include "session_id" to have daimon maintain conversation history server-side. Only send the new user turn — the server prepends stored history automatically.
# Turn 1
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"My name is Alice."}]}'

# Turn 2 — server prepends the previous exchange automatically
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"What is my name?"}]}'
Clear a session with DELETE /v1/sessions/{id} (returns 204, idempotent).
Provider support matrix:
| Parameter         | OpenAI | Anthropic | llamacpp |
|-------------------|--------|-----------|----------|
| temperature       | ✓      | ✓         | ✓        |
| max_tokens        | ✓      | ✓         | ✓        |
| top_p             | ✓      | ✓         | ✓        |
| top_k             | —      | ✓         | —        |
| stop              | ✓      | ✓         | ✓        |
| frequency_penalty | ✓      | —         | ✓        |
| presence_penalty  | ✓      | —         | ✓        |
| seed              | ✓      | —         | ✓        |
Unsupported parameters are silently ignored per provider.
Response (text/event-stream):
data: {"type":"text","text":"In ancient Greek thought..."}
data: {"type":"tool_call","tool_call":{"id":"call_1","name":"get_weather","input":{"city":"London"}}}
data: {"type":"text","text":"The weather in London is 12°C."}
data: {"type":"done"}
Each data: line is a JSON object:
| type      | additional fields           | meaning                                                 |
|-----------|-----------------------------|---------------------------------------------------------|
| text      | text                        | a fragment of the model's response                      |
| tool_call | tool_call.id, .name, .input | model invoked a tool (daimon executes it and continues) |
| done      | —                           | stream finished successfully                            |
| error     | error                       | terminal error; stream ends                             |
tool_call events are forwarded so clients can show progress ("calling tool X…"). Daimon executes the tool automatically and loops back to the model — no client-side action needed.
DELETE /v1/sessions/{id}
Clears server-side session history for the given ID. Returns 204 No Content. Idempotent — deleting a session that does not exist is not an error.
GET /healthz
Returns 200 ok when the sidecar is up.
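A minimal liveness probe, sketched with only the Python standard library:
import urllib.request

# GET /healthz returns 200 with body "ok" when the sidecar is up.
with urllib.request.urlopen("http://127.0.0.1:3500/healthz", timeout=2) as resp:
    print(resp.status, resp.read().decode())   # 200 ok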
Python SDK
Install:
pip install daimon-client
Streaming text:
import daimon_client as daimon
# context manager reuses the HTTP connection
with daimon.Client() as client:
    for text in client.stream("claude", "Explain recursion in one sentence."):
        print(text, end="", flush=True)
    print()
Convenience: collect the full response:
reply = client.chat("gpt4o", "What is the capital of France?")
print(reply) # "The capital of France is Paris."
Multi-turn conversation:
messages = [
    daimon.Message(role="system", content="You are a helpful assistant."),
    daimon.Message(role="user", content="My name is Alice."),
]
reply = client.chat("claude", messages)
messages.append(daimon.Message(role="assistant", content=reply))
messages.append(daimon.Message(role="user", content="What is my name?"))
print(client.chat("claude", messages))
Sessions:
client.chat("claude", "My name is Alice.", session_id="chat-1")
reply = client.chat("claude", "What is my name?", session_id="chat-1")
# reply: "Your name is Alice."
client.clear_session("chat-1")
With inference parameters:
reply = client.chat(
    "gpt4o",
    "Write a haiku about Go.",
    model="gpt-4o",
    temperature=0.9,
    max_tokens=64,
)
Observing tool calls:
def on_tool(tc: daimon.ToolCall) -> None:
    print(f"[tool: {tc.name}({tc.input})]")

for text in client.stream("claude", "What's the weather in Tokyo?", on_tool_call=on_tool):
    print(text, end="", flush=True)
Async:
import asyncio
import daimon_client as daimon

async def main():
    async with daimon.AsyncClient() as client:
        async for text in client.stream("claude", "Hello!"):
            print(text, end="", flush=True)

asyncio.run(main())
Full runnable examples: examples/client/chat.py · examples/client/chat_async.py
TypeScript SDK
Install:
npm install daimon-client
Streaming text:
import { Client } from 'daimon-client';
const client = new Client();
for await (const text of client.stream('claude', 'Explain recursion in one sentence.')) {
  process.stdout.write(text);
}
Convenience: collect the full response:
const reply = await client.chat('gpt4o', 'What is the capital of France?');
console.log(reply); // "The capital of France is Paris."
Sessions:
await client.chat('claude', 'My name is Alice.', { session_id: 'chat-1' });
const reply = await client.chat('claude', 'What is my name?', { session_id: 'chat-1' });
// reply: "Your name is Alice."
await client.clearSession('chat-1');
With inference parameters:
const reply = await client.chat('gpt4o', 'Write a haiku about Go.', {
  model: 'gpt-4o',
  temperature: 0.9,
  max_tokens: 64,
});
Full runnable examples: sdk/typescript/examples/
MCP tools
Daimon acts as an MCP client. Configure MCP servers in YAML and daimon:
- Connects to each server at startup and fetches its tool catalogue.
- Injects all tools into every chat request automatically.
- When the model calls a tool, daimon executes it via the MCP server and feeds the result back — looping until the model returns a plain text response.
Your application sees a single streaming response with the final answer, plus tool_call events for progress:
mcp_servers:
  - name: filesystem
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  - name: brave-search
    command: ["npx", "-y", "@modelcontextprotocol/server-brave-search"]
No client-side changes required.
Memory & Graph Stores
Daimon ships with five vector stores and two graph stores, all configured the same way — as components: entries.
Vector stores
| Type     | External service      | Embedding             |
|----------|-----------------------|-----------------------|
| inmemory | None                  | BM25 (lexical)        |
| chroma   | Chroma                | Server-side           |
| qdrant   | Qdrant                | Configurable endpoint |
| redis    | Redis Stack           | Configurable endpoint |
| pgvector | PostgreSQL + pgvector | Configurable endpoint |
HTTP API: PUT /v1/memory/{store}/{id} · POST /v1/memory/{store} · POST /v1/memory/{store}/query · DELETE /v1/memory/{store}/{id}
Python SDK:
store = client.memory("docs")
store.upsert("The Eiffel Tower is 330 m tall.", id="doc1", metadata={"src": "wiki"})
results = store.query("tall Paris structures", top_k=3)
# results[0].id, .content, .score, .metadata
store.delete("doc1")
TypeScript SDK:
const store = client.memory('docs');
await store.upsert('The Eiffel Tower is 330 m tall.', { id: 'doc1', metadata: { src: 'wiki' } });
const results = await store.query('tall Paris structures', 3);
await store.delete('doc1');
Transparent RAG
Add memory_store: <name> to any LLM component and daimon automatically queries the store before every chat request, injecting the top results as a system message:
- name: claude
  type: anthropic
  memory_store: chroma-docs
No client code changes needed — the enrichment happens inside the sidecar.
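For example, a minimal sketch using the Python SDK, assuming the chroma-docs store and a claude component configured with memory_store: chroma-docs as above:
import daimon_client as daimon

with daimon.Client() as client:
    # Seed the store that claude is wired to via memory_store.
    docs = client.memory("chroma-docs")
    docs.upsert("Daimon listens on 127.0.0.1:3500 by default.", id="note1")

    # A plain chat request; daimon queries chroma-docs and injects the top
    # results as a system message before calling the model.
    print(client.chat("claude", "Which port does daimon listen on?"))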
Graph stores
| Type     | External service | Protocol              |
|----------|------------------|-----------------------|
| neo4j    | Neo4j            | Bolt (default) / HTTP |
| memgraph | Memgraph         | Bolt (default) / HTTP |
HTTP API: PUT /v1/graph/{store}/nodes/{id} · POST /v1/graph/{store}/edges · POST /v1/graph/{store}/cypher · DELETE /v1/graph/{store}/nodes/{id}
Python SDK:
graph = client.graph("kg")
graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
graph.add_edge("alice", "bob", "KNOWS")
rows = graph.cypher("MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name")
Both stores also generate {name}_cypher, {name}_add_node, and {name}_add_edge tools that the LLM can call directly via the agentic loop.
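A sketch of that loop from the client's side, assuming the kg store and claude component from the config above (tool names follow the generated {name}_cypher pattern, so kg_cypher here):
import daimon_client as daimon

with daimon.Client() as client:
    kg = client.graph("kg")
    kg.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
    kg.add_node(id="bob", labels=["Person"], props={"name": "Bob"})
    kg.add_edge("alice", "bob", "KNOWS")

    # The model can answer by calling the generated kg_cypher tool itself;
    # on_tool_call only observes, daimon executes the tool server-side.
    def on_tool(tc: daimon.ToolCall) -> None:
        print(f"[tool: {tc.name}({tc.input})]")

    for text in client.stream("claude", "Who does Alice know?", on_tool_call=on_tool):
        print(text, end="", flush=True)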
Supported providers
| Type      | Env var           | Default model   |
|-----------|-------------------|-----------------|
| openai    | OPENAI_API_KEY    | gpt-4o          |
| anthropic | ANTHROPIC_API_KEY | claude-opus-4-7 |
| llamacpp  | —                 | (required)      |
llamacpp connects to any OpenAI-compatible local server: llama.cpp, Ollama, or LM Studio. Set base_url in metadata to point at your server's /v1 endpoint.
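Because every component exposes the same interface, the component name is the only provider-specific detail in application code. A sketch assuming the claude, gpt4o, and local components from the config above:
import daimon_client as daimon

with daimon.Client() as client:
    # Same call, three different providers; swapping is a config change only.
    for name in ("claude", "gpt4o", "local"):
        print(name, "->", client.chat(name, "Say hello in five words."))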
Adding a provider
- Create internal/components/llm/<name>/<name>.go.
- Implement conversation.Conversation:

type Component struct { /* ... */ }

func (c *Component) Chat(ctx context.Context, req conversation.Request) (<-chan conversation.Chunk, error) {
    // stream chunks through the returned channel
}

- Register in init():

func init() {
    conversation.Register("<name>", func(cfg conversation.ComponentConfig) (conversation.Conversation, error) {
        return New(cfg)
    })
}

- Blank-import the package from cmd/daimon/serve.go and cmd/daimon/run.go.
- Add a worked example to examples/config.yaml.
No changes to the server, config loader, or any other package. See Development for adding vector stores or graph stores.
Development
make build # compile → ./bin/daimon
make run # build + run with examples/config.yaml
make test # go test ./...
make lint # golangci-lint
make fmt # gofmt + goimports
make license-check
Integration tests (require API keys / Docker):
# OpenAI + Anthropic
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... \
go test -tags integration -v ./internal/components/...
# llamacpp — starts Ollama in Docker automatically, pulls qwen2.5:1.5b
go test -tags integration -v ./internal/components/llm/llamacpp/
# Full e2e suite (Go + Python SDK + TypeScript SDK) — requires Docker
go test -tags integration -v -timeout 20m ./test/e2e/
Python SDK tests:
cd sdk/python
pip install -e ".[dev]"
pytest tests/ -v
TypeScript SDK tests:
cd sdk/typescript
npm install
npm test
Roadmap
- AI-native memory systems (Zep, Mem0) — session-aware, auto-summarising, distinct from vector stores
- Middleware pipeline — per-request hooks for moderation, PII redaction, semantic cache, rate limiting
- Multi-agent routing — fallback chains, load balancing across LLM components
- Metrics alongside traces (OTel)
- Authentication and per-client rate limiting
Explicitly out of scope for now: gRPC, external plugin loading, pub/sub.
License
Apache 2.0 — see LICENSE.