daimon

The spirit that runs alongside your AI app.


Daimon is a local sidecar process that gives your application a single, stable HTTP interface to any LLM. Swap providers, rotate keys, add tracing, wire up MCP tools, query vector stores, or traverse knowledge graphs — without touching your app code.

Inspired by Dapr's component model, adapted for AI-native primitives: streaming responses, pluggable providers, MCP tool calls, vector/graph stores, and persistent sessions.


How it works

your app  ──POST /v1/converse/claude──▶  daimon  ──▶  Anthropic API
          ◀── text/event-stream ────────────────────────────────────
                                            │
                                     MCP tool server(s)
                                   (filesystem, GitHub, ...)
                                            │
                                   vector stores (Chroma, Qdrant,
                                     Redis, pgvector, in-memory)
                                            │
                                   graph stores (Neo4j, Memgraph)

Daimon runs on localhost:3500. Your app speaks plain HTTP + Server-Sent Events. The provider, model, credentials, and tool servers all live in a YAML config — not in your code.
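
Any HTTP client works. As a minimal sketch in Python with the requests library, using the endpoint and event shapes documented under API below:

import json
import requests

resp = requests.post(
    "http://127.0.0.1:3500/v1/converse/claude",
    json={"messages": [{"role": "user", "content": "What is a daimon?"}]},
    stream=True,
)
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip blank lines between SSE events
    event = json.loads(line[6:])  # strip the "data: " prefix
    if event["type"] == "text":
        print(event["text"], end="", flush=True)
    elif event["type"] == "done":
        break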


Quick start

Prerequisites: An OpenAI or Anthropic API key.

1 — Install

macOS / Linux — Homebrew

brew tap sonicboom15/tap
brew install daimon

Windows — winget

winget install sonicboom15.daimon

Windows — Scoop

scoop bucket add sonicboom15 https://github.com/sonicboom15/scoop-bucket
scoop install daimon

Linux — deb / rpm

Download the .deb or .rpm from the latest release and install with dpkg -i or rpm -i.

Build from source

git clone https://github.com/sonicboom15/daimon.git && cd daimon && make build
# → ./bin/daimon

2 — Create a config

# config.yaml
port: 3500

components:
  - name: claude
    type: anthropic
    metadata:
      default_model: claude-haiku-4-5-20251001
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o-mini
      # api_key: sk-...  # or set OPENAI_API_KEY

3 — Run

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
daimon serve --config config.yaml
INFO daimon listening addr=127.0.0.1:3500

4 — First request

curl:

curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is a daimon?"}]}'
data: {"type":"text","text":"In ancient Greek thought, a daimon"}
data: {"type":"text","text":" is a guiding spirit..."}
data: {"type":"done"}

Python SDK:

pip install daimon-client

import daimon_client as daimon

with daimon.Client() as client:
    for text in client.stream("claude", "What is a daimon?"):
        print(text, end="", flush=True)

TypeScript SDK:

npm install daimon-client

import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'What is a daimon?')) {
  process.stdout.write(text);
}

Configuration

port: 3500

components:

  # ── Embedder (declare before vector stores) ──────────────────────────────
  # - name: embedder
  #   type: embedding/openai
  #   metadata:
  #     base_url: http://localhost:11434/v1   # Ollama; omit for OpenAI
  #     model: nomic-embed-text
  #     dimensions: "768"

  # ── Session store (optional; defaults to in-memory) ──────────────────────
  # - name: sessions
  #   type: session/redis
  #   metadata:
  #     addr: localhost:6379
  #     ttl: "24h"

  # ── Vector / document stores ─────────────────────────────────────────────
  # - name: docs
  #   type: inmemory          # BM25 lexical, no deps — dev/testing only
  #
  # - name: chroma-docs
  #   type: chroma
  #   metadata:
  #     base_url: http://localhost:8000
  #     collection: daimon
  #     create_if_missing: "true"
  #
  # - name: qdrant-docs
  #   type: qdrant
  #   metadata:
  #     base_url: http://localhost:6333
  #     collection: daimon
  #     embedder: embedder
  #     create_if_missing: "true"

  # ── Graph stores ──────────────────────────────────────────────────────────
  # - name: kg
  #   type: neo4j
  #   metadata:
  #     bolt_url: bolt://localhost:7687
  #     username: neo4j
  #     password: secret

  # ── LLM components ────────────────────────────────────────────────────────
  - name: claude
    type: anthropic
    # memory_store: chroma-docs   # enable transparent RAG from a vector store
    metadata:
      default_model: claude-opus-4-7
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY
    # defaults:
    #   temperature: 1.0
    #   max_tokens: 4096
    #   top_p: 0.9
    #   top_k: 50          # Anthropic-specific
    #   stop: ["Human:"]
    #   system: "You are a helpful assistant."

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o
      # api_key: sk-...  # or set OPENAI_API_KEY
    # defaults:
    #   temperature: 0.7
    #   max_tokens: 2048
    #   frequency_penalty: 0.0
    #   presence_penalty: 0.0
    #   seed: 42

  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1   # Ollama default
      # base_url: http://localhost:1234/v1  # LM Studio default
      # base_url: http://localhost:8080/v1  # llama.cpp default
      default_model: llama3.2:3b

# MCP tool servers — daimon connects at startup and injects their tools into
# every chat request automatically. The model can call them; daimon runs the loop.
# mcp_servers:
#   - name: filesystem
#     command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
#   - name: github
#     command: ["npx", "-y", "@modelcontextprotocol/server-github"]

telemetry:
  otlp_endpoint: ""   # e.g. "localhost:4318" — leave empty to disable

All component types — LLMs, embedders, session stores, vector stores, and graph stores — live under components:. Declaration order matters: embedders before the vector stores that use them, and vector stores before the LLMs that reference them via memory_store:. See examples/config.yaml for the fully documented reference.
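
The ordering rule is mechanical, so it can be checked before start-up. A standalone sketch (not part of daimon) that validates it with PyYAML against the schema above:

import yaml  # pip install pyyaml

with open("config.yaml") as f:
    components = yaml.safe_load(f)["components"]

declared = set()
for c in components:
    # A vector store's embedder, or an LLM's memory_store, must already be declared
    ref = c.get("metadata", {}).get("embedder") or c.get("memory_store")
    if ref and ref not in declared:
        raise SystemExit(f"{c['name']}: '{ref}' must be declared earlier under components:")
    declared.add(c["name"])
print("declaration order OK")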


API

POST /v1/converse/{component}

Send a chat request and receive a streaming response over Server-Sent Events.

Request body:

{
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is a daimon?" }
  ],
  "model":             "gpt-4o-mini",
  "system":            "Override or set a system prompt here.",
  "max_tokens":        512,
  "temperature":       0.7,
  "top_p":             0.9,
  "top_k":             50,
  "stop":              ["Human:"],
  "frequency_penalty": 0.0,
  "presence_penalty":  0.0,
  "seed":              42,
  "tools": [
    {
      "name":        "get_weather",
      "description": "Get current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required":   ["city"]
      }
    }
  ]
}

All fields except messages are optional. Omitted inference parameters fall back to the component's configured defaults.

Sessions: include "session_id" to have daimon maintain conversation history server-side. Only send the new user turn — the server prepends stored history automatically.

# Turn 1
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"My name is Alice."}]}'

# Turn 2 — server prepends the previous exchange automatically
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"What is my name?"}]}'

Clear a session with DELETE /v1/sessions/{id} (returns 204, idempotent).
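
The same flow from Python with the requests library (a sketch; the streaming bodies are elided here, see the event dispatcher under the response format below):

import requests

BASE = "http://127.0.0.1:3500"

# Turn 1 (real code should consume the SSE body)
requests.post(f"{BASE}/v1/converse/claude", json={
    "session_id": "chat-1",
    "messages": [{"role": "user", "content": "My name is Alice."}],
}, stream=True)

# Turn 2: only the new user turn is sent; history is prepended server-side
requests.post(f"{BASE}/v1/converse/claude", json={
    "session_id": "chat-1",
    "messages": [{"role": "user", "content": "What is my name?"}],
}, stream=True)

# Clean up; returns 204 even if the session never existed
requests.delete(f"{BASE}/v1/sessions/chat-1")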

Provider support matrix:

Parameter          OpenAI  Anthropic  llamacpp
temperature        ✓       ✓          ✓
max_tokens         ✓       ✓          ✓
top_p              ✓       ✓          ✓
top_k              -       ✓          ✓
stop               ✓       ✓          ✓
frequency_penalty  ✓       -          ✓
presence_penalty   ✓       -          ✓
seed               ✓       -          ✓

Unsupported parameters are silently ignored per provider.

Response (text/event-stream):

data: {"type":"text","text":"In ancient Greek thought..."}

data: {"type":"tool_call","tool_call":{"id":"call_1","name":"get_weather","input":{"city":"London"}}}

data: {"type":"text","text":"The weather in London is 12°C."}

data: {"type":"done"}

Each data: line is a JSON object:

type       additional fields            meaning
text       text                         a fragment of the model's response
tool_call  tool_call.id, .name, .input  model invoked a tool (daimon executes it and continues)
done       -                            stream finished successfully
error      error                        terminal error; stream ends

tool_call events are forwarded so clients can show progress ("calling tool X…"). Daimon executes the tool automatically and loops back to the model — no client-side action needed.
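
In client code these events map onto a small dispatcher. A sketch, where resp is a streaming HTTP response from POST /v1/converse/{component} such as the requests example under How it works:

import json

def handle_events(resp) -> str:
    """Consume a daimon SSE stream; print tool progress, return the full text."""
    parts = []
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        event = json.loads(line[6:])
        if event["type"] == "text":
            parts.append(event["text"])
        elif event["type"] == "tool_call":
            tc = event["tool_call"]
            print(f"[calling {tc['name']}({tc['input']})]")  # progress only; daimon runs the tool
        elif event["type"] == "error":
            raise RuntimeError(event["error"])
        elif event["type"] == "done":
            break
    return "".join(parts)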

DELETE /v1/sessions/{id}

Clears server-side session history for the given ID. Returns 204 No Content. Idempotent — deleting a session that does not exist is not an error.

GET /healthz

Returns 200 ok when the sidecar is up.
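
Client code can gate start-up on this endpoint. A minimal polling sketch with the requests library:

import time
import requests

def wait_ready(base: str = "http://127.0.0.1:3500", timeout: float = 10.0) -> None:
    """Block until GET /healthz returns 200, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base}/healthz", timeout=1).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # sidecar not listening yet
        time.sleep(0.2)
    raise TimeoutError("daimon did not become ready")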


Python SDK

Install:

pip install daimon-client

Streaming text:

import daimon_client as daimon

# context manager reuses the HTTP connection
with daimon.Client() as client:
    for text in client.stream("claude", "Explain recursion in one sentence."):
        print(text, end="", flush=True)
print()

Convenience: collect the full response:

reply = client.chat("gpt4o", "What is the capital of France?")
print(reply)  # "The capital of France is Paris."

Multi-turn conversation:

messages = [
    daimon.Message(role="system", content="You are a helpful assistant."),
    daimon.Message(role="user",   content="My name is Alice."),
]
reply = client.chat("claude", messages)
messages.append(daimon.Message(role="assistant", content=reply))
messages.append(daimon.Message(role="user", content="What is my name?"))
print(client.chat("claude", messages))

Sessions:

client.chat("claude", "My name is Alice.", session_id="chat-1")
reply = client.chat("claude", "What is my name?", session_id="chat-1")
# reply: "Your name is Alice."
client.clear_session("chat-1")

With inference parameters:

reply = client.chat(
    "gpt4o",
    "Write a haiku about Go.",
    model="gpt-4o",
    temperature=0.9,
    max_tokens=64,
)

Observing tool calls:

def on_tool(tc: daimon.ToolCall) -> None:
    print(f"[tool: {tc.name}({tc.input})]")

for text in client.stream("claude", "What's the weather in Tokyo?", on_tool_call=on_tool):
    print(text, end="", flush=True)

Async:

import asyncio
import daimon_client as daimon

async def main():
    async with daimon.AsyncClient() as client:
        async for text in client.stream("claude", "Hello!"):
            print(text, end="", flush=True)

asyncio.run(main())

Full runnable examples: examples/client/chat.py · examples/client/chat_async.py


TypeScript SDK

Install:

npm install daimon-client

Streaming text:

import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'Explain recursion in one sentence.')) {
  process.stdout.write(text);
}

Convenience: collect the full response:

const reply = await client.chat('gpt4o', 'What is the capital of France?');
console.log(reply); // "The capital of France is Paris."

Sessions:

await client.chat('claude', 'My name is Alice.', { session_id: 'chat-1' });
const reply = await client.chat('claude', 'What is my name?', { session_id: 'chat-1' });
// reply: "Your name is Alice."
await client.clearSession('chat-1');

With inference parameters:

const reply = await client.chat('gpt4o', 'Write a haiku about Go.', {
  model:       'gpt-4o',
  temperature: 0.9,
  max_tokens:  64,
});

Full runnable examples: sdk/typescript/examples/


Tool calls via MCP

Daimon acts as an MCP client. Configure MCP servers in YAML and daimon:

  1. Connects to each server at startup and fetches its tool catalogue.
  2. Injects all tools into every chat request automatically.
  3. When the model calls a tool, daimon executes it via the MCP server and feeds the result back — looping until the model returns a plain text response.

Your application sees a single streaming response with the final answer, plus tool_call events for progress:

mcp_servers:
  - name: filesystem
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  - name: brave-search
    command: ["npx", "-y", "@modelcontextprotocol/server-brave-search"]

No client-side changes required.


Memory & Graph Stores

Daimon ships with five vector stores and two graph stores, all configured the same way — as components: entries.

Vector stores
Type      External service       Embedding
inmemory  None                   BM25 (lexical)
chroma    Chroma                 Server-side
qdrant    Qdrant                 Configurable endpoint
redis     Redis Stack            Configurable endpoint
pgvector  PostgreSQL + pgvector  Configurable endpoint

HTTP API: PUT /v1/memory/{store}/{id} · POST /v1/memory/{store} · POST /v1/memory/{store}/query · DELETE /v1/memory/{store}/{id}
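
The JSON bodies for these endpoints are not spelled out in this README; the sketch below assumes they mirror the SDK parameters shown next (content, metadata, query, and top_k are assumed field names):

import requests

BASE = "http://127.0.0.1:3500/v1/memory/docs"

# Upsert under a caller-chosen ID (field names are assumptions)
requests.put(f"{BASE}/doc1", json={
    "content": "The Eiffel Tower is 330 m tall.",
    "metadata": {"src": "wiki"},
})

# Query for the closest matches
results = requests.post(f"{BASE}/query", json={
    "query": "tall Paris structures",
    "top_k": 3,
}).json()

# Remove the document
requests.delete(f"{BASE}/doc1")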

Python SDK:

store = client.memory("docs")
store.upsert("The Eiffel Tower is 330 m tall.", id="doc1", metadata={"src": "wiki"})
results = store.query("tall Paris structures", top_k=3)
# results[0].id, .content, .score, .metadata
store.delete("doc1")

TypeScript SDK:

const store = client.memory('docs');
await store.upsert('The Eiffel Tower is 330 m tall.', { id: 'doc1', metadata: { src: 'wiki' } });
const results = await store.query('tall Paris structures', 3);
await store.delete('doc1');

Transparent RAG

Add memory_store: <name> to any LLM component and daimon automatically queries the store before every chat request, injecting the top results as a system message:

- name: claude
  type: anthropic
  memory_store: chroma-docs

No client code changes needed — the enrichment happens inside the sidecar.
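
End to end, that can look like the following Python SDK sketch, assuming the chroma-docs and claude components from the configuration above:

import daimon_client as daimon

with daimon.Client() as client:
    # Seed the store that claude's memory_store points at
    docs = client.memory("chroma-docs")
    docs.upsert("Daimon listens on port 3500 by default.", id="fact-1")

    # No RAG plumbing in the request: the sidecar queries chroma-docs
    # and injects the top results as a system message before inference
    print(client.chat("claude", "What port does daimon listen on by default?"))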

Graph stores
Type      External service  Protocol
neo4j     Neo4j             Bolt (default) / HTTP
memgraph  Memgraph          Bolt (default) / HTTP

HTTP API: PUT /v1/graph/{store}/nodes/{id} · POST /v1/graph/{store}/edges · POST /v1/graph/{store}/cypher · DELETE /v1/graph/{store}/nodes/{id}
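
As with the memory endpoints, the JSON bodies below are a sketch, assumed to mirror the SDK parameters shown next (labels, props, from, to, type, and query are assumed field names):

import requests

BASE = "http://127.0.0.1:3500/v1/graph/kg"

requests.put(f"{BASE}/nodes/alice", json={"labels": ["Person"], "props": {"name": "Alice"}})
requests.put(f"{BASE}/nodes/bob", json={"labels": ["Person"], "props": {"name": "Bob"}})
requests.post(f"{BASE}/edges", json={"from": "alice", "to": "bob", "type": "KNOWS"})

rows = requests.post(f"{BASE}/cypher", json={
    "query": "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name",
}).json()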

Python SDK:

graph = client.graph("kg")
graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
graph.add_edge("alice", "bob", "KNOWS")
rows = graph.cypher("MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name")

Both stores also generate {name}_cypher, {name}_add_node, and {name}_add_edge tools that the LLM can call directly via the agentic loop.


Supported providers

Type       Env var            Default model
openai     OPENAI_API_KEY     gpt-4o
anthropic  ANTHROPIC_API_KEY  claude-opus-4-7
llamacpp   -                  (required)

llamacpp connects to any OpenAI-compatible local server: llama.cpp, Ollama, or LM Studio. Set base_url in metadata to point at your server's /v1 endpoint.
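
Because every component exposes the same interface, pointing client code at a local model is just a component-name change. A sketch using the local component from the configuration above:

import daimon_client as daimon

with daimon.Client() as client:
    # Identical call shape as for claude or gpt4o; routing lives in config.yaml
    for text in client.stream("local", "What is a daimon?"):
        print(text, end="", flush=True)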


Adding a provider

  1. Create internal/components/llm/<name>/<name>.go.
  2. Implement conversation.Conversation:
    type Component struct { /* ... */ }
    
    func (c *Component) Chat(ctx context.Context, req conversation.Request) (<-chan conversation.Chunk, error) {
        // stream chunks through the returned channel
    }
    
  3. Register in init():
    func init() {
        conversation.Register("<name>", func(cfg conversation.ComponentConfig) (conversation.Conversation, error) {
            return New(cfg)
        })
    }
    
  4. Blank-import the package from cmd/daimon/serve.go and cmd/daimon/run.go.
  5. Add a worked example to examples/config.yaml.

No changes to the server, config loader, or any other package. See Development for adding vector stores or graph stores.


Development

make build          # compile → ./bin/daimon
make run            # build + run with examples/config.yaml
make test           # go test ./...
make lint           # golangci-lint
make fmt            # gofmt + goimports
make license-check

Integration tests (require API keys / Docker):

# OpenAI + Anthropic
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... \
  go test -tags integration -v ./internal/components/...

# llamacpp — starts Ollama in Docker automatically, pulls qwen2.5:1.5b
go test -tags integration -v ./internal/components/llm/llamacpp/

# Full e2e suite (Go + Python SDK + TypeScript SDK) — requires Docker
go test -tags integration -v -timeout 20m ./test/e2e/

Python SDK tests:

cd sdk/python
pip install -e ".[dev]"
pytest tests/ -v

TypeScript SDK tests:

cd sdk/typescript
npm install
npm test

Roadmap

  • AI-native memory systems (Zep, Mem0) — session-aware, auto-summarising, distinct from vector stores
  • Middleware pipeline — per-request hooks for moderation, PII redaction, semantic cache, rate limiting
  • Multi-agent routing — fallback chains, load balancing across LLM components
  • Metrics alongside traces (OTel)
  • Authentication and per-client rate limiting

Explicitly out of scope for now: gRPC, external plugin loading, pub/sub.


License

Apache 2.0 — see LICENSE.

Directories

Path Synopsis

cmd/daimon
    Command daimon is the daimon sidecar CLI.
internal/components/embedding/openai
    Package openai provides an Embedder backed by the OpenAI (or OpenAI-compatible) /v1/embeddings endpoint.
internal/components/graph/memgraph
    Package memgraph provides a GraphStore backed by Memgraph.
internal/components/graph/neo4j
    Package neo4j provides a GraphStore backed by Neo4j.
internal/components/llm/anthropic
    Package anthropic provides a Conversation implementation backed by the Anthropic API.
internal/components/llm/llamacpp
    Package llamacpp provides a Conversation implementation for any server that exposes an OpenAI-compatible HTTP API, including llama.cpp and LM Studio.
internal/components/llm/openai
    Package openai provides a Conversation implementation backed by the OpenAI API.
internal/components/session/postgres
    Package postgres provides a PostgreSQL-backed SessionStore.
internal/components/session/redis
    Package redis provides a Redis-backed SessionStore.
internal/components/vector/chroma
    Package chroma provides a MemoryStore backed by the Chroma vector database HTTP API.
internal/components/vector/inmemory
    Package inmemory provides an in-process MemoryStore with BM25 lexical scoring.
internal/components/vector/pgvector
    Package pgvector provides a MemoryStore backed by PostgreSQL with the pgvector extension.
internal/components/vector/qdrant
    Package qdrant provides a MemoryStore backed by the Qdrant REST API.
internal/components/vector/redis
    Package redis provides a MemoryStore backed by Redis Stack (RediSearch + JSON modules).
internal/config
    Package config loads and validates the daimon YAML configuration.
internal/conversation
    Package conversation defines the core inference interface used by all provider components.
internal/embedding
    Package embedding defines the Embedder interface and factory registry used by vector store components that need to generate dense vector representations.
internal/mcp
    Package mcp implements a lightweight MCP client over stdio transport.
internal/memory
    Package memory defines the MemoryStore and GraphStore interfaces and their factory registries used by vector and graph database components.
internal/server
    Package server implements the daimon HTTP API.
internal/session
    Package session defines the SessionStore interface and factory registry for persistent conversation history.
internal/telemetry
    Package telemetry configures OpenTelemetry tracing for the sidecar.
