runtime

module
v0.32.7
Published: Jun 24, 2026 License: Apache-2.0

README

Contenox

AI workflows you can run, review, and own.

Contenox is an open-source AI workflow runtime for developers. It turns repeatable coding and tool workflows into versioned Chains: files that declare prompts, model/provider routing, tool allowlists, retries, branches, budgets, and human approval gates.

Many coding workflows do not need a frontier model. Contenox gives you a way to run that work where the code is, with a proper agent loop instead of hidden prompt habits or one-off glue, and route to network or cloud models when the job needs them.

Run the same workflow from the CLI, VS Code, or any ACP client. Use modeld for the edge path, Ollama or vLLM on your network, or hosted providers, while sessions, config, telemetry, and runtime state stay on your machine.

  • It speaks Unix: Pipe data directly into your workflows. git diff | contenox run commit-msg or git log | contenox run release-notes.
  • It respects boundaries: Human-in-the-loop isn't a UI toggle, it's a strict policy file. The AI pauses and asks for terminal approval before running destructive commands.
  • It routes inference: Use edge modeld, private-network backends, or hosted providers per workflow. modeld is built for one active local model and resident coding context, not model multiplexing.

You own the workflow. The vendor doesn't decide how it behaves on your machine. You do.

It is built for specific, reviewable AI work, not vague promises of fully autonomous agents.

📖 contenox.com


What would I use this for?

Package a repeatable AI task as a chain, then run it the same way every time:

  • Review a diff: run the tests, summarize the risk, and gate on your approval before it acts.
  • Draft release evidence: turn git log, PRs, and CI output into a changelog and reviewer packet.
  • Wrap an internal API: expose a safe, curated tool subset with approval required on mutating calls.
  • Automate repo chores: take an issue, produce a patch, run the tests, write the PR description.
  • Ask an owned model: codebase chat and one-off prompts through local modeld or a private inference endpoint.
  • Use edge autocomplete: keep VS Code ghost text on a local or local-network coder model while chat uses a larger hosted model.

The same chains run from the CLI, VS Code, or any ACP client. Inference can sit on the device, on your network, or with a cloud vendor, while sessions and state stay local. Detailed examples are in What it is good for below.


Install

curl -fsSL https://contenox.com/install.sh | sh

Quick Start

# Configure a provider/model for this machine
contenox setup

# Use it from the CLI
contenox "say hello world in python"
contenox chat -e                        # open $EDITOR to compose a prompt

For normal CLI/VS Code installs, choose local Ollama, a private network backend, or a hosted provider in setup. Owned local GGUF/OpenVINO inference uses the separate native modeld daemon, which is not bundled in release installs yet. If you choose a local modeld provider, setup prints source-build commands. Full guide: modeld Source Build and Packaging.

Resume past sessions with contenox session list and contenox session switch <name>. Backends are summarized below.

Developing the source-built local backend? See modeld Source Build and Packaging.

VS Code autocomplete can use a different model

Inline autocomplete is intentionally separate from chat. That lets you run low-latency ghost text at the edge, on a LAN Ollama box, or on a FIM/coder cloud model while keeping chat and tool workflows on a larger provider.

# Chat can stay on a hosted model:
contenox config set default-provider openai
contenox config set default-model    gpt-5-mini

# Autocomplete can stay local via modeld:
contenox config set default-autocomplete-provider llama
contenox config set default-autocomplete-model    qwen3-coder-30b-a3b

# Or point autocomplete at a local-network Ollama coder model:
contenox config set default-autocomplete-provider ollama
contenox config set default-autocomplete-model    qwen2.5-coder:7b

In VS Code, enable it with Contenox: Enable Autocomplete and verify with Contenox: Test Autocomplete At Cursor.
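
The split between chat and autocomplete defaults can be pictured as a small lookup. The following is a minimal Go sketch, not the runtime's own resolution logic: the `Route` type and `resolve` function are hypothetical, and the fallback from autocomplete keys to the chat defaults is an assumption made for illustration. Only the four config key names come from the commands above.

```go
package main

import "fmt"

// Route is a hypothetical provider/model pair used only in this sketch.
type Route struct {
	Provider string
	Model    string
}

// resolve picks the provider and model for a purpose ("chat" or "autocomplete").
// Assumption: autocomplete falls back to the chat defaults when its own
// keys are unset. The real runtime may behave differently.
func resolve(cfg map[string]string, purpose string) Route {
	r := Route{Provider: cfg["default-provider"], Model: cfg["default-model"]}
	if purpose == "autocomplete" {
		if p := cfg["default-autocomplete-provider"]; p != "" {
			r.Provider = p
		}
		if m := cfg["default-autocomplete-model"]; m != "" {
			r.Model = m
		}
	}
	return r
}

func main() {
	cfg := map[string]string{
		"default-provider":              "openai",
		"default-model":                 "gpt-5-mini",
		"default-autocomplete-provider": "ollama",
		"default-autocomplete-model":    "qwen2.5-coder:7b",
	}
	fmt.Println("chat:", resolve(cfg, "chat"))
	fmt.Println("autocomplete:", resolve(cfg, "autocomplete"))
}
```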


What you author

The workflow behavior is a chain file. Every decision is a JSON key:

{
  "id": "review",
  "tasks": [
    {
      "id": "review",
      "handler": "chat_completion",
      "system_instruction": "You are a code reviewer. Analyze the diff, run the tests if tools are available, then give a concise review.",
      "execute_config": {
        "model": "{{var:model}}",
        "provider": "{{var:provider}}",
        "tools": ["local_shell", "local_fs"],
        "tools_policies": {
          "local_shell": { "_allowed_commands": "go,make,npm,cargo,grep,cat" }
        }
      },
      "transition": {
        "branches": [
          { "operator": "equals", "when": "tool_call", "goto": "run_tools" },
          { "operator": "default", "goto": "end" }
        ]
      }
    },
    {
      "id": "run_tools",
      "handler": "execute_tool_calls",
      "input_var": "review",
      "transition": {
        "branches": [
          { "operator": "default", "goto": "review" }
        ]
      }
    }
  ]
}

System prompt, model, tool policy, allowed commands, and transitions are all visible in one file; retries, budgets, and approval gates are declared as keys in the same way. Save the chain and pipe in a diff:

git diff | contenox run --chain ./review.json

Walk through your first chain step by step: contenox.com/docs/guide/first-chain.
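
Because a chain is plain JSON, it can be checked before it runs. The Go sketch below is one way to catch a transition that points at a task that does not exist. The `Chain`, `Task`, and `Branch` types are illustrative and mirror only the keys shown in the example chain; they are not the runtime's own types. Treating "end" as a terminal target is an assumption inferred from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Branch, Task, and Chain mirror only the JSON keys used in the example
// chain. They are hypothetical types for this sketch.
type Branch struct {
	Operator string `json:"operator"`
	When     string `json:"when"`
	Goto     string `json:"goto"`
}

type Task struct {
	ID         string `json:"id"`
	Handler    string `json:"handler"`
	Transition struct {
		Branches []Branch `json:"branches"`
	} `json:"transition"`
}

type Chain struct {
	ID    string `json:"id"`
	Tasks []Task `json:"tasks"`
}

// danglingGotos returns a "task -> target" entry for every branch whose
// target is neither a declared task id nor the assumed terminal "end".
func danglingGotos(raw []byte) ([]string, error) {
	var c Chain
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	known := map[string]bool{"end": true}
	for _, t := range c.Tasks {
		known[t.ID] = true
	}
	var bad []string
	for _, t := range c.Tasks {
		for _, b := range t.Transition.Branches {
			if !known[b.Goto] {
				bad = append(bad, t.ID+" -> "+b.Goto)
			}
		}
	}
	return bad, nil
}

func main() {
	raw := []byte(`{"id":"review","tasks":[
	  {"id":"review","handler":"chat_completion","transition":{"branches":[
	    {"operator":"equals","when":"tool_call","goto":"run_tools"},
	    {"operator":"default","goto":"end"}]}},
	  {"id":"run_tools","handler":"execute_tool_calls","transition":{"branches":[
	    {"operator":"default","goto":"review"}]}}]}`)
	bad, err := danglingGotos(raw)
	fmt.Println("dangling:", bad, "error:", err)
}
```

A check like this fits naturally in a pre-commit hook or CI job, since chains are meant to be versioned and reviewed like any other file.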


What it is good for

Contenox is strongest when the workflow is specific and repeatable: known inputs, known tools, known output shape, and explicit review gates.

Examples of workflows you can package as chains:

Release evidence pack
  Input: git log, PRs, tickets, CI output
  Output: changelog, risk notes, deployment checklist, reviewer packet
  Gate: human approval before publishing

API-to-workflow wrapper
  Input: internal OpenAPI spec
  Output: curated tool subset, hidden tenant/env args, auth handling, HITL policy
  Gate: approval for mutating calls

Repo maintenance chain
  Input: issue or migration request
  Output: patch, test run, PR description
  Gate: shell/filesystem approval and human merge
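
As an illustration of a gate on mutating calls, the Go sketch below classifies an HTTP method as needing approval or not. It is a hypothetical helper, not Contenox's HITL policy format: the function name `needsApproval` is invented here, and it assumes "mutating" means any method outside the safe set defined by HTTP semantics (GET, HEAD, OPTIONS, TRACE).

```go
package main

import (
	"fmt"
	"strings"
)

// needsApproval reports whether a call should pause for a human.
// Assumption for this sketch: only the HTTP safe methods skip the gate.
// Anything else, including unknown or empty methods, requires approval,
// so the check fails closed.
func needsApproval(method string) bool {
	switch strings.ToUpper(strings.TrimSpace(method)) {
	case "GET", "HEAD", "OPTIONS", "TRACE":
		return false
	default:
		return true
	}
}

func main() {
	for _, m := range []string{"GET", "post", "DELETE", ""} {
		fmt.Printf("%-7q approval required: %v\n", m, needsApproval(m))
	}
}
```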

State lives locally in SQLite. Sessions persist across invocations. The AI provider is a config line: local modeld (llama/openvino), Ollama, vLLM, OpenAI, Anthropic, Mistral, Gemini, AWS Bedrock, OpenRouter, or Vertex. Use edge inference, private network inference, or a hosted vendor depending on the workflow, latency target, cost, and data boundary. Autocomplete has its own provider/model defaults, so editor ghost text can stay local even when chat uses the cloud.


Where it fits

Contenox is the agent layer you control from terminal to editor. By category it is an AI workflow runtime with edge, private-network, and cloud inference routing; by architecture it is a developer agent runtime.

Nearby world | Why Contenox is different
Cursor / IDE copilots | Runtime-first, not editor-first. The same engine works from the terminal, VS Code, and ACP clients.
Aider / CLI coding agents | Broader workflow, session, tool policy, and provider scope than a single coding loop.
LangChain / agent frameworks | End-user executable product, not just a library you wire into an app.
Dify / n8n / web AI workflow tools | Local desktop/workspace-first, not web-app/SaaS-first.
Ollama wrappers | Provider-neutral and workflow/tool/HITL-oriented, spanning owned local inference, private network backends, and hosted vendors.

Connect your stack

Anything you can reach over MCP, an OpenAPI spec, or a shell command can become a scoped tool in a chain:

# Any MCP-compatible server (Notion, Linear, Playwright, GitHub, Postgres, …)
contenox mcp add notion https://mcp.notion.com/mcp --auth-type oauth

# Any HTTP API with an OpenAPI spec (no glue code required)
# Slice a monolithic API into safe subsets by pointing --spec at a curated local file
contenox tools add erp_billing --url https://erp.internal.example.com --spec ./billing-subset.yaml

# The shell, with your own command policy declared in the chain
contenox --shell "check Proxmox and flag anything red"
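
To make the idea of a command policy concrete, here is a minimal Go sketch of an allowlist check against a comma-separated list such as the `_allowed_commands` value in the chain example. It is not how Contenox enforces policy: `commandAllowed` is a hypothetical function, and rejecting any line that contains shell operators is a simplifying assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// commandAllowed checks the first word of a command line against a
// comma-separated allowlist such as "go,make,npm,cargo,grep,cat".
// Sketch assumption: lines containing shell operators are rejected
// outright, so "go version; rm -rf /" cannot ride on an allowed prefix.
func commandAllowed(allowlist, commandLine string) bool {
	if strings.ContainsAny(commandLine, ";|&`$<>\n") {
		return false
	}
	fields := strings.Fields(commandLine)
	if len(fields) == 0 {
		return false
	}
	for _, entry := range strings.Split(allowlist, ",") {
		if strings.TrimSpace(entry) == fields[0] {
			return true
		}
	}
	return false
}

func main() {
	allow := "go,make,npm,cargo,grep,cat"
	for _, cmd := range []string{"go test ./...", "rm -rf /", "go version; rm -rf /"} {
		fmt.Printf("%-24q allowed: %v\n", cmd, commandAllowed(allow, cmd))
	}
}
```

Matching the exact first word (rather than its base name) keeps a path such as /tmp/go from passing as go.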

Use it from Zed (or any ACP client)

ACP/editor support is an optional way to run the same local chains inside an editor. Contenox speaks the Agent Client Protocol over stdio. Drop this into ~/.config/zed/settings.json:

{
  "agent_servers": {
    "Contenox": {
      "type": "custom",
      "command": "contenox",
      "args": ["acp"]
    }
  }
}

Open Zed's agent panel and pick Contenox. Your chain runs inside the editor: tool calls render as cards with the actual command/path, HITL prompts route through Zed's permission UI, and session history replays when you reopen the project. Chain selection lives at ~/.contenox/default-acp-chain.json (or set CONTENOX_ACP_CHAIN_PATH). Full guide: contenox.com/docs/guide/zed.

JetBrains (GoLand, IntelliJ IDEA, …) reads agent servers from ~/.jetbrains/acp.json. It uses the same binary but a different schema (no "type" field):

{
  "default_mcp_settings": { "use_custom_mcp": true, "use_idea_mcp": false },
  "agent_servers": {
    "Contenox": {
      "command": "contenox",
      "args": ["acp"]
    }
  }
}

Verified with GoLand 2026.1.2. Full guide: contenox.com/docs/guide/jetbrains.

AionUi is a free, local, open-source desktop chat UI for ACP agents. Add a Custom Agent: command contenox, args ["acp"]. Verified with AionUi 2.0.0. Full guide: contenox.com/docs/guide/aionui.


Local north star: long context on your own accelerator

Most of Contenox runs against whatever provider you choose. The native modeld daemon exists for one specific bet: a local AI coding agent on a single consumer accelerator that serves real, long-context work. The aim is an effective context far beyond a model's native window (the goal is ~200k tokens) on limited hardware, reached by treating context as resident state kept hot rather than a prompt resent every turn.

modeld is shaped entirely around that bet:

  • One model, one user, many sessions. A single active model slot serves many persistent sessions for one owner, so the device's whole memory and KV budget go to making that model deep and fast instead of multiplexing several.
  • Warm-reuse sessions. Each session keeps its stable prefix's KV hot and re-prefills only the changed suffix (EnsurePrefix → PrefillSuffix → Decode), so a long working context is paid for once, not resent on every turn.
  • Snapshot / restore. Session state is durable and branchable, so effective context outlives a single live process.
  • Accelerator-driven, no knobs. modeld detects the accelerator and derives offload and the effective window from the device at runtime, with no per-model flags.
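
The warm-reuse idea in the list above can be sketched in a few lines of Go: compare the tokens already held in the cache with the tokens of the next prompt, keep the shared prefix, and prefill only what follows. This is an illustration of the general technique, not modeld's implementation. The function names `sharedPrefixLen` and `plan` are hypothetical, and tokens are modelled as plain integers.

```go
package main

import "fmt"

// sharedPrefixLen returns how many leading tokens two sequences share.
func sharedPrefixLen(cached, next []int) int {
	n := 0
	for n < len(cached) && n < len(next) && cached[n] == next[n] {
		n++
	}
	return n
}

// plan splits the next prompt into the number of cached tokens whose KV
// entries can be reused and the suffix that still has to be prefilled.
func plan(cached, next []int) (reuse int, suffix []int) {
	reuse = sharedPrefixLen(cached, next)
	return reuse, next[reuse:]
}

func main() {
	cached := []int{101, 7, 7, 42, 9}   // tokens already resident in the KV cache
	next := []int{101, 7, 7, 42, 13, 5} // tokens of the new turn
	reuse, suffix := plan(cached, next)
	fmt.Printf("reuse %d cached tokens, prefill %d new tokens\n", reuse, len(suffix))
}
```

With a long, stable prefix (system prompt, repository context), the prefill cost per turn then scales with the changed suffix rather than with the whole context.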

This is the direction the local backend is built toward, not a shipped guarantee on every model and device. The workflow runtime above doesn't depend on it; use any hosted or local provider today. How it maps onto the code (KV cache, warm reuse, capacity, the latency budget, and what's still required): Effective Context North Star.
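
For a sense of why capacity depends on the device, the Go sketch below estimates a memory-bound context window from the standard KV-cache footprint: one key and one value vector per layer and per KV head for each token. It covers only the memory side and is not modeld's planner. The `ModelShape` type, the function names, and the example dimensions are all hypothetical and do not describe a specific model.

```go
package main

import "fmt"

// ModelShape holds the dimensions that determine KV-cache size.
// All values used in main are illustrative.
type ModelShape struct {
	Layers         int
	KVHeads        int
	HeadDim        int
	BytesPerElem   int // 2 for fp16 KV entries
	TrainedContext int
}

// kvBytesPerToken is the KV-cache footprint of one token: a key and a
// value vector (factor 2) for every layer and KV head.
func kvBytesPerToken(m ModelShape) int64 {
	return 2 * int64(m.Layers) * int64(m.KVHeads) * int64(m.HeadDim) * int64(m.BytesPerElem)
}

// effectiveContext caps the trained window by what fits in free memory.
func effectiveContext(m ModelShape, freeBytes int64) int {
	per := kvBytesPerToken(m)
	if per <= 0 || freeBytes <= 0 {
		return 0
	}
	fit := freeBytes / per
	if fit < int64(m.TrainedContext) {
		return int(fit)
	}
	return m.TrainedContext
}

func main() {
	m := ModelShape{Layers: 32, KVHeads: 8, HeadDim: 128, BytesPerElem: 2, TrainedContext: 131072}
	free := int64(8) << 30 // 8 GiB left for the KV cache after weights
	fmt.Println("bytes per token:", kvBytesPerToken(m))
	fmt.Println("memory-bound context:", effectiveContext(m, free))
}
```

In this example each token costs 128 KiB of cache, so 8 GiB of free memory holds 65,536 tokens, half of the assumed trained window.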


Backends

The llama and openvino backends are local modeld-backed inference providers. contenox init registers them automatically and contenox model pull <name> downloads artifacts into ~/.contenox/models/<backend>/. The current CLI/VSIX release assets do not bundle modeld, so local modeld providers require a source build for now: modeld Source Build and Packaging.

To add other backends:

# Private network / self-hosted inference
contenox backend add ollama    --type ollama
contenox backend add myvllm    --type vllm   --url http://gpu-host:8000

# Hosted AI vendors
contenox backend add openai    --type openai    --api-key-env OPENAI_API_KEY
contenox backend add anthropic --type anthropic --api-key-env ANTHROPIC_API_KEY
contenox backend add mistral   --type mistral   --api-key-env MISTRAL_API_KEY
contenox backend add gemini    --type gemini    --api-key-env GEMINI_API_KEY
contenox backend add bedrock   --type bedrock   --url https://bedrock-runtime.us-east-1.amazonaws.com
contenox backend add vertex    --type vertex-google --url "https://us-central1-aiplatform.googleapis.com/v1/projects/$GOOGLE_CLOUD_PROJECT/locations/us-central1"

# Set your defaults
contenox config set default-model qwen3-8b
contenox config set default-provider llama

Build from source

Requires Go 1.25+.

git clone https://github.com/contenox/runtime
cd runtime
make build-contenox

Build and run local modeld for llama.cpp:

CONTENOX_MODELD_BACKEND=llama make run-modeld

Build and run local modeld for OpenVINO:

make deps-modeld
CONTENOX_MODELD_BACKEND=openvino make run-modeld

Build a relocatable Linux modeld bundle:

MODELD_DIST_DIR="$PWD/bin/modeld-linux-amd64" make package-modeld
tar -C bin -czf bin/modeld-linux-amd64.tar.gz modeld-linux-amd64

See modeld Source Build and Packaging for the complete local modeld flow.


Built on

The contenox CLI is pure Go. Local inference lives in the separate modeld daemon, which builds on these upstream projects (pinned in mk/llama-flags.mk and mk/openvino-flags.mk):

Project | Role | License
llama.cpp | GGUF inference and the ggml CPU/CUDA/HIP/Metal backends | MIT
OpenVINO | Inference runtime (CPU / iGPU / NPU) | Apache-2.0
OpenVINO GenAI | LLM pipeline over OpenVINO | Apache-2.0
OpenVINO Tokenizers | Tokenizer extension for OpenVINO GenAI | Apache-2.0
minja | Chat-template engine (vendored by OpenVINO GenAI) | MIT
gguf-tools | GGUF parsing headers (vendored by OpenVINO GenAI) | see upstream

Native backends are compiled, not embedded: modeld links these at build time and ships their runtime libraries inside each release package. Upstream license texts travel with the artifacts (licenses/ in dependency bundles, LICENSES/ in modeld packages). Other Go dependencies are listed in go.mod.

Provider integrations contenox talks to over the network (Ollama, vLLM, and hosted OpenAI-compatible vendors) are not built into contenox and are not listed here.


Questions: hello@contenox.com

Directories

Path Synopsis
cmd
contenox command
Contenox CLI: run task chains locally with SQLite-backed state.
modeld command
Command modeld is the contenox model daemon: the per-user, per-data-root owner of resident model state.
Package libauth provides secure authentication and authorization services using JWT tokens.
Package bus provides an interface for core publish-subscribe messaging.
Package libcipher provides a collection of cryptographic utilities for encryption, decryption, integrity verification, and secure key generation.
Package liblease implements a cooperative, time-bounded file lease: a single-holder lock backed by an ordinary file, not an OS primitive.
Package routine provides utilities for managing recurring tasks (routines) with circuit breaker protection.
modeld
capacity
Package capacity is modeld's hardware capacity planner: it resolves the EFFECTIVE context window a model can actually be served at on this device, from the model's KV-cache footprint and the device's free memory, not the model's trained ceiling alone.
internal/sessionkit
Package sessionkit holds the small backend-neutral helpers shared by the modeld transport.Session adapters (llama.cpp in modeld/llama/llamasession and OpenVINO in modeld/openvino).
llama
Package llama defines the modeld-side llama backend contract: persistent inference sessions keep a stable prefix's KV hot and re-prefill only the changed suffix.
llama/llamacppshim
Package llamacppshim owns the direct llama.cpp C API boundary for modeld.
openvino
Package openvino implements the runtime/transport.Service boundary for the OpenVINO (Intel) backend: it opens persistent, manifest-keyed sessions on the owned device (CPU / GPU / NPU) that the runtime drives over the transport.
openvino/ovsession
Package ovsession contains the native OpenVINO session/KV bridge used by the openvino modelrepo provider.
owner
Package owner manages lease-based ownership of the local runtime's resident state.
residency
Package residency contains modeld's backend-neutral KV residency policy.
slot
Package slot enforces modeld's single active local model invariant.
runtime
benchreport
Package benchreport is the common local-node benchmark report: one JSON shape emitted across every backend/model/hardware profile so runtime latency and warm-reuse claims stay honest.
chatservice
Package chatservice persists the conversation thread.
contenoxcli
backends.go contains helpers for LLM backend and provider config KV storage.
hitlservice
Package hitlservice evaluates approval policies for tool calls.
internal/modeldinstall
Package modeldinstall discovers, downloads, verifies, installs, and validates a prebuilt modeld package for the current machine.
internal/modeldprobe
Package modeldprobe detects whether the modeld daemon (the separate CGO inference binary) is installed, running, or dead, so the runtime can fail honestly and the setup wizard can guide the user.
internal/setupcheck
Package setupcheck evaluates local runtime readiness (defaults, backends) for the CLI.
internal/tools
internal/tools/multi_repo.go
llmrepo
Package llmrepo provides a unified facade over LLM backends discovered via runtimestate: prompt, chat, streaming, embedding, and tokenization through a single ModelRepo interface.
localtools
Package localtools provides tools that fire around chain execution: approval gates and host-side helpers.
localtools/mcpoauth
Package mcpoauth implements the MCP OAuth 2.1 Authorization Code + PKCE flow for CLI clients.
mcpserverservice
Package mcpserverservice stores MCP server configs.
mcpworker
Package mcpworker keeps MCP server connections alive across chain steps.
modelrepo
Package modelrepo defines the provider-facing contracts for LLM backends: the Provider interface (capabilities + client factories), the per-capability client interfaces (LLMPromptExecClient, LLMChatClient, LLMEmbedClient, LLMStreamClient), and the shared request/response types (Message, ChatResult, StreamParcel, Tool, ChatArgument).
modelrepo/anthropic
Package anthropic is a direct (non-Vertex) provider for the Anthropic API (api.anthropic.com), which speaks the Messages API.
modelrepo/bedrock
Package bedrock is a provider for AWS Bedrock via the unified Converse API.
modelrepo/codec/chatcompletions
Package chatcompletions is a transport-agnostic codec for the OpenAI Chat Completions wire format (`/chat/completions`-style request/response and SSE streaming).
modelrepo/codec/messages
Package messages is a transport-agnostic codec for Anthropic's Messages API wire format (request, content-block response, and named-SSE-event streaming).
modelrepo/gemini
Package gemini implements the modelrepo.Provider contract against Google's Gemini Generative Language API.
modelrepo/llama
Package llama is the graduated local coding-node runtime: a persistent, workspace-scoped inference session that keeps a stable prefix's KV hot and re-prefills only the changed suffix (the live warm-reuse hot path), distinct from the toy fixed-constant `local` provider.
modelrepo/mistral
Package mistral is a direct (non-Vertex) provider for the Mistral API (api.mistral.ai), which speaks the OpenAI-compatible chat/completions format.
modelrepo/modeldconn
Package modeldconn is the runtime's client seam to the modeld daemon: it resolves the current lease leader (via modeldprobe), dials it over the gRPC transport, and opens sessions.
modelrepo/ollama
Package ollama implements the modelrepo.Provider contract against Ollama HTTP endpoints.
modelrepo/openai
Package openai implements the modelrepo.Provider contract against the OpenAI HTTP API and OpenAI-compatible endpoints.
modelrepo/openrouter
Package openrouter is a catalog provider for OpenRouter (openrouter.ai), which exposes 300+ models from many providers through a single OpenAI-compatible endpoint.
modelrepo/openvino
Package openvino is the runtime-side modelprovider for OpenVINO (Intel) local inference.
modelrepo/vertex
Package vertex implements the modelrepo.Provider contract against Google Vertex AI publisher endpoints, using OAuth bearer tokens minted from service-account credentials.
modelrepo/vllm
Package vllm implements the modelrepo.Provider contract against vLLM OpenAI-compatible HTTP endpoints.
ollamatokenizer
Package ollamatokenizer provides Tokenizer implementations used by llmrepo to count and split tokens for a given model.
runtimestate
Package runtimestate reconciles the declared state of LLM backends (from dbInstance) with their actual observed state.
sessionservice
Package sessionservice stores CLI chat sessions so conversations persist across terminal restarts.
taskengine
Package taskengine orchestrates an agent: it drives LLM turns, tool calls, and routing in a loop, defined as a JSON chain you version in git.
taskengine/llmretry
Package llmretry wraps a single LLM call with classified retry, exponential backoff, and an optional model fallback.
transport
Package transport defines the contract the modeld daemon implements and the runtime calls: a persistent, manifest-keyed warm-reuse inference session.
transport/grpc
Package grpc is the gRPC wire transport for the runtime/transport.Service contract.
tools
version command