swarmkit
Multi-agent coordination primitives for distributed AI agent swarms, built on NATS.
swarmkit ships the common building blocks every agent in a multi-agent system needs above its transport — pub/sub messaging, work dispatch, liveness detection, identity registration. The framing: agents are modern web services; NATS is the HTTP of agents; swarmkit is the layer of common primitives above that.
It pairs with agentkit, which handles single-agent concerns (LLM calls, tools, memory, policy, MCP, ACP, shutdown coordination). swarmkit handles agent-to-agent coordination.
Packages
| Package |
Purpose |
messaging |
Pub/sub + request/reply over NATS. Single NATS() constructor. Worker pools via Join. |
task |
JetStream-backed task dispatch. Dispatcher.Run/Start mirrors os/exec.Cmd.Run/Start. |
heartbeat |
Sender + Monitor for agent liveness. Channel-based Beats() and Deaths(). |
registry |
Durable agent registration. Agent + Skill schema aligned with Google A2A. |
All packages are NATS-native, ship with OpenTelemetry tracing, and implement Shutdown(ctx) error for graceful drain.
Install
go get github.com/vinayprograms/swarmkit@latest
Requires NATS server (with JetStream enabled for task and registry).
Quickstart
import (
"context"
"github.com/vinayprograms/swarmkit/messaging"
"github.com/vinayprograms/swarmkit/task"
"github.com/vinayprograms/swarmkit/heartbeat"
"github.com/vinayprograms/swarmkit/registry"
)
natsCfg := messaging.NATSConfig{URL: "nats://localhost:4222"}
bus, _ := messaging.NATS(natsCfg)
defer bus.Close()
reg, _ := registry.New(registry.Config{NATS: natsCfg})
defer reg.Close()
// Worker side
worker, _ := task.NewWorker(task.Config{NATS: natsCfg})
defer worker.Close()
worker.Handle("code.go", func(ctx context.Context, msg *task.Message) (*task.Result, error) {
// ... do the work ...
return task.NewResult(msg.ID, "worker-1", task.StatusSuccess), nil
})
// Heartbeat
sender, _ := heartbeat.NewSender(heartbeat.SenderConfig{Bus: bus, Agent: "worker-1"})
defer sender.Close()
// Register self
reg.Register(registry.Agent{
ID: "worker-1",
Skills: []registry.Skill{{ID: "code.go", Name: "Go coder"}},
})
// Dispatcher side (a different process)
disp, _ := task.NewDispatcher(task.Config{NATS: natsCfg})
defer disp.Close()
result, _ := disp.Run(context.Background(), "code.go", task.NewMessage("", map[string]string{
"file": "main.go",
}))
Cross-cutting features
OpenTelemetry tracing
Every package emits spans through OTel's global TracerProvider. Configure in your application:
import "go.opentelemetry.io/otel"
otel.SetTracerProvider(myTracerProvider)
otel.SetTextMapPropagator(propagation.TraceContext{})
Trace context propagates through NATS message headers (W3C TraceContext format), so spans flow from a dispatcher across the bus to a worker handler and back. No swarmkit-side configuration required — leave the global provider unset for noop.
Graceful shutdown
Long-lived types implement Shutdown(ctx context.Context) error for graceful drain. The signature matches agentkit/shutdown.Handler, so swarmkit instances register directly with shutdown.Sequence:
import "github.com/vinayprograms/agentkit/shutdown"
seq := shutdown.New(shutdown.Defaults())
seq.HandleSignals()
seq.RegisterWithPhase("worker", worker, 10) // task.Worker
seq.RegisterWithPhase("dispatcher", dispatcher, 10) // task.Dispatcher
seq.RegisterWithPhase("heartbeat", sender, 20)
seq.RegisterWithPhase("registry", reg, 30)
seq.RegisterWithPhase("messaging", bus, 40)
Shutdown(ctx) waits for in-flight work to complete (drain semantic), respecting ctx's deadline. Close() is the immediate-stop equivalent.
swarmkit does not import agentkit. The compatibility comes from Go's structural typing — duck typing matches Shutdown(ctx) error to shutdown.Handler.
Design principles
- Don't ship without validated consumers. Speculative packages (
state, results, resume, ratelimit, plus coord/events/schedule/identity) were considered and rejected because their use cases were either not validated or already solved by NATS itself.
- NATS-native, not abstracted. swarmkit doesn't hide NATS — it provides agent-shaped primitives over it. JetStream is fully internal to packages that need it (
task, registry); pub/sub-only packages (heartbeat) consume messaging.Bus.
- Each package is independently usable. No cross-package coupling beyond
messaging.Bus interface. A consumer who only needs heartbeat doesn't pay for task's JetStream dependency.
- Shape A interface architecture.
messaging.Bus is the only published interface other packages consume. task, heartbeat, registry each take what they need (Bus or NATSConfig) — no shared facade.
Architecture
Application code
│
┌────────────────┼─────────────────┐
│ │ │
task heartbeat registry
(work dispatch) (liveness) (identity + skills)
│ │ │
│ messaging.Bus │
│ │ │
└────────────────┼─────────────────┘
│
NATS
(JetStream for task & registry,
core pub/sub for messaging & heartbeat)
License
Apache 2.0 — see LICENSE.