Documentation ¶
Overview ¶
Package devstack centralises per-test dev-stack assembly.
Source of truth ¶
Since Phase 110d (D-197) this package's `Assemble` is a THIN wrapper over `internal/runtime/assemble.Assemble` — the SAME promoted fan-out `cmd/harbor/cmd_dev.go::bootDevStack` wraps. Production ↔ devstack subsystem-wiring parity therefore holds by construction; the pre-110d hand-mirrored copy (the D-094 "MUST track production field-for-field" discipline, which drifted anyway — the MCP ToolPolicy projection drop, the missing cfg-declared OAuth providers) is deleted. What remains per-caller is the test-kit-only band: dev auth signer, draft store, transports/mux, and the per-task run-loop driver mirror (whose POPULATION helpers are shared; the subscriber shell is per-caller).
What this package replaces ¶
Before D-094, four integration test files each duplicated ~100–200 LOC of stack assembly (audit + events + state + tasks + steering + protocol + auth + transports + catalog + builder):
- `test/integration/wave11_test.go::buildWave11Stack`
- `test/integration/phase64_harbor_dev_helpers_test.go::buildPhase64TestStack`
- `test/integration/phase64a_catalog_wiring_test.go::buildPhase64aEnv`
- `test/integration/phase31_approval_gates_test.go::buildPhase31Env`
Each tested a slightly different layer subset. The `AssembleOpts` `Skip*` knobs let a caller opt out of layers it does not exercise (auth / transports / catalog / steering); everything else is always built so the tests prove the layers the production binary composes still compose under the helper.
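The interaction between the `Skip*` knobs can be sketched as a small resolution step. This is a minimal illustration, not the package's code: the `opts` and `layers` types and the `resolve` function are hypothetical stand-ins. It encodes only the two implications stated in the `AssembleOpts` field comments (skipping steering also skips transports, and the RunLoop needs both steering and the catalog).

```go
package main

import "fmt"

// opts mirrors only the Skip* flags; it is a local stand-in for this sketch.
type opts struct {
	SkipAuth, SkipTransports, SkipCatalog, SkipSteering, SkipRunLoop bool
}

// layers records which optional layers would be built.
type layers struct {
	Auth, Transports, Catalog, Steering, RunLoop bool
}

// resolve applies the implications described in the field comments.
func resolve(o opts) layers {
	l := layers{
		Auth:     !o.SkipAuth,
		Catalog:  !o.SkipCatalog,
		Steering: !o.SkipSteering,
	}
	// A Mux requires a ControlSurface, so skipping steering skips transports.
	l.Transports = !o.SkipTransports && l.Steering
	// The RunLoop needs both the steering Registry and the gates map.
	l.RunLoop = !o.SkipRunLoop && l.Steering && l.Catalog
	return l
}

func main() {
	fmt.Printf("%+v\n", resolve(opts{}))
	fmt.Printf("%+v\n", resolve(opts{SkipSteering: true}))
}
```

Resolving the flags in one place keeps the "which fields are nil" question answerable from the options alone.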
Real drivers everywhere — no mocks at the seam (CLAUDE.md §17.3) ¶
The helper opens REAL drivers via the registered factories: the patterns audit redactor and the inmem events / state / artifacts / tasks / memory drivers. Since Phase 110c (D-196) devstack imports the `internal/drivers/prod` aggregator itself, so test files no longer need to blank-import the production driver packages for registration to fire before Assemble is called. The one exception is the dev-only mock LLM driver; see "Required blank imports" in the godoc on `Assemble`.
Identity propagation ¶
The helper takes NO identity in its signature. Tests construct their own identity (`identity.Quadruple`) and pass it into individual calls. Every layer the helper wires reads identity from `ctx` per CLAUDE.md §6.
Concurrent reuse (D-025) ¶
The returned `*DevStack` is shaped like a compiled artifact: every field is concurrent-safe under N parallel invocations (the underlying drivers' concurrent-reuse tests already gate this). `DevStack.Close` is idempotent and safe to defer.
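An idempotent, defer-safe `Close` that tears layers down in reverse order can be sketched with `sync.Once`. This is an assumption about one reasonable shape, not the package's implementation; the `stack` type and `onClose` method are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// stack is a hypothetical bundle that collects closers as layers are built.
type stack struct {
	once    sync.Once
	closers []func()
}

// onClose registers a closer; call it right after a layer is opened.
func (s *stack) onClose(fn func()) { s.closers = append(s.closers, fn) }

// Close runs closers last-registered-first, at most once.
func (s *stack) Close() {
	s.once.Do(func() {
		for i := len(s.closers) - 1; i >= 0; i-- {
			s.closers[i]()
		}
	})
}

func main() {
	var order []string
	s := &stack{}
	for _, name := range []string{"bus", "state", "tasks"} {
		name := name // per-iteration copy for Go versions before 1.22
		s.onClose(func() { order = append(order, name) })
	}
	s.Close()
	s.Close() // second call is a no-op
	fmt.Println(order) // prints [tasks state bus]
}
```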
Phase 65 (D-099) hot-reload deliberately NOT mirrored ¶
The production `harbor dev` hot-reload supervisor (`cmd/harbor/cmd_dev_hot_reload.go`) wraps `bootDevStack`: it lives at the runDev level, not inside bootDevStack itself. The helper covers the same assembly bootDevStack performs, NOT the surrounding supervisor. Integration tests that need to exercise the hot-reload shape construct their own supervisor against the helper's assembled stack (the supervisor's exported constructor takes the boot opts and the initial stack, both reproducible here). Under D-094's "helper-tracks-production" rule this is a deliberate scope choice, not drift: a hot-reload "helper" that owned the rebuild loop would duplicate the cmd-side orchestrator with no test using it. When the rebuild orchestrator's shape next changes, both files (this one and `cmd/harbor/cmd_dev_hot_reload.go`) are revisited together.
`harbortest/devstack/session_ensurer.go` adapts the concrete `sessions.Registry` to the `protocol.SessionEnsurer` seam (D-171).
Mirrors `cmd/harbor/session_ensurer.go` field-for-field (D-094 source-of-truth invariant): the production dev boot and the production-mirroring fixture wire the SAME create-on-first-use behaviour, so an integration test against the devstack exercises the exact path `harbor dev` runs.
Constants ¶
const (
	DefaultDevTenant  = "dev"
	DefaultDevUser    = "dev"
	DefaultDevSession = "dev"
	// DefaultKID is the kid header the in-test ES256 signer stamps
	// on tokens. Matches `cmd/harbor`'s DevKID convention.
	DefaultKID = "harbor-test"
	// DefaultTokenTTL pins the validity of minted dev tokens to one
	// hour: short enough that a forgotten token cannot leak past
	// CI run boundaries, long enough that no test will hit refresh.
	DefaultTokenTTL = 1 * time.Hour
)
DefaultDevTenant / DefaultDevUser / DefaultDevSession match the `cmd/harbor` package-private dev-token constants. The Assemble helper mints a Bearer token under this identity when SkipAuth is false; tests that exercise the wire surface use this triple in their request bodies + JWT-validation expectations.
Types ¶
type AssembleOpts ¶
type AssembleOpts struct {
// SkipAuth disables Validator construction + dev-token minting.
// `DevStack.Validator` is nil and `DevStack.Token` is empty. Use
// for tests that exercise the catalog or in-process invariants
// and never touch the wire.
SkipAuth bool
// SkipTransports disables `transports.NewMux` + the HTTP router.
// `DevStack.Handler` / `DevStack.Mux` are nil. Use when the
// caller never opens an httptest.Server. The `tools.entries[]`
// catalog-wiring layer can still fire, because the catalog
// builder does not depend on transports.
SkipTransports bool
// SkipCatalog disables `tools.NewCatalog` + the Phase 64a
// `catalog.Builder` apply path. `DevStack.Catalog` /
// `DevStack.Coordinator` / `DevStack.Gates` are nil. Use for
// tests that only need the bus / state / tasks layers.
SkipCatalog bool
// SkipSteering disables `steering.NewRegistry` + the
// ControlSurface. `DevStack.Steering` / `DevStack.Surface` are
// nil. Implies SkipTransports because a Mux requires a
// ControlSurface.
SkipSteering bool
// SkipRunLoop disables the `steering.RunLoop` construction and
// the per-task driver that subscribes to `task.spawned` to drive
// it (D-097, the production wiring that closes #114). When set,
// `DevStack.RunLoop` / `DevStack.RunLoopDriver` are nil. Tests
// that don't need the planner-step loop (anything that doesn't
// drive a `start` request to completion) set this to opt out;
// `wave11_test.go`'s post-D-097 wire-side approve E2E LEAVES the
// flag false so the production RunLoop fires.
//
// With the RunLoop enabled (SkipRunLoop left false), the in-test
// bridge for APPROVE/REJECT resolution is no longer needed (the
// production bridge in `steering.applier.routeThroughGate` fires
// from the RunLoop's drain), so callers that previously installed
// `runWave11WireBridge`-shaped goroutines can drop them.
//
// SkipRunLoop has no effect when SkipSteering or SkipCatalog is
// set: the RunLoop requires both the steering Registry and the
// catalog-applied gates map (the §13 primitive-with-consumer
// rule applied to the V1 wiring).
SkipRunLoop bool
// OAuthProviders pre-populates the OAuth-provider map the
// catalog Builder consults when an entry declares
// `tools.entries[].oauth`. Empty by default.
OAuthProviders map[string]toolauth.OAuthProvider
// PreRegisterTools is the descriptor list registered with the
// catalog BEFORE the Builder applies. Use this to register
// in-test tool fixtures (echo, stub, etc.) that operator config
// in `cfg.Tools.Entries` then wraps. Ignored when SkipCatalog is
// true.
PreRegisterTools []tools.ToolDescriptor
// LLMConfigSnapshot, when non-nil, overrides the LLM config
// snapshot the helper would otherwise compute from `cfg.LLM`.
// Phase 64 / D-089's `HARBOR_DEV_ALLOW_MOCK=1` path drives the
// production cmd to override `driver` to "mock"; the wave11
// integration test does the same thing. Pass an explicit
// snapshot to flip the driver without re-writing the yaml.
LLMConfigSnapshot *llm.ConfigSnapshot
// Logger, when non-nil, is threaded through the auth.Middleware
// wrapper for the draft handler so the helper's auth-rejection
// log lines match production exactly (D-094 helper-tracks-
// production rule; audit W2). When nil, the wrapper omits the
// MWLogger option — silent rejection in tests is fine.
Logger *slog.Logger
// PlannerOverride, when non-nil, replaces the registry-resolved
// planner concrete the helper would otherwise build from
// `cfg.Planner` (D-103). Tests that need a stub / scripted /
// pausing planner pass their own instance here; production code
// never sets this field (the registry path is the only way to
// reach a planner concrete in `harbor dev`). The override is
// applied AFTER the LLM client is built so the same `stack.LLMClient`
// the registry would have used is still available to the test.
PlannerOverride planner.Planner
// Identity overrides the dev-token's identity triple. Empty
// fields fall back to DefaultDev{Tenant,User,Session}.
Identity struct {
Tenant string
User string
Session string
}
// Phase 83f (D-149) — mirror the production cmd_dev.go
// per-run consumer wiring. The four fields are optional
// OVERRIDES: a set field wins; an unset field falls back to what
// the cfg implies, exactly like production (Phase 110c, D-196 —
// the fallbacks consume the same exported projections cmd does).
//
// `MemoryStore` is the store the per-task driver calls
// `GetLLMContext(ctx, q)` against. Nil falls back to the
// cfg-opened `DevStack.Memory` (nil when `memory.driver` unset).
// `SkillStore` is the store the kit's skills Directory browses
// (Phase 111d — D-201: the Directory is the `<skills_context>`
// producer, mirroring production). Nil falls back to the
// cfg-opened store when `skills.driver` is set (via
// `skills.SnapshotFromConfig`).
// `SkillsContextMax` caps the injected directory view's length
// when `skills.directory.max_entries` is unset; zero falls back
// to `cfg.Planner.SkillsContextMaxResolved()` (default 5,
// single-sourced at `config.DefaultSkillsContextMax`).
// `PlanningHints`, when non-nil, projects directly onto
// `RunContext.PlanningHints` for every run the driver spawns;
// nil falls back to `planner.HintsFromConfig(cfg.Planner.PlanningHints)`.
MemoryStore memory.MemoryStore
SkillStore skills.SkillStore
SkillsContextMax int
PlanningHints *planner.PlanningHints
// TracerOptions is forwarded verbatim to
// `assemble.Options.TracerOptions` (Phase 111f, D-203). Tests
// inject `telemetry.WithSpanExporter` with an in-memory recorder
// so the assembly-started bus→tracer bridge's spans are
// observable without a collector (the Wave C composed E2E is the
// first consumer).
TracerOptions []telemetry.TracerOption
// TopologyAccessor, when non-nil, is wired into the
// ControlSurface via protocol.WithTopologyAccessor so the Phase 74
// `topology.snapshot` method returns a real projection (D-114).
// Production `harbor dev` hosts no engine-graph (its runtime is
// planner/RunLoop-shaped), so its ControlSurface leaves the
// accessor nil; the Phase 74 integration test constructs a real
// `engine.Engine` and passes it here so the topology surface is
// exercised end-to-end with real drivers (CLAUDE.md §17.6 — the
// test fixture wires what the test needs; the production absence
// is documented, not a bug). Ignored when SkipSteering is set.
TopologyAccessor protocol.TopologyAccessor
// ScopeChecker, when non-nil, overrides the ControlSurface's
// admin-cross-tenant scope predicate (Phase 74 / D-114). The
// integration test injects a deterministic checker to exercise
// the cross-tenant admin path without standing up an
// auth.Middleware. Ignored when SkipSteering is set.
ScopeChecker protocol.ScopeChecker
// DraftRoot overrides the on-disk root the Phase 66 / D-100
// draft Store materialises drafts under. Empty falls back to a
// per-test temp dir (the helper picks one via testing.TempDir).
// Tests that want to share a root across multiple Assemble calls
// (rare) supply the same string twice.
//
// Cleanup responsibility (audit W5): when DraftRoot is empty, the
// helper picks the temp dir AND registers an os.RemoveAll cleanup
// on stack.Close. When DraftRoot is supplied explicitly, the
// caller OWNS the directory and is responsible for cleanup — the
// helper does NOT call os.RemoveAll on an operator-supplied path
// (it would clobber a caller-managed scratch dir). Use t.TempDir
// + DraftRoot together if you want both control and auto-cleanup.
DraftRoot string
}
AssembleOpts controls which layers the helper builds. The zero value builds everything the cfg implies — LLM / memory / artifacts / tasks plus auth + transports + catalog + steering.
Each `Skip*` is binary: when set, the corresponding `DevStack` fields are left nil (or empty, for string fields such as `Token`). Tests assert against the fields they exercise.
type DevStack ¶
type DevStack struct {
// Cfg is the *config.Config the caller passed in. Pinned on the
// stack so tests can read driver-specific knobs without
// threading the cfg through their own helpers.
Cfg *config.Config
// Audit / Bus / State / Artifacts / Tasks are always non-nil
// after a successful Assemble — they are the runtime's
// load-bearing core. The Memory / LLMClient fields are only
// non-nil when the cfg declared a driver for them.
Audit audit.Redactor
Bus events.EventBus
State state.StateStore
Artifacts artifacts.ArtifactStore
Tasks tasks.TaskRegistry
LLMClient llm.LLMClient
Memory memory.MemoryStore
// Telemetry / Tracer mirror the assembly's canonical structured
// Logger + OTel tracer (Phase 111f, D-203). Both bridges
// (bus→metrics, bus→tracer) are started by the assembly and join
// its closer chain — the kit inherits them as a thin caller.
Telemetry *telemetry.Logger
Tracer *telemetry.Tracer
// Skills is non-nil when the cfg declared `skills.driver` (opened
// via `skills.SnapshotFromConfig`, mirroring production cmd_dev —
// Phase 110c, D-196) or when the caller passed
// `AssembleOpts.SkillStore` (which always wins).
Skills skills.SkillStore
// Steering / Surface are nil when SkipSteering is set.
Steering *steering.Registry
Surface *protocol.ControlSurface
// Sessions is the StateStore-backed SessionRegistry (D-171). Always
// non-nil after a successful Assemble — it mirrors the production
// `cmd/harbor` boot path. The ControlSurface is wired with its
// create-on-first-use ensurer, and (when transports are mounted) the
// `sessions.*` Protocol routes project over it. Integration tests use
// it to assert per-request session create-on-first-use + restart
// re-discovery via the persistent catalog.
Sessions *sessions.Registry
// RunLoop / RunLoopDriver are nil when SkipRunLoop is set OR when
// SkipSteering / SkipCatalog forces the construction to be
// skipped (the RunLoop needs both the steering Registry and the
// catalog-applied gates map). Tests that drive a `start` request
// rely on these — without RunLoop, the spawned task sits at
// StatusPending forever and the planner never runs.
RunLoop *steering.RunLoop
RunLoopDriver *DevStackRunLoopDriver
// Catalog / Coordinator / Gates / OAuthProviders are nil when
// SkipCatalog is set. The Gates map is keyed by tool name and
// populated by the catalog Builder; tests that drive
// `gate.ResolveApproval` reach for it.
Catalog tools.ToolCatalog
Coordinator pauseresume.Coordinator
Gates map[string]*toolapproval.ApprovalGate
OAuthProviders map[string]toolauth.OAuthProvider
// Phase 83g (D-150): the MCP Registry the dev stack populates
// from cfg.Tools.MCPServers. Nil when SkipCatalog is set or no
// servers are configured. Integration tests inspect this
// directly to assert each configured server reached the Registry.
MCPRegistry *mcpdrv.Registry
// Validator / SigningKey / KID / Token are nil/empty when
// SkipAuth is set. The Token is a signed Bearer the caller
// stamps on outgoing HTTP requests; SigningKey is the matching
// private key callers use to mint additional tokens (e.g. a
// bogus token for the failure-mode test).
Validator auth.Validator
SigningKey *ecdsa.PrivateKey
KID string
Token string
// Mux / Handler are nil when SkipTransports is set. Handler is
// the composed mux that exposes /healthz + /readyz + /v1/*; it
// is the value tests pass to httptest.NewServer.
Mux *http.ServeMux
Handler http.Handler
// DraftStore is the Phase 66 / D-100 draft scratchpad. Always
// non-nil after a successful Assemble — the helper mirrors
// production (D-094 source-of-truth invariant). Tests that
// exercise the draft surface read DraftStore.Root() for the on-
// disk path or drive the HTTP handler mounted at
// devdraft.RoutePrefix.
DraftStore *devdraft.Store
// Close runs every subsystem's Close in reverse dependency
// order. Idempotent: safe to defer; safe to call multiple
// times.
Close func()
// contains filtered or unexported fields
}
DevStack is the bundle Assemble returns. Fields are nil when the corresponding layer was skipped via AssembleOpts.
func Assemble ¶
Assemble builds the dev stack the production `harbor dev` subcommand boots. Since Phase 110d (D-197) it is a THIN wrapper over the promoted `internal/runtime/assemble` fan-out — the same entry point `cmd/harbor/cmd_dev.go::bootDevStack` wraps — plus the test-kit-only legs (dev auth signer, draft store, transports/mux, the per-task run-loop driver mirror).
The helper is `*testing.T`-flavoured: every failure is a `t.Fatalf` so tests don't need to thread error returns. On success, the caller defers `stack.Close()` immediately.
stack := devstack.Assemble(t, cfg, devstack.AssembleOpts{})
defer stack.Close()
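The "every failure is a `t.Fatalf`" style can be illustrated with a small fail-fast constructor. This is a sketch, not the package's code: the `tb` interface, `mustOpen`, and `fakeT` are hypothetical. `tb` is a subset of the standard `testing.TB` interface, so a real `*testing.T` satisfies it; the fake exists only so the example runs outside `go test`.

```go
package main

import (
	"errors"
	"fmt"
)

// tb is the subset of testing.TB this sketch needs; *testing.T satisfies it.
type tb interface {
	Helper()
	Fatalf(format string, args ...any)
}

// mustOpen is a hypothetical fail-fast constructor: it returns a value and
// reports failure through t instead of returning an error.
func mustOpen(t tb, name string) string {
	t.Helper()
	if name == "" {
		t.Fatalf("mustOpen: %v", errors.New("empty name"))
		return "" // only reached with fakes; the real Fatalf stops the test goroutine
	}
	return "opened:" + name
}

// fakeT records failures so the sketch runs without the testing package.
type fakeT struct{ failed []string }

func (f *fakeT) Helper() {}

func (f *fakeT) Fatalf(format string, args ...any) {
	f.failed = append(f.failed, fmt.Sprintf(format, args...))
}

func main() {
	ft := &fakeT{}
	fmt.Println(mustOpen(ft, "bus"))
	mustOpen(ft, "")
	fmt.Println(ft.failed)
}
```

Calling `t.Helper()` first means a failure is attributed to the test's call site rather than to a line inside the helper.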
Required blank imports ¶
None for the production set (Phase 110c, D-196): devstack imports the `internal/drivers/prod` aggregator itself, so every production driver factory AND the full LLM wrapper chain (corrections / downgrade / retry / governance) are seated by construction — the same registrations `cmd/harbor/main.go` boots with. The hand-curated per-test import list (and the drift it invited — SDK friction audit §7) is gone. The ONE driver outside the set is the dev-only mock LLM; a test that flips the snapshot to `driver: mock` still adds:
import _ "github.com/hurtener/Harbor/internal/llm/mock"
Existing per-test driver blank imports remain harmless: Go runs a package's init exactly once regardless of how many packages import it.
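The register-on-init pattern that blank imports rely on can be sketched in a single file. Everything here is hypothetical (the `factory` type, `register`, `open`, and the driver names); in a real layout the `init` function would live in the driver package and run when that package is linked in.

```go
package main

import (
	"fmt"
	"sync"
)

// factory builds a driver; a string stands in for a real driver value.
type factory func() string

var (
	mu        sync.RWMutex
	factories = map[string]factory{}
)

// register is what a driver package would call from its init function.
func register(name string, f factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("driver registered twice: " + name)
	}
	factories[name] = f
}

// open resolves a driver by its configured name.
func open(name string) (string, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unknown driver %q (is its package imported?)", name)
	}
	return f(), nil
}

// init runs once per program, however many packages import this one.
func init() {
	register("inmem", func() string { return "inmem driver" })
}

func main() {
	fmt.Println(open("inmem"))
	fmt.Println(open("mock")) // not registered here, so this reports an error
}
```

The duplicate-registration panic is safe precisely because init runs once; it would only fire if two different packages claimed the same name.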
type DevStackRunLoopDriver ¶
type DevStackRunLoopDriver struct {
// contains filtered or unexported fields
}
DevStackRunLoopDriver mirrors `cmd/harbor`'s package-private `perTaskRunLoopDriver`. The duplication is intentional per D-094's source-of-truth invariant: both ship the same shape (subscribe to `task.spawned`, launch a goroutine per spawned foreground task, drive the planner via `RunLoop.Run`, drain on Close). When the production shape evolves, both move in the same PR.
The driver is exported as a pointer-shaped opaque type — tests inspect via the `RunLoop` field rather than reaching into the driver's internals.
func (*DevStackRunLoopDriver) TrajectoryByTaskID ¶ added in v1.3.0
func (d *DevStackRunLoopDriver) TrajectoryByTaskID(taskID tasks.TaskID) *planner.Trajectory
TrajectoryByTaskID returns the planner trajectory for a completed run, or nil when the task's trajectory has been evicted or never existed. Reads are safe under concurrent access (RLock / D-025). This is the D-094 mirror of the production driver's accessor, which is the Enricher seam's trajectory source (Phase 107a parity, D-195 dated-note follow-up).
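A read-locked accessor with nil-on-missing semantics might look like the sketch below. The `store` and `trajectory` types and their methods are hypothetical placeholders for the driver's unexported state; only the RLock-guarded read and the nil result for evicted or unknown tasks are taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// trajectory is a placeholder for the planner's trajectory type.
type trajectory struct{ Steps []string }

// store keeps finished trajectories keyed by task ID.
type store struct {
	mu   sync.RWMutex
	byID map[string]*trajectory
}

func newStore() *store { return &store{byID: map[string]*trajectory{}} }

// put records a trajectory under the write lock.
func (s *store) put(id string, tr *trajectory) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byID[id] = tr
}

// evict drops a trajectory under the write lock.
func (s *store) evict(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.byID, id)
}

// get takes only the read lock, so concurrent readers do not block each
// other. It returns nil for an unknown or evicted task.
func (s *store) get(id string) *trajectory {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.byID[id]
}

func main() {
	s := newStore()
	s.put("t1", &trajectory{Steps: []string{"plan", "call"}})
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.get("t1")
		}()
	}
	wg.Wait()
	s.evict("t1")
	fmt.Println(s.get("t1") == nil, s.get("never") == nil) // prints true true
}
```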