devstack

package
Version: v1.3.1
Published: Jun 11, 2026 | License: Apache-2.0 | Imports: 61 | Imported by: 0

Documentation

Overview

Package devstack centralises per-test dev-stack assembly.

Source of truth

Since Phase 110d (D-197) this package's `Assemble` is a THIN wrapper over `internal/runtime/assemble.Assemble` — the SAME promoted fan-out `cmd/harbor/cmd_dev.go::bootDevStack` wraps. Production ↔ devstack subsystem-wiring parity therefore holds by construction; the pre-110d hand-mirrored copy (the D-094 "MUST track production field-for-field" discipline, which drifted anyway — the MCP ToolPolicy projection drop, the missing cfg-declared OAuth providers) is deleted. What remains per-caller is the test-kit-only band: dev auth signer, draft store, transports/mux, and the per-task run-loop driver mirror (whose POPULATION helpers are shared; the subscriber shell is per-caller).

What this package replaces

Before D-094, four integration test files each duplicated ~100–200 LOC of stack assembly (audit + events + state + tasks + steering + protocol + auth + transports + catalog + builder):

  • `test/integration/wave11_test.go::buildWave11Stack`
  • `test/integration/phase64_harbor_dev_helpers_test.go::buildPhase64TestStack`
  • `test/integration/phase64a_catalog_wiring_test.go::buildPhase64aEnv`
  • `test/integration/phase31_approval_gates_test.go::buildPhase31Env`

Each tested a slightly different layer subset. The `AssembleOpts` `Skip*` knobs let a caller opt out of layers it does not exercise (auth / transports / catalog / steering); everything else is always built so the tests prove the layers the production binary composes still compose under the helper.

Real drivers everywhere — no mocks at the seam (CLAUDE.md §17.3)

The helper opens REAL drivers via the registered factories: the patterns audit redactor and the inmem events / state / artifacts / tasks / memory drivers. Driver registration must fire before Assemble is called. Since Phase 110c (D-196) devstack imports the `internal/drivers/prod` aggregator itself, so tests no longer need per-driver blank imports for the production set; see the godoc on `Assemble` for the one remaining case (the dev-only mock LLM).

Identity propagation

The helper takes NO identity in its signature. Tests construct their own (`identity.Quadruple`) and pass them into individual calls. Every layer the helper wires reads identity from `ctx` per CLAUDE.md §6.

Concurrent reuse (D-025)

The returned `*DevStack` is shaped like a compiled artifact: every field is concurrent-safe under N parallel invocations (the underlying drivers' concurrent-reuse tests already gate this). `DevStack.Close` is idempotent and safe to defer.
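The idempotent, safe-to-defer `Close` described above can be sketched with `sync.Once` and a reverse-order walk over registered teardown steps. This is a minimal illustration, not the package's implementation: `closerChain`, `add`, and `closeOrder` are hypothetical names invented for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// closerChain is a hypothetical stand-in for a stack's teardown list.
type closerChain struct {
	once    sync.Once
	mu      sync.Mutex
	closers []func()
}

// add registers a teardown step; steps later run in reverse registration order.
func (c *closerChain) add(fn func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.closers = append(c.closers, fn)
}

// Close runs every registered step exactly once, last-registered first.
// Further calls are no-ops, which is what makes it safe to defer.
func (c *closerChain) Close() {
	c.once.Do(func() {
		c.mu.Lock()
		steps := c.closers
		c.mu.Unlock()
		for i := len(steps) - 1; i >= 0; i-- {
			steps[i]()
		}
	})
}

// closeOrder builds a three-layer chain, closes it twice, and reports the
// order in which the layers were torn down.
func closeOrder() []string {
	var order []string
	c := &closerChain{}
	for _, name := range []string{"bus", "state", "tasks"} {
		name := name
		c.add(func() { order = append(order, name) })
	}
	c.Close()
	c.Close() // second call does nothing
	return order
}

func main() {
	fmt.Println(closeOrder())
}
```

Layers built later usually depend on layers built earlier, so tearing down in reverse keeps every dependency alive until its dependents have stopped.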

Phase 65 (D-099) hot-reload deliberately NOT mirrored

The production `harbor dev` hot-reload supervisor (`cmd/harbor/cmd_dev_hot_reload.go`) wraps `bootDevStack` — it lives at the runDev level, not inside bootDevStack itself. The helper mirrors bootDevStack's field-for-field assembly, NOT the surrounding supervisor: integration tests that need to exercise the hot-reload shape construct their own supervisor against the helper's assembled stack (the supervisor's exported constructor takes the boot opts and the initial stack — both reproducible here). Per D-094's "helper-tracks-production" rule, this is a deliberate scope choice, not drift: a hot-reload "helper" that owned the rebuild loop would duplicate the cmd-side orchestrator with no test using it. When the rebuild orchestrator's shape next changes, both files (this one and `cmd/harbor/cmd_dev_hot_reload.go`) are revisited together.

harbortest/devstack/session_ensurer.go — adapts the concrete sessions.Registry to the protocol.SessionEnsurer seam (D-171).

Mirrors `cmd/harbor/session_ensurer.go` field-for-field (D-094 source-of-truth invariant): the production dev boot and the production-mirroring fixture wire the SAME create-on-first-use behaviour, so an integration test against the devstack exercises the exact path `harbor dev` runs.
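As a rough illustration of the adapter idea, the sketch below wraps a concrete registry behind a one-method seam with create-on-first-use behaviour. The interface shape, method names, and the `registry` type are all assumptions made for this example; the real `protocol.SessionEnsurer` and `sessions.Registry` signatures are not shown in this documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionEnsurer is a hypothetical one-method seam.
type sessionEnsurer interface {
	// EnsureSession reports whether the call created the session.
	EnsureSession(id string) (created bool)
}

// registry stands in for a concrete session registry.
type registry struct {
	mu       sync.Mutex
	sessions map[string]struct{}
}

func newRegistry() *registry {
	return &registry{sessions: make(map[string]struct{})}
}

// getOrCreate records the session on first sight and reports whether it was new.
func (r *registry) getOrCreate(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.sessions[id]; ok {
		return false
	}
	r.sessions[id] = struct{}{}
	return true
}

// ensurerAdapter adapts *registry to the sessionEnsurer seam.
type ensurerAdapter struct{ reg *registry }

func (a ensurerAdapter) EnsureSession(id string) bool { return a.reg.getOrCreate(id) }

// newEnsurer returns the seam view over a fresh registry.
func newEnsurer() sessionEnsurer { return ensurerAdapter{reg: newRegistry()} }

func main() {
	e := newEnsurer()
	fmt.Println(e.EnsureSession("dev"), e.EnsureSession("dev"))
}
```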

Constants

const (
	DefaultDevTenant  = "dev"
	DefaultDevUser    = "dev"
	DefaultDevSession = "dev"

	// DefaultKID is the kid header the in-test ES256 signer stamps
	// on tokens. Matches `cmd/harbor`'s DevKID convention.
	DefaultKID = "harbor-test"

	// DefaultTokenTTL pins the validity of minted dev tokens to one
	// hour — short enough that a forgotten token cannot leak past
	// CI run boundaries, long enough that no test will hit refresh.
	DefaultTokenTTL = 1 * time.Hour
)

DefaultDevTenant / DefaultDevUser / DefaultDevSession match the `cmd/harbor` package-private dev-token constants. The Assemble helper mints a Bearer token under this identity when SkipAuth is false; tests that exercise the wire surface use this triple in their request bodies + JWT-validation expectations.
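To make the token constants concrete, here is a standard-library sketch of minting and checking an ES256 compact token that carries a `kid` header and a one-hour expiry. It is an assumption-laden illustration: the claim names (`tenant`, `user`, `session`) and every function name are hypothetical, and the helper's real minting code and claim layout are not documented here.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
	"strings"
	"time"
)

const (
	defaultKID      = "harbor-test"
	defaultTokenTTL = 1 * time.Hour
)

// devClaims uses hypothetical claim names chosen for this sketch.
type devClaims struct {
	Tenant  string `json:"tenant"`
	User    string `json:"user"`
	Session string `json:"session"`
	Exp     int64  `json:"exp"`
}

func b64(b []byte) string { return base64.RawURLEncoding.EncodeToString(b) }

func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// mintES256 signs the claims as a compact JWS with the given kid header.
func mintES256(key *ecdsa.PrivateKey, kid string, c devClaims) (string, error) {
	hdr, err := json.Marshal(map[string]string{"alg": "ES256", "typ": "JWT", "kid": kid})
	if err != nil {
		return "", err
	}
	body, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	signingInput := b64(hdr) + "." + b64(body)
	digest := sha256.Sum256([]byte(signingInput))
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	// ES256 signatures are the fixed-width concatenation r || s (32 bytes each).
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return signingInput + "." + b64(sig), nil
}

// verifyES256 checks the signature only; it does not validate exp or kid.
func verifyES256(pub *ecdsa.PublicKey, token string) bool {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return false
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	return ecdsa.Verify(pub, digest[:], r, s)
}

// mintDevToken mints a token under the default dev identity triple.
func mintDevToken() (string, *ecdsa.PrivateKey, error) {
	key, err := newKey()
	if err != nil {
		return "", nil, err
	}
	tok, err := mintES256(key, defaultKID, devClaims{
		Tenant: "dev", User: "dev", Session: "dev",
		Exp: time.Now().Add(defaultTokenTTL).Unix(),
	})
	return tok, key, err
}

func main() {
	tok, key, err := mintDevToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("verified:", verifyES256(&key.PublicKey, tok))
}
```

Keeping the signing key available alongside the token is what lets a test mint a second, deliberately mismatched token for a failure-mode check.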

Variables

This section is empty.

Functions

This section is empty.

Types

type AssembleOpts

type AssembleOpts struct {
	// SkipAuth disables Validator construction + dev-token minting.
	// `DevStack.Validator` / `DevStack.Token` are nil. Use for tests
	// that exercise the catalog or in-process invariants and never
	// touch the wire.
	SkipAuth bool

	// SkipTransports disables `transports.NewMux` + the HTTP router.
	// `DevStack.Handler` / `DevStack.Mux` are nil. Use when the
	// caller never opens an httptest.Server. The `tools.entries[]`
	// catalog-wiring layer still fires: the catalog builder does
	// not depend on transports.
	SkipTransports bool

	// SkipCatalog disables `tools.NewCatalog` + the Phase 64a
	// `catalog.Builder` apply path. `DevStack.Catalog` /
	// `DevStack.Coordinator` / `DevStack.Gates` are nil. Use for
	// tests that only need the bus / state / tasks layers.
	SkipCatalog bool

	// SkipSteering disables `steering.NewRegistry` + the
	// ControlSurface. `DevStack.Steering` / `DevStack.Surface` are
	// nil. Implies SkipTransports because a Mux requires a
	// ControlSurface.
	SkipSteering bool

	// SkipRunLoop disables the `steering.RunLoop` construction and
	// the per-task driver that subscribes to `task.spawned` to drive
	// it (D-097, the production wiring that closes #114). When set,
	// `DevStack.RunLoop` / `DevStack.RunLoopDriver` are nil. Tests
	// that don't need the planner-step loop (anything that doesn't
	// drive a `start` request to completion) set this to opt out;
	// `wave11_test.go`'s post-D-097 wire-side approve E2E LEAVES the
	// flag false so the production RunLoop fires.
	//
	// With the RunLoop enabled (SkipRunLoop false), the in-test
	// bridge for APPROVE/REJECT resolution is no longer needed (the
	// production bridge in `steering.applier.routeThroughGate` fires
	// from the RunLoop's drain), so callers that previously installed
	// `runWave11WireBridge`-shaped goroutines can drop them.
	//
	// SkipRunLoop has no effect when SkipSteering or SkipCatalog is
	// set: the RunLoop requires both the steering Registry and the
	// catalog-applied gates map (the §13 primitive-with-consumer
	// rule applied to the V1 wiring).
	SkipRunLoop bool

	// OAuthProviders pre-populates the OAuth-provider map the
	// catalog Builder consults when an entry declares
	// `tools.entries[].oauth`. Empty by default.
	OAuthProviders map[string]toolauth.OAuthProvider

	// PreRegisterTools is the descriptor list registered with the
	// catalog BEFORE the Builder applies. Use this to register
	// in-test tool fixtures (echo, stub, etc.) that operator config
	// in `cfg.Tools.Entries` then wraps. Ignored when SkipCatalog is
	// true.
	PreRegisterTools []tools.ToolDescriptor

	// LLMConfigSnapshot, when non-nil, overrides the LLM config
	// snapshot the helper would otherwise compute from `cfg.LLM`.
	// Phase 64 / D-089's `HARBOR_DEV_ALLOW_MOCK=1` path drives the
	// production cmd to override `driver` to "mock"; the wave11
	// integration test does the same thing. Pass an explicit
	// snapshot to flip the driver without re-writing the yaml.
	LLMConfigSnapshot *llm.ConfigSnapshot

	// Logger, when non-nil, is threaded through the auth.Middleware
	// wrapper for the draft handler so the helper's auth-rejection
	// log lines match production exactly (D-094 helper-tracks-
	// production rule; audit W2). When nil, the wrapper omits the
	// MWLogger option — silent rejection in tests is fine.
	Logger *slog.Logger

	// PlannerOverride, when non-nil, replaces the registry-resolved
	// planner concrete the helper would otherwise build from
	// `cfg.Planner` (D-103). Tests that need a stub / scripted /
	// pausing planner pass their own instance here; production code
	// never sets this field (the registry path is the only way to
	// reach a planner concrete in `harbor dev`). The override is
	// applied AFTER the LLM client is built so the same `stack.LLMClient`
	// the registry would have used is still available to the test.
	PlannerOverride planner.Planner

	// Identity overrides the dev-token's identity triple. Empty
	// fields fall back to DefaultDev{Tenant,User,Session}.
	Identity struct {
		Tenant  string
		User    string
		Session string
	}

	// Phase 83f (D-149) — mirror the production cmd_dev.go
	// per-run consumer wiring. The four fields are optional
	// OVERRIDES: a set field wins; an unset field falls back to what
	// the cfg implies, exactly like production (Phase 110c, D-196 —
	// the fallbacks consume the same exported projections cmd does).
	//
	// `MemoryStore` is the store the per-task driver calls
	// `GetLLMContext(ctx, q)` against. Nil falls back to the
	// cfg-opened `DevStack.Memory` (nil when `memory.driver` unset).
	// `SkillStore` is the store the kit's skills Directory browses
	// (Phase 111d — D-201: the Directory is the `<skills_context>`
	// producer, mirroring production). Nil falls back to the
	// cfg-opened store when `skills.driver` is set (via
	// `skills.SnapshotFromConfig`).
	// `SkillsContextMax` caps the injected directory view's length
	// when `skills.directory.max_entries` is unset; zero falls back
	// to `cfg.Planner.SkillsContextMaxResolved()` (default 5,
	// single-sourced at `config.DefaultSkillsContextMax`).
	// `PlanningHints`, when non-nil, projects directly onto
	// `RunContext.PlanningHints` for every run the driver spawns;
	// nil falls back to `planner.HintsFromConfig(cfg.Planner.PlanningHints)`.
	MemoryStore      memory.MemoryStore
	SkillStore       skills.SkillStore
	SkillsContextMax int
	PlanningHints    *planner.PlanningHints

	// TracerOptions is forwarded verbatim to
	// `assemble.Options.TracerOptions` (Phase 111f, D-203). Tests
	// inject `telemetry.WithSpanExporter` with an in-memory recorder
	// so the assembly-started bus→tracer bridge's spans are
	// observable without a collector (the Wave C composed E2E is the
	// first consumer).
	TracerOptions []telemetry.TracerOption

	// TopologyAccessor, when non-nil, is wired into the
	// ControlSurface via protocol.WithTopologyAccessor so the Phase 74
	// `topology.snapshot` method returns a real projection (D-114).
	// Production `harbor dev` hosts no engine-graph (its runtime is
	// planner/RunLoop-shaped), so its ControlSurface leaves the
	// accessor nil; the Phase 74 integration test constructs a real
	// `engine.Engine` and passes it here so the topology surface is
	// exercised end-to-end with real drivers (CLAUDE.md §17.6 — the
	// test fixture wires what the test needs; the production absence
	// is documented, not a bug). Ignored when SkipSteering is set.
	TopologyAccessor protocol.TopologyAccessor

	// ScopeChecker, when non-nil, overrides the ControlSurface's
	// admin-cross-tenant scope predicate (Phase 74 / D-114). The
	// integration test injects a deterministic checker to exercise
	// the cross-tenant admin path without standing up an
	// auth.Middleware. Ignored when SkipSteering is set.
	ScopeChecker protocol.ScopeChecker

	// DraftRoot overrides the on-disk root the Phase 66 / D-100
	// draft Store materialises drafts under. Empty falls back to a
	// per-test temp dir (the helper picks one via testing.TempDir).
	// Tests that want to share a root across multiple Assemble calls
	// (rare) supply the same string twice.
	//
	// Cleanup responsibility (audit W5): when DraftRoot is empty, the
	// helper picks the temp dir AND registers an os.RemoveAll cleanup
	// on stack.Close. When DraftRoot is supplied explicitly, the
	// caller OWNS the directory and is responsible for cleanup — the
	// helper does NOT call os.RemoveAll on an operator-supplied path
	// (it would clobber a caller-managed scratch dir). Use t.TempDir
	// + DraftRoot together if you want both control and auto-cleanup.
	DraftRoot string
}

AssembleOpts controls which layers the helper builds. The zero value builds everything the cfg implies — LLM / memory / artifacts / tasks plus auth + transports + catalog + steering.
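Several `AssembleOpts` fields follow the same "a set field wins, an unset field falls back" rule. The sketch below shows that rule for the identity triple and for `SkillsContextMax`; the helper names are hypothetical, and the fallback value of 5 is taken from the field comment above rather than from the real config package.

```go
package main

import "fmt"

const (
	defaultDevTenant  = "dev"
	defaultDevUser    = "dev"
	defaultDevSession = "dev"
)

// identityOverride mirrors the shape of the Identity field for this sketch.
type identityOverride struct {
	Tenant  string
	User    string
	Session string
}

// orDefault returns v unless it is empty, in which case def wins.
func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// resolveIdentity fills every empty field from the dev defaults.
func resolveIdentity(o identityOverride) identityOverride {
	return identityOverride{
		Tenant:  orDefault(o.Tenant, defaultDevTenant),
		User:    orDefault(o.User, defaultDevUser),
		Session: orDefault(o.Session, defaultDevSession),
	}
}

// resolveSkillsContextMax treats zero as "unset" and falls back to the
// value the config resolved (5 by default, per the field comment).
func resolveSkillsContextMax(override, cfgResolved int) int {
	if override != 0 {
		return override
	}
	return cfgResolved
}

func main() {
	fmt.Printf("%+v\n", resolveIdentity(identityOverride{Tenant: "acme"}))
	fmt.Println(resolveSkillsContextMax(0, 5), resolveSkillsContextMax(3, 5))
}
```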

Each `Skip*` flag is binary: when set, the corresponding `DevStack` fields are left nil. Tests assert against the fields they exercise.
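The implication rules spread across the `Skip*` field comments (SkipSteering implies SkipTransports; the RunLoop needs both steering and the catalog) can be summarised as a small pure function. This is a reading of the comments above under hypothetical type and function names, not the helper's actual control flow.

```go
package main

import "fmt"

// skipOpts carries only the Skip* flags from AssembleOpts.
type skipOpts struct {
	SkipAuth       bool
	SkipTransports bool
	SkipCatalog    bool
	SkipSteering   bool
	SkipRunLoop    bool
}

// layers reports which layers end up built (true) or left nil (false).
type layers struct {
	Auth       bool
	Transports bool
	Catalog    bool
	Steering   bool
	RunLoop    bool
}

// plan derives the effective layer set from the flags.
func plan(o skipOpts) layers {
	l := layers{
		Auth:     !o.SkipAuth,
		Catalog:  !o.SkipCatalog,
		Steering: !o.SkipSteering,
	}
	// A Mux requires a ControlSurface, so SkipSteering implies SkipTransports.
	l.Transports = !o.SkipTransports && l.Steering
	// The RunLoop needs both the steering Registry and the catalog-applied
	// gates map, so skipping either one also skips the RunLoop.
	l.RunLoop = !o.SkipRunLoop && l.Steering && l.Catalog
	return l
}

func main() {
	fmt.Printf("%+v\n", plan(skipOpts{}))
	fmt.Printf("%+v\n", plan(skipOpts{SkipSteering: true}))
}
```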

type DevStack

type DevStack struct {
	// Cfg is the *config.Config the caller passed in. Pinned on the
	// stack so tests can read driver-specific knobs without
	// threading the cfg through their own helpers.
	Cfg *config.Config

	// Audit / Bus / State / Artifacts / Tasks are always non-nil
	// after a successful Assemble — they are the runtime's
	// load-bearing core. The Memory / LLMClient fields are only
	// non-nil when the cfg declared a driver for them.
	Audit     audit.Redactor
	Bus       events.EventBus
	State     state.StateStore
	Artifacts artifacts.ArtifactStore
	Tasks     tasks.TaskRegistry
	LLMClient llm.LLMClient
	Memory    memory.MemoryStore

	// Telemetry / Tracer mirror the assembly's canonical structured
	// Logger + OTel tracer (Phase 111f, D-203). Both bridges
	// (bus→metrics, bus→tracer) are started by the assembly and join
	// its closer chain — the kit inherits them as a thin caller.
	Telemetry *telemetry.Logger
	Tracer    *telemetry.Tracer

	// Skills is non-nil when the cfg declared `skills.driver` (opened
	// via `skills.SnapshotFromConfig`, mirroring production cmd_dev —
	// Phase 110c, D-196) or when the caller passed
	// `AssembleOpts.SkillStore` (which always wins).
	Skills skills.SkillStore

	// Steering / Surface are nil when SkipSteering is set.
	Steering *steering.Registry
	Surface  *protocol.ControlSurface

	// Sessions is the StateStore-backed SessionRegistry (D-171). Always
	// non-nil after a successful Assemble — it mirrors the production
	// `cmd/harbor` boot path. The ControlSurface is wired with its
	// create-on-first-use ensurer, and (when transports are mounted) the
	// `sessions.*` Protocol routes project over it. Integration tests use
	// it to assert per-request session create-on-first-use + restart
	// re-discovery via the persistent catalog.
	Sessions *sessions.Registry

	// RunLoop / RunLoopDriver are nil when SkipRunLoop is set OR when
	// SkipSteering / SkipCatalog forces the construction to be
	// skipped (the RunLoop needs both the steering Registry and the
	// catalog-applied gates map). Tests that drive a `start` request
	// rely on these — without RunLoop, the spawned task sits at
	// StatusPending forever and the planner never runs.
	RunLoop       *steering.RunLoop
	RunLoopDriver *DevStackRunLoopDriver

	// Catalog / Coordinator / Gates / OAuthProviders are nil when
	// SkipCatalog is set. The Gates map is keyed by tool name and
	// populated by the catalog Builder; tests that drive
	// `gate.ResolveApproval` reach for it.
	Catalog        tools.ToolCatalog
	Coordinator    pauseresume.Coordinator
	Gates          map[string]*toolapproval.ApprovalGate
	OAuthProviders map[string]toolauth.OAuthProvider

	// Phase 83g (D-150): the MCP Registry the dev stack populates
	// from cfg.Tools.MCPServers. Nil when SkipCatalog is set or no
	// servers are configured. Integration tests inspect this
	// directly to assert each configured server reached the Registry.
	MCPRegistry *mcpdrv.Registry

	// Validator / SigningKey / KID / Token are nil/empty when
	// SkipAuth is set. The Token is a signed Bearer the caller
	// stamps on outgoing HTTP requests; SigningKey is the matching
	// private key callers use to mint additional tokens (e.g. a
	// bogus token for the failure-mode test).
	Validator  auth.Validator
	SigningKey *ecdsa.PrivateKey
	KID        string
	Token      string

	// Mux / Handler are nil when SkipTransports is set. Handler is
	// the composed mux that exposes /healthz + /readyz + /v1/*; it
	// is the value tests pass to httptest.NewServer.
	Mux     *http.ServeMux
	Handler http.Handler

	// DraftStore is the Phase 66 / D-100 draft scratchpad. Always
	// non-nil after a successful Assemble — the helper mirrors
	// production (D-094 source-of-truth invariant). Tests that
	// exercise the draft surface read DraftStore.Root() for the on-
	// disk path or drive the HTTP handler mounted at
	// devdraft.RoutePrefix.
	DraftStore *devdraft.Store

	// Close runs every subsystem's Close in reverse dependency
	// order. Idempotent: safe to defer; safe to call multiple
	// times.
	Close func()
	// contains filtered or unexported fields
}

DevStack is the bundle Assemble returns. Fields are nil when the corresponding layer was skipped via AssembleOpts.
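The `DraftRoot` ownership rule (helper-picked directories are removed on Close, caller-supplied directories are never touched) can be sketched as follows. The function names are hypothetical, and `os.MkdirTemp` stands in for the test-scoped temp dir so the sketch runs outside a test binary.

```go
package main

import (
	"fmt"
	"os"
)

// resolveDraftRoot returns the directory to use plus the cleanup to run on Close.
func resolveDraftRoot(draftRoot string) (root string, cleanup func(), err error) {
	if draftRoot != "" {
		// Caller-owned path: the cleanup is a no-op so a caller-managed
		// scratch directory is never clobbered.
		return draftRoot, func() {}, nil
	}
	dir, err := os.MkdirTemp("", "devstack-drafts-")
	if err != nil {
		return "", nil, err
	}
	// Helper-owned path: remove it when the stack closes.
	return dir, func() { _ = os.RemoveAll(dir) }, nil
}

func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// demo exercises both ownership cases and reports what survived cleanup.
func demo() (helperRemoved, callerKept bool, err error) {
	root, cleanup, err := resolveDraftRoot("")
	if err != nil {
		return false, false, err
	}
	cleanup()
	helperRemoved = !exists(root)

	own, err := os.MkdirTemp("", "caller-drafts-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(own) // the caller cleans up its own directory
	_, cleanupOwned, err := resolveDraftRoot(own)
	if err != nil {
		return false, false, err
	}
	cleanupOwned()
	callerKept = exists(own)
	return helperRemoved, callerKept, nil
}

func main() {
	fmt.Println(demo())
}
```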

func Assemble

func Assemble(t *testing.T, cfg *config.Config, opts AssembleOpts) *DevStack

Assemble builds the dev stack the production `harbor dev` subcommand boots. Since Phase 110d (D-197) it is a THIN wrapper over the promoted `internal/runtime/assemble` fan-out — the same entry point `cmd/harbor/cmd_dev.go::bootDevStack` wraps — plus the test-kit-only legs (dev auth signer, draft store, transports/mux, the per-task run-loop driver mirror).

The helper is `*testing.T`-flavoured: every failure is a `t.Fatalf` so tests don't need to thread error returns. On success, the caller defers `stack.Close()` immediately.

stack := devstack.Assemble(t, cfg, devstack.AssembleOpts{})
defer stack.Close()
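A typical wire-level test hands `Handler` to `httptest.NewServer` and stamps `Token` as a Bearer header. Because the real package cannot be imported from a standalone snippet, the sketch below uses a hypothetical `fakeStack` with just those two fields; the `/v1/ping` route and the auth check are invented for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fakeStack is a hypothetical stand-in carrying the two fields this sketch needs.
type fakeStack struct {
	Handler http.Handler
	Token   string
}

// newFakeStack builds a handler that accepts only the matching Bearer token.
func newFakeStack() *fakeStack {
	const token = "dev-token"
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/ping", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		io.WriteString(w, "ok")
	})
	return &fakeStack{Handler: mux, Token: token}
}

// call issues a GET against the server, stamping a Bearer token when one is given.
func call(srv *httptest.Server, path, token string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, srv.URL+path, nil)
	if err != nil {
		return 0, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// statuses returns the status codes seen with and without the token.
func statuses() (withToken, withoutToken int, err error) {
	stack := newFakeStack()
	srv := httptest.NewServer(stack.Handler)
	defer srv.Close()

	if withToken, err = call(srv, "/v1/ping", stack.Token); err != nil {
		return 0, 0, err
	}
	if withoutToken, err = call(srv, "/v1/ping", ""); err != nil {
		return 0, 0, err
	}
	return withToken, withoutToken, nil
}

func main() {
	fmt.Println(statuses())
}
```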

Required blank imports

None for the production set (Phase 110c, D-196): devstack imports the `internal/drivers/prod` aggregator itself, so every production driver factory AND the full LLM wrapper chain (corrections / downgrade / retry / governance) are seated by construction — the same registrations `cmd/harbor/main.go` boots with. The hand-curated per-test import list (and the drift it invited — SDK friction audit §7) is gone. The ONE driver outside the set is the dev-only mock LLM; a test that flips the snapshot to `driver: mock` still adds:

import _ "github.com/hurtener/Harbor/internal/llm/mock"

(existing per-test driver blank imports remain harmless — Go runs a package init exactly once regardless of how many importers).
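The registration mechanism that blank imports rely on can be sketched in a single file: a package-level registry plus an `init` function that seats a factory before `main` (or any test) runs. All names here are hypothetical; in a real layout the `init` would live in the driver's own package and fire through the blank import.

```go
package main

import (
	"fmt"
	"sync"
)

// factory builds a driver instance; a string stands in for the driver here.
type factory func() string

var (
	mu        sync.RWMutex
	factories = map[string]factory{}
)

// register seats a factory under a driver name, rejecting duplicates.
func register(name string, f factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("driver registered twice: " + name)
	}
	factories[name] = f
}

// open resolves a driver by name, failing when nothing registered it.
func open(name string) (string, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unknown driver %q (missing import?)", name)
	}
	return f(), nil
}

// init runs exactly once, after package-level variables are initialised,
// no matter how many packages import this one.
func init() {
	register("inmem", func() string { return "inmem driver" })
}

func main() {
	fmt.Println(open("inmem"))
	fmt.Println(open("mock"))
}
```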

type DevStackRunLoopDriver

type DevStackRunLoopDriver struct {
	// contains filtered or unexported fields
}

DevStackRunLoopDriver mirrors `cmd/harbor`'s package-private `perTaskRunLoopDriver`. The duplication is intentional per D-094's source-of-truth invariant: both ship the same shape (subscribe to `task.spawned`, launch a goroutine per spawned foreground task, drive the planner via `RunLoop.Run`, drain on Close). When the production shape evolves, both move in the same PR.

The driver is exported as a pointer-shaped opaque type — tests inspect via the `RunLoop` field rather than reaching into the driver's internals.
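The shape described above (subscribe, launch one goroutine per spawned task, drain on Close) can be sketched with a channel and a `sync.WaitGroup`. The `driver` type, its `publish` method, and the channel-based subscription are hypothetical stand-ins; the real driver subscribes to `task.spawned` on the event bus and calls `RunLoop.Run`.

```go
package main

import (
	"fmt"
	"sync"
)

// driver launches one goroutine per spawned task and drains them on Close.
type driver struct {
	spawned   chan string // stands in for the task.spawned subscription
	run       func(taskID string)
	wg        sync.WaitGroup
	done      chan struct{}
	closeOnce sync.Once
}

func newDriver(run func(string)) *driver {
	d := &driver{spawned: make(chan string), run: run, done: make(chan struct{})}
	go d.loop()
	return d
}

// loop is the subscriber shell: it fans each event out to its own goroutine.
func (d *driver) loop() {
	defer close(d.done)
	for id := range d.spawned {
		d.wg.Add(1)
		go func(id string) {
			defer d.wg.Done()
			d.run(id)
		}(id)
	}
}

// publish delivers a spawn event; it must not be called after Close.
func (d *driver) publish(id string) { d.spawned <- id }

// Close stops the subscription, then waits for in-flight runs to finish.
func (d *driver) Close() {
	d.closeOnce.Do(func() {
		close(d.spawned)
		<-d.done // the loop has exited, so no further wg.Add can happen
		d.wg.Wait()
	})
}

// runAll drives the given task IDs to completion and counts finished runs.
func runAll(ids []string) int {
	var mu sync.Mutex
	n := 0
	d := newDriver(func(string) {
		mu.Lock()
		n++
		mu.Unlock()
	})
	for _, id := range ids {
		d.publish(id)
	}
	d.Close()
	d.Close() // idempotent
	return n
}

func main() {
	fmt.Println(runAll([]string{"t1", "t2", "t3"}))
}
```

Waiting for the subscriber loop to exit before calling `wg.Wait` avoids a race between a late `wg.Add` and the drain.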

func (*DevStackRunLoopDriver) TrajectoryByTaskID added in v1.3.0

func (d *DevStackRunLoopDriver) TrajectoryByTaskID(taskID tasks.TaskID) *planner.Trajectory

TrajectoryByTaskID returns the planner trajectory for a completed run, or nil when the task's trajectory has been evicted or never existed. Reads are safe under concurrent access (RLock / D-025). This is the D-094 mirror of the production driver's accessor, which serves as the Enricher seam's trajectory source (Phase 107a parity, D-195 dated-note follow-up).
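A read-locked accessor with nil-on-miss semantics might look like the sketch below. The `trajectory` and `store` types and the `put` / `evict` helpers are hypothetical; only the lookup contract (nil when evicted or never recorded, reads under an RLock) comes from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// trajectory is a hypothetical stand-in for a planner trajectory.
type trajectory struct {
	Steps []string
}

// store keeps completed-run trajectories keyed by task ID.
type store struct {
	mu     sync.RWMutex
	byTask map[string]*trajectory
}

func newStore() *store { return &store{byTask: make(map[string]*trajectory)} }

// put records the trajectory for a finished task.
func (s *store) put(taskID string, t *trajectory) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byTask[taskID] = t
}

// evict drops a task's trajectory.
func (s *store) evict(taskID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.byTask, taskID)
}

// trajectoryByTaskID returns the stored trajectory, or nil when the entry
// was evicted or never existed. The read lock lets many readers proceed in
// parallel while writers are excluded.
func (s *store) trajectoryByTaskID(taskID string) *trajectory {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.byTask[taskID]
}

func main() {
	s := newStore()
	s.put("t1", &trajectory{Steps: []string{"plan", "act"}})
	fmt.Println(s.trajectoryByTaskID("t1") != nil, s.trajectoryByTaskID("t2") == nil)
}
```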
