gembatesting

package
v0.0.0-...-39bba70 Latest
Warning

This package is not in the latest version of its module.

Published: May 13, 2026 License: MIT Imports: 12 Imported by: 0

README

github.com/GembaCore/gemba-core/testing

Conformance harness for Gemba adaptors. Import this package from your adaptor's own go test suite to validate that your WorkPlane or OrchestrationPlaneAdaptor implementation satisfies the contracts in docs/adaptors/workplane.md and docs/adaptors/orchestration.md before you ship.

Resolves DD-12, DD-15. Published early per the Foolery-spike lesson (docs/prior-art/foolery.md): a contract is only real when external authors can run the contract's own tests against their code.

Why this package exists

Foolery's BackendPort is the canonical shape for "extension point you actually treat as external": the contract type lives next to the tests that bind the contract, and third-party backends can go get both to validate their implementation in their own CI. Gemba follows the same pattern — anything less leaves adaptor authors hand-rolling tests against docs/adaptors/*.md, which rots the moment the contract tightens.

Quick start

package beads_test

import (
    "context"
    "testing"

    "github.com/GembaCore/gemba-core/core"
    gembatesting "github.com/GembaCore/gemba-core/testing"
    "yourorg/beads" // your adaptor
)

func TestBeadsConformance(t *testing.T) {
    ctx := context.Background()
    impl := beads.New(ctx, beads.Config{...}) // supply your adaptor's own config here
    gembatesting.RunWorkPlaneConformance(t, impl, &gembatesting.WorkPlaneFixture{
        // An ID the backend is guaranteed not to contain; enables the Group F probe.
        KnownMissingID: core.WorkItemID("gemba/gemba/bd-does-not-exist"),
    })
}

Run it:

$ go test -v -run TestBeadsConformance ./...
=== RUN   TestBeadsConformance
=== RUN   TestBeadsConformance/A_describe_returns_valid_manifest
=== RUN   TestBeadsConformance/A_manifest_round_trips_json
=== RUN   TestBeadsConformance/A_describe_is_idempotent
=== RUN   TestBeadsConformance/E_capability_denial_matches_manifest
=== RUN   TestBeadsConformance/F_not_found_is_tagged_adaptor_error
--- PASS: TestBeadsConformance (0.00s)

Entry points

Two parallel APIs — same probes, same group layout — cover the two contexts a conformance run happens in:

  • RunWorkPlaneConformance(t, impl, fixture) / RunOrchestrationConformance — drive probes from a *testing.T (e.g., a TestXxxConformance test in your adaptor's Go test suite).
  • RunWorkPlaneProbes(impl, fixture) / RunOrchestrationProbes — programmatic, testing-free entry points returning a structured *Report. Used by the gemba adaptor test CLI (gm-e3.5) and by any CI system that would rather consume JSON than parse go test output.

RunWorkPlaneConformance(t, impl, fixture)

Exercises the probes in docs/adaptors/workplane.md:

| Group | Probe | Asserts |
|-------|-------|---------|
| A | describe_returns_valid_manifest | Describe() returns; CapabilityManifest.Validate() passes; ProtocolVersion matches core. |
| A | manifest_round_trips_json | Manifest re-decodes byte-identical through encoding/json. |
| A | describe_is_idempotent | Two consecutive Describe() calls return equal manifests. |
| E | capability_denial_matches_manifest | Gated ops (attach_evidence, list_sprints, read_budget_rollup) opt-out via manifest → adaptor returns capability_denied *core.AdaptorError. |
| F | not_found_is_tagged_adaptor_error | GetWorkItem(KnownMissingID) returns a tagged error that also satisfies errors.Is(err, core.ErrNotFound). Skipped when KnownMissingID is empty. |

RunOrchestrationConformance(t, impl, fixture)

Exercises the probes in docs/adaptors/orchestration.md:

| Group | Probe | Asserts |
|-------|-------|---------|
| A | describe_returns_valid_manifest | Manifest has non-empty workspace_kinds, group_modes, cost_axes, escalation_kinds, peek_modes and a valid Transport; OrchestrationAPIVersion matches core. |
| A | manifest_round_trips_json | Manifest re-decodes byte-identical through encoding/json. |
| B | list_pending_requests_returns_empty_slice | For a freshly-started session, ListPendingRequests returns a non-nil empty slice. |
| B | end_session_populates_close_reason | First terminal close populates Session.CloseReason with a valid SessionCloseReason. |
| B | end_session_idempotent_under_same_nonce | Second EndSession with the same (sessionID, nonce) is a no-op: no error. |
| B | end_session_terminal_absorbing | EndSession under a fresh nonce on an already-terminal session is a no-op: no error. |
| F | list_pending_requests_unknown_session_is_tagged | ListPendingRequests on an unknown id returns a tagged KindSessionNotFound error. Skipped when KnownMissingSessionID is empty. |

The Group B probes need a live session. Supply OrchestrationFixture.SessionStarter:

gembatesting.RunOrchestrationConformance(t, impl, &gembatesting.OrchestrationFixture{
    KnownMissingSessionID: "sess-does-not-exist",
    // myAdaptor and ctx are assumed to come from the enclosing test; the two
    // fixture-session methods stand for whatever helpers your adaptor exposes.
    SessionStarter: func(t *testing.T, adaptor core.OrchestrationPlaneAdaptor) (string, func()) {
        id, err := myAdaptor.MintFixtureSession(ctx)
        if err != nil {
            t.Fatal(err)
        }
        return id, func() { myAdaptor.DestroyFixtureSession(id) }
    },
})

RunWorkPlaneProbes(impl, fixture) *Report / RunOrchestrationProbes

The programmatic runner. Same probes as above, but failures accumulate into a *Report rather than failing a *testing.T. The CLI path:

gemba adaptor test --transport jsonl --target builtin:noop-work
gemba adaptor test --transport jsonl --target builtin:noop-work --json
gemba adaptor test --transport jsonl --target builtin:noop-work --junit out.xml

--target builtin:noop-work / builtin:noop-orch exercises the in-process reference adaptors. Remote targets (URL/socket/cmd) require a transport wire client (gm-e4.x); until those land, they fail fast with a structured not-yet-implemented error.

The orchestration fixture used by the CLI path supplies ProgrammaticSessionStarter (a *testing.T-free counterpart to SessionStarter); it is called once per Group B probe because those probes individually close the session they receive.
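
The two starter signatures can share one provisioning helper, as the fixture documentation suggests. The following is a minimal sketch under stated assumptions: fakeAdaptor, mintSession, and the orchestrationAdaptor interface are hypothetical stand-ins defined locally, not part of gemba-core, and a real adaptor would mint an assignment and call StartSession inside the helper.

```go
package main

import (
	"fmt"
	"testing"
)

// orchestrationAdaptor is a local stand-in for core.OrchestrationPlaneAdaptor.
type orchestrationAdaptor interface{}

// fakeAdaptor is a hypothetical adaptor that hands out in-memory sessions.
type fakeAdaptor struct {
	next int
	live map[string]bool
}

// mintSession is the shared helper: it provisions one fresh session and
// returns its id plus a cleanup callback that removes it again.
func (a *fakeAdaptor) mintSession() (string, func(), error) {
	a.next++
	id := fmt.Sprintf("sess-%d", a.next)
	a.live[id] = true
	return id, func() { delete(a.live, id) }, nil
}

// sessionStarter adapts the helper to the testing.T-driven signature;
// provisioning errors fail the test immediately.
func sessionStarter(a *fakeAdaptor) func(t *testing.T, impl orchestrationAdaptor) (string, func()) {
	return func(t *testing.T, _ orchestrationAdaptor) (string, func()) {
		t.Helper()
		id, cleanup, err := a.mintSession()
		if err != nil {
			t.Fatal(err)
		}
		return id, cleanup
	}
}

// programmaticStarter adapts the helper to the testing.T-free signature;
// provisioning errors are returned to the caller instead.
func programmaticStarter(a *fakeAdaptor) func(impl orchestrationAdaptor) (string, func(), error) {
	return func(_ orchestrationAdaptor) (string, func(), error) {
		return a.mintSession()
	}
}

func main() {
	a := &fakeAdaptor{live: map[string]bool{}}
	start := programmaticStarter(a)
	// Each call yields a fresh session, mirroring one call per Group B probe.
	for i := 0; i < 2; i++ {
		id, cleanup, err := start(a)
		fmt.Println(id, err, len(a.live))
		cleanup()
	}
}
```

Keeping the provisioning logic in one place means the go test path and the CLI path exercise sessions created the same way.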

Fixture contract

| Field | Required? | Used by |
|-------|-----------|---------|
| WorkPlaneFixture.KnownMissingID | optional | Group F not-found probe (GetWorkItem). |
| OrchestrationFixture.KnownMissingSessionID | optional | Group F unknown-session probe (ListPendingRequests). |
| OrchestrationFixture.SessionStarter | optional | All Group B lifecycle probes (end session, close reason, idempotency). |

Probes whose fixture data is missing are skipped rather than failed. A skip never fails your suite, but the corresponding t.Run subtest does not execute its assertions, so it is not evidence of conformance.

Stability

Signatures are stable across minor versions. New probe subtests MAY land as part of minor releases; passing adaptors MUST keep passing. Breaking changes to probe semantics land with a bump of core.ProtocolVersion.

Reference implementation

internal/adapter/noop/ ships a minimal in-memory adaptor that passes both harnesses (see internal/adapter/noop/conformance_test.go). Clone the test file as the starting point for your own adaptor suite.

Documentation

Overview

Package testing publishes Gemba's conformance test harness as an importable library so third-party adaptor authors can run our contract tests inside their own `go test` suites (gm-2am, Foolery-spike lesson).

Import as:

import gembatesting "github.com/GembaCore/gemba-core/testing"

The alias avoids the obvious clash with the standard-library `testing` package, which every harness consumer also imports.

Entry points

testing.T-driven (Go test binary):

  • RunWorkPlaneConformance(t, impl, fixture)
  • RunOrchestrationConformance(t, impl, fixture)

Programmatic (gm-e3.5, powers `gemba adaptor test`):

  • RunWorkPlaneProbes(impl, fixture) *Report
  • RunOrchestrationProbes(impl, fixture) *Report
  • WriteTextReport(w, r) / WriteJUnit(w, r)

Both APIs exercise the same probe set. The testing.T-driven path surfaces failures through t.Run subtests; the programmatic path collects them into a structured *Report that the CLI renders or exports to JUnit. A fixture argument lets the adaptor pre-seed the data the probes need (known-missing ids, a live session id) so that adaptors with no way to construct fixture state still pass the fixture-independent groups.

Conformance groups

WorkPlane (docs/adaptors/workplane.md):

  • Group A — manifest validity, JSON round-trip, Describe idempotency.
  • Group E — capability denial: gated ops the manifest opts out of surface as `capability_denied` AdaptorError (gm-4qf).
  • Group F — error algebra: every observed boundary error is a tagged *core.AdaptorError (gm-faz).
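
The Group F not-found contract combines two standard-library idioms: errors.As recovers the tagged error, and errors.Is matches the sentinel through Unwrap. The sketch below shows one way an error can satisfy both. AdaptorError, its Kind and Err fields, ErrNotFound, and getWorkItem are local stand-ins chosen for illustration; the real core.AdaptorError may be laid out differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a local stand-in for core.ErrNotFound.
var ErrNotFound = errors.New("not found")

// AdaptorError is a hypothetical tagged error. The field names are
// assumptions, not the real core.AdaptorError layout.
type AdaptorError struct {
	Kind string // machine-readable tag, e.g. "not_found"
	Err  error  // wrapped sentinel
}

func (e *AdaptorError) Error() string { return e.Kind + ": " + e.Err.Error() }

// Unwrap exposes the sentinel so errors.Is can see through the tag.
func (e *AdaptorError) Unwrap() error { return e.Err }

// getWorkItem mimics an adaptor lookup that always misses.
func getWorkItem(id string) error {
	return fmt.Errorf("get %q: %w", id, &AdaptorError{Kind: "not_found", Err: ErrNotFound})
}

// checkNotFound applies the two assertions the probe describes:
// the error is tagged, and it matches the not-found sentinel.
func checkNotFound(err error) error {
	var ae *AdaptorError
	if !errors.As(err, &ae) {
		return fmt.Errorf("error is not a tagged AdaptorError: %v", err)
	}
	if !errors.Is(err, ErrNotFound) {
		return fmt.Errorf("error does not match ErrNotFound: %v", err)
	}
	return nil
}

func main() {
	// A tagged, wrapped error satisfies both checks.
	fmt.Println(checkNotFound(getWorkItem("bd-does-not-exist")))
	// A bare sentinel matches errors.Is but carries no tag, so it is rejected.
	fmt.Println(checkNotFound(ErrNotFound))
}
```

Returning the bare sentinel is the common way to miss this contract: callers can match it, but the boundary loses the tag.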

Orchestration (docs/adaptors/orchestration.md):

  • Group A — manifest validity, required-axes coverage.
  • Group B — session lifecycle: EndSession idempotency (same-nonce replay + terminal-absorbing), CloseReason populated, active-turn protection, scope teardown, ListPendingRequests exists.
  • Group F — error algebra: ListPendingRequests on a bogus session id returns a tagged KindSessionNotFound.
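
To make the Group B lifecycle rules concrete, here is a rough sketch of an in-memory session table whose EndSession is idempotent under a replayed nonce and absorbing once terminal. Everything in it is hypothetical: the store type, the (id, nonce, reason) signature, and the "completed" / "aborted" reason strings are illustrative and are not taken from the core package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnknownSession stands in for a tagged KindSessionNotFound error.
var errUnknownSession = errors.New("session not found")

type session struct {
	terminal    bool
	closeReason string
	closeNonce  string // nonce of the call that actually closed the session
}

// store is a hypothetical in-memory session table.
type store struct {
	mu       sync.Mutex
	sessions map[string]*session
}

func newStore(ids ...string) *store {
	s := &store{sessions: map[string]*session{}}
	for _, id := range ids {
		s.sessions[id] = &session{}
	}
	return s
}

// EndSession closes a session. The signature is an assumption.
func (s *store) EndSession(id, nonce, reason string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	sess, ok := s.sessions[id]
	if !ok {
		return errUnknownSession
	}
	if sess.terminal {
		// Terminal is absorbing: a same-nonce replay and a fresh-nonce
		// close are both no-ops, and the first close reason is preserved.
		return nil
	}
	sess.terminal = true
	sess.closeReason = reason
	sess.closeNonce = nonce
	return nil
}

// CloseReason returns the recorded reason, or "" if unknown or still open.
func (s *store) CloseReason(id string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sess, ok := s.sessions[id]; ok {
		return sess.closeReason
	}
	return ""
}

func main() {
	s := newStore("sess-1")
	fmt.Println(s.EndSession("sess-1", "n1", "completed")) // first terminal close
	fmt.Println(s.EndSession("sess-1", "n1", "completed")) // same-nonce replay
	fmt.Println(s.EndSession("sess-1", "n2", "aborted"))   // fresh nonce, already terminal
	fmt.Println(s.CloseReason("sess-1"))
	fmt.Println(s.EndSession("sess-does-not-exist", "n1", "completed"))
}
```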

Stability

The function signatures are stable across minor releases; new probe groups MAY be added behind new subtests without bumping the major version, but existing passing adaptors MUST keep passing. Breaking probe changes land with a bump of core.ProtocolVersion.

Constants

This section is empty.

Variables

This section is empty.

Functions

func RunOrchestrationConformance

func RunOrchestrationConformance(t *testing.T, impl core.OrchestrationPlaneAdaptor, fixture *OrchestrationFixture)

RunOrchestrationConformance runs the OrchestrationPlane contract probes against impl. Subtests are named by conformance group so failures surface directly; fixture-independent probes always run.

fixture may be nil; that is equivalent to a zero-value OrchestrationFixture.

func RunWorkPlaneConformance

func RunWorkPlaneConformance(t *testing.T, impl core.WorkPlane, fixture *WorkPlaneFixture)

RunWorkPlaneConformance runs the WorkPlane contract probes against impl (see package doc for group breakdown). Each group is a t.Run subtest so failures point directly at the offending probe; callers should invoke this from a top-level TestXxxConformance and let the subtest names surface in go test -v output.

fixture may be nil; that is equivalent to passing a zero-value WorkPlaneFixture — fixture-independent probes still run.

Usage:

func TestBeadsConformance(t *testing.T) {
    impl := bd.New(ctx, cfg)
    gembatesting.RunWorkPlaneConformance(t, impl, &gembatesting.WorkPlaneFixture{
        KnownMissingID: "gemba/gemba/gm-does-not-exist",
    })
}

func WriteJUnit

func WriteJUnit(w io.Writer, r *Report) error

WriteJUnit serializes r as a JUnit XML <testsuites> document. The `gemba adaptor test --junit <path>` flag emits this exact format.

func WriteTextReport

func WriteTextReport(w io.Writer, r *Report)

WriteTextReport emits a human-readable per-group summary of r to w. It is the default format for `gemba adaptor test` and mirrors what go test -v shows, except that skipped and not-applicable groups are called out explicitly so operators don't misread a missing fixture as a pass.

Types

type GroupResult

type GroupResult struct {
	Name          string        `json:"name"`
	NotApplicable bool          `json:"not_applicable,omitempty"`
	Note          string        `json:"note,omitempty"`
	Probes        []ProbeResult `json:"probes,omitempty"`
}

GroupResult carries one conformance group's probes. Name is the group letter + short description (e.g. "A: describe + capability shape"). NotApplicable=true means the group has no probes on this plane in the current release (e.g. WorkPlane has no edge interface yet, so Group C reports not-applicable).

type JUnitFailure

type JUnitFailure struct {
	Message string `xml:"message,attr"`
	Body    string `xml:",chardata"`
}

JUnitFailure carries the rolled-up probe messages as a <failure> element. Message is the one-line summary; the body is the full message block so operators can see every assertion that fired.

type JUnitSkipped

type JUnitSkipped struct {
	Message string `xml:"message,attr"`
}

JUnitSkipped carries the skip reason.

type JUnitTestCase

type JUnitTestCase struct {
	Name      string        `xml:"name,attr"`
	ClassName string        `xml:"classname,attr"`
	Failure   *JUnitFailure `xml:"failure,omitempty"`
	Skipped   *JUnitSkipped `xml:"skipped,omitempty"`
}

JUnitTestCase represents one probe. Exactly one of Failure / Skipped is set on a non-passing case.

type JUnitTestSuite

type JUnitTestSuite struct {
	Name     string          `xml:"name,attr"`
	Tests    int             `xml:"tests,attr"`
	Failures int             `xml:"failures,attr"`
	Skipped  int             `xml:"skipped,attr"`
	Cases    []JUnitTestCase `xml:"testcase"`
}

JUnitTestSuite represents one conformance group.

type JUnitTestSuites

type JUnitTestSuites struct {
	XMLName  xml.Name         `xml:"testsuites"`
	Name     string           `xml:"name,attr,omitempty"`
	Tests    int              `xml:"tests,attr"`
	Failures int              `xml:"failures,attr"`
	Skipped  int              `xml:"skipped,attr"`
	Suites   []JUnitTestSuite `xml:"testsuite"`
}

JUnitTestSuites is the root element JUnit consumers (GitHub Actions, GitLab, Jenkins) look for. The fields are intentionally the common subset every renderer we care about understands — more elaborate extensions (system-out, properties, categories) are out of scope here.
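
As an illustration of how the struct tags above map to XML, the sketch below copies the documented types locally and round-trips a hand-written document through encoding/xml. The sample probe data and ClassName values are made up, and whether WriteJUnit itself indents its output or emits an XML header is an assumption here.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Local copies of the documented types, so the sketch compiles on its own.
type JUnitFailure struct {
	Message string `xml:"message,attr"`
	Body    string `xml:",chardata"`
}

type JUnitSkipped struct {
	Message string `xml:"message,attr"`
}

type JUnitTestCase struct {
	Name      string        `xml:"name,attr"`
	ClassName string        `xml:"classname,attr"`
	Failure   *JUnitFailure `xml:"failure,omitempty"`
	Skipped   *JUnitSkipped `xml:"skipped,omitempty"`
}

type JUnitTestSuite struct {
	Name     string          `xml:"name,attr"`
	Tests    int             `xml:"tests,attr"`
	Failures int             `xml:"failures,attr"`
	Skipped  int             `xml:"skipped,attr"`
	Cases    []JUnitTestCase `xml:"testcase"`
}

type JUnitTestSuites struct {
	XMLName  xml.Name         `xml:"testsuites"`
	Name     string           `xml:"name,attr,omitempty"`
	Tests    int              `xml:"tests,attr"`
	Failures int              `xml:"failures,attr"`
	Skipped  int              `xml:"skipped,attr"`
	Suites   []JUnitTestSuite `xml:"testsuite"`
}

// sampleSuites builds an illustrative document: one pass, one failure, one skip.
func sampleSuites() JUnitTestSuites {
	groupA := JUnitTestSuite{Name: "A", Tests: 2, Failures: 1, Cases: []JUnitTestCase{
		{Name: "describe_returns_valid_manifest", ClassName: "work.A"},
		{Name: "manifest_round_trips_json", ClassName: "work.A",
			Failure: &JUnitFailure{Message: "round-trip mismatch", Body: "bytes differ after re-encode"}},
	}}
	groupF := JUnitTestSuite{Name: "F", Tests: 1, Skipped: 1, Cases: []JUnitTestCase{
		{Name: "not_found_is_tagged_adaptor_error", ClassName: "work.F",
			Skipped: &JUnitSkipped{Message: "KnownMissingID not set"}},
	}}
	return JUnitTestSuites{Name: "work", Tests: 3, Failures: 1, Skipped: 1,
		Suites: []JUnitTestSuite{groupA, groupF}}
}

// render marshals the suites with an XML header and two-space indentation.
func render(s JUnitTestSuites) (string, error) {
	out, err := xml.MarshalIndent(s, "", "  ")
	if err != nil {
		return "", err
	}
	return xml.Header + string(out), nil
}

// parse decodes a document back into the same types.
func parse(doc string) (JUnitTestSuites, error) {
	var s JUnitTestSuites
	err := xml.Unmarshal([]byte(doc), &s)
	return s, err
}

func main() {
	doc, err := render(sampleSuites())
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(doc)
}
```

Passing cases carry neither a failure nor a skipped child, because nil pointer fields are omitted from the output.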

type OrchestrationFixture

type OrchestrationFixture struct {
	// KnownMissingSessionID is a session id the adaptor is guaranteed
	// NOT to have. The Group F probe calls ListPendingRequests(id) and
	// asserts the error is a tagged KindSessionNotFound AdaptorError.
	// Leave empty to skip.
	KnownMissingSessionID string

	// SessionStarter provisions a live session for the Group B
	// lifecycle probes. It returns the session id and a cleanup
	// callback the harness runs after the lifecycle probes complete.
	// Leave nil to skip the session-dependent probes.
	//
	// Used by the testing.T entry point (RunOrchestrationConformance).
	SessionStarter func(t *testing.T, impl core.OrchestrationPlaneAdaptor) (sessionID string, cleanup func())

	// ProgrammaticSessionStarter is the testing.T-free counterpart of
	// SessionStarter, used by RunOrchestrationProbes (the `gemba
	// adaptor test` CLI). It returns a fresh session id and an optional
	// cleanup callback, or an error if provisioning failed; the error
	// surfaces in the probe result instead of silently skipping.
	//
	// Each Group B probe requests its own session because most of them
	// move the session to terminal state. Adaptor authors implementing
	// both starters typically share a single internal helper that
	// mints an assignment and calls StartSession.
	ProgrammaticSessionStarter func(impl core.OrchestrationPlaneAdaptor) (sessionID string, cleanup func(), err error)

	// AcquireWorkspaceRequest seeds the Group D acquire/release event-
	// emission probe (gm-e3.6.3). Leave nil to skip the workspace half
	// of the event-emission group; adaptors without a workspace
	// allocator (noop) won't run it. The probe pairs the request with
	// a matching ReleaseWorkspace call so the workspace doesn't leak
	// into a follow-up probe.
	AcquireWorkspaceRequest *core.WorkspaceRequest
}

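OrchestrationFixture carries the optional state some orchestration probes need: a live session for Group B, a known-missing session id for Group F, and a workspace request for Group D. Probes that require a live session skip when SessionStarter is nil; the manifest / error-algebra probes still run regardless.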

type ProbeResult

type ProbeResult struct {
	Name     string   `json:"name"`
	Passed   bool     `json:"passed"`
	Skipped  bool     `json:"skipped,omitempty"`
	Messages []string `json:"messages,omitempty"`
}

ProbeResult is one probe's outcome. Passed=false with messages is a failure; Skipped=true means a required fixture was missing (e.g. no KnownMissingID, so the not-found probe could not run).

type Report

type Report struct {
	// Plane is "work" or "orchestration".
	Plane string `json:"plane"`
	// AdaptorName is the manifest-declared name (best-effort; empty if
	// Describe failed hard enough that no name is available).
	AdaptorName string `json:"adaptor_name,omitempty"`
	// AdaptorVersion is the manifest-declared version.
	AdaptorVersion string `json:"adaptor_version,omitempty"`
	// ProtocolVersion is what core.ProtocolVersion evaluated to at run
	// time; recorded so CI can correlate a failing run to a core pin.
	ProtocolVersion string `json:"protocol_version"`
	// StartedAt / FinishedAt bound the whole run for CI timing metrics.
	StartedAt  time.Time `json:"started_at"`
	FinishedAt time.Time `json:"finished_at"`
	// Groups is the A–F grouping defined in docs/adaptors/*.md.
	Groups []GroupResult `json:"groups"`
}

Report is the machine-readable summary of a conformance run. It is the shape the `gemba adaptor test` CLI emits (gm-e3.5), and what third-party CI systems read when they call the programmatic runner directly rather than go test.

One Report covers one plane (work or orchestration). A CLI run that tests both planes produces one Report per plane.
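
A CI job that consumes the JSON form might decode it along these lines. This is a sketch only: the types are local copies of the documented structs, and the sample document is hand-written for illustration (the adaptor name, protocol version, group name "F: error algebra", and skip message are invented, not real CLI output).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Local copies of the documented types and their JSON tags.
type ProbeResult struct {
	Name     string   `json:"name"`
	Passed   bool     `json:"passed"`
	Skipped  bool     `json:"skipped,omitempty"`
	Messages []string `json:"messages,omitempty"`
}

type GroupResult struct {
	Name          string        `json:"name"`
	NotApplicable bool          `json:"not_applicable,omitempty"`
	Note          string        `json:"note,omitempty"`
	Probes        []ProbeResult `json:"probes,omitempty"`
}

type Report struct {
	Plane           string        `json:"plane"`
	AdaptorName     string        `json:"adaptor_name,omitempty"`
	AdaptorVersion  string        `json:"adaptor_version,omitempty"`
	ProtocolVersion string        `json:"protocol_version"`
	StartedAt       time.Time     `json:"started_at"`
	FinishedAt      time.Time     `json:"finished_at"`
	Groups          []GroupResult `json:"groups"`
}

// sample is a hand-written, illustrative report.
const sample = `{
  "plane": "work",
  "adaptor_name": "example-adaptor",
  "protocol_version": "example",
  "started_at": "2026-05-13T10:00:00Z",
  "finished_at": "2026-05-13T10:00:01Z",
  "groups": [
    {"name": "A: describe + capability shape", "probes": [
      {"name": "describe_returns_valid_manifest", "passed": true}
    ]},
    {"name": "F: error algebra", "probes": [
      {"name": "not_found_is_tagged_adaptor_error", "passed": false, "skipped": true,
       "messages": ["KnownMissingID not set"]}
    ]}
  ]
}`

// decode parses one JSON report.
func decode(data string) (*Report, error) {
	var r Report
	if err := json.Unmarshal([]byte(data), &r); err != nil {
		return nil, err
	}
	return &r, nil
}

// skippedProbes lists "group/probe" for every probe recorded as skipped,
// so a fixture gap is visible instead of being mistaken for a pass.
func skippedProbes(r *Report) []string {
	var out []string
	for _, g := range r.Groups {
		for _, p := range g.Probes {
			if p.Skipped {
				out = append(out, g.Name+"/"+p.Name)
			}
		}
	}
	return out
}

func main() {
	r, err := decode(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(r.Plane, r.FinishedAt.Sub(r.StartedAt), skippedProbes(r))
}
```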

func RunOrchestrationProbes

func RunOrchestrationProbes(impl core.OrchestrationPlaneAdaptor, fixture *OrchestrationFixture) *Report

RunOrchestrationProbes runs the OrchestrationPlane conformance probes against impl and returns a structured Report. Programmatic counterpart to RunOrchestrationConformance (gm-e3.5).

fixture may be nil.

func RunWorkPlaneProbes

func RunWorkPlaneProbes(impl core.WorkPlane, fixture *WorkPlaneFixture) *Report

RunWorkPlaneProbes runs the WorkPlane conformance probes against impl and returns a structured Report. It is the programmatic counterpart to RunWorkPlaneConformance: same probes, same group layout, but no *testing.T dependency so the `gemba adaptor test` CLI can drive it.

fixture may be nil. Probes that require fixture state record a ProbeResult{Skipped: true} rather than silently disappearing, so CI can tell a missing fixture from a passed probe.

func (*Report) Passed

func (r *Report) Passed() bool

Passed reports whether every non-skipped probe across every group passed. Skipped probes do not count as failure (they document a fixture gap, not a contract violation).

func (*Report) Totals

func (r *Report) Totals() (passed, failed, skipped int)

Totals returns aggregate pass/fail/skip counts across the whole report. Convenient for CLI summary lines and JUnit <testsuites> attributes.
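
The counting rules described for Passed and Totals can be sketched as follows. This is not the package's implementation: the types are trimmed local copies, and checking Skipped before Passed is an assumption about precedence when both flags could be set.

```go
package main

import "fmt"

// Trimmed local copies of the documented types.
type ProbeResult struct {
	Name    string
	Passed  bool
	Skipped bool
}

type GroupResult struct {
	Name   string
	Probes []ProbeResult
}

type Report struct {
	Groups []GroupResult
}

// totals counts probes across all groups. A skipped probe is counted as
// skipped only; it is neither a pass nor a failure.
func totals(r *Report) (passed, failed, skipped int) {
	for _, g := range r.Groups {
		for _, p := range g.Probes {
			switch {
			case p.Skipped:
				skipped++
			case p.Passed:
				passed++
			default:
				failed++
			}
		}
	}
	return passed, failed, skipped
}

// allPassed is true when no non-skipped probe failed.
func allPassed(r *Report) bool {
	_, failed, _ := totals(r)
	return failed == 0
}

func main() {
	r := &Report{Groups: []GroupResult{
		{Name: "A", Probes: []ProbeResult{{Name: "describe_is_idempotent", Passed: true}}},
		{Name: "F", Probes: []ProbeResult{{Name: "not_found_is_tagged_adaptor_error", Skipped: true}}},
	}}
	p, f, s := totals(r)
	fmt.Printf("passed=%d failed=%d skipped=%d ok=%v\n", p, f, s, allPassed(r))
}
```

Under these rules a report made up entirely of skipped probes still counts as passed, which is why the skip counts are worth surfacing in CI summaries.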

type WorkPlaneFixture

type WorkPlaneFixture struct {
	// KnownMissingID is a WorkItemID the adaptor is guaranteed NOT to
	// have. The Group F not-found probe calls GetWorkItem(id) and asserts
	// the error is a tagged *core.AdaptorError that also satisfies
	// errors.Is(err, core.ErrNotFound).
	KnownMissingID core.WorkItemID

	// SeedWorkItemWithEdges is an adaptor-supplied hook that stages one
	// WorkItem with a known-good Relationships slice inside the backend
	// and returns its id plus the expected relationships. The Group C
	// round-trip probe drives GetWorkItem(id) and checks the returned
	// relationships match the declaration. Leave nil when the adaptor
	// has no seed hook; the probe is skipped rather than failing.
	SeedWorkItemWithEdges func(impl core.WorkPlane) (core.WorkItemID, []core.Relationship, error)

	// ReadySetGraphEvolution — Group G (R3). Adaptor drives a
	// three-step scenario: stage two items A → (blocked by) B; observe
	// that B is in the ready-set and A is not; unblock A (close B or
	// remove the edge); observe A moves into the ready-set. Any
	// deviation returns a non-nil error. Gate:
	// DependencyGraphNative && ReadySetQuery.
	ReadySetGraphEvolution func(impl core.WorkPlane) error

	// DiscoveredFromMidExecution — Group G (R3, beads-specific). An
	// adaptor that declares an EdgeExtension named
	// "beads:discovered_from" stages a "discovered" item mid-execution
	// and asserts the new edge is visible in the parent's read view.
	// Gate: edge extension declared + hook non-nil.
	DiscoveredFromMidExecution func(impl core.WorkPlane) error

	// VersionedStateRoundTrip — Group H (R4). Adaptor exports state
	// via the declared transport, imports it into a fresh instance,
	// and asserts the round-tripped rows match. Runs once per
	// declared VersioningTransport entry that is not "none". Gate:
	// versioning_transport contains at least one real transport.
	VersionedStateRoundTrip func(impl core.WorkPlane, transport core.VersioningTransport) error

	// BranchMergeRoundTrip — Group H (R4). For transports that carry
	// branch semantics (git / dolt), the hook exercises a three-way
	// merge round-trip. Gate: versioning_transport contains git or
	// dolt.
	BranchMergeRoundTrip func(impl core.WorkPlane, transport core.VersioningTransport) error

	// ConcurrencyStressN — Group I (R5). Default N=16 concurrent
	// writers; the hook MAY use a smaller N and report it via the
	// returned int. Returns (N actually run, error). Gate:
	// concurrency_model is declared and not "optimistic".
	ConcurrencyStressN func(impl core.WorkPlane, n int) (int, error)

	// ReadAfterWriteCrossWriter — Group I (R5). Writes from goroutine
	// A, reads from goroutine B, asserts visibility within the
	// adaptor-declared event-latency budget.
	ReadAfterWriteCrossWriter func(impl core.WorkPlane) error

	// SessionDeathRecovery — Group J (R6). Simulates agent session
	// death mid-write; confirms state lands in the store. Gate:
	// agent_session_decoupling.
	SessionDeathRecovery func(impl core.WorkPlane) error

	// WorkPickupBySecondAgent — Group J (R6). First agent claims a
	// work item then disappears; second agent force-steals and
	// completes it. Gate: agent_session_decoupling.
	WorkPickupBySecondAgent func(impl core.WorkPlane) error

	// ReadySetSubscribeLatency — Group K (R8). Subscribe to ready-set
	// transitions; assert subscription receives the right events
	// within the adaptor-declared latency budget. Gate:
	// orchestrator_hooks ∋ ready-set-subscribe.
	ReadySetSubscribeLatency func(impl core.WorkPlane) error

	// ClaimAtomic — Group K (R8). Two concurrent claims on the same
	// item; exactly one succeeds, the other receives a typed conflict.
	// Gate: orchestrator_hooks ∋ claim-atomic.
	ClaimAtomic func(impl core.WorkPlane) error

	// EscalationIngestRoundTrip — Group K (R8). Adaptor accepts a
	// structured escalation and reads it back unchanged. Gate:
	// orchestrator_hooks ∋ escalation-ingest.
	EscalationIngestRoundTrip func(impl core.WorkPlane) error

	// WorkCompleteAck — Group K (R8). Adaptor round-trips a
	// work-complete write with an ack payload. Gate:
	// orchestrator_hooks ∋ work-complete-ack.
	WorkCompleteAck func(impl core.WorkPlane) error

	// PoolBulkDispatch — Group K (R8). N work items dispatched to a
	// pool of agents in one round-trip. Gate: orchestrator_hooks ∋
	// pool-bulk-dispatch.
	PoolBulkDispatch func(impl core.WorkPlane) error
}

WorkPlaneFixture carries the optional state some conformance probes need to run. Leave any field zero to skip the corresponding probe — fixture-independent probes (manifest validity, JSON round-trip, capability denial) still run regardless.
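
Each hook field has the shape "take the adaptor, return an error on any deviation". As an example, a ClaimAtomic-style hook could look roughly like the sketch below. The workPlane interface, its Claim method, errConflict, and both in-memory planes are hypothetical stand-ins; the real core.WorkPlane surface and its conflict error type are not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// workPlane is a local stand-in for core.WorkPlane; Claim is hypothetical.
type workPlane interface {
	Claim(itemID, agent string) error
}

// errConflict stands in for the typed conflict the losing claim receives.
var errConflict = errors.New("claim conflict")

// memPlane serializes claims with a mutex so exactly one caller wins.
type memPlane struct {
	mu    sync.Mutex
	owner map[string]string
}

func (p *memPlane) Claim(itemID, agent string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, taken := p.owner[itemID]; taken {
		return errConflict
	}
	p.owner[itemID] = agent
	return nil
}

// brokenPlane accepts every claim, which violates the contract.
type brokenPlane struct{}

func (brokenPlane) Claim(itemID, agent string) error { return nil }

// claimAtomic fires two concurrent claims on one item and requires exactly
// one success and one conflict; anything else is reported as an error.
func claimAtomic(impl workPlane) error {
	agents := []string{"agent-a", "agent-b"}
	errs := make([]error, len(agents))
	var wg sync.WaitGroup
	for i, agent := range agents {
		wg.Add(1)
		go func(i int, agent string) {
			defer wg.Done()
			errs[i] = impl.Claim("item-1", agent) // each goroutine writes its own slot
		}(i, agent)
	}
	wg.Wait()

	var won, lost int
	for _, err := range errs {
		switch {
		case err == nil:
			won++
		case errors.Is(err, errConflict):
			lost++
		default:
			return fmt.Errorf("unexpected claim error: %w", err)
		}
	}
	if won != 1 || lost != 1 {
		return fmt.Errorf("want 1 winner and 1 conflict, got %d and %d", won, lost)
	}
	return nil
}

func main() {
	fmt.Println(claimAtomic(&memPlane{owner: map[string]string{}}))
	fmt.Println(claimAtomic(brokenPlane{}))
}
```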
