campaign

package

Version: v0.3.4 (latest) | Published: Jan 2, 2026 | License: Apache-2.0 | Imports: 18 | Imported by: 0

Repository

github.com/aalhour/rockyardkv

Documentation ¶

Overview ¶

artifact.go defines run artifacts, campaign summaries, and persistence logic.

Key types:

RunResult: captures outcome of a single instance run
CampaignSummary: aggregates results across all runs in a campaign
RunArtifact: on-disk representation of a run for recheck/audit

Artifacts are written to stable paths under the run root directory.

composite.go defines multi-step composite instances.

Composite instances execute a sequence of steps (each a campaign instance) with configurable gating policies to determine overall pass/fail.

Use cases:

WAL+sync durability: write → crash → verify
Compaction stress: write → compact → verify

filter.go implements tag-based instance filtering.

Filters allow selecting a subset of instances based on their computed tags. Syntax: "key=value,key!=value,key=v1|v2" (comma-separated AND, pipe for OR).

Example: "tier=quick,tool=stresstest" selects quick-tier stress instances.
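The filter syntax above can be sketched as a small standalone parser and matcher. This is an illustration only, not the package's ParseFilter implementation: the clause type and the parseFilter and match helpers are hypothetical names, and tags are modeled as a plain map.

```go
package main

import (
	"fmt"
	"strings"
)

// clause is a hypothetical stand-in for one "key op values" term.
type clause struct {
	key    string
	negate bool     // true for "!=", false for "="
	values []string // pipe-separated alternatives (OR)
}

// parseFilter splits on commas, then on "!=" or "=", then on pipes.
func parseFilter(s string) ([]clause, error) {
	var out []clause
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		c := clause{}
		var rest string
		if k, v, ok := strings.Cut(part, "!="); ok {
			c.key, rest, c.negate = k, v, true
		} else if k, v, ok := strings.Cut(part, "="); ok {
			c.key, rest = k, v
		} else {
			return nil, fmt.Errorf("invalid clause %q", part)
		}
		c.values = strings.Split(rest, "|")
		out = append(out, c)
	}
	return out, nil
}

// match reports whether tags satisfy every clause (AND across clauses).
func match(clauses []clause, tags map[string]string) bool {
	for _, c := range clauses {
		hit := false
		for _, v := range c.values {
			if tags[c.key] == v {
				hit = true
				break
			}
		}
		// "=" needs a hit; "!=" needs no hit.
		if hit == c.negate {
			return false
		}
	}
	return true
}

func main() {
	clauses, err := parseFilter("tier=quick,tool=stresstest|crashtest")
	if err != nil {
		panic(err)
	}
	tags := map[string]string{"tier": "quick", "tool": "crashtest"}
	fmt.Println(match(clauses, tags)) // true
}
```
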

instance.go defines the Instance type and instance-level operations.

An Instance is a named, reproducible test configuration specifying:

Which tool to run (stresstest, crashtest, goldentest, adversarialtest)
Command-line arguments with placeholders (<SEED>, <RUN_DIR>, <DB_PATH>)
Seeds for deterministic reproduction
Stop conditions defining pass/fail criteria
Oracle requirements for C++ verification

known_failures.go implements failure fingerprinting and quarantine tracking.

Known failures are identified by stable fingerprints (hash of instance name, seed, failure kind, and key error details). When a failure recurs, it can be classified as a duplicate and optionally quarantined to prevent blocking CI.

Quarantine policies:

QuarantineNone: failure is not quarantined, fails the campaign
QuarantineAllowed: failure is known and allowed, does not fail the campaign
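A minimal sketch of how deduplication and the two quarantine policies might interact, assuming an in-memory map keyed by fingerprint. The entry and tracker types and their methods are hypothetical; the boolean allowed field merely models QuarantineAllowed versus QuarantineNone.

```go
package main

import "fmt"

// entry is a hypothetical record for one fingerprint.
type entry struct {
	count   int
	allowed bool // true models QuarantineAllowed, false models QuarantineNone
}

// tracker maps fingerprints to their records.
type tracker map[string]*entry

// record notes a fingerprint and reports whether it was new.
func (t tracker) record(fp string) bool {
	if e, ok := t[fp]; ok {
		e.count++
		return false
	}
	t[fp] = &entry{count: 1}
	return true
}

// failsCampaign reports whether a failure with this fingerprint blocks the
// campaign: unknown or non-quarantined fingerprints do.
func (t tracker) failsCampaign(fp string) bool {
	e, ok := t[fp]
	return !ok || !e.allowed
}

func main() {
	kf := tracker{}
	fmt.Println(kf.record("ab12"), kf.record("ab12")) // true false
	kf["ab12"].allowed = true
	fmt.Println(kf.failsCampaign("ab12")) // false
}
```
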

matrix.go defines the campaign instance matrix.

The matrix contains all predefined instances organized by tier (quick/nightly). Each instance is a specific test configuration with reproducible seeds.

Quick tier: fast feedback for local dev and CI pull requests.
Nightly tier: comprehensive coverage for scheduled runs.

minimize.go implements failure minimization for stresstest runs.

When a stresstest fails, minimization attempts to reduce the reproduction parameters (duration, threads, keys) to find the smallest configuration that still reproduces the failure. This makes debugging faster.

Reduction strategy: binary search on each dimension independently.
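The per-dimension binary search can be illustrated with a small sketch. It assumes failure is monotonic in each parameter (if a value reproduces the failure, every larger value does too), which real failures may not satisfy. The smallestFailing helper and the predicate in main are hypothetical; the lower bounds 4 and 500 follow the MinBounds defaults documented below.

```go
package main

import "fmt"

// smallestFailing binary-searches [lo, hi] for the smallest value at which
// stillFails returns true. It assumes hi fails and failure is monotonic.
func smallestFailing(lo, hi int, stillFails func(int) bool) int {
	for lo < hi {
		mid := lo + (hi-lo)/2
		if stillFails(mid) {
			hi = mid // mid reproduces: try smaller
		} else {
			lo = mid + 1 // mid passes: need larger
		}
	}
	return hi
}

func main() {
	// Hypothetical reproducer: fails when threads >= 6 and keys >= 2000.
	threads, keys := 32, 100000
	// Reduce each dimension independently, holding the other fixed.
	threads = smallestFailing(4, threads, func(n int) bool { return n >= 6 && keys >= 2000 })
	keys = smallestFailing(500, keys, func(n int) bool { return threads >= 6 && n >= 2000 })
	fmt.Println(threads, keys) // 6 2000
}
```

Each probe in a real minimizer would be a full tool run, so the logarithmic number of probes per dimension is what keeps minimization affordable.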

oracle.go provides access to C++ RocksDB tools for consistency verification.

The oracle uses ldb and sst_dump from a RocksDB build to verify that Go-produced databases are bit-compatible with the C++ implementation.

Environment variables:

ROCKSDB_PATH: path to RocksDB build directory (derives ldb and sst_dump)
LDB_PATH: explicit path to ldb binary (overrides ROCKSDB_PATH)
SST_DUMP_PATH: explicit path to sst_dump binary (overrides ROCKSDB_PATH)
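The override order can be sketched as follows. This is not the package's NewOracleFromEnv; resolveTool is a hypothetical helper, and locating the binaries directly under ROCKSDB_PATH is an assumption about the build layout.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveTool returns the explicit path if set; otherwise it joins the build
// directory with the tool name. An empty result means "not configured".
func resolveTool(explicit, rocksdbPath, tool string) string {
	if explicit != "" {
		return explicit
	}
	if rocksdbPath == "" {
		return ""
	}
	// Assumption: the tool binary sits at the top of the build directory.
	return filepath.Join(rocksdbPath, tool)
}

func main() {
	root := os.Getenv("ROCKSDB_PATH")
	fmt.Println("ldb:", resolveTool(os.Getenv("LDB_PATH"), root, "ldb"))
	fmt.Println("sst_dump:", resolveTool(os.Getenv("SST_DUMP_PATH"), root, "sst_dump"))
}
```
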

recheck.go implements artifact rechecking for policy re-evaluation.

Recheck mode allows re-evaluating existing campaign artifacts against current policies and oracle tools. This is useful when:

Oracle tools become available after initial run
Quarantine policies change
Stop conditions are updated

Recheck does not re-run the tool; it only re-evaluates persisted artifacts.

runner.go implements the core campaign execution engine.

The Runner orchestrates instance execution with:

Tool invocation with timeout and cancellation
Artifact persistence (run.json, logs, DB snapshots)
Oracle gating (require ldb checkconsistency before pass)
Failure fingerprinting and deduplication
Skip policy enforcement
Summary generation (summary.json, governance.json)

skip.go implements instance-level skip policies.

Skip policies allow excluding instances BEFORE they run, based on:

Exact instance name
Group prefix (e.g., "status.durability")
Tag matching (e.g., tier=quick, tool=crashtest)

Unlike quarantine (fingerprint-based, post-failure), skip is instance-level and prevents the run from starting. Skipped instances are recorded in summary.json.
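A rough sketch of the three matching criteria, assuming they act as alternatives (any one is enough to skip). The skipPolicy type, its fields, and the instance name used in main are hypothetical and do not reflect the package's SkipPolicy definition.

```go
package main

import (
	"fmt"
	"strings"
)

// skipPolicy is a hypothetical rule; empty fields are ignored.
type skipPolicy struct {
	name        string            // exact instance name
	groupPrefix string            // e.g. "status.durability"
	tags        map[string]string // every listed tag must match
	reason      string
}

// matches reports whether any configured criterion selects the instance.
// Treating the criteria as alternatives (OR) is an assumption of this sketch.
func (p skipPolicy) matches(name string, tags map[string]string) bool {
	if p.name != "" && p.name == name {
		return true
	}
	if p.groupPrefix != "" && strings.HasPrefix(name, p.groupPrefix) {
		return true
	}
	if len(p.tags) == 0 {
		return false
	}
	for k, v := range p.tags {
		if tags[k] != v {
			return false
		}
	}
	return true
}

func main() {
	p := skipPolicy{groupPrefix: "status.durability", reason: "example reason"}
	fmt.Println(p.matches("status.durability.wal_sync", nil), p.reason)
}
```
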

status.go defines status/durability repro instances.

These instances reproduce specific durability and consistency scenarios documented in docs/status/durability_report.md. They serve as regression tests for known failure modes and recovery behaviors.

stop.go defines stop conditions for instance runs.

Stop conditions specify what constitutes success vs failure for a run. They control termination requirements, verification passes, and oracle checks.

sweep.go implements sweep instances for parameter matrix expansion.

A sweep instance defines a base configuration with varying parameters. At run time, it expands into multiple concrete instances by taking the Cartesian product of all parameter values.

Example: cycles=[1,2,3] × mode=[sync,async] → 6 concrete runs.
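The Cartesian-product expansion in the example above can be sketched like this. The param type and expand function are hypothetical; the package's SweepInstance.Expand returns Instance values rather than maps.

```go
package main

import "fmt"

// param is a hypothetical named axis of a sweep.
type param struct {
	name   string
	values []string
}

// expand returns every combination of parameter values, one map per run.
func expand(params []param) []map[string]string {
	combos := []map[string]string{{}}
	for _, p := range params {
		var next []map[string]string
		for _, base := range combos {
			for _, v := range p.values {
				// Copy the partial combination before extending it.
				c := make(map[string]string, len(base)+1)
				for k, val := range base {
					c[k] = val
				}
				c[p.name] = v
				next = append(next, c)
			}
		}
		combos = next
	}
	return combos
}

func main() {
	runs := expand([]param{
		{"cycles", []string{"1", "2", "3"}},
		{"mode", []string{"sync", "async"}},
	})
	fmt.Println(len(runs)) // 6
}
```
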

synthetic.go provides synthetic failure injection for CI testing.

Synthetic failures allow deterministic testing of the minimization and failure classification pipelines without relying on actual test failures. They produce stable fingerprints for verifying deduplication and quarantine.

tags.go defines the structured tag set for instances.

Tags provide metadata for filtering, grouping, and classification. Each instance computes its tags from its configuration (tier, tool, etc.).

Package campaign implements the Jepsen-style campaign runner for RockyardKV.

This package provides:

Taxonomy types for campaign configuration (tiers, tools, fault models)
Instance definitions for the campaign matrix
Oracle gating and tool execution
Artifact bundle writing and failure fingerprinting
Campaign execution and reporting

trace.go implements trace capture and argument injection for stresstest.

When trace capture is enabled, the runner injects -trace-out and -trace-max-size arguments into stresstest invocations to capture operation traces for debugging.

Index ¶

Constants
Variables
func AllGroups() []string
func AllTagKeys() []string
func BuildReplayCommand(tracePath, dbPath, binDir string) string
func CheckTraceSize(tracePath string, config TraceConfig) (size int64, exceeded bool, err error)
func ComputeFingerprint(instanceName string, seed int64, failureKind, failureReason, logPath string) string
func CopyFile(src, dst string) error
func EnsureDir(path string) error
func EnsureTraceDir(runDir string, config TraceConfig) error
func GateCheck(instance *Instance, oracle *Oracle) error
func GlobalTimeout(tier Tier) int
func InjectTraceArgs(args []string, runDir string, config TraceConfig) ([]string, string)
func InstanceTimeout(tier Tier) int
func ResolveStepArgs(args []string, runDir string, seed int64, dbPath string) []string
func StepRunDir(instanceRunDir string, stepName string) string
func TracePaths(runDir string, config TraceConfig) (traceFile, truncatedMarker string)
func ValidateSkipPolicyTags(p *SkipPolicy) error
func WriteCampaignSummary(runRoot string, tier Tier, startTime, endTime time.Time, results []*RunResult, ...) error
func WriteGovernanceReport(runRoot string, results []*RunResult, skipped []SkipSummary, ...) error
func WriteReplayScript(runDir, tracePath, dbPath, binDir string) error
func WriteRunArtifact(result *RunResult) error
func WriteTruncatedMarker(runDir string, config TraceConfig, bytesWritten int64) error
type CampaignSummary
- func ReadCampaignSummary(runRoot string) (*CampaignSummary, error)
type CompositeInstance
- func StatusCompositeInstances() []CompositeInstance
- func (c *CompositeInstance) IsComposite() bool
- func (c *CompositeInstance) ToSteps() []Step
type CompositeResult
- func (c *CompositeResult) ComputePassed()
type FailureClass
type FaultErrorType
type FaultKind
type FaultModel
- func (f FaultModel) String() string
type FaultScope
type Filter
- func ParseFilter(filterStr string) (*Filter, error)
- func (f *Filter) Match(tags Tags) bool
- func (f *Filter) String() string
type FilterClause
- func (c FilterClause) Match(tags Tags) bool
- func (c FilterClause) String() string
type FilterOp
type GatingPolicy
type GovernanceFailure
type GovernanceReport
type Instance
- func FilterInstances(instances []Instance, filter *Filter) []Instance
- func GetInstances(tier Tier) []Instance
- func GetStatusInstances(group string) []Instance
- func NightlyInstances() []Instance
- func QuickInstances() []Instance
- func StatusInstances() []Instance
- func SyntheticInstance() *Instance
- func (i *Instance) BinaryName() string
- func (i *Instance) BinaryPath(binDir string) string
- func (i *Instance) ComputeTags() Tags
- func (i *Instance) IsGoTest() bool
- func (i *Instance) ResolveArgs(runDir string, seed int64) []string
- func (i *Instance) RunDir(runRoot string, seed int64) string
type InstanceSkipPolicies
- func NewInstanceSkipPolicies(path string) *InstanceSkipPolicies
- func (sp *InstanceSkipPolicies) Add(policy *SkipPolicy)
- func (sp *InstanceSkipPolicies) Count() int
- func (sp *InstanceSkipPolicies) LoadWithValidation() error
- func (sp *InstanceSkipPolicies) Save() error
- func (sp *InstanceSkipPolicies) ShouldSkip(inst *Instance) *SkipResult
type KnownFailure
type KnownFailures
- func NewKnownFailures(path string) *KnownFailures
- func (kf *KnownFailures) All() []*KnownFailure
- func (kf *KnownFailures) Count() int
- func (kf *KnownFailures) Get(fingerprint string) *KnownFailure
- func (kf *KnownFailures) GetQuarantinePolicy(fingerprint string) QuarantinePolicy
- func (kf *KnownFailures) IsDuplicate(fingerprint string) bool
- func (kf *KnownFailures) IsQuarantined(fingerprint string) bool
- func (kf *KnownFailures) Record(fingerprint, instance, timestamp string) bool
type MarkerRecheckResult
type MinBounds
- func DefaultMinBounds() MinBounds
type MinimizeConfig
- func DefaultMinimizeConfig() MinimizeConfig
type MinimizeResult
type Minimizer
- func NewMinimizer(runner *Runner, config MinimizeConfig) *Minimizer
- func (m *Minimizer) Minimize(ctx context.Context, result *RunResult) (*MinimizeResult, error)
- func (m *Minimizer) ShouldMinimize(result *RunResult) bool
type Oracle
- func NewOracleFromEnv() *Oracle
- func (o *Oracle) Available() bool
- func (o *Oracle) CheckConsistency(dbPath string) *ToolResult
- func (o *Oracle) DumpManifest(dbPath string) *ToolResult
- func (o *Oracle) DumpSST(sstPath string, args ...string) *ToolResult
type OracleRecheckResult
type PolicyRecheckResult
type QuarantinePolicy
type RecheckResult
type Rechecker
- func NewRechecker(oracle *Oracle) *Rechecker
- func (r *Rechecker) RecheckCampaign(runRoot string) ([]RecheckResult, error)
- func (r *Rechecker) RecheckRun(runDir string) (*RecheckResult, error)
type ReductionStep
type RunArtifact
type RunResult
- func RunSyntheticFailure(ctx context.Context, config SyntheticFailConfig, runDir string) *RunResult
- func (r *RunResult) Duration() time.Duration
type RunSummary
type Runner
- func NewRunner(config RunnerConfig) *Runner
- func (r *Runner) Run(ctx context.Context) (*CampaignSummary, error)
- func (r *Runner) RunCompositeInstances(ctx context.Context, composites []CompositeInstance) (*CampaignSummary, error)
- func (r *Runner) RunGroup(ctx context.Context, group string) (*CampaignSummary, error)
- func (r *Runner) RunInstances(ctx context.Context, instances []Instance) (*CampaignSummary, error)
- func (r *Runner) RunSweepInstances(ctx context.Context, sweeps []SweepInstance) (*CampaignSummary, error)
type RunnerConfig
type SkipPolicy
- func (p *SkipPolicy) Matches(inst *Instance) bool
type SkipResult
type SkipSummary
type Step
type StepResult
type StopCondition
- func DefaultStopCondition() StopCondition
type SweepCase
- func DisableWALFaultFSMinimizeCases() []SweepCase
type SweepInstance
- func StatusSweepInstances() []SweepInstance
- func (s *SweepInstance) Expand() []Instance
type SweepParam
type SyntheticFailConfig
type Tags
- func (t Tags) Get(key string) string
type Tier
type Tool
type ToolResult
- func (r *ToolResult) OK() bool
type TraceConfig
- func DefaultTraceConfig() TraceConfig
type TraceResult
- func CollectTraceResult(runDir, dbPath, binDir string, config TraceConfig) *TraceResult

Constants ¶

const SchemaVersion = "1.1.0"

SchemaVersion is the current version of the artifact schema. Bump rules:

Major: interpretation changes (field meaning, fingerprint algorithm, pass/fail logic)
Minor: additive fields that don't change meaning or pass/fail
Patch: tooling bugfixes that don't change schema
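Under these bump rules, a reader of persisted artifacts might treat only a major-version difference as incompatible. Whether the package performs such a check is not stated here; sameMajor below is a hypothetical helper shown purely to illustrate the rule.

```go
package main

import (
	"fmt"
	"strings"
)

// sameMajor reports whether two "major.minor.patch" strings share a major
// component. Minor and patch differences are treated as compatible.
func sameMajor(a, b string) bool {
	ma, _, _ := strings.Cut(a, ".")
	mb, _, _ := strings.Cut(b, ".")
	return ma != "" && ma == mb
}

func main() {
	fmt.Println(sameMajor("1.1.0", "1.0.3"), sameMajor("1.1.0", "2.0.0")) // true false
}
```
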

Variables ¶

var ErrOracleNotConfigured = errors.New("oracle not configured: set ROCKSDB_PATH or provide explicit tool paths")

ErrOracleNotConfigured indicates the oracle tools are not available.

var ErrOracleToolNotFound = errors.New("oracle tool not found")

ErrOracleToolNotFound indicates a specific oracle tool was not found.

Functions ¶

func AllGroups ¶

func AllGroups() []string

AllGroups returns all available instance groups.

func AllTagKeys ¶

func AllTagKeys() []string

AllTagKeys returns all valid tag keys for filter validation.

func BuildReplayCommand ¶

func BuildReplayCommand(tracePath, dbPath, binDir string) string

BuildReplayCommand generates the traceanalyzer replay command for a trace file.

func CheckTraceSize ¶

func CheckTraceSize(tracePath string, config TraceConfig) (size int64, exceeded bool, err error)

CheckTraceSize reports whether a trace file exists and, if so, its size. It returns the size and whether that size exceeds the configured limit.

func ComputeFingerprint ¶

func ComputeFingerprint(instanceName string, seed int64, failureKind, failureReason, logPath string) string

ComputeFingerprint computes a failure fingerprint that includes:

Instance name (to avoid collisions across instances)
Seed (to identify specific run)
Failure kind (enum-like category)
Failure reason (specific message)
Log tail (for extra signal)

Uses SHA-256 truncated to 16 hex characters (64 bits), which keeps fingerprints short while leaving collisions unlikely.
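A minimal sketch of a truncated SHA-256 fingerprint over the listed fields. The field separator, the field order, and passing the log tail as a string are assumptions of this sketch, so its output will not match fingerprints produced by ComputeFingerprint.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes the identifying fields and keeps the first 16 hex chars.
// The "\x00" separator and the logTail parameter are assumptions.
func fingerprint(instance string, seed int64, kind, reason, logTail string) string {
	joined := strings.Join([]string{instance, fmt.Sprint(seed), kind, reason, logTail}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	// The failure kind and reason strings here are made up for illustration.
	fp := fingerprint("stress.read.corruption.1in7", 42, "verify_failed", "checksum mismatch", "")
	fmt.Println(len(fp), fp)
}
```

Because the seed is part of the input, the same failure under a different seed yields a different fingerprint.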

func CopyFile ¶

func CopyFile(src, dst string) error

CopyFile copies a file from src to dst.

func EnsureDir ¶

func EnsureDir(path string) error

EnsureDir creates a directory if it does not exist.

func EnsureTraceDir ¶

func EnsureTraceDir(runDir string, config TraceConfig) error

EnsureTraceDir creates the trace directory if trace capture is enabled.

func GateCheck ¶

func GateCheck(instance *Instance, oracle *Oracle) error

GateCheck verifies that the oracle is available if required by the instance. Returns an error if oracle is required but not available.

func GlobalTimeout ¶

func GlobalTimeout(tier Tier) int

GlobalTimeout returns the global timeout for a campaign run based on tier.

func InjectTraceArgs ¶

func InjectTraceArgs(args []string, runDir string, config TraceConfig) ([]string, string)

InjectTraceArgs adds -trace-out and -trace-max-size to the argument list if not already present. Returns the modified args and the trace file path. If -trace-out is already specified (either as "-trace-out <path>" or "-trace-out=<path>"), returns the existing path without modification (but still injects -trace-max-size if missing).
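The detection of both flag spellings can be sketched as below. This is an illustration rather than the package's code: findFlag and injectTrace are hypothetical helpers, the "trace.bin" file name is invented, and the size limit is passed as a plain integer instead of a TraceConfig.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// findFlag returns the value of a flag given as "-name value" or "-name=value".
func findFlag(args []string, name string) (string, bool) {
	for i, a := range args {
		if a == name && i+1 < len(args) {
			return args[i+1], true
		}
		if strings.HasPrefix(a, name+"=") {
			return a[len(name)+1:], true
		}
	}
	return "", false
}

// injectTrace appends the trace flags only when they are missing and returns
// the resulting args plus the effective trace path.
func injectTrace(args []string, runDir string, maxSize int64) ([]string, string) {
	out := append([]string(nil), args...) // copy; do not mutate the caller's slice
	path, ok := findFlag(out, "-trace-out")
	if !ok {
		path = filepath.Join(runDir, "trace.bin") // hypothetical file name
		out = append(out, "-trace-out", path)
	}
	if _, ok := findFlag(out, "-trace-max-size"); !ok {
		out = append(out, "-trace-max-size", strconv.FormatInt(maxSize, 10))
	}
	return out, path
}

func main() {
	args, path := injectTrace([]string{"-trace-out=/tmp/t.bin"}, "/tmp/run", 1<<20)
	fmt.Println(args, path)
}
```
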

func InstanceTimeout ¶

func InstanceTimeout(tier Tier) int

InstanceTimeout returns the default timeout for an instance based on tier.

func ResolveStepArgs ¶

func ResolveStepArgs(args []string, runDir string, seed int64, dbPath string) []string

ResolveStepArgs returns args with placeholders replaced. dbPath is the DB path discovered from a previous step (or empty).
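Placeholder substitution of this kind can be sketched with strings.NewReplacer. The resolveArgs function is a hypothetical stand-in, and the flag names in main are invented for illustration; only the placeholder tokens <RUN_DIR>, <SEED>, and <DB_PATH> come from the documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveArgs returns a copy of args with the three placeholders replaced.
func resolveArgs(args []string, runDir string, seed int64, dbPath string) []string {
	r := strings.NewReplacer(
		"<RUN_DIR>", runDir,
		"<SEED>", strconv.FormatInt(seed, 10),
		"<DB_PATH>", dbPath,
	)
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = r.Replace(a)
	}
	return out
}

func main() {
	// Hypothetical flags; real tool arguments may differ.
	args := []string{"-db=<DB_PATH>", "-seed=<SEED>", "-out=<RUN_DIR>/report"}
	fmt.Println(resolveArgs(args, "/tmp/run", 7, "/tmp/run/db"))
}
```
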

func StepRunDir ¶

func StepRunDir(instanceRunDir string, stepName string) string

StepRunDir returns the run directory for a specific step.

func TracePaths ¶

func TracePaths(runDir string, config TraceConfig) (traceFile, truncatedMarker string)

TracePaths returns the paths for trace artifacts in a run directory.

func ValidateSkipPolicyTags ¶

func ValidateSkipPolicyTags(p *SkipPolicy) error

ValidateSkipPolicyTags returns an error if any tag key in the policy is unknown.

func WriteCampaignSummary ¶

func WriteCampaignSummary(runRoot string, tier Tier, startTime, endTime time.Time, results []*RunResult, skipped []SkipSummary) error

WriteCampaignSummary writes the summary.json file to the run root.

func WriteGovernanceReport ¶

func WriteGovernanceReport(runRoot string, results []*RunResult, skipped []SkipSummary, knownFailures *KnownFailures) error

WriteGovernanceReport writes the governance.json file to the run root. This artifact provides an at-a-glance triage view for operators.

func WriteReplayScript ¶

func WriteReplayScript(runDir, tracePath, dbPath, binDir string) error

WriteReplayScript writes a replay.sh script to the run directory.

func WriteRunArtifact ¶

func WriteRunArtifact(result *RunResult) error

WriteRunArtifact writes the run.json file to the run directory. Also writes duplicate_of.txt if the failure is a duplicate.

func WriteTruncatedMarker ¶

func WriteTruncatedMarker(runDir string, config TraceConfig, bytesWritten int64) error

WriteTruncatedMarker writes a marker file indicating trace truncation.

Types ¶

type CampaignSummary ¶

type CampaignSummary struct {
	SchemaVersion string       `json:"schema_version"`
	Tier          string       `json:"tier"`
	StartTime     time.Time    `json:"start_time"`
	EndTime       time.Time    `json:"end_time"`
	DurationMs    int64        `json:"duration_ms"`
	TotalRuns     int          `json:"total_runs"`
	PassedRuns    int          `json:"passed_runs"`
	FailedRuns    int          `json:"failed_runs"`
	SkippedRuns   int          `json:"skipped_runs"`
	UniqueErrors  int          `json:"unique_errors"`
	AllPassed     bool         `json:"all_passed"`
	Runs          []RunSummary `json:"runs"`

	// Skipped instances and their reasons
	Skipped []SkipSummary `json:"skipped,omitempty"`

	// Governance fields for failure classification and deduplication
	NewFailures    int `json:"new_failures"`
	KnownFailures  int `json:"known_failures"`
	Duplicates     int `json:"duplicates"`
	Unquarantined  int `json:"unquarantined"`
	OracleRequired int `json:"oracle_required"`
	OracleGated    int `json:"oracle_gated"`
}

CampaignSummary is the JSON structure written to summary.json after a campaign.

func ReadCampaignSummary ¶

func ReadCampaignSummary(runRoot string) (*CampaignSummary, error)

ReadCampaignSummary reads summary.json from a run root.
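For consumers that only need a few fields, summary.json can also be decoded directly with encoding/json. The sketch below mirrors a subset of the CampaignSummary JSON tags shown above; the summary type and the decodeSummary and readSummary helpers are hypothetical and independent of ReadCampaignSummary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// summary mirrors a few CampaignSummary fields via their JSON tags.
type summary struct {
	SchemaVersion string `json:"schema_version"`
	Tier          string `json:"tier"`
	TotalRuns     int    `json:"total_runs"`
	FailedRuns    int    `json:"failed_runs"`
	AllPassed     bool   `json:"all_passed"`
}

// decodeSummary parses the raw JSON bytes of a summary file.
func decodeSummary(data []byte) (*summary, error) {
	var s summary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decode summary: %w", err)
	}
	return &s, nil
}

// readSummary loads summary.json from the given run root.
func readSummary(runRoot string) (*summary, error) {
	data, err := os.ReadFile(filepath.Join(runRoot, "summary.json"))
	if err != nil {
		return nil, err
	}
	return decodeSummary(data)
}

func main() {
	s, err := readSummary(".")
	if err != nil {
		fmt.Println("no summary:", err)
		return
	}
	fmt.Printf("%s tier: %d runs, %d failed, all passed=%v\n", s.Tier, s.TotalRuns, s.FailedRuns, s.AllPassed)
}
```
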

type CompositeInstance ¶

type CompositeInstance struct {
	Instance

	// Steps are the execution steps (in order).
	// If nil or empty, the Instance.Tool/Args are used as a single step.
	Steps []Step

	// GatingPolicy determines how step results combine.
	// Default: GateAllSteps (fail if ANY step fails).
	GatingPolicy GatingPolicy
}

CompositeInstance extends Instance with multi-step execution support.

func StatusCompositeInstances ¶

func StatusCompositeInstances() []CompositeInstance

StatusCompositeInstances returns composite (multi-step) instances. These instances execute multiple steps with a gating policy.

func (*CompositeInstance) IsComposite ¶

func (c *CompositeInstance) IsComposite() bool

IsComposite returns true if this instance has multiple steps.

func (*CompositeInstance) ToSteps ¶

func (c *CompositeInstance) ToSteps() []Step

ToSteps converts the instance to a list of steps. If no explicit steps, creates a single step from Instance fields.

type CompositeResult ¶

type CompositeResult struct {
	// Steps contains results for each step.
	Steps []StepResult

	// Passed indicates if the composite instance passed per its gating policy.
	Passed bool

	// FailureReason summarizes why the instance failed.
	FailureReason string

	// GatingPolicy that was applied.
	GatingPolicy GatingPolicy
}

CompositeResult captures the outcome of a composite instance.

func (*CompositeResult) ComputePassed ¶

func (c *CompositeResult) ComputePassed()

ComputePassed evaluates the gating policy against step results.

type FailureClass ¶

type FailureClass string

FailureClass categorizes failures for governance reporting.

const (
	// FailureClassNone means the run passed.
	FailureClassNone FailureClass = ""
	// FailureClassNew is a new failure not previously seen.
	FailureClassNew FailureClass = "new_failure"
	// FailureClassKnown is a failure that matches a quarantined known failure.
	FailureClassKnown FailureClass = "known_failure"
	// FailureClassDuplicate is a repeat of a failure already seen in this campaign run.
	FailureClassDuplicate FailureClass = "duplicate"
)

type FaultErrorType ¶

type FaultErrorType string

FaultErrorType represents the error type for fault injection.

const (
	// ErrorTypeStatus returns a status error (retryable).
	ErrorTypeStatus FaultErrorType = "status"

	// ErrorTypeCorruption returns a corruption error (fatal).
	ErrorTypeCorruption FaultErrorType = "corruption"

	// ErrorTypeTruncated returns a truncated error.
	ErrorTypeTruncated FaultErrorType = "truncated"
)

type FaultKind ¶

type FaultKind string

FaultKind represents the type of fault to inject.

const (
	// FaultNone means no fault injection.
	FaultNone FaultKind = "none"

	// FaultRead injects read errors.
	FaultRead FaultKind = "read"

	// FaultWrite injects write errors.
	FaultWrite FaultKind = "write"

	// FaultSync injects sync/fsync errors.
	FaultSync FaultKind = "sync"

	// FaultCrash injects process crashes.
	FaultCrash FaultKind = "crash"

	// FaultCorrupt injects data corruption.
	FaultCorrupt FaultKind = "corrupt"
)

type FaultModel ¶

type FaultModel struct {
	// Kind is the type of fault to inject.
	Kind FaultKind

	// ErrorType is the error type for the fault (status, corruption, truncated).
	ErrorType FaultErrorType

	// OneIn is the probability denominator (e.g., 7 means 1/7 chance).
	OneIn int

	// Scope is where faults are injected.
	Scope FaultScope
}

FaultModel describes the fault injection configuration.

func (FaultModel) String ¶

func (f FaultModel) String() string

String returns a human-readable description of the fault model.

type FaultScope ¶

type FaultScope string

FaultScope represents where faults are injected.

const (
	// ScopeWorker injects faults in worker goroutines.
	ScopeWorker FaultScope = "worker"

	// ScopeFlusher injects faults in the flusher goroutine.
	ScopeFlusher FaultScope = "flusher"

	// ScopeReopener injects faults during DB reopen.
	ScopeReopener FaultScope = "reopener"

	// ScopeGlobal injects faults globally (all goroutines).
	ScopeGlobal FaultScope = "global"
)

type Filter ¶

type Filter struct {
	Clauses []FilterClause
}

Filter represents a parsed filter expression.

func ParseFilter ¶

func ParseFilter(filterStr string) (*Filter, error)

ParseFilter parses a filter string into a Filter. Format: "key=value,key!=value,key=val1|val2"

Comma separates clauses (AND semantics)
Pipe separates values within a clause (OR semantics)
"=" for equality, "!=" for inequality

func (*Filter) Match ¶

func (f *Filter) Match(tags Tags) bool

Match returns true if the tags match all filter clauses (AND semantics).

func (*Filter) String ¶

func (f *Filter) String() string

String returns a string representation of the filter.

type FilterClause ¶

type FilterClause struct {
	Key    string
	Op     FilterOp
	Values []string // Multiple values for OR (pipe-separated)
}

FilterClause represents a single filter clause (key op values).

func (FilterClause) Match ¶

func (c FilterClause) Match(tags Tags) bool

Match returns true if the tags match this clause.

func (FilterClause) String ¶

func (c FilterClause) String() string

String returns a string representation of the clause.

type FilterOp ¶

type FilterOp int

FilterOp is the filter operation type.

const (
	// OpEqual matches if tag value equals any of the values.
	OpEqual FilterOp = iota
	// OpNotEqual matches if tag value equals none of the values.
	OpNotEqual
)

type GatingPolicy ¶

type GatingPolicy string

GatingPolicy defines how multi-step instance results are combined.

const (
	// GateAllSteps fails if ANY step fails.
	GateAllSteps GatingPolicy = "all_steps"

	// GateLastStep fails ONLY if the last step fails.
	// Earlier step failures are recorded but don't fail the instance.
	GateLastStep GatingPolicy = "last_step"
)
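The two gating policies can be sketched as a function over per-step outcomes. This is an illustration, not CompositeResult.ComputePassed: the passed helper is hypothetical, steps are reduced to booleans, and treating an empty step list as passing is an assumption.

```go
package main

import "fmt"

// passed combines per-step outcomes (true = step passed) under a policy.
// "last_step" looks only at the final step; anything else behaves like
// "all_steps" and requires every step to pass.
func passed(policy string, steps []bool) bool {
	if len(steps) == 0 {
		return true // assumption: nothing ran, nothing failed
	}
	if policy == "last_step" {
		return steps[len(steps)-1]
	}
	for _, ok := range steps {
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	// write -> crash -> verify, where the crash step "fails" by design.
	steps := []bool{true, false, true}
	fmt.Println(passed("all_steps", steps), passed("last_step", steps)) // false true
}
```

GateLastStep suits crash scenarios like the one in main, where an intermediate step is expected to exit abnormally and only the final verification decides the outcome.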

type GovernanceFailure ¶

type GovernanceFailure struct {
	Instance    string `json:"instance"`
	Seed        int64  `json:"seed"`
	Fingerprint string `json:"fingerprint"`
	IssueID     string `json:"issue_id,omitempty"`
	FailureKind string `json:"failure_kind,omitempty"`
}

GovernanceFailure contains details about a failure for triage.

type GovernanceReport ¶

type GovernanceReport struct {
	SchemaVersion string `json:"schema_version"`

	// Summary counts
	TotalFailures    int `json:"total_failures"`
	NewFailures      int `json:"new_failures"`
	KnownFailures    int `json:"known_failures"`
	Duplicates       int `json:"duplicates"`
	Unquarantined    int `json:"unquarantined"`
	SkippedInstances int `json:"skipped_instances"`

	// Actionable items
	UnquarantinedDuplicates []GovernanceFailure `json:"unquarantined_duplicates,omitempty"`
	QuarantinedHits         []GovernanceFailure `json:"quarantined_hits,omitempty"`
	SkippedList             []SkipSummary       `json:"skipped,omitempty"`

	// Next steps for operators
	NextSteps string `json:"next_steps"`
}

GovernanceReport is the machine-readable triage report for operators. Written to governance.json in the run root.

type Instance ¶

type Instance struct {
	// Name is the unique instance identifier.
	// Should be descriptive: "stress.read.corruption.1in7"
	Name string

	// Tier is the intensity level (quick or nightly).
	Tier Tier

	// RequiresOracle indicates if C++ oracle tools are required.
	// If true, the runner will fail fast if oracle is not configured.
	RequiresOracle bool

	// Tool is the test binary to execute.
	Tool Tool

	// Args are the command-line arguments for the tool.
	// Use "<RUN_DIR>" as a placeholder for the run directory.
	// Use "<SEED>" as a placeholder for the seed value.
	Args []string

	// Env are additional environment variables for the tool.
	Env map[string]string

	// Seeds are the seed values to run. Each seed produces a separate run.
	Seeds []int64

	// FaultModel describes the fault injection configuration.
	FaultModel FaultModel

	// Stop defines the stopping conditions for this instance.
	Stop StopCondition
}

Instance represents a single campaign test instance. Each instance defines a specific test configuration to run.

func FilterInstances ¶

func FilterInstances(instances []Instance, filter *Filter) []Instance

FilterInstances returns instances that match the filter.

func GetInstances ¶

func GetInstances(tier Tier) []Instance

GetInstances returns the instances for the specified tier. Includes both campaign instances (stress, crash, golden) and status instances (durability, adversarial).

func GetStatusInstances ¶

func GetStatusInstances(group string) []Instance

GetStatusInstances returns status instances filtered by group prefix. If group is empty, returns all status instances.

func NightlyInstances ¶

func NightlyInstances() []Instance

NightlyInstances returns the instance matrix for the nightly tier. Nightly tier is for thorough testing that can run for hours.

func QuickInstances ¶

func QuickInstances() []Instance

QuickInstances returns the instance matrix for the quick tier. Quick tier is for local development and CI on pull requests.

func StatusInstances ¶

func StatusInstances() []Instance

StatusInstances returns the simple instance matrix for status/durability checks. For composite instances (multi-step), see StatusCompositeInstances(). For sweep instances (parameter expansion), see StatusSweepInstances().

func SyntheticInstance ¶

func SyntheticInstance() *Instance

SyntheticInstance returns a test-only instance that fails deterministically. This is gated behind ROCKYARDKV_SYNTHETIC_FAIL=1 env var to prevent accidental use.

Usage:

ROCKYARDKV_SYNTHETIC_FAIL=1 bin/campaignrunner -group=synthetic -minimize

func (*Instance) BinaryName ¶

func (i *Instance) BinaryName() string

BinaryName returns the binary name for the tool (without path).

func (*Instance) BinaryPath ¶

func (i *Instance) BinaryPath(binDir string) string

BinaryPath returns the full path to the tool binary. Uses binDir to construct path for test binaries (e.g., "./bin/stresstest"). For go test, returns "go" since it's expected to be on PATH.

func (*Instance) ComputeTags ¶

func (i *Instance) ComputeTags() Tags

ComputeTags derives the Tags from an Instance.

func (*Instance) IsGoTest ¶

func (i *Instance) IsGoTest() bool

IsGoTest returns true if this instance runs via `go test`.

func (*Instance) ResolveArgs ¶

func (i *Instance) ResolveArgs(runDir string, seed int64) []string

ResolveArgs returns the arguments with placeholders replaced.

func (*Instance) RunDir ¶

func (i *Instance) RunDir(runRoot string, seed int64) string

RunDir returns the run directory path for a specific seed.

type InstanceSkipPolicies ¶

type InstanceSkipPolicies struct {
	// contains filtered or unexported fields
}

InstanceSkipPolicies manages a set of skip policies.

func NewInstanceSkipPolicies ¶

func NewInstanceSkipPolicies(path string) *InstanceSkipPolicies

NewInstanceSkipPolicies creates a new skip policy manager. If path is non-empty, policies are loaded from disk.

func (*InstanceSkipPolicies) Add ¶

func (sp *InstanceSkipPolicies) Add(policy *SkipPolicy)

Add adds a new skip policy.

func (*InstanceSkipPolicies) Count ¶

func (sp *InstanceSkipPolicies) Count() int

Count returns the number of skip policies.

func (*InstanceSkipPolicies) LoadWithValidation ¶

func (sp *InstanceSkipPolicies) LoadWithValidation() error

LoadWithValidation loads policies and returns any validation errors. Use this when callers want to surface validation issues to users.

func (*InstanceSkipPolicies) Save ¶

func (sp *InstanceSkipPolicies) Save() error

Save writes skip policies to disk.

func (*InstanceSkipPolicies) ShouldSkip ¶

func (sp *InstanceSkipPolicies) ShouldSkip(inst *Instance) *SkipResult

ShouldSkip returns a SkipResult if the instance should be skipped, nil otherwise.

type KnownFailure ¶

type KnownFailure struct {
	Fingerprint string `json:"fingerprint"`
	Instance    string `json:"instance"`
	FirstSeen   string `json:"first_seen"`
	Count       int    `json:"count"`
	Description string `json:"description,omitempty"`

	// IssueID links the failure to a tracking issue (e.g., "GH-123").
	IssueID string `json:"issue_id,omitempty"`

	// Quarantine defines how this known failure should be handled.
	// If empty, the failure is not quarantined and will fail the campaign.
	Quarantine QuarantinePolicy `json:"quarantine,omitempty"`
}

KnownFailure represents a previously seen failure fingerprint.

type KnownFailures ¶

type KnownFailures struct {
	// contains filtered or unexported fields
}

KnownFailures tracks failure fingerprints for deduplication.

func NewKnownFailures ¶

func NewKnownFailures(path string) *KnownFailures

NewKnownFailures creates a new known failures tracker. If path is non-empty, failures are persisted to disk.

func (*KnownFailures) All ¶

func (kf *KnownFailures) All() []*KnownFailure

All returns all known failures.

func (*KnownFailures) Count ¶

func (kf *KnownFailures) Count() int

Count returns the number of known failure fingerprints.

func (*KnownFailures) Get ¶

func (kf *KnownFailures) Get(fingerprint string) *KnownFailure

Get returns the known failure for a fingerprint, or nil if not found.

func (*KnownFailures) GetQuarantinePolicy ¶

func (kf *KnownFailures) GetQuarantinePolicy(fingerprint string) QuarantinePolicy

GetQuarantinePolicy returns the quarantine policy for a fingerprint. Returns QuarantineNone if the fingerprint is not known or not quarantined.

func (*KnownFailures) IsDuplicate ¶

func (kf *KnownFailures) IsDuplicate(fingerprint string) bool

IsDuplicate returns true if the fingerprint has been seen before.

func (*KnownFailures) IsQuarantined ¶

func (kf *KnownFailures) IsQuarantined(fingerprint string) bool

IsQuarantined returns true if the fingerprint is known AND has a quarantine policy.

func (*KnownFailures) Record ¶

func (kf *KnownFailures) Record(fingerprint, instance, timestamp string) bool

Record adds or updates a failure fingerprint. Returns true if this is a new (not duplicate) failure.

type MarkerRecheckResult ¶

type MarkerRecheckResult struct {
	// Passed indicates if verification markers indicate success.
	Passed bool `json:"passed"`

	// Reason explains the result.
	Reason string `json:"reason"`
}

MarkerRecheckResult contains verification marker re-parse details.

type MinBounds ¶

type MinBounds struct {
	// MinDuration is the minimum test duration.
	// Default: 5 seconds.
	MinDuration time.Duration

	// MinThreads is the minimum number of threads.
	// Default: 4.
	MinThreads int

	// MinKeys is the minimum number of keys.
	// Default: 500.
	MinKeys int
}

MinBounds defines the minimum values for parameter reduction during minimization.

func DefaultMinBounds ¶

func DefaultMinBounds() MinBounds

DefaultMinBounds returns the Red Team-approved minimization bounds. Per the MinBounds field comments, the defaults are 5 seconds, 4 threads, and 500 keys.

type MinimizeConfig ¶

type MinimizeConfig struct {
	// Enabled controls whether minimization is active.
	Enabled bool

	// Bounds defines the minimum parameter values.
	Bounds MinBounds

	// AllowedFailureKinds is the set of failure kinds eligible for minimization.
	// Empty means all failure kinds are eligible.
	AllowedFailureKinds map[string]bool
}

MinimizeConfig controls the minimization process.

func DefaultMinimizeConfig ¶

func DefaultMinimizeConfig() MinimizeConfig

DefaultMinimizeConfig returns the default minimization configuration.

type MinimizeResult ¶

type MinimizeResult struct {
	// Success indicates if minimization found a smaller reproducer.
	Success bool `json:"success"`

	// OriginalArgs are the original instance arguments.
	OriginalArgs []string `json:"original_args"`

	// MinimalArgs are the minimized arguments (if successful).
	MinimalArgs []string `json:"minimal_args,omitempty"`

	// Steps records each reduction attempt.
	Steps []ReductionStep `json:"steps"`

	// FinalDuration is the minimized duration.
	FinalDuration string `json:"final_duration,omitempty"`

	// FinalThreads is the minimized thread count.
	FinalThreads int `json:"final_threads,omitempty"`

	// FinalKeys is the minimized key count.
	FinalKeys int `json:"final_keys,omitempty"`

	// PreservedFailureKind is the failure kind class that was preserved across reduction.
	PreservedFailureKind string `json:"preserved_failure_kind,omitempty"`

	// TotalAttempts is the number of runs performed during minimization.
	TotalAttempts int `json:"total_attempts"`

	// TotalDurationMs is the total time spent minimizing.
	TotalDurationMs int64 `json:"total_duration_ms"`
}

MinimizeResult captures the outcome of a minimization attempt.

type Minimizer ¶

type Minimizer struct {
	// contains filtered or unexported fields
}

Minimizer reduces failing test cases to minimal parameters.

func NewMinimizer ¶

func NewMinimizer(runner *Runner, config MinimizeConfig) *Minimizer

NewMinimizer creates a new minimizer with the given runner and config.

func (*Minimizer) Minimize ¶

func (m *Minimizer) Minimize(ctx context.Context, result *RunResult) (*MinimizeResult, error)

Minimize attempts to reduce a failing instance to minimal parameters. It uses sequential reduction: duration → threads → keys. Within each parameter, it uses binary search.
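The reduction order described above (duration, then threads, then keys, with a binary search inside each parameter) might look roughly like the following. This is a sketch, not the package's implementation: the params type, the fails predicate, and the assumption that failure is monotonic in each parameter (if a value reproduces the failure, every larger value does too) are all illustrative.

```go
package main

import "fmt"

// params is a hypothetical bundle of the three reducible parameters.
type params struct {
	durationSec int
	threads     int
	keys        int
}

// minimizeParam returns the smallest value in [lo, hi] for which stillFails
// reports true. It assumes hi is known to fail and that failure is monotonic.
func minimizeParam(lo, hi int, stillFails func(int) bool) int {
	for lo < hi {
		mid := lo + (hi-lo)/2
		if stillFails(mid) {
			hi = mid // mid still reproduces the failure; try smaller values
		} else {
			lo = mid + 1 // mid lost the failure; a larger value is needed
		}
	}
	return hi
}

// minimize reduces one parameter at a time, holding the others at their
// current (already reduced) values.
func minimize(orig, bounds params, fails func(params) bool) params {
	cur := orig
	cur.durationSec = minimizeParam(bounds.durationSec, cur.durationSec, func(v int) bool {
		p := cur
		p.durationSec = v
		return fails(p)
	})
	cur.threads = minimizeParam(bounds.threads, cur.threads, func(v int) bool {
		p := cur
		p.threads = v
		return fails(p)
	})
	cur.keys = minimizeParam(bounds.keys, cur.keys, func(v int) bool {
		p := cur
		p.keys = v
		return fails(p)
	})
	return cur
}

func main() {
	// Hypothetical failure: reproduces only above these thresholds.
	fails := func(p params) bool {
		return p.durationSec >= 12 && p.threads >= 6 && p.keys >= 700
	}
	orig := params{durationSec: 60, threads: 32, keys: 10000}
	bounds := params{durationSec: 5, threads: 4, keys: 500} // documented defaults
	fmt.Printf("%+v\n", minimize(orig, bounds, fails))
}
```

In the real Minimizer each probe is a full run of the instance, so a binary search keeps the number of runs logarithmic in the size of each parameter range.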

func (*Minimizer) ShouldMinimize ¶

func (m *Minimizer) ShouldMinimize(result *RunResult) bool

ShouldMinimize returns true if the failure is eligible for minimization.

type Oracle ¶

type Oracle struct {
	// RocksDBPath is the path to the RocksDB source/build directory.
	// Should contain ldb and sst_dump binaries.
	RocksDBPath string

	// LDBPath is the explicit path to the ldb binary.
	// If empty, uses RocksDBPath/ldb.
	LDBPath string

	// SSTDumpPath is the explicit path to the sst_dump binary.
	// If empty, uses RocksDBPath/sst_dump.
	SSTDumpPath string
}

Oracle provides access to the C++ RocksDB tools (ldb, sst_dump). These tools are used to verify database consistency and format correctness.

func NewOracleFromEnv ¶

func NewOracleFromEnv() *Oracle

NewOracleFromEnv creates an Oracle from environment variables.

Environment variable precedence:

LDB_PATH: explicit path to ldb binary (overrides ROCKSDB_PATH-derived path)
SST_DUMP_PATH: explicit path to sst_dump binary (overrides ROCKSDB_PATH-derived path)
ROCKSDB_PATH: path to RocksDB build directory (derives ldb and sst_dump from it)

Returns nil if neither ROCKSDB_PATH nor the tool-specific paths are set.
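The precedence rules above can be sketched with an injectable lookup function, which makes them easy to exercise without touching the process environment. The helper name oracleFromLookup is hypothetical, and two details are assumptions: that setting any one of the three variables is enough to get a non-nil Oracle, and that derived paths are filled in eagerly rather than resolved later.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Oracle is a local copy of the documented struct.
type Oracle struct {
	RocksDBPath string
	LDBPath     string
	SSTDumpPath string
}

// oracleFromLookup applies the documented precedence: explicit tool paths
// win, and ROCKSDB_PATH supplies a derived path for any tool left unset.
func oracleFromLookup(getenv func(string) string) *Oracle {
	root := getenv("ROCKSDB_PATH")
	ldb := getenv("LDB_PATH")
	sst := getenv("SST_DUMP_PATH")
	if root == "" && ldb == "" && sst == "" {
		return nil // nothing configured
	}
	if ldb == "" && root != "" {
		ldb = filepath.Join(root, "ldb")
	}
	if sst == "" && root != "" {
		sst = filepath.Join(root, "sst_dump")
	}
	return &Oracle{RocksDBPath: root, LDBPath: ldb, SSTDumpPath: sst}
}

func main() {
	// With the real environment, the lookup is simply os.Getenv.
	o := oracleFromLookup(os.Getenv)
	if o == nil {
		fmt.Println("oracle not configured")
		return
	}
	fmt.Println("ldb:", o.LDBPath)
	fmt.Println("sst_dump:", o.SSTDumpPath)
}
```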

func (*Oracle) Available ¶

func (o *Oracle) Available() bool

Available returns true if the oracle tools are configured and accessible.

func (*Oracle) CheckConsistency ¶

func (o *Oracle) CheckConsistency(dbPath string) *ToolResult

CheckConsistency runs `ldb checkconsistency` on the database. The returned ToolResult reports OK if the database passes all consistency checks.

func (*Oracle) DumpManifest ¶

func (o *Oracle) DumpManifest(dbPath string) *ToolResult

DumpManifest runs `ldb manifest_dump` on the database.

func (*Oracle) DumpSST ¶

func (o *Oracle) DumpSST(sstPath string, args ...string) *ToolResult

DumpSST runs `sst_dump` on an SST file.

type OracleRecheckResult ¶

type OracleRecheckResult struct {
	// Performed indicates if oracle check was run.
	Performed bool `json:"performed"`

	// Skipped indicates oracle check was skipped (not required or oracle unavailable).
	Skipped bool `json:"skipped,omitempty"`

	// SkipReason explains why oracle check was skipped.
	SkipReason string `json:"skip_reason,omitempty"`

	// OK indicates if the oracle check passed.
	OK bool `json:"ok"`

	// ExitCode is the oracle tool exit code.
	ExitCode int `json:"exit_code,omitempty"`

	// StdoutPath is the path to captured stdout.
	StdoutPath string `json:"stdout_path,omitempty"`

	// StderrPath is the path to captured stderr.
	StderrPath string `json:"stderr_path,omitempty"`

	// Summary is a brief inline summary.
	Summary string `json:"summary,omitempty"`
}

OracleRecheckResult contains oracle tool re-check details.

type PolicyRecheckResult ¶

type PolicyRecheckResult struct {
	// Passed indicates if the run passes current policy.
	Passed bool `json:"passed"`

	// Reason explains why it passed or failed.
	Reason string `json:"reason"`

	// Verified indicates if the run can be marked as VERIFIED.
	// False when oracle is required but missing.
	Verified bool `json:"verified"`
}

PolicyRecheckResult contains stop-condition policy evaluation.

type QuarantinePolicy ¶

type QuarantinePolicy string

QuarantinePolicy defines how a known failure should be handled.

const (
	// QuarantineNone means the failure is not quarantined and will fail the campaign.
	QuarantineNone QuarantinePolicy = ""
	// QuarantineAllowed means the failure is expected and allowed to occur.
	QuarantineAllowed QuarantinePolicy = "allowed"
	// QuarantineSkip means the instance should be skipped entirely.
	QuarantineSkip QuarantinePolicy = "skip"
)

type RecheckResult ¶

type RecheckResult struct {
	// RecheckTime is when the recheck was performed.
	RecheckTime time.Time `json:"recheck_time"`

	// RecheckSchemaVersion is the schema version used for this recheck.
	RecheckSchemaVersion string `json:"recheck_schema_version"`

	// OracleRecheck contains the oracle re-check outcome.
	OracleRecheck *OracleRecheckResult `json:"oracle_recheck,omitempty"`

	// MarkerRecheck contains the verification marker re-parse outcome.
	MarkerRecheck *MarkerRecheckResult `json:"marker_recheck,omitempty"`

	// FingerprintRecomputed is the recomputed fingerprint (if failure).
	FingerprintRecomputed string `json:"fingerprint_recomputed,omitempty"`

	// PolicyResult contains the pass/fail evaluation with current stop conditions.
	PolicyResult *PolicyRecheckResult `json:"policy_result"`
}

RecheckResult captures the outcome of re-evaluating an existing run.

type Rechecker ¶

type Rechecker struct {
	Oracle         *Oracle
	StopConditions map[string]StopCondition // instance name -> stop condition
}

Rechecker re-evaluates existing run artifacts.

func NewRechecker ¶

func NewRechecker(oracle *Oracle) *Rechecker

NewRechecker creates a new Rechecker.

func (*Rechecker) RecheckCampaign ¶

func (r *Rechecker) RecheckCampaign(runRoot string) ([]RecheckResult, error)

RecheckCampaign re-evaluates all runs in a campaign run root.

func (*Rechecker) RecheckRun ¶

func (r *Rechecker) RecheckRun(runDir string) (*RecheckResult, error)

RecheckRun re-evaluates a single run directory.

type ReductionStep ¶

type ReductionStep struct {
	Parameter   string `json:"parameter"` // "duration", "threads", or "keys"
	OriginalVal string `json:"original_value"`
	ReducedVal  string `json:"reduced_value"`
	StillFails  bool   `json:"still_fails"`
	DurationMs  int64  `json:"duration_ms"`
}

ReductionStep records a single step in the minimization process.

type RunArtifact ¶

type RunArtifact struct {
	SchemaVersion string    `json:"schema_version"`
	Instance      string    `json:"instance"`
	Seed          int64     `json:"seed"`
	BinaryPath    string    `json:"binary_path"`
	StartTime     time.Time `json:"start_time"`
	EndTime       time.Time `json:"end_time"`
	DurationMs    int64     `json:"duration_ms"`
	ExitCode      int       `json:"exit_code"`
	Passed        bool      `json:"passed"`
	Failure       string    `json:"failure,omitempty"`
	FailureKind   string    `json:"failure_kind,omitempty"`
	Fingerprint   string    `json:"fingerprint,omitempty"`
	IsDuplicate   bool      `json:"is_duplicate,omitempty"`

	// Oracle check fields
	OracleExitCode *int   `json:"oracle_exit_code,omitempty"`
	OracleOutput   string `json:"oracle_output,omitempty"`

	// Trace capture fields
	TracePath        string `json:"trace_path,omitempty"`
	TraceBytesWriten int64  `json:"trace_bytes_written,omitempty"`
	TraceTruncated   bool   `json:"trace_truncated,omitempty"`
	ReplayCommand    string `json:"replay_command,omitempty"`

	// Minimization fields
	Minimized       bool            `json:"minimized,omitempty"`
	MinimizedResult *MinimizeResult `json:"minimized_result,omitempty"`

	// Tags for filtering (computed at write time)
	Tags *Tags `json:"tags,omitempty"`
}

RunArtifact is the JSON structure written to run.json in each run directory.

type RunResult ¶

type RunResult struct {
	// Instance is the instance that was run.
	Instance *Instance

	// Seed is the seed value used for this run.
	Seed int64

	// RunDir is the directory containing all run artifacts.
	RunDir string

	// BinaryPath is the resolved path to the binary that was executed.
	BinaryPath string

	// StartTime is when the run started.
	StartTime time.Time

	// EndTime is when the run ended.
	EndTime time.Time

	// ExitCode is the process exit code.
	ExitCode int

	// Passed indicates if the run passed all stop conditions.
	Passed bool

	// FailureReason describes why the run failed (if it did).
	FailureReason string

	// FailureKind categorizes the failure type for fingerprinting.
	FailureKind string

	// Fingerprint is the failure fingerprint for deduplication.
	// Empty string if the run passed.
	Fingerprint string

	// IsDuplicate indicates if this failure fingerprint was already known.
	IsDuplicate bool

	// FailureClass categorizes the failure for governance reporting.
	FailureClass FailureClass

	// QuarantinePolicy is the policy for this failure (if it's a known failure).
	QuarantinePolicy QuarantinePolicy

	// OracleResult is the result of oracle verification (if performed).
	OracleResult *ToolResult

	// TraceResult contains trace capture information (if enabled).
	TraceResult *TraceResult

	// MinimizeResult contains minimization results (if performed).
	MinimizeResult *MinimizeResult
}

RunResult represents the outcome of a single instance run.

func RunSyntheticFailure ¶

func RunSyntheticFailure(ctx context.Context, config SyntheticFailConfig, runDir string) *RunResult

RunSyntheticFailure executes a synthetic failure for CI testing. Returns a RunResult that simulates a deterministic, classifiable failure.

func (*RunResult) Duration ¶

func (r *RunResult) Duration() time.Duration

Duration returns the run duration.

type RunSummary ¶

type RunSummary struct {
	Instance     string       `json:"instance"`
	Seed         int64        `json:"seed"`
	Passed       bool         `json:"passed"`
	Failure      string       `json:"failure,omitempty"`
	Fingerprint  string       `json:"fingerprint,omitempty"`
	FailureClass FailureClass `json:"failure_class,omitempty"`
	DurationMs   int64        `json:"duration_ms"`
}

RunSummary is a brief per-run record included in the campaign summary.

type Runner ¶

type Runner struct {
	// contains filtered or unexported fields
}

Runner executes campaign instances.

func NewRunner ¶

func NewRunner(config RunnerConfig) *Runner

NewRunner creates a new campaign runner.

func (*Runner) Run ¶

func (r *Runner) Run(ctx context.Context) (*CampaignSummary, error)

Run executes all instances for the configured tier. Returns the campaign summary and any error.

func (*Runner) RunCompositeInstances ¶

func (r *Runner) RunCompositeInstances(ctx context.Context, composites []CompositeInstance) (*CampaignSummary, error)

RunCompositeInstances executes composite (multi-step) instances with Phase-1-grade artifacts.

func (*Runner) RunGroup ¶

func (r *Runner) RunGroup(ctx context.Context, group string) (*CampaignSummary, error)

RunGroup executes instances matching the group prefix. If group is empty, runs all instances for the tier. If group starts with "status.", runs status instances. Special groups "status.composite" and "status.sweep" run composite/sweep instances.

func (*Runner) RunInstances ¶

func (r *Runner) RunInstances(ctx context.Context, instances []Instance) (*CampaignSummary, error)

RunInstances executes the specified instances.

func (*Runner) RunSweepInstances ¶

func (r *Runner) RunSweepInstances(ctx context.Context, sweeps []SweepInstance) (*CampaignSummary, error)

RunSweepInstances expands and runs sweep instances.

type RunnerConfig ¶

type RunnerConfig struct {
	// Tier is the intensity level.
	Tier Tier

	// RunRoot is the root directory for all run artifacts.
	RunRoot string

	// BinDir is the directory containing test binaries.
	// Defaults to "./bin" if empty.
	BinDir string

	// Oracle is the C++ oracle for consistency checks.
	// May be nil if oracle is not available.
	Oracle *Oracle

	// KnownFailures tracks failure fingerprints for deduplication.
	KnownFailures *KnownFailures

	// FailFast stops the campaign on the first failure.
	FailFast bool

	// Verbose enables verbose output.
	Verbose bool

	// Output is where to write progress messages.
	Output io.Writer

	// InstanceTimeout is the per-instance timeout in seconds.
	// If 0, uses the default for the tier.
	InstanceTimeout int

	// GlobalTimeout is the global campaign timeout in seconds.
	// If 0, uses the default for the tier.
	GlobalTimeout int

	// Trace controls trace capture behavior.
	Trace TraceConfig

	// Minimize controls minimization behavior.
	Minimize MinimizeConfig

	// Filter restricts which instances to run.
	// If nil, all instances are run.
	Filter *Filter

	// RequireQuarantine enforces that repeat failures must be quarantined.
	// If true, unquarantined duplicate failures cause the campaign to fail.
	RequireQuarantine bool

	// SkipPolicies defines instance-level skip policies.
	// Instances matching a skip policy are not run and are recorded as skipped.
	SkipPolicies *InstanceSkipPolicies
}

RunnerConfig configures the campaign runner.

type SkipPolicy ¶

type SkipPolicy struct {
	// InstanceName is the exact instance name to skip (highest priority).
	InstanceName string `json:"instance_name,omitempty"`

	// Group matches instances whose name starts with this prefix.
	// For example, "status.durability" matches "status.durability.cycles4".
	Group string `json:"group,omitempty"`

	// Tags matches instances with all specified tag values.
	// For example, {"tier": "nightly", "kind": "crash"} matches all nightly crash tests.
	Tags map[string]string `json:"tags,omitempty"`

	// Reason is a human-readable explanation for why the instance is skipped.
	Reason string `json:"reason"`

	// IssueID links to a tracking issue (e.g., "GH-456").
	IssueID string `json:"issue_id,omitempty"`
}

SkipPolicy represents an instance-level skip policy. Unlike fingerprint-based quarantine, this skips instances BEFORE they run.

func (*SkipPolicy) Matches ¶

func (p *SkipPolicy) Matches(inst *Instance) bool

Matches returns true if this policy matches the given instance.
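One plausible reading of the three matching fields is sketched below. The documentation only says that InstanceName has the highest priority, so the fall-through order (name, then group prefix, then tags), the rule that an empty policy matches nothing, and the map-based instance stand-in are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// skipPolicy mirrors the matching fields of the documented SkipPolicy.
type skipPolicy struct {
	InstanceName string
	Group        string
	Tags         map[string]string
}

// instance is a hypothetical stand-in: a name plus computed tags as a map.
type instance struct {
	Name string
	Tags map[string]string
}

// matches checks the most specific populated field of the policy.
func matches(p skipPolicy, inst instance) bool {
	if p.InstanceName != "" {
		return inst.Name == p.InstanceName // exact name
	}
	if p.Group != "" {
		return strings.HasPrefix(inst.Name, p.Group) // group prefix
	}
	if len(p.Tags) == 0 {
		return false // an empty policy matches nothing (assumption)
	}
	for key, want := range p.Tags {
		if inst.Tags[key] != want {
			return false // every listed tag must match
		}
	}
	return true
}

func main() {
	inst := instance{
		Name: "status.durability.cycles4",
		Tags: map[string]string{"tier": "nightly", "kind": "crash"},
	}
	fmt.Println(matches(skipPolicy{Group: "status.durability"}, inst))
	fmt.Println(matches(skipPolicy{Tags: map[string]string{"tier": "quick"}}, inst))
}
```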

type SkipResult ¶

type SkipResult struct {
	InstanceName string `json:"instance_name"`
	Reason       string `json:"reason"`
	IssueID      string `json:"issue_id,omitempty"`
	Policy       string `json:"policy"` // Which policy matched (for debugging)
}

SkipResult records why an instance was skipped.

type SkipSummary ¶

type SkipSummary struct {
	Instance string `json:"instance"`
	Reason   string `json:"reason"`
	IssueID  string `json:"issue_id,omitempty"`
}

SkipSummary records an instance that was skipped.

type Step ¶

type Step struct {
	// Name identifies this step (e.g., "crashtest", "collision-check").
	Name string

	// Tool is the binary to execute.
	Tool Tool

	// Args are the command-line arguments.
	// Supports placeholders: <RUN_DIR>, <SEED>, <DB_DIR>, <PREV_DB_DIR>.
	Args []string

	// Env are additional environment variables.
	Env map[string]string

	// RequiresOracle indicates if this step needs oracle tools.
	RequiresOracle bool

	// DiscoverDBPath indicates the runner should discover the DB path
	// from the previous step's artifacts and make it available as <DB_DIR>.
	DiscoverDBPath bool
}

Step represents a single execution step in a composite instance.
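Placeholder expansion for Args could be sketched with strings.NewReplacer, as below. The helper name expandArgs and the flag names in the example are hypothetical; only the four placeholder tokens come from the field comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// expandArgs substitutes the documented placeholders in each argument and
// returns a new slice, leaving the input untouched.
func expandArgs(args []string, runDir, seed, dbDir, prevDBDir string) []string {
	r := strings.NewReplacer(
		"<RUN_DIR>", runDir,
		"<SEED>", seed,
		"<DB_DIR>", dbDir,
		"<PREV_DB_DIR>", prevDBDir,
	)
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = r.Replace(a)
	}
	return out
}

func main() {
	// Flag names are made up for illustration.
	args := []string{"-seed=<SEED>", "-db=<DB_DIR>", "-out=<RUN_DIR>/out"}
	fmt.Println(expandArgs(args, "/tmp/run", "42", "/tmp/run/db", ""))
}
```

None of the four tokens contains another as a substring, so a single replacer pass handles them without ordering concerns.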

type StepResult ¶

type StepResult struct {
	// StepName identifies which step this result is for.
	StepName string

	// Passed indicates if this step succeeded.
	Passed bool

	// ExitCode is the process exit code.
	ExitCode int

	// FailureReason describes why the step failed (if applicable).
	FailureReason string

	// DurationMs is how long the step took.
	DurationMs int64

	// DBPath is the discovered DB path (if DiscoverDBPath was set).
	DBPath string

	// LogPath is the path to this step's log file.
	LogPath string
}

StepResult captures the outcome of a single step execution.

type StopCondition ¶

type StopCondition struct {
	// RequireTermination requires the process to terminate within the timeout.
	// If false, the runner kills the process after the timeout but does not treat that as a failure.
	RequireTermination bool

	// RequireFinalVerificationPass requires the tool's final verification to pass.
	// For stresstest, this means expected state verification.
	// For crashtest, this means recovery verification.
	RequireFinalVerificationPass bool

	// RequireOracleCheckConsistencyOK requires `ldb checkconsistency` to return OK.
	// Only applies to instances with RequiresOracle=true.
	RequireOracleCheckConsistencyOK bool

	// DedupeByFingerprint enables deduplication by failure fingerprint.
	// When true, repeated failures with the same fingerprint are marked as duplicates.
	DedupeByFingerprint bool
}

StopCondition defines when an instance run is considered complete and what constitutes success vs failure.
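How the three pass/fail flags might combine is sketched below. The observation type, the evaluate helper, the order of checks, and the reason strings are all hypothetical; the sketch only encodes what the field comments state, including that the oracle requirement applies only to instances that require the oracle.

```go
package main

import "fmt"

// stopCondition mirrors the pass/fail flags of the documented StopCondition.
type stopCondition struct {
	RequireTermination              bool
	RequireFinalVerificationPass    bool
	RequireOracleCheckConsistencyOK bool
}

// observation is a hypothetical summary of what happened during a run.
type observation struct {
	TimedOut           bool // the runner had to kill the process
	VerificationPassed bool // the tool's final verification succeeded
	RequiresOracle     bool // the instance has RequiresOracle=true
	OracleOK           bool // `ldb checkconsistency` returned OK
}

// evaluate returns whether the run passes and, if not, the first reason.
func evaluate(sc stopCondition, obs observation) (bool, string) {
	if sc.RequireTermination && obs.TimedOut {
		return false, "did not terminate within the timeout"
	}
	if sc.RequireFinalVerificationPass && !obs.VerificationPassed {
		return false, "final verification failed"
	}
	if sc.RequireOracleCheckConsistencyOK && obs.RequiresOracle && !obs.OracleOK {
		return false, "oracle consistency check failed"
	}
	return true, ""
}

func main() {
	sc := stopCondition{RequireTermination: true, RequireFinalVerificationPass: true}
	passed, reason := evaluate(sc, observation{TimedOut: true, VerificationPassed: true})
	fmt.Println(passed, reason)
}
```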

func DefaultStopCondition ¶

func DefaultStopCondition() StopCondition

DefaultStopCondition returns the default stop condition for most instances.

type SweepCase ¶

type SweepCase struct {
	// ID is a stable identifier for this case (e.g., "cycles_4_mode_drop").
	ID string

	// Params maps parameter names to their values for this case.
	Params map[string]string
}

SweepCase represents a single concrete case in a sweep expansion.

func DisableWALFaultFSMinimizeCases ¶

func DisableWALFaultFSMinimizeCases() []SweepCase

DisableWALFaultFSMinimizeCases returns the sweep cases for disablewal-faultfs-minimize. These mirror the cases in scripts/status/run_durability_repros.sh.

type SweepInstance ¶

type SweepInstance struct {
	// Base is the base instance (used as template).
	Base Instance

	// Params are the parameters to sweep over.
	Params []SweepParam

	// Cases are the explicit cases to run (if provided, Params is ignored).
	// This allows defining arbitrary combinations rather than full cross-product.
	Cases []SweepCase
}

SweepInstance defines a parameterized instance that expands into multiple runs.

func StatusSweepInstances ¶

func StatusSweepInstances() []SweepInstance

StatusSweepInstances returns sweep (parameter expansion) instances. These instances expand into multiple concrete runs.

func (*SweepInstance) Expand ¶

func (s *SweepInstance) Expand() []Instance

Expand returns the concrete instances for this sweep. Each returned instance has a unique Name derived from the sweep case.
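The relationship between Params, Cases, and the expanded set can be illustrated as follows. This sketch assumes a full cross-product over Params and an ID format of name_value pairs joined by underscores, modeled on the "cycles_4_mode_drop" example in the SweepCase comment. How the real Expand derives instance names and arguments from each case is not shown here, and the "sync" mode value is made up.

```go
package main

import "fmt"

// sweepParam and sweepCase mirror the documented SweepParam and SweepCase.
type sweepParam struct {
	Name   string
	Values []string
}

type sweepCase struct {
	ID     string
	Params map[string]string
}

// crossProduct builds one case per combination of parameter values.
func crossProduct(params []sweepParam) []sweepCase {
	cases := []sweepCase{{Params: map[string]string{}}}
	for _, p := range params {
		var next []sweepCase
		for _, c := range cases {
			for _, v := range p.Values {
				// Copy the partial assignment before extending it.
				m := make(map[string]string, len(c.Params)+1)
				for k, val := range c.Params {
					m[k] = val
				}
				m[p.Name] = v
				id := p.Name + "_" + v
				if c.ID != "" {
					id = c.ID + "_" + id
				}
				next = append(next, sweepCase{ID: id, Params: m})
			}
		}
		cases = next
	}
	return cases
}

// resolveCases applies the documented rule: explicit cases override params.
func resolveCases(explicit []sweepCase, params []sweepParam) []sweepCase {
	if len(explicit) > 0 {
		return explicit
	}
	return crossProduct(params)
}

func main() {
	params := []sweepParam{
		{Name: "cycles", Values: []string{"4", "8"}},
		{Name: "mode", Values: []string{"drop", "sync"}},
	}
	for _, c := range resolveCases(nil, params) {
		fmt.Println(c.ID)
	}
}
```

Listing explicit Cases avoids the combinatorial growth of the cross-product when only a few combinations are of interest.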

type SweepParam ¶

type SweepParam struct {
	// Name is the parameter name (e.g., "cycles", "mode").
	Name string

	// Values are the values to sweep over.
	Values []string
}

SweepParam represents a parameter that can be varied in a sweep.

type SyntheticFailConfig ¶

type SyntheticFailConfig struct {
	// Enabled activates the synthetic failure mode.
	Enabled bool

	// FailAfterOps causes failure after N operations.
	// Used to exercise minimization: minimizer should reduce N.
	FailAfterOps int

	// FailureKind is the classification for the synthetic failure.
	FailureKind string

	// FailureMessage is the human-readable failure reason.
	FailureMessage string
}

SyntheticFailConfig configures the synthetic failure hook. This is used for CI testing of minimization and failure classification.

type Tags ¶

type Tags struct {
	// Campaign is the campaign identifier (e.g., "C05").
	Campaign string `json:"campaign"`

	// Tier is the execution tier (quick/nightly).
	Tier string `json:"tier"`

	// Tool is the binary used (stresstest/crashtest/goldentest/adversarialtest/sstdump).
	Tool string `json:"tool"`

	// Kind is the high-level category (stress/crash/golden/status/adversarial).
	Kind string `json:"kind"`

	// OracleRequired indicates if the instance requires C++ oracle tools.
	OracleRequired bool `json:"oracle_required"`

	// Group is the group prefix (e.g., "status.durability").
	Group string `json:"group"`

	// FaultKind is the fault injection kind (none/read/write/sync/crash/corrupt).
	FaultKind string `json:"fault_kind"`

	// FaultScope is the fault injection scope (worker/flusher/reopener/global).
	FaultScope string `json:"fault_scope"`

	// Extra contains optional instance-specific metadata.
	Extra map[string]string `json:"extra,omitempty"`
}

Tags represents the structured tag set for an instance. Required tags are always present; optional tags use the Extra map.

func (Tags) Get ¶

func (t Tags) Get(key string) string

Get returns the value of a tag by key. Returns empty string for unknown keys or unset values.

type Tier ¶

type Tier string

Tier represents the test intensity level. Each tier has different duration, concurrency, and thoroughness settings.

const (
	// TierQuick is for local development and CI on pull requests.
	// Duration: ~2-5 minutes per instance.
	TierQuick Tier = "quick"

	// TierNightly is for nightly pipelines that run for hours.
	// More thorough, longer duration, higher stress.
	TierNightly Tier = "nightly"
)

type Tool ¶

type Tool string

Tool represents the test binary to execute.

const (
	// ToolStress runs the stresstest binary for concurrent workloads.
	ToolStress Tool = "stresstest"

	// ToolCrash runs the crashtest binary for crash recovery testing.
	ToolCrash Tool = "crashtest"

	// ToolAdversarial runs the adversarialtest binary for corruption attacks.
	ToolAdversarial Tool = "adversarialtest"

	// ToolGolden runs the goldentest suite for C++ compatibility.
	ToolGolden Tool = "goldentest"

	// ToolSSTDump runs the sstdump binary for SST inspection/verification.
	ToolSSTDump Tool = "sstdump"
)

type ToolResult ¶

type ToolResult struct {
	ExitCode int
	Stdout   string
	Stderr   string
	Err      error
}

ToolResult contains the result of running an oracle tool.

func (*ToolResult) OK ¶

func (r *ToolResult) OK() bool

OK returns true if the tool exited successfully.

type TraceConfig ¶

type TraceConfig struct {
	// Enabled controls whether trace capture is active.
	Enabled bool

	// MaxSizeBytes is the maximum trace file size before truncation.
	// Default: 256MB (256 * 1024 * 1024).
	MaxSizeBytes int64

	// TraceDir is the subdirectory under the run directory for trace files.
	// Default: "trace".
	TraceDir string
}

TraceConfig defines trace capture behavior for campaign runs.

func DefaultTraceConfig ¶

func DefaultTraceConfig() TraceConfig

DefaultTraceConfig returns the default trace configuration.

type TraceResult ¶

type TraceResult struct {
	// Path is the path to the trace file (if captured).
	Path string

	// BytesWritten is the number of bytes written to the trace file.
	BytesWritten int64

	// Truncated indicates if the trace was truncated due to size limits.
	Truncated bool

	// ReplayCommand is the command to replay this trace.
	ReplayCommand string
}

TraceResult captures the outcome of trace handling for a run.

func CollectTraceResult ¶

func CollectTraceResult(runDir, dbPath, binDir string, config TraceConfig) *TraceResult

CollectTraceResult gathers trace information after a run completes.
