dr

package
v1.1.8 Latest

This package is not in the latest version of its module.

Published: Jun 15, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package dr provides disaster recovery operations: drills, standby promotion, rollback, and split-brain fencing.

Constants

const DrillAlertRuleYAML = `` /* 353-byte string literal not displayed */

DrillAlertRuleYAML is the exact Alertmanager/Prometheus rule that fires when a monthly drill produces a non-"pass" result. The content is deployed to `web/backend/nself/monitoring/alerts/dr.rules.yml` and evaluated by the Prometheus instance on nclaw-prod.

const DrillReportTableDDL = `` /* 640-byte string literal not displayed */

DrillReportTableDDL is the idempotent Postgres DDL for the report table. It is applied once on nclaw-prod during cron install.

const DrillResultMetricName = "nself_dr_drill_result"

DrillResultMetricName is the Prometheus metric name emitted by a drill run. Labels: drill_id (unique per run), result ("pass" or "fail"), rto_sec, rpo_sec.
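
As a rough illustration of the label set described above, the following sketch renders one sample in the Prometheus text exposition format. The helper `formatDrillMetric` and the sample value of 1 are assumptions made for this example; they are not part of the package.

```go
package main

import (
	"fmt"
	"strconv"
)

// DrillResultMetricName mirrors the constant documented above.
const DrillResultMetricName = "nself_dr_drill_result"

// formatDrillMetric is a hypothetical helper. It renders a single sample line
// with the four documented labels. The sample value 1 is an assumption.
func formatDrillMetric(drillID, result string, rtoSec, rpoSec int64) string {
	return fmt.Sprintf("%s{drill_id=%q,result=%q,rto_sec=%q,rpo_sec=%q} 1",
		DrillResultMetricName, drillID, result,
		strconv.FormatInt(rtoSec, 10), strconv.FormatInt(rpoSec, 10))
}

func main() {
	fmt.Println(formatDrillMetric("drill-2026-06", "pass", 840, 300))
}
```

Because drill_id is unique per run, each drill produces a new time series under this scheme.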

Variables

var RequiredPluginChecks = []string{"claw", "ai", "mux"}

RequiredPluginChecks lists plugins whose health MUST be present in every drill report. Additional installed plugins are appended at runtime.
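
One way the runtime append could work is sketched below: required plugins come first, installed plugins follow, and duplicates are dropped. The helper `pluginChecks` and the plugin name "notify" are hypothetical and used only for illustration.

```go
package main

import "fmt"

// RequiredPluginChecks mirrors the variable documented above.
var RequiredPluginChecks = []string{"claw", "ai", "mux"}

// pluginChecks is a hypothetical helper. It returns the required plugins
// followed by any additional installed plugins, without duplicates.
func pluginChecks(installed []string) []string {
	seen := make(map[string]bool, len(RequiredPluginChecks)+len(installed))
	out := make([]string, 0, len(RequiredPluginChecks)+len(installed))
	all := append(append([]string{}, RequiredPluginChecks...), installed...)
	for _, name := range all {
		if seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	return out
}

func main() {
	// "notify" is a made-up plugin name for this example.
	fmt.Println(pluginChecks([]string{"mux", "notify"}))
}
```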

Functions

func Fence

func Fence(ctx context.Context, cfg *config.Config) error

Fence sets a read-only flag in Redis for split-brain prevention.

func FormatDrillResult

func FormatDrillResult(result *DrillResult, format string) (string, error)

FormatDrillResult renders a drill result as JSON or as a table, depending on format.

func InstallSystemdUnits

func InstallSystemdUnits(opts SystemdInstallOptions) error

InstallSystemdUnits renders and writes unit files, then runs `systemctl daemon-reload` and enables the drill timer. Requires root.

func PromoteStandby

func PromoteStandby(ctx context.Context, cfg *config.Config, opts PromoteOptions) error

PromoteStandby promotes the warm standby to primary and updates DNS.

func ReconfigureDNS

func ReconfigureDNS(ctx context.Context, cfg *config.Config, newIP string) error

ReconfigureDNS updates Cloudflare A records to point to a new IP.

func RenderCloudInit

func RenderCloudInit(p CloudInitParams) (string, error)

RenderCloudInit returns the cloud-init user-data YAML for a drill VM. The output is deterministic for a given params set so that operators can diff the rendered YAML against the last known good template when a drill fails with a cloud-init error (see the fail-fix playbook).

func Rollback

func Rollback(ctx context.Context, cfg *config.Config) error

Rollback demotes the promoted standby and resyncs from the original primary.

Types

type CloudInitParams

type CloudInitParams struct {
	DrillID        string
	BackupID       string
	B2Bucket       string
	B2KeyID        string
	B2AppKey       string
	AgeKeyMaterial string // contents of age-key.txt; embedded, never logged
	SSHPublicKey   string
	ReporterURL    string // nclaw-prod API endpoint that receives the report
	ReporterToken  string
	NselfVersion   string // e.g. v1.0.3; empty means latest
}

CloudInitParams feeds the drill VM user-data template. The resulting cloud-init YAML installs Docker and the nSelf CLI, then runs the drill entrypoint which restores the latest backup and executes the smoke suite.

type DrillOptions

type DrillOptions struct {
	Scenario Scenario
	DryRun   bool
}

DrillOptions holds flags for `nself dr drill`.

type DrillReport

type DrillReport struct {
	DrillID      string         `json:"drill_id"`
	StartedAt    time.Time      `json:"started_at"`
	FinishedAt   time.Time      `json:"finished_at"`
	VMID         string         `json:"vm_id"`
	BackupID     string         `json:"backup_id"`
	RTOActualSec int64          `json:"rto_actual_sec"`
	RPOActualSec int64          `json:"rpo_actual_sec"`
	Result       string         `json:"result"` // pass | fail
	Scenarios    DrillScenarios `json:"scenarios"`
	CostEUR      float64        `json:"cost_eur"`
}

DrillReport is the persisted JSON schema for a monthly DR drill run.

It is inserted into the `dr_drill_report` PG table on nclaw-prod via an authenticated API call, and is also the payload posted to the #ops-dr Telegram channel. Field names match the spec in p88-block-g section 4.4.

func NewDrillReport

func NewDrillReport(drillID, vmID, backupID string, startedAt time.Time) *DrillReport

NewDrillReport builds a new DrillReport with zero values for all scenario checks and a pre-populated PluginHealth map seeded with required plugins.

func (*DrillReport) Finalize

func (r *DrillReport) Finalize(finishedAt time.Time)

Finalize stamps the finish time and computes pass/fail across all scenarios. A report passes iff every scenario check and every plugin health check is true, and both RTO/RPO values were recorded (> 0).
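
The pass rule stated above can be sketched as a standalone function. The type below mirrors DrillScenarios without its JSON tags, and `passes` is a hypothetical helper; the actual Finalize method may be structured differently.

```go
package main

import "fmt"

// DrillScenarios mirrors the documented struct (JSON tags omitted).
type DrillScenarios struct {
	PGRestore     bool
	HasuraUp      bool
	MinIOMetadata bool
	PluginHealth  map[string]bool
}

// passes is a hypothetical helper implementing the documented rule: every
// scenario check and every plugin health check is true, and both RTO and RPO
// were recorded (> 0).
func passes(s DrillScenarios, rtoSec, rpoSec int64) bool {
	if !s.PGRestore || !s.HasuraUp || !s.MinIOMetadata {
		return false
	}
	for _, healthy := range s.PluginHealth {
		if !healthy {
			return false
		}
	}
	return rtoSec > 0 && rpoSec > 0
}

func main() {
	s := DrillScenarios{
		PGRestore:     true,
		HasuraUp:      true,
		MinIOMetadata: true,
		PluginHealth:  map[string]bool{"claw": true, "ai": true, "mux": false},
	}
	// One unhealthy plugin is enough to fail the whole report.
	fmt.Println(passes(s, 840, 300))
}
```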

func (*DrillReport) Marshal

func (r *DrillReport) Marshal() ([]byte, error)

Marshal renders the report as JSON in the exact field order defined by the spec.
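
For context on field order: the standard encoding/json package emits struct fields in declaration order and sorts map keys, so a struct declared in spec order serializes in spec order. The sketch below mirrors the documented types; `marshalReport` is a hypothetical stand-in, and whether the real method relies on this behavior is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// DrillScenarios and DrillReport mirror the documented structs.
type DrillScenarios struct {
	PGRestore     bool            `json:"pg_restore"`
	HasuraUp      bool            `json:"hasura_up"`
	MinIOMetadata bool            `json:"minio_metadata"`
	PluginHealth  map[string]bool `json:"plugin_health"`
}

type DrillReport struct {
	DrillID      string         `json:"drill_id"`
	StartedAt    time.Time      `json:"started_at"`
	FinishedAt   time.Time      `json:"finished_at"`
	VMID         string         `json:"vm_id"`
	BackupID     string         `json:"backup_id"`
	RTOActualSec int64          `json:"rto_actual_sec"`
	RPOActualSec int64          `json:"rpo_actual_sec"`
	Result       string         `json:"result"`
	Scenarios    DrillScenarios `json:"scenarios"`
	CostEUR      float64        `json:"cost_eur"`
}

// marshalReport is a hypothetical stand-in for (*DrillReport).Marshal.
func marshalReport(r DrillReport) ([]byte, error) {
	return json.Marshal(r)
}

func main() {
	r := DrillReport{DrillID: "drill-example", Result: "pass"}
	b, err := marshalReport(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Keys appear in the same order as the struct fields above.
	fmt.Println(string(b))
}
```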

func (*DrillReport) Validate

func (r *DrillReport) Validate() error

Validate ensures the report has every required field populated. It is used by the storage layer to reject malformed reports before they hit PG.

type DrillResult

type DrillResult struct {
	ID            string                 `json:"id"`
	Scenario      Scenario               `json:"scenario"`
	StartedAt     time.Time              `json:"started_at"`
	FinishedAt    time.Time              `json:"finished_at"`
	Status        string                 `json:"status"` // success, failed
	RowCountDelta map[string]int64       `json:"row_count_delta"`
	Details       map[string]interface{} `json:"details"`
}

DrillResult holds the outcome of a DR drill.

func Drill

func Drill(ctx context.Context, cfg *config.Config, opts DrillOptions) (*DrillResult, error)

Drill executes a disaster recovery drill by provisioning a fresh VM, restoring from backup, and verifying data integrity.

type DrillScenarios

type DrillScenarios struct {
	PGRestore     bool            `json:"pg_restore"`
	HasuraUp      bool            `json:"hasura_up"`
	MinIOMetadata bool            `json:"minio_metadata"`
	PluginHealth  map[string]bool `json:"plugin_health"`
}

DrillScenarios captures per-check pass/fail across the smoke suite. All boolean fields are true on success. PluginHealth maps plugin name to health status and MUST include claw, ai, and mux at minimum.

type PromoteOptions

type PromoteOptions struct {
	Region string
	Yes    bool
}

PromoteOptions holds flags for `nself dr promote-standby`.

type Scenario

type Scenario string

Scenario identifies the type of DR drill.

const (
	// ScenarioColdStart is the only supported DR drill scenario in v1.0.9.
	// It provisions a fresh VM, installs nSelf, restores the latest verified
	// backup, runs smoke queries from the verify catalog, and records RTO.
	ScenarioColdStart Scenario = "cold-start"

	// ScenarioRegionFailover is not supported in v1.0.9 (single-region
	// deployment by design). Cross-region replication is planned for v1.1.0.
	// Using this scenario returns a clear deprecation message, not a stub.
	ScenarioRegionFailover Scenario = "region-failover"

	// ScenarioDataCorruption is not supported in v1.0.9. PITR recovery via
	// pgbackrest is planned for v1.1.0. Using this scenario returns a clear
	// deprecation message, not a stub.
	ScenarioDataCorruption Scenario = "data-corruption"
)

type SystemdInstallOptions

type SystemdInstallOptions struct {
	// Schedule is the desired cadence. Only "monthly" is supported today.
	// Empty means monthly (OnCalendar=*-*-01 05:00:00).
	Schedule string

	// HetznerProject identifies the Hetzner project that owns the drill VM.
	// Used to select the correct API token from the env file (for example
	// "camarata" selects HETZNER_CAMARATA_TOKEN).
	HetznerProject string

	// VMType is the Hetzner server type used for the drill VM. Defaults to
	// "cx22" (smallest shared-CPU tier in fsn1) to keep drill cost < €0.05.
	VMType string

	// SSHKey is the absolute path to the public SSH key injected into the
	// drill VM via cloud-init. Defaults to /root/.config/nself/dr-key.pub.
	SSHKey string

	// Region is the Hetzner location for provisioning. Defaults to fsn1.
	Region string

	UnitDir    string // default /etc/systemd/system
	EnvFile    string // default /etc/nself/dr.env
	BinaryPath string // default /usr/local/bin/nself
	ProjectDir string // default /opt/nself
	DryRun     bool
}

SystemdInstallOptions controls install of the monthly DR drill timer.

The unit runs `nself dr drill --now` on the first of each month at 05:00 UTC, provisioning a throwaway Hetzner VM, restoring the latest backup, running a smoke suite, recording timings to the dr_drill_report PG table, and destroying the VM.
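
The defaults named in the field comments can be applied as in the sketch below, which also derives the token variable name from the documented "camarata" example. Both `withDefaults` and `hetznerTokenEnv` are hypothetical helpers, and the upper-casing rule is an assumption generalized from that single example.

```go
package main

import (
	"fmt"
	"strings"
)

// SystemdInstallOptions mirrors the documented struct.
type SystemdInstallOptions struct {
	Schedule       string
	HetznerProject string
	VMType         string
	SSHKey         string
	Region         string
	UnitDir        string
	EnvFile        string
	BinaryPath     string
	ProjectDir     string
	DryRun         bool
}

// withDefaults is a hypothetical helper that fills empty fields with the
// defaults stated in the field comments.
func withDefaults(o SystemdInstallOptions) SystemdInstallOptions {
	def := func(v *string, fallback string) {
		if *v == "" {
			*v = fallback
		}
	}
	def(&o.Schedule, "monthly")
	def(&o.VMType, "cx22")
	def(&o.SSHKey, "/root/.config/nself/dr-key.pub")
	def(&o.Region, "fsn1")
	def(&o.UnitDir, "/etc/systemd/system")
	def(&o.EnvFile, "/etc/nself/dr.env")
	def(&o.BinaryPath, "/usr/local/bin/nself")
	def(&o.ProjectDir, "/opt/nself")
	return o
}

// hetznerTokenEnv is a hypothetical helper. It assumes the env var name is
// HETZNER_<PROJECT>_TOKEN with the project name upper-cased.
func hetznerTokenEnv(project string) string {
	return "HETZNER_" + strings.ToUpper(project) + "_TOKEN"
}

func main() {
	o := withDefaults(SystemdInstallOptions{HetznerProject: "camarata"})
	fmt.Println(o.VMType, o.Region, hetznerTokenEnv(o.HetznerProject))
}
```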

type SystemdUnitFiles

type SystemdUnitFiles map[string]string

SystemdUnitFiles maps each unit filename to its contents.

func RenderSystemdUnits

func RenderSystemdUnits(opts SystemdInstallOptions) (SystemdUnitFiles, error)

RenderSystemdUnits produces the unit files for the monthly DR drill without writing to disk. Returns a map suitable for writing into /etc/systemd/system.
