failover

package
v0.411.0 Latest
Published: Jun 1, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package failover monitors primary liveness and promotes a secondary instance when the primary dies or shuts down gracefully.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BusPublisher

type BusPublisher interface {
	Publish(topic string, payload any) error
}

BusPublisher is the subset of ipc.Bus used by the Watcher for publishing.
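
Because BusPublisher is a one-method interface, a primary Watcher can presumably be exercised in tests without a real bus. The sketch below is one way to do that; recordingBus and its Topics method are hypothetical names and are not part of this package. The interface is redeclared locally only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// BusPublisher mirrors the interface documented above.
type BusPublisher interface {
	Publish(topic string, payload any) error
}

// published is one recorded Publish call.
type published struct {
	Topic   string
	Payload any
}

// recordingBus is a hypothetical test double; it is not part of the package.
type recordingBus struct {
	mu   sync.Mutex
	msgs []published
}

// Publish records the call and never fails.
func (b *recordingBus) Publish(topic string, payload any) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.msgs = append(b.msgs, published{Topic: topic, Payload: payload})
	return nil
}

// Topics returns the topics published so far, in order.
func (b *recordingBus) Topics() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]string, 0, len(b.msgs))
	for _, m := range b.msgs {
		out = append(out, m.Topic)
	}
	return out
}

// Compile-time check that the fake satisfies the interface.
var _ BusPublisher = (*recordingBus)(nil)

func main() {
	bus := &recordingBus{}
	_ = bus.Publish("instance.shutdown", nil)
	fmt.Println(bus.Topics())
}
```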

type Config

type Config struct {
	// HeartbeatInterval controls how often the primary publishes its heartbeat.
	HeartbeatInterval time.Duration
	// HeartbeatTimeout is the maximum silence before a secondary declares the primary dead.
	HeartbeatTimeout time.Duration
	// ProbeInterval is how often the secondary actively pings the primary via RPC
	// as a complementary liveness check independent of PUB/SUB heartbeats.
	ProbeInterval time.Duration
	// Enabled controls whether the secondary will attempt automatic promotion when
	// the primary appears dead. Defaults to true; set false to observe without acting.
	Enabled bool
}

Config holds tunable failover parameters.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns conservative defaults suitable for production use.

type EventSubscriber

type EventSubscriber interface {
	SubscribeTo(pubEndpoint string, topics ...string) (<-chan ipc.Envelope, error)
}

EventSubscriber is the subset of ipc.Client used by the Watcher for subscribing.

type ProbePrimaryFunc added in v0.326.0

type ProbePrimaryFunc func(ctx context.Context) error

ProbePrimaryFunc is an optional callback the Watcher uses to actively probe the primary via RPC (instance.ping). Set via SetProbePrimary.

type PromoteFunc added in v0.326.0

type PromoteFunc func(ctx context.Context, lockFile *os.File) error

PromoteFunc is called when this secondary wins the promotion race. The lockFile parameter is the open flock file that the caller has already acquired and that must stay open for the lifetime of the primary role. If PromoteFunc returns an error, the lockFile is released and the promotion is considered failed.

type RoleChanged

type RoleChanged struct {
	OldRole string // "primary" | "secondary"
	NewRole string
}

RoleChanged is emitted on RoleChangedC when the instance's role changes.

type Watcher

type Watcher struct {

	// RoleChangedC receives non-blocking notifications on every role transition.
	RoleChangedC chan RoleChanged
	// contains filtered or unexported fields
}

Watcher monitors primary liveness and coordinates role transitions.

  • On the primary instance: publishes periodic heartbeats and instance.shutdown on exit.
  • On secondary instances: watches for heartbeats and triggers promotion when none arrive.
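
"Non-blocking" notification on a channel is commonly implemented with a select that has a default case, so a slow or absent consumer never stalls the sender. The sketch below shows that pattern; the notify helper and the buffer size of 1 are assumptions, and the documentation does not say how RoleChangedC is buffered or whether undelivered events are dropped.

```go
package main

import "fmt"

// RoleChanged mirrors the struct documented above.
type RoleChanged struct {
	OldRole string
	NewRole string
}

// notify is a hypothetical helper: it attempts a send without blocking and
// reports whether the event was delivered.
func notify(ch chan RoleChanged, ev RoleChanged) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false // channel full or no receiver: the event is dropped
	}
}

func main() {
	ch := make(chan RoleChanged, 1)

	// The first send fits in the buffer; the second is dropped.
	fmt.Println(notify(ch, RoleChanged{OldRole: "secondary", NewRole: "primary"}))
	fmt.Println(notify(ch, RoleChanged{OldRole: "primary", NewRole: "secondary"}))

	ev := <-ch
	fmt.Printf("%s -> %s\n", ev.OldRole, ev.NewRole)
}
```

Under this pattern a consumer should treat each event as a hint and re-read the current role if it might have missed a transition.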

func NewWatcher

func NewWatcher(
	cfg Config,
	role, instanceID, workdir string,
	pubPort, rpcPort int,
	bus BusPublisher,
	client EventSubscriber,
	pubEndpoint string,
	onPromote PromoteFunc,
	onDemote func(ctx context.Context) error,
) *Watcher

NewWatcher creates a Watcher. Exactly one of bus or client must be non-nil: primary instances pass bus, secondary instances pass client with the primary's pubEndpoint.

func NewWatcherForPrimary

func NewWatcherForPrimary(
	cfg Config,
	instanceID, workdir string,
	pubPort, rpcPort int,
	bus BusPublisher,
) *Watcher

NewWatcherForPrimary is a convenience constructor that wires a primary Watcher with the Bus and no client. onPromote/onDemote are unused for the primary role.

func NewWatcherForSecondary

func NewWatcherForSecondary(
	cfg Config,
	instanceID, workdir string,
	pubPort, rpcPort int,
	client EventSubscriber,
	pubEndpoint string,
	onPromote PromoteFunc,
) *Watcher

NewWatcherForSecondary is a convenience constructor that wires a secondary Watcher with the IPC client subscribed to the primary's PUB endpoint. onPromote may be nil; it can be set later via SetPromoteCallback.

func (*Watcher) CheckAndMaybeFailover added in v0.326.0

func (w *Watcher) CheckAndMaybeFailover(ctx context.Context) error

CheckAndMaybeFailover performs an immediate active probe of the primary and, if the primary is unreachable and auto-failover is enabled, triggers the promotion sequence. It is designed to be called on a secondary before processing each user prompt. It returns nil if the primary is alive or if this instance is already the primary, and returns an error only if the probe fails and the promotion also fails.

func (*Watcher) FormatStatus

func (w *Watcher) FormatStatus() string

FormatStatus returns a human-readable description of the watcher state for diagnostics.

func (*Watcher) LastHeartbeat

func (w *Watcher) LastHeartbeat() time.Time

LastHeartbeat returns the time of the most recent heartbeat seen by this secondary. Returns zero time on primary instances.

func (*Watcher) SetEnabled

func (w *Watcher) SetEnabled(enabled bool)

SetEnabled enables or disables automatic failover at runtime. Safe for concurrent use.
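
How the Watcher achieves concurrency safety is not documented. One common approach is an atomic flag, sketched below with sync/atomic's Bool type (available since Go 1.19); failoverSwitch is a hypothetical stand-in, not the Watcher's real field layout.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// failoverSwitch is a hypothetical stand-in for the Watcher's enabled flag.
type failoverSwitch struct {
	enabled atomic.Bool
}

// SetEnabled stores the flag atomically, so no mutex is needed.
func (s *failoverSwitch) SetEnabled(v bool) { s.enabled.Store(v) }

// Enabled loads the flag atomically.
func (s *failoverSwitch) Enabled() bool { return s.enabled.Load() }

func main() {
	var s failoverSwitch
	s.SetEnabled(true)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Enabled() // concurrent readers
		}()
	}
	s.SetEnabled(false) // concurrent writer
	wg.Wait()

	fmt.Println(s.Enabled())
}
```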

func (*Watcher) SetProbePrimary added in v0.326.0

func (w *Watcher) SetProbePrimary(fn ProbePrimaryFunc)

SetProbePrimary registers an active-probe function used by the 60-second background ticker and by CheckAndMaybeFailover for per-prompt liveness checks.

func (*Watcher) SetPromoteCallback added in v0.326.0

func (w *Watcher) SetPromoteCallback(fn PromoteFunc)

SetPromoteCallback replaces the promotion callback after construction. Must be called before Start() to take effect.

func (*Watcher) Shutdown

func (w *Watcher) Shutdown(ctx context.Context)

Shutdown stops the watcher. On primary instances it first publishes instance.shutdown. Safe to call even if Start was never called.

func (*Watcher) Start

func (w *Watcher) Start(ctx context.Context)

Start begins monitoring in the background. For the primary it publishes heartbeats; for secondaries it watches for them and triggers promotion on absence.
