leader

package
v0.0.0-...-54ed9d2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 17, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Overview

Package leader – etcd-backed leader election with fencing tokens.

EtcdElection implements distributed leader election over a real etcd cluster using etcd sessions + elections (go.etcd.io/etcd/client/v3/concurrency). It exposes the same fencing-token contract as the in-memory LeaseRegistry: every successful leadership acquisition issues a strictly-increasing epoch, and a stale leader can be detected by any consumer that tracks the highest epoch it has observed.

Package leader provides distributed leader election utilities for Helix Cluster OS.


Constants

const EpochNamespace = "/helix/leader-epoch"

EpochNamespace is the parallel etcd key root under which durable fencing-epoch counters are stored. It is deliberately disjoint from any election key prefix so concurrency.Election's WithPrefix(ElectionKey) scan never matches an epoch key (which would otherwise hang Campaign — see NewEtcdElection).
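The prefix argument can be illustrated with plain string matching, since an etcd prefix range is a byte-prefix match on keys. This is a minimal sketch: matchedByElectionScan and both epoch key layouts are hypothetical, and the package's real key derivation is not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchedByElectionScan reports whether key would be returned by a prefix
// scan rooted at electionKey. strings.HasPrefix stands in for etcd's
// byte-prefix range; this helper is illustrative, not part of the package.
func matchedByElectionScan(electionKey, key string) bool {
	return strings.HasPrefix(key, electionKey)
}

func main() {
	const electionKey = "/helix/election/scheduler"

	// Hypothetical layouts for an epoch counter key.
	nested := electionKey + "/epoch"                 // lives under the election prefix
	disjoint := "/helix/leader-epoch" + "/scheduler" // lives under EpochNamespace

	fmt.Println(matchedByElectionScan(electionKey, nested))   // true
	fmt.Println(matchedByElectionScan(electionKey, disjoint)) // false
}
```

A key that the scan matches would be visible to the election as if it were a candidate key, which is the situation the separate namespace is meant to rule out.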

Variables

var (
	ErrNoHealthyMembers = errors.New("no healthy members available for election")
	ErrNotLeader        = errors.New("this node is not the leader")
)
var ErrLeaseHeld = errors.New("leader: lease still held by an un-expired holder")

ErrLeaseHeld is returned by AcquireLease when a valid (un-expired) lease is still held — either by this node or, via a shared LeaseRegistry, by another node. Deterministic failover requires that a new candidate may acquire ONLY after the prior lease has expired.

Functions

func StaleLeader

func StaleLeader(tok FencingToken, highestSeenEpoch uint64) bool

StaleLeader reports whether tok is stale relative to highestSeenEpoch, the highest epoch the caller has observed. Consumers call this to reject requests from former leaders whose lease was revoked.

Types

type Clock

type Clock interface {
	Now() time.Time
}

Clock is an injectable time source. It exists so that leadership failover and TTL/lease expiry can be exercised deterministically in tests with a fake clock (no time.Sleep, no wall-clock dependence). The production default is realClock.
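A fake clock for tests might look like the sketch below. fakeClock, Advance, expired and expiredAfter are hypothetical names introduced for illustration; only the Clock interface comes from the package. The expiry check uses the "now >= expiry" rule stated in the AcquireLease documentation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Clock mirrors the interface documented above.
type Clock interface {
	Now() time.Time
}

// fakeClock is a hypothetical test double whose time only moves when
// Advance is called, so tests never sleep.
type fakeClock struct {
	mu  sync.Mutex
	now time.Time
}

func (c *fakeClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.now
}

// Advance moves the fake clock forward by d.
func (c *fakeClock) Advance(d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.now = c.now.Add(d)
}

// expired reports whether a lease with the given expiry is over (now >= expiry).
func expired(c Clock, expiry time.Time) bool {
	return !c.Now().Before(expiry)
}

// expiredAfter grants a lease of ttlSeconds on a fresh fake clock, advances
// the clock by elapsedSeconds, and reports whether the lease has expired.
func expiredAfter(ttlSeconds, elapsedSeconds int) bool {
	clk := &fakeClock{now: time.Unix(0, 0)}
	expiry := clk.Now().Add(time.Duration(ttlSeconds) * time.Second)
	clk.Advance(time.Duration(elapsedSeconds) * time.Second)
	return expired(clk, expiry)
}

func main() {
	fmt.Println(expiredAfter(5, 4)) // false: one second of lease remains
	fmt.Println(expiredAfter(5, 5)) // true: now == expiry counts as expired
}
```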

type Config

type Config struct {
	LocalID string
	TTL     time.Duration // default 5s
}

Config holds election configuration.

type Election

type Election struct {
	// contains filtered or unexported fields
}

Election implements a distributed leader election using SWIM membership. The leader is chosen deterministically: it is the member with the lowest SHA-256 hash of (memberID + term). The leader must renew its leadership within the TTL, or another node will take over.
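The lowest-hash rule can be sketched as follows. pickLeader is a hypothetical helper, and the way the term is encoded into the hash input (decimal digits appended to the member ID) is an assumption; the package's actual encoding is not documented here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"strconv"
)

// pickLeader returns the member whose SHA-256 digest of memberID+term is
// lexicographically lowest, or "" when members is empty.
// Assumption: the term is appended as decimal text.
func pickLeader(members []string, term uint64) string {
	var best string
	var bestSum [sha256.Size]byte
	for i, id := range members {
		sum := sha256.Sum256([]byte(id + strconv.FormatUint(term, 10)))
		if i == 0 || bytes.Compare(sum[:], bestSum[:]) < 0 {
			best, bestSum = id, sum
		}
	}
	return best
}

func main() {
	members := []string{"node-a", "node-b", "node-c"}
	for term := uint64(1); term <= 3; term++ {
		fmt.Printf("term %d -> %s\n", term, pickLeader(members, term))
	}
}
```

Because every node hashes the same inputs, all nodes that agree on the healthy member set and the term arrive at the same winner without exchanging messages, and mixing the term into the hash lets the winner vary from term to term.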

func NewElection

func NewElection(cfg *Config, opts ...Option) (*Election, error)

NewElection creates a new distributed Election. Additional behavior may be supplied via Options (e.g. WithClock); with no options the construction is identical to the original API.

func (*Election) AcquireLease

func (e *Election) AcquireLease() (FencingToken, error)

AcquireLease attempts to acquire (or renew) the leadership lease and returns a fresh FencingToken on success.

Hardening guarantees:

  • Fencing token: every successful acquisition issues a strictly increasing epoch, so stale leaders are detectable by consumers (see FencingToken.IsStale).
  • Deterministic failover: a new candidate may acquire ONLY after the current lease has expired (now >= expiry). A second acquire before expiry by a different node is rejected with ErrLeaseHeld and burns no epoch.
  • Split-brain guard: when a shared LeaseRegistry is attached, the registry enforces a single valid holder per clock instant across all nodes; a contending node cannot acquire while another holds an un-expired lease.
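The failover rule above can be sketched with a single-lease table driven by explicit timestamps. registry, token, acquire and failover are hypothetical stand-ins, not the package's types. The sketch also assumes that the current holder may renew before expiry and receives a new epoch when it does; the real AcquireLease may treat that case differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errLeaseHeld = errors.New("lease still held by an un-expired holder")

// token is a local stand-in for FencingToken.
type token struct {
	HolderID string
	Epoch    uint64
	Expiry   time.Time
}

// registry is a hypothetical single-lease table (not goroutine-safe).
type registry struct {
	current token
	epoch   uint64
}

// acquire grants the lease to holder at instant now. A different holder is
// rejected while the current lease is un-expired (now < expiry), and a
// rejected attempt leaves the epoch counter untouched.
// Assumption: the current holder may renew early and gets a new epoch.
func (r *registry) acquire(holder string, now time.Time, ttl time.Duration) (token, error) {
	held := r.current.HolderID != "" && now.Before(r.current.Expiry)
	if held && r.current.HolderID != holder {
		return token{}, errLeaseHeld
	}
	r.epoch++
	r.current = token{HolderID: holder, Epoch: r.epoch, Expiry: now.Add(ttl)}
	return r.current, nil
}

// failover replays a two-node scenario: node-a acquires at t=0 with a 5s TTL,
// node-b tries at t=4s (too early) and again at t=5s (lease expired).
func failover() (first, second uint64, early error) {
	r := &registry{}
	t0 := time.Unix(0, 0)
	ttl := 5 * time.Second
	a, _ := r.acquire("node-a", t0, ttl)
	_, early = r.acquire("node-b", t0.Add(4*time.Second), ttl)
	b, _ := r.acquire("node-b", t0.Add(5*time.Second), ttl)
	return a.Epoch, b.Epoch, early
}

func main() {
	first, second, early := failover()
	fmt.Println(first, second, early) // 1 2 lease still held by an un-expired holder
}
```

The rejected attempt at t=4s burns no epoch, so node-b's later success receives epoch 2 rather than 3.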

func (*Election) BecomeLeader

func (e *Election) BecomeLeader() error

BecomeLeader attempts to become leader by checking deterministic ordering. If this node is the lowest hash among healthy members, it becomes leader.

func (*Election) Epoch

func (e *Election) Epoch() uint64

Epoch returns the most recent fencing epoch issued to this node (0 if it has never acquired a lease).

func (*Election) IsLeader

func (e *Election) IsLeader() bool

IsLeader returns true if this instance is currently the leader.

func (*Election) LeaderID

func (e *Election) LeaderID() string

LeaderID returns the current leader's ID (may be empty if unknown).

func (*Election) LeaseValid

func (e *Election) LeaseValid() bool

LeaseValid reports whether this node currently holds a valid (un-expired) lease, evaluated against the injectable clock. When a shared registry is attached, validity is authoritative against the registry (so a node whose lease was taken over by a peer correctly reports false), preventing split-brain.

func (*Election) Resign

func (e *Election) Resign()

Resign gives up leadership.

func (*Election) Run

func (e *Election) Run(ctx context.Context)

Run starts a background goroutine that periodically attempts to become leader and resigns if the TTL expires without renewal.

func (*Election) SetLeaseRegistry

func (e *Election) SetLeaseRegistry(r *LeaseRegistry)

SetLeaseRegistry attaches a shared registry used as the cross-node split-brain guard. Elections sharing the same registry contend for one logical lease.

func (*Election) SetSWIMProtocol

func (e *Election) SetSWIMProtocol(p *swim.Protocol)

SetSWIMProtocol attaches a SWIM protocol for cluster membership awareness.

func (*Election) Stop

func (e *Election) Stop()

Stop stops the background goroutine.

type EtcdElection

type EtcdElection struct {
	// contains filtered or unexported fields
}

EtcdElection is a distributed leader election backed by real etcd sessions and the concurrency.Election primitive. It satisfies HXC-1093.

func NewEtcdElection

func NewEtcdElection(cfg EtcdElectionConfig, backend EtcdElectionBackend, epochStore EtcdEpochStore) (*EtcdElection, error)

NewEtcdElection creates a new EtcdElection.

backend is the injectable seam for the election primitive. Pass nil to force the production path, although production callers should use NewEtcdElectionFromClient instead. epochStore is the injectable seam for epoch persistence; pass nil only in tests whose injected backend manages epoch counting itself.

func NewEtcdElectionFromClient

func NewEtcdElectionFromClient(ctx context.Context, cli *clientv3.Client, cfg EtcdElectionConfig, ttl int) (*EtcdElection, error)

NewEtcdElectionFromClient creates a production EtcdElection from a real etcd client. It creates a new session and election under cfg.ElectionKey.

func (*EtcdElection) Campaign

func (ee *EtcdElection) Campaign(ctx context.Context, runID string) (FencingToken, error)

Campaign blocks until this node wins the election, then records the fencing token and sets IsLeader. The per-run runID is embedded in the leader value (etcd key payload) for sink-side evidence.

func (*EtcdElection) Close

func (ee *EtcdElection) Close() error

Close releases the underlying session.

func (*EtcdElection) CurrentEpoch

func (ee *EtcdElection) CurrentEpoch() uint64

CurrentEpoch returns the most recently issued fencing epoch.

func (*EtcdElection) CurrentToken

func (ee *EtcdElection) CurrentToken() FencingToken

CurrentToken returns the most recently issued fencing token.

func (*EtcdElection) IsLeader

func (ee *EtcdElection) IsLeader() bool

IsLeader returns true if this node currently holds the etcd election.

func (*EtcdElection) LeaderID

func (ee *EtcdElection) LeaderID() string

LeaderID returns the current leader node ID (empty if unknown).

func (*EtcdElection) QueryLeader

func (ee *EtcdElection) QueryLeader(ctx context.Context) (LeaderValue, error)

QueryLeader reads the current leader's value from etcd and parses it. Returns an error if no leader exists.

func (*EtcdElection) Resign

func (ee *EtcdElection) Resign(ctx context.Context) error

Resign gives up leadership.

type EtcdElectionBackend

type EtcdElectionBackend interface {
	// Campaign blocks until this candidate wins the election, then writes value
	// as the leader value.
	Campaign(ctx context.Context, value string) error
	// Resign gives up leadership.
	Resign(ctx context.Context) error
	// Leader returns the current leader's KV, or an error if there is none.
	Leader(ctx context.Context) (*clientv3.GetResponse, error)
	// Observe returns a channel that fires whenever leadership changes.
	Observe(ctx context.Context) <-chan clientv3.GetResponse
	// LeaseID returns the underlying session lease ID (used in FencingToken).
	LeaseID() clientv3.LeaseID
	// Close releases the session.
	Close() error
}

EtcdElectionBackend is the injectable seam that isolates EtcdElection from the real clientv3/concurrency library during unit tests. The concurrency.Session + concurrency.Election pair satisfies this interface in production; a fake satisfies it in unit tests.

type EtcdElectionConfig

type EtcdElectionConfig struct {
	// LocalID uniquely identifies this candidate node.
	LocalID string
	// ElectionKey is the etcd key prefix under which the election runs,
	// e.g. "/helix/election/scheduler".
	ElectionKey string
	// EpochKey is the etcd key used to store the durable fencing epoch counter,
	// e.g. "/helix/election/scheduler/epoch". If empty, defaults to ElectionKey+"/epoch".
	EpochKey string
}

EtcdElectionConfig configures an EtcdElection.

type EtcdEpochStore

type EtcdEpochStore interface {
	// IncrementAndGet atomically increments the epoch stored at epochKey and
	// returns the new value. It MUST be strictly monotonic.
	IncrementAndGet(ctx context.Context, epochKey string) (uint64, error)
	// Get reads the current epoch (0 if not yet written).
	Get(ctx context.Context, epochKey string) (uint64, error)
}

EtcdEpochStore persists and retrieves the monotonic fencing epoch in etcd. This makes the epoch durable across restarts: a new leader reads the current epoch from etcd and increments it, so the token is guaranteed to be strictly greater even after a crash-restart cycle.
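For unit tests, an in-memory double that satisfies this interface might look like the sketch below. memEpochStore, newMemEpochStore and bump are hypothetical names. The double keeps counters in process memory, so it provides the monotonic contract but none of the durability that an etcd-backed store gives across restarts.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// EtcdEpochStore mirrors the interface documented above.
type EtcdEpochStore interface {
	IncrementAndGet(ctx context.Context, epochKey string) (uint64, error)
	Get(ctx context.Context, epochKey string) (uint64, error)
}

// memEpochStore is a hypothetical in-memory test double. It is NOT durable.
type memEpochStore struct {
	mu     sync.Mutex
	epochs map[string]uint64
}

// Compile-time check that the double satisfies the interface.
var _ EtcdEpochStore = (*memEpochStore)(nil)

func newMemEpochStore() *memEpochStore {
	return &memEpochStore{epochs: make(map[string]uint64)}
}

// IncrementAndGet bumps the counter under a mutex so concurrent callers
// always observe strictly increasing values.
func (s *memEpochStore) IncrementAndGet(ctx context.Context, epochKey string) (uint64, error) {
	if err := ctx.Err(); err != nil {
		return 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.epochs[epochKey]++
	return s.epochs[epochKey], nil
}

// Get returns the current epoch, or 0 if the key was never written.
func (s *memEpochStore) Get(ctx context.Context, epochKey string) (uint64, error) {
	if err := ctx.Err(); err != nil {
		return 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.epochs[epochKey], nil
}

// bump increments key n times and returns the last value issued.
func bump(s *memEpochStore, key string, n int) uint64 {
	var last uint64
	for i := 0; i < n; i++ {
		last, _ = s.IncrementAndGet(context.Background(), key)
	}
	return last
}

func main() {
	s := newMemEpochStore()
	fmt.Println(bump(s, "/helix/leader-epoch/scheduler", 3)) // 3
	got, _ := s.Get(context.Background(), "/helix/leader-epoch/unused")
	fmt.Println(got) // 0
}
```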

type FencingToken

type FencingToken struct {
	// HolderID is the node that acquired the lease.
	HolderID string
	// Epoch is the monotonic fencing number. Strictly increases across the whole
	// election lifetime; never reused.
	Epoch uint64
	// Expiry is the clock time at which this lease becomes invalid.
	Expiry time.Time
}

FencingToken is the monotonically increasing credential issued on each successful leadership acquisition. Consumers attach the token they observed from a leader to their writes; a downstream resource that tracks the highest epoch it has seen can reject any request carrying a strictly lower epoch, fencing out a stale (e.g. paused or partitioned) former leader.

func (FencingToken) IsStale

func (t FencingToken) IsStale(highestSeen uint64) bool

IsStale reports whether this token is older than highestSeen, i.e. a newer leadership epoch has been issued since this token. A consumer that rejects stale tokens prevents a former leader from acting after losing leadership.
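On the consumer side, fencing amounts to remembering the highest epoch seen and rejecting anything lower. In this sketch FencingToken is a trimmed local copy (Expiry omitted), IsStale is assumed to be the comparison Epoch < highestSeen, and sink with its accept method is a hypothetical downstream resource.

```go
package main

import "fmt"

// FencingToken is a trimmed local stand-in for the documented type.
type FencingToken struct {
	HolderID string
	Epoch    uint64
}

// IsStale is assumed here to mean "strictly lower than the highest epoch seen".
func (t FencingToken) IsStale(highestSeen uint64) bool {
	return t.Epoch < highestSeen
}

// sink is a hypothetical downstream resource that fences writes by epoch.
type sink struct {
	highest uint64
}

// accept rejects stale tokens and otherwise records the token's epoch as the
// new high-water mark.
func (s *sink) accept(t FencingToken) bool {
	if t.IsStale(s.highest) {
		return false
	}
	s.highest = t.Epoch
	return true
}

func main() {
	s := &sink{}
	oldLeader := FencingToken{HolderID: "node-a", Epoch: 1}
	newLeader := FencingToken{HolderID: "node-b", Epoch: 2}

	fmt.Println(s.accept(oldLeader)) // true: first epoch observed
	fmt.Println(s.accept(newLeader)) // true: higher epoch takes over
	fmt.Println(s.accept(oldLeader)) // false: node-a is fenced out
}
```

A token whose epoch equals the high-water mark is still accepted, so the current leader can keep writing with the same token.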

type LeaderValue

type LeaderValue struct {
	HolderID string `json:"holder_id"`
	Epoch    uint64 `json:"epoch"`
	RunID    string `json:"run_id"` // per-run UUID for anti-bluff evidence
}

LeaderValue is the JSON payload written to etcd by the winning candidate. Embedding the per-run UUID and epoch here makes the sink-side value inspectable.
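With the struct tags shown above, the payload round-trips through encoding/json as in the sketch below. encode and decode are hypothetical helpers and the field values are made up; whether the package serializes the value in exactly this way is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LeaderValue mirrors the documented struct and its JSON tags.
type LeaderValue struct {
	HolderID string `json:"holder_id"`
	Epoch    uint64 `json:"epoch"`
	RunID    string `json:"run_id"`
}

// encode renders v as the JSON string a candidate would write as its leader value.
func encode(v LeaderValue) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// decode parses a leader value read back from the store.
func decode(s string) (LeaderValue, error) {
	var v LeaderValue
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}

func main() {
	s, err := encode(LeaderValue{HolderID: "node-a", Epoch: 7, RunID: "run-123"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"holder_id":"node-a","epoch":7,"run_id":"run-123"}

	v, err := decode(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(v.HolderID, v.Epoch, v.RunID) // node-a 7 run-123
}
```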

type LeaseRegistry

type LeaseRegistry struct {
	// contains filtered or unexported fields
}

LeaseRegistry is the shared split-brain guard. A single registry instance is shared (via SetLeaseRegistry) by all Election instances that contend for the SAME logical leadership. It guarantees that at most one holder owns a valid (un-expired) lease at any clock instant, and that each granted lease carries a strictly increasing epoch — so two nodes can never both hold a valid lease with the same epoch.

func NewLeaseRegistry

func NewLeaseRegistry() *LeaseRegistry

NewLeaseRegistry creates an empty registry.

type Option

type Option func(*Election)

Option configures an Election at construction time. Options are additive and optional, preserving the original single-argument NewElection call sites.

func WithClock

func WithClock(c Clock) Option

WithClock injects a custom time source (see Clock). This is the seam that makes failover and TTL/lease expiry testable deterministically with a fake clock, without any real time.Sleep or wall-clock dependence. If not supplied, the Election uses a real-time clock.
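The option mechanism follows Go's functional-options pattern, which can be sketched with local stand-ins. election, option, withClock, newElection, fixedClock and usesFixedClock below are hypothetical lowercase names, not the package's own types.

```go
package main

import (
	"fmt"
	"time"
)

// Clock mirrors the documented interface.
type Clock interface {
	Now() time.Time
}

// realClock is the default, wall-clock time source.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fixedClock always reports the same instant; handy in tests.
type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

// election and option are local stand-ins for Election and Option.
type election struct {
	clock Clock
}

type option func(*election)

// withClock returns an option that swaps the election's time source.
func withClock(c Clock) option {
	return func(e *election) { e.clock = c }
}

// newElection applies defaults first, then each option in order, so call
// sites that pass no options keep the default behavior.
func newElection(opts ...option) *election {
	e := &election{clock: realClock{}}
	for _, o := range opts {
		o(e)
	}
	return e
}

// usesFixedClock builds an election with a fixed clock and reports whether
// the election observes exactly that instant.
func usesFixedClock(unix int64) bool {
	want := time.Unix(unix, 0)
	e := newElection(withClock(fixedClock{t: want}))
	return e.clock.Now().Equal(want)
}

func main() {
	fmt.Println(usesFixedClock(42)) // true
	_, isReal := newElection().clock.(realClock)
	fmt.Println(isReal) // true
}
```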
