failover

v0.1.0
Published: Mar 8, 2026 License: MIT Imports: 11 Imported by: 0

README

Single-leader election for distributed services backed by Redis.
Ensures exactly one node runs the workload at any time with automatic failover.

Features

  • 🏆 Single-leader election — exactly one node is active at any time
  • Automatic failover — new leader elected within seconds if active node dies
  • 🎯 Two strategies — Race (first wins) and Priority (lower number = higher priority)
  • 🔑 Fencing tokens — monotonically increasing revision prevents stale writes
  • 📊 Cluster stats — live nodes, current leader, recent events
  • 🔴 Redis Cluster ready — all keys use hash-tags for same-slot guarantee
  • 🛡️ Panic-safe callbacks — OnStart and OnStop are wrapped in recover

Install

go get github.com/OpexDevelop/go-failover

Quick Start

package main

import (
    "context"
    "fmt"
    "os/signal"
    "syscall"

    "github.com/OpexDevelop/go-failover"
)

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(),
        syscall.SIGINT, syscall.SIGTERM)
    defer cancel()

    cluster, err := failover.Run(ctx, failover.Config{
        Project:  "myservice",
        RedisURL: "redis://localhost:6379",
        OnStart: func(ctx context.Context) error {
            fmt.Println("I am the leader!")
            <-ctx.Done()
            return nil
        },
        OnStop: func() {
            fmt.Println("Lost leadership")
        },
    })
    if err != nil {
        panic(err)
    }
    defer cluster.Close()

    cluster.Wait()
}

Strategies

Race (default)

Any node can become leader — whoever acquires the Redis key first wins.

failover.Run(ctx, failover.Config{
    Strategy: failover.Race,
    // ...
})

Priority

Nodes with lower priority number take precedence.
If a higher-priority node comes online, the current leader yields automatically.

// This node will yield to any node with priority < 2
failover.Run(ctx, failover.Config{
    Strategy:     failover.Priority,
    NodePriority: 2,
    // ...
})
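
The yield rule can be pictured with a small sketch. This is not the library's code: the `peer` type and the `shouldYield` helper are hypothetical names, and the only assumption taken from the text above is that a strictly lower priority number wins.

```go
package main

import "fmt"

// peer is a hypothetical view of another live node in the cluster.
type peer struct {
	NodeID   string
	Priority int
}

// shouldYield reports whether a node with priority self should give way:
// it yields if any live peer has a strictly lower priority number.
func shouldYield(self int, peers []peer) bool {
	for _, p := range peers {
		if p.Priority < self {
			return true
		}
	}
	return false
}

func main() {
	peers := []peer{{"node-a", 1}, {"node-c", 3}}
	fmt.Println(shouldYield(2, peers)) // true: node-a has priority 1
	fmt.Println(shouldYield(1, peers)) // false: nobody outranks priority 1
}
```

With this rule, two nodes that share the same priority number never force each other out, so the outcome between them falls back to whoever acquired the key first.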

Callbacks

OnStart

Called in a separate goroutine when this node becomes leader.
Receives a context.Context that is cancelled when leadership is lost.

OnStart: func(ctx context.Context) error {
    // start your work here
    <-ctx.Done() // block until leadership is lost
    return nil
},

Important: OnStart MUST respect the context and return promptly when ctx.Done() fires.
If OnStart returns a non-nil error, the node steps down immediately and waits LeaseTTL * 2 before retrying.
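
A periodic job is a common shape for leader work. The sketch below shows one way to write an OnStart body that returns promptly on cancellation and surfaces failures as a non-nil error. The names `leaderWork`, `runFor`, and `errStep` are illustrative and not part of the package; `runFor` only simulates holding leadership by using a timeout context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errStep is a made-up failure used to show the error path.
var errStep = errors.New("step failed")

// leaderWork has the shape of an OnStart body: it runs step every interval
// and returns nil as soon as ctx is cancelled. A step error is returned
// immediately, which the text above says makes the node step down.
func leaderWork(ctx context.Context, interval time.Duration, step func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil // leadership lost: stop promptly
		case <-ticker.C:
			if err := step(); err != nil {
				return err
			}
		}
	}
}

// runFor simulates holding leadership for the given lifetime.
func runFor(lifetime, interval time.Duration, step func() error) error {
	ctx, cancel := context.WithTimeout(context.Background(), lifetime)
	defer cancel()
	return leaderWork(ctx, interval, step)
}

func main() {
	ok := func() error { return nil }
	bad := func() error { return errStep }
	fmt.Println(runFor(50*time.Millisecond, 10*time.Millisecond, ok))
	fmt.Println(runFor(time.Second, 10*time.Millisecond, bad))
}
```

In a real callback, `leaderWork(ctx, interval, step)` would be called with the context that OnStart receives. Any blocking call inside `step` should also take that context, otherwise the loop cannot exit until the call returns.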

OnStop

Called when leadership is lost, after OnStart has returned.

OnStop: func() {
    // cleanup
},

Important: OnStop MUST complete within 1 second or the goroutine is abandoned.
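
One way to stay inside the one-second limit is to order cleanup steps by importance and skip whatever is left once a self-imposed budget is spent. The helper below is a sketch of that idea; `runWithinBudget` and the 800ms figure are assumptions for illustration, not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

// runWithinBudget runs steps in order and stops starting new ones once the
// budget has elapsed. It returns how many steps ran. It cannot interrupt a
// step that is already running, so each step must be short on its own.
func runWithinBudget(budget time.Duration, steps ...func()) int {
	deadline := time.Now().Add(budget)
	done := 0
	for _, step := range steps {
		if time.Now().After(deadline) {
			break
		}
		step()
		done++
	}
	return done
}

func main() {
	// An OnStop-shaped closure that leaves headroom under the 1 second limit.
	onStop := func() {
		done := runWithinBudget(800*time.Millisecond,
			func() { fmt.Println("flush buffers") },
			func() { fmt.Println("close connections") },
		)
		fmt.Println("steps completed:", done)
	}
	onStop()
}
```

Slow or optional cleanup is better moved out of OnStop entirely, for example into the code that runs after OnStart returns.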

Config

| Field | Type | Default | Description |
|---|---|---|---|
| Project | string | required | Unique name, used as Redis key prefix |
| NodeID | string | hostname | Unique identifier for this node |
| NodePriority | int | 1 | Lower = higher priority (Priority strategy only) |
| Strategy | Strategy | Race | Election mode: Race or Priority |
| RedisURL | string | - | Redis connection string (ignored when RedisOptions is set) |
| RedisOptions | *redis.Options | - | Full Redis config (overrides RedisURL) |
| LeaseTTL | time.Duration | 3s | Leader key expiration. Minimum: 100ms |
| TickInterval | time.Duration | 500ms | How often each node checks state |
| MaxDrift | time.Duration | LeaseTTL/2 | Max time without refresh before step-down |
| OnStart | func(context.Context) error | - | Called when node becomes leader |
| OnStop | func() | - | Called when node loses leadership |
| Logger | *slog.Logger | slog.Default() | Structured logger |
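
The timing defaults depend on each other, so it can help to see them resolved together. The sketch below mirrors the three duration fields in a local `timing` struct and fills them in the way the table describes. It is not the package's validation code: `timing`, `withDefaults`, and the assumption that a zero value means "use the default" are all illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// timing mirrors the three duration fields of Config for illustration only.
type timing struct {
	LeaseTTL     time.Duration
	TickInterval time.Duration
	MaxDrift     time.Duration
}

// withDefaults fills zero fields with the documented defaults and rejects a
// LeaseTTL below the documented 100ms minimum.
func withDefaults(t timing) (timing, error) {
	if t.LeaseTTL == 0 {
		t.LeaseTTL = 3 * time.Second
	}
	if t.LeaseTTL < 100*time.Millisecond {
		return t, errors.New("LeaseTTL is below the 100ms minimum")
	}
	if t.TickInterval == 0 {
		t.TickInterval = 500 * time.Millisecond
	}
	if t.MaxDrift == 0 {
		t.MaxDrift = t.LeaseTTL / 2 // default derives from LeaseTTL
	}
	return t, nil
}

func main() {
	def, _ := withDefaults(timing{})
	fmt.Println(def.LeaseTTL, def.TickInterval, def.MaxDrift)

	custom, _ := withDefaults(timing{LeaseTTL: 10 * time.Second})
	fmt.Println(custom.MaxDrift)
}
```

With the defaults, a leader gets roughly three ticks inside MaxDrift (1.5s at a 500ms tick) to refresh its lease before it steps down, and the key itself expires after 3s.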

Fencing Tokens

Each election increments a monotonically increasing revision stored in Redis.
Use it to reject stale writes in your storage layer.

revision := cluster.Revision() // 0 if not leader
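
The check on the storage side can be as small as remembering the highest revision accepted so far. The sketch below uses a hypothetical in-memory `FencedStore`; in practice the same comparison would live in whatever database or service receives the writes. Treating revision 0 as invalid is an assumption based on Revision() returning 0 when the node is not the leader.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStaleRevision is returned when a write carries an older fencing token
// than one the store has already accepted.
var ErrStaleRevision = errors.New("stale revision")

// FencedStore is a hypothetical in-memory store used only for illustration.
type FencedStore struct {
	mu      sync.Mutex
	highest int64
	data    map[string]string
}

func NewFencedStore() *FencedStore {
	return &FencedStore{data: make(map[string]string)}
}

// Put applies the write only if revision is positive and not older than the
// highest revision seen so far.
func (s *FencedStore) Put(revision int64, key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if revision <= 0 || revision < s.highest {
		return ErrStaleRevision
	}
	s.highest = revision
	s.data[key] = value
	return nil
}

func main() {
	store := NewFencedStore()
	fmt.Println(store.Put(2, "job", "from new leader")) // <nil>
	fmt.Println(store.Put(1, "job", "from old leader")) // stale revision
}
```

A caller would pass `cluster.Revision()` as the first argument on every write, so a leader that was paused and lost its lease is rejected once a newer leader has written.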

Stats

stats, err := cluster.GetStats(ctx)
// stats.CurrentLeader   — current leader node ID
// stats.CurrentRevision — current fencing token
// stats.LiveNodes       — all alive nodes with priority and last seen
// stats.RecentEvents    — last 100 leadership events

Cluster Info

info := cluster.NodeInfo()
// info.NodeID, info.Priority, info.IsLeader, info.Revision

isLeader := cluster.IsLeader()
revision := cluster.Revision()

Examples

See _examples/telegram-bot for a full example of a
Telegram bot that runs only on the leader node with automatic failover.

How It Works

Node A          Node B          Redis
  |               |               |
  |-- SET NX -------------------->|
  |               |<-- exists ----|
  |   (leader)    |  (follower)   |
  |               |               |
  |-- PEXPIRE ------------------->|  refresh every tick
  |               |               |
  |   (crash)     |               |
  |               |-- SET NX ---->|  key expired
  |               |   (leader)    |  new leader elected

  1. Each node ticks every TickInterval (±20% jitter)
  2. Follower checks if leader key exists → tries SET NX if not
  3. Leader refreshes key TTL every tick
  4. If refresh fails or drift exceeds MaxDrift → step down
  5. Priority strategy: follower checks for higher-priority alive nodes before acquiring
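
Steps 1 and 4 above reduce to two small calculations, sketched here. The helper names `jitter` and `mustStepDown` are hypothetical, and the way the random offset is drawn is an assumption; the package may compute jitter differently.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitter scales interval to between 80% and 120% of its value.
// offset is expected to be a random integer in [0, 40].
func jitter(interval time.Duration, offset int) time.Duration {
	return interval * time.Duration(80+offset) / 100
}

// mustStepDown applies step 4: a leader steps down if the refresh failed or
// if too much time has passed since the last successful refresh.
func mustStepDown(refreshOK bool, sinceLastRefresh, maxDrift time.Duration) bool {
	return !refreshOK || sinceLastRefresh > maxDrift
}

func main() {
	tick := 500 * time.Millisecond
	fmt.Println("next tick in", jitter(tick, rand.Intn(41)))

	maxDrift := 1500 * time.Millisecond
	fmt.Println(mustStepDown(true, 600*time.Millisecond, maxDrift)) // false
	fmt.Println(mustStepDown(true, 2*time.Second, maxDrift))        // true
	fmt.Println(mustStepDown(false, 100*time.Millisecond, maxDrift)) // true
}
```

Jitter keeps nodes from hitting Redis in lockstep, and the drift check makes a stalled leader give up before followers can see the key expire.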

Requirements

  • Go 1.21+
  • Redis 6.0+

License

MIT — see LICENSE

Documentation

Example

package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"

	"github.com/OpexDevelop/go-failover"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	cluster, err := failover.Run(ctx, failover.Config{
		Project:      "myservice",
		NodePriority: 1,
		Strategy:     failover.Priority,
		RedisURL:     os.Getenv("REDIS_URL"),
		OnStart: func(ctx context.Context) error {
			fmt.Println("I am the leader now!")
			// Block until leadership is lost
			<-ctx.Done()
			fmt.Println("Leadership context cancelled, cleaning up...")
			return nil
		},
		OnStop: func() {
			fmt.Println("I lost leadership")
		},
	})
	if err != nil {
		panic(err)
	}
	defer cluster.Close()

	cluster.Wait()
}

Functions

func FormatStats

func FormatStats(s Stats) string

FormatStats returns a human-readable HTML string (suitable for Telegram).

func Hostname

func Hostname() string

Hostname returns the machine hostname (convenience wrapper).

Types

type Cluster

type Cluster struct {
	// contains filtered or unexported fields
}

Cluster is a running failover instance.

func Run

func Run(ctx context.Context, cfg Config) (*Cluster, error)

Run starts the failover loop. It connects to Redis, validates config, and begins leader election in a background goroutine. The returned Cluster is safe for concurrent use.

func (*Cluster) Close

func (cl *Cluster) Close() error

Close waits for the internal loop to finish and releases the Redis connection. Safe to call multiple times — subsequent calls return the same error as the first call. The context passed to Run must be cancelled before calling Close.
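
The "same error on every call" behaviour is a common idempotent-close pattern, and it can be reproduced in application code with sync.Once. The sketch below is not taken from the package; `onceCloser`, `newOnceCloser`, and `errDemo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDemo is a made-up error used to show that the first result is cached.
var errDemo = errors.New("demo close error")

// onceCloser runs release exactly once and remembers its result.
type onceCloser struct {
	once    sync.Once
	err     error
	release func() error
}

func newOnceCloser(release func() error) *onceCloser {
	return &onceCloser{release: release}
}

// Close is safe to call multiple times and from multiple goroutines; every
// call returns the error produced by the first one.
func (c *onceCloser) Close() error {
	c.once.Do(func() { c.err = c.release() })
	return c.err
}

func main() {
	c := newOnceCloser(func() error {
		fmt.Println("releasing resources")
		return errDemo
	})
	fmt.Println(c.Close())
	fmt.Println(c.Close()) // release is not run again
}
```

This is why pairing `defer cluster.Close()` with an explicit Close elsewhere in shutdown code is harmless.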

func (*Cluster) GetStats

func (cl *Cluster) GetStats(ctx context.Context) (Stats, error)

GetStats returns a snapshot of the cluster.

func (*Cluster) IsLeader

func (cl *Cluster) IsLeader() bool

IsLeader reports whether this node is currently the leader.

func (*Cluster) NodeInfo

func (cl *Cluster) NodeInfo() NodeInfo

NodeInfo returns information about the current node.

func (*Cluster) Revision

func (cl *Cluster) Revision() int64

Revision returns the current fencing token (monotonically increasing). Returns 0 if this node is not the leader.

func (*Cluster) Wait

func (cl *Cluster) Wait()

Wait blocks until the internal loop exits.

type Config

type Config struct {
	// Project is a unique name used as Redis key prefix. Required.
	Project string

	// NodeID uniquely identifies this node. Defaults to hostname.
	NodeID string

	// NodePriority is the priority of this node (lower = higher priority).
	// Only meaningful with Strategy = Priority. Defaults to 1.
	NodePriority int

	// Strategy selects election mode. Default: Race.
	Strategy Strategy

	// RedisURL is the connection string for Redis.
	// Ignored when RedisOptions is set.
	RedisURL string

	// RedisOptions allows full control over the Redis connection.
	// When set, RedisURL is ignored.
	RedisOptions *redis.Options

	// TickInterval is how often the node checks state. Default: 500ms.
	// A random jitter of ±20% is applied to each tick.
	TickInterval time.Duration

	// LeaseTTL is the leader key expiration. Default: 3s. Minimum: 100ms.
	LeaseTTL time.Duration

	// MaxDrift is max time without successful refresh before step-down.
	// Default: LeaseTTL / 2.
	MaxDrift time.Duration

	// OnStart is called in a separate goroutine when this node becomes leader.
	// The context is cancelled when leadership is lost — OnStart MUST return
	// promptly when ctx.Done() fires. If OnStart returns a non-nil error,
	// the node steps down immediately.
	OnStart func(ctx context.Context) error

	// OnStop is called when this node loses leadership, after OnStart returns.
	// Must complete within 1 second or it is abandoned (goroutine leak).
	OnStop func()

	// Logger is an optional structured logger. If nil, slog.Default() is used.
	Logger *slog.Logger
}

Config configures the failover election.

type Event

type Event struct {
	NodeID    string    `json:"node_id"`
	Action    string    `json:"action"`
	Timestamp time.Time `json:"timestamp"`
}

Event represents a recorded leadership event.

type LiveNode

type LiveNode struct {
	NodeID   string    `json:"node_id"`
	Priority int       `json:"priority"`
	IsLeader bool      `json:"is_leader"`
	LastSeen time.Time `json:"last_seen"`
}

LiveNode represents a currently alive node.

type NodeInfo

type NodeInfo struct {
	NodeID   string `json:"node_id"`
	Priority int    `json:"priority"`
	Project  string `json:"project"`
	IsLeader bool   `json:"is_leader"`
	Revision int64  `json:"revision"`
}

NodeInfo contains information about the current node.

type Stats

type Stats struct {
	Project         string     `json:"project"`
	CurrentLeader   string     `json:"current_leader"`
	CurrentRevision int64      `json:"current_revision"`
	LiveNodes       []LiveNode `json:"live_nodes"`
	RecentEvents    []Event    `json:"recent_events"`
}

Stats provides a snapshot of the cluster.

type Strategy

type Strategy int

Strategy determines how leader election works.

const (
	// Race — any node can become leader, whoever acquires first wins.
	Race Strategy = iota
	// Priority — nodes with lower priority number take precedence.
	Priority
)
