shutdown

package module
v0.2.0
Published: May 2, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 0

README

shutdown

Phased, parallel-within-phase, observable graceful shutdown manager for long-running Go services. Designed for Kubernetes, systemd, and any orchestrator that delivers SIGTERM with a grace period.

Zero third-party dependencies in the core. Adapter modules for Zap / slog / OpenTelemetry / Prometheus / ubgo/health / Gin / Chi / Echo / Fiber / uber-go/fx ship under contrib/.

Install

go get github.com/ubgo/shutdown

Quick start

package main

import (
    "context"
    "log"
    "time"

    "github.com/ubgo/shutdown"
)

func main() {
    mgr := shutdown.New(
        shutdown.WithBudget(30 * time.Second),
    )

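    // httpServer, natsConn, db, redisClient, and otelProvider are assumed to be
    // initialized elsewhere. Each function passed to Register must have the
    // HandlerFunc signature, func(ctx context.Context) error; a method with a
    // different signature (for example Close() error) needs a small closure.
    mgr.Register("http",  httpServer.Shutdown,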
        shutdown.WithPhase(shutdown.PhaseStopAccepting))
    mgr.Register("nats",  natsConn.Drain,
        shutdown.WithPhase(shutdown.PhaseDrainTraffic))
    mgr.Register("db",    db.Close,
        shutdown.WithPhase(shutdown.PhaseCloseClients))
    mgr.Register("redis", redisClient.Close,
        shutdown.WithPhase(shutdown.PhaseCloseClients)) // parallel with db
    mgr.Register("otel",  otelProvider.Shutdown,
        shutdown.WithPhase(shutdown.PhaseFlushLogs))

    if err := mgr.Listen(context.Background()); err != nil {
        log.Fatal(err)
    }
}

Listen blocks until SIGTERM or SIGINT arrives, then runs every registered handler in phase order. Handlers in the same phase run in parallel. The whole sequence is bounded by the WithBudget duration; a watchdog hard-exits the process if the budget plus the watchdog grace period expires.
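The ordering rule above (phases in ascending order, handlers within a phase in parallel, errors aggregated) can be illustrated with a minimal standard-library sketch. This is not the package's implementation; the handler struct, runPhases, and phaseOrder are hypothetical names used only for illustration, and the budget and watchdog are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
)

// handler is a hypothetical stand-in for one registered shutdown handler.
type handler struct {
	name  string
	phase int
	fn    func(ctx context.Context) error
}

// runPhases runs phases in ascending order and the handlers inside one phase
// concurrently, then aggregates every error with errors.Join.
func runPhases(ctx context.Context, hs []handler) error {
	byPhase := map[int][]handler{}
	for _, h := range hs {
		byPhase[h.phase] = append(byPhase[h.phase], h)
	}
	phases := make([]int, 0, len(byPhase))
	for p := range byPhase {
		phases = append(phases, p)
	}
	sort.Ints(phases)

	var all []error
	for _, p := range phases {
		group := byPhase[p]
		errs := make([]error, len(group)) // one slot per handler, so no lock is needed
		var wg sync.WaitGroup
		for i, h := range group {
			wg.Add(1)
			go func(i int, h handler) {
				defer wg.Done()
				if err := h.fn(ctx); err != nil {
					errs[i] = fmt.Errorf("%s: %w", h.name, err)
				}
			}(i, h)
		}
		wg.Wait() // the next phase starts only after this one has finished
		all = append(all, errs...)
	}
	return errors.Join(all...) // nil when every handler succeeded
}

// phaseOrder registers three handlers out of order and returns the phases in
// the order they actually ran.
func phaseOrder() []int {
	var mu sync.Mutex
	var seen []int
	mk := func(p int) handler {
		return handler{name: fmt.Sprint(p), phase: p, fn: func(context.Context) error {
			mu.Lock()
			defer mu.Unlock()
			seen = append(seen, p)
			return nil
		}}
	}
	_ = runPhases(context.Background(), []handler{mk(300), mk(0), mk(100)})
	return seen
}

func main() {
	fmt.Println(phaseOrder()) // phases run lowest first
}
```

Waiting on the WaitGroup between phases is what keeps, for example, client shutdown from starting while traffic is still draining.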

Phases

The seven predefined phases match the typical k8s preStop drain pattern. Lower phases run first.

Phase               Value  Typical handlers
PhasePreShutdown    -100   flip drain flag (load balancer stops sending)
PhaseStopAccepting  0      close HTTP listeners
PhaseDrainTraffic   100    wait for in-flight requests; NATS drain
PhaseFlushQueues    200    flush async producers, drain workers
PhaseCloseClients   300    close DB / cache / messaging clients
PhaseFlushLogs      400    flush logs and traces last
PhasePostShutdown   500    final cleanup

Phase is an integer type (type Phase int), so any integer value can be passed to WithPhase when finer-grained ordering is needed.

Force-exit on second signal

By default, a second SIGTERM/SIGINT during shutdown calls os.Exit(130) immediately — the operator's escape hatch when a handler hangs. Disable via WithForceOnSecondSignal(false, 0).
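The second-signal escape hatch can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the package's code: waitWithForceExit and simulateSecondSignal are hypothetical names, the exit function is injected so the logic can run without terminating the process, and signals are fed into the channel by hand. In a real program the channel would be filled by signal.Notify and exit would be os.Exit.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// waitWithForceExit blocks until the first signal, starts shutdownFn in the
// background, and calls exit(forceCode) if a second signal arrives before
// shutdownFn returns.
func waitWithForceExit(sigs <-chan os.Signal, shutdownFn func(), exit func(int), forceCode int) {
	<-sigs // first signal: begin graceful shutdown
	done := make(chan struct{})
	go func() {
		defer close(done)
		shutdownFn()
	}()
	select {
	case <-done: // graceful path finished in time
	case <-sigs: // second signal: operator escape hatch
		exit(forceCode)
	}
}

// simulateSecondSignal feeds two signals into the channel while the shutdown
// function hangs, and returns the exit code that would have been used
// (-1 if no forced exit happened).
func simulateSecondSignal() int {
	sigs := make(chan os.Signal, 2)
	sigs <- syscall.SIGTERM
	sigs <- syscall.SIGTERM
	code := -1
	hang := make(chan struct{}) // never closed: models a stuck handler
	waitWithForceExit(sigs, func() { <-hang }, func(c int) { code = c }, 130)
	return code
}

func main() {
	fmt.Println("forced exit code:", simulateSecondSignal())
}
```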

Watchdog

WithBudget(d) sets the total wall-clock budget. After the budget plus a 1-second grace period (configurable via WithWatchdogGrace), the watchdog calls os.Exit(failureCode) with the names of the stuck handlers logged. No more zombie processes.
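A watchdog of this kind can be approximated with time.AfterFunc. The sketch below is an assumption about the general technique, not the package's internals: startWatchdog and watchdogFired are hypothetical names, the exit function is injected in place of os.Exit, and the durations are shortened so the example finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// startWatchdog arms a timer for budget+grace. If stop is not called before
// the timer fires, exit(code) runs. In production exit would be os.Exit.
func startWatchdog(budget, grace time.Duration, exit func(code int), code int) (stop func()) {
	t := time.AfterFunc(budget+grace, func() { exit(code) })
	return func() { t.Stop() }
}

// watchdogFired simulates a shutdown that takes workMs milliseconds against
// a 20ms budget with a 10ms grace period and reports whether the watchdog fired.
func watchdogFired(workMs int) bool {
	fired := make(chan struct{}, 1)
	stop := startWatchdog(20*time.Millisecond, 10*time.Millisecond,
		func(int) { fired <- struct{}{} }, 1)
	time.Sleep(time.Duration(workMs) * time.Millisecond) // simulated shutdown work
	stop()
	select {
	case <-fired:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("fast shutdown, watchdog fired:", watchdogFired(0))
	fmt.Println("stuck shutdown, watchdog fired:", watchdogFired(150))
}
```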

Programmatic trigger

Tests, panic-recovery middleware, custom HTTP /admin/shutdown endpoints, and health failure paths can all trigger shutdown without an OS signal:

err := mgr.Shutdown(ctx)

Same execution path as Listen.

Reload signals (SIGHUP)

mgr.OnSignal(syscall.SIGHUP, func(ctx context.Context, _ os.Signal) {
    config.Reload()
})

The hook fires; shutdown is NOT triggered. Useful for log rotation (SIGUSR1), config reload (SIGHUP), and similar gunicorn-style signal patterns.

Actor pattern (oklog/run-style)

For long-running goroutines (workers, schedulers) where the run loop and the cancel mechanism are distinct:

handle, _ := mgr.RegisterActor("worker", workerStop,
    shutdown.WithActorPhase(shutdown.PhaseDrainTraffic))

go func() {
    err := workerLoop()  // blocks until workerStop is called
    handle.Done(err)     // signals actor completed
}()

The manager calls workerStop during the configured phase, then waits up to the per-actor timeout for handle.Done to be called.
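The interrupt-then-wait handshake can be modelled with a channel and sync.Once. This is a sketch under assumptions, not the package's implementation: actorHandle, stopActor, and runDemo are hypothetical names, and the timeout is passed directly rather than derived from a global budget.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// actorHandle is a hypothetical stand-in for the handle returned by RegisterActor.
type actorHandle struct {
	once sync.Once
	done chan error
}

func newActorHandle() *actorHandle { return &actorHandle{done: make(chan error, 1)} }

// Done records the run loop's exit error; only the first call has effect.
func (h *actorHandle) Done(err error) {
	h.once.Do(func() { h.done <- err })
}

// stopActor calls interrupt, then waits up to timeout for Done.
func stopActor(h *actorHandle, interrupt func(error), timeout time.Duration) error {
	interrupt(errors.New("shutting down"))
	select {
	case err := <-h.done:
		return err
	case <-time.After(timeout):
		return errors.New("actor did not confirm completion before the timeout")
	}
}

// runDemo starts a worker that exits when its stop channel closes and
// reports whether the handshake completed cleanly.
func runDemo() bool {
	h := newActorHandle()
	stop := make(chan struct{})
	var stopOnce sync.Once
	go func() {
		<-stop      // the run loop blocks until interrupted
		h.Done(nil) // confirm completion to the manager side
	}()
	interrupt := func(error) { stopOnce.Do(func() { close(stop) }) }
	return stopActor(h, interrupt, time.Second) == nil
}

func main() {
	fmt.Println("actor stopped cleanly:", runDemo())
}
```

Guarding the interrupt with sync.Once keeps it idempotent, which matches the guidance given for InterruptFunc.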

Observer pattern

Adapters subscribe to lifecycle callbacks instead of polluting the core:

mgr.Subscribe(shutdown.Observer{
    OnPhaseStart: func(p shutdown.Phase, n int) { /* OTEL span start */ },
    OnPhaseEnd:   func(p shutdown.Phase, dur time.Duration, errs []error) { /* span end */ },
    OnHandlerEnd: func(name string, p shutdown.Phase, dur time.Duration, err error) { /* prom metric */ },
    OnComplete:   func(total time.Duration, err error) { /* alert webhook */ },
})

The shutdown-otel, shutdown-prom, and shutdown-health contrib modules use this pattern.

Adapters

Adapter modules ship as separate Go modules under contrib/. Import only the ones you use; each pulls only its own dependencies.

Adapter                               Module path                                        Role
shutdown-zap                          github.com/ubgo/shutdown/contrib/shutdown-zap      Zap Logger adapter
shutdown-slog                         github.com/ubgo/shutdown/contrib/shutdown-slog     Explicit *slog.Logger adapter
shutdown-otel                         github.com/ubgo/shutdown/contrib/shutdown-otel     OpenTelemetry spans per phase + handler
shutdown-prom                         github.com/ubgo/shutdown/contrib/shutdown-prom     Prometheus metrics
shutdown-health                       github.com/ubgo/shutdown/contrib/shutdown-health   Auto-flip ubgo/health readiness on PreShutdown
shutdown-nethttp                      github.com/ubgo/shutdown/contrib/shutdown-nethttp  http.Server.Shutdown registered handler
shutdown-gin / -chi / -echo / -fiber  …/contrib/shutdown-<framework>                     Framework-server shutdown helpers
shutdown-fx                           github.com/ubgo/shutdown/contrib/shutdown-fx       uber-go/fx lifecycle bridge

(Adapters land in subsequent releases; v0.1.0 ships the core only.)

Comparison

Feature uber-fx oklog/run tokio-graceful-shutdown terminus ubgo/shutdown
Phase-based ordering
Parallel within phase partial
Force-exit on second signal
Watchdog hard-exit
Observer pattern
Native readiness drain ✅ (contrib)
Actor (run+interrupt) pairs partial
Reload signal hook
Zero-dep core

Compatibility

Requires Go 1.24 or later.

License

Apache License 2.0. See LICENSE and NOTICE.

Documentation

Overview

Package shutdown is a phased, parallel-within-phase, observable graceful shutdown manager for long-running Go services.

The package has zero third-party dependencies. Logger, OTEL, Prometheus, health, and HTTP framework integrations live in adapter modules under contrib/.

Typical use:

mgr := shutdown.New(
    shutdown.WithBudget(30 * time.Second),
)
mgr.Register("http", srv.Shutdown,    shutdown.WithPhase(shutdown.PhaseStopAccepting))
mgr.Register("nats", natsConn.Drain,  shutdown.WithPhase(shutdown.PhaseDrainTraffic))
mgr.Register("db",   db.Close,        shutdown.WithPhase(shutdown.PhaseCloseClients))
mgr.Register("otel", otelP.Shutdown,  shutdown.WithPhase(shutdown.PhaseFlushLogs))

if err := mgr.Listen(ctx); err != nil {
    log.Fatal(err)
}

See the README and the companion examples repo at github.com/ubgo/shutdown-examples for k8s preStop, drain, OTEL tracing, and worker-actor patterns.

Constants

This section is empty.

Variables

var ErrAlreadyRegistered = errors.New("shutdown: handler already registered with that name")

ErrAlreadyRegistered is returned by Register when a handler with the given name is already in the Manager.

var ErrClosed = errors.New("shutdown: manager closed (shutdown in progress or completed)")

ErrClosed is returned by Register when the Manager has already started running its phases.

var ErrEmptyName = errors.New("shutdown: handler name must be non-empty")

ErrEmptyName is returned by Register when name is the empty string.

Functions

This section is empty.

Types

type ActorHandle

type ActorHandle struct {
	// contains filtered or unexported fields
}

ActorHandle is returned from RegisterActor. Call Done(err) when the actor's run loop has exited so the manager can proceed to the next phase.

func (*ActorHandle) Done

func (h *ActorHandle) Done(err error)

Done signals that the actor's run loop has returned. err is the run loop's exit error (nil if it returned cleanly). Idempotent — only the first call has effect.

type ActorOption

type ActorOption func(*actorRegistration)

ActorOption configures a RegisterActor call.

func WithActorPhase

func WithActorPhase(p Phase) ActorOption

WithActorPhase places the actor's interrupt step in a specific phase. Default: PhaseDrainTraffic.

func WithActorTimeout

func WithActorTimeout(d time.Duration) ActorOption

WithActorTimeout caps how long the manager waits for the actor's run loop to confirm completion via Done after interrupt is called. Default: 30s.

type ErrorPolicy

type ErrorPolicy int

ErrorPolicy decides what happens when a handler returns an error.

const (
	// ContinueOnError keeps running remaining handlers in the same phase
	// and proceeds to subsequent phases. All errors are aggregated via
	// errors.Join and returned at the end. Default.
	ContinueOnError ErrorPolicy = iota
	// StopOnError aborts the phase on the first failure and returns
	// immediately, skipping subsequent phases.
	StopOnError
)
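The difference between the two policies can be shown with a small sequential sketch. It is an illustration only: errorPolicy, runAll, and stepsRun are hypothetical names, and the real manager applies the policy across parallel handlers and phases rather than a plain loop.

```go
package main

import (
	"errors"
	"fmt"
)

type errorPolicy int

const (
	continueOnError errorPolicy = iota
	stopOnError
)

// runAll executes steps in order. With stopOnError it returns at the first
// failure; with continueOnError it runs everything and joins the errors.
func runAll(policy errorPolicy, steps []func() error) (ran int, err error) {
	var errs []error
	for _, step := range steps {
		ran++
		if e := step(); e != nil {
			errs = append(errs, e)
			if policy == stopOnError {
				break
			}
		}
	}
	return ran, errors.Join(errs...)
}

// stepsRun reports how many of three steps run when the second one fails.
func stepsRun(policy errorPolicy) int {
	ok := func() error { return nil }
	fail := func() error { return errors.New("flush failed") }
	ran, _ := runAll(policy, []func() error{ok, fail, ok})
	return ran
}

func main() {
	fmt.Println("continue:", stepsRun(continueOnError))
	fmt.Println("stop:", stepsRun(stopOnError))
}
```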

type HandlerFunc

type HandlerFunc func(ctx context.Context) error

HandlerFunc is the unit of work registered with a Manager. It receives a context bounded by the per-handler timeout (or the remaining global budget, whichever is shorter) and should return promptly when ctx is done.
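The "whichever is shorter" rule falls out of how the standard context package behaves: a child created with context.WithTimeout can never outlive its parent's deadline. The sketch below demonstrates that property; handlerContext and budgetWins are hypothetical names, and whether the package derives handler contexts exactly this way is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// handlerContext derives a handler's context from the budget context.
// context.WithTimeout never extends a parent's deadline, so the result is
// bounded by whichever of the two expires first.
func handlerContext(budgetCtx context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(budgetCtx, timeout)
}

// budgetWins reports whether the remaining budget, rather than the handler
// timeout, sets the handler's deadline. Arguments are in milliseconds.
func budgetWins(budgetMs, timeoutMs int) bool {
	budgetCtx, cancelBudget := context.WithTimeout(context.Background(),
		time.Duration(budgetMs)*time.Millisecond)
	defer cancelBudget()
	hctx, cancel := handlerContext(budgetCtx, time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	budgetDeadline, _ := budgetCtx.Deadline()
	handlerDeadline, _ := hctx.Deadline()
	return handlerDeadline.Equal(budgetDeadline)
}

func main() {
	fmt.Println("budget shorter than timeout:", budgetWins(50, 5000))
	fmt.Println("timeout shorter than budget:", budgetWins(5000, 50))
}
```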

type InterruptFunc

type InterruptFunc func(err error)

InterruptFunc is the cancellation half of an actor registration. It is called when the manager wants the actor to stop. Implementations should be quick and idempotent.

type Logger

type Logger interface {
	Info(msg string, fields ...any)
	Warn(msg string, fields ...any)
	Error(msg string, fields ...any)
}

Logger is the minimal logging contract. Adapters ship as separate modules: shutdown-zap, shutdown-slog. The default Manager uses log/slog.

func NoopLogger

func NoopLogger() Logger

NoopLogger returns a Logger that discards all messages. Useful in tests or when the application logs shutdown events through its observer hooks instead of the Logger interface.

func SlogLogger

func SlogLogger(l *slog.Logger) Logger

SlogLogger wraps a *slog.Logger as a shutdown.Logger. Useful when the caller wants to pass a specific *slog.Logger rather than relying on slog.Default().
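Because the Logger interface uses the same (msg, key/value...) shape as log/slog, an adapter can be very thin. The following is a sketch of how such a wrapper could look, not the package's SlogLogger: the logger interface is copied from the definition above, and slogAdapter, capturedWarn, and warnHasField are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// logger mirrors the three-method Logger interface documented above.
type logger interface {
	Info(msg string, fields ...any)
	Warn(msg string, fields ...any)
	Error(msg string, fields ...any)
}

// slogAdapter is a hypothetical wrapper; the package's own SlogLogger may differ.
type slogAdapter struct{ l *slog.Logger }

func (a slogAdapter) Info(msg string, fields ...any)  { a.l.Info(msg, fields...) }
func (a slogAdapter) Warn(msg string, fields ...any)  { a.l.Warn(msg, fields...) }
func (a slogAdapter) Error(msg string, fields ...any) { a.l.Error(msg, fields...) }

// capturedWarn logs one warning through the adapter into a buffer and
// returns what was written.
func capturedWarn() string {
	var buf bytes.Buffer
	var lg logger = slogAdapter{l: slog.New(slog.NewTextHandler(&buf, nil))}
	lg.Warn("handler slow", "name", "db")
	return buf.String()
}

// warnHasField reports whether the key/value pair reached the output.
func warnHasField() bool {
	return strings.Contains(capturedWarn(), "name=db")
}

func main() {
	fmt.Print(capturedWarn())
	fmt.Println("field present:", warnHasField())
}
```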

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager is the central shutdown coordinator.

Construct with New, register handlers (and optionally actors), then call Listen (blocking on signals) or Shutdown (programmatic). A Manager is safe for concurrent Register and Subscribe calls before Listen/Shutdown has been entered; once a shutdown is in progress further Register calls return ErrClosed.

func New

func New(opts ...Option) *Manager

New constructs a Manager with the supplied options.

func (*Manager) Listen

func (m *Manager) Listen(ctx context.Context) error

Listen blocks until one of the configured shutdown signals arrives or ctx is cancelled, then runs all phases in order. Returns the aggregated error (errors.Join of every handler error) or context.Canceled if the caller cancelled before completion.

Listen does NOT call os.Exit by default; opt in via WithExitOnComplete.

While shutdown is running a second shutdown signal triggers an immediate os.Exit(forceCode) when WithForceOnSecondSignal is enabled (default).

Listen is safe to call once per Manager; subsequent calls return ErrClosed.

Signals registered via OnSignal are automatically added to the listened set, so callers do not need to also include them in WithSignals — a hook for SIGHUP without WithSignals(syscall.SIGHUP, ...) still fires.

func (*Manager) OnSignal

func (m *Manager) OnSignal(sig os.Signal, fn func(ctx context.Context, sig os.Signal))

OnSignal registers a hook for a non-shutdown signal (e.g. SIGHUP for config reload, SIGUSR1 for log rotation). When the signal arrives the hook is invoked with the Listen ctx; the shutdown sequence is NOT triggered, and Listen continues waiting for further signals.

The signal is automatically added to the listened set, so callers do not need to also include it in WithSignals.

Note: signals registered here are no longer treated as shutdown triggers by the manager. If a user adds SIGTERM via OnSignal it will not start a shutdown — it will only call the user's hook. To restore shutdown behaviour for that signal, simply do not register a hook for it.
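The routing rule described here (a hooked signal only runs its hook, an unhooked signal in the shutdown set starts shutdown) can be modelled as a small lookup. This is an illustrative sketch, not the package's code: listener, onSignal, handle, and routeDemo are hypothetical names, and signals are passed in directly instead of arriving from the OS.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// listener is a hypothetical model of the signal routing described above.
type listener struct {
	shutdownSet map[os.Signal]bool
	hooks       map[os.Signal]func(os.Signal)
}

// onSignal registers a hook and removes the signal from the shutdown set,
// so a hooked signal can no longer trigger shutdown.
func (l *listener) onSignal(sig os.Signal, fn func(os.Signal)) {
	l.hooks[sig] = fn
	delete(l.shutdownSet, sig)
}

// handle routes one signal and reports whether shutdown should start.
func (l *listener) handle(sig os.Signal) bool {
	if fn, ok := l.hooks[sig]; ok {
		fn(sig)
		return false
	}
	return l.shutdownSet[sig]
}

// routeDemo delivers SIGHUP twice and SIGTERM once. It returns how many
// times the reload hook ran and whether shutdown was triggered.
func routeDemo() (reloads int, shutdown bool) {
	l := &listener{
		shutdownSet: map[os.Signal]bool{syscall.SIGINT: true, syscall.SIGTERM: true},
		hooks:       map[os.Signal]func(os.Signal){},
	}
	l.onSignal(syscall.SIGHUP, func(os.Signal) { reloads++ })
	for _, sig := range []os.Signal{syscall.SIGHUP, syscall.SIGHUP, syscall.SIGTERM} {
		if l.handle(sig) {
			shutdown = true
			break
		}
	}
	return reloads, shutdown
}

func main() {
	reloads, shutdown := routeDemo()
	fmt.Println("reloads:", reloads, "shutdown:", shutdown)
}
```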

func (*Manager) Register

func (m *Manager) Register(name string, fn HandlerFunc, opts ...RegisterOption) error

Register adds a shutdown handler. Returns an error if name is empty, already registered, or the Manager has already started shutting down.
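Register takes a HandlerFunc, that is func(ctx context.Context) error. A method value of type func() error, a common shape for Close methods, does not satisfy that type directly and needs a wrapper. The sketch below shows one way to write it; handlerFunc mirrors the documented type, while ignoreCtx, fakeDB, and closeTwice are hypothetical names that are not part of the package.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// handlerFunc mirrors the HandlerFunc signature documented above.
type handlerFunc func(ctx context.Context) error

// ignoreCtx is a hypothetical helper that adapts a func() error, such as a
// typical Close method, to the handler signature. The context is ignored, so
// the wrapped call cannot be cancelled early.
func ignoreCtx(fn func() error) handlerFunc {
	return func(context.Context) error { return fn() }
}

// fakeDB stands in for a client whose Close takes no context.
type fakeDB struct{ closed bool }

func (d *fakeDB) Close() error {
	if d.closed {
		return errors.New("already closed")
	}
	d.closed = true
	return nil
}

// closeTwice calls the adapted handler twice and reports whether each call
// succeeded.
func closeTwice() (first, second bool) {
	db := &fakeDB{}
	h := ignoreCtx(db.Close)
	first = h(context.Background()) == nil
	second = h(context.Background()) == nil
	return first, second
}

func main() {
	first, second := closeTwice()
	fmt.Println("first close ok:", first, "second close ok:", second)
}
```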

func (*Manager) RegisterActor

func (m *Manager) RegisterActor(name string, interrupt InterruptFunc, opts ...ActorOption) (*ActorHandle, error)

RegisterActor registers a long-running actor (goroutine-style service) with the Manager. The actor is signalled to stop via interrupt during its phase, and the manager waits up to the per-actor timeout (or the remaining global budget, whichever is shorter) for the actor to confirm completion via the returned ActorHandle.

Typical use:

handle, err := mgr.RegisterActor("worker", workerStop,
    shutdown.WithActorPhase(shutdown.PhaseDrainTraffic))
go func() {
    err := workerLoop()       // blocks until workerStop is called
    handle.Done(err)          // signals actor exited
}()

Note: the run loop itself is not held by the manager. The caller is responsible for spawning the goroutine that runs the work; the manager only owns the interrupt + completion handshake.

func (*Manager) Shutdown

func (m *Manager) Shutdown(ctx context.Context) error

Shutdown is the programmatic equivalent of receiving a signal. Runs the same phase machinery as Listen.

func (*Manager) Subscribe

func (m *Manager) Subscribe(o Observer)

Subscribe attaches an Observer. Multiple observers can coexist; each one receives every callback in the order they were subscribed.

type Observer

type Observer struct {
	OnSignal       func(sig os.Signal)
	OnPhaseStart   func(phase Phase, handlerCount int)
	OnPhaseEnd     func(phase Phase, dur time.Duration, errs []error)
	OnHandlerStart func(name string, phase Phase)
	OnHandlerEnd   func(name string, phase Phase, dur time.Duration, err error)
	OnComplete     func(totalDur time.Duration, err error)
}

Observer is the set of lifecycle callbacks that the Manager fans out to adapters. All callbacks are optional and may be nil. Observers fire synchronously; a long-running observer should hand its work off to a goroutine itself.
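A fan-out with optional callbacks usually comes down to a nil check per observer, invoked in subscription order. The sketch below illustrates that idea with a reduced two-field observer; observer, notifier, and callLog are hypothetical names and the package's internals may differ.

```go
package main

import (
	"fmt"
	"time"
)

// observer is a reduced, hypothetical version of the Observer struct above.
type observer struct {
	OnPhaseStart func(phase, handlerCount int)
	OnComplete   func(total time.Duration, err error)
}

type notifier struct{ observers []observer }

func (n *notifier) subscribe(o observer) { n.observers = append(n.observers, o) }

// phaseStart calls every non-nil OnPhaseStart in subscription order.
func (n *notifier) phaseStart(phase, count int) {
	for _, o := range n.observers {
		if o.OnPhaseStart != nil { // callbacks are optional
			o.OnPhaseStart(phase, count)
		}
	}
}

// complete calls every non-nil OnComplete in subscription order.
func (n *notifier) complete(total time.Duration, err error) {
	for _, o := range n.observers {
		if o.OnComplete != nil {
			o.OnComplete(total, err)
		}
	}
}

// callLog subscribes two observers (the first leaves OnComplete nil) and
// returns the callbacks in the order they fired.
func callLog() []string {
	var log []string
	n := &notifier{}
	n.subscribe(observer{
		OnPhaseStart: func(p, c int) { log = append(log, fmt.Sprintf("A:start:%d", p)) },
	})
	n.subscribe(observer{
		OnPhaseStart: func(p, c int) { log = append(log, fmt.Sprintf("B:start:%d", p)) },
		OnComplete:   func(time.Duration, error) { log = append(log, "B:complete") },
	})
	n.phaseStart(0, 2)
	n.complete(time.Second, nil)
	return log
}

func main() {
	fmt.Println(callLog())
}
```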

type Option

type Option func(*config)

Option configures a Manager.

func WithBudget

func WithBudget(d time.Duration) Option

WithBudget sets the total wall-clock budget across all phases. After the budget expires, in-flight handler contexts are cancelled and the watchdog hard-exits the process after a 1-second grace period (configurable via WithWatchdogGrace). Default: 30s.

func WithErrorPolicy

func WithErrorPolicy(p ErrorPolicy) Option

WithErrorPolicy overrides the ContinueOnError default.

func WithExitOnComplete

func WithExitOnComplete(successCode, failureCode int) Option

WithExitOnComplete makes Listen call os.Exit at the end of shutdown. successCode is used when the aggregated error is nil; failureCode otherwise. Default: never exit (just return).

func WithForceOnSecondSignal

func WithForceOnSecondSignal(enabled bool, forceCode int) Option

WithForceOnSecondSignal makes a second signal during shutdown trigger an immediate os.Exit(forceCode). Default: true with forceCode=130.

Set enabled=false to ignore second signals (the orchestrator's SIGKILL is then the only escape hatch — useful only when you trust the watchdog budget completely).

func WithHandlerDefaultTimeout added in v0.2.0

func WithHandlerDefaultTimeout(d time.Duration) Option

WithHandlerDefaultTimeout sets the default per-handler timeout used when a Register call does not pass WithTimeout. Default: 5s.

func WithLogger

func WithLogger(l Logger) Option

WithLogger overrides the default slog-backed logger.

func WithSerial

func WithSerial(phase Phase) Option

WithSerial opts a specific phase out of parallel handler execution. By default all handlers in a phase run in parallel.

func WithSignals

func WithSignals(sigs ...os.Signal) Option

WithSignals overrides the listened signal set. Default: SIGINT, SIGTERM.

If you want to add a non-shutdown signal hook (e.g. SIGHUP for reload), use Manager.OnSignal instead — that registers a hook without making the signal trigger a shutdown.

func WithWatchdogGrace

func WithWatchdogGrace(d time.Duration) Option

WithWatchdogGrace sets the grace period after the budget expires before the watchdog calls os.Exit. Default: 1s.

type PanicError added in v0.2.0

type PanicError struct {
	Name  string
	Value any
}

PanicError wraps a recovered panic from inside a shutdown handler. The runner returns this in place of the handler's intended error so the panic surfaces in the aggregated error and observers' OnHandlerEnd hook.

func (*PanicError) Error added in v0.2.0

func (e *PanicError) Error() string
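Turning a handler panic into an ordinary error is typically done with a deferred recover that assigns to a named return value. The sketch below shows that technique; panicError mirrors the documented fields, but safeCall, recoveredMessage, and the exact error message format are assumptions rather than the package's behaviour.

```go
package main

import (
	"context"
	"fmt"
)

// panicError mirrors the fields of the PanicError type documented above.
type panicError struct {
	Name  string
	Value any
}

// Error uses an assumed message format; the package's wording may differ.
func (e *panicError) Error() string {
	return fmt.Sprintf("shutdown: handler %q panicked: %v", e.Name, e.Value)
}

// safeCall runs fn and converts a panic into a *panicError by assigning to
// the named return value inside a deferred recover.
func safeCall(ctx context.Context, name string, fn func(context.Context) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = &panicError{Name: name, Value: r}
		}
	}()
	return fn(ctx)
}

// recoveredMessage runs a panicking handler and returns the resulting
// error text, or "" if no error was produced.
func recoveredMessage() string {
	err := safeCall(context.Background(), "db", func(context.Context) error {
		panic("boom")
	})
	if err == nil {
		return ""
	}
	return err.Error()
}

func main() {
	fmt.Println(recoveredMessage())
}
```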

type Phase

type Phase int

Phase is the ordering key for handler execution. Lower phases run first. Predefined constants cover the common k8s preStop drain pattern; a raw int is also valid for power users who want finer-grained sequencing.

const (
	PhasePreShutdown   Phase = -100
	PhaseStopAccepting Phase = 0
	PhaseDrainTraffic  Phase = 100
	PhaseFlushQueues   Phase = 200
	PhaseCloseClients  Phase = 300
	PhaseFlushLogs     Phase = 400
	PhasePostShutdown  Phase = 500
)

Predefined phases — match the typical k8s graceful-shutdown flow:

  1. PhasePreShutdown — flip drain flag (load balancer stops sending).
  2. PhaseStopAccepting — close listeners (no new requests accepted).
  3. PhaseDrainTraffic — wait for in-flight work to finish.
  4. PhaseFlushQueues — flush async producers and worker queues.
  5. PhaseCloseClients — close DB, cache, messaging clients.
  6. PhaseFlushLogs — flush logs and traces last so prior phase errors reach the collector.
  7. PhasePostShutdown — final cleanup, exit-code reporting.

func (Phase) String

func (p Phase) String() string

String returns the canonical phase name when one of the predefined constants matches; otherwise returns "phase=<n>".
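A String method with this behaviour could be written as a switch with a formatted fallback. In the sketch below the "phase=<n>" fallback comes from the description above, while the canonical names returned for the predefined constants are an assumption (the documentation does not spell them out), and phase and its constants are local stand-ins for the package's types.

```go
package main

import "fmt"

// phase is a local stand-in for the package's Phase type.
type phase int

const (
	phasePreShutdown   phase = -100
	phaseStopAccepting phase = 0
	phaseDrainTraffic  phase = 100
	phaseFlushQueues   phase = 200
	phaseCloseClients  phase = 300
	phaseFlushLogs     phase = 400
	phasePostShutdown  phase = 500
)

// String returns a name for predefined phases and "phase=<n>" otherwise.
// The names used here are assumed, not taken from the package.
func (p phase) String() string {
	switch p {
	case phasePreShutdown:
		return "PreShutdown"
	case phaseStopAccepting:
		return "StopAccepting"
	case phaseDrainTraffic:
		return "DrainTraffic"
	case phaseFlushQueues:
		return "FlushQueues"
	case phaseCloseClients:
		return "CloseClients"
	case phaseFlushLogs:
		return "FlushLogs"
	case phasePostShutdown:
		return "PostShutdown"
	}
	return fmt.Sprintf("phase=%d", int(p))
}

func main() {
	fmt.Println(phaseDrainTraffic) // predefined constant
	fmt.Println(phase(150))        // custom value between two predefined phases
}
```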

type RegisterOption

type RegisterOption func(*registration)

RegisterOption configures a Register call.

func WithPhase

func WithPhase(p Phase) RegisterOption

WithPhase places the handler in a specific phase. Default: PhaseCloseClients.

func WithTimeout

func WithTimeout(d time.Duration) RegisterOption

WithTimeout caps how long this handler may run. Default is set by WithHandlerDefaultTimeout on the Manager (5s out of the box). The actual deadline is min(WithTimeout, remaining global budget).

type RunFunc

type RunFunc func() error

RunFunc is the long-running half of an actor registration. It returns when the actor naturally exits (or its InterruptFunc was called).

Directories

Path Synopsis
contrib
shutdown-gin module
shutdown-otel module
shutdown-prom module
shutdown-zap module
