health

Version: v2.4.1 (Latest)
Published: Mar 30, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

README

health

A zero-dependency health check library for Go services. Built for Kubernetes, useful everywhere.

Full Documentation | pkg.go.dev

go get github.com/schigh/health/v2

When to use this library

This library is designed for Go services running in Kubernetes with multiple external dependencies (databases, caches, other services). It is most valuable when:

  • You need readiness separate from liveness (your pod is alive but Postgres is down, so you should stop receiving traffic without being killed)
  • You have startup sequencing requirements (loading data, warming caches, waiting for dependencies before accepting traffic)
  • You want structured observability into why a service is unhealthy, not just that it restarted
  • You run multiple services that depend on each other and want dependency graph visibility

If your service is stateless with no external dependencies, a simple http.HandleFunc("/healthz", ...) returning 200 is sufficient. You don't need this library for that.

Why this library?

| | health/v2 | heptiolabs | alexliesenfeld | InVisionApp |
|---|---|---|---|---|
| External deps | 0 | 2 | 3 | 5 |
| K8s probes | liveness, readiness, startup | liveness, readiness | liveness, readiness | liveness, readiness |
| Degraded state | yes | no | no | no |
| Built-in checkers | HTTP, TCP, DNS, Redis, DB, command | HTTP, TCP, DNS | none | none |
| Maintained | active | archived | active | archived |

Quick Start

package main

import (
    "context"
    "os/signal"
    "syscall"
    "time"

    "github.com/schigh/health/v2"
    "github.com/schigh/health/v2/manager/std"
    "github.com/schigh/health/v2/checker/http"
    "github.com/schigh/health/v2/checker/tcp"
    "github.com/schigh/health/v2/reporter/httpserver"
)

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer cancel()

    mgr := std.Manager{}

    // HTTP dependency
    mgr.AddCheck("api", http.NewChecker("api", "https://api.example.com/health"),
        health.WithCheckFrequency(health.CheckAtInterval, 10*time.Second, 0),
        health.WithLivenessImpact(),
        health.WithReadinessImpact(),
        health.WithGroup("external"),
        health.WithComponentType("http"),
    )

    // Database (TCP)
    mgr.AddCheck("postgres", tcp.NewChecker("postgres", "localhost:5432"),
        health.WithCheckFrequency(health.CheckAtInterval, 5*time.Second, 0),
        health.WithLivenessImpact(),
        health.WithReadinessImpact(),
        health.WithGroup("database"),
        health.WithComponentType("datastore"),
    )

    // HTTP reporter with BasicAuth
    mgr.AddReporter("http", httpserver.New(
        httpserver.WithPort(8181),
        httpserver.WithMiddleware(httpserver.BasicAuth("admin", "secret")),
    ))

    errChan := mgr.Run(ctx)
    select {
    case err := <-errChan:
        panic(err)
    case <-ctx.Done():
        mgr.Stop(ctx)
    }
}

Built-in Checkers

All checkers are zero-dependency, using only the standard library.

| Package | What it checks | Options |
|---|---|---|
| checker/http | HTTP endpoint returns expected status | WithTimeout, WithExpectedStatus, WithMethod, WithClient |
| checker/tcp | TCP port is accepting connections | WithTimeout |
| checker/dns | Hostname resolves to an address | WithTimeout, WithResolver |
| checker/redis | Redis PING via raw RESP protocol | WithTimeout, WithPassword |
| checker/db | Database ping via sql.DB interface | WithTimeout |
| checker/command | Run any func(ctx) error | (none) |

// Custom check with the command checker
mgr.AddCheck("s3", command.NewChecker("s3", func(ctx context.Context) error {
    _, err := s3Client.HeadBucket(ctx, &s3.HeadBucketInput{Bucket: &bucket})
    return err
}))

Caching

Wrap any checker with TTL-based caching to avoid hammering expensive dependencies:

cached := health.WithCache(
    redis.NewChecker("redis", "localhost:6379"),
    30*time.Second,
)
mgr.AddCheck("redis", cached, ...)

Check Metadata

Checks carry structured metadata for observability and dependency mapping:

mgr.AddCheck("postgres", dbChecker,
    health.WithGroup("database"),          // logical group
    health.WithComponentType("datastore"), // component type hint
)

The HTTP reporter includes this metadata in the JSON response:

{
  "postgres": {
    "name": "postgres",
    "status": "healthy",
    "group": "database",
    "componentType": "datastore",
    "duration": "1.234ms",
    "lastCheck": "2026-03-28T14:30:00Z"
  }
}

Reporters

HTTP Server (default)

// Functional options
reporter := httpserver.New(
    httpserver.WithPort(9090),
    httpserver.WithMiddleware(httpserver.BasicAuth("user", "pass")),
)

// Or struct config
reporter := httpserver.NewReporter(httpserver.Config{
    Addr: "0.0.0.0",
    Port: 8181,
})

Endpoints: /livez, /readyz, /healthz, /.well-known/health

Individual checks by name (K8s convention):

curl localhost:8181/livez/postgres    # [+]postgres ok (200)
curl localhost:8181/readyz/redis      # [-]redis failed: timeout (503)
curl localhost:8181/livez?verbose     # list all checks
curl "localhost:8181/livez?verbose&exclude=redis"  # exclude a check
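
A client that consumes the per-check text output may want to turn each line into a structured value. The sketch below is a minimal parser that assumes the line shapes shown in the curl comments above (`[+]name ok` and `[-]name failed: reason`); the `checkLine` type and `parseLine` function are hypothetical helpers, not part of the library, and the real verbose output may carry more detail.

```go
package main

import (
	"fmt"
	"strings"
)

// checkLine is a hypothetical structured form of one probe output line.
type checkLine struct {
	Name   string
	OK     bool
	Reason string
}

// parseLine parses "[+]name ok" or "[-]name failed: reason".
// The second return value is false when the line has neither prefix.
func parseLine(line string) (checkLine, bool) {
	line = strings.TrimSpace(line)
	switch {
	case strings.HasPrefix(line, "[+]"):
		name, _, _ := strings.Cut(strings.TrimPrefix(line, "[+]"), " ")
		return checkLine{Name: name, OK: true}, true
	case strings.HasPrefix(line, "[-]"):
		name, tail, _ := strings.Cut(strings.TrimPrefix(line, "[-]"), " ")
		_, reason, _ := strings.Cut(tail, ": ")
		return checkLine{Name: name, Reason: reason}, true
	}
	return checkLine{}, false
}

func main() {
	for _, l := range []string{"[+]postgres ok", "[-]redis failed: timeout"} {
		c, ok := parseLine(l)
		fmt.Println(c.Name, c.OK, c.Reason, ok)
	}
}
```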

gRPC

Implements the standard grpc.health.v1.Health protocol. Separate module to keep the core zero-dep.

go get github.com/schigh/health/v2/reporter/grpc

reporter := grpc.NewReporter(grpc.Config{
    Addr: "0.0.0.0:8182",
})

OpenTelemetry

Emits health metrics via the OTel API. Separate module.

go get github.com/schigh/health/v2/reporter/otel

reporter, err := otel.NewReporter(otel.Config{
    MeterProvider: provider, // your OTel MeterProvider
})

Metrics: health.check.status, health.check.duration, health.check.executions, health.liveness, health.readiness, health.startup.

Prometheus

Exposes health metrics for Prometheus scraping. Separate module.

go get github.com/schigh/health/v2/reporter/prometheus

reporter := prometheus.NewReporter(prometheus.Config{
    Namespace: "myapp", // optional prefix
})
http.Handle("/metrics", reporter.Handler())

Metrics: health_check_status, health_check_duration_milliseconds, health_check_executions_total, health_liveness, health_readiness, health_startup.

stdout

Prints an ASCII table to stdout. Useful for local development.

test

Instrumented reporter for unit tests. Tracks state changes, toggle counts, and health check updates.

Kubernetes

livenessProbe:
  httpGet:
    path: /livez
    port: 8181
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /readyz
    port: 8181
  initialDelaySeconds: 5
  periodSeconds: 10

startupProbe:
  httpGet:
    path: /healthz
    port: 8181
  failureThreshold: 30
  periodSeconds: 2

Service Discovery

Every service using this library can expose a /.well-known/health manifest endpoint that describes its health checks, dependencies, and current state. Other services can fetch this manifest and build transitive dependency graphs without any additional infrastructure.

// Enable the manifest endpoint
reporter := httpserver.New(
    httpserver.WithServiceName("orders-api"),
    httpserver.WithServiceVersion("1.2.3"),
)

// Declare the downstream service a check depends on (by URL)
mgr.AddCheck("payments", httpChecker,
    health.WithDependsOn("http://payments:8181"),
)

Discovering the graph

// Fetch a single service's manifest
manifest, _ := discovery.FetchManifest(ctx, "http://orders:8181")

// Walk the full dependency graph (BFS, follows HTTP DependsOn entries)
graph, _ := discovery.DiscoverGraph(ctx, "http://api-gateway:8181")

// Render as Mermaid or Graphviz
fmt.Println(graph.Mermaid())
fmt.Println(graph.DOT())
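
To make the breadth-first walk concrete, here is a self-contained sketch of the technique. It is not the library's implementation: the `lookup` function stands in for fetching a service's manifest over HTTP, the `topology` map and short service names are invented for illustration, and the Mermaid text produced is one plausible rendering rather than the exact output of `graph.Mermaid()`.

```go
package main

import (
	"fmt"
	"strings"
)

// edge records that From depends on To.
type edge struct{ From, To string }

// topology is made-up data standing in for what each service's manifest
// would list under dependsOn. Note the cycle between orders and payments.
var topology = map[string][]string{
	"gateway":  {"orders", "payments"},
	"orders":   {"payments"},
	"payments": {"orders"},
}

// lookup stands in for an HTTP fetch of a service's manifest.
func lookup(service string) []string { return topology[service] }

// discover walks the dependency graph breadth-first from root. The visited
// set guarantees each service is expanded once, so cycles terminate.
func discover(root string, fetch func(string) []string) []edge {
	visited := map[string]bool{root: true}
	queue := []string{root}
	var edges []edge
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, dep := range fetch(cur) {
			edges = append(edges, edge{From: cur, To: dep})
			if !visited[dep] {
				visited[dep] = true
				queue = append(queue, dep)
			}
		}
	}
	return edges
}

// mermaid renders the edges as a top-down Mermaid flowchart.
func mermaid(edges []edge) string {
	var b strings.Builder
	b.WriteString("graph TD\n")
	for _, e := range edges {
		fmt.Fprintf(&b, "  %s --> %s\n", e.From, e.To)
	}
	return b.String()
}

func main() {
	fmt.Print(mermaid(discover("gateway", lookup)))
}
```

The visited set is what keeps mutual dependencies (orders and payments above) from looping forever; every edge is still recorded so the rendered graph shows the cycle.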

The manifest at /.well-known/health returns:

{
  "service": "orders-api",
  "version": "1.2.3",
  "status": "pass",
  "checks": [
    {
      "name": "postgres",
      "status": "healthy",
      "group": "database",
      "componentType": "datastore",
      "duration": "1.2ms"
    },
    {
      "name": "payments",
      "status": "healthy",
      "dependsOn": ["http://payments:8181"]
    }
  ],
  "timestamp": "2026-03-28T20:00:00Z"
}
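
If you want to consume the manifest without the discovery package, it can be decoded with encoding/json. The struct definitions below are a sketch inferred only from the sample JSON above; the type names (`manifest`, `manifestCheck`) and the `dependencies` helper are hypothetical, and the real manifest may contain fields not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// manifestCheck mirrors one entry of the "checks" array in the sample.
type manifestCheck struct {
	Name          string   `json:"name"`
	Status        string   `json:"status"`
	Group         string   `json:"group,omitempty"`
	ComponentType string   `json:"componentType,omitempty"`
	Duration      string   `json:"duration,omitempty"`
	DependsOn     []string `json:"dependsOn,omitempty"`
}

// manifest mirrors the top-level object in the sample.
type manifest struct {
	Service   string          `json:"service"`
	Version   string          `json:"version"`
	Status    string          `json:"status"`
	Checks    []manifestCheck `json:"checks"`
	Timestamp time.Time       `json:"timestamp"`
}

// sample is a shortened copy of the manifest shown above.
const sample = `{
  "service": "orders-api",
  "version": "1.2.3",
  "status": "pass",
  "checks": [
    {"name": "postgres", "status": "healthy", "group": "database",
     "componentType": "datastore", "duration": "1.2ms"},
    {"name": "payments", "status": "healthy",
     "dependsOn": ["http://payments:8181"]}
  ],
  "timestamp": "2026-03-28T20:00:00Z"
}`

// parseManifest decodes a manifest document.
func parseManifest(data []byte) (manifest, error) {
	var m manifest
	err := json.Unmarshal(data, &m)
	return m, err
}

// dependencies collects every dependsOn URL across all checks.
func dependencies(m manifest) []string {
	var deps []string
	for _, c := range m.Checks {
		deps = append(deps, c.DependsOn...)
	}
	return deps
}

func main() {
	m, err := parseManifest([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Service, m.Status, dependencies(m))
}
```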

Architecture

Checkers (HTTP, TCP, DNS, Redis, DB, command)
    │
    ▼
CheckResult{Name, Status, Group, ComponentType, Duration, ...}
    │
    ▼
Manager (evaluates fitness, tracks startup/liveness/readiness)
    │
    ▼
Reporters (HTTP server, gRPC, OpenTelemetry, Prometheus, stdout, test)
    │
    ▼
Kubernetes probes, monitoring, dashboards

Status

  • StatusHealthy — check is passing
  • StatusDegraded — check is passing with warnings (does not fail probes)
  • StatusUnhealthy — check is failing

License

Apache 2.0

Documentation

Overview

Package health provides a health check framework for Go services.

The framework is built around three interfaces: Manager, Checker, and Reporter. A Manager orchestrates health checks and dispatches results to reporters. Checkers perform individual health checks against dependencies (databases, caches, HTTP endpoints). Reporters expose health state to external observers (HTTP endpoints, gRPC, Prometheus, OpenTelemetry).

The core module has zero external dependencies. Reporters with heavy dependencies (gRPC, OTel, Prometheus) are available as separate Go modules.

See https://schigh.github.io/health/ for full documentation.

Constants

This section is empty.

Variables

var ErrHealth = errors.New("health")

ErrHealth is the sentinel error for all health check errors. Use errors.Is to check if an error originated from this package.
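
The sentinel pattern can be illustrated as follows. This sketch uses a local `errHealth` variable as a stand-in for health.ErrHealth so that it compiles on its own, and it assumes errors are wrapped with `%w`; the `checkQueue` function is hypothetical and how the package itself wraps its errors is not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// errHealth is a local stand-in for health.ErrHealth.
var errHealth = errors.New("health")

// checkQueue is a hypothetical check that wraps the sentinel with %w so
// callers can still recognize it after context has been added.
func checkQueue(depth, limit int) error {
	if depth > limit {
		return fmt.Errorf("%w: queue depth %d exceeds limit %d", errHealth, depth, limit)
	}
	return nil
}

// isHealthErr reports whether err has the sentinel anywhere in its chain.
func isHealthErr(err error) bool {
	return errors.Is(err, errHealth)
}

func main() {
	fmt.Println(isHealthErr(checkQueue(120, 100)))    // wrapped sentinel
	fmt.Println(isHealthErr(errors.New("unrelated"))) // some other error
}
```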

Functions

This section is empty.

Types

type AddCheckOption

type AddCheckOption func(*AddCheckOptions)

AddCheckOption is a functional option for adding a Checker to a health manager.

func WithCheckFrequency

func WithCheckFrequency(f CheckFrequency, interval, delay time.Duration) AddCheckOption

WithCheckFrequency sets the CheckFrequency at which the manager runs the specified Checker. If f is CheckOnce, interval is ignored. If f is CheckAtInterval, interval is used; if interval is less than or equal to zero, the default interval is used instead. If delay is less than or equal to zero, it is ignored. This option is not additive: if it is passed more than once, only the last invocation configures the Checker.
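
The rules above can be sketched with a small functional-options example. Everything here is a local stand-in written in lowercase to avoid suggesting it is the library's code; in particular `defaultInterval` is a made-up value, since the real default is not stated in this documentation.

```go
package main

import (
	"fmt"
	"time"
)

type checkFrequency uint

const (
	checkOnce checkFrequency = 1 << iota
	checkAtInterval
)

// defaultInterval is hypothetical; the real default is not documented here.
const defaultInterval = 5 * time.Second

// addCheckOptions is a trimmed stand-in for health.AddCheckOptions.
type addCheckOptions struct {
	Frequency checkFrequency
	Delay     time.Duration
	Interval  time.Duration
}

type addCheckOption func(*addCheckOptions)

// withCheckFrequency follows the documented rules: interval only matters
// for checkAtInterval, non-positive values fall back or are ignored, and
// each call replaces whatever an earlier call configured.
func withCheckFrequency(f checkFrequency, interval, delay time.Duration) addCheckOption {
	return func(o *addCheckOptions) {
		o.Frequency, o.Interval, o.Delay = f, 0, 0
		if f&checkAtInterval != 0 {
			o.Interval = interval
			if interval <= 0 {
				o.Interval = defaultInterval
			}
		}
		if delay > 0 {
			o.Delay = delay
		}
	}
}

// apply runs the options in order against a zero-valued struct.
func apply(opts ...addCheckOption) addCheckOptions {
	var o addCheckOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(
		withCheckFrequency(checkOnce, 0, 2*time.Second),
		withCheckFrequency(checkAtInterval, 0, 0), // last call wins
	)
	fmt.Println(o.Frequency == checkAtInterval, o.Interval, o.Delay)
}
```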

func WithComponentType

func WithComponentType(ct string) AddCheckOption

WithComponentType assigns a component type hint to a health check (e.g., "datastore", "http", "tcp"). Component types are included in self-describing health endpoints.

func WithDependsOn

func WithDependsOn(deps ...string) AddCheckOption

WithDependsOn declares the services this check depends on, given as service URLs (e.g., "http://payments:8181"). Used by the discovery protocol to build dependency graphs.

func WithGroup

func WithGroup(group string) AddCheckOption

WithGroup assigns a logical group to a health check (e.g., "database", "cache", "external"). Groups are included in self-describing health endpoints and can be used for filtering.

func WithLivenessImpact

func WithLivenessImpact() AddCheckOption

WithLivenessImpact marks a health check as affecting the liveness of the application. If a check that affects liveness fails, readiness is also affected.

func WithReadinessImpact

func WithReadinessImpact() AddCheckOption

WithReadinessImpact marks a health check as affecting the readiness of the application.
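
As an illustration of how the two impact flags combine, the following sketch derives liveness and readiness from a set of results. It is written from the rules stated in this documentation (a failing liveness check also affects readiness; degraded checks do not fail probes) and uses local stand-in types; it is not the manager's actual evaluation code, and startup gating is left out.

```go
package main

import "fmt"

type status int

const (
	statusHealthy status = iota
	statusDegraded
	statusUnhealthy
)

// checkResult is a trimmed stand-in for health.CheckResult.
type checkResult struct {
	Name             string
	Status           status
	AffectsLiveness  bool
	AffectsReadiness bool
}

// evaluate derives the two probe states from the latest results.
func evaluate(results []checkResult) (live, ready bool) {
	live, ready = true, true
	for _, r := range results {
		if r.Status != statusUnhealthy {
			continue // healthy and degraded checks never fail a probe
		}
		if r.AffectsLiveness {
			live, ready = false, false // a liveness failure also fails readiness
		}
		if r.AffectsReadiness {
			ready = false
		}
	}
	return live, ready
}

func main() {
	live, ready := evaluate([]checkResult{
		{Name: "postgres", Status: statusUnhealthy, AffectsReadiness: true},
		{Name: "cache", Status: statusDegraded, AffectsLiveness: true, AffectsReadiness: true},
	})
	fmt.Println(live, ready) // true false
}
```

Here the pod stays alive but stops receiving traffic, which is the "Postgres is down" case described in the README.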

func WithStartupImpact

func WithStartupImpact() AddCheckOption

WithStartupImpact marks a health check as affecting startup probes. Startup checks must all pass before liveness and readiness probes are evaluated. Once all startup checks pass, startup is considered complete and is not re-evaluated.

type AddCheckOptions

type AddCheckOptions struct {
	Frequency        CheckFrequency
	Delay            time.Duration
	Interval         time.Duration
	AffectsLiveness  bool
	AffectsReadiness bool
	AffectsStartup   bool
	Group            string
	ComponentType    string
	DependsOn        []string
}

AddCheckOptions contains the options needed to add a new health check to the manager.

type CachedChecker

type CachedChecker struct {
	// contains filtered or unexported fields
}

CachedChecker wraps a Checker with TTL-based caching. During refresh, stale values are served to concurrent readers. Only one goroutine refreshes at a time (prevents thundering herd on expensive checks).

The first call always executes the underlying checker synchronously.
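
The behavior described above (synchronous first call, stale values served during refresh, a single refresher at a time) can be sketched like this. The `cachedCheck` type is a simplified illustration that caches a string instead of a *CheckResult; it shows one way to get these properties and is not the library's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedCheck caches the result of fn for ttl.
type cachedCheck struct {
	mu         sync.Mutex
	fn         func() string
	ttl        time.Duration
	value      string
	fetchedAt  time.Time
	primed     bool
	refreshing bool
}

func newCachedCheck(fn func() string, ttl time.Duration) *cachedCheck {
	return &cachedCheck{fn: fn, ttl: ttl}
}

// Get returns the cached value while it is fresh. When it has expired, one
// caller refreshes it; callers arriving meanwhile get the stale value.
func (c *cachedCheck) Get() string {
	c.mu.Lock()
	if !c.primed {
		// First call: run synchronously under the lock so nobody sees
		// an empty value.
		c.value = c.fn()
		c.fetchedAt = time.Now()
		c.primed = true
		v := c.value
		c.mu.Unlock()
		return v
	}
	if time.Since(c.fetchedAt) < c.ttl || c.refreshing {
		v := c.value // fresh, or stale while another goroutine refreshes
		c.mu.Unlock()
		return v
	}
	c.refreshing = true
	c.mu.Unlock()

	v := c.fn() // run the expensive check outside the lock

	c.mu.Lock()
	c.value, c.fetchedAt, c.refreshing = v, time.Now(), false
	c.mu.Unlock()
	return v
}

func main() {
	calls := 0
	c := newCachedCheck(func() string { calls++; return "healthy" }, time.Minute)
	a, b := c.Get(), c.Get()
	fmt.Println(a, b, calls) // the second Get is served from cache
}
```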

func WithCache

func WithCache(c Checker, ttl time.Duration) *CachedChecker

WithCache wraps a Checker with TTL-based result caching.

func (*CachedChecker) Check

func (c *CachedChecker) Check(ctx context.Context) *CheckResult

Check returns the cached result if still valid, otherwise refreshes.

type CheckFrequency

type CheckFrequency uint

CheckFrequency is a set of bit flags that control how a check is scheduled.

const (
	// CheckOnce instructs the Checker to perform its check one time. If the
	// CheckAfter flag is set, CheckOnce will perform the check after a duration
	// specified by the desired configuration.
	CheckOnce CheckFrequency = 1 << iota

	// CheckAtInterval instructs the Checker to perform its check at a specified
	// interval. If the CheckAfter flag is set, this check will begin after a
	// lapse of the combined Delay and Interval.
	CheckAtInterval

	// CheckAfter instructs the Checker to wait until after a specified time to
	// perform its check.
	CheckAfter
)
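
Because the constants are declared with `1 << iota`, each one occupies its own bit, so flags can be combined and tested with bitwise operators. The sketch below redeclares the constants locally to show the resulting values; the `has` helper is hypothetical, and whether callers combine flags themselves or the manager sets CheckAfter from the delay argument is not specified in this documentation.

```go
package main

import "fmt"

type checkFrequency uint

const (
	checkOnce       checkFrequency = 1 << iota // 1
	checkAtInterval                            // 2
	checkAfter                                 // 4
)

// has reports whether every bit in flag is set in f.
func (f checkFrequency) has(flag checkFrequency) bool { return f&flag == flag }

func main() {
	f := checkAtInterval | checkAfter
	fmt.Println(uint(checkOnce), uint(checkAtInterval), uint(checkAfter)) // 1 2 4
	fmt.Println(f.has(checkAfter), f.has(checkOnce))                      // true false
}
```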

type CheckResult

type CheckResult struct {
	// Name identifies the check. Set by the manager from the registered check name.
	Name string
	// Status is the health status of this check.
	Status Status
	// AffectsLiveness indicates whether a failing check should affect liveness. Set by manager.
	AffectsLiveness bool
	// AffectsReadiness indicates whether a failing check should affect readiness. Set by manager.
	AffectsReadiness bool
	// AffectsStartup indicates whether this check must pass before startup completes. Set by manager.
	AffectsStartup bool
	// Group is the logical group for this check (e.g., "database", "cache"). Set by manager.
	Group string
	// ComponentType is a type hint for observability tools (e.g., "datastore", "http"). Set by manager.
	ComponentType string
	// DependsOn lists service URLs this check depends on, used by the discovery protocol. Set by manager.
	DependsOn []string
	// Error is the error from the last check execution, if any. Set by checker.
	Error error
	// ErrorSince is when the error state began. Set by checker.
	ErrorSince time.Time
	// Duration is how long the check took to execute. Set by checker.
	Duration time.Duration
	// Metadata is arbitrary key-value data for observability. Set by checker.
	Metadata map[string]string
	// Timestamp is when this check result was produced. Set by checker.
	Timestamp time.Time
}

CheckResult is the outcome of a single health check execution.

Some fields are set by the checker (Status, Error, Duration, Timestamp, Metadata), while others are overridden by the manager from the registered AddCheckOptions (Name, AffectsLiveness, AffectsReadiness, AffectsStartup, Group, ComponentType, DependsOn).

type Checker

type Checker interface {
	// Check runs the health check and returns a check result.
	Check(context.Context) *CheckResult
}

Checker performs an individual health check and returns the result to the health manager.

type CheckerFunc

type CheckerFunc func(context.Context) *CheckResult

CheckerFunc is a functional health checker.

func (CheckerFunc) Check

func (cf CheckerFunc) Check(ctx context.Context) *CheckResult

Check satisfies Checker.
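
The function-adapter pattern behind CheckerFunc can be sketched as follows. The types are trimmed local copies of the declarations documented on this page so the example compiles alone; `fromProbe`, `passing`, `failing`, and `runOnce` are hypothetical helpers that show how a plain `func(ctx) error` could be turned into a Checker that fills in the checker-owned fields.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Status int

const (
	StatusHealthy Status = iota
	StatusDegraded
	StatusUnhealthy
)

// CheckResult is a trimmed copy holding only checker-owned fields.
type CheckResult struct {
	Status    Status
	Error     error
	Duration  time.Duration
	Timestamp time.Time
}

type Checker interface {
	Check(context.Context) *CheckResult
}

// CheckerFunc lets an ordinary function satisfy Checker.
type CheckerFunc func(context.Context) *CheckResult

func (cf CheckerFunc) Check(ctx context.Context) *CheckResult { return cf(ctx) }

// fromProbe wraps a func(ctx) error and records status, error, and timing.
func fromProbe(probe func(context.Context) error) Checker {
	return CheckerFunc(func(ctx context.Context) *CheckResult {
		start := time.Now()
		err := probe(ctx)
		res := &CheckResult{
			Status:    StatusHealthy,
			Error:     err,
			Duration:  time.Since(start),
			Timestamp: time.Now(),
		}
		if err != nil {
			res.Status = StatusUnhealthy
		}
		return res
	})
}

func passing(context.Context) error { return nil }
func failing(context.Context) error { return errors.New("connection refused") }

// runOnce executes a checker with a background context.
func runOnce(c Checker) *CheckResult { return c.Check(context.Background()) }

func main() {
	fmt.Println(runOnce(fromProbe(passing)).Status == StatusHealthy)
	fmt.Println(runOnce(fromProbe(failing)).Error)
}
```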

type Logger

type Logger interface {
	Debug(msg string, args ...any)
	Info(msg string, args ...any)
	Warn(msg string, args ...any)
	Error(msg string, args ...any)
}

Logger defines the logging interface used internally. *slog.Logger (from log/slog) implements this interface.

func DefaultLogger

func DefaultLogger() Logger

DefaultLogger returns the default slog.Logger, which satisfies the Logger interface.

type Manager

type Manager interface {
	// Run the health check manager. Invoking this will initialize all managed
	// checks and reporters. This function returns a read-only channel of errors.
	// If a non-nil error is propagated across this channel, that means the health
	// check manager has entered an unrecoverable state, and the application
	// should halt.
	Run(context.Context) <-chan error

	// Stop the manager and all included checks and reporters. Should be called
	// when an application is shutting down gracefully.
	Stop(context.Context) error

	// AddCheck will add a named health checker to the manager. By default, an
	// added check will run once immediately upon startup, and not affect
	// liveness or readiness. Options are available to set an initial check delay,
	// a check interval, and any effects on liveness or readiness. All added
	// health checks must be named uniquely. Adding a check with the same name
	// as an existing health check (case-insensitive) will overwrite the previous
	// check. Attempting to add a check after the manager is running will return
	// an error.
	AddCheck(name string, c Checker, opts ...AddCheckOption) error

	// AddReporter adds a named health reporter to the manager. Every time a
	// health check is reported, the manager will relay the update to the
	// reporters. All added health reporters must be named uniquely.
	// Adding a reporter with the same name as an existing health reporter
	// (case-insensitive) will overwrite the previous reporter. Attempting to
	// add a reporter after the manager is running will return an error.
	AddReporter(name string, r Reporter) error
}

Manager defines a manager of health checks for the application. A Manager is a running daemon that oversees all the health checks added to it. When a Manager has new health check information, it dispatches an update to its Reporter(s).

type NoOpLogger

type NoOpLogger struct{}

NoOpLogger is used to suppress log output.

func (NoOpLogger) Debug

func (n NoOpLogger) Debug(_ string, _ ...any)

func (NoOpLogger) Error

func (n NoOpLogger) Error(_ string, _ ...any)

func (NoOpLogger) Info

func (n NoOpLogger) Info(_ string, _ ...any)

func (NoOpLogger) Warn

func (n NoOpLogger) Warn(_ string, _ ...any)

type Reporter

type Reporter interface {
	// Run the reporter.
	Run(context.Context) error

	// Stop the reporter and release resources.
	Stop(context.Context) error

	// SetLiveness instructs the reporter to relay the liveness of the
	// application to an external observer.
	SetLiveness(context.Context, bool)

	// SetReadiness instructs the reporter to relay the readiness of the
	// application to an external observer.
	SetReadiness(context.Context, bool)

	// SetStartup instructs the reporter to relay the startup status of the
	// application to an external observer. Startup probes tell Kubernetes
	// that the application has finished initializing.
	SetStartup(context.Context, bool)

	// UpdateHealthChecks is called from the manager to update the reported
	// health checks.
	UpdateHealthChecks(context.Context, map[string]*CheckResult)
}

Reporter reports the health status of the application to a receiving output. The mechanism by which the Reporter sends this information is implementation-dependent. Some reporters, such as an HTTP server, are pull-based, while others, such as a stdout reporter, are push-based. Each reporter variant is responsible for managing the health information passed to it from the health Manager. A Manager may have multiple reporters, and a Reporter may have multiple providers. The common dialog between reporters and providers is a map of CheckResult items keyed by string. It is implied that all health checks within a system are named uniquely. A Reporter must be prepared to receive updates at any time and at any frequency.
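
A pull-based reporter can be as small as a mutex-guarded struct. The sketch below implements the method set of the Reporter interface against a trimmed local `CheckResult` stand-in; `memReporter`, `Snapshot`, and `drive` are hypothetical names, and merging updates into the stored map (rather than replacing it) is an assumption, since this documentation does not say whether the manager sends full snapshots or deltas.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// CheckResult is a trimmed local stand-in for health.CheckResult.
type CheckResult struct {
	Name    string
	Healthy bool
}

// memReporter keeps the latest state in memory for on-demand reads.
type memReporter struct {
	mu                   sync.RWMutex
	live, ready, started bool
	checks               map[string]*CheckResult
}

func (r *memReporter) Run(context.Context) error  { return nil }
func (r *memReporter) Stop(context.Context) error { return nil }

func (r *memReporter) SetLiveness(_ context.Context, v bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.live = v
}

func (r *memReporter) SetReadiness(_ context.Context, v bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ready = v
}

func (r *memReporter) SetStartup(_ context.Context, v bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.started = v
}

// UpdateHealthChecks merges the update into the stored results.
func (r *memReporter) UpdateHealthChecks(_ context.Context, in map[string]*CheckResult) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.checks == nil {
		r.checks = make(map[string]*CheckResult)
	}
	for name, res := range in {
		r.checks[name] = res
	}
}

// Snapshot returns the probe states and the number of known checks.
func (r *memReporter) Snapshot() (live, ready, started bool, n int) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.live, r.ready, r.started, len(r.checks)
}

// drive simulates the calls a manager might make.
func drive() *memReporter {
	ctx := context.Background()
	r := &memReporter{}
	r.SetStartup(ctx, true)
	r.SetLiveness(ctx, true)
	r.SetReadiness(ctx, false)
	r.UpdateHealthChecks(ctx, map[string]*CheckResult{
		"postgres": {Name: "postgres", Healthy: false},
	})
	return r
}

func main() {
	live, ready, started, n := drive().Snapshot()
	fmt.Println(live, ready, started, n)
}
```

The lock matters because, as noted above, a reporter must accept updates at any time and at any frequency, potentially while a reader is taking a snapshot.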

type Status

type Status int

Status represents the health status of a check.

const (
	// StatusHealthy indicates the check is passing.
	StatusHealthy Status = iota

	// StatusDegraded indicates the check is passing but with warnings.
	// Degraded checks do not fail liveness or readiness probes.
	StatusDegraded

	// StatusUnhealthy indicates the check is failing.
	StatusUnhealthy
)

func (Status) String

func (s Status) String() string

String returns the lowercase string representation of a Status.
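
For reference, a Stringer of this shape could look like the sketch below. The "healthy" spelling matches the JSON samples elsewhere on this page; "degraded", "unhealthy", and the "unknown" fallback for out-of-range values are assumptions, and the type is redeclared locally so the example runs on its own.

```go
package main

import "fmt"

type Status int

const (
	StatusHealthy Status = iota
	StatusDegraded
	StatusUnhealthy
)

// String returns a lowercase name for the status.
func (s Status) String() string {
	switch s {
	case StatusHealthy:
		return "healthy"
	case StatusDegraded:
		return "degraded"
	case StatusUnhealthy:
		return "unhealthy"
	default:
		return "unknown" // assumed fallback for values outside the enum
	}
}

func main() {
	for _, s := range []Status{StatusHealthy, StatusDegraded, StatusUnhealthy} {
		fmt.Printf("%d=%s\n", int(s), s) // %s uses the String method
	}
}
```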

Directories

Path                 Synopsis
checker/
  db
  dns
  tcp
e2e/
  cmd/gateway        command
  cmd/orders         command
  cmd/payments       command
examples/
  basic              command
internal/
manager/
  std
reporter/
