healthcheck

package module
v0.1.3
Published: Jan 18, 2020 License: Apache-2.0 Imports: 13 Imported by: 18

README

healthcheck

Healthcheck is a library for implementing Kubernetes liveness and readiness probe handlers in your Go application.

Features

  • Integrates easily with Kubernetes. This library explicitly separates liveness vs. readiness checks instead of lumping everything into a single category of check.

  • Optionally exposes each check as a Prometheus gauge metric. This allows for cluster-wide monitoring and alerting on individual checks.

  • Supports asynchronous checks, which run in a background goroutine at a fixed interval. These are useful for expensive checks that would otherwise add latency to the liveness and readiness endpoints.

  • Includes a small library of generically useful checks for validating upstream DNS, TCP, HTTP, and database dependencies as well as checking basic health of the Go runtime.

Usage

See the GoDoc examples for more detail.

  • Install with go get or your favorite Go dependency manager: go get -u github.com/heptiolabs/healthcheck

  • Import the package: import "github.com/heptiolabs/healthcheck"

  • Create a healthcheck.Handler:

    health := healthcheck.NewHandler()
    
  • Configure some application-specific liveness checks (whether the app itself is unhealthy):

    // Our app is not happy if we've got more than 100 goroutines running.
    health.AddLivenessCheck("goroutine-threshold", healthcheck.GoroutineCountCheck(100))
    
  • Configure some application-specific readiness checks (whether the app is ready to serve requests):

    // Our app is not ready if we can't resolve our upstream dependency in DNS.
    health.AddReadinessCheck(
        "upstream-dep-dns",
        healthcheck.DNSResolveCheck("upstream.example.com", 50*time.Millisecond))
    
    // Our app is not ready if we can't connect to our database (`var db *sql.DB`) in <1s.
    health.AddReadinessCheck("database", healthcheck.DatabasePingCheck(db, 1*time.Second))
    
  • Expose the /live and /ready endpoints over HTTP (on port 8086):

    go http.ListenAndServe("0.0.0.0:8086", health)
    
  • Configure your Kubernetes container with HTTP liveness and readiness probes (see the Kubernetes documentation for more detail):

    # this is a bare bones example
    # copy and paste livenessProbe and readinessProbe as appropriate for your app
    apiVersion: v1
    kind: Pod
    metadata:
      name: heptio-healthcheck-example
    spec:
      containers:
      - name: liveness
        image: your-registry/your-container
    
        # define a liveness probe that checks every 5 seconds, starting after 5 seconds
        livenessProbe:
          httpGet:
            path: /live
            port: 8086
          initialDelaySeconds: 5
          periodSeconds: 5
    
        # define a readiness probe that checks every 5 seconds
        readinessProbe:
          httpGet:
            path: /ready
            port: 8086
          periodSeconds: 5
    
  • If one of your readiness checks fails, Kubernetes will stop routing traffic to that pod within a few seconds (depending on periodSeconds and other factors).

  • If one of your liveness checks fails or your app becomes totally unresponsive, Kubernetes will restart your container.

HTTP Endpoints

When you run go http.ListenAndServe("0.0.0.0:8086", health), two HTTP endpoints are exposed:

  • /live: liveness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)
  • /ready: readiness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)

Pass the ?full=1 query parameter to see the full check results as JSON. These are omitted by default for performance.
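
As a rough illustration of how a client might consume these endpoints, the sketch below requests an endpoint with ?full=1 and decodes the JSON body into a map of check name to result string. It uses only the standard library. The probe and newStubServer names are hypothetical, the stub server merely stands in for a real Handler, and the failure message it returns is invented for the example (the documented output only shows "OK" for a passing check).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probe requests a health endpoint with ?full=1. It reports whether the
// endpoint answered HTTP 200 and returns the per-check results from the body.
func probe(baseURL, path string) (bool, map[string]string, error) {
	resp, err := http.Get(baseURL + path + "?full=1")
	if err != nil {
		return false, nil, err
	}
	defer resp.Body.Close()

	// Assumption: the body is a flat JSON object of check name -> result.
	results := map[string]string{}
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return false, nil, err
	}
	return resp.StatusCode == http.StatusOK, results, nil
}

// newStubServer is a hypothetical stand-in for a server exposing /live and
// /ready. It is not the library's handler; the bodies are made up.
func newStubServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json; charset=utf-8")
		fmt.Fprint(w, `{"goroutine-threshold": "OK"}`)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json; charset=utf-8")
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprint(w, `{"database": "connection refused"}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	for _, path := range []string{"/live", "/ready"} {
		healthy, results, err := probe(srv.URL, path)
		fmt.Println(path, "healthy:", healthy, "results:", results, "err:", err)
	}
}
```

Only the status code matters to Kubernetes; the JSON body is mainly useful to a human or a dashboard when debugging which check failed.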

Documentation

Overview

Package healthcheck helps you implement Kubernetes liveness and readiness checks for your application. It supports synchronous and asynchronous (background) checks. It can optionally report each check's status as a set of Prometheus gauge metrics for cluster-wide monitoring and alerting.

It also includes a small library of generic checks for DNS, TCP, and HTTP reachability as well as Goroutine usage.

Example
// Create a Handler that we can use to register liveness and readiness checks.
health := NewHandler()

// Add a readiness check to make sure an upstream dependency resolves in DNS.
// If this fails we don't want to receive requests, but we shouldn't be
// restarted or rescheduled.
upstreamHost := "upstream.example.com"
_ = health.AddReadinessCheck(
	"upstream-dep-dns",
	DNSResolveCheck(upstreamHost, 50*time.Millisecond))

// Add a liveness check to detect Goroutine leaks. If this fails we want
// to be restarted/rescheduled.
_ = health.AddLivenessCheck("goroutine-threshold", GoroutineCountCheck(100))

// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
// go http.ListenAndServe("0.0.0.0:8080", health)

// Make a request to the readiness endpoint and print the response.
fmt.Print(dumpRequest(health, "GET", "/ready"))
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
Example (Advanced)
// Create a Handler that we can use to register liveness and readiness checks.
health := NewHandler()

// Make sure we can connect to an upstream dependency over TCP in less than
// 50ms. Run this check asynchronously in the background every 10 seconds
// instead of every time the /ready or /live endpoints are hit.
//
// Async is useful whenever a check is expensive (especially if it causes
// load on upstream services).
upstreamAddr := "upstream.example.com:5432"
_ = health.AddReadinessCheck(
	"upstream-dep-tcp",
	Async(TCPDialCheck(upstreamAddr, 50*time.Millisecond), 10*time.Second))

// Add a readiness check against the health of an upstream HTTP dependency
upstreamURL := "http://upstream-svc.example.com:8080/healthy"
_ = health.AddReadinessCheck(
	"upstream-dep-http",
	HTTPGetCheck(upstreamURL, 500*time.Millisecond))

// Implement a custom check with a 50 millisecond timeout.
_ = health.AddLivenessCheck("custom-check-with-timeout", Timeout(func() error {
	// Simulate some work that could take a long time
	time.Sleep(time.Millisecond * 100)
	return nil
}, 50*time.Millisecond))

// Expose the readiness endpoints on a custom path /healthz mixed into
// our main application mux.
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	_, _ = w.Write([]byte("Hello, world!"))
})
mux.HandleFunc("/healthz", health.ReadyEndpoint)

// Sleep for just a moment to make sure our Async handler had a chance to run
time.Sleep(500 * time.Millisecond)

// Make a sample request to the /healthz endpoint and print the response.
fmt.Println(dumpRequest(mux, "GET", "/healthz"))
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
Example (Database)
// Connect to a database/sql database
database := connectToDatabase()

// Create a Handler that we can use to register liveness and readiness checks.
health := NewHandler()

// Add a readiness check so we don't receive requests unless we can reach
// the database with a ping in <1 second.
_ = health.AddReadinessCheck("database", DatabasePingCheck(database, 1*time.Second))

// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
// go http.ListenAndServe("0.0.0.0:8080", health)

// Make a request to the readiness endpoint and print the response.
fmt.Print(dumpRequest(health, "GET", "/ready?full=1"))
Output:

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json; charset=utf-8

{
    "database": "OK"
}
Example (Metrics)
// Create a new Prometheus registry (you'd likely already have one of these).
registry := prometheus.NewRegistry()

// Create a metrics-exposing Handler for the Prometheus registry
// The healthcheck related metrics will be prefixed with the provided namespace
health := NewMetricsHandler(registry, "example")

// Add a simple readiness check that always fails.
_ = health.AddReadinessCheck("failing-check", func() error {
	return fmt.Errorf("example failure")
})

// Add a liveness check that always succeeds
_ = health.AddLivenessCheck("successful-check", func() error {
	return nil
})

// Create an "admin" listener on 0.0.0.0:9402
adminMux := http.NewServeMux()
// go http.ListenAndServe("0.0.0.0:9402", adminMux)

// Expose prometheus metrics on /metrics
adminMux.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))

// Expose a liveness check on /live
adminMux.HandleFunc("/live", health.LiveEndpoint)

// Expose a readiness check on /ready
adminMux.HandleFunc("/ready", health.ReadyEndpoint)

// Make a request to the metrics endpoint and print the response.
fmt.Println(dumpRequest(adminMux, "GET", "/metrics"))
Output:

HTTP/1.1 200 OK
Content-Length: 245
Content-Type: text/plain; version=0.0.4

# HELP example_healthcheck_status Current check status (0 indicates success, 1 indicates failure)
# TYPE example_healthcheck_status gauge
example_healthcheck_status{check="failing-check"} 1
example_healthcheck_status{check="successful-check"} 0

Constants

This section is empty.

Variables

var (
	ErrAlreadyExists = errors.New("healthcheck: already exists")
	ErrNotFound      = errors.New("healthcheck: not found")
)
var ErrNoData = errors.New("no data yet")

ErrNoData is returned if the first call of an Async() wrapped Check has not yet returned.

Functions

This section is empty.

Types

type Check

type Check func() error

Check is a health/readiness check.

func AMQPDialCheck

func AMQPDialCheck(uri string) Check

func Async

func Async(check Check, interval time.Duration) Check

Async converts a Check into an asynchronous check that runs in a background goroutine at a fixed interval. The check is called at a fixed rate, not with a fixed delay between invocations. If your check takes longer than the interval to execute, the next execution will happen immediately.

Note: if you need to clean up the background goroutine, use AsyncWithContext().

func AsyncWithContext

func AsyncWithContext(ctx context.Context, check Check, interval time.Duration) Check

AsyncWithContext converts a Check into an asynchronous check that runs in a background goroutine at a fixed interval. The check is called at a fixed rate, not with a fixed delay between invocations. If your check takes longer than the interval to execute, the next execution will happen immediately.

Note: if you don't need to cancel execution (because the check should run for the lifetime of the process), use Async().

func DNSResolveCheck

func DNSResolveCheck(host string, timeout time.Duration) Check

DNSResolveCheck returns a Check that makes sure the provided host can resolve to at least one IP address within the specified timeout.

func DatabasePingCheck

func DatabasePingCheck(database *sql.DB, timeout time.Duration) Check

DatabasePingCheck returns a Check that validates connectivity to a database/sql.DB using Ping().

func GoroutineCountCheck

func GoroutineCountCheck(threshold int) Check

GoroutineCountCheck returns a Check that fails if too many goroutines are running (which could indicate a resource leak).
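
A check of this kind can be sketched with runtime.NumGoroutine from the standard library. The version below is an illustration only: goroutineCountSketch is a hypothetical name, and treating a count strictly greater than the threshold as a failure is an assumption rather than something the documentation states.

```go
package main

import (
	"fmt"
	"runtime"
)

// Check mirrors the package's `type Check func() error`.
type Check func() error

// goroutineCountSketch fails when the number of live goroutines exceeds
// threshold. The strict ">" comparison is an assumption of this sketch.
func goroutineCountSketch(threshold int) Check {
	return func() error {
		count := runtime.NumGoroutine()
		if count > threshold {
			return fmt.Errorf("too many goroutines (%d > %d)", count, threshold)
		}
		return nil
	}
}

func main() {
	fmt.Println("threshold 100:", goroutineCountSketch(100)())
	fmt.Println("threshold 0:", goroutineCountSketch(0)())
}
```

A threshold that suits one service may be far too low for another, so it is usually worth measuring the steady-state goroutine count before picking a number.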

func HTTPGetCheck

func HTTPGetCheck(url string, timeout time.Duration) Check

HTTPGetCheck returns a Check that performs an HTTP GET request against the specified URL. The check fails if the response times out or returns a non-200 status code.

func TCPDialCheck

func TCPDialCheck(addr string, timeout time.Duration) Check

TCPDialCheck returns a Check that checks TCP connectivity to the provided endpoint.

func Timeout

func Timeout(check Check, timeout time.Duration) Check

Timeout adds a timeout to a Check. If the underlying check takes longer than the timeout, it returns an error.

type Checks

type Checks interface {
	// AddLivenessCheck adds a check that indicates that this instance of the
	// application should be destroyed or restarted. A failed liveness check
	// indicates that this instance is unhealthy, not some upstream dependency.
	// Every liveness check is also included as a readiness check.
	AddLivenessCheck(name string, check Check) error

	// AddReadinessCheck adds a check that indicates that this instance of the
	// application is currently unable to serve requests because of an upstream
	// or some transient failure. If a readiness check fails, this instance
	// should no longer receive requests, but should not be restarted or
	// destroyed.
	AddReadinessCheck(name string, check Check) error

	RemoveLivenessCheck(name string) error

	RemoveReadinessCheck(name string) error
}

type Config

type Config struct {
	Timeout       int    `json:"timeout" yaml:"timeout" mapstructure:"timeout"`
	LivenessName  string `json:"livenessName" yaml:"livenessName" mapstructure:"livenessName"`
	ReadinessName string `json:"readinessName" yaml:"readinessName" mapstructure:"readinessName"`
}

type Endpoints

type Endpoints interface {
	// LiveEndpoint is the HTTP handler for just the /live endpoint, which is
	// useful if you need to attach it into your own HTTP handler tree.
	LiveEndpoint(http.ResponseWriter, *http.Request)

	// ReadyEndpoint is the HTTP handler for just the /ready endpoint, which is
	// useful if you need to attach it into your own HTTP handler tree.
	ReadyEndpoint(http.ResponseWriter, *http.Request)
}

type Handler

type Handler interface {
	// The Handler is an http.Handler, so it can be exposed directly and handle
	// /live and /ready endpoints.
	http.Handler
	Checks
	Endpoints
}

Handler is an http.Handler with additional methods that register liveness and readiness checks. It handles the "/live" and "/ready" HTTP endpoints.

func NewHandler

func NewHandler() Handler

NewHandler creates a new basic Handler.

func NewMetricsHandler

func NewMetricsHandler(registry prometheus.Registerer, namespace string) Handler

NewMetricsHandler returns a healthcheck Handler that also exposes metrics into the provided Prometheus registry.
