healthcheck

Version: v0.0.0-...-4e84b94
Published: Nov 19, 2018 License: Apache-2.0 Imports: 0 Imported by: 0

Healthcheck is a library for implementing Kubernetes liveness and readiness probe handlers in your Go application.

Features

  • Integrates easily with Kubernetes. This library explicitly separates liveness vs. readiness checks instead of lumping everything into a single category of check.

  • Optionally exposes each check as a Prometheus gauge metric. This allows for cluster-wide monitoring and alerting on individual checks.

  • Supports asynchronous checks, which run in a background goroutine at a fixed interval. These are useful for expensive checks that you don't want to add latency to the liveness and readiness endpoints.

  • Includes a small library of generically useful checks for validating upstream DNS, TCP, HTTP, and database dependencies as well as checking basic health of the Go runtime.
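To make the asynchronous-check feature concrete, here is a minimal standard-library sketch of the idea: the expensive check runs on a timer in a background goroutine, and the probe endpoint only reads the cached result. The names `Check`, `asyncCheck`, `demo`, and the `stop` channel are assumptions made for illustration; this is not the library's implementation or API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Check has the same shape as the functions the examples in this README
// register as liveness and readiness checks: a function returning an error.
type Check func() error

// asyncCheck is a hypothetical helper. It runs check once immediately and
// then every interval in a background goroutine, and returns a Check that
// only reports the most recently cached result.
func asyncCheck(check Check, interval time.Duration, stop <-chan struct{}) Check {
	var mu sync.Mutex
	last := errors.New("check has not run yet")

	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			err := check() // the slow work happens here, off the request path
			mu.Lock()
			last = err
			mu.Unlock()
			select {
			case <-ticker.C:
			case <-stop:
				return
			}
		}
	}()

	return func() error {
		mu.Lock()
		defer mu.Unlock()
		return last
	}
}

// demo returns the cached result before and after the first background run.
func demo() (before, after error) {
	stop := make(chan struct{})
	defer close(stop)

	slow := func() error {
		time.Sleep(20 * time.Millisecond) // pretend this is expensive
		return nil
	}
	cached := asyncCheck(slow, 50*time.Millisecond, stop)

	before = cached()
	time.Sleep(100 * time.Millisecond)
	after = cached()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("before first run:", before)
	fmt.Println("after first run:", after)
}
```

In this sketch the check reports an error until the first background run completes, so a pod would not be considered ready before its dependencies have been verified at least once. Whether the library makes the same choice is not stated here.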

Usage

See the GoDoc examples for more detail.

  • Install with go get or your favorite Go dependency manager: go get -u github.com/heptiolabs/healthcheck

  • Import the packages: import "github.com/heptiolabs/healthcheck/checks" and import "github.com/heptiolabs/healthcheck/handlers"

  • Create a healthcheck.Handler:

    health := handlers.NewHandler(handlers.Options{})
    

You can also pass metadata when creating a handler. This metadata is returned by the endpoints:

health := handlers.NewHandler(handlers.Options{
   Metadata: map[string]string{"foo": "bar"},
})

A useful application is to pass the app name, the app version, and the commit hash, so you can tell which commit is making the app unhealthy.
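One way to assemble such metadata, sketched under assumptions: the variables `appName`, `appVersion`, and `commit`, the helper `buildMetadata`, and the key names are all hypothetical. The values could be injected at build time with the Go linker's `-X` flag, and the resulting map passed as the `Metadata` option shown above.

```go
package main

import "fmt"

// Hypothetical build-time variables. They can be overridden when building,
// for example:
//
//	go build -ldflags "-X main.appVersion=1.4.2 -X main.commit=abc1234"
var (
	appName    = "example-service"
	appVersion = "dev"
	commit     = "unknown"
)

// buildMetadata returns a map that could be passed as the Metadata option.
// The key names are illustrative, not required by the library.
func buildMetadata() map[string]string {
	return map[string]string{
		"app_name":    appName,
		"app_version": appVersion,
		"commit":      commit,
	}
}

func main() {
	// fmt prints map keys in sorted order, which keeps the output stable.
	fmt.Println(buildMetadata())
}
```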

  • Configure some application-specific liveness checks (whether the app itself is unhealthy):

    // Our app is not happy if we've got more than 100 goroutines running.
    health.AddLivenessCheck("goroutine-threshold", healthcheck.GoroutineCountCheck(100))
    
  • Configure some application-specific readiness checks (whether the app is ready to serve requests):

    // Our app is not ready if we can't resolve our upstream dependency in DNS.
    health.AddReadinessCheck(
        "upstream-dep-dns",
        healthcheck.DNSResolveCheck("upstream.example.com", 50*time.Millisecond))
    
    // Our app is not ready if we can't connect to our database (`var db *sql.DB`) in <1s.
    health.AddReadinessCheck("database", healthcheck.DatabasePingCheck(db, 1*time.Second))
    
  • Expose the /live and /ready endpoints over HTTP (on port 8086):

    go http.ListenAndServe("0.0.0.0:8086", health)
    
  • Configure your Kubernetes container with HTTP liveness and readiness probes (see the Kubernetes documentation for more detail):

    # this is a bare bones example
    # copy and paste livenessProbe and readinessProbe as appropriate for your app
    apiVersion: v1
    kind: Pod
    metadata:
      name: heptio-healthcheck-example
    spec:
      containers:
      - name: liveness
        image: your-registry/your-container
    
        # define a liveness probe that checks every 5 seconds, starting after 5 seconds
        livenessProbe:
          httpGet:
            path: /live
            port: 8086
          initialDelaySeconds: 5
          periodSeconds: 5
    
        # define a readiness probe that checks every 5 seconds
        readinessProbe:
          httpGet:
            path: /ready
            port: 8086
          periodSeconds: 5
    
  • If one of your readiness checks fails, Kubernetes will stop routing traffic to that pod within a few seconds (depending on periodSeconds and other factors).

  • If one of your liveness checks fails or your app becomes totally unresponsive, Kubernetes will restart your container.
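The contract between the checks and the Kubernetes probes can be sketched with the standard library alone. The code below is a simplified stand-in, not the library's handler: `checkHandler`, `probe`, and `errDown` are hypothetical names. It returns HTTP 200 when every check passes and HTTP 503 otherwise, which is what an HTTP probe acts on (Kubernetes treats any status from 200 up to but not including 400 as success).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errDown is a hypothetical failure used by the demo below.
var errDown = errors.New("dependency down")

// checkHandler is a simplified stand-in for a health handler (an assumption
// for illustration): 200 if every check passes, 503 if any check fails.
func checkHandler(checks map[string]func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		status := http.StatusOK
		for _, check := range checks {
			if err := check(); err != nil {
				status = http.StatusServiceUnavailable
			}
		}
		w.WriteHeader(status)
	}
}

// probe performs one in-memory GET against h and returns the status code,
// which is all an HTTP probe needs to tell success from failure.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	healthy := checkHandler(map[string]func() error{
		"always-ok": func() error { return nil },
	})
	unhealthy := checkHandler(map[string]func() error{
		"always-ok":   func() error { return nil },
		"always-fail": func() error { return errDown },
	})
	fmt.Println(probe(healthy, "/ready"), probe(unhealthy, "/ready")) // prints "200 503"
}
```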

HTTP Endpoints

Default routes

When you run go http.ListenAndServe("0.0.0.0:8086", health), two HTTP endpoints are exposed:

  • /live: liveness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)
  • /ready: readiness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)

Custom routes

You can use routes other than /live and /ready by setting the HEALTH_LIVENESS_ROUTE and/or HEALTH_READINESS_ROUTE environment variables for your application.
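If your own code needs to know which paths are in effect (for logging, or to keep a custom mux in sync), a lookup with a fallback is enough. This is a sketch under assumptions: `resolveRoute` is a hypothetical helper, and treating an empty value as unset is a choice made here, not documented library behavior.

```go
package main

import (
	"fmt"
	"os"
)

// resolveRoute returns the value reported by lookup for key, or fallback
// when the variable is unset or empty. Passing the lookup function in makes
// the helper easy to test without touching the real environment.
func resolveRoute(lookup func(string) (string, bool), key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	live := resolveRoute(os.LookupEnv, "HEALTH_LIVENESS_ROUTE", "/live")
	ready := resolveRoute(os.LookupEnv, "HEALTH_READINESS_ROUTE", "/ready")
	fmt.Println("liveness route:", live)
	fmt.Println("readiness route:", ready)
}
```

Remember to update the `path` fields of the livenessProbe and readinessProbe in your pod spec to match whatever routes you configure.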

Endpoint response

Pass the ?full=1 query parameter to see the full check results as JSON. These are omitted by default for performance.

The JSON result looks like this:

{
  "Checks": {
    "test-readiness-check": "failed readiness check",
    "redis-check":  "error message from check"
  },
  "Metadata": {
    "some fake metadata": "fake value",
    "app_name":  "fake service name"
  }
}
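A client such as a debugging script or dashboard could decode this response with `encoding/json`. The sketch below assumes only the two top-level keys shown in the sample; the `fullResponse` type and `parseFullResponse` helper are hypothetical names, not part of the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fullResponse mirrors the two top-level keys of the sample response.
type fullResponse struct {
	Checks   map[string]string
	Metadata map[string]string
}

// parseFullResponse decodes the body returned when ?full=1 is passed.
func parseFullResponse(body []byte) (fullResponse, error) {
	var r fullResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// sample is the response shown above, copied verbatim.
const sample = `{
  "Checks": {
    "test-readiness-check": "failed readiness check",
    "redis-check":  "error message from check"
  },
  "Metadata": {
    "some fake metadata": "fake value",
    "app_name":  "fake service name"
  }
}`

func main() {
	r, err := parseFullResponse([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("checks:", r.Checks)
	fmt.Println("metadata:", r.Metadata)
}
```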

Documentation

Overview

Package healthcheck helps you implement Kubernetes liveness and readiness checks for your application. It supports synchronous and asynchronous (background) checks. It can optionally report each check's status as a set of Prometheus gauge metrics for cluster-wide monitoring and alerting.

It also includes a small library of generic checks for DNS, TCP, HTTP, and database reachability, as well as goroutine usage.

Example
// Create a Handler that we can use to register liveness and readiness checks.
health := handlers.NewHandler(handlers.Options{})

// Add a readiness check to make sure an upstream dependency resolves in DNS.
// If this fails we don't want to receive requests, but we shouldn't be
// restarted or rescheduled.
upstreamHost := "upstream.example.com"
health.AddReadinessCheck(
	"upstream-dep-dns",
	healthcheck.DNSResolveCheck(upstreamHost, 50*time.Millisecond))

// Add a liveness check to detect Goroutine leaks. If this fails we want
// to be restarted/rescheduled.
health.AddLivenessCheck("goroutine-threshold", healthcheck.GoroutineCountCheck(100))

// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
// go http.ListenAndServe("0.0.0.0:8080", health)

// Make a request to the readiness endpoint and print the response.
fmt.Print(dumpRequest(health, "GET", "/ready"))
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
Example (Advanced)
// Create a Handler that we can use to register liveness and readiness checks.
health := handlers.NewHandler(handlers.Options{})

// Make sure we can connect to an upstream dependency over TCP in less than
// 50ms. Run this check asynchronously in the background every 10 seconds
// instead of every time the /ready or /live endpoints are hit.
//
// Async is useful whenever a check is expensive (especially if it causes
// load on upstream services).
upstreamAddr := "upstream.example.com:5432"
health.AddReadinessCheck(
	"upstream-dep-tcp",
	healthcheck.Async(healthcheck.TCPDialCheck(upstreamAddr, 50*time.Millisecond), 10*time.Second))

// Add a readiness check against the health of an upstream HTTP dependency
upstreamURL := "http://upstream-svc.example.com:8080/healthy"
health.AddReadinessCheck(
	"upstream-dep-http",
	healthcheck.HTTPGetCheck(upstreamURL, 500*time.Millisecond))

// Implement a custom check with a 50 millisecond timeout.
health.AddLivenessCheck("custom-check-with-timeout", healthcheck.Timeout(func() error {
	// Simulate some work that could take a long time
	time.Sleep(time.Millisecond * 100)
	return nil
}, 50*time.Millisecond))

// Expose the readiness endpoints on a custom path /healthz mixed into
// our main application mux.
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Hello, world!"))
})
mux.HandleFunc("/healthz", health.ReadyEndpoint)

// Sleep for just a moment to make sure our Async handlers had a chance to run
time.Sleep(500 * time.Millisecond)

// Make a sample request to the /healthz endpoint and print the response.
fmt.Println(dumpRequest(mux, "GET", "/healthz"))
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
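The custom check in the example above sleeps for 100 ms under a 50 ms limit, so it should report a timeout. A timeout wrapper of this kind can be sketched with a goroutine and `select`. This is an illustrative standard-library version, not the library's implementation; `withTimeout`, `errTimeout`, `limit`, `slowCheck`, and `fastCheck` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when a check does not finish in time.
var errTimeout = errors.New("check timed out")

// limit is the timeout used by the demo below.
const limit = 50 * time.Millisecond

// withTimeout wraps check so that the returned function gives up waiting
// after timeout and reports errTimeout instead.
func withTimeout(check func() error, timeout time.Duration) func() error {
	return func() error {
		// Buffered so the goroutine can always deliver its result and
		// exit, even if nobody is waiting any more.
		done := make(chan error, 1)
		go func() { done <- check() }()

		select {
		case err := <-done:
			return err
		case <-time.After(timeout):
			return errTimeout
		}
	}
}

// slowCheck simulates work that takes longer than the limit.
func slowCheck() error {
	time.Sleep(100 * time.Millisecond)
	return nil
}

// fastCheck returns immediately.
func fastCheck() error { return nil }

func main() {
	fmt.Println("slow:", withTimeout(slowCheck, limit)())
	fmt.Println("fast:", withTimeout(fastCheck, limit)())
}
```

Note that in this sketch a timed-out check keeps running in its goroutine until it returns on its own; the wrapper only stops waiting for it.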
Example (Database)
// Connect to a database/sql database
var database *sql.DB
database = connectToDatabase()

// Create a Handler that we can use to register liveness and readiness checks.
// Add some metadata on it
health := handlers.NewHandler(handlers.Options{
	Metadata: map[string]string{"foo": "bar"},
})

// Add a readiness check so we don't receive requests unless we can reach
// the database with a ping in <1 second.
health.AddReadinessCheck("database", healthcheck.DatabasePingCheck(database, 1*time.Second))

// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
// go http.ListenAndServe("0.0.0.0:8080", health)

// Make a request to the readiness endpoint and print the response.
fmt.Print(dumpRequest(health, "GET", "/ready?full=1"))
Output:

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json; charset=utf-8

{
    "Checks": {
        "database": "OK"
    },
    "Metadata": {
        "foo": "bar"
    }
}
Example (Metrics)
// Create a new Prometheus registry (you'd likely already have one of these).
registry := prometheus.NewRegistry()

// Create a metrics-exposing Handler for the Prometheus registry
// The healthcheck related metrics will be prefixed with the provided namespace
health := handlers.NewMetricsHandler(registry, "example", handlers.Options{})

// Add a simple readiness check that always fails.
health.AddReadinessCheck("failing-check", func() error {
	return fmt.Errorf("example failure")
})

// Add a liveness check that always succeeds
health.AddLivenessCheck("successful-check", func() error {
	return nil
})

// Create an "admin" listener on 0.0.0.0:9402
adminMux := http.NewServeMux()
// go http.ListenAndServe("0.0.0.0:9402", adminMux)

// Expose prometheus metrics on /metrics
adminMux.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))

// Expose a liveness check on /live
adminMux.HandleFunc("/live", health.LiveEndpoint)

// Expose a readiness check on /ready
adminMux.HandleFunc("/ready", health.ReadyEndpoint)

// Make a request to the metrics endpoint and print the response.
fmt.Println(dumpRequest(adminMux, "GET", "/metrics"))
Output:

HTTP/1.1 200 OK
Content-Length: 245
Content-Type: text/plain; version=0.0.4; charset=utf-8

# HELP example_healthcheck_status Current check status (0 indicates success, 1 indicates failure)
# TYPE example_healthcheck_status gauge
example_healthcheck_status{check="failing-check"} 1
example_healthcheck_status{check="successful-check"} 0
