platform-health

module
v0.9.3
Published: Apr 23, 2026 License: BSD-3-Clause

README

Platform Health

Lightweight & extensible platform health monitoring.

Overview

Platform Health is a simple client/server system for lightweight health monitoring of platform components and systems.

The Platform Health client (ph client) sends a gRPC health check request to a Platform Health server which is configured to probe a set of network services. Probes run asynchronously on the server (subject to configurable timeouts), with the accumulated response returned to the client.
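
The server-side behaviour described above (probes run concurrently, each bounded by a timeout, with the results gathered into one response) can be illustrated with a minimal Go sketch. The `Probe` and `Result` types, `runProbes`, `overall`, and the `UNHEALTHY` label are hypothetical names chosen for illustration and are not taken from the project's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Probe is a hypothetical stand-in for one configured health check.
type Probe struct {
	Name string
	Run  func(ctx context.Context) error
}

// Result records the outcome of a single probe.
type Result struct {
	Name    string
	Healthy bool
	Detail  string
}

// runProbes starts every probe in its own goroutine, bounds each one with the
// same timeout, and returns the results in the order the probes were given.
// Probes are expected to return once their context is done.
func runProbes(parent context.Context, timeout time.Duration, probes []Probe) []Result {
	results := make([]Result, len(probes))
	var wg sync.WaitGroup
	for i, p := range probes {
		wg.Add(1)
		go func(i int, p Probe) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(parent, timeout)
			defer cancel()
			res := Result{Name: p.Name, Healthy: true}
			if err := p.Run(ctx); err != nil {
				res.Healthy = false
				res.Detail = err.Error()
			}
			results[i] = res // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()
	return results
}

// overall reports HEALTHY only if every result is healthy.
// The UNHEALTHY label is an assumption made for this sketch.
func overall(results []Result) string {
	for _, r := range results {
		if !r.Healthy {
			return "UNHEALTHY"
		}
	}
	return "HEALTHY"
}

// demoProbes runs one probe that succeeds immediately and one that blocks
// until its timeout expires, simulating an unresponsive service.
func demoProbes() []Result {
	probes := []Probe{
		{Name: "fast", Run: func(context.Context) error { return nil }},
		{Name: "slow", Run: func(ctx context.Context) error {
			<-ctx.Done()
			return ctx.Err()
		}},
	}
	return runProbes(context.Background(), 50*time.Millisecond, probes)
}

func main() {
	results := demoProbes()
	for _, r := range results {
		fmt.Printf("%s healthy=%t %s\n", r.Name, r.Healthy, r.Detail)
	}
	fmt.Println("overall:", overall(results))
}
```

In this sketch a slow probe cannot delay the response beyond the timeout, provided the probe itself honours context cancellation.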

Providers

Probes use a compile-time provider plugin system that can be extended to monitor arbitrary services. Integrated providers include:

  • system: Hierarchical grouping of related health checks with status aggregation
  • satellite: A separate satellite instance of the Platform Health server
  • ssh: SSH protocol handshake with host key verification
  • tcp: TCP connectivity checks
  • tls: TLS handshake and certificate verification
  • http: HTTP(S) health checks with CEL-based response validation, full REST/GraphQL API support, and TLS details
  • grpc: gRPC Health v1 service status checks
  • kubernetes: Kubernetes resource existence and readiness
  • helm: Helm release existence and deployment status
  • vault: Vault cluster initialization and seal status

Each provider implements the Instance interface. The health of each instance is obtained asynchronously and contributes to the overall response.
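
One common way to build a compile-time plugin system in Go is a package-level registry that providers populate from `init()`. The sketch below assumes that pattern purely for illustration: the `Instance` method set shown here, the `Register` function, and the `tcpInstance` type are hypothetical, and the project's real interface may look different.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"time"
)

// Instance is a hypothetical, simplified provider contract.
// It is NOT the project's actual interface definition.
type Instance interface {
	Type() string
	Check(ctx context.Context) error
}

// registry maps a provider type name to a constructor taking the "spec" values.
var registry = map[string]func(spec map[string]string) Instance{}

// Register adds a provider constructor; providers call it from init(),
// so the set of providers is fixed when the binary is compiled.
func Register(name string, ctor func(spec map[string]string) Instance) {
	registry[name] = ctor
}

// tcpInstance is an illustrative provider that checks TCP connectivity.
type tcpInstance struct{ addr string }

func (t tcpInstance) Type() string { return "tcp" }

func (t tcpInstance) Check(ctx context.Context) error {
	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", t.addr)
	if err != nil {
		return err
	}
	return conn.Close()
}

func init() {
	Register("tcp", func(spec map[string]string) Instance {
		return tcpInstance{addr: net.JoinHostPort(spec["host"], spec["port"])}
	})
}

// providerTypes lists the registered provider names in sorted order.
func providerTypes() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// A local listener gives the demo something to connect to.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	host, port, _ := net.SplitHostPort(ln.Addr().String())

	inst := registry["tcp"](map[string]string{"host": host, "port": port})
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(providerTypes(), inst.Type(), inst.Check(ctx))
}
```

With this layout, adding a provider means adding a package whose `init()` calls `Register`; nothing is loaded dynamically at runtime.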

Installation

macOS/Linux
brew install isometry/tap/platform-health
$ ph server -l & sleep 1 && ph client && kill %1
{"status":"HEALTHY", "duration":"0.000004833s"}

Kubernetes

Install via helm chart
helm upgrade \
    --install platform-health \
    -n platform-health --create-namespace \
    oci://ghcr.io/isometry/charts/platform-health

Install via kubectl
kubectl create configmap platform-health --from-file=platform-health.yaml=/dev/stdin <<-EOF
components:
  ssh@localhost:
    type: tcp
    spec:
      host: localhost
      port: 22
  gmail:
    type: tls
    spec:
      host: smtp.gmail.com
      port: 465
  google:
    type: http
    spec:
      url: https://google.com
EOF

kubectl create deployment platform-health --image ghcr.io/isometry/platform-health:latest --port=8080

kubectl patch deployment platform-health --patch-file=/dev/stdin <<-EOF
  spec:
    template:
      spec:
        volumes:
          - name: config
            configMap:
              name: platform-health
        containers:
          - name: platform-health
            args:
              - -vv
            volumeMounts:
              - name: config
                mountPath: /config
EOF

kubectl create service loadbalancer platform-health --tcp=8080:8080

Usage

Client
# Check all components
ph client

# Check specific components
ph client -c google -c github

# Check with hierarchical path (system/component)
ph client -c fluxcd/source-controller

# Connect to remote server
ph client prod:8080 -c google

One-Shot Mode

Run health checks once and exit without starting a server:

ph check

# Check specific components only
ph check -c google -c fluxcd/source-controller

This is useful for:

  • Validating configuration files
  • Local health check verification
  • CI/CD pipeline integration
  • Testing specific components

Ad-hoc Checks

Create and run health checks without a configuration file:

# TCP connectivity check
ph check tcp --host example.com --port 443

# HTTP health check
ph check http --url https://api.example.com/health

# HTTP check with CEL expression
ph check http --url https://api.example.com/health \
  --check 'response.status == 200'

# TLS certificate check
ph check tls --host example.com --port 443

Context Inspection

Inspect the CEL evaluation context for debugging expressions:

# View context for a configured component
ph context my-app

# View context for nested system components
ph context fluxcd/source-controller

# View context for ad-hoc provider
ph context http --url https://api.example.com/health

Configuration

The Platform Health server reads configuration from a YAML file. By default, it searches for platform-health.yaml in standard config paths (. and /config).

You can customize this with:

  • --config-path: Override config file search paths (can be specified multiple times)
  • --config-name: Change the config file name (without extension)

# Use custom config file
ph server --config-name myconfig

# Add search paths
ph server --config-path /custom/path --config-path ./local
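
The lookup implied by these options can be sketched as a loop over the search paths that returns the first matching file. This is only an illustration under stated assumptions: `findConfig` and `demoFind` are hypothetical names, the `.yaml` extension is the only one tried, and "first match wins" is an assumed rule rather than documented behaviour.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findConfig returns the first existing "<name>.yaml" found in the given
// search paths, checked in order. The ordering rule is an assumption.
func findConfig(paths []string, name string) (string, bool) {
	for _, dir := range paths {
		candidate := filepath.Join(dir, name+".yaml")
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate, true
		}
	}
	return "", false
}

// demoFind creates a temporary directory containing myconfig.yaml, searches a
// missing directory followed by the temporary one, and returns the base name
// of the file that was found.
func demoFind() (string, error) {
	dir, err := os.MkdirTemp("", "ph-config")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "myconfig.yaml")
	if err := os.WriteFile(file, []byte("components: {}\n"), 0o600); err != nil {
		return "", err
	}

	missing := filepath.Join(dir, "does-not-exist")
	found, ok := findConfig([]string{missing, dir}, "myconfig")
	if !ok {
		return "", fmt.Errorf("config not found")
	}
	return filepath.Base(found), nil
}

func main() {
	name, err := demoFind()
	fmt.Println(name, err)
}
```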
Configuration Structure

All health check components are defined under the components key:

components:
  <component-name>:
    type: <provider-type>      # required
    spec:                      # provider-specific configuration
      <key>: <value>
    checks: [...]              # optional CEL expressions
    timeout: <duration>        # optional per-instance timeout
    order: <int>               # optional execution order (default: 0)
    always: <bool>             # optional always-execute flag (default: false)
    components: {...}          # optional nested children (system provider)

Component names can contain any characters valid in YAML keys, but should avoid /, which is used as the separator in path-filtered queries. The type field specifies which provider to use, and provider-specific configuration goes under spec.
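
To see why a / in a component name is a problem, consider a minimal sketch of path resolution. The `Component` type and `resolve` function below are hypothetical and only assume that a path is split on / and matched level by level; the `team/api` component is a made-up example.

```go
package main

import (
	"fmt"
	"strings"
)

// Component is a hypothetical, pared-down view of one configured component.
type Component struct {
	Type       string
	Components map[string]*Component // nested children (system provider)
}

// resolve walks a "/"-separated path such as "fluxcd/source-controller"
// through nested components and returns the component it names.
func resolve(root map[string]*Component, path string) (*Component, error) {
	level := root
	var found *Component
	for _, name := range strings.Split(path, "/") {
		c, ok := level[name]
		if !ok {
			return nil, fmt.Errorf("component %q not found in path %q", name, path)
		}
		found = c
		level = c.Components
	}
	return found, nil
}

// exampleConfig mirrors a nested system plus a badly named top-level component.
func exampleConfig() map[string]*Component {
	return map[string]*Component{
		"fluxcd": {
			Type: "system",
			Components: map[string]*Component{
				"source-controller": {Type: "kubernetes"},
			},
		},
		// The "/" in this name makes it unreachable by path lookup.
		"team/api": {Type: "http"},
	}
}

func main() {
	cfg := exampleConfig()
	for _, p := range []string{"fluxcd/source-controller", "team/api"} {
		c, err := resolve(cfg, p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p, "->", c.Type)
	}
}
```

Under these assumptions, `team/api` is interpreted as component `api` inside a system named `team`, so the lookup fails even though a component with that exact name exists.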

Example

The following configuration verifies that something is listening on tcp/22 of localhost; validates connectivity and the TLS handshake to the Gmail SSL mail-submission port; validates that Google is accessible and returns a 200 status code; and validates that an example API health endpoint returns HTTP 200 with a JSON status of "healthy":

components:
  ssh@localhost:
    type: tcp
    spec:
      host: localhost
      port: 22
  gmail:
    type: tls
    spec:
      host: smtp.gmail.com
      port: 465
  google:
    type: http
    spec:
      url: https://google.com
  api-health:
    type: http
    spec:
      url: https://api.example.com/health
      method: GET
    checks:
      - check: 'response.status == 200'
        message: "Expected HTTP 200"
      - check: 'response.json.status == "healthy"'
        message: "Service unhealthy"

Hierarchical Grouping

Use the system provider to group related checks:

components:
  fluxcd:
    type: system
    components:
      source-controller:
        type: kubernetes
        spec:
          kind: Deployment
          namespace: flux-system
          name: source-controller
      kustomize-controller:
        type: kubernetes
        spec:
          kind: Deployment
          namespace: flux-system
          name: kustomize-controller

The system is reported "healthy" only if all sub-components are healthy.
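
The aggregation rule can be written as a short recursive function. The sketch below uses hypothetical `Node` and `Status` types and models only two states; the real provider may distinguish more.

```go
package main

import "fmt"

// Status is a simplified two-state health value used only in this sketch.
type Status int

const (
	Healthy Status = iota
	Unhealthy
)

func (s Status) String() string {
	if s == Healthy {
		return "HEALTHY"
	}
	return "UNHEALTHY"
}

// Node is a hypothetical result tree: leaves carry their own status,
// systems carry children.
type Node struct {
	Name     string
	Status   Status // meaningful for leaves only
	Children []*Node
}

// aggregate returns Healthy only if every leaf beneath n is healthy.
// A node without children is treated as a leaf and reports its own status.
func aggregate(n *Node) Status {
	if len(n.Children) == 0 {
		return n.Status
	}
	for _, c := range n.Children {
		if aggregate(c) != Healthy {
			return Unhealthy
		}
	}
	return Healthy
}

func main() {
	fluxcd := &Node{Name: "fluxcd", Children: []*Node{
		{Name: "source-controller", Status: Healthy},
		{Name: "kustomize-controller", Status: Unhealthy},
	}}
	fmt.Println(fluxcd.Name, aggregate(fluxcd))
}
```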

Execution Ordering

By default, all instances execute in parallel (order 0). Use the order field to group instances into sequential execution waves: lower values run first, and instances within the same order group run in parallel.

The always flag marks an instance to execute even after fail-fast cancellation. Always-flagged instances use the parent context rather than the errgroup's, so they survive cancellation. They also don't trigger fail-fast themselves.

components:
  # Order 0 (default): runs first, in parallel with other order-0 checks
  api-health:
    type: http
    spec:
      url: https://api.example.com/health

  database:
    type: tcp
    spec:
      host: db.example.com
      port: 5432

  # Order 1: runs after all order-0 checks complete
  # always: true ensures this runs even if earlier checks trigger fail-fast
  istio-quit:
    type: http
    order: 1
    always: true
    spec:
      url: http://localhost:15020/quitquitquit
      method: POST
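
The wave and fail-fast semantics described in this section can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the project's implementation: it uses a cancellable context and `sync.WaitGroup` in place of an errgroup, and `Instance`, `runWaves`, `demo`, and the `cache-warm` component are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Instance is a hypothetical stand-in for a configured component.
type Instance struct {
	Name   string
	Order  int
	Always bool
	Run    func(ctx context.Context) error
}

// runWaves executes instances in ascending Order; instances sharing an Order
// run concurrently. The first failure of a non-Always instance cancels the
// fail-fast context, and later waves then skip their non-Always instances.
// It returns the sorted names of the instances that were started.
func runWaves(parent context.Context, instances []Instance) []string {
	byOrder := map[int][]Instance{}
	for _, in := range instances {
		byOrder[in.Order] = append(byOrder[in.Order], in)
	}
	orders := make([]int, 0, len(byOrder))
	for o := range byOrder {
		orders = append(orders, o)
	}
	sort.Ints(orders)

	failFast, cancel := context.WithCancel(parent)
	defer cancel()

	var started []string
	for _, o := range orders {
		var wg sync.WaitGroup
		for _, in := range byOrder[o] {
			ctx := failFast
			if in.Always {
				ctx = parent // survives fail-fast cancellation
			} else if failFast.Err() != nil {
				continue // cancelled by an earlier failure
			}
			started = append(started, in.Name)
			wg.Add(1)
			go func(in Instance, ctx context.Context) {
				defer wg.Done()
				if err := in.Run(ctx); err != nil && !in.Always {
					cancel() // only non-Always instances trigger fail-fast
				}
			}(in, ctx)
		}
		wg.Wait() // a wave finishes before the next one begins
	}
	sort.Strings(started)
	return started
}

// demo fails one order-0 check and shows which order-1 checks still run.
func demo() []string {
	ok := func(context.Context) error { return nil }
	fail := func(context.Context) error { return errors.New("connection refused") }
	return runWaves(context.Background(), []Instance{
		{Name: "api-health", Run: ok},
		{Name: "database", Run: fail},
		{Name: "cache-warm", Order: 1, Run: ok},
		{Name: "istio-quit", Order: 1, Always: true, Run: ok},
	})
}

func main() {
	fmt.Println(demo())
}
```

In the demo, the failing `database` check cancels the fail-fast context, so `cache-warm` is skipped while the always-flagged `istio-quit` still runs.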
CEL Expressions

Several providers support CEL (Common Expression Language) expressions for custom health check validation:

  • http: HTTP request and response details with JSON parsing for REST/GraphQL API validation
  • tls: TLS connection and certificate details
  • ssh: SSH host key and connection details
  • kubernetes: Full resource(s), including metadata, spec, status, etc.
  • helm: Release info, chart metadata, values and manifests

Use ph context to inspect the evaluation context available to your expressions. See pkg/checks/README.md for CEL syntax examples and patterns.
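
The check/message pairs used in the configuration examples follow a simple pattern: evaluate each expression against a context and report the message of every expression that is false. The sketch below shows only that pattern. It deliberately substitutes plain Go predicates for compiled CEL programs, and `Response`, `Check`, and `failures` are hypothetical names.

```go
package main

import "fmt"

// Response is a hypothetical, reduced evaluation context for an HTTP check.
type Response struct {
	Status int
	JSON   map[string]any
}

// Check pairs a predicate with the message reported when it is false.
// Pass is a plain Go function standing in for a compiled CEL expression;
// Expr holds the CEL source for display only and is never evaluated here.
type Check struct {
	Expr    string
	Pass    func(r Response) bool
	Message string
}

// failures returns the message of every check that does not pass.
func failures(r Response, checks []Check) []string {
	var msgs []string
	for _, c := range checks {
		if !c.Pass(r) {
			msgs = append(msgs, c.Message)
		}
	}
	return msgs
}

// exampleChecks mirrors the two checks of the api-health example component.
func exampleChecks() []Check {
	return []Check{
		{
			Expr:    `response.status == 200`,
			Pass:    func(r Response) bool { return r.Status == 200 },
			Message: "Expected HTTP 200",
		},
		{
			Expr:    `response.json.status == "healthy"`,
			Pass:    func(r Response) bool { return r.JSON["status"] == "healthy" },
			Message: "Service unhealthy",
		},
	}
}

func main() {
	degraded := Response{Status: 503, JSON: map[string]any{"status": "degraded"}}
	for _, msg := range failures(degraded, exampleChecks()) {
		fmt.Println("check failed:", msg)
	}
}
```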

Directories

  • cmd/ph: command
  • cmd/phc: command
  • cmd/phs: command
  • internal/cliflags: Package cliflags provides reusable flag definitions for CLI commands.
  • pkg/checks: Package checks provides shared CEL (Common Expression Language) evaluation capabilities for health check providers.
  • pkg/checks/functions: Package functions provides custom CEL functions for health check expressions.
  • pkg/commands/context: Package context provides the `ph context` command for inspecting CEL evaluation contexts.
  • pkg/commands/shared: Package shared provides common utilities for provider-based commands.
  • pkg/netutil: Package netutil provides network-related utility functions.
  • pkg/provider/http: Package http provides an HTTP health check provider with response validation capabilities using CEL (Common Expression Language).
  • pkg/provider/kubernetes: Code generated by go generate; DO NOT EDIT.
  • pkg/provider/kubernetes/testutil: Package testutil provides test utilities for the kubernetes provider.
  • pkg/provider/ssh: Package ssh provides an SSH protocol health check provider.
