Documentation ¶
Index ¶
- Constants
- type Application
- type ApplicationOption
- type Component
- type ComponentOption
- type ComponentStatus
- type EventHooks
- type HealthChecker
- type HealthReporter
- type HealthServer
- type HealthServerOption
- type Logger
- type MetricsObserver
- type NamedComponentStatus
- type RestartPolicy
- type Supervisor
- func (s *Supervisor) Add(c Component, opts ...ComponentOption)
- func (s *Supervisor) ComponentHealth(name string) (err error, known bool)
- func (s *Supervisor) HealthReport() map[string]ComponentStatus
- func (s *Supervisor) HealthReportOrdered() []NamedComponentStatus
- func (s *Supervisor) Run(ctx context.Context) error
- type SupervisorOption
- func WithEventHooks(h *EventHooks) SupervisorOption
- func WithHealthInterval(d time.Duration) SupervisorOption
- func WithHealthTimeout(d time.Duration) SupervisorOption
- func WithMetricsObserver(m MetricsObserver) SupervisorOption
- func WithRestartResetWindow(d time.Duration) SupervisorOption
- func WithStartTimeout(d time.Duration) SupervisorOption
- func WithStopTimeout(d time.Duration) SupervisorOption
- func WithSupervisorLogger(l Logger) SupervisorOption
- type Tier
Constants ¶
const (
	// ErrNothingToRun is returned by Application.Run when neither a MainFunc
	// nor a Supervisor was provided.
	ErrNothingToRun appError = "samsara: nothing to run (no main function or supervisor provided)"

	// ErrShutdownTimeout is returned when the application does not stop within
	// the configured ShutdownTimeout after the context is cancelled.
	ErrShutdownTimeout appError = "samsara: shutdown timeout exceeded"

	// ErrComponentAlreadyRegistered is returned when a component with the same
	// name is added to the Supervisor more than once.
	ErrComponentAlreadyRegistered appError = "samsara: component already registered"

	// ErrCircularDependency is returned when the Supervisor detects a cycle in
	// the component dependency graph.
	ErrCircularDependency appError = "samsara: circular dependency detected"

	// ErrUnknownDependency is returned when a component declares a dependency on
	// a name that has not been registered with the Supervisor.
	ErrUnknownDependency appError = "samsara: unknown dependency"

	// ErrSupervisorRunning is returned when Add is called after Run has started.
	ErrSupervisorRunning appError = "samsara: cannot add component after supervisor has started"
)
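Because these sentinels are typed string constants, callers would normally match them with errors.Is, including when one has been joined into a larger error. The sketch below is a standalone illustration: appError and its Error method are re-declared locally as an assumption (the real type is unexported and its methods are not shown here), and exampleRunError and timedOut are hypothetical helpers.

```go
package main

import "errors"

// appError is a local stand-in for the package's unexported error type.
// The Error method is an assumption; the docs only show the constants.
type appError string

func (e appError) Error() string { return string(e) }

// Local copies of two sentinels, declared the same way as in the const block.
const (
	ErrShutdownTimeout    appError = "samsara: shutdown timeout exceeded"
	ErrCircularDependency appError = "samsara: circular dependency detected"
)

// exampleRunError imitates an error in which the timeout sentinel has been
// joined with another failure.
func exampleRunError() error {
	return errors.Join(errors.New("main function failed"), ErrShutdownTimeout)
}

// timedOut reports whether ErrShutdownTimeout appears anywhere in err's tree.
func timedOut(err error) bool {
	return errors.Is(err, ErrShutdownTimeout)
}
```

errors.Is walks the tree built by errors.Join and compares with ==, which works here because a string-based error type is comparable.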
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Application ¶
type Application struct {
// contains filtered or unexported fields
}
Application is the top-level entry point for a service. It wires together signal handling, an optional Supervisor, and a main function into a single blocking Run call.
Typical usage:
	sup := samsara.NewSupervisor(...)
	sup.Add(myDB, samsara.WithTier(samsara.TierCritical))
	sup.Add(myCache, samsara.WithTier(samsara.TierSignificant))

	app := samsara.NewApplication(
		samsara.WithSupervisor(sup),
		samsara.WithMainFunc(server.Run),
		samsara.WithShutdownTimeout(20*time.Second),
	)
	if err := app.Run(); err != nil {
		log.Fatal(err)
	}
func NewApplication ¶
func NewApplication(opts ...ApplicationOption) *Application
NewApplication constructs an Application with the supplied options.
func (*Application) Run ¶
func (a *Application) Run() error
Run starts the application and blocks until it exits.
Startup order:
- Root context is created and wired to OS signals (SIGINT, SIGTERM, SIGHUP, SIGQUIT).
- Supervisor.Run is launched in a goroutine (if a Supervisor was provided).
- The main function is launched in a goroutine (if one was provided).
Shutdown is triggered by any of:
- An OS signal.
- A call to Application.Shutdown(cause).
- The main function returning (with or without an error).
- The Supervisor encountering a critical failure.
After the shutdown signal, Run waits up to ShutdownTimeout for both goroutines to finish. If they do not, ErrShutdownTimeout is joined into the returned error.
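The shutdown sequence described above can be pictured with a small standalone sketch. It is not the package's implementation: runAll, errShutdownTimeout, and timedOutDemo are hypothetical names, and OS signal wiring is left out (a caller could pass a parent context obtained from signal.NotifyContext).

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// errShutdownTimeout is a local stand-in for the package's sentinel.
var errShutdownTimeout = errors.New("shutdown timeout exceeded")

// runAll starts every task with a shared context. The first task to return
// (or cancellation of parent) triggers shutdown; the remaining tasks then
// get at most timeout to finish. It expects at least one task.
func runAll(parent context.Context, timeout time.Duration, tasks ...func(context.Context) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		mu   sync.Mutex
		errs []error
		wg   sync.WaitGroup
	)
	for _, task := range tasks {
		wg.Add(1)
		go func(task func(context.Context) error) {
			defer wg.Done()
			err := task(ctx)
			cancel() // any task returning shuts down the rest
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(task)
	}

	<-ctx.Done() // shutdown has been triggered
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-done:
	case <-time.After(timeout):
		mu.Lock()
		errs = append(errs, errShutdownTimeout)
		mu.Unlock()
	}
	mu.Lock()
	defer mu.Unlock()
	return errors.Join(errs...) // nil when nothing failed
}

// timedOutDemo runs a task that exits at once next to one that ignores
// cancellation for stuckMs, and reports whether the timeout was joined in.
func timedOutDemo(stuckMs, timeoutMs int) bool {
	quick := func(context.Context) error { return nil }
	slow := func(context.Context) error {
		time.Sleep(time.Duration(stuckMs) * time.Millisecond)
		return nil
	}
	err := runAll(context.Background(), time.Duration(timeoutMs)*time.Millisecond, quick, slow)
	return errors.Is(err, errShutdownTimeout)
}
```

Joining the timeout into the result, rather than replacing it, keeps any error from the main function or supervisor visible to the caller.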
func (*Application) Shutdown ¶
func (a *Application) Shutdown(cause error)
Shutdown cancels the application's root context, triggering a graceful shutdown. The optional cause is attached to the context so that components and the main function can inspect it via context.Cause if needed.
It is safe to call from any goroutine. Calling Shutdown before Run is a no-op. Calling it multiple times is safe; only the first cause is recorded.
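The "first cause wins" behaviour matches what the standard library's context.WithCancelCause provides. The sketch below uses only the standard library and assumes, without confirmation from the source, that Shutdown is built on that mechanism; recordedCause and canceledWithoutCause are hypothetical helpers.

```go
package main

import (
	"context"
	"errors"
)

// recordedCause cancels a context twice with different causes and returns
// the message of the cause that context.Cause reports afterwards.
func recordedCause() string {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(errors.New("config reload requested"))
	cancel(errors.New("second call, ignored"))
	return context.Cause(ctx).Error()
}

// canceledWithoutCause shows that a nil cause falls back to context.Canceled.
func canceledWithoutCause() bool {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(nil)
	return errors.Is(context.Cause(ctx), context.Canceled)
}
```

A component or main function that receives the root context can call context.Cause(ctx) after ctx.Done() fires to distinguish, for example, an operator-requested shutdown from a failure.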
type ApplicationOption ¶
type ApplicationOption func(*applicationConfig)
ApplicationOption configures an Application.
func WithLogger ¶
func WithLogger(l Logger) ApplicationOption
WithLogger sets the logger used by the Application itself (not the Supervisor — pass WithSupervisorLogger to NewSupervisor for that).
func WithMainFunc ¶
func WithMainFunc(f func(ctx context.Context) error) ApplicationOption
WithMainFunc sets the primary function that runs as the application's main goroutine. The context passed to f is cancelled when an OS shutdown signal is received or when the Supervisor encounters a critical failure. Returning a non-nil error from f is treated as an application-level failure.
func WithShutdownTimeout ¶
func WithShutdownTimeout(d time.Duration) ApplicationOption
WithShutdownTimeout sets how long the application waits for the main function and supervisor to exit after the root context is cancelled. Defaults to 15 s. If the timeout is exceeded, ErrShutdownTimeout is joined into the returned error.
func WithSupervisor ¶
func WithSupervisor(s *Supervisor) ApplicationOption
WithSupervisor attaches a Supervisor to the application. The supervisor is started alongside the main function and both receive the same root context.
type Component ¶
type Component interface {
Name() string
Start(ctx context.Context, ready func()) error
Stop(ctx context.Context) error
}
Component is the fundamental unit managed by the Supervisor.
Lifecycle contract ¶
Start must block for the entire lifetime of the component. The ready function must be called exactly once, as soon as the component is ready to serve traffic — not before, and never more than once (the supervisor wraps it in sync.Once so double-calls are safe, but semantically wrong). The supervisor will not start the next component until ready is called. Start should return nil on a clean exit (ctx cancelled, Stop called) and a non-nil error on unexpected failure.
If ready is never called, the supervisor will wait up to startTimeout and then treat the attempt as a failure.
Stop is called with a context carrying the configured stop timeout. It must not block longer than that context allows. Stop must be idempotent — the supervisor may call it more than once in some shutdown paths. Stop must also be safe to call concurrently with a still-running Start (e.g. before a port is bound), so components must guard shared state accordingly.
If ctx is cancelled during Start (clean shutdown), Start should return nil. Only return a non-nil error when an abnormal failure occurs that the supervisor should treat as a crash.
Background goroutines ¶
Start is allowed to spawn background goroutines, but those goroutines must exit when ctx is cancelled or Stop is called — whichever comes first. A component that leaks goroutines after Stop returns will cause resource leaks on restart. The supervisor has no way to detect or recover from this.
Example — an HTTP server:
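func (s *Server) Start(ctx context.Context, ready func()) error {
	ln, err := net.Listen("tcp", s.addr)
	if err != nil {
		return err
	}
	ready() // port is bound; supervisor proceeds
	// Serve blocks until Stop calls Shutdown and then returns
	// http.ErrServerClosed, which is a clean exit and is reported as nil.
	if err := s.srv.Serve(ln); !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}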
Example — a DB pool (no run loop needed):
func (p *Pool) Start(ctx context.Context, ready func()) error {
	p.stop = make(chan struct{})
	pool, err := pgxpool.New(ctx, p.dsn)
	if err != nil {
		return err
	}
	p.pool = pool
	ready() // pool is up; supervisor proceeds
	select {
	case <-p.stop:
	case <-ctx.Done():
	}
	return nil
}
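The lifecycle contract also asks for an idempotent Stop that is safe to call while Start is still running. One possible way to satisfy both, shown as a standalone sketch: the Component interface is copied locally so the code compiles on its own, and worker, newWorker, and lifecycleDemo are hypothetical.

```go
package main

import (
	"context"
	"sync"
)

// Component mirrors the documented interface so the sketch is self-contained.
type Component interface {
	Name() string
	Start(ctx context.Context, ready func()) error
	Stop(ctx context.Context) error
}

// worker is a hypothetical component with nothing to initialise.
type worker struct {
	stopOnce sync.Once
	stop     chan struct{} // closed by Stop; created in newWorker, not in Start
}

func newWorker() *worker { return &worker{stop: make(chan struct{})} }

func (w *worker) Name() string { return "worker" }

// Start signals readiness, then blocks until either shutdown path fires.
func (w *worker) Start(ctx context.Context, ready func()) error {
	ready()
	select {
	case <-ctx.Done(): // clean shutdown via context
	case <-w.stop: // clean shutdown via Stop
	}
	return nil
}

// Stop is idempotent (sync.Once) and touches only state that exists before
// Start runs, so calling it concurrently with Start is safe.
func (w *worker) Stop(ctx context.Context) error {
	w.stopOnce.Do(func() { close(w.stop) })
	return nil
}

var _ Component = (*worker)(nil) // compile-time interface check

// lifecycleDemo starts the worker, waits for ready, calls Stop twice, and
// reports whether Start returned nil.
func lifecycleDemo() bool {
	w := newWorker()
	readyCh := make(chan struct{})
	errCh := make(chan error, 1)
	go func() {
		errCh <- w.Start(context.Background(), func() { close(readyCh) })
	}()
	<-readyCh
	_ = w.Stop(context.Background())
	_ = w.Stop(context.Background()) // second call is a no-op
	return <-errCh == nil
}
```

The trade-off in this sketch is that the stop channel is single-use; a component that is expected to be restarted on the same instance would need to re-create it under a mutex.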
type ComponentOption ¶
type ComponentOption func(*componentConfig)
ComponentOption configures a managedComponent at registration time.
func WithDependencies ¶
func WithDependencies(names ...string) ComponentOption
WithDependencies declares that this component must not be started until all named dependencies are running. Names must match Component.Name() of other registered components.
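To make the ordering concrete, here is one way a start order could be derived from declared dependencies, using a depth-first topological sort. This is an illustrative sketch, not the Supervisor's actual algorithm; startOrder, errCircular, and errUnknown are hypothetical names standing in for the behaviour behind ErrCircularDependency and ErrUnknownDependency.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var (
	errCircular = errors.New("circular dependency detected")
	errUnknown  = errors.New("unknown dependency")
)

// startOrder returns component names so that every component appears after
// all of its dependencies. deps maps each registered name to its dependencies.
func startOrder(deps map[string][]string) ([]string, error) {
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic result regardless of map order

	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int, len(deps))
	order := make([]string, 0, len(deps))

	var visit func(n string) error
	visit = func(n string) error {
		switch state[n] {
		case done:
			return nil
		case visiting: // reached a node that is still on the current path
			return fmt.Errorf("%w: at %q", errCircular, n)
		}
		state[n] = visiting
		for _, d := range deps[n] {
			if _, ok := deps[d]; !ok {
				return fmt.Errorf("%w: %q needs %q", errUnknown, n, d)
			}
			if err := visit(d); err != nil {
				return err
			}
		}
		state[n] = done
		order = append(order, n)
		return nil
	}
	for _, n := range names {
		if err := visit(n); err != nil {
			return nil, err
		}
	}
	return order, nil
}
```

Reversing the returned slice gives a stop order in which dependents are stopped before the components they rely on.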
func WithRestartPolicy ¶
func WithRestartPolicy(p RestartPolicy) ComponentOption
WithRestartPolicy sets the restart policy. Defaults to NeverRestart().
func WithTier ¶
func WithTier(t Tier) ComponentOption
WithTier sets the importance tier of a component. Defaults to TierCritical.
type ComponentStatus ¶
type ComponentStatus struct {
Err error // nil means healthy; non-nil means last health check failed
Known bool // false until the first health check completes
Tier Tier
RestartCount int // number of times the component has been restarted by the supervisor
}
ComponentStatus is a point-in-time snapshot of a single component's health.
type EventHooks ¶
type EventHooks struct {
// OnUnhealthy is called when a component's Health check returns a non-nil
// error. It receives the component name and the health error.
OnUnhealthy func(component string, err error)
// OnRecovered is called when a component's Health check returns nil again
// after a previous OnUnhealthy event.
OnRecovered func(component string)
// OnFailed is called when a component fails permanently — either because
// its restart policy decided not to retry, or because all retries were
// exhausted. It receives the component name and the final error.
OnFailed func(component string, err error)
// OnRestart is called each time the supervisor schedules a restart attempt
// for a component. It receives the component name, the triggering error,
// and the attempt number (1-based).
OnRestart func(component string, err error, attempt int)
}
EventHooks carries optional callbacks that the Supervisor fires on significant component lifecycle events. All fields are optional; a nil function is silently skipped.
Hooks are called synchronously inside the supervisor goroutine that manages the component, so they must not block. Enqueue to a channel or spawn a goroutine if you need non-trivial work (e.g. sending a PagerDuty alert).
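A common way to keep a hook from blocking is a buffered channel with a non-blocking send. The sketch below re-declares a one-field EventHooks locally so it compiles on its own; alert, newAlertHooks, and queuedAfter are hypothetical names.

```go
package main

import "errors"

// EventHooks mirrors one field of the documented struct.
type EventHooks struct {
	OnUnhealthy func(component string, err error)
}

// alert is the unit of work handed to a background consumer.
type alert struct {
	component string
	err       error
}

// newAlertHooks returns hooks that enqueue alerts without ever blocking,
// plus the channel a consumer goroutine would drain.
func newAlertHooks(buffer int) (*EventHooks, <-chan alert) {
	ch := make(chan alert, buffer)
	hooks := &EventHooks{
		OnUnhealthy: func(component string, err error) {
			select {
			case ch <- alert{component: component, err: err}:
			default: // queue full: drop rather than stall the supervisor
			}
		},
	}
	return hooks, ch
}

// queuedAfter fires n OnUnhealthy events with no consumer attached and
// returns how many alerts were kept; the rest were dropped.
func queuedAfter(buffer, n int) int {
	hooks, ch := newAlertHooks(buffer)
	for i := 0; i < n; i++ {
		hooks.OnUnhealthy("cache", errors.New("ping failed"))
	}
	return len(ch)
}
```

Dropping on overflow is a design choice for this sketch; a consumer that must not lose events would need a larger buffer or its own overflow handling.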
type HealthChecker ¶
HealthChecker is an optional extension of Component. When implemented, the supervisor polls Health on the configured healthInterval and acts on the result according to the component's Tier.
type HealthReporter ¶
type HealthReporter interface {
HealthReportOrdered() []NamedComponentStatus
}
HealthReporter is the interface the HealthServer uses to query component health. *Supervisor satisfies this via HealthReportOrdered().
type HealthServer ¶
type HealthServer struct {
// contains filtered or unexported fields
}
HealthServer is a Component that exposes three HTTP endpoints:
GET /livez    liveness: 200 while the process is alive
GET /readyz   readiness: 200 if all Critical/Significant components are healthy
GET /healthz  alias for /readyz (Docker HEALTHCHECK compatibility)
Register HealthServer first with the Supervisor so it starts before everything else and stops last.
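The readiness rule can be sketched as a pure function over a health report. The types below are local copies of the documented ones; readyzStatus and failing are hypothetical, and the handling of entries whose Known field is false is an assumption (they are not counted as unhealthy here), since these docs do not specify it.

```go
package main

import (
	"errors"
	"net/http"
)

// Local copies of the documented types, so the sketch is self-contained.
type Tier int

const (
	TierCritical Tier = iota
	TierSignificant
	TierAuxiliary
)

type ComponentStatus struct {
	Err          error
	Known        bool
	Tier         Tier
	RestartCount int
}

type NamedComponentStatus struct {
	Name string
	ComponentStatus
}

// readyzStatus returns 503 if any Critical or Significant component reports
// an error and 200 otherwise. Auxiliary components never affect the result.
func readyzStatus(report []NamedComponentStatus) int {
	for _, c := range report {
		if c.Err != nil && c.Tier != TierAuxiliary {
			return http.StatusServiceUnavailable
		}
	}
	return http.StatusOK
}

// failing builds a report entry whose last health check failed (demo helper).
func failing(name string, tier Tier) NamedComponentStatus {
	return NamedComponentStatus{
		Name:            name,
		ComponentStatus: ComponentStatus{Err: errors.New("unhealthy"), Known: true, Tier: tier},
	}
}
```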
func NewHealthServer ¶
func NewHealthServer(reporter HealthReporter, opts ...HealthServerOption) *HealthServer
NewHealthServer creates a HealthServer. Pass a *Supervisor as the reporter.
func (*HealthServer) Name ¶
func (h *HealthServer) Name() string
type HealthServerOption ¶
type HealthServerOption func(*healthServerConfig)
HealthServerOption configures a HealthServer.
func WithHealthAddr ¶
func WithHealthAddr(addr string) HealthServerOption
func WithHealthLogger ¶
func WithHealthLogger(l Logger) HealthServerOption
func WithHealthName ¶
func WithHealthName(name string) HealthServerOption
WithHealthName overrides the component name returned by HealthServer.Name. This is useful when registering multiple HealthServer instances (e.g. on different ports) with the same Supervisor. Defaults to "health-server".
func WithHealthReadTimeout ¶
func WithHealthReadTimeout(d time.Duration) HealthServerOption
func WithHealthWriteTimeout ¶
func WithHealthWriteTimeout(d time.Duration) HealthServerOption
type Logger ¶
type Logger interface {
Debug(msg string, kv ...any)
Info(msg string, kv ...any)
Error(msg string, kv ...any)
}
Logger is a minimal structured logging interface. It is intentionally narrow so that any slog, zap, zerolog, or logrus wrapper satisfies it with a thin adapter, keeping the framework free of logging dependencies.
Key-value pairs are passed as alternating key, value arguments (slog style).
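As an illustration of how thin such an adapter can be: the standard library's *slog.Logger already has Debug, Info, and Error methods with matching signatures, so it satisfies the interface directly, while the classic *log.Logger needs a few lines. The Logger interface is copied locally; stdAdapter and renderInfo are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"log/slog"
	"strings"
)

// Logger mirrors the documented interface so the sketch is self-contained.
type Logger interface {
	Debug(msg string, kv ...any)
	Info(msg string, kv ...any)
	Error(msg string, kv ...any)
}

// *slog.Logger can be used as-is; this line fails to compile otherwise.
var _ Logger = (*slog.Logger)(nil)

// stdAdapter adapts *log.Logger, which has no levels or key-value support,
// by formatting both into the message text.
type stdAdapter struct{ l *log.Logger }

func (a stdAdapter) Debug(msg string, kv ...any) { a.write("DEBUG", msg, kv) }
func (a stdAdapter) Info(msg string, kv ...any)  { a.write("INFO", msg, kv) }
func (a stdAdapter) Error(msg string, kv ...any) { a.write("ERROR", msg, kv) }

// write renders alternating key, value arguments as key=value pairs.
// A trailing key without a value is ignored.
func (a stdAdapter) write(level, msg string, kv []any) {
	var b strings.Builder
	b.WriteString(level + " " + msg)
	for i := 0; i+1 < len(kv); i += 2 {
		fmt.Fprintf(&b, " %v=%v", kv[i], kv[i+1])
	}
	a.l.Print(b.String())
}

// renderInfo logs one Info line through the adapter into a buffer and
// returns the text that was written.
func renderInfo(msg string, kv ...any) string {
	var buf bytes.Buffer
	var lg Logger = stdAdapter{l: log.New(&buf, "", 0)}
	lg.Info(msg, kv...)
	return buf.String()
}
```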
type MetricsObserver ¶
type MetricsObserver interface {
// ComponentStarted is called each time a component's Start call returns
// without error and the component is considered running.
ComponentStarted(component string, attempt int)
// ComponentStopped is called when a component's Stop call returns,
// regardless of whether it returned an error.
ComponentStopped(component string, err error)
// ComponentRestarting is called when the supervisor decides to restart a
// component after a failure. attempt is 1-based.
ComponentRestarting(component string, err error, attempt int, delay time.Duration)
// HealthCheckCompleted is called after every health check poll, whether
// healthy or not. duration is how long the Health() call took.
HealthCheckCompleted(component string, duration time.Duration, err error)
}
MetricsObserver receives structured telemetry events from the Supervisor. Implement this interface to bridge into Prometheus, OpenTelemetry, Datadog, or any other metrics backend without adding a hard dependency to this package.
All methods are called synchronously from the supervisor goroutine that manages the component, so they must not block. Enqueue or use a non-blocking write if your backend requires I/O.
Every method must be defined to satisfy the interface, but any of them may be a no-op: a partial observer that only cares about restarts is perfectly valid.
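One way to write a partial observer is to embed a no-op base and override only the methods of interest. The interface below is a local copy of the documented one; NopObserver, restartCounter, and countRestarts are hypothetical names, not part of the package.

```go
package main

import (
	"sync/atomic"
	"time"
)

// MetricsObserver mirrors the documented interface.
type MetricsObserver interface {
	ComponentStarted(component string, attempt int)
	ComponentStopped(component string, err error)
	ComponentRestarting(component string, err error, attempt int, delay time.Duration)
	HealthCheckCompleted(component string, duration time.Duration, err error)
}

// NopObserver implements every method as a no-op so that other observers
// can embed it and override only what they need.
type NopObserver struct{}

func (NopObserver) ComponentStarted(string, int)                        {}
func (NopObserver) ComponentStopped(string, error)                      {}
func (NopObserver) ComponentRestarting(string, error, int, time.Duration) {}
func (NopObserver) HealthCheckCompleted(string, time.Duration, error)   {}

// restartCounter only counts restarts; the atomic add never blocks.
type restartCounter struct {
	NopObserver
	restarts atomic.Int64
}

func (r *restartCounter) ComponentRestarting(component string, err error, attempt int, delay time.Duration) {
	r.restarts.Add(1)
}

var _ MetricsObserver = (*restartCounter)(nil) // compile-time interface check

// countRestarts feeds one start event and n restart events to the observer
// and returns the number of restarts it recorded.
func countRestarts(n int) int64 {
	obs := &restartCounter{}
	var m MetricsObserver = obs
	m.ComponentStarted("db", 0)
	for i := 1; i <= n; i++ {
		m.ComponentRestarting("db", nil, i, time.Second)
	}
	return obs.restarts.Load()
}
```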
type NamedComponentStatus ¶
type NamedComponentStatus struct {
Name string
ComponentStatus
}
NamedComponentStatus is a ComponentStatus with its component name.
type RestartPolicy ¶
type RestartPolicy interface {
ShouldRestart(err error, attempt int) (restart bool, delay time.Duration)
}
RestartPolicy decides whether a component should be restarted after a failure and, if so, how long to wait before the next attempt.
attempt is zero-based: the first restart is attempt 0, the second is 1, etc. Returning false for restart means the component has failed permanently.
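A custom policy only needs the one method. The sketch below restarts on any error except a designated fatal one, up to a fixed number of attempts. The interface is copied locally; retryTransient, errBadConfig, and wouldRestart are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RestartPolicy mirrors the documented interface.
type RestartPolicy interface {
	ShouldRestart(err error, attempt int) (restart bool, delay time.Duration)
}

// errBadConfig is a hypothetical error that restarting cannot fix.
var errBadConfig = errors.New("invalid configuration")

// retryTransient restarts up to maxRetries times with a fixed delay, but
// gives up at once when the failure wraps errBadConfig.
type retryTransient struct {
	maxRetries int
	delay      time.Duration
}

func (p retryTransient) ShouldRestart(err error, attempt int) (bool, time.Duration) {
	// attempt is zero-based, so attempts 0..maxRetries-1 are allowed.
	if errors.Is(err, errBadConfig) || attempt >= p.maxRetries {
		return false, 0
	}
	return true, p.delay
}

var _ RestartPolicy = retryTransient{} // compile-time interface check

// wouldRestart asks a policy with maxRetries=3 about a transient or a
// configuration failure at the given attempt.
func wouldRestart(badConfig bool, attempt int) bool {
	var policy RestartPolicy = retryTransient{maxRetries: 3, delay: time.Second}
	err := errors.New("connection reset")
	if badConfig {
		err = fmt.Errorf("start: %w", errBadConfig)
	}
	restart, _ := policy.ShouldRestart(err, attempt)
	return restart
}
```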
func AlwaysRestart ¶
func AlwaysRestart(delay time.Duration) RestartPolicy
AlwaysRestart returns a policy that restarts a component unconditionally with a fixed delay between attempts.
func ExponentialBackoff ¶
func ExponentialBackoff(maxRetries int, baseDelay time.Duration) RestartPolicy
ExponentialBackoff returns a policy that restarts a component up to maxRetries times. The delay doubles with each attempt starting from baseDelay, with ±25% jitter applied to spread restarts when many instances fail simultaneously:
attempt 0: baseDelay × [0.75, 1.25)
attempt 1: 2×baseDelay × [0.75, 1.25)
attempt 2: 4×baseDelay × [0.75, 1.25)
…and so on
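The delay schedule can be reproduced with a few lines of arithmetic. This is a sketch of the formula as documented, not the package's code; backoffDelay, nextDelay, and delayMillis are hypothetical names, and the random source is passed in so the result is reproducible.

```go
package main

import (
	"math/rand"
	"time"
)

// backoffDelay returns base × 2^attempt scaled by a jitter factor in
// [0.75, 1.25). r must be in [0, 1); attempt is zero-based and non-negative.
// The shift overflows for very large attempts; a real policy would cap it.
func backoffDelay(base time.Duration, attempt int, r float64) time.Duration {
	nominal := base << uint(attempt)
	return time.Duration(float64(nominal) * (0.75 + 0.5*r))
}

// nextDelay draws the jitter from math/rand.
func nextDelay(base time.Duration, attempt int) time.Duration {
	return backoffDelay(base, attempt, rand.Float64())
}

// delayMillis is a convenience wrapper that works in whole milliseconds.
func delayMillis(baseMs, attempt int, r float64) int64 {
	return backoffDelay(time.Duration(baseMs)*time.Millisecond, attempt, r).Milliseconds()
}
```

With a 100 ms base, r = 0 gives the lower bound of 75 ms for attempt 0, and r = 0.5 gives the nominal 200 ms and 400 ms for attempts 1 and 2.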
func MaxRetries ¶
func MaxRetries(maxRetries int, delay time.Duration) RestartPolicy
MaxRetries returns a policy that restarts a component up to maxRetries times with a fixed delay. After maxRetries attempts the component fails permanently.
func NeverRestart ¶
func NeverRestart() RestartPolicy
NeverRestart returns a policy that never restarts a component. Use this for components whose failure should propagate immediately.
type Supervisor ¶
type Supervisor struct {
// contains filtered or unexported fields
}
Supervisor starts, monitors, and stops a set of Components in dependency order. Components are started sequentially (dependencies first) and stopped in reverse order (dependents first).
func NewSupervisor ¶
func NewSupervisor(opts ...SupervisorOption) *Supervisor
NewSupervisor constructs a Supervisor with the given options.
func (*Supervisor) Add ¶
func (s *Supervisor) Add(c Component, opts ...ComponentOption)
Add registers a Component with the Supervisor. Panics if called after Run has started or if a component with the same name is already registered.
func (*Supervisor) ComponentHealth ¶
func (s *Supervisor) ComponentHealth(name string) (err error, known bool)
ComponentHealth returns the last known health error for a named component.
func (*Supervisor) HealthReport ¶
func (s *Supervisor) HealthReport() map[string]ComponentStatus
HealthReport returns a snapshot of all component health states keyed by name.
func (*Supervisor) HealthReportOrdered ¶
func (s *Supervisor) HealthReportOrdered() []NamedComponentStatus
HealthReportOrdered returns a name-sorted slice of component health states.
type SupervisorOption ¶
type SupervisorOption func(*supervisorConfig)
SupervisorOption configures a Supervisor.
func WithEventHooks ¶
func WithEventHooks(h *EventHooks) SupervisorOption
func WithHealthInterval ¶
func WithHealthInterval(d time.Duration) SupervisorOption
WithHealthInterval sets how often the supervisor polls each component's Health method. Defaults to 10s.
func WithHealthTimeout ¶
func WithHealthTimeout(d time.Duration) SupervisorOption
func WithMetricsObserver ¶
func WithMetricsObserver(m MetricsObserver) SupervisorOption
WithMetricsObserver registers a MetricsObserver for telemetry events.
func WithRestartResetWindow ¶
func WithRestartResetWindow(d time.Duration) SupervisorOption
func WithStartTimeout ¶
func WithStartTimeout(d time.Duration) SupervisorOption
WithStartTimeout sets how long the supervisor waits for a component to call ready() after Start is launched. Defaults to 15s.
func WithStopTimeout ¶
func WithStopTimeout(d time.Duration) SupervisorOption
func WithSupervisorLogger ¶
func WithSupervisorLogger(l Logger) SupervisorOption
type Tier ¶
type Tier int
Tier expresses how important a component is to overall application health.
const (
	// TierCritical (default): a permanently failed or persistently unhealthy
	// critical component causes the entire application to shut down.
	TierCritical Tier = iota

	// TierSignificant: while a significant component is transiently unhealthy
	// the application is marked not-ready (/readyz returns 503) but keeps
	// running. A permanent failure (restart policy exhausted) triggers a full
	// shutdown, identical to TierCritical.
	TierSignificant

	// TierAuxiliary: health problems are logged and hooks are fired, but they
	// have no effect on /readyz and do not trigger a shutdown. Even a permanent
	// failure only removes the component from monitoring; the app continues.
	TierAuxiliary
)