watchdog

package
v1.1.2
Warning

This package is not in the latest version of its module.

Published: May 15, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

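Package watchdog implements self-healing container monitoring: it restarts unhealthy services and uses a per-service circuit breaker to block further restarts once the breaker has tripped.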

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Escalate

func Escalate(service, severity, detail string) []error

Escalate sends alerts through all configured channels.

func SendEmailAlert

func SendEmailAlert(cfg EscalationConfig, subject, body string) error

SendEmailAlert sends an alert email via SMTP using the standard net/smtp package.
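A minimal sketch of an SMTP alert built on net/smtp, to illustrate the approach the description mentions. The helper names buildMessage and sendEmail are hypothetical, and PLAIN authentication is an assumption; the package's actual message format and auth mechanism are not documented here.

```go
package main

import (
	"fmt"
	"net"
	"net/smtp"
)

// buildMessage assembles a bare RFC 5322 style message with CRLF line endings.
// Hypothetical helper: the real package may format its mail differently.
func buildMessage(from, to, subject, body string) []byte {
	headers := "From: " + from + "\r\n" +
		"To: " + to + "\r\n" +
		"Subject: " + subject + "\r\n"
	return []byte(headers + "\r\n" + body + "\r\n")
}

// sendEmail delivers the message with smtp.SendMail.
// Assumption: the server accepts PLAIN auth with the given user and password.
func sendEmail(host, port, user, pass, from, to, subject, body string) error {
	auth := smtp.PlainAuth("", user, pass, host)
	addr := net.JoinHostPort(host, port)
	return smtp.SendMail(addr, auth, from, []string{to}, buildMessage(from, to, subject, body))
}

func main() {
	// Only the message is printed here, so the example runs without a mail server.
	msg := buildMessage("watchdog@example.com", "ops@example.com",
		"[critical] api", "circuit breaker tripped")
	fmt.Printf("%s", msg)
}
```

The string fields SMTPHost and SMTPPort of EscalationConfig map naturally onto net.JoinHostPort, which is why the port is kept as a string in this sketch.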

func SendTelegramAlert

func SendTelegramAlert(cfg EscalationConfig, message string) error

SendTelegramAlert sends an alert message to a Telegram chat.

func TestAlert

func TestAlert(service, severity string) ([]string, []error)

TestAlert sends a test alert through all configured channels. It returns an error for each channel that failed.

Types

type CircuitState

type CircuitState string

CircuitState represents the state of a circuit breaker for a service.

const (
	CircuitClosed CircuitState = "closed" // healthy, restarts allowed
	CircuitOpen   CircuitState = "open"   // tripped, restarts blocked
)

type Config

type Config struct {
	Enabled                bool
	CircuitBreakerAttempts int           // default 3
	CircuitBreakerWindow   time.Duration // default 10m
	EscalationWebhook      string
	PollInterval           time.Duration // default 30s
}

Config holds watchdog configuration.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns watchdog configuration with defaults.

type EscalationConfig

type EscalationConfig struct {
	TelegramBotToken string
	TelegramChatID   string
	SMTPHost         string
	SMTPPort         string
	SMTPFrom         string
	SMTPTo           string
	SMTPUser         string
	SMTPPass         string
}

EscalationConfig holds Telegram and SMTP notification settings.

func LoadEscalationConfig

func LoadEscalationConfig() EscalationConfig

LoadEscalationConfig reads escalation config from environment variables.

type Event

type Event struct {
	Timestamp time.Time `json:"timestamp"`
	Service   string    `json:"service"`
	Action    string    `json:"action"` // restart, circuit_open, circuit_reset, escalate
	Detail    string    `json:"detail"`
}

Event records a watchdog action.

type Incident

type Incident struct {
	ID        string    `json:"id"`
	Service   string    `json:"service"`
	Severity  string    `json:"severity"` // warning, critical
	Action    string    `json:"action"`
	Detail    string    `json:"detail"`
	Timestamp time.Time `json:"timestamp"`
	Notified  bool      `json:"notified"`
}

Incident records a watchdog incident for persistence.

type Metrics

type Metrics struct {
	// contains filtered or unexported fields
}

Metrics tracks Prometheus-compatible counters for the watchdog.

func NewMetrics

func NewMetrics() *Metrics

NewMetrics creates a new Metrics instance.

func (*Metrics) IncRestart

func (m *Metrics) IncRestart(service, result string)

IncRestart increments the restart counter for a service/result pair.

func (*Metrics) PrometheusText

func (m *Metrics) PrometheusText() string

PrometheusText returns all metrics in Prometheus exposition format.

func (*Metrics) SetCircuit

func (m *Metrics) SetCircuit(service string, open bool)

SetCircuit sets the circuit state for a service (0=closed, 1=open).

type ServiceCircuit

type ServiceCircuit struct {
	Service     string       `json:"service"`
	State       CircuitState `json:"state"`
	Attempts    int          `json:"attempts"`
	LastRestart time.Time    `json:"last_restart"`
	WindowStart time.Time    `json:"window_start"`
	TrippedAt   time.Time    `json:"tripped_at,omitempty"`
}

ServiceCircuit tracks circuit breaker state for one service.

type Status

type Status struct {
	Running    bool             `json:"running"`
	Circuits   []ServiceCircuit `json:"circuits"`
	EventCount int              `json:"event_count"`
	Since      time.Time        `json:"since"`
}

Status holds the current watchdog status.

type Watchdog

type Watchdog struct {
	// contains filtered or unexported fields
}

Watchdog monitors container health and restarts unhealthy services.

func New

func New(cfg Config, docker health.DockerClient) *Watchdog

New creates a new Watchdog instance.

func (*Watchdog) GetHistory

func (w *Watchdog) GetHistory(since time.Duration) []Event

GetHistory returns watchdog events, optionally filtered by duration.

func (*Watchdog) GetStatus

func (w *Watchdog) GetStatus() Status

GetStatus returns the current watchdog status.

func (*Watchdog) ResetBreakers

func (w *Watchdog) ResetBreakers() int

ResetBreakers resets all circuit breakers to closed state.

func (*Watchdog) SaveEvents

func (w *Watchdog) SaveEvents(dir string) error

SaveEvents persists watchdog events to a JSONL file.

func (*Watchdog) Start

func (w *Watchdog) Start(ctx context.Context)

Start begins the watchdog monitoring loop.
