Documentation
Overview
Package watchdog implements self-healing container monitoring with per-service circuit breakers.
Index
- func Escalate(service, severity, detail string) []error
- func SendEmailAlert(cfg EscalationConfig, subject, body string) error
- func SendTelegramAlert(cfg EscalationConfig, message string) error
- func TestAlert(service, severity string) ([]string, []error)
- type CircuitState
- type Config
- type EscalationConfig
- type Event
- type Incident
- type Metrics
- type ServiceCircuit
- type Status
- type Watchdog
Constants
This section is empty.
Variables
This section is empty.
Functions
func SendEmailAlert
func SendEmailAlert(cfg EscalationConfig, subject, body string) error
SendEmailAlert sends an alert email via SMTP using the standard library's net/smtp package.
func SendTelegramAlert
func SendTelegramAlert(cfg EscalationConfig, message string) error
SendTelegramAlert sends an alert message to a Telegram chat.
Types
type CircuitState
type CircuitState string
CircuitState represents the state of a circuit breaker for a service.
const (
	CircuitClosed        CircuitState = "closed"         // healthy, restarts allowed
	CircuitOpen          CircuitState = "open"           // tripped, restarts blocked
	CircuitPermanentOpen CircuitState = "PERMANENT_OPEN" // permanently open; requires manual reset
)
type Config
type Config struct {
	Enabled                bool
	CircuitBreakerAttempts int           // default 3
	CircuitBreakerWindow   time.Duration // default 10m
	EscalationWebhook      string
	PollInterval           time.Duration // default 30s
	// PermanentOpenThreshold is the number of consecutive OPEN windows after which
	// the circuit transitions to PERMANENT_OPEN and stops all automated resets.
	// Read from the WATCHDOG_PERMANENT_OPEN_THRESHOLD environment variable; default 3.
	PermanentOpenThreshold int
}
Config holds watchdog configuration.
func DefaultConfig
func DefaultConfig() Config
DefaultConfig returns a watchdog Config populated with the default values listed in the Config field comments.
type EscalationConfig
type EscalationConfig struct {
	TelegramBotToken string
	TelegramChatID   string
	SMTPHost         string
	SMTPPort         string
	SMTPFrom         string
	SMTPTo           string
	SMTPUser         string
	SMTPPass         string
}
EscalationConfig holds Telegram and SMTP notification settings.
func LoadEscalationConfig
func LoadEscalationConfig() EscalationConfig
LoadEscalationConfig reads escalation config from environment variables.
type Event
type Event struct {
	Timestamp time.Time `json:"timestamp"`
	Service   string    `json:"service"`
	Action    string    `json:"action"` // restart, circuit_open, circuit_reset, escalate
	Detail    string    `json:"detail"`
}
Event records a watchdog action.
type Incident
type Incident struct {
	ID        string    `json:"id"`
	Service   string    `json:"service"`
	Severity  string    `json:"severity"` // warning, critical
	Action    string    `json:"action"`
	Detail    string    `json:"detail"`
	Timestamp time.Time `json:"timestamp"`
	Notified  bool      `json:"notified"`
}
Incident records a watchdog incident for persistence.
type Metrics
type Metrics struct {
	// contains filtered or unexported fields
}
Metrics tracks Prometheus-compatible counters for the watchdog.
func (*Metrics) IncRestart
IncRestart increments the restart counter for a service/result pair.
func (*Metrics) PrometheusText
PrometheusText returns all metrics in Prometheus exposition format.
func (*Metrics) SetCircuit
SetCircuit sets the circuit state for a service (0=closed, 1=open).
type ServiceCircuit
type ServiceCircuit struct {
	Service                string       `json:"service"`
	State                  CircuitState `json:"state"`
	Attempts               int          `json:"attempts"`
	ConsecutiveOpenWindows int          `json:"consecutive_open_windows"`
	LastRestart            time.Time    `json:"last_restart"`
	WindowStart            time.Time    `json:"window_start"`
	TrippedAt              time.Time    `json:"tripped_at,omitempty"`
	PermanentOpenAt        time.Time    `json:"permanent_open_at,omitempty"`
}
ServiceCircuit tracks circuit breaker state for one service.
type Status
type Status struct {
	Running    bool             `json:"running"`
	Circuits   []ServiceCircuit `json:"circuits"`
	EventCount int              `json:"event_count"`
	Since      time.Time        `json:"since"`
}
Status holds the current watchdog status.
type Watchdog
type Watchdog struct {
	// contains filtered or unexported fields
}
Watchdog monitors container health and restarts unhealthy services.
func New
func New(cfg Config, docker health.DockerClient) *Watchdog
New creates a new Watchdog instance.
func (*Watchdog) GetHistory
GetHistory returns watchdog events, optionally filtered by duration.
func (*Watchdog) ResetBreakers
ResetBreakers resets all circuit breakers, including those in PERMANENT_OPEN, to the closed state.
func (*Watchdog) ResetService added in v1.1.8
ResetService resets the circuit breaker of a single named service to the closed state. It clears the PERMANENT_OPEN state and the consecutive-open-window counter, and returns false if the service has no tracked circuit.
func (*Watchdog) SaveEvents
SaveEvents persists watchdog events to a JSONL file.