Documentation ¶
Overview ¶
Bolo Monitoring and Analytics: a collection of functions and data structures that support the Bolo Monitoring and Analytics Daemon (bmad). It contains all of the bmad logic for configuration management, check execution, and data submission.
Constants ¶
const CRITICAL int = 2
const MIN_INTERVAL int64 = 10
const OK int = 0
const UNKNOWN int = 3
const WARNING int = 1
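The four state constants follow the familiar Nagios-style numbering. As a rough illustration, assuming they correspond to a check's exit code (the page does not say so explicitly), a helper along these lines could turn a code into a label. `stateName` is a hypothetical function for this sketch and is not part of bmad.

```go
package main

import "fmt"

// Values copied from the constants listed above.
const (
	OK       int = 0
	WARNING  int = 1
	CRITICAL int = 2
	UNKNOWN  int = 3
)

// stateName is a hypothetical helper (not part of bmad). It maps an exit
// code to a state label and treats any unrecognized code as UNKNOWN.
func stateName(code int) string {
	switch code {
	case OK:
		return "OK"
	case WARNING:
		return "WARNING"
	case CRITICAL:
		return "CRITICAL"
	default:
		return "UNKNOWN"
	}
}

func main() {
	for _, code := range []int{0, 1, 2, 3, 127} {
		fmt.Printf("exit %d -> %s\n", code, stateName(code))
	}
}
```

Folding out-of-range codes into UNKNOWN is a design choice of this sketch, not documented bmad behavior.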
Functions ¶
func ConnectToBolo ¶
func ConnectToBolo() error
Launches a child process to hold open a ZMQ connection to the upstream bolo server (send_bolo should take care of the configuration for how to connect). Upon termination this process will be respawned.
Currently, if the send_bolo configuration directive for bmad changes during a config reload, the send_bolo process is not respawned. A full daemon restart is required to pick up the new send_bolo configuration.
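The respawn-on-termination behavior can be pictured with a small supervisor loop. This is only a sketch built on the standard `os/exec` package and is not bmad's implementation: `superviseN` and its `limit` parameter are hypothetical, the limit exists only so the example terminates, and the command line is captured once up front (which mirrors why a changed send_bolo directive would not take effect without a restart).

```go
package main

import (
	"fmt"
	"os/exec"
)

// superviseN is a hypothetical helper (not part of bmad). It runs argv as a
// child process and starts it again each time it exits, up to limit runs.
// It returns the number of times the child was started.
func superviseN(argv []string, limit int) (int, error) {
	runs := 0
	for runs < limit {
		// The command line is fixed when the loop starts; a later config
		// change is never seen by this loop.
		cmd := exec.Command(argv[0], argv[1:]...)
		if err := cmd.Start(); err != nil {
			return runs, fmt.Errorf("starting %q: %w", argv[0], err)
		}
		runs++
		// The exit status is ignored: any termination leads to a respawn.
		_ = cmd.Wait()
	}
	return runs, nil
}

func main() {
	// "sh -c 'exit 1'" stands in for a send_bolo process that keeps dying.
	runs, err := superviseN([]string{"sh", "-c", "exit 1"}, 3)
	fmt.Println("child started", runs, "times, err:", err)
}
```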
func DisconnectFromBolo ¶
func DisconnectFromBolo()
Disconnects from bolo by terminating the send_bolo process. If send_bolo is no longer running, does nothing.
func SendToBolo ¶
Sends an individual message from check output to bolo via the send_bolo child process spawned in ConnectToBolo().
Types ¶
type Check ¶
type Check struct {
	Command     string            // Command to execute for this Check
	Every       int64             // Specific interval at which to run this Check (in seconds)
	Retries     int               // Number of times to retry this Check after failure
	Retry_every int64             // Retry interval at which to retry after Check failure (in seconds)
	Timeout     int64             // Maximum execution time for the Check (in seconds)
	Env         map[string]string // Map of environment variables to set during Check execution
	Run_as      string            // User name to run this Check as
	Bulk        string            // Is this check a bulk-mode check
	Report      string            // Should this check report its exit code as a STATE event? (bulk-mode only)
	Name        string            // Name of the Check
	// contains filtered or unexported fields
}
Checks in bmad come in two flavors: bulk and regular. Both modes represent commands that should be run at specific intervals, and whose output is piped up to a Bolo server.
Bulk-mode Checks differ from non-bulk Checks in that they are expected to submit multiple datapoints and/or states in a single execution (e.g. the sar collector, which reports all sar metrics in a single execution). For this reason, bulk checks are also allowed to submit meta-checks about their execution state to bolo, based on their return code.
The primary use case for non-bulk checks is to report a single metric or state up to bolo, and thus state meta-checks are disallowed for them.
func (*Check) Reap ¶
Called on running checks, to determine if they have finished running.
If the Check has not finished executing, returns false.
If the Check has been running for longer than its Timeout, a SIGTERM (and, failing that, a SIGKILL) is issued to forcibly terminate the rogue Check process. In either case, this returns as if the check has not yet finished, and Reap() will need to be called again to fully reap the Check.
If the Check has finished execution (on its own, or via forced termination), it will return true.
Once complete, some additional meta-stats for the check execution are appended to the check output, to be submitted to bolo.
func (*Check) Spawn ¶
Performs the steps needed to start a check: sets the environment variables, working directory, and effective user/group, hooks up buffers to capture check output, runs the process, and fills out accounting data for the check.
func (*Check) Submit ¶
Submits check results to bolo. This will append meta-stats to the checks as well, for bmad (like checks run, execution time, check latency). If the check has Bulk and Report both set to "true", it will report a STATE for the bulk check's execution. If the bulk check failed, any output to stderr will be included in the status message.
If full_stats is set to false, the latency and the count of checks run will *NOT* be reported. This is primarily used internally to report stats differently in run-once mode versus daemonized mode.
type Config ¶
type Config struct {
	Send_bolo   string            // Command to use for spawning the send_bolo process, to submit Check results
	Every       int64             // Global default interval to run Checks (in seconds)
	Retry_every int64             // Global default interval to retry failed Checks (in seconds)
	Retries     int               // Global default number of times to retry a failed Check
	Timeout     int64             // Global default timeout for maximum check execution time (in seconds)
	Bulk        string            // Global default for is this a bulk-mode check
	Report      string            // Global default for should a bulk check report its STATE
	Checks      map[string]*Check // Map describing all Checks to be executed via bmad, keyed by Check name
	Env         map[string]string // Global default environment variables to apply to all Checks run
	Log         log.LogConfig     // Configuration for the bmad logger
	Host        string            // Hostname that bmad is running on
	Include_dir string            // Directory to include *.conf files from
}
Config objects represent the internal bmad configuration, after being loaded from the YAML config file.
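To make the "global default" fields concrete, here is a minimal sketch of how per-Check values might inherit from the Config. It is an assumption-laden illustration and not bmad's loader: the structs are trimmed-down copies, `applyDefaults` is a hypothetical function, treating a zero value as "unset" is a choice made for this sketch, and clamping intervals to MIN_INTERVAL is a guess based on the constant's name.

```go
package main

import "fmt"

// Value copied from the constants listed on this page.
const MIN_INTERVAL int64 = 10

// Trimmed-down copies of the documented structs, for illustration only.
type Check struct {
	Every       int64
	Retry_every int64
	Timeout     int64
	Name        string
}

type Config struct {
	Every       int64
	Retry_every int64
	Timeout     int64
	Checks      map[string]*Check
}

// applyDefaults is hypothetical (not part of bmad). Zero-valued Check fields
// inherit the global value, and Every is raised to MIN_INTERVAL if lower
// (an assumption about what MIN_INTERVAL is for).
func applyDefaults(cfg *Config) {
	for name, c := range cfg.Checks {
		c.Name = name // Checks are keyed by name in the Config map
		if c.Every == 0 {
			c.Every = cfg.Every
		}
		if c.Every < MIN_INTERVAL {
			c.Every = MIN_INTERVAL
		}
		if c.Retry_every == 0 {
			c.Retry_every = cfg.Retry_every
		}
		if c.Timeout == 0 {
			c.Timeout = cfg.Timeout
		}
	}
}

func main() {
	cfg := &Config{
		Every:       300,
		Retry_every: 60,
		Timeout:     45,
		Checks: map[string]*Check{
			"fast":    {Every: 5},
			"default": {},
		},
	}
	applyDefaults(cfg)
	for _, name := range []string{"fast", "default"} {
		c := cfg.Checks[name]
		fmt.Printf("%s: every=%ds retry_every=%ds timeout=%ds\n", c.Name, c.Every, c.Retry_every, c.Timeout)
	}
}
```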
func LoadConfig ¶
Loads a YAML config file specified by cfg_file, and returns a Config object representing that config. Config reloads are auto-detected and handled seamlessly.