mesos_exporter

command module
v1.0.0
Published: Sep 1, 2017 License: Apache-2.0 Imports: 16 Imported by: 0

README

Prometheus Mesos Exporter

Exporter for Mesos master and agent metrics.

Using

The Mesos Exporter can expose either cluster-wide metrics from a master or task metrics from an agent.

Usage of mesos_exporter:
  -addr string
        Address to listen on (default ":9105")
  -exportedSlaveAttributes string
        Comma-separated list of slave attributes to include in the corresponding metric
  -exportedTaskLabels string
        Comma-separated list of task labels to include in the corresponding metric
  -master string
        Expose metrics from master running on this URL
  -slave string
        Expose metrics from slave running on this URL
  -timeout duration
        Master polling timeout (default 10s)
  -trustedCerts string
        Comma-separated list of certificates (.pem files) trusted for requests to
        Mesos endpoints

When using HTTP authentication, the following values are read from the environment:

  • MESOS_EXPORTER_USERNAME
  • MESOS_EXPORTER_PASSWORD

Prometheus Configuration

Typically you run one exporter with -master for each master and one exporter with -slave for each agent (slave). Scraping each master individually keeps monitoring available even if the underlying Mesos cluster is in a degraded state.

  • Master: mesos_exporter -master http://localhost:5050
  • Agent: mesos_exporter -slave http://localhost:5051

The necessary Prometheus configuration could look like this:

- job_name: mesos-master
  scrape_interval: 15s
  scrape_timeout: 10s
  static_configs:
  - targets:
    - master1.mesos.example.org:9105
    - master2.mesos.example.org:9105
    - master3.mesos.example.org:9105

- job_name: mesos-slave
  scrape_interval: 15s
  scrape_timeout: 10s
  static_configs:
  - targets:
    - node1.mesos.example.org:9105
    - node2.mesos.example.org:9105
    - node3.mesos.example.org:9105

A minimal set of alerts to ensure your cluster is operational could then be defined as follows:

ALERT MesosDown
  IF (up{job=~"mesos.*"} == 0) or (irate(mesos_collector_errors_total[5m]) > 0)
  FOR 5m
  LABELS { severity="warning" }
  ANNOTATIONS {
    description="Either the exporter or the associated Mesos component is down.",
    summary="The Mesos instance {{$labels.instance}} cannot be scraped."
  }

ALERT MesosMasterLeader
  IF sum(mesos_master_elected{job="mesos-master"}) != 1
  FOR 5m
  LABELS { severity="page" }
  ANNOTATIONS {
    description="Agents and frameworks require a unique leading Mesos master.",
    summary="Expected one leading Mesos master but there are {{ $value }}."
  }

ALERT MesosMasterTooManyRestarts
  IF resets(mesos_master_uptime_seconds{job="mesos-master"}[1h]) > 10
  FOR 5m
  LABELS { severity="page" }
  ANNOTATIONS {
    description="The number of seconds the process has been running is resetting regularly.",
    summary="The Mesos master {{$labels.instance}} has restarted {{ $value }} times in the last hour."
  }

ALERT MesosSlaveActive
  IF sum(mesos_master_slaves_state{state="active"}) < 0.9 * count(up{job="mesos-slave"})
  FOR 5m
  LABELS { severity="page" }
  ANNOTATIONS {
    description="Mesos agents must be registered with the master in order to receive tasks.",
    summary="More than 10% of all Mesos agents dropped out. Only {{ $value }} active agents remaining."
  }

ALERT MesosSlaveTooManyRestarts
  IF resets(mesos_slave_uptime_seconds{job="mesos-slave"}[1h]) > 10
  FOR 5m
  LABELS { severity="page" }
  ANNOTATIONS {
    description="The number of seconds the process has been running is resetting regularly.",
    summary="The Mesos agent {{$labels.instance}} has restarted {{ $value }} times in the last hour."
  }
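
The two TooManyRestarts alerts rely on the PromQL resets() function, which counts how often a value decreases between consecutive samples within the range. The Go sketch below mimics that counting on a made-up series of uptime samples; countResets and the sample values are illustrative only and are not taken from the exporter.

```go
package main

import "fmt"

// countResets returns how many times a sample is lower than the one
// before it, which is how a restart shows up in an uptime series.
func countResets(samples []float64) int {
	resets := 0
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			resets++
		}
	}
	return resets
}

func main() {
	// Hypothetical uptime samples in seconds: the process restarts twice.
	uptime := []float64{100, 115, 130, 5, 20, 35, 2, 17}
	fmt.Println(countResets(uptime)) // prints 2
}
```

With a threshold of 10 resets per hour, the alerts fire only for processes that are restarting repeatedly, not for a single planned restart.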

Documentation

Overview

Scrape the /slave(1)/state endpoint to get information on the tasks running on executors. Information scraped at this point:

* Labels of running tasks ("mesos_slave_task_labels" series)
* Attributes of mesos slaves ("mesos_slave_attributes")
