dcos

package
v1.28.5
Published: Nov 15, 2023 License: MIT Imports: 22 Imported by: 11

README

DC/OS Input Plugin

This input plugin gathers metrics from a DC/OS cluster's metrics component.

Series Cardinality Warning

Depending on the workload of your DC/OS cluster, this plugin can quickly create a high number of series which, if left unchecked, can cause high load on your database.

Global configuration options

In addition to the plugin-specific configuration settings, plugins support additional global and plugin configuration settings. These settings are used to modify metrics, tags, and fields, or to create aliases and configure ordering, etc. See the CONFIGURATION.md for more details.

Configuration

# Input plugin for DC/OS metrics
[[inputs.dcos]]
  ## The DC/OS cluster URL.
  cluster_url = "https://dcos-master-1"

  ## The ID of the service account.
  service_account_id = "telegraf"
  ## The private key file for the service account.
  service_account_private_key = "/etc/telegraf/telegraf-sa-key.pem"

  ## Path containing login token.  If set, will read on every gather.
  # token_file = "/home/dcos/.dcos/token"

  ## In all filter options if both include and exclude are empty all items
  ## will be collected.  Arrays may contain glob patterns.
  ##
  ## Node IDs to collect metrics from.  If a node is excluded, no metrics will
  ## be collected for its containers or apps.
  # node_include = []
  # node_exclude = []
  ## Container IDs to collect container metrics from.
  # container_include = []
  # container_exclude = []
  ## Container IDs to collect app metrics from.
  # app_include = []
  # app_exclude = []

  ## Maximum concurrent connections to the cluster.
  # max_connections = 10
  ## Maximum time to receive a response from cluster.
  # response_timeout = "20s"

  ## Optional TLS Config
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## If true, skip chain & host verification
  # insecure_skip_verify = true

  ## Recommended filtering to reduce series cardinality.
  # [inputs.dcos.tagdrop]
  #   path = ["/var/lib/mesos/slave/slaves/*"]
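The include/exclude options in the configuration above can be illustrated with a small sketch. It assumes that an empty include list admits every ID and that exclude is applied after include; `shouldCollect` and `matchAny` are hypothetical helpers, and the standard library's `path.Match` stands in for whatever glob library the plugin actually uses, so pattern syntax may differ in detail.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny reports whether id matches at least one glob pattern.
// path.Match is a stand-in: it is stricter than some glob libraries
// (for example, "*" does not cross a "/").
func matchAny(patterns []string, id string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, id); err == nil && ok {
			return true
		}
	}
	return false
}

// shouldCollect is a hypothetical helper. An empty include list admits
// everything (assumption), and exclude is checked afterwards.
func shouldCollect(include, exclude []string, id string) bool {
	if len(include) > 0 && !matchAny(include, id) {
		return false
	}
	return !matchAny(exclude, id)
}

func main() {
	// Made-up node IDs and patterns, for illustration only.
	include := []string{"a1b2*"}
	exclude := []string{"*-S9"}
	for _, id := range []string{"a1b2-S0", "a1b2-S9", "ffff-S0"} {
		fmt.Printf("%s collect=%v\n", id, shouldCollect(include, exclude, id))
	}
}
```

With both lists empty, every ID passes, which matches the "all items will be collected" note in the sample configuration.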
Enterprise Authentication

When using Enterprise DC/OS, it is recommended to use a service account to authenticate with the cluster.

The plugin requires the following permissions:

dcos:adminrouter:ops:system-metrics full
dcos:adminrouter:ops:mesos full

Follow the directions to create a service account and assign permissions.

Quick configuration using the Enterprise CLI:

dcos security org service-accounts keypair telegraf-sa-key.pem telegraf-sa-cert.pem
dcos security org service-accounts create -p telegraf-sa-cert.pem -d "Telegraf DC/OS input plugin" telegraf
dcos security org users grant telegraf dcos:adminrouter:ops:system-metrics full
dcos security org users grant telegraf dcos:adminrouter:ops:mesos full
Open Source Authentication

The Open Source DC/OS does not provide service accounts. Instead, you can use one of the following options:

  1. Disable authentication.
  2. Use the token_file parameter to read an authentication token from a file.

The token_file can be populated by using the dcos CLI to log in periodically. The CLI can log in for at most XXX days, so you will need to ensure the CLI performs a new login before this time expires.

dcos auth login --username foo --password bar
dcos config show core.dcos_acs_token > ~/.dcos/token
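Because the token is written with shell redirection, the file usually ends with a newline. The sketch below shows one way a reader of that file might load and normalise the token; `readToken` and `parseToken` are hypothetical helpers, not the plugin's own functions, and rejecting an empty file is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseToken is a hypothetical helper that strips surrounding
// whitespace (such as a trailing newline) from raw file contents.
func parseToken(data []byte) (string, error) {
	token := strings.TrimSpace(string(data))
	if token == "" {
		return "", fmt.Errorf("token is empty")
	}
	return token, nil
}

// readToken is a hypothetical helper that loads the token from path.
// Calling it on every gather picks up a token refreshed by the CLI.
func readToken(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading token file %q: %w", path, err)
	}
	return parseToken(data)
}

func main() {
	// A temporary file stands in for ~/.dcos/token.
	dir, err := os.MkdirTemp("", "dcos-token")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "token")
	if err := os.WriteFile(path, []byte("example-token\n"), 0o600); err != nil {
		panic(err)
	}

	token, err := readToken(path)
	fmt.Println(token, err)
}
```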

Another option to create a token_file is to generate a token using the cluster secret. This will allow you to set the expiration date manually or even create a never-expiring token. However, if the cluster secret or the token is compromised, it cannot be revoked and may require a full reinstall of the cluster. For more information on this technique, see this blog post.

Metrics

Please consult the Metrics Reference for details about field interpretation.

  • dcos_node

    • tags:
      • cluster
      • hostname
      • path (filesystem fields only)
      • interface (network fields only)
    • fields:
      • system_uptime (float)
      • cpu_cores (float)
      • cpu_total (float)
      • cpu_user (float)
      • cpu_system (float)
      • cpu_idle (float)
      • cpu_wait (float)
      • load_1min (float)
      • load_5min (float)
      • load_15min (float)
      • filesystem_capacity_total_bytes (int)
      • filesystem_capacity_used_bytes (int)
      • filesystem_capacity_free_bytes (int)
      • filesystem_inode_total (float)
      • filesystem_inode_used (float)
      • filesystem_inode_free (float)
      • memory_total_bytes (int)
      • memory_free_bytes (int)
      • memory_buffers_bytes (int)
      • memory_cached_bytes (int)
      • swap_total_bytes (int)
      • swap_free_bytes (int)
      • swap_used_bytes (int)
      • network_in_bytes (int)
      • network_out_bytes (int)
      • network_in_packets (float)
      • network_out_packets (float)
      • network_in_dropped (float)
      • network_out_dropped (float)
      • network_in_errors (float)
      • network_out_errors (float)
      • process_count (float)
  • dcos_container

    • tags:
      • cluster
      • hostname
      • container_id
      • task_name
    • fields:
      • cpus_limit (float)
      • cpus_system_time (float)
      • cpus_throttled_time (float)
      • cpus_user_time (float)
      • disk_limit_bytes (int)
      • disk_used_bytes (int)
      • mem_limit_bytes (int)
      • mem_total_bytes (int)
      • net_rx_bytes (int)
      • net_rx_dropped (float)
      • net_rx_errors (float)
      • net_rx_packets (float)
      • net_tx_bytes (int)
      • net_tx_dropped (float)
      • net_tx_errors (float)
      • net_tx_packets (float)
  • dcos_app

    • tags:
      • cluster
      • hostname
      • container_id
      • task_name
    • fields:
      • fields are application specific

Example Output

dcos_node,cluster=enterprise,hostname=192.168.122.18,path=/boot filesystem_capacity_free_bytes=918188032i,filesystem_capacity_total_bytes=1063256064i,filesystem_capacity_used_bytes=145068032i,filesystem_inode_free=523958,filesystem_inode_total=524288,filesystem_inode_used=330 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=dummy0 network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=0i,network_out_dropped=0,network_out_errors=0,network_out_packets=0 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=docker0 network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=0i,network_out_dropped=0,network_out_errors=0,network_out_packets=0 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18 cpu_cores=2,cpu_idle=81.62,cpu_system=4.19,cpu_total=13.670000000000002,cpu_user=9.48,cpu_wait=0,load_15min=0.7,load_1min=0.22,load_5min=0.6,memory_buffers_bytes=970752i,memory_cached_bytes=1830473728i,memory_free_bytes=1178636288i,memory_total_bytes=3975073792i,process_count=198,swap_free_bytes=859828224i,swap_total_bytes=859828224i,swap_used_bytes=0i,system_uptime=18874 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=lo network_in_bytes=1090992450i,network_in_dropped=0,network_in_errors=0,network_in_packets=1546938,network_out_bytes=1090992450i,network_out_dropped=0,network_out_errors=0,network_out_packets=1546938 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,path=/ filesystem_capacity_free_bytes=1668378624i,filesystem_capacity_total_bytes=6641680384i,filesystem_capacity_used_bytes=4973301760i,filesystem_inode_free=3107856,filesystem_inode_total=3248128,filesystem_inode_used=140272 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=minuteman network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=210i,network_out_dropped=0,network_out_errors=0,network_out_packets=3 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=eth0 network_in_bytes=539886216i,network_in_dropped=1,network_in_errors=0,network_in_packets=979808,network_out_bytes=112395836i,network_out_dropped=0,network_out_errors=0,network_out_packets=891239 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=spartan network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=210i,network_out_dropped=0,network_out_errors=0,network_out_packets=3 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,path=/var/lib/docker/overlay filesystem_capacity_free_bytes=1668378624i,filesystem_capacity_total_bytes=6641680384i,filesystem_capacity_used_bytes=4973301760i,filesystem_inode_free=3107856,filesystem_inode_total=3248128,filesystem_inode_used=140272 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=vtep1024 network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=0i,network_out_dropped=0,network_out_errors=0,network_out_packets=0 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,path=/var/lib/docker/plugins filesystem_capacity_free_bytes=1668378624i,filesystem_capacity_total_bytes=6641680384i,filesystem_capacity_used_bytes=4973301760i,filesystem_inode_free=3107856,filesystem_inode_total=3248128,filesystem_inode_used=140272 1511859222000000000
dcos_node,cluster=enterprise,hostname=192.168.122.18,interface=d-dcos network_in_bytes=0i,network_in_dropped=0,network_in_errors=0,network_in_packets=0,network_out_bytes=0i,network_out_dropped=0,network_out_errors=0,network_out_packets=0 1511859222000000000
dcos_app,cluster=enterprise,container_id=9a78d34a-3bbf-467e-81cf-a57737f154ee,hostname=192.168.122.18 container_received_bytes_per_sec=0,container_throttled_bytes_per_sec=0 1511859222000000000
dcos_container,cluster=enterprise,container_id=cbf19b77-3b8d-4bcf-b81f-824b67279629,hostname=192.168.122.18 cpus_limit=0.3,cpus_system_time=307.31,cpus_throttled_time=102.029930607,cpus_user_time=268.57,disk_limit_bytes=268435456i,disk_used_bytes=30953472i,mem_limit_bytes=570425344i,mem_total_bytes=13316096i,net_rx_bytes=0i,net_rx_dropped=0,net_rx_errors=0,net_rx_packets=0,net_tx_bytes=0i,net_tx_dropped=0,net_tx_errors=0,net_tx_packets=0 1511859222000000000
dcos_app,cluster=enterprise,container_id=cbf19b77-3b8d-4bcf-b81f-824b67279629,hostname=192.168.122.18 container_received_bytes_per_sec=0,container_throttled_bytes_per_sec=0 1511859222000000000
dcos_container,cluster=enterprise,container_id=5725e219-f66e-40a8-b3ab-519d85f4c4dc,hostname=192.168.122.18,task_name=hello-world cpus_limit=0.6,cpus_system_time=25.6,cpus_throttled_time=327.977109217,cpus_user_time=566.54,disk_limit_bytes=0i,disk_used_bytes=0i,mem_limit_bytes=1107296256i,mem_total_bytes=335941632i,net_rx_bytes=0i,net_rx_dropped=0,net_rx_errors=0,net_rx_packets=0,net_tx_bytes=0i,net_tx_dropped=0,net_tx_errors=0,net_tx_packets=0 1511859222000000000
dcos_app,cluster=enterprise,container_id=5725e219-f66e-40a8-b3ab-519d85f4c4dc,hostname=192.168.122.18 container_received_bytes_per_sec=0,container_throttled_bytes_per_sec=0 1511859222000000000
dcos_app,cluster=enterprise,container_id=c76e1488-4fb7-4010-a4cf-25725f8173f9,hostname=192.168.122.18 container_received_bytes_per_sec=0,container_throttled_bytes_per_sec=0 1511859222000000000
dcos_container,cluster=enterprise,container_id=cbe0b2f9-061f-44ac-8f15-4844229e8231,hostname=192.168.122.18,task_name=telegraf cpus_limit=0.2,cpus_system_time=8.109999999,cpus_throttled_time=93.183916045,cpus_user_time=17.97,disk_limit_bytes=0i,disk_used_bytes=0i,mem_limit_bytes=167772160i,mem_total_bytes=0i,net_rx_bytes=0i,net_rx_dropped=0,net_rx_errors=0,net_rx_packets=0,net_tx_bytes=0i,net_tx_dropped=0,net_tx_errors=0,net_tx_packets=0 1511859222000000000
dcos_container,cluster=enterprise,container_id=b64115de-3d2a-431d-a805-76e7c46453f1,hostname=192.168.122.18 cpus_limit=0.2,cpus_system_time=2.69,cpus_throttled_time=20.064861214,cpus_user_time=6.56,disk_limit_bytes=268435456i,disk_used_bytes=29360128i,mem_limit_bytes=297795584i,mem_total_bytes=13733888i,net_rx_bytes=0i,net_rx_dropped=0,net_rx_errors=0,net_rx_packets=0,net_tx_bytes=0i,net_tx_dropped=0,net_tx_errors=0,net_tx_packets=0 1511859222000000000
dcos_app,cluster=enterprise,container_id=b64115de-3d2a-431d-a805-76e7c46453f1,hostname=192.168.122.18 container_received_bytes_per_sec=0,container_throttled_bytes_per_sec=0 1511859222000000000

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type APIError

type APIError struct {
	URL         string
	StatusCode  int
	Title       string
	Description string
}

func (APIError) Error

func (e APIError) Error() string

type AuthToken

type AuthToken struct {
	Text   string
	Expire time.Time
}

AuthToken is the authentication token.

type Client

type Client interface {
	SetToken(token string)

	Login(ctx context.Context, sa *ServiceAccount) (*AuthToken, error)
	GetSummary(ctx context.Context) (*Summary, error)
	GetContainers(ctx context.Context, node string) ([]Container, error)
	GetNodeMetrics(ctx context.Context, node string) (*Metrics, error)
	GetContainerMetrics(ctx context.Context, node, container string) (*Metrics, error)
	GetAppMetrics(ctx context.Context, node, container string) (*Metrics, error)
}

Client is an interface for communicating with the DC/OS API.

type ClusterClient

type ClusterClient struct {
	// contains filtered or unexported fields
}

ClusterClient is a Client that uses the cluster URL.

func NewClusterClient

func NewClusterClient(
	clusterURL *url.URL,
	timeout time.Duration,
	maxConns int,
	tlsConfig *tls.Config,
) *ClusterClient

func (*ClusterClient) GetAppMetrics

func (c *ClusterClient) GetAppMetrics(ctx context.Context, node, container string) (*Metrics, error)

func (*ClusterClient) GetContainerMetrics

func (c *ClusterClient) GetContainerMetrics(ctx context.Context, node, container string) (*Metrics, error)

func (*ClusterClient) GetContainers

func (c *ClusterClient) GetContainers(ctx context.Context, node string) ([]Container, error)

func (*ClusterClient) GetNodeMetrics

func (c *ClusterClient) GetNodeMetrics(ctx context.Context, node string) (*Metrics, error)

func (*ClusterClient) GetSummary

func (c *ClusterClient) GetSummary(ctx context.Context) (*Summary, error)

func (*ClusterClient) Login

func (c *ClusterClient) Login(ctx context.Context, sa *ServiceAccount) (*AuthToken, error)

func (*ClusterClient) SetToken

func (c *ClusterClient) SetToken(token string)

type Container

type Container struct {
	ID string
}

Container is a container on a node.

type Credentials

type Credentials interface {
	Token(ctx context.Context, client Client) (string, error)
	IsExpired() bool
}

type DCOS

type DCOS struct {
	ClusterURL string `toml:"cluster_url"`

	ServiceAccountID         string `toml:"service_account_id"`
	ServiceAccountPrivateKey string

	TokenFile string

	NodeInclude      []string
	NodeExclude      []string
	ContainerInclude []string
	ContainerExclude []string
	AppInclude       []string
	AppExclude       []string

	MaxConnections  int
	ResponseTimeout config.Duration
	tls.ClientConfig
	// contains filtered or unexported fields
}

func (*DCOS) Gather

func (d *DCOS) Gather(acc telegraf.Accumulator) error

func (*DCOS) GatherContainers

func (d *DCOS) GatherContainers(ctx context.Context, acc telegraf.Accumulator, cluster, node string)

func (*DCOS) GatherNode

func (d *DCOS) GatherNode(ctx context.Context, acc telegraf.Accumulator, cluster, node string)

func (*DCOS) SampleConfig

func (*DCOS) SampleConfig() string

type DataPoint

type DataPoint struct {
	Name  string            `json:"name"`
	Tags  map[string]string `json:"tags"`
	Unit  string            `json:"unit"`
	Value float64           `json:"value"`
}

type Login

type Login struct {
	UID   string `json:"uid"`
	Exp   int64  `json:"exp"`
	Token string `json:"token"`
}

Login is request data for logging in.
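As a rough illustration of the wire format implied by the struct tags, the sketch below marshals a locally redeclared `Login` value to JSON. `encodeLogin` is a hypothetical helper, the token value is a placeholder rather than a real signed token, and treating `Exp` as a Unix timestamp in seconds is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Login is redeclared here so the example is self-contained; the
// fields and JSON tags are copied from the type documented above.
type Login struct {
	UID   string `json:"uid"`
	Exp   int64  `json:"exp"`
	Token string `json:"token"`
}

// encodeLogin is a hypothetical helper that renders a login request body.
// expUnix is assumed to be a Unix timestamp in seconds.
func encodeLogin(uid, token string, expUnix int64) (string, error) {
	b, err := json.Marshal(Login{UID: uid, Exp: expUnix, Token: token})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// The five-minute lifetime is an arbitrary choice for illustration.
	exp := time.Now().Add(5 * time.Minute).Unix()
	body, err := encodeLogin("telegraf", "signed-token-placeholder", exp)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```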

type LoginAuth

type LoginAuth struct {
	Token string `json:"token"`
}

LoginAuth is the response to a successful login.

type LoginError

type LoginError struct {
	Title       string `json:"title"`
	Description string `json:"description"`
}

LoginError is the response when login fails.

type Metrics

type Metrics struct {
	Datapoints []DataPoint            `json:"datapoints"`
	Dimensions map[string]interface{} `json:"dimensions"`
}

Metrics are the DCOS metrics
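To show how the JSON tags on `Metrics` and `DataPoint` map onto a response body, the sketch below decodes a small payload. The types are redeclared locally so the example is self-contained, `decodeMetrics` is a hypothetical helper, and the sample payload (including its metric names and dimension keys) is made up to fit the struct tags rather than captured from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataPoint and Metrics are copied from the types documented above.
type DataPoint struct {
	Name  string            `json:"name"`
	Tags  map[string]string `json:"tags"`
	Unit  string            `json:"unit"`
	Value float64           `json:"value"`
}

type Metrics struct {
	Datapoints []DataPoint            `json:"datapoints"`
	Dimensions map[string]interface{} `json:"dimensions"`
}

// sample is an invented payload shaped after the struct tags.
const sample = `{
  "datapoints": [
    {"name": "cpu.cores", "unit": "count", "value": 2},
    {"name": "filesystem.capacity.free", "unit": "bytes",
     "value": 918188032, "tags": {"path": "/boot"}}
  ],
  "dimensions": {"hostname": "192.168.122.18"}
}`

// decodeMetrics is a hypothetical helper that parses a response body.
func decodeMetrics(body []byte) (*Metrics, error) {
	var m Metrics
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	m, err := decodeMetrics([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, dp := range m.Datapoints {
		fmt.Printf("%s=%v %s tags=%v\n", dp.Name, dp.Value, dp.Unit, dp.Tags)
	}
}
```

Every `Value` decodes as `float64`, so a consumer that wants integer fields such as the `_bytes` metrics listed earlier would need to convert them itself.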

type NullCreds

type NullCreds struct {
}

func (*NullCreds) IsExpired

func (c *NullCreds) IsExpired() bool

func (*NullCreds) Token

func (c *NullCreds) Token(_ context.Context, _ Client) (string, error)

type ServiceAccount

type ServiceAccount struct {
	AccountID  string
	PrivateKey *rsa.PrivateKey
	// contains filtered or unexported fields
}

func (*ServiceAccount) IsExpired

func (c *ServiceAccount) IsExpired() bool

func (*ServiceAccount) Token

func (c *ServiceAccount) Token(ctx context.Context, client Client) (string, error)

type Slave

type Slave struct {
	ID string `json:"id"`
}

Slave is a node in the cluster.

type Summary

type Summary struct {
	Cluster string
	Slaves  []Slave
}

Summary provides high level cluster wide information.

type TokenCreds

type TokenCreds struct {
	Path string
}

func (*TokenCreds) IsExpired

func (c *TokenCreds) IsExpired() bool

func (*TokenCreds) Token

func (c *TokenCreds) Token(_ context.Context, _ Client) (string, error)
