README
AWS Container Insights Receiver
Status | |
---|---|
Stability | beta: metrics |
Distributions | contrib |
Warnings | Other |
Issues | |
Code Owners | @Aneurysm9, @pxaws |
Overview
AWS Container Insights Receiver (awscontainerinsightreceiver) is an AWS-specific receiver that supports CloudWatch Container Insights. CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. Data are collected as performance log events using embedded metric format (EMF). From the EMF data, Amazon CloudWatch can create the aggregated CloudWatch metrics at the cluster, node, pod, task, and service level.
CloudWatch Container Insights has been supported by the ECS Agent and the CloudWatch Agent, which collect infrastructure metrics for many resources such as CPU, memory, disk, and network. To migrate existing customers to OpenTelemetry, AWS Container Insights Receiver (together with the CloudWatch EMF Exporter) aims to support the same CloudWatch Container Insights experience on the following platforms:
- Amazon ECS
- Amazon EKS
- Kubernetes platforms on Amazon EC2
Design of AWS Container Insights Receiver
See the design doc
Configuration
Example configuration:
receivers:
  awscontainerinsightreceiver:
    # all parameters are optional
    collection_interval: 60s
    container_orchestrator: eks
    add_service_as_attribute: true
    prefer_full_pod_name: false
    add_full_pod_name_metric_label: false
There is no need to provide any parameters since they are all optional.
collection_interval (optional)
The interval at which metrics should be collected. The default is 60 seconds.
container_orchestrator (optional)
The type of container orchestration service, e.g. eks or ecs. The default is eks.
add_service_as_attribute (optional)
Whether to add the associated service name as an attribute. The default is true.
prefer_full_pod_name (optional)
The "PodName" attribute is set based on the name of the relevant controller, such as DaemonSet, Job, ReplicaSet, ReplicationController, ... If it cannot be set that way and prefer_full_pod_name is true, the "PodName" attribute is set to the pod's own name. The default value is false.
add_full_pod_name_metric_label (optional)
The "FullPodName" attribute is the pod name including its suffix. If false, the FullPodName label is not added. The default value is false.
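The interaction between prefer_full_pod_name and add_full_pod_name_metric_label can be sketched in Go as follows. This is a simplified model, not the receiver's code, and the following are hypothetical:

- the podInfo type and the podNameAttributes function;
- the assumption that the controller name has already been resolved.

The options text above does not say what happens when no controller name exists and prefer_full_pod_name is false. This sketch assumes PodName is simply left unset in that case.

```go
package main

import "fmt"

// podInfo is a hypothetical, simplified view of a pod: its own name and the
// already-resolved name of its owning controller ("" if none was found).
type podInfo struct {
	Name           string
	ControllerName string
}

// podNameAttributes sketches the effect of prefer_full_pod_name and
// add_full_pod_name_metric_label on the pod-name attributes.
func podNameAttributes(p podInfo, preferFullPodName, addFullPodNameLabel bool) map[string]string {
	attrs := map[string]string{}
	switch {
	case p.ControllerName != "":
		// Usual case: PodName comes from the owning controller.
		attrs["PodName"] = p.ControllerName
	case preferFullPodName:
		// No controller name: fall back to the pod's own name.
		attrs["PodName"] = p.Name
	}
	// Assumed: when neither case applies, PodName is left unset.
	if addFullPodNameLabel {
		// FullPodName carries the pod name including its suffix.
		attrs["FullPodName"] = p.Name
	}
	return attrs
}

func main() {
	// Hypothetical pod owned by a controller named "web".
	fmt.Println(podNameAttributes(podInfo{Name: "web-5d8f7c9b4-x2kqp", ControllerName: "web"}, false, true))
	// Hypothetical standalone pod with no resolvable controller.
	fmt.Println(podNameAttributes(podInfo{Name: "debug-shell"}, true, false))
}
```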
Sample configuration for Container Insights
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an EKS cluster:
# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-otel-sa
  namespace: aws-otel-eks
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get","update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: aws-otel-sa
    namespace: aws-otel-eks
roleRef:
  kind: ClusterRole
  name: aoc-agent-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: aws-otel-eks
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    extensions:
      health_check:
    receivers:
      awscontainerinsightreceiver:
    processors:
      batch/metrics:
        timeout: 60s
    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_working_set
              - node_memory_limit
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_number_of_container_restarts
          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count
          # service metrics
          - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods
          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization
          # namespace metrics
          - dimensions: [[Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - namespace_number_of_running_pods
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]
      extensions: [health_check]
---
# create Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-otel-eks-ci
  namespace: aws-otel-eks
spec:
  selector:
    matchLabels:
      name: aws-otel-eks-ci
  template:
    metadata:
      labels:
        name: aws-otel-eks-ci
    spec:
      containers:
        - name: aws-otel-collector
          image: {collector-image-url}
          env:
            #- name: AWS_REGION
            #  value: "us-east-1"
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: Always
          command:
            - "/awscollector"
            - "--config=/conf/otel-agent-config.yaml"
          volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
            - name: otel-agent-config-vol
              mountPath: /conf
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      serviceAccountName: aws-otel-sa
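To deploy to an EKS cluster, replace {collector-image-url} with your collector image, save the manifest above as config.yaml, and run: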
kubectl apply -f config.yaml
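Each entry under metric_declarations in the ConfigMap pairs a list of dimension sets with metric_name_selectors. In EMF, CloudWatch publishes one metric per (metric name, dimension set) pair. The Go sketch below models that expansion. It is not the awsemf exporter's code, and it makes two simplifying assumptions:

- The exporter documents metric_name_selectors as regular expressions. Here they are compared as exact names.
- A dimension set naming an attribute that the data point lacks is skipped. This is our reading of the exporter's behaviour and should be checked against its documentation.

```go
package main

import "fmt"

// metricDeclaration mirrors one entry under awsemf's metric_declarations.
type metricDeclaration struct {
	Dimensions [][]string // dimension sets, e.g. [[PodName, Namespace, ClusterName], [ClusterName]]
	Selectors  []string   // metric_name_selectors, compared as exact names here
}

// expand returns the dimension sets under which metricName would be published
// for a data point carrying attrs, or nil if no selector matches. Dimension
// sets that name an attribute missing from attrs are skipped (assumption).
func expand(decl metricDeclaration, metricName string, attrs map[string]string) [][]string {
	selected := false
	for _, s := range decl.Selectors {
		if s == metricName {
			selected = true
			break
		}
	}
	if !selected {
		return nil
	}
	var sets [][]string
	for _, set := range decl.Dimensions {
		complete := true
		for _, dim := range set {
			if _, ok := attrs[dim]; !ok {
				complete = false
				break
			}
		}
		if complete {
			sets = append(sets, set)
		}
	}
	return sets
}

func main() {
	// The "# pod metrics" declaration from the ConfigMap (first two selectors).
	podDecl := metricDeclaration{
		Dimensions: [][]string{
			{"PodName", "Namespace", "ClusterName"},
			{"Service", "Namespace", "ClusterName"},
			{"Namespace", "ClusterName"},
			{"ClusterName"},
		},
		Selectors: []string{"pod_cpu_utilization", "pod_memory_utilization"},
	}
	// Hypothetical attributes of a pod that is not behind any Service.
	podAttrs := map[string]string{"PodName": "web", "Namespace": "default", "ClusterName": "demo"}
	for _, set := range expand(podDecl, "pod_cpu_utilization", podAttrs) {
		fmt.Println("pod_cpu_utilization", set)
	}
}
```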
Available Metrics and Resource Attributes
Cluster
Metric | Unit |
---|---|
cluster_failed_node_count | Count |
cluster_node_count | Count |
Resource Attribute |
---|
ClusterName |
NodeName |
Type |
Version |
Sources |
Cluster Namespace
Metric | Unit |
---|---|
namespace_number_of_running_pods | Count |
Resource Attribute |
---|
ClusterName |
NodeName |
Namespace |
Type |
Version |
Sources |
kubernetes |
Cluster Service
Metric | Unit |
---|---|
service_number_of_running_pods | Count |
Resource Attribute |
---|
ClusterName |
NodeName |
Namespace |
Service |
Type |
Version |
Sources |
kubernetes |
Node
Metric | Unit |
---|---|
node_cpu_limit | Millicore |
node_cpu_request | Millicore |
node_cpu_reserved_capacity | Percent |
node_cpu_usage_system | Millicore |
node_cpu_usage_total | Millicore |
node_cpu_usage_user | Millicore |
node_cpu_utilization | Percent |
node_memory_cache | Bytes |
node_memory_failcnt | Count |
node_memory_hierarchical_pgfault | Count/Second |
node_memory_hierarchical_pgmajfault | Count/Second |
node_memory_limit | Bytes |
node_memory_mapped_file | Bytes |
node_memory_max_usage | Bytes |
node_memory_pgfault | Count/Second |
node_memory_pgmajfault | Count/Second |
node_memory_request | Bytes |
node_memory_reserved_capacity | Percent |
node_memory_rss | Bytes |
node_memory_swap | Bytes |
node_memory_usage | Bytes |
node_memory_utilization | Percent |
node_memory_working_set | Bytes |
node_network_rx_bytes | Bytes/Second |
node_network_rx_dropped | Count/Second |
node_network_rx_errors | Count/Second |
node_network_rx_packets | Count/Second |
node_network_total_bytes | Bytes/Second |
node_network_tx_bytes | Bytes/Second |
node_network_tx_dropped | Count/Second |
node_network_tx_errors | Count/Second |
node_network_tx_packets | Count/Second |
node_number_of_running_containers | Count |
node_number_of_running_pods | Count |
Resource Attribute |
---|
ClusterName |
InstanceType |
NodeName |
Type |
Version |
Sources |
kubernetes |
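Several node percentages in the table are ratios of other node metrics. As we understand the CloudWatch Container Insights metrics documentation, the formulas are:

- node_cpu_utilization = node_cpu_usage_total / node_cpu_limit
- node_cpu_reserved_capacity = node_cpu_request / node_cpu_limit
- node_memory_utilization = node_memory_working_set / node_memory_limit
- node_memory_reserved_capacity = node_memory_request / node_memory_limit

Each ratio is expressed as a percentage. The Go sketch below applies these formulas. The nodeSample type, the zero-denominator guard, and the sample values are assumptions, not the receiver's code.

```go
package main

import "fmt"

// percent returns 100*part/whole. The zero guard is an assumption of this
// sketch so that a missing limit does not produce +Inf or NaN.
func percent(part, whole float64) float64 {
	if whole == 0 {
		return 0
	}
	return 100 * part / whole
}

// nodeSample holds a few raw node metrics from the table above.
type nodeSample struct {
	CPUUsageTotal    float64 // node_cpu_usage_total, millicores
	CPURequest       float64 // node_cpu_request, millicores
	CPULimit         float64 // node_cpu_limit, millicores
	MemoryWorkingSet float64 // node_memory_working_set, bytes
	MemoryRequest    float64 // node_memory_request, bytes
	MemoryLimit      float64 // node_memory_limit, bytes
}

func (n nodeSample) cpuUtilization() float64 { return percent(n.CPUUsageTotal, n.CPULimit) }

func (n nodeSample) cpuReservedCapacity() float64 { return percent(n.CPURequest, n.CPULimit) }

func (n nodeSample) memoryUtilization() float64 { return percent(n.MemoryWorkingSet, n.MemoryLimit) }

func (n nodeSample) memoryReservedCapacity() float64 { return percent(n.MemoryRequest, n.MemoryLimit) }

func main() {
	// Hypothetical node: 2000 millicores (2 vCPUs) and a 4 GiB memory limit.
	n := nodeSample{
		CPUUsageTotal: 500, CPURequest: 1000, CPULimit: 2000,
		MemoryWorkingSet: 1 << 30, MemoryRequest: 2 << 30, MemoryLimit: 4 << 30,
	}
	fmt.Printf("node_cpu_utilization=%.1f%% node_cpu_reserved_capacity=%.1f%%\n",
		n.cpuUtilization(), n.cpuReservedCapacity())
	fmt.Printf("node_memory_utilization=%.1f%% node_memory_reserved_capacity=%.1f%%\n",
		n.memoryUtilization(), n.memoryReservedCapacity())
}
```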
Node Disk IO
Metric | Unit |
---|---|
node_diskio_io_serviced_async | Count/Second |
node_diskio_io_serviced_read | Count/Second |
node_diskio_io_serviced_sync | Count/Second |
node_diskio_io_serviced_total | Count/Second |
node_diskio_io_serviced_write | Count/Second |
node_diskio_io_service_bytes_async | Bytes/Second |
node_diskio_io_service_bytes_read | Bytes/Second |
node_diskio_io_service_bytes_sync | Bytes/Second |
node_diskio_io_service_bytes_total | Bytes/Second |
node_diskio_io_service_bytes_write | Bytes/Second |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
NodeName |
EBSVolumeId |
device |
Type |
Version |
Sources |
kubernetes |
Node Filesystem
Metric | Unit |
---|---|
node_filesystem_available | Bytes |
node_filesystem_capacity | Bytes |
node_filesystem_inodes | Count |
node_filesystem_inodes_free | Count |
node_filesystem_usage | Bytes |
node_filesystem_utilization | Percent |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
NodeName |
EBSVolumeId |
device |
fstype |
Type |
Version |
Sources |
kubernetes |
Node Network
Metric | Unit |
---|---|
node_interface_network_rx_bytes | Bytes/Second |
node_interface_network_rx_dropped | Count/Second |
node_interface_network_rx_errors | Count/Second |
node_interface_network_rx_packets | Count/Second |
node_interface_network_total_bytes | Bytes/Second |
node_interface_network_tx_bytes | Bytes/Second |
node_interface_network_tx_dropped | Count/Second |
node_interface_network_tx_errors | Count/Second |
node_interface_network_tx_packets | Count/Second |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
NodeName |
Type |
Version |
interface |
Sources |
kubernetes |
Pod
Metric | Unit |
---|---|
pod_cpu_limit | Millicore |
pod_cpu_request | Millicore |
pod_cpu_reserved_capacity | Percent |
pod_cpu_usage_system | Millicore |
pod_cpu_usage_total | Millicore |
pod_cpu_usage_user | Millicore |
pod_cpu_utilization | Percent |
pod_cpu_utilization_over_pod_limit | Percent |
pod_memory_cache | Bytes |
pod_memory_failcnt | Count |
pod_memory_hierarchical_pgfault | Count/Second |
pod_memory_hierarchical_pgmajfault | Count/Second |
pod_memory_limit | Bytes |
pod_memory_mapped_file | Bytes |
pod_memory_max_usage | Bytes |
pod_memory_pgfault | Count/Second |
pod_memory_pgmajfault | Count/Second |
pod_memory_request | Bytes |
pod_memory_reserved_capacity | Percent |
pod_memory_rss | Bytes |
pod_memory_swap | Bytes |
pod_memory_usage | Bytes |
pod_memory_utilization | Percent |
pod_memory_utilization_over_pod_limit | Percent |
pod_memory_working_set | Bytes |
pod_network_rx_bytes | Bytes/Second |
pod_network_rx_dropped | Count/Second |
pod_network_rx_errors | Count/Second |
pod_network_rx_packets | Count/Second |
pod_network_total_bytes | Bytes/Second |
pod_network_tx_bytes | Bytes/Second |
pod_network_tx_dropped | Count/Second |
pod_network_tx_errors | Count/Second |
pod_network_tx_packets | Count/Second |
pod_number_of_container_restarts | Count |
pod_number_of_containers | Count |
pod_number_of_running_containers | Count |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
K8sPodName |
Namespace |
NodeName |
PodId |
Type |
Version |
Sources |
kubernetes |
pod_status |
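The pod table has two CPU utilization metrics that differ only in their denominator. As we read the CloudWatch Container Insights documentation:

- pod_cpu_utilization = pod_cpu_usage_total / node_cpu_limit (share of the whole node)
- pod_cpu_utilization_over_pod_limit = pod_cpu_usage_total / pod_cpu_limit (share of the pod's own limit)
- pod_cpu_reserved_capacity = pod_cpu_request / node_cpu_limit

The Go sketch below illustrates the difference. The podCPU type and the sample values are hypothetical. So is the choice to report no over-limit value when the pod sets no CPU limit.

```go
package main

import "fmt"

// percent returns 100*part/whole; callers must ensure whole is non-zero.
func percent(part, whole float64) float64 { return 100 * part / whole }

// podCPU holds the inputs for the pod CPU percentages (all millicores).
type podCPU struct {
	UsageTotal float64 // pod_cpu_usage_total
	Request    float64 // pod_cpu_request
	Limit      float64 // pod_cpu_limit, 0 when the pod sets no limit
	NodeLimit  float64 // node_cpu_limit of the node running the pod
}

// utilization is the pod's share of the whole node (pod_cpu_utilization).
func (p podCPU) utilization() float64 { return percent(p.UsageTotal, p.NodeLimit) }

// reservedCapacity is the pod's request relative to the node (pod_cpu_reserved_capacity).
func (p podCPU) reservedCapacity() float64 { return percent(p.Request, p.NodeLimit) }

// overPodLimit is usage relative to the pod's own limit
// (pod_cpu_utilization_over_pod_limit). Assumed: no value without a limit.
func (p podCPU) overPodLimit() (float64, bool) {
	if p.Limit == 0 {
		return 0, false
	}
	return percent(p.UsageTotal, p.Limit), true
}

func main() {
	// Hypothetical pod using 200m of a 500m limit on a 4-vCPU node.
	p := podCPU{UsageTotal: 200, Request: 250, Limit: 500, NodeLimit: 4000}
	over, _ := p.overPodLimit()
	fmt.Printf("pod_cpu_utilization=%.2f%% over_pod_limit=%.2f%% reserved=%.2f%%\n",
		p.utilization(), over, p.reservedCapacity())
}
```

A pod can be nearly at its own limit while still using only a small share of the node.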
Pod Network
Metric | Unit |
---|---|
pod_interface_network_rx_bytes | Bytes/Second |
pod_interface_network_rx_dropped | Count/Second |
pod_interface_network_rx_errors | Count/Second |
pod_interface_network_rx_packets | Count/Second |
pod_interface_network_total_bytes | Bytes/Second |
pod_interface_network_tx_bytes | Bytes/Second |
pod_interface_network_tx_dropped | Count/Second |
pod_interface_network_tx_errors | Count/Second |
pod_interface_network_tx_packets | Count/Second |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
K8sPodName |
Namespace |
NodeName |
PodId |
Type |
Version |
interface |
Sources |
kubernetes |
pod_status |
Container
Metric | Unit |
---|---|
container_cpu_limit | Millicore |
container_cpu_request | Millicore |
container_cpu_usage_system | Millicore |
container_cpu_usage_total | Millicore |
container_cpu_usage_user | Millicore |
container_cpu_utilization | Percent |
container_memory_cache | Bytes |
container_memory_failcnt | Count |
container_memory_hierarchical_pgfault | Count/Second |
container_memory_hierarchical_pgmajfault | Count/Second |
container_memory_limit | Bytes |
container_memory_mapped_file | Bytes |
container_memory_max_usage | Bytes |
container_memory_pgfault | Count/Second |
container_memory_pgmajfault | Count/Second |
container_memory_request | Bytes |
container_memory_rss | Bytes |
container_memory_swap | Bytes |
container_memory_usage | Bytes |
container_memory_utilization | Percent |
container_memory_working_set | Bytes |
number_of_container_restarts | Count |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
ContainerId |
ContainerName |
InstanceId |
InstanceType |
K8sPodName |
Namespace |
NodeName |
PodId |
Type |
Version |
Sources |
kubernetes |
container_status |
container_status_reason |
container_last_termination_reason |
The attribute container_status_reason is present only when container_status is in the "Waiting" or "Terminated" state. The attribute container_last_termination_reason is present only when container_status is in the "Terminated" state.
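The presence rules for the container status attributes can be expressed as a small Go function. This is a sketch of the rules as stated above, not the receiver's code. The function name containerStatusAttrs is hypothetical, and the reason strings in the example are ordinary Kubernetes reasons used only as sample inputs.

```go
package main

import "fmt"

// containerStatusAttrs (hypothetical) returns the status attributes that the
// rules above allow for a container in the given state.
func containerStatusAttrs(status, reason, lastTerminationReason string) map[string]string {
	attrs := map[string]string{"container_status": status}
	// container_status_reason: only for "Waiting" or "Terminated".
	if status == "Waiting" || status == "Terminated" {
		attrs["container_status_reason"] = reason
	}
	// container_last_termination_reason: only for "Terminated".
	if status == "Terminated" {
		attrs["container_last_termination_reason"] = lastTerminationReason
	}
	return attrs
}

func main() {
	fmt.Println(containerStatusAttrs("Running", "", ""))
	fmt.Println(containerStatusAttrs("Waiting", "CrashLoopBackOff", ""))
	fmt.Println(containerStatusAttrs("Terminated", "Error", "OOMKilled"))
}
```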
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an ECS cluster to collect the instance-level metrics:
receivers:
  awscontainerinsightreceiver:
    collection_interval: 10s
    container_orchestrator: ecs
processors:
  batch/metrics:
    timeout: 60s
exporters:
  awsemf:
    namespace: ContainerInsightsEC2Instance
    log_group_name: '/aws/ecs/containerinsights/{ClusterName}/performance'
    log_stream_name: 'instanceTelemetry/{ContainerInstanceId}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources]
    metric_declarations:
      # instance metrics
      - dimensions: [ [ ContainerInstanceId, InstanceId, ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_filesystem_utilization
      - dimensions: [ [ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_cpu_usage_total
          - instance_cpu_limit
          - instance_memory_working_set
          - instance_memory_limit
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf,debug]
To deploy to an ECS cluster, check this doc for details.
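Both sample configurations use placeholders such as {ClusterName}, {NodeName}, and {ContainerInstanceId} in log_group_name and log_stream_name. These are filled from the values of the attributes with the same names. The Go sketch below shows that substitution.

It is a hypothetical model, not the awsemf exporter's implementation. In particular, the sketch leaves an unknown placeholder untouched, and the exporter's handling of missing attributes may differ. The attribute values are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {Name} tokens such as {ClusterName}.
var placeholder = regexp.MustCompile(`\{([A-Za-z0-9_]+)\}`)

// expandPattern replaces each {Name} in pattern with attrs[Name]. Unknown
// placeholders are left as-is in this sketch.
func expandPattern(pattern string, attrs map[string]string) string {
	return placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		key := m[1 : len(m)-1] // strip the surrounding braces
		if v, ok := attrs[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	// Hypothetical attribute values for one ECS container instance.
	ecsAttrs := map[string]string{"ClusterName": "demo", "ContainerInstanceId": "abc123"}
	fmt.Println(expandPattern("/aws/ecs/containerinsights/{ClusterName}/performance", ecsAttrs))
	fmt.Println(expandPattern("instanceTelemetry/{ContainerInstanceId}", ecsAttrs))
}
```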
Available Metrics and Resource Attributes
Instance
Metric | Unit |
---|---|
instance_cpu_limit | Millicore |
instance_cpu_reserved_capacity | Percent |
instance_cpu_usage_system | Millicore |
instance_cpu_usage_total | Millicore |
instance_cpu_usage_user | Millicore |
instance_cpu_utilization | Percent |
instance_memory_cache | Bytes |
instance_memory_failcnt | Count |
instance_memory_hierarchical_pgfault | Count/Second |
instance_memory_hierarchical_pgmajfault | Count/Second |
instance_memory_limit | Bytes |
instance_memory_mapped_file | Bytes |
instance_memory_max_usage | Bytes |
instance_memory_pgfault | Count/Second |
instance_memory_pgmajfault | Count/Second |
instance_memory_reserved_capacity | Percent |
instance_memory_rss | Bytes |
instance_memory_swap | Bytes |
instance_memory_usage | Bytes |
instance_memory_utilization | Percent |
instance_memory_working_set | Bytes |
instance_network_rx_bytes | Bytes/Second |
instance_network_rx_dropped | Count/Second |
instance_network_rx_errors | Count/Second |
instance_network_rx_packets | Count/Second |
instance_network_total_bytes | Bytes/Second |
instance_network_tx_bytes | Bytes/Second |
instance_network_tx_dropped | Count/Second |
instance_network_tx_errors | Count/Second |
instance_network_tx_packets | Count/Second |
instance_number_of_running_tasks | Count |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
Instance Disk IO
Metric | Unit |
---|---|
instance_diskio_io_serviced_async | Count/Second |
instance_diskio_io_serviced_read | Count/Second |
instance_diskio_io_serviced_sync | Count/Second |
instance_diskio_io_serviced_total | Count/Second |
instance_diskio_io_serviced_write | Count/Second |
instance_diskio_io_service_bytes_async | Bytes/Second |
instance_diskio_io_service_bytes_read | Bytes/Second |
instance_diskio_io_service_bytes_sync | Bytes/Second |
instance_diskio_io_service_bytes_total | Bytes/Second |
instance_diskio_io_service_bytes_write | Bytes/Second |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
EBSVolumeId |
Instance Filesystem
Metric | Unit |
---|---|
instance_filesystem_available | Bytes |
instance_filesystem_capacity | Bytes |
instance_filesystem_inodes | Count |
instance_filesystem_inodes_free | Count |
instance_filesystem_usage | Bytes |
instance_filesystem_utilization | Percent |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
EBSVolumeId |
Instance Network
Metric | Unit |
---|---|
instance_interface_network_rx_bytes | Bytes/Second |
instance_interface_network_rx_dropped | Count/Second |
instance_interface_network_rx_errors | Count/Second |
instance_interface_network_rx_packets | Count/Second |
instance_interface_network_total_bytes | Bytes/Second |
instance_interface_network_tx_bytes | Bytes/Second |
instance_interface_network_tx_dropped | Count/Second |
instance_interface_network_tx_errors | Count/Second |
instance_interface_network_tx_packets | Count/Second |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
EBSVolumeId |
Warnings
Root permissions
When using this component, the collector process needs root permissions to read the contents of the following locations:
- /
- /var/run/docker.sock
- /var/lib/docker
- /run/containerd/containerd.sock
- /sys
- /dev/disk
This requirement exists because this component is based on cAdvisor.
Documentation
Functions
func NewFactory
NewFactory creates a factory for AWS container insight receiver
Types
type Config
type Config struct {
	// CollectionInterval is the interval at which metrics should be collected. The default is 60 second.
	CollectionInterval time.Duration `mapstructure:"collection_interval"`

	// ContainerOrchestrator is the type of container orchestration service, e.g. eks or ecs. The default is eks.
	ContainerOrchestrator string `mapstructure:"container_orchestrator"`

	// Whether to add the associated service name as attribute. The default is true
	TagService bool `mapstructure:"add_service_as_attribute"`

	// The "PodName" attribute is set based on the name of the relevant controllers like Daemonset, Job, ReplicaSet, ReplicationController, ...
	// If it cannot be set that way and PrefFullPodName is true, the "PodName" attribute is set to the pod's own name.
	// The default value is false
	PrefFullPodName bool `mapstructure:"prefer_full_pod_name"`

	// The "FullPodName" attribute is the pod name including suffix
	// If false FullPodName label is not added
	// The default value is false
	AddFullPodNameMetricLabel bool `mapstructure:"add_full_pod_name_metric_label"`
}
Config defines configuration for aws ecs container metrics receiver.