Azure Monitor Receiver
This receiver scrapes the Azure Monitor API for resource metrics.
Configuration
One of the following settings is required:
- subscription_ids: list of subscriptions from which resource metrics are collected, or
- discover_subscriptions (default = false): if set to true, metrics are collected from all subscriptions in the tenant.
The following settings are optional:
- auth (default = service_principal): Specifies the authentication method. Supported values are service_principal, workload_identity, managed_identity, and default_credentials.
- resource_groups (default = none): Filters metrics to specific resource groups. If not set, metrics are scraped for all resources in the subscription.
- services (default = none): Filters metrics to specific services. If not set, metrics are scraped for all services integrated with Azure Monitor.
- metrics (default = none): Filters metrics by name and aggregation. If not set, all metrics and all their aggregations are scraped.
- cache_resources (default = 86400): Time, in seconds, for which the list of resources is cached.
- cache_resources_definitions (default = 86400): Time, in seconds, for which the list of metric definitions is cached.
- maximum_number_of_metrics_in_a_call (default = 20): Maximum number of metrics to fetch per API call. The current limit in Azure is 20 (as of 03/27/2023).
- maximum_number_of_records_per_resource (default = 10): Maximum number of records to fetch per resource.
- initial_delay (default = 1s): Defines how long this receiver waits before starting.
- cloud (default = AzureCloud): Defines which Azure cloud to use. Valid values: AzureCloud, AzureUSGovernment, AzureChinaCloud.
- dimensions.enabled (default = true): Allows opting out of automatically splitting metrics by all the dimensions of the resource type.
- dimensions.overrides (default = {}): If dimensions are enabled, allows you to specify a set of dimensions for a particular metric. This is a two-level map: the first key is the resource type and the second key is the metric name. Use the metric's programmatic name, as listed in https://learn.microsoft.com/en-us/azure/azure-monitor/reference/metrics-index
- use_batch_api (default = false): Use the batch API to fetch metrics. This is useful when the number of subscriptions is high and the API calls are rate limited.
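The cache_resources and cache_resources_definitions settings describe time-based caching of the resource list and of the metric definitions. The following Python sketch illustrates that behavior with a minimal TTL cache. The TTLCache class, its injectable clock, and the choice to refresh once the cached value's age reaches the TTL are assumptions for illustration, not the receiver's actual implementation.

```python
# Hypothetical sketch of TTL-based caching as described for
# cache_resources / cache_resources_definitions (TTL in seconds).
import time


class TTLCache:
    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock, handy for tests
        self._value = None
        self._fetched_at = None

    def get(self, fetch):
        """Return the cached value, calling fetch() only when it has expired."""
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at >= self.ttl:
            self._value = fetch()  # e.g. list resources from Azure
            self._fetched_at = now
        return self._value
```

With the default of 86400 seconds, the fetch function runs on the first scrape and then at most once a day, while every other collection interval reuses the cached list.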
Authenticating using a service principal requires the following additional settings:
- tenant_id
- client_id
- client_secret
Authenticating using workload identity requires the following additional settings:
- tenant_id
- client_id
- federated_token_file
Authenticating using managed identity has the following optional setting:
- client_id
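The per-method settings above can be checked before starting the receiver. This Python sketch encodes them as a lookup table; the function name missing_auth_settings is hypothetical, the workload identity key follows the federated_token_file spelling used in the example configurations, and default_credentials is assumed to need no extra settings because none are listed for it.

```python
# Hypothetical sketch: which additional settings each auth method requires,
# taken from the lists above. Not the receiver's actual validation code.
REQUIRED_SETTINGS = {
    "service_principal": ("tenant_id", "client_id", "client_secret"),
    "workload_identity": ("tenant_id", "client_id", "federated_token_file"),
    "managed_identity": (),     # nothing required; client_id is optional
    "default_credentials": (),  # assumption: no extra settings listed
}


def missing_auth_settings(config):
    """Return the required auth settings that are absent or empty."""
    auth = config.get("auth", "service_principal")  # documented default
    if auth not in REQUIRED_SETTINGS:
        raise ValueError(f"unsupported auth method: {auth!r}")
    return [key for key in REQUIRED_SETTINGS[auth] if not config.get(key)]
```

For example, a configuration with only tenant_id and client_id (and the default auth) reports client_secret as missing.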
Filtering metrics
The metrics configuration setting limits scraping to specific metrics and their particular aggregations. It accepts a nested map: the top-level key is the Azure metric namespace, the nested key is the Azure metric name, and the value is a list of aggregation methods (e.g., Average, Minimum, Maximum, Total, Count). The value can also be an empty array or an array with the single element "*" (asterisk; quote it in YAML, since a bare * starts an alias). In that case, the scraper fetches all supported aggregations for the metric. Namespaces, metric names, and aggregations are case-insensitive.
Scraping limited metrics and aggregations:
receivers:
  azuremonitor:
    resource_groups:
      - ${resource_groups}
    services:
      - Microsoft.EventHub/namespaces
      - Microsoft.AAD/DomainServices # scraper will fetch all metrics from this namespace since there are no limits under the "metrics" option
    metrics:
      "microsoft.eventhub/namespaces": # scraper will fetch only the metrics listed below:
        IncomingMessages: [total] # metric IncomingMessages with aggregation "Total"
        NamespaceCpuUsage: ["*"] # metric NamespaceCpuUsage with all known aggregations
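The filtering rules described above can be summarized as a small decision function. This Python sketch is a hypothetical model of those rules, not the receiver's code: ALL_AGGREGATIONS stands in for the aggregations a metric definition reports as supported, and a listed namespace is assumed to exclude its unlisted metrics, as the comments in the example indicate.

```python
# Hypothetical sketch of the "metrics" filter semantics:
# namespace -> metric name -> aggregations, matched case-insensitively.
ALL_AGGREGATIONS = ["Average", "Count", "Maximum", "Minimum", "Total"]


def aggregations_to_fetch(metrics_filter, namespace, metric_name,
                          supported=ALL_AGGREGATIONS):
    """Return the aggregations to scrape for a metric; [] means skip it."""
    if not metrics_filter:
        return list(supported)  # no filter: every metric, every aggregation
    by_namespace = {ns.lower(): m for ns, m in metrics_filter.items()}
    per_metric = by_namespace.get(namespace.lower())
    if per_metric is None:
        return list(supported)  # namespace not listed: no limits
    wanted = {name.lower(): aggs for name, aggs in per_metric.items()}
    if metric_name.lower() not in wanted:
        return []  # namespace is limited and this metric is not listed
    aggs = wanted[metric_name.lower()]
    if not aggs or aggs == ["*"]:
        return list(supported)  # empty list or ["*"]: all aggregations
    requested = {a.lower() for a in aggs}
    return [a for a in supported if a.lower() in requested]
```

Applied to the example above, IncomingMessages yields only Total, NamespaceCpuUsage yields every supported aggregation, and any metric in Microsoft.AAD/DomainServices is scraped without limits.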
Use Batch API (experimental)
There are two APIs for collecting metrics in Azure Monitor:
- the Azure Resource Manager (ARM) API (the current default)
- the Azure Monitor Metrics Data Plane API (with use_batch_api: true)
The Azure Monitor Metrics Data Plane API offers several benefits, especially regarding rate limits.
Some highlights from the announcement blog post (Jan 31, 2024):
- Higher Querying Limits: This API is designed to handle metric data queries for resources with higher query limits compared to existing Azure Resource Manager APIs. This is particularly advantageous for customers with large subscriptions containing numerous resources. While the REST API allows only 12,000 API calls per hour, the Azure Metrics Data Plane API elevates this limit to a staggering 360,000 API calls per hour. This increase in query throughput ensures a more efficient and streamlined experience for customers.
- Efficiency: The efficiency of this API shines when collecting metrics for multiple resources. Instead of making multiple API calls for each resource, the Azure Metrics Data Plane API offers a single batch API call that can accommodate up to 50 resource IDs. This results in higher throughput and more efficient querying, making it a time-saving solution.
- Improved Performance: The performance of client-side services can be greatly enhanced by reducing the number of calls required to extract the same amount of metric data for resources. The Azure Metrics Data Plane API optimizes the data retrieval process, ultimately leading to improved performance across the board.
It is easy to try out:
receivers:
  azuremonitor:
    use_batch_api: true
    # ... other settings unchanged
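To put the quoted figures in perspective, the following Python sketch does the back-of-the-envelope arithmetic. It is a deliberate simplification: it assumes one metrics call per resource on the ARM API and one call per batch of up to 50 resource IDs on the Data Plane API, and it ignores metric-definition and resource-listing calls, chunking by metric count or composite key, and any per-subscription scoping of the limits.

```python
# Simplified comparison using the figures quoted in the highlights.
import math

ARM_CALLS_PER_HOUR = 12_000          # quoted limit for the ARM REST API
DATA_PLANE_CALLS_PER_HOUR = 360_000  # quoted limit for the Data Plane API
BATCH_SIZE = 50                      # resource IDs per batch call


def calls_per_scrape(num_resources, batched):
    """Metrics calls for one scrape: one per resource (ARM) or
    one per batch of up to BATCH_SIZE resources (Data Plane)."""
    if num_resources < 1:
        raise ValueError("num_resources must be at least 1")
    return math.ceil(num_resources / BATCH_SIZE) if batched else num_resources


def max_scrapes_per_hour(num_resources, batched):
    """How many full scrapes fit in the hourly call budget."""
    budget = DATA_PLANE_CALLS_PER_HOUR if batched else ARM_CALLS_PER_HOUR
    return budget // calls_per_scrape(num_resources, batched)
```

Under these assumptions, 1,000 resources cost 1,000 calls per scrape on ARM (at most 12 scrapes per hour) versus 20 calls with batching (up to 18,000 scrapes per hour).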
Example Configurations
Using Service Principal for authentication:
receivers:
  azuremonitor:
    subscription_ids: ["${subscription_id}"]
    tenant_id: "${tenant_id}"
    client_id: "${client_id}"
    client_secret: "${env:CLIENT_SECRET}"
    cloud: AzureUSGovernment
    resource_groups:
      - "${resource_group1}"
      - "${resource_group2}"
    services:
      - "${service1}"
      - "${service2}"
    collection_interval: 60s
    initial_delay: 1s
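The resource_groups and services filters in the example above restrict which resources are scraped, and leaving either unset matches everything. This Python sketch models that rule client-side; the Resource class and select_resources function are hypothetical, the receiver may instead push these filters to the Azure Resource Manager API, and case-insensitive matching is an assumption based on Azure treating resource group names and resource types case-insensitively.

```python
# Hypothetical sketch of the resource_groups / services filters.
from dataclasses import dataclass


@dataclass
class Resource:
    resource_group: str
    resource_type: str  # e.g. "Microsoft.EventHub/namespaces"


def select_resources(resources, resource_groups=None, services=None):
    """Keep resources matching both filters; an unset filter matches all."""
    groups = {g.lower() for g in resource_groups or []}
    types = {s.lower() for s in services or []}
    selected = []
    for res in resources:
        if groups and res.resource_group.lower() not in groups:
            continue  # resource group not in the filter
        if types and res.resource_type.lower() not in types:
            continue  # service (resource type) not in the filter
        selected.append(res)
    return selected
```

When both filters are set, a resource must satisfy both, which matches the example's intent of scraping only the listed services inside the listed resource groups.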
Using Azure Workload Identity for authentication:
receivers:
  azuremonitor:
    subscription_ids: ["${subscription_id}"]
    auth: "workload_identity"
    tenant_id: "${env:AZURE_TENANT_ID}"
    client_id: "${env:AZURE_CLIENT_ID}"
    federated_token_file: "${env:AZURE_FEDERATED_TOKEN_FILE}"
Using Managed Identity for authentication:
receivers:
  azuremonitor:
    subscription_ids: ["${subscription_id}"]
    auth: "managed_identity"
    client_id: "${env:AZURE_CLIENT_ID}"
Using Environment Variables for authentication:
receivers:
  azuremonitor:
    subscription_ids: ["${subscription_id}"]
    auth: "default_credentials"
Overriding dimensions for a particular metric:
receivers:
  azuremonitor:
    dimensions:
      enabled: true
      overrides:
        "Microsoft.Network/azureFirewalls":
          # Real example of an Azure limitation here:
          # Dimensions exposed are Reason, Status, Protocol,
          # but when selecting Protocol in the filters, it returns nothing.
          # Note that the metric display name is "Network rules hit count" but its programmatic value is "NetworkRuleHit"
          # Ref: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-network-azurefirewalls-metrics
          "NetworkRuleHit": [Reason, Status]
Metrics
Details about the metrics scraped by this receiver can be found in Supported metrics with Azure Monitor. This receiver adds the prefix "azure_" to all scraped metrics.
Azure API calls summary
The following Azure APIs may be called at each collection interval.
Knowing them helps when configuring the client's permissions in Azure or when choosing the
right configuration for your needs.
Subscriptions - Get
conditions:
- subscription_ids is set
- discover_subscriptions is false or not set
- resource_attributes.subscription.enabled is true
cardinality: once per subscription ID
Subscriptions - List
conditions:
- discover_subscriptions is true
cardinality: once per *page of subscriptions
Resources - List
conditions:
- always
cardinality: once per subscription ID
Metric Definitions - List
conditions:
- always
cardinality:
- if use_batch_api is false, once per resource ID and *page of metric definitions
- if use_batch_api is true, once per resource type and *page of metric definitions
Metrics - List
conditions:
- use_batch_api is false
cardinality: once per resource ID, *page of metrics, and **composite key
Metrics Batch
conditions:
- use_batch_api is true
cardinality: once per resource type and **composite key
*The page size is not clearly documented. The "top"/"$top" filter does not seem related to it,
and even chatbots give inconsistent answers ranging from 10 to 1000.
**The composite key is an identifier built from information retrieved from the metric definitions.
It is composed of:
- dimensions,
- aggregations,
- minimum time grain.
Metrics that share the same composite key are fetched together in one request, which reduces the number of metrics calls.
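The composite-key grouping can be sketched as follows: metric definitions sharing dimensions, aggregations, and minimum time grain land in one group, and each group is split into requests of at most maximum_number_of_metrics_in_a_call names (default 20). The MetricDefinition class, the sorted-tuple key, and plan_requests are hypothetical; the receiver's actual key format may differ.

```python
# Hypothetical sketch: grouping metric definitions by composite key
# (dimensions, aggregations, minimum time grain), then chunking each
# group into requests of at most max_per_call metric names.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    dimensions: tuple    # e.g. ("Reason", "Status")
    aggregations: tuple  # e.g. ("Average", "Total")
    min_time_grain: str  # e.g. "PT1M"


def plan_requests(definitions, max_per_call=20):
    """Return (composite_key, [metric names]) pairs, one per request."""
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    groups = defaultdict(list)
    for d in definitions:
        key = (tuple(sorted(d.dimensions)),
               tuple(sorted(d.aggregations)),
               d.min_time_grain)
        groups[key].append(d.name)
    requests = []
    for key, names in groups.items():  # insertion order is preserved
        for i in range(0, len(names), max_per_call):
            requests.append((key, names[i:i + max_per_call]))
    return requests
```

Each resulting entry would map to one metrics request, with the names passed together (the Azure Metrics - List API accepts a comma-separated metricnames parameter).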