Azure Data Explorer Output Plugin
Azure Data Explorer is a distributed, columnar store, purpose built for any type
of logs, metrics and time series data.
This plugin writes data collected by any of the Telegraf input plugins to
Azure Data Explorer, Azure Synapse Data Explorer,
and Real time analytics in Fabric.
Pre-requisites
Global configuration options
In addition to the plugin-specific configuration settings, plugins support
additional global and plugin configuration settings. These settings are used to
modify metrics, tags, and field or create aliases and configure ordering, etc.
See the CONFIGURATION.md for more details.
Configuration
# Sends metrics to Azure Data Explorer
[[outputs.azure_data_explorer]]
## The URI property of the Azure Data Explorer resource on Azure
## ex: endpoint_url = https://myadxresource.australiasoutheast.kusto.windows.net
endpoint_url = ""
## The Azure Data Explorer database that the metrics will be ingested into.
## The plugin will NOT generate this database automatically, it's expected that this database already exists before ingestion.
## ex: "exampledatabase"
database = ""
## Timeout for Azure Data Explorer operations
# timeout = "20s"
## Type of metrics grouping used when pushing to Azure Data Explorer.
## Default is "TablePerMetric" for one table per different metric.
## For more information, please check the plugin README.
# metrics_grouping_type = "TablePerMetric"
## Name of the single table to store all the metrics (Only needed if metrics_grouping_type is "SingleTable").
# table_name = ""
## Creates tables and relevant mapping if set to true(default).
## Skips table and mapping creation if set to false, this is useful for running Telegraf with the lowest possible permissions i.e. table ingestor role.
# create_tables = true
## Ingestion method to use.
## Available options are
## - managed -- streaming ingestion with fallback to batched ingestion or the "queued" method below
## - queued -- queue up metrics data and process sequentially
# ingestion_type = "queued"
Metrics Grouping
Metrics can be grouped in two ways to be sent to Azure Data Explorer. To specify
which metric grouping type the plugin should use, the respective value should be
given to the metrics_grouping_type
in the config file. If no value is given to
metrics_grouping_type
, by default, the metrics will be grouped using
TablePerMetric
.
TablePerMetric
The plugin will group the metrics by the metric name, and will send each group
of metrics to an Azure Data Explorer table. If the table doesn't exist the
plugin will create the table, if the table exists then the plugin will try to
merge the Telegraf metric schema to the existing table. For more information
about the merge process check the .create-merge
documentation.
The table name will match the name
property of the metric, this means that the
name of the metric should comply with the Azure Data Explorer table naming
constraints in case you plan to add a prefix to the metric name.
SingleTable
The plugin will send all the metrics received to a single Azure Data Explorer
table. The name of the table must be supplied via table_name
in the config
file. If the table doesn't exist the plugin will create the table, if the table
exists then the plugin will try to merge the Telegraf metric schema to the
existing table. For more information about the merge process check the
.create-merge
documentation.
Tables Schema
The schema of the Azure Data Explorer table will match the structure of the
Telegraf Metric
object. The corresponding Azure Data Explorer command
generated by the plugin would be like the following:
.create-merge table ['table-name'] (['fields']:dynamic, ['name']:string, ['tags']:dynamic, ['timestamp']:datetime)
The corresponding table mapping would be like the following:
.create-or-alter table ['table-name'] ingestion json mapping 'table-name_mapping' '[{"column":"fields", "Properties":{"Path":"$[\'fields\']"}},{"column":"name", "Properties":{"Path":"$[\'name\']"}},{"column":"tags", "Properties":{"Path":"$[\'tags\']"}},{"column":"timestamp", "Properties":{"Path":"$[\'timestamp\']"}}]'
Note: This plugin will automatically create Azure Data Explorer tables and
corresponding table mapping as per the above mentioned commands.
Ingestion type
Note:
Streaming ingestion
has to be enabled on ADX [configure the ADX cluster]
in case of managed
option.
Refer the query below to check if streaming is enabled
.show database <DB-Name> policy streamingingestion
Authentication
Supported Authentication Methods
This plugin provides several types of authentication. The plugin will check the
existence of several specific environment variables, and consequently will
choose the right method.
These methods are:
-
AAD Application Tokens (Service Principals with secrets or certificates).
For guidance on how to create and register an App in Azure Active Directory
check this article, and for more information on the Service
Principals check this article.
-
AAD User Tokens
- Allows Telegraf to authenticate like a user. This method is mainly used
for development purposes only.
-
Managed Service Identity (MSI) token
- If you are running Telegraf from Azure VM or infrastructure, then this is
the preferred authentication method.
Whichever method, the designated Principal needs to be assigned the Database User
role on the Database level in the Azure Data Explorer. This role will
allow the plugin to create the required tables and ingest data into it. If
create_tables=false
then the designated principal only needs the Database Ingestor
role at least.
Configurations of the chosen Authentication Method
The plugin will authenticate using the first available of the following
configurations, it's important to understand that the assessment, and
consequently choosing the authentication method, will happen in order as
below:
-
Client Credentials: Azure AD Application ID and Secret.
Set the following environment variables:
AZURE_TENANT_ID
: Specifies the Tenant to which to authenticate.
AZURE_CLIENT_ID
: Specifies the app client ID to use.
AZURE_CLIENT_SECRET
: Specifies the app secret to use.
-
Client Certificate: Azure AD Application ID and X.509 Certificate.
AZURE_TENANT_ID
: Specifies the Tenant to which to authenticate.
AZURE_CLIENT_ID
: Specifies the app client ID to use.
AZURE_CERTIFICATE_PATH
: Specifies the certificate Path to use.
AZURE_CERTIFICATE_PASSWORD
: Specifies the certificate password to use.
-
Resource Owner Password: Azure AD User and Password. This grant type is
not recommended, use device login instead if you need interactive login.
AZURE_TENANT_ID
: Specifies the Tenant to which to authenticate.
AZURE_CLIENT_ID
: Specifies the app client ID to use.
AZURE_USERNAME
: Specifies the username to use.
AZURE_PASSWORD
: Specifies the password to use.
-
Azure Managed Service Identity: Delegate credential management to the
platform. Requires that code is running in Azure, e.g. on a VM. All
configuration is handled by Azure. See Azure Managed Service Identity
for more details. Only available when using the Azure Resource
Manager.
Querying data collected in Azure Data Explorer
Examples of data transformations and queries that would be useful to gain
insights -
Sample SQL metrics data -
name |
tags |
timestamp |
fields |
sqlserver_database_io |
{"database_name":"azure-sql-db2","file_type":"DATA","host":"adx-vm","logical_filename":"tempdev","measurement_db_type":"AzureSQLDB","physical_filename":"tempdb.mdf","replica_updateability":"READ_WRITE","sql_instance":"adx-sql-server"} |
2021-09-09T13:51:20Z |
{"current_size_mb":16,"database_id":2,"file_id":1,"read_bytes":2965504,"read_latency_ms":68,"reads":47,"rg_read_stall_ms":42,"rg_write_stall_ms":0,"space_used_mb":0,"write_bytes":1220608,"write_latency_ms":103,"writes":149} |
sqlserver_waitstats |
{"database_name":"azure-sql-db2","host":"adx-vm","measurement_db_type":"AzureSQLDB","replica_updateability":"READ_WRITE","sql_instance":"adx-sql-server","wait_category":"Worker Thread","wait_type":"THREADPOOL"} |
2021-09-09T13:51:20Z |
{"max_wait_time_ms":15,"resource_wait_ms":4469,"signal_wait_time_ms":0,"wait_time_ms":4469,"waiting_tasks_count":1464} |
Since collected metrics object is of complex type so "fields" and "tags" are
stored as dynamic data type, multiple ways to query this data-
- Query JSON attributes directly: Azure Data Explorer provides an ability to
query JSON data in raw format without parsing it, so JSON attributes can be
queried directly in following way:
Tablename
| where name == "sqlserver_azure_db_resource_stats" and todouble(fields.avg_cpu_percent) > 7
Tablename
| distinct tostring(tags.database_name)
Note - This approach could have performance impact in case of large
volumes of data, use below mentioned approach for such cases.
- Use Update
policy**:
Transform dynamic data type columns using update policy. This is the
recommended performant way for querying over large volumes of data compared
to querying directly over JSON attributes:
// Function to transform data
.create-or-alter function Transform_TargetTableName() {
SourceTableName
| mv-apply fields on (extend key = tostring(bag_keys(fields)[0]))
| project fieldname=key, value=todouble(fields[key]), name, tags, timestamp
}
// Create destination table with above query's results schema (if it doesn't exist already)
.set-or-append TargetTableName <| Transform_TargetTableName() | limit 0
// Apply update policy on destination table
.alter table TargetTableName policy update
@'[{"IsEnabled": true, "Source": "SourceTableName", "Query": "Transform_TargetTableName()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
Sample syslog data -
name |
tags |
timestamp |
fields |
syslog |
{"appname":"azsecmond","facility":"user","host":"adx-linux-vm","hostname":"adx-linux-vm","severity":"info"} |
2021-09-20T14:36:44Z |
{"facility_code":1,"message":" 2021/09/20 14:36:44.890110 Failed to connect to mdsd: dial unix /var/run/mdsd/default_djson.socket: connect: no such file or directory","procid":"2184","severity_code":6,"timestamp":"1632148604890477000","version":1} |
syslog |
{"appname":"CRON","facility":"authpriv","host":"adx-linux-vm","hostname":"adx-linux-vm","severity":"info"} |
2021-09-20T14:37:01Z |
{"facility_code":10,"message":" pam_unix(cron:session): session opened for user root by (uid=0)","procid":"26446","severity_code":6,"timestamp":"1632148621120781000","version":1} |
There are multiple ways to flatten dynamic columns using 'extend' or
'bag_unpack' operator. You can use either of these ways in above mentioned
update policy function - 'Transform_TargetTableName()'
-
Use
extend
operator - This is the recommended approach compared to 'bag_unpack' as it is
faster and robust. Even if schema changes, it will not break queries or
dashboards.
Tablenmae
| extend facility_code=toint(fields.facility_code), message=tostring(fields.message), procid= tolong(fields.procid), severity_code=toint(fields.severity_code),
SysLogTimestamp=unixtime_nanoseconds_todatetime(tolong(fields.timestamp)), version= todouble(fields.version),
appname= tostring(tags.appname), facility= tostring(tags.facility),host= tostring(tags.host), hostname=tostring(tags.hostname), severity=tostring(tags.severity)
| project-away fields, tags
-
Use bag_unpack
plugin
to unpack the dynamic type columns automatically. This method could lead to
issues if source schema changes as its dynamically expanding columns.
Tablename
| evaluate bag_unpack(tags, columnsConflict='replace_source')
| evaluate bag_unpack(fields, columnsConflict='replace_source')