dns_query

package
v0.0.0-...-5655933 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 20, 2023 License: MIT Imports: 11 Imported by: 0

README

应用场景

一般用于对DNS服务器的响应监测,帮助运维快速定位网络问题。

部署场景

不需要每台虚拟机都启用此插件,建议是独立或复合的某一台虚拟机启用此插件。

配置场景

本配置启用或数据定义如下功能:
使用本机DNS查询域名解析质量。
使用外部DNS查询域名解析质量。
使用不同记录类型进行DNS查询。
每种查询都设置超时时间5秒。
增加自定义标签,可通过自定义标签筛选数据及更加精确的告警推送。
在domains字段处增加自己想要被DNS查询的域名,一般填写公司业务系统的域名或第三方依赖的业务系统。

修改dns_query.toml文件配置

[root@aliyun input.dns_query]# cat dns_query.toml
# # collect interval
# interval = 15

[[instances]]
# # append some labels for series
labels = { cloud="huaweicloud", region="huabei-beijing-4",azone="az1", product="n9e" }

# # interval = global.interval * interval_times
# interval_times = 1

# #
auto_detect_local_dns_server  = true

### A record

## servers to query
servers = ["223.5.5.5","114.114.114.114","119.29.29.29"]

## Network is the network protocol name.
# network = "udp"

## Domains or subdomains to query.
domains = ["www.huaweicloud.com", "www.baidu.com", "www.tapd.cn"]

## Query record type.
## Possible values: A, AAAA, CNAME, MX, NS, PTR, TXT, SOA, SPF, SRV.
record_type = "A"

## Dns server port.
# port = 53

## Query timeout in seconds.
timeout = 5


### CNAME record

[[instances]]
# # append some labels for series
labels = { cloud="huaweicloud", region="huabei-beijing-4",azone="az1", product="n9e" }

# # interval = global.interval * interval_times
# interval_times = 1

# #
auto_detect_local_dns_server  = false

## servers to query
servers = ["223.5.5.5","114.114.114.114","119.29.29.29"]

## Network is the network protocol name.
# network = "udp"

## Domains or subdomains to query.
domains = ["www.huaweicloud.com", "www.baidu.com", "www.tapd.cn"]

## Query record type.
## Possible values: A, AAAA, CNAME, MX, NS, PTR, TXT, SOA, SPF, SRV.
record_type = "CNAME"

## Dns server port.
# port = 53

## Query timeout in seconds.
timeout = 5


### NS record

[[instances]]
# # append some labels for series
labels = { cloud="huaweicloud", region="huabei-beijing-4",azone="az1", product="n9e" }

# # interval = global.interval * interval_times
# interval_times = 1

# #
auto_detect_local_dns_server  = false

## servers to query
servers = ["223.5.5.5","114.114.114.114","119.29.29.29"]

## Network is the network protocol name.
# network = "udp"

## Domains or subdomains to query.
domains = ["www.huaweicloud.com", "www.baidu.com", "www.tapd.cn"]

## Query record type.
## Possible values: A, AAAA, CNAME, MX, NS, PTR, TXT, SOA, SPF, SRV.
record_type = "NS"

## Dns server port.
# port = 53

## Query timeout in seconds.
timeout = 5

测试配置

./categraf --test --inputs dns_query
....... A记录同理就省略
20:51:34 dns_query_rcode_value agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.tapd.cn product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_result_code agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.tapd.cn product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_query_time_ms agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.tapd.cn product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 33.500371

20:51:34 dns_query_rcode_value agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.baidu.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_result_code agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.baidu.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_query_time_ms agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.baidu.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 34.328242

20:51:34 dns_query_rcode_value agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.huaweicloud.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_result_code agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.huaweicloud.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29 0
20:51:34 dns_query_query_time_ms agent_hostname=aliyun.tjf.n9e.001 azone=az1 cloud=huaweicloud domain=www.huaweicloud.com product=n9e record_type=CNAME region=huabei-beijing-4 server=119.29.29.29
.....

重启服务

重启categraf服务生效
systemctl daemon-reload && systemctl restart categraf && systemctl status categraf

查看启动日志是否有错误
journalctl -f -n 500 -u categraf | grep "E\!" | grep "W\!"

检查数据呈现

等待1-2分钟后数据就会在图表中展示出来,如图: image

监控告警规则配置

个人经验仅供参考,一般DNS解析延迟时间:
超过2000毫秒,为P2级别,启用企业微信应用推送告警,3分钟内恢复发出恢复告警。
超过5000毫秒,为P1级别,启用电话语音告警&企业微信应用告警,3分钟内恢复发出恢复告警。

为什么会这么考量设计?
在用到DNS监控时,一般公司业务是遍布全国的,然而全国各个地区在解析DNS存在各种场景因素导致的DNS问题(如DNS被劫持、片区DNS服务器故障等),所以需要以高级别对待。
从收到告警到恢复告警设置3分钟的意图是防止期间是短暂时间有问题,同时也给SLA(99.99%)给足处理时长。

监控图表配置

先略过

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DnsQuery

type DnsQuery struct {
	config.PluginConfig
	Instances []*Instance `toml:"instances"`
}

func (*DnsQuery) Clone

func (dq *DnsQuery) Clone() inputs.Input

func (*DnsQuery) GetInstances

func (dq *DnsQuery) GetInstances() []inputs.Instance

func (*DnsQuery) Name

func (c *DnsQuery) Name() string

type Instance

type Instance struct {
	config.InstanceConfig

	EnableAutoDetectDnsServer bool `toml:"auto_detect_local_dns_server"`
	// Domains or subdomains to query
	Domains []string `toml:"domains"`

	// Network protocol name
	Network string `toml:"network"`

	// Server to query
	Servers []string `toml:"servers"`

	// Record type
	RecordType string `toml:"record_type"`

	// DNS server port number
	Port int `toml:"port"`

	// Dns query timeout in seconds. 0 means no timeout
	Timeout int `toml:"timeout"`
}

func (*Instance) Gather

func (ins *Instance) Gather(slist *types.SampleList)

func (*Instance) Init

func (ins *Instance) Init() error

type ResultType

type ResultType uint64
const (
	Success ResultType = iota
	Timeout
	Error
)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL