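The observability service lets third-party alerting systems integrate with the platform's alert messages. Two integration methods are currently available: the alert message API and webhook-type alert subscriptions. The alert message API returns detailed real-time and historical messages, whereas a webhook-type alert subscription receives the raw messages sent by Alertmanager directly. For message persistence, the alert message API is recommended; for message forwarding, a webhook-type alert subscription is recommended.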
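Alert message API
The alert message API queries all real-time and historical message data shown on the alert message page, including message content, status, severity, details, category and component, project and department, and group and rule. It also supports filtering by category (digital native engine / cloud product / user workload), status (firing / silenced / resolved), severity (critical / warning / info), and time. For usage, see API Reference - Alert Messages.
This method is suited to querying and storing messages.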
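Webhook-type alert subscription
Create a webhook-type alert subscription (User Guide - Alert Management - Alert Subscription - Create Subscription - Webhook Type) and associate the alert groups you need (User Guide - Alert Management - Alert Subscription - Associate Groups). This lets you control which groups' rules produce the alert messages you receive, so that you can focus on key faults and limit message volume.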
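Once configured, Alertmanager pushes messages in the firing state directly to the webhook endpoint; alerts in the silenced or resolved state are not pushed. If the configuration allows recovery notifications, one notification is pushed when an alert recovers. Handle and persist the messages yourself if needed.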
This method is suited to notifying on and forwarding messages.
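As an illustration of the receiving side, the following is a minimal sketch of a webhook endpoint built on Python's standard http.server module. The names extract_alerts, AlertWebhookHandler, and serve, as well as the default port 8080, are assumptions made for this example and are not part of the platform.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_alerts(body: bytes) -> list:
    """Parse a webhook request body and return its `alerts` list.

    Raises ValueError if the body is not a JSON object.
    """
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("webhook payload must be a JSON object")
    return payload.get("alerts", [])


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: logs each pushed alert and answers 200."""

    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            alerts = extract_alerts(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON, bad encoding, and a bad Content-Length.
            self.send_response(400)
            self.end_headers()
            return
        for alert in alerts:
            labels = alert.get("labels", {})
            # Forwarding or persistence would replace this print.
            print(alert.get("status"), labels.get("severity"), labels.get("alertname"))
        self.send_response(200)
        self.end_headers()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Start the listener; host and port are assumed values."""
    HTTPServer((host, port), AlertWebhookHandler).serve_forever()
```

Calling serve() starts the listener; the URL of that listener would then be entered as the webhook endpoint of the subscription. A real deployment would likely add authentication and TLS in front of it.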
Webhook message push format
Alertmanager sends an HTTP POST request with a JSON body to the configured webhook endpoint.
Name | Type | Description |
---|---|---|
receiver | string | Name of the receiver the notification is sent to |
status | string | firing if at least one alert is firing, otherwise resolved |
groupLabels | dict | Labels by which the alerts are grouped |
commonLabels | dict | Labels common to all alerts |
commonAnnotations | dict | Annotations common to all alerts |
externalURL | string | Internal link (address of the Alertmanager that sent the notification) |
alerts | list | List of alert messages (the main content) |
alerts[$i].status | string | Alert status |
alerts[$i].fingerprint | string | Alert identifier |
alerts[$i].startsAt | string | Start time |
alerts[$i].endsAt | string | End time (meaningful only when the alert is resolved; while firing the value is "0001-01-01T00:00:00Z") |
alerts[$i].labels | dict | Alert labels |
alerts[$i].labels.alertname | string | Alert name (Chinese) |
alerts[$i].labels.severity | string | Alert severity |
alerts[$i].labels.category | string | Alert category |
alerts[$i].labels.group_id | string | ID of the group the alert belongs to |
alerts[$i].labels.rule_id | string | ID of the rule the alert belongs to |
alerts[$i].labels.ecms_cluster_id | string | Cluster ID |
alerts[$i].labels.company | string | Alerting platform: customer name |
alerts[$i].labels.project | string | Alerting platform: project name |
alerts[$i].labels.public_vip | string | Alerting platform: external access address |
alerts[$i].annotations | dict | Alert annotations |
alerts[$i].annotations.alertname_en | string | Alert name (English) |
alerts[$i].annotations.description | string | Alert overview (Chinese) |
alerts[$i].annotations.description_en | string | Alert overview (English) |
alerts[$i].annotations.summary | string | Alert details (Chinese) |
alerts[$i].annotations.summary_en | string | Alert details (English) |
alerts[$i].annotations.solution | string | Solution (Chinese) |
alerts[$i].annotations.solution_en | string | Solution (English) |
alerts[$i].annotations.expr | string | Query expression for the monitoring data |
alerts[$i].annotations.legend_format | string | Legend for the monitoring data |
alerts[$i].annotations.thresholds | string | Thresholds for the monitoring data |
alerts[$i].annotations.unit | string | Unit of the monitoring data |
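When integrating alert messages through a webhook endpoint, you only need the contents of the alerts list; the remaining fields are general metadata and can be ignored. For details, see the official data structure.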
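The fields in the table above might be flattened into one record per alert roughly as follows. This is a minimal sketch: the helper names parse_timestamp and normalize_alert are hypothetical, and the code assumes that timestamps are UTC with a Z suffix. The placeholder end time of a firing alert is mapped to None.

```python
from datetime import datetime, timezone

# Value of endsAt while an alert is still firing (see the table above).
ZERO_TIME = "0001-01-01T00:00:00Z"


def parse_timestamp(value):
    """Convert a timestamp such as 2024-03-12T03:29:20.812821959Z to a datetime.

    Assumes UTC with a trailing "Z". Nanoseconds are truncated to
    microseconds because datetime cannot hold finer precision.
    Returns None for a missing value or the zero-time placeholder.
    """
    if not value or value == ZERO_TIME:
        return None
    text = value[:-1] if value.endswith("Z") else value
    head, _, fraction = text.partition(".")
    microseconds = int((fraction + "000000")[:6])
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def normalize_alert(alert):
    """Pick the commonly used fields of one entry in the alerts list."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        "fingerprint": alert.get("fingerprint"),
        "status": alert.get("status"),
        "severity": labels.get("severity"),
        "category": labels.get("category"),
        "alertname": labels.get("alertname"),
        "summary": annotations.get("summary"),
        "starts_at": parse_timestamp(alert.get("startsAt")),
        "ends_at": parse_timestamp(alert.get("endsAt")),
    }
```

Mapping the placeholder to None keeps downstream code from treating year 1 as a real end time.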
Example:
{
"receiver": "openstack/70869e71fdcd4860a1f5275adf73fb12/webhook-test",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "Etcd磁盘同步持续时间过长",
"category": "platform",
"company": "EasyStack",
"ecms_cluster_id": "OpfyBj54wvGtKqVe",
"endpoint": "metrics",
"group_id": "d6e557c8abe593ee4226930dad94403d",
"host_ip": "10.10.1.4",
"instance": "10.10.1.4:2379",
"job": "etcd",
"namespace": "kube-system",
"node_name": "node-1",
"project": "Nanjing_4_10",
"public_vip": "100.100.4.10",
"rule_id": "5441717e39309f2a5de057e97d408233",
"rule_ns": "openstack",
"rule_res": "eks-managed.rules",
"service": "etcd",
"severity": "warning"
},
"annotations": {
"alertname_en": "Etcd disk fsync duration is too long",
"description": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
"description_en": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
"expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
"legend_format": "\u003cnode_name\u003e fsync duration",
"solution": "请联系您的软件服务提供商,进行问题排查。",
"solution_en": "Please contact your software service provider for problem checking.",
"summary": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为478ms。",
"summary_en": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 478ms.",
"thresholds": "250,yellow,dashed,Too Long",
"unit": "ms"
},
"startsAt": "2024-03-12T03:29:20.812821959Z",
"endsAt": "0001-01-01T00:00:00Z",
"generatorURL": "http://ecms.web.ntih1l7j.easystack.io/graph?g0.expr=histogram_quantile%280.99%2C+rate%28ecms_etcd_disk_wal_fsync_duration_seconds_bucket%5B5m%5D%29%29+%2A+1000+%3E+250\u0026g0.tab=1",
"fingerprint": "629ab783ab3a361e"
},
{
"status": "resolved",
"labels": {
"alertname": "Etcd磁盘同步持续时间过长",
"category": "platform",
"company": "EasyStack",
"ecms_cluster_id": "OpfyBj54wvGtKqVe",
"endpoint": "metrics",
"group_id": "d6e557c8abe593ee4226930dad94403d",
"host_ip": "10.10.1.5",
"instance": "10.10.1.5:2379",
"job": "etcd",
"namespace": "kube-system",
"node_name": "node-2",
"project": "Nanjing_4_10",
"public_vip": "100.100.4.10",
"rule_id": "5441717e39309f2a5de057e97d408233",
"rule_ns": "openstack",
"rule_res": "eks-managed.rules",
"service": "etcd",
"severity": "warning"
},
"annotations": {
"alertname_en": "Etcd disk fsync duration is too long",
"description": "节点 node-2:10.10.1.5 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
"description_en": "node-2:10.10.1.5 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
"expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
"legend_format": "<node_name> fsync duration",
"solution": "请联系您的软件服务提供商,进行问题排查。",
"solution_en": "Please contact your software service provider for problem checking.",
"summary": "节点 node-2:10.10.1.5 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为394ms。",
"summary_en": "node-2:10.10.1.5 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 394ms.",
"thresholds": "250,yellow,dashed,Too Long",
"unit": "ms"
},
"startsAt": "2024-03-12T03:29:20.812821959Z",
"endsAt": "2024-03-12T10:57:20.812821959Z",
"generatorURL": "http://ecms.web.ntih1l7j.easystack.io/graph?g0.expr=histogram_quantile%280.99%2C+rate%28ecms_etcd_disk_wal_fsync_duration_seconds_bucket%5B5m%5D%29%29+%2A+1000+%3E+250\u0026g0.tab=1",
"fingerprint": "8063c8d1127089ad"
}
],
"groupLabels": {
"alertname": "Etcd磁盘同步持续时间过长",
"group_id": "d6e557c8abe593ee4226930dad94403d"
},
"commonLabels": {
"alertname": "Etcd磁盘同步持续时间过长",
"category": "platform",
"company": "EasyStack",
"ecms_cluster_id": "OpfyBj54wvGtKqVe",
"endpoint": "metrics",
"group_id": "d6e557c8abe593ee4226930dad94403d",
"job": "etcd",
"namespace": "kube-system",
"project": "Nanjing_4_10",
"public_vip": "100.100.4.10",
"rule_id": "5441717e39309f2a5de057e97d408233",
"rule_ns": "openstack",
"rule_res": "eks-managed.rules",
"service": "etcd",
"severity": "warning"
},
"commonAnnotations": {
"alertname_en": "Etcd disk fsync duration is too long",
"expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
"legend_format": "<node_name> fsync duration",
"solution": "请联系您的软件服务提供商,进行问题排查。",
"solution_en": "Please contact your software service provider for problem checking.",
"thresholds": "250,yellow,dashed,Too Long",
"unit": "ms"
},
"externalURL": "http://alertmanager-ecms-1:9093",
"version": "4",
"groupKey": "{}/{}/{group_id=\"d6e557c8abe593ee4226930dad94403d\"}:{alertname=\"Etcd磁盘同步持续时间过长\", group_id=\"d6e557c8abe593ee4226930dad94403d\"}",
"truncatedAlerts": 0
}
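Because a webhook subscription leaves persistence to the receiver, a payload such as the example above could be stored with a sketch like the following. It uses Python's built-in sqlite3 module and treats fingerprint as the primary key, so a later resolved notification overwrites the earlier firing row. The table layout and the names open_store and save_alerts are assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical alert store; defaults to in-memory."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "fingerprint TEXT PRIMARY KEY, status TEXT, severity TEXT, "
        "alertname TEXT, starts_at TEXT, ends_at TEXT)"
    )
    return conn


def save_alerts(conn, payload):
    """Upsert every entry of payload["alerts"]; return the number written."""
    rows = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        rows.append((
            alert.get("fingerprint"),
            alert.get("status"),
            labels.get("severity"),
            labels.get("alertname"),
            alert.get("startsAt"),
            alert.get("endsAt"),
        ))
    # The connection as a context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO alerts VALUES (?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keying on fingerprint keeps only the latest state of each alert; a store that needs the full history would add a received-at column to the key instead.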