Alert Message Integration

Last updated: 2024-08-20 19:36:26

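The observability service lets third-party alerting systems consume the platform's alert messages. Two integration methods are currently available: the alert message API and webhook-type alert subscriptions. Calling the alert message API returns real-time and historical messages with relatively rich information, while a webhook-type alert subscription directly receives the raw messages sent by Alertmanager. For message persistence, the alert message API is recommended. For message forwarding, a webhook-type alert subscription is recommended.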

Alert Message API

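The alert message API can query all real-time and historical message data shown on the Alert Messages page, including message content, status, severity, details, category and component, project and department, and group and rule. It also supports filtering by category (digital native engine / cloud product / user workload), status (alerting / silenced / resolved), severity (critical / warning / info), and time. For usage, see API Reference - Alert Messages.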

This method is suited to querying and storing messages.

Webhook-Type Alert Subscription

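You need to create a webhook-type alert subscription (User Guide - Alert Management - Alert Subscriptions - Create Subscription - Webhook Type) and associate it with the required alert groups (User Guide - Alert Management - Alert Subscriptions - Associate Groups). This lets you decide which groups' rules produce the alert messages you receive, so you can focus on critical failures and limit message volume.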

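Once configured, Alertmanager pushes messages for alerts in the alerting state directly to the webhook endpoint. Alerts in the silenced or resolved state are not pushed. If the subscription is configured to send recovery notifications, one message is pushed when an alert recovers. Process and persist the messages yourself if needed.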

This method is suited to notifying on and forwarding messages.
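
Because pushed messages are not stored for you, a receiver that needs history has to persist them itself. The following is a minimal sketch of one way to do that with Python's built-in sqlite3 module, assuming one row per alert keyed by its fingerprint. The table layout and the function names open_store and save_alerts are illustrative assumptions, not part of the platform.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database and create the (hypothetical) alerts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        " fingerprint TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " alertname TEXT,"
        " severity TEXT,"
        " starts_at TEXT,"
        " ends_at TEXT)"
    )
    return conn


def save_alerts(conn, payload):
    """Insert or overwrite one row per alert found in a webhook payload."""
    rows = [
        (
            alert["fingerprint"],
            alert["status"],
            alert.get("labels", {}).get("alertname"),
            alert.get("labels", {}).get("severity"),
            alert.get("startsAt"),
            alert.get("endsAt"),
        )
        for alert in payload.get("alerts", [])
    ]
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO alerts VALUES (?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keying on the fingerprint means a later recovery notification overwrites the earlier alerting row for the same alert. If every occurrence has to be kept, a composite key of fingerprint and start time would be one alternative.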

Webhook message push format

Alertmanager sends an HTTP POST request in JSON format to the configured webhook endpoint.
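
To illustrate the receiving side, here is a minimal sketch of a webhook endpoint built on Python's standard http.server module. It accepts the POST request, decodes the JSON body, and answers 200, or 400 if the body is not valid JSON. The handler name, the in-memory RECEIVED list, and the make_server helper are assumptions made for the example. A production endpoint would add authentication, TLS, and real processing.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory buffer used only for illustration; replace with real processing.
RECEIVED = []


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts JSON POST requests such as those sent by Alertmanager."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            # Covers both malformed JSON and undecodable bytes.
            self.send_response(400)
            self.end_headers()
            return
        RECEIVED.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), AlertWebhookHandler)
```

Calling make_server(port=8080).serve_forever() would start the endpoint on a port of your choice. The address configured in the webhook subscription then has to point at that host and port.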

Name                                    Type    Description
receiver                                string  Name of the receiver the notification is sent to
status                                  string  firing if at least one alert is firing, otherwise resolved
groupLabels                             dict    Labels the alerts are grouped by
commonLabels                            dict    Labels common to all alerts
commonAnnotations                       dict    Annotations common to all alerts
externalURL                             string  Internal link
alerts                                  list    List of alert messages (the main content)
alerts[$i].status                       string  Alert status
alerts[$i].fingerprint                  string  Alert identifier
alerts[$i].startsAt                     string  Start time
alerts[$i].endsAt                       string  End time (meaningful only when the alert is resolved; while firing, the value is '0001-01-01T00:00:00Z')
alerts[$i].labels                       dict    Alert labels
alerts[$i].labels.alertname             string  Alert name (Chinese)
alerts[$i].labels.severity              string  Alert severity
alerts[$i].labels.category              string  Alert category
alerts[$i].labels.group_id              string  ID of the group the alert belongs to
alerts[$i].labels.rule_id               string  ID of the rule the alert belongs to
alerts[$i].labels.ecms_cluster_id       string  Cluster ID
alerts[$i].labels.company               string  Alert platform: customer name
alerts[$i].labels.project               string  Alert platform: project name
alerts[$i].labels.public_vip            string  Alert platform: external access address
alerts[$i].annotations                  dict    Alert annotations
alerts[$i].annotations.alertname_en     string  Alert name (English)
alerts[$i].annotations.description      string  Alert overview (Chinese)
alerts[$i].annotations.description_en   string  Alert overview (English)
alerts[$i].annotations.summary          string  Alert details (Chinese)
alerts[$i].annotations.summary_en       string  Alert details (English)
alerts[$i].annotations.solution         string  Solution (Chinese)
alerts[$i].annotations.solution_en      string  Solution (English)
alerts[$i].annotations.expr             string  Monitoring data query expression
alerts[$i].annotations.legend_format    string  Monitoring data legend
alerts[$i].annotations.thresholds       string  Monitoring data thresholds
alerts[$i].annotations.unit             string  Monitoring data unit

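When integrating alert messages through a webhook endpoint, you only need to look at the alerts list. The remaining fields are general information and can be ignored. For details, see the official data structure.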
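
As a sketch of what focusing on the alerts list can look like, the function below flattens each entry into a small dictionary using the field names from the table above. The function name summarize_alerts, the chosen subset of fields, and the preference for the English summary over the Chinese one are assumptions made for illustration.

```python
def summarize_alerts(payload):
    """Return one flat dict per entry of payload["alerts"]."""
    summaries = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        summaries.append({
            "fingerprint": alert.get("fingerprint"),
            "status": alert.get("status"),
            "name": labels.get("alertname"),
            "severity": labels.get("severity"),
            "category": labels.get("category"),
            "group_id": labels.get("group_id"),
            "rule_id": labels.get("rule_id"),
            # Prefer the English summary, fall back to the Chinese one.
            "summary": annotations.get("summary_en") or annotations.get("summary"),
        })
    return summaries
```

Using .get throughout keeps the function tolerant of alerts that lack some labels or annotations, at the cost of returning None for those fields.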

Example:

{
    "receiver": "openstack/70869e71fdcd4860a1f5275adf73fb12/webhook-test",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "Etcd磁盘同步持续时间过长",
                "category": "platform",
                "company": "EasyStack",
                "ecms_cluster_id": "OpfyBj54wvGtKqVe",
                "endpoint": "metrics",
                "group_id": "d6e557c8abe593ee4226930dad94403d",
                "host_ip": "10.10.1.4",
                "instance": "10.10.1.4:2379",
                "job": "etcd",
                "namespace": "kube-system",
                "node_name": "node-1",
                "project": "Nanjing_4_10",
                "public_vip": "100.100.4.10",
                "rule_id": "5441717e39309f2a5de057e97d408233",
                "rule_ns": "openstack",
                "rule_res": "eks-managed.rules",
                "service": "etcd",
                "severity": "warning"
            },
            "annotations": {
                "alertname_en": "Etcd disk fsync duration is too long",
                "description": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
                "description_en": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
                "expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
                "legend_format": "\u003cnode_name\u003e fsync duration",
                "solution": "请联系您的软件服务提供商,进行问题排查。",
                "solution_en": "Please contact your software service provider for problem checking.",
                "summary": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为478ms。",
                "summary_en": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 478ms.",
                "thresholds": "250,yellow,dashed,Too Long",
                "unit": "ms"
            },
            "startsAt": "2024-03-12T03:29:20.812821959Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://ecms.web.ntih1l7j.easystack.io/graph?g0.expr=histogram_quantile%280.99%2C+rate%28ecms_etcd_disk_wal_fsync_duration_seconds_bucket%5B5m%5D%29%29+%2A+1000+%3E+250\u0026g0.tab=1",
            "fingerprint": "629ab783ab3a361e"
        },
        {
            "status": "resolved",
            "labels": {
                "alertname": "Etcd磁盘同步持续时间过长",
                "category": "platform",
                "company": "EasyStack",
                "ecms_cluster_id": "OpfyBj54wvGtKqVe",
                "endpoint": "metrics",
                "group_id": "d6e557c8abe593ee4226930dad94403d",
                "host_ip": "10.10.1.5",
                "instance": "10.10.1.5:2379",
                "job": "etcd",
                "namespace": "kube-system",
                "node_name": "node-2",
                "project": "Nanjing_4_10",
                "public_vip": "100.100.4.10",
                "rule_id": "5441717e39309f2a5de057e97d408233",
                "rule_ns": "openstack",
                "rule_res": "eks-managed.rules",
                "service": "etcd",
                "severity": "warning"
            },
            "annotations": {
                "alertname_en": "Etcd disk fsync duration is too long",
                "description": "节点 node-2:10.10.1.5 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
                "description_en": "node-2:10.10.1.5 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
                "expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
                "legend_format": "<node_name> fsync duration",
                "solution": "请联系您的软件服务提供商,进行问题排查。",
                "solution_en": "Please contact your software service provider for problem checking.",
                "summary": "节点 node-2:10.10.1.5 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为394ms。",
                "summary_en": "node-2:10.10.1.5 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 394ms.",
                "thresholds": "250,yellow,dashed,Too Long",
                "unit": "ms"
            },
            "startsAt": "2024-03-12T03:29:20.812821959Z",
            "endsAt": "2024-03-12T10:57:20.812821959Z",
            "generatorURL": "http://ecms.web.ntih1l7j.easystack.io/graph?g0.expr=histogram_quantile%280.99%2C+rate%28ecms_etcd_disk_wal_fsync_duration_seconds_bucket%5B5m%5D%29%29+%2A+1000+%3E+250\u0026g0.tab=1",
            "fingerprint": "8063c8d1127089ad"
        }
    ],
    "groupLabels": {
        "alertname": "Etcd磁盘同步持续时间过长",
        "group_id": "d6e557c8abe593ee4226930dad94403d"
    },
    "commonLabels": {
        "alertname": "Etcd磁盘同步持续时间过长",
        "category": "platform",
        "company": "EasyStack",
        "ecms_cluster_id": "OpfyBj54wvGtKqVe",
        "endpoint": "metrics",
        "group_id": "d6e557c8abe593ee4226930dad94403d",
        "job": "etcd",
        "namespace": "kube-system",
        "project": "Nanjing_4_10",
        "public_vip": "100.100.4.10",
        "rule_id": "5441717e39309f2a5de057e97d408233",
        "rule_ns": "openstack",
        "rule_res": "eks-managed.rules",
        "service": "etcd",
        "severity": "warning"
    },
    "commonAnnotations": {
        "alertname_en": "Etcd disk fsync duration is too long",
        "expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
        "legend_format": "<node_name> fsync duration",
        "solution": "请联系您的软件服务提供商,进行问题排查。",
        "solution_en": "Please contact your software service provider for problem checking.",
        "thresholds": "250,yellow,dashed,Too Long",
        "unit": "ms"
    },
    "externalURL": "http://alertmanager-ecms-1:9093",
    "version": "4",
    "groupKey": "{}/{}/{group_id=\"d6e557c8abe593ee4226930dad94403d\"}:{alertname=\"Etcd磁盘同步持续时间过长\", group_id=\"d6e557c8abe593ee4226930dad94403d\"}",
    "truncatedAlerts": 0
}
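
The timestamps in the example carry nanosecond precision, and a firing alert uses '0001-01-01T00:00:00Z' as a placeholder end time. The sketch below shows one way to handle both in Python. It assumes timestamps always use the UTC 'Z' form seen in the example and truncates the fraction to microseconds. The names parse_alert_time and alert_duration are hypothetical.

```python
import re
from datetime import datetime, timezone

# Placeholder end time used while an alert is still firing.
ZERO_TIME = "0001-01-01T00:00:00Z"
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_alert_time(value):
    """Parse a UTC timestamp; return None for the zero-value placeholder."""
    if value == ZERO_TIME:
        return None
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base, fraction = match.groups()
    # datetime stores microseconds, so keep only the first six digits.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def alert_duration(alert):
    """Return endsAt - startsAt, or None while the alert has no end time."""
    start = parse_alert_time(alert["startsAt"])
    end = parse_alert_time(alert["endsAt"])
    if start is None or end is None:
        return None
    return end - start
```

Applied to the two alerts in the example, the firing alert yields None and the resolved alert yields a duration of 7 hours 28 minutes.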