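The observability service supports querying real-time and historical alert messages at the platform or project level, for scenarios such as fault analysis during incidents and integration with third-party alerting systems.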
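Prerequisites
- Requests are authenticated with a token: supply a Project-scoped Token generated by authenticating as a project user;
- Service address: emla.openstack.svc.cluster.local (the examples use the default root domain openstack.svc.cluster.local).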
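Alert message query
This API corresponds to the real-time and historical alert messages shown on the alert message page. It returns each alert's content, status, severity, category and component, project and department, and group and rule. It can query alerts for the whole platform or for a single project, and supports filtering by category (digital native engine / cloud product / user workload), status (firing / silenced / resolved), severity (critical / warning / info), and time range.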
URI
GET /apis/alerting/v1/projects/<project_id>/alerts
Request parameters
Name | In | Type | Required | Description |
---|---|---|---|---|
project_id | Path | string | Yes | Project id |
all_tenants | Query | boolean | No | Whether to return alert messages for all projects |
categories | Query | string | No | Category (allowed values: platform for digital native engine, cloudproduct for cloud product, userload for user workload) |
states | Query | string | No | Status (allowed values: firing, silenced, resolved) |
severities | Query | string | No | Severity (allowed values: critical, warning, info) |
start | Query | unix_timestamp | No | Start time |
end | Query | unix_timestamp | No | End time |
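- If none of the optional parameters above are specified, all current alert messages are returned by default;
- all_tenants is available only to the cloud administrator's admin project and queries alert messages across the whole platform;
- categories, states, and severities accept multiple comma-separated values; for example, states=firing,silenced queries real-time alert messages;
- start and end specify the time range in which alert messages existed; start must be less than end.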
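The parameter rules above can be sketched as a small URL builder. This is a minimal sketch, not part of the service: the helper name build_alerts_url and the BASE_URL constant are hypothetical, and the http scheme is taken from the request examples. It joins multi-value filters with commas and rejects a start that is not less than end.

```python
from urllib.parse import urlencode

# Service address from the prerequisites; the http scheme follows the request examples.
BASE_URL = "http://emla.openstack.svc.cluster.local"


def build_alerts_url(project_id, all_tenants=None, categories=None,
                     states=None, severities=None, start=None, end=None):
    """Build the GET URL for the alert message query (hypothetical helper)."""
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be less than end")

    params = {}
    if all_tenants is not None:
        params["all_tenants"] = "true" if all_tenants else "false"
    # Multi-value filters are joined with commas, e.g. states=firing,silenced.
    for name, values in (("categories", categories),
                         ("states", states),
                         ("severities", severities)):
        if values:
            params[name] = ",".join(values)
    if start is not None:
        params["start"] = int(start)
    if end is not None:
        params["end"] = int(end)

    url = f"{BASE_URL}/apis/alerting/v1/projects/{project_id}/alerts"
    # safe="," keeps the comma literal instead of percent-encoding it as %2C.
    query = urlencode(params, safe=",")
    return f"{url}?{query}" if query else url
```

Omitting every optional argument yields the bare path, which by the first rule returns all current alert messages.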
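Request examples
Cloud administrator querying the platform's real-time alert messages in the digital native engine and user workload categories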
curl -H 'X-Auth-Token: gAAAAABl7ng2_pQQGur8_EMHlV2rw2pBx_xn7FOXa4BncLHwouEKmC55Aqqavq8puUgjiIoPqp7GFRMz4qP7mhnHqA7VSh3dAp7PXb3dEe3IPgM51b-T2gCczrM4UYkS3qGtXiJBG7M8TE_Ti9qVT6tghF5fb_kQlg' 'http://emla.openstack.svc.cluster.local/apis/alerting/v1/projects/1100e312c9df4567a23806000ebee655/alerts?all_tenants=true&categories=platform,userload&states=firing,silenced'
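Regular user querying the historical alert messages that existed in a project between 2024-03-01 00:00:00 and 2024-03-10 00:00:00 (UTC+8, matching the start and end values in the command)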
curl -H 'X-Auth-Token: gAAAAABl7q6QHukMZINDI4At_LRwXQ7gSTdERzKvcDNmRD7187vXwlGRXCqoSkzTvpkhxbu_r2VNejLky8CWy0e9Wgu8-MqseVEyIf3F9JL2eWIeFiZQdqSQATQ-wo1fd3qEO_kISuJyefoDL5JhEPzfSEF1_4RFwQ' 'http://emla.openstack.svc.cluster.local/apis/alerting/v1/projects/562a6eea71eb40e0be7d79d4d87ce94a/alerts?states=resolved&start=1709222400&end=1710000000'
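The start and end values in the second example are Unix timestamps in seconds. The sketch below shows one way to derive them from wall-clock times; the helper name to_unix_seconds is hypothetical, and the UTC+8 time zone is an assumption chosen because it reproduces the values used in the curl command.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the wall-clock times in the example are UTC+8; this matches
# the start and end values in the curl command.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def to_unix_seconds(year, month, day, hour=0, minute=0, second=0, tz=UTC_PLUS_8):
    """Convert a wall-clock time in the given time zone to a Unix timestamp in seconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    return int(moment.timestamp())


start = to_unix_seconds(2024, 3, 1)
end = to_unix_seconds(2024, 3, 10)
print(start, end)  # 1709222400 1710000000
```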
Response parameters
Name | Type | Description |
---|---|---|
code | int | Status code |
error | string | Error message |
data.statistics | dict | Alert message statistics |
data.statistics.total | int | Total number of messages |
data.statistics.critical | int | Number of critical-severity messages |
data.statistics.warning | int | Number of warning-severity messages |
data.statistics.info | int | Number of info-severity messages |
data.items | list | List of alert messages |
data.items[$i].id | string | Message id |
data.items[$i].alertNameCN | string | Message name (Chinese) |
data.items[$i].alertNameEN | string | Message name (English) |
data.items[$i].status | string | Status |
data.items[$i].severity | string | Severity |
data.items[$i].startsAt | string | Start time |
data.items[$i].updatedAt | string | Update time |
data.items[$i].endsAt | string | End time |
data.items[$i].fingerprint | string | Message fingerprint |
data.items[$i].silencedBy | list | List of silence ids |
data.items[$i].silenceStartsAt | string | Silence start time |
data.items[$i].silenceEndsAt | string | Silence end time |
data.items[$i].silencedByRule | boolean | Whether the alert rule has been silenced |
data.items[$i].domainID | string | Owning department id (present only with cloud administrator permission) |
data.items[$i].domainName | string | Owning department name (present only with cloud administrator permission) |
data.items[$i].projectID | string | Owning project id (present only with cloud administrator permission) |
data.items[$i].projectName | string | Owning project name (present only with cloud administrator permission) |
data.items[$i].group | dict | Owning alert group information |
data.items[$i].group.groupID | string | Group id |
data.items[$i].group.groupName | string | Group name |
data.items[$i].rule | dict | Owning alert rule information |
data.items[$i].rule.ruleID | string | Rule id |
data.items[$i].rule.ruleNameCN | string | Rule name (Chinese) |
data.items[$i].rule.ruleNameEN | string | Rule name (English) |
data.items[$i].category | string | Category |
data.items[$i].component | string | Component |
data.items[$i].labels | dict | Label dictionary |
data.items[$i].annotations | dict | Annotations |
data.items[$i].annotations.summaryCN | string | Alert details (Chinese) |
data.items[$i].annotations.summaryEN | string | Alert details (English) |
data.items[$i].annotations.descriptionCN | string | Alert overview (Chinese) |
data.items[$i].annotations.descriptionEN | string | Alert overview (English) |
data.items[$i].annotations.solutionCN | string | Solution (Chinese) |
data.items[$i].annotations.solutionEN | string | Solution (English) |
data.items[$i].annotations.expr | string | Monitoring data query expression |
data.items[$i].annotations.legendFormat | string | Monitoring data legend |
data.items[$i].annotations.thresholds | string | Monitoring data thresholds |
data.items[$i].annotations.unit | string | Monitoring data unit |
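A client consuming these fields might first check the envelope and then tally data.items by severity to compare against data.statistics. The following is a minimal sketch under stated assumptions: the helper name summarize_alerts is hypothetical, and treating code 200 as success is an assumption based on the response example rather than a documented rule.

```python
from collections import Counter


def summarize_alerts(response):
    """Check the response envelope and tally the items by severity (hypothetical helper)."""
    # Assumption: code 200 means success, as in the response example.
    if response.get("code") != 200:
        raise RuntimeError(
            f"alert query failed: {response.get('code')} {response.get('error')}"
        )

    data = response.get("data") or {}
    items = data.get("items") or []
    counts = Counter(item.get("severity") for item in items)
    return {
        # Statistics as reported by the service.
        "reported": data.get("statistics") or {},
        # Statistics recomputed from the returned items.
        "counted": {
            "total": len(items),
            "critical": counts["critical"],
            "warning": counts["warning"],
            "info": counts["info"],
        },
    }
```

Whether the reported and counted figures always agree is not stated in this document, so a mismatch is better logged than treated as an error.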
Response example
{
"code": 200,
"error": "",
"data": {
"statistics": {
"critical": 0,
"info": 0,
"total": 2,
"warning": 2
},
"items": [
{
"id": "e66f94883c9abfa64e30aea27e44bcd1",
"alertNameCN": "节点内存使用率大于70%",
"alertNameEN": "The memory utilization of a node is greater than 70%",
"status": "firing",
"severity": "warning",
"startsAt": "2024-03-10T12:06:43.791Z",
"endsAt": "0001-01-01T00:00:00.000Z",
"updatedAt": "2024-03-11T02:53:43.904Z",
"fingerprint": "f31ae4758d642870",
"labels": {
"alertname": "节点内存使用率大于70%",
"category": "platform",
"company": "nanjing_3_12",
"group_id": "adfbede9fe42a3c1d3aaab12e78af4be",
"host_ip": "10.10.1.5",
"node_name": "node-2",
"project": "nanjing_3_12",
"public_vip": "100.100.4.10",
"role": "controller_all",
"rule_id": "f0dc35909cecd9d17ee9a127915c2308",
"rule_ns": "openstack",
"rule_resource": "escl.rules",
"severity": "warning",
"state": "disabled"
},
"annotations": {
"descriptionCN": "节点 node-2:10.10.1.5 内存使用率大于70%且小于90%,持续5分钟告警。",
"descriptionEN": "node-2:10.10.1.5 - The memory utilization of this node is greater than 70% and less than 90%, and this situation continues for 5 minutes.",
"solutionCN": "请降低您的云主机业务负载、迁移云主机到其他节点,或计划扩容云环境。",
"solutionEN": "Please lower the workload of your instances, or migrate instances in this node to other nodes, or plan expansion of this cloud platform.",
"summaryCN": "节点 node-2:10.10.1.5 内存使用率大于70%,其中云主机内存使用率为0.00%。",
"summaryEN": "node-2:10.10.1.5 - The memory utilization of this node is greater than 70%, including the memory utilization of instances of this node is 0.00%.",
"expr": "node_instance_memory_utilization * on(host_ip,node_name) group_left(role) ecms_node_role{role=~\"controller_all|compute_osd|compute\"} * on(host_ip,node_name) group_left(state) ecms_node_dpdk_state{state=\"disabled\"} * on(host_ip,node_name) group_left() count by(node_name, host_ip)(((node_memory_MemTotal_bytes{instance=~\".+\"} - node_memory_MemFree_bytes{instance=~\".+\"} - node_memory_Buffers_bytes{instance=~\".+\"} - node_memory_Slab_bytes{instance=~\".+\"} - node_memory_Cached_bytes{instance=~\".+\"}) / node_memory_MemTotal_bytes{instance=~\".+\"} * 100))",
"legendFormat": "<node_name> memory utilization",
"thresholds": "70,yellow,dashed,Warning;90,red,dashed,Critical",
"unit": "%"
},
"group": {
"groupID": "adfbede9fe42a3c1d3aaab12e78af4be",
"groupName": "node.rules"
},
"rule": {
"ruleID": "f0dc35909cecd9d17ee9a127915c2308",
"ruleNameCN": "节点内存使用率大于70%",
"ruleNameEN": "The memory utilization of a node is greater than 70%"
},
"domainID": "default",
"domainName": "Default",
"projectID": "admin",
"projectName": "admin",
"category": "platform",
"component": "ESCL"
},
{
"id": "92dcb7c537c3fb8b5a354726212cd3cd",
"alertNameCN": "Etcd磁盘同步持续时间过长",
"alertNameEN": "Etcd disk fync duration is too long",
"status": "resolved",
"severity": "warning",
"startsAt": "2024-02-15T18:22:00.551Z",
"endsAt": "2024-02-15T18:26:00.551Z",
"updatedAt": "2024-02-15T18:26:00.551Z",
"fingerprint": "e054585f3994c467",
"labels": {
"alertname": "Etcd磁盘同步持续时间过长",
"category": "platform",
"company": "nanjing_3_12",
"endpoint": "metrics",
"group_id": "d6e557c8abe593ee4226930dad94403d",
"host_ip": "10.10.1.4",
"instance": "10.10.1.4:2379",
"job": "etcd",
"namespace": "kube-system",
"node_name": "node-1",
"project": "nanjing_3_12",
"public_vip": "100.100.4.10",
"rule_id": "5441717e39309f2a5de057e97d408233",
"rule_ns": "openstack",
"rule_resource": "eks-managed.rules",
"service": "etcd",
"severity": "warning"
},
"annotations": {
"descriptionCN": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
"descriptionEN": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
"solutionCN": "请联系您的软件服务提供商,进行问题排查。",
"solutionEN": "Please contact your software service provider for problem checking.",
"summaryCN": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为452ms。",
"summaryEN": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 452ms.",
"expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
"legendFormat": "<node_name> fsync duration",
"thresholds": "250,yellow,dashed,Too Long",
"unit": "ms"
},
"group": {
"groupID": "d6e557c8abe593ee4226930dad94403d",
"groupName": "eks-managed.rules"
},
"rule": {
"ruleID": "5441717e39309f2a5de057e97d408233",
"ruleNameCN": "Etcd磁盘同步持续时间过长",
"ruleNameEN": "Etcd disk fync duration is too long"
},
"domainID": "default",
"domainName": "Default",
"projectID": "admin",
"projectName": "admin",
"category": "platform",
"component": "EKS-Managed"
}
]
}
}
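Two details of the example are worth handling explicitly when parsing: the firing alert carries an endsAt of 0001-01-01T00:00:00.000Z, and thresholds is a semicolon-separated string of comma-separated fields. The sketch below is based only on what the example shows. The helper names are hypothetical, reading the 0001-01-01 value as "not ended" is an assumption, and the field names value, color, style, and label are guesses at the meaning of each position.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_time(value):
    """Parse a timestamp such as 2024-03-10T12:06:43.791Z; return None for the zero value."""
    # Assumption: a 0001-01-01 timestamp means "not set", as seen for endsAt on the firing alert.
    if not value or value.startswith("0001-01-01"):
        return None
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def alert_duration_seconds(item):
    """Return the alert duration in seconds, or None if the alert has not ended."""
    starts = parse_time(item.get("startsAt"))
    ends = parse_time(item.get("endsAt"))
    if starts is None or ends is None:
        return None
    return (ends - starts).total_seconds()


def parse_thresholds(text):
    """Split 'value,color,style,label;...' into dicts (layout inferred from the example)."""
    result = []
    for part in filter(None, text.split(";")):
        value, color, style, label = part.split(",", 3)
        result.append(
            {"value": float(value), "color": color, "style": style, "label": label}
        )
    return result
```

Applied to the example, the resolved Etcd alert lasts 240 seconds, and the node memory alert yields two thresholds at 70 and 90.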