grafana alert state error

Environment

Grafana version: v9.5.3 (916d9793aa)

Problem

1. net/http: request canceled (Client.Timeout exceeded while awaiting headers)

failed to execute query A: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query_range?end=1701050400&query=%28kube_pod_container_status_restarts_total+-+kube_pod_container_status_restarts_total+offset+10m+%3E%3D+1%29+and+ignoring+%28reason%29+min_over_time%28kube_pod_container_status_last_terminated_reason%7Breason%3D%22OOMKilled%22%7D%5B10m%5D%29+%3D%3D+1&start=1701049800&step=1": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

2. context deadline exceeded

failed to execute query A: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query_range?end=1701069960&query=%28sum%28container_memory_working_set_bytes%7Bname%21%3D%22%22%7D%29+BY+%28instance%2C+name%2C+k8scluster%2Cnamespace%2Cpod%2Cnode%29+%2F+sum%28container_spec_memory_limit_bytes+%3E+0%29+BY+%28instance%2C+name%2C+k8scluster%2Cnamespace%2Cpod%2Cnode%29+%2A+100%29+%3E+90&start=1701069900&step=1": context deadline exceeded

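Fix

1. net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Fix reference: Timeout awaiting response header error while querying
Parameters: Provision Grafana
Example: Provisioning example

Put plainly, Grafana timed out while querying Prometheus, so raising the data source timeout in Grafana is enough. Because Grafana is deployed in k8s here, the change goes into the grafana-datasources Secret: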

### Decode the datasources.yaml value with base64 -d, edit it, then base64-encode the result again
# kubectl -n monitoring edit secrets grafana-datasources
    {
        "apiVersion": 1,
        "datasources": [
            {
                "access": "proxy",
                "editable": false,
                "name": "prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus-k8s.monitoring.svc:9090",
                "version": 1,
                # added these 3 lines, plus the "," on the line above
                "jsonData": {
                  "timeout": 60
                }
            }
        ]
    }
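The decode, edit, re-encode round trip described above can be sketched as a few shell helpers. This is only a sketch: the function names are hypothetical, the Secret key is assumed to be datasources.yaml as in the comment above, and the kubectl functions need cluster access, so they are defined here but not called.

```shell
#!/bin/sh
# Hypothetical helper names; they are not part of kubectl or Grafana.

# Decode one base64 value as stored under .data in a Secret.
decode_value() {
    printf '%s' "$1" | base64 -d
}

# Encode a file's contents on a single line, the form .data expects.
encode_file() {
    base64 < "$1" | tr -d '\n'
}

# Fetch the current datasources.yaml into a local file (needs cluster access).
# Assumes the Secret key is "datasources.yaml"; the dot is escaped for jsonpath.
fetch_datasources() {
    kubectl -n monitoring get secret grafana-datasources \
        -o jsonpath='{.data.datasources\.yaml}' | base64 -d > datasources.yaml
}

# Regenerate the Secret from the edited local file, letting kubectl do the encoding.
push_datasources() {
    kubectl -n monitoring create secret generic grafana-datasources \
        --from-file=datasources.yaml --dry-run=client -o yaml |
        kubectl apply -f -
}
```

A possible flow is fetch_datasources, edit datasources.yaml to add the jsonData timeout, then push_datasources. Grafana generally reads data source provisioning files at startup, so the pod probably needs a restart before the new timeout applies.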

2. context deadline exceeded

Fix reference: context deadline exceeded; two different notification channels failed!
Put plainly, the rule evaluation timed out before it finished; the default evaluation timeout is 30s.
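To see which rules are hitting that limit, the scheduler log can be reduced to rule UID and evaluation duration. A minimal sketch, assuming the log lines carry rule_uid= and duration= fields as in the Grafana 9.5.3 output quoted in this post; summarize_timeouts is a hypothetical name.

```shell
#!/bin/sh
# Read Grafana log lines on stdin and print "rule_uid duration" for every
# evaluation that failed with "context deadline exceeded".
summarize_timeouts() {
    grep 'context deadline exceeded' |
        sed -n 's/.*rule_uid=\([^ ]*\).*duration=\([^ ]*\).*/\1 \2/p'
}

# Example (needs cluster access; the pod name is a placeholder):
#   kubectl -n monitoring logs <grafana-pod> | summarize_timeouts
```

Durations clustered just above 30s suggest the evaluation timeout is the limit being hit, rather than the data source timeout.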

### Check the logs; some entries look like the one below, with duration=30.x
# kubectl -n monitoring logs -f grafana-65c5f686fb-zfgzm | grep deadline
logger=ngalert.scheduler rule_uid=qjGWkH37z org_id=1 version=12 fingerprint=269a4b34c4c2f5b2 attempt=0 now=2023-11-27T08:03:00Z t=2023-11-27T08:03:32.617667829Z level=error msg="Failed to evaluate rule" error="failed to execute query A: Get \"http://prometheus-k8s.monitoring.svc:9090/api/v1/query_range?end=1701072180&query=%28kube_pod_container_status_restarts_total+-+kube_pod_container_status_restarts_total+offset+10m+%3E%3D+1%29+and+ignoring+%28reason%29+min_over_time%28kube_pod_container_status_last_terminated_reason%7Breason%3D%22OOMKilled%22%7D%5B10m%5D%29+%3D%3D+1&start=1701071580&step=1\": context deadline exceeded" duration=30.017018796s
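### Added an environment variable (environment variables take precedence over the config file; if one is set, the corresponding config-file setting is overridden)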
# kubectl -n monitoring edit deployments.apps grafana

        - name: GF_UNIFIED_ALERTING_EVALUATION_TIMEOUT
          value: 1m
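The same environment variable can probably also be set without opening an editor, using kubectl set env. A sketch under the assumption that the Deployment is named grafana in the monitoring namespace, as above; set_eval_timeout and the KUBECTL override are hypothetical conveniences, not existing tooling.

```shell
#!/bin/sh
# Set the unified alerting evaluation timeout on the grafana Deployment.
# $1: timeout value such as 1m.
# KUBECTL may be overridden, e.g. KUBECTL="echo kubectl", to print the
# command instead of running it.
set_eval_timeout() {
    ${KUBECTL:-kubectl} -n monitoring set env deployment/grafana \
        "GF_UNIFIED_ALERTING_EVALUATION_TIMEOUT=$1"
}

# Dry run that only prints the command:
#   KUBECTL="echo kubectl" set_eval_timeout 1m
```

Changing the pod template this way triggers a new rollout, so Grafana restarts with the new value.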
