Using the Nightingale (夜莺) Server Monitoring Software (Part 2)

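Table of Contents

  • I. Installing the Collector
    • 1. Introduction to Categraf
    • 2. Deploying Categraf
    • 3. Test Server Deployment
    • 4. System Monitoring Plugins
    • 5. GPU Monitoring Plugin
    • 6. Service Monitoring Plugins
  • II. Monitoring Dashboards
    • 1. Machine List
    • 2. System Monitoring
    • 3. Service Monitoring
  • III. Alert Configuration
    • 1. Email Notifications
    • 2. Alert Rules
    • 3. Alert Self-Healing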


I. Installing the Collector

1. Introduction to Categraf

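Categraf must be deployed on every machine you want to monitor, because collecting metrics such as CPU, memory, and process information requires reading data from the local operating system.
Categraf pushes the collected data to the server using the Prometheus RemoteWrite protocol.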

Grafana dashboard marketplace
Categraf plugin documentation
Categraf deployment documentation
Categraf download page
Example download file: categraf-v0.3.45-linux-amd64.tar.gz

2. Deploying Categraf

Some monitoring plugins are hard to configure when Categraf runs in Docker, so Categraf is deployed here as a binary.

  1. Delete the plugins you do not use
    categraf-v0.3.45-linux-amd64/conf/input.*
  2. Edit the plugin configuration files (*.toml)
  3. Edit the main Categraf configuration, config.toml
[global]
hostname = "machine-label"  # replace with a label that identifies this machine
[[writers]]
url = "http://192.168.6.226:17000/prometheus/v1/write"
[ibex]
enable = true
servers = ["192.168.6.226:20090"]
[heartbeat]
url = "http://192.168.6.226:17000/v1/n9e/heartbeat"
  4. Copy Categraf
    Copy all files and directories inside categraf-v0.3.45-linux-amd64 to /home/monitor/categraf on the target machine
  5. Install and start Categraf
cd /home/monitor/categraf && chmod +x categraf && sudo ./categraf --install && sudo ./categraf --start
  • Other commands
# Install as a service (equivalent to adding a service file + systemctl daemon-reload)
sudo ./categraf --install
# Uninstall the service (equivalent to systemctl stop categraf + deleting the service file)
# If Categraf was installed before, uninstall it first
sudo ./categraf --remove
# Start the categraf service (equivalent to systemctl start categraf)
sudo ./categraf --start
# Stop the categraf service (equivalent to systemctl stop categraf)
sudo ./categraf --stop
# Show the categraf service status (equivalent to systemctl status categraf)
sudo ./categraf --status
# Test mode: print the MySQL metrics that would be collected
sudo ./categraf --test --inputs mysql

3. Test Server Deployment

(Figure: test server deployment)

4. System Monitoring Plugins

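  • cpu plugin: collects the local CPU usage, idle ratio, and similar metrics
    input.cpu/cpu.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15
# Whether to collect metrics for each individual core
collect_per_cpu = false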
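CPU usage is not read directly from the kernel; it is derived from the cumulative time counters in `/proc/stat` sampled twice. The sketch below shows that technique for the aggregate `cpu` line. Treating iowait as idle time is an assumption, and Categraf's exact formula may differ slightly; the helper names are hypothetical.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat.

    Keeps the first 8 counters (user nice system idle iowait irq softirq steal);
    guest/guest_nice are already included in user/nice on Linux, so they are skipped.
    """
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def usage_active(prev: list[int], curr: list[int]) -> float:
    """Percentage of non-idle CPU time between two /proc/stat snapshots."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3] + deltas[4]  # idle + iowait (assumption: iowait counted as idle)
    return 100.0 * (total - idle) / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

In practice you would call `read_cpu_line()` twice, `interval` seconds apart, and pass both parsed snapshots to `usage_active`.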
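  • disk plugin: collects disk space utilization, inode utilization, and similar metrics
    input.disk/disk.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Only collect the specified mount points
# mount_points = ["/"]

# Ignore mount points by file system type
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs", "nsfs", "CDFS"]

# Ignore these mount points
ignore_mount_points = ["/boot", "/var/lib/kubelet/pods"]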
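To see which mount points survive `ignore_fs` and `ignore_mount_points`, the sketch below applies the same two lists to `/proc/mounts`-style lines. It assumes a mount point is skipped when it equals, or lies under, a listed path. That prefix rule is my assumption, not confirmed Categraf behavior, and `kept_mounts` is a hypothetical helper.

```python
# Values copied from disk.toml above
IGNORE_FS = {"tmpfs", "devtmpfs", "devfs", "iso9660", "overlay",
             "aufs", "squashfs", "nsfs", "CDFS"}
IGNORE_MOUNT_POINTS = ["/boot", "/var/lib/kubelet/pods"]


def is_ignored_mount(mount: str) -> bool:
    # Assumption: ignored if equal to, or nested under, a listed path
    return any(mount == p or mount.startswith(p + "/") for p in IGNORE_MOUNT_POINTS)


def kept_mounts(mounts_text: str) -> list[str]:
    """Filter /proc/mounts-style lines: 'device mountpoint fstype options dump pass'."""
    kept = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = fields[1], fields[2]
        if fstype in IGNORE_FS or is_ignored_mount(mount):
            continue
        kept.append(mount)
    return kept


SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /boot ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
overlay /var/lib/docker/overlay2/abc/merged overlay rw 0 0
/dev/sdb1 /data xfs rw,relatime 0 0
/dev/sdc1 /var/lib/kubelet/pods/123/volumes ext4 rw 0 0
"""
```

With the sample above only `/` and `/data` remain. Feeding it the real `open("/proc/mounts").read()` gives a rough preview of which `path` labels the disk metrics will carry.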
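  • diskio plugin: collects disk read/write I/O metrics
    input.diskio/diskio.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Only collect the specified devices
# devices = ["sda", "sdb", "vd*"]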
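  • kernel plugin: collects OS boot time, the number of context switches, and similar metrics
    input.kernel/kernel.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15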
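  • mem plugin: collects memory utilization and similar metrics
    input.mem/mem.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Whether to collect platform-specific metrics
collect_platform_fields = true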
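  • net plugin: collects NIC traffic (bytes), packet counts, and similar metrics
    input.net/net.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Whether to collect protocol statistics on Linux
# collect_protocol_stats = false

# Only collect the specified NICs
# interfaces = ["eth0"]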
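  • netstat plugin: counts connections by state, e.g. how many are in time_wait and how many are established
    input.netstat/netstat.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

disable_summary_stats = false

# With many network connections, collecting per-connection stats consumes significant system resources
disable_connection_stats = true

tcp_ext = false
ip_ext = false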
  • ntp plugin: monitors the machine's clock offset
    input.ntp/ntp.toml
# Collection interval (seconds)
interval = 15

# NTP servers
ntp_servers = ["ntp.aliyun.com"]

# Response timeout
timeout = 5
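The clock offset reported by an NTP query comes from four timestamps exchanged with the server. The sketch below shows the standard NTP on-wire formulas for offset and round-trip delay. The sign convention shown (positive means the local clock is behind) is the textbook one and may not match how Categraf signs `ntp_offset_ms`.

```python
def ntp_offset_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Classic NTP on-wire calculation.

    t0: client send time (client clock)   t1: server receive time (server clock)
    t2: server send time (server clock)   t3: client receive time (client clock)
    Returns (offset, round_trip_delay); a positive offset means the local clock is behind.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay
```

With times in milliseconds, a client 50 ms behind the server, 10 ms network latency each way, and 1 ms of server processing gives `ntp_offset_delay(0, 60, 61, 21) == (50.0, 20)`.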
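  • processes plugin: counts how many processes are running, how many are sleeping, and the total
    input.processes/processes.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Force collection via the ps command
# force_ps = false

# Force collection via /proc
# force_proc = false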
  • system plugin: collects system load information
    input.system/system.toml (the default configuration works as-is)
# Collection interval (seconds)
interval = 15

# Whether to collect system_n_users
# collect_user_number = false

5. GPU Monitoring Plugin

  • nvidia_smi plugin: monitors NVIDIA GPUs
    input.nvidia_smi/nvidia_smi.toml
# Collection interval (seconds)
interval = 15

# Local command to execute
nvidia_smi_command = "nvidia-smi"

# Run `nvidia-smi --help-query-gpu` to list the available fields
# `AUTO` detects the fields to query automatically
query_field_names = "AUTO"
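The plugin is built on `nvidia-smi`'s query interface. To check a GPU host by hand, the sketch below runs `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and turns its output into 0..1 ratios. The field list is my choice for illustration, and `parse_gpus`/`query_gpus` are hypothetical helpers, not part of Categraf.

```python
import csv
import io
import shutil
import subprocess

# Fields to query; all are listed by `nvidia-smi --help-query-gpu`
FIELDS = "index,utilization.gpu,memory.used,memory.total"


def parse_gpus(output: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output into per-GPU ratios (0..1)."""
    gpus = []
    for row in csv.reader(io.StringIO(output), skipinitialspace=True):
        if not row:
            continue
        index, util, used, total = row
        gpus.append({
            "index": int(index),
            "util_ratio": float(util) / 100,          # percent -> 0..1
            "mem_ratio": float(used) / float(total),  # memory.used / memory.total
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi locally; returns [] when it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpus(out)
```

If `query_gpus()` returns an empty list on a GPU machine, the nvidia_smi plugin will not find the command either, so check `nvidia_smi_command` first.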

6. Service Monitoring Plugins

  • docker plugin: monitors Docker containers
    input.docker/docker.toml
# Collection interval (seconds)
interval = 15

[[instances]]
# interval = global.interval * interval_times
interval_times = 1

## Docker Endpoint
endpoint = "unix:///var/run/docker.sock"

# Containers to include/exclude (by name)
container_name_include = []
container_name_exclude = []

gather_services = false
gather_extend_memstats = false

container_id_label_enable = true
container_id_label_short_style = false

timeout = "5s"

perdevice_include = []

total_include = ["cpu", "blkio", "network"]

docker_label_include = []
docker_label_exclude = ["annotation*", "io.kubernetes*", "*description*", "*maintainer*", "*hash", "*author*", "*org_*", "*date*", "*url*", "*docker_compose*"]
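  • mtail (log) plugin: extracts content from logs and converts it into monitoring metrics
    input.mtail/mtail.toml
# Collection interval (seconds)
interval = 15

[[instances]]
progs = "/home/monitor/categraf/conf/input.mtail/prog1" # directory containing the log-parsing rules (*.mtail)
logs = ["/home/logs/example/all.log"] # log files to parse
labels = { log="6.221-example-log" } # extra labels attached to the metrics
override_timezone = "Asia/Shanghai" # time zone used when parsing timestamps
emit_metric_timestamp = "true" # emit timestamps with the metrics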

input.mtail/prog1/rule_error.mtail

gauge error_num
/ERROR.*/ {
      error_num++
}

input.mtail/prog1/rule_info.mtail

gauge info_num
/INFO.*/ {
      info_num++
}

input.mtail/prog1/rule_login.mtail

gauge login_num
/登录账户.*/ {
      login_num++
}
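Each of the three mtail programs above increments its own variable whenever a log line matches its pattern, and every program sees every line. The Python sketch below does the same counting so you can preview the numbers on a sample log. It assumes, as I understand mtail's behavior, that `/regex/` conditions are unanchored (match anywhere in the line). The trailing `.*` in the original patterns therefore makes no difference. `count_matches` is a hypothetical helper.

```python
import re

# One pattern per .mtail program above; like mtail, every program sees every line
RULES = {
    "error_num": re.compile(r"ERROR"),   # /ERROR.*/ matches the same lines
    "info_num": re.compile(r"INFO"),     # /INFO.*/
    "login_num": re.compile(r"登录账户"),  # log text meaning "login account"
}


def count_matches(lines: list[str]) -> dict[str, int]:
    """Count, per rule, how many lines match (a line may count for several rules)."""
    counts = dict.fromkeys(RULES, 0)
    for line in lines:
        for name, pattern in RULES.items():
            if pattern.search(line):  # unanchored search, like an mtail /regex/ condition
                counts[name] += 1
    return counts
```

Note that an `INFO` line containing `登录账户` increments both `info_num` and `login_num`, because the two rules live in separate programs.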
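  • mysql plugin: connects to a MySQL instance, runs a set of SQL queries, parses the output, and reports it as monitoring data
    input.mysql/mysql.toml
# Collection interval (seconds)
interval = 15

# Define instances; each [[instances]] block corresponds to one MySQL instance
[[instances]]
address = "192.168.6.200:3306"
username = "root"
password = "123456"

# Extra connection parameters, e.g. whether to use TLS
parameters = "tls=false"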
  • nginx plugin: monitors nginx status; it depends on nginx's http_stub_status_module
    input.nginx/nginx.toml
# Collection interval (seconds)
interval = 15

[[instances]]
# URL of the Nginx stub_status page
urls = ["http://192.168.6.223:8080/nginx_status"]

response_timeout = "5s"

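The nginx service must have http_stub_status_module enabled. Add the following to nginx.conf (a location block must sit inside a server block):

http {
    server {
        listen 8080;
        location /nginx_status {
            stub_status on;
            access_log off;
            allow 192.168.6.226;    # allow access from this IP
            deny all;               # deny all other IPs
        }
    }
}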

Then open http://192.168.6.223:8080/nginx_status to check the output:
(Figure: nginx_status page output)
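The stub_status page is a small plain-text report, and the nginx plugin turns it into metrics. The sketch below parses that format so you can check by hand what the numbers mean. The `dropped` field (accepts minus handled) is a derived value I added for illustration. `fetch_stub_status` only works from a host permitted by the `allow` rule, and both helpers are hypothetical.

```python
import re
from urllib.request import urlopen

# Typical stub_status output
SAMPLE = """Active connections: 2
server accepts handled requests
 10 10 25
Reading: 0 Writing: 1 Waiting: 1
"""


def parse_stub_status(text: str) -> dict[str, int]:
    active = re.search(r"Active connections:\s*(\d+)", text)
    counters = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and counters and states):
        raise ValueError("not a stub_status page")
    accepts, handled, requests = map(int, counters.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts, "handled": handled, "requests": requests,
        "dropped": accepts - handled,  # accepted but not handled (resource limits)
        "reading": reading, "writing": writing, "waiting": waiting,
    }


def fetch_stub_status(url: str = "http://192.168.6.223:8080/nginx_status") -> dict[str, int]:
    # Only succeeds from an IP allowed by the `allow` rule in nginx.conf
    with urlopen(url, timeout=5) as resp:
        return parse_stub_status(resp.read().decode())
```

A non-zero `dropped` value usually means nginx hit a resource limit such as `worker_connections`.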

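  • redis plugin: connects to Redis, runs the INFO command, parses the result, and reports it as monitoring data
    input.redis/redis.toml
# Collection interval (seconds)
interval = 15

# Define instances; each [[instances]] block corresponds to one Redis instance
[[instances]]
address = "192.168.6.223:6379"
username = ""
password = ""
pool_size = 2

# Whether to collect the slowlog
gather_slowlog = true

# Maximum number of slowlog entries to collect
slowlog_max_len = 100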

II. Monitoring Dashboards

1. Machine List

  • Dashboard JSON
{
    "name": "Machine List",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "table",
                "id": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
                "layout": {
                    "h": 11,
                    "i": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
                    "isResizable": true,
                    "w": 24,
                    "x": 0,
                    "y": 5
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "avg(system_uptime{ident=~\"$ident\"}) by (ident)",
                        "refId": "A",
                        "legend": "Uptime"
                    },
                    {
                        "expr": "avg(cpu_usage_active{cpu=\"cpu-total\", ident=~\"$ident\"}) by (ident)",
                        "legend": "CPU Usage",
                        "refId": "B"
                    },
                    {
                        "expr": "avg(mem_used_percent{ident=~\"$ident\"}) by (ident)",
                        "legend": "Memory Usage",
                        "refId": "C"
                    },
                    {
                        "expr": "avg(mem_total{ident=~\"$ident\"}) by (ident)",
                        "legend": "Total Memory",
                        "refId": "D"
                    },
                    {
                        "expr": "avg(disk_used_percent{ident=~\"$ident\",path=\"/\"}) by (ident)",
                        "legend": "Disk Usage",
                        "refId": "E"
                    },
                    {
                        "expr": "avg(disk_total{ident=~\"$ident\",path=\"/\"}) by (ident)",
                        "refId": "F",
                        "legend": "Total Disk"
                    },
                    {
                        "expr": "avg(rate(net_bytes_recv{ident=~\"$ident\"}[1m])) by(ident)",
                        "refId": "G",
                        "legend": "Network In"
                    },
                    {
                        "expr": "avg(rate(net_bytes_sent{ident=~\"$ident\"}[1m])) by(ident)",
                        "refId": "H",
                        "legend": "Network Out"
                    },
                    {
                        "expr": "avg(nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}) by (ident)",
                        "refId": "I",
                        "legend": "GPU Usage"
                    },
                    {
                        "expr": "avg(nvidia_smi_memory_used_bytes/nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
                        "refId": "J",
                        "legend": "GPU Memory Usage"
                    },
                    {
                        "expr": "avg(nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
                        "refId": "K",
                        "legend": "Total GPU Memory"
                    },
                    {
                        "expr": "avg(ntp_offset_ms{ident=~\"$ident\"}) by (ident)",
                        "refId": "L",
                        "legend": "NTP Offset (ms)"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {
                            "renameByName": {
                                "ident": "Machine"
                            }
                        }
                    }
                ],
                "name": "Machine List",
                "maxPerRow": 4,
                "custom": {
                    "showHeader": true,
                    "colorMode": "background",
                    "calc": "lastNotNull",
                    "displayMode": "labelValuesToRows",
                    "aggrDimension": "ident",
                    "sortColumn": "ident",
                    "sortOrder": "ascend",
                    "linkMode": "cellLink"
                },
                "options": {
                    "standardOptions": {}
                },
                "overrides": [
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "A"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "humantimeSeconds"
                            }
                        }
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "B"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percent",
                                "decimals": 1
                            },
                            "valueMappings": []
                        }
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "C"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percent",
                                "decimals": 1
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "D"
                        },
                        "properties": {
                            "standardOptions": {
                                "decimals": 1,
                                "util": "bytesIEC"
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "E"
                        },
                        "properties": {
                            "standardOptions": {
                                "decimals": 1,
                                "util": "percent"
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "F"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesIEC",
                                "decimals": 0
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "G"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesSecIEC",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "H"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesSecIEC",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "I"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percentUnit",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "J"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percentUnit",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "K"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesIEC",
                                "decimals": 1
                            }
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard preview
    (Figure: Machine List dashboard)
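Every query in the machine list table is wrapped in `avg(...) by (ident)` so that each machine collapses to one row. The sketch below mimics that PromQL aggregation on instant values: group series by the `ident` label, drop the other labels, and average within each group. `avg_by` is a hypothetical helper for illustration.

```python
from collections import defaultdict

Series = tuple[dict[str, str], float]  # (label set, latest value)


def avg_by(series: list[Series], label: str) -> dict[str, float]:
    """Mimic PromQL `avg(...) by (label)` on instant values: group, then average."""
    groups: dict[str, list[float]] = defaultdict(list)
    for labels, value in series:
        groups[labels.get(label, "")].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

This also shows a subtlety. For a metric with one series per mount point, such as `disk_total`, averaging two mounts of 100 and 300 yields 200, the mean mount size, whereas `sum` would give the 400 total and a `path` filter would give a single mount.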

2. System Monitoring

  • Dashboard JSON
{
    "name": "System Monitoring",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "timeseries",
                "id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 0,
                    "y": 0,
                    "i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "cpu_usage_active{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-usage"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "CPU Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percent",
                        "min": 0,
                        "max": 101,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "239aacdf-1982-428b-b240-57f4ce7f946d",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 8,
                    "y": 0,
                    "i": "239aacdf-1982-428b-b240-57f4ce7f946d",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mem_used_percent{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-usage"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Memory Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percent",
                        "min": 0,
                        "max": 101,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "decimals": null,
                                "min": null,
                                "max": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 16,
                    "y": 0,
                    "i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "rate(diskio_read_bytes{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-{{name}}-read",
                        "refId": "A"
                    },
                    {
                        "expr": "rate(diskio_write_bytes{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-{{name}}-write",
                        "refId": "B"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Disk I/O",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "bytesSecIEC",
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 0,
                    "y": 7,
                    "i": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "rate(net_bytes_recv{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-in",
                        "refId": "A"
                    },
                    {
                        "expr": "rate(net_bytes_sent{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-out",
                        "refId": "B"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Network Traffic",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "bytesSecIEC",
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "6be9a2be-1d4c-488d-b695-aa1d82df3a3c",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 8,
                    "y": 7,
                    "i": "6be9a2be-1d4c-488d-b695-aa1d82df3a3c",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}",
                        "legend": "{{ident}}-usage",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "GPU Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percentUnit",
                        "min": 0,
                        "max": 1.01,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "7873f825-1e41-45e9-a1ee-792a87fd4351",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 16,
                    "y": 7,
                    "i": "7873f825-1e41-45e9-a1ee-792a87fd4351",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nvidia_smi_memory_used_bytes/nvidia_smi_memory_total_bytes{ident=~\"$ident\"}",
                        "legend": "{{ident}}-usage",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "GPU Memory Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percentUnit",
                        "min": 0,
                        "max": 1.01,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard result
    [Figure 3: screenshot of the resulting dashboard]
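The GPU memory usage panel divides two instant vectors. As a rough illustration of how PromQL pairs them by default (one-to-one matching on all labels except `__name__`, with unmatched samples dropped), here is a minimal Python sketch. It ignores `on()`/`ignoring()`, group modifiers and division-by-zero handling, and the host names and the `gpu` label in the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]


def match_key(labels: Labels) -> FrozenSet[Tuple[str, str]]:
    """Label set used for matching: every label except the metric name."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def divide(lhs: List[Sample], rhs: List[Sample]) -> List[Sample]:
    """Emulate `lhs / rhs` with PromQL's default one-to-one matching.

    Samples without a partner on the other side are dropped, and the result
    keeps the matched labels without __name__. Assumes rhs values are
    non-zero (a GPU's total memory never is).
    """
    rhs_by_key = {match_key(labels): value for labels, value in rhs}
    result: List[Sample] = []
    for labels, value in lhs:
        key = match_key(labels)
        if key in rhs_by_key:
            result.append((dict(key), value / rhs_by_key[key]))
    return result


# Hypothetical samples: two hosts report used memory, but only gpu-01
# is left on the total-memory side after the ident filter.
used = [
    ({"__name__": "nvidia_smi_memory_used_bytes", "ident": "gpu-01", "gpu": "0"}, 4e9),
    ({"__name__": "nvidia_smi_memory_used_bytes", "ident": "gpu-02", "gpu": "0"}, 2e9),
]
total = [
    ({"__name__": "nvidia_smi_memory_total_bytes", "ident": "gpu-01", "gpu": "0"}, 16e9),
]
print(divide(used, total))  # one series (gpu-01) with value 0.25
```

The result is a fraction between 0 and 1, which is presumably why the panel uses the `percentUnit` unit with a maximum just above 1 rather than 100.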

3. Service monitoring

  • Dashboard JSON
{
    "name": "Service monitoring",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "timeseries",
                "id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 0,
                    "i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mysql_global_status_threads_connected{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-current connections"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "MySQL connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 0,
                    "i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mysql_global_status_slow_queries{ident=~\"$ident\"}",
                        "legend": "{{ident}}-slow queries",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "MySQL slow queries (cumulative)",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "3ca8db64-b25e-4e72-8dac-187cec4886ae",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 0,
                    "i": "7174939f-2742-47bd-a023-5d1d3698bf76",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mtail_login_num{ident=~\"$ident\"}",
                        "legend": "{{ident}}-logins",
                        "refId": "A",
                        "time": {
                            "start": "now-24h",
                            "end": "now"
                        }
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Login log count",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "093b192e-e991-4590-ab4b-aa768159e00f",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 6,
                    "i": "a18a3bd3-8c2b-4fa2-81f3-7b0d00b49cc9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "redis_connected_clients{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-current connections"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Redis connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0.01,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "2674442f-937f-4027-806b-10b2286b14f6",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 6,
                    "i": "c8c061df-894d-458e-a89d-86a8428c52c9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "redis_used_memory{ident=~\"$ident\"}",
                        "legend": "{{ident}}-memory",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Redis used memory",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "d26e8bc3-16a0-4a60-9aa9-36d71b85abc5",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 6,
                    "i": "0a3310ea-74ca-48fa-8c18-52c1b0f71235",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mtail_error_num{ident=~\"$ident\"}",
                        "legend": "{{ident}}-errors",
                        "refId": "A",
                        "time": {
                            "start": "now-24h",
                            "end": "now"
                        }
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Error log count",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "7fa2cdbe-b782-4b71-bd7e-2cdba7455e77",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 12,
                    "i": "9a2e4d49-7a4f-4627-b2f6-cbe0e4ab04b1",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nginx_active{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-active connections"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Nginx active connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "0cb01432-ea29-41f4-8e6f-e6b9b71e90ab",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 12,
                    "i": "8bf97e38-e840-4804-a686-28bb65fec78d",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "docker_n_containers_running{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-running containers"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Docker running containers",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "936b934b-6340-4743-8c12-821c63210fd6",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 12,
                    "i": "c6da1998-c1e3-4486-a24c-58e26d349206",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "docker_container_mem_usage{ident=~\"$ident\"}",
                        "legend": "{{ident}}-{{container_name}}-memory",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Docker memory usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard result
    [Figure 4: screenshot of the resulting service monitoring dashboard]
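Each panel's `layout` block places it on a grid: `x`/`y` give the top-left cell and `w`/`h` the size. Assuming the grid is 24 columns wide (consistent with the `w: 8` panels at `x` = 0, 8 and 16 above, three per row), a small Python check like the following can catch overflowing or overlapping panels before an edited JSON is imported. The 24-column width is an assumption, not something taken from Nightingale's documentation, and `check_layout` is a hypothetical helper.

```python
import json
from itertools import combinations
from typing import List, Tuple

GRID_COLUMNS = 24  # assumption: width of Nightingale's dashboard grid


def check_layout(dashboard_json: str) -> List[str]:
    """Return human-readable problems found in the panels' layout blocks."""
    dashboard = json.loads(dashboard_json)
    problems: List[str] = []
    boxes: List[Tuple[str, int, int, int, int]] = []
    for panel in dashboard["configs"]["panels"]:
        lay = panel["layout"]
        x, y, w, h = lay["x"], lay["y"], lay["w"], lay["h"]
        # A panel must start inside the grid and end at or before its right edge.
        if x < 0 or x + w > GRID_COLUMNS:
            problems.append(f"{panel['name']}: x={x}, w={w} exceeds {GRID_COLUMNS} columns")
        boxes.append((panel["name"], x, y, w, h))
    for (n1, x1, y1, w1, h1), (n2, x2, y2, w2, h2) in combinations(boxes, 2):
        # Two rectangles overlap when they intersect on both axes.
        if x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1:
            problems.append(f"{n1} overlaps {n2}")
    return problems
```

For example, `check_layout(open("service-dashboard.json", encoding="utf-8").read())` should return an empty list for the service dashboard above (the file name is hypothetical): its nine panels form a clean 3 × 3 grid of 8 × 6 cells.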

III. Alert configuration

1. Email notification

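  • Configure SMTP
    [Figure 5: SMTP settings]
  • Configure each user's email address
    [Figure 6: user email settings]
  • Configure the email notification template
    [Figure 7: email notification template settings]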
<!DOCTYPE html>
	<html lang="en">
	<head>
		<meta charset="UTF-8">
		<meta http-equiv="X-UA-Compatible" content="ie=edge">
		<title>Nightingale alert notification</title>
		<style type="text/css">
			.wrapper {
				background-color: #f8f8f8;
				padding: 15px;
				height: 100%;
			}
			.main {
				width: 600px;
				padding: 30px;
				margin: 0 auto;
				background-color: #fff;
				font-size: 12px;
				font-family: verdana,'Microsoft YaHei',Consolas,'Deja Vu Sans Mono','Bitstream Vera Sans Mono';
			}
			header {
				border-radius: 2px 2px 0 0;
			}
			header .title {
				font-size: 14px;
				color: #333333;
				margin: 0;
			}
			header .sub-desc {
				color: #333;
				font-size: 14px;
				margin-top: 6px;
				margin-bottom: 0;
			}
			hr {
				margin: 20px 0;
				height: 0;
				border: none;
				border-top: 1px solid #e5e5e5;
			}
			em {
				font-weight: 600;
			}
			table {
				margin: 20px 0;
				width: 100%;
			}
	
			table tbody tr{
				font-weight: 200;
				font-size: 12px;
				color: #666;
				height: 32px;
			}
			.succ {
				background-color: green;
				color: #fff;
			}
			.fail {
				background-color: red;
				color: #fff;
			}
			.succ th, .succ td, .fail th, .fail td {
				color: #fff;
			}
			table tbody tr th {
				width: 80px;
				text-align: right;
			}
			.text-right {
				text-align: right;
			}
			.body {
				margin-top: 24px;
			}
			.body-text {
				color: #666666;
				-webkit-font-smoothing: antialiased;
			}
			.body-extra {
				-webkit-font-smoothing: antialiased;
			}
			.body-extra.text-right a {
				text-decoration: none;
				color: #333;
			}
			.body-extra.text-right a:hover {
				color: #666;
			}
			.button {
				width: 200px;
				height: 50px;
				margin-top: 20px;
				text-align: center;
				border-radius: 2px;
				background: #2D77EE;
				line-height: 50px;
				font-size: 20px;
				color: #FFFFFF;
				cursor: pointer;
			}
			.button:hover {
				background: rgb(25, 115, 255);
				border-color: rgb(25, 115, 255);
				color: #fff;
			}
			footer {
				margin-top: 10px;
				text-align: right;
			}
			.footer-logo {
				text-align: right;
			}
			.footer-logo-image {
				width: 108px;
				height: 27px;
				margin-right: 10px;
			}
			.copyright {
				margin-top: 10px;
				font-size: 12px;
				text-align: right;
				color: #999;
				-webkit-font-smoothing: antialiased;
			}
		</style>
	</head>
	<body>
	<div class="wrapper">
		<div class="main">
			<header>
				<h3 class="title">{{.RuleName}}</h3>
				<p class="sub-desc"></p>
			</header>
			<hr>
			<div class="body">
				<table cellspacing="0" cellpadding="0" border="0">
					<tbody>
					{{if .IsRecovered}}
					<tr class="succ">
						<th>Severity/status:</th>
						<td>S{{.Severity}} Recovered</td>
					</tr>
					{{else}}
					<tr class="fail">
						<th>Severity/status:</th>
						<td>S{{.Severity}} Triggered</td>
					</tr>
					{{end}}
	
					{{if not .IsRecovered}}
					<tr>
						<th>Trigger value:</th>
						<td>{{.TriggerValue}}</td>
					</tr>
					{{end}}
	
					{{if .TargetIdent}}
					<tr>
						<th>Target:</th>
						<td>{{.TargetIdent}}</td>
					</tr>
					{{end}}
					<tr>
						<th>Metric labels:</th>
						<td>{{.TagsJSON}}</td>
					</tr>

					{{/* seconds elapsed since the first trigger, measured when the mail is rendered */}}
					{{$time_duration := sub now.Unix .FirstTriggerTime }}
					{{if .IsRecovered}}
					<tr>
						<th>Duration:</th>
						<td>{{humanizeDurationInterface $time_duration}}</td>
					</tr>
					<tr>
						<th>Recovered at:</th>
						<td>{{timeformat .LastEvalTime}}</td>
					</tr>
					{{else}}
					<tr>
						<th>Triggered at:</th>
						<td>
							{{timeformat .TriggerTime}}
						</td>
					</tr>
					{{end}}
					</tbody>
				</table>
			</div>
		</div>
	</div>
	</body>
	</html>
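Nightingale sends these mails through the SMTP server configured above. If notifications do not arrive, it can help to rule out the mail server itself by sending a test message outside Nightingale. Below is a minimal sketch using only Python's standard library; the host, port 465 (implicit TLS), account and password are hypothetical placeholders to replace with your own values, and a server that expects STARTTLS on port 587 would need `smtplib.SMTP` plus `starttls()` instead of `SMTP_SSL`.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical placeholders: use the values entered in Nightingale's SMTP settings.
SMTP_HOST = "smtp.example.com"
SMTP_PORT = 465  # assumption: implicit TLS
SMTP_USER = "alert@example.com"
SMTP_PASS = "change-me"


def build_test_mail(sender: str, recipient: str, html_body: str) -> EmailMessage:
    """Build a message with a plain-text fallback and an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "Nightingale SMTP test"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("This mail client cannot display HTML.")  # text/plain part
    msg.add_alternative(html_body, subtype="html")             # text/html part
    return msg


def send_test_mail(msg: EmailMessage) -> None:
    """Log in over implicit TLS and send the message."""
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, timeout=10) as server:
        server.login(SMTP_USER, SMTP_PASS)
        server.send_message(msg)
```

Calling `send_test_mail(build_test_mail(SMTP_USER, "ops@example.com", "<h3>SMTP test</h3>"))` (recipient address hypothetical) either delivers the mail or raises an `smtplib` exception that usually points at the misconfigured setting. Passing a static copy of the template above as `html_body` gives a rough layout preview; the `{{...}}` placeholders will show up literally because no template rendering happens here.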

2. Alert rules

  • CPU usage above 90%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "CPU usage above 90%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "cpu_usage_active > 90",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
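The rule above is evaluated every `prom_eval_interval` = 15 s and only fires once the condition has held for `prom_for_duration` = 60 s. Assuming Nightingale follows the same pending-then-firing semantics as a Prometheus `for` clause, the timing can be simulated as below; `first_firing_time` and its inputs are illustrative only.

```python
from typing import List, Optional


def first_firing_time(values: List[float], threshold: float,
                      eval_interval: int, for_duration: int) -> Optional[int]:
    """Return the evaluation time (s) at which the alert starts firing, or None.

    values[i] is the metric value seen at evaluation time i * eval_interval.
    """
    pending_since: Optional[int] = None
    for i, value in enumerate(values):
        now = i * eval_interval
        if value > threshold:
            if pending_since is None:
                pending_since = now      # condition first true: alert is pending
            if now - pending_since >= for_duration:
                return now               # held long enough: alert fires
        else:
            pending_since = None         # any false evaluation resets the clock
    return None
```

With 15 s evaluations, five consecutive readings above 90 (at 0, 15, 30, 45 and 60 s) are needed before the alert fires, and a single reading at or below 90 restarts the count. This is what filters out short CPU spikes.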
  • MySQL: more than 10 slow queries within 1 minute
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "More than 10 MySQL slow queries within 1 minute",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 120,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "increase(mysql_global_status_slow_queries[1m]) > 10",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
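The expression increase(mysql_global_status_slow_queries[1m]) measures how much the monotonically increasing slow-query counter grew over the last minute. The sketch below shows the core idea on raw samples, including the counter-reset correction (a drop is treated as a restart from zero). It deliberately omits the window-edge extrapolation that Prometheus applies, so a real increase() result can come out somewhat larger than this simplified figure; the function name is hypothetical.

```python
from typing import List, Tuple


def simple_increase(samples: List[Tuple[float, float]]) -> float:
    """Counter growth across (timestamp, value) samples inside one window.

    Simplified version of PromQL increase(): no extrapolation to the window edges.
    """
    if len(samples) < 2:
        return 0.0  # need at least two samples to see any change
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value >= prev:
            total += value - prev
        else:
            total += value  # counter reset (e.g. mysqld restart): count from zero
        prev = value
    return total


# Slow-query counter sampled every 15 s within one 1-minute window.
print(simple_increase([(0, 100), (15, 103), (30, 108), (45, 115)]))  # 15.0, above 10
# A restart between 15 s and 30 s resets the counter.
print(simple_increase([(0, 100), (15, 104), (30, 2), (45, 6)]))  # 10.0
```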
  • MySQL connection usage above 80%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "MySQL connection usage above 80%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 120,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
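This query divides mysql_global_status_threads_connected by mysql_global_variables_max_connections per instance. PromQL's "/" only pairs series whose label sets match, which is why both sides are first reduced with avg by (instance). A minimal sketch of the same check on plain dictionaries; the instance names and values are made-up examples (151 is MySQL's default max_connections), and the function name is hypothetical.

```python
from typing import Dict, List


def connection_alerts(threads_connected: Dict[str, float],
                      max_connections: Dict[str, float],
                      threshold_pct: float = 80.0) -> List[str]:
    """Return instances whose connection usage exceeds threshold_pct.

    Like PromQL's one-to-one matching, instances present on only one side are dropped.
    """
    firing = []
    for instance in sorted(threads_connected.keys() & max_connections.keys()):
        limit = max_connections[instance]
        if limit <= 0:
            continue  # skip a zero limit (PromQL itself would yield +Inf here)
        usage_pct = threads_connected[instance] / limit * 100
        if usage_pct > threshold_pct:
            firing.append(instance)
    return firing


connected = {"db1:3306": 130, "db2:3306": 40, "db3:3306": 90}
limits = {"db1:3306": 151, "db2:3306": 151}  # db3 has no max_connections sample
print(connection_alerts(connected, limits))  # ['db1:3306']
```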
  • Memory usage above 85%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "Memory usage above 85%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "mem_used_percent > 85",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
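All rules here set notify_repeat_step to 60 and notify_max_number to 3. Assuming the repeat step is in minutes, that the first notification counts toward the maximum, and that 0 would mean "no limit", a memory alert that keeps firing is re-sent hourly and at most three times. A hypothetical sketch of that schedule under those assumptions:

```python
from typing import List


def notification_times(fire_at_min: int, still_firing_until_min: int,
                       repeat_step_min: int = 60, max_number: int = 3) -> List[int]:
    """Minutes (relative clock) at which notifications would go out for one alert.

    Assumptions: repeat_step is in minutes, the first notification counts toward
    max_number, and max_number == 0 means "no limit".
    """
    times = []
    t = fire_at_min
    while t <= still_firing_until_min:
        if max_number and len(times) >= max_number:
            break  # limit reached: stay silent until recovery
        times.append(t)
        t += repeat_step_min
    return times


# Memory stays above 85% for 5 hours after the alert fires at minute 0.
print(notification_times(0, 300))  # [0, 60, 120]
```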
  • Disk usage above 80%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "Disk usage above 80%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "disk_used_percent > 80",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 30,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "0",
      "1",
      "2",
      "3",
      "4",
      "5",
      "6"
    ],
    "enable_days_of_weeks": [
      [
        "0",
        "1",
        "2",
        "3",
        "4",
        "5",
        "6"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
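Every rule is enabled from 00:00 to 23:59 on days "0" through "6"; the disk rule lists the days in a different order but covers the same set, so all rules are active around the clock. The sketch below checks such a window. It assumes "0" means Sunday (the numbering Go's time.Weekday uses) and minute-precision time comparison; it is an illustration rather than n9e's code, the function name is hypothetical, and it does not handle windows that cross midnight.

```python
from datetime import datetime
from typing import Iterable


def rule_enabled(now: datetime, stime: str, etime: str,
                 days_of_week: Iterable[str]) -> bool:
    """Check an enable window like enable_stime/enable_etime/enable_days_of_week.

    Days use "0" = Sunday ... "6" = Saturday. Times are compared at minute
    precision ("HH:MM"), so "23:59" covers the whole last minute of the day.
    Windows that cross midnight (stime > etime) are not handled in this sketch.
    """
    weekday = str(now.isoweekday() % 7)  # isoweekday: Monday=1 ... Sunday=7 -> Sunday=0
    if weekday not in set(days_of_week):
        return False
    hhmm = now.strftime("%H:%M")
    return stime <= hhmm <= etime  # zero-padded "HH:MM" strings compare correctly


all_week = ["0", "1", "2", "3", "4", "5", "6"]
# 2024-01-07 is a Sunday.
print(rule_enabled(datetime(2024, 1, 7, 23, 59, 30), "00:00", "23:59", all_week))  # True
print(rule_enabled(datetime(2024, 1, 7, 12, 0), "09:00", "18:00", ["1", "2", "3", "4", "5"]))  # False
```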
  • Network inbound traffic above 6 MiB/s
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "Network inbound traffic above 6 MiB/s",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "rate(net_bytes_recv[1m]) / 1024 / 1024 > 6",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
  • Network outbound traffic above 6 MiB/s
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "Network outbound traffic above 6 MiB/s",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "rate(net_bytes_sent[1m]) / 1024 / 1024 > 6",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
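Both network rules take rate() of the byte counters net_bytes_recv and net_bytes_sent (average growth per second) and divide by 1024 twice, so the threshold 6 means 6 MiB/s, i.e. 6 × 1024 × 1024 = 6,291,456 bytes/s (roughly 50 Mbit/s). A minimal sketch of that unit conversion on two counter samples, ignoring counter resets and PromQL's extrapolation; the function name is hypothetical.

```python
def mib_per_second(bytes_start: float, bytes_end: float, seconds: float) -> float:
    """Average throughput between two byte-counter samples, in MiB/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_end - bytes_start) / seconds / 1024 / 1024


THRESHOLD_MIB_S = 6
# 420 MiB received in 60 s -> 7 MiB/s, which exceeds the 6 MiB/s threshold.
rate = mib_per_second(0, 420 * 1024 * 1024, 60)
print(rate, rate > THRESHOLD_MIB_S)  # 7.0 True
```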

3. Alert Self-Healing

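  • Self-healing configuration
    [Figure 8: self-healing configuration]
  • Testing alert self-healing
    Alert Self-Healing > Self-Healing Scripts > Create
    [Figure 9: creating a self-healing script]
    Alert Self-Healing > Self-Healing Scripts > test (the script just created) > Create Task > Save and Execute Immediately > Execution History > click the task under the title
    [Figure 10: task execution details]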
