http_requests_total{method="GET", status="200"}.
Helm is the package manager for Kubernetes. It can quickly deploy a complete monitoring stack that includes Prometheus, Alertmanager, and Grafana.
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
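The same setting can also be kept in a values file, which is easier to track in version control than a long list of --set flags. The sketch below writes such a file; the file name values.yaml is an arbitrary choice, and the only key it contains is the one already passed on the command line above.

```shell
# Write a Helm values file equivalent to the --set flag used above.
# The file name values.yaml is an assumption made for this sketch.
cat > values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    # Select every ServiceMonitor, not only those labelled by this Helm release.
    serviceMonitorSelectorNilUsesHelmValues: false
EOF

# Usage (needs a cluster and the repo added above):
#   helm install prometheus prometheus-community/kube-prometheus-stack \
#     --namespace monitoring --create-namespace -f values.yaml
```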
serviceMonitorSelectorNilUsesHelmValues=false: lets Prometheus discover all ServiceMonitor resources, not only those managed by the Helm release.
kubectl get pods -n monitoring
You should see the following Pods:
prometheus-prometheus-kube-prometheus-prometheus-*
alertmanager-prometheus-kube-prometheus-alertmanager-*
prometheus-grafana-*
Prometheus:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090
Open http://localhost:9090 in a browser.
Grafana:
kubectl get secret -n monitoring prometheus-grafana -o jsonpath='{.data.admin-password}' | base64 -d
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
Open http://localhost:3000 and log in with the username admin and the password retrieved in the previous step.
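While the Prometheus port-forward from above is running, the deployment can also be checked from the command line through the Prometheus HTTP API. The helper below is a minimal sketch: the function name prom_query and the PROM_URL override are hypothetical, and the default URL assumes the local port 9090 used earlier.

```shell
# Run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL is a hypothetical override; the default matches the port-forward above.
prom_query() {
  curl -fsS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example (requires the port-forward to be running):
#   prom_query 'up'
```

The response is JSON; a "status" field of "success" indicates that the query was evaluated.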
The Prometheus Operator simplifies managing Prometheus on Kubernetes.
kubectl create namespace monitoring
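# Server-side apply avoids the annotation size limit that client-side apply hits on the operator CRDs.
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml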
Create prometheus.yaml:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
Apply the configuration:
kubectl apply -f prometheus.yaml
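The manifest above names a service account called prometheus, but nothing shown so far creates it. The sketch below writes one possible RBAC manifest, modeled on the permissions Prometheus typically needs for Kubernetes service discovery. The file name prometheus-rbac.yaml and the exact rule list are assumptions; tighten them to match your cluster policy.

```shell
# Write a ServiceAccount plus the cluster-wide read access Prometheus needs
# for service discovery. File name and rule list are assumptions for this sketch.
cat > prometheus-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
EOF
```

Apply it with kubectl apply -f prometheus-rbac.yaml so that the Prometheus Pod can list and watch its targets.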
Example ServiceMonitor for monitoring the Kubernetes API server (saved as servicemonitor.yaml):
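apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-apiserver
  namespace: monitoring
spec:
  endpoints:
  - port: https
    scheme: https
    # The API server requires authentication; use the Pod's service account token.
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  # The "kubernetes" Service lives in the default namespace, not in monitoring.
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      component: apiserver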
Apply the configuration:
kubectl apply -f servicemonitor.yaml
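The same pattern applies to your own workloads. The sketch below writes a ServiceMonitor for a hypothetical application: the name example-app, the label app: example-app, the port name web, and the default namespace are all placeholders, not values from this guide.

```shell
# Write a ServiceMonitor for a hypothetical application Service.
# example-app, the port name "web", and the namespace are placeholders.
cat > app-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Look for matching Services in the namespace where the app runs.
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web          # must match a named port on the Service
    path: /metrics
    interval: 30s
EOF
```

The port field refers to the name of a port on the Service, not a number, so the Service must declare a named port for the scrape to work.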
Modify prometheus.yaml to add a persistent volume claim template:
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 50Gi
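Merged into the earlier manifest, the storage block sits directly under spec, next to the existing fields. The sketch below writes the combined prometheus.yaml using only values already shown in this guide. The storage class standard must exist in your cluster; check with kubectl get storageclass.

```shell
# Write the full Prometheus manifest with persistent storage merged in.
# All values come from the fragments above; "standard" must be a real StorageClass.
cat > prometheus.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 50Gi
EOF
```

Re-apply it with kubectl apply -f prometheus.yaml to roll out the change.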
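ServiceMonitor or PodMonitor resources discover monitoring targets automatically.
PrometheusRule resources define alerting conditions.
/metrics endpoint.
Create alertmanager.yaml to define alert routing: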
route:
  receiver: 'slack-notifications'
receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    send_resolved: true
    api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
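Alertmanager only routes alerts; the conditions that fire them come from alerting rules, which the operator reads from PrometheusRule resources. The sketch below writes one illustrative rule based on the http_requests_total metric and its status label. The alert name, the 5% threshold, the 10-minute window, and the file name are assumptions. Note also that the Prometheus resource needs a ruleSelector (for example ruleSelector: {}) for rules to be loaded, which the manifest earlier in this guide does not set.

```shell
# Write an illustrative PrometheusRule. Alert name, threshold and durations
# are assumptions for this sketch, not recommendations from the guide.
cat > prometheus-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
spec:
  groups:
  - name: example.rules
    rules:
    - alert: HighErrorRate
      # Fraction of requests answered with a 5xx status over the last 5 minutes.
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: More than 5% of HTTP requests are failing
EOF
```

Apply it with kubectl apply -f prometheus-rule.yaml. Once the condition holds for 10 minutes, the alert is sent to Alertmanager and follows the route defined above.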
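315 to view a Kubernetes cluster overview).
Deploying Prometheus on Kubernetes with Helm or the Prometheus Operator makes it possible to monitor cluster and application state efficiently. Combined with Alertmanager and Grafana, it forms a complete monitoring and alerting system. Adjust resource settings, storage, and alerting rules to your actual needs to keep the system stable and observable.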