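Note: the steps in this article were verified on OpenShift 4.13.
The components that make up OpenShift monitoring fall into two groups: platform monitoring components and user-defined project monitoring components.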
$ oc new-project app-monitoring
$ oc new-app quay.io/brancz/prometheus-example-app:v0.2.0 -l app=prometheus-example-app
$ cat << EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-example-app
  name: prometheus-example-app
spec:
  ports:
  - port: 8080
    protocol: TCP
    name: 8080-tcp
  selector:
    app: prometheus-example-app
  type: ClusterIP
EOF
$ oc expose svc prometheus-example-app
$ curl -sw "%{http_code}\n" -o /dev/null $(oc get route prometheus-example-app -ojsonpath={.spec.host})
200
$ curl -sw "%{http_code}\n" -o /dev/null $(oc get route prometheus-example-app -ojsonpath={.spec.host})/err
404
$ curl $(oc get route prometheus-example-app -ojsonpath={.spec.host})/metrics
# HELP http_requests_total Count of all HTTP requests
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 1
http_requests_total{code="404",method="get"} 1
# HELP version Version information about this binary
# TYPE version gauge
version{version="v0.2.0"} 1
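The metrics output above is in the Prometheus text exposition format, with one sample per line. As a minimal sketch for checking a single counter from the command line, the helper below reads that output on stdin and prints the `http_requests_total` value for one status code. The function name `requests_for_code` is hypothetical, introduced here only for illustration; it is not part of the example app or of `oc`.

```shell
# Hypothetical helper (not part of the example app or oc):
# reads Prometheus text-format metrics on stdin and prints the value of
# http_requests_total for the status code given as the first argument.
requests_for_code() {
  code="$1"
  awk -v want="code=\"${code}\"" '
    /^#/ { next }   # skip HELP and TYPE comment lines
    # keep only http_requests_total samples that carry the wanted code label
    index($0, "http_requests_total{") == 1 && index($0, want) > 0 { print $NF }
  '
}
```

It could be used as `curl -s $(oc get route prometheus-example-app -ojsonpath={.spec.host})/metrics | requests_for_code 404`.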
Note:
The quay.io/brancz/prometheus-example-app:v0.2.0 container image can also be deployed from the web console. In that case, add the app=prometheus-example-app label and clear the "Secure Route" option.
$ cat << EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
$ oc get pod -n openshift-user-workload-monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-77d547b4dc-fcflk   2/2     Running   0          34h
prometheus-user-workload-0             6/6     Running   0          34h
thanos-ruler-user-workload-0           4/4     Running   0          34h
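The pod listing above is the sign that user workload monitoring has started. As a rough sketch of how that check could be scripted, the helper below reads `oc get pod` table output on stdin and succeeds only when every pod is `Running` with all of its containers ready. The function name `all_pods_ready` is an assumption made for this example.

```shell
# Hypothetical helper: reads `oc get pod` table output on stdin.
# Exit status 0 only if at least one pod is listed and every pod has
# STATUS "Running" and a READY column such as 6/6 (ready == total).
all_pods_ready() {
  awk '
    NR > 1 {                       # skip the header row
      split($2, r, "/")            # READY column, e.g. "6/6"
      if ($3 != "Running" || r[1] != r[2]) bad = 1
      seen = 1
    }
    END { exit (bad || !seen) }
  '
}
```

For example, `oc get pod -n openshift-user-workload-monitoring | all_pods_ready && echo ready`.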
$ cat << EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: app-monitoring
spec:
  endpoints:
  - interval: 30s
    port: 8080-tcp
    path: /metrics
  selector:
    matchLabels:
      app: prometheus-example-app
EOF
$ oc get pod -owide
NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
prometheus-example-app-b744f9c85-bmk7p   1/1     Running   0          6m15s   10.217.0.123   crc-2zx29-master-0   <none>           <none>
$ cat << EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alert
  namespace: app-monitoring
spec:
  groups:
  - name: app-alert
    rules:
    - alert: HttpRequestErrorRateIncrease
      expr: rate(http_requests_total{code="404",job="prometheus-example-app"}[5m]) > 0.3
      labels:
        severity: warning
      annotations:
        summary: Prometheus example app's error rate increase.
        message: Prometheus example app's error rate increase.
EOF
After the rule is created, open the "Alerting rules" page under the "Alerting" menu and select "User" in the "Filter" to see the new alerting rule.
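The rule's expression, `rate(...[5m]) > 0.3`, is true when the 404 counter grows by more than 0.3 per second on average over the last five minutes, roughly 90 extra 404 responses in that window. The sketch below approximates this with two counter samples. It is a simplification: the real PromQL `rate()` also handles counter resets and extrapolates to the window boundaries. The function names `simple_rate` and `would_fire` are hypothetical.

```shell
# Simplified per-second rate between two counter samples.
# Arguments: first value, second value, seconds between the samples.
simple_rate() {
  awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN { printf "%.3f\n", (b - a) / w }'
}

# Exit status 0 when the given rate is above the 0.3 threshold from the rule.
would_fire() {
  awk -v r="$1" 'BEGIN { exit !(r > 0.3) }'
}
```

For instance, a counter that moves from 10 to 160 over 300 seconds gives `simple_rate 10 160 300`, which is 0.500 and is above the threshold.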
$ for i in $(seq 1 10000)
do
  curl -sw "%{http_code}\n" -o /dev/null $(oc get route prometheus-example-app -ojsonpath={.spec.host})/err
  sleep 1
done
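The loop above looks up the route on every iteration and runs for a long time. A bounded variant, sketched below, takes the host as an argument so it is resolved once and reports how many requests returned 404. The function name `send_errors` and its arguments are assumptions for this example, and it assumes the route is plain HTTP, as created by `oc expose svc`.

```shell
# Hypothetical helper: send COUNT requests to http://HOST/err, waiting DELAY
# seconds (default 1) between requests, then print how many returned 404.
send_errors() {
  host="$1"; count="$2"; delay="${3:-1}"
  hits=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # -w prints only the HTTP status code; the response body is discarded
    status=$(curl -s -o /dev/null -w '%{http_code}' "http://${host}/err")
    if [ "$status" = "404" ]; then
      hits=$((hits + 1))
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$hits"
}
```

It could be called as `send_errors "$(oc get route prometheus-example-app -ojsonpath={.spec.host})" 600`, which keeps the error traffic going for about ten minutes.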
$ cat << EOF | oc apply -f -
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: my-grafana
  namespace: my-grafana
spec:
  config:
    security:
      admin_user: admin
      admin_password: my-password
  dataStorage:
    accessModes:
    - ReadWriteOnce
    size: 1Gi
  ingress:
    enabled: true
    tls:
      enabled: true
EOF
$ oc create clusterrolebinding grafana-view --clusterrole=cluster-monitoring-view --serviceaccount=my-grafana:grafana-serviceaccount
$ TOKEN=$(oc create token grafana-serviceaccount -n my-grafana)
$ cat << EOF | oc apply -f -
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
  namespace: my-grafana
spec:
  datasources:
  - basicAuthUser: internal
    access: proxy
    editable: true
    secureJsonData:
      httpHeaderValue1: >-
        Bearer ${TOKEN}
    name: Prometheus
    url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
    jsonData:
      httpHeaderName1: Authorization
      timeInterval: 5s
      tlsSkipVerify: true
    basicAuth: false
    isDefault: true
    version: 1
    type: prometheus
  name: test_name
EOF
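Before relying on the data source in Grafana, it can help to confirm that Thanos Querier accepts the service account token. The sketch below sends a PromQL query to the Prometheus-compatible HTTP API with the same `Authorization: Bearer` header the data source uses. It assumes the `thanos-querier` route in the `openshift-monitoring` namespace is reachable from the workstation. The function name `promql_query` is hypothetical, and `-k` skips TLS verification in the same spirit as `tlsSkipVerify: true` above.

```shell
# Hypothetical helper: run an instant PromQL query against a
# Prometheus-compatible endpoint using a bearer token.
# Arguments: host name, token, PromQL expression.
promql_query() {
  host="$1"; token="$2"; query="$3"
  # -G sends the URL-encoded data as query-string parameters of a GET request
  curl -sk -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=${query}" \
    "https://${host}/api/v1/query"
}
```

A possible call, reusing `TOKEN` from the earlier step: `promql_query "$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')" "$TOKEN" 'rate(http_requests_total{code="404",job="prometheus-example-app"}[5m])'`. A JSON response with `"status":"success"` suggests the token and role binding work.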
rate(http_requests_total{code="404",job="prometheus-example-app"}[5m])
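4. On the Dashboard page, click the Dashboard settings icon at the upper right.
5. Set the Name, then save.
6. The application metrics monitored through the custom dashboard are shown in the figure below.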
Demo video
https://github.com/k-srkw/openshift-monitoring-handson/blob/main/monitoring-handson.md
https://cloud.redhat.com/blog/your-guide-to-openshift-observability-part-1
https://access.redhat.com/solutions/5335491
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.5/html/monitoring/monitoring-your-own-services
https://catalog.workshops.aws/aws-openshift-workshop/en-US/8-observability/2-metrics/5-app-dashboard
https://github.com/brancz/prometheus-example-app
https://developers.redhat.com/articles/2023/08/08/how-monitor-workloads-using-openshift-monitoring-stack#how_to_monitor_a_sample_application
https://shonpaz.medium.com/monitor-your-application-metrics-using-the-openshift-monitoring-stack-862cb4111906
https://github.com/OpenShiftDemos/openshift-ops-workshops/blob/ocp4-dev/workshop/content/monitoring-basics.adoc
https://github.com/pittar/openshift-user-workload-monitoring
https://github.com/alvarolop/quarkus-observability-app/blob/main/README.adoc
https://prometheus.io/docs/prometheus/latest/querying/basics/
https://github.com/alvarolop/quarkus-observability-app