Prometheus Operator 配置PrometheusRule告警规则-FinClip官网

Prometheus Operator 配置PrometheusRule告警规则

网友投稿 1253 2022-11-30

Prometheus Operator 配置PrometheusRule告警规则

PrometheusRule

用于配置 Prometheus 的 Rule 规则文件，包括 recording rules 和 alerting，可以自动被 Prometheus 加载。

配置 PrometheusRule

现在我们知道怎么自定义一个 ServiceMonitor 对象了，但是如果需要自定义一个报警规则的话呢？我们去查看 Prometheus Dashboard 的 Alert 页面下面就已经有很多报警规则了，这一系列的规则其实都来自于项目 GitHub - kubernetes-monitoring/kubernetes-mixin: A set of Grafana dashboards and Prometheus alerts for Kubernetes.，我们都通过 Prometheus Operator 安装配置上了。

但是这些报警信息是哪里来的呢？他们应该用怎样的方式通知我们呢？我们知道之前我们使用自定义的方式可以在 Prometheus 的配置文件之中指定 AlertManager 实例和报警的 rules 文件，现在我们通过 Operator 部署的呢？我们可以在 Prometheus Dashboard 的 Config 页面下面查看关于 AlertManager 的配置：

alerting: alert_relabel_configs: - separator: ; regex: prometheus_replica replacement: ☸ ➜1 action: labeldrop alertmanagers: - follow_redirects: true enable_true scheme: path_prefix: / timeout: 10s api_version: v2 relabel_configs: - source_labels: [__meta_kubernetes_service_name] separator: ; regex: alertmanager-main replacement: ☸ ➜1 action: keep - source_labels: [__meta_kubernetes_endpoint_port_name] separator: ; regex: web replacement: ☸ ➜1 action: keep kubernetes_sd_configs: - role: endpoints kubeconfig_file: "" follow_redirects: true enable_true namespaces: own_namespace: false names: - monitoringrule_files: - /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml

上面 alertmanagers 的配置我们可以看到是通过 role 为 endpoints 的 kubernetes 的自动发现机制获取的，匹配的是服务名为 alertmanager-main，端口名为 web 的 Service 服务，我们可以查看下 alertmanager-main 这个 Service：

☸ ➜ kubectl describe svc alertmanager-main -n monitoringName: alertmanager-mainNamespace: monitoringLabels: app.kubernetes.io/component=alert-router app.kubernetes.io/instance=main app.kubernetes.io/name=alertmanager app.kubernetes.io/part-of=kube-prometheus app.kubernetes.io/version=0.24.0Annotations: Selector: app.kubernetes.io/component=alert-router,app.kubernetes.io/instance=main,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=kube-prometheusType: NodePortIP Family Policy: SingleStackIP Families: IPv4IP: 10.109.67.21IPs: 10.109.67.21Port: web 9093/TCPTargetPort: web/TCPNodePort: web 32033/TCPEndpoints: 10.244.1.193:9093,10.244.2.208:9093,10.244.2.210:9093Port: reloader-web 8080/TCPTargetPort: reloader-web/TCPNodePort: reloader-web 30181/TCPEndpoints: 10.244.1.193:8080,10.244.2.208:8080,10.244.2.210:8080Session Affinity: ClientIPExternal Traffic Policy: ClusterEvents:

可以看到服务名正是 alertmanager-main，Port 定义的名称也是 web，符合上面的规则，所以 Prometheus 和 AlertManager 组件就正确关联上了。

而对应的报警规则文件位于：/etc/prometheus/rules/prometheus-k8s-rulefiles-0/ 目录下面所有的 YAML 文件。我们可以进入 Prometheus 的 Pod 中验证下该目录下面是否有 YAML 文件：

☸ ➜ kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoringkubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh/prometheus ☸ ➜ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-alertmanager-main-rules-79543974-2f8e-4c5f-9d23-2c349c38ff1d.yamlmonitoring-grafana-rules-8fc5e057-099e-4546-b6bd-d8fb1107c24d.yamlmonitoring-kube-prometheus-rules-79b18777-2df4-4e43-84a8-193053400842.yamlmonitoring-kube-state-metrics-rules-8341740e-f2b7-48e9-82c2-bd6b979f1da2.yamlmonitoring-kubernetes-monitoring-rules-4b169784-b211-4449-922f-52fb2efd839c.yamlmonitoring-node-exporter-rules-b5f0f4d3-aa18-4e7d-836f-ef0a8fda7569.yamlmonitoring-prometheus-k8s-prometheus-rules-9560ae4f-764c-4ba4-9a37-2fedb56773c7.yamlmonitoring-prometheus-operator-rules-7d3a1645-efe3-4214-b825-c77c39ceb0d4.yaml/prometheus ☸ ➜ cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-kube-prometheus-rules-79b18777-2df4-4e43-84a8-193053400842.yamlgroups:- name: general.rules rules: - alert: TargetDown annotations: description: '{{ printf "%.4g" ☸ ➜value }}% of the {{ ☸ ➜labels.job }}/{{ ☸ ➜labels.service }} targets in {{ ☸ ➜labels.namespace }} namespace are down.' runbook_url: summary: One or more targets are unreachable. expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10 for: 10m labels: severity: warning......

这个 YAML 文件实际上就是我们之前创建的一个 PrometheusRule 文件包含的内容：

☸ ➜ cat kubePrometheus-prometheusRule.yamlapiVersion: monitoring.coreos.com/v1kind: PrometheusRulemetadata: labels: app.kubernetes.io/component: exporter app.kubernetes.io/name: kube-prometheus app.kubernetes.io/part-of: kube-prometheus prometheus: k8s role: alert-rules name: kube-prometheus-rules namespace: monitoringspec: groups: - name: general.rules rules: - alert: TargetDown annotations: description: '{{ printf "%.4g" ☸ ➜value }}% of the {{ ☸ ➜labels.job }}/{{ ☸ ➜labels.service }} targets in {{ ☸ ➜labels.namespace }} namespace are down.' runbook_url: summary: One or more targets are unreachable. expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10 for: 10m labels: severity: warning......

我们这里的 PrometheusRule 的 name 为 kube-prometheus-rules，namespace 为 monitoring，我们可以猜想到我们创建一个 PrometheusRule 资源对象后，会自动在上面的 prometheus-k8s-rulefiles-0 目录下面生成一个对应的 --.yaml 文件，所以如果以后我们需要自定义一个报警选项的话，只需要定义一个 PrometheusRule 资源对象即可。

至于为什么 Prometheus 能够识别这个 PrometheusRule 资源对象呢？这就需要查看我们创建的 prometheus 这个资源对象了，里面有非常重要的一个属性 ruleSelector，用来匹配 rule 规则的过滤器，我们这里没有过滤，所以可以匹配所有的，假设要求匹配具有 prometheus=k8s 和 role=alert-rules 标签的 PrometheusRule 资源对象，则可以添加下面的配置：

ruleSelector: matchLabels: prometheus: k8s role: alert-rules

所以我们要想自定义一个报警规则，只需要创建一个能够被 prometheus 对象匹配的 PrometheusRule 对象即可，比如现在我们添加一个 etcd 是否可用的报警，我们知道 etcd 整个集群有一半以上的节点可用的话集群就是可用的，所以我们判断如果不可用的 etcd 数量超过了一半那么就触发报警，创建文件 prometheus-etcdRules.yaml：

apiVersion: monitoring.coreos.com/v1kind: PrometheusRulemetadata: labels: prometheus: k8s role: alert-rules name: etcd-rules namespace: monitoringspec: groups: - name: etcd rules: - alert: EtcdClusterUnavailable annotations: summary: etcd cluster small description: If one more etcd peer goes down the cluster will be unavailable expr: | count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1) for: 3m labels: severity: critical

创建完成后，隔一会儿再去容器中查看下 rules 文件夹：

☸ ➜ kubectl apply -f created☸ ➜ kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoringDefaulting container name to prometheus.Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod./prometheus ☸ ➜ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-etcd-rules.yaml monitoring-prometheus-k8s-rules.yaml

可以看到我们创建的 rule 文件已经被注入到了对应的 rulefiles 文件夹下面了，证明我们上面的设想是正确的。然后再去 Prometheus Dashboard 的 Alert 页面下面就可以查看到上面我们新建的报警规则了：

社区app软件开发如何提升用户互动与品牌忠诚度

1253 2022-11-30

Prometheus Operator 配置PrometheusRule告警规则

智慧屏安装 app如何提升家庭娱乐与教育体验的关键工具

社区app软件开发如何提升用户互动与品牌忠诚度

devops 信创在数字经济时代提升企业竞争力的关键策略

最近发表

更多内容

小程序SDK

Finclip技术文档

小程序开发

小程序容器

小程序框架

Finclip小程序平台

Finclip用户投稿

车联网

推荐文章

小程序SDK是什么意思？小程序sdk和插件有什么区别？

小程序支付功能怎么实现？

企业app开发流程是什么？

app运营模式有哪些？

小程序多端引流怎么做？

小程序生态分析的机会和威胁

Flutter入门这一篇效率文章就够了

原生与跨平台解决方案分析,跨平台软件开发技术方案

热更新技术：让软件更新变得更加轻松快速

解决方案

银行解决方案

证券解决方案

互联网解决方案

政企OA解决方案

科技解决方案

loT解决方案

信任解决方案

热评文章

AppCan:基于混合模式的移动应用开发,移动混合模

Hybrid App混合模式开发的了解

小程序容器技术助力券商数字营销突围，小程序容器化的意

用mpvue开发微信小程序基础知识（vue.js开发

小程序多端框架全面测评对比，强烈推荐！

券商app架构 - 解析券商应用程序的构建与设计