1. Approach
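Prometheus has no dedicated exporter for monitoring an etcd cluster (etcd exposes its metrics itself). The Operator, developed by CoreOS, is an application-specific controller that extends the Kubernetes API to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems, and it can be used to monitor etcd.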
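Monitoring a Kubernetes cluster with a hand-rolled setup works, but it has shortcomings, for example the high availability of Prometheus and Alertmanager themselves. We could build that ourselves as well, but Prometheus already supports Kubernetes natively in its code and can monitor the cluster automatically through service discovery, so we can use a more advanced way to deploy Prometheus: the Operator framework.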
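Installation steps:
Step 1: Install Prometheus Operator.
Step 2: Create a ServiceMonitor object so that Prometheus adds the scrape target.
Step 3: Associate the ServiceMonitor with a Service object that exposes the metrics endpoint.
Step 4: Make sure the Service can actually serve the metrics data.
2. Installing Prometheus Operator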
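Operator is an application-specific controller developed by CoreOS to extend the Kubernetes API; it creates, configures, and manages complex stateful applications such as databases, caches, and monitoring systems. An Operator is built on top of Kubernetes' resource and controller concepts, but it also carries application-specific expertise: an Operator that creates a database, for example, must understand every aspect of operating that database. The key to building an Operator is the design of its CRDs (Custom Resource Definitions). A CRD is an extension of the Kubernetes API. Every resource in Kubernetes is a collection of API objects; the spec sections we define in YAML files, for instance, are definitions of Kubernetes resource objects, and custom resources can be operated with kubectl just like the built-in ones. An Operator codifies an operator's knowledge of running a piece of software and uses Kubernetes' powerful abstractions to manage that software at scale. CoreOS currently provides several official Operator implementations, including today's subject, Prometheus Operator. At its core, an Operator is built on the following two Kubernetes concepts:
Resource: the definition of an object's state.
Controller: observes, analyzes, and acts to reconcile the resource toward that state.
If needed, we can of course implement an Operator of our own. Next, let's look in detail at how to use Prometheus-Operator.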
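Introduction
First, let's look at the architecture diagram officially provided by Prometheus-Operator. In it, the Operator is the central component: as a controller, it creates four CRD resource types (Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule) and then continuously watches and maintains the state of these objects.
The prometheus resource acts as the Prometheus Server. A ServiceMonitor is an abstraction over exporters (as covered earlier, an exporter is a tool dedicated to exposing a metrics endpoint); Prometheus pulls data from the metrics endpoints that the ServiceMonitors describe. The alertmanager resource is the abstraction for Alertmanager, and a PrometheusRule holds the alerting-rule files used by Prometheus instances.
As a result, deciding what to monitor in the cluster becomes a matter of operating Kubernetes resource objects directly, which is much more convenient. In the diagram, both Service and ServiceMonitor are Kubernetes resources: a ServiceMonitor matches a class of Services through a labelSelector, and Prometheus in turn matches multiple ServiceMonitors through a labelSelector.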
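A minimal Python sketch of this two-level matching, assuming only matchLabels semantics (every listed label must be present with the same value, so an empty selector matches everything) and ignoring namespaces. The object names and labels are illustrative stand-ins modelled on manifests later in this article, not read from a live cluster.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present in labels.
    An empty selector ({}) therefore matches every object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select(match_labels: dict, objects: list) -> list:
    """Names of the objects whose labels satisfy the selector."""
    return [obj["name"] for obj in objects if matches(match_labels, obj["labels"])]


# Illustrative objects (hypothetical, modelled on this article's manifests).
service_monitors = [
    {"name": "kube-scheduler",
     "labels": {"app.kubernetes.io/name": "kube-scheduler"},
     "selector": {"app.kubernetes.io/name": "kube-scheduler"}},
    {"name": "etcd-k8s",
     "labels": {"k8s-app": "etcd-k8s"},
     "selector": {"k8s-app": "etcd"}},
]
services = [
    {"name": "kube-scheduler", "labels": {"app.kubernetes.io/name": "kube-scheduler"}},
    {"name": "etcd-k8s", "labels": {"k8s-app": "etcd"}},
    {"name": "kube-dns", "labels": {"k8s-app": "kube-dns"}},
]

# Level 1: the Prometheus object selects ServiceMonitors.
# kube-prometheus ships serviceMonitorSelector: {}, which selects all of them.
prometheus_selector = {}
chosen = [sm for sm in service_monitors if matches(prometheus_selector, sm["labels"])]

# Level 2: each selected ServiceMonitor selects Services.
for sm in chosen:
    print(sm["name"], "->", select(sm["selector"], services))
```

Narrowing serviceMonitorSelector to a specific label would make Prometheus ignore every ServiceMonitor that lacks it, which is a common reason a new ServiceMonitor "does nothing".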
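Installation
Here we install directly from the Prometheus-Operator source. Helm could do a one-click install instead, but installing from source shows more of the implementation details. First, clone the source: GitHub - prometheus-operator/prometheus-operator (Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes).
Mind the version: my Kubernetes is 1.21, so I chose release-0.9.
$ git clone -b release-0.9 https://github.com/coreos/kube-prometheus.git
$ cd kube-prometheus/manifests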
$ ls
00namespace-namespace.yaml node-exporter-clusterRole.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml node-exporter-daemonset.yaml
......
In newer versions, the resources formerly under prometheus-operator/contrib/kube-prometheus have been moved to a separate git repository: GitHub - prometheus-operator/kube-prometheus (Use Prometheus to monitor Kubernetes and applications running on Kubernetes).
Note: in older versions, go into the manifests directory, which holds all of the resource manifests. By default the ServiceMonitor in prometheus-serviceMonitorKubelet.yaml scrapes node data from port 10250 of the kubelet. If the metrics have been moved to the read-only port 10255, simply change https-metrics to http-metrics in that file. These port names are defined in Prometheus-Operator's code that syncs the node endpoints:
Subsets: []v1.EndpointSubset{
    {
        Ports: []v1.EndpointPort{
            {Name: "https-metrics", Port: 10250},
            {Name: "http-metrics", Port: 10255},
            {Name: "cadvisor", Port: 4194},
        },
    },
},
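Deployment
[root@master prometheus-operator]# kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 514d v1.21.1
slave01 Ready <none> 513d v1.21.1
slave02 Ready <none> 513d v1.21.1
unzip kube-prometheus-release-0.9.zip
cd kube-prometheus-release-0.9/
Note: deploy manifests/setup first. It creates the CRDs and the Operator, and the remaining manifests contain custom resources that cannot be created until those CRDs exist, so skipping this step makes kubectl report errors.
[root@master kube-prometheus-release-0.8]# kubectl create -f manifests/setup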
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
[root@master kube-prometheus-release-0.8]# kubectl get pod -A -owide -n monitoring
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monitoring prometheus-operator-7775c66ccf-mwtx6 2/2 Running 0 54s 172.7.1.36 slave01 <none> <none>
[root@master kube-prometheus-release-0.8]# kubectl create -f manifests/
alertmanager.monitoring.coreos.com/main created
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21, unavailable in v1.25; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
servicemonitor.monitoring.coreos.com/prometheus-operator created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
After deployment, a namespace named monitoring has been created and all of the resource objects are deployed in it. In addition, 8 CRD resource types are registered:
[root@master kube-prometheus-release-0.8]# kubectl get crd |grep coreos
alertmanagerconfigs.monitoring.coreos.com 2024-06-28T17:03:27Z
alertmanagers.monitoring.coreos.com 2024-06-28T17:03:27Z
podmonitors.monitoring.coreos.com 2024-06-28T17:03:27Z
probes.monitoring.coreos.com 2024-06-28T17:03:27Z
prometheuses.monitoring.coreos.com 2024-06-28T17:03:27Z
prometheusrules.monitoring.coreos.com 2024-06-28T17:03:27Z
servicemonitors.monitoring.coreos.com 2024-06-28T17:03:28Z
thanosrulers.monitoring.coreos.com 2024-06-28T17:03:28Z
List all Pods in the monitoring namespace. alertmanager and prometheus are managed by StatefulSets, and there is also the core prometheus-operator Pod, which controls the other resource objects and watches them for changes:
[root@master kube-prometheus-release-0.8]# kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 12m
alertmanager-main-1 2/2 Running 0 12m
alertmanager-main-2 2/2 Running 0 12m
blackbox-exporter-55c457d5fb-wn54c 3/3 Running 0 12m
grafana-6dd5b5f65-7j675 1/1 Running 0 12m
kube-state-metrics-76f6cb7996-bsdvz 3/3 Running 0 12m
node-exporter-db24s 2/2 Running 0 12m
node-exporter-k2xd9 2/2 Running 0 12m
node-exporter-kblxs 2/2 Running 0 12m
prometheus-adapter-59df95d9f5-5r2gw 1/1 Running 0 12m
prometheus-adapter-59df95d9f5-6f449 1/1 Running 0 12m
prometheus-k8s-0 2/2 Running 1 12m
prometheus-k8s-1 2/2 Running 1 12m
prometheus-operator-7775c66ccf-mwtx6 2/2 Running 0 14m
List the created Services:
[root@master kube-prometheus-release-0.8]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.101.163.190 <none> 9093/TCP 21m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 21m
blackbox-exporter ClusterIP 10.107.221.88 <none> 9115/TCP,19115/TCP 21m
grafana ClusterIP 10.109.100.129 <none> 3000/TCP 21m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 21m
node-exporter ClusterIP None <none> 9100/TCP 21m
prometheus-adapter ClusterIP 10.108.203.228 <none> 443/TCP 21m
prometheus-k8s ClusterIP 10.104.101.86 <none> 9090/TCP 21m
prometheus-operated ClusterIP None <none> 9090/TCP 21m
prometheus-operator ClusterIP None <none> 8443/TCP 23m
As shown above, grafana and prometheus each get a Service of type ClusterIP. To reach these two services from outside the cluster, we can create Ingress objects or switch the Services to NodePort. For simplicity we use NodePort here: edit the grafana and prometheus-k8s Services and change their type to NodePort:
$ kubectl edit svc grafana -n monitoring
$ kubectl edit svc prometheus-k8s -n monitoring
$ kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.98.191.31 <none> 3000:32333/TCP 23h
prometheus-k8s NodePort 10.107.105.53 <none> 9090:30166/TCP 23h
An Ingress works as well; it is not explained further here:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus-k8s.od.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
        path: /
        pathType: Prefix
Configuring Grafana
[root@master ~]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.98.97.216 <none> 9093/TCP 156m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 156m
blackbox-exporter ClusterIP 10.109.140.175 <none> 9115/TCP,19115/TCP 156m
grafana ClusterIP 10.111.120.8 <none> 3000/TCP 156m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 156m
node-exporter ClusterIP None <none> 9100/TCP 156m
prometheus-adapter ClusterIP 10.97.255.204 <none> 443/TCP 156m
prometheus-k8s ClusterIP 10.100.253.0 <none> 9090/TCP 156m
prometheus-operated ClusterIP None <none> 9090/TCP 156m
prometheus-operator ClusterIP None <none> 8443/TCP 158m
[root@master ~]# curl 10.111.120.8:3000
<a href="/login">Found</a>.
[root@master ~]# vi ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-k8s
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana-k8s.od.com
    http:
      paths:
      - backend:
          service:
            name: grafana
            port:
              number: 3000
        path: /
        pathType: Prefix
[root@master ~]# kubectl apply -f ingress.yaml
ingress.networking.k8s.io/grafana-k8s created
Accessing Prometheus
You can see that Prometheus Operator already monitors many services for us. The job_name entries on Prometheus's Configuration page are:
- job_name: serviceMonitor/monitoring/alertmanager/0
- job_name: serviceMonitor/monitoring/blackbox-exporter/0
- job_name: serviceMonitor/monitoring/kube-apiserver/0
- job_name: serviceMonitor/monitoring/kube-state-metrics/0
- job_name: serviceMonitor/monitoring/kube-state-metrics/1
- job_name: serviceMonitor/monitoring/kubelet/0
- job_name: serviceMonitor/monitoring/kubelet/1
- job_name: serviceMonitor/monitoring/kubelet/2
- job_name: serviceMonitor/monitoring/node-exporter/0
- job_name: serviceMonitor/monitoring/prometheus-adapter/0
- job_name: serviceMonitor/monitoring/prometheus-k8s/0
- job_name: serviceMonitor/monitoring/prometheus-operator/0
- job_name: serviceMonitor/monitoring/kube-scheduler/0
- job_name: serviceMonitor/monitoring/kube-controller-manager/0
By comparison, however, kube-scheduler and kube-controller-manager turn out to be missing from what is actually being monitored.
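The job names follow the pattern serviceMonitor/<namespace>/<ServiceMonitor name>/<endpoint index>, which is why kubelet appears three times (one job per entry under its endpoints). The Python sketch below parses that pattern from the list above and reports which ServiceMonitors have a job but no targets. The set of jobs that currently have targets is a hypothetical input here; in practice you would take it from the Targets page yourself.

```python
from collections import defaultdict


def parse_job(job_name: str) -> tuple:
    """Split 'serviceMonitor/<namespace>/<name>/<endpoint index>' into its parts."""
    kind, namespace, name, index = job_name.split("/")
    return kind, namespace, name, int(index)


def endpoints_per_monitor(job_names) -> dict:
    """Group the endpoint indices of each ServiceMonitor."""
    grouped = defaultdict(list)
    for job in job_names:
        _, _, name, index = parse_job(job)
        grouped[name].append(index)
    return dict(grouped)


def monitors_without_targets(job_names, jobs_with_targets) -> list:
    """ServiceMonitors that have a scrape job configured but no job with targets."""
    configured = {parse_job(job)[2] for job in job_names}
    scraped = {parse_job(job)[2] for job in jobs_with_targets}
    return sorted(configured - scraped)


# Job names copied from the Configuration page above.
jobs = [f"serviceMonitor/monitoring/{name}" for name in (
    "alertmanager/0", "blackbox-exporter/0", "kube-apiserver/0",
    "kube-state-metrics/0", "kube-state-metrics/1",
    "kubelet/0", "kubelet/1", "kubelet/2", "node-exporter/0",
    "prometheus-adapter/0", "prometheus-k8s/0", "prometheus-operator/0",
    "kube-scheduler/0", "kube-controller-manager/0")]

print(endpoints_per_monitor(jobs)["kubelet"])  # the three kubelet endpoints
# Hypothetical input: every job except the two control-plane ones has targets.
with_targets = [j for j in jobs if "scheduler" not in j and "controller-manager" not in j]
print(monitors_without_targets(jobs, with_targets))
```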
Yet listing the ServiceMonitors shows that kube-scheduler and kube-controller-manager are already configured:
[root@master mnt]# kubectl get ServiceMonitor -n monitoring
NAME AGE
alertmanager 45h
blackbox-exporter 45h
coredns 45h
grafana 45h
kube-apiserver 45h
kube-controller-manager 45h
kube-scheduler 45h
kube-state-metrics 45h
kubelet 45h
node-exporter 45h
prometheus-adapter 45h
prometheus-k8s 45h
prometheus-operator 45h
As shown above, the other services (kubelet, for example) can be monitored because a ServiceMonitor is defined for them, and a ServiceMonitor has to be bound to a Service:
[root@master ~]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 516d
kubelet ClusterIP None <none> 10250/TCP,10255/TCP,4194/TCP 2d11h
The missing kube-controller-manager and kube-scheduler targets have to do with how their ServiceMonitors are defined. Let's look at the ServiceMonitor resource for kube-scheduler (prometheus-serviceMonitorKubeScheduler.yaml):
kubectl get serviceMonitor kube-scheduler -n monitoring -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2024-07-01T02:44:33Z"
  generation: 1
  labels:
    app.kubernetes.io/name: kube-scheduler   # label definition
  name: kube-scheduler
  namespace: monitoring
  resourceVersion: "879124"
  uid: 635e9c40-6aca-4b01-9c82-09e5436c0212
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s            # scrape every 30 seconds
    port: https-metrics      # name of the port in the matched Service
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:         # namespaces to look for Services in; use "any: true" to match every namespace
    matchNames:
    - kube-system
  selector:
    matchLabels:             # labels of the Services to match; with matchLabels a Service must carry ALL of them
      app.kubernetes.io/name: kube-scheduler
    # matchExpressions, if used, are ANDed with matchLabels and with each other; an In expression accepts any one of its listed values
This is a typical ServiceMonitor declaration. Through selector.matchLabels it matches Services in the kube-system namespace that carry the label app.kubernetes.io/name: kube-scheduler, but no such Service exists in our cluster, so we have to create one manually (prometheus-kubeSchedulerService.yaml).
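The matching rules can be made concrete with a small Python model of a Kubernetes label selector plus the ServiceMonitor namespaceSelector. It assumes the standard semantics: matchLabels and every matchExpressions entry are ANDed, and an empty namespaceSelector means the ServiceMonitor's own namespace. It is an illustration, not the Operator's code, and the Service dicts are simplified stand-ins.

```python
def requirement_ok(expr: dict, labels: dict) -> bool:
    """Evaluate one matchExpressions entry against an object's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return labels.get(key) not in values  # also true when the key is absent
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {op}")


def selector_matches(selector: dict, labels: dict) -> bool:
    """matchLabels and all matchExpressions must hold at the same time."""
    if any(labels.get(k) != v for k, v in selector.get("matchLabels", {}).items()):
        return False
    return all(requirement_ok(e, labels) for e in selector.get("matchExpressions", []))


def service_monitor_selects(sm_namespace: str, sm_spec: dict, svc: dict) -> bool:
    """Apply namespaceSelector first, then the label selector."""
    ns_sel = sm_spec.get("namespaceSelector", {})
    if ns_sel.get("any"):
        ns_ok = True
    elif ns_sel.get("matchNames"):
        ns_ok = svc["namespace"] in ns_sel["matchNames"]
    else:
        ns_ok = svc["namespace"] == sm_namespace  # empty: own namespace only
    return ns_ok and selector_matches(sm_spec["selector"], svc["labels"])


# The kube-scheduler ServiceMonitor spec shown above (only the relevant fields).
scheduler_sm = {
    "namespaceSelector": {"matchNames": ["kube-system"]},
    "selector": {"matchLabels": {"app.kubernetes.io/name": "kube-scheduler"}},
}
svc = {"namespace": "kube-system",
       "labels": {"app.kubernetes.io/name": "kube-scheduler"}}
print(service_monitor_selects("monitoring", scheduler_sm, svc))  # True
print(service_monitor_selects("monitoring", scheduler_sm, {**svc, "namespace": "default"}))  # False
```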
[root@master mnt]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 516d
kubelet ClusterIP None <none> 10250/TCP,10255/TCP,4194/TCP 45h
[root@master mnt]# vi service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler   # must match the matchLabels of the ServiceMonitor above
  name: kube-scheduler
  namespace: kube-system
spec:
  ports:
  - name: https-metrics   # must be the same port name the ServiceMonitor uses
    port: 10251           # 10251 is the kube-scheduler metrics port; 10252 is kube-controller-manager's
    protocol: TCP
    targetPort: 10251
  selector:
    component: kube-scheduler   # spec.selector picks Pods by label; this is the kube-scheduler Pod's label (see below)
[root@master mnt]# kubectl apply -f service.yaml
The most important parts are labels and selector. The labels must match the selector of the ServiceMonitor above, and the selector uses component=kube-scheduler. Why this label? Let's describe the kube-scheduler Pod:
kubectl describe pod kube-scheduler-master -n kube-system
Priority Class Name: system-node-critical
Node: master/192.168.206.10
Start Time: Mon, 01 Jul 2024 09:25:28 +0800
Labels: component=kube-scheduler
        tier=control-plane
Annotations: kubernetes.io/config.hash: b7c68738b74c821ccea799a016e1ffa5
             kubernetes.io/config.mirror: b7c68738b74c821ccea799a016e1ffa5
             kubernetes.io/config.seen: 2024-07-01T01:48:11.325892523+08:00
             kubernetes.io/config.source: file
The Pod carries two labels, component=kube-scheduler and tier=control-plane. The former is more specific, so it is the better choice; with it, the Service created above is associated with this Pod. Create it and check:
[root@master ~]# kubectl get svc -n kube-system -l app.kubernetes.io/name=kube-scheduler
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
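kube-scheduler ClusterIP 10.102.85.153 <none> 10259/TCP 13h
kubectl label nodes master app.kubernetes.io/name=kube-scheduler
After creating it, wait a moment and check the state of kube-scheduler under Targets in Prometheus. The target has now been discovered, but scraping fails. The reason is that this cluster was built with kubeadm, where kube-scheduler binds to 127.0.0.1 by default, while Prometheus tries to reach it through the node IP, so the connection is refused. Changing kube-scheduler's bind address to 0.0.0.0 fixes this. Since kube-scheduler runs in the cluster as a static Pod, we only need to edit its YAML file in the static Pod directory.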
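Why the bind address matters can be shown with a simplified Python model. kubeadm runs the control-plane components as static Pods on the host network, so the Endpoints behind the new Service carry the node IP (192.168.206.10 on master), and that is the address Prometheus connects to. A listener bound to 127.0.0.1 only accepts connections addressed to loopback. The model ignores dual-stack IPv4/IPv6 sockets, and scrape_url and listener_accepts are hypothetical helpers.

```python
import ipaddress


def scrape_url(scheme: str, endpoint_ip: str, port: int, path: str = "/metrics") -> str:
    """Build the URL Prometheus would scrape (IPv4 only; IPv6 would need brackets)."""
    return f"{scheme}://{endpoint_ip}:{port}{path}"


def listener_accepts(bind_address: str, dest_ip: str) -> bool:
    """Simplified model, assuming dest_ip is an address of this host: a socket bound
    to one address only accepts connections sent to that address, while an
    unspecified address (0.0.0.0) accepts any local address of the same IP version."""
    bind = ipaddress.ip_address(bind_address)
    dest = ipaddress.ip_address(dest_ip)
    if bind.is_unspecified:
        return bind.version == dest.version
    return bind == dest


node_ip = "192.168.206.10"  # the master node in this article
print(scrape_url("https", node_ip, 10259))
for bind in ("127.0.0.1", "0.0.0.0"):
    print(bind, "->", "accepted" if listener_accepts(bind, node_ip) else "connection refused")
```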
ls /etc/kubernetes/manifests/
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
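In kube-scheduler.yaml, change the --address flag under command to 0.0.0.0 and comment out --port=0:
containers:
- command:
  - kube-scheduler
  - --leader-elect=true
  - --kubeconfig=/etc/kubernetes/scheduler.conf
  - --address=0.0.0.0
  # - --port=0
  # --port=0 disables the insecure HTTP endpoint. --secure-port (default 10259) is the HTTPS port with authentication and authorization; 0 disables HTTPS.
After the change, restart the kubelet service; kube-scheduler restarts automatically, and once it is back the Prometheus scrape target should be healthy.
After editing, you can also move the file out of the directory, wait a moment, and move it back; the static Pod is then recreated automatically. Check the kube-scheduler target in Prometheus again. If it still reports an error as above, note that in 1.21.1 the default secure port is 10259, and the secure port listens on the address given by --bind-address, so that flag also has to be 0.0.0.0 (as done for kube-controller-manager below). Update the Service to use 10259:
kubectl edit svc kube-scheduler -n kube-system
spec:
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  selector:
    component: kube-scheduler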
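Because the Service for each control-plane component differs only in its name and port, it can be generated instead of copied by hand. A minimal Python sketch, assuming the secure ports used in this article (10259 for kube-scheduler, 10257 for kube-controller-manager) and kubeadm's component=<name> Pod label; it prints JSON, which kubectl apply -f also accepts. The function name is illustrative.

```python
import json

# Secure metrics ports used in this article (Kubernetes 1.21).
SECURE_PORTS = {"kube-scheduler": 10259, "kube-controller-manager": 10257}


def control_plane_metrics_service(component: str) -> dict:
    """Selector-based Service in kube-system exposing a component's https-metrics port."""
    port = SECURE_PORTS[component]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": component,
            "namespace": "kube-system",
            # Must equal the ServiceMonitor's selector.matchLabels.
            "labels": {"app.kubernetes.io/name": component},
        },
        "spec": {
            # Port name must equal the ServiceMonitor endpoint's port field.
            "ports": [{"name": "https-metrics", "port": port,
                       "protocol": "TCP", "targetPort": port}],
            # kubeadm labels its static control-plane Pods with component=<name>.
            "selector": {"component": component},
        },
    }


for name in SECURE_PORTS:
    print(json.dumps(control_plane_metrics_service(name), indent=2))
```

Deriving the label, the selector, and the name from one string avoids copy-paste slips such as a controller-manager Service whose selector still says kube-scheduler.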
Monitoring the kube-controller-manager component
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
    targetPort: 10257
  selector:
    component: kube-controller-manager
Then, in the kube-controller-manager static Pod manifest, change --bind-address=127.0.0.1 to --bind-address=0.0.0.0 and comment out - --port=0:
cat /etc/kubernetes/manifests/kube-controller-manager.yaml
After the change, restart the kubelet service; kube-controller-manager restarts automatically, and once it is back the Prometheus scrape target should be healthy.
With the monitoring data configured, we can look at the dashboards in Grafana, again through the NodePort above. Log in with admin:admin the first time. On the home page you can see that Grafana is already connected to our Prometheus data source, and normally some monitoring charts are shown.
3. Monitoring Etcd
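Step 1: Create a ServiceMonitor object so that Prometheus adds the scrape target.
Step 2: Associate the ServiceMonitor with a Service object that exposes the metrics endpoint.
Step 3: Make sure the Service can actually serve the metrics data.
Create the Secret
First, store the certificates we need in the cluster as a Secret (on the node where etcd runs).
Locate the certificates (kubeadm deployment):
[root@master ~]# kubectl get pods -n kube-system | grep etcd
etcd-master 1/1 Running 44 516d
[root@master ~]# kubectl describe pod etcd-master -n kube-system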
Name: etcd-master
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master/192.168.206.10
Start Time: Mon, 01 Jul 2024 09:25:28 +0800
Labels: component=etcd
        tier=control-plane
Annotations: kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.206.10:2379
             kubernetes.io/config.hash: 4718945d29e49afeeca8a4ab35b6b412
             kubernetes.io/config.mirror: 4718945d29e49afeeca8a4ab35b6b412
             kubernetes.io/config.seen: 2023-01-31T22:04:39.591063123+08:00
             kubernetes.io/config.source: file
Status: Running
IP: 192.168.206.10
IPs:
  IP: 192.168.206.10
Controlled By: Node/master
Containers:
  etcd:
    Container ID: docker://c7124102ca9389940ca148b835be0327f11506b05885aff1c634a308f309f200
    Image: registry.aliyuncs.com/google_containers/etcd:3.4.13-0
    Image ID: docker-pullable://registry.aliyuncs.com/google_containers/etcd@sha256:4ad90a11b55313b182afc186b9876c8e891531b8db4c9bf1541953021618d0e2
    Port: <none>
    Host Port: <none>
    Command:
      etcd
      --advertise-client-urls=https://192.168.206.10:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --initial-advertise-peer-urls=https://192.168.206.10:2380
      --initial-cluster=master=https://192.168.206.10:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.206.10:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.206.10:2380
      --name=master
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    State: Running
      Started: Mon, 01 Jul 2024 11:59:47 +0800
    Last State: Terminated
      Reason: Error
      Exit Code: 255
      Started: Mon, 01 Jul 2024 09:25:30 +0800
      Finished: Mon, 01 Jul 2024 11:59:37 +0800
    Ready: True
    Restart Count: 44
The etcd certificate files are in the /etc/kubernetes/pki/etcd/ directory on the Kubernetes master node.
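The flags in this output are what the following steps rely on, and they can be pulled out with a few lines of Python. A minimal sketch, assuming the command is available as a list of --key=value strings (as in the static Pod manifest); parse_flags is an illustrative helper. Note that --listen-metrics-urls exposes plain-HTTP metrics only on 127.0.0.1:2381, which is why the rest of this article scrapes the HTTPS client port 2379; because of --client-cert-auth=true, that port requires a client certificate signed by the CA in --trusted-ca-file.

```python
def parse_flags(command: list) -> dict:
    """Turn ['etcd', '--key=value', ...] into {'key': 'value'}, skipping the binary name.
    Only the first '=' splits, so values such as 'master=https://...' stay intact."""
    flags = {}
    for arg in command[1:]:
        if arg.startswith("--") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            flags[key] = value
    return flags


etcd_command = [
    "etcd",
    "--advertise-client-urls=https://192.168.206.10:2379",
    "--cert-file=/etc/kubernetes/pki/etcd/server.crt",
    "--client-cert-auth=true",
    "--initial-cluster=master=https://192.168.206.10:2380",
    "--listen-client-urls=https://127.0.0.1:2379,https://192.168.206.10:2379",
    "--listen-metrics-urls=http://127.0.0.1:2381",
    "--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt",
]  # a subset of the flags shown above

flags = parse_flags(etcd_command)
print("client URLs:", flags["listen-client-urls"].split(","))
print("metrics URL:", flags["listen-metrics-urls"])
print("client certs required:", flags["client-cert-auth"] == "true")
print("CA that signs client certs:", flags["trusted-ca-file"])
```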
Store the certificates in Kubernetes
# create the Secret
kubectl create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt -n monitoring
Check the Secret just created:
[root@master ~]# kubectl get secret etcd-certs -n monitoring
NAME TYPE DATA AGE
etcd-certs Opaque 3 9s
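What kubectl create secret generic --from-file stores can be reproduced in Python, which helps when reading the Secret's YAML: each file becomes a data key named after its basename, and its contents are base64-encoded. A minimal sketch; secret_from_files is an illustrative name, and the demo writes placeholder files to a temporary directory instead of reading the real certificates.

```python
import base64
import os
import tempfile


def secret_from_files(name: str, namespace: str, paths: list) -> dict:
    """Build an Opaque Secret the way '--from-file=<path>' does: key = file basename,
    value = base64 of the file's bytes."""
    data = {}
    for path in paths:
        with open(path, "rb") as f:
            data[os.path.basename(path)] = base64.b64encode(f.read()).decode("ascii")
    return {"apiVersion": "v1", "kind": "Secret", "type": "Opaque",
            "metadata": {"name": name, "namespace": namespace}, "data": data}


# Demo with placeholder contents (not real certificates).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for file_name in ("healthcheck-client.crt", "healthcheck-client.key", "ca.crt"):
        path = os.path.join(tmp, file_name)
        with open(path, "wb") as f:
            f.write(b"placeholder for " + file_name.encode())
        paths.append(path)
    demo = secret_from_files("etcd-certs", "monitoring", paths)

print(sorted(demo["data"]))  # three keys, matching DATA 3 above
```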
Mount the certificates into Prometheus
Edit the Prometheus resource to add the etcd certificates:
[root@master ~]# kubectl get prometheus -n monitoring
NAME VERSION REPLICAS AGE
k8s 2.29.1 2 3h20m
[root@master ~]# kubectl edit prometheus k8s -n monitoring
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2024-07-01T02:44:34Z"
  generation: 2
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.29.1
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "905177"
  uid: 3ad2b674-458c-4918-907a-337e838ffd53
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.29.1
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.29.1
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  secrets:          # ------ added: mount the etcd certificate Secret
  - etcd-certs
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.29.1
Wait until the Pods have restarted, then exec into one and check that the certificates are visible:
[root@master ~]# kubectl get pod -owide -n monitoring
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 3h26m 172.7.2.119 slave02 <none> <none>
alertmanager-main-1 2/2 Running 0 3h26m 172.7.2.118 slave02 <none> <none>
alertmanager-main-2 2/2 Running 0 3h26m 172.7.1.108 slave01 <none> <none>
blackbox-exporter-6798fb5bb4-pf7tw 3/3 Running 0 3h26m 172.7.2.122 slave02 <none> <none>
grafana-7476b4c65b-bv62x 1/1 Running 0 3h26m 172.7.2.120 slave02 <none> <none>
kube-state-metrics-74964b6cd4-9tldk 3/3 Running 0 3h26m 172.7.1.109 slave01 <none> <none>
node-exporter-5lw2w 2/2 Running 0 3h26m 192.168.206.12 slave02 <none> <none>
node-exporter-g546z 2/2 Running 2 3h26m 192.168.206.10 master <none> <none>
node-exporter-gwhdr 2/2 Running 0 3h26m 192.168.206.11 slave01 <none> <none>
prometheus-adapter-8587b9cf9b-qzgmt 1/1 Running 0 3h26m 172.7.2.121 slave02 <none> <none>
prometheus-adapter-8587b9cf9b-rmmlk 1/1 Running 0 3h26m 172.7.1.110 slave01 <none> <none>
prometheus-k8s-0 2/2 Running 0 86s 172.7.2.124 slave02 <none> <none>
prometheus-k8s-1 2/2 Running 0 91s 172.7.1.112 slave01 <none> <none>
prometheus-operator-75d9b475d9-zshf7 2/2 Running 0 3h28m 172.7.1.106 slave01 <none> <none>
[root@master ~]# kubectl exec -it -n monitoring prometheus-k8s-0 -- /bin/sh
/prometheus $ ls -l /etc/prometheus/secrets/etcd-certs/
total 0
lrwxrwxrwx 1 root 2000 13 Jul 1 06:09 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root 2000 29 Jul 1 06:09 healthcheck-client.crt -> ..data/healthcheck-client.crt
lrwxrwxrwx 1 root 2000 29 Jul 1 06:09 healthcheck-client.key -> ..data/healthcheck-client.key
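The listing shows the convention the ServiceMonitor will rely on: each Secret named in spec.secrets is mounted at /etc/prometheus/secrets/<secret name>/, with one file per data key (the ..data symlinks are how Kubernetes swaps Secret volume contents atomically). A small Python sketch that derives the tlsConfig file paths from that convention; the helper names are illustrative.

```python
import json
import posixpath

# Mount root observed in the ls output above.
SECRETS_ROOT = "/etc/prometheus/secrets"


def secret_file(secret: str, key: str) -> str:
    """Path of one data key of a Secret listed in the Prometheus spec.secrets."""
    return posixpath.join(SECRETS_ROOT, secret, key)


def tls_config(secret: str, ca: str, cert: str, key: str,
               insecure_skip_verify: bool = True) -> dict:
    """tlsConfig block for a ServiceMonitor endpoint, using files from one Secret."""
    return {
        "caFile": secret_file(secret, ca),
        "certFile": secret_file(secret, cert),
        "keyFile": secret_file(secret, key),
        "insecureSkipVerify": insecure_skip_verify,
    }


cfg = tls_config("etcd-certs", "ca.crt", "healthcheck-client.crt", "healthcheck-client.key")
print(json.dumps(cfg, indent=2))
```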
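Create the etcd Service and Endpoints
Because etcd runs independently of the cluster, we create an Endpoints object that maps it into the Kubernetes cluster and a Service bound to that Endpoints; applications in the cluster can then reach etcd through the Service. Since the Service has no selector, Kubernetes does not manage its Endpoints, so the Endpoints object must have the same name and namespace as the Service.
[root@master ~]# vi etcd-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None          # None: headless, no Service IP is allocated
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.206.10     # IP of the node running etcd
  ports:
  - name: port             # must match the Service port name
    port: 2379             # etcd port
    protocol: TCP
For an etcd cluster, list every member:
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 11.0.64.5
  - ip: 11.0.64.6
  - ip: 11.0.64.7
  ports:
  - name: port
    port: 2379
    protocol: TCP
[root@master ~]# kubectl apply -f etcd-service.yaml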
service/etcd-k8s created
endpoints/etcd-k8s created
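Since the Service has no selector, the pairing rests entirely on conventions: same name and namespace, and a port name on the Endpoints that matches the Service's port name. A Python sketch that builds both objects from a list of member IPs, so the single-node and cluster variants come from the same code. The function name and defaults are illustrative, and the output is a v1 List, which kubectl apply -f accepts.

```python
import json


def etcd_service_and_endpoints(member_ips, name="etcd-k8s", namespace="kube-system",
                               port=2379, port_name="port"):
    """Selector-less headless Service plus a manual Endpoints object that share
    name, namespace, labels, and port name."""
    def metadata():
        return {"name": name, "namespace": namespace, "labels": {"k8s-app": "etcd"}}

    def port_spec():
        return {"name": port_name, "port": port, "protocol": "TCP"}

    service = {"apiVersion": "v1", "kind": "Service", "metadata": metadata(),
               "spec": {"type": "ClusterIP", "clusterIP": "None", "ports": [port_spec()]}}
    endpoints = {"apiVersion": "v1", "kind": "Endpoints", "metadata": metadata(),
                 "subsets": [{"addresses": [{"ip": ip} for ip in member_ips],
                              "ports": [port_spec()]}]}
    return service, endpoints


svc, ep = etcd_service_and_endpoints(["11.0.64.5", "11.0.64.6", "11.0.64.7"])
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [svc, ep]}, indent=2))
```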
Create the ServiceMonitor
Create a Prometheus monitoring resource that scrapes the etcd metrics.
vi etcd-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
$ kubectl apply -f etcd-monitor.yaml
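Before relying on the ServiceMonitor, it can help to check by hand that the certificates work against etcd's HTTPS port. A Python sketch using the standard ssl module: load_cert_chain supplies the client certificate etcd demands, and setting verify_mode to CERT_NONE is the rough equivalent of insecureSkipVerify: true. The function names are illustrative, and the fetch is left as a commented example because it needs the real certificate files and network access to etcd.

```python
import ssl
import urllib.request


def etcd_ssl_context(ca_file=None, cert_file=None, key_file=None,
                     insecure_skip_verify=False) -> ssl.SSLContext:
    """Client-side TLS settings mirroring the ServiceMonitor's tlsConfig."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if insecure_skip_verify:
        ctx.check_hostname = False       # must be turned off before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE  # do not verify etcd's server certificate
    if cert_file and key_file:
        # Client certificate, required because etcd runs with --client-cert-auth=true.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def fetch_metrics(url: str, ctx: ssl.SSLContext, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read().decode()


# Example, run on the master node with the certificate paths used above:
# ctx = etcd_ssl_context("/etc/kubernetes/pki/etcd/ca.crt",
#                        "/etc/kubernetes/pki/etcd/healthcheck-client.crt",
#                        "/etc/kubernetes/pki/etcd/healthcheck-client.key",
#                        insecure_skip_verify=True)
# print(fetch_metrics("https://192.168.206.10:2379/metrics", ctx)[:300])
```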
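Above, we created a ServiceMonitor named etcd-k8s in the monitoring namespace. Its basic attributes are the same as in the earlier sections: it matches Services in the kube-system namespace that carry the label k8s-app=etcd, and jobLabel names the label used to derive the job name. What differs from before is the endpoints section, which configures the certificates for accessing etcd. Many scrape parameters can be set under endpoints, such as relabeling and proxyUrl; tlsConfig configures TLS authentication for the scrape endpoint. Because the serverName in the certificate may not match the one etcd's certificate was issued for, insecureSkipVerify=true is added as well. After a short while, the etcd target shows up under Targets in the Prometheus dashboard.
Change Prometheus's time zone
~]# docker tag quay.io/prometheus/prometheus:v2.29.1 quay.io/prometheus/prometheus-bak:v2.29.1
~]# vi Dockerfile
FROM quay.io/prometheus/prometheus-bak:v2.29.1
USER root
RUN /bin/cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' > /etc/timezone
~]# docker build -t quay.io/prometheus/prometheus:v2.29.1 .
Importing the etcd dashboard into Grafana
Once the data is being collected, import dashboard ID 3070 into Grafana to get the etcd monitoring charts.
Persisting Grafana data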
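Inspecting the Grafana Deployment shows that, surprisingly, the Grafana data is mounted on an emptyDir. An emptyDir lets the containers of a Pod share a directory, but it cannot persist data: the volume is deleted together with the Pod when the Pod's lifecycle ends.
kubectl get deployment grafana -n monitoring -o yaml
Here we mount a dynamically provisioned PVC instead:
vim grafana-p.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-nfs-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-client-storageclass
  resources:
    requests:
      storage: 1Gi
kubectl apply -f grafana-p.yaml
kubectl edit deployment grafana -n monitoring
In the volumes section, replace
      - emptyDir: {}
        name: grafana-storage
with
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-nfs-pvc
Troubleshooting
After everything was deployed, only one panel of the Grafana etcd dashboard showed data, and the other panels were empty, even though the etcd metrics themselves had data. The problem was that the Service and the Endpoints were not bound together:
[root@master mnt]# kubectl describe svc etcd-k8s -n kube-system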
Name: etcd-k8s
Namespace: kube-system
Labels: k8s-app=etcd
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP Family Policy: RequireDualStack
IP Families: IPv4,IPv6
IP: None
IPs: None
Port: port 2379/TCP
TargetPort: 2379/TCP
Endpoints:
Session Affinity: None
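Events: <none>
The Endpoints line is empty, so the Service has no backends. The Endpoints port must carry the same name as the Service port (port) for the two to be matched, and it is also the name that the ServiceMonitor's port: port refers to. The corrected manifest below names the port and additionally sets nodeName to the kubelet's node name (as shown by kubectl get node):
[root@master mnt]# cat etcd-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.206.10   # IP of the etcd node
    nodeName: master     # kubelet node name, as shown by kubectl get node
  ports:
  - name: port
    port: 2379
    protocol: TCP
kubectl apply -f etcd-service.yaml
For an etcd cluster, list every member in addresses:
- addresses:
  - ip: 192.168.0.10     # IP of an etcd node
    nodeName: k8s-01     # kubelet node name, as shown by kubectl get node
  - ip: 192.168.0.11
    nodeName: k8s-02
  - ip: 192.168.0.12
    nodeName: k8s-03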
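The symptom above can be reproduced with a small Python model of how a Service's ports are paired with Endpoints ports by name, roughly what the Endpoints line of kubectl describe svc reports. It is a simplification (it ignores notReadyAddresses and other subset fields), service_backends is an illustrative name, and the dicts mirror the etcd manifests in this section. Both Endpoints variants include nodeName, so only the port name differs between them.

```python
def service_backends(service: dict, endpoints: dict) -> dict:
    """For each Service port, list ip:port pairs from Endpoints ports with the same name."""
    result = {}
    for service_port in service["spec"]["ports"]:
        name = service_port.get("name", "")
        backends = []
        for subset in endpoints.get("subsets", []):
            for ep_port in subset.get("ports", []):
                if ep_port.get("name", "") == name:
                    backends += [f"{addr['ip']}:{ep_port['port']}"
                                 for addr in subset.get("addresses", [])]
        result[name] = backends
    return result


service = {"spec": {"ports": [{"name": "port", "port": 2379, "protocol": "TCP"}]}}
address = [{"ip": "192.168.206.10", "nodeName": "master"}]
unnamed = {"subsets": [{"addresses": address, "ports": [{"port": 2379}]}]}
named = {"subsets": [{"addresses": address,
                      "ports": [{"name": "port", "port": 2379, "protocol": "TCP"}]}]}

print(service_backends(service, unnamed))  # {'port': []}  -> empty Endpoints line
print(service_backends(service, named))    # {'port': ['192.168.206.10:2379']}
```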