
Deploying Prometheus + Grafana Monitoring on Kubernetes (NFS Storage)

Table of Contents

  • Deploying Prometheus + Grafana Monitoring on Kubernetes (NFS Storage)
    • I. NFS Server Configuration (192.168.104.50)
      • 1. Install the NFS service
      • 2. Create the shared directory
      • 3. Configure the NFS export
      • 4. Verify the NFS export
    • II. Configure All Kubernetes Nodes (master and workers)
      • 1. Install the NFS client
      • 2. Verify NFS connectivity
    • III. Prepare Container Images (on a machine with Internet access)
      • 1. Pull the required images
      • 2. Save the images to tar files
      • 3. Copy the image files to all nodes
      • 4. Load the images on each node
    • IV. Deploy the NFS Provisioner
      • 1. Create the namespace
      • 2. Create the NFS Provisioner manifest (nfs-provisioner.yaml)
      • 3. Create the StorageClass (nfs-storageclass.yaml)
      • 4. Apply the configuration
      • 5. Verify the deployment
    • V. Deploy Prometheus
      • 1. Create the monitoring namespace
      • 2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)
      • 3. Create the main Prometheus manifest (prometheus.yaml)
      • 4. Apply the Prometheus configuration
    • VI. Deploy Node Exporter
      • 1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)
      • 2. Create the Node Exporter service (node-exporter-svc.yaml)
      • 3. Apply the Node Exporter configuration
    • VII. Deploy Grafana
      • 1. Create the Grafana manifest (grafana.yaml)
      • 2. Apply the Grafana configuration
    • VIII. Verify the Deployment
      • 1. Check component status
      • 2. Access the web UIs
      • 3. Configure Grafana
    • IX. Troubleshooting
      • Issue 1: PVC stuck in Pending
      • Issue 2: Image pull failures
      • Issue 3: NFS connectivity problems

Deploying Prometheus + Grafana Monitoring on Kubernetes (NFS Storage)

I. NFS Server Configuration (192.168.104.50)

1. Install the NFS service

yum install -y nfs-utils rpcbind
systemctl enable --now rpcbind
systemctl enable --now nfs-server

2. Create the shared directory

mkdir -p /data/k8s_data
chmod 777 /data/k8s_data

3. Configure the NFS export

echo "/data/k8s_data *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
exportfs -arv

4. Verify the NFS export

showmount -e localhost
# Expected output:
# Export list for localhost:
# /data/k8s_data *

II. Configure All Kubernetes Nodes (master and workers)

1. Install the NFS client

yum install -y nfs-utils

2. Verify NFS connectivity

mkdir -p /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test

III. Prepare Container Images (on a machine with Internet access)

1. Pull the required images

docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull prom/node-exporter:latest
docker pull registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker pull busybox:latest
docker pull quay.io/coreos/kube-state-metrics:v2.9.2

2. Save the images to tar files

docker save -o prometheus.tar prom/prometheus:latest
docker save -o grafana.tar grafana/grafana:latest
docker save -o node-exporter.tar prom/node-exporter:latest
docker save -o nfs-provisioner.tar registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker save -o busybox.tar busybox:latest
docker save -o kube-state-metrics.tar quay.io/coreos/kube-state-metrics:v2.9.2

3. Copy the image files to all nodes

scp *.tar root@192.168.104.51:/root
scp *.tar root@192.168.104.52:/root
scp *.tar root@192.168.104.53:/root
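Copying each archive to each node by hand scales poorly as the node count grows. The three scp commands above can be collapsed into a loop; this sketch prints the commands as a dry run (remove the `echo` to actually copy):

```shell
# Dry run: print one scp command per node; drop "echo" to copy for real.
NODES="192.168.104.51 192.168.104.52 192.168.104.53"
for node in $NODES; do
  echo scp '*.tar' "root@$node:/root/"
done
```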

4. Load the images on each node

docker load -i prometheus.tar
docker load -i grafana.tar
docker load -i node-exporter.tar
docker load -i nfs-provisioner.tar
docker load -i busybox.tar
docker load -i kube-state-metrics.tar

IV. Deploy the NFS Provisioner

1. Create the namespace

kubectl create namespace nfs-storageclass

2. Create the NFS Provisioner manifest (nfs-provisioner.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      nodeSelector:
        kubernetes.io/hostname: node1  # pin to the node1 node
      containers:
      - name: nfs-client-provisioner
        image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: k8s-sigs.io/nfs-subdir-external-provisioner
        - name: NFS_SERVER
          value: 192.168.104.50
        - name: NFS_PATH
          value: /data/k8s_data
      volumes:
      - name: nfs-client-root
        nfs:
          server: 192.168.104.50
          path: /data/k8s_data
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
  name: nfs-client-provisioner
  namespace: nfs-storageclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass

3. Create the StorageClass (nfs-storageclass.yaml)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"

4. Apply the configuration

kubectl apply -f nfs-provisioner.yaml
kubectl apply -f nfs-storageclass.yaml

5. Verify the deployment

kubectl get pods -n nfs-storageclass
# The nfs-client-provisioner pod should be Running

kubectl get storageclass
# nfs-client should be marked (default)
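To confirm dynamic provisioning end to end before deploying Prometheus, you can create a throwaway PVC and check that it binds. This is a sketch; `test-claim` is a hypothetical name, not part of the deployment:

```yaml
# test-pvc.yaml - delete after verifying it reaches Bound
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
  storageClassName: nfs-client
```

Apply it with `kubectl apply -f test-pvc.yaml`, confirm `kubectl get pvc test-claim` shows Bound (and that a matching subdirectory appears under /data/k8s_data on the NFS server), then clean up with `kubectl delete -f test-pvc.yaml`.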

V. Deploy Prometheus

1. Create the monitoring namespace

kubectl create namespace monitor

2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [monitor]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: prometheus-svc
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
        action: keep
    - job_name: 'coredns'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [kube-system]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: kube-dns
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
    - job_name: 'kube-apiserver'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: false
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default, kube-system]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: kubernetes
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: https
        action: keep
    - job_name: 'node-exporter'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
    - job_name: 'cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        insecure_skip_verify: true
        ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
      bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
      relabel_configs:
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
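The `node-exporter` job's relabel rule rewrites each address discovered via `role: node` from the kubelet port to the exporter port: `(.*):10250` captures the host part and `${1}:9100` re-targets it. The same rewrite, sketched in plain shell:

```shell
# What the node-exporter relabel_configs rule does to a discovered address:
addr="192.168.104.51:10250"   # address discovered via role: node (kubelet port)
rewritten="${addr%:*}:9100"   # strip the port, append the node-exporter port
echo "$rewritten"             # -> 192.168.104.51:9100
```

This works because node-exporter runs with hostNetwork (see section VI), so it listens on port 9100 of the same host IP the kubelet address points at.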

3. Create the main Prometheus manifest (prometheus.yaml)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "services", "endpoints", "pods", "nodes/proxy"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps", "nodes/metrics"]
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: web
    nodePort: 32224
    port: 9090
    targetPort: http
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
      - name: "change-permission-of-directory"
        image: busybox:latest
        command: ["/bin/sh", "-c"]
        args:
        - chown -R 65534:65534 /prometheus
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--web.enable-lifecycle"
        - "--web.console.libraries=/usr/share/prometheus/console_libraries"
        - "--web.console.templates=/usr/share/prometheus/consoles"
        ports:
        - name: http
          containerPort: 9090
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-pvc
      - name: config-volume
        configMap:
          name: prometheus-config

4. Apply the Prometheus configuration

kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus.yaml

VI. Deploy Node Exporter

1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        imagePullPolicy: IfNotPresent
        args:
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts:
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
      tolerations:
      - operator: "Exists"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /

2. Create the Node Exporter service (node-exporter-svc.yaml)

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100
  clusterIP: None

3. Apply the Node Exporter configuration

kubectl apply -f node-exporter-daemonset.yaml
kubectl apply -f node-exporter-svc.yaml

VII. Deploy Grafana

1. Create the Grafana manifest (grafana.yaml)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/grafana/
          name: grafana-data
        env:
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: grafana-data
        persistentVolumeClaim:
          claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort

2. Apply the Grafana configuration

kubectl apply -f grafana.yaml

VIII. Verify the Deployment

1. Check component status

kubectl get pods -n monitor
# Expected output, similar to:
# NAME                                  READY   STATUS    RESTARTS   AGE
# grafana-server-7868b7cc7c-k8lrd       1/1     Running   0          5m
# kube-state-metrics-74c47f9485-h8787   1/1     Running   0          6m
# node-exporter-6h79q                   1/1     Running   0          5m
# node-exporter-9qkbs                   1/1     Running   0          5m
# node-exporter-t64xb                   1/1     Running   0          5m
# prometheus-5696fb478b-4wf4j           1/1     Running   0          6m

kubectl get pvc -n monitor
# All PVCs should be Bound

kubectl get svc -n monitor
# The Prometheus and Grafana services should be listed

2. Access the web UIs

  • Prometheus: http://<any-node-IP>:32224

  • Grafana: http://<any-node-IP>:31091

3. Configure Grafana

  1. Log in to Grafana (default user/password: admin/admin)
  2. Add a data source:
    • Type: Prometheus
    • URL: http://prometheus-svc.monitor.svc.cluster.local:9090
  3. Import dashboards:
    • Node monitoring: ID 16098
    • Kubernetes cluster monitoring: ID 14249
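The data source from step 2 can also be provisioned declaratively instead of through the UI, via Grafana's file-based provisioning. A minimal sketch: the ConfigMap name below is a hypothetical addition, and it would need to be mounted at /etc/grafana/provisioning/datasources in the Grafana Deployment for Grafana to pick it up at startup:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources   # hypothetical name, not part of grafana.yaml above
  namespace: monitor
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://prometheus-svc.monitor.svc.cluster.local:9090
      isDefault: true
```

This keeps the data source definition in version control and survives Pod re-creation even without the PVC.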


IX. Troubleshooting

Issue 1: PVC stuck in Pending

Cause: the NFS provisioner failed to create the PV automatically.
Workaround: create the PVs manually.

# On the NFS server, create the backing directories
ssh 192.168.104.50
mkdir -p /data/k8s_data/{prometheus,grafana}
chmod 777 /data/k8s_data/{prometheus,grafana}
exit

# Create the PVs manually
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/prometheus
    server: 192.168.104.50
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/grafana
    server: 192.168.104.50
EOF

# Delete the old Pods to force re-creation
kubectl delete pod -n monitor --all

Issue 2: Image pull failures

Solutions:

  1. Load the images locally from the saved tar files
  2. Set imagePullPolicy: IfNotPresent in the manifests
  3. Confirm the required images are loaded on every node
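A quick way to rule out step 3 is to confirm the saved archives actually made it to the node before running docker load. A minimal pre-flight check, assuming the tar files from section III sit in the current directory:

```shell
# Verify every expected image archive is present before running docker load.
expected="prometheus.tar grafana.tar node-exporter.tar nfs-provisioner.tar busybox.tar kube-state-metrics.tar"
missing=0
for f in $expected; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all image archives present"
```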

Issue 3: NFS connectivity problems

Verification steps:

# Test from a K8s node
mkdir -p /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test

Fixes:

  1. Make sure the NFS server firewall allows port 2049 (and port 111 for rpcbind)
  2. Check the /etc/exports configuration on the NFS server
  3. Re-export on the NFS server: exportfs -arv
